Capacity Planning

  • How many nodes?
  • The basic starting point is two nodes with 2 cores and 4GB of memory on each node
  • From a fault-tolerance perspective, three nodes are more appropriate for any cluster
  • What's better: more nodes or bigger nodes?
  • More nodes means IO, memory, and GC (garbage collection) work is distributed across machines
  • Common pitfall with distributed databases: stressing shared storage, e.g. a SAN (storage area network)
  • Bigger nodes mean more processing can be performed on a single node, with fast access to in-memory data and faster local IO
  • Resizing a node in production is likely more challenging than adding a new node to the cluster
  • Elasticsearch is built for scaling out on commodity hardware, not up on a single massive machine
  • How high can it go? Pretty high
  • So which is it going to be: more smaller nodes or fewer larger nodes?
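
The fault-tolerance point above (three nodes rather than two) can be illustrated with simple majority arithmetic. This is a minimal sketch, assuming master election needs a strict majority of master-eligible nodes. The function names `quorum` and `tolerated_failures` are hypothetical helpers for illustration, not part of any Elasticsearch API.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest strict majority of the given number of master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes: int) -> int:
    """Number of nodes that can be lost while a majority still remains."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


# Compare small cluster sizes (assumption: every node is master-eligible).
for n in (1, 2, 3, 5):
    print(f"{n} node(s): quorum={quorum(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Under this assumption a two-node cluster tolerates no more failures than a single node, while three nodes tolerate the loss of one. That is the usual argument for starting at three.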

