As a hosted ELK provider, we are often asked by prospective users of the ELK stack how many and what kind of servers they will need to support the requirements of their IT log management system.
Production deployments vary wildly: a legitimate mission-critical log management system might run on a single server, or on several hundred.
So the answer to the question, "How many servers does it take to run Elasticsearch?" is, as always, "It depends."
A typical log message ranges from 200 to 2,000 bytes or more and can contain various types of data. Even if a log message splits the difference at, say, 500 bytes, the amount of disk space it occupies can change depending on various factors.
To start making rough estimates of your server requirements, you will need to carry out some testing using representative data.
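As a back-of-envelope starting point, you can estimate raw storage needs from your expected message rate and average message size. The figures and the expansion factor below are illustrative assumptions, not benchmarks; replace them with numbers measured from your own representative data:

```python
# Rough, illustrative storage estimate for a log cluster.
# Every input figure here is an assumption to be replaced with
# measurements from your own representative data.

def daily_storage_gb(msgs_per_sec, avg_msg_bytes,
                     expansion_factor=1.5, replicas=1):
    """Estimate on-disk GB per day.

    expansion_factor: how much indexing inflates the raw data
    (measure this on a sample index); replicas: each replica
    stores a full extra copy of the index on disk.
    """
    raw_bytes = msgs_per_sec * avg_msg_bytes * 86_400  # seconds per day
    indexed_bytes = raw_bytes * expansion_factor
    total_bytes = indexed_bytes * (1 + replicas)
    return total_bytes / 1024 ** 3

# e.g. 1,000 msgs/s of ~500-byte messages with one replica
print(f"{daily_storage_gb(1000, 500):.0f} GB/day")
```

The point of the calculation is less the final number than the sensitivity: the expansion factor and replica count each multiply the result, which is why measuring them on real data matters.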
Data volume in Elasticsearch tends to expand during the indexing process, so you may need to decide for certain text fields whether they should be analysed or not. Compression is also an option, although you may have to wait until version 2.0 of Elasticsearch for a fully configurable compression offering.
To analyse or not?
Textual analysis at index time can have an impact on disk space, because breaking text into searchable tokens adds data structures to the index. It is also a fundamental component of the search facility: pre-processing the text at index time is what lets queries match it quickly and relevantly.
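As a sketch of what this trade-off looks like in practice (the index, type, and field names are hypothetical, and the syntax follows the Elasticsearch 1.x mapping API current at the time of writing), a mapping can leave a free-text field analysed while marking an exact-value field as not_analyzed, which stores it as a single token and avoids the extra per-token data on disk:

```python
import json

# Hypothetical Elasticsearch 1.x mapping body. "message" is analysed
# for full-text search; "hostname" is stored as one exact token, so
# no analysis-time token data is written for it.
log_mapping = {
    "mappings": {
        "logline": {
            "properties": {
                "message":  {"type": "string", "index": "analyzed"},
                "hostname": {"type": "string", "index": "not_analyzed"},
            }
        }
    }
}

print(json.dumps(log_mapping, indent=2))
```

This body would be sent when creating the index (via the REST API or a client library); comparing index size with and without analysis on a sample of your own logs is the reliable way to see the difference.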
There are several other features that can be configured to optimise your disk space. Here's a quick rundown:
1. The _all field concatenates the values of every field in a document. If each field is indexed in _all as well as in its own field, there is additional storage overhead, so disabling _all where it is not needed saves space. For more information, see: [http://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html].
2. Doc values can reduce memory usage, but enabling them causes additional on-disk data structures to be created at index time, which can lead to larger index files. More information is available here: [http://www.elastic.co/guide/en/elasticsearch/guide/current/doc-values.html].
3. Replication - Elasticsearch is a distributed system that works under the assumption that your hardware will fail, and it ensures resiliency to such failures using replication. Each replica is a full extra copy of the data, so this has a direct impact on your disk storage; when you are comparing hosted ELK solutions, consider whether replication is factored into the quoted storage.
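Pulling the three settings above together, here is a hedged sketch (again in Elasticsearch 1.x syntax, with hypothetical index and field names) of an index definition that disables _all, enables doc values on an exact-value field, and sets the replica count explicitly:

```python
import json

# Hypothetical Elasticsearch 1.x index-creation body combining the
# three disk-relevant settings discussed above.
index_body = {
    "settings": {
        # Each replica stores a complete extra copy of the primary data,
        # so this setting multiplies on-disk usage directly.
        "number_of_replicas": 1
    },
    "mappings": {
        "logline": {
            # Drop the catch-all _all field if queries always name fields.
            "_all": {"enabled": False},
            "properties": {
                "status": {
                    "type": "string",
                    "index": "not_analyzed",
                    # Doc values move fielddata from heap to disk:
                    # less memory pressure, somewhat larger index files.
                    "doc_values": True,
                }
            }
        }
    }
}

print(json.dumps(index_body, indent=2))
```

None of these knobs is free: each trades disk, memory, or resiliency against the others, which is why representative testing, rather than a formula, is the honest basis for a sizing estimate.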
If you enjoyed this post on how many servers it takes to run Elasticsearch then why not check out our blog on why centralised logging is so important, or the rest of our centralised logging blog posts.