If you are fairly new to centralised logging with Elasticsearch, Logstash and Kibana (ELK), you may be uncertain about whether you need to account for old indices in the translog.

If you did need to account for it, it would be a real concern: it could affect current indices and significantly increase the storage space the overall system requires. Fortunately, there are several steps you can take to minimise the storage space needed, and the translog shouldn't be a problem.

Tweaking the settings to improve efficiency

In general, the translog is retained for a short while to enable faster, operation-based recovery using sequence numbers. You can fine-tune the retention settings (maximum age and size) to suit your needs, but once the retention age is reached, the translog should cease affecting those indices. This is a simple step that can make a significant difference.
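As an illustration, here is a minimal Python sketch of tightening those retention settings over the REST API. It assumes an Elasticsearch version (6.x or early 7.x) where index.translog.retention.age and index.translog.retention.size still apply; the endpoint URL and index pattern are placeholders for your own cluster.

```python
import requests

ES = "http://localhost:9200"      # assumed Elasticsearch endpoint
INDEX = "logstash-2019.01.*"      # hypothetical index pattern

# Limit how long (and how large) the per-shard translog is retained.
# Note: these settings are deprecated from 7.4 and removed in 8.0,
# where soft deletes handle operation-based recovery instead.
settings = {
    "index": {
        "translog": {
            "retention": {
                "age": "12h",     # keep the translog for at most 12 hours...
                "size": "256mb"   # ...or until it reaches 256 MB per shard
            }
        }
    }
}

resp = requests.put(f"{ES}/{INDEX}/_settings", json=settings)
resp.raise_for_status()
print(resp.json())
```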

To minimise the required storage space, you can optimise your mappings and index settings. There is plenty of guidance available on how to do this for your platform, and much of it applies as generic good practice no matter what software you are using. This is another relatively simple step that can make all the difference.
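For example, here is a rough sketch of creating an index with a storage-conscious mapping, assuming Elasticsearch 7.x or later (no mapping types) and a local cluster; the index and field names are purely illustrative, not a prescription for your data.

```python
import requests

ES = "http://localhost:9200"   # assumed Elasticsearch endpoint

body = {
    "mappings": {
        "properties": {
            # Fields you only filter or aggregate on: keyword, not analysed text.
            "host":  {"type": "keyword"},
            "level": {"type": "keyword"},
            # A large payload kept for display but never searched or aggregated:
            # skip both the inverted index and doc values to save space.
            "raw_payload": {"type": "keyword", "index": False, "doc_values": False},
            # Free text that genuinely needs full-text search stays analysed.
            "message": {"type": "text"}
        }
    }
}

resp = requests.put(f"{ES}/logs-example", json=body)
resp.raise_for_status()
```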

A few steps you can take:

  • It is good practice to keep the average shard size fairly large, as larger shards usually compress better.
  • You can run the force merge API on indices that are no longer being written to, which reduces the number of segments at play and saves a little extra space (see the sketch after this list).
  • You can enable "best_compression" for indices, which should reduce the overall on-disk size of your index (also shown in the sketch below).
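Below is a rough Python sketch combining the last two points: switching an old index that is no longer being written to over to best_compression (a static setting, so the index has to be closed first), then force merging it so the segments are rewritten with the new codec. The endpoint and index name are assumptions.

```python
import requests

ES = "http://localhost:9200"       # assumed Elasticsearch endpoint
INDEX = "logstash-2019.01.01"      # hypothetical index that is no longer indexed into

# index.codec is a static setting, so close the index before changing it.
requests.post(f"{ES}/{INDEX}/_close").raise_for_status()
requests.put(
    f"{ES}/{INDEX}/_settings",
    json={"index": {"codec": "best_compression"}},
).raise_for_status()
requests.post(f"{ES}/{INDEX}/_open").raise_for_status()

# Force merge down to a single segment: merging rewrites the segments
# with the new codec and reclaims space from deleted documents.
resp = requests.post(f"{ES}/{INDEX}/_forcemerge", params={"max_num_segments": 1})
resp.raise_for_status()
print(resp.json())
```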

ZFS compression

One seemingly obvious step is to enable ZFS compression to save space. But if you take a closer look at where those savings come from, most of them relate to the translog, which shouldn't exist for older indices anyway. If you follow the steps outlined above, there is little to gain from using ZFS with compression.