How to increase the MDS Cache Memory Limit on your Ceph cluster on the fly

In the following I will show you how to increase the MDS Cache Memory Limit (mds_cache_memory_limit) on a Ceph cluster (Mimic 13.2.6) without downtime.

Published on: May 11, 2020 by Website Admin

A Ceph cluster (at least in the Mimic version) comes set up by default with an MDS cache memory limit (mds_cache_memory_limit) of 1G. That is not enough if you are running heavy-load clients with CephFS, and you will soon start to get warnings like "client X is failing to respond to cache pressure".

How do I know that a Ceph cluster comes with an mds_cache_memory_limit of 1G, you ask? Well, I ran the following command on a Ceph MDS server:

ceph daemon mds.<<your_ceph_mds_server_name>> config get mds_cache_memory_limit
... and you should get the following output:
{
    "mds_cache_memory_limit": "1073741824"
}
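
The value is reported in bytes, which is not the easiest thing to read. As a minimal sketch (the helper name limit_in_gib is my own invention, not part of Ceph), the output can be piped through a small shell function that pulls out the number and converts it to GiB:

```shell
# Hypothetical helper: reads the JSON printed by "config get" on stdin
# and prints mds_cache_memory_limit in whole GiB (rounded down).
limit_in_gib() {
    bytes=$(sed -n 's/.*"mds_cache_memory_limit": *"\([0-9][0-9]*\)".*/\1/p')
    # Bail out if the key was not found in the input.
    [ -n "$bytes" ] || return 1
    echo $(( bytes / 1073741824 ))
}

# Usage on an MDS server (needs access to the daemon's admin socket):
#   ceph daemon mds.<<your_ceph_mds_server_name>> config get mds_cache_memory_limit | limit_in_gib
```

With the default output shown above this should print 1.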

Now, the important part...

Always perform modifications on a standby MDS server. Do not perform modifications on an active server because (from my experience) the MDS will get stuck for some time or restart. At least this happened to me on my Mimic 13.2.6 Ceph cluster.

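The command to increase the MDS Cache Memory Limit on your Ceph cluster is below. Note that the value is in bytes, not kilobytes: 1073741824 bytes is 1G, so the 68719476736 I use here is actually 64G, and 6G would be 6442450944 (if you want more, do some calculations 😛):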

ceph daemon mds.<<your_ceph_mds_server_name>> config set mds_cache_memory_limit 68719476736
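
Since the limit is given in bytes, it is easy to slip by a digit when typing it. A tiny helper can do the arithmetic for you; this is only a sketch, and gib_to_bytes is a made-up name, not a Ceph command:

```shell
# Hypothetical helper: converts a size in GiB to bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 6     # prints 6442450944
gib_to_bytes 64    # prints 68719476736

# The result can be passed straight to the command, for example:
#   ceph daemon mds.<<your_ceph_mds_server_name>> config set mds_cache_memory_limit "$(gib_to_bytes 6)"
```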

Do the above modification on all your standby MDS servers. I truly hope you have more than one MDS server on your cluster; otherwise, you are screwed. Or, you can read this and quickly deploy extra MDS servers and it's all good 😊.

Now stop the active MDS server with systemctl (I am running my cluster on Ubuntu 16.04) and watch how one of your standby MDS servers becomes active.
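
For reference, here is a rough sketch of that step. I am assuming the stock systemd unit name ceph-mds@<id> that the Ceph packages install, and that the MDS id matches your server name; check with systemctl list-units 'ceph-mds@*' if you are unsure. The function (its name is made up) only prints the commands, so you can review them before running anything:

```shell
# Hypothetical helper: prints the failover commands for the given MDS id
# instead of running them, so nothing gets stopped by accident.
mds_failover_plan() {
    mds_id=$1
    echo "ceph mds stat"                        # note which MDS is active before you start
    echo "systemctl stop ceph-mds@${mds_id}"    # run this one on the active MDS server
    echo "ceph mds stat"                        # repeat until a standby shows up as active
}

mds_failover_plan "mds01"   # "mds01" is a placeholder id
```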

Remember to perform the above modification on the ex-active MDS server after you start it again, and that's it.

Oh, one more thing... If you reboot the servers, the default value (1G mds_cache_memory_limit) will be restored and every modification you made will be lost. To prevent that from happening, add the following config to the /etc/ceph/ceph.conf file:

[mds.<<your_ceph_mds_server_name>>]
mds_cache_memory_limit = 68719476736
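
If you have several MDS servers, typing that stanza by hand for each one gets tedious. A small sketch that prints it for a given server name and a size in GiB; the helper name and the example server name "mds01" are made up for illustration:

```shell
# Hypothetical helper: prints a ceph.conf stanza for one MDS server.
#   $1 = MDS server name, $2 = cache limit in GiB
mds_conf_stanza() {
    name=$1
    gib=$2
    printf '[mds.%s]\nmds_cache_memory_limit = %s\n' "$name" "$(( gib * 1024 * 1024 * 1024 ))"
}

mds_conf_stanza mds01 6
# [mds.mds01]
# mds_cache_memory_limit = 6442450944
```

Review the output and add it to /etc/ceph/ceph.conf yourself rather than appending it blindly.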

That's it!