Saturday, August 18

MongoDB Behaviour with NFS Storage

Just wanted to share a finding I came across during MongoDB startup with NFS storage.

SCENARIO:

One of the secondaries went out of sync (it fell outside the oplog window on the primary side). We copied the latest good snapshot backup of the primary over to the secondary.
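The reseed step above can be sketched as follows. This is only an illustration under assumed paths: DBPATH and SNAPSHOT are hypothetical locations, not from the original incident, and mongod must already be stopped on the secondary.

```shell
# Reseed sketch (paths are hypothetical; adjust to your deployment, and
# make sure mongod is stopped on the secondary first).
DBPATH=${DBPATH:-/u01/mongodb/data/lmrevents/27029}
SNAPSHOT=${SNAPSHOT:-/backup/primary-snapshot-latest}

if [ -d "$DBPATH" ] && [ -d "$SNAPSHOT" ]; then
    rm -rf "${DBPATH:?}"/*           # clear the stale data files
    cp -a "$SNAPSHOT"/. "$DBPATH"/   # drop in the snapshot backup
fi
```

After the copy, mongod is started with the usual config file; the secondary can replay the remaining delta from the primary only while the snapshot is still inside the primary's oplog window.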

When we tried to start the instance on the secondary side, we encountered the error below:

ERROR:

2018-07-17T14:27:23.226Z I STORAGE [initandlisten] The size storer reports that the oplog contains 56474469 records totaling to 21717749153 bytes

2018-07-17T14:27:23.230Z I STORAGE [initandlisten] Sampling from the oplog between Jul 17 06:12:35:1cf7 and Jul 17 13:14:06:1313 to determine where to place markers for truncation

2018-07-17T14:27:23.230Z I STORAGE [initandlisten] Taking 1011 samples and assuming that each section of oplog contains approximately 558429 records totaling to 214748737 bytes

2018-07-17T14:27:23.769Z I STORAGE [initandlisten] Failed to get enough random samples, falling back to scanning the oplog

INDICATIONS:

Whenever the instance is started from the command line with /u01/mongodb/3.2/bin/mongod -f /u01/mongodb/3.2/etc/lmrevents/27029/mongod.conf &, it gets stuck at the message “about to fork child process, waiting until server is ready for connections”.

It never prints the message “child process started successfully, parent exiting”. If you run ps against the process while it is stuck, you will see three processes running (the parent process along with two child processes), and after a long time (15 minutes in our case) it finally fails with the message “ERROR: child process failed, exited with error number 51”.



mongodb 21038 19297 0 16:42 pts/0 00:00:00 /u01/mongodb/3.2/bin/mongod -f /u01/mongodb/3.2/etc/lmrevents/27029/mongod.conf --- PARENT PROCESS

mongodb 21039 21038 0 16:42 ? 00:00:00 /u01/mongodb/3.2/bin/mongod -f /u01/mongodb/3.2/etc/lmrevents/27029/mongod.conf --- CHILD PROCESS

mongodb 21040 21039 2 16:42 ? 00:00:00 /u01/mongodb/3.2/bin/mongod -f /u01/mongodb/3.2/etc/lmrevents/27029/mongod.conf --- CHILD PROCESS

WHY AND HOW IT IS HAPPENING:

In version 3.1.8 and later, MongoDB added a new feature: it keeps "milestones" against the oplog so that old records can be removed efficiently with WT_SESSION::truncate() when the collection grows beyond its desired maximum size.

On startup, WiredTiger now needs to read values from the oplog to figure out where to place these milestones. This works fine when all of the data is stored locally on disk. The problem arises when you are using external storage (for example Amazon EBS or network file storage such as NFS; for us it was NFS): the sampling and truncation of the oplog never completes.

In the case of EBS: sampling from a cold EBS volume takes somewhere between 20 and 30 minutes, leading to incredibly slow startup times or to the error above.

In the case of NFS: network latency is the culprit, and the same error is encountered.
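One mitigation we can sketch for the cold-read penalty is to pre-read the WiredTiger data files into the OS page cache before starting mongod, so the oplog sampling hits warm pages instead of cold NFS/EBS reads. The dbPath below is a hypothetical example, and which collection-*.wt file backs the oplog varies by deployment.

```shell
# Pre-warm sketch (assumption: dbPath is /u01/mongodb/data/lmrevents/27029;
# adjust to your mongod.conf storage.dbPath).
DBPATH=${DBPATH:-/u01/mongodb/data/lmrevents/27029}

# Read every WiredTiger collection file into the page cache; on NFS this
# moves the slow network reads to before startup instead of during it.
for f in "$DBPATH"/collection-*.wt; do
    [ -e "$f" ] && dd if="$f" of=/dev/null bs=1M status=none
done
```

This does not remove the sampling work, it only front-loads the I/O; on a very large oplog the pre-read itself can still take a long time.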

HOW TO RESOLVE THIS:

MongoDB took this up as a feature request and shipped it in version 3.4.

Feature request: make WiredTigerKVEngine::initRsOplogBackgroundThread() skip starting the sampling thread, and thus skip sampling the oplog, when the undocumented --queryableBackupMode option is specified.
So up to version 3.2, you have to wait until there is no network latency (or until the EBS volume is warm) before starting the instance again, or move off network storage entirely.
Using remote storage for MongoDB is highly discouraged.
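On 3.4 and later, the invocation implied by that feature request could look like the sketch below. The 3.4 binary path is an assumption (only 3.2 paths appear in this post), and note that --queryableBackupMode also makes the instance read-only, so this fits verifying a restored backup rather than a normal secondary restart.

```shell
# 3.4+ sketch: start without the oplog-sampling phase.
# MONGOD path is hypothetical; the config path matches this post.
MONGOD=${MONGOD:-/u01/mongodb/3.4/bin/mongod}
CONF=/u01/mongodb/3.2/etc/lmrevents/27029/mongod.conf

if [ -x "$MONGOD" ]; then
    "$MONGOD" -f "$CONF" --queryableBackupMode
else
    echo "mongod 3.4 binary not found at $MONGOD"
fi
```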

References : SERVER-23935