Over the past few months, several high-traffic services have gone live at Gilt using MongoDB as a back end. MongoDB collections are a really good fit for many of our use cases. What we’ve found is that when MongoDB is happy it runs fast and reliably, but there are a few things that can catch you out.
Throttle your writes/updates
MongoDB uses a global write lock. Only one operation can write at a time, and while a write is in progress nothing can read (reads queue up behind it). A very high write load will make read operations very slow.
If you need to run data migrations or batch jobs of any kind against your DB, make sure the writes happen at a rate below 50 per second if you are not using SSDs (you can go higher with SSDs). Usually, you’ll also need some way of prioritizing real-time updates while the job is running.
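One way to cap the write rate of a batch job is to space the writes out in the client. The following is a minimal sketch, not a definitive implementation: `throttled_apply` and its parameters are hypothetical names, and `write_one` stands in for whatever call your driver uses to perform a single write. The clock and sleep functions are passed in only so the pacing logic can be exercised without waiting.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def throttled_apply(
    items: Iterable[T],
    write_one: Callable[[T], None],
    max_per_second: float = 50.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call write_one(item) for each item, starting at most max_per_second writes per second."""
    if max_per_second <= 0:
        raise ValueError("max_per_second must be positive")
    interval = 1.0 / max_per_second  # minimum gap between the start of two writes
    next_allowed = clock()
    written = 0
    for item in items:
        delay = next_allowed - clock()
        if delay > 0:
            sleep(delay)  # leave room for real-time traffic
        started = clock()
        write_one(item)  # assumption: performs exactly one write against the DB
        written += 1
        next_allowed = started + interval
    return written
```

A migration would then call something like `throttled_apply(records, save_record, max_per_second=40)`, where `save_record` is your own function. This only limits the batch job itself; it does not prioritize real-time updates, which still needs separate handling.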
If you expect a high write load during normal operation, you need to think about sharding. Good capacity planning and load testing prior to going live are important.
MongoDB write performance is limited by disk performance, so if you expect a lot of writes, SSDs are a must.
While performance testing your service, make sure to include a write load as well as a read-only load, so that the test simulates how your service will perform in production.
Manage the length of field names
MongoDB stores the field names as part of each record. If you pick very long names for your fields, your data will be larger than it needs to be. MongoDB only performs well if it can keep the working data in memory, so keep field names as short as is reasonable.
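To get a feel for the numbers, here is a rough back-of-the-envelope sketch. It assumes the raw BSON encoding, where each field name is stored as its UTF-8 bytes plus one terminating NUL byte in every document. The function name, the field names and the document count are made up for illustration.

```python
from typing import Iterable


def field_name_bytes(field_names: Iterable[str], doc_count: int) -> int:
    """Bytes spent on field names alone across doc_count documents.

    Assumes every document carries every field, and that each name costs
    its UTF-8 length plus one NUL terminator (the BSON element name format).
    """
    per_doc = sum(len(name.encode("utf-8")) + 1 for name in field_names)
    return per_doc * doc_count


# Hypothetical schema, before and after shortening the names.
long_names = ["customer_email_address", "last_purchase_timestamp", "shipping_postal_code"]
short_names = ["email", "last_buy", "zip"]
docs = 100_000_000

saved = field_name_bytes(long_names, docs) - field_name_bytes(short_names, docs)
print(f"bytes saved by shorter names: {saved:,}")
```

With these made-up names the difference is 49 bytes per document, which adds up to roughly 4.9 GB over 100 million documents. Whether that matters depends on how close your working set is to the available memory.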
Use the right indexes
For any substantial dataset, query performance will suck if you don’t use the right indexes (I have seen it make a difference of about four orders of magnitude in response time).
More info here: http://www.mongodb.org/display/DOCS/Indexes
Indexed data doesn’t have to be unique, and compound indexes are supported.
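As an illustration of a compound, non-unique index, here is a small sketch using PyMongo-style calls. The `brand` and `price` fields and the function name are hypothetical. The collection is passed in as a parameter so the snippet does not need a running server, and it is meant to be run as a one-off admin step, not at service start-up.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING


def build_brand_price_index(collection):
    """Create a compound, non-unique index on (brand ascending, price descending).

    `collection` is expected to behave like a pymongo Collection;
    create_index returns the name of the index.
    """
    keys = [("brand", ASCENDING), ("price", DESCENDING)]
    return collection.create_index(keys, unique=False)
```

A query that filters on `brand` and sorts by `price` descending can then be served from this index. Many documents may share the same `brand` and `price` values, since the index is not unique.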
Use the _id field wisely
MongoDB requires this field and always creates an index on it, so where possible store something useful in it, such as a natural key that you already use to look records up.
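For example, if every record already has a unique identifier of its own, that identifier can go straight into _id. A minimal sketch, assuming a hypothetical product record keyed by SKU:

```python
def product_doc(sku: str, name: str, price_cents: int) -> dict:
    """Build a document that uses the SKU as _id instead of a generated ObjectId.

    _id must be unique within the collection and cannot be changed later,
    so only use a key that is stable for the lifetime of the record.
    """
    if not sku:
        raise ValueError("sku must be non-empty")
    return {"_id": sku, "name": name, "price_cents": price_cents}


def by_sku(sku: str) -> dict:
    """Query filter that is served by the index MongoDB always keeps on _id."""
    return {"_id": sku}
```

Lookups by SKU then use the built-in _id index, and no second index on a separate `sku` field is needed.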
Think about how your records will grow over time
If your record grows in size, MongoDB will have to move it. This is the most expensive thing you can do in MongoDB, because it means two writes, and then all indexes need to be updated as well (more writes). If your records are likely to grow, it’s worth creating them with currently unused attributes already present in the documents (set to whitespace or similar), so that later updates won’t always lead to document moves. This needs to be balanced against unnecessary bloat in your document size, though.
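The placeholder approach described above could look roughly like this. It is a sketch only: `preallocate` is a hypothetical helper, and the placeholder field names and lengths are things you would have to estimate for your own data.

```python
from typing import Dict


def preallocate(doc: dict, placeholders: Dict[str, int]) -> dict:
    """Return a copy of doc with not-yet-used fields filled with spaces.

    placeholders maps a field name to the length (in characters) the field
    is expected to reach eventually. Fields already present are left alone.
    """
    padded = dict(doc)
    for field, expected_len in placeholders.items():
        padded.setdefault(field, " " * expected_len)
    return padded


# Hypothetical order record that will later receive a tracking code and notes.
order = preallocate({"_id": "order-1", "status": "new"},
                    {"tracking_code": 20, "notes": 200})
```

The document is inserted at roughly its eventual size, so later updates that fill in `tracking_code` and `notes` are less likely to outgrow the space it was given. Readers of the data need to treat an all-whitespace value as "not set".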
Don’t use ensureIndex in code
Building an index is a blocking operation for MongoDB, and it won’t service requests while the index build is in progress. If you add a new index via code, the index will get created when you deploy your service, which is probably not what you want (essentially, you’ll end up bringing your own service down for however long it takes to build the index).
For production instances, temporarily remove the instance from the replica set, build the index on it, and then re-add it. Repeat this process for every member of the replica set, one member at a time.
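The rolling procedure can be sketched as a simple loop. The three callables are hypothetical hooks for whatever tooling you use to take a member out of the replica set, build the index on it, and put it back. The sketch only captures the ordering, with one member out at a time.

```python
from typing import Callable, Iterable, List


def rolling_index_build(
    members: Iterable[str],
    take_out: Callable[[str], None],
    build_index: Callable[[str], None],
    put_back: Callable[[str], None],
) -> List[str]:
    """Build an index on each replica set member in turn.

    A member is always put back before the next one is taken out,
    even if the index build on it raises an exception.
    """
    done = []
    for member in members:
        take_out(member)
        try:
            build_index(member)
        finally:
            put_back(member)
        done.append(member)
    return done
```

If a build fails, the exception propagates after the member has been re-added, so the remaining members are left untouched.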
Don’t keep unnecessary indexes around
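Every index has to be updated when a record is written or moved, so each index you keep adds to the cost of your writes. Drop the ones your queries no longer use.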