"Should I use MongoDB or CouchDB (or Redis)?"

I see this question asked a lot online. Fortunately, once you get familiar enough with each of the NoSQL solutions and their ins and outs, strengths and weaknesses, it becomes much clearer when you would use one over the other.

From the outside, so many of them look the same, especially Mongo and Couch. Below I will try and break down the big tie-breakers that will help you decide between them.

[**] Querying - If you need the ability to dynamically query your data like SQL, MongoDB provides a query syntax that will feel very familiar to you. CouchDB is getting a query language in the form of UNQL in the next year or so, but it is very much under development and its impact on view generation and query speed is still unknown, so I cannot recommend it yet.
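To give a feel for what that dynamic querying looks like, here is a minimal pure-Python sketch of matching documents against a Mongo-style query document. The field names and the tiny matcher are made up for illustration; the real driver (e.g. pymongo) sends the query document to the server rather than evaluating it client-side like this.

```python
def matches(doc, query):
    """Check a document against a Mongo-style query dict.

    Supports exact matches plus the $gt / $lt comparison operators,
    which is enough to show the flavor of the syntax.
    """
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):  # operator form, e.g. {"$gt": 21}
            if "$gt" in cond and not (value is not None and value > cond["$gt"]):
                return False
            if "$lt" in cond and not (value is not None and value < cond["$lt"]):
                return False
        elif value != cond:  # plain equality, e.g. {"name": "ann"}
            return False
    return True

users = [
    {"name": "ann", "age": 34},
    {"name": "bob", "age": 19},
]

# Equivalent in spirit to: db.users.find({"age": {"$gt": 21}})
adults = [u for u in users if matches(u, {"age": {"$gt": 21}})]
```

Coming from SQL, the query document reads much like a WHERE clause, which is why Mongo tends to feel familiar so quickly.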

[**] Master-Slave Replication ONLY - MongoDB provides (great) support for master-slave replication across the members of what they call a "replica set". Unfortunately you can only write to the master in the set and read from all.

If you need multiple masters in a Mongo environment, you have to set up sharding in addition to replica sets, and each shard will be its own replica set with the ability to write to the master in each set. Unfortunately this leads to much more complex setups, and you cannot have every server hold a full copy of the data set (which can be handy/critical for some geographically dispersed systems, like a CDN or DNS service).

[**] Read Performance - Mongo employs a custom binary protocol (and format) providing at least an order of magnitude faster reads than CouchDB at the moment. There is work in the CouchDB community to add binary format support in addition to JSON, but it will still be communicated over HTTP.

[**] Speed-oriented operations - Mongo provides things like upserts and update-in-place mechanics in the database.

[**] Master-Master Replication - Because of the append-only style of commits Couch uses, every modification to the DB is considered a revision, making conflicts during replication much less likely and allowing for some awesome master-master replication, similar to what Cassandra calls a "ring" of servers all bi-directionally replicating to each other. It can even look more like a fully connected graph of replication rules.

[**] Reliability of the actual data store backing the DB. Because CouchDB records any changes as a "revision" to a document and appends them to the DB file on disk, the file can be copied or snapshotted at any time even while the DB is running and you don't have to worry about corruption. It is a really resilient method of storage.

[**] Replication also supports filtering or selective replication by way of filters that live inside the receiving server and help it decide if it wants a doc or not from another server's changes stream (very cool).

Using an EC2 deployment as an example, you can have a US-WEST DB replicate to US-EAST, but only replicate items that meet some criteria, like the most-read stories of the day, so your east-coast mirror is just a cache of the most important stories that are likely getting hit from that region, and you leave the long-tail (less popular) stories on the west coast server only.

Another example of this: say you use CouchDB to store data about your Android game in the app store that everyone around the world plays. Say you have 5 million registered users, but only 100k of them play regularly, and you want to duplicate the most active accounts to 10 other servers around the globe, so the users that play all the time get really fast response times when they log in and update scores. You could have a big CouchDB setup on the west coast, and then much smaller/cheaper ones spread out across the world on a few disparate VPS servers that all use a filtered replication from your west coast master to only duplicate the most active players.
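To make the filtered-replication idea concrete, here is a hypothetical sketch of the role a filter plays against a changes feed. Note the assumptions: real CouchDB filters are JavaScript functions stored in a design document, and the doc fields and activity threshold here are invented for the example.

```python
def active_player_filter(doc):
    """Replication filter: only ship docs for highly active players.

    Stands in for a CouchDB filter function; the "games_this_month"
    field and the threshold of 20 are made up for illustration.
    """
    return doc.get("type") == "player" and doc.get("games_this_month", 0) >= 20

# A toy _changes feed from the west coast master
changes_feed = [
    {"_id": "p1", "type": "player", "games_this_month": 42},
    {"_id": "p2", "type": "player", "games_this_month": 1},
    {"_id": "s1", "type": "settings"},
]

# Only the docs passing the filter would be replicated to the regional mirror
replicated = [doc for doc in changes_feed if active_player_filter(doc)]
```

The regional VPS ends up holding only the hot subset while the master keeps everything, which is exactly the long-tail split described above.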

[**] Mobile platform support. CouchDB actually has installs for iOS and Android. When you combine the ability to run Couch on your mobile devices AND have it bidirectionally sync back to a master DB when it is online and just hold the results when it is offline, it is an awesome combination.

[**] Queries are written using map-reduce functions. If you are coming from SQL this is a really odd paradigm at first, but it will click relatively quickly, and when it does you'll see some beauty to it. These are some of the best slides describing the map-reduce functionality in Couch I've read:
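If the slides aren't handy, here is a rough pure-Python analogue of what a Couch view does: a map function emits key/value pairs per document, and a reduce function collapses each key's values. In real CouchDB these are JavaScript functions stored in a design document; the doc shapes below are invented.

```python
from collections import defaultdict

def map_fn(doc):
    # Like emit(doc.author, 1) in a CouchDB JavaScript map function:
    # one (key, value) pair per blog post.
    if doc.get("type") == "post":
        yield doc["author"], 1

def reduce_fn(values):
    # Analogue of CouchDB's built-in _sum reduce
    return sum(values)

docs = [
    {"type": "post", "author": "ann"},
    {"type": "post", "author": "ann"},
    {"type": "post", "author": "bob"},
    {"type": "user", "name": "ann"},
]

# Build the view: group emitted values by key, then reduce each group.
groups = defaultdict(list)
for doc in docs:
    for key, value in map_fn(doc):
        groups[key].append(value)
view = {key: reduce_fn(vals) for key, vals in groups.items()}
```

The mental shift from SQL is that the view (posts per author, here) is computed ahead of time and queried by key, rather than scanned at query time.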

[**] Every mutation to the data in the database is considered a "revision" and creates a duplicate of the doc. This is excellent for redundancy and conflict resolution but makes the data store bigger on disk. Compaction is what removes these old revisions.
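A toy sketch of the revision/compaction idea follows. This is greatly simplified: real CouchDB revisions are MVCC tokens like "2-abc123", not plain integers, and compaction rewrites the on-disk file rather than a Python list.

```python
class RevisionStore:
    """Toy append-only store: every update appends a new revision."""

    def __init__(self):
        self.log = []  # (doc_id, rev, body), appended in order, never mutated

    def put(self, doc_id, body):
        rev = sum(1 for d, _, _ in self.log if d == doc_id) + 1
        self.log.append((doc_id, rev, body))

    def get(self, doc_id):
        # The latest revision wins
        for d, rev, body in reversed(self.log):
            if d == doc_id:
                return rev, body

    def compact(self):
        # Drop all but the newest revision of each doc
        latest = {}
        for d, rev, body in self.log:
            latest[d] = (d, rev, body)
        self.log = list(latest.values())

store = RevisionStore()
store.put("doc1", {"score": 1})
store.put("doc1", {"score": 2})
revisions_before = len(store.log)  # both revisions on "disk"
store.compact()
revisions_after = len(store.log)   # only the latest survives
```

Because writes only ever append, a crash mid-write can at worst lose the tail of the file, never corrupt earlier revisions, which is the resilience described above.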

[**] HTTP REST JSON interaction only. No binary protocol (yet - vote for support), everything is callable and testable from the command line with curl or your browser. Very easy to work with.

These are the biggest features. I would say if you need dynamic query support or raw speed is critical, then go with Mongo. If you need master-master replication or sexy client-server replication for a mobile app that syncs with a master server every time it comes online, then you have to pick Couch. At the root, these are the sort of "deal maker/breaker" divides.

Fortunately, once you have your requirements well defined, selecting the right NoSQL data store becomes really easy. If your data set isn't that stringent in its requirements, then you have to look at secondary features of the data stores to see if one speaks to you... for example, do you like that CouchDB only speaks in raw JSON and HTTP, which makes it trivial to test/interact with from the command line? Maybe you hate the idea of compaction or map-reduce; OK, then use Mongo. Maybe you fundamentally like the design of one over the other, etc.

If anyone has questions about Redis or others, let me know and I'll expand on how those fit into this picture.

You hear Redis described a lot as a "data structure" server, and if you are like me, that meant nothing for the longest time until I sat down and actually used Redis; then it clicked.

If you have a problem you are trying to solve that would be solved really elegantly with a List, Set, Hash or Sorted Set, you need to take a look at Redis. The query model is simple, but there are a ton of operations on each of the data structure types that allow you to use them in really robust ways... like checking the union between two sets, or getting the members of a hash, or using the Redis server itself as a pub/sub traffic router (yes, it supports that!).
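A quick illustration of the "data structure server" idea, using plain Python structures as stand-ins for Redis commands. The key names are invented; against a live server with redis-py you would call r.sadd, r.sunion, r.sinter and r.hgetall instead.

```python
# Stand-ins for: SADD followers:a ann bob cara / SADD followers:b bob dave
followers_a = {"ann", "bob", "cara"}
followers_b = {"bob", "dave"}

everyone = followers_a | followers_b  # SUNION followers:a followers:b
overlap = followers_a & followers_b   # SINTER followers:a followers:b

# Stand-in for: HSET user:ann name ann score 120 / HKEYS user:ann
user = {"name": "ann", "score": "120"}
fields = list(user.keys())
```

The point is that Redis gives you these operations server-side, atomically, over the wire, so the data structure itself is the query model.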

Redis is not a DB in the classic sense. If you are trying to decide between MySQL and MongoDB because of the dynamic query model, Redis is not the right choice. In order to map your data to the simple name/value structure in Redis and the simplified query approach, you are going to spend a significant amount of time trying to mentally model that data inside of Redis, which typically requires a lot of denormalization.

If you are trying to deploy a robust caching solution for your app and memcache is too simple, Redis is probably perfect.

If you are working on a queuing system or a messaging system... chances are Redis is probably perfect.

If you have a jobs server that grinds through prioritized jobs and you have any number of nodes in your network constantly submitting work to the job server along with a priority, look no further, Redis is the perfect fit. Not only can you use simple Sorted Sets for this, but the performance will be fantastic (binary protocol) AND you get the added win of the database-esque features Redis has, like append-only logging and flushing changes to disk, as well as replication if you ever grow your jobs server beyond a single node.
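Here is a hedged sketch of that jobs-server pattern with Sorted Set semantics, using Python's heapq as a stand-in for ZADD/ZPOPMIN (the job names and priorities are invented; against a real server you would use redis-py's zadd and zpopmin).

```python
import heapq
import itertools

class JobQueue:
    """Priority job queue modeled on a Redis Sorted Set.

    ZADD jobs <priority> <job> to submit work; ZPOPMIN jobs to take the
    most urgent job (lowest score first). The counter keeps submission
    order stable when two jobs share a priority.
    """

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, priority, job):  # like ZADD
        heapq.heappush(self._heap, (priority, next(self._counter), job))

    def take(self):  # like ZPOPMIN
        priority, _, job = heapq.heappop(self._heap)
        return priority, job

q = JobQueue()
q.submit(5, "rebuild-index")
q.submit(1, "charge-card")
q.submit(3, "send-email")
```

Any number of submitting nodes can ZADD concurrently because Redis executes each command atomically; the worker just pops the lowest score.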

NOTE: Redis clustering has been in the works for a long time now. Salvatore has been doing some awesome work with it and it is getting close to launch, so if you have a particularly huge workload to distribute and want a robust/fast processing cluster based on Redis, you should be able to set that up soon.

Redis has some of these more classic database features, but it is not targeted at competing with Mongo or Couch or MySQL. It really is a robust, in-memory data structure server AND if you happen to need very well defined data structures to solve your problem, then Redis is the tool you want. If your data is inherently "document" like in nature, sticking it in a document store like Mongo or CouchDB may just make a lot more mental-model sense for you.

NOTE: I wouldn't underestimate how important mentally understanding your data model is. Look at Redis, and if you are having a hard time mapping your data to a List, Set or Hash, you probably shouldn't use it. Also note that you WILL use many more data structures than you might realize, and it may feel odd... for example, a Twitter app may model a single user as a hash, and then that user might have a list of all the people they follow and another list of all the people who follow them -- you need this duplication to make querying in either direction fast. This was one of the hardest and most unnatural things I experienced when trying to solve more "classic DB" problems with Redis, and it is what helped me decide when to use it and when not to.
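The duplicated follow/follower structure looks something like this, with plain Python sets standing in for per-user Redis Sets (usernames invented):

```python
# One set per user in each direction, Redis-style:
#   SADD following:<user> <whom>  and  SADD followers:<whom> <user>
following = {}
followers = {}

def follow(who, whom):
    following.setdefault(who, set()).add(whom)
    # The duplicated write: without it, "who follows bob?" would require
    # scanning every user's following set.
    followers.setdefault(whom, set()).add(who)

follow("ann", "bob")
follow("cara", "bob")
```

Both directions are now O(1) lookups, at the cost of keeping two structures in sync on every follow/unfollow, which is exactly the denormalization trade-off described above.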

I would say that Redis complements most systems (whether a classic SQL or NoSQL deployment) wonderfully in most cases, in a caching or queue capacity.

If you are working on a simple application that just needs to store and retrieve simple values really quickly, then Redis is a perfectly valid fit. For example, if you were working on a high-performance algorithmic trading system and you were pulling ticker prices out of a firehose and needing to store them at an insane rate so they could be processed, Redis is exactly the kind of datastore you would want to turn to for that -- definitely not Mongo, Couch or MySQL.

IMPORTANT: Redis does not have a great solution for operating in environments where the data set is bigger than RAM, and as of right now the solutions for "data bigger than RAM" have been abandoned. For the longest time this was one of the gripes with Redis, and Salvatore (antirez) solved this problem with the VM approach. This was quickly deprecated in favor of the diskstore approach after some reported failures and unhappiness with how VM was behaving.

Last I read (a month ago?) was that Salvatore wasn't entirely happy with how diskstore turned out either and that attempts should really be made to keep Redis's data set entirely in memory when possible for the best experience.

I say this not because it is impossible, but just to be aware of the "preferred" way to use Redis so if you have a 100GB data set and were thinking of throwing it on a machine with 4GB of ram, you probably don't want to do that. It may not perform or behave like you want it to.

ADDENDUM: I answered the question about Couch vs Mongo again on Quora with some more tech details if you are interested:
I would love to hear how you see Redis fitting into that picture?
Len, I just added a huge addendum at the bottom for Redis, take a look when you have a chance and let me know if I answered your question.
Use Redis if you can solve your problem using basic data structures and understand that it's simply an awesome key-value store and doesn't belong in a comparison against Mongo nor Couch.
Thanks Riyad. That's a great write-up and positions it nicely for me.
Len, most welcome. Tim is exactly right -- he just summed up my 2k words into 1 sentence :)
Thank you for the kind words.
Don't forget to try OrientDB as well. Very fast (binary protocol and/or HTTP), support for graphs, supports a SQL-derived language, and it's extremely portable. Let's give it a chance ;-)
It depends on the database type used, but on a common notebook you can reach 150k writes/sec on disk using the embedded protocol (Java client with the database engine running on the same JVM). This is the maximum value; depending on your configuration it can be much less. If you run a benchmark I can suggest some tips to optimize performance.
Thanks so much for this. Can I ask would it be appropriate to use Redis as a key value store for CouchDB to improve querying speed on CouchDB? Also how scalable is CouchDB, can it get very big like HBase / BigTable?
Stuart, you are exactly right, that is a perfect use case for Redis, especially if your query model from CouchDB has data like that. You can also get really fancy with Redis by storing the keys with expiration times set on them, so they will naturally expire after an appropriate amount of time, at which point you can re-query CouchDB for the updated versions.
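That expiring-cache pattern, sketched with a toy Python class standing in for Redis SETEX/GET. The key name and TTL are invented, and the injectable `now` parameter exists only to keep the example deterministic; real Redis handles expiry server-side.

```python
import time

class TTLCache:
    """Toy stand-in for Redis SETEX/GET: values expire after ttl seconds."""

    def __init__(self):
        self._data = {}

    def setex(self, key, ttl, value, now=None):
        now = time.time() if now is None else now
        self._data[key] = (value, now + ttl)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if now >= expires:
            # Expired: drop it; the caller re-queries CouchDB and SETEXes again
            del self._data[key]
            return None
        return value

cache = TTLCache()
# SETEX story:top 60 <json from a CouchDB view query>
cache.setex("story:top", 60, "cached-json", now=0)
```

On a cache miss you fall back to the CouchDB view, then SETEX the fresh result, so hot keys stay in memory and stale ones age out on their own.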

CouchDB is as scalable as any of these other popular NoSQL datastores fortunately; both in allowable connections (it scales gracefully to hundreds of connections) and in raw size (TBs of data).

If you check the Couch mailing list, it is common to find people working with some fairly large data sets. I'd also point out that while the CouchDB storage model of append-only/revision-everything can take up more room between compactions, it is inherently a wonderfully safe storage design. You can actually snapshot the DB datastore while the server is running, as it is always consistent. You don't need to worry about locking the DB first because updates are not done in-place.

This also avoids the risk of a bad shutdown or crash corrupting the datastore, which is great for big deployments where keeping a lot of redundant copies (Mongo's model) is less attractive.

It all depends what you need, but I think that configuration of Redis+CouchDB should work really well for you.

NOTE: If you are talking about petabytes worth of data though, I have no idea how any of these solutions perform at that scale only because I haven't seen any data on it.
No, I'm not anticipating anything above terabytes. Although the data set will grow over time, it is the opportunity to scale up easily as it grows that is most interesting about the NoSQL DBs. We also anticipate the structure having to flex as it grows (we may well want to include fields we haven't anticipated yet), so the NoSQL ability to alter the data structure without having to take everything offline is also very cool.
hah, Park, do you have a deployment of Mongo that you could share some information about? What worked well? What didn't work so well?
Hmm, it's absolutely up to you. In my case I chose MongoDB, because I'm familiar with RDBs (Oracle, MSSQL, MySQL) and tried both Cassandra and MongoDB. I think Mongo is better for Node.js :) But as I said, it's up to you.
Park, what did you think of Cassandra? I have a very hard time mentally mapping my data models to the columnar views.
Yeah... indeed... if you feel like that, why don't you try CouchDB to understand NoSQL, which is what I did.
I studied NoSQL with CouchDB, and I'm using MongoDB. Hmm, I will go to Cassandra next~! hehe :)
But that's just my case :)
+Riyad Kalla great writeup as I'm in the process of defining my data stores. Basically I'll be storing data incoming from a firehose and later running statistical analysis on it. I've been told (and it seems confirmed from your post here) that having Node.js + Redis for pure storage is the way to go.

As for the "datasets no larger than RAM" issue you bring up would you suggest using Redis purely for the (extremely fast) mass storage and then extracting the data to another solution (CouchDB or what not) for analysis? I love the Map/Reduce functionality built in to Couch and could use that along with the other stats I'm looking to extract from the data.
+Robert Dempsey Redis is meant for in-memory data storage only. So if you are storing more GBs of data than the RAM on your computer, use MongoDB. But if you only need to temporarily hold a few GBs of data as a queue of work before you process it, then yes, Redis is perfect for that. Especially for data coming out of a firehose (extremely high rate).

+Robert Dempsey Also keep in mind that both Node.JS and Redis are single-threaded. So if you are running this setup on a really beefy machine with lots of cores, you can actually run multiple Redis or Node (or both) processes to maximize the hardware you are running on.
+Riyad Kalla thanks. I'll keep that in mind. I'm planning on using Redis only for temporary storage to get the throughput into the database from the firehose. I'll be sure to run it on beefy machines too :)
Thank you for the kind comments.
How well does CouchDB handle 1000 replication requests? Let's say I have 1000 CouchDBs on mobiles and one central CouchDB server, and I activate sync to all of them at the same time; do you know if Couch can take this load without an issue?
+Akash Ramani This isn't actually as big of an issue as it sounds for CouchDB assuming the server is on sufficient hardware. Couch's leveraging of Erlang allows it to handle thousands (or 10s of thousands) of connections elegantly, by design... as in "That is why we chose Erlang, because it does that so well" type of design, not just the CouchDB design.

Additionally, Ubuntu's Desktop Sync used CouchDB for quite a while and has many more than 1000 clients syncing back to the Ubuntu masters -- while the collaboration was disbanded due to differences of opinions and some technical mismatches between what CouchDB did and what Ubuntu wanted, it was functioning wonderfully well.

In short "You will be fine, Couch in some capacity was designed to do just that".
Which database would you recommend for hierarchical data, for example there are organisations, groups (in an organisation) and members of a group? This will require a lot of joins within one query (for example, get all people in an organisation). On the other hand, each entity (organisation, group, member) can be described easily in a document and there can be different parameters for different organisations.
+Adam Dziedzic I would always advocate going with what your team knows best all other factors being equal; are you operating at scale so large that a classic MySQL/PostgreSQL datastore doesn't make sense here? You mention orgs/groups/members -- is this for some backoffice app where there will only ever be hundreds/thousands of these things, or are you just giving an example of the organization and there will be billions of these kinds of records? NoSQL isn't a godsend if you don't need it, the denormalization work/accounting (keeping metadata duplicated/updated correctly across records) can stink and run you into edge failure cases.

Before I hopped on the Mongo/Couch/Redis/Cassandra bandwagon, I'd make sure that datastore gave me something I had to have first. If you want to share more about your scale/design I'll try and help. We can also take this offline if it is not something you want to discuss publicly.
Thank you for your answer. Many members of my team have some experience in CouchDB and want to use it. The volume of data is not so big - we can say it's like for some backoffice app where there will be thousands of these things. Flexible schema is of great value for us. Unfortunately, I cannot discuss more details publicly. 
+Adam Dziedzic Given the scale I think many different approaches and data stores will likely work for you. If your team knows CouchDB and wants to use it, you could throw together a minimal viable impl pretty quickly. 
Thank you for your pieces of advice. I appreciate it!
Thanks for putting this together. Its a great launchpad for decision making.
Hi Riyad,

I am trying to build a browser-based app, somewhat similar to Eventbrite. I have never tried NoSQL, but I am eager to use it in this project. I would be building my app in an SOA manner. The backend will provide RESTful APIs, so that I can build mobile apps easily in the future if I wish. Which NoSQL store would you suggest I use? I would be using RoR on nginx.

Thanks in advance.

If this is a commercial project, my recommendation is definitely "use what you know best", but if this is a personal project and you are having a bit of fun learning something new, then the best thing to do is start with the data model first.

The best things to think about first are your data model and query model; if you can give me a better idea of what that looks like, I can probably give you a more detailed suggestion. If you'd rather do that privately you can drop me an email at rkalla at gmail
Thanks Riyad. I will send you an e-mail.
Thanks for the great article, Riyad. I am very much new to the world of NoSQL, but have some good experience with RDBs. Which would be the best to start with among MongoDB, CouchDB, Cassandra, etc.? This is for learning only, not for any commercial project.
+Miltan Chaudhury MongoDB will feel the most at-home to you. Cassandra will feel huge/strange and CouchDB will feel dead-easy until you want to start querying for non-key based values and have to start writing map-reduce functions in JavaScript stored in the DB. None of this is horrible, just saying that Mongo will feel the most natural to you (one of the reasons it has been so successful).
If MongoDB satisfies your use case, MySQL most likely would too... but CouchDB on the other hand is just AMAZING... It offers some stuff that nothing else can... the replication engine with MVCC is just genius.
Uh, I am probably quite late to the discussion, but I have a question. I have this web-traffic monitoring and analysis app that not only records session data, but decides whether to allow the visitor to visit the page based on a blacklist.

Is using a NoSQL DB (specifically CouchDB) justified in this case? The target demographic of this app is medium-sized DBs, which hopefully can be extended to somewhat bigger websites.
+noyb noybee -- given the blacklist check, I would strongly recommend Redis at least for that portion of it (using set operations like SISMEMBER) -- if you are really writing out web analytics data, I would probably go with MongoDB -- it is a great data store for heavy writes like web analytics tend to be, and it allows easy dynamic querying, something akin to SQL.

If you are set on using CouchDB that is fine too, but if you had no idea, I would leverage the strengths of those other data stores for this particular case.

Keep in mind that CouchDB uses an append-only file format, so you are writing a TON of new data, which incurs rewriting internal node information in the index, and when you go to erase the old records after they expire, you have to compact your data file.

This is why I think Mongo makes a better fit here as your primary backing store.

Redis as your in-memory, blacklist store though for a quick SET check.
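That blacklist check is tiny in practice; here is a pure-Python stand-in for the Redis set commands involved (the key name and IPs are invented for illustration):

```python
# Stand-in for: SADD blacklist 10.0.0.5 192.168.1.9
blacklist = {"10.0.0.5", "192.168.1.9"}

def allowed(ip):
    # SISMEMBER blacklist <ip> returns 1 if present; allow when absent
    return ip not in blacklist
```

With redis-py against a live server this would be a single `r.sismember("blacklist", ip)` call per request, which is an O(1) in-memory lookup.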
Riyad, I read your article, very helpful. Question: I have a situation for commercial deployment, looking for a highly scalable, NoSchema (not NoSQL), persistent DB. I also need good pub/sub support with write performance and replication. I understand MongoDB, Redis and CouchDB have their pros and cons. I also need good C++/C driver (client lib) support with sync/async interfaces. Which one would you suggest and why?
Help me understand your data model a bit better -- is it highly structured data (e.g. product catalog data) or highly unstructured? What is the usecase for pub/sub so I can better understand what the function is you need. Also what kind of replication -- master/slave? master/master? is this a globally distributed system that needs to stay current or is it expected that the entire deployment will be in a single data center?

I can't speak to the sync/async support directly in any C/C++ drivers for any of these DBs, but you can obviously code that logic in your client code if the driver doesn't support it directly.
Hi Riyad,
The data is mostly structured. The pub/sub is so that when a particular record has any CRUD action, business logic can kick in. Replication: apart from mobile environments, what situations require Master/Master? Since consistency is high priority, if Master/Slave can guarantee high consistency that would meet the requirement. It would be hosted in both situations, globally distributed and also locally.
What do you use for a search engine? I tried to use Redis for its performance, but it's not easy. What do you think about that?
Hi Riyad,
We are developing a Mobile App with capability to store high resolution Images for our customers. Images won't be frequently uploaded- maybe 3-4 times a month in quantity of 3 to 4. Customer base would be extremely high. Offline-online capability is a must. Reads need to be faster.
Could you please suggest what DB would suit us best.

What do you mean online-offline caps?

For images, Amazon S3 is a no-brainer file store, doesn't sound like you need a NoSQL solution for that unless I missed something?
Riyad, thanks for a quick response. Images are part of other user data. Yes, offline capability is required. Reads and writes are casual per user, and the primary ability is to search other data. We're exploring MySQL and a separate file store vs. using NoSQL. Thoughts?
There is Elasticsearch for searching a word quickly, for example autocomplete, and it is faster than MongoDB. I have used this option because it replaces MongoDB and Redis in one. See more about Elasticsearch: