"Should I use MongoDB or CouchDB (or Redis)?"
I see this question asked a lot online. Fortunately, once you get familiar enough with each of the NoSQL solutions and their ins and outs, strengths and weaknesses, it becomes much clearer when you would use one over the other.
From the outside, so many of them look the same, especially Mongo and Couch. Below I will try and break down the big
tie-breakers that will help you decide between them.
[**] Querying - If you need the ability to dynamically query your data like SQL, MongoDB provides a query syntax that will feel very similar to you. CouchDB is getting a query language in the form of UNQL in the next year or so, but it is very much under development and we have no knowledge of the impact on view generation and query speed it will have yet, so I cannot recommend this yet.
[**] Master-Slave Replication ONLY - MongoDB provides (great) support for master-slave replication across the members of what they call a "replica set". Unfortunately you can only write to the master in the set and read from all.
If you need multiple masters in a Mongo environment, you have to set up sharding in addition to replica sets and each shard will be its own replica set with the ability to write to each master in each set. Unfortunately this leads to much more complex setups and you cannot have every server have a full copy of the data set (which can be handy/critical for some geographically dispersed systems - like a CDN or DNS service).
[**] Read Performance - Mongo employs a custom binary protocol (and format) providing at least a magnitude times faster reads than CouchDB at the moment. There is work in the CouchDB community to try and add a binary format support in addition to JSON, but it will still be communicated over HTTP.
[**] Provides speed-oriented operations like upserts and update-in-place mechanics in the database.
[**] Master-Master Replication - Because of the append-only style of commits Couch does every modification to the DB is considered a revision making conflicts during replication much less likely and allowing for some awesome master-master replication or what Cassandra calls a "ring" of servers all bi-directionally replicating to each other. It can even look more like a fully connected graph of replication rules.
[**] Reliability of the actual data store backing the DB. Because CouchDB records any changes as a "revision" to a document and appends them to the DB file on disk, the file can be copied or snapshotted at any time even while the DB is running and you don't have to worry about corruption. It is a really resilient method of storage.
[**] Replication also supports filtering or selective replication by way of filters that live inside the receiving server and help it decide if it wants a doc or not from another server's changes stream (very cool).
Using an EC2 deployment as an example, you can have a US-WEST db replicate to US-EAST, but only have it replicate items that meet some criteria. like the most-read stories for the day or something so your east-coast mirror is just a cache of the most important stories that are likely getting hit from that region and you leave the long-tail (less popular) stories just on the west coast server.
Another example of this is say you use CouchDB to store data about your Android game in the app store that everyone around the world plays. Say you have 5 million registered users, but only 100k of them play regularly. Say that you want to duplicate the most active
accounts to 10 other servers around the globe, so the users that play all the time get really fast response times when they login and update scores or something. You could have a big CouchDB setup in the west code, and then much smaller/cheaper ones spread out across the world in a few disparate VPS servers that all use a filtered replication from your west coast master to only duplicate the most active players.
[**] Mobile platform support. CouchDB actually has installs for iOS and Android. When you combine the ability to run Couch on your mobile devices AND have it bidirectionally sync back to a master DB when it is online and just hold the results when it is offline, it is an awesome combination.
[**] Queries are written using map-reduce functions. If you are coming from SQL this is a really odd paradigm at first, but it will click relatively quickly and when it does you'll see some beauty to it. These are some of the best slides describing the map-reduce functionality in couch I've read:http://www.slideshare.net/gabriele.lana/couchdb-vs-mongodb-2982288
[**] Every mutation to the data in the database is considered a "revision" and creates a duplicate of the doc. This is excellent for redundancy and conflict resolution but makes the data store bigger on disk. Compaction is what removes these old revisions.
[**] HTTP REST JSON interaction only. No binary protocol (yet - vote for http://ubjson.org
support), everything is callable and testable from the command line with curl or your browser. Very easy to work with.
These are the biggest features, I would say if you need dynamic query support or raw speed is critical, then Mongo. If you need master-master replication or sexy client-server replication for a mobile app that syncs with a master server every time it comes online, then you have to pick Couch. At the root these are the sort of "deal maker/breaker" divides.
Fortunately once you have your requirements well defined, selecting the right NoSQL data store becomes really easy. If your data set isn't that stringent on its requirements, then you have to look at secondary features of the data stores to see if one speaks to you... for example, do you like that CouchDB only speaks in raw JSON and HTTP which makes it trivial to test/interact with it from the command line? Maybe you hate the idea of compaction or map-reduce, ok then use Mongo. Maybe you fundamentally like the design of one over the other, etc.etc.
If anyone has questions about Redis or others, let me know and I'll expand on how those fit into this picture.
You hear Redis described a lot as a "data structure" server, and if you are like me this meant nothing to you for the longest time until you sat down and actually used Redis, then it probably clicked.
If you have a problem you are trying to solve that would be solved really
elegantly with a List, Set, Hash or Sorted Set you need to take a look at Redis. The query model is simple, but there are a ton of operations on each of the data structure types that allow you to use them in really robust ways... like checking the union between two sets, or getting the members of a hash value or using the Redis server itself as a sub/pub traffic router (yea, it supports that!)
Redis is not a DB in the classic sense. If you are trying to decide between MySQL and MongoDB because of the dynamic query model, Redis is not
the right choice. In order to map you data to the simple name/value structure in Redis and the simplified query approach, you are going to spend a significant amount of time trying to mentally model that data inside of Redis which typically requires a lot of denormalization.
If you are trying to deploy a robust caching solution for your app and memcache is too simple, Redis is probably perfect
If you are working on a queuing system or a messaging system... chances are Redis is probably perfect.
If you have a jobs server that grinds through prioritized jobs and you have any number of nodes in your network submitting work to the job server constantly along with a priority, look no further, Redis is the perfect fit. Not only can you use simple Sorted Sets for this, but the performance will be fantastic (binary protocol) AND you get the added win of the database-esque features Redis has like append only logging and flushing changes to disk as well as replication if you ever grew your jobs server beyond a single node.
NOTE: Redis clustering has been in the works for a long time now. Salvatore has been doing some awesome work with it and it is getting close to launch so if you have a particularly huge node distribute and want a robust/fast processing cluster based on Redis, you should be able to set that up soon.
some of these more classic database features, but it is not targeted at competing with Mongo or Couch or MySQL. It really is a robust, in-memory data structure server AND if you happen to need very well defined data structures to solve your problem, then Redis is the tool you want. If your data is inherently "document" like in nature, sticking it in a document store like Mongo or CouchDB may just make a lot more mental-model sense for you.
NOTE: I wouldn't underestimate how important mentally understanding your data model is. Look at Redis, and if you are having a hard time mapping your data to a List, Set or Hash you probably shouldn't use it. Also note that you WILL use many more data structures than you might realize and it may feel odd... for example a twitter app may have a single user as a hash, and then they might have a number of lists of all the people they follow and follow them -- you would need this duplication to make querying in either direction fast. This was one of the hardest or most unnatural things I experienced when trying to solve more "classic DB" problems with Redis and what helped me decide when to use it and when not to use it.
I would say that Redis compliments
most systems (whether it is a classic SQL or NoSQL deployment) wonderfully in most cases in a caching capacity or queue capacity.
If you are working on a simple application that just needs to store and retrieve simple values really quickly, then Redis is a perfectly valid fit. For example, if you were working on a high performance algorithmic trading and you were pulling ticker prices out of a firehose and needing to store them at an insane rate so they could be processed, Redis is exactly the kind of datastore you would want to turn to for that -- definitely not Mongo, Couch or MySQL.
IMPORTANT: Redis's does not have a great
solution to operating in environments where the data set is bigger than ram and as of right now solutions for "data bigger than ram" has been abandoned. For the longest time this was one of the gripes with Redis and Salvatore (http://twitter.com/#!/antirez
) solved this problem with the VM approach. This was quickly deprecated by the diskstore approach after some reported failures and unhappiness with how VM was behaving.
Last I read (a month ago?) was that Salvatore wasn't entirely happy with how diskstore turned out either and that attempts should really be made to keep Redis's data set entirely in memory when possible for the best experience.
I say this not because it is impossible, but just to be aware of the "preferred" way to use Redis so if you have a 100GB data set and were thinking of throwing it on a machine with 4GB of ram, you probably don't want to do that. It may not perform or behave like you want it to.
ADDENDUM: I answered the question about Couch vs Mongo again on Quora with some more tech details if you are interested:http://www.quora.com/How-does-MongoDB-compare-to-CouchDB-What-are-the-advantages-and-disadvantages-of-each/answer/Riyad-Kalla