Document-oriented (or schema-less) databases such as CouchDB and MongoDB have received a lot of attention lately through the NoSQL movement and a wave of new articles, books and websites. This is due to their unique approach to managing and querying data and to their promise of scalability. The data stored in a document-oriented database is nothing special, simply documents made up of key-value pairs, but this simple method of storing data requires an entire paradigm shift when writing database-driven applications. In CouchDB, every time data is inserted or updated a new “version” of the document is added to your database. This version is maintained alongside the previous versions (at least until the database is compacted), which leads us to one of the first major differences between document-oriented databases and traditional databases.
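To make the idea of versioned documents a bit more concrete, here is a minimal in-memory sketch in Python. It is not how CouchDB stores anything internally; the `VersionedStore` class and its `put` and `get` methods are names I made up for illustration, and the only point is that a write appends a new version instead of overwriting the old one.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Toy in-memory store (hypothetical): every write appends a new version."""

    # Maps a document id to the list of all its versions, oldest first.
    _history: dict = field(default_factory=dict)

    def put(self, doc_id, doc):
        """Store a copy of `doc` as the newest version; return its version number."""
        versions = self._history.setdefault(doc_id, [])
        versions.append(dict(doc))
        return len(versions)

    def get(self, doc_id, version=None):
        """Return the requested version (1-based), or the newest one by default."""
        versions = self._history[doc_id]
        return versions[-1] if version is None else versions[version - 1]


store = VersionedStore()
store.put("article", {"title": "CouchDB"})
store.put("article", {"title": "Document-Oriented Databases"})

latest = store.get("article")            # the newest version
first = store.get("article", version=1)  # the original is still retrievable
```

After the second `put`, the old title has not been lost: it is simply an earlier entry in the document's history.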
In document-oriented databases availability of data is valued over consistency. If you’re running multiple database servers, a client will always be able to get some version of the data from the database, but that version is not guaranteed to be the most up-to-date one. For example, if I updated the title of this article from “CouchDB” to “Document-Oriented Databases”, some users coming to this website might get the old title and some would get the new one. It all depends on which database server they get their data from.
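The title example can be simulated in a few lines of Python. This is only a sketch under simplifying assumptions: the `Replica` class and its methods are hypothetical, replication is one-way, and there is no conflict handling at all. It just shows how two servers can briefly disagree and then converge.

```python
class Replica:
    """One database server holding its own copy of each document (hypothetical)."""

    def __init__(self):
        self.docs = {}

    def write(self, doc_id, doc):
        self.docs[doc_id] = dict(doc)

    def read(self, doc_id):
        return self.docs[doc_id]

    def replicate_to(self, other):
        """Copy every document to another replica (one-way, no conflict handling)."""
        for doc_id, doc in self.docs.items():
            other.write(doc_id, doc)


server_a, server_b = Replica(), Replica()
server_a.write("article", {"title": "CouchDB"})
server_a.replicate_to(server_b)

# The update reaches server_a only; server_b has not been synced yet.
server_a.write("article", {"title": "Document-Oriented Databases"})
title_from_a = server_a.read("article")["title"]
stale_title_from_b = server_b.read("article")["title"]

# Once replication runs again, both servers agree.
server_a.replicate_to(server_b)
fresh_title_from_b = server_b.read("article")["title"]
```

Both reads succeed at every point, which is the availability half of the trade-off; the reader who hits `server_b` before the second replication simply gets the old title.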
The immediate implication of this is that CouchDB is out for any application where up-to-date information is a requirement. So in all likelihood you won’t use CouchDB for tracking sports scores or accessing medical records, but that still leaves plenty of important applications. So why is this feature good? Well, for one, it allows CouchDB to work as an offline database. Using the replication feature you can update your local copy of the database to the latest version, go offline, and then merge your updates when you come back. Secondly, it allows CouchDB to focus on serving data as its primary job, so it could potentially pay off hugely in a high-performance setting.
So now we know how the data gets into the database, but how do we get it out? This is probably the most confusing difference between CouchDB and a traditional database. CouchDB queries use what is called map-reduce to filter your data before it is returned. The map step filters your data so you get only the values you want. The reduce step (which is optional) can then take these filtered values and perform some operation on them, returning a single value. This mode of querying takes some getting used to, and complex queries are definitely not as easy to write as they are in SQL. That being said, it does provide a lot of flexibility for a certain class of queries.
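Here is a rough sketch of the map-reduce idea in plain Python. CouchDB views are normally written in JavaScript and run by the database itself, so this only mirrors the concept; the sample documents and the `map_articles`, `reduce_count` and `run_view` names are made up for illustration. Grouping the emitted values by key before reducing is my own simplification.

```python
from collections import defaultdict

# Hypothetical sample documents; a real database would hold these for us.
docs = [
    {"type": "article", "author": "alice", "title": "CouchDB"},
    {"type": "article", "author": "bob", "title": "MongoDB"},
    {"type": "comment", "author": "alice", "text": "Nice post"},
    {"type": "article", "author": "alice", "title": "Map-Reduce"},
]


def map_articles(doc):
    """Map step: emit (key, value) pairs only for the documents we care about."""
    if doc["type"] == "article":
        yield doc["author"], 1


def reduce_count(values):
    """Reduce step: collapse the emitted values for one key into a single value."""
    return sum(values)


def run_view(documents, map_fn, reduce_fn=None):
    """Apply map_fn to every document, group by key, then optionally reduce."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    if reduce_fn is None:
        return dict(grouped)
    return {key: reduce_fn(values) for key, values in grouped.items()}


mapped_only = run_view(docs, map_articles)
articles_per_author = run_view(docs, map_articles, reduce_count)
```

With only the map step you get the filtered values per author; adding the reduce step turns each list into one number, here the article count per author. The comment document never shows up because the map function does not emit anything for it.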
So this has been a whirlwind tour of document-oriented databases. Should you use one on your next project? Well, if the Digg debacle (http://www.forbes.com/2010/09/21/cassandra-mysql-software-technology-cio-network-digg.html) is any indication, you’ll want to consider the pros and cons very carefully before you re-engineer your website to run on NoSQL. Right now these databases probably aren’t understood or proven enough to be acceptable for the average website, but they do provide an exciting possibility for the future of highly scalable databases.