If You Are Designing Your Own REST Backend You're Doing It Wrong

The only reason I know this is because I've been guilty of it myself. This is meant to start a dialogue, so please comment.

Let me first walk through what you are probably doing. You pick your favourite programming language and framework and get going, whether that's Node.js and Restify, Python and Django, or Ruby and Rails.

Then you are going to pick your database, whether that's the tried and tested MySQL or the shiny and new MongoDB. This choice is probably going to affect how you scale, and it depends on how well you know the different databases and approaches.

Then you start coding. You hopefully care about what the URL schema looks like, so you make a really nice interface for developers to work from, like this:
GET    /users        - will get you all of your users
GET    /users/olafur - will get you one user
POST   /users        - will make a new user
PUT    /users/olafur - will update the user
DELETE /users/olafur - will delete the user

You will go through all of your objects, mapping them to REST like this, and hopefully you will end up with something sane. This is really nice to hook up with jQuery and mobile interfaces.

Now you have to scale. You hope that writing data to the server isn't going to kill it, so you hope you don't get too much of that kind of traffic. You know how to handle reads at least: you have something like Nginx and Varnish, with Memcached, and then you try finding bottlenecks and seeing if some more caching doesn't solve them. It's truly amazing to see the difference it makes.

Now you hit an API that has to do some async behaviour, and now you're screwed. There are some solutions for that, but they make the code really complex, even in Node.js.

But every one of these steps I've described is incorrect; now let me tell you why. Let's work our way back.

The first problem is that you have a lot of moving parts involved before the data you're trying to put into the database ends up there. There are problems with your APIs losing data because of errors or downtime. A lot of the Internet is going over unreliable wireless technologies. So your beautiful REST calls are now riddled with exception handling, because there are so many ways things can go wrong.

So what do you do? Of course, you stick a REST database in front of your API. What does that accomplish? Let's first talk about speed: we are talking about 4x write-speed improvements, and you don't lose data when writing to the APIs. Databases are probably more solid than code you write. CouchDB is truly a speed freak when it's dealing with REST, and it has security and validation built in. When you need to scale, you have a multi-master database, so you stick one closest to your users and they all sync with each other. So we have covered scaling and dealing with the speed of writes and reads.
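
To make this concrete, here is a minimal sketch of writing straight to CouchDB's HTTP interface from Python with the requests library; the host and database name are assumptions for illustration:

    import requests

    COUCH = "http://127.0.0.1:5984"  # assumed local CouchDB
    DB = "users"                     # hypothetical database name

    # Create the database; CouchDB answers 412 if it already exists.
    requests.put("%s/%s" % (COUCH, DB))

    # Writing a document is one HTTP call, with no app server in between.
    resp = requests.put("%s/%s/olafur" % (COUCH, DB), json={
        "type": "user",
        "name": "Olafur",
    })
    print(resp.json())  # {"ok": true, "id": "olafur", "rev": "1-..."}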

How do you deal with writing to the REST APIs if the clients have bad Internet connections? You don't. You write to the native implementation in your browser[1][2] or on mobile[3][4], then you sync with the server when you have a connection. This also cuts down on the traffic you have to get from the server. Trust me: it's an order of magnitude difference. You might say, "Why not implement syncing in your framework of choice?" If you did this, you would probably have to rewrite it, because syncing in CouchDB works by keeping track of revisions and what has changed. That is hard to retrofit onto a framework.
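
Server side, that same sync mechanism is exposed through CouchDB's _replicate endpoint. A minimal sketch, with made-up hosts and database names:

    import requests

    COUCH = "http://127.0.0.1:5984"  # assumed local CouchDB

    # Continuously replicate a local database to a remote one; CouchDB
    # tracks revisions, so only changed documents travel over the wire.
    requests.post("%s/_replicate" % COUCH, json={
        "source": "users",                             # hypothetical
        "target": "http://db.example.com:5984/users",  # hypothetical
        "continuous": True,
    })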

So then you don't have beautiful URLs, right? It might be a valid use case for some simple API you have to maintain to have nice-looking URLs, but not for anything that has to scale. And it's possible in CouchDB with rewrite rules.
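
For reference, rewrite rules live in a design document (this is the CouchDB 1.x mechanism; the names and paths below are assumptions):

    import requests

    COUCH = "http://127.0.0.1:5984"  # assumed local CouchDB

    # Serve /users/olafur style URLs straight from the database,
    # no application server required.
    requests.put("%s/users/_design/api" % COUCH, json={
        "rewrites": [
            {"from": "/users/:name", "to": "../../:name"},
        ],
    })
    # GET {COUCH}/users/_design/api/_rewrite/users/olafur now
    # returns the olafur document.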

I personally like the model of a staging database and a main database. You can create a rewrite rule so all writes go to staging and all reads come from main. What this gives you is a record of all the incorrect API calls without polluting your main database. What also makes sense is to have types on the documents you're putting into the database and to use views with map functions to sort them out. You don't have to do it like that, but you gain a lot from it.
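
A sketch of such a view: the design document carries a JavaScript map function as a string, and the database, names and fields here are assumptions:

    import requests

    COUCH = "http://127.0.0.1:5984"  # assumed local CouchDB

    # A view that sorts documents by their "type" field.
    requests.put("%s/main/_design/types" % COUCH, json={
        "views": {
            "by_type": {
                "map": "function (doc) { if (doc.type) { emit(doc.type, null); } }"
            }
        }
    })

    # Fetch every document of one type, e.g. all user documents.
    resp = requests.get("%s/main/_design/types/_view/by_type" % COUCH,
                        params={"key": '"user"', "include_docs": "true"})
    print(resp.json()["rows"])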

So this is all fine and dandy for stuff that doesn't require processing, but what if you actually want to do something more than just store data?

You can do what I did and write a service that watches for changes in a database and then puts those changes through plugins, in a flow-like structure. Or you can use my implementation[5], currently only in Python. This abstracts CouchDB away from your code, so you're just receiving information and sending back a response. I've written it in Tornado, but who knows, asyncio looks pretty good.
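
The core of such a service is just consuming the _changes feed. This is not my implementation, only a minimal sketch of the idea with assumed names:

    import json
    import requests

    COUCH = "http://127.0.0.1:5984"  # assumed local CouchDB

    def process(doc):
        """Stand-in for the plugin pipeline; real work goes here."""
        print("processing", doc["_id"])

    # Follow the staging database's changes feed; CouchDB keeps the
    # connection open and streams one JSON line per change.
    resp = requests.get("%s/staging/_changes" % COUCH,
                        params={"feed": "continuous",
                                "include_docs": "true",
                                "heartbeat": "10000"},
                        stream=True)
    for line in resp.iter_lines():
        if line:  # heartbeats arrive as empty lines
            change = json.loads(line)
            if "doc" in change:
                process(change["doc"])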

With that kind of architecture you don't need to handle load as fast as it's being written to your servers; you just handle as much load as you want, balancing responsiveness against the cost of the machines. But the reads are still going to be fast.

So why isn't everybody doing this then? We are still learning how to structure things well, so I'm only able to write about this because of the awesome work of databases like CouchDB, which are not afraid of being misunderstood, and of the people who have formed best practices from all the mistakes they have made. I've made plenty of mistakes and will make plenty more in the future. The important thing is to learn from them.

I have to say that I really love REST and I love beautiful URLs but life is about doing the right thing, as often as you can get away with.

[1] https://github.com/olafura/sundaydata
[2] https://github.com/daleharvey/pouchdb
[3] http://www.couchbase.com/mobile
[4] https://cloudant.com/product/cloudant-features/sync/
[5] https://github.com/olafura/sundaytasks-py

#Python #CouchDB #NodeJS #Programming #RubyOnRails #REST
Thanks for the post. I spent quite some time thinking about it; overall I like the idea, but I am not sure how it scales API-wise. What if your backend needs deeper nesting of (sub-)resources, more fine-grained access control limits on a per-resource / per-HTTP-verb basis, ...? In the end, RESTifying existing system / domain APIs boils down to mapping rich, complex APIs to HTTP verbs, resources, representations. Introducing Couch or another "RESTful dispatcher" (this is how I see it) seems to add another layer of abstraction to this...? Maybe it would be worth giving it a try for a more complex setup. That said, we also make extensive use of the Couch changes notifications for some purposes... ;)
There are some changes coming soon to CouchDB to have read restrictions[1]; there are already write restrictions[2]. So that gives you a lot of control.

Another option, which might also be faster, is to map different databases to different URLs that have role-based read restrictions[3]; this can even apply different rules to different requests like GET, PUT, POST, DELETE[4]. You can also make a general rule so nobody can access the underlying interface to get around your restrictions.
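
Per-database access control is set through the _security object; a rough sketch, with the database and role names being assumptions:

    import requests

    COUCH = "http://127.0.0.1:5984"  # assumed local CouchDB

    # Only users with the "staff" role can read or write this database;
    # users with the "admin" role can also change design documents.
    requests.put("%s/company_data/_security" % COUCH, json={
        "admins":  {"names": [], "roles": ["admin"]},
        "members": {"names": [], "roles": ["staff"]},
    })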

There are probably some cases where it can't fit; there you can maybe do a pass-through to some service you've set up[5].

For me it's like when I first learned about document databases: it took a while to wrap my brain around the concepts because I was thinking too much in SQL.

There is already a lot of interest in providing a REST interface directly to databases, and I think the trend is going to continue.

I'm glad you are able to use the CouchDB change notifications because they are really powerful. Funnily enough, somebody on Reddit said that doing polling for it would kill your REST interface, which shows how awesome CouchDB is: it isn't affected negatively.

[1] https://github.com/refuge/rcouch/wiki/Validate-documents-on-read
[2] http://guide.couchdb.org/draft/validation.html
[3] http://wiki.apache.org/couchdb/Security_Features_Overview
[4] https://wiki.apache.org/couchdb/Rewriting_urls
[5] http://couchdb.readthedocs.org/en/latest/config/proxying.html
I will have a close look at what's coming up then, thanks. Overall, we started using Couch mostly in production in 2011, and using it as a cross-system, cross-platform "poor man's message bus" definitely was one of the sweet spots this thing always had. Sure, there always are things worth improving, but it mostly just worked well... I wrote a few things about it back then, see [1].

Yet, about proxying and auth - consider this:


... this is not a production API of ours but pretty close. Two things: First, there are fine-grained permission controls. Global admins basically have full-featured CRUD access to everything on that path. Members of :bu, for example, just have read access to :company and limited (calendar) write access to other :member s in the same :bu. And, adding to that, the second thing: these resources heavily depend upon a particular user context: a user of :bu1 sees different representations than a member of :bu2.

This closely maps a backend business API to REST. It is difficult in some ways, and, maybe, considering frontend / load balancer HTTP caching, making it /api/:user/companies/... instead of just /api/companies would ease some things. Permission control, however, remains difficult. At the moment this works well because all this basically happens in an ACL-ish way in an HTTP request filter that lives right in the REST server backend, pretty "close" to the business API. Any proxy would immediately need to be capable of in some way handling or resembling or mapping this kind of logic. Will this be possible? How do I determine whether my mapping in the proxy is correct?

I have no real idea here so far, except for either cutting down access restrictions (not a good idea) or leaving auth "close" to the backend exposed via REST. I'd however be happy to read about other approaches. ;)

Thanks for pointing me to the article, I enjoyed it.

The example you mentioned is easy to pull off on a functional level using different databases for different things: the company would be its own database, then each business unit might be its own database, depending on how many business units there are and how they correlate to the other data in the databases. Each database can have its own rules regarding who can access it, based on roles.

Another option, if you don't have too many users, is to have one database per user; the relevant data is then replicated to the others through filtered replication from a main database, with write rules in the main database. Possibly with a server-side plugin that handles some of the security or other concerns.
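
As a sketch of the filtered replication part, with the filter function and all names being assumptions:

    import requests

    COUCH = "http://127.0.0.1:5984"  # assumed local CouchDB

    # A filter deciding which documents a given user receives.
    requests.put("%s/main/_design/sync" % COUCH, json={
        "filters": {
            "by_owner":
                "function (doc, req) { return doc.owner === req.query.user; }"
        }
    })

    # Replicate only olafur's documents into his personal database.
    requests.post("%s/_replicate" % COUCH, json={
        "source": "main",
        "target": "user_olafur",  # hypothetical per-user database
        "filter": "sync/by_owner",
        "query_params": {"user": "olafur"},
        "continuous": True,
    })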

You can then use the rewrite rules to get the data to the correct views[1]. But you would need read validation for that to work. I'll look into the rewrite rules more; it might be possible to use a custom Erlang plugin to not get those variables after the question mark.

But the overarching theme is that maybe you shouldn't use those fancy URL schemes, as a less fancy one that allows you to have a mobile cache might be a better solution, because the mobile client wouldn't have those rewrite rules. It's like not using relations in a document database: sometimes you have to change how you are doing things to get more benefits.

[1] http://www.pastebin.ca/2689267
Well, about the fancy URL schemes... maybe you're, to some point, right, but then again, I see two advantages here, being "RESTful" and all: First, these deeply nested subresources clearly illustrate how things conceptually are tied together, and might(?) make it easier for client developers to understand how the application model is structured and, maybe more importantly, how and why certain access restrictions operate. And it helps to isolate certain "generic" access restrictions into dedicated hosts: for example, using

/api/companies/:company,

it would be easy for an Apache / Nginx front controller to allow "anonymous" read access to public HTML / JSON representations provided by these resources. Strong URL schemes definitely help here.

That aside: in the end, you provided me with a bunch of very interesting input which I will be dealing with, as it seems rather interesting. Yet: I am not in doubt that even structures like the one I outlined might be implemented, say, using Couch + rewriting + possibly plugins. What I am curious about is whether or not this, in the end, is "cheaper" (easier to implement, easier to maintain, easier to scale) than making the RESTful API just a "thin layer" of access code on top of an existing rich business API, such as using Dropwizard with existing Java APIs or Mojolicious with existing Perl runtime APIs.

In the latter case I see myself re-implementing a load of RESTish stuff that, as you very well outlined, already is perfectly well around in other platforms. This has drawbacks, but it keeps the REST layer thin and keeps my developers within the world they're familiar with, and, thus, might ease things such as debugging, error tracking, or testing whether, for example, security rules actually are correctly enforced.

Using CouchDB, for example, I however see myself using these features and, then, ending up with distributing data, access/security logic and a few other things across parts of a CouchDB facility to connect my backend to. I know what I get from that, but so far I can't really tell what the price tag is. Maybe the only way to really find out is doing a benchmark project... ;)
I would love to work with you on a benchmark. It's hard comparing apples with oranges, but I've done tests on the raw throughput of Django and CouchDB, and of course CouchDB wins by a big margin (Django doesn't pretend to be the fastest REST API). But that's maybe not a fair test. I think we would need to set up a multi-part experiment, with plain read+write, round trip (getting the answer back), and maybe seeing the load on the machines.

My problem with benchmarks is that they are usually wrong, and it's really hard to do an unbiased one.
+Ólafur Arason You mentioned that such a model would be difficult to implement on a framework level, but I think that's very similar to what Meteor.js does (disclaimer: I work there): we have an in-memory cache on the client called minimongo (with the exact same API) that seamlessly synchronizes with the server's real MongoDB. It supports access/permissions logic, handles reconnects and has no explicit REST design (although something similar based on RPCs is implemented in the framework). What do you think?
Ólafur, I really like your thoughts and the conclusion you made. The funny thing is, what you describe is exactly the architecture that I built for an app 3 years ago [1], and that we later on extracted into its own thing: Hoodie [2]

If you think about it, you could even go one step further: bring the API to the frontend. For the example of web apps, you'd provide an SDK with methods like `api.signUp(username, password)` or `api.resetPassword(email)` or `api.store.add(object)`. This is, for obvious reasons, very attractive to frontend people consuming your API, and I've put many more examples together [3]. From the architectural perspective, it also brings many benefits though:

- the API ends up being even more concise than the most beautifully crafted REST API
- it's transport technology agnostic. The SDK can use HTTP 1 today, HTTP 2 tomorrow, or sockets. The apps consuming your API wouldn't need to change anything.
- with a frontend API, you could easily build in offline storage & synchronization, and the consumers couldn't even tell. And why should they? It's not functionality that is unique to an application, so why not abstract it, too?
- if you make it work offline, the API will mostly work even if your backend is down (for maintenance or due to an error), if the user lost the connection or it became unreliable, or even if the user has not signed up for an account, as all data is cached offline
- if you followed the steps above, and you have offline storage and a separate process that synchronises the data, you can easily use that for tasks as well. Tasks would be hidden JSON objects that get picked up by workers; they would process them (as you describe in your post), and once finished (or failed) they would update the task object, so that sync gets triggered again and you can react on it in the app

That is exactly what we do with Hoodie today, as an outcome of the same thoughts you brought up here, only 2 years ago. I'd love to hear your thoughts on it.

[1] minutes.io
[2] hood.ie
[3] nobackend.org
I'm very interested in the subject and I know a bit about CouchDB too (which I also find very interesting).

«The first problem is that you have a lot of moving parts involved before the data you're trying to put into the database ends up there. There are problems with your APIs losing data because of errors or downtime. A lot of the Internet is going over unreliable wireless technologies. So your beautiful REST calls are now riddled with exception handling, because there are so many ways things can go wrong.»

This part is really interesting from my point of view, as I never had those kinds of troubles in a web context, which might be explained by the fact that I have little experience in SOA stuff. Now I'm not sure about the specific use case it describes; maybe something in the spirit of OAuth but more complex? Basically, what I understand is that, for many reasons, not only is your data at large (databases you have control over and third-party databases) not in a consistent state, but the code that should handle this is clunky. The latter doesn't surprise me, as it is basically implementing some kind of transaction behaviour over several components, which allows you to commit or roll back the request's intended changes in case something goes wrong during the transaction, to guarantee that changes are atomic.

Did I correctly describe the problem(s?) you are trying to tackle?
+Slava Kim I think Meteor.js is doing really interesting things. Your model does help with network problems but doesn't address the stability problem: a database is hopefully very stable, and having that in front negates bugs in your code; if things blow up you get notified and the queue just builds up until you get the backend back up. Also, an in-memory database doesn't work in the offline use case. But your framework is more about being as instant as possible, and I respect that, if I understand it correctly.

But there is value in using anything that makes things better.
+Gregor Martynus I really learned a lot from all the people that have been doing this for a while; I've been using this model for two and a half years now. My brain has also been boggled trying to comprehend some of the things this model brings, so the amount of documentation and articles has helped me a lot.

I really like the nobackend movement, but I don't know if I would describe it like that; I would say more modular backend, or maybe lego backend :) Also, I'm a bit wary of having my backend locked up with a service that I can't run myself, so some of the services are out of the question for me.

I really like the simplicity of the hood.ie API. I had already built my own library when I came across it, but there are some things that mean I can't use it.

I have really high database requirements and 5 MB is not enough; it's good for small apps. It's also cleared along with the cookies[1]. I also really like the power of IndexedDB, though definitely not its API :P

I'm also worried about the pass-through model that you[2] and Meteor are using. I really like the speed of Node.js, and I use it myself for that reason, but I would rather get the full speed of CouchDB and use it to pass through the traffic that is too complicated for it. I have discussed the other option with my colleagues and it negates a lot of the benefit.

But you are pretty close to this model.

I'm also interested in this as a general model for many languages. I think there are great pieces of code in many different languages, and we often rewrite just to get similar functionality. Says the person who thinks some programming languages are just ugly, mostly because of the functionality, not so much what characters they use.

I also just really like that I can throw up an EC2 instance of CouchDB close to the customer with minimal setup and use the power of multi-master replication.

But I really don't want CouchDB to be the only one using this model, because it's too important for just one implementation. There is a REST wrapper for some databases, I forget now what it's called, but it was missing some key functionality. There is also U1DB[3], which is mostly the syncing part.

[1] http://sharonminsuk.com/blog/2011/03/21/clearing-cache-has-no-effect-on-html5-localstorage-or-sessionstorage/
[2] I might be reading the source code incorrectly, or things may have changed since I looked at it last.
[3] https://one.ubuntu.com/developer/data/u1db/index
+Slava Kim +Gregor Martynus I'll try to see if I can't find some time to implement IndexedDB support for hood.ie and Meteor.js, though in Meteor's case I don't know if it fits with its model. I'm a bit busy now, but I think both of your frameworks have a lot to give to the community.
+Amirouche Boubekki What I was referring to is that when calling a REST API you have no guarantee that your call will get through, even on wifi; mobile is much worse. Your browser tab might be closed before you complete your call. What you need to do then is to have all sorts of handlers that know about the different errors that might come up, and before you know it a simple call has turned into really messy code, or into using a framework that abstracts it away. Just writing to a local database means that your data doesn't get lost, and you can sync with your server when it's available. Plus it makes for more efficient sending of data: rather than sending a hundred small expensive calls, you send one expensive call.
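
That batching is what CouchDB's _bulk_docs endpoint gives you; a minimal sketch with assumed names:

    import requests

    COUCH = "http://127.0.0.1:5984"  # assumed local CouchDB

    # One round trip for a whole batch of queued-up writes,
    # instead of a hundred small expensive calls.
    docs = [{"type": "event", "seq": i} for i in range(100)]
    resp = requests.post("%s/staging/_bulk_docs" % COUCH,
                         json={"docs": docs})
    print(len(resp.json()), "documents written")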

I don't know if we should get into consistency and ACID-related things; it seems like a bigger discussion, mostly because I'm not an expert :)

I don't see this kind of system being used for things like tickets or bank transactions. So only use it for things that don't require consistency at any given time. If you would have considered using caching for your data, then you're fine.
+Jashua Gupta Yes, it's very trivial to do the REST part per se, but getting it right can get tricky when you aren't just pushing the data to the database, and there are a lot of corner cases and unexpected things. I think Rails also has a very simple REST model.

What you have to bolt onto it to be able to scale isn't trivial: all the caching and fine-tuning.

I'm not saying there is anything magical you can do so everything goes smoothly :)

What I like about frameworks like Flask is that they get people up and running in no time, doing stuff that works.

I just think we can do better, and sticking something like CouchDB in front gives a lot of performance, better uptime and less load on your servers.
Can you explain how you go about managing authentication? I've wanted to use couch for my front end for many years, but user authentication has always been a sticking point for me. I am not sold on the idea of storing all user data in couch. Can you enlighten me?
+Ólafur Arason "it's very trivial to do the REST part per se, but getting it right can get tricky when you aren't just pushing the data to the database and there are a lot of corner cases" - that was my point exactly. Restless makes handling all those corner cases trivial (as shown in the second link of my previous post), but I didn't find anything like that for couch db that gives you such fine grained control with such remarkable ease. I just glanced over the documentation and googled around for 10-15 minutes, so please feel free to point me to a resource (heh) with similar capabilities in case I missed it. 
+Jashua Gupta I think we aren't talking about the same thing. When I was talking about corner cases I wasn't referring to the REST URLs; I was talking about figuring out why the REST API is slow in some situations, and when and why things are going wrong. CouchDB doesn't do a good job at creating really good URL schemes[1], but they are powerful enough.

So like the differences between document databases and relational databases, it sometimes requires changing how you do things.

For example, one option is to put type information in the schema of the document and have CouchDB enforce that[2]; that way you don't have as many different databases for your data and can even do cool things with map/reduce[3].

So /article/:aid/comment/:cid would become:
    {
        "commentid": ":cid",
        "type": "comment",
        "article": ":aid",
        "body": ""
    }

That's not the only way, but it's still pretty clean. I would recommend leaving the "_id" unset so you get a strong hash for it, to minimize the risk of collision.

Like I said, you can still keep different databases, but this is a very good way; it supports way more flexibility than a URL.
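
To have CouchDB enforce that schema, you add a validate_doc_update function to a design document; a rough sketch, with the exact field checks being assumptions:

    import requests

    COUCH = "http://127.0.0.1:5984"  # assumed local CouchDB

    # Reject comment documents that don't reference an article.
    requests.put("%s/main/_design/schema" % COUCH, json={
        "validate_doc_update": """
    function (newDoc, oldDoc, userCtx) {
        if (newDoc.type === 'comment' && !newDoc.article) {
            throw({forbidden: 'a comment must reference an article'});
        }
    }"""
    })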

[1] https://wiki.apache.org/couchdb/Rewriting_urls
[2] http://guide.couchdb.org/draft/validation.html
[3] https://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
+Kurt Williams Here are some good links:

But basically you have a couple of options for registering a user: one is with the built-in system, and then there is using either Facebook or Twitter:

After you have registered, you can use the built-in system, which provides a couple of ways of authenticating, like cookies, HTTP basic and OAuth. Or you can use Facebook and Twitter.
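
The cookie flow as a Python sketch (the user name and password are made up):

    import requests

    COUCH = "http://127.0.0.1:5984"  # assumed local CouchDB

    session = requests.Session()

    # Log in; CouchDB sets an AuthSession cookie on success.
    session.post("%s/_session" % COUCH,
                 json={"name": "olafur", "password": "secret"})

    # The cookie now authenticates subsequent requests.
    print(session.get("%s/_session" % COUCH).json()["userCtx"])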

The Facebook and Twitter login is provided by a plugin, and CouchDB can support whatever scheme you want to log in with if you are so inclined, but it "has" to be in Erlang.

That's authentication and registration; then there is the security of the databases. They are governed by roles, and each database can use a mix of usernames and roles to control read and write access.

For example, if you have a group working on something, then you create a role for that and assign it to the members that need to be in it and to the database that stores the data.

But that isn't enough; you also have to make sure that some people can't mess with certain fields, so you have write validation:

Hopefully soon, depending on how the merge is going, we will have read validations, but they might impact speed, so use them sparingly, like with write validation:

The best way is to split things up into different databases to enforce security.
+Ólafur Arason Oh, you were talking about performance. I was talking about validation, role-based permissions, column filtering etc. Yep, SQLAlchemy on its own can be pretty slow, but combining it with gevent (or any non-blocking library) gives it a huge performance boost.