Iterate fast and measure everything!
I started work on an idea that had been bugging me for some time: http://evaluat3.me
It was essentially born out of a personal and mainly vain need: I wanted to know what my co-workers and ex-coworkers thought about my work.
Of course I can’t really ask them; most are polite, nice people, so they won’t really critique me. While I don’t think there was too much to diss about my work, I was pretty sure I wouldn’t get unfiltered feedback if I just asked. So I built a platform that let them answer specific questions about my work, anonymously. I put a lot of effort into making sure not even I (the developer of the app) could trace responses back to them: one-way encryption and no identifiable info in the response data.
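The anonymity idea can be sketched in a few lines. This is a hedged illustration, not the actual evaluat3.me code: store only a one-way digest of whatever identifier the invite used, discard the identifier itself, and the stored responses carry nothing that maps back to a person.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch (not the real evaluat3.me code): keep only a one-way
// SHA-256 digest of the invite token next to a response and throw the token
// away, so nothing in the response data maps back to a respondent.
public class Anonymizer {
    public static String anonymize(String inviteToken) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(inviteToken.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required by the JVM spec", e);
        }
    }
}
```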
In discussing evaluat3.me with other people, I got the impression there was some external interest in this type of thing. Weekend and week-night coding started. Minor skirmishes with my wife ensued. Finally I got a POC done to test some of my assumptions and sent out surveys to my peers.
The first thing to measure was how many people would actually answer an evaluation. I sent out a survey about my work and got much better numbers than I expected: about 70% of those asked responded. Great! Next step: an alpha/beta version and signups.
For tracking, evaluat3.me has both http://mixpanel.com and Google Analytics enabled. Mixpanel was chosen because it is realtime. I instrumented everything! A feedback sent, a click on a button, a click on a link, a click on a field: if you farted, I would know in realtime. In the Mixpanel version I was using then, the user stream gave each visitor a funny name: “Theta Omicron” just let one out, for example.
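"Instrumenting everything" server-side amounts to posting an event per action. Mixpanel's HTTP tracking endpoint takes a base64-encoded JSON payload; the event name and property keys below are my own examples, not taken from evaluat3.me.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of server-side Mixpanel instrumentation: build the base64-encoded
// JSON payload that goes in the "data" parameter of a request to the /track
// endpoint. "feedback_sent" and the property names are illustrative.
public class Tracker {
    public static String trackPayload(String event, String projectToken, String distinctId) {
        String json = String.format(
            "{\"event\":\"%s\",\"properties\":{\"token\":\"%s\",\"distinct_id\":\"%s\"}}",
            event, projectToken, distinctId);
        return Base64.getEncoder().encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }
}
```

A call like `trackPayload("feedback_sent", projectToken, visitorId)` produces the encoded string to send; in practice the client-side Mixpanel snippet does all of this for you.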
Pain points were pretty clear right away, and I modified and redeployed in a couple of hours. It is relevant to mention that I am using the Play framework (Scala/Java): http://www.playframework.org/2.0 and EC2 for development, so iterations go really fast.
The subsequent iterations used an A/B split-testing framework (Optimizely) to measure the relative success of any change.
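Optimizely handles the split client-side; purely to illustrate the underlying idea, here is a sketch of deterministic bucketing, where a visitor id always hashes to the same variant so returning visitors see a consistent version.

```java
// Sketch of the idea behind A/B bucketing (not Optimizely's implementation):
// hash the visitor id into one of two stable buckets.
public class Splitter {
    public static String variant(String visitorId) {
        // floorMod keeps the bucket non-negative even for negative hash codes.
        return Math.floorMod(visitorId.hashCode(), 2) == 0 ? "A" : "B";
    }
}
```

With enough traffic the buckets come out roughly even, and each variant's conversion rate can be compared per bucket.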
I made a final iteration using a lot of the info I had been gathering and left a comment on a very relevant article linking back to the site. Both traffic and signups grew substantially! At this point it became quite difficult to do redeployments: my single EC2 instance would have to go down for a minute and then come back up, but I could see in the realtime stream (and my logs) that I had users practically all the time, from multiple regions. A fix would be to proxy requests on the machine itself to two instances of Jetty (still evaluating it) and have the proxy check the health of the instances. This would let me bring one instance down and deploy to it while the other handles the traffic.
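That two-instance setup can be sketched as an nginx config; the ports are hypothetical, and the passive health-check parameters (`max_fails`, `fail_timeout`) take a backend out of rotation while it is being redeployed.

```nginx
# Hypothetical sketch: proxy to two local Jetty instances; when one is down
# for a redeploy, nginx marks it failed and sends traffic to the other.
upstream jetty_pool {
    server 127.0.0.1:9000 max_fails=1 fail_timeout=10s;
    server 127.0.0.1:9001 max_fails=1 fail_timeout=10s;
}

server {
    listen 80;
    location / {
        proxy_pass http://jetty_pool;
    }
}
```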
Because I had a lot of data on traffic spikes, and I’m an optimist, I decided to do it the right way: an auto-scaling farm based on Scalr. The current setup allows me to hot deploy and handles traffic spikes really well (and really cheaply!). Getting things going has been almost entirely a direct effect of measuring as much as I can and deploying changes as quickly as possible. I constantly think to myself: I love the fact I live in the future. With all the tools available today, there is no reason not to code this way (in most contexts; not hospitals or aeroplanes).