I'm giving a talk about "Lessons Learned from Developing Apache Hadoop" at UC Irvine tomorrow.
Hadoop is quickly becoming the foundation for big data processing in
enterprises and has spread from its original web adopters (Yahoo,
Facebook, Amazon, Google, LinkedIn, eBay, Twitter, Netflix, Alibaba) to
other large tech companies (Apple, Microsoft, IBM, Adobe, Intel, AMD,
NetApp, EMC, Cisco, Nokia, AOL), startups (Dropbox, Foursquare,
StumbleUpon, Zillow, Match.com, Rackspace), government (CIA, NSA, Booz Allen Hamilton), retailers (Sears, Walmart, Gap), and financial companies (JP Morgan, MasterCard).

I will summarize the history of Hadoop and discuss some of the right decisions and big wins, as well as some mistakes and lessons learned. For example, releasing early and often, and evolving quickly in the early days, forced the project to stay focused on the continuous improvements that mattered to users. On the other hand, when we wanted to improve the MapReduce API we built a completely new one; it is an improvement, but now we have to support both APIs indefinitely. I'll also discuss some of the current challenges and how we are working on them.