Dhruba Borthakur: "The two experiments show that HBase+HDFS, as it stands today, will not be able to harness the full potential that is offered by SSDs. [...] Experiments on other non-Hadoop databases show that they also need to be re-engineered to achieve SSD-capable throughputs. My conclusion is that database and storage technologies would need to be developed from scratch if we want to utilize the full potential of Solid State Devices."

But then consider "The Bleak Future of NAND Memory": http://static.usenix.org/events/fast/tech/full_papers/Grupp2-8-12.pdf

I think OLTP will fit under this ceiling, but we may be stuck with spinning rust for so-called "Big Data" cases. Some kind of mixed-use deployment for HBase may make sense.

Consider an HDFS that adds storage device type information to volume and block metadata, along with an extension of the HDFS API for specifying storage device affinity. Then we might see an HBase that stores bulk data on spinning media but writes WALs, flush files, and other short-lived, frequently accessed objects to SSD. That placement, plus the (theoretical) re-engineering Dhruba talks about, may yield enough benefit to make a hybrid Hadoop+HBase+SSD+SATA storage architecture make sense.
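A minimal sketch of the idea, in Java since that is the Hadoop ecosystem's language. Everything here is hypothetical: the `StorageType` and `FileRole` names and the classification rule are illustrations of the proposed affinity hint, not any existing HDFS or HBase API.

```java
// Hypothetical sketch: storage-type affinity for HBase files on a
// storage-aware HDFS. None of these types exist in HDFS today.
public class StorageAffinitySketch {

    // Device class a client could request when creating a file.
    enum StorageType { SSD, SPINNING, ANY }

    // Rough classification of HBase file roles by lifetime and access pattern.
    enum FileRole { WAL, FLUSH_FILE, COMPACTED_STOREFILE, BULK_DATA }

    // Short-lived, frequently accessed objects go to SSD;
    // large, long-lived data stays on spinning media.
    static StorageType preferredStorage(FileRole role) {
        switch (role) {
            case WAL:
            case FLUSH_FILE:
                return StorageType.SSD;
            case COMPACTED_STOREFILE:
            case BULK_DATA:
                return StorageType.SPINNING;
            default:
                return StorageType.ANY;
        }
    }

    public static void main(String[] args) {
        for (FileRole role : FileRole.values()) {
            System.out.println(role + " -> " + preferredStorage(role));
        }
    }
}
```

The point of the sketch is that the policy lives with the client (HBase knows which files are hot and short-lived), while HDFS only needs to expose the device-type hint and honor it during block placement.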

The notion of optimizing database technology for SSDs reminds me of "System Co-Design and Data Management for Flash Devices", a tutorial presented at VLDB 2011 last year: http://www.vldb.org/2011/files/slides/tutorials/tutorial2.pdf and the FlashStore paper from VLDB 2010: http://www.vldb.org/pvldb/vldb2010/papers/I04.pdf