How do you define #BigData?
By December 2013, "meaningless buzzword" would be the most honest answer, but before the label "big data" became simply another way to publish more books, wow investors, and dupe customers, it was actually trying to make a real distinction.
My short, quippy answer has generally been "more data than I can fit on my phone" (i.e., more than 48 GB serialized). +Tariq Khokhar told me on Twitter that he prefers "more data than I can pick up."
My longer, more considered answer is that big data is data that, because of its size rather than its nature, we can't handle using conventional tools like relational databases (or hierarchical databases, RDF triple stores, etc.). Relational databases can easily store, analyze, query, and report on millions or tens of millions of rows, so millions of rows isn't "big data." At billions of rows, you're starting to strain the capacity of current RDBMSs (you might have to partition your data in complex ways and then merge the results, for example), so I suspect that's when you start thinking about non-relational, distributed processing approaches like Hadoop.
Once you make the jump to "big data" you're giving up some valuable stuff like internal consistency, but you're gaining the ability to scale. Depending on what you're trying to accomplish, the answers you get over a slightly inconsistent dataset of 10 billion nodes could be more interesting than the answers you get over a consistent dataset of 10 million nodes.
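To make the "partition your data and then merge the results" workaround mentioned above a bit more concrete, here is a minimal Python sketch. It is only an illustration under assumptions of my own: the rows are simple (key, value) pairs, the shards are in-memory lists standing in for separate database partitions, and the function names (partition, partial_aggregate, merge, sharded_average) are hypothetical rather than part of any real database or Hadoop API. The point it tries to show is why the merge step gets fiddly: you can't just average the per-shard averages, so each shard has to hand back a count and a total.

```python
from collections import defaultdict


def partition(rows, num_shards):
    """Spread rows round-robin across shards (stand-ins for DB partitions)."""
    shards = [[] for _ in range(num_shards)]
    for i, row in enumerate(rows):
        shards[i % num_shards].append(row)
    return shards


def partial_aggregate(shard):
    """Per-shard work: keep a [count, total] pair for every key in the shard."""
    acc = defaultdict(lambda: [0, 0.0])
    for key, value in shard:
        acc[key][0] += 1
        acc[key][1] += value
    return acc


def merge(partials):
    """Combine the per-shard [count, total] pairs into overall averages."""
    merged = defaultdict(lambda: [0, 0.0])
    for part in partials:
        for key, (count, total) in part.items():
            merged[key][0] += count
            merged[key][1] += total
    return {key: total / count for key, (count, total) in merged.items()}


def sharded_average(rows, num_shards=4):
    """Average value per key, computed shard by shard and then merged."""
    shards = partition(rows, num_shards)
    return merge(partial_aggregate(shard) for shard in shards)


if __name__ == "__main__":
    rows = [("a", 1.0), ("b", 10.0), ("a", 3.0), ("a", 5.0), ("b", 20.0)]
    print(sharded_average(rows, num_shards=2))
```

The answer comes out the same however many shards you use, but only because the merge step was written with this particular question in mind. Each new kind of query needs its own partial result and its own merge logic, which is roughly the bookkeeping that distributed frameworks try to take off your hands.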
So that's my tentative answer, for today, anyway. What's yours?