Tariq Khokhar
Works at World Bank
Attended University of Cambridge
Lives in Washington DC
300 followers|20,843 views

Stream

 
 
How do you define #BigData?

By December 2013, "meaningless buzzword" would be the most honest answer, but before the label "big data" became simply another way to publish more books, wow investors, and dupe customers, it actually was trying to make a real distinction.

My short, quippy answer has generally been "more data than I can fit on my phone" (i.e. more than 48GB serialized). +Tariq Khokhar told me on Twitter that he prefers "more data than I can pick up."

My longer, more considered answer is that big data is data that -- because of its size, not its nature -- we can't handle using conventional tools like relational databases (or hierarchical databases, RDF triple stores, etc.). Relational databases can easily store, analyze, query, and report on millions or tens of millions of rows, so millions of rows isn't "big data." At billions of rows, you're starting to strain the capacity of current RDBMS -- you might have to partition your data in complex ways and then merge the results, for example -- so I suspect that's when you start thinking about non-relational, distributed hashing approaches like Hadoop.
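The "partition your data and then merge the results" step mentioned above can be illustrated with a small sketch. This is only an assumption about what such partitioning might look like in miniature, not how any particular RDBMS or Hadoop does it: the function names (`shard_for`, `partition`, `partial_sum`, `merge`) and the sample rows are hypothetical, and the shards here are plain in-memory lists rather than separate machines.

```python
from collections import Counter
from zlib import crc32


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard with a stable hash (crc32), so a key always maps to the same shard."""
    return crc32(key.encode("utf-8")) % num_shards


def partition(rows, num_shards):
    """Split (key, value) rows into num_shards lists by hashing the key."""
    shards = [[] for _ in range(num_shards)]
    for key, value in rows:
        shards[shard_for(key, num_shards)].append((key, value))
    return shards


def partial_sum(shard):
    """Aggregate one shard independently; on a real cluster this would run per node."""
    totals = Counter()
    for key, value in shard:
        totals[key] += value
    return totals


def merge(partials):
    """Combine the per-shard aggregates into one result."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts rather than replacing them
    return merged


# Hypothetical sample data: (key, value) pairs.
rows = [("TZ", 3), ("KE", 5), ("TZ", 4), ("UG", 1), ("KE", 2)]
shards = partition(rows, num_shards=3)
result = merge(partial_sum(shard) for shard in shards)
```

Because each key hashes to exactly one shard, the per-shard sums never overlap and the merge is a simple addition. Queries that need to see rows from several shards at once are where the extra complexity comes in.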

Once you make the jump to "big data" you're giving up some valuable stuff like internal consistency, but you're gaining the ability to scale. Depending on what you're trying to accomplish, the answers you get over a slightly inconsistent dataset of 10 billion nodes could be more interesting than the answers you get over a consistent dataset of 10 million nodes.

So that's my tentative answer, for today, anyway. What's yours?
David Megginson's profile photoScott McQuade's profile photo
5 comments
 
Fair point, though I expect the 'big data' tools will also prove useful for smaller, complex tasks. Hopefully NSA-style tools will trickle down and find a home in other places.

Tariq Khokhar

Shared publicly  - 
 
Hey Alex - I went through the same decision process a few months ago: I ended up with a 15" non-Retina MBP. My reasons for going for it:

- hi-res matte (not glossy) display option, which I find preferable to the Retina for 95% of the coding, writing and a/v work I do. Movies look nicer on the glossy Retina but that's about it for me.

- I want fast AND plentiful storage. I replaced the internal HDD with a (relatively) inexpensive Samsung 256GB SSD and swapped the optical drive with a caddy to house the 1TB HDD that came with it: result is a super-fast machine with heaps of storage.

- overall, with 16GB of third-party RAM and the SSD switcheroo + AppleCare, the non-Retina machine comes out a few hundred bucks cheaper than the closest configured Retina. As a bonus I could use my existing MagSafe power adapters.

Good luck!
 
OK, techie friends: if you were going to buy a new Apple laptop, what would it be -- and why? I want a 15" screen, so the Air is out. What's the general consensus on Retina vs non-Retina machines?
http://www.apple.com/why-mac/compare/notebooks.html
Alexander Howard's profile photo
 
Super helpful! Thx.

Tariq Khokhar

Discussion  - 
 
Hi Folks - thought you may be interested in this event today: "What Happens When Big Data Meets Official Statistics?" Live webstream today from the +World Bank at 1430 EST / 1930 GMT. #bigstats   

Tariq Khokhar

Shared publicly  - 
 
The UNDP just launched an open data site for its projects and operations. Here's my take - what do you think?
Anne-Marie Soulsby's profile photo
 
Hi Tariq,
I think it's a great step forward; however, what I am trying to develop is a map of open source data regarding climate change, including projects, initially in Tanzania. This will use both open and crowdsourced data, and even potentially give citizens the ability to rate the success of each project, link NGOs together to support projects and also local solutions that can be replicated with the assistance of funding. Take a look at www.conservationinteraction.org. Asante!

Tariq Khokhar

Shared publicly  - 
 
How can cutting-edge technology be applied to financial data to fight fraud and corruption? Uber geeks from Transparency International, Palantir and the World Bank star in this exciting-looking lunchtime (with lunch!) event hosted by the World Bank Finances team here in DC next Wednesday, September 5th. Come!

Tariq Khokhar

Shared publicly  - 
 
New data show the global child (under-five) mortality rate has dropped 47 percent since 1990: http://blogs.worldbank.org/opendata/global-child-mortality-rates-have-halved-1990-s-not-enough-meet-mdg-target
Sebastien Racaniere's profile photoMaurice Nsabimana's profile photo

Tariq Khokhar

Shared publicly  - 
 
Just wrote a little overview of accessing +World Bank Data in Python, R, Ruby and Stata - keen to hear about any other language-specific libraries out there!
Tariq Khokhar's profile photoJohn Wesonga's profile photo
2 comments
 
+John Wesonga that's awesome - just spotted it! https://github.com/johnwesonga/wbdata 

Tariq Khokhar

Shared publicly  - 
 
"What Happens When Big Data Meets Official Statistics?" Live webstream today from the +World Bank  at 1430 EST / 1930 GMT. #bigstats  
Bernard Tremblay's profile photo
 
+Eric Kavanagh Synchronicity, nae? Found this immediately following BriefingRoom!

Tariq Khokhar

Discussion  - 
 
What Happens When Big Data Meets Official Statistics? Live webstream today from the +World Bank at 1430 EST / 1930 GMT. #bigstats  
Tariq Khokhar's profile photoSebastien Racaniere's profile photoJason Kolb's profile photo
5 comments
 
Hmmmm... bitcoin mining is far too efficient with ASICs these days. Plus we have already built bitcoin miners at Maxeler (the company I currently work for). I don't have a traditional supercomputer at hand, but we do build our own breed of supercomputers using FPGAs. Right now, I'm testing the waters for interesting big data problems. Your post on "How can cutting edge technology be applied to financial data to fight fraud and corruption?" points in one interesting direction.

Tariq Khokhar

Shared publicly  - 
 
+World Bank Executive Directors Name Dr. Jim Yong Kim 12th President of the World Bank Group: http://t.co/P0VR3I4L Zoellick Congratulates Dr. Kim: http://t.co/c4bgwLiI

Tariq Khokhar

Shared publicly  - 
 
The 10 design principles behind the UK's new central digital hub - GOV.UK.

This is a must-read post on #opendata and #opengov - it distills both good practice from an agile & participatory development perspective and captures the philosophy of open government in an actionable nutshell.
People
Have him in circles
300 people
Katherine Lucey's profile photo
Jason Goodrich's profile photo
Myrna Machuca-Sierra's profile photo
Aleem Walji's profile photo
Education
  • University of Cambridge
Basic Information
Gender
Male
Work
Employment
  • World Bank
    Data Scientist, present
Places
Currently
Washington DC
Links