Since I left my job as the IT Director/Director of Online Strategy at a small online archive last spring, I have been working on "what I want to be when I grow up"--what I want to do next. For me, the obvious next step is to find a way to apply what I have learned about new media, online collecting, the Semantic Web, and online community in some way that lets me push the boundaries of what can be done, again. Today's panel discussion, "Challenges of Linked Data" at MIT, part of an actual class, "Linked Data Ventures," was a great chance to recharge the batteries and do some thinking.
The proposition was that while Linked Data is growing in popularity, it is not growing nearly as quickly as other aspects of the web. The consensus seems to be that it is growing. Important foundations have been laid. Some of us are getting better at focusing on getting data on the web (as opposed to perfect, five-star LOD only). And there are some neat commercial opportunities out there even today.
Six panelists spoke (~10 min). I'm posting notes, pretty much as I tapped them out on my iPad. After questions, each speaker was given 30 seconds for a closing soundbyte:
Jim Hendler (http://www.cs.rpi.edu/~hendler/)--a lot of the problems aren't tech, but tooling. Didn't need ontology editor, needed something more sophisticated than Excel to map things. Wasn't yet business model for how to use Linked Data. OpenLink has data on over 1mil items, but every use still seems to need custom tools. Closing soundbyte: "Broad data" Look at the new ecosystem and new tools and go from there.
Helena Deus (https://docs.google.com/document/d/1zA66gsRsY14Sura9rFZ4Q6aRJ6JIMZmllkdBb5erukQ/pub)--provider of data for physicians. How to get info on which therapies work w/what genes, cancers, etc. Major data mess. Public structured info often of poor quality. Among other things, need provenance, rating. Academics? No problem. Doctors? Data =must= be good. But, how to get data from reputable sci journals into databases? In the meantime, quantity of data exploding exponentially--doubling every 8 mos. Also, SPARQL endpoints frequently down. We need to plug in analytics. So bridge from Life Sciences to Clinical Research: Linked Data+Analytics+Data Management+Trust Resolution. Ultimately, what defines linked data is how useful it is--it's a side effect. Closing soundbyte: Linked Data is about little data, all over the web.
David Karger (http://people.csail.mit.edu/karger/)--(If there was a panelist whose presentation resonated, Karger was the one--and it's no surprise to realize that he is involved with Exhibit, Simile, and other neat projects.) Remember, web started w/users, not business model. LOD demands too much. Demand for access via API stifles demand. We emphasize requirements, not benefits. Potential users must learn RDF, learn about triple stores, SPARQL, etc. And, after all of that, what do you get? Too little. Putting data on the web expresses too little of you. Compare Wikipedia vs Freebase. There aren't enough tools for consumers that do something useful that they want to do. How to help people get data into Excel and visualize. Suggest: Naked Data--praise people who put spreadsheets on web! Remove need for APIs. Ensure that sites have places to download data in tables. Also, provide neat tools for visualizing data. See "Exhibit" framework as one example. We also need applications that do interesting stuff w/LOD. Closing soundbyte: Perfect is the enemy of the good. Focus on short-term goal of two-star data, which is the prerequisite for better data.
Mona Vernon (http://gracehopper.org/2013/speaker/mona-vernon/)--End user vs Professional? Is all knowledge subjective? LD has to be ingestible by machine. Huge gap between how executives and techies understand Linked Data. Now that Gartner says this is $3T market, she can at least give executives something they understand. Closing soundbyte: We need to create economic incentives that make this data available.
David Wood (http://3roundstones.com/about-us/leadership-team/david-wood/)--Big difference between data and docs. We get docs. My 80-year-old mother understands docs. But, most people have no idea what to do w/data or what it is. Also, real data is dirty. And it will never be clean. To respond to Karger's idea of "naked data"--not all of us have pinup bodies. Context requires materialization of assumptions. And we're not going to get that universal context because there simply isn't enough time/money. Your average data warehouse exists because orgs have a ton of databases they can't cross-query. Huge commercial opp't'y for linked data. (audience member: R2ML one promising direction). Closing soundbyte: A little semantics goes a long way, and as these opportunities are explored, tools and opportunities will grow.
Tim Berners-Lee (http://www.w3.org/People/Berners-Lee/)--[I came in part because I had never heard Berners-Lee speak. Turns out, it is like listening to a stream-of-consciousness idea generator. Gushes of word clusters, sentences, sentence fragments. Hard to follow. That's okay. Had a great exchange after his talk with an audience member who wanted to have a definition of Linked Data in 25 words or less. Pointed them to a mug (http://www.cafepress.com/mf/62597433/5-star-linked-open-data_mugs) and its ~15 words.] Just because someone has created a triple store doesn't tell us anything useful. Speech not his medium? Look for TED or O'Reilly Strata talk on spreadsheets (http://strataconf.com/strata2014/public/schedule/detail/32009). To Tim, spreadsheets are a huge waste, but that's what we are comfortable using. What tools can be created to support LOD that a spreadsheet user can produce? Closing soundbyte: It's still about small pieces loosely joined.
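To make the spreadsheet-to-Linked-Data step a little more concrete, here is a minimal Python sketch (mine, not something any panelist showed) of how rows in a downloadable table could be turned into simple subject-predicate-object lines in N-Triples style. The example.org URIs, the column names, and the sample rows are all made up for illustration, and it assumes the ids and column headers are plain tokens that are safe to drop into a URI.

```python
import csv
import io

# Hypothetical base URIs, made up for this illustration.
BASE = "http://example.org/item/"
VOCAB = "http://example.org/vocab/"


def escape_literal(value):
    """Escape backslashes, quotes, and line breaks inside a literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text, id_column):
    """Turn each non-empty spreadsheet cell into one triple line."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<" + BASE + row[id_column] + ">"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the key column and empty cells
            predicate = "<" + VOCAB + column + ">"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


if __name__ == "__main__":
    # Made-up sample rows standing in for a downloaded spreadsheet.
    sample = "id,title,city\n1,Town map,Boston\n2,Harbor photo,\n"
    for line in csv_to_ntriples(sample, "id"):
        print(line)
```

Every value comes out as a plain string, so this is nowhere near five-star data. It only shows how little stands between a table on the web and something a triple store could ingest.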
Me: Do we actually have more semantic data than we acknowledge? Facebook's Graph Search. My site-in-development enabling search by location, association.... As several speakers pointed out--once the data are online as data (as opposed, say, to being locked in narrative text without machine-readable semantic context), the door is open to using it, building on that use, and more/better linked data.
Although speakers focused on spreadsheets as the typical "two-star data" online, it is easy to see some nice cycling happening, say, as I use existing microdata or microformat codes for my listings, events, etc. It is easy to see my data being expanded, reused, enhanced by others once it is accessible as data, leading to better data for me (and better ways to find things, overall, for everyone). Gotta stop now and get back to working out those new website tools....
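For the curious, here is a rough Python sketch of what marking up one event listing with microdata might look like. It uses the schema.org Event and Place types with their name, startDate, and location properties. The function name and the sample event are hypothetical, made up for illustration, and this is not the actual code behind my site.

```python
from html import escape


def event_microdata(name, start_date, place):
    """Render one event as HTML carrying schema.org Event microdata."""
    return (
        '<div itemscope itemtype="https://schema.org/Event">\n'
        f'  <span itemprop="name">{escape(name)}</span>\n'
        f'  <time itemprop="startDate" datetime="{escape(start_date)}">'
        f'{escape(start_date)}</time>\n'
        '  <span itemprop="location" itemscope '
        'itemtype="https://schema.org/Place">\n'
        f'    <span itemprop="name">{escape(place)}</span>\n'
        '  </span>\n'
        '</div>'
    )


if __name__ == "__main__":
    # Made-up event, for illustration only.
    print(event_microdata("Gallery talk", "2014-05-01", "Town Hall"))
```

The visible page stays the same for human readers, while the itemprop attributes give a crawler the name, date, and place as data rather than as narrative text.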
In the meantime, corrections, comments welcome.