New blog post: "SPARQL and Big Data (and NoSQL)" Please add any comments here. http://www.snee.com/bobdc.blog/2012/10/sparql-and-big-data-and-nosql.html
- Much of this is how we think as well. With our solution we try to find the sweet spot between Big Data, NoSQL, and SemWeb standards like the SPARQL query language. We also believe that the lack of a standardized query language is a big (and not widely discussed) problem in the NoSQL scene; just look at what happened to object-oriented databases. Enterprise customers especially are known to dislike vendor lock-in, which is what happens with a DB that uses a non-standard query language or API, and when they choose a standard like SPARQL they avoid just that. Nov 12, 2012
- I see you mention Neo4j, which is a great graph DB. I did a bit of benchmarking on Neo4j as a triple store (comparing it with the OpenRDF Sesame native store and another NoSQL graph DB, OrientDB).
The TinkerPop Blueprints API provides an RDF API on top of Neo4j and OrientDB.
This appeared to me to be a very promising alternative for the storage layer of triple stores. Unfortunately, the first figures give the advantage to the OpenRDF Sesame native store, which is not among the strongest players in terms of performance; Neo4j comes second (3 minutes behind), and OrientDB is last, with very poor performance (19 minutes behind) and bugs.
These measurements are times for loading TriG files. The same performance differences are visible when measuring query response time.
We still need to wait for improvement in this field! Nov 29, 2012
- Nov 29, 2012
- Well, after some tinkering and a version upgrade (https://groups.google.com/forum/?fromgroups=#!searchin/neo4j/blueprints/neo4j/g8bV8w3LH9E/WIgx5GP14KAJ) it turns out that Neo4j coupled with Blueprints + GraphSail performs quite well.
To me this is good news because, paradoxically enough, SPARQL 1.1 can tell you whether two concepts are connected, but it cannot return the path that connects them. Neo4j, being first and foremost a graph DB, can do path search. I've heard, though, that Virtuoso has an extension to do this. Thanks for the links. Jan 10, 2013
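To make the distinction above concrete, here is a minimal sketch in plain Python (not Neo4j's, Virtuoso's, or any SPARQL engine's actual implementation). The toy graph, the node names, and both function names are assumptions made up for illustration. `is_connected` mimics what a SPARQL 1.1 property path such as `ASK { :Alice :knows+ :Dave }` can answer (yes or no, with a hypothetical `:knows` predicate), while `shortest_path` returns the intermediate nodes, which is the kind of result a graph DB path search gives you.

```python
from collections import deque

# Hypothetical toy graph: each node maps to the nodes it links to directly.
EDGES = {
    "Alice": ["Bob"],
    "Bob": ["Carol"],
    "Carol": ["Dave"],
    "Eve": ["Alice"],
    "Dave": [],
}


def is_connected(edges, start, goal):
    """Yes/no reachability over one or more edges (like a `+` property path)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def shortest_path(edges, start, goal):
    """Breadth-first search that also remembers how each node was reached.

    Returns the list of nodes from start to goal, or None if unreachable.
    """
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in edges.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    print(is_connected(EDGES, "Alice", "Dave"))
    print(shortest_path(EDGES, "Alice", "Dave"))
```

The only difference between the two functions is the `parents` bookkeeping: the reachability check throws away how it got to each node, and that discarded information is exactly the path.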
- Very good post, particularly as it (and the responses to it) bear out some things I've suspected for a while. The first is that much of the XML community has been quietly migrating to RDF/OWL/SPARQL for the last couple of years. I suspect this is primarily because most of us are now thinking about big data systems and dealing with large-scale heterogeneous XML groves (which is essentially what an XML DB is, after all), and we see relational mapping and inferential processing as the biggest bottleneck that emerges when you deal with large numbers of XML documents in a data store.
Hadoop has a critical role in ETL, but I see a systematic progression in the Big Data space. Hadoop and other M/R solutions are reasonably good for taking unstructured (unmarked-up or minimally marked-up) content, performing entity extraction, document enrichment and other NLP processing, and then indexing that content. What typically happens at that point is that each community's preferences for data storage get in the way: the RDBMS types try to get the data into relational tables, the XML and JSON document types are more concerned with extracting and constructing discrete data property bundles, while the RDF/OWL folk are more concerned with working with assertions and relational bindings.
The central challenge that RDF has always faced is in decomposing and binding entity relationships, which are usually harder to extract than properties are for a given set of information from raw data. Where SPARQL 1.1 shines is that with a sufficiently large volume of information, a lot of the implicit relationships can be made explicit and consequently can themselves be objectified as entities. Here I think is one place where Hadoop and SPARQL can work effectively with one another, as the process of objectifying those relationships is asynchronous, distributed and likely near continuous.
I'm not that worried about the NoSQL crowd - in many cases, the people that are working with JSON stores (and similar hash stores) are now dealing with the same domain that XML developers have dealt with for the last decade, and will eventually come to the same conclusion that relationships are as significant as properties. At that point, SPARQL is there, it's JSON friendly, and it solves the more complex problems that arise from relational logic that many web developers generally do not encounter because of scope - they're working at the instance level, rather than at the aggregate. Feb 28, 2013
- I think PigSPARQL (http://www.iswc2013.semanticweb.org/content/posters/16) provides a SPARQL interface to big data processed with Hadoop MapReduce. Apr 9, 2014