Some basic definitions related to schema.org

I'm reaching out to everybody - and especially the semantic technology illuminati here - to help me finalize some definitions related to schema.org and its.

I'm relatively confident that my terminology is correct, but I really, really want to nail these for something - well several things - I'm writing.  I'd like to record and put all these terms in the "let us never speak of these again" category. :)

True or false:  *schema.org is a vocabulary*?

The schema.org site calls itself "a collection of schemas, i.e., html tags" and "a structured data markup schema," and says that "the schemas are a set of 'types', each associated with a set of properties." How does a "schema" differ from a vocabulary? Is a "schema" in this context the combination of a vocabulary and a protocol for its use in HTML? Or in the context of the site quotes above, does schema simply and straightforwardly mean "XML schema" in the Wikipedia sense (http://bit.ly/18d5pHz)?

And even if it's not the full answer, is it actually wrong to call schema.org a vocabulary?

True or false:  *microdata, RDFa and RDFa Lite are each an instance of a syntax*?

As in, "you can use any of these syntaxes to mark up HTML with schema.org."

And is it correct to say, more specifically, that each of these is an attribute-based HTML markup specification?

Finally, true or false, JSON-LD is a  method of transmitting linked data using JavaScript?

I know that JSON itself is a standard for data interchange using JavaScript, that it's used for the serialization and transmission of structured data, and that JSON-LD is a JSON-based format to serialize linked data. So as much as it's a mouthful to say "JSON-LD is a way of serializing schema.org data," is it wrong?

And because it is a mouthful, in this same schema.org context is it correct to say that JSON-LD - at the simplest level - is a method of transmitting schema.org information without using HTML (yes, I know, technically without using XML)?

Many, many thanks for any responses, and by extension helping more people understand these concepts better:  my well-intentioned search marketing colleagues often get these wrong, and I want to be confident when I reach out to them.

cc: +Kingsley Idehen +Dan Brickley +Gregg Kellogg +Manu Sporny 
 
Hi Aaron, here are my opinions:

* True or false:  *schema.org is a vocabulary*?

True; although schema.org is based on several different vocabularies, it is entirely self-consistent, which I think is a necessary criterion for defining a vocabulary.

* True or false:  *microdata, RDFa and RDFa Lite are each an instance of a syntax*?

Well, they are certainly syntaxes, although micro-syntax might be better, as HTML is really the syntax at play. RDFa Lite is really just a publishing profile for RDFa, so you can't really call it a separate syntax. And, of course, RDFa is a concrete RDF syntax. Microdata is not an RDF syntax, but as you know, it can be used to extract RDF.

Meta-question, is schema.org an RDF vocabulary? I believe so.

* And is it correct to say, more specifically, that each of these is an attribute-based HTML markup specification?

Mostly; RDFa does stand for RDF in Attributes, but there are some cases where element content can be used as part of the markup. Pretty much the same thing for Microdata.
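To make the "mostly attribute-based" point concrete, here is a minimal sketch in Python using only the standard library's html.parser. The HTML snippet, the person, the example.com URL and the MicrodataSketch class are all hypothetical, and this is not a conforming microdata processor (it ignores nesting, itemref and most of the value rules). It only shows that the type and property names live in attributes, while a value may come from another attribute (href) or from the element content.

```python
from html.parser import HTMLParser

# Hypothetical snippet: a schema.org Person marked up with microdata attributes.
HTML = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <a itemprop="url" href="http://example.com/jane">home page</a>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Toy extractor for a single, flat microdata item (illustration only)."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._pending = None  # itemprop whose value is the element's text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            # The item's type is carried by the itemtype attribute.
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "a" and "href" in attrs:
            # For links, the value is the href attribute.
            self.properties[prop] = attrs["href"]
        else:
            # Otherwise the value is the element content that follows.
            self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.properties[self._pending] = data.strip()
            self._pending = None


parser = MicrodataSketch()
parser.feed(HTML)
parser.close()
print(parser.item_type)
print(parser.properties)
```

Here "name" takes its value from the text inside the span, while "url" takes its value from an attribute, which is the kind of exception described above.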

* Finally, true or false, JSON-LD is a  method of transmitting linked data using JavaScript?

False, JSON-LD is a means of transmitting linked data in JSON. Note that JSON is not JavaScript; it has its origin there (JavaScript Object Notation), but it is not bound to JavaScript at all. My own implementation, for example, is in Ruby. JSON is a common format for transmitting data through web services, in a similar way that XML is, but obviously much hipper :).

* So as much as it's a mouthful to say "JSON-LD is a way of serializing schema.org data," is it wrong?

I'd say that JSON-LD can be used to serialize data using the schema.org vocabulary, but it's not limited to schema.org. JSON-LD is another concrete RDF syntax, like RDF/XML, Turtle and RDFa. It's most interesting because it is based on JSON, which is widely used for web services. A design goal of JSON-LD is to not let the RDF bits get in the way, or even be noticed as being RDF at all. To developers, it should just look like JSON, and they can use the same vocabulary terms defined in schema.org, or any other well-defined vocabulary (or vocabularies).
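As a rough illustration of "it should just look like JSON", the sketch below reads a JSON-LD document with Python's standard json module, with no JavaScript and no RDF toolkit involved. The document, the person and the example.com identifier are made up for the example. The naive_triples helper is hypothetical and deliberately simplistic: it assumes a flat document whose context contains only "@vocab", and it is not the JSON-LD expansion algorithm from the specification.

```python
import json

# Hypothetical JSON-LD document describing a person with schema.org terms.
DOC = """
{
  "@context": {"@vocab": "http://schema.org/"},
  "@id": "http://example.com/jane",
  "@type": "Person",
  "name": "Jane Doe",
  "jobTitle": "Editor"
}
"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# To a developer this is ordinary JSON: parse it and read keys as usual.
data = json.loads(DOC)
print(data["name"])


def naive_triples(doc):
    """Turn a flat JSON-LD object into subject/predicate/object tuples.

    Assumption: the context holds only "@vocab" and every value is a plain
    string. Real JSON-LD processing handles far more than this.
    """
    vocab = doc["@context"]["@vocab"]
    subject = doc["@id"]
    triples = [(subject, RDF_TYPE, vocab + doc["@type"])]
    for key, value in doc.items():
        if not key.startswith("@"):  # skip JSON-LD keywords
            triples.append((subject, vocab + key, value))
    return triples


triples = naive_triples(data)
for triple in triples:
    print(triple)
```

The same keys that read as plain JSON ("name", "jobTitle") become full schema.org property URIs once the context is applied, which is where the RDF view comes from.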
 
Thanks +Gregg Kellogg - I really appreciate your feedback, and particularly your clarification regarding JSON/JSON-LD.
 
Broadly agree with Gregg. Am on phone so can't type much now.

Sometimes we talk like schema.org is one big schema; sometimes as if it were several. This is because it has an associative, network structure. You can see similar ambiguity about how other networks are discussed.

The word 'vocabulary' emphasises description and communication. The word 'schema' emphasises data structures, databases. Unlike XML schemas, RDF-based schemas are closer to dictionaries than to grammar rules. They document the meaning and inter-relationship of descriptive terms rather than police strongly how you must use them. 
 
Expressed in Turtle (which works better than natural language prose for me, these days):

<#SchemaOrg> 
a <#DefinitionsCollections> ;
<#comment> """A collection of definitions for terms that aid structured data representation. Each term is grounded in a namespace (DNS name component of a URI/URL)""" ;
<#label> "Schema.org" ;
<#homePageURL> <http://schema.org> . 

<#JSON-LD> 
a <#JSONBasedNotation>, <#ConcreteJSONBasedRDFSyntax> ;
<#label> "JSON-LD" ;
<#comment> """A JSON based Notation for constructing RDF model and abstract syntax compatible Structured Data and/or Linked Data aimed at JavaScript developers. You can also embed JSON-LD based structured data islands in HTML documents using the <script/> tag""" . 

<#Turtle>
 a <#ConcreteRDFSyntax> ;
<#label> "Turtle" ;
<#comment> """A Notation for constructing RDF model and abstract syntax compatible Structured Data and/or Linked Data. You can also embed Turtle based structured data islands in HTML documents using the <script/> tag""" . 


<#Microdata> 
a <#HTML5Notation>;
<#label> "Microdata" ;
<#comment> """An HTML5 based Notation for constructing structured data islands within HTML5 documents. These structured data islands are Entity->Attribute->Value based and compatible with the basic RDF model's abstract Subject->Predicate->Object syntax. Basically, you can easily produce RDF and RDF based Linked Data from this form of structured data""" .

<#RDFa> 
a <#ConcreteXHTMLBasedRDFSyntax>;
<#label> "RDFa" ; 
<#comment> """An (X)HTML based Notation for constructing RDF and Linked Data oriented structured data islands within HTML documents""" .

<#LinkedData> a <#HyperlinkEnhancedStructuredDataRepresentationTechnique>;
<#label> "Linked Data";
<#comment> """A principled approach to structured data representation whereby resources (document content) become web-like (or webby) due to the use of hyperlinks (resolvable URIs e.g. HTTP URIs) as a mechanism for naming (denoting) entities and relations (sets of entity relationships represented in object->relation->object, entity->attribute->value, or subject->predicate->object based 3-tuples or triples)""" .


#LinkedData   #SchemaOrg   #JSONLD   #XHTML   #HTML5     
 
<#Thanks>
a <#ExpressionOfGratitude>;
<#label> "Thanks" ; 
<#comment> """Appreciate the contribution +Kingsley Idehen :)""" .
 
Thanks +Dan Brickley ... very helpful.

I find it interesting that we really don't have a vocabulary (ha) to describe many fundamental semantic web technologies, and must instead use long, descriptive compound phrases - unlike, say, describing Windows ("an operating system") or PHP ("a scripting language").

One finds qualified "methods" and "approaches" and "sets" predominating. Wikipedia will tell you that microformats are "a web-based approach to semantic markup"; schema.org is an "initiative"; RDF "a family of W3C specifications"; OWL "a family of knowledge representation languages"; RDF Schema "a set of classes".

Which all explains why it's easy for the less technically inclined (often myself) to get confused when speaking of these technologies - they tend to be groupings of technologies, where the component parts are often themselves collections (sets, families, etc.) or are somehow difficult to nail down (method, approach, etc.).

Not a complaint, just a rumination on the reality of these technologies and how they're connected. Though I'll note I've probably never said to myself, "I'll just mark up this code with an initiative." :)
 
Often a name is used for a project (microformats, schema.org, Dublin Core), aka an initiative; but there'll also be an associated vocabulary/schema (hCard, XFN, etc.; schema.org's schema(s); the DC elements/terms); and often each project has historically had its own ideas about how to structure the data, although things have finally settled down towards object/property/value graphs...
 
Quite so +Dan Brickley - and this also highlights (healthfully, IMO) how so many of these technologies were developed collaboratively, as opposed to typical product development (Microsoft didn't call the Zune the Music Player Initiative ... though perhaps they should have).

And certainly object/property/value graphs have become the tie that binds - not to mention the overarching subject/predicate/object model without which I very much doubt Kingsley would be conversing in Turtle. :)