Topic worth looking into.
Gregg Kellogg originally shared:
 
Things people get wrong in RDFa markup

Lately, I've been looking at a lot of both RDFa- and Microdata-formatted HTML. There are a number of things that authors (even experts) regularly get wrong:

@src and @rel attributes create reverse relation

Having code such as the following:

<body about="">
<img src="image.jpg" rel="icon"/>
...
</body>

You'd think that this would indicate that the icon for the document is <image.jpg>, but it actually says <image.jpg> xhv:icon <> . The reason for this is lost in the haze of history, but people regularly get this wrong. To get what you need, consider something like the following markup:


<body about="">
<span rel="icon"><img src="image.jpg"/></span>
...
</body>

@rel and @typeof and/or @about shouldn't be on the same element

Another common mistake is markup such as the following:

<body vocab="http://schema.org/" about="">
<div rel="mainContentOfPage" about="#me" typeof="Person">
<p>Name: <span property="name">Gregg Kellogg</span></p>
<p>Knows: <a href="http://greggkellogg.net/#me" rel="knows">Myself</a></p>
</div>
</body>

Placing @rel and @about or @typeof on the same element means that @about/@typeof identify the subject, not the object, of the relation. To get the desired effect, use @resource; however, this does not let you set the type of the object resource. Alternatively, use the following type of markup:


<body vocab="http://schema.org/" about="">
<div rel="mainContentOfPage">
<div about="#me" typeof="Person">
<p>Name: <span property="name">Gregg Kellogg</span></p>
<p>Knows: <a href="http://greggkellogg.net/#me" rel="knows">Myself</a></p>
</div>
</div>
</body>
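
Both pitfalls above are attribute combinations on a single element, so they can be spotted mechanically. The following is a minimal sketch of such a check using Python's standard html.parser; the class name RdfaPitfallLinter, the lint helper, and the warning strings are hypothetical and not part of any RDFa tool. It only flags the combinations and does not parse RDFa.

```python
from html.parser import HTMLParser


class RdfaPitfallLinter(HTMLParser):
    """Hypothetical helper: flags @rel used with @src, @about or @typeof."""

    def __init__(self):
        super().__init__()
        self.warnings = []  # list of (line, tag, message) tuples

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        names = {name for name, _ in attrs}
        if "rel" not in names:
            return
        line, _ = self.getpos()
        if "src" in names:
            self.warnings.append((line, tag, "@rel with @src"))
        if names & {"about", "typeof"}:
            self.warnings.append((line, tag, "@rel with @about/@typeof"))


def lint(html):
    """Return the warnings for one HTML string."""
    linter = RdfaPitfallLinter()
    linter.feed(html)
    linter.close()
    return linter.warnings


if __name__ == "__main__":
    sample = '<body about="">\n<img src="image.jpg" rel="icon"/>\n</body>'
    for line, tag, message in lint(sample):
        print(f"line {line}: <{tag}> {message}")
```

Markup that wraps the element in a separate @rel container, as in the corrected examples, produces no warnings.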

Another commonly misunderstood point is that the document order of statements within an HTML document is not significant when creating a list of resources. Consider the following example from schema.org/MusicPlaylist:


<div itemscope itemtype="http://schema.org/MusicPlaylist">
<span itemprop="name">Classic Rock Playlist</span>
<meta itemprop="numTracks" content="5"/>

<div itemprop="tracks" itemscope itemtype="http://schema.org/MusicRecording">
1.<span itemprop="name">Sweet Home Alabama</span> -
<span itemprop="byArtist">Lynard Skynard</span>
</div>

<div itemprop="tracks" itemscope itemtype="http://schema.org/MusicRecording">
2.<span itemprop="name">Shook you all Night Long</span> -
<span itemprop="byArtist">AC/DC</span>
</div>

...
</div>

You would think that this describes a track ordering, but it does not (at least in RDF). Doing this requires RDF List constructs missing from both Microdata and RDFa. In Turtle, you could do it as follows:


@prefix : <http://schema.org/> .
[ a :MusicPlaylist;
:name "Classic Rock Playlist";
:numTracks 5;
:tracks (
[a :MusicRecording; :name "Sweet Home Alabama"; :byArtist "Lynard Skynard"]
[a :MusicRecording; :name "Shook you all Night Long"; :byArtist "AC/DC"]
...
)
]
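
To see why the list construct matters, here is a minimal sketch in plain Python (no RDF library) that models a graph as a set of (subject, predicate, object) tuples. The blank node labels and the helper names as_rdf_list and read_rdf_list are made up for illustration. Repeating a property yields the same graph in any order, while an rdf:first/rdf:rest chain carries the order in the triples themselves.

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"


def as_rdf_list(label, members):
    """Encode members as an rdf:first/rdf:rest chain; return (head, triples)."""
    if not members:
        return RDF_NIL, set()
    nodes = [f"_:{label}{i}" for i in range(len(members))]
    triples = set()
    for i, (node, member) in enumerate(zip(nodes, members)):
        triples.add((node, RDF_FIRST, member))
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF_NIL
        triples.add((node, RDF_REST, rest))
    return nodes[0], triples


def read_rdf_list(head, triples):
    """Follow the chain from head and return the members in list order."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    members = []
    node = head
    while node != RDF_NIL:
        members.append(first[node])
        node = rest[node]
    return members


# Repeating a property: the graph is a set, so statement order is lost.
flat_a = {("_:pl", "schema:tracks", "_:sweet"), ("_:pl", "schema:tracks", "_:shook")}
flat_b = {("_:pl", "schema:tracks", "_:shook"), ("_:pl", "schema:tracks", "_:sweet")}
assert flat_a == flat_b

# An RDF list: the order can be recovered from the triples alone.
head, triples = as_rdf_list("t", ["_:sweet", "_:shook"])
assert read_rdf_list(head, triples) == ["_:sweet", "_:shook"]
```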

It would seem obvious that an HTML ordered list could be used to generate an RDF List, but it failed to attract enough interest to make it through. These are just a couple of things that are confusing about RDFa, and offer good fodder for Microdata proponents to complain about the complexity of RDFa markup.

It's important to note that a core goal of RDFa 1.1 (http://www.w3.org/TR/rdfa-core/) is to be compatible with RDFa 1.0 (RDFa in XHTML), in which these decisions were established.

Perhaps a reconciliation between Microdata and RDFa could take the best of both:


* Craft RDF friendly URIs from terms (such as schema:Person above),
* Reduce amount of document structure needed to describe common use cases,
* Better intuitive generation of RDF output,
* Ability to avoid RDF generation and go straight to JSON (perhaps JSON-LD),
* Use common URI prefixes,
* RDF Lists,
* Promote better HTML readability.

That's my 2 cents (for now)
34 comments
 
He is wrong about Microdata, no? Microdata has ordering.
 
ick, I'd stay away from RDF Lists - have you ever tried to work with them? They're pretty terrible.
 
We could keep it so that Microdata cannot be completely converted to RDF. I do not care much either way.
 
If we do anything here I think it's probably better for us to just remove the RDF conversion. It's not clear to me that it's got any practical use cases. It was mostly added for completeness' sake, IIRC. (People who want an RDF conversion can always define vocabulary-specific conversions, as people are doing for schema.org.)
 
Strangely enough, I think that some of the more die-hard RDF people (who I don't consider myself as) would like to see RDF translation removed from Microdata because of the way the URLs are generated. There was also talk about having the RDF community define what the Microdata to RDF mapping is. Not having RDF support in Microdata /may/ differentiate it enough for the W3C TAG to no longer care about the differences between Microdata and RDFa - although, I doubt it.
 
Well what the TAG thinks is really neither here nor there. My concern is in making sure we address real-world use cases and address issues raised in real-world deployments.

If we remove the Microdata-to-RDF conversion, it would be because nobody needs a generic conversion, and they all get done on a vocabulary-by-vocabulary basis. So far, certainly, that appears to be the case.
 
It is only falling back into the microformats pattern if you consider RDF to be the goal. If you consider Microdata to be the goal there is no problem.
 
Even though I have actually spent time implementing the RDF conversion algorithm (for completeness) I'm not convinced it's worthwhile. Its only redeeming factor is that it does a good job with the "Just a Geek" example so that people who like RDF can say "ah, this is not so bad." Of course, once they see a generated predicate URI they're not going to be so excited any longer...
 
+1 to both staying away from RDF List and removing RDF conversion from Microdata, or at least from the HTML community. I know some RDF people who don't even care about RDFa. (but they do publish RDF on the Web)
 
If microdata is not RDF, then its use falls only into the domain of SEO (via schema.org). That might be less confusing to publishers, but I'm worried about the flexibility of microdata for use cases like VIE (https://github.com/bergie/VIE#readme) where a CMS user interface uses the mark-up to actually edit stuff.
 
+Henri Sivonen according to Manu's comparison (http://manu.sporny.org/2011/uber-comparison-rdfa-md-uf/) microdata appears to lack some of the expressiveness that makes RDFa useful. That might be fixable, though. But for now I would see publishers doing microdata only to satisfy SEO possibilities given by schema.org.

That said, I'd really like to see there being a single format for doing structured data inside HTML5. It could be RDFa, or it could be microdata, as long as it does majority of the things people do with both now.

My use case is mostly related to using RDFa to make parts of a page (the actual RDF triples) editable on client side, and allowing users to synchronize their changes with the server.
 
Microdata lacks some of the expressiveness of RDF, but it's not what makes RDFa useful. It's not clear to me what makes RDFa useful, if anything. :-)

Microdata is useful for a whole series of things, for example:

Dragging or copying data between sites
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019835.html

Annotating structured data that HTML has no semantics for
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html

Helping people searching for content filtered by license
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019668.html

(Note that the syntax of microdata has changed a bit since those days, but not in a way that changes the use cases.)
 
If you think "parse RDFa data into a triplestore and run SPARQL against it, either using Javascript on JSON result or XSLT on XML results (+CSS) to look after presentation" deserves the qualifier "just", I think we may have different experiences of how Web authors do things.
 
How could one make a useful presentation of the iguana collection without some custom code with domain knowledge? Parsing the data into a datastore of some kind is the easy bit, massaging it and combining it in meaningful ways is the only way to add value, and requires actual work.
 
It's also trivial to collate items scraped from several pages into an "itemstore" if you will, it requires no domain knowledge. The only difference that I can see is working with items or working with triples after that point, and what tools are available. Are there awesome and useful things you could easily do with an RDF graph that would be very cumbersome to do without it?
 
Whether you write the script in a sane scripting language or in SPARQL, you still have to write a script, and I wouldn't use the word "just" to describe that step.
 
+Philip Jägenstedt (re. presentation without domain language) I would like to mention RelFinder[1] as an example of a general purpose RDF graph visualization tool, which you can use as long as you have an RDF store with a SPARQL endpoint. Google search is itself a general purpose data presentation tool and I would like to see improvements (such as letting you ask what the relationship between A and B is).

The problem with microdata is that it encourages the pattern of attribute-value pairs of which the value is a dead-end non-URI string and not an item in another document, and hence data aggregation can not happen. However, I agree that RDFa wouldn't make this situation any better because it encourages the same pattern and in fact major deployers just treat it as an SEO tool.

I am happy with the "itemstore" idea if it is capable of data aggregation.

[1] http://www.visualdataweb.org/relfinder.php
 
Graph visualization is fun, but it's hard to imagine that presenting the iguana collection as a graph would be more useful than the original HTML from which it was scraped, at least to non-geeks.

RDF also has string literals, but why is that a problem? At the end of the day, if your iguana has a name, there must be a string as a leaf of the graph, right? There is a (theoretical?) problem of detecting sameness of items of course, itemid might help with that if people are pedantic enough with it. It's basically the same as with RDFa, where you shouldn't use blank nodes if you care about this issue.
 
As for "itemstore", it would just be merging the output of the JSON conversion of all pages. If you have a lot of data and need indexing/search/etc, I guess something like MongoDB or CouchDB could come in handy. I don't think there will be that many iguanas, though :)
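
A minimal sketch of that merge, assuming each page has already been converted to the Microdata JSON shape (an "items" array whose entries have "type", optional "id", and "properties" with lists of values). The function name merge_into_itemstore and the iguana URLs are made up for illustration. Items that share an itemid are folded together; items without one are kept as-is, since there is no way to tell whether they are duplicates.

```python
def merge_into_itemstore(pages):
    """Merge per-page Microdata JSON objects into one flat list of items."""
    by_id = {}   # itemid -> merged item
    store = []   # all items, in the order first seen
    for page in pages:
        for item in page.get("items", []):
            item_id = item.get("id")
            if item_id is None:
                store.append(item)  # no itemid: sameness cannot be detected
                continue
            merged = by_id.get(item_id)
            if merged is None:
                merged = {"type": [], "id": item_id, "properties": {}}
                by_id[item_id] = merged
                store.append(merged)
            for item_type in item.get("type", []):
                if item_type not in merged["type"]:
                    merged["type"].append(item_type)
            for name, values in item.get("properties", {}).items():
                target = merged["properties"].setdefault(name, [])
                for value in values:
                    if value not in target:
                        target.append(value)
    return store


if __name__ == "__main__":
    iguana = "http://example.org/iguanas/1"  # hypothetical itemid
    page_a = {"items": [{"type": ["http://example.org/Iguana"], "id": iguana,
                         "properties": {"name": ["Iggy"]}}]}
    page_b = {"items": [{"type": ["http://example.org/Iguana"], "id": iguana,
                         "properties": {"name": ["Iggy"], "colour": ["green"]}}]}
    print(merge_into_itemstore([page_a, page_b]))
```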
 
+Philip Jägenstedt So what differentiates an "itemstore" from a "triplestore" or a "quadstore" for that matter? The reason I ask is because this conversation strikes me as an exercise in re-inventing the wheel. Yes, RDF/XML has some really craptacularly confusing aspects to it as does SPARQL and Microdata and RDFa, but having Microdata, RDFa and Microformats as "solutions" aren't doing Web developers any favors. So, what is the "itemstore" going to do that the "triplestore" or "quadstore" doesn't already?

My point being that there is a great amount of discussion in this thread about the differences between RDF and whatever Microdata's data model is - but in the end, both are expressing graphs (mathematical definition). I don't necessarily care if people call it RDF, or a graph, or tree, or Microdatamodel, or Frank - the nuances between each approach are vanishingly small. In the end, the structure of each can be transformed to one another, modulo datatypes, and end up doing effectively the same thing - expressing a graph. All three "solutions" can also be expressed as a tree.

This whole graph vs. tree sub-thread is such a red herring:

http://manu.sporny.org/2011/uber-comparison-rdfa-md-uf/#comment-416

That said - +Danny Ayers we've found SPARQL to be complex and overkill in most situations. SPARQL is adequate when you have a full "Semantic Web Technology Stack" - all the parsers, storage engines, query engines and serializers. Most Web developers, including our company, don't have access to a system that is capable of being put into production use /and/ is open source, backed by a thriving community. There is no sight of one on the horizon (not like CouchDB, MongoDB and others). This is why we just convert our RDFa/Microdata/Microformats into JSON-LD and work with the data like that. It just requires a readily available set of open source parsers (RDFa and JSON-LD for now), an open source storage engine (MongoDB/CouchDB), and glue code to put it all together (querying is easy once you can identify the items using an IRI, or know that you're looking for a particular IRI property).
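
As a rough illustration of the "glue code" kind of querying, here is a minimal sketch over a list of JSON-LD node objects held in memory. It assumes nodes carry "@id" and use full IRIs as property keys; the helper names index_by_id and with_property and the sample data are hypothetical, and a real deployment would sit on a document store rather than a Python list.

```python
def index_by_id(nodes):
    """Map each node's IRI ("@id") to the node itself."""
    return {node["@id"]: node for node in nodes if "@id" in node}


def with_property(nodes, prop_iri):
    """Return the nodes that have a value for the given property IRI."""
    return [node for node in nodes if prop_iri in node]


NAME = "http://schema.org/name"
KNOWS = "http://schema.org/knows"

nodes = [
    {"@id": "http://example.org/#alice", NAME: "Alice",
     KNOWS: {"@id": "http://example.org/#bob"}},
    {"@id": "http://example.org/#bob", NAME: "Bob"},
]

by_id = index_by_id(nodes)
# Look an item up directly by its IRI.
alice = by_id["http://example.org/#alice"]
# Follow a link to another item, then find everything with a given property.
friend = by_id[alice[KNOWS]["@id"]]
knowers = with_property(nodes, KNOWS)
```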
 
+Philip Jägenstedt Well, I am just bringing up a general purpose RDF visualization tool you asked for, and I didn't look into the iguana collection use case. I do think having a simple user interface from Google for looking up the relationship between A and B is useful, and calling it "graph visualization" just misses the point. "itemid" could potentially be used to build a similar system but the attribute itself really just looks like an extra, unnecessary attribute in the microdata system, just like @resource in RDFa. (So I said RDFa wouldn't make this situation any better)

And I would stress again that there exist RDF people who don't care about this RDFa vs. microdata debate. They simply dump their database into piecewise interlinked RDF/XML or N3 (that is, RDF literals are not important there besides for the purpose of displaying a name) or randomly choose to embed RDF into HTML (aka. RDFa). This, in my view, is totally separate from the SEO use cases that normal Web authors care about.
 
+Manu Sporny, that is precisely my point – that from a scraping/aggregation point of view, there's very little difference. AFAICT, the easy part is still easy and the hard part is still hard with both RDFa and Microdata. No surprises there...
 
+Philip Jägenstedt yes, in the end all formats can represent simple data simply, though RDFa appears to have advantages in more tricky cases (which we sometimes see with VIE when editing content via these annotations).

In my view, having three formats for exactly the same purpose is just silly. With same purpose I mean the case most people are going to use them for: marking up some basic data like contact information and events.

The main issue with RDFa that Microdata seems to address is perceived complexity. On this I agree. While RDFa can be used quite simply, the documentation and tutorials for it are making things overcomplicated. But that isn't an issue in the format itself, just with how it is explained.

Doing a competing spec, as has happened with Microdata, in order to fix an issue of documentation is not very productive. I hope a common format can be agreed on, even if it only supports 80% of all the collected use cases.
 
+Philip Jägenstedt sure, I'll try to find time to blog about that next week. And yeah, creating a new super spec gets you to the XKCD situation. But you could also deprecate a spec in favor of another.
 
+Henri Bergius, So are you guys going to deprecate RDFa? ;-)

Note that deprecation doesn't solve the XKCD situation if software developers still feel like they need to consume the format. The W3C tried stopping HTML already. It seems that Bing developers still felt like they needed to add support for RDFa-esque data even though schema.org was supposed to obsolete RDFa for search engine purposes already when Google failed to go all the way and remove their pre-existing pseudo-RDFa consumption features.
 
I personally would be OK with it if Microdata was demonstrated to be good enough. But to toss the ball back, would you be willing to deprecate Microdata if the opposite was shown to be true?

I think to achieve unification, both sides have to be willing to compromise on some things.

The situation you describe with Bing is the exact outcome of the current status: because there is no consensus, both publishers and consumers of data have to support both formats.
 
If RDFa was any good at solving the use cases that microdata is intended to address, I wouldn't have done microdata in the first place. The problem is RDFa is horrifically badly designed for those use cases (either because it's badly designed, or because it wasn't intended for those use cases; I've always thought the latter, but Manu insists it's the former), and has only been getting worse.

As far as I can tell, RDFa and microdata aren't competing standards. In practice (regardless of what RDFa was designed for), they don't address the same use cases. RDFa addresses some obscure RDF-related needs. Microdata addresses the needs described in my earlier comments. Trying to use microdata for RDF-related needs, or RDFa for the needs I list above, just results in author pain.
 
+Ian Hickson Be specific. Show me the link to where I insist that RDFa was designed to address /every/ Microdata use case. I don't remember saying that. In fact, if you could outline which Microdata use cases you think that RDFa does not address, that would be a productive discussion. What I said was that RDFa was not designed to solve /every/ Microdata use case, but it does solve the vast majority of them. Yes, Microdata supports drag-and-drop and RDFa doesn't. I will add that to the uber-comparison of the languages since it's not in there yet:

http://manu.sporny.org/2011/uber-comparison-rdfa-md-uf/

At what point does an additional use case warrant a completely separate specification that does the work of an already pre-existing specification, but in a mildly different/incompatible way? Are there other features that RDFa 1.1 doesn't have that Microdata 1.0 has? From my understanding, there were 5 things that RDFa does that you seemed to not want in there (prefix rebinding, graph vs. tree, data typing, CURIEs, vocabulary mixing). Of those five, one of them is a red herring (graph vs. tree). In addition, we have this from you:

http://www.w3.org/2010/02/rdfa/track/issues/66

Are there other issues? Be specific.
 
I would be hard-pressed to point to anything in RDFa that I think is good. The whole spec is just a disaster from top to bottom. It's unusable. It makes SVG look well-designed. Heck it makes HTML look well-designed, and you have to really work to do that.

But the time to listen to my feedback on this was years ago. I've no interest in trying to help RDFa improve; for the use cases that matter, we now have a solution that works, and RDFa can be left to die.

I was going to post a comparison of XForms, XHTML2, and RDFa, but I'll post it in my stream instead.
 
+Henri Bergius, the creation of Microdata, the adoption of Microdata by schema.org and the choice to pursue Microdata API implementation in browsers are all cases where people who have been well aware of the existence of RDFa have chosen Microdata instead.

To get from this situation to deprecating Microdata because of RDFa being shown to be OK (after all) is not a realistic scenario if RDFa remains substantially similar to RDFa-as-we-know-it.

If RDFa changed enough to moot Microdata, it would be so different from RDFa-as-we-know-it that for practical purposes it would be a new language--i.e. the XKCD situation.
 
+Henri Sivonen since we're apparently both in Helsinki, I wonder if you'd like to meet and chat about this. I would like to understand your viewpoint to the issue a bit better.