"*RDFa*: A horribly-designed technology that built on a technology without broad adoption. While it was the only game in town, it gained some traction, but its horrible usability actually pushed some of its biggest adopters (e.g. Google) to seek alternative solutions. End result: a vastly simpler technology is developed that addresses the use cases people actually care about and ignores the theoretical ones. The RDFa community try to form a task force to do something about it. What happens next is as yet undetermined."
Ian Hickson originally shared:
 
It's interesting (from a historical standpoint) to contrast RDFa with other W3C technologies that have tried to improve on HTML only to find HTML itself improve on them:

XForms: A well-designed technology that addressed a number of quite important use cases very well. Its fatal flaws: it wasn't backwards compatible with the Web, and it didn't do a good job of balancing the declarative with the imperative when it comes to the preferences of Web authors. End result: Web Forms 2 (later merged into HTML5, now just the forms features in HTML, which build on the forms features from HTML4 and the script APIs in DOM2 HTML) is addressing the same use cases, despite being a far less technically clean design. (XForms WG spends considerable resources trying to make HTML adopt XForms. Fails due to lack of browser interest.)

XHTML2: Primarily seemed intended to address a non-problem ("tag soup"). The language itself was clean, and had good ideas, but insufficient new features meant the language never gained wide interest. Its fatal flaws: it wasn't backwards compatible with the Web, and it didn't solve real problems. End result: Some of XHTML2's strongest proponents (e.g. me) ended up reviving the old HTML and fixing the real problems that had been left unaddressed. Some of the good ideas were adopted and adjusted to work in a backwards-compatible way. (XHTML2 WG spends considerable resources trying to stop W3C adopting HTML at all. Ends up closed instead.)

RDFa: A horribly-designed technology that built on a technology without broad adoption. While it was the only game in town, it gained some traction, but its horrible usability actually pushed some of its biggest adopters (e.g. Google) to seek alternative solutions. End result: a vastly simpler technology is developed that addresses the use cases people actually care about and ignores the theoretical ones. The RDFa community try to form a task force to do something about it. What happens next is as yet undetermined.

It's worth noting that quite the opposite has happened with some other W3C technologies. For example, SVG and MathML both got adopted by HTML, despite both of them having problems (both are IMHO over-engineered in various ways). I think the difference is that Math and Vector Graphics are problems so large that "doing it right" would have taken a prohibitively long time. In comparison, forms, document markup, and inline annotations are relatively trivial problems that can just be addressed directly in HTML without much effort.
 
I very much agree, Jeni, not only about RDFa, but about RDF in general.

I was an enthusiastic early RDF adopter in the late 1990s, and wrote one of the first widely-used RDF libraries, but the community was hell-bent on obfuscating it into tuples and other knowledge-representation models that could never get wide acceptance, rather than sticking with simple linked data following the XML hierarchical model.

Now, nearly a decade and a half later, RDF is still just about to take off, I hear. :)
 
+David Megginson Yup, I've invested a few thousand hours here, and I'm glad I did for what I've learned, but something is wrong. I mean, when you can search the biggest job/project/contract websites for a tech as old, mature and big as "RDF" and get zero results back, that's got to mean something.

Loath to say it, since I'm heavily involved and rate a lot of people and techs in the community around it highly, but something doesn't add up.
 
I have really never understood what Ian is going on about with XForms not being "backwards compatible" with the web. The only way I can make the statement make sense is by assuming he means without use of namespaces. I use mixed HTML4 forms and XForms in the same page and have no trouble.

But then again, it's the same phrase Ian used at the 2004 W3C Workshop on Web Applications and Compound Documents, but there he said it about HTML: since Opera had spent a lot of money implementing an IE5.5-bug-compatible HTML, he said that ought to be frozen as HTML. IIRC the Microsoft representative (who was pushing what would later become XAML) agreed, saying that it was a good idea to freeze HTML, let OS companies innovate in application space, and leave the web for web pages.
 
+Danny Ayers - good point about graphs. That's where, I think, the "linked" part of linked data comes in. Represent simple data structures hierarchically, but complex structures as graphs of relationships of simple objects.

Of course, that's also (unfortunately) theoretical. In plainer speak, "put XML documents online and link 'em to other XML documents." People can get that.
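A minimal sketch of that idea, assuming plain Python dicts stand in for the "XML documents": each record is a small hierarchy for its simple data, and relationships between records are links by URI rather than nesting. All URIs and field names here are hypothetical.

```python
from collections import deque

# Hypothetical records keyed by URI. Simple data (an address) is nested inside
# its record; relationships between records are links by URI, not nesting.
records = {
    "http://example.org/artist/1": {
        "name": "Example Artist",
        "address": {"city": "London", "country": "UK"},
        "member_of": ["http://example.org/band/7"],
    },
    "http://example.org/band/7": {
        "name": "Example Band",
        "members": ["http://example.org/artist/1"],
    },
}


def links(uri):
    """Yield the URIs a record points to: any string value naming another record."""
    for value in records[uri].values():
        for target in (value if isinstance(value, list) else [value]):
            if isinstance(target, str) and target in records:
                yield target


def reachable(start):
    """Breadth-first walk of the link graph; returns every URI reachable from start."""
    seen, queue = {start}, deque([start])
    while queue:
        for target in links(queue.popleft()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen
```

Each record stays easy to read on its own, while the graph only appears when you follow links, which is the "link 'em to other documents" part.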
 
People (including Ian) might forget that technology builds on the technology of others. If RDF didn't do the complete job, so what? Was there something that popularized the idea of linked data before RDF? Why didn't someone invent linked data before they discovered the problems with RDF, XHTML, XForms? Humans don't often go from 0 to perfection in a single step...
 
+John Kemp RDF never did popularize linked data, so is less a stepping stone than a dead end, like the pre-HTML hypertext systems. Instead of building on it, it would be best to tear it up and start over.

I don't remember for certain, but XLink might have started before RDF, FWIW. That didn't really take off, either.
 
+John Kemp, ever seen the Everything is a Remix series of videos? (http://www.everythingisaremix.info/watch-the-series/)

RE linked data before RDF: you talk about them as if they were two different techs? Linked Data is just "the point of RDF" written out in plain English, as it seemed to have been lost and skewed over the years. RDF is one way to realise it, and there are other ways too. But in many ways Linked Data came ~before~ RDF; at least it sure seems that way to me: an easy-to-remember meme assigned to a long-term message (even though much of that message, as in 50%+ of the Linked Data Design Issue's content, still fails to be noticed by most). Aside: I always find this set of slides explains things better than I ever can: http://www.w3.org/2007/Talks/1211-whit-tbl/#(1) - the vision and model are pretty clear.

RDF has a life of its own now, but the core of it isn't too complex and can be applied to many things: essentially you just take L-Base and the basic principles of the web / linked data, merge them with a familiar model and wrap it all up in a syntax for passing over the wire. The only remixing that needs to be done is finding the right mixture to appease the masses, i.e. making the correct design trade-offs to get something both useful and easy to use. "RDF" isn't quite there and never will be, since it's a mature tech with backwards compatibility to think of, but it's certainly possible and the ingredients are most certainly widely available.
 
+David Megginson, since the BBC publishes RDF music catalogues on its website, and many other big companies produce and consume RDF, I'd say there's an existence proof of sorts. Is that not linked data?

I don't know whether it makes sense to build on or tear up RDF, but I do know that regardless of which of those happens, RDF will have been influential.

Dead-ends are stepping stones too.
 
+John Kemp: Good point about technologies building on each other. Of the ones in the OP, XHTML2 is an especially good example of this: we used a bunch of ideas from XHTML2 in the HTML5 work. Similarly, microdata was naturally influenced by both Microformats and (to a lesser extent) RDFa. And of course XForms was the original impetus for the HTML5 work — without it to kick-start us, who knows when we would have resumed work on HTML.

+Danny Ayers: Ignoring purely theoretical use cases and focusing only on concrete problems is a core part of how we've managed to make such rapid progress with HTML and other Web APIs in recent years. The rationale here is that we don't know what problems we'll have in the future, so instead of trying to guess, we'll just address those in the future, and focus on today's problems now. This same mindset is why we're using a "living standard" model rather than the traditional "write-publish-implement" model where a technology has a point where it is called "done".
 
+Gregg Kellogg sadly it seems we can't get beyond syntax and datatypes.

+Ian Hickson True enough, Haakon issued the ultimatum in the form of a last-call comment on XForms, but I think it would be more accurate to say the W3C is the reason you and Haakon started on WebForms and HTML5, and XForms was just the first thing the process produced that might have mattered.

WebForms doesn't do MVC, and I still believe that MVC has a place in browsers: Flex and Silverlight and other former competitors to succeed the HTML4-based web have MVC. MVC is not fundamentally incompatible with HTML, nor does adding it to HTML break backward compatibility. Consider that HTML5 has several ways of drawing: Canvas for bits, HTML elements and CSS for text and layout, and SVG for graphics, and all co-exist (some of them even with namespaces). XForms could easily have co-existed with HTML5, and the data model language (XML or JSON or whatever) doesn't really matter. The important points of interoperation are already defined on the DOM: DOM Events, CSS, and the various WebAPI JavaScript interfaces all work fine with any of HTML5 markup, SVG, and XForms.

The speed of current JavaScript engines means that MVC frameworks can run inside the browser, and be fast, and we're seeing that it's possible to do application development at a level above what HTML has to offer, though for syntax and datatypes, the choice these days is JavaScript.
 
+Danny Ayers: I don't think I extrapolated that theory was bad. I'm not suggesting HTML (including microdata, WF2, etc) is good. I'm only suggesting that it is getting more traction in the market. It's quite possible that what leads to a success in the market is not what we would consider "good".

There does seem to be a misconception that I think that anything I work on is automatically better than anything else. That is most definitely not the case. XForms is a classic example of that, where I think XForms is technically far superior to anything HTML has ever done in the form space. What I care about is actually having an impact on real users, and actually improving the Web. To do this, the technologies need to actually get adopted. This is why I pay so much attention to what implementors want, and so little attention to theory and what makes a technology technically superior. I would rather deploy an inferior incremental technology after one year than spend ten years working on the ultimate solution to the world's problems only to have that solution not ship for at least another twenty years, if ever.

+Leigh Klotz, Jr.: We started the WHATWG a long time after we started work on Web Forms 2 (later to become HTML5). The work on HTML5 started entirely because of XForms. We wanted to do that work within the W3C, but they refused, which is why we started the WHATWG.
 
Out of curiosity, how many outside data collections does the BBC catalogue link to?
 
+Jeni Tennison - that's good to hear; I'm glad the BBC is reaching out past its own dataset. That's a model I'd like to see government open-data projects emulate more often.

MusicBrainz publishes RDF data, so that's a bona fide outbound link to other linked data. +1, as we say on G+ :)

Does Wikipedia publish RDF data as well? I know that RDF files frequently link to resources that aren't RDF, and that's a good thing, but it's not the same as having a web of linked data. DBpedia is interesting, because it's basically a dataset scraped from Wikipedia: not a bad way to kickstart an RDF data web, but eventually we'll get tired of kicking if the engine won't fire.
 
http://wiki.musicbrainz.org/LinkedBrainz is worth reading. MusicBrainz ditched its RDF-based web service a long time ago, but now does publish RDFa. Given http://musicbrainz.org/robots.txt, is anyone aware of anyone consuming MusicBrainz data as RDFa, as opposed to http://musicbrainz.org/doc/XML_Web_Service/Version_2 or http://musicbrainz.org/doc/Live_Data_Feed ? (I'm not trying to make a point here, I actually spend a lot of time editing MusicBrainz and writing scripts using its Python-wrapped XML API and am curious if there is anyone who's found the RDFa "interface" useful. It certainly looks rather complicated, with 15 namespaces in one example I checked.)
 
+Yves Raimond might possibly be able to tell us more about the BBC's use of MusicBrainz, and how it consumes its data.
 
+Danny Ayers: I'm not suggesting that it's either one or the other. Only that having an impact is a requirement, and that being technically superior is not. I think it's obvious that having both is ideal.
 
Hello!

Yes, please ask me any questions about BBC's use of RDFa. Slightly annoyed by Ian Hickson's post and the eternal 'that's what implementers want' - we implement quite a lot of stuff on the web, so belong in this category I guess, and RDFa suits us very well.

Always very suspicious of people quoting 'the crowd' - it is just silly...

Cheers,
y
 
Yves, is the BBC scraping MusicBrainz using RDFa? I was always under the impression that MusicBrainz' customers were using the database replication or the live feed, as scraping the website or even using the XML API would just hammer the server into oblivion if you do it on a large scale.
 
Indeed, we replicate the MusicBrainz data within the BBC; it would put too much strain on MusicBrainz if we hit them directly.
 
By implementors here I meant browser vendors (and to a lesser extent, search engine vendors, validator vendors, and the like — the people whose code decides whether the technology succeeds or not). Web site implementors are "authors" in the user/author/implementor/editor terminology used by the HTML design principles, and are important as well, but generally I pay more attention to what authors need than what they want. (It's quite eye-opening to see how self-reported desires from users and authors often do not correspond to actual needs. For example, I frequently see Web authors ask for features whose entire purpose is just to work around specific browser bugs: a nonsensical request, since it'd be simpler for the browsers to just fix the bug than to introduce a new feature that may itself have bugs.)
 
Hmm, so is RDFa at all involved when it comes to your use of MusicBrainz?

While Ian is terse and "frank" as usual, he's actually not lying about what implementors want. To browser implementors RDFa is just not very palatable at all, since it uses a feature which does not exist in HTML (XML namespaces) and is designed in a way that seems to make DOM APIs for it very complicated to implement and use. This may sound like a bad excuse for the NIH syndrome, but not having a solid DOM API misses out on the opportunity for scripts to do useful things with the data in the page.

Of course, there are other kinds of implementors, such as yourself. It'd be interesting to know if your use of RDFa relies on any of the things that microdata cannot represent. Last time I checked, those were XML literals, blank nodes as objects, and datatypes.
 
+Philip Jägenstedt (With you on DOM APIs.)

The microdata mapping to RDF at http://www.w3.org/TR/microdata/#rdf handles blank nodes (items without an itemid). I think XML literals and datatypes are the only thing in the RDF model that microdata can't capture or map.

There are places where the microdata data model is disconnected from the one that is produced through the RDF mapping. For example, in microdata each item has only one class. You can use itemprop to provide additional types in the RDF that is generated from microdata, but if you do so, those classes aren't available through the DOM APIs or in the JSON in the same way as the primary class.

Another subtlety is that even though, IIRC, in microdata you have to use an href attribute (on <a> or <link>) to provide a property value that is a URI, that isn't exposed in its data model: a property whose value is a URI is a string like any other, and different from a property whose value is an item with an itemid. Here, it isn't that microdata can't represent something that RDFa does, but rather that the entity-attribute-value representation generated from natural HTML markup is different from (and, in RDF terms, less useful than) what would be generated by seemingly equivalent RDFa.
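To make that last point concrete, here is a deliberately naive, hypothetical flattening of a microdata item (in the JSON shape: "type", "id", "properties") into triples. It is not the RDF mapping in the microdata spec: property names are left unexpanded, and every string becomes a literal. Items with an itemid become IRIs and items without one become blank nodes, but a URL that came from an href is just a string in the data model, so it comes out as a literal rather than a node.

```python
from itertools import count


def to_triples(item):
    """Naively flatten a microdata-style item (JSON shape) into (s, p, o) tuples."""
    triples, bnodes = [], count()

    def visit(node):
        # Items with an itemid get an IRI; items without one get a blank node.
        subject = f"<{node['id']}>" if "id" in node else f"_:b{next(bnodes)}"
        for t in node.get("type", []):
            triples.append((subject, "rdf:type", f"<{t}>"))
        for prop, values in node.get("properties", {}).items():
            for value in values:
                if isinstance(value, dict):
                    # Nested item: the object is the nested item's own subject.
                    triples.append((subject, prop, visit(value)))
                else:
                    # Every string, including one taken from an href, is a literal.
                    triples.append((subject, prop, f'"{value}"'))
        return subject

    visit(item)
    return triples


# Hypothetical input, as if extracted from <a itemprop="url" href="..."> etc.
page_item = {
    "type": ["http://schema.org/Person"],
    "properties": {
        "name": ["Alice"],
        "url": ["http://example.org/alice"],
        "knows": [{"id": "http://example.org/bob", "properties": {"name": ["Bob"]}}],
    },
}
```

Running `to_triples(page_item)` yields `("_:b0", "url", '"http://example.org/alice"')`, a literal, while the nested item with an itemid yields `("_:b0", "knows", "<http://example.org/bob>")`, a node.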
 
+Jeni Tennison that single-type limitation is surely artificial, though; is there any reason why a whitespace-separated list couldn't be used?

Also, it appears that itemscope isn't needed: since "The itemtype attribute must not be specified on elements that do not have an itemscope attribute specified.", it stands to reason that if only itemtype were present, that would signify a new itemscope, and if the item were untyped, itemtype="" could easily be used. Just a thought.
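The difference between the current rule and this suggestion can be sketched as two predicates over an element's attributes (a dict of name to value, with None for a bare attribute). `creates_item_spec` is a validator-style reading of the quoted microdata rule; `creates_item_proposed` and `item_types` model the hypothetical alternative from this comment. None of these functions come from any spec or library.

```python
def creates_item_spec(attrs):
    """Microdata as quoted above: only itemscope starts an item.

    Raises, validator-style, when itemtype appears without itemscope.
    """
    if "itemtype" in attrs and "itemscope" not in attrs:
        raise ValueError("itemtype must not be specified without itemscope")
    return "itemscope" in attrs


def creates_item_proposed(attrs):
    """Hypothetical rule: the presence of itemtype (even itemtype="") starts an item."""
    return "itemtype" in attrs


def item_types(attrs):
    """Hypothetical rule: itemtype is a whitespace-separated list; "" means untyped."""
    return (attrs.get("itemtype") or "").split()
```

Under the proposed rule, `{"itemtype": ""}` is an untyped item and itemscope carries no extra information.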
 
Oops, I forgot that http://html5.org/r/6277 changed this a bit, even though I actually implemented it myself :-/ You're correct about the rest as well. Certainly at this point it's not terribly relevant if one is a subset of the other. Even if they were isomorphic, it would still be the case that it's easier to express nested name-values in microdata and easier to express complex graphs in RDFa. So obviously, if your use case is to export your pre-existing RDF graph in the form of an HTML page, then RDFa is for you.
 
I do think that Linked Data people are trying to solve a very concrete problem rather than a theoretical one (although sometimes a political problem rather than a technical one): some information that ought to be public is not on the Web, and when the information is on the Web it is not necessarily reusable. I am personally happy with any technology that can be used to achieve this goal, whether that's HTML5 in general, microdata, an SQL dump, or RDF, and "my" definition of the Web's improvement depends solely on how much reusable information there is on the Web, not on how big the browser technology stack is. I believe most Linked Data proponents think the same way.

It would be nice if search engine implementers could work with content publishers and make rapid progress on this very problem as well; otherwise we'll end up with a huge browser technology stack, known as HTML5, to which every Web application from Flash is ported but through which no new information is exposed.

See also Quora's "Why has the Internet not disrupted the scientific publishing industry to the same extent that other forms of media have been affected? " http://www.quora.com/Why-has-the-Internet-not-disrupted-the-scientific-publishing-industry-to-the-same-extent-that-other-forms-of-media-have-been-affected?q=scientific+publishing+disrupt
 
Nathan, I don't know, what would this be as RDFa?
<div itemscope>
<div itemprop="p1">foo</div>
<div itemprop="p2" itemscope>
<div itemprop="p21">bar</div>
<div itemprop="p22">baz</div>
</div>
</div>
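For reference, here is a minimal sketch of what a microdata processor extracts from that snippet, written with Python's standard `html.parser`. It handles only itemscope, itemprop, itemtype and itemid, and always takes the text content as a property value; the real algorithm in the HTML spec also handles itemref and takes values from href, src, content and so on. The class and function names are illustrative.

```python
from html.parser import HTMLParser
import json

# Elements that never get an end tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class MicrodataSketch(HTMLParser):
    """Minimal microdata extractor: itemscope/itemprop/itemtype/itemid + text values."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one frame per open element
        self.items = []   # top-level items (itemscope without itemprop)

    def _owner(self):
        # The nearest enclosing element that created an item owns new properties.
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        a = dict(attrs)
        item = None
        if "itemscope" in a:
            item = {}
            if a.get("itemtype"):
                item["type"] = a["itemtype"].split()
            if a.get("itemid"):
                item["id"] = a["itemid"]
            item["properties"] = {}
        props = (a.get("itemprop") or "").split()
        # The owner is computed before pushing, so an itemscope+itemprop element
        # becomes a property of its parent item, not of itself.
        self.stack.append({"tag": tag, "props": props, "owner": self._owner(),
                           "item": item, "text": []})
        if item is not None and not props:
            self.items.append(item)

    def handle_data(self, data):
        # textContent includes descendants, so every open element sees the text.
        for frame in self.stack:
            frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not any(f["tag"] == tag for f in self.stack):
            return
        while True:
            frame = self.stack.pop()
            self._finish(frame)
            if frame["tag"] == tag:
                break

    def _finish(self, frame):
        if not frame["props"] or frame["owner"] is None:
            return
        value = frame["item"] if frame["item"] is not None else "".join(frame["text"])
        for name in frame["props"]:
            frame["owner"]["properties"].setdefault(name, []).append(value)


def extract(html):
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return {"items": parser.items}


SNIPPET = """<div itemscope>
<div itemprop="p1">foo</div>
<div itemprop="p2" itemscope>
<div itemprop="p21">bar</div>
<div itemprop="p22">baz</div>
</div>
</div>"""

if __name__ == "__main__":
    print(json.dumps(extract(SNIPPET)))
```

The result is one untyped top-level item with p1 = "foo" and p2 = a nested untyped item holding p21 = "bar" and p22 = "baz", with no URIs involved anywhere.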
 
Man, I cringe every time I see that itemscope attribute. Why not just require an itemtype attribute, use that to define the scope of the properties, and allow multiple whitespace-delimited values?
 
+Philip Jägenstedt it would be the same: just swap itemscope for about="" and itemprop for property. Hence why I'm saying it is no simpler to use for all the use cases microdata covers; it's only simpler as a spec, because there is less of it and it doesn't expose as much functionality. NB, the other problems noted, such as multiple types, were all solved for RDFa a long time ago.
 
Nathan, http://www.w3.org/2007/08/pyRdfa/ doesn't produce any triples from this input:
<div about="">
<div property="p1">foo</div>
<div property="p2" about="">
<div property="p21">bar</div>
<div property="p22">baz</div>
</div>
</div>
 
No need to tell me, I've heard the endless moaning about the study. Sure, a study of 100 people would have been better, but who's going to pay for it? The reasoning and conclusions seem reasonable and I think the spec is better now than it was before, even if itemscope+itemtype doesn't resonate with my personal aesthetics.
 
+Nathan Rixham: I was physically there, watching from behind the one-way glass, and the difference in reactions between when we tried using itemscope="" vs triggering it on itemtype="" was night and day. I really wish we could make the videos available too but we can't for privacy reasons. (As the guy who originally specced it as Bruce suggested, I have to say I was shocked to see this difference. I agree with you that it seems neater the other way.)
 
Nathan, putting absolute URLs everywhere still doesn't produce any triples. I'm just incompetent at writing RDFa, you're going to have to show me just how easy it is.
 
As a user (e.g. someone who publishes HTML, and writes some tools sometimes to generate it), my big problem with boolean attributes in general is that I have only ever written (and am quite comfortable with) XHTML. So the examples immediately flag for me "OK, that's not valid (XML)." Then when I poke around I realize I'd end up having to do the extreme awkwardness of:

<div itemscope="itemscope">

So then this goes back to the representativeness of the user-test population, and the methods. Did you try the itemscope alternative, +Ian Hickson, with people used to dealing with XHTML? If yes, were the results (really) the same?
 
If you must write XML, then itemscope="" is a bit less horrible. The original design wouldn't have been any better, though, as item="" and item="item" would mean completely different things but both could appear to mean the same thing as item in HTML.
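In HTML a boolean attribute is on when it is present, whatever its value, so `itemscope`, `itemscope=""` and `itemscope="itemscope"` mean the same thing. A quick check with Python's standard `html.parser` (used here only as a convenient tokenizer, not as a model of a browser) shows all three spellings arrive as the same attribute name; only the value differs (None or a string), and presence is all that matters.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record the attributes of every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append(dict(attrs))


def has_boolean(markup, name):
    """For each start tag, report whether boolean attribute `name` is on (presence only)."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return [name in attrs for attrs in parser.seen]


SPELLINGS = ('<div itemscope></div><div itemscope=""></div>'
             '<div itemscope="itemscope"></div><div></div>')
```

Here `has_boolean(SPELLINGS, "itemscope")` reports on, on, on, off. The empty-value form is also well-formed XML, which is why it suits authors who write XHTML-style markup.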
 
+Ian Hickson Perhaps enough time has passed that people may "get it" now. Perhaps there's a way to work in a rule that the presence of itemtype explicitly sets a new itemscope, so that it could be adopted, and itemscope phased out in the future if it isn't deemed necessary. Might be worth considering.
 
+Bruce D'Arcus: To a first approximation, nobody uses XHTML, so the needs of HTML authors, where they conflict, take priority. But in any case, as +Philip Jägenstedt points out, you can just say itemscope="" (empty value).

+Nathan Rixham: Not sure what you mean by time passing. The question here is about how people react when first exposed to the technology, not after we have had a few weeks to teach them. If there's one thing we have learnt over the past couple of decades, it's that education is very ineffective. There are only a few things I can point to where education has managed to overcome complexity. Not using tables for layout is probably the most successful, and even that hasn't been a huge success.
 
+Philip Jägenstedt: I should have said we mirror MusicBrainz (including the code base); we use the RDFa and the API. However, I don't think the post above was meant to mean that, just to give examples of RDFa deployment in the wild. For examples of our use of RDFa, see http://blog.dbtune.org/post/2011/06/28/Using-RDFa-for-testing-templates, http://www.slideshare.net/moustaki/linked-data-on-the-bbc-2638734 or http://www.slideshare.net/reduxd/beyond-the-polar-bear. The important point is that we use it for testing, and to share data across sites within the BBC. Therefore, we make extensive use of things like datatypes and multi-typing (to handle both Rich Snippets and our own vocabularies, such as http://www.bbc.co.uk/ontologies/programmes/). Also, about the DOM API: I believe the RDF API was done to solve the same problem? Also, the W3C extractor is indeed broken; I've sent a report to its maintainer.

+Ian Hickson: I now see what you mean by implementors; this wasn't clear to me in your original post, sorry about that. However, treating what implementors want as the same as what authors need is a bit of a stretch imho. Going back to Microdata, though, I am still struggling with the point it is trying to make. RDFa is complex? It takes one hour of training for a new Web dev within the BBC to get to grips with it. And it takes one hour for them to pick up Microdata as well: the important point is about embedding structured data within a page, and that's what takes time to grasp. RDFa is difficult to implement? In that case, why not work within that specification and fix it? About "the use cases that people actually care about", what are they, exactly? What are the use cases Microdata solves that RDFa doesn't (or couldn't, with some effort invested in its spec)?

I just don't understand why there is a need for a separate spec and, more than that, why there's no way for Microdata and RDFa to work together, especially as they are published under the same umbrella.
 
I was actually trying to understand if there's anyone consuming MusicBrainz RDFa data and doing something with it. It sounds like you consume the MusicBrainz database and code base and publish RDFa in the same way that MusicBrainz does. That's all fine, but I am curious to know if anyone would bother with RDFa given the excellent XML API available. I'll just ask on the MusicBrainz mailing list, someone ought to know.

The RDFa DOM API is unfortunately not very good, I've sent detailed feedback about that in http://lists.w3.org/Archives/Public/public-rdfa-wg/2011Jul/0001.html
 
This is a great thread of discussion. I only wish G+ allowed one to follow a discussion thread like this (beyond just adding a comment).
 
+Philip Jägenstedt The MusicBrainz RDFa was released just over a month ago, so it's still pretty new; give it time :) Although, as mentioned before, we use it (but through our own mirror). I missed your email on the RDFa mailing list; will read now.
 
+Danny Ayers, when +Ian Hickson says "use XHTML", he most likely means application/xhtml+xml. application/xhtml+xml usage is frustratingly just enough above "nobody" that browsers don't consider themselves able to drop support but aren't gaining much by keeping support, either.
 
+Henri Sivonen I would be interested to have +Manu Sporny's view on that. Seen from the outside (I am not a member of both working groups), it looks like the inverse situation.
 
+Yves Raimond, it may look like the inverse situation now after schema.org. I mostly gave up on trying to get RDFa fixed in early 2009. Looking at my version control and email records, in January 2009 I was still trying to get RDFa changed and by May 2009 I had given up hope on getting RDFa changed and thought that Microdata was worth trying.

+Danny Ayers, XHTML syntax issues are just an illusion when not deployed as application/xhtml+xml.
 
+Ian Hickson "To a first approximation, nobody uses XHTML, so the needs of HTML authors, where they conflict, take priority" - two followups:

1) "nobody"? What is the comparison population here? Though admittedly anecdotal, I certainly see a lot of XHTML out on the web (just off the top of my head, Wikipedia uses XHTML, and most Drupal and WordPress sites seem to as well).

2) did you do the testing to even assess whether there was a conflict in this case (using a boolean attribute to denote variable scope)? I'm guessing from what you write, no?
 
+Yves Raimond FWIW I completely agree, being the person who handled the last lot of feedback in the RDFa Working Group. It was a very uncompromising call to remove half the essence of RDF(a) from the specification, especially everything to do with prefixes or URI shortening, things which simply won't happen. HTML won't support URI shortening, and RDFa obviously requires it to be even half useful (and anything more than the limited subset which is microdata), so there's a stalemate here, plus the introduction of some unexpected functionality, even though the RDFa WG made every effort to introduce alternative ways to shorten URIs, to the point that traditional namespaces are deprecated for all but backwards compatibility. However, that's only served to make RDFa more complex (or at least that's the message I'm hearing now, even though those changes were a direct response to the "namespaces are crap" train of thought). Really, though, much of this is a complete waste of everybody's time: neither spec will be changed in any significant way, they won't be dropped, they won't be merged, and any potential merging of the specs will only be yet another spec thrown into the mix.

This is a complete waste of everybody's time, it's more negativity, and yet another smoke screen: when the smoke clears again we'll be in the same position, semweb + RDF rep will be further damaged, and yet again nobody will have a clue what really happened.

Frankly, this is all bullshit and stalling. (No offence to anybody involved, but as a community or two, it's pretty poor performance, socially, considering the technical merit of all involved).

Let's be honest, the people in this thread alone could come up with something far better than both RDFa and microdata quite easily if they just put their heads together for 24 hours and were civil + open to each other's needs. It's a very talented group of people discussing this, but nothing of any practical use is coming out of all these lines of text.
 
So I'm really trying to understand the crux of the problem here, and it's really hard because of the way this conversation has proceeded. What's the problem with URI shortening and solutions like RDFa 1.1 profiles?

From a markup authoring usability perspective (which I took to be the primary concern, but now don't think it is), it seems to be a nice solution: it makes things easier for web devs, and they don't need to especially care about the URI mappings. I simply can't see a problem here; it seems like a red herring.

From the standpoint of a browser-based API (both browser devs writing the API implementation and the web devs using it), I guess this is where the problem may be?

From an RDFa perspective, you'd effectively need to insist that the values of properties and types be URIs. And the browser would need to be able to construct those URIs. So with a profile, for example, it would need to look them up, and hence introduces an external dependency. Ditto namespace prefixes (though those are typically embedded in the same document; still, since HTML has no namespace, it becomes awkward).

Is that the bottom line: the RDFa requirement for decentralized extensibility based on URIs vs. microdata insistence on self-contained description?

If yes, then the only way to resolve at least some of that tension is to say no simple tokens for types and properties; only URIs. But then presumably some markup authors complain about requiring URIs, and some devs using the DOM API might complain about URI values rather than simple tokens.

So is that the issue?
 
+Bruce D'Arcus hopefully someone else will answer, who isn't from the (former) RDFa working group, however to quickly address some of your questions:

"What's the problem with URI shortening and solutions like RDFa 1.1 profiles?" - RDFa profiles do have their own set of drawbacks (such as dependency on external documents, and slower processing due to that when caching isn't available); however, @prefix and @vocab are both very simple approaches, @vocab especially.

"I simply can't see a problem here; seems like a red herring." - Likewise!

"From the standpoint of a browser-based API (both browser devs writing the API implementation and the web devs using it), I guess this is where the problem may be?" - If it is, then goodness knows why: it's incredibly simple to support @prefix declarations, and many developers (well, everybody who's written any RDF, SPARQL, or RDFa tooling) have found it very easy; I certainly did.

"since HTML has no namespace, it becomes awkward" - HTML doesn't have nice, reusable, first-class namespace support; however, RDFa 1.1 introduces @prefix instead of traditional namespaces, and this works for XML, HTML, Atom, or any other markup format that can host RDFa. If traditional XML namespaces are avoided altogether, it's a very simple affair. If you mix and match, using the deprecated XML namespace functionality in RDFa alongside profiles and prefixes, it does become a little more complex to support, because potential collisions between prefix declarations have to be resolved; that's easily avoided by just not using XML namespaces, though!

"the RDFa requirement for decentralized extensibility based on URIs vs. microdata insistence on self-contained description?" - if that is the issue, then I'd suggest it's an artificial one, since both approaches can easily be merged by using @vocab and simple token terms (which are concatenated to the vocabulary URI to make full URIs for each property/type), or by using profiles. Perhaps it is the issue, though... who knows? (Seriously, who knows?)
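A minimal sketch of the @vocab point, assuming the RDFa 1.1 rule that a bare term is concatenated onto the in-scope vocabulary URI. The function name and the example vocabulary URI are illustrative only, not taken from any spec or library:

```python
from typing import Optional


def expand_term(term: str, vocab: Optional[str]) -> str:
    """Hypothetical sketch: concatenate a bare term onto the in-scope @vocab URI."""
    if vocab is None:
        # Without a vocabulary (or profile) in scope there is nothing to expand against.
        raise ValueError(f"no @vocab in scope for term {term!r}")
    return vocab + term


# Example vocabulary URI, chosen for illustration only.
VOCAB = "http://schema.org/"
print(expand_term("Person", VOCAB))  # http://schema.org/Person
print(expand_term("name", VOCAB))    # http://schema.org/name
```

The author only ever types `Person` or `name`; the full URI exists purely on the processor side.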

"But then presumably some markup authors complain about requiring URIs, and some devs using the DOM API might complain about URI values rather than simple tokens." - as above, simple terms can easily be used in conjunction with profiles and/or @vocab, and a DOM API could support lookup by simple terms, aliases, or prefixed tokens extremely easily. It only takes a couple of objects to resolve terms to URIs and back again, which is very simple to implement. I'm quite sure of that one, since I spec'd it myself in the RDF Interfaces specification and have implemented it multiple times (it's only a few lines of simple code).
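The claim that resolving terms to URIs and back takes only a couple of objects can be illustrated with a small Python sketch. This is not the RDF Interfaces API; the class and method names (`TermResolver`, `resolve`, `shrink`) are hypothetical, and the Dublin Core Terms URI is used only as an example mapping:

```python
from typing import Dict, Optional


class TermResolver:
    """Hypothetical sketch: simple terms and prefix:reference tokens to URIs, and back."""

    def __init__(self) -> None:
        self.terms: Dict[str, str] = {}     # simple term -> full URI
        self.prefixes: Dict[str, str] = {}  # prefix -> namespace URI

    def set_term(self, term: str, uri: str) -> None:
        self.terms[term] = uri

    def set_prefix(self, prefix: str, uri: str) -> None:
        self.prefixes[prefix] = uri

    def resolve(self, token: str) -> Optional[str]:
        """Try a simple term first, then a prefixed token, then an absolute URI."""
        if token in self.terms:
            return self.terms[token]
        prefix, sep, reference = token.partition(":")
        if sep and prefix in self.prefixes:
            return self.prefixes[prefix] + reference
        if "://" in token:
            return token  # already an absolute URI
        return None

    def shrink(self, uri: str) -> str:
        """Reverse direction: prefer a simple term, then a prefixed token, else the URI."""
        for term, full in self.terms.items():
            if full == uri:
                return term
        for prefix, namespace in self.prefixes.items():
            if uri.startswith(namespace) and len(uri) > len(namespace):
                return f"{prefix}:{uri[len(namespace):]}"
        return uri


resolver = TermResolver()
resolver.set_prefix("dct", "http://purl.org/dc/terms/")
resolver.set_term("title", "http://purl.org/dc/terms/title")
print(resolver.resolve("title"))        # http://purl.org/dc/terms/title
print(resolver.resolve("dct:creator"))  # http://purl.org/dc/terms/creator
print(resolver.shrink("http://purl.org/dc/terms/creator"))  # dct:creator
```

A DOM-facing API built on something like this could accept `title`, `dct:title`, or the full URI interchangeably.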

Glad you're trying to get to the bottom of the precise issue though, rather than just discussing the noise! Wishing you the best of luck!
 
+Bruce D'Arcus: I once did a search of several billion pages and saw 0.0014% of pages were XHTML. Note that in HTML now, the only way to distinguish XHTML and HTML is the MIME type. (This has always been the case for practical purposes, but it is now syntactically true also.) Not sure what you mean by "conflict". Boolean attributes are used all over HTML.

+Bruce D'Arcus: The problem with prefix indirection is that it is cognitively difficult. I don't know why; I find them pretty easy myself, as do many people in the W3C (or so they claim — I can't count the number of teleconferences I've been on where I've spent 15 minutes explaining XML Namespaces to other W3C people, including many who thought they understood them). See lots of anecdotal evidence of the difficulties here: http://wiki.whatwg.org/wiki/Namespace_confusion

+Danny Ayers: There's a +1 button on each post, no need to say "+1". :-)
 
+Nathan Rixham, regarding DOM APIs, I invite you to read through the thread starting at http://lists.w3.org/Archives/Public/public-rdfa-wg/2011Jul/0001.html

In a nutshell, it requires browsers to create an internal RDF graph and keep track of which part of the graph came from which HTML element. This is already pretty bad, but it is made much, much worse by the fact that scripts can modify the DOM at any time, and the RDF graph must be kept in sync with it. It's obviously possible, but far more complex than, say, getElementsByTagName.
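A rough model of the bookkeeping described above: every triple remembers which element produced it, and a DOM change must retract and re-extract. `Element`, `ProvenanceGraph`, and the idea of calling `element_changed` from a mutation hook are all hypothetical. The toy also deliberately ignores inheritance: in real RDFa, changing @about or @vocab on one element can change the triples produced by every descendant, so invalidation would have to cover whole subtrees.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Element:
    """Hypothetical stand-in for a DOM element carrying RDFa-style attributes."""
    ident: str
    subject: str
    properties: Dict[str, str] = field(default_factory=dict)


class ProvenanceGraph:
    """Hypothetical sketch: triples indexed by the element that produced them."""

    def __init__(self) -> None:
        self.by_element: Dict[str, Set[Triple]] = {}

    @staticmethod
    def extract(el: Element) -> Set[Triple]:
        return {(el.subject, prop, value) for prop, value in el.properties.items()}

    def element_changed(self, el: Element) -> None:
        # Would be driven by a DOM mutation hook: drop old triples, add fresh ones.
        self.by_element[el.ident] = self.extract(el)

    def element_removed(self, el: Element) -> None:
        self.by_element.pop(el.ident, None)

    def triples(self) -> Set[Triple]:
        result: Set[Triple] = set()
        for produced in self.by_element.values():
            result |= produced
        return result

    def elements_for(self, triple: Triple) -> Set[str]:
        """Answer 'which element(s) did this triple come from?'"""
        return {ident for ident, produced in self.by_element.items() if triple in produced}


TITLE = "http://purl.org/dc/terms/title"
graph = ProvenanceGraph()
h1 = Element("h1", "http://example.org/doc", {TITLE: "This Title"})
graph.element_changed(h1)
h1.properties[TITLE] = "A New Title"  # a script edits the DOM...
graph.element_changed(h1)             # ...and the graph must be resynced
print(graph.triples())
```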

+Manu Sporny also tells me that prefix="" is getting the axe, so it's no longer relevant.
 
So, after reviewing the "Namespace confusion" article and all the links still accessible from that page, and filtering out anything specific to "XML Namespaces" (since RDFa 1.1 uses @prefix and other methods), the only point I can see that is still possibly valid is "The indirection layer from prefix to URI confuses people.".

So, let's look at the "The indirection layer from prefix to URI confuses people." issue in more detail. Again using the "Namespace confusion" document and all the links from it, we get the following:

http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0064.html
"Wrong prefix used causing bad expansion": in short, "rdfs:Property" had been written instead of "rdf:Property".
This mistake was most likely made because RDF and RDFS properties are frequently used together. Would this mistake still have happened if simple terms like "rdf-property" and "rdfs-property" were used? Yes; if anything, it reflects on the decision to put properties commonly used together in different specs and URI spaces, rather than on the use of namespaces specifically. Was it caught and fixed easily? Yes, the email is proof.

http://lists.w3.org/Archives/Public/public-rdfa/2009Mar/0068.html
"Declared prefix and used prefix differ": this is much the same issue as above; a prefix "dc" was used rather than "dct". Why did this happen? Well, most metadata experts used Dublin Core for many years with the "dc" prefix; a few years ago DCMI released DC Terms, again in a different URI space, and many of the people familiar with using "dc" as a prefix just kept on using it out of habit. Would this still have happened had a simple token like "dct-conformsTo" been used? Yes; it shows us that humans are creatures of habit. Was it caught and fixed easily? Yes, again the email is proof.

http://lists.w3.org/Archives/Public/public-rdfa/2009Mar/0060.html
"the xsd namespace has not been declared": this bug is exactly as it sounds. Somebody didn't declare the "xsd" namespace, a familiar and age-old mistake; people often think it's just always "known" by the processor. Thankfully, in RDFa it is always known by the processor, so omitting it doesn't cause any issue. (Aside: it is quite natural for people to expect a format to have built-in datatypes, and much of the confusion over XML Namespaces in relation to XSD was often due to this prior assumption, or so it seems to me at least.) Regardless, this was a minor bug that had no effect, and again, it was quickly caught and addressed.

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-May/019717.html
"Wrong URI declared (slash missing)": xmlns:v="http://rdf.data-vocabulary.org" vs xmlns:v="http://rdf.data-vocabulary.org/". A schoolboy error, yet still an easy mistake to make. I certainly concede that this is a valid point; however, I'd strongly suggest that on its own (as it currently appears to be), it's hardly a blocker, in the same way that people writing href="www.google.com" doesn't block href from being in a specification...
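Why the missing slash matters: prefix expansion is plain string concatenation, so nothing flags the mistake; you just get a different (wrong) URI. A small sketch, where `expand_curie` is a hypothetical helper and `v:Person` is used only for illustration. The href analogy is shown with Python's standard `urljoin`, which treats a scheme-less value as a relative path:

```python
from typing import Dict
from urllib.parse import urljoin


def expand_curie(curie: str, prefixes: Dict[str, str]) -> str:
    """Hypothetical sketch: expand prefix:reference by string concatenation."""
    prefix, _, reference = curie.partition(":")
    return prefixes[prefix] + reference


buggy = {"v": "http://rdf.data-vocabulary.org"}   # trailing slash missing
fixed = {"v": "http://rdf.data-vocabulary.org/"}
print(expand_curie("v:Person", buggy))  # http://rdf.data-vocabulary.orgPerson
print(expand_curie("v:Person", fixed))  # http://rdf.data-vocabulary.org/Person

# The href analogy: a scheme-less value silently becomes a relative path.
print(urljoin("http://example.com/page/", "www.google.com"))
# http://example.com/page/www.google.com
```

In both cases the error is silent at parse time and only shows up when someone looks at the result.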

http://lists.xml.org/archives/xml-dev/200808/msg00030.html
"declaring the same nsprefix twice on one element": a fringe question at best; however, it is addressed when using prefixes in RDFa 1.1 (the last declaration wins). Is there any need to labour this minor fringe case?
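A sketch of the "last declaration wins" behaviour described above, applied to an RDFa 1.1-style `prefix="..."` attribute value (whitespace-separated `name: URI` pairs). The function name is hypothetical, and the precedence rule follows the comment above rather than being quoted from the spec:

```python
from typing import Dict


def parse_prefix_attribute(value: str, inherited: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical sketch: apply 'name: URI' pairs in order, so later ones overwrite."""
    mappings = dict(inherited)  # start from the mappings in scope from ancestors
    tokens = value.split()
    i = 0
    while i + 1 < len(tokens):
        name, uri = tokens[i], tokens[i + 1]
        if name.endswith(":") and len(name) > 1:
            mappings[name[:-1]] = uri  # same prefix again: last declaration wins
            i += 2
        else:
            i += 1  # skip a malformed token and keep scanning
    return mappings


print(parse_prefix_attribute("ex: http://example.org/a/ ex: http://example.org/b/", {}))
# {'ex': 'http://example.org/b/'}
```

A local declaration likewise overrides one inherited from an ancestor, while other inherited prefixes stay in scope.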

Now, there is quite a lot of feedback on "XML Namespaces cause confusion"; however, all of it is centred on having elements from different specifications and modules working together in a single document, not on simple "prefix indirection", and is thus moot in this discussion.

This leaves us with one final point: the use of a "Hard-wired prefix". For those not familiar, this is where certain large entities like Facebook and Yahoo! hard-code their tools to recognise their own prefixes, and in many respects treat prefixed terms like simple string tokens, ignoring how the prefix actually resolves.

IMHO, the final point is the only one with any credibility in this entire discussion, and my own personal opinion is that if these large entities choose to do that, it's fine by me (indeed, that's some of the sentiment behind the "default profile" in RDFa 1.1). However, I can't personally say that it outweighs the benefits which many, if not all, RDFa authors gain by using prefixes.

Time sink.
 
+Ian Hickson - so on the XHTML issue, you're talking about XHTML served as XML, rather than the markup (which is what I thought we were talking about).

On the prefix indirection, I was trying to get people to be really specific on exactly who we're talking about, and what the precise problem is, on the belief that it would be to everyone's benefit if you all could agree on what you're disagreeing about. But you kind of passed over that. :-)

I identified effectively three groups of relevant stake-holders:

1) embedded data markup authors (the web devs that put together HTML pages + embedded data; they're typically writing templates I'd say)

2) API consumers (that would write JS code to do stuff with embedded data via an API)

3) browser/tool developers (they write the parsers and the API code to facilitate #2)

So who is the indirection "cognitively difficult" for? All three equally? One or two more than others?

My assertion is that concerns about URI shortening (prefixes, etc.) are less about group 1 than the other two, in part because it's possible to do it in ways that can be transparent to group 1 (they don't need to care about it). In this RDFa 1.1 example, a template author doesn't need to care about what URI gets prefixed to the "title" token:

<div profile="http://www.example.org/test.html">
<h1 property="title">This Title</h1>
</div>

So, it seems to me, there's no cognitive load on the author; it's instead shifted to the parsing code.
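To make "shifted to the parsing code" concrete, here is a hedged sketch of what a parser might do with the snippet above: fetch the profile named in @profile once, cache it, and map the bare `title` token to a full URI. The profile's contents are unknown, so the term mapping below is invented, the fetch function is a stand-in for an HTTP request, and all names are hypothetical:

```python
from typing import Callable, Dict, Optional

TermMap = Dict[str, str]

# Invented profile contents: the real mapping published at
# http://www.example.org/test.html is unknown.
FAKE_WEB: Dict[str, TermMap] = {
    "http://www.example.org/test.html": {"title": "http://purl.org/dc/terms/title"},
}


class ProfileCache:
    """Fetches each profile once; later lookups are served from memory."""

    def __init__(self, fetch: Callable[[str], TermMap]) -> None:
        self.fetch = fetch
        self.cache: Dict[str, TermMap] = {}
        self.fetch_count = 0

    def terms(self, profile_url: str) -> TermMap:
        if profile_url not in self.cache:
            self.fetch_count += 1  # the external dependency: a network round trip
            self.cache[profile_url] = self.fetch(profile_url)
        return self.cache[profile_url]


def resolve_property(term: str, profile_url: str, profiles: ProfileCache) -> Optional[str]:
    """Parser side: turn the author's bare token into a full URI, if the profile defines it."""
    return profiles.terms(profile_url).get(term)


profiles = ProfileCache(fetch=lambda url: FAKE_WEB[url])  # stand-in for an HTTP GET
print(resolve_property("title", "http://www.example.org/test.html", profiles))
print(profiles.fetch_count)
```

The template author only wrote `property="title"`; the fetch, the caching, and the lookup all live in the processor, which is also where the external-dependency and latency costs of profiles show up.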
 
+Philip Jägenstedt: I think you may be misquoting there. I am a member of the RDFa group and work on the specification, and since I have been ill recently, I double-checked with +Manu Sporny; his response was "wtf, who said that??" So I think we can safely say that @prefix, one of the primary features of RDFa, will be staying ;)

Re the DOM API feedback, I'll look into that in detail in a moment.
 
+Nathan Rixham: You are welcome to your interpretation of the anecdotal evidence. That's what language design is all about. I happen to draw a different conclusion, and thus design languages differently. The market will decide which of us is right.

+Bruce D'Arcus: When it comes to the syntax, if you're not using the XML MIME type, you don't have to be constrained by the XML syntax, so yes, the MIME type is what's important in this matter.

As far as the prefix indirection thing goes, I think we have evidence that it's confusing to all three stakeholders you mention. For example, Facebook and pretty much everyone Google worked with for Rich Snippets have repeatedly screwed up namespace-based authoring in an RDFa context (group 1), and Google and Yahoo! both screwed up namespace processing (group 3). For group 2, I'm not aware of any APIs for this stuff specifically in this context, but certainly lots of people have gotten the namespace aspects of the DOM API wrong disproportionately compared to other things in the API.
 
+Ian Hickson - OK, thanks. But as outlined by +Manu Sporny in https://plus.google.com/102122664946994504971/posts/EuJxd385NMH that proposal for RDFa does not use prefixes or indirection (except optionally).

On "When it comes to the syntax, if you're not using the XML MIME type, you don't have to be constrained by the XML syntax ..." Is that really relevant to content authors? The fact is, tons of content on the web is authored in XHTML (in my case, it's because I prefer to use XML tools for this, and prefer to allow the content to be processed using XML tools), so if you care about usability, shouldn't you meet people where they are?
 
+Ian Hickson I'm confused now. You already said:

"By implementors here I meant browser vendors (and to a lesser extent, search engine vendors, validator vendors, and the like — the people whose code decides whether the technology succeeds or not)"

.. and now you are saying that the market will decide.

Which one is it?

I guess logically, if something isn't made readily available in the market, by the implementers, then it's pretty much doomed to failure within that market place.

Thus, I'm unsure that the market will decide which of us is right, since there are many markets, and one of those which I'm familiar with has already very widely adopted prefix-based indirection; it's an integral part. So all that remains to be seen, it appears, is whether it can make it into parts of HTML, with some vendor support, so that the (well, another, intersecting) market actually can decide.

NB: I speak of prefix-based indirection, of course, not XML Namespaces; the two are different.
 
+Bruce D'Arcus: optional features do not simplify the language. You can't ignore them. If anyone uses them, as an author, you have to learn about them because you'll end up having to maintain someone else's code that uses them.

Perl is a classic example of this. You can write really readable, understandable, clear Perl, if you ignore a lot of the optional features. But that doesn't mean Perl is really readable, understandable, and clear. (Note: I love Perl, just like I have no problem with namespaces personally.)

Regarding XHTML: By and large, while people think they're writing XHTML, they're really not. They use /> and quote their attribute values and use lowercase elements, but those are all valid in HTML too, and they do things that aren't valid in XHTML, like omitting certain tags, or not quoting some attribute values, or using <noscript>, or assuming <script> gets parsed differently, or... etc.

Anyway, itemscope="" works fine in XML. I don't see the problem.

+Nathan Rixham: I don't understand the question. Making sure things get implemented is a huge factor in making them a market success — if they're not implemented, nobody's gonna use them. That's why I pay attention to what implementors are willing to implement. Where's the conflict in what I wrote?
 
+Nathan Rixham, implementors are key part of the market when it comes to technology success. (Of late, I've had the feeling that RDFa advocates are seeking to define "market" as not including browser vendors or search engines with the implication that search engines and browser vendors should implement everything and then let Web authors as the "market" decide what to use. That's not how it works.)
 
+Ian Hickson "optional features do not simplify the language. You can't ignore them." I think, based on how +Manu Sporny explains it, here you can. But I'll leave that to him to explain if he likes, since I don't follow the details of RDFa development.

On the "people think they're writing XHTML [but] they're really not" point, if I write my HTML in an XML editor (gives me live-completion and validation) against an XHTML schema, then I definitely am writing XHTML. How it's interpreted by a browser is a separate matter that's not relevant to the question of markup usability.

"Anyway, itemscope="" works fine in XML. I don't see the problem." My point is, it seems, you didn't test it. You constantly cite a usability study to justify particular design decisions (and more importantly, the rejection of RDFa), and then you now hedge by saying testing the boolean attribute solution for XHTML authors wasn't important. That's not consistent.