After questioning, in comments on recent posts by +Bruce Lawson and +Jeni Tennison, the sanity of grafting large amounts of metadata onto HTML, I decided to look at one example: Queen's album Hot Space on MusicBrainz, which publishes RDFa.

The original page was 26481 bytes. After removing (http://pastebin.com/LPEv3KHk) all RDFa attributes, together with the elements that became redundant without them, it was 19087 bytes. The RDFa therefore accounts for ~28% of the published document, or, put the other way, adds ~39% to a hypothetical RDFa-free original. Given RDFa's URI compression feature/bug (CURIEs), it's unlikely that the microdata equivalent would be any smaller.
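For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The two byte counts are the ones quoted above; the variable names are just mine for illustration.

```python
# Byte counts quoted in the post.
with_rdfa = 26481      # page as published, RDFa included
without_rdfa = 19087   # same page after stripping RDFa

# Bytes attributable to RDFa attributes and their wrapper elements.
rdfa_bytes = with_rdfa - without_rdfa

# Share of the published page that is RDFa.
share_of_page = rdfa_bytes / with_rdfa

# Growth relative to the stripped (hypothetical RDFa-free) page.
overhead = rdfa_bytes / without_rdfa

print(f"RDFa bytes: {rdfa_bytes}")
print(f"share of published page: {share_of_page:.1%}")
print(f"overhead over stripped page: {overhead:.1%}")
```

The two percentages differ only in the denominator: the first divides by the page as served, the second by the page as it would be without RDFa.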

MusicBrainz has an XML API, and legislation.gov.uk publishes both XML and RDF, all of which are much better than any (lossy) HTML encoding. Given that, is it worth the developer time to try to graft all of this data onto HTML? Is it worth the extra (say) 10-50% of markup for the benefit of the hypothetical triple/item crawler that comes along? (None have been confirmed to exist for MusicBrainz yet.)

Because this kind of complex example has come up in the discussion around microdata, I must ask: Is all of this extra markup only for good measure, or are there actual, concrete benefits? It seems to me that HTML is not the best tool for data interchange on a large scale and that no one who actually cares about data quality would use it for that purpose.

(Dear MusicBrainz developers: Please take no offense at this post; I love you all and thanks for NGS.)

(Original speculation in http://www.brucelawson.co.uk/2011/microdata-help-please/comment-page-1/#comment-779035 and http://www.jenitennison.com/blog/node/160#comment-11178)