 
I've seen some talk here on Google+ that perhaps the so-called "Semantic Web" will go the way of XHTML and be replaced by the "Social Web". I beg to differ. I think we will see the social web, but the social web will also be semantic. I expect to hear more about projects like SIOC in the near future, especially given Google's history of trying to rally people around related standards (ActivityStreams, etc.).

cc: +Chris Messina +Stephen Weber +Will Norris +Joseph Smarr +Chris Saad +Tantek Çelik +Kevin Marks +Varish Mulwad +Ed Chi +John Breslin +Dan Brickley +Evan Prodromou +Monica Wilkinson +Tim Finin ...
21 comments
 
This might sound controversial, but at least for the foreseeable future, the semantic web simply won't work - especially in the context of Social. Social is Synaptic not Semantic. It already works that way.
 
The uppercase Semantic Web is just an academic pipe dream. If you mean putting semantics in web pages or otherwise having machine-readable data available... that's not going away :)
 
+Stephen Weber to your point about machine-readable data already being available, this is part of what I'm getting at - we'll only see the layers of semantic metadata that have real utility. For example, we're already seeing some adoption of Schema.org vocabulary terms because it holds the promise of better SEO - translating to higher value (and volume) of traffic from search. Whether or not that particular initiative catches on in a big way is irrelevant. The point is that it makes (financial) sense to use a common vocabulary (especially if it will impact your search rankings).
 
I think there is similar value to be found in coming to common terms for describing social connections. +Chris Saad pointed out that social is more synaptic than semantic, but what about the problem of sorting your friends into circles? That's a potential layer of metadata that goes beyond simple binary connections, and it's data that has value to you as a user.
 
Now, that layer of metadata (sorting of contacts into circles) might not be described in a machine-readable way right now, but there's the potential added value of being able to augment and extend this manual sorting process with algorithmic sorting if there were a way of describing what criteria you were using in a machine-readable way.
 
Algorithms, yes. Semantic Algorithms, Unlikely. More likely people would be auto-grouped by how frequently they appear together in emails or messages which, again, is a synaptic process rather than a semantic one.
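As a rough illustration of the frequency-based grouping described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the message data, the `auto_group` name, and the rule that two contacts land in the same group once they have appeared together on at least `threshold` messages. It is not how any particular product sorts contacts.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_counts(messages):
    """Count how often each pair of contacts appears on the same message."""
    counts = Counter()
    for recipients in messages:
        for pair in combinations(sorted(set(recipients)), 2):
            counts[pair] += 1
    return counts


def auto_group(messages, threshold=2):
    """Group contacts whose pairwise co-occurrence reaches `threshold`."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Register every contact, even ones that never reach the threshold.
    for recipients in messages:
        for person in recipients:
            find(person)

    # Merge the groups of any pair seen together often enough.
    for (a, b), n in co_occurrence_counts(messages).items():
        if n >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for person in parent:
        groups.setdefault(find(person), set()).add(person)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical recipient lists, one per message.
messages = [
    ["alice", "bob", "carol"],
    ["alice", "bob"],
    ["bob", "carol"],
    ["dave", "erin"],
    ["dave", "erin"],
    ["alice", "dave"],
]
groups = auto_group(messages)
```

No vocabulary or description of the relationship is involved here; the groups fall out of message traffic alone, which is the sense of "synaptic" used in the comment.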
 
Have to mostly agree with +Chris Saad and +Stephen Weber here.

+Mike English - one point of yours I agree with: "it makes sense to use a common vocabulary" - which is why http://microformats.org was founded with that purpose, a small number of specific immediately useful (80/20) vocabularies developed scientifically via a process (http://microformats.org/wiki/process).

Now, you mention SIOC - and perhaps I've stayed silent on this far too long.

SIOC is part of the uppercase Semantic Web academic pipe dream that +Stephen Weber mentions.

Where is the science, the research, the use-cases that went into the development of the SIOC vocabulary?

Frankly I view SIOC (and most uppercase Semantic Web vocabularies) as "semantic alchemy" - they're just stuff that "experts" made-up - no documented research of real world web content publishing behaviors went into them - if they had, they'd have abandoned any semblance of RDF (or XML) a while ago and based something on HTML - which is what and why we did so with microformats.

And it's not just SIOC that's the problem. It's the very mindset of first thinking "SIOC project" or "standards" rather than use-cases and scenarios.

Semantics for semantics' sake or standards for standards' sake are almost a complete waste of time.

If you want to talk about what we'll see with the "social web", focus on developing use-cases, user scenarios, build something that users will use (start with your own site per indiewebcamp.com) and then once you've got things working for real that you use on a daily basis, then let's talk standards and interoperability.
 
So perhaps SIOC and the capital "Semantic Web" were the wrong things to call out. As above, I'm all for solutions that have real, practical utility and value. The reason I did choose to link to SIOC is simply because it provides a large, descriptive, and already established vocabulary. It makes sense to me to borrow as many terms as possible from existing solutions before trying to re-invent the wheel. (Duly noting that microformats are another existing - and quite good - solution that I should have also mentioned above.)

I'd really love to hear a response from those behind the 'academic pipe-dream'...
Is SIOC the appropriate analog to the Dublin Core for the social space?
 
+Mike English I agree that it makes sense to at least research existing solutions before trying to re-invent the wheel, and yes, often borrow terms from them as well - but this doesn't happen automatically. It's specifically why the microformats process explicitly requires "researching previous formats".

You mention Dublin Core - in practice I've found Dublin Core only useful as a reference to previous work. Do you know of any real world applications, sites that actually use (publish and consume) Dublin Core on the open web today?
(Yes I'm saying Dublin Core is useful purely for research into previous formats, and not useful for any actual practical deployment, tools etc.)
 
DOAJ.org provides access to records using (OAI) DC. Worldcat.org provides XML results from their search API using DC. Zotero consumes an RDF format based on DC...
 
DOAJ.org is down. http://www.downforeveryoneorjustme.com/DOAJ.org

http://www.worldcat.org/ appears to only index actual "libraries" - not the open web (plus XML results are a bit dated these days, if they're not updating to JSON, they're behind).

http://www.zotero.org/ seems interesting - might try it out. Anyone here using it and if so, for what practical purpose(s), how often, and when was the last time you used it?

(oh and none of those interoperate - DC is not much of a "format" if no-one is actually interoperating with it - more like a vocabulary template for incorporation into proprietary APIs - hence what I said - useful for research into previous formats, but not a useful self-standing vocabulary/format of its own)
 
If you're looking at bibliographic data, there's a draft survey just circulated from the w3c 'linked library' incubator, see blogs.curtin.edu.au/libres/2011/07/01/w3c-library-linked-data-incubator-group-draft-report-call-for-public-comment/ and nearby:

(excuse the hasty reformatting)

http://www.w3.org/2005/Incubator/lld/wiki/DraftReportWithTransclusion

...which consists of “Benefits” “Vocabularies and Datasets” “Relevant Technologies” “Implementation challenges” “Recommendations” – “Use Cases”, a survey report describing existing projects...

http://www.w3.org/2005/Incubator/lld/wiki/UseCaseReport

– “Vocabularies and Datasets”, a survey report

http://www.w3.org/2005/Incubator/lld/wiki/Vocabulary_and_Dataset
 
Hi all -

I don't want to get into the old "microformats-for-real-needs" versus "Semantic Web-for-academics" battle with +Tantek Çelik as I know many are tired of that, and I prefer to think of all efforts working in the same direction - adding more useful metadata to the Web.

Development of SIOC was based on contributions from both academics and industry, and contributions from my own personal experience of running a large network of discussion systems that sat apart as data silos (boards.ie has many disconnected forums, blogs, newsgroups). Certainly, there is still some of the dream as opposed to practice in terms of getting everything to talk to each other, but baby steps... There are about 100 applications that produce or use SIOC data; see my most recent presentation from the Web Science Summer School two days ago (http://www.slideshare.net/Cloud). Some apps are defunct, some are still emerging. SIOC along with FOAF and DC are produced out of the box from Drupal 7 (since January 2011), using RDFa, which is pretty significant for many applications including search.

@Mike English: Yes, I believe a combo of FOAF and SIOC are the analogues of DC for the social space, for describing people and the content they make. SIOC can be used as a means of marking up social web content using RDFa or by mapping to other microdata/microformats; it can be used to provide a complete dump of a social website's content in XML if needs be; or it can be used as a native storage format along with FOAF, etc. - see +Alexandre Passant's semantic microblogging system, SMOB (http://www.smob.me), a distributed microblogging framework.


@Chris Saad: I'd like to hear more of how you think this contrasts with the Synaptic Web. I liked Khris' interview last year on this at http://newtechpost.com/2010/08/17/the-synaptic-web-more-like-a-brain but it'd be good to contrast with microformats and Semantic Web.

Thanks,
John.
--
 
There seems to be a dichotomy of thought on this one. Broadly speaking, academia in one corner and "the valley" in the other, with a healthy overlap in the middle. Both sides have excellent thinkers. So it's nice to see the occasional thread that brings them together.

I'm not sure it's correct to dismiss one world view or the other. However, I am reminded of 'The Cathedral vs The Bazaar': one side with the accent on what's practical, the other leaning towards purity, with the hope of creating a masterpiece. No firm conclusions on this one, but go look at a cathedral and look at a bazaar; one might have taken longer to build, but was perhaps worth the wait.

Another analogy is the investment community. Wall St. vs the so-called 'value investors'. Wall St. have a 2 year horizon. Value investors have a 2 decade horizon. Both work, but Warren Buffett is the richest man in the world. It just took him a long time to get there! Perhaps timbl is the Buffett of the internet ... time will tell. Whatever the paradigm we're at the start of a transformative period and it's exciting to be part of.

I think The Web was always designed to be social. FOAF, SIOC & WebID are hopefully a growing part of the experience. Nice demos are starting to emerge, e.g.

http://myprofile-project.org/
http://sioc.me/
http://webid.info/

I'm not sure how well I've framed this debate, but I really hope to see a coming together of both approaches. A final thought, paraphrasing from the original text that inspired the great Cathedrals:

"The Social Web will not come by expectation. Social is spread across The Web but men do not see it!" :)
 
I agree. And I would think that RDF would be a particularly useful tool, given that it is conceptually graph-based and could allow for a great deal of information to be produced (about the social web) by applying various graph traversal methods. However, I can definitely see where +Tantek Çelik (and others) are coming from - RDF (e.g. FOAF) has proven too complex to see widespread adoption, and without widespread usage, much of the utility is lost.
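To make the graph-traversal point concrete, here is a minimal sketch in plain Python. It assumes RDF-style data is held as (subject, predicate, object) tuples; the `ex:` identifiers and the `within_hops` helper are hypothetical, and only the `foaf:knows` property IRI comes from the FOAF vocabulary. A real application would more likely use an RDF library and a query language than hand-rolled traversal.

```python
from collections import deque

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Hypothetical triples; the "ex:" names are placeholders, not real IRIs.
triples = [
    ("ex:alice", FOAF_KNOWS, "ex:bob"),
    ("ex:bob", FOAF_KNOWS, "ex:carol"),
    ("ex:carol", FOAF_KNOWS, "ex:dave"),
    ("ex:alice", "ex:memberOf", "ex:book-club"),
]


def within_hops(triples, start, predicate, max_hops):
    """Breadth-first traversal along `predicate` edges.

    Returns {node: hop_count} for every node reachable from `start`
    in at most `max_hops` steps, excluding `start` itself.
    """
    edges = {}
    for s, p, o in triples:
        if p == predicate:
            edges.setdefault(s, []).append(o)

    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]
    return seen


# Friends and friends-of-friends of alice.
reachable = within_hops(triples, "ex:alice", FOAF_KNOWS, 2)
```

The same function works for any predicate, which is part of the appeal of a uniform graph model: the traversal code does not need to know what the edges mean.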

I don't pretend to know what the best solution is, but I do know that both sides of this apparent divide stand to benefit from talking to each other.
 
Also, this may be of interest to some - I just happened across this JSON-LD spec being developed by +Gregg Kellogg and +Manu Sporny : http://json-ld.org/spec/latest/

It looks much lighter weight and more intuitive than previous attempts I've seen to shoehorn RDF into JSON (which sometimes seemed to focus merely on the container format and missed the point of increasing usability completely).
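For a sense of what that looks like, here is a small sketch of a JSON-LD style document built in Python. It uses the `@context` and `@id` keywords as they appear in the later JSON-LD Recommendations, which may differ from the draft linked above; the `example.org` identifiers are placeholders, and `expand_keys` is a toy helper written for this example, not the real JSON-LD expansion algorithm.

```python
import json

doc = {
    # The context maps short, developer-friendly keys to full property IRIs.
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "knows": "http://xmlns.com/foaf/0.1/knows",
    },
    "@id": "http://example.org/people/alice",
    "name": "Alice",
    "knows": {"@id": "http://example.org/people/bob"},
}


def expand_keys(document):
    """Toy expansion: swap top-level short keys for the IRIs in @context.

    This ignores nesting, typed values, and everything else a real
    JSON-LD processor handles.
    """
    context = document.get("@context", {})
    return {
        context.get(key, key): value
        for key, value in document.items()
        if key != "@context"
    }


expanded = expand_keys(doc)
serialized = json.dumps(doc, indent=2)
```

To a consumer that ignores `@context`, the document is ordinary JSON with keys like `name`; the mapping to RDF properties is layered on top rather than baked into every key.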
 
+John Breslin thanks very much for your follow-up, appreciated. You wrote:

"Development of SIOC was based on ..." (snip)

That's great to hear - have you documented that and your methodologies in general on a wiki page? Lessons learned etc.

The reason I ask is that it's clear that with the launch of schema.org the battle for practical web semantics is on, and none of us in this space (who desire more useful open {meta}data on the Web) can afford to mess around, unless you want to simply accede to Google+Microsoft's fait accompli efforts. I for one don't (and I think schema.org is pretty close to pure crap in so many ways - seemingly shot straight out of someone's "Volcano"[1]).

We need to develop convergent vocabularies based on scientific methods (documented methodologies, research, data gathering, etc.) and abandon all semantic alchemy (vocabularies that people make up because of what they want or their supposed expertise), which, frankly, there's all too much of in the "everybody go make up your own vocabulary as you want" philosophy of XML/RDF communities (AKA false promises of namespaces and "distributed extensibility" (DE)).

We need to openly show our work in how we develop vocabularies, preferably so that someone else coming along can both check it, and perhaps even reproduce the same (or at least similar) results given the same data and research.

At this point, anyone not openly documenting (on the web with permalinks) their work in the development of their vocabulary is doing semantic alchemy and deserves to be judged (and even mocked/ridiculed) as such. We all know better.

The microformats process [2] is one such scientific methodology, and the microformats community has worked hard to publicly document research, real world examples, previous formats, etc. as a scientific approach to developing vocabularies.

I'm not saying that the microformats process is the only way to scientifically develop a vocabulary nor even that it's the best / most scientific way possible.

It's the best that I personally know of to date. I've worked hard to make it so, and it's been independently adopted by other groups like ActivityStreams[3], but I am more than happy to see others publish their alternative scientific methodologies and methods for open consideration and iterative improvement (the microformats process itself has been iterated and improved based on feedback over the years, just as methodologies in the development of science have been iterated over the years).


You mention FOAF, and this is an area where I have a feeling we're going to have to have a particularly difficult conversation.

I've already discussed/mentioned much of this in-person with +Dan Brickley directly a number of times over the years, but again, with schema.org forcing the collective open communities' hands, we can no longer afford to work on divergent schema/vocabulary for the same thing (people).


In short: as a vocabulary about people, vCard has "won". vCard/hCard are the VHS to FOAF's Beta.

vCard is in every phone/laptop/tablet etc. hCard is based on vCard. hCard is the most popular way to represent people on the web[4].
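For readers who have not seen hCard, here is a minimal sketch of what it looks like and how little is needed to read it. The HTML sample and the `HCardParser` class are made up for this example; the class names `vcard`, `fn`, `org`, `tel`, and `url` are hCard property names taken from vCard. The parser uses only Python's standard `html.parser` and handles just this simple shape of markup, so it should not be mistaken for a conforming microformats parser.

```python
from html.parser import HTMLParser

# Hypothetical hCard: ordinary HTML with vCard property names as classes.
HCARD_HTML = """
<div class="vcard">
  <a class="fn url" href="http://example.org/alice">Alice Example</a>
  <span class="org">Example Co</span>
  <span class="tel">+1-555-0100</span>
</div>
"""

TEXT_PROPERTIES = {"fn", "org", "tel"}


class HCardParser(HTMLParser):
    """Collects a few hCard properties from simple, well-nested markup.

    Limitation: void elements without a closing tag (e.g. <br>) would
    unbalance the stack; the sample above contains none.
    """

    def __init__(self):
        super().__init__()
        self.card = {}
        self._open = []  # one set of text properties per open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if "url" in classes and attrs.get("href"):
            self.card["url"] = attrs["href"]
        self._open.append(classes & TEXT_PROPERTIES)

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Text belongs to every property-bearing element that encloses it.
        for props in self._open:
            for prop in props:
                self.card[prop] = self.card.get(prop, "") + text


parser = HCardParser()
parser.feed(HCARD_HTML)
card = parser.card
```

The same property names (`fn`, `org`, `tel`, `url`) appear in a vCard file in an address book, which is the convergence being argued for here.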

Portable Contacts and OpenID attribute exchange vocabularies are both based on hCard/vCard (and started to diverge). I managed to get +Kevin Marks +Joseph Smarr +Chris Messina +Rohit Khare and others who work on various open person vocabulary efforts to agree to work together to converge all of our person vocabulary efforts in the IETF VCARDDAV vCard4 group (note: outside of microformats.org) and we've been largely successful with that. The next versions of hCard and PoCo will be based on additions that have been made in vCard4 and thus all interoperate.

So this is an example of a hard decision that uppercase "Semantic Web" advocates/communities will have to make:

Which is more important:

a) converging open vocabulary development efforts (wherever the critical mass is per-vocabulary), e.g. working on a common basis of vCard (whose terms hCard has provided RDF-friendly URLs for e.g. [5] - this was my deliberate effort at some degree of SemWeb/LinkedData compatibility/interoperability)

OR

b) insisting on FOAF and diverging from everybody else working on person semantics on the web?


When the Google schema.org folks complain (see their announcing blog post) that all they've seen is more divergence, this is the kind of problem they're talking about.

A bunch of us managed to get cooperation across several very different groups/communities/cultures (vCard, microformats, PoCo, OpenID) and several syntaxes (MIME properties, class names, JSON, XML) to converge on a person vocabulary - and the only exception was FOAF continuing on its own fork. (Never mind that several of the folks collaborating to converge a person vocabulary work at Google - the schema.org folks are clearly in their own internal corporate silo, were apparently not paying attention to vCard4, and decided to make up their own Person vocabulary.)

Bottom lines:

* it's do or die for open semantic vocabulary development on the web, if we don't collaborate and converge scientifically (rather than politically on alchemy) we might as well give up and let a Google+Microsoft duopolistic collusion dictate the future of web semantics.

* namespaces / distributed extensibility (DE) methodologies are bankrupt (as proven by example by the delayed-open divergent and deficient schema.org).

* anyone involved in practical web semantics needs to converge their vocabulary efforts in an open[6], scientific manner, regardless of which community, forum, wiki, email list, irc channels they use to develop them.

Thanks,

Tantek (probably should have written this as a blog post, maybe I'll edit and republish accordingly)
[1] http://schema.org/Volcano (did you know Volcanos had phone numbers?)
[2] http://microformats.org/wiki/process
[3] http://wiki.activitystrea.ms/Process
[4] http://microformats.org/2010/07/08/microformats-org-at-5-hcards-rich-snippets#microformats-rich-snippets
[5] http://microformats.org/profile/hcard#fn
[6] http://tantek.com/2011/168/b1/practices-good-open-web-standards-development
 
+Tantek Çelik a few responses:

Zotero - it's widely and commonly used by academics (students, researchers and faculty) for gathering citations, organizing and taking notes on them, and subsequently integrating them into publications (theses, journal articles, books, etc.). It's not really designed for more general consumption.

Dublin Core - the problem in discussions of DC is people don't define what they mean (which might be part of DC's problem, admittedly; too many initiatives). But the core vocabulary terms (creator, title, subject, description, etc.) are quite widely used, often embedded in the HTML head elements. Wouldn't you agree?
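As an illustration of the "embedded in the HTML head" usage mentioned above, here is a minimal sketch that pulls Dublin Core terms out of `<meta>` elements. It assumes the common `DC.` name prefix convention (for example `DC.title`); the sample page and the `DublinCoreMeta` class are made up for the example, and real pages vary in prefix, case, and use of the accompanying `<link rel="schema.DC">` declaration.

```python
from html.parser import HTMLParser

# Hypothetical page head using the "DC." prefix convention.
PAGE = """
<html><head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.title" content="An Example Page">
<meta name="DC.creator" content="A. Author">
<meta name="DC.subject" content="metadata">
<meta name="viewport" content="width=device-width">
</head><body></body></html>
"""


class DublinCoreMeta(HTMLParser):
    """Collects <meta name="DC.*" content="..."> values, keyed by term."""

    def __init__(self):
        super().__init__()
        self.terms = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        if name.lower().startswith("dc.") and "content" in attrs:
            # Terms may repeat (e.g. several creators), so keep a list.
            term = name[3:].lower()
            self.terms.setdefault(term, []).append(attrs["content"])


parser = DublinCoreMeta()
parser.feed(PAGE)
dc_terms = parser.terms
```

Note that non-DC metadata such as `viewport` is ignored, so a consumer only needs to agree on the handful of core term names.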

Aside: I find it both funny and disturbing that the schema.org folks couldn't use the most obvious term in the world ("title") from DC for creative works, and instead used "name," presumably because of a clash with vcard/title. In the real world, don't we need a way to deal with ambiguous terms ("python" the serpent vs. "python" the language)?

On your bigger argument - I actually agree with much of the substance, but do get turned off by the way you characterize some tough issues. Your often arbitrary, and certainly loaded, distinction between "scientific" and "semantic alchemy" is a case in point. I would prefer if we all found a more precise, and open, way to talk about different design priorities in vocabulary development. Not exactly sure how to slice it, but it might be:

- general vs. (community) specific: small, say technical, communities, or specific industries, often simply have more precise, demanding needs than will seem important for more general purposes. This is just a fact, and it has nothing to do with the distinctions you're relying on. One conclusion, then, may be that we ought to be specific about who we're designing for (which communities, and how general) and who we're explicitly not designing for?

- ground up vs. top down: I actually don't like the language I'm using here. An alternative more familiar to the academic community might be empirical vs. theoretical. But the idea is, do you start from the concrete details of what you need to encode, or do you early on try to encapsulate the range of differences you see in abstractions? I know where you stand on this, but just want to point out that what you call "scientific" is a little more narrowly empirical science. There is also lots of science that is theoretical.

- researched and documented vs. not - straightforward enough; do you show the process by which you arrived at your result? But also, importantly, have you considered the range of input data appropriate to your stated audience?

So let me demonstrate with a case I know well: scholarly citations. The general perspective more-or-less only sees articles, books, chapters, and that's it. To get to the real, typically unstated, problem, it really implicitly assumes "scholarly citations" = "scientific citations."

But if you talk to a legal scholar (or a law clerk working for a court), or someone who studies medieval history (humanities), you're talking a different world of concern, and a different level of complexity needed to address their real concerns (for the Zotero case above, how do I mark up the document I'm putting up on the web so that a scholar can use it in their documents?).

This intersects with vocabulary design: if you want to allow room for the specific in the context of the general, you have to explicitly account for it. You can't assume, or blindly assert, that "scholarly citations" = "scientific citations" and stop there, for example, or you will end up excluding more specific communities from using your vocabularies. In that case, an effort towards convergence naturally and inevitably leads to fragmentation.

To conclude: these issues are hard, and we would do well to recognize this, and come up with language appropriate to describing these real, practical, tensions.