Denny Vrandečić
1,346 followers
Denny's posts

Post has attachment
Thanks to the Wikimedia community for their trust - today I was elected to serve as a member of the Board of Trustees for the Wikimedia Foundation for the next two years.

Post has attachment
About where Google Search is now and where it is going. Four parts to it.

Post has attachment
Want to become my colleague? We are looking for ontology engineers at Google.

Post has shared content
From Freebase to Wikidata
When we publicly launched Freebase back in 2007, we thought of it as a "Wikipedia for structured data." So it shouldn't be surprising that we've been closely watching the Wikimedia Foundation's project Wikidata[1] since it launched about two years ago. We believe strongly in a robust community-driven effort to collect and curate structured knowledge about the world, but we now think we can serve that goal best by supporting Wikidata -- they’re growing fast, have an active community, and are better-suited to lead an open collaborative knowledge base.

So we've decided to help transfer the data in Freebase to Wikidata, and in mid-2015 we’ll wind down the Freebase service as a standalone project. Freebase has also supported developer access to the data, so before we retire it, we’ll launch a new API for entity search powered by Google's Knowledge Graph.

Loading Freebase into Wikidata as-is wouldn't meet the Wikidata community's guidelines for citation and sourcing of facts -- while a significant portion of the facts in Freebase came from Wikipedia itself, those facts were attributed to Wikipedia and not the actual original non-Wikipedia sources. So we’ll be launching a tool for Wikidata community members to match Freebase assertions to potential citations from either Google Search or our Knowledge Vault[2], so these individual facts can then be properly loaded to Wikidata. 

We believe this is the best first step we can take toward becoming a constructive participant in the Wikidata community, but we’ll look to continually evolve our role to support the goal of a comprehensive open database of common knowledge that anyone can use.

Here are the important dates to know:

Before the end of March 2015
- We’ll launch a Wikidata import review tool
- We’ll announce a transition plan for the Freebase Search API & Suggest Widget to a Knowledge Graph-based solution

March 31, 2015
- Freebase as a service will become read-only
- The website will no longer accept edits 
- We’ll retire the MQL write API

June 30, 2015
- We’ll retire the Freebase website and APIs[3]
- The last Freebase data dump will remain available, but developers should check out the Wikidata dump[4]

The Knowledge Graph team at Google

[1] http://wikidata.org
[2] http://www.cs.cmu.edu/~nlao/publication/2014.kdd.pdf
[3] https://developers.google.com/freebase/v1/
[4] http://dumps.wikimedia.org/wikidatawiki/

Post has attachment
A few thoughts of mine on why AI might turn out to be rather boring, and not likely to destroy humanity.

Post has shared content
A little window into the workings of Wikidata. If you are queasy about how sausage is made, you don't want to know how knowledge bases are made.
I've been volunteering on wikidata.org for a while now and I thought it might be interesting for outsiders to see how Wikidata deals with issues by looking at some that arose around the 'Sex' property and how those issues were handled.

'Sex' is a property used for items about people or animals to specify the sex of the subject of an item. In most cases it is 'sex:male' or 'sex:female' but there can be other values.

The first question that arose was whether 'man' and 'woman' would be better values than 'male' and 'female'. That was resolved by noting that this property is intended to be used for animals other than humans - there are Wikipedia articles on a number of racehorses, for instance. It was agreed that men would be 'instance of(P31):human(Q5)' and 'sex(P21):male(Q6581097)' and similarly for females.

The next issue that arose was that in many languages there is a sharp distinction between male humans and male animals, with different words used for the different concepts. We considered creating a different property for the sex of animals but in the end we kept P21 the same and added two new items that P21 could link to - male animal (Q44148) and female animal (Q43445).

If you look at the labels in other languages you can see the difference:
English: male and male animal
German: männlich and männliches Geschlecht
Spanish: masculino and macho
French: masculin and mâle
Nederlands: man and mannelijk

The next issue that came up was highlighted by the Chelsea Manning affair. How are we to describe the sex of Chelsea Manning?

I will summarise the arguments raised below, together with the responses to these arguments:

Sex and Gender are different things. Combining them in one property is a poor design choice. We should have separate properties for these.

Response: There are millions of humans on Wikidata and for most of them we are judging their sex from their names and how they appeared to whatever sources wrote about them - in other words their gender presentation. There are very few people for whom we have reliable information on their genitalia, much less their chromosomes. If we have two different specific properties then we need specific information to use them, which means that in practice we will not be able to use these properties on most humans. Better to have an ambiguous property that reflects the information we have.

Response: We don't have to leave it blank. We can use the information we have, based on names etc. and we will be right most of the time - there aren't that many transgender people anyway and there are even fewer clandestine transgender people.

Response: There are a lot of people on Wikidata. If we are wrong 1% of the time that is thousands of people. We should find a way to express this that reflects our lack of specific information.

The conclusion was to have one ambiguous property and use qualifiers in those specific cases where we have more information. We changed the English label for this property from 'Sex' to 'sex or gender' with aliases 'gender identity', 'gender expression', 'gender', 'biological sex', 'man', 'woman', 'male', 'female', 'intersex', 'sex'. The English description was changed to "male (Q6581097), female (Q6581072), intersex (Q1097630), transgender female (Q1052281), transgender male (Q2449503), genderqueer (Q48270); for animals use male animal (Q44148) or female animal (Q43445). Add qualifiers as appropriate."

'Qualifiers' as mentioned here refers to a feature of Wikidata. Each statement on Wikidata can have qualifiers to give additional information. For example: 'sex or gender:male' can have the qualifier 'end date:3 June 2012', meaning that the statement 'sex or gender:male' is only true up to that date.

As well as qualifiers, Wikidata also allows a property to have more than one value, each with its own qualifiers. So we can have:
'sex or gender:male', 'end date:3 June 2012'
and 'sex or gender:transgender female', 'start date:4 June 2012' 
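
For readers who think in code, here is a minimal Python sketch of the idea of multiple values with date qualifiers. It is not how Wikidata stores or serves its data; the `Statement` class and the `values_on` helper are made up for illustration, the end date is treated as inclusive (an assumption), and the IDs P21, Q6581097 and Q1052281 are the ones mentioned in this post.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Statement:
    """Hypothetical model of one statement with start/end date qualifiers."""
    prop: str                      # property ID, e.g. "P21" (sex or gender)
    value: str                     # item ID of the value, e.g. "Q6581097" (male)
    start: Optional[date] = None   # 'start date' qualifier, if any
    end: Optional[date] = None     # 'end date' qualifier, if any

    def holds_on(self, day: date) -> bool:
        # Assumption: both qualifier dates are inclusive bounds.
        if self.start is not None and day < self.start:
            return False
        if self.end is not None and day > self.end:
            return False
        return True


# The two values from the example above, each with its own qualifier.
statements = [
    Statement("P21", "Q6581097", end=date(2012, 6, 3)),    # male, until 3 June 2012
    Statement("P21", "Q1052281", start=date(2012, 6, 4)),  # transgender female, from 4 June 2012
]


def values_on(stmts: List[Statement], prop: str, day: date) -> List[str]:
    """Return every value of `prop` whose qualifiers cover `day`."""
    return [s.value for s in stmts if s.prop == prop and s.holds_on(day)]


print(values_on(statements, "P21", date(2010, 1, 1)))
print(values_on(statements, "P21", date(2013, 1, 1)))
```

A statement without any qualifier holds on every date in this sketch, which matches the common case where no more specific information is available.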

This blog post just gives a brief and personal viewpoint of the discussions on the talk page for property P21. For the whole discussion see https://www.wikidata.org/wiki/Property_talk:P21

#wikidata

Post has attachment
Many Googlers are huge fans of Wikipedia. So here’s a little gift for Wikidata’s second birthday.

Some of my smart colleagues at Google have run a few heuristics and algorithms in order to discover Wikipedia articles in different languages about the same topic which are missing language links between the articles. The results contain more than 35,000 missing links with a high confidence according to these algorithms. We estimate a precision of 92% or better (i.e. we assume that fewer than 8% of those are wrong, based on our evaluation). The dataset covers 60 Wikipedia language editions.
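
To get a feel for what that precision means in absolute numbers, here is a rough back-of-the-envelope calculation in Python. It uses the round figures from the paragraph above (35,000 links, 92% precision) as stand-ins; the real dataset has somewhat more links and the precision is only an estimate, so read the result as an order of magnitude.

```python
# Round figures taken from the post; both are approximations.
candidate_links = 35_000   # "more than 35,000 missing links"
precision_percent = 92     # estimated precision of the candidates

# Integer arithmetic avoids floating-point rounding noise.
expected_correct = candidate_links * precision_percent // 100
expected_wrong = candidate_links - expected_correct

print(f"expected correct links: about {expected_correct}")
print(f"expected wrong links:   about {expected_wrong}")
```

So a few thousand of the candidates are probably wrong, which is why human verification before merging is worthwhile.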

Here are the missing links, available for download from the WMF labs servers:

https://tools.wmflabs.org/yichengtry/merge_candidate.20141028.csv 

The data is published under CC-0.

What can you do with the data? Since it is CC-0, you can do anything you want, obviously, but here are a few suggestions:

There’s a small tool on WMF labs that you can use to verify the links (it displays the articles side by side from a language pair you select, and then you can confirm or contradict the merge):

https://tools.wmflabs.org/yichengtry 

The tool does not make the change in Wikidata itself, though (we thought it would be too invasive if we did that). Instead, the results of the human evaluation are saved on WMF labs. You are welcome to take the tool and extend it with the possibility to upload the change directly to Wikidata, if you so wish, or, once the data is verified, to upload the results.

Also, Magnus Manske is already busy uploading the data to the Wikidata game, so you can very soon also play the merge game on the data directly. He is also creating the missing items on Wikidata. Thanks Magnus for a very pleasant cooperation!

I want to call out to my colleagues at Google who created the dataset - Jiang Bian and Si Li - and to Yicheng Huang, the intern who developed the tool on labs.

I hope that this small data release can help a little with further improving the quality of Wikidata and Wikipedia! Thank you all, you are awesome!

Post has shared content

We published qLabel today, an Open Source JavaScript library that lets anyone create (specific types of) content in more than 300 languages. It uses plenty of Web standards, Semantic Web technologies, Wikidata, Freebase, and all that jazz. Have fun and let me know when you use it!

http://google-opensource.blogspot.de/2014/04/qlabel-multilingual-content-without.html

Post has attachment
I wrote some fiction. Has been a while. And I am trying to write English.