Welcome to the page of the Wikimeta semantic tagger

Wikimeta's posts

Wikimeta has partnered for the second time with the WoLE workshop, the 2nd International Workshop on Linked Entities, at the WWW 2013 conference.

At Wikimeta, we like good engineering practices and fair evaluations of the performance of our technology. So we were interested in a paper published at the ESWC 2013 conference comparing various semantic annotators available on the market. We give you the reference for this paper and two tables from it. As always in a scientific context, many aspects of the paper can be debated. For example, there is not enough variation in the test corpus to really declare a clear performance difference across all the annotators tested. But it is an interesting and unique initiative, and that makes it worth a read. To read the tables, consider the F1 column (F-score) as the reference metric.

"A Comparison of Knowledge Extraction Tools for the Semantic Web"

Abstract:
"In the last years, basic NLP tasks: NER, WSD, relation extraction, etc. have been configured for Semantic Web tasks including ontology learning, linked data population, entity resolution, NL querying to linked data, etc. Some assessment of the state of art of existing Knowledge Extraction (KE) tools when applied to the Semantic Web is then desirable. In this paper we describe a landscape analysis of several tools, either conceived specifically for KE on the Semantic Web, or adaptable to it, or even acting as aggregators of extracted data from other tools. Our aim is to assess the currently available capabilities against a rich palette of ontology design constructs, focusing specifically on the actual semantic reusability of KE output."


Eric, main architect of Wikimeta (and also a scientist), is giving a lecture at the Computer Science and Software Engineering Faculty of Concordia University on Friday. The title is: Named Entities Detection and Entity Linking in the Context of the Semantic Web.

This is an open seminar, and a good opportunity to see and discuss what happens behind the scenes in semantic annotation and text analytics. Hope to see you there!


Entity linking consists of establishing the relation between a textual entity in a text and its corresponding entity in an ontology. The main difficulty of this task is that a textual entity might be highly polysemous and potentially related to many different ontological representations. To solve this specific problem, various information retrieval techniques can be used. Most of them use contextual words to estimate which exact entity has to be recognized. In this talk, we will explore the question of entity linking and the disambiguation problems it involves. We will describe how a detection and disambiguation resource built from the Wikipedia encyclopaedic corpus can be used to establish a link between a named entity (NE) in a text and its normalized ontological representation in the semantic web.
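As a rough illustration of the contextual-word idea described above (this is not Wikimeta's actual algorithm, and the candidate profiles below are invented for the example), a toy linker can score each candidate entity by the overlap between the mention's context and a per-candidate word profile:

```python
from collections import Counter

# Toy "ontology": each candidate entity for the ambiguous mention
# "Paris" carries a bag of typical context words. In a real system
# such profiles would be learned from a corpus like Wikipedia.
CANDIDATES = {
    "Paris (city)":   Counter("france capital seine eiffel city".split()),
    "Paris (person)": Counter("actress celebrity hotel heiress tv".split()),
}

def link_entity(mention_context):
    """Pick the candidate whose profile best overlaps the context words."""
    words = Counter(mention_context.lower().split())
    def overlap(profile):
        return sum(min(words[w], profile[w]) for w in profile)
    return max(CANDIDATES, key=lambda c: overlap(CANDIDATES[c]))

print(link_entity("Paris is the capital city of France on the Seine"))
# -> Paris (city)
```

A real annotator would of course use a richer similarity (weighted vectors, priors on entity popularity), but the disambiguation principle is the same.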


Christophe Desclaux (!/descl3) has publicly released this morning the results of his work on semantic annotation of RSS feeds.

He uses our semantic API to annotate news summaries. This work is applied to French text feeds. Because of this, it is a nice illustration of the impact on annotation robustness when one disambiguation model is used for each supported language.

Christophe explains how he built this project here; he also describes it in an accepted scientific paper.

This is very nice work. You can use it here, and the source code is available here.

One of the secrets of Wikimeta is our disambiguation ontology, NLGbAse. This ontology is very innovative and licensed under Creative Commons (you just have to open a Wikimeta account to download it).

This year, for the first time, a scientific paper presented at LREC used NLGbAse as a comparable reference, and we are very proud of it. You can read about it in this paper: Aleda, a free large-scale entity database for French.

Thanks to Benoit Sagot and Rosa Stern for their very good and interesting work.

We have launched this week our final public version of Wikimeta. You can explore it on

This version improves the user experience: it includes an inline semantic annotator, free to use; an exchange forum with a lot of documentation about the API; and an improved user account with many new functionalities, including a support ticket system.

New functionalities of the semantic annotator included are :

* improved detection models
* minor fixes on the output (json, xml) standard compliance
* a text mining function that computes keywords and the distribution of words, verbs and adjectives
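To give an idea of the kind of statistics such a text-mining function returns (this is a standalone sketch of a word-distribution computation; the field names and exact output format of the Wikimeta API are not reproduced here, and the real service additionally splits counts by verbs and adjectives using a POS tagger):

```python
from collections import Counter
import re

def word_distribution(text, top=5):
    """Relative frequency of the most common word forms in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    return [(w, round(n / total, 3)) for w, n in Counter(words).most_common(top)]

print(word_distribution("the quick brown fox jumps over the lazy dog"))
# the most frequent form is ('the', 0.222): 2 occurrences out of 9 words
```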

A new semantic product is also coming:

* Wikimeta Server, currently in alpha, which will allow third parties to manage their own semantic and text-mining annotation service with Wikimeta Semantic Technology

You can try it online now!

#linkeddata #semanticweb #textmining #namedentities

A word about co-reference resolution in text-mining solutions

Co-reference detection is an essential subtask of text analytics. Co-reference resolution consists of detecting chains of related entities inside a text. For example, consider the following passage:

* [The Tasmanian Darner] ([Austroaeschna tasmanicar]) is a species of dragonfly in the family Aeshnidae, which includes some of the world's largest dragonflies. [It] is found in Tasmania, Australia. [The species] was first described by Robert Tillyard in 1916 and inhabits streams and rivers.

In this paragraph, the co-reference chain of Tasmanian Darner is highlighted with brackets: a unique conceptual entity (the insect Tasmanian Darner) can be referred to elsewhere by another name (the Latin one), a pronoun (it) or a noun phrase (the species).
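The chain above can be represented programmatically as a cluster of mentions. Here is a minimal sketch of such a data model (the character offsets are illustrative, and this is only a representation of a detector's output, not a detector):

```python
from dataclasses import dataclass

@dataclass
class Mention:
    text: str
    kind: str    # "name", "pronoun", "noun_phrase", ...
    span: tuple  # (start, end) character offsets in the source text

# The bracketed chain from the dragonfly example, as one cluster:
# every mention in the list refers to the same conceptual entity.
chain = [
    Mention("The Tasmanian Darner", "name", (0, 20)),
    Mention("Austroaeschna tasmanicar", "name", (22, 46)),
    Mention("It", "pronoun", (115, 117)),
    Mention("The species", "noun_phrase", (150, 161)),
]

print([m.text for m in chain])
```

A co-reference system outputs a set of such clusters for a document; two mentions co-refer if and only if they land in the same cluster.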

As all those possible expressions of the same concept are very different in nature, detecting a co-reference chain implies using many very different natural language processing techniques. This explains the intrinsic difficulty of this task, and why it is considered probably one of the most difficult in NLP.

However, this difficult task is also essential for highly complex language processing applications like language understanding or structured information extraction. The KBP slot filling task, organized by NIST, is a good example of how such a complex task involves co-reference detection.

The KBP task consists of extracting data from open text to fill a specific slot. For example, for an entry related to a person, the system has to find that person's birthdate somewhere in a text corpus and then fill the birthdate slot of the person's record. In tasks like KBP, co-reference annotation can become crucial: for a given query, the information required for slot filling can be extracted from a document only if there is a robust co-reference module. Let's take again the example of a person's birthdate. This information could be contained in a sentence like:

[Mr Purple] is French and Mrs Yellow is Swedish. He was born in [June 1990].

As we can see, in such a sentence it is easy to collect the nationality of Mr Purple or Mrs Yellow. But to extract the birthdate and fill the correct slot with it, the system absolutely needs to know that he co-refers with Mr Purple and not with Mrs Yellow. And this is only possible if a gender detection module is implemented. And it's just a very simple example. Let's see this one, more difficult, now:

[Mr Purple] is French and Mrs Yellow is Swedish. The young man was born in [June 1990].

The system now needs to estimate the probability of co-reference for a noun phrase (the young man). In the past, such a case was called an AI problem (now it's just a machine learning problem). And this is still a simple example!
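The gender-agreement constraint from the first example can be sketched as a simple filter. This is a toy illustration only: the title lookup below stands in for the real gender detection module discussed above, which in practice is itself a learned component.

```python
# Toy gender detection: infer gender from honorific titles.
TITLE_GENDER = {"Mr": "masc", "Mrs": "fem", "Ms": "fem"}
PRONOUN_GENDER = {"he": "masc", "she": "fem"}

def gender_of(mention):
    """Return the gender of a mention, if a known title reveals it."""
    return TITLE_GENDER.get(mention.split()[0])

def candidate_antecedents(pronoun, mentions):
    """Keep only antecedents whose gender agrees with the pronoun."""
    g = PRONOUN_GENDER[pronoun.lower()]
    return [m for m in mentions if gender_of(m) == g]

mentions = ["Mr Purple", "Mrs Yellow"]
print(candidate_antecedents("He", mentions))  # ['Mr Purple']
```

Agreement filters like this prune impossible antecedents; ranking the remaining candidates (and handling noun phrases like "the young man", which carry no title) is where the machine learning comes in.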

This explains why co-reference detection cannot be solved through simple rules and needs complex machine learning systems: multiple intermediate levels of annotation are necessary to implement a co-reference labeling system, including gender detection, noun phrase annotation and comparison, and much more!

Some text-mining API applications currently implement rudimentary co-reference detectors; Wikimeta does not. But those systems don't manage noun-phrase co-reference, for example, and it's simply impossible to obtain an acceptable level of robustness without such a component. Mostly, such old-fashioned co-reference detectors will obtain a very low MUC score (the MUC measurement is one of the standard metrics for co-reference system evaluation), probably under 20 (out of 100), clearly insufficient to build complex information extraction tasks.

In academic evaluation contexts, state-of-the-art systems frequently obtain a MUC performance over 50 (59.57 on the OntoNotes corpus for the current best system, which comes from the Stanford NLP group).
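For reference, the MUC metric mentioned above is a link-based score (Vilain et al., 1995): recall measures, for each gold chain, how many of its links the system output preserves, and precision is the same computation with the roles swapped. A small sketch:

```python
def muc(key_chains, sys_chains):
    """MUC score over gold and system mention clusters (sets of ids)."""
    def half(gold, other):
        num = den = 0
        for chain in gold:
            # Partitions of this chain induced by the other clustering;
            # mentions absent from the other side count as singletons.
            parts, missing = set(), 0
            for m in chain:
                hit = next((i for i, c in enumerate(other) if m in c), None)
                if hit is None:
                    missing += 1
                else:
                    parts.add(hit)
            num += len(chain) - (len(parts) + missing)
            den += len(chain) - 1
        return num / den if den else 0.0

    r = half(key_chains, sys_chains)
    p = half(sys_chains, key_chains)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Gold: one chain of 3 mentions; the system split it into two chains,
# losing one of the two links: precision 1.0, recall 0.5.
print(muc([{1, 2, 3}], [{1, 2}, {3}]))
```

The "MUC of 52.45" figures quoted in this post are percentages of this F1 value.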

Wikimeta people have built technology for co-reference detection and own a good prototype for demonstration and evaluation (MUC of 52.45 on the OntoNotes corpus). It can be seen and downloaded here. But currently, we consider that we do not have the complete module chain to maintain the performance of our co-reference detector in a user context (we need, for example, to integrate noun phrase detection into the API output).

That’s why there is no co-reference system in Wikimeta yet. We will implement one only when it is usable by any user in very rich and complex applications.


Securing the supply chain.

As you may know, we plan to open the commercial Wikimeta service this month. This new step of Wikimeta's development implies a lot of work for us on various engineering aspects of the web service and its supporting network and computer grid. Here is some news about this work. It will also give you some information about the underlying architecture of our planned services.

We have finished working on the scalability and efficiency of the Wikimeta infrastructure. This means finding services and internet suppliers able to let us offer a globally efficient API service. Key features of the Wikimeta infrastructure will be:

- Load balancing of the service adapted to the needs of European and American customers.
This means you will be able to choose in which continental part of the world your API service will be most efficient. This is very important, as intercontinental backbones do not necessarily provide the same quality of service as continental ground internet lines. At the moment, no competitor has an offer that answers this need.

- Scalability is also taken into account. Whether you ask for a simple account or 1 MB of text-mining bandwidth, we will be able to activate new servers on our grid, usually within one hour. Such reaction capacity also involves a high level of robustness for our API web service. We have worked a lot on that too (the current stable version has now run for more than 60 days without failure).

- Support. For a commercial service, customers need high reaction capacity. We have worked a lot on the new Customer Management System of Wikimeta to give our users various information feeds and exchange channels. It's hard work, as an academic tool is not necessarily built for integration with a CMS :-) But it's on the way, and we are currently working on the final version of this last part of the architecture. We hope to provide one of the most customer-friendly CMSs in the world of APIs, with a lot of improvements that will appear throughout the year!

This is a lot of engineering work. When it is completed, we will come back to the scientific aspects and performance of the Wikimeta engine.

Stay connected, we are on the launch pad !

If you want to know more about us and what we plan to do with the Wikimeta Semantic Engine, read this interview from semanticweb!