Profile


Stream

wikimeta

Shared publicly  - 
 
Wikimeta has partnered for the second time with WoLE, the 2nd International Workshop on Linked Entities, at the WWW 2013 conference.

http://wole2013.eurecom.fr/

wikimeta

Shared publicly  - 
 
Eric, the main architect of Wikimeta (and also a scientist), is giving a lecture at the Computer Science and Software Engineering faculty of Concordia University on Friday. The title is: NAMED ENTITIES DETECTION AND ENTITY LINKING IN THE CONTEXT OF SEMANTIC WEB.

This is an open seminar, and a good opportunity to see and discuss what goes on behind the scenes of semantic annotation and text analytics. Hope to see you there!

http://www.cse.concordia.ca/newsandevents/lectureseries/ 

Abstract:

Entity linking consists in establishing the relation between a textual entity in a text and its corresponding entity in an ontology. The main difficulty of this task is that a textual entity might be highly polysemous and potentially related to many different ontological representations. To solve this specific problem, various information retrieval techniques can be used. Most of them involve contextual words to estimate which exact textual entity has to be recognized. In this talk, we will explore the question of entity linking and the disambiguation problems it involves. We will describe how a detection and disambiguation resource built from the Wikipedia encyclopaedic corpus can be used to establish a link between a named entity (NE) in a text and its normalized ontological representation from the semantic web.
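
To make the disambiguation step concrete, here is a minimal sketch (illustrative only, not the Wikimeta implementation) of context-based candidate ranking: each candidate entity carries a bag of descriptive words, and the candidate whose description overlaps most with the words surrounding the mention wins. The candidate URIs and descriptions below are invented for the example.

```python
# Minimal context-based entity disambiguation sketch (illustrative only,
# not the Wikimeta implementation). Each candidate URI carries a bag of
# descriptive words; we pick the candidate whose description overlaps
# most with the mention's surrounding context.

def disambiguate(context_words, candidates):
    """Return the candidate URI whose description best matches the context."""
    context = set(w.lower() for w in context_words)
    best_uri, best_score = None, -1
    for uri, description_words in candidates.items():
        score = len(context & set(w.lower() for w in description_words))
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri

# Hypothetical candidates for the highly polysemous surface form "Paris".
candidates = {
    "dbpedia:Paris":        ["france", "capital", "city", "seine", "mayor"],
    "dbpedia:Paris_Hilton": ["american", "celebrity", "heiress", "hotel"],
}
context = "The mayor of Paris announced a new plan for the city".split()
print(disambiguate(context, candidates))  # -> dbpedia:Paris
```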

 

wikimeta

Shared publicly  - 
 
One of the secrets of Wikimeta is our disambiguation ontology, NLGbAse. This ontology is very innovative and is licensed under Creative Commons (you just have to open a Wikimeta account to download it).

This year, for the first time, a scientific paper presented at LREC uses NLGbAse as a comparable reference, and we are very proud of it. You can read about it in this paper: Aleda, a free large-scale entity database for French.

Thanks to Benoit Sagot and Rosa Stern for their very good and interesting work.
LREC 2012, May 2012. A disambiguation resource extracted from Wikipedia for semantic annotation. Eric Charton & Michel Gagnon (Ecole Polytechnique de Montreal), Istanbul, Turkey, May 2012 (pdf).

wikimeta

Shared publicly  - 
 
We are at Canadian AI 2012 to present Wikimeta technologies.

wikimeta

Shared publicly  - 
 
A word about co-reference resolution in text-mining solutions

Co-reference detection is an essential subtask of text analytics. It consists in detecting a chain of related mentions of the same entity inside a text. For example, consider the following passage:

* [The Tasmanian Darner] ([Austroaeschna tasmanica]) is a species of dragonfly in the family Aeshnidae, which includes some of the world's largest dragonflies. [It] is found in Tasmania, Australia. [The species] was first described by Robert Tillyard in 1916 and inhabits streams and rivers.

In this paragraph, the co-reference chain of the Tasmanian Darner is highlighted with brackets: you can see that a single conceptual entity (the insect Tasmanian Darner) can be referred to elsewhere by another name (the Latin one), a pronoun (it), or a noun phrase (the species).
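
To make the notion of a chain concrete, here is how the bracketed mentions above could be represented as data (an assumed shape, not the Wikimeta output format): every mention, whatever its surface form, resolves to the same canonical entity.

```python
# Illustrative representation of the co-reference chain above (assumed
# data shape, not the Wikimeta output format).
chain = {
    "canonical": "Tasmanian Darner",
    "mentions": [
        {"text": "The Tasmanian Darner",    "kind": "name"},
        {"text": "Austroaeschna tasmanica", "kind": "alias (Latin name)"},
        {"text": "It",                      "kind": "pronoun"},
        {"text": "The species",             "kind": "noun phrase"},
    ],
}

# Any fact attached to one mention ("was first described by Robert
# Tillyard in 1916") can then be attributed to the canonical entity.
for mention in chain["mentions"]:
    print(f'{mention["text"]!r} -> {chain["canonical"]}')
```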

As all these possible expressions of the same concept are very different in nature, detecting a co-reference chain requires many very different natural language processing techniques. This explains the intrinsic difficulty of the task and why it is considered probably one of the most difficult in NLP.

However, this difficult task is also essential for highly complex language processing applications like language understanding or structured information extraction. The KBP slot-filling task, organized by NIST, is a good example of how such a complex task involves co-reference detection.

The KBP task consists in extracting data from open text to fill a specific slot. For example, for an entry related to a person, the system has to find that person's birthdate somewhere in a text corpus and then fill the birthdate slot of the person's record. In tasks like KBP, co-reference annotation becomes crucial: for a given query, the information required to fill a slot can be extracted from a document only if there is a robust co-reference module. Let's take again the example of a person's birthdate. This information could be contained in a sentence like:

[Mr Purple] is French and Mrs Yellow is Swedish. He was born in [June 1990].

As we can see, in such a sentence it is easy to collect the nationality of Mr Purple or Mrs Yellow. But the system absolutely needs to know that "he" co-refers with Mr Purple, and not with Mrs Yellow, in order to extract the birthdate and fill the correct slot with it. And this is only possible if a gender detection module is implemented. And it's just a very simple example... Let's now look at a more difficult one:

[Mr Purple] is French and Mrs Yellow is Swedish. The young man was born in [June 1990].

The system now needs to estimate the probability of co-reference for a noun phrase (the young man). In the past, such a case was called an AI problem (now it's just a machine learning problem). And this is still a simple example!

This explains why co-reference detection cannot be solved with simple rules and needs complex machine learning systems: multiple intermediate levels of annotation are necessary to implement a co-reference labeling system, including gender detection, noun phrase annotation and comparison, and much more!
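
To see how quickly simple rules run out of steam, here is a minimal rule-based sketch for the Mr Purple / Mrs Yellow example (illustrative only, not the Wikimeta co-reference module): it resolves a pronoun by gender agreement using a tiny title lexicon, and it already gives up on a noun-phrase anaphor like "the young man".

```python
# Minimal gender-agreement pronoun resolver (illustrative only, not the
# Wikimeta co-reference module). It handles "He was born..." but not
# noun-phrase anaphors such as "The young man was born...".

TITLE_GENDER = {"Mr": "male", "Mrs": "female", "Ms": "female"}
PRONOUN_GENDER = {"he": "male", "she": "female"}

def resolve_pronoun(pronoun, previous_mentions):
    """Return the closest earlier mention whose gender matches the pronoun."""
    wanted = PRONOUN_GENDER.get(pronoun.lower())
    for title, name in reversed(previous_mentions):
        if TITLE_GENDER.get(title) == wanted:
            return f"{title} {name}"
    return None

mentions = [("Mr", "Purple"), ("Mrs", "Yellow")]
print(resolve_pronoun("He", mentions))   # -> Mr Purple
print(resolve_pronoun("She", mentions))  # -> Mrs Yellow
# "The young man" matches neither lexicon: the rule-based approach gives
# up here, which is exactly where machine learning becomes necessary.
```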

Some text-mining API applications currently implement rudimentary co-reference detectors; Wikimeta does not. But those systems do not handle noun-phrase co-reference, for example, and it is simply impossible to obtain an acceptable level of robustness without such a component. Such old-fashioned co-reference detectors will mostly obtain a very low MUC score (one of the standard metrics for evaluating co-reference systems), probably under 20 out of 100, which is clearly insufficient for building complex information extraction tasks.

In academic evaluation contexts, state-of-the-art systems frequently obtain a MUC score above 50 (59.57 on the OntoNotes corpus for the current best system, from the Stanford NLP group: http://conll.cemantix.org/2011/).
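
For readers curious about where such numbers come from, here is a minimal sketch of the link-based MUC metric (Vilain et al., 1995), assuming gold ("key") and system ("response") chains are given as sets of mention identifiers. This is an illustration, not the official scorer.

```python
# Minimal MUC scorer sketch (link-based metric of Vilain et al., 1995).
# Chains are sets of mention identifiers; this is not the official scorer.

def _partitions(chain, other_chains):
    """Number of pieces `chain` is split into by `other_chains`."""
    touched = set()
    singletons = 0
    for mention in chain:
        owners = [i for i, c in enumerate(other_chains) if mention in c]
        if owners:
            touched.add(owners[0])
        else:
            singletons += 1          # mention missing from the other side
    return len(touched) + singletons

def muc_recall(key, response):
    num = sum(len(s) - _partitions(s, response) for s in key)
    den = sum(len(s) - 1 for s in key)
    return num / den if den else 0.0

def muc_scores(key, response):
    r = muc_recall(key, response)
    p = muc_recall(response, key)    # precision = recall with roles swapped
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

key = [{"Tasmanian Darner", "Austroaeschna tasmanica", "It", "The species"}]
response = [{"Tasmanian Darner", "It"}, {"Austroaeschna tasmanica", "The species"}]
print(muc_scores(key, response))     # -> (1.0, 0.666..., 0.8)
```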

The Wikimeta team has built technology for co-reference detection and has a good prototype for demonstration and evaluation (MUC of 52.45 on the OntoNotes corpus). It can be seen and downloaded here: http://code.google.com/p/polyco-2/. But currently we consider that we do not have the complete module chain needed to maintain the performance of our co-reference detector in a user context (we need, for example, to integrate noun phrase detection into the API output).

That’s why there is no co-reference system in Wikimeta yet. We will implement one only when it will be usable by any users in very rich and complex applications.

#semweb

wikimeta

Shared publicly  - 
 
If you want to know more about us and what we plan to do with the Wikimeta Semantic Engine, read this interview from semanticweb!

wikimeta

Shared publicly  - 
 
The honor code of beta testers

[An honour code or honour system is a set of rules or principles governing a community based on a set of rules or ideals that define what constitutes honorable behavior within that community] [Wikipedia].

Currently, Wikimeta is still under beta testing conditions (it's written at the bottom of each page of the website). This means the labeling tool can encounter failures, have bugs, or lack standard compliance.

We use a lot of engineering tools to automatically detect failures when they happen. Sometimes our users find a problem one minute and it disappears the next: that is because the problem was monitored and solved by us in real time.
But sometimes problems take longer to fix, and there are some commonly accepted rules in such cases. Here are the 3 most important ones.

- Rule 1: a beta test bug is not a communication object

Usually, when you are a user under beta test conditions, there is a fair, commonly accepted practice of not publishing excessive comments about a bug you found (on Twitter, in a newspaper article, in a blog). This is the first common rule of beta testing. If you participate in the beta test plan of a major company (most of our team members formerly did, for very big companies), you usually sign a contract covering this rule (the NDA). In the open source and academic worlds it's less formal, but it's nice and normal to act the same way, because most often the bug is not really a bug, and the comments quickly become inappropriate.

- Rule 1 bis: if you decide to communicate about a bug, send us the post link or the tweet reference: it will be a way for us to discover and study the problem.

- Rule 2: when you find a bug or a defect, report it!

Another rule is to inform the developers of the code about the bug you found. It sounds natural, but not to newcomers to the beta-test universe. The main goal of a beta test is to receive feedback. We now have dozens of registered Wikimeta users (really) and a few with a "no-limit free account". Some are research scientists in labs, others are students, and all have helped us a lot. Our partnership with them is the key to improving the quality of Wikimeta (especially standard compliance). They can confirm (we hope) that we work late at night, any day of the week, to solve their problems.

Even on a Sunday, we need to know that a bug exists in order to solve it (one quick way to report it: contact@wikimeta.com!).

- Rule 2 bis

It's really very, very nice not to publicly complain about a bug (rule 1) that you haven't reported (rule 2) :-)

- Rule 3: be fair when you comment on or compare a tool under beta conditions

Usually, the exact goals and functional limitations of beta software are not definitively defined, especially in academic contexts like ours. This means some unpredictable data fed to the software will create new issues and unpredictable results. Once again, we are in an academic context: we want our tool tested and compared, even aggressively. However, it's also normal to explain to us what your experimental protocols are and sometimes to give us access to samples of their components, mostly test corpora (reproducibility of experiments, does that ring a bell?). We have participated with our tool (and will continue to) in international evaluation campaigns (like the CoNLL shared task last year), and believe us, when we say our tool is good (or bad), we do our best to rely on state-of-the-art evaluation methods.

That's all; this is our only rule set for the beta partnership.

Thank you very much to all our beta test users who have helped us for 2 years now. Wikimeta will leave the beta test state in the coming weeks. That makes us very happy. We will open a new beta site to let you continue to participate in the innovations of our semantic tool.

Stay connected !

wikimeta

Shared publicly  - 
 
At Wikimeta, we like good engineering practices and fair evaluations of the performance of our technology. So we were interested in a paper published at the ESWC 2013 conference that compares various semantic annotators available on the market. We give you the information related to this paper and two tables from it. We could argue about many aspects of the paper (as always in a scientific context); for example, there is not enough variation in the test corpus to really declare a clear performance difference across all the annotators tested. But it is a pretty interesting and unique initiative, and that makes it worth a read. To read the tables, consider the F1 column as the reference metric (the F-score).

Title
"A Comparison of Knowledge Extraction Tools for the Semantic Web"

Abstract:
"In the last years, basic NLP tasks: NER, WSD, relation extraction, etc. have been configured for Semantic Web tasks including ontology learning, linked data population, entity resolution, NL querying to linked data, etc. Some assessment of the state of art of existing Knowledge Extraction (KE) tools when applied to the Semantic Web is then desirable. In this paper we describe a landscape analysis of several tools, either conceived specifically for KE on the Semantic Web, or adaptable to it, or even acting as aggregators of extracted data from other tools. Our aim is to assess the currently available capabilities against a rich palette of ontology design constructs, focusing specifically on the actual semantic reusability of KE output".

Paper:
http://eswc-conferences.org/sites/default/files/papers2013/gangemi.pdf
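
For readers unfamiliar with the F1 column mentioned above: the F-score is simply the harmonic mean of precision and recall. A quick sketch:

```python
# F-score (F1): harmonic mean of precision and recall.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f1(0.80, 0.70))  # -> 0.7466...
```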

wikimeta

Shared publicly  - 
 
Christophe Desclaux (https://twitter.com/#!/descl3) publicly released this morning the results of his work on semantic annotation of RSS feeds.

He uses our semantic API to annotate news summaries. This work is applied to French text feeds; because of this, it is a nice illustration of the impact on annotation robustness when one disambiguation model is used for each supported language.

Christophe explains how he built this project here (http://linuxfr.org/users/descl/journaux/ma-participation-au-concours-boostyourcode-2012). He describes it in an accepted scientific paper (https://github.com/descl/ZONE/blob/master/Rapport/papier1/christophe.pdf).

This is very nice work. You can use it here: http://zone.zouig.org/zone/rssfeed/index and the source code is available here: https://github.com/descl/ZONE
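
As a rough illustration of the general pattern (this is not Christophe's actual code, and the feed URL, endpoint, and request parameters below are placeholders, not the real ZONE or Wikimeta API), annotating an RSS feed boils down to pulling each summary and sending it to an annotation service:

```python
# Sketch of RSS-summary annotation. The endpoint and parameters are
# placeholders, not the actual Wikimeta or ZONE API.
import feedparser
import requests

ANNOTATION_ENDPOINT = "https://example.org/annotate"     # placeholder

def annotate(text, lang="fr"):
    """Send one news summary to a (hypothetical) annotation endpoint."""
    resp = requests.post(ANNOTATION_ENDPOINT, data={"text": text, "lang": lang})
    resp.raise_for_status()
    return resp.json()

feed = feedparser.parse("https://example.org/news.rss")  # placeholder feed
for entry in feed.entries[:5]:
    summary = entry.get("summary", entry.get("title", ""))
    print(entry.get("title", "(untitled)"), "->", annotate(summary))
```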

wikimeta

Shared publicly  - 
 
We have launched our final public version of Wikimeta this week. You can explore it at www.wikimeta.com.

This version improves the user experience, with a free inline semantic annotator, an exchange forum with a lot of documentation about the API, and an improved user account with many new functionalities, including a support ticket system.

The new functionalities of the included semantic annotator are:

* improved detection models
* minor fixes to the output (JSON, XML) standard compliance
* a text-mining function computing keyword, word, verb, and adjective distributions (see the sketch below)

The new semantic product is:

* Wikimeta Server, currently in alpha, which will allow third parties to manage their own semantic and text-mining annotation service with Wikimeta Semantic Technology
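
As a rough illustration of what such a distribution computation looks like (the token format below is an assumption for the example, not the Wikimeta output format), here is a minimal sketch that counts words, verbs, and adjectives from a part-of-speech-tagged token list:

```python
# Sketch of a word/verb/adjective distribution over POS-tagged tokens
# (the (word, POS) token format is assumed, not the Wikimeta output format).
from collections import Counter

def pos_distribution(tokens):
    """tokens: list of (word, pos) pairs; returns counts per category."""
    words = Counter(w.lower() for w, _ in tokens)
    verbs = Counter(w.lower() for w, pos in tokens if pos.startswith("VB"))
    adjectives = Counter(w.lower() for w, pos in tokens if pos.startswith("JJ"))
    return {"words": words, "verbs": verbs, "adjectives": adjectives}

tokens = [("Wikimeta", "NNP"), ("annotates", "VBZ"), ("large", "JJ"),
          ("texts", "NNS"), ("and", "CC"), ("extracts", "VBZ"),
          ("semantic", "JJ"), ("metadata", "NN")]
dist = pos_distribution(tokens)
print(dist["verbs"])       # Counter({'annotates': 1, 'extracts': 1})
print(dist["adjectives"])  # Counter({'large': 1, 'semantic': 1})
```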

You can try it online now !

#linkeddata #semanticweb #textmining #namedentities

wikimeta

Shared publicly  - 
 
Securing the supply chain.

As you may know, we plan to open the commercial Wikimeta service this month. This new step in Wikimeta's development implies a lot of work for us on various engineering aspects of the web service and its supporting network and computer grid. Here is some news about this work. It will also give you some information about the underlying architecture of our planned services.

We have finished working on the scalability and efficiency of the Wikimeta infrastructure. This meant finding services and internet suppliers that allow us to offer a globally efficient API service. The key features of the Wikimeta infrastructure will be:

- Load balancing of the service adapted to the needs of customers in Europe and America.
This means you will be able to choose in which continental part of the world your API service will be most efficient. This is very important, as intercontinental backbones do not necessarily provide the same quality of service as continental ground internet lines. At the moment, no competitor has an offer that answers this need.

- Scalability is also taken into account. Whether you ask for a simple account or 1 MB of text-mining bandwidth, we will usually be able to activate new servers on our grid within one hour. Such reaction capacity also requires a high level of robustness for our API web service. We have worked a lot on that too (the current stable version has now run for more than 60 days without failure).

- Support. For a commercial service, customers need high reaction capacity. We have worked a lot on Wikimeta's new Customer Management System to give our users various information feeds and exchange channels. It's hard work, as an academic tool is not necessarily built for integration with a CMS :-) But it's on the way, and we are currently working on the final version of this last part of the architecture. We hope to provide one of the most customer-friendly CMSs in the world of APIs, with a lot of improvement possibilities appearing throughout the year!

This is a lot of engineering work. When it is completed, we will come back to the scientific aspects and performance of the Wikimeta engine.

Stay connected, we are on the launch pad!

wikimeta

Shared publicly  - 
 
A word about our engine versioning. There are 3 main components in Wikimeta:

- The labeling software
- The metadata set (NLGbAse metadata)
- The detection model

Currently, the metadata set is at version 5. The detection model is also at version 5 (but the two can differ sometimes). The labeling software is at version 1.6.5.0.2.

The previous metadata and detection models were version 3, launched in November 2011. We kept version 4 for internal evaluation and decided not to use it online. Yes, we have another version of Wikimeta on our intranet for dev purposes!

The version number of the labeling software is returned when you make an API call (it's very useful for us when you give this number in a bug report).
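
As a small illustration of that advice (the endpoint and the "version" field name below are assumptions for the example, not the documented API; check the actual response for the exact key), keeping the version string returned with each call makes bug reports much easier to act on:

```python
# Sketch: keep the labeling-software version returned alongside an API
# response so it can be quoted in bug reports. The endpoint URL and the
# "version" field name are placeholders, not the documented Wikimeta API.
import requests

def annotate_with_version(text):
    resp = requests.post("https://example.org/wikimeta-api",   # placeholder
                         data={"text": text})
    resp.raise_for_status()
    payload = resp.json()
    version = payload.get("version", "unknown")                # e.g. "1.6.5.0.2"
    return payload, version

# payload, version = annotate_with_version("Some text to label")
# print(f"Please quote engine version {version} in your bug report.")
```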

The only way of knowing the version number of the metadata set and the detection model ... is to follow our tweets :-)

Currently, the metadata model of Wikimeta is at the same version as the one displayed on www.nlgbase.org. But it can be slightly different sometimes. This explains why you can see different information between the metadata displayed in Wikimeta and in NLGbAse.

We plan to provide a specific information page about the metadata set very soon, with, for example, the reference of the source Wikipedia file used for generation.

I don't know if this helps, but it seemed interesting to me to give you this information :-)

Eric
Story
Tagline
Welcome to the page of the Wikimeta semantic tagger
Introduction
Wikimeta provides content analysis and metadata annotation tools. Wikimeta finds the semantic richness hidden in any content, including named entities, keywords, the semantic nature of concepts, and the nature of words and sentences. Wikimeta uses innovative methods to analyze your content and extract semantic metadata: information about people, places, companies, topics, languages, and more. Just try it online now for your day-to-day text analysis needs, or use our API to build your own applications.