Profile

Robert Meusel

Stream

 
Hi Everybody,

The WebDataCommons team is happy to announce that we have released several class-specific subsets of the Schema.org Data contained in our Winter 2013 Microdata corpus [1]. We hope that providing those topic-specific subsets for over 50 different Schema.org classes (like product, event, or address) will make it easier for the community to explore and work with the data.

The datasets, along with some statistics about the data, can be found here: http://webdatacommons.org/structureddata/2013-11/stats/schema_org_subsets.html

The subsets contain all instances of a specific class as well as all other data found on the webpages containing these instances. For example, a page containing data about a product might also contain reviews and offers for this product; a page containing data about an event might also contain data about the location of the event and the persons involved in it. The data was originally extracted using Any23 [2] from the Winter 2013 crawl provided by the Common Crawl Foundation [3]. The extracted data is represented in N-Quads [4] format, meaning that the fourth element of each quad contains the URL of the webpage from which the data was extracted.
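As a rough illustration of the N-Quads layout described above, the following Python sketch pulls the fourth element (the source page URL) out of a single line. The function name `source_page` and the sample line are made up for this example and are not taken from the corpus, and the pattern is a simplification rather than the full N-Quads grammar; it assumes every line is a complete quad whose fourth element is an IRI.

```python
import re

# Simplified pattern for one RDF term: an IRI, a blank node, or a literal
# with an optional datatype or language tag. Not the full N-Quads grammar.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'

# Assumption: each line is "subject predicate object <graph> ." and the
# fourth element is always an IRI (here: the URL of the source webpage).
QUAD = re.compile(
    r'^\s*' + TERM + r'\s+' + TERM + r'\s+' + TERM + r'\s+(<[^>]*>)\s*\.\s*$'
)


def source_page(line):
    """Return the URL in the fourth position of an N-Quads line, or None."""
    match = QUAD.match(line)
    if match is None:
        return None
    return match.group(4)[1:-1]  # drop the surrounding angle brackets


# Hypothetical sample line, written by hand for illustration only.
example = (
    '_:node1 <http://schema.org/Product/name> "Example product"@en '
    '<http://example.com/item.html> .'
)
print(source_page(example))
```

Grouping lines by the returned URL would collect all statements that were extracted from the same webpage.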

We thank the Common Crawl Foundation for providing their Web corpora.
Class-Specific Subsets of the Schema.org Data contained in the Winter 2013 Corpus. This page provides access to and statistics about class-specific subsets of the Schema.org data contained in the Winter 2013 version of the Web Data Commons Microdata corpus.

Robert Meusel

Shared publicly  - 
 
never ever before
 
Share if you've never seen a peeled lemon until now...

Robert Meusel

Shared publicly  - 
 
Web Tables free for all! We just released over 147 million quasi-relational Web Tables for public download!
The Web contains vast amounts of HTML tables. Most of these tables are used for layout purposes, but a fraction of the tables is also quasi-relational, meaning that they contain structured data describing a set of entities. A corpus of Web tables can be useful for research and applications in areas s...

Robert Meusel

Shared publicly  - 
 
First entirely open ranking of the WWW!
The Laboratory for Web Algorithmics together with the Data and Web Science Group of the University of Mannheim have put together the first entirely open ranking of more than 100 million sites of the World Wide Web. The ranking is based on classic and easily explainable centrality measures applied to...

Robert Meusel

Free Online Resources  - 
 
ANN: Large hyperlink graph published, covering 3.5 billion web pages and 128 billion hyperlinks 

The Web Data Commons team has just announced the publication of a new large hyperlink graph.
The graph has been extracted from the Common Crawl 2012 web corpus [1] and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, the graph is the largest hyperlink graph that is available to the public.
The graph can be downloaded in various formats from 
http://webdatacommons.org/hyperlinkgraph
We provide initial statistics about the topology of the graph at
http://webdatacommons.org/hyperlinkgraph/topology.html
We hope that the graph will be useful for
• Researchers who develop search algorithms that rank results based on the hyperlinks between pages.
• Researchers who develop spam detection methods which identify networks of web pages that are published in order to trick search engines.
• Developers of graph analysis algorithms, who can use the hyperlink graph for testing the scalability and performance of their tools.
• Web Science researchers who want to analyze the linking patterns within specific topical domains in order to identify the social mechanisms that govern these domains.
We want to thank the Common Crawl project for providing their great web crawl and thus enabling the creation of the WDC Hyperlink Graph.
The creation of the WDC Hyperlink Graph was supported by the EU research project PlanetData and by Amazon Web Services. We thank our sponsors very much.
Best Regards,
Chris, Oliver & Robert
 [1] http://commoncrawl.org/
People
Have him in circles
88 people
Manish Mallik's profile photo
Craig Cmehil's profile photo
Stephan Schäfer's profile photo
Kevin Polley's profile photo
Davor Strehar's profile photo
Lydia Weiland's profile photo
Claus Neugebauer's profile photo
Jenny Zaino's profile photo
Matthias Fabinski's profile photo
Work
Occupation
Researcher in the area of Data and Web Science
Links
Story
Tagline
data juggler, programmer, scientist, cineast, real-life addict
Basic Information
Gender
Male
Very authentic and really delicious. Nothing standard here, with creative ideas and great service.
Public - a year ago
reviewed a year ago
Great restaurant with amazing delicious starters and fantastic fish and meat main dishes.
Public - a year ago
reviewed a year ago
Very expensive
Food: Good; Decor: Good; Service: Good
Public - a year ago
reviewed a year ago
Food: Very Good; Decor: Very Good; Service: Very Good
Public - a year ago
reviewed a year ago
39 reviews
We visited the restaurant on a Sunday after a bike tour. It was quiet (4 tables occupied). The service was fast and friendly. The food, especially the homemade pasta, was great. We cannot judge what it is like when it is busier, but on our visit everything was just right.
Food: Very Good; Decor: Very Good; Service: Very Good
Public - a year ago
reviewed a year ago
Food: Excellent; Decor: Very Good; Service: Very Good
Public - a year ago
reviewed a year ago
Food: Excellent; Decor: Very Good; Service: Very Good
Public - a year ago
reviewed a year ago