Open Data and Content Mining
Talk at Sydney University by
Peter Murray-Rust
University of Cambridge and Open Knowledge Foundation

Seminar Room, Level 2 Fisher Library, 2 pm, Wednesday October 31st, 2012

The publicly funded research in the Scientific Technical Medical (STM) literature contains many billions of dollars of unused value. Most scientific articles contain names, numbers, places, chemicals, organisms, graphs, tables, etc. which can be extracted and re-used. This leads to better science, new information products, startup companies, better information for policy makers and much more, which I have estimated at "low billions" for chemistry alone. For STM, especially medicine, the figure is much higher. Yet this value is currently unavailable for two reasons: (a) publishing uses PDF, which is a very poor way of conveying the information; (b) publishers actively prevent mining of the content to preserve their revenues.

We must change this, and soon, through (a) evangelism of the opportunity, (b) lobbying for our rights and (c) building the next generation of tools. I shall cover all of these, including our Manifesto on Open Content Mining and demonstrations of AMI2, a weakly intelligent amanuensis for the scientist (based initially on understanding PDFs). This offers great opportunities for citizens in general to liberate this vast resource of valuable information.

All welcome, no RSVP needed
Host: +Matthew Todd, School of Chemistry
Jessica, sorry it ended up pretty informal and PMR went through various demos without any organised slides. +Bradley Voytek you probably already know about this work, in which Peter Murray-Rust is doing natural language processing, mostly on chemistry papers, to pull out chemicals, reactions and such. I am wondering if you have any plans to add sophisticated parsing and natural language processing to brainSCANr, which as I understand it is currently restricted to co-occurrence analysis. I would like to pull out p-values and test statistics (t's of t-tests, F's of ANOVAs) on a large scale to assess the degree of publication bias, and the average effect size and power, in different fields.
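To make the p-value idea concrete, here is a minimal sketch of how t, F and p values might be pulled out of plain text with regular expressions. It assumes APA-style reporting such as "t(23) = 2.41, p = .024"; the patterns, the `extract_stats` function and the sample sentence are all hypothetical illustrations, not part of AMI2 or brainSCANr, and real papers (especially text extracted from PDFs) would need far more robust handling.

```python
import re

# Hypothetical patterns for APA-style statistics; real-world reporting varies widely.
# p-values such as "p = .024", "p < 0.05", "P > .05"
P_VALUE = re.compile(r"\bp\s*([<>=])\s*(0?\.\d+)", re.IGNORECASE)
# t-tests such as "t(23) = 2.41" (degrees of freedom, then the statistic)
T_TEST = re.compile(r"\bt\s*\(\s*(\d+(?:\.\d+)?)\s*\)\s*=\s*(-?\d+(?:\.\d+)?)")
# ANOVA F-tests such as "F(2, 46) = 1.10" (two degrees of freedom, then the statistic)
F_TEST = re.compile(r"\bF\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*=\s*(\d+(?:\.\d+)?)")


def extract_stats(text):
    """Return the p-values, t statistics and F statistics found in text."""
    return {
        "p": [(op, float(value)) for op, value in P_VALUE.findall(text)],
        "t": [(float(df), float(t)) for df, t in T_TEST.findall(text)],
        "F": [(int(df1), int(df2), float(f)) for df1, df2, f in F_TEST.findall(text)],
    }


# Made-up sentence in the style of a results section.
sample = (
    "The effect was reliable, t(23) = 2.41, p = .024; "
    "the interaction was not, F(2, 46) = 1.10, p > .05."
)

if __name__ == "__main__":
    print(extract_stats(sample))
```

Run over a large corpus, the collected values could then feed an analysis of the distribution of reported p-values, which is one way people look for signs of publication bias.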
+Alex Holcombe: Progress on brainSCANr has been halted for a while now as I get my post-doc up and running and now that my wife and I have the baby. My wife is transitioning to a new technical job here shortly, where she may have a chance to work on it again. If so, she and I will sit down and brainstorm. That said, there are people who have been running with the idea, but in different directions, e.g. +Shreejoy Tripathy.