Richard Smith-Unna
Works at Millennium Seed Bank, Kew
Attends University of Cambridge
Lives in Cambridge, UK
5,406 followers | 234,472 views


All welcome at our hackday today. We'll be hacking on our tools for the system. Come along to learn about web crawling and scraping, how to extract facts from PDFs, and how to reverse-engineer data from images

#contentmine   #datamining   #hackday   #hackathon   #openscience   #citizenscience
We'll be holding a very informal hackday in Makespace Cambridge (16 Mill Lane, Cambridge, CB2 1RX).
This is open to anyone interested in developing tools and expertise in automatic mining of information from the scholarly literature. Please register so we can anticipate numbers (probably max 20) and alert security. Topics include chemistry, crystallography, metadata and licences, biological species, phylogenetics, and clinical trials - or bring your own wishlist. No formal experience, but bring your own laptop and a sense of adventure. Details will be posted and updated on
This Hangout On Air is hosted by Graham Steel. The live video broadcast will begin soon.
ContentMining Hackday in Cambridge facilitated by ContentMine
Fri, January 23, 4:00 AM

3 comments on original post
Rich Gillin: audio is muddy, unreadable.
+Mike Bostock, one of two people I can think of who have reached total data enlightenment, takes us all on a beautiful journey through algorithms using stunning visualisations.
Don't buy the IEEE edition of my book.

In January 2013 the IEEE Xplore digital library began hosting digital copies of 400+ books published by MIT Press. The IEEE copies are not OA. They're digital copies sold by the chapter. I'm sure that's convenient for some users, and it may also be economical for some chapters of some books. So I'm not criticizing the program as a whole.

My latest book (Open Access, MIT Press, 2012) is one of those available in an IEEE edition. The problem is that IEEE charges $15 per chapter (PDF-only). By contrast, at Amazon today you can buy the whole paperback edition for $11.72 or the whole Kindle edition for $9.99. Moreover, the whole book is open access in eight different file formats.

There are other problems here too. My chapters use endnotes, which are all collected in the back of the book and sold by IEEE as a separate chapter. Hence, you could pay $15 for a given chapter and not get any of its notes unless you paid another $15 for them. Moreover, the IEEE flat fee of $15 per chapter applies even to very short sections of the book, like my two-page glossary. The PDF from IEEE is packaged in 15 separate files and will cost you $225.

If you want to buy the whole book, to support the cause, buy the paperback or Kindle edition from MIT Press. If you want to read the book in PDF format, read the open-access PDF from MIT Press. It's not only free, it's packaged in one file for continuous reading.

I can't think of a single reason to buy a single chapter of my book from IEEE.

In fairness, IEEE made its edition available five months before my book became OA. At that time, you couldn't save $225 by choosing an OA edition. You could only save $213 by buying the MIT paperback or save $215 by buying the MIT Kindle edition.
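
The arithmetic above can be checked with a short Python sketch. The prices and file count are the ones quoted in this post; the constant and function names are illustrative assumptions, not anything published by IEEE or MIT Press.

```python
from decimal import Decimal

# Prices quoted in the post (US dollars). Names are hypothetical.
IEEE_PER_FILE = Decimal("15.00")   # flat fee per chapter file
IEEE_FILE_COUNT = 15               # the book is split into 15 PDF files
MIT_PAPERBACK = Decimal("11.72")   # Amazon price on the day of the post
MIT_KINDLE = Decimal("9.99")       # Amazon price on the day of the post
OA_EDITION = Decimal("0.00")       # open-access edition is free


def ieee_total(per_file=IEEE_PER_FILE, files=IEEE_FILE_COUNT):
    """Cost of buying every IEEE file at the flat per-file fee."""
    return per_file * files


def savings(alternative_price):
    """How much cheaper an alternative edition is than the full IEEE set."""
    return ieee_total() - alternative_price


if __name__ == "__main__":
    print("IEEE total:", ieee_total())
    for label, price in [
        ("MIT paperback", MIT_PAPERBACK),
        ("MIT Kindle", MIT_KINDLE),
        ("Open-access edition", OA_EDITION),
    ]:
        print(f"{label}: save {savings(price)}")
```

The exact differences are $213.28 and $215.01, which the post rounds down to $213 and $215.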

#oa #openaccess +The MIT Press
4 comments on original post

Richard Smith-Unna

Shared publicly  - 
The good ol' Fermi Paradox. I read about this a few years ago - still a trip to think about!
Scientists estimate that there are over 100,000 intelligent alien civilizations in our galaxy alone—but we've never heard anything from any of them. Here are 13 possible explanations for why.

Richard Smith-Unna

Shared publicly  - 
The Seriousness of the Kessler Syndrome

The Kessler Syndrome recently starred in the hit movie "Gravity," but what is it? Well, it all starts with our trash. The problem with space debris isn't simply that all this trash is floating around, or that it will have an environmental impact (since anything that re-enters the atmosphere typically disintegrates). The real problem with space debris is the speed of that debris, and the possibility that said debris will impact other (more valuable) objects in orbit. And we're not talking about a fender-bender here; we're talking about two rather fragile thousand-pound objects colliding at speeds of tens of thousands of miles an hour.

In the event that two objects collide, the collision creates a massive debris cloud that is also traveling at thousands of miles an hour. Anything from a stray solar panel to a screw could obliterate another spacecraft (imagine a screw traveling 20,000 miles an hour). That debris would then hit other objects in orbit, which creates more debris, which hits more objects... you get the picture.

To read the full article, see:
1 comment on original post

Richard Smith-Unna

Shared publicly  - 
Definitely how it happened!

Seems legit.
Have him in circles
5,406 people
Natalia Manrique's profile photo
Matt Bird's profile photo
Olaf Prause's profile photo
Srikanth Kasturi's profile photo
Joao Ricardo Nickenig Vissoci's profile photo
Andy Smith's profile photo
William Mauritzen's profile photo
Jim Gray's profile photo
Squared Nob's profile photo
It seems to me that the conjunction of (a) perjuring oneself after (b) illegally breaking into Senate computers to (c) cover up an unimaginably brutal torture program is the sort of thing that one might get fired for.*

For those of you that haven't been following the story, a short summary of what happened: 

(1) The CIA, in offering full access to its files on torture, "inadvertently" included a memorandum submitted to Leon Panetta about the Bush-era torture program. That memorandum concluded that torture of detainees was far more widespread, and far less useful, than previous judgments on the subject had noted.

(2) The CIA attempted to cover its tracks by deleting the memorandum.

(3) Staffers for the Senate Select Committee on Intelligence noticed the deletion and managed to preserve enough of the memorandum to produce an incomplete printout. That printout was brought out of the SCIF and transported to Dianne Feinstein's office to prevent the CIA from further interfering with it.

(4) The CIA began an investigation which included spying on Senatorial staffers' emails and monitoring their usage of that computer. That investigation -- and the attempt at prosecution which followed -- was initiated by the same Assistant Counsel of the CIA that ordered that videotapes of the torture of terror suspects be destroyed in 2005. To make it very clear, this was not a disinterested party: this is a person who might face criminal prosecution if the facts which he was attempting to scrub were not scrubbed.

(5) That associate counsel submitted a false statement to the Justice Department in an attempt to start prosecution against Dianne Feinstein's aides. This did not go over particularly well with the Senator.

The CIA and its apologists have attempted to justify this as a routine classified-materials handling investigation. This is nonsense. The security clearances of Senate Select Intelligence staffers (and the need to know) are granted by the Committee itself, upon nonbinding consultation with the DCI. In other words, the committee overseeing the CIA can grant arbitrary clearance to investigators; they do not need to receive permission from the clearing agency. As a result, clearance itself is not an adequate reason for an investigation: if the Committee says its staffers are cleared, then its staffers are cleared, full stop.

But, of course, all of that is irrelevant, because the CIA may not investigate Senate staffers engaged in their official duties. After Gravel v. United States, it's clear that Senatorial aides engaged in legislative business enjoy the same absolute immunity that Senators do. That holds even if we presume that classified documents had been mishandled -- and I might suggest that attempting to delete files in order to obstruct a Congressional investigation is mishandling of a far fucking higher order than trying to ensure that the legacy of America's idiotic, brutal, shameful and wasteful torture program was actually brought to light.

* In a just world, using an intelligence agency to intentionally subvert democratic processes is the sort of thing that you would get hanged for. Democratic traditions are fragile and fleeting, and spending those traditions on covering up crimes is the sort of thing that we should take very fucking seriously indeed. (But which we don't.)
Scribd is a way to easily put your documents online.
34 comments on original post

Richard Smith-Unna

Shared publicly  - 
Google Scholar metrics of scientific journals

Includes, interestingly enough, also "journals" like, which ranks rather high.

The official source for information about Google Scholar
Spot on
Elsevier's 50-day tease.

From +Elsevier: "The new Share Link service...allows authors and their network to access their final published articles on ScienceDirect for free for a 50-day period."

Comment. OA is better. It's not limited to 50 days. You can get OA by publishing your work in an OA journal ("gold OA") or by publishing in a non-OA journal, including an Elsevier journal, and depositing a copy of your peer-reviewed manuscript in an OA repository ("green OA"). If you haven't done this before, here's how. 

Here's one of Elsevier's arguments for the new offer: "Researchers who publish in academic journals understand the necessity to expose their papers to the widest audience possible." That's true. But it's an argument for real OA, not a 50-day tease. A more precise formulation makes Elsevier's true statement false: "Researchers who publish in academic journals understand the necessity to expose their papers to the widest audience possible for 50 days, and then keep them locked behind a paywall." 

Here's another Elsevier argument for the new offer: "The new Share Link service makes it easy for authors to share their articles so they can get more exposure and more citations." That's also true. But it's also an argument for real OA, not a 50-day tease. If you really want more exposure and citations, do you want to stop with a 50-day window onto a global audience, or do you want an ongoing global audience?

Elsevier is right that 50 days of free online access is better than no free online access. But watch it try to make that case without making the case for full-bore OA. 

#oa #openaccess #elsevier
The customized link gives 50 days of free access to the author’s article on ScienceDirect after final publication

Richard Smith-Unna

Shared publicly  - 
Interesting project for archiving reproducible computational experiments. Now there's no excuse not to :)
Sharing Research Artifacts

My research group is building a facility, Apt, with the aim of making it easier to share research artifacts (code, datasets, etc.).

The basic idea is that (1) we give you virtual and/or physical machines (completely yours, with root) (2) you get your work environment set up: install your software, get all the dependencies set up, copy in your datasets, etc. (3) you take a snapshot of your setup, called a "profile" (4) we give you a URL - anyone with this URL can get an instance of this "profile" on our hardware, giving them an exact copy of your environment to work in. You can share this URL with your collaborators, publish it in your paper, etc.

I think a good way to illustrate this is to walk through a working example:


In a paper we have appearing at NSDI next month, we included this URL: . Go ahead, try it out, it works.

With a minimal amount of account setup (Apt just verifies your email address) you can create an instance of this profile; in this case, it's a Linux VM containing a MySQL database pre-loaded with the data we used in our paper, plus all of the code used to analyze that data. Once you've verified your email address, Apt starts booting the VM in its cluster.

Once the VM finishes booting, you can log into it - you can either use SSH or a (surprisingly functional) terminal built right into Apt's website. You get a message on login (we just replaced the MOTD) that explains where to find the scripts, which ones to run to produce the numbers in our paper, and a URL (simple webserver hosted in the VM) where you can go to download the graphs once they are generated.

Of course, simply producing the same graphs is not the real end goal here, so there is also a pointer to a README that describes the database schema and basics of the scripts.


Apt is built on Emulab, so (to a first approximation), anything you can run on Emulab can be shared this way. We are also hoping that the streamlined interface makes it more appealing to research communities outside Emulab's traditional userbase of networking and distributed systems.

Having successfully put together a profile ourselves, I think we're about ready to help a few other early adopters try building their own. Let me know if you're interested.
Apt is a testbed facility built around profiles. A profile describes an experiment; when you instantiate a profile, that specification is realized on the Apt cluster using virtual or physical machines. The creator of a profile may put code, data, and other resources into it, and the profile may ...
9 comments on original post

Richard Smith-Unna

Shared publicly  - 
Now that's a form of taxation I can get on board with. I'm off to... race my stone
Atlas Obscura on Slate is a blog about the world's hidden wonders. Like us on Facebook, Tumblr, or follow us on Twitter @atlasobscura. On a rocky shore in Sweden's southwest sits a pile of sticks that created a nation. Nimis, a mountainous, multi-towered sculpture made of 70 tons of driftwood...

Richard Smith-Unna

Shared publicly  - 
Today I started thinking about how to set the world's knowledge free, and make science open to everyone. Here's the plan:
Last night at Open Research Cambridge, Jelena Aleksic gave a great talk about Open Access. In her closing comments, she floated the idea of an iTunes for scientific papers. Imagine being able to get any scientific paper for 79p. That's a reasonable price to cover costs of creating, archiving and ...
Plant scientist
  • Millennium Seed Bank, Kew
    Associate Researcher, 2010 - present
  • Myself
    Freelance iOS Developer, 2011 - 2012
  • Highways Agency Environment Team
    Conservation Biologist, 2009 - 2010
Map of the places this user has lived
Cambridge, UK
London - Bristol - Guildford
Other profiles
Contributor to
Plant scientist, geek
I'm a plant scientist who loves science, data, tech and Japanese. You can expect me to post mostly about plant science, and sometimes about general science and technology.

During and since my undergrad degree I worked at the Millennium Seed Bank as a researcher and data analyst, so I might write about seeds a lot. They're honestly much more interesting than you might think.

I'm currently a PhD student in the Hibberd lab at Cambridge, where I'm attempting to decipher the molecular basis of C4 photosynthesis using various tools, but especially comparative transcriptomics. I have grandiose (but totally achievable) ambitions of being able to solve the world food crisis using plants (following my supervisor's sterling example).

I'm also a programmer, and under that guise I write scientific data analysis tools and do some freelance iOS developing. My major project this year has been writing AnkiMobile 2 for iOS (released Oct 2012).

Blahah on fora, @blahah404 on Twitter, and this is
  • University of Cambridge
    PhD Plant Sciences, 2012 - present
  • University of the West of England
    Conservation biology, plant science, 2008 - 2012
Basic Information
Other names
Blahah, Blahah404
Richard Smith-Unna's +1's are the things they like, agree with, or want to recommend.
StrongLifts 5x5 Workout - Android Apps on Google Play

The simplest, most effective workout to get stronger, build muscle and burn fat fast. Three exercises, three times a week, 45 minutes per wo


Rick Castle is America's favorite crime author, and he's about to find inspiration for a new character in one of New York's finest detective


Humanity's open-source automated precision farming machine.

UWE experts in garden shed solve world's water purification crisis

In a garden shed in Bristol, scientists are preparing to change the lives of millions, by drinking the water from the pond by which it sits.

Hammond Auto

Hammond Auto hasn't shared anything on this page with you.

Gmail - Email from Google

The ease and simplicity of Gmail, available across all your devices. Gmail's inbox helps you stay organized by sorting your mail by type. Pl


Google+ aims to make sharing on the web more like sharing in real life. Check out Circles, Events and Hangouts, just a few of the things we'

Snowden: UK government now leaking documents about itself

Glenn Greenwald: The NSA whistleblower says: 'I have never spoken with, worked with, or provided any journalistic materials to the Independe


Instantly connect to what's most important to you. Follow your friends, experts, favorite celebrities, and breaking news.

Welcome to Facebook - Log In, Sign Up or Learn More

Facebook is a social utility that connects people with friends and others who work, study and live around them. People use Facebook to keep

The Strange Case of Barrett Brown | The Nation

Amid the outrage over the NSA's spying program, the jailing of journalist Barrett Brown points to a deeper and very troubling problem.

| Machine Learning

Video Lecture: in Machine Learning on Coursera.

Amazing Horse - Weebl's Stuff

A song about how womens rights have improved as the use of horses for transport has declined. Not kid friendly. [UPDATE] You can now buy the

NASA - The Rose

The spinning vortex of Saturn's north polar storm resembles a deep red rose of giant proportions surrounded by green foliage in this false-c

How to Create a Full Screen PDF Presentation in Illustrator

Sharp looking presentations give a professional edge to those presenting them. They provide a source of credibility that a client can hold o

Algorithms: Design and Analysis, Part I

About the Course. In this course you will learn several fundamental principles of algorithm design. You'll learn the divide-and-conquer desi

Google Plus Nick

Make short url for google+ profile.

Watch Mobile iPhone Anime

Watch and stream hundreds of anime directly on your iPhone, iPad, & iPod Touch, Android, & Mobile Devices here at

Reviewing the Kanji

A web app for James Heisig's Remembering the Kanji.

Stack Exchange

Making the Internet a better place to get awesome answers