Between genres: science journalism and science research

Julie Moore has a nice post [http://lexicoblog.blogspot.co.uk/2016/01/semi-academic-sources-in-eap-interview.html] on the use of science journalism articles in EAP settings and how important it is to make students aware that these types of texts are very different to journal research articles.

She points out that part of the appeal of magazine articles on science is that abstracts in journals “are incredibly densely packed and require a certain degree of skill to decode.”

The PLOS (Public Library of Science) website asks authors to write an author summary: “Distinct from the scientific abstract, the Author Summary is included in the article to make findings accessible to an audience of both scientists and non-scientists.”

This presents a possible halfway house for EAP students: the author summary sits between a journal abstract and a magazine article. Note, though, that PLOS journals mainly cover biology and medicine, and not all papers have author summaries.

One could simply copy and paste abstracts and author summaries from the web pages. Or one could semi-automate this.

There is a nice scraper called quickscrape [https://github.com/ContentMine/quickscrape] which allows you to download articles from various journals. Follow the instructions on the GitHub site to set it up and to understand the quickscrape commands. The configuration for PLOS journals can be modified so that you download only the abstracts.

Modify the journal-scrapers/scrapers/plos.json file like so:
{
  "url": "plos.*\\.org",
  "elements": {
    "abstract_html": {
      "selector": "(//div[contains(@class,'abstract')])[2]"
    }
  }
}

The index [2] at the end of the selector in the above config downloads just the author summary. To download the original abstract instead, change the index to 1.
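To see what the index in the selector does, here is a minimal sketch in Python using only the standard library. It is not how quickscrape works internally. It just mimics the idea of "take the n-th div whose class contains 'abstract', counting from 1". The sample HTML, the class name `AbstractDivCollector` and the function `nth_abstract_div` are made up for illustration, and real PLOS pages may be laid out differently. As a simplification, a matching div nested inside another matching div is not counted separately, whereas the XPath would count it.

```python
from html.parser import HTMLParser


class AbstractDivCollector(HTMLParser):
    """Collect the text of every <div> whose class attribute contains 'abstract'."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # text of each matching div, in document order
        self._depth = 0    # div nesting depth inside the current match (0 = outside)
        self._parts = []   # text fragments of the div currently being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1      # a nested div inside a matching div
        elif "abstract" in (dict(attrs).get("class") or ""):
            self._depth = 1       # entering a matching div
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:  # leaving the matching div
                text = " ".join(" ".join(self._parts).split())
                self.blocks.append(text)

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def nth_abstract_div(html, n):
    """Return the text of the n-th matching div, counting from 1 as XPath does."""
    collector = AbstractDivCollector()
    collector.feed(html)
    return collector.blocks[n - 1]


# Hypothetical page fragment: abstract first, author summary second.
SAMPLE = """
<div class="abstract toc-section"><h2>Abstract</h2><p>Dense technical abstract.</p></div>
<div class="abstract toc-section"><h2>Author Summary</h2><p>Plain-language summary.</p></div>
"""

print(nth_abstract_div(SAMPLE, 1))  # the original abstract
print(nth_abstract_div(SAMPLE, 2))  # the author summary
```

If a paper has no author summary, there may be no second matching div, so asking for index 2 would fail here with an IndexError. That is worth keeping in mind when checking scraped results.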

The journal server seems to limit requests if you hit it too often, so be wary of that.
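One way to go easy on the server is to pause between downloads. The following is only a sketch: it assumes quickscrape is installed and on your PATH, and that the --url, --scraper and --output options behave as described in the quickscrape README (check `quickscrape --help` for your version). The 20-second delay, the output directory name `plos-output` and the function names are my own assumptions, not values recommended by PLOS or quickscrape.

```python
import subprocess
import time


def build_command(url, scraper="journal-scrapers/scrapers/plos.json",
                  output="plos-output"):
    """Build the quickscrape command line for a single article URL."""
    return ["quickscrape", "--url", url, "--scraper", scraper, "--output", output]


def scrape_politely(urls, delay_seconds=20, run=subprocess.run, sleep=time.sleep):
    """Run quickscrape once per URL, pausing between consecutive requests.

    `run` and `sleep` can be swapped out, which makes a dry run possible.
    """
    for i, url in enumerate(urls):
        if i:  # no need to wait before the first request
            sleep(delay_seconds)
        run(build_command(url), check=False)


# Example use (commented out so nothing is downloaded by accident):
# scrape_politely(["<first article URL>", "<second article URL>"])
```

Passing `run=print` and `sleep=lambda s: None` gives a dry run that just prints the commands it would execute.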

Here are the files for 10 abstracts and 10 author summaries [https://drive.google.com/open?id=0B7FW2BYaBgeiNExJS2FiZzVpYW8]