Post has attachment
Another article for our text-mining subgroup to check out. >> Text mining for search term development in systematic reviewing: A discussion of some methods and challenges. Claire Stansfield, Alison O'Mara-Eves, James Thomas. Research Synthesis Methods.
First published: 29 June 2017. DOI: 10.1002/jrsm.1250 (Check our online folder)

MINUTES from June 30, 2017

Patricia, Lin, Carol, Tierney, Pamela
Regrets: Skye, Joanne

Articles & Publications

RSM article submission update.
No new activity; peer review not yet in progress, as far as we know.
Proposed JMLA short article / editorial depends on publication of this piece.
Follow up with RSM in mid-July if we haven't heard from them.

Search strategy article is in the semi-final edits stage. It will be open to the full team for comments and questions within two weeks; we will meet to discuss it then, and expect to submit to a journal shortly after.

The MLA poster needs to be written up as an article. This is the piece that combined tech mining strategies and text mining tools in the analysis. It can also be expanded with additional text mining analysis that Patricia has done.
Team to draft the article: Patricia, Carol, Lin. Available as back up: Tierney, Pamela.

Proposed (following publication of both tech mining articles): a short 'editorial' for JMLA on the blended methods we used in our work, and their potential for expanding the utility of systematic review datasets.

NEXT MEETING (Please put on your calendars)
July 14, 10am ET

Hi, all!

I think I have the stop word list narrowed down, but I would appreciate second opinions on what to include. I started with the original Voyant default stop words list, and then iteratively excluded terms that seemed too broad and non-specific to get at our concepts. I did the work and testing on the 2016 data set, and am about to re-run this on the other datasets (for 2015, 2014, 2013).

My concern is that, because I am just one person and we are rushing, I may have introduced assumptions, bias, or fatigue into the process, and that there may be some excluded words that would actually be valuable to keep. In particular, I am concerned about inconsistencies between the things I put in the "words to keep" pile and the stop words list.

How I did this: I started out looking at the word cloud of terms, and deselected terms that were generic within healthcare and/or not associated with technology or tech discovery. I kept a two-column list that accounted for all of the terms in that word cloud. I'd then a) send myself an email with both lists of changes, b) edit the master Google Drive file lists for each set, c) edit the user-generated stop word list, and d) rerun Voyant to see how things changed. After a bit, I discovered that the TERM view let me go a lot deeper into the terms, faster, so I switched to that view for editing the stop word list and used the word cloud view for testing.
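The inconsistency concern above (a term landing on both the "words to keep" pile and the stop words list) is easy to check mechanically. Here is a minimal Python sketch, assuming both lists are plain text files with one term per line; the filenames are hypothetical placeholders, not our actual files:

```python
def load_terms(path):
    """Read one term per line, lowercased, ignoring blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def find_overlap(keep, stop):
    """Return terms that appear on BOTH lists -- each one is an
    inconsistency worth a second look."""
    return sorted(set(keep) & set(stop))

if __name__ == "__main__":
    keep = load_terms("words_to_keep.txt")  # hypothetical filename
    stop = load_terms("stop_words.txt")     # hypothetical filename
    for term in find_overlap(keep, stop):
        print(term)
```

Running this after each iteration (step d above) would catch any term accidentally filed in both piles.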

The most current full version of the stop words list is here:

The iterations showing which words appeared with top frequencies and which I kept in the data set are here:

Please start out reviewing what was kept, and then scan the stop word list for anything you think should NOT be excluded from the analysis.



Andrea, Carol, Joanne, Lin, Patricia, Tierney,

YAY, the cleaned files work on Patricia's big machine (but not on laptop)
Working on expanding the stop word list.
For poster, will probably have graphics from Voyant for 1-4 years. Probably not going to get Google Refine or AntConc done for the poster, but we will for the article.

Poster needs:
- a definition of "emerging" tech
Possible context: Gartner hype cycle
Translate the phases of the Gartner hype cycle into roles for librarians. For example: trigger phase = peripheral awareness; peak & trough = a support role for faculty/researchers working in those areas, with librarians in academic or R&D spaces using the technologies in an exploratory role; plateau = librarians start to use these more broadly.

Wednesday, May 10, 2pm ET, working session for poster

Friday, May 12, 10am, ET
To finalize and submit poster


MINUTES from April 28, 2017

Patricia, Joanne, Lin, Skye, Tierney

1) Joanne reports search strategy article is done, and the team is ready to share draft with the rest of us for the final review prior to submission.
2) RSM article report out - journal reports many excellent entries, resulting in a slower than hoped turnaround for processing.

1) Copy of last year's poster for us to edit for this year
2) Abstract for this year is here
3) Timeline: To get the poster printed free, I need to have it done by May 10 to give to the person here who is printing.

1) Data folder:
2) CSV data file
3) Fields needed for analysis: PMID, Title, Abstract, MeSH, Keywords
4) What to do next:
- When other fields are removed, add the new file with the word CLEANED added to the end
- Upload the complete five year file
- Break cleaned file into each year. Add YEAR (####) to end of each filename.
- Upload those five small files to the Data folder
5) Notes on original files and results:
For the original set results:
- Five year data file has 162339 records
- EndNote deduped version has 162221 references
- Three year data file has 107531 records
6) Need to refine stop words list
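The cleaning and splitting steps in item 4 can be sketched with Python's standard csv module. This is a sketch under assumptions, not the team's actual workflow: the "Year" column name and the exact layout of the exported CSV are assumptions.

```python
import csv

# Fields needed for analysis (from the minutes above).
FIELDS = ["PMID", "Title", "Abstract", "MeSH", "Keywords"]

def clean_and_split(in_path, year_field="Year"):
    """Write a CLEANED copy keeping only the analysis fields, then
    one file per year with YEAR (####) appended to the filename.
    The 'Year' column name is an assumption about the export."""
    with open(in_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    stem = in_path.rsplit(".", 1)[0]

    def write(path, subset):
        # extrasaction="ignore" silently drops the removed fields.
        with open(path, "w", newline="", encoding="utf-8") as f:
            w = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
            w.writeheader()
            w.writerows(subset)

    write(stem + " CLEANED.csv", rows)  # the complete five year file
    for year in sorted({r[year_field] for r in rows}):
        write(f"{stem} CLEANED YEAR ({year}).csv",
              [r for r in rows if r[year_field] == year])
```

This follows the naming convention in the list: the full cleaned file gets CLEANED appended, and each per-year file additionally gets YEAR (####).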

Challenges include: inadequate hardware, large file size, file conversion, computers that can't handle new version of text mining software
Software: Endnote, Voyant, OpenRefine/GoogleRefine, AntConc

POSTER TOPICS & ASSIGNMENTS: Data cleaning = Joanne; Analysis (Challenges & Solutions) = Lin; Software = Lin; Text Mining Images = Patricia; Results = Patricia; Next Steps = Tierney; Sources/Resources = Skye

Just discovered this new book of interest to our team:
Yang, Sharon Q., and LiLi Li. Emerging Technologies for Librarians: A Practical Approach to Innovation. Amsterdam: Chandos Publishing, 2016. xiii, 285 pages.

Main topics include:
LMS; discovery tools; metadata; semantic web; mobile; digitization; digital library; online instruction; virtual reference; social media marketing; web design; web content management; software tools for assessment; altmetrics; online collaboration tools.

MINUTES: Today's Meeting
ATTENDING: Carol, Pamela, Patricia, Skye, Tierney

It was mostly a working meeting, where we tried to find solutions to the known gaps in the article, identify unknown gaps, identify overlap or repetition, etc.


Question for Joanne/Lin:
what exactly is the EndNote custom style we used and how does that work?

Compare Table 2 (at the bottom of the entire document) with the case study sections. Do the two align? Is anything missing from Table 2 that was discussed in the case study and would seem appropriate? Was anything highlighted in Table 2 that was left out of the case study? Jot notes in the introduction area; we may be able to work through this there.

Issues that arose:
For the original set results:
- Five year data file has 162339 records
- EndNote deduped version has 162221 references
- Three year data file has 107531 records

Define / Clarify:
search strategy - validate, refine, revise, update, repeat; this is somewhat discussed already, so not sure if we want to expand
determine endpoint
data cleaning: clustering, irrelevancy, and automated techniques
trends and inductive analysis

why and how we included tech mining methods > "independent parallel development corroborates and validates both methodological approaches"
motivating challenge, goals/outcomes, and timeline

This week we submitted our first article, and have been given a deadline extension to expand the content included in it. I'll be presenting portions of the work process we've done and this article in a local workshop on March 17. Our meeting today focused on what's needed right now for the articles, the presentation, and getting our data analysis completed for MLA.

Attending: Lin, Joanne, Tierney, Carol, Patricia


We've discovered that our data set cannot be exported as CSV from PubMed using FLink.


We are trying to convert the exported MEDLINE txt file to CSV through EndNote, but are encountering challenges with that as well. Joanne has written a custom script for this, which Lin is testing. Meanwhile, here are some other resources for this process.
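For context on what such a conversion involves (this is a generic sketch, not Joanne's script): the MEDLINE export format is tag-based, with a four-character tag followed by "- " and a value, indented continuation lines, and blank lines between records. A minimal Python parser might look like this; the choice of output fields (PMID, TI, AB, MH, OT) is an assumption about which tags we need:

```python
import csv

def parse_medline(text):
    """Parse MEDLINE-format text into a list of dicts.
    Tags occupy the first four columns, followed by '- ' and a value;
    indented lines continue the previous value; blank lines separate
    records. Repeated tags (e.g. MH) are joined with '; '."""
    records, rec, tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():            # blank line ends a record
            if rec:
                records.append(rec)
            rec, tag = {}, None
        elif line[:4].strip() and line[4:6] == "- ":
            tag = line[:4].strip()
            value = line[6:].strip()
            rec[tag] = rec[tag] + "; " + value if tag in rec else value
        elif tag:                       # continuation line
            rec[tag] += " " + line.strip()
    if rec:
        records.append(rec)
    return records

def medline_to_csv(records, out_path, fields=("PMID", "TI", "AB", "MH", "OT")):
    """Write selected MEDLINE tags (PMID, title, abstract, MeSH,
    other terms) to a CSV file."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=list(fields))
        w.writeheader()
        w.writerows({k: r.get(k, "") for k in fields} for r in records)
```

A full script would also need to handle edge cases in the real export (encoding, very long abstracts), which is presumably where the EndNote challenges come in.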



We are expanding our article on systematic review methodologies compared with tech mining methods to include a case study of our methodology.

Our prior notes files on our methodologies:
Journal title inclusion/exclusion criteria notes
Publication file notes and drafts:

Plan for the methods case study:
- look at table 1 in article
- extract portions relevant to our methods
- brief paragraphs on the main areas, focusing on showing how we made our methods systematic & replicable
- Generate new table of highlights of our methods, where they are similar to systematic reviews, where they are similar to tech mining, and where they are new or unique


As we try to convert and clean data, versions of the file are being held here:

Data folder <>

Please use file naming convention of [team][datasource][date]

For the original set results:
- Five year data file has 162339 records
- EndNote deduped version has 162221 references
- Three year data file has 107531 records

We will meet again next week to finalize the article.

BRIEF NOTES from the Jan. 20 meeting.

February 15: draft of tech mining article to Carol for proofing and editing
February 20: MLA abstract completion deadline
February 25: submit tech mining article

Webinar recording link was emailed.
I'm working on a blogpost for text-mining 101/basics.

The plan is/was for each subteam to take the search strategy and walk through the steps to get us going with text mining (data export, import, and analysis) as a trial / proof-of-concept so we can write the abstract.

I'd like to get all the subgroup team leaders together to coordinate some project management stuff, and just see:
- how is everyone doing
- does any group need something from another group
- etc.

I'm thinking of next Friday, December 16, at our usual time of 10am ET, if that's ok for folk? Here's what I have for who I hope will show up.

Article: Search Strategy: Tierney Lyons
Article: Tech Mining: Patricia Anderson
Search Strategy FINAL: Carol Shannon
Data Set Creation: (Who is lead? Skye Bickett / Pamela Herring / Lin Wu?)
Data Set Cleaning: Joanne Doucette
Analysis & Text Mining: Patricia Anderson / Lin Wu

Let me know if there are any problems or changes to any of this. THANKS!

- Patricia