Profile

Ben Morris
Works at Machine Zone
Attended Utah State University
Lives in Union City, CA 94587
67 followers|126,506 views

Stream

Ben Morris

Shared publicly  - 
 
Using Make for reproducible scientific analyses
The two tools that I use most frequently to manage my research are Git and Make. I've been using Git for years, but Make for only the past year or so - ironically, I learned about it at a Software Carpentry workshop at which ...
Stephen Eglen

Liked this article, Ben. Another advantage of Make is the ability to run independent jobs in parallel, so your last make all example could be "make -j4 n" to get close to a 4x speedup if 4 cores are free.

Ben Morris

Shared publicly  - 
 
 
We have a huge and valuable opportunity to honor hard work and dedication in our community:  the White House is calling for nominations for “Open Science” Champions of Change: http://www.whitehouse.gov/blog/2013/05/07/seeking-outstanding-open-science-champions-change

Awards matter.  They feel good, they help people get taken seriously, and they make it easier to get funding.  Let’s run with this opportunity!

Nominations must be in *by May 14 2013 (tomorrow!)* and you can nominate as many people and organizations as you like.  It isn’t clear, but it seems like multiple people will be honored.  The nominator and nominee must both be affiliated with a US address.  Under “Theme of Service,” choose “Open Science”.

Here are my personal open science champions of change. [..]

#openscience   #opendata   #openaccess  

Ben Morris

project ideas  - 
 
Also, there was talk on the Phylotastic mailing list about a NeXML to R converter. Whatever happened to that idea?

Again, I've written a NeXML parser and have some basic R experience, and I would be happy to lead, but there are probably people here more suited to this than I am - Scott Chamberlain, for one.
9 comments
 
Hi all, I just added a draft project to the NESCent wiki. I took the liberty of putting myself down as "primary" and +Ben Morris, +Scott Chamberlain, +Brian O'Meara and +Carl Boettiger as "co-mentors", though frankly I would be happy with whatever configuration as long as someone with R competence (i.e. not me) is part of this. Here's the link, edit at will: http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013#Implementing_semantically_rich_NeXML_I.2FO_in_R
Have him in circles
67 people
Nathan Watson-Haigh's profile photo
Michael Rosenberg's profile photo
JOE ROCK's profile photo
Carol Morris's profile photo
Brian Evans's profile photo
Shanyl Traboc's profile photo
abhineet agarwal's profile photo
Arlene Labonite's profile photo
Joel Adamson's profile photo

Ben Morris

discussion  - 
 
This is a message from +Vilmos Serédi, who is interested in the annotation of higher-order taxa project, which I'm posting here with his permission (continuing our conversation from the mailing list):

Dear Mr Morris,
 
Thank you for your answer. 
I’ve read the Bio.Phylo manual and tried some example code, so I now have a vague idea of how a tree is represented in Biopython.
I still have a few questions:
What does the taxonomic data (required by this script) look like?
What sources do you recommend reading before/after applying?
 Thanks for your effort and help in advance.
Regards,
Vilmos Serédi
2 comments
 
Regarding your timeline: the beginning of the 12 weeks should be the beginning of coding. It seems that you have planned "information gathering and data validation" for the first three weeks. Ideally, this would all be done before the first week, and the first week you would begin directly working on the solution. (I'm also not sure why you referenced "the open tree of life" in that section - that's a separate project.)

Also, when you say that synonyms and homonyms "could be resolved through unique identifiers," can you add more detail about what the problem is specifically, what identifiers you would use and how they would address the problem?

I like how you've listed milestones for each 3-week period. You could improve this by adding more detail about the specific pieces of the final solution you'd be building and when you expect those to be complete.

Ben Morris

Shared publicly  - 
 
 
Have an open-source biology project and more ideas than you have time to implement? Want to get more students into computational biology? The Phyloinformatics Summer of Code is looking for mentors and summer-long programming project ideas as part of NESCent's organizational application for the Google Summer of Code.

If you are interested in being a mentor, or have a project idea, join the discussion. More information about participating and creating project ideas here:

http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013

Ben Morris

project ideas  - 
 
The Phylotastic RDF treestore is a fast phylogenetic tree database which uses BioPython to convert trees into RDF for storage in a triple store, and SPARQL to query them. The source is available on GitHub: 
https://github.com/bendmorris/rdf-treestore

A web interface is being developed here that allows community storage, annotation, and querying of trees:
https://github.com/bendmorris/phylofile

Three things that would be helpful for this project:

**One: write a Python script to automate the annotation of the inner nodes of a tree, e.g. with names of higher taxa from a taxonomy. Here's a schematic of what I mean:

https://gist.github.com/bendmorris/5076354

This doesn't need to be RDF-specific at all; in fact, a student could complete this task using BioPython without any knowledge of RDF.
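
To make the first task more concrete, here is a minimal sketch of one possible approach, not the project's actual code. It assumes the taxonomy is available as a mapping from each tip name to its lineage (ordered from the highest rank down), and that every tip appears in that mapping. The `Node` class is a hypothetical stand-in for a Biopython clade, and the example lineages are illustrative only. Each unnamed inner node is labelled with the lowest taxon shared by all the tips beneath it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical minimal tree node, standing in for a Biopython clade."""
    name: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def shared_prefix(lineages: List[List[str]]) -> List[str]:
    """Return the longest prefix common to all lineages."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


def annotate(node: Node, taxonomy: Dict[str, List[str]]) -> List[List[str]]:
    """Name each unnamed inner node after the lowest taxon its tips share.

    Returns the lineages of all tips under `node` so the caller can reuse them.
    Assumes every tip name is a key in `taxonomy`.
    """
    if not node.children:
        return [taxonomy[node.name]]
    lineages = []
    for child in node.children:
        lineages.extend(annotate(child, taxonomy))
    shared = shared_prefix(lineages)
    if shared and node.name is None:
        node.name = shared[-1]
    return lineages


# Illustrative taxonomy: tip name -> lineage from highest to lowest rank.
taxonomy = {
    "Panthera leo": ["Carnivora", "Felidae", "Panthera"],
    "Panthera tigris": ["Carnivora", "Felidae", "Panthera"],
    "Felis catus": ["Carnivora", "Felidae", "Felis"],
    "Canis lupus": ["Carnivora", "Canidae", "Canis"],
}

big_cats = Node(children=[Node("Panthera leo"), Node("Panthera tigris")])
cats = Node(children=[big_cats, Node("Felis catus")])
root = Node(children=[cats, Node("Canis lupus")])

annotate(root, taxonomy)
print(big_cats.name, cats.name, root.name)
```

A real implementation would also need to decide what to do with tips missing from the taxonomy and with nested inner nodes that end up sharing the same label.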

**Two: sometimes different sources will refer to the same species in multiple ways. For example, the three names "Crypturellus erythropus," "Crypturellus columbianus," and "Crypturellus saltuarius" look like three distinct species, but they're actually three subspecies of a single species of bird. A difficult problem is to take user input (which could be any of these three names) and try to match it to a species on a phylogenetic tree (which could be labelled with any of these names). This is the job of a Taxonomic Name Resolution Service (see for example http://tnrs.iplantcollaborative.org/).

We're storing trees as RDF; we want to write a script that will use TNRS to find all of the possible name matches for a specific node in a tree, and generate RDF statements connecting them to that node, together with information on the naming authority, etc. This way, queries using any of these names will be able to match the node. Users could also decide to limit query matches to a specific naming authority, etc. We'll also want some way of dealing with ambiguities if a name could refer to multiple nodes in the same tree.
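
As a rough illustration of the kind of output such a script might produce, here is a sketch that emits N-Triples statements linking a node to each of its name matches. Everything specific in it is an assumption: the `http://example.org/treestore/` namespace, the predicate names, the node identifier, and the authority label are all hypothetical, and the TNRS lookup is replaced by a hard-coded table rather than a real service call.

```python
from typing import Dict, List, Tuple

# Hypothetical namespace and predicates, chosen only for illustration.
EX = "http://example.org/treestore/"

# Stand-in for a TNRS response: submitted name -> (matched name, authority).
# The names come from the example above; the authority is a placeholder.
FAKE_TNRS: Dict[str, List[Tuple[str, str]]] = {
    "Crypturellus erythropus": [
        ("Crypturellus erythropus", "placeholder-authority"),
        ("Crypturellus columbianus", "placeholder-authority"),
        ("Crypturellus saltuarius", "placeholder-authority"),
    ],
}


def escape(text: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def name_triples(node_id: str, label: str) -> List[str]:
    """Build N-Triples lines connecting a tree node to every name match."""
    lines = []
    for i, (name, authority) in enumerate(FAKE_TNRS.get(label, [])):
        match = f"<{EX}{node_id}/name/{i}>"
        lines.append(f"<{EX}{node_id}> <{EX}hasNameMatch> {match} .")
        lines.append(f'{match} <{EX}nameString> "{escape(name)}" .')
        lines.append(f'{match} <{EX}namingAuthority> "{escape(authority)}" .')
    return lines


triples = name_triples("node42", "Crypturellus erythropus")
print("\n".join(triples))
```

Giving each match its own resource keeps the name and its authority together, which is what would let a query restrict matches to one naming authority.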

**Three: existing repositories (e.g. TreeBASE) contain many tree files and metadata in a repository-specific format. These trees can already be converted to RDF, but it would be useful to write a script which automates the download of all trees from a repository and converts the metadata into RDF which can be used to inform queries.
Jim Procter

Could you put up a bit more detail about this? Links to the existing codebase and some specific aims like 'a script to annotate an RDF tree with additional terms' would help (otherwise, you might find that someone wanting to do ASCII art applies ;) )

Ben Morris

project ideas  - 
 
I'd love to see a general many-to-many taxon matching tool.

In my research I constantly have to try to join two or more datasets (say, joining bird abundance data to bird morphological data), each of which may refer to the same species in different ways. For each species in Dataset A, we need to figure out whether the species is present in Dataset B, possibly under a different name. Tools like TNRS could be helpful for this, but I'm not aware of an algorithm that finds the possible matches for each record and then tries to find the most likely matches between two lists, row by row (a many-to-many point matching problem). In the past I've rolled my own hacky solutions to this problem, and I suspect plenty of other people have. We should aim to make this a general solution that doesn't assume any specific TNRS but works with any tool that tries to clean up species names.
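
To show the shape of the problem, here is a small sketch under some loud assumptions. Each name in each dataset comes with a set of candidate names (itself plus any synonyms a name-cleaning tool returned; the sets below are made up for illustration). Pairs are scored by string similarity using Python's `difflib`, and matches are assigned greedily, best score first. Greedy assignment is a simplification, not an optimal many-to-many matching, and the 0.9 threshold is arbitrary.

```python
from difflib import SequenceMatcher
from typing import Dict, Set


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_score(cands_a: Set[str], cands_b: Set[str]) -> float:
    """Best similarity over all pairs of candidate names."""
    return max(similarity(x, y) for x in cands_a for y in cands_b)


def match_lists(
    a: Dict[str, Set[str]], b: Dict[str, Set[str]], threshold: float = 0.9
) -> Dict[str, str]:
    """Greedily pair names in `a` with names in `b`, best score first.

    Each name is used at most once; pairs scoring below `threshold` are dropped.
    """
    scored = sorted(
        (
            (best_score(cands_a, cands_b), name_a, name_b)
            for name_a, cands_a in a.items()
            for name_b, cands_b in b.items()
        ),
        reverse=True,
    )
    matches: Dict[str, str] = {}
    used_b: Set[str] = set()
    for score, name_a, name_b in scored:
        if score < threshold:
            break
        if name_a in matches or name_b in used_b:
            continue
        matches[name_a] = name_b
        used_b.add(name_b)
    return matches


# Hypothetical candidate sets, as a name-resolution tool might supply them.
dataset_a = {
    "Crypturellus erythropus": {"Crypturellus erythropus", "Crypturellus columbianus"},
    "Passer domesticus": {"Passer domesticus"},
}
dataset_b = {
    "Crypturellus columbianus": {"Crypturellus columbianus"},
    "passer domesticus": {"passer domesticus"},
    "Turdus migratorius": {"Turdus migratorius"},
}

matches = match_lists(dataset_a, dataset_b)
print(matches)
```

Because the matcher only sees candidate sets, it does not depend on which tool produced them, which is the point of keeping it independent of any specific TNRS.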

I'd be happy to co-mentor this, but there are plenty of people in this community who have more expertise than I do, including everyone that's been involved with TNRS. Thoughts?
5 comments
 
OpenRefine does this against standard databases (see http://iphylo.blogspot.com/2012/02/using-google-refine-and-taxonomic.html), and it also has a way to match names between multiple projects (http://blog.ouseful.info/2011/05/06/merging-datesets-with-common-columns-in-google-refine/). It might be possible to build a UI over that to do something like what you suggest.
Education
  • Utah State University
    Computational Biology, 2012
Basic Information
Gender
Male
Work
Occupation
Data Platform Engineer
Employment
  • Machine Zone
    Data Platform Engineer, present
Places
Currently
Union City, CA 94587
Previously
Hillsborough, NC 27278 - Federal Way, WA 98023 - Providence, UT 84332
Links
YouTube
Contributor to
We've been thrilled with the service at RPM. We were moving across the country and needed to get our house rented fast. They were able to meet with us quickly and get everything set up, advertised our house, and approved a tenant within a couple weeks at $100 more than we were originally asking - effectively paying for themselves! The staff has always been very responsive to questions or concerns. It's also great not to have the stress of trying to handle collections/maintenance or whatever else will arise in another state. I couldn't recommend RPM enough. Thanks guys!
Public - 3 years ago
reviewed 3 years ago
1 review