The Phylotastic RDF treestore is a fast phylogenetic tree database which uses BioPython to convert trees into RDF for storage in a triple store, and SPARQL to query them. The source is available on GitHub: https://github.com/bendmorris/rdf-treestore
A web interface that allows community storage, annotation, and querying of trees is being developed here: https://github.com/bendmorris/phylofile
Three things that would be helpful for this project:
**One: write a Python script to automate the annotation of the inner nodes of a tree, e.g. with names of higher taxa from a taxonomy. Here's a schematic of what I mean: https://gist.github.com/bendmorris/5076354
This doesn't need to be RDF-specific at all; in fact, a student could complete this task using BioPython without any knowledge of RDF.
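As a rough illustration of the idea, here is a minimal sketch that labels each unnamed inner node with the most specific taxon shared by all of the tips below it. The `Node` class is a hypothetical stand-in that only mimics the `name` and `clades` attributes of a BioPython `Bio.Phylo` clade, and the `lineages` dictionary is toy data standing in for a real taxonomy lookup; neither comes from the project itself.

```python
class Node:
    """Hypothetical stand-in for a Bio.Phylo clade: a name plus child clades."""
    def __init__(self, name=None, clades=None):
        self.name = name
        self.clades = clades or []


def annotate_inner_nodes(node, lineages):
    """Name unnamed inner nodes after the deepest taxon shared by all tips below.

    `lineages` maps a tip name to its list of higher taxa, ordered from the
    most general to the most specific. Returns the shared lineage prefix.
    A tip missing from `lineages` yields an empty prefix, so its ancestors
    are left unlabelled.
    """
    if not node.clades:
        return list(lineages.get(node.name, []))
    child_prefixes = [annotate_inner_nodes(child, lineages) for child in node.clades]
    shared = []
    for taxa_at_rank in zip(*child_prefixes):
        if len(set(taxa_at_rank)) != 1:
            break  # children disagree from this rank downwards
        shared.append(taxa_at_rank[0])
    if shared and node.name is None:
        node.name = shared[-1]
    return shared


# Toy taxonomy (an assumption for illustration, not real project data).
lineages = {
    "Homo sapiens": ["Mammalia", "Primates", "Hominidae"],
    "Pan troglodytes": ["Mammalia", "Primates", "Hominidae"],
    "Mus musculus": ["Mammalia", "Rodentia", "Muridae"],
}

# ((Homo sapiens, Pan troglodytes), Mus musculus)
root = Node(clades=[
    Node(clades=[Node("Homo sapiens"), Node("Pan troglodytes")]),
    Node("Mus musculus"),
])
annotate_inner_nodes(root, lineages)
```

Because the function only touches `name` and `clades`, the same logic should carry over to a tree parsed with `Bio.Phylo`, although that has not been tested here.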
**Two: different sources sometimes refer to the same species by different names. For example, the three names "Crypturellus erythropus," "Crypturellus columbianus," and "Crypturellus saltuarius" look like three distinct species, but they're actually three subspecies of a single species of bird. A difficult problem is to take user input (which could be any of these three names) and match it to a species on a phylogenetic tree (which could be labelled with any of these names). This is the job of a Taxonomic Name Resolution Service (TNRS; see for example http://tnrs.iplantcollaborative.org/).
We're storing trees as RDF, so we want a script that uses a TNRS to find all of the possible name matches for a specific node in a tree and generates RDF statements connecting them to that node, together with information on the naming authority, etc. This way, queries using any of these names will be able to match the node. Users could also decide to limit query matches to a specific naming authority, etc. We'll also want some way of dealing with ambiguities if a name could refer to multiple nodes in the same tree.
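One possible shape for the RDF-generation step is sketched below. It assumes the name matches have already been fetched from a TNRS and collected into a dictionary; the lookup itself is left out because the service's API is not described here. The `http://example.org/treestore/` namespace, the predicate names, the node URIs, and the "Example Authority" label are all hypothetical placeholders. The output is plain N-Triples text, and a second helper flags names that match more than one node.

```python
from collections import defaultdict

EX = "http://example.org/treestore/"  # hypothetical vocabulary namespace


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def name_match_triples(matches_by_node):
    """Build N-Triples lines linking each node to its (name, authority) matches.

    `matches_by_node` maps a node URI to a list of (name, authority) pairs.
    Each match gets its own blank node so the authority stays attached to
    the name it belongs to.
    """
    lines = []
    counter = 0
    for node_uri, matches in matches_by_node.items():
        for name, authority in matches:
            bnode = f"_:match{counter}"
            counter += 1
            lines.append(f"<{node_uri}> <{EX}nameMatch> {bnode} .")
            lines.append(f'{bnode} <{EX}name> "{escape_literal(name)}" .')
            lines.append(f'{bnode} <{EX}authority> "{escape_literal(authority)}" .')
    return lines


def ambiguous_names(matches_by_node):
    """Return names that were matched to more than one node in the tree."""
    nodes_by_name = defaultdict(set)
    for node_uri, matches in matches_by_node.items():
        for name, _authority in matches:
            nodes_by_name[name].add(node_uri)
    return {name: sorted(uris) for name, uris in nodes_by_name.items() if len(uris) > 1}


# Placeholder input; in practice this would come from TNRS responses.
matches_by_node = {
    "http://example.org/tree1#node7": [
        ("Crypturellus erythropus", "Example Authority"),
        ("Crypturellus columbianus", "Example Authority"),
    ],
    "http://example.org/tree1#node9": [
        ("Crypturellus erythropus", "Example Authority"),
    ],
}
triples = name_match_triples(matches_by_node)
ambiguous = ambiguous_names(matches_by_node)
```

Keeping the authority on a per-match blank node is one way to let a query restrict matches to a single naming authority; reporting ambiguous names separately leaves the decision about how to resolve them to the caller.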
**Three: existing repositories (e.g. TreeBASE) contain many tree files, with metadata in a repository-specific format. These trees can already be converted to RDF, but it would be useful to write a script that automates the download of all trees from a repository and converts their metadata into RDF that can be used to inform queries.
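The metadata-conversion half of this task might look something like the sketch below, which maps fields from a repository record onto Dublin Core terms and emits N-Triples. The field names (`title`, `author`, `year`), the example record, and the tree URI are assumptions made for illustration; a real repository's format would need its own mapping, and the download step is omitted.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical repository field names mapped to Dublin Core predicates.
FIELD_TO_PREDICATE = {
    "title": DCTERMS + "title",
    "author": DCTERMS + "creator",
    "year": DCTERMS + "date",
}


def metadata_triples(tree_uri, record):
    """Convert one metadata record (a dict) into N-Triples lines for a tree."""
    lines = []
    for field, value in record.items():
        predicate = FIELD_TO_PREDICATE.get(field)
        if predicate is None:
            continue  # no mapping defined for this field, so skip it
        values = value if isinstance(value, list) else [value]
        for item in values:
            literal = str(item).replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'<{tree_uri}> <{predicate}> "{literal}" .')
    return lines


# Placeholder record; real metadata would be parsed from the repository.
record = {
    "title": "Example tree",
    "author": ["A. Author", "B. Author"],
    "pages": 3,
}
metadata = metadata_triples("http://example.org/tree1", record)
```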