Post has attachment
I’m organizing a special issue on Animal Phylogenetics to be published by
the journal Zoological Systematics (http://www.zootax.com.cn). You are welcome to write a mini-review
or an article on phylogenetic methods. Please also encourage your
collaborators to contact me about similar topics.

Best,

Chao-Dong ZHU/cd

Post has attachment
I have updated my discussion of analyzing data with saturation of silent sites.  The example is a sampling of vertebrate mitochondrial complete genomes including some fish, birds, mammals, amphibians, etc.

Post has attachment
Erick Matsen has created a nice online community for phylogenetic methods. It has discussions of methods, data sets, software and so on.

Post has attachment
I'm doing a short hangout on the 21st Jan as part of the #baltiandbioinformatics series of bioinformatics meetings. Here's the link: https://plus.google.com/events/cbtuikle0h2619obgjrgfu74424

The theme is "open data and reproducible bioinformatics". I'm going to talk about reproducible phylogenomics, and cover some of the issues we've addressed trying to modernise our way of dealing with phylogenomic data. +Amir Szitenberg and I have built ReproPhylo, a reproducible phylogenomics environment, as our solution. I really need to find time to blog about it but here are the #ReproPhylo links.
webpage: hulluni-bioinformatics.github.io/ReproPhylo/ documentation: http://goo.gl/yW6J1J

Post has attachment
Abstract
"With the availability of genomic sequence data, there is increasing interest in using genes with a possible history of duplication and loss for species tree inference. Here we assess the performance of both non-probabilistic and probabilistic species tree inference approaches using gene duplication and loss and coalescence simulations. We evaluated the performance of gene tree parsimony (GTP) based on duplication (Only-dup), duplication and loss (Dup-loss), and deep coalescence (Deep-c) costs, the NJst distance method, the MulRF supertree method, and PHYLDOG, which jointly estimates gene trees and species tree using a hierarchical probabilistic model. We examined the effects of gene tree and species sampling, gene tree error, and duplication and loss rates on the accuracy of phylogenetic estimates.

In the 10-taxon duplication and loss simulation experiments, MulRF is more accurate than the other methods when the duplication and loss rates are low, and Dup-loss is generally the most accurate when the duplication and loss rates are high. PHYLDOG performs well in 10-taxon duplication and loss simulations, but its run time is prohibitively long on larger data sets. In the larger duplication and loss simulation experiments, MulRF outperforms all other methods in experiments with at most 100 taxa; however, in the larger simulation, Dup-loss generally performs best. In all duplication and loss simulation experiments with more than 10 taxa, all methods perform better with more gene trees and fewer missing sequences, and they are all affected by gene tree error.

Our results also highlight high levels of error in estimates of duplications and losses from GTP methods and demonstrate the usefulness of methods based on generic tree distances for large analyses."

Post has attachment
Abstract
"Molecular phylogenetics is a powerful tool for inferring both the process and pattern of evolution from genomic sequence data. Statistical approaches, such as maximum likelihood and Bayesian inference, are now established as the preferred methods of inference. The choice of models that a researcher uses for inference is of critical importance, and there are established methods for model selection conditioned on a particular type of data, such as nucleotides, amino acids, or codons. A major limitation of existing model selection approaches is that they can only compare models acting upon a single type of data.

Here, we extend model selection to allow comparisons between models describing different types of data by introducing the idea of adapter functions, which project aggregated models onto the originally observed sequence data. These projections are implemented in the program ModelOMatic and used to perform model selection on 3722 families from the PANDIT database, 68 genes from an arthropod phylogenomic data set, and 248 genes from a vertebrate phylogenomic data set. For the PANDIT and arthropod data, we find that amino acid models are selected for the overwhelming majority of alignments; with progressively smaller numbers of alignments selecting codon and nucleotide models, and no families selecting RY-based models. In contrast, nearly all alignments from the vertebrate data set select codon-based models. The sequence divergence, the number of sequences, and the degree of selection acting upon the protein sequences may contribute to explaining this variation in model selection.

Our ModelOMatic program is fast, with most families from PANDIT taking fewer than 150 s to complete, and should therefore be easily incorporated into existing phylogenetic pipelines. ModelOMatic is available at https://code.google.com/p/modelomatic/."

Post has attachment
Abstract
"Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference.

Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution."
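
To make the idea of breaking a model into conditionally independent distributions concrete, here is a minimal sketch. It is not taken from the paper and does not implement tree plates. It assumes a toy two-sequence model: the branch length t has an exponential prior, and the number of differing sites follows a binomial distribution under the Jukes-Cantor (JC69) substitution model. The joint density factorizes as p(t) × p(data | t), and a simple Metropolis-Hastings sampler draws from the posterior of t. All function names, the prior rate, and the step size are illustrative assumptions.

```python
import math
import random


def log_prior(t, rate=10.0):
    """Log density of the assumed Exponential(rate) prior on branch length t."""
    if t <= 0.0:
        return float("-inf")
    return math.log(rate) - rate * t


def log_likelihood(t, n_sites, n_diff):
    """Binomial log likelihood (up to a constant) of n_diff differing sites under JC69."""
    # JC69 probability that a site differs between two sequences at distance t.
    p = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    if p <= 0.0 or p >= 1.0:
        return float("-inf")
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)


def log_posterior(t, n_sites, n_diff):
    """Unnormalized log posterior: the sum of the two factors of the model graph."""
    lp = log_prior(t)
    if lp == float("-inf"):
        return lp
    return lp + log_likelihood(t, n_sites, n_diff)


def mh_sample(n_sites, n_diff, n_iter=20000, burn_in=2000, step=0.02, seed=1):
    """Metropolis-Hastings with a symmetric Gaussian proposal on t."""
    rng = random.Random(seed)
    t = 0.1  # arbitrary positive starting value
    current = log_posterior(t, n_sites, n_diff)
    samples = []
    for i in range(n_iter):
        proposal = t + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, n_sites, n_diff)
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < candidate - current:
            t, current = proposal, candidate
        if i >= burn_in:
            samples.append(t)
    return samples


if __name__ == "__main__":
    draws = mh_sample(n_sites=1000, n_diff=100)
    print("posterior mean of t:", sum(draws) / len(draws))
```

Each node of the toy graph (prior, likelihood) is a separate function, so swapping the prior or the substitution model only touches one factor; that modularity is the property the abstract emphasizes.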

Post has attachment
from the abstract:

We observe that the inclusion of structural information significantly reduces alignment and topology uncertainty, and reduces the number of topology and alignment errors in cases where the true trees and alignments are known. In some cases the inclusion of structure results in changes to the consensus topology, indicating that structure may contain additional information beyond that which can be obtained from sequences.
