Is neighbor-joining a phenetic or phylogenetic method?

seriously though, I was always taught it was phenetic - I'm struggling to see how it could be classed as phylogenetic

Wikipedia seems to agree that it's phenetic...
I learnt systematics at the NHM(London) - we're taught NJ is a phenetic method there. I'd be really interested to hear what other people's views are. This issue has come up before...
Speaking as a complete outsider, I think as the size of molecular datasets has increased, people have tended to see clustering methods as giving acceptable first-pass approximations of the sort of results that a 'true phylogenetic' method would produce.

Me, I'm an instrumentalist when it comes to scientific method. Give me some simulations which compare how well the methods produce phylogenies close to the 'true' tree; everything else is labels. We should be much more worried about whether any of our methods can produce trees which are close enough to the truth rather than arguing based on first principles which method is philosophically more sound.
No doubt about it: phenetic. As David mentions, a lot of people use this today simply because of the computational burden of genomic data sets. However, it is an algorithmic approximation, and even if it gets the tree correct 100% of the time, it ain't phylogenetic.

I also strongly agree with David that instrumentalism is the way to go: if it works for your purpose, use it; if it doesn't, don't. For example, NJ may be entirely adequate for someone working in genomics (although I'd always argue for the more rigorous approach, runtime be damned). Farris-style vitriolic footnoted arguments are simply unpersuasive. Sure (some of) it makes sense, but I want things demonstrated to be superior in practice.
No one can say that Farris doesn't write evocatively; here's an example published today: "... so that connecting phenetics with total evidence was much like calling Hitler a libertarian on the grounds that he ate tomatoes."

It's a wonder we don't see more Hitler-tomato analogies in the literature.
Thanks +Joseph Brown I was beginning to think I was crazy...

Also note, I never argued that NJ should never be used. Like you say, it does have its uses and can be justified for use on practical grounds (it's computationally cheap to calculate).

Having said that... (just one example of too many like this in the literature) researchers in 2012 onward have NO excuse for analyzing a dataset of a mere ~30 taxa, genetic data, with just NJ.
"Phylogenetic trees among a nontrivial number of input sequences are constructed using computational phylogenetics methods. Distance-matrix methods such as neighbor-joining or UPGMA, which calculate genetic distance from multiple sequence alignments, are simplest to implement, but do not invoke an evolutionary model." by Jonathan Eisen
" I don't like it because they use an explicitly phylogenetic method (neighbor joining, which is designed to infer phylogenetic trees and not to simply cluster entities by their similarity) to cluster entities that do not have a phylogenetic history. "

So, it is a phylogenetic method.
Who originally defined the phenetic/phylogenetic distinction? What criterion was used?
I'm not sure there ever was a phylogenetic/phenetic distinction. I think that emerged out of the telephone game from the phenetic/cladistic distinction, along with other phylogenetic legends.
Think of it as more of a character vs. distance distinction. NJ (and other distance methods) boil down an entire alignment into pairwise distances, and cluster taxa according to those distances. This can also be said to cluster according to overall similarity i.e. phenetics. "Phylogenetic" methods work with the individual characters, and model where in the tree individual character transitions occur. NJ doesn't do this, and so is considered phenetic.
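To make the character vs. distance point concrete, here is a minimal sketch of the first thing a distance method does to an alignment. The toy sequences and the function names are made up for illustration, and the uncorrected p-distance is just one assumed choice of distance.

```python
from itertools import combinations

# Hypothetical toy alignment (invented sequences, all the same length).
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACGTTC",
    "D": "TCGAACGATC",
}

def p_distance(s1, s2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(s1, s2) if a != b)
    return diffs / len(s1)

def distance_matrix(aln):
    """Boil the whole alignment down to one number per pair of taxa."""
    names = sorted(aln)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        dist[(x, y)] = dist[(y, x)] = p_distance(aln[x], aln[y])
    return dist

dist = distance_matrix(alignment)

# Reordering the columns changes which sites carry the differences,
# but the pairwise distances cannot tell the two alignments apart.
reordered = {name: seq[::-1] for name, seq in alignment.items()}
same_matrix = distance_matrix(reordered) == dist
```

Everything downstream (NJ, UPGMA, and so on) sees only `dist`; the individual characters, and any notion of where on the tree each one changed, are no longer available.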
I think there is some diversity in the use of "phylogenetic" out there but here are my thoughts on some of the terminology. Phylogenetics is really the study of the relationships among organisms (or genes, genomes, or other entities). And phylogenetic methods are methods for inferring phylogeny.

Phylogenetic methods come in many flavors. Some people divide them into two classes, as Joseph Brown did above: distance-based methods and discrete data methods (see for example ). Other people divide methods into distance, parsimony and likelihood categories, or into distance, parsimony, likelihood and Bayesian categories, in essence treating parsimony-based methods as distinct from likelihood/Bayesian methods even though both analyze discrete data/characters.

In regard to phenetics - phenetics as far as I am aware has been used to describe methods that grouped organisms by their similarity to each other and generally ignored evolutionary history. To group organisms by their similarity one generally uses distance matrix methods so there is some overlap between distance matrix phylogenetic methods and phenetic methods.

There is some disagreement as to which distance-based methods should be called phenetic and just what neighbor-joining is actually doing (e.g., see ). In essence, NJ attempts to infer a phylogenetic tree from a distance matrix by minimizing the total branch length in the tree, in effect assuming that the distances are additive. It is certainly true that one could feed ANY distance matrix into NJ. But given its methodology, I think it is only really suitable for distances that are the result of a bifurcating evolutionary history. I have yet to see an example of a case where other types of data can reasonably be analyzed using NJ. In addition, because of how NJ works, it is not the standard phenetic approach of grouping organisms by similarity per se: NJ allows rates of evolution to vary between taxa, and thus organisms could be monophyletic yet be more similar to things outside of their clade.
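A small sketch of that last point, under stated assumptions: the four-taxon tree and its branch lengths below are invented, and only the first joining step of NJ is shown, using the usual Q criterion, Q(i,j) = (n-2)d(i,j) - r_i - r_j, where r_i is the sum of distances from taxon i to all others.

```python
from itertools import combinations

# Hypothetical additive distances read off the unrooted tree
# ((A:5,B:1):1,(C:1,D:5)). A and D sit on long branches (fast rates),
# B and C on short ones, so B and C are the most similar pair even
# though they are not neighbours on the tree.
names = ["A", "B", "C", "D"]
d = {
    frozenset("AB"): 6, frozenset("CD"): 6,
    frozenset("AC"): 7, frozenset("BD"): 7,
    frozenset("AD"): 11, frozenset("BC"): 3,
}

def dist(x, y):
    return 0 if x == y else d[frozenset((x, y))]

def row_sum(x):
    """Total distance from x to every other taxon."""
    return sum(dist(x, k) for k in names)

def q(x, y):
    """NJ selection criterion; the pair with the smallest value is joined."""
    n = len(names)
    return (n - 2) * dist(x, y) - row_sum(x) - row_sum(y)

pairs = list(combinations(names, 2))
most_similar = min(pairs, key=lambda p: dist(*p))   # what pure similarity picks
nj_first_join = min(pairs, key=lambda p: q(*p))     # what NJ picks first
```

Here the smallest raw distance belongs to B and C, but the Q criterion favours the true neighbours (A,B) and (C,D), which tie. That is the sense in which NJ is not simply grouping by overall similarity.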

Just as distance methods can be used for non evolutionary purposes (e.g., standard clustering) so too can discrete character methods. For example, parsimony analysis can be applied to any data matrix. And one can infer "changes" between states even for objects that are not homologous and share no common ancestry. This does not mean parsimony methods SHOULD be used in such cases, but they can. And similarly, just because one can use a distance based phylogenetic method to analyze data that does not have a phylogenetic history, this does not mean one should. The issue in both cases is whether the model/algorithm is appropriate for the type of data. Since NJ in essence assumes additive distances it does not seem valid for most cases except phylogenetic history (note - I am not saying it is ideal for phylogenetic history and in fact I do not use it anymore) but that is not the point.

It does not matter what we call the methods - phenetic or phylogenetic. What matters is the nuts and bolts of how they work. And NJ seems like a bad idea for clustering most objects.
On the note of people choosing odd datasets to apply phylogenetic analyses to, I can think of two good examples: one paper did a cladogram of mathematical models for mollusc shell growth and another study did a cladogram of plankton bloom qualities. Similarly, there was one paper I know which did a theoretical morphospace of theoretical morphospaces...
And NJ in a general sense is not a strictly phylogenetic approach because, as Jonathan pointed out, NJ can be applied to any type of similarity or dissimilarity data. NJ in a stricter sense, as applied in phylogenetic analysis where data assume evolutionary models, is a phylogenetic method. It isn't correct to then classify it as phenetic, yet general NJ can indeed be applied to similarity measures for phenetic analysis.
+Ross Mounce IMHO if someone taught you that NJ was phenetic that person needs a good slap (it's not the 1970s anymore). The phenetic/phylogenetic distinction had nothing to do with methods per se. The phenetics/phylogenetic debate was partly about goals (do we represent similarity or do we infer evolutionary history?) and partly about personalities (there's a series of papers from the 80s that is basically a public fight between the thesis committee and a PhD student - good times).

The mainstay of "phenetics", UPGMA, is a perfectly good "phylogenetic" method under some situations (e.g., a molecular clock). "Phylogenetic" methods such as parsimony demonstrably fail to reconstruct phylogenies under some situations.
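As a rough illustration of the UPGMA point (a sketch only: both distance matrices are invented, and `upgma_clusters` is a bare-bones helper written for this example, not any package's implementation), the same algorithm recovers the true groups when distances are clock-like and misses them when rates vary:

```python
from itertools import combinations

def upgma_clusters(names, d):
    """Return the clusters UPGMA builds, in order, as frozensets of leaves.
    d maps frozenset({x, y}) to the distance between leaves x and y."""
    clusters = [frozenset([n]) for n in names]
    built = []

    def avg(c1, c2):
        # UPGMA distance: plain average over all leaf pairs across clusters.
        total = sum(d[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        _, i, j = min(
            (avg(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        merged = clusters[i] | clusters[j]
        built.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return built

names = ["A", "B", "C", "D"]

# Hypothetical clock-like distances from ((A:1,B:1):2,(C:1,D:1):2).
clock = {frozenset("AB"): 2, frozenset("CD"): 2, frozenset("AC"): 6,
         frozenset("AD"): 6, frozenset("BC"): 6, frozenset("BD"): 6}

# Hypothetical unequal-rate distances from ((A:5,B:1):1,(C:1,D:5)).
no_clock = {frozenset("AB"): 6, frozenset("CD"): 6, frozenset("AC"): 7,
            frozenset("BD"): 7, frozenset("AD"): 11, frozenset("BC"): 3}

clock_result = upgma_clusters(names, clock)
no_clock_result = upgma_clusters(names, no_clock)
```

With the clock-like matrix UPGMA builds {A,B} and {C,D}; with the unequal-rate matrix it starts by joining the two slowly evolving taxa, B and C, and never recovers {A,B}.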

+Peter Roopnarine I don't agree that because a method can be applied outside phylogenetics it isn't phylogenetic. We can define any method algorithmically and apply it to any data we want (as +David Bapst points out).

Can we please lay the phenetics/phylogenetic debate to rest? It's a lazy way to characterise methods of tree building.
But we do need ways in which to characterise methods of dendrogram building, right?
As such, 'phenetic' is potentially a useful and established label for those methods which approach the matter mechanistically from a similarity-based POV.

One should not willy-nilly call methods one does not like 'phenetic' (as does happen in the literature) - I certainly don't support or condone this usage. But if it (phenetic) can be quantitatively defined as 'methods based on maximizing overall similarity, with no reference to homology-based shared evolutionary-history', e.g. similarity between neighbor distances, then I think such a label would be useful for systematists to distinguish between the many tree producing methods that are out there.

There are other labels as well which can be useful to discriminate between tree building / tree producing methods.

+Roderic D. M. Page Are you trying to say phenetic is a redundant/useless term? Has its meaning been eroded, or can no-one agree what it means?
+Ross Mounce I don't like invoking authority but +Joe Felsenstein wrote in "Inferring Phylogenies"

"Making this distinction ... implies that something fundamental is missing from the 'phenetic' methods, that they are ignoring information that the 'cladistic' methods do not. In fact, both methods can be considered to be statistical methods, making their estimates in slightly different ways ... In this book we will give the terms 'cladistic' and 'phenetic' a rest and consider all approaches as methods of statistical inference of the phylogeny."

The quote is taken from by David Williams and Malte Ebach (the comments are worth a read).

I guess I see "phenetics" vs. "phylogenetic" as sloganeering that doesn't provide any insights into the properties of the methods being so labelled. Your phrase "homology-based shared evolutionary history" sounds rather loaded (and old skool). I'm not convinced that we need be so restrictive (see, for example, ).
With all due respect, I'm not sure it's any better to dismiss a statement as loaded, without explaining why, and brand it "old skool" as though that has any scientific merit. It's actually rather condescending.

I'm not sure I agree with Felsenstein. Many would argue that there is, in fact, something missing from distance-based methods, if not from the results. Some people seem to be assuming that all systematists are as preoccupied with the tree topologies as they are. Some of us are not just interested in the monophyletic relationships of species (or "OTUs") but also in the monophyletic relationships of characters (homologies). The fact that all these methods produce a tree and can, under certain circumstances or assumptions, be interpreted as a phylogeny does not mean that all deliver the same content.

We could argue the value of this or whatever. And that may be productive. However, I would worry about the health of systematics if we grew to dismiss concepts, approaches, or programs of research simply because some people have deemed them to be no longer fashionable. After all, fashions may reflect the hive-minds of funding bodies and bureaucrats more than scientific value...
+Martin Brazeau I didn't mean to be condescending, my tongue was firmly in my cheek. There are certain phrases that carry baggage, and seem to me to stake out positions rather than explore the issues. "Phenetics" is one such phrase, "homology-based" is another. Maybe it's just that I'm sensitised to this having lived through the later stages of the cladistic wars.

Yes, distance methods in general lose information (see for an elegant statement of this) and if you want to explore characters then you want a method that retains the characters as distinct entities (we could argue, for fun, what methods for mapping characters on trees are best, and whether the method/data used to get the tree is necessarily the one you'd want to use for the mapping).

I was not trying to dismiss characters as unfashionable, rather I was suggesting that a split between "phenetics" and "homology-based shared evolutionary history" might not exhaust the possibilities. A lot of discussion of phylogenetic methods gets trapped in sterile dichotomies. There are more productive ways to carve up the phylogenetic landscape (see for example).
No offence taken Rod :)

After reading the relevant passage I wonder if this

"phenetics" vs. "phylogenetics" doesn't matter thinking

is itself an inherently political school of thought made popular by an influential textbook? Felsenstein (Inferring Phylogenies, p145) writes:

I have consequently announced that I have founded the fourth great school of classification, the It-Doesn't-Matter-Very-Much school. Actually systematists "voted with their feet" to establish this school, long before I announced its existence

I appreciate that the quote was perhaps intended as humorous but much of p145 to 146 'The irrelevance of classification' doesn't really convince me.

If we are to consider all approaches as methods of statistical inference of the phylogeny as Felsenstein tries to persuade us to, then all methods applied to a matrix of numbers can be used to infer phylogeny.

So if I apply addition or some other arbitrary mathematical operation to all the character states for a row in a matrix of numbers, and connect these rows based upon the summed pairwise difference between rows - does this give me a phylogeny of the rows? I think not.

That is why I think we do need to distinguish between operations that can be applied to matrices of numbers, and quantitative phylogenetic methods that consider homology and biological evolution in their workings. And thus, by analysing how NJ operates, we can decide whether it is phenetic or phylogenetic.

Although you're welcome to disagree ;)
+Ross Mounce Yes, of course +Joe Felsenstein was being "political" in the sense that he was arguing that phylogenetics is a statistical problem and we could view all methods from that point of view (e.g., are they statistically consistent, what is their error rate, robustness, etc.). In the same way, we could view all phylogenetics methods as computational problems, and look at them in terms of complexity (NP-completeness, etc.).

I agree that in general considering what we know about evolution may help develop better tree building methods. But we shouldn't be surprised if a method we develop in this way turns out to be formally equivalent to something that was developed with no notion of evolution. Or that tools from outside evolutionary biology can be adopted and interpreted in evolutionary terms (this, after all, is why mathematics is so powerful, it cares not what the actual task is). We can formulate maximum parsimony as an optimisation problem without any reference to evolution.

The point, I think, of the "irrelevance of classification" is that as much as biologists say classification matters, given the choice between a classification and a phylogeny we'd pick a phylogeny every time. If you're going to use a phylogeny as the basis of a classification, you end up arguing about where and how many times to cut the branches, and as fun as that is it's not really getting you anywhere. But the phylogeny matters (you can do science with it). I suspect Joe was simply looking at what systematists do, not what they say they do.
"So if I apply addition or some other arbitrary mathematical operation to all the character states for a row in a matrix of numbers, and connect these rows based upon the summed pairwise difference between rows - does this give me a phylogeny of the rows? I think not."

But one could imagine a data matrix generated in such a way that this does actually give you the true branching topology. Therein lies the crux of the matter!

As this conversation seems to be dying down, I would like to note that I once sat in an Intro Bio class where a young faculty member introduced the concept of 'cladistics' and then referred to 'cluster-method cladistics', 'parsimony cladistics', 'likelihood cladistics' and 'Bayesian cladistics'. Furthermore, she made no distinction to the undergraduates that one was preferred above the others. I didn't ask, but I don't think she knew there was any reason to think these were odd combinations of terms. Truly, the memory of the cladistics war is fading...
Rod: thanks for your explanation. Much appreciated, and indeed no offense taken—sorry for the kneejerk reaction! Thanks also for the interesting links.
David Bapst writes: "Truly, the memory of the cladistics war is fading..."
I dunno. Pick up a copy of Cladistics lately?
Some things will never change. The journal of the Hennig Society is one.
One sure sign that the memory of the cladistics wars is indeed fading is that this sort of discussion tends to come up again. It is worth rereading the reviews that appeared in Syst. Zool. after the 1966 translation of Hennig.
Cladistics won out over phenetics simply because it held that one character with two states would support one group, not two. This allowed for a link between the tree, its character support and a reasonable hypothesis of what happens during evolution: new character states arise. I think that still stands: a tree is cladistic to the extent that it deals with character support in an evolutionary context. Many of the disputes are about tree-building algorithms - interesting for the developer of programs, but fundamentally beside the point.
Old skool politics aside, I feel the distinction still has utility, foremost of which is that it contrasts the very nuts and bolts that Jonathan brought up (and which I agree is the most important matter).

The first distinction is how the data are analyzed. I don't think anyone would disagree that the analysis of distances vs. discrete character states is a fundamentally (and importantly) different take. The difference is akin to that between unpaired and paired t-tests: same data, but treated differently. Students and users need to know, for example, how ML with HKY differs from NJ with HKY.

The second distinction is more abstract. Mike Steel (and others) have shown a link between ML (via NCM) and parsimony, such that if you add a ridiculous (i.e. unparsimonious) number of parameters to an ML model, you arrive at parsimony. Michael Sanderson talks about a continuum between parsimony and ML. Now, while that link doesn't always hold, at some level ML and parsimony are trying to do the same thing. It may be argued that the ends of the continuum are so disparate that ML and parsimony are doing very different things (and, of course, I'd agree), but I am unaware of anyone showing that NJ lies on that continuum at all. Is this a useful distinction?

I agree the terminology involved is murky. "Phenetic" in particular is slippery. As Jonathan mentioned, it has been used to indicate overall similarity while ignoring evolutionary (homology) information. DNA-DNA hybridization distances would therefore be phenetic. What about NJ on a DNA alignment? The data are not phenetic, but the individual distances are overall pairwise (dis)similarities (perhaps accounting for multiple hits or gamma-distributed rate heterogeneity). The fact that the distances are used to reconstruct a tree does not in itself make the distances "incorporate evolutionary history". So what is it? I suppose it would be the model i.e. the possession of tree-wide parameters (e.g. substitution parameters, gamma, equilibrium nucleotide frequencies), and the give-and-take between these parameters and the inferred distances. I have always used "distance" and "phenetic" interchangeably, but I concede that that was incorrect (although I think some cladists would disagree). I suppose "phenetic" should be restricted to those data (like DNA-DNA hybridization) that are not explicitly modelled and are instead constant regardless of the tree configuration considered. But does anyone use these kinds of data anymore?
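For readers unsure what "accounting for multiple hits" looks like in practice, one standard option is the Jukes-Cantor (JC69) correction, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. The sketch below is only an illustration; the function name and the example values of p are made up.

```python
import math

def jc69_distance(p):
    """Jukes-Cantor corrected distance (expected substitutions per site)
    from an observed proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)

# Hypothetical observed p-distances for three pairs of sequences.
observed = [0.1, 0.3, 0.6]
corrected = [jc69_distance(p) for p in observed]

# The correction grows with divergence, because more of the
# substitutions are hidden by repeated hits at the same site.
gaps = [c - p for c, p in zip(corrected, observed)]
```

Under this simple model each pairwise distance is corrected on its own, before any tree is considered, so the result is still a single (dis)similarity number per pair.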

In discrete character-based analyses, individual character state transitions are modelled, whether implicitly or explicitly. I think it is useful to distinguish these types of methods from those that do not (i.e. distance methods). I can see how calling the former methods "phylogenetic" seems needlessly confusing (e.g. "phylogenetic" phylogenetic methods vs. "distance-based" phylogenetic methods). Despite the tenuous link stated above, ML and parsimony are (arguably) as different from one another as each is from distance methods. It seems that "character-based" vs. "distance-based" should suffice to describe this distinction.

Erg. This all seemed so clear a moment ago...
Why, it's almost as though we group ML and parsimony together based on a shared unique property, in spite of how different they are overall... ;)