Why biologists need to be phylogenetically challenged




















To assess the behavior of Pagel's test under Darwin's scenario, we used Mesquite (Maddison and Maddison) to simulate a set of birth-death trees, each with a fixed number of species. Mesquite was then used to traverse each tree and, at the first clade of 40 to 60 species found, to set the state of the character to 1 inside the clade and 0 outside it. One such pair of characters was generated for each of the trees. We then used the diversitree package (FitzJohn) in R (R Core Team) to fit models by maximum likelihood in which each character evolved independently, or in which the rates of evolution of each character depended on the other character.
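The character construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the random joining procedure in `random_tree` is a stand-in for the birth-death simulation (it is not what Mesquite does), the nested-tuple tree format and all function names are hypothetical, and the two characters of a pair are assumed to share the same distribution, as Darwin's scenario implies.

```python
import random

def random_tree(n_tips, rng):
    """Join two randomly chosen lineages until one tree remains.

    Stand-in for a birth-death simulation (an assumption of this sketch).
    Tips are strings; internal nodes are 2-tuples of subtrees.
    """
    lineages = [f"t{i}" for i in range(n_tips)]
    while len(lineages) > 1:
        a = lineages.pop(rng.randrange(len(lineages)))
        b = lineages.pop(rng.randrange(len(lineages)))
        lineages.append((a, b))
    return lineages[0]

def tips(node):
    """Return the tip labels below a node, left to right."""
    if isinstance(node, str):
        return [node]
    return tips(node[0]) + tips(node[1])

def first_clade_in_range(node, lo=40, hi=60):
    """Preorder search for the first clade with lo..hi tips; None if absent."""
    if isinstance(node, str):
        return None
    if lo <= len(tips(node)) <= hi:
        return node
    for child in node:
        found = first_clade_in_range(child, lo, hi)
        if found is not None:
            return found
    return None

def darwin_scenario(tree, lo=40, hi=60):
    """Return characters (x, y): state 1 inside the marked clade, 0 outside."""
    clade = first_clade_in_range(tree, lo, hi)
    if clade is None:
        return None  # no clade of suitable size on this tree
    inside = set(tips(clade))
    x = {t: int(t in inside) for t in tips(tree)}
    y = dict(x)  # the second character changes on the same single branch
    return x, y
```

For example, `darwin_scenario(random_tree(200, random.Random(1)))` returns either `None` or a pair of identical tip-state dictionaries in which 40 to 60 tips carry state 1.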

Pagel's test applied to simulated cases like Fig. Pagel's method also indicates correlation in the unreplicated burst scenario. To test this, we used the same birth-death trees. For the X variable, we used the character generated as described above to mimic Figure 1c.

For the Y variable, we used Mesquite (Maddison and Maddison) to simulate a binary character evolving on the tree at a reasonably high rate. After this Y character's states were evolved, its states outside the clade marked by X were set to 0. Although this is not directly an evolutionary simulation, it is nearly the same as a model in which the Y variable starts at the root with state 0 and a very low rate, and then suddenly increases its rate of change in the marked clade. The only difference is that in that model the marked clade would have started with Y in state 0, whereas in our construction it might not.
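A rough Python sketch of this construction follows. It makes several simplifying assumptions that are not in the text: the tree has no branch lengths, so each branch flips the state with a fixed probability `flip_prob` (a discrete stand-in for a continuous-time model), the root starts in state 0, and the names `evolve_binary` and `unreplicated_burst` are hypothetical.

```python
import random

def tips(node):
    """Return the tip labels below a node (tips are strings)."""
    if isinstance(node, str):
        return [node]
    return tips(node[0]) + tips(node[1])

def evolve_binary(node, state, flip_prob, rng, out):
    """Pass a binary state down the tree, flipping on each branch with flip_prob."""
    if isinstance(node, str):
        out[node] = state
        return
    for child in node:
        child_state = 1 - state if rng.random() < flip_prob else state
        evolve_binary(child, child_state, flip_prob, rng, out)

def unreplicated_burst(tree, clade, flip_prob=0.3, seed=0):
    """Return (x, y): x marks the clade; y varies inside it and is 0 outside."""
    rng = random.Random(seed)
    y = {}
    evolve_binary(tree, 0, flip_prob, rng, y)  # assumed root state: 0
    inside = set(tips(clade))
    x = {t: int(t in inside) for t in y}
    # Mask step from the text: states of Y outside the marked clade are set to 0.
    y = {t: (s if t in inside else 0) for t, s in y.items()}
    return x, y
```

As in the text, the masking step means the marked clade need not begin with Y in state 0, because the state at the base of the clade is whatever the simulation produced there.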

Thus, Pagel's test is susceptible to yielding significant results from the effects of a single change in one of the characters, in both Darwin's scenario and the unreplicated burst.

Maddison , p. It will also give a significant result with Darwin's scenario, if the test is adjusted to focus on a single branch. The correlation test of Huelsenbeck et al. This is interpreted by the method of Huelsenbeck et al.

The methods of Maddison, Pagel, and Huelsenbeck et al. Pseudoreplication is still present, and coincidental correlations can be mistaken for interesting ones. Our first response to these results is to defend the methods. They are, as mathematical constructs, merely operating under the assumptions they are built around, doing what they were designed to do.

Thus, one might suggest that Pagel's test finds a significant correlation in Darwin's scenario because its assumptions are violated. If a character's distribution is inconsistent with the model of evolution assumed by a correlation test (for instance, a continuous character with jumps violating the Brownian motion assumption of Felsenstein's test), then we could simply judge the test inapplicable to the character, and not use it.

Perhaps character distributions like those of X and Y in Figure 1c,d cannot be reconciled with Pagel's model of character evolution, although this has not been demonstrated. Pagel's test would not be at fault; it would simply be inapplicable. However, even if the test could be excused for the violation of its assumptions, that does not yield a solution. We need to be able to detect when the test's model is violated, and to devise a better test to handle more appropriate models. Although all models are inaccurate to varying degrees, we suspect that most biologists do not realize that this method and others fail to handle an iconic example of the need for phylogenetic corrections.

A second possible defence is that these tests are in fact correct to assign significance to Darwin's scenario. After all, if we imagine choosing two characters at random, it is very unlikely that their synapomorphies would fall on the same branch.

This is a superficially compelling argument. One could scan genetic variation across many species' genomes and find genes with significantly concordant changes, as long as one corrected for multiple comparisons of the many genes. This, however, is not usually what is done in comparative analyses.

Usually, only a few characters are considered in studies of correlation. While some may have been selected on purely functional grounds without prior knowledge of their distributions, others may have been selected precisely because we notice a trait characterizing a clade and wonder what effect it had on its bearers. That is, we may gravitate toward studying traits that we know a priori characterize clades of special interest. If this is the case, we cannot be surprised to see concordant distributions.

Thus, our mistakenly significant results may stem partly from ascertainment bias. When we react to Darwin's scenario as poor evidence for correlation, at least part of our negative judgment may come from an expectation that clade-biased character choice is possible or likely. Of course, if non-random choice of characters on which to focus (and, for that matter, of clades to study) is rampant in evolutionary biology, then correlation tests will not be the only inferences that suffer.

Our very concept of phylogeny, of change and inheritance along lineages, predicts that there should be many traits throughout the genome that evolved along an ancestral lineage and were inherited by its descendants. Common descent generates many sets of co-distributed characters that would be unlikely if species evolved independently. Many of these would be sufficiently closely co-distributed with fur and middle ear bones (for example, milk or enucleate erythrocytes) to be alternative explanations for any purported correlation.

This is not just an effect of genomes being so big that there will, by chance, be variables with concordant distributions. Because of inheritance along lineages, there are likely hundreds or thousands of other traits with distributions that could equally be claimed as correlated, even if we have not yet discovered them.

How can we pin our functional or adaptive story on just one of these traits? Do we just hope that other biologists do not discover the other traits before we publish? The problem with accepting Darwin's scenario or the unreplicated burst as implying correlated evolution is that it would seem to open the door to innumerable papers claiming interesting correlations, significant by comparative methods, between pairs of synapomorphies for the same clade (Darwin's scenario) or between one synapomorphy and a character with a locally high rate of change (unreplicated burst).

One might argue that most of these papers would be rejected for lack of a plausible mechanism between the arbitrarily chosen characters. Indeed, we have used examples such as milk and fur, or enucleate erythrocytes and middle ear bones, precisely because they seem ridiculous: a causal link seems implausible.

However, the imaginations of scientists are good, and there are many characters to choose from, so many could pass the test of plausibility. In this article we are concerned only with what support is given by the comparative patterns. Another explanation for the failure of Maddison's and Pagel's tests is that they mistakenly treat adjacent branches or adjacent infinitesimally small sections of lineages as independent, when in fact they can share common factors.

Indeed, this assumption of independence is at the heart of the Markov process that model-based approaches use. This criticism was first raised by Read and Nee and by Grafen and Ridley. In essence, our methods should be careful to count separate origins as independent, and recognize that the homologous instances along a branch are pseudoreplicates.

Consider Pagel's method applied to the unreplicated burst Fig. To assess likelihoods, the method sums probabilities for possible scenarios of ancestral states and parameters. As pointed out by Grafen and Ridley, if the different contexts of state 1 in X are homologous, then they represent the same instances of state 1, and there is not as much evidence as there appears to be for a correlation.
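The summation over ancestral states can be illustrated with a small Python sketch of the pruning calculation for a two-state Markov model. It is deliberately simplified and rests on assumptions not taken from the text: it handles a single binary character (Pagel's test works with the four combined states of two characters), it uses a flat prior on the root state, and the tree format `(left, left_length, right, right_length)` and all function names are hypothetical.

```python
import math

def transition(t, q01, q10):
    """Two-state Markov transition matrix P[i][j] over a branch of length t."""
    total = q01 + q10
    decay = math.exp(-total * t)
    return [
        [(q10 + q01 * decay) / total, q01 * (1 - decay) / total],
        [q10 * (1 - decay) / total, (q01 + q10 * decay) / total],
    ]

def partial(node, states, q01, q10):
    """Likelihood of the data below a node, conditional on each node state."""
    if isinstance(node, str):
        return [1.0 if states[node] == s else 0.0 for s in (0, 1)]
    left, t_left, right, t_right = node
    lik_l = partial(left, states, q01, q10)
    lik_r = partial(right, states, q01, q10)
    p_l = transition(t_left, q01, q10)
    p_r = transition(t_right, q01, q10)
    # For each state at this node, sum over the states of both descendants.
    # Each branch contributes an independent factor: the Markov assumption.
    return [
        sum(p_l[s][a] * lik_l[a] for a in (0, 1))
        * sum(p_r[s][b] * lik_r[b] for b in (0, 1))
        for s in (0, 1)
    ]

def likelihood(tree, states, q01, q10, root_prior=(0.5, 0.5)):
    """Total likelihood: sum over root states of prior times partial likelihood."""
    p = partial(tree, states, q01, q10)
    return sum(root_prior[s] * p[s] for s in (0, 1))
```

Because every branch below the marked clade enters as a separate multiplicative factor, many tips sharing one homologous origin of a state are treated as many independent pieces of evidence, which is the concern raised in the paragraph above.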

However, the pseudoreplication explanation makes no appeal to non-random character choice, and so we suspect that this effect would remain even if the problem of ascertainment bias were solved. Indeed, we expect within-clade pseudoreplication would remain even if the stochastic model underlying Pagel's test were adjusted to be consistent with characters evolving as X in Fig. To solve the problem of pseudoreplication of lineage-specific factors, it would seem that a rather different approach to modeling is needed.

When we discuss these issues with colleagues, we tend to get three alternative responses. Some share our concern that biologists may be seriously overestimating evidence for correlation, others consider within-clade pseudoreplication to be a minor problem and easily fixed by vigilance, while others are unconcerned, holding that significant results in Figure 1c,d do indicate biologically meaningful correlations.

But, can we easily guard against over-interpretation simply by being vigilant, filtering and discarding the obvious problem cases? Surely, a good biologist should be aware enough of the dangers of Darwin's scenario and the unreplicated burst that they would not claim a significant result based on such patterns. A good biologist could, in addition to doing a statistical test, simply inspect the data to see how many origins are contributing to the pattern. If there is only a single origin, then the result should be discounted.

If there are sufficiently many separate origins, the result should be accepted. However, even if we knew exactly how many origins there were, what would be our rule for decision? Figure 3 shows four more scenarios that highlight why we cannot get an acceptable procedure simply by adding vigilance to current methods.
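Counting the changes behind a pattern can be sketched as follows. This is only a rough proxy under stated assumptions: it uses Fitch parsimony on an unordered binary character, so it returns the minimum number of state changes (gains and losses together) rather than origins as such, and the nested-tuple tree format and the function name `fitch_changes` are hypothetical. It supplies a count, not a rule for deciding how many origins are enough.

```python
def fitch_changes(node, states):
    """Fitch up-pass: return (possible state set, minimum number of changes)."""
    if isinstance(node, str):
        return {states[node]}, 0
    left_set, left_changes = fitch_changes(node[0], states)
    right_set, right_changes = fitch_changes(node[1], states)
    common = left_set & right_set
    if common:
        # Descendants can agree on a state, so no extra change is needed here.
        return common, left_changes + right_changes
    # Descendants disagree: at least one change on a branch below this node.
    return left_set | right_set, left_changes + right_changes + 1

# Darwin's scenario on a five-tip tree: the pattern rests on a single change.
tree = ((("a", "b"), "c"), ("d", "e"))
clade_marked = {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0}
scattered = {"a": 1, "b": 0, "c": 0, "d": 1, "e": 0}
print(fitch_changes(tree, clade_marked)[1])
print(fitch_changes(tree, scattered)[1])
```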

Are two origins enough for significance, and if so, why? What if the change is not precisely concordant Fig. Does that tip the balance to insignificance? And for replicated bursts, what if the independent origins of X are clustered on the tree Fig. Are the origins of X far enough apart in Figure 3d to rule out this alternative explanation? In each of the scenarios of Figure 3, we expect that tests like Pagel's or Maddison's would give strong support to a statistical association, but we also know that the result is contaminated, to an unknown degree, by the problems of unreplicated effects within each of the clades.

Neither the tests nor our intuition tell us how serious these problems are, and thus whether to consider any correlation significant. As the number of independent origins rises, these problems would diminish, but how fast we do not know. We have no quantitative correction to apply to these methods, nor even a clear intuition as to whether there is sufficient replication, except in cases that are so clear that statistics are not needed to convince.

It appears unlikely that a satisfactory approach could be devised by adding to existing tests a step of counting origins. It would be far better to have all the evidence, including from the number of origins, summarized in a single well-defined quantitative method. The problems we outline go beyond tests of correlation between characters, and are suffered by other likelihood-based comparative approaches as well.

Maddison et al. A significant result can arise from a single clade, and thus BiSSE can conclude only that the diversification rates depend on that character or on any other character that might be co-distributed with it (Maddison et al.).

FitzJohn found that a significant relationship between body size and speciation rate in primates inferred with the QuaSSE method could just as easily be explained by an unreplicated change in speciation rate attributable to the Old World monkeys. Thus, in our effort to develop methods to study diversification that use more of the information in the tree than was used by the older methods using sister-clade comparisons, we have lost the requirement for replication. This susceptibility of QuaSSE to unreplicated patterns shows that our concerns are not restricted to categorical data.

Could tests for correlations between continuous variables be susceptible as well? Felsenstein's method of phylogenetically independent contrasts can be misled by a single extraordinary event, but this is best considered a violation of the Brownian motion model, and can easily be detected by a contrast that stands as a distinctive outlier (Jones and Purvis). Otherwise, the method is expected to be immune to the problems outlined here, because a significant result requires many separate sister-clade comparisons that each show an evolutionary change in both characters; the method of independent contrasts intrinsically focuses on replication.
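A compact Python sketch of the contrast calculation shows why each contrast is a separate sister-clade comparison. It assumes a fully bifurcating tree with positive branch lengths, written in a hypothetical `(left, left_length, right, right_length)` format, and the function name `pic` is likewise an illustration rather than any package's API.

```python
import math

def pic(node, values, contrasts):
    """Compute standardized contrasts; return (node value, extra branch length)."""
    if isinstance(node, str):
        return values[node], 0.0
    left, v_l, right, v_r = node
    x_l, extra_l = pic(left, values, contrasts)
    x_r, extra_r = pic(right, values, contrasts)
    # Lengthen each branch to reflect uncertainty in the estimated node values.
    v_l += extra_l
    v_r += extra_r
    # One contrast per internal node: a single sister-clade comparison.
    contrasts.append((x_l - x_r) / math.sqrt(v_l + v_r))
    # Node value is the average of descendants, weighted by inverse branch length.
    value = (x_l / v_l + x_r / v_r) / (1 / v_l + 1 / v_r)
    return value, v_l * v_r / (v_l + v_r)

tree = (("a", 1.0, "b", 1.0), 1.0, "c", 2.0)
trait = {"a": 1.0, "b": 3.0, "c": 5.0}
contrasts = []
pic(tree, trait, contrasts)
print(contrasts)  # two contrasts, one per internal node
```

A single extraordinary event would show up as one contrast much larger in magnitude than the rest, which is the kind of outlier the paragraph above says can be inspected.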

Similarly, phylogenetic least squares e. However, we need to be aware that comparative methods detecting associations between continuous variables could in principle be susceptible.

If a method were designed to detect an association between the values in one continuous variable and the rate of change in a second continuous variable, then a single clade of high value could lead it to report an association. Indeed, we suspect that any comparative method that responds to the effect of a state, rather than the effect of a change, will be susceptible to within-clade pseudoreplication.

It might seem confusing that such different-looking trees can contain the same information. Here, it might be helpful to remember that the lines of a tree represent evolutionary lineages — and evolutionary lineages do not have any true position or shape. It is therefore equally valid to draw the branch leading to tip A as being on either the right or the left side of the split, as shown in Figure 7. Similarly, it doesn't matter whether branches are drawn as straight diagonal lines, are kinked to make a rectangular tree, or are curved to make a circular tree.

Think of lineages as flexible pipe cleaners rather than rigid rods; similarly, picture nodes as universal joints that can swivel rather than fixed welds. Using this sort of imagery, it becomes easier to see that the three trees in Figure 7, for example, are equivalent.

The basic rule is that if you can change one tree into another tree simply by twisting, rotating, or bending branches, without having to cut and reattach branches, then the two trees have the same topology and therefore depict the same evolutionary history.
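The rule can be made concrete with a short Python sketch. It assumes rooted trees written as nested tuples with unique tip labels (a hypothetical format chosen for illustration): if the order of branches at each node is ignored, two drawings that differ only by rotation reduce to the same object.

```python
def canonical(node):
    """Order-free form of a rooted tree: each node becomes a frozenset of children."""
    if isinstance(node, str):
        return node
    return frozenset(canonical(child) for child in node)

def same_topology(tree_a, tree_b):
    """True if one tree can be turned into the other by rotating branches."""
    return canonical(tree_a) == canonical(tree_b)

# Rotating branches changes the drawing but not the relationships.
print(same_topology((("A", "B"), "C"), ("C", ("B", "A"))))   # True
# Moving B to a different position requires cutting and reattaching a branch.
print(same_topology((("A", "B"), "C"), (("A", "C"), "B")))   # False
```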

Finally, it's important to note that in some instances, rectangular phylogenetic trees are drawn so that branch lengths are meaningful. These trees are often called phylograms, and they generally depict either the amount of evolution occurring in a particular gene sequence or the estimated duration of branches.

Usually, the context of such trees makes it clear that the branch lengths have meaning. However, when this is not the case, it is important to avoid reading in any temporal information that is not shown. For example, Figure 8 may appear to suggest that the node marking the last split leading to tips A and B (marked x) occurred after the node separating tip C from tips D and E (marked y). However, this should not be read into the tree; in reality, node x could have occurred either before or after node y.

Given the increasing use of phylogenies across the biological sciences, it is now essential that biology students learn what tree diagrams do and do not communicate. Developing "tree thinking" skills also has other benefits. Most importantly, trees provide an efficient structure for organizing knowledge of biodiversity and allow one to develop an accurate, nonprogressive conception of the totality of evolutionary history.

It is therefore important for all aspiring biologists to develop the skills and knowledge needed to understand phylogenetic trees and their place in modern evolutionary theory.

Figure 8: Trees contain information on the relative timing of nodes only when the nodes are on the same path from the root. In this tree, nodes x and y are not on the same path, so we cannot tell whether the ancestral organisms in node x lived before or after those in node y.
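The point about relative timing can be checked mechanically. In the Python sketch below, a node is identified by the set of tips descended from it, and two nodes lie on the same path from the root only when one tip set contains the other. The nested-tuple tree format, the function names, and the example tree (chosen to match the description of Figure 8 in the text) are assumptions made for illustration.

```python
def tips(node):
    """Return the tip labels below a node (tips are strings)."""
    if isinstance(node, str):
        return [node]
    return tips(node[0]) + tips(node[1])

def clades(node):
    """Return the tip set of every internal node in the tree."""
    if isinstance(node, str):
        return set()
    found = {frozenset(tips(node))}
    for child in node:
        found |= clades(child)
    return found

def relative_timing(tree, x_tips, y_tips):
    """Say what the topology alone implies about the order of nodes x and y."""
    x, y = frozenset(x_tips), frozenset(y_tips)
    known = clades(tree)
    if x not in known or y not in known:
        raise ValueError("each tip set must match an internal node of the tree")
    if x == y:
        return "same node"
    if x < y:
        return "x after y"   # x is a descendant of y
    if y < x:
        return "x before y"  # x is an ancestor of y
    return "undetermined"    # not on the same path from the root

tree = (("A", "B"), ("C", ("D", "E")))
print(relative_timing(tree, {"A", "B"}, {"C", "D", "E"}))  # undetermined
print(relative_timing(tree, {"D", "E"}, {"C", "D", "E"}))  # x after y
```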


Citation: Baum, D. Nature Education 1(1).

Phylogenies are a fundamental tool for organizing our knowledge of the biological diversity we observe on our planet.

But how exactly do we understand and use these devices?

A node represents a branching point from the ancestral population.

Terminals occur at the topmost part of each branch, and they are labeled by the taxa of the population represented by that branch.

Figure 4: A monophyletic group, sometimes called a clade, includes an ancestral taxon and all of its descendants. A monophyletic group can be separated from the root with a single cut, whereas a non-monophyletic group needs two or more cuts.

Figure 6: Types of phylogenetic trees. These trees depict equivalent relationships despite being different in appearance.

Figure 7: Relationships on a phylogenetic tree can be depicted in multiple ways. These trees depict equivalent relationships despite the fact that certain internal branches have been rotated so that the order of the tip labels is different.

Eastman, J. A novel comparative method for identifying shifts in the rate of character evolution on trees. Evolution 65 : — Elliot, M. Inferring ancestral states without assuming neutrality or gradualism using a stable model of continuous character evolution. BMC Evol. Felsenstein, J. Maximum-likelihood estimation of evolutionary trees from continuous characters. Genetics 25 : Phylogenies and the comparative method. A comparative method for both discrete and continuous characters using the threshold model.

FitzJohn, R. Diversitree: comparative phylogenetic analyses of diversification in R. Methods Ecol. Freckleton, R. Phylogenetic analysis and comparative data: a test and review of evidence. Futuyma, D. Wherefore and whither the naturalist? Galtier, N. Maximum-likelihood phylogenetic analysis under a covarion-like model. Garamszegi, L. Modern phylogenetic comparative methods and their application in evolutionary biology: concepts and practice. Heidelberg : Springer.

Garland, T. Procedures for the analysis of comparative data using phylogenetically independent contrasts. Gelman, A. Cambridge : Cambridge University Press.

Gould, S. Punctuated equilibria: the tempo and mode of evolution reconsidered. Paleobiology 3 : — Grafen, A. The phylogenetic regression. Hadfield, J. General quantitative genetic methods for comparative biology: phylogenies, taxonomies and multi-trait models for continuous and categorical characters.

Hansen, T. Stabilizing selection and the comparative analysis of adaptation. Evolution 51 : — Interpreting the evolutionary regression: the interplay between observational and biological errors in phylogenetic comparative studies. Assessing current adaptation and phylogenetic inertia as explanations of trait evolution: the need for controlled comparisons.

Evolution 59 : — A comparative method for studying adaptation to a randomly evolving environment. Evolution 62 : — Harvey, P. Why ecologists need to be phylogenetically challenged. Probabilistic graphical model representation in phylogenetics.

Revbayes: Bayesian phylogenetic inference using graphical models and an interactive model-specification language. Housworth, E. The phylogenetic mixed model. Ingram, T. Surface: detecting convergent evolution from comparative data by fitting Ornstein—Uhlenbeck models with stepwise Akaike information criterion. Jablonski, D. Approaches to macroevolution: 1. General concepts and origin of variation. Jones, K.

An optimum body size for mammals? Comparative evidence from bats. Karlin, S. A second course in stochastic processes. New York : Academic Press. Khabbazian, M. Fast and accurate detection of evolutionary shifts in Ornstein—Uhlenbeck models. Lande, R. Natural selection and random genetic drift in phenotypic evolution. Evolution 30 : — Landis, M. Pulsed evolution shaped modern vertebrate body sizes.

Lewis, P. A likelihood approach to estimating phylogeny from discrete morphological character data. Losos, J. Seeing the forest for the trees: the limitations of phylogenies in comparative biology: American Society of Naturalists Address. Lynch, M. Methods for the analysis of comparative data in evolutionary biology. Evolution 45 : — Mace, G.

Brain size and ecology in small mammals. Maddison, W. A method for testing the correlated evolution of two binary characters: are gains or losses concentrated on certain branches of a phylogenetic tree? Evolution 44 : — The unsolved challenge to phylogenetic correlation tests for categorical characters. McNab, B. Complications inherent in scaling the basal rate of metabolism in mammals. Standard energetics of phyllostomid bats: the inadequacies of phylogenetic-contrast analyses.

A Mol. Mooers, A. Inferring evolutionary process from phylogenetic tree shape. Nee, S. The relationship between abundance and body size in British birds. Nature : — Why phylogenies are necessary for comparative analysis. Phylogenies and the comparative method in animal behavior. Oxford : Oxford University Press. Evolutionary inferences from phylogenies: a review of methods. Testing for different rates of continuous trait evolution using likelihood.

Evolution 60 : — Pagel, M. Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Inferring the historical patterns of biological evolution. Nature : Pavlidis, P. A critical assessment of storytelling: gene ontology categories and the importance of validating genomic scans.

Pearl, J. Causal diagrams for empirical research. Biometrika 82 : — Pennell, M. An integrative view of phylogenetic comparative methods: connections to population genetics, community ecology, and paleobiology.

Model adequacy and the macroevolution of angiosperm functional traits. Penny, D. Mathematical elegance with biochemical realism: the covarion model of molecular evolution. Price, T. Correlated evolution and independent contrasts. Purvis, A. Comparative analysis by independent contrasts caic : an apple macintosh application for analysing comparative data. Bioinformatics 11 : — Rabosky, D. Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees. PLoS One 9 : e Model inadequacy and mistaken inferences of trait-dependent speciation.

Read, A. Inference from binary comparative data. Reitan, T. An unknown phanerozoic driver of brachiopod extinction rates unveiled by multivariate linear stochastic differential equations. Paleobiology 43 : — Revell, L. Phylogenetic signal and linear regression on species data.

Ridley, M. The explanation of organic diversity: the comparative method and adaptations for mating. Rohlf, F. Comparative methods for the analysis of continuous variables: geometric interpretations. Evolution 55 : — A comment on phylogenetic correction. Scales, J.

Adaptive evolution in locomotor performance: how selective pressures and functional relationships produce diversity. Evolution 70 : 48 — Running for your life or running for your dinner: what drives fiber-type evolution in lizard locomotor muscles? Schraiber, J. Sensitivity of quantitative traits to mutational effects and number of loci.

Shipley, B. Slater, G. Robust regression and posterior predictive simulation increase power to detect early bursts of trait evolution. Stadler, T. Mammalian phylogeny reveals recent diversification rate shifts.

Stearns, S. The influence of size and phylogeny on patterns of covariation among life-history traits in the mammals. Oikos 41 : — Sugihara, G. Detecting causality in complex ecosystems. Science : — Uyeda, J. A novel Bayesian method for inferring and interpreting the dynamics of adaptive landscapes from phylogenetic comparative data.

The million-year wait for macroevolutionary bursts. The evolution of energetic scaling across the vertebrate tree of life. Vermeij, G. Historical contingency and the purported uniqueness of evolutionary innovations. Gonzalez-Voyer, A. Disentangling evolutionary cause-effect relationships with phylogenetic confirmatory path analysis.

Evolution 67 : — Westoby, M. Generalization in functional plant ecology: the species-sampling problem, plant ecology strategy schemes, and phylogeny.

In: Pugnaire F. Functional plant ecology. Further remarks on phylogenetic correction. Zenil-Ferguson, R. Digest: trait-dependent diversification and its alternatives. Evolution 71.
