Abstract:
|
The Global LAteSt Split/Maximum Tree method of species tree (ST) inference performs well and is statistically consistent when the ST is inferred from known gene trees (GTs). With estimated GTs, the inferred ST is the maximum likelihood tree and, under some conditions, a consistent estimator of the ST. As implemented in the STEM (ST Estimation using Maximum likelihood) software, however, the method has performed relatively poorly when the input consists of GTs estimated from DNA sequences, and the conditions required for statistical consistency in this case can be unrealistic. We propose a modification to the STEM tree: an application of clustering in measurement error models, as described by Su, Reedy, and Carroll (2018). The proposed method replaces the estimated pairwise coalescence times used by STEM with randomly generated realizations from the distribution of true pairwise coalescence times, which is estimated through measurement error modeling. As with STEM, the minimum of these realizations over all loci is taken for each pair to form a distance matrix, and single linkage clustering is applied to that matrix to infer the ST. In simulation studies, the new method outperforms STEM in terms of Robinson-Foulds distance from the true ST.
|
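The final two steps described in the abstract (taking the minimum over loci for each pair, then single linkage clustering) can be illustrated with a minimal Python sketch. It is not the STEM implementation or the authors' code. The function names `min_distance_matrix` and `single_linkage`, the three taxa, and the numeric values are hypothetical, and the per-locus matrices are assumed to already hold the pairwise coalescence-time values (estimates in STEM, or realizations drawn from the estimated distribution in the proposed method). The measurement error modeling step itself is not shown.

```python
from itertools import combinations


def min_distance_matrix(locus_matrices):
    """Element-wise minimum over loci of symmetric pairwise-time matrices."""
    n = len(locus_matrices[0])
    return [[min(m[i][j] for m in locus_matrices) for j in range(n)]
            for i in range(n)]


def single_linkage(dist, labels):
    """Naive single linkage clustering.

    Returns the merges in the order they occur, each as
    (members_of_cluster_a, members_of_cluster_b, merge_height).
    """
    members = {i: [i] for i in range(len(labels))}  # cluster id -> taxon indices
    merges = []
    next_id = len(labels)
    while len(members) > 1:
        best = None
        for a, b in combinations(sorted(members), 2):
            # Single linkage: cluster distance is the smallest pairwise distance.
            d = min(dist[i][j] for i in members[a] for j in members[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((tuple(sorted(labels[i] for i in members[a])),
                       tuple(sorted(labels[i] for i in members[b])),
                       d))
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1
    return merges


# Hypothetical input: two loci, three taxa (A, B, C).
taxa = ["A", "B", "C"]
locus_1 = [[0.0, 1.0, 3.0],
           [1.0, 0.0, 2.5],
           [3.0, 2.5, 0.0]]
locus_2 = [[0.0, 1.4, 2.0],
           [1.4, 0.0, 2.8],
           [2.0, 2.8, 0.0]]

distances = min_distance_matrix([locus_1, locus_2])
merges = single_linkage(distances, taxa)
for left, right, height in merges:
    print(left, right, height)
```

In this toy input, A and B join first at height 1.0, and C joins them at height 2.0, the smaller of its minimum times to A and to B. The quadratic search over cluster pairs is chosen for readability; a library routine for hierarchical clustering would be the natural choice for larger numbers of taxa.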