Generalizing matrix representations to fully heterochronous ranked tree shapes
Abstract
Phylogenetic tree shapes capture fundamental signatures of evolution. We consider ``ranked'' tree shapes, which are equipped with a total order on the internal nodes that is compatible with the tree graph. Recent work has established an elegant bijection between ranked tree shapes and a class of integer matrices, called \textbf{F}-matrices, defined by simple inequalities. This formulation applies to isochronous ranked tree shapes, in which all leaves share the same sampling time, as in the study of ancient human demography from present-day individuals. Another important style of phylogenetics concerns trees in which the ``timing'' of events is measured by branch length rather than calendar time. Such a tree, called a rooted phylogram, is the output of popular maximum-likelihood methods. These trees are broadly relevant; for example, they are used to study the affinity maturation of B cells in the immune system. Discretizing time in a rooted phylogram yields a fully heterochronous ranked tree shape, in which the leaves are also part of the total order. Here we extend the \textbf{F}-matrix framework to such fully heterochronous ranked tree shapes. We establish an explicit bijection between a class of \textbf{F}-matrices and the space of such tree shapes. The matrix representation has the key feature that the value at any entry is highly constrained by four previous entries, enabling straightforward enumeration of all valid tree shapes. We also use this framework to develop probabilistic models on ranked tree shapes. Our work extends the understanding of combinatorial objects with a rich history in the literature: isochronous ranked tree shapes are related to alternating permutations, which Andr\'e studied over 130 years ago, and Poupard found (nearly 40 years ago) that fully heterochronous ranked tree shapes are counted by the reduced tangent numbers.
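As a small numerical illustration of the two counting facts mentioned in the abstract, the Python sketch below computes Euler zigzag numbers $E_k$ with the Seidel-Entringer-Arnold boustrophedon triangle. It is not the \textbf{F}-matrix construction itself. It assumes the standard indexing under which isochronous ranked tree shapes with $n \ge 2$ leaves are counted by $E_{n-1}$ (via alternating permutations), and fully heterochronous ranked tree shapes with $n \ge 1$ leaves are counted by the reduced tangent number $T_{2n-1}/2^{n-1}$, where the tangent number $T_{2n-1}$ equals $E_{2n-1}$. The function names are hypothetical and chosen only for illustration.

```python
from typing import List


def euler_zigzag(nmax: int) -> List[int]:
    """Return Euler zigzag numbers E_0, ..., E_nmax using the
    Seidel-Entringer-Arnold boustrophedon triangle."""
    row = [1]
    numbers = [1]  # E_0 = 1
    for _ in range(1, nmax + 1):
        # Each new row starts at 0 and accumulates the previous row read in reverse.
        new_row = [0]
        for x in reversed(row):
            new_row.append(new_row[-1] + x)
        row = new_row
        numbers.append(row[-1])  # the last entry of row k is E_k
    return numbers


def isochronous_count(n: int) -> int:
    """Assumed count of isochronous ranked tree shapes with n >= 2 leaves: E_{n-1}."""
    return euler_zigzag(n - 1)[n - 1]


def heterochronous_count(n: int) -> int:
    """Reduced tangent number T_{2n-1} / 2^{n-1}, assumed to count fully
    heterochronous ranked tree shapes with n >= 1 leaves."""
    tangent = euler_zigzag(2 * n - 1)[2 * n - 1]  # T_{2n-1} = E_{2n-1}
    reduced, remainder = divmod(tangent, 2 ** (n - 1))
    if remainder:
        raise ArithmeticError("tangent number not divisible by 2^(n-1)")
    return reduced


if __name__ == "__main__":
    print([isochronous_count(n) for n in range(2, 8)])     # [1, 1, 2, 5, 16, 61]
    print([heterochronous_count(n) for n in range(1, 6)])  # [1, 1, 4, 34, 496]
```

The small cases can be checked by hand: with $n = 3$ leaves there is a single isochronous ranked tree shape but four fully heterochronous ones, because the leaves now interleave with the internal nodes in the total order.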