Recent advances in high-throughput sequencing of the T cell receptor (TCR) repertoire provide a new, detailed characterization of the immune system. The diversity of the TCR repertoire can reflect general immune potency and the dynamics of the immune response to pathogens. It could potentially serve as an early indicator that differentiates patients who respond to treatment from those who do not. Although estimators borrowed from the ecology literature, such as species richness, are often used to measure diversity, these measures can be sensitive to sequencing depth and may not faithfully reflect the true diversity.
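The depth sensitivity of species richness can be illustrated with a small simulation. The sketch below is purely illustrative and is not drawn from the datasets in this work: the repertoire is hypothetical, the Pareto shape `alpha`, the number of clones, and the subsampling depths are arbitrary assumptions, and the function names `simulate_repertoire` and `richness_by_depth` are introduced here only for the example. It counts the distinct clonotypes observed in nested subsamples of reads from one fixed repertoire.

```python
import random


def simulate_repertoire(n_clones=5000, alpha=1.2, seed=0):
    """Hypothetical repertoire: one heavy-tailed (Pareto) size per clone.

    The Pareto law and its shape alpha are assumptions for illustration.
    """
    rng = random.Random(seed)
    # paretovariate returns values >= 1, so every clone has at least one cell
    return [int(rng.paretovariate(alpha)) for _ in range(n_clones)]


def richness_by_depth(clone_sizes, depths, seed=1):
    """Observed richness (distinct clonotypes) in nested subsamples of reads."""
    rng = random.Random(seed)
    # One read per cell, labelled by the index of its clone
    reads = [clone for clone, size in enumerate(clone_sizes) for _ in range(size)]
    rng.shuffle(reads)
    # Taking prefixes of one shuffled list makes the subsamples nested
    return {d: len(set(reads[:d])) for d in depths}


clone_sizes = simulate_repertoire()
depths = [250, 1000, 4000]  # all below the minimum possible read count (5000)
richness = richness_by_depth(clone_sizes, depths)
for d in depths:
    print(f"depth={d:5d}  observed richness={richness[d]}")
```

Because the underlying repertoire is identical at every depth, any difference in the printed richness values comes from sampling effort alone, not from a change in true diversity.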
In this work, we propose a novel threshold model, based on a generalized Pareto distribution (GPD) and a truncated gamma distribution, to model the diversity of the TCR repertoire. The model can be related to the stochastic process of clonal formation and can therefore provide a biologically meaningful interpretation. We evaluated its performance on both simulated and real datasets. Our results show that the model is able to differentiate individuals with different clinical outcomes and is robust across a range of sequencing depths.
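For orientation, one common way to write a threshold model of this kind is as a spliced density. The form below is a generic sketch under stated assumptions, not necessarily the exact parameterization used in this work: it assumes that clone sizes $x$ at or below a threshold $u$ follow a gamma distribution truncated to $(0, u]$, that exceedances above $u$ follow a GPD, and that $\phi$ denotes the probability of exceeding $u$. All symbols ($u$, $\phi$, $\alpha$, $\beta$, $\sigma$, $\xi$) are notation introduced here for illustration.

```latex
f(x) =
\begin{cases}
  (1-\phi)\,\dfrac{g(x;\alpha,\beta)}{G(u;\alpha,\beta)}, & 0 < x \le u, \\[2ex]
  \phi\,\dfrac{1}{\sigma}\left(1 + \xi\,\dfrac{x-u}{\sigma}\right)^{-1/\xi - 1}, & x > u,
\end{cases}
\qquad \sigma > 0,\ \xi > 0,
```

Here $g(\cdot;\alpha,\beta)$ and $G(\cdot;\alpha,\beta)$ are the gamma density and cumulative distribution function. Dividing by $G(u;\alpha,\beta)$ renormalizes the gamma component on $(0, u]$, so the two pieces integrate to $1-\phi$ and $\phi$ respectively. Restricting to $\xi > 0$ is an additional assumption that gives the upper tail a heavy, Pareto-type decay.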