Online Program

Friday, May 31
Machine Learning
Machine Learning E-Posters, II
Fri, May 31, 3:00 PM - 4:00 PM
Grand Ballroom Foyer

Statistical Learning on Next-Generation Sequencing of T cell Repertoire Data (306330)

Jason Cham, UCSF 
Lawrence Fong, UCSF 
Tao He, San Francisco State University 
David Oh, UCSF 
Alan Paciorek, UCSF 
*Li Zhang, UCSF 

Keywords: TCR sequencing, next-generation sequencing, time change analysis, pattern recognition

Cancer immunotherapy has demonstrated significant clinical activity in different cancers. T cells are a crucial component of the adaptive immune system and are thought to mediate antitumor immunity. Antigen-specific recognition by T cells occurs via the T cell receptor (TCR), which is the product of somatic V(D)J gene recombination plus the addition or subtraction of nontemplated bases at recombination junctions. Next-generation sequencing of the TCR is used as a platform to profile the TCR repertoire. We developed an analysis pipeline to track and examine the TCR repertoire over time by focusing on V and J gene segments. This overcomes the limitation that subjects share few or no overlapping clones, and thus allows statistical inference across subjects directly. We developed a customized clustering workflow: 1) Pattern Recognition, which uses changepoint analysis to characterize combinations of V and J gene segments by their change in abundance over time; 2) Feature Selection, which uses random forest to select the important V and J gene segments; and 3) Hierarchical Clustering, which distinguishes subjects based on the selected important V and J gene segments. TCR sequence data from serial peripheral blood mononuclear cell samples from a group of cancer patients receiving immunotherapy were used for illustration.
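As a rough illustration of the Pattern Recognition step only, the sketch below collapses clone counts to V-J pair frequencies per time point and then looks for a single shift in mean abundance. The abstract does not say which changepoint method the authors used, so the least-squares single-split search here is an assumption, as are the function names, the `min_shift` threshold, and the toy counts. The Feature Selection and Hierarchical Clustering steps are not shown.

```python
from collections import Counter


def vj_frequencies(clones):
    """Collapse clone-level counts to relative abundance per (V, J) pair.

    `clones` is a list of (v_gene, j_gene, count) tuples for one sample.
    """
    totals = Counter()
    for v_gene, j_gene, count in clones:
        totals[(v_gene, j_gene)] += count
    grand_total = sum(totals.values())
    return {pair: n / grand_total for pair, n in totals.items()}


def best_mean_shift(series):
    """Find the single split that minimizes within-segment squared error.

    Returns (split_index, mean_before, mean_after), where split_index is the
    position of the first time point in the second segment.
    """
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best = None
    for k in range(1, len(series)):
        cost = sse(series[:k]) + sse(series[k:])
        if best is None or cost < best[0]:
            best = (cost, k)
    k = best[1]
    return k, sum(series[:k]) / k, sum(series[k:]) / (len(series) - k)


def classify_pattern(series, min_shift=0.05):
    """Label a trajectory by the direction of its best mean shift.

    `min_shift` is an arbitrary illustrative threshold, not a value from
    the abstract.
    """
    k, before, after = best_mean_shift(series)
    if abs(after - before) < min_shift:
        return "stable", None
    return ("increase" if after > before else "decrease"), k


# Toy data (made up): one subject, four serial samples, two V-J pairs.
samples = [
    [("TRBV5-1", "TRBJ2-7", 10), ("TRBV20-1", "TRBJ1-1", 90)],
    [("TRBV5-1", "TRBJ2-7", 12), ("TRBV20-1", "TRBJ1-1", 88)],
    [("TRBV5-1", "TRBJ2-7", 40), ("TRBV20-1", "TRBJ1-1", 60)],
    [("TRBV5-1", "TRBJ2-7", 45), ("TRBV20-1", "TRBJ1-1", 55)],
]
freqs = [vj_frequencies(s) for s in samples]
pairs = sorted({pair for f in freqs for pair in f})
patterns = {
    pair: classify_pattern([f.get(pair, 0.0) for f in freqs])
    for pair in pairs
}
for pair, (label, split) in patterns.items():
    print(pair, label, split)
```

Because every subject has a value for every V-J pair, the resulting pattern labels form a feature table that can be compared across subjects even when no individual clone is shared, which is the motivation given in the abstract for working at the gene-segment level.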