Accurate estimation of HIV incidence from individual recent-infection status is important for designing prevention and intervention strategies. The current recent-infection indicator is a binary variable (recent vs. long-term infection) produced by a classification tree whose cutoffs were previously determined in an ad hoc manner: an infection is labeled recent if it has low normalized optical density (from the limited-antigen avidity test in the incidence assay), high viral load, and no antiretroviral treatment. However, based on inspection of the data, we believe the current indicator may be too conservative in classifying infections as recent; that is, too many infections may be labeled long-term. In addition, because the classification tree uses hard cutoffs, the existing label does not capture uncertainty in the classification. We aim to replace the binary recent-infection label with a probabilistic indicator, using supervised learning methods that use the current label and/or unsupervised learning methods that do not. We will consider crucial biomarkers, including CD4 count, as well as possibly relevant auxiliary covariates (socioeconomic variables, self-reported status, etc.).
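As a minimal sketch of the contrast between a hard-cutoff label and a graded one, the Python below encodes a tree-style rule of the kind described above and a smoothed counterpart. The cutoff values, the scale parameters, and the function names are all hypothetical, since the text does not state them. The logistic smoothing is only an illustration of how a graded score behaves near a cutoff. It is not the supervised or unsupervised method proposed here, and its output is not a calibrated probability.

```python
import math

# Hypothetical cutoffs for illustration only; the text does not state them.
ODN_CUTOFF = 1.5     # normalized optical density threshold (assumed)
VL_CUTOFF = 1000.0   # viral load threshold in copies/mL (assumed)


def hard_recent_label(odn, viral_load, on_art):
    """Tree-style binary rule: recent only if all three conditions hold."""
    return (odn <= ODN_CUTOFF) and (viral_load >= VL_CUTOFF) and (not on_art)


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_recent_score(odn, viral_load, on_art, odn_scale=0.25, vl_scale=0.25):
    """Smoothed version of the rule above, returning a score in [0, 1].

    Each hard threshold is replaced by a logistic curve centered on the
    cutoff (viral load on the log10 scale). The scale parameters are
    assumed values that control how quickly the score changes.
    """
    if on_art:
        return 0.0  # treatment status stays a hard exclusion in this sketch
    p_odn = _sigmoid((ODN_CUTOFF - odn) / odn_scale)
    log_vl = math.log10(max(viral_load, 1.0))  # guard against log10(0)
    p_vl = _sigmoid((log_vl - math.log10(VL_CUTOFF)) / vl_scale)
    return p_odn * p_vl


if __name__ == "__main__":
    # Two cases just either side of the assumed optical density cutoff.
    for odn in (1.4, 1.6):
        print(odn,
              hard_recent_label(odn, 5000.0, False),
              round(soft_recent_score(odn, 5000.0, False), 3))
```

Under the hard rule, the two cases on either side of the assumed optical density cutoff receive opposite labels, whereas the smoothed score differs between them only by degree. This is the kind of uncertainty near a cutoff that the binary label cannot express.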