Abstract:
|
Persistent homology is a powerful tool for characterizing the topology of a dataset at various geometric scales. However, datasets often carry a wide variety of non-geometric information in addition to geometric information; molecular structures, for example, include element types and atomic charges alongside the atomic coordinates. To characterize such datasets, we propose an enriched persistence barcode approach that retains the non-geometric information in the traditional persistence barcode. The enriched barcode is constructed by finding, for each persistence pair, the smoothest representative cocycle as determined by the combinatorial Laplacian. We show that, when combined with machine learning methods, this enriched barcode approach achieves state-of-the-art performance on an important real-world problem: the prediction of protein-ligand binding affinity from molecular structures.
|