Activity Number: 235
Type: Invited
Date/Time: Tuesday, August 13, 2002, 2:00 PM to 3:50 PM
Sponsor: Technometrics
Abstract #300335
Title: A Modification of the Jaccard/Tanimoto Similarity Index for Diverse Selection of Chemical Compounds Using Binary Strings
Author(s): Joseph Verducci*+, Paul Blower, and Michael Fligner
Affiliation(s): Ohio State University; Leadscope, Inc.; Ohio State University
Address: Ohio
Keywords: binary data; chemical fingerprints; data mining; measures of association; optimal design

Abstract:
Determining molecular similarity plays an important role in analyzing large compound databases in chemical and pharmaceutical research. When molecules are described by binary vectors, with bits corresponding to the presence or absence of structural features, the Tanimoto association coefficient is the most commonly used measure of similarity or chemical distance between two compounds. However, when used to select compounds for an optimal spread design, the Tanimoto coefficient introduces an intrinsic bias toward smaller compounds. We have developed a new association coefficient that overcomes this bias. This paper presents the new coefficient and compares the two coefficients for selecting diverse sets of compounds from a large collection. When the modified coefficient is used to select a diverse set from the NCI and RTECS databases, the average number of features among the selected compounds is nearly twice that obtained with the original Tanimoto coefficient.
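A minimal Python sketch of the standard Tanimoto coefficient, c / (a + b - c) for fingerprints with a and b features and c features in common, together with a greedy max-min diversity selection. It assumes fingerprints are stored as Python integers used as bitsets; the max-min procedure, the helper names (`popcount`, `tanimoto`, `maxmin_select`, `bits`), and the toy 16-bit pool are illustrative assumptions, not the authors' selection design or data. The sketch does not implement the authors' modified coefficient. It only shows why a sparse fingerprint tends to sit far (in Tanimoto distance) from everything and is therefore picked early.

```python
# Illustrative sketch only: standard Tanimoto + hypothetical greedy max-min picker.
# This is NOT the modified coefficient described in the abstract.


def popcount(x: int) -> int:
    """Number of set bits (structural features present) in a fingerprint."""
    return bin(x).count("1")


def tanimoto(x: int, y: int) -> float:
    """Standard Tanimoto (Jaccard) similarity c / (a + b - c) for bit fingerprints."""
    a, b, c = popcount(x), popcount(y), popcount(x & y)
    union = a + b - c
    # Assumed convention: two empty fingerprints count as identical.
    return 1.0 if union == 0 else c / union


def maxmin_select(fps: list[int], k: int, start: int = 0) -> list[int]:
    """Greedy max-min selection: repeatedly add the compound farthest
    (in Tanimoto distance 1 - T) from its nearest already-selected compound."""
    k = min(k, len(fps))
    selected = [start]
    nearest = [1.0 - tanimoto(fp, fps[start]) for fp in fps]
    while len(selected) < k:
        candidates = [i for i in range(len(fps)) if i not in selected]
        best = max(candidates, key=lambda i: nearest[i])
        selected.append(best)
        # Refresh each compound's distance to its nearest selected neighbour.
        nearest = [min(d, 1.0 - tanimoto(fp, fps[best])) for d, fp in zip(nearest, fps)]
    return selected


def bits(*positions: int) -> int:
    """Hypothetical helper: build a fingerprint with the given feature bits switched on."""
    return sum(1 << p for p in positions)


# Hypothetical toy pool: three "large" compounds (10 features) and two "small" ones (1 feature).
pool = [
    bits(*range(0, 10)),                 # large, features 0-9
    bits(*range(4, 14)),                 # large, features 4-13
    bits(*range(0, 4), *range(10, 16)),  # large, features 0-3 and 10-15
    bits(0),                             # small, feature 0
    bits(14),                            # small, feature 14
]

chosen = maxmin_select(pool, k=3)
print(chosen)                                                 # [0, 4, 3]
print(sum(popcount(pool[i]) for i in chosen) / len(chosen))   # 4.0
```

On this hand-built pool, starting from compound 0, the picker takes both one-feature compounds ahead of the remaining ten-feature ones, so the selected set averages 4 features against 6.4 for the whole pool. This is only a toy illustration of the size bias the abstract describes, not a reproduction of the NCI or RTECS results.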

- The address information applies to authors marked with a + after their name.
- Authors presenting the talk are marked with a * after their name.
Source: JSM 2002 program