Abstract #300335


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





JSM 2002 Abstract #300335
Activity Number: 235
Type: Invited
Date/Time: Tuesday, August 13, 2002 : 2:00 PM to 3:50 PM
Sponsor: Technometrics
Title: A Modification of the Jaccard/Tanimoto Similarity Index for Diverse Selection of Chemical Compounds Using Binary Strings
Author(s): Joseph Verducci*+ and Paul Blower and Michael Fligner
Affiliation(s): Ohio State University and Leadscope, Inc. and Ohio State University
Address: Ohio
Keywords: binary data; chemical fingerprints; data mining; measures of association; optimal design
Abstract:

Determination of molecular similarity plays an important role in analyzing large compound databases in chemical and pharmaceutical research. When molecules are described by binary vectors, with bits corresponding to the presence or absence of structural features, the Tanimoto association coefficient is the most commonly used measure of similarity or chemical distance between two compounds. However, when used to select compounds for an optimal spread design, the Tanimoto coefficient is intrinsically biased towards smaller compounds. We have developed a new association coefficient that overcomes this bias. This paper will give details of the new coefficient and contrast the two coefficients for selecting diverse sets of compounds from a large collection. When the modified coefficient is used to select a diverse set from the NCI and RTECS databases, the average number of features among the selected compounds is nearly twice that obtained with the original Tanimoto coefficient.
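For reference, the standard Tanimoto (Jaccard) coefficient of two binary fingerprints A and B is c / (a + b - c), where a and b are the numbers of bits set in A and B and c is the number set in both. The sketch below computes it and pairs it with a greedy max-min selection using the distance 1 - Tanimoto. Max-min selection is an assumed stand-in for the "optimal spread design" mentioned above, the function names and toy fingerprints are hypothetical, and the authors' modified coefficient is not specified in this abstract, so it is not implemented here.

```python
from typing import List, Sequence


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto (Jaccard) similarity of two equal-length 0/1 fingerprints:
    bits set in both / bits set in either, i.e. c / (a + b - c)."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumed convention: two all-zero fingerprints count as identical.
    return 1.0 if either == 0 else both / either


def maxmin_select(fps: List[Sequence[int]], k: int, start: int = 0) -> List[int]:
    """Hypothetical greedy max-min diverse selection with distance 1 - Tanimoto:
    repeatedly add the compound whose nearest selected compound is farthest away."""
    selected = [start]
    # min_dist[i] = distance from compound i to its nearest selected compound
    min_dist = [1.0 - tanimoto(fp, fps[start]) for fp in fps]
    while len(selected) < min(k, len(fps)):
        best = max(
            (i for i in range(len(fps)) if i not in selected),
            key=lambda i: min_dist[i],
        )
        selected.append(best)
        # Refresh nearest-selected distances with the newly chosen compound.
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fp, fps[best]))
    return selected


# Toy fingerprints (illustrative only): two large, one one-bit, one two-bit compound.
fps = [
    [1, 1, 1, 1, 1, 1, 0, 0],  # 0: large
    [1, 1, 1, 1, 1, 0, 1, 0],  # 1: large, similar to 0
    [0, 0, 0, 0, 0, 0, 0, 1],  # 2: one feature only
    [1, 1, 0, 0, 0, 0, 0, 0],  # 3: two features
]
print(maxmin_select(fps, 3))  # [0, 2, 3]
```

In this toy data the one-bit fingerprint is chosen second because its Tanimoto distance to every other fingerprint is 1. That is consistent with the size bias described in the abstract, but it is only an illustration, not the authors' analysis of the NCI or RTECS data.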


  • The address information is for the authors who have a + after their name.
  • Authors who are presenting talks have a * after their name.


For information, contact meetings@amstat.org or phone (703) 684-1221.


Revised March 2002