Abstract #302141

This is the preliminary program for the 2003 Joint Statistical Meetings in San Francisco, California. It includes the technical program (the schedule of invited, topic-contributed, regular contributed, and poster sessions), Continuing Education courses (August 2-5, 2003), and committee and business meetings.


The views expressed here are those of the individual authors and not necessarily those of the ASA or its board, officers, or staff.





JSM 2003 Abstract #302141
Activity Number: 311
Type: Invited
Date/Time: Wednesday, August 6, 2003, 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Computing
Title: Statistical Analysis of Genome Sequences to Identify Novel Important Protein Domain Combinations
Author(s): Wolfgang Huber*+ and Gordana Apic and Sarah A. Teichmann
Companies: German Cancer Research Center and MRC Laboratory of Molecular Biology and MRC Laboratory of Molecular Biology
Address: Division of Molecular Genome Analysis (B050), 69120 Heidelberg, Germany
Keywords: protein domains; random graphs; genomic databases
Abstract:

There is a limited repertoire of families of protein domains in nature, which are duplicated and combined in different ways to form the set of proteins in a genome. With 70 genomes now sequenced, it is possible to gain a systematic overview of the ways in which duplication and combination events have been used in evolution. Most domain pairs occur in three to six different architectures: in isolation and in combination with different partners. Across the full set of pairwise domain combinations known at present, we observe that most small and medium-sized families combine with only one or two other families, while a few large families are versatile and combine with many different partners. We investigate to what extent this can be explained by a "random combination" model, in which a family's frequency of combination is proportional to its size. From the complete sequence data of 70 genomes, we find that about 1/3 of families show significantly more duplications than expected from their number of neighboring domains, and about 1/3 are combined with more different types of domains than expected from family size. These families are interesting targets for structural elucidation.
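The comparison between observed partner counts and the "random combination" model can be illustrated with a small sketch. It is a hypothetical illustration, not the authors' procedure, and it covers only the partner-count test, not the duplication test. The sketch makes several assumptions. Each protein is given as a tuple of domain-family labels. Family sizes are user-supplied counts. Each pairwise co-occurrence within a distinct architecture counts as one "combination event" whose partner is drawn with probability proportional to family size. All function names (`combination_stats`, `expected_partners`, `partner_pvalue`) are made up for this example. Under these assumptions, a family with k events is expected to have sum over j of [1 - (1 - p_j)^k] distinct partners, where p_j is partner j's share of the total size of the other families.

```python
import random
from collections import defaultdict
from itertools import combinations


def combination_stats(architectures):
    """Return (partners, events) per domain family (hypothetical helper).

    partners[f]: set of distinct families co-occurring with f in some protein.
    events[f]:   pairwise co-occurrences of f, counting each distinct
                 architecture once (an ASSUMED definition of a combination event).
    """
    partners = defaultdict(set)
    events = defaultdict(int)
    for arch in set(architectures):      # each distinct architecture counted once
        fams = sorted(set(arch))         # families present, ignoring repeats
        for a, b in combinations(fams, 2):
            partners[a].add(b)
            partners[b].add(a)
            events[a] += 1
            events[b] += 1
    return partners, events


def expected_partners(family, n_events, sizes):
    """Expected distinct partners under the assumed random combination model.

    Each of the family's n_events picks a partner j != family with probability
    p_j = sizes[j] / (sum of sizes of all other families), so
    E = sum_j [1 - (1 - p_j) ** n_events].
    """
    others = [s for f, s in sizes.items() if f != family]
    total = sum(others)
    return sum(1 - (1 - s / total) ** n_events for s in others)


def partner_pvalue(family, n_events, observed, sizes, n_sim=10_000, seed=0):
    """One-sided Monte Carlo p-value for seeing >= observed distinct partners."""
    rng = random.Random(seed)
    others = [f for f in sizes if f != family]
    weights = [sizes[f] for f in others]
    hits = 0
    for _ in range(n_sim):
        # Draw n_events partners with replacement, weighted by family size.
        drawn = set(rng.choices(others, weights=weights, k=n_events))
        if len(drawn) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)   # add-one correction keeps p > 0


if __name__ == "__main__":
    # Toy, made-up data: four families and five protein architectures.
    archs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D",)]
    sizes = {"A": 50, "B": 5, "C": 5, "D": 40}
    partners, events = combination_stats(archs)
    for fam in sorted(sizes):
        obs = len(partners[fam])
        exp = expected_partners(fam, events[fam], sizes)
        p = partner_pvalue(fam, events[fam], obs, sizes)
        print(f"{fam}: observed={obs} expected={exp:.2f} p={p:.3f}")
```

In a real analysis, the family sizes would come from domain assignments across the sequenced genomes. Because one test is run per family, the resulting p-values would typically also need a multiple-testing correction. Both steps are left out of this sketch.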


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.


JSM 2003. For information, contact meetings@amstat.org or phone (703) 684-1221. If you have questions about the Continuing Education program, please contact the Education Department.
Revised March 2003