ASA-JSM 2014

Guozhu Zhang

North Carolina State University

Stephen Lee

University of Idaho

466 – Biodata Methods

Statistical Modeling of Genomic Words and Motifs

Sponsor: Section on Statistical Learning and Data Mining

Keywords: Segmentation, Genomic words, motifs

The arrangement of the four nucleotides A, C, G, and T along the genome is known to be non-random. A vast amount of information is built into the complex arrangements and compositions of genomic nucleotides. The genome can be viewed as a book of nucleotide text that gives instructions at the cellular level, and it is decoded as a continuous stream of nucleotide letters as one reads that text. We approach the reading of genomic text by segmentation: dividing the continuous stream into chunks according to statistical measures of homogeneity. The goal is to segment the genome into the most probable dictionary of motifs, or words. Our segmentation method defines words as units that are more homogeneous within their boundaries than outside them. The core idea of this paper is to introduce the method for setting word boundaries. We applied the method to compare the yeast and worm genomes, to distinguish ordered from disordered protein sequences, and to characterize different English texts.
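
The abstract does not state which homogeneity measure or boundary rule the authors use. Purely as an illustration of segmenting by homogeneity, the sketch below uses a different, well-known technique: recursive splitting at the cut that maximizes the Jensen-Shannon divergence between the nucleotide compositions on either side. The function names (`entropy`, `js_divergence`, `best_split`, `segment`) and the `threshold` and `min_len` values are hypothetical choices made for this example, not taken from the paper.

```python
import math
from collections import Counter


def entropy(seq):
    """Shannon entropy (in bits) of the symbol composition of seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def js_divergence(seq, i):
    """Length-weighted Jensen-Shannon divergence between seq[:i] and seq[i:]."""
    n = len(seq)
    left, right = seq[:i], seq[i:]
    return (entropy(seq)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))


def best_split(seq, min_len=1):
    """Return (index, divergence) for the cut with the largest divergence."""
    candidates = range(min_len, len(seq) - min_len + 1)
    return max(((i, js_divergence(seq, i)) for i in candidates),
               key=lambda pair: pair[1])


def segment(seq, threshold=0.3, min_len=4):
    """Recursively cut seq while the best cut exceeds threshold.

    threshold and min_len are arbitrary illustrative values (assumptions).
    """
    if len(seq) < 2 * min_len:
        return [seq]
    i, divergence = best_split(seq, min_len)
    if divergence < threshold:
        return [seq]
    return (segment(seq[:i], threshold, min_len)
            + segment(seq[i:], threshold, min_len))
```

For example, `segment("AAAAAAAACCCCCCCC")` cuts the sequence once, between the run of A and the run of C, because each half is perfectly homogeneous while the whole is not. Because the functions only count symbols, the same sketch accepts protein sequences or English text as input.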
