Mathematics makes possible the management and analysis of the massive database of the Human Genome Project. Numerical analysis, statistics and modeling play a significant role in mapping and sequencing our DNA -- the blueprint for the genetic information that determines what makes each of us unique. Researchers predict that this fusion of mathematics and biology will result in a new era of molecular medicine, when the diagnosis, treatment and prevention of disease will be individual-specific and thus more successful.
Some background on the genome:
Inside the nucleus of nearly every cell in our body, a complex set of genetic instructions (the human genome) is contained on 23 pairs of chromosomes. Chromosomes, long chains made up primarily of DNA (deoxyribonucleic acid), are long, threadlike molecules coiled inside our cells. Each chromosome, in turn, carries genes that look like beads on a string. Genes (short segments of DNA) are packets of instructions for making particular proteins that tell cells how to behave. The hereditary instructions are written in a four-letter code, with each letter corresponding to one of the chemical constituents of DNA (nitrogen-containing chemicals called bases): A (adenine), G (guanine), C (cytosine), T (thymine). The sequence of As, Gs, Cs and Ts constitutes the recipe for a specific protein. Proteins are large, complex molecules made up of amino acids. If the instructions become garbled, the cell can make a wrong protein or make too much or too little of a protein -- mistakes that can result in disease. Scientists estimate that the human genome contains 30,000 to 40,000 genes, and that genes comprise only about 2% of the human genome -- the rest consisting of non-coding regions that provide chromosomal structural integrity and regulate where, when, and in what quantity proteins are made.
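To make the idea of a four-letter "recipe" concrete, here is a minimal Python sketch that reads a DNA string three letters at a time and looks up the amino acid each triplet stands for. The triplet (codon) reading scheme is standard biology but is not spelled out in the text above; the table below is a deliberately tiny subset of the standard genetic code, and the function name and example sequence are made up for illustration.

```python
# A tiny, partial subset of the standard genetic code (illustration only).
# Keys are DNA triplets (codons); values are one-letter amino acid codes.
CODON_TABLE = {
    "ATG": "M",  # methionine, also the usual start signal
    "GCC": "A",  # alanine
    "AAA": "K",  # lysine
    "TGG": "W",  # tryptophan
    "TAA": "*",  # stop
    "TAG": "*",  # stop
    "TGA": "*",  # stop
}


def translate(dna):
    """Translate a DNA string codon by codon until a stop codon is reached.

    Codons missing from the partial table above are reported as "?".
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "?")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 15-letter sequence: four codons followed by a stop codon.
peptide = translate("ATGGCCAAATGGTAA")
print(peptide)
```

A single changed letter can change the result: replacing the codon GCC with one that is absent from this small table would show up as "?" in the output, a toy version of a "garbled" instruction.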
All humans have the same basic set of genes and genomic regulatory regions that control the development and maintenance of biological structures and processes, yet there are differences among us. Therefore the Human Genome Project's (HGP) reference sequence, which is based on samples of a group of individuals, does not represent an exact match of any one person's genome. Another important HGP goal is to identify many of the small DNA regions that vary among individuals and that could underlie disease susceptibility and drug responsiveness.
What are "Mapping" and "Sequencing"?
Mapping. A "genetic map" consists of thousands of markers -- short, distinctive pieces of DNA -- more or less evenly spaced along chromosomes. This map should enable researchers to pinpoint the location of a gene between any two markers. "Physical maps" consist of overlapping pieces of DNA spanning an entire chromosome. When the "physical maps" are complete, investigators can localize a gene to a particular region of a chromosome by using a "genetic map," then select and study a specific piece of the "physical map," rather than having to search through the entire chromosome all over again.
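The idea of pinning a gene down between two markers can be sketched as a lookup in a sorted list. The snippet below is only an illustration: the marker names, their positions (in arbitrary units along one chromosome) and the function name are all hypothetical, and real genetic mapping infers positions statistically rather than reading them off a list.

```python
from bisect import bisect_right


def flanking_markers(positions, names, locus):
    """Return the names of the markers just left and right of a locus.

    positions must be sorted in increasing order and parallel to names.
    None is returned on a side where no marker exists.
    """
    i = bisect_right(positions, locus)
    left = names[i - 1] if i > 0 else None
    right = names[i] if i < len(names) else None
    return left, right


# Hypothetical markers, more or less evenly spaced along a chromosome.
positions = [5.0, 20.0, 35.0, 50.0]
names = ["M1", "M2", "M3", "M4"]

print(flanking_markers(positions, names, 27.0))
```

Because the marker list is sorted, the binary search stays fast even with the thousands of markers a genetic map contains.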
Sequencing. The ultimate goal of the HGP is to decode, letter by letter, the exact sequence of all 3 billion nucleotide bases that make up the human genome. Computer scientists, biologists, physicists and engineers are all developing automated technologies to reduce the time and cost of sequencing. Once the human genome sequence (a composite of sequences derived from many individuals) is completed, attention can shift from the job of finding genes (which will then be simply a matter of scanning a computer database) to understanding them.
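In the simplest case, "scanning a computer database" for a known stretch of DNA amounts to an exact text search. The sketch below assumes the sequence of interest is already known and looks for it on both strands of a toy "genome" string; the strings and function names are invented for illustration, and real gene finding relies on far more sophisticated, error-tolerant methods.

```python
# Translation table mapping each base to its pairing partner (A-T, C-G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite DNA strand, read in its own direction."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(genome, query):
    """Return sorted (position, strand) pairs for every exact match of query.

    Strand "+" means the query itself was found; "-" means its reverse
    complement was found. A query equal to its own reverse complement is
    reported on both strands.
    """
    hits = []
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        start = genome.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical 20-letter "genome" and 7-letter query.
genome = "TTGATTACAGGTGTAATCCA"
hits = find_all(genome, "GATTACA")
print(hits)
```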
Some of the mathematics involved in the Human Genome Project:
Numerical analysis. DNA is microscopic, yet the amount of data it generates is huge. Researchers require advanced numerical techniques to manipulate and make sense of the data.
Statistics has played a part in generating "draft sequence data" -- mostly 10,000 base-pair fragments whose approximate chromosomal locations are known. Additional sequencing will close gaps and reduce ambiguities. Statistics is also used to design experiments so that as much information as possible can be extracted from them.
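As a rough illustration of what "closing gaps" means, the sketch below takes fragments with known start and end positions and reports the stretches of a chromosome that no fragment covers. The coordinates, the chromosome length and the function name are hypothetical, and the sketch assumes exact positions, whereas the draft data described above has only approximate ones.

```python
def find_gaps(fragments, chrom_length):
    """Return the uncovered (start, end) stretches of a chromosome.

    fragments is a list of (start, end) pairs, with end exclusive.
    """
    gaps = []
    covered_to = 0  # everything before this position is already covered
    for start, end in sorted(fragments):
        if start > covered_to:
            gaps.append((covered_to, start))
        covered_to = max(covered_to, end)
    if covered_to < chrom_length:
        gaps.append((covered_to, chrom_length))
    return gaps


# Hypothetical 10,000 base-pair fragments on a 40,000 base-pair region.
fragments = [(25000, 35000), (0, 10000), (8000, 18000)]
gaps = find_gaps(fragments, 40000)
print(gaps)
```

Each reported gap is a region where additional sequencing would be needed.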
Computational models. Researchers attempt to predict molecular behavior by describing DNA and protein molecules with equations that can be solved numerically. The massive amount of data now available allows for more accurate equations on which to base models and the ability to compare predictions with known results.
Topology. Topology deals with the shape and geometry of complex structures. The basic double helix structure of DNA provides a good deal of information about the molecule, but that information is not complete. The details of the structure, and of the different forms of DNA, provide information about the biological function of DNA. In addition, the structure and formation of proteins are far more complicated than those of DNA.
Computer graphics make static and moving images of DNA structures possible, which helps both researchers and laypersons visualize and study the genome.
Microarrays. These are a relatively new invention that lets scientists measure something they could not measure before. A microarray measures how much messenger RNA of a given type is being made in a sample of tissue at a given moment, which gives a good idea of how much of the corresponding protein is being made.
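One common way to summarize such measurements, sketched here under assumptions the text above does not state, is to compare a gene's signal in one tissue sample against a reference sample and express the ratio on a log2 scale, so that a doubling counts as +1 and a halving as -1. The gene names and intensity values below are invented for illustration.

```python
import math


def log2_ratio(sample, reference):
    """Return log2(sample / reference) for two positive signal intensities."""
    return math.log2(sample / reference)


# Hypothetical intensities: gene name -> (sample signal, reference signal).
intensities = {
    "geneA": (800.0, 200.0),  # four times more messenger RNA in the sample
    "geneB": (100.0, 400.0),  # four times less
    "geneC": (300.0, 300.0),  # unchanged
}

ratios = {gene: log2_ratio(s, r) for gene, (s, r) in intensities.items()}
for gene, value in ratios.items():
    print(gene, value)
```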