Although the genetic code is not a "code" in the sense normally used in intelligence and espionage terminology, a basic grasp of it is essential to understanding the molecular basis of the advanced DNA and genetic tests that are increasingly important in forensic science and identification technology.
The genetic information passed from parent to offspring is carried by the DNA of a cell. Genes in the DNA code for specific proteins that contribute to appearance, health, and aspects of personality. Before a gene can direct the production of a protein, its information must first be copied from DNA into RNA, a process known as transcription. Transcription is thus the transfer of genetic information from DNA to RNA. Translation is the process in which the genetic information carried by messenger RNA (mRNA) directs the synthesis of a protein from amino acids; the primary structure of the protein (its amino acid sequence) is determined by the nucleotide sequence of the mRNA.
The genetic code is the set of correspondences between the nucleotide sequences of nucleic acids such as deoxyribonucleic acid (DNA) and the amino acid sequences of proteins (polypeptides). These correspondences allow the information encoded in the chemical components of DNA to be transferred to messenger ribonucleic acid (mRNA) and then used to establish the correct sequence of amino acids in the polypeptide. The elements of the encoding system, the nucleotides, differ only in their bases, of which there are four: adenine (A), guanine (G), cytosine (C), and thymine (T) in DNA, with uracil (U) taking the place of thymine in RNA. The nucleotide sequence of DNA acts as a template for the synthesis of a complementary RNA sequence, a process known as transcription. For historical reasons, the term genetic code strictly refers to the sequence of nucleotides in mRNA, although today it is sometimes used interchangeably with the coded information in DNA.
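The complementary base-pairing rule can be sketched in a few lines of Python. This is a simplified illustration, not a model of RNA polymerase: it assumes the template strand is written 3'→5', so that each output base lines up with the template base it pairs with, and the helper names `transcribe` and `coding_strand_to_mrna` are hypothetical. It also shows the useful consequence that the mRNA has the same sequence as the non-template (coding) DNA strand, with U in place of T.

```python
# Base that is laid down in RNA opposite each DNA template base (U replaces T in RNA).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the mRNA (read 5'->3') made on a DNA template strand written 3'->5'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3to5.upper())


def coding_strand_to_mrna(coding_5to3: str) -> str:
    """The mRNA matches the coding (non-template) strand, with U substituted for T."""
    return coding_5to3.upper().replace("T", "U")


if __name__ == "__main__":
    template = "TACGGT"  # template strand, 3'->5' (illustrative sequence)
    coding = "ATGCCA"    # the complementary coding strand, 5'->3'
    print(transcribe(template))           # AUGCCA
    print(coding_strand_to_mrna(coding))  # AUGCCA
```

Both routes give the same mRNA, which is why sequences in the literature are usually quoted as the coding strand.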
Proteins found in nature are built from 20 standard amino acids. An important question, then, is how four nucleotides can code for 20 amino acids. Scientists raised this question in the 1950s, soon after it was discovered that DNA is the hereditary material of living organisms. They reasoned that if a single nucleotide coded for one amino acid, only four amino acids could be specified. If two nucleotides specified one amino acid, there could be at most 16 (4²) possible arrangements, still too few. If, however, three nucleotides coded for one amino acid, there would be 64 (4³) possible combinations, more than enough to account for all 20 amino acids. The triplet idea was proposed by the Russian-born physicist George Gamow (1904–1968) and was later proved correct. It is now well established that every amino acid is specified by at least one nucleotide triplet, or codon, and that some triplets serve as signals for the initiation or termination of translation. In mRNA, the three codons UAA, UGA, and UAG are termination (stop) codons, while AUG is the translation start codon.
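The counting argument behind the triplet hypothesis can be checked directly. The sketch below, with hypothetical names `code_capacity` and `all_codons`, counts how many distinct "words" of one, two, and three bases the four RNA bases allow, then enumerates all 64 triplets and sets aside the three stop codons named in the text. It is an arithmetic illustration only and says nothing about which amino acid each codon encodes.

```python
from itertools import product

BASES = "UCAG"                       # the four RNA bases
STOP_CODONS = {"UAA", "UAG", "UGA"}  # termination codons in the standard code
START_CODON = "AUG"                  # standard initiation codon
AMINO_ACIDS_NEEDED = 20


def code_capacity(word_length: int) -> int:
    """Number of distinct base sequences of the given length (4 ** word_length)."""
    return len(BASES) ** word_length


def all_codons() -> list[str]:
    """Every possible triplet: ordered selections of three bases, repeats allowed."""
    return ["".join(t) for t in product(BASES, repeat=3)]


if __name__ == "__main__":
    for k in (1, 2, 3):
        n = code_capacity(k)
        print(f"{k} base(s) per amino acid: {n} combinations, enough: {n >= AMINO_ACIDS_NEEDED}")
    codons = all_codons()
    sense = [c for c in codons if c not in STOP_CODONS]
    print(len(codons), "codons,", len(sense), "of them not stop codons")
```

Running it reports 4, 16, and 64 combinations (only the last is enough for 20 amino acids), and 64 codons of which 61 are not stop codons.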
The genetic code was largely deciphered between 1961 and 1966. The American scientist Marshall Nirenberg (1927–2010), working with his colleague Heinrich Matthaei, made the first breakthrough in 1961 by using synthetic mRNA to direct protein synthesis. Synthetic RNA could be made by mixing RNA nucleotides carrying the four bases A, G, C, and U with the enzyme polynucleotide phosphorylase, which joins them into a single-stranded RNA chain, incorporating the nucleotides at random. This made it possible to create mRNAs of known composition and see which amino acids they specified. The first synthetic mRNA they tested contained only uracil (U); when mixed in vitro with the protein-synthesizing machinery of Escherichia coli, it produced polyphenylalanine, a chain made up solely of the amino acid phenylalanine. From this it was concluded that the triplet UUU codes for phenylalanine. Similarly, an RNA polymer containing only cytosine (C) produced only the amino acid proline, so CCC had to be a codon for proline. The approach was refined by mixing nucleotides in different proportions in the synthetic mRNA and using statistical analysis to relate the composition of the RNA to the amino acids produced. It was quickly found that a particular amino acid could be specified by more than one codon; serine, for example, is specified by UCU, UCC, UCA, and UCG. The genetic code is therefore said to be degenerate, meaning that each of the 64 possible triplets has some meaning within the code and that several codons may encode a single amino acid.
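The logic of these experiments can be sketched in Python. `PARTIAL_CODE` below is a deliberately tiny, hypothetical lookup table holding only the codon assignments mentioned in the text, not the full 64-entry code. `translate` reproduces the poly-U and poly-C results, `synonymous_codons` illustrates degeneracy, and `triplet_frequencies` shows the kind of idealized calculation behind the mixed-copolymer work: assuming bases are incorporated independently in proportion to their share of the mixture, the expected frequency of a triplet is the product of its three base fractions. The 5:1 U:C mixture used below is an arbitrary illustrative choice, not a value from the original experiments.

```python
from fractions import Fraction
from itertools import product

# Hypothetical partial table: only codon assignments mentioned in the text.
PARTIAL_CODE = {
    "UUU": "Phe",
    "CCC": "Pro",
    "UCU": "Ser", "UCC": "Ser", "UCA": "Ser", "UCG": "Ser",
}


def translate(mrna: str) -> list[str]:
    """Translate complete codons from the first base using PARTIAL_CODE."""
    return [PARTIAL_CODE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3)]


def synonymous_codons(amino_acid: str) -> list[str]:
    """All codons in PARTIAL_CODE that specify the given amino acid, sorted."""
    return sorted(c for c, aa in PARTIAL_CODE.items() if aa == amino_acid)


def triplet_frequencies(base_fractions: dict[str, Fraction]) -> dict[str, Fraction]:
    """Expected frequency of every triplet in a random copolymer.

    Assumes each base is incorporated independently with the probability
    given in base_fractions (an idealization of the real reaction).
    """
    bases = sorted(base_fractions)
    return {
        "".join(t): base_fractions[t[0]] * base_fractions[t[1]] * base_fractions[t[2]]
        for t in product(bases, repeat=3)
    }


if __name__ == "__main__":
    print(translate("UUUUUUUUU"))    # ['Phe', 'Phe', 'Phe']
    print(translate("CCCCCC"))       # ['Pro', 'Pro']
    print(synonymous_codons("Ser"))  # ['UCA', 'UCC', 'UCG', 'UCU']
    freqs = triplet_frequencies({"U": Fraction(5, 6), "C": Fraction(1, 6)})
    print(freqs["UUU"], freqs["UUC"], freqs["CCC"])  # 125/216 25/216 1/216
```

Comparing such expected triplet frequencies with the measured amounts of each amino acid incorporated is what allowed base compositions to be assigned to codons before their exact order was known.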
This work confirmed the ideas of the British scientist Francis Crick (1916–2004) and the South African-born Sydney Brenner (1927–2019). Brenner and Crick were studying mutations in the bacterial virus bacteriophage T4 and found that the deletion of a single nucleotide could abolish the function of a specific gene. However, a second mutation that inserted a nucleotide at a different but nearby position restored the gene's function. Such mutations are said to suppress each other, meaning that they cancel each other's mutant effects. From this it was concluded that the genetic code is read sequentially, starting from a fixed point in the gene. The insertion or deletion of a nucleotide shifts the reading frame in which the succeeding nucleotides are read as codons, and such a change was therefore termed a frameshift mutation. It was also found that although two closely spaced deletions, or two closely spaced insertions, could not suppress each other, three closely spaced deletions or insertions could. These observations established the triplet nature of the genetic code. The reading frame of a sequence is the way the sequence is divided into triplets, and it is determined by the precise point at which translation begins. For example, the sequence CATCATCAT can be read as CAT CAT CAT, C ATC ATC AT, or CA TCA TCA T in its three possible reading frames. In some bacterial viruses, genes have been found that lie within other genes. These are translated in different reading frames, so the proteins they encode have different amino acid sequences. Such economy of genetic material is, however, quite rare.
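Reading frames and frameshift suppression can be illustrated with simple string operations. The sketch below reproduces the three readings of CATCATCAT given in the text, then applies single, double, and triple deletions, and a deletion followed by a nearby insertion, to a toy repeating sequence. The sequence, the mutated positions, and the inserted base are arbitrary illustrative choices (not the actual T4 gene studied), and the helper names are hypothetical.

```python
def reading_frames(seq: str) -> list[list[str]]:
    """Divide seq into triplets in each of the three forward reading frames.

    Bases before the frame's start point, and any incomplete triplet at the
    end, are kept as short chunks to match the notation 'C ATC ATC AT'.
    """
    frames = []
    for offset in range(3):
        chunks = [seq[:offset]] if offset else []
        rest = seq[offset:]
        chunks += [rest[i:i + 3] for i in range(0, len(rest), 3)]
        frames.append(chunks)
    return frames


def triplets(seq: str) -> list[str]:
    """Complete codons read from the first base; a trailing partial codon is dropped."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def delete_bases(seq: str, positions: list[int]) -> str:
    """Remove bases at the given 0-based positions (all relative to the input seq)."""
    drop = set(positions)
    return "".join(b for i, b in enumerate(seq) if i not in drop)


def insert_base(seq: str, position: int, base: str) -> str:
    """Insert one base before the given 0-based position."""
    return seq[:position] + base + seq[position:]


if __name__ == "__main__":
    wild_type = "CATCATCATCATCAT"  # reads CAT CAT CAT CAT CAT
    print(reading_frames("CATCATCAT"))
    # [['CAT', 'CAT', 'CAT'], ['C', 'ATC', 'ATC', 'AT'], ['CA', 'TCA', 'TCA', 'T']]

    one_del = delete_bases(wild_type, [1])
    print(triplets(one_del))                          # ['CTC', 'ATC', 'ATC', 'ATC']
    print(triplets(insert_base(one_del, 7, "G")))     # ['CTC', 'ATC', 'AGT', 'CAT', 'CAT']
    print(triplets(delete_bases(wild_type, [1, 4])))     # ['CTC', 'TCA', 'TCA', 'TCA']
    print(triplets(delete_bases(wild_type, [1, 4, 7])))  # ['CTC', 'TCT', 'CAT', 'CAT']
```

A single deletion garbles every codon downstream, and so does a pair of deletions; a compensating insertion, or a third nearby deletion, brings the downstream codons back into the original frame, leaving only a short altered stretch between the mutations.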
The same genetic code appears to operate in almost all living things, but exceptions to this universality are known. In human mitochondrial mRNA, for example, AGA and AGG act as termination (stop) codons rather than coding for arginine. Other differences also exist: in vertebrate mitochondria, UGA codes for tryptophan instead of signaling termination, and AUA codes for methionine instead of isoleucine.
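A variant genetic code can be thought of as a lookup table in which a few entries of the standard code are overridden. The sketch below assumes the differences listed for the vertebrate mitochondrial code in NCBI translation table 2 (AUA codes for Met, UGA for Trp, and AGA and AGG are stops). The tables contain only those four codons, so this is a toy, and the names `STANDARD_SUBSET`, `VERT_MITO_OVERRIDES`, `make_code`, and `translate` are hypothetical rather than any library's API.

```python
# Standard-code meanings of the four codons that differ in vertebrate mitochondria.
STANDARD_SUBSET = {"AUA": "Ile", "UGA": "Stop", "AGA": "Arg", "AGG": "Arg"}
# Vertebrate mitochondrial reassignments of those same codons.
VERT_MITO_OVERRIDES = {"AUA": "Met", "UGA": "Trp", "AGA": "Stop", "AGG": "Stop"}


def make_code(base: dict[str, str], overrides: dict[str, str]) -> dict[str, str]:
    """Return a new table: a copy of base with the override entries applied."""
    code = dict(base)
    code.update(overrides)
    return code


def translate(mrna: str, code: dict[str, str]) -> list[str]:
    """Translate codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]  # KeyError for codons outside this toy table
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


MITO = make_code(STANDARD_SUBSET, VERT_MITO_OVERRIDES)

if __name__ == "__main__":
    message = "AUAUGAAGA"                      # codons AUA UGA AGA
    print(translate(message, STANDARD_SUBSET))  # ['Ile']
    print(translate(message, MITO))             # ['Met', 'Trp']
```

The same message yields a different product under each table, which is why mitochondrial genes must be read with their own code.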