█ BRIAN HOYLE
Sequencing refers to the techniques used to determine the order of the constituent bases (adenine, thymine, guanine, and cytosine) of deoxyribonucleic acid (DNA), or the order of the constituent amino acids of a protein. Sequencing is increasingly important in forensic science and in the rapid, positive identification of potential pathogens that could be exploited by bioterrorists.
DNA is typically sequenced for several reasons: to determine the sequence of the protein encoded by the DNA, the location of sites at which restriction enzymes can cut the DNA, the location of DNA sequence elements that regulate the production of messenger RNA, or alterations in the DNA.
The sequencing of DNA is accomplished by stopping the lengthening of a DNA chain at a known base and at a known location in the DNA. Practically, this can be done in two ways. In the first method, called the Sanger-Coulson procedure, a small amount of a specific dideoxynucleoside base is added along with a mixture of the four normal bases. This base differs slightly from the normal base and is radioactively labeled. When the radioactive base is incorporated into the growing DNA chain instead of the normal base, growth of the chain stops. The reaction is run four times, each time using one of the four different dideoxynucleosides, which generates four collections of DNA molecules. Because replication of the DNA always begins at the same point, and because the amount of altered base added is low, each reaction yields many DNA pieces of different lengths, each ending at a position occupied by the base in question. When the four samples are separated side by side by gel electrophoresis, the different-sized pieces are resolved as radioactive bands in the gel. With the position of each base thus known, the sequence of the DNA can be deduced.
The second DNA sequencing technique is known as the Maxam-Gilbert technique, after its developers. In this technique, both strands of double-stranded DNA are radioactively labeled using radioactive phosphorus. Upon heating, the DNA strands separate and can be physically distinguished from each other, as one strand is heavier than the other. Both strands are then cut at specific bases using chemical reagents, and the different-sized fragments of DNA are separated by gel electrophoresis. The DNA sequence is determined from the pattern of fragments.
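The gel read-out step of the Sanger-Coulson procedure described above can be sketched in Python. This is an idealized illustration, not laboratory software: it assumes every possible stopping position is represented in each of the four reactions, and the function names and the example strand are hypothetical.

```python
def termination_lanes(strand):
    """Return, for each of the four reactions, the lengths of the stopped chains.

    Idealized assumption: every position holding a given base produces
    at least one chain that stops exactly there.
    """
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(strand, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes):
    """Read the bands from shortest to longest chain to recover the sequence."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


strand = "GATTACAGGC"  # hypothetical newly synthesized strand
lanes = termination_lanes(strand)
print(lanes["A"])       # chain lengths seen in the "A" reaction: [2, 5, 7]
print(read_gel(lanes))  # GATTACAGGC
```

Each band length appears in exactly one lane, so sorting the bands by length and noting the lane of each one gives the order of the bases.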
The Sanger-Coulson procedure is the more popular method. Various modifications have been developed, and it has been automated for very large-scale sequencing. During the sequencing of the human genome, a method called shotgun sequencing was employed very successfully. Shotgun sequencing uses enzymes to cut DNA into hundreds or thousands of random fragments. So many fragments are necessary because automated sequencing machines can decipher only relatively short fragments of DNA, about 500 bases long. Computers then piece the many overlapping sequences back together to generate the entire genome sequence.
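The piecing together of fragments can be illustrated with a toy greedy merge in Python. This is a minimal sketch under strong assumptions (error-free reads from a single strand and no repeated regions), not the method used by any particular genome project. The function names, the minimum overlap of three bases, and the example reads are all hypothetical.

```python
def overlap(left, right, min_length=3):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(reads):
            for j, right in enumerate(reads):
                if i != j:
                    size = overlap(left, right, min_length)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining overlaps; leave the pieces unmerged
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three hypothetical overlapping reads, supplied out of order.
reads = ["TAGCCTAGGA", "ATGCGTACG", "TACGTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGCCTAGGA']
```

Real assemblers must also cope with sequencing errors, reads from both strands, and repeated regions, all of which this sketch ignores.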
Protein sequencing involves determining the arrangement of the amino acid building blocks of the protein. A protein is commonly sequenced indirectly, by deducing its amino acid sequence from the DNA sequence that encodes it. This, however, is possible only if a cloned gene is available. Often, chemical protein sequencing, as described below, must still be performed in order to manufacture an oligonucleotide probe that can then be used to locate the target gene. The most popular direct chemical sequencing technique in use today is the Edman degradation procedure. This is a series of chemical reactions that removes one amino acid at a time from one end of the protein (the amino terminus). Each released amino acid is chemically modified in the release reaction, which allows it to be detected by a technique called reverse-phase chromatography. The released amino acids are identified one after another, producing the amino acid sequence of the protein.
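The cycle-by-cycle bookkeeping of the Edman degradation can be sketched in a few lines of Python. The chemistry and the chromatographic detection are not modeled: the sketch simply assumes each cycle releases and correctly identifies the residue at the amino terminus. The function name and the example peptide are hypothetical.

```python
def edman_cycles(peptide):
    """Yield (cycle number, released residue, remaining peptide) for each cycle."""
    remaining = peptide
    cycle = 0
    while remaining:
        cycle += 1
        # Each cycle removes exactly one residue from the amino terminus.
        released, remaining = remaining[0], remaining[1:]
        yield cycle, released, remaining


peptide = "GAKFLR"  # hypothetical peptide, one-letter amino acid codes
for cycle, released, remaining in edman_cycles(peptide):
    print(cycle, released, remaining)

# Joining the residues in order of release reproduces the sequence.
print("".join(residue for _, residue, _ in edman_cycles(peptide)))
```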
Another protein sequencing technique is called fast atom bombardment mass spectrometry, or FAB-MS. This is a powerful technique in which the sample is bombarded with a stream of fast atoms, such as argon. The protein becomes charged and fragmented in a sequence-specific manner. The fragments can be detected and their identity determined. The expense and relative scarcity of the necessary equipment can limit the use of the technique.
Still another protein sequencing strategy is the digestion of the protein with specialized protein-degrading enzymes called proteases. The shorter fragments that are generated, called peptides, can then be sequenced. The problem then is to order the peptides. This is done by using two proteases that cut the protein at different points, generating overlapping peptides. The peptides are separated and sequenced, and from the patterns of overlap the order of the peptides, and hence the protein sequence, can be deduced.
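The overlap reasoning can be illustrated with a small Python sketch. It assumes simplified cleavage rules: a trypsin-like protease that cuts after lysine (K) or arginine (R), and a chymotrypsin-like protease that cuts after phenylalanine (F), tyrosine (Y), or tryptophan (W). Real enzymes have exceptions that are ignored here. The function names and the example protein are hypothetical, and trying every ordering is practical only for a handful of peptides.

```python
from itertools import permutations


def digest(protein, cut_after):
    """Cut `protein` after every residue listed in `cut_after`."""
    peptides, start = [], 0
    for index, residue in enumerate(protein):
        if residue in cut_after:
            peptides.append(protein[start:index + 1])
            start = index + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def order_peptides(first_digest, second_digest):
    """Return each ordering of the first digest that contains every peptide
    of the second digest as a contiguous stretch."""
    candidates = set()
    for ordering in permutations(first_digest):
        candidate = "".join(ordering)
        if all(peptide in candidate for peptide in second_digest):
            candidates.add(candidate)
    return sorted(candidates)


protein = "GAKFLRMYSKWE"  # hypothetical protein, one-letter codes
tryptic = digest(protein, "KR")          # ['GAK', 'FLR', 'MYSK', 'WE']
chymotryptic = digest(protein, "FYW")    # ['GAKF', 'LRMY', 'SKW', 'E']

# Pretend the original order of the tryptic peptides is unknown.
shuffled = ["WE", "MYSK", "GAK", "FLR"]
print(order_peptides(shuffled, chymotryptic))  # ['GAKFLRMYSKWE']
```

Each peptide from the second digest spans a junction between two peptides of the first digest, so only one ordering is consistent with all of them.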