DNA Sequences, Unique
█ AGNIESZKA LICHANSKA
Deoxyribonucleic acid (DNA) carries the genetic information of an organism, and this information is unique to each organism. The entire cellular DNA of any organism, whether bacterium, plant, or animal, is known as its genome, as is the entire genetic material of a virus. A DNA sequence is considered unique if it is present in only one copy in a haploid genome, that is, a genome containing only a single copy of each chromosome. In humans, for example, the haploid number of chromosomes is 23. However, not all of the DNA in a genome is unique; genomes also contain various repetitive sequences.
DNA and Genome Structure
A DNA strand is a chain of nucleotides (the nitrogen-containing building blocks of DNA and RNA). Each nucleotide consists of a phosphate attached to a sugar molecule (deoxyribose) and one of four bases: guanine (G), cytosine (C), adenine (A), or thymine (T). It is the order of the bases in a sequence, for example ATTGCCAT, that determines the encoded gene, and this sequence allows scientists to identify organisms, genes, or fragments of genes. One of the main characteristics of DNA is that it forms double-stranded molecules (helices): the bases of the two complementary strands are joined by hydrogen bonds inside the helix, while the sugar-phosphate backbones lie on the outside. This pairing is not random. A always pairs with T, and C pairs with G; therefore, the sequence complementary to ATTCCGAT is TAAGGCTA.
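The pairing rule can be expressed as a simple lookup table. The minimal Python sketch below (the names `PAIRS` and `complement` are illustrative, not taken from any library) builds the base-by-base complementary sequence and reproduces the example above; it assumes the input contains only the letters A, C, G, and T.

```python
# Base-pairing rules from the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq: str) -> str:
    """Return the base-by-base complementary sequence.

    Raises KeyError for any character other than A, C, G or T.
    """
    return "".join(PAIRS[base] for base in seq.upper())


print(complement("ATTCCGAT"))  # TAAGGCTA
```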
Genes are the sequences that encode proteins. Together with their surrounding regulatory sequences, they are considered unique genomic sequences because they are present as single copies in a haploid genome. In contrast, some sequences are present in multiple copies and are known as repetitive sequences. The simple genomes of viruses and bacteria consist mostly of unique sequences with only a few repetitive regions. The proportion of repetitive DNA, however, increases in higher organisms: sea urchins, for example, have only 38% unique sequences, and humans just over 50%.
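The distinction between unique and repetitive DNA comes down to copy number. As a toy illustration only, the sketch below cuts a sequence into overlapping fragments of a chosen length `k` and reports the fraction of positions whose fragment occurs exactly once. The function name, the fragment length, and the single-strand treatment are all assumptions; this is not the laboratory method behind the percentages quoted above.

```python
from collections import Counter


def unique_fraction(genome: str, k: int) -> float:
    """Fraction of k-base windows whose sequence occurs only once.

    Toy model: overlapping windows on a single strand; a window counts as
    'unique' when its sequence has a copy number of one.
    """
    windows = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(windows)
    return sum(1 for w in windows if counts[w] == 1) / len(windows)


# "ACGT" appears twice among the five 4-base windows, so 3 of 5 are unique.
print(unique_fraction("ACGTACGT", 4))  # 0.6
```

A sequence built from one short repeat scores 0.0, while one made entirely of distinct fragments scores 1.0.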
Genes encoding the same protein in bacteria, plants, and humans show some similarity, because most of the encoded proteins perform the same or similar functions across species. Such homology allows scientists to identify human genes by using fragments of mouse or yeast genes to search for similar DNA fragments. Although most genes show some species-dependent differences, only a few can be used to discriminate between organisms. The two main groups are ribosomal genes (16S in bacteria and 18S in animals) and mitochondrial genes.
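Homology searches rest on a measure of how similar two sequences are. A minimal sketch, assuming the two sequences have already been aligned to the same length (real searches rely on dedicated alignment software such as BLAST), is the percentage of identical positions. The name `percent_identity` and the example fragments are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions carrying the same base.

    Assumes both sequences are already aligned to equal length; gap
    characters ('-') never count as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


# Hypothetical fragments of the same gene from two species (one difference).
print(percent_identity("ATGGCTAGCA", "ATGGCAAGCA"))  # 90.0
```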
Ribosomal genes are useful for tracing evolution and relationships between organisms, especially bacteria. Mitochondrial genes, however, have an advantage over ribosomal genes: they are not part of the nuclear DNA but are carried on circular molecules in the mitochondria. As such, they are less likely to be degraded with time, so bones, teeth, or tissue fragments can be identified even after a long time.
Exploiting Unique DNA Sequences
The presence of unique DNA sequences allows scientists to identify signature sequences that can later be used as probes to detect individual organisms or particular genes. A change of even one base pair can be readily detected by most hybridization techniques and by sequencing. Signature sequences are particularly important for diagnosing viruses, pathogens that lack ribosomal and mitochondrial genes. Using these sequences greatly simplifies and speeds up viral detection and identification, whereas traditional methods can take up to a few weeks.
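Detecting a signature sequence can be pictured as sliding a probe along a target and counting mismatched bases at each position. The sketch below is a simplification under stated assumptions: it compares the probe directly with one strand of the target (rather than modeling hybridization to the complementary strand), ignores binding chemistry, and uses a hypothetical mismatch threshold. Its point is that a single-base change is reported as a near match rather than missed.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_signature(target: str, probe: str, max_mismatches: int = 1) -> list:
    """Return (position, mismatches) for every window of the target that
    matches the probe with at most max_mismatches differing bases."""
    hits = []
    n = len(probe)
    for i in range(len(target) - n + 1):
        mismatches = hamming(target[i:i + n], probe)
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


target = "CCGATTACATTGATTGCAGG"  # hypothetical target DNA
probe = "GATTACA"                # hypothetical signature sequence
print(find_signature(target, probe))  # [(2, 0), (11, 1)]
```

The hit at position 11 differs from the probe by one base (G instead of A), the kind of single base-pair change described above.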
Unique DNA sequences can also be used to design primers (short DNA fragments needed to initiate DNA amplification) for the polymerase chain reaction (PCR). The genes within one organism, as well as the genes of different species, differ enough to ensure that carefully selected primers amplify only the target sequence, even when a mixture of different DNA molecules is present. This allows scientists to design diagnostic and identification tests for common pathogens and diseases, including tests aimed at specific parts of a pathogen's genome.
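Primer specificity can be checked in silico before any laboratory work. The sketch below, whose function names, example sequences, and `max_len` cut-off are hypothetical, looks for exact matches of the forward primer and of the reverse complement of the reverse primer on one strand of a template, and lists the products they would bracket. A single product suggests the primer pair is specific; real primer design also weighs melting temperature, partial matches, and both strands, all of which this sketch ignores.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Complementary strand, read in the opposite direction."""
    return "".join(PAIRS[base] for base in reversed(seq))


def find_all(text: str, pattern: str) -> list:
    """Start positions of every exact (possibly overlapping) occurrence."""
    positions = []
    i = text.find(pattern)
    while i != -1:
        positions.append(i)
        i = text.find(pattern, i + 1)
    return positions


def predicted_products(template: str, forward: str, reverse: str,
                       max_len: int = 2000) -> list:
    """Sequences bracketed by a forward-primer site and a downstream
    reverse-primer site on the given strand (exact matches only)."""
    # Sequence of the reverse primer's binding region as read on this strand.
    site = reverse_complement(reverse)
    products = []
    for f in find_all(template, forward):
        for r in find_all(template, site):
            end = r + len(site)
            if r >= f + len(forward) and end - f <= max_len:
                products.append(template[f:end])
    return products


template = "TTTTACGTACGGGCCCGTTCAGAAAA"  # hypothetical target region
print(predicted_products(template, "ACGTAC", "CTGAAC"))  # ['ACGTACGGGCCCGTTCAG']
```

Prepending unrelated DNA to the template (a crude stand-in for a mixture) still yields exactly one product, because the extra sequence contains no primer sites.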
Identification of people. Although every person (except for identical twins) has unique DNA, people are not identified by sequencing their entire genomes. Instead, identification relies on analysis of short tandem repeats (STRs) or of a region of mitochondrial DNA called the displacement loop (D-loop, or control region). D-loop analysis is used to identify individuals in forensic work. It is possible because the D-loop sequence is polymorphic: it varies between individuals as a result of base-pair substitutions during DNA replication (for example, DNA polymerase incorporating T instead of A).
The D-loop region is 1274 base pairs long. It is located between the genes encoding the transfer RNAs (tRNAs) for proline and phenylalanine, and it contains the regulatory regions for replication and for other genes.
The main method for identifying changes in this region is PCR amplification followed by sequencing, although new microarray approaches are under development.
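Once the D-loop has been sequenced, its differences from a reference sequence can be listed position by position. This sketch assumes the sample and reference are already aligned and of equal length (so insertions and deletions are ignored); the numbering offset, function name, and example sequences are hypothetical. It only illustrates how substitutions such as A to G become a comparable profile.

```python
def list_substitutions(reference: str, sample: str, first_position: int = 1) -> list:
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumes pre-aligned sequences of equal length; positions are numbered
    from first_position (a hypothetical offset into the region).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (first_position + i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


reference = "ACCTAGCATT"  # hypothetical reference fragment
person_1 = "ACCTGGCATT"
person_2 = "ACCTAGCATC"
print(list_substitutions(reference, person_1))  # [(5, 'A', 'G')]
print(list_substitutions(reference, person_2))  # [(10, 'T', 'C')]
```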
Encoding secret messages. DNA sequences offer a unique method of encrypting messages or concealing information. A DNA sequence encoding a message is flanked on both sides by primer sequences, which are later used to amplify it by PCR and to sequence it. The group using the system selects an encryption code; for example, each letter and number might be assigned three base pairs. The DNA strand carrying the message is prepared and mixed with human genomic DNA fractionated to the same size as the message. To further conceal the message from an enemy, DNA from another species can be added. The intended recipient decodes the message by PCR amplification and sequencing. Sending such a message is as simple as writing a letter and enclosing the DNA-coded message as a microdot. Once the DNA mix is prepared, it is spotted onto a dot on paper; microdots are then cut out of the paper and attached over the full stops in the letter. If such a letter falls into the wrong hands, finding the message will be extremely difficult, as it is buried among millions of other sequences, and reading it without the primer sequences and the encryption code will be impossible.
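The three-bases-per-symbol scheme can be sketched in a few lines of Python. Everything specific here is hypothetical: the codebook (letters, digits, and a space assigned to the first 37 of the 64 possible triplets in alphabetical order), the two flanking primer sites, and the function names. Decoding simply locates the primer sites and translates the sequence between them, standing in for PCR amplification and sequencing.

```python
from itertools import product

# Hypothetical codebook: 37 symbols mapped to the first 37 of the 64 triplets.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 "
CODEBOOK = {ch: "".join(t) for ch, t in zip(ALPHABET, product("ACGT", repeat=3))}
DECODEBOOK = {triplet: ch for ch, triplet in CODEBOOK.items()}

# Hypothetical flanking primer sites agreed on by sender and recipient.
FORWARD_SITE = "TCTAGGCCTAGC"
REVERSE_SITE = "GGATCCTTAGCA"


def encode(message: str) -> str:
    """Translate each symbol into three bases and flank with primer sites."""
    body = "".join(CODEBOOK[ch] for ch in message.upper())
    return FORWARD_SITE + body + REVERSE_SITE


def decode(strand: str) -> str:
    """Locate the primer sites and translate the triplets between them."""
    start = strand.index(FORWARD_SITE) + len(FORWARD_SITE)
    end = strand.rindex(REVERSE_SITE, start)
    body = strand[start:end]
    return "".join(DECODEBOOK[body[i:i + 3]] for i in range(0, len(body), 3))


secret = encode("MEET AT NOON")
print(secret)
print(decode(secret))  # MEET AT NOON
```

Without the codebook and the primer sites, the encoded strand is just another stretch of A, C, G, and T.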
DNA encrypted messages can be used for safekeeping important information, but also to pass on espionage information. Although the method is simple, it requires molecular biology equipment to decode and can be too troublesome for everyday use.
Use of Unique DNA Sequences
Unique DNA sequences are already used as security tools. The ability to synthesize DNA molecules allows the creation not only of spy messages but, more importantly, of unique signatures that protect consumers from counterfeit products. A similar approach was used at the 2000 Sydney Olympic Games to mark all official merchandise; in that case, an invisible ink mixed with DNA obtained from one of the athletes was used. Protection is not limited to manufacturers: artists such as Thomas Kinkade and cartoon creator Joseph Barbera also protect their artwork with DNA signatures.
The major security use of unique DNA sequences, however, is in environmental surveillance and the identification of biological warfare agents. The sequences used for these purposes are often kept secret, and most producers of DNA recognition instruments use such sequences to design their products.
Finally, forensic science often relies on unique sequences to identify biological traces and individuals.