Codes and Ciphers
█ LARRY GILMAN
Codes and ciphers are forms of cryptography, a term from the Greek kryptos , hidden, and graphia , writing. Both transform legible messages into series of symbols that are intelligible only to specific recipients. Codes do so by substituting arbitrary symbols for meanings listed in a codebook; ciphers do so by performing rule-directed operations directly on original message text. Because codes can only communicate concepts that are listed in their codebooks, they have limited flexibility. Rather, modern cryptography relies almost entirely on ciphers implemented by digital computers, and is widely employed in industry, diplomacy, espionage, warfare, and personal communications.
Codes. A code is a set of symbolic strings ("code groups") that are listed, along with their assigned meanings, in a code book.
Codes encrypt messages by substitution, that is, they substitute code groups for components of the original message. "Kill the king at midnight" could thus be encoded, for example, as "OAKEN 7890 SPINDRIFT." Without the code book, it would be difficult for a reader of the encoded message to form an idea of its meaning.
Either a word or a number can be used as a code group. Code groups that are words are termed code words and those that are numbers are termed code numbers. Note that a single code group can encode a single word ("king") or an entire phrase ("deliver the films to agent number 3"). A coded message may, therefore, be shorter than the original message. It can also be made as long as or longer than the original message, if the codebook
provides lengthy code phrases for single concepts or nonsense code groups for padding purposes. Such techniques can be used to make encoded messages harder for opponents to read.
Ciphers. A cipher uses a system of fixed rules (an "algorithm") to transform a legible message ("plaintext") into an apparently random string of characters ("ciphertext"). For example, a cipher might be defined by the following rule: "For every letter of plaintext, substitute a two-digit number specifying the plaintext letter's position in the alphabet plus a constant between 1 and 73 that shall be agreed upon in advance." If 46 is the agreed-upon constant, then the plaintext word ZAP enciphers to 724762 as follows:
- Plaintext letter Z = ciphertext 72 (alphabet position 26 + 46).
- Plaintext letter A = ciphertext 47 (alphabet position 1 + 46).
- Plaintext letter P = ciphertext 62 (alphabet position 16 + 46).
Incorporation of a variable term into a fixed algorithm, as in this example, is typical of real-world ciphers. The variable component is termed a key. A real key would be longer and would have a more complex relationship to the cipher algorithm than the key in this example, but its basic role would be the same: a key fits into an algorithm so as to enable enciphering and deciphering, just as a physical key fits into a lock to enable locking and unlocking. Without a key, a cipher algorithm is missing an essential part. In fact, so important is the concept of the key that in real-world ciphering it is not algorithms that are kept secret, but keys. Cipher designers assume that their algorithms will always become known to their opponents, but design the relationship between key and algorithm so that even knowing the algorithm it is almost impossible to decipher a ciphertext without knowing the appropriate key. Before a cipher can work, therefore, a key or set of keys must be in the possession of both the sender and the receiver.
If the key were always the same, it would simply constitute a permanent part of the algorithm, and keying would have no special advantage over trying to keep one's algorithm secret to begin with. Keys must, therefore, be changed occasionally. A new key may be employed every day, for every message, or on some other schedule.
Comparison of codes and ciphers. Codes have the advantage of simplicity. No calculations are required to encode or decode messages, only lookups in a codebook. Further, because a code uses no fixed system for associating code groups with their meanings (even the amount of meaning assigned to a code word can vary, as seen above), a code may fail gracefully—that is, an enemy may discern the meaning of a few code groups but still be unable to interpret others. In contrast, a cipher produces ciphertext from plaintext (and vice versa) according to a fixed algorithm. Thus, if an enemy determines the algorithm and steals or guesses a key, they can at once interpret all messages sent using that key. Changing the key may restore cipher security, unless the enemy has developed a system for guessing keys. One such system, always possible in theory, is to try all possible keys until one is found that works.
Codes, however, have two great disadvantages. Users can only send messages that can be expressed using the terms defined in the codebook, whereas ciphers can transmit all possible messages. Additionally, all codes are vulnerable to codebook capture. If a codebook is captured, there is no recourse but to distribute new codebooks to all users. In contrast, the key–algorithm concept makes cipher secrecy dependent on small units of information (keys) that can be easily altered.
Secure ciphers, however, entail complex calculations. This made the use of complex ciphers impractical before
the invention of ciphering machines in the early twentieth century; codes and simple ciphers were the only feasible methods of ciphering. Yet, a cipher that is simple to implement is proportionately simple to crack, and a cracked cipher can be disastrous. It is better to have to communicate "in the clear"—to send messages that can be easily read by the enemy—than to suppose that one's communications are secret when they are not. Mary, Queen of Scots (1542–1567) was executed for treason on the basis of deciphered letters that frankly discussed plans for murdering Queen Elizabeth of England; likewise, simple ciphers used by the Confederacy during the U.S. Civil War were easily cracked by Union cryptographers. What is more, even more sophisticated ciphers, such as the Enigma cipher used by Nazi Germany during World War II or implemented today on digital computers, are subject to attack. As soon as any new cipher is invented, someone, somewhere starts attacking it. The result is that ciphers, like some antibiotics, have limited lifespans, and must be regularly replaced.
Historical perspective. Throughout much of the ancient world, writing was either completely unknown or was an arcane art accessible only to priests. There was little motive, therefore, to develop coding or ciphering. Eventually, however, writing came to serve military, personal, and commercial as well as sacred purposes, creating a need for secure communications. To meet this need, ciphers based on scrambling the order of plaintext characters or on substituting other characters for them were developed. The first recorded use of ciphering was by the Greek general Lysander in the fifth century B.C. The Kamasutra , a Hindu text compiled in the A.D. fourth century from manuscripts dating back as far as the fourth century B.C. , recommends monoalphabetic substitution ciphering—the replacement of each letter of a plaintext message with a different letter of the alphabet—as one of the 64 arts to be mastered by an ideally-educated woman. By the first century B.C. , codes had also been developed.
Cryptography fell out of use during the early Middle Ages, but Arab scholars during the heyday of medieval Muslim civilization, the Abbasid caliphate ( A.D. 750–1258), revived it. Muslim writers not only ciphered, but invented cryptanalysis , the systematic breaking of ciphers. Ninth-century Arab philosopher Abu Yusuf al-Kindi wrote the earliest known description of the cryptanalytic technique known as frequency analysis, which breaks substitution ciphers by matching ciphertext letters with plaintext letters according to their frequency of use in the language. In English, for example, the most frequently used letter is E; in an English-language ciphertext produced using a monoalphabetic substitution cipher, therefore, the most frequently used character probably stands for E.
During the late Middle Ages and the Renaissance, a literate ruling class arose throughout Europe, and ciphering regained importance in that part of the world for purposes of intrigue, espionage, and war. English monk and scientist Roger Bacon (1220–1292) wrote a book describing several cryptographic methods; Italian artist Leon Battista Alberti (1404–1472) wrote the first European text on cryptanalysis in 1466. Under pressure from cryptanalysis, codes and cipher systems gradually became more complex.
Beginning in the mid-nineteenth century, the importance of coding and ciphering was rapidly amplified by the invention of electronic information technologies: the telegraph (1837), the telephone (1876), radio (1895), and electronic computers (1940s). Non-secret commercial codes were developed in conjunction with telegraphy to make messages more compact (therefore cheaper); ciphers were widely used (and cracked) during the U.S. Civil War and the first and second world wars. The cracking of German and Japanese ciphers by Allied cryptographers during World War II was of particular importance, enabling the British and Americans to avoid submarines, intercept ships and aircraft, and otherwise frustrate enemy plans. Ciphering has since become basic to military and government communications. Since the 1960s, commercial and personal communications have become increasingly dependent on digital computers, making sophisticated ciphering a practical option for those sectors as well. In the late 1970s, the U.S. government defined a cipher algorithm for standard use by all government departments, available also to the public; this now-elderly algorithm, the Digital Encryption Standard, is today in the process of being replaced by a new algorithm, the Advanced Encryption Standard.
Types of codes. Codes can be generally divided into one-part and two-part codes. In a one-part code, the same codebook is used for encipherment and decipherment. The problem with this system is that some systematic ordering of the code groups and their assigned meanings must be made, or it will be difficult to locate code groups when enciphering or their meanings when deciphering. (A randomly ordered list of words or numbers thousands of terms long is difficult to search except by computer.) Thus, code groups tend to be arranged in alphabetic or numerical order in a one-part code, an undesirable property, since an opponent seeking to crack the code can exploit the fact that code groups that are numerically or alphabetically close probably encode words or phrases that are alphabetically close. To avoid this weakness, a two-part code employs one codebook for encipherment and another for decipherment. In the encipherment codebook, alphabetically ordered meanings (e.g., A, ABDICATE, ABLE) are assigned randomly ordered code groups (e.g., 6897, 1304, 0045). In the decipherment codebook, the code groups are arranged in order (e.g., 0045, 1304, 6897), for easy location.
Code security can be improved by combining ciphering with coding. In this technique, messages are first encoded and then enciphered; at the receiving end, they are first deciphered and then decoded. A standard method for combining coding and ciphering is the "code plus additive" technique, which employs numbers as code groups and adds a pseudorandom number to each code group to produce a disguised code group. The pseudorandom numbers used for this purpose are generated by modulo-arithmetic techniques closely related to those used in stream ciphering.
Block ciphers. Ciphers that encrypt whole blocks of characters at once—such as 10 letters at a time, or 128 bits—are termed block ciphers. Block ciphers have the advantage that each character in each ciphertext block can be made to depend complexly on all characters of the corresponding message block, thus scrambling or smearing out the message content over many characters of ciphertext. The widely used Digital Encryption Standard (DES) is a block cipher that employs a 56-bit key to encrypt 56-bit blocks. In DES, the key and each message block are used as inputs to a complex algorithm that produces a 56-bit block of ciphertext. The same key is used to decode the block of ciphertext at the receiving end.
Stream ciphers. Stream ciphers operate upon series of binary digits ("bits," usually symbolized as 1s and 0s), enciphering them one by one rather than in blocks of fixed length. In stream encipherment, a series of bits termed the key-stream is made available by some means to both the sender and receiver. This stream is as long as the message to be sent. At the sending end, the key-stream is combined with the message-stream in a bit-by-bit fashion using the exclusive or operation of Boolean algebra, producing the ciphertext. At the receiving end, the same key-stream is combined again with the ciphertext to recover the message stream. This system of ciphering is unbreakable in both theory and practice if the key-stream remains secret. Ongoing breakthroughs in quantum cryptography may soon make perfectly secret key-streams available by exploiting certain properties of photons. If these techniques can be made technologically practical, truly unbreakable cipher systems will have become available for the first time in history.
Public-key ciphers. All ciphers require the use of a secret key. Public-key ciphers, first developed in the late 1970s, are no exception. However, public-key ciphers have the important advantage that the secret key possessed by the sender need not be the same secret key possessed by the receiver; thus, no secure transfer of keys between the sender and receiver is ever necessary.
Public-key ciphers exploit the computational difficulty of discovering the prime factors of large numbers. (The prime factors of a number are the primes that, when multiplied together, produce the number: e.g., the prime factors of 15 are 5 and 3.) To create a public key, two large (50-digit or longer) primes are chosen and their product calculated. This number ( r ) is made public. Further mathematical operations by the user produce two numbers based on r ; one of these is the user's public key k p , and the other is retained as the user's private key k s . Anyone that knows r and a given user's public key k p can send encrypted messages to that particular user; the recipient decrypts the message using their private key k s .
Public-key cryptography has seen wide use since the 1970s. Its security is limited by the ability of opponents to determine the prime factors of r , and the difficulty of this task is a function both of the size of r and of the speed of available digital computers. (Large r also makes encryption and decryption more computation-intensive, so it is not practical to defeat opponents by simply making r extremely large.)
Software for a powerful public-key cipher algorithm known as Pretty Good Privacy (PGP) is downloadable for free from many sites on the Internet.
Attacking codes and ciphers. Codes and ciphers can be attacked by two basic means. The first is theft of codebooks or keys—espionage. The second is cryptanalysis, which is any attempt to crack a code or cipher without direct access to keys or codebooks. Cryptanalysis may proceed either by trial and error or by systematic analysis of plaintext and ciphertext. The analytic approach may involve both looking for patterns in ciphertext and solving mathematical equations representing the encryption algorithm.
Cryptanalysis by trial and error usually means guessing cipher keys. A cipher key can be guessed by trying all possible keys using a computer. However, designers of encryption systems are aware of this threat, and are constantly employing larger and larger keys to keep ahead of growing computer speed. Systematic cryptanalysis may seek patterns in ciphertext, either by itself or in conjunction with a known plaintext (the so-called "known-plaintext attack"). Mathematical modeling of cipher algorithms may assist trial-and-error methods by reducing the number of guesses required to within (or near) practical limits. For example, in 2002, cryptographers announced that the recently-standardized Advanced Encryption Standard of the U.S. government might be vulnerable to a mathematical attack that would reduce the number of computations needed for a successful trial-and-error attack from order 2 256 to order 2 100 . The latter number is still not computationally practical, but may be soon.
Quantum cryptography holds out the promise of truly attack-proof ciphering. In a quantum-cryptographic system, not only would messages be undecipherable if intercepted, but also the act of interception would always be detectable by the intended receiver. Such systems may become available to military and government users around 2010.
█ FURTHER READING:
Charthouse, Robert. Codes and Ciphers. Cambridge, UK: Cambridge University Press, 2002.
Meyer, Carl H., and Stephen M. Matyas. Cryptography: A New Dimension in Computer Data Security. New York: John Wiley & Sons, 1982.
Mollin, Richard A. An Introduction to Cryptography. New York: Chapman & Hall, 2001.
Singh, Simon. The Code Book. New York: Doubleday, 1999.
Stinson, Douglas R. Cryptography: Theory and Practice. New York: Chapman & Hall, 2002.
Seife, Charles. "Crucial Cipher Flawed, Cryptographers Claim." Science no. 5590 (2002): 2193.
FISH (German Geheimschreiber Cipher Machine)
French Underground During World War II, Communication and Codes
World War I: Loss of the German Codebook
World War II, United States Breaking of Japanese Naval Codes