Chapter 6. Unit Testing

6.1. Diving in

In previous chapters, we “dived in” by immediately looking at code and trying to understand it as quickly as possible. Now that you have some Python under your belt, we’re going to step back and look at the steps that happen before the code gets written.

In this chapter we’re going to write a set of utility functions to convert to and from Roman numerals. You’ve most likely seen Roman numerals, even if you didn’t recognize them. You may have seen them in copyrights of old movies and television shows (“Copyright MCMXLVI” instead of “Copyright 1946”), or on the dedication walls of libraries or universities (“established MDCCCLXXXVIII” instead of “established 1888”). You may also have seen them in outlines and bibliographical references. It’s a system of representing numbers that really does date back to the ancient Roman empire (hence the name).

In Roman numerals, there are seven characters which are repeated and combined in various ways to represent numbers.

  1. I = 1
  2. V = 5
  3. X = 10
  4. L = 50
  5. C = 100
  6. D = 500
  7. M = 1000

There are some general rules for constructing Roman numerals:

  1. Characters are additive. I is 1, II is 2, and III is 3. VI is 6 (literally, “5 and 1”), VII is 7, and VIII is 8.
  2. The tens characters (I, X, C, and M) can be repeated up to three times. At 4, you have to subtract from the next highest fives character. You can’t represent 4 as IIII; instead, it is represented as IV (“1 less than 5”). 40 is written as XL (“10 less than 50”), 41 as XLI, 42 as XLII, 43 as XLIII, and then 44 as XLIV (“10 less than 50, then 1 less than 5”).
  3. Similarly, at 9, you have to subtract from the next highest tens character: 8 is VIII, but 9 is IX (“1 less than 10”), not VIIII (since the I character cannot be repeated four times). 90 is XC, 900 is CM.
  4. The fives characters cannot be repeated. 10 is always represented as X, never as VV. 100 is always C, never LL.
  5. Roman numerals are always written highest to lowest, and read left to right, so order of characters matters very much. DC is 600; CD is a completely different number (400, “100 less than 500”). CI is 101; IC is not even a valid Roman numeral (because you can’t subtract 1 directly from 100; you would have to write it as XCIX, “10 less than 100, then 1 less than 10”).
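
To make these rules concrete, here is a minimal sketch of how they could be turned into code. It is not the implementation this chapter builds; the names ROMAN_PAIRS and sketch_to_roman are hypothetical, and the table-driven approach (listing the subtractive pairs such as IV and XC alongside the seven characters) is just one way of doing it.

```python
# Hypothetical lookup table: value/numeral pairs, largest first.
# The subtractive pairs (CM, CD, XC, XL, IX, IV) are listed explicitly,
# so the loop below never has to "look ahead" to decide when to subtract.
ROMAN_PAIRS = (
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
)

def sketch_to_roman(n):
    """Build a numeral by repeatedly taking the largest value that fits."""
    if not isinstance(n, int) or not 1 <= n <= 3999:
        raise ValueError("expected an integer from 1 to 3999")
    result = []
    for value, numeral in ROMAN_PAIRS:
        while n >= value:
            result.append(numeral)
            n -= value
    return "".join(result)

print(sketch_to_roman(44))    # XLIV
print(sketch_to_roman(99))    # XCIX
print(sketch_to_roman(1888))  # MDCCCLXXXVIII
```

Because the table runs from highest to lowest, the output always comes out in the order rule 5 demands, and 99 comes out as XCIX rather than the invalid IC.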

These rules lead to a number of interesting observations:

  1. There is only one correct way to represent a number as Roman numerals.
  2. The converse is also true: if a string of characters is a valid Roman numeral, it represents only one number (i.e. it can only be read one way).
  3. There is a limited range of numbers that can be expressed as Roman numerals, specifically 1 through 3999. (The Romans did have several ways of expressing larger numbers, for instance by having a bar over a numeral to represent that its normal value should be multiplied by 1000, but we’re not going to deal with that. For the purposes of this chapter, Roman numerals go from 1 to 3999.)
  4. There is no way to represent 0 in Roman numerals. (Amazingly, the ancient Romans had no concept of 0 as a number. Numbers were for counting things you had; how can you count what you don’t have?)
  5. There is no way to represent negative numbers in Roman numerals.
  6. There is no way to represent decimals or fractions in Roman numerals.
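
One way to see these observations in action is a pattern that accepts only well-formed numerals. The following is a rough sketch using Python’s re module; the names VALID_ROMAN and looks_like_roman are made up for illustration, and the pattern is an assumption about how the rules above could be encoded, not something this chapter has defined.

```python
import re

# Hypothetical pattern: thousands, hundreds, tens, ones, in that order.
VALID_ROMAN = re.compile(
    r"""
    M{0,3}                # thousands: at most three M, so nothing above 3999
    (CM|CD|D?C{0,3})      # hundreds: 900, 400, or an optional D plus up to three C
    (XC|XL|L?X{0,3})      # tens: 90, 40, or an optional L plus up to three X
    (IX|IV|V?I{0,3})      # ones: 9, 4, or an optional V plus up to three I
    """,
    re.VERBOSE,
)

def looks_like_roman(s):
    # Every part of the pattern is optional, so the empty string would match;
    # reject it explicitly, since there is no numeral for 0.
    return bool(s) and VALID_ROMAN.fullmatch(s) is not None

print(looks_like_roman("MCMXLVI"))  # True
print(looks_like_roman("IC"))       # False
print(looks_like_roman("MMMM"))     # False
print(looks_like_roman(""))         # False
```

Each position (thousands, hundreds, tens, ones) can be written in only one way, which is why a string that passes this check can be read as only one number.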

Given all of this, what would we expect out of a set of functions to convert to and from Roman numerals?

Requirements

  1. toRoman should return the Roman numeral representation for all integers 1 to 3999.
  2. toRoman should fail when given an integer outside the range 1 to 3999.
  3. toRoman should fail when given a non-integer number (for example, 0.5).
  4. fromRoman should take a valid Roman numeral and return the number that it represents.
  5. fromRoman should fail when given an invalid Roman numeral.
  6. If you take a number, convert it to Roman numerals, then convert that back to a number, you should end up with the number you started with. So fromRoman(toRoman(n)) == n for all n in 1..3999.
  7. toRoman should always return a Roman numeral using uppercase letters.
  8. fromRoman should only accept uppercase Roman numerals (i.e. it should fail when given lowercase input).
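
Here is a rough sketch of how these requirements might look as test cases in Python’s unittest module. Several things in it are assumptions made for illustration: the class name RomanRequirements, the choice of ValueError as the way the functions “fail”, and the stand-in toRoman and fromRoman at the top, which exist only so that the tests have something to run against. In practice the tests would come first and the real functions would be written to satisfy them.

```python
import re
import unittest

# Stand-in implementations (hypothetical, for illustration only).
_PAIRS = ((1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
          (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
          (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"))
_VALID = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")

def toRoman(n):
    if isinstance(n, bool) or not isinstance(n, int) or not 1 <= n <= 3999:
        raise ValueError("expected an integer from 1 to 3999")
    out = ""
    for value, numeral in _PAIRS:
        while n >= value:
            out += numeral
            n -= value
    return out

def fromRoman(s):
    if not isinstance(s, str) or not s or not _VALID.fullmatch(s):
        raise ValueError("invalid Roman numeral")
    total, i = 0, 0
    for value, numeral in _PAIRS:
        # The string is already validated, so a left-to-right scan suffices.
        while s.startswith(numeral, i):
            total += value
            i += len(numeral)
    return total

class RomanRequirements(unittest.TestCase):
    known = ((1, "I"), (4, "IV"), (44, "XLIV"), (1888, "MDCCCLXXXVIII"),
             (1946, "MCMXLVI"), (3999, "MMMCMXCIX"))

    def test_known_values(self):        # requirements 1 and 4
        for n, numeral in self.known:
            self.assertEqual(toRoman(n), numeral)
            self.assertEqual(fromRoman(numeral), n)

    def test_out_of_range(self):        # requirement 2
        for n in (0, -1, 4000):
            self.assertRaises(ValueError, toRoman, n)

    def test_non_integer(self):         # requirement 3
        self.assertRaises(ValueError, toRoman, 0.5)

    def test_invalid_numerals(self):    # requirement 5
        for s in ("IIII", "VV", "IC", "MMMM", ""):
            self.assertRaises(ValueError, fromRoman, s)

    def test_round_trip(self):          # requirement 6
        for n in range(1, 4000):
            self.assertEqual(fromRoman(toRoman(n)), n)

    def test_case(self):                # requirements 7 and 8
        for n in range(1, 4000):
            numeral = toRoman(n)
            self.assertEqual(numeral, numeral.upper())
            self.assertRaises(ValueError, fromRoman, numeral.lower())
```

Saved to a file, this could be run with python -m unittest followed by the file name. Each test method maps onto one or two of the numbered requirements, so a failing test points straight at the requirement that was broken.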

Further reading

  • This site has more on Roman numerals, including a fascinating history of how Romans and other civilizations really used them (short answer: haphazardly and inconsistently).