|
 
 
     
Once you've classified a ciphertext as coming from a polyalphabetic system, you need to determine the period of the system. The period of a polyalphabetic substitution is simply the number of cipher alphabets used in a message. Determining the period is probably the most important step in solving a polyalphabetic cipher.
In Porta-like systems, the period of the system is 26. A new alphabet is used for each letter, and the sequence then repeats in the same order once the first 26 alphabets are used up. The twenty-seventh letter, then, uses the same cipher alphabet that was used for the first letter of the message.
Suppose, though, that we wanted to encode a message using Alberti's disk system with a keyword to specify the alphabet changes. One way to do this would be to write the plaintext message out completely, and then write the keyword over the text, repeating when necessary. Then, to encode the message, we would use the letter of the keyword above the plaintext letter to specify the keyletter that the disks should be set to for that particular letter. An example is shown below:
Animated gifs showing disk spinning….
Keyword: KEYWORDKEYWORD
Plaintext: a plaintext message
Ciphertext: ciphertext of message
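The keyword alignment described above can be sketched in a few lines of Python. This is only an illustration of writing the keyword over the text, not of the Alberti disks themselves. The function name `align_keyword` is hypothetical, and the sketch assumes that spaces and punctuation do not use up a keyword letter, which the original example does not specify.

```python
from itertools import cycle

def align_keyword(keyword, plaintext):
    """Pair each plaintext letter with the keyword letter written above it."""
    key_letters = cycle(keyword.upper())
    # Assumption: only letters consume a keyword letter; spaces are skipped.
    return [(next(key_letters), ch) for ch in plaintext.lower() if ch.isalpha()]

pairs = align_keyword("KEYWORD", "a plaintext message")
for key_letter, plain_letter in pairs[:9]:
    print(key_letter, plain_letter)
```

Because the keyword has 7 letters, the eighth plaintext letter is paired with 'K' again, which is exactly what a period of 7 means.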
Now, how in the world are you supposed to determine that the period for the above system is 7 if you only have access to the ciphertext?
The basis for determining a system's period rests in the frequency of the same polygraphs within the ciphertext and the distance that separates these identical polygraphs.
Quite a mouthful, huh? Let's break it down.
The 'frequency of the same polygraphs' simply means the number of combinations of letters in the ciphertext that are found more than once. Most of the time, cryptanalysts look for digraphs (two-letter combinations) that appear more than once in the ciphertext because these are the most common.
The 'distance that separates these identical polygraphs' means exactly what it says. When a pair of duplicate polygraphs is found in the text, the analyst counts the number of letters that separate the two. The count begins just after the first combination and ends at the end of the second combination.
If we had the following ciphertext, we would count 12 spaces between the digraph pair 'gs' and 'gs'.
Tsfodfrpgsakdhbgnloegshfnash
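The counting rule can be checked with a short Python sketch. The function name `repeated_polygraphs` is made up for this illustration. It relies on the fact that counting from just after the first polygraph to the end of the second gives the same number as subtracting the two starting positions.

```python
from collections import defaultdict

def repeated_polygraphs(ciphertext, size=2):
    """Map each repeated polygraph to the distances between its occurrences."""
    text = "".join(ch for ch in ciphertext.lower() if ch.isalpha())
    positions = defaultdict(list)
    for i in range(len(text) - size + 1):
        positions[text[i:i + size]].append(i)
    # Distance between consecutive occurrences = difference of start positions,
    # which equals the count from just after the first to the end of the second.
    return {
        gram: [b - a for a, b in zip(where, where[1:])]
        for gram, where in positions.items()
        if len(where) > 1
    }

print(repeated_polygraphs("Tsfodfrpgsakdhbgnloegshfnash"))
```

Run on the sample ciphertext, this reports 'gs' at a distance of 12. It also reports 'sh' at a distance of 5, a second repeat that is easy to miss by eye.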
Now, why do we do this?
Identical polygraphs in the ciphertext may represent the same plaintext encoded with the same alphabets. In a long message, it is quite likely that the exact same text will sometimes line up with the exact same keyletters. It might not happen a lot, but it certainly does happen. When a repeat arises this way, the number of spaces between the two identical polygraphs must be a multiple of the actual period. In the example in which we counted a 12-space difference, the keyword could be 2, 3, 4, 6, or 12 letters long.
Of course, not all duplicate polygraphs will have occurred because the same text was encoded by the same alphabets. Many times, combinations of unrelated text will produce the same ciphertext. Although we can't always distinguish these random events from the true ones, there are several telltale signs that indicate a polygraph was produced randomly.
For starters, William Friedman (the Father of American Cryptology and introducer of mathematics as a common tool for cryptanalysis) tabulated how many repeated polygraphs one may expect to find by chance in a message with a certain number of characters. The formula to determine how many polygraphs would appear randomly is based on the binomial theorem.* The data that this formula creates is a little confusing, even in the table below, so bear with us.
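Turning a distance into candidate keyword lengths is just a matter of listing its divisors. The sketch below is a minimal illustration; the names `candidate_periods` and `tally_periods` are hypothetical, and the extra distances 18 and 24 in the usage line are made up for demonstration and do not come from any ciphertext in this text.

```python
from collections import Counter

def candidate_periods(distance):
    """Return every divisor of `distance` greater than 1."""
    return [n for n in range(2, distance + 1) if distance % n == 0]

def tally_periods(distances):
    """Count how often each candidate period divides the observed distances."""
    counts = Counter()
    for d in distances:
        counts.update(candidate_periods(d))
    return counts

print(candidate_periods(12))
# Hypothetical distances, for illustration only.
print(tally_periods([12, 18, 24]).most_common(3))
```

For a distance of 12 the divisors are 2, 3, 4, 6 and 12; a period of 2 is possible too, although such a short keyword is rarely used. With several distances in hand, the divisors shared by most of them are the strongest candidates.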
The 'number of letters' column in the chart indicates how many random letters the text was assumed to contain; the remaining values in that row are calculated from this number.
The columns with headers like 'E(2)' and 'E(3)' give the expected number of digraphs that appear exactly that many times.
For example, for 300 letters, you would expect to find about 43 digraphs that appear TWICE, as seen in the E(2) column in the row for 300 letters. You should also expect to get about 6 digraphs that appear THREE times, and perhaps 1 digraph that appears FOUR times ('E(4)'). These numbers represent how many repeated digraphs will appear simply because of the random interaction of text.
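The 300-letter figures can be roughly reproduced with a binomial calculation. The sketch below is one plausible reading of a formula "based on the binomial theorem", not necessarily the exact formula behind Friedman's tables. It assumes uniformly random letters over a 26-letter alphabet and treats overlapping digraph positions as independent; the function name `expected_repeats` is hypothetical.

```python
from math import comb

def expected_repeats(num_letters, times, size=2, alphabet=26):
    """Expected number of distinct polygraphs occurring exactly `times` times."""
    kinds = alphabet ** size         # 676 possible digraphs when size=2
    slots = num_letters - size + 1   # polygraph positions in the text
    p = 1 / kinds                    # chance a given slot holds a given polygraph
    # Binomial probability that one particular polygraph fills exactly `times`
    # slots (assumption: slots are treated as independent, an approximation).
    prob = comb(slots, times) * p ** times * (1 - p) ** (slots - times)
    return kinds * prob

for k in (2, 3, 4):
    print(k, round(expected_repeats(300, k), 1))
```

Under these assumptions the results come out at roughly 42, 6 and 0.7 for E(2), E(3) and E(4), close to the 43, 6 and 1 quoted above.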
Similarly, graphs can be made of the number of other polygraphs…as shown below…
When you count the polygraphs that appear in a polyalphabetic ciphertext, the above tables give you an approximation of how many of the repeats will actually be meaningless. This will become clear in our full example, which shows how to put the theory into practice.
    
|