The most important tools in a cryptanalyst's toolbox are several tendencies of languages that are generally consistent and help make plaintext 'stand out.' They allow the cryptanalyst (i.e., you) to compare the properties of ciphertext to the properties of normal plaintext. The more official name for these 'tendencies' is frequency distributions, which represent the general distribution of characters in regular writing for a language. Other frequency distributions, including those for digraphs and trigraphs, show the most common two- and three-letter sequences in a language.
In English, the letter 'e' is by far the most used letter, and 'the' is the most common word. When analyzing messages, it is very useful to know the frequency distributions for many letters and key phrases. E, T, O, A, N, I, R, S, and H are the most common letters (in that order). If you want to remember them, just try saying 'etoan-irsh' a few times and pretty soon you'll have it. The rest of the letters, digraphs, and specific frequencies are so useful that eventually you'll memorize them subconsciously without even trying.
The chart below shows the percent frequencies of all the letters in the alphabet. It is based on over 4.5 million letters we counted from five famous novels (well, we had a computer count for us): The Adventures of Tom Sawyer, Great Expectations, Moby Dick, A Tale of Two Cities, and War and Peace. If you'd like to see how your own data matches up to the novels, type your text into the box and press Enter. The resulting graph will let you compare the frequency distribution of your text with the average over this large body of text.
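If you would rather run the comparison yourself, the following Python sketch shows one way to compute percent frequencies and compare two texts. It is not the code behind the chart; the function names `letter_percentages` and `compare` are hypothetical, and the short `reference` string is only a stand-in for a large body of text such as the novels.

```python
from collections import Counter
import string

def letter_percentages(text):
    """Return the percent frequency of each letter a-z (hypothetical helper).

    Assumption: case is ignored and non-letters are not counted.
    """
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    total = len(letters)
    counts = Counter(letters)
    return {
        ch: (100.0 * counts[ch] / total if total else 0.0)
        for ch in string.ascii_lowercase
    }

def compare(sample, reference):
    """Percentage-point difference per letter: sample minus reference."""
    s = letter_percentages(sample)
    r = letter_percentages(reference)
    return {ch: s[ch] - r[ch] for ch in string.ascii_lowercase}

# Stand-in texts; substitute your own message and a much larger reference.
reference = "it was the best of times it was the worst of times"
sample = "call me ishmael some years ago never mind how long precisely"

diff = compare(sample, reference)
# List the letters whose share differs most between the two texts.
for ch in sorted(diff, key=lambda c: abs(diff[c]), reverse=True)[:5]:
    print(f"{ch}: {diff[ch]:+.1f} percentage points")
```

A positive value means the letter is more common in your sample than in the reference. With a reference of only a few words the differences are mostly noise, which is why a count over millions of letters is worth having.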