Breaking Vigenere Ciphers by Hand: An Example

R.S. Fourney
Copyright 2002

I had hoped to spend more time going over this particular example in class. Since I didn't, I took the time to type it in and make it available. It shows a little more detail, and more of the "why" between the steps, than does the example in Chapter 9 of your text.

We started with the ciphertext:

UPWYS DOKWD OHBVG SRBBV XYCBF AKLFL IWKLY CPEDL RVFOJ LKAFA
KLFPO LDLGH FZAAO IMIKA UGFEK VGBBT DOSZO MOUCI IQFNO IBTDH
BSOWO OSHBK GSZTH MHYCU FAKVF

We're assuming that it's a Vigenere cipher (since that's all we've learned to break so far; a Caesar cipher is just a Vigenere cipher using a key of length 1).

First, is it a Caesar? Figure out the Index of Coincidence (IC). Start by counting the number of instances of each cipher character:

A- 7   B- 9   C- 4   D- 6   E- 2   F-10   G- 5   H- 6   I- 6
J- 1   K- 9   L- 8   M- 3   N- 1   O-12   P- 3   Q- 1   R- 2
S- 6   T- 3   U- 4   V- 5   W- 4   X- 1   Y- 4   Z- 3

IC = .045 => keylength around 4

Recall from the method already discussed that we do NOT assume the keylength is equal to 4, but we are pretty sure that the keylength is > 1. (If in doubt, guess a few keys of length one, decrypt, and see what the plaintext looks like. Guess by eyeballing the distribution, or by maximizing F(i) as shown in the text for the Caesar cipher example; in descending order they are O, H, K, X, ...)

Convinced that it's not a Caesar cipher, we try to determine the keylength. There are several methods of doing this, and I prefer the one in your text. Another popular method (perhaps easier to automate) involves dividing the ciphertext into various numbers of columns and calculating the IC of each column for each candidate period. A period that gives mostly ICs consistent with a keylength of 1 (e.g. .066 or greater) is a likely candidate.
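The IC figure above, and the column-by-column variant just described, can be sketched in a few lines of Python. This is only an illustration, not the procedure from your text: the names index_of_coincidence and column_ics are my own, and I am assuming the usual definition, IC = sum of n(n - 1) over the letter counts n, divided by N(N - 1) for a text of N letters.

```python
from collections import Counter

# The ciphertext from this example, with the spaces between groups removed.
CIPHERTEXT = (
    "UPWYS DOKWD OHBVG SRBBV XYCBF AKLFL IWKLY CPEDL RVFOJ LKAFA "
    "KLFPO LDLGH FZAAO IMIKA UGFEK VGBBT DOSZO MOUCI IQFNO IBTDH "
    "BSOWO OSHBK GSZTH MHYCU FAKVF"
).replace(" ", "")


def index_of_coincidence(text):
    """Chance that two letters drawn from text (without replacement) match."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def column_ics(text, period):
    """IC of each column when text is written out in `period` columns."""
    return [index_of_coincidence(text[i::period]) for i in range(period)]


if __name__ == "__main__":
    print(f"overall IC = {index_of_coincidence(CIPHERTEXT):.3f}")
    # Hypothetical search range: try every period from 1 to 8.
    for period in range(1, 9):
        ics = column_ics(CIPHERTEXT, period)
        print(period, " ".join(f"{ic:.3f}" for ic in ics))
```

Run on this ciphertext, the overall IC comes out at about .045. Scanning the per-period lines for rows where most columns sit near .066 or above is the automated version of the column method.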
For this example I'll use the method in the textbook (known as the Kasiski method), which I seem to have better luck with. In using Kasiski's method, we look for repeated patterns of letters, and consider the distance between repeats. The given ciphertext yields the following repeated patterns:

Pattern      Location        Delta   Factors
          1st   2nd   3rd
DO          6    10   ----      4    2,2
DL         39    57   ----     18    2,3,3
OS         82   106   ----     24    2,2,2,3
BB         18    78   ----     60    2,2,3,5
KL         27    33    51       6    2,3
                               18    2,3,3
                               24    2,2,2,3
BV         13    19   ----      6    2,3

Now, even though the overall IC => keylength of 4, and the very first pattern that repeats implies the period is 4, there are just too many 3s in the above chart for us to believe that the key is of length 4. Except for the first pattern, all of the repeats can be explained by a key length of 6. (If there were more ciphertext, we'd have a larger statistical sample, and would probably put more faith in the IC. Also, if there were more repeats with periods of 4 (or 8), we'd investigate the keylength = 4 case in more detail.)

Given all the 3s, and the fact that no 3 appears without a 2, we'll guess 2x3 or "6", and divide the ciphertext into 6 columns for further analysis:

cipher:
UPWYSD    We'll then try to reinforce our guess by calculating
OKWDOH    the IC for each column (or suspected alphabet):
BVGSRB
BVXYCB    Alphabet 1 has an IC of 0.15, which implies that it
FAKLFL    has been encrypted using a key with a period of 1.
IWKLYC
PEDLRV    Alphabet 2 has an IC of .043, which implies a P of 6,
FOJLKA    which would indicate that we need many more columns,
FAKLFP
OLDLGH    but Alph #3 has IC = 0.11,  => P = 1,
FZAAOI        Alph #4 has IC = 0.10,  => P = 1,
MIKAUG        Alph #5 has IC = 0.066  => P = 1, although
FEKVGB        Alph #6 has IC = 0.043, => P = 6.
BTDOSZ
OMOUCI    All in all, we're pretty comfortable that the keylength
IQFNOI    is 6, since 4 out of our six alphabets seem to be
BTDHBS    encrypted with a key of length 1. Again, we'd expect
OWOOSH    better results with more ciphertext. With only 125
BKGSZT    characters we wouldn't expect them ALL to be right
HMHYCU    on. If guessing single keys for each column doesn't
FAKVF     give good cleartext we will have to revise this, of course.

Now, assuming a key of length 6, we need to guess the single-letter key for each of our six columns. This can be done by calculating the F(i) as shown on page 252 of your text for each alphabet and trying maximum values first, or by eyeballing the various distributions and acting on some hunches as shown on page 256. For example:

Col   A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
#1       5           6     1  2           1     4  1              1
#2    3           2           1     2  1  2     1  1  1        2     2  2        1
#3    1        4     1  2  1     1  6           2                       2  1
#4    2        1           1           6     1  2           2     1  2        3
#5       1  3        3  2           1           3        2  3     1           1  1  *
#6    1  3  1  1        1  3  3        1           1        1  1  1  1           1  *
Exp:  H  M  M  M  H  M  M  H  H  R  R  M  M  H  H  M  R  H  H  H  M  L  L  R  L  R

* Corrected 9/17/02. I had reported 3 Ls and 0 Cs; it is 3 Cs and no Ls in column 5. This does NOT change the IC for this column calculated above (why not?). Also, I had reported 1 I for column six, and there are really 3. This should change the column IC shown above, but does not (so it must have been a typo when I typed in the example). Thanks to the alert student who pointed this out!

The bottom line of the table (Exp) indicates the expected frequency of each letter (originally from Denning's book, referenced in your text). H, M, L, and R correspond to High, Medium, Low, and Rare expected frequencies of the letter in plaintext. (Note the correction to the text pointed out in class: both N and R have High expected frequencies.) Also, according to this model, the most expected plaintext characters, in order, are: ETAONIRSH

So, looking at the B and F columns of row one, I guess it's shifted by one, and the higher numbers of Bs and Fs in the ciphertext come from As and Es in the plaintext. Numerically, calculating F(i) for various shifts gives me B first (at .085), then X (.069) and O (.06). That's a big gap, so I like "B" for the first letter of the key.
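The numerical ranking used here can be sketched as follows. I am assuming that F(i) is the correlation between a column's observed letter frequencies f(c) and the expected English frequencies p shifted by i, that is, F(i) = sum over c of f(c) * p(c - i). The frequency table in the code is a rounded approximation to English letter frequencies and may differ slightly from the one in your text; the names ENGLISH_FREQ, correlation, and rank_shifts are my own.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase

# Approximate single-letter frequencies for English, A..Z (an assumption;
# substitute the table from your text if it differs).
ENGLISH_FREQ = [
    0.080, 0.015, 0.030, 0.040, 0.130, 0.020, 0.015, 0.060, 0.065,
    0.005, 0.005, 0.035, 0.030, 0.070, 0.080, 0.020, 0.002, 0.065,
    0.060, 0.090, 0.030, 0.010, 0.015, 0.005, 0.020, 0.002,
]


def correlation(column, shift):
    """F(shift): sum of f(c) * p(c - shift) over the letters c in column."""
    counts = Counter(column)
    n = len(column)
    return sum(
        (count / n) * ENGLISH_FREQ[(ALPHABET.index(c) - shift) % 26]
        for c, count in counts.items()
    )


def rank_shifts(column):
    """Candidate key letters for one column, best score first."""
    scores = [(correlation(column, s), ALPHABET[s]) for s in range(26)]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    # First column of the six-column layout (every sixth letter from the start).
    column_1 = "UOBBFIPFFOFMFBOIBOBHF"
    for score, letter in rank_shifts(column_1)[:3]:
        print(f"{letter}: {score:.3f}")
```

With this table, the top-ranked key letter for the first column is B, which agrees with the hunch from the chart. The same function can be pointed at any of the other five columns.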
Looking at row 4 of the above table (column 4 of the ciphertext), I'd like to shift that "6" to E (or T, or A). Putting the 6 under the A gives no Es, but putting it under the E looks OK. Numerically this shift (corresponding to "H" being the fourth letter of the key) also looks good. F(i) is max for H (.08), followed by U (.056) and S (.051), again a comfortable gap.

How about the second letter? Nothing really jumps out visually, and calculating F(i) gives us a max at I (.058), followed by W (.053), A (.049) and S (.047). Not a large gap, so let's look at the plaintext before we calculate the whole column of cleartext. Using I as the key would put an "H" right after the first "T", and that's often a good thing to do. It would also put "N"s behind a couple of "A"s, another familiar pattern. Let's assume that the second letter of the key is I and take a look at our plaintext under those assumptions:

ciphertext   plaintext
key:         BI_H__
UPWYSD       TH_R__
OKWDOH       NC_W__
BVGSRB       AN_L__
BVXYCB       AN_R__
FAKLFL       ES_E__
IWKLYC       HO_E__
PEDLRV       OW_E__
FOJLKA       EG_E__
FAKLFP       ES_E__
OLDLGH       ND_E__
FZAAOI       ER_T__
MIKAUG       LA_T__
FEKVGB       EW_O__
BTDOSZ       AL_H__
OMOUCI       NE_N__
IQFNOI       HI_G__
BTDHBS       AL_A__
OWOOSH       NO_H__
BKGSZT       AC_L__
HMHYCU       GE_R__
FAKVF        ES_O_

This looks pretty good, and one wants to immediately stick an "E" after the initial "TH" (one might want an "A" if that "R" wasn't right after it, but "E" would probably be a first choice even then). This corresponds to an "S" for the third key letter, and doesn't match the F(i) calculation [which gives, in descending order: K (.061), W (.060), G (.059), and D (.058) before getting around to S at .056], BUT the plaintext letter in that spot has almost got to be a vowel. An "I" would work, but the key letter that produces it is even further down the list. Let's try using "S" as a key letter to put an "E" in that spot.
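Trial decryptions like the one above can be generated mechanically. The sketch below assumes the usual Vigenere convention (A = 0, plaintext = ciphertext - key mod 26) and uses "_" for key letters that have not been guessed yet; partial_decrypt is a name of my own choosing, not something from your text.

```python
import string

ALPHABET = string.ascii_uppercase


def partial_decrypt(ciphertext, key):
    """Decrypt with a partly known key; '_' marks unknown key letters."""
    out = []
    for i, c in enumerate(ciphertext):
        k = key[i % len(key)]
        if k == "_":
            out.append("_")  # no guess yet for this column
        else:
            shift = ALPHABET.index(k)
            out.append(ALPHABET[(ALPHABET.index(c) - shift) % 26])
    return "".join(out)


if __name__ == "__main__":
    # First few six-letter rows of the ciphertext in this example.
    rows = ["UPWYSD", "OKWDOH", "BVGSRB", "BVXYCB", "FAKLFL"]
    for key in ("BI_H__", "BISH__"):
        print("key:", key)
        for row in rows:
            print(row, partial_decrypt(row, key))
```

Each six-letter row starts at a multiple of the key length, so decrypting row by row gives the same result as decrypting the whole ciphertext and then splitting it into rows.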
With "S" as the third key letter, this gives:

ciphertext   plaintext
key:         BISH__
UPWYSD       THER__
OKWDOH       NCEW__
BVGSRB       ANOL__
BVXYCB       ANFR__
FAKLFL       ESSE__
IWKLYC       HOSE__
PEDLRV       OWLE__
FOJLKA       EGRE__
FAKLFP       ESSE__
OLDLGH       NDLE__
FZAAOI       ERIT__
MIKAUG       LAST__
FEKVGB       EWSO__
BTDOSZ       ALLH__
OMOUCI       NEWN__
IQFNOI       HING__
BTDHBS       ALLA__
OWOOSH       NOWH__
BKGSZT       ACOL__
HMHYCU       GEPR__
FAKVF        ESSO_

There are some familiar-looking patterns here, probably enough to make some guesses. Since our graphical method (the above chart) doesn't seem to help with column 5, let's see what the mathematical model gives us. F(i), in descending order (for column 5): O (.067), K (.057), Y (.055), and B (.051). Again, a healthy gap, so let's look at O as a potential key:

ciphertext   plaintext
key:         BISHO_
UPWYSD       THERE_
OKWDOH       NCEWA_
BVGSRB       ANOLD_
BVXYCB       ANFRO_
FAKLFL       ESSER_
IWKLYC       HOSEK_
PEDLRV       OWLED_
FOJLKA       EGREW_
FAKLFP       ESSER_
OLDLGH       NDLES_
FZAAOI       ERITA_
MIKAUG       LASTG_
FEKVGB       EWSOS_
BTDOSZ       ALLHE_
OMOUCI       NEWNO_
IQFNOI       HINGA_
BTDHBS       ALLAN_
OWOOSH       NOWHE_
BKGSZT       ACOLL_
HMHYCU       GEPRO_
FAKVF        ESSOR

Now I see complete words that I recognize, along with longer words such as "PRO_ESSOR" and "K_OWLED_E", and even "COLL_GE" in front of "PRO_ESSOR", so I make a guess that the last letter of the key is "P", which gives:

ciphertext   plaintext
key:         BISHOP
UPWYSD       THEREO
OKWDOH       NCEWAS
BVGSRB       ANOLDM
BVXYCB       ANFROM
FAKLFL       ESSERW
IWKLYC       HOSEKN
PEDLRV       OWLEDG
FOJLKA       EGREWL
FAKLFP       ESSERA
OLDLGH       NDLESS
FZAAOI       ERITAT
MIKAUG       LASTGR
FEKVGB       EWSOSM
BTDOSZ       ALLHEK
OMOUCI       NEWNOT
IQFNOI       HINGAT
BTDHBS       ALLAND
OWOOSH       NOWHES
BKGSZT       ACOLLE
HMHYCU       GEPROF
FAKVF        ESSOR

Or, with proper spacing and punctuation:

There once was an old man from Esser,
Whose knowledge grew lesser and lesser.
It at last grew so small,
He knew nothing at all,
And now he's a college professor.

The key, "BISHOP", is the last name of the author of the textbook, off of whose web page I stole the above limerick (his web page is also a good place to look for computer security related information).

Copyright January 2002 R.S. Fourney

Note: The above example is for the benefit of CMSC414 students at the University of Maryland. If it is of use to you, feel free to copy it, so long as it is copied in its entirety, including this message and attribution.