A Cryptographically Sound Vigenere.
In middle school, I learned a version of the Vigenere Cipher from a paperback by a German author. It was a 25 x 25 grid, with the letters of the alphabet across, up to "y," and dropping the "z," for column headings and rows. The grid was constructed by writing the rows as the entire 26 letter alphabet, beginning with the letter "a," and pushing the columns over one column to the right, every row. The next row began with , "z," and the rest of the rows followed suit. I recall that there was supposed to be a way to implement a Salt in the book, but I am unable to duplicate it.
I learned of lossless Vigenere from Simon Singh's book, "The Code Book." My early experiment was to double encrypt with various passwords in an attempt to increase the Unicity distance, although I did not know the term.
By experimentation, I found that the Unicity distance was equal to the product of the unique prime factors of the password lengths. This is trivial to show.
My next experiment was to encrypt a six character message, first with a two character password, then with a three character password. To test this, I ran a list of the message, "eeeeee" encrypted with ALL possible two character, three character password combinations. I then encrypted "eeeeee" with a single six character password.
This was to show that double encryption wasn't doing anything that a single six character password wouldn't do.
I was pleased to discover that the ciphertext encrypted with a six character password was not found in the list.
I repeated the experiment with all three character, two character passwords, with the same results.
[Atari 520ST, GFA Basic.]
More recently, I became interested in the value of hashing a password, so that the cryptogram would not give up the password. The md5 hash is sixteen bytes long, and I divided this up into a three character password, a five character password and a seven character password, using fifteen bytes. I computed that this would give me a Unicity distance of 105 bytes.
To improve on this, I extended Simon Singh's lossless Vigenere from 26 x 26 characters to 256 x 256, to accommodate encrypting executable files and others. This resulted in triple encrypting. I observed that different orders yielded different key combinations: 3,5,7 was different from 3,7,5. The same was true of 5,3,7; 5,7,3; 7,3,5 and 7,5,3. From this I concluded (possibly redundantly,) that the cryptogram didn't give up key material.
In my implementation, I succeeded in salting the grid by not knowing that I couldn't do it. I took the password, struck duplicate characters, then struck those from the alphabet and prepended them to it. I built the grid on that basis, but exploited the fact that an ASCII char is a hex value, and used the ASCII alphabet for the indexes.
This resulted in a unique grid for every password. This eventually means that encrypting two messages with the same password may result in ciphertext that is not subject to differential analysis.
I knew that the Unicity distance was only 105 bytes, but I thought it would be difficult to statistically analyze anything less than 30 times that distance: 30 rows of 105 columns would leave an analyst doing T-tailed testing on 105 individual columns of 256 character alphabets.
In statistics, the number of variables increases the sample size required for testing, and I was willing to trust the md5 Vigenere as much as 105 * 120 bytes, or 12.3 K.
Never satisfied, I began to speculate what might be done with a larger hash. I knew that SHA256 was 256 bytes long, and decided to see if breaking that into three primes would give me a more respectable result. I was somewhat disappointed.
In my previous efforts with md5, I compared the run-time to encrypt a large (600MB) file with an md5 Vigenere, (run interpretive in Python,) and Bcrypt. Bcrypt was faster, but md5 Vigenere still compared favorably with other implementations.
I began theorizing more encryption cycles, supposing that I might call them "rounds," as other algorithms call their cycles "rounds."
I eventually stopped at 13 rounds, with a chosen set of prime lengths. This resulted in 304,250,263,527,210 bytes of data for the x1 pad secure Unicity distance.
It is not possible to do better with 256 bytes of data - the first 14 primes add up to 281 bytes, too long for SHA256 to accommodate.
Comments
Post a Comment