Vigenere Better Understood

I have for some time speculated that the 1500's Cipher, "Vigenere," is not a mathematical group.

Not to bury the lead, IT IS NOT! 

However, this is not completely satisfying for the question I was asking.
 
Vigenere could be represented as a Matrix function such that f(P) = C and f(C) = P. (P for plaintext, C for ciphertext.)

Early, I observed that when the operation is repeated with passwords of differing lengths, the periodicity is equal to the product of the unique prime factors of the password lengths. In encryption, there is also a concept of the amount of ciphertext necessary to calculate or infer the password. This is called the "Unicity distance." In Vigenere, the unicity distance is the same as the period.

Since the idea in encryption is to extend the period, I amused myself with double and triple encrypting.

Around 1995, I worried that a double encrypted message of a specific period might be comparable to a message encrypted ONCE, with a single password equal in length to the period of the cryptogram. To satisfy myself about this worry, I tested three password lengths: 2, 3 and 6.

For a message I employed a sequence of six "E's." I found that an exhaustive listing of all 2 then 3 cipher texts did not contain the ciphertext of a particular password of six characters in length.

Soon, I was back at my Atari 520ST, repeating the experiment with an exhaustive list of 3 then 2 passwords.

When I completed this test, I concluded that I had _proven_ that Vigenere was NOT a group. As you may imagine, I was ecstatic. I later embarrassed myself by publishing in this blog, the inference that NO six character password could do the job. (I am still smarting for printing that SHA256 is 256 Bytes long.)

The problem is this. I had proven that there exists _some_ six character cryptogram that is not in the union set of 2 then 3 and 3 then 2 cryptograms. This might have been good news to my speculation, and it DOES PROVE that the operation is not technically a group, but it fails to adequately prove that there is NO six character password that can do the job.

Now I have a bigger hard disk drive, and a faster processor. Simon Singh devised a way to make the original cipher lossless, and applicable to computers, and I adopted his grid. I have extended my efforts in implementation, to include salting the alphabet for the grid, whereby, f(P) <> g(P), operating on the same plaintext WITH THE SAME PASSWORD. Good news for encryption, but still not an answer to my question. I further persisted, by extending the alphabet from 26 x 26 to 256 x 256.

At this point I revisited my original question, and resolved to test exhaustive 3x2 and 2x3 "space," against EXHAUSTIVE 6 space.

We have fast computers, but this was computationally too expensive. It is analogous to [(26^2 * 26^3) + (26^3 * 26^2)] * [(26^6)] comparison operations. No shortcuts!
While 2 * 26 ^ (2+3+6) is a very large number, I had to be encouraged that 2^26 - 2 * 26^(2+3+6) = 26^26 - 26^11, which is a very helpfully large space where there is room for uniqueness. (There are 26^26 unique alphabets, when noting ordinal value.)

Then I had a stroke of insight. I am already speculating that a test on 26 x 26 will prove a rule for all 256 x 256. What I was doing was attempting to generalize for all n x n. What if I choose a smaller n? I resolved to test a 10 x 10 alphabet.

Here are data sets (named as text files,) built on a ten character alphabet.

The data sets are descriptively named, building on a single message, "EEEEEE." 

The Python3 scripts to build the data sets of a 10 x 10 matrix are included. The test script as proof of concept that compare.py will detect "collisions," is named test_compare.py.  compare.py runs in less than 35 mins (on an overclocked Core i5 9600.) 

I detected 1000 "collisions" in 2 then 3, and 100 more in 3 then 2, accumulating to 1100 collisions. These represent "weak passwords," in double Vigenere.

test_compare.py accepts test.txt as input, and compares it to data1.txt and data2.txt.

I now asking myself this question: What about grids of prime dimensions? I modified these scripts to run against 11 x 11 space. Would this improve the experiment? 

The result is that 2 then 3 has 11^2 collisions, and 3 then 2 has 11^3 collisions, testing against a single 6 character message. 

This led me to speculate that 2, 5 Double Vigenere, on a ten character message would have 10^5, but left me puzzled as to what the result might be for 5 then 2. (Scripts.) This proved computationally too challenging for a trivial effort. (I still have the data.)

Column then Row lookup is NOT a hash.

Flush with enthusiasm, I recalled that Vigenere is only symmetric against a row, column lookup. I have long speculated that a column, row lookup results in a hash - that is, that there is no plaintext/passwordtext lookup in row, column that can reverse a column, row "encryption." 

I took the results for 2 character passwords "HASHED," (and 3 characters likewise,) and compared them against 6 character row, column lookup, "encrypted."

The results was not only collisions, but an exhaustive list. Passwords to reverse a column, row lookup exist in row, column lookup, but they are hard to find.

Weak keys

The fact that 2,3 encryption on a six character message results in (dimension of the grid)^3 "collisions" indicates that in 2 then 3, these keys are "weak," since there exists a six character lookup that will brute force it. However, under 3 then 2, there are far fewer.

To understand, we can contemplate a 2 character password resulting in a six character message "ABABAB." For this message there might exist a three character password that results in "ABA," followed by "BAB." Similar messages, but arrived at by different paths. Likewise, two character passwords that repeat are "AA," "BB," etc, resulting in cryptograms that repeat, but NEVER THE SAME LETTER. The WORST CASE SCENARIO is substitution, even if it IS, "simple."

I have not gone back and exhaustively listed 2,3 and 3,2 plaintext, passwordtext combinations to derive these. 

It is worth recalling that a one-time pad is nothing but simple substitution on a (long) unpredictable password.

Do these collisions propagate, when adding rounds, or do further rounds preclude them, ruling them out? This remains an open question.

With this under consideration, we can speculate that, when encrypting more times, it is beneficial to encipher long keys first.

Comments

Popular posts from this blog

A Question About Erasthmus' Sieve

A Cryptographically Sound Vigenere.

Notice of corrupted results: Vigenere may yet be found to be a "group."