Yet another utterly futile attempt to find a md5 fixpoint
During lunch discussion at my workplace we started wondering if there existed a fixpoint for the md5 cryptographic hash function.
That is a point such that md5('x')='x' where x is the 32-digit hexadecimal string representation.
To be completely truthful the md5 algorithm works on 512 bit values. A 32-digit hexidecimal string is 128 bit and is first padded(with zeroes) to 512 bits before the md5 function is taken and
returns a 128 bit value which then is converted to a 32 digit hexidecimal string thus explaining my md5('x')='x' expression.
Some googling showed this is known as the "Kember Identity" and is an unsolved problem. Also a few programmers had tried to brute force it over the years with no luck (not surprisingly as you will see below). Even though I would never succeed I decided to look into the problem anyway.
Interestingly there is a simple mathematical argument that there is ~63.21% probability for at least one fixpoint. This does not mean there is one of course! But it also reveals there is probably only a few, if any! The argument is based on the same principly that when you shuffle a deck of card, what is the probability that at least one card will end up in the same position in the deck. The probability actually converges to 1-1/e very fast as the number of the elements shufled increases. There is no guarantee that the md5 function is a shuffling though, there could exist different 32-digit hexidecimal strings x,y such that md5(x)=md5(y). But such collisions are extremtly rare and would not change the overall probability by much.
Examples of different x,y with same hashing value does exist , see http://en.wikipedia.org/wiki/MD5
etc. But this example is 512 bit and not 128 bit (32 hexidecimal). The md5 hash function is not considered safe anymore due to collision vulnerabilities , but this is completly irrelevant to my mission of finding a fix point.
Lets see how futile a brute force attempt will be:
There are 2^128= 32^16 ~ 3.4*10E38 different values.
My brute force program can do 3.5M hash/sec (For each CPU used)
This gives 3*10E24 years to try all combinations (Using 1 CPU).
Not a very uplifting result and now you know why I used the word futile.
So without a cryptographic breakthrough, humanity will never know if such a point indeed does exist.
And now back to my brute force attempt:
I made a small Java program that can be found on Github on the link below.
Thanks to +Toke Eskildsen
for doubling the performance using tricky bit manipulation instead of the standard Java methods as substring, equals etc.
The code is much harder to read now though, nothing comes for free...
The program starts with a random 32 digit hexadecimal string (for each CPU) and then iterative computes md5(x) and check if it matches x. Using this endless chaining of md5 values there is no performance overhead from generating a a lot of random strings. Of course there is a (tiny!) probability that this would result in an endless loop as it eventually is bound to do! But cycle detection would slow
the program down and detecting a cycle having 1E10 elements would take way too many resources anyway. Also I restarted the program
regulary over the 2 weeks I had it running on 24 CPUs and this would have reset any loops.
Since I would not find a fix-point but still would like to program to output somthing, I logged the maximum prefix and suffix match for md('x')='x'
The maximum match I found was a 12 character suffix and a 12 character prefix match shown below:prefix 12
54db1011d76dc70a0a9df3ff3e0b390f -> 54db1011d76d137956603122ad86d762suffix 12
df12c1434cec7850a7900ce027af4b78 -> b2f6053087022898fe920ce027af4b78
Remember this used 24*2weeks = 0.9 year CPU time. If you run it and find a 13 character match or more, I would like to hear for you of course.