Shared publicly  - 
The SHA-3 contest has demonstrated that it is possible to build secure hash functions. Now can we build fast ones?

[originally posted April 13, 2012, updated after the SHA-3 contest result was announced, October 3, 2012]

When the SHA-3 process was launched—in the exciting time when MD5 and SHA-1 had been dramatically shown to be weak—it seemed like we were in danger of waking up one day and finding out that we had no strong hash functions left. It was prudent to get started on SHA-3 ASAP in order to have an alternative ready for that eventuality. NIST explicitly called for hash functions which were different from SHA-256 and which seemed likely to survive a breakthrough that would destroy SHA-256.

Now these few years later my feelings, at least, have kind of flip-flopped. SHA-256 feels a lot less likely to suddenly succumb, now that it has weathered the Wang attacks for half a decade.

I was, frankly, disappointed that the SHA-3 project didn't produce something both safer and faster than SHA-1 (not to mention SHA-2), in the same way that AES provided an algorithm both safer and faster than DES (not to mention 3-DES). But I can understand where NIST, and the many cryptographers involved, were coming from. Again, they were starting from a position of uncertainty about whether there were critical flaws in our basic assumptions, so conservatism reigned. I overheard one SHA-3 designer (who is probably reading this) comment “At that time, we weren't sure that we knew how to build secure hash functions.”.

The major result of the SHA-3 process appears to be that not only is it quite possible for cryptographers to build new secure hash functions, but also that SHA-256 is a secure hash function.

All five of the SHA-3 Round 3 candidates (finalists) seem quite secure. In addition, all nine of the Round 2 candidates seem quite secure. In addition, there were eleven Round 1 candidates without published attacks, plus five Round 1 candidates without attacks better than brute force (counting time*memory), plus twelve Round 1 candidates without attacks that were demonstrated in practice (some of them nearly practical e.g 2⁶⁷ work, and others extremely far from practical, e.g. 2¹⁴³ work). In addition there was at least one (a personal favorite), EnRUPT, which was practically vulnerable with the proposed parameters, but might be counted as "evidence that we know how to build secure hash functions" with slightly larger parameters [¹]. See the SHA-3 Zoo [²] for a catalogue of these candidates and published attacks on them.

So the SHA-3 project has shown that cryptographers could, in about a year, come up with, depending on how you count, somewhere between twenty-five and fourty-four different functions which cryptographers, in about four years, couldn't (practically) break.

Perhaps now that the SHA-3 project has wrapped up, cryptographers will start searching for the limit on how efficient they can make hash functions before they break. The "lightweight cryptography" research (Quark, Spongent, Photon) is a step in that direction.

Another approach would be cataloguing attacks on reduced-round variants of otherwise believed-good functions. For example, SHA-256 runs at about 20 cycles/byte on a 32-bit ARM chip [³] and has 64 rounds, and the best known attack on reduced SHA-256 is on 41 rounds (according to wikipedia), which suggests that a 42-round variant of SHA-256, which would run at about 13 cpb, ought to be safe.

Then there is Skein. It runs at 23 cpb and has 72 rounds, and there is no published attack of any kind (even the unreasonable kind) on 57 rounds of Skein [⁴]. If you used 57-round Skein it would cost about 18 cpb. Then Blake [⁵] (my current favorite). 14 rounds, 24 cpb, no attacks known on more than 6 rounds, so it ought to be possible to have a cut-down version of Blake that runs at about 12 cpb.

Next, consider Keccak, which is now SHA-3. It has 24 rounds, runs at about 34 cpb, and 9 rounds of it is more than enough to withstand all known attacks, even of the "unreasonable" variety. A 9-round version of Keccak would take about 13 cpb.

Note that all of these numbers above are still pretty conservative! For example, I said that there were no known attacks on Blake with more than 6 rounds, but the known "attacks" on Blake with 6 rounds are not really "attacks", but more like "interesting theoretical observations". If you only count attacks which are actually computable in the real world, which apply to the full hash function (not just to some internal component of it), and which are a full collision (not just some "near miss" like a "partial pseudo-collision"), then I don't think there is an attack on even 1 round of Blake. If 1-round Blake is actually secure, then that would mean you can have a secure hash function in about 2 cpb! A similar argument goes for Skein, Keccak, and even SHA-256.

Another approach would be to look at the most efficient full hash functions that have been studied. If you look at the most efficient hash functions on [³], they are:

md4 — 7 cpb — insecure
md5 — 8 cpb — insecure
edonr — 11 cpb — secure! It was rejected from the SHA-3 contest, not because it was shown to be insecure but from an abundance of caution (in my humble opinion — I know others who know a lot more than I do about this might disagree).
sha-1 — 14 cpb — insecure
shabal — 15 cpb — secure! ditto
bmw — 16 cpb — secure! ditto

All this makes me think that it really ought to be possible to have a function more efficient than SHA-256 and is still secure, and perhaps even possible to have a function more efficient than SHA-1 or even MD5 and is still secure. I guess it is going to be quite a few years before we gain confidence in any such function, unfortunately.

To be clear, I'm not exactly recommending that you should use a reduced-round SHA-256, a reduced-round SHA-3 (Keccak), a reduced-round SHA-3 finalist like Blake, or a SHA-3 reject like Edon-R. I'm not saying you shouldn't use such a thing either. What I'm saying is: their existence is reason to believe that a secure hash function with this kind of efficiency could exist.





[³] ; 32-bit ARM "h6dragon", 4096 byte input, worst quartile, 2012-10-03



This post was derived from this post to the cryptography mailing list:
Chris W. (CodesInChaos)'s profile photoMansour Moufid's profile photoZooko Wilcox-O'Hearn's profile photoDavid Johnston's profile photo
One question given the time frames involved in algorithm design, testing and acceptance on the efficiency (lower cycles per byte) vs perceived security of a SHA is how much does aiming for ultimate cpb efficiency matter vs establishing a standard that will last for a enough decades to be granted room to be implemented as hardware in a tiny corner of shrinking chip dies due to moores law?

IMNSHO, the more important measure of efficiency is not cycles per byte but energy per byte; typically measured in Watt hours.

Lower energy solutions are typically faster but I am not sure that can be guaranteed.
Good points, Greg. I too think energy per byte and ASIC area are critical metrics for the long term. (I think cycles per byte in software on ARM is a critical metric for the short term -- the next 10 years.)

This is one of the details that is particularly disappointing about SHA-3 -- all of the finalists are actually worse than SHA-256 in those metrics! See for example .

The "lightweight cryptography" field that I mentioned earlier is actually more about low gate count in ASIC than about low cycles per byte in software. I think those "lightweight cryptography" researchers are working on designs that might turn out to be just as secure as SHA-2 and SHA-3, but use maybe 1/10 the ASIC gates. I don't know if one of those will also be as efficient as SHA-256 in software, though...

My favorite failure of a SHA-3 candidate, EnRUPT, was pretty good at this. Check out Figure 12 from which shows, among other things, EnRUPT processing 10 Gbps in 10K gates, or 2 Gbps in 6 Kgates. (Note that the version described in that document is completely insecure. You have to cut performance by about a factor of 2 to get a (probably) secure variant, so that would be 5 Gbps in 10K gates, or 1 Gbps in 6K gates.)
First, thank you for the insight. As a consumer of cryptography tools without delving too deeply into their internals, I appreciate the investigation you put forth.

Though I wonder, is this a purely academic discussion? The algorithms you mention which were failures consequently will not be present in SHA-3, and won't be available as "common-use." In this sense, the discussion seems moot; could you elaborate as to how this could influence the futures in cryptography?
That's a good question, Zackery. In the near term, as a user of crypto algorithms you'll probably just keep using SHA-256 or else start using SHA-3. The most important difference is that SHA-256 (like SHA-1, MD5, Tiger, and RIPEMD-160) is vulnerable to length-extension attacks, and SHA-3 won't be. (I hope you're not using SHA-1 or MD51)

In the longer term, maybe in 10 years, it is possible, but by no means assured, that there will be a new algorithm that you might choose which requires about a quarter as many CPU cycles and which seems to be as safe as SHA-256 or SHA-3.

Another trend in security engineering is people trying to build higher-level abstractions so that users of crypto algorithms like us don't have to deal with the nuts and bolts such as secure hash functions and instead get some higher-level abstractions such as integrated-integrity-and-encryption modes. If that effort works out then perhaps in the future you'll choose a standard or a library which provides all the right security properties and you won't even know what secure hash function it uses or if it uses a secure hash function at all.
Speed is not desirable in a cryptographic hash. The slower the hash, the harder the brute-force; with stretching, this can be a significant difference in the cost of making attack feasable.

Yes, we very easily could make them faster. Look at hashes used for hash tables. There's a good reason that cryptographic hashes are so slow: it's a desirable feature.
No. A slow secure hash is one that won't be used. Hashes and symmetric crypto algorithm speeds must be on par and must be fast.
This is counter to the purpose of stretching, peppering, and the design of scrypt and bcrypt, whose entire purpose is to slow the thing down, and otherwise consume resources (memory consumption, etc.)

Cryptographic hashes should be slow and should consume large volumes of resources. It's a form of counter-attack called resource prohibitive defense.
That prevents use in online signing, and makes verification slow. Password derivation is an edge case.
John: if you want a hash function to be slow in order to slow down brute force attempts, it is easy to apply it many times iteratively. The SHA-3 process has always had efficiency (fast computation, little RAM requirements, little silicon area, etc.) as a desideratum.

There's an interesting subtlety about this, which is that the attacker and the defender may have different implementations, so the relative efficiency between cheaper and more specialized implementations is what matters.

That is: suppose you want to iterate your hash as many times as you can in order to slow down attackers, but defenders will be computing this on their smart phones so you don't want to do it so many times that it imposes a user-annoying delay when authenticating. So you figure out how many iterations you can require without annoying your users. Now: what if the attacker buys a chip with the hash function implemented in hardware? The more efficient the hardware is (as a multiple of how efficient the software implementation on the smartphone is), the easier the attacker's job. So, interestingly, for this particular application, we would ideally want a hash function that is less efficient in hardware and/or more efficient in software. :-)
Hi, Watson Ladd!

Yes, Keccak is great for high throughput, e.g. says:

IBM 130 nm
hash -- area (kGEs) -- throughput (Gbps)
Blake -- 34 -- 2.1
Grøstl -- 124 -- 9.3
JH -- 49 -- 3.1
Keccak -- 42 -- 10.7
Skein -- 66 -- 3.1

However, most applications don't have 10 Gbps worth of data to hash! For a lot of situations, it might be better to have Blake, which can be squeezed into more designs because it requires fewer gates, and which is still more than fast enough for most actual demands, e.g. streaming video to your smart phone, which can't receive data at anywhere near Blake's 2.1 Gbps anyway.

If in the happy future we all do have data flowing at 10 Gbps, we'll also hopefully have higher-tech chips. The 120 nm process mentioned above is cheap and available, but several generations of finer tech have come online: 90 nm, 65 nm, 45 nm, and now 32 nm.


Of those processes, the best one that we have measurements of all the SHA-3 finalists in is 90 nm.


UMC or STM 90 nm
hash -- area (kGEs) -- throughput (Gbps)
Blake small -- 38 -- 15
Blake fast -- 65 -- 17
Grøstl small -- 120 -- 16
Grøstl fast -- 139 -- 17
JH small -- 55 -- 10
JH fast -- 80 -- 11
Keccak small -- 50 -- 43
Keccak fast -- 50 -- 43
Skein small -- 43 -- 3
Skein fast -- 50 -- 4

To see where this is going, there is also one report of an even better process -- someone implemented Skein in Intel 32 nm:

Intel 32 nm
hash -- area (kGEs) -- throughput (Gbps)
Skein-512 -- 58 -- 32

So Skein, at least, gets a 10X efficiency boost when moving from 120 nm or 90 nm to 32 nm.
Ah, here's a paper implementing them all in 65 nm:

If I understand correctly, they tried to design each one to handle 2.488 Gbps (data rate for a OC48 channel). I think they failed in a couple of cases.

See the two-dimensional Figure 3 for the full story, but:

UMC 65 nm
hash -- area (kGEs) -- throughput (Gbps)
Blake small -- 40 -- 1.8
Blake fast -- 43 -- 4.5
Grøstl small -- 70 -- 1.7
Grøstl fast -- 160 -- 11
JH small -- 47 -- 4
JH fast -- 54 -- 7
Keccak small -- 46 -- 22
Keccak fast -- 80 -- 27
Skein small -- 72 -- 1.9
Skein fast -- 72 -- 4.8
SHA-2 small -- 24 -- 2.3
SHA-2 fast -- 25 -- 4.2

See, SHA-2 is, for at least some purposes, better than any SHA-3 candidate. :-/
OTOH, on FPGA BLAKE has a much better efficiency (as throughput/area) than SHA-2: see slide 20 on this presentation from the 3rd SHA-3 conference:

Another argument for BLAKE or Skein is that they seem to better exploit the recent (and arguably future) microarchitectures from Intel or AMD, thanks to their local parallelism:

ASIC performance figures will probably look better on 32nm technology than 90nm for any of the candidates.
Jean-Philippe: nice! Thanks for pointing that out. I have the vague notion that FPGA is becoming more important. They recently started shipping 28 nm FPGA, and I have the vague notion that this will make FPGA beat ASIC in a lot more designs. Also the recent trend in integrating an ARM CPU and making it easier for ARM programmers to program your FPGA seems very promising.

It has been interesting to see Bitcoin miners (at least some of them) moving from GPU to FPGA over the last few months.
> ASIC performance figures will probably look better on 32nm technology than 90nm for any of the candidates.

Oh, of course. It's just that, unfortunately, those folks implemented only Skein on their Intel 32 nm process.
Going back to your initial post: you say that rejecting BMW or Shabal was "an abundance of caution", but OTOH you've argued in favor of AES-256 + Salsa20, for the sake of long-term security. This sounds a bit inconsistent to me; long-term security is precisely what SHA-3 aims to achieve, and IMO Edon, BMW, or Shabal really don't inspire confidence for that matter.
Jean-Philippe: you're right. I guess I can resolve my apparent inconsistency by saying that an abundance of caution is appropriate for the SHA-3 process, but I still hope for a more efficient alternative someday.

I'm not sure my wish will be granted in the forseeable future. The symmetric crypto community's attention seems to be moving on to other things like authenticated encryption (which, as it happens, I currently have no use for).

Even if people go ahead and invent extremely efficient new hash functions after SHA-3 (analogous, perhaps, to the way they went ahead and invented extremely efficient new ciphers after AES), I imagine it will take at least a decade before any of them have received sufficient scrutiny.

I guess the best outcome for my purposes at this point would be if the makers of all CPUs would start including a SHA-3 circuit in every chip. I regard the software performance of the SHA-3 candidates as an issue that has to be grappled with, but if it came in hardware everything would be fine. :-)
Zooko: wouldn't a 5 cycles/byte tree-hash running on a 3GHz CPU be fast enough for you? (4.8Gbps!)
My 2 cents on John's initial comment "Speed is not desirable in a cryptographic hash": you probably mean that speed is not desirable for password hashing schemes, which I fully agree with! As you know, however, many applications of cryptographic hash function need high speeds. OTOH, we do have constructions called password hashing schemes, whose goal is to slow down hashing; those constructions may or may not rely on a cryptographic hash, and essentially consists in performing many iterations of a basic algorithm (PBKDF2 asks for a PRF [not necessarily a cryptographic hash], bcrypt uses Blowfish, etc.).
Jean-Philippe: you're right that I'm being somewhat inconsistent here, but I guess that's because I want to have it both ways: an abundance of caution was a good guiding principle for the SHA-3 process, but I still need something more efficient in addition to having a nice safe SHA-3.

There are a great many uses of secure hash functions where SHA-256-style efficiency is fine, but the two that I care most about happen to be two where more efficiency would be desirable.

Those two are: integrity-checking on (possibly large) files, and hash-based digital signatures.
> wouldn't a 5 cycles/byte tree-hash running on a 3GHz CPU be fast enough for you? (4.8Gbps!)

Yes! 5 cpb would be great! Where can I get such a thing?

*looks at*

Oh... I think I see what you are getting at. But bblake256 is 35 cpb on Scorpion and 10 cpb on amd64. Still: promising! That may well turn out to be good enough.
As process nodes get smaller and more expensive, FPGAs get cheaper and more powerful and the ASIC / FPGA tradeoff only makes sense for larger and larger quantities of ASICs (not to mention the lead time problems with ASICs). These days it may be around 10 million units before you would choose an ASIC, and you can get a tiny FPGA for 2 bucks.
+Daniel Holth: good point! It seems like uniprocessor ASIC stopped climbing the Moore's Law curve a few years ago, but FPGA is still on that curve. I'm hoping to see someone start selling mainstream commodity CPUs that come with an FPGA. I.e., Intel, AMD, or maybe some ARM vendor.
> and there is no published attack of any kind on more than 57 rounds of Skein

I think the 57 round attack only applies to Round 2 Skein. The Round 3 tweak reduced the efficiency of that attack.
Is there any reason not to use RIPEMD-160? According to that eBACS page, it's just as fast as or faster than any SHA-2 function, and also comparable to Skein.
+Chris Muller: Oh, thanks! How many rounds of Round 3 Skein is enough to resist all known attacks, then?

+Mansour Moufid: RIPEMD-160 has only 160-bit output, which might not be enough to withstand long-term, far-future birthday collisions, if you care about that sort of thing. According to RIPEMD-160 is slightly slower than SHA-256 on the ARM devices.
+Mansour Moufid Having only 80 bit of collision resistance is enough for me to instantly dismiss it. I need a 256 bit hash function. So you'd need to compare with RIPEMD-256. SHA-3 also requires a 512 bit variant, which currently doesn't exist in the RIPEMD family.

To see how it compares against SHA-3 candidates you'd need to evaluate the security margin of both. I didn't investigate it, but I suspect the margin of Blake and Skein is higher than the margin of RIPEMD-160.

It suffers from the generic weaknesses of the Merkle-Damgaard construction, most notably length extensions. So you'd need to fix that too, making it even less attractive.
Hm, I remembered that CubeHash had some nice tables of known analysis that was easy to browse:

If I understand correctly, CubeHash9/32 is more than enough to perfectly resist all known attacks. "cubehash512" on, running at 24 cpb (4096 byte inputs, worst quartile), seems to be a variant of CubeHash16/32, so CubeHash9/32 ought to be able to run at about 13 cpb.

Interesting! Once again, a secure hash function, when cut down to be just a little bit stronger than it needs to be to perfectly withstand all known attacks (even unrealistic ones), runs at about 13 cpb. Interesting pattern.
I build crypto functions in hardware. I don't know how to build a secure SHA3. The state space is massive (1600 bits all clocking on the same cycle in the same place!), which is nice mathematically but horrendous from a side channel and noise/power of view.
Add a comment...