Cryptographic Hash Functions
and their many applications
Shai Halevi – IBM Research
USENIX Security – August 2009
Thanks to Charanjit Jutla and Hugo Krawczyk
What are hash functions?
• Just a method of compressing strings
– E.g., H : {0,1}* → {0,1}^{160}
– Input is called “message”, output is “digest”
• Why would you want to do this?
– Short, fixed-size better than long, variable-size
  • True also for non-crypto hash functions
– Digest can be added for redundancy
– Digest hides possible structure in message
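As a small illustration of the "compressing strings" view, the sketch below uses Python's standard hashlib with SHA-1. SHA-1 is picked here only because its 160-bit digest matches the example range above; that choice and the helper name digest_bits are assumptions of this sketch, not a recommendation.

```python
import hashlib

def digest_bits(message: bytes) -> int:
    """Return the length in bits of the SHA-1 digest of `message`."""
    return 8 * len(hashlib.sha1(message).digest())

short_msg = b"abc"
long_msg = b"a" * 1_000_000

# Both inputs, 3 bytes and one million bytes, map to a 160-bit digest.
print(digest_bits(short_msg), digest_bits(long_msg))
print(hashlib.sha1(short_msg).hexdigest())
```

Whatever the message length, the digest length stays fixed, which is the property the later applications build on.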
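How are they built?
Typically using Merkle-Damgård iteration:
1. Start from a “compression function”
– h: {0,1}^{b+n} → {0,1}^{n}
– E.g., chaining value c = 160 bits, block |M| = b = 512 bits, output d = h(c,M) = 160 bits
2. Iterate it
– d_0 = IV, d_i = h(d_{i-1}, M_i) for i = 1, …, L
– d = H(M) is the output at the end of the chain
But not always…
[Diagram: chain of compression-function boxes h, taking message blocks M_1, M_2, …, M_{L-1}, M_L and chaining values IV = d_0, d_1, d_2, …, d_{L-1}, d_L, ending in d = H(M)]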
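The iteration above can be sketched in a few lines. This is a toy under stated assumptions: the compression function compress is a stand-in built by truncating SHA-256 (not a real design), the IV is all zeros, and the padding rule (a 0x80 byte, zeros, then the 64-bit message length) is one common convention that the slide itself does not specify.

```python
import hashlib

N_BYTES = 20          # n = 160-bit chaining value and digest
BLOCK_BYTES = 64      # b = 512-bit message block
IV = bytes(N_BYTES)   # d_0; an arbitrary fixed constant for this sketch

def compress(c: bytes, block: bytes) -> bytes:
    """Toy compression function h: {0,1}^(b+n) -> {0,1}^n (truncated SHA-256 stand-in)."""
    assert len(c) == N_BYTES and len(block) == BLOCK_BYTES
    return hashlib.sha256(c + block).digest()[:N_BYTES]

def pad(msg: bytes) -> bytes:
    """Pad to a whole number of blocks: 0x80, zero bytes, then the bit length (8 bytes)."""
    padded = msg + b"\x80"
    padded += bytes((-len(padded) - 8) % BLOCK_BYTES)
    return padded + (8 * len(msg)).to_bytes(8, "big")

def md_hash(msg: bytes) -> bytes:
    """Iterate the compression function over the padded message blocks."""
    d = IV
    padded = pad(msg)
    for i in range(0, len(padded), BLOCK_BYTES):
        d = compress(d, padded[i:i + BLOCK_BYTES])   # d_i = h(d_{i-1}, M_i)
    return d

print(md_hash(b"abc").hex())
```

Only the n-bit chaining value survives from one block to the next, which is the structural property the extension and herding attacks later in the talk exploit.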
What are they good for?
“Request for Candidate Algorithm Nominations”,
-- NIST, November 2007
“Modern, collision resistant hash functions were designed to create
small, fixed size message digests so that a digest could act as a
proxy for a possibly very large variable length message in a digital
signature algorithm, such as RSA or DSA. These hash functions
have since been widely used for many other “ancillary” applications,
including hash-based message authentication codes, pseudo
random number generators, and key derivation functions.”
Some examples
• Signatures: sign(M) = RSA^{-1}( H(M) )
• Message-authentication: tag = H(key,M)
• Commitment: commit(M) = H(M,…)
• Key derivation: AES-key = H(DH-value)
• Removing interaction [Fiat-Shamir, 1987]
– Take interactive identification protocol
– Replace one side by a hash function
  Challenge = H(smthng, context)
– Get non-interactive signature scheme
[Diagram: interactive protocol between A and B: A sends smthng, B replies with challenge, A sends response. Non-interactive version: A sends (smthng, response).]
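The Fiat-Shamir step above can be sketched on a Schnorr-style identification protocol. Everything specific here is an assumption made for illustration: the choice of Schnorr, the toy group (p = 2039, q = 1019, g = 4, far too small for real use), SHA-256 as H, and all function names.

```python
import hashlib
import secrets

# Toy parameters (assumption): P = 2Q + 1 with P, Q prime; G has order Q modulo P.
P, Q, G = 2039, 1019, 4

def challenge(smthng: int, context: bytes) -> int:
    """Challenge = H(smthng, context), reduced to an exponent modulo Q."""
    data = smthng.to_bytes(2, "big") + context
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def keygen():
    x = secrets.randbelow(Q - 1) + 1          # secret key
    return x, pow(G, x, P)                    # (secret, public)

def sign(x: int, msg: bytes):
    k = secrets.randbelow(Q - 1) + 1          # prover's one-time randomness
    smthng = pow(G, k, P)                     # first message of the protocol
    c = challenge(smthng, msg)                # the hash plays the verifier's role
    response = (k + c * x) % Q
    return smthng, response

def verify(y: int, msg: bytes, sig) -> bool:
    smthng, response = sig
    c = challenge(smthng, msg)
    return pow(G, response, P) == (smthng * pow(y, c, P)) % P

x, y = keygen()
sig = sign(x, b"hello")
print(verify(y, b"hello", sig))
```

The verifier never sends anything: the challenge is recomputed from the transcript, so (smthng, response) is a self-contained signature on the message used as context.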
Part I: Random functions
vs. hash functions
Random functions
• What we really want is H that behaves “just like a random function”:
Digest d=H(M) chosen uniformly for each M
– Digest d=H(M) has no correlation with M
– For distinct M1,M2,…, digests di=H(Mi) are
completely uncorrelated to each other
– Cannot find collisions, or even near-collisions
– Cannot find M to “hit” a specific d
– Cannot find fixed-points (d = H(d))
– etc.
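One way to picture such a random function is as a table filled in lazily: each new message gets a fresh uniform digest, and repeated queries return the stored value. The class below is a minimal sketch of that idea; the name RandomOracle and the 160-bit output size are assumptions of this example.

```python
import os

class RandomOracle:
    """Lazily sampled random function {0,1}* -> {0,1}^160."""

    def __init__(self, digest_bytes: int = 20):
        self.digest_bytes = digest_bytes
        self._table = {}

    def query(self, message: bytes) -> bytes:
        # A new message gets an independent uniform digest; old answers are replayed.
        if message not in self._table:
            self._table[message] = os.urandom(self.digest_bytes)
        return self._table[message]

H = RandomOracle()
d = H.query(b"some message")
print(d == H.query(b"some message"), len(d) * 8)
```

Such an object has no short description, so it cannot be shipped as code; this is the gap between a random oracle and any real hash function that the next slides examine.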
The “Random-Oracle paradigm” [Bellare-Rogaway, 1993]
1. Pretend hash function is really this good
2. Design a secure cryptosystem using it
– Prove security relative to a “random oracle”
3. Replace oracle with a hash function
– Hope that it remains secure
• Very successful paradigm, many schemes
– E.g., OAEP encryption, FDH, PSS signatures
  • Also all the examples from before…
– Schemes seem to “withstand test of time”
Random oracles: rationale
• S is some crypto scheme (e.g., signatures), that uses a hash function H
• S proven secure when H is random function
• Any attack on real-world S must use some “nonrandom property” of H
• We should have chosen a better H
– without that “nonrandom property”
• Caveat: how do we know what “nonrandom properties” are important?
This rationale isn’t sound
• There exist signature schemes that are:
1. Provably secure wrt a random function
2. Easily broken for EVERY hash function
• Idea: hash functions are computable
– This is a “nonrandom property” by itself
• Exhibit a scheme which is secure only for “non-computable H’s”
– Scheme is (very) “contrived”
[Canetti-Goldreich-H 1997]
Contrived example
• Start from any secure signature scheme
– Denote signature algorithm by SIG1^H(key,msg)
• Change SIG1 to SIG2 as follows:
SIG2^H(key,msg): interpret msg as code P
– If P(i)=H(i) for i=1,2,3,…,|msg|, then output key
– Else output the same as SIG1^H(key,msg)
• If H is random, always the “Else” case
• If H is a hash function, attempting to sign the code of H outputs the secret key
[Callout: Some technicalities]
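The contrived scheme can be mimicked in a few lines. This is only a sketch under loud assumptions: SIG1 is replaced by an HMAC tag as a stand-in (it is not a signature scheme), the "code P" is passed as a Python callable instead of being parsed out of msg, the number of checked points is a fixed parameter n_checks rather than |msg|, and all names are hypothetical.

```python
import hashlib
import hmac
import os

def sig1(key: bytes, msg: bytes) -> bytes:
    """Stand-in for SIG1 (assumption: an HMAC tag, not a real signature scheme)."""
    return hmac.new(key, msg, hashlib.sha256).digest()

def sig2(H, key: bytes, program, msg: bytes, n_checks: int = 16) -> bytes:
    """SIG2^H: leak the key if `program` agrees with H on the points 1..n_checks."""
    if all(program(i) == H(i) for i in range(1, n_checks + 1)):
        return key
    return sig1(key, msg)

def concrete_hash(i: int) -> bytes:
    """A computable H, so its own code is a valid 'message'."""
    return hashlib.sha256(str(i).encode()).digest()

class LazyRandomFunction:
    """A random H: no program fixed in advance can predict its outputs."""
    def __init__(self):
        self.table = {}
    def __call__(self, i: int) -> bytes:
        return self.table.setdefault(i, os.urandom(32))

key = os.urandom(16)
leaked = sig2(concrete_hash, key, concrete_hash, b"code of H")
safe = sig2(LazyRandomFunction(), key, concrete_hash, b"code of H")
print(leaked == key, safe == key)
```

With the computable H the signer hands out its key, while against the lazily sampled function the check fails except with negligible probability, which is the separation the slide describes.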
Cautionary note
• ROM (random-oracle model) proofs may not mean what you think…
– Still they give valuable assurance, rule out “almost all realistic attacks”
• What “nonrandom properties” are important for OAEP / FDH / PSS / …?
• How would these schemes be affected by a weakness in the hash function in use?
• ROM may lead to careless implementation
Merkle-Damgård vs. random functions
• Recall: we often construct our hash functions from compression functions
– Even if compression is random, hash is not
  • E.g., H(key|M) subject to extension attack
– H(key | M|M’) = h( H(key|M), M’)
– Minor changes to MD fix this
  • But they come with a price (e.g. prefix-free encoding)
• Compression also built from low-level blocks
– E.g., Davies-Meyer construction, h(c,M) = E_M(c) ⊕ c
– Provide yet more structure, can lead to attacks on provable ROM schemes [H-Krawczyk 2007]
[Diagram: Merkle-Damgård chain of compression functions h, h, h, h, …]
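The extension property H(key | M | M’) = h( H(key|M), M’) can be checked directly on a toy Merkle-Damgård hash. Assumptions for this sketch: the compression function is a truncated-SHA-256 stand-in, inputs are whole 64-byte blocks, and no length padding is applied (with padding, the attacker would also have to account for the padding block, but the idea is the same).

```python
import hashlib

BLOCK = 64            # b = 512-bit blocks
IV = bytes(20)        # fixed initial chaining value (assumption)

def h(c: bytes, block: bytes) -> bytes:
    """Toy compression function (truncated SHA-256 stand-in)."""
    return hashlib.sha256(c + block).digest()[:20]

def md_hash(msg: bytes) -> bytes:
    """Plain Merkle-Damgard iteration over whole blocks, without padding."""
    assert len(msg) % BLOCK == 0
    d = IV
    for i in range(0, len(msg), BLOCK):
        d = h(d, msg[i:i + BLOCK])
    return d

key = b"k" * BLOCK                 # secret, unknown to the attacker
msg = b"m" * BLOCK
tag = md_hash(key + msg)           # tag = H(key | M), seen by the attacker

extension = b"x" * BLOCK
forged = h(tag, extension)         # computed from the tag alone, without the key

print(forged == md_hash(key + msg + extension))
```

The forged value is a valid tag for the longer message even though the key was never used to compute it.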
Part II: Using hash functions
in applications
Using “imperfect” hash functions
• Applications should rely only on “specific security properties” of hash functions
– Try to make these properties as “standard” and as weak as possible
• Increases the odds of long-term security
– When weaknesses are found in hash function, application more likely to survive
– E.g., MD5 is badly broken, but HMAC-MD5 is barely scratched
Security requirements
• Deterministic hashing
– Attacker chooses M, d=H(M)
• Hashing with a random salt
– Attacker chooses M, then good guy chooses public salt, d=H(salt,M)
• Hashing random messages
– M random, d=H(M)
• Hashing with a secret key
– Attacker chooses M, d=H(key,M)
[Side arrow labeled “Stronger” (top of list) to “Weaker” (bottom of list)]
Deterministic hashing
• Collision Resistance
– Attacker cannot find M,M’ such that H(M)=H(M’)
• Also many other properties
– Hard to find fixed-points, near-collisions, M s.t. H(M) has low Hamming weight, etc.
Hashing with public salt
• Target-Collision-Resistance (TCR)
– Attacker chooses M, then given random salt, cannot find M’ such that H(salt,M)=H(salt,M’)
• Enhanced TCR (eTCR)
– Attacker chooses M, then given random salt, cannot find M’,salt’ s.t. H(salt,M)=H(salt’,M’)
Hashing random messages
• Second Preimage Resistance
– Given random M, attacker cannot find M’ such that H(M)=H(M’)
• One-wayness
– Given d=H(M) for random M, attacker cannot find M’ such that H(M’)=d
• Extraction*
– For random salt, high-entropy M, the digest d=H(salt,M) is close to being uniform
* Combinatorial, not cryptographic
Hashing with a secret key
• Pseudo-Random Functions
– The mapping M → H(key,M) for secret key looks random to an attacker
• Universal hashing*
– For all M≠M’, Pr_key[ H(key,M)=H(key,M’) ] < ε
* Combinatorial, not cryptographic
Application 1:
Digital signatures
• Hash-then-sign paradigm
– First shorten the message, d = H(M)
– Then sign the digest, s = SIGN(d)
• Relies on collision resistance
– If H(M)=H(M’) then s is a signature on both
• Attacks on MD5, SHA-1 threaten current signatures
– MD5 attacks can be used to get bad CA cert [Stevens et al. 2009]
Collision resistance is hard
• Attacker works off-line (find M,M’)
– Can use state-of-the-art cryptanalysis, as much computation power as it can gather, without being detected !!
• Helped by birthday attack (e.g., 2^{80} vs 2^{160})
• Well worth the effort
– One collision ⇒ forgery for any signer
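The birthday effect (about 2^{n/2} trials for an n-bit digest, rather than 2^{n}) can be observed on a deliberately truncated hash. This is a minimal sketch: the 24-bit truncation of SHA-256 and the helper names are assumptions chosen so that the search finishes quickly.

```python
import hashlib
from itertools import count

def h24(msg: bytes) -> bytes:
    """SHA-256 truncated to 24 bits (toy digest, n = 24)."""
    return hashlib.sha256(msg).digest()[:3]

def find_collision():
    """Hash distinct messages until two share a digest; return both and the trial count."""
    seen = {}
    for trials in count(1):
        msg = str(trials).encode()
        d = h24(msg)
        if d in seen:
            return seen[d], msg, trials
        seen[d] = msg

m1, m2, trials = find_collision()
print(m1, m2, trials)
```

The trial count is typically on the order of 2^{12}, far below the 2^{24} needed to hit one specific digest, and the whole search runs off-line without touching any signer.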
Signatures without CRHF [Naor-Yung 1989, Bellare-Rogaway 1997]
• Use randomized hashing
– To sign M, first choose fresh random salt
– Set d = H(salt, M), s = SIGN( salt || d )
• Attack scenario (collision game):
– Attacker chooses M
– Signer chooses random salt
– Attacker must find M’ s.t. H(salt,M) = H(salt,M’), with the same salt (since salt is explicitly signed)
• Attack is inherently on-line
– Only rely on target collision resistance
TCR hashing for signatures
• Not every randomization works
– H(M|salt) may be subject to collision attacks
  • when H is Merkle-Damgård
– Yet this is what PSS does (and it’s provable in the ROM)
• Many constructions “in principle”
– From any one-way function
• Some engineering challenges
– Most constructions use long/variable-size randomness, don’t preserve Merkle-Damgård
• Also, signing salt means changing the underlying signature schemes
Signatures with enhanced TCR [H-Krawczyk 2006]
• Use “stronger randomized hashing”, eTCR
– To sign M, first choose fresh random salt
– Set d = H(salt, M), s = SIGN( d )
• Attack scenario (collision game):
– Attacker chooses M
– Signer chooses random salt
– Attacker needs M’,salt’ s.t. H(salt,M)=H(salt’,M’) (attacker can use different salt’)
• Attack is still inherently on-line
Randomized hashing with RMX
• Use simple message-randomization
– RMX: M=(M1,M2,…,ML), r ↦ (r, M1⊕r, M2⊕r, …, ML⊕r)
• Hash( RMX(r,M) ) is eTCR when:
– Hash is Merkle-Damgård, and
– Compression function is ~ 2nd-preimage-resistant
• Signature: [ r, SIGN( Hash( RMX(r,M) )) ]
– r fresh per signature, one block (e.g. 512 bits)
– No change in Hash, no signing of r
[H-Krawczyk 2006]
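The RMX transform itself is only a per-block XOR, as the sketch below shows. Assumptions of this example: SHA-256 stands in for Hash (it is a Merkle-Damgård hash with 512-bit blocks), messages are required to be a whole number of blocks so that no padding rule has to be invented, and SIGN is left out because the slides keep it abstract.

```python
import hashlib
import os

BLOCK = 64  # one 512-bit block

def rmx(r: bytes, msg: bytes) -> bytes:
    """RMX(r, M) = (r, M1 xor r, M2 xor r, ..., ML xor r) for block-aligned M."""
    if len(r) != BLOCK or len(msg) % BLOCK != 0:
        raise ValueError("this sketch needs a one-block salt and a block-aligned message")
    out = bytearray(r)
    for i in range(0, len(msg), BLOCK):
        out += bytes(a ^ b for a, b in zip(msg[i:i + BLOCK], r))
    return bytes(out)

def randomized_digest(msg: bytes):
    """Pick a fresh salt r and return (r, Hash(RMX(r, M))); the digest is what gets signed."""
    r = os.urandom(BLOCK)
    return r, hashlib.sha256(rmx(r, msg)).digest()

message = b"A" * (2 * BLOCK)
r, d = randomized_digest(message)
print(len(r), d.hex())
```

A verifier receives r alongside the signature, recomputes Hash(RMX(r, M)), and checks the signature on that digest; the hash function and the signing algorithm are both used unchanged.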
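Preserving hash-then-sign
[Diagram: the plain pipeline M = (M1,…,ML) → HASH → SIGN next to the randomized pipeline, where M = (M1,…,ML) and the salt r enter RMX, giving (r, M1⊕r, …, ML⊕r) → HASH → SIGN; the RMX and HASH stage is labeled TCR]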
Application 2:
Message authentication
• Sender, Receiver, share a secret key
• Compute an authentication tag
– tag = MAC(key, M)
• Sender sends (M, tag)
• Receiver verifies that tag matches M
• Attacker cannot forge tags without key
Authentication with HMAC
• Simple key-prepend/append have problems when used with a Merkle-Damgård hash
– tag=H(key | M) subject to extension attacks
– tag=H(M | key) relies on collision resistance
• HMAC: Compute tag = H(key | H(key | M))
– About as fast as key-prepend for a MD hash
• Relies only on PRF property of hash
– M → H(key|M) looks random when key is secret
– As a result, barely affected by collision attacks on MD5/SHA1
[Bellare-Canetti-Krawczyk 1996]
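The formula tag = H(key | H(key | M)) is schematic. In the standardized HMAC (RFC 2104) the inner and outer keys are the padded key XORed with two different constants. The sketch below spells that out for SHA-256 and compares it with Python's hmac module; the function name hmac_sha256_manual and the sample key and message are assumptions of this example.

```python
import hashlib
import hmac

BLOCK = 64  # SHA-256 block size in bytes

def hmac_sha256_manual(key: bytes, msg: bytes) -> bytes:
    """HMAC as nested hashing: H((K xor opad) | H((K xor ipad) | M))."""
    if len(key) > BLOCK:
        key = hashlib.sha256(key).digest()     # long keys are hashed first
    key = key.ljust(BLOCK, b"\x00")            # then zero-padded to one block
    inner_key = bytes(b ^ 0x36 for b in key)   # ipad constant
    outer_key = bytes(b ^ 0x5C for b in key)   # opad constant
    inner = hashlib.sha256(inner_key + msg).digest()
    return hashlib.sha256(outer_key + inner).digest()

key, msg = b"shared secret key", b"attack at dawn"
tag = hmac_sha256_manual(key, msg)
print(hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).digest()))
```

The outer hash call processes only a short, fixed-size input, which is why the cost stays close to that of a single keyed hash of the message.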
Carter-Wegman authentication
• Compress message with hash, t = H(key1,M)
• Hide t using a PRF, tag = t ⊕ PRF(key2,nonce)
– PRF can be AES, HMAC, RC4, etc.
– Only applied to a short nonce, typically not a performance bottleneck
• Secure if the PRF is good, H is “universal”
– For all M≠M’ and all Δ: Pr_key[ H(key,M) ⊕ H(key,M’) = Δ ] < ε
– Not cryptographic, can be very fast
[Wegman-Carter 1981,…]
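A hash-then-mask MAC of this shape can be sketched as follows. Note the substitutions, all of which are assumptions of this example: the universal hash is a polynomial evaluated modulo the prime 2^127 − 1, the mask is added modulo that prime instead of XORed (the additive analogue of the ⊕ condition above), the PRF is HMAC-SHA256, and the function names are hypothetical.

```python
import hashlib
import hmac

P = 2**127 - 1   # prime modulus for the polynomial hash (assumption)
CHUNK = 15       # bytes per message chunk, so every encoded chunk is below P

def poly_hash(k: int, msg: bytes) -> int:
    """t = H(key1, M): evaluate the message polynomial at k using Horner's rule."""
    acc = 0
    for i in range(0, len(msg), CHUNK):
        # Appending 0x01 keeps chunks of different lengths distinct and nonzero.
        m = int.from_bytes(msg[i:i + CHUNK] + b"\x01", "little")
        acc = (acc + m) * k % P
    return acc

def prf(key2: bytes, nonce: bytes) -> int:
    """PRF(key2, nonce), here HMAC-SHA256 mapped to an integer modulo P."""
    return int.from_bytes(hmac.new(key2, nonce, hashlib.sha256).digest(), "big") % P

def cw_tag(key1: int, key2: bytes, nonce: bytes, msg: bytes) -> int:
    """tag = t + PRF(key2, nonce) mod P; the nonce must never repeat under one key."""
    return (poly_hash(key1, msg) + prf(key2, nonce)) % P

def cw_verify(key1: int, key2: bytes, nonce: bytes, msg: bytes, tag: int) -> bool:
    return cw_tag(key1, key2, nonce, msg) == tag

key1 = 0x1234567890ABCDEF1234567890ABCDEF
key2 = b"k" * 16
tag = cw_tag(key1, key2, b"nonce-1", b"hello world")
print(cw_verify(key1, key2, b"nonce-1", b"hello world", tag))
```

Only the short nonce goes through the cryptographic primitive; the message itself is processed with modular multiplications and additions, which is where the speed comes from.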
Fast Universal Hashing
• “Universality” is combinatorial, provable
– no need for “security margins” in design
• Many works on fast implementations
– From inner-product, H_{k1,k2}(M1,M2) = (k1+M1)·(k2+M2)
  • [H-Krawczyk’97, Black et al.’99, …]
– From polynomial evaluation, H_k(M1,…,ML) = Σ_i Mi·k^i
  • [Krawczyk’94, Shoup’96, Bernstein’05, McGrew-Viega’06,…]
• As fast as 2-3 cycles per byte (for long M’s)
– Software implementation, contemporary CPUs
Part III:
Designing a hash function
Fugue: IBM’s candidate for the
NIST hash competition
Design a compression function?
PROs: modular design, reduce to the “simpler problem” of compressing fixed-length strings
– Many things are known about transforming compression into hash
CONs: compression → hash has its problems
– It’s not free (e.g. message encoding)
– Some attacks based on the MD structure
  • Extension attacks ( rely on H(x|y)=h(H(x),y) )
  • “Birthday attacks” (herding, multicollisions, …)
[Diagram: Merkle-Damgård chain h, h, h, …, h]
Example attack: herding [Kelsey-Kohno 2006]
• Find many off-line collisions
– “Tree structure” with ~ 2^{n/3} d_{i,j}’s
– Takes ~ 2^{2n/3} time
• Publish final d
• Then for any prefix P
– Find “linking block” L s.t. H(P|L) in the tree
– Takes ~ 2^{2n/3} time
– Read off the tree the suffix S to get to d
• Show an extension of P s.t. H(P|L|S) = d
[Diagram: collision tree of compression-function calls. Blocks M_{1,1}, …, M_{1,4} take chaining values d_{1,1}, …, d_{1,4} to d_{2,1} and d_{2,2}; blocks M_{2,1}, M_{2,2} take those to the final d.]
The culprit: small intermediate state
• With a compression function, we:
– Work hard on current message block
– Throw away this work, keep only n-bit state
• Alternative: keep a large state
– Work hard on current message block/word
– Update some part of the big state
• More flexible approach
– Also more opportunities to mess things up
The hash function Grindahl [Knudsen-Rechberger-Thomsen 2007]
• State is 13 words = 52 bytes
• Process one 4-byte word at a time
– One AES-like mixing step per word of input
• After some final processing, output 8 words
• Collision attack by Peyrin (2007)
– Complexity ~ 2^{112} (still better than brute-force)
• Recently improved to ~ 2^{100} [Khovratovich 2009]
– “Start from a collision and go backwards”
The hash function “Fugue” [H-Hall-Jutla 2008]
• Proof-driven design
– Designed to enable analysis
  • Proofs that Peyrin-style attacks do not work
• State of 30 4-byte words = 120 bytes
• Two “super-mixing” rounds per word of input
– Each applied to only 16 bytes of the state
– With some extra linear diffusion
• Super-mixing is AES-like
– But uses stronger MDS codes
Fugue-256
[Diagram: Initial State (30 words) → Process, fed with input words M1, …, Mi (iterate: State → New State) → Final Processing → Output 8 words = 256 bits]
Collision attacks
• Think of M1, …,ML and M’1,…,M’L
• Collision means that ΔMi’s are not all zero
• ΔState = 0?
– ΔState = 0: internal collision
– ΔState ≠ 0: external collision
[Diagram: Initial State (30 words) → Process, fed with input differences ΔM1, …, ΔMi (iterate: State → New State) → Final Processing → output difference Δ = 0]
Processing one input word
1. Input one word
2. Shift 3 columns to right
3. XOR into columns 1-3
4. “super-mix” (SMIX) operation on columns 1-4 (this is where the crypto happens)
5. Repeat 2-4 once more
[Diagram: Initial State (30 words) → Process (iterate: State → New State) → Final Stage; inside Process, the input word M1 enters the state and is followed by SMIX]
SMIX in Fugue
• Similar to one AES round
– Works on a 4x4 matrix of bytes
– Starts with S-box substitution
  • Byte b, S[256] = {...};
    ...
  • b = S[b];
– Does linear mixing
• Stronger mixing than AES
– Diagonal bytes as in AES
– Other bytes are mixed into both column and row
SMIX in Fugue
• In algebraic notation:
  (b’_1, b’_2, …, b’_16)^T = M_{16×16} · (S[b_1], S[b_2], …, S[b_16])^T
• M generates a good linear code
– If all the b’_i bytes but 4 are zero then ≥ 13 of the S[b_i] bytes must be nonzero
– And other such properties
Analyzing internal collisions*
[Figure: backward trace of state differences from an internal collision through two SMIX steps. Labels along the trace:
 after last input word: ΔState = 0
 before input word: Δ1 ≠ 0
 before SMIX: Δ1-4 ≠ 0
 still Δ1-4 ≠ 0, now Δ28-1 ≠ 0 (3 columns)
 SMIX: Δ28-4 ≠ 0
 Δ28-4 ≠ 0, then Δ25-1 ≠ 0 (3 columns)
 Δ’ before input: Δ1 = ?, Δ25-30 ≠ 0
 4 nonzero byte diffs
Callouts: “The analysis from previous slides was up to here”; “Many nonzero byte differences before the SMIX operations”]
* a bit oversimplified
Analyzing internal collisions
• What does this mean? Consider this attack:
– Attacker feeds in random M1,M2,… and M’1,M’2,…
– Until State_L ⊕ State’_L = some “good Δ”
– Then it searches for suffixes (M_{L+1},…,M_{L+4}), (M’_{L+1},…,M’_{L+4}) that will induce internal collision
Theorem*: For any fixed Δ,
Pr[ ∃ suffixes that induce collision ] < 2^{-150}
* Relies on very mild independence assumptions
Analyzing internal collisions
• Why do we care about this analysis?
• Peyrin’s attacks are of this type
• All differential attacks can be seen as (optimizations of) this attack
– Entities that are not controlled by attack are always presumed random
• A known “collision trace” is as close as we can get to understanding collision resistance
Fugue: concluding remarks
• Similar analysis also for external collisions
– “Unusually thorough” level of analysis
• Performance comparable to SHA-256
– But more amenable to parallelism
• One of 14 submissions that were selected by NIST to advance to 2nd round of the SHA-3 competition
Morals
• Hash functions are very useful
• We want them to behave “just like random functions”
– But they don’t really
• Applications should be designed to rely on “as weak as practical” properties of hashing
– E.g., TCR/eTCR rather than collision-resistance
• A taste of how a hash function is built
Thank you!

More Related Content

PPTX
Hash Techniques in Cryptography
PPT
lec-05-Message authentication, hashing, basic number theory.ppt
PPT
ch11_hashing Function.ppthdhdjdjdidjebehehejeueu
PPT
Network Security Lec5
PPTX
20180503_hash_based.pptx
PDF
cryptography summary hash function slides
PPTX
Bitcoin MOOC Lecture 1.pptx
PPT
Hash Techniques in Cryptography
lec-05-Message authentication, hashing, basic number theory.ppt
ch11_hashing Function.ppthdhdjdjdidjebehehejeueu
Network Security Lec5
20180503_hash_based.pptx
cryptography summary hash function slides
Bitcoin MOOC Lecture 1.pptx

Similar to Cryptographic-Hash-Functions.ppt (20)

PPT
NSC_Unit-III_final.ppt
PPT
27-SHA1.ppt
PPTX
Lecture 2 Message Authentication
PPT
Hash crypto
PPT
Hash crypto
PPT
Hash crypto
PPT
Hash crypto
PPT
Hash crypto
PPT
Hash crypto
PPT
Hash crypto
PPT
PPT
Hash Function & Analysis
PPTX
Introduction to security_and_crypto
PPTX
Introduction to security_and_crypto
PPTX
Introduction to security_and_crypto
PPTX
Introduction to security_and_crypto
PPTX
Introduction to security_and_crypto
PPTX
Introduction to security_and_crypto
PPTX
Introduction to security_and_crypto
PPTX
Cryptography Key Management.pptx
NSC_Unit-III_final.ppt
27-SHA1.ppt
Lecture 2 Message Authentication
Hash crypto
Hash crypto
Hash crypto
Hash crypto
Hash crypto
Hash crypto
Hash crypto
Hash Function & Analysis
Introduction to security_and_crypto
Introduction to security_and_crypto
Introduction to security_and_crypto
Introduction to security_and_crypto
Introduction to security_and_crypto
Introduction to security_and_crypto
Introduction to security_and_crypto
Cryptography Key Management.pptx
Ad

Recently uploaded (20)

PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
Internet of Things (IOT) - A guide to understanding
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPT
Project quality management in manufacturing
PPT
Mechanical Engineering MATERIALS Selection
PDF
Well-logging-methods_new................
PPTX
web development for engineering and engineering
PDF
PPT on Performance Review to get promotions
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PDF
composite construction of structures.pdf
PPTX
OOP with Java - Java Introduction (Basics)
PPTX
bas. eng. economics group 4 presentation 1.pptx
PDF
Digital Logic Computer Design lecture notes
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
Internet of Things (IOT) - A guide to understanding
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Automation-in-Manufacturing-Chapter-Introduction.pdf
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
CYBER-CRIMES AND SECURITY A guide to understanding
Project quality management in manufacturing
Mechanical Engineering MATERIALS Selection
Well-logging-methods_new................
web development for engineering and engineering
PPT on Performance Review to get promotions
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
Embodied AI: Ushering in the Next Era of Intelligent Systems
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
composite construction of structures.pdf
OOP with Java - Java Introduction (Basics)
bas. eng. economics group 4 presentation 1.pptx
Digital Logic Computer Design lecture notes
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Ad

Cryptographic-Hash-Functions.ppt

  • 1. Cryptographic Hash Functions and their many applications Shai Halevi – IBM Research USENIX Security – August 2009 Thanks to Charanjit Jutla and Hugo Krawczyk
  • 2. What are hash functions?  Just a method of compressing strings – E.g., H : {0,1}*  {0,1}160 – Input is called “message”, output is “digest”  Why would you want to do this? – Short, fixed-size better than long, variable-size  True also for non-crypto hash functions – Digest can be added for redundancy – Digest hides possible structure in message
  • 3. Typically using Merkle-Damgård iteration: 1. Start from a “compression function” – h: {0,1}b+n{0,1}n 2. Iterate it How are they built? h c =160 bits |M|=b=512 bits d=h(c,M)=160 bits h h h h … M1 M2 ML-1 ML IV=d0 d1 d2 dL-1 dL d=H(M) But not always…
  • 4. What are they good for? “Request for Candidate Algorithm Nominations”, -- NIST, November 2007 “Modern, collision resistant hash functions were designed to create small, fixed size message digests so that a digest could act as a proxy for a possibly very large variable length message in a digital signature algorithm, such as RSA or DSA. These hash functions have since been widely used for many other “ancillary” applications, including hash-based message authentication codes, pseudo random number generators, and key derivation functions.”
  • 5. Some examples  Signatures: sign(M) = RSA-1( H(M) )  Message-authentication: tag=H(key,M)  Commitment: commit(M) = H(M,…)  Key derivation: AES-key = H(DH-value)  Removing interaction [Fiat-Shamir, 1987] – Take interactive identification protocol – Replace one side by a hash function Challenge = H(smthng, context) – Get non-interactive signature scheme smthng challenge response A B smthng, response
  • 6. Part I: Random functions vs. hash functions
  • 7. Random functions  What we really want is H that behaves “just like a random function”: Digest d=H(M) chosen uniformly for each M – Digest d=H(M) has no correlation with M – For distinct M1,M2,…, digests di=H(Mi) are completely uncorrelated to each other – Cannot find collisions, or even near-collisions – Cannot find M to “hit” a specific d – Cannot find fixed-points (d = H(d)) – etc.
  • 8. The “Random-Oracle paradigm” 1. Pretend hash function is really this good 2. Design a secure cryptosystem using it  Prove security relative to a “random oracle” [Bellare-Rogaway, 1993]
  • 9. The “Random-Oracle paradigm” [Bellare-Rogaway, 1993] 1. Pretend hash function is really this good 2. Design a secure cryptosystem using it  Prove security relative to a “random oracle” 3. Replace oracle with a hash function  Hope that it remains secure
  • 10. The “Random-Oracle paradigm” 1. Pretend hash function is really this good 2. Design a secure cryptosystem using it  Prove security relative to a “random oracle” 3. Replace oracle with a hash function  Hope that it remains secure  Very successful paradigm, many schemes – E.g., OAEP encryption, FDH,PSS signatures  Also all the examples from before… – Schemes seem to “withstand test of time” [Bellare-Rogaway, 1993]
  • 11. Random oracles: rationale  S is some crypto scheme (e.g., signatures), that uses a hash function H  S proven secure when H is random function  Any attack on real-world S must use some “nonrandom property” of H  We should have chosen a better H – without that “nonrandom property”  Caveat: how do we know what “nonrandom properties” are important?
  • 12. This rationale isn’t sound  Exist signature schemes that are: 1. Provably secure wrt a random function 2. Easily broken for EVERY hash function  Idea: hash functions are computable – This is a “nonrandom property” by itself  Exhibit a scheme which is secure only for “non-computable H’s” – Scheme is (very) “contrived” [Canetti-Goldreich-H 1997]
  • 13. Contrived example  Start from any secure signature scheme – Denote signature algorithm by SIG1H(key,msg)  Change SIG1 to SIG2 as follows: SIG2H(key,msg): interprate msg as code P – If P(i)=H(i) for i=1,2,3,…,|msg|, then output key – Else output the same as SIG1H(key,msg)  If H is random, always the “Else” case  If H is a hash function, attempting to sign the code of H outputs the secret key Some Technicalities
  • 14. Cautionary note  ROM proofs may not mean what you think… – Still they give valuable assurance, rule out “almost all realistic attacks”  What “nonrandom properties” are important for OAEP / FDH / PSS / …?  How would these scheme be affected by a weakness in the hash function in use?  ROM may lead to careless implementation
  • 15. Merkle-Damgård vs. random functions  Recall: we often construct our hash functions from compression functions – Even if compression is random, hash is not  E.g., H(key|M) subject to extension attack – H(key | M|M’) = h( H(key|M), M’) – Minor changes to MD fix this  But they come with a price (e.g. prefix-free encoding)  Compression also built from low-level blocks – E.g., Davies-Meyer construction, h(c,M)=EM(c)c – Provide yet more structure, can lead to attacks on provable ROM schemes [H-Krawczyk 2007] h h h h …
  • 16. Part II: Using hash functions in applications
  • 17. Using “imperfect” hash functions  Applications should rely only on “specific security properties” of hash functions – Try to make these properties as “standard” and as weak as possible  Increases the odds of long-term security – When weaknesses are found in hash function, application more likely to survive – E.g., MD5 is badly broken, but HMAC-MD5 is barely scratched
  • 18. Security requirements  Deterministic hashing – Attacker chooses M, d=H(M)  Hashing with a random salt – Attacker chooses M, then good guy chooses public salt, d=H(salt,M)  Hashing random messages – M random, d=H(M)  Hashing with a secret key – Attacker chooses M, d=H(key,M) Stronger Weaker
  • 19. Deterministic hashing  Collision Resistance – Attacker cannot find M,M’ such that H(M)=H(M’)  Also many other properties – Hard to find fixed-points, near-collisions, M s.t. H(M) has low Hamming weight, etc.
  • 20. Hashing with public salt  Target-Collision-Resistance (TCR) – Attacker chooses M, then given random salt, cannot find M’ such that H(salt,M)=H(salt,M’)  enhanced TRC (eTCR) – Attacker chooses M, then given random salt, cannot find M’,salt’ s.t. H(salt,M)=H(salt’,M’)
  • 21. Hashing random messages  Second Preimage Resistance – Given random M, attacker cannot find M’ such that H(M)=H(M’)  One-wayness – Given d=H(M) for random M, attacker cannot find M’ such that H(M’)=d  Extraction* – For random salt, high-entropy M, the digest d=H(salt,M) is close to being uniform * Combinatorial, not cryptographic
  • 22. Hashing with a secret key  Pseudo-Random Functions – The mapping MH(key,M) for secret key looks random to an attacker  Universal hashing* – For all MM’, Prkey[ H(key,M)=H(key,M’) ]<e * Combinatorial, not cryptographic
  • 23. Application 1: Digital signatures  Hash-then-sign paradigm – First shorten the message, d = H(M) – Then sign the digest, s = SIGN(d)  Relies on collision resistance – If H(M)=H(M’) then s is a signature on both  Attacks on MD5, SHA-1 threaten current signatures – MD5 attacks can be used to get bad CA cert [Stevens et al. 2009]
  • 24. Collision resistance is hard  Attacker works off-line (find M,M’) – Can use state-of-the-art cryptanalysis, as much computation power as it can gather, without being detected !!  Helped by birthday attack (e.g., 280 vs 2160)  Well worth the effort – One collision  forgery for any signer
  • 25.  Use randomized hashing – To sign M, first choose fresh random salt – Set d= H(salt, M), s= SIGN( salt || d )  Attack scenario (collision game): – Attacker chooses M, M’ – Signer chooses random salt – Attacker must find M' s.t. H(salt,M) = H(salt,M')  Attack is inherently on-line – Only rely on target collision resistance Signatures without CRHF [Naor-Yung 1989, Bellare-Rogaway 1997] same salt (since salt is explicitly signed)
  • 26. TCR hashing for signatures  Not every randomization works – H(M|salt) may be subject to collision attacks  when H is Merkle-Damgård – Yet this is what PSS does (and it’s provable in the ROM)  Many constructions “in principle” – From any one-way function  Some engineering challenges – Most constructions use long/variable-size randomness, don’t preserve Merkle-Damgård  Also, signing salt means changing the underlying signature schemes
  • 27.  Use “stronger randomized hashing”, eTCR – To sign M, first choose fresh random salt – Set d = H(salt, M), s = SIGN( d )  Attack scenario (collision game): – Attacker chooses M – Signer chooses random salt – Attacker needs M‘,salt’ s.t. H(salt,M)=H(salt',M')  Attack is still inherently on-line [H-Krawczyk 2006] attacker can use different salt’ Signatures with enhanced TCR
  • 28. Randomized hashing with RMX  Use simple message-randomization – RMX: M=(M1,M2,…,ML), r  (r, M1r,M2r,…,MLr)  Hash( RMX(r,M) ) is eTCR when: – Hash is Merkle-Damgård, and – Compression function is ~ 2nd-preimage-resistant  Signature: [ r, SIGN( Hash( RMX(r,M) )) ] – r fresh per signature, one block (e.g. 512 bits) – No change in Hash, no signing of r [H-Krawczyk 2006]
  • 29. HASH SIGN r HASH SIGN RMX M =(M1,…,ML) X (r, M1r,,…,MLr) M =(M1,…,ML) Preserving hash-then-sign TCR
  • 30. Application 2: Message authentication  Sender, Receiver, share a secret key  Compute an authentication tag – tag = MAC(key, M)  Sender sends (M, tag)  Receiver verifies that tag matches M  Attacker cannot forge tags without key
  • 31. Authentication with HMAC  Simple key-prepend/append have problems when used with a Merkle-Damgård hash – tag=H(key | M) subject to extension attacks – tag=H(M | key) relies on collision resistance  HMAC: Compute tag = H(key | H(key | M)) – About as fast as key-prepend for a MD hash  Relies only on PRF quality of hash – MH(key|M) looks random when key is secret [Bellare-Canetti-Krawczyk 1996]
  • 32. Authentication with HMAC  Simple key-prepend/append have problems when used with a Merkle-Damgård hash – tag=H(key | M) subject to extension attacks – tag=H(M | key) relies on collision resistance  HMAC: Compute tag = H(key | H(key | M)) – About as fast as key-prepend for a MD hash  Relies only on PRF property of hash – MH(key|M) looks random when key is secret [Bellare-Canetti-Krawczyk 1996] As a result, barely affected by collision attacks on MD5/SHA1
  • 33. Carter-Wegman authentication  Compress message with hash, t=H(key1,M)  Hide t using a PRF, tag = tPRF(key2,nonce) – PRF can be AES, HMAC, RC4, etc. – Only applied to a short nonce, typically not a performance bottleneck  Secure if the PRF is good, H is “universal” – For MM’,D, Prkey[ H(key,M)H(key,M’)=D ]<e) – Not cryptographic, can be very fast [Wegman-Carter 1981,…]
  • 34. Fast Universal Hashing  “Universality” is combinatorial, provable  no need for “security margins” in design  Many works on fast implementations From inner-product, Hk1,k2(M1,M2)=(K1+M1)·(K2+M2)  [H-Krawczyk’97, Black et al.’99, …] From polynomial evaluation Hk(M1,…,ML)=Si Mi ki  [Krawczyk’94, Shoup’96, Bernstein’05, McGrew- Viega’06,…]  As fast as 2-3 cycle-per-byte (for long M’s) – Software implementation, contemporary CPUs
  • 35. Part III: Designing a hash function Fugue: IBM’s candidate for the NIST hash competition
  • 36. Design a compression function? PROs: modular design, reduce to the “simpler problem” of compressing fixed-length strings – Many things are known about transforming compression into hash CONs: compressionhash has its problems – It’s not free (e.g. message encoding) – Some attacks based on the MD structure  Extension attacks ( rely on H(x|y)=h(H(x),y) )  “Birthday attacks” (herding, multicollisions, …) h h h … h
  • 37.  Find many off-line collisions – “Tree structure” with ~2n/3 di,j’s – Takes ~ 22n/3 time  Publish final d  Then for any prefix P – Find “linking block” L s.t. H(P|L) in the tree – Takes ~ 22n/3 time – Read off the tree the suffix S to get to d  Show an extension of P s.t. H(P|L|S) = d Example attack: herding [Kelsey-Kohno 2006] h h h h d2,1 h h d M1,1 M1,2 M1,3 M1,4 M2,1 M2,2 d1,1 d1,2 d1,3 d1,4 d2,2
  • 38. The culprit: small intermediate state  With a compression function, we: – Work hard on current message block – Throw away this work, keep only n-bit state  Alternative: keep a large state – Work hard on current message block/word – Update some part of the big state  More flexible approach – Also more opportunities to mess things up
  • 39. The hash function Grindahl  State is 13 words = 52 bytes  Process one 4-byte word at a time – One AES-like mixing step per word of input  After some final processing, output 8 words  Collision attack by Peyrin (2007) – Complexity ~2^112 (still better than brute-force)  Recently improved to ~2^100 [Khovratovich 2009] – “Start from a collision and go backwards” [Knudsen-Rechberger-Thomsen 2007]
  • 40. The hash function “Fugue”  Proof-driven design – Designed to enable analysis  Proofs that Peyrin-style attacks do not work  State of 30 4-byte words = 120 bytes  Two “super-mixing” rounds per word of input – Each applied to only 16 bytes of the state – With some extra linear diffusion  Super-mixing is AES-like – But uses stronger MDS codes [H-Hall-Jutla 2008]
  • 41. Fugue-256 [Diagram: initial state (30 words) → process each input word M1, …, Mi into a new state, iterating → final processing → output 8 words = 256 bits]
  • 42. Collision attacks [Diagram: the same iteration, now fed input differences ΔM1, …, ΔMi, ending with output difference Δ = 0]  Think of M1,…,ML and M’1,…,M’L  Collision means that the ΔMi’s are not all zero  ΔState = 0 before final processing? – ΔState = 0: internal collision – ΔState ≠ 0: external collision
  • 43. Processing one input word 1. Input one word 2. Shift 3 columns to right 3. XOR into columns 1-3 4. “Super-mix” (SMIX) operation on columns 1-4 (this is where the crypto happens)  Repeat 2-4 once more [Diagram: initial state (30 words) → process M1 → new state, iterating → final stage]
  • 44. SMIX in Fugue  Similar to one AES round – Works on a 4x4 matrix of bytes – Starts with S-box substitution  Byte b, S[256] = {...}; ...  b = S[b]; – Does linear mixing  Stronger mixing than AES – Diagonal bytes as in AES – Other bytes are mixed into both column and row
  • 45. SMIX in Fugue  In algebraic notation: (b1’, b2’, …, b16’)^T = M_16x16 · (S[b1], S[b2], …, S[b16])^T  M generates a good linear code – If all the bi’ bytes but 4 are zero then ≥ 13 of the S[bi] bytes must be nonzero – And other such properties
  • 46. Analyzing internal collisions* [Diagram, traced backwards from the collision] After last input word: ΔState = 0 → before input word: Δ1 ≠ 0 (4 nonzero byte diffs) → before SMIX: Δ1-4 ≠ 0 → still Δ1-4 ≠ 0 → before the 3-column shift: now Δ28-1 ≠ 0 * a bit oversimplified
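  • 47. Analyzing internal collisions* [Diagram, traced backwards from the collision] After input word: ΔState = 0 → before input word: Δ1 ≠ 0 → before SMIX: Δ1-4 ≠ 0 → still Δ1-4 ≠ 0 → before the 3-column shift: now Δ28-1 ≠ 0 → before the preceding SMIX: Δ28-4 ≠ 0 → still Δ28-4 ≠ 0 → before the 3-column shift: Δ25-1 ≠ 0 (4 nonzero byte diffs) * a bit oversimplified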
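  • 48. Analyzing internal collisions* [Diagram, traced backwards from the collision] After input word: ΔState = 0 → before input word: Δ1 ≠ 0 → before SMIX: Δ1-4 ≠ 0 → still Δ1-4 ≠ 0 → before the 3-column shift: now Δ28-1 ≠ 0 → before the preceding SMIX: Δ28-4 ≠ 0 → still Δ28-4 ≠ 0 → before the 3-column shift: Δ25-1 ≠ 0 → Δ’ before input: Δ1 = ?, Δ25-30 ≠ 0 * a bit oversimplified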
  • 49. The analysis from previous slides was up to here. Many nonzero byte differences before the SMIX operations.
  • 51. Analyzing internal collisions  What does this mean? Consider this attack: – Attacker feeds in random M1,M2,… and M’1,M’2,… – Until State_L ⊕ State’_L = some “good Δ” – Then it searches for suffixes (M_L+1,…,M_L+4), (M’_L+1,…,M’_L+4) that will induce internal collision Theorem*: For any fixed Δ, Pr[ ∃ suffixes that induce collision ] < 2^-150 * Relies on very mild independence assumptions
  • 52. Analyzing internal collisions  Why do we care about this analysis?  Peyrin’s attacks are of this type  All differential attacks can be seen as (optimizations of) this attack – Entities that are not controlled by attack are always presumed random  A known “collision trace” is as close as we can get to understanding collision resistance
  • 53. Fugue: concluding remarks  Similar analysis also for external collisions – “Unusually thorough” level of analysis  Performance comparable to SHA-256 – But more amenable to parallelism  One of 14 submissions that were selected by NIST to advance to 2nd round of the SHA3 competition
  • 54. Morals  Hash functions are very useful  We want them to behave “just like random functions” – But they don’t really  Applications should be designed to rely on “as weak as practical” properties of hashing – E.g., TCR/eTCR rather than collision-resistance  A taste of how a hash function is built