Cryptographic-Hash-Functions.ppt

Cryptographic Hash Functions
and their many applications
Shai Halevi – IBM Research
USENIX Security – August 2009
Thanks to Charanjit Jutla and Hugo Krawczyk

What are hash functions?
 Just a method of compressing strings
– E.g., H : {0,1}* → {0,1}^160
– Input is called “message”, output is “digest”
 Why would you want to do this?
– Short, fixed-size better than long, variable-size
 True also for non-crypto hash functions
– Digest can be added for redundancy
– Digest hides possible structure in message
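
A small illustration of the "short, fixed-size" point, as a sketch using Python's standard hashlib. SHA-1 is picked only because its digest is 160 bits like the example above; the helper name digest_bits is an assumption made for this illustration.

```python
import hashlib

def digest_bits(message: bytes) -> int:
    """Return the length in bits of the SHA-1 digest of message."""
    return 8 * len(hashlib.sha1(message).digest())

# A 3-byte message and a 3-megabyte message both compress to 160 bits.
print(digest_bits(b"abc"), digest_bits(b"abc" * 1_000_000))
```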

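How are they built?
Typically using Merkle-Damgård iteration (but not always…):
1. Start from a “compression function”
– h: {0,1}^(b+n) → {0,1}^n
– E.g., chaining value c = 160 bits, message block |M| = b = 512 bits, d = h(c,M) = 160 bits
2. Iterate it
– d0 = IV, di = h(di-1, Mi) for i = 1,…,L, and d = H(M) = dL
[Figure: chain of compression functions h over blocks M1, M2, …, ML-1, ML, from IV = d0 through d1, d2, …, dL-1 to dL]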
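
The iteration can be sketched in a few lines. This is a toy under stated assumptions, not any standardized hash: the compression function h is a stand-in built from truncated SHA-256, the all-zero IV is arbitrary, and the length padding in pad is a common convention that the slide does not spell out.

```python
import hashlib

BLOCK = 64  # b = 512 bits of message per call
STATE = 20  # n = 160 bits of chaining value

def h(c: bytes, m: bytes) -> bytes:
    """Stand-in compression function {0,1}^(b+n) -> {0,1}^n (truncated SHA-256)."""
    assert len(c) == STATE and len(m) == BLOCK
    return hashlib.sha256(c + m).digest()[:STATE]

def pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes, and the 8-byte bit length, giving whole blocks."""
    zeros = -(len(message) + 1 + 8) % BLOCK
    return message + b"\x80" + b"\x00" * zeros + (8 * len(message)).to_bytes(8, "big")

def md_hash(message: bytes, iv: bytes = b"\x00" * STATE) -> bytes:
    """d0 = IV, di = h(d(i-1), Mi), output dL."""
    d = iv
    padded = pad(message)
    for i in range(0, len(padded), BLOCK):
        d = h(d, padded[i:i + BLOCK])
    return d

print(md_hash(b"hello").hex())
```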

What are they good for?
“Request for Candidate Algorithm Nominations”,
-- NIST, November 2007
“Modern, collision resistant hash functions were designed to create
small, fixed size message digests so that a digest could act as a
proxy for a possibly very large variable length message in a digital
signature algorithm, such as RSA or DSA. These hash functions
have since been widely used for many other “ancillary” applications,
including hash-based message authentication codes, pseudo
random number generators, and key derivation functions.”

Some examples
 Signatures: sign(M) = RSA^-1( H(M) )
 Message-authentication: tag=H(key,M)
 Commitment: commit(M) = H(M,…)
 Key derivation: AES-key = H(DH-value)
 Removing interaction [Fiat-Shamir, 1987]
– Take interactive identification protocol
– Replace one side by a hash function
Challenge = H(smthng, context)
– Get non-interactive signature scheme
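[Figure: interactive protocol between A and B: A sends smthng, B sends challenge, A sends response. Non-interactive version: A sends (smthng, response).]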
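
One well-known instance of this transform turns Schnorr identification into a signature scheme. The sketch below is a toy under loud assumptions: the modulus 2^127 - 1, the base 3, and all function names are chosen for illustration only and give no real security. The commitment plays the role of "smthng" and the message plays the role of "context".

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus (assumption, far too small for real use)
G = 3           # toy base (assumption)
ORDER = P - 1   # exponents can be reduced modulo P - 1

def challenge(commitment: int, message: bytes) -> int:
    """Challenge = H(smthng, context): the hash replaces the verifier."""
    data = commitment.to_bytes(16, "big") + message
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER

def keygen():
    x = secrets.randbelow(ORDER - 1) + 1  # secret key
    return x, pow(G, x, P)                # (secret, public)

def sign(x: int, message: bytes):
    k = secrets.randbelow(ORDER - 1) + 1
    commitment = pow(G, k, P)             # first protocol message
    e = challenge(commitment, message)    # computed locally, no interaction
    response = (k + x * e) % ORDER
    return commitment, response

def verify(y: int, message: bytes, signature) -> bool:
    commitment, response = signature
    e = challenge(commitment, message)
    return pow(G, response, P) == (commitment * pow(y, e, P)) % P

secret, public = keygen()
sig = sign(secret, b"hello")
print(verify(public, b"hello", sig))
```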

Part I: Random functions
vs. hash functions

Random functions
 What we really want is H that behaves
“just like a random function”:
Digest d=H(M) chosen uniformly for each M
– Digest d=H(M) has no correlation with M
– For distinct M1,M2,…, digests di=H(Mi) are
completely uncorrelated to each other
– Cannot find collisions, or even near-collisions
– Cannot find M to “hit” a specific d
– Cannot find fixed-points (d = H(d))
– etc.
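
The ideal object can be modelled by lazy sampling: draw a fresh uniform digest the first time a message is queried and remember it. A minimal sketch, with the class name RandomOracle and the 20-byte digest size chosen here for illustration; note that the table grows with every query, which is exactly why a real hash function cannot be one of these.

```python
import secrets

class RandomOracle:
    """Lazily sampled random function from byte strings to fixed-size digests."""

    def __init__(self, digest_bytes: int = 20):
        self.digest_bytes = digest_bytes
        self.table = {}  # message -> digest, filled in on first query

    def query(self, message: bytes) -> bytes:
        if message not in self.table:
            # Fresh uniform digest, independent of the message and of all others.
            self.table[message] = secrets.token_bytes(self.digest_bytes)
        return self.table[message]

oracle = RandomOracle()
print(oracle.query(b"M1") == oracle.query(b"M1"))  # consistent answers
```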

The “Random-Oracle paradigm”
1. Pretend hash function is really this good
2. Design a secure cryptosystem using it
 Prove security relative to a “random oracle”
[Bellare-Rogaway, 1993]

3. Replace oracle with a hash function
 Hope that it remains secure
 Very successful paradigm, many schemes
– E.g., OAEP encryption, FDH,PSS signatures
 Also all the examples from before…
– Schemes seem to “withstand test of time”

Random oracles: rationale
 S is some crypto scheme (e.g., signatures),
that uses a hash function H
 S proven secure when H is random function
 Any attack on real-world S must use
some “nonrandom property” of H
 We should have chosen a better H
– without that “nonrandom property”
 Caveat: how do we know what “nonrandom
properties” are important?

This rationale isn’t sound
 There exist signature schemes that are:
1. Provably secure wrt a random function
2. Easily broken for EVERY hash function
 Idea: hash functions are computable
– This is a “nonrandom property” by itself
 Exhibit a scheme which is secure only
for “non-computable H’s”
– Scheme is (very) “contrived”
[Canetti-Goldreich-H 1997]

Contrived example
 Start from any secure signature scheme
– Denote signature algorithm by SIG1^H(key,msg)
 Change SIG1 to SIG2 as follows:
SIG2^H(key,msg): interpret msg as code P
– If P(i)=H(i) for i=1,2,3,…,|msg|, then output key
– Else output the same as SIG1^H(key,msg)
 If H is random, always the “Else” case
 If H is a hash function, attempting to sign
the code of H outputs the secret key
(modulo some technicalities)

Cautionary note
 ROM proofs may not mean what you think…
– Still they give valuable assurance, rule out
“almost all realistic attacks”
 What “nonrandom properties” are important
for OAEP / FDH / PSS / …?
 How would these schemes be affected by a
weakness in the hash function in use?
 ROM may lead to careless implementation

Merkle-Damgård vs. random functions
 Recall: we often construct our hash functions
from compression functions
– Even if compression is random, hash is not
 E.g., H(key|M) subject to extension attack
– H(key | M|M’) = h( H(key|M), M’)
– Minor changes to MD fix this
 But they come with a price (e.g. prefix-free encoding)
 Compression also built from low-level blocks
– E.g., Davies-Meyer construction,
h(c,M) = E_M(c) ⊕ c
– Provide yet more structure, can lead to attacks
on provable ROM schemes [H-Krawczyk 2007]
[Figure: Merkle-Damgård chain of compression functions h]
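
The extension attack on H(key|M) can be replayed on a toy Merkle-Damgård hash. Assumptions in this sketch: the compression function h is truncated SHA-256, all inputs are whole blocks, and there is no length padding, so the identity H(key | M|M’) = h( H(key|M), M’) holds literally. With real padding the attacker must also append the padding block, but the idea is the same.

```python
import hashlib

BLOCK, STATE = 64, 20
IV = b"\x00" * STATE

def h(c: bytes, m: bytes) -> bytes:
    """Stand-in compression function (truncated SHA-256)."""
    return hashlib.sha256(c + m).digest()[:STATE]

def H(message: bytes) -> bytes:
    """Plain Merkle-Damgård over block-aligned input, no padding (assumption)."""
    assert len(message) % BLOCK == 0
    d = IV
    for i in range(0, len(message), BLOCK):
        d = h(d, message[i:i + BLOCK])
    return d

def extend(tag: bytes, suffix_block: bytes) -> bytes:
    """What the attacker computes: a valid tag for M|M' without knowing the key."""
    return h(tag, suffix_block)

key = b"k" * BLOCK          # secret, never given to extend()
M = b"A" * BLOCK
M_ext = b"B" * BLOCK
tag = H(key + M)            # the tag the attacker observes
print(extend(tag, M_ext) == H(key + M + M_ext))
```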

Part II: Using hash functions
in applications

Using “imperfect” hash functions
 Applications should rely only on “specific
security properties” of hash functions
– Try to make these properties as “standard” and
as weak as possible
 Increases the odds of long-term security
– When weaknesses are found in hash function,
application more likely to survive
– E.g., MD5 is badly broken, but HMAC-MD5 is
barely scratched

Security requirements
 Deterministic hashing
– Attacker chooses M, d=H(M)
 Hashing with a random salt
– Attacker chooses M, then good guy
chooses public salt, d=H(salt,M)
 Hashing random messages
– M random, d=H(M)
 Hashing with a secret key
– Attacker chooses M, d=H(key,M)
[Arrow beside the list: requirements run from stronger (top) to weaker (bottom)]

Deterministic hashing
 Collision Resistance
– Attacker cannot find M,M’ such that H(M)=H(M’)
 Also many other properties
– Hard to find fixed-points, near-collisions,
M s.t. H(M) has low Hamming weight, etc.

Hashing with public salt
 Target-Collision-Resistance (TCR)
– Attacker chooses M, then given random salt,
cannot find M’ such that H(salt,M)=H(salt,M’)
 Enhanced TCR (eTCR)
– Attacker chooses M, then given random salt,
cannot find M’,salt’ s.t. H(salt,M)=H(salt’,M’)
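
The order of moves in the TCR game is easy to get wrong, so here is a sketch of it as code. Everything here is an assumption for illustration: the salted hash H is SHA-256 truncated to a single byte so that a brute-force attacker actually wins, and the names tcr_game and brute_force are made up. For eTCR the attacker would also be allowed to return its own salt’.

```python
import hashlib
import secrets

def H(salt: bytes, message: bytes, n_bytes: int = 1) -> bytes:
    """Toy salted hash: SHA-256 truncated to n_bytes (deliberately tiny)."""
    return hashlib.sha256(salt + message).digest()[:n_bytes]

def tcr_game(choose_message, find_second, salt_bytes: int = 16) -> bool:
    """Return True if the attacker wins the target-collision game."""
    m = choose_message()                    # 1. attacker commits to M
    salt = secrets.token_bytes(salt_bytes)  # 2. only then is the salt chosen
    m2 = find_second(salt, m)               # 3. attacker must answer on-line
    return m2 is not None and m2 != m and H(salt, m) == H(salt, m2)

def brute_force(salt: bytes, m: bytes, max_tries: int = 1 << 16):
    """Search for M' after seeing the salt; feasible only for tiny digests."""
    target = H(salt, m)
    for i in range(max_tries):
        candidate = i.to_bytes(4, "big")
        if candidate != m and H(salt, candidate) == target:
            return candidate
    return None

print(tcr_game(lambda: b"pay 10", brute_force))
```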

Hashing random messages
 Second Preimage Resistance
– Given random M, attacker cannot find M’
such that H(M)=H(M’)
 One-wayness
– Given d=H(M) for random M, attacker cannot
find M’ such that H(M’)=d
 Extraction*
– For random salt, high-entropy M, the digest
d=H(salt,M) is close to being uniform
* Combinatorial, not cryptographic

Hashing with a secret key
 Pseudo-Random Functions
– The mapping M → H(key,M) for secret key
looks random to an attacker
 Universal hashing*
– For all M≠M’, Pr_key[ H(key,M)=H(key,M’) ] < ε
* Combinatorial, not cryptographic

Application 1:
Digital signatures
 Hash-then-sign paradigm
– First shorten the message, d = H(M)
– Then sign the digest, s = SIGN(d)
 Relies on collision resistance
– If H(M)=H(M’) then s is a signature on both
 Attacks on MD5, SHA-1 threaten current
signatures
– MD5 attacks can be used to get bad CA cert
[Stevens et al. 2009]

Collision resistance is hard
 Attacker works off-line (find M,M’)
– Can use state-of-the-art cryptanalysis, as much
computation power as it can gather, without
being detected !!
 Helped by birthday attack (e.g., 2^80 vs 2^160)
 Well worth the effort
– One collision → forgery for any signer
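
The birthday effect is easy to observe on a truncated hash. A sketch, assuming SHA-256 truncated to 24 bits as the target (the names truncated_hash and find_collision are illustrative): a collision typically turns up after a few thousand tries, on the order of 2^12, rather than the 2^24 needed to hit one specific digest.

```python
import hashlib
from itertools import count

def truncated_hash(message: bytes, n_bytes: int = 3) -> bytes:
    """SHA-256 truncated to n_bytes, standing in for a short-output hash."""
    return hashlib.sha256(message).digest()[:n_bytes]

def find_collision(n_bytes: int = 3):
    """Hash distinct messages until two of them share a digest."""
    seen = {}  # digest -> first message that produced it
    for i in count():
        m = str(i).encode()
        d = truncated_hash(m, n_bytes)
        if d in seen:
            return seen[d], m, i + 1
        seen[d] = m

m1, m2, tries = find_collision()
print(m1, m2, tries)
```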

Signatures without CRHF
[Naor-Yung 1989, Bellare-Rogaway 1997]
 Use randomized hashing
– To sign M, first choose fresh random salt
– Set d = H(salt, M), s = SIGN( salt || d )
 Attack scenario (collision game):
– Attacker chooses M
– Signer chooses random salt
– Attacker must find M' s.t. H(salt,M) = H(salt,M')
(same salt, since salt is explicitly signed)
 Attack is inherently on-line
– Only rely on target collision resistance

TCR hashing for signatures
 Not every randomization works
– H(M|salt) may be subject to collision attacks
 when H is Merkle-Damgård
– Yet this is what PSS does (and it’s provable in the ROM)
 Many constructions “in principle”
– From any one-way function
 Some engineering challenges
– Most constructions use long/variable-size randomness,
don’t preserve Merkle-Damgård
 Also, signing salt means changing the underlying
signature schemes

Signatures with enhanced TCR
[H-Krawczyk 2006]
 Use “stronger randomized hashing”, eTCR
– To sign M, first choose fresh random salt
– Set d = H(salt, M), s = SIGN( d )
 Attack scenario (collision game):
– Attacker chooses M
– Signer chooses random salt
– Attacker needs M',salt' s.t. H(salt,M)=H(salt',M')
(attacker can use a different salt')
 Attack is still inherently on-line

Randomized hashing with RMX
 Use simple message-randomization
– RMX: M=(M1,M2,…,ML), r →
(r, M1⊕r, M2⊕r, …, ML⊕r)
 Hash( RMX(r,M) ) is eTCR when:
– Hash is Merkle-Damgård, and
– Compression function is ~ 2nd-preimage-resistant
 Signature: [ r, SIGN( Hash( RMX(r,M) )) ]
– r fresh per signature, one block (e.g. 512 bits)
– No change in Hash, no signing of r
[H-Krawczyk 2006]
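
The message randomization itself is a few lines of code. This is a simplified sketch, not a standardized encoding: the block size of 64 bytes, the use of SHA-256 as the Merkle-Damgård Hash, and the handling of a short final block (XOR with a prefix of r) are all assumptions, and SIGN is left out entirely.

```python
import hashlib
import secrets

BLOCK = 64  # one hash block, e.g. 512 bits

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

def rmx(r: bytes, message: bytes) -> bytes:
    """Map M = (M1,...,ML) and r to (r, M1^r, ..., ML^r)."""
    assert len(r) == BLOCK
    blocks = [message[i:i + BLOCK] for i in range(0, len(message), BLOCK)]
    return r + b"".join(xor_bytes(block, r) for block in blocks)

def randomized_digest(message: bytes, r: bytes = None):
    """Return (r, Hash(RMX(r, M))); the signature would be [r, SIGN(digest)]."""
    if r is None:
        r = secrets.token_bytes(BLOCK)  # fresh per signature
    return r, hashlib.sha256(rmx(r, message)).digest()

salt, digest = randomized_digest(b"message to be signed")
print(randomized_digest(b"message to be signed", salt)[1] == digest)
```

The verifier receives r alongside the signature and recomputes the digest with it, so the hash function and the signing algorithm are both left unchanged.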

Preserving hash-then-sign
[Figure: the plain pipeline M=(M1,…,ML) → HASH → SIGN, shown beside the randomized pipeline M=(M1,…,ML), r → RMX → (r, M1⊕r, …, ML⊕r) → HASH → SIGN, with the label TCR]
Application 2:
Message authentication
 Sender, Receiver, share a secret key
 Compute an authentication tag
– tag = MAC(key, M)
 Sender sends (M, tag)
 Receiver verifies that tag matches M
 Attacker cannot forge tags without key

Authentication with HMAC
 Simple key-prepend/append have problems
when used with a Merkle-Damgård hash
– tag=H(key | M) subject to extension attacks
– tag=H(M | key) relies on collision resistance
 HMAC: Compute tag = H(key | H(key | M))
– About as fast as key-prepend for a MD hash
 Relies only on PRF quality of hash
– M → H(key|M) looks random when key is secret
[Bellare-Canetti-Krawczyk 1996]

 Because HMAC relies only on the PRF property, it is barely affected by collision attacks on MD5/SHA1
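
The formula tag = H(key | H(key | M)) is a simplification: the standardized HMAC derives two different keys from key by XORing it with the constants 0x36 (inner) and 0x5c (outer). The sketch below spells that out for SHA-256 and can be checked against Python's standard hmac module; the function names hmac_sha256_manual and verify_tag are illustrative.

```python
import hashlib
import hmac

BLOCK = 64  # SHA-256 block size in bytes

def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """HMAC written out as two nested hash calls with derived keys."""
    if len(key) > BLOCK:
        key = hashlib.sha256(key).digest()  # long keys are hashed first
    key = key.ljust(BLOCK, b"\x00")         # then zero-padded to one block
    inner_key = bytes(b ^ 0x36 for b in key)
    outer_key = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()

def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver side: recompute the tag and compare in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

tag = hmac_sha256_manual(b"shared secret", b"hello")
print(verify_tag(b"shared secret", b"hello", tag))
```

The outer call hashes only one extra short block, which is why the cost stays close to key-prepend for long messages.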

Carter-Wegman authentication
 Compress message with hash, t=H(key1,M)
 Hide t using a PRF, tag = t ⊕ PRF(key2,nonce)
– PRF can be AES, HMAC, RC4, etc.
– Only applied to a short nonce, typically not a
performance bottleneck
 Secure if the PRF is good, H is “universal”
– For all M≠M’ and Δ, Pr_key[ H(key,M) ⊕ H(key,M’) = Δ ] < ε
– Not cryptographic, can be very fast
[Wegman-Carter 1981,…]
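
A sketch of the two-step structure, under assumptions made only for illustration: the universal hash is polynomial evaluation modulo the prime 2^127 - 1 over 15-byte chunks, the PRF is HMAC-SHA256 truncated to 16 bytes, and the names poly_hash, prf and cw_tag are invented. A nonce must never be reused under the same keys.

```python
import hashlib
import hmac

P = 2**127 - 1  # prime field for the polynomial hash (assumption)
CHUNK = 15      # 15-byte chunks plus a marker byte stay below P

def poly_hash(key1: int, message: bytes) -> int:
    """t = H(key1, M): evaluate the message polynomial at key1 (Horner's rule)."""
    acc = 0
    for i in range(0, len(message), CHUNK):
        # The trailing 0x01 marker keeps chunks of different lengths distinct.
        m_i = int.from_bytes(message[i:i + CHUNK] + b"\x01", "little")
        acc = ((acc + m_i) * key1) % P
    return acc

def prf(key2: bytes, nonce: bytes) -> int:
    """PRF(key2, nonce), applied only to the short nonce."""
    return int.from_bytes(hmac.new(key2, nonce, hashlib.sha256).digest()[:16], "big")

def cw_tag(key1: int, key2: bytes, nonce: bytes, message: bytes) -> int:
    """tag = t XOR PRF(key2, nonce)."""
    return poly_hash(key1, message) ^ prf(key2, nonce)

print(hex(cw_tag(123456789, b"prf key", b"nonce-0001", b"hello")))
```

Only poly_hash touches the whole message, and it uses one multiplication per chunk, which is where the speed comes from.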

Fast Universal Hashing
 “Universality” is combinatorial, provable
 no need for “security margins” in design
 Many works on fast implementations
From inner-product, H_{k1,k2}(M1,M2) = (k1+M1)·(k2+M2)
 [H-Krawczyk’97, Black et al.’99, …]
From polynomial evaluation, H_k(M1,…,ML) = Σ_i Mi·k^i
 [Krawczyk’94, Shoup’96, Bernstein’05, McGrew-Viega’06, …]
 As fast as 2-3 cycles per byte (for long M’s)
– Software implementation, contemporary CPUs

Part III:
Designing a hash function
Fugue: IBM’s candidate for the
NIST hash competition

Design a compression function?
PROs: modular design, reduce to the “simpler
problem” of compressing fixed-length strings
– Many things are known about transforming
compression into hash
CONs: compression → hash has its problems
– It’s not free (e.g. message encoding)
– Some attacks based on the MD structure
 Extension attacks ( rely on H(x|y)=h(H(x),y) )
 “Birthday attacks” (herding, multicollisions, …)
[Figure: Merkle-Damgård chain of compression functions h]

Example attack: herding
[Kelsey-Kohno 2006]
 Find many off-line collisions
– “Tree structure” with ~2^(n/3) di,j’s
– Takes ~2^(2n/3) time
 Publish final d
 Then for any prefix P
– Find “linking block” L s.t. H(P|L) in the tree
– Takes ~2^(2n/3) time
– Read off the tree the suffix S to get to d
 Show an extension of P s.t. H(P|L|S) = d
[Figure: binary collision tree. Leaves d1,1, d1,2, d1,3, d1,4 are joined pairwise via blocks M1,1, M1,2, M1,3, M1,4 into d2,1 and d2,2, which are joined via M2,1, M2,2 into the root d]

The culprit: small intermediate state
 With a compression function, we:
– Work hard on current message block
– Throw away this work, keep only n-bit state
 Alternative: keep a large state
– Work hard on current message block/word
– Update some part of the big state
 More flexible approach
– Also more opportunities to mess things up

The hash function Grindahl
 State is 13 words = 52 bytes
 Process one 4-byte word at a time
– One AES-like mixing step per word of input
 After some final processing, output 8 words
 Collision attack by Peyrin (2007)
– Complexity ~2^112 (still better than brute-force)
 Recently improved to ~2^100 [Khovratovich 2009]
– “Start from a collision and go backwards”
[Knudsen-Rechberger-Thomsen 2007]

The hash function “Fugue”
 Proof-driven design
– Designed to enable analysis
 Proofs that Peyrin-style attacks do not work
 State of 30 4-byte words = 120 bytes
 Two “super-mixing” rounds per word of input
– Each applied to only 16 bytes of the state
– With some extra linear diffusion
 Super-mixing is AES-like
– But uses stronger MDS codes
[H-Hall-Jutla 2008]

Fugue-256
[Figure: Initial State (30 words) → iterate over input words M1, …, Mi, where Process turns State and Mi into New State → Final Processing → Output 8 words = 256 bits]
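Collision attacks
 Think of M1,…,ML and M’1,…,M’L, with input differences ΔM1, …, ΔMi
– Collision means that the ΔMi’s are not all zero, yet the output difference is Δ = 0
 Internal collision: ΔState = 0 before final processing
 External collision: ΔState ≠ 0 before final processing
[Figure: the Fugue-256 diagram (Process, New State, Iterate, Final Processing) annotated with these differences]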
Processing one input word
1. Input one word
2. Shift 3 columns to right
3. XOR into columns 1-3
4. “super-mix” operation (SMIX) on columns 1-4
– This is where the crypto happens
5. Repeat 2-4 once more
[Figure: the State with input word M1 entering, followed by the SMIX step]

SMIX in Fugue
 Similar to one AES round
– Works on a 4x4 matrix of bytes
– Starts with S-box substitution
 Byte b, S[256] = {...};
...
 b = S[b];
– Does linear mixing
 Stronger mixing than AES
– Diagonal bytes as in AES
– Other bytes are mixed into both column and row

SMIX in Fugue
 In algebraic notation:
(b1', b2', …, b16')^T = M_{16×16} · (S[b1], S[b2], …, S[b16])^T
 M generates a good linear code
– If all the bi’ bytes but 4 are zero
then ≥ 13 of the S[bi] bytes must be nonzero
– And other such properties

Analyzing internal collisions*
[Figure sequence over three build slides: a difference trace followed backwards from an internal collision through the SMIX steps and 3-column shifts. Recoverable labels, in slide order:]
– After last input word: ΔState = 0
– Before input word: Δ1 ≠ 0
– Before SMIX: Δ1-4 ≠ 0 (4 nonzero byte diffs)
– Still Δ1-4 ≠ 0; now Δ28-1 ≠ 0 (← 3 columns)
– Before the preceding SMIX: Δ28-4 ≠ 0; then Δ25-1 ≠ 0 (← 3 columns), 4 nonzero byte diffs
– Before input: Δ1 = ?, Δ25-30 ≠ 0 (the analysis on the previous slides went up to this point)
– Many nonzero byte differences before the SMIX operations
* A bit oversimplified


Analyzing internal collisions
 What does this mean? Consider this attack:
– Attacker feeds in random M1,M2,… and M’1,M’2,…
– Until StateL ⊕ State’L = some “good Δ”
– Then it searches for suffixes (ML+1,…,ML+4),
(M’L+1,…,M’L+4) that will induce internal collision
Theorem*: For any fixed Δ,
Pr[ ∃ suffixes that induce collision ] < 2^-150
* Relies on very mild independence assumptions

Analyzing internal collisions
 Why do we care about this analysis?
 Peyrin’s attacks are of this type
 All differential attacks can be seen as
(optimizations of) this attack
– Entities that are not controlled by the attacker are
always presumed random
 A known “collision trace” is as close as we
can get to understanding collision resistance

Fugue: concluding remarks
 Similar analysis also for external collisions
– “Unusually thorough” level of analysis
 Performance comparable to SHA-256
– But more amenable to parallelism
 One of 14 submissions that were selected
by NIST to advance to 2nd round of the
SHA3 competition

Morals
 Hash functions are very useful
 We want them to behave “just like random
functions”
– But they don’t really
 Applications should be designed to rely on
“as weak as practical” properties of hashing
– E.g., TCR/eTCR rather than collision-resistance
 A taste of how a hash function is built
