"""
This is an EXPERIMENTAL password hash with time and memory parameters,
such that the time parameter does not affect the memory required (but
does affect the number of memory accesses).

This was quickly designed, with no real testing, so it's probably a silly
design and the code may be broken. Therefore, please:

- Do not use it to hash real passwords!
- Attack it! (circumvent the time/memory requirements, find biases...)

Parameters:
    h      Hash function with the hashlib common interface
           (at least digest() and hexdigest() should be implemented)
    pwd    Password, a string
    salt   Salt, a string
    ptime  Time parameter, an integer > 0
    pmem   Memory parameter, an integer > 0

Back-of-the-envelope time and space requirements:

    Time:   ~ 1.5*ptime*pmem compression evaluations, for any of the
            functions in hashlib; for other functions, it depends on the
            hash and block lengths and on the padding rule
    Memory: ~ hlen*pmem bytes, where hlen is the byte length of a digest

Examples with approximate timings measured on an Ivy Bridge:

    phs( hashlib.sha256, 'password', 'salt', 2, 2**16 )
        uses ~2MB, runs in ~250ms
    phs( hashlib.sha256, 'password', 'salt', 1, 2**17 )
        uses ~4MB, runs in ~250ms

(An optimized C implementation would probably be much faster.)

JP Aumasson / @veorq / [email protected]
2013
"""
import hashlib

def phs( h, pwd, salt, ptime, pmem ):
    s = h( h( pwd + salt ).digest() + salt ).digest()
    hlen = len(s)
    for t in xrange(ptime):
        for m in xrange(pmem):
            s = s + h( s[-hlen:] ).digest()
        s = h( s[::-1] ).digest()
    return h(s).hexdigest()
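For experimenting with the gist, a minimal Python 3 port of phs (xrange becomes range, and pwd/salt must be byte strings); the parameter values below are toy sizes for illustration, not recommendations:

```python
import hashlib

# Python 3 port of phs: xrange -> range, pwd and salt are bytes
def phs(h, pwd, salt, ptime, pmem):
    s = h(h(pwd + salt).digest() + salt).digest()
    hlen = len(s)
    for t in range(ptime):
        for m in range(pmem):
            s = s + h(s[-hlen:]).digest()
        s = h(s[::-1]).digest()  # byte-wise inversion of the whole buffer
    return h(s).hexdigest()

# example call with small (toy) parameters
tag = phs(hashlib.sha256, b'password', b'salt', 2, 2**10)
print(tag)
```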
I like the simplicity.
Two things:
If you were building a pipeline in hardware, where each stage in the pipeline computes one iteration of the outer loop, you'd only need to pass hlen bytes (length of state) to the next stage. It would probably be better to keep the big 'buf' across iterations of the outer loop, so that more info needs to be passed between pipeline stages. In other words, the memory doesn't need to be 'sustained' throughout the entire computation, it has a break after each iteration of the outer loop.
Reversing a string in hardware is just a matter of wiring, whereas in software it takes time. That may give hardware implementations an advantage.
Good points. The inversion thing is probably a stupid idea as it is. It would be sufficient to (say) copy the last chunk at the first position and preserve the ordering otherwise.
Revision 2: replaced the byte inversion with "buf[-hlen:] + buf", which turns out to be a bit slower.
Revision 3: similar functionality, simpler algorithm and code.
Revision 4: even simpler.
Your memory-hardness isn't actually memory-hard. Witness that this:
def phs2(h, pwd, salt, ptime, pmem):
    s = h( h( pwd + salt ).digest() + salt ).digest()
    hlen = len(s)
    for t in xrange(ptime):
        ps1 = s
        ps2 = s
        for m in xrange(pmem):
            ps1 = h(ps1).digest()
        ptime_hash = h(ps1)
        for m in xrange(pmem):
            ptime_hash.update(ps2)
            ps2 = h(ps2).digest()
        ptime_hash.update(ps2)
        s = ptime_hash.digest()
    return h(s).hexdigest()
gives the same result as phs, but takes only O(1) memory.
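The claim can be checked concretely. Since phs2 absorbs the last chain value first and then the chain in forward order, it appears to target the Revision 2 variant ("buf[-hlen:] + buf") mentioned in the notes above, rather than the byte-inversion version shown at the top. A Python 3 sketch of both (names and toy parameters are mine), which produce identical digests:

```python
import hashlib

def phs_rev2(h, pwd, salt, ptime, pmem):
    # the "buf[-hlen:] + buf" revision described in the notes above
    s = h(h(pwd + salt).digest() + salt).digest()
    hlen = len(s)
    for t in range(ptime):
        buf = s
        for m in range(pmem):
            buf = buf + h(buf[-hlen:]).digest()
        s = h(buf[-hlen:] + buf).digest()
    return h(s).hexdigest()

def phs2(h, pwd, salt, ptime, pmem):
    # O(1)-memory equivalent: recompute the chain twice instead of buffering it
    s = h(h(pwd + salt).digest() + salt).digest()
    for t in range(ptime):
        ps1 = s
        ps2 = s
        for m in range(pmem):
            ps1 = h(ps1).digest()      # walk to the last chain value
        ptime_hash = h(ps1)            # absorb it first (the "buf[-hlen:]")
        for m in range(pmem):
            ptime_hash.update(ps2)     # then absorb the chain in order
            ps2 = h(ps2).digest()
        ptime_hash.update(ps2)
        s = ptime_hash.digest()
    return h(s).hexdigest()
```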
I strongly recommend that you read the scrypt paper; it's pretty easy to follow. Colin Percival is a very good writer. http://www.tarsnap.com/scrypt.html
Thanks Richard for spotting that! Looks like the version with byte inversion did have some advantages ;)
Will fix!
Instead of byte inversion, you could store all of the hashes that make up the final S in a list, then go through that list backwards with .update(). Doing that solves the string concatenation problem at the same time. You could still do it in O(1) memory by computing just the hash you need every time, but would take O(pmem^2) time.
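That suggestion might look like the following hypothetical Python 3 sketch (phs_blockrev and the chain list are illustrative names, not part of the gist):

```python
import hashlib

def phs_blockrev(h, pwd, salt, ptime, pmem):
    # store each digest in a list, then absorb the list back-to-front,
    # avoiding both byte inversion and repeated string concatenation
    s = h(h(pwd + salt).digest() + salt).digest()
    for t in range(ptime):
        chain = [s]
        for m in range(pmem):
            chain.append(h(chain[-1]).digest())
        acc = h(b'')
        for d in reversed(chain):
            acc.update(d)
        s = acc.digest()
    return h(s).hexdigest()
```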
The initial bytes inversion works and can be efficiently implemented with pshufb or XOP's vpperm.
There is an advantageous time-memory trade-off where half the memory costs only 3/2 the time. Compute the hashes all the way through, saving the first hash and the middle-through-last hashes. Reverse the saved half and feed it in, then recompute the first half (since you have the first hash), reverse that, and feed it in. This takes 3/2 the time but half the memory.
Generalizing the idea, you can use 1/k+k/M of the memory at a cost of at most twice the time, by storing every kth hash and computing only 1/k of the hashes at a time.
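A sketch of that checkpointing trade-off applied to reverse-order absorption of a hash chain (hypothetical code; the function names and checkpoint layout are illustrative). It stores ~pmem/k checkpoints plus a k-digest segment buffer, and hashes each chain value at most twice:

```python
import hashlib

def absorb_reversed(h, s0, pmem):
    # reference: keep all pmem+1 chain values, absorb them back-to-front
    vals = [s0]
    for _ in range(pmem):
        vals.append(h(vals[-1]).digest())
    acc = h(b'')
    for v in reversed(vals):
        acc.update(v)
    return acc.digest()

def absorb_reversed_checkpointed(h, s0, pmem, k):
    # store only every k-th chain value, then recompute each k-value
    # segment on demand: ~2x the hashing, ~1/k the memory
    cps, v = [], s0
    for i in range(pmem + 1):
        if i % k == 0:
            cps.append(v)
        v = h(v).digest()
    acc = h(b'')
    for j in reversed(range(len(cps))):
        lo, hi = j * k, min(j * k + k - 1, pmem)
        seg = [cps[j]]                     # regrow segment [lo, hi]
        for _ in range(hi - lo):
            seg.append(h(seg[-1]).digest())
        for d in reversed(seg):            # absorb it back-to-front
            acc.update(d)
    return acc.digest()
```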
I'd swap the order of salt and pw
h(salt + h(salt + pwd))
Or just HMAC(salt, pw)
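With Python's standard library that last suggestion is a one-liner (a sketch; sha256 and the function name are just example choices):

```python
import hashlib
import hmac

# derive the initial state as HMAC(salt, pwd) instead of hashing
# concatenations, avoiding ambiguity at the pwd/salt boundary
def initial_state(pwd, salt):
    return hmac.new(salt, pwd, hashlib.sha256).digest()

s = initial_state(b'password', b'salt')
```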
For non-python-programmers like myself: