Deep Hash
hi,
Hope i post in correct forum...
I'm trying to design a hash for huge files that can be computed in parallel and/or at GPU to big extend. I baptized it 'deep hash' and giving it a name for a first attempt is probably a bit early. Of course the first 4 letters of my surname/family name translates to the English: 'deep'.
Prior to coding it up i would like some comments on it and how useful it is to avoid easy to generate collissions. Here is what i crabbled up:
Definitions: in all cases the least significant bit is bit 0 in that specific byte or double word (32 bits unsigned). Everything gets done using unsigned math.
a) 64 bits: number of total bits of file
b) 32 bits: number of ones (mod 2^32)
c) 64 bits: summation of squared bytes (mod 2^64)
d) 32 bits: summation over all doublewords (32 bits datatype summation mod 2^32)
e) we group the file in packets of n bits and xor all packets
we do this for several n: 17,19,23,29,31,37,41,43,47,53,59,61
this is 460 bits together.
so far 460 + 192 = 652 bits
f) 32 bits: rotate each doubleword clockwise
n bits with n being 1,2,.. ..30,31... and calculate
the sum of it modulo 2^32, so we don't rotate 0 bits,
always rotation is a number from domain [1..31]
and each next doubleword we take the next number and after
31 we start again at 1.
g) 32 bits: same as f, but now we rotate in the other direction
652+64 = 716 bits so far..
h) 32 bits: for each byte we calculate how many bits each bit
of the byte have a 1. So how many 0th bits we have ... 7th bit.
we then AND the result with 15 giving us 8 numbers of
4 bits.
748 bits so far
We use a sieve to calculate prime numbers.
for each bit set to 1 we calculate the prime number
belonging to it. We start with prime number 65537 for bit 0 in the first byte
then our sieve calculates the next prime number that comes after that. In this case it starts with:
65537 for bit 0
65539 for bit 1
65543 for bit 2
65551 for bit 3
65557 for bit 4
65563 for bit 5
65579 for bit 6
65581 for bit 7
65587 for bit 8
65599 for bit 9
etc
If a bit is set we grab the prime number. Now we do 3 things with the
selected prime numbers:
i1) 64 bits: we xor all selected primes and store the result modulo 2^64
i2) 64 bits: we add all selected primes up and store the result modulo 2^64
i3) 127 bits: we multiply all selected primes modulo 2^127  1,
total: 748 + 255 = 1003 bits so far
j4) 17 bits: we split up file in 16 bits words and multiply those modulo 2^17  1
total 1003 + 17 = 1020 bits so far
We calculate a 4 bits CRC of the 1020 bits hash and
store that as the last 4 bits
1020 + 4 = 1024 bits hashkey in total
Last fiddled with by diep on 20121005 at 14:55
