Thread: Deep Hash View Single Post
 2012-10-05, 14:46 #1 diep     Sep 2006 The Netherlands 36 Posts Deep Hash hi, Hope i post in correct forum... I'm trying to design a hash for huge files that can be computed in parallel and/or at GPU to big extend. I baptized it 'deep hash' and giving it a name for a first attempt is probably a bit early. Of course the first 4 letters of my surname/family name translates to the English: 'deep'. Prior to coding it up i would like some comments on it and how useful it is to avoid easy to generate collissions. Here is what i crabbled up: Definitions: in all cases the least significant bit is bit 0 in that specific byte or double word (32 bits unsigned). Everything gets done using unsigned math. a) 64 bits: number of total bits of file b) 32 bits: number of ones (mod 2^32) c) 64 bits: summation of squared bytes (mod 2^64) d) 32 bits: summation over all doublewords (32 bits datatype summation mod 2^32) e) we group the file in packets of n bits and xor all packets we do this for several n: 17,19,23,29,31,37,41,43,47,53,59,61 this is 460 bits together. so far 460 + 192 = 652 bits f) 32 bits: rotate each doubleword clockwise n bits with n being 1,2,.. ..30,31... and calculate the sum of it modulo 2^32, so we don't rotate 0 bits, always rotation is a number from domain [1..31] and each next doubleword we take the next number and after 31 we start again at 1. g) 32 bits: same as f, but now we rotate in the other direction 652+64 = 716 bits so far.. h) 32 bits: for each byte we calculate how many bits each bit of the byte have a 1. So how many 0th bits we have ... 7th bit. we then AND the result with 15 giving us 8 numbers of 4 bits. 748 bits so far We use a sieve to calculate prime numbers. for each bit set to 1 we calculate the prime number belonging to it. We start with prime number 65537 for bit 0 in the first byte then our sieve calculates the next prime number that comes after that. In this case it starts with: 65537 for bit 0 65539 for bit 1 65543 for bit 2 65551 for bit 3 65557 for bit 4 65563 for bit 5 65579 for bit 6 65581 for bit 7 65587 for bit 8 65599 for bit 9 etc If a bit is set we grab the prime number. Now we do 3 things with the selected prime numbers: i1) 64 bits: we xor all selected primes and store the result modulo 2^64 i2) 64 bits: we add all selected primes up and store the result modulo 2^64 i3) 127 bits: we multiply all selected primes modulo 2^127 - 1, total: 748 + 255 = 1003 bits so far j4) 17 bits: we split up file in 16 bits words and multiply those modulo 2^17 - 1 total 1003 + 17 = 1020 bits so far We calculate a 4 bits CRC of the 1020 bits hash and store that as the last 4 bits 1020 + 4 = 1024 bits hashkey in total Last fiddled with by diep on 2012-10-05 at 14:55