Universal hashing: principle and implementation
1-Introduction to hashing
2-Universal hashing
3-Constructing a universal hash family $\mathcal{H}$
4-Python implementation
1-Introduction to hashing
A hash function $y = h(k)$ uses a hash algorithm $h$ to turn an input $k$ of arbitrary length into an output $y$ of fixed length; the output is called the hash value. A common hash function is $y = h(k) = (a \cdot k + b) \bmod m$, where $m$ is generally a prime number.
Let the domain of the hash function be $K$ and its range be $Y$. In general $|K| > |Y|$, so a hash function is prone to collisions. For example, if $h(k_5) = h(k_2) = h(k_7)$, then $k_5$, $k_2$ and $k_7$ are stored on the same chain (a collision).
For any fixed hash function, it is essentially always possible to find a set of inputs whose hash values are all the same, so that they all land on one chain. A lookup then degrades to a scan of that chain, and it can be even slower than a plain linear search because computing the hash adds extra work on top of the scan.
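To make the collision problem concrete, here is a minimal sketch of a chained hash table that uses the fixed function $h(k) = (a \cdot k + b) \bmod m$. The constants `A`, `B`, `M` and the helper `build_table` are illustrative assumptions, not part of the original article. Keys that differ by a multiple of `M` always share a slot, so anyone who knows the function can force every key onto one chain.

```python
# Illustrative constants (assumptions): any fixed a, b and prime m behave the same way.
A, B, M = 3, 7, 11


def h(k):
    """Fixed hash function h(k) = (a*k + b) mod m."""
    return (A * k + B) % M


def build_table(keys):
    """Insert keys into a chained hash table with M slots."""
    table = [[] for _ in range(M)]
    for k in keys:
        table[h(k)].append(k)
    return table


# Adversarial input: keys that differ by multiples of M hash identically.
bad_keys = [5 + i * M for i in range(8)]
bad_table = build_table(bad_keys)
longest_bad = max(len(chain) for chain in bad_table)

# Benign input: consecutive keys spread over the slots.
good_keys = list(range(8))
good_table = build_table(good_keys)
longest_good = max(len(chain) for chain in good_table)

print("longest chain, adversarial keys:", longest_bad)
print("longest chain, consecutive keys:", longest_good)
```

With the adversarial keys every lookup has to walk a single chain holding all eight keys, which is exactly the worst case that random selection of the hash function is meant to avoid.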
2-Universal hashing
Idea: one way to solve this problem is randomization. Select a hash function at random from a set (a family) of hash functions. With this choice, nobody can construct in advance a set of inputs that makes one particular hash function perform badly.
Definition 1: Let $\mathcal{U}$ be the domain and let $\mathcal{H}$ be a set of hash functions, each of which maps $\mathcal{U}$ to $\{0, 1, \dots, m-1\}$, i.e. $h: \mathcal{U} \rightarrow \{0, 1, \dots, m-1\}$ for every $h \in \mathcal{H}$.
Definition 2: If for all $x, y$ with $x \neq y$ we have $|\{h \in \mathcal{H}: h(x) = h(y)\}| = \frac{|\mathcal{H}|}{m}$, then $\mathcal{H}$ is called universal.
By Definition 2, if $h$ is selected uniformly at random from $\mathcal{H}$ (note that a hash function is re-selected for each input), then the probability that $x$ and $y$ collide is:
$$\frac{\text{number of functions with } h(x) = h(y)}{\text{number of all functions}} = \frac{\frac{|\mathcal{H}|}{m}}{|\mathcal{H}|} = \frac{1}{m}.$$
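As a small sanity check of Definition 2, the sketch below enumerates the family of all functions from a tiny domain to $\{0, \dots, m-1\}$ and counts, for every pair $x \neq y$, how many functions collide. This family is an illustrative assumption chosen because it is easy to enumerate; it is not the construction this article builds, and the names `U`, `family` and `collision_probs` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

U = [0, 1, 2]   # tiny domain (illustrative assumption)
m = 2           # number of hash values

# Each function h: U -> {0, ..., m-1} is stored as a tuple of its values,
# so h[x] is the hash of x. There are m ** len(U) such functions.
family = list(product(range(m), repeat=len(U)))

collision_probs = {}
for x, y in combinations(U, 2):
    colliding = sum(1 for h in family if h[x] == h[y])
    collision_probs[(x, y)] = Fraction(colliding, len(family))

for pair, p in collision_probs.items():
    print(pair, p)
```

Every pair collides under exactly $|\mathcal{H}|/m$ of the functions, so a function drawn uniformly from this family collides on a given pair with probability $1/m$.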
Theorem 1: Select $h$ uniformly at random from $\mathcal{H}$ (where $\mathcal{H}$ is universal). If $n$ inputs are placed into the hash table $T$, then for a given input $x$ we have
$$E[\text{number of elements of } T \text{ that collide with } x] < \frac{n}{m}.$$
Here $E[\cdot]$ denotes expectation.
[Significance of Theorem 1] By proving the theorem above, we can say that if $\mathcal{H}$ is universal, the elements end up distributed uniformly (in an average sense) across the hash table $T$.
Proof of Theorem 1: Let $C_x$ be the random variable giving the number of elements of the hash table $T$ that collide with $x$, and let
$$C_{xy} = \left\{\begin{array}{cl}
1 & \text{if } h(x) = h(y) \\
0 & \text{if } h(x) \neq h(y)
\end{array}\right.$$
Since $E[C_{xy}] = \Pr[h(x) = h(y)] = \frac{1}{m}$ by Definition 2, we get
$$\begin{array}{rll}
E[C_x] &= E\left[\sum_{y \in T - \{x\}} C_{xy}\right] & \\
&= \sum_{y \in T - \{x\}} E[C_{xy}] & \text{(by linearity of expectation)} \\
&= \sum_{y \in T - \{x\}} \frac{1}{m} & \\
&= (n-1)\frac{1}{m} & \\
&< \frac{n}{m}. &
\end{array}$$
Example: if $n = 1$ and $m = 2$, then $E[C_x] < \frac{1}{2}$.
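The bound in Theorem 1 can also be checked by brute force on a tiny case. The sketch below assumes the family of all functions from a small domain to $\{0, \dots, m-1\}$ (an illustrative universal family, not the construction from this article), stores $n$ keys in `T`, and averages the number of elements colliding with `x` over every function in the family. All names and sizes are hypothetical choices for the example.

```python
from fractions import Fraction
from itertools import product

U = [0, 1, 2, 3]    # tiny domain (illustrative assumption)
m = 3               # number of slots
T = [0, 1, 2]       # keys stored in the table, n = 3
x = 0               # the key whose collisions we count
n = len(T)

# Every function h: U -> {0, ..., m-1}, stored as a tuple with h[k] = hash of k.
family = list(product(range(m), repeat=len(U)))

total = 0
for h in family:
    # C_x for this particular h: elements of T other than x in the same slot.
    total += sum(1 for y in T if y != x and h[y] == h[x])

expected_collisions = Fraction(total, len(family))
print("E[C_x] =", expected_collisions)
print("bound n/m =", Fraction(n, m))
```

The exact average matches $(n-1)\frac{1}{m}$ from the proof and stays below $\frac{n}{m}$.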
3-Constructing a universal hash family $\mathcal{H}$
Theorem 2: The family $\mathcal{H}$ constructed by the following four steps is universal:
(Condition) Let $m$ be a prime number;
(Preparation) Write the input $k$ as $r + 1$ digits: $k = \langle k_0, k_1, \dots, k_r \rangle$, where $k_i \in \{0, 1, \dots, m-1\}$ (this is equivalent to writing $k$ in base $m$);
(Randomness) Randomly select $a = \langle a_0, a_1, \dots, a_r \rangle$, where $a_i \in \{0, 1, \dots, m-1\}$;
(Hash function) $h_a(k) = \left(\sum_{i=0}^{r} a_i \times k_i\right) \bmod m$.
For the proof, see reference 2.
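The construction in Theorem 2 is small enough to verify exhaustively for a toy case. The following sketch, with hypothetical helper names `digits` and `h` and illustrative values `m = 5`, `r = 1`, enumerates every vector $a$ and counts, for each pair of distinct keys, how many choices of $a$ make them collide. It is a check under these assumptions, not a proof.

```python
from itertools import combinations, product

m = 5   # prime modulus (illustrative assumption)
r = 1   # keys have r + 1 base-m digits, so the domain is {0, ..., m**(r+1) - 1}


def digits(k):
    """Write k as r + 1 base-m digits, least significant digit first."""
    out = []
    for _ in range(r + 1):
        out.append(k % m)
        k //= m
    return out


def h(a, k):
    """h_a(k) = (sum_i a_i * k_i) mod m."""
    return sum(ai * ki for ai, ki in zip(a, digits(k))) % m


family = list(product(range(m), repeat=r + 1))   # every possible vector a
domain = range(m ** (r + 1))

# For each pair of distinct keys, count the vectors a under which they collide.
counts = {
    (x, y): sum(1 for a in family if h(a, x) == h(a, y))
    for x, y in combinations(domain, 2)
}

print("|H| =", len(family), " |H|/m =", len(family) // m)
print("distinct collision counts:", set(counts.values()))
```

Every pair of distinct keys collides under exactly $|\mathcal{H}|/m$ of the vectors, which is the condition in Definition 2. Primality of $m$ matters here: it guarantees that a nonzero digit difference can be cancelled by exactly one value of the corresponding $a_i$.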
4-Python implementation
I wrote this code myself; please correct me if anything looks wrong. Code link: https://github.com/VFVrPQ/LDP/blob/master/Components/UniversalHashing.py. The complete code is as follows:
import math
import random


class UniversalHashing:
    '''
    g: a prime
    d: domain size; inputs are in [0, 1, ..., d-1]
    len: the maximum number of digits of an input in base g
    v: an input value in [0, 1, ..., d-1]
    hash function: h_a(k) = (a[0]*k[0] + a[1]*k[1] + ... + a[len-1]*k[len-1]) % g
    '''

    def __init__(self, g, d):
        self.__g = g
        assert g >= 2, 'g is less than 2'
        assert self.__isPrime(g), 'g is not a prime'
        self.__d = d
        self.__len = math.ceil(math.log(d) / math.log(g))  # maximum number of base-g digits
        self.__a = self.__len * [0]  # initial length

    # v is an input value in [0, 1, ..., d-1]
    def hash(self, v):
        self.__randomness()  # regenerate a, i.e. select h from H
        out = self.calc(self.__a, v)
        return list(self.__a), out  # return a copy so later calls do not overwrite it

    # calc h_a(k) = (a[0]*k[0] + a[1]*k[1] + ... + a[len-1]*k[len-1]) % g
    def calc(self, a, v):
        assert len(a) == self.__len, 'len(a) != self.__len'
        k = self.__toBitList(v)
        out = 0
        for i in range(self.__len):
            out = (out + a[i] * k[i]) % self.__g
        return out

    def __randomness(self):
        # generate a
        for i in range(self.__len):
            self.__a[i] = random.randint(0, self.__g - 1)

    # write v as a list of base-g digits, least significant digit first
    def __toBitList(self, v):
        assert v >= 0, 'v<0'
        if v == 0:
            return self.__len * [0]
        bitList = self.__len * [0]
        for i in range(self.__len):
            bitList[i] = v % self.__g
            v //= self.__g  # integer division; int(v / g) loses precision for large v
        return bitList

    def __isPrime(self, v):
        if v <= 1:
            return False
        for i in range(2, int(math.sqrt(v)) + 1, 1):
            if v % i == 0:
                return False
        return True


# for test
if __name__ == "__main__":
    TIMES = 10
    g = 29  # prime
    d = 16  # domain
    uhash = UniversalHashing(g, d)
    H = g * [0]
    for i in range(TIMES):  # hash TIMES random inputs to verify
        x = random.randint(0, d - 1)
        _, out = uhash.hash(x)
        H[out] += 1
    for i in range(g):
        print(i, H[i])