  Universal hashing: principle and implementation

  1-Introduction to hashing

  2-Universal hashing

  3-Constructing a universal family $\mathcal{H}$

  4-Python implementation

  1-Introduction to hashing

  A hash function $y = h(k)$ takes an input $k$ of arbitrary length and, through a hash algorithm $h$, turns it into a fixed-length output $y$; this output is the hash value. A common hash function is $h(k) = (a \cdot k + b) \bmod m$, where $m$ is usually a prime.
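  As a small illustration of the formula $h(k) = (a \cdot k + b) \bmod m$, here is a minimal sketch. The helper name `make_linear_hash` and the parameters $a = 3$, $b = 5$, $m = 7$ are assumptions chosen for the example, not values from the text.

```python
def make_linear_hash(a, b, m):
    """Return the function h(k) = (a*k + b) mod m for fixed a, b, m."""
    def h(k):
        return (a * k + b) % m
    return h


# Illustrative parameters (assumptions): a = 3, b = 5, prime modulus m = 7.
h = make_linear_hash(3, 5, 7)

# Every output lies in {0, ..., m-1}, whatever the size of the input.
values = [h(k) for k in range(10)]
print(values)  # [5, 1, 4, 0, 3, 6, 2, 5, 1, 4]
```

  Keys 0 and 7 already share the hash value 5 here, which is the kind of collision discussed next.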

  The domain of the hash function is $K$ and its range is $Y$. In general $|K| > |Y|$, so hash functions are prone to collisions. For example, if $h(k_5) = h(k_2) = h(k_7)$, then $k_5$, $k_2$ and $k_7$ end up on the same chain (a collision).

  For any fixed hash function, it is generally possible to find a set of inputs whose hash values are all the same, so that they all end up on one chain. A lookup then takes linear time, and can even cost more than a plain linear scan, because each lookup pays for computing the hash on top of walking the chain.
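  The worst case described above can be reproduced with a short sketch. It assumes a fixed hash function $h(k) = k \bmod M$ with $M = 7$ and a chained table built from a `defaultdict`; both are illustrative choices, not part of the original text.

```python
from collections import defaultdict

M = 7  # number of buckets (illustrative)


def h(k):
    """A fixed, publicly known hash function."""
    return k % M


# Someone who knows h can pick keys that are all multiples of M.
bad_keys = [M * i for i in range(20)]

# Chained hash table: bucket index -> list of keys stored there.
table = defaultdict(list)
for k in bad_keys:
    table[h(k)].append(k)

longest_chain = max(len(chain) for chain in table.values())
print(len(table), longest_chain)  # 1 20
```

  All 20 keys sit in a single bucket, so finding any one of them means walking one long chain.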

  2-Universal hashing

  Idea: one way to solve this problem is randomization. Choose the hash function at random from a set of hash functions (a family of hash functions). With this choice, nobody can construct in advance a set of inputs that makes one particular hash function very inefficient, because they do not know which function will be used.

  Definition 1: $\mathcal{U}$ is the domain (the universe of keys) and $\mathcal{H}$ is a set of hash functions, each of which maps $\mathcal{U}$ to $\{0, 1, \dots, m-1\}$, i.e. $h: \mathcal{U} \rightarrow \{0, 1, \dots, m-1\}$ for $h \in \mathcal{H}$.

  Definition 2: If for all $x, y$ with $x \neq y$ we have $|\{h \in \mathcal{H} : h(x) = h(y)\}| = \frac{|\mathcal{H}|}{m}$, then $\mathcal{H}$ is called universal.

  By Definition 2, if $h$ is selected uniformly at random from $\mathcal{H}$ (note that a hash function is re-selected for each input), then the probability that $x$ and $y$ collide is:

  $$\Pr[h(x) = h(y)] = \frac{\text{number of functions with } h(x) = h(y)}{\text{number of all functions}} = \frac{\frac{|\mathcal{H}|}{m}}{|\mathcal{H}|} = \frac{1}{m}.$$
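  Definition 2 can be checked by brute force on a tiny example. The sketch below assumes the simplest possible family, namely all functions from a 4-element universe to $m = 3$ buckets; this family and the names `U`, `family` and `collision_counts` are illustrative assumptions and are not the construction discussed later in the article.

```python
from itertools import combinations, product

U = [0, 1, 2, 3]  # small universe (illustrative)
m = 3             # number of buckets (illustrative)

# Every function U -> {0, ..., m-1}, stored as a tuple: h[x] is the bucket of x.
family = list(product(range(m), repeat=len(U)))

# For each pair x != y, count the functions in the family that make them collide.
collision_counts = {
    (x, y): sum(1 for h in family if h[x] == h[y])
    for x, y in combinations(U, 2)
}

print(len(family), set(collision_counts.values()))  # 81 {27}
```

  Each pair collides under exactly $27 = 81 / 3$ functions, so a uniformly chosen function makes a given pair collide with probability $1/m$.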

  Theorem 1: Select $h$ uniformly at random from $\mathcal{H}$ (where $\mathcal{H}$ is universal). If we put $n$ inputs into the hash table $T$, then for a given input $x$ we have

  $$E[\text{number of elements of } T \text{ that collide with } x] < \frac{n}{m},$$

  where $E[\cdot]$ denotes expectation.

  [Significance of Theorem 1] Once the theorem above is proved, we can say that if $\mathcal{H}$ is universal, the elements end up distributed uniformly (in an average sense) over the hash table $T$.

  Proof of Theorem 1: Let $C_x$ be the random variable counting the elements of the hash table $T$ that collide with $x$, and define

  $$C_{xy} = \left\{\begin{array}{cl}
  1 & \text{if } h(x) = h(y) \\
  0 & \text{if } h(x) \neq h(y)
  \end{array}\right.$$

  Then,

  $$\begin{aligned}
  E[C_x] &= E\Big[\sum_{y \in T - \{x\}} C_{xy}\Big] \\
  &= \sum_{y \in T - \{x\}} E[C_{xy}] && \text{(by linearity of expectation)} \\
  &= \sum_{y \in T - \{x\}} \frac{1}{m} \\
  &= (n-1)\frac{1}{m} \\
  &< \frac{n}{m}.
  \end{aligned}$$

  Example: if $n = 1$, $m = 2$, then $E[C_x] < \frac{1}{2}$.
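  The expectation in Theorem 1 can also be computed exactly for a small case. This sketch assumes the family of all functions from a 5-element universe to $m = 3$ buckets and a table holding $n = 3$ keys, one of which is $x$; these choices and all variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

m = 3          # table size (illustrative)
U = range(5)   # universe of keys (illustrative)
T = [0, 2, 4]  # keys stored in the table, so n = 3 (x is one of them)
x = 2

# All functions U -> {0, ..., m-1}; h[k] is the bucket of key k.
family = list(product(range(m), repeat=len(U)))

# Sum C_x over every h in the family, then average to get E[C_x].
total = sum(
    sum(1 for y in T if y != x and h[y] == h[x])
    for h in family
)
expected = Fraction(total, len(family))
n = len(T)

print(expected, Fraction(n, m))  # 2/3 1
```

  The exact value is $(n-1)/m = 2/3$, which is below the bound $n/m = 1$ as the theorem states.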

  3-Constructing a universal family $\mathcal{H}$

  Theorem 2: The family $\mathcal{H}$ constructed by the following four steps is universal:

  (Condition) Let $m$ be a prime;

  (Preparation) Write the input $k$ as $r+1$ digits: $k = \langle k_0, k_1, \dots, k_r \rangle$, where $k_i \in \{0, 1, \dots, m-1\}$ (equivalent to writing $k$ in base $m$);

  (Randomness) Randomly select $a = \langle a_0, a_1, \dots, a_r \rangle$, where $a_i \in \{0, 1, \dots, m-1\}$;

  (Hash function) $h_a(k) = \left(\sum_{i=0}^{r} a_i \times k_i\right) \bmod m$.

  For the proof, see reference 2.
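  The four steps of Theorem 2 can be sketched and checked by enumeration for a tiny case. The helper names `digits` and `h`, and the parameters $m = 3$, $r = 1$, are assumptions made for this illustration; digits are stored least significant first.

```python
from itertools import combinations, product


def digits(k, m, length):
    """Write k in base m as `length` digits, least significant digit first."""
    out = []
    for _ in range(length):
        out.append(k % m)
        k //= m
    return out


def h(a, k, m):
    """Dot-product hash h_a(k) = (sum_i a_i * k_i) mod m."""
    return sum(ai * ki for ai, ki in zip(a, digits(k, m, len(a)))) % m


m, r = 3, 1                                     # prime m, r + 1 = 2 digits
family = list(product(range(m), repeat=r + 1))  # every coefficient vector a
universe = range(m ** (r + 1))                  # keys 0 .. 8

# For each pair x != y, count the vectors a under which they collide.
counts = {
    (x, y): sum(1 for a in family if h(a, x, m) == h(a, y, m))
    for x, y in combinations(universe, 2)
}

print(len(family), set(counts.values()))  # 9 {3}
```

  Every pair of distinct keys collides under exactly $3 = 9 / 3$ of the 9 vectors $a$, which matches the count $|\mathcal{H}|/m$ required by Definition 2.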

  4-Python implementation

  I wrote the code myself; please point out any mistakes. Code link: https://github.com/VFVrPQ/LDP/blob/master/Components/UniversalHashing.py; the complete code is as follows:

import math
import random


class UniversalHashing:
    '''
    g: a prime (the modulus and the digit base)
    d: domain size; inputs lie in [0, 1, ..., d-1]
    len: the maximum number of base-g digits of an input
    v: an input value in [0, 1, ..., d-1]
    hash function: h_a(k) = (a[0]*k[0] + a[1]*k[1] + ... + a[len-1]*k[len-1]) % g
    '''

    def __init__(self, g, d):
        self.__g = g
        assert g >= 2, 'g is less than 2'
        assert self.__isPrime(g), 'g is not a prime'
        self.__d = d
        # Number of base-g digits needed for values in [0, d-1]. Integer
        # arithmetic replaces ceil(log(d) / log(g)), which can be off by one
        # through floating-point rounding when d is an exact power of g.
        self.__len = 1
        top = g
        while top < d:
            top *= g
            self.__len += 1
        self.__a = self.__len * [0]  # coefficient vector a, initially all zeros

    # v is an input value in [0, 1, ..., d-1]
    def hash(self, v):
        self.__randomness()  # regenerate a, i.e. select a new h from H
        out = self.calc(self.__a, v)
        return list(self.__a), out  # copy a so later calls cannot change it

    # calc h_a(k) = (a[0]*k[0] + a[1]*k[1] + ... + a[len-1]*k[len-1]) % g
    def calc(self, a, v):
        assert len(a) == self.__len, 'len(a) != self.__len'
        k = self.__toBitList(v)
        out = 0
        for i in range(self.__len):
            out = (out + a[i] * k[i]) % self.__g
        return out

    def __randomness(self):
        # generate a: each coefficient is uniform in [0, g-1]
        for i in range(self.__len):
            self.__a[i] = random.randint(0, self.__g - 1)

    def __toBitList(self, v):
        # base-g digits of v, least significant digit first
        assert v >= 0, 'v<0'
        if v == 0:
            return self.__len * [0]
        bitList = self.__len * [0]
        for i in range(self.__len):
            bitList[i] = v % self.__g
            v //= self.__g  # integer division; int(v / g) loses precision for large v
        return bitList

    def __isPrime(self, v):
        if v <= 1:
            return False
        for i in range(2, int(math.sqrt(v)) + 1, 1):
            if v % i == 0:
                return False
        return True


# for test
if __name__ == "__main__":
    TIMES = 10
    g = 29  # prime
    d = 16  # domain
    uhash = UniversalHashing(g, d)
    H = g * [0]  # H[i] counts how often hash value i occurred
    for i in range(TIMES):  # TIMES random trials to verify
        x = random.randint(0, d - 1)
        _, out = uhash.hash(x)
        H[out] += 1
    for i in range(g):
        print(i, H[i])


Origin blog.51cto.com/14503791/2484280