Learning Algorithms for Hash Functions



Overview

Two steps for hash function learning:

  • Convert to binary codes: either first reduce the data to a low-dimensional real-valued representation and then quantize it to binary, or learn a binary code directly;
  • Learn the hash mapping function: design or learn a hashing function based on the binary codes, so that similar elements end up close together and dissimilar elements end up far apart.

According to the properties of the hash function, it can be classified as follows:

  • Data-Independent Methods
    • Features: the hash function is independent of the training data, and is usually a random projection or a hand-crafted function (a minimal random-projection sketch follows this list)
    • Examples: Locality-Sensitive Hashing (LSH), Shift-Invariant Kernel Hashing (SIKH), MinHash
  • Data-Dependent Methods
    • Features: the hash function is learned from the training set
    • Classification: unimodal hashing (unsupervised/supervised), ranking-based methods (the supervision is ranking/ordering information), multimodal hashing, deep hashing
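
As a concrete illustration of a data-independent method, here is a minimal random-hyperplane LSH sketch in NumPy: the projection matrix is drawn at random and never looks at any data. The dimensions `d` and `k` are illustrative values, not taken from any particular paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 128, 32                      # input dimension and number of hash bits (illustrative)
R = rng.standard_normal((d, k))     # random projection matrix, drawn without looking at any data

def lsh_hash(x):
    """Map a feature vector (or a batch of row vectors) to a k-bit code in {-1, +1}."""
    return np.sign(x @ R)

x = rng.standard_normal(d)
print(lsh_hash(x)[:8])              # first 8 bits of the code
```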

Unsupervised Hashing

Problem definition:

  • Input: feature vectors $\{\mathbf{x}_i\}$, collected in a matrix $\mathbf{X}$
  • Output: binary codes $\{\mathbf{b}_i\}$, collected in a matrix $\mathbf{B}$, such that similar features correspond to similar codes

PCA Hashing (PCAH)

Take the $m$ eigenvectors of $\mathbf{X}\mathbf{X}^\top$ with the largest eigenvalues to form a projection matrix $\mathbf{W}\in \mathbb{R}^{d\times m}$, and define the hash function as:

$$h(\mathbf{x})=\operatorname{sgn}\left(\mathbf{W}^\top \mathbf{x}\right).$$
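
A minimal NumPy sketch of PCAH, assuming samples are stored as rows (so the covariance is computed as $\mathbf{X}_c^\top\mathbf{X}_c$ rather than $\mathbf{X}\mathbf{X}^\top$); the function names and the centering step are illustrative choices:

```python
import numpy as np

def pcah_train(X, m):
    """PCA Hashing: W holds the m eigenvectors of the centered covariance
    with the largest eigenvalues. X is n x d, one sample per row."""
    mu = X.mean(axis=0)
    Xc = X - mu
    _, vecs = np.linalg.eigh(Xc.T @ Xc)   # eigenvalues returned in ascending order
    W = vecs[:, -m:]                      # keep the top-m eigenvectors: d x m projection
    return W, mu

def pcah_hash(X, W, mu):
    # np.sign(0) is 0; fine for a sketch, a real implementation would break ties
    return np.sign((X - mu) @ W)          # n x m codes in {-1, +1}

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 64))
W, mu = pcah_train(X, m=16)
B = pcah_hash(X, W, mu)
```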

Spectral Hashing (SH)

$$
\begin{aligned}
\min_{\{\mathbf{y}_i\}}\ \ \ & \sum_{ij} W_{ij}\left\|\mathbf{y}_i-\mathbf{y}_j\right\|^2 \\
\text{s.t.}\ \ \ & \mathbf{y}_i \in\{-1,1\}^k \\
& \sum_i \mathbf{y}_i=\mathbf{0} \\
& \frac{1}{n} \sum_i \mathbf{y}_i \mathbf{y}_i^\top=\mathbf{I}
\end{aligned}
$$

$W_{ij}$ is the similarity between $\mathbf{x}_i$ and $\mathbf{x}_j$. The second constraint means that, over all mapped data, each bit takes the values $1$ and $-1$ equally often (balanced bits); the third constraint means that different bits are uncorrelated. This optimization problem is an integer program and hard to solve directly. It can be relaxed by dropping the constraint $\mathbf{y}_i \in\{-1,1\}^k$; each bit of the resulting $\{\mathbf{y}_i\}$ is then passed through the $\operatorname{sgn}$ function to obtain the final binary code.
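
A rough sketch of this relaxation on the training set, assuming a dense Gaussian affinity: the relaxed minimizers are the eigenvectors of the graph Laplacian with the smallest non-trivial eigenvalues, which are then binarized with sign. The original paper additionally derives an out-of-sample extension, omitted here.

```python
import numpy as np

def spectral_hash_codes(X, k, sigma=1.0):
    """Relaxed Spectral Hashing on the training set: drop the {-1,1} constraint,
    keep the k eigenvectors of the graph Laplacian with the smallest non-trivial
    eigenvalues, then binarize with sign."""
    # Dense Gaussian affinity W_ij = exp(-||x_i - x_j||^2 / sigma^2)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / sigma ** 2)
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian L = D - W
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    Y = vecs[:, 1:k + 1]                    # skip the trivial constant eigenvector (balance)
    return np.sign(Y)                       # n x k codes in {-1, +1}

rng = np.random.default_rng(0)
B = spectral_hash_codes(rng.standard_normal((200, 8)), k=4)
print(B.shape)
```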

Anchor Graph Hashing (AGH)

Run k-means on $\mathbf{X}$ to obtain $m$ anchors $\{\mathbf{u}_j\in \mathbb{R}^d\}_{j=1}^m$, and define the matrix $\mathbf{Z}\in \mathbb{R}^{n\times m}$ as:

$$
Z_{ij}=\begin{cases}
\dfrac{\exp\left(-\mathcal{D}^2\left(\mathbf{x}_i, \mathbf{u}_j\right)/t\right)}{\sum_{j^{\prime}\in\mathcal{J}_i}\exp\left(-\mathcal{D}^2\left(\mathbf{x}_i, \mathbf{u}_{j^{\prime}}\right)/t\right)}, & \forall j \in \mathcal{J}_i \\
0, & \text{otherwise}
\end{cases}
$$

where $\mathcal{J}_i$ is an index set containing the indices of the $s$ anchors in $\{\mathbf{u}_j\}_{j=1}^m$ closest to $\mathbf{x}_i$. Define the hash function as:

$$h(\mathbf{x})=\operatorname{sgn}\left(\mathbf{W}^\top \mathbf{z}(\mathbf{x})\right),$$

where $\mathbf{W}=\sqrt{n}\,\Lambda^{-1/2}\mathbf{V}\Sigma^{-1/2}$, $\Lambda=\operatorname{diag}\left(\mathbf{Z}^\top\mathbf{1}\right)$, $\mathbf{V}$ and $\Sigma$ hold the eigenvectors and eigenvalues of the matrix $\Lambda^{-1/2}\mathbf{Z}^\top\mathbf{Z}\Lambda^{-1/2}$, and $\mathbf{z}(\mathbf{x})$ is the truncated anchor-similarity vector of $\mathbf{x}$, computed like a row of $\mathbf{Z}$. The overall objective of this method is the same as that of Spectral Hashing, but introducing anchors speeds up the optimization: the time complexity drops from $O(n^3)$ to $O(nm^2)$. For details, please refer to the original paper.
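
A rough end-to-end sketch of the training-set side of AGH, assuming scikit-learn is available for the k-means step; the hyper-parameters `m`, `s`, `r`, `t` and the function name are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans  # used only for the anchor (k-means) step

def agh_codes(X, m=100, s=3, r=16, t=1.0):
    """Anchor Graph Hashing on the training set: k-means anchors, a row-truncated
    similarity matrix Z, then eigenvectors of the small m x m matrix
    Lambda^{-1/2} Z^T Z Lambda^{-1/2}; in-sample codes are sign(Z W)."""
    n = X.shape[0]
    anchors = KMeans(n_clusters=m, n_init=4, random_state=0).fit(X).cluster_centers_
    dist2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)   # n x m squared distances
    idx = np.argsort(dist2, axis=1)[:, :s]                         # s nearest anchors per point
    rows = np.arange(n)[:, None]
    sim = np.exp(-dist2[rows, idx] / t)
    Z = np.zeros((n, m))
    Z[rows, idx] = sim / sim.sum(axis=1, keepdims=True)            # each row of Z sums to one
    lam = Z.sum(axis=0) + 1e-12                                    # Lambda = diag(Z^T 1), guarded against empty anchors
    M = (Z / np.sqrt(lam)).T @ (Z / np.sqrt(lam))                  # Lambda^{-1/2} Z^T Z Lambda^{-1/2}
    vals, vecs = np.linalg.eigh(M)                                 # ascending eigenvalues
    V, sig = vecs[:, -r - 1:-1], vals[-r - 1:-1]                   # top r eigenpairs, excluding the trivial one
    W = np.sqrt(n) * (V / np.sqrt(lam)[:, None]) / np.sqrt(sig)    # W = sqrt(n) Lambda^{-1/2} V Sigma^{-1/2}
    return np.sign(Z @ W)                                          # n x r binary codes

rng = np.random.default_rng(0)
B = agh_codes(rng.standard_normal((2000, 32)), m=64, r=16)
```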


Supervised Hashing

Problem definition:

  • Input: feature vectors $\{\mathbf{x}_i\}$, collected in a matrix $\mathbf{X}$, and label vectors $\{\mathbf{y}_i\}$, collected in a matrix $\mathbf{Y}$
  • Output: binary codes $\{\mathbf{b}_i\}$, collected in a matrix $\mathbf{B}$, such that samples of the same category correspond to similar codes

The common algorithms are summarized below. For specifics, please refer to the reference materials; they are not expanded on here.

(Figure: summary table of common supervised hashing algorithms.)
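
As a small illustration of how the label matrix $\mathbf{Y}$ is commonly used in this setting, the sketch below turns class labels into pairwise similarity targets $S_{ij}\in\{+1,-1\}$ and measures how well a set of codes fits them, in the spirit of KSH-style objectives; it is illustrative only and does not reproduce any single published method.

```python
import numpy as np

def pairwise_targets(y):
    """Turn class labels into pairwise similarity targets S_ij in {+1, -1}:
    +1 for same class, -1 otherwise. Many supervised hashing methods then
    fit (1/k) * b_i^T b_j to S_ij."""
    y = np.asarray(y)
    return np.where(y[:, None] == y[None, :], 1.0, -1.0)

def code_fit_loss(B, S):
    """Squared error between normalized code inner products and the targets."""
    k = B.shape[1]
    return np.mean(((B @ B.T) / k - S) ** 2)

y = np.array([0, 0, 1, 1])
B = np.array([[1, 1, -1], [1, 1, -1], [-1, -1, 1], [-1, 1, 1]])  # toy 3-bit codes
print(code_fit_loss(B, pairwise_targets(y)))
```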


Ranking-based Methods

This part still belongs to supervised hashing, but the supervision changes from class labels to ranking information, for example triplets $\left(x, x^{+}, x^{-}\right)$. Common algorithms covered in this section are summarized below:

(Figure: summary table of common ranking-based hashing algorithms.)
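
As a hedged illustration of how the triplet supervision $\left(x, x^{+}, x^{-}\right)$ is typically used, the sketch below implements a generic margin-based triplet ranking loss on relaxed (real-valued) codes; it is not any specific published algorithm, and the margin value is an arbitrary choice.

```python
import numpy as np

def triplet_ranking_loss(b, b_pos, b_neg, margin=2.0):
    """Generic triplet ranking loss on relaxed codes: push the distance to the
    positive code below the distance to the negative code by at least `margin`."""
    d_pos = np.sum((b - b_pos) ** 2)
    d_neg = np.sum((b - b_neg) ** 2)
    return max(0.0, margin + d_pos - d_neg)

b     = np.array([ 1.0,  1.0, -1.0,  1.0])    # code of x
b_pos = np.array([ 1.0,  1.0, -1.0, -1.0])    # code of x+
b_neg = np.array([-1.0, -1.0,  1.0, -1.0])    # code of x-
print(triplet_ranking_loss(b, b_pos, b_neg))  # 0.0 once the margin is satisfied
```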

