[Dictionary Learning + Sparse Coding] A brief introduction and its implementation in sklearn

1. Dictionary learning and sparse coding

  • Simply put, sparse coding represents the input vector (signal) or matrix (image) as a linear combination of a set of over-complete basis vectors (the dictionary), weighted by a sparse coefficient vector.
  • After sparse coding, the input data can therefore be re-expressed as a sparse vector: only a few of the vector's elements are non-zero, or only a few are far from zero.
  • The number k of over-complete basis vectors is generally required to be very large (much larger than the dimension n of the input data), because such a combination of basis vectors can more easily capture the intrinsic structure and characteristics of the input data.
  • Why convert to a sparse vector?

1) Feature selection: in many cases, the features extracted directly from the original data contain redundant components; only a few key features actually need to be identified, and the redundant ones can interfere with the final recognition result. Sparse coding performs automatic feature selection: it learns to discard uninformative features by setting their corresponding weights to 0.
2) Interpretability: another reason to favor sparsity is that the model becomes easier to explain, since only a few key features affect the final result.

For example, suppose the probability of contracting a certain disease is y and the data x we collect is 1000-dimensional, i.e. we want to find out how these 1000 factors affect the probability of contracting the disease. If the learned weight vector w* turns out to have only a few non-zero elements, say only 5 non-zero wi, then we have reason to believe that the corresponding features carry the essential information for analyzing the disease: whether you suffer from it is related only to these five factors, which makes the doctor's analysis and decision-making much easier. The short sketch below illustrates this idea of representing a signal with only a few dictionary atoms.
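As a minimal illustration (not from the original post), the following numpy sketch builds a small over-complete dictionary and a hand-made sparse code, then reconstructs the signal from just two atoms; all names (D, a, x) are invented for the example:

import numpy as np

rng = np.random.default_rng(0)

# Over-complete dictionary: 8 atoms in a 4-dimensional space (k > n)
D = rng.normal(size=(8, 4))

# Sparse code: only 2 of the 8 coefficients are non-zero
a = np.zeros(8)
a[[1, 5]] = [2.0, -1.5]

# The signal is a linear combination of just those 2 atoms
x = a @ D
print(x.shape)                                  # (4,)
print(np.count_nonzero(a), "of", a.size, "coefficients are non-zero")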

2. Implementation in sklearn

  • Dictionary learning and sparse coding through MiniBatchDictionaryLearning
class sklearn.decomposition.MiniBatchDictionaryLearning(n_components=None, *, alpha=1, n_iter='deprecated',
max_iter=None, fit_algorithm='lars', n_jobs=None, batch_size=256, shuffle=True, dict_init=None,
transform_algorithm='omp', transform_n_nonzero_coefs=None, transform_alpha=None, verbose=False,
split_sign=False, random_state=None, positive_code=False, positive_dict=False, transform_max_iter=1000,
callback=None, tol=0.001, max_no_improvement=10)

1) n_components: int, default=None, the number of basis vectors/atoms in the dictionary to be extracted; each basis vector has the same dimension as the input vector, so the dictionary has shape (n_components, n_features)
2) alpha: float, default=1, the weight of the regularization term (the L1/Lasso penalty), used to balance sparsity against reconstruction error
3) n_iter: int, default=1000, the total number of iterations; deprecated in version 1.1, use max_iter instead
4) max_iter: int, default=None, the maximum number of iterations (before early stopping); if not None, n_iter is ignored
5) fit_algorithm: {'lars', 'cd'}, default='lars', the algorithm used to solve the optimization problem when training the dictionary; the default 'lars' is least-angle regression
6) n_jobs: int, default=None, the number of parallel jobs; None means 1
7) batch_size: int, default=256, the number of samples in each mini-batch
8) shuffle: bool, default=True, whether to shuffle the samples before building batches
9) dict_init: ndarray of shape (n_components, n_features), default=None, the initial value of the dictionary
10) transform_algorithm: {'lasso_lars', 'lasso_cd', 'lars', 'omp', 'threshold'}, default='omp', the algorithm used to transform the data, i.e. to learn a sparse coefficient vector for each sample (the result of sparse coding). The dimension of each sample's sparse vector equals the number of basis vectors in the dictionary, i.e. n_components, so the transformed input data has shape (n_samples, n_components), and every row vector is sparse
11) transform_n_nonzero_coefs: int, default=None, the number of non-zero coefficients to target in each column of the solution; only used with algorithm='lars' and algorithm='omp'. If None, transform_n_nonzero_coefs=int(n_features / 10)
12) transform_alpha: float, default=None; if algorithm='lasso_lars' or algorithm='lasso_cd', alpha is the penalty applied to the L1 norm; if algorithm='threshold', alpha is the absolute value of the threshold below which coefficients are squashed to zero. If None, it defaults to alpha
13) split_sign: bool, default=False, whether to split the sparse feature vector into the concatenation of its negative part and its positive part; this can improve the performance of downstream classifiers
14) random_state: int, RandomState instance or None, default=None, used to initialize the dictionary when dict_init is not specified
15) positive_code: bool, default=False, whether to force the code to be positive
16) positive_dict: bool, default=False, whether to force the dictionary to be positive
17) transform_max_iter: int, default=1000; if algorithm='lasso_cd' or 'lasso_lars', the maximum number of iterations to perform
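A minimal usage sketch (not from the original post; the random data and parameter values here are arbitrary) showing how these parameters fit together:

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

X = np.random.RandomState(0).randn(200, 20)  # 200 samples, 20 features

learner = MiniBatchDictionaryLearning(
    n_components=50,              # 50 atoms -> over-complete dictionary (50 > 20)
    alpha=1.0,                    # L1 weight balancing sparsity and reconstruction error
    transform_algorithm='omp',    # sparse-coding algorithm used by transform()
    transform_n_nonzero_coefs=5,  # at most 5 non-zero coefficients per sample
    batch_size=256,
    random_state=0,
)
code = learner.fit(X).transform(X)
print(learner.components_.shape)  # (50, 20): the learned dictionary
print(code.shape)                 # (200, 50): one sparse row per sample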

  • Member variables of this class:

  • components_ is the learned dictionary, of shape [n_components, n_features]: there are n_components basis vectors/atoms, and the dimension of each basis vector equals the dimension of the input vector

  • Commonly used methods of this class are:

1. fit(X, y=None)
Fits the data in X, i.e. learns a dictionary of shape [n_components, n_features]
X: the samples to train on, an ndarray of shape [n_samples, n_features]
Returns: the MiniBatchDictionaryLearning instance itself

2. transform(X)
Encodes the data X as a sparse combination of the dictionary's atoms/basis vectors and returns the sparse-coding result
X: the samples to encode, an ndarray of shape [n_samples, n_features]
Returns: the encoded result, an ndarray of shape [n_samples, n_components]; fit must be called first so that a dictionary has been learned before sparse coding

3. fit_transform(X)
Dictionary learning + sparse coding, i.e. the combination of the two methods above
X: the samples to train on, an ndarray of shape [n_samples, n_features]
Returns: the encoded result, an ndarray of shape [n_samples, n_components]
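A quick sanity check (a sketch, not from the original post; the data here is random) tying these methods together:

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

X = np.random.RandomState(42).randn(50, 20)

learner = MiniBatchDictionaryLearning(n_components=30, transform_algorithm='omp',
                                      random_state=42)
code = learner.fit_transform(X)   # equivalent to learner.fit(X).transform(X)
print(code.shape)                 # (50, 30) == (n_samples, n_components)
print(np.mean(code == 0))         # fraction of zero coefficients, i.e. sparsity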

3. Example

  • First use make_sparse_coded_signal to construct the training samples X, which are generated as the product of a ground-truth dictionary and a sparse code
  • Build the dictionary learner / sparse coder dict_learner, which learns a dictionary from X
  • Finally, transform the input data:
import numpy as np
from sklearn.datasets import make_sparse_coded_signal
from sklearn.decomposition import MiniBatchDictionaryLearning

# Generate signals as the product of a ground-truth dictionary (300 atoms of
# dimension 20) and a sparse code with 10 non-zero coefficients per sample
X, dictionary, code = make_sparse_coded_signal(n_samples=100, n_components=300, n_features=20,
                                               n_nonzero_coefs=10, random_state=42)

dict_learner = MiniBatchDictionaryLearning(n_components=300, batch_size=4, transform_algorithm='lasso_lars',
                                           transform_alpha=0.1, random_state=42, shuffle=False)
# In older sklearn versions make_sparse_coded_signal returned X with shape
# (n_features, n_samples), hence the transpose; in recent versions X is already
# (n_samples, n_features) and np.transpose should be dropped
X_transformed = dict_learner.fit_transform(np.transpose(X))
print(X_transformed)  # shape (100, 300), one sparse row per sample
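A possible follow-up check (not in the original post): measure how sparse the learned codes are and how well the learned dictionary reconstructs the samples:

# Fraction of zero coefficients in the learned codes (close to 1 means very sparse)
print(np.mean(X_transformed == 0))

# Reconstruct each sample from the learned dictionary and report the mean
# squared reconstruction error (np.transpose(X) matches the fit input above)
X_hat = X_transformed @ dict_learner.components_
print(np.mean(np.sum((X_hat - np.transpose(X)) ** 2, axis=1)))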


Origin blog.csdn.net/m0_48086806/article/details/132269090