Paper reading on multi-view clustering (1)

When a clustering method relies on a certain type of predefined similarity measure, the following issue arises:

Data clustering has seen many successes, but these methods often depend on predefined similarity measures, which inherit a limitation of the original formulation: they tend to become ineffective when the input dimensionality is relatively high.

1. Deep Multi network Embedded Clustering

Applied Intelligence
It mainly proposes using DEC (Deep Embedded Clustering), a deep embedding-based clustering method, to cluster the learned features;

On this basis, features from several views are added.
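As a reminder of what DEC does at its core, here is a minimal NumPy sketch of the DEC-style soft assignment (Student's t kernel), the sharpened target distribution, and the KL clustering loss. The function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def soft_assignment(z, centroids, alpha=1.0):
    """DEC soft assignment q_ij: Student's t-kernel similarity between
    embedded point z_i and cluster centroid mu_j, normalized per sample."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (n, k)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target p_ij that emphasizes high-confidence assignments."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_clustering_loss(p, q, eps=1e-12):
    """Clustering loss KL(P || Q) minimized while refining the embedding."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```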

2. Deep convolutional self-paced clustering

The main research methods used in this article are:

  1. Unsupervised clustering;
  2. Self-paced learning, a learning method that introduces training samples from simple to difficult;

2.1 Existing problems and proposed solutions

2.1.1 Problems

The K-means algorithm is very effective when the data points are evenly distributed around their corresponding centroids in the feature space. However, K-means is generally not suitable for high-dimensional data, because similarity measures become ineffective there due to the "curse of dimensionality".

2.1.2 Solutions

The main contributions of the paper, specifically:

  1. In the pre-training stage, we propose to utilize convolutional autoencoders to extract high-quality data representations containing spatially relevant information.

  2. Then, in the fine-tuning stage, clustering loss is directly applied to the learned features to jointly perform feature refinement and cluster assignment. We retain the decoder to avoid the feature space being distorted by clustering loss.

  3. In order to stabilize the training process of the entire network, we further introduce a self-paced learning mechanism and select the most confident samples in each iteration. Through comprehensive experiments on 7 popular image datasets, we demonstrate that the proposed algorithm consistently outperforms state-of-the-art competitors.

The first two points indicate that feature learning and clustering are mutually complementary processes.
The third point uses self-paced learning: samples progress from easy to difficult during optimization, so the adverse effects of marginal samples can be effectively mitigated. The aim is to reduce the chance that unreliable samples confuse or even mislead the DNN training process and thereby seriously degrade clustering performance.

To put it simply: (1) use convolution to extract features; (2) cluster the learned features; (3) during training, introduce a self-paced learning mechanism that selects the most confident samples in each iteration.
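To make the fine-tuning objective concrete, here is a rough sketch of one training step under the following assumptions: a PyTorch convolutional autoencoder split into `encoder`/`decoder`, learnable cluster centroids, a DEC-style KL clustering loss, and hard self-paced weights that keep only low-loss samples. The exact loss form and weighting used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def finetune_step(encoder, decoder, centroids, x, optimizer,
                  lam, gamma=0.1, alpha=1.0):
    """One joint fine-tuning step: reconstruction loss + clustering loss,
    with hard self-paced weights keeping only the low-loss (confident) samples.
    `centroids` is assumed to be a learnable (k, d) parameter in `optimizer`."""
    z = encoder(x)                               # (batch, d) embedded features
    x_rec = decoder(z)                           # decoder kept to preserve data attributes

    # DEC-style Student's t soft assignment q and sharpened target p
    d2 = torch.cdist(z, centroids) ** 2          # (batch, k) squared distances
    q = (1.0 + d2 / alpha) ** (-(alpha + 1) / 2)
    q = q / q.sum(dim=1, keepdim=True)
    p = q.detach() ** 2 / q.detach().sum(dim=0)
    p = p / p.sum(dim=1, keepdim=True)           # target carries no gradient

    # per-sample clustering loss KL(p_i || q_i) and hard self-paced weights
    l_clu = (p * (p.clamp_min(1e-12).log() - q.clamp_min(1e-12).log())).sum(dim=1)
    v = (l_clu.detach() < lam).float()           # v_i = 1 only for "easy" samples

    # per-sample reconstruction loss
    l_rec = F.mse_loss(x_rec, x, reduction="none").flatten(1).mean(dim=1)

    loss = (l_rec + gamma * v * l_clu).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Keeping the decoder means the reconstruction term keeps constraining the embedding while the clustering loss refines it.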

2.2 Implementation method

Specifically, our method consists of two stages: pre-training and fine-tuning.

  • In the pre-training stage, we train a convolutional autoencoder (CAE) by minimizing the reconstruction loss [26]. By using a CAE, our method can transform the data from a relatively high-dimensional and sparse space into a low-dimensional and compact space.

  • Then, in the fine-tuning stage, unlike some previous works [31, 32, 37] that retain only the encoder, we tune the entire autoencoder (i.e., the CAE) using both the clustering loss and the reconstruction loss, so that the data attributes are preserved and the feature space is not distorted.

  • Question: to select the most confident samples during training, how do we know which samples have high credibility?
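One common answer in DEC-style models (not necessarily the exact criterion of this paper) is to treat the largest soft-assignment probability of a sample as its confidence and keep only the most confident fraction in each iteration. A small sketch, with illustrative names:

```python
import numpy as np

def select_confident_samples(q, keep_ratio=0.8):
    """Use the largest soft-assignment probability max_j q_ij as the
    confidence of sample i and keep only the most confident fraction."""
    confidence = q.max(axis=1)                   # (n_samples,)
    n_keep = int(np.ceil(keep_ratio * len(confidence)))
    selected = np.argsort(-confidence)[:n_keep]  # indices of the "easiest" samples
    return selected, confidence
```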

3. Multi-view representation learning

4. Clustering method

Several clustering methods are used to compare with the DCSPC method, which can be roughly divided into three categories:

  • 1) Traditional methods, including K-means (KM) [5], Gaussian mixture model (GMM) [6] and spectral clustering (SC) [7];

  • 2) Representation-based methods, including SAE [25] and CAE [26];

  • 3) Deep clustering methods, including deep embedded clustering (DEC) [32], improved deep embedded clustering (IDEC) [33], deep clustering network (DCN) [34], deep K-means (DKM) [35], convolutional deep embedded clustering (ConvDEC) [36], adaptive self-paced clustering (ASPC) [37], structural deep clustering network (SDCN) [38], semi-supervised deep embedded clustering (SDEC) [39], and deep density-based clustering (DDC) [40].

4.1 K-means clustering

The K-means algorithm is very effective when the data points are evenly distributed around their corresponding centroids in the feature space. However, K-means is generally not suitable for high-dimensional data, because similarity measures become ineffective there due to the "curse of dimensionality". Therefore, in practical applications, we should first use dimensionality reduction methods such as PCA [8], MDS [9], or NMF [10] to project the original data into a low-dimensional space; running K-means on the projected data usually gives better results. In addition to these linear dimensionality reduction methods, nonlinear algorithms such as t-SNE [17], LLE [18], and DNN-based methods [19-21] are widely used as preprocessing before K-means. Interested readers can refer to [22-24] for a comprehensive overview.

In many practical applications, data may come from different views, so many multi-view clustering methods have been proposed. For example, Zhang et al. [13] first map multi-view samples into a shared view space, then transform the samples into a discriminative space, and finally perform K-means clustering on the transformed samples. Wang et al. [14] proposed a general graph-based multi-view clustering framework, which extracts the feature matrices of multiple views, fuses the graph matrices, and generates a unified graph matrix for direct clustering. Considering that a specific class may be absent from the training data, Hayashi et al. [16] proposed a clustering-based zero-shot learning method that divides the data into unseen and seen classes.
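As a quick illustration of the "reduce dimensionality first, then run K-means" recipe, the following sketch compares K-means on raw pixels with K-means after PCA on scikit-learn's digits dataset. The exact scores will vary, and PCA is only one of the possible projections mentioned above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import normalized_mutual_info_score

X, y = load_digits(return_X_y=True)            # 1797 samples, 64 raw pixel features

# K-means directly on the raw (relatively high-dimensional) data
km_raw = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

# Project to a low-dimensional space first, then cluster
X_low = PCA(n_components=10, random_state=0).fit_transform(X)
km_pca = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_low)

print("NMI, raw   :", normalized_mutual_info_score(y, km_raw.labels_))
print("NMI, PCA+KM:", normalized_mutual_info_score(y, km_pca.labels_))
```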

4.2 Unsupervised clustering

Deep unsupervised clustering methods can be roughly divided into two categories. The first treats feature learning and clustering independently: the original data are first projected into a low-dimensional feature space, and a conventional clustering algorithm is then used to group the feature points. Unfortunately, this separation can limit clustering performance, because potential relationships between feature learning and clustering are ignored.

The other type uses a joint optimization criterion, performing feature learning and clustering at the same time, which has clear advantages over the separated approach. Recently, several methods have been proposed to integrate feature learning and clustering into a unified framework. Joint unsupervised learning (JULE) [29] proposes to simultaneously guide clustering and representation learning with a unified weighted triplet loss, but its computational complexity is high. Chang et al. [30] assumed binary relationships between pairs of images and developed a deep adaptive clustering (DAC) model that recasts the clustering task as a binary pairwise classification problem, showing good results on 6 image datasets. Adaptive self-paced clustering (ASPC) [37] draws on hard-weighted self-paced learning and prioritizes high-confidence samples when training the clustering network, eliminating the negative impact of marginal samples and stabilizing the training process. Ren et al. [40] proposed a deep density-based clustering (DDC) technique that can adaptively estimate the number of clusters for data of arbitrary shape. Deep embedded clustering with data augmentation (DECDA) [36] introduces data augmentation into the original deep embedded clustering framework and achieves good clustering performance on four grayscale image datasets. Semi-supervised deep embedded clustering (SDEC) [39] overcomes the shortcoming that DEC [32] cannot use prior knowledge to guide the training process.

Deep adaptive clustering (DAC) model: Chang J, Wang L, Meng G, Xiang S, Pan C (2017) Deep adaptive image clustering. In: International Conference on Computer Vision, pp 5880–5888.
https://github.com/vector-1127/DAC

Adaptive self-paced clustering (ASPC) [37], which draws on the hard-weighted self-paced learning method: Guo X, Liu X, Zhu E, Zhu X, Li M, Xu X, Yin J (2020) Adaptive self-paced deep clustering with data augmentation. IEEE Trans Knowl Data Eng.
https://github.com/XifengGuo/ASPC-DA

Semi-supervised deep embedded clustering (SDEC): Ren Y, Hu K, Dai X, Pan L, Hoi SCH, Xu Z (2019) Semi-supervised deep embedded clustering. Neurocomputing 325:121–130.
https://github.com/yongzx/SDEC-Keras

5. Self-paced learning

Similar to the core idea of curriculum learning [43], the goal of self-paced learning is to learn a model by gradually introducing samples into training, from easy to difficult. The obvious difference between the two is that the former requires the easy and difficult samples to be predetermined, while the latter determines the order automatically from the data itself. Given a training set {(x_i, y_i)}_{i=1}^{n}, self-paced learning solves the following problem:

$$\min_{\theta,\,\mathbf{v}} \; E(\theta, \mathbf{v}; \lambda) \;=\; \sum_{i=1}^{n} v_i \, L\big(y_i, f(x_i; \theta)\big) \;+\; \sum_{i=1}^{n} h(\lambda, v_i)$$

where L(·) represents the loss function of the specific problem, and h(λ, v_i) represents a self-paced regularizer independent of L(·), which can be defined in many forms.
V = [v_1, v_2, …, v_n]^T is the vector of weight variables reflecting the complexity of the samples, and λ is a parameter called the learning pace, which controls the "model age" and is gradually increased to take more samples into account. When h(λ, v_i) = −λ v_i and v_i equals 0 or 1, self-paced learning degenerates into the hard-weighted form, that is:

$$v_i^{*} \;=\; \begin{cases} 1, & L\big(y_i, f(x_i; \theta)\big) < \lambda \\ 0, & \text{otherwise} \end{cases}$$
In addition, when updating θ with v fixed, problem (3) degenerates into a weighted loss minimization problem, which can easily be solved by stochastic gradient descent (SGD) and backpropagation (BP).
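The alternating optimization described above can be sketched as follows, assuming hypothetical helpers `loss_fn` (per-sample losses) and `grad_fn` (gradient of the v-weighted loss): the v-step is the closed-form hard weighting, the θ-step is a weighted gradient update, and λ is grown by a factor μ to admit harder samples.

```python
import numpy as np

def spl_train(loss_fn, grad_fn, theta, X, y, lam=0.5, mu=1.3,
              epochs=10, lr=0.01):
    """Alternating optimization for hard-weighted self-paced learning.
    loss_fn(theta, X, y) -> per-sample losses, shape (n,)      [hypothetical helper]
    grad_fn(theta, X, y, v) -> gradient of the v-weighted loss [hypothetical helper]"""
    for _ in range(epochs):
        losses = loss_fn(theta, X, y)
        v = (losses < lam).astype(float)               # closed-form v-step (hard weights)
        theta = theta - lr * grad_fn(theta, X, y, v)   # theta-step: weighted loss via (S)GD
        lam *= mu                                      # grow the "model age" to admit harder samples
    return theta
```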

So far, self-paced learning has been applied to a variety of tasks and models. Kumar et al. [44] demonstrated for the first time that a self-paced learning algorithm outperforms the then state-of-the-art methods in learning latent structural support vector machines. In [45], the self-paced learning paradigm was successfully applied to the clustering of time series (Tang Y, Xie Y, Yang X, Niu J, Zhang W (2021) Tensor multi-elastic kernel self-paced learning for time series clustering. IEEE Trans Knowl Data Eng 33(3):1223–1237).

Jiang et al. [46] proposed a self-paced curriculum learning (SPCL) framework that jointly considers prior knowledge and learning progress. To simultaneously enhance the robustness and effectiveness of supervised learning, [47] first proposed the self-paced boost learning (SPBL) framework, which reveals and exploits the relationship between boosting and self-paced learning. Ren et al. [48] noticed that standard self-paced learning may suffer from a class-imbalance problem, and carefully designed two new soft weighting schemes that compensate for it by assigning weights per class and selecting instances locally. Recently, SPUDRFs [49] address the fundamental issues of ranking and selection in self-paced learning from the perspective of fairness and can easily be combined with various deep discriminative models. In SAMVC [50], a soft-weighted self-paced learning form is introduced into a multi-view clustering model to reduce the adverse effects of outliers and noise, and a self-weighting strategy is proposed to judge the importance of different views. Meng et al. [51] provided some theoretical explanations of the self-paced learning paradigm. Collectively, these publications confirm that self-paced learning helps avoid getting stuck in undesirable local minima and generally improves model performance.
