PP: Deep clustering based on a mixture of autoencoders

Problem: clustering

A clustering network transforms the data into another space and then selects one of the clusters. Next, the autoencoder associated with this cluster is used to reconstruct the data-point.
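The select-then-reconstruct idea above can be sketched as a single forward pass. This is a minimal illustration, not the paper's implementation: the linear layers, the untrained random weights, the sizes `n`, `d`, `h`, `K`, and the assignment-weighted squared-error loss are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, h, K = 6, 10, 3, 2   # points, input dim, code dim, clusters (illustrative)
X = rng.normal(size=(n, d))

# One linear autoencoder per cluster (hypothetical, untrained weights).
enc = rng.normal(scale=0.1, size=(K, d, h))   # encoders: d -> h
dec = rng.normal(scale=0.1, size=(K, h, d))   # decoders: h -> d

# Clustering network: here just one linear layer followed by a softmax.
W = rng.normal(scale=0.1, size=(d, K))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

p = softmax(X @ W)                            # (n, K) soft cluster assignment

# Reconstruction of every point by every autoencoder: shape (K, n, d).
recon = np.einsum("nd,kdh,khe->kne", X, enc, dec)
err = ((recon - X[None]) ** 2).sum(axis=2).T  # (n, K) squared errors

# Assumed objective: reconstruction error weighted by the soft assignment.
loss = (p * err).sum(axis=1).mean()
hard = p.argmax(axis=1)                       # cluster selected for each point
```

In a trained model, the clustering network and the autoencoders would be optimized jointly, so that each autoencoder specializes in reconstructing the points assigned to its cluster.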

Introduction:

Traditional method: data -> extract a feature vector from each object -> aggregate groups of vectors in a feature space.

In this paper, each cluster is represented by an autoencoder network. (Open question: how?)

Common method: k-means. For high-dimensional datasets, however, it is less useful, because inter-point distances become less informative in high-dimensional spaces.
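A small experiment makes the distance argument concrete. It is only a sketch under assumed conditions (uniform random points, Euclidean distance, a hypothetical helper `relative_contrast`): it measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import numpy as np

def relative_contrast(dim, n_points=500, seed=0):
    """(max - min) / min distance from one query point to uniform random points."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dist = np.linalg.norm(points - query, axis=1)
    return (dist.max() - dist.min()) / dist.min()

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

The printed contrast shrinks as the dimension grows, so the nearest and farthest points look almost equally far away, which is what weakens distance-based methods such as k-means.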

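If the goal is to find patterns in a sequence, could the time dimension be treated as the high-dimensional case, with each pattern forming one cluster, while some subsequences cannot be assigned to any cluster?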

Representation learning has been used to map the input data into a low-dimensional feature space.

Attempts: apply unsupervised deep learning approaches to clustering. (Open question: how?)

However, most focus on clustering over a low-dimensional feature space.

Transform the data into more clustering-friendly representations:

A deep version of k-means is based on learning a data representation and applying k-means in the embedded space.
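The embed-then-cluster pipeline can be sketched as follows. This is a two-stage stand-in under stated assumptions, not a deep model: PCA (the hypothetical `pca_embed`) replaces the learned representation, the k-means routine is a plain Lloyd's algorithm with farthest-point initialization, and the data are synthetic.

```python
import numpy as np

def pca_embed(X, n_components):
    """Stand-in for a learned encoder: project onto the top principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def kmeans(Z, k, n_iter=50):
    """Plain Lloyd's algorithm with farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        # Squared distance of each point to its closest chosen center.
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic data: two well-separated groups in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 50)),
               rng.normal(4.0, 1.0, size=(100, 50))])

Z = pca_embed(X, n_components=2)   # embedded space
labels, centers = kmeans(Z, k=2)   # k-means applied in the embedded space
```

A deep variant would replace `pca_embed` with a neural encoder and typically train the representation and the cluster centers together rather than in two separate stages.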

How to represent a cluster:

A vector (e.g. a k-means centroid) vs. an autoencoder network.

Data collapsing problem: for each dataset, the program has to be tuned all over again.

For multivariate time series, how to find patterns:

1. Find patterns: SAX; TICC; sliding windows; derivatives.

2. VG; statistical features.
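As one illustration of the first option, SAX combined with sliding windows can be sketched as below. The function names `sax_word` and `sliding_sax`, the fixed alphabet of four letters, and the requirement that the window length be divisible by the number of segments are assumptions of this sketch. It handles a single channel, so a multivariate series would need it applied per channel.

```python
import numpy as np

# Breakpoints that split a standard normal into 4 equiprobable bins.
BREAKPOINTS_4 = np.array([-0.6745, 0.0, 0.6745])

def sax_word(series, n_segments):
    """Z-normalize, average over equal-length segments (PAA), map to letters a-d."""
    x = np.asarray(series, dtype=float)
    std = x.std()
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)
    # Requires len(x) to be divisible by n_segments.
    paa = z.reshape(n_segments, -1).mean(axis=1)
    symbols = np.searchsorted(BREAKPOINTS_4, paa)
    return "".join("abcd"[s] for s in symbols)

def sliding_sax(series, window, step, n_segments):
    """SAX word for each sliding window of a univariate series."""
    return [sax_word(series[i:i + window], n_segments)
            for i in range(0, len(series) - window + 1, step)]

print(sax_word(np.arange(8), n_segments=4))  # abcd
```

Windows that produce the same word are candidates for the same pattern, which turns pattern finding into counting or clustering short strings.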

Supplementary knowledge:

1. Pattern recognition and clustering

Pattern recognition is a mature field in computer science with well-established techniques for the assignment of unknown patterns to categories, or classes. A pattern is defined as a vector of some number of measurements, called features. Usually, a pattern recognition system uses training samples from known categories to form a decision rule for unknown patterns. The unknown pattern is assigned to one of the categories according to the decision rule. Since we are interested in the classes of documents that have been assigned by the user, we can use pattern recognition techniques to try to classify previously unseen documents into the user's categories. While pattern recognition techniques require that the number and labels of categories are known, clustering techniques are unsupervised, requiring no external knowledge of categories. Clustering methods simply try to group similar patterns into clusters whose members are more similar to each other (according to some distance measure) than to members of other clusters. There is no a priori knowledge of patterns that belong to certain groups, or even how many groups are appropriate. Refer to basic pattern recognition and clustering texts such as [5, 6, 7] for further information.
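The "decision rule from training samples" described above can be illustrated with a nearest-mean classifier. This is only one simple choice of rule, and the feature vectors and the category names `sports` and `finance` are invented for the example.

```python
import numpy as np

def fit_nearest_mean(features, labels):
    """Decision rule from labelled training samples: one mean vector per class."""
    classes = np.unique(labels)
    means = np.array([features[labels == c].mean(axis=0) for c in classes])
    return classes, means

def predict_nearest_mean(classes, means, patterns):
    """Assign each unknown pattern to the class with the closest mean."""
    d2 = ((patterns[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]

# Hypothetical two-feature patterns with user-assigned categories.
train_x = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]])
train_y = np.array(["sports", "sports", "finance", "finance"])

classes, means = fit_nearest_mean(train_x, train_y)
unknown = np.array([[0.1, 0.2], [1.2, 1.0]])
print(predict_nearest_mean(classes, means, unknown))
```

Clustering differs in that `train_y` is unavailable: the groups, and possibly their number, have to be inferred from the feature vectors alone.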

We first employ pattern recognition techniques on documents to attempt to find features for classification, then focus on clustering the raw features of the documents.
