PP: Time series clustering via community detection in Networks

tasks:
1. review the community detection paper
2. formulate your problem and software functions
3.

Assumption: similar (e.g. highly correlated) time series tend to connect to each other and form communities.

Background and related works

Shape-based distance measures; feature-based distance measures; structure-based distance measures. Time series clustering; community detection in networks.

Methodology

  1. data normalization
  2. time series distance calculation
  3. network construction
  4. community detection
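Step 1 can be sketched as z-score normalization. The notes do not say which normalization is used, so this choice is an assumption, and the name `z_normalize` is hypothetical.

```python
from statistics import mean, pstdev

def z_normalize(series):
    """Rescale one series to zero mean and unit variance (assumed normalization)."""
    mu = mean(series)
    sigma = pstdev(series)  # population standard deviation
    if sigma == 0:
        # A constant series carries no shape information; map it to all zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]

normalized = z_normalize([2.0, 4.0, 6.0])
```

After this step, distances compare the shape of two series rather than their scale or offset.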

Which steps influence the clustering results:

the distance calculation algorithm; the network construction method; the community detection method.

2. distance matrix

Calculate the distance for each pair of time series in the data set and construct a distance matrix D, where d_ij is the distance between series X_i and X_j. The choice of distance measure strongly influences the network construction and the clustering result.
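A minimal sketch of building D, assuming Euclidean distance and equal-length series. The function names and the toy data are made up for illustration; any other pairwise distance can be passed as `dist`.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def distance_matrix(series_list, dist=euclidean):
    """Build the symmetric matrix D with D[i][j] = dist(X_i, X_j)."""
    n = len(series_list)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(series_list[i], series_list[j])
            D[i][j] = D[j][i] = d  # the distance is symmetric
    return D

# Toy data set (made up for illustration).
X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = distance_matrix(X)
```

Only the upper triangle is computed, which halves the number of distance evaluations for expensive measures.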

3. network construction

Two common methods: k-NN and ε-NN; EXPLORATION
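The two constructions can be sketched as follows. This assumes undirected, unweighted edges, ties broken by index, and `<=` for the ε threshold; the function names and the toy matrix are hypothetical.

```python
def knn_edges(D, k):
    """Undirected edges linking each series to its k nearest neighbours."""
    n = len(D)
    edges = set()
    for i in range(n):
        # Other vertices sorted by distance to i (ties broken by index).
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: (D[i][j], j))
        for j in neighbours[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def eps_edges(D, eps):
    """Undirected edges linking every pair whose distance is at most eps."""
    n = len(D)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if D[i][j] <= eps}

# Toy distance matrix (made up): series 0 and 1 are close, series 2 is far away.
D = [[0.0, 1.0, 5.0],
     [1.0, 0.0, 4.0],
     [5.0, 4.0, 0.0]]
knn = knn_edges(D, k=1)
eps = eps_edges(D, eps=2.0)
```

With k >= 1 every vertex gets at least one edge under k-NN, whereas ε-NN can leave distant series isolated, as series 2 is here.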

Experiments

45 time series data sets. 

Purpose: check the performance of each combination of step 2, step 3, and step 4 on each data set.

Evaluation index: Rand index.
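The Rand index is the fraction of item pairs on which two labelings agree, i.e. pairs placed together in both or apart in both. A minimal sketch, assuming at least two items; the function name is hypothetical.

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Fraction of pairs treated the same way (together or apart) by both labelings."""
    pairs = list(combinations(range(len(labels_true)), 2))  # needs >= 2 items
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Toy labelings (made up): the prediction splits the second true cluster.
ri = rand_index([0, 0, 1, 1], [0, 0, 1, 2])
```

Here 5 of the 6 pairs agree, so the index is 5/6. The index does not depend on how the clusters are named, only on which items are grouped together.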

Vary the parameters: k of k-NN from 1 to n-1 (n = number of time series); ε of ε-NN from min(D) to max(D) in 100 steps.
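The two parameter sweeps might be generated like this. Two assumptions are made here: min(D) is taken over off-diagonal entries (the diagonal is always 0), and both endpoints are included in the 100 values. The function names are hypothetical.

```python
def k_values(n):
    """All admissible k for k-NN on n series: 1, 2, ..., n-1."""
    return list(range(1, n))

def eps_values(D, steps=100):
    """`steps` evenly spaced thresholds from min(D) to max(D), both ends included."""
    n = len(D)
    off_diag = [D[i][j] for i in range(n) for j in range(n) if i != j]
    lo, hi = min(off_diag), max(off_diag)
    return [lo + (hi - lo) * t / (steps - 1) for t in range(steps)]

# Toy distance matrix (made up for illustration).
D = [[0.0, 1.0, 5.0],
     [1.0, 0.0, 4.0],
     [5.0, 4.0, 0.0]]
ks = k_values(len(D))
eps_grid = eps_values(D)
```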

Step 2 (distance measures): Manhattan, Euclidean, infinity norm, DTW, short time series, DISSIM, complexity-invariant, wavelet transform, Pearson correlation, integrated periodogram.
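As one example from this list, DTW can be sketched with the textbook dynamic program. This version assumes an absolute-difference local cost and no warping-window constraint, so it may differ from the variant used in the paper; the function name is hypothetical.

```python
def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping with absolute-difference cost."""
    inf = float("inf")
    n, m = len(x), len(y)
    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # x advances
                                    cost[i][j - 1],      # y advances
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Toy series (made up): b has the same shape as a, delayed by one step.
a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
d = dtw_distance(a, b)
```

Unlike the Manhattan or Euclidean distance, DTW accepts series of different lengths and gives a distance of 0 for this time-shifted pair.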

Step 3 (network construction): vary the parameters k and ε.

Step 4 (community detection): fast greedy; multilevel; walktrap; infomap; label propagation.
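Of the community detection methods listed above, fast greedy and multilevel both search for a partition with high modularity. The sketch below only evaluates modularity for a given partition; it is not an implementation of either algorithm. It assumes an undirected, unweighted graph with at least one edge, and the function name and toy graph are hypothetical.

```python
def modularity(edges, membership):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)  # number of edges, assumed > 0
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Per community: number of internal edges and total degree.
    internal, total = {}, {}
    for u, v in edges:
        if membership[u] == membership[v]:
            c = membership[u]
            internal[c] = internal.get(c, 0) + 1
    for node, deg in degree.items():
        c = membership[node]
        total[c] = total.get(c, 0) + deg
    # Q = sum over communities of (internal edge share - expected share).
    return sum(internal.get(c, 0) / m - (total[c] / (2 * m)) ** 2 for c in total)

# Toy graph (made up): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
q_split = modularity(edges, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(edges, [0, 0, 0, 0, 0, 0])  # everything in one community
```

Splitting the graph into its two triangles scores 5/14, while putting all vertices into one community scores 0, so a modularity-based method would prefer the split.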

Results

1. The k-NN construction method only allows discrete values of k, while the ε-NN method accepts continuous values of ε.

Supplementary knowledge: 

1. box plot

It shows the maximum, minimum, median, and lower and upper quartiles of a data set.

A concrete example of a box plot:

                            +-----+-+       
  *           o     |-------|   + | |---|
                            +-----+-+    
                                         
+---+---+---+---+---+---+---+---+---+---+   score
0   1   2   3   4   5   6   7   8   9  10

This data set shows:

  • minimum = 5
  • lower quartile (Q1) = 7
  • median (Med, i.e. Q2) = 8.5
  • upper quartile (Q3) = 9
  • maximum = 10
  • mean = 8
  • interquartile range = (Q3 - Q1) = 2 (i.e. ΔQ)
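The two isolated marks on the left of the figure can be checked against the usual outlier rule. The sketch assumes the common 1.5 × IQR fence convention, which the notes do not state, and reads the mark positions off the figure approximately; the function name is hypothetical.

```python
def tukey_fences(q1, q3, factor=1.5):
    """Lower and upper outlier fences at `factor` times the interquartile range."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr

q1, q3 = 7, 9                        # quartiles from the example
iqr = q3 - q1                        # interquartile range: 2
lower, upper = tukey_fences(q1, q3)  # (4.0, 12.0)

# Approximate positions of the two isolated marks, read off the figure.
marks = [0.5, 3.5]
outliers = [x for x in marks if x < lower or x > upper]
```

Both marks fall below the lower fence of 4, which is consistent with the whisker starting at the minimum of 5 and the marks being drawn separately.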

2. Change of mindset: the experiment section is also important, not optional; read it carefully.

Reposted from www.cnblogs.com/dulun/p/12170759.html