Original link: http://tecdat.cn/?p=6443
Partitioning clustering is a family of clustering methods that split the observations in a data set into several groups based on their similarity.
Partitioning methods include:
- K-means clustering (MacQueen 1967), in which each cluster is represented by its center, the mean of the data points belonging to the cluster. The k-means method is sensitive to anomalous data points and outliers.
- K-medoids clustering, or PAM (Partitioning Around Medoids, Kaufman and Rousseeuw, 1990), in which each cluster is represented by one of the objects in the cluster. PAM is less sensitive to outliers than k-means.
- The CLARA algorithm (Clustering Large Applications), an extension of PAM adapted to large data sets.
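To see why a mean-based center reacts more strongly to outliers than a medoid, here is a minimal sketch on made-up one-dimensional data. The vector `x` and the variable names are illustrative assumptions, not part of the original article.

```r
# Four points forming a tight group, plus one extreme outlier (made-up data).
x <- c(1, 2, 3, 4, 100)

# k-means-style center: the arithmetic mean, which is pulled toward the outlier.
centre_mean <- mean(x)

# k-medoids-style center: the observed point with the smallest
# total absolute distance to all other points.
total_dist <- sapply(x, function(p) sum(abs(x - p)))
centre_medoid <- x[which.min(total_dist)]

print(centre_mean)    # 22
print(centre_medoid)  # 3
```

The mean lands at 22, far from every point in the tight group, while the medoid stays at 3, inside the group.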
For each of these methods, we provide:
- The basic idea and the key mathematical concepts
- The clustering algorithm and its implementation in R
- Examples of cluster analysis and visualization in R
Data preparation:
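The code for this step is missing from the text. A minimal sketch, assuming the data are the built-in `USArrests` data set standardized with `scale()` (the column names and values printed in this section are consistent with that assumption):

```r
# USArrests ships with base R (datasets package): 50 states, 4 variables.
data("USArrests")

# Standardize each column to mean 0 and standard deviation 1 so that
# no single variable dominates the distance computation.
df <- scale(USArrests)

# Print the first three standardized rows.
head(df, n = 3)
```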
##         Murder Assault UrbanPop     Rape
## Alabama 1.2426   0.783   -0.521 -0.00342
## Alaska  0.5079   1.107   -1.212  2.48420
## Arizona 0.0716   1.479    0.999  1.04288
Determining the optimal number of clusters for k-means clustering:
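The call that produced the progress messages in this section is not shown. They look like the messages printed by `clusGap()` from the `cluster` package, so the sketch below assumes the gap statistic was used. The seed, `nstart = 25`, and the `firstSEmax` selection rule are assumptions made for illustration.

```r
library(cluster)  # recommended package; provides clusGap() and maxSE()

df <- scale(USArrests)
set.seed(123)     # arbitrary seed, used only to make the run repeatable

# Gap statistic for k = 1..10 with 100 bootstrap reference data sets.
# nstart = 25 is passed through to kmeans().
gap <- clusGap(df, FUNcluster = kmeans, nstart = 25,
               K.max = 10, B = 100, verbose = TRUE)

# One common selection rule; other rules can suggest a different k.
k_best <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "firstSEmax")
print(k_best)

# Gap curve with standard-error bars.
plot(gap, main = "Gap statistic for k-means on USArrests")
```

The suggested k depends on the selection rule and on the bootstrap samples, so it is worth checking the plot rather than relying on `k_best` alone.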
## Clustering k = 1,2,..., K.max (= 10): .. done
## Bootstrapping, b = 1,2,..., B (= 100) [one "." per sample]:
## .................................................. 50
## .................................................. 100
Computing and visualizing k-means clustering:
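A minimal sketch of this step, using only base R. The number of clusters (`k = 3`), the seed, and `nstart = 25` are assumptions for illustration, since the original code and chosen k are not shown.

```r
df <- scale(USArrests)
set.seed(123)  # arbitrary seed, used only to make the run repeatable

# k = 3 is an illustrative assumption, not a value taken from the source.
# nstart = 25 tries 25 random initializations and keeps the best one.
km_res <- kmeans(df, centers = 3, nstart = 25)

print(km_res$size)     # number of states in each cluster
print(km_res$centers)  # cluster means, in standardized units

# Visualize the clusters on the first two principal components.
pc <- prcomp(df)
plot(pc$x[, 1:2], col = km_res$cluster, pch = 19,
     main = "k-means clusters of USArrests (PCA projection)")
text(pc$x[, 1:2], labels = rownames(df), pos = 3, cex = 0.6)
```

Projecting onto principal components is one way to draw four-dimensional data in two dimensions; the clustering itself is computed on all four standardized variables.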
Similarly, PAM clustering can be computed and visualized:
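A minimal sketch of the PAM step, assuming the `cluster` package and, again, an illustrative `k = 3` that is not taken from the source.

```r
library(cluster)  # provides pam() and clusplot()

df <- scale(USArrests)

# k = 3 is an illustrative assumption, not a value taken from the source.
pam_res <- pam(df, k = 3)

print(rownames(pam_res$medoids))  # medoids are actual states in the data
print(table(pam_res$clustering))  # cluster sizes

# 2-D view of the partition on the first two principal components.
clusplot(pam_res, main = "PAM clustering of USArrests",
         color = TRUE, labels = 2)
```

Unlike the k-means centers, each PAM medoid is a real observation, so the clusters can be summarized by naming a representative state.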