Partitioning Clustering Models in R

Original link: http://tecdat.cn/?p=6443

Partitioning clustering is a family of clustering methods that divide a data set into several groups based on the similarity of the observations.

Partitioning clustering methods include:

K-means clustering (MacQueen, 1967), in which each cluster is represented by its center, that is, the mean of the data points belonging to the cluster. The k-means method is sensitive to anomalous data points and outliers.
K-medoids clustering, or PAM (Partitioning Around Medoids; Kaufman and Rousseeuw, 1990), in which each cluster is represented by one of the objects in the cluster (its medoid). Compared with k-means, PAM is less sensitive to outliers.
The CLARA algorithm (Clustering Large Applications), an extension of PAM for large data sets.
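The difference in outlier sensitivity between a mean and a medoid can be seen on a tiny example. The sketch below uses a made-up one-dimensional vector (hypothetical data, not taken from USArrests) and base R only. It treats all values as one cluster and compares the two representatives.

```r
# Hypothetical toy cluster with one extreme value (illustration only)
x <- c(1, 2, 3, 4, 100)

# k-means represents a cluster by its mean, which the outlier drags upward
centroid <- mean(x)

# PAM represents a cluster by its medoid: the observation with the
# smallest total distance to all other observations in the cluster
d <- as.matrix(dist(x))
medoid <- x[which.min(rowSums(d))]

centroid  # 22
medoid    # 3
```

The mean lands far from the bulk of the data, while the medoid stays at an actual observation near the center of the group.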

For each of these methods, we provide:

The basic ideas and key mathematical concepts
The clustering algorithm and its implementation in R
Examples of cluster analysis and visualization in R

Data preparation:

my_data <- USArrests
# Remove rows with missing values (NA, i.e. "not available")
my_data <- na.omit(my_data)
# Scale variables
my_data <- scale(my_data)
# View the first 3 rows
head(my_data, n = 3)

##         Murder Assault UrbanPop     Rape
## Alabama 1.2426   0.783   -0.521 -0.00342
## Alaska  0.5079   1.107   -1.212  2.48420
## Arizona 0.0716   1.479    0.999  1.04288
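As a quick sanity check on the preparation step, the following sketch confirms what scale() does: each column ends up with mean 0 and standard deviation 1, so that no variable dominates the distance computation just because of its units. It rebuilds my_data so that it can be run on its own.

```r
# Rebuild the prepared data so this snippet is self-contained
my_data <- scale(na.omit(USArrests))

# scale() returns a numeric matrix, not a data frame
is.matrix(my_data)

# Column means are (numerically) zero and standard deviations are one
col_means <- colMeans(my_data)
col_sds   <- apply(my_data, 2, sd)

round(col_means, 10)
col_sds
```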

Determine the optimal number of clusters for k-means clustering:

library(factoextra)  # provides fviz_nbclust() and fviz_cluster()
fviz_nbclust(my_data, kmeans,
             method = "gap_stat")

## Clustering k = 1,2,..., K.max (= 10): .. done
## Bootstrapping, b = 1,2,..., B (= 100)  [one "." per sample]:
## .................................................. 50 
## .................................................. 100
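To get the gap statistic as numbers rather than as a plot, one option is to call clusGap() from the cluster package directly. The sketch below is only an illustration under a few assumptions: the seed, nstart = 25, and the "firstSEmax" selection rule are choices made here, not settings stated in the article, and maxSE() offers other rules that may suggest a different k.

```r
library(cluster)  # clusGap() and maxSE()

# Rebuild the prepared data so this snippet is self-contained
my_data <- scale(na.omit(USArrests))

# The bootstrap reference data sets are random, so fix the seed (assumed value)
set.seed(123)
gap <- clusGap(my_data, FUNcluster = kmeans, nstart = 25,
               K.max = 10, B = 100)

# One row per k, with the gap value and its simulation standard error
gap$Tab

# Pick k with the "firstSEmax" rule (an assumption; other rules exist)
k_hat <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"],
               method = "firstSEmax")
k_hat
```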

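Compute and visualize k-means clustering:

# km.res is not defined in the original; assumed here: k-means with
# 4 clusters (the same k as the PAM example below)
set.seed(123)
km.res <- kmeans(my_data, centers = 4, nstart = 25)
fviz_cluster(km.res, data = my_data,
             ellipse.type = "convex",
             palette = "jco",
             repel = TRUE,
             ggtheme = theme_minimal())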
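Besides the plot, the object returned by kmeans() can be inspected directly. The sketch below is self-contained and assumes k = 4 and a fixed seed, neither of which is given in the article. It prints the cluster sizes, the centers on the scaled data, and the per-cluster means of the original variables.

```r
# Rebuild the prepared data so this snippet is self-contained
my_data <- scale(na.omit(USArrests))

# Assumed settings: 4 clusters, 25 random starts, fixed seed
set.seed(123)
km.res <- kmeans(my_data, centers = 4, nstart = 25)

# Number of states in each cluster
km.res$size

# Cluster centers in standardized units
km.res$centers

# Cluster means in the original units, easier to interpret
cluster_means <- aggregate(USArrests,
                           by = list(cluster = km.res$cluster),
                           FUN = mean)
cluster_means

# Share of total variance explained by the partition
km.res$betweenss / km.res$totss
```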

Similarly, PAM clustering can be computed and visualized:

library(cluster)  # provides pam()
pam.res <- pam(my_data, 4)
# Visualize
fviz_cluster(pam.res)
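CLARA is listed above but not demonstrated, so here is a minimal sketch of how it might be run next to PAM using clara() from the cluster package. The values samples = 50 and the seed are assumptions chosen for illustration. With only 50 observations CLARA has little advantage over PAM, so the example only shows the calls and how to compare the two partitions.

```r
library(cluster)  # pam() and clara()

# Rebuild the prepared data so this snippet is self-contained
my_data <- scale(na.omit(USArrests))

# PAM: medoids are actual observations, so they can be named
pam.res <- pam(my_data, 4)
pam_medoid_states <- rownames(my_data)[pam.res$id.med]
pam_medoid_states

# CLARA: runs PAM on random subsamples and keeps the best set of medoids
set.seed(123)  # the subsamples are random (assumed seed)
clara.res <- clara(my_data, 4, samples = 50, pamLike = TRUE)
clara.res$medoids

# Cross-tabulate the two partitions to see how closely they agree
agreement <- table(pam = pam.res$clustering,
                   clara = clara.res$clustering)
agreement
```

The result of clara() can be passed to fviz_cluster() in the same way as the PAM result.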
