An example of an unsupervised learning algorithm: hierarchical clustering

 

Hierarchical clustering is a clustering algorithm well suited to accurate clustering of small samples. "Accurate" here means two things: on one hand, the clustering procedure is fully transparent at every step of its execution; on the other hand, for real application scenarios with modest data volumes (roughly thousands of rows), hierarchical clustering produces very good clustering results!

 

The algorithm first finds the two nearest data points and merges them into a cluster; the centroid (center of mass) of that cluster then acts as its representative when the nearest pair is selected in the next round. In short, it keeps merging the closest pair of clusters until everything has been merged into a single cluster (a from-scratch sketch follows the pseudocode below).

 

Pseudocode

1: compute the proximity matrix for all pairs of elements

2: repeat

3: merge the two closest clusters

4: update the proximity matrix to reflect the proximity between the new cluster and the original clusters or elements

5: until only one cluster remains
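
Before the scikit-learn version below, here is a minimal from-scratch sketch of this pseudocode, assuming the centroid-based proximity described above (sklearn's AgglomerativeClustering defaults to Ward linkage, so its merge order can differ); the function name agglomerate is made up for illustration:

import numpy as np

def agglomerate(X, n_clusters=1):
    # start with every point in its own cluster
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        # each cluster is represented by its centroid (center of mass)
        centroids = np.array([X[c].mean(axis=0) for c in clusters])
        # proximity matrix: pairwise Euclidean distances between centroids
        d = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)  # a cluster never merges with itself
        # merge the two closest clusters, then repeat
        i, j = np.unravel_index(np.argmin(d), d.shape)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

Recomputing the whole proximity matrix every round is the naive reading of step 4; it is fine for the small-sample setting this article targets.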

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
 
In [4]:
from sklearn.datasets import load_iris
iris = load_iris()
In [6]:
iris.data
In [7]:
iris.target

Out[7]:

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
In [8]:
from sklearn.cluster import AgglomerativeClustering
agClustering = AgglomerativeClustering(n_clusters=3)
In [9]:
agClustering.fit(iris.data)
Out[9]:
AgglomerativeClustering(affinity='euclidean', compute_full_tree='auto',
            connectivity=None, linkage='ward', memory=None, n_clusters=3,
            pooling_func=<function mean at 0x0000027004F287B8>)
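The repr above shows the defaults of this scikit-learn version: Euclidean distances (affinity='euclidean') and Ward linkage, which at each step merges the pair of clusters whose union least increases the total within-cluster variance. Other linkage criteria can be requested explicitly; a small sketch (the comparison loop is illustrative, not part of the original notebook):

for linkage in ('ward', 'complete', 'average'):
    # fit_predict builds the full merge hierarchy, then cuts it at 3 clusters
    labels = AgglomerativeClustering(n_clusters=3, linkage=linkage).fit_predict(iris.data)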
In [12]:
agClustering.labels_
Out[12]:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2,
       2, 2, 2, 0, 0, 2, 2, 2, 2, 0, 2, 0, 2, 0, 2, 2, 0, 0, 2, 2, 2, 2,
       2, 0, 0, 2, 2, 2, 0, 2, 2, 2, 0, 2, 2, 2, 0, 2, 2, 0], dtype=int64)
In [15]:
from sklearn.metrics import accuracy_score  

In [16]:

accuracy_score(iris.target,agClustering.labels_)

Out[16]:

0.23333333333333334
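A score of about 0.23 looks terrible, but it mostly reflects that cluster IDs are arbitrary: comparing labels_ with the target above, the clusterer calls the first species 1 where iris.target calls it 0, and accuracy_score counts every such relabeling as an error. A permutation-invariant metric is the fairer check; a sketch using scikit-learn's adjusted Rand index:

from sklearn.metrics import adjusted_rand_score
# invariant to relabeling, so label 1 vs 0 for the same cluster does not matter
adjusted_rand_score(iris.target, agClustering.labels_)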
 
