How to use UMAP

The theoretical background of UMAP is covered in the umap-learn documentation; this post only shows how to use it. [1]


Purpose

UMAP (Uniform Manifold Approximation and Projection) is a manifold learning and dimension reduction algorithm.

Python tools needed

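numpy, scikit-learn (sklearn), matplotlib, seaborn, pandas, and umap-learn (imported as umap)
matplotlib and seaborn are used for plotting, pandas for loading and handling the data, and umap-learn provides the UMAP implementation itself.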
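Because the pip package names differ from some import names (most notably umap-learn is imported as umap), a quick environment check can save a confusing ImportError later. This is a minimal sketch using only the standard library; the REQUIRED mapping and the missing_packages helper are hypothetical names introduced here, not part of any library.

```python
import importlib.util

# Import name -> pip package name for everything this walkthrough uses.
# Note: the UMAP library is installed as "umap-learn" but imported as "umap".
REQUIRED = {
    "numpy": "numpy",
    "sklearn": "scikit-learn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "pandas": "pandas",
    "umap": "umap-learn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]
```

Calling `print(missing_packages())` lists whatever still needs a `pip install`; an empty list means everything is importable.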

Data in use

Penguin data, https://raw.githubusercontent.com/allisonhorst/penguins/master/data/penguins_size.csv
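The code below uses a pandas DataFrame named penguins without showing how it is loaded. A minimal sketch, assuming pandas can read the CSV directly from the URL above (network access required) and that the file contains some missing values worth inspecting before running UMAP, whose input validation, as far as I know, rejects NaN. load_penguins, missing_report and PENGUINS_URL are hypothetical helper names.

```python
import pandas as pd

# URL taken from the post; assumed to still serve the same CSV.
PENGUINS_URL = ("https://raw.githubusercontent.com/allisonhorst/penguins/"
                "master/data/penguins_size.csv")


def load_penguins(source=PENGUINS_URL):
    """Read the penguin table from a URL, a local path, or a file-like object."""
    return pd.read_csv(source)


def missing_report(df):
    """Return {column: number of missing values} for columns that have any."""
    counts = df.isna().sum()
    return {col: int(n) for col, n in counts.items() if n > 0}
```

Typical use would be `penguins = load_penguins()` followed by `print(penguins.columns.tolist())` and `print(missing_report(penguins))` to confirm the column names and see which columns contain gaps.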

Visualize the data

import seaborn as sns
sns.pairplot(penguins, hue="species_short")
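The pairplot shows visually how the species separate along each pair of features. A numeric companion is a per-species mean table; this is a small sketch with pandas groupby, where species_summary and FEATURES are hypothetical names and the column names are assumed to match those in penguins_size.csv.

```python
import pandas as pd

# Numeric columns assumed to exist in penguins_size.csv
# (the same four features that are fed to UMAP later).
FEATURES = ["culmen_length_mm", "culmen_depth_mm", "flipper_length_mm", "body_mass_g"]


def species_summary(df, features=FEATURES, label="species_short"):
    """Per-species mean of each feature; missing values are skipped by mean()."""
    return df.groupby(label)[features].mean()
```

For example, `print(species_summary(penguins))` prints one row per species, which makes it easy to see which features differ most between species.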

Construct a UMAP object

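import umap  # provided by the umap-learn package
reducer = umap.UMAP()  # defaults include n_neighbors=15, min_dist=0.1, n_components=2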

Standardize the penguin dataset

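from sklearn.preprocessing import StandardScaler

# UMAP does not accept missing values, so drop incomplete rows first;
# this also keeps penguins aligned with the embedding when plotting later
penguins = penguins.dropna()

penguin_data = penguins[
    [
        "culmen_length_mm",
        "culmen_depth_mm",
        "flipper_length_mm",
        "body_mass_g",
    ]
].values
# rescale each feature to zero mean and unit variance
scaled_penguin_data = StandardScaler().fit_transform(penguin_data)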
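Standardizing matters here because the features are on very different scales: body mass is in grams and runs into the thousands, while the other features are lengths in millimetres. Without rescaling, body mass would dominate the distances UMAP builds its neighbor graph from. The sketch below spells out the z-score transform, (x - mean) / std per column, using the population standard deviation as StandardScaler does; standardize is a hypothetical helper, not a replacement for StandardScaler in a real pipeline (it has no separate fit/transform step).

```python
import numpy as np


def standardize(X):
    """Column-wise z-scores of a 2-D array: (x - mean) / std.

    Uses the population standard deviation (ddof=0), matching
    sklearn's StandardScaler. Constant columns are mapped to 0
    instead of dividing by zero.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X - mean) / std
```

After the transform every column has mean 0 and standard deviation 1, so each feature contributes on a comparable scale.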

Reduce the standardized data and check the shape of the result

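matrix = reducer.fit_transform(scaled_penguin_data)  # the dimension reduction step
print(matrix.shape)  # matrix is a NumPy array of shape (n_samples, n_components)
Terminal: (344, 2), one row per penguin and two columns, since UMAP reduces the data to n_components=2 dimensions by default. If rows with missing values are dropped first, the row count will be smaller.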

Visualize the result of UMAP

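import matplotlib.pyplot as plt
import seaborn as sns

plt.scatter(
    matrix[:, 0],
    matrix[:, 1],
    # color each point by species, using seaborn's default palette
    c=[sns.color_palette()[x] for x in penguins.species_short.map({"Adelie": 0, "Chinstrap": 1, "Gentoo": 2})],
)
plt.gca().set_aspect("equal", "datalim")
plt.title("UMAP projection of the Penguin dataset", fontsize=24)
plt.show()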
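The plotting code hard-codes a species-to-index dictionary. A sketch of a more general alternative derives the mapping from the labels themselves with pandas.factorize, sorting class names alphabetically so Adelie, Chinstrap and Gentoo get indices 0, 1 and 2 exactly as in the hand-written dictionary. label_colors is a hypothetical helper; it also returns the class names, which is handy for building a legend.

```python
import pandas as pd


def label_colors(labels, palette):
    """Map class labels to palette colors, with classes sorted alphabetically.

    labels:  sequence of class labels, e.g. penguins.species_short
    palette: list of colors, e.g. sns.color_palette()
    Returns (one color per label, list of sorted class names).
    """
    codes, classes = pd.factorize(pd.Series(labels), sort=True)
    if (codes < 0).any():  # factorize marks missing labels with -1
        raise ValueError("labels contain missing values")
    if len(classes) > len(palette):
        raise ValueError(f"{len(classes)} classes but only {len(palette)} colors")
    return [palette[c] for c in codes], list(classes)
```

With this helper the scatter call becomes `colors, species = label_colors(penguins.species_short, sns.color_palette())` followed by `plt.scatter(matrix[:, 0], matrix[:, 1], c=colors)`, and a new species in the data no longer needs a dictionary edit.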

  1. Written for readers who find the official documentation at umap-learn.readthedocs.io too verbose.


Reposted from www.cnblogs.com/wulilichao/p/13386365.html