A small, simple example of t-SNE dimensionality reduction in R

Some readers previously left messages on the public account asking how to run t-SNE dimensionality reduction in R. Today's post walks through the R code. The content mainly follows https://datavizpyr.com/how-to-make-tsne-plot-in-r/

The full name of t-SNE is t-Distributed Stochastic Neighbor Embedding. I have not studied how it is calculated in detail. For the purposes of this post, it plays a role similar to PCA: it converts high-dimensional data into low-dimensional data, although unlike PCA it is a nonlinear method.
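To make the PCA comparison concrete, here is a minimal sketch of the PCA counterpart in base R. It uses the built-in iris data set as a stand-in, an assumption made only so the snippet runs without extra packages (the rest of this post uses penguins). The variable names are illustrative.

```r
# PCA on the four numeric columns of the built-in iris data (stand-in data set)
num_cols <- iris[, 1:4]

# center and scale each variable, then compute the principal components
pca_fit <- prcomp(num_cols, center = TRUE, scale. = TRUE)

# keep the first two components, analogous to the two t-SNE dimensions
pca_scores <- as.data.frame(pca_fit$x[, 1:2])
pca_scores$Species <- iris$Species

head(pca_scores)
```

The resulting data frame has the same shape as the t-SNE result used for plotting later in this post: two coordinate columns plus a grouping column.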

The example data set is the penguin data set, named penguins. It comes from the R package palmerpenguins, so that package must be installed to use the data. t-SNE itself is implemented by the R package Rtsne, and the R package tidyverse is used for data wrangling, so load these three R packages first. If you are using them for the first time, you need to install them first. The installation commands are

install.packages("tidyverse")
install.packages("palmerpenguins")
install.packages("Rtsne")
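If some of these packages may already be installed, a sketch like the following reports only the missing ones before installing. The helper name missing_packages is hypothetical, introduced just for this illustration, and the install call is left commented out so the snippet has no side effects.

```r
# hypothetical helper: return the packages from `pkgs` that are not yet installed
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

needed <- c("tidyverse", "palmerpenguins", "Rtsne")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # uncomment the next line to actually install them
  # install.packages(to_install)
}
```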

Load the required R packages

library(tidyverse)
library(palmerpenguins)
library(Rtsne)

Select numeric variables in the dataset for subsequent analysis

penguins %>% 
  select(where(is.numeric),-year,species) %>%
  mutate(ID=row_number()) %>% 
  column_to_rownames("ID") %>% 
  na.omit()-> df

Functions newly learned here

  • Select the numeric variables in a data frame: select(where(is.numeric))
  • Add a column holding each row's number: mutate(ID=row_number())
  • Use a column of the data set as the row names (provided it contains no duplicates): column_to_rownames("ID")
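The three functions above can be tried on a tiny made-up data frame. The data and the names toy and toy_clean are assumptions for illustration only. The sketch loads dplyr and tibble directly, which are both part of tidyverse.

```r
library(dplyr)
library(tibble)

# made-up data frame, for illustration only
toy <- tibble(
  name   = c("a", "b", "c"),
  length = c(1.5, NA, 3.2),
  year   = c(2007L, 2008L, 2009L)
)

toy_clean <- toy %>%
  select(where(is.numeric), -year, name) %>%  # numeric columns, minus year, plus name
  mutate(ID = row_number()) %>%               # row numbers 1, 2, 3
  column_to_rownames("ID") %>%                # ID becomes the row names
  na.omit()                                   # drops the row with a missing length

toy_clean
```

Because the ID column is added before na.omit(), the surviving rows keep their original row numbers as row names, which makes it easy to trace a point back to the source data.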

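t-SNE dimensionality reduction

# t-SNE is stochastic, so fix a seed (the value is arbitrary) to make the result repeatable
set.seed(123)
tSNE_fit<-df %>% 
  select(-species) %>% 
  scale() %>% 
  Rtsne()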
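The call above relies on the defaults of Rtsne(). As a hedged sketch, the same call can be written with a few of the main arguments spelled out. The data here is a made-up random matrix, and the seed and perplexity value are arbitrary choices for illustration, not recommendations.

```r
library(Rtsne)

set.seed(42)  # arbitrary seed so the random initialisation is repeatable

# made-up data: 100 observations, 4 variables
X <- matrix(rnorm(100 * 4), ncol = 4)

fit <- Rtsne(
  scale(X),
  dims = 2,                 # number of output dimensions (the default)
  perplexity = 10,          # documented constraint: 3 * perplexity < nrow(X) - 1
  check_duplicates = FALSE  # skip the duplicate-row check; this random data has none
)

dim(fit$Y)  # one row per observation, one column per t-SNE dimension
```

The perplexity constraint matters for small data sets: with the default of 30, Rtsne() stops with an error when the input has too few rows. By default Rtsne() also refuses input that contains duplicate rows, so real data with duplicates needs either deduplication or check_duplicates = FALSE.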

Extract dimensionality reduction results

tSNE_fit$Y %>% 
  as.data.frame() %>% 
  rename(tSNE1="V1",
         tSNE2="V2") %>% 
  mutate(Species=df$species) -> tSNE.plot

Scatter plot showing results

library(ggplot2)
ggplot()+
  geom_point(data=tSNE.plot,
             aes(x=tSNE1,y=tSNE2,color=Species))+
  stat_ellipse(data=tSNE.plot,
               geom="polygon",
               aes(x=tSNE1,y=tSNE2,
                   group=Species,
                   fill=Species),
               alpha=0.5,
               lty="dashed",
               color="black",
               key_glyph="blank")+
  theme_bw()

For how to modify the legend elegantly, see

https://mp.weixin.qq.com/s/I3YnxqulQRu-9i-gZIh7fA


You are welcome to follow my public account

Xiao Ming's data analysis notebook

Leave a message to discuss related content

The sample code for today's post can be obtained by leaving the message 20210827 in the background of the official account

The Xiao Ming's data analysis notebook public account mainly shares: 1. simple examples of data analysis and data visualization in R and Python; 2. reading notes on transcriptomics, genomics, and population genetics literature related to horticultural plants; 3. introductory bioinformatics study materials and my own study notes!


This article is shared from the WeChat public account 小明的数据分析笔记本 (Xiao Ming's data analysis notebook) (gh_0c8895f349d3).


Origin my.oschina.net/u/4579431/blog/5211974