Some readers have left messages on the official account asking how to do t-SNE dimensionality reduction in R. Today's post walks through the R code. The content mainly follows https://datavizpyr.com/how-to-make-tsne-plot-in-r/
The full name of t-SNE is t-Distributed Stochastic Neighbor Embedding. I am not familiar with the details of how it is computed; in short, like PCA, it maps high-dimensional data to a low-dimensional space.
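To make "high-dimensional to low-dimensional" concrete, here is a minimal PCA sketch in base R. It uses the built-in iris data purely as a stand-in (an assumption for illustration, not the data set from this post): four numeric columns go in, two coordinates per row come out.

```r
# Stand-in data: the four numeric columns of the built-in iris data set
X <- scale(iris[, 1:4])

# PCA with base R; pca_fit$x holds the coordinates of every row
# on the principal components
pca_fit <- prcomp(X)

# Keep only the first two components: 4 columns reduced to 2
low_dim <- pca_fit$x[, 1:2]

dim(X)        # rows x 4
dim(low_dim)  # rows x 2
```

t-SNE plays the same role of producing a few coordinates per row, but it does so with a nonlinear, randomized procedure rather than a linear projection.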
The example uses the penguin data set, penguins, which comes from the R package palmerpenguins, so that package must be installed to use the data. t-SNE itself is done with the R package Rtsne, and the R package tidyverse is used for data wrangling, so load these three packages first. If you are using them for the first time, install them first. The installation commands are
install.packages("tidyverse")
install.packages("palmerpenguins")
install.packages("Rtsne")
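If you rerun the script often, you may prefer to install only what is missing. The following is a small sketch of that idea; the helper name missing_packages is hypothetical and not part of any package.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

pkgs <- c("tidyverse", "palmerpenguins", "Rtsne")
to_install <- missing_packages(pkgs)

# Only report here; see the note below for the actual installation
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
}
```

To actually install, replace the message() call with install.packages(to_install).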
Load the required R packages
library(tidyverse)
library(palmerpenguins)
library(Rtsne)
Select numeric variables in the dataset for subsequent analysis
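df <- penguins %>%
  # keep the numeric columns except year, plus the species label
  select(where(is.numeric), -year, species) %>%
  # use the row number as an ID and turn it into the row names
  mutate(ID = row_number()) %>%
  column_to_rownames("ID") %>%
  # drop rows with missing values
  na.omit()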
Functions I learned here
- select(where(is.numeric)) selects the numeric variables in a data frame
- mutate(ID=row_number()) adds a column holding the row number of each row
- column_to_rownames("ID") uses a column of the data set as the row names (provided that it has no duplicates)
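The three functions can be tried on a tiny made-up table before running them on the real data. The tibble toy and its columns are invented for this sketch.

```r
library(tidyverse)

# Made-up data: one character column, one numeric column, one year column
toy <- tibble(
  name   = c("a", "b", "c"),
  length = c(1.5, 2.0, 3.2),
  year   = c(2007L, 2008L, 2009L)
)

toy_numeric <- toy %>%
  # numeric columns, minus year, plus the label column added back
  select(where(is.numeric), -year, name) %>%
  # row number as an ID column
  mutate(ID = row_number()) %>%
  # ID becomes the row names and is removed from the columns
  column_to_rownames("ID")

colnames(toy_numeric)
rownames(toy_numeric)
```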
t-SNE dimensionality reduction
tSNE_fit <- df %>%
  # t-SNE takes numeric input only, so drop the label column
  select(-species) %>%
  # standardize every variable
  scale() %>%
  Rtsne()
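t-SNE is a randomized method, so the layout changes from run to run unless the random seed is fixed. The sketch below repeats the step with a seed and with the main arguments of Rtsne() written out. The seed 2021 is arbitrary, and dims = 2, perplexity = 30 and max_iter = 1000 are simply the package defaults stated explicitly, not values tuned for this data.

```r
library(tidyverse)
library(palmerpenguins)
library(Rtsne)

# Same preparation as in the post: numeric columns only, no missing rows
X <- penguins %>%
  select(where(is.numeric), -year) %>%
  na.omit() %>%
  scale()

set.seed(2021)  # arbitrary seed, chosen only for this sketch
tSNE_seeded <- Rtsne(X, dims = 2, perplexity = 30, max_iter = 1000)

# One row per penguin, one column per t-SNE dimension
dim(tSNE_seeded$Y)
```

Perplexity is the argument most worth experimenting with; the plot can look quite different across values, so it may help to try a few before interpreting the clusters.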
Extract dimensionality reduction results
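tSNE.plot <- tSNE_fit$Y %>%
  # the coordinate matrix becomes a data frame with columns V1 and V2
  as.data.frame() %>%
  rename(tSNE1 = "V1",
         tSNE2 = "V2") %>%
  # add the species labels back for plotting
  mutate(Species = df$species)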
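The rename() step relies on how as.data.frame() names the columns of a matrix that has no column names: they come out as V1, V2, and so on. A small sketch with a made-up matrix Y standing in for tSNE_fit$Y shows this.

```r
library(dplyr)

# Made-up 3 x 2 matrix without column names, standing in for tSNE_fit$Y
Y <- matrix(c(0.1, 0.2, 0.3, 1.1, 1.2, 1.3), ncol = 2)

# as.data.frame() assigns the default names V1 and V2
colnames(as.data.frame(Y))

coords <- Y %>%
  as.data.frame() %>%
  rename(tSNE1 = "V1", tSNE2 = "V2")

colnames(coords)
```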
Scatter plot showing results
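library(ggplot2)
ggplot() +
  # one point per penguin, colored by species
  geom_point(data = tSNE.plot,
             aes(x = tSNE1, y = tSNE2, color = Species)) +
  # filled ellipse with a dashed outline around each species
  stat_ellipse(data = tSNE.plot,
               geom = "polygon",
               aes(x = tSNE1, y = tSNE2,
                   group = Species,
                   fill = Species),
               alpha = 0.5,
               lty = "dashed",
               color = "black",
               key_glyph = "blank") +
  theme_bw()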
Elegantly modify the legend
https://mp.weixin.qq.com/s/I3YnxqulQRu-9i-gZIh7fA
You are welcome to follow my official account, Xiao Ming's data analysis notebook, and to leave a message to discuss related content.
The sample code for today's post can be obtained by leaving the message 20210827 in the official account's backend.
The official account Xiao Ming's data analysis notebook mainly shares: 1. simple examples of data analysis and data visualization in R and Python; 2. reading notes on literature about transcriptomics, genomics, and population genetics of horticultural plants; 3. introductory bioinformatics study materials and my own study notes!
This article is shared from the WeChat official account 小明的数据分析笔记本 (gh_0c8895f349d3).