(Complete code) Screening key factors with the SVM-RFE machine learning algorithm in R

Foreword

It took me more than a month to write this code. It is the first time since I started learning R that I have written more than 600 lines of code. My article has been published; if this post is helpful to your research, I hope you will cite it (the article is linked from the original post).

SVM-RFE

This mainly uses the e1071 package to run mSVM-RFE for identifying and screening key genes. If you have not installed the package yet, install it first.

install.packages("e1071")

The mSVM-RFE functions were written by Professor John Colby and are available on GitHub. If you can't access GitHub, I have also uploaded them to my Gitee repository.
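
For orientation, here is a minimal sketch of the idea behind SVM-RFE, not the msvmRFE.R implementation itself: fit a linear SVM, compute the weight of each feature, drop the feature with the smallest squared weight, and repeat. The toy data, the names `x`, `group`, and `ranking`, and the choice of `cost = 10` are all assumptions made for illustration.

```r
library(e1071)

set.seed(1)
# Toy data (assumed): 40 samples, 6 features; only "g1" differs between groups
n <- 40
group <- factor(rep(c("control", "case"), each = n / 2))
x <- matrix(rnorm(n * 6), nrow = n,
            dimnames = list(NULL, paste0("g", 1:6)))
x[group == "case", "g1"] <- x[group == "case", "g1"] + 6

remaining <- colnames(x)    # features still in the model
eliminated <- character(0)  # features removed so far, most recent first

while (length(remaining) > 1) {
  fit <- svm(x[, remaining, drop = FALSE], group,
             type = "C-classification", kernel = "linear",
             cost = 10, scale = FALSE)
  # Weight vector of the linear SVM: one weight per remaining feature
  w <- t(fit$coefs) %*% fit$SV
  # Remove the feature with the smallest squared weight
  worst <- remaining[which.min(w^2)]
  eliminated <- c(worst, eliminated)
  remaining <- setdiff(remaining, worst)
}

# Final ranking: the last surviving feature first, the first removed last
ranking <- c(remaining, eliminated)
print(ranking)
```

In this toy setup the informative feature `g1` would be expected to end up at or near the top of `ranking`. The mSVM-RFE functions used below additionally repeat the ranking over resampled subsets and cross-validation folds.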

The input file is arranged so that rows are samples and columns are genes, with the grouping information in the first column (I only compared two groups; comparisons among more than two groups need further study).
(Figure: input format)
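
As a rough illustration of that layout, the following builds a small made-up table in the same shape. The sample names, gene names, group labels, and the name `input.demo` are hypothetical, and storing the group as a factor is an assumption.

```r
set.seed(2023)

n.samples <- 6
# Expression values (random numbers here): one row per sample, one column per gene
genes <- matrix(rnorm(n.samples * 4), nrow = n.samples,
                dimnames = list(paste0("sample", 1:n.samples),
                                paste0("gene", 1:4)))

# First column holds the group label; the remaining columns are the genes
input.demo <- data.frame(
  group = factor(rep(c("control", "disease"), each = n.samples / 2)),
  genes
)

str(input.demo)
```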
The functions are already written, so we can simply source the file and call them.

set.seed(2023)
library(e1071)

# Fill in the path where you saved msvmRFE.R
source("D:\\ProgramFiles\\R\\Work\\msvmRFE.R")

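nfold = 10 # 10-fold cross-validation
nrows = nrow(input)
# Randomly assign every row (sample) to one of the nfold folds
folds = rep(1:nfold, len=nrows)[sample(nrows)]
folds = lapply(1:nfold, function(x) which(folds == x))

# Run the feature ranking on the training portion of each fold
results = lapply(folds, svmRFE.wrap, input, k=10, halve.above=100)
# Combine the per-fold rankings into one ordered feature table
top.features = WriteFeatures(results, input, save=FALSE)
# Estimate the error when using the top 1 to 5 features
featsweep = lapply(1:5, FeatSweep.wrap, results, input)

# Baseline: proportion of the smaller group in the first column
no.info = min(prop.table(table(input[,1])))
errors = sapply(featsweep, function(x) ifelse(is.null(x), NA, x$error))

pdf("svm_rfe.pdf", height = 8, width = 10)
PlotErrors(errors, no.info=no.info)
dev.off()
plot(top.features) # this plot can also be saved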
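
Once the error curve is available, one way to turn it into a gene list is to take the number of features with the lowest error and keep that many rows from the top of the ranking. The sketch below uses made-up stand-ins (`errors.demo`, `top.demo`) rather than real results, and it assumes the ranked table has a `FeatureName` column; check the column names of your own `top.features` before reusing it.

```r
# Stand-in for `errors`: CV error when using the top 1, 2, ..., 5 features
errors.demo <- c(0.30, 0.22, 0.15, 0.18, 0.21)

# Stand-in for `top.features`: features ordered from best to worst average rank
top.demo <- data.frame(
  FeatureName = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  AvgRank = c(1.2, 2.5, 3.1, 4.0, 4.9),
  stringsAsFactors = FALSE
)

# Number of features with the lowest error (which.min ignores NA entries)
best.n <- which.min(errors.demo)

# Keep that many features from the top of the ranking
selected <- top.demo$FeatureName[seq_len(best.n)]
print(selected)
```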

In addition, I referred to Professor Maryam's parallel code (linked from the original post) to speed up the computation with parallel computing. The prerequisite is that Rmpi is installed on Windows 10. As I recall, it took a lot of fiddling to get it installed. If the installation does not succeed, don't bother; the code above runs more slowly but also produces results.

set.seed(2023)

library(e1071)
library(Rmpi)
library(snow)
library(parallel)

# Fill in the path where you saved msvmRFE.R
source("D:\\ProgramFiles\\R\\Work\\msvmRFE.R")

nfold = 10 # 10-fold cross-validation
nrows = nrow(input)
folds = rep(1:nfold, len=nrows)[sample(nrows)]
folds = lapply(1:nfold, function(x) which(folds == x))

# Make a cluster sized to what MPI reports as available
cl <- makeMPIcluster(mpi.universe.size())

# Export the data and functions the workers need for the feature ranking
clusterExport(cl, list("input","svmRFE","getWeights","svm"))
results <- parLapply(cl, folds, svmRFE.wrap, input, k=10, halve.above=100)
top.features = WriteFeatures(results, input, save=FALSE)

# Export the objects and functions the workers need for the feature sweep
clusterExport(cl, list("top.features","results", "tune","tune.control"))
# Estimate the error when using the top 1 to 100 features
featsweep = parLapply(cl, 1:100, FeatSweep.wrap, results, input)
stopCluster(cl)

no.info = min(prop.table(table(input[,1])))
errors = sapply(featsweep, function(x) ifelse(is.null(x), NA, x$error))

pdf("svm_rfe.pdf", height = 8, width = 10)
PlotErrors(errors, no.info=no.info)
dev.off()
plot(top.features)
mpi.exit()
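
If Rmpi will not install, a local socket cluster from the built-in parallel package may be worth trying instead. This is a swapped-in technique, not what the code above does, and I have not tested it with msvmRFE.R. The sketch below only shows the pattern on toy data: `fold.task`, `leave.out.means`, `demo.data`, and `demo.folds` are hypothetical stand-ins for the per-fold work, its helpers, and the real inputs.

```r
library(parallel)

# Hypothetical helper, standing in for functions such as svmRFE or getWeights
leave.out.means <- function(data, rows) {
  colMeans(data[-rows, , drop = FALSE])
}

# Hypothetical per-fold task, standing in for svmRFE.wrap
fold.task <- function(fold.rows, data) {
  leave.out.means(data, fold.rows)
}

demo.data <- matrix(1:40, nrow = 10)                   # 10 rows, 4 columns
demo.folds <- split(1:10, rep(1:5, length.out = 10))   # 5 folds of 2 rows each

cl <- makeCluster(2)  # two local worker processes, no MPI required
# The function passed to parLapply is shipped to the workers automatically,
# but helpers it calls must be exported by name, as in the code above
clusterExport(cl, "leave.out.means")
demo.results <- parLapply(cl, demo.folds, fold.task, demo.data)
stopCluster(cl)

length(demo.results)
```

With the real functions, the names passed to `clusterExport` would be the same ones listed in the MPI version above.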

Other content

The other code used in my article covers downloading data, differential analysis, lasso regression, random forest, and so on. There are already many tutorials on the Internet, so I won't repeat them here; I am just posting what I actually used, for your reference (the code is linked from the original post).

  • The GEOquery package downloads data from GEO. (If the download fails, you may still need to download and read the files manually.)
  • The limma package performs differential analysis on microarray data.
  • The DESeq2 package performs differential analysis on sequencing data.
  • The MEGENA package constructs co-expression networks.
  • The glmnet package implements lasso regression screening.
  • The randomForest package implements random forest screening.
  • The venn package draws Venn diagrams.
  • The pROC package evaluates predictive performance.
  • CIBERSORT performs immune infiltration analysis (the function code is linked from the original post).
  • Some visualizations were done with ggplot2, Guangchuang Yu's aplot, and other packages.
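
When several screening methods are used side by side like this, one common way to combine them is to keep the genes that every method selects, which is what a Venn diagram visualizes. This is an assumption about the workflow, not something spelled out above, and the gene names below are made up.

```r
# Hypothetical gene lists from three screening methods
lasso.genes  <- c("geneA", "geneB", "geneC", "geneF")
rf.genes     <- c("geneB", "geneC", "geneD")
svmrfe.genes <- c("geneA", "geneB", "geneC", "geneE")

# Genes selected by all three methods
shared <- Reduce(intersect, list(lasso.genes, rf.genes, svmrfe.genes))
print(shared)
```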

The code is a bit messy; please bear with it for now, and I will tidy it up when I have time.
If anything is unclear, leave a comment below or send me a private message.

Finally, I want to say

In fact, this code was written by others, and I just carried it over. It is as if I bought fish that other people spent a lot of time raising and then sold it on to everyone. I am just a fish seller.

I don't produce code; I am just a porter of code.


Origin blog.csdn.net/weixin_55842556/article/details/128828895