R Basics for Transcriptome Analysis

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/sunchengquan/article/details/84924037

Data

wget "https://ndownloader.figshare.com/articles/3219685?private_link=1d788fd384d33e913a2a" -O 3219685.zip
unzip 3219685.zip -d 3219685

!ls -l 3219685/

total 2588
-rw-r--r-- 1 root root 1340161 Dec  9 07:43 GSE60450_Lactation-GenewiseCounts.txt
-rw-r--r-- 1 root root 1253364 Dec  9 07:43 mouse_c2_v5.rdata
-rw-r--r-- 1 root root   22483 Dec  9 07:43 mouse_H_v5.rdata
-rw-r--r-- 1 root root    4362 Dec  9 07:43 ResultsTable_small.txt
-rw-r--r-- 1 root root     733 Dec  9 07:43 SampleInfo_Corrected.txt
-rw-r--r-- 1 root root     733 Dec  9 07:43 SampleInfo.txt
-rw-r--r-- 1 root root     278 Dec  9 07:43 small_counts.txt

Data source:
EGF-mediated induction of Mcl-1 at the switch to lactation is essential for alveolar cell survival (Fu et al. 2015)

Raw sequencing data:
Gene Expression Omnibus database (GEO) under accession number GSE60450

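R basics

We use RStudio as the integrated development environment. It comes in a desktop edition and a server edition; download and install whichever suits your needs.

Installing and configuring RStudio Server

Download: wget https://download2.rstudio.org/rstudio-server-rhel-1.1.456-x86_64.rpm

Install: yum install rstudio-server-rhel-1.1.456-x86_64.rpm

Edit the configuration file:
vi /etc/rstudio/rserver.conf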

# Server Configuration File
rsession-which-r=/home/sunchengquan/R-3.5.1/bin/R  
www-port=8787 

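If R was installed with Anaconda in your home directory, you may get errors. Do not use root for the installation.

Start: rstudio-server start

You can then log in and use RStudio from a web browser, on the port set by www-port (8787 here).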

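R packages

Install from Bioconductor

# biocLite() has been retired; current Bioconductor releases install through BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("limma")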

Reading in the data

# Read the data into R
small_counts <- read.table("3219685/small_counts.txt", header = TRUE)
print(small_counts)
        Sample_1 Sample_2 Sample_3 Sample_4
Xkr4         438      300       65      237
Sox17        106      182       82      105
Mrpl15       309      234      337      300
Lypla1       652      515      948      935
Tcea1       1604     1495     1721     1317
Rgs20          4        2       14        4
Atp6v1h      769      752     1062      987
Rb1cc1      1494     1412     1157      967
Pcmtd1      1344     1242     1374     1593
Rrs1        1691     1808     2127     1653
dim(small_counts)
  1. 10
  2. 4
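
Besides dim(), a few other base R functions are handy for a first look at a count table. The sketch below rebuilds small_counts from the values printed above so that it runs without the downloaded file; the variable names n_genes and n_samples are only illustrative.

```r
# Rebuild the small_counts table shown above so the example is self-contained
small_counts <- data.frame(
  Sample_1 = c(438, 106, 309, 652, 1604, 4, 769, 1494, 1344, 1691),
  Sample_2 = c(300, 182, 234, 515, 1495, 2, 752, 1412, 1242, 1808),
  Sample_3 = c(65, 82, 337, 948, 1721, 14, 1062, 1157, 1374, 2127),
  Sample_4 = c(237, 105, 300, 935, 1317, 4, 987, 967, 1593, 1653),
  row.names = c("Xkr4", "Sox17", "Mrpl15", "Lypla1", "Tcea1",
                "Rgs20", "Atp6v1h", "Rb1cc1", "Pcmtd1", "Rrs1")
)

n_genes   <- nrow(small_counts)   # number of rows (genes)
n_samples <- ncol(small_counts)   # number of columns (samples)

print(n_genes)
print(n_samples)
print(colnames(small_counts))          # sample names
print(head(rownames(small_counts), 3)) # first three gene names
print(summary(small_counts$Sample_1))  # quick numeric summary of one sample
```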

Working with data frames

Subsetting

$ notation with the column name

# Get the data for Sample_1
small_counts$Sample_1
  1. 438
  2. 106
  3. 309
  4. 652
  5. 1604
  6. 4
  7. 769
  8. 1494
  9. 1344
  10. 1691

[row, column] notation with numeric indices

small_counts[, 1]
  1. 438
  2. 106
  3. 309
  4. 652
  5. 1604
  6. 4
  7. 769
  8. 1494
  9. 1344
  10. 1691

[row, column] notation using the column name (in a vector)

small_counts[, c("Sample_1")]
  1. 438
  2. 106
  3. 309
  4. 652
  5. 1604
  6. 4
  7. 769
  8. 1494
  9. 1344
  10. 1691
small_counts[1:3, c("Sample_1", "Sample_3")]
       Sample_1 Sample_3
Xkr4        438       65
Sox17       106       82
Mrpl15      309      337

All samples except the first

small_counts[1:3, -1]
       Sample_2 Sample_3 Sample_4
Xkr4        300       65      237
Sox17       182       82      105
Mrpl15      234      337      300
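
A few related subsetting idioms are worth knowing. The sketch below uses only the first three genes of small_counts (copied from the output above) and shows selection by row name, the drop = FALSE argument, and dropping several columns at once; the variable names are illustrative.

```r
# First three rows of small_counts, copied from the printed table
small_counts <- data.frame(
  Sample_1 = c(438, 106, 309),
  Sample_2 = c(300, 182, 234),
  Sample_3 = c(65, 82, 337),
  Sample_4 = c(237, 105, 300),
  row.names = c("Xkr4", "Sox17", "Mrpl15")
)

# Select rows by gene name instead of by position
by_name <- small_counts[c("Sox17", "Xkr4"), ]

# Selecting a single column returns a plain vector by default ...
one_col_vector <- small_counts[, "Sample_1"]
# ... while drop = FALSE keeps the result as a one-column data frame
one_col_df <- small_counts[, "Sample_1", drop = FALSE]

# A negative index vector drops several columns at once
without_first_two <- small_counts[, -c(1, 2)]

print(by_name)
print(class(one_col_vector))        # "numeric"
print(class(one_col_df))            # "data.frame"
print(colnames(without_first_two))  # "Sample_3" "Sample_4"
```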

Vectorized operations

small_counts$Sample_1 * 2
  1. 876
  2. 212
  3. 618
  4. 1304
  5. 3208
  6. 8
  7. 1538
  8. 2988
  9. 2688
  10. 3382
log(small_counts[1:3,])
       Sample_1 Sample_2 Sample_3 Sample_4
Xkr4   6.082219 5.703782 4.174387 5.468060
Sox17  4.663439 5.204007 4.406719 4.653960
Mrpl15 5.733341 5.455321 5.820083 5.703782
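
One thing to watch with log() on count data is zero counts. The sketch below is an illustration only: the gene zero_gene is hypothetical (it is not in the data set), and adding a pseudocount of 1 is a common convention assumed here, not something the original analysis prescribes.

```r
# Three real counts from Sample_1 plus one hypothetical zero count
counts <- c(Xkr4 = 438, Sox17 = 106, Rgs20 = 4, zero_gene = 0)

raw_log     <- log2(counts)      # log2(0) is -Inf
shifted_log <- log2(counts + 1)  # the pseudocount keeps zero counts finite

print(raw_log)
print(shifted_log)
```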

Sum the counts for each sample

sum(small_counts$Sample_1)

8411

sum(small_counts$Sample_2)

7942

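With many samples, doing this one column at a time becomes tedious, so use the apply function instead. It is more concise than an explicit loop, though not necessarily faster.

Note that MARGIN = 1 applies the function over rows, while MARGIN = 2 applies it over columns.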

sample_sums = apply(small_counts, MARGIN = 2, sum)
print(sample_sums)
Sample_1 Sample_2 Sample_3 Sample_4 
    8411     7942     8887     8098 

The argument name MARGIN can be omitted and the value passed by position:

sample_sums = apply(small_counts, 2, sum)
print(sample_sums)
Sample_1 Sample_2 Sample_3 Sample_4 
    8411     7942     8887     8098 
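
For comparison, here is a small sketch of apply() over rows (MARGIN = 1) and of the built-in colSums(), which computes the same per-sample totals as apply(x, 2, sum). It uses only the first three genes, so the totals differ from the full table above.

```r
# First three rows of small_counts, copied from the printed table
small_counts <- data.frame(
  Sample_1 = c(438, 106, 309),
  Sample_2 = c(300, 182, 234),
  Sample_3 = c(65, 82, 337),
  Sample_4 = c(237, 105, 300),
  row.names = c("Xkr4", "Sox17", "Mrpl15")
)

# Column totals two ways: apply() and the dedicated colSums()
col_totals_apply <- apply(small_counts, 2, sum)
col_totals_fast  <- colSums(small_counts)

# MARGIN = 1 works row by row: mean count per gene
row_means <- apply(small_counts, 1, mean)

# apply() also accepts an anonymous function
row_range <- apply(small_counts, 1, function(x) max(x) - min(x))

print(col_totals_apply)
print(col_totals_fast)
print(row_means)
print(row_range)
```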

Data types

R has 5 main data types: double, integer, complex, logical and character.

typeof(3.14)

‘double’

typeof(1L)

‘integer’

typeof(1+1i)

‘complex’

typeof(TRUE)

‘logical’

typeof('banana')

‘character’
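
typeof() reports the internal storage type, which is not always what class() prints. The following sketch contrasts the two and shows explicit conversion with the as.*() family; the variable names are illustrative.

```r
x_int <- 5L   # the L suffix makes an integer
x_dbl <- 5    # a bare number is a double

print(typeof(x_int))   # "integer"
print(typeof(x_dbl))   # "double"
print(class(x_dbl))    # "numeric"

# Both integers and doubles count as numeric
print(is.numeric(x_int))
print(is.numeric(x_dbl))

# Explicit conversion between types
from_text_int <- as.integer("42")
from_text_dbl <- as.numeric("3.14")
to_text       <- as.character(42L)

print(from_text_int)
print(from_text_dbl)
print(to_text)
```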

ResultsTable_small <- read.table("3219685/ResultsTable_small.txt", header=TRUE)
head(ResultsTable_small)
ENTREZID        SYMBOL     logFC  AveExpr         t      P.Value   adj.P.Val
   24117          Wif1  1.819943 2.975545  20.10780 1.063770e-10 1.01624e-06
  381290        Atp2b4 -2.143885 3.944066 -19.07495 1.982934e-10 1.01624e-06
   78896 1500015O10Rik  2.807548 3.036519  18.54773 2.758828e-10 1.01624e-06
  226101          Myof -2.329744 6.223525 -18.26861 3.297667e-10 1.01624e-06
   16012        Igfbp6 -2.896115 1.978449 -18.21525 3.413066e-10 1.01624e-06
  231830       Micall2  2.253400 4.760597  18.02627 3.858161e-10 1.01624e-06

Use str to inspect the structure of ResultsTable_small.

str(ResultsTable_small)
'data.frame':	40 obs. of  7 variables:
 $ ENTREZID : int  24117 381290 78896 226101 16012 231830 16669 55987 231991 14620 ...
 $ SYMBOL   : Factor w/ 40 levels "1500015O10Rik",..: 40 3 1 26 20 23 21 8 9 16 ...
 $ logFC    : num  1.82 -2.14 2.81 -2.33 -2.9 ...
 $ AveExpr  : num  2.98 3.94 3.04 6.22 1.98 ...
 $ t        : num  20.1 -19.1 18.5 -18.3 -18.2 ...
 $ P.Value  : num  1.06e-10 1.98e-10 2.76e-10 3.30e-10 3.41e-10 ...
 $ adj.P.Val: num  1.02e-06 1.02e-06 1.02e-06 1.02e-06 1.02e-06 ...

What happens when a single vector contains several data types?

my_vector = c(1, "hello", TRUE)
print(my_vector)
[1] "1"     "hello" "TRUE" 
typeof(my_vector)

‘character’

R automatically coerces all elements to one common type, following this hierarchy:
logical -> integer -> double (numeric) -> complex -> character.

my_vector = c(1,TRUE, TRUE)
print(my_vector)
typeof(my_vector)
[1] 1 1 1

‘double’
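
Each step of the coercion hierarchy can be checked directly. This short sketch also shows two practical consequences: logical values can be summed, and text that cannot be parsed as a number becomes NA (with a warning, suppressed here).

```r
# Each pair is coerced to the "higher" of the two types
t1 <- typeof(c(TRUE, 2L))     # "integer"
t2 <- typeof(c(2L, 3.5))      # "double"
t3 <- typeof(c(3.5, 1+2i))    # "complex"
t4 <- typeof(c(1+2i, "a"))    # "character"
print(c(t1, t2, t3, t4))

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
n_true <- sum(c(TRUE, FALSE, TRUE))
print(n_true)

# Unparseable text becomes NA when converted to numeric
parsed <- suppressWarnings(as.numeric(c("1.5", "abc")))
print(parsed)
```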

Factors

str(ResultsTable_small)
'data.frame':	40 obs. of  7 variables:
 $ ENTREZID : int  24117 381290 78896 226101 16012 231830 16669 55987 231991 14620 ...
 $ SYMBOL   : Factor w/ 40 levels "1500015O10Rik",..: 40 3 1 26 20 23 21 8 9 16 ...
 $ logFC    : num  1.82 -2.14 2.81 -2.33 -2.9 ...
 $ AveExpr  : num  2.98 3.94 3.04 6.22 1.98 ...
 $ t        : num  20.1 -19.1 18.5 -18.3 -18.2 ...
 $ P.Value  : num  1.06e-10 1.98e-10 2.76e-10 3.30e-10 3.41e-10 ...
 $ adj.P.Val: num  1.02e-06 1.02e-06 1.02e-06 1.02e-06 1.02e-06 ...

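A factor looks like character data but carries categorical information: internally it is stored as a vector of integers, each of which is an index into the set of level labels.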

str(ResultsTable_small$SYMBOL)
 Factor w/ 40 levels "1500015O10Rik",..: 40 3 1 26 20 23 21 8 9 16 ...
typeof(ResultsTable_small$SYMBOL)

‘integer’
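
The integer codes and the labels of a factor can be inspected separately. The sketch below uses a tiny hand-made factor (a few gene symbols from the table, with one repeated on purpose) and also shows a classic pitfall when a factor holds numbers stored as text.

```r
symbols <- factor(c("Wif1", "Atp2b4", "Myof", "Wif1"))

lv    <- levels(symbols)        # the distinct labels, sorted
codes <- as.integer(symbols)    # the integer index of each element's label
back  <- as.character(symbols)  # recover the original strings

print(lv)
print(codes)
print(back)
print(table(symbols))           # counts per level

# Pitfall: as.numeric() on a factor returns the level codes, not the values
f <- factor(c("10", "5", "20"))
wrong <- as.numeric(f)
right <- as.numeric(as.character(f))
print(wrong)
print(right)
```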

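If you do not want factors, have text columns read in as character strings instead. (Since R 4.0.0, stringsAsFactors = FALSE is the default, so the Factor column above only appears on older versions such as the R 3.5.1 used here.)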

ResultsTable_small <- read.table("3219685/ResultsTable_small.txt", stringsAsFactors = FALSE, header=TRUE)
str(ResultsTable_small)
'data.frame':	40 obs. of  7 variables:
 $ ENTREZID : int  24117 381290 78896 226101 16012 231830 16669 55987 231991 14620 ...
 $ SYMBOL   : chr  "Wif1" "Atp2b4" "1500015O10Rik" "Myof" ...
 $ logFC    : num  1.82 -2.14 2.81 -2.33 -2.9 ...
 $ AveExpr  : num  2.98 3.94 3.04 6.22 1.98 ...
 $ t        : num  20.1 -19.1 18.5 -18.3 -18.2 ...
 $ P.Value  : num  1.06e-10 1.98e-10 2.76e-10 3.30e-10 3.41e-10 ...
 $ adj.P.Val: num  1.02e-06 1.02e-06 1.02e-06 1.02e-06 1.02e-06 ...
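
The effect of stringsAsFactors can be reproduced without the downloaded file by passing a small text snippet to read.table() through its text argument. The three rows below are copied from the head() output above (first three columns only); the variable names are illustrative.

```r
# A three-row excerpt of the results table, typed in as text
txt <- "ENTREZID SYMBOL logFC
24117 Wif1 1.819943
381290 Atp2b4 -2.143885
78896 1500015O10Rik 2.807548"

as_chr <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
as_fct <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

print(class(as_chr$SYMBOL))  # "character"
print(class(as_fct$SYMBOL))  # "factor"
str(as_chr)
```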

Sorting

sort(x) sorts the vector x and returns the sorted vector.

Sort a vector in ascending order

sort(ResultsTable_small$logFC)
  1. -6.07014263352471
  2. -5.82788863265927
  3. -5.14626842050727
  4. -3.31364787941005
  5. -3.21114827988465
  6. -2.89611515497497
  7. -2.65339801437433
  8. -2.59810458251622
  9. -2.59704434679791
  10. -2.5385964096814
  11. -2.32974392966638
  12. -2.31272074376764
  13. -2.17189594243266
  14. -2.14388533952125
  15. -2.07146867497747
  16. -2.01180757857908
  17. -1.7089733203604
  18. -1.56742438255758
  19. -1.52029112638995
  20. -1.51546863348474
  21. -1.33143737986022
  22. -1.258670931154
  23. -1.10915597439346
  24. 1.47467090878791
  25. 1.52240538027464
  26. 1.710379971561
  27. 1.7513404533859
  28. 1.78860123725529
  29. 1.81994310357102
  30. 1.88756079716885
  31. 1.97277123486981
  32. 2.18037012538574
  33. 2.25339982481145
  34. 2.27887939443659
  35. 2.34291447312525
  36. 2.76674499153781
  37. 2.80754753168061
  38. 2.83562374639041
  39. 3.60009376671151
  40. 3.73893325921556

Sort a vector in descending order

sort(ResultsTable_small$logFC, decreasing = TRUE)
  1. 3.73893325921556
  2. 3.60009376671151
  3. 2.83562374639041
  4. 2.80754753168061
  5. 2.76674499153781
  6. 2.34291447312525
  7. 2.27887939443659
  8. 2.25339982481145
  9. 2.18037012538574
  10. 1.97277123486981
  11. 1.88756079716885
  12. 1.81994310357102
  13. 1.78860123725529
  14. 1.7513404533859
  15. 1.710379971561
  16. 1.52240538027464
  17. 1.47467090878791
  18. -1.10915597439346
  19. -1.258670931154
  20. -1.33143737986022
  21. -1.51546863348474
  22. -1.52029112638995
  23. -1.56742438255758
  24. -1.7089733203604
  25. -2.01180757857908
  26. -2.07146867497747
  27. -2.14388533952125
  28. -2.17189594243266
  29. -2.31272074376764
  30. -2.32974392966638
  31. -2.5385964096814
  32. -2.59704434679791
  33. -2.59810458251622
  34. -2.65339801437433
  35. -2.89611515497497
  36. -3.21114827988465
  37. -3.31364787941005
  38. -5.14626842050727
  39. -5.82788863265927
  40. -6.07014263352471

This also works for character vectors

sort(ResultsTable_small$SYMBOL)
  1. '1500015O10Rik'
  2. 'Ak1'
  3. 'Atp2b4'
  4. 'Bhlhe41'
  5. 'Ccdc129'
  6. 'Ccdc153'
  7. 'Chil1'
  8. 'Cpxm2'
  9. 'Creb5'
  10. 'Csf1'
  11. 'Csn1s2b'
  12. 'Cyp2s1'
  13. 'Ddit4'
  14. 'Fam102b'
  15. 'Fam110a'
  16. 'Gjb3'
  17. 'Gpsm2'
  18. 'Hmcn1'
  19. 'Hs6st2'
  20. 'Igfbp6'
  21. 'Krt19'
  22. 'Lif'
  23. 'Micall2'
  24. 'Mrgprf'
  25. 'Mtmr11'
  26. 'Myof'
  27. 'Naaa'
  28. 'Nfatc2'
  29. 'Nr1d1'
  30. 'Pdzd3'
  31. 'Ppp2r3a'
  32. 'Serpinf1'
  33. 'Skil'
  34. 'Slit3'
  35. 'Smad7'
  36. 'Sox4'
  37. 'Tnni2'
  38. 'Tppp3'
  39. 'Trp53inp1'
  40. 'Wif1'
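
Two details of sort() are easy to miss: it keeps element names, and by default it silently drops NA values. A minimal sketch, using rounded logFC values from the table plus one hypothetical entry (missing_gene) to stand in for a missing value:

```r
logFC <- c(Wif1 = 1.82, Atp2b4 = -2.14, Myof = -2.33, missing_gene = NA)

ascending  <- sort(logFC)                     # NA is dropped by default
with_na    <- sort(logFC, na.last = TRUE)     # keep NA, placed at the end
descending <- sort(logFC, decreasing = TRUE)

print(ascending)
print(with_na)
print(descending)
```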

Sorting a data frame

order() returns, for each rank, the position in the original vector of the element that holds that rank.

order(ResultsTable_small$logFC)
  1. 19
  2. 18
  3. 11
  4. 23
  5. 14
  6. 5
  7. 37
  8. 9
  9. 31
  10. 30
  11. 4
  12. 7
  13. 32
  14. 2
  15. 39
  16. 36
  17. 17
  18. 27
  19. 33
  20. 8
  21. 21
  22. 28
  23. 38
  24. 24
  25. 40
  26. 12
  27. 35
  28. 34
  29. 1
  30. 16
  31. 20
  32. 13
  33. 6
  34. 29
  35. 26
  36. 15
  37. 3
  38. 25
  39. 10
  40. 22
ResultsTable_small$logFC[order(ResultsTable_small$logFC)]
  1. -6.07014263352471
  2. -5.82788863265927
  3. -5.14626842050727
  4. -3.31364787941005
  5. -3.21114827988465
  6. -2.89611515497497
  7. -2.65339801437433
  8. -2.59810458251622
  9. -2.59704434679791
  10. -2.5385964096814
  11. -2.32974392966638
  12. -2.31272074376764
  13. -2.17189594243266
  14. -2.14388533952125
  15. -2.07146867497747
  16. -2.01180757857908
  17. -1.7089733203604
  18. -1.56742438255758
  19. -1.52029112638995
  20. -1.51546863348474
  21. -1.33143737986022
  22. -1.258670931154
  23. -1.10915597439346
  24. 1.47467090878791
  25. 1.52240538027464
  26. 1.710379971561
  27. 1.7513404533859
  28. 1.78860123725529
  29. 1.81994310357102
  30. 1.88756079716885
  31. 1.97277123486981
  32. 2.18037012538574
  33. 2.25339982481145
  34. 2.27887939443659
  35. 2.34291447312525
  36. 2.76674499153781
  37. 2.80754753168061
  38. 2.83562374639041
  39. 3.60009376671151
  40. 3.73893325921556
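
The relationship between order(), sort() and rank() is easiest to see on a tiny vector. The values below are made up purely for illustration.

```r
x <- c(30, 10, 20)

ord <- order(x)   # positions of the smallest, 2nd smallest, ... element
rnk <- rank(x)    # rank of each element in its original position

print(ord)        # 2 3 1
print(rnk)        # 3 1 2

# Indexing with order() gives the same result as sort()
same_as_sort <- identical(x[ord], sort(x))
print(same_as_sort)

# Descending order
ord_desc <- order(x, decreasing = TRUE)
print(x[ord_desc])
```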
ResultsTable_small[order(ResultsTable_small$logFC), ]
   ENTREZID        SYMBOL     logFC     AveExpr         t      P.Value    adj.P.Val
19    12992       Csn1s2b -6.070143  3.56295004 -14.16565 6.377604e-09 5.131276e-06
18    21953         Tnni2 -5.827889  0.30207159 -14.40327 5.265278e-09 4.622914e-06
11   211577        Mrgprf -5.146268 -0.93683349 -16.36573 1.196263e-09 1.718703e-06
23   170761         Pdzd3 -3.313648 -0.06019306 -13.62372 9.982985e-09 6.580512e-06
14   270150       Ccdc153 -3.211148 -1.34083882 -15.50126 2.249931e-09 2.539851e-06
5     16012        Igfbp6 -2.896115  1.97844876 -18.21525 3.413066e-10 1.016240e-06
37    67971         Tppp3 -2.653398  4.90816305 -12.22845 3.416616e-08 1.419445e-05
9    231991         Creb5 -2.598105  4.27592952 -16.53634 1.059885e-09 1.718703e-06
31   232016       Ccdc129 -2.597044  5.00471484 -13.02266 1.672195e-08 8.524957e-06
30    67111          Naaa -2.538596  3.29074575 -13.04083 1.645823e-08 8.524957e-06
4    226101          Myof -2.329744  6.22352456 -18.26861 3.297667e-10 1.016240e-06
7     16669         Krt19 -2.312721  8.74189184 -17.07937 7.264548e-10 1.640127e-06
32    76123         Gpsm2 -2.171896  4.99093472 -12.76344 2.102397e-08 1.015751e-05
2    381290        Atp2b4 -2.143885  3.94406593 -19.07495 1.982934e-10 1.016240e-06
39    74134        Cyp2s1 -2.071469  1.40704575 -12.20154 3.502805e-08 1.419445e-05
36    18019        Nfatc2 -2.011808  5.79499693 -12.27561 3.271067e-08 1.419445e-05
17   194126        Mtmr11 -1.708973  2.50804119 -14.48746 4.922928e-09 4.576586e-06
27   545370         Hmcn1 -1.567424  3.10302591 -13.19053 1.444832e-08 8.306595e-06
33   329739       Fam102b -1.520291  4.18813047 -12.75357 2.120968e-08 1.015751e-05
8     55987         Cpxm2 -1.515469  2.83451194 -16.64333 9.829870e-10 1.718703e-06
21    20564         Slit3 -1.331437  3.44179493 -13.88522 8.026279e-09 6.040348e-06
28    60599     Trp53inp1 -1.258671  6.11839605 -13.16241 1.480464e-08 8.306595e-06
38   235542       Ppp2r3a -1.109156  6.50105941 -12.22041 3.442139e-08 1.419445e-05
24    73847       Fam110a  1.474671  6.84086068  13.62251 9.993185e-09 6.580512e-06
40    20677          Sox4  1.522405  7.46932835  12.13548 3.724389e-08 1.462109e-05
12    20317      Serpinf1  1.710380  3.38883490  15.77280 1.838727e-09 2.356351e-06
35    50786        Hs6st2  1.751340  0.53953600  12.43097 2.836919e-08 1.280991e-05
34    79362       Bhlhe41  1.788601  6.18368494  12.70504 2.214908e-08 1.029541e-05
1     24117          Wif1  1.819943  2.97554452  20.10780 1.063770e-10 1.016240e-06
16    20482          Skil  1.887561  8.49892507  14.65488 4.311334e-09 4.258521e-06
20    17131         Smad7  1.972771  6.71751902  14.14348 6.493642e-09 5.131276e-06
13    74747         Ddit4  2.180370  6.86479110  15.70145 1.938279e-09 2.356351e-06
6    231830       Micall2  2.253400  4.76059697  18.02627 3.858161e-10 1.016240e-06
29   217166         Nr1d1  2.278879  6.26087761  13.12885 1.524242e-08 8.306595e-06
26    12654         Chil1  2.342914  5.57645724  13.21976 1.408760e-08 8.306595e-06
15    11636           Ak1  2.766745  4.30347462  15.27694 2.664640e-09 2.807465e-06
3     78896 1500015O10Rik  2.807548  3.03651950  18.54773 2.758828e-10 1.016240e-06
25    12977          Csf1  2.835624  7.47759094  13.41902 1.187300e-08 7.505634e-06
10    14620          Gjb3  3.600094  3.52528051  16.46627 1.113755e-09 1.718703e-06
22    16878           Lif  3.738933  6.68203417  13.73344 9.105708e-09 6.541210e-06
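
The same order() pattern extends to descending order, to sorting by a derived quantity, and to sorting by more than one key. The sketch below uses six rows copied from the head() output shown earlier, with only three of the columns; the data frame name res is illustrative.

```r
# Six rows of the results table (SYMBOL, logFC and P.Value only)
res <- data.frame(
  SYMBOL  = c("Wif1", "Atp2b4", "1500015O10Rik", "Myof", "Igfbp6", "Micall2"),
  logFC   = c(1.819943, -2.143885, 2.807548, -2.329744, -2.896115, 2.253400),
  P.Value = c(1.063770e-10, 1.982934e-10, 2.758828e-10,
              3.297667e-10, 3.413066e-10, 3.858161e-10),
  stringsAsFactors = FALSE
)

# Largest logFC first
by_logfc_desc <- res[order(res$logFC, decreasing = TRUE), ]

# Largest absolute fold change first, regardless of direction
by_abs_logfc <- res[order(-abs(res$logFC)), ]

# Two keys: down-regulated genes first, then by p-value within each group
by_direction <- res[order(res$logFC > 0, res$P.Value), ]

print(by_logfc_desc)
print(by_abs_logfc)
print(by_direction)
```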

Subsetting with logical statements

ResultsTable_small$logFC > 3
  1. FALSE
  2. FALSE
  3. FALSE
  4. FALSE
  5. FALSE
  6. FALSE
  7. FALSE
  8. FALSE
  9. FALSE
  10. TRUE
  11. FALSE
  12. FALSE
  13. FALSE
  14. FALSE
  15. FALSE
  16. FALSE
  17. FALSE
  18. FALSE
  19. FALSE
  20. FALSE
  21. FALSE
  22. TRUE
  23. FALSE
  24. FALSE
  25. FALSE
  26. FALSE
  27. FALSE
  28. FALSE
  29. FALSE
  30. FALSE
  31. FALSE
  32. FALSE
  33. FALSE
  34. FALSE
  35. FALSE
  36. FALSE
  37. FALSE
  38. FALSE
  39. FALSE
  40. FALSE
ResultsTable_small$logFC[ResultsTable_small$logFC > 3]
  1. 3.60009376671151
  2. 3.73893325921556

Applying this to the data frame

ResultsTable_small[ResultsTable_small$logFC > 3, ]
   ENTREZID SYMBOL    logFC  AveExpr        t      P.Value    adj.P.Val
10    14620   Gjb3 3.600094 3.525281 16.46627 1.113755e-09 1.718703e-06
22    16878    Lif 3.738933 6.682034 13.73344 9.105708e-09 6.541210e-06
ResultsTable_small[ResultsTable_small$logFC > 3 | ResultsTable_small$logFC < -3, ]
   ENTREZID  SYMBOL     logFC     AveExpr         t      P.Value    adj.P.Val
10    14620    Gjb3  3.600094  3.52528051  16.46627 1.113755e-09 1.718703e-06
11   211577  Mrgprf -5.146268 -0.93683349 -16.36573 1.196263e-09 1.718703e-06
14   270150 Ccdc153 -3.211148 -1.34083882 -15.50126 2.249931e-09 2.539851e-06
18    21953   Tnni2 -5.827889  0.30207159 -14.40327 5.265278e-09 4.622914e-06
19    12992 Csn1s2b -6.070143  3.56295004 -14.16565 6.377604e-09 5.131276e-06
22    16878     Lif  3.738933  6.68203417  13.73344 9.105708e-09 6.541210e-06
23   170761   Pdzd3 -3.313648 -0.06019306 -13.62372 9.982985e-09 6.580512e-06
23 170761 Pdzd3 -3.313648 -0.06019306 -13.62372 9.982985e-09 6.580512e-06
ResultsTable_small[abs(ResultsTable_small$logFC) > 3, ]
   ENTREZID  SYMBOL     logFC     AveExpr         t      P.Value    adj.P.Val
10    14620    Gjb3  3.600094  3.52528051  16.46627 1.113755e-09 1.718703e-06
11   211577  Mrgprf -5.146268 -0.93683349 -16.36573 1.196263e-09 1.718703e-06
14   270150 Ccdc153 -3.211148 -1.34083882 -15.50126 2.249931e-09 2.539851e-06
18    21953   Tnni2 -5.827889  0.30207159 -14.40327 5.265278e-09 4.622914e-06
19    12992 Csn1s2b -6.070143  3.56295004 -14.16565 6.377604e-09 5.131276e-06
22    16878     Lif  3.738933  6.68203417  13.73344 9.105708e-09 6.541210e-06
23   170761   Pdzd3 -3.313648 -0.06019306 -13.62372 9.982985e-09 6.580512e-06
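
Conditions on different columns can be combined with & (and) in the same way as | (or) above, and a few helper functions are useful alongside logical vectors. The sketch below uses five rows from the table with values rounded to six decimals; the AveExpr > 5 cutoff is an arbitrary threshold chosen for illustration.

```r
res <- data.frame(
  SYMBOL  = c("Gjb3", "Mrgprf", "Lif", "Sox4", "Ppp2r3a"),
  logFC   = c(3.600094, -5.146268, 3.738933, 1.522405, -1.109156),
  AveExpr = c(3.525281, -0.936833, 6.682034, 7.469328, 6.501059),
  stringsAsFactors = FALSE
)

# Both conditions must hold: strongly up-regulated AND highly expressed
up_and_expressed <- res[res$logFC > 3 & res$AveExpr > 5, ]

# which() turns a logical vector into row positions
up_rows <- which(res$logFC > 3)

# sum() on a logical vector counts the TRUE values
n_down <- sum(res$logFC < 0)

# subset() is a shorthand that lets you refer to columns by bare name
big_change <- subset(res, abs(logFC) > 3, select = c(SYMBOL, logFC))

print(up_and_expressed)
print(up_rows)
print(n_down)
print(big_change)
```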

%in%

my_genes <- c("Smad7", "Wif1", "Fam102b", "Tppp3")
ResultsTable_small$SYMBOL %in% my_genes
  1. TRUE
  2. FALSE
  3. FALSE
  4. FALSE
  5. FALSE
  6. FALSE
  7. FALSE
  8. FALSE
  9. FALSE
  10. FALSE
  11. FALSE
  12. FALSE
  13. FALSE
  14. FALSE
  15. FALSE
  16. FALSE
  17. FALSE
  18. FALSE
  19. FALSE
  20. TRUE
  21. FALSE
  22. FALSE
  23. FALSE
  24. FALSE
  25. FALSE
  26. FALSE
  27. FALSE
  28. FALSE
  29. FALSE
  30. FALSE
  31. FALSE
  32. FALSE
  33. TRUE
  34. FALSE
  35. FALSE
  36. FALSE
  37. TRUE
  38. FALSE
  39. FALSE
  40. FALSE
ResultsTable_small[ResultsTable_small$SYMBOL %in% my_genes, ]
   ENTREZID  SYMBOL     logFC  AveExpr         t      P.Value    adj.P.Val
1     24117    Wif1  1.819943 2.975545  20.10780 1.063770e-10 1.016240e-06
20    17131   Smad7  1.972771 6.717519  14.14348 6.493642e-09 5.131276e-06
33   329739 Fam102b -1.520291 4.188130 -12.75357 2.120968e-08 1.015751e-05
37    67971   Tppp3 -2.653398 4.908163 -12.22845 3.416616e-08 1.419445e-05
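
Negating %in% gives the complement, which is handy for excluding a gene list or for checking which requested genes are absent. A minimal sketch with a shortened, hand-picked symbol vector (not the full 40-gene column):

```r
symbols  <- c("Wif1", "Atp2b4", "Myof", "Smad7", "Tppp3")
my_genes <- c("Smad7", "Wif1", "Fam102b", "Tppp3")

# Symbols that are in the gene list (order follows symbols, not my_genes)
kept <- symbols[symbols %in% my_genes]

# Symbols that are NOT in the gene list
excluded <- symbols[!(symbols %in% my_genes)]

# Requested genes that are missing from the symbols vector
not_found <- my_genes[!(my_genes %in% symbols)]

print(kept)
print(excluded)
print(not_found)
print(setdiff(my_genes, symbols))  # same answer via a set operation
```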

match

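The %in% operator returns only a logical vector of TRUE and FALSE values, and the result has the same length as the vector on the left-hand side of %in%. In other words, C %in% B goes through the elements of C one by one, checks whether each occurs in B, and returns TRUE or FALSE accordingly.

The result of match(C, B) is quite different. It also has the same length as the first vector, but it is not a logical vector: for each element of C it returns the index of that element's first occurrence in B, or NA if the element does not occur in B.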

B <- seq(5, 15, 2)
C <- 1:5
match(C, B)
  1. <NA>
  2. <NA>
  3. <NA>
  4. <NA>
  5. 1
C %in% B
  1. FALSE
  2. FALSE
  3. FALSE
  4. FALSE
  5. TRUE
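
The two functions are closely linked: the R documentation defines x %in% table as match(x, table, nomatch = 0L) > 0L. The following sketch checks that equivalence on the same B and C.

```r
B <- seq(5, 15, 2)   # 5 7 9 11 13 15
C <- 1:5

via_in    <- C %in% B
via_match <- match(C, B, nomatch = 0L) > 0L   # how %in% is defined
via_isna  <- !is.na(match(C, B))              # an equivalent formulation

print(via_in)
print(identical(via_in, via_match))
print(identical(via_in, via_isna))
```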
match(my_genes, ResultsTable_small$SYMBOL)
  1. 20
  2. 1
  3. 33
  4. 37

The result follows the same order as my_genes:

ResultsTable_small[match(my_genes, ResultsTable_small$SYMBOL), ]
   ENTREZID  SYMBOL     logFC  AveExpr         t      P.Value    adj.P.Val
20    17131   Smad7  1.972771 6.717519  14.14348 6.493642e-09 5.131276e-06
1     24117    Wif1  1.819943 2.975545  20.10780 1.063770e-10 1.016240e-06
33   329739 Fam102b -1.520291 4.188130 -12.75357 2.120968e-08 1.015751e-05
37    67971   Tppp3 -2.653398 4.908163 -12.22845 3.416616e-08 1.419445e-05
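
One caveat when indexing with match(): if a requested gene is not in the table, match() returns NA for it and the data frame gains a row filled with NA. A small sketch of how that looks and one way to guard against it; the three-row table is a cut-down excerpt and "NotAGene" is a hypothetical symbol.

```r
res <- data.frame(
  SYMBOL = c("Wif1", "Atp2b4", "Smad7"),
  logFC  = c(1.819943, -2.143885, 1.972771),
  stringsAsFactors = FALSE
)
my_genes <- c("Smad7", "NotAGene", "Wif1")   # "NotAGene" is not in res

idx <- match(my_genes, res$SYMBOL)
print(idx)                       # 3 NA 1

with_na_row <- res[idx, ]        # the NA index produces an all-NA row
print(with_na_row)

# Drop the unmatched entries before subsetting
found_only <- res[idx[!is.na(idx)], ]
print(found_only)
```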
