Use the subset function in R to classify and manage data

Tables like this are common in SCI papers: the statistical results are reported by class. As shown in the figure below, the patients are divided into two categories according to whether they survived, and the statistics are reported for each class.
[Figure: example table of statistics reported separately for surviving and deceased patients]
So how do we make such a table in R? First we split the data into a table of surviving patients and a table of deceased patients, and then compute the statistics on each one separately. Today we use the subset function that ships with R to do the splitting. It is an important function, and it prepares the data for further analysis.
We demonstrate with the Breast cancer survival data set that comes with SPSS. First, we import the data into R and delete the rows with missing values
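library(foreign)   # provides read.spss for reading SPSS .sav files
library(survival)
# Read the SPSS file as a data frame, keeping the numeric codes instead of value labels
bc <- read.spss("E:/r/Breast cancer survival agec.sav",
                use.value.labels = FALSE, to.data.frame = TRUE)
bc <- na.omit(bc)  # drop every row that contains a missing value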
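To see what na.omit does before applying it to real data, here is a minimal sketch on a small made-up data frame. The name toy and its values are hypothetical and are not taken from the SPSS file.

```r
# Hypothetical toy data, not the SPSS file: two of the four rows contain an NA
toy <- data.frame(
  age      = c(45, NA, 60, 52),
  ln_yesno = c(0, 1, NA, 1)
)

# Drop every row that has at least one NA in any column
toy_complete <- na.omit(toy)

nrow(toy)            # number of rows before
nrow(toy_complete)   # number of rows after
```

Comparing the row counts before and after is a quick way to check how much data the deletion costs.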
Look at the first few rows of the bc data set
head(bc)
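head shows only the first rows. As a complementary check, str lists each column with its type, and table counts how many rows fall under each value of a grouping variable. A small sketch on made-up data (toy is hypothetical; with the real data you would pass bc instead):

```r
# Hypothetical toy data standing in for bc
toy <- data.frame(
  age      = c(45, 38, 60, 52, 47),
  ln_yesno = c(0, 1, 0, 1, 1)
)

str(toy)   # column names, types, and the first few values

# Number of rows for each value of the grouping variable
group_counts <- table(toy$ln_yesno)
group_counts
```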
The variables are as follows: age is age; pathsize is the pathological tumor size (cm); lnpos describes positive axillary lymph nodes; histgrad is the histopathological grade; er is estrogen receptor status; pr is progesterone receptor status; status is the outcome event (death); pathscat is the pathological tumor size category (a grouping variable); ln_yesno indicates whether there is lymph node enlargement; and time is the survival time. The last variable, agec, is one we created ourselves and can be ignored here.
In English, subset means a subgroup, a part of a larger set. The subset function in R builds subsets, and we use it here to group the data. In this usage it takes two arguments: the data set and a logical condition on one of its variables. ln_yesno takes only two values, 0 and 1: 0 means no lymph node enlargement, and 1 means lymph node enlargement. We now use ln_yesno to divide the patients into two groups, one with lymph node enlargement and one without.
ln_yesno0 <- subset(bc, bc$ln_yesno < 1)  # keep the rows where ln_yesno is less than 1, that is, 0
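For readers new to subset, the sketch below shows on made-up data (toy is hypothetical) that the call is a row filter: it keeps the same rows as logical indexing with square brackets. Inside subset the column can also be named directly, without the data frame prefix. One difference worth knowing is that subset treats a missing condition as FALSE and drops that row, while bracket indexing returns a row of NAs for it; after na.omit this does not matter.

```r
# Hypothetical toy data standing in for bc
toy <- data.frame(
  id       = 1:5,
  ln_yesno = c(0, 1, 0, 1, 1)
)

# Column names can be used directly inside the condition
by_subset <- subset(toy, ln_yesno < 1)

# The same row filter written with logical indexing
by_bracket <- toy[toy$ln_yesno < 1, ]

by_subset
all.equal(by_subset, by_bracket)   # compare the two results
```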

Similarly, enter
ln_yesno1 <- subset(bc, bc$ln_yesno >= 1)  # keep the rows where ln_yesno is at least 1, that is, 1
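As an aside, and not the method used in this article, base R's split function can produce both groups in one call, returning a list with one data frame per value of the grouping variable. A sketch on made-up data (toy is hypothetical):

```r
# Hypothetical toy data standing in for bc
toy <- data.frame(
  id       = 1:5,
  ln_yesno = c(0, 1, 0, 1, 1)
)

# One list element per distinct value of ln_yesno
groups <- split(toy, toy$ln_yesno)

names(groups)    # the list is named after the group values
groups[["0"]]    # rows without lymph node enlargement
groups[["1"]]    # rows with lymph node enlargement
```

Two calls to subset keep each condition explicit, which is easier to read when there are only two groups; split becomes more convenient when the grouping variable has many levels.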

This gives two data frames: ln_yesno0 holds the group without lymph node enlargement, and ln_yesno1 holds the group with lymph node enlargement. For all the explanation, the code is only a few lines and very simple.
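The separate statistics mentioned at the start can then be computed on each data frame. Here is a minimal sketch on made-up data; toy, its values, the coding of status as 1 for death, and the choice of mean age and death count as the statistics are all assumptions for illustration.

```r
# Hypothetical toy data; status is assumed to be coded 1 = death, 0 = alive
toy <- data.frame(
  age      = c(40, 50, 60, 70, 80),
  status   = c(0, 0, 1, 1, 0),
  ln_yesno = c(0, 1, 0, 1, 1)
)

grp0 <- subset(toy, ln_yesno < 1)    # no lymph node enlargement
grp1 <- subset(toy, ln_yesno >= 1)   # lymph node enlargement

# Sanity check: the two groups together account for every row
nrow(grp0) + nrow(grp1) == nrow(toy)

# Statistics computed separately for each group
mean_age <- c(group0 = mean(grp0$age), group1 = mean(grp1$age))
deaths   <- c(group0 = sum(grp0$status), group1 = sum(grp1$status))
mean_age
deaths
```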
Origin blog.csdn.net/dege857/article/details/108981407