We often see tables like this in SCI papers, where the statistics are reported by group. In the figure below, the patients are divided into two categories according to whether they are alive, and the results are summarized separately for each category.
So how do we make such a table in R? First we split the data into a table of surviving patients and a table of patients who died, and then we compute the statistics for each table. Today we use the subset function that comes with R to demonstrate the splitting step. It is an important function, and it prepares the ground for further analysis of the data.
We use the Breast cancer survival data set that comes with SPSS for the demonstration. First, we import the data into R and delete the rows with missing values.
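The split-then-summarize idea can be sketched on a small made-up data frame before turning to the real data. Everything in this sketch is an assumption for illustration: the toy values, the object names alive, dead and result, and the coding of status as 0 = alive and 1 = dead.

```r
# Toy data; the values and the coding 0 = alive, 1 = dead are assumptions
toy <- data.frame(
  status   = c(0, 1, 0, 1, 0),
  age      = c(50, 62, 47, 70, 55),
  pathsize = c(1.2, 3.0, 0.8, 2.5, 1.5)
)

# Split the data into one table per outcome
alive <- subset(toy, status == 0)
dead  <- subset(toy, status == 1)

# Build one row of summary statistics per group
result <- data.frame(
  group         = c("alive", "dead"),
  n             = c(nrow(alive), nrow(dead)),
  mean_age      = c(mean(alive$age), mean(dead$age)),
  mean_pathsize = c(mean(alive$pathsize), mean(dead$pathsize))
)
print(result)
```

Each row of result corresponds to one column of the kind of grouped table seen in papers; other statistics (standard deviations, counts per category) could be added in the same way.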
library(foreign)
library(survival)
bc <- read.spss("E:/r/Breast cancer survival agec.sav",
                use.value.labels = FALSE, to.data.frame = TRUE)
bc <- na.omit(bc)
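It may help to see what na.omit does before relying on it. The sketch below uses a made-up data frame (the name toy and its values are assumptions, not part of the breast cancer data) to show that a row is dropped as soon as any one of its variables is missing.

```r
# Toy data with missing values in two different rows
toy <- data.frame(
  age  = c(50, NA, 47, 70),
  time = c(12, 30, NA, 45)
)

colSums(is.na(toy))   # number of missing values in each column

clean <- na.omit(toy) # rows 2 and 3 each contain an NA and are removed
nrow(toy)
nrow(clean)
```

Comparing nrow before and after the call is a quick way to see how many patients were lost to missing values.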
Look at the first few rows of the bc data set to check its structure
head(bc)
age is the patient's age, pathsize is the pathological tumor size (cm), lnpos is positive axillary lymph nodes, histgrad is the histopathological grade, er is the estrogen receptor status, pr is the progesterone receptor status, status is the outcome event (death), pathscat is the pathological tumor size category (a grouping variable), ln_yesno indicates whether there is lymph node enlargement, and time is the survival time. The last variable, agec, is one we created ourselves, so you can ignore it.
In English, subset means a subgroup, a part of a larger set. The subset function in R builds such subsets, and we use it to group the data. In the form used here it takes two arguments: the data set and a condition on a variable. The ln_yesno variable takes only two values, 0 and 1: 0 means there is no lymph node enlargement and 1 means there is lymph node enlargement. We now use ln_yesno to divide the patients into two groups, one with lymph node enlargement and one without.
ln_yesno0 <- subset(bc, ln_yesno < 1)   # keep the rows where ln_yesno is less than 1, that is, 0
Similarly, enter
ln_yesno1 <- subset(bc, ln_yesno >= 1)  # keep the rows where ln_yesno is 1
This gives us two data frames: ln_yesno0 holds the group without lymph node enlargement, and ln_yesno1 holds the group with lymph node enlargement. After all that explanation, the code itself is only a few lines and very simple.
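A quick sanity check after splitting is to confirm that the two groups together account for every row. The sketch below does this on a made-up data frame (toy, g0, g1 and the values are assumptions for illustration), and also shows two base R alternatives that should select the same rows: logical indexing and split.

```r
# Toy data standing in for bc; values are made up
toy <- data.frame(
  ln_yesno = c(0, 1, 1, 0, 1),
  age      = c(50, 62, 47, 70, 55)
)

g0 <- subset(toy, ln_yesno < 1)   # no lymph node enlargement
g1 <- subset(toy, ln_yesno >= 1)  # lymph node enlargement

# The two groups should cover every row exactly once
stopifnot(nrow(g0) + nrow(g1) == nrow(toy))

# Logical indexing picks the same rows as subset()
g0_alt <- toy[toy$ln_yesno < 1, ]

# split() returns both groups at once as a list named by the group value
groups <- split(toy, toy$ln_yesno)
names(groups)
```

One difference to keep in mind: when the condition evaluates to NA, subset drops that row, while logical indexing returns a row of NAs. After na.omit there are no missing values left, so the two approaches agree here.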