Building the NHANES baseline table by hand in R, and the problem of making subgroup interaction tables (P for interaction) for NHANES data

The National Health and Nutrition Examination Survey (NHANES) is a population-based cross-sectional survey designed to collect information about the health and nutrition of the U.S. household population.
The address is: https://wwwn.cdc.gov/nchs/nhanes/Default.aspx
The previous article, "Nhanes Clinical Database Mining Tutorial 2-Baseline Table Drawing (table1)", used the tableone package to draw the baseline table for NHANES data. Today we build the same table by hand. Doing it manually has two advantages. First, it deepens your understanding of what is being computed. Second, it is more flexible: the output format of the tableone package is relatively fixed, so if you want, for example, unweighted counts instead of weighted ones, you have to calculate them yourself. The two methods can also be used to check each other.
We continue with the data from that article. First load the R package and the data:

library(survey)
bc<-read.csv("E:/nhanes/nhanes.csv",sep=',',header=TRUE)

The variables in the data are:
SEQN: respondent sequence number
RIAGENDR: gender
RIDAGEYR: age
RIDRETH1: race
DMDMARTL: marital status
WTINT2YR, WTMEC2YR: sample weights
SDMVPSU: PSU
SDMVSTRA: strata
LBDGLUSI: blood glucose (mmol/L)
LBDINSI: insulin (pmol/L)
PHAFSTHR: length of fasting before the blood draw (hours)
LBXGH: glycated hemoglobin (%)
SPXNFEV1: FEV1, forced expiratory volume in the first second (mL)
SPXNFVC: FVC, forced vital capacity (mL), an estimate of lung volume
LBDGLTSI: 2-hour blood glucose from the OGTT (mmol/L)
So that the result can be compared with the earlier one, diabetes status is categorized the same way as in "Nhanes Clinical Database Mining Tutorial 2-Baseline Table Drawing (table1)": an OGTT value below 7.8 is normal, 7.8 up to but not including 11 is pre-diabetes, and 11 or above is diabetes.

bc$oGTT2<-ifelse(bc$LBDGLTSI<7.8,1,ifelse(bc$LBDGLTSI>=11,3,2))

The code above codes values below 7.8 as 1, values of 11 or above as 3, and everything in between as 2. Missing values stay NA.
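
The same three-way split can also be written with cut(), which makes the interval boundaries explicit. This is only a sketch on made-up values; the vector glt is illustrative and is not NHANES data.

```r
# Hypothetical 2-hour glucose values in mmol/L (not NHANES data)
glt <- c(5.2, 7.8, 10.9, 11.0, 13.4, NA)

# Nested ifelse, as in the article
g1 <- ifelse(glt < 7.8, 1, ifelse(glt >= 11, 3, 2))

# Equivalent cut(): intervals are [-Inf, 7.8), [7.8, 11), [11, Inf)
g2 <- cut(glt, breaks = c(-Inf, 7.8, 11, Inf), right = FALSE, labels = FALSE)

print(cbind(glt, g1, g2))
```

Both versions put a value of exactly 7.8 in group 2 and a value of exactly 11 in group 3, and both leave NA as NA.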

The tableone package converts categorical variables into factors automatically. When building the table by hand, we have to convert them ourselves.

bc[,c("RIAGENDR", "RIDRETH1","DMDMARTL")] <- lapply(bc[,c("RIAGENDR", "RIDRETH1","DMDMARTL")], factor)
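
If readable labels are wanted in the final table, they can be attached during the conversion. The sketch below uses a made-up data frame named toy with NHANES-style column names; the only coding assumed is that RIAGENDR uses 1 for male and 2 for female.

```r
# Toy data frame with NHANES-style column names (values are made up)
toy <- data.frame(RIAGENDR = c(1, 2, 2, 1), RIDRETH1 = c(3, 4, 1, 3))

# Convert several columns to factors in one step, as in the article
vars <- c("RIAGENDR", "RIDRETH1")
toy[vars] <- lapply(toy[vars], factor)

# Attach labels explicitly by level so a missing level cannot shift them
toy$RIAGENDR <- factor(toy$RIAGENDR, levels = c("1", "2"),
                       labels = c("Male", "Female"))

str(toy)
```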

With the factors in place, we set up the survey design with svydesign. ids is the cluster variable; here it is the primary sampling unit SDMVPSU (use ids = ~1 if there are no clusters). strata is the stratification variable, here SDMVSTRA. weights is the sampling weight: of the two weights in the data, WTINT2YR and WTMEC2YR, we follow common practice and use WTMEC2YR. data is your data frame. nest = TRUE is needed because the PSU numbers are reused across strata.

bcSvy2<- svydesign(ids = ~ SDMVPSU, strata = ~ SDMVSTRA, weights = ~ WTMEC2YR,
                   nest = TRUE, data = bc)
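
To see what the design object does, it can help to build one on a small simulated data set and compare a survey estimate with a hand calculation. The data frame toy below is a made-up stand-in for the NHANES file; only the column names follow the article.

```r
library(survey)

# Simulated stand-in for the NHANES file: 2 strata x 2 PSUs, made-up values
set.seed(1)
toy <- data.frame(
  SDMVSTRA = rep(c(1, 2), each = 20),
  SDMVPSU  = rep(rep(c(1, 2), each = 10), times = 2),
  WTMEC2YR = runif(40, 1000, 5000),
  RIDAGEYR = sample(20:80, 40, replace = TRUE)
)

# PSU ids repeat across strata (1 and 2 in each), so nest = TRUE is needed
toySvy <- svydesign(ids = ~SDMVPSU, strata = ~SDMVSTRA, weights = ~WTMEC2YR,
                    nest = TRUE, data = toy)

# Survey-weighted mean age and its design-based standard error
est <- svymean(~RIDAGEYR, toySvy)
print(est)

# The point estimate is the weighted mean sum(w * x) / sum(w)
by_hand <- weighted.mean(toy$RIDAGEYR, toy$WTMEC2YR)
print(by_hand)
```

The design affects the standard error, not the point estimate. When only part of the sample is of interest, subsetting the design object with subset() rather than filtering the data frame first keeps the variance estimate based on the full design.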

With the survey design object bcSvy2 in place, we can start calculating. Here we build a baseline table grouped by oGTT2, that is, the baseline characteristics at each blood glucose level.
The calculations mainly use functions that come with the survey package. Continuous and categorical variables have to be handled separately.
We start with the continuous variables, mainly using the svyby function. Suppose we want the mean age in each oGTT2 group:

svyby(~RIDAGEYR, ~oGTT2, bcSvy2, svymean)

The result is exactly the same as the one calculated by the tableone package in the previous article.
If you want the confidence interval:

svyby(~RIDAGEYR, ~ oGTT2, bcSvy2, svymean , vartype="ci")

If you want the median (or another quantile) instead of the mean:

svyby(~RIDAGEYR, ~oGTT2, bcSvy2, svyquantile, quantiles=0.5,ci=TRUE,vartype="ci")

If you want the unweighted number of people in each group:

svyby(~RIDAGEYR, ~oGTT2, bcSvy2, unwtd.count, keep.var=FALSE)
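
The pieces above can be assembled into one baseline-table row per group. The following is a sketch on simulated data, not the article's actual workflow: the data frame toy, the output name baseline, and the "mean (SE)" layout are all assumptions for illustration.

```r
library(survey)

# Simulated stand-in for the NHANES data (all values are made up)
set.seed(2)
toy <- data.frame(
  SDMVSTRA = rep(1:3, each = 20),
  SDMVPSU  = rep(rep(1:2, each = 10), times = 3),
  WTMEC2YR = runif(60, 1000, 5000),
  RIDAGEYR = sample(20:80, 60, replace = TRUE),
  oGTT2    = rep(1:3, times = 20)
)
toySvy <- svydesign(ids = ~SDMVPSU, strata = ~SDMVSTRA, weights = ~WTMEC2YR,
                    nest = TRUE, data = toy)

m  <- svyby(~RIDAGEYR, ~oGTT2, toySvy, svymean)  # weighted mean and SE per group
ci <- confint(m)                                  # 95% confidence limits

# One row per group: unweighted n, then "mean (SE)" and the CI as text
baseline <- data.frame(
  oGTT2   = m$oGTT2,
  n       = as.vector(table(toy$oGTT2)[as.character(m$oGTT2)]),
  mean_se = sprintf("%.1f (%.2f)", coef(m), SE(m)),
  ci95    = sprintf("%.1f to %.1f", ci[, 1], ci[, 2])
)
print(baseline)
```

Here the unweighted n is taken directly from the raw data with table(); on a design object, unwtd.count gives the same group sizes.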


Next, categorical variables. The code below uses svyby with svytotal to get the weighted count in each category; svytable is an alternative that returns the same weighted counts as a cross-table. Take race (RIDRETH1) as an example:
svyby(~RIDRETH1, ~oGTT2 , bcSvy2, svytotal,covmat=TRUE)
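
Baseline tables usually report percentages rather than weighted totals. One way to get them, shown here as a sketch on simulated data (the data frame toy and all its values are made up), is svytable followed by prop.table, with svychisq for a design-based test of association.

```r
library(survey)

# Simulated stand-in for the NHANES data (all values are made up)
set.seed(3)
toy <- data.frame(
  SDMVSTRA = rep(1:3, each = 20),
  SDMVPSU  = rep(rep(1:2, each = 10), times = 3),
  WTMEC2YR = runif(60, 1000, 5000),
  RIDRETH1 = factor(sample(1:3, 60, replace = TRUE)),
  oGTT2    = rep(1:3, times = 20)
)
toySvy <- svydesign(ids = ~SDMVPSU, strata = ~SDMVSTRA, weights = ~WTMEC2YR,
                    nest = TRUE, data = toy)

tab <- svytable(~RIDRETH1 + oGTT2, toySvy)   # weighted counts, race by group
pct <- prop.table(tab, margin = 2) * 100     # percentages within each oGTT2 group
print(round(pct, 1))

# Design-based (Rao-Scott) chi-squared test of race by oGTT2
print(svychisq(~RIDRETH1 + oGTT2, toySvy))
```

Using margin = 2 makes each oGTT2 column sum to 100, which matches the usual "n (%)" layout of a baseline table stratified by group.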

These results are also exactly the same as those from the tableone package, so both continuous and categorical variables are now covered. This indirectly confirms that our calculation with the tableone package is correct. For beginners, the tableone package is the easier route.

Finally, a recent problem. I have been writing a function that generates the subgroup interaction table (P for interaction) for NHANES data in one step, but I got stuck on a small issue. The function fits a svyglm model internally and then calls the anova function on it to obtain the P for interaction. At that point R reports that the design object cannot be found.

This problem had me stuck for several days. In the end I wrote an email to Professor Thomas Lumley, the author of the survey package, to ask how to solve it. He replied that the current version of the survey package cannot handle this and that he would need to rewrite the anova.svyglm function, so it may be fixed in a new version. He did, however, give me a workaround.
Why mention this? Because as long as the survey package has this limitation, functions and R packages that call anova.svyglm internally can fail: the design object is only looked up in the global environment and is not found in other environments, such as the inside of a function.
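
The article does not spell out the workaround the package author suggested. One possible way around the lookup problem, offered only as a sketch, is to skip anova and test the interaction term with regTermTest(..., method = "Wald"), which should only need the coefficients and covariance matrix stored in the fitted model. The helper p_interaction, the column female, and the simulated data are all hypothetical.

```r
library(survey)

# Simulated stand-in for NHANES: 15 strata x 2 PSUs, all values made up
set.seed(4)
n <- 300
toy <- data.frame(
  SDMVSTRA = rep(1:15, each = 20),
  SDMVPSU  = rep(rep(1:2, each = 10), times = 15),
  WTMEC2YR = runif(n, 1000, 5000),
  RIDAGEYR = sample(20:80, n, replace = TRUE),
  female   = rbinom(n, 1, 0.5)            # hypothetical 0/1 effect modifier
)
toy$LBXGH <- 5 + 0.01 * toy$RIDAGEYR + rnorm(n, sd = 0.5)

toySvy <- svydesign(ids = ~SDMVPSU, strata = ~SDMVSTRA, weights = ~WTMEC2YR,
                    nest = TRUE, data = toy)

# Hypothetical helper: P for interaction from a Wald test on the product term
p_interaction <- function(design, outcome, exposure, modifier) {
  full <- as.formula(sprintf("%s ~ %s * %s", outcome, exposure, modifier))
  fit  <- svyglm(full, design = design)
  term <- as.formula(sprintf("~ %s:%s", exposure, modifier))
  as.numeric(regTermTest(fit, term, method = "Wald")$p)
}

p_int <- p_interaction(toySvy, "LBXGH", "RIDAGEYR", "female")
print(p_int)
```

A Wald test is not the same test as the default used by anova on a svyglm model, so the P value may differ somewhat from what anova would report.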
Here I want to mention the jstable package. Many readers have mentioned it to me, and I have seen some bloggers recommend it as well. When analyzing survey-weighted data, this package uses the anova.svyglm function, so it reports an error once your model has multiple interactions. Its logic also differs from the way we usually calculate interaction effects. The usual model is y = a + b + a*b, whereas the jstable package computes y = a*b, which gives a completely different result. When building the table yourself, choose the model that fits your needs.
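
When turning these equations into R formulas, keep in mind that * is shorthand in R's formula syntax. The sketch below assumes the equations above are ordinary model equations in which a*b means the product term alone; the names y, a and b are placeholders and no data are needed.

```r
# Term labels R generates for each formula
full <- attr(terms(y ~ a + b + a:b), "term.labels")  # main effects plus product term
star <- attr(terms(y ~ a * b), "term.labels")        # R expands a * b to a + b + a:b
only <- attr(terms(y ~ a:b), "term.labels")          # product term without main effects

print(full)
print(star)
print(only)
```

So in R, a model with only the product term has to be written with a:b; writing a * b already includes both main effects.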


Origin blog.csdn.net/dege857/article/details/133760169