NHANES data (complex survey data) subgroup interaction function 1.4 early adopter version (P for interaction) released --- used to generate interaction effect tables with one click

In SCI articles, the interaction effect table (usually Table 5) can enhance the article, increase the persuasiveness of the article, increase the credibility of the results, and enable data mining. What is a subgroup? It is usually a special type of group of people, such as men and women, race, etc. That is to say, is the result of your data still reliable when you put it into a special group of people? If your results are stable among various special groups, it means your conclusion is reliable. If the conclusion of the subgroup is contrary to the conclusion of your data, you can use it to make a new topic. You can also compare whether there are any differences between different subgroups, such as the difference between those who have had heart stents and those who have not had heart stents. Many new ideas can be discovered and data mining is easy.
Insert image description here
I have already introduced how to make an interaction effect table in my previous article "R Language Teaches You Step by Step to Make an Interaction Effect Table". You can take a look at it for details. In the previous article "scitb5 function version 1.7 (interaction effect function P for interaction) released - used to generate interaction effect tables with one click", I released the scitb5 function version 1.7 written by me. The scitb5 function can be used for ordinary weighting data, but it cannot be used for complex weighted data, so I rewrote the subgroup interaction effect function for the NHANES data.
Let me demonstrate below
Import a NHANES data that I downloaded myself. It should be noted here that the method of importing data is also very important (many people here Wrong), because I am programming based on R's basic data frame data.frame format. Some R packages will report errors when importing data with formats (such as the haven package). A safer way is to use EXCEL to first use EXCEL to convert the data like I did. Save it as csv format and then import it according to the method below. Or you can use as.data.frame() to force the data into a data frame. Deleting missing values ​​here is just for demonstration convenience and has no practical significance.

library(survey)
bc<-read.csv("E:/r/test/subtext.csv",sep=',',header=TRUE)
bc <- na.omit(bc)
str(bc)

Insert image description hereLet me introduce the data, SEQN: serial number, RIAGENDR, # gender, RIDAGEYR, # age, RIDRETH1, # race, DMDMARTL, # marital status, WTINT2YR, WTMEC2YR, # weight, SDMVPSU, # psu, SDMVSTRA, # strata, LBDGLUSI, #blood sugar in mmol, LBDINSI, #insulin (pmmol/L), PHAFSTHR #postprandial blood sugar, LBXGH #glycated hemoglobin, SPXNFEV1, #FEV1: forced expiratory volume in the first second, SPXNFVC #FVC: forced vital capacity , ml (estimated lung capacity), LBDGLTSI #2-hour postprandial blood sugar, factor.FVC is where I divided vital capacity into 2 categories for convenience in testing.
Convert categorical variables into factors

bc$DMDMARTL<-ifelse(bc$DMDMARTL==1,1,0)
bc$RIAGENDR<-as.factor(bc$RIAGENDR)
bc$RIDRETH1<-as.factor(bc$RIDRETH1)
bc$DMDMARTL<-as.factor(bc$DMDMARTL)
bc$factor.FVC<-as.factor(bc$factor.FVC)

Create a sample survey function

bcSvy2<- svydesign(ids = ~ SDMVPSU, strata = ~ SDMVSTRA, weights = ~ WTMEC2YR,
                   nest=TRUE,data = bc)

Set covariates and interaction variables. It should be noted here that cov1 refers to the covariate, Interaction is the interaction variable, and the interaction variable

cov1<-c("DMDMARTL","RIDRETH1","RIDAGEYR","LBXGH")	
Interaction<-c("DMDMARTL","RIDRETH1")

Import function svyscitb5 function

source("E:/r/test/1.4svyscitb5.R")

After successful import, 7 functions should be displayed

Insert image description here
The body of the function is

svy.scitb5 (data,x,y,Interaction,cov=NULL,time=NULL,family=NULL,method=NULL,
                     ids=NULL,strata=NULL,weights=NULL,svydstr=NULL)

Let me explain that data is data, and it must be in the form of a data frame. Covariates, in my setting, cov should include Interaction, which is also in line with our habits. Use family="glm" uniformly, fill in the survey function here in svydstr, currently support logistic regression and linear regression, cox regression is not currently supported. If you have used the scitb5 function I wrote, this function is completely stress-free to use.

Let’s start with a categorical variable. Suppose the variable x I’m studying is race and the y variable is vital capacity. I want to understand the relationship between this layer.

out<-svy.scitb5(data=bc,x="RIAGENDR",y="factor.FVC",Interaction=Interaction,cov = cov1,family="svyglm",
                svydstr=bcSvy2)  #x分类,Y分类

Insert image description here
I want to explain the above picture. When making categorical variables, you need to set a reference. In the hierarchical comparison of DMDMARTL, RIAGENDR.1_DMDMARTL.0 is recognized as the reference. What does it mean? That is, when RIAGENDR is equal to 1 and DMDMARTL is equal to 0, this subgroup of people is defaulted as the reference comparison, and other groups are compared with it. When categorical variables interact with subgroups, it is best not to have too many categories, or else The data will be very large, and if the data cannot be separated into some layers, the data will show missing NA.
Generate forest map with one click

plotforest(out)

Insert image description here
If Y is a continuous variable, the generated value here should be β. We can see that the function determines the data type by itself.

out<-svy.scitb5(data=bc,x="LBDINSI",y="SPXNFVC",Interaction=Interaction,cov = cov1,family="svyglm",
                svydstr=bcSvy2)  #x连续,Y连续

Insert image description here
Since Y is a continuous variable, it is impossible to draw a forest diagram. Because it has negative values, even if it is drawn, it will be very strange. I will not do it here.

plotforest(out)

Insert image description here
Here's a x-continuous, Y-classified

out<-svy.scitb5(data=bc,x="LBDINSI",y="factor.FVC",Interaction=Interaction,cov = cov1,family="svyglm"
                ,svydstr=bcSvy2)  #x连续,Y分类

Insert image description here
Draw a forest map

plotforest(out)

Insert image description here
Next, I will introduce the method parameter of the svy.scitb5 function. This parameter is used to calculate the P for interaction value. It has two options ("LRT", "Wald"). LRT uses the weighted deviation method, and Wald is Wald. For testing, just use LRT for nested models, so the default is LRT. We can try the difference between the two parameters.

out<-svy.scitb5(data=bc,x="LBDINSI",y="factor.FVC",Interaction=Interaction,cov = cov1,family="svyglm",method="Wald",
                svydstr=bcSvy2)  #x连续,Y分类

out<-svy.scitb5(data=bc,x="LBDINSI",y="factor.FVC",Interaction=Interaction,cov = cov1,family="svyglm",method="LRT",
                svydstr=bcSvy2)  #x连续,Y分类

Insert image description here
Insert image description here
Normally you don't need to care about this parameter or fill it in. Why do I mention this? Because sometimes the extreme values ​​of your model data are too small and cannot be calculated using the LRT method. Then we need to change it, but we usually ignore it. Let’s use the data of a fan to demonstrate. We won’t explain the meaning of the data here.

fenxi<-read.csv("E:/r/fensi/fenxiR.csv",sep=',',header=TRUE)
fenxi$Y<- as.factor(fenxi$Y)
fenxi$Y<- as.factor(fenxi$X3)
fenxi$Y<- as.factor(fenxi$X5)
fenxi$Y<- as.factor(fenxi$X6)
fenxi$Y<- as.factor(fenxi$X7)
fenxi$Y<- as.factor(fenxi$X8)
fenxi$Y<- as.factor(fenxi$X9)
fenxi$Y<- as.factor(fenxi$X10)
fenxi$Y<- as.factor(fenxi$X11)
fenxi$Y<- as.factor(fenxi$X12)
fenxi$Y<- as.factor(fenxi$X13)
fenxi$Y<- as.factor(fenxi$X1)
fenxi$Y<- as.factor(fenxi$X16)
fenxi$Y<- as.factor(fenxi$X17)
fenxi[,c("Y","X3",  "X5", "X6", "X7", "X8", 
         "X9", "X10", "X11", "X12", "X13","X1","X16", "X17")] <- lapply(fenxi[,c("Y","X3",  "X5", "X6", "X7", "X8", 
                                                                                 "X9", "X10", "X11", "X12", "X13","X1","X16", "X17")], factor)
nhans<- svydesign(data=fenxi, ids=~SDMVPSU,strata = ~SDMVSTRA,
                  weights=~quanzhong,nest = TRUE)
cov1<-c("X2", "X3","X5", "X6", "X7", "X8", 
        "X9", "X10", "X11", "X12", "X13","X15","X16", "X17")	
Interaction<-c("X3","X5","X6","X7", "X8", "X9", "X10", "X11", "X12", "X13","X16","X17")

Let’s take a look at the conventional methods

out<-svy.scitb5(fenxi,x="X",y="Y",Interaction=Interaction,cov = cov1,family="svyglm",
                svydstr=nhans)

Insert image description here
This time we need to change the method

out<-svy.scitb5(fenxi,x="X",y="Y",Interaction=Interaction,cov = cov1,family="svyglm",method="Wald",
                svydstr=nhans)

Insert image description here
I want to say here that these two methods are methods in the survey package. They are not generated by me. I just call them and finally draw the picture.

plotforest(out)

Insert image description here
Finally, I would like to make an original statement that the entire code for this function was designed and written by me. You can use and modify it at will, but you must obtain my permission for commercial use.

To obtain the svy.scitb5 function, please read this article

NHANES data (complex survey data) subgroup interaction function 1.4 early adopter version (P for interaction) released—used to generate interaction effect tables with one click

Guess you like

Origin blog.csdn.net/dege857/article/details/134050756