Drawing the baseline table after inverse probability weighting in R

Inverse probability of treatment weighting (IPTW) based on the propensity score (PS) was first proposed by Rosenbaum as a model-based direct standardization method, and it belongs to the family of marginal structural models. Put simply, it bundles many covariates and confounding factors into a single probability and uses that probability to weight each subject, so that only the weight needs to be computed, which is much more convenient. How, then, can the influence of multiple covariates be expressed as one propensity score value? That is, how is the propensity score estimated? According to the definition of Rosenbaum and Rubin, the propensity score is the conditional probability that subject i (i = 1, 2, ..., N), given a set of covariates Xi, is assigned to a certain treatment group or receives a certain exposure (Zi = 1):
e(Xi) = P(Zi = 1 | Xi)
Estimating the propensity score with a logistic regression model has obvious advantages: the model is simple, it is easy to implement, it yields the propensity score directly, and its results are easy to interpret.
Let's take logistic regression as an example.
Logistic regression was the first method used to estimate propensity scores; because its principle is familiar and easy to implement, it remains the most commonly used estimation method. The logistic regression model is:
ln(PS / (1 - PS)) = β0 + β1X1 + β2X2 + ... + βkXk
In a binary logistic regression, the linear combination of confounding factors on the right-hand side produces a probability of the target event between 0 and 1; the greater the probability, the more likely the event. In effect, multiple confounders are condensed into one comprehensive score. Inverse probability of treatment weighting (IPTW) then uses the inverse of the propensity score to deal with confounding in the data. The weights given by Robins et al. are: for a subject in the treatment group, Wt = 1/PS; for a subject in the control group, Wc = 1/(1 - PS), where PS is that subject's propensity score value.
The pseudo-population obtained this way often differs from the original population, so the variance of each variable in the pseudo-population can change; treatment-group subjects with a very low PS and control-group subjects with a very high PS receive very large weights, which induces instability. Hernán et al. adjusted the calculation by incorporating the overall treatment and non-treatment rates of the study population, obtaining stabilized weights: for the treatment group, Wt = Pt/PS; for the control group, Wc = (1 - Pt)/(1 - PS), where Pt is the overall probability of treatment. At present many articles use stabilized weights.
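To make the two formulas concrete, here is a minimal sketch with a made-up treatment indicator and propensity-score vector (illustrative numbers only, not the premature-birth data used below):

```r
# made-up example: three subjects with treatment indicator z and propensity score ps
z  <- c(1, 0, 1)
ps <- c(0.2, 0.6, 0.3)

# unstabilized weights (Robins): 1/PS for treated, 1/(1 - PS) for controls
w_un <- ifelse(z == 1, 1 / ps, 1 / (1 - ps))

# stabilized weights (Hernán): Pt/PS for treated, (1 - Pt)/(1 - PS) for controls
pt   <- mean(z)  # overall treatment rate, here 2/3
w_st <- ifelse(z == 1, pt / ps, (1 - pt) / (1 - ps))

w_un  # 5.00 2.50 3.33
w_st  # 3.33 0.83 2.22
```

Note how the treated subject with PS = 0.2 gets the largest unstabilized weight; multiplying by the treatment rate shrinks every treated weight by the same factor, which tempers the extremes.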
In previous articles we have introduced how to perform inverse probability weighted analysis in R and SPSS. Some readers asked how to draw the weighted baseline table, as in the figure below, where the weighted baseline data are basically balanced.

[Figure: example of a weighted baseline table]

Today we will use R to demonstrate how to draw the weighted baseline table, continuing with our premature-birth data (reply "premature birth data" to the official account to obtain it). We first load the R packages and import the data:

library(tableone)
library(survey)
bc<-read.csv("E:/r/test/zaochan.csv",sep=',',header=TRUE)
bc <- na.omit(bc)

[Figure: preview of the imported data]
This is data on premature low-birth-weight infants; infants below 2500 g are considered low birth weight. The variables are interpreted as follows: low indicates a premature low-birth-weight baby (< 2500 g), age is the mother's age, lwt is the mother's weight at her last menstrual period, race is race, smoke is smoking during pregnancy, ptl is the number of previous premature labours, ht is a history of hypertension, ui is uterine irritability, ftv is the number of physician visits in the first trimester, and bwt is the newborn's birth weight.
We first convert the categorical variables into factors

bc <- na.omit(bc)
bc$race<-ifelse(bc$race=="black",1,ifelse(bc$race=="white",2,3))
bc$smoke<-ifelse(bc$smoke=="nonsmoker",0,1)
bc$low<-factor(bc$low)
bc$race<-factor(bc$race)
bc$ht<-factor(bc$ht)
bc$ui<-factor(bc$ui)

Assuming we are studying the effect of hypertension (ht) on delivering a low-birth-weight child (low), we first draw the unweighted patient baseline table:

dput(names(bc))  # print the variable names
allVars <- c("age", "lwt", "race", "smoke", "ptl", "ht", "ui", 
             "ftv", "bwt")  # all variables
fvars <- c("race", "smoke", "ht", "ui")  # categorical variables
tab2 <- CreateTableOne(vars = allVars, strata = "low", data = bc, factorVars = fvars,
                       addOverall = TRUE)  # build the baseline table
print(tab2)  # print the table

[Figure: unweighted baseline table]
We notice that three variables in the table above have P < 0.05; among them bwt is an outcome variable, while lwt (weight) is a baseline variable.
We first fit the regression model for the exposure and generate the predicted propensity scores:

pr <- glm(ht ~ age + lwt + race + smoke + ptl + ui + ftv, data = bc,
          family = binomial(link = "logit"))  # propensity model for the exposure ht
pr1 <- predict(pr, type = "response")  # predicted propensity scores
summary(bc$ht)

[Figure: summary of ht]
The output above shows 12 subjects with hypertension and 177 without. Next we generate the two kinds of weights. The first uses the unstabilized weights of Robins et al.:

w <- (bc$ht == 1) * (1 / pr1) + (bc$ht == 0) * (1 / (1 - pr1))

The other uses the stabilized weights of Hernán et al., which first require the overall probability of treatment (here, the probability of hypertension):

pt <- 12 / (177 + 12)  # overall probability of hypertension
w1 <- (bc$ht == 1) * (pt / pr1) + (bc$ht == 0) * ((1 - pt) / (1 - pr1))

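Before building the table, a quick sanity check on the weights is worthwhile. A known property, sketched here on simulated data so it does not depend on the real dataset: when the propensity model is correctly specified, the unstabilized weights sum to roughly twice the sample size, while the stabilized weights sum to roughly the sample size, which is one reason they behave more stably:

```r
set.seed(1)
n  <- 5000
x  <- rnorm(n)                       # a single simulated covariate
ps <- plogis(-1 + 0.8 * x)           # true propensity score
z  <- rbinom(n, 1, ps)               # treatment assignment

w_un <- ifelse(z == 1, 1 / ps, 1 / (1 - ps))          # unstabilized
pt   <- mean(z)
w_st <- ifelse(z == 1, pt / ps, (1 - pt) / (1 - ps))  # stabilized

sum(w_un) / n   # close to 2
sum(w_st) / n   # close to 1
```

On the real data, `summary(w)` and `summary(w1)` give the same kind of check and also reveal any extreme weights.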
After generating the weights, we can draw the weighted baseline table. Here we need the svydesign function of the survey package, a powerful package that can build survey designs with various kinds of weights:

bcSvy1<- svydesign(ids = ~ id, strata = ~ low, weights = ~ w,
                   nest = TRUE, data = bc)

Once the design object is created, we can use the tableone package's svyCreateTableOne to draw the weighted table:

Svytab1 <- svyCreateTableOne(vars = c("age", "lwt", "race", "smoke", "ptl", "ui",
                                      "ftv", "bwt"),
                             strata = "low", data = bcSvy1,
                             factorVars = c("race", "smoke", "ui"))
Svytab1

[Figure: weighted baseline table, unstabilized weights]
As the table above shows, after weighting the number of cases in each group has changed, and the baseline difference in lwt has been balanced, making the two groups of patients more comparable. Next we generate the baseline table using the stabilized weights of Hernán et al.:

bcSvy2 <- svydesign(ids = ~ id, strata = ~ low, weights = ~ w1,
                    nest = TRUE, data = bc)
Svytab2 <- svyCreateTableOne(vars = c("age", "lwt", "race", "smoke", "ptl", "ui",
                                      "ftv", "bwt"),
                             strata = "low", data = bcSvy2,
                             factorVars = c("race", "smoke", "ui"))
Svytab2

[Figure: weighted baseline table, stabilized weights]
We can see that, for the data in this article, the table generated with Hernán's stabilized weights changes the case numbers less dramatically than the Robins weights do, but here it balances the baseline data somewhat less well. This does not mean the Robins method is better than the Hernán method; on different data, each method has its own advantages. In practice, balance after weighting is often also assessed with standardized mean differences, which tableone can display via print(Svytab1, smd = TRUE).

Origin blog.csdn.net/dege857/article/details/123221854#comments_27964967