First release: pm3, an R package for three-group propensity score matching

My second R package, pm3, is now officially available on CRAN. It performs propensity score matching for exactly 3 groups, no more and no fewer.
It can be installed with the following code:

install.packages("pm3")

What is propensity score matching? Propensity score matching (PSM) is a statistical method for analyzing data from observational studies and is widely used in SCI-indexed articles. Observational data are subject to many biases and confounding variables. PSM aims to reduce the influence of these biases and confounders so that the experimental group and the control group can be compared more fairly.
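To make the idea of a propensity score concrete, here is a minimal two-group sketch in base R. The data are simulated and the variable names (age, treated) are hypothetical; this is only an illustration of the concept, not the code used inside pm3.

```r
# Simulated example: older patients are more likely to be in the treated group
set.seed(42)
n <- 200
age <- rnorm(n, mean = 50, sd = 10)
treated <- rbinom(n, size = 1, prob = plogis(-5 + 0.1 * age))

# The propensity score is the predicted probability of being treated,
# given the covariates, from a logistic regression model
fit <- glm(treated ~ age, family = binomial)
ps <- fitted(fit)

# Compare the average score in the two groups
tapply(ps, treated, mean)
```

Patients from different groups who have similar values of ps have, on average, similar covariates, which is why matching on this single number can balance several baseline variables at once.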
Why is propensity score matching needed?
We know that RCTs provide strong evidence because patients are strictly screened. A retrospective study uses only past data, so it is difficult to obtain two groups of patients with baselines as similar as in an RCT. However, we can filter the retrospective data through propensity score matching, pairing patients who have similar baseline information, to approximate the effect of an RCT.
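The pairing step can be sketched as a greedy 1:1 nearest-neighbor match on the propensity score. The function name greedy_match, the caliper of 0.05, and the score values below are all hypothetical and chosen only for illustration; real packages use more refined algorithms.

```r
# Hypothetical propensity scores for 3 treated and 5 control patients
ps_treated <- c(0.62, 0.48, 0.71)
ps_control <- c(0.30, 0.50, 0.65, 0.70, 0.10)

# Greedy 1:1 matching without replacement: each treated patient takes the
# closest unused control, but only if the distance is within the caliper
greedy_match <- function(ps_t, ps_c, caliper = 0.05) {
  available <- rep(TRUE, length(ps_c))
  match_idx <- rep(NA_integer_, length(ps_t))
  for (i in seq_along(ps_t)) {
    d <- abs(ps_c - ps_t[i])
    d[!available] <- Inf          # controls already used cannot be reused
    j <- which.min(d)
    if (d[j] <= caliper) {
      match_idx[i] <- j
      available[j] <- FALSE
    }
  }
  match_idx                       # NA means no control was close enough
}

m <- greedy_match(ps_treated, ps_control)
m
```

The result gives, for each treated patient, the position of the matched control. Controls that are never selected are dropped from the matched data set, which is why the sample shrinks after matching.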
Application Scenarios
 1. Uneven baseline data
 2. The number of positive cases in case-control studies is small, such as rare disease research
 3. Turn many confounding factors into one variable: the propensity score
Here is an example. Before matching, the baseline data of the patients in the two groups are very different. After propensity score matching, the baseline data are approximately the same.
As far as I know, there are almost no R packages for three-group propensity score matching. I previously wrote the article "Patient Propensity Score Matching (PSM) for 3 Groups in R Language", which you can read if you want to know how it is done. However, many people still could not do it after reading the article, so I had the idea of writing a package. This is what I promised everyone at the end of that article, and it can now be said that it has been done.

Thanks to the friends who paid; this is what motivates me to keep going. Thanks also to the original author for selflessly providing the method. I only present the method in code, so please do not ask me about the theory. I would also like to mention that I have improved and optimized the author's method to some extent. In the original method, categorical covariates could only have 2 categories, whereas here they can have multiple categories.
Let's demonstrate the usage of the pm3 package. First we load the package and the data. The pm3 package has built-in premature birth data, so we can use it directly.

library(pm3)
bc<-prematurity

This is data about premature low birth weight infants (it can also be obtained by replying "premature birth data" to the official account). Infants weighing less than 2500g are considered low birth weight. The variables are as follows: low indicates a low birth weight baby (less than 2500g), age is the age of the mother, lwt is the mother's weight at her last menstrual period, race is race, smoke is smoking during pregnancy, ptl is the number of previous premature births, ht is history of hypertension, ui is uterine irritability, ftv is the number of visits to the doctor in early pregnancy, and bwt is the newborn's weight.

Suppose we are studying the effect of race on the birth of low birth weight infants. We then need propensity score matching on the baseline characteristics of the 3 races.

We no longer need the many complicated steps required before. One line of code is enough:

g<-pm3(data=bc,x="race",y="low",covs=c("age","lwt","ptl"),factor=c("ui","low"))

Let me explain this code. Because we rely on a logistic regression model to generate the scores, we need to define the variables of the regression model. data is your data. x is the grouping variable you want to compare, here race. y is the outcome variable. covs are the covariates in your model, that is, the baseline indicators you want to match on, which can be continuous or categorical; here they are "age", "lwt" and "ptl". Finally, factor defines the categorical variables in your data. There is a small problem here: if you have no categorical variables you can leave factor blank, but if you fill it in you must give at least two variables, otherwise an error is reported. This will be corrected in a later version. I originally had only the categorical variable ui here, so to avoid the error I added low. Adding race instead would also work and does not affect the result.
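With three groups, one common way to obtain propensity scores from a logistic-type model is multinomial logistic regression, which gives each subject one probability per group. The sketch below is an assumption-laden illustration, not pm3's internal code: it uses nnet::multinom and the birthwt data from the MASS package as a stand-in with the same variable names, and it assumes the usual birthwt coding of race (1 = white, 2 = black, 3 = other).

```r
library(MASS)   # birthwt data, used here as a stand-in for the pm3 data
library(nnet)   # multinom() for multinomial logistic regression

bw <- birthwt
bw$race <- factor(bw$race, levels = 1:3, labels = c("white", "black", "other"))

# Model the probability of belonging to each race group from the covariates
fit <- multinom(race ~ age + lwt + ptl, data = bw, trace = FALSE)

# One row per subject, one column per group; each row sums to 1
ps <- fitted(fit)
head(round(ps, 3))
```

Subjects from different groups whose rows of ps are close to each other are candidates for being matched together.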
After running the pm3() call, the result is stored in g.


g is a list. We can see that 3 matched data sets have been generated, with 26 records in each, which is exactly the same as in the article "Patient Propensity Score Matching (PSM) for 3 Groups in R Language". mbc is the merged data of the 3 matched data sets.
We extract mbc:

mbc<-g[["mbc"]]
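If you want to check the result yourself rather than rely on a screenshot, a short sketch like the following can help. It repeats the calls already shown in this article and then uses only base R functions (str and table) to inspect the output. Apart from "mbc", the names of the list elements are not assumed here.

```r
library(pm3)

bc <- prematurity
g <- pm3(data = bc, x = "race", y = "low",
         covs = c("age", "lwt", "ptl"), factor = c("ui", "low"))

# Overview of the list: element names and sizes, without printing all rows
str(g, max.level = 1)

# Number of matched records per race group in the merged data
mbc <- g[["mbc"]]
table(mbc$race)
```

If the matching worked as described, the three race groups in mbc should have the same number of records.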

Below we compare the data before and after matching. First load the tableone package:

library(tableone)

Define all variables and categorical variables

allVars <-c("age", "lwt", "ptl","ht")
fvars<-c("ht")

Compare

# Baseline table before matching (original data)
tab2 <- CreateTableOne(vars = allVars, strata = "race",
                       data = bc, factorVars = fvars, addOverall = TRUE)
print(tab2, smd = TRUE)
# Baseline table after matching (merged matched data)
tab1 <- CreateTableOne(vars = allVars, strata = "race",
                       data = mbc, factorVars = fvars, addOverall = TRUE)
print(tab1, smd = TRUE)
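
To make the SMD column more concrete, here is a base R sketch of the standardized mean difference for a continuous variable. The helper names smd_pair and smd_multi and the toy data are hypothetical. As far as I understand the tableone documentation, when there are more than two groups the printed SMD is the average over all pairwise contrasts, which is what the last line imitates; treat this as an approximation of the idea, not a reimplementation of tableone.

```r
# SMD between two groups: difference in means divided by the pooled SD
smd_pair <- function(x1, x2) {
  (mean(x1) - mean(x2)) / sqrt((var(x1) + var(x2)) / 2)
}

# Absolute SMD for every pair of groups
smd_multi <- function(x, group) {
  parts <- split(x, group)
  pairs <- combn(names(parts), 2, simplify = FALSE)
  vals <- vapply(pairs, function(p) {
    abs(smd_pair(parts[[p[1]]], parts[[p[2]]]))
  }, numeric(1))
  names(vals) <- vapply(pairs, paste, character(1), collapse = " vs ")
  vals
}

# Toy data: three groups of three values, each with variance 1
x <- c(1, 2, 3,  2, 3, 4,  3, 4, 5)
grp <- rep(c("a", "b", "c"), each = 3)

res <- smd_multi(x, grp)
res         # pairwise SMDs
mean(res)   # average over all pairs
```

A value below roughly 0.1 is often taken as a sign of good balance, so it is worth looking at the pairwise values as well as the average.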

We can see that after matching the P values become larger and the SMDs become smaller, so the matching worked very well. Finally, I would like to say that propensity score matching is not a panacea, and it cannot balance every variable.
References:

  1. Deng Qiangting, Wang Hong, Zhang Dada, et al. Design of propensity score matching algorithm for unordered multi-group data and R program implementation [J]. Modern Preventive Medicine, 2021, 48(15):5.
  2. Wu Shunquan, Wu Cheng, He Jia. Comparison and application of propensity score matching method in multi-classification data [J]. Chinese Journal of Health Information Management, 2013(5):448-451.

Origin blog.csdn.net/dege857/article/details/129331528