pm3 package version 1.4 released - an R package for 3-group propensity scoring

Version 1.4 of pm3, the second R package I wrote, is now officially available on CRAN. It performs propensity score matching for exactly 3 groups, no more and no fewer.
It can be installed with the following code:

install.packages("pm3")

What is propensity score matching?
Propensity score matching (PSM) is a statistical method for analyzing data from observational studies and is widely used in SCI-indexed articles. Observational data are subject to many biases and confounding variables for a variety of reasons. PSM reduces the influence of these biases and confounders so that the treatment group and the control group can be compared more fairly.
Why is propensity score matching needed?
The evidence from an RCT is strong because patients are strictly screened. A retrospective study uses only past data, so it is difficult to obtain two groups of patients with baselines as similar as in an RCT. With propensity score matching, however, we can filter the retrospective data and match patients with similar baseline information, which approximates the effect of an RCT.
Application Scenarios
 1. Uneven baseline data
 2. The number of positive cases in case-control studies is small, such as rare disease research
 3. Collapsing many confounding factors into a single variable: the propensity score
In one example, the baseline data of the two patient groups differed greatly before matching; after propensity score matching, the baseline data were approximately the same.
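To make the idea of "many confounders, one score" concrete for three groups, here is a minimal sketch that estimates each subject's probability of belonging to each group with a multinomial logistic model. It is only an illustration and not necessarily how pm3 computes its scores internally. It uses `MASS::birthwt` as stand-in data and `nnet::multinom`, both of which ship with standard R installations; the choice of covariates is an assumption for the example.

```r
library(MASS)   # provides the birthwt data used as a stand-in
library(nnet)   # provides multinom() for multinomial logistic regression

bw <- MASS::birthwt
# race is coded 1, 2, 3 in birthwt; make it a factor so it is modeled as the group
bw$race <- factor(bw$race, levels = 1:3)

# Assumption: age, lwt and ptl are the baseline covariates to balance
fit <- multinom(race ~ age + lwt + ptl, data = bw, trace = FALSE)

# One row per subject, one column per group: the estimated group probabilities
ps <- fitted(fit)
head(round(ps, 3))
```

Each row of `ps` sums to 1, so every subject's covariates are summarized by three group probabilities that a matching procedure can then compare.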
Version 1.4 corrects some errors in the previous version. The tutorial for the last version was not very satisfactory, so let's take another look at the pm3 function:

pm3 <-function(data,x,y,covs,factor,CALIP)

There are 6 parameters. data is your data. x is the grouping variable you want to compare and match on; it can be character or numeric, but if it is numeric it must be coded 1, 2, 3, because 0, 1, 2 or any other coding will raise an error. Here x is race. y is the outcome variable you want to compare. covs is the set of covariates in your model, that is, the baseline variables you want to match on, both continuous and categorical. factor lists the categorical variables in your data, which will be converted to factors. CALIP is the caliper; if it is not supplied, the default is 0.5.
Next, I will continue the demonstration with the premature birth data that comes with the R package. First load the package and the data:

library(pm3)
bc<-prematurity

This is a dataset on premature low birth weight infants (it can also be obtained by replying "premature birth data" to the official account). A birth weight below 2500 g is considered low. The variables are: low, whether the infant is a premature low birth weight baby (under 2500 g); age, the mother's age; lwt, the mother's weight at her last menstrual period; race, race; smoke, smoking during pregnancy; ptl, history of premature birth (count); ht, history of hypertension; ui, uterine irritability; ftv, the number of doctor visits in early pregnancy; bwt, the newborn's weight.
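Because a numeric grouping variable must be coded 1, 2, 3, it is worth checking the coding before calling pm3. The following minimal sketch recodes a hypothetical vector `grp` that uses 0, 1, 2; the vector and the name `grp123` are made up for illustration and are not part of the package.

```r
# Hypothetical group codes using 0, 1, 2, which pm3 would reject
grp <- c(0, 1, 2, 2, 0, 1)

# Map the sorted distinct values onto 1, 2, 3
grp123 <- match(grp, sort(unique(grp)))

# Cross-tabulate to confirm that each old code maps to exactly one new code
table(original = grp, recoded = grp123)
```

The same pattern applies to a data frame column, for example assigning the result back to the grouping column before matching.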

Suppose we are studying the effect of race on the birth of low birth weight infants. We need to match the baseline characteristics of the 3 races by propensity score.

We no longer need the many complicated steps used before; a single line of code does the job:

g<-pm3(data=bc,x="race",y="low",covs=c("age","lwt","ptl"),factor=c("ui","low"))

The previous version required at least 2 factor variables. There is no such limit now, and the matched data can be generated with just one. It is important enough to say twice: x is the grouping variable you want to compare and match on; it can be character or numeric, but if it is numeric it must be coded 1, 2, 3, because 0, 1, 2 or any other coding will raise an error.

g<-pm3(data=bc,x="race",y="low",covs=c("age","lwt","ptl"),factor=c("ui"))

We extract mbc, the matched data, from the result:

mbc<-g[["mbc"]]

With the matching done, let's run through the code that compares the baseline tables before and after matching. For details, you can read the previous article.

library(tableone)
# Variables to summarize; ht is treated as categorical
allVars <- c("age", "lwt", "ptl", "ht")
fvars <- c("ht")
# Baseline table before matching (original data)
tab2 <- CreateTableOne(vars = allVars, strata = "race",
                       data = bc, factorVars = fvars, addOverall = TRUE)
print(tab2, smd = TRUE)
# Baseline table after matching (matched data)
tab1 <- CreateTableOne(vars = allVars, strata = "race",
                       data = mbc, factorVars = fvars, addOverall = TRUE)
print(tab1, smd = TRUE)
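The SMD column requested with smd = TRUE summarizes how far apart the groups are on each covariate. As a rough illustration of that quantity, here is a hand-rolled sketch assuming the usual definition for a continuous variable (difference in means divided by the square root of the average of the two variances). The helper name `smd_pair` is hypothetical, this is not tableone's implementation, and `MASS::birthwt` is used as stand-in data.

```r
# Hypothetical helper: absolute standardized mean difference of two numeric samples
smd_pair <- function(a, b) {
  abs(mean(a) - mean(b)) / sqrt((var(a) + var(b)) / 2)
}

bw <- MASS::birthwt                     # stand-in data shipped with R
age_by_race <- split(bw$age, bw$race)   # list with elements "1", "2", "3"

# Three groups give three pairwise comparisons
pairs <- combn(names(age_by_race), 2)
smds <- apply(pairs, 2, function(p) {
  smd_pair(age_by_race[[p[1]]], age_by_race[[p[2]]])
})
names(smds) <- apply(pairs, 2, paste, collapse = " vs ")
round(smds, 3)
```

Running the same calculation on the matched data would be one way to check by hand whether matching has pulled the groups closer together.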

In the previous version, some readers encountered the following error:
Error in if ((absDist12 + absDist13) < mindis) { : missing value where TRUE/FALSE needed
This version of the package fixes that error. The matching here was run on a reader's data.
The matching works very well: most variables have P values above 0.05. Because the pm3 package uses a for loop for matching, it is still somewhat slow; in my test, a dataset of 10,000 records took about 1 minute. Next, I want to rewrite it with the apply function to make it faster, then optimize the code further and develop a 1:2:2 matching function.
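As an illustration of what replacing a for loop with vectorized code could look like, here is a minimal sketch of nearest-neighbor matching on a single score with a caliper. It is not pm3's implementation: the scores are simulated, the caliper value 0.1 is an arbitrary assumption on the score scale, and matching is done with replacement, so one unit in group 2 or 3 may be picked more than once.

```r
set.seed(1)
# Hypothetical scores for the units in groups 1, 2 and 3
s1 <- runif(5)
s2 <- runif(8)
s3 <- runif(8)
caliper <- 0.1   # assumption: maximum allowed absolute score difference

# All pairwise absolute distances at once, with no explicit loop
d12 <- abs(outer(s1, s2, "-"))   # rows: group 1 units, columns: group 2 units
d13 <- abs(outer(s1, s3, "-"))   # rows: group 1 units, columns: group 3 units

# Index of the nearest unit in groups 2 and 3 for each group 1 unit
j <- apply(d12, 1, which.min)
k <- apply(d13, 1, which.min)
best12 <- d12[cbind(seq_along(s1), j)]
best13 <- d13[cbind(seq_along(s1), k)]

# Keep only triplets where both partners fall within the caliper
keep <- best12 <= caliper & best13 <= caliper
matches <- data.frame(g1 = which(keep), g2 = j[keep], g3 = k[keep])
matches
```

The distance matrices grow with the product of the group sizes, so for very large data this approach trades memory for speed.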


Origin blog.csdn.net/dege857/article/details/129612664