Theil-Sen regression analysis in R

Original link: http://tecdat.cn/?p=10080


 

The Theil-Sen estimator is a simple linear regression estimator that is not commonly used in the social sciences. It works in three steps:

  • Draw a line between every pair of points in the data
  • Calculate the slope of each line
  • Take the median of these slopes as the regression slope
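
The three steps above can be sketched in a few lines of base R. This is only an illustration: `theil_sen_slope` is a hypothetical helper name, the data are made up, and pairs with tied x values are simply skipped, which is an assumption rather than something the post specifies.

```r
# Theil-Sen slope: median of the slopes over all pairs of points.
# theil_sen_slope is a hypothetical helper, not a function from any package.
theil_sen_slope <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  pairs <- combn(length(x), 2)          # each column is one pair (i, j), i < j
  dx <- x[pairs[2, ]] - x[pairs[1, ]]
  dy <- y[pairs[2, ]] - y[pairs[1, ]]
  keep <- dx != 0                       # assumption: skip pairs with tied x
  median(dy[keep] / dx[keep])
}

# Made-up data: roughly y = 2 * x, with one outlier in the last point
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.0, 30)

ts_b  <- theil_sen_slope(x, y)          # median of the 10 pairwise slopes
ols_b <- coef(lm(y ~ x))[["x"]]         # OLS slope, pulled up by the outlier
```

With these points the pairwise median stays near 2, while the OLS slope is dragged well above it by the single outlier.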

Calculating the slope in this way is very robust to outliers. When the errors are normally distributed and there are no outliers, the slope is very similar to the OLS slope.

There are several ways to obtain the intercept. If you care about the regression intercept, it is worth knowing which method your software uses.
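
As a sketch of why the choice matters, here are two commonly described intercept rules applied to the same Theil-Sen slope. The data are made up and the variable names are illustrative; which rule a given package applies is something to check in its documentation.

```r
# Made-up data chosen so that the two intercept rules disagree
x <- c(1, 2, 3, 4, 5)
y <- c(1, 4, 2, 5, 3)

# Theil-Sen slope: median of all pairwise slopes
idx <- combn(length(x), 2)
b <- median((y[idx[2, ]] - y[idx[1, ]]) / (x[idx[2, ]] - x[idx[1, ]]))

# Rule 1: median of the values y - b * x
intercept_resid <- median(y - b * x)

# Rule 2: median(y) minus slope times median(x)
intercept_median <- median(y) - b * median(x)

c(slope = b, rule1 = intercept_resid, rule2 = intercept_median)
```

Both rules use the same slope, so only the vertical position of the fitted line changes between them.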

When I am concerned about outliers and heteroscedasticity, Theil-Sen is at the top of my list for simple linear regression.

I conducted a simulation to learn how Theil-Sen compares with OLS under heteroscedasticity. Theil-Sen turned out to be the more efficient estimator.
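
For readers who want to try a comparison of this kind without the packages loaded below, here is a much smaller base-R sketch. The data-generating process (true slope 2, error standard deviation proportional to x), the sample size, the number of replications, and the helper `ts_slope` are all assumptions made for illustration; they are not the settings used in the simulation that follows.

```r
# Small Monte Carlo: OLS vs Theil-Sen slope under heteroscedastic errors.
# Assumed model (not the author's): y = 2 * x + e, with sd(e) = 0.5 * x.
set.seed(1)

# Hypothetical helper: median of all pairwise slopes
ts_slope <- function(x, y) {
  idx <- combn(length(x), 2)
  dx <- x[idx[2, ]] - x[idx[1, ]]
  dy <- y[idx[2, ]] - y[idx[1, ]]
  median(dy[dx != 0] / dx[dx != 0])
}

n_rep <- 200   # replications
n     <- 50    # observations per simulated dataset

# One row per replication, one column per estimator
slopes <- t(replicate(n_rep, {
  x <- runif(n, 1, 10)
  y <- 2 * x + rnorm(n, sd = 0.5 * x)   # error spread grows with x
  c(ols = coef(lm(y ~ x))[["x"]], theil_sen = ts_slope(x, y))
}))

colMeans(slopes)       # average estimated slope per estimator
apply(slopes, 2, sd)   # variability of each estimator across replications
```

Comparing the two standard deviations gives a rough sense of relative efficiency under this particular setup.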

library(simglm)
library(ggplot2)
library(dplyr)
library(WRS)

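# Heteroscedastic condition: simulation setup
nRep <- 100                                      # replications per sample size
n.s <- c(seq(50, 300, 50), 400, 550, 750, 1000)  # sample sizes
samp.dat <- sample((1:(nRep*length(n.s))), 25)   # 25 randomly chosen dataset indices
# Preallocate coefficient matrices, one row per simulated dataset
lm.coefs.0 <- matrix(ncol = 3, nrow = nRep*length(n.s))
ts.coefs.0 <- matrix(ncol = 3, nrow = nRep*length(n.s))
lmt.coefs.0 <- matrix(ncol = 3, nrow = nRep*length(n.s))
dat.s <- list()                                  # container for simulated datasets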



ggplot(dat.frms.0, aes(x = age, y = sim_data)) +
  geom_point(shape = 1, size = .5) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ random.sample, nrow = 5) +
  labs(x = "Predictor", y = "Outcome",
       title = "Random sample of 25 datasets from 15000 datasets for simulation",
       subtitle = "Heteroscedastic relationships")


Simulation results

 
ggplot(coefs.0, aes(x = n, colour = Estimator)) +
  geom_boxplot(
    aes(ymin = q025, lower = q25, middle = q50, upper = q75, ymax = q975), data = summarise(
      group_by(coefs.0, n, Estimator), q025 = quantile(Slope, .025),
      q25 = quantile(Slope, .25), q50 = quantile(Slope, .5),
      q75 = quantile(Slope, .75), q975 = quantile(Slope, .975)), stat = "identity") +
  geom_hline(yintercept = 2, linetype = 2) + scale_y_continuous(breaks = seq(1, 3, .05)) +
  labs(x = "Sample size", y = "Slope",
       title = "Estimation of regression slope in simple linear regression under heteroscedasticity",
       subtitle = "1500 replications - Population slope is 2",
       caption = paste(
         "Boxes are IQR, whiskers are middle 95% of slopes",
         "Both estimators are unbiased in the long run, however, OLS has higher variability",
         sep = "\n"
       ))



Random sample of 25 of the simulated datasets


Origin blog.csdn.net/qq_19600291/article/details/103960701