Original link: http://tecdat.cn/?p=10080
The Theil-Sen estimator is a simple linear regression estimator that is not commonly used in the social sciences. It takes three steps:
- Draw a line between every pair of points in the data
- Calculate the slope of each line
- Take the median of these slopes as the regression slope
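The three steps above can be sketched in a few lines of base R. This is a minimal illustration, not the implementation in any particular package; the toy data and the names `pairs`, `pair_slopes` and `ts_slope` are made up for the example, and it assumes all `x` values are distinct so that no pairwise slope divides by zero.

```r
# Toy data (hypothetical values, roughly y = 2x)
x <- c(1, 2, 3, 4, 5)
y <- c(1.1, 2.9, 5.2, 7.1, 8.8)

# Step 1: every pair of points, one pair per column (indices i < j)
pairs <- combn(length(x), 2)

# Step 2: slope of the line through each pair
pair_slopes <- (y[pairs[2, ]] - y[pairs[1, ]]) /
  (x[pairs[2, ]] - x[pairs[1, ]])

# Step 3: the median pairwise slope is the Theil-Sen slope
ts_slope <- median(pair_slopes)
ts_slope
```

With 5 points there are 10 pairs, so the number of slopes grows quadratically with the sample size.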
Calculating the slope in this way turns out to be quite robust. When the errors are normally distributed and there are no outliers, the slope is very similar to the OLS slope.
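To make the robustness claim concrete, here is a small hedged comparison against `lm()`. The helper `theil_sen_slope()` and the data are assumptions made for this sketch: the noise is a fixed alternating pattern rather than a random draw so the result is reproducible, and three of the twenty points are shifted upward to act as outliers.

```r
# Hypothetical helper: median of all pairwise slopes (assumes distinct x values)
theil_sen_slope <- function(x, y) {
  dx <- outer(x, x, "-")
  dy <- outer(y, y, "-")
  upper <- upper.tri(dx)          # use each pair of points once
  median(dy[upper] / dx[upper])
}

x <- 1:20
y_clean <- 1 + 2 * x + rep(c(0.5, -0.5), 10)   # true slope is 2

# Contaminate the last three observations with large positive errors
y_out <- y_clean
y_out[18:20] <- y_out[18:20] + 40

ols_clean <- unname(coef(lm(y_clean ~ x))[2])
ts_clean  <- theil_sen_slope(x, y_clean)
ols_out   <- unname(coef(lm(y_out ~ x))[2])
ts_out    <- theil_sen_slope(x, y_out)

round(c(ols_clean = ols_clean, ts_clean = ts_clean,
        ols_out = ols_out, ts_out = ts_out), 3)
```

On the clean data the two slopes nearly coincide. Once the outliers are added, the OLS slope is pulled well above 2, while the median of the pairwise slopes stays at 2 because fewer than half of the pairs involve a contaminated point.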
There are several ways to obtain the intercept. If you care about the regression intercept, it is reasonable to find out which method your software uses.
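As an illustration of why the choice matters, the sketch below computes two commonly described intercepts for the same Theil-Sen slope: the median of the residuals `y - b * x`, and `median(y) - b * median(x)`. The data and variable names are invented for the example, and which rule a given package applies is something to check in its documentation rather than assume.

```r
# Toy data (hypothetical); x values are distinct
x <- c(1, 2, 3, 4, 5)
y <- c(3, 2, 7, 6, 11)

# Theil-Sen slope: median of all pairwise slopes
pairs <- combn(length(x), 2)
b <- median((y[pairs[2, ]] - y[pairs[1, ]]) /
            (x[pairs[2, ]] - x[pairs[1, ]]))

# Option A: median of the residuals after removing the slope
int_resid <- median(y - b * x)

# Option B: line through the point (median(x), median(y))
int_med <- median(y) - b * median(x)

c(slope = b, int_resid = int_resid, int_med = int_med)
```

Here the slope is 2, but the two rules give intercepts of 1 and 0 respectively, so fitted values can differ between programs even when the slope agrees.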
When I have concerns about outliers and heteroscedasticity, I consider Theil-Sen for simple linear regression.
I conducted a simulation to see how Theil-Sen compares with OLS under heteroscedasticity. Theil-Sen is the more efficient estimator.
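The general shape of such a comparison can be sketched in base R. This is not the simulation used for the figures in this post, which relies on `simglm` and `WRS`; the data-generating model here (a uniform predictor, intercept 1, error standard deviation proportional to `x`), the two sample sizes, the 200 replications and the helper names are all assumptions chosen to keep the example short. Only the population slope of 2 is taken from the post.

```r
set.seed(1)

# Hypothetical helper: median of all pairwise slopes
ts_slope <- function(x, y) {
  dx <- outer(x, x, "-")
  dy <- outer(y, y, "-")
  upper <- upper.tri(dx)
  median(dy[upper] / dx[upper])
}

# One simulated dataset: the error SD grows with x (heteroscedasticity)
one_rep <- function(n) {
  x <- runif(n, 1, 10)
  y <- 1 + 2 * x + rnorm(n, sd = 0.5 * x)
  c(ols = unname(coef(lm(y ~ x))[2]), ts = ts_slope(x, y))
}

n_values <- c(50, 200)
results <- do.call(rbind, lapply(n_values, function(n) {
  est <- replicate(200, one_rep(n))   # 2 x 200 matrix of slope estimates
  data.frame(n = n,
             mean_ols = mean(est["ols", ]), mean_ts = mean(est["ts", ]),
             sd_ols = sd(est["ols", ]), sd_ts = sd(est["ts", ]))
}))
print(results)
```

The `mean_*` columns speak to bias and the `sd_*` columns to efficiency. How the two estimators rank on spread can depend on the form of heteroscedasticity and on the sample size, so this sketch is a starting point for experimentation rather than a replication of the results reported here.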
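library(simglm)   # simulating regression data
library(ggplot2)  # plotting
library(dplyr)    # group_by() and summarise()
library(WRS)      # robust regression functions
# Heteroscedastic case: simulation setup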
nRep <- 1500  # replications per sample size, matching the "1500 replications" plot subtitle
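# Sample sizes to simulate (10 in total)
n.s <- c(seq(50, 300, 50), 400, 550, 750, 1000)
# Indices of 25 randomly chosen datasets to plot
samp.dat <- sample(1:(nRep * length(n.s)), 25)
# Pre-allocated result matrices, one row per simulated dataset
lm.coefs.0 <- matrix(ncol = 3, nrow = nRep * length(n.s))
ts.coefs.0 <- matrix(ncol = 3, nrow = nRep * length(n.s))
lmt.coefs.0 <- matrix(ncol = 3, nrow = nRep * length(n.s))
# Container for the simulated datasets
dat.s <- list()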
ggplot(dat.frms.0, aes(x = age, y = sim_data)) +
  geom_point(shape = 1, size = .5) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ random.sample, nrow = 5) +
  labs(x = "Predictor", y = "Outcome",
       title = "Random sample of 25 datasets from 15000 datasets for simulation",
       subtitle = "Heteroscedastic relationships")
# Box plot of slope estimates by sample size, drawn from precomputed quantiles
ggplot(coefs.0, aes(x = n, colour = Estimator)) +
  geom_boxplot(
    aes(ymin = q025, lower = q25, middle = q50, upper = q75, ymax = q975),
    data = summarise(
      group_by(coefs.0, n, Estimator),
      q025 = quantile(Slope, .025),
      q25 = quantile(Slope, .25),
      q50 = quantile(Slope, .5),
      q75 = quantile(Slope, .75),
      q975 = quantile(Slope, .975)),
    stat = "identity") +
  geom_hline(yintercept = 2, linetype = 2) +
  scale_y_continuous(breaks = seq(1, 3, .05)) +
  labs(x = "Sample size", y = "Slope",
       title = "Estimation of regression slope in simple linear regression under heteroscedasticity",
       subtitle = "1500 replications - Population slope is 2",
       caption = paste(
         "Boxes are IQR, whiskers are middle 95% of slopes",
         "Both estimators are unbiased in the long run; however, OLS has higher variability",
         sep = "\n"
       ))