Ryan T. Moore 14 October 2016
You want to estimate a set of similar models, and display the coefficients in a plot. For example, you want to estimate
y = a + b*x1
y = a + c*x2
y = a + d*x3
and display b, c, and d in a plot with confidence intervals.
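For reference, here is a minimal sketch of what doing this by hand might look like, using base R only. The simulated data frame `dat`, the fit names, and the `by_hand` matrix are illustrative assumptions, not part of the original post.

```r
## Hypothetical simulated data; any data frame with y, x1, x2, x3 would do.
set.seed(1)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- with(dat, x1 + 2 * x2 + 3 * x3 + rnorm(n))

## Fit the three similar models one at a time.
fit_b <- lm(y ~ x1, data = dat)
fit_c <- lm(y ~ x2, data = dat)
fit_d <- lm(y ~ x3, data = dat)

## Collect each slope and its 95% confidence interval, selecting rows by name.
by_hand <- rbind(
  x1 = c(coef(fit_b)["x1"], confint(fit_b)["x1", ]),
  x2 = c(coef(fit_c)["x2"], confint(fit_c)["x2", ]),
  x3 = c(coef(fit_d)["x3"], confint(fit_d)["x3", ])
)
colnames(by_hand) <- c("Coefficient", "Lower", "Upper")
by_hand
```

This gets repetitive as the number of models grows, which is the case for wrapping the steps in a function.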
Copy the function below (or from this PDF version) and run similar_models(), for example:
similar_models(df, "y", c("x1", "x2", "x3"))
library(ggplot2)
library(tidyr)
library(dplyr)
The function similar_models takes the data, the name of the outcome, the set of related variables that you want to include one at a time, and any variables that you want to include in every model.
similar_models <- function(data, outcome, related_variables, always_include = NULL,
                           verbose = FALSE){
  ## Initialize storage: one row per model; columns hold coefficient, lower, upper
  n_models <- length(related_variables)
  storage <- matrix(NA, n_models, 3)
  ## Loop over names (seq_len() avoids a stray 1:0 loop if no variables are given)
  for(model.idx in seq_len(n_models)){
    if(verbose == TRUE){
      cat(paste0("Starting ", related_variables[model.idx], "\n"))
    }
    ## Create string for formula:
    this_formula <- paste0(outcome, " ~ ", related_variables[model.idx])
    if(length(always_include) > 0){
      ## Append each always-included variable as "+ variable"
      additional_terms <- paste("+", always_include, collapse = " ")
      this_formula <- paste(this_formula, additional_terms)
    }
    this_formula <- as.formula(this_formula)
    ## Estimate:
    lm_out <- lm(this_formula, data = data)
    ## Store coef and CI for the related variable (row 2, just after the intercept):
    ci <- confint(lm_out)
    results <- c(coef(lm_out)[2], ci[2, 1], ci[2, 2])
    storage[model.idx, ] <- results
  }
  storage <- as.data.frame(storage)
  ## Add variable names:
  storage$Variable <- related_variables
  names(storage)[1:3] <- c("Coefficient", "Lower", "Upper")
  ## Sort: order the factor levels by coefficient size so the plot is sorted
  storage$Variable <- factor(storage$Variable,
                             levels = storage$Variable[order(storage$Coefficient)])
  return(storage)
}
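The loop in similar_models() builds each formula by pasting strings together and then calling as.formula(). As a small sketch of that step in isolation, the block below builds one such formula and compares it with base R's reformulate(), which is an alternative way to get the same result rather than what the function above uses. The control variable "z" is hypothetical.

```r
## Pieces of one model: outcome, one related variable, and the controls.
outcome <- "y"
related_variable <- "x1"
always_include <- c("w", "z")   # "z" is a hypothetical extra control

## String pasting, in the spirit of similar_models():
right_hand_side <- paste(c(related_variable, always_include), collapse = " + ")
pasted <- as.formula(paste(outcome, "~", right_hand_side))

## Alternative: let reformulate() assemble the formula from the term labels.
built <- reformulate(c(related_variable, always_include), response = outcome)

pasted
built
```

Either object can be passed to lm() as the formula argument.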
Create some data with variables y, x1, x2, x3, and w:
set.seed(418)
n <- 100
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), w = rnorm(n))
df <- df %>% mutate(y = x1 + 2*x2 + 3*x3 + 2*w + rnorm(n))
Store the variable names as a vector of strings, then estimate the similar models in the equations above.
related_vars <- c("x1", "x2", "x3")
lm_results <- similar_models(df, "y", related_vars)
## Plot the results:
lm_results %>% ggplot(aes(x = Coefficient, y = Variable)) +
  geom_point() +
  geom_segment(aes(x = Lower, xend = Upper, y = Variable, yend = Variable))
To estimate models like
y = a + b*x1 + e*w
y = a + c*x2 + e*w
y = a + d*x3 + e*w
specify the variable(s) you want to always include, such as w. You could include several, as c("w", "z", "q", ...).
lm_results_w <- similar_models(df, "y", related_vars, always_include = "w")
## Plot:
lm_results_w %>% ggplot(aes(x = Coefficient, y = Variable)) +
  geom_point() +
  geom_segment(aes(x = Lower, xend = Upper, y = Variable, yend = Variable))
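In the simulated data, w is drawn independently of x1, x2, and x3, so including it would be expected mainly to absorb residual variation rather than shift the estimates much. One way to check this numerically is to compare confidence interval widths with and without w. The sketch below uses base R only; it rebuilds the data under the name `dat`, and the helper `ci_width` is a hypothetical name introduced for illustration.

```r
## Rebuild the simulated data in base R (same seed and draw order as above).
set.seed(418)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), w = rnorm(n))
dat$y <- with(dat, x1 + 2 * x2 + 3 * x3 + 2 * w + rnorm(n))

## Hypothetical helper: width of the 95% CI for one variable's coefficient.
ci_width <- function(formula, variable, data){
  ci <- confint(lm(formula, data = data))
  unname(ci[variable, 2] - ci[variable, 1])
}

vars <- c("x1", "x2", "x3")
widths <- data.frame(
  Variable = vars,
  ## Models of the form y ~ x
  Without_w = sapply(vars, function(v) ci_width(reformulate(v, "y"), v, dat)),
  ## Models of the form y ~ x + w
  With_w = sapply(vars, function(v) ci_width(reformulate(c(v, "w"), "y"), v, dat))
)
widths
```

Comparing the two columns row by row shows how much each interval changes once w is held constant.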