@ryantmoore
Last active April 6, 2022 10:19
Estimate many Similar Models (and Plot Results)

Estimate Many Similar Models and Display Coefficients

Ryan T. Moore 14 October 2016

The Problem

You want to estimate a set of similar models, and display the coefficients in a plot. For example, you want to estimate

y = a + b*x1
y = a + c*x2
y = a + d*x3

and display b, c, and d in a plot with confidence intervals.

The TL;DR Solution

Copy the function below (or from this PDF version) and run the function similar_models(), such as

similar_models(df, "y", c("x1", "x2", "x3"))

The Full Solution

Some required libraries:

library(ggplot2)
library(tidyr)
library(dplyr)

The Function

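The function similar_models takes the data, the name of the outcome, the set of related variables to include one at a time (one per model), and any variables to include in every model. Variable names are passed as strings. It returns a data frame with one row per model, holding the coefficient on that model's related variable and the lower and upper bounds of its 95% confidence interval (the confint() default).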

similar_models <- function(data, outcome, related_variables, always_include = NULL,
                           verbose = FALSE){
  
  ## Initialize storage:
  n_models <- length(related_variables)
  storage <- matrix(NA, n_models, 3)
  
  ## Loop over names
  for(model.idx in 1:n_models){
    if(verbose == TRUE){
      cat(paste("Starting ", related_variables[model.idx], "\n"))
    }
    ## Create string for formula:
    this_formula <- paste0(outcome, " ~ ", related_variables[model.idx])
    
    if(length(always_include) > 0){
      additional_terms <- ""
      for(always.idx in 1:length(always_include)){
        additional_terms <- paste(additional_terms, "+", always_include[always.idx])
      }
      
      this_formula <- paste(this_formula, additional_terms)
    }
    
    this_formula <- as.formula(this_formula)
    
    ## Estimate:
    lm_out <- lm(this_formula, data = data)
    ## Store coef and CI:
    results <- c(coef(lm_out)[2], confint(lm_out)[2, 1], confint(lm_out)[2, 2])
    storage[model.idx, ] <- results
  }
  
  storage <- as.data.frame(storage)
  ## Add variable names:
  storage$Variable <- related_variables
  names(storage)[1:3] <- c("Coefficient", "Lower", "Upper")
  
  ## Sort:
  storage$Variable <- factor(storage$Variable, 
                             levels = storage$Variable[order(storage$Coefficient)])

  return(storage)
}

Sample Data

Create some data with variables y, x1, x2, x3, and w:

set.seed(418)
n <- 100
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), w = rnorm(n))
df <- df %>% mutate(y = x1 + 2*x2 + 3*x3 + 2*w + rnorm(n))

Estimate the Similar Models

Store the variable names as a vector of strings, then estimate the similar models in the equations above.

related_vars <- c("x1", "x2", "x3")

lm_results <- similar_models(df, "y", related_vars)

## Plot the results:
lm_results %>% ggplot(aes(x = Coefficient, y = Variable)) + 
  geom_point() + 
  geom_segment(aes(x = Lower, xend = Upper, y = Variable, yend = Variable))

Including Other Predictors in Each Model

To estimate models like

y = a + b*x1 + e*w
y = a + c*x2 + e*w
y = a + d*x3 + e*w

pass the variable(s) to include in every model through the always_include argument, such as "w". To include several, pass a character vector, as c("w", "z", "q", ...).

lm_results_w <- similar_models(df, "y", related_vars, always_include = "w")

## Plot:
lm_results_w %>% ggplot(aes(x = Coefficient, y = Variable)) + 
  geom_point() + 
  geom_segment(aes(x = Lower, xend = Upper, y = Variable, yend = Variable))

---
title: "Estimate Many Similar Models and Display Coefficients"
author: Ryan T. Moore
date: "14 October 2016"
output: rmarkdown::github_document
urlcolor: red
---
# The Problem
You want to estimate a set of similar models, and display the coefficients in a plot. For example, you want to estimate
```
y = a + b * x1
y = a + c * x2
y = a + d * x3
```
and display `b`, `c`, and `d` in a plot with confidence intervals.
# The TL;DR Solution
Copy the function below (or from this [PDF version](http://www.ryantmoore.org/files/ht/htSimilarModels.pdf)) and run the function `similar_models()`, such as
```r
similar_models(df, "y", c("x1", "x2", "x3"))
```
# The Full Solution
## Some required libraries:
```{r warning = FALSE, message=FALSE}
library(dplyr)
library(ggplot2)
library(tidyr)
```
## The Function
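The function `similar_models` takes the data, the name of the outcome, the set of related variables to include one at a time (one per model), and any variables to include in every model. Variable names are passed as strings. It returns a data frame with one row per model, holding the coefficient on that model's related variable and the lower and upper bounds of its 95% confidence interval (the `confint()` default).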
```{r warning = FALSE}
similar_models <- function(data, outcome, related_variables, always_include = NULL,
                           verbose = FALSE){
  ## Initialize storage:
  n_models <- length(related_variables)
  storage <- matrix(NA, n_models, 3)
  ## Loop over names
  for(model.idx in 1:n_models){
    if(verbose == TRUE){
      cat(paste("Starting ", related_variables[model.idx], "\n"))
    }
    ## Create string for formula:
    this_formula <- paste0(outcome, " ~ ", related_variables[model.idx])
    if(length(always_include) > 0){
      additional_terms <- ""
      for(always.idx in 1:length(always_include)){
        additional_terms <- paste(additional_terms, "+", always_include[always.idx])
      }
      this_formula <- paste(this_formula, additional_terms)
    }
    this_formula <- as.formula(this_formula)
    ## Estimate:
    lm_out <- lm(this_formula, data = data)
    ## Store coef and CI:
    results <- c(coef(lm_out)[2], confint(lm_out)[2, 1], confint(lm_out)[2, 2])
    storage[model.idx, ] <- results
  }
  storage <- as.data.frame(storage)
  ## Add variable names:
  storage$Variable <- related_variables
  names(storage)[1:3] <- c("Coefficient", "Lower", "Upper")
  ## Sort:
  storage$Variable <- factor(storage$Variable,
                             levels = storage$Variable[order(storage$Coefficient)])
  return(storage)
}
```
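The formula-building step inside the function can also be written without an explicit loop. The sketch below is not part of the original function; it assumes a hypothetical second control `z` purely for illustration, and compares a `paste()`-based formula with one built by `stats::reformulate()`.

```r
## Illustrative names only: "z" is a hypothetical extra control
outcome <- "y"
related <- "x1"
always_include <- c("w", "z")

## Collapse all right-hand-side terms into one string, then convert
rhs <- paste(c(related, always_include), collapse = " + ")
f_paste <- as.formula(paste(outcome, "~", rhs))

## Same formula via reformulate(), which takes the term labels directly
f_reform <- reformulate(c(related, always_include), response = outcome)

print(f_paste)
print(f_reform)
```

Because `c(related, NULL)` is just `related`, either approach also covers the case where `always_include` is left at its default.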
## Sample Data
Create some data with variables `y`, `x1`, `x2`, `x3`, and `w`:
```{r}
set.seed(418)
n <- 100
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), w = rnorm(n))
df <- df %>% mutate(y = x1 + 2*x2 + 3*x3 + 2*w + rnorm(n))
```
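To see what a single row of the output corresponds to, it can help to fit one of the models by hand. This is a minimal sketch, assuming the same simulated data as above (rebuilt here with base R so the block stands alone). It extracts the estimate and interval by name, whereas the function uses position 2, which matches as long as the related variable is the first term and enters as a single numeric column.

```r
## Rebuild the sample data so this block runs on its own
set.seed(418)
n <- 100
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), w = rnorm(n))
df$y <- df$x1 + 2 * df$x2 + 3 * df$x3 + 2 * df$w + rnorm(n)

## Fit the first of the similar models
fit <- lm(y ~ x1, data = df)

## Extract the estimate and its 95% interval by variable name
est <- coef(fit)["x1"]
ci <- confint(fit, parm = "x1", level = 0.95)

c(Coefficient = unname(est), Lower = ci[1, 1], Upper = ci[1, 2])
```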
## Estimate the Similar Models
Store the variable names as a vector of strings, then estimate the similar models in the equations above.
```{r warning = FALSE, message=FALSE}
related_vars <- c("x1", "x2", "x3")
lm_results <- similar_models(df, "y", related_vars)
## Plot the results:
lm_results %>% ggplot(aes(x = Coefficient, y = Variable)) +
  geom_point() +
  geom_segment(aes(x = Lower, xend = Upper, y = Variable, yend = Variable))
```
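One possible variation on the plot draws the point and interval in a single layer and adds a reference line at zero. The sketch below is an alternative, not the original plotting code. It assumes ggplot2 3.3.0 or later (for horizontal `geom_pointrange()` via `xmin`/`xmax`), and it recomputes the three estimates inline so that it does not depend on `similar_models()` being defined.

```r
library(ggplot2)

## Rebuild the sample data so this block runs on its own
set.seed(418)
n <- 100
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), w = rnorm(n))
df$y <- df$x1 + 2 * df$x2 + 3 * df$x3 + 2 * df$w + rnorm(n)

## One row per model: estimate and 95% interval for the related variable
vars <- c("x1", "x2", "x3")
rows <- lapply(vars, function(v) {
  fit <- lm(reformulate(v, response = "y"), data = df)
  ci <- confint(fit, parm = v)
  data.frame(Variable = v, Coefficient = unname(coef(fit)[v]),
             Lower = ci[1, 1], Upper = ci[1, 2])
})
res <- do.call(rbind, rows)

## Order variables by estimate, mark zero, draw point and interval together
p <- ggplot(res, aes(x = Coefficient, y = reorder(Variable, Coefficient))) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_pointrange(aes(xmin = Lower, xmax = Upper)) +
  labs(y = "Variable")
print(p)
```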
## Including Other Predictors in Each Model
To estimate models like
```
y = a + b*x1 + e*w
y = a + c*x2 + e*w
y = a + d*x3 + e*w
```
pass the variable(s) to include in every model through the `always_include` argument, such as `"w"`. To include several, pass a character vector, as `c("w", "z", "q", ...)`.
```{r warning = FALSE}
lm_results_w <- similar_models(df, "y", related_vars, always_include = "w")
## Plot:
lm_results_w %>% ggplot(aes(x = Coefficient, y = Variable)) +
  geom_point() +
  geom_segment(aes(x = Lower, xend = Upper, y = Variable, yend = Variable))
```
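To compare the two sets of estimates directly, the results with and without `w` can be stacked and shown in separate panels. This is a rough sketch rather than part of the original walkthrough. The helper `fit_one()` and the `Model` labels are hypothetical names introduced here, and the estimates are recomputed inline so the block runs on its own.

```r
library(ggplot2)

## Rebuild the sample data so this block runs on its own
set.seed(418)
n <- 100
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), w = rnorm(n))
df$y <- df$x1 + 2 * df$x2 + 3 * df$x3 + 2 * df$w + rnorm(n)

## Hypothetical helper: fit y ~ v (+ controls) and return one labelled row
fit_one <- function(v, controls = NULL, label) {
  fit <- lm(reformulate(c(v, controls), response = "y"), data = df)
  ci <- confint(fit, parm = v)
  data.frame(Variable = v, Coefficient = unname(coef(fit)[v]),
             Lower = ci[1, 1], Upper = ci[1, 2], Model = label)
}

vars <- c("x1", "x2", "x3")
res <- rbind(
  do.call(rbind, lapply(vars, fit_one, label = "no controls")),
  do.call(rbind, lapply(vars, fit_one, controls = "w", label = "controlling for w"))
)

## One panel per specification, sharing the x axis for comparison
p <- ggplot(res, aes(x = Coefficient, y = Variable)) +
  geom_point() +
  geom_segment(aes(x = Lower, xend = Upper, y = Variable, yend = Variable)) +
  facet_wrap(~ Model, ncol = 1)
print(p)
```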