A collaborator posed an interesting R question to me today. She wanted to do several regressions using different outcomes, with models being computed on different strata defined by a combination of experimental design variables. She then just wanted to extract the p-values for the slopes for each of the models, and then filter the strata based on p-value levels.
This seems straightforward, right? Let’s set up a toy example:
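library(tidyverse)
# One row per stratum: 4 levels of Var1 x 5 levels of Var2
dat <- as_tibble(expand.grid(letters[1:4], 1:5))
d <- vector('list', nrow(dat))
set.seed(102)
# Simulate 100 observations per stratum with known slopes (-2 and 5)
for(i in seq_len(nrow(dat))){
  x <- rnorm(100)
  d[[i]] <- tibble(x = x, y1 = 3 - 2*x + rnorm(100), y2 = -4+5*x+rnorm(100))
}
# Attach the simulated data as a list-column, then expand it into rows
dat <- bind_cols(dat, tibble(dat=d)) %>% unnest(cols = dat)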
knitr::kable(head(dat), format='html')
Var1 | Var2 | x | y1 | y2 |
a | 1 | 0.1805229 | 4.2598245 | -3.004535 |
a | 1 | 0.7847340 | 0.0023338 | -2.104949 |
a | 1 | -1.3531646 | 3.1711898 | -9.156758 |
a | 1 | 1.9832982 | -0.7140910 | 5.966377 |
a | 1 | 1.2384717 | 0.3523034 | 2.131004 |
a | 1 | 1.2006174 | 0.6267716 | 1.752106 |
Now we’re going to perform two regressions, one using y1 and one using y2 as the dependent variable, for each stratum defined by Var1 and Var2.
out <- dat %>%
  nest(data = -c(Var1, Var2)) %>%   # one nested data frame per stratum
  mutate(model1 = map(data, ~lm(y1~x, data=.)),
         model2 = map(data, ~lm(y2~x, data=.)))
Now conceptually, all we do is tidy up the output for the models using the broom package, filter on the rows containing the slope information, and extract the p-values, right? Not quite…
library(broom)
out_problem <- out %>% mutate(output1 = map(model1, ~tidy(.)),
                              output2 = map(model2, ~tidy(.))) %>%
  select(-data, -model1, -model2) %>%
  unnest()
names(out_problem)
 [1] "Var1"      "Var2"      "term"      "estimate"  "std.error"
 [6] "statistic" "p.value"   "term"      "estimate"  "std.error"
[11] "statistic" "p.value"
We’ve got two sets of output, but with the same column names! This is a problem, because we can no longer tell which columns belong to which model. An easy solution would be to prefix the column names with the name of the response variable. I struggled with this today until I discovered the secret function: setNames.
out_nice <- out %>% mutate(output1 = map(model1, ~tidy(.)),
                           output2 = map(model2, ~tidy(.)),
                           # prefix every column name with the outcome it belongs to
                           output1 = map(output1, ~setNames(., paste('y1', names(.), sep='_'))),
                           output2 = map(output2, ~setNames(., paste('y2', names(.), sep='_')))) %>%
  select(-data, -model1, -model2) %>%
  unnest(cols = c(output1, output2))
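To see what the setNames step does in isolation, here is a minimal sketch on a single small data frame. The data frame res is a hypothetical stand-in for one tidy() result, and its values are made up purely for illustration.

```r
# A stand-in for one tidy() result (values are made up for illustration)
res <- data.frame(term = c("(Intercept)", "x"),
                  estimate = c(3.0, -2.0),
                  p.value = c(0.001, 0.002))

# setNames() returns its first argument with new names attached,
# which is why it can be used directly inside map()
prefixed <- setNames(res, paste("y1", names(res), sep = "_"))
names(prefixed)
# [1] "y1_term"     "y1_estimate" "y1_p.value"
```

Because setNames returns the renamed object rather than modifying it in place, it slots neatly into a map() call. In tidyr 1.0.0 and later, unnest() also takes a names_sep argument that prefixes each unnested column with the name of its list-column, which may serve a similar purpose.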
out_nice is a compact representation of the results of both regressions by stratum, and we can extract the information we would like very easily. For example, to extract the stratum-specific slope estimates:
out_nice %>% filter(y1_term=='x') %>%
  select(Var1, Var2, ends_with('estimate')) %>%
  knitr::kable(digits=3, format='html')
Var1 | Var2 | y1_estimate | y2_estimate |
a | 1 | -1.897 | 5.036 |
b | 1 | -2.000 | 5.022 |
c | 1 | -1.988 | 4.888 |
d | 1 | -2.089 | 5.089 |
a | 2 | -2.052 | 5.015 |
b | 2 | -1.922 | 5.004 |
c | 2 | -1.936 | 4.969 |
d | 2 | -1.961 | 4.959 |
a | 3 | -2.043 | 5.017 |
b | 3 | -2.045 | 4.860 |
c | 3 | -1.996 | 5.009 |
d | 3 | -1.922 | 4.894 |
a | 4 | -2.000 | 4.942 |
b | 4 | -2.000 | 4.932 |
c | 4 | -2.033 | 5.042 |
d | 4 | -2.165 | 5.049 |
a | 5 | -2.094 | 5.010 |
b | 5 | -1.961 | 5.122 |
c | 5 | -2.106 | 5.153 |
d | 5 | -1.974 | 5.009 |
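The original request was to filter strata on the slope p-values, and the same prefixed layout makes that a one-line filter. The following is a self-contained sketch, not the exact code from above: it rebuilds a similar toy data set (the random draws will differ from the tables shown), and the helper fit_tidy and the cutoff alpha = 0.05 are assumptions introduced here for illustration. It assumes tidyr 1.0.0 or later.

```r
library(tidyverse)
library(broom)

set.seed(102)
# Rebuild a toy data set like the one above: 20 strata, 100 rows each,
# stored directly as a list-column of data frames
dat <- expand_grid(Var2 = 1:5, Var1 = letters[1:4]) %>%
  mutate(data = map(Var1, ~ {
    x <- rnorm(100)
    tibble(x = x, y1 = 3 - 2 * x + rnorm(100), y2 = -4 + 5 * x + rnorm(100))
  }))

# Hypothetical helper: fit outcome ~ x, tidy the fit, and prefix the
# column names with the outcome name via setNames()
fit_tidy <- function(d, outcome) {
  fit <- lm(reformulate("x", response = outcome), data = d)
  res <- tidy(fit)
  setNames(res, paste(outcome, names(res), sep = "_"))
}

out_nice <- dat %>%
  mutate(output1 = map(data, fit_tidy, outcome = "y1"),
         output2 = map(data, fit_tidy, outcome = "y2")) %>%
  select(-data) %>%
  unnest(cols = c(output1, output2))

alpha <- 0.05  # assumed significance cutoff, chosen only for illustration

# Keep the slope rows, then keep strata where both slopes pass the cutoff
significant <- out_nice %>%
  filter(y1_term == "x", y2_term == "x",
         y1_p.value < alpha, y2_p.value < alpha) %>%
  select(Var1, Var2, ends_with("p.value"))
```

With true slopes of -2 and 5 and 100 observations per stratum, every stratum passes this filter, so the toy data mainly shows the mechanics. On real data the same filter would keep only the strata of interest.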