Changing names in the tidyverse: An example for many regressions

Mar 9, 2017

A collaborator posed an interesting R question to me today. She wanted to do several regressions using different outcomes, with models being computed on different strata defined by a combination of experimental design variables. She then just wanted to extract the p-values for the slopes for each of the models, and then filter the strata based on p-value levels.

This seems straightforward, right? Let’s set up a toy example:

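library(tidyverse)

# One row per stratum: 4 levels of Var1 crossed with 5 levels of Var2
dat <- as_tibble(expand.grid(letters[1:4], 1:5))
d <- vector('list', nrow(dat))
set.seed(102)
for (i in seq_len(nrow(dat))) {
  x <- rnorm(100)
  # True slopes are -2 for y1 and 5 for y2
  d[[i]] <- tibble(x = x, y1 = 3 - 2*x + rnorm(100), y2 = -4 + 5*x + rnorm(100))
}
# Attach the simulated data as a list-column, then expand to one row per observation
# (unnest needs an explicit cols argument in tidyr >= 1.0.0)
dat <- bind_cols(dat, tibble(dat = d)) %>% unnest(cols = dat)
knitr::kable(head(dat), format = 'html')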

Var1 Var2 x y1 y2

a	1	0.1805229	4.2598245	-3.004535
a	1	0.7847340	0.0023338	-2.104949
a	1	-1.3531646	3.1711898	-9.156758
a	1	1.9832982	-0.7140910	5.966377
a	1	1.2384717	0.3523034	2.131004
a	1	1.2006174	0.6267716	1.752106

Now we’re going to perform two regressions, one using y1 and one using y2 as the dependent variables, for each stratum defined by Var1 and Var2.

out <- dat %>%
  # One nested data frame per Var1 x Var2 stratum (tidyr >= 1.0.0 syntax)
  nest(data = c(x, y1, y2)) %>%
  mutate(model1 = map(data, ~lm(y1 ~ x, data = .)),
         model2 = map(data, ~lm(y2 ~ x, data = .)))

Now conceptually, all we do is tidy up the output for the models using the broom package, filter on the rows containing the slope information, and extract the p-values, right? Not quite...

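library(broom)
out_problem <- out %>%
  mutate(output1 = map(model1, tidy),
         output2 = map(model2, tidy)) %>%
  select(-data, -model1, -model2) %>%
  # tidyr >= 1.0.0 stops on duplicate column names by default;
  # names_repair = 'minimal' keeps them so the clash is visible
  unnest(cols = c(output1, output2), names_repair = 'minimal')
names(out_problem)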

[1] "Var1" "Var2" "term" "estimate" "std.error"
[6] "statistic" "p.value" "term" "estimate" "std.error"
[11] "statistic" "p.value"
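
The clash follows from how tidy works: for an lm fit it returns one row per coefficient, with the same five columns every time, whatever the response variable is. A minimal sketch on simulated data (the names toy, fit1 and fit2 are made up for illustration):

```r
library(broom)

set.seed(1)
toy <- data.frame(x = rnorm(50))
toy$y1 <- 3 - 2 * toy$x + rnorm(50)
toy$y2 <- -4 + 5 * toy$x + rnorm(50)

fit1 <- lm(y1 ~ x, data = toy)
fit2 <- lm(y2 ~ x, data = toy)

# Both tidied outputs carry identical column names,
# so placing them side by side produces duplicates
names(tidy(fit1))
names(tidy(fit2))
```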

We’ve got two sets of output, but with the same column names! This is a problem, because we can no longer tell which columns belong to which model. An easy solution would be to prefix the column names with the name of the response variable. I struggled with this today until I discovered the secret function: setNames.
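
For reference, setNames (from the stats package that ships with R) returns its first argument with a new set of names. A small sketch on a made-up data frame (res and its values are hypothetical stand-ins for one tidied regression output) shows the prefixing trick on its own:

```r
# A stand-in for one tidied regression output (values are made up)
res <- data.frame(term = c("(Intercept)", "x"),
                  estimate = c(3, -2))

# Prefix every column name with the response it belongs to
res_y1 <- setNames(res, paste("y1", names(res), sep = "_"))
names(res_y1)
# [1] "y1_term"     "y1_estimate"
```

The same call, wrapped in map, is what gets applied to each element of the list-columns.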

out_nice <- out %>%
  mutate(output1 = map(model1, tidy),
         output2 = map(model2, tidy),
         # Prefix each tidied output's column names with its response variable
         output1 = map(output1, ~setNames(., paste('y1', names(.), sep = '_'))),
         output2 = map(output2, ~setNames(., paste('y2', names(.), sep = '_')))) %>%
  select(-data, -model1, -model2) %>%
  unnest(cols = c(output1, output2))
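
With tidyr 1.0.0 or later, a possible alternative is to let unnest do the prefixing through its names_sep argument. The sketch below is not what this post used. It assumes a recent tidyr, rebuilds a smaller version of the toy data so that it runs on its own, and uses made-up object names (toy, out_alt). The prefix is taken from the list-column name, so the list-columns are called y1 and y2 here to end up with the same column names.

```r
library(tidyverse)
library(broom)

set.seed(102)
# Reduced toy data: 2 x 2 strata, each with its own nested data frame
toy <- expand_grid(Var1 = letters[1:2], Var2 = 1:2) %>%
  mutate(data = map(Var1, ~ {
    x <- rnorm(100)
    tibble(x = x, y1 = 3 - 2 * x + rnorm(100), y2 = -4 + 5 * x + rnorm(100))
  }))

out_alt <- toy %>%
  mutate(y1 = map(data, ~ tidy(lm(y1 ~ x, data = .x))),
         y2 = map(data, ~ tidy(lm(y2 ~ x, data = .x)))) %>%
  select(-data) %>%
  # names_sep prefixes each inner column with its list-column name
  unnest(cols = c(y1, y2), names_sep = "_")

names(out_alt)
```

The setNames approach has the advantage of not depending on the tidyr version, and it lets the prefix differ from the list-column name.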

This is a compact representation of the results of both regressions for each stratum, with two rows per stratum (intercept and slope), and we can extract the information we would like very easily. For example, to extract the stratum-specific slope estimates:

out_nice %>%
  filter(y1_term == 'x') %>%
  select(Var1, Var2, ends_with('estimate')) %>%
  knitr::kable(digits = 3, format = 'html')

Var1 Var2 y1_estimate y2_estimate

a	1	-1.897	5.036
b	1	-2.000	5.022
c	1	-1.988	4.888
d	1	-2.089	5.089
a	2	-2.052	5.015
b	2	-1.922	5.004
c	2	-1.936	4.969
d	2	-1.961	4.959
a	3	-2.043	5.017
b	3	-2.045	4.860
c	3	-1.996	5.009
d	3	-1.922	4.894
a	4	-2.000	4.942
b	4	-2.000	4.932
c	4	-2.033	5.042
d	4	-2.165	5.049
a	5	-2.094	5.010
b	5	-1.961	5.122
c	5	-2.106	5.153
d	5	-1.974	5.009
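
The original question was about p-values rather than estimates, and the same prefixed layout handles that too. A self-contained sketch, under a few assumptions: it rebuilds a reduced version of the toy data, assumes tidyr 1.0.0 or later, and uses a 0.05 cutoff (alpha) chosen only for illustration. The object names toy, res, slopes and significant are made up.

```r
library(tidyverse)
library(broom)

set.seed(102)
# Reduced toy data: 2 x 2 strata, each with its own nested data frame
toy <- expand_grid(Var1 = letters[1:2], Var2 = 1:2) %>%
  mutate(data = map(Var1, ~ {
    x <- rnorm(100)
    tibble(x = x, y1 = 3 - 2 * x + rnorm(100), y2 = -4 + 5 * x + rnorm(100))
  }))

# Fit both models per stratum, tidy them, and prefix the column names
res <- toy %>%
  mutate(output1 = map(data, ~ tidy(lm(y1 ~ x, data = .x))),
         output2 = map(data, ~ tidy(lm(y2 ~ x, data = .x))),
         output1 = map(output1, ~ setNames(.x, paste("y1", names(.x), sep = "_"))),
         output2 = map(output2, ~ setNames(.x, paste("y2", names(.x), sep = "_")))) %>%
  select(-data) %>%
  unnest(cols = c(output1, output2))

# Keep the slope rows and only the p-value columns
slopes <- res %>%
  filter(y1_term == "x") %>%
  select(Var1, Var2, ends_with("p.value"))

# Keep the strata where both slopes clear the (illustrative) cutoff
alpha <- 0.05
significant <- slopes %>%
  filter(y1_p.value < alpha, y2_p.value < alpha)
```

Because the simulated slopes are large relative to the noise, every stratum passes the filter here. With real data the filter conditions would be adapted to whichever outcomes and thresholds matter.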