Row-binding lists of stuff
The problem of needing to convert a single-level list of data into a single tibble comes up every
once in a while and I can never remember how I did it the last time I did it. So here we are! I
would also like to stay in the tidyverse where possible in order to keep the flow of my pipelines.
I'd say the only exception is the usage of paste0
over str_c
/glue
.
Packages and setup
library(tibble)
library(dplyr)
library(purrr)
library(ggplot2) # optional
library(tidytext) # optional
theme_set(theme_bw()) # optional
set.seed(20)
Lists of vectors
Recently, I needed to find the roots of arbitrary quadratic equations using the
polyroot
function. The
equation was of the form:
If the equation had two distinct roots, a vector of length two was returned, e.g.:
polyroot(c(6, 5, 1))
## [1] -2+0i -3-0i
If the equation had one distinct root or was not actually quadratic ( $a = 0$), a vector of length one was returned, e.g.:
polyroot(c(1, 2))
## [1] -0.5+0i
If the equation was inconsistent ( $a,\, b = 0,\, c \neq 0$), a vector of length zero was returned, e.g.:
polyroot(3)
## complex(0)
In addition to finding the root, I also wanted to know the value of $x_1$ where this root occurred.
Let's first create a function that will calculate the required quadratic coefficients, given the
input value x1
.
get_coefs <- function(x1) {
coef_a <- 0.002
coef_b <- 2*(-0.013)*x1 - 0.23
coef_c <- 1.08 + 1.23*x1 + 0.098*x1^2
c(coef_c, coef_b, coef_a)
}
Now, let's find some roots for some basic values of x1
.
some_roots <- seq(1, 1.60, length.out=7) %>%
map(get_coefs) %>%
map(polyroot)
head(some_roots)
## [[1]]
## [1] 10.22268-0i 117.77732+0i
##
## [[2]]
## [1] 10.76278-0i 118.53722+0i
##
## [[3]]
## [1] 11.30435-0i 119.29565+0i
##
## [[4]]
## [1] 11.84739-0i 120.05261+0i
##
## [[5]]
## [1] 12.39188-0i 120.80812+0i
##
## [[6]]
## [1] 12.93782-0i 121.56218+0i
To convert this data into a tibble with all the roots in one column, we should map as_tibble_col
over all list elements.
some_roots %>%
head() %>%
map_dfr(as_tibble_col, column_name="root")
## # A tibble: 12 x 1
## root
## <cpl>
## 1 10.22268-0i
## 2 117.77732+0i
## 3 10.76278-0i
## 4 118.53722+0i
## 5 11.30435-0i
## 6 119.29565+0i
## 7 11.84739-0i
## 8 120.05261+0i
## 9 12.39188-0i
## 10 120.80812+0i
## 11 12.93782-0i
## 12 121.56218+0i
While this works well in row-binding the roots into a single column tibble, it's unclear which roots
belong to which values of x1
. In addition, it's not a simple as just column-binding back the
initial values of x1
that we passed in since some roots are of length one and some are of length
two.
The solution is to give the intial sequence of x1
values names that match its values and make use
of map_dfr
's .id
argument, e.g.
seq(-13, 33, by=0.10) %>%
set_names(., .) %>%
head()
## -13 -12.9 -12.8 -12.7 -12.6 -12.5
## -13.0 -12.9 -12.8 -12.7 -12.6 -12.5
Putting it all together:
some_roots <- seq(-13, 33, by=0.10) %>%
set_names(., .) %>%
map(get_coefs) %>%
map(polyroot) %>%
map_dfr(as_tibble_col, column_name="root", .id="x1")
some_roots
## # A tibble: 922 x 2
## x1 root
## <chr> <cpl>
## 1 -13 -27.00000+9.84886i
## 2 -13 -27.00000-9.84886i
## 3 -12.9 -26.35000+8.14049i
## 4 -12.9 -26.35000-8.14049i
## 5 -12.8 -25.70000+5.97244i
## 6 -12.8 -25.70000-5.97244i
## 7 -12.7 -25.05000+2.28199i
## 8 -12.7 -25.05000-2.28199i
## 9 -12.6 -19.38801-0.00000i
## 10 -12.6 -29.41199+0.00000i
## # ... with 912 more rows
Note that at this point, x1
is a column of type character, not numeric. Coercion of x1
to
numeric will be required for plotting.
Optional plots of real roots
real_roots <- some_roots %>%
mutate(root = zapsmall(root)) %>%
filter(Im(root) == 0) %>%
mutate(
root = Re(root),
x1 = as.numeric(x1)
)
real_roots
## # A tibble: 900 x 2
## x1 root
## <dbl> <dbl>
## 1 -12.6 -19.4
## 2 -12.6 -29.4
## 3 -12.5 -16.3
## 4 -12.5 -31.2
## 5 -12.4 -13.9
## 6 -12.4 -32.3
## 7 -12.3 -11.7
## 8 -12.3 -33.2
## 9 -12.2 -9.76
## 10 -12.2 -33.8
## # ... with 890 more rows
ggplot(real_roots, aes(x=x1, y=root)) +
geom_point()
Lists of matrices with rownames
Let's first generate some data.
phi <- function(n) {
out <- matrix(
runif(n),
dimnames = list(
stringi::stri_rand_strings(
n = n,
length = sample(3:8, size=n, replace=TRUE),
pattern = "[a-z]"
),
"probability"
)
)
out / colSums(out)
}
topic_term_prob <- rep(5, 3) %>%
set_names(., paste0("topic", 1:length(.))) %>%
map(phi)
topic_term_prob
## $topic1
## probability
## sta 0.25680383
## tfli 0.22490878
## chvmalc 0.08163767
## gbxzbril 0.15485806
## venmmxqg 0.28179166
##
## $topic2
## probability
## mhy 0.25738923
## dnalha 0.01656447
## kcybybw 0.19192038
## pdoakeox 0.20242189
## knmrhfx 0.33170403
##
## $topic3
## probability
## wwt 0.02444428
## jawymz 0.37437258
## orxtty 0.28268967
## nfxzi 0.17689973
## vfbql 0.14159375
I'd like to convert this into a single tibble via row-binding, while moving the rownames into its own column, and also creating another column to keep track of the topic from which these topic-term distributions originated.
The as_tibble
method for matrices indeed supports the rownames
argument, allowing input of a
single string for the name of new column of rownames. This is not really documented on the
as_tibble
reference under
# S3 method for matrix
, but is documented under
as_tibble.matrix should keep rownames attribute #288
.
For the topic numbers, we can once again make use of the .id
argument of map_dfr
.
topic_term_prob_tibble <- topic_term_prob %>%
map_dfr(as_tibble, rownames="term", .id="topic")
topic_term_prob_tibble
## # A tibble: 15 x 3
## topic term probability
## <chr> <chr> <dbl>
## 1 topic1 sta 0.257
## 2 topic1 tfli 0.225
## 3 topic1 chvmalc 0.0816
## 4 topic1 gbxzbril 0.155
## 5 topic1 venmmxqg 0.282
## 6 topic2 mhy 0.257
## 7 topic2 dnalha 0.0166
## 8 topic2 kcybybw 0.192
## 9 topic2 pdoakeox 0.202
## 10 topic2 knmrhfx 0.332
## 11 topic3 wwt 0.0244
## 12 topic3 jawymz 0.374
## 13 topic3 orxtty 0.283
## 14 topic3 nfxzi 0.177
## 15 topic3 vfbql 0.142

Optional plots of topic term distributions
topic_term_prob_tibble <- topic_term_prob_tibble %>%
mutate(term = reorder_within(term, by=probability, within=topic))
ggplot(topic_term_prob_tibble, aes(x=probability, y=term, fill=topic)) +
geom_col(show.legend=FALSE) +
scale_y_reordered() +
facet_wrap(~topic, scales="free_y")