Motivation
This series of blog posts is inspired by David Robinson’s tweet:
“When you’ve given the same in-person advice 3 times, write a blog post.”
In each instalment in the series, I will walk through simple scenarios to illustrate how functional programming tools from purrr and related packages can bring quality-of-life improvements to tidyverse workflows.
library(tidyverse)
library(magrittr)
library(kableExtra)
# Helper that renders a dataframe as a styled table
pretty_print <- function(df) {
  result <- df %>%
    kable() %>%
    kable_styling(font_size = 14) %>%
    row_spec(0, bold = TRUE, font_size = 14)
  return(result)
}
Rowwise operations on all columns in a dataframe
Background
We will use a small subset of the planes dataset to illustrate this example.
df <- nycflights13::planes %>%
  select(tailnum, year, engine) %>%
  head(4)
df %>% pretty_print()
tailnum | year | engine |
---|---|---|
N10156 | 2004 | Turbo-fan |
N102UW | 1998 | Turbo-fan |
N103US | 1999 | Turbo-fan |
N104UW | 1999 | Turbo-fan |
Suppose we want to create a new column by concatenating elements in each row of the dataframe to form a string.
Base R provides a functional (a function which takes one or more functions as arguments) for performing this type of operation: apply. apply belongs to a special group of functionals: the map family (functions that take a function and a list as inputs, and return a new list with the function applied to each element of the list).
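To make the definition concrete, here is a minimal base R sketch. apply_twice is a made-up helper written only for this illustration; lapply is the simplest base R member of the map family.

```r
# A hand-written functional: it takes a function `f` as an argument.
# `apply_twice` is a hypothetical helper, not part of any package.
apply_twice <- function(f, x) f(f(x))

apply_twice(sqrt, 16)
# → 2

# lapply() takes a list and a function, and returns a new list
# with the function applied to each element
squares <- lapply(list(1, 2, 3), function(x) x^2)
```

squares is a list holding 1, 4 and 9, one result per input element.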
How does it work?
Under the hood, apply inspects the object to which the function will be applied and coerces it to a matrix if the object is two-dimensional (e.g. a dataframe). Otherwise, the object is coerced to an array. After this step, the function is applied to either the rows (MARGIN = 1) or the columns (MARGIN = 2) of the matrix or array.
This means that rowwise string concatenation in base R will look like this:
df_base <- df
df_base$id <- apply(df, 1, paste, collapse = " ")
pretty_print(df_base)
tailnum | year | engine | id |
---|---|---|---|
N10156 | 2004 | Turbo-fan | N10156 2004 Turbo-fan |
N102UW | 1998 | Turbo-fan | N102UW 1998 Turbo-fan |
N103US | 1999 | Turbo-fan | N103US 1999 Turbo-fan |
N104UW | 1999 | Turbo-fan | N104UW 1999 Turbo-fan |
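The coercion step has a side effect that is easy to miss in the example above, because every year has four digits. The sketch below uses a made-up dataframe (toy is an assumption for illustration, not part of the planes data) to show that apply works on a character matrix, and that numeric columns are formatted to a common width before the function sees them.

```r
# Hypothetical toy data with one character and one numeric column
toy <- data.frame(
  label = c("a", "b"),
  n = c(1, 10),
  stringsAsFactors = FALSE
)

# apply() converts the dataframe with as.matrix(); mixed column types
# end up in a single character matrix
m <- as.matrix(toy)
typeof(m)
# → "character"

# Numeric columns are passed through format(), which pads the values
# to a common width, so 1 becomes " 1"
padded <- unname(m[, "n"])

# The padding survives into the concatenated strings
row_strings <- unname(apply(toy, 1, paste, collapse = " "))
# → "a  1" "b 10"
```

If the extra whitespace matters, it is safer to convert the columns to character yourself before calling apply.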
How can we implement this operation with tidyverse tools?
We can solve this with purrr::pmap in a dplyr::mutate call. pmap is a member of purrr’s implementation of the map family of functions; it allows vectorized iteration over more than one argument.
If you are new to purrr, the iteration chapter of R for Data Science is worth reading.
We will solve the problem in our example in two steps:
1. Create an anonymous function which performs string concatenation on any number of arguments passed to it.
2. Pass the anonymous function as an argument to pmap_chr.
pmap_chr is a stricter variant of pmap which always returns a character vector. This ensures that there are no surprises in our output as a result of R’s weak type system.
df %>%
  mutate(id = pmap_chr(., ~paste(..., collapse = " "))) %>%
  pretty_print()
tailnum | year | engine | id |
---|---|---|---|
N10156 | 2004 | Turbo-fan | N10156 2004 Turbo-fan |
N102UW | 1998 | Turbo-fan | N102UW 1998 Turbo-fan |
N103US | 1999 | Turbo-fan | N103US 1999 Turbo-fan |
N104UW | 1999 | Turbo-fan | N104UW 1999 Turbo-fan |
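To see what the stricter variant buys us, here is a small sketch comparing the return types of pmap and pmap_chr. The two-row dataframe toy is a hypothetical stand-in for the planes subset.

```r
library(purrr)

# Hypothetical stand-in for the planes subset
toy <- data.frame(
  tailnum = c("N10156", "N102UW"),
  year = c(2004L, 1998L),
  stringsAsFactors = FALSE
)

# pmap() always returns a list, with one element per row
as_list <- pmap(toy, ~ paste(...))

# pmap_chr() returns a character vector instead, which is the
# shape mutate() needs for an ordinary column
as_chr <- pmap_chr(toy, ~ paste(...))

is.list(as_list)
# → TRUE
is.character(as_chr)
# → TRUE
```

With plain pmap, the new column would be a list column rather than a column of strings.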
Notice that instead of passing an anonymous function to pmap, we passed a formula. purrr supports this syntax to make it possible for users to create very compact anonymous functions on the fly.
This works because, under the hood, pmap (like all purrr functionals) translates formulas into mapper functions using purrr::as_mapper. This means that the formula in our example will look like this behind the scenes:
as_mapper(~paste(..., collapse = " "))
## <lambda>
## function (..., .x = ..1, .y = ..2, . = ..1)
## paste(..., collapse = " ")
## attr(,"class")
## [1] "rlang_lambda_function"
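The mapper can also be called by hand, which makes it easier to see what happens for a single row. In this sketch (concat is just a name chosen for illustration) the values of one row arrive as separate arguments, so it is paste's default sep = " " that joins them. collapse only matters when paste receives a vector with more than one element, as it does in the apply version.

```r
library(purrr)

# Build the mapper explicitly; `concat` is a name chosen for this example
concat <- as_mapper(~ paste(..., collapse = " "))

# One row's values arrive as separate arguments
one_row <- concat("N10156", 2004, "Turbo-fan")
# → "N10156 2004 Turbo-fan"

# Changing sep changes the result ...
with_sep <- paste("N10156", 2004, "Turbo-fan", sep = "-")
# → "N10156-2004-Turbo-fan"

# ... while collapse has nothing to collapse here
with_collapse <- paste("N10156", 2004, "Turbo-fan", collapse = "-")
# → "N10156 2004 Turbo-fan"
```

So if you want a different separator in the pmap version, sep is the argument to change.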
Rowwise operations on specific columns in a dataframe
As a concrete example, imagine we want to create a new column by concatenating year and tailnum.
How do we achieve this with tidyverse tools?
mutate + pmap
There are a few ways of doing this with an approach similar to the mutate + pmap workflow from the previous example.
We can solve the problem with a neat trick, taking a leaf out of Hadley Wickham’s book. This approach uses list(), instead of the dot placeholder, to pass only the columns we need to the anonymous function. Because the list is unnamed, its elements are matched to the function’s arguments by position.
df %>%
  mutate(id = pmap_chr(list(year, tailnum), ~paste(..., collapse = " "))) %>%
  pretty_print()
tailnum | year | engine | id |
---|---|---|---|
N10156 | 2004 | Turbo-fan | 2004 N10156 |
N102UW | 1998 | Turbo-fan | 1998 N102UW |
N103US | 1999 | Turbo-fan | 1999 N103US |
N104UW | 1999 | Turbo-fan | 1999 N104UW |
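If the list elements are given names, pmap_chr matches them to the function’s arguments by name rather than by position. The sketch below is a variation on the example above, not the only way to do it. The tibble toy and the names yr and tail are assumptions made for illustration.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for the planes subset
toy <- tibble(
  tailnum = c("N10156", "N102UW"),
  year = c(2004L, 1998L),
  engine = c("Turbo-fan", "Turbo-fan")
)

by_name <- toy %>%
  mutate(id = pmap_chr(
    # Named list: each element is passed as a named argument
    list(yr = year, tail = tailnum),
    # Argument order differs from the list order on purpose;
    # matching is by name, so the result is unaffected
    function(tail, yr) paste(yr, tail)
  ))

by_name$id
# → "2004 N10156" "1998 N102UW"
```

Named arguments make the function body easier to read once more than two or three columns are involved.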
rap
While writing up this post, I stumbled upon Romain Francois’ rap package, which provides a nice alternative to the pmap + mutate approach to row-oriented operations.
rap, like map, works with anonymous functions supplied as formulas. The main difference is that with rap the anonymous functions can directly use column names.
library(rap)
df %>%
  rap(id = character() ~ paste(year, tailnum, collapse = " ")) %>%
  pretty_print()
tailnum | year | engine | id |
---|---|---|---|
N10156 | 2004 | Turbo-fan | 2004 N10156 |
N102UW | 1998 | Turbo-fan | 1998 N102UW |
N103US | 1999 | Turbo-fan | 1999 N103US |
N104UW | 1999 | Turbo-fan | 1999 N104UW |
Note that the left-hand side (lhs) of the formula specifies the type of the results returned. If the lhs is empty, rap adds a list column to the dataframe.