In the first instalment of this blog series, I explored how some functional programming (FP) concepts are implemented in R using
purrr and associated packages. This post extends the exploration to cover two closely related concepts: composition and pointfree style.
Note that the main focus of the blog series is FP concepts that feel natural to R and are in line with the design philosophy of
“The goal of purrr is not to try and turn R into Haskell: it does not implement currying, or destructuring binds, or pattern matching. The goal is to give you similar expressiveness to a classical FP language, while allowing you to write code that looks and feels like R” - purrr vignette
```r
library(tidyverse)
library(magrittr)
library(fs)
```
Composition is a technique for combining small functions to form a new function.
Suppose f1, f2, and f3 are unary functions which perform transformations on dataframes. If we are interested in the sequential application of these functions, we can create a new function by applying them from right to left (as is conventional in mathematics) with nested calls.
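As a sketch of what nested composition looks like (f1, f2, and f3 here are hypothetical stand-ins for the dataframe transformations):

```r
# Hypothetical unary transformations standing in for dataframe operations
f1 <- function(x) round(x, 1)
f2 <- function(x) sqrt(x)
f3 <- function(x) abs(x)

# Nested calls read inside-out: f3 runs first, then f2, then f1
result <- f1(f2(f3(c(-4, 9))))
result
#> [1] 2 3
```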
This syntax works in R but the expression becomes difficult to read as the number of functions grows.
We will look at two ways of eliminating nested function calls using tidyverse implementations of composition.
Imagine we have several small csv files on disk. We want to write a simple workflow to read data from the files and store them in a list of tibbles.
Composition with pipes
The forward pipe operator (%>%) from Stefan Bache’s magrittr package makes it possible to implement function composition in R. We achieve this by chaining small functions with %>% and replacing the initial object with the dot placeholder.
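For example, starting a chain with the dot placeholder creates a reusable unary function (root_of_abs is a made-up name for illustration):

```r
library(magrittr)

# The leading dot turns the chain into a function instead of evaluating it
root_of_abs <- . %>% abs %>% sqrt

root_of_abs(-16)
#> [1] 4
```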
This means we can solve the problem in our motivating example by composing a load_files() function with pipes:
```r
load_files <- . %>%
  path() %>%
  dir_ls(regexp = "[.]csv$") %>%
  map(read_csv)
```
Stringing functions together with pipes and replacing the initial object with the dot placeholder creates a function in its own right. The composed function is a functional sequence.
We can inspect the contents of the functional sequence with magrittr::functions():
```r
## [[1]]
## function (.)
## path(.)
##
## [[2]]
## function (.)
## dir_ls(., regexp = "[.]csv$")
##
## [[3]]
## function (.)
## map(., read_csv)
```
One nice feature of functional sequences is that they work with standard subsetting tools. This gives us the flexibility to apply an entire pipeline (or a subset of the pipeline) to an object.
For example, we can use a single square bracket subset (e.g. load_files[1:2]) to generate a new functional sequence:
```r
## [[1]]
## function (.)
## path(.)
##
## [[2]]
## function (.)
## dir_ls(., regexp = "[.]csv$")
```
We can also use a double square bracket subset (e.g. load_files[[1]]) to generate a single function:
```r
## function (.)
## path(.)
```
This makes the following possible:
```r
path_name <- "./data"

# construct path
load_files[1](path_name)
#> ./data

# list csv files in the directory
load_files[-3](path_name)
#> ./data/file1.csv ./data/file2.csv ./data/file3.csv

# read csv files and store in list
data_list <- load_files(path_name)
```
purrr composed functions
purrr provides a function for composition: compose() takes any number of functions and applies them in turn from right to left (by default). The order in which functions are applied can be reversed by setting the .dir argument to “forward”.
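A quick sketch of the ordering with toy functions (nothing here is specific to our file-loading example):

```r
library(purrr)

# Default direction is "backward": the right-most function runs first,
# so abs() is applied before sqrt()
f <- compose(sqrt, abs)
f(-16)
#> [1] 4

# The same pipeline written left to right with .dir = "forward"
g <- compose(abs, sqrt, .dir = "forward")
g(-16)
#> [1] 4
```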
Using purrr::compose, the workflow becomes:
```r
load_files <- compose(
  ~ path(.),
  ~ dir_ls(path = ., regexp = "[.]csv$"),
  ~ map(., read_csv),
  .dir = "forward"
)
```
purrr::compose works with anonymous functions supplied as formulas.
```r
## <composed>
## 1. <lambda>
## function (..., .x = ..1, .y = ..2, . = ..1)
## path(.)
## attr(,"class")
## [1] "rlang_lambda_function"
##
## 2. <lambda>
## function (..., .x = ..1, .y = ..2, . = ..1)
## dir_ls(path = ., regexp = "[.]csv$")
## attr(,"class")
## [1] "rlang_lambda_function"
##
## 3. <lambda>
## function (..., .x = ..1, .y = ..2, . = ..1)
## map(., read_csv)
## attr(,"class")
## [1] "rlang_lambda_function"
```
A bit more background
In functional programming, points refer to function arguments. Pointfree style (also called tacit programming) means writing functions without explicitly mentioning their arguments, typically by using composition.
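As a minimal sketch of the difference (the function names are invented for illustration):

```r
library(purrr)

# Pointful: the argument x appears explicitly in the definition
root_of_abs_pointful <- function(x) sqrt(abs(x))

# Pointfree: no argument is named; the function is defined purely
# as a composition of other functions
root_of_abs_pointfree <- compose(sqrt, abs)

root_of_abs_pointful(-25)
#> [1] 5
root_of_abs_pointfree(-25)
#> [1] 5
```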
Suppose we want to apply a series of transformations to a dataframe. The imperative way to do this in R would look something like this:
```r
df1 <- log(mtcars)
df2 <- tan(df1)
df3 <- sqrt(df2)
result <- abs(df3)
```
This approach has a number of drawbacks.
- We have to pass parameters everywhere
- We need to think about naming intermediate variables, but naming is notoriously hard
“There are only two hard things in Computer Science: cache invalidation and naming things” - Phil Karlton
Our example is simple, but in a real system this would mean passing a ship-load of parameters and naming tons of variables.
In the following sections, we will look at two ways of simplifying the workflow with pointfree style.
Sweet dreams are made of pipes
The pipe operator %>% is syntactic sugar for invoking multiple function calls on an object without needing to create variables to store intermediate results.
Using pipes, the series of data transformations becomes:
```r
result <- mtcars %>% log %>% tan %>% sqrt %>% abs
```
What we are doing here is piping the dataset through the first function, then piping the result into the next function and so on.
magrittr has a function for this type of problem: freduce() takes a list of functions and applies them sequentially to the object.
```r
result <- freduce(mtcars, list(log, tan, sqrt, abs))
```
We have covered tidyverse implementations of two related functional programming techniques:
- Getting rid of nested function calls with composition
- Getting rid of arguments with pointfree style
Remember that although the pipe makes code concise and easy to read, it is important not to take it too far. Where a pipeline involves a long sequence of operations, consider creating intermediate objects with meaningful names. Also consider using the pipe only when the workflow transforms one main object (for more on this, see the pipes chapter of R for Data Science).