Functional Programming helpers from purrr and friends, part 2 | Composition
Jan 2, 2019
Ernest Omane-Kodie

Background

In the first instalment of this blog series, I explored how some functional programming (FP) concepts are implemented in R using purrr and associated packages. This post extends the exploration to cover two closely related concepts: composition and pointfree style.

Note that the main focus of the blog series is FP concepts that feel natural to R and are in line with the design philosophy of purrr:

“The goal of purrr is not to try and turn R into Haskell: it does not implement currying, or destructuring binds, or pattern matching. The goal is to give you similar expressiveness to a classical FP language, while allowing you to write code that looks and feels like R” - purrr vignette

library(tidyverse)
library(magrittr)
library(fs)

Composition

Composition is a technique for combining small functions to form a new function.

Suppose g and f are unary functions which perform transformations on dataframes. If we want to apply f first and then g, we can combine them from right to left (as is conventional in mathematics) with nested calls:

g(f(data))

This syntax works in R but the expression becomes difficult to read as the number of functions grows.
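
To make the readability problem concrete, here is a minimal sketch using the built-in mtcars data. The helpers f and g below are hypothetical stand-ins chosen for illustration, not functions from any package:

```r
# f and g are hypothetical unary helpers: each takes and returns a data frame.
f <- function(df) df[df$cyl == 4, ]      # keep the four-cylinder cars
g <- function(df) df[order(df$mpg), ]    # sort rows by fuel economy

# Nested call: read it inside out, f runs first and g second.
result <- g(f(mtcars))

nrow(result)
```

With only two functions the call is still readable, but every additional step adds another layer of parentheses that has to be read from the inside out.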

We will look at two ways of eliminating nested function calls using tidyverse implementations of composition.

Motivating example

Imagine we have several small csv files on disk. We want to write a simple workflow to read data from the files and store them in a list of tibbles.

Composition with pipes

The forward pipe operator (%>%) from Stefan Bache’s magrittr package makes it possible to implement function composition in R. We achieve this by chaining small functions with %>% and replacing the initial object with the dot placeholder.

This means we can solve the problem in our motivating example by composing load_files() with pipes:

load_files <- . %>% 
  path() %>% 
  dir_ls(regexp = "[.]csv$") %>% 
  map(read_csv)

Stringing functions together with pipes and replacing the initial object with the dot placeholder creates a function in its own right. The composed function is a functional sequence.

We can inspect the contents of the functional sequence with magrittr::functions():

magrittr::functions(load_files)
## [[1]]
## function (.) 
## path(.)
## 
## [[2]]
## function (.) 
## dir_ls(., regexp = "[.]csv$")
## 
## [[3]]
## function (.) 
## map(., read_csv)
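
The same mechanics can be tried without any files on disk. The sketch below builds a small functional sequence from base R functions only; word_lengths is a hypothetical name used for illustration:

```r
library(magrittr)

# A pipeline that starts with the dot placeholder is a function, not a value.
word_lengths <- . %>%
  strsplit(" ") %>%
  unlist() %>%
  nchar()

class(word_lengths)                      # includes "fseq"

# Calling the functional sequence runs every step in order.
lengths_found <- word_lengths("small composable functions")

length(functions(word_lengths))          # one entry per step in the pipeline
```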

One nice feature of functional sequences is that they work with standard subsetting tools. This gives us the flexibility to apply an entire pipeline (or a subset of the pipeline) to an object.

For example, we can use a single square bracket subset to generate a new functional sequence:

magrittr::functions(load_files[-3])
## [[1]]
## function (.) 
## path(.)
## 
## [[2]]
## function (.) 
## dir_ls(., regexp = "[.]csv$")

We can also use a double square bracket subset to generate a single function:

load_files[[1]]
## function (.) 
## path(.)

This makes the following possible:

path_name <- "./data"

# construct path
load_files[[1]](path_name) 
#> ./data

# list csv files in the directory
load_files[-3](path_name)
#> ./data/file1.csv ./data/file2.csv ./data/file3.csv 

# read csv files and store in list
data_list <- load_files(path_name)
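
The calls above assume a ./data directory that already contains csv files. A self-contained way to try them is sketched below, assuming a throwaway directory with two tiny files is acceptable. The directory name, file names and contents (demo_dir, file1.csv, file2.csv) are made up for illustration:

```r
library(magrittr)
library(fs)
library(purrr)
library(readr)

# The same functional sequence as defined earlier
load_files <- . %>%
  path() %>%
  dir_ls(regexp = "[.]csv$") %>%
  map(read_csv)

# Hypothetical demo data: two small csv files in a temporary directory
demo_dir <- file.path(tempdir(), "fp-demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(head(mtcars, 3), file.path(demo_dir, "file1.csv"), row.names = FALSE)
write.csv(head(iris, 3), file.path(demo_dir, "file2.csv"), row.names = FALSE)

csv_paths <- load_files[-3](demo_dir)   # first two steps only: list the files
data_list <- load_files(demo_dir)       # full pipeline: a list of tibbles
```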

purrr composed functions

purrr provides a function for composition: compose(). It takes any number of functions and applies them in turn, from right to left by default. The order of application can be reversed by setting the .dir argument to “forward”.

We can create a purrr composed function for reading csv files from disk with:

load_files <- compose(
  ~ path(.),
  ~ dir_ls(path = ., regexp = "[.]csv$"),
  ~ map(., read_csv),
  .dir = "forward"
)

Note that purrr::compose works with anonymous functions supplied as formulas.

load_files
## <composed>
## 1. <lambda>
## function (..., .x = ..1, .y = ..2, . = ..1) 
## path(.)
## attr(,"class")
## [1] "rlang_lambda_function"
## 
## 2. <lambda>
## function (..., .x = ..1, .y = ..2, . = ..1) 
## dir_ls(path = ., regexp = "[.]csv$")
## attr(,"class")
## [1] "rlang_lambda_function"
## 
## 3. <lambda>
## function (..., .x = ..1, .y = ..2, . = ..1) 
## map(., read_csv)
## attr(,"class")
## [1] "rlang_lambda_function"
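
A small sketch with two hypothetical arithmetic helpers (add_one and double are illustrative names, not part of purrr) shows how the .dir argument changes the order of application:

```r
library(purrr)

add_one <- function(x) x + 1
double  <- function(x) x * 2

# Default direction is "backward": the rightmost function runs first.
backward <- compose(add_one, double)                    # add_one(double(x))

# "forward" applies the functions in the order they are listed.
forward  <- compose(add_one, double, .dir = "forward")  # double(add_one(x))

backward(3)  # 7
forward(3)   # 8
```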

Pointfree

A bit more background

In functional programming, points refer to function arguments. Pointfree style (also called tacit programming) means defining functions by composition, without explicitly mentioning the arguments they operate on.
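
As a minimal illustration, the two definitions below compute the same thing; count_unique and count_unique_pf are hypothetical names. The first mentions its argument x explicitly, while the second never names an argument because it is built purely by composition:

```r
library(purrr)

# Pointful: the argument x is named and passed along by hand.
count_unique <- function(x) length(unique(x))

# Pointfree: no argument is mentioned; compose() runs unique first, then length.
count_unique_pf <- compose(length, unique)

count_unique(c("a", "b", "a"))     # 2
count_unique_pf(c("a", "b", "a"))  # 2
```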

Suppose we want to apply a series of transformations to a dataframe. The imperative way to do this in R looks like this:

df1 <- log(mtcars)
df2 <- tan(df1)
df3 <- sqrt(df2)
result <- abs(df3)

This approach has two drawbacks:

  • We have to pass arguments explicitly at every step
  • We need to name every intermediate variable, and naming is notoriously hard

“There are only two hard things in Computer Science: cache invalidation and naming things” - Phil Karlton

Our example is simple but in a real system, this would mean passing a ship-load of parameters and naming tons of variables.

How do we correct this?

In the following sections, we will look at two ways of simplifying the workflow with pointfree style.

Sweet dreams are made of pipes

%>% is syntactic sugar for invoking multiple function calls on an object without needing to create variables to store intermediate results.

This means we can use pipes to rewrite the imperative solution as a single expression, which makes the data flow clear and needs no intermediate variables:

result <- mtcars %>%
  log %>% 
  tan %>% 
  sqrt %>% 
  abs

What we are doing here is piping the dataset through the first function, then piping the result into the next function and so on.
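
One way to check this reading of the pipe, sketched here with a small numeric vector rather than mtcars, is to compare a piped expression with its nested equivalent:

```r
library(magrittr)

x <- c(1, 4, 9)

piped  <- x %>% sqrt %>% sum   # left to right: sqrt first, then sum
nested <- sum(sqrt(x))         # the same steps written inside out

identical(piped, nested)       # TRUE
```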

magrittr::freduce

magrittr has a function for this type of problem: freduce(). It takes a value and a list of functions, and applies the functions to the value in sequence.

result <- freduce(mtcars, list(log, tan, sqrt, abs))
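
Conceptually, freduce() behaves like a fold over the list of functions. The sketch below compares it with base R's Reduce() on a small numeric vector; the names steps and input and the chosen functions are purely illustrative:

```r
library(magrittr)

steps <- list(log, sqrt, round)
input <- c(1, 10, 100)

via_freduce <- freduce(input, steps)

# Equivalent fold in base R: start from input, apply each function in turn.
via_reduce <- Reduce(function(value, f) f(value), steps, input)

identical(via_freduce, via_reduce)  # TRUE
```

Keeping the steps in a list makes it possible to build or modify the pipeline programmatically before applying it.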

Wrapping up

We have covered the tidyverse implementation of two related functional programming techniques:

  • Getting rid of nested function calls with composition
  • Getting rid of arguments with pointfree

Remember that although the pipe makes code concise and easy to read, it is important not to take it too far. Where a pipeline has a long sequence of operations, consider creating intermediate steps with meaningful names. Also consider using the pipe only when the workflow transforms one main object (for more on this, see the pipes chapter of R for Data Science).

References