Functional Programming helpers from purrr and friends, part 2 | Composition
Jan 2, 2019

## Background

In the first instalment of this blog series, I explored how some functional programming (FP) concepts are implemented in R using `purrr` and associated packages. This post extends the exploration to cover two closely related concepts: composition and pointfree style.

Note that the main focus of the blog series is FP concepts that feel natural to R and are in line with the design philosophy of `purrr`:

“The goal of purrr is not try and turn R into Haskell in R: it does not implement currying, or destructuring binds, or pattern matching. The goal is to give you similar expressiveness to a classical FP language, while allowing you to write code that looks and feels like R” - purrr vignette

``````library(tidyverse)
library(magrittr)
library(fs)``````

## Composition

Composition is a technique for combining small functions to form a new function.

Suppose `g` and `f` are unary functions which perform transformations on dataframes. To apply them in sequence, we can nest the calls so that they are evaluated from right to left (as is conventional in mathematics): `f` runs first, and `g` is applied to its result.

``g(f(data))``

This syntax works in R but the expression becomes difficult to read as the number of functions grows.
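To make the idea concrete before turning to the tidyverse tools, here is a minimal base R sketch. The functions `add_one` and `double` and the helper `compose2` are hypothetical names invented for this illustration; they stand in for `f` and `g` and work on numbers rather than dataframes to keep things short.

```r
# Two small unary functions standing in for f and g (hypothetical names).
add_one <- function(x) x + 1
double  <- function(x) x * 2

# Nested call: add_one runs first, then double is applied to its result.
nested <- double(add_one(3))

# A hand-rolled helper that composes two unary functions into a new one.
compose2 <- function(g, f) function(x) g(f(x))

double_after_add <- compose2(double, add_one)
composed <- double_after_add(3)

nested    # 8
composed  # 8
```

The composed function can be stored, passed around and reused, which is what nested calls alone do not give us.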

We will look at two ways of eliminating nested function calls using tidyverse implementations of composition.

### Motivating example

Imagine we have several small csv files on disk. We want to write a simple workflow to read data from the files and store them in a list of tibbles.
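To make the example reproducible, here is a minimal sketch that creates a few csv files in a temporary directory and reads them step by step, without any composition yet. The directory name, file names and file contents are made up for illustration, and `readr::read_csv()` is my assumption for the reading step.

```r
library(fs)
library(readr)
library(purrr)

# Hypothetical scratch directory so the sketch does not touch real data.
data_dir <- path(tempdir(), "fp-demo-data")
dir_create(data_dir)

# Write three tiny csv files (names and contents are invented).
walk(1:3, function(i) {
  write_csv(
    data.frame(id = i, value = i * 10),
    path(data_dir, paste0("file", i, ".csv"))
  )
})

# Step by step, with intermediate variables:
# 1. list the csv files, 2. read each one into a tibble.
csv_paths <- dir_ls(data_dir, regexp = "[.]csv$")
tibbles <- map(csv_paths, read_csv, col_types = cols())

length(tibbles)  # 3
```

The rest of the post is about expressing these steps as a single reusable function.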

### Composition with pipes

The forward pipe operator (`%>%`) from Stefan Bache’s `magrittr` package makes it possible to implement function composition in R. We achieve this by chaining small functions with `%>%` and replacing the initial object with the dot placeholder.
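As a minimal sketch of the mechanism on a toy numeric pipeline (the name `root_of_sum` is hypothetical), starting a pipe with the dot placeholder instead of a data object yields a function that can be called later:

```r
library(magrittr)

# No data on the left-hand side, only the dot placeholder:
# the result is a function (a functional sequence), not a value.
root_of_sum <- . %>% abs() %>% sum() %>% sqrt()

# Calling it is equivalent to the nested expression.
piped  <- root_of_sum(c(-9, 16))
nested <- sqrt(sum(abs(c(-9, 16))))

piped   # 5
nested  # 5
```

Both expressions compute `sqrt(sum(abs(x)))`, but the piped version reads in the order the steps are applied.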

This means we can solve the problem in our motivating example by composing `load_files()` with pipes:

``````load_files <- . %>%
  path() %>%
  dir_ls(regexp = "[.]csv$") %>%
  map(read_csv)``````

Stringing functions together with pipes and replacing the initial object with the dot placeholder creates a function in its own right. The composed function is a functional sequence.

We can inspect the contents of the functional sequence with `magrittr::functions()`:

``magrittr::functions(load_files)``
``````## [[1]]
## function (.)
## path(.)
##
## [[2]]
## function (.)
## dir_ls(., regexp = "[.]csv$")
##
## [[3]]
## function (.)
## map(., read_csv)``````

One nice feature of functional sequences is that they work with standard subsetting tools. This gives us the flexibility to apply an entire pipeline (or a subset of the pipeline) to an object.

For example, we can subset with single square brackets (`[`) to generate a new functional sequence:

``magrittr::functions(load_files[-3])``
``````## [[1]]
## function (.)
## path(.)
##
## [[2]]
## function (.)
## dir_ls(., regexp = "[.]csv$")``````

We can also subset with double square brackets (`[[`) to extract a single function:

``load_files[[1]]``
``````## function (.)
## path(.)``````

Which makes the following possible:

``````path_name <- "./data"

# construct path
load_files[[1]](path_name)
#> ./data

# list csv files in the directory
load_files[-3](path_name)
#> ./data/file1.csv ./data/file2.csv ./data/file3.csv

# read csv files and store in list
load_files(path_name)``````

### purrr composed functions

`purrr` provides a function for composition: `compose`. `compose` takes any number of functions and applies them in turn from right to left (by default). The order in which functions are applied can be reversed by setting the `.dir` argument to `"forward"`.
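The effect of the direction is easiest to see on a toy example where the order of application changes the result. This is a minimal sketch; `add_one` and `square` are hypothetical helper names.

```r
library(purrr)

add_one <- function(x) x + 1
square  <- function(x) x^2

# Default direction ("backward"): the last function listed runs first,
# so this is add_one(square(x)).
backward <- compose(add_one, square)

# .dir = "forward": functions run in the order they are listed,
# so this is square(add_one(x)).
forward <- compose(add_one, square, .dir = "forward")

backward(3)  # 10
forward(3)   # 16
```

With `.dir = "forward"` the composed function reads top to bottom, much like a pipeline.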

Using `purrr::compose`, the workflow becomes:

``````load_files <- compose(
  ~ path(.),
  ~ dir_ls(path = ., regexp = "[.]csv$"),
  ~ map(., read_csv),
  .dir = "forward"
)``````

Note that `purrr::compose` works with anonymous functions supplied as formulas.

``load_files``
``````## <composed>
## 1. <lambda>
## function (..., .x = ..1, .y = ..2, . = ..1)
## path(.)
## attr(,"class")
## [1] "rlang_lambda_function"
##
## 2. <lambda>
## function (..., .x = ..1, .y = ..2, . = ..1)
## dir_ls(path = ., regexp = "[.]csv$")
## attr(,"class")
## [1] "rlang_lambda_function"
##
## 3. <lambda>
## function (..., .x = ..1, .y = ..2, . = ..1)
## map(., read_csv)
## attr(,"class")
## [1] "rlang_lambda_function"``````

## Pointfree

### A bit more background

In functional programming, “points” refers to function arguments. Pointfree style (also called tacit programming) means defining functions without explicitly naming their arguments, relying on composition instead.
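A minimal sketch of the difference, using two hypothetical function names: the pointed version names its argument `x`, while the pointfree version is built purely from existing functions.

```r
library(purrr)

# Pointed: the argument x is named and threaded through by hand.
root_abs_pointed <- function(x) sqrt(abs(x))

# Pointfree: no argument is named; compose() applies abs first, then sqrt.
root_abs_pointfree <- compose(sqrt, abs)

root_abs_pointed(c(-16, 9))    # 4 3
root_abs_pointfree(c(-16, 9))  # 4 3
```

Both definitions behave the same; the pointfree one simply leaves the data out of the definition.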

Suppose we want to apply a series of transformations to a dataframe. The imperative way to do this in R would look like this:

``````df1 <- log(mtcars)
df2 <- tan(df1)
df3 <- sqrt(df2)
result <- abs(df3)``````

This approach has a number of drawbacks.

• We have to pass parameters everywhere.
• We need to think about naming intermediate variables, and naming is notoriously hard.

“There are only two hard things in Computer Science: cache invalidation and naming things” - Phil Karlton

Our example is simple, but in a real system this would mean passing a shipload of parameters and naming tons of variables.

In the following sections, we will look at two ways of simplifying the workflow with pointfree style.

### Sweet dreams are made of pipes

`%>%` is syntactic sugar for invoking multiple function calls on an object without needing to create variables to store intermediate results.

Using pipes, the series of data transformations becomes:

``````result <- mtcars %>%
log %>%
tan %>%
sqrt %>%
abs``````

What we are doing here is piping the dataset through the first function, then piping the result into the next function and so on.

### magrittr::freduce

`magrittr` has a function for this type of problem: `freduce`. `freduce` takes a list of functions, and applies the functions sequentially to the object.

``result <- freduce(mtcars, list(log, tan, sqrt, abs))``
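Conceptually, `freduce` folds a value through a list of functions. The sketch below checks this on a small numeric vector (chosen so that every step is well defined) and compares it with the piped version and with a base R `Reduce()` call. The variable names are hypothetical.

```r
library(magrittr)

steps <- list(log, tan, sqrt, abs)
x <- c(1, 2, 3)

# Apply each function in turn to the running result.
via_freduce <- freduce(x, steps)

# The same transformations written as a pipeline.
via_pipe <- x %>% log %>% tan %>% sqrt %>% abs

# A base R fold: start from x and feed the accumulator to each function.
via_reduce <- Reduce(function(acc, f) f(acc), steps, x)

identical(via_freduce, via_pipe)    # TRUE
identical(via_freduce, via_reduce)  # TRUE
```

Keeping the steps in a list makes it easy to build or modify the pipeline programmatically.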

## Wrapping up

We have covered the tidyverse implementation of two related functional programming techniques:

• Getting rid of nested function calls with composition
• Getting rid of arguments with pointfree

Remember that although the pipe makes code concise and easy to read, it is important not to take it too far. Where a pipeline involves a long sequence of operations, consider creating intermediate steps with meaningful names. Also consider using the pipe only when the workflow transforms one main object (for more on this, see the pipes chapter of R for Data Science).