Functional Programming helpers from purrr and friends, part 3
Mar 23, 2019
Ernest Omane-Kodie
4 minute read

via GIPHY

Background

In the first and second instalments of this blog series, we touched on some functional programming concepts and their implementation in purrr and related packages. This instalment continues the exploration, showing techniques for writing functions which are capable of creating new functions: function factories, partial application and quasiquotation.

Let’s begin by loading the packages we need.

library(purrr)
library(rlang)
library(fs)
library(magrittr)
library(vroom)
library(readr)

Motivating example

We will illustrate new concepts by adapting the simple file reading problem from the previous instalment.

Recapping the example: Imagine we have several flat files in our current working directory.

mtcars %>% 
  split(.$carb) %>% 
  iwalk(~ readr::write_csv(.x, glue::glue("mtcars_{.y}.csv")))

We want to explore ways of creating functions to read these files into a list of dataframes.

A first attempt

Our first go at the problem using fs and purrr might look like this:

read_many_csv_files <- function(folder_path){
  # extract file names
  file_names <- path(folder_path) %>% 
    dir_ls(type = "file", glob = "*.csv") %>%
    map_chr(fs::path_file) 
  
  # load data
  out <- path(folder_path) %>% 
    dir_ls(regexp = "[.]csv$") %>% 
    map(readr::read_csv) %>% 
    set_names(file_names)
  return(out)
}
# read csv files
read_many_csv_files(".")

This is all well and good, but imagine further that we don’t want to restrict our function to only use readr::read_csv().

How do we extend our workflow to accommodate any csv reader?

In the sections that follow, we will walk through different ways of creating general-purpose versions of read_many_csv_files().

Function factories

Our second attempt at the motivating problem begins with a function factory (a function for creating other functions):

read_many_csv_files <- function(.f){
  force(.f)
  function(folder_path){
    # extract file names
    file_names <- path(folder_path) %>% 
      dir_ls(type = "file", glob = "*.csv") %>% 
      map_chr(fs::path_file) 
    
    # load data
    out <- path(folder_path) %>% 
      dir_ls(regexp = ".csv") %>% 
      map(.f) %>% 
      set_names(file_names)
    return(out)
  }
}

This gives us a version of read_many_csv_files() which is versatile enough to let us take different file readers (including Jim Hester’s awesome vroom package) for a spin.

read_with_vroom <- read_many_csv_files(vroom::vroom)
read_with_readr <- read_many_csv_files(readr::read_csv)
read_with_vroom(".")
read_with_readr(".")

What’s happening under the hood?

When read_with_vroom() is created, it captures the current environment which happens to be the enclosing environment of the function factory. Every time read_with_vroom() is called, it creates a fresh execution environment which is the enclosing environment of the manufactured function.

This makes lexical scoping possible: for instance since vroom::vroom() is not in the enclosing enviroment of read_with_vroom(), R will look for it in the parent environment - See Adv. R for more information

Partial application

In programming and in mathematics, function application means applying a function to its arguments. Partial function application means pre-filling one or more arguments of a function, to produce a new function with a fewer number of arguments.

purrr implements this technique with partial(). We can create a customisable csv reader with purrr::partial() in two steps.

First, we create a function which takes two arguments:

read_many_csv_files <- function(.f, file){
  # extract file names
  file_names <- path(file) %>% 
    dir_ls(type = "file", glob = "*.csv") %>%
    map(fs::path_file) 
  
  # load data
  out <- path(file) %>% 
    dir_ls(regexp = "[.]csv$") %>% 
    map(.f) %>% 
    set_names(file_names)
  return(out)
}

Next, we pre-fill .f with our file reader of choice. For example, if we’re feeling super excited about utils::read.csv we could pass that to read_many_csv_files():

read_with_base <- partial(read_many_csv_files, .f = read.csv)
read_with_base(".")

Quasiquotation

Quasiquotation involves using two techniques: quoting (i.e. capturing an expression to be evaluated later) and unquoting (i.e. evaluating parts of an otherwise quoted expression).

rlang provides new_function() which constructs a new function given its three components: list of arguments, body code and parent environment (the caller environment by default).

When combined with quasiquotation, new_function() can be used as an alternative to function factories. For example, we could write a customisable csv reader function:

read_many_csv_files <- function(.f){
  rlang::new_function(
    args = exprs(folder_path =),
    body = expr({
      file_names <- path(folder_path) %>% 
        ddir_ls(type = "file", glob = "*.csv") %>% 
        map_chr(fs::path_file) 
      
      out <- path(folder_path) %>% 
        dir_ls(regexp = "[.]csv$") %>% 
        map(!!.f) %>% 
        set_names(file_names)
    }) ,
    env = caller_env()
  )
}

And use it to generate a function to read csv files with vroom::vroom():

read_with_vroom <- read_many_csv_files(vroom::vroom)
read_with_vroom(".")

How does it work?

As with function factories, when we create read_with_vroom(), it captures the execution environment of the function that created it. When we invoke read_with_vroom(), it creates a new execution environment of its own.

In our example, the body argument of rlang::new_function() captures (quotes) the multiline expression which represents code inside the manufactured function. At the moment of quoting these expressions, .f is known (i.e. can be looked up from the caller environment). We use rlang::!! to evaluate (unquote) .f before we capture the expressions in the body of rlang::new_function() - See Adv. R for more information.

Wrapping up

This concludes our brief look at three techniques for code generation: function factories, partial application and quasiquotation.

If after reading this you crave more purrr, FP and tidyeval goodness, head over to these resources: