What's in [a] purrr?
Learning to purrr (Part 1)
I have flirted with the principles of functional programming in R over the years, yet I’ve never given the paradigm sufficient time to feel comfortable in its presence[1].
In this series of posts I will tackle my mental blocks by working on increasingly complex applications of functional programming. The path will meander a bit because this is essentially a journey of discovery for me.
Note: I will not be spending any time working with the apply family of functions in this series. I have used them successfully and I recommend that you read their documentation to understand how they are both similar to and different from the map* family of functions in purrr.
How Hard is it to Purrr?
The more I use R, the more I grow to appreciate the ‘ecosystem’ in which a package operates.
This is particularly true of the ‘Tidyverse’ experience, where the core packages are carefully curated to do one or a few things very well.
According to the Tidyverse website, the eight core packages are ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats.
I’ve spent a lot of time working with several of them, stringr included, and I’m still learning new tricks with each of them.
So how does purrr compare to the other seven core packages in terms of size and/or complexity?
To answer this question, I’m going to start by comparing the namespaces of the eight core Tidyverse packages.
One very quick (and dirty) way that I use to get a feel for a package is getNamespaceExports.
As the function name suggests, it returns a character vector containing the names of all the objects available to me in a package’s namespace.
Let me demonstrate this quickly by listing the first six exported functions from the utils package.
getNamespaceExports("utils") |> # Bonus: This is the base R pipe (v 4.1+)
  head(6)
## [1] "aspell_package_Rd_files" "vi"
## [3] "read.table"              "URLdecode"
## [5] "rc.status"               "write.csv"
Did you know that you could call vi (the Unix text editor) from within the R workspace?
You can call vi inside R or RStudio if you are on a Unix-like OS (or using WSL, like me!):
vi("I learn new things every day!", file = "always-learning.txt")
But I digress…
Clearing the Throat
To explore purrr, I will use my trusty Tidyverse tools - but this time I will be including purrr itself.
We can load the eight core packages of the Tidyverse with a single library call.
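The chunk that belongs to this step is not shown here. It is presumably the usual meta-package call, sketched below on the assumption that the tidyverse package is installed. The suppressPackageStartupMessages wrapper is my addition and only keeps the console quiet.

```r
# Assumption: the tidyverse meta-package is installed.
# Attaching it also attaches its core packages, including purrr.
suppressPackageStartupMessages(library(tidyverse))

# Confirm that purrr is now on the search path.
"package:purrr" %in% search()
```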
The next step is to make a character vector containing the core package names so that I can iterate over them for different purposes.
tidyverse_core <- c("ggplot2", "dplyr", "tidyr", "readr", "purrr", "tibble", "stringr", "forcats")
We have already seen how to return a vector of the exported objects in the namespace of a package.
Now I want to return all the exported objects from each of the eight core packages of the Tidyverse.
In an ‘ideal world’[2], my trusty getNamespaceExports function would take a vector of inputs and return a well-curated object that contains the information I want.
Let’s cross our fingers and see if we get lucky:
getNamespaceExports(tidyverse_core) %>% length()
## [1] 523
Although the getNamespaceExports function appears to work with a vector, the sad truth is that it has only returned the objects for the first name in my tidyverse_core vector, which is ggplot2.
In Purrr-suit of Answers
Now I know I have to use iteration to step through each name in my tidyverse_core vector.
I can either write a very careful for loop and assign each step’s output to a pre-allocated output of my choice, or I can learn to tap the power of the ‘ready-to-use’ for loops in the purrr package.
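For comparison, here is a minimal sketch of the ‘very careful’ for loop with a pre-allocated output. The pkgs vector is a hypothetical stand-in for tidyverse_core, using three packages that ship with R so the sketch runs without the Tidyverse installed.

```r
# Hypothetical stand-in for tidyverse_core: three packages that ship with R.
pkgs <- c("utils", "stats", "tools")

# Pre-allocate a list with one slot per package name.
exports <- vector("list", length(pkgs))

# Fill each slot in turn; seq_along() is safe even for an empty input.
for (i in seq_along(pkgs)) {
  exports[[i]] <- getNamespaceExports(pkgs[[i]])
}

# One character vector of export names per package.
str(exports, vec.len = 2)
```

The pre-allocation, the index variable, and the assignment into each slot are the bookkeeping that a mapping function takes care of for me.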
First I will use purrr::map to call the getNamespaceExports function on each element of the tidyverse_core vector and then print the str (structure) of the object that purrr::map returns.
For clarity, I will write the purrr:: namespace prefix before every purrr function that I use in each chunk.
tidyverse_core %>% purrr::map(getNamespaceExports) %>% str()
## List of 8
##  $ : chr [1:523] "draw_key_vpath" "StatDensity2dFilled" "find_panel" "stat_density2d_filled" ...
##  $ : chr [1:288] "rows_upsert" "src_local" "db_analyze" "n_groups" ...
##  $ : chr [1:65] "complete" "tribble" "pivot_wider" "full_seq" ...
##  $ : chr [1:115] "read_log" "read_fwf" "read_tsv" "spec_csv2" ...
##  $ : chr [1:178] "pmap_chr" "invoke_map_df" "as_vector" "is_vector" ...
##  $ : chr [1:47] "set_tidy_names" "lst" "size_sum" "deframe" ...
##  $ : chr [1:49] "str_length" "invert_match" "str_to_upper" "str_ends" ...
##  $ : chr [1:36] "fct_inseq" "fct_match" "first2" "fct_explicit_na" ...
I know that my function call has worked because I got back a list containing eight vectors of object names.
This doesn’t tell me the relative size of the eight packages though, so I need to perform another iteration.
This time I want to call the length function on each vector in the list that purrr::map returned.
tidyverse_core %>%
  purrr::map(getNamespaceExports) %>%
  purrr::map(length) %>%
  str()
## List of 8
##  $ : int 523
##  $ : int 288
##  $ : int 65
##  $ : int 115
##  $ : int 178
##  $ : int 47
##  $ : int 49
##  $ : int 36
This is looking good, but I can’t be certain which number relates to which package name. I would prefer to output a dataframe that contains a column of the package names and a column listing the number of exported objects.
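Another way to keep track of which count belongs to which package, separate from the data frame approach pursued in this post, is to name the input vector before mapping over it. The sketch below assumes purrr is installed and uses a hypothetical pkgs vector of packages that ship with R in place of tidyverse_core.

```r
library(purrr)

# Hypothetical stand-in for tidyverse_core: packages that ship with R.
pkgs <- c("utils", "stats", "tools")

# set_names() with no second argument names each element after itself,
# and map() carries those names through to its output.
counts <- pkgs |>
  set_names() |>
  map(getNamespaceExports) |>
  map(length)

# Each count is now labelled with its package name.
str(counts)
```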
What happens if I pass the list output as a column to the tibble function?
tidyverse_core %>%
  purrr::map(getNamespaceExports) %>%
  purrr::map(length) %>%
  tibble(
    package = tidyverse_core,
    num_exports = .
  )
## # A tibble: 8 × 2
##   package num_exports
##   <chr>   <list>
## 1 ggplot2 <int [1]>
## 2 dplyr   <int [1]>
## 3 tidyr   <int [1]>
## 4 readr   <int [1]>
## 5 purrr   <int [1]>
## 6 tibble  <int [1]>
## 7 stringr <int [1]>
## 8 forcats <int [1]>
Well, that’s not great.
The tibble I wanted is printed but the values returned by map(length) are hidden from view.
This is because tibbles support list columns (or nesting).
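To make the difference concrete, here is a small sketch that builds the same tibble twice, once from a list and once from an atomic vector. The package names and counts are made up for illustration, and the sketch assumes the tibble package is installed.

```r
library(tibble)

# Hypothetical counts, stored first as a list and then as an atomic vector.
counts_list <- list(3L, 2L, 5L)
counts_vec  <- unlist(counts_list)

nested <- tibble(package = c("a", "b", "c"), num_exports = counts_list)
flat   <- tibble(package = c("a", "b", "c"), num_exports = counts_vec)

# The list column prints as <int [1]> cells; the atomic column prints values.
nested
flat
```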
I need to end up with a numeric vector to pass to the tibble call.
Luckily I can do this with the map_dbl function.
The documentation tells me that map_dbl applies a function to each element exactly like map, but that map_dbl returns a numeric (double) vector as output instead of a list.
The other map_* functions work the same way - returning a vector of the type in the function suffix.
For example, map_chr returns a character vector, and map_lgl returns a logical vector.
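A short sketch of the typed variants side by side may help. The input list x is hypothetical, and the sketch assumes purrr is installed.

```r
library(purrr)

# Hypothetical input: a list of three character vectors of different lengths.
x <- list(c("a", "b", "c"), c("d", "e"), character(0))

lens_list <- map(x, length)       # list of integers
lens_dbl  <- map_dbl(x, length)   # double vector
lens_int  <- map_int(x, length)   # integer vector

# Collapse each vector into a single string: character vector.
joined <- map_chr(x, function(v) paste(v, collapse = ""))

# Flag the vectors that contain at least one element: logical vector.
non_empty <- map_lgl(x, function(v) length(v) > 0)
```

Because length returns an integer, map_int would fit this use case too; map_dbl simply converts the counts to doubles.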
With this change to map_dbl, the counts become visible:
tidyverse_core %>%
  purrr::map(getNamespaceExports) %>%
  purrr::map_dbl(length) %>%
  tibble(
    package = tidyverse_core,
    num_exports = .
  )
## # A tibble: 8 × 2
##   package num_exports
##   <chr>         <dbl>
## 1 ggplot2         523
## 2 dplyr           288
## 3 tidyr            65
## 4 readr           115
## 5 purrr           178
## 6 tibble           47
## 7 stringr          49
## 8 forcats          36
Well - almost.
I’d like to order the rows for readability, so the last thing I do below is add an arrange call on the data frame.
tidyverse_core %>%
  purrr::map(getNamespaceExports) %>%
  purrr::map_dbl(length) %>%
  tibble(
    package = tidyverse_core,
    exports = .
  ) %>%
  arrange(desc(exports))
Before we look at the final output, let’s mentally review each line of this chunk.
- Using the purrr::map function, I call the getNamespaceExports function on each element of the tidyverse_core vector.
- I want to know the length of each of the eight returned vectors from my purrr::map call. These are contained in a list, so I use purrr::map_dbl to call the length function on each vector and return a numeric vector.
- An unlabelled numeric vector is hard to read, so I use tibble to create a data frame with two columns named package and exports.
- Lastly, the arrange(desc(exports)) call reorders the data so that the packages are listed in descending order according to the exports column.
… which produces this concise summary:
## # A tibble: 8 × 2
##   package exports
##   <chr>     <dbl>
## 1 ggplot2     523
## 2 dplyr       288
## 3 purrr       178
## 4 readr       115
## 5 tidyr        65
## 6 stringr      49
## 7 tibble       47
## 8 forcats      36
As you might have predicted, ggplot2 exports the most objects of the eight core Tidyverse packages, followed by dplyr and then purrr.
I must point out that not all of these objects will be functions that we will need to learn to use.
Some of them are supporting functions that are used by the other
purrr functions e.g.,
Nevertheless, there are enough functions in purrr to keep us entertained for a long while yet.
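One hedged way to check how many of a package’s exported objects are actually functions is to look each name up and test it. The sketch below uses utils as a stand-in so that it runs without extra packages; swapping in "purrr" should work the same way if purrr is installed. The variable names are my own.

```r
# Assumption: "utils" stands in for purrr so the sketch needs no extra packages.
pkg <- "utils"
export_names <- getNamespaceExports(pkg)

# Look up each exported object and record whether it is a function.
is_fun <- vapply(
  export_names,
  function(nm) is.function(getExportedValue(pkg, nm)),
  logical(1)
)

# Tally functions against everything else.
table(is_fun)
```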
Sidenote: If anyone tries to make you feel guilty about ‘Googling’ your way to solutions with ggplot2, ask them to name all 523 objects exported by the package.
A Feeling of Purrr-fection
When I think about all the things that are happening within the piped code sequence above, I feel incredibly satisfied with how little code I had to write to achieve it.
Using a vector, we generated a very large list of eight vectors with different lengths and then summarised the list elements with a single line of code. ‘Prettifying’ the final output took as many lines as the computation.
That is the beauty of purrr and functional programming.
When I use purrr, I save myself all the time of writing the equivalent for loops that would achieve the same result.
If I had to code my own for loops here, I’d probably still be writing this blog post.
[2] An ‘ideal world’ is the one in which someone else has written a function that exactly matches my use case so that I can do exactly what I want with minimal effort.
You can test this if you want to confirm this using