Data manipulation basics with tidyverse in R – Part 1 – Piping data between functions

Last July I attended Hadley Wickham's talk on the tidyverse at LSE, and since then I have been working with the tidyverse sparingly. My opinion of the whole thing: it is brilliant! Together with ggplot2 it is clear, concise and powerful for analysing and visualising most of the data I encounter on a daily basis. The things I like about the tidyverse are:

  1. Pipes: as in Linux, each function does one thing and does it well, and data can be piped from one function to the next with the “%>%” operator. For me this bridges one of the major gaps between shell scripting and R (which are very similar to begin with).
  2. Grammar: as in vim, the grammar for manipulating tabular data is very clear and consistent.

In this post I’ll look at the pipe operator in detail.

Piping is primarily done with the “%>%” operator, which is similar to the “|” operator in the Linux shell. It takes the output of the previous function and uses it as the first input of the next function. I used to do this in R by storing the output of each function in an intermediate object, but that gets tedious really quickly.

## Instead of doing this,
sand <- dig(earth)
bricks <- bake(sand)
wall <- lay(bricks)

## We can do this
wall <-
    earth %>% 
    dig() %>%
    bake() %>% 
    lay()
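To make the same comparison with real functions, here is a small runnable sketch. It assumes the magrittr package is installed (library(dplyr) or library(tidyverse) would also make “%>%” available), and the numbers are only an illustration. It puts the nested call, the intermediate-object version and the piped version side by side.

```r
library(magrittr)  # source of %>%; dplyr and the tidyverse re-export it

x <- c(4, 9, 16, 25)

# 1. Nested calls: read from the inside out
result_nested <- round(sum(sqrt(x)), 1)

# 2. Intermediate objects: one temporary variable per step
roots <- sqrt(x)
total <- sum(roots)
result_steps <- round(total, 1)

# 3. Pipe: read top to bottom, no temporary variables
result_pipe <- x %>%
    sqrt() %>%
    sum() %>%
    round(1)   # the piped value becomes round()'s first argument

print(result_pipe)
```

All three give the same value (2 + 3 + 4 + 5 = 14), but only the pipe version reads in the order the steps actually happen.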

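The magrittr package, which “%>%” comes from, also provides some special-case pipes. Loading the tidyverse only makes “%>%” itself available, so these need library(magrittr):
%T>% (the “tee” pipe) passes its input to the next function, but then forwards that same input further down the pipe and discards the function's output, essentially skipping that function as far as the data is concerned. I usually use this when I want to plot an intermediate result and still carry the data on to the end of the pipe, as shown below: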

data %>%
    modify_1() %T>%
    plot() %>%
    modify_2() %>% 
    plot()
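The easiest way to see what the tee pipe does is to compare it with the normal pipe on a function whose output we can check. In this sketch (assuming magrittr is attached), length() stands in for plot(), and the final chain shows the plotting use case with made-up numbers:

```r
library(magrittr)

# Normal pipe: sum() receives the output of length(), i.e. 5
a <- 1:5 %>% length() %>% sum()

# Tee pipe: length() is called, but its result is thrown away
# and sum() receives the original 1:5 instead
b <- 1:5 %T>% length() %>% sum()

print(c(a, b))

# The usual use case: draw an intermediate result, then carry on
# with the data (opens a graphics device, or writes Rplots.pdf
# when run non-interactively)
m <- c(3, 8, 1, 9, 4) %>%
    sort() %T>%
    plot() %>%
    mean()
```

Here a is 5, b is 15 and m is 5: in both tee-pipe chains the data reaching the next step is unchanged by the function in the middle.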

%<>% (the assignment pipe) assigns the result at the end of the pipe back to the object at its start. This is useful when we want to transform the object itself rather than create a new one. For example,

## This, 
data <- 
    data %>%
    modify_1() %>%
    modify_2()

## Can be simplified to,
data %<>%
    modify_1() %>%
    modify_2()
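A small runnable version of the same idea, assuming magrittr is attached. add() and divide_by() are magrittr's named aliases for + and /, used here only so the steps read like the placeholder functions above:

```r
library(magrittr)

scores <- c(10, 20, 30)
scores_long <- scores

# Long form: pipe, then assign back explicitly
scores_long <- scores_long %>%
    add(5) %>%
    divide_by(5)

# Assignment pipe: it should be the first pipe in the chain; the
# result of the whole chain is written back to `scores`
scores %<>%
    add(5) %>%
    divide_by(5)

print(scores)
```

Both versions leave c(3, 5, 7) in the object; the assignment pipe just avoids typing the object's name twice.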

By default, the pipe passes the output of one function as the first argument of the next function. This is a problem for functions whose data argument is not the first one. To get around this we use “.” as a placeholder to mark where the data should go in the function following the pipe. For example,

## This, 
1:5 %>%
    data.frame(
        x = .,
        y = .^2)

## returns the data frame,
#     x   y
# 1   1   1
# 2   2   4
# 3   3   9
# 4   4  16
# 5   5  25
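A more common case for the placeholder is a modelling function such as lm(), which takes the formula first and the data second. The sketch below uses the built-in mtcars data set and assumes magrittr is attached. It also shows one magrittr quirk worth knowing: if “.” only appears inside a nested call, the data is still inserted as the first argument as well, and wrapping the call in braces stops that.

```r
library(magrittr)

# The piped data frame goes to lm()'s `data` argument, not its first
fit <- mtcars %>%
    subset(cyl == 4) %>%
    lm(mpg ~ wt, data = .)

print(coef(fit))

# "." only appears inside length(), so magrittr also inserts the data
# as the first argument: sum(1:3, length(1:3)) = 9
nested <- 1:3 %>% sum(length(.))

# Braces evaluate the block with "." bound and no first-argument
# insertion: sum(length(1:3)) = 3
braced <- 1:3 %>% {sum(length(.))}
```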

That covers the basics of pipes. In the next post I’ll talk about the data manipulation functions within tidyverse in detail.
