Two non-standard ways of organising R projects

I use two unconventional patterns when organising R projects.

Anonymous function

Use an anonymous function to return an object holding all the functions you need, similar to a Node.js module. The main advantage is that you don’t litter your global namespace with a lot of function names. For example, if your arith.r file has,

(function(){
  arith<-list()
  arith$square <- function(a){return(a*a)}
  arith$cube <- function(a){return(arith$square(a)*a)}
  return(arith)
})()

In another script or project, you can source this file and store the functions in an object. source() returns a list, and the result of the file’s last expression sits in its “value” element, as shown below,

> arith <- source("arith.r")$value

After this you can call the functions,

> arith$square(2)
[1] 4
> arith$cube(3)
[1] 27
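To try the whole round trip in one go, here is a minimal sketch that writes the same module to a temporary file and sources it. The temporary file and the name `module_path` are assumptions made only so the example runs anywhere; in a real project you would source arith.r directly.

```r
# Write the arith module to a temporary file so the sketch is self-contained.
module_path <- tempfile(fileext = ".r")
writeLines(c(
  "(function(){",
  "  arith <- list()",
  "  arith$square <- function(a){return(a*a)}",
  "  arith$cube <- function(a){return(arith$square(a)*a)}",
  "  return(arith)",
  "})()"
), module_path)

# source() returns a list; $value is the value of the last expression in the file.
arith <- source(module_path)$value

names(arith)
# [1] "square" "cube"
arith$cube(3)
# [1] 27

# Only `arith` was added to the workspace; in a fresh session this is FALSE.
exists("square", envir = globalenv(), inherits = FALSE)
```

The inner functions never get a global name of their own, so two modules can both define a `square` without clashing.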

This gives us a rudimentary way of creating tiny custom packages. Note that the functions in the returned object are closures over the anonymous function’s environment, so the object can carry state (internal variables). This makes the anonymous function similar to a class definition in object-oriented programming.
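The arith example has no internal variables, so here is a small sketch of what state looks like with this pattern. The `counter` object and its `increment` and `value` functions are hypothetical names chosen for illustration.

```r
counter <- (function(){
  count <- 0                      # private state, reachable only through the closures below
  obj <- list()
  obj$increment <- function(){
    count <<- count + 1           # <<- assigns to `count` in the enclosing function's environment
    invisible(count)
  }
  obj$value <- function(){return(count)}
  return(obj)
})()

counter$increment()
counter$increment()
counter$value()
# [1] 2
```

Because `count` lives in the anonymous function's environment rather than in the list, callers cannot overwrite it by accident; they can only go through the functions the object exposes.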

Rscript environment

You can read from standard input and write to standard output on Linux/Unix from Rscript. This helps immensely when millions of small files need to be processed in parallel. We can write small Rscripts that read from stdin and write to stdout, chain them together with pipes, and run the pipeline on multiple files or streams.

For example, create a square.r file containing,

#!/usr/bin/env Rscript
# Adds a column with squares to the csv

suppressMessages(library('tidyverse'))

# header = TRUE so the first line is read as column names
read.table(file('stdin'), sep = ",", header = TRUE) %>%
  mutate(square = value * value) %>%
  format_csv %>%
  cat

a cube.r file containing,

#!/usr/bin/env Rscript
# Adds a column with cubes to the csv

suppressMessages(library('tidyverse'))

# header = TRUE so the first line is read as column names
read.table(file('stdin'), sep = ",", header = TRUE) %>%
  mutate(cube = square * value) %>%
  format_csv %>%
  cat

and a data.csv containing,

value
1
2
4

Now we can apply the previous two scripts to the csv by making them executable (chmod +x square.r cube.r) and running,

$ cat data.csv | ./square.r | ./cube.r > result.csv

The result.csv will have,

value,square,cube
1,1,1
2,4,8
4,16,64

The advantages of this pattern are that you can encapsulate problems as tiny components and wire these components together with other system programs. For example, use curl to stream a file from the web, filter it with grep, process it with two different Rscripts, and write it to a PostgreSQL database with psql. We can mix and match tools and choose the ones that are really good at what they do.
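Each component in such a pipeline is just a function from an input stream to an output stream. As a rough sketch of that idea, here is a base R variant of the squaring step that needs no tidyverse. The function name `add_square` and its `input` and `output` arguments are hypothetical; this is one possible way to write the step, not the script shown earlier.

```r
# Hypothetical helper: read CSV from `input`, add a squared column, write CSV to `output`.
# The defaults are stdin and stdout, so a script would only need to call add_square().
add_square <- function(input = file("stdin"), output = stdout()) {
  df <- read.csv(input)                       # read.csv expects a header row by default
  df$square <- df$value * df$value
  write.csv(df, output, row.names = FALSE, quote = FALSE)
}

# Try the step without a shell by feeding the sample data through a text connection.
con <- textConnection("value\n1\n2\n4")
add_square(con)
close(con)
# value,square
# 1,1
# 2,4
# 4,16
```

Taking the connections as arguments keeps the component easy to check interactively, while the defaults preserve the stdin-to-stdout behaviour the pipeline relies on.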
