Thread by @henrikbengtsson

There's lots of discussion around the new |> operator & whether a(b(x)) is "better" than x |> b() |> a(), performance, debugging, etc. Note that the two are identical to #rstats once parsed:

> e0 <- quote(a(b(x)))
> e1 <- quote(x |> b() |> a())
> identical(e0, e1)
[1] TRUE

/1 https://twitter.com/henrikbengtsson/status/1334703130378788866

This is because |> is processed during *parsing*, the first step any programming language performs on source code. Parsing runs nothing! It just deconstructs the human-readable code into an abstract syntax tree (AST). Evaluation happens afterward.

/2
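
A minimal sketch of the parse-then-evaluate split described above, assuming R >= 4.1.0 (where |> is available). The function name `not_defined_anywhere` is hypothetical and deliberately undefined: capturing the call works fine, and only the separate evaluation step fails.

```r
# Requires R >= 4.1.0 for the native pipe.
# `not_defined_anywhere` is a made-up name; no such function exists.
e <- quote(x |> not_defined_anywhere())

# No error so far: quote() only returns the parsed call, nothing has run.
print(e)          # prints not_defined_anywhere(x)
print(class(e))   # "call"

# Evaluation is a separate step, and only this step fails.
res <- tryCatch(eval(e), error = function(err) "evaluation failed")
print(res)        # "evaluation failed"
```

The printed call is already in nested form, since the pipe is gone by the time the parser hands the expression over.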

We can also see this identity using {lobstr}:

> lobstr::ast(g(f(x)))
+- g
└-+- f
  └- x

> lobstr::ast(x |> f() |> g())
+- g
└-+- f
  └- x

/3

We can only tell the difference from the 'srcref' attribute, which keeps the original source text:

> parse(text="g(f(x))")
expression(g(f(x)))

> parse(text="x |> f() |> g()")
expression(x |> f() |> g())

but that's just for display; R evaluates the two expressions in exactly the same way, with the same performance.

/4
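
To make the 'srcref' point concrete, here is a small sketch (again assuming R >= 4.1.0) that pulls the stored source text out of the attribute and compares it with the deparsed call. `keep.source = TRUE` is passed explicitly because non-interactive sessions default to not keeping source references.

```r
# Requires R >= 4.1.0 for the native pipe.
p <- parse(text = "x |> f() |> g()", keep.source = TRUE)

# The original source text lives in the 'srcref' attribute.
src <- as.character(attr(p, "srcref")[[1]])

# The parsed call itself carries no trace of the pipe.
dep <- deparse(p[[1]])

print(src)   # "x |> f() |> g()"
print(dep)   # "g(f(x))"
```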

This duality of g(f(x)) and x |> f() |> g() in the parser makes it "safe" to introduce |> into the R language because it's 100% backward compatible with the existing R ecosystem. No matter how hard you try, you shouldn't be able to find a case where f(x) works, but x |> f() doesn't.

/5

Basically, we don't have to worry about surprising corner cases and side effects showing up next month, in a year, or ten years from now (*)

(*) This claim/prediction is soo gonna come back to me

/6

In contrast, other core-level changes to R are much more complicated to introduce. For example, getting to the point where if (1:2 == 1) { ... } produces an error in R (as it should be) is a much slower roll-out process since it breaks some existing code

/7
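
For reference, a quick way to see where a given R installation stands on that roll-out. This is only a sketch: since R 4.2.0 a condition of length greater than one is an error, while earlier versions signal a warning, so the handler below reports whichever one the running version raises.

```r
# Probe how this R version treats a length-2 condition in if().
res <- tryCatch(
  if (1:2 == 1) "ran" else "skipped",
  error   = function(e) "error",
  warning = function(w) "warning"
)
print(res)   # "error" on R >= 4.2.0, "warning" on earlier versions
```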

Now, the magrittr %>% pipe didn't have the luxury of being able to work at the parser level, so they had to try to achieve the above at runtime … and they did it very well

I think it's important to understand that |> is not the same as %>% ... but they're certainly similar

/8

Some important differences: Static code inspection can be done with code using |> but that can't be done reliably with %>%, or any other infix operator that can be redefined at runtime. This matters in, for instance, 'R CMD check' and parallel processing

/9
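
A small illustration of why static inspection is easier with |>, assuming R >= 4.1.0. Nothing here needs magrittr installed, because quote() never evaluates the code. The stand-in `%>%` defined in `env` is a hypothetical redefinition, not magrittr's implementation; it just shows that the meaning of a %>% call depends on whatever `%>%` happens to be at runtime.

```r
# Requires R >= 4.1.0 for the native pipe.
pipe_native  <- quote(x |> f())
pipe_infix   <- quote(x %>% f())   # only parsed, never evaluated here

# The native pipe is gone after parsing; the infix operator is still a call.
names_native <- all.names(pipe_native)   # "f" "x"
names_infix  <- all.names(pipe_infix)    # "%>%" "x" "f"
print(names_native)
print(names_infix)

# A hypothetical runtime redefinition of %>% that never calls its right-hand side.
env <- new.env()
env$x <- 1
env$f <- function(x) x + 1
env$`%>%` <- function(lhs, rhs) "rhs was never called"

result_native <- eval(pipe_native, env)   # 2
result_infix  <- eval(pipe_infix, env)    # "rhs was never called"
print(result_native)
print(result_infix)
```

A code-analysis tool looking at the first expression sees a plain call to f; for the second it can only guess what `%>%` will do.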

Watching from the sideline, the original author & subsequent maintainers have tried their best to have %>% emulate as far as possible what we're getting with |>. They made another big leap towards harmonizing it further in magrittr 2.0 (Nov 2020) https://www.tidyverse.org/blog/2020/08/magrittr-2-0/

/10

Although I'm not a heavy user, it's been fascinating to follow the evolution of magrittr %>% and its uptake from the perspective of the R language and the R community. It's a process that's been going on for many years.

/11

The journey from magrittr %>% to base R |> shows there's a path for community-driven language changes to #rstats. It started out as a proof of concept, was picked up by several, and embraced by many more to a point of no return, and R Core listened and brought it in.

/12

I should clarify that the above is based on my understanding and interpretation of magrittr %>% and base R |>, and their history. I haven't contributed to either but I can say: Good job, good job.

I'm now handing over the mic to those who know this better than I do.

/13
