Thread by @WeAreRLadies, I started this thread with the intention of discussing different types of [...]

We are R-Ladies

WeAreRLadies

I started this thread with the intention of discussing different types of data pipeline collaborators I interact with as an R programmer/data scientist - but instead it evolved into a lovely chat on data collection & data cleaning with you all!

Let me make another attempt... https://twitter.com/WeAreRLadies/status/1326953859764269057


Yesterday we discussed interfacing with data owners/data collectors.

As data analysts, it can be tempting to impose a strict Tidy Data formatting policy on incoming data. But that isn't always possible, or in the best interests of a project... https://twitter.com/WeAreRLadies/status/1326954432978804737?s=20


Another type of collaborator is your fellow R programmers. They read and write (and think in) R.

Here, facilitating collaboration is a fairly well-understood topic:
- git
- code reviews
- style guides
- documentation (e.g. {roxygen2})
- testing (e.g. {testit}, {testthat})

A third type of collaborator, one I haven't seen discussed as much, is the programmer who DOESN'T use R and doesn't read/write R code, but relies on R outputs or R products.

I think this type of collaboration is one of the tricky parts about #RinProduction!

R can be a bit of a scary black box to these collaborators, particularly if R is used for model predictions and both the model and its data requirements change over time.

I think there's a need to be a bit of an ambassador for your R products to your engineering collaborators.

Being a good ambassador for your R data product requires the general good coding practices described above (well-tested, well-documented).

There's also a soft-skill component.

Sometimes R, statistics, and machine learning use different terms to describe a concept familiar to your software development colleagues. Take some time to understand their concerns and needs, and anticipate differences in terminology.

If you aren't using R from the beginning of the pipeline to the end, but instead have code written in multiple languages by different developers, it's particularly important to have a clearly defined philosophy for who handles data validation and who tests what.
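One way to make "who validates what" concrete is a small contract check at the language boundary. Here's a minimal sketch in Python, assuming the R side hands predictions to a Python service as a JSON array of records. The field names (`id`, `score`, `model_version`) and the rule that scores lie between 0 and 1 are hypothetical; your own contract will differ.

```python
import json
import math

# Hypothetical contract agreed between the R author and the consuming team.
EXPECTED_FIELDS = {"id": str, "score": float, "model_version": str}


def validate_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        if value is None:
            # A missing value on the R side may arrive as JSON null,
            # depending on how it was serialized.
            problems.append(f"{name} is null")
        elif expected_type is float:
            # JSON does not distinguish 1 from 1.0, so accept ints but not bools.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{name} is not numeric")
            elif math.isnan(value) or not 0.0 <= value <= 1.0:
                problems.append(f"{name} is outside [0, 1]")
        elif not isinstance(value, expected_type):
            problems.append(f"{name} is not a {expected_type.__name__}")
    return problems


def validate_payload(text):
    """Parse a JSON array of records; map record index -> problems found."""
    report = {}
    for index, record in enumerate(json.loads(text)):
        problems = validate_record(record)
        if problems:
            report[index] = problems
    return report
```

Calling `validate_payload()` on each incoming batch gives the consuming team a readable report they can send back to the R author, which also makes it explicit that this particular check lives on the consumer side.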

Engineers who "speak" both R and the other languages it interfaces with are so valuable (as are good product managers).

And I think when people discuss Python versus R for production, this is one of the missing pieces: people who "know" both R and all the frontend/backend code.

(Don't get me wrong, { #shiny} is great, but it isn't suitable for all instances where you might want #RinProduction!)
