I started this thread with the intention of discussing different types of data pipeline collaborators I interact with as an R programmer/data scientist - but instead it evolved into a lovely chat on data collection & data cleaning with you all!
Let me try again... https://twitter.com/WeAreRLadies/status/1326953859764269057
Yesterday we discussed interfacing with data owners/data collectors.
As data analysts, it can be tempting to impose a strict Tidy Data formatting policy on incoming data. But that isn't always possible, or in the best interests of a project... https://twitter.com/WeAreRLadies/status/1326954432978804737?s=20
Another type of collaborator is your fellow R programmers. They read and write (and think in) R.
Here, facilitating collaboration is a fairly well-understood topic:
- git
- code reviews
- style guides
- documentation (e.g., {roxygen2})
- testing (e.g., {testit}, {testthat})
A third type of collaborator that I haven't seen as much discussion on is programmers that DON'T use R and don't read/write R code, but rely on R outputs or R products.
I think this type of collaboration is one of the tricky parts about #RinProduction!
R can be a bit of a scary black box - particularly if R is used for model predictions and both the model and its data requirements change over time.
I think there's a need to be a bit of an ambassador for your R products to your engineering collaborators.
Being a good ambassador for your R data product requires the general good coding practices described above (well-tested, well-documented).
There's also a soft-skill component.
Sometimes R/statistics/machine learning uses different terms to describe a concept familiar to your software development colleagues. Take some time to understand their concerns and needs, and anticipate differences in terminology.
If you aren't using R from pipeline beginning to end, but instead have code written in multiple languages by different developers, it's particularly important to have a clearly defined philosophy for who handles data validation and who tests what.
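One way to make that agreement concrete is a small validation step at the point where data enters R. Here's a minimal sketch in base R, assuming a made-up input contract: the column names, the types, and the `validate_input()` helper are all hypothetical and only there for illustration.

```r
# Hypothetical contract for a data frame handed to R by an upstream service.
# Each entry maps an (assumed) column name to a function that checks its type.
schema <- list(
  customer_id = is.character,
  age         = is.numeric,
  signup_date = function(x) inherits(x, "Date")
)

# Return a character vector describing every problem found.
# An empty vector means the input satisfies the contract.
validate_input <- function(df, checks = schema) {
  problems <- character(0)

  # Columns the contract requires but the input lacks
  missing_cols <- setdiff(names(checks), names(df))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("missing column:", missing_cols))
  }

  # Type checks for the required columns that are present
  for (col in intersect(names(checks), names(df))) {
    if (!checks[[col]](df[[col]])) {
      problems <- c(problems, paste("unexpected type in column:", col))
    }
  }

  problems
}

good <- data.frame(
  customer_id = "a1",
  age = 34,
  signup_date = as.Date("2020-11-12"),
  stringsAsFactors = FALSE
)
bad <- data.frame(customer_id = 1L, age = "34", stringsAsFactors = FALSE)

validate_input(good)  # no problems reported
validate_input(bad)   # reports the missing column and the two type mismatches
```

Returning a list of problems (rather than stopping at the first one) gives your non-R collaborators a complete message they can act on without reading any R code.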
Engineers that "speak" both R and the other languages it interfaces with are so valuable (as are good product managers).
And I think when people discuss python versus R for production, this is one of those missing pieces - people that "know" R and all the frontend/backend code.
(Don't get me wrong, { #shiny} is great, but it isn't suitable for all instances where you might want #RinProduction!)