Thread by @sh_reya

I have been feeling tired lately when thinking about the differences between MLOps and DevOps. There are so many “gotchas” to keep track of in production ML systems, but I don't think ML systems are as different from traditional software systems as many people say. (1/13)

From an Ops perspective, I think there's only one real difference between traditional software and ML: in traditional software, you write code to solve a problem; in ML, you write code to write code to solve a problem. (2/13)
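
To make the "code that writes code" distinction concrete, here is a minimal illustrative sketch (not from the thread; `celsius_to_fahrenheit` and `fit_line` are hypothetical names). The first function is hand-written logic. The second is code whose output is another function, whose behavior is determined by the data it was fit on, using closed-form ordinary least squares as a stand-in for "training".

```python
from typing import Callable, Sequence


def celsius_to_fahrenheit(c: float) -> float:
    # Traditional software: a human writes the rule directly.
    return c * 9 / 5 + 32


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    # ML-style: code that *produces* the rule from examples.
    # Closed-form ordinary least squares for y = w * x + b.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x

    def predict(x: float) -> float:
        # Once fit, the learned rule is just a function with fixed values w and b.
        return w * x + b

    return predict


if __name__ == "__main__":
    learned = fit_line([0, 10, 20, 30], [32, 50, 68, 86])
    print(celsius_to_fahrenheit(100), learned(100))
```

The learned function only matches the hand-written one because the training data happened to follow that rule; with noisy or skewed data, the "second layer of code" silently encodes whatever the data implies, which is the game of telephone described in the next tweet.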

If you think about this long enough, you will realize that many assumptions can get lost in this game of telephone. Sometimes that second layer of code isn't what we think it is. We can only infer what it does by observing its outputs on inputs we choose. (3/13)

The whole domain of Ops centers on managing the life cycle of these production-grade systems. For MLOps in particular, it feels like there are so many tools that prey on org leaders who know they have ML problems but don't know how to think about those problems. (4/13)

People talk about how MLOps is different from DevOps because we need continuous monitoring of data, models, and many other artifacts. But DevOps also knows we need to monitor things; case in point: Prometheus. (5/13)

Additionally, people argue that ML pipelines are nondeterministic and therefore hard to debug, as opposed to traditional software, which is mostly deterministic (if something goes wrong, you can replicate the bug). (6/13)

I don’t believe ML pipelines (when built correctly) are any more nondeterministic than traditional software pipelines — at any point in production, a trained model is literally a function with fixed values. (7/13)

If you version artifacts properly, you can reproduce ML pipeline output (unless your model's inference is intentionally stochastic). Obviously I am ignoring hardware or OS changes, but such changes can also break determinism in traditional software. (8/13)
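
As a rough sketch of what "versioning artifacts properly" can mean, assuming a toy pipeline (all names here, such as `fingerprint`, `train`, and `run_pipeline`, are hypothetical and not from the thread): every source of randomness is driven by an explicit seed, and the data, config, and resulting model are each content-hashed so a rerun can be checked against the recorded versions. It assumes the same Python version and platform, in line with the hardware/OS caveat above.

```python
import hashlib
import json
import random


def fingerprint(obj) -> str:
    # Deterministic content hash of a JSON-serializable artifact.
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def train(data, config):
    # Toy SGD for y = w * x + b; all randomness comes from one seeded RNG.
    rng = random.Random(config["seed"])
    w = rng.uniform(-1.0, 1.0)  # random initialization
    b = 0.0
    rows = list(data)  # copy so shuffling never mutates the input artifact
    for _ in range(config["epochs"]):
        rng.shuffle(rows)
        for x, y in rows:
            err = (w * x + b) - y
            w -= config["lr"] * err * x
            b -= config["lr"] * err
    return {"w": w, "b": b}


def run_pipeline(data, seed):
    # Record the version of every input alongside the output.
    config = {"seed": seed, "epochs": 50, "lr": 0.01}
    model = train(data, config)
    return {
        "data_hash": fingerprint(data),
        "config_hash": fingerprint(config),
        "model_hash": fingerprint(model),
        "model": model,
    }


DATA = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]  # hypothetical samples of y = 2x + 1

if __name__ == "__main__":
    first = run_pipeline(DATA, seed=0)
    second = run_pipeline(DATA, seed=0)
    # Same data hash + same config hash should give the same model hash.
    print(first["model_hash"] == second["model_hash"])
```

The trained model is just a dict of fixed numbers, so given identical versioned inputs, the rerun is a bit-for-bit replay rather than a fresh roll of the dice. The hard part, as the next tweet says, is building infrastructure that tracks these hashes across real datasets and models.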

The hard part is in developing the infrastructure to keep track of all the artifacts. Off-the-shelf tooling is not enough because data scientists and engineers also need to learn how to leverage these ideas for their specific uses. But this is only hard, not impossible. (9/13)

People also talk about how these disciplines are different because different stakeholders with different skill sets and different vocabularies are involved in MLOps. Maybe this is true, but it seems irrelevant to managing the life cycle of these systems. (10/13)

In the contrived case of ML research -- where only a few people with similar skill sets and shared vocabulary are working on a problem -- we still run into issues solving problems correctly and in a reproducible manner. (11/13)

These threads and articles about the complexities and nuances of MLOps vs. DevOps wear me out. I often find that these laundry lists of similarities and differences suggest the authors aren't confident they know what they are talking about. (12/13)

I don't fully know what I'm talking about either, but I generally believe it's better to simplify rather than complicate. Identify the root problems your systems face and try to adapt existing Ops tools / frameworks to solve them before buying into new MLOps tooling hype. (13/13)
