Was chatting with a coworker today about what "data strategy" means and figured I'd share some thoughts since it was a good discussion. You can take these concepts and apply them to fighting white supremacy or managing a pandemic, if you'd like.
To understand how to use data effectively, you need two things:

- a vision for what you want data to help you do; and
- a plan for getting all of your people fluent in using data.
The rest is implementation details. But broadly speaking, you can break down the way data is being used in (at least) four escalating categories:

- descriptive
- diagnostic
- predictive
- prescriptive

Each of these builds on the one before it.
When we use data for descriptive purposes, we are only looking backwards to describe where we've been. This type of analysis answers the "what" question.
When we advance that a little, we start asking the "why" question. This is diagnostic. It requires us to use information that exists *outside* the data, for instance by using modeling techniques.
But the "d"-levels are backwards-looking only. They don't tell us about the future. And that's the thing about using data: we want to use it to shape our future, so it's not enough to stop here.
When we get to predictive data usage, we're starting to use the past to inform the future. Predictive data usage is the future-facing complement to descriptive data usage: it also answers the "what" question, but about what will happen rather than what has happened.
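To make the step from descriptive to predictive concrete, here is a toy sketch. Everything in it is an assumption for illustration: the monthly signup numbers are made up, `fit_line` is a hypothetical helper, and a straight-line trend is just the simplest possible stand-in for a predictive model.

```python
from statistics import mean

# Hypothetical monthly signups (made-up numbers, for illustration only).
signups = [100, 110, 120, 130]

# Descriptive: look backwards and summarize what happened.
average = mean(signups)


def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Predictive: use the past trend to estimate the next month.
slope, intercept = fit_line(signups)
forecast = slope * len(signups) + intercept

print(f"average so far: {average}, forecast for next month: {forecast}")
```

Note that the forecast only says what might happen if the trend continues. It says nothing about why signups are growing, which is the diagnostic question.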
The trouble is that "what" questions are way easier to answer than "why" questions, and a big problem I see is people skipping the diagnostic step and going straight from descriptive analytics to predictive analytics. You can't effectively predict the "what" if you don't understand the "why".
Finally, we have the prescriptive level. Here, we are no longer using data to tell us about a trajectory we are on as if it were our god-given destiny. Instead, we state where we want to be, and then we use data to tell us what actions we need to take to get there.
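A minimal sketch of that inversion, under loudly stated assumptions: suppose the diagnostic work has already told us that signups are roughly a baseline plus a fixed number of signups per dollar of ad spend. That linear relationship, the function `required_spend`, and all of the numbers are hypothetical, not something from the original discussion.

```python
def required_spend(target, baseline, signups_per_dollar):
    """Return the ad spend needed to reach `target` signups.

    Assumes a hypothetical linear model:
        signups = baseline + signups_per_dollar * spend
    """
    if signups_per_dollar <= 0:
        raise ValueError("the model says spending does not increase signups")
    # Already at or above the target: no extra spend needed.
    return max(0.0, (target - baseline) / signups_per_dollar)


# Prescriptive: start from where we want to be, then solve for the action.
spend = required_spend(target=200, baseline=140, signups_per_dollar=0.5)
print(f"spend needed to hit the target: {spend}")
```

The answer is only as good as the "why" model behind it, which is part of the reason this level demands so much trust in the data and in the people who prepared it.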
This requires a lot of trust in data. It also requires us to trust all of the people involved in preparing the data. This is perhaps why it's so hard and rare to see implemented in practice.
Prescriptive data use takes power out of the hands of people who are Very Senior and Very Smart and distributes it over dozens or hundreds of other people (who are also probably Very Smart). This can be uncomfortable for an organization that wants to consolidate responsibility.
In order to get here, you don't just need good data people. You also need a culture of trust and blamelessness. You need to be able to constantly look back and evaluate decisions in the light of new learnings and new data.
We see these mistakes being made, for example, in how the Capitol violence wasn't prevented. The people in charge did not want to see the data for what it was telling them. Likewise, with the pandemic.
Anyhow, data engineering is fundamentally about enabling decisions, not about shuffling data from one place to another. You can move data very skillfully but that effort is for naught if you don't have data fluency.
You can follow @EmilyGorcenski.