Since we’re talking about logs vs events today (apparently), I find the Honeycomb view of events as high-cardinality structures containing all the relevant information about processing a unit of work to be pretty compelling for recommender systems.

Here’s why...
In order to understand what’s happening, you need to have a lot of info available.

What candidates did we fetch? How many and which ones were filtered out? What scores did we assign to the remaining items? What were the propensities and positions of the recommended items?
What values of the scoring model features were used in the prediction? What candidate sources did we fetch? Did any of them time out? Which ones? How long did it take to execute each step of the process?

And so on.
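
To make that concrete, a single wide event for one request might look something like this rough Python sketch. All the field names and values are made up for illustration, not taken from any particular system:

```python
# A hypothetical wide event for one recommendation request: every question
# above becomes one or more fields on a single structure.
request_event = {
    "request_id": "req-8f3a",
    "user_id": "user-123",
    # Candidate retrieval
    "candidate_sources_fetched": ["recent_items", "collaborative", "popular"],
    "candidate_source_timeouts": ["collaborative"],
    "candidates_fetched_count": 1250,
    "candidates_filtered_count": 980,
    # Scoring
    "user_feature_ids": ["feature_b", "feature_d"],
    "scoring_feature_values": {"user_age_days": 340, "item_ctr_7d": 0.031},
    "item_scores": {"item-9": 0.87, "item-55": 0.83},
    # Final slate
    "recommended_item_ids": ["item-9", "item-55"],
    "recommended_item_propensities": [0.42, 0.31],
    "recommended_item_positions": [1, 2],
    "defaulted_to_popular": False,
    # Timings
    "timings_ms": {"retrieval": 38, "filtering": 4, "scoring": 22, "total": 71},
}
```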
If that information is scattered across many lines of unstructured logs, you’re doomed. Parsing them is a pain, but figuring out how to join it all back together is excruciating.

Using structured logs helps, but you still have to figure out how to find and merge all that info.
If, on the other hand, you collect that information into one unified data structure during request processing and emit a single high-cardinality (wide) event that contains all of it, the data becomes much easier to work with because there’s nothing left to join.
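
Here’s a minimal sketch of that pattern, assuming a plain in-process dict and a stand-in emitter (swap in the Honeycomb SDK, OpenTelemetry span attributes, or whatever sink you actually use); the pipeline stages are stubbed out just so it runs:

```python
import json
import time


class WideEvent:
    """Accumulates fields for one request, then emits them as a single event."""

    def __init__(self, **fields):
        self.fields = dict(fields)

    def add(self, **fields):
        self.fields.update(fields)

    def emit(self):
        # Stand-in for whatever transport you actually use (Honeycomb SDK,
        # OpenTelemetry span attributes, a log sink that accepts JSON, ...).
        print(json.dumps(self.fields))


def handle_request(user_id, event):
    start = time.monotonic()

    # Retrieval (hypothetical: normally you'd call real candidate sources).
    candidates = ["item-9", "item-55", "item-17"]
    event.add(candidate_sources_fetched=["recent", "popular"],
              candidates_fetched_count=len(candidates))

    # Filtering (hypothetical business rules).
    kept = [c for c in candidates if c != "item-17"]
    event.add(candidates_filtered_count=len(candidates) - len(kept))

    # Scoring (hypothetical model call).
    scores = {c: round(0.9 - 0.1 * i, 2) for i, c in enumerate(kept)}
    recs = sorted(kept, key=scores.get, reverse=True)
    event.add(item_scores=scores, recommended_item_ids=recs,
              timings_ms={"total": round((time.monotonic() - start) * 1000, 1)})
    return recs


event = WideEvent(user_id="user-123")
try:
    handle_request("user-123", event)
finally:
    event.emit()  # exactly one wide event per request; nothing to join later
```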
It also becomes a lot easier to correlate things that might not be obvious from logs:

“Ohhhh, we end up defaulting to popular items when candidate source service A times out and the user features contain feature B but not feature C. I get it now!”
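
That kind of “ohhhh” usually falls out of a simple group-by over the wide events. A sketch with pandas, reusing the made-up field names from the event above:

```python
import pandas as pd

# One row per request, loaded from wherever the wide events land
# (a JSON-lines file here, purely for illustration).
events = pd.read_json("request_events.jsonl", lines=True)

events["source_a_timed_out"] = events["candidate_source_timeouts"].apply(
    lambda timeouts: "service_a" in timeouts
)
events["has_feature_b"] = events["user_feature_ids"].apply(lambda f: "feature_b" in f)
events["has_feature_c"] = events["user_feature_ids"].apply(lambda f: "feature_c" in f)

# How often do we fall back to popular items, sliced by the suspected causes?
fallback_rate = (
    events.groupby(["source_a_timed_out", "has_feature_b", "has_feature_c"])
          ["defaulted_to_popular"]
          .mean()
)
print(fallback_rate)
```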