🧵 comparing speed metrics and scores across different tools is a significant problem. here’s why.

#webperf #sitespeed #perfmatters
some people come to speed monitoring with the expectation of seeing the same results across different testing services.

this is especially common when comparing service x to google’s tooling (pagespeed insights, the http://web.dev measure tool, etc.), which is often treated as the only authority.
it’s not an unreasonable expectation. in an ideal world, the results would be identical or very similar everywhere you track. that way, there are no trust issues in terms of reliability and no confusion in terms of metrics. teams focus on tracking & improving. perfect!
this expectation might be personal or driven by stakeholders. owners might be using google’s tools as their reference, focusing on speed solely for seo (so only what google reports matters), or simply not knowing about variability, among a myriad of other reasons.
so, what happens when people start comparing speed metrics and scores?

1️⃣ they see divergent results. it brings confusion to the team about which results are "real" and "trustworthy". is there a bug? what’s happening?
2️⃣ whoever drove the speed monitoring initiative has to explain and defend the difference to stakeholders, who might or might not be receptive. that person might or might not have the knowledge to talk about variability and the influence of infrastructure on measurements.
3️⃣ in a not-so-optimistic scenario, the team ditches their speed tool and chooses another over reliability concerns. sadly, they observe similar patterns. they still don’t trust the metrics or the performance work. frustration grows, and not many speed improvements are made.
4️⃣ in an extreme scenario, teams might give up on performance altogether as it doesn’t seem reliable, "real" and "scientific" enough. there are so many unknowns and differences they find it hard to take seriously.
5️⃣ teams might also ditch comprehensive perf platforms in favour of one-off tests with google tooling because they believe those results are the only believable ones. they might miss out on transparency, continuous monitoring and other features that make perf work much less complex.
many other things can happen too, but in many cases, comparisons bring confusion, frustration and a lack of trust, not only in specific tooling but in speed monitoring in general. it might become even harder to convince teams of the importance of speed. it’s a lose-lose.
so why are results between speed platforms so different, you might ask? there's a multitude of reasons 👇🏻
1️⃣ speed monitoring platforms use various tools to test and collect metrics: @____lighthouse, webpagetest, a combination of both or an entirely custom solution.

different tools / frameworks = different results.
2️⃣ each tool has a different infrastructure: location of tests (think latency), CPU + GPU of the test machines and emulated network speed for the test itself (to mimic devices, such as mobiles, etc.).

different infra = different results.
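
to make this concrete, here’s a minimal sketch (not any particular platform’s real setup) of running lighthouse programmatically against the same url with two different emulated environments. the profile names and numbers are made up; the api and field names follow the lighthouse node package as i understand it.

```ts
import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

// two hypothetical test profiles a monitoring platform might use
const profiles = {
  fastDesktop: { rttMs: 40,  throughputKbps: 10_240, cpuSlowdownMultiplier: 1 },
  slowMobile:  { rttMs: 150, throughputKbps: 1_638,  cpuSlowdownMultiplier: 4 },
};

async function measureLcp(
  url: string,
  throttling: { rttMs: number; throughputKbps: number; cpuSlowdownMultiplier: number },
) {
  const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
  const result = await lighthouse(
    url,
    { port: chrome.port, onlyCategories: ['performance'] },
    { extends: 'lighthouse:default', settings: { throttling } },
  );
  await chrome.kill();
  // largest contentful paint (ms) for this single run
  return result?.lhr.audits['largest-contentful-paint']?.numericValue;
}

// same page, two "infras" → expect two noticeably different lcp values
console.log(await measureLcp('https://example.com', profiles.fastDesktop));
console.log(await measureLcp('https://example.com', profiles.slowMobile));
```

neither result is "wrong" — they’re measured under different conditions, which is the whole point.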
3️⃣ tools running lighthouse with defaults use simulated throttling (known as lantern mode). with simulation on, the page is loaded at full speed (no network or cpu limiting), and the throttled metrics are then estimated on top of that trace.

this means lighthouse is ‘extrapolating’ from a full-speed run, not measuring what actually happened under real user conditions. with simulation off, network + cpu limiting is applied during the test itself.

simulation off or on = different results.
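
for reference, here’s a rough sketch of what those two modes look like as lighthouse config settings (field names per the lighthouse config format as i understand it; the numbers are illustrative, not defaults):

```ts
// lantern / simulated throttling: the page loads unthrottled,
// then metrics are estimated for these target conditions afterwards
const simulatedConfig = {
  extends: 'lighthouse:default',
  settings: {
    throttlingMethod: 'simulate',
    throttling: { rttMs: 150, throughputKbps: 1_638, cpuSlowdownMultiplier: 4 },
  },
};

// devtools ("applied") throttling: limits are enforced
// while the page actually loads, using a different set of fields
const appliedConfig = {
  extends: 'lighthouse:default',
  settings: {
    throttlingMethod: 'devtools',
    throttling: {
      requestLatencyMs: 150,
      downloadThroughputKbps: 1_638,
      uploadThroughputKbps: 750,
      cpuSlowdownMultiplier: 4,
    },
  },
};
```

pass either config to the same lighthouse run and you’ll typically get different numbers for the same page.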
with so many factors at play, it’s impossible to compare metrics one-to-one, and even more so the performance score (which is a weighted combination of many metrics).

this is not the answer people want to hear, but it is true.
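
to make the weighting point concrete, here’s a toy calculation. the weights and metric scores below are illustrative, not an official table — real weights vary between lighthouse versions:

```ts
// per-metric scores (0–1) and weights — both differ across tools/versions
const metricScores = { lcp: 0.85, tbt: 0.6, cls: 0.95, fcp: 0.9, si: 0.8 };
const weights      = { lcp: 0.25, tbt: 0.3, cls: 0.25, fcp: 0.1, si: 0.1 };

// the performance score is a weighted blend of the metric scores
const performanceScore = Object.entries(weights).reduce(
  (sum, [metric, weight]) =>
    sum + weight * metricScores[metric as keyof typeof metricScores],
  0,
);

// a small shift in any single metric (e.g. tbt under different cpu
// throttling) moves the blended score, so two tools rarely agree exactly
console.log(Math.round(performanceScore * 100)); // 80 with these made-up numbers
```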
in a way, this feels like an impossible problem to solve, because we can’t change how the web’s networking works or make all infrastructure everywhere identical to deliver on the expectation of the same results.
here's what we CAN do:

1️⃣ educate on how speed measurements are collected + what affects them
2️⃣ reduce variability in results as much as possible (one approach is sketched after this list)
3️⃣ understand WHY people compare: what is their goal? what do they need?
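
for point 2️⃣, one common tactic is running the same test several times and reporting the median rather than a single run. a minimal sketch, assuming a hypothetical `runLighthouseOnce` helper (e.g. wrapping the node api call from the earlier sketch):

```ts
// hypothetical helper: runs one lighthouse test and returns one metric value
declare function runLighthouseOnce(url: string): Promise<number>;

async function medianOfRuns(url: string, runs = 5): Promise<number> {
  const values: number[] = [];
  for (let i = 0; i < runs; i++) {
    // sequential on purpose, so runs don't compete for cpu and skew each other
    values.push(await runLighthouseOnce(url));
  }
  values.sort((a, b) => a - b);
  const mid = Math.floor(values.length / 2);
  // median of the runs is far more stable than any single run
  return values.length % 2 ? values[mid] : (values[mid - 1] + values[mid]) / 2;
}
```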
if you are tracking perf, you can also familiarise yourself with these limitations and act accordingly when planning your speed strategy. i wrote about them more extensively here: https://calibreapp.com/blog/common-mistakes-in-tracking-speed
also, because the most often compared metric (by far) is the performance / pagespeed score, i’d like to make it clear: the score has no bearing on search engine ranking. core web vitals do. read more here: https://calibreapp.com/blog/site-speed-search-ranking-complete-guide
all in all:

➡️ comparing will cause confusion
➡️ you will inevitably see divergent results
➡️ knowing the reasons for variability is critical
➡️ broadly educating on variability will help with perf buy-in
➡️ think about ux and selected metrics, not only the perf score

✌🏻