🧵 comparing speed metrics and scores across different tools is a significant problem. here’s why.

#webperf #sitespeed #perfmatters
some people come to speed monitoring with the expectation of seeing the same results across different testing services.

this is especially common when comparing service x to google’s tooling (pagespeed insights, the http://web.dev measure tool, etc.), which is often treated as the only authority.
it’s not an unreasonable expectation. in an ideal world, the results would be identical or very similar everywhere you track. that way, there are no trust issues in terms of reliability and no confusion in terms of metrics. teams focus on tracking & improving. perfect!
this expectation might be personal or driven by stakeholders. owners might be using google’s tools as their reference, focusing on speed solely for seo (so only what google reports matters), or simply not knowing about variability, among a myriad of other reasons.
so, what happens when people start comparing speed metrics and scores?

1️⃣ they see divergent results. it brings confusion to the team about which results are "real" and "trustworthy". is there a bug? what’s happening?
2️⃣ whoever drove the speed monitoring initiative has to explain and defend the difference to stakeholders, who might or might not be receptive. that person might or might not have the knowledge to talk about variability and the influence of infrastructure on measurements.
3️⃣ in a not-so-optimistic scenario, the team ditches their speed tool and chooses another over reliability concerns. sadly, they observe similar patterns. they still don’t trust the metrics or the performance work. frustration grows, and not many speed improvements are made.
4️⃣ in an extreme scenario, teams might give up on performance altogether as it doesn’t seem reliable, "real" and "scientific" enough. there are so many unknowns and differences they find it hard to take seriously.
5️⃣ teams might also ditch comprehensive perf platforms in favour of one-off tests with google tooling because they believe those results are the only believable ones. they might miss out on transparency, continuous monitoring and other features that make perf work much less complex.
many other things can happen too, but in many cases, comparisons bring confusion, frustration and a lack of trust, not only in specific tooling but in speed monitoring in general. it might become even harder to convince teams of the importance of speed. it’s a lose-lose.
so why are results between speed platforms so different, you might ask? there's a multitude of reasons 👇🏻
1️⃣ speed monitoring platforms use various tools to test and collect metrics: @____lighthouse, webpagetest, a combination of both or an entirely custom solution.

different tools / frameworks = different results.
2️⃣ each tool has a different infrastructure: location of tests (think latency), CPU + GPU of the test machines and emulated network speed for the test itself (to mimic devices, such as mobiles, etc.).

different infra = different results.
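
to make this concrete, here’s a minimal sketch (not any particular platform’s real setup) of running lighthouse programmatically against the same url with two different emulated environments. the profile names and numbers are made up; the api and field names follow the lighthouse node package as i understand it.

```ts
import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

// two hypothetical test profiles a monitoring platform might use
const profiles = {
  fastDesktop: { rttMs: 40,  throughputKbps: 10_240, cpuSlowdownMultiplier: 1 },
  slowMobile:  { rttMs: 150, throughputKbps: 1_638,  cpuSlowdownMultiplier: 4 },
};

async function measureLcp(
  url: string,
  throttling: { rttMs: number; throughputKbps: number; cpuSlowdownMultiplier: number },
) {
  const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
  const result = await lighthouse(
    url,
    { port: chrome.port, onlyCategories: ['performance'] },
    { extends: 'lighthouse:default', settings: { throttling } },
  );
  await chrome.kill();
  // largest contentful paint (ms) for this single run
  return result?.lhr.audits['largest-contentful-paint']?.numericValue;
}

// same page, two "infras" → expect two noticeably different lcp values
console.log(await measureLcp('https://example.com', profiles.fastDesktop));
console.log(await measureLcp('https://example.com', profiles.slowMobile));
```

neither result is "wrong" — they’re measured under different conditions, which is the whole point.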
3️⃣ tools running lighthouse with defaults use simulated throttling (known as lantern mode). with simulation on, the page is loaded at full speed (no network or cpu limiting), and the throttled metrics are then estimated on top of that trace.

this means lighthouse is ‘extrapolating’ from a full-speed run, not measuring what actually happened under real user conditions. with simulation off, network + cpu limiting is applied during the test itself.

simulation off or on = different results.
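
for reference, here’s a rough sketch of what those two modes look like as lighthouse config settings (field names per the lighthouse config format as i understand it; the numbers are illustrative, not defaults):

```ts
// lantern / simulated throttling: the page loads unthrottled,
// then metrics are estimated for these target conditions afterwards
const simulatedConfig = {
  extends: 'lighthouse:default',
  settings: {
    throttlingMethod: 'simulate',
    throttling: { rttMs: 150, throughputKbps: 1_638, cpuSlowdownMultiplier: 4 },
  },
};

// devtools ("applied") throttling: limits are enforced
// while the page actually loads, using a different set of fields
const appliedConfig = {
  extends: 'lighthouse:default',
  settings: {
    throttlingMethod: 'devtools',
    throttling: {
      requestLatencyMs: 150,
      downloadThroughputKbps: 1_638,
      uploadThroughputKbps: 750,
      cpuSlowdownMultiplier: 4,
    },
  },
};
```

pass either config to the same lighthouse run and you’ll typically get different numbers for the same page.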
with so many factors at play, it’s impossible to compare metrics one-to-one, and even more so the performance score (which is a weighted combination of many metrics).

this is not the answer people want to hear, but it is true.
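
to make the weighting point concrete, here’s a toy calculation. the weights and metric scores below are illustrative, not an official table — real weights vary between lighthouse versions:

```ts
// per-metric scores (0–1) and weights — both differ across tools/versions
const metricScores = { lcp: 0.85, tbt: 0.6, cls: 0.95, fcp: 0.9, si: 0.8 };
const weights      = { lcp: 0.25, tbt: 0.3, cls: 0.25, fcp: 0.1, si: 0.1 };

// the performance score is a weighted blend of the metric scores
const performanceScore = Object.entries(weights).reduce(
  (sum, [metric, weight]) =>
    sum + weight * metricScores[metric as keyof typeof metricScores],
  0,
);

// a small shift in any single metric (e.g. tbt under different cpu
// throttling) moves the blended score, so two tools rarely agree exactly
console.log(Math.round(performanceScore * 100)); // 80 with these made-up numbers
```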
in a way, this feels like an impossible problem to solve, because we can’t change how the web’s networking works or make all infrastructure everywhere identical to deliver on the expectation of the same results.
here's what we CAN do:

1️⃣ educate on how speed measurements are collected + what affects them
2️⃣ reduce variability in results as much as possible (one approach is sketched after this list)
3️⃣ understand WHY people compare: what is their goal? what do they need?
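
for point 2️⃣, one common tactic is running the same test several times and reporting the median rather than a single run. a minimal sketch, assuming a hypothetical `runLighthouseOnce` helper (e.g. wrapping the node api call from the earlier sketch):

```ts
// hypothetical helper: runs one lighthouse test and returns one metric value
declare function runLighthouseOnce(url: string): Promise<number>;

async function medianOfRuns(url: string, runs = 5): Promise<number> {
  const values: number[] = [];
  for (let i = 0; i < runs; i++) {
    // sequential on purpose, so runs don't compete for cpu and skew each other
    values.push(await runLighthouseOnce(url));
  }
  values.sort((a, b) => a - b);
  const mid = Math.floor(values.length / 2);
  // median of the runs is far more stable than any single run
  return values.length % 2 ? values[mid] : (values[mid - 1] + values[mid]) / 2;
}
```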
if you are tracking perf, you can also familiarise yourself with these limitations and act accordingly when planning your speed strategy. i wrote about them more extensively here: https://calibreapp.com/blog/common-mistakes-in-tracking-speed
also, because the most often compared metric (by far) is the performance / pagespeed score, i’d like to make it clear: the score has no bearing on search engine ranking. core web vitals do. read more here: https://calibreapp.com/blog/site-speed-search-ranking-complete-guide
all in all:

➡️ comparing will cause confusion
➡️ you will inevitably see divergent results
➡️ knowing the reasons for variability is critical
➡️ broadly educating on variability will help with perf buy-in
➡️ think about ux and selected metrics, not only the perf score

✌🏻