I just ran into an interesting performance issue. We're implementing a feature that collects runtime metrics from apps in the background (using EventListener/EventCounters, collection is done in-process). I load-tested it and found a 4.5% overhead, far above what I expected [...]
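For context, the collection mechanism is roughly this: an EventListener living inside the instrumented process subscribes to EventCounters and receives a snapshot of each counter at a fixed interval. Here's a minimal sketch of that pattern (not our actual library code; the System.Runtime counters and the 1-second interval are just illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;

// Minimal sketch of in-process EventCounters collection: an EventListener enables
// the System.Runtime counters and receives a snapshot of each counter once per
// second, on the app's own threads.
public sealed class RuntimeMetricsListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name == "System.Runtime")
        {
            // EventCounterIntervalSec controls how often counter values are published
            EnableEvents(eventSource, EventLevel.Informational, EventKeywords.None,
                new Dictionary<string, string> { ["EventCounterIntervalSec"] = "1" });
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        if (eventData.EventName != "EventCounters" || eventData.Payload == null)
            return;

        foreach (var payload in eventData.Payload)
        {
            if (payload is IDictionary<string, object> counter)
            {
                // "Mean" is populated for EventCounter, "Increment" for IncrementingEventCounter
                var value = counter.TryGetValue("Mean", out var mean) ? mean
                          : counter.TryGetValue("Increment", out var inc) ? inc : null;
                Console.WriteLine($"{counter["Name"]}: {value}");
            }
        }
    }
}
```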
The setup is 2 Amazon EC2 instances on Linux: one c5.xlarge (4 cores) generating the traffic with wrk, and one c5.large (2 cores) hosting an ASP.NET Core app instrumented with our library. Throughput was ~13.8K requests/second with the new metrics enabled, and ~14.5K without.
What struck me immediately was the CPU usage, which averaged 85% when metrics were enabled. At the beginning of each run there could be a 1-2 minute window where CPU sat at 100%, but it would eventually drop. With metrics disabled, CPU stayed at 100% the whole time.
I spent some time looking for a wall-clock profiler on Linux. In the process I discovered perf timechart, which unfortunately produces a 500 MB SVG file for just 5 seconds of profiling, making it completely unusable. That's a shame, because the issue is clearly visible in the preview.
Eventually I remembered that good old dotTrace now works on Linux. It allowed me to discover that our statsd client uses a threadpool worker to consume a background queue. The thread spends most of its time waiting synchronously for new items. Not ideal but shouldn't be harmful.
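The shape of the code was roughly the following (an assumed sketch, not the client's real source; names are illustrative): metrics get pushed into a queue, and a loop started with Task.Run drains it, blocking synchronously whenever the queue is empty.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Assumed shape of the pattern dotTrace revealed: a long-lived consumer loop is
// scheduled with Task.Run, so it runs on a threadpool worker and blocks that
// worker whenever the queue is empty.
public class StatsdSender
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();

    public StatsdSender()
    {
        // The loop never ends, so one threadpool thread is held for the app's lifetime
        Task.Run(ConsumeLoop);
    }

    public void Enqueue(string metric) => _queue.Add(metric);

    private void ConsumeLoop()
    {
        // GetConsumingEnumerable blocks synchronously until an item is available
        foreach (var metric in _queue.GetConsumingEnumerable())
        {
            Send(metric);
        }
    }

    private void Send(string metric) { /* UDP write to the statsd endpoint, omitted */ }
}
```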
Then I captured another trace with metrics disabled, and something struck me: both traces had the same number of managed threads, even though the one with metrics enabled had one worker spending 99.9% of its time waiting 🧐 (metrics disabled on the left, enabled on the right).
For some reason it looks like the threadpool won't grow above 3 threads for this app. When one of those threads was stuck waiting on the queue, the 2 remaining ones weren't enough to saturate the CPU. To confirm, I changed the consumer to use a dedicated thread, and CPU usage went back to 100%.
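The fix, sketched with the same illustrative names as above: start the consumer on its own Thread so the blocking wait no longer ties up a threadpool worker.

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Same illustrative sender, with the consumer moved off the threadpool onto a
// dedicated thread: blocking on the empty queue no longer removes a worker
// from the pool.
public class StatsdSender
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();

    public StatsdSender()
    {
        var consumer = new Thread(ConsumeLoop)
        {
            IsBackground = true,   // don't keep the process alive at shutdown
            Name = "statsd-consumer"
        };
        consumer.Start();
    }

    public void Enqueue(string metric) => _queue.Add(metric);

    private void ConsumeLoop()
    {
        foreach (var metric in _queue.GetConsumingEnumerable())
        {
            Send(metric);
        }
    }

    private void Send(string metric) { /* UDP write to the statsd endpoint, omitted */ }
}
```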
Going back to the previous runs where CPU started at 100% and then fell to 85%: it looks like the threadpool started with a large number of workers, then progressively scaled down to 3. Past that point it oscillated between 2 and 3, but never got back up to 4.
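If you want to watch this happen from inside the app, one option (assuming .NET Core 3.0 or later, where ThreadPool.ThreadCount is available) is to drop something like this at startup and log the pool's thread count over time:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Log the threadpool's thread count once per second from inside the app.
_ = Task.Run(async () =>
{
    while (true)
    {
        Console.WriteLine($"{DateTime.UtcNow:HH:mm:ss} threadpool threads: {ThreadPool.ThreadCount}");
        await Task.Delay(TimeSpan.FromSeconds(1));
    }
});
```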