Thread by @Piwai, Today I discovered BLeak: http://plasma-umass.org/BLeak/ I just read the paper from @johnwvilk and [...]

Today I discovered BLeak: http://plasma-umass.org/BLeak/

I just read the paper from @johnwvilk and @emeryberger and my mind is blown. It was published in 2018, I've been working on LeakCanary since 2015.. how did I miss this?

Thread on take aways / thoughts beyond just Android

BLeak: Automatically debug memory leaks in web applications

Using a short developer-provided script, BLeak automatically locates, ranks, and diagnoses memory leaks in web applications.


First, the paper is great: https://github.com/plasma-umass/BLeak/raw/master/paper.pdf

Super approachable. I've tweeted before how I can't read graph papers yet I often intuitively get it when looking at a real implementation https://twitter.com/Piwai/status/1307147191706726400

This is definitely an exception for me.

Also, the paper references other papers that look interesting... but unfortunately it looks like they're behind paywalls? Now I need to figure out if we have a subscription at @SquareEng.

I hate that we put paywalls on research papers.</rant>

If you come to this thread without having heard of LeakCanary:

- Leak detection on Android.
- Can work on any JVM-based app.
- Leverages knowledge of the Android framework.
- Unique ability to narrow down the cause of each leak.

See here for how it works: https://square.github.io/leakcanary/fundamentals/

So what's BLeak anyway? My understanding:

- Novel approach for detecting and diagnosing memory leaks.
- Targeted at browsers.
- Principles work for all automatic memory management systems (more on that later).
- Great for UI + dynamic languages.
- Detection via test loops.

Somehow as I read through this, a recent thread from @mekkaokereke came to mind about FE complexity and promos. FE work can be incredibly complex, this probably being an extreme example. https://twitter.com/mekkaokereke/status/968878515901710336


If you don't want to read the BLeak paper, watch this:

Some take aways:

- Memory leaks are very common in web apps (same as Android!)
- They cause out of memory crashes, and also UI slowness due to GC as memory goes up (same as Android!)
- JS heap dumps suck: everything is an object => hard to tie back to code.

- Detecting that there is a leak is hard, typically you just notice memory going up over time and poke. The tools are manual, require great expertise.
- Minification makes things worse (Android too, which is why LeakCanary's hprof parser can take a proguard mapping file)

The BLeak talk identifies past approaches to identify leaks / worthless memory:

Staleness tracking, based on access recency (not accessed anymore = stale)

Growth analysis: identify objects that grow in count and size

Limitations =>


Interestingly, LeakCanary uses a different form of staleness tracking: detect leaking objects via lifecycle callbacks (e.g. Activity#onDestroy()) and known state patterns (e.g. View.mAttachInfo = null)

LeakCanary doesn't suffer from the limitations outlined.

However, LeakCanary does currently miss leaks if no known lifecycle / state is involved.

Real life example: last week I randomly noticed that an EditText had 56 identical listeners.

I want to write more but I'm worried Twitter web will crash / OOM on me, so I'll just hit "Tweet All" then keep going

How does BLeak detect leaks? It finds objects that reference more things over time.

Run scenario on a loop, dump heap on each iteration.

For each object, compute shortest path from roots and count outgoing refs

Each iteration, remove paths with non increasing counts

After N iterations, remaining paths point to "leak roots" that kept increasing outgoing refs each iteration.

The focus on paths instead of objects as identifier means this approach still works with objects copy / swap (e.g. array copy for growth)
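To make the loop above concrete, here's a minimal sketch of the idea as I understand it, not BLeak's actual code. The snapshot format (a dict of object id to named outgoing edges), `path_ref_counts`, `find_leak_roots` and `make_snapshot` are all made up for illustration. Object ids change on every iteration on purpose, to show that identifying objects by path still works.

```python
from collections import deque


def path_ref_counts(snapshot, root="root"):
    """Map each object's shortest path from the root to its outgoing ref count."""
    counts = {}
    seen = {root}
    queue = deque([(root, ())])
    while queue:
        obj_id, path = queue.popleft()
        edges = snapshot.get(obj_id, [])
        counts[path] = len(edges)
        for edge_name, child_id in edges:
            if child_id not in seen:
                seen.add(child_id)
                queue.append((child_id, path + (edge_name,)))
    return counts


def find_leak_roots(snapshots):
    """Keep only the paths whose outgoing ref count grew on every iteration."""
    previous = path_ref_counts(snapshots[0])
    candidates = set(previous)
    for snapshot in snapshots[1:]:
        current = path_ref_counts(snapshot)
        candidates = {
            path for path in candidates
            if path in current and current[path] > previous[path]
        }
        previous = current
    return candidates


def make_snapshot(iteration):
    # Hypothetical heap: the listeners array gains one entry per iteration,
    # the cache stays the same size. Ids are deliberately unstable.
    window = f"window@{iteration}"
    listeners = f"array@{iteration}"
    cache = f"cache@{iteration}"
    return {
        "root": [("window", window)],
        window: [("listeners", listeners), ("cache", cache)],
        listeners: [(str(i), f"listener@{iteration}.{i}") for i in range(iteration + 1)],
        cache: [("entry", f"entry@{iteration}")],
    }


snapshots = [make_snapshot(i) for i in range(4)]
leak_roots = find_leak_roots(snapshots)
print(leak_roots)
```

Only the path to the listeners array survives the filtering, even though the array itself is a "different" object in every snapshot.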

Once it's got the "leak roots" (I've called these "leak causes", though I like "leak root" now), BLeak does one more run, with a modified runtime that captures stacktraces when an object is added to a leak root.

That's amazing.

Some tools give you the object allocation stacktrace. Cool, but it doesn't tell you why the object is retained.

Some tools give you the retained path. This tells you what's holding on to the object, but not which code added it there.

LeakCanary gives you the retained path to a leaking object, and then annotates the leaktrace (shortest path) to help identify the actual leak cause / root (which ref should have been cleared in the path).

Once you've identified the problematic reference, you still need to figure out when it was set and why it wasn't cleared. If there are several codepaths to updating it, that can be hard. Even more so in a dynamic language like JS.

So capturing a stacktrace of when that reference is set is super helpful!
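Here's a toy illustration of that idea. BLeak does this by modifying the runtime; this sketch just uses a hypothetical `TracingList` subclass that records the call stack of every `append`, and `register_listener` / `on_screen_shown` are made-up names standing in for app code.

```python
import traceback


class TracingList(list):
    """Hypothetical list that remembers the call stack of every append."""

    def __init__(self, *args):
        super().__init__(*args)
        self.add_stacks = []

    def append(self, item):
        # Drop the last frame, which is this append() method itself.
        self.add_stacks.append(traceback.extract_stack()[:-1])
        super().append(item)


listeners = TracingList()


def register_listener(callback):
    listeners.append(callback)


def on_screen_shown():
    # Bug: registers a new listener every time and never removes it.
    register_listener(lambda: None)


for _ in range(3):
    on_screen_shown()

# The innermost two frames of each recorded stack point at the culprit code path.
culprits = [[frame.name for frame in stack][-2:] for stack in listeners.add_stacks]
print(culprits)
```

Every recorded stack ends in `on_screen_shown` calling `register_listener`, which is exactly the "which code added it there" answer that a retained path alone doesn't give you.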

Different but similar to, say, https://github.com/uber/RxDogTag (hi @ZacSweers) or RxJavaHooks.enableAssemblyTracking() (... memories of tracking down https://github.com/ReactiveX/RxJava/issues/4737)


When I read the BLeak paper, my immediate reaction was "that's amazing, surely that's got to be part of every web dev's basic tool suite, right?"

Would love to hear the perspective of people doing web things for real (e.g. @notwaldorf or @dan_abramov).

Another cool trick that BLeak does is ranking leaks using a variant of retained size they call "LeakShare".

Retained size leverages dominators and ignores size for objects that co dominate. LeakShare redistributes that size across co dominators, if I understand right.

I just made up "co dominators", not sure it's the right word. Anyway this is probably more fair than a size of 0, though I wonder if there might be other ways to account for that retained size.
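A rough sketch of how I read the idea, which may well differ from the paper's exact algorithm: ignore anything that stays reachable without going through a leak root, then split the size of each remaining object equally among the leak roots that reach it. The graph, the sizes and the `leak_share` / `reachable` helpers below are all invented for illustration.

```python
from collections import deque


def reachable(graph, starts, blocked=frozenset()):
    """Return all nodes reachable from starts without entering blocked nodes."""
    seen = set()
    queue = deque(s for s in starts if s not in blocked)
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(c for c in graph[node] if c not in blocked and c not in seen)
    return seen


def leak_share(graph, sizes, gc_root, leak_roots):
    leak_roots = set(leak_roots)
    # Objects that would stay alive even if every leak root went away.
    kept_alive_elsewhere = reachable(graph, [gc_root], blocked=leak_roots)
    per_root = {
        root: reachable(graph, [root], blocked=kept_alive_elsewhere)
        for root in leak_roots
    }
    # Count how many leak roots reach each object.
    owners = {}
    for nodes in per_root.values():
        for node in nodes:
            owners[node] = owners.get(node, 0) + 1
    # Each leak root gets an equal share of every object it reaches.
    return {
        root: sum(sizes[node] / owners[node] for node in nodes)
        for root, nodes in per_root.items()
    }


# "shared" is held by both leak roots, so neither one dominates it and a
# dominator-based retained size would credit it to neither.
graph = {
    "gc_root": ["leak_a", "leak_b", "other"],
    "leak_a": ["a1", "shared"],
    "leak_b": ["shared"],
    "other": ["kept"],
    "a1": [],
    "shared": [],
    "kept": [],
}
sizes = {"gc_root": 0, "leak_a": 0, "leak_b": 0, "other": 0,
         "a1": 10, "shared": 40, "kept": 5}

shares = leak_share(graph, sizes, "gc_root", ["leak_a", "leak_b"])
print(shares)
```

With retained size, `leak_b` would look like it holds nothing, since `shared` is dominated by neither leak root. Here the 40 units are split between the two, so `leak_a` gets 30 and `leak_b` gets 20.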

The paper also mentions adding "fake" objects to account for native memory that isn't part of the heap graph.

perflib / Android Studio heap analyzer does this, and I reproduced the approach in LeakCanary: https://github.com/square/leakcanary/blob/main/shark/src/main/java/shark/internal/AndroidNativeSizeMapper.kt#L11-L47


Interestingly, the Android runtime heap dumping code also does something similar: https://cs.android.com/android/platform/superproject/+/master:art/runtime/hprof/hprof.cc;l=84;drc=330d7ae3c860ee34a52b391dc8b6f22beea93f11

Classes in Android heap dumps have a synthetic $classOverhead static field pointing to a synthetic byte array of 0s.

"synthetic" as in this static field and the array it points to don't actually exist at runtime. They were added forever ago by @jessewilson so that standard hprof tools would correctly report class size for Android heap dumps

BLeak's approach is super interesting if you notice that memory is growing on a given scenario and you can write a test case to loop over it.

LeakCanary notifies you that something went wrong when you're working on something different. That means you can set it up then move on.

LeakCanary also has a test artifact that scans the heap after a UI test run. However, it's probably not as effective as what BLeak can find.

I think someone could implement BLeak's leak detection on top of LeakCanary to write "benchmark" types of scenarios.

Unfortunately, I don't think we can easily rewrite dex code at runtime to produce stacktraces when references are added to specific objects.

You *could* go through a recompile step once you've identified leak roots and run the scenario again.

LeakCanary's approach is automatic leak detection via weak refs on callbacks + looking for state patterns on known objects.

Can that work in web apps?

I don't see why not.. however the BLeak paper tells us that shortest path to GC roots might not be all that useful in JS.
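For anyone who hasn't seen the weak refs trick, here's a minimal sketch of the principle in a garbage collected language. This is not LeakCanary's API: `ObjectWatcher`, `expect_weakly_reachable` and `Screen` are hypothetical names, and the `destroy()` method stands in for a lifecycle callback like Activity#onDestroy().

```python
import gc
import weakref


class ObjectWatcher:
    """Hypothetical watcher holding weak refs to objects that should go away."""

    def __init__(self):
        self._watched = {}

    def expect_weakly_reachable(self, obj, description):
        # A weak reference does not keep the object alive.
        self._watched[description] = weakref.ref(obj)

    def retained(self):
        # Force a collection, then report every ref that was not cleared.
        gc.collect()
        return [d for d, ref in self._watched.items() if ref() is not None]


class Screen:
    def __init__(self, name, watcher):
        self.name = name
        self._watcher = watcher

    def destroy(self):
        # Lifecycle callback: after this, nothing should hold on to the screen.
        self._watcher.expect_weakly_reachable(self, f"{self.name} received destroy()")


watcher = ObjectWatcher()
leaky_registry = []


def show_and_close(name, leak):
    screen = Screen(name, watcher)
    if leak:
        leaky_registry.append(screen)  # forgotten reference: the leak
    screen.destroy()


show_and_close("settings", leak=False)
show_and_close("checkout", leak=True)

retained = watcher.retained()
print(retained)
```

Only the screen that's still sitting in `leaky_registry` is reported as retained. A real tool would then dump the heap and compute the path from GC roots to that object.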

Take aways thus far:

LeakCanary could potentially compute "LeakShare" instead of "retained size"

LeakCanary could implement a benchmark type utility for detecting nodes with a growing outgoing ref count

I didn't know about BLeak and it looks like @johnwvilk and @emeryberger didn't know about LeakCanary (based on refs in the paper anyway).

Is there like a Telegram group for people who want to talk about heap analysis? Where do the leak kids hang out?

Back to the original tweet: how did I find out about BLeak anyway? https://twitter.com/Piwai/status/1343561668329140225


Time for a bit of a back story...

LeakCanary is great for finding leaks in dev builds. However, we don't know what happens in release builds!

I've talked about this in the past in a hand wavy way. Could we possibly analyze heaps in prod without user impact?

This is finally shipping in LeakCanary 2.6 but has been years in the making. https://twitter.com/Piwai/status/1342473337738584065


More context: to analyze heaps, LeakCanary originally used a fork of MAT, then a fork of perflib. Those were hungry for memory, which was totally fine (ish) for a desktop heap analyzer.

LeakCanary is embedded in Android apps, so it would sometimes OOM while analyzing.

Not great! So I ended up writing a hprof parser from the ground up for LeakCanary 2.0: Shark https://square.github.io/leakcanary/shark/

It's pretty good at keeping a low & constant memory overhead.

After building this, I talked to @tsmith at @AndroidMakersFR . He had a ton of great ideas.

First, @tsmith suggested waiting for a threshold of leaks to avoid bothering devs all the time with AOSP base leaks.
Also suggested dumping heap when the app enters background.
Also suggested I reach out to folks at Google, see if we could work together.

Dumping the heap when the app enters background: that wasn't just a great idea for LeakCanary in debug apps, but also great for release builds! When the app is in background, the user won't notice the main thread freezing.

Also constant memory means we can afford to do this.

Following @tsmith's advice, I reached out to @tornorbye to talk about how perflib / AS memory analyzer could use some love, and how I happened to have a more efficient heap dump parser.

At the time, parsing a 120 MB heap dump would use 2 GB in AS vs 30 MB in LeakCanary.

Tor introduced me to great engs working on various things. In the end, we kept things separate but I did learn a great deal through this, and this is how years later I ended up finding out about BLeak (more on this soon)

I learnt that you can't track objects across heap dumps as they have no stable identifiers, and that was preventing the studio team from building a tool showing heap diffs to identify leaks.

Which is pretty much what @johnwvilk and @emeryberger did

Hprof is an old and inflexible format, so the Android perf team wanted to build a new way that would stream the heap, leveraging protos as transport, and with stable ids.

I think this ended up shipping as "perfetto hprof". Not sure about stable ids

https://cs.android.com/android/platform/superproject/+/master:art/perfetto_hprof/perfetto_hprof.cc;l=666;drc=master

You can see the result here: https://perfetto.dev/docs/case-studies/memory#java-hprof

I kinda like showing the retained graph as a flamegraph.

Another thing I learnt: Android Studio of course had memory issues, and the team had built a tool to run a heap analysis on OutOfMemoryError, IN RELEASE BUILDS

That was exactly what I wanted to build for Android apps!

https://cs.android.com/android-studio/platform/tools/adt/idea/+/mirror-goog-studio-master-dev:android/src/com/android/tools/idea/diagnostics/hprof/analysis/HProfAnalysis.kt

Here's a sample output:
https://cs.android.com/android-studio/platform/tools/adt/idea/+/mirror-goog-studio-master-dev:android/testData/profiling/sample-report.txt;drc=b9862c0dc8aa1dbe3164f64b77c8f693725a6901

That text report also contains shortest paths from GC roots rendered as strings.

I believe that LeakCanary shows a similar output in a less compact way but much nicer to read:

One thing that caught my eye in the report: "Count of disposed-but-strong-referenced objects"

"disposed but strong referenced objects" is exactly how LeakCanary identifies "leaks".

Turns out, IntelliJ has a Disposer API for registering resources that need clean up. https://jetbrains.org/intellij/sdk/docs/basics/disposers.html#diagnosing-disposer-leaks

It's exactly the foundation on which you could build LeakCanary hooks, checking that disposed disposables soon become weakly reachable.


As I was poking around for Disposer usage in Android Studio sources, I found DisposerInfo which "Add finer-grained Disposer check to BLeak"

https://cs.android.com/android-studio/platform/tools/adt/idea/+/mirror-goog-studio-master-dev:uitest-framework/testSrc/com/android/tools/idea/bleak/DisposerCheck.kt;l=24-26;drc=49da46f9ed909e9073f98541832a0ffcc5d62f4a

I had no idea what BLeak was, so I searched for "BLeak" in the Android Studio codebase

This led me to Bleak.kt

https://cs.android.com/android-studio/platform/tools/adt/idea/+/mirror-goog-studio-master-dev:bleak/src/com/android/tools/idea/bleak/Bleak.kt;l=30-44;drc=49da46f9ed909e9073f98541832a0ffcc5d62f4a

And that's how I learnt about BLeak!

However, this implementation doesn't seem to do everything @johnwvilk and @emeryberger set out to do in BLeak.

It finds objects consistently growing but AFAIK does not track ref updates

That's probably because it's real hard to do in a JVM based app?

Anyway, the other cool thing about it though: it doesn't trigger a heap dump.

Instead it uses JNI to pause all threads and get GC roots, then navigates the object graph via reflection

https://cs.android.com/android-studio/platform/tools/adt/idea/+/mirror-goog-studio-master-dev:bleak/src/com/android/tools/idea/bleak/agents/jniBleakHelper.h;drc=455b918f53e5a1e35d4588eeb2c8ab7046275d84

Triggering a JVM heap dump isn't hard, here's an example: https://github.com/square/leakcanary/blob/main/shark-test/src/main/kotlin/shark/JvmTestHeapDumper.kt

Not sure why they went with JNI instead. Maybe easier than parsing a heap dump?


Ok, I think that's all I have for now