Thread by @marcan42

Reading between the lines (and RDF) on the Apple M1, it seems they built a very competitive chip - but not a magical one.

TL;DR it looks like they're in the ballpark of Ryzen at multithreaded workloads, within the ~same TDP. Quite strong single thread perf though.

So it's clear they're not "10x better perf per watt", a claim I saw thrown around a few days ago.

But they're good. They are now competing with AMD, and Intel, well, haha good luck. (They totally had it coming.)

Since this is a completely different architecture, benchmark numbers are *not* comparable like they are across x86. The margin of error is much larger, because on top of µarch differences, you also have *compiler* differences and manual optimization for apps with hand-rolled asm.

So right now M1 wins on multithread Geekbench vs. a similar Ryzen, but loses on Cinebench, by significant margins either way. For any single benchmark, I would easily expect up to 50% noise in either direction at this stage. Don't try to extract more significant figures.
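To make the "don't extract more significant figures" point concrete, here is a minimal sketch. It assumes the 50% noise can be read as a symmetric band around each measured score, which is only one possible interpretation; the function names and the scores in the usage note are hypothetical, not real benchmark results.

```python
def plausible_range(score, noise=0.5):
    """Band a measured score could fall in, assuming +/- `noise` relative error."""
    return (score * (1 - noise), score * (1 + noise))


def distinguishable(score_a, score_b, noise=0.5):
    """True only if the two bands do not overlap at all."""
    lo_a, hi_a = plausible_range(score_a, noise)
    lo_b, hi_b = plausible_range(score_b, noise)
    return hi_a < lo_b or hi_b < lo_a
```

With made-up scores, `distinguishable(100, 130)` is `False` because the bands 50..150 and 65..195 overlap, so a 30% gap on one benchmark says very little. A gap like `distinguishable(100, 400)` is `True` under the same assumption.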

That said, it is somewhat reasonable to assume that M1 is likely to trend ahead as stuff is better optimized for ARM. But we don't know what kind of gains are yet to be had; some things might have reached peak already.

So things will get interesting from here on.

Once we have more real-world app tests to use as comparisons, we'll have a better idea of how the *practical* performance of the M1 compares with the current x86 crop.

All this said, it looks like Apple has gone all-in on the "desktop experience". The really strong single-core perf (I wonder how much of that can be attributed to "x86 legacy garbage still has a cost"?), the awesome SSD, GPU, etc.

It's no wonder the M1 Macs are beating the pants off of the previous Intel offering there. But Intel has been *sucking badly* for years, and there is a pile of improvements beyond the CPU.

As for Rosetta 2, it's good, but I'm still *really* curious how it'll do in the audio domain. We're talking lots of floating point processing with some integer mixed in, written by lots of different teams, some scalar, some vector, *definitely* a lot of it not well optimized.

And with hard realtime constraints - if the JIT fires off anything substantial in the audio processing thread, you *will* get a dropout - and even if it's not substantial, you'll probably get a pile of priority inversion hazards that will cause inconsistent dropouts.
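A rough sketch of the arithmetic behind that: the audio callback has to finish before the next buffer is due, so its time budget is just buffer size divided by sample rate. The function names and the numbers in the usage note are hypothetical, picked for illustration only; nothing here is a measurement of Rosetta 2.

```python
def callback_budget_ms(buffer_frames, sample_rate_hz):
    """Time available to fill one audio buffer, in milliseconds."""
    return 1000.0 * buffer_frames / sample_rate_hz


def drops_out(dsp_ms, stall_ms, buffer_frames, sample_rate_hz):
    """True if normal DSP work plus an extra stall overruns the buffer deadline."""
    budget = callback_budget_ms(buffer_frames, sample_rate_hz)
    return dsp_ms + stall_ms > budget
```

For example, a 128-frame buffer at 48 kHz leaves about 2.67 ms per callback. If the DSP work already takes 1.5 ms, a 2 ms stall in that thread (say, from translating a block of code) overruns the deadline, while a 0.5 ms stall still fits.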

So it looks like for day-to-day stuff Mac users can probably be confident that they won't lose much vs. their older Intel Mac under Rosetta 2, and gain in many instances. But I wouldn't put my money on M1+R2 for all workloads yet.

It'll be interesting to see these performance characteristics worked out in more detail; e.g. people have talked about M1 being way faster at ObjC object management, so presumably it has *way* faster atomics. That matters a lot for some kinds of software, and not at all for others.

But the question is how, and why - presumably their bus system is tighter than typical x86 ones? I'm looking forward to a deeper dive, and to seeing whether AMD/Intel care to improve this in the future.

Also, remember that Apple cheated with their control over the CPU for Rosetta 2. Getting R2-level x86 performance on any other ARM chip is impossible, due to the memory model mismatch: ARM's memory ordering is weaker than x86's, so without hardware help an emulator has to massively slow down all loads and stores to enforce x86 ordering.

So Apple straight up implemented the x86 consistency model on their cores. That's the kind of high-impact detail that makes or breaks emulation performance for a different arch. Did they do this for any other x86-isms? Nobody knows so far.
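To illustrate the mismatch, here is a toy sketch of the classic "message passing" litmus test. It enumerates outcomes under two heavily simplified reordering rules that are assumptions of this sketch, not the real architectural specs: a TSO-like rule where only a later load may pass an earlier store, and a weak rule where any accesses to different addresses may be reordered. All names are hypothetical.

```python
from itertools import permutations

# Each op is (kind, address, value-to-store or register-to-load-into).
WRITER = [("store", "data", 1), ("store", "flag", 1)]
READER = [("load", "flag", "r1"), ("load", "data", "r2")]


def tso_keeps_order(first, second):
    # Simplified x86-TSO: everything stays in order except store -> later load.
    return not (first[0] == "store" and second[0] == "load")


def weak_keeps_order(first, second):
    # Simplified weak model: only same-address accesses stay in order.
    return first[1] == second[1]


def thread_orders(ops, keeps_order):
    """All execution orders of one thread that the given rule permits."""
    n = len(ops)
    result = []
    for perm in permutations(range(n)):
        ok = all(
            not keeps_order(ops[i], ops[j]) or perm.index(i) < perm.index(j)
            for i in range(n) for j in range(i + 1, n)
        )
        if ok:
            result.append([ops[k] for k in perm])
    return result


def interleavings(a, b):
    """All ways to merge two op sequences, preserving each one's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(keeps_order):
    """Set of (r1, r2) results reachable under the given reordering rule."""
    seen = set()
    for w in thread_orders(WRITER, keeps_order):
        for r in thread_orders(READER, keeps_order):
            for trace in interleavings(w, r):
                mem = {"data": 0, "flag": 0}
                regs = {}
                for kind, addr, x in trace:
                    if kind == "store":
                        mem[addr] = x
                    else:
                        regs[x] = mem[addr]
                seen.add((regs["r1"], regs["r2"]))
    return seen
```

In this toy model, `outcomes(tso_keeps_order)` never contains `(1, 0)`: if the reader sees the flag, it also sees the data, which is what x86 code is allowed to rely on. `outcomes(weak_keeps_order)` does contain `(1, 0)`, so an emulator running on such hardware would have to add ordering to translated loads and stores to rule it out, unless the core itself can provide x86-style ordering.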
