bouncer

Dave's Garage · 165.2K views · 12.3K likes

Analysis Summary

30% Low Influence

“Be aware that the technical 'gap' is framed using a high-end Mac Pro ($7,000+) against a budget-friendly NUC, which naturally exaggerates the perceived 'magic' of one architecture over the other.”

Transparency: Mostly Transparent
Primary technique

Performed authenticity

The deliberate construction of "realness" — confessional tone, casual filming, strategic vulnerability — designed to lower your guard. When someone appears unpolished and honest, you evaluate their claims less critically. The spontaneity is rehearsed.

Goffman's dramaturgy (1959); Audrezet et al. (2020) on performed authenticity

Human Detected: 98%

Signals

The content is hosted by a known public figure (Dave Plummer) with a verifiable professional background, featuring a script that includes personal anecdotes and a highly specific, non-formulaic delivery style.

  • Personal Identity and Credentials: The narrator identifies as Dave Plummer, a retired Microsoft engineer, and references his specific history with MS-DOS and Windows 95.
  • Natural Speech Patterns: Use of colloquialisms like 'slapping in some SO-DIMMs', 'off in the weeds', and 'buckle up' alongside technical expertise.
  • Cross-Platform Presence: Links to a personal book on Amazon, a specific podcast (ShopTalk), and established social media profiles with a consistent persona.
  • Technical Nuance: The explanation of cache coherency and silicon interposers is delivered with the specific cadence of an educator rather than a generic script.

Worth Noting

Positive elements

  • This video provides an exceptionally clear technical breakdown of bus width, memory bandwidth, and the physical differences between on-package and off-package RAM.

Be Aware

Cautionary elements

  • The comparison uses hardware from vastly different price brackets to illustrate architectural points, which may lead viewers to overattribute performance gains to 'architecture' rather than 'cost of components'.

About this analysis

Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.

This analysis is a tool for your own thinking — what you do with it is up to you.

Analyzed March 23, 2026 at 20:38 UTC · Model: google/gemini-3-flash-preview-20251217
Transcript

Hey, I'm Dave. Welcome to my shop. I'm Dave Plummer, a retired software engineer from Microsoft, going back to the MS-DOS and Windows 95 days. And today, we're going to venture into the world of modern memory architecture, but with a twist. Because while everybody's busy talking about raw CPU core counts and GPU teraflops, there's something even more foundational lurking under the hood that makes or breaks your system's real-world performance, especially in mixed-use creative or technical workflows. And that something is memory: how it's accessed, how it's shared, how fast it is, and who gets to use how much of it at a time. And to make that exploration interesting, we're going to do what I'm known for doing: pitting two radically different platforms head-to-head. And then I'll share some comparison benchmarks towards the end, once we understand the platforms better. On the one side, we've got the sleek, streamlined M2 Ultra Mac Pro from Apple, featuring 128 GB of what Apple calls unified memory. On the other, we've got the GMKtec NUCBOX, a sleek 16-core Ryzen desktop equipped with the AMD 8060S APU, also with integrated graphics, but running on a shared DDR5 system. They both use integrated graphics. They can both edit video. They both run productivity apps. But beyond those surface similarities, their memory systems may as well come from two different planets, and that's what we're going to explore today. So, buckle up, because this is going to be a deep dive into bandwidth, bus width, cache coherency, and a bit of silicon wizardry. Let's start with the Apple side of the fence. Apple's M2 Ultra and their Pro systems, and indeed most of Apple silicon, is built around something called a unified memory architecture, or UMA. The M2 Ultra is a behemoth of a chip: up to 24 cores, 60 GPU cores, and a neural engine to boot. But what sets it apart isn't just what's on the chip. It's how the memory is arranged around it.
Instead of slapping in some SO-DIMMs on a motherboard and calling it a day, Apple took the bold step of integrating the memory directly onto the chip package using a silicon interposer. That means the LPDDR5 memory modules aren't somewhere off in the weeds or on the bus. They're right next to the SoC. And not in a close-enough kind of way, but in a shared-substrate-with-a-thousand-pin-connection kind of way. We're talking about a 1024-bit memory bus capable of delivering up to 800 GB per second of bandwidth. That's not a typo. 800 gigabytes per second. If that number doesn't make your jaw drop, let me put it this way: that's 8 to 10 times the bandwidth you'd typically find in a modern Ryzen desktop running dual-channel DDR5. So, what does all the extra bandwidth and proximity actually buy you? Well, in Apple's UMA, all the components, the CPU, the GPU, the NPU, and even the image signal processor, share access to the same pool of memory in a cache-coherent fashion. That means if the GPU writes to a memory address, the CPU can read that exact data without needing to copy it out to another buffer or go through explicit synchronization. And this is a big deal, because on a traditional PC, things aren't nearly as cooperative. Enter the AMD 8060S in the GMKtec NUCBOX. The 8060S lives inside a Ryzen APU, and like many of its x86 siblings, it runs in what's called a shared memory model. That's a much older approach where the CPU and the GPU technically share the same pool of RAM, but functionally they don't share it well. Instead, a portion of your system's main memory, say 2 GB or 4 GB or 16 or 32, is carved out and reserved as VRAM for the GPU. This reservation is handled by the firmware or the BIOS, and the operating system treats it as off-limits for everything else. So, yes, the memory is shared, but that's more of a logistical arrangement than a true architectural unification.
Data still has to move, and buffers are still copied, and everything still travels through a memory controller located on the APU die, which then talks to your RAM modules through a relatively narrow pipe: a 128-bit or 256-bit dual-channel DDR5 memory interface. And you're not getting 256 unless you're quad-channel, and I don't think they do that on the Ryzen desktop yet. But compared to the 1024-bit beast on the M2 Ultra, that's a bit like trying to hydrate a stadium using a garden hose. But now let's talk speed, because that's what we care about. Apple's LPDDR5 memory is not only wide, it's also fast. Running at around 6,400 megatransfers per second, each module can move data very quickly. And when multiplied across 1024 bits of access width, you can start to see where that 800 GB per second number comes from. And all of this happens right on package, meaning there are no long PCB traces, no motherboard routes, no DIMM slots, no latency-inducing connectors. Data moves quickly and efficiently. On the Ryzen side, DDR5 might also clock in at around 5,200 or 5,600 megatransfers per second. But because the memory bus is narrower, the total bandwidth is limited to somewhere in the 80 GB per second range, depending on configuration. Not bad, but once again, about 1/8 of what the M2 Ultra can do. And that's assuming that the CPU and GPU aren't stepping on each other's toes. In reality, contention can further reduce effective bandwidth during mixed workloads. So, when you're editing 8K video or training a neural network and both the CPU and the GPU want to chew on the same data set, Apple's architecture can serve both without blinking. With the Ryzen, it's got to mediate who goes next. Now, let's talk bit width, because this is one of those classic size-matters situations. On the M2 Ultra, that memory interface we identified was 1024 bits wide. That means the CPU, the GPU, or the neural engine can request huge chunks of data in a single transaction.
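Those bandwidth figures follow directly from bus width times transfer rate. Here's a quick sanity check in Python using the nominal numbers quoted in the video; real sustained bandwidth will always be somewhat lower than these theoretical peaks:

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s: (bus width in bytes) x (transfers per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9

# M2 Ultra: 1024-bit LPDDR5 at ~6,400 MT/s
m2 = peak_bandwidth_gbs(1024, 6400)     # 819.2 GB/s, which Apple rounds to "800 GB/s"

# Ryzen APU: 128-bit dual-channel DDR5 at ~5,600 MT/s
ryzen = peak_bandwidth_gbs(128, 5600)   # 89.6 GB/s

print(f"{m2:.1f} GB/s vs {ryzen:.1f} GB/s, ratio {m2 / ryzen:.1f}x")
```

The ratio works out to roughly 9x, which matches the "8 to 10 times" figure Dave cites.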
That's great for tasks like high-resolution video rendering, where you're moving gigabytes of raw pixel data around per second. The Ryzen APU, by contrast, is working with a 128-bit bus. And that smaller highway means more memory transactions are required to move the same amount of data. Not only does that slow things down, it also consumes more power and can increase memory contention when multiple agents are requesting access. And speaking of contention, let's move on to latency and cache coherency. Apple's on-package memory and system-level cache design mean that the CPU and GPU can both access the same data without having to make redundant copies. This is especially powerful for things like Metal-accelerated machine learning, where data sets can live in shared memory and be updated in place by whichever engine is working on them. In contrast, the Ryzen's architecture requires more fencing and mapping. The GPU might have its own view of a buffer, and when the CPU wants to read or write to it, a copy operation, or at least a synchronization operation, is often required. That adds latency and burns power. And while Ryzen does have a shared L3 cache across its CPU cores, usually in the 16 to 32 megabyte range, it doesn't extend that cache to the GPU in a unified fashion. Apple, on the other hand, includes a massive 64-megabyte system-level cache that is accessible and usable by all the cores in the SoC. That means hot data can be kept very close to all the engines, reducing latency and boosting throughput, which also brings us now to power efficiency. Now, I'm not saying the Apple silicon is magic, but if you squint hard enough, it's starting to feel that way. The M2 Ultra's tight coupling of compute and memory, coupled with the inherently lower power draw of LPDDR5 versus desktop DDR5, means it can deliver incredible performance per watt. There are very few data copies, less movement across physical interconnects, and much lower idle and leakage power.
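The "smaller highway" point can be made concrete by counting how many full-width transactions it takes to move a fixed payload over each bus. The payload here, one uncompressed 8K RGBA frame, is an illustrative assumption of mine, not a figure from the video:

```python
import math

def transactions_needed(payload_bytes: int, bus_width_bits: int) -> int:
    """Minimum number of full-width memory transactions to move a payload."""
    bytes_per_transaction = bus_width_bits // 8
    return math.ceil(payload_bytes / bytes_per_transaction)

# One 8K frame at 4 bytes per pixel (RGBA): about 132.7 MB
frame = 7680 * 4320 * 4

wide = transactions_needed(frame, 1024)    # 1024-bit bus
narrow = transactions_needed(frame, 128)   # 128-bit bus

print(wide, narrow, narrow // wide)  # the narrow bus needs 8x as many transactions
```

Eight times the transactions for the same data is exactly the bus-width ratio, and each extra transaction is another chance for contention when the CPU and GPU are both asking for memory.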
Ryzen, meanwhile, has to move data from the APU die to external DIMMs and across the traces of your motherboard. That not only takes more energy, but it also means you've got signal integrity issues, timing coordination, and more memory power spent on controller overhead. And sure, desktop DDR5 supports things like power-down modes, but nothing beats the efficiency of an SoC where everything lives under one digital roof. So, here's a philosophical question: what's more important, raw performance or flexibility? Because while the M2 Ultra absolutely crushes the Ryzen 8060S in terms of architectural elegance and performance per watt, there's one area where the Ryzen system still has a clear advantage: upgradability. On the Apple side, what you buy is what you live with. If you get the 128 GB of memory, great, but it's expensive, and it's soldered into the SoC package, and there's no going back. That's fine for video editors or machine learning engineers who know their memory footprint. But for the rest of us, especially those who like to tinker or to buy small and scale up down the road, it can be a hard stop. The Ryzen system, on the other hand, uses industry-standard DDR5 DIMMs in socketed slots. If you want to swap in more RAM, upgrade it to 128 GB, use faster memory, or even run mixed-mode configurations, you can do that. And while that doesn't help your integrated GPU performance, it does offer system-level flexibility that Apple simply doesn't. If you're doing pro-level video editing, machine learning, or any kind of high-resolution media work, the M2 Ultra's unified memory setup is hands down the better tool. You get massive bandwidth, zero-copy data sharing between compute units, and incredibly low latency. But if you're gaming, browsing, running spreadsheets with the occasional bit of Photoshop thrown in, then the Ryzen APU with shared memory is a fine choice.
You might not get the absolute best performance per watt or the flashiest benchmarks, but you get solid capability at a fraction of the price, and you can upgrade or tinker to your heart's content. What we're looking at here isn't just two different chips, but two different design philosophies. Apple is betting everything on tight integration, shared resources, and vertical control of their hardware. AMD and the broader x86 world is still built around flexibility, modularity, and user choice, even if it comes at a cost in performance and efficiency. And you know what? There's room for both in this world. There's one more subtle but incredibly important angle that we need to cover, and it's all about real-world optimization. Because it's easy to get swept up in the specs, gigabytes per second or cache sizes and bus widths. But the rubber meets the road when you ask a very simple question: how well does your software stack actually use your hardware? And here's where Apple's approach really shines, especially if you're inside their walled garden. Let me explain. When you run Final Cut on an M2 Ultra, you're running a program tailor-made to leverage everything Apple's architecture has to offer. It can stream data straight from SSD to RAM to GPU to display, all without translation layers, driver issues, or copying buffers back and forth. Metal, Apple's graphics and compute API, which is kind of like CUDA, was built from the ground up to play nice with unified memory. And that means real gains, because projects that used to need intermediate render passes on disk can be computed on the fly. Machine learning effects can tap into the neural engine without exporting model data across buses or dealing with interop headaches. The OS, the apps, and the hardware all speak the same language, and it's a private dialect. Contrast that with the Ryzen system.
Yes, you can run Resolve or Blender or PyTorch, but now you're relying on drivers from AMD, OpenCL or Vulkan interop layers, and a dance of memory synchronization going on between the CPU and GPU buffers. You can still get good results, but you're doing it with a lot more legwork behind the scenes. Now, this doesn't mean that the PC is inferior. It just means that open platforms carry with them the burden of interoperability. Every layer adds flexibility, for sure, but also friction. And nowhere does that show up more clearly than in how memory is used and managed across components. There's one last philosophical contrast I want to touch on. When you look at the Apple M2 Ultra, it's clear that Apple is chasing a specific vision: a monolithic, highly integrated compute engine where specialization is internal, not external. Everything's on the SoC. Memory is shared. Caches are unified. The user doesn't manage resources. The system does. It's elegant, but it's also very rigid. You're buying into a fixed future. The Ryzen desktop, by contrast, is almost modular by design. You get to pick your CPU, your RAM, your GPU if you want one, and you can add more later, change cooling systems, tune voltages. It's messy, sure, but it's yours. And that openness is why the PC ecosystem has survived for decades and adapted to workloads that Apple could never have imagined. So, which is better? Well, if you're building a system for a tightly scoped, high-bandwidth professional content creation task, especially video, photography, or machine learning pipelines, then Apple's unified memory architecture can offer a level of performance and simplicity that's hard to match. So, if like me, your main reason for having a Mac is to run Final Cut, it's almost perfect for that task.
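The copy-versus-share distinction Dave keeps returning to can be sketched with a toy Python model. This is purely illustrative: the function names and the XOR "workload" are invented for the example, and real GPU interop goes through driver APIs like Metal, OpenCL, or Vulkan, not Python buffers. The point is only the shape of the data flow: a unified design mutates one shared allocation in place, while a carve-out design needs a staging copy in and a synchronization copy back out.

```python
def unified_process(buf: bytearray) -> None:
    """Unified-memory shape: the 'GPU' works on a zero-copy view of the CPU's buffer."""
    gpu_view = memoryview(buf)   # same underlying storage; nothing is transferred
    for i in range(len(gpu_view)):
        gpu_view[i] ^= 0xFF      # 'GPU' writes in place
    # The CPU sees the result immediately; there is no copy-back step.

def carveout_process(buf: bytearray) -> bytearray:
    """Carve-out shape: stage into reserved 'VRAM', compute, then copy back."""
    vram = bytes(buf)                          # copy 1: host buffer into 'VRAM'
    result = bytearray(b ^ 0xFF for b in vram) # 'GPU' computes on its own copy
    buf[:] = result                            # copy 2: explicit sync back to host
    return buf

a = bytearray(b"\x00\x0f")
unified_process(a)           # a is now b"\xff\xf0"

b = bytearray(b"\x00\x0f")
carveout_process(b)          # same result, but via two extra copies
```

Both paths compute the same answer; the carve-out path just spends extra time and power moving the data, which is the overhead unified memory eliminates.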
But if your needs are more general, or if you value upgrade paths, flexibility, or just being able to tinker and learn, then a Ryzen-based system with shared memory is not only good enough, it might actually be better for you as a long-term investment. If you're doing AI workloads, the ability to run the larger models is appreciated on both systems, but the Macs run them significantly faster than the Ryzen's APU. But if you're running a more CPU-friendly task like solving prime numbers, the 16 high-speed cores of the Ryzen actually put the Mac to shame, turning in nearly double the performance. In fact, the NUCBOX is the fastest single-core chip I've ever tested, faster than both the M2 Ultra Mac Pro and the Ryzen Threadripper 7995WX on single-core workloads. And the CPU is fast enough that it even beats my older 32-core Threadripper 3970X on multi-core tests, despite having only half the core count. So, the key is knowing what kind of work you do and then choosing the tool that best matches that profile. If you found today's look at memory architecture to be any combination of informative or entertaining, remember that I'm mostly in this for the subs and likes. So, I'd be honored if you'd consider leaving me one of each before you go today. And if you're already subscribed to the channel, thank you. In the meantime, and in between time, I hope to see you next time right here in Dave's Garage.

Video description

Dave explains the difference between unified and shared GPU memory and tests them for performance. Free Sample of my Book on the Spectrum: https://amzn.to/3zBinWM Check out ShopTalk, our weekly podcast, every Friday! https://www.youtube.com/@davepl Follow me on X: https://x.com/davepl1968 Facebook: https://www.facebook.com/davepl GMKTek NUCBOX: https://www.gmktec.com/?srsltid=AfmBOopEqXuKUutfGcUGHeQE1pbaKkd72MAD3buv3Si_wISkYVFpruLT

© 2026 GrayBeam Technology Privacy v0.1.0 · ac93850 · 2026-04-03 22:43 UTC