Analysis Summary
Performed authenticity
The deliberate construction of "realness" — confessional tone, casual filming, strategic vulnerability — designed to lower your guard. When someone appears unpolished and honest, you evaluate their claims less critically. The spontaneity is rehearsed.
Goffman's dramaturgy (1959); Audrezet et al. (2020) on performed authenticity
Worth Noting
Positive elements
- This video provides specific, reproducible terminal-based benchmarks (STREAM Triad, Mandelbrot) that offer more granular data for developers than standard consumer reviews.
Be Aware
Cautionary elements
- The use of 'insider' skepticism toward Apple's marketing (e.g., core naming) is a rhetorical tool to build unearned authority, making the viewer more susceptible to the creator's own affiliate links and sponsorships.
Influence Dimensions
About this analysis
Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.
This analysis is a tool for your own thinking — what you do with it is up to you.
Related content covering similar topics:
- Windows game testing on M5 Max (Andrew Tsai)
- Game testing on M5 Max (Andrew Tsai)
- Gaming On Mac Doesn’t Suck! (Toasty Bros)
- 2024 M4 Mac Mini vs. M1: A Waste of Money? (Elevated Systems)
- MacBook Pro M5 Max vs M4 Max Benchmark Results Are INSANE! (Matt Talks Tech)
Transcript
This is the new M5 Max MacBook Pro, and it's replacing my M4 Max MacBook Pro. Apple says this thing has a new GPU architecture with neural accelerators in every single GPU core. Here I got the 40-core version, and they claim it gets over four times the peak GPU compute for AI compared to the previous generation, and up to 614 GB per second of memory bandwidth. Those are eye-popping numbers. So I want to find out how much of that is real in the stuff I actually care about: software development and local AI. This is my first look, and I'll be digging into more of that in later videos, as well as the new M5 Pro, M5 Air, and the Neo, so stay tuned for those. But today I'm not just comparing it to my M4 Max. I also want to see how close this thing gets to the M3 Ultra Mac Studio, because if Apple's numbers translate into real workloads, this laptop could get weirdly close to this desktop beast in some areas. As a software developer, there are a few things I really care about. Single-core performance matters for general system responsiveness as well as a lot of JavaScript-heavy applications, and Apple has already been absurdly good at this for years. Apple is again positioning the M5 as having its fastest CPU core yet. Here's Speedometer 3.1, which gives me a quick read on browser and JavaScript responsiveness, and that maps pretty well to the kind of day-to-day snappiness I actually feel. This is the highest score I've ever seen in this test: 60.5 on the M5 Max. For comparison, the M3 single-core result on this machine was 49.6, and the M4 single-core on the M4 Max was 56.7. Really good improvements there. But single-core is only part of the story. What I really care about on this machine is multi-core, because that affects IDEs, builds, code compiles, and heavier parallel workloads. For that, I'm using Mandelbrot, a real algorithm implemented in Python.
It's named after the famous mathematician Benoit Mandelbrot. You can also find it on the Benchmarks Game website if you want to run it yourself to compare. I love it because it absolutely hammers all the cores, and I'm going to time it: time python main, with the 16,000 argument that's recommended. Let's see who's going to make noise first. Right now, everybody's quiet. Okay, well, they're done, and nobody made noise. But this is an intense, intense program: 14.6 seconds to run that on the M4 Max. There they are running on the M4 Max with 16 cores; 12 of those are performance cores and four are efficiency cores. This is one of the big architectural changes with the M5 Max, because the M5 Max, unlike the 16 cores on the M4 Max, has 18. And not only that, they're named super and performance. There are no more efficiency cores: six super cores and 12 performance cores. That's really kind of a marketing thing if you ask me, because the performance cores have been renamed to super cores, and the new performance cores are kind of different from the old efficiency cores. That's why they had to make the name change. And of course, because this thing has more cores, it's going to be faster in multi-core operations. So I ran it twice on the M4 Max and got 14.6 and 15 seconds. On the M5 Max: 11.6 and 11.8 seconds. That is a big improvement. What about the M3 Ultra? This thing has 32 cores. Look at all those things. I'm going to run it one more time just so you can see all the little green soldiers marching along. This thing is insanely fast, you can't compare: 8.5 seconds the first run, 8.6 seconds the second. But notice that the M5 Max is actually not too far off from that. Okay, switching gears to what everybody cares about and what all the talk is about: AI. So for local LLMs, I care about a few things: storage speed (SSD), prompt processing (PP), and token generation (TG).
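The real benchmark is the mandelbrot entry on the Benchmarks Game site; a much-simplified sketch of the same idea, escape-time iteration fanned out across all cores with multiprocessing, might look like this (the function names and grid size here are my own, not the benchmark's):

```python
import multiprocessing as mp

MAX_ITER = 50  # the Benchmarks Game program also iterates 50 times per point

def mandel_row(args):
    """Count the points in one row that stay bounded (|z| <= 2)."""
    y, size = args
    ci = 2.0 * y / size - 1.0
    inside = 0
    for x in range(size):
        cr = 2.0 * x / size - 1.5
        zr = zi = 0.0
        for _ in range(MAX_ITER):
            zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
            if zr * zr + zi * zi > 4.0:
                break
        else:
            inside += 1  # never escaped: inside the set
    return inside

def mandelbrot(size):
    # Fan the rows out across every core, which is why the benchmark
    # pins all CPUs at 100% during a run.
    with mp.Pool() as pool:
        return sum(pool.map(mandel_row, ((y, size) for y in range(size))))

if __name__ == "__main__":
    print(mandelbrot(200))
```

The real program renders a 16,000 x 16,000 bitmap, which is what the "16,000 flag" in the video controls; scaling `size` up is what turns this from a toy into a multi-minute stress test.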
Those last two I didn't make up; that's what llama.cpp calls them. It's important to understand this, because prompt processing, sometimes referred to as prefill, is the first stage, and it leans more on compute, on how powerful your GPU is. You can run it on the CPU or the GPU, but you tend to run it on the GPU because it's designed for the kind of parallel work LLMs need; CPUs are just not that great at this. Token generation is the second half of inference, where the LLM produces its output, and TG is more sensitive to memory bandwidth. Now, just a quick refresher. Memory bandwidth on the M4 Max is 546 GB per second. The M3 Ultra has a memory bandwidth of 819 GB per second. Apple claims the M5 Max improves local AI by boosting both the throughput side and the GPU AI execution path, with those neural accelerators inside each GPU core, and by increasing unified memory bandwidth to as high as 614 GB per second at the top configuration with the 40 GPU cores. However, it keeps the amount of memory the same, up to 128 GB; both the M4 Max and the M5 Max are limited to 128, while the M3 Ultra goes up to 512 GB. The M5 Max also pairs that with a faster SSD for loading larger models and caching faster. Let's take a look at that SSD speed. So that's the M4 Max and the M3 Ultra, and both of those already had really fast speeds: about 7,300 read and 8,200 write on the M4 Max, and about the same on the M3 Ultra. Now let's do this one, which is supposed to be using the new Gen 5 drives. We're almost at 14,000 megabytes per second for read, 13,647 to be exact, and about 16,000 for write. That's roughly two times faster than the M3 Ultra or the M4 Max. This will help with the loading side of local LLMs, because big models have to come off the disk first.
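The video doesn't show the disk tool's internals, but the idea behind a sequential-throughput number is simple: write one big file, read it back, divide bytes by elapsed time. A rough sketch (my own function name and sizes; real utilities also bypass the OS page cache, which this doesn't):

```python
import os
import time

def sequential_mb_per_s(path="disktest.bin", size_mb=256, chunk_mb=4):
    """Write then re-read one large file; return (write, read) in MB/s.
    A rough sketch only: the read pass will look artificially fast
    because the file is still warm in the page cache."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb // chunk_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes actually hit the disk
    write_speed = size_mb / (time.perf_counter() - t0)

    t0 = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk_mb * 1024 * 1024):
            pass
    read_speed = size_mb / (time.perf_counter() - t0)
    os.remove(path)
    return write_speed, read_speed
```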
So faster storage helps with startup, caching, and moving large files around. It also helps with code compilation. Those big numbers are the sequential speeds, meaning reading and writing large files. Small files also get a benefit here, although not as much: the random read and write speeds are still quite a bit faster, 59 megabytes per second for read and 45 for write, versus the M4 Max at 49 and 35, respectively, and the M3 Ultra at 51 and 40. Speed and accuracy under pressure is what separates a junior from a pro when alerts flood in at 3:00 a.m. in the SOC, or security operations center. In that moment, theory-based training doesn't help. You need muscle memory, not manuals. This is TryHackMe, and it's not just for individual learners. TryHackMe for Business helps security teams build and prove real-world readiness, with a management dashboard to track skill progression, a SOC simulator and threat-hunting simulator for realistic incident practice, AI-powered tabletop exercises to stress-test decision-making, and certifications and CTF events that validate applied skills, not just theory. In the SOC simulator, analysts get dropped into realistic incident scenarios where they investigate alerts, respond to attacks, and write up what happened. You'll see the platform introduce a BlackCat-style scenario, then you're moving through a real SOC workflow, investigating things like failed logins and writing the case report right through the final submission. The point is, this builds the kind of hands-on muscle memory teams need so that when real incidents hit, they stop breaches faster. If you're a SOC manager, use the link in the description and hit "try for free" to get 30 days, instead of 14, to explore everything with your team. All you have to do is fill out a simple form at tryhackme.com/business. Link down below. Now, the STREAM test is a well-known, long-lived memory bandwidth test.
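STREAM's triad kernel is just `a[i] = b[i] + s * c[i]`, timed over big arrays. The real thing is a C program you compile from GitHub, as in the video; a NumPy approximation of the same kernel (a rough stand-in, not the benchmark itself) looks like this:

```python
import time
import numpy as np

def triad_gb_per_s(n=20_000_000, reps=5):
    """STREAM-style 'triad' kernel, a[i] = b[i] + s * c[i], in NumPy.
    STREAM counts 3 arrays x 8 bytes of traffic per element; this is
    only a rough stand-in for the real compiled C benchmark."""
    b = np.random.rand(n)
    c = np.random.rand(n)
    a = np.empty(n)
    s = 3.0
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        np.multiply(c, s, out=a)  # a = s * c
        np.add(a, b, out=a)       # a = b + s * c, without temporaries
        best = min(best, time.perf_counter() - t0)
    return 3 * n * 8 / best / 1e9
```

As the video notes for the real benchmark, numbers like this measure sustained CPU-side throughput, so they land below Apple's advertised peak system bandwidth.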
It's been around for ages. Local LLMs love memory bandwidth, especially during token generation, so this is going to give us a really important part of the AI story. It's free: you just download it from GitHub, compile it locally, and run it. So, this is a little nuts. On the M4 Max, STREAM Triad came in at about 319,000 megabytes per second. That gives us a baseline of what Apple's last generation can do for sustained memory reads. The M3 Ultra moves up to 337,000 megabytes per second, which makes sense for a much bigger chip; remember, that chip is basically two M3 Maxes fused together. But the M5 Max? We're at 351,000 megabytes per second. That puts it at the top here: about 13% more than the M4 Max and about 4 to 5% more than the M3 Ultra. If you compare these to Apple's advertised memory bandwidths, they come in a bit lower, because these are sustained memory throughput via the CPU, not the whole system, and Apple quotes peak memory bandwidth for the GPU and CPU combined. So if memory bandwidth is showing up faster, does that mean token generation is going to be faster? Let's pop open LM Studio, because it's easy and everybody's familiar with it, and take a look at an MoE model, or mixture of experts. This new one: Qwen3 30B A3B, 30 billion parameters with 3 billion active. I know it's not as big as these machines can handle; these are monster machines, but still. Let's give it a little more context here, 50,000 tokens, and load. Here's a prompt courtesy of ANEMLL, who does research on the ANE, or Apple Neural Engine. I'll link to his Twitter down below; you can check it out. It's a little word problem to solve. Let's go. Oh, this is interesting. Time to first token was exactly the same on the M4 Max and the M5 Max, 1.58 seconds, but tokens per second was different: 79.1 tokens per second on the M4 Max, 88.49 tokens per second on the M5 Max. That's a big boost.
The tokens-per-second number is the token generation speed, courtesy of memory bandwidth. By the way, the M3 Ultra got 69 tokens per second, but with a much faster time to first token. Now, that is an MLX model. MLX, of course, is Apple's optimized framework for running machine learning models on Apple Silicon. LM Studio also runs GGUF-type models, which are not MLX-based but run on llama.cpp, another very popular project. So let's do gpt-oss-120b. That's a 120 billion parameter model. Now we're talking, now we're getting bigger: 60 GB on disk, which means we're going to need a lot of memory, especially if I give it more context length. Let's go with 50,000 again and load that up. This time I'm going to give it a software-engineering kind of prompt: design a scalable web application architecture for an e-commerce platform, it needs to handle 10,000 concurrent users, and so on. I use this prompt quite a bit because it generates a lot of planning. Let's go. And there it goes. They're all making coil whine, every single one of these. This is definitely happening on the GPU, but we got a lot more GPU usage on the M3 Ultra than on the other two machines. The other two are hovering around 75 to 79% GPU usage; the M3 Ultra went all the way up to 100. Maybe that's why it finished first. Oh, the fans on the M4 Max and the M5 Max turned on at the same time, and they're both using about 130 watts of power, although I just saw the M5 Max spike up to 154 watts. Also notice the labeling in mactop: the M4 Max shows 16 cores, four E cores and 12 P cores; the M5 Max shows 18 cores, 12 E cores and 6 P cores. I guess the mactop people didn't have time to change that. 61 tokens per second on the M4 Max; 65 tokens per second on the M5 Max, faster, but not that much faster; and 82 tokens per second on the M3 Ultra. I would have expected a bit more of a jump on the token generation side, because the memory bandwidth is so much higher here. But it's also model dependent.
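Since token generation is bandwidth-bound, a quick sanity check is to compare the bandwidth gain against the measured tokens-per-second gain, using only the numbers quoted above (advertised peak bandwidth and the gpt-oss-120b run):

```python
# Apple's advertised peak memory bandwidth, GB/s (from the video).
bandwidth = {"M4 Max": 546, "M5 Max": 614, "M3 Ultra": 819}
# Measured gpt-oss-120b token generation, tokens/s (from the video).
tokens_s  = {"M4 Max": 61,  "M5 Max": 65,  "M3 Ultra": 82}

bw_gain = bandwidth["M5 Max"] / bandwidth["M4 Max"]  # ~1.12x more bandwidth
tg_gain = tokens_s["M5 Max"] / tokens_s["M4 Max"]    # ~1.07x faster generation
print(f"bandwidth: {bw_gain:.2f}x, token generation: {tg_gain:.2f}x")
```

The roughly 12% bandwidth bump lines up reasonably with the roughly 7% generation gain, consistent with token generation tracking memory bandwidth while still being model dependent.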
These models are mixture-of-experts models. What about a dense model? I'm going to use a different tool so we can get not only the token generation speed but also the prompt processing speed. By the way, the M3 Ultra hit 240 watts of power usage during that operation, a lot more. Here I've got a freshly compiled version of llama.cpp, and I'm going to run llama-bench on Gemma 3 4B. This is the GGUF version, Q4_K_M quant. If you don't know what I'm talking about: it's a small model, quantized to Q4, so 4-bit integer. It's not huge, but it is going to reveal some information for us, which is... holy. Oh my gosh. Apple was not lying for PP, which is prompt processing. That's the first stage, the one that relies on compute. Remember those new neural accelerators in the GPU cores? 1,855 tokens per second for prompt processing on the M4 Max. On the M5 Max: 4,468. That's kind of like four times, right? Almost close. We have a one as the first digit over there and a four as the first digit over here. So, holy cow, this is for real, folks. The M3 Ultra got 2,959, so the M5 Max beats the M3 Ultra in prompt processing speed. This is some really incredible news, and it makes me pretty excited for what might be coming in the M5 Ultra. Can you imagine? Let me know if you're excited about that by giving a thumbs up to this video, and stay tuned for more testing to come. Also let me know what else you want to see. Thanks for watching. You might be interested in this video next, and I'll see you next time.
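For reference, the prompt-processing speedups implied by the llama-bench numbers in the transcript work out as follows. Note that Apple's "over four times" claim is for peak GPU compute for AI, so an end-to-end benchmark landing lower is expected:

```python
# llama-bench prompt-processing results from the video, tokens/s.
pp = {"M4 Max": 1855, "M5 Max": 4468, "M3 Ultra": 2959}

vs_m4    = pp["M5 Max"] / pp["M4 Max"]     # ~2.41x over the M4 Max
vs_ultra = pp["M5 Max"] / pp["M3 Ultra"]   # ~1.51x over the M3 Ultra
print(f"M5 Max vs M4 Max: {vs_m4:.2f}x, vs M3 Ultra: {vs_ultra:.2f}x")
```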
Video description
Apple made some huge claims with M5 Max, but one result in this test completely changed how I look at this machine. Security teams don’t stop breaches with theory. They stop them with practice. TryHackMe for Business helps SOC teams build real-world capability through hands-on cyber simulations. If you’re a SOC leader, you can try it free for 30 days by following this link 👇 https://bit.ly/4bjY0NT Offer valid during March for Security Managers leading teams of 5+. Subject to eligibility criteria and geographic restrictions. TryHackMe reserves the right to verify eligibility. 🛒 Gear Links 🛒 👀 2 400Gbps switch: https://bhpho.to/4r9vJi1 👀👀 4 400Gbps switch: https://bhpho.to/4qQqOlI 👀 MSI EdgeExpert now available here: https://amzn.to/40luUHG 👀 ASUS GX10: https://amzn.to/4kROLb8 👀 DGX Spark on SALE: https://amzn.to/4kOnczp 🪛🪛Highly rated precision driver kit: https://amzn.to/4fkMVfg 💻☕ Favorite 15" display with magnet: https://amzn.to/3zD1DhQ 🎧⚡ Great 40Gbps T4 enclosure: https://amzn.to/3JNwBGW 🛠️🚀 My nvme ssd: https://amzn.to/3YLEySo 📦🎮 My gear: https://www.amazon.com/shop/alexziskind 🎥 Related Videos 🎥 🧳🧰 Mini PC portable setup - https://youtu.be/4RYmsrarOSw 🍎💻 Dev setup on Mac - https://youtu.be/KiKUN4i1SeU 💸🧠 Cheap mini runs a 70B LLM 🤯 - https://youtu.be/xyKEQjUzfAk 🧪🔥 RAM torture test on Mac - https://youtu.be/l3zIwPgan7M 🍏⚡ FREE Local LLMs on Apple Silicon | FAST! 
- https://youtu.be/bp2eev21Qfo 🧠📉 REALITY vs Apple’s Memory Claims | vs RTX4090m - https://youtu.be/fdvzQAWXU7A 🧬🐍 Set up Conda - https://youtu.be/2Acht_5_HTo ⚡💥 Thunderbolt 5 BREAKS Apple’s Upcharge - https://youtu.be/nHqrvxcRc7o 🧠🚀 INSANE Machine Learning on Neural Engine - https://youtu.be/Y2FOUg_jo7k 🧱🖥️ Mac Mini Cluster - https://youtu.be/GBR6pHZ68Ho * 🛠️ Developer productivity Playlist - https://www.youtube.com/playlist?list=PLPwbI_iIX3aQCRdFGM7j4TY_7STfv2aXX 🔗 AI for Coding Playlist: 📚 - https://www.youtube.com/playlist?list=PLPwbI_iIX3aSlUmRtYPfbQHt4n0YaX0qw — — — — — — — — — ❤️ SUBSCRIBE TO MY YOUTUBE CHANNEL 📺 Click here to subscribe: https://www.youtube.com/@AZisk?sub_confirmation=1 — — — — — — — — — Join this channel to get access to perks: https://www.youtube.com/channel/UCajiMK_CY9icRhLepS8_3ug/join — — — — — — — — — 📱 ALEX on X: https://x.com/digitalix 📱 ANEMLL on X: https://x.com/anemll #macstudio #m5max #macbook