Qwen3 Coder M4 Max vs RTX 5090

Alex Ziskind · 87.1K views · 1.4K likes · Short

Analysis Summary

30% Minimal Influence


“Be aware that the performance advantage shown relies heavily on the 'unified memory' architecture of the Mac, which may not translate to all AI tasks or different model quantizations.”

Ask yourself: “Whose perspective is missing here, and would the story change if they were included?”

Transparency: Transparent

Human Detected

95%

Signals

The transcript exhibits clear markers of human spontaneity, including informal grammar, real-time reactions to hardware performance, and non-formulaic sentence structures. The content is a hands-on technical demonstration by a known creator (Alex Ziskind) whose style is consistently human-driven.

Natural Speech Patterns: Use of filler words ('anyway', 'I think'), self-correction ('It's actually 32.5 I think'), and informal phrasing ('barely barely fits').

Personal Voice and Subjectivity: Subjective commentary such as 'not amazing' and 'really nice and fast', reflecting personal opinion rather than neutral data reporting.

Contextual Awareness: Real-time observation of hardware metrics ('Look at that memory usage') and planning for future tests ('I'm going to test it out with dual GPUs here next').

Worth Noting

Positive elements

This video provides a direct side-by-side speed test of the RTX 5090 against Apple's M4 Max, using an 8-bit quantization of a specific high-demand coding model (Qwen3 Coder).

Be Aware

Cautionary elements

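The comparison uses 8-bit quantization, which may be optimized differently for MLX (Apple) than for CUDA (NVIDIA), potentially skewing the perceived hardware gap. The transcript also notes that the model 'just barely barely fits' on the RTX 5090 and that some work appears on the CPU, which may further affect that result.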


About this analysis

Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.

This analysis is a tool for your own thinking — what you do with it is up to you.

Analyzed March 22, 2026 at 21:49 UTC · Model: google/gemini-3-flash-preview-20251217 · Prompt Pack: bouncer_influence_analyzer 2026-03-15b · App Version: 0.1.0

Transcript

Qwen3 Coder is finally available. Here is the MLX version of it, 8-bit. And here I've got it running on a 5090. It just barely barely fits cuz it's 32 GB in size. It's actually 32.5 I think. So maybe some of it is not going to fit. But anyway, let's find out. I'm going to run this thing on the 5090. And that's going really nice and fast. If we take a look at this chart, you'll see a lot of that stuff is happening on that GPU. The GPU is full. Look at that memory usage. Very nice. But we see a little bit on the CPU. So, it might be doing a little bit on that. And we're getting 48 tokens per second on that, which is not amazing. Not amazing. I mean, it's good, but it's not amazing.

Here is the same model, 8-bit, but this is the MLX quant running on this M4 Max MacBook Pro. It is going fast. This is definitely happening on the GPU here. But this has 128 gigs of VRAM, or total unified memory. We're getting 79 tokens per second here. So, 79, 48. I'm going to test it out with dual GPUs here next.
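To put the two quoted figures side by side, here is a minimal Python sketch of the arithmetic. Only the 48 and 79 tokens-per-second values come from the transcript; the helper names and the 1,000-token response length are hypothetical and chosen purely for illustration. The video does not say how long each run was or how throughput was measured.

```python
# Throughput figures quoted in the transcript (tokens per second).
RTX_5090_TPS = 48.0
M4_MAX_TPS = 79.0

# Hypothetical response length, used only to make the gap concrete.
EXAMPLE_TOKENS = 1000


def throughput_ratio(a_tps: float, b_tps: float) -> float:
    """Return how many times faster a is than b."""
    return a_tps / b_tps


def seconds_to_generate(tokens: int, tps: float) -> float:
    """Return the time needed to generate `tokens` at a steady `tps`."""
    return tokens / tps


ratio = throughput_ratio(M4_MAX_TPS, RTX_5090_TPS)
print(f"M4 Max vs RTX 5090 throughput: {ratio:.2f}x")

for name, tps in (("RTX 5090", RTX_5090_TPS), ("M4 Max", M4_MAX_TPS)):
    secs = seconds_to_generate(EXAMPLE_TOKENS, tps)
    print(f"{name}: {secs:.1f} s for {EXAMPLE_TOKENS} tokens")
```

On these numbers the M4 Max run is roughly 1.6 times as fast as the RTX 5090 run. This says nothing about other models, quantizations, or a configuration where the model fits entirely in GPU memory.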