Analysis Summary
Ask yourself: “Whose perspective is missing here, and would the story change if they were included?”
Worth Noting
Positive elements
- This video provides a rare, direct side-by-side speed test of the cutting-edge RTX 5090 against Apple's M4 Max, using a demanding coding model (Qwen3 Coder, 8-bit).
Be Aware
Cautionary elements
- The comparison uses 8-bit quantizations that may be optimized differently for MLX (Apple) than for CUDA (NVIDIA), potentially skewing the perceived hardware gap.
Influence Dimensions
Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.
This analysis is a tool for your own thinking — what you do with it is up to you.
Transcript
Qwen3 Coder is finally available. Here is the MLX version of it, 8-bit. And here I've got it running on a 5090. It just barely, barely fits 'cause it's 32 GB in size. It's actually 32.5, I think. So maybe some of it is not going to fit. But anyway, let's find out.

I'm going to run this thing on the 5090. And that's going really nice and fast. If we take a look at this chart, you'll see a lot of that stuff is happening on that GPU. The GPU is full. Look at that memory usage. Very nice. But we see a little bit on the CPU. So, it might be doing a little bit on that. And we're getting 48 tokens per second on that, which is not amazing. Not amazing. I mean, it's good, but it's not amazing.

Here is the same model, 8-bit, but this is the MLX quant running on this M4 Max MacBook Pro. It is going fast. This is definitely happening on the GPU here. But this has 128 gigs of VRAM, or total unified memory. We're getting 79 tokens per second here. So, 79 versus 48. I'm going to test it out with dual GPUs here next.
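The arithmetic behind the two claims in the transcript (the model not quite fitting on the 5090, and the gap between 79 and 48 tokens per second) can be sketched as below. This is a rough illustration only: the figures are the approximate ones spoken in the video, the function and constant names are hypothetical, and the sketch ignores KV cache, runtime overhead, and the fact that not all unified memory is available to the GPU.

```python
def spill_gb(model_gb: float, gpu_memory_gb: float) -> float:
    """Size of the model weights that does not fit in GPU memory (0 if it all fits)."""
    return max(0.0, model_gb - gpu_memory_gb)


def speedup(faster_tps: float, slower_tps: float) -> float:
    """Ratio of two throughput figures in tokens per second."""
    return faster_tps / slower_tps


# Figures as stated in the transcript; the model size is the speaker's estimate.
MODEL_GB = 32.5
RTX_5090_VRAM_GB = 32.0
M4_MAX_UNIFIED_GB = 128.0  # assumption: treated as fully usable, a simplification
RTX_5090_TPS = 48.0
M4_MAX_TPS = 79.0

print(f"Spill on RTX 5090: {spill_gb(MODEL_GB, RTX_5090_VRAM_GB):.1f} GB")
print(f"Spill on M4 Max:   {spill_gb(MODEL_GB, M4_MAX_UNIFIED_GB):.1f} GB")
print(f"M4 Max vs 5090:    {speedup(M4_MAX_TPS, RTX_5090_TPS):.2f}x")
```

Under these assumptions roughly 0.5 GB of weights would have to live outside the 5090's memory, which is consistent with the small amount of CPU activity noted in the video, and the M4 Max comes out about 1.65 times faster in this one run.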