bouncer

Level1Techs · 135.6K views · 3.9K likes

Analysis Summary

20% Minimal Influence

“Be aware that while the technical analysis is deep, the hardware was provided by Nvidia specifically for this launch-day review, which naturally frames the ecosystem as the primary standard for AI development.”

Transparency: Transparent
Human Detected: 98%

Signals

The content exhibits clear signs of human creation, including spontaneous speech markers, personal opinions, and specific physical interactions with the hardware that AI cannot currently replicate. The Level1Techs channel is a well-known human-led tech review outlet with a consistent, authentic presentation style.

Natural Speech Patterns: Transcript contains natural stumbles, filler words ('uh', 'you know'), and self-corrections ('littleer', 'given give or take').
Personal Anecdotes and Context: The speaker mentions specific details like a sticker with their SSID, a 3D-printable foot they created, and personal opinions on the power brick's density.
Technical Nuance and Ad-libbing: The speaker addresses 'pedantic' viewers and explains hardware quirks (Wi-Fi antenna placement) in a way that reflects hands-on experience rather than a script.

Worth Noting

Positive elements

  • This video provides a rare, detailed look at the specific hardware internals and real-world software latency of Nvidia's ARM-based Blackwell developer kit.

Be Aware

Cautionary elements

  • The content reinforces the 'Nvidia-only' developer workflow as the default reality for AI, making the proprietary software stack feel like a neutral utility rather than a commercial choice.

Influence Dimensions

About this analysis

Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.

This analysis is a tool for your own thinking — what you do with it is up to you.

Analyzed March 13, 2026 at 16:07 UTC · Model: google/gemini-3-flash-preview-20251217
Transcript

A wizard is never late, nor is he early. He arrives precisely when he means to. Now, those words were not communicated to me, but I could tell from a look. This is DGX Spark. This is why you're here, right? 128 GB of machine-learning stuff in here. But Nvidia also wants to put its best foot forward with its software stack, the software experience, and what you can expect when you get yourself into the Nvidia developer ecosystem. You can use this as a desktop computer. You could use it as a network companion for your dev laptop or dev workstation: a little AI brain just sitting on the network, answering your questions without taking any resources away from your main development machine, over the local network or the internet, via dual 100 Gb network interfaces. Yeah, this is DGX Spark. This is the GB10. It's an homage to the DGX-1; it's just a littler DGX-1, to the tune of about 230 watts. Let's take a closer look. Let's dive in.

So, what are we looking at here? This is the DGX Spark. It was provided to me by Nvidia ahead of time, at my request, for my review, because it's launching tomorrow. My opinion is my own, and Nvidia does not exert any editorial control over the content of this video, because some of you are very pedantic. So, these launch tomorrow. You can buy one tomorrow, or today if you're watching this on launch day, October 15th, 2025. 128 GB of memory, 4 TB of storage, 200 Gb networking, plus 10 Gb Ethernet, plus HDMI, plus four 20 Gbps USB-C ports. This thing has a lot of connectivity. In the box, you also get a power cord, a 230-watt power brick, and a quick-start guide with a sticker that has my SSID, my hotspot password, and my system setup page. Note that one of the USB-C ports is just for power; it's the one closest to the power button. Just make sure you don't plug in to the wrong place. One-year product warranty. Oh, and the power brick is... okay, this thing is a dense little brick.

This is a tool for learners, in my opinion. Learners like developers and students alike, to get comfortable with AI in whatever industry or vertical domain you work in. It can run thousands of publicly available AI models and is intended to make it easier for you to prototype, mess around, experiment, and deploy solutions that incorporate AI. It all runs locally. It's a lab in a box. Under the hood, this thing has no more compute capability than a 5070 gaming GPU, but it has 128 GB of LPDDR5 memory. That memory capacity, and the hardware features of Blackwell that you're not going to find in this form factor anywhere else, are going to let developers get a lot done with this platform. It's been interesting to take a look at, to run the experiments on, and to just sort of understand how it works. Inside is the GB10, which delivers up to one petaflop of FP4, Nvidia's new 4-bit format, which we'll talk more about in a second. GB10 has a 20-core ARM CPU: 10 Cortex-X925 and 10 Cortex-A725 cores. It has fifth-generation Tensor Cores with fourth-generation RT cores, and a 256-bit memory bus with a peak memory bandwidth of about 275 GB per second, give or take. Our model here has a 2242 M.2 drive. That's the 4 TB, accessible on the bottom. There's a rubberized plastic pad on the bottom that is magnetically held in place, and removing it reveals four screws and the Wi-Fi antennas.
Now, the Wi-Fi antenna placement here is a little bit odd, and the case is made out of metal. So we have a 3D-printable foot, if you'd like to use that. It even turns the Nvidia logo the right way, and now you have really good Wi-Fi reception on this side. The 10 Gb interface is powered by a Realtek RTL8127, and the dual QSFP is a ConnectX-7 SmartNIC. GB10 has 6,144 CUDA cores topping out at 2.4 GHz. Our LPDDR5 here is clocked at 8533. There's a full hardware dump from deviceQuery on the screen there; that's in the CUDA tools.

So this is our AI dev lab in a box. It gives you access to bigger models than will fit on a GPU, agentic AI, NCCL, device stacking (you can stack two of these), and the whole Nvidia ecosystem enchilada. But before we talk benchmarks, let's talk about getting started with this and its out-of-the-box capabilities, because I think that's the most interesting part. This is the dashboard on the Spark itself, and you can just jump right into a JupyterLab notebook and start doing image generation. That sounds like a good use of time and resources: 128 GB of system memory available, GPU utilization right there. I literally just updated it, and while I was shooting more of the footage for this video, Nvidia dropped another update. They're nervous as a kitten about this launch. It's fun. So this is a demo that I set up with JupyterLab, and of course it's a Danny DeVito-like character eating cereal out of a moving car. You can just jump right into having a Jupyter notebook environment on your LAN that has access to 128 GB of VRAM: build the prompt, do the code, run the thing, get the output. How simple is that?

And that is sort of what we jump into. The resources that Nvidia has on GitHub will walk you through doing almost all of this, and that's what I have set up for this video. There are the DGX Spark playbooks: dive into image generation with ComfyUI (I just used a Jupyter notebook, but we could use ComfyUI), the DGX dashboard that I just showed you, Flux and LoRA fine-tuning if you want that, an AI reasoning model, optimized JAX, LLaMA Factory. There are all of these examples. Now, I want to show you "build and deploy a multi-agent chatbot," because this is kind of modern enterprise architecture for AI, if you're working on that sort of thing. And it's really pretty amazing, because it's a whole bunch of chatbots working together. So let's take a closer look at that.

This is a sandbox that lets you experiment with agentic AI, but it's not the product; it's a thing for you to learn with. I've got my debug console pulled up in my browser. Hello. Send a message to start chatting with Spark. Understand what we're looking at here: we've got GPT-OSS 120B acting as the supervisor model, but we actually have several other models in play. We have DeepSeek Coder 6.7B Instruct, the code-generation model that GPT-OSS 120B will work with. We also have Qwen 2.5 VL 7B Instruct, and Qwen3 Embedding as the embedding model. Together these models are pretty big. They're not going to overrun the 128 GB of memory we have here, but together they get pretty close to filling it up.
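For a rough sense of why that stack fits in 128 GB, here's an illustrative budget in Python. The per-model quantizations are my assumptions for the sketch, not the playbook's published settings:

```python
# Approximate weight footprints: parameter count x bytes per weight (assumed).
models = {
    "GPT-OSS 120B supervisor (~4-bit)":    120e9 * 0.5,
    "DeepSeek Coder 6.7B Instruct (FP16)": 6.7e9 * 2.0,
    "Qwen 2.5 VL 7B Instruct (FP16)":      7.0e9 * 2.0,
    "Qwen3 Embedding (FP16, ~0.6B)":       0.6e9 * 2.0,
}
for name, size_bytes in models.items():
    print(f"{name}: ~{size_bytes / 1e9:.0f} GB")

total_gb = sum(models.values()) / 1e9
print(f"Total weights: ~{total_gb:.0f} GB of 128 GB, before KV cache and overhead")
```

That lands around 90 GB of weights alone, which is why "pretty close to filling it up" is the right way to think about it once context caches are added.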
And so we can upload documents and have tool calling, and the model can recognize: oh, I need to call a tool to parse this PDF, or to understand the contents of this PDF, or to see what's inside this image, or to help with code generation. You can access some of that through the chat. So I've uploaded a PDF of the Nvidia RTX Blackwell GPU architecture, as well as the MNL-2821 PDF, which is a motherboard manual. Using only the MNL-2821 PDF as a reference: does this support DDR5, and DDR5-6400? I'm curious about the memory, and I don't want to go digging through the manual, because, you know, reading. I'm lazy. I've got a console pulled up here in Docker, and so it's doing the tool-calling thing. It has preprocessed the document, and it says: okay, according to the manual, revision 1.0, for the H14SSL-NT. It got the name of the motherboard correct. DDR5 memory can run at a speed of 6400; the manual confirms DDR5 support, and the memory type specifies a maximum frequency of 6,400 megatransfers.

Can you tell me what page of the manual has information about populating the memory channels? Let's see if it can handle this. This is sort of complicated. Tool start: search documents. We can see that happening in the console here; there's all kinds of stuff happening. Oh, I forgot that the PDF was already there. There we go: forcing full prompt reprocessing. Page 40, figure 216. Let's see. Boom. Look at that. Page 40. Woo! Our AI future is here. And there's the memory speed. It's nice. The table also has the supported nodes-per-socket configuration. It's nice that it got the diagram correct, but it did not get the channel count correct. It said, oh, six memory channels. No, sweetie, that's 12. But hey, it's not doing too badly. This is a developer tool; you've got to figure this out. This is the hallucination. This is the 0.1 release. These are the vacuum tubes of this generation. But there's something you can do with that.

Now, what about image processing? Again, this is a different part of the multi-agent setup. We have this diagram, "the surprising power of AI," and it's like, what's going on in this diagram? Let's just go ask it. And we can see from this that it is calling a tool to go to the internet and fetch things. This is also why so many websites have captchas: they don't want AI to be able to retrieve documents directly. And it's why AI companies are working on browsers with the AI built in. Instead of the AI having to go and get the image, it can just pull the image from what you're looking at; it can see your screen. This is Microsoft's approach with Copilot, but it's also becoming the norm for what AI companies want to do with browser software: you hit a button, the AI sees what you're looking at, and you can ask it questions or ask it to analyze things or whatever. Here we go; here's the result for the image.

So this example is tool calling into a different Docker container. If we look at the docker ps output, we can see that we've got a lot of containers running: the Milvus standalone, the Qwen 2.5 VL (that's the multimodal one, the one that just did the image processing), the 120B supervisor, DeepSeek Coder, and you can see how all of this is glued together.
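To make that routing concrete, here is an illustrative sketch of a tool-calling dispatcher. This is not NVIDIA's playbook code; the tool names and the JSON shape are assumptions for illustration:

```python
import json

# Hypothetical stand-ins for the containers described above.
def search_documents(query: str) -> str:
    return f"top passages for: {query}"        # PDF-retrieval container

def describe_image(url: str) -> str:
    return f"caption for image at {url}"       # multimodal (VL) container

TOOLS = {"search_documents": search_documents,
         "describe_image": describe_image}

def dispatch(model_output: str) -> str:
    # The supervisor model emits something like
    # {"tool": "search_documents", "args": {"query": "..."}};
    # the dispatcher parses it and runs the matching function.
    call = json.loads(model_output)
    return TOOLS[call["tool"]](**call["args"])

print(dispatch('{"tool": "search_documents", "args": {"query": "DDR5-6400 support"}}'))
```

The real demo does this across container boundaries, but the control flow is the same shape: the model chooses a tool, a dispatcher executes it, and the result goes back into the conversation.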
This chat interface recognizes that it needs to do tool calling, and then it calls the appropriate function in the other Docker container to go fetch the image and look at it, and based on the contents of the image it can give you a nice summary. So it's a multi-agent chatbot workflow. This is the state of the art; this is where people who have been doing AI development for the last five years are doing a lot of their work. And to be sure, it's got bugs, it hallucinates, and weird stuff happens. But that it is as functional as it is, with as little effort as it takes, is promising from a development standpoint and a future-tools standpoint. So this is just another demo of the multi-agent thing, and also of code assist and experimenting with that. You wouldn't use this for actual code assist; this is just an example of how you could build something like that. You wouldn't use this for production retrieval-augmented generation either, but you would use it to learn how that works as you're building something. That's what makes it interesting. This is just B-roll of the PDF that I uploaded, so we can see it referencing the PDF and looking through it in the Docker logs. And to be sure, this is the PDF that I uploaded: 57 pages of Blackwell specification and diagrams and everything else, and it was able to retrieve it. It can do retrieval-augmented generation, this is a demo of how retrieval-augmented generation works, and you can see how it's put together and build it yourself, which is exciting.

Also: text to knowledge graph. This is a pretty interesting way to look at and think about knowledge. I have to show you this demo. It's a text-to-knowledge-graph demo that's on the NVIDIA GitHub. You can do this one yourself, and it's not perfect, but it is a great starting point. This is one of my personal counterexamples for the AI doomers that I see everywhere. What I worry about is that a lot of folks are using AI as a substitute for thinking: for the thought process, for human labor and creativity, for just using your brain. AI isn't going to do the thing on its own any more than a table saw is going to cut up a bunch of lumber on its own. That's sort of the reality we're at. Yes, there are some things it can automate, but building a knowledge graph is important and interesting, and historically it was a lot of effort. It was worth it, though, if you had a lot of information to organize and you were trying to figure out what was relevant and what was not. Think scientific knowledge, or anything like that. But it also applies to fiction: you can use this on novels and science fiction, and it's useful for worldbuilding if you're an author. So I've uploaded Mary Shelley's Frankenstein, because it's interesting.

A knowledge graph is just a map of people, places, and things. The nodes are labeled, with the connections between them as the edges of the graph. Under the hood, each connection is a tiny sentence: subject, relation, object. "Victor Frankenstein creates the creature," or "Elizabeth Lavenza is the fiancée of Victor." Break a book, or your own documentation, into hundreds of these little atomic facts, and suddenly you can search and filter and do all sorts of fun things. You can reason over the story when you have it represented this way.
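As a minimal illustration of that subject-relation-object idea, here's a sketch using networkx as a stand-in; the demo's actual extraction pipeline is more involved than this:

```python
import networkx as nx

# Each edge is one "tiny sentence": subject -(relation)-> object.
G = nx.DiGraph()
triples = [
    ("Victor Frankenstein", "creates", "the creature"),
    ("Elizabeth Lavenza", "is the fiancée of", "Victor Frankenstein"),
    ("the creature", "confides in", "De Lacey"),
]
for subject, relation, obj in triples:
    G.add_edge(subject, obj, relation=relation)

# "Reason over the story": list everyone directly connected to Victor.
for neighbor in nx.all_neighbors(G, "Victor Frankenstein"):
    print(neighbor)
```

Once the facts live in a graph, "show all the characters connected to X" is a one-line neighbor query instead of a re-read of the book.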
It makes it interesting to explore, and to find things you might have missed during the reading. You'll notice there are some rough edges. Pronouns like "he" or "she," or possessives like "my," can confuse the extractor, so you might see nodes such as "the house of mourning" or a generic "he" linking to the wrong person. That's normal, and it's part of the exploration here. There's a metagame. This isn't meant to be a finished product; it's meant for you to use as a learning tool as you dip your toes, or maybe go all the way up to your eyeballs, into the wonderful new world of AI. So: coreference cleanup (replace "he" or "my" with the most likely named entity), merging and aliasing (collapse "Frankenstein" or "Victor the student" into one canonical "Victor Frankenstein"), keeping the original quotes and citations. Believe it or not, there's something for that in the GUI. It's just a thing you have to think about when you're working in terms of AI. This is actually a deep rabbit hole you can go down. You can use LangChain or not, and different embedding models and/or extractor models. That's a whole other set of learning, but the beginnings of it are here, and there are ways to approach those kinds of coreference-ambiguity problems before you feed the text to the thing that builds the knowledge graph. The goal is that you, the person watching this video, build a cool inference interface to knowledge. That's the point. And you can learn how this kind of stuff works.

Show all the characters connected to De Lacey, or: where does the creature appear before chapter 15? The graph narrows retrieval to passages directly connected to your question, so answers are grounded and explainable. Maybe you're interested in Justine, and you can click on that to see where it goes, what the connections are, what leads into it and what leads out of it. I can click a proper name and see the edges: creates, confides in, kills, rescues, writes to. The relationships light up in the graph, which is really awesome. Click any node you want and jump to the supporting text. Even with a little manual cleanup of those ambiguous "he"/"my" moments, you end up with a living, queryable map of the novel. The same workflow scales to your own research, or PDF notes, or meeting transcripts, or whatever it is you happen to be working on. And this is a small collection of different agents doing different aspects of the processing: coreference resolution, LangChain, all the Docker containers. You can see how it's built under the hood. So this is a really interesting and worthwhile demo; you should definitely check it out. And keep in mind that all of these containers are running simultaneously on Spark. That's what the 128 GB of memory gives you.

If you're looking for something more basic to start with, you could install Ollama, or Open WebUI with Ollama. We've done tons of videos on that, and setting it up on Spark is no different from anything else. With Ollama and Open WebUI set up, you can configure Ollama to permit connections from the network on port 11434.
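As a minimal sketch of what that buys you, here's a LAN client hitting Ollama's REST API from another machine. It assumes you've set OLLAMA_HOST=0.0.0.0 on the Spark so it listens on the network, that you've pulled a model (the gpt-oss:20b tag here is just an example), and that spark.local is a placeholder for your unit's hostname:

```python
import requests

# One-shot generation against the Spark's Ollama instance on its default port.
resp = requests.post(
    "http://spark.local:11434/api/generate",
    json={
        "model": "gpt-oss:20b",                 # any model you've pulled
        "prompt": "Explain NVFP4 in one sentence.",
        "stream": False,                         # return a single JSON object
    },
    timeout=300,
)
print(resp.json()["response"])
```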
At that point, you can configure Visual Studio Code, or whatever code editor you prefer, to connect to this instance, and then whatever model you've downloaded, say GPT-OSS 20B or 120B, can be used as your coding assistant. So you can do vibe coding right from Visual Studio Code on another machine on your network, using this as the AI backend with a 120-billion-parameter model.

There are even examples for fine-tuning vision-language models locally on Spark. Nvidia's team fine-tuned the Qwen 7B model to detect wildfires in aerial imagery, and that's really cool. But we can use Flux diffusion-model tuning to demonstrate Spark's capabilities another way. In the Flux diffusion-model fine-tuning demo, they use six images of Toy Jensen and eight images of DGX Spark to train the model to incorporate those subjects into its output. There's an example of that; you can go through it today, well, tomorrow, with what they're releasing on GitHub, and run through the model. Basically, all we're doing is showing the model these images so that it knows how to incorporate them into whatever a user might request. All of this can run locally on Spark. It takes about 110 GB of VRAM to get to the other side of it, and slightly more time than I had to do my own training run.

Nvidia also provided an Unsloth playbook on DGX Spark. Unsloth is a team that quantizes models. It's generally true that a larger model that has been quantized will perform better than a model that was small to begin with. It's sort of counterintuitive, because when you quantize large models past a point, they start to lose their coherency; they don't perform quite as well. In our own tests, we ran DeepSeek R1-0528, which is natively 8-bit. Ubergarm and folks in the Level1 community have quantized DeepSeek from Q8 to Q4 with roughly the same perplexity, and you can get that running on Spark, just barely. There's just barely enough memory. You're not going to have a super deep context, but you're able to run it and do interesting things with it, even though it's a model with hundreds of billions of parameters and we've only got 128 GB of memory. The perplexity is a little high on that one, but other models that aren't quite as large, quantized down to 4-bit, are generally still pretty useful, especially as a learning tool. So don't sleep on quantized models; they can work surprisingly well.

Now, I want to go back to video search and summarization for a second. I know we've talked about it in some of our other content, but it's an especially good demo for Spark and Spark's capabilities. I want to start by looking at the hardware-requirements page. Look: local deployment, default topology, eight B200s, or eight RTX Pro 6000 Blackwells. And we're going to use the local Cosmos Reason 7B model, with a Llama 3.1 70B model as the LLM alongside it. This seems like it would be impossible to scale down and attain, but you can scale it all the way down to Jetson Thor, as I've demoed before, and it also runs just fine on Spark. Thor and Spark are similar in some of these regards.
And this is a great platform to experiment with. You can build something. Imagine a system where packages are coming down a conveyor belt. You can build a vision system that says: look at the packages coming down the conveyor belt; look for this attribute, or look for packages that look like they've been mishandled, or look for packages where the packing material seems damaged in some way. That system prompt is the programming task: one of the things you come up with as a developer is the natural-language prompt that you feed the language model that's analyzing the video. That will run just fine on the Spark platform, and it works well there. Even though it's not eight RTX Pro 6000s, you can still experiment with it, work on it, and iron it out here, and then scale it up as far as you want to go, to those multi-million-dollar B200-based systems. That's the scale. That's the promise. That is the ecosystem Nvidia has built, starting from individual developers, folks like you and me watching this video, working on a relatively self-contained system. The thing you do here, the thing you figure out here, what you set up here and the geometry of how you engineer that solution, will scale all the way up to the top. That is the value in Nvidia's ecosystem; that is what they have built. And the video search and summarization demo you can check out today. There's nothing Spark-special about it; it'll run on basically anything, as long as you've got the VRAM for it. It can be parcel inspection; it can be road traffic inspection. The other demo we did was warehouse safety, showing how the system would automatically flag things, like: show me the time indexes where someone was not wearing safety gear. At the end of the day, you get a table of events created in a web UI, and you can explore the video data that way. That's something a relatively inexperienced developer can build with these learning resources, and then develop modern commercial solutions around.

Did you forget that Spark has two 100 Gb interfaces? Nvidia has the walkthrough for how to set that up. NCCL, the Nvidia Collective Communications Library, covers a subset of MPI (Message Passing Interface) functionality, and Nvidia has done a lot of work in Open MPI, in open source, and in building the GPU fabric. We've seen very large GPU installations where each GPU basically has its own network card. It is brilliance and foresight that Nvidia put such a ridiculously powerful ConnectX-7 NIC in Spark to begin with. I mean, dual 100 gigabit; that's going to be something really exciting. I've only got the one Spark, so I can't connect two, but I'm familiar with how this works and with the underlying fabric. NCCL, pronounced "nickel," is a great, accessible way for you, the developer, to leverage multiple GPUs across multiple physical hosts. And Nvidia is far out in front here, far ahead in terms of the GPU fabric interface and being able to pass messages for GPU jobs to remote machines. So I can imagine there are teams out there learning to write code or learning to work on this sort of thing that need this kind of functionality.
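I can't demo it with one unit, but for flavor, here's a minimal sketch of the kind of thing NCCL enables: a two-host all-reduce through PyTorch's NCCL backend. The master address and ranks are placeholders for a hypothetical two-Spark setup, not a tested config:

```python
import torch
import torch.distributed as dist

# Run one copy of this per Spark: rank 0 on the first box, rank 1 on the second.
dist.init_process_group(
    backend="nccl",                            # GPU-to-GPU collectives
    init_method="tcp://spark-0.local:29500",   # placeholder master address
    rank=0,                                    # 1 on the second machine
    world_size=2,
)
t = torch.ones(4, device="cuda")
dist.all_reduce(t)       # sums the tensor across both hosts over the fabric
print(t)                 # tensor([2., 2., 2., 2.]) on each rank
dist.destroy_process_group()
```

The point is that the same API call works whether the two ranks are two Sparks on your desk or two nodes in a B200 cluster; only the launcher config changes.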
The ConnectX-7 that's in the Spark seems, in my testing, to be a little power-gated or power-limited; the ConnectX-7 NICs I have access to in the enterprise, in the real world, perform a bit better. But you can still get 100 gigabits out of this, and I think stacking two Sparks together, giving you access to a memory footprint of 256 GB, is entirely reasonable given the other performance limitations of the Spark, as a platform to experience the APIs, learn things, and put it all together. Again, that is part of Nvidia's goal with this whole platform. If you can run this stuff at home, or in your lab, or for your employer at a small scale, you can then convince them to spend the few million, or dozens of millions, or hundreds of millions of dollars on the big stuff, once you've got it running on the small stuff. NCCL and the network aspect of this, being able to scale to GPUs across multiple physical hosts, is going to be a big part of that. At a very basic level: connect two Sparks together, and it scales. The theme here is that yes, we can scale from this tiny desktop appliance all the way to something enterprise-level in the cloud. That's what makes it exciting. I've only got the one, so this will have to be a video for another day. But yes, 200 Gb through dual QSFP connections. Amazing. Also, look at the how-to to see what you're getting yourself into; it's really not that complicated.

Now, I mentioned I think the CX7 is a little power-gated. You definitely notice it a little more when the system has a background load and things are running. At peak, across both CX7 interfaces, I can achieve about 130 to 150 gigabit, give or take. Not too bad. You can see it dip down there sometimes, and there does seem to be a thermal component: I can let it run for a long time and heat-soak it. It doesn't seem to be thermals from the optical adapters; the system does not ramp the fans even when the optics get really hot. I've tried both ColorChip optics and even just loopback cables, which use very little power overall. Real-world performance is about 55 gigabit off the interface even when the rest of the system is busy, which isn't too bad.

OpenFold: open protein folding. Yes, you can do open protein folding and customize it; Nvidia has a demo for setting up OpenFold. This is something I'd like to explore. If there are any bioinformaticians on the forum and you want to do any testing, let me know. But that's something I'm going to have to save for another video.

I always go back to Jensen's slide on Blackwell, and "Hoppers will be hard to give away." Remember that? "When Blackwell starts shipping in volume, you couldn't give Hoppers away." It's incredibly important to understand that that graphic mixes two things. There's the performance benefit from the new Blackwell silicon; sure, that's the hot new thing, as you'd expect. But there's also the performance benefit from aggressive low-precision math, and that last one doesn't come without trade-offs. So with Spark, NVFP4 is the real accelerant.
NVFP4 uses two levels of scaling, per block and per global tensor, to try to hold accuracy losses to about 1% while shrinking the memory footprint by about 3 to 3.5x versus FP16, or about 1.8x versus FP8. That's why Spark, a 20-core ARM box with 128 GB of LPDDR5 that, as I said, operates at just 275 GB per second, can feel really, really fast. FP4 trades bits for bandwidth, and that's important for Blackwell generally, but particularly important for Spark. You can run FP8, FP16, and bfloat16, but you're going to hit the bandwidth wall sooner on Spark than on other Blackwell-based platforms, because of the LPDDR5. And if you think on a longer timeline: because we've already burned precision down to FP4, the generations after Blackwell aren't going to get the same fewer-bits speedup. There are no more bits left to cut. Future gains will have to come more from architecture and hardware, right? So this is going to be the biggest speedup we see for a while; I just don't think we're going to see this on Vera Rubin. I could be wrong. There were precision shifts in prior generations, sure, but that's also why Blackwell's rollout has been somewhat bumpy: the kernels and enablements and that sort of thing were still landing even after the hardware launched. That's why, as I'll talk about a little later, Jetson Thor got the headline of a 3.5x speedup just a few weeks after launch. That's because of this. Week by week, TensorRT-LLM, NCCL, the FP4 paths, all those software paths are maturing, letting developers get closer to what Jensen promised when he showed us that Blackwell slide. So for me, Spark's charm is less its peak performance and more that it's an AI lab in a box that teaches modern workflows and really helps you understand NVFP4 and how it might benefit whatever you're working on.

And so, finally, we can look at the actual benchmarks. But remember, this is an AI lab, not a thing that's meant to go at the speed of light. A 5070 is going to be faster, but you're going to run out of VRAM really quickly; its VRAM is simply faster than LPDDR5. If you look at our normal benchmarks, we're doing FP8 benchmarking here: about 93 tokens per second on our FP8 Llama 3.2 run. The bfloat16 number is also a not-unreasonable 62 tokens per second. Moving from bfloat16 to FP8 is quite a speedup. But now you understand why I gave you the background on FP4: FP4 is the star of the show here. You've got some pretty good indicators of performance here with GPT-OSS 20B, and honestly, the 120B version of that model is not that much slower. And remember, this is all still running at 8 or 16 bits (well, bfloat16, so not quite true FP16). Going from true 16-bit floating point to bfloat16 is a little faster, then FP8, and then all the way down to Nvidia's 4-bit, and you start to understand: yes, the performance gain here is substantial. That same Llama 3.1 8B model does 39 tokens per second, versus the 23 tokens per second we were seeing here. The back-of-envelope sketch below shows why fewer bits per weight maps so directly onto tokens per second.
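Purely illustrative numbers, assuming decode is memory-bound and every weight is read once per token:

```python
BANDWIDTH_GB_S = 275      # Spark's peak memory bandwidth, give or take
PARAMS = 8e9              # e.g. a Llama 3.1 8B class model

for fmt, bytes_per_weight in [("FP16", 2.0), ("FP8", 1.0), ("NVFP4", 0.5)]:
    weights_gb = PARAMS * bytes_per_weight / 1e9
    ceiling = BANDWIDTH_GB_S / weights_gb   # tok/s if weights were the only traffic
    print(f"{fmt}: ~{weights_gb:.0f} GB of weights -> ~{ceiling:.0f} tok/s ceiling")
```

Real numbers land below these ceilings (KV cache, activations, scheduling all cost bandwidth too), but the ratios track what the benchmarks show: FP8 around 23 tokens per second and NVFP4 at 39 on the same 8B model.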
It's the same model, just a different quantization. The perplexity of the output does change on these models when you mess with the density, but that's NVFP4 in this case. There's also the MXFP4 format, depending on the backend. For my testing I mostly used vLLM, but TRT-LLM is a lot more optimized, so the performance here for the various models is about what you'd expect. Spark also supports inferencing with speculative decode. So if you want to run the Llama 3.3 70B model, you can expect about 11 to 12 tokens per second running NVFP4 with the TRT-LLM backend. I'm pretty sure vLLM is going to be in the same neighborhood, but you'll have to be running the very latest version of vLLM to do that. That's why I say the star of the show here is 4-bit, and the purpose here is an AI learning lab, more than "I'm going to buy this appliance and it's going to run insanely fast." Or maybe having RTX Pro 6000s has spoiled me; I don't know.

Spark is compatible and tested with a large ecosystem of tools. Whether you're working with PyTorch or TensorFlow or Hugging Face, working with custom transformers, or digging into RAPIDS or pandas or NumPy or Polars, and Nvidia has their own AI Workbench, NeMo, Gradio, Streamlit, the whole stack. There's basically something here for everything, vLLM or even TensorRT-LLM, which I think has the most optimization for Spark of anything out there. If you want to experiment with doing your own quants, or quantizing large models to test-drive the new NVFP4 format, you can do that too, natively on this piece of hardware. The NVFP4 format is what puts this in a performance class all by itself, fortunately or unfortunately; you can spin that positively or negatively. Everything I just rattled off is currently the hotness in different corners of the AI universe, and it's all running here at reasonable performance and reasonable scale. I say "reasonable" because this generation leverages the loss of precision for more performance, and I also think we probably won't see this level of performance uplift in subsequent generations. That's why I say, to put it nicely, that Nvidia almost but not quite bit off more than they could chew with the Blackwell generation: the hardware and the software were changing simultaneously. That's maybe a video idea for another day, everything that went wrong with Blackwell so far, because they changed a lot. But right now, as of today in October of 2025, this is the hardware embodiment of the most cohesive software ecosystem for AI that exists. This is relatively small scale, sure, but it's the smallest thing you can get that will let you do basically everything end to end, for about any DevOps tooling you'd want to do, up to and including GPU fabrics if you get two, at least for NCCL.

What about competing platforms? Wasn't AMD first on the scene with the Ryzen AI Max+ 395? And yeah, on paper, this looks like AMD's answer to Spark. This is the Framework Desktop featuring that platform: same 128 GB of memory, similar memory bandwidth. In 2025, this is using the Vulkan backend for AI, and with it I can get similar FP8 inference performance on AMD versus Spark. But really, that's where the similarities end.
NVFP4 is the first differentiator for Spark. NVFP4 is what Nvidia is leaning into hard, and you're going to see it in all of Nvidia's promo materials and that kind of thing, because of the similar FP8 performance of the two platforms. Blackwell has hardware support for FP4, and while the FP8 performance is fine, the star of the show is FP4, making the most of that 275 GB per second of memory bandwidth and modern workflows. Yeah, you can run FP8 and FP16 and bfloat16 everywhere, but Spark's feels-fast moments come from shrinking the tensors and trading bits for memory bandwidth. That's also why I wanted to see more raw FP8 performance from Nvidia, though maybe not on Spark, because machines like the Framework Desktop can actually go toe-to-toe at FP8, and I think Nvidia was maybe worried about that aspect of this launch, because it is kind of similar. It's amazing that AMD has the hardware and that you can do that. But Nvidia has the ecosystem in a box, and the networking. Can I DIY some networking stuff on Strix Halo? Yes. But does that give me a path to the cloud, and to what I'd be doing with CDNA? No. And this is a Vulkan backend over here; does the Vulkan backend help me get to a CDNA GPU fabric? Also no. Native hardware 4-bit quants on AMD? If you were a competitor thinking about the big picture, the missing parts, and how you would show someone how to scale from A to B, it's easy to imagine why a competitor might deadlock internally about building the kind of educational and demonstration resources Nvidia is launching with Spark. "You can't get there from here" is probably what they're thinking. This stuff is carried only by the grace of a few internal and external enthusiasts. I get it. I get it.

Meanwhile, Spark is a true soup-to-nuts dev lab first. You have a turnkey portal, a Visual Studio Code plugin, even an appliance mode right out of the box, so you can use it with your existing dev workstation or however you want. And it's not only about the inference performance; that's what I mean when I talk about FP4 and all the other stuff. On day one, you have a clear path with Spark to learning about and building useful retrieval-augmented-generation solutions, building knowledge-graph playbooks, learning about multimodal agents, and building complex video search-and-summarize routines. Things that were impossibly out of reach, or multi-million-dollar software projects, five years ago, you can do in a couple of weeks now. It's true frontier deep learning and fine-tuning, and there are examples for people doing research in entirely different fields: physics, biological research. And the software enablement here for Spark also benefits Thor; the FP4 enablement they're launching today or tomorrow delivers a 3.5x speedup there too. So I think Spark is going to go everywhere: CS labs, internal dev teams, solo tinkerers. It's an AI lab in a box with reasonable power, reasonably clear examples, and a clean path to bigger iron when you graduate to GPU fabrics and everything beyond. Folks are going to share my video with their bosses as justification to request a Spark or two, probably. If you need budget cover: this is cheaper than a forgotten cloud instance. Oops, accidentally left the cloud instance on, doing stuff. That's going to be a big bill.
And the skills that you build with this are going to transfer directly to those big clusters in the sky. For competitors, this is also giving you your homework. Your homework is obvious here, right? Reduce the friction, document the path, make the developer workflows easy, because enthusiasts notice when the basics are easy, and then they roll that out at their jobs. That's how this works; that's how this has always worked. The enthusiasts are going to be really excited about it. There is another option, dual RTX Pro 6000s, which gives you 192 GB of VRAM, and that's currently the next tier beyond this. It unlocks everything I've shown you today. It's probably going to set you back at least $20,000, and it would be an order of magnitude faster, which is nice. But I'm also given to understand that Nvidia is working on an in-between option, between the dual RTX Pro 6000 workstation and Spark, so that's something you'll have to stay tuned for.

So, what's next for this hardware? Well, I'm going to try to get a second one and see if I can improve on those network speeds. Oh, and for what you see on the desk here: I'm already up 2% just from replacing the thermal compound with PTM7950. Hit me up if you have an idea for what you want to see next. I'm Will. This is Level One. This has been a quick look at Nvidia's Spark. Did it spark any ideas? Share them below, or in the forum at level1techs.com. I'm signing out, and I'll see you there.

Video description

Available October 15th! It's crazy seeing what the first spark looked like compared to this little dude! Check out our Forum Thread: https://forum.level1techs.com/t/nvidias-dgx-spark-review-and-first-impressions/238661 0:00 Unboxing 1:10 Overview 4:44 DGX Dashboard 6:24 Multi Agent Demo 13:14 Text2kg 17:33 Ollama 18:22 Fine-Tuning 19:23 Unsloth 20:52 Video Search and Summarization 23:59 NCCL 27:42 Open Fold 28:02 NVFP4 30:43 Benchmarks 33:28 Last Thoughts and Competition You can find us... Twitter - https://twitter.com/level1techs Twitch - https://twitch.tv/teampgp Patreon - https://www.patreon.com/level1 For all our social links, websites, and more, check out our link tree! https://linktr.ee/level1techs Thank you for watching! *IMPORTANT* Any email lacking “level1techs.com” should be ignored and immediately reported to Queries@level1techs.com.
