Level1Techs · 123.7K views · 5.6K likes
Analysis Summary
Performed authenticity
The deliberate construction of "realness" — confessional tone, casual filming, strategic vulnerability — designed to lower your guard. When someone appears unpolished and honest, you evaluate their claims less critically. The spontaneity is rehearsed.
Goffman's dramaturgy (1959); Audrezet et al. (2020) on performed authenticity
Worth Noting
Positive elements
- This video provides an excellent, granular breakdown of NVLink C2C and unified memory architectures that is rarely found in mainstream tech reviews.
Be Aware
Cautionary elements
- The use of geopolitical 'cloak and dagger' rhetoric to frame a product review can lead to an exaggerated sense of the hardware's unique historical importance.
Influence Dimensions
Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.
This analysis is a tool for your own thinking — what you do with it is up to you.
Transcript
This computer, this is it. This is the one that created international incidents. It's been a huge headache for basically everyone. There were emergency meetings, closed-door briefings, and billionaires boarding planes on hours' notice to fly halfway around the world. Not to sell this product, nothing like that, but to explain it to governments. This machine sure didn't launch the way servers usually launch. The Helen of Troy of servers, maybe. Not because it wasn't fast or noteworthy. I mean, it's fast. But because nobody could agree on how it might be used, what it might actually unlock, and who was going to benefit first. This is a single GH200, and it's on the forbidden export list, or at least it was, for China. China wanted it and couldn't have it, and now they can have it, but they say they don't want it, or at least China is telling people not to buy it. But then there are massive billion-dollar smuggling operations to get it into the country. It's been treated like munitions, the way governments historically treat nuclear-proliferation-type stuff. And it's been sort of interesting to watch. You could run intelligence workloads on this. You could do some fun government stuff. But why all the intrigue? Why all the cloak and dagger? Well, it's kind of obvious, isn't it? Everyone thinks this piece of hardware is the demarcation point, a possible inflection point for all of human civilization. And I'm not kidding when I say this hardware may actually be remembered that way. Why? Because it's where we embark onto the next phase of human civilization. So obviously, if that's true, and I have one of these here, how could I not be excited? I mean, look at me talking when I've got science to do. >> [music] >> Okay, okay, okay. Stick to the technology. I get it, I get it. Not mythology. But this is the GH200.
This is the Supermicro ARS-111GL, a 1U system built around NVIDIA's GH200 Grace Hopper Superchip, as NVIDIA calls it, part of NVIDIA's MGX architecture. That last part matters. MGX is a single-server building block, but it's part of a larger philosophy: modularity, modular baseboards. This is the motherboard, and there's not much here. It's mostly heatsink, and all of this is just PCIe routing. It's a way for OEMs like Supermicro to build many, many systems around this kind of hardware, and it translates into modularized everything, including PCIe. The GH200 is a single coherent compute domain: a Grace CPU with 72 Neoverse ARM cores running at up to 3.1 GHz, plus a Hopper-class GPU (basically an H100) with up to 144 GB of HBM3e; this one is configured with 96 GB. The magic here is NVLink-C2C. You get 900 GB/s of coherent bandwidth, and that matters because PCIe can't even touch that. Even PCIe 5.0, which this system supports with three x16-wide links, tops out at 128 GB/s bidirectional per link. My measured bandwidth here [music], using baby's-first-Python-program methodology, moving from the HBM3e to system memory, is 340 GB/s both ways. Moving data in and out of system memory, CPU to GPU, at those kinds of speeds: I can't even hit 20% of that in a normal Threadripper test system with GPUs. That's how fast it is. It is absurdly fast. If you saw the B300 video, which you should definitely check out, you'll know a system like that has eight Blackwell GPUs and eight ConnectX NICs, and they're already on a PCIe fabric. All of that is on one carrier card, but the NVLink fabric there and everything else is running over PCIe Gen 6. That's what NVIDIA has done to try to mitigate the fact that PCIe Gen 5 is too slow.
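The bandwidth figures quoted above are worth a quick sanity check. A minimal sketch, using only numbers stated in the video; the transfer size and time in the example are illustrative, not measured:

```python
# Figures quoted in the video (not measured here).
NVLINK_C2C_GBPS = 900   # NVLink-C2C coherent bandwidth, GB/s
PCIE5_X16_GBPS = 128    # one PCIe 5.0 x16 link, bidirectional, GB/s
MEASURED_GBPS = 340     # HBM3e <-> system memory copy, as measured on camera

def bandwidth_gbps(bytes_moved: int, seconds: float) -> float:
    """Effective bandwidth of a timed copy, in GB/s (10^9 bytes)."""
    return bytes_moved / seconds / 1e9

# Example: a 68 GB transfer completing in 0.2 s works out to the quoted 340 GB/s.
assert round(bandwidth_gbps(68 * 10**9, 0.2)) == 340

# NVLink-C2C headroom over a single PCIe 5.0 x16 link: roughly 7x.
print(f"{NVLINK_C2C_GBPS / PCIE5_X16_GBPS:.1f}x")  # prints 7.0x
```

Which is the point of the passage: even a perfect PCIe 5.0 x16 link moves less than a seventh of what the coherent C2C link does.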
In a system like this, they can sort of bypass all of that architecture, and they have. That's also part of why this thing is so tiny. The unified heterogeneous memory is architecturally different, is really all I'm saying. The GPU can directly address the CPU memory, which is LPDDR5. The CPU can directly operate on GPU-resident data. You don't really need explicit memcpy or orchestration in your application. This is about raw floating-point performance [music] and removing friction from those kinds of operations. This is also the platform where GPUDirect NVMe-over-fabric support is being brought up, the software for that, and the network fabric being a storage fabric and vice versa. That's what it's all about. The parts of this ARS-111GL system: 96 GB of HBM3e, 480 GB of LPDDR5 attached directly to that Grace CPU, and all of the attached devices visible in one unified address space. And not all memory is equal. HBM3e is fast and low latency, but scarce and expensive. LPDDR5 is slower and higher latency, but vastly more available, even right now. That matters because, as we move forward in our global memory shortage, HBM3e supply is going to be the limiting factor for product lines. The GH200 doesn't eliminate that constraint, but it works around it, or at least has the potential to on the software side. And look at the internal layout. All of these headers here are MCIO. The PCIe is fungible; it might not even be PCIe. I know that in a lab somewhere, experiments are being done with CXL, which is another tier of memory and another place where you've got fungibility, not with your PCIe lanes but with your memory architecture. But you still have those three groups of 16 lanes. You want to add BlueField to this? That's NVIDIA's platform for offloading the storage-fabric calculations or ancillary compute.
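The fast-but-scarce versus slow-but-plentiful trade-off described above can be sketched as a toy two-tier placement policy. To be clear, this is not NVIDIA's allocator or any real API, just the shape of the idea: hot buffers land in HBM3e and spill to LPDDR5 when the fast tier fills, with both tiers living in one address space:

```python
# Toy sketch of two-tier memory placement (illustrative only).
# Capacities match this machine's configuration: 96 GB HBM3e, 480 GB LPDDR5.
class TieredMemory:
    def __init__(self, hbm_gb: int = 96, lpddr_gb: int = 480):
        self.free = {"hbm3e": hbm_gb, "lpddr5": lpddr_gb}

    def alloc(self, size_gb: int) -> str:
        """Place a buffer, preferring the fast tier; return the tier used."""
        for tier in ("hbm3e", "lpddr5"):
            if self.free[tier] >= size_gb:
                self.free[tier] -= size_gb
                return tier
        raise MemoryError("out of memory in both tiers")

mem = TieredMemory()
assert mem.alloc(90) == "hbm3e"   # fits in the 96 GB of HBM3e
assert mem.alloc(50) == "lpddr5"  # spills to the 480 GB of LPDDR5
```

The "work around the HBM shortage" argument is exactly this: a buffer that spills still works, just at LPDDR5 speed, instead of failing outright.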
So you can add another kind-of-GPU, one that sits close to your NIC. Now, the ConnectX-7 NIC I've got in here can offload a lot of things; RDMA is definitely part of the equation. I can hook this thing up. Remember the four-Spark cluster? I can connect this to that at 200 gigabit. So we could do prompt processing on a Spark at 200 Gbit, then run the inferencing over here at north of a thousand tokens per second. Hell, we can get very nearly 10,000 tokens per second on a smaller model like Llama 7B with a relatively small context. The internal layout here is more fans and heatsinks than PCB. It's actually a tiny PCB, maybe smaller than some original video game consoles. Up front it's mostly heatsink. NVIDIA has some fun teardown pictures of what this board looks like bare, but all of these connectors at the back are PCIe lanes. The PCIe lanes break out and can go do whatever. Some break out and go to the front: we have eight Gen 5 E1.S NVMe drives that can be part of the local fabric, and there are other chassis configurations that route the lanes differently. Then we have 16, 32, 48 PCIe Gen 5 lanes here at the rear. I'm not using one of the slots, and one slot has a relatively pedestrian 10-gig Ethernet adapter in it, so we're not using anywhere near the bandwidth there. But I do have a ConnectX-7 NIC, which can connect to our InfiniBand RDMA network with our DGX Spark systems. There's a riser here connected to the operating-system M.2, so we could have one or two OS M.2 drives, plus the front E1.S bays for local model storage or caching at absurdly high speeds. Remember, each one of those can do 15 GB/s. And this is it. This is the building block. You throw as many of these in a rack as will fit [music], plus your networking fabric, plus everything else.
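The I/O numbers in that walkthrough can be tallied with back-of-envelope arithmetic, using only figures quoted in the video (eight E1.S Gen 5 drives at 15 GB/s each, a 200 Gbit NIC):

```python
# Back-of-envelope I/O budget, figures as quoted in the video.
drives = 8
per_drive_gbps = 15                        # GB/s per E1.S Gen 5 drive
aggregate_gbps = drives * per_drive_gbps   # local flash bandwidth, GB/s

nic_gbit = 200
nic_gbps = nic_gbit / 8                    # 200 Gbit/s is 25 GB/s

print(aggregate_gbps, nic_gbps)            # prints 120 25.0
```

The front flash bays alone out-run the ConnectX-7 link by nearly 5x, which is why local model storage and caching at these speeds is interesting.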
And that's what gets scaled. Supermicro is shipping full racks, with the entire rack preconfigured and pre-cabled. They're deploying it a rack at a time, and the computer janitors on the ground just wheel the rack where it's supposed to go, connect the north-south and east-west connections, and that's it. And power, obviously. Fully kitted out, I mean, like I say, the dual power supplies. You can add more GPUs to this if your architecture demands it, but you can also just use high-speed networking or BlueField, so you can customize this a little depending on what sort of workload you're looking to run. Remember the RTX Pro 6000 rig? Each GPU there has 96 GB of memory, too. But the speed difference between GDDR7 on those and HBM3e here is even bigger than the gap from GDDR7 to LPDDR5. It's absurdly fast. And Hopper, the GPU architecture, and by extension the GH200, is very good at FP64, FP32, and FP16 mixed-precision scientific workloads. The stuff we saw at Supercomputing, the jobs scientists are running, runs really well here. But that's not a sign of legacy design. The H100 was designed when AI and HPC shared the same roadmap, and that's not really true anymore as we move into the Blackwell and Vera Rubin generations. Blackwell's focus was FP8, FP4, NVFP4, transformer efficiency, throughput per watt and per dollar, and all of that is great for AI. But this is what scientists and engineers are still excited about, anyone whose workload still lives in FP32 and FP64. Performance scaling as we move to Vera Rubin depends on playing it a little fast and loose with number formats. The GH200 isn't better than Blackwell; it's aimed at a different class of workload, at least for now. That's why Jensen's comment about not being able to give it away is technically true in a narrow sense.
If you only care about next-gen AI training and next-gen AI performance, or you only want NVFP4, then yeah: Blackwell and beyond. Now, let's talk for a second about Wall Street, AI, and the bubble. Wall Street quietly expanded the depreciation schedules for machines like this from 3-to-5 years to 5-to-8 years, and that's not just accounting theater. I think the depreciation schedule on equipment like this should accurately reflect its useful lifetime. Look at an ancient GPU, the Volta V100, released in 2017. It's only just starting to be retired, 8 years later, and there are many, many academic researchers rocking small Volta clusters perfectly happily. If you're at an academic institution and you're thinking "hey, I'll take your old Voltas," speak up below, because a lot of you in lab coats aren't necessarily ready to go down the Blackwell route. In that sense, I buy that Hopper-based silicon is going to have a longer effective useful lifetime, because of non-AI number formats. And you know, FP64 isn't really getting faster on newer stuff. That makes sense to me. I understand that memory capacity ages a little slower than peak FLOPS, and how peak FLOPS gets defined is somewhat loosey-goosey, but you've got some options there. Couple that with maybe some CXL capabilities and unified memory architectures for heterogeneous memory, and maybe this is going to have more forward compatibility than everybody realizes. So whatever number [music] formats become fashionable post Blackwell and Vera Rubin, Hopper is going to be around, and Hopper might even be more useful if something replaces the transformer model.
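The depreciation argument above is simple arithmetic. A minimal straight-line sketch: the system cost here is a made-up illustrative number (the video gives no price); only the 3-to-5 and 5-to-8 year windows come from the transcript:

```python
# Straight-line depreciation sketch. $250,000 is a hypothetical system cost;
# only the schedule windows (3-5 years vs 5-8 years) are from the video.
def annual_expense(cost: float, useful_life_years: float) -> float:
    """Straight-line depreciation: equal expense each year of useful life."""
    return cost / useful_life_years

cost = 250_000.0
old_schedule = annual_expense(cost, 4.0)   # midpoint of the old 3-5 year window
new_schedule = annual_expense(cost, 6.5)   # midpoint of the new 5-8 year window

reduction = 1 - new_schedule / old_schedule
print(f"${old_schedule:,.0f} -> ${new_schedule:,.0f}/yr ({reduction:.0%} lower)")
```

Stretching the schedule cuts the yearly expense hitting the books by roughly 38%, which is why the change matters to Wall Street, and why it only holds up if the hardware really does stay useful that long.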
Maybe recurrent neural nets become a thing again; then we have to add more and different silicon for that, and sure, that would make this more obsolete. But for the training steps, and for FP64 and science, this makes sense. It's only dangerous in the sense that it makes it easier for people to deploy and program at scale, and it makes that kind of thing more accessible to the people who have the skill and intelligence to use it. I think that's really what scared everybody: it will enable you to do really amazing things. And if the people doing really amazing things with it do things the other group of people weren't able to do, then maybe that's a little bit scary. But also keep in mind that when you deploy one of these, the software is very incomplete, and peak efficiency for something like this comes 12 to 24 months after deployment, maybe even well beyond that if there's some kind of software breakthrough. These also scale. Whether you roll out eight of these or 8,192 of these, the software is basically the same, so the people doing the software don't have to worry about that. They can put all of that effort back into productivity, into making it a little better and making it do something interesting. And in so doing, when they figure it out on this generation, maybe the next generation gets a little bit faster. That's kind of what happened with Blackwell: "oh, we think we can use fewer bits without any real loss of utility," and that got baked into the next generation of hardware. But if those assumptions turn out to be wrong, or there are other breakthroughs, you're still going to need an FP64, or at least FP32 or FP16, jumping-off point. So this is a pretty safe investment in terms of, you know, 8 years of investment.
And I really do think time will tell, but for data-center and academic use cases, I think this is going to have better longevity than Blackwell. This is Level One. This has been a quick look. If you have a job or a workload you want to run, it'll be a couple weeks before I have to return this, so let me know what you'd like to run on it and we'll post the results on the Level1 forum. I'm signing out, and I'll see you there.
Video description
*if you're in China

We recently added a Thoughts on AI section to our forum if you want to pop in and let us know any ideas/thoughts you might have with regards to further AI vids: https://forum.level1techs.com/c/developers/thoughts-on-ai/170

You can find us...
Twitter - https://twitter.com/level1techs
Twitch - https://twitch.tv/teampgp
Patreon - https://www.patreon.com/level1

For all our social links, websites, and more, check out our link tree! https://linktr.ee/level1techs

Thank you for watching!

*IMPORTANT* Any email lacking "level1techs.com" should be ignored and immediately reported to Queries@level1techs.com.