bouncer

Zaiste Programming · 1.2K views · 32 likes

Analysis Summary

30% Low Influence
Scale: mild / moderate / severe

“Be aware that the 'spontaneous' nature of the interview (described as a cold-outreach success) serves to create a sense of grassroots excitement that mirrors a planned marketing launch.”

Transparency: Mostly Transparent
Primary technique

Performed authenticity

The deliberate construction of "realness" — confessional tone, casual filming, strategic vulnerability — designed to lower your guard. When someone appears unpolished and honest, you evaluate their claims less critically. The spontaneity is rehearsed.

Goffman's dramaturgy (1959); Audrezet et al. (2020) on performed authenticity

Human Detected
100%

Signals

The video is a genuine, unscripted interview between two humans featuring natural conversational flow, personal history, and physical presence in a real-world office. There are no indicators of synthetic narration or AI-generated scripting.

  • Natural Speech Patterns: Transcript contains filler words ('uh', 'like'), self-corrections, and conversational interruptions typical of spontaneous human dialogue.
  • Personal Anecdotes: Sunny Madra shares specific, non-generic life details such as getting his first computer in 1988, hacking software on BBSes, and selling a company to Ford.
  • Contextual Interaction: The interviewer references a specific real-world meeting ('Saturday morning', 'hackathon') and reacts spontaneously to the guest's answers.

Worth Noting

Positive elements

  • This video provides a clear, high-level explanation of how Groq's LPU architecture differs from traditional GPUs, specifically regarding deterministic data flow and inference speed.

Be Aware

Cautionary elements

  • The use of 'revelation framing'—presenting a corporate interview as a lucky, spontaneous encounter—can make a standard marketing message feel like an objective discovery.

Influence Dimensions

About this analysis

Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.

This analysis is a tool for your own thinking — what you do with it is up to you.

Analyzed March 23, 2026 at 20:38 UTC · Model: google/gemini-3-flash-preview-20251217
Transcript

Interviewer: Okay, so we are, uh, here at one of the Groq offices, and it's a pretty crazy story, because we were doing a hackathon and I just wrote, uh, to you, Sunny, out of the blue, asking if you would be interested in doing an interview, and you responded immediately: yes. And we are meeting, we haven't met before, on a Saturday morning.

Sunny: On a Saturday morning.

Interviewer: It was pretty crazy. Sunny, welcome, uh, to this interview. Thank you for agreeing.

Sunny: Of course, thanks for having me.

Interviewer: It's awesome. So you are the head of, uh, Groq Cloud?

Sunny: Yeah, General Manager, uh, Groq Cloud, right, yeah. So I actually have been a serial entrepreneur, so I've built and sold several companies, and Groq just acquired my company in March, and so that's how I joined. But you know, I've known Groq and the company even before it was formed. I actually met Jonathan when he was leaving Google, and, uh, I had a, you know, chance to seed-invest in the company and help introduce him to a bunch of different investors. So I have a long history with Groq, so it's been really interesting to watch, and the last couple of months have been, you know, really, really exciting.

Interviewer: It's going crazy.

Sunny: Yeah, it's pretty crazy.

Interviewer: But before we get into that, could you tell us a little bit about yourself? How did you start, uh, programming? How did you get involved with AI?

Sunny: Yeah, so I'm a lot older than you guys, but, uh... I got my first computer back in 1988, and, you know, when I got it back then there was, you see, no internet, and so it was very basic in terms of understanding. I would say a year after getting it I was already opening it up and, like, learning how to upgrade different parts of it. A year after that I was online with BBSes, you know, so very early on, and then a year after that I was, you know, part of different online BBS communities, and we were, like, hacking software and taking away copyright protections. Um, and so I have a long history in that, and then, you know, ultimately I went to college for computer engineering and it honed my skills. Um, you know, I worked at a startup and went to Cisco, was at Cisco, and then I've built, you know, three of my own companies after that. So I've been writing software for a very, very long time.

Interviewer: Maybe a random question: what's your favorite programming language?

Sunny: You know, it's an interesting question you ask, and I think I have, like, a unique skill in that I can look at any language and it doesn't bother me, uh, you know, its context or, or syntax. And so I don't really have a favorite one, because I can sit down in Python or Java or C or C++ or, you know, sort of, Go, or whatever it is, and they don't look different to me.

Interviewer: Yeah, this is, like, a sign of good programming.

Sunny: It is, yeah, and that's how it seems to me. You know, there are certain tools I like better, which are more advanced. You know, there was a time and place where, you know, IntelliJ for Java was very, very powerful as an IDE, and now you have, like, things like Cursor and all the, you know, um, you know, the plugins for VS Code with Copilot, which are powerful. Um, yeah, but I'm very kind of flexible when it comes to that.

Interviewer: I love this answer. We talked before the interview with one of your colleagues, yeah, and he was telling us about the fundamentals, and I think that's also important nowadays, that we need to learn the fundamentals and, like, reduce the abstraction. Yes, and if you're approaching a language like that, like a tool for a specific problem, I think it's... but that's a very unique approach, because there are many people just arguing about favorite languages, or editors, or tabs versus spaces.

Sunny: Yeah, yeah, exactly.

Interviewer: Yeah, and how did you get involved with AI? Uh, was it like, uh, from the early beginning, or...?

Sunny: No, no, it was quite recent. So, like, I sold my previous company, which was a data and analytics and telemetry platform for connected and self-driving vehicles, to Ford, and after we sold that company to Ford, um, you know, one of the things that we saw was companies weren't able to effectively utilize their data. You know, large companies have so much data, but it's very hard for them to access it. And so when we were starting Definitive Intelligence, we really thought about how more people could access it, and for us it became clear that AI would be able to help here. But this was prior to GPT-4 and GPT-3.5, and so we started experimenting with what was available then, like GPT-3, and we said, oh, can we use that technology to take human requests and turn them into SQL, or, you know, even, like, Python, to help extract data? And so that was when we really got deep into it.
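A minimal sketch of the text-to-SQL idea described above, assuming an OpenAI-compatible Python client; the endpoint URL, model name, and schema are illustrative placeholders, not anything confirmed in the interview:

    # Text-to-SQL sketch: hand the model a schema plus a natural-language
    # request and ask for a single SQL query back.
    # Assumes an OpenAI-compatible endpoint; base_url, model, and schema
    # below are illustrative placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="https://api.example.com/v1", api_key="...")

    SCHEMA = """
    CREATE TABLE vehicles (id INT, model TEXT, last_seen TIMESTAMP);
    CREATE TABLE telemetry (vehicle_id INT, ts TIMESTAMP, speed_kmh REAL);
    """

    def to_sql(question: str) -> str:
        resp = client.chat.completions.create(
            model="example-model",
            messages=[
                {"role": "system",
                 "content": "Translate the user's request into one SQL "
                            "query for this schema. Return only SQL.\n"
                            + SCHEMA},
                {"role": "user", "content": question},
            ],
        )
        return resp.choices[0].message.content

    print(to_sql("Average speed per vehicle model over the last 7 days"))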
Sunny (cont.): My co-founders, they have a little bit of a different history. Um, Gavin and Caleb... Gavin was one of the early contributors on Postgres, and they were both architects of an at-scale database product called Greenplum, and when they were there they were very early on in creating MADlib, which is one of the first kind of plug-in libraries for ML. So they have a much longer history in AI than I do. But for us it really kind of took off, I would say, just before the release of ChatGPT, and then as ChatGPT came out, which is GPT-3.5 and 4, we started to see this, like, exponential capability, um, and that's, you know, when we really kind of started to double-click more and more on what would be possible.

Interviewer: There is now another, like, exponential moment, I would say, uh, and this is speed, right? Yes, because before, those tools, or those machines, I don't know how to call them, LLMs, were pretty slow, and then Groq happened, yes, and now we have, like, this, uh, immense, uh, speed, I would say.

Sunny: Yeah, yeah. So, like, I'd say the following things are happening which are really important. One, you know, speed and latency are everything on the internet, right? Whether it's, like, in Meta, or Facebook, or Google, everyone is always trying to get that down. But I think, like, Karpathy, you know, and I'm sure you guys follow his stuff, he really talks about, like, you know... and we'll just use LLM generically here, because I think he means it more broadly... the LLM as the new operating system, right? And he has, like, a post there, and in fact he commented on one of our posts, because we did the Llama 3 launch, and, you know, he said, yeah, like, you guys get it. Where, you know, as it gets faster... you know, we're still in very low hertz, right? You know, CPUs operate in, you know, gigahertz, right, and we're still in very low hertz. But what's starting to emerge with this technology is that you can basically apply that thinking: can an LLM be, like, the new operating system?

Interviewer: Okay, so now, but we know that the curve is probably exponential, so the rate of growth will only increase. So I'm wondering, uh, where does Groq fit into the ecosystem?

Sunny: Yeah, for us, you know, I think let's maybe do, like, a flowchart from the top down. So there's, like, training versus inference. I think, like, Nvidia is dominant in training, right, and obviously they're currently dominant in inference, but I think our architecture, as you can see in the benchmarks and the demos, and by all the developers wanting to use us, I think we have an edge on inference. And we have an edge on inference because of the architecture. And I think what's happened is, you know, the architecture of Groq, which is really suited for, like, large scale, um, is really starting to shine, because most developers are not in a place now where they want to have, like, a single, um, you know, H100 system, or two, to run a model. Most developers want a serverless API and they want it to run really fast, and Groq is really well suited for that particular use case. So in the case of Llama 3 70B, we run that on 512 chips, right, but, you know, we're able to run it at 300 tokens a second, right? In the case of Llama 3 8B, we run that on 64 chips at 800 tokens a second. So those speeds, I think, are really important to this new generation of applications that are being built. And then there's one more factor that happens when you can start running that quickly. You know, obviously the Llama release is incredible, because the model itself is very powerful, but almost all benchmarks are done single-shot. But if you allow yourself to use reflection, you can do that fast enough that a user may not even notice, and so you can get increased performance out of a model that's even smaller.
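The reflection pattern mentioned above can be sketched as a draft-critique-revise loop; fast inference is what makes the extra round trips invisible to the user. The endpoint and model name here are assumptions, not Groq's documented API:

    # Reflection sketch: draft an answer, critique it, then revise it.
    # Three model calls instead of one; only viable interactively when
    # each call returns quickly. Endpoint and model are placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="https://api.example.com/v1", api_key="...")
    MODEL = "example-model"

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model=MODEL, messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content

    def answer_with_reflection(question: str) -> str:
        draft = ask(question)                              # call 1: draft
        critique = ask(f"Critique this answer for errors and gaps:\n{draft}")
        return ask(                                        # call 3: revise
            f"Question: {question}\nDraft: {draft}\nCritique: {critique}\n"
            "Write an improved final answer."
        )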
Interviewer: That kind of leads to my next question, because we were working, with miow, on this, like, idea of having voice interaction with our lens, right, in real time, like with a human, in a way that, you know, when you were saying something, the LLM stops and then generates, given the context that is already provided. And I think right now that's only possible with Groq, right, to have this kind of fluidity, uh, and you talked about latency, it's like low latency and, like, fast.

Sunny: Well, 100%. And one of the things that we have in beta, maybe we'll make it available to you guys, is we actually have Whisper v3.

Interviewer: Yeah?

Sunny: We do, yes, we have Whisper v3 large available, and I believe we run it at, and you know, we haven't done a benchmark yet, but, like, 150x real time.

Interviewer: Oh, wow.

Sunny: So it's really powerful. So I think the combination of Whisper, plus our LLM, and then we'll also do a, you know, text-to-speech: that whole end-to-end will be really, really fast with Groq.
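The end-to-end voice loop described here (speech-to-text, then LLM, then text-to-speech) has a simple shape. A sketch assuming OpenAI-style audio and chat endpoints; the base URL and model names are placeholders, not Groq's actual catalog:

    # Voice turn sketch: transcribe, reply, synthesize.
    from openai import OpenAI

    client = OpenAI(base_url="https://api.example.com/v1", api_key="...")

    def voice_turn(wav_path: str) -> bytes:
        # 1. Speech-to-text (the interview mentions Whisper v3 large).
        with open(wav_path, "rb") as f:
            text = client.audio.transcriptions.create(
                model="example-whisper", file=f
            ).text
        # 2. LLM reply.
        reply = client.chat.completions.create(
            model="example-model",
            messages=[{"role": "user", "content": text}],
        ).choices[0].message.content
        # 3. Text-to-speech; returns raw audio bytes to play back.
        return client.audio.speech.create(
            model="example-tts", voice="alloy", input=reply
        ).content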
Interviewer: I wanted to ask you about the future, because there are so many topics to discuss right now and your time is limited, but I think voice will be an important part of the future, right, how we interact with computers. Voice could be one.

Sunny: I think voice and vision, right? And so, you know, exactly, and I think multilingual. So I think, like, you know... well, I'll just share with you guys: the things most important to us, one is embeddings, because you want to be able to do embeddings really fast, right? So you'll see a model from us on the embedding side. Um, obviously we have a beta for voice, and then, um, multimodal and vision models. You know, I think the combination of those things really is going to create, like, sort of, the future of applications and services.

Interviewer: So we talked about speed, but there is another factor which is pretty amazing about Groq, which is, like, developer happiness. Yes, people are super happy, super crazy about Groq on the internet, on Twitter, yeah, tweeting, and, uh, what do you make of it?

Sunny: I mean, well, you know, sometimes people think a lot about what it takes to make, you know, I guess, developers happy, but it's always the easiest thing: it's just, you know, price-performance. Yeah, so if you give them a really good price and really good performance, they're generally really happy. Like, whether it's Groq or even before: you want to be able to compile faster, you want your internet connection to be faster, right, you want your browser to run faster. And so I feel like we've just hit the sweet spot for developers, and for us, like, you know, the content is coming to match that, and really, you know, kudos to Meta and the launch of Llama 3. That's a very amazing combination for us.

Interviewer: Absolutely, yeah. I just wanted to also take a step back, maybe, because I'm not sure everyone who's listening to us, watching us, understood it, but Groq's custom architecture means that Groq is also developing custom chips, processors, yes. So could you, very briefly, at a high level, tell me how they are different from, I don't know, Intel or Nvidia?

Sunny: Yeah, I'm not going to be, like, the best suited, so I'm going to give, like, the high level.

Interviewer: Yeah, that's what we're looking for, yeah, exactly.

Sunny: And, you know, for that we should get you guys in with Jonathan or some of the other folks. But, like, the best way to think about it, and I'll use, like, some computer science principles, right: when you have a GPU, right, GPUs utilize external memory, high-bandwidth memory, and they have, like, a grid of compute units, and that grid of compute units competes with each other to go out to the memory to get access to what's needed. And if you think about... let's normalize everything to Llama 3 for a second. You know, you can run Llama 3 on probably somewhere between two and four H100s, and so all of the model fits in the memory for those units, and, um, all the compute units have to compete to go out to the memory to do the computations, to, you know, make the results occur. For us, we take the same model but, you know, shard it across 512 chips, and so there's no contention for the memory. And secondly, all the memory is within the chip, it's all SRAM, and so what ends up happening is it becomes highly parallelized, and it, you know, sort of runs in a dataflow architecture. And so because of that, it's just a completely different architecture that is well suited for larger scales. So if you're a single user at home and you want to run a model, like, Groq is not going to be where you want to start, because, like I said, you want to run it on more chips. But if you're a data center, or you're an enterprise, and you have, you know, tens or hundreds or thousands of users, that's when we're really well suited, and that's why you see the speed jump that you get.
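A back-of-envelope calculation makes the memory argument above concrete: in single-stream decoding, every generated token has to stream roughly all of the model's weights past the compute, so throughput is capped by bandwidth divided by model size. The GPU bandwidth figure below is an assumed round number; the Groq figures are the ones quoted in the interview:

    # tokens/sec ceiling ~= memory bandwidth / bytes of weights per token
    def max_tokens_per_sec(bandwidth_bytes_per_s: float,
                           model_bytes: float) -> float:
        return bandwidth_bytes_per_s / model_bytes

    llama3_70b = 70e9 * 2        # ~140 GB at 2 bytes per weight (fp16)

    # One GPU reading weights from external HBM (assumed ~3 TB/s):
    print(max_tokens_per_sec(3e12, llama3_70b))   # ~21 tokens/s ceiling

    # Inverting the same formula for the quoted 300 tokens/s on 512
    # chips gives the aggregate on-chip bandwidth that figure implies:
    print(300 * llama3_70b / 1e12)                # ~42 TB/s across chips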
Interviewer: So Groq is working on... there is GroqCloud, but there is also, like, a GroqRack, yes, you can also buy the machines. And I was wondering, uh, because now companies don't want to share, maybe, their data with an external provider, OpenAI: do you think it makes sense to, like, buy the Groq machines and build this kind of, like, a brain inside of a company? Is that a good idea?

Sunny: Yeah, so, like, our primary go-to-market motions are as follows. We want to basically, um, we want to allow people to use this in the cloud, for developers. Then, for folks that have concerns around privacy, we have, in the cloud, dedicated, single-tenant systems, so that's our next preference. And then, for large enough customers, we will kind of sell to them directly. But, you know, the industry is moving so quickly that we know we can offer sort of the latest and the most updates when we manage that hardware for folks, right? Because, you know, it's just sort of in our control, and it's in our control with, you know, privacy available for those customers.

Interviewer: Uh, maybe changing gears a little bit. Uh, as you said, the industry is moving very fast. Uh, do you have any recommendations you could give to people who are starting right now? Should they go to college, study computer science as we did, programming, or should they just, uh, immerse themselves in AI right away?

Sunny: Well, I think, uh, you know, I think there are a lot of different verdicts out there. I think, yes, you should go to college, or, you don't have to go to college, but you should go learn about computer science. I think you should also learn about computer architecture. I think now what I would say is what's really different, and I can say this both as a developer and as, like, an entrepreneur and CEO before, is the superpower that exists now. When you were building a company before, if you were a developer, you could only really focus on the, you know, the computer science part. Now I really challenge people to utilize these tools not just to accelerate building that, but you can do the marketing, and you can do the customer support, and you can even do the design, and you can do all these other pieces. And so what I really would push people to do is, they can become much more holistic, because they can tackle many other parts of the stack of building something by utilizing AI. So that's kind of the recommendation I try to give to folks now. You know, before, you would have, you know, folks in product management or different places, and what was their role? Their role was to take customer requirements and sort of translate them into what they think engineers can understand. Well, now you can just do that with, like, you know, your favorite model of choice. You could say, here's all this feedback, please aggregate it for me, and then let's think about what features we should implement.
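The feedback-aggregation workflow Sunny describes is a single prompt in practice. A sketch, again assuming an OpenAI-compatible client with placeholder endpoint and model; the feedback strings are invented examples:

    # Aggregate raw customer feedback into ranked feature candidates.
    from openai import OpenAI

    client = OpenAI(base_url="https://api.example.com/v1", api_key="...")

    feedback = [
        "Exports time out on large projects.",
        "Love the app, but I need CSV export.",
        "Export to spreadsheet would save me hours.",
    ]

    resp = client.chat.completions.create(
        model="example-model",
        messages=[{
            "role": "user",
            "content": "Aggregate this customer feedback into a ranked "
                       "list of candidate features, with one line of "
                       "rationale each:\n"
                       + "\n".join(f"- {x}" for x in feedback),
        }],
    )
    print(resp.choices[0].message.content)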
Interviewer: And it removes bottlenecks, right?

Sunny: Yes, we used to have communication bottlenecks.

Interviewer: It's kind of like what Groq does to GPUs, like, the chips, say, like, eliminate some, like, memory bottleneck. Kind of interesting. Uh, Sunny, I wanted to ask you: outside of your work, what are you passionate about right now in the AI space? Because there's, like, music generation, yeah, like, video generation. Are you following that?

Sunny: Well, you know, I do a podcast with Jason Calacanis, right, and so we follow a lot of things. I would say the three areas I'm the most excited about: one is music generation. I really find, like, the latest that's happened there to be pretty incredible. Um, I'd like it to intersect properly with rights, though, because I would like music from artists I like, and maybe perhaps a remix from there. I'm very excited about customized content generation. So, the example, and something I do all the time: a show I really loved was Seinfeld, right, and I don't know if you guys watched it, or still watch it, it plays in reruns, but something I do all the time is I'll go into, you know, my favorite AI of choice and say, you know, make me a modern episode of Seinfeld, but with, like, today's things, and it's fun to read, because, you know... I'd like that to be turned into something. And another one of my favorite shows, Curb Your Enthusiasm, just finished, so maybe I'll keep doing it there as well.

Interviewer: Maybe you'll be able to generate the videos.

Sunny: Well, that's exactly it, yeah. And then the third one, I would say, is, uh, robotics, like the personal robot, and there are a few companies now doing that, and I'm very excited for the opportunity that I think that brings for all of society, to have those personalized robots and their capabilities, because they're going to merge with generative AI, and they already are merging with it.

Interviewer: So, we're talking about society. Like, maybe one of the final questions: there are a lot of opportunities in AI, but there are also a lot of risks, yeah? Or, like, do you see... are you more optimistic, or how pessimistic?

Sunny: I mean, well, for me, I'm going to use the lens of open source, right? Think about, you know... people talk about the risks with AI, but if we look backwards, whether it's programming languages being open source, that's where it all starts, right, then we go to operating systems, then we go to, like, databases: the more that is open source, the safer society is, because those bugs can get patched, right? People can see malicious code. And so my general feeling is, if we keep pushing on the open side, which Meta is doing a great job at, I think it's amazing for society. The more closed it is, the more risky, because we don't know what's being put in the code, and how it's potentially being used, and could it be used by bad actors, and do only bad people have access to it. And so for me, as long as open is leading, which I think it is starting to now, I feel like the technology is very, very safe for us.

Interviewer: Okay, and if you were to fantasize about the next, I don't know, 6 to 12 months, what do you think will happen in the AI space?

Sunny: I think... a lot of people ask me this question, so the example I try to use is, like, reasoning, right? And so if you pull out, again, your favorite model of choice, and you give it a picture of two cups and say, tell me which one holds more liquid if they're different shapes, and give your reasoning why, it does a pretty good job. And I think those are, like, sort of, the examples where people say they're only at the level of a, you know, four-, five-, six-, seven-year-old. What I think I would love to see, and, you know, I do believe it's coming, is that reasoning capability at the level of an adult human, where you're looking at two pieces of code, or you're looking at two financial statements, and you're asking it to do an in-depth comparison and share its reasoning as to maybe why one company is better than another one, or why this algorithm is better than that algorithm. I think those types of things are very exciting.

Interviewer: Sunny, thank you very much for your time. This is still mind-blowing to me that we managed to meet... the magic of Silicon Valley.

Sunny: The magic of Silicon Valley.

Interviewer: Thank you again for your time.

Sunny: Yeah, of course. Thank you, thanks, bye.

Video description

Groq is a computing company that developed the fastest chip for LLM inference, enabling real-time chatbot responses with their proprietary Language Processing Units (LPUs). Founded by Jonathan Ross in 2016, Groq's LPUs deliver ultra-fast, deterministic AI inference performance by focusing on efficient data flow and unique chip design. We spoke with Sunny Madra, Groq's General Manager and Head of Cloud, to learn how they revolutionized computing for LLMs and what the future holds for this innovative technology.

00:00 - Introduction and Background
02:00 - Journey into AI and Programming

Links:
https://www.linkedin.com/in/sundeepm/
https://twitter.com/sundeep
https://groq.com

Follow us:
https://www.linkedin.com/in/zaiste/
https://www.linkedin.com/in/mmiszczyszyn/

Join 0to1AI 👉 https://www.0to1ai.com

#ai #programming #llm #computer

© 2026 GrayBeam Technology · v0.1.0 · ac93850 · 2026-04-03 22:43 UTC