Zaiste Programming
Analysis Summary
Performed authenticity
The deliberate construction of "realness" — confessional tone, casual filming, strategic vulnerability — designed to lower your guard. When someone appears unpolished and honest, you evaluate their claims less critically. The spontaneity is rehearsed.
Goffman's dramaturgy (1959); Audrezet et al. (2020) on performed authenticity
Worth Noting
Positive elements
- This video provides a practical, hands-on demonstration of integrating high-performance hardware (Groq) with speech-to-text services (Deepgram) for real-time applications.
Be Aware
Cautionary elements
- The use of 'geographical elitism' to suggest that technical information is only valid if filtered through a specific Silicon Valley social circle.
Influence Dimensions
Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.
This analysis is a tool for your own thinking — what you do with it is up to you.
Transcript
We are live — do you hear us? Yeah, I got a confirmation. Hi, how are you? I'm fine, thank you. Now, this is a typical delay that we would expect from an AI model, right? Yeah, I don't know, five seconds or something — that was typical up until Monday. Maybe, because we will be talking about how it's possible to make something similar today, and maybe what the constraints are when building systems like that.

But before we start, I would like to welcome everyone. Could we get a confirmation that our audience is hearing us — maybe seeing us as well? Okay, we are getting confirmation. Awesome.

Before we start, maybe we should talk a little about the trip we did recently. We just returned from San Francisco — how do you feel about that trip? I was on the edge between saying San Francisco and Silicon Valley. It was amazing, so inspiring — packed with ambitious people. You sometimes get to experience that in Europe, but only on special occasions, like conferences; in San Francisco it's like that every single day — two or three events you can attend just in the AI area. It was a very intense period. During those four or five weeks we did a lot: we met a lot of people — we'll be discussing this during this live, some crazy stories, but maybe we'll tell them later — we met with people who are working on the things we'll be showing today, and then we prepared this live.

So what was the purpose of our trip to the US? Right, we should discuss that a little. The idea was to be close to where the innovation happens, to meet all those people. It seems that San Francisco and Silicon Valley are back, in the sense that people are returning to the city. People were leaving because it's a pretty expensive city to live in — it still is — but now they're coming back, attracted as if by a circle, because there is a lot of hype, in the positive sense, around AI: a lot of new companies being created, a lot of people working on very interesting stuff, lots of excitement.

During the trip we recorded, I don't know, twenty vlogs. Some of them were pretty cringey, I would say, but it was still nice to record them. You asked about the purpose: we wanted to be with those people, see what they are building, maybe help or contribute as well. And I think we did — we participated not only in events and meetups; we also had the pleasure of doing four or five hackathons and built some really cool stuff. Actually, to be honest, I had never participated in a hackathon before. Really? Wow, I didn't know that. The idea of a hackathon never appealed to me at all — spending a whole day or two getting dirty, getting tired, all sweaty, not sleeping — it sounded just bad. But in this case it was really fun.
I don't want to hype San Francisco too much, but I've participated in many hackathons — in the US, in Europe, in Poland — and I think the energy levels are different. A difference of culture, maybe, but it's also the fact — it could be seen as negative — that San Francisco attracts the best people, and when you're at a hackathon there and you talk with people, you see that they are very motivated and they know a lot. It's a very intense and very interesting environment to be in, for me at least. And these people create an atmosphere in which you also want to aspire to be the best, just by being around them. There's the saying that you're the average of the people around you, right? So if you're around great people... but maybe this is getting philosophical.

You mentioned that we met with people related to the tools we'll be talking about today; we also recorded ten interviews, and we've only published two or three of them. We'll be publishing the rest, and if you sign up for our newsletter mailing list, you will get access to them first, before anyone else. I must say it was pretty crazy, especially the one with Groq. I'm not sure we talked about Groq last time, before our trip. We talk about Groq every time, because it's fast, it's great, and inference is the future, I think. We met one of the general managers at Groq, and then, as we were preparing for the recording, somebody else came into the room — it was the CTO of Groq — and we had a long, super technical, low-level discussion. It was pretty crazy. We were joking that it reminded us a little of the Silicon Valley TV show — pretty unusual. So we recorded that interview, with Sunny Madra — please check it out if you're interested — and we also did one with Damien Murphy, from the hackathon.

Today we'll be talking a lot about Deepgram and Groq, and about voice — building speech language models, as they call them now: SLMs. I've seen that term for a couple of weeks now; it's no longer about LLMs, it's about SLMs. Who made that up? Everything is changing so fast. And maybe we should mention that we are preparing a course about AI — 0 to 1 AI.
We had some ideas about the shape and format of this course before we went to San Francisco, but we've decided to adapt — basically, our ideas were verified by the people in Silicon Valley. One note: when we talked with the CTO of Groq, he told us he thinks New York City is six months behind San Francisco, which was pretty crazy to me — so imagine how far behind Europe and Poland are. And yesterday we met with G. Kosovski, who was a lead of AI at Stripe and had worked with many other companies in the US before. He told us — I'm not sure, maybe he's watching — that he thinks the difference for Poland is something like three to five years. There was also this event where a question came from the audience about what's lacking in Polish founders, and in European founders in general, compared to US founders — but that's maybe a topic for another time.

So, the course: we need to adapt it, because things are changing so fast. Today we'll be talking a lot about what OpenAI showed on Monday, and we will compare and contrast it with what we are preparing to show you. The content is changing so fast it can be hard to follow, but I would say it's evolving, and I like that, because it's converging: you see certain topics and it seems we are getting there through different paths, but you can see the goal — it's just that the situation is a bit messy right now. You phrased it well when we were talking behind the scenes: the next six weeks will be pretty chaotic, pretty messy, and then I think there will be some kind of pause. We'll see.

We want to adapt to that for the people who want to join us for the course, for this journey — changing the content as well as the form, to reflect what we learned in Silicon Valley and San Francisco. We want to make it even more cutting edge, and more beneficial and fulfilling for the people who join, so that you get the information, let's say, maybe not first hand, but we will try to transfer what we learned. Well said. That's our goal — I just wanted to mention it.

Let's switch gears and talk about OpenAI and what they showed on Monday, before we start coding our solution. The topic of this live is creating voice agents — voice chatbots — having human-like conversations with AI. Until recently, maybe with some earlier attempts, that was generally only possible via text; text was the main medium. Why would text be the main medium? Because it's easier, it's smaller, it requires less bandwidth — those are technical reasons.
But if you ask, I don't know, John the farmer what his most native, intuitive way to communicate would be, it wouldn't be texting — it would be voice. Exactly. One of the hackathons we participated in was titled Voice AI, and the idea was that voice is the most natural medium. I would even say voice is more natural than image or video, in a way, because with voice you transfer not only information but emotions — we'll be talking about emotions today as well — and you can complement it with gestures, which is the image part. Voice is very natural: you can just start talking, and the AI can listen all the time and help you with your tasks. So I think we can agree that voice is the future of AI. It's funny, because we said that before the OpenAI event, and then OpenAI revealed the changes to the model. I wouldn't say we knew what OpenAI would show, but there was a sentiment in Silicon Valley — everyone was talking about voice, for a reason — and I think people sensed that voice is the next step. As usual, OpenAI was the first to show something, but we'll talk about whether it's really already there, or whether we need to wait a little more.

We've been talking for ten minutes now, so let's look at one of the examples OpenAI showed on Monday and try to analyze what happens. Are we ready to play the video? We are. Okay, if my screen is visible, I'll play a short clip. This is the blog post OpenAI released — this is the new model, and they are showing a voice interaction with it. We picked just one of the videos, very quickly, for people who didn't see it, just to set the conversation. 'Hey ChatGPT, could you count from one to ten for me, please?' ... 'Hey, actually, that's a little— could you count faster?' Okay, I think that's enough.

I want to focus on two things here: latency and interruptions — both happened in this video. Let's talk about latency first. We need to take a step back and think about how this kind of processing worked before the OpenAI event. Typically you would: first, obviously, record the voice and send it somewhere; a model would transcribe it to text; then another model would take the text and prepare a response, also as text; then a text-to-speech tool would generate voice; and then it would be sent back to the browser. So we have three — or maybe five — steps, depending on how you count. Let's say: we send the binary audio file from the browser — that's the first step; it's transcribed — the second; the third step is interacting with the LLM, because up till now LLMs were the most popular way; then the response is converted back to speech; and the speech is sent to the user — the fifth and last step.
All of that needs to happen very fast, and even if it is fast — we will see how fast we can get — it still adds up. There are a lot of steps: even at 200 milliseconds each, five steps is still one second of delay. Exactly. In the blog post, OpenAI claims they are under 300 milliseconds, which is the range that feels very natural. But watching the video, it seems a little longer than 300 milliseconds to me, especially on the second command — when he says it was too fast, can you do it slower — probably because it's a longer sentence, I'm guessing. There's also the interruption part, which is important and not trivial to implement; we'll get to it when we start coding.

Another interesting point: if you manage to do all five steps in under one second, you get almost exactly what happens here — an almost human-like interaction. One second is a good benchmark for any solution you implement, and that's what was possible up until what OpenAI presented. What's different is that OpenAI introduced a new model, called Omni, which is multimodal: it's trained not only on text, but on actual audio and image data. This way we can reduce our five steps — we don't need to transcribe; we can send the audio directly to the model. We can squash the middle steps into one: audio in, audio out, with no transcription in either direction, speech-to-text or text-to-speech. That's a great way to reduce latency.
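To make the latency math concrete, here is a minimal sketch of that five-step pipeline, with each network stage stubbed out as a ~200 ms delay. The function names and values are illustrative placeholders, not the project's real code:

```ts
// A minimal sketch of the classic five-step pipeline, with each network
// stage stubbed as a ~200 ms delay so the latency math is visible.
const simulateNetwork = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function timed<T>(label: string, step: () => Promise<T>): Promise<T> {
  const start = Date.now();
  const result = await step();
  console.log(`${label}: ${Date.now() - start} ms`);
  return result;
}

// Stages 2-4 each stand in for one round trip (Deepgram STT, Groq LLM,
// Deepgram TTS); step 1 is the upload, step 5 the playback in the browser.
const speechToText = async (_audio: Uint8Array) => {
  await simulateNetwork(200);
  return "tell me a joke";
};
const askLlm = async (_prompt: string) => {
  await simulateNetwork(200);
  return "Why don't scientists trust atoms? They make up everything.";
};
const textToSpeech = async (_reply: string) => {
  await simulateNetwork(200);
  return new Uint8Array();
};

async function roundTrip(audio: Uint8Array): Promise<Uint8Array> {
  const text = await timed("stt", () => speechToText(audio));
  const reply = await timed("llm", () => askLlm(text));
  // the serial sum is why this pipeline hovers around a second, and why an
  // audio-in/audio-out model that collapses the middle steps cuts latency
  return timed("tts", () => textToSpeech(reply));
}
```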
What's interesting — maybe we're jumping from topic to topic — is that this five-step approach is the one-second one we described, and we've seen people on Twitter, and met people, achieving even less than one second on average — some claim even 500 milliseconds — with this older approach, with a model that isn't trained on audio or video. But there are now open-source models that imitate the approach OpenAI showed. One of them is called Gazelle — let me show you. This is the blog post that introduced it, back in March, almost two months ago, and it was already working then. It's a pretty interesting read, I recommend it. You can also use the model — it's on Hugging Face — and recently someone released the trainer, so you can train it yourself too, if you have the infrastructure.

So there is Gazelle. But before we get into details, we wanted to show some interesting use cases, and one of them is this ad. There is a company called Bland AI, and they have this billboard — I'm not sure if it's in San Francisco — and you can actually call the number on it. I hope you can see this; I got confirmation, so I'll play it. This is what happens: an agent that can handle millions of phone calls, for businesses, in any voice. 'What's your name?' 'My name's Mike — what's your name?' 'Nice to meet you, Mike. I guess you're calling because you saw our billboard, right?' 'Yeah, I am.' 'Great — I should tell you a bit more about myself. I can be programmed to do sales, customer support, or really any type of phone call. What sounds interesting to you?' 'Sales sounds pretty interesting, but I'm curious if you can tell me more about yourself...' All right, that's good.

That video shows, again, two things. Low latency: the person asks a question and gets an answer almost immediately — quite low, maybe a little on the higher side. And interruptions: you can interrupt, and the AI adapts. These voice agents also remember the previous context: everything you said and everything that was answered is kept in a memory of sorts and used later. So it almost feels human. What OpenAI showed is even closer, even better — because OpenAI also showed emotions. Until now the bots typically couldn't imitate any kind of emotion, and OpenAI's can. I don't think there are open-source models that do that yet — I haven't seen any. Deepgram has some emotion analysis, but that's analysis — a different kind of thing.

I also want to quickly mention a few other things, because after seeing the OpenAI demo you could think this is an OpenAI-only thing, but plenty of startups were doing similar things before. One of them is — I don't know how it's pronounced — Vapi, 'voice AI for developers', which is also very interesting and very low latency. Let me try it; it's initializing... I'll click allow. 'Oh hey, sorry, I must have dozed off for a second. What's your name?' 'Welcome to Vapi, I'm an AI assistant you can talk to like a person.' Okay — what's your name? I think something broke. Live demos are the worst. 'Could you tell me a joke?' 'Of course. Why did the scarecrow win an award? Because he was outstanding in his field.' Normally it works a little faster, and what's interesting is that it shows you the latency below — I'm not sure you caught that, but I encourage you to try it yourself. Usually it's around 600–700 milliseconds from what I've seen, which is pretty impressive — though it showed me around 2,000 milliseconds just now, so I don't know; it usually works better.

There's one more thing I wanted to show, called Retell AI — a company that lets you build these voice agents.
I won't be showing it, but the demo is also open, so you can just try it. What's the difference from the one on the billboard? I think the billboard one is more oriented toward business people, and this is more about building the actual bot — or maybe they're similar; I haven't really checked the business side of these startups. The nice thing is you can do the same: it will call you, you can talk with it, latency is very low, and you can interrupt the voice agent. It's pretty impressive. These companies existed well before OpenAI showed these things, and they keep working on reducing latency even though they use the older approach, which is not trained directly on audio.

So let's get back to our coding. Today we want to show you how to build something similar to what you've seen here, and we'll be using three things. The first is Deepgram, the tool that transcribes speech to text and also does text to speech — we'll use it on the edges of our pipeline. It has many other features we'll briefly touch on: interruptions, sentiment analysis, and the thing Damien Murphy mentioned to us — utterances. When you talk, you sometimes pause for a moment to gather your thoughts and then continue; Deepgram can catch that and construct a whole phrase out of it.

That's the edges of the pipeline; in the middle we have two things. We have Groq, which is an inference engine — a chip, and in this case a cloud solution as well. You can also buy a Groq rack — we should order one; it's only 20K per server or something. And we'll be using Llama 3 — Groq supports different open-source models, and among the recent ones Llama 3 is the most interesting; it was said to be very close to GPT-3.5, or somewhere between 3.5 and 4, I think. So it's good enough for our use case. We'll be aiming to go below one second, because we think that's good enough to imitate human interaction — even with this older approach, not trained on audio, I repeat. And just to be clear, we've seen people go as low as 500 milliseconds; we'll talk at the end about the techniques that push it further.

Before we start coding, one more thing: OpenAI said this voice mode will be available in a few weeks, and only for selected partners. My theory — my intuition — is that OpenAI needs a lot of compute to handle public access to voice. Because, as you said, using text for interacting with bots was the obvious technical choice: text is easy, it's simple, the bandwidth required is low.
For audio, it's probably a few orders of magnitude higher. And OpenAI is using those Nvidia chips — the H200 — which are mostly adapted for training, not for inference; inference is where Groq shines. So they need to find a way — I think they found one; I don't know exactly what they are doing, of course — but inference is the next step, because most models are pretty good now, and training will become less and less important. What will matter more is how fast you can interact with those models, especially once you add audio or image — especially multimodal. On one hand we have training, which up till now was the important part; from now on, I think inference will gain importance.

So, a question for the audience: how long do you think until we get public access to what OpenAI presented? I think it will still be pretty long. And another question: how long before an open-source model appears that is almost as good as the one OpenAI showed? The one we showed, Gazelle, shows amazing promise — and it was introduced in March. Nothing else appeared in between, but I think the people who created it kept working on it, and they may get an extra boost from the Monday presentation, because OpenAI has now shown that voice is the focus — and maybe image as well. I think we'll see a lot of interesting things in the coming days.

So that's the idea: with the code we are showing you today, you can build something very close to what OpenAI showed — now, today, with open-source models. On top of that, you could probably even host it yourself, and it would be maybe two or three times slower — or, with the additional techniques we'll mention at the end, almost exactly as fast as OpenAI. Either way, for most cases it's good enough.

Let's switch to your computer and start building. Do you want to start from the end — show what the product is? Yes — we got feedback after the last lives that we jumped into coding too fast, so let me show the final thing we are building. It's very simplistic: as you said, Llama 3, Groq, Deepgram — a personal voice assistant with memory, although we will probably skip the memory part today, because we don't have enough time. What it does right now: I can tell it something. Let's try — fingers crossed. It worked, and it showed we are below one second: 721 milliseconds, actually — we added logging of the latency at each step. Could you ask it something? 'Tell me a joke.' 'Here's one: why don't scientists trust atoms? Because they make up everything.'
So what's the latency? 1,300 milliseconds — pretty good, I would say, but Groq was a bit slower, probably because we're getting rate limited, which is something we hadn't hit before the live. We tested beforehand and it worked; maybe we made too many requests. I asked Sunny — maybe he's watching — to increase the limits for us, and I hope he will. We should also remember that these APIs are usually not distributed around the globe, so we are probably hitting servers in America; if we subtracted that, it's probably 100 milliseconds less per call, something like that. And we will discuss techniques to reduce that one second further. For the record: one second, for us, is good enough, and then we'll show how to reduce it.

Okay, that's the final product: you talk to it, and it runs your voice commands against the Llama 3 model. How do we build it? We already built it, so I should go back in time a few commits — how do you do that? I can't help you, I'm not good with computers. How about we go back to the initial commit? I'll just reset to it — no way back. Let me get a bigger screen.

So, this is our Next.js app. We won't talk about bootstrapping Next.js applications again; we're using the starter that comes from Deepgram. It's recent, and very nicely done, I would say. It already transcribes: when you say something, it outputs the text on the screen. So the first part of the five-step pipeline we described — the first two steps, sending the audio and transcribing it — is already done; that comes with the boilerplate. Then we need to take the text and send it to Groq; that's basically it. But before we do that, let's talk a bit more about Deepgram itself, since the starter ships with some code — let's go through it quickly and look at the options we have.

What they did in the boilerplate is connect the Deepgram client from the client side, with a connectToDeepgram function. There is also some boilerplate around microphone state: we have to make sure the microphone is ready, and the first time you run it, the browser asks whether you allow the microphone to be used, so it's an asynchronous process. Then, when the microphone is ready, we can finally capture sound. We call connectToDeepgram — as simple as that — and provide a model. There are a few models to choose from, but we are using the latest one, Nova-2. And this model has variations — nova-2-conversationalai and so on — specialized sub-models.
For example, one meant for a call center, or one for taking orders — imagine a restaurant that wants a voice agent able to handle lower-quality audio input. That's possible, and there are a couple of variations — seven or eight, I think, specifically for Nova-2, the model we're using. Another axis is language: Nova-2 defaults to English, but there are variants for other languages — French, even Polish. Deepgram is pretty comprehensive in that respect.

That's the first thing, the model. I've opened the Deepgram documentation, and what's on the screen is the word error rate they report: for Nova-2 it's below 10% — 8.4% — which is pretty good, pretty amazing. Then, as you said, there are the different languages — yes, Polish as well, obviously English — and then the specialties: general, meeting, phone call, finance, conversational AI, voicemail, video, medical, drive-thru, automotive, and custom, which is presumably for custom models. The drive-thru one is pretty funny — what's different about a drive-thru call? You usually scream about food: 'I'm hungry, give me chicken.' But you can use that.

So that's the model, and then there's a bunch of interesting options. We could use the conversational AI variant — we could try it; for the demo we used plain Nova-2, which is good enough, but we can switch and test. The options are not clearly described, in my opinion, in the Deepgram documentation — I'm not saying it's bad, it's pretty awesome and comprehensive, but with documentation that large, discoverability of features gets hard. This is where Damien Murphy comes in — we have an interview with him; he's from Deepgram, and we approached him during the hackathon. He told us about some hidden features, and the last two are particularly interesting: utterance_end_ms and endpointing are pretty important for what we want to build today. The idea, as I said at the beginning, is that when you talk and leave a small pause — you're thinking about something — Deepgram is smart enough to concatenate the pieces into one phrase. You define those delays, and if you plan to build something like this, it's a good idea to play with those two parameters. Endpointing is 300 milliseconds by default here. Then there are filler words — the 'uh's and all that — and we also change the output format. And there's smart_format, which doesn't change what is transcribed — there's a page about it in the documentation — only how it's printed back to you: without it, everything is lowercase letters with no commas, no full stops, nothing; with it enabled, Deepgram tries its best to intelligently add capital letters, dots, and commas, and to turn some numbers into digits when it thinks that's appropriate.
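Put together, the connection options discussed here look roughly like the following with the Deepgram JS SDK — a condensed sketch rather than the starter's exact wiring (the starter goes through a React hook and a temporary API key), with illustrative values:

```ts
import { createClient, LiveTranscriptionEvents } from "@deepgram/sdk";

// A sketch of opening the live transcription connection with the options
// just discussed; the option values are illustrative, not the demo's.
const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);

const connection = deepgram.listen.live({
  model: "nova-2",        // or a variant like "nova-2-conversationalai"
  language: "en",
  smart_format: true,     // capitals, punctuation, numbers as digits
  filler_words: false,    // drop the "uh"s
  interim_results: true,  // stream partial transcripts for UX feedback
  utterance_end_ms: 1000, // pause length that closes an utterance
  endpointing: 300,       // ms of silence before speech counts as final
});

connection.on(LiveTranscriptionEvents.Open, () => {
  // microphone chunks get forwarded here, e.g. connection.send(blob)
});
```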
And that's pretty much it: after connecting, it captures the voice and transcribes it, and this part — the first two steps, sending the audio and transcribing — is really fast with Deepgram. When Deepgram finishes transcribing, there's plenty of React-related code here, which isn't really important; it calls the onTranscript callback — one of the events — and we get a few fields worth discussing. Two of them matter most: is_final and speech_final — which is quite funny. Sorry to interrupt, but this is the utterance parameter coming into play, right? It tries to detect when a person stopped talking. Exactly. There's an example in the docs, because Deepgram effectively streams the response — it calls the callback several times — and you can consume the response even before Deepgram considers it fully ready. In the docs example, someone is speaking, and Deepgram first sends just the first two words, then five, then seven, and so on, with is_final and speech_final set to false each time, so we know it's not final yet, but we can display it to the user as feedback, for better UX. When is_final is true, Deepgram is done understanding and formatting the words said so far — sometimes it makes a mistake and goes back to fix a previous word — so is_final means that this piece is done. And when speech_final is true, it means the person stopped talking and every word is in its final form, so we are sure we're ready to process the text.

As a result we receive a data object with a lot of properties: one of them is channel, which has alternatives, and then the actual transcript. On top of that — we won't show it today — you can enable sentiment analysis in the parameters, and then the data you receive after transcription also tells you whether the person sounds happy or sad; you can use that to tune the response. You also get the confidence level: if someone talks in a way that's hard for Deepgram to understand, the confidence will be low, and you can account for that.

So we get this caption, and we check whether anything was actually said — there may be sound, but it could be noise rather than speech, and in that case the caption will be empty, so we set the audio to just null. If it's not empty, and both criteria are met — is_final and speech_final — we send the text to the LLM.
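A condensed sketch of that callback logic, continuing from the connection opened above — sendToLlm and showInterim are hypothetical helpers standing in for the starter's React state updates:

```ts
// Hypothetical helpers, not starter code.
declare function sendToLlm(text: string): void;
declare function showInterim(text: string): void;

connection.on(LiveTranscriptionEvents.Transcript, (data) => {
  const alternative = data.channel?.alternatives?.[0];
  const caption = alternative?.transcript ?? "";
  if (!caption) return; // noise only: Deepgram sends an empty transcript

  if (data.is_final && data.speech_final) {
    // words are in their corrected form and the speaker has stopped,
    // so the whole phrase is safe to hand to the LLM
    sendToLlm(caption);
  } else {
    // partial result: display it for feedback, don't process it yet
    showInterim(caption);
  }
  // alternative?.confidence is available here too, and with sentiment
  // analysis enabled the emotional signal arrives on the same payload
});
```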
There is also some other code for keeping the connection to the server alive, related to Deepgram — we'll just leave it unchanged. I can't see your code on the screen here, but it doesn't matter, I can imagine it — you can keep it in your brilliant mind.

So that's the starter. Now we need a route, because we need to send this caption — I don't know why they called it 'caption' — to Groq. This is a client component in Next.js — it happens on the client side because we need microphone access — so we could probably use a server action, but we can also just fetch a route handler. Which do you prefer? I'd go with fetch; I think it's clearer for the audience, because it's just sending a POST request with the caption — the transcription, rather. So it's a basic fetch, and we put the route under /api/respond. Copilot generated all this code for me — maybe kind of old school with all the .then chains, but good enough. You can remove localhost and just use the path name. It says /api/submit here; in the code we wrote before it's /api/respond, but it doesn't matter. It's a POST route, very typical Next.js — method POST, Content-Type application/json, caption in the body. Then export const POST: we receive the request, and the body is just type-cast for now, for the purposes of live coding; normally you'd validate it with some validation library — or, in a case this simple, write the check yourself — but for today it's good enough.
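As a sketch, the client-side request looks something like this — the /api/respond path matches the route discussed, and validation and error handling are kept minimal for the live-coding context:

```ts
// Post the final caption to the route handler; the Response body is
// consumed later (text first, streamed audio once the voice step exists).
async function respond(caption: string): Promise<Response> {
  return fetch("/api/respond", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ caption }),
  });
}
```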
What do we do next? Groq provides an SDK, so rather than another fetch it's easier to use it: there's a groq-sdk package — is it installed already? Not in this code, but it should be. We import the default export, Groq, and initialize it — it's a class, so we create an instance: import Groq from 'groq-sdk', then new Groq(). That's all, because it picks up the values from the environment variables, as long as they have the proper names. I should have them already; if not, I'll end up showing everyone my secret keys again — please don't show them.

Groq is compatible with OpenAI, so the client has the same shape. Which function do we use? chat, then completions, then create. You provide messages, which is an array of objects, each with a role — user, in our case — and the actual content. We also need a model, obviously: llama3-8b-8192, the smaller and faster one. The model names you can use here are on the Groq website, so you can look them up — with OpenAI you get TypeScript suggestions for this, but here it's just a string.

And in content we can't pass the bare caption; we should add some instruction for how to treat it. What we did was: 'Respond to the following query as a human would, and be concise.' That way we get a conversation rather than the LLM generating a huge answer. You can adjust it to your needs — ask it to pretend to be a call-center agent, for example. This is the part that takes some tuning and tinkering — nice word, I'm still learning my English.

Now we need to return the response — couldn't be easier. But wait, we need to return voice, right? I was thinking we return text first and show what the text response is — or just console.log it. And I just wanted to add that Groq is so fast that up to this point we'll be around 200–300 milliseconds; we'll see if we log it. As with OpenAI, there are also choices: you can ask Groq (or OpenAI) to generate multiple responses at once and pick one, but by default there's only one, so we just return the content of the first element of the choices array. This should work — it should console.log whatever the bot answers.
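Assembled, the route handler sketched in this exchange looks roughly like the following — returning plain text for now, with the voice step added next; the file path is an assumption matching the /api/respond route discussed:

```ts
// app/api/respond/route.ts (hypothetical path) — a sketch, not the exact demo code.
import Groq from "groq-sdk";

const groq = new Groq(); // reads GROQ_API_KEY from the environment

export async function POST(request: Request) {
  const { caption } = (await request.json()) as { caption: string };

  const completion = await groq.chat.completions.create({
    model: "llama3-8b-8192", // the smaller, faster Llama 3
    messages: [
      {
        role: "user",
        content: `Respond to the following query as a human would and be concise: ${caption}`,
      },
    ],
  });

  // one choice by default, so return the content of the first element
  return new Response(completion.choices[0]?.message?.content ?? "");
}
```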
I wonder — if you selected that and asked Copilot to turn it into a single expression, would it work? Please don't make me. Okay, let's run it again. 'Hi.' I think it's not capturing — could you refresh? It doesn't always capture. 'Hi.' It worked. 'Tell me a joke.' It works pretty well. What are you checking in the console? What the model is generating. But it's showing on the screen — no, that's just the transcription of what you're saying, not what the bot responds. Oh, right. One more time: 'Tell me a joke.' Why won't you listen to me? There — it works. We have this funny situation where every time I try to speak to the machine it doesn't listen to me, and whenever Kuba does it, it listens. I have no idea what's going on — maybe you're the king of the machines.

Okay, but it works: we send the audio, we transcribe it to text, we send the text to the LLM — to Groq — and we get the response. Now we need to turn this response into voice — the final step. How? Again, we can use the SDK from Deepgram directly: @deepgram/sdk — with the 'monkey' at the beginning; for those not from Poland, in Polish the at sign is called a monkey, for some reason. What do we import? createClient — it's a named export. So deepgram equals createClient, and you pass the API key directly — it's not an options object — so we just pass process.env.DEEPGRAM_API_KEY; Copilot's suggestion was wrong in this case. Typically you'd validate that it's set, but in this case — fail fast, right? We showed that in other lives.

Now we can run this. There is a speak property — wait, doesn't it create the speak client? No, you can chain: speak without parentheses, it's not an invocation, just a field — then request, and you provide two objects. The first is just the text, as an object — the actual response from the LLM. The second is the configuration, where again we define a model, the most important part — but a different kind of model this time: text-to-speech, TTS. What's its name? Deepgram has Aura: aura-asteria-en. Asteria is the name of the voice — you can have different voices — then a dash, then the language. Could it be pl, for Polish? We could check, but last time I looked this model is only available in English: for speech-to-text Deepgram has a lot of languages, but for text-to-speech, right now, I think English is the only one — maybe that's changed.

There are other options too, like bit rate and sample rate, which we won't adjust, but you can tune the audio quality — better, or a bit lower. I once tried setting the bit rate as low as possible, and the voice was barely understandable. All of that affects latency: higher quality will probably increase it. What we can do here is set encoding and container, as you proposed, because the default is mp3 — and we can do better than MP3, I think. It was suggested that OGG should work better for this use case, though I'm not sure; when we switched from MP3 to OGG it was noticeably worse in quality, but not in a bothering way. Did I get it right — container ogg and encoding opus? Exactly.
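In code, the text-to-speech step described here looks roughly like this sketch with the Deepgram SDK (the synthesize wrapper is our own naming, not the demo's):

```ts
import { createClient } from "@deepgram/sdk";

const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);

// Aura's Asteria voice, opus in an ogg container instead of the default mp3.
export async function synthesize(text: string) {
  const response = await deepgram.speak.request(
    { text }, // first argument: the text to speak
    {
      model: "aura-asteria-en", // voice + language; English-only for now
      container: "ogg",
      encoding: "opus",
    }
  );
  const stream = await response.getStream();   // ReadableStream of audio bytes
  const headers = await response.getHeaders(); // content type, etc.
  return { stream, headers };
}
```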
Okie dokie. Now it returns this response, and there's one more important part: we could return the response as-is, but we don't want to — we return it as a stream. That matters, because when you return a stream to the browser, the browser can start playing the file before it has received the whole thing. So we return a stream, and we can grab the headers too — both are promises — and then pass the stream directly to the web-standard Response, along with the headers. That's about the most optimal we can get, at least over HTTP; in a moment we'll talk about transports that are more efficient, but for HTTP I think it's the best we can do.

And that's all — it should work, except we still need to do something with the response on the client side, because right now we're just console.logging it. Let's see what it is. 'Hello.' You can see it returns bytes — and, I'm not sure if you can see this, but it says opus in the file header, so it's clearly an audio file. Let's do something about it, and it should be very simple: there is the audio tag. We need a URL, and I think some other props too — autoPlay, yes, and we can hide it, because we don't need to show it: we're just adding a player that starts playing automatically as the response streams in, so that you get the feeling of a conversation. So: autoPlay, and class hidden — className="hidden" — and it's not url, it's src.

Now we need to handle this blob — the stream of bytes — we're receiving. Let me, for quick housekeeping, keep it in a state variable. Exactly. Then we can use the built-in URL object — the other change is switching from .json() to .blob(), because we want the binary data — and then, as you said, URL.createObjectURL, which returns exactly the kind of thing you can put into an audio tag. I'm pretty sure this is not the most optimal way to do it; we tried to gain some very minor speedup — and spent a lot of time on it — by streaming this more directly. Maybe we could shave 100, maybe 200 milliseconds — probably not even that much with short responses. The idea was to consume the stream more efficiently, but it requires lower-level APIs: you have to consume the bytes as they arrive.
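A minimal sketch of that client-side wiring: on the server the route ends with `return new Response(stream, { headers })`, and in the browser the body becomes a blob that a hidden, autoplaying audio tag can play. The VoiceReply component and its button are illustrative, not the demo's actual markup:

```tsx
"use client";
import { useState } from "react";

export function VoiceReply() {
  const [audioUrl, setAudioUrl] = useState<string>();

  async function ask(caption: string) {
    const res = await fetch("/api/respond", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ caption }),
    });
    const blob = await res.blob();          // binary audio data, not JSON
    setAudioUrl(URL.createObjectURL(blob)); // object URL the tag can play
  }

  return (
    <>
      <button onClick={() => ask("Tell me a joke")}>Ask</button>
      <audio src={audioUrl} autoPlay className="hidden" />
    </>
  );
}
```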
Let's see if it works. 'Hello.' 'Hello! Please go ahead and ask your question.' 'Could you tell me a joke?' 'Here's a classic one: why don't scientists trust atoms? Because they make up everything.' That was pretty fast — after a longer silence, admittedly. So it depends; sometimes we may be rate limited, but when it's fast, it's fast, and it feels pretty natural. It could also be that this computer and the internet connection are doing a lot of other things right now.

Let's summarize. We use two SDKs. The Groq one is compatible with OpenAI, so if you'd rather use OpenAI directly — which is faster now — you can: replace Groq with OpenAI and the rest of the route stays the same. And the Deepgram SDK, which we use for text to speech; the boilerplate ships with speech to text already preconfigured — sorry, I wanted to interrupt, but you said exactly what I wanted to say. There are a lot of parameters to play with: the model, the model variant, the language, and all the things we mentioned. There's also the emotions part we only touched on: sentiment analysis. We could use Deepgram's sentiment analysis and pass that additional context to Groq, so it knows you're sad or angry, for example — and that way you'd get a different kind of answer.

So now we have the basic structure. Could you revert the most recent commit — the one that added the latency measurements? With the code we're going to share, you'll see the latency at each step. From our testing over the last two weeks, and the hackathon, we averaged around one second of latency, which is pretty good.

Then we investigated other transports, because right now we're on HTTP, but we could do something else. HTTP is pretty good — but still, one second. There are two other options. We could use WebSockets, which is what you implemented at the hackathon, and it was a bit faster, because the communication happens both ways over one connection. It was also more complex, because we had to set up a server — which isn't quite compatible with Vercel, for example, because it has to be stateful and run all the time; it can't be a Lambda. The important thing, just for the record, is that with WebSockets you don't pay the penalty of opening a connection each time. With HTTP it's hard to measure — with HTTP/2 or 3 the connection is sometimes pre-warmed — but in theory it can be slightly faster when the connection is already open and you just send the data, and we observed something like that.
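As an illustration of that WebSocket idea — one persistent, two-way connection instead of a request per turn — here is a sketch against a hypothetical endpoint; playChunk is a made-up playback helper:

```ts
declare function playChunk(bytes: Uint8Array): void; // hypothetical playback

const socket = new WebSocket("wss://example.com/voice"); // placeholder URL
socket.binaryType = "arraybuffer";

socket.addEventListener("open", () => {
  // the connection is opened once; each turn reuses it
  socket.send(JSON.stringify({ caption: "Tell me a joke" }));
});

socket.addEventListener("message", (event) => {
  if (typeof event.data === "string") {
    console.log("text reply:", event.data);
  } else {
    // audio bytes arrive on the same connection as they are produced
    playChunk(new Uint8Array(event.data as ArrayBuffer));
  }
});
```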
So now we have the basic structure. And I wanted to ask: could you revert the most recent commit you added, the latency measurements? With the code we're going to share with you, you'll see the latency at each step. From our testing over the last two weeks or so, and from the hackathon, we were able to get around one second of latency on average, which is pretty good.

Then we investigated other transports, because right now we're using HTTP, but we could do something else. HTTP is pretty good, but still: one second. I don't see the comments right now, but I hope we can discuss this with our viewers. We have two other options for the transport. We could use WebSockets, which is something you implemented at the hackathon, and I think it was much faster, right? Yes: instead of HTTP we used WebSockets, and it was a bit faster, because the communication happens both ways over one connection. But it was also more complex, because we had to set up a server, which is not quite compatible with Vercel, for example, since it has to be stateful and run all the time; it can't be a Lambda. And what's important, just for the record: with WebSockets you don't pay the penalty of opening a connection each time, the way you do with HTTP. It's really hard to measure against HTTP/2 or HTTP/3, because sometimes the connection is preheated anyway, but in theory it can be slightly faster when the connection is already open and you just send the data, and we observed something like that.
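A minimal sketch of that WebSocket variant using the ws package. The message shape and the synthesize helper are assumptions standing in for the hackathon code; the point is that the connection opens once, so every utterance after that skips the connection setup:

```ts
// server.ts: a long-running, stateful WebSocket server (which is exactly
// why it doesn't fit serverless platforms like Vercel).
import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  // One connection per client; every message reuses it, so there is no
  // per-request connection penalty.
  socket.on("message", async (raw) => {
    const { text } = JSON.parse(raw.toString()); // assumed message shape
    const audio = await synthesize(text);        // hypothetical TTS helper
    socket.send(audio);                          // push the audio bytes back
  });
});

// Hypothetical stand-in for the Deepgram text-to-speech call shown earlier.
async function synthesize(text: string): Promise<Buffer> {
  return Buffer.from(`audio for: ${text}`); // placeholder bytes
}
```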
We've also seen other people using WebRTC, and for the transport, in theory, or at least in demos, it's way faster: half of what we have here, so about 500 milliseconds on average. Some people reported that, and 500 milliseconds is almost exactly what OpenAI showed. And it's possible right now, today, which is kind of mind-blowing, because this is still the older approach; we haven't even talked today about speech language models, which are the new thing. We'll come back to that.

The project you were talking about, was that Pipecat? Yes, there's this project called Pipecat, and it's in Python. The idea is a framework, or rather a boilerplate or template, for building these voice agents with everything already set up: all the performance- and latency-related pieces well designed and well placed, so you can focus on building the actual agent. It's built by people from Daily; I think the CEO of Daily is among the contributors, which is also pretty crazy, in true Silicon Valley fashion, a CEO who is super technical. It has eight contributors. Actually no, sorry, I was wrong about which contributor is the CEO, but I've seen that person tweet. Either way, they're definitely using Daily. Daily is another startup related to AI; they also sponsored one of the hackathons we went to. They're building a tool around WebRTC for voice and video, so that you can create (and this is something we'll show during the course) an agent that connects to Zoom and not only listens to the conversation but could, at least in theory, also watch and analyze your face. Mostly the voice, but the image too. And by default, I think, Daily uses WebRTC for video, because video is much larger than audio. In this project they also use ElevenLabs instead of Deepgram, but you could use either. I've been testing ElevenLabs for the project we showed as well. For those of you who don't know, ElevenLabs is the Polish unicorn startup specializing in text-to-speech and speech-to-text, which is pretty interesting, so it could be a good exercise to switch from Deepgram to ElevenLabs. I did have some problems, though: I was working on a voice chat for a friend that responds in French, and I thought ElevenLabs would be good at French text-to-speech, but it generated some strange things. Maybe your French is not that good! Maybe. No, I showed it to my friend, and he said it sounds like strange French, to a French person.

Either way, the Pipecat project is in Python, but it has tons of examples. If you go to the foundational examples, they walk you step by step through the process of building it, so you can actually understand how it works and adjust it to your needs: you start with something that just says one word, and you work up to, I don't know, an interruptible model, or bots arguing with each other. There are tons of them, and that's just the foundational folder; there are more real-life examples too, and it's really easy to run. So I'm pretty excited about this, about voice, especially in the context of what OpenAI showed. As I said at the beginning, I think OpenAI is still working on the capacity to handle all those potential voice requests with the infrastructure they have; I think they'll be fine, but I wonder when. However, in our codebase, in our demo, we observed that the voice part is not really that much slower than the LLM operation. Which is weird, actually; it makes you wonder. Transcription and text-to-speech are not as much slower as you would expect; intuition says otherwise. So it requires some playing around, testing what's possible, and getting an intuition for how all these tools behave together.
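One way to build that intuition is to time each stage of the pipeline explicitly, in the spirit of the per-step latency display they mention sharing. A sketch; the three stage functions here are hypothetical stand-ins for the real Deepgram speech-to-text, Groq completion, and Deepgram text-to-speech calls:

```ts
// Wrap any pipeline stage with a timer and log where the milliseconds go.
async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const start = performance.now();
  const result = await fn();
  console.log(`${label}: ${Math.round(performance.now() - start)} ms`);
  return result;
}

// Hypothetical stand-ins for the real STT / LLM / TTS calls:
const transcribe = async (_audio: ArrayBuffer) => "could you tell me a joke";
const complete = async (text: string) => `You asked: ${text}`;
const synthesize = async (_text: string) => new ArrayBuffer(0);

const audioChunk = new ArrayBuffer(0); // pretend this came from the microphone

const text = await timed("speech-to-text", () => transcribe(audioChunk));
const reply = await timed("llm", () => complete(text));
const audio = await timed("text-to-speech", () => synthesize(reply));
console.log(`got ${audio.byteLength} bytes of audio back`);
```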
But I'm also super excited about this new era. As you said at the beginning, it's pretty funny how people keep inventing new terms and acronyms. Up till now we were talking about LLMs, large language models, but now there's this new term, speech language models, SLMs, and Gazelle, the one I showed, is one of the first examples. Now I understand the joke: I saw someone on Twitter asking what the acronym for small language models is supposed to be now, if speech takes SLM. Exactly. It's another topic, but while we were at one of the hackathons, Microsoft released a very small language model, Phi-3, with "Phi" like the Greek letter. It's super powerful and still very small, and there's this whole discussion now about models that can live on your phone. We'll see what Apple shows at WWDC in June.

But back to voice. We have a lot of tools now, and there's a lot of positive buzz, a lot of excitement around voice. What we showed you today is something you can build right now; the tools exist today to build things that aren't that far from what OpenAI demoed. And they're actually very cheap. I've been running on the bonus credits we got from Deepgram during the hackathon for a month now and still haven't paid anything. Speaking of which, today I saw this interesting post by a friend, or someone we know, an acquaintance: overment. Oh yes, that's a friend. He wrote about how much he pays for OpenAI, and it was very nicely described: sometimes you need to send several queries, so while it's getting cheaper, it's still not that cheap if you're doing substantial work all day, every day. So it depends. But as you say, we got $200 in credits, I'm at $9 now, and I've been running tests for three whole weeks.

A question from the chat: isn't self-hosting Llama 70B something like $5K a month? It could be, I don't know. Which model did we actually use here? It was Llama 3, but not 70B, just 8B, the one you can run on a local machine. It works even on my old Mac, and it works pretty well. So yes, self-hosting could be very expensive, expensive in the sense that if you want the inference to be fast, you'd have to buy a server from Groq, I think. And that's actually not a bad idea, because Groq builds those pipelines specifically for inference, while the other approaches try to bend an existing computing architecture to this new requirement: running a lot of computation in parallel, which takes a lot of compute power. This matters even more for audio, and for video more still: there's much more data to analyze and the files are larger, not to mention that one day you might be sending 4K, which means huge files. I also just had this thought: it's much harder to cache audio or video, because the caching algorithms aren't really built for it. On top of everything else, it's simply harder.

Alt890 in the chat says you can use server-sent events for input and HTTP for output, in a hybrid mode. That's interesting. To be honest, I've never used server-sent events; I've read about them but never had the need. Once, when I was working at a bank, we were building a currency display, and that's a perfect example: you don't expect any input, you just want to keep updating something. You don't need to send anything; you just expect constantly changing tickers. So you don't need to maintain a WebSocket connection, and server-sent events are ideal for that. It would require some testing, though; I wonder how much latency you could gain that way, or lose. A sketch of the idea follows.
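Here is what that server-sent-events idea might look like as a Next.js route handler pushing ticker-style updates one way; the endpoint and payload are made up for illustration:

```ts
// app/api/ticker/route.ts (illustrative): one-way server-to-client pushes over SSE.
export async function GET() {
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    start(controller) {
      // Emit a fake currency rate every second. A real handler should also
      // clear this interval when the client disconnects.
      setInterval(() => {
        const payload = JSON.stringify({ eurUsd: (1 + Math.random() * 0.1).toFixed(4) });
        controller.enqueue(encoder.encode(`data: ${payload}\n\n`));
      }, 1000);
    },
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}
```

On the client, all it takes is `new EventSource("/api/ticker")` and an `onmessage` handler; there is no socket for you to keep alive yourself.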
But I'm most excited about trying WebRTC, although that also requires a separate server, and a stronger one than for WebSockets, I think. So, although nobody asked this exact question: if you ask what the next step is, I'd say that on one side you can start building with what we showed, maybe consider WebRTC, and you can have a pretty nice voice agent. On the other side, I would look into those new speech models, the ones that take speech directly, at least until OpenAI releases theirs; I've heard rumors that it won't reach the general public soon. I also feel it won't be soon, because, as we said, it requires a lot of computation. Actually, today on Twitter I proposed to Groq that maybe they should take Gazelle and run it, because they have amazing inference capacity; I think they could do something great there. It's funny you say that, because you asked them and then they followed me; maybe they got us mixed up. Anyway, I'm trending now. But regarding that question from Gregor: instead of buying custom hardware, I would use Groq Cloud, assuming they support the open model you need, because they do plan to support open models. I'm wondering about the Bielik model, the Polish one; maybe at some point it could also be trained on audio. We should talk to Sebastian about that.

So I think that's pretty much it; I don't see more questions today. If you have any, ask them. During the course we'll be talking about this a lot more; we'll actually be building a bot that connects to Zoom and does it interactively, so to say. And the idea is to adapt, because everything is changing so fast. I think we will build exactly that; it's just a question of which tools and which approach will be the best at that moment, because we want to give you the most recent techniques and methods. We want to be up to date, so that you're on the edge, the edge of AI.

Someone is asking what technical level of people we're targeting with this course. Interesting question; I think someone asked that by email too. I would say the course is for everyone; however, you probably need to be a developer. Although I'm not sure, because, you know, we saw in Silicon Valley a lot of people who were completely new to programming, especially younger people, and they were super motivated. You have a point: if someone is super motivated, now is a good time to start, because starting programming is different now. Easier, probably, thanks to Copilot. Exactly, though I wouldn't say easier; I'd say different. It's just a different mindset. Yesterday there was this event in Warsaw, and there was this person, Tomasz Kinko, saying that a programmer who isn't using Copilot, or AI in general, will be replaced. And I agree with that: you have to use it, and you have to know how to use it well. And it's not just about completions; there's a lot more to it, and we'll touch on that too, because there are interesting alternatives to Copilot. We met one of the founders of Continue, Tyler; we talk about that during the course. They're building a kind of open-source Copilot. Actually, it's more than that, because they analyze not only what you write but also what happens in between the changes. That's the promise, at least, and I think to some extent they're delivering. The general idea is that you get more context about what's happening, and you don't have to share it with a third-party company like GitHub: you can run it locally, and you can create your own copilot, adapted to you as a person, or to your company, which in a company context is even more interesting. And there's another one, called Supermaven. I hadn't heard of it. It's pretty interesting, because the promise is that it's super fast; with Copilot you have that latency again.
Before trying Supermaven, Copilot was good enough for me, but after trying Supermaven I was amazed. It's so fast: you start writing, and the suggestion appears almost immediately, so the sensation of flow is even greater. Even if the difference is only 100 milliseconds, it makes a difference, and I wonder whether Copilot will be able to catch up. So I highly recommend it.

I also had this thought. I don't want to digress, but maybe just for a bit. Some time ago, before the AI boom, there were a lot of developers who were ashamed of not remembering something, ashamed of using Google, especially junior programmers; maybe you've met such people. They felt bad about Googling. And now, if you're not using Google and not using AI, well, what have you been doing? The change of mindset is a full 180 degrees. You can do so many things much faster with these tools, and there are so many of them, changing so fast, that it can genuinely be difficult to keep up.

So, back to the question, since that was a long answer. I would say that on one hand, we assume you have already programmed in some way; that would be good. But if you haven't done any programming in the past and you really want to, and you're super motivated, like the people we've seen, you'll be fine. Doing this course, you just have to have the time, because in the previous courses a lot of people, they weren't exactly complaining, but they said there was a lot of material, and with more time it would have been easier. So the time constraint was a factor. Although, from my experience, the time constraint is never really the problem; it's motivation.

How long is the promo code valid? I think until the course starts, so a couple of weeks, or at least a couple of days. And how is it different from AI Devs, from Brave Courses? It's funny, because we're actually sitting in the Brave Courses studio right now. I would say it's completely different; or rather, these courses complement each other instead of replacing each other. Our course is like sitting here with us at the table: very cozy, and very much about coding and programming together with us, like AI pair programming, more about the actual coding. But the major difference, when we think about it, is that AI is now entering every domain; it's no longer a question of AI or not AI, it's a question of where you can use AI. AI Devs is an amazing course, and I recommend everyone check it out; it focuses on the fundamentals, on building great stuff. We wanted to build products, to approach it from the Next.js angle, to show how, as a coder familiar with Next.js, you can amplify your web-development skills and build on top of this with AI.
And build, as you say, products. So there's this mindset of the indie hacker, the startupper. Many people say AI will make companies smaller and smaller, and yesterday, funnily enough, Tomasz Kinko mentioned this too: there are now startups run by two or three people that, with this new mindset of using AI, are much more capable than companies that even a year ago were being created and run by ten or fifteen people. That's the factor: you need to know how to use these tools, and that's what we want to show you in this course, pragmatically, with Next.js: five projects, maybe more. Some of them could serve as starting points for real products, if you have an idea; we'll be brainstorming different things and proposing our own, quite simple ones. Imagine a hackathon every week. Exactly: it's like a hackathon with AI, a different area of AI every week, a new hackathon every week, so to say. And we try, as I said, to stay close to the changes happening right now in Silicon Valley, based on the trip we did, and we're planning to maybe go back, so that each week you'd be exposed to the most recent developments. That's our goal, at least. I hope that answers the question; if it doesn't, hit us with an email, and we'll probably answer. Sorry if we've had trouble answering you before.

So I think that concludes the live. Thank you very much, see you at the next one, and join our course at 0to1ai.com. And watch our interviews on YouTube; there are still six or seven to be published. Thank you for today, see you, bye!
Video description
Join 0to1AI 👉 https://www.0to1ai.com

Creating a voice AI assistant is challenging because it involves three key components: converting speech to text, processing queries with an LLM, and transforming text back into speech. Each step adds significant latency, making interactions with the assistant feel less natural and human-like.

Discover how to minimize latency in your AI applications with an LPU. Improve UX by implementing AI-driven voice interactions. Leverage cutting-edge Web APIs to seamlessly connect components and create a human-like AI assistant.

You will learn:
✅ Deep dive into Deepgram
✅ Different modes of Deepgram
✅ LLM, inference speed, and Groq Cloud
✅ Benefits of using Llama 3 from Meta
✅ Next.js and AI from Vercel to connect all the pieces

🎁 BONUS: $50 discount for the upcoming 0to1AI course! See you soon! It will be awesome!

#ai #0to1ai #aiprogramming #deepgram #groq #nextjs