bouncer

Zaiste Programming · 273 views · 9 likes

Analysis Summary

20% Minimal Influence
mild · moderate · severe

“Be aware that the praise for specific AI tools (like Deepgram) is influenced by the hackathon environment where those companies provided credits and prizes.”

Transparency Transparent
Human Detected
98%

Signals

The content is a genuine human-led vlog featuring natural, unscripted dialogue between two individuals at a physical event. The speech patterns, including humor and specific technical anecdotes, are characteristic of human interaction rather than synthetic generation.

Natural Speech Patterns: Transcript contains self-corrections, interruptions, banter, and filler phrases like 'I mean', 'yep', and 'I guess'.
Contextual Authenticity: The speakers discuss specific real-world events (Cloudflare hackathon), specific pricing (11 cents), and personal dynamics ('I'm doing all the work').
Vlog Format: The structure follows a traditional human-led vlog with spontaneous dialogue rather than a scripted AI narration.

Worth Noting

Positive elements

  • This video provides a realistic look at the technical latency challenges involved in building real-time voice-to-voice AI applications.

Be Aware

Cautionary elements

  • The casual dismissal of venture capital (YC) and the promotion of 'bootstrapping' serves to build an 'insider' identity for the creator's own community.

Influence Dimensions

About this analysis

Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.

This analysis is a tool for your own thinking — what you do with it is up to you.

Analyzed March 13, 2026 at 16:07 UTC · Model: google/gemini-3-flash-preview-20251217
Transcript

Another Day in [Music] Paradise. Select the seat 2D and proceed to check out.

Okay, another day, another adventure. Another Day in Paradise. So today we are going to a hackathon and we are going to win it, right? Today we are going to win the hackathon. That's the, the first, first prize. First prize is an Apple Vision Pro or, uh, 3500, so I guess we'll take the money.

Maybe let's talk about the topic of this hackathon. It's about real-time AI. Yeah, multimodal AI, multi, multimodal, voice-related AI, uh, so building like personal assistants, uh, or anything that requires you to created to voice, basically audio, interact with AI using voice. Yep.

So we had a couple of ideas, maybe we should discuss them. Yeah, we had like, uh, we used Perplexity AI to generate a couple of ideas because we didn't have any. No, that's not true, we had some ideas. We had some ideas, and, but we also used Perplexity AI, and actually Perplexity AI generated one of the ideas that we had. Yeah, right. So, I mean, you had, because I'm, I'm doing all the work and you are the creative person.

So the, the first idea we, we had was about creating this assistant that can correct you when you're speaking a foreign language. So for example, imagine you're not feeling comfortable speaking English. Yeah, like we are, like we are right now. And you're making small mistakes, um, or a huge one. Like you do. Exactly. So if, if you, if you are in this situation, you could like talk or exchange with the assistant, and that, this assistant, that person or that AI, will respond to you, but it will also like give you some hints how you could improve, uh, the way you speak and will correct your mistakes. Correct your mistakes. And I was thinking about this like a demo we could have. Because I happen to speak more than one language compared to you. We, we, we were thinking about doing this demo on the fly: we change the language, uh, and each, in each language, like English, Polish, French, Portuguese, maybe even Japanese, we, we will do some mistakes, and as we speak and change the language... I can do lot of
mistakes in Japanese because I don't know any. Yeah, I think this assistant would be, would be perfect for you. We are joking, but this was one of the ideas, but, and it's, it sounds very useful, right?

But, uh, I think the, the hardest part actually, because we have all the tools, we can use different AI models for speech recognition, speech to text, text to speech. Maybe we could talk about what, what was provided by the hackathon, or what, what companies are, like, offering some kind of credit, because the, one of the sponsors of the hackathon is Deepgram. Deepgram, yeah, which is probably one of the best companies in AI voice recognition and the other way around. But is, is it the best? Because yesterday we were, we were doing some kind of benchmark of different providers for voice AI, and it seems OpenAI is like good enough, I would say. It's just, the difference between them, from what I've seen, is minor. Uh, what Deepgram is doing, they are adding some, a bunch of additional features like sentiment analysis and intent analysis. Yeah, and I'm sure if that's, that's going to be... Yeah, that's something. They also, when you speak, the, they, and, and record the voice, they split the, they do the speech to text and they split it in, into separate words, and then, uh, they give you information when each word starts, in, in terms of milliseconds in the audio recording, so it would be perfect for generating subtitles, for example. Yeah, I would say that Deepgram is like, OpenAI is doing like 80% of what's really needed and Deepgram is doing the 20%. Uh, so yeah, it's depending on your use case.

But we haven't compared pricing. Yeah, exactly, so this could be the deal breaker. That's funny, because I was just thinking about that. Uh, yeah, what's the, what's the price? Uh, testing Deepgram yesterday we spent, let me think, 11 cents, I think. Mhm. Uh, not sure about OpenAI, I would need to verify my account. And for the hackathon we got $200, uh, of credit for, just for Deepgram. Uh, so yeah, Deepgram is interesting, and I was thinking, in order to win, I think we should
use the whole like a r of they provide me, use the tools from the sponsors. Yeah, use the tools from the sponsors, so not only transcribe and speech to text, uh, I mean text to speech, uh, yeah, both. Yeah, I mean both, yeah.

But I, I really think that the tools are there and anyone could use them, but to make this app really, like, smooth, uh, we need to work on user experience, and, and for that we need, for example, voice activation, which is not that easy to get right. Yeah, so let's maybe, uh, maybe let's not spoil. No, no, let's maybe talk a, a little about that, but let's, because, um, yeah, because when you are talking to an assistant, uh, you want to be able to, uh, talk in a way that whenever something is said you can interrupt. Yeah, like a human being. Like I'm interrupting you right now. Yeah, you're just interrupting my every, every sentence I'm trying to, yes, finish. I cannot finish my sentences. Yeah, but the idea is the same, that you're creating, uh, something that you can interact with. So as you said, it's a voice activation, so the AI starts speaking and then you interrupt, and there's a new context, right, new information provided with your additional sentence, and the AI needs to react to that, and it's not, it's not trivial. Yes, as with, like, regular transcription or, like, an activation, like, with a button or something like that, when you press a button, record, and then stop. Usually most of the demos are just, like, uh, the audio element in HTML, and you have to click record manually and then click play, or already autoplay. But voice activation is a bit harder, you need to full Ser driving, you need to do a lot of, a lot of more work, like lower level. Yeah.

So for example, we were thinking, because, uh, that's one problem. The other problem is the latency. In order for the AI assistant to, to feel human in a way, mhm, or like reactive, is that I think you need to be below 500 milliseconds, maybe even less, so that when you just say something you hear the response immediately. But it means that your voice is sent to the AI, it's changed to, it's transcribed
to text. To text. Then the text is analyzed through an LLM. Yeah, by another AI, and then this, uh, response is also changed, to speech this time. Yeah, so this takes, like, there was, like, let's see, three, three things. Yeah, three round trips to do that need to happen in order to get the response, and we need to do it very, very fast so that you can have this, like, a reactive, human-like convers, conversation. Yep.

Uh, so we were thinking how we can reduce it. So the problem is also that with the, uh, browser API it's, like, kind of difficult right now, it's very rough, I would say. Um, and Deepgram, compared to, uh, OpenAI, provides, like, a WebSocket interface, right, API. That's interesting, we are thinking about using that, maybe. And, uh, that's one thing, and the other thing is that we, we got some, from our friends we got some examples of tools that already exist, like an interesting startup, or maybe just a proof of concept of a startup, that does this. Yeah, so there's, like, a web call, uh, thing you can, like, uh, schedule an appointment with, yeah, and you can interrupt the, the bot, the AI, uh, so it's very fluid. And we were reverse engineering it, and you found... You can't say that in the US. I cannot? Okay. It's legal, and you, uh, I mean reverse, reverse engineering in a sense that we were seeing how they, what they do. That's illegal, and, uh, and that's, that's called espionage, and, uh, can go to jail for that. You, you found out that they were using something like a low, lower-level API, right? Yeah, but we will not discuss the details. Okay, let's keep the details for, for later. Yeah, so we are thinking about combining different things, uh, and we want to reduce the latency so that it feels, uh, human. So that's the objective for this hackathon.

I think the interesting part of, thing about this hackathon is that it's in the Cloudflare office. By the way, Club restating office to, to do the hackathon, so yeah. And it's interesting because we've been working with Cloudflare a lot, like Workers, Workers AI, Vectorize, and we've been using Cloudflare a lot recently, so it's a serendipity.
Yeah, so what, what's interesting about geni is that they are bootstrapping, right? They don't, they don't have any venture capital, right? Yeah, it's even, even better than this, because they actually got rejected from YC two years ago. Yeah, but I think many companies got rejected from YC, it's not, uh, and they, they become successful afterwards, so that's not a... I read this post on LinkedIn that, uh, getting accepted to YC nowadays is like a red flag, because the companies that got accepted to YC, they are doomed to fail, usually. Interesting. I, yeah, so the trend is changing, but it's kind of like a, I don't know, it seems like a random, but I think it's, it's still interesting to, to, well, I don't want to dive into, like, YC right now, but I think it's interesting when you meet different people. What, what we did with Hacker House is like... Yeah, of, of course. What I'm trying to say is that, uh, I guess bootstrapping is hot again. So, uh, yeah, they, they had, uh, a lot of, like, uh, I mean, it was a combination of luck, great product, like maybe timing, et cetera. So yeah, they did great, and what's interesting, they, uh, uh, they have impressive numbers, right? I don't remember exactly, but... I wanted to interrupt you about the timing, because they actually, the timing was poor, because they started just before the pandemic and they, they said all the users cancelled during the COVID. Interesting. So they had this very, uh, yeah, that, the churn rate was, was huge. Uh, yeah, but the numbers, yeah, they, they have impressive numbers. I, yeah, they didn't, they didn't give up. Yeah, and now, uh, they, I don't remember exactly how much they earn, like maybe 400 something. Yeah, 400,000. Yeah, per month. Yeah, monthly recurring revenue. So it's, uh, it's kind of interesting, and they are, like, just a few people. Um, so yeah, we met with the CTO, and there were other interesting people, uh, in the house. In the house. Funny. Yeah, so see, see you at the H, I guess. See you later.

[Music] Yo, yo, yo, yo, yo. [Music] You. [Music]

Yeah, we are here at the, uh, Cloudflare office, one of my favorite, favorite
companies. Super happy about, about that. All right, thank you. Let's get [Music] in.

Please upload a [Music] photo. Thank you. I have this one, like an ID. [Music] Excuse me, can I get one of those? Yeah, sure. Can I get a medium, maybe? Okay, thank you. Thank you. It's huge. Anyway, uh, in the description of the event they said that we can use any of the open spaces, so we can go downstairs, we can go, okay, anywhere. [Music] I think there's, oh no, there's a ball [Applause] here.

Here to our first experiences, to, to be creating this opportunity to all of you guys. [Music]

So we are about to start hacking, uh, on our project during the hackathon. Yeah, we had some breakfast and it was pretty, pretty delicious. The event is organized very well, in my opinion. The lot of ambitious people, and the companies there, I mean Deepgram, Oracle, uh, yeah, we daily, we had some talks with some people from those companies. We are hoping that we can utilize all of their offerings, actually, during this hackathon. It's actually pretty useful, this could be useful for the, for our project. Yeah, measuring the internet, because we are in the Cloudflare office, so measuring the speed of the internet, and it's about the same that I get in my house in, in G, in Poland. Mhm. So that's amazing for America. [Music]

We're at this hackathon that we've mentioned today in the car, uh, in the Cloudflare office. It's huge, you probably saw that. Yeah, pretty nice office. Uh, we've been hacking for a few hours now, yeah, learning about, uh, real-time voice interfaces. Mostly we work with the Deepgram uhk, yeah, AI, uh, tooling. Uh, we wanted to reduce the latency, that was the, the problem for us, because when you send your voice it got, it's changed to text, and then it's sent to an LLM, and then sent back to, to you as, as a voice of AI. And, and in many demos it requires, like, inter, manual interaction: you have to click the button when you, when you're done talking, so it doesn't, like, detect when it's done. Yeah, and we wanted to change that, we wanted to, like, have this, like, a voice activation and AC and a
feature which allows you to interact, or interrupt rather, when you are speaking to, uh, to your computer. So we spent a lot of time on this, and you came up with this nice idea of using an audio worklet, right? And then we combined that with the WebSocket interface. That's the perfect combination. It was nice to have one of the people from Deepgram, they helped us to, um, clarify certain things regarding the API. Yeah, and brainstorm ideas together. Perfect. Yeah, we used this, like, a b, batch API, REST API, which is not optimal, and it doesn't, it, it was pretty fast, but this, what, what we have right now is, like, mind-blowing, much faster, uh, order of magnitude faster. And I think before we go to the demo, I, I think we want to thank Deepgram for the $200 of credit, free credit, yeah, given to every, everyone at this, and a big sh, uh, congratulations to the organizers of this event. Uh, yeah, pretty nice.

We can maybe show, but it's not, like, the ideal, because your suggestion was to take, uh, go with the flow and take the demo that Vercel prepared for the Google conference that happened two or three days ago, right? Yeah, and, and extend it and make it even more accessible and easier to use. Yeah, so the idea was that you have this, like, a generative UI approach where you can, like, like, ask certain things using, like, regular typing, a keyboard or mouse, and then it provides you with some widgets, React, uh, widgets, that allow you, for example, to buy something or to do some booking, flight, flight booking, uh, any other stuff, like you can book books, you can, uh, I don't know, join a conference call, something of that sort. Purchases, transactions, I, any and everything from one place, right? Yeah, and we wanted to amplify that demo by adding a voice, right? So the idea is that instead of writing, uh, you could just talk to your computer and it will understand and it will do all those operations for you.

So our demo, let me, let me maybe show you, it's not perfect. List flights from San Francisco to Paris. Select the flight departing at 2:40 PM. Select the seat 2D and
proceed to check out. Pay the amount. Yeah, so it's almost perfect. Yeah, it's, it worked pretty well. Yeah, uh, except some minor, like, issues we are still solving. But what we can also show is how it works in real time, because right now it's, it's almost like a dictation, so it waits for you to stop. Yeah. Uh, but we actually added more, because, uh, you can talk, pause for a brief second to think, for example, and then continue talking, and it joins the, uh, separate sentences or, or words. List flights from San Francisco to Paris. Now it worked. Yeah, and it, it could have been seen on the screen that it, uh, displayed the first part of the sentence, and the second part of the sentence, and the third part of the sentence, and it just worked. So, and it works real, in real time: it sends the voice stream, uh, through WebSockets to the Deepgram APIs, and it's really, really fast. Yeah, so that's what we have. Uh, we are almost, it's almost over, we have, like, 1 hour, I think, left. Yeah, so we have to still cut the video, uh, submit the form to the organizers, and then we have to wait for judges to, well, judge our [Music] submission.

Me up. [Applause]

So Synergy AI is like synergy and AI, and let me show you the demo first and then we will discuss it, and we're done. So the idea was that we could interact with... Yes, so let me just discuss this. Uh, we wanted to create this, like, a voice-oriented, uh, interface, so you could connect different services, for example e-commerce, booking flights, everything you would imagine, and connect that using voice. So we used this demo from Vercel, uh, but instead of typing you could use your voice, and because of the time constraints we only, uh, integrated with Deepgram, and, uh, that's the result. Thank [Applause] you.

Okay, um, so this is AI 3D can voice, and it's an experiment in using your hands and your voice to create AI 3D models. [Music] [Applause] [Music] When see question, uh, online. [Music]
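
Near the end of the transcript, the speakers describe letting the user pause briefly mid-command and then joining the separately transcribed parts into one sentence. The Python sketch below illustrates one simple way such joining could work. It is not the team's implementation and calls no Deepgram API: the class name `UtteranceAssembler`, the `max_pause_s` threshold, and the fragment timings are all hypothetical, and the fragments are assumed to arrive as finalized text with start and end times in seconds.

```python
from typing import List, Optional, Tuple


class UtteranceAssembler:
    """Joins transcript fragments separated by short pauses into one utterance."""

    def __init__(self, max_pause_s: float = 1.0) -> None:
        # Hypothetical threshold: a silence longer than this ends the utterance.
        self.max_pause_s = max_pause_s
        self._parts: List[str] = []
        self._last_end_s: Optional[float] = None

    def add_fragment(self, text: str, start_s: float, end_s: float) -> Optional[str]:
        """Buffer one fragment; return the previous utterance if a long pause closed it."""
        finished = None
        if self._last_end_s is not None and start_s - self._last_end_s > self.max_pause_s:
            finished = self.flush()
        self._parts.append(text.strip())
        self._last_end_s = end_s
        return finished

    def flush(self) -> Optional[str]:
        """Return the buffered utterance, if any, and reset the buffer."""
        if not self._parts:
            return None
        utterance = " ".join(self._parts)
        self._parts = []
        self._last_end_s = None
        return utterance


# Made-up fragments as (text, start_s, end_s), loosely modeled on the demo command.
fragments: List[Tuple[str, float, float]] = [
    ("list flights", 0.0, 0.9),
    ("from San Francisco", 1.4, 2.3),  # 0.5 s pause: same utterance
    ("to Paris", 2.8, 3.3),            # 0.5 s pause: same utterance
    ("select the seat 2D", 5.0, 6.2),  # 1.7 s pause: starts a new utterance
]

assembler = UtteranceAssembler(max_pause_s=1.0)
completed: List[str] = []
for text, start_s, end_s in fragments:
    done = assembler.add_fragment(text, start_s, end_s)
    if done is not None:
        completed.append(done)

# End of stream: emit whatever is still buffered.
last = assembler.flush()
if last is not None:
    completed.append(last)

print(completed)
```

The threshold is the main design trade-off in this sketch: a larger `max_pause_s` tolerates longer thinking pauses but delays the moment a command is handed to the next stage, which works against the sub-500-millisecond responsiveness the speakers aim for.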

Video description

Join 0to1AI 👉 https://www.0to1ai.com

Join us on a thrilling journey at a Cloudflare AI hackathon where we tackled real-time AI challenges, focusing on voice recognition technologies and AI assistants. Watch as we brainstorm, develop, and debug our project, aiming for the top prize.

Chapter 1: Introduction to the Hackathon
(0:00) Introduction to the "Another Day in Paradise" event.
(0:11) Details on choosing seats and checking in.
(0:20) Overview of the day's adventure and hackathon goals.

Chapter 2: Hackathon Details
(0:25) Discussion about the hackathon focus, which includes real-time AI, multimodal AI, and voice-related AI applications.
(0:47) Explanation of building personal assistants and other voice interaction technologies.

Chapter 3: Generating Ideas with AI
(1:05) Discussion about how they used Perplexity AI to generate ideas.
(1:11) Mention of having original ideas and enhancing them with AI tools.

Chapter 4: Language Assistant Idea
(1:31) Introduction to the idea of an assistant that corrects language speaking errors.
(1:36) Details about how this assistant would work in correcting mistakes in real-time while speaking different languages.
(1:58) Example of how the assistant could improve communication in English and other languages.

Chapter 5: Hackathon Preparation and Strategy
(2:04) Strategy talk about winning the hackathon and potential prizes.
(2:20) Discussion about the technical aspects of implementing the language assistant.
(2:47) Additional features like sentiment and intent analysis mentioned, provided by one of the hackathon sponsors.

Chapter 6: Technical Challenges and Solutions
(3:01) Addressing the challenges of speech recognition and latency in voice interactions.
(3:27) Technical details on how voice data is processed and analyzed to enhance user interaction.
(3:50) Mention of sponsors and their tools, emphasizing the use of Deepgram for advanced voice recognition capabilities.

Chapter 7: Future Plans and Closing Remarks
(4:10) Plans for improving the user experience by reducing latency and enhancing fluidity in interactions.
(4:56) Discussion about the importance of using hackathon resources effectively.
(5:10) Closing thoughts on how to leverage sponsor tools and technologies to maximize hackathon success.

Chapter 8: Post-Hackathon Reflections
(12:38) Reflection on the experience at the hackathon, including the office environment and networking opportunities.
(13:00) Personal insights about participating in the hackathon and the learning outcomes.
(13:25) Final remarks on the successful use of AI tools during the event and the potential for future projects.

Follow us:
https://www.linkedin.com/in/zaiste/
https://www.linkedin.com/in/mmiszczyszyn/

Join 0to1AI: https://www.0to1ai.com

#Hackathon #VoiceRecognition #RealTimeAI #AIChallenge #SoftwareDevelopment #TechVlog #CodingLife

© 2026 GrayBeam Technology Privacy v0.1.0 · ac93850 · 2026-04-03 22:43 UTC