Mark Kashef · 7.4K views · 339 likes
Analysis Summary
Ask yourself: “Did I notice what this video wanted from me, and did I decide freely to say yes?”
Worth Noting
Positive elements
- This video provides a practical, hands-on look at how to structure multi-agent LLM workflows (retriever, architect, critic) which is a highly relevant architectural pattern in modern AI development.
Be Aware
Cautionary elements
- The use of academic research (PaperBanana) as a 'scientific' wrapper for a product demonstration can make marketing claims feel like objective technical truths.
Influence Dimensions
Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.
This analysis is a tool for your own thinking — what you do with it is up to you.
Transcript
So, I just figured out how to combine the power of Claude Code agent teams with Nano Banana to make beautiful illustrations and graphics just like these with a single prompt. No Photoshop and no design skills at all. All you have to do is combine the knowledge of PaperBanana, a brand new paper that came out from the Google team, with our newfound army of agents that you can spin up natively in Claude Code, and you can have an entire suite of agents that can reverse engineer any image, research the style, create new ones with completely new applications, and most importantly, critique them in a beautiful feedback loop. So, in this video, I'm going to give you the TL;DR of what PaperBanana brings to the table in terms of insights and how you can apply them today to create whatever image you want in as little as a single prompt. This is definitely one of the cooler applications of agent teams. So, you're going to want to watch this till the very end. Let's jump in.
So, each and every image you see here was generated using Nano Banana and our suite of agents. And you can even take a look at this. Look how beautiful these cups are that are thematically designed like the flags of the nations, as well as all the different text showing up perfectly. And all this was inspired by one of my favorite websites that I used to look at 5 to 10 years ago and still go on once in a while today, which is called Visual Capitalist. And if you're not as familiar, it's basically an entire website where all of the different news is communicated purely in visuals. So you can see here: ranked, the jobs most exposed to GenAI according to Microsoft. And if we click through, there are no essays and paragraphs. You just have visuals. So it really takes the idea of a picture is worth a thousand words and brings it to reality.
So my goal was to take an image like this, which you would have seen on our canvas, and an image like this, and then reverse engineer it using my Claude Code agent team to recreate it and apply it to a brand new scenario. All these images I just showed you were only possible to make in one shot, in an agent team format, by using the insights in this paper called PaperBanana. And I'm going to give you the TL;DR of everything you need to know, so you don't have to go through it or even upload it to your LLM to ask it for the insights.
So the main premise of the paper was that researchers were trying to find a way to recreate academic diagrams for academic papers. And what they ran into is that even though AI could write papers, run experiments, and review literature, it couldn't draw the figures they were looking for in a reliable way. So the big idea here is they posed the question: what if we treated image generation like a design agency with a team of specialists? So in their case, they created a retriever that would go and look at all the existing reference images. Then they would have a planner, and the planner would write a detailed description and turn the science into a rich visual description, aka it can prompt engineer. Then you have the stylist that applies the aesthetic guidelines to said prompt. And then you have the visualizer that goes into not an endless loop but a loop of around three rounds to come up with an image. And then it goes through critique, and then it goes through that feedback loop until it's good enough.
And in terms of the results of the paper, the main thing to take away was that adding multiple iterations of critique from a critique agent allowed it to better dial in exactly what the researchers were looking for. The researchers experimented with fine-tuning versus just retrieving. And they realized that showing the model any good diagram teaches it the structure better than anything else.
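The generate, critique, and regenerate loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `refine`, `Critique`, the 1 to 10 score scale, the `good_enough` threshold, and the two stub functions are all hypothetical names and values chosen for the example, and the stubs stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Critique:
    score: int      # assumed scale: 1 (poor) to 10 (excellent)
    feedback: str   # text folded into the next prompt


def refine(prompt: str,
           generate: Callable[[str], str],
           critique: Callable[[str], Critique],
           max_rounds: int = 3,
           good_enough: int = 8) -> tuple[str, list[int]]:
    """Run up to max_rounds of generate -> critique, keeping the best image."""
    scores: list[int] = []
    best_image, best_score = "", -1
    for _ in range(max_rounds):
        image = generate(prompt)
        review = critique(image)
        scores.append(review.score)
        if review.score > best_score:
            best_image, best_score = image, review.score
        if review.score >= good_enough:
            break  # stop early once the critic is satisfied
        # Fold the critic's feedback into the next attempt.
        prompt = f"{prompt}\nRevise: {review.feedback}"
    return best_image, scores


# Stubs standing in for the image model and the critic (assumptions).
def fake_generate(prompt: str) -> str:
    return f"image-v{prompt.count('Revise:') + 1}"


def fake_critique(image: str) -> Critique:
    version = int(image.rsplit("v", 1)[1])
    return Critique(score=3 + 2 * version, feedback="reduce visual clutter")


best, history = refine("a clean architecture diagram", fake_generate, fake_critique)
print(best, history)
```

Capping the rounds keeps the loop from running indefinitely, which matches the "not an endless loop but a loop of around three rounds" description.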
So monkey see, monkey do, which is very fitting for a banana-based model. And when it comes to style, when it comes to accuracy, you could increase the conciseness, aesthetics, and visual polish of the image just by having this critique layer. Which is why it emphasizes once again that the critique is the secret weapon. Without critique, it was, let's say, 45.1% accurate in recreating the original image or applying it to a brand new scenario. With additional rounds of critique, from one to three, you have almost a 10% increase in said accuracy. So that's pretty much everything that you need to know.
So I applied this right away to our scenario and I created my own Banana Squad. And the Banana Squad looks like this. So we have the lead, and if you remember from my last video, I walked through the fact that leads are not meant to work but to delegate. The lead orchestrates. So the lead asks you 10 clarifying questions to route it to the right agents, and then it presents ranked results, and it never generates the images itself. It then routes to the research agent that analyzes your specific reference image and then outputs the style brief: the color, the info, the composition, etc. And then we have the prompt architect, whose role is to create five narrative prompts. So my goal was to generate five images and then iterate from there. Each would have a different take on your idea but still keep the style intact. Then we have the generator agent, which in this case calls the Gemini 3 Pro API and saves five images to a brand new folder called outputs. And last but not least, we spin up the critic agent, which reviews images on four dimensions. And then it has its own KPIs for how it measures how well it did. So some of them are faithfulness, conciseness, readability, beauty. The last one's probably the most understandable for you and me. Then it ranks it from one to five.
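One way to picture the critic's job is as a small scoring record per image plus a ranking step. The sketch below is only an illustration under stated assumptions: the `Review` class, the equal weighting of the four dimensions, the 1 to 10 scale, and the `threshold` value are hypothetical and not taken from the video's actual prompt.

```python
from dataclasses import dataclass

# The four dimensions named in the video (the last is called "beauty" in the
# transcript and "aesthetics" in the description).
DIMENSIONS = ("faithfulness", "conciseness", "readability", "aesthetics")


@dataclass(frozen=True)
class Review:
    image: str
    scores: dict[str, int]  # assumed scale: 1 to 10 per dimension

    @property
    def total(self) -> float:
        # Equal weighting is an assumption made for this sketch.
        return sum(self.scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


def rank(reviews: list[Review]) -> list[Review]:
    """Best image first."""
    return sorted(reviews, key=lambda r: r.total, reverse=True)


def needs_another_round(reviews: list[Review], threshold: float = 8.0) -> bool:
    """Iterate again if even the best candidate falls below the threshold."""
    return rank(reviews)[0].total < threshold


reviews = [
    Review("v1.png", dict(faithfulness=7, conciseness=6, readability=8, aesthetics=7)),
    Review("v2.png", dict(faithfulness=8, conciseness=8, readability=7, aesthetics=8)),
    Review("v4.png", dict(faithfulness=9, conciseness=8, readability=9, aesthetics=9)),
]
for review in rank(reviews):
    print(review.image, review.total)
```

Keeping the per-dimension scores, rather than a single number, lets the lead explain why one candidate was recommended over another.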
So based on that ranking, it decides: are we good to go, or do we have to iterate again? So in terms of how our squad communicates, we're going to have the lead send the requirements to the researcher, which will send the style brief to the prompt architect, which will then send said five prompts to the generator, which will loop through and talk back and forth with the critic until we have the result we're looking for. That's slightly different from the paper's, but it's simpler to apply it in this specific way.
So, if we pop into the terminal, I'll walk you through the mega prompt that we need to send over to create these beautiful images. So, it starts off with the following. It says, "Create an agent team called Banana Squad to generate professional high-quality images using the PaperBanana agentic framework." So, in terms of the paper itself, you can see I have access to it right here. And this is basically a converted version of the PDF. The reason being, the PDF itself has so much junk data behind the scenes that if I just rammed that into Opus 4.6, it would take the entire context window for no reason. So just converting it into a markdown file makes it that much more readable. So then I go through the structure. I also refer to my CLAUDE.md, which I've added quite a bit of context to as well.
And then we have the team structure. So number one is the research agent, the retriever, and I go through all the responsibilities. So one example of a responsibility is I say, when given an image generation request, scan the reference images folder, which is right here. So the goal is I want to be able to just plop in any one of my output examples. So those are the ones that I showed you before, this one and this one. And then this is basically the target. This is the goal of the team: to replicate and apply to a brand new scenario. So then I also go through and I say, read the Gemini API guide, which is literally a copy-paste from their website.
So if you go to this one right here and we go back to our browser, all I did was go to the Nano Banana image generation API page. I clicked on copy page, pasted it into a markdown file, and I'm just directing it to take a look.
And then after completing the research, we have the prompt architect, whose responsibilities are to make sure that we basically optimize the prompt for the Nano Banana API. And it's very descriptive in saying it has to be a descriptive narrative paragraph, never a keyword list, kind of like what you used to do with something like Midjourney, and then include a subject, environment, lighting, camera angle, mood, textures, colors, composition, and then everything else it would need. So every step really reinforces this point on what it should use and how.
And then we have the generator agent. We have the critic agent, which again has all of these metrics here. So just to convey them to you: faithfulness is how well does it match the original request. Conciseness is does it focus on core information without visual clutter. Readability is: is the layout clear, text legible, composition clean. And last but not least, does it look professional and visually appealing? And for us, this last part really matters.
Then we go through, just slightly, the behavior. So as the lead, just so the lead doesn't step on the toes of its subordinates, it knows that the first thing it has to do is ask the user the following clarifying questions. Now, naturally, you can just override these questions and say, "Go and take a look at the output example. This is exactly what I want. I want you to just reverse engineer it and apply it to a brand new scenario." So in this case, it sets up the team first and foremost. They're all ready to go, and then it comes back to me with all those questions, and it won't kick it off until I give it exactly what it needs. So in this case, I just said I want to generate images exactly like the reference images right here.
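The "narrative paragraph, never a keyword list" rule for the prompt architect can be illustrated with a small template. This is a rough sketch under stated assumptions: `narrative_prompt`, the field names, and the sentence template are hypothetical and not taken from the video's prompt; only the list of elements (subject, environment, lighting, camera angle, mood, textures, colors, composition) comes from the transcript.

```python
# Elements the transcript says every prompt should cover.
FIELDS = ("subject", "environment", "lighting", "camera_angle",
          "mood", "textures", "colors", "composition")


def narrative_prompt(brief: dict[str, str]) -> str:
    """Turn a style brief into full sentences instead of a comma-separated tag list."""
    missing = [name for name in FIELDS if not brief.get(name)]
    if missing:
        raise ValueError(f"brief is missing: {', '.join(missing)}")
    return (
        f"The image shows {brief['subject']} in {brief['environment']}. "
        f"It is lit by {brief['lighting']} and viewed from {brief['camera_angle']}, "
        f"which gives the scene a {brief['mood']} mood. "
        f"Surfaces have {brief['textures']}, the palette is {brief['colors']}, "
        f"and the layout is {brief['composition']}."
    )


example = {
    "subject": "a row of coffee cups painted like national flags",
    "environment": "a bright editorial infographic",
    "lighting": "a soft studio softbox",
    "camera_angle": "a slightly elevated three-quarter angle",
    "mood": "calm and informative",
    "textures": "a matte ceramic finish",
    "colors": "warm neutrals with saturated flag accents",
    "composition": "a centered grid with labels under each cup",
}
print(narrative_prompt(example))
```

Failing loudly on a missing field mirrors the idea that each step should reinforce what the prompt must contain before anything is sent to the image model.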
So then it takes a look at that file and it still pushes back on me, which is good, to ask me: okay, cool, I get the idea, but do you want it on a different topic? Do you want it with warm tones, in a 16:9 aspect ratio, etc. So then I tell it essentially that I want to do it on global consumption per capita by country. The original image had nothing to do with that. So then it creates a summary for the team. It invokes the team. They all go through, they create the images right here, and then it goes through the critic agent to rank them. And then the critic agent comes back with basically a series of ratings out of 10. So in this case it says the recommended one is number V4. It tells you why, and it goes through all of them.
So that's why it's a very beautiful application of agent teams, because typically, let's say the thumbnail that you clicked on to get to this video: I've created a Nano Banana thumbnail pipeline for my YouTube channel that's taken me around one and a half to two months to put together. It's actually very comprehensive and goes through multiple looping steps. So all of this essentially categorized and summarized everything that I did in a much more elegant manner.
And just to drive the point home, this is exactly where all the files end up once we're done with the generation. So this is what we pasted, which is the spawn team prompt, which I will make available to you in the second link in the description below so you can recreate this. Then we have the CLAUDE.md and the API guides, all of which are sent to the agents as well. Then they use the Gemini API key to access the Nano Banana API. And then it goes through the reference images folder. So it's broken down by style, composition, subject, brand, and output examples. The main one that I care about is this one, but you could also add more richness by adding more here, especially if it's a very complex diagram, image, or illustration. Then finally, the outputs folder is where we get everything.
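The project layout described above (a reference images folder split into five categories, plus an outputs folder) could be scaffolded along these lines. The exact folder names are assumptions made for illustration; the video only names the categories, and the real project may spell them differently.

```python
from pathlib import Path

# Category names come from the transcript; the on-disk spellings are assumed.
REFERENCE_SUBFOLDERS = ("style", "composition", "subject", "brand", "output-examples")


def scaffold(root: Path) -> list[Path]:
    """Create the reference-image categories and the outputs folder under root."""
    folders = [root / "reference-images" / name for name in REFERENCE_SUBFOLDERS]
    folders.append(root / "outputs")
    for folder in folders:
        # exist_ok makes the call safe to repeat on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

A call such as `scaffold(Path("banana-squad"))` would create the six folders relative to the current directory; the project name here is also just a placeholder.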
So if we pop on over back here and we go to outputs, this is where you can see each and every permutation of the images that I've shown you. And this comes in handy in all kinds of cases, not just images: illustrations, thumbnails, anything you want to recreate but apply to a brand new scenario. This is exactly how I'd go about doing it.
And just to drive it home, we have the exact same prompt here, but in this case, I just asked it to do the following. So with that more square image, with the globe of the world kind of split up into different sections, I said I want to recreate the world diagram, but I want to make it about AI investment as of 2026, so you'll have to do some research. So in this case I give it the hint that it has to go and use web search after it asks me for the sizing. And then after it does that, you can see right here it's gone and taken a look at statistics between 2024 and 2035, maybe one other source. Obviously, we could push it to do even more comprehensive research. It comes back with the TL;DR of the results, the sources it used, and then it applies the team. And in this case, I roasted it, because the first time it generated it, I was wrong. I tagged the wrong image. So, I just respun up the team to get us to the finish line. And once everything's completed, we pretty much have the exact same process with little to no intervention from myself.
And at the very end, one thing I will tell you is if you recreate the same process or you send the same prompt over and over again, that's where it makes sense to then spin up a skill or slash command. And the way you implement the skill really depends on your workflow. So in my case, it knows that when I say Banana Squad or spawn the image generation team, it will automatically ask me the questions, create and spawn the team of agents, dispatch the work after I confirm and approve it, and then it will present me the critic's findings.
Then from there, it will do what's called a graceful shutdown, which is basically making sure you don't keep your agents running in perpetuity. Because if you can see right here, in my last version, these agents actually ran for less than 2 hours, but I didn't actually shut them down. So you can see here, it's baked for a day. So in this time, it unnecessarily took another 10,000 to 20,000 tokens because I forgot to shut them down. So make sure that if you spin up a skill or a slash command, you bake this in.
And that's pretty much it. You have everything you need to create whatever diagrams you want today using this exact same method and exact same skill. And to make it that much easier for you, I'll throw in the system prompt and a few of the guides that I fed it to recreate this whole process in the second link in the description below. If you want access to things like my CLAUDE.md and exclusive systems and a brand new beginner-to-intermediate Claude Code course, I've just made that available to all of my exclusive community members in the Early AI Adopters community. And I'm even going to go a little bit deeper for my community members on exactly why I've structured my project in this way, how the API works under the hood, and how the critic actually goes through each and every one of its KPIs. So, if you want to go deeper into the Nano Banana and Claude agent team rabbit hole, then you'll want to check that out. And for the rest of you, I would super appreciate it if you could leave a like and comment on the video. It helps the video. It really helps the channel. I'll see you in the next
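The graceful shutdown advice near the end of the transcript amounts to guaranteeing cleanup even when a run ends early or fails. The sketch below shows that general pattern in plain Python; it is not how Claude Code shuts down agent teams, and the `Agent` class and `agent_team` helper are hypothetical stand-ins.

```python
from contextlib import contextmanager


class Agent:
    """Hypothetical stand-in for a long-running agent process."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


@contextmanager
def agent_team(names: list[str]):
    """Start every agent, and always stop them when the block exits."""
    agents = [Agent(name) for name in names]
    for agent in agents:
        agent.start()
    try:
        yield agents
    finally:
        # Runs on normal exit and on exceptions, so nothing is left idling.
        for agent in reversed(agents):
            agent.stop()


with agent_team(["researcher", "prompt-architect", "generator", "critic"]) as team:
    print([agent.name for agent in team if agent.running])
print(any(agent.running for agent in team))
```

Putting the shutdown in a `finally` block is the code equivalent of baking the shutdown step into the skill or slash command rather than relying on remembering it.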
Video description
Join My Community to Level Up ➡ https://www.skool.com/earlyaidopters/about
🚀 Banana Squad Prompt + API Guide + Diagrams (FREE): https://markkashef.gumroad.com/l/banana-squad-agent-team
📅 Book a Meeting with Our Team: https://bit.ly/3Ml5AKW
🌐 Visit Our Website: https://bit.ly/4cD9jhG
---
🎬 Core Video Description
What if you could generate professional illustrations, diagrams, and graphics with zero Photoshop and zero design skills? In this video, I break down PaperBanana, a brand new research paper from the Google team that proved a 5-agent pipeline beats human designers 73% of the time, and I show you exactly how I rebuilt that pipeline using Claude Code Agent Teams and Nano Banana (Gemini 3 Pro Image API). You'll see the full Banana Squad in action: a Lead that orchestrates, a Researcher that reverse engineers any reference image, a Prompt Architect that writes five distinct prompts, a Generator that calls the API five times, and a Critic that ranks every result on faithfulness, conciseness, readability, and aesthetics. By the end, you'll have everything you need to recreate this exact workflow with a single prompt.
---
⏳ TIMESTAMPS:
00:00 - Results Preview: What the Banana Squad Can Generate
00:58 - Visual Capitalist Inspiration and Reverse Engineering
02:30 - PaperBanana TLDR: What the Google Paper Actually Found
03:00 - The Big Idea: Image Generation as a Design Agency
03:39 - Why Critique Is the Secret Weapon (45% to 55% Accuracy)
04:40 - Applying PaperBanana to Claude Code Agent Teams
04:58 - Meet the Banana Squad: 5-Agent Architecture
05:15 - The Lead: Orchestration, Not Generation
05:25 - Research Agent: Analyzing Reference Images
05:49 - Prompt Architect: Five Narrative Prompts
05:49 - Generator Agent: Calling the Nano Banana API
06:00 - Critic Agent: Ranking on 4 Dimensions
06:25 - How the Squad Communicates
07:01 - Walking Through the Mega Prompt
07:46 - Project Structure: Paper, API Guide, CLAUDE.md
08:10 - Reference Images Folder Setup
08:39 - Gemini API Guide (Copy-Paste from Docs)
09:00 - Prompt Architect Rules: Narrative Paragraphs, Not Keywords
09:43 - Critic KPIs: Faithfulness, Conciseness, Readability, Aesthetics
10:00 - Lead Behavior: 10 Clarifying Questions
10:23 - Live Demo: Spawning the Team
11:00 - Critic Rankings and Recommendations
13:12 - Where All the Files End Up
13:47 - Reference Images: Style, Composition, Subject, Brand
14:10 - Second Demo: World Diagram on AI Investment
15:37 - Turning It Into a Skill / Slash Command
16:00 - Graceful Shutdown (Don't Waste Tokens)
16:54 - Recap + Free Resources
---
🔗 RESOURCES:
• Banana Squad Prompt + Guides: https://markkashef.gumroad.com/l/banana-squad-agent-team
• PaperBanana Paper: https://dwzhu-pku.github.io/PaperBanana/
• Agent Teams Docs: https://docs.anthropic.com/en/docs/claude-code/agent-teams
• Gemini Image API: https://ai.google.dev/gemini-api/docs/image-generation
• Visual Capitalist: https://www.visualcapitalist.com
• Claude Code: https://claude.ai/code
---
#ClaudeCode #AgentTeams #NanoBanana #PaperBanana #GeminiAPI #AIImageGeneration #AIAgents #Opus46 #ClaudeAI #AIAutomation #AIDesign #NoCode #AITools #Anthropic #GoogleAI