bouncer

Jon Gjengset · 15.4K views · 271 likes

Analysis Summary

20% Minimal Influence
mild · moderate · severe

“Be aware that the 'Java is difficult/unhelpful' narrative is used as a primary justification for the project, which may overlook simpler fixes within the existing Java codebase to favor a Rust-centric solution.”

Transparency Transparent
Human Detected
98%

Signals

The video is a long-form live-coding session featuring a well-known human creator (Jon Gjengset) with highly natural, unscripted speech patterns and personal insights. While the content involves using AI tools (LLMs) to assist in coding, the presentation layer—the narration and creative direction—is entirely human.

Natural Speech Patterns Frequent use of filler words ('um', 'uh'), self-corrections ('combination of the three... combination of the four maybe'), and natural pauses.
Personal Anecdotes and Opinions The speaker discusses their personal skepticism of LLMs, their specific workflow preferences, and their lack of Java knowledge.
Live Interaction Context References to 'another impl Rust stream' and 'doing it on stream' indicate a live, unscripted human performance.
Subject Matter Nuance The speaker explains the 'why' behind using LLMs as a tool for a specific porting task, showing high-level human reasoning rather than a formulaic script.

Worth Noting

Positive elements

  • This video provides a deep, transparent look at using LLMs (Claude Code) for complex code translation and grammar parsing in a real-world systems programming context.

Be Aware

Cautionary elements

  • The 'powercoding' framing may make the process look more reliable than it is for viewers who lack the host's high level of expertise to verify the LLM's output.

Influence Dimensions

About this analysis

Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.

This analysis is a tool for your own thinking — what you do with it is up to you.

Analyzed: March 13, 2026 at 16:07 UTC · Model: google/gemini-3-flash-preview-20251217 · Prompt Pack: bouncer_influence_analyzer 2026-03-08a · App Version: 0.1.0
Transcript

Welcome back to another impl Rust stream. Um where we take ideas and we build Rust code out of them, I guess, is the whole premise, which uh feels like most of programming in Rust, but I guess we do it on stream. So, so hence it needs to have a name. Um so in this stream we're going to uh take a tool for the Avro ecosystem. I'll explain what Avro is in a second. Um, that already exists. It's written in Java, uh and I just want a version that's not written in Java. Um and then there, there's really like one main motivation for it, um which is the existing Java tool, while it is maintained by the official maintainers of Avro, um it does not do a great job of giving good error messages, and uh as you'll see when I explain what Avro is, um bad error messages in this case can be extremely time-consuming to debug, uh and so I wanted to make that better. And you probably can make this better in Java, but I don't know Java very well. Um, not to mention I can think of like another couple of things that I would like to be different. Um, and so I'm just going to try to port it. And, and uh I'm also going to use a decent amount of LLMs for this one, which uh some people have like raised question marks about. And the reason actually is because this specific kind of task is the kind of thing that I've found LLMs to be really good at. Like I've mentioned in the past how, you know, I have a lot of skepticism around LLMs, but the skepticism has, I think, become more refined maybe over time, in that I've sort of recognized that there are some things that agentic coding in particular does really well at, there are many things it's terrible at, uh and there are some things where it's good with enough guidance, and there are some things it's really good at, and this is actually something where I think it is really good at this kind of thing. Um, in particular the work we're about to do has two properties that I think sort of set up the agentic LLM um for success.
The first of them is that there's an existing open source implementation in a different language uh that we just want to port, right? So the existing code already exists and we just want it in a different language. I've found LLMs to be very, very good at that kind of translating from one language to another when you have the other as a reference. Um, especially, I've used this a lot going from bash to Python, for example, as bash scripts start to grow too large and unwieldy, uh turning them into Python or potentially turning them into Rust. Uh, it does really well for that. And then the other, and this is another critical component to it, is um we have a reference implementation where we can basically generate an infinite number of examples of correct behavior. Right? So, um, and this gets back to what Avro is, but the tool we're building is basically a sort of file format converter, um, that takes, um, uh, it takes one file format and turns it into a different file format. They're both text-based. Uh, and so if we have a given input file, um, we can tell the LLM, well, if you want to check whether your implementation is correct, then pass it through the the official tool, pass it through yourself, and check that the outputs are exactly identical. Um, and that kind of setup means that you can kind of leave it to run on its own to figure out its own problems until it arrives at a solution that like fits the the pattern exactly. Um, and this becomes more, more like, um, you know, it's almost like a semi-random walk. It's almost like fuzzing to get the right uh solution, but because you have a good reference, uh it becomes more like, you can almost do property-based testing over this thing. Um, so uh let's get into what Avro is. So Avro is, uh, for those of you who have heard of Protobuf, Avro is fairly similar.
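The differential-testing loop described here (run the reference jar, run the port, check the outputs match) can be sketched in Rust. This is a minimal sketch, not the stream's actual code: the `avro-tools.jar` location and the whitespace-tolerant normalization rule are assumptions.

```rust
use std::process::Command;

/// Compare two tool outputs, ignoring trailing-whitespace noise.
/// (Byte-for-byte equality is the stricter goal described on stream.)
fn same_output(reference: &str, ours: &str) -> bool {
    let norm = |s: &str| s.lines().map(str::trim_end).collect::<Vec<_>>().join("\n");
    norm(reference) == norm(ours)
}

/// Hypothetical runner: invoke the reference Java tool on one input file.
/// The jar filename and argument layout are placeholders.
fn run_reference(input: &str, output: &str) -> std::io::Result<std::process::ExitStatus> {
    Command::new("java")
        .args(["-jar", "avro-tools.jar", "idl", input, output])
        .status()
}

fn main() {
    // The pure comparison logic can be exercised without Java installed:
    assert!(same_output("{\"type\": \"record\"}\n", "{\"type\": \"record\"}"));
    assert!(!same_output("{\"a\": 1}", "{\"a\": 2}"));
    println!("comparison helper ok");
    let _ = run_reference; // not invoked here; requires Java and the jar on disk
}
```

The key property from the transcript is that this check needs no human in the loop, so an agent can iterate against it indefinitely.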
So uh Avro is a um system, mechanism, um, concept, combination of the three, a spec, combination [snorts] of the four maybe, um that lets you describe um interfaces both for data exchange and for remote procedure calls like APIs, um in a particular format, a standardized format, and then you can generate servers and clients based on those definitions of data types and potentially those definitions of service endpoints. Um, and you can generate those in multiple different languages. So, for example, you could write a uh a schema that defines, you know, here are my, here are my various data types and also here are the API endpoints that I want, and you could autogenerate a client in Rust and a server in Java, for example. And then you obviously have to fill in the the sort of business logic, but it can generate the interface for you. It generates the the serialization and deserialization protocol between the two, uh and brings you a bunch of tooling to work with that as well. Um, Protobuf obviously is the the one I think most people know about, from from Google. So this is Google protocol buffers. Um, and there are a couple of differences between Avro and and protocol buffers. Um, for me, two of the ones that are most relevant, and the reason why I ended up working with Avro in this particular uh setting that I'm using it in, is um, uh, Avro has support for um these things, they're called uh, they're called uh open containers, which are basically, this is not to be confused with OCI images and the like, um, but they have a mechanism to say this is like a byte stream that accompanies the the data that's exchanged. Uh, Protobuf doesn't really have anything equivalent where you can say this. gRPC does, but I don't want to use the RPC part. I just want to use the the the data interface definition language.
Um, and so Avro has this mechanism for saying, here's like the data type that you're going to send or receive, like as a definition of a message, and this part of the message is just a a byte stream where we might not know the length in advance. Um, the other thing Avro has is this notion of logical types. So, um, it's not like your schema has to be constructed just of, um, you know, objects that have fields where the fields have types that are either themselves objects or they are primitives like booleans, integers, floats, etc. Um, in in Avro you can also define your own logical types, to say um this is an integer but it also has to follow the following rules, like for example it's an integer but its range is between x and y, uh or it's a string that has to match the following set of patterns. Um, and you can kind of hack together something similar in Protobuf as well, um, but it it's um, it becomes much more ad hoc. Like in Avro this kind of logical typing is is uh directly supported by the spec itself. It sort of has a a place where it goes. Um, so why do we need a tool for this? Well, so Avro itself is a a pretty vibrant ecosystem. Uh, it's maintained by Apache, or rather it's an Apache project. So it has a bunch of maintainers, a bunch of users. Um, there's an Avro project on GitHub that's pretty active. Um, and all of the tooling there is written in Java. Um, it can still generate code in lots of different languages, um, but the the implementation of like the parser and the the code generator and all of that stuff is in Java. Um, and that's fine as far as I'm concerned. It's like, uh, I I don't necessarily want to have to install the whole Java stack to run these things, but you know, so be it. It's fine. Um, but there is one particular tool that is kind of unique to Avro that doesn't exist in the Google uh protocol buffers uh ecosystem, which is that Avro doesn't just have one file format for describing interfaces, describing data types.
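The logical-types idea described here (a primitive plus extra rules, like an integer restricted to a range or a pattern-constrained string) can be modeled with a small sketch. The type names and rules below are invented for illustration; they are not Avro's actual spec types.

```rust
/// A toy model of Avro-style logical types: a primitive plus extra rules.
/// These variants are illustrative, not from the Avro specification.
enum LogicalType {
    /// An int whose value must fall within [min, max].
    RangedInt { min: i64, max: i64 },
    /// A string that must start with a fixed prefix
    /// (a stand-in for a pattern-matching rule).
    PrefixedString { prefix: &'static str },
}

fn validate_int(lt: &LogicalType, v: i64) -> bool {
    match lt {
        LogicalType::RangedInt { min, max } => (*min..=*max).contains(&v),
        _ => false,
    }
}

fn validate_str(lt: &LogicalType, s: &str) -> bool {
    match lt {
        LogicalType::PrefixedString { prefix } => s.starts_with(prefix),
        _ => false,
    }
}

fn main() {
    let port = LogicalType::RangedInt { min: 1, max: 65535 };
    assert!(validate_int(&port, 8080));
    assert!(!validate_int(&port, 0));
    let id = LogicalType::PrefixedString { prefix: "user-" };
    assert!(validate_str(&id, "user-42"));
    println!("logical type checks ok");
}
```

The point from the transcript is that Avro gives these rules a first-class place in the spec, where in Protobuf the equivalent has to be bolted on ad hoc.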
Um, it actually has, well, let's let's say two. Um, so it has one that is called the IDL. The IDL is an interface description language, and it looks very much like Protobuf. Um, so if you look at the IDL language sort of example at the bottom here, um, so here for example, um, schema.avdl. So you declare a namespace. Okay, fine. You declare a, uh, this is optional. Um, you declare an enum type. You declare um some other type, you a record, which is sort of the equivalent to message in um in protocol buffers. Uh, you give a name to that record, and then you say it has the following fields of the following types. Um, and you can include additional attributes for those fields as well. Uh, you have unions, you have uh default values and so on. You have arrays, so you can say this is a list of things of the following type. Um, so this is a an interface description language. This is great for humans, right? So humans can read and write here. You have comments, like you have all the the niceties of being able to like nest, um, and like write, you know, write the code and format the code however you wish. Um, and it's fairly easy to read. Even if the file gets large, you can sort of logically separate into multiple files and, you know, whatever organization you want to do. Um, it's okay for machines, but it's kind of annoying to parse something like this, right? Because it it has its own grammar. It has its own syntax. And so for that reason, Avro also has a separate schema type, um, or or file type really, which is a JSON file type, um, that is basically a JSON encoding of the information that is in this file. So, uh, if we go over to the the actual specification for Avro, you'll see that it says a schema is represented in JSON by one of these, um, where it has type and things and attributes. And if we go down, they probably have examples near the bottom here too. This is what a schema looks like in, uh, like in the Avro JSON-based format.
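The walk-through above (namespace, enum, record, fields, defaults) can be illustrated with a small hypothetical schema. This is not the file shown on stream, just a minimal sketch of the two formats being contrasted: the human-oriented IDL and its JSON encoding.

```
// hypothetical minimal schema.avdl (not the stream's example file)
@namespace("com.example")
protocol Demo {
  enum Suit { SPADES, HEARTS, DIAMONDS, CLUBS }
  record Card {
    Suit suit;
    int rank = 1;
  }
}
```

A JSON schema carrying the same `Card` record might look roughly like:

```json
{
  "type": "record",
  "name": "Card",
  "namespace": "com.example",
  "fields": [
    { "name": "suit",
      "type": { "type": "enum", "name": "Suit",
                "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"] } },
    { "name": "rank", "type": "int", "default": 1 }
  ]
}
```

Converting from the first form to the second is exactly the job of the tool being ported.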
And so this can express the same kinds of types, but it does so in a way that's much easier for machines to parse, process, do code generation based on, and so on. Uh, but you need something that converts from the IDL to this kind of JSON representation. Right? So if you have this, but all of the tooling expects to be given this, then how do you go from this to that? Well, there's a uh a tool that ships in the, so there's an avro-tools Java package that includes a bunch of various kind of Avro tools, and one of the ones that it includes is this um IDL tool, where you give it an AVDL file. That's the IDL we just looked at here, the the human-writable one. And it generates you an AVPR or an AVSC, um, which are the JSON representations. The difference between AVPR and AVSC here is that um AVPR is a protocol. So think gRPC, um, like it's a mechanism for describing service endpoints as well. Whereas AVSC is just the data types and not the the service endpoints. Um, and so when you run this tool, it will take this human-readable thing and produce a JSON-readable thing. Um, and you can use it for schemas and protocols and, uh, whichever you prefer. Um, and this tool itself, uh, seems to be maybe less well-maintained than the other tools. It's not that it doesn't work or that it's broken or anything like that. It's just that, um, it seems to have not gotten as much, you know, care, love, and attention. Um, and I think maybe I've been spoiled by Rust here, but when you run this tool, um, if you have an error in your IDL, you don't get very good error messages. So, it will often just say this file just failed to compile or failed to be translated into, uh, JSON format, and then not tell you why. And this can be really frustrating. So, imagine you have like a thousand-line IDL file and you like misplaced a comma. It won't tell you where you did so.
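The invocation pattern described here (run the jar's `idl` subcommand on an `.avdl` file to get an `.avpr`) can be sketched with `std::process::Command`. The jar filename and paths are placeholders, not the exact command from the stream; the sketch builds the command without spawning it, so the shape can be checked without Java installed.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Build (but do not spawn) a reference invocation of the converter.
/// "avro-tools.jar" is a placeholder path, not the stream's exact jar name.
fn idl_command(input: &str, output: &str) -> Command {
    let mut cmd = Command::new("java");
    cmd.args(["-jar", "avro-tools.jar", "idl", input, output]);
    cmd
}

fn main() {
    let cmd = idl_command("namespaces.avdl", "namespaces.avpr");
    // Inspect the built command without running it:
    assert_eq!(cmd.get_program(), OsStr::new("java"));
    let args: Vec<_> = cmd.get_args().collect();
    assert_eq!(args[2], OsStr::new("idl"));
    println!("command shape ok; spawning requires Java and the jar");
}
```

Calling `.status()` on the returned `Command` would actually run the converter, which is what a test harness for the port would do.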
And so you need to like, you need to either just look really carefully, or, it turns out, people have also built like IDE integrations for this, um, like an LSP among other things, and that one actually has a better parser that gives you good errors. So you open it in your editor, you look for where the editor tells you the problem is, and then you go back and run the tool, and that just feels kind of silly. Um, not to mention you also need to have Java installed and you need to have this JAR in order to run the tool, and that makes me a little sad, but, you know, who am I to say? Um, you know, part of this is also I don't use Java very much, so I don't usually have Java installed on my computers. Um, like I will install it when I need to. Um, it also means I need to like fuss around with Maven in order to be able to grab the jar, or, like, you can also go to the download page, and then they have like a mirror where you can look through basically an FTP server where there's a bunch of jars you can click to download, which I I don't love the the idea of. And so I thought, well, can we just write just this tool? Like, I don't want to port all of avro-tools. I just want the converter from the IDL format to the JSON format in Rust, so that I can just run it in a Rust-native project. Um, so that's what we are aiming for. Um, are we, is it clear what this is for? What we're doing? Um, I'll I'll demo the tool in a second just because, so we can show that it actually does what I said it does. Okay, people seem to think it's clear. Great. So then what we'll do, um, is we'll do cargo new, and we'll call it avdl, which is the the file extension type for, um, uh, for the Avro IDL. We could come up with a clever name here, but I don't think we need a clever name. Um, and then what I will actually do is I will create a git submodule that brings in Avro. Um, the reason for this is just because, um, we kind of want to reuse a lot of the, uh, Oh, riddle is good. Riddle is good.
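Duplicating the upstream test suite, as described below, means finding every `.avdl` file under the Avro submodule. A small sketch of that harvesting step, with a pure extension check that can be tested on its own (the corpus path itself is hypothetical):

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// True if a path names an Avro IDL file.
fn is_avdl(p: &Path) -> bool {
    p.extension().map_or(false, |e| e.eq_ignore_ascii_case("avdl"))
}

/// Non-recursive sketch: collect the .avdl files in one directory.
/// (The real corpus lives under the avro/ submodule; a recursive walk
/// would be needed to cover it all.)
fn find_avdl(dir: &Path) -> std::io::Result<Vec<PathBuf>> {
    let mut out: Vec<PathBuf> = fs::read_dir(dir)?
        .filter_map(|e| e.ok().map(|e| e.path()))
        .filter(|p| is_avdl(p))
        .collect();
    out.sort(); // stable ordering for reproducible test runs
    Ok(out)
}

fn main() {
    assert!(is_avdl(Path::new("tests/namespaces.avdl")));
    assert!(!is_avdl(Path::new("schema.avsc")));
    let _ = find_avdl; // needs the checkout on disk to actually run
    println!("avdl filter ok");
}
```

Each harvested file would then be fed to both the reference jar and the port, with outputs compared pairwise.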
Yeah, but then it doesn't have the A, because there are lots of IDLs, not just Avro. I don't hate Riddle, though. Oh. Fine. Fine. Riddle [laughter] [gasps] riddle. Riddler. Riddle. Radvil. Radvil. No. No. I'll just keep AVDL. Let's not go down this rabbit hole. [laughter] A riddle. So So now it's a riddle. A riddle. No, no, we're just going to stick with AVDL. Let's not make it complicated. So the [snorts] reason I want the Avro checkout in here is because one of the things we want to do is make use of all the test cases for the existing codebase. So, um, if I cd into Avro here and I search for AVDL files, you'll see that, um, in the Avro repo, there are a bunch of AVDL files that are used as tests for the Avro tool that they have. And we would at the very least expect that our tool produces the exact same output as the upstream tool, um, when run on these files. So, we basically want to duplicate their test suite and make sure that we also pass the exact same test suite. Um, you'll see there are a couple of other AVDL files here that are also like, um, test files for the compiler and stuff, and we can just run on all of them, right? Because we we have the reference implementation. Our goal is to produce the same thing as the the Avro jar does. Um, and in fact, let's let's look at what that looks like. So, first we'll we'll grab, um, that's not the right command. That's not the right command either. Uh, because it is, uh, it's like you need dash D get dependencies equals false. No, I [snorts] did this earlier. Yeah, it's this. No. No. This. Can you tell that I don't use Maven very much? What? No. I was sure I had that. Fine. We'll do the other one. So, you can also download the jar directly from Apache. So, let's do that instead. I don't want to fetch all these dependencies from Maven. Seems annoying. Anyway, we now have the jar. Hooray. Uh, and so now we can run. In fact, if you go back here, you'll see there's a usage example.
So, we can take this exact usage example, uh, and we will run our Java tool. Uh, this is going to be in /avro… no, what? Oh, I'm already in, uh, so it is going to be, where was it? Lang. Okay, so it's in lang, java, idl, source, test, this, and then we'll call it, we'll set the output file to be namespaces.avpr. Uh, it ran, and now we open namespaces.avpr. Oh, that's annoying. Fine. I can I can fix that. Uh, I'll fix that right away. Config. Uh, editor. No. Uh, ne it. Yes. Yes. Yes. Okay. It's very insistent that I fix this right away. Yes. Yes. I don't want you to bother me every single time I [sighs and gasps] cool. The URL it told me to use instead does not work. There we go. Okay, great. Stop bothering me. Cool. Uh, so when we look at the, um, uh, when we look at the AVPR file that it generated, you see, in fact, let's look at the input file first. Uh, you see this is an AVDL file. It has a protocol defined in here. So protocol is like a a service, like APIs and stuff. Um, and you'll see it has a bunch of, it has types and records. Uh, and then if we look at the generated file, we should see the same types, like the fixed here, fixed in other namespace and so on, um, and the same records and the nested fields and everything, and it's just in JSON format instead. So, the tool works. Now, let's see what happens if I try to change this and I do, I don't know, remove this semicolon. If I now run this, uh, okay, this one's not too bad actually. So: schema parse exception, line 37, mismatched input 'record' and other name, expecting semicolon or colon. What happens if I put a comma here? Ah, great. Okay, so I put a comma instead of a semicolon. Exception in thread main: schema parse exception, no such element exception when parsing, with no indication of which line the problem is on or anything. And I mean, granted, parser errors are difficult, right? Because this might be completely valid syntax, like the grammar is is accurate in the file that I wrote.
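The error-reporting improvement the stream is after amounts to attaching a line and column to the failure instead of a bare exception. A minimal sketch of that bookkeeping, mapping a byte offset in the source to a 1-based (line, column); this is illustrative, not the actual tool's error machinery:

```rust
/// Map a byte offset in `src` to a 1-based (line, column) pair — the kind of
/// bookkeeping a friendlier "expected ';' at line 37" message needs.
fn line_col(src: &str, offset: usize) -> (usize, usize) {
    let before = &src[..offset.min(src.len())];
    let line = before.bytes().filter(|&b| b == b'\n').count() + 1;
    let col = before.rfind('\n').map_or(offset + 1, |nl| offset - nl);
    (line, col)
}

fn main() {
    let src = "record Foo {\n  int x,\n}\n";
    // Pretend the parser choked at the ',' where a ';' was expected.
    let off = src.find(',').unwrap();
    let (line, col) = line_col(src, off);
    assert_eq!((line, col), (2, 8));
    println!("error at line {line}, column {col}: expected ';' after field");
}
```

With offsets tracked through the parser, even the thousand-line misplaced-comma case from earlier pinpoints itself.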
Uh, but when we parse it, we end up with the, like, inability to parse what was produced. So, like, there's some amount of leniency here, but I would like to see this error be better. And so that is going to be the the primary difference between the the Java version of the tool and ours. Hopefully, uh, we will find out. So let's return that to a semicolon so the file remains valid. Um, and, uh, what else do I want to show here? Ah, there's one more thing to be aware of. So the, um, the avro-tools jar also has, uh, idl2schema... no, what's it called? Um, so there's actually two commands, um, if I can find the other one. Yeah. So this idl2schemata, uh, which is different from the idl tool, and it's not entirely clear how, because these two tools are both invoked through the, um, uh, through the avro-tools package. This one says: tool implementation of generating Avro schemata, uh, from an IDL format file. This one says: extract the Avro schemata of the types of a protocol defined through an IDL format file. The actual difference is that this one, if you pass it a protocol file, so one that defines like API endpoints and stuff, it will parse the thing, extract just the the type definitions but not the endpoints, and turn it into a schema file instead. So in other words, this one will produce a, um, um, an AVSC or an AVPR, like, it will produce everything that's in the input file. This one will not include the protocol bits. Um, is at least my memory of this. Uh, yeah, I'm pretty sure that's the difference between them. I I don't end up using this one very much. Um, okay. So, now let's figure out how we start doing this. Um, and the way we're going to start doing this is we're going to kick off Claude and see what we can get to. Yes, I trust this folder. That's fine. Um, we'll enter plan mode. And then what we're going to tell it is, uh, I think this is like, it should be fairly easy to get it started here.
And then we're going to have to vet a lot of what it does, because I suspect it will get quite confused. Um, so what we're going to say is, I'm trying to, um, port, and then we're going to give the path to the tool. Uh, which is going to be this one. Oh, uh, I want to copy this path. GitHub, thank you very much. Um, trying to port, uh, to Rust. Um, I would like to reuse... Oh, there's, here's a, this is another important part. So one of the reasons why I think this is actually feasible is because, um, the the grammar for the IDL files is defined using something called ANTLR. Um, ANTLR. So, ANTLR, or ANother Tool for Language Recognition, uh, is a, um, is a tool for writing parsers. Um, so you basically, it, uh, if we open up the file for it here, um, it's a a domain-specific language, a DSL, that lets you write out the grammar for some language you want to parse, and once you have that grammar, you can pass it to ANTLR and it will generate you a parser for that grammar in whatever language you choose. So, um, for example, you can, and this is what, um, Avro does, you can use ANTLR to generate Java code for the ANTLR grammar. Um, and now we have the, uh, um, now we have a Java program that can parse that file. But you can also use it to generate Rust code. Um, and so we can take the grammar and just use that to generate a Rust parser right off the bat. In fact, this part we don't... Let's, um, wait to start Claude here. Oh, you're right, I'm in Avro. I don't need to be. Um, so, let's first get ANTLR to generate us the Rust code that we want. And then we can build on top of this. Um, so, uh, I think I already have ANTLR. Yes, I do. Um, and then let's see if the ANTLR I have builds with Rust already, or whether I need to, um, uh, whether I need to custom build it. So what I want to do is this, and see what happens. ANTLR cannot generate. Okay, so the ANTLR-for-Rust thing is, uh, yeah, okay, it's going on in this repository. This is the main repository. This readme, I think, is out of date.
Um, but I I think this is not considered, like, it's not upstreamed yet, or it's not, uh, enabled as a default distribution of ANTLR, the the Rust implementation. So, we need to figure out how to get this one. Uh, which I think is just going to be a matter of us... Oh, I see. No, it is still in a fork. Great. So, what we're going to do... Yeah, the rabbit hole begins. It's true. Um, what we're actually going to do is we're gonna cd. Do I want to do this under us? Yeah, sure. Um, we'll go git submodule add this one. Uh, such a large project to clone, come on. All right. So how do we build this thing? Readme. Build. How-to-build doc. Building ANTLR. Most programmers do not need the information on this page. Cool. Well, that's not us, sadly. Uh, okay. We have the source. Yes. Check your environment. I have Java. I have Maven. Uh, the current Maven build seems complicated to me. Great. [laughter] Okay. Uh, don't forget this on Linux. Okay. So, this is going to be... But do I want it to install? Can I tell it to... aha, building. Install sounds a lot like not not just build, but okay. Um, fine. So, we're going to run, uh, oops, we're going to run something like this. Let's just see how that goes. Yep, it's doing something. This feels like the wrong logging level, but we will find out. [snorts] [snorts] Install will just drop it in your user repo. Oh, build success. Build success, you say? [laughter] Uh, where did it put it? Oh, it put it in .m2. Okay. Uh, so I should now have ANTLR files in here. But that's the... Oh, right. I don't want the test suite. I want ANTLR. Uh, probably runtime. Probably the 4.13.3 snapshot. Probably the jar. Cool. So if I now run, uh, here, this instead, what happens? No main manifest attribute. I love running Java code. Uh, interesting. So maybe it's not just runtime. Maybe I need master. That sounds promising. Oh, but there's no Java... there's no jar there. Huh. Uh, run maven package to local build with artifacts in target.
Oh, see, this is why it's so handy to have Java experts in the room. Um, so I can do package the antlr4 tool. You lied to me. There's no jar now either. Uh, where did it put it? Uh, building jar... ah, in tool target, ANTLR snapshot. What's the difference between complete and not complete? Riddle me that. Okay. All right. I I'll I'll bite. How about this? What if I do the complete one? Is the complete one better? Complete. It did a thing. What did it do? What? Actually, what did it do? It ran successfully. Clearly, I have not used, uh, ANTLR myself before. Let's, um, let's go look at what this says. Okay. Yes. Yes. Yeah. I... Okay, I ran that. Now what did it, like, did it modify my source main? No, it... what, what? Hey, tool, tell me what you did. D'oh. Okay, what if I, what if I here do, like, java... what happens? I bet you it writes it near the file you gave it or something. Yeah, there we go. Okay. Okay, great. Um, okay, well, we figured it out. So, uh, let's cd into Avro, and let's do, let's do git clean in the submodule to remove those. Um, and then we'll run the Rust version again. Give us Rust, and then we'll ls here. Actually, we'll, then we'll do this, this, and we'll do ls files. No, git ls. Git... no. Git status. It's too early in the morning. Now, the question is, do I need the .interp and .tokens files, or do I just need the Rust files? I'm going to assume I only need the Rust files. So, we're going to do move *.rs to source. Let's go look at what it generated. Um, IDL parser generated. Great. It generated a whole lot of code. Um, let's then see whether there's any mention of interp in source. Uh, it doesn't look like it. What about tokens? No mention of tokens. Okay. Cool. Uh, so we have source files. What what else does it tell us? Uh, you can also use a build.rs, as one that rebuilds the parser automatically. Oh yeah, there is dash O. Great. Uh, I don't think I want this to be a build.rs.
I think I want to check in the files, cuz I don't want consumers of this crate to have to have the the ANTLR Rust version built locally to be able to run this. Uh, so I think I actually want to just check in the generated files. Um, yeah, that is what I want to do. And this is just, uh, examples. That's fine. Okay, great. And then it says add antlr-rust to your dependencies. Okay, that's fine. I can do that. And then, ah, so they, so they have an example build.rs. Do they also have an example, like, source lib? Not trivially. Okay, fine. I was hoping they would have a... Oh, examples. Do they have an examples file in this directory? No, they do not. Okay, then, uh, I think what we're going to do is we need to figure out what the, what the entry point here is, which I wish they were documenting in this readme, but they did not. Okay, that's fine. Um, what we'll do then is we'll go to source. Oops. Oh, I don't have a source lib, because I didn't tell it I wanted... I told it I wanted a binary, which is true. Um, but I also want a library, and I want, uh, I want, uh, ls source. I want mods for each of these. Like so. Now, what happens if I run cargo check? Well, that's also interesting. They tell me to take a dependency on antlr-rust 0.5. Huh. Uh, okay. But I think I see a problem. Uh, antlr-rust 0.2. Do not use antlr-rust 0.3. [gasps] Interesting. I wonder if it's this one they mean. Yeah, I think it's this one they mean, because in fact, if you look at the generated code, if we go into this, for example, you see it's naming a dependency called antlr4rust, which is not what that one is. So I think this is supposed to be that, and that the readme is simply out of date. Well, that compiled. Gave us some warnings, but that's fine. Okay. So, now we in theory have something that looks like a parser. Um, so in fact, let's go ahead and do, let's cd into Avro first. Uh, git clean this. I just want to commit this, uh: init plus generated grammar.
And now I think we can, so, so we can either now go and try to figure out how the parser works. And maybe we, we probably will have to, cuz I I suspect we're going to have to, uh, point, um, Claude in the right direction for how to use this. But I do actually think, uh, aha, for examples, you can see here. Yes. Okay. This is the corresponding generated code. Yes, we've seen a lot of that. And so this is what... Okay. So it generates, uh, we, you know, we pointed it at a grammar file called IDL, and for each file you point it at, it generates four files. Uh, sometimes maybe more than four files. Ah, that's visitor, basic, base. Visitor basics. Let's, uh, look at maybe CSV. It generated lexer, listener, parser, and visitor. And it also generated these base... it didn't generate the base... ah, IDL base listener, IDL lexer, listener, and parser. So we did not get a visitor for whatever reason. Okay, that's fine. Um, so the generated files look like they're about right. But now I want to see, uh, general tests. So I'm hoping that this will actually show me how to parse one. Yeah. Okay, so this uses, from the lexer and listener that were generated... here's a thing that it's supposed to be able to parse. So I guess this is an XML parser, right? Okay, great. [snorts] Yeah. So, this shows you how to use the lexer, how to use the parser. Great. So, we have now a reference file here. I wonder what's in gen. Oh, the gen is just the generated code. What's perf? Cool. This... this looks like maybe it needs some cleanup. Um, I see. So visitors are the ability to have a parser that will, um, let you write code for visiting stuff in the, uh, which I don't think is going to be useful for us. This is where it's useful also to go back to the, um, the Java code. So if you look at, because it also uses ANTLR, right? So, uh, in the Java code, they, uh, these files just define basically the equivalent of, like, source main RS, right?
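The listener/visitor machinery mentioned above is the core of how ANTLR-generated parsers are consumed. A minimal sketch of the listener idea over a toy parse tree; every name here is invented for illustration, and antlr-rust's real generated types differ:

```rust
/// A toy parse tree standing in for what a generated parser yields.
enum Node {
    Rule(&'static str, Vec<Node>),
    Token(&'static str),
}

/// The listener idea: callbacks fired as a tree walker enters and exits
/// grammar rules and touches tokens.
trait Listener {
    fn enter_rule(&mut self, name: &str);
    fn exit_rule(&mut self, name: &str);
    fn visit_token(&mut self, text: &str);
}

fn walk(node: &Node, l: &mut dyn Listener) {
    match node {
        Node::Rule(name, children) => {
            l.enter_rule(name);
            for c in children {
                walk(c, l);
            }
            l.exit_rule(name);
        }
        Node::Token(text) => l.visit_token(text),
    }
}

/// Example listener: record every token in document order.
struct TokenCollector(Vec<String>);
impl Listener for TokenCollector {
    fn enter_rule(&mut self, _: &str) {}
    fn exit_rule(&mut self, _: &str) {}
    fn visit_token(&mut self, text: &str) {
        self.0.push(text.to_string());
    }
}

fn main() {
    let tree = Node::Rule(
        "protocol",
        vec![
            Node::Token("protocol"),
            Node::Token("Demo"),
            Node::Rule("record", vec![Node::Token("record"), Node::Token("Card")]),
        ],
    );
    let mut tc = TokenCollector(Vec::new());
    walk(&tree, &mut tc);
    assert_eq!(tc.0, ["protocol", "Demo", "record", "Card"]);
    println!("listener walk ok");
}
```

In the real port, an `IdlListener` implementation playing the role of `TokenCollector` would build the JSON schema as the walker traverses the IDL parse tree.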
for these two binaries, and both of them end up just calling into, um, they get an IDL parser by creating an IDL over the input file, and then they, from there, parse out the protocol, and they get named schemas and stuff. Okay, so they just get things out of there, and the IDL is this kind of thing. So IDL file, which is an IDL reader. Uh, so IDL reader, here presumably we should see a bunch of the stuff that comes out of the generated parser. Aha. Uh, Avro IDL parser. Okay. So where's the generated parser? Did they check that in somewhere? IDL schema parser. Avro parse context, Avro schema... no, that's for parsing a schema. That's not what we want. We want the IDL. Where are these coming... like, where's the IDL parser coming from? That sounds a lot like the thing that's generated. I just want to see if I can find a sort of natural entry point here. Yeah, this, this looks a lot like using the things that ANTLR generated. Primitive type context. I just want to see if I can, if they checked in the generated Java files or not. So if we go to, uh, IDL source main org Avro IDL... no, they just have these. So that makes me think that they might not check in the generated, um, the generated parser, unless it's under here, utils, write schemas... No, this is writing, this is writing an IDL. This is not writing the JSON or parsing the IDL. So these are just, like, I think these are actually helpers for the linter, right? So imagine you want to run a formatter on your IDL schema. The way you do it is you parse the schema file and then you write it back out in a well-defined way, and I think this is the write-it-back-out-in-a-well-defined-way part. Okay. And so this doesn't matter too much to us. I mean, this could also be interesting to port, but it's not the main focus I want to have. I'm just, like, the IDL reader feels like it's the right one. But this IDL parser, where does the, um... this lives... Oh, they list an IDL grammar. Okay. Does anything else live here? No. Okay.
My guess is that this generates an IDL parser, and that's just what gets imported from these other files eventually — so this is just the use of the generated code. Cool. That's similar, then, to the Rust code we saw earlier, like the general tests, which also show how to use the parser. In Rust we're not going to get an IdlParser class; instead we get these listener and visitor types. And this is the kind of mapping where, if I were to sit down and understand everything ANTLR generates and then map what it does in Java to what it does in Rust, it would take me a huge amount of time. It might still be worth it in the end, but this is where I suspect Claude will do very well if we point it at the things we've found so far. So this is where we're going to start. Yes. Okay. Now, one of the big ways to make these agents actually work well for you is to be good at describing the thing you want implemented. So we're going to start with a high-level description of what I'm aiming for — and let's give it the path to this tool: "I want to port this to a Rust binary in the current crate. The grammar for the Avro IDL is described using ANTLR in this file, which I have already generated Rust code for, in src." I actually think this would be enough to get it started, but I want to guide it a little more on how we want ours to be different. So I will also point it at this. The Java code is split kind of weirdly: the main binaries are in a different place in the repo from the libraries that do the actual processing. So I'm going to give it this path as well: "The actual logic for parsing, for the Java version, lives in the files under this path."
And then what else? I'll also point it at this, which is really just runtime/rust/tests — so I'm guessing this will be here, right? Yeah: "You can see example Rust code for interfacing with ANTLR-generated bindings in this file." That will be a good example for it. Then we want to give it some example input and output files — we'll use these — and I also want to see whether there's an outputs directory. Yes, there is. What else is in this directory? Okay: "Use these as test input and output files for any given set of AVDL files. You can also execute the Java version yourself using…" — and this is where I need the Java one. It's also very useful to give it example commands that it can use to run things itself. And then: path to AVDL, path to output file. "You can also execute the Java version yourself and generate an output file using this." Cool. Let's start it there. Usually what will happen is it'll come up with a mediocre plan initially and then we'll have to — oh, is it an output folder? What happens if I just do ./… oh, right, I deleted that file. Oh no. Well, now that Maven command works — that's handy. Yeah, it doesn't like that, so it does need to be a file, I think. Maybe this other one is the one that can output a folder. Let's find out — maybe that's the difference. Yeah, this one does say "output directory," so let's see what happens if I do this. What do I get? Aha — then I get one file for every type. Okay. There is also — and then we'll give it the path to this one as well, which "you can pass an output folder to, in order to generate separate AVSC files for every schema defined in the input." And the last thing I'll do is point it at the IDL language docs.
"You can find the documentation for Avro's IDL format at this URL." And now let's kick it off and see what plan it initially comes up with. I'm expecting we'll have to do a decent amount of finagling. "Context filled entirely by prompt?" I don't think so — I think we'll be okay. Yeah, you can run wc commands; that seems fine. Initially, obviously, it's going to prompt us a bunch: am I allowed to use this tool? Am I allowed to use that tool? And pretty rapidly we'll end up with a sane enough set of allowed things that we don't need to approve things all the time. In the meantime: "check the Rust parser files generated by ANTLR for a parser function." Sure — we can search for `fn parser` in src. Nope. But there might be a `parse`. That's interesting — didn't we see this? Oh no, it's that you create — it's like Parser::new. Well, that's the lexer. Here, in the parser test for CSV: you create a lexer over the input stream, then you create a parser over the token stream that comes out of the lexer, and then you add a listener to that parser. The other thing that's nice here is that ANTLR is supposed to be pretty good about keeping the parsing context and revealing it to callers, so that you can then — oh, this is the last thing I forgot to give it: I want to use miette, because miette is really good at giving you the ability to point to specific parts of the input. So let's go ahead and add that here as well: "I would like to use miette as a way to keep track of lexing and parsing errors, so that we can give very good and helpful diagnostics in the case of invalid input files."
The "by the way" here, I think, is a Claude feature where you can give an additional prompt and it doesn't interrupt what it's currently doing — it will read it the next time it's between instructions. "Are you running Claude in a VM? How do you defend against information leaking out, such as environment variables and so on?" I'm not running Claude in a VM here, but I'm also not running it in YOLO mode, so it has to ask me for permission for things. In general, the built-in sandboxing is not great, but it's also not terrible — for the obvious cases, like trying to read outside the directory I'm currently in, it tends to ask me at least once. Environment variables I'm not too worried about; there are no environment variables in my current environment that are actually sensitive. "Do you use any advanced agent coding capabilities, like git worktrees and agent swarms?" So — why does it want to look in my Cargo registry? Oh, I see, because it wants to look at the code of antlr-rust. Sure, that's fine. That's also fine. I do use git worktrees, especially when I have multiple — what I'll usually do is, as I read through both the LLM talking to itself and the code that it generates, I start keeping a to-do list, oftentimes in an actual to-do.md file, and I write down the things I want it to do in the future. Then, after I've finished whatever my current session is, I'll point it at my to-do file and say: "I would like to fix the following things. Break them into semantically related changes, then start an agent for each one in a separate worktree." And that tends to work pretty well. I did not enable team mode for this — I don't even know that we're going to need multiple agents for this initial part.
Usually I've found that for generating the first core of an application, it doesn't work so well to try to do it in parallel. It works better for things like having agents review each other, or when you're working on multiple relatively disjoint features; for the initial version of a crate it usually doesn't work as well. "miette is really neat, but I've always found the actual reporting a bit of a pain to deal with — very poorly documented." Yeah, I think the documentation of miette itself is a little mediocre. Some of it is also that the dynamic error types in Rust can be a little annoying to deal with — for example, miette's error type cannot implement the Error trait, and this is a known limitation in Rust; anyhow and eyre have the same problem. But if you specifically want errors that can point back to source code and say "here," then miette works pretty well for that. And you'll see it's still in the plan phase, right? Because we gave it a pretty involved set of things to look at. So far it hasn't generated any code; it's just trying to come up with a plan that satisfies the inputs I gave it. We could go in and explore what it's actually thinking about, but instead what I want to do is go back to the IDL language a little. The IDL language itself — the translation to JSON — is not actually very complicated. If you look here, we have a record: type Message, with two fields, title and message. Both are string-typed, but title is allowed to be optional and defaults to null. And you'll see, if you turn that into JSON, it's a pretty straightforward transliteration: you go from the first keyword to type "record", the name is Message, and the namespace is grabbed from the file.
So you do need to keep some amount of global state as you walk the file. Fields is just an array of the fields; for each field you have the name of the field, the type of the field, and a default if it has one. You'll see this type here is an array — that's to support unions of types. The way Avro represents optional types is as a kind of syntactic sugar for a union of null or the type. And that's all there is to it — it's not like there's a super complicated resolver happening here. That resolver lives in the actual Avro compiler, which works on the JSON. The transliteration from the AVDL format to the JSON format itself is kind of dumb. We can sort of see this if you go to the IdlReader: there's a bunch of code in here, but much of it is just invoking the parser and the lexer and defining a bunch of types we need. There is some real logic — here, for example, they're trying to extract doc comments so that you can add those into the JSON output, and when you have a doc comment in the input file, you need to know which record to associate that documentation comment with. So there is some amount of complexity here, but not a huge amount. There are also imports you do need to deal with. So, okay, there's a little bit of annoyance, but ultimately this whole file is a thousand lines — a thousand lines of Java, which is not the most concise language in the first place — so a thousand lines of code for doing the reading and the writing doesn't seem too bad to me. Hmm, no, that seems incorrect; you do not want to do that. So let's instead tell it: "You can find the full source code for all the antlr-rust stuff in antlr-rust/runtime/rust/src" — right, that's what we — yeah.
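The Message record described above can be sketched roughly like this — reconstructed from the spoken description, so the exact file names and field details are assumptions:

```
// hypothetical message.avdl
record Message {
  string? title = null;
  string message;
}
```

which transliterates to approximately this JSON, with the optional field desugared into a union:

```json
{
  "type": "record",
  "name": "Message",
  "fields": [
    { "name": "title", "type": ["null", "string"], "default": null },
    { "name": "message", "type": "string" }
  ]
}
```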
src — so that it stops trying to look through my registry, searching for things in my entire registry. "I was writing a DSL for work a couple of weeks ago and was surprised the nightly Allocator trait was not stabilized. Would have made it a lot easier." Yeah, I mean, the allocator API has been unstable for quite some time. I don't actually know what the current blocker to stabilization there is. I remember for a while it was trying to figure out things like whether dealloc needed to be given the layout of the type you wanted to deallocate, but I'm sure they've moved past that now. That was almost a leak — I'm not concerned about leaking what's in my Cargo registry; that seems fine. "Important finding: CommonTokenStream's get is public and accesses the raw token buffer, including hidden channels. We don't need to patch antlr-rust for doc comment extraction." Okay, good. See, this is where it's sometimes funny to expand the internal dialogue Claude has with itself, because it will sometimes fret over "how am I going to possibly do this? I need this thing." "I think a big holdup is seeing if they can avoid a huge explosion with fallible allocation." Oh yeah — this is basically: if allocation fails, how do we propagate that information? And of course someone came up with another alternative. Yeah, of course — and so then the whole saga starts again. I mean, it's the eternal question of when you're done discussing, right? When are you done looking at alternatives and should stabilize the thing you have? There's no right answer to it. "Do you think you're more or less productive with agents in your day-to-day? I feel like coding skill atrophies but reading skill improves." I think it's a mix.
I do think that the more you use agents, the more your coding skill atrophies, but at the same time I think there's an upper limit to how much you can use agents for your work, and so there's an upper limit to the atrophy. What that upper limit is depends a lot on the kind of code you work with and the kind of projects you work on. There are some parts of the work I do where I just realistically couldn't generate the code — the agents would not be able to write that kind of stuff. And there are other things, like this, where I'm pretty sure Claude will do very well, and it would take me a huge amount of time that would not be particularly well spent. So I don't really care that my coding doesn't learn from this coding, because I have other coding that fills the gap. "Should idl and idl2schemata be subcommands of a single binary, or would you prefer separate binaries?" I would like a single binary with subcommands. "The generated ANTLR files have allow inner attributes at the top, which means they must be crate roots or mod targets. They can't simply be moved into a subdirectory without removing those inner attributes. Should I strip the inner attributes, move them to src/generated, or leave them flat in src?" I'm pretty sure they're fine as they are. Yes, it has `#![allow(nonstandard_style)]`, but that's fine. I see — it would like those files to be in a different place. But why would that make you strip the attributes? I'm fine moving them to src/generated. That seems fine. And then the other thing I was thinking of: it's probably going to default to using clap here, and I don't think we need to. The input here is straightforward enough that I think I want to use lexopt instead. Maybe. No, it's fine — I'll leave it to choose what it wants. Okay, we have a plan.
Let's look at its plan. Oh, it's a long plan. Good. Okay — context. This is where — well, one of the things I've found a little interesting about using agents is that it forces you to read through someone else's plan, which I think can actually be really useful: I think it has improved my ability to critique my own plans when I come up with one, and when other people pitch me a plan for what they want to do, it feels similar to reviewing one of these. I actually think reviewing these plans is really useful and teaches you a bunch of things. It's different from the kind of learning you get by writing code yourself, but it does teach you something about planning, and I think it also forces you to think through some of the implications ahead of time, which is a good practice anyway. Okay: "The crate already has Rust parser code for the Avro grammar but no actual logic. The goal is to port the Java tools to a Rust binary that can parse AVDL files and emit those outputs." Yes. "Core parser logic lives in IdlReader, which uses a listener to walk the parse tree and build schema and protocol objects. We'll port this to Rust using a recursive tree-walking approach over the ANTLR-generated Rust parser." That seems fine. Now, there is a really interesting observation here already, which is: in the Java code, when they want to generate the JSON, I expect that they use — yeah, exactly — they use this stuff. They already have definitions for the JSON representation, so we're actually going to need Rust implementations of those structs, and then derive Serialize for them so that we can produce the JSON. Which I'm guessing it will figure out — we'll find out. Okay, key design decisions. "Recursive tree walk instead of a listener. The Java version uses the listener with enter/exit methods and maintains mutable stacks."
"In Rust, it's awkward to implement the IDL listener trait due to the lifetime constraints: the listener methods receive borrowed contexts that can't easily coexist with mutable state on `&mut self`." Interesting — is that true? IdlListener, ParseTreeListener… I see. So you're given references to the nodes as the parser walks through them, and you can't easily store those references in self as a stack, because by the time this function returns, that reference is no longer valid — this reference is, but that one is not. Although that does make me wonder what's in these leaf nodes, because I actually wonder if we could just extract the tokens directly — ultimately they should just be tied to the input, plus a little bit of information around them. Oh, I guess the data is maybe the problem. Interesting. Okay, so what does it want to do instead? "Instead we parse with build-parse-tree enabled and then walk the resulting tree with recursive functions. This gives us full control over traversal order and state threading." I see — so this means it's not going to be a streaming conversion; it's going to parse the whole thing, and only when the whole thing is parsed do we walk it recursively. I think that's okay. It's a little unfortunate, but working around it would be annoying. "Rather than depending on the apache-avro Rust crate, which lacks a protocol type and whose schema serialization wouldn't match the expected output format." Actually, is this the latest one? I remember there was a rename of one of them. Yeah, this is the latest one. All right, what do we have in here? Because this crate — ah, Schema. If I look at Schema and show the source of Schema — interesting, it actually doesn't even implement Serialize. So apache-avro is the Rust crate you use for interacting with Avro schemas.
So after you've generated the JSON, you want to use that JSON to generate Rust types and do the serialization and deserialization in Rust — the apache-avro crate is the one you use for that part of things. And it obviously needs to parse the JSON schema, so it must have Rust representations of the things it's going to parse. Oh, this derive — what's in derive? The Schema type here, for example, is basically the type used to represent, I think, the top level of the JSON file. So the question is: why can't we use that one? Because this should have all of the types you can have for fields and the like, and it has a parse. It might not have a Serialize, but ideally we would just add the ability to serialize to this so that it could use the same types. Although it's not the end of the world — the output schema is not too complicated. "With a purpose-built domain model — an AvroSchema enum and Protocol struct that serialize to serde_json values — this gives full control of JSON key ordering and formatting." Yeah, that's true. The JSON key ordering is the other thing here, because we're expecting this to generate one-to-one equal output, which means all of the JSON fields would need to be in the same order. The alternative would be to tell it we don't require that the JSON is bit-for-bit identical, just semantically identical — if you parsed both as JSON objects and then compared them, they'd have to be the same. I think for now I'm okay with us just having these types ourselves, and we might want to relax the comparison logic down the line anyway. But for now this seems fine.
"Doc extraction via token stream get: the Java code calls getHiddenTokensToLeft, which is unimplemented in antlr-rust. However, CommonTokenStream's get(index) is public and accesses the raw token buffer." Oh, that's interesting. Okay — antlr-rust, runtime/rust/src — hidden… not what I wanted. I want — right: hidden_tokens_to_right and hidden_tokens_to_left are indeed not implemented. Cool. I wonder if — if we go to the antlr-rust repo — oh, we should probably point it at this "differences with Java" section. I was just curious whether there's an issue for this. No, it doesn't look like it. What about pull requests? No, none yet. Okay. But I do think we should point it at the README so that it's aware of what's in there — so I do want to point it at that in a second. So instead it's just going to use the CommonTokenStream and walk backwards through the tokens. That seems fine. "miette for rich diagnostics with source spans. Custom error listeners will collect errors as miette diagnostics." Perfect. Module structure: it's going to generate a CLI with clap that has the two subcommands. That's fine. Error types. The model — this is going to be the model for the output schema, so the JSON stuff. That seems fine. Sure. Okay, what do we have in schema? Primitives, named types with name, namespace, doc, fields. Yeah, these are basically the things that come from the spec itself. So if we go up to the spec, we should find — oh, it's not "open container," it's "object container files"; that's the thing I was talking about earlier about arbitrary bytes. Yeah, I wanted to look for — so if we look at, for example, record: it's supposed to have name, type, and fields. Name, type, and fields. Although it's interesting, because it seems to declare more things in here. I wonder if there's a complete grammar at the bottom here. No.
It's surprising to me that it doesn't have everything in the spec. Name, type, doc, fields; "complex types support the following attributes" — so where are the other attributes coming from? Okay, so we might have to point it at this as well, to say your schema representation might be wrong, because for example there's the is-error flag and aliases. Aliases is there. Properties — yeah, "properties" doesn't seem like a spec thing. "All metadata properties" — oh, I see. So these might be additional keys that aren't known in advance. So I'm guessing that properties is actually a hashmap of those, and when you serialize you add them into the serialization. I don't know what is-error is going to be for — oh, this might have to do with using it in a protocol. That's fine, we'll see what it does here. Arrays — right, these need to be Box<AvroSchema>, because we have a recursive type and so the variants need to be Sized; in order to be Sized they need to be boxed. That's fine. Same for map values, same for unions. This is references to other types that we may not have parsed yet, so we need to store the reference until we've parsed them. Logical types — fine. "Named types serialize inline on the first occurrence and as bare name strings afterwards." Sure. "Cycle prevention for imports." That seems fine. All right, what are the errors like? Yeah, we already have the antlr-rust errors; we'll bring those in. I do want snapshot testing, thank you very much — I think this is from my AGENTS.md; I think I put in that I like snapshot testing. Okay: first move all the files around, then add the model, then construct the tree, serialize and compare, then error types, then the schema registry. The schema registry you need so that, as you're parsing types, you know — imagine you have a record Foo and a record Bar, and then you have a record Baz that has fields of type Foo and Bar.
You need to have a registry where you store the definitions for Foo and Bar so that when you get to Baz, you can inline their definitions or reference them by name. That's what the registry is for here. That seems fine. Core tree walker — this one I'm just going to take on faith, because if it gets it wrong, it will be very obvious in the outputs. It'll generate a top-level walk file; that seems fine. "Test after each substep against progressively more complex AVDL files." That seems reasonable. Yeah, main is going to have the two subcommands. This will be similar to the existing commands; this one iterates over all the named schemas. Yep. "Run all 18 test cases." Yes, please. "Compare by parsing both expected and actual JSON to JSON values for semantic equality, to avoid the formatting differences between the Java and Rust serialization." Great — it's already doing the thing we were saying it would probably need to do. Tricky parts: "Nullable type reordering: the optional-type sugar creates a union, but if the field default is non-null, the union must be reordered to [T, null] — the first type in a union must match the default. The Java code uses an internal marker property to track this. We should use an internal flag on the union variant instead." That seems okay. "JSON default value fixup: a field default is parsed as an integer but the type is long; Java converts the int node to a long node. In serde_json, numbers are unified, so this is a non-issue. But NaN and Infinity are not valid JSON; the Java code uses special string representations." Of course it does. Cool. See, this is the kind of stuff where I'm glad it picked up on this, because it seems annoying to pick up on otherwise. "Named type serialization: in protocol JSON output, named types appear inline (full definition) on the first reference in the types array, then as bare string names in subsequent references. The to-JSON code must track known names to decide which form to use."
"Some test files reference classpath resources. We skip those / provide the files alongside." Okay, so we do need to tell it about — let me just do this. There are also references to the things in put-on-classpath and other things in here: we have "extra" and we have "put on classpath". These are cases where the Java version will know to import these files because they're on the Java classpath. That doesn't exist in Rust, so we need to track them separately. We should probably point Claude at these extra paths as well so that it's aware of them. "ANTLR string tokens include the quotes and escape sequences. We need to unescape them." Yeah, that's fine. Great. Okay, so we have a couple of notes. "Read the 'differences with Java' section of…" — no, we don't need that; that's not a URL — "…in case that holds differences relevant to your plan. Also look at extra and put-on-classpath, both of which are probably useful when using the testing AVDL files." And then let's also point it at the spec: "You can find the Avro JSON specification here." And I think that's all we really needed to correct here so far. It'll probably make a small adjustment to the — yes, I'm fine with it fetching from avro.apache.org or from this repo. It might not adjust the plan very much here, but it's also useful to give it resources like this, because what we'll do is clear the context and then run it with just the plan. URLs like this will be included in the plan, so that later on, if it's going through the plan and needs to reference the spec, it still has the URL and can go look at it.
You'll also notice that, for example, I wrote "also look at this" and basically used glob syntax for specifying the paths, and it understood what I meant — because glob syntax is used so widely that the model gets to learn what it implies. So you can use a lot of technical shorthands with this and it just kind of works. Cool. And then I think the next step now is basically to let it cook, because I think we need to start looking at the code it produces, not just the plan — which requires it to actually produce code first. Oh, there's one more thing I want to tell it — "by the way" — because it has these different steps in its implementation plan: "After each step in your implementation plan, commit the changes using the commit-writer skill." I have a skill I've put into Claude that tells it how I want commit messages to be written and what I want them to contain, and I want to make sure it actually does those commits as it goes, because otherwise we're just going to end up with one giant diff at the very end. Cool. Clear context, auto-accept edits — let's see what it makes. "Doesn't it lose the context the more data is provided or the more iterations are done?" Kind of. It is true that as you work more in a given session, its context grows large and therefore gets kind of diluted, which is okay — because whenever it has to compact the context, it is re-fed the plan. And this is why it's so important that the plan is well structured and contains a lot of the relevant details: as the session progresses, it's going to keep referring back to that same plan whenever it has to compact its context. So the better you can make the plan, the more continuity you get as it works through it.
The other thing I want to do here is: "Make active use of sub-agents for self-contained tasks so that you don't dilute your own context." Sure, that's fine. "Which Claude model is it?" This is Opus — I think I'm on Opus 4.6, because I did a full system update this morning that also included the Claude update that came out this morning, which includes Opus 4.6. It claims it has a solid understanding of the codebase now — well, it has a solid understanding of the generated code. Actually, how much code did it generate? I mean, ANTLR did generate something like 7,000 lines of parser code, so I am glad we didn't have to write that ourselves. And it's an interesting question: if the ANTLR grammar and generated code had not existed, would it have been feasible to do this task? I don't know. It probably could eventually generate the correct parser and grammar, but I think it would actually be a lot more work, even though it's okay at generating parsers and the like — in part because there are so many examples available for it. But I do think we'd probably run into a situation pretty quickly where we would not be able to do this in the allotted time. So, let's see what we got. Oh yeah, I don't want jar files to be checked in — "don't check in jar files" — and then this can go away, and now this is clean again. Okay, it wants to do a bunch of git moves; that's fine. "Having used ANTLR, the generated code is mostly boilerplate though." Oh, I totally buy that. I mean, it's moved to here now — if we look at the generated IDL parser and look through it, a lot of this is pretty mechanical; some of it is just being a parser, right, walking the grammar. But at the same time, this is also code that it would kind of have had to generate anyway, maybe in a slightly different form.
And maybe when you're parsing one particular grammar, rather than having to create a general-purpose parser generator, you end up being able to write much more efficient code; that's certainly true. But even so, this is now a bunch of code we didn't have to write. I do want to carve all of those out, yes, thank you very much. [snorts] "Telling it to look at the examples and to generate a spec before generating a parser would probably have worked better." Yeah, if I did have to make it also generate the parser, I think what I would have done is not start with the whole "we want to do the IDL stuff," but just start with "I want to generate a parser for this language," then give it a bunch of examples and the Java code and everything. And then, after we had a parser, I would go: okay, new task, now we're going to use this parser we've built to build the transpiler. Yes, that's fine. "Honestly, I kind of like writing recursive descent parsers manually. Tokenizers are a bit more of a pain, but not as much as you'd think." I know, I like writing parsers; I've done a couple of them. I think I did one or two on stream, and there was a CodeCrafters challenge writing an interpreter for a language where we wrote a parser ourselves; I think we wrote a Pratt parser, if I remember correctly. That was a lot of fun. I've also written a separate parser for another reason, and I do think it's really fun. It's more that, and this gets back to "what's the point of coding": for me, some coding is fun and some coding is annoying, and it's usually not so much about the code I write. Sometimes writing a parser is really fun; sometimes writing a parser is really annoying. And it depends on whether the parser is blocking me from doing the things I actually want to do.
So, if the thing I actually want to do is, I don't know, knit a sweater, but the task I end up doing is shaving a yak, then the yak shaving is less satisfying, because it's blocking me from the thing I actually wanted to be doing, even if that yak shaving would normally have been fun in isolation. Yeah, the interpreter for Lox. And it's something similar for me here. Yes, cargo build is fine. It's something similar for me here in that I don't actually care about writing a parser for this format; that's not a thing I want to spend a bunch of time on. What? "There's definitely a Claude skill commit-writer. It is simply lying." The skill is called commit-writer. So what I really want is to just have this porting thing. Actually use the commit-writer skill, please. It's not always very good at doing what it's told. There we go: yes, use commit-writer. So if I spent most of today writing a parser for AVDL, that would not feel like a good use of my time, even if I might have enjoyed it had that been the task I'd set myself. Because the task I've set myself is something very different, specifically "I want this transpiler to exist," it would not feel like a good use of my time. Great. So now it's going to do steps two, three, and four in parallel using sub-agents, which in theory are all independent of one another. One of the reasons I want to use sub-agents here is that each sub-agent gets its own context, so it gets to do a bunch of thinking and so on without polluting the context of this main orchestrator. It's going to kick those off now, which is fine. The one thing I want to show here is that I had it generate the commits using my commit-writer skill.
So let's look at what commit it actually generated, because I think a lot of autogenerated commit messages are pretty bad; that's why I wrote a skill for it. Let's see how this turned out. "Move the ANTLR-generated files into generated/. Previously used inner allow attributes would only work at the crate root." That's just false. "Switching to outer attributes." "It's probably thinking of crate features." I agree, because inner attributes work just fine there. But someone made a point that it tends to be useful to let the LLM use its own taste rather than forcibly imposing yours for some things, because otherwise the LLM will keep getting confused down the line. Take function naming: if it proposes a name for a function, that name is the pattern match it has done to the most likely name. If I forcibly tell it "no, use this name instead," then when it later writes code that relies on that function, it will again infer the most likely name, and that name will not match the one actually used. So you're introducing additional potential for confusion for the model at multiple points in the future. Sometimes it's useful to go with the recommendation it comes up with, even if you don't like it that much, just to increase its iteration velocity down the line. The trick is that at the end you can change a bunch of these things around; you just don't want to do it during, because then you're slowing down each subsequent phase of the development. "Created stub files for the planned module layout; add runtime dependencies." Okay, that's a fine commit message. "The agents don't commit because they will step on each other's toes. Instead, you should commit their resulting work at the end." "Do you mind spending money/tokens on open-source tools?
Do you even have it as a consideration, or is Claude Max enough that a project like this doesn't usually use too much of your quota?" So the reality here is that I'm normally on Claude Pro. At work we have our own separate setup, so at work I don't use my own tokens. For my personal work, it sort of depends what I use it for, but very often I'm able to stay on the Pro plan and within budget, so to speak, because I use it for smaller tasks. This is an example of a bigger task where I would expect to very quickly run out of the Pro token quota, so I upgraded to Max specifically for today, because otherwise I would easily have used up the tokens allotted to me, and I wouldn't want to stop the stream for three hours until the token limit reset. But I'm totally fine using tokens to contribute to open-source things. I'd actually be more skeptical about using it for something that was only going to be private, because that feels like it isn't contributing to anything outside of myself. "Isn't it harder to go back and refactor the full system instead of doing it on the spot?" It's not so much about refactoring a whole system; for system architecture I will tell it to do things differently. It's more specifically for things like naming, or a construct it constantly gets confused by, where you might be better off just letting it do its thing. In this case with the allow attributes, I suspect that if I left those in the files, every time it opened one of those files it would go, "oh, this isn't allowed to be here," or if it hit an error, it might go, "oh, that might be because we have those here and they're not allowed to be." And I kind of just want to remove the distraction.
You can kind of think of these agents as having some amount of ADHD: they see a thing and they can't really let it go; they need to keep focusing on it until it goes away. So I'd rather just make the distraction go away. "Where do you host your commit skill?" I haven't put it in the open yet, because I need to do a little more due diligence before it's useful to other people. "Moral of the story here is don't nitpick." Yes and no. There are a bunch of things I will nitpick on. When we're in planning mode, for example, I will nitpick specific parts of the plan that I think should be different. But there's a difference between being detail-oriented and trying to change things in ways that are misaligned with how the model is likely to think about the code in the future. "I feel like these models have more ADHD than I do." That could certainly be true. All right, let's look at the log. Oh, it committed all of them together; that's not what I wanted, but okay. Okay: error, doc comment extraction. Yeah. And see, we're about to auto-compact before too long; that's fine. This is the core tree walker. I'm debating whether I want to compact before it runs this. No, I think this is okay. All right, let's look at the code generated so far. We now have a bunch of code. Cool. The generated code, I assume, is just going to stay the same, and it's now put all these allows on there; that's fine. The doc comment stuff is the thing that will let you extract the doc comment before a record or a field. Exactly: it extracts the doc comment associated with a parse-tree node given the token index of the start of the record, for example; we scan backwards until we find a doc comment. So we get a token stream and a token index.
[snorts] We keep walking the stream backwards, getting each token's type. If the token type is doc comment, we grab the doc comment text and break. Otherwise, if it's whitespace, we keep going. Otherwise we've hit something that's neither whitespace nor a comment, so we break and give up, and then we return early if no doc comment was found. Then we strip the prefix and suffix. This seems wrong. Ah, maybe not; maybe we're actually guaranteed those are present. This is an example of something where I would prefer a debugger to check that that's actually what's going on. So, let's start with one of my favorite things: TODOS.md. Part of this is because I don't like sending all of these nitpicks to the LLM while it's doing something else, since it tends to get distracted by them. Instead, I'll keep this TODO file and at the end point it at it: here are a bunch of things to tidy up. So: when stripping the prefix and suffix, add a debug assertion that what we stripped was actually those characters; maybe use strip_prefix and friends. If the trimmed text is empty, return None. Yeah, an empty doc comment is equivalent to having no doc comment; that seems fine. Then it strips the indents, matching the Java strip-indent behavior. So let's go look at the Java strip-indent. [snorts] Yeah, so this is just: when you have a doc comment, the left side of every line is going to have space-space-space-star-space and then text, but when you generate the actual doc comment, you want to strip those and instead rewrap the text without the spaces and stars. That's what this does, and presumably what this one does too. Does this bring in the regex crate? It doesn't. It feels like it probably should. Yeah, let's tell it to use regex for this instead, also to better match the Java version.
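The backward token scan walked through above can be sketched roughly like this. This is a minimal sketch: `Token` and `TokenKind` are illustrative stand-ins, not the project's actual ANTLR-generated types, and the `/** ... */` delimiter handling is an assumption about AVDL doc comment syntax.

```rust
// Illustrative token types; the real ANTLR-generated ones differ.
#[derive(Debug, PartialEq)]
enum TokenKind {
    DocComment,
    Whitespace,
    Other,
}

struct Token {
    kind: TokenKind,
    text: String,
}

/// Walk backwards from the token that starts a node, skipping whitespace,
/// and return the text of the nearest preceding doc comment, if any.
fn extract_doc_comment(tokens: &[Token], start_index: usize) -> Option<String> {
    for tok in tokens[..start_index].iter().rev() {
        match tok.kind {
            TokenKind::DocComment => {
                // Strip the /** ... */ delimiters and surrounding whitespace.
                let inner = tok.text.strip_prefix("/**")?.strip_suffix("*/")?;
                return Some(inner.trim().to_string());
            }
            // Whitespace between the comment and the node is fine; keep going.
            TokenKind::Whitespace => continue,
            // Any other token means the node has no doc comment.
            TokenKind::Other => return None,
        }
    }
    // start_index == 0: nothing precedes the node (the top-of-file case
    // discussed later in the session), so there is no doc comment to find.
    None
}
```

Note the `start_index == 0` path falls straight through to `None`, which is exactly the top-of-file edge case the stream flags later.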
Let's prefer doing the stripping with the regex crate, to match what the Java version of this code does. It's still going in the background. Yeah: "while the tree-walker agent works, let me prepare the import resolution and CLI modules in parallel since I can write them independently." Cool, sure, go ahead. And now it has to compact the conversation; that's fine. Okay, so this is going to change because we're switching it to regex, so no use reviewing that right now. It has a bunch of smaller tests; I'd actually prefer those tests to be expanded. Also: expand the set of test cases for strip-indents, covering more odd combinations of characters people might use in their AVDL source code. Okay. So that's the doc comment parsing stuff. What else do we have? Next we have the error types. What's an error? What's a NamedSource? Miette — this is new, I think, because it used to be that you had to specifically include the whole source string alongside your regular source-code type. "Doesn't implement" what? Interesting, because the source code is just a string. I'm hoping what it does here is deduplicate, right? So — oh, maybe the source code here ends up being — oh no, it is just a String. Interesting. Okay. Label, message, span. Okay. IdlError can be parse diagnostics, IO, other. Yeah, this feels like it'll probably get tidied up; there's not very much logic in there at the moment. Import resolution: the import-IDL case is not handled here because it requires calling the IDL reader/parser. Oh, I see, because these import the JSON files, whereas this imports the IDL, parses it, turns it into JSON, and then imports that. Interesting. Yep, that's fine. resolve_import: it resolves relative to the current directory.
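Stepping back to the indent stripping discussed above: a minimal sketch of the transformation, assuming the "spaces, star, space, text" layout described. The host asks for a regex-based version to match the Java code; plain string operations are used here only to show the shape of the transform, and the function name is hypothetical.

```rust
// Strip the leading " * " decoration from each line of a doc comment body,
// so "   * some text" becomes "some text". A sketch, not the real port.
fn strip_doc_indents(doc: &str) -> String {
    doc.lines()
        .map(|line| {
            let trimmed = line.trim_start();
            // Drop a leading "* " (or a bare "*") left over from the comment.
            trimmed
                .strip_prefix("* ")
                .or_else(|| trimmed.strip_prefix('*'))
                .unwrap_or(trimmed)
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

A regex-based version would replace the `strip_prefix` chain with a single anchored pattern like `^\s*\*\s?`, which is closer to what the Java implementation does.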
Otherwise, it checks the configured import directories, which would be equivalent to the Java classpath. Oh, interesting. I wonder whether that's even necessary, because this basically has to parse the JSON representations. But I guess that's fine, because we must have types to produce the JSON representations, so we're fine to just deserialize them as well; that doesn't mean we need to understand their semantics, though. So what does json_schema do? It converts a JSON value to an Avro schema, where AvroSchema is our wrapper type. Okay, I see. So we can't just deserialize directly into AvroSchema — why? Because of things like named type references, and because we want to represent unions. Okay, I see: we have a custom way to turn a JSON value into a parsed schema. That seems fine. And then for names, we have the references; we turn them into references. Okay, that seems fine. And I think AVSC and AVPR handling here are actually very similar; the AVPR one just additionally pulls out the messages. AVSC — I see, they both register the schema. I see: in a protocol, the messages are the API endpoints, and the registry holds all the schema types. So in the case of being given a protocol file, we have to parse out all the types and list the messages; if we get an AVSC file, all we have to do is pull out the types. That makes me curious what register does, which I think just goes to the registry, which we haven't looked at yet. Okay. And a lot of this is just parsing stuff; that seems fine. It does allocations here that we probably don't need it to do, but that's okay. Yeah. And a lot of this is parsing the complex types — not in the sense of parsing the IDL representation or parsing the JSON text, but parsing the JSON structure. Logical types: yep, that seems reasonable too. Whole-object extra properties: yeah, that's something I would definitely prefer to handle.
So, you can define your own logical types in Avro, and when we get those, I want that logical type to be preserved in the translation as well. So: handle the TODO around unknown logical types in source imports; we should make sure to always preserve custom logical types in conversions. "Primitives with no logical types: any extra keys beyond type are custom properties that we cannot currently represent on bare primitives." Properties here are annotations like @order, where you can say what ordering that field should have. I do think we want to make sure we keep track of additional properties on primitives: always preserve additional properties. The primitive from_str — this feels like it should return an error, not hit unreachable. Should probably return an error rather than panic. "I assume the Apache Avro crate has already done the legwork of translating Avro concepts to Rust concepts like types. Are we missing out on free context and compatibility by not referencing or integrating it? Also, JSON parsing and validation should already be in the Apache Avro crate as well." So there are two problems here. One is that the Apache Avro crate does not support serializing the JSON schema form, only deserializing it, because that's all it cares about. The second is that when turning the JSON types into Rust types, the need is a little different when your goal is to produce these files and be able to combine them with a parsed input schema — an IDL schema, for example. So maybe there's a path where we could join up with Apache Avro here, but not off the bat, because their types are not built for producing these JSON schemas, only for consuming them and then doing code generation based on them. Okay. This is parsing JSON messages, great. How's this going? Status looks fine.
Let's go back up to the model. Yeah, so this is similar, but for serialization — the stuff that's not in Apache Avro: taking a schema and turning it back into the JSON representation, which is unfortunately not just a serde Serialize. You need to do things like preserve the namespaces, and the first time you refer to a name you must include its definition, while subsequent times you just refer to it by reference. Okay. The structure of this feels about right. When I'm reviewing this code, I'm not so much reviewing every line to check that it's correct, because a lot of that will come out of the testing. I'm reviewing whether there are obvious things that don't seem to be handled; structural things where I think the code shouldn't be operating this way or handling the inputs this way; and things like error handling, panics, unwraps. I think those are the main things I'm looking for here. You'll also notice that the way it's writing comments for me here is a little like literate programming, and that's on purpose. It's something I have in my user-level CLAUDE.md: I want comments that specifically help me read the file top to bottom, rather than comments on completely useless things. And this stuff too is non-trivial serialization, right? If you're in a file that's in the same namespace as the type's namespace, then you just want to include the name; you don't need to repeat the namespace. Okay. These are just the definitions; these are also just the definitions. Reader — that's fine. Yeah, this is the core part of the parser: it walks the parse tree to build our domain model, which is the Avro schema stuff.
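The same-namespace rule just described — emit only the simple name when the type's namespace matches the enclosing one — can be sketched as follows. This is a hypothetical helper for illustration, not the project's actual API, and it assumes dot-separated fully qualified names.

```rust
/// Render a type reference relative to an enclosing namespace:
/// "org.example.Foo" inside namespace "org.example" becomes just "Foo";
/// anything else stays fully qualified.
fn rendered_name(fullname: &str, enclosing_ns: &str) -> String {
    match fullname.rsplit_once('.') {
        // Same namespace: the simple name is enough.
        Some((ns, simple)) if ns == enclosing_ns => simple.to_string(),
        // Different (or no) namespace: keep the fully qualified name.
        _ => fullname.to_string(),
    }
}
```

The "define on first reference, refer by name afterwards" rule would sit one level up, in whatever tracks which names have already been emitted.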
That's funny: it does choose to use module-level allows here, and here it thinks that's fine. But this kind of comment is an example of very literate programming, right? Write out what is about to happen. That's fine. All right, this seems worth reading through more carefully. This is the top level of parsing the AVDL. We take our input and pass it to InputStream::new — this is the ANTLR-generated stuff — give it to the lexer, create the token stream from that, and then a parser from the token stream. We do build the full parse tree; that's the limitation of not using the listener pattern. The input field holds the token stream for doc comment extraction — yeah, unfortunately. And then we walk the IDL file. This starts the recursion: remember, we're basically doing a recursive descent over the parse tree rather than using the visitor pattern. Accumulated annotations from the parse tree: this is stuff we need to keep track of. "They are consumed by the walker, not passed through as custom properties. All other annotations end up in the property maps." I see. So there can be custom annotations and there can be well-known annotations, and we want to separate those. That's this bit: walking the list of all the properties that annotate a given field and, for each property, figuring out what it is and extracting the information. This feels like it should be an error: it should probably be an error if there are multiple namespace annotations on the same item. Aliases just pushes to the array; that seems fine. Error handling here is pretty decent, although it depends what ends up happening to this error, because we would want this error to be annotated with the part we're actually parsing right now. So we'll have to look at that call stack.
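The annotation-splitting step described above — well-known annotations consumed by the walker, everything else preserved as a custom property — can be sketched like this, including the "duplicate namespace should be an error" behavior the host suggests. The `(key, value)` string-pair representation and the function name are assumptions for illustration; the real walker operates on parse-tree nodes.

```rust
/// Split a field's annotations into the consumed `namespace` (if any) and
/// the remaining custom properties. Duplicate @namespace is rejected, as
/// suggested on stream, rather than silently taking the last one.
fn split_annotations(
    props: Vec<(String, String)>,
) -> Result<(Option<String>, Vec<(String, String)>), String> {
    let mut namespace = None;
    let mut custom = Vec::new();
    for (key, value) in props {
        if key == "namespace" {
            // `replace` returns the previous value, so Some means a duplicate.
            if namespace.replace(value).is_some() {
                return Err("duplicate @namespace annotation".to_string());
            }
        } else {
            // Everything not consumed by the walker is a custom property.
            custom.push((key, value));
        }
    }
    Ok((namespace, custom))
}
```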
So I want to find where this is called from. That doesn't seem right. Okay. So: we need to make sure that errors that occur during parsing and interpretation end up with sufficient context for the user to understand where the error originates in their input files. For example, the error cases in this function should probably produce miette errors that point to the tokens that triggered the error condition. This is going to be a bigger challenge for it, I think, but we'll find out. And then we have walk_idl_file, which is the start of the recursive descent, and we either walk a protocol — right, a protocol allows you to also have messages defined. Interesting. What is this looking for? Ah, this is looking for the doc comment for the entire file — the top-level docs. Interesting. This doesn't seem like it will be right, because it's looking for the doc comment at the top of the input file: it tries to get the token index for the start of the file and then extract the doc comment, but the token index here will presumably be zero, and extract-doc-comment walks backwards to find the previous doc comment — and there won't be a previous one. I think this needs to be handled specially; we'll end up doing the wrong thing, because it will call this with an index of zero, and that function will look earlier in the file for doc comments but not find any, since no tokens precede index zero. We probably need to special-case looking for the top-level doc comment. Okay. And then it walks properties. I see. So for a protocol, the expectation is — and we can go back and look at the IDL language definition here; at the bottom they have an example of protocols, right? — the expectation is there's a doc comment at the top, then another doc comment, and then properties for the protocol.
They expect the protocol to be the only top-level thing, and inside the protocol you can have additional stuff. And that's what this is trying to do: first extract the doc comment at the top, then read the list of schema properties, then get the protocol name. So something here is not right. Oh — this might actually work out, if doc comments are considered hidden tokens: when you ask for the token index of start, you don't get the first hidden token, you get the first non-hidden token. That could very well be. Or is ctx.start going to skip over the doc comments? [snorts] "The header comment is not a doc comment." Oh, that's not a doc comment — you're right. So in that case this is kind of wrong in the first place, because this is extracting the doc comment for the protocol. But how can it do that here, unless start points at the protocol? It must be the case that when you run the parser on something like this file, the parser's first token or item is this, and that's where start points. Then this will look for the first thing before that — and that's just a regular comment, so not actually a doc comment at all. So let's rephrase this: will this do the right thing? Won't it call this and then — so this is more of a "hey, go check your logic again." Okay, what else do we have? Then it walks the properties, gets the identifier. But here's another example of somewhere it's not preserving the error context. So let's go back to this: a similar problem occurs when it hits "missing protocol body"; we need to be more rigorous about this and indicate that there are other places where it's needed.
In fact: "missing protocol body," and across the codebase. Then it gets the protocol body, collects the imports from the body, collects the schemas, [snorts] and walks the named schemas in the body. So this is the named schema declarations — no, this is collecting the imports; this is walking the named schema declarations, which are things like this. "We collect the top-level schemas directly into types rather than pulling them from the registry, because the registry flattens all named types, including those nested inside records. The Java tools inline nested types at their first reference point; only top-level declarations appear in the types array." Ah, I think I see what it's saying. When walking these, we could have the option of taking all of them, adding them to the registry, and then the list of types we emit in the JSON would just be all the types in the registry. But that would be incorrect, because whenever we put something in the registry, we potentially put a nested type definition in there, and the registry stores all of those flat, by name. If we then later walked the registry, we would end up with the nested types as well — but the types array of a protocol's JSON representation should only have the top-level named schemas from that protocol. That seems fine. Then walk the messages — these, for example, which are like the function calls — sort the messages, produce the types. And hopefully walk_named_schemas in the non-protocol case is very similar: we walk the namespace declaration, collect the imports, and it also calls walk_named_schemas. Yep. "In schema mode, we register them in the registry but don't collect them into the types list, because there is no protocol."
"If there are named schemas but no main schema declaration, return the first one as the main schema." That's fine. But the third argument here is the registry — what does this pass? Oh, I see: this also reads them into the registry; it just also stores the information about the top-level thing that was walked. So if we go here, presumably it returns the top-level schema it parsed, in addition to adding it to the registry. Okay, that's fine. Then walking a record: the first thing we do is walk backwards in the parse context to get the doc comment. This bit is interesting: what is the use of is_error? Clearly there is an IDL error type. Oh — aha, this says error instead of record. That's what we're keeping track of: error and record have the exact same structure, so we don't want different enum variants for them. Instead, we keep one enum variant for records and track whether it was an error, so we know how to reproduce it again. That makes a lot of sense. Yeah, this is just including the fallback. "Save and set the current namespace for field-type resolution inside the record body, then restore it afterwards." Yeah: if you declare a namespace on a record, and inside the record you declare new types, those new types should inherit the namespace of the record, not the namespace of the file. That seems fine; restore the namespace. These are variables — so these are basically fields. Why are there default docs? What is a default doc for a field? Oh: the doc comment on the field declaration acts as a default for variables that don't have their own doc comment. What's the difference between a field and a variable? What is a variable in this context — a variable declaration context? There's no mention of "variable" in the IDL language spec. What? But it's in the Java code too. I'm just confused. Yeah, a lot of these look very similar; this is the same kind of fallback.
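The "error is just a record with a flag" representation reasoned through above can be sketched like this. The type and field names are illustrative, not the project's actual domain model.

```rust
/// Avro `error` declarations share the exact structure of `record`, so a
/// single variant with an `is_error` flag avoids duplicating every field
/// across two enum variants.
#[derive(Debug, PartialEq)]
enum NamedSchema {
    Record { name: String, is_error: bool },
    Enum { name: String, symbols: Vec<String> },
}

/// When regenerating IDL or JSON output, the original keyword is recovered
/// from the flag rather than from a separate variant.
fn keyword_for(schema: &NamedSchema) -> &'static str {
    match schema {
        NamedSchema::Record { is_error: true, .. } => "error",
        NamedSchema::Record { .. } => "record",
        NamedSchema::Enum { .. } => "enum",
    }
}
```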
This is the default symbol. Okay, that's fine. And all of these so far have roughly the same issue: they all lose the parsing context. We may need a helper here that should be widely used. Oh, and then there's a whole bunch of parsing. Curious — interesting, I see, this is parsing string literals. Isn't there — I'm pretty sure I saw something about escape_ascii that was added — I thought there was an inverse of this that was basically parse-string. Oh boy. What have I gotten myself into? This file is too long. I guess not. I was pretty sure there was something like a parse-escaped-ASCII in the standard library as well, but it wouldn't necessarily match what we have. [snorts] Yeah, this is just parsing string escapes. Okay. So eventually we need full octal escape handling, sure. This feels like something that's not actually that Java-specific; it's just standard escaped-string parsing. It's an interesting question whether the expectation is that IDL string literals have Java string-literal semantics. That feels like not a very sane thing for an IDL to specify. If we go here, is there any mention of Java, apart from "all Java-style comments are supported" and Java-style annotations? I see, here's an example of custom properties — things like java-class — but I don't see any mention of strings. Let's go to primitive types. Why doesn't my — oh, right: primitive types are the same ones supported by the JSON format. Huh, I see. String — oh, this is how they're encoded on the wire; I don't care too much about that. JSON encoding — but that only applies to the default values. So what I want to find is the transformation into parsing canonical form. I think they just don't say.
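The escape parsing being reviewed can be sketched for the common cases like this. It is a simplified stand-in: it covers only the escapes named above, including the single-digit octal escapes the session later flags as the current limit, and omits full Java semantics (multi-digit octal, `\uXXXX`). The function name is hypothetical.

```rust
/// Unescape a string literal body: \n, \t, \r, \\, \" and single-digit
/// octal escapes. Returns None on an unrecognized escape so the caller
/// can surface a proper error instead of panicking.
fn unescape(s: &str) -> Option<String> {
    let mut out = String::new();
    let mut chars = s.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // A trailing lone backslash is also an error (chars.next() is None).
        match chars.next()? {
            'n' => out.push('\n'),
            't' => out.push('\t'),
            'r' => out.push('\r'),
            '\\' => out.push('\\'),
            '"' => out.push('"'),
            // Single-digit octal escape, \0 through \7.
            d @ '0'..='7' => out.push((d as u8 - b'0') as char),
            // Unknown escape: bail out rather than guess.
            _ => return None,
        }
    }
    Some(out)
}
```

Note this is the inverse direction of `std`'s `escape_ascii` mentioned on stream: the standard library escapes bytes for display, but does not provide the un-escaping half.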
What if I search for — yeah, [laughter] I think it's just "whatever Java parses strings as." Interesting. Okay, fine; we'll leave this parsing in for now. Parsing integer literals: okay, so we're just reimplementing Java's decode functions. And it kind of makes sense why we ended up on this path, right? We told the LLM we want to match the semantics of the Java code, and the spec doesn't say how integer literals in the IDL are encoded; it just says they're integer literals. The Java implementation of Avro Tools' IDL-to-JSON-schema conversion just uses the Java parser, so the de facto spec semantics for literals are whatever the Java parser does, and we need to replicate that here. That's not great. I'm a little tempted to just special-case this and say: for integers, for example, we're just going to parse with the Rust integer parser, and if they're a little different, they're a little different. Well, for what it's worth, that is what we're doing — we are actually using the Rust integer parsing here. Though I thought Rust's integer parsing maybe doesn't accept 0x and the like. Okay, the rest here I think is going to be covered by tests. And then, going back up from the reader, where are we now? Resolve — right, so this is the registry. I'm not too interested in the registry; it should be straightforward and also well tested. Okay. For test scripts: create them as examples in examples/ and run them with cargo run --example. Claude, for whatever reason, really likes creating standalone .rs files and building them with rustc rather than just using examples and running them with cargo run, and it's pretty frustrating, so I keep having to tell it not to do that. [snorts] Okay. Has it generated anything? Yeah — okay, it's generated a new commit for us.
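To make concrete what "match the Java semantics" means for integer literals, here is a hedged sketch of Java `Long.decode`-style parsing: `0x`/`0X`/`#` hex prefixes, a leading `0` meaning octal, and an optional sign. It simplifies Java's exact negative-overflow handling (e.g. `Long.MIN_VALUE`) and exists only to show where this differs from a plain Rust `str::parse::<i64>()`.

```rust
/// Parse an integer the way Java's Long.decode does (simplified):
/// "0x1F" and "#1F" are hex, "010" is octal, "42" is decimal.
fn decode_java_long(s: &str) -> Option<i64> {
    let (negative, rest) = match s.strip_prefix('-') {
        Some(r) => (true, r),
        None => (false, s.strip_prefix('+').unwrap_or(s)),
    };
    let (radix, digits) = if let Some(hex) = rest
        .strip_prefix("0x")
        .or_else(|| rest.strip_prefix("0X"))
        .or_else(|| rest.strip_prefix('#'))
    {
        (16, hex)
    } else if rest.len() > 1 && rest.starts_with('0') {
        // Java treats a leading zero (followed by more digits) as octal.
        (8, &rest[1..])
    } else {
        (10, rest)
    };
    let value = i64::from_str_radix(digits, radix).ok()?;
    Some(if negative { -value } else { value })
}
```

Plain `"010".parse::<i64>()` in Rust yields 10, not 8, and `0x` prefixes fail outright, which is exactly the divergence discussed above.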
"Add IDL reader and import resolution." It uses the listener. Yeah. This covers all of the various parsing. Known limitations: properties on primitive types are silently dropped; unknown logical types are silently dropped; the Java unescaping only handles single-digit octal escapes; and nullable-union property targeting doesn't account for reordering. Interesting. So session.md is something I've added to my CLAUDE.md that basically says: whenever you run across something interesting, put it in this file. And if we go look at it — yeah, here's one example, here's another. Many of these we already have to-dos for. This one's kind of interesting: walk-record receives a register parameter but does not use it. In the Java implementation, records can contain nested named schema declarations; this is not yet handled. Messages from import protocol statements are not merged into the current protocol — the types are registered correctly, but the imported protocols' messages are discarded. Yeah, so certainly many of these are going to be to-dos. Emits the fully qualified name, but should use the simple name for a reference in the same namespace — this causes many integration test failures. Good. Messages without throws should not include an errors key. That's fine. I'm going to copy session.md to recheck.md, because it's going to keep modifying session.md as it works, and I want to make sure it goes back and checks all the items from there when it finishes its first iteration through here.

"I don't see this workflow as more productive than regular programming, given the speed at which you program." I actually think it is. If we look at — okay, let's exclude the generated code here. So the generated code is about 7,000 lines.
So excluding that, it's written about 5,000 lines of code, and reading through what we've done so far — this is just me reading the code, right? There's way more written code than what I've read so far, and I would not have written it faster. There is just no way. The interesting thing, I think, is what comes next. It's at the end of the first iteration here, and then it goes back over the to-dos. Does it actually get to a working implementation? If it's the case that I now need to spend the next, you know, six hours basically redoing its implementation, then I agree it would not be more efficient. But I don't think that's going to be the case. You see it's already debugging its own implementation. And this is where it's fun to go back and read its chats to itself. It goes: wait, let me check — are they defined as separate top-level declarations, or are they forward references? Let me check the input file. Oh, they are top-level record declarations. The issue is the ordering. But wait, are they actually reference nodes? Let me check. It's very entertaining to read through. But this is what I mean: it's really, really useful to give it the ability to test its own implementation in a pretty rigorous way, because now it can go: oh, the Java parser handles this case, our parser needs to handle it, let me see how the Java parser checks it. And now it has the reference to the Java code, the reference to the input and the output, and the difference in the error, and it can print the parser representations, right? And so as a result, it gets to iterate without having to ask me a bunch of things as it goes. And actually, what I want to do here is stop it. Stop it. Stop. Stop. Stop. Stop. Here's what I want to do: resume the concurrent agents, but spin off agents to do this debugging in a sub-agent.
Also spawn sub-agents to investigate — but not fix — the issues identified in... what did I call this file... recheck.md and todos.md. When all the agents have concluded... split the identified changes into those that can be worked on in parallel without — so the danger here is that because we don't yet have a thing that compiles and runs end to end, if I tell it to have many agents all try to fix things at the same time, they're going to step on each other's toes: either by modifying the same files, or — even if I tell it to split them that way — if one introduces a syntax error in the Rust code it generates, the other will try to run its code and go, oh, it doesn't compile because of a syntax error. And then they start stepping on each other. ...concluded, let me know the state of affairs. Because what I want to get to here is for us to commit an implementation that has — if we look back here, there are a bunch of diffs in here, right? And what I want to do is commit them so that we end up with a clean git state, and then we can start issuing worktrees for the different agents to fix their individual sub-issues.

"How does it make me feel that it's more productive for me to not write the code myself?" It's fine. But the point — and I've made this point in the past as well — is that it's not as though I can have agents just do all my work. That is just not the case, empirically. Instead, there are some tasks, like this one, where I can have an agent do a lot of the work for me because — going back to the start of the stream — this particular problem is so well suited to solving with an LLM that it works really well here. There are other things it works much less well for, where it is definitely not more time-efficient than me doing it.
And so, in a way, it feels better this way, because the things I need to use my active brain power for end up being the more interesting things, and the things that are a little more mechanical, or boilerplate, I can offload and not have to do myself. At the same time, you know, if you've observed the stream over the past — what, three hours? — it's not as though I just started it and then walked away. This still requires me to walk in and look at the code, leave these to-do comments, guide the agents, come up with a plan. So it's not completely hands-off, but there will be a point, probably pretty soon, where we can just have it run wild and debug itself, and then we'll have to do another iteration of review: okay, what does the code look like now? What kind of insane workarounds did it come up with? And that is where it can be really useful — for example, there was a time when I had to work on something kind of similar, and I kicked off the agents, and then I went and made lunch, and when I came back it had done stuff while I was having lunch. That is very useful. But it only works for the subset of problems where it has enough ability to iterate on its own without your direct involvement all the time.

"Is this the Claude Code CLI with a Pro subscription?" Yes-ish. It's with the Max subscription right now, because this is going to be a decently long stream, and if I had the Pro subscription I would probably run out of tokens. So I upgraded to Max for the purposes of this stream. Normally, though, Pro is fine for the stuff I do on my personal time. On work time, when I use it, I use it in this relatively intense fashion for a subset of the work that I do.
But at work we have a separate setup — it's not Pro or Max or anything; it's separately hosted infrastructure, and sometimes different models and everything.

Why did it make [snorts] code edits? I told it not to make code edits. The debug agent is stuck in a loop: it's being denied bash permission and keeps retrying. Yeah. Okay, great. Yeah, that's fine. [snorts] Why did it — I told it not to try to fix anything. I told it. I told it. Yeah, but I don't want it to fix anything. [snorts] My plan here is: once this finishes, I basically want it to write out documentation of all the now-known issues that need to be addressed. Then I want to commit what we currently have. And then I want to create a bunch of workspaces for agents to try to solve each of those sub-things that need to be dealt with. [snorts] Don't try to fix the issues right now, just identify them.

This actually aligns with something I've thought about more broadly, which is my philosophy when it comes to meetings. For meetings, I tend to have this philosophy: either you can spend a meeting identifying many problems, or you can spend a meeting solving one problem. You cannot do anything else, you cannot do both, and you'd better do one or the other, because any other meeting is not going to be useful. And I've come to the conclusion that this is very similar to working with agents: you can either have one agent try to find many problems, or you can have one agent try to solve one problem, but you cannot have it do both — and crucially, you cannot have an agent try to solve multiple problems. It tends to run out of context, get confused, and lose track of what it was doing.
And so this is where the sub-agents thing works pretty well, because you can basically say: I'm going to spin up a bunch of agents, and each agent is going to either do problem analysis — be a debugger and a problem identifier — or try to fix one specific issue and nothing else. "The borrow-checking rule of meetings." Yeah, exactly.

"If you weren't as experienced as you are, would you consider the work you're doing now dangerous, because the scrutiny you're applying would be beyond a high percentage of users?" Maybe, I think, is the answer — and this gets back to why this is actually a really good use case for LLMs: we have such a good reference point, right? We know what a correct implementation looks like — it's something that exactly maps to this other function. And we have a bunch of test cases and can generate more. So it's very difficult to end up with a solution that's just incorrect but looks correct, which is the normal problem you have with LLMs.

Okay, so now it's identified a bunch of issues. Great. Here's what I'll do. Let's first commit the currently pending changes — use the commit-writer skill and consider splitting the changes into multiple commits. Actually, no: let's first document each of the remaining issues in separate .md files in issues/. I want to do that first because I want to make sure we preserve the context for it. Then commit the currently pending changes. And we're just going to do those two first. I like having this directory where you keep track of the issues, because then you can say: create an agent for every file in this directory, and tell each of them to read their file to figure out what they need to do.
And then separately, for each one, we can create a git workspace that that agent will be operating in.

"Can we assume the string literal and integer literal definitions in the .g4 are an okay reference in place of the Java built-in parsers?" Let's go see. Probably. I would assume that the .g4 grammar here is accurate. Like, I don't think we actually need to support everything the Java one does — I'm guessing what they found is that the Java one is a superset of the rules that are in here. In fact, if we look at this, there's no support for underscores in the integer literal here, for example. So actually, maybe we can simplify the issue. So let's go here and say: by the way, for parsing the various kinds of literals, check the .g4 file — that's a good insight — for the possible syntax of such literals. We don't need to accept everything the Java built-in decode functions support, only what the grammar says is legal. File this as another .md in issues/ for us to address later — because otherwise Claude tends to be overly eager about trying to fix it right now. [snorts]

So now we should start getting files in issues/. If we go to issues and look at number one, for example: the enum Status in the status schema AVDL has namespace None in the parsed model, and the expected output is namespace "system". Yeah, great. So now we're going to start getting a bunch of these that hopefully can be handled relatively in parallel.

"If you were inexperienced, would you have exclusively used Claude for all your personal projects? Would that have kept you from learning programming on your own?" It's a difficult hypothetical, because I really like writing code. I wouldn't want my entire day to be prompting agents and reading code. I want to write code myself.
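The "only accept what the grammar allows" approach discussed above can be sketched as: validate the token's shape against a grammar-style rule first, then parse. The exact rule here (optional sign, decimal or 0x hex, no underscores) is an assumption for illustration, not copied from the real .g4 file:

```rust
// Hypothetical validator restricting integer literals to a grammar-defined
// subset, rather than everything Java's decode functions would accept.
fn is_grammar_integer(tok: &str) -> bool {
    let body = tok.strip_prefix('-').unwrap_or(tok);
    if let Some(hex) = body.strip_prefix("0x").or_else(|| body.strip_prefix("0X")) {
        !hex.is_empty() && hex.chars().all(|c| c.is_ascii_hexdigit())
    } else {
        !body.is_empty() && body.chars().all(|c| c.is_ascii_digit())
    }
}

fn main() {
    assert!(is_grammar_integer("123"));
    assert!(is_grammar_integer("0xFF"));
    assert!(!is_grammar_integer("1_000")); // a Rust-ism the grammar wouldn't allow
    assert!(!is_grammar_integer("0b101")); // a Java/Rust-ism the grammar may not allow
    println!("ok");
}
```

Rejecting out-of-grammar tokens up front keeps the Rust port from silently accepting inputs the ANTLR parser would have refused.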
And that especially applies to the kind of really tricky code that I think agents are still pretty bad at writing — concurrent algorithms are a good example. And that's the stuff that motivated me, why programming is fun in the first place. I do think, though — for example, I entered programming through web development. I wanted to have a website and I wanted to program things in it, and it seemed fun. And it could be that that passion wouldn't have been sparked if I could just prompt LLMs to build it for me, because I wouldn't have really known to dig into the code. So I do think there's maybe a danger there: fewer people get exposed to writing code, and writing code could be something they'd have found really fun. But I think once you get a taste of writing code and find that you enjoy it, you wouldn't then go, well, now I'm just going to use agents for everything — because sometimes you write code because you enjoy writing code. I do feel like the part of my brain, the brain process, I use for writing code is different from the one I use when prompting and reviewing. They're related, but they are distinct, and I wouldn't want to not use the coding part at all.

"Do agents perform worse in Rust if they write unsafe code, because the compiler feedback is more limited?" Kind of. The feedback mechanism for unsafe code is worse, but Miri is pretty good these days, so if you give it instructions to run Miri, Miri will more often than not be able to tell it: here you're doing something illegal. And then sometimes it can figure it out. It's more that the domains of problems where you use unsafe are also domains that are harder for LLMs in general, because they rely on more sophisticated understanding, which they don't usually have. So: not so much.
"Wouldn't that cause issues with files that work with the Java parser but fail with the Rust one? I can imagine it causing compatibility problems with non-spec-compliant schemas." Maybe — but I actually think not, because Java also uses the ANTLR-generated parser. If you had a literal that used a Java- or Rust-specific parser feature, it wouldn't pass the ANTLR grammar parser — it would fail to parse in the first place, in both Java and Rust, because it doesn't match the grammar. And if it had matched the grammar, the parser would have parsed it correctly. So I'm not too concerned about that. [snorts]

All right, so now it's doing the commits, so now we should have a bunch of issues. Aha — and now we have number 16. This is the one that was pointed out in chat: literal parsing should match the grammar, not the Java built-ins. It currently tries to handle the full set of Java escape sequences; however, the ANTLR grammar defines the exact set of legal literal syntaxes. We only need to accept what the grammar permits, not everything that Java's built-in decode or similar functions support. Yeah, great.

There's one more thing I want to do — git add is fine, you can run git add. There's one more thing I want to do in the context of this Claude session, which now has a lot of context from us working with it: I want it to write a CLAUDE.md. I think there's a lot of context from it having built the first implementation that should go into the CLAUDE.md, because it won't be obvious to someone who just looks at the code later on. So once it finishes this, I'll have it generate a CLAUDE.md. And I'll also want to make sure that in the CLAUDE.md it includes references to many of the most important files that we gave in our initial context here, right?
Things like the URL for the IDL spec, the grammar file — all of these paths which would be pretty annoying, or hard, for Claude to find later on. [snorts] Great. So now we're getting closer to a clean check-in. It's also going to check in my todos.md and my recheck.md and my session.md — probably don't need all of those. Great: it compacted just after creating the git commit message. That's what I like to see. Did it actually run the commit? Yes, it did. Okay. So now I'm going to git rm recheck.md and todos.md. Actually, todos.md I want to keep, because it's still a very handy way for me to keep notes as I review, which I can later tell it to turn into separate issues and the like. So I'm actually going to do this. And then I'm going to amend that into the previous commit.

"LLMs tend to have a kind of style they write code in, and as someone with strong opinions about code, I find it frustrating to pair-program with them." So there are two parts to this. One of them is: you can make it change how it writes code. I've done this pretty aggressively. I don't have a fully up-to-date one here, but I have a CLAUDE.md that I spent a bunch of time on, for how I want it to write Rust code: frameworks I want it to make use of, how to write compile-fail tests, how to use git, preferences for code style and comments, things I want it to avoid or make sure it uses — like, I document the XY problem as something it should be aware of. And these things you can use to actually make it better at writing and reasoning about code in the way that you want. Literate programming is a big one for me that I think makes it write code in a way that's much more aligned with how I think it should be written.
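A project CLAUDE.md along the lines described here might look something like the following sketch. All paths and section names are hypothetical placeholders, not the stream's actual file:

```markdown
# Project notes for Claude

## Key references (placeholders — fill in real paths/URLs)
- Avro IDL spec: <spec URL>
- ANTLR grammar: <path to the .g4 file>
- Java reference implementation: <path into the Avro checkout>

## Workflow
- Test scripts go in `examples/` and run with `cargo run --example <name>`;
  never create standalone .rs files compiled with rustc.
- Record surprising findings in `session.md` as you go.
- File new issues as separate .md files under `issues/`; do not fix them inline.

## Semantics
- Literal parsing should accept only what the grammar permits, not the full
  Java built-in behavior.
```

The point, as described on the stream, is to preserve context that a fresh session could not rediscover cheaply from the code alone.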
And then the other part is — I made this comment earlier today — that I try not to impose too much stylistic preference on the LLM, especially when it comes to things like naming and the locations of files, because the recommendation it makes now is also what its future self is going to predict. If you forcibly change what it does now, the future prediction will be wrong, so it will be slower in the future, and it'll make more mistakes in terms of identifying functions, locations, files, that kind of stuff. So it tends to be — I don't know if you have this expression in English, but in Norwegian it's called a bear favor: to do yourself a bear favor is to do something for yourself that's actually a disadvantage to you in the future, you just don't know it yet.

Okay, here's what I now want. Let's write a CLAUDE.md for the current project, incorporating all the insights you have accumulated over this session. Make sure to include references to useful files and directories in Avro and ANTLR-for-Rust, such as — and then I want to add this one, and this one, and that one's less interesting. So these are both Avro and ANTLR-for-Rust: this, and this. Also include links to relevant online references — we should also link to this, and this. The reason for this is that I'm going to exit this Claude session and start a new one with fresh context and everything, and I want it to then have a decent starting understanding of the entire project so far, rather than have to discover it as it goes. And we're now in a pretty good place for it to create that initial documentation, because it has been writing all of the code that we have in here. So let's see what we get out of that. We should end up with the CLAUDE.md, and then we can start to split the remaining issue work into separate agents that can figure out what's wrong.
We can also start a new set of agents whose job it is to just run the current implementation on various AVDL files and identify new issues that get added to the task queue. [snorts] And in fact, that's something I might start over here after this one runs. So once we have the CLAUDE.md file, I'll say: start a bunch of agents that are going to document new issues in the issues/ subdirectory. [snorts] It's lost track of its to-do list, because I told it to stop somewhere in the middle, where it had started debugging a particular issue — I'm like, this is not your job right now — and then it lost track of its to-do list.

"Do you already use Apache Avro? Can you briefly talk about it?" I introduced what Avro is for and so on at the beginning of the stream, so you can rewind to there to look. But yes, I am using it for a couple of things at work.

"Isn't yielding to the LLM's preference causing maintenance issues downstream? Like, if you don't like the code you read, you will probably have a bad time debugging issues later." Yes and no. It depends on how strong your stylistic preference is, right? Is it for a really good reason, or do you just kind of like it better that way? And the other thing is, you can always make the change after all the agent work is done, right? You just shouldn't try to correct style during iteration, because it makes the subsequent iteration slower. But after all the iteration has happened, you can feel free to go rename all your function calls and everything. That might be okay.

Great. It seems to maybe have — I think it's added mentions of all of the issues, maybe, which is kind of weird. Okay, great, I think that's fine. Add CLAUDE.md. Okay, now I'm going to exit this one, and then we're going to start one here. And we're going to start one here. This one.
Start many sub-agents, each of which should run the current implementation of the IDL tool on known AVDL files from under avro/ — let me redo that: start many sub-agents, each of which should run the current implementation of the IDL tool on known AVDL files from under avro/... you can't see that part because it's behind my face... and compare the output to the expected output when running the Java tool. Oh, did it document how to run the Java tool, is the other question. Yes, great. And compare the output to the expected output when running the Java tool. If discrepancies are found, the agent should do first-level triage of the observed bug and then file an issue under issues/. The agent should not attempt to fix the issues, and should prefer to debug issues using Rust example files in examples/ that are run with cargo run --example. They should avoid changing the source files in src/ as much as possible, to avoid stepping on each other's toes. Okay.

And then for this Claude, we're going to enter plan mode as well. And we're going to say — Bradley, you don't need to worry too much; we're doing a lot of code review in this as well. This is also specifically because for this particular problem, as explained earlier in the stream, I actually think agents work really well. There are a lot of problems they do not work well for, so it's not like my coding streams in their traditional form are going away.

Okay. So for this one, what I want is: for each issue identified in issues/, start a sub-agent with its own git worktree to debug and fix that issue. Now, I think there's an interesting question here of — make sure to run those agents all in blocking mode.
The reason I want to run them in blocking mode is that otherwise they can't prompt for permissions, and they're going to need permissions for things like cargo check, cargo test, various bash commands — they'd just fail and get nothing done. And then I want to include the same thing I wrote here about using examples. Great. Hey Dave.

All right, let's see what that comes up with. The idea here is basically that the current checkout of the repository — the main one we're in — is going to be the clean copy that we do analysis over and that we merge things into. That's also where we're running the set of agents that are going to identify more and more problems, by running on AVDL files, running against the Java tool, comparing, that kind of stuff. And then for anything that's fixing the issues in issues/, those agents are all going to get their own worktree, where they can go hog wild on, you know, moving things around and changing source code without the risk of stepping on each other's toes, and hopefully end up coming up with solutions. By the way: each agent should commit its work at the end of fixing the issue, using the commit-writer skill. [snorts]

"It has a clear picture of all 16 issues." Do you now? Do you now? They both claim they have clear understandings of the problems. I find that hard to believe.

"I don't understand the java-class annotation in the IDL docs. I don't see a test case for it either. Does it have an effect on the JSON?" I believe what that is for is — I think all properties get propagated to the JSON, but they don't have a semantic meaning; they're just additional fields in the JSON that the generator can choose to make use of.
So for example, I think if you have java-class in there, it'll be translated directly into the JSON as "java-class": and then some string, and the Rust generator will just not care about that particular annotation or property — but the Java generator will. That's my understanding, at least. Do you think it gets represented in the JSON by just getting copied over? [snorts]

Okay, let's see what we got. What does this one think it's going to do? We want to systematically test every AVDL file in the Avro test suite against our Rust implementation: compare the JSON output against the Java tools' golden files, triage any discrepancies, file new issues under issues/. This parallels the existing integration tests, but uses the CLI tool directly and produces detailed bug reports. Great. Yes, seven agents — why seven? Who knows. Each with a pre-assigned issue-number range so they never collide on file names in issues/. I don't want these files to be numbered, but sure. They're going to run that. Check the 16 existing issues before filing — yes. Although... do I have uuidgen? Yes, I do. Okay.

I have a couple of notes here. For the issue files, let's use uuidgen to give them unique names, but still also include a short description in the name. Also test the — what was the other command? Schemata. Also check that mode where applicable; it should produce the same results. For any temporary output files that the sub-agents need, use ./tmp — actually, use mktemp inside ./tmp. This is so that they don't start generating those files into /tmp, which would mean I need to grant them permission to access /tmp. Run all the agents in blocking mode so that they are able to prompt for permissions. All right, how's this guy doing?
That's fine. Make me plans. Why do you need nproc? I mean, sure — I guess it's trying to figure out how many things to run in parallel. Actually, there's an instruction I forgot to give it here: I told it to start sub-agents for all the issues, and I don't think that's quite right. I actually want it to think about semantic grouping of the issues — which ones are likely to touch the same files, and which should be done in what order. Analyze the issues and check which are semantically related and which likely require changing the same files. Also triage them to understand which likely need to be fixed first, because they will impact many of the others. Based on this analysis, choose the order and grouping of issues for the agents to address.

"Have you considered trying the new agent team functionality?" I saw that was released, like, this morning. I've not wanted to enable it here, both because it's in beta and I haven't used it myself, so I don't know how well it works — and also because it apparently runs through your token budget really quickly, and if I run out of token budget here, I just have to end the stream, which is annoying.

"Is it possible for these programming-assistant agents to actually do anything destructive? Could they rewrite git history, for example?" So, they can, in general. There's a sort of sandbox in Claude where it will analyze every command the agent wants to run, and if it's not on an allow list that I help configure, it will prompt me: are you willing to let the agent run this command? And if you say yes, it pattern-matches on that and goes, okay, in the future I won't ask you for commands like this. It's not perfect.
So the agent can still come up with a command that matches the allow list — and therefore doesn't prompt me — but actually has a different semantic meaning. find is a good example of this: people will often go, yeah, just allow the find command, but find has -exec, which runs an arbitrary shell command. So that's probably not a great idea to blanket-allow. So they certainly can do destructive things, and rewriting git history is a good example too — although there you can usually revert it. The problem is more that if you allow them access to, say, /tmp, they can just delete everything in /tmp if they want to. Subject to your permission setup, right? Oh yeah — I never run Claude in YOLO mode, or any other agent either. I'm happy for it to prompt me for permissions, and then I'll think about it. Thank you very much. I think if you run it in VMs, this is less of a problem.

All right, how far has this gotten? Okay. Yep, that's fine. Great. And uuidgen has — I forget, is it version 7 that's the nice one? Or it's, like, version 4 or something. It's fine. Yeah, that's fine. Go. So that's hopefully going to find a bunch of new issues, and then ideally we can feed those into this one later on. "The plan agent produced an excellent analysis." I love how it's congratulating itself on how good a job it did. Let me now verify some key details — that's fine. Incorporating your additional instructions — okay, we'll see what plan it comes up with. Yes, ls is fine; you're allowed to run ls in this directory. "There's also a real sandbox in Claude Code, via sandbox mode."
The thing that's annoying about using the real sandbox is that you end up not being able to grant it permissions outside the current directory, which sometimes you really want to do—because you want to put worktrees there, or you want to give it read access to some other repository you have. There are reasons why you might want to allow it outside, and the sandboxing can be too strict for that sometimes. The real problem is we don't have the security granularity in the OS for this kind of thing. "Something a human could do but won't might happen stochastically with an LLM." Yeah, exactly. And you can configure what sandboxing level you want, but the sandboxing level they go for has tradeoffs, right? Yes, that's fine. Yes, that's fine. Yes, that's fine. This is the case where it chooses to use bash subcommands and then can't analyze them well enough to let you say "don't ask me again." And this is another example: it wants to run python3 commands, and I'm not going to blanket-allow python3—that seems like a terrible idea. That seems fine. That seems fine. But that also means I have to do this, which is annoying. Yes. Yes. Yes. I'm a little sad that it decided to use Python for all of this. Prefer using Rust examples for parsing the JSON rather than—ah, it's fine. Yes. Yes. Yes. Yes. This is annoying. What I should have done is first have it write a compare-JSON script that they can all use, because instead they're all going to independently come up with the idea of "oh, I should probably have a JSON comparison thing." Yeah, that was pretty annoying. Okay, that was a mistake on my part. All right, let's look at this fixing-problems thing. Okay. So it's analyzed, for the different issues, which files they're likely to touch.
I buy that execution plan: wave one, independent fixes. Agent A is going to do the reference restructure, Agent B the import improvements, C the literal parsing, D—this is our to-do on checking that these things are actually the doc comments—and also adding the regex fallback to namespaces. And these all look like they're probably independent. They're all going to have their own branch. That seems fine. "After wave one completes and merges into main, these agents branch from the updated main." Yeah, great. Okay. Now, where do I want these worktrees to be is the other question. The really annoying thing here is that the default is that the worktrees are created one level up from where you are, in a sibling directory. The problem is, if we do that, all the permissions have to be granted separately in that directory, and so I would need to regrant all the permissions to all the agents, which is a little annoying. But I guess that's okay. Yeah, that's fine. Just do it. And then let me go back to saying yes to more of these Python object comparisons. That's fine. That's fine. That's fine. Oh, luckily these are mostly for the init—each agent needs to do this once, but after they're all running they shouldn't really need it anymore. And the commands they want to run are pretty easy to pattern-match on. This is also why people run things in YOLO mode: so they don't have to do exactly what I'm doing right now, which, to be fair, is pretty annoying. "You can still have it fall back to the regular permission system if it can't do the thing within the sandbox." Oh, nice. Yeah, then maybe that would be a good alternative. Oh, I really should have told it to create a single JSON comparison tool before starting all of these.
I'm real sad about it now. Yeah, that's fine. Yes, that's also fine. "I don't know if you've answered this already, but how much of your time on average is spent dealing with Claude Code, other than writing code yourself?" It really depends on what I'm doing. It varies from week to week: there are some weeks where I don't use agentic coding at all, and some where the thing I'm working on happens to align well, so I use it a lot. So it's very hard to give an actually representative average. I'd say if I'm in a very agentic-heavy part of my work, I could easily spend a third to half of the day actively prompting. And then there are other times when the task I give it is sufficiently—not one-shotty, but I can kick it off and then be in a meeting, and every now and again it prompts me for something, I hit yes, and it keeps going. So it's fairly self-going in the background; I'm doing a lot of agentic stuff in the background, but it's not foregrounded in my brain very much. It's hard to say what the average of those is across the weeks. And again, there are periods where I'm only doing things myself, not really using the agents at all. And of course, there's also a lot of work I do that isn't even about writing code—it's more about technical leadership: setting milestones, working with teams to debug problems, personnel issues, dealing with social planning in the context of work. There are all sorts of other things that take up my time that are not about writing code either. "I have a standard Docker container I run agents in. The source code is the only external thing mounted into it." Yeah, that's pretty common.
Where I find that gets really annoying is the moment you actually want to give it access to something outside. I also love this: it's written a Python program for itself where it goes—actually, wait. Looking at it, this means the namespace is set, but then when the schema is registered… yeah, let me check why. That's funny. And wait, it's a Python—look at this Python script. It's a bunch of code comments ending with a print of "need to check." Great. Love that. I'm very glad I had to review and approve that Python file. Yes, that's fine. You can mkdir that. "Do you use Claude and AI for work dev, or only personal open source?" No, we use it at work as well—but only for very specific things where it's okay to use, and obviously we have a lot more review, from actual other people, of the code you produce. But no, we use it at work as well. For example, the other day there was a 1,500-page PDF of a particular NATO standard for the bit patterns you use when communicating with a UAV that's flying. The intent is to have a standard so that you can have a ground control station for the UAV written by country X and a drone written by country Y, and you want the two to interoperate, because they're all part of the same NATO deployment. Yes, that's fine. And that's all well and good in theory. The problem is that the spec is 1,500 pages of PDF, where like 500 of them are lists, with reams and reams of tables of "these bits have this type with this semantic meaning," 500 pages are an implementation guide on top of that, and another 500 pages are spec requirements over the implementation. So it's just a huge mess.
And this is something where it's actually very useful to be able to point a set of agents at the document and say: each agent is going to own one of the tables, implement the Rust equivalent of the struct, and then do all the testing to make sure the bit patterns line up. You do that for all of them, all as parallel agents. Then you run a separate parallel set that checks all of those against the implementation guide, checks all of those against the requirements, and annotates them. So you have a bunch of agents checking agents checking agents, and then you go over and check at the end—and that checking is a lot easier than writing and generating all the boilerplate that's just transliterating PDF into Rust structs. But obviously, for the kind of work we do, a lot of this ends up being sanity-checking what comes out; you don't just want to send the LLM off and accept whatever it produces. That's a real bad idea. All right. So now I'm curious whether it's finding a bunch of things. It's what? It's filed more issues, but those issues are not using UUIDs, which is what I asked it for. Oh, we're running into the fact that serde uses BTreeMaps for maps, so the keys in things like properties get reordered. This is an interesting question—oh yeah, and there's no accompanying test suite; there is supposedly an XML schema that defines the valid messages, but it was hosted on a web server that was taken offline two years ago, and there are no other copies. So, you know, fun times. The interesting bit here, I think, is: do we want to guarantee that our tool generates bit-exact JSON? And I think the answer is no. There's an interesting question about whether the ordering of keys matters, but it should not.
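The per-table transliteration job described above can be sketched in miniature. This is a hypothetical example, not a field from the actual NATO spec: an invented wire word packing a 4-bit message type and a 6-bit payload length, with the round-trip test that each agent would write to confirm the bit pattern matches its table.

```rust
// Invented wire-format field layout (names and widths are made up):
//   bits 0..=3  message type
//   bits 4..=9  payload length
#[derive(Debug, PartialEq)]
struct WireWord {
    msg_type: u16,
    len: u16,
}

impl WireWord {
    fn encode(&self) -> u16 {
        (self.msg_type & 0xF) | ((self.len & 0x3F) << 4)
    }
    fn decode(word: u16) -> Self {
        Self {
            msg_type: word & 0xF,
            len: (word >> 4) & 0x3F,
        }
    }
}

fn main() {
    // The agent's job ends with round-trip tests like this one,
    // checking the struct against the table's bit description.
    let w = WireWord { msg_type: 3, len: 17 };
    assert_eq!(w.encode(), 0b01_0001_0011);
    assert_eq!(WireWord::decode(w.encode()), w);
    println!("round-trip ok: {:#06x}", w.encode());
}
```

The checking agents then only need to compare such structs and tests against the implementation guide, which is far cheaper than writing them.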
And so one thing we might want to add here—maybe it's as easy as modifying the CLAUDE.md to say: when comparing output for equality, ignore key ordering. In fact, let's add that here: consider whether we should add to CLAUDE.md that when comparing output between the Java tool and ours, the ordering of JSON keys should be ignored. "Multiple agents independently filed issues with the same number prefixes." I told you to use UUIDs. Actually, I also want to gitignore temp. If you look at the session MD here: "21 test files reveal multiple compounding import bugs—missing messages, qualified names, and wrong type ordering. These interact with existing issues 1 and 9 to produce heavily divergent output." Oh no, not heavily divergent. Great. Yeah. Okay. Although, interesting—there are some that passed. Leading underscore passed. Great. I mean, we could also set serde_json's preserve_order, but I don't think I care about that. preserve_order just preserves order when you parse compared to when you print, and for maps I think it keeps insertion order. So that still wouldn't quite give us what we're after here, because we don't have any guarantee about the ordering the Java is using. So I'm actually inclined to say that for the order—yeah, there we go. That's the actual change that I want. "With the new guidance added to CLAUDE.md, does this render some of the issues irrelevant?" I'm skeptical whether these are actually doing anything, because they haven't prompted me for any permissions. That makes me skeptical. "Is the issue for schema JSON ordering or over-the-wire ordering?" It should only be for schema JSON ordering, because we're not doing anything about the over-the-wire encoding here.
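The key-ordering point above comes down to a property of Rust's standard maps: a BTreeMap stores entries sorted by key, so two maps built in different insertion orders compare equal, and iteration always walks keys in sorted order. That is both why serde's default map type reorders the JSON keys relative to the Java tool, and why a compare-JSON tool that parses both sides into map-backed values gets order-insensitive equality for free. A minimal std-only sketch:

```rust
use std::collections::BTreeMap;

fn main() {
    let mut ours = BTreeMap::new();
    ours.insert("name", "Record1");
    ours.insert("type", "record");

    // Pretend the Java tool emitted the same object, keys in another order.
    let mut java = BTreeMap::new();
    java.insert("type", "record");
    java.insert("name", "Record1");

    // Equality is independent of insertion order.
    assert_eq!(ours, java);

    // Iteration is always in sorted key order, which is why the
    // generated JSON's key order can't match the Java tool's.
    let keys: Vec<_> = ours.keys().collect();
    assert_eq!(keys, vec![&"name", &"type"]);
    println!("maps equal; keys iterate as {:?}", keys);
}
```

Parsing both outputs into such values (e.g. serde_json's default Value type, which is map-backed) and comparing is the single shared compare script the agents should have been given up front.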
The order annotations—that's a good question. Let's go look. "Sort order of a field within a record." I think that actually ends up coming out here—see the sort order section below. "Note also that binary-encoded data can be efficiently ordered without deserializing to objects." So that's why—so I don't think it affects the serialization, but it does seem like an annotation that has meaning beyond just the Java implementation side. What is happening in the background here? I'm not seeing any runs of cargo, which makes me worried. "Issue 17, key ordering, is still a real bug. We want our output to match the Java tool's key ordering." Ah, no. We're fine if our tool does not match the Java tool's key ordering. It is a non-goal to match the JSON key order. Oh, I don't know if I believe that they're still working. I think they're all stuck asking for permissions is what's going on. Are the agents stuck waiting for permissions? "Maybe it's a bug with worktrees." The real solution to this is to start Claude in a directory that will contain the worktrees. I started it directly in the directory for the main repo; if I had created a parent directory and started it there, this problem would go away, because the permissions would be associated with the parent directory, which descends into all of the subdirectories as well. There we go: "Agents are stuck because tool permissions are being auto-denied." Okay. No, no, no, no. Stop, stop. Okay, great. This one's also done. Okay, let's tidy up the structure here a little bit. "Add issues." "Add gitignore, add CLAUDE.md." "Find more issues." I'm not living up to my own standard when it comes to commit messages. So here's what we'll do: we'll move this to main, and we'll mkdir avdl.
We'll move main and the avdl worktrees into avdl, cd into avdl, and then ln -s main's Claude config here, so that we now have the Claude permissions in here. And now I can start Claude. In fact, I can also move the jar into here—that's the main one. And now I can resume Claude from the directory that holds the worktrees, and try launching the agents again. "Make sure you launch them in blocking mode so they can ask for permissions." This is a known bug in Claude: the background agents are not able to ask for permissions, and it's very frustrating. Because I'm guessing if we go into any of these—oops, stop. Right, I broke this, because this now needs to be main. There we go. And then I'll sed -i—actually, let me cat the worktree's .git file first—sed -i, substituting the avdl .git path with the avdl main .git path in the worktrees' .git files. And now if I go into any of them… ooh, what? Oh, it's really unhappy with me. Fine, I will remove the worktrees and have them be recreated, instead of trying to do git surgery. "You'll have to recreate their worktrees." Oh, the other problem is that the CLAUDE.md is now different. So the CLAUDE.md here is going to say: the main repository for this project is in main; git worktrees for subagents should be created in the avdl worktrees directory; you'll need to recreate the git worktrees; check CLAUDE.md. "Can you see how many tokens you've burned since the stream started?" I think so. Today is the 6th, so 75 million tokens, or $53, if this tool is accurate. "Remember that the root of the git repository is in main." Yeah. So this is why I was saying that I think Max makes sense for the stream: I'm doing a lot of work here, a lot of tokens, a lot of parallel work. And so, yeah, it gets really expensive.
At the same time, you know, the amount of code we've both read and generated here is pretty high compared to what I would normally do in a session. But yeah, Claude is pretty expensive—but it also does its job very, very well. All right, let's see if it's going to be able to—"I think it would save like 8x. I think they reverse-engineered the cost or something." What do you mean, it would save me 8x? "I'm afraid if I use LLMs too often, my brain/skills will rot away. Do you have the same concern, or do you have some strategy to protect against that potential?" So, I don't think it rots away as long as you're still also doing programming, and as long as you're reviewing the work that comes out of the agent. This is where I draw the distinction between vibe coding and power coding: vibe coding is run the agent and then don't look at what it produces; power coding is use it as a power tool. So be really careful, don't cut yourself, measure twice, cut once, and review the work. If you do that, I think that mitigates a lot. But you do have to also write code yourself; it is not sufficient to just have the agents do everything. And there's a bunch of stuff they're really bad at. Maybe you can identify that as it happens, but much of the time it's also useful to just build working knowledge: try it with the LLM, and if it does a bad job, write it yourself. Over time you'll get better at estimating where you should just do it yourself from the get-go. "A friend reported about 30,000 euros per year per employee in the IT department for tokens alone." Yeah, I could buy that. I mean, it really depends on how well they're using it, right? If you just try to use it for everything and you're not principled about how you use it, you will end up spending a lot of tokens, because you go through a lot of churn that is unnecessary.
An example here, again, is you deciding to change the names and being very nitpicky, causing later iterations to be more expensive because the predictions aren't going to match on the first go. And yeah, I think critical skill thinking—critical thinking skills, wrong way around—are certainly very important here to be able to get them to do a good job. That is certainly true. Why does each submodule here have to clone from upstream rather than from the main repo? That seems annoying. "Doesn't it make you sometimes feel stupid, like when you don't remember simple stuff that you coded and used for years?" I don't think so; at least I haven't had that experience. But at the same time, I also got that experience sometimes before starting to use LLMs—like, ten years ago I knew exactly how this function works, and I've paged it out now. That seems natural, and I don't think that's really a bug either, right? It feels as though, if information is no longer relevant, it should be paged out so you can make room for other things. I don't think that's inherently a problem. And I do think that over time, as we find certain kinds of tasks that are better to delegate to the LLM, the relevant skills for those tasks will also fade into the background. It's not clear to me that that's bad, as long as the things that should not be paged out—because LLMs are not good at them—remain front of mind. "Since the cat is out of the bag, do you think we all have to allow ourselves, even if we don't like it, to integrate AI into our workflow, to not get left behind?" I mean, I hear this a lot—and it's fear-mongering more so than actual fear, I think—of "you'd better pick up LLMs now, otherwise the whole industry is going to move on without you." I don't think that's true.
I do think this is a useful tool. There are things it makes more efficient, and not making use of that is doing yourself a disservice, in a way. When agentic AI became a thing—you know, "this actually works for stuff"—I was certainly very skeptical, and I remain skeptical of many uses of it, but I basically sat down and forced myself to use it for everything for about a week and a half, in both personal and work development. The reason I did that is it forced me to see the places it did well and the places it did poorly. If I hadn't gone through that experience, I wouldn't even have known when the tool was appropriate or inappropriate to use. And I do think it's very worthwhile to go through that exercise of forcing yourself to do the due diligence of at least learning what the tool is and how it works. I also think that when you go through that exercise, you come out the other end with a new tool in your tool belt that you can hopefully wield responsibly, knowing what to use it for, and it will make you faster for some tasks. There's also a good point here that some of the ways it makes us think are very useful, because as programmers, when we write code, we sometimes get lost in the details. Sometimes that's necessary, but other times it means we lose the forest for the trees. It can be really helpful to work through an agent where you have to actually describe the problem you're trying to solve and the end state you're trying to get to, rather than being too absorbed in the code to see that you're actually building the wrong thing. It forces you to take a step back.
Again, that applies more in some cases than others. "But even so—better pick up LLMs now, but also the model from three months ago is outdated and you must learn again." I don't think the learning changes that much. I just don't think they're all that different. Sure, they've gotten better at keeping more context, and at what commands they can execute, and integrations and all of that, but it's not like it's hugely changed how I interface with them now versus a few months ago. There was a really interesting thread by—what's her name—Lea Verou. She's a big name in the web-standards space. She basically equated the increased use of LLMs to the invention of the C programming language, where people suddenly stopped writing assembly and started writing higher-level code that was not assembly, and the same kinds of critiques came up: "obviously we can't let some program just turn this into assembly for us; it's going to generate terrible assembly." I don't think the argument quite holds, just because I do think there's something meaningfully different between a deterministic program that turns syntax into assembly and something that is much more stochastic, like an LLM. But the analogy is kind of apt, right? Why are we saying you have to write the code? Again, there are some cases where you do, but it's not clear to me that in every case it's bad to have a higher-level abstraction over the writing of the code. It really depends. "Decomposing and formulating problems as tiny bite-sized chunks is a skill that I've developed during my career, and it feels like I haven't exercised it to the same extent with the emergence of LLMs." I actually think I use it quite a lot with LLMs too, because I have to help them decompose the problem.
I think that's actually one of the things LLMs are not that good at—or rather, they might come up with a decomposition that I think is the wrong decomposition, one that will lead them down a bunch of incorrect paths. Sometimes it's useful for me to decompose the problem for them and say: no, do these things in this order, this one first, and then you go and do the execution. "What do you do in between waiting for agents to do work?" Oh, I have enough things to do that I don't struggle to find other things to fill my time. It also means I can work on more things in parallel. The trickiest part, actually, is keeping enough context and working memory that I can switch between them—the agents obviously can, because they just have more RAM, right? But me being able to meaningfully review and reply to them, that becomes the bottleneck eventually. But there's enough stuff—interacting with peers, making sure we build the right thing, talking to customers, talking to partners, doing research, learning for myself—all of those things that, in a way, I get to spend more time on now. "Are you using any MCP servers, like Rust docs or memory MCPs, for it to gather and store knowledge?" I haven't really been using MCPs so far. I have the LSP MCP enabled, and I also have an MCP I wrote for working with large PDFs—that was for the NATO standard stuff. But those are the only two MCPs I've used so far. I do know that people like them for things like persisting knowledge so they can fetch it back later, or for accessing Rust docs, although there I think the LSP integration is actually pretty decent. But no, I wouldn't say I'm heavily making use of MCPs. "The flicker gives me a DJ." Yeah, I know, the flicker is real bad. It's a known problem. One of the fixes, actually, is to zoom out the terminal.
But if I zoom out the terminal as much as I normally would, it would make it impossible for you to see. Maybe not—let's see. If I zoom out to about this much, the flickering goes away, but it's also harder to read. "Have you had any problems trying to diagnose an error in a running service when no humans on the team have mental ownership of the details of the code implementation?" Not quite in that regard, but I have ended up using LLMs for—so, there's a C library called nng, which is a network protocol implementation. I'd written Rust bindings on top of that—a sys crate for nng—then an asynchronous wrapper around that crate to be able to use it from Tokio, and then a Rust program on top of that wrapper around the sys crate around nng. I had the context for all the Rust parts of that still pretty fresh in mind, but I hadn't spent much time in the nng codebase itself. And obviously there's an nng maintainer I could talk to, but they have their own stuff to deal with—I don't want to disturb maintainers if I can avoid it. So I didn't have any context on the C codebase. What actually worked really well was telling the LLM: here are the levels of the stack, here are the pointers to the code for each of them, and there's this bug in the top-level code. Create reproductions of the bug at every layer of the stack, and keep going until you cannot reproduce it anymore. Basically: find me which layer of the stack the bug is in, and give me the repro for that layer—it's not really an MVP, it's a minimal reproducible example. And it did that great: it ended up with a C reproducible example that found an actual bug in nng. That helped enormously, because it meant I didn't have to build the context on nng itself.
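The layered-bisection strategy above can be sketched with a toy stack (stand-ins, not the real nng code): three layers wrap each other, a defect is planted in the bottom one, and a minimal repro at each level shows the symptom persists all the way down, exonerating the wrappers.

```rust
// Stand-in for the C library: truncates instead of rounding.
mod core_lib {
    pub fn to_cents(v: f64) -> i64 {
        (v * 100.0) as i64
    }
}

// Stand-in for the bindings layer: a thin pass-through.
mod wrapper {
    pub fn to_cents(v: f64) -> i64 {
        crate::core_lib::to_cents(v)
    }
}

fn main() {
    // Symptom observed at the top of the stack (we expected 29,
    // but 0.29 * 100.0 is 28.999…, which truncates to 28):
    assert_eq!(wrapper::to_cents(0.29), 28);
    // Minimal repro one layer down still shows it, so the wrapper
    // is exonerated and the bug lives in the core library:
    assert_eq!(core_lib::to_cents(0.29), 28);
    println!("bug reproduces at the lowest layer");
}
```

The LLM's job in the real case was exactly this descent: produce the smallest failing example at each layer until the repro is expressed purely against the C library.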
So that was very, very useful, and it was certainly a case where I didn't have mental ownership over that codebase at all, and yet the LLM was able to give me the insight I needed. "My glasses need glasses for this font size." I know. Yeah, it's just to avoid the flickering—although I think now we can zoom back in here. Yeah, let's do the merges. Bring them back into main. "Have you had a chance to try out spec-driven development?" I mean, arguably all LLM work is kind of like that, right? If you work diligently on the plan, the plan sort of becomes a spec for them to write against—but not in the more formal sense, no. So, what's interesting now is what it's been doing: each of these separate agents is now finished with its fixes, and it's going through merging each of those fixes into main. For each one, it runs the test suite and sees: oh, we're still getting the same test failures we got before, so nothing has regressed, and that's fine. It's not being naive and going "oh, it failed, I must stop," but instead realizing it's the same test failure and moving on, which is what we want it to do. And so the hope now is that it will do this wave—the first iteration of fixes—and then go into the second iteration by reusing the worktrees. From here on, I would expect it to basically be able to continue to drive its own development: alternating between finding more issues and fixing them, finding more issues and fixing them. So the plan I would construct for it here is one that specifically outlines that strategy: I want you to alternate between these two phases.
That is: don't run one phase, then the other, and then finish—run the first, which is identifying the issues, then run the categorization and ordering, then run the agents to fix them, then do the merging, and then go back to issue finding. Once you give it that loop, it can execute it relatively independently. That's what I want us to get to next. For the persistent question: "Is there a reason you're using an LLM to code on the stream? Is this stream meant to be about LLMs for development?" No. I've answered this a couple of times over the course of the stream, and there's also a mention in the description of the video of why specifically we're using LLMs here. The short of it is that this is a use case where I think an LLM actually can do this job really, really well, and really quickly—it's using the right tool for the job. That doesn't mean I use LLMs for everything. "How do newer engineers differentiate themselves from LLMs?" You know, I think one big difference is that more junior engineers learn, and LLMs don't really learn. You can kind of emulate it, right? You document each of the mistakes you see them making, and then in the future they make fewer of those mistakes. But one of the big differences is that the LLM doesn't really understand the guidance you give, and it doesn't understand the mistake it made—it can only emulate that understanding. I've found that with junior engineers, I feel like I can make them better over time; the LLM I can't make better. I can steer it so it makes fewer of the dumb mistakes, but it doesn't actually grow into a senior engineer with better judgment. And that, I think, is the biggest difference. "Is there a huge difference between XML and JSON? I've never worked with them." That is a weird question.
So, yes and no. They have very different syntax and very different standards. JSON is a lot simpler than XML, both in terms of parsing and in terms of what you can represent. XML is more used in enterprise, and a little more in older systems than newer ones. JSON is a very simple format that was inspired by, and primarily driven by, JavaScript. Yeah, I keep merging—I want to get to the point where we write the plan, so it's self-going. Yes. Yes. It is interesting, though, that it's still failing on the same two tests. Oh, Clippy is going to be unhappy about this codebase, is my guess. There's another language for textual representation of data that I really like called KDL ("cuddle"). KDL is a fun one: it's sort of inspired by TOML, but TOML is really bad at recursion, and KDL is written to recurse a lot better. It's unclear that you would use KDL in place of JSON, but you would use it in place of TOML. So we've seen this divergence now between things that are used for data and things that are used for configs, and maybe they should actually be different languages—which goes back to what we used to have with INI files, which are a config-only format. You would never encode data in INI. I am really curious how it ends up fixing the error source-location stuff, because in a way, that's obviously the main thing we wanted here: the main improvement over the Java one, apart from not being Java, is having it give much better errors when there's either a parse error or a semantic error on top of the code. "Have you ever joined a Rust codebase with an architecture you disagreed with? How do you influence change as the new person?" Well, I have been brought in to do code review for codebases that aren't mine.
Um, and when doing that, there are certainly things where I go: well, this doesn't make a whole lot of sense to me. Sometimes that's because I don't have enough context. Like, it's actually a reasonable design; it's just not obvious from the outside that that is the case. You kind of need to get into the specific semantics and use cases of what it's built for, or the constraints they had when building it, to realize that it makes sense. Um, I'd say I don't feel like I've usually run into cases of: I think it's completely wrong, and the team that built it thinks I am completely wrong for thinking that it's wrong. Like, usually you reach some amount of alignment. Um, but this also depends on how you approach conversations with other people, right? If you approach them as "you've done this wrong", they're going to enter a defensive posture immediately, right? Whereas really what should happen is, um, "can you help me understand why it's done this way?" And then they explain, and hopefully you can go, "oh, but doesn't that mean that...", or, uh, "have you run into problems with...", and then you can sort of guide the conversation so that you understand the requirements and constraints they have, and they understand the insights or experience you might have to bring, and you come up with: oh, actually, these things should stay the same and these things should change. Yeah. Empathy. It's a weird word, huh? Uh, did you have the agents convert all these issues into regression tests? No. And part of the reason is because, um, for some of these they're just, like, "the thing doesn't work". But for many of them it's "this AVDL file that we get out of the Avro repo doesn't convert, or doesn't convert correctly". And those will already be in our test suite. Our test suite is already running all the AVDL files. So those are already regression tests that we will continue to have.
Um, have you figured out a good agentic workflow for formatting and Clippy? No, I have not. Uh, it's very annoying, because the LLMs tend to generate code that's not quite formatted and not quite Clippy-clean. Um, formatting is the worst one; Clippy is not so bad. But for formatting, the moment it formats the code, or I format the code for it, then it needs to read the whole source file again, because the source file caching claims that the file is no longer up to date. Which makes sense, because if it now tries to edit the file, all the line numbers will be wrong, so its diffs will be wrong. So, like, it's correct that it needs to read the file again, but I don't want to spend context on keeping two copies of the same file, one which is formatted and one which is not. So what I tend to do is just not do formatting as part of any part of the LLM interaction. And then, when all of the LLM stuff is done, then I'll do a cargo fmt and commit. Um, so I usually try to not do any of that in the LLM critical loop, if you will. This is actually very similar to forcibly applying my style to the LLM while it's iterating, of saying "no, rename this function to that", where I try to avoid doing that for the same reason: because it confuses the LLM, in two different ways, right? The formatting confuses the LLM in that the file looks different; the renaming confuses it in that its future predictions will be wrong. But in either case: just, like, let it cook, as the expression goes, right? And then, after it's done, then you can say, okay, now I'm going to run cargo fmt and I'm going to commit that. Or then you stop the session, start a new session, say "please fix all the Clippy lints", end that session, and then just run cargo fmt and commit the result. Uh, does one need to learn the lingo? What does Claude doing "Spelunking" mean?
No, this, um, the, like, blinking thing in red here is usually filled with a placeholder that's just a synonym of the word "thinking". And this is entirely a humorous thing from Claude, or from the Claude team, right? Of, like: we need to show that it's doing stuff, and we don't want to just say "thinking". We want it to be a more fun experience than that, so we're just going to pick a random new word each time, "Spelunking" being one of them. Um, and then, uh, what they've recently changed over to is, um, sometimes it will basically use another LLM to summarize its own work and put that summary of its work into the status bar here instead. Uh, seeing you using LLMs more: no longer a luddite? It's, like, yes and no, because, uh, I still don't think LLMs understand, and I don't think they're on the path to understanding either. I think, and this is the sort of controversial statement from me, I don't think I've seen LLMs get better over the course of the past, you know, let's say year, uh, in the axis of understanding. They've gotten better in other ways, like being able to use tools, having more context, uh, being able to recognize and repeat more complex patterns and combine them. But in terms of, like, understanding what's happening? Not so much. Um, and that, I think, is also why there are a bunch of use cases I don't find them useful for, because they're just not able to solve that kind of problem. Um, and so I remain a luddite in the sense that I don't think this is coming to, like, replace all of the work that we're doing. I don't think every task is amenable to being solved by an LLM, now or in the near future. Um, but I also definitely agree that, having used them more, there are some use cases that they are very good at, or very efficient at, uh, and for those it feels like a very worthwhile tool to learn, know, and use. Um, uh: as AI takes over most of the coding tasks...
I don't agree with "most", uh, but "some", sure. Uh, what skills become more important? How would you suggest improving on them? I think the skills that become, that I think actually are already important, but it becomes even more visible that they are important, are, um: critical thinking. Like, the ability to take, especially the plans, but also the solutions that come out of Claude, and think through: is this actually a good idea? Is this the correct decomposition of the problem? Is it the correct identification of the root cause? Is this where the solution should be implemented? Is this how the solution should be implemented? Um, I think the second is, actually, um, the ability to review. I think a lot of programmers are actually pretty bad at doing code review, uh, whether it's from LLMs or from other people. Uh, and that skill is even more important now, because there is a lot more to review, because you're also reviewing incrementally as it goes. At least, I think that's the sane way to approach doing so. Um, so that skill of getting good at reviews, and being good at giving actionable feedback in reviews, becomes very critical. Um, and I think maybe a third is, um, just, like, reading and understanding speed for code. So, traditionally, because you wrote all the code yourself, um, you were understanding it as you were writing it. So your understanding speed only had to be the speed at which you can write the code, because you usually understand it and write it concurrently. Um, now there'll be a bunch of code that gets presented to you that you didn't write, and now you need to reason about it. And so the speed at which you can read and review that code starts to become a bottleneck for the workflows that are particularly LLM-heavy. Um, and so, uh, I do think that that is an example of, like, skills that were all three very useful from before, but they become even more critical now.
[snorts] Um, adding correct context uh is also one of the things that it's really important. Yeah, it's true. And I think this is another thing engineers are pretty bad at is uh recognizing that the person you talk to don't have the same context you do. So you try to explain based on the context you already have and the other person doesn't understand because you haven't conveyed that context to them. NLMs struggle with the same thing. If you try to give them a problem, but you don't give them the context that's in your head, they have no hope of being able to solve the problem well. And so it forces us to get better at that kind of context transference. Um, but but again, I think this is a skill that good programmers and engineers have anyway. And so it translates very directly, but it this working with LMS makes it even more obvious if that is a skill that you still need to develop further. >> [snorts] >> Um, okay. I'm hoping this is nearing done with its uh background task because for once I actually have a hard stop to the stream. So I have a I have a thing at three which is in 40 minutes. So the stream will end in, you know, 40 minutes minus epsilon. Um, and so I'm hoping we get to do one more iteration of setting up this sort of uh critical loop for the development so that I can basically my hope right is that I can kick that off, see it go through one loop, and then uh end the stream and then just leave it running while I go do my things that happen separately. Um, I've been learning to read code faster from your videos. Uh, especially those of the Decrusted series like the Axom Crate. Those videos are real assets. I'm glad to hear. I I do actually think that like reading speed for code is like uh is a superpower in a way. It is actually really really useful to be able to just like scan code quickly and know roughly what it does and sort of spot the spot the patterns and spot the the uh the implications of the code pretty quickly. 
It was trying to solve the error source locations. I'm so interested to see what that looks like. Uh, which one is doing that? This one. I just want to see how it produces these, um, diagnostics. Yeah, see, now. So now it has a make-diagnostic helper. It points out the source and the position; the start token uses a byte offset into the original source text. Yeah, so it creates spans. Okay, great. That's going to give us much better errors. I'm very curious to see how those turn out. Um, I also, this is actually another thing that I would like it to make more use of: um, see all these, like, question marks where it just bubbles up errors? I want it to start using context to propagate information, not just about the lowest-level parsing error, but what was it doing while, uh, while doing that. So, let's go ahead and go back here. Um, move main. Actually, we'll just do it here. Um: for every, consider, um, we should be propagating additional context. Uh, I think miette has context, right, or does it not? No, too big. Uh, yeah. Uh: consider whether we should be propagating additional context, uh, that would be useful for users when the, uh, error eventually bubbles up to them. For example, uh, if A imports B and B has a syntax error, uh, it's valuable for the error to point to A's import of B. I've used LLMs in a more interactive way, like letting them work on a much smaller scope and being more in control of each line of code they generate. Yeah, if you look at the, um, the first stream I did with LLMs, where we tried to implement, um, a pretty non-trivial change to the sguaba library. It's, um, uh, a library that deals with sort of coordinate systems and coordinate-system transforms and rigid-body transforms. That one was much more like that, where I was reviewing each thing it did, because the code was fairly intricate. Um, so, yes, you could take a look at that one.
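The "propagate context as errors bubble up" idea described above can be sketched as follows. This is a minimal illustration in Python using exception chaining; the stream itself does this in Rust with miette spans, and the parser, file names, and error messages here are all hypothetical stand-ins.

```python
# Hypothetical toy "parser": files maps names to source text.
# The point is only the error-context pattern: when B fails to parse
# while A is importing it, the surfaced error names A's import of B,
# not just B's low-level failure.

def parse(name: str, files: dict) -> dict:
    source = files[name]
    for line in source.splitlines():
        if line.startswith("import "):
            dep = line.split()[1]
            try:
                parse(dep, files)
            except SyntaxError as e:
                # Attach *where the import happened* before re-raising.
                raise SyntaxError(
                    f"while processing {name}'s import of {dep}: {e}"
                ) from e
    if "record record" in source:  # stand-in for a real syntax check
        raise SyntaxError(f"invalid record declaration in {name}")
    return {"name": name}

files = {"A": "import B", "B": "record record {}"}
try:
    parse("A", files)
except SyntaxError as e:
    print(e)  # while processing A's import of B: invalid record declaration in B
```

The Rust analogue would wrap each `?` with added context (e.g. via miette's `wrap_err`-style helpers) so the chain of "what was I doing" survives to the top-level diagnostic.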
Um, all right, they finished with the agents. All 16 tests pass. Great. Uh, agent J. This sounds like Men in Black. Not a bug, grammar limitation: record body rule only allows field declaration, not named schema declaration. Documented the finding, removed unused registry parameter from walk record. Cool. Uh, added source info. Now merging. Yes. Uh: do you let your agents use the internet, especially at work with sensitive information? I saw a CVE where malicious documentation could trick an agent into exposing a secret in a seemingly innocuous curl. Um, so, I do let them use the internet in limited capacities. It really depends on what I'm trying to do. Um, at work this is less of a problem, because we have a bunch of sandboxing around the agents as well, so there's not a whole lot that can go into them. Uh, and the control over what they're allowed to access externally is also controlled. Um, so, less of a problem there. Um, but for the work that I do personally, um, it sort of depends. Like, I will let it go to documentation for things, for example, if I think it is worthwhile. And this one's tricky, right? Like, if the owners of, like, docs.rs decided to inject, like, bad prompt injections into the source code for docs.rs, that would be a problem. Like, that would make me sad. Um, and I don't have a great solution for that. I think the closest is probably to do things like, um, grab it with, like, browser automation instead, and then using, like, um, extraction: like, use the browser to extract only the visible text. Like, this is the path we would end up going down, which would make me very sad. Um, do we actually, this is an interesting question as well. Um, in main, do we... Oops. No. In source. Yeah, that's what I thought. Oh, no, we do. Great. So, we already have a test that, um, grabs the AVDL files from inside of Avro. So, that'd be avro/... Yeah, great.
So, that actually means that if the tests now all pass, it actually means that we are now able to, uh, parse all the AVDL files and produce the same result as what, um, uh, Avro itself would have done. Neat. And, I mean, we could test it, right? So, let's do cargo run. Um, actually, let's do the Java one first. Let's do the other, the idl. And I want this to be, let's do another of the namespaces. Let's do baseball, to baseball, uh, .avsc. Oh, right. Uh, and then we do cargo r, um, idl. And then I want baseball, to, delvsc. And if I do, um, jq on both, and then I diff these, what do I get? Yeah. So, the fields are in different orders. That's fine. Fields are in different orders. Yeah. So, it actually produces the same resulting JSON. Okay. So, that's not bad. So, we now pass all of the same, um, tests that the Java code has. And this is what I mean. Okay, we've been doing this now for, what, four and a half hours. What we have now is a relatively complete parser that passes the same input tests. Obviously, there's more testing that needs to be done here, right? Because just because it works on the particular test cases they have does not mean that it works for arbitrary AVDL files. But it is a pretty damn good start. Uh, so, now here's what I want to do. Uh, that was fine. Actually, let's leave the worktrees, uh, for later reuse. Uh, document that in, um, yep. Oh, jq -S. I did not know about jq -S to sort the keys. All right. And now, see, now let's see if we can go one step further. Uh, do we still have the issues directory? Okay, great. Um: triage the issues directory. Triage issues for which of them can now be removed as they are fixed. Just have it also do that. Um, the other thing that I want out of this is, you'll see there are a bunch of Clippy warnings. Uh, what if I run Clippy? What do I get? Okay. Let's, um, let's then do: hey, Claude. Yes, that's fine. Um: address all the Clippy lints.
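The trick above, normalizing key order with `jq -S` before diffing, is what makes the two tools' outputs comparable even when fields are serialized in different orders. A minimal sketch of the same idea in Python, for illustration only (the file contents here are hypothetical, and the stream itself does this with `jq -S` plus `diff`):

```python
import json

def canonical(text: str) -> str:
    # Parse and re-serialize with sorted keys, mirroring `jq -S .`,
    # so that key order and whitespace differences disappear.
    return json.dumps(json.loads(text), sort_keys=True, indent=2)

# Hypothetical outputs from the Java tool and the Rust port.
java_out = '{"name": "baseball", "fields": [1, 2]}'
rust_out = '{"fields": [1, 2], "name": "baseball"}'

# Key order differs, but the canonical forms are identical.
print(canonical(java_out) == canonical(rust_out))  # True
```

Note that this (like `jq -S`) sorts object keys only; array order is preserved, so genuinely reordered array elements would still show up in the diff.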
Just have it do that in the background. Um, and then, streams AVDL. The next thing I want now is as follows. And this is where we're going to get into a, um, bit of a combining-prompts game. So, remember, we have this prompt, which is the one we set for, uh, starting testing. Um, we have this prompt, which was a continuation of that prompt. Um, we have, oops, no, we have this, which was a continuation of that prompt. Um, and then, where's our prompt4.md? Because what I'm aiming to do here is build a, um, I want to build the sort of self-fix-and-find-problems loop. Um, but what's missing is... yeah, I think we now have all of those prompts, and now we've got to put them together. Great. Uh: adjust the files and issues accordingly. Did it not? Yeah, it did not do that. Um, and so now what I want to do is combine these prompts. So: cat prompt*.md into prompt.md. And now we're going to try to see if we can make this be a, um, a single loop. So, um, uh, we're aiming to create a self, uh, reinforcing, a self-correcting loop of development. To set up a self-correcting loop of development, um, do the following phases in order. Then, when the last phase is completed, return to the first phase again. Only exit this loop, uh, when, um, the first phase finds no new issues. Phase one, uh, is going to be this one: start many sub-agents in main, uh, each of which should run the current implementation of the idl tool on the known AVDL files and compare the output to the expected output from running the Java tool. If discrepancies are found, the agent should do a first-level triage of the observed bug and then file an issue under issues. Um, the file name should be, uh, uuidgen plus a short description. The agents should not attempt to fix issues, and should prefer to debug issues, uh, using Rust example files in examples that are run with cargo run --example. They should avoid changing the source files in src as much as possible, to avoid stepping on each other's toes.
Um, uh: run all the agents in blocking mode so that they can request permissions if needed. Um, if agents need to compare JSON outputs, use, uh, jq -S to get sorted, formatted JSON that can then be compared for equality. Phase two. Um, well, then, I guess: uh, commit all the new entries in issues using the commit-writer skill. Phase two: for each issue identified in issues... uh, this is where we want this thing to be first, so we change that, like last time. Uh: analyze the issues and check which are semantically related and which likely require changing the same files. Also triage to understand which likely need to be, uh, fixed first because they will impact many others. Based on this analysis, uh, come up with an order and grouping of issues, uh, to have sub-agents address. Uh, for each such grouping, um, run a sub-agent in one of the workspaces, uh, sorry, worktrees, in the AVDL worktrees, um, in one of the pre-existing worktrees. Check out a new branch that branches from main. Um, run a sub-agent to debug and fix that issue. And, again, run those agents all in blocking mode so they can request permissions. And, again, to debug issues, use example files that are run with cargo run --example. Um, each agent should commit its work at the end of fixing the issue, using the commit-writer skill. Uh, you should then merge, uh, all the fixes back into main after each wave of agents has finished. Uh, is there a phase three? Phase three is maybe cargo fmt and cargo clippy. Phase three: uh, fix all cargo clippy warnings and commit. Uh, run cargo fmt and commit. And then, go back to, go to phase one. If there are no new entries, do not go to phase two. Now, this will be interesting to see. Oh, what is this? Expand macros in the error module to understand the warning source. Uh, no, ignore that warning. And this is still doing stuff.
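The three-phase loop being dictated above has a simple overall shape. Here is a minimal orchestration sketch in Python, purely for illustration: every function name is a hypothetical stand-in, since in the real workflow each phase is carried out by Claude sub-agents and git worktrees, not by code.

```python
# Sketch of the self-correcting loop: phase 1 discovers issues,
# phase 2 triages/groups and fixes them, phase 3 cleans up, and the
# loop exits only when phase 1 finds nothing new. All callbacks are
# hypothetical stand-ins for agent work.

def run_loop(find_issues, plan_groups, fix_group, cleanup, max_rounds=10):
    for _ in range(max_rounds):
        issues = find_issues()              # phase 1: discover discrepancies
        if not issues:                      # exit: no new issues found
            return True
        for group in plan_groups(issues):   # phase 2: triage, order, group
            fix_group(group)                # one agent per group, then merge
        cleanup()                           # phase 3: lints/format, commit
    return False                            # gave up after max_rounds

# Toy driver: pretend issues are discovered over two rounds, then none.
backlog = [["a", "b"], ["c"], []]
fixed = []
done = run_loop(
    find_issues=lambda: backlog.pop(0),
    plan_groups=lambda issues: [[i] for i in issues],
    fix_group=lambda group: fixed.extend(group),
    cleanup=lambda: None,
)
print(done, fixed)  # True ['a', 'b', 'c']
```

The `max_rounds` guard is a deliberate design choice: an open-ended discovery phase can always surface something, so an unbounded loop risks never terminating.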
So, this is the prompt we're going to try to give Claude, to see if it's able to sort of create that self-reinforcing loop. Um, I do want to change phase one a little bit more. Um: start, each of which should, uh, attempt to identify, uh, issues with the idl, or, I keep forgetting the name of the other tool, idl2schemata, uh, the idl and idl2schemata subcommands. Uh, they can do so by investigating, creating the tests in, um, and this is where I want to look at what's in main, in avro. Yeah. Um, because there are also Java tests in there. That's what I wanted to check. Um: you can do so by investigating the tests that exist for the Java implementation in avro/, this including running on known AVDL files, uh, the Java unit tests, etc. Uh, and compare. Great, this finished. So that can go. This finished. What is cargo clippy unhappy about? Ah, that's fine, I don't care about these. I actually don't think I want it to try to fix all of Clippy, but cargo fmt... I don't want it to cargo fmt either. Phase three: go to phase one. Because, um, this will still happen in the same context, and so I don't want it to pollute its context by being forced to re-read the same source files. So I will actually leave that as is. Uh, that's fine. Yeah. Why is it wanting to change the status of all these, um... mark issues, uh, update issue status... and then this is: fix Clippy warnings. Uh: how do you navigate in tmux so quickly? Is it the default set of shortcuts? Uh, mostly. So, um, I have my prefix set to Ctrl-A instead of Ctrl-B, so it's closer. Uh, and then it's just, um, and then I can use B to go back by page and Ctrl-D to go down by page. I don't understand why it's wanting to check the diff for all these files. Just stop. Stop. Stop. Okay. Um, so now let's see how this prompt fares. So, we're going to rm, uh, rm prompt 1, 2, 3, 4 .md, but not our main prompt. Uh, and then here's what we'll do: uh, start claude, and then we will paste this.
Um, and then I will say: by the way, uh, also, a first issue to introduce is the one outlined in to-dos.md. And then we shall see if it comes up with a plan that can, uh, self-iterate. Are we initiating the singularity? Maybe, but, uh, I don't think so. Uh: is that the default Claude banner? Mine takes up more space and shows my email. Um, I don't think I have any particular settings there that would make it smaller than usual. Uh: I don't understand how multiple agents are working simultaneously. Are there issues with syncing? Yeah, that's what the worktrees are for. Um, and then the interesting bit now will be: what kind of plan does it turn our prompt here into? I think it should come up with something fairly reasonable. Like, this is arguably already a plan. Um, but it is useful to have it turn it into a more formal plan, because that usually gives a little bit more detail, and getting that detail right can be really important to get the loop to actually start spinning. Oh yeah, don't forget to tell them to have mercy on you. There's actually, I, um, there's a really cool, uh, brutal review. So, this thing, I'll put it in chat as well. Um, so this one, this person wrote a, uh, a prompt to make, uh, well, any agent, really, give you a really critical code review, um, for the latest git commit, and it's actually really useful. It tends to come up with pretty good insights. It's always a little bit too much. Um, but it's a really good starting point. Um, and so I don't want to tell the LLM to use that when it makes changes, because that review skill tends to be overly pedantic, and I don't want it to be stuck on the pedantry. Um, uh, but it can be really useful for when you write code and you want someone else to check it before you actually submit a PR, um, to your own project. The other thing we should probably add here is, uh, another, should start with, uh: add to the initial plan...
Uh: how can we improve the, uh, Rust test suite for the crate, to have a more robust test suite natively? I should arguably tell it to check to-dos.md on each iteration of the loop, so that I could keep adding things to it. Um, but given that the stream is about to end, uh, I will probably not do that. I just want to see it get to the first start of the iteration. All right, let's see. Okay, context: ports the Avro tools' IDL and scanner. Well, several inputs are untested. There are four known issues, and to-dos.md identifies a missing error-context-propagation concern. Um, the goal, you can't see this behind here, um, the goal is a self-correcting loop: find issues, fix them, verify, repeat until clean. Launch six sub-agents in parallel, all blocking, all in main. Each agent investigates a specific area by running the Rust tool via cargo run --example and comparing against the Java golden files using jq -S. Uh, agents file issues under issues but do not fix the source code. Um, so, one problem I can already see is that, for phase one, it is outlining what the current issues are, but the intent is that it should over time find new issues. Um: read all issues, triage by dependency order, group issues that work on the same files, assign each to a worktree, merge, and return. Okay. So, um: on subsequent iterations, um, we'll need agents to identify new issues, uh, rather than just recheck the same ones you've listed in the plan. Uh, you can think of the current phase-one issues as, uh, a seed for the first iteration, but on subsequent iterations phase one becomes, uh, a more open-ended discovery and exploration phase for finding new ones. No matter if this works, I still think the fact that we can get it to this point in a few hours is fascinating, right?
Um, and this is what I mean by: this feels like the kind of thing, and I think this is, you know, turning out to be true, a thing where, um, the LLM is really good at this kind of task. Because a lot of it is transliteration of code from one language to another. Uh, and we have a good reference, we have a spec, we have a large number of test files where it can actually check another tool for exactly what the output should be. Um, and so it has this ability to drive itself. And so, even though, you know, would it be faster than if I wrote this code myself? I genuinely think so. Um, I also think it's fine if the code is slightly suboptimal compared to what I would have written, because it also meant that I got to spend five hours to build this tool rather than spending a week to build this tool. Uh, and that difference matters if the end result is they both produce the right output. Um: first iteration, subsequent iterations: on each re-entry to phase one after the fixes have been merged, the emphasis shifts to open-ended exploration rather than rechecking the same areas. Strategies include, um, yeah: fuzzing, stress tests with handcrafted AVDL files. Great. Clear context. Auto-accept edits. Run. And now we'll see it run, and hopefully, uh, yeah, it's a very well-defined problem, and one that has a very good, um, it has a very, like, um, on-rails iteration loop. Like, you kind of know what to do each time; you know where you went wrong. Uh, and that, it gets really good at. Oh, I forgot one thing in the plan, which is: did I write that they should commit? Yes. Okay, I did. Good. All right. I just want to see it. Uh, this is one where I'm, like, I don't know whether I want to allow cargo run, because cargo run means it could, in theory, build a program that does something bad and then just run it, and I've given approval for it.
So I tend to not allow cargo run, but the same problem exists for, um, uh, for cargo test as well, right? You can put arbitrary code in the tests. All right, fine, we'll just allow cargo run. Um: I'm looking at what plans like Claude Code Max cost currently, and what the corresponding tokens over the course of a month cost in the real world. I'm still questioning the future feasibility. Yeah, I think one of the big questions here is whether we'll get inference to get significantly cheaper, ideally local. Um, and I don't know yet. Like, this feels like it should be doable, but I couldn't tell you. Um, and it's also true that, as someone pointed out in the chat, like, there are a lot of problems that are not as well-defined as this one. And in those, LLMs can maybe help with part of the task, but not the whole task. Or they can help you iterate on the task, like by doing reviews or, or, uh, helping you generate tests, but, again, you can't just trust their tests to be sufficient on their own. Um, and so: use them in an intelligent way. Use them as you would a power tool. Don't use them as a "I'm only using this thing from now on". It does not work that way. Um, yeah, let's check what ccusage ended up at. Okay, so today: $73. So, actually, Max was not worth it, according to this measure. Max was not worth it because I've used less than the, the Max amount. It will be worth it if I use LLMs more over the course of the coming month, which I probably will be. But, like, for this session alone, Max is not worth it. Uh, let's see. What is it? Why did it decide to run them in non-blocking mode when I told it to run them in blocking mode? Agents are just stuck waiting on permissions. I told you to run them in blocking mode. Make sure that's reflected in the plan going forward. Why? Why? Why? I was very explicit. Oh, but maybe it didn't make it into the plan.
Well, I think I've at least conveyed the idea, right? The next step, now that we have a command that we've seen actually produces the same JSON for at least some of the AVDL files, is to just keep letting that loop iterate, uh, and then reviewing what comes out of it, uh, making sure it iterates again, and then gathering more and more of those input test files. Um, and given we're now at four minutes to my next meeting, I have to actually stop the stream. Um, but what I'll do is I'll push this to GitHub. Um, I'll upload the video to YouTube, as always. And then, you know, if you're watching this after the fact, you should be able to go to the GitHub page and see what changes have happened since this last commit. Uh, last commit is, git rev-parse HEAD: that commit. Um, so you should be able to see what has changed since the stream, and where the agents got to, uh, and what commits have been contributed. And hopefully it'll actually be a useful tool by the time you go look at it. Um, cool. Thank you, everyone. Uh, I hope that was interesting, I hope it was fun. Uh, I promise not all my streams will be prompting LLMs. We will also do, like, writing code ourselves. It's just that this was a good example of somewhere where LLMs are actually really useful. Uh, and there are others where they are not, and we will see some of those too. Uh, thanks for coming out, thanks for watching, and I'll see you next time.

Video description

Some of you may be familiar with Avro, the Apache take on Google's Protocol Buffers. Where Protobuf has just one file format, `.proto`, Avro has *three*. Two of these are JSON-based. The first, .avsc files, are used for "schemas", which are like Protobuf message types. The second, .avpr files, are for "protocols", which are like gRPC service declarations. The third file format, .avdl, uses an Interface Description Language, or IDL ( https://avro.apache.org/docs/1.12.0/idl-language/ ), which is intended for humans (as opposed to machines) to read and write, and looks more like a .proto file.

Avro comes with a tool that converts those IDLs into JSONs. That tool is written in Java and maintained by the Avro folks, but also seems to have stagnated somewhat. In particular, it produces *really* unhelpful errors, to the point where I've heard of people spending an hour chasing down a misplaced comma. My sense is that this is probably not *too* hard to fix in the Java version, but after digging a little I discovered that the parser behind this tool uses the ANTLR parser generator ( https://www.antlr.org/ ). ANTLR supports code generation to many languages, *including Rust*! And you know what that means: let's try to port it to Rust using something like miette ( https://docs.rs/miette/ ) for errors, and see how good we can make it!

Since we a) have access to the existing Java code and b) there's an infinite supply of tests (the same IDL passed to the Java tool should produce the same JSON), this is also a perfect candidate for powercoding (LLM + review the code), so we decided to see if we could get a complete substitute up and running in just four hours 😅

The resulting repository can be found at https://github.com/jonhoo/avdl
Live version with chat: https://youtube.com/live/NqV_KhDsMIs

00:00:00 Introduction
00:03:36 What is Apache Avro?
00:13:44 Setup and trying out the Java tool
00:23:25 ANTLR parser generation
00:46:07 Planning the implementation with Claude Code
01:46:10 Reviewing AI-generated Rust code
02:33:30 Making the LLM iterate
02:44:38 Setting up for a multi-agent workflow
02:57:56 Seeding a decent CLAUDE.md
03:03:08 Parallel issue identification and fixing
03:38:26 Making worktrees work
03:46:22 Brainrot, FOMO, and LLM use
04:16:37 Designing a self-correcting agent loop
04:53:14 Wrap-up & next steps
