bouncer

Java · 4.6K views · 115 likes

Analysis Summary

20% Minimal Influence

“Be aware that while the technical advice is highly practical, the presentation naturally favors the LangChain4j ecosystem and Oracle's integration over alternative architectural patterns or frameworks.”

Transparency Transparent
Human Detected
100%

Signals

The video is a live conference presentation featuring a known industry professional with natural, unscripted speech patterns and direct audience engagement. There are no indicators of synthetic narration or AI-generated visual packaging.

Natural Speech Patterns: Transcript contains filler words ('um', 'uh'), self-corrections, and spontaneous audience interaction ('raise your hands').
Contextual Awareness: Speaker references the specific event (Devoxx Belgium 2025), her previous keynote session, and real-time audience reactions.
Personal Identity: Speaker identifies as Lize Raes, mentions her role at Oracle and LangChain4j, and provides personal contact preferences (LinkedIn).
Technical Nuance: The explanation of RAG shortcomings and specific product comparisons (Cursor vs. Monday.com) reflects professional experience rather than a generic script.

Worth Noting

Positive elements

  • This video provides high-quality, actionable code examples for solving common RAG issues like query ambiguity and retrieval noise specifically for Java developers.

Influence Dimensions

About this analysis

Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.

This analysis is a tool for your own thinking — what you do with it is up to you.

Analyzed March 13, 2026 at 16:07 UTC Model google/gemini-3-flash-preview-20251217
Transcript

[Music] Hi everyone. Thanks for coming to my session, the last session of Devoxx this year. I'm sadly not as cool as the intro, but I still hope to bring you cool agent stuff. I want to keep this session a bit spontaneous, so feel free to raise your hand, and raise it higher if I don't see you. I'm happy to dive into your questions and check some code with you if you want. Today I want to share with you a whole bunch of tips and tricks to make the AI application you may be trying to build actually work, actually do what you want. The first thing is this very interesting article where some researchers (it's not really large-scale, rigorous research) looked at a couple of AI products and which ones go viral and which ones flop. They saw that, of course, the more value you get out of a successful run of an AI app, the more people want to use it. But very importantly, also the consequence of errors: if the consequence of errors is high, you can forget it in AI apps, because they will make errors, as we all know. And the effort to correct: if you can super easily roll back, people are going to be much more inclined to use it. That's why a program like Cursor, where you can always roll back if it introduced errors, has much more success than, for example, Monday.com's AI integration, where it just creates new tickets, and it's a real mess if you actually didn't want them, because you cannot just roll back. That's something to keep in mind when you build your AI app: make it rollback-able. Apart from that, I'm going to share a bunch of tips and tricks with LangChain4j. Before I do that, I'd like to ask: raise your hands, who was not in my keynote? You were all there. Wow. Okay, then I'm going to skip all the slides that explain what LangChain4j is about, and I hope these three people or so kind of know. Good. So, I'm Lize Raes.
I've been a collaborator on the LangChain4j framework since almost the start, and recently I also joined Oracle for Java and AI in a broader sense, including machine learning and AI-powered applications. So I'm always very happy to talk to you, or for you to come complain to me; that's my favorite, if you're missing things in the ecosystem or want to put things on my radar. If you want to contact me, the best way is via LinkedIn. I just accept any invite and then you can write to me. These are the slides I'm going to skip, because we all saw them. So let's dive right into RAG. Who has tried RAG already? The thing where you load documents. Okay, that's quite some people, say 30%. Who was disappointed? Yeah, okay, still also quite some people. So basic RAG is where you want to add relevant info to your user question. This relevant info comes from your company documents or internal sources the model had not been trained on before, so it can answer questions about much narrower topics, like your own company, without retraining the model. You send the user question plus these extra pieces of information from your sources to the model, and it can give an informed answer. The idea is very good, but the way basic RAG was hyped didn't work all that well. It worked by chunking documents into pieces and then creating vectors, so you could find chunks that mean something similar to the question, and those were added to the prompt. But it has a lot of shortcomings: it doesn't find all the pieces we want, it finds many chunks we didn't want that are also sent to the model, the model gets confused and answers really weird things that have nothing to do with the question anymore, and so on. That's why I wanted to show you that there is advanced RAG.
It's a bit more complex, but I'm going to explain all these pieces, why they are important, and how they're going to solve many, many of the problems with basic RAG. To start with, what we see on the left is the query transformer. I'm going to show you why you actually need this. If we have time at the end of the session, I'll show you an AI drug discovery assistant, a pretty complex agentic system, and this is a little output of it. On the left you see what's going on in my chat, and on the right you see what's going on in the logs; I find it interesting to see those two in parallel. At some point it asked me, shall we proceed to the next step where we find antibody characteristics and so on, and I said yes please. And then I saw a piece come by in the chat that says 'a juicy stew of personal stories, funny bits of sex and love and friendship'. And I'm like, what? What was in my documents? Why did it find this? Then I checked: it was using web search to retrieve extra content, and it searched for 'yes, please'. That was of course not what I intended. That's where query transformers come in. A compressing query transformer takes in the memory, and instead of just searching for 'yes please' it rephrases what I actually want to search, which would be 'find characteristics for this and this antibody'. With it, this wouldn't have happened, so it's really important in production. An expanding query transformer does something similar: taking in the history, it transforms the question into multiple queries, so that hopefully at least one of them yields the results you want. So definitely use these when you're in a chat and you're going to use RAG. The next part is the query router and the multiple content retrievers. You don't have to stick to chunked documents.
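The compressing-query idea above could be sketched in plain Java like this. This is illustrative only, with a stubbed model and invented names, not the actual LangChain4j `CompressingQueryTransformer` API: the point is just that the rewrite prompt carries the chat history, so a bare "Yes please" becomes a standalone search query before it reaches any retriever.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative sketch of a compressing query transformer (not the real
// LangChain4j API): rewrite the latest user message into a standalone
// search query using the conversation history, before retrieval happens.
public class QueryCompressor {

    private final UnaryOperator<String> llm; // stand-in for a chat-model call

    public QueryCompressor(UnaryOperator<String> llm) {
        this.llm = llm;
    }

    /** Builds the rewrite prompt from the history plus the latest message. */
    static String buildPrompt(List<String> history, String latestUserMessage) {
        return "Rewrite the user's last message as a standalone search query, "
                + "using the conversation so far.\n"
                + "Conversation:\n" + String.join("\n", history) + "\n"
                + "Last message: " + latestUserMessage;
    }

    public String compress(List<String> history, String latestUserMessage) {
        // Without compression, a bare "Yes please" would be sent to web search.
        return llm.apply(buildPrompt(history, latestUserMessage));
    }
}
```

An expanding transformer would be the same shape, but the stubbed model would return several query variants instead of one.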
You can use database search, you can use graph RAG, you can use web search as you saw in the other example, and you can use those in parallel. You'll need a query router, which also typically has an LLM, unless you find nice rules to say, okay, when it contains these and these words, send it there. So you can mix and match with LLMs where you want or not. But if you use an LLM, this query router will both decide where to send the query and then rewrite it, because a SQL query is of course very different from something you'd try on a web search. I want to show you some code for the SQL content retriever, for example, and mainly I want to point you to where you can find the examples. You have the langchain4j-examples repository; the link is up there. If you go there, next to the agentic tutorial that we're going to look at later, we have RAG examples, with folders for easy RAG, naive RAG, and advanced RAG, and that's where I want you to look. Sorry, I cannot zoom in on this, but you'll have to believe me, and you can try it at home. All the interfaces I've been showing you and will show you are listed there: query compression, query routing, and so on. The last one here is a SQL database content retriever. What you do is, you have a JDBC data source, and you add it to your content retriever just like this.
LangChain4j will take care of giving the schema to the content retriever, saying, okay, you can query this database, here's the schema, here are all the tables. Most models are actually very good at translating human language to SQL when they know the schema, so this works pretty well. It's also pretty dangerous; you'll see here on top, never use this in production, because you are actually accessing your database, or at least put some serious checks in place before using it. So these are a couple of different ways to retrieve content, and you can make your own implementation. All these great things are interfaces, and if you have a specific API, or even an MCP server, to find context or documentation, you can plug them all in. I worked at a company before that did company-wide search (it was called Naboo) over your codebase, over all your tickets, over Confluence documents, and that's very interesting. They did not use much of this RAG; they did actually very smart things like expanding synonyms: they used LLMs to generate more synonyms and just went looking for those words in the documents. For example, I've heard of somebody who pre-processed the documents by asking what questions would probably be asked about each one, and then his RAG worked like: I have a question, I retrieve the similar questions, and I already know which pieces those questions point to, and those are the pieces I'm going to add. So just be smart about this based on your use case; you can do so much more than just chunk and find. Then we get to the content aggregator. Either you take the default content aggregator, which just means we slam all these chunks one after another (by the way, the green things are the default implementations you can use out of the box). Important: we have a re-ranking content aggregator that uses a tiny scoring model.
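The rule-based alternative to an LLM-backed router, mentioned above ("when it contains these and these words, send it there"), can be sketched in a few lines. The keywords and target names here are made up for illustration; a real router would also rewrite the query per target (SQL vs. web search syntax):

```java
import java.util.Locale;

// Hypothetical rule-based query router sketch: decide which content
// retriever a query should go to, without involving an LLM at all.
// Keywords and targets are illustrative, not LangChain4j's API.
public class SimpleQueryRouter {

    enum Target { SQL, WEB_SEARCH, VECTOR_STORE }

    static Target route(String query) {
        String q = query.toLowerCase(Locale.ROOT);
        if (q.contains("orders") || q.contains("revenue")) return Target.SQL;        // structured data
        if (q.contains("latest") || q.contains("news"))    return Target.WEB_SEARCH; // fresh info
        return Target.VECTOR_STORE; // default: chunked company documents
    }
}
```

When the rules get too fuzzy to write down, that is exactly the point where the talk suggests swapping in an LLM-based router instead.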
It's also AI, but it's not an LLM, and it's extremely good and extremely fast at answering one question: given this user question, how relevant is this chunk of information? It gives each chunk a score, and then you can throw out everything that's not relevant. I can really recommend this, because otherwise your model gets confused by all kinds of junk. Say the web search threw in that answer I found earlier; I really want to kick that out before passing all this content on to the model. The last thing is the content injector. This is a bit of a flaw in LangChain4j; I've tried to get it changed, but we didn't because of backward compatibility. The current content injector takes the question, says 'use all of the following information', and then pastes all the chunks. The problem is that the model will indeed use all of the following information, and that's typically not what you want. So if you go to production, or want proper answers out of this, I'd recommend you override this standard method with something like 'if relevant to the question, you can use the following information'. Good. Another tip: the way RAG is currently implemented in LangChain4j is one-shot. You take your memory and your last question, it is transformed into a query or multiple queries, content is retrieved, and that's it. But very often you retrieve chunks and more questions come up, or you only had half of the answer and want to re-query, or, say, given that the weather turns out to be such-and-such, you now want to ask a follow-up question based on that weather. That's why I would say: wrap your RAG in a tool annotation.
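The softer prompt wording recommended above can be sketched as a plain string-building function. This mimics what a custom content injector would produce; it is not LangChain4j's `ContentInjector` interface, just the template idea:

```java
import java.util.List;

// Sketch of the recommended override: instead of "use all of the following
// information", tell the model to use retrieved chunks only if relevant.
// Plain Java, illustrative only.
public class SofterContentInjector {

    static String inject(String userQuestion, List<String> chunks) {
        StringBuilder sb = new StringBuilder(userQuestion).append("\n\n")
                .append("If relevant to the question, you can use the following information:\n");
        for (String chunk : chunks) {
            sb.append("- ").append(chunk).append("\n");
        }
        return sb.toString();
    }
}
```

The only difference from the default is one sentence of prompt text, but it changes whether the model feels obliged to work every chunk into its answer.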
Then, for the model, it becomes 'here you can search in our internal documents, or about the weather, or whatever', and the model will call that tool as many times as it wants; it will just re-query if it finds it doesn't know enough yet. Then some things that matter very much for production. Typically you don't want hallucinations, and you want to be careful with giving LLM answers back, so often you'll want to give actual phrases from your documentation back. You can save resources, like which original file a chunk comes from, in the metadata that is stored in the vector store (or anywhere else), so that you can retrieve the original document and, for example, highlight or return those pieces and refer to the document. The other thing is read permissions. If you build a very big system, not every user is allowed to see every piece of information. Access rights are also something you can put in this metadata, and then when a user queries, you check with the user ID which of the pieces they are actually allowed to see, and you put a hard filter on that: a code filter. You don't say to the LLM 'please don't tell them about this and this', because that's not going to work; you filter out those fragments the hard way. Good. Another thing is tool calling. We've had a lot of extensions beyond basic tool calling. Quick recap: we can expose tools to models, and whenever we send a question, all the tools we make available are sent along, basically saying: here are the tools, or functions, you have; this is the name, it does this (with a description), and here are the input parameters you'll need. Then the model may decide it needs one. If somebody asks about the weather and it sees a tool to find the weather for a specific city, it will probably call it with whatever city we're in. And we've made some nice progress.
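The "hard filter in code" for read permissions can be sketched like this. The `Fragment` record and the `allowedGroup` metadata key are invented for illustration; the point is that filtering happens in plain code on the metadata, before anything reaches the prompt:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hard filter on access rights stored in fragment metadata, as described in
// the talk: fragments a user may not see are dropped in code, never via the
// prompt. Record and metadata key names are illustrative.
public class PermissionFilter {

    record Fragment(String text, Map<String, String> metadata) {}

    static List<Fragment> visibleTo(Set<String> userGroups, List<Fragment> fragments) {
        return fragments.stream()
                // keep only fragments whose allowedGroup the user belongs to
                .filter(f -> userGroups.contains(f.metadata().get("allowedGroup")))
                .toList();
    }
}
```

Asking the LLM to withhold the other fragments would leave them in the context window, where a determined prompt can still extract them; dropping them here makes the leak impossible.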
I'm going to show you that in code. This is currently not in our tutorial repo, but I plan to add it in the coming weeks. Return immediate is a nice new thing we have. Depending on how you design your systems, you may not be interested in the LLM response but directly in the tool response, or you may be interested in nothing at all and just want to fire the tool. I've very often built systems where my LLM is kind of a dispatcher, and I want it to, say, update a state in the database. I don't care about the answer, and I don't care about the LLM telling me the update was successful. If the update was not successful, I want a hard fail, because it means something is off with my database, and then I don't want the LLM to tell the user; I basically want to quit the program, log it, and inform support. For those cases we have return immediate. In this example we have a tools class, which for the demo is just an add method, annotated with @Tool so we can use it, and we set the return behavior to immediate. What this does is, after the model places the tool call, we do not call the model back with the answer. It saves you tokens and it saves you time. We get back a result object, and we can get the content out of it. Here I want to show two things at once; I'm commenting this part out for now, this is the one we want to check. We're going to ask this AI service, which has our tools, to add 284 and 42. I'll talk about the other part later. Also interesting: you can set a maximum of, say, three sequential tool invocations. It avoids the system getting into those loops that we sometimes see in Cursor, etc.
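The control flow behind "return immediate" can be sketched without the framework. Here the model is a stub that picks a tool and arguments; the key line is that after executing the tool we return its result directly instead of making a second, token-consuming call back to the model. All names are illustrative, not the LangChain4j API:

```java
import java.util.Map;
import java.util.function.Function;

// Sketch of the return-immediate pattern: once the model has chosen a tool
// and its arguments, execute the tool and stop. No second LLM round trip.
// Stubbed model; names are illustrative.
public class ReturnImmediateDemo {

    record ToolCall(String toolName, Map<String, Object> args) {}

    static int add(int a, int b) { return a + b; }

    static Object execute(ToolCall call) {
        if (call.toolName().equals("add")) {
            return add((Integer) call.args().get("a"), (Integer) call.args().get("b"));
        }
        throw new IllegalArgumentException("unknown tool: " + call.toolName());
    }

    static Object runReturnImmediate(Function<String, ToolCall> model, String userMessage) {
        ToolCall call = model.apply(userMessage); // model decides which tool to call
        return execute(call);                      // ...and we stop here: no call back to the model
    }
}
```

Because the tool's return value is deterministic, the caller gets an answer within the range it expects, unlike an LLM paraphrase of the tool result.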
Then we call it, we say do this addition, and what we get is this Result of String, and we can get the tool executions out of there, get the result object, and since we know it's going to be an int, we can... oh man, I've been forgetting this word all the time... cast. Thank you. I don't know what it is with 'cast', I keep forgetting it. So we cast it, and voilà, it didn't go back to the LLM. When we run it, you'll see it gives a kind of deterministic answer. Depending on what you're trying to do, at least your tool gives you an answer within the range of what you expect; that's very different from an LLM answer that still does something with it. That's that one. Something else we have is dynamically adding tools. If you use MCP servers, you very easily end up with 20 or 30 tool descriptions that are sent to the model with every request, which consumes a lot of tokens and also often confuses your model: if there are too many tools, it starts trying weird things. So we can filter those down, and we do it as follows. Instead of just adding tools to our AI service, we add a tool provider. In this tool provider we have basically an if statement: in our case, if the user request contains 'booking', then we add a get-booking-details tool and a cancel-booking tool, and if it's not about bookings, we don't add those at all. It's sadly pretty verbose how you then specifically add them; I hope we can make this a bit less verbose. But just so you know, it's possible to restrict the tools. You can restrict them based on the user ID, if there are tools that should not be executed by certain users, and so on. I'm not going to run the demo, but basically, if you ask 'what's the weather like today?' and look in the logs, no tools are added.
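The if-statement at the heart of that tool provider is tiny once you strip away the framework plumbing. A minimal sketch, with invented tool names matching the talk's booking example:

```java
import java.util.List;
import java.util.Locale;

// Sketch of a dynamic tool provider: only expose booking tools when the
// request is actually about bookings, so the model is not confused (and
// tokens are not wasted) by unrelated tool descriptions. Illustrative only.
public class BookingToolProvider {

    static List<String> toolsFor(String userRequest) {
        if (userRequest.toLowerCase(Locale.ROOT).contains("booking")) {
            return List.of("getBookingDetails", "cancelBooking");
        }
        // e.g. "what's the weather like today?" gets no booking tools at all
        return List.of();
    }
}
```

The same gate could key on a user ID instead of the request text, to keep dangerous tools away from users who should never trigger them.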
But if you ask 'can you show me booking such-and-such', then you'll see the booking tools added. The last thing I want to show, and I love this one, is that we can annotate AI services as tools. You can use an AI service as just another tool, and the model that uses this tool, which is itself another AI service, won't even know there's AI involved there. This one says it answers questions regarding bookings; that's all my invoking AI service will know about it. Internally it's an assistant: there's an LLM in there, there are tools in there. And I want to show you that part still: here we also add this assistant, with its dynamically added tools via the booking tool provider we made before. So we add this whole AI service as a tool to the new AI service. If we then ask to cancel a booking, it will route it; I can happily run it. So it also has this add method here, but it also gets the booking methods, because we have 'booking' in the question. This AI service knows: oh, I have a tool that's going to help me with bookings. It passes the question on, and then the next one adds these booking tools. Yes: the booking has been successfully cancelled. And I log these tool calls just to show that they're definitely called. Voilà, that's some tool magic, extended uses of tools that you can mix and match for your use case. Again, if you have questions, feel free; otherwise I'll assume I'm explaining it perfectly. Yeah? >> How certain can you be that a tool is called? >> Okay, the question is: how certain can you be that a tool will be called?
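The nesting trick can be sketched with two stubs: an outer dispatcher model and an inner "AI service" that, from the outside, looks like any other tool. Everything here is invented for illustration, including the `USE_BOOKING_TOOL` routing token; the real mechanism in LangChain4j is declarative, via annotations:

```java
import java.util.function.UnaryOperator;

// Sketch of exposing a whole AI service as just another tool: the outer
// dispatcher only sees "answers questions regarding bookings" and never
// knows another LLM (with its own tools and memory) sits behind it. Stubs.
public class NestedServiceDemo {

    // Inner "AI service": in reality an LLM plus booking tools; here a stub.
    static String bookingAssistant(String question) {
        return "Booking 123 has been successfully cancelled.";
    }

    static String dispatch(UnaryOperator<String> outerModel, String question) {
        // The outer model either answers directly or decides to delegate.
        String decision = outerModel.apply(question);
        return decision.equals("USE_BOOKING_TOOL") ? bookingAssistant(question) : decision;
    }
}
```

The payoff of the pattern is encapsulation: the inner service can swap its model, tools, or memory without the outer dispatcher noticing.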
You absolutely cannot be sure; you have to keep that in mind, sadly. But typically a good model, say GPT-4o mini, which I mostly use in my demos, will rather call a tool than skip a tool, even if it could do the task perfectly by itself. Like addition: to a certain degree it can do it by itself, but if it sees a tool, it will typically call it. If you're working with local models that are much smaller and much less smart, they're not very good, especially at tool calling. They will either skip the tool, call it too many times, or call it with fake parameters, and most of them can also only call one tool, whereas the smarter models can call again and call things in parallel. So this is very model-dependent, and yeah, as with everything with LLMs, it's never 100% sure. All right. MCP servers were also added to LangChain4j, about six or so months ago, or a bit more. They are all the hype; I like them and I hate them at the same time, so I'll tell you what's good and what's bad about them. Before, when you wanted to get information from any service or API, I found it kind of hard: to get information from your database, you need a runtime that connects to it, and you need to learn SQL; to talk to any Google service, you have to get into their stuff; it's really hard to set up. To push to GitHub, you have to know Git, you have to do SSH, and so on. You need a new syntax and often also a new runtime for all these things. What MCP did was: hooray, it just takes care of that runtime and that syntax for you, and makes it so an LLM can get anything done on your database, on your Google Docs, or on your weather API, whatever you have an MCP server for. So the good thing is it became easy. The kind-of-bad thing is that now we have an LLM in the middle that makes it very nondeterministic.
If you want to know more, there's this link, modelcontextprotocol/servers; it's from the Anthropic folks, and I wanted to show it for a bit of inspiration. If you go there, ignore the reference servers, ignore the archived ones, and go to the official integrations, the third-party servers, because those are companies that expose their own MCP server, so you can trust them a bit more than when some other developer says 'here's a tool' and you just don't know what's going on inside it. You have a huge, huge list. Let me scroll; we have Elasticsearch in there (I'm just remembering my alphabet order). So yeah, you can query your data in Elasticsearch, but from natural language, or an LLM can decide to do it itself: based on whatever task you gave it, if it finds it should go query, it will go query. We have GitHub and GitLab in there. It's all very powerful, very interesting, and a bit dangerous: I can just say 'create me a new repo', and I suppose also 'delete my repo', and it will just do that. Okay, so be aware. Oh yes, this is one I wanted to talk about too: Context7. It's an extremely popular MCP server that you might want to add to your coding agent. What it does is give the most up-to-date documentation about pretty much any framework you'd want to use while writing code, including anything outside the training data. Very cool. So somebody writes these access methods once, everybody can reuse them forever, and you can access pretty much any service by now in natural language; that's the cool part. The less cool part is what is actually inside these MCP servers. If it's an official integration, you can trust it a bit more; if it's a non-official integration, go through that code, because you're typically exposing keys to your GitHub repo or to your emails or whatever. So it's really tricky.
There are so many write and delete functions where often you just want to read; that's a problem. And if you go ask legal whether you can integrate MCP servers in your tech stack, good luck with that, depending on what company you're working for. Something that's really helpful is that in LangChain4j you can filter down what methods you actually expose to your LLM. This is a GitHub MCP client, and we say we only expose get issue, get issue comments, and list issues. No deleting or anything else. So know that this is available; you're not obliged to deal with all the delete and create things. Good. Then, agentic patterns. Since almost everybody was in the keynote, I'm going to skip this; we just have these three types of agents: the real LLM agents; plain code that we can annotate as agents; and asking for human feedback, which we can also annotate as an agent, so that we can use them all together as pieces in workflows. Sequence, parallel, loop, and conditional are integrated, or you have self-orchestration. And to remind you: you can combine and mix and match from both, and any combination, a sequence or a parallel or whatever, is again an agent that you can use inside another sequence or loop, whatever you want. This example we've already seen in the keynote, so we'll skip it. So, are agents the new hype? Are agents the new, very wasteful, resource-hungry thing that does nothing useful? According to me, agents are a solution to all the problems we faced before, or many of them, because they allow you to decompose things that were too complex for an LLM.
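Whatever the exact framework call looks like, the read-only filter described for the GitHub MCP client boils down to a whitelist over tool names. A plain-Java sketch (tool names mirror the ones named in the talk; the filter itself is illustrative, not the LangChain4j MCP API):

```java
import java.util.List;
import java.util.Set;

// Sketch of filtering an MCP server's tools down to read-only methods:
// expose the gets and lists, never the deletes or creates. Illustrative.
public class McpToolFilter {

    static final Set<String> ALLOWED = Set.of("get_issue", "get_issue_comments", "list_issues");

    static List<String> expose(List<String> serverTools) {
        // Everything the server offers that is not whitelisted is invisible
        // to the model, so it can never be called, by accident or otherwise.
        return serverTools.stream().filter(ALLOWED::contains).toList();
    }
}
```

A whitelist is safer than a blacklist here: a server update that ships a new destructive tool stays hidden by default.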
Not just decompose the tasks and make sure they're executed in the order you want, with checks in the middle to see, okay, is my state actually correct before I even continue. On top of that, you can limit the tools per task, so you don't have one big service that has access to all the tools, calls them at random moments, and is impossibly confused. On top of that, if you decompose your workflow enough, you get much easier tasks, and sometimes you don't need a full-blown big LLM for them. For some tasks you could train a little model that answers things like: here is what the model proposed, here is what the user answered, are they actually done, was a conclusion reached? If yes, you just continue in your workflow; if no, you keep looping in that conversation. All things like that, I think, will in the end make this less resource-wasteful, with much smaller models, much more control, and also testable chunk by chunk, much better than when one big service has to solve all your problems. So if you're trying to design an agentic system (and I would say, whatever AI-powered app you design, consider making it an agentic system), I want to give you 14, actually 13, steps on how to go about it, to give you the best chance of not wasting your time and actually ending up with a product that works controllably and does what the users want it to do. First step: it's not 'let's throw AI at everything'. Go see with the stakeholders what problem you are really trying to solve, how much value it will bring, and whether that value justifies the effort you're going to spend. Because if you build an agentic system, you're typically going to have much more work than when you just take one big AI model and say, now do all of these things for me, here are all my tools, and you're done. But that's just not going to work.
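The "any combination is again an agent" idea from the keynote recap composes naturally if you model an agent as a function from state to state. A minimal sketch, with agents as `String -> String` steps (purely illustrative, not the LangChain4j agentic API):

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch of agent composition: a sequence of agents is itself an agent,
// so it can be nested inside another sequence, loop, or parallel combo.
// Agents are modeled as String -> String steps; illustrative only.
public class Workflow {

    static UnaryOperator<String> sequence(List<UnaryOperator<String>> agents) {
        return input -> {
            String state = input;
            for (UnaryOperator<String> agent : agents) {
                state = agent.apply(state); // each agent transforms the shared state
            }
            return state; // the whole sequence behaves like one agent
        };
    }
}
```

Because `sequence(...)` returns the same type as its parts, a sequence can be dropped into another sequence (or a loop, or a conditional) without any special casing, which is exactly why decomposed workflows stay testable chunk by chunk.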
It's not going to work controllably or reliably. If you want to set up a proper agentic system, you're going to be doing much more design, because you're going to be much more in control. It becomes like writing real software, which also really takes time to design, work on, and maintain. Good. For the example here: we're going to try to help HR with some hiring process tasks, so they can concentrate on keeping the employees happy and engaged instead of making us fill in files all the time. Once you know your problem and you're sure you'll bring a lot of value if you manage to pull it off, go and look with the stakeholders at what the current process is, and map it out with them, not what you think it is, because they will have to use it in the end. Let's say, for this one: they receive a CV. HR does a first scan to see if it's anything usable; if no, it's rejected. If yes, the manager has a quick look at whether this person actually has roughly the skills that are required; if no, reject. If yes, HR is probably going to place a phone call to ask about salary expectations, relocation, whatever; if that fails, reject. If not, we go on to the proper review by the manager and, ideally, the team, who together give their conclusion to HR, who will either reject or schedule an on-site interview with the candidate. That's a back-and-forth to find the right date and schedule the meeting, and then, voilà, you have a meeting scheduled. If this is the system we want to build with AI, we have to see which of these parts we can replace with an agent, or at least have an agent help the humans with. And then I really recommend you already go check with legal, because there are many, many reasons why legal will not want to allow an AI-powered app. For example, the Accessibility Act. A nice thing with AI is that it actually makes accessibility easier, if you design it well.
There's always GDPR: can we even take this candidate's CV and send it to OpenAI? I bet not. So you'll need a local model to do this, and local models are typically not good enough, so you'll need to buy bigger hardware just for this and run a bigger open-source model that can actually handle it. And then your CEO is probably going to say, no, that's not worth it for this use case. These things I really want you to find out early. Then there's the EU Artificial Intelligence Act, which will tell you that scoring applicants for job openings is a high-risk-category activity, because you're gatekeeping and you risk discrimination, and I think it's rightly in the high-risk category. But if you want to use this in production, you'll need certifications and you'll have to give guarantees that you do everything to avoid discrimination, and so on and so on. It's not fun, and the answer is probably going to be 'no, you cannot do this', or you'll have to scale it down seriously. But still, rather find that out early than after you've done a lot of work. So, when you have permission to do at least parts of your workflow with AI, you chunk your system (which, in this case, the way I've written it out, is actually already chunked), and then we see where we can add AI in this flow. The first part would be the first HR scan: is this person even living in a territory we're allowed to hire in, do they have the required amount of experience, and so on. That can, in my opinion, be done pretty well by an AI that either does it entirely or informs HR: hey, there are these and these flags, or, everything is fine, so HR has a bit less work figuring all that out. The manager scan, same thing: does this person have the skills for the job? That's also pretty obvious for this first scan. The telephone interview itself, that's of course still going to be a person.
I'm just thinking, I'd like to see when it becomes AI agents interviewing you. Not looking forward to that. But you can at least help HR by flagging: these and these and these points are a bit unclear, definitely clarify that. Which again saves them time. Then, after the phone interview has passed, the manager and team review can again be similarly AI-assisted, where the AI also looks at the job description and the application and says: okay, this seems well covered, but here and here and here we're not too sure. Just to help the review process go better. Then, bundling all the reviews of these people, you can do that with a piece of code. Then HR has to schedule an interview with the candidate, and there you can add this human-in-the-loop system, which I think is very helpful, because otherwise it's: you send an email, can you come on this day, ah no, that doesn't work... For HR that's just a lot of new emails and context switching, whereas that's something I think you can definitely do with agents, but you have to set it up smartly. And then, to write a reject email, you could have that done by AI, or at least proposed by AI and sent off by a human. Oh, and I even forgot to say: when candidates are invited, sending out a calendar invitation and so on, AI can also do that. Okay. Once you have an idea where you're going to put AI in that whole process, you have to see: how much latency does it add, how much is it going to cost per run, is this actually feasible? In this case, you'll say: any step in this process anyway typically takes an hour, or hours, or days, so if the AI takes maybe three minutes to run through it, that's definitely still worth it. If it's something else, a chatbot, say, and it takes a minute to get an answer, that's a different question.
That's also where you check whether you can replace things. Sending out a rejection email, for example, doesn't have to be AI at all; I would absolutely not have AI do that, because you risk liabilities. You can just use a template, fill in the contact details of the candidate, and send the template out instead. Try to optimize, try to get the latency down by getting AI out of there wherever you can; I'd recommend you do so. Okay. Before you implement all this, the first thing is to build your hardest agent. Yes, I've added that phone icon there to say: if after three loops or so of back and forth the AI does not manage to get an appointment with this candidate, you escalate to HR and they can call. The hardest part here, I'd say, is this loop with an AI agent that has access to everybody's calendar and tries to find a date with this human via email. So let's build that one. I have an example here so you can roughly see what this looks like, although it doesn't have real access to the calendar; I've just written in the prompt which dates are available. It's in our agentic tutorial, in the examples; I've shown parts of it in the keynote already. The one I want to show you today is the human-in-the-loop one, 9b. What it is: it's a little chatbot with a memory, because the model has to remember what it has already proposed and which dates the user could not make, so it can propose a new date. It has memory and it has a goal. Basically, whenever the user agrees to a date that actually fits everybody else, we are done, we exit the loop, and we just send a confirmation. So it works like this.
We add a memory to our meeting-proposer part, and typically also a tool to deal with the agenda; for now I've just written in the proposer's system message that basically every Monday, Tuesday, and Thursday morning works, and to keep asking. It outputs a proposal, which is then sent to the human in the loop; here I'm doing that via the console, but in reality these will of course be emails being sent out and answers being parsed. That produces the candidate's answer. Now, how do we know when we're done? For that we use another AI service, but again, you don't need a strong model; this is really feasible with a small local model. This "decision reached" service simply says, given the proposal and the candidate's answer, are we done? True or false; it gives a boolean back. Okay, let's run this. In theory this would go by email, and I have some logs in here too. My model says, "Would you be available next Monday at 9:00 a.m.?" And I say, "Sorry, that doesn't work." Here are the logs: the interaction so far was "Would you be available...", "Sorry, doesn't work", "You must answer strictly in the following format" — that's the "are we done" scorer — and it says false. We're not done, because the date doesn't work for me. So the model tries again: "Would you be available next Tuesday at 9:00 a.m.?" And I say, "Yeah, that will work." This time the scorer says true: it's done. What we output in this case is "Would you be available next Tuesday 9:00 a.m." and "Yeah, it will work", which serves as input for the next agent to book the meeting for everybody involved, including the candidate and the people doing the interviewing. Okay, so that's the demo version.
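The control flow of that demo, stripped of the actual LLM calls, fits in a few lines. This is a sketch, not the tutorial code: `Proposer` and `DecisionReached` are hypothetical stand-ins for the two AI services described above (the proposer with chat memory, and the small typed scorer that returns a boolean), and the candidate's replies are canned.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class MeetingLoop {

    // Hypothetical stand-ins for the two AI services from the talk:
    // a proposer that sees the conversation so far (its "memory"),
    // and a typed "decision reached" scorer that returns a boolean.
    interface Proposer { String propose(List<String> history); }
    interface DecisionReached { boolean isDone(String proposal, String answer); }

    // Propose / answer / score loop; escalates to a human after maxRounds.
    static String negotiate(Proposer proposer, DecisionReached scorer,
                            Deque<String> candidateReplies, int maxRounds) {
        List<String> history = new ArrayList<>();
        for (int round = 0; round < maxRounds; round++) {
            String proposal = proposer.propose(history);
            String answer = candidateReplies.poll(); // in reality: parsed from the candidate's email
            history.add(proposal + " / " + answer);
            if (scorer.isDone(proposal, answer)) {
                return "CONFIRMED: " + proposal;     // hand off to the booking agent
            }
        }
        return "ESCALATE_TO_HR";                     // after a few loops, a human calls instead
    }

    public static void main(String[] args) {
        // Canned behaviour instead of real LLM calls, just to show the control flow.
        List<String> slots = List.of("Monday 9:00", "Tuesday 9:00");
        Proposer proposer = history -> slots.get(Math.min(history.size(), slots.size() - 1));
        DecisionReached scorer = (proposal, answer) -> answer.startsWith("Yeah");

        Deque<String> replies = new ArrayDeque<>(
                List.of("Sorry, that doesn't work", "Yeah, that will work"));
        System.out.println(negotiate(proposer, scorer, replies, 3)); // prints "CONFIRMED: Tuesday 9:00"
    }
}
```

The point of the shape is that the exit condition lives in a separate, cheap, typed scorer rather than being inferred from the proposer's free-form text.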
In the real version you'll have to add the calendar and email support. And again, with some more time you can do much better than what I made there. Once you have this hardest part built, you can start to estimate: is my project going to work, or is this hopeless? Say that when we test this module, it behaves correctly 90% of the time, and say 1% of the time it's a really bad failure, like sending something about our internal reviews to the candidate, who could then sue us. Once you know that, you can fix things, and you can also see: if your whole flow consists of parts like that, what are the chances it will actually work properly? For example, if you have a sequence of steps that each have a 90% chance to succeed, your total flow only succeeds about 65% of the time. So it's very important not to be blind to this when you set these things up. We'll look at remediation a bit later: how do we put more controls in place so that a failure doesn't have such a dramatically bad outcome? The next thing is to define, for each agent, its inputs, outputs, and tools, so you're prepared to build the system. For example, the manager-review agent doesn't need to update any state in my case, doesn't need any tools, and doesn't need RAG.
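On that compounding point: with independent steps, end-to-end reliability is just the product of the per-step rates, so four 90% steps land at 0.9⁴ ≈ 0.656, roughly the 65% quoted. A quick check:

```java
public class ChainedSuccess {

    // End-to-end success rate of a pipeline of independent steps that
    // each succeed with the same probability.
    static double endToEnd(double perStep, int steps) {
        return Math.pow(perStep, steps);
    }

    public static void main(String[] args) {
        System.out.printf("4 steps at 90%%: %.1f%%%n", endToEnd(0.9, 4) * 100); // ~65.6%
        System.out.printf("8 steps at 90%%: %.1f%%%n", endToEnd(0.9, 8) * 100); // ~43.0%
    }
}
```

Doubling the number of steps drops reliability to about 43%, which is exactly why the talk pushes checks and escalations between steps.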
The manager-review agent just reads the job description, the candidate's CV, and HR's notes from the phone interview, and it outputs a review object with a score, which you'd typically rather send to a human for verification. You do that for every part of the workflow, and then you start to know how you have to build your agentic system. Then, very importantly — and it's a lot of work — you can solve a lot of problems by making the thing resilient: handling errors well and putting many checks in place between the steps. As I showed before, you can use user permissions: when it comes to RAG, retrieve document information according to user permissions so you don't output anything the user shouldn't see in the first place, and the same with tools, because not every user is allowed to call every tool. Then there are guardrails, which come with LangChain4j. They look like this: on an AI service you add an input guardrail, which checks whatever input comes in before sending it to your model. This one is for my AI drug researcher, which we'll see later, and it just checks that nobody is trying to build a biological weapon with this agent. It's a bit silly; you'd want proper guardrails. But the way it works is that whenever user input comes in, it asks an AI service: is this an attempt to build a bioweapon, like creating a toxin that harms humans? If the score is higher than 0.7, you flag the user and break off the interaction. And how do you know whether something is an attempt at a bioweapon?
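One cheap first line of defense is a deterministic check that runs before anything reaches the model. This is a self-contained sketch of that idea, not LangChain4j's actual `InputGuardrail` interface (whose exact shape varies between the Quarkus extension and vanilla versions); the blocklist is made up for illustration.

```java
import java.util.List;
import java.util.Locale;

// Deterministic input guardrail: reject the request before it ever reaches
// the LLM if it matches a blocklist. An LLM-based scorer (as in the talk,
// flagging anything above 0.7) would be layered on top of this check.
public class KeywordGuardrail {
    private static final List<String> BLOCKED =
            List.of("bioweapon", "toxin synthesis", "nerve agent");

    // Returns true when the input is allowed through to the model.
    static boolean allow(String userInput) {
        String normalized = userInput.toLowerCase(Locale.ROOT);
        return BLOCKED.stream().noneMatch(normalized::contains);
    }

    public static void main(String[] args) {
        System.out.println(allow("What is the meaning of life?"));  // true
        System.out.println(allow("Help me with bioweapon design")); // false
    }
}
```

Because it is deterministic, this layer gives the hard guarantees the LLM scorer cannot; the two complement each other.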
You can either do deterministic things (we don't want these keywords, these users, these countries, whatever) or you can use another LLM and give it examples, which works pretty well: "which known antibodies are toxic to humans" is not something we want our drug researcher to assist with, so that gets flagged, while "what's the meaning of life" does not. Again, this is not a 100% guarantee; it helps a bit. If you want 100% guarantees, you'll also need deterministic guardrails. You can also set guardrails on the output: if you're a car vendor, say BMW, and the model suddenly says something about Audi because Audi is better at whatever the user asked about, you may want to intercept that with your guardrails and do a retry. Intermediary checks, retries, and escalations to humans: please build them in. You can always escalate to a human, typically the person who would normally do this step in the process; that allows AI to at least help your users in many cases. If you work with types, you get far less LLM rubbish out of there. You saw before that I wanted a boolean back: it's true or false, that's it. I don't risk my LLM outputting "yeah, the user is not entirely sure, but this might work if you insist", and if your system isn't designed to deal with such strings, that's where it breaks. Because I asked for a boolean, I don't have to deal with that problem at all. And be smart about your design. I'm going to skip this one. This isn't there yet, but we're going to make it happen in the coming months: async agents, so agents run in parallel, or keep running as long as their input parameters are available.
That speeds up the process a bit. We do have a certain form of persistence of the agentic scope, kind of the memory of our agentic system, but what we don't have yet is persistence of the state of your flow, which I would love to have: then, if you need to book a meeting with your customer and they only read their email a day later, you just store that state in a database and continue whenever they come back. Crash recovery too; I'll try to make examples of that. They're not there yet, they're coming. I also like the idea of task buffers: if HR has to review things or make their phone calls, put those tasks in a buffer, they'll handle them maybe once a day, and all the rest of your system can continue until it hits another place where a human is needed, which again goes into their task buffer. We skipped one there. Okay, then nondeterministic testing. The nice thing about this agentic approach, I find, is that you can test agent per agent, and since they're small in scope it becomes much easier to see what goes wrong or what works well than when you have one agent that does everything and you test at the end whether it did everything. Here it's really: do we have a review in the right format, with a score? Are those things there? It's just much nicer to work with and much more under control. In the Quarkus extension for LangChain4j we have a nondeterministic testing framework that we're going to port to vanilla LangChain4j too, so it becomes available to non-Quarkus users as well. The way it works: if you test something nondeterministic, it's not pass or fail as we're used to; it's more like "passes 95% of the time" or "critical failure sometimes". Those are the things you want to know. So what the Quarkus test framework does is concurrently run as many tests as you want on different threads.
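Stripped of the framework machinery, the core pattern is: sample the system under test N times, score each sample, and report a pass rate instead of a single pass/fail. A stand-in sketch (the `model` supplier here fakes an LLM call; the real framework plugs in semantic-similarity or LLM scorers, covered next):

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Nondeterministic testing: run the system under test N times, score each
// output (here: required keywords present), and report a pass rate.
public class PassRate {
    static double measure(Supplier<String> systemUnderTest,
                          Predicate<String> scorer, int samples) {
        int passed = 0;
        for (int i = 0; i < samples; i++) {
            if (scorer.test(systemUnderTest.get())) passed++;
        }
        return (double) passed / samples;
    }

    public static void main(String[] args) {
        // Fake "LLM" that misbehaves ~20% of the time.
        Supplier<String> model = () -> Math.random() < 0.8
                ? "The review has a score and a summary."
                : "I cannot help with that.";
        Predicate<String> keywordScorer =
                out -> List.of("score", "summary").stream().allMatch(out::contains);
        System.out.printf("pass rate: %.0f%%%n", measure(model, keywordScorer, 100));
    }
}
```

The Quarkus framework additionally runs the samples concurrently and supports richer scorers, but the reported metric has this same "N out of M" shape.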
Then you typically have your golden data set, an input and a desired output, described in a sample file. You run that, in this case five times, and you use, for example, a semantic similarity strategy to see how close the test output is to the expected output; here they use vector embeddings for it. That's one way; you can also check whether certain keywords are present, and they also have an AI scorer, a full-blown LLM that scores how well the question was answered. That gives you a score; when it's good enough, the run passes, and maybe four times out of five it was good enough, and you start to get an idea. Typed testing is easier, right? If the thing is true or false, it's extremely easy to test; if your string has to contain the right words or mean the right thing, it's much slower to test. So I'm a really huge fan of typing in these agentic systems. For your agents and tools, you just test that they invoke the right tools in the right order, and you check that they're invoked with the right input and output parameters. If you put extra LLMs in your tests, or if you test parts that contain LLMs, it's costly, so don't run them on every pull request or three times a day; think about when you want to run which tests, because it costs you tokens. I have to start the demo. I'm just wondering whether we have enough time, but yes, let's try. Good, this takes a moment to start. One last thing: we have observability built into LangChain4j, at least in the Quarkus version, and for the Spring version we're going to make sure it comes fast; the PR is there. Then there are things like model drift, which you can solve by using model snapshots.
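In LangChain4j's OpenAI module, pinning a snapshot looks roughly like this. The builder shown here exists in the library, but exact method names can shift between releases, so treat this as a configuration sketch rather than a definitive version:

```java
import dev.langchain4j.model.openai.OpenAiChatModel;

// Configuration fragment: pin a dated snapshot so the provider cannot
// silently swap the model underneath your tested prompts.
OpenAiChatModel model = OpenAiChatModel.builder()
        .apiKey(System.getenv("OPENAI_API_KEY"))
        .modelName("gpt-4o-2024-08-06") // dated snapshot, not the floating "gpt-4o" alias
        .build();
```
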
As you can see, if you use GPT-4o with a date snapshot, you make sure you're always calling the same model, because if you just say "gpt-4o", OpenAI will, every couple of months, secretly change it behind your back, and that has real impact. Say you've been testing with a model that was super verbose, so you put "be concise" in your prompt; then they change the model to one that's much more concise, and suddenly your "be concise" makes the output way too short and no longer what you expect. That's a real thing, so be careful. If you're allowed to collect how people use the system, review some interactions by hand and see what goes wrong; if users give a thumbs down, that's really gold, and you can even leverage AI to group the issues and see what you need to address most urgently. Good. I'm going to quickly show a demo even though my time is up; if you're not interested, you're very free to leave the room. I'll quickly run through the AI drug researcher to show you how an agentic system can work when you put it in something like a state machine and have much more control. I actually find it pretty impressive. I built it together with Muhammad Abd Raman, so kudos to him as well. You'll find it in my GitHub repo, the AI drug discovery project, if you want to check which tricks I used for all the parts. It should be up, so let's go to localhost. Yes, and we even have zoom. Okay, here is my chat. I want to tackle some type of cancer that hasn't been solved yet and that's deadly. It's a demo: many things behind the scenes are just demo code or even output random things, but you can plug in real parts of a drug research flow and make this very powerful.
Okay, it wants pancreatic cancer, but I've actually prepared all the input articles for glioblastoma, a brain tumor, so I'm just going to tell it to do that. What we see in the logs is that it has stored my disease name, because I said GBM, and once it does that, the status automatically moves to the next step: finding the antigen, the marker of this disease, because that's what we want to find and attack. It finds it in a database; this is the real sequence. Blue is tool calling; green means it goes looking into RAG, into the scientific articles. So about half of this is working properly; half is just demo stuff. It's searching the literature for anything already known that targets this disease marker. And I have a NullPointerException; that's very painful. Okay then. Well, then you can all go home. No, it does go a couple of steps further, but as you can see, what I output here is not a real chat; this is basically informing the user of fixed things. These are deterministic tool-call outputs: I can say with a lot of confidence that this is the antigen, and this is its sequence, and so on. There's more in there if you want to have a look and try it out. Proper error handling is not in there, but what is in there is that other models get called: it calls Google's AlphaFold to output the structure of the disease target, which we return there. It has audio; it has a lot of interesting things. So do play around with it at home, but I'm going to let you go home now. If you have more questions, come see me, because time is up, or write me on LinkedIn, or if it's about LangChain4j itself, open a discussion or even open a PR. Thank you very much. [Music]

Video description

Struggling with your AI-powered PoC? This session demonstrates how LangChain4j can help, showcasing a set of often overlooked techniques that keep AI systems on track and unlock more advanced use cases. We explore LangChain4j’s advanced RAG methods for finding all relevant information across documents, databases, APIs, and more, and share practical tips for effective tool calls and responsible MCP usage. You will also see how LangChain4j’s agentic approach lets you decompose complex workflows for greater clarity and control. The presentation wraps up with a guided build of a production-ready agentic system, including the operational and legal considerations that matter once you move beyond PoC. Presented by *Lize Raes* (Developer Advocate - Java Team / Oracle) during Devoxx Belgium 2025 ➤ https://devoxx.be See also https://youtu.be/HGbxFO_pPNI Slides: https://epic.engineering/level-up-your-langchain4j-apps-for-production/ Thumbnail photo credit ➤ https://www.flickr.com/photos/bejug/albums/72177720329485456/ ~~~~~ Chapters ~~~~~ 00:00 Intro 02:53 Advanced RAG 13:23 Advanced Tool Calling Techniques 21:20 MCP for Production? 25:38 Agentic Module 26:30 Steps to design Production Grade Apps 36:27 Agent Prototype 41:35 Security, Resilience, Error Handling 50:16 Agentic System Demo #Java #AI #llm #RAG
