bouncer

Java · 2.9K views · 92 likes

Analysis Summary

20% Minimal Influence

“Be aware that while the performance data is likely accurate, the presenters use 'consensus manufacturing' to make upgrading feel like the only professional choice, briefly mentioning a paid 'JDK8 PF' product as a solution for those who don't.”

Transparency: Transparent
Primary technique

Social proof

Presenting the popularity or consensus of an opinion as evidence that it's correct. When you see many others have endorsed something, it feels safer to follow. This shortcut can be manufactured — fake reviews, inflated counts, and cherry-picked polls all simulate consensus.

Cialdini's Social Proof principle (1984); Asch conformity experiments (1951)

Human Detected
98%

Signals

The transcript captures a live, unscripted technical discussion between two specific human presenters at a conference, featuring natural conversational flow, audience engagement, and spontaneous reactions. There are no signs of synthetic narration or AI-generated scripting; the content is a recording of a physical event.

Natural Speech Patterns: Presence of filler words ('uh', 'um'), self-corrections, and conversational interruptions ('materialized', 'right').
Audience Interaction: The speakers ask the live audience for a show of hands regarding their current Java versions and react to the visual feedback.
Contextual Authenticity: Specific references to industry events (Jfokus), internal Oracle teams (JPG), and niche technical benchmarks (SPECjbb 2015).

Worth Noting

Positive elements

  • This video provides a deep technical dive into JVM internals, specifically explaining how 'Stable Value' and JIT optimizations translate to real-world energy and latency savings.

Be Aware

Cautionary elements

  • The use of light social pressure and 'professional shaming' during the live Q&A to frame software versioning as a matter of diligence rather than business trade-offs.

About this analysis

Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.

This analysis is a tool for your own thinking — what you do with it is up to you.

Analyzed March 13, 2026 at 16:07 UTC · Model: google/gemini-3-flash-preview-20251217
Transcript

Yeah. So, hi everybody. My name is Per. I work in the core libraries team at Oracle. >> My name is Claes. I work in the performance team. So we're both doing work at JPG, the Oracle Java Platform Group. Thank you. >> Yeah. And we are, yeah, performance enthusiasts, both of us, and this is not going to be like a regular presentation. It's going to be more like a discussion, so we're going to discuss here and we invite you to participate. Maybe we can take most of the questions at the end. That's also why we're not standing at the podium. Okay, so let's start. >> Right. So we started collaborating like a year ago with some articles about recent performance improvements, like every six-month release of the JDK, something we'd been planning for a while but finally got the time to do. >> Materialized. >> Materialized, yeah. Uh, so we want to, you know, continue that, since people seem to be appreciative of, you know, digging deep, and also showcasing the smorgasbord of changes coming into each and every release. >> Yeah. And it's in so many different areas, so it's good to have this kind of gather-it-all, >> like an eagle-eye view of things going on. >> Um, yeah, and then we have one for 25 as well. >> So hopefully we'll get one for 26 one of these days. >> Right. >> So we put a lot of effort into improving the performance of the JDK, and here is an example where we run the SPECjbb 2015 benchmark. On the left-hand side here you can see a metric called critical-jOPS, and that is geared towards something called latency, which we're going to talk about later, >> right, >> and on the right-hand side it's more about throughput, which we're also going to talk about later on. But the good thing is that latency comes down a lot; you don't even have to recompile your code, you will get like 10% score improvements just by changing JDK. >> Right. And this is from 21 to 25. >> From 21, yeah, to 25. >> So just the two most recent LTS releases. >> I mean, if we look back even further, the improvements are even more >> massive, I would say. >> Massive, yeah. >> Yeah. And it should be noted that every tiny percentage here is a struggle, because it's already so optimized. So we are really happy to see these figures materializing. >> Right. Every incremental improvement going forward is probably going to be, you know, a lot more work. But >> yeah, >> we're not going to be >> intimidated >> intimidated or discouraged by that. >> Exactly. >> Uh, and here is another example for a more modern framework, which maybe uses some, you know, modern constructs of Java, and then the difference is even bigger. >> We're Oracle, so we have to show Helidon of course, but the same thing kind of applies if you look at Quarkus or Micronaut or whatnot. >> Exactly. >> Uh, the difference there, if you characterize it a little bit, is that SPECjbb 2015 is an old benchmark; it doesn't use lambdas, doesn't use much outside of, you know, the fork-join pool. >> Uh, if you use a more modern application as your benchmark you will see >> an even more pronounced difference, >> more work going on in modern language features. Yeah, I'm just curious, how many of you have any project still running on 8 or earlier? >> Okay, so you have some work to do here. >> A few hands. Not too many. >> No, actually I was surprised. I was expecting more. Maybe they wouldn't admit to it. >> Right.
Right. How many are running on 17 or 21? Yeah, more hands. Lots more hands. Thanks. >> Yeah. Okay. >> How many have already upgraded to 25? >> Good. >> More than eight. More than eight. >> That's great. Okay. So we're going to talk about metrics today. I mean, performance is a very wide concept. So we're going to speak a little bit about different metrics and some challenges when you are measuring performance, and then we're going to zoom into more detail: how can we get this kind of performance, how can the JIT, the C2 compiler, help us? And we're not necessarily going to talk about all 13 performance improvements; we will see how far we get, because we want to discuss some of them, and maybe we get stuck on one, or maybe we have more time and it will be more. >> Yeah. >> So don't expect 13. Spoiler alert. And then I'm going to summarize what we spoke about. So I'm going to start by talking about metrics here, and most of them are time-based metrics. When we talk about performance loosely, we commonly mean average throughput, that is, how many operations per second can we actually perform. >> It's the easy thing to measure, what many benchmarks do, >> yeah, you're just, you know, trying to squeeze as much as possible through and see what happens, >> batch-style kind of benchmarks, >> exactly. But then there is also latency, and that is very important in some applications like trading, where you have an SLA saying that almost all of my calls [clears throat] will complete in this or that time. So for example we talk about typical latency: suppose you do 10,000 experiments and measure how long this method takes, and then you order them, and you look at the second to last one... uh, no, I mean, you look at the one in the middle, that's the typical one. >> Mhm. >> But I went too far; I jumped to the other point here, which is that if we want to talk about the 99.99th percentile, commonly referred to as four nines, then you do those 10,000 runs and you don't look at the outlier on the far right-hand side but at the second slowest one. And of course in reality you can't do just 10,000, you have to do millions to get any statistical significance, but that's the concept of latency. And then we have startup time. Startup time is the time it takes from when you start your application until it produces its first output. In the simplest example, you hit return and then it prints hello world. But in a more real-world example, it's when you deploy your web application or start your web server until you can service the first incoming request. >> Mhm. >> And then there's warm-up time. That's something different. Of course, the warm-up time is longer than the startup time by definition, but it's like, how long? >> Yeah. Like the time to performance, or how long it takes for your application to warm up to its steady-state peak performance. >> Yeah. As you might know, Java compiles your application on demand, >> right? So it's kind of application defined. >> So this is a looser thing, >> right? We try to pin it down in a few cases that we are measuring, but it's an elusive concept. >> Maybe you can say that it's the time it takes to reach 99% of the, you know, steady-state performance or something like that. >> Yeah. You have to cut it off somewhere, otherwise it just goes to infinity.
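The percentile bookkeeping described above (sort the observed latencies, read the median as the "typical" value and the 99.99th percentile as the "four nines" value) can be sketched in a few lines of Java. This is an illustration only; the class name, sample counts, and fake data below are not from the talk.

```java
import java.util.Arrays;

final class LatencyStats {

    /** Returns the latency at the given percentile (0.0 - 100.0) from raw samples. */
    static long percentile(long[] samplesNanos, double percentile) {
        long[] sorted = samplesNanos.clone();
        Arrays.sort(sorted);
        // Index of the sample at or below which `percentile` percent of observations fall.
        int index = (int) Math.ceil(percentile / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, index)];
    }

    public static void main(String[] args) {
        long[] samples = new long[1_000_000];            // e.g. one million timed calls
        for (int i = 0; i < samples.length; i++) {
            samples[i] = 1_000 + (long) (Math.random() * 50_000);   // fake data
        }
        System.out.println("p50    = " + percentile(samples, 50.0) + " ns");
        System.out.println("p99.99 = " + percentile(samples, 99.99) + " ns");
    }
}
```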
>> Exactly. And then there are different metrics: what do we have to consume in order to fulfill all these requirements? So we have memory usage, of course. Maybe you have a sorting algorithm which is faster than another but uses more memory; well, that's not good. You have access patterns, which actually turn out to be a resource: how you access memory. If you just have scattered loads, that's a pain, whereas if you can address your memory consecutively, that's much better for modern CPUs, >> like, replace all your linked lists with array lists. Yeah. Or if you do matrix multiplication, >> do that in small tiles rather than just, you know, iterating over the entire thing, >> right? >> And memory pressure is another one. If you allocate a lot of memory, of course, you have to recuperate that memory via the garbage collector, and you create this memory pressure. So maybe you can do it in another way that doesn't create all these objects, >> or maybe you can make sure that the compiler can remove the creation of those objects. That's great, when the JIT can, you know, constant fold and do all that, >> escape-analyze and scalarize your objects so that >> your allocations actually disappear in practice. >> Exactly. And then there's thread usage: if you can use one thread to do the same work instead of 16 threads or whatever, it's better that you conserve threads, because that's a resource too. Contention is another one. If you have a place in your code that only one thread can be in at the same time, that creates contention, and that's bad. So maybe you can design around that and allow some other kind of access mechanism, >> right? And [snorts] most of these things you can actually keep tabs on with tools these days. If you're running Linux, running perf tools, >> uh, you can get metrics on your app: how many cache misses, how much contention, how much interconnect traffic there is, and whatnot. So we're living in a time where all these metrics are, you know, readily available without much further ado. >> So that's fine. >> Branch misses and all that stuff, important. Code size is another one: you can write an algorithm that is maybe 5% faster but is, say, 500% more code, and code space is also a valuable resource. Yeah. >> We only have so much room in the code cache. It >> all adds up quickly. >> And then there is SIMD. SIMD is the single instruction, multiple data, uh, vector operations, you could say, >> and that is also a scarce resource, because in some CPUs they are shared, and in other CPUs the CPU needs to clock down if you use the wider versions of them, and so that becomes a resource as well. Uh, so all this boils down to energy efficiency, and I mean, in today's world, when energy and other resources are scarce, it's important to be as efficient as possible. >> Yeah, and we recently ran some numbers; this is from one of the DaCapo benchmarks that uses Spring Boot, collecting some stats just with the perf energy-pkg and energy-ram counters, and we can see, when contrasting JDK 8 with 21 and 25, that the energy use on a fixed load is down 24%. >> Mhm. >> And much of that is the lower line, the RAM part, the memory-access contribution to energy.
Uh, it's also down by almost 40%, by, you know, smarter garbage collectors, better escape analysis, getting rid of memory accesses across the board. >> So, uh, you know, >> and you can see how this correlates with the figures we had in the beginning. So I mean, energy often translates to performance, >> right, on the same hardware. >> What's this JDK8 PF? >> Oh, uh, that's a commercial offering that Oracle does, right? It's like most of the VM changes backported into a JDK 8 compatible package. So if you're >> it's for those lazy people who didn't convert. >> Yeah. If you're stuck on 8 and want some of the performance improvements, we have a solution for you. Okay. Platform is also important, of course: what kind of CPU you're running on, even what stepping it is. The compute-to-retrieve ratio has changed a lot over time. When I started using Java back in 1.0, the ratio was about one to one. So it took, you know, one clock cycle to access memory and one clock cycle to compute something. Whereas today it will set you back around 100 clock cycles just to get 64 bits in from memory. So that kind of pushes solutions away from just caching simple things; it's better to calculate them locally if they're very simple to compute. So that, you know, changes the algorithms that we use. Number of cores is another one. Of course, if you run on a two-CPU machine or if you have a massive 128-core platform, the program is going to behave differently, >> very much. Yeah. >> And these cores are interconnected in different ways. Maybe if they are adjacent on the die, they might even share caches, and then you can communicate very rapidly between them. Whereas if they're on completely different dies, >> yeah, like in a NUMA setup with multiple sockets. >> Yeah. >> Um, >> I think we're going to see an example of that some 30 slides ahead, something on NUMA. But you know, it's one of those research problems. Even though we do as much as we can to make the JVM work nicely on multiple sockets, you still have shared resources, you have the compiler, you have code caches; there are modes that allow us to replicate across NUMA nodes, but it's generally recommended to, you know, run JVMs [clears throat] pinned to a specific node, >> and maybe use other means of communication between them. >> We're working on making the JVM smarter on cross-socket deployments, but it's a very hard problem, >> indeed. >> Okay, the operating system, of course, is important. >> Uh, okay, the application is also important: how is your workload distributed, what's the shape of your data? Pollution is a term that is used often within the JDK community. It's like, you run on some kind of data that has a certain statistical behavior, and suddenly there's another behavior, and because you have compiled assuming those statistics would hold, you get bad performance, and that's called pollution. >> Mhm. Uh, so summing this up, we usually measure average throughput. That's our main objective. We measure after warm-up, with the kind of mantra: if it doesn't get compiled, it's hardly used anyway. And that's the truth with a modification, because it is important at startup, when you're running interpreted code. >> But we focus on things that get called many times, >> right?
And you know, there's a lot of fuss about AOT modes in Java, and even in those cases you have to make sure that you don't spend too much time compiling things that aren't actually going to be run, because that adds [snorts] footprint to your binaries and whatnot. So it's very important in those cases too that you filter out the things that are not used. >> The garbage collector might also affect how our program behaves. Of course, we have different platforms, Linux, Mac, Windows, and we will not forget our porting friends who are constantly keeping Java vibrant and possible to run on other platforms as well, >> right? >> Yeah. And I think macOS x64 is kind of sunsetting now. So maybe we can >> yeah, >> at least I'm not looking so much at that at the moment. >> We still support it for the time being, but it's not our main focus. >> It's going out. >> Yeah. So when benchmarking, or, you know, doing performance work, measuring performance has a lot of pitfalls. Um, one of the old sayings is: don't use your laptop. Laptops tend to overheat, get warm, and that changes the numbers that you get back. Still, we do [laughter] a lot of benchmarking on our laptops. So, but you have to know, >> you have to know how your system behaves, when you can trust it and when you can't. Run things back to back, and back again, so that you see that the numbers are consistent, and, you know, if the hardware is starting to act up, have at it. Um, but we do have a lab setup, both quite unrealistic machines with, you know, everything from turbo boost to hyperthreads turned off, just to lock things down as much as possible, >> to get it reproducible, >> yeah, for the cases where we want to see every little incremental contribution from every little patch, to be able to isolate changes. And then more, you know, setups in our own local cloud, to measure in the kind of system that you would run your production workloads in. >> Yes. [snorts] >> Uh, when benchmarking, you need to make sure that you're looking at something that is comparable, that is consistent across runs. So you need to warm up, run millions of times. You need to set up your benchmark to guard against the fact that the JVM can do a lot of dynamic things, like, you know, figuring out that in your particular benchmark this method isn't even used, so I'll just >> throw it away, >> throw it away. >> Dead code elimination, it's called. So the JVM does a lot of things. >> It's good at speculating. >> It's good at speculating. >> Just like us. [laughter] >> Oh yeah. That's how half our work goes: speculating about what the JVM will do and then testing it, >> and then, uh, you know, figuring out why we were wrong, >> uh [laughter] and then, you know, looking scientifically at things. You run into these things all the time if you, you know, get a benchmark from a customer and they say that this isn't working, or this is working poorly, or whatever.
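The dead-code-elimination pitfall the speakers describe is easy to reproduce in a hand-rolled timing loop. The sketch below is illustrative (method names and iteration counts are made up): if the result of the work is never consumed, the JIT is free to remove the work after warm-up, and the "benchmark" ends up timing nothing.

```java
public class NaiveBenchPitfall {

    static double compute(int i) {
        return Math.sqrt(i) * Math.log(i + 1);
    }

    public static void main(String[] args) {
        // BAD: result discarded, so the whole loop body is eligible for elimination.
        long t0 = System.nanoTime();
        for (int i = 0; i < 5_000_000; i++) {
            compute(i);
        }
        long bad = System.nanoTime() - t0;

        // BETTER: keep and publish the result so the work cannot be thrown away.
        double sink = 0;
        long t1 = System.nanoTime();
        for (int i = 0; i < 5_000_000; i++) {
            sink += compute(i);
        }
        long better = System.nanoTime() - t1;

        System.out.println("unused result: " + bad + " ns, consumed result: "
                + better + " ns (sink=" + sink + ")");
    }
}
```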
So yeah, and of course, when measuring time there are lots of pitfalls. Typically people say to use System.nanoTime because it's monotonically increasing, compared to currentTimeMillis and such. But you should also be cautioned that System.nanoTime, you know, can cause cross-socket contention. Yes, it needs to be synced across cores and whatnot. So it can actually pack quite a punch when it comes to >> contention and latency overheads. So you should have a system that uses these time functions sparingly, or in a controlled fashion, so that you amortize their cost over time. >> Yes. Over time. >> And the go-to tool for this is the Java Microbenchmark Harness, JMH, that we, you know, >> we use all the time >> in the open. We test all our benchmarking hypotheses with benchmarks written in the JMH tool. It gives you a lot of things for free. You just write your method annotated with Benchmark. It cooperates these days with the JVM. You have a black hole implementation, so you can just return whatever from the code, >> so you properly consume that value. It's consumed so that the JIT doesn't think, oh, this method doesn't do anything useful, and throw away the entire calculation. >> Instead, you know, you go through a black hole that puts a fence against, you know, >> discarding it, >> discarding it. It's like, oh, this is going to be used, yeah, sure, and then you don't use it, >> and it doesn't add much overhead. So it's a pretty cool thing. >> Maybe we... this is how it looks. >> Benchmarks are, you know, simple tests. >> Yeah, simple Java code that you annotate with various things, >> this Benchmark annotation here, and you can, you know, add in command-line parameters and all that. >> There are a lot of cool samples, and, you know, in the JDK these days we have hundreds if not more benchmarks that zoom in on different packages, different utilities, different parts of the system. So we're building that out, cataloging benchmarks, >> and a good thing is that people are actually looking at this when we do changes, and we do weekly releases, >> and run the benchmarks, and I want to tell you that they will get back to you if you did something wrong. They will say, oh, we have a regression here, what's going on here, guys? >> Yeah, that's a little bit of my job description, finding those regressions and >> running and telling Per's boss. >> So, the results: we try to look at as many benchmarks as possible over every release, and, you know, these benchmarks are built into the JDK, so we get results on every build and look at them. We don't run all the microbenchmarks on every build, that would take too long. >> But we use this as a way to guide performance work. And then, yeah, microbenchmarks are, you know, a curse and a blessing at the same time. You need to use them like you would use unit tests in your [clears throat] application stack. They don't tell you >> the answer, >> no, and don't give you the full picture.
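A minimal JMH benchmark along the lines described above might look like the sketch below. The class name, array sizes, and chosen modes are illustrative; what matters is the @Benchmark annotation and the Blackhole (or a returned value), which keep the JIT from dead-code-eliminating the measured work.

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class CopyBench {

    private final byte[] src = new byte[4096];
    private final byte[] dst = new byte[4096];

    @Benchmark
    public void copy(Blackhole bh) {
        System.arraycopy(src, 0, dst, 0, src.length);
        bh.consume(dst);                 // keeps the work observable to the JIT
    }

    @Benchmark
    public byte[] copyAndReturn() {
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;                      // returned values are consumed by JMH implicitly
    }
}
```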
You can also, you know, stare at local optimizations and over-optimize something specific while regressing total system performance. But it's a very useful tool for zooming in on things, and we have our own internal visualization tools, but there are also some >> public tools, >> public tools where you just dump the output from JMH and you can plot it. >> I think this is nice because you can see the error margin there, and then you can relate that to the actual improvement. So maybe you got this small improvement, but the error margin was like this, and then maybe you shouldn't care too [clears throat] much about it. Exactly like this one, for example: >> not any difference at all there, yeah. >> And you can of course, you know, upload this into your favorite, uh, Excel sheet or >> Google Docs. >> Google Docs, yeah. >> We have our own internal system, of course, to help us guide and find these things, with, you know, triaging over many machines, running thousands of benchmarks and everything. >> The different columns here are different hardware and operating systems. So you can see, for a change, how it would appear on that platform and how it would appear on another platform; sometimes they contradict. >> Yeah, we put a JPEG in over here because this is a proprietary system; this is as much as we can show you. >> Okay, so we come to the next section, compiler tricks, and this is kind of a thrill-or-chill section. We're going to go deep, deep, deep now into the inner workings of the compiler. >> Well, we're just going to do a cursory glance, just scratch the surface. But if you think this is, you know, hard to understand, bear with us; we will bounce back later on. So this is the circle of life for Java code. We start out in interpreted mode. We read each and every bytecode, we mutate some state, read the next bytecode, mutate the state. And this sounds very slow. And in fact, it is slow, but it's surprisingly fast considering everything that is done, >> right? But we can go faster. So >> yeah, we can do that. So what is C1 here? >> Yeah, once we realize that some method is hot enough to warrant looking deeper at it, we throw it at the first tier of our JIT compiler infrastructure. C1 is originally what they called the desktop compiler, the one that was used for the applets. It's the sloppy, quick-and-dirty compiler. So we use that to generate something that is faster than the interpreter but also has more profiling counters strewn throughout the program.
And when we say profiling, we mean gathering statistics about the behavior of the program, >> right, like which loop is actually executed, which if-branch, >> which parameters are null, which branches are taken, and so on. >> So we collect a lot of data on, you know, every particular bytecode executed, >> and then finally we can go to C2, >> after a while, maybe a couple of ten thousand invocations or so, >> right, or some other compiler. >> Um, yes, so this tiered approach allows us to get a lot of important data that drives, you know, assumptions and speculation. You can do certain optimistic things: if you realize that certain branches are never taken, you identify code blocks that are never executed and can eliminate them, >> and this is the reason why Java sometimes can be faster than other programming languages that are compiled before you run them, because we know more. Not only do we know the code, but we know how the code is run, >> right? >> And we can speculate around that. >> Exactly. >> But sometimes there are unfortunate events, >> right. So even then, you know, to keep that feedback honest, if you do a lot of speculation you can deoptimize your code [clears throat] back to the interpreter. So the JVM does a lot of speculation but puts in safeguards: if you're assuming that something is never null and then you hit something that actually is null, you get a trap, you roll back your execution and go back to the interpreter, >> and kick back to square... I mean, square zero is the correct name for programmers, not square one. >> You know, this is a surprisingly robust thing. >> It's the natural course of operation. >> Exactly. But yeah, there have been a few bugs in this area. But anyway, >> yeah, so let's talk about >> things that the JITs do to help your program along. >> So there is something called inlining, loop unrolling, >> hoisting, and dead code elimination. There are lots more, but we don't have time to speak about them today. We also have auto-vectorization. So let's start with this simple code snippet here. We create two arrays, source and destination, and then we copy from the source to the destination. Very common pattern, you see it all the time, >> and we are running just a loop that iterates over the source length and, you know, >> copies it segment for segment into the destination, >> right. But calling methods takes time. So one of the first, simple optimizations that you do is inlining. You take all the calls, copy whatever is in them into one big super-method, and you compile that one instead. This, of course, gets rid of the overhead of many, many calls, but it also unlocks quite a few other optimizations >> further down the line. >> Things that are [snorts] outside the method being compiled you don't know much about, so you can't speculate too much about them. But with everything that's inside this big artificial super-method, the compiler can do a lot more. Um, >> so what you can do, for example, is something called unrolling. So >> on every iteration you have to check this, you know, predicate, i is less than source length. But of course, you can work in chunks of four, bump the counter by four, and then just have a cleanup, you know, a tail cleanup, at the end. >> Mhm.
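A hand-written sketch of the transformation being described (the copy loop before and after unrolling by four) is shown below. It is an illustration of the idea, not the JIT's actual output, and the class and method names are made up.

```java
class CopyLoops {

    static void copy(byte[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];              // one loop check per element
        }
    }

    static void copyUnrolled4(byte[] src, byte[] dst) {
        int i = 0;
        int limit = src.length & ~3;      // largest multiple of 4 <= length
        for (; i < limit; i += 4) {       // main loop: one loop check per 4 elements
            dst[i]     = src[i];
            dst[i + 1] = src[i + 1];
            dst[i + 2] = src[i + 2];
            dst[i + 3] = src[i + 3];
        }
        for (; i < src.length; i++) {     // tail cleanup for the remainder
            dst[i] = src[i];
        }
    }
}
```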
Of course, that amortizes the looping by a fair amount. >> Yeah. This removes, you know, 75% of the branches in the main loop. >> Exactly. And branching is expensive. >> Very expensive, like a memory access or a check, you know. >> So that's one optimization that's, you know, bread and butter to what these two compilers do. The next one is hoisting. On the previous slide, on every array access you would do, you know, >> range checking, >> range checking and such things. If you can hoist that out of the loop, >> that would be great. >> So hoisting is the process of taking things that are loop invariant and doing them, you know, before the loop, primarily. >> Yes. >> So that you don't have to do them inside it. So now the array accesses have, synthetically, been rid of the array check, >> and the parenthesis here is that this is not pure Java, it's just pseudo-code to illustrate, >> it's an access without a range check. And if you are very, you know, formal here, you would notice that this is not quite equivalent, because if you get an array-out-of-bounds on this line here, for example, the message will be different. That will be fixed in the next slide, which is dead code elimination, >> because if you look at the creation of the arrays, you can see that this, you know, predicate up here in the if statement will never trigger, so we can just remove it, >> right, >> and then we hoisted it and then we removed it, so that's a great win. >> So we went from like three branches per access to zero. Zero. >> So that kind of thing is extremely, you know, beneficial in practice. You can go from, you know, a few hundred kilobytes to gigabytes per second in throughput in certain cases. The JVM does a lot more; we have had auto-vectorization since early JDK 7, I think. >> Mhm. >> But it's been, you know, gradually improved. Um, >> so auto-vectorization means that you can work with larger chunks of data. >> Mhm. >> So in this case we have a byte array, but internally we can say that, well, we would like to look at this byte array as a long array, and then we can do long accesses instead, and that means that instead of eight bits we can handle 64 bits per chunk, and of course that amortizes as well, because with just one instruction we can work on more data, >> and so we can bump the counter up eight steps at a time. So it's not only that we can access the data in larger chunks; we also need to iterate fewer times, >> right? >> But that's not the end, >> right? Long values are 64 bits, and computers these days have SIMD registers up to 512 bits on Intel hardware. Probably going to be >> even more in the future, >> right. So we have code to automatically, you know, emit SIMD instructions on the processors that support it, >> Mhm, >> taking advantage of the fact that you don't even have to know whether you're going to run on that kind of hardware; the JVM is going to sort it out for you. >> And, you know, we can apply this optimization iteratively, over and over, as a compiler does. So we have >> so this is kind of the end game, >> hoisting and unrolling, >> hoisting and unrolling and superword. So here you can see that we bumped it to 256 bytes per iteration.
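To make the "view the byte array as longs" idea concrete, here is a small sketch that copies a byte[] eight bytes at a time through a VarHandle view. It only illustrates the wider-chunk idea; the real auto-vectorization and superword optimizations happen inside C2 without any source changes, and the class and method names here are invented.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

class ChunkedCopy {
    private static final VarHandle LONGS =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

    static void copyLongChunks(byte[] src, byte[] dst) {
        int i = 0;
        int limit = src.length & ~7;                    // whole 8-byte chunks
        for (; i < limit; i += 8) {
            long chunk = (long) LONGS.get(src, i);      // read 64 bits at once
            LONGS.set(dst, i, chunk);                   // write 64 bits at once
        }
        for (; i < src.length; i++) {                   // tail cleanup
            dst[i] = src[i];
        }
    }
}
```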
So this is really, really high performance, and in fact these operations here would translate to pure machine code, >> right, assuming that your inputs are actually large enough to take advantage of it. >> But then again, if it's just a small array it doesn't matter so much; if it's a long array, it does matter, >> right, but that's, you know, something we pick up during the profiling phases, like how big your data is, and figuring out when it >> pays off, you know, how deep we should go. >> Okay, so let's look at these performance improvements, which are not going to be 13 when I look at my watch. Let's go into the JEPs, because JEPs are one thing, maybe the shopping window of what we're doing, but there's a lot of other things happening underneath, >> and there is an article written here on all the JEPs that have been, you know, integrated from 21 to 25, and you can just go there and read all about it. And actually, yeah, these are now some of the deprecations and the removals. >> Why are removals and deprecations important? >> Uh, deprecations and removals: you know, sometimes the key to performance is getting rid of degrees of freedom for your application. If you have code that sometimes goes in and, you know, pulls the rug from under your feet, you're always going to be looking out for those exceptional cases. If you can remove the things that sometimes make your program behave in a very different way, then you can simplify and streamline a lot. I remember particularly the one that removed the security manager. That was a real wow moment, because the security manager was cluttered throughout the entire, you know, core libraries, creating anonymous classes all the way down and really ruining the startup process, >> right, and surely we have optimizations in place so that, you know, if a security manager is never installed in a system, we realize at every call site, oh, I'm going to check if the security manager is on, but it's never on, so we code-eliminate all that stuff, >> but still, that's work, and it has to be maintained. >> The interpreter has to run through it, it has to go through all the phases and so on. If we can get rid of it, we reduce the resource usage of the JVM. >> And there's a whole podcast about deprecations and removals with our own Dr. Deprecator, so if you're interested, you can go there; his name is Stuart Marks. >> Yeah. So this one, the first one here, is not about performance at all, but it is a tool to pinpoint performance, you know, snags in your code. >> So this is kind of a microscope into your application. Now you can say, I want to get notified every time this method is called, and I want statistics about it. >> Yeah. So JFR has been around now since, well, quite a few JDK versions back. The improvement here in 25 is to get more, you know, low-level method timing and tracing built in. Earlier you would have to use, what's it called, either a big profiler like VTune, or you could use the async-profiler. Still great >> tools for devops kind of people, >> right. Now you have that built into the JVM, with higher granularity than before. So it improves the insight you can glean from the tool. >> So here is an example.
I said I'm interested in HashMap resize, because I realize that the resize is expensive, [clears throat] and so I get the stack traces of all of those, and then I can go in there and maybe see that I want to create my HashMap with a bigger upfront size, so I can avoid that kind of thing. >> Yeah. Or you can even trace, say, class loading and see which classes are loaded, >> and you can see how long it takes, and maybe you can, you know, initialize your class on another thread, or you can try to delay class initialization if you're concerned about startup. >> Exactly. Flight recordings are a great tool, and now they are better than ever. There's still the Mission Control application for consuming them in a visual fashion. The jfr view tool is, you know, a command-line complement that I find very nice to use, since it weaves easily into tooling and benchmark harnesses. >> Yeah. So, about this one. This is kind of complicated. >> Oh, >> can you tell us more? >> I like this one, because, well, there's an anecdote about the ZGC project finding that... so, whenever you do memory accesses, or access a reference to an object in Java, you need a cooperative barrier for the garbage collector, since objects are prone to move around. >> Yeah. So you can, you know, allocate something in one area, and then, during a GC phase, >> it's like a three-year-old child running around, >> yeah, you have something else running around rearranging your, uh, your blocks, >> so to avoid stepping on Lego we have barriers to make sure that >> if I'm accessing a reference to an object that has moved, you know, it gets patched up a little bit, and then you go and look in the right place. >> So this was done, um, in the parsing phase, when parsing the code, >> at compilation, >> at compilation, yeah. So it was done very early in the pipeline, up until >> recently, >> JDK 24, for G1. It was a little bit different for different GCs; ZGC did the late barrier expansion optimization earlier. What it really entails is, rather than doing all the barrier stuff up front in the parser, which created, you know, nodes in the compiler graph to be compiled... it meant that compilations were growing quite large as you inline more and more stuff. You have barriers on accesses to objects all over the place, and every such access, you know, adds barrier nodes to the compiler tree. Long [snorts] story short, if we can move them to the back end, and not have them as part of the sea-of-nodes graph that the optimizing compiler is operating on, that reduces the problem of compiling quite large applications, for example. >> It reduces the resource usage, and it is part of the story for why we see a quite substantial reduction in memory use; by doing this we also reduce compile times. >> There's, you know, a flip side to this: it was originally done in ZGC not so much as a warm-up or startup optimization. In the scope of ZGC, the original case was about correctness issues. So this started out as a correctness fix, and then we realized that, no, if we apply this everywhere else we get a big startup improvement, and possibly better correctness too.
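Following up on the HashMap-resize example above: once a recording shows that resizing is the hot spot, the usual remedy is to size the map for the expected number of entries up front. The sketch below is illustrative (names and sizes are invented); HashMap.newHashMap is the JDK 19+ convenience factory, and the commented-out line shows the equivalent on older releases.

```java
import java.util.HashMap;
import java.util.Map;

class PreSizedMaps {
    static Map<String, Integer> build(int expectedEntries) {
        // JDK 19+: picks an initial capacity that holds `expectedEntries`
        // mappings without triggering a resize.
        Map<String, Integer> map = HashMap.newHashMap(expectedEntries);

        // Older JDKs: size for the default 0.75 load factor instead:
        // Map<String, Integer> map = new HashMap<>((int) Math.ceil(expectedEntries / 0.75));

        for (int i = 0; i < expectedEntries; i++) {
            map.put("key-" + i, i);
        }
        return map;
    }
}
```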
>> Okay. So this one, this one reduces the possibility of starvation of the garbage collector. We will skip that, because we have more interesting stuff. So here we have ZGC generational mode by default. And what does generational mean? Well, first, ZGC is for larger heaps, up in the terabytes; it was introduced in 15, and gradually we moved to this generational mode, >> and before that we had the non-generational mode, meaning that the heap was just one single blob, and when we created a new object there, like this chicken here, which is short-lived, unfortunately, while the hens will live longer, the GC has to look at all these objects all the time when it evaluates whether they are alive or not. So if we look at the time distribution, as I said, most objects die young. >> Yeah. [snorts] >> And that's the hard reality, >> the generational hypothesis. It's been around in academia since the 60s, and most other garbage collectors in the JDK were already generational, but >> and the idea is to have a separate place for all the chickens. So they get created over here, and of course, if you look at both areas at the same time, the distribution remains the same. But if we look at them individually, then, because the objects that move to the other area either have to die, in the sense that they are moved, or they survive, the probability that they will die relatively increases, whereas before it was the other way around. >> It's like me: I have now lived for almost 60 years, and I haven't died. >> No. So that means I have gotten past all those probabilities from when I drove a motorcycle or whatever I did when I was a, you know, crazy kid; that's all gone now. So I will probably live longer than the average guy, >> right, expected life expectancy increases as you get older, which is the same for objects. >> Yeah. So therefore we likely spend more time over there with the kids and less time here, and that means that the garbage collector is more effective. So that's the statistics. What's this one? Ah, compact object headers. So we're running a bit short on time, but compact object headers, implemented by Roman Kennke, so this was an external contribution, is now in 25; it's not enabled by default, but it is out of experimental. It allows us... so, objects all carry some luggage; we call them object headers. They hold information like bits pointing to which class this object is of; you know, we have some space for hash codes, we have space for GC-specific bits and locking and a few other things. And in total, on a typical 64-bit JVM, this adds up to 12 bytes, I think, per object, >> which is quite a bit of overhead. It's like, on an Integer box object, which carries four bytes of useful data, >> that's a lot of overhead. That's a lot of overhead. Yeah. >> If you could reduce that just a little bit, that would be great. >> So compact headers, in the iteration that is now in JDK 25, reduce the typical object header from 96 to 64 bits, saving four bytes per object. >> Yeah. >> You know, four bytes per object adds up to quite a lot of improvement. >> If we look at how modern CPUs are built, it's very important that we, you know, pack more things into the caches, for performance reasons.
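Compact object headers shipped as an opt-in product option in JDK 25 (JEP 519; it was experimental in JDK 24 under JEP 450). The switches below come from those JEPs rather than being quoted from the talk, and the application name is a placeholder:

```
# JDK 25: product flag, off by default
java -XX:+UseCompactObjectHeaders -jar my-app.jar

# JDK 24: still experimental, needs unlocking first
java -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -jar my-app.jar
```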
>> Yeah, you mentioned earlier the compute-to-retrieve stuff, you know, 110 cycles on a typical CPU >> to retrieve from memory, compared to just computing it. So compactness is important for that. >> It moves up a lot of things: we get more space in L1, we can move objects from L2 to L1, and so on. >> Exactly. And an ever-increasing amount of stuff can go further up this tower, so to speak. >> Exactly. Getting rid of excess space is one of the main things, I think, for future JVM improvements. This particular feature, you know, can improve SPECjbb 2015 by, you know, 22% less heap and 8% less CPU. >> It's a great thing. It's not enabled by default, but you should really run your applications with it. You can try it today with this switch down here. >> Usually, you know, all features come with trade-offs. The only known trade-off with this one is that it limits the number of classes you can have in your application to 4 million. [laughter] >> It's very unlikely that you will ever have that. But I mean, >> let us know if you have more than 4 million and I will, you know, tell my friends and colleagues, >> then you can just turn it off and run with the old limit of four billion or whatever it was. >> Okay. So we have five minutes left. I think we can stop here and maybe see if there are any questions. >> Right. If there is anything of particular interest to anyone, >> we have some slides on Leyden, which is the >> startup >> startup effort, >> the big startup effort of the OpenJDK team. >> Mhm. >> But any questions? >> Okay, let's talk about Leyden. >> Good, new language features. >> Yeah. >> So the question is, with the introduction of new language features, are there any performance regressions, and I think the answer is yes.
We need to, you know, keep tabs on them, and of course we try to mitigate that before we launch the thing, whatever it is, >> right. That's one of the reasons why we have to focus so much on, you know, our microbenchmarks, because many of these system benchmarks, especially maybe SPECjbb 2015, likely won't be affected by, you know, any new language features, because the benchmark was set in stone ten years ago. >> Uh, so we have to create new benchmarks stressing new language features all the time. That's a big part of >> I think that's one part of the unglamorous job we have to do, to kind of >> clean up all these regressions, or the regressions that get introduced, >> before they become a problem, >> before they become a problem, exactly. So that's a good question. >> Very good. >> And one more. >> So, do you see any other big change in the future, other than that, that has a huge impact? >> Yes, we are working on a lot of things, and I have something on another slide here which we haven't talked about, called lazy constants, for example. It allows you to say that this is a constant, but I want to compute it later on, and when it has been computed I want it treated as a constant, and it may then be constant-folded by the VM, >> right, >> and that is just the first step, so we have more things up our sleeves. >> I would make a short shout-out to the foreign function and memory stuff as well, Panama, which is already out there, but, you know, we're working on lots of improvements and tooling to make it easier to integrate the JVM with, uh, >> [clears throat] >> native binaries and such, which is very useful. >> Yes. Okay. >> Um, >> so, what's the question? Can you repeat it? I couldn't really catch the question, sorry. >> [inaudible] So now, is it at last time to get excited and try out ZGC? >> So the question, as I understand it, is whether ZGC will be able to handle extreme heap sizes, and the answer is that I know they are working on ZGC all the time, >> right, >> but I'm not, you know, into the detail of exactly what's being worked on there. >> But I think you can, you know, >> I don't do comparative benchmarking [clears throat] with other vendors, but I do know that with the new generational iteration of ZGC you will see a leap in performance on many applications, so I think it's a good time to retry it. >> Okay, so that's great, so let's just... >> yeah, we're running out of time here, but >> that's great, because we had a lot of questions, so >> anyone else, for our last... >> okay, one last question: you just mentioned a few thoughts about startup. >> Okay. Yeah. A lot. So the question is, can we tell you a little bit more about Leyden? And yes, we can. The objective with Leyden is that you can do a training run, and when you do a training run you can store all of that class loading and initialization, maybe some profile data, and even code, in something called an AOT cache. And then when you start up, you can just load that AOT cache and off you go.
So you can skip all of this, you know, this circle-of-life sequence for the code, and you can just start there, and that means faster startup and better warm-up time, >> right, >> so that's the gist of Leyden. >> And the promise of Leyden is to have no added limitations: it's Java as usual, but you have this cache to speed it up on startup, >> and it all relies on speculation. So if you speculate, and then deploy it, and the speculation doesn't hold, okay, then you just fall back. >> You can get a little bit of a regression if your training run, you know, misses the target; that's always a risk. >> But it's a very promising technology. It's going to improve further; work that is being done for future releases is to, you know, do AOT compilation directly into your AOT cache. You can tune how far you want to go and how much you want to put into it. >> Okay. So I think we're going to get kicked off the stage. >> So we have to stop, but thank you for your interest. >> [applause]
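As a concrete sketch of the training-run workflow described at the end of the talk, the AOT cache introduced under Project Leyden (JEP 483 in JDK 24, refined in JDK 25) is driven roughly as below. The flag names come from those JEPs, not from the talk, and the application and file names are placeholders:

```
# 1. Training run: record which classes get loaded and linked.
java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar com.example.App

# 2. Create the AOT cache from the recorded configuration.
java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar

# 3. Production runs: start from the cache and skip much of the startup work.
java -XX:AOTCache=app.aot -cp app.jar com.example.App
```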

Video description

JDK 25 has arrived, bringing a major set of performance gains over JDK 21—often letting your existing, unchanged Java applications run faster right away. In this talk, we’ll dive into 13 concrete performance improvements delivered between JDK 21 and JDK 25 across the standard libraries, the JIT compiler, and garbage collectors. Along the way, you’ll get an inside look at the design tradeoffs behind these optimizations and how JDK engineers evaluate performance in the real world—where platforms differ and optimization goals can conflict. We’ll also spotlight one of the most exciting new additions: the preview feature Stable Value, which lets a field combine key benefits of both mutable and immutable data. You’ll learn how Stable Value works, what kinds of speedups it can unlock, and how you can start taking advantage of it today. Presented by *Claes Redestad* and *Per Minborg* (Java Platform Group - Oracle) during *Jfokus* 2026 ➤ https://www.jfokus.se More on Performance ➤ https://inside.java/tag/performance Tags: #Java, #Performance, #JVM, #OpenJDK
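The "Stable Value" preview feature mentioned in the talk and in this description follows the lazy-constant idea: compute a value at most once, on first use, and let the JIT then treat it as a constant. As a rough illustration of that idea using only long-standing Java idioms (the preview StableValue API itself is not shown here, and the class and field names below are invented), the classic lazy-holder pattern achieves the same effect for static state:

```java
// Lazy-holder idiom: EXPENSIVE is computed only when first needed, and because it
// lands in a static final field the JIT can constant-fold reads of it afterwards.
final class Config {
    private static final class Holder {
        static final Config EXPENSIVE = loadFromDisk();   // runs once, on first access
    }

    static Config instance() {
        return Holder.EXPENSIVE;                          // constant-foldable after init
    }

    private static Config loadFromDisk() {
        // ... expensive initialization elided ...
        return new Config();
    }

    private Config() { }
}
```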
