bouncer

Code Sync · 169 views · 10 likes

Analysis Summary

Influence score: 20% (Minimal) on a mild / moderate / severe scale

“Be aware that the 'ABBA disaster' scenario is a rhetorical device designed to make a specific technical limitation feel like an urgent business crisis.”

Transparency: Transparent · Human Detected: 100%

Signals

The video is a live recording of a technical conference presentation featuring a human speaker with natural speech patterns, spontaneous interactions, and personal context. There are no indicators of synthetic narration or AI-driven content farm production.

  • Natural Speech Disfluencies: Transcript contains frequent filler words ('um', 'uh'), self-corrections, and natural pauses ('Oh, works', 'I have to be closer').
  • Personal Anecdotes and Context: Speaker mentions being Swedish, living in San Francisco, and jokes about his colleague's comments and the 'hat' he is wearing.
  • Live Event Interaction: Speaker interacts with the audience and the tech setup ('how is this working fine by the way?') and references previous years' summit talks.
  • Spontaneous Sentence Structure: The phrasing is conversational and non-linear, unlike the rigid, optimized structure of AI-generated scripts.

Worth Noting

Positive elements

  • This video provides a detailed look at the internal storage mechanics of RabbitMQ streams and a specific implementation of tiered storage to reduce the recovery time objective (RTO).

Be Aware

Cautionary elements

  • The speaker explicitly separates his 'Amazon hat' from his 'RabbitMQ hat,' which may subtly obscure how this technical direction aligns with Amazon MQ's commercial interests.

About this analysis

Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.

This analysis is a tool for your own thinking — what you do with it is up to you.

Analyzed: March 13, 2026 at 20:40 UTC · Model: google/gemini-3-flash-preview-20251217 · Prompt Pack: bouncer_influence_analyzer 2026-03-11a · App Version: 0.1.0
Transcript

Hello. Oh, works. Um, uh, you actually did a very good job. That's, uh, pretty much how you pronounce my name. So yeah, I'm Simon, Simon Una. Uh, I am Swedish. My colleague thought it was important to tell you that, I don't know why. Uh, but I live in San Francisco. I do work for... Oh, I have to be closer. I do work for Amazon MQ. More specifically, I work for RabbitMQ, or I work with RabbitMQ, for the Amazon MQ team. So that's the hat I'm going to wear today: not my Amazon hat, but the RabbitMQ hat. Um, and today I want to talk to you about... how is this working fine by the way? Okay, good. Um, I want to talk to you about how our team is thinking about how to break the storage barriers of RabbitMQ streams and how to, uh, scale them beyond the local disk they stand on. Uh, first a quick primer for the ones out here who do not know what RabbitMQ streams are. They are an append-only log, one of the queue types of RabbitMQ, with offset-based consumption. Think, um, Kafka topics or a database transaction log. That means messages get written by multiple writers to the tail of the log, to the very tip of the tail, and you have multiple consumers, and the consumers can start reading from the log at any point: they choose an offset and they start reading from that offset towards the tail of the log, and eventually they end up at the tail. So, um, keep that in mind: there is a tail where writers are writing, and that's the place where most consumers, or eventually all consumers, will end up. So the majority of the reads will be at the tail of the log, but you can start anywhere, and they read sequentially, right? They start somewhere and then they just continue reading. Um, oh yeah, I have this one. Uh, yeah, and the difference here, of course, from traditional queues is that once you consume a message it does not get deleted. It sticks around. So that means, um, storage here becomes a matter of throughput and retention. So if you have a retention policy that says you want to keep your data for 7 days, and you have a throughput of 10 megabytes per second, that ends up being 6 terabytes of data that you need to store. Uh, keep that in mind as well for the rest of this talk. Streams are replicated, or they should be replicated if you want a durable system and want to distribute the load. So how this works in RabbitMQ is that you write to one node, which we call the leader, and you consume preferably from the other nodes, which we call the replicas or the followers, in order to distribute the load, right? This is great for reliability, and it's good for load handling. Um, streams were designed, I think, with throughput in mind, and the throughput for RabbitMQ streams is kind of amazing. It's really, really fast. I'm not going to go into why and how it's fast. There are talks, uh, from previous RabbitMQ Summits and blog posts written by the RabbitMQ maintainers that cover how and why, and all the cool features RabbitMQ Streams has. That's not the topic of my talk today at all. But it is cool, so I would, uh, look it up if you have the time. Uh, but I don't think storage scalability was part of their focus. That was more of an infrastructure problem, so they pushed that to you, the users of RabbitMQ.
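To make that retention arithmetic concrete, here is a minimal sketch in Erlang (the language the speaker later says the system is built in); the module and function names are invented for illustration and are not RabbitMQ code:

```erlang
%% Back-of-the-envelope storage estimate from the talk:
%% bytes to store = write throughput x retention window (per node, before replication).
%% Sketch only -- the module and function names are invented, not RabbitMQ code.
-module(stream_math).
-export([retention_bytes/2]).

%% ThroughputMBps: write rate in megabytes per second.
%% RetentionDays: how long messages are kept before retention kicks in.
retention_bytes(ThroughputMBps, RetentionDays) ->
    SecondsRetained = RetentionDays * 24 * 60 * 60,
    ThroughputMBps * 1000000 * SecondsRetained.

%% stream_math:retention_bytes(10, 7) -> 6048000000000, i.e. roughly 6 TB,
%% matching the "10 MB/s for 7 days" example above.
```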
Um, so one of the catches here, one of the consequences of this design where you have a cluster of nodes and each node can handle consumers, is that all the nodes need to have that data locally as well. So they replicate the data. So if you have six terabytes of data on one node, in total you'll have, uh, 18 terabytes of data, right? Times the number of nodes that you have in your cluster, because every node needs a full copy of the data set. So also keep that in mind for the rest of this talk. All right, let me tell you a little story, a little imaginary tale. Let's say that you are responsible for an app called Ticket Champs, um, a ticketing platform where you sell tickets for shows, hockey games, soccer, football here, should be everywhere. And let's say that part of your system is a RabbitMQ cluster of three nodes which handles your order streams, and you have 10 megabytes per second of throughput like I said before, and you have seven days of retention. You end up having 6 terabytes of data per stream, and the streams handle your orders. They handle, um, confirmation emails, payment confirmations being sent out. You have an analytics platform that needs this data, and fraud detection, or whatever. A bunch of different consumers that need this data from the streams. Everything works well, humming along nicely. But then a disaster happens, and ignore the typo. Um, disaster. ABBA announces a show, a one-day-only show. One-night-only show. And we're talking about the real deal here. The actual members of ABBA will be on stage. It's not going to be holograms or anything like that; the four original members of ABBA will be on stage. Not sure if you know much about ABBA, but this is kind of a big deal. Millions, maybe billions of people want a ticket, right? And they want it now. And for whatever reason, they chose your platform, and they did not give you a heads up. So what happens? Well, traffic explodes: your orders, um, go from 10 megabytes per second up to maybe a hundred or hundreds of megabytes per second, right? Um, payment processing going through the roof. Millions of confirmations, not millions, but a lot of confirmations need to be sent. The fraud detection system is struggling, the inventory system is struggling, your nodes are at capacity, CPU is pegged, and the network is saturated. So what do you do? Well, you scale up, right? That's the natural solution to this problem. You just add more nodes. Three nodes can't handle it, so we'll add two more nodes and we'll have five nodes, and everything will be all dandy, right? Well, no, because we have the problem of replication. For a node to be able to handle traffic, it needs a full copy of the data, and that takes time. 6 terabytes of data is a lot of data to replicate. So even if you have fast, um, connectivity between your nodes, say 1 GB per second, if we do the math it still ends up being like 102 minutes, let's say two hours, for replicating this stream. And you probably have more than one stream, but for this little experiment let's say you just have one stream, right? So that means that for two hours your three nodes that are up and running won't get help from these other two nodes. So the system is struggling. Your customers get all weird errors on their end: they try to buy a ticket and it doesn't work.
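The two-hour figure can be sanity-checked with the same kind of back-of-the-envelope math; a minimal sketch, again with invented names (the 50 MB tail mentioned in the last comment is the figure the talk introduces next):

```erlang
%% How long does a joining node need to copy DataBytes over a link that moves
%% LinkBytesPerSec? Sketch only -- invented names, not RabbitMQ code.
-module(replication_math).
-export([copy_minutes/2]).

copy_minutes(DataBytes, LinkBytesPerSec) ->
    Seconds = DataBytes / LinkBytesPerSec,
    Seconds / 60.

%% Full data set, 6 TiB over a 1 GiB/s link:
%%   replication_math:copy_minutes(6 * 1024 * 1024 * 1024 * 1024, 1024 * 1024 * 1024)
%%   -> 102.4 minutes, the "roughly two hours" mentioned in the talk.
%% For comparison, a 50 MB tail (the tail-only idea introduced next) moves in
%% well under a second on the same link.
```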
The tickets do sell out, but they sell out in a disorderly fashion. So you will end up with very angry customers and angry reviews, and you will have headlines in newspapers that you don't want. So we have a dream, and yeah, it's going to be other people. So we have a dream in our little team to fix this, right? We wanted the scale-out to go down from hours to just seconds, preferably. Um, and our thought process, our progression of thinking, was that we looked at what we have today, where we have local-only storage, 6 terabytes of data. That really doesn't work in the case of, uh, ABBA, right? We just saw that fail. Then we looked at traditional, uh, tiered storage, where you kind of solve that problem by having, um, hot and cold data. So you have a certain amount of hot data, and older data you ship off to some kind of remote shared storage, and you call that cold data. Uh, but usually you have a set amount of data that you store locally, like 50 gigabytes or something like that, and that kind of solves the issue a little bit. You do get scale-out down from hours to probably minutes, but it's not good enough for ABBA tickets. We want this to be seconds. Not even minutes is good enough, right? So we want to get to the lower point of this little graph here, where we minimize what gets replicated when we add a node. And, uh, we can't solve the replication speed, right? The six terabytes of data need to be replicated, and the network is going to be the bottleneck here. We can't make that faster. There's no magic bullet to fix that. So instead we need to solve what we replicate, and we want to replicate as little as possible. So instead of saying we replicate only 50 GB of data, what if we only replicate the tail? If we aggressively upload everything else to remote storage, then when you add a new replica, all it has to do is get the tail from the other nodes of the cluster, say 50 megabytes or so, because, well, it no longer needs the full data, because the full data is already in remote storage, right? And yeah, compared to traditional tiered storage, we don't have to wait for data to age out or for your disk to fill up; we immediately just archive it. So that's what we did, or that's what we're doing. Um, and the way we're doing this: we don't want to mess with RabbitMQ's, uh, sophisticated high throughput. So we didn't want to mess with the control plane at all. We didn't want to mess with how writes are done. We did not want to mess with how replication is done. And we still want replication to happen, uh, because it needs to, and we don't want to mess with that at all, because it's smart and clever, and don't mess with that. We don't want to mess with the reads, or we do want to mess with reads, but we don't want to mess with the read APIs; we do want to mess with where the reads are coming from. And we also wanted this to be pluggable, right? We want to be able to fall back to the default behavior, or add something different if you want that. So we introduced, uh, an abstraction layer where we have two behaviors, think, uh, interfaces in Java, but in the Erlang world, which is the language this is built in, uh, we call them behaviors. So, um, two behaviors that we think you need to implement.
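As a rough picture of what two such Erlang behaviours could look like, here is a hypothetical sketch; the module names and callback signatures are invented for illustration and are not the actual Osiris or plugin API (the talk describes both behaviours in more detail next):

```erlang
%% Hypothetical "reader" behaviour: given a stream and a start offset, produce
%% something the consumer-facing code can pull chunks from, wherever the bytes
%% actually live. Names and signatures are invented for illustration only.
-module(stream_reader_behaviour).

-callback init(Stream :: binary(), Offset :: non_neg_integer(), Opts :: map()) ->
    {ok, State :: term()} | {error, Reason :: term()}.
-callback read_chunk(State :: term()) ->
    {ok, Chunk :: binary(), NewState :: term()} | end_of_stream | {error, term()}.
-callback close(State :: term()) -> ok.
```

And a "manifest" behaviour that answers where a given offset lives, again with invented names:

```erlang
%% Hypothetical "manifest" behaviour (a separate module): map an offset to the
%% local segment file or remote object that holds it.
-module(stream_manifest_behaviour).

-callback locate(Stream :: binary(), Offset :: non_neg_integer()) ->
    {local, file:filename()} | {remote, ObjectKey :: binary()} | {error, not_found}.
-callback record_archived(Stream :: binary(), ObjectKey :: binary(),
                          FirstOffset :: non_neg_integer(),
                          LastOffset :: non_neg_integer()) -> ok.
```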
The, um, abstract reader behavior, which tells you how to, well, read the data from somewhere: from an offset, uh, a point in some file, start shipping that to the consumer. And then we have the manifest abstraction, which is about where the data is: if you come in with an offset, how do we find this message, in a file or whatever kind of backend system you have? That's up to the manifest to tell you. And we wanted this to work, uh, as it does today if you don't want to use this. So the default behavior implements these two behaviors by doing what it does today: the manifest is the local disk, and, um, reads just read from the local disk, just like today; we didn't touch that at all. Or you add a plugin, the remote storage plugin, which handles, uh, how to read, where to read from, and the archiving of the data. Yeah, I know. Um, so let's talk about the write path here. And I want to emphasize that this is unchanged. I didn't talk about how writing works, but this is, very simplistically, how it works. You have a publisher; it sends a message to RabbitMQ. You have a writer in RabbitMQ that writes that message to local disk, appends it to the log. You have replicas that take that message and do the same thing on their end, and then the message is considered committed when you have a quorum in your cluster. And that's it; it's unchanged, it works just like it did before. Then, and I'm going to use the word, uh, asynchronously, even though I'm not entirely sure I should, we have a background process that uploads this data, once it's committed, to shared storage and updates the manifest. So again, no impact on writes, because we didn't mess with the writes at all. The reads, however, are a little bit different. The reader needs to know where the data is, and it knows that by looking at the manifest. So a consumer, who does not care where the data is, starts a reader. The reader tries to figure out where the data is. If it's on local disk, well, it just behaves like it did before. It reads the data from local disk, very fast, almost no latency at all, right? Because it's just local disk. And then it ships the data to the consumer. Or it's not on local disk, which means we archived it to, um, remote storage, and then we read it from there instead and ship it to the consumer. Now, since it's remote, there will be a little latency hit when we open the connection and set everything up, right? But we think, or we're fairly confident, that this small latency hit of 50 milliseconds, or whatever it's going to end up being, is all you'll see, because the throughput will be fast. Uh, and it will be fast, or just as fast as reading from local disk, and the reason for this is that the data is sequential, right? You start reading from one point and then you read forward in the history. So we know what the consumer is going to read next. So we can do some smart logic there and do prefetching, and make sure that, yes, it's going to take a little hit in the beginning, but then you get the throughput you're used to. And of course this is also completely transparent to the consumer. They don't care. They use the same API as before. So what does this mean?
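Tying the two sketched behaviours together, the read path described above might dispatch roughly like this; a hypothetical sketch with invented module names, not the actual implementation:

```erlang
%% Hypothetical read dispatch: ask the manifest where the offset lives, then
%% read from local disk or remote storage. Module names (stream_manifest,
%% local_segment_reader, remote_object_reader) are invented for illustration.
-module(tiered_read).
-export([open_reader/2]).

open_reader(Stream, Offset) ->
    case stream_manifest:locate(Stream, Offset) of
        {local, SegmentPath} ->
            %% Hot path: unchanged behaviour, read straight off the local disk.
            local_segment_reader:init(SegmentPath, Offset, #{});
        {remote, ObjectKey} ->
            %% Cold path: pay a one-time connection/setup latency, then keep
            %% throughput up with sequential prefetching of the next chunks.
            remote_object_reader:init(ObjectKey, Offset, #{prefetch_chunks => 4});
        {error, not_found} ->
            {error, offset_out_of_range}
    end.
```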
Um, on the right, or on your left, um, we have the old, or the current, setup, where a new node joins and it downloads all the data from the other nodes, 6 terabytes, and in two hours, um, it's ready. That's not acceptable for Björn and Benny. And then we have the new version, where instead the new node joins and it only fetches the tail, 50 megabytes or so, from the leader, and it's up and running in seconds, and the node is ready to serve traffic, because, yep, again, there's no full replication needed. Just fetch the tail and everything is up and running, right? And it's kind of a dramatic change, right? So if you go back to our ABBA story, um, where ABBA announced their tour and you saw the traffic spike and we need to do something, what we do is we scale up. So we add two nodes, and now, in this new world, we get two nodes in seconds, and the concert sells out again, but it sells out smoothly. So you have happy customers, or whomever, the customers who were there in time are happy. So the winner takes it all, and the winners are you and the customers. Right now we are building this. We, in the Amazon MQ team, are currently building the S3 plugin that implements this reader and the manifest part and does the archiving of data up to S3, figures out the reads, figures out how to do this efficiently, right? Um, it's open source and, uh, it's a work in progress. So please come and help us, and look at it, and, um, come with ideas, and, yeah, contribute. Uh, but we'd also like to see you implement your own plugins for different backends, like Azure Blob Storage or Google Cloud Storage, or whatever, like your QNAP NAS or whatever you have. Now, I talked with one of the engineers from, uh, CloudAMQP about this, and he said, can it be more generic? And yes, it should be able to be more generic. This could be where we're going; this is not what we're currently doing, but it would be cool if this is where we ended up, because I think, uh, that most of the logic in the reads and the archiving and the manifest will be very similar regardless of the backend. So we could probably make a cool cloud plugin where most of the logic lives, and you just have to add small modules, or small, uh, behaviors, for authentication and basically how to do the puts and gets to those backends. This is where I, uh, call for your support, to get involved. Uh, this is open source, like I said. We have a PR open for the abstraction layer that needs to go into RabbitMQ, in the RabbitMQ Osiris library, which is the streaming backend of RabbitMQ, uh, PR 196. And then we have the S3 plugin that we're currently writing, which is working but is very much a work in progress. And like I said, it's all open source, so please, uh, get involved. Um, go and look at the PR for the abstraction and comment on it. Um, look at our S3 plugin design and comment on that. Try it out. Create issues. Uh, better yet, create PRs that fix those issues. And, uh, come implement your own plugins. Uh, share your use cases with us and tell us if this abstraction level works or not. We think it does, but it would be great to get more input. And, um, yeah, thank you for the music, and thank you for listening to my little story. The QR code should take you to the GitHub links and the Discord channel on the RabbitMQ Discord server. Um, questions? No, I'll be standing in the booth, uh, this afternoon.
I don't know when, but at some point. So please come and talk to me if you have, uh, more detailed questions that you don't think you can ask me here. Right. Thank you. >> Thank you very much. Thank you, S.
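The "more generic cloud plugin" idea mentioned near the end of the talk would presumably reduce each backend to a very small surface; something like this hypothetical behaviour, with invented names rather than an existing API:

```erlang
%% Hypothetical per-backend behaviour for the "generic cloud plugin" idea:
%% shared archiving/read/manifest logic lives in one place, and each backend
%% (S3, Azure Blob Storage, Google Cloud Storage, a NAS, ...) only supplies
%% authentication plus raw puts and gets. Invented names, not an existing API.
-module(object_store_backend).

-callback connect(Config :: map()) -> {ok, Conn :: term()} | {error, term()}.
-callback put_object(Conn :: term(), Key :: binary(), Data :: iodata()) ->
    ok | {error, term()}.
-callback get_object(Conn :: term(), Key :: binary(),
                     Range :: all | {Offset :: non_neg_integer(), Len :: pos_integer()}) ->
    {ok, binary()} | {error, term()}.
```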

Video description

✨ This talk was recorded at MQ Summit 2025. If you're curious about our upcoming event, check https://mqsummit.com/ ✨ As streaming workloads grow, storage becomes the bottleneck that limits RabbitMQ deployments. This technical deep-dive explores how to break through local disk barriers using tiered storage architecture. You'll learn the fundamentals of RabbitMQ stream storage - from segment files to I/O operations - and discover how to seamlessly extend existing systems through file abstraction layers. We'll cover implementing storage backends that preserve write performance while enabling transparent reads across tiers, all without disrupting live streaming workloads. Let's keep in touch! Follow us on: 💥 Twitter: / MQSummit 💥 BlueSky: / mqsummit.bsky.social 💥 LinkedIn: / mqsummit
