bouncer

Code Sync · 145 views · 5 likes

Analysis Summary

20% Minimal Influence

“Be aware that while the technical patterns discussed are industry-standard, the examples and trade-offs are framed through the lens of AWS services, which may lead you to favor their ecosystem over alternative open-source or multi-cloud solutions.”

Transparency: Transparent
Primary technique

Appeal to authority

Citing an expert or institution to support a claim, substituting their credibility for evidence you can evaluate yourself. Legitimate when the authority is relevant; manipulative when they aren't qualified or when the citation is vague.

Argumentum ad verecundiam (Locke, 1690); Cialdini's Authority principle (1984)

Human Detected
100%

Signals

The video is a live recording of a technical conference presentation featuring natural human speech patterns, spontaneous humor, and contextual references to the physical event. The presence of authentic verbal fillers and real-time interaction with the audience confirms it is human-generated.

Natural Speech Disfluencies Transcript contains filler words ('uh', 'um'), self-corrections ('load sh...'), and conversational stutters ('from from an example').
Situational Context Speakers reference the physical environment ('first time I use such a hand mic'), the event schedule ('afternoon slots of this conference'), and previous sessions ('as we have learned in the keynote today').
Personal Anecdotes and Humor Speaker makes a joke about not 'eating' the microphone and uses colloquialisms like 'there's always something that sucks'.
Professional Identity Speakers introduce themselves with specific roles at AWS and Amazon SQS/SNS, matching the technical depth of the content.

Worth Noting

Positive elements

  • This video provides a highly clear and structured breakdown of advanced sharding techniques like 'shuffle sharding' which are often difficult to find explained simply.

Be Aware

Cautionary elements

  • The 'authority bias' created by the speakers' titles at Amazon may lead viewers to accept the presented trade-offs as universal truths rather than AWS-specific constraints.

About this analysis

Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.

This analysis is a tool for your own thinking — what you do with it is up to you.

Analyzed March 13, 2026 at 20:40 UTC Model google/gemini-3-flash-preview-20251217 Prompt Pack bouncer_influence_analyzer 2026-03-11a App Version 0.1.0
Transcript

So hello everyone. First time I use such a hand mic. So I hope it will work well and I don't try to eat it up. So, welcome to the uh afternoon uh slots of this uh conference. My name is Durk. I'm a principal solutions architect in Amazon Web Services. In my day job, I work with software companies on their multi-dimensional transformation. >> And my name is Artam and I'm a principal engineer with Amazon SQS SNS. >> All right, let's kick it off. And um we're talking about noisy neighbors as you can see already. Um, probably nobody really likes noisy neighbors. I would agree. But maybe they can be less stressful if we find ways to isolate them. Um, to make this all more tangible, let's also um um take this from from an example scenario. So imagine you would want to build a um multi-tenant uh solution that implements a ride share service. So you are thinking big and your customers are large companies. Each of those companies is one of your tenants, and the employees of these large companies are actually the users of your ride share service. This is uh obviously dramatically simplified here. So we only look at the ride booking service, but that's good enough for this talk. And we're particularly interested in this booking processing queue. There's obviously a ton of other downstream systems that will also contribute to the processing of everything, but what we are interested in here are those components. We have an API resource, probably there's an API gatewayish thing that uh manages that for us. We have a pre-processing resource that just does some sanity checks, um puts the booking request into the processing queue and immediately returns to the client 202 accepted.
um we have received your booking, um rest assured we are going to look after it. And then we wanted to decouple, and you probably too wanted to decouple the pre-processing from the processing. For that we use the processing queue, and as we have learned in the keynote today, one of those aspects that we're looking after with queues is to reduce the temporal phase of coupling, which we're going to do here, but also um to protect the downstream systems from peak loads. Now when you want to start building such a thing, you will probably do it like most software companies do it, in the very simple way: create a multi-tenant queue for all of your tenants. Very simple. You don't have to do a lot when you onboard a new tenant. So that's great as long as you have a balanced um content of your queue here. So all uh tenants have uh an equal amount of messages, or at least there's nobody that stands out. But what if one of your tenants starts building up a huge backlog? One reason for that is that one of your uh tenants, or the employees of one of your tenants, create a crazy amount of ride bookings. The other reason could be that processing of those uh bookings uh runs into errors or takes unexpectedly much time, and that makes uh you probably end up in such a scenario where we have a tenant one that dominates the queue, and um the other tenants will suffer from starvation, which is not good. So there are a number of uh mitigation patterns to address this, and patterns are a great means for architects anyway, right? And if those patterns even help you to mitigate pain points, that's even better. Um, I said already it's a great tool for architects to speak in patterns, and what is even more important is to understand patterns, and particularly the benefits and the trade-offs that you get from patterns, because you always need to pay with something for the benefits that you're getting.
And this is why I always like to remind everybody that every architecture decision you take comes with a trade-off. Or in other words, there's always something that sucks. And every architect should be aware of this. Know the patterns and the trade-offs to pick the best choice on the table. All right. So let's look at some patterns. The first one is load shedding. You might think, why load shedding? Uh is isn't that more of a general uh pattern that you can apply everywhere? Yes, you can. But you can also make use of it in such a situation. What does it mean? You just want to get rid of messages in a queue that have reached a certain age. And that might be really useful in uh cases where your messages lose relevance quickly after a short amount of time. Let's go back to that example. If we look at the booking processing queue, let's assume we have an SLA of 5 minutes within which the end user should get the response for their booking request. And if after 5 minutes that message is still in here, we can also throw it away. It has no value anymore. So that's uh cool. Otherwise, uh in general it is of course quite a drastic tool to uh to follow. So you should really be aware um that you might also do evil things by blindly throwing messages away. Right? So it is in many cases, and also in this case, more of a business decision than a technical decision when and where to throw messages away. Let's look at another, also more general pattern, which is back pressure. Um one cool thing about queues is, um everybody knows it, I don't have to explain it, that they can flatten peak loads. Um and there are situations where you have constantly fast producers and constantly slow consumers. This is the lower case here. And that means over time the queues fill up. Um and we wouldn't uh want to ask message producers in general to slow down, but rather the ones that are responsible for sending those messages that belong to our noisy neighbor.
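The load-shedding idea described in the talk (drop messages that have outlived their SLA) can be sketched roughly as below. The 5-minute `MAX_AGE_SECONDS` comes from the talk's example; the message shape and function names are illustrative assumptions, not any real queue API.

```python
import time

# SLA from the talk's example: a booking response is due within 5 minutes,
# so older messages have lost their value and can be shed.
MAX_AGE_SECONDS = 5 * 60

def shed_expired(message, now=None):
    """Return True if the message is past its SLA and should be dropped."""
    now = time.time() if now is None else now
    return now - message["enqueued_at"] > MAX_AGE_SECONDS

def consume(messages, handler, now=None):
    """Process a batch, shedding anything past its SLA instead of handling it."""
    handled, shed = 0, 0
    for msg in messages:
        if shed_expired(msg, now):
            shed += 1          # dropped: a late booking response has no value
        else:
            handler(msg)
            handled += 1
    return handled, shed
```

As the speaker stresses, whether (and after how long) dropping is acceptable is a business decision; the age threshold would come from product requirements, not engineering alone.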
And uh in this uh example here, we are in the um fortunate situation that we also have um end users that enter our system through the web. So we can actually also use our API gateway to throttle a tenant. Typically in API gateway products, you have an API key that is assigned to a tenant, and you can assign a dynamic configuration for how many requests per time unit somebody can send. So that's cool. Um in other cases where you uh don't have an API entry point, you need to address the producers directly with producer flow control, and for that you have to have access to your producers. And again, it is more um of a business than of a tech decision um when to throttle and how to throttle. It is a pattern that can be used in combination with other patterns too. And now let's look at the other side of the spectrum. We saw we started with a multi-tenant queue for everybody. The other side of the spectrum is one single-tenant queue for each of your tenants. That looks super nice at first glance, and super easy and super straightforward. Um, nobody can harm any other tenant. So that's great. But if you look at the operational um overhead, at the implementation overhead and also at the cost penalty that might come along with it, it might not be the best solution, because you have those uh tenants here whose queues you constantly need to poll, and even if there is a tenant that doesn't produce a lot of uh messages, you still need to poll their queues, and that is a waste of resources. Actually, I just yesterday spoke with a customer who migrated away, or let's say refactored away, from this situation because of the cost penalty. Oops, that was the wrong direction. So what we've looked at so far are the two extremes of the spectrum: one MTQ for n tenants, and n STQs for n tenants. Um maybe we should look also at options somewhere more in the middle. And one thing we could look into is cell sharding. It follows the ideas of a cell-based architecture.
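The per-tenant throttling described for the API gateway is commonly implemented as a token bucket keyed by API key or tenant id. A minimal sketch, assuming a fixed `rate`/`burst` per tenant (real gateways let you configure these dynamically per key, as the speaker notes):

```python
import time

class TenantThrottle:
    """Per-tenant token bucket: each tenant may send `rate` requests/sec,
    bursting up to `burst` requests. Mirrors per-API-key gateway limits."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.state = {}  # tenant -> (tokens remaining, last refill timestamp)

    def allow(self, tenant, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self.state.get(tenant, (self.burst, now))
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.state[tenant] = (tokens - 1.0, now)
            return True          # request passes through
        self.state[tenant] = (tokens, now)
        return False             # throttled: tenant exceeded their rate
```

Because state is keyed per tenant, a noisy tenant exhausting its bucket never affects the others, which is exactly the isolation property the talk is after.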
And this is all about fault fault isolation and reducing the blast radius when something goes wrong. And the idea here is that you just use dedicated queues for a subset of your tenants. You can use a sharding function like you would do with database sharding. It is of course a little bit simplified on this um slide, but you get the idea, right? So um as long as everything goes smooth, we don't need to look into that um too much. But if now, for the first cell, tenant one becomes a noisy neighbor, we see that only the other tenants in this cell are affected. Um in the other cells no tenant is affected by that. So that's quite cool. You can isolate the blast radius in this case to 25% of your overall tenants. Um it comes of course with the respective um overhead of implementing the onboarding and offboarding, and making sure you choose um a good and suitable sharding function for this. But other than that it is uh no rocket science at all. All right. And now there are ways to mitigate noisy neighbor impact even more, and without falling back into one single-tenant queue per tenant as we saw before, and Artam is going to talk about it. >> Thanks Durk. All right. So with cell sharding we can reduce our blast radius to the number of uh cells. Uh essentially with four cells it's 25%, a noisy neighbor can only impact 25% of our customers, of our tenants, or with 100 cells we would see 1% impact uh across the system. So the next strategy is to use uh shuffle sharding. With shuffle sharding, we still have uh multiple multi-tenant queues, but now we assign each tenant to uh several shards, two for example, in in this diagram. Um in this example, we're in a steady state. No one is building a backlog. Uh our producers, they uh logically assign a pair of shards for every tenant, using for example a hash function. And when publishing a message, the producer will pick the one of the assigned shards that has the shallowest backlog. So how does it help?
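A cell-sharding function like the one the speaker mentions can be as simple as a stable hash of the tenant id. This sketch uses 4 cells to match the 25% blast-radius example; `queue_for` and its naming convention are hypothetical, purely for illustration:

```python
import hashlib

NUM_CELLS = 4  # 4 cells => a noisy neighbor's blast radius is at most 25% of tenants

def cell_for(tenant_id: str, num_cells: int = NUM_CELLS) -> int:
    """Stable sharding function: the same tenant always maps to the same cell.
    A cryptographic hash keeps the assignment uniform and order-independent."""
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_cells

def queue_for(tenant_id: str) -> str:
    # Hypothetical queue-naming convention, not a real service API.
    return f"booking-processing-cell-{cell_for(tenant_id)}"
```

Note that changing `NUM_CELLS` remaps most tenants to different cells, which is part of the onboarding/offboarding overhead the speaker warns about; consistent hashing is a common refinement when cells must be added over time.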
Uh let's say tenant one becomes a noisy neighbor, and now they start building a backlog in both of their assigned shards. So when publishing a message for tenant one, we're picking the shard with the least backlog. Doesn't really change anything. We're filling up both shards that belong to our tenant. But what happens, for example, for tenant two, that has a partial intersect? They're assigned to shard two and shard three. Shard two is impacted. But when we publish a message that belongs to tenant two, uh we realize, our publisher realizes, that uh shard three actually has no backlog. So it will direct all messages for tenant two into that shard, and tenant two now is not impacted anymore. Same goes for every every other tenant. And now our impact is limited only to those tenants that are unlucky enough to share the same set of shards with the noisy neighbor. And this gives quite a significant benefit. So for example, if we have uh 100 shards and we assign every tenant to five shards, the chance of two tenants having exactly the same five shards is only one in 75 million. So our impact would be reduced to almost nothing when there is a noisy neighbor. Uh it would be very unlikely to have two tenants sharing the same set of shards. But what are the trade-offs here? Well, first of all, every publisher needs to know the depth of every shard every time they publish a message. So, for example, if I have 100 shards, it means I need to periodically poll the depth of every shard uh in every uh producer. So my polling rate is a multiplication of the uh number of shards times the number of producers, and it might not scale well uh as our capacity increases. And generally it's not a nice property to have in a system where the more uh producers I have, the more work my system has to do in order to run. So what other strategies could we employ?
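The shuffle-sharding scheme just described can be sketched as follows: each tenant is deterministically assigned k of n shards, and the publisher routes each message to the assigned shard with the shallowest backlog. The hash-seeded assignment is one plausible implementation, an assumption rather than the speakers' exact method; the final assertion checks the "one in 75 million" figure from the talk (1 / C(100, 5)).

```python
import hashlib
import math
import random

def assigned_shards(tenant_id: str, num_shards: int, k: int):
    """Deterministically assign a tenant to k of num_shards shards (their
    'shuffle shard') by seeding a sampler from a hash of the tenant id."""
    seed = int.from_bytes(hashlib.sha256(tenant_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_shards), k))

def pick_shard(tenant_id: str, backlogs, k: int = 5):
    """When publishing, pick the tenant's assigned shard with the
    shallowest backlog (backlogs[i] = current depth of shard i)."""
    shards = assigned_shards(tenant_id, len(backlogs), k)
    return min(shards, key=lambda s: backlogs[s])

# The "1 in 75 million" figure: with 100 shards and 5 per tenant, two
# tenants collide on all five shards with probability 1 / C(100, 5).
assert math.comb(100, 5) == 75_287_520
```

The trade-off the speaker names is visible in `pick_shard`: every producer needs fresh backlog depths for all of its tenant's shards, so polling cost grows with producers × shards.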
Let's look at some hybrid strategies, where we have a multi-tenant queue for most tenants and we have a single-tenant queue for selected tenants. This is a typical approach used in, for example, software-as-a-service solutions where there are some special tenants that are paying extra to have a dedicated capacity. In that scenario, all messages that belong to such a tenant go into their own queue. This gives us all kinds of options for what to do on the consumer side. We could for example have a dedicated consumer capacity just for that queue. Uh this way our dedicated tenant can never impact anyone else. Uh if they generate a surge of traffic, they only impact themselves at worst. Uh but we still don't really solve the problem for the rest of the tenants that are sharing the main queue. So how can we improve this? We could implement a a dynamic uh queue dedication, dynamic overflow queue creation. This is how it would work. In the steady state, we have a single multi-tenant queue. All messages go into that queue. However, when producers publish a message, they look at the queue backlog, and if they realize that the queue is building a backlog, uh they scan the publish rate for every tenant according to their uh view of how many messages they're publishing per second. And if one tenant stands out, they dynamically create an overflow queue just for that one tenant, and they start directing all the messages into that uh queue. Uh on the consumer side, again, having multiple queues gives us flexibility in how to prioritize consuming from these uh queues. We could for example have a separate consumer fleet only reading from the multi-tenant main queue and a separate consumer fleet only reading from the uh overflow queues. This adds complexity, because for example producers need to tell the consumers that we've just created a new overflow queue just for that one tenant, so you need to start polling it. Um so this is a great mitigation.
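The producer-side dynamic overflow creation just described can be sketched as a small router. The backlog threshold, the 50% dominance ratio, and the queue names are all illustrative assumptions; a real system would also need to broadcast the new queue to consumers, which this sketch omits.

```python
from collections import Counter

class OverflowRouter:
    """Producer-side sketch of dynamic overflow-queue creation: while the
    main queue is healthy everything goes to it; once its backlog crosses a
    threshold, the tenant dominating this producer's local publish counts
    gets a dedicated overflow queue."""

    def __init__(self, backlog_threshold=1000, dominance_ratio=0.5):
        self.backlog_threshold = backlog_threshold
        self.dominance_ratio = dominance_ratio
        self.publish_counts = Counter()   # local view of per-tenant publish volume
        self.overflow_queues = {}         # tenant -> overflow queue name

    def route(self, tenant_id, main_backlog):
        """Return the queue name this tenant's next message should go to."""
        self.publish_counts[tenant_id] += 1
        if tenant_id in self.overflow_queues:
            return self.overflow_queues[tenant_id]
        if main_backlog > self.backlog_threshold:
            total = sum(self.publish_counts.values())
            top, top_count = self.publish_counts.most_common(1)[0]
            if top == tenant_id and top_count / total >= self.dominance_ratio:
                # This tenant stands out: create (here, just name) an overflow
                # queue and direct all of their messages into it from now on.
                self.overflow_queues[tenant_id] = f"overflow-{tenant_id}"
                return self.overflow_queues[tenant_id]
        return "main-queue"
```

Note the detection is based purely on each producer's local view of publish rates, exactly the limitation the speaker raises next: it cannot catch backlogs caused by slow processing rather than high volume.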
Now when there is a noisy neighbor, they don't impact anyone at all. They get their own queue. No one is impacted. But it doesn't account for a scenario where the backlog in the queue is caused not by a tenant publishing a high volume of messages but by some kind of processing slowdown for messages that belong to a specific tenant. Uh this could happen because, for example, our messages are links to some files and some of these files for a specific tenant are large, or maybe processing messages for a specific tenant requires talking to a dependency that is slow or has little capacity. So how can we address this? Uh the other approach is to flip it and detect noisy neighbors in the consumers. So how it would work: again we start with a steady state where everyone is publishing to the multi-tenant main queue. On the consumer side we're monitoring how much time we're spending processing messages that belong to a specific tenant. If at some point we realize on the consumer side that we're falling behind the queue, the queue is starting to build a backlog, and we see that according to our local view, processing messages for a single tenant takes most of our capacity, most of our processing time, in that case we could signal to the producers that, hey, tenant two is noisy. And once producers receive this signal, they will create an overflow queue for tenant two. Um when this happens, again we have the flexibility of how to load balance between the queues. We could for example first read from the multi-tenant queue and only if it's empty then try to read from the overflow queue, or we could implement all kinds of load balancing logic. So while this is a very effective mitigation strategy, it adds significant complexity. Consumers need to somehow tell producers that there is a noisy neighbor, which kind of defeats the whole purpose of, you know, decoupling producers from consumers, making them independent.
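The consumer-side detection just described tracks processing time per tenant rather than publish volume. A minimal sketch, where the 50% dominance threshold is an illustrative choice and the signaling channel back to producers is left out:

```python
from collections import defaultdict

class NoisyNeighborDetector:
    """Consumer-side sketch: track how much processing time each tenant
    consumes; if the queue is backing up and one tenant accounts for most
    of our processing time, name that tenant so producers can overflow it."""

    def __init__(self, dominance_ratio=0.5):
        self.dominance_ratio = dominance_ratio
        self.time_spent = defaultdict(float)  # tenant -> seconds of processing

    def record(self, tenant_id, seconds):
        """Call after handling each message with its processing duration."""
        self.time_spent[tenant_id] += seconds

    def noisy_tenant(self, queue_is_backing_up):
        """Return the tenant that should be overflowed, or None."""
        if not queue_is_backing_up or not self.time_spent:
            return None
        total = sum(self.time_spent.values())
        tenant, spent = max(self.time_spent.items(), key=lambda kv: kv[1])
        return tenant if spent / total >= self.dominance_ratio else None
```

This catches the slow-dependency and large-file cases that producer-side publish-rate detection misses, at the cost of the consumer-to-producer signaling path the speaker flags as a coupling concern.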
Now they need to have some shared state where they agree on what the list of overflow queues created dynamically for tenants is, and where we publish and where we consume from. We also need some kind of cleanup logic for the time when a tenant stops being noisy. We don't want to be stuck in a state where tenant two has long stopped publishing messages but our system still has a dedicated queue for that tenant and keeps polling that empty queue for nothing. Over time this could lead to an unbounded number of uh queues. Such cleanup is also quite hard to implement, actually, because we can't simply delete the queue. We might risk deleting a queue just as a message lands there. So, we need to have some kind of two-stage teardown process where first we tell the producers: stop publishing to this queue, I will still try to consume it. Then, once we're sure that now for sure no one is publishing there, we can tear down the queue and delete it, stop polling from it. Um recently in Amazon SQS we have released a feature called fair queues that implements this pattern uh automatically. The way it works is that when publishing a message to a queue, producers can specify a tenant identifier, and under the hood the system implements the same pattern we've just discussed. Because the system knows which messages are currently in flight, which messages have been received and not yet acknowledged, uh it knows how much time is spent processing messages for every tenant. So it can automatically uh detect that there is a noisy neighbor, the queue is building a backlog because of that noisy neighbor, and automatically create an overflow queue for that tenant. To the publishers, to the producers and to the consumers, the whole system still looks like a single queue, but under the hood the messages are now organized into multiple sub-queues.
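The two-stage teardown the speaker describes can be modeled as a small state machine. The states and method names are illustrative; the point is only that deletion requires both a drained queue and confirmation from every producer, never a one-shot delete.

```python
# Lifecycle states for an overflow queue. A one-shot delete is unsafe: a
# producer could publish just as we delete, losing the message.
ACTIVE, DRAINING, DELETED = "active", "draining", "deleted"

class OverflowTeardown:
    """Two-stage teardown sketch: first tell producers to stop publishing
    while consumers keep draining; only once the queue is confirmed empty
    and no producer can still publish do we actually delete it."""

    def __init__(self):
        self.state = {}  # queue name -> lifecycle state

    def create(self, queue):
        self.state[queue] = ACTIVE

    def begin_teardown(self, queue):
        # Stage 1: producers route this tenant back to the main queue;
        # consumers continue to poll so in-flight messages aren't stranded.
        self.state[queue] = DRAINING

    def finish_teardown(self, queue, queue_is_empty, producers_acknowledged):
        # Stage 2: delete only when drained AND every producer has
        # confirmed it will no longer publish here.
        if (self.state.get(queue) == DRAINING
                and queue_is_empty and producers_acknowledged):
            self.state[queue] = DELETED
            return True
        return False
```

Bounding the number of overflow queues then reduces to running `begin_teardown` for tenants that have stopped being noisy, which is exactly the cleanup the speaker says is easy to forget.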
And when consumers try to receive a message, the system will prioritize returning messages from the multi-tenant main queue; it will only return messages from the overflow queue for any noisy neighbor once the main queue is empty, or once the processing time for that noisy neighbor drops to a level that is comparable to other tenants. This is a great mitigation, quite a simple one. It still has some trade-offs. For example, we don't limit the uh amount of consumer capacity that uh the noisy neighbor uh can occupy. So they can still utilize all of the consumer capacity, which is generally a good thing. But now if a new message is published into the main queue and all the consumers are busy processing messages from a noisy neighbor, uh the message for a regular tenant has to wait for a processing slot to become available on the consumers uh before it gets processed. So there is still some increase in the processing times, but now it's controlled and limited to the time it takes to process a message. >> Right? As uh we have learned, uh there's a lot of, lots of patterns and lots of trade-offs. So again the reminder for every architect to be fully aware of um the um the trade-offs. Um apparently we could talk for ages about all this, but unfortunately our time is up, so we just want to share a few actions and resources. If you want to learn more about uh SQS fair queues, you can look into the linked AWS blog on the left-hand side. If you're interested in building um SaaS solutions and want to learn more about tenant isolation strategies, you can have a look into the white paper on the right-hand side. And other than that, thanks a lot for joining today. Um please feel free to reach out to us on LinkedIn, and I guess time is up, but we will be around somewhere here for questions if you have any. Thanks a lot and bye-bye. >> Thank you.
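The receive-side prioritization described at the start of this closing section (main queue first, overflow queues only when the main queue is empty) can be sketched in a few lines. Queues are plain in-memory lists here, purely to show the ordering policy, not any real SQS behavior:

```python
def receive(main_queue, overflow_queues):
    """Serve the multi-tenant main queue first; only when it is empty fall
    back to the overflow queues of noisy neighbors. `main_queue` is a list,
    `overflow_queues` a dict of tenant -> list."""
    if main_queue:
        return main_queue.pop(0)
    for q in overflow_queues.values():
        if q:
            return q.pop(0)
    return None   # nothing to deliver
```

This reproduces the trade-off the speaker notes: regular tenants get strict priority for new messages, but a noisy neighbor can still occupy consumers that are already mid-message.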

Video description

✨ This talk was recorded at MQ Summit 2025. If you're curious about our upcoming event, check https://mqsummit.com/ ✨ This talk explores the architectural challenges and solutions for building scalable multi-tenant messaging systems. We'll examine isolation strategies, including shared versus dedicated queue architectures and their trade-offs. We will cover authentication and authorization frameworks to prevent cross-tenant data access and tackle the "noisy neighbor" problem—where high-volume tenants might impact others' performance. Let's keep in touch! Follow us on: 💥 Twitter: / MQSummit 💥 BlueSky: / mqsummit.bsky.social 💥 LinkedIn: / mqsummit

© 2026 GrayBeam Technology Privacy v0.1.0 · ac93850 · 2026-04-03 22:43 UTC