bouncer

Code Sync · 126 views · 3 likes

Analysis Summary

20% Minimal Influence

“Be aware that the benchmarks and architectural comparisons are presented by a core contributor to the projects, which naturally favors their specific design choices over competitors like Kafka or RabbitMQ.”

Transparency: Transparent
Human Detected: 100%

Signals

The video is a recording of a live technical conference presentation featuring natural human speech patterns, audience interaction, and spontaneous storytelling. There are no indicators of synthetic narration or AI-generated visual sequencing.

Natural Speech Disfluencies: Transcript contains 'uh', 'um', self-corrections, and mid-sentence pivots typical of spontaneous live speaking.
Live Audience Interaction: Presence of laughter, applause, and the speaker reacting to a missing co-presenter due to visa issues.
Contextual Anecdotes: Speaker mentions specific history at Yahoo and real-world customer constraints regarding firmware and protocol extensions.
Acoustic/Transcript Artifacts: Transcript captures gasps and music cues consistent with a live conference recording environment.

Worth Noting

Positive elements

  • This video provides a deep architectural explanation of how to decouple storage from serving layers in messaging systems to achieve horizontal scalability.

Be Aware

Cautionary elements

  • The use of selective benchmarks (Pulsar vs. RabbitMQ/NATS) creates a sense of objective superiority that may not account for the operational complexity of managing Pulsar.

Influence Dimensions


Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.

This analysis is a tool for your own thinking — what you do with it is up to you.

Analyzed: March 13, 2026 at 20:40 UTC · Model: google/gemini-3-flash-preview-20251217 · Prompt Pack: bouncer_influence_analyzer 2026-03-11a · App Version: 0.1.0
Transcript

[music] [music] Thank you. [applause] So this talk, uh, should I skip the introduction? [laughter] [gasps] So this talk is about connected vehicles, and Gorav should have been here. He's one of our customers. He's working for a leading automotive vendor. He couldn't be here for visa issues. Not coming into Germany, but getting out of the US. Very good. [laughter] Uh, anyway, it would have been more powerful with him, but I'll try to, uh, chug along. Um, so this is coming from a real use case, um, with them, and the requirement is in part simple. They have tens of millions to hundreds of millions of vehicles, potentially, and all these vehicles are streaming telemetry data, a lot of telemetry data, high volume, and they want to stream it to the cloud to do analytics, and they want to also use, um, this channel to do, uh, real-time control. So send commands to the vehicles, to do any kind of logic there. Um, and they want to have, um, kind of, the data plane going to the cloud and the control plane going from the cloud to the vehicles. Um, so data is very high volume, but control data is low volume, but it's very critical. The commands cannot be missed. They have to be retried. Um, they have to be stored for potentially days or weeks. Uh, and there's logic to detect if a vehicle does not come back online. Um, and we have to, kind of, also track the delivery, that this vehicle has really, um, applied those commands, and so on. Um, and each device needs to be addressable individually. So you have to have a channel for each of these vehicles. So hundreds of millions of channels, potentially, um, with no broadcast though, um, which is nice. The other requirement is MQTT.
So, MQTT, I mean, as everyone knows, it's a hard requirement for any, uh, IoT use cases, and in their case they had already tens of millions of vehicles using MQTT, um, and, uh, they cannot change the firmware that is in the vehicle, and also it's not plain MQTT; they have extensions, some extensions to the protocol, some [gasps] modifications to the standard, so that we also have to, uh, take care of. Then, going back to, uh, Pulsar. Uh, I'll give you a very high-level introduction of Pulsar, for those of you that don't know Pulsar, um, and how it fits in this context. So Pulsar is a flexible pub/sub and compute system backed by durable log storage, and, kind of like, the main, uh, selling points for Pulsar are durability: data is always replicated to multiple nodes, you can decide how many, and synced, fsynced too. Uh, latencies are typically around 5 milliseconds at the 99th percentile, can be lower even, with some trade-offs in the configuration. And it's a partitioned system, so you can have topics with multiple partitions, but even on a single partition you can reach up to 1.8 million messages per second. Um, the system is highly available, uh, on the write path especially, uh, a very different architecture from other systems. Uh, if you have any two storage nodes available, you can write, you can publish messages, even if you lost 98 other storage nodes. Uh, it's cloud-native in the sense that, uh, it is designed for an environment where we can scale up and scale down the clusters very, uh, frequently. Uh, there's a unified messaging model; I will expand a little bit on it. Um, highly scalable: we can have millions of topics. We also have lightweight compute, uh, a function framework. Um, you can submit a function to the cluster and it will run it for you. And, um, it is multi-tenant. It was designed to be multi-tenant from the beginning. We designed this at Yahoo. We had one single cluster for the whole company.
We had hundreds of teams using the Pulsar cluster, uh, without, um, making a mess between each other. And also we have support for geo-replication, um, out of the box. So a typical deployment can be from one to two to 15 clusters all around the world. Uh, I spoke about the messaging model; since, uh, there are a lot of messaging people here, I just want to get, uh, a bit into this. Uh, so Pulsar has a topic, very similar to anyone else. Uh, you can publish on the topic and you don't have to care about how the data is going to be consumed. Uh, a topic can have subscriptions. Uh, subscriptions can have different types, and subscriptions are independent between each other. You can have Exclusive: one consumer only, uh, ordered delivery. Failover: you can have multiple consumers, one active, ordered delivery. Shared is: you can have as many consumers as you want on the subscription, round-robin delivery; it's basically with this that you can implement a queue. Um, we also have this Key_Shared, which is a hybrid mode, uh, which is: you can have as many consumers as you want, you can have them come up and down, uh, in and out, and we have guaranteed per-key ordered delivery. So, uh, basically we use consistent hashing in between, we track what was acked, what was not acked, and we can, uh, have this scale up and down with ordered delivery. The architecture, again just a couple of words, is a bit different from other messaging systems. We have two layers in the system. Brokers, which are stateless: they have no durable state, as I said. Um, topics can fail over to different brokers without copying any data, very quickly. Um, then we have BookKeeper, which is the storage nodes. Storage nodes, um, just store data, multiplexed from multiple different topics. Um, there are multiple advantages to this architecture. You can scale these two layers independently, based on whether you need more brokers for serving capacity, or more storage. Um, failover is very quick.
You can just add more brokers and we can fail over topics to these brokers without copying data. We can add storage nodes and they will gradually receive more data to store. Uh, there's no rebalancing like in Kafka or other systems, and we can scale up and scale down very easily. Finally, uh, Pulsar is multi-protocol. Um, the same topic model can be accessed through multiple protocols. Um, so we have our own native protocol, which has all our features, but we also support the Kafka protocol, AMQP, MQTT protocols. So you can interact with the same topic using different protocols. So you can, say, publish with Pulsar, consume with Kafka, or the reverse, or send with MQTT, consume with Pulsar. Um, this is just a brief, not a benchmark, I know people love or hate benchmarks. This is one we did between Pulsar, RabbitMQ, and NATS, and, uh, kind of like, the first one is the throughput. So this was on a three-node cluster, uh, so a small one, um, throughput on, uh, 50 topics. Uh, upper right is, uh, consumer throughput, uh, like draining backlog and so on. Uh, latency, lower is better, and, uh, max fan-out on a topic: how many subscriptions, how much egress out compared to one ingress in. Uh, you can scan the QR code; there is the full report, it's like 20 pages, all the criteria we went through, if you are into these kinds of things. Um, next one. Um, we started a journey a couple of years ago, uh, actually three years ago, uh, to replace ZooKeeper. That's a pretty common system to be hated, uh, all around the world. Uh, but basically we were using ZooKeeper for coordination and metadata, but ZooKeeper has a few limitations. Um, basically each node has to have the entire data set, um, in memory. Uh, this is one: you cannot scale horizontally.
Um, its throughput is limited, but also its data size: because everything is in memory, it is limited, also because it takes a snapshot of everything each time. So after like one gig or two gigs of metadata, you are out of luck scaling ZooKeeper up. Um, so with Pulsar we can reach a few million topics, uh, but at that point ZooKeeper becomes the limiting factor; we cannot have more metadata in ZooKeeper. That's the bottleneck. So we embarked on this journey of creating a new metadata store and coordination system, uh, to provide similar functionality as ZooKeeper and etcd. Basically the design has been to remove all single-node limitations, and also one key aspect was: uh, designed for Kubernetes. Um, that's because we almost never deploy on bare metal. Uh, so try to take advantage of, um, Kubernetes as the base environment rather than bare metal. Um, we want to have transparent horizontal scalability, just keep adding nodes and have more capacity, and we don't need fully linearizable history across the whole data set. We just need per-key linearizable operations, with millions of writes and reads per second, and store hundreds of gigabytes of metadata. This is not for storing data; this is just for metadata and coordination. And, uh, a month ago Oxia entered the CNCF, um, sandbox. So, I mean, it's been open source since the beginning, but we're trying to form a bigger community, uh, in CNCF, for Oxia. Um, the architecture of Oxia is, um, pretty much straightforward, I would say. Um, so we have storage nodes; storage nodes will hold shards. You can think of this as a big key-value store. Um, clients will do server discovery to figure out which server has the shard that they're talking about, for this key. There's a coordinator. Um, the key part of the architecture here is that, uh, we're not using Raft, uh, for replication of the data; we are using basically log replication.
This is very similar to what BookKeeper does. And if there is any failure, the coordinator is the one that will trigger leader election and resolve all the, uh, conflicts between the different nodes, and it will checkpoint the cluster status in the config map. Uh, so it's kind of like pluggable consensus that is only applied when there is a failure. So when there's the leader election, then we have to checkpoint, for example, the epoch and so on. Um, so, tying back Oxia to, uh, IoT here. Um, we designed Oxia to replace ZooKeeper, but then, uh, while maturing Oxia, we figured out that we can use this for many other things. Uh, before, we couldn't use ZooKeeper for any intensive, uh, updates, because it doesn't scale. But now we have a scalable, uh, metadata system, so we can do things in different ways. So we started thinking: what can we add to Oxia that will make our life easier? And we came up with new features, uh, in there. One is sequential keys, or secondary indexes, or also we have this fast notification. Um, secondary indexes, I guess everyone knows what they are. Sequential keys, I mean, it's not like any esoteric thing, but it's kind of like a way to have, um, the key-value store atomically assign, uh, keys based on arbitrary integer sequences. So this is very useful if you want to generate unique offsets, for example, and your offset also, you don't always want to increment by one; maybe you want to increment by, uh, the batch size, for example, or the size of the batch. Uh, and you can have multiple sequences within a single key, so you can have 001-0, 1 megabyte, for example. Um, the key here is that, um, you can assign ordering without having your data travel to a single node. Here only metadata will travel to a single point in Oxia. [sighs] The unique part is also that you can subscribe, uh, to updates for keys for a specific sequence.
If you do the math here, you can see that you can build a queue on top of this, right? It's kind of like, very similar to a queue. So you have some way to atomically assign ordering to events, and you can also subscribe and receive all the updates for a channel of events. Um, so, going back to MQTT. Um, so we need a way to address each vehicle individually. So we need one MQTT topic, uh, per each vehicle. So in Pulsar, typically we map one MQTT topic to, um, one Pulsar topic. Now, Pulsar can scale to millions of topics. With Oxia we can scale to 10 million topics, 20 million topics. Fine. But that's not hundreds of millions. And also, even if it's possible, it doesn't mean that it is, uh, convenient, or that it is the most efficient way of solving the problem. Um, because essentially, when you have a Pulsar topic, it is all about writing data in a log as fast as you can; um, in this case, we have hundreds of millions of logical topics with very small throughput on each of them, very sparse events. So using one Pulsar topic per each vehicle, even if it were, uh, feasible, would not be efficient, and we need a better solution for that. So, um, tying everything together, uh, we came up with this, um, in our MQTT implementation: the concept of virtual topics. Um, essentially we map, uh, multiple logical, um, MQTT, um, topics into one physical Pulsar partitioned topic.
It's like multiplexing, not rocket science, but, um, so, on the ingestion side it's easy: just use the key to retain order, and then you can, kind of like, have all the vehicles sending the data into this particular topic, and then you can run a streaming analytics pipeline, consume all this data, and run some analytics on it. On the dispatching side, it becomes more involved, right? So we have, uh, basically a single topic, uh, to receive all these commands for hundreds of millions of vehicles, and then we have to, kind of like, control whether these vehicles have consumed the data or not, and it can be days, weeks, and so on. So essentially this MoP, which is our MQTT protocol handler, kind of like reads from this Pulsar topic and then demultiplexes it into Oxia, using these sequences, um, approach. Um, so there's one sequence per device, but this is way lighter weight than having a full-blown, um, log-based Pulsar topic per device. It works as follows. So, uh, if you have a new device or vehicle that connects, it will first check into the available queue in Oxia for, uh, any available events, commands that it has to download, uh, through MQTT, or it will basically use the sequence notification, uh, to get new events. So this is kind of like how everything got tied together.
Uh, and, uh, um, I think this is an interesting use case, uh, very practical, and it is both generic and specific to vehicles, but, uh, I think it was a good way to show what can be achieved with Pulsar at a large scale. And the advantages, typically, are that the user has a single platform that can handle everything. Uh, they don't need an MQTT broker that has to feed some other messaging system that has to feed some other streaming analytics; basically we can have the MQTT and the messaging in the same service. Uh, you don't have to duplicate data, you don't have to integrate those two systems, um, and you can run streaming analytics directly out of it. Um, it is easy to operate. Um, you can scale up and down very easily, um, add more capacity or reduce capacity when you don't need it. Um, that was the last one. Uh, I don't know if we have time for a few questions. I'd be very happy to answer. Thank you. [applause] >> Any questions? >> So, thanks for the talk. Uh, if your client language of choice, you know, coming over from the inrex, does not have a native adapter for Pulsar, which protocol would you suggest we should be using, like either Kafka or AMQP, or... >> Um, to be more precise: there are many language bindings for Pulsar. Uh [laughter], uh, there is, uh, there is an Erlang-based one, yes. Uh, I haven't tried it, so I cannot comment on the completeness or the quality, but there are many. Um, otherwise, um, I would say it depends more, if you say, should I use Kafka to send, or, or MQTT, it really depends more on your application, what you're trying to achieve, and what's the style of your application, or if you're familiar with any of these SDKs, probably easier to cope with that. >> Any more? Um, so I assume what you just discussed were persistent sessions, right? So how do you deliver queued-up messages back to the client?
>> What do you do when the session expires? >> The session for the vehicles, uh, then basically, the topic... So that logic itself is, uh, depending on the control plane. So the data will stay there until it gets either, um, consumed and acknowledged by the vehicle, or some admin will just clean it up. >> Okay. Thank you. >> Any more questions? >> Just a quick comment: uh, EMQ built the Erlang binding for Pulsar. >> I see, thank you very much. Thank you. [applause] [music] [music]
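The Key_Shared subscription mode described in the talk (many consumers, consistent hashing, guaranteed per-key ordering) can be sketched as a toy model. This is illustrative only, not Pulsar's actual implementation; the `KeySharedRing` class and its names are invented for this sketch. Each consumer owns many points on a hash ring, and a message key always routes to the consumer owning the first ring point clockwise from the key's hash, which preserves per-key ordering while membership is stable.

```python
import hashlib
from bisect import bisect

class KeySharedRing:
    """Toy consistent-hash ring standing in for Key_Shared dispatch."""

    def __init__(self, points_per_consumer=100):
        self.points_per_consumer = points_per_consumer
        self.ring = []  # sorted list of (hash_point, consumer_name)

    def _hash(self, value: str) -> int:
        # Stable hash onto a small ring space.
        return int(hashlib.md5(value.encode()).hexdigest(), 16) % (1 << 16)

    def add_consumer(self, name: str):
        # Many virtual points per consumer smooth the key distribution.
        for i in range(self.points_per_consumer):
            self.ring.append((self._hash(f"{name}-{i}"), name))
        self.ring.sort()

    def route(self, key: str) -> str:
        # First ring point clockwise from the key's hash owns the key.
        idx = bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = KeySharedRing()
ring.add_consumer("consumer-a")
ring.add_consumer("consumer-b")
# The same key always lands on the same consumer, so per-key order holds.
assert ring.route("vehicle-42") == ring.route("vehicle-42")
```

When a consumer joins or leaves, only the keys whose nearest ring point changed move; the broker-side tracking of acked versus unacked messages, mentioned in the talk, is what lets this happen without losing ordering guarantees.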
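The two-layer architecture in the talk, stateless brokers over BookKeeper storage nodes, can also be reduced to a toy sketch showing why failover copies no data. This is a conceptual illustration, not Pulsar's code; the dictionaries and the `fail_over` helper are invented for the sketch.

```python
# Durable layer (BookKeeper "bookies"): segments live here and never move
# during broker failover.
storage = {"topic-a": ["segment-1", "segment-2"]}

# Serving layer (brokers): pure ownership mapping, no durable state.
ownership = {"topic-a": "broker-1"}

def fail_over(topic: str, new_broker: str):
    # Reassigning ownership is a metadata-only update; the segments in
    # `storage` are untouched, which is why failover is fast.
    ownership[topic] = new_broker

fail_over("topic-a", "broker-2")
assert ownership["topic-a"] == "broker-2"
assert storage["topic-a"] == ["segment-1", "segment-2"]  # data untouched
```

This separation is also what makes the two layers independently scalable, as the talk notes: adding brokers grows serving capacity without any rebalancing of stored data.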
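The Oxia sequential-keys feature, atomically advancing a per-key counter by an arbitrary delta such as a batch size, can be sketched as follows. This is a single-process stand-in, not Oxia's replicated implementation; `SequenceStore` and its method names are invented for illustration.

```python
import threading

class SequenceStore:
    """Toy per-key sequence allocator in the spirit of Oxia sequential keys."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sequences = {}  # key -> last assigned value

    def next_value(self, key: str, delta: int = 1) -> int:
        # Atomically advance the counter by `delta` and return the new value.
        # Only this small metadata update is serialized; payloads never need
        # to travel through a single node to get an ordered offset.
        with self._lock:
            value = self._sequences.get(key, 0) + delta
            self._sequences[key] = value
            return value

store = SequenceStore()
# Increment by batch size rather than by one, as mentioned in the talk.
offsets = [store.next_value("vehicle-42/commands", delta=batch)
           for batch in (10, 5, 20)]
# offsets == [10, 15, 35]: unique and strictly increasing per key.
```

Because each batch claims a contiguous range, a producer writing a 10-message batch gets offsets it alone owns, which is exactly what makes unique, ordered offsets cheap to hand out.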
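Finally, the virtual-topics idea, multiplexing hundreds of millions of logical MQTT topics into one physical Pulsar topic and demultiplexing on dispatch, can be sketched in a few lines. This toy uses a plain list as the shared log and invented helper names (`publish`, `demultiplex`); the real system uses a Pulsar partitioned topic on ingest and Oxia sequences per device on dispatch.

```python
from collections import defaultdict

shared_log = []  # one physical log standing in for a Pulsar partition

def publish(vehicle_id: str, payload: bytes):
    # Ingest side: the vehicle id becomes the message key, so per-vehicle
    # order is preserved inside the shared log.
    shared_log.append((vehicle_id, payload))

def demultiplex(log):
    # Dispatch side: fan the shared log out into lightweight per-device
    # queues (the role Oxia sequences play in the talk).
    queues = defaultdict(list)
    for vehicle_id, payload in log:
        queues[vehicle_id].append(payload)
    return queues

publish("vehicle-1", b"unlock")
publish("vehicle-2", b"update-firmware")
publish("vehicle-1", b"lock")

queues = demultiplex(shared_log)
# vehicle-1 sees only its own commands, in order: [b"unlock", b"lock"]
```

The point of the design is on the dispatch side: a per-device queue of pending commands is far cheaper than a full log-backed topic per vehicle, while a reconnecting device can still drain its queue and then follow new entries via notifications.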

Video description

✨ This talk was recorded at MQ Summit 2025. If you're curious about our upcoming event, check https://mqsummit.com/ ✨ This presentation explores robust messaging solutions for the Internet of Things, focusing on MQTT and Apache Pulsar. We'll begin with MQTT as the de facto lightweight pub/sub protocol for edge communication, detailing its strengths and limitations. Then, we'll dive into Apache Pulsar, a scalable, durable streaming platform ideal for IoT backend infrastructure, highlighting its unique architecture. Finally, we'll examine how MQTT and Pulsar can be combined, particularly through MQTT-on-Pulsar (MoP), to create a unified IoT data streaming pipeline. Let's keep in touch! Follow us on: 💥 Twitter: / MQSummit 💥 BlueSky: / mqsummit.bsky.social 💥 LinkedIn: / mqsummit

© 2026 GrayBeam Technology Privacy v0.1.0 · ac93850 · 2026-04-03 22:43 UTC