Ruby on Rails · 1.4K views · 36 likes
Analysis Summary
Worth Noting (positive elements)
- This video offers a clear, high-level explanation of the difference between monitoring and observability and how OpenTelemetry provides a vendor-neutral path for Rails developers.
Be Aware (cautionary elements)
- The use of 'empathy' and 'human-centric' language to describe technical tooling can make a specific architectural preference feel like a moral imperative.
Transcript
[music] Welcome to On Rails, the podcast where we dig into the technical decisions behind building and maintaining production Ruby on Rails apps. I'm your host, Robby Russell. In this episode, I'm joined by Kayla Reopelle, a lead software engineer at New Relic, where she works on both the Ruby agent and the OpenTelemetry Ruby gems. Earlier this year, Kayla led a workshop at RailsConf 2025 in Philadelphia, helping developers get hands-on with OpenTelemetry. That session sparked some great conversations and made me want to bring her on the show to go a bit deeper. [music] Kayla's been helping the Ruby community understand observability, not just as a way to debug production issues, but as a tool to build clearer, more reliable systems. We'll talk about what OpenTelemetry is, why it matters for Rails developers, and some practical ways that teams can experiment with it in production, in local development, and even in CI. [music] It's a conversation about building more human-friendly systems and how a bit more visibility can make us better collaborators and maintainers. Kayla joins us from Portland, Oregon in the United States. All right, check for your belongings, all aboard. Kayla Reopelle, welcome to On Rails.
>> Thanks, Robby. Happy to be here.
>> Kayla, let's start with this. What keeps you on Rails?
>> What keeps me on Rails? Well, I feel like Rails and the Ruby programming language just make sense in my mind. You know, granted, I'm not working in Rails day-to-day, but I'm working to support Rails applications. And I feel like the way the framework is structured, kind of the rules that it has, just allows me to work within just enough creative constraints to build whatever I'd like to build.
>> How did you find your way into working on observability and instrumentation for Ruby?
>> Well, let's see. So, I started off working on Rails applications and got to install observability into applications when I was getting started early on.
And I realized that as I continued building Rails applications, I was more and more curious about the internals of how everything worked, and I started to open up gems and read them more. It was an interest in pursuing more opportunities to learn about gems and those internals of Ruby that power Rails that got me interested in working on a gem full-time. And so I had worked a little bit in observability. You know, I think New Relic was the tool that saved me on multiple occasions when I was up late after being paged. And that made me curious when the opportunity opened up to potentially go work for them and work on the Ruby agent.
>> As a software developer, do you enjoy working on web applications themselves, or do you find yourself being more of an under-the-hood tooling person? Do you feel like there's a distinction there? Because I'll talk to some other engineers in the community, and they're working on some really low-level, behind-the-scenes things that all of us Ruby on Rails developers really benefit from. Sometimes they're not the same people. I'm really curious about that area. Do you still identify as a web application developer at all?
>> I feel like it's something where I could envision my career bouncing between them. You know, at some point when I was working on web applications, I started to lose the excitement of building a new feature in that particular software development life cycle and wanted a new challenge. And this was a new challenge that offered itself to me. When I learned how to code, I started with Ruby and Rails, and Rails really supercharged my education. I felt like understanding why Rails worked and why it supercharged me started to be the thing that felt more interesting. So, I mean, I really love looking at the internals.
I love having the opportunity to build something that so many people get to use. It helps me feel like I can make a positive difference in the web development realm. Yeah, I could see myself bouncing back and forth. I don't think I'm necessarily in the library space forever.
>> Now, you've clearly carved out a fascinating space in the Ruby community, helping others understand what's happening inside their systems. And I should also mention for our listeners, just as a disclaimer, Kayla and I actually go back a little bit. She joined Planet Argon kind of early in her career, right after finishing a boot camp if I recall, and spent a couple of years helping our clients modernize their Ruby on Rails applications. So, I don't know if that speaks to some aspect of "I'll never want to touch another Ruby on Rails application again." I'm kidding. But I remember being struck by how quickly you took ownership of some of that tricky upgrade work really early on, even as a junior and then mid-level developer, and how you approached those projects with a lot of patience and curiosity. So, looking back now, do you think that time working with, say, legacy Rails codebases shaped how you think about observability, or even system empathy, today?
>> Yeah, I think that time was foundational for how I think about applications and approach them. One of the things that I really like about working on the New Relic Ruby agent is that it is so old.
>> It's the oldest.
>> Yeah. Yeah. [laughter] I mean, every once in a while I'll find a commit from the founder or some of the co-founders from 14 years ago, and it's kind of exciting that that piece of code has lasted this long. But I think that the process of working on legacy applications, and at New Relic continuing to find ways to help people monitor their legacy applications, feels like a strong throughline.
I think it's important to not just throw away something that you've built, and I think I learned that, or it was ingrained more deeply, by working at Planet Argon and getting to work on those Rails upgrades. There were also times where observability tooling helped give us confidence about an upgrade in certain areas. We knew things were still up and working on one half of a dual-booted system, and that showed me the power of that tooling.
>> You know, when people talk about observability, to me it can sound a little abstract at times. So how do you personally define it?
>> So I have a few definitions that I work through. One that feels almost more like a koan is from Charity Majors: observability helps you answer unknown unknowns, whereas monitoring helps you answer known unknowns. Getting to that level of monitoring, or data collection, means trying to figure out the right combination of things to capture so that you can answer questions you don't even know that you have yet. In a more concrete way, I usually like to define it as having a way to observe the internal state of your system by looking at its outputs. You ship your application into the world. People will use it. You're not exactly sure how they're using it, or always what exactly has happened in your system when a certain variable has been provided. Observability captures that information for you and gives you an archive.
>> The other day I got to attend Exo Ruby in Portland, and this is going to get published a little later, but you had mentioned that teams can start building more, I'm air-quoting, "human-friendly" systems thanks to better visibility. What do you mean by that?
>> So when I think of a human-friendly system, I think of a system that is focused on empathy for the people involved. Some of the groups involved are your users, yourself, your teammates, and any future engineers who might work on the project.
And when you are trying to add observability into your system, you can think about how adding a particular piece of data might help solve a problem, or work backwards from knowing about a problem that your users have and consider what data you might add to your application to help you solve that problem in the future.
>> So if I'm a Rails developer who's never thought that much about observability, what are some signs that my system might be trying to tell me something?
>> So some signs could be that you are having abandoned shopping carts. Maybe things are running a little too slowly. Maybe you are getting a lot of angry calls or reports that parts of your site aren't working. You might also have team members who are just confused about features that have been built recently, and they have trouble building on top of them or debugging them when something goes wrong. And I think also, if you have deployed a feature into the world and you're not really sure if it's getting used or how it's getting used, that might be another sign that observability could help you see the effectiveness of what you've built.
>> Now, historically I would think that product owners would be asking for some data or report, like, all right, we shipped this new feature to production in our Rails app and we want to know who's using it. Is anyone using it? Do we need to better promote it, educate people? I'm curious: is that an observability thing or is that just a reporting thing, and where's the overlap there?
>> I think there's some overlap there when you have different objectives from stakeholders. People love numbers as a way to tell what's working and what isn't working, and often there's some connection between how your system is working and how the business is doing.
So when you can find a way for those things to connect, really breaking down how whatever your stakeholders are asking you about relates to the system that you're building, then you can use observability to create different metrics or alerts, to have service level objectives, goals that you have about uptime for your users, because usually downtime translates to dollars. So the more that you can show that your site is up and working, the more faith your stakeholders might have in the project.
>> Do you think that's still mostly an ops concern, or are developers increasingly needing to be part of that conversation?
>> I feel like it really depends on your company and how it's structured. I do think that it's important for everyone to be included in the conversation. Ops folks often have a different perspective than devs, who have a different perspective than people more on the business side of things. It's easy to lose something or miss some dimension if someone's not part of that conversation. I think more and more tools in observability are trying to focus on helping everyone in that process and making things that are valuable to everyone.
>> Let's take that as a shift into talking specifically about OpenTelemetry itself. I think most of us have probably at least heard the name, or maybe some references to it, but it can still feel a little mysterious. We have all these different vendors out there that we can plug in and get instrumentation with, you know, New Relic or Datadog, AppSignal, Honeybadger, etc. So how do you usually describe what OpenTelemetry is, say, to a Ruby developer?
>> To explain OpenTelemetry to a Ruby developer, I kind of think of OpenTelemetry as a replacement for your standard agent. So if you've been installing AppSignal, Datadog, or New Relic directly into your Gemfile, this can be a replacement for that. And more than that, it's a vendor-agnostic replacement.
So you can start adding APIs into your codebase that are from OpenTelemetry, and you can send that data to multiple vendors. I think there's some hesitance about putting vendor-specific code into your application, just based on the tight coupling that you then might have with another business. This allows you to loosen that and instead, similar to working in Ruby and working in Rails, have a fully open-source tool that you can use to collect that data and see the development of.
>> Interesting. So maybe another parallel there would be: you might install the Stripe gem into your app, but if you also need to support PayPal, because not everybody has a credit card around the world, and you're tightly coupled to Stripe, it makes it a little more complicated just to add another payment option that works for PayPal. That might be a simple example, but I would imagine someone's curious, like, all right, if I'm using New Relic, AppSignal, what have you, and I switch over to one of the OpenTelemetry gems, am I then able to, as you mentioned, send to both? Or is it specific data that I can send to one or the other, or do I send both the same packages of information and then just use their tools to look at that data differently?
>> Yeah, you can absolutely send your data to both. You could send your data to many. OpenTelemetry is super modular and customizable. It's designed to be extensible. You can have multiple exporters. You have a lot of tooling, too, to shape your data so it looks exactly the way you want it to look before you send it to a backend. You're not really locked into any one path. You can be pretty flexible with how your data gets visualized and ingested.
>> That's interesting. Where does the Ruby ecosystem currently fit into the broader OpenTelemetry project and effort right now? Is this kind of a big initiative right now?
>> Yeah, OpenTelemetry is kind of structured by the data types that it collects. The three primary ones right now are traces, metrics, and logs. Traces are the story of a request through your system, broken up into individual pieces called spans. Metrics are aggregated time-series data, usually things you want to see about how long a request took or how many jobs you have running. And then logs are the traditional form of observability, just a timestamped text record. OpenTelemetry is really designed around structure, so that people who use it can feel confident about what data they're getting out of it. So there are a lot of standards in OpenTelemetry related to the specification for each of those signals and what features should be available. There's also a whole other realm of specification focused on what data you should collect, to provide best practices and expectations for instrumentation. Instrumentation being the thing that you use to collect data by calling APIs inside of a library like Rails: we instrument an application. All of that is to say that in Ruby right now, we have varying stability for each of those signals. Traces right now are marked as stable. What that means is that if you were to look at the OpenTelemetry specification, we should check the box for every feature listed on that page that is marked as stable. Metrics and logs are marked as experimental, or in development. The way that looks is that we have gems available, but they're not fully feature-complete. In order for us to reach stability, we have to internally make sure that we've checked off every element of that specification in our implementation, and then also get someone from the OpenTelemetry technical committee to review our implementation and verify that everything is met. Another element of getting to stability is having people actually use these tools.
They are considered prototypes, in a way, and they need to be battle-tested before we can flip that stable switch. So I think that the presentation online, if you were to just look at OpenTelemetry Ruby and where the project is at, doesn't really reflect where we're at right now. I think the logs project is pretty stable. There are maybe one or two new features that have come along since our initial development that we need to incorporate before we can reach out to the technical committee. For metrics, we still have a few feature groups that we need to address before we're spec-compliant, but your basic metrics, if you want to implement a counter or a histogram or a gauge, that is ready for you and ready to go. And we really need people, since we're in this active development state, to let us know where we've fallen short, because we do a lot of testing, but I think unfortunately some of the best testing is done in production environments. Often, as someone who works on a library, you make a lot of fake applications, really pseudo-applications that are very light, that just want to prove a concept at the most minimal level, and creating a real-world application that does all of the things is usually out of reach. That's another thing that I think makes OpenTelemetry special in its implementation: it's a collaboration between end users and vendors.
End users, that's who we refer to as the people who are building web applications, and vendors are the observability vendors from companies like Datadog, New Relic, AppSignal, etc., who are collaborating on this implementation to make sure it combines our knowledge into the best product possible.
>> What's interesting about this project from a community perspective, as I understand it, is that there's a lot of trying to optimize for consistency for developers, being community-driven, with shared goals together. But there's also an interesting question of, well, what's the secret sauce? Why one vendor versus another, outside of just comparing apples and oranges on prices, like how much does platform A versus platform B cost? What's the secret sauce if they're all being fed the same data? How would you describe the distinction from a business value proposition?
>> Yeah. So vendors are having to take a hard look at their business models, because I think in the past the data that we collected was the thing that differentiated us, but OpenTelemetry is challenging that and saying that everyone should have access to the same great data, and we should create great data that is openly available. So now vendors are kind of being forced to compete on pricing. That's a big part of it.
They try to find ways that they can make their products affordable, and also storage and visualization. They're really putting way more effort into
>> Right.
>> what you see on your screen, how they synthesize that data. Those are the things that they're competing on instead of the data they're collecting. So when you're trying out different vendors, I would say find a problem that you want to solve in your system and see which tooling actually makes the most sense to you when you're trying to solve that problem, what gives you the answers or shows you the path as you're trying to solve it, because they want the user experience to be where they stand out.
>> Now, admittedly, it's one of those challenging things. A lot of Ruby on Rails developers listening might be thinking, well, someone already made the choice of what platform we're using to send data to, and we've been using this, and we have some custom code for it. So it's not just a gem install and add a config file or API key and then all of a sudden it's all magic. We already have a bunch of things connected to one system, and it's not easy to just flip the switch over to another thing. So something like OpenTelemetry could potentially allow that to happen a bit quicker, and you can maybe do some actual A/B/C comparisons. We don't always get to do that when we're evaluating these types of tooling, unless you're just installing the gem and getting the basic details out of it, and then you're probably not really benefiting from the platform as much as you could be either. So for those listening, I think there's some benefit to exploring ways to do that, and it sounds like OpenTelemetry might make it a little easier. But there's also a cost to making the switch if you're heavily invested where you are.
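As a concrete sketch of how the vendor choice becomes configuration rather than code: with the opentelemetry-ruby gems, the application only ever talks to OpenTelemetry APIs, and the backend is chosen by exporter configuration. The gem names and the `configure`/`use_all` calls below follow the OpenTelemetry Ruby getting-started docs, but the service name and collector endpoint are placeholder assumptions, not values from the episode.

```ruby
# Gemfile (gems from the opentelemetry-ruby project):
#   gem 'opentelemetry-sdk'
#   gem 'opentelemetry-exporter-otlp'
#   gem 'opentelemetry-instrumentation-all'

# config/initializers/opentelemetry.rb
require 'opentelemetry/sdk'
require 'opentelemetry/instrumentation/all'

OpenTelemetry::SDK.configure do |c|
  c.service_name = 'my-rails-app'  # placeholder service name
  c.use_all                        # enable every installed instrumentation gem
end

# Which backend receives the data is environment configuration, not code,
# so switching (or comparing) vendors doesn't touch the application:
#   OTEL_EXPORTER_OTLP_ENDPOINT=https://collector.example.com:4318
```

Pointing that endpoint at an OpenTelemetry Collector is one common way to fan the same data out to several backends at once for a side-by-side comparison.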
Most of what we hear about observability tends to focus on production, so tracing, you mentioned slow requests, debugging incidents, or fine-tuning performance. But I wonder if there's also a way to bring some of that visibility closer to where Rails developers are working every day. Do you foresee a way that a Rails developer could benefit from running OpenTelemetry locally while they're developing or testing a new feature?
>> Yeah, this is something that I did when I was working on Rails applications during the development process. If I had two different ideas for an implementation and I wasn't sure which one was going to be the most performant, I would add some extra spans or things like that around my code and time the different approaches that way, in addition to using benchmarking, to see if I could find a clear path to the right choice, because there are often many ways to do a thing.
>> Just to explain this concept: if you're looking at, let's say, some controller code or some models, tell us more about a span. What would that look like? I'm thinking of a typical request process. We might send some data to New Relic or AppSignal, what have you, and here's the full request, and we see that show up in AppSignal. And I know that you can do that locally as well with these tools by just enabling them in your local development environment. So what would a span be within that?
>> So a span in that is just a single step in the process. If you're thinking about a Rails request, your controller action is going to be one of the steps. Any calls to your model are going to be another span. Rendering the view is going to be another span. If you make an external call to an API, that will be its own span.
And in some tools it goes so far as to get into the internals as well, where every Rack middleware call inside the structure of your Rails application is another span. Some tools go really deep into that; others are a lot lighter. An individual job starting is generally a span as well, if you want to think about non-web transactions too.
>> Okay. So if you got an error that was triggered, that would show you maybe the backtrace, and you can step through, like, this is where the error happened. You can walk back step by step through that process. It sounds very similar. Is there much of a distinction there?
>> So errors are sometimes categorized a little differently right now in OpenTelemetry, and this will possibly be changing soon. They create something that's called a span event for an error, so that you can get that special error information, including the stack trace and things like that. But most UIs for observability will have a red color or some sort of warning sign along whatever span raised the error, so you can see, inside of the larger trace, which is the grouping of spans that represents the whole request, what the specific step was that raised the error.
>> I see. I'm also thinking, with OpenTelemetry, are you then defining custom spans for code within a method? Is it line by line? How does that play into it?
>> Yeah, so instrumentation covers a lot of things. It covers all of those span examples that I provided earlier. And then there's custom instrumentation, or manual instrumentation. That's where, if you want spans for the methods that you are uniquely writing, that are not attached to any other framework or library, adding your own span comes in. You can also do that with exceptions. There are APIs as well to record exceptions that you can add into your error handling.
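The picture described here, a trace as nested, timed spans, with an exception recorded as an event on the span that raised it, can be modeled in plain Ruby. This is a stdlib-only toy for intuition, not the opentelemetry-ruby API (where you would call `tracer.in_span` and `span.record_exception`); every class and span name below is illustrative.

```ruby
# A toy tracer: each span records its name, parent, timing, and any
# exception events. Illustrative only; not the real OpenTelemetry classes.
Span = Struct.new(:name, :parent, :started_at, :ended_at, :events) do
  def duration
    ended_at - started_at
  end
end

class ToyTracer
  attr_reader :finished_spans

  def initialize
    @finished_spans = []
    @stack = []
  end

  # Run a block inside a new span, nested under the current span.
  def in_span(name)
    span = Span.new(name, @stack.last&.name, monotonic_now, nil, [])
    @stack.push(span)
    yield span
  rescue => e
    # Record the exception as a span event, then let it keep bubbling up.
    span.events << { 'exception.type' => e.class.name,
                     'exception.message' => e.message }
    raise
  ensure
    @stack.pop
    span.ended_at = monotonic_now
    @finished_spans << span
  end

  private

  def monotonic_now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

tracer = ToyTracer.new
# A Rails-ish request: a controller span wrapping a model span and a view span.
tracer.in_span('CartsController#show') do
  tracer.in_span('Cart Load')         { }
  tracer.in_span('render carts/show') { }
end
```

Spans finish children-first, so the controller span is the last one recorded, and each child points back to its parent, which is exactly the tree a tracing UI draws.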
You know, usually that will bubble up to whatever span was closest, or current, when that error was raised. But if you want more detail, there are options to go more granular.
>> What about Active Support? How does that play into OpenTelemetry?
>> So Active Support Notifications is an element of Active Support that is heavily used by instrumentation authors, including OpenTelemetry. It's essentially a publish/subscribe (pub/sub) interface that allows you to listen to different Rails actions and create things based on them. In OpenTelemetry, we use Active Support Notifications as the bulk of our instrumentation. There are still a few things that we'll use monkey patching for, but it gives us great information related to different attributes of what's going on at the current point in time. And that allows us to build out spans, for example by making sure that they have appropriate names, carrying over a lot of those attributes to match the attributes in OpenTelemetry's semantic conventions. That is the element of Rails that really powers most observability work. And it's also, like a lot of things in Rails, available for users to use for their own purposes. So you can actually make your own Active Support notifications, and OpenTelemetry has tooling available for you to instrument those with OTel. We have an Active Support instrumentation gem that allows you to pass your own custom notifications to it, and those will get the same treatment in terms of having spans created for them. Right now we don't have instrumentation for metrics. That's something that we're working on, but eventually I could see a lot of those things being used to fuel the metrics in semantic conventions, and possibly custom metrics as well. So that could be a way to look at integrating OpenTelemetry into your application if you want to avoid, in some cases, adding OpenTelemetry-specific APIs to your code.
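The pub/sub shape described here can be sketched in a few lines of plain Ruby. This toy mirrors the `instrument`/`subscribe` halves of ActiveSupport::Notifications (whose real event names look like 'process_action.action_controller'); the class and the 'checkout.cart' event are made up for illustration and are not the real library.

```ruby
# A stripped-down model of the publish/subscribe interface behind
# ActiveSupport::Notifications. Subscribers register blocks by event name;
# instrument runs the work, then publishes name, duration, and payload.
class ToyNotifications
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(name, &handler)
    @subscribers[name] << handler
  end

  def instrument(name, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield if block_given?
    duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @subscribers[name].each { |handler| handler.call(name, duration, payload) }
    result
  end
end

events = ToyNotifications.new
seen = []

# An instrumentation author (as OpenTelemetry does for Rails) listens...
events.subscribe('checkout.cart') { |name, _duration, payload| seen << [name, payload] }

# ...and application code publishes, wrapping the work it wants observed.
events.instrument('checkout.cart', cart_id: 42) do
  # e.g. a service object's call method
end
```

The real API has the same two halves, and a subscriber that turns each event into a span is essentially how the Rails instrumentation described above works.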
I don't think it gets you out of the woods entirely for full observability, or for answering every question, but it does get you a lot closer in terms of the automagic powers.
>> What are some examples you've seen of people using custom Active Support notifications, for those that haven't really dove into that yet?
>> So, one of the places that I've seen people use them is in service objects. That's not something that's captured by the existing Active Support notifications. I'll also add here, as an aside, that the Rails guides have great documentation on the Active Support notifications that are available, so you can scroll through them to see what might already be covered. But I'd say anything that you do in your application that falls outside of those strict sub-gems of Rails, like Active Job, Active Record, etc., those are things that I would add instrumentation for, perhaps with Active Support notifications.
>> I see. So you use the service object as an example there, or if you had a different pattern for how you were approaching something in your application. I think there are already hooks for Active Record, Action Mailer, Action Cable, Active Storage, and so on. I've also seen people use it for things like long database queries; they might track something like that and get some instrumentation there as well. That sounds interesting. I'll have to definitely poke around that as well.
Any thought experiments you might encourage people listening right now to try? Like, hey, if you've got a Rails app and you want to play around with Active Support notifications and OpenTelemetry, just go try this today?
>> Just to play around with it, I would open up the Rails guides, look at how it all works, look at the subscribe method, I'm forgetting right now what the other method is called that lights things up, and I would find maybe one spot in your application, add a notification to it, and then in a similar spot in your application add an OpenTelemetry span using the tracer's in_span method call, and compare what's available to you. See which interface you prefer in terms of adding attributes, creating names, subscribing, because they both have that setup process of
>> Right.
>> the stage you have to set in order to create the notification, and then the actual notification you're subscribing to. I think interface preference is a great experiment there. And also, if you did a one-to-one switch between an Active Support notification and an OpenTelemetry span, it could be an interesting way to see if there's any performance difference between the two.
>> Nice. All right folks, there's your assignment for this afternoon. Recently in your talk you mentioned an open-source tool that you could run, maybe locally or on a server somewhere, that you can send this data to as well, so you can look at this information without needing to rely on New Relic or AppSignal or what have you. Could you tell us a little bit about that as well?
>> Yeah, so there are a few tools you can use that are kind of separate from the vendor payment landscape. I mean, Grafana is one. It has fully open-source, self-hosted options as well as their own paid products.
Prometheus is a great example for metrics and being able to capture data there, and most OpenTelemetry systems, I believe ours included, allow you to translate OpenTelemetry metrics into Prometheus. And then there's also Jaeger, which is another fully open-source solution for looking at tracing, and that's from the same parent organization that OpenTelemetry works under, the Cloud Native Computing Foundation. They also support Jaeger, so there's a lot of interoperability designed between those two tools. [music]
>> This episode of On Rails is brought to you by asset fingerprinting, because back in my day, we just renamed the file and hoped for the best. Have you ever changed a stylesheet, deployed it, and still seen the old version? Yeah, me too. We all did for years. We'd slap question mark v equals 2 on the end of the URL. We'd clear our browser caches. We'd blame the CDN. Anything but fix it properly. Then Rails gave us fingerprints. [music] Real content-based, automatic file versioning. No hacks, no begging the browser to behave. Asset fingerprinting: because life's too short to fight stale CSS. Respect your future self. Hash your assets.
>> Something I don't hear discussed much is testing the instrumentation itself. Once a team starts adding a bunch of, say, OpenTelemetry or vendor-specific code, I've rarely seen anyone write much test coverage around making sure that it's working the way they expect and not potentially breaking things. Has that been your experience? Do you see teams writing tests for their telemetry and their instrumentation like this?
Some teams. I think it's not the majority of teams, but some teams who are committed to trying to get 100% coverage, especially if they're adding custom instrumentation into their methods, will make sure that spans are created for those systems as well. And OpenTelemetry has some tooling to possibly make that easier. You can have a console exporter or an in-memory exporter; that's what we use for our own testing, to validate what has been created. This is a little bit different from the question that you asked, but there are also libraries to help observe your testing frameworks, so that you can get spans about tests that are frequently failing. Maybe you want some sort of CI monitoring to see what costs you have, or to judge the speed of your test suite. OpenTelemetry has tools for that as well. >> I guess you're thinking about it like any other third-party system in your tests, like if you're mocking or stubbing out some of that data, or using something like VCR. I'm just thinking of things people use in Rails apps to mock a third-party service if you're just trying to send some data. Does OpenTelemetry have a way to pull data back down into your Rails app? Is that ever a thing that people do? Or is it mostly a one-way send into this platform, and then you can interact with the data there? So it's kind of a data storage, push-it-out perspective. >> Yeah, that's a good question. That's not something that I've seen, people pulling the OTel data back into their systems. But I suppose that if you're using something like Grafana or Jaeger to visualize your data and you have your own storage system, which is something OpenTelemetry allows, you can kind of cut yourself off entirely from the observability vendors. I could see that providing some strategies to bring the data back into your app.
Yeah, I was just curious whether, a couple of steps down the road, people might start taking advantage of the data. You mentioned collecting end-user data, or how things are being used, and that ending up changing how your application actually runs based on that information. For example, something I've been thinking about is how teams will try to optimize the performance of something in their Rails app. Let's say it's a dashboard, and they'll see that their slowest requests are from their biggest customers. So they'll think, well, maybe we need to optimize our queries to render this page faster. But certain customers are too big; they have 20 times the amount of data that the rest of their users have. So they'll end up implementing things like pre-caching that information, or background jobs, or storing data in a temporary cache. And they'll do that for all of their customers, because they'll just think, we'll pre-optimize the data for each of our customers, even though most of their users aren't logging in to check that data. So you're doing all this background work, creating a bunch of busy work for the servers that isn't actually needed, when you only need it for those few big clients, right? And I've been thinking a little bit around this idea of: is there a way to track when customers are getting to a certain size, so that you start enabling some of these performance benefits automatically in your system? So it can kind of, I say self-heal, but kind of adapt to how things work and set the expectations a little differently. I'm wondering if tools like OpenTelemetry could potentially help out with some of that. >> Interesting.
This isn't something that I've personally tried. In theory, you could be running a local collector that's processing your data. The collector is a special executable that has a lot of different plugins and extensions that allow you to shape your data before it gets sent off to a backend, or wherever you want to store it. There may be a way to create some kind of feedback loop there, when you have the data before it's been fully sent off, but I wouldn't say it's something that has been designed as part of OpenTelemetry. >> Sure. Sure. >> At least not that I'm aware of. But the thing that's kind of cool about OpenTelemetry is that it could very well be designed at some point, because it's designed by committee. You can bring up a change to the specification, or to the semantic conventions, which are the names of attributes and spans and such that have been codified, to take the product in the direction that you think is useful for your team, and people can have a discussion about it. Usually it's a discussion across languages to decide how that works, and that usually brings up other solutions that different teams have thought of, maybe with the shape of the tool as it is today. So yeah, that's not something I've interacted with too much. >> I get it. This is something I've been thinking and writing about recently, and that's a whole other topic. But I am curious about how we collect data and store this information in these systems. One of the pricing models tends to be based on the number of requests and things like that. So some people listening are thinking, well, how much is this going to cost if our platform scales and we're collecting all this information? We may or may not need it until there's, you know, air-quoting, an issue that we need to investigate, or until a customer raises a concern.
In the meantime, we're just collecting all this information and data, and it's going to cost us a lot to potentially use some vendor to do that. Do you find there are interesting strategies teams can explore to limit that? Let's say you only wanted to do it for your most important paying customers: you want to track certain metrics or spans or extra logging for those users, but for your free-tier people, maybe you don't need to collect all of that; you just want to capture some of it. Is there good tooling available to help guide that, or is it just Ruby code, where you say: if it's a paying customer, send some data off; otherwise we're not going to bother with that right now, and that's just one of the value-adds of signing up to be a paid customer? >> I mean, the first thing that I would probably reach for right now is Ruby code. There's nothing that I can think of, outside of a condition, that's already in OpenTelemetry to control that. >> But it can be done, because it's just Ruby code, >> right. I mean, I suppose what you could do: there are these things called span processors in OpenTelemetry. They allow you to edit your spans, change the shape of them, before they get sent away. There are also some hooks that we're working on. There's an on-ending hook that is still being worked out that would allow you to edit your span as it's ending. And so at that point you could use one of those intervening tools to drop the data. You could collect it for everyone, and if you have a particular attribute that shows you >> I see >> what it is, yeah, then you could build a special processor to strip that out before you actually spend the money to store the data. >> Oh, interesting. What about things like feature flags? Is that something teams are using to turn data capture on and off without needing to do a full redeploy of your application code? >> Yeah.
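The drop-by-attribute idea could look roughly like this. To keep it self-contained, this uses plain Ruby objects shaped like an OTel span processor (whose `on_finish` receives each ended span) rather than the real SDK classes, and the `customer.tier` attribute is a made-up convention, not an official semantic convention.

```ruby
# Stand-in for a finished span; the real SDK object exposes attributes similarly.
Span = Struct.new(:name, :attributes)

# Shaped like an OpenTelemetry span processor: only spans from paying
# customers are forwarded to the (stand-in) exporter.
class TierFilteringProcessor
  def initialize(exporter)
    @exporter = exporter
  end

  def on_finish(span)
    # Drop free-tier spans before paying to store them.
    return if span.attributes['customer.tier'] == 'free'
    @exporter << span
  end
end

exported = []
processor = TierFilteringProcessor.new(exported)

processor.on_finish(Span.new('checkout', { 'customer.tier' => 'paid' }))
processor.on_finish(Span.new('checkout', { 'customer.tier' => 'free' }))

puts exported.map { |s| s.attributes['customer.tier'] }.inspect # => ["paid"]
```

With the real SDK you would register such a processor at configure time, so every span passes through it on its way to the exporter.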
I think that feature flags are another thing I have seen customers use to allow different types of observability to be sent without redeploying their code. >> Is that the sort of thing where, say you have 20 feature flags in your Ruby on Rails application, and you want to track 20 different little hotspots in your application and capture some metrics, do you have to enable 20 different flags, or are there configuration options you could also be handling? Like, if there are 20 different spots in, say, 20 different controller actions, and you want to be able to toggle them on and off one by one whenever you want, is that 20 different areas in the code? I haven't looked that closely at the actual implementation to see what that looks like, but I was curious whether that would go in the controller code versus in some other configuration options in the application. >> For me, I think the patterns are still being determined by the end users. I don't see a ton of examples of best practices for this right now. Generally, the entry point for OpenTelemetry in Rails applications is usually an initializer, and there you call the OpenTelemetry SDK's configure method, and that will turn on OpenTelemetry for you. You have to do a little more work than with most of the vendor-specific gems; it's not all automatic. You have to add a bit of code to get it started. Most of the configuration we have right now is in that file. Within that method, you can provide a lot of settings.
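The initializer entry point Kayla describes typically looks something like this. A sketch assuming the opentelemetry-sdk and opentelemetry-instrumentation-all gems are in the Gemfile; the service name is a made-up placeholder.

```ruby
# config/initializers/opentelemetry.rb
require 'opentelemetry/sdk'
require 'opentelemetry/instrumentation/all'

OpenTelemetry::SDK.configure do |c|
  c.service_name = 'my-rails-app' # hypothetical name for this app

  # Either install everything compatible that's loaded...
  c.use_all
  # ...or pick individual instrumentations instead:
  #   c.use 'OpenTelemetry::Instrumentation::Rails'
end
```

Where the data goes is then controlled by environment variables such as the exporter endpoint, which keeps the initializer itself vendor-neutral.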
There are also environment variables that you can use to configure OpenTelemetry. And down the line, a future project, I think once we have metrics and logs more established, is to move to a declarative, file-based configuration system, which is something OpenTelemetry is working on specifying right now. It seems like that configuration option could provide more of these opportunities, because it does seem to get nested pretty deeply. The way that you create custom spans in your own application is by initializing your own tracer. The tracer is the thing that makes spans, and I could envision someone creating different tracers for different scenarios and maybe deciding whether or not to ship the data from those tracers based on that. >> When you're sending data to OpenTelemetry, are you also telling it how long to store the data, or is that something that's managed separately through the vendor-specific management? >> Yeah, that's something that's managed separately by the vendors or the backends. OpenTelemetry just deals with the collection and standardization of data. >> So it's not like, hey, save this data for the next six months, but then please delete it after that. Like a cache system where you can tell it to expire at some point. That's helpful to know. You've also talked about how observability can help our future selves as engineers, or other future engineers. Was there a moment in your own work where you really felt that benefit? >> Yeah, this is a kind of unique challenge: as someone who's creating a library, getting observability on the library that you're writing. There are features that I've built for the New Relic agent where I want to know if they're being used, and I have to add my own really lightweight telemetry to be able to observe that, or also to debug problems that we don't expect that customers bring to us.
Sometimes that involves shipping a special version of the agent with additional debugging information in it, to run for a short period of time and collect that data. But yeah, I find myself thinking: once this is out of my hands, once this is in the world, what's the information I need to debug it, to know if it's working? What questions might I have? What questions might my managers or other stakeholders have about the time spent on that project and whether or not it was meaningful? >> I could see how that could be really helpful. You mentioned that there are multiple vendors all working to have this committee drive the set patterns and standards that we would all expect. How often are those organizations talking with each other? Are they all contributing to these different language- and framework-specific toolings themselves, and how often are you collaborating on these types of projects? >> So work feels like it's happening constantly, from a million different directions and so many different companies. One thing I think is pretty cool about OpenTelemetry's structure is that on some of the bigger committees, like the technical committee and the governance committee, the things that monitor the specifications and the more day-to-day structure of OpenTelemetry, you're not allowed to have more than two people from the same company, I believe, on those boards. So they really want to make sure that it's diverse and that no single company ends up having a monopoly over the project. The specification is constantly being worked on. I join a meeting weekly.
The way that OpenTelemetry meets is that there are special interest groups, usually referred to as SIGs, and they generally meet on a weekly or bi-weekly basis. You get together to talk about problems, review PRs, and discuss the things that can't be worked on async. >> We also have a Slack channel where there's a ton of communication as well. So I think people are pretty active in it, but there are also some complications, because for a lot of the people involved, their job isn't only OpenTelemetry. They usually have other expectations from their employer, or maybe they're doing this in their spare time. So there's a lot of activity, but there's also sometimes a lot of split attention, which can mean that things are maybe not resolved as quickly as they would be in a vendor-specific codebase. >> Right. Right. >> Yeah, you have paying customers. >> I'm familiar with that in the open source world, and not everybody has the time to do the things they hope they can do. Are there a lot of people who are not part of the vendor ecosystem who are also contributing to these gems you're working on? >> Yeah. >> What do you think lures them in? For those listening, if they're looking for ways to get involved in helping out with these types of Ruby gems, how would you advise they get started? >> So I think what brings a lot of people from the end-user group in is that they're using these tools to solve problems, and OpenTelemetry maybe doesn't do something they expect it to do, or they've found a path that has a bug we haven't discovered, and so they'll submit a fix for that, or maybe even add new instrumentation. We don't have instrumentation for everything, but if we find a community member who is willing to help maintain the project, we're happy to bring in instrumentation for new gems. And so most of the people who I see join have a question or a problem.
And that usually gets them contributing to help find the solution. >> In doing a little bit of prep for this conversation, I actually reached out to a couple of people who had attended Exor Ruby in Portland recently and asked: if they'd had a chance to ask you a few more questions, if there had been more time, what would they have asked? One person, Renee Hendrickson, asks: they're finding that each company's collector seems to be wrapped in a unique API, and the portability of data, or sending the same OTel data to two different collectors, isn't as easy as just adding a second endpoint. So how are they handling that? Is that something OpenTelemetry acknowledges is the case? >> Yeah, this is a question that's hard for me to answer, because I personally have not made the collector my area of expertise. I think because I've been able to ignore it and just focus on trying to build things for Ruby. When we get to a point where Ruby has more stability, I could see myself shifting my focus more there. That data portability issue, I think, is a problem people are trying to solve. From what I know about it, the collector has hundreds of components, and a lot of those are vendor-specific, because vendors are still trying to find ways to differentiate themselves in this new OpenTelemetry world. Having their own pipelines, having their own components, is one way to do that. >> Interesting. >> Yeah, I guess the basics of OpenTelemetry provide some options, but as with most things in OTel, it depends on how much time you want to commit in order to make it specialized to your use case. >> Something I hadn't really thought of until just now is: what's the makeup of the client gems themselves? Are they having to account for all those vendors in the gems themselves, or are there just consistent endpoints to handle that?
Like, is that going through some central OTel tooling and then it gets sent off? Or do the client gems themselves carry vendor-specific details? Or is the whole idea that there's nothing vendor-specific in the gems, so you can just say, give me an API endpoint and some keys or credentials, and it'll send the data that way and let that system deal with it? >> The latter is the goal with OTel. The idea is to not have any vendor-specific code in the API or the SDK. Instrumentation is where it can sometimes get a little murky, because AWS, for example, has their own special samplers that are related to X-Ray, so that you can get the best data there. >> Interesting. >> So I would say in the core repo, which is just called opentelemetry-ruby, that's supposed to be very vendor-neutral. There are a few things related to other open source tooling that are specifically mentioned for legacy purposes, because OpenTelemetry itself was a project built out of a merger of two other similar projects, OpenCensus and OpenTracing. >> Okay. >> So there are some merger-related decisions; Jaeger is one thing that comes to mind. Prometheus is another project that's drawing on that thread as well. But I think the idea is that from here on forward, unless it's surreptitiously added, nothing should be added there that isn't vendor-neutral and usable by the community more widely. There might be a vendor who's more interested in building something for their particular product, but it shouldn't just work for them. >> Earlier we touched on the idea of how Ruby and Rails developers could start taking advantage of this and using something like OpenTelemetry in their tooling.
So, in the talk I saw you give, you showed an example where you talked about background jobs. I would love to run through a couple of high-level things people could put into practice later today, or this coming week, if they're thinking, okay, I'm going to install this gem and experiment a little bit. So let's take that framing around jobs. How might you use OpenTelemetry to make more sense of what's happening within your jobs? >> One example could be by using metrics. In OpenTelemetry there are a bunch of different metrics, different flavors of metrics, I guess, and those are all called instruments. One that I think is particularly useful for jobs is the up-down counter. Essentially, you can increment and decrement. What you could do is find maybe one of your most critical workers and, around the perform method, wrap first a call to an up-down counter that increments by one, to show how many jobs you have that are active, and then, when that job finishes, make a call to that same instrument to subtract one. That way you can have more visibility, in your system of choice, into how many jobs are active at a given time. This could be a good way to help you identify whether you need to adjust the capacity you have for your workers, or to tell, from things that aren't necessarily technical, just user-driven experiences, why certain times of the day or times of the year cause the number of workers you have running to spike. >> I could see that being helpful if you had a bunch of long-running jobs, but you're firing off a bunch of them and wanting to get a sense of when things are happening, or you needed to shift when things get run.
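The wrap-the-perform idea could be sketched like this. To stay self-contained, it uses a plain-Ruby stand-in with the same `add` interface as an OTel up-down counter (the real instrument would come from the experimental OpenTelemetry metrics API); the worker class and queue attribute are hypothetical.

```ruby
# Stand-in with the same `add` interface as an OTel UpDownCounter.
class UpDownCounter
  attr_reader :value

  def initialize
    @value = 0
  end

  def add(amount, attributes: {})
    @value += amount
  end
end

ACTIVE_JOBS = UpDownCounter.new

class CriticalWorker
  def perform
    ACTIVE_JOBS.add(1, attributes: { 'queue' => 'critical' })   # job became active
    # ... the actual work ...
  ensure
    ACTIVE_JOBS.add(-1, attributes: { 'queue' => 'critical' })  # job finished, even on error
  end
end

CriticalWorker.new.perform
puts ACTIVE_JOBS.value # => 0 (back to zero once the job completes)
```

Wrapping the decrement in `ensure` matters: a job that raises still releases its slot in the count, so the gauge of active jobs can't drift upward over time.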
Maybe you've separated them out by an hour, but your long-running jobs start to cascade over, and then you have a bunch of really short jobs potentially going into the same queue. I could see that being quite useful. Are there other types of things you've seen, or that you think could help people with benchmarking or thinking about their jobs? >> So another thing that's interesting about OpenTelemetry metrics specifically is that they have dimensions, so you could include additional information in that metric, maybe related to the queue, or, if you want to implement it on a bunch of jobs, the job class, so that you can facet that data and get more of that feedback. Another thing that could be interesting related to jobs: you could also add a metric for queue size. There's an observable gauge that could be a good option for that. Gauges are intended for numbers that aren't specifically increasing or decreasing; they're kind of just a snapshot in time. And 'observable' in front of it means it's an asynchronous metric. It's not triggered by any specific method call; you set it up and then it runs on whatever interval you tell it to. That's something that would give you more data over time, lots of snapshots of what size the queue is, maybe every 60 seconds, or >> interesting >> 20 seconds, whatever is meaningful to you. >> That's helpful. What about something more user-facing, like in your front-end application? We talked about tracking performance, or some spans there. Are there other interesting tools, maybe related to logs or anything, that could get us more information? >> Yeah, I think logs could be a great example here. Logs have been around for a long time. Rails has a lot of great logs built in.
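The observable-gauge pattern, register a callback once and let it be polled on an interval, can be mimicked in plain Ruby. The real instrument would come from the experimental OTel metrics API, so this is a labeled stand-in; the queue names and contents are made up.

```ruby
# A queue whose depth we want to snapshot.
QUEUES = { 'critical' => [:job1, :job2], 'default' => [] }

# Stand-in for an observable gauge: you register a callback once,
# and the SDK invokes it on an interval instead of at a call site.
class ObservableGauge
  def initialize(&callback)
    @callback = callback
  end

  # In the real SDK, a background reader calls this every N seconds.
  def observe
    @callback.call
  end
end

queue_depth = ObservableGauge.new do
  # One measurement per queue name: the queue is the metric's "dimension".
  QUEUES.transform_values(&:size)
end

snapshot = queue_depth.observe
puts snapshot['critical'] # => 2
```

Because the callback reads current state rather than accumulating events, each poll is an independent snapshot, which is exactly what a gauge is for.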
People often add logs for debugging purposes as well. OpenTelemetry's logging tooling puts the logs in context with your traces and spans. We recently released what they call a logger bridge; the package is opentelemetry-instrumentation-logger, and it adds instrumentation to the Ruby Logger to capture all of the logs you're already creating and send them to OpenTelemetry, or put them in that OTLP format, OTLP being OpenTelemetry's protocol. That allows you to take advantage of the telemetry you've invested in for years, if you're maybe just starting with an observability company through your logs, and make those logs even better by seeing them in the context of a particular span, a particular trace, to get duration data that might not be included in the logs. So if you have logs that are emitted when a user does something on the front end, those would be captured and brought in as well. >> I see. And is it a lot of work to enable that? >> It's pretty easy these days. For a while it needed to be installed from a branch, which meant an extra step in your Gemfile. But now the logging SDK, the logs exporter, and the log instrumentation are all available. So you mostly need to install it and call that OpenTelemetry configure method. There's another method called use, or use_all: with use you can specify individual instrumentations to include, and use_all just installs all of the instrumentation it can find that's compatible with your system. And that's about it. If you have a top-level exporter already set through your environment variables, the data should get sent there. If not, you might need to add another environment variable, specific to whatever backend you're trying to send your data to, just for logs. But with some minor configuration, you can take advantage of those logs and see them in context.
>> I know that there are some new structured events being built into Rails itself. Have you had a chance to look at that much? And how does that differ from what OpenTelemetry is doing? >> So, I've had a little bit of a chance to look into it. I would really love to spend more time on it, and I think I will be soon. What I think is kind of interesting about structured logging, for OpenTelemetry and Ruby and Rails in general, is that for the first time we're getting logs with attributes built into Rails. This is something that's pretty common in other languages. >> What does that mean exactly, for those listening who might not know the difference if they're just in the Rails bubble? I'm like, well, I get information. There's warn, debug, and I see the request details, and I can see some tracing of what happened, depending on the log level I have set. Tell us more about what you mean by attributes. >> Generally with your logs, when you use the Ruby Logger, there are maybe three different types of data. You have your timestamp, when it happened; you have the severity level, like debug and warn; and the message itself. And often, if you're ingesting those logs into some system, maybe like Splunk, you have to do a lot of parsing on that message to pull out unique data. You might have it semicolon-delimited or pipe-delimited; that's how we've been adding structure to our logs. Structured logs with the event reporter give you the opportunity to add attributes, so your log looks a little more like a hash. There are libraries that have been working on that; Semantic Logger is one that's really popular with Rails, and they also have an OpenTelemetry bridge, so you can use OpenTelemetry with Semantic Logger. But now, with the event reporter in Rails, you can start to add those attributes to your logs.
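The difference Kayla describes, one opaque message string versus a log with attributes, can be seen with the standard library Logger and a JSON formatter. A sketch only; the field names (`user.id`, `controller`) are made up, not an official convention.

```ruby
require 'logger'
require 'json'
require 'time'
require 'stringio'

out = StringIO.new

# Plain log: timestamp, severity, and one opaque message string
# that a backend must parse to extract anything.
plain = Logger.new(out)
plain.warn('user=42 controller=CartsController checkout failed')

# Structured log: the same facts as separate, filterable attributes.
structured = Logger.new(out)
structured.formatter = proc do |severity, time, _progname, payload|
  JSON.generate({ severity: severity, time: time.utc.iso8601, **payload }) + "\n"
end
structured.warn(message: 'checkout failed', 'user.id' => 42, controller: 'CartsController')

puts out.string
```

In the second line of output, a backend can filter on `user.id` or `controller` directly, with no string parsing, which is the property the Rails event reporter and Semantic Logger bring to Rails logs.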
If you want to include something like a user ID, or if you have a log that's consistent, maybe it's defined in the application controller, you can add whatever the unique controller is as its own separate attribute that you can then filter on when you're trying to debug using those logs. >> That's interesting. So if you're, say, a multi-tenant application and you wanted to include the subdomain so you can filter on that, is some of the OpenTelemetry tooling able to just ship that stuff over directly, or is that also still happening independently? And is there any performance impact from using things like OpenTelemetry, or vendors, if they need to make a network call to send data out while also logging on the servers they're running on? >> The thing with observability is you have to get comfortable with a little bit of overhead; nothing comes for free. I feel like the people working on these gems are quite obsessed with performance and will write really ugly code in the name of performance if it saves even a little fraction. You're right that if you're sending things to more places, those are all calls that take up memory and take up time. So I could see a performance cost occurring if you're sending your data to too many places, or if you're adding too many spans, things like that. I would be mindful of that, as well as of the cost of wherever you're sending your data for ingest, when you're adding anything with observability. >> But hey, CPU is cheap these days, right? We just throw more servers at it. That's always been the answer. Or we can optimize our code a little bit. And it's also trying to help us optimize our code. It's always interesting: where's that fine line of how much we monitor everything and record everything and store everything? It's always 'it depends,' isn't it? >> Yeah.
And that reminded me of one other thing. For example, profiles are something you usually just monitor as a snapshot. I think there are some new tools making it easier to monitor them in production. About a year ago now, Elastic's profiling specification and tooling were donated to the OpenTelemetry project. So the idea is that eventually OpenTelemetry, too, will be able to create profiles that you can record, in addition to the traces, metrics, and logs. >> Oh, interesting. >> But we haven't started work on that yet. If anybody out there is interested, please come to the Ruby SIG. >> How would they go about finding that? >> Yeah. So, OpenTelemetry has an organization on GitHub, open-telemetry. In it there's a repo called community, and on that repo's README page is a list of all of the different special interest groups that meet, and the times they meet. Within that table, you can find a link. I think our link is calendar Ruby, and that'll add you to a Google group, which will send you an invite for our Zoom meetings. >> Approximately how many people are attending these days? >> We don't have a lot of people attending them. >> So, we can change that. >> Yeah, that would be great. >> Well, I'll definitely include links to that in the show notes for folks. A couple of last questions for you, Kayla. Is there a technical book you find yourself recommending to people? >> The book that helped me the most when I started my job at New Relic was Metaprogramming Ruby 2. It broke down everything I needed to know about monkey patching, which felt terrifying and wrong when I started this job. It was something I really tried to avoid when I was working on Rails applications previously. But often, the way you add instrumentation to different libraries is by prepending or alias method chaining. New Relic still supports both.
That gave me a great sense of why you would want to use those tools, how you could use them, and also their downsides. And even though we've moved on from Ruby 2, I think a lot of that information is still very relevant and hasn't changed too much. >> I'll definitely include a link to that in the show notes. Do you know if there's a new one yet? >> I don't. >> We'll look it up. And Kayla, I just have to say it's been incredible to see your journey since those early days at Planet Argon, and to watch you up on stage teaching folks and mentoring so many Ruby developers on topics that can be pretty intimidating. I just want to say it makes me genuinely proud. >> Thank you. >> I'm looking forward to seeing where you continue to evolve, and thanks for coming on On Rails today to talk shop a little bit about OpenTelemetry. I'm going to go experiment a little more with this in some of the projects I'm working on, and I hope some of our listeners do as well. >> Thank you, Robbie. And thank you as well for the opportunities at Planet Argon. I [clears throat] think it's a straight line from there to here and everything I learned on those Rails applications. So, thank you for the opportunity. >> Likewise. Thanks again, Kayla. Thank you so much for stopping by to talk shop with us today. That's it for this episode of On Rails. This podcast is produced by the Rails Foundation with support from its core and contributing members. If you enjoyed the ride, leave a quick review on Apple Podcasts, Spotify, or YouTube. It helps more folks find the show. Again, I'm Robbie Russell. Thanks for riding along. See you next time.
Video description
In this episode of On Rails, Robby is joined by Kayla Reopelle, a lead software engineer at New Relic, where she works on both the Ruby agent and the OpenTelemetry Ruby gems. They explore what observability means for Rails developers: not just as a debugging tool, but as a way to build clearer, more reliable systems. Kayla explains OpenTelemetry's vendor-agnostic approach to instrumentation and shares practical ways to experiment with traces, metrics, and logs in both production and local development.

*[00:00:00]* – Intro and welcome to Kayla from New Relic
*[00:01:12]* – What keeps Kayla "On Rails" and working in observability
*[00:06:14]* – Defining observability: unknown unknowns vs known unknowns
*[00:08:08]* – Signs your system might be trying to tell you something
*[00:11:17]* – What is OpenTelemetry for Ruby developers
*[00:13:42]* – Where Ruby fits in the broader OpenTelemetry project
*[00:20:40]* – Using OpenTelemetry locally while developing features
*[00:24:13]* – How ActiveSupport notifications power Rails observability
*[00:28:42]* – Open source tools like Jaeger and Prometheus for local visibility
*[00:30:58]* – Testing instrumentation itself and CI monitoring
*[00:38:17]* – Using feature flags to control data collection
*[00:49:43]* – Making sense of background jobs with metrics
*[00:53:50]* – Capturing logs in context with traces and spans
*[01:00:09]* – Book recommendations and getting involved with OpenTelemetry

Socials:
LinkedIn: https://www.linkedin.com/in/kaylareopelle/
GitHub: https://github.com/kaylareopelle

Company:
Homepage: https://newrelic.com/
Blog: https://newrelic.com/blog

🧰 Tools & Libraries Mentioned
ActiveSupport::Notifications → Rails' pub/sub API used for instrumentation. (https://api.rubyonrails.org/classes/ActiveSupport/Notifications.html)
AppSignal → Rails-friendly APM and error tracking. (https://appsignal.com/)
AWS X-Ray → Distributed tracing for AWS services. (https://docs.aws.amazon.com/xray/)
Datadog → Full-stack observability platform. (https://www.datadoghq.com/)
Elastic Profiling Spec → Donated profiling format for OpenTelemetry. (https://github.com/open-telemetry/opentelemetry-proto/tree/main/opentelemetry/proto/profiles)
Grafana → Open-source dashboards and visualization. (https://grafana.com/)
Honeybadger → Error monitoring for Ruby apps. (https://www.honeybadger.io/)
Jaeger → Distributed tracing system (CNCF). (https://www.jaegertracing.io/)
New Relic Ruby Agent → APM agent for Ruby and Rails. (https://github.com/newrelic/newrelic-ruby-agent)
ObservableGauge (OTel Metrics) → Async gauge for snapshots like queue size. (https://opentelemetry.io/docs/specs/otel/metrics/)
OpenTelemetry Collector → Pipeline for receiving and exporting telemetry data. (https://opentelemetry.io/docs/collector/)
OpenTelemetry Logger Bridge → Sends Ruby logger output to OTel. (https://github.com/open-telemetry/opentelemetry-ruby/tree/main/instrumentation/logger)
OpenTelemetry Ruby → Vendor-agnostic telemetry for Ruby. (https://github.com/open-telemetry/opentelemetry-ruby)
OpenTelemetry Ruby SIG → Community group maintaining OTel Ruby. (https://github.com/open-telemetry/community#special-interest-groups)
Prometheus → Metrics collection and storage. (https://prometheus.io/)
Rack Middleware → Web middleware stack used in many Rails instrumentations. (https://github.com/rack/rack)
Rails Structured Logging / Event Reporter → Structured logs built into Rails. (https://github.com/rails/rails/pull/51188)
Semantic Logger → Structured logging for Ruby & Rails. (https://github.com/reidmorrison/semantic_logger)
Stripe Ruby Gem → Payments client used as an instrumentation analogy. (https://github.com/stripe/stripe-ruby)
UpDownCounter (OTel Metrics) → Counter for tracking active jobs. (https://opentelemetry.io/docs/specs/otel/metrics/)

#rails #rubyonrails #tech

On Rails is a podcast focused on real-world technical decision-making, exploring how teams are scaling, architecting, and solving complex challenges with Rails. On Rails is brought to you by The Rails Foundation, and hosted by Robby Russell of Planet Argon, a consultancy that helps teams improve and modernize their existing Ruby on Rails apps.