Analysis Summary
Worth Noting
Positive elements
- Provides a clear, hands-on demonstration of how to use Elixir's low-level primitives to handle process failures and race conditions.
Transcript
Hi, I'm Aaron, and this is Exploring Elixir, where we look at interesting language features, libraries, and design patterns from the world of Elixir and the BEAM virtual machine. In this episode we'll look at one method to gracefully handle unexpected errors in our code's business logic.

A lot of Elixir applications provide services over the network, and oftentimes these are done over long-running connections such as a WebSocket or even perhaps a raw TCP connection. So we're going to look at this toy example of such a service, where we're providing the ability to pass in some JSON-formatted data along with a key that we expect to be in that data, and it's going to return some transformation of that data back to us. Perhaps we're going to provide this over a WebSocket to the user.

So it's calling this extract function. Let's just quickly go in. I've already set up the JSON filter module. Let's go ahead and implement a function called extract, and it's going to take a process ID, some JSON, and then a key in that JSON to work on. Now normally, if this is a WebSocket, we wouldn't necessarily be taking a process ID but perhaps the actual socket itself or whatnot, but for our purposes this will do. We're going to check to make sure that the data we're getting in our parameters is what we want and need, trying to write some reasonably reliable code here. The first thing we're going to do is use Poison, which I've already added to the project earlier, to decode the JSON into a native Elixir term, and then we're simply going to send back to the PID that called us, or the process that called us, the data associated with that key. Very simple, two lines of code. What could ever go wrong, right?

So let's try running that, and we immediately get a bug. This is a typical kind of bug if you've used Poison or similar libraries, where you might forget that it doesn't return just the data you've asked for, but is actually returning an ok or error tuple, and it's the second member of that tuple that actually has the data we're after. So now if we try this, okay, we're getting what we wanted. That's great and perfect.

Now such bugs can slip into production code, especially as it gets more complex, and we want to make sure that we can handle that, especially if the jobs or the workers that we're running are using, say, external services that we don't fully control as well. The world is not always in our control. So the problem is that when this failed here, if this had been running inside of our WebSocket code, there's a chance that the whole socket connection would also be reaped and close on us. The client on the other side would then either get a closed socket, they don't know why, and they have to reestablish the connection, or, even worse, it might just hang around on them and you get no response. Additionally, if the WebSocket that the request came over, where we started this bit of work, closed, because maybe the client is on a bad mobile connection or what have you and they lose internet connectivity and the socket goes down, then our job would also be interrupted. And if we were trying to store this data somewhere, perhaps durably in a database, we may not want that either.

So it'd be really nice if we could separate failure in either the WebSocket connection or the worker that we've got running here from each other, so that failure in one doesn't affect the other. And of course in Elixir we have a really nice primitive to do this, called processes. So let's put the worker in its own process. We'll make another function called extract_data, for lack of a better name coming to my mind, and what we'll do here is simply spawn a process that runs the extract_data function, and we'll pass in the PID, the JSON we received, and the key. Now this will run the same code, but it'll run it in a separate process.

Now to get the data back, we'll just write a really simple manual receive block here, and what we're going to look for is some sort of data being sent back to us from the process. Actually, what we're going to do is send ourself as the PID to respond to, because it's going to send back the message here. So we'll wait for it, and when we receive something we'll send on to the PID we were given the data that we received. Great.

So now, we really haven't fixed the problem. If this bug is fixed here, then this will work just fine for us. Great, we got our data back. But if there's a bug in here again, if we turn ourselves to the bad condition, this receive loop is not going to receive anything. It's just going to sit there forever, waiting on data that's never going to arrive. So maybe we go, okay, let's add a timeout here, and we'll say after 1 second, if we don't get anything back, we'll just send a message to the person who called us and we'll give them an error. We'll say timeout, which is a sad event, but at least now, even when there's a bug, after one second we get a very nice error timeout message. Fantastic.

So now we're stepping in the right direction. However, it would be nice not to have to wait for a second, and maybe it takes more than a second. We don't know how long it's going to take, and what we'd really like to do is be able to respond on actual failure. At this point we might be tempted to do a spawn_link and link the two processes together, our WebSocket process and our worker process. The problem with linking is that it does, well, link the two processes to each other, so if either of them fails it's going to cause the other process to stop, and that's not what we want at all. We could say here, well, let's trap exits, setting the trap_exit flag, and then write code to manually manage the responses. But what we really want to do is just monitor the worker from the code that's being called here.

We can actually do that, and magically enough it's called monitor. What monitor does is let you watch the lifespan of another process. When you call spawn_monitor, it's going to return a two-tuple, the first element of which is going to be the PID of what has been spawned, so we'll call it worker PID. We're not going to use these values, but it's neat to see what we're getting back. Then it's going to give us a reference to the monitor. You can have multiple monitors on a single process, unlike links, of which you only have one, but that's not a problem for us because we're going to call spawn_monitor. We're not going to call spawn itself and then call monitor on that PID. The reason for that is that there's an obvious race condition between these two: if we call spawn, the process might start and immediately crash, or immediately complete its job, before we get to call monitor. So we use the atomic spawn_monitor function, just like spawn_link, and this will guarantee us we get a monitor back on whatever the PID is.

Now we are going to get down messages whenever the process exits. This is fantastic. The down message is a tuple as well. It returns the monitor reference, which we received earlier, we'll match up the function that failed, the PID of the process that crashed, and then a reason. So if this happens, let's send back to our caller the fact that we got an error, processing failed, and then let's just tell them what the reason was. As simple as that. Now we should get an error back pretty much immediately. We don't have to wait for that timeout, and this is beautiful. So now we actually see we get a backtrace telling us what the error was, and the processing failed.

Now we may want to take this a step further and actually receive all the data, because maybe it doesn't come in all at once. So we'll take our manual receive block here that we've written and wrap it in a little function. We'll pass in the PID to communicate back to, for the response PID. And when we get data, not only are we going to send that data back to our caller, but now we're also going to call our wait_for_response function recursively, so that we can now do things in our worker like have a stream of messages. So let's say we want to send back progress reports, so progress 50% or whatever, and then when we're done we will send a progress of 100%. Wonderful.

Now if we run this, oh, a little bit of an error there. Ah yes, I need to provide the third term. No options. Good. So we got our error, but we got our progress message first and then we got our error. Let's see what happens now if we fix our little bug again. Oh, too fast. Okay, great. And then we get our progress, and then we get an error message, and the reason it exited is normal. So what's happened is the worker has successfully completed, but it's exiting and we're getting notified of that. Now we could demonitor the process here using the monitor reference we got, but we don't know when the last bit of data is going to come in. So instead, let's also wait for an exit of just the normal type, which means everything went well. It wasn't a failure at all, it was successfully completed, and let's give the user the good news. And now if we run that, boom, we get progress messages, we get the data we wanted back, and the processing is successful.

And even when there's an error in our code that may slip in later on, or maybe only happens one time in a thousand, the WebSocket, or if it was a TCP connection, or whatever the connection is, the caller is not going to fail. They're not going to go unnotified; they're going to know that something went wrong. Best of all, it's not going to take the connection down, because it's being run in a separate process that's running the extract_data function as a worker for us. So we've successfully isolated both of the processes from each other.

You can imagine, if this was a Phoenix Channel or what have you, that instead of having a receive block that we've written ourselves, we can actually turn each of these clauses into individual handle_info functions that would be looking for down messages or your data message. And you can run as many of these as you want once you have a GenServer, because you can just spawn off a worker. It can take a long time, maybe it's doing some data crunching in the background or waiting on a third-party service to respond to it, and then maybe the client asks for another job to be started and you can just spawn another worker. So this is a really nice way to not only take advantage of the multiprocessing that you get for free, almost, with the BEAM, but it helps you separate or isolate failures and faults in one part of the code from the other. Again, if your socket was to go down, your worker would continue.

So this is a really nice general-purpose kind of pattern that I find quite useful when writing services in Elixir. I hope you find it useful as well. If you have any comments or questions, please leave them in the comments below, and we'll see you in the next episode.
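The pattern described in the transcript can be sketched roughly as follows. This is a minimal reconstruction from the spoken walkthrough, not the code shown on screen. Several things here are assumptions made for illustration: the module name JsonFilterSketch, the message shapes, and the decode/1 helper, which stands in for Poison's decoder so the sketch runs without dependencies (it accepts an already-decoded map rather than a JSON string). The worker's internal messages are tagged differently from the forwarded ones so that the sketch also behaves when the caller passes its own PID.

```elixir
defmodule JsonFilterSketch do
  # Hypothetical stand-in for a JSON decoder (assumption, not Poison's API):
  # a map counts as "valid input", anything else is an error tuple.
  def decode(data) when is_map(data), do: {:ok, data}
  def decode(_other), do: {:error, :invalid}

  # Entry point: start the worker with an atomic spawn_monitor, then relay
  # whatever it reports to `pid` until the worker exits.
  def extract(pid, json, key) when is_pid(pid) do
    {_worker_pid, monitor_ref} =
      spawn_monitor(__MODULE__, :extract_data, [self(), json, key])

    wait_for_response(pid, monitor_ref)
  end

  # Worker body. The {:ok, data} match crashes the worker on bad input,
  # which is exactly the kind of failure the monitor reports.
  def extract_data(reply_to, json, key) do
    send(reply_to, {:worker_progress, 50})
    {:ok, data} = decode(json)
    send(reply_to, {:worker_data, Map.get(data, key)})
    send(reply_to, {:worker_progress, 100})
  end

  # Recursive receive loop: forward data and progress, stop on :DOWN.
  defp wait_for_response(pid, monitor_ref) do
    receive do
      {:worker_data, data} ->
        send(pid, {:data, data})
        wait_for_response(pid, monitor_ref)

      {:worker_progress, percent} ->
        send(pid, {:progress, percent})
        wait_for_response(pid, monitor_ref)

      # A :normal exit means the worker finished its job.
      {:DOWN, ^monitor_ref, :process, _worker_pid, :normal} ->
        send(pid, {:ok, :processing_complete})

      # Any other reason is a crash; pass the reason along.
      {:DOWN, ^monitor_ref, :process, _worker_pid, reason} ->
        send(pid, {:error, :processing_failed, reason})
    after
      1_000 ->
        # Stop watching and drop any pending :DOWN message before giving up.
        Process.demonitor(monitor_ref, [:flush])
        send(pid, {:error, :timeout})
    end
  end
end
```

Calling JsonFilterSketch.extract(self(), %{"name" => "elixir"}, "name") from a shell should leave progress, data, and completion messages in the caller's mailbox. Passing something that is not a map makes the worker crash after its first progress report, and the caller receives an error tuple carrying the exit reason instead of hanging until the timeout.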
Video description
In this episode we look at one method to gracefully handle unexpected errors in our code's business logic by isolating faults from other parts of the program using processes and monitors. Exploring Elixir code repository: https://github.com/aseigo/exploring-elixir