Analysis Summary
Performed authenticity
The deliberate construction of "realness" — confessional tone, casual filming, strategic vulnerability — designed to lower your guard. When someone appears unpolished and honest, you evaluate their claims less critically. The spontaneity is rehearsed.
Goffman's dramaturgy (1959); Audrezet et al. (2020) on performed authenticity
Worth Noting
Positive elements
- This video offers rare, high-level detail on the specific mechanics of large-scale data reconciliation and the reality of negotiating egress waivers with AWS.
Be Aware
Cautionary elements
- The 'revelation framing' makes the cloud exit seem like a universal 'best practice' rather than a specific strategic choice that requires significant internal engineering talent.
Influence Dimensions
About this analysis
Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.
This analysis is a tool for your own thinking — what you do with it is up to you.
Transcript
Well, yeah, there's no real difficulty. It's just copying some stuff, right? Among those five petabytes of data spread across hundreds of buckets are on the order of about 5 billion objects. The concept is straightforward, but you've got to be careful and do it right. And there are easy ways to do it right, but there are a lot of easy ways to go wrong, too. >> This is the REWORK podcast, a place where the team at 37signals shares their behind-the-scenes work building Basecamp, HEY, and open-source projects. We're diving deep into what we've done and how we've done it so you can learn a thing or two and learn from some of our mistakes. I'm your host, Kimberly. I'm joined for the technical side of the discussion by Fernando from our engineering team. Hello, Fernando. >> Hello, hello. >> Well, we are talking this week a little bit more about our move out of the cloud. If you've been following 37signals at all, you know that we started a move out of the cloud back in 2022, and we have wrapped up the big portion of it: moving data out of S3. To talk about it, we have Jeremy Daer, principal programmer here at 37signals, who did much of that final work. Jeremy, thanks for being here. Before we dive into our topic today, tell us a little bit about you and how long you've been at 37signals. >> Yeah, I'm a programmer here. I've been around for some ages, and this storage migration is the most recent of probably half a dozen over 15-20 years. So I came to it feeling like: it's time, we need to move all the bits across the continent again. >> Okay. Well, I'm sure there's a lot to dive into. Fernando, you're going to help me with all the technical aspects, but let's just start kind of where we ended, if you will. We know we started this move back in 2022, and moving the data out of S3 was the last part of the project, as I understand it. Tell me why that was the last piece, and then we'll dive into the technical side.
>> Yeah, there are two things: it's expensive, and it's scary. We've got a lot of things to move, and we had built a lot of trust in S3. It's super durable, it's super reliable. You can't go wrong with S3. It's like the old-school IBM thing: nobody's going to get fired for using IBM. Whereas storing your own stuff, it's all on you, and if you haven't been shouldering that risk for a long time, you kind of forget what it feels like. A bunch of planning, mitigation work, and risk assessment can help prepare for that, but getting over that hump ultimately comes down to just choosing to do it. The other is cost. What are we doing here? It's easy to open up our wallet and pay AWS, and it causes us to cry a little bit, or a lot bit. But when we want to replace it, we're looking at spending hundreds of thousands, millions of dollars on something, and we've got to be pretty sure it's going to work, or certain it's going to work, but also not spin our wheels for weeks or months trying to figure it out and prove it for sure. So we needed to come up with some kind of framework for understanding what we need, and for being able to prove it before spending a bunch of money. There's a third thing, really: S3 also has this tricky bit, which motivated some of the cloud exit, of you've got to pay for everything, and one of the things you've got to pay for is bandwidth coming out of S3. So if you want to move your data, you've got a little bit of a, well, again, a lot bit of a handcuff situation. You've got to pay to get your data out. So the EU came up with some regulations around this, and all the major cloud providers got out ahead of the regulations, saying that you've got to be able to exit, and you can't keep people's data for ransom.
And it wasn't like this before. People didn't necessarily want to leave S3, because the alternatives were not great, and you were on the upswing of cloud adoption: look at all the things we're not doing. You just get to send it out to this abstracted thing and pay some monthly fee, rather than doing it all yourself and paying, you know, hundreds of thousands upfront for storage hardware that you've got to run and maintain yourself. So anyway, AWS came up with this waiver program wherein you can get your data out of S3 under certain conditions, and the conditions run like this: you've got 60 days, 90 days, whatever. You've got a ticking clock, and you've got to get all your data out. You can't half-ass it; you've got to get it all out. And if by the end of that time you can prove that it's all out, then you'll get AWS credits for the bandwidth cost. So you've got to estimate how much stuff you're storing, and then how much bandwidth it should take to get out. If you do it all perfectly, which of course everybody would do, then that's your credit. So the public messaging is: we're chill with this, we'll let you get all your data out. The reality is you've got this tight, binding contract, and you've got to do it perfectly, and then we'll give you a refund. >> Oh wow. Do we pick the time frame? >> We do not pick the time frame. >> Do we pick the 30, 60, or 90 days? >> It's negotiable. The internal messaging toward customers is a limited time frame, like 60 days or something. The external messaging is: well, we'll work with you, we'll be reasonable. The reality is somewhere in between. We do have great account reps, so I can't fault that end of things. It's been wonderful interacting with AWS.
But you see both sides of it: we're going to look to the world like we're ready for you to leave, but in fact we're going to make you jump through a bunch of very tight hoops. >> Well, I was going to joke, "Oh, it doesn't sound that difficult. You just, I don't know, copy a bunch of files over to a hard drive somewhere." What is the real difficulty there? >> Well, yeah, there's no real difficulty. It's just copying some stuff, right? It's just that if you want to copy things, there's a lot you've got to do, especially if you've got a lot of files stored. For most people, this copy is not technically hard. The basic process would be: clone your S3 bucket. A bucket of objects is like a folder full of files. You list the bucket to see what's in there, and you copy it over to your new place. Typically you'd be doing this with a live system, so your system needs to know that you're moving to a new system while there's a live old system. If you want to do it without downtime, you need to be able to store files to both places; you need to mirror your files to both storage systems. Then you do a copy, and after that first copy, maybe you've got some stuff that's in one storage system but not the other, and vice versa. So you need to bring them into sync. You need to reconcile them. >> And none of this is offered by AWS? It's all on you? >> AWS does offer something that does something like this, but you've got to pay. [laughter] >> Feels like that's a theme. >> Yeah. There's a data transfer service that can do something like this, and in fact it can do a lot more, like incremental syncing between disparate systems. But it is very expensive. You're looking at tens of thousands of dollars to move large-scale buckets of stuff. >> Was it ever an option for us? >> We did evaluate it.
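The reconcile step Jeremy describes, bringing source and destination into sync after the first copy, boils down to a set comparison over the two listings. Here is a minimal sketch with made-up keys and checksums (this is illustrative, not 37signals' actual tooling):

```ruby
# Given key => checksum listings of both stores, work out what still needs
# copying (missing or corrupted at the destination) and what exists only
# at the destination and deserves a look.
def reconcile(source, destination)
  missing  = source.keys - destination.keys
  mismatch = source.keys.select { |k| destination[k] && destination[k] != source[k] }
  extra    = destination.keys - source.keys
  { copy: missing + mismatch, review: extra }
end

source      = { "a/1" => "c1", "a/2" => "c2", "b/3" => "c3" }
destination = { "a/1" => "c1", "a/2" => "XX" }   # "a/2" copied badly, "b/3" never copied

plan = reconcile(source, destination)
```

At five billion objects the listings obviously can't be in-memory hashes, but the logic of the catch-up pass stays the same.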
It would be very nice not to have to do this job ourselves and to have somebody else do it. And there are vendors that do stuff like this, but again, you're looking at on the order of tens to hundreds of thousands of dollars. >> Wow. >> Depending on the size of your buckets; most of them scale with your storage. They're looking to take a percentage cut, because the more you've got, the more you can pay, probably. >> Jeremy, when you're saying the size of our buckets, how much data are we talking about being moved? >> In aggregate, we had about 10 petabytes of data across a bunch of buckets. Some of our applications were responsible for a lot more than others. Our average object size was about 1.1 megabytes. We're also geographically distributed, so deduplicated, we had probably about five petabytes of unique objects, and those five petabytes were spread across hundreds of buckets and on the order of about five billion objects. It's a lot of stuff. You get into the realm where, if this were a folder on your computer and you tried to open it, your computer would crash, and you can't list a bucket of that size without it taking literally days. So there are a bunch of interesting constraints that come into play when you try to do this conceptual process: you copy, then you stop your application from storing things for a little while, then you do a catch-up copy to reconcile, to make sure your destination has all the stuff that was in the source, and then you cut over: you start using the new system. And you've got to be sure that you actually got everything, that everything was copied correctly, there were no mistakes, you didn't miss anything, and nothing showed up while you were copying, etc.
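The numbers above hang together on a quick back-of-envelope check. S3's ListObjectsV2 returns at most 1,000 keys per page, and pages within a listing are sequential; the ~100 ms per request is an assumed latency, used only to show why listing "takes literally days":

```ruby
objects = 5_000_000_000              # ~5 billion unique objects
data_pb = 5.0                        # ~5 petabytes deduplicated

# Average object size: petabytes -> megabytes (decimal units), per object.
avg_mb  = data_pb * 1_000_000_000 / objects

# Listing: 1,000 keys per page, sequential pages, assumed ~100 ms per request.
pages     = objects / 1_000
list_days = pages * 0.100 / 86_400   # seconds -> days
```

That works out to roughly 1 MB per object (matching the 1.1 MB figure) and nearly six days for a single naive listing pass.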
So even at a small scale, the concept is straightforward, but you've got to be careful and do it right. There are easy ways to do it right, but there are a lot of easy ways to go wrong too, and once you get to a larger scale, there are a lot more easy ways to go wrong, and a lot of ways to go slow. >> Well, I feel like we should talk about all of those things: not only the things that are easy to do well, but all the things people should avoid if they're trying to do this. [laughter] >> Well, what would be wonderful is if egress were free and we weren't stuck needing to move quickly. Because when we look at the bigger picture of the job to be done, we've got to move five petabytes of stuff within, we negotiated, 90 days. And that was based on our back-of-the-napkin math of what size of network connection we have available and how quickly we can move stuff. >> Is there a limit on the AWS side on the connection? >> There are plenty of limits, yes. There are some hidden limits and some stated limits. That comes into play when we try to figure out what our limiting factors are, when we do a bounds analysis. >> Let me backtrack just a little bit. You said, okay, we negotiated 90 days. What was that like? Like, oh, we have 5 petabytes, so if we have access to, I don't know, 10 gigabytes per second, and we do this twice? >> I was chatting with a rep, saying, "Here's how fast we can possibly do it, so that's what we need." And they said, "Okay." We built in plenty of buffer, because part of this process is not just the copy. Something that anybody doing a process like this will quickly discover is that copying seems like the central point and the purpose of the job, but it's really reconciliation and verification.
Making sure that you did what you thought you did. And verifying that you did what you thought you did is as costly as the copying, and it has different kinds of limitations, because you need to go look and see what you've got. That means listing all your files, listing all the objects in a bucket. When this process can take days to perform, you need faster ways to do it. And you need cheap ways to do it, because again, you've got to pay: if you want to list a bucket, you've got to pay for API requests. So you've got to be careful that you don't get into a situation where you're repeating work or doing unexpectedly expensive work. In many cases these API requests are cheap, so it turns out we're totally fine, but you've got to do the due diligence first, so that in order to save tens of thousands of dollars on a paid service, you don't end up spending tens of thousands of dollars on errant API requests yourself. >> Yeah, that makes sense. I'm trying to wrap my mind around this. This process must be automated somehow, right? Is it in the app? Where does this reconciliation process happen? >> That gets a little deeper into the approach we arrived at. There's more than one way to do reconciliation, and there's something I'd recommend to people who are moving just a single application, which ideally is the situation you'd be in. So, a little digression: looking back, the ideal way to set up AWS accounts in the first place is around scopes of responsibility, a blast radius, because if you ever want to leave, you've got to take your whole account out. So don't go putting buckets from a bunch of different places in one account. Don't share an account. Make accounts early and often, so that accounts are aligned with your applications, which gives you the flexibility to move one application at a time.
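That due-diligence step, pricing out the API requests before you make five billion of them, is simple arithmetic. The per-request prices below are assumptions for illustration (check your region's actual S3 pricing; LIST-class requests cost on the order of $0.005 per 1,000, GETs on the order of $0.0004 per 1,000):

```ruby
objects = 5_000_000_000

# One full listing pass: 1,000 keys per LIST request.
list_cost = (objects / 1_000.0) * (0.005 / 1_000)

# One GET per object for the copy itself.
get_cost  = objects * (0.0004 / 1_000.0)
```

With numbers like these, a listing pass is pocket change and even five billion GETs land in the low thousands of dollars, which is the "turns out we're totally fine" conclusion, but only after checking; a retry storm or a repeated listing loop multiplies those figures.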
We were in that situation to some extent with some of our newer stuff, but for our older applications we had a shared account, so we needed to do everything all at once. And when you have multiple storage systems in a single account and you need to move them all, you can't tailor the solution to the system being moved. You need something that's going to work for all of them. That led us to something at a lower level, mostly transparent to the applications we're running. If I were doing this with a modern Rails app, this is something I'd build into Active Storage, at the application level. I'd build in some kind of modeling for where things are stored, so you get metadata within the application itself: I've got this object stored in AWS, and I also have it stored in my new destination storage. The process would be that I've got some rule that says I need two copies, one in each of these locations, and the application could do that with just a bunch of Active Jobs copying things on its own. You could do it lazily in the background and just let it trickle through, if you didn't have these other constraints of you've got to get it all done, you've got to blast it at maximum speed, and you've got to make it work with older apps. Not all of our stuff is on the latest Rails; we extracted Active Storage from our own apps, so most of our older apps are using the older abstractions that were the source of the extraction. We needed to be compatible with all these possible systems, so we needed something lower level. So we built something. We didn't want to build something, but we also didn't want to pay a lot of money. Kind of a rock-and-a-hard-place thing. And we are in the position of having the expertise: we are a technical organization.
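The application-level approach Jeremy sketches, metadata tracking where each object lives plus a rule that it must live in both places, can be outlined in a few lines. All names here (`STORES`, `Blob`, `FakeClient`) are hypothetical, standing in for a model and an Active Job in a real Rails app:

```ruby
STORES = [:s3, :new_storage]   # the rule: every blob must exist in both

Blob = Struct.new(:key, :locations) do
  def missing_from
    STORES - locations
  end

  def mirrored?
    missing_from.empty?
  end

  # In a Rails app this loop would enqueue one Active Job per missing store.
  def mirror!(client)
    missing_from.each do |store|
      client.copy(key, to: store)
      locations << store
    end
  end
end

# Stand-in for a storage client; records what it was asked to copy.
FakeClient = Struct.new(:copies) do
  def copy(key, to:)
    copies << [key, to]
  end
end

client = FakeClient.new([])
blob   = Blob.new("abc123", [:s3])   # stored in S3 only so far
blob.mirror!(client)
```

Run lazily, this trickles objects across in the background; the egress deadline is what forced 37signals to blast instead.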
We have programming and operations teams. We do all this stuff, so it's natural for us to do our jobs. [laughter] Maybe this isn't the thing we'd outsource. If there were a natural fit, we would have, and in fact we did, to the greatest extent we could, by picking purpose-built tools to do each of the jobs. We went through multiple iterations. As we got closer and closer to the kickoff of "we want to begin the copy," we reinvented the system we'd built to do the copying several times, as we discovered quicker, simpler, straighter paths to do that job. We started out with something that would be dead simple for distributing jobs: a bucket would itself store objects, where each object is a file manifest, and those manifests would be available to a bunch of workers. We'd spin up as many workers as we needed to saturate whatever bottleneck we had. The bottleneck could be our network connection: how fast we can pull things from S3. It could be our write rate to our destination storage: how quickly can we write objects, and are we bandwidth limited? Is there a metadata limit on the IO operations per second? Or is it going to be the S3 read rate? We could discover all these things. >> Read rate? Why does it have a read rate? >> They're just throttling you. Yeah, S3 doesn't technically have a cap on read rate, but they do have a cap on what they call the bucket partition read rate. A normal bucket will have only a single partition. The way S3 is laid out, it looks like hierarchical folders, but it's not. It's actually key names, typically separated with slashes, so it looks like a file path.
And the natural way to partition a bucket would be: take all the file paths up to the first slash and turn each into an internal S3 partition, which is their own way of sharding the bucket so they can scale out. If you want to write a ton of things to S3, writes can be fanned out to multiple partitions, and as you write more, S3 is smart about noticing that write rates are hot on certain partitions and will split them automatically for you, all behind the scenes. You can't even tell what your partitions are. You can give cues, like using the traditional slash character, which can help S3 figure it out, but you don't need to; it'll just do it for you based on your usage. So if you have a moderate-usage app, the first time you're going to have high usage is when you try to copy things out of it, and you're going to hit the rate limits pretty quickly, especially if you have a lot of small objects. If you've got a lot of big objects, you're going to be bandwidth constrained, probably on your own network connection. If you've got a lot of small objects, you can fill that pipe with tons of connections, and you're going to hit the rate limit quickly. The rate limit is, for the record, 5,500 GET requests per second. So that's fetching. >> And we hit that? >> Oh yeah. >> Oh my god. That's insane. >> Out-of-the-box tools can hit that pretty quickly, especially if you have small objects. We got a little bit lucky, because when we did our migration into S3, we worked with them to pre-partition our buckets, knowing that we'd be writing at high rates. On the way in, we contacted S3 ahead of time saying, "Hey, here's about how much stuff we've got. Here's what the key layout looks like for the objects."
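The big-objects-versus-small-objects distinction falls straight out of the arithmetic. The 5,500 GET/s figure is the per-partition rate quoted above; the 10 KB small-object case and the partition count are illustrative:

```ruby
per_partition = 5_500       # GET requests per second, one bucket partition

# Large objects (the 1.1 MB average): one partition's worth of GETs in Gbit/s.
avg_bytes = 1_100_000
single    = per_partition * avg_bytes * 8 / 1e9

# Small objects (say 10 KB): the same request rate moves almost no data.
small = per_partition * 10_000 * 8 / 1e9

# A bucket pre-partitioned on, say, the first hex character of a uniform
# hash key gets 16 partitions pulling in parallel.
pooled = single * 16
```

At the average object size, a single partition already pushes ~48 Gbit/s, so a pre-partitioned bucket saturates a 100 Gbit pipe long before the request cap bites; at 10 KB per object the same 5,500 req/s yields under half a gigabit, which is why small-object buckets slam into the rate limit first.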
We generally use random hash keys, so it's completely uniform. On S3's end the job is easy, because with a uniform key distribution they can just say: take the first two or three characters of the key and use those as partition keys. So our big buckets were already partitioned, and who knows what the limit could be, because when we did load tests we didn't hit a rate limit on those larger buckets. On smaller ones, we quickly hit a rate limit in our load testing. >> I know this is mostly about S3, but I'm also curious: did you really hit a write limit? Because David has spoken at length about the hardware, about the amount of money we spent on brand-new hardware that's blazing fast. Even that couldn't keep up when we went full throttle? >> We certainly hit its limits. So it became a question of where the limits are. We've got a system with a bunch of components, each component has a cap, and which one is going to be the weakest link? Which one's going to slow us down? It turns out we ended up pretty much bound by our network connection. We had a 100-gigabit network connection dedicated to just this copying process, thanks to our data center pros at Summit. They set it up just for this job, and we set up a separate VLAN for the machines that would be doing the work, so it was essentially a dedicated little network universe: you can saturate this pipe. That turned out not to be completely true. We were actually sharing it with a couple of other things, which we discovered because, as we tuned our system to eke out the maximum possible performance, we overshot a little bit and started interfering with other traffic. But that was what we thought would end up being our bottleneck. And to our wonder, it was not our destination storage.
We had initially considered using MinIO with hard drives, and our read and write rates for normal application usage can easily be satisfied by spinning hard drives. But we didn't exactly relish the idea of maintaining a bunch of spinning hard drives, because the failure rate can be notoriously poor depending on which batch you get and whether you have a hot rack in your data center. We were not looking forward to it. And Aaron, our head of ops, had a line on a new storage system from Pure Storage, the FlashBlade. They've got this fancy, super-duper proprietary flashy-bashy setup where, rather than using off-the-shelf flash, they mount their own flash on their own boards and do something a lot cheaper, kind of bringing the flash of two years in the future back a couple of years, which made it cost-competitive with hard drives. That was a surprise and ended up being a huge blessing. Not because we need the performance for steady-state usage, but because we needed the performance for the copy. If we had been on hard drives, we definitely would have been limited by our write rate into that storage cluster. >> By physics, just how fast you can spin those, right? >> Yeah. And it would depend on the number of drives. Actually, thinking back, we probably would have been able to satisfy 100-gigabit traffic on hard drives, because we'd have so many of them. In any case, it was a blessing, not quite in disguise, but we're all happy to have taken the other path. I'm digressing a little bit, but if you're choosing storage systems, it can seem like you should make the choice based on what you can afford now, but it's also about what you're paying over the course of 5 or 10 years.
Our total cost of ownership analysis was based on 5-year, 7-year, and 10-year horizons, what it would be like to keep the system around for a long time. And the power savings alone from flash are significant. It's a lot cheaper when your power is expensive, in a costly data center, and it uses less rack space. >> I hadn't even thought of that. I'm just picturing a bunch of machines in a place, but if you're going to clone five petabytes, it has to be a lot, right? A lot of machines, a lot of power, a lot of network, a lot of everything. >> Yeah. I mean, it's a crazy time in the storage world. There's a bunch of new form factors for solid-state drives coming out right now. It may give me a little bit of FOMO, because I see these things coming out just as we purchased this giant system. We still made the right decision for the time, but in about a year or two there's going to be a new generation of solid-state drives coming out at 256-terabyte size per module. So you can fit, shoot, what was it, [sighs and gasps] 40 petabytes in two rack units. >> Wow. >> So you could fit all of our storage into just a tiny little bit of a rack. In the hard drive era, we'd be looking at two full racks just for that storage. The shrinkage and power savings are dramatic, and it's all happening now. A lot of this is driven by AI: people need a ton of data stored with super-high bandwidth to it, and new vendors are cropping up daily trying to do this kind of job, so they're also driving the flash hardware side. Hopefully this will become just a commodity storage problem, and you'll be able to go to Supermicro or Dell or whatever and order up some servers that are packed full of these drives, and you won't need a special setup. You'll just go to Newegg >> and buy one. >> Yeah.
>> The limiting factor by far is network bandwidth. In almost any copy, that's going to be the cap you hit. So having a good data center partner is essential there. We were able to get a 100-gigabit connection set up in just a span of days. We had plenty of lead time, etc., but still, it's just wonderful to be able to bring in a big pipe like that. And of all things, we have AWS Direct Connect already, but it's not allowed: you cannot use your special direct fast connection to AWS to do the data egress. You've got to use the public internet. >> Even if you pay more? Oh wow. >> Yeah, you pay money, but to get the egress cost covered by the waiver, you've got to use the public internet. I guess maybe it's just tied up in some kind of red tape; maybe they can't account for the Direct Connect cost. Who knows? >> Jeremy, I do have a question, because obviously we were moving multiple applications, or working with multiple applications to make this move. Was there a specific order you were moving them in, or did you just pick an application and do it? What was the reasoning behind the order? >> There were two phases. We chose some smaller applications with less storage, but that would be representative of our applications with larger storage needs, and migrated those before the egress window opened. Essentially, we wanted to test the process. We wanted to be prepared so that when the window opened, we'd be able to blast. It didn't turn out quite that way; we had some delays as we optimized and restarted things. But it was crucial to identify some systems where we could do real-life copies and not just test runs. So we chose a couple of representative systems and did those first, to prove a blueprint for how we'd do it for our other applications.
And each application ended up being not quite cookie-cutter, but once you've improvised a recipe a few times, you know what you need to do. You come into it with a plan of attack: you need to do dual writes to multiple places, you need a reconciliation step, you need a well-defined cutover process. So you come up with a checklist, validate the checklist by doing a live migration, iterate and fine-tune, and then you're ready to go. Then when it comes to the actual copy, we go criticality first: the things we want to be absolutely sure about, the business-critical stuff, Basecamp, all our primary revenue-generating apps with the big data. We want to get those started as soon as possible, because there's the most to copy, and we want the greatest assurance that we did it all properly. That gives us the most headroom for unknown unknowns. We've got our known unknowns of things that might crop up, but we also have room for the "we don't know, who knows what would happen" stuff. And we came up with a bunch of those, so we're grateful for having started them early. But yeah, definitely dive into the trickiest, biggest thing first. >> That makes sense. I'm still trying to wrap my mind around this. You go to a small app and you're like, "Okay, we need to migrate this," and you start building the program you mentioned, right, the reconciliation program. Is that program a Rails server on its own? Is it the modifications you mentioned to Active Storage? >> No. We built a new thing that scales up the basic idea I started with: I need to list things to know what's in the source, and I need to copy everything in that list over to the destination. At a small scale, I can use a single program. We used one called rclone.
There's also one called rsync, and most technical folks have used one of these before. You just fire it up, give it a source and a destination, and it churns and does the job. At most scales, that'll work fine. In fact, AWS's free bandwidth egress limit is pretty generous, so most people would fit within it, and you could just rsync or rclone it yourself and call it done; the tool does all the bookkeeping for you. At our scale, we needed to fan out to a bunch of workers doing this job. We needed to do it in parallel, and that means splitting it up, batching it up, so it becomes a classic map-reduce problem: you've got a big input you need to spread out to a bunch of jobs, and then they've all got their individual outputs. In this case: take a big list of files, split it up into batches, and send those to workers that do the copies. Those workers have some kind of supervision that's tracking what they're doing: their progress, whether there's an error, retries, all that kind of stuff. That's where we used a Rails app for command and control, for wrapping up the jobs and the work. We arrived at a Rails app after trying what I thought might be some simpler, low-fi ways of doing things, and of all things, I kind of backed my way into using Rails because I was missing some of the conveniences of home. One of them was secrets management: credentials. Here we're doing something that's copying between a bunch of AWS accounts to a bunch of destination buckets, so you've got a ton of sensitive credentials in one application. As I was building this script-based, simple system, I realized: I'm rebuilding a credentials manager. This is not the life I want to be leading right now. How about I use something that's already built for me?
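The map-reduce fan-out described above, split the manifest into batches, hand batches to workers, track each batch's status, can be sketched in plain Ruby. The names are hypothetical stand-ins for what, in the real system, were Active Job classes on Solid Queue:

```ruby
Batch = Struct.new(:id, :keys, :status)

# Map step: slice the big manifest into fixed-size batches of keys.
def plan_batches(manifest, batch_size:)
  manifest.each_slice(batch_size).with_index.map do |keys, i|
    Batch.new(i, keys, :pending)
  end
end

# Worker step: copy every key in the batch; on any error, fail fast and mark
# the batch so a human can inspect it before anything is retried.
def run!(batch, copier)
  batch.keys.each { |key| copier.call(key) }
  batch.status = :done
rescue
  batch.status = :failed
end

manifest = (1..10).map { |n| "objects/#{n}" }
batches  = plan_batches(manifest, batch_size: 4)

copied = []
batches.each { |b| run!(b, ->(key) { copied << key }) }
```

The supervision layer then reduces over batch statuses: pending means keep waiting, failed means pull the cord and investigate, all done means start reconciliation.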
So I went to essentially vanilla Rails because of this mildly auxiliary concern. I had a bunch of other pressures swirling in my mind that were resolved by this, so who knows whether my unconscious could feel that a change was going to need to be made and this was the thing that triggered it. The trigger then led to a bunch of nice outcomes, like being able to use Active Job and Solid Queue and a lot of things we're familiar with for distributing the work. >> So in the end you have a single Rails app that, during this window, you're constantly monitoring: did it complete everything, are there any retries or errors? And then you go and fix them, like you say: oh, we hit this limit, let's try to work around it. >> That's part of the approach, based on the initial migrations as our test runs. As we discovered in the test runs, you're going to have new errors and surprises crop up, so we prioritized failing fast rather than trying to be resilient in an automated way. We didn't build in things like exponential backoff too early, because sometimes failures were not things we wanted to retry; they were actual errors. So: be able to identify something that was truly a transient failure, and automate that late rather than early. We treated it like an andon cord: we've got a production line, we notice something's failing, we pull the cord, we stop everything, we fix it, and then we proceed from there. That drove a lot of other decisions. When you break things up into chunks, they need to be observable, they need to be retriable, they need to be supervised. And especially for diagnostics and troubleshooting, you need to be able to see what's going on.
So in these cases, you've got an Active Job process that invokes another tool, and you've got hundreds of these running, so you've got standard output and standard error from a bunch of tools, plus the exit status. How do you see them? For me this was a crucial stumbling block: if I can't see exactly what's going on, I don't know what's going on. I don't want to spend a bunch of time guessing and troubleshooting; I want to just look at the output and be able to figure it out as if I were running it on my own console. So that was a critical step early on too: making something that was easy to supervise and just witness. If something's failing, I could try it myself, I could invoke it myself, or I could pull up a transcript of what that process had done. So this was the job of the Rails app: coordinate pulling an inventory or catalog from a source, split that catalog up into a bunch of pieces, which was its own whole thing because you've got something that's huge and you get into bigger tooling, and then make a bunch of jobs for all that stuff. Each job's responsibility is its chunk of files, and the output for that job is a bunch of things like status and transcripts, and those themselves are actually stored in a storage bucket as well. So every job has a unique ID associated with it, and you can go inspect the whole process. In fact, we had a live tail: since these things can take a while, you can visually see what's going on. You could just snoop on any transfer. >> So, you were basically Neo for 90 days. >> Not 90 days, thank goodness. [laughter] We ended up getting it down, thanks to Pure Storage and the very fat pipe of bandwidth, to less than 10 days of transfer. >> Wow.
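Capturing a per-job transcript (stdout, stderr, and exit status, keyed by job ID) is straightforward with Ruby's standard library; a sketch, with an in-memory hash standing in for the transcript bucket and an illustrative function name:

```ruby
require "open3"

# Run a subprocess and record its full transcript under the job's ID,
# so a failing job can be inspected as if you'd run it at the console.
def run_and_record(job_id, *command, transcripts:)
  stdout, stderr, status = Open3.capture3(*command)
  transcripts[job_id] = {
    stdout: stdout,
    stderr: stderr,
    exit_status: status.exitstatus
  }
  status.success?
end
```

In the real system those transcripts landed in an object-store bucket per job rather than a hash, but the capture step is the same shape.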
>> That turned out to be pretty critical, because it gave us lots of time to do reconciliation and verification. We didn't know that going in, so we built in a lot more buffer than we needed, but I'm glad we had the buffer, because I was headed on sabbatical right after this was going to wrap up, which is not the wisest of career choices. [laughter] >> Jeremy, tell us a little bit about those 10 days. Is it 10 days just non-stop? Walk us through it. >> Oh, it's non-stop, yes. >> Okay. >> So the setup is: I didn't want to trickle things into a pipe and have to be carefully tending things. I wanted to feed a pipe and have a backlog. I can go through some of the technology stuff briefly. If you're copying a bucket with billions of objects from S3, use S3 inventory reports. It's something you can turn on in the S3 console. It's easy to do. You've got to pay. >> It's the theme. It's the theme. >> But it is the most efficient, effective way to get a large-scale bucket listing without doing the work yourself. It is delayed; the most frequent you can do is daily, and they drop on a schedule. For this kind of process, where you want to do a big bulk transfer, daily is fine. And particularly if the system that you're migrating is doing dual writes, writing to both the old source and your new destination, you know that you're already in sync going forward; there aren't going to be missed writes. So the thing you do here is turn on dual writes to both places, and then you take the inventory from the day before you turned on dual writes. You know all new objects are being written to both places, so the old inventory is sufficient to know that a bulk copy of it will bring you into accord; everything's going to be the same.
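That dual-write invariant can be sketched with in-memory hashes standing in for the old and new stores (class and method names here are illustrative, not from the real system): once every `put` goes to both stores, `pending_copies` is exactly the pre-cutover backlog the bulk copy has to move.

```ruby
# During the migration window, every new object is written to both the
# old source and the new destination, so only objects from before the
# dual-write cutover still need to be bulk-copied.
class DualWriter
  def initialize(source, destination)
    @source = source           # stand-in for the existing bucket
    @destination = destination # stand-in for the new bucket
  end

  # Application write path while migrating: write to both places.
  def put(key, data)
    @source[key] = data
    @destination[key] = data
  end

  # Keys present in the source but missing or different in the
  # destination: the backlog the bulk copy must reconcile.
  def pending_copies
    @source.reject { |key, data| @destination[key] == data }.keys
  end
end
```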
So you start from that snapshot of the bucket. We used S3 inventory reports, but we didn't have that turned on everywhere, so we looked for another tool that could do something like this, and there are a bunch that take a similar approach. If you try to list an S3 bucket naively, it will take literally days, because you need to sequentially list keys, a thousand or so at a time. Order-of-magnitude wise, it's ages. But there are some very clever ways of doing this in parallel, by estimating what the prefixes of a bucket are: you can ask S3 for bucket listings starting from a certain prefix, so if you know your key distribution, then rather than doing a single sequential listing, you can do thousands of parallel listings, one for every prefix you've got. You can turn a multi-day bucket listing into something that takes like 30 minutes. I had that in my back pocket in case we needed live listings, if we discovered we were going to be in a situation where we needed downtime or weren't able to do dual writes in a system. You'd need downtime to stop writes to the old system while also not writing to the new system, so you wouldn't get out of sync. It was also for listing the objects in the destination buckets, because Pure Storage's FlashBlade product does not have an equivalent to inventory reports, so you've got to do the listing yourself. On that other side, you want to take the inventory report from S3, develop your own report of the destination, compare them, and account for any discrepancies. So you need a fast way to do that. There's a tool called s3-fast-list that's on our GitHub.
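The parallel-listing trick that family of tools uses can be sketched like this; `lister` is a stand-in for a paginated list-objects-from-prefix call, and the prefixes would come from knowing your key distribution:

```ruby
# Instead of one slow sequential listing, issue one listing per known
# prefix concurrently and merge the results.
def parallel_list(prefixes, lister)
  prefixes.map { |prefix|
    Thread.new { lister.call(prefix) } # one listing thread per prefix
  }.flat_map(&:value)                  # join and collect all keys
end
```

Against a real store each `lister.call` would itself page through results; the speedup comes purely from running the per-prefix listings side by side.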
We forked s3-fast-list from aws-samples and adapted it to support non-S3 storage systems, so we could use it to list FlashBlade buckets. Works great. Very clever approach; really pleased to find it. We didn't end up needing it that much, but it was a wonderful diversion and felt like an insurance policy, a special-built tool in the toolbox. The next thing was: how do we split these things up? The S3 inventory report comes in either CSV format or Parquet format. Parquet being like flooring, it's kind of split up (actually, I don't know how far the metaphor goes), but it's a very efficient format for columnar data storage. It's great for analytical processing, where you know which columns you want to work with and you need to do some transformations on them. And it's particularly nice for something like this because there's tooling that can ingest it, stream it from a remote destination, operate on it, and then emit it again. It took us a while to discover this, because I was looking initially for something like Polars. There are different tools that can ingest Parquet and operate on it, split it up, use a windowing function. We wanted to split this up not just by number of objects but by total batch size, so that we would evenly distribute batches across machines and wouldn't end up with uneven bandwidth demands. You don't want one machine working on a batch of a bunch of small objects that can't fill the pipe. What you want is an even distribution size-wise, so that you're maxing out the pipe on each of the worker machines. To do that, you need a windowing function that goes through the inventory report and does a cumulative sum on the byte size of the objects.
Each time it reaches 10 gigabytes, it says: yup, I'm going to do a split right there and turn that into a chunk. And this turned out to be hard. It used a lot of memory, and it ended up maxing out the memory on the machine I was using. I thought, well, this might be feasible, might not be. And then I discovered DuckDB. Freaking awesome. DuckDB is amazing. I cannot sing its praises enough. It's like somebody discovering SQLite for the first time, except DuckDB is like SQLite on next-generation steroids, because it can even do the stuff SQLite does, maybe better. I'm just glowing with its capabilities. It can do SQL. It can connect to remote databases. It can work with a local database in-process, just like SQLite. It can work with CSV files and Parquet files on the local file system. Most big-data, data-science stuff, DuckDB can do locally on a single machine, super efficiently. It's really smart about spreading out I/Os, and it tries to avoid things like bringing everything into memory. I'll sing its praises a little bit more: not only can it connect to everything, like a Swiss Army knife of data analysis, it can also connect to remote URLs and to S3. I had this whole system built for ingesting data from S3, downloading it, staging it on the local file system, doing the splitting myself using my own tooling, and then storing those split files in another bucket as a staging area for jobs that would then be dispatched to work each of those chunks. Turns out I could skip all of that. With one DuckDB invocation, I can point it at a glob that references multiple S3 files, because the inventory report is split up into hundreds of files.
It can reference all those files, stream them all in, partition them the way I like, and then write them to a remote S3-compatible file store. All streaming: there's no local file system, there's no other code I need to write. You've just got to configure it properly; you've got to know what you're doing. But when you get it working, it's like, oh yes, this is sweet. You're working with a remote thing, streaming it all through, not using a bunch of memory, and writing it out to remote storage. And then I was able to take the files that had been written and wrap Active Storage records around them. I make Active Storage records that point to where those batch files have been stored, and then I distribute those records out to the jobs to work. >> That's a benefit of being in a Rails app, right? >> That is a benefit of a Rails app, yes. I had that abstraction to work with. And thankfully Active Storage was able to accommodate this, where I was kind of going behind its back, because I was using DuckDB to write the files rather than Active Storage. And I said, "Haha, Active Storage, I've got these files. Can you make use of them?" And it's like, "Yeah, of course I can." You just feed it the key of where it's stored and Active Storage will happily work with it. >> That is so cool. >> So, in any case, DuckDB: amazing. [laughter] We were able to partition the problem and eliminate a whole step of what would otherwise need to be custom code. >> It just goes to show you that nothing is new. How did the DuckDB folks know, oh, you know what process would be really nice? Take this, partition it, and put it this way. That's so cool. I love that.
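The size-aware splitting described above (cut a new batch each time the cumulative byte count crosses a threshold) was a DuckDB window query in production; this is the same idea in plain Ruby, with a tiny threshold for illustration and a hypothetical function name:

```ruby
# Size-aware batching: walk the inventory accumulating byte sizes and
# cut a new batch whenever the running total crosses the threshold, so
# every worker moves a roughly equal amount of data (not equal counts).
# Entries are [key, byte_size] pairs.
def split_by_size(entries, max_batch_bytes)
  batches, current, current_bytes = [], [], 0
  entries.each do |key, size|
    current << key
    current_bytes += size
    if current_bytes >= max_batch_bytes
      batches << current
      current, current_bytes = [], 0
    end
  end
  batches << current unless current.empty?
  batches
end
```

Even batches by bytes rather than by object count is what keeps each worker's pipe full, as the transcript explains.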
>> Yeah, the thing in common is people's resource constraints. Everybody's constrained in similar ways; they all have different problems, but they're all stuck in similar ways. And here somebody comes along and solves it elegantly, and does it with open source. >> Very cool. And this process had to happen once a day, right? Because of the limitations of the inventory. Am I right? >> I did end up automating it, of course. It can happen at most once a day, so I made a scheduled scan for new inventory reports, and I automatically process them so they'd be ready if I chose to use them. And I only did this because it turned out the partitioning was so easy and cheap. I had anticipated it being a pretty slow process, so I wouldn't have wanted to just fire it off all the time; I'd want to choose which specific inventory report I used, and it would take hours or who knows how long to do the partitioning. But now it's a matter of minutes, so I'm like, well, I'm just going to do it, and make it easy to choose which inventory report I want to use as a copy source. From the app dashboard, I could see, for every app, every AWS account, and every bucket, which things were transferred using which source manifest. I've got a bunch of inventory reports; for each one I can see that I've kicked off a copy, and I have state tracking for every part of the process, from partitioning to copying to errors to reconciliation. >> Wow. As a non-technical person here, I do have a question, because all of this sounds very hard and scary. I'm curious, Jeremy, what was the most nerve-wracking part of this process? >> Deletion. Yeah, the final deletion. I mean, it's like anything that's high stakes: your brain's got a lot of things going on. My brain's got a lot of things going on.
I've got feelings, sensations, some kind of cognitive whatnot that's blinking on and off sometimes. Some parts are telling me that I know things are fine, but anxiety is telling me, "Yeah, maybe you should discover why they're not." And those two things need to work together. I can use my anxiety as a guide that maybe I haven't figured everything out, and then use my cognitive process: here are the things I've worked out, I've ruled all these things out, and I have some standards of proof. I can demonstrate conclusively, in a way that is not dependent on my anxiety and can be externally verified, that it worked and that I'm done. Nonetheless, deletion is still dicey. But when you do press delete, then, I mean... >> Sky, take the wheel. >> Yeah. It's all happening now. There's no going back; we're now doing deletions. The biggest unlock feeling was adding a kind of belt-and-suspenders step. You know, keeping your pants up: you want more than one way. When you're verifying and reconciling, you want to be a little bit more than sure. And thanks to Pure FlashBlade's extraordinary metadata read/write rates, we can do hundreds of thousands of metadata operations per second without breaking a sweat. That made it easy, rather than doing reconciliation against inventory reports, to do reconciliation with a live sync. With an inventory report, you've got lag time between the listing and the live state of things, whereas with a live synchronization, you can see exactly how many objects needed to be copied, and their sizes. So you could essentially do repeat copies fairly cheaply until you can see that everything is done.
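That repeat-sync loop converges when a pass finds nothing left to copy. The comparison itself amounts to diffing the source inventory against a fresh destination listing; a minimal sketch, with listings as key => byte_size hashes and an illustrative function name:

```ruby
# Compare a source inventory with a destination listing and report what
# still differs; an empty diff is the "everything was up to date" proof.
def reconcile(source_listing, destination_listing)
  missing = source_listing.keys - destination_listing.keys
  mismatched = source_listing.select { |key, size|
    destination_listing.key?(key) && destination_listing[key] != size
  }.keys
  { missing: missing, mismatched: mismatched,
    in_sync: missing.empty? && mismatched.empty? }
end
```

A real pass would compare more than byte sizes (checksums, at minimum), but the shape of the proof is the same: run it until `in_sync` holds.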
And it really helped, just psychologically and as a matter of certifiable proof, to get the final copy that says nothing needed to be copied, everything was up to date. That typically came after another key step: when you're doing dual writes to the previous source and the new destination, you turn off dual writes and go to single writes, just to your new storage destination. At that point you know only your new one is current, and if you did need to roll back, well, now you're out of sync; now you'd need to copy the new stuff back to the old place. So that happens after you get your green light that everything was cool and nothing new needed to be copied. Turn off the dual writes, and a final sync can double-verify that nothing is changing and nothing is accidentally writing. I did a bunch of other things, more than just belt and suspenders, whatever other contraptions you can imagine for keeping your pants up. One of them was just changing the permissions on the S3 buckets. Some of these were old systems that had multiple things writing to them, and you kind of know what they are, but do you really know? One way to be sure is to just turn off writes: if something was still writing, it would error. That's the final layer of assurance that when I go to delete this or turn things off, I don't have some straggler that's going to surprise me, something that was only writing some hours of the day or was on a cron job, so I wouldn't have caught it in the initial sync. >> My hands are sweating and I had nothing to do with this. Oh my god. Okay, the million-dollar question: was there any downtime? >> There was no downtime. Everything worked, and it was really quite wonderful. There were some things we broke, though. >> So I suppose we don't need to talk about that. Okay.
[laughter] So, the 100-gigabit link was actually a shared link, with some other things using a portion of the bandwidth. When we pushed over about 80 gigabits, we started impinging on some other stuff that needed to not be impinged on. So we did cause some errors elsewhere, but the copy was fine. [laughter] >> The suspenders worked, right? >> Yes. Yes. [laughter] >> Jeremy, question for you. Now that this is done, it's tied up with a bow. Looking back, are there things where you're like, "Oh, I wish I had done this differently"? You can say no. >> No. >> Okay. >> No. Yeah, I'm pretty happy with how things worked out. I appreciated the incremental approach of starting simple and focusing on what I thought was the epicenter of the problem, because I didn't build too much as I discovered that the true epicenter was in verification, reconciliation, and inventory management. It was about: how do I track what's going on? The copy itself was fairly simple. I did rabbit-hole a couple of times. I built out a kind of live-view system that I didn't actually end up using much, but at key times I did. So, with a bit of hindsight bias, I didn't need to do all that, but I kind of did need to do it to discover that I didn't need it. It's a little circular, but it allowed me to diagnose and troubleshoot blockers that would have been really hard to work out otherwise, because these things turn into little Heisenbugs: if I run this myself, outside of supervision, it works, but inside my supervision framework something breaks. It's stuff like: I'm opening a pipe and feeding things to standard input, and I've got another pipe that's reading, and a pipe can get wedged because somebody hasn't read from it frequently enough, and it can manifest as some other kind of error.
Anyway, having that kind of visibility was really helpful, but looking back, I could probably delete it from the app we built. >> My question would be: I mean, you are an eminence within the Ruby on Rails community. Let's assume I'm just an average Rails developer. What is the complexity of the project I can take on, moving out of S3 to our own hardware? >> You could take on this whole project. The magic of this is that a lot of the feeling of criticality is business criticality; it's not technical difficulty. It's a modeling problem. There are some tricky things with process supervision that Ruby doesn't make super easy, but it's not bad, and there are plenty of worked examples you could start with. Otherwise, the degree to which vanilla Rails just works is quite gratifying. Kamal as well. In fact, leading into this, part of figuring out our upper bounds was doing S3 load testing against our FlashBlade. We got this new S3 service; what can it actually do? Well, there are load-testing tools out there that can do that, and I used Kamal to deploy one of those tools out to a bunch of nodes and hammer it as much as I could. Worked great. And I used Kamal to deploy our copying application, called Nostos. That worked fantastically too. I was able to use Kamal accessories to stand up the database that did all the state tracking, do OpenTelemetry observability, do logging. It was all just a single system, a single developer pushing to some VMs somewhere. So it's all bog-standard stuff, but it's being used in the employment of a critical operation. Again, the criticality is all in our heads; the actual app is fairly simple. >> You mentioned DuckDB as being, wow, an incredible tool in your arsenal. >> I cannot sing its praises enough.
>> Are there any other tools that were completely necessary in this process? >> Yeah. I've sung Rails's praises a little bit: vanilla Rails turns out to be the way to go, and I don't mind tooting that horn. This other tool, s3-fast-list, was a pleasant discovery, a diamond in the rough. And rclone itself, which we used to do the heavy-lifting copies, was incredible. It did all the stuff we needed. It's open source, and it was easy to contribute to. In fact, as part of this, we added an official FlashBlade destination to rclone. So when you go do an rclone of your own, it's got this nice interactive thing where it asks where you're coming from and where you're going to, and now Pure Storage FlashBlade is one of the places you can be going to. What that is, essentially, is a list of characteristics of the system, so that rclone knows how best to do its transfers. There are certain quirks and idiosyncrasies with different S3-compatible file stores, and Pure Storage has some of them, and now it's all set up out of the box, where you don't need to go figure out the command-line flags yourself. >> And that change was upstreamed. >> Yes, that's part of rclone now. The other key thing is that rclone is bandwidth- and metadata-operation efficient, and it's oriented around resilience first. You can operate it in a bunch of different ways. You can back off on the kinds of checks it does, but it does nice things like pull the checksum from the source and check it against the destination. And it can even do an extreme check. Normally you would write the file to the destination, get a checksum back, compare that the checksum it says was written matches what you had, and say, okay, cool, looks good.
>> But you can also do it in really-be-careful mode, which is: write to the destination, get the checksum, then download the object back from the destination and check the actual checksum of the bits. [laughter] So it's got you covered for every degree of risk mitigation you want to have at play. If you don't trust your destination yet (funny things can happen, like a bit gets flipped on a hard drive, or a gamma ray hits something, so it says you got the checksum you expect but on disk there was a modification), then depending on the level of criticality of your data, and how many nines you're going for after that decimal point, rclone's got your back. >> I'm just going to sing its praises a little more. Sorry. >> Yes, go for it. >> So there are certain things where you can be very efficient with rclone, where you can skip operations that don't matter. Say I don't want to do a bunch of metadata operations against S3, like checking last-modified times, things that are not present in the S3 file listing or an inventory report and that would normally need a HEAD request against S3, which you've got to pay for. I don't want to do that, and I don't want to bottleneck on making those calls to S3, because that just adds to the rate-limit bucket. And on the flip side, if you don't want to do excess metadata operations on the destination, you can tune rclone to your heart's content.
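That really-be-careful mode reduces to: write, read the bytes back from the destination, and checksum what actually landed, rather than trusting the checksum the store reports. A sketch against an in-memory store; MD5 stands in for whichever digest the backend uses, and the function name is illustrative:

```ruby
require "digest"

# Write an object, then re-download it and verify the digest of the
# bytes that actually landed, not just the checksum the store reported.
def write_and_verify(store, key, data)
  store[key] = data
  readback = store[key] # in real life: a fresh GET from the destination
  Digest::MD5.hexdigest(readback) == Digest::MD5.hexdigest(data)
end
```

The cost, as the transcript notes, is that every verification read competes with copy writes for bandwidth, which is why this mode was reserved for spot checks.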
>> I was going to ask: did we at any point consider doing random statistical analysis, with a full give-me-the-bytes-back check, doing the checksum there? >> We did. >> Okay. >> We did some of the copies with the full pull, and we didn't run into any issues. We just did a grab-bag sample; I manually ran some of the batches that way, just to satisfy myself. >> Yeah, because you could do it for everything and be, quote-unquote, completely sure, but it would be insanely expensive and insanely time-consuming. That makes sense. >> Yeah, and it would suck up a lot of our bandwidth. All the bandwidth we want to be using exclusively for writes; we don't want to be eating it up on verification reads, whereas metadata operations don't use much bandwidth. They just eat up CPU time on the FlashBlades. We were able to calculate all that out too: depending on the number of compute cores in the FlashBlade cluster, how many metadata operations can it possibly do concurrently. Essentially we wouldn't flood it; we would edge it just a little over what it ought to be able to do, to keep it fully utilized. >> That makes sense. I want to go back to the open-source thing. I feel it's such a great part of 37signals that we're contributing to open source, like the patch that went out. But I'm curious if we are going to open source this tool that you built. >> I did build it with a mind toward open sourcing, and part of that was almost a design discipline: I'm not going to make something too bespoke. There were some decision points along the way. At one point, it would have been easy for us to do an NFS mount, for example, and use an NFS mount to share files, but it's not terribly different to use an object store and Active Storage.
So in a case like that, I erred on the side of vanilla Rails, of using object storage rather than developing a different kind of file-storage back end. Now, if I were just doing bash scripts, I would probably do an NFS mount, but as soon as I moved into Rails, it's like: let's just do it all the Rails way. And once you're doing things the Rails way, it becomes almost hard to do it in a 37signals-specific way; the things you'd hardcode in, it's easy not to. Things like credentials go into a separate credentials area, and things like AWS accounts and credentials are modeled in the database. I used Active Record encryption to store credentials in the database, so they're not part of the repo; that's part of your onboarding process. You start with a blank app, you add AWS accounts, it takes your credentials and goes and scans for all the buckets you have and imports them into the app, on a regular basis, and then you can start copies from that. And you set up destinations the same way. So you can support different kinds of sources and different kinds of destinations: a perfect setup for being a general-purpose tool. As we got closer to the end, I did start specializing some of my design decisions based on the phase of the copying we were in. I started tuning the dashboard to reflect what I needed to know, and that sacrificed a little bit of the things that are important earlier on. It got to, okay, I just need this for me; I'm going to totally revamp things right here. So now the UI is a little bit narrower and purpose-built for my needs, and not everybody else could necessarily understand it. But in any case, yeah, it's very open-sourcable. We'd like to do it. I would like to do it.
The investment in doing it is that you've got to tease apart some things, so one possibility is sharing it as an artifact: here's what this looked like, frozen in time, at the end, and if you wish to take it and adapt it and turn it into something, go ahead. Because this is certainly not our line of business; we're not going to make a product out of doing this. >> That's right. >> So I'm not going to spend a six-week cycle of work polishing up something that we then end up being open-source maintainers for. It's like, no. I'll share it, but I'm not going to maintain it. >> That's awesome. And what do you feel are the next steps, now that the full transfer is done and you've deleted everything? Where do we go from here? What about backups? >> Yeah, so it's all the other stuff. This was all the programming side: I drove the copies and the transfers from the application and the software. Our awesome ops team, and Nat in particular, did all the operations side of how we stand up these storage systems on the back end and keep them well fed. And now that we've got all the bits on a disk somewhere, how do we make sure we don't lose them? It's all the standard data-reliability stuff. So we run through the cases of what can happen, what are the risks we need to mitigate: things like data loss on a drive, losing a system, losing a power supply. Each of these has its own kind of redundancy, and that's all on the system side. Then there are other kinds of redundancy concerns, like: what happens when a truck backs into the power transformers at a data center and takes the whole thing down? Well, we're not going to lose our data, but it's going to be unavailable. So then we've got the availability problem.
Well, we've got a second site which is our backup of the first. It's kept in sync live, and it lags in the sense that we don't write to it directly, so we're insulated from things like software bugs in our applications. If we accidentally delete something, like, oh crap, we wrote a bug that deleted some stuff, you need a backup. It's not sufficient just to have high durability; you also need to be insulated from other kinds of mistakes. And our second site is essentially that: we've got another storage system of similar size, similar class, similar rate, and we replicate from the first to the second. So if there is an issue, availability-wise or durability-wise, we've got a place to go, and we'll just flip the applications over to use the second one. Now, that's not the whole story for backups. There's the age-old 3-2-1 rule, where you've got three copies of your data, on two different kinds of media, with one offsite. It's a little bit rusty around the corners in the age of cloud storage, and particularly with modern flash, because the old-school picture was typically hard drives and hard drive failures, and your other media would often be tape. These days that picture looks a little bit different. What is different media, really? Is that other kinds of flash? Is all flash one kind of media? You need sufficiently different characteristics such that if something catastrophic were to happen to one mode of storage, your other mode would not be affected by the same risk factor. So we're looking at doing a third site with hard drive storage, or a different kind of flash, acting as our insurance policy. And the other factor that we're pulling in here is that our two sites are using the same vendor. We've been very happy with Pure's FlashBlade product, and we had similar resource constraints in power, cooling, and rack space in both data centers.
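Flipping applications over to the second site is only safe if the replica is verifiably complete. A hypothetical sketch of the kind of reconciliation pass that confirms this, operating on two listings represented as `key => etag` hashes (in practice built from `ListObjectsV2` pages or `rclone lsjson` output):

```ruby
# Compare a primary listing against a replica listing and report every way
# the replica could be wrong: objects it lacks, objects whose content hash
# differs, and objects it has that the primary does not.
def reconcile(primary, replica)
  {
    missing:    primary.keys - replica.keys,
    mismatched: primary.select { |k, etag| replica.key?(k) && replica[k] != etag }.keys,
    extra:      replica.keys - primary.keys
  }
end

report = reconcile(
  { "avatars/1.png" => "d41d8cd9", "docs/plan.pdf" => "9e107d9d" },
  { "avatars/1.png" => "d41d8cd9", "docs/plan.pdf" => "deadbeef" }
)
# report[:mismatched] => ["docs/plan.pdf"]
```

One real-world wrinkle: S3 ETags for multipart uploads are not plain MD5 digests, so cross-vendor comparisons often have to fall back to object size or a separately computed checksum.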
So using Pure for both made sense, and we were able to smush them together into one contract, just a sensible step. But that's a single vendor, and we run the same operating system. We do stagger upgrades, but if there's a bug in the system and we deploy it to both systems, well, we essentially have a single point of failure. We don't have two systems anymore; we've got one operating system and the same bug affecting both. So what we'd like is a third system with a different vendor. We're looking at using MinIO, the open-core, S3-compatible storage system. They've got a dual license where you can use MinIO free, or you can get a support plan with them, and it's what we had been considering using before we discovered Pure. So we'll be going back a little bit and having not just a second vendor but also kind of an open-source fallback. That's our insurance policy against things like storage vendors being acquired. That's often what has pushed us off of other vendors in the past: you got bought by some big company, and now what do we do?
>> Wow, that is a lot of things to consider.
>> I will say, the other storage medium that was kind of a romantic diversion for quite some time was using tape, for real. Back in the day I operated tape libraries, and there's something that's just kind of retro-future-tech satisfying about seeing a literal robot go grab a tape from a library and put it in a little drive, like, I'm going to copy my stuff onto there. And it's super durable. You don't need power for tape; it's just sitting there. And their lifetime is like 30 years. You can go put them in a salt mine somewhere and you're good. And you could pack up your tapes and carry them in luggage if you wanted to. There's just a lot of aesthetically pleasing characteristics about it.
So we really tried to make this work, because you could get a whole multi-petabyte storage system going for less than 100 grand, which, as far as capital outlay goes, is fairly cheap per terabyte. But the devil's in the details. These tape systems are built for older-school systems that have a directory, a file system you can crawl through to see what needs to be backed up. Object storage is a little bit different. To figure out what needs to be backed up with object storage, you need some kind of gateway which pulls new objects into a scratch space, then backs those up to tape and keeps a catalog. There's just a whole finicky additional system you need. And then you'd have the trouble of how you actually restore from tape. Well, the only case where it would matter is a truly end-of-the-company kind of scenario, where some kind of nuke has hit and you want to recover things, and it's going to take weeks to recover from tape. So yeah, it's truly an insurance policy at that point. It would be an aesthetically pleasing insurance policy, and one that's fun to consider, but it's not actually going to work. Dang it.
>> Jeremy, thank you for joining us. This has been Recordables, a production of 37signals. To learn more from our technical team, check out the developer blog at dev.37signals.com.
Video description
In this episode of RECORDABLES, we talk through the final and most nerve-racking part of our cloud exit: moving massive amounts of data out of Amazon S3. Principal Programmer Jeremy Daer shares how we moved billions of files with no downtime. He covers everything from dealing with bandwidth limits and AWS constraints to building custom tooling when off-the-shelf options won't work. The conversation gets into the human side of a project like this, including verification, anxiety, and the moment you finally hit delete. You'll also hear how long it actually takes to move that much data and the tools we used to make it happen seamlessly.

*Timestamps*
00:00:00 – Introduction
00:02:05 – Why S3 was the last (and scariest) piece
00:08:34 – The volume of data to move
00:11:11 – Bandwidth limits and AWS constraints
00:13:12 – The custom-built Rails tool for copying and reconciliation
00:21:25 – The logistics of hard drives, write speeds, and network connections
00:28:05 – The intentional order of moving data
00:49:55 – Anxiety, verification, and the fear of deleting data you can't get back
00:54:13 – Was there any downtime?
00:58:56 – Essential tools that made the migration possible
01:07:03 – What happens next

*Links*
Rclone – https://rclone.org/
DuckDB – https://duckdb.org/
S3 Fast List – https://github.com/aws-samples/s3-fast-list

For the full episode transcript, visit https://dev.37signals.com/
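The tools linked above can be combined for exactly the kind of bucket-to-bucket transfer discussed in the episode. A hypothetical rclone sketch, assuming remotes named `s3-src` and `pure-dst` have already been set up with `rclone config`; the remote names, bucket name, and tuning values are illustrative, not the team's actual settings:

```shell
# Copy one bucket between two S3-compatible stores with high parallelism.
# --checksum compares hashes rather than size/modtime; --fast-list trades
# memory for far fewer listing calls on buckets with many objects.
rclone copy s3-src:my-bucket pure-dst:my-bucket \
  --transfers 64 --checkers 128 --checksum --fast-list --progress

# Reconcile afterwards: rclone check reports any objects that are missing
# on either side or whose contents differ.
rclone check s3-src:my-bucket pure-dst:my-bucket --fast-list
```

At the multi-petabyte, billions-of-objects scale described in the episode, plain listings become a bottleneck in their own right, which is where parallel listers like s3-fast-list and analytical tools like DuckDB come in for the reconciliation step.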