Analysis Summary
Performed authenticity
The deliberate construction of "realness" — confessional tone, casual filming, strategic vulnerability — designed to lower your guard. When someone appears unpolished and honest, you evaluate their claims less critically. The spontaneity is rehearsed.
Goffman's dramaturgy (1959); Audrezet et al. (2020) on performed authenticity
Worth Noting
Positive elements
- This video provides a clear, high-level conceptual introduction to the DBSP formalism and the benefits of incremental computation for database performance.
Be Aware
Cautionary elements
- The presentation uses a highly dramatized 'failure' narrative of traditional architectures to bypass a critical comparison with existing industry-standard optimization techniques.
Influence Dimensions
About this analysis
Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.
This analysis is a tool for your own thinking — what you do with it is up to you.
Transcript
I'm going to talk about incremental computation and databases. I'm going to talk about three things: a short story about the world today, then a quick trip to the past, and then we're going to go back to the future and see if there's anything we might want to do differently.

First of all, a quick poll of the audience: who here loves Clojure? That's a little bit of a rhetorical one. Who here loves their database? Some, yes, but significantly fewer people love their database than their programming language. Ever wonder why that is? Let's start with a short story and see if we can get some insights. The story is called "the world today." It's a fictional story, and any likeness to actual events is purely accidental.

It's a story about a company that is building their first product. We're building a product using a database and a web server, and the database in this case is Postgres. We've just gotten our first users and we're really excited. The best part is that the users really love our product. They're using it all the time, calling our API hundreds of times a second, sometimes thousands of times a second. They send a request, we take some data, we write it to the database, and we return a response. Pretty straightforward. Soon we get more users, we're processing more requests, we're making revenue, and we're officially in business.

Postgres works great up to the point where it doesn't. Maybe somebody issued a heavy analytics query at 5:00 p.m., querying a whole year's worth of data. We actually don't know the root cause of the problem, but we do know that our database is down, our API is down, and things are clearly not working. We might try a number of reasonable options, but we finally decide to do the right thing and we add a data warehouse.
Of course, we add a queue to copy the data over, and we hire some engineers to manage the system, and things are going well at this point. So we hire more engineers, and then we add another warehouse, Databricks, because our other engineers like it and it's good at AI, and then we add another queue, of course, to copy the data over there. At some point somebody makes a critical decision to take some request data, user data, and write it directly to Databricks, to the warehouse. What the rationale for that was, we don't know, but it happened. The data is only there. Some time later we realize we really need to build a user feature out of that data. So how do we do it? Data warehouses are really great at responding to requests, but they might take 10, 20, 30, 60 seconds to respond. Our API needs to respond in 60 milliseconds, not in 60 seconds. So how do we solve that problem? Of course, we add another service, a reverse ETL service, and we copy the data back to Postgres so we can actually do some data transformation and finally answer the questions we actually have. Unfortunately, this is where most companies end up: high complexity, low momentum, and actually out of money. So it's game over. Can we do better?

Before we jump to solutions, let's take a quick trip to the past and see how databases actually answer questions. How does a database answer a query? It starts with a declarative program in a language, which might be SQL or Datalog. The database takes that declarative program and compiles it into runnable code, and the result of that compilation is effectively a pure function. Then we take our database, we pass it to the query function, and we get a query result. Here's a question: how long does it take for that query to run and produce a result? 50 milliseconds, 50 seconds, 50 minutes, or more?
Of course, it's a bit of a stretch question, because I haven't given you any information that's relevant to even begin to answer it. But let's assume it's 50 seconds. Let's do this again. Here's our database: almost the same, except for the new piece of data at the bottom. So how long does it take for the query to run? Now, this one is a bit sharper. We kind of know from experience it should take 50 seconds again, right? So if you compare the two query runs, we have the same query, almost exactly the same database, and the same execution time. Is there any opportunity for improvement? Are we just going to redo the same computation over and over on almost the same data sets every time?

The truth is, most databases have not really changed since they were first invented. Yes, there have been some changes, column storage being a notable example, and yes, Datomic is absolutely great, but for the most part, at least for the popular databases, only the packaging and the marketing are different. Crucially, and this is a very crucial point, in terms of query execution they all start from scratch every time. So, looking ahead to the future, can we do any better? I believe the answer is yes. Using incremental computation, we can actually stop recomputing query results from scratch. What incremental computation can enable us to do is get rid of extra services, data warehouses, and random made-up acronyms, and we can actually get rid of those guys as well. All we really need is our original database with an incremental query engine next to it, and possibly a queue. In the future, we definitely don't need data warehouses. What we need is incremental computation. And the good news is that incremental computation is fully general. It can work with Datomic, Postgres, and MySQL, even graph databases and warehouses. It's really not coupled to a specific data system or paradigm.
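[The contrast the speaker draws, from-scratch execution whose cost scales with total data size versus incremental execution whose cost scales with the diff, can be sketched with a running count per key. This is a minimal Python illustration of the general idea only; the function names and data are made up and are not the speaker's Clojure API.]

```python
from collections import Counter

# From-scratch execution: rescan the whole table on every query run.
# Cost is proportional to the total number of rows.
def count_by_user_full(rows):
    return Counter(user for user, _event in rows)

# Incremental execution: keep the previous result and fold in only the
# new rows (the diff). Cost is proportional to the size of the change.
def count_by_user_incremental(prev_counts, new_rows):
    out = Counter(prev_counts)
    for user, _event in new_rows:
        out[user] += 1
    return out

rows = [("alice", "login"), ("bob", "click"), ("alice", "click")]
counts = count_by_user_full(rows)                  # scans 3 rows
diff = [("alice", "logout")]
counts = count_by_user_incremental(counts, diff)   # touches 1 row
```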
Pretty much this is the only requirement: as long as a data system can provide a total order of diffs, transactions, or whatever the system calls it, it will most certainly work. That said, Datomic, in my experience, is by far the best fit I've seen for incremental computation. Immutability and the fact that it exposes a transaction log as a first-class feature really help.

In conclusion: with traditional query execution there is no reuse, no efficiency between runs, and query execution time scales with the total data size, which is really not ideal. Incremental computation, by contrast, enables efficient reuse between query runs, so we don't do the same work over and over again. And most importantly, the query execution time scales with the diff between updates. As long as your data changes a little at a time, and in the vast majority of use cases I've seen that is definitely the case, you can achieve single-digit millisecond response times reliably, all the time.

For the past year, I've been working on an incremental query engine for Datomic. It's under active development, it's open source on GitHub, and we're pushing updates there every week. One last bit: I want to leave you with a data structure called Z-sets. They're an awesome fit with transducers. Transducers and Z-sets are the workhorses that make correct incremental query computation possible. Please come talk to me afterwards if you want to see how they work, but they're truly, truly awesome. And since this is a Clojure conference, you know I can't say that lightly. Thank you so much. [applause]
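[The Z-sets the speaker mentions are, in the DBSP formalism, collections whose elements carry integer weights: a positive weight is an insertion count and a negative weight a retraction, so a diff is itself a Z-set and applying it is just addition. A minimal Python sketch of that one idea, using plain dicts; this is an illustration of the concept, not the talk's Clojure library.]

```python
# A Z-set as a dict from row to integer weight.
# Positive weight = insertions, negative weight = retractions.
def zset_add(a, b):
    """Combine two Z-sets; entries whose weights cancel to 0 are dropped."""
    out = dict(a)
    for row, w in b.items():
        nw = out.get(row, 0) + w
        if nw == 0:
            out.pop(row, None)
        else:
            out[row] = nw
    return out

# Current state: two users present once each.
state = {("alice",): 1, ("bob",): 1}
# A diff from the transaction log: bob retracted, carol inserted.
diff = {("bob",): -1, ("carol",): 1}

state = zset_add(state, diff)
# state is now {("alice",): 1, ("carol",): 1}
```

Because diffs and states are the same kind of value, operators over Z-sets compose, which is what makes the incremental circuits correct.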
Video description
Most databases since the beginning of time have tried to answer queries in the same fashion: by starting from scratch every time we ask a question. More data means slower queries, especially those involving JOINs. So is this it? Are we done? Do we accept that more data requires complex pipelines and a warehouse? We argue that this is not the case. What if databases could maintain correct results while only processing deltas? DBSP is a simple formalism for doing exactly that. It also happens to be a great fit for Datomic! Now, complex queries over large datasets can be executed instantly, often on a single node. Our Clojure library implements DBSP circuits as transducers, targeting Datomic.

Biography

Rangel is the founder of Saberstack. Prior to Saberstack, he served as the CTO of companies across ad tech, gaming, and e-commerce. There, he lived and experienced first-hand the state of the database world while working on real-time systems for millions of users that served millions of requests per minute. Rangel has been solving problems with Clojure since 2013 and intends to continue doing so.

Recorded Nov 13, 2025 at Clojure/Conj 2025 in Charlotte, NC.