Designing Data-Intensive Applications CH7

Hello guys, today's article is going to be about chapter 7 from the book 'Designing Data-Intensive Applications' by the brilliant Martin Kleppmann. This chapter talks about transactions in a database, without further or do lets dive right in. The chapter starts off by listing a couple of things that can go wrong in a data system, such as

The database software may fail at any given time
The application itself may fail or crash at any given time
Interruptions in the network may cut off the application from the database
Several clients may right to the database at the same time, overwriting each other's changes
Race conditions between clients may cause unexpected bugs

And to achieve reliability, a system must deal with these kind of problems and make sure they don't cause failures to the whole system. Transactions are a way to minimize or combat these issues

What is a transaction?

It is a way for an application to group several reads and/or writes together in a single logical unit, and they are all executed as it's one operation. Either the transaction succeeds or it gets rolled back. And the application can safely retry the transaction without worrying about any partial failures. By using transactions, the application is free to ignore some potential error scenarios and concurrency issues, because the database takes care of them instead.

But transactions have their advantages as well as their limitations. In order to understand them clearly we'll go into them deeply.

The safety guarantees of transactions

They are often described by the well known acronym ACID (atomicity, consistency, isolation, durability), any system that doesn't implement ACID is known as BASE (basically available, soft state and eventual consistency) which means anything but ACID basically.

Atomicity refers to something that cannot be broken into several parts. It describes what happens if a client wants to make several writes but a fault occurs (e.g a crash, network interruption, full disk), if the writes are grouped together in an atomic transaction and the transaction cannot be committed due to a fault, then the transaction is aborted and the database must discard or undo any writes that were done so far in that transaction.

Consistency basically means that the application notion of the database being 'in a good state' , the idea behind consistency is that you have some specific statements about your data that must be true.(e.g credit and debit in an accounting system must always be balanced) so if a transaction is started with these statements, then they must be preserved once the transaction completes. But it's really the applications responsibility to define its transactions correctly so that they preserve consistency. That isn't something that a database can guarantee, if you write data that invalidates the statements the database can't stop you. In general the application defines what data is valid or invalid and the database only stores it. Atomicity, Isolation and durability are properties of the database, whereas consistency is a property of the application.

Isolation basically talks about the fact that most databases are accessed by lots of clients at the same time. If they are writing/reading several parts of the database that's no problem. But once they access the same parts you can run into concurrency issues (race conditions) Isolation means that concurrently executed transactions are isolated from each other, they cannot step on each others toes. In the classic database books they formalize isolation as serializability which basically means that each transaction thinks it's the only one running in the database at that specific time. This is known as a pessimistic approach which we'll get into later on. However the pessimistic approach is rarely used because it comes at a huge performance cost, most databases use a more optimistic approach and we'll get into that later on.

Durability is the promise that once a transaction has committed successfully, any data it has written will not be forgotten, even if there is a hardware fault or the database crashes. The data has been written to a storage such as a hard drive or a SSD. Also usually involves a write-ahead log that makes B-trees reliable by basically writing the writes that are sent to the database before actually saving them just in case a database crashes or any fault occurs. It allows recovery in the event that anything went corrupt.

To wrap up this part, in ACID, atomicity and isolation describe what a database should do if a client makes several writes within the same transaction. In atomicity you don't have to worry about partial failures as if the transaction fails half way, a whole rollback to the transaction will occur as if it never happened. And isolation guarantees that concurrently running transactions shouldn't interfere with each other. If one transaction makes several writes then any other concurrent transaction should see either all or none of those writes, not some subset.

Then we proceed to talk about 2 types of operations; Single-Object and Multi-Object operations. Multi-Object Operations concern modifying multiple rows or objects from a single table or multiple, these type of operations need a transaction so that everything is kept in sync and data integrity is ensured. Anything between BEGIN and COMMIT in a transaction is considered to be in the multi object operation.

Single-Object Operations need transactions as much as multi objects do, imagine writing a 20KB JSON document to a database

If a network interruption occurred during the first 10 KB, does the database store that 10KB or rollback?
If power fails while overriding the previous value, which value gets stored?
If another client reads the document whilst updating it, does he see a partial value?

Since these questions are confusing databases guarantee Isolation and Atomicity for a single object. Single Object operations are useful because they prevent lost updates (don't worry if you don't know what that means we'll get into it in Part II) when several clients write to the same object concurrently, however they are not transactions in the usual sense.

Error handling and aborting

One of the key features in transactions is that it can be aborted and safely retried if an error occurred, ACID databases are based on this philosophy. But errors do happen, and popular ORMs (Object Relational Mappping) like Rails' ActiveRecord doesn't retry aborted transactions.It usually bubbles up a raised execution through the stack. Although retrying the aborted transaction Is a simple error handling mechanism, it's not always efficient

If the transaction succeeded but the network failed when notifying the client, deduplication may occur on retry as the transaction will be made twice (unless you have a deduplication mechanism)
If the failure is due to overload then retrying the transaction will make things worse, so limiting the number of retries is the way to go
It's only worth retrying on a transient error ( deadlock, isolation violation or network error) but not permanent errors (constraint violation) as a retry would be pointless.
If the transaction has side effects outside of the database (e.g sending an email when the transaction commits) you wouldn't want to send an email every time we retry a transaction as these side effects may happen even on transaction abortion. Two-Phase Commit can help in such cases (I recommend watching Martin Kleppmann's video explaining Two-Phase Commit and how it works in distributed systems if you're interested)
If the client fails while retrying any data it's trying to write is lost.

This is an introduction on transactions and in the next chapter we'll dive deeper into Isolation and different isolation levels and various problems that can occur. See you in the next one!