If you are dealing with locks, concurrency control should be a familiar term.
In a large application accessed by thousands of users, concurrency is inevitable. Your application should be able to handle multiple requests simultaneously.
When you execute operations concurrently, the results can conflict. For example, if you are reading a row while someone else is writing to it simultaneously, you are bound to get inconsistent data.
If we executed these transactions sequentially, we wouldn't need concurrency control. But sequential execution limits how a system scales, so we cannot avoid concurrent transactions.
But when operations are executed concurrently, how do we make sure that the results are consistent?
There are various techniques for concurrency control. And one of them is using locks 🔐
So what are locks exactly? All you need to know is that locks are a mechanism to ensure data integrity. How these locks are implemented is beyond the scope of this blog.
So, how do they work? Locks work by letting only one transaction modify or access a certain row or table at a time. This makes it look as if the transactions were executed one after the other, thereby maintaining data integrity.
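To make this concrete, here is a minimal sketch of a row-level lock in Postgres using Python's psycopg2 driver. The `accounts` table, database name, and user are hypothetical stand-ins:

```python
import psycopg2

# Hypothetical table: accounts(id int primary key, balance numeric)
conn = psycopg2.connect("dbname=appdb user=app")

with conn:  # psycopg2 commits on success, rolls back on error
    with conn.cursor() as cur:
        # Lock the row: a concurrent transaction issuing the same
        # statement for id = 1 blocks here until we commit.
        cur.execute(
            "SELECT balance FROM accounts WHERE id = %s FOR UPDATE", (1,)
        )
        (balance,) = cur.fetchone()
        cur.execute(
            "UPDATE accounts SET balance = %s WHERE id = %s",
            (balance - 100, 1),
        )
# The commit releases the lock, and the next waiter proceeds.
```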
Different types of locks
There are two main types of locks: shared locks and exclusive locks.
Shared locks let you read the row or the table that is being locked; hence, a shared lock is also called a read lock. Multiple transactions (you can think of each transaction as a separate process) can acquire a shared lock on the same resource and read from it. No transaction is allowed to update the resource while a shared lock is held on it.
Exclusive locks lock the row or table entirely and let the transaction update it in isolation. Unlike shared locks, only one transaction can hold an exclusive lock on a resource at any point in time. While it holds the lock, other processes that want to acquire a lock on the same resource have to wait. Once the lock is released, the remaining processes can acquire it and make their modifications.
Note that multiple shared locks can be held on a resource at the same time. But if a resource already has a shared lock on it, another process cannot acquire an exclusive lock on it.
Similarly, a process cannot acquire a shared lock on a resource that is locked by an exclusive lock.
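In Postgres, for instance, these two behaviours are available at the row level as `SELECT ... FOR SHARE` (shared) and `SELECT ... FOR UPDATE` (exclusive). A sketch against the same hypothetical `accounts` table:

```python
import psycopg2

conn = psycopg2.connect("dbname=appdb user=app")
cur = conn.cursor()

# Shared (read) lock: other transactions can also take FOR SHARE
# locks on this row and read it, but none can update it yet.
cur.execute("SELECT balance FROM accounts WHERE id = %s FOR SHARE", (1,))

# Exclusive (write) lock: blocks other transactions' FOR SHARE and
# FOR UPDATE attempts on this row until we commit.
cur.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE", (2,))

conn.commit()  # releases both locks
```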
There are other types of locks that are variations of the above. For example, some databases offer lock variants that still allow a resource to be read while it is being updated.
Optimistic vs pessimistic locking
This is an interesting concept. Remember how I mentioned above that parallel processes wanting to run database transactions have to execute them sequentially in order to maintain data integrity?
But that can slow things down quite a lot, especially when the transactions involve multiple steps or updates.
So there are two ways of going about it.
One approach is to let the processes update the same records in parallel. When one process successfully commits its changes, the other process is told that there is a conflict and that it has to attempt its transaction again.
Another approach is to lock the row as soon as one process attempts to modify it (or delete it) and ask the other processes to wait before doing anything.
The first approach is named optimistic locking. It lets everyone have a chance at updating the records.
The second approach is called pessimistic locking, wherein the first process to attempt the transaction gets the chance to update the record.
The difference between these two methods is felt in large transactions. Remember that a transaction can have a single step where it updates a row, or multiple steps where it fetches a column from a row, queries another table, deletes a couple of rows, and so on. In a transaction with multiple steps, a pessimistic lock may not be a good idea: if the transaction holding the locks fails midway, the time it spent has gone to waste. In such scenarios, with optimistic locking, other processes might still be able to complete their transactions. A sketch of both styles follows.
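Here is a sketch of both styles, assuming a hypothetical `tasks` table that carries a `version` column for the optimistic case. Optimistic locking is commonly implemented by checking the version at write time and retrying on conflict; pessimistic locking takes the row lock up front:

```python
import psycopg2

# Hypothetical table: tasks(id int primary key, state text, version int)
conn = psycopg2.connect("dbname=appdb user=app")

def update_optimistic(task_id, new_state):
    """Optimistic: work without a lock, detect conflicts at write time."""
    with conn, conn.cursor() as cur:
        cur.execute("SELECT version FROM tasks WHERE id = %s", (task_id,))
        (version,) = cur.fetchone()
        # The WHERE clause only matches if nobody bumped the version
        # since we read it.
        cur.execute(
            "UPDATE tasks SET state = %s, version = version + 1 "
            "WHERE id = %s AND version = %s",
            (new_state, task_id, version),
        )
        if cur.rowcount == 0:
            raise RuntimeError("conflict: another process won, retry")

def update_pessimistic(task_id, new_state):
    """Pessimistic: take the row lock first; others wait until we commit."""
    with conn, conn.cursor() as cur:
        cur.execute("SELECT 1 FROM tasks WHERE id = %s FOR UPDATE", (task_id,))
        cur.execute(
            "UPDATE tasks SET state = %s WHERE id = %s", (new_state, task_id)
        )
```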
Problems associated with locks
Locks can cause tricky issues at times. What if the process that acquired a lock fails to release it, blocking other processes from accessing the entity? It often happens that transactions go idle; if you haven't set a timeout for transactions, they may acquire a lock and never release it.
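Postgres, for example, has settings that bound how long a transaction may wait for a lock or sit idle while holding one. A sketch of setting them from application code (connection details hypothetical):

```python
import psycopg2

conn = psycopg2.connect("dbname=appdb user=app")
cur = conn.cursor()

# Give up after 5 seconds of waiting for a lock instead of
# queueing indefinitely behind a stuck transaction.
cur.execute("SET lock_timeout = '5s'")

# Terminate sessions that sit idle inside an open transaction
# (and therefore keep holding their locks) for over a minute.
cur.execute("SET idle_in_transaction_session_timeout = '60s'")
conn.commit()
```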
Another issue is lock contention. There might be resources in your database that are accessed a lot, and a number of transactions might try to acquire locks on them. As subsequent transactions have to wait for others to finish, there can be a delay in executing these transactions. This leads to contention for these resources or locks — imagine a single checkout counter in a store where everyone has to wait for their turn!
Another issue that usually occurs with locks is a deadlock. In this scenario, one transaction acquires a lock on a resource (resource A) and then tries to acquire a lock on another resource (resource B) before releasing the first. At the same time, another transaction acquires the lock on resource B while aiming to acquire the lock on resource A. Now neither process can complete without acquiring both locks.
But since resource B is locked by the second process, the first process cannot acquire a lock on it. Similarly, since the first process holds the lock on resource A, the second process cannot acquire that one. The two processes keep waiting for each other's locks, leading to a deadlock.
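Here is a sketch of that recipe, reusing the hypothetical `accounts` table: if two sessions call `transfer` with the arguments swapped, each grabs one row lock and waits on the other forever. Taking the locks in a consistent order avoids the cycle; Postgres will also detect a deadlock after `deadlock_timeout` and abort one of the transactions:

```python
def transfer(cur, from_id, to_id, amount):
    # Naive ordering invites deadlock: session 1 runs transfer(cur, 1, 2, ...)
    # while session 2 runs transfer(cur, 2, 1, ...), and each ends up
    # waiting on the row the other has locked.
    # Fix: always take the locks in a consistent (ascending id) order.
    for row_id in sorted((from_id, to_id)):
        cur.execute("SELECT 1 FROM accounts WHERE id = %s FOR UPDATE", (row_id,))
    cur.execute(
        "UPDATE accounts SET balance = balance - %s WHERE id = %s",
        (amount, from_id),
    )
    cur.execute(
        "UPDATE accounts SET balance = balance + %s WHERE id = %s",
        (amount, to_id),
    )
```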
Let’s look at another issue associated with locks.
Row-level locks are the most common and safest form of locking. But locks are state the database has to maintain in memory, and that comes at a cost. In some databases, if there are too many row-level locks and memory is running low, the lock is escalated to table level to free up memory. This can have implications you may not have anticipated!
Transaction Isolation
ACID-compliant databases need to make sure that each transaction is carried out in isolation. This means that the results of a transaction are only visible after it commits; other processes should not be aware of what's going on with the records while the transaction is in progress.
What happens when a transaction tries to read a row updated by another transaction?
It depends on the isolation level the database is operating at with respect to that particular transaction. Let's explore the problems that can occur.
Problems when transaction isolation is not done
- Dirty Read — Let's take a situation where one transaction updates a row or a table but does not commit the changes. If the database lets another transaction read those changes before they are committed, that's called a dirty read. Why is it a problem? Say the first transaction rolls back its changes: the other transaction has now read data that never officially existed. This happens in concurrent systems where multiple transactions run in parallel, but it can be prevented by the database, and we will explore how later.
- Non-repeatable read — Another side effect of concurrent execution is that consecutive reads can retrieve different results if another transaction is allowed to do updates in between. So, if a transaction queries a row twice, but between the reads another transaction updates the same row and commits, the two reads will give different results (demonstrated in the sketch after this list).
- Phantom read — In a similar situation, if one transaction runs the same query twice, but another transaction inserts or deletes rows in between, changing the number of rows the first transaction retrieves on its second read, it's called a phantom read. This is similar to a non-repeatable read; the only difference is that a non-repeatable read is an inconsistency in the values of a row, while in a phantom read the number of rows retrieved by the queries differs.
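Here is a sketch of a non-repeatable read, using two connections (one transaction each) against the hypothetical `accounts` table, with Postgres's default Read Committed isolation:

```python
import psycopg2

reader = psycopg2.connect("dbname=appdb user=app")
writer = psycopg2.connect("dbname=appdb user=app")
r = reader.cursor()
w = writer.cursor()

# First read; this also opens the reader's transaction.
r.execute("SELECT balance FROM accounts WHERE id = 1")
first = r.fetchone()

# Another transaction updates the same row and commits in between.
w.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 1")
writer.commit()

# Under READ COMMITTED, the same query now returns a different value
# within the same transaction: a non-repeatable read. Under
# REPEATABLE READ, it would still return `first`.
r.execute("SELECT balance FROM accounts WHERE id = 1")
second = r.fetchone()
print(first, second)
```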
How do databases deal with this?
They implement levels of isolation to avoid such problems.
We discussed locks earlier. While locks can provide complete isolation of transactions, if they block other transactions from doing anything while a resource is locked, lock contention becomes a problem.
The solution to this is having different levels of isolation. Let's discuss the most common ones, listed in increasing order of isolation.
Levels of Isolation
- Read uncommitted — This level of isolation lets a transaction read data that other transactions have not yet committed to the database. There is no isolation happening here. So, if transaction 1 performs an update and, before it can commit, transaction 2 tries to access the updated data, transaction 2 will see the new data. This does not solve any of the issues mentioned above.
- Read committed — This, as the name suggests, lets transactions read only data that has been committed to the database. It solves the dirty read problem mentioned above: if a transaction is updating a row and another transaction reads it, the reader sees only the last committed state. But it can still cause non-repeatable and phantom reads, because other transactions can change and commit data between two reads of the same transaction.
- Repeatable read — To keep transactions from getting inconsistent data, we need a higher level of isolation, and that is what repeatable read offers. Here, the rows the transaction reads stay locked throughout the transaction. So, if the transaction contains two select queries and, in between, another transaction tries to update the same rows, it would be blocked from doing so. This isolation level helps against non-repeatable reads, though it is not immune to phantom reads. It is the default level of isolation in some databases, such as MySQL's InnoDB.
- Serializable — This is the highest level of isolation. Here, all concurrent transactions "appear" to be executed serially. Pay attention to how I said it appears to be: that's because the transactions are not truly executed serially or sequentially. This level protects against phantom reads as well. A sketch of selecting these levels from application code follows this list.
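With psycopg2, for example, the isolation level can be selected per session. The sketch below (the `audits` table is hypothetical) also shows the retry loop that stricter levels require, since Postgres aborts a transaction with a serialization failure rather than letting it see inconsistent data:

```python
import psycopg2
from psycopg2 import errors

conn = psycopg2.connect("dbname=appdb user=app")
conn.set_session(isolation_level="REPEATABLE READ")
# or: conn.set_session(isolation_level="SERIALIZABLE")

while True:
    try:
        with conn, conn.cursor() as cur:
            cur.execute("SELECT sum(balance) FROM accounts")
            total = cur.fetchone()[0]
            cur.execute("INSERT INTO audits (total) VALUES (%s)", (total,))
        break  # committed successfully
    except errors.SerializationFailure:
        # The database aborted us to preserve the isolation guarantee;
        # the transaction was rolled back and is safe to retry.
        continue
```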
Advisory Locks in Postgres
This is an interesting feature of Postgres that I came across recently. One of the applications I was working on required a distributed locking strategy to control concurrency issues.
The problem was to have a common datastore containing tasks that multiple distributed workers need to execute. If one worker picks up a task, the others shouldn't be able to pick it up.
When you need application-level control over locks, where a shared resource is accessed by multiple processes or distributed workers and you must make sure that only one worker can access it at any point in time, the best option is advisory locks.
Advisory locks let an application define what a lock means: if you want to acquire a lock while you perform certain tasks in your application, you can do that using advisory locks. With row- or table-level locks, the code can get ugly, with SQL statements mingled with application logic.
The idea is to acquire a lock on a custom ID that you generate. For example, an ID is generated for every task. In a distributed system, if one worker process has already acquired the lock for a task's ID, the others won't be able to. Once the business logic is executed, the lock is released. Since each task has a different lock ID, workers can hold locks on different tasks simultaneously.
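Here is a sketch of that worker pattern, assuming task IDs are (or can be hashed to) 64-bit integers and `handler` stands in for the business logic. `pg_try_advisory_lock` returns true or false immediately instead of blocking, which is what a polling worker wants:

```python
import psycopg2

conn = psycopg2.connect("dbname=appdb user=worker")
conn.autocommit = True  # advisory locks are session-level, no tx needed

def try_process(task_id, handler):
    """Run handler(task_id) only if no other worker holds this task."""
    cur = conn.cursor()
    # Returns true only if no other session holds this lock;
    # unlike row locks, it never waits.
    cur.execute("SELECT pg_try_advisory_lock(%s)", (task_id,))
    if not cur.fetchone()[0]:
        return False  # another worker owns this task
    try:
        handler(task_id)  # hypothetical business logic
        return True
    finally:
        cur.execute("SELECT pg_advisory_unlock(%s)", (task_id,))
```

Postgres also provides `pg_advisory_xact_lock`, which is released automatically when the transaction ends, so you cannot forget the unlock.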
Wrapping up
In this article, we looked at the different types of locks, namely shared locks and exclusive locks.
Furthermore, we understood the concepts of optimistic and pessimistic locking.
We also explored various problems associated with strict locking, namely unreleased locks, lock contention, deadlocks, and lock escalation.
Isolation is important to understand when dealing with locks, so we looked at the various levels of isolation and which problems they solve. We also briefly looked at advisory locks, which are a very helpful feature when developing distributed systems.