Why NoSQL – Part 1 – CAP theorem

This is part 1 of the Why NoSQL series. In this series we will take a look at why we need NoSQL. Basically, we need to understand what NoSQL brings to the table that SQL can't. We will go through it point by point. The first part is about the CAP theorem and how it is supported by RDBMSs and NoSQL databases.

Before we examine the CAP theorem, we need to revisit the famous ACID transactions, which every RDBMS is expected to support. ACID stands for Atomic, Consistent, Isolated, Durable.

  • Atomic:
    The transaction is indivisible – either all the statements in the transaction are applied to the database, or none are.
  • Consistent:
    The database remains in a consistent state before and after the transaction executes. Put simply, whatever rows are affected by the transaction still satisfy every rule that applies to them.
  • Isolated:
    While multiple transactions can be executed by one or more users simultaneously, one transaction should not see the effects of other concurrent transactions.
  • Durable:
    Once a transaction is saved to the database (an action referred to in database programming circles as a commit), its changes are expected to persist.
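
To make the Atomic and Consistent properties concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names alice and bob, and the transfer and balances helpers are all made up for illustration; the CHECK constraint plays the role of a "rule" applied to the rows.

```python
import sqlite3

# In-memory database with one rule: a balance may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply, or neither does."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK rule was violated, so the credit above was rolled back too.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "alice", "bob", 500))  # False
print(balances(conn))                       # {'alice': 70, 'bob': 80}
```

The second transfer credits bob first and then fails on the debit, yet bob's balance is unchanged afterwards: the whole transaction was undone, not just the failing statement.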

All leading RDBMSs support ACID transactions. That is great, so where does the problem come from? In the Web 2.0 era, applications started to deal with billions and trillions of data items every day, and scalability came into the picture. To scale horizontally, the database has to be distributed over the network. This is where the CAP theorem comes in, as a way to evaluate a distributed data storage system.
The CAP theorem was put forward by Eric Brewer in the year 2000. Let's have a look at it.
CAP stands for Consistency, Availability and Partition-Tolerance.

The CAP theorem says that a distributed system cannot provide all three of these guarantees at the same time. Let's look at them one by one.

[Figure: CAP theorem diagram (the CAP theorem with NoSQL)]
  • Consistency:
    If I write data to one node and then read it from another node in the distributed system, the read returns what I wrote on the first node.
  • Availability:
    Every node of the distributed system that has not failed should respond to queries.
  • Partition-Tolerance:
    The distributed system stays available and keeps operating seamlessly even when the network is partitioned (for example, when nodes in different data centers are added or removed) or when messages are lost over the network.

Now let us examine why it is impossible to satisfy all three attributes of the CAP theorem at once.

Consider a distributed system like the one above. We update a piece of data on node-1 and then try to read it from node-2. The possible outcomes are:

  • 1. node-2 may return the last best version it has, which obviously violates Consistency.
  • 2. If we prefer to wait for the data to propagate to node-2, node-2 has to wait until the newer version arrives. Since this is a distributed system, there is a high chance of message failure, and node-2 may keep waiting. It then cannot respond to queries even though it is alive, which violates Availability.
  • 3. If we want both Consistency and Availability, the network must never be partitioned, which means giving up Partition-Tolerance.
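
The three outcomes can be played out in a toy model. This is only a sketch under simplifying assumptions: two in-memory replicas, a partitioned flag standing in for a broken network link, and replication that is instant when the link is up. The names TwoNodeStore, Unavailable, write_to_node1 and read_from_node2 are hypothetical and do not come from any real database.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk a stale read."""


class Node:
    def __init__(self, name):
        self.name = name
        self.value = None


class TwoNodeStore:
    """Toy model: writes land on node-1, reads are served by node-2."""

    def __init__(self, prefer):
        if prefer not in ("consistency", "availability"):
            raise ValueError("prefer must be 'consistency' or 'availability'")
        self.prefer = prefer
        self.node1 = Node("node-1")
        self.node2 = Node("node-2")
        self.partitioned = False  # True means node-1 cannot reach node-2

    def write_to_node1(self, value):
        self.node1.value = value
        if not self.partitioned:
            # The replication message is delivered only when the link is up.
            self.node2.value = value

    def read_from_node2(self):
        if self.partitioned and self.prefer == "consistency":
            # Outcome 2: node-2 is alive but will not answer (no Availability).
            raise Unavailable("node-2 cannot confirm it has the latest value")
        # Outcome 1 during a partition: node-2 answers with whatever it has,
        # which may be stale (no Consistency).
        return self.node2.value


for prefer in ("consistency", "availability"):
    store = TwoNodeStore(prefer)
    store.write_to_node1("v1")   # replicated to node-2
    store.partitioned = True     # the network splits
    store.write_to_node1("v2")   # reaches node-1 only
    try:
        print(prefer, "->", store.read_from_node2())
    except Unavailable as exc:
        print(prefer, "-> unavailable:", exc)
```

With prefer set to "consistency" the read is refused, and with "availability" it returns the stale "v1". Only when partitioned stays False does node-2 return the latest value and always answer, which is outcome 3.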

The deciding factor for NoSQL is which attributes you need for your data model. If you need a highly consistent data model, an RDBMS is still the best choice. But if you can compromise on consistency and need a highly available, partition-tolerant data model, then NoSQL would be the right choice.

In the coming parts of this series we will look into different NoSQL databases, their data structures, distributed locking, eventual consistency, and their architectures.