NoSQL vs SQL: Demystifying NoSQL Databases

There’s a lot of talk about NoSQL being the database of the future. Let’s take a look at NoSQL vs SQL databases, and what the differences are, use cases, and why you would use one over the other. Relational SQL databases have been the default database type for many years, but NoSQL offers a few additional options that fit well in many use cases. This article will dive into the specifics of what NoSQL databases offer, and what the differences are between NoSQL and SQL databases. Let’s dig in and demystify the NoSQL vs SQL database debate!

[toc]

What is NoSQL?

In recent years, there’s been increasing amounts of discussion about NoSQL databases. One of the big reason for this is that the massive, Internet scale websites like Facebook, Twitter, Netflix, and many others rely on NoSQL databases heavily. NoSQL databases are part of what enables them to handle the massive, Internet scale level of traffic and users. These are generally use cases where an application built with Relational SQL databases can’t handle the amount of load with adequate performance.

NoSQL databases are non-relational databases that store data in a manner different than traditional Relational SQL databases

While the term “NoSQL” is relatively new in the industry, these types of databases were first created in the late 1960’s. It wasn’t until the early 2000’s they started being called “NoSQL” after Carlo Strozzi first coined the term in 1998 and then Johan Oskarsson and other stated using the term again in early 2009. His original definition of NoSQL was a database that did not support the SQL (Structured Query Language) query language interface.

While the term NoSQL first represented databases that didn’t support SQL, this isn’t really the definition today. Modern NoSQL databases are often more like “Not only SQL” databases, as many of them do support a SQL language subset for querying data, even though they do not store the data in a relational manner.

NoSQL databases are non-relational databases that store data in a manner different than traditional Relational SQL databases; such as SQL Server or Oracle. Some NoSQL databases support using a SQL language syntax for querying, so the term “NoREL” (aka No Relational) is probably more appropriate for the NoSQL databases available today.

One of the key aspects to NoSQL databases that is different than traditional SQL databases or relational databases is the method of how data is organized into the database. In relational SQL databases, you store the data in a normalized manner that is optimized for storage, and then you use the SQL language to query that data across table relationships.

In NoSQL databases, data is stored in a manner that optimizes for how the data will be queried. This means there can and generally will be data duplication or de-normalization. This is done so queries are fast, and data writes are performed asynchronously so multiple data collections can be kept in synchronization by replicating the data out across the NoSQL database cluster.

In NoSQL databases, data is stored in a manner that optimizes for how the data will be queried.

Keep in mind that a full database server isn’t needed to use NoSQL to store and query data. If you look at the most basic level of what NoSQL means, then you can see that storing data in a JSON or XML file, opening that file, modifying data, and saving the file is still considered NoSQL. You aren’t using the SQL query language and the data is stored in a non-relational manner. Sometimes you can even get a lot of great performance out of this simple data storage / retrieval method. However, most cases you’ll be using a full NoSQL database server as it will offer you many benefits in addition to fast data access.

NoSQL vs SQL Databases

There are a few differences between NoSQL and SQL databases. The initial differences are how the data is stored and queried. There are many other differences between the two database types. These differences can also vary depending on which SQL and NoSQL databases you’re comparing against each other.

What Databases are available?

There are many different databases available for use for both NoSQL and SQL database types. There are also many different Free, Open Source or proprietary, paid options available. The variety of databases available strengthens the industry, as it enables applications the ability to use the specific database server whose benefits fit the applications data needs best.

Here’s a list of a few different Relational SQL Databases and NoSQL databases available today:

Relational SQL Databases
- Microsoft SQL Server
- Azure SQL Database
- MySQL
- PostgreSQL
- Oracle
NoSQL Databases
- MongoDB
- Azure Cosmos DB
- Cassandra
- Redis Cache
- HBase
- Neo4j

There are of course many other databases available in both the SQL and NoSQL spaces. These is just a list of some of the most popular databases available.

What are the Data Query differences between SQL and NoSQL?

SQL databases store data in a tabular format on the file system in some fashion, in a manner that allows for multiple “Tables” of data to be linked through relationships. This is something that enables the cross querying and joining of data across multiple Tables within a single SQL query against the database. This method uses a data storage method that is optimized for storing the data most efficiently, and then writing queries against the data. Once this is done, then the Indexes on the Tables of data can be created and optimized to help increase the performance of the database.

NoSQL databases are used in a manner where data is stored in a way that optimizes for query performance. This offers the ability to run extremely fast queries to retrieve the data needed by an application. However, to do this there may be a bit more overhead to building the application. If you duplicate data across multiple collections so you can have each optimized for a specific set of queries, then you will have duplicate data to maintain. To handle this, you will design the application differently to write the at a to the multiple data stores simultaneously and keep everything synchronized.

Keep in mind that you don’t need to duplicate data with NoSQL, but sometimes you may need to. When you do, it can be important to perform data writes asynchronously so that you can ensure to write the data to all the data stores necessary. When writing data asynchronously, remember that your database is now eventually consistent. Meaning that when data is saved, it may not be available immediately in the expected state, but eventually it will be once all writes and replication of the data has been completed.

What’s the difference between Eventual and Strong Consistency?

One of the most desired features of a SQL or Relational database is the Strong Consistency of data. This means that if you query for data immediately after writing it to the database you will get the latest data back. When working with SQL databases these behaviors can be taken for granted as it’s a core feature of all SQL databases.

With NoSQL databases, the most common consistency level for data stored is Eventual Consistency. Eventual consistency mean that when data is written to the database it may not be immediately available. Since NoSQL databases are designed to operate by replicating data and queries across multiple nodes (or multiple machines), there is a chance that when you query for data that was just written you may not get back the latest data expected. Eventual consistency may sound like a flaw in NoSQL databases, but it’s a key feature that enables the high, Internet-level scalability of NoSQL databases.

Eventual consistency may sound like a flaw in NoSQL databases, but it’s a key feature that enables the high, Internet-level scalability of NoSQL databases.

Even though Eventual consistency is most common with NoSQL databases, there are several NoSQL databases that support Strong consistency or some consistency levels with features of both levels. Before deducing on which NoSQL database to use, it’s good to research the levels of consistency they support. There are a few options that can be chosen to help optimize the database consistency behavior to fit the needs of the application and its data.

Distributed Systems and the CAP Theorem

NoSQL databases are distributed systems designed to operate across multiple nodes. A single node is a Virtual Machine or even a physical machine. The NoSQL database will be running across a cluster of nodes so it can perform query and data storage distribution.

As a distributed system, NoSQL databases offer some benefits in availability and scalability that a single SQL database does not. If a node goes down, the database remains available and very little data is lost, if any. There are even features such as quorum agreement when querying or writing data to ensure the stability and availability of data stored in the NoSQL database.

What is the CAP Theorem?

When discussing any distributed system, it’s important to understand the CAP Theorem. The CAP Theorem, also known as Brewer’s theorem, was first presented by Eric Brewer, a computer scientist at the University of California, Berkeley in 1998. It helps describe the behavior of distributed data systems.

The CAP Theorem states that it’s impossible for a distributed data store to simultaneously provide more than 2 of the following 3 guarantees:

Consistency: All queries receive the most recent data written to the data store.
Availability: All requests receive a valid response without error.
Partition Tolerance: All requests are processed and handled without error regardless of network issues or failed nodes.

Consistency is the guarantee related to the data being replicated and synchronized across all the nodes in the cluster. Strong consistency is that all nodes will always have the same data and never be out of sync. Eventual consistency is when the data across the nodes may be different at first, but over time they will be synchronized and eventually be consistent with one another.

Availability is the guarantee related to the high availability and uptime of the system. The highest availability, so 100% uptime, would be where every request always receives a response and zero errors are returned.

Partition Tolerance is the guarantee related to the health of the system. Server and networks fail. High partition tolerance will ensure that the system will remain operational and stable if one or more nodes fail for some reason.

CAP Theorem states that one of these guarantees needs to be sacrificed in some fashion to achieve the other two more fully.

As you can probably reason while comparing these 3 guarantees, it’s impossible to guarantee each of these 100% in the same system. One of these guarantees needs to be sacrificed in some fashion to achieve the other two more fully. It’s also not a case where you need to pick two of the three. A system will more highly guarantee two of these over the third in some fashion. It’s a case where you can only have two of the three, but some range of the three simultaneously.

4 Types of NoSQL Databases

There are 4 primary types of data classifications for NoSQL systems. Each of these store the data differently and optimize queries for the type of data and storage that type provides.

Document
Key/Value
Column
Graph

There are different types of NoSQL data stores, with different use cases in mind. This allows you to use the NoSQL database that best fits the application and its data. None of these is particularly “better” than each other, or even SQL databases, but rather alternatives to add options to the toolset of application building.

The different types of NoSQL databases allow you to use the data storage type that best fits the application and its data.

What is Document NoSQL Storage?

Document NoSQL is probably the most common type of NoSQL database. Document NoSQL stores data in a semi-structured format as documents. Most Document NoSQL databases store the documents of data as either JSON or XML documents, but could also be YAML, BSON, or other data formats depending on the NoSQL database system.

Here’s an example of a simple document using JSON format:

{
  "Id": "1234",
  "FirstName": "Zefram",
  "LastName": "Cochrane",
  "Birthday": "April 5, 2030",
  "Interests": [
    "Rockets",
    "Spaceships",
    "Spaceflight"
  ]
}

What is Key/Value NoSQL Storage?

Key/Value NoSQL is also a common type of NoSQL database. It’s often used as a cache database; such as Redis Cache. Key/Value stores data with the Key as the identifier and the Value as the data itself. This makes retrieving the data by Key very fast, as it’s basically just a dictionary lookup to query the data.

Here’s an example of what some data might look like stored as Key/Value NoSQL:

Key	Value
K1	Bart,Homer,Smithers
K2	Marge,Magie,Lisa
K3	Chris
K4	1,2,3,4

What is Column NoSQL Storage?

Column NoSQL is probably the closest NoSQL type to a relational SQL database table. However, the concept of a Table doesn’t really exist in Column NoSQL. A column can be part of a Column Family that resembles at most a relational row, but may appear in one row and not in the others.

Some of the most popular Column NoSQL databases are a combination of Key/Value and Column in one. Such as Cassandra and Azure Table Storage, or Azure Cosmos DB using Cassandra API.

What is Graph NoSQL Storage?

Graph NoSQL storage is one that’s quite a bit different than the rest. It uses graph structures made up of nodes and edges (aka relationships). It basically is a way to store data as a mixture of Nodes with the Relationships between them to map out the data in a graph structure. This type of NoSQL enables data to be queried very efficiently when you need to traverse the relationships very quickly.

Some uses of Graph NoSQL would be storing a friends list and mapping the relationships between all the “friends” so they can be queries. Another is to be able to map a customer’s interests or past purchases and generating recommended products for that customer to buy.

You can see implementations of Graph NoSQL storage in the following example systems:

Netflix and their recommended watch lists
Facebook and the social graph of friends and friends of friends
LinkedIn and professional connections in your network

What is Polyglot Storage?

There are different data types and uses patterns that fit different database types better than others. There are many times where a SQL database would be the best database to use. There are also times where a NoSQL Graph, Column, Key/Value, or Document database would fit best. As a result, there are also times where multiple data stores may be necessary to provide the best data storage system for an application or enterprise system.

“Polyglot Storage” is the term used to describe a system that uses multiple database types, or storage types, for different data within the same system. It’s the basic idea of using the correct data storage that makes sense for the data, rather than just fitting all the application data into a single data storage type.

“Polyglot Storage” is the term used to describe a system that uses multiple database types, or storage types, for different data within the same system.

For many years, most developers would build an application using SQL databases and just put all the data in the same database. They would then just figure out how to query the data when needed. In a Polyglot Storage solution, there may be a SQL database for some data, and a NoSQL database (or multiple) for other data. This enables better software systems to be created with greater performance and flexibility.