RDBMS vs. NoSQL vs. NewSQL; which one to choose?

The market is flooded with many types of database and this creates dilemma for architects to choose one that suits the project’s requirement. ‘One size fits all’ approach is not longer applicable. Enterprises need to judge the relevance and applicability before adopting any of the databases. The different databases are scrutinised below to understand when to implement them.

  • RDBMS
  • No SQL – Columnar Databases
  • No SQL – Key Value Stores&Document Database
  • No SQL – Graph
  • New SQL

What is RDBMS – Relational Database Management System?

In RDBMS, we store data in form of table (a combination of rows and columns) and can create relationships among the tables to maintain consistency. RDBMS supports ACID (Atomicity, Consistency, Integrity, and Durability) properties for transactions. When to adopt RDMS:

  • The usage pattern involves retrieval of all or major column in a row in bulk
  • The usage pattern involves the update of all or most of the column in a table
  • The usage pattern involvesthousands of transactions which require ACID properties
  • The database storage capacity is within GB /TB
  • The usage pattern involves complex query including multiple data points
  • The usage pattern involves insert, update and delete functions

In short, RDBMS is best solution for OLTP (On line transaction processing) Systems.

Example: Oracle, Microsoft SQL Server, DB2 and MySQL

What is NoSQL – Columnar Database Management System?

Columnar database stores each column data from all the tuples together. Contrary to storing all column data corresponding to row sequentially in a page, values for each column in a row are stored together on the same page. This results in data for each row getting stored on different pages. When a query requires data for a row, column data for a row is pulled from all the pages storing the column values, appends them together before returning it to the user as a single row. When can we use it?

  • The usage pattern involves the retrieval of selected columns or one or two columns at a time
  • The usage pattern involve the update of selected columns or one or two columns
  • The usage pattern involve aggregations on a column
  • For an excellent load performance
  • The database size is huge and requires a scalable solution
  • CDBMS supports real time query and loading.
  • CDBMS supports high availability.
  • In CDBMS, Compression of data is efficient, since data which are similar are stored together. This help reduce space usage and improve disk IO. They also don’t need additional structure like indexes since columns themselves act as indexes.

Thus, CDBMS is best solution for Data Warehouses and Business Intelligence, Customer Relationship Management and Ad hoc query systems.

Example: Cassandra, H-Base (Apache), Vertica (HP) and SAP Hana

NoSQL – Key Value Stores & Document Database

Key Value Stores:
Key Value database is just a big hash table having keys and values. It is schema less database. The key can be synthetic or auto-generated while the value can be String, JSON, and BLOB (basic large object), etc. The data stored does not require nesting and deals with documents for key value store databases. Although some NoSQL key value store database allows three or four levels of nesting.

Document Database:
In Document Store Database data (a collection of key value pairs is compressed as a document) is stored quite similar to a key-value store, but the only difference is that the values stored (referred to as “documents”) have some structure and encoding of the data. XML, JSON (JavaScript Object Notation), BSON (binary encoding of JSON objects) are some common standard encodings. Document stores allow you to play with your data and store it as per requirement. They have high speed, good query timing based on indexing, but the main advantage with schema is flexibility and one can nest key-value pairs as much as required.

When can we use NoSQL?

  • To manage large streams of non-transactional data: Apache logs, application logs, MySQL logs, clickstreams, etc.
  • Forregular insert operations and less update operations.
  • For basic searching instead of complex searching.
  • For a schema less database or to change schema on the fly.
  • To store complete documents.
  • To handle large data volumes like in terabytes or petabytes with low latency.
  • To designdatabase to store information from social media or about social media transactionswhich are not durable.
  • For scaling out at low cost. Scaling out refers to spreading the load over many commodity systems.
  • For auto sharding. Sharding is a type of database partitioning that separates very large databases into smaller, faster, more easily managed parts called data shards.

In short, NoSQL is the best solution for storing social media information where data is humongous and requires scalability at low cost.

Example: Riak and Amazon’s Dynamo are the popular key-value store NoSQL databases, whereas Couchbase and MongoDB are the popular document-based databases.

NoSQL – Graph

Graph data is stored as nodes and relationships.  A Node is basically a noun, a person, place, thing, entity, etc.  A relationship is a one or two-way connection between two nodes.  A node could be people and a relationship could be a two-way friendship. One can have two nodes, a link to a website, and a user, and the relationship could be a one way “like” from the user to the website.
We can usually apply metadata documents to nodes and relationships, as well as the label, or label a node as a user /website /pet, etc. We can add data like name, age, gender to user nodes, its schema is less and quite flexible. When can we use it?

To handle highly connected data with good volume:

  • To handle Intensive data relationship with great performance by several orders of magnitude.
  • For a 360-degree view of master data and answer questions about data relationships in real time.
  • To design a system for Real-time analysis with data relationship uncovers fraud rings and other sophisticated scams.
  • To design Real-time recommendation engines.
  • Family, friends and followers extend into a social graph which reveals patterns of similar behaviour, influence, and implicit groups.

Conclusion: Graph database is best solution when one wants a 360 degree view of the highly connected data and analyse that data like Real-time recommendation engines and fraud detections.

Example: Neo4j, FlockDB and InfiniteGraph

What is NewSQL?

NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (OLTP), read-write workloads while still maintaining the ACID properties of a traditional database system.
They overcome the performance overheads of RDBMS caused by latching shared data structures, buffer pooling, record-level locking, and write-ahead logging features, primarily by embracing distributed computing architectures.

  • For RDBMS features and gigabytes or terabytes of data with high-speed transactional access.
  • To maintain RDBMS on many nodes across data centers and let the underlying system manage data locality and consistency.

Conclusion: New Generation RDBMS which supports large volume with high speed transactional access and applies to the OLTP and OLAP systems.

Example: VoltDB and NuoDB.

However, in today’s scenario, no single database is enough to handle all data requirements. The idea is to adopt a hybrid solution as it is difficult to criticize one and adopt another option.
Image Courtesy: thewindowsclub.thewindowsclubco.netdna-cdn.comThe market is flooded with many types of database and this creates dilemma for architects to choose one that suits the project’s requirement. ‘One size fits all’ approach is not longer applicable. Enterprises need to judge the relevance and applicability before adopting any of the databases. The different databases are scrutinised below to understand when to implement them.

  • RDBMS
  • No SQL – Columnar Databases
  • No SQL – Key Value Stores&Document Database
  • No SQL – Graph
  • New SQL

RDBMS – Relational Database Management System
In RDBMS, we store data in form of table (a combination of rows and columns) and can create relationships among the tables to maintain consistency. RDBMS supports ACID (Atomicity, Consistency, Integrity, and Durability) properties for transactions. When to adopt RDMS:

  • The usage pattern involves retrieval of all or major column in a row in bulk
  • The usage pattern involves the update of all or most of the column in a table
  • The usage pattern involvesthousands of transactions which require ACID properties
  • The database storage capacity is within GB /TB
  • The usage pattern involves complex query including multiple data points
  • The usage pattern involves insert, update and delete functions

In short, RDBMS is best solution for OLTP (On line transaction processing) Systems.
Example: Oracle, Microsoft SQL Server, DB2 and MySQL
NoSQL – Columnar Database Management System
Columnar database stores each column data from all the tuples together. Contrary to storing all column data corresponding to row sequentially in a page, values for each column in row are stored together in the same page. This results in data for each row getting stored in different pages. When a query requires data for row, column data for a row is pulled from all the pages storing the column values, appends them together before returning it to user as single row. When can we use it?

  • The usage pattern involves the retrieval of selected columns or one or two columns at a time
  • The usage pattern involve the update of selected columns or one or two columns
  • The usage pattern involve aggregations on column
  • For an excellent load performance
  • The database size is huge and requires a scalable solution
  • CDBMS supports real time query and loading.
  • CDBMS supports high availability.
  • In CDBMS, Compression of data is efficient, since data which are similar are stored together. This help reduce space usage and improve disk IO. They also don’t need additional structure like indexes since columns themselves act as indexes.

Thus, CDBMS is best solution for Data Warehouses and Business Intelligence, Customer Relationship Management and Ad hoc query systems.
Example: Cassandra, H-Base (Apache), Vertica (HP) and SAP Hana
NoSQL – Key Value Stores & Document Database
Key Value Stores
Key Value database is just a big hash table having keys and values. It is schema less database. The key can be synthetic or auto-generated while the value can be String, JSON, and BLOB (basic large object) etc. The data stored does not require nesting and deals with documents for key value store databases. Although some NoSQL key value store database allow three or four levels of nesting.
Document Database
In Document Store Database data (a collection of key value pairs is compressed as a document) is stored quite similar to a key-value store, but the only difference is that the values stored (referred to as “documents”) have some structure and encoding of the data. XML, JSON (Java Script Object Notation), BSON (binary encoding of JSON objects) are some common standard encodings.Document stores allow you to play with your data and store it as per requirement. They have high speed, good query timing based on indexing, but the main advantage with schema is flexibility and one can nest key-value pairs as much as required.
When can we use it?

  • To manage large streams of non-transactional data: Apache logs, application logs, MySQL logs, clickstreams, etc.
  • Forregular insert operations and less update operations.
  • For basic searching instead of complex searching.
  • For a schema less database or to change schema on the fly.
  • To store complete documents.
  • To handle large data volumes like in terabytes or petabytes with low latency.
  • To designdatabase to store information from social media or about social media transactionswhich are not durable.
  • For scaling out at low cost. Scaling out refers to spreading the load over many commodity systems.
  • For auto sharding. Sharding is a type of database partitioning that separates very large databases into smaller, faster, more easily managed parts called data shards.

In short, NoSQL is best solution for storing social media information where data is humongous and requires scalability at low cost.
Example: Riak and Amazon’s Dynamo are the popular key-value store NoSQL databases, whereas Couchbase and MongoDB are the popular document based databases.
NoSQL – Graph
Graph data is stored as nodes and relationships.  A Node is basically a noun, a person, place, thing, entity, etc.  A relationship is a one or two way connection between two nodes.  A node could be people and a relationship could be a two way friendship. One can have two nodes, a link to a website, and a user, and the relationship could be a one way “like” from the user to the website.
We can usually apply metadata documents to nodes and relationships, as well as label, or label a node as a user /website /pet etc. We can add data like name, age, gender to user nodes, its schema is less and quite flexible.When can we use it?
To handle highly connected data with good volume.

  • To handle Intensive data relationship with great performance by several orders of magnitude.
  • For a 360-degree view of master data and answer questions about data relationships in real time.
  • To design a system for Real-time analysis with data relationship uncovers fraud rings and other sophisticated scams.
  • To design Real-time recommendation engines.
  • Family, friends and followers extend into a social graph which reveals patterns of similar behaviour, influence, and implicit groups.

Conclusion: Graph database is best solution when one wants a 360 degree view of the highly connected data and analyse that data like Real-time recommendation engines and fraud detections.
Example: Neo4j, FlockDB and InfiniteGraph
NewSQL
NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (OLTP), read-write workloads while still maintaining the ACID properties of a traditional database system.
They overcome the performance overheads of RDBMS caused by latching shared data structures, buffer pooling, record level locking, and write-ahead logging features, primarily by embracing distributed computing architectures.

  • For RDBMS features and gigabytes or terabytes of data with high-speed transactional access.
  • To maintain RDBMS on many nodes across data centres and let the underlying system manage data locality and consistency.

Conclusion: New Generation RDBMS which supports large volume with high speed transactional access and applies to the OLTP and OLAP systems.
Example: VoltDB and NuoDB.
However in today’s scenario, no single database is enough to handle all data requirements. The idea is to adopt a hybrid solution as it is difficult to criticize one and adopt another option.
Image Courtesy: thewindowsclub.thewindowsclubco.netdna-cdn.com[:]