Introduction to NoSQL Databases

A NoSQL database provides a mechanism for storing and retrieving data that uses less constrained consistency models than traditional relational databases. NoSQL systems are also referred to as "Not only SQL" to emphasize that many of them do in fact allow SQL-like query languages to be used. While NoSQL databases have existed for many years, they have only recently become popular, in the era of cloud computing, big data, and high-volume web and mobile applications. Today they are chosen for their scalability, performance, and ease of use. The most common types of NoSQL databases are key-value, document, column, and graph databases. It's important to emphasize that the "No" in "NoSQL" stands for "not only" rather than the word "no." This distinction matters not only because many NoSQL databases support SQL-like queries, but also because, in a world of microservices and polyglot persistence, NoSQL and relational databases are now commonly used together in a single application.

NoSQL databases are purpose-built for specific data models and have flexible schemas for building modern applications. They are widely recognized for their ease of development, functionality, and performance at scale. Relational systems, by contrast, share the SQL standard, which makes it easier to migrate database applications between database systems and lets users access data stored in two or more RDBMSs without changing the database sub-language (SQL). NoSQL databases instead use a variety of data models for accessing and managing data. They are optimized for applications that require large data volumes, low latency, and flexible data models, which are achieved by relaxing some of the data consistency restrictions of other databases.

NoSQL databases are designed for a number of data access patterns, including those of low-latency applications. NoSQL search databases are designed for analytics over semi-structured data.

The popularity of NoSQL has been driven by the following reasons:

  • The pace of development with NoSQL databases can be much faster than with a SQL database.
  • The structure of many different forms of data is more easily handled and evolved with a NoSQL database.
  • The amount of data in many applications often cannot be served affordably by a SQL database.
  • The scale of traffic and the need for zero downtime can be difficult to handle with a traditional SQL database.
  • New application paradigms can be more easily supported.

NoSQL databases provide a variety of data models such as key-value, document, and graph, which are optimized for performance and scale. 

NoSQL databases often make tradeoffs by relaxing some of the ACID properties of relational databases in exchange for a more flexible data model that can scale horizontally. This makes NoSQL databases an excellent choice for high-throughput, low-latency use cases that need to scale horizontally beyond the limits of a single instance. Performance is generally a function of the size of the underlying hardware cluster, network latency, and the calling application. NoSQL databases are typically partitionable: data is spread across the nodes of a distributed architecture, so throughput can be increased by adding nodes while performance stays consistent at very large scale. Object-based APIs allow app developers to easily store and retrieve data structures. Partition keys let apps look up key-value pairs, column sets, or semi-structured documents that contain serialized app objects and attributes.
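The partition-key lookup described above can be sketched in Python. This is a minimal, hypothetical illustration, not any particular product's API: `partition_for` and `PartitionedStore` are made-up names, each partition is just an in-memory dict standing in for a node, and keys are assigned by hashing modulo the number of partitions.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index using a stable hash."""
    # MD5 is used only for its stable, well-spread output, not for security.
    # Python's built-in hash() is salted per process for strings, so it would
    # send the same key to different partitions in different runs.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class PartitionedStore:
    """Toy key-value store whose items are spread across several partitions."""

    def __init__(self, num_partitions: int = 4) -> None:
        # Each dict stands in for one node of a distributed cluster.
        self.partitions = [{} for _ in range(num_partitions)]

    def put(self, key: str, value) -> None:
        self.partitions[partition_for(key, len(self.partitions))][key] = value

    def get(self, key: str):
        # Only the one partition that owns the key is consulted.
        return self.partitions[partition_for(key, len(self.partitions))].get(key)


store = PartitionedStore(num_partitions=4)
for user_id in ("user-1", "user-2", "user-3", "user-4", "user-5"):
    store.put(user_id, {"name": user_id.upper()})

print(store.get("user-3"))                 # {'name': 'USER-3'}
print([len(p) for p in store.partitions])  # how many items each partition holds
```

Plain modulo hashing moves most keys when the number of partitions changes, so production systems commonly use consistent hashing or range partitioning instead. The point here is only that the key alone determines where its data lives, so a lookup touches a single partition.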

Here are the four main types of NoSQL databases:

  • Document databases
  • Key-value stores
  • Column-oriented databases
  • Graph databases

Document databases

Document-oriented databases should be used for applications whose data does not need to be stored in a table with uniformly sized fields and is better stored as documents, each of which can have its own structure. Document stores serve well when the domain model can be split and partitioned across documents. They should be avoided if the database will have many relationships and heavy normalization. Typical uses include content management systems and blog software.

Documents can be stored and retrieved in a form that is much closer to the data objects used in applications, which means less translation is required to use the data in an application. SQL data must often be assembled and disassembled when moving back and forth between applications and storage.
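To make the translation cost concrete, here is a small Python sketch. The `Order` and `LineItem` classes and the two row lists are hypothetical, and JSON stands in for a generic document format. The nested object becomes one document directly, while the relational form splits it into rows for two tables that must be reassembled on read.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: str
    customer: str
    items: List[LineItem] = field(default_factory=list)


order = Order("o-100", "alice", [LineItem("pen", 2), LineItem("pad", 1)])

# Document model: the whole nested object becomes a single document.
document = json.dumps(asdict(order))
restored = json.loads(document)

# Relational model: the same object is disassembled into rows for two tables...
order_rows = [(order.order_id, order.customer)]
item_rows = [(order.order_id, item.sku, item.quantity) for item in order.items]


def reassemble(order_id: str) -> Order:
    """...and must be assembled again by matching rows on order_id (a join)."""
    oid, customer = next(row for row in order_rows if row[0] == order_id)
    items = [
        LineItem(sku, qty)
        for row_order_id, sku, qty in item_rows
        if row_order_id == order_id
    ]
    return Order(oid, customer, items)


print(restored["items"][0])          # {'sku': 'pen', 'quantity': 2}
print(reassemble("o-100") == order)  # True
```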

A document is an object made of keys (strings) whose values have recognizable types, including numbers, Booleans, and strings, as well as nested arrays and dictionaries. Document databases are designed for flexibility. They typically aren't forced to have a schema and are therefore easy to modify. If an application needs to store varying attributes along with large amounts of data, a document database is a good option.
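The following Python sketch shows what such documents can look like: three records in one hypothetical `products` collection, each with different attributes and some with nested arrays and dictionaries. The `get_path` and `find` helpers are illustrative inventions that mimic a dotted-path query; they are not the API of any specific document database.

```python
from typing import Any, Callable, List, Optional

products = [
    {"_id": 1, "name": "Laptop", "price": 999.0,
     "specs": {"ram_gb": 16, "ports": ["usb-c", "hdmi"]}},
    {"_id": 2, "name": "T-shirt", "price": 15.0,
     "sizes": ["S", "M", "L"], "in_stock": True},
    {"_id": 3, "name": "E-book", "price": 9.5, "format": "epub"},
]


def get_path(doc: dict, path: str) -> Optional[Any]:
    """Follow a dotted path such as 'specs.ram_gb' through nested dictionaries."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # documents without this attribute simply don't match
        current = current[part]
    return current


def find(collection: List[dict], path: str,
         predicate: Callable[[Any], bool]) -> List[dict]:
    """Return the documents whose value at `path` exists and satisfies `predicate`."""
    results = []
    for doc in collection:
        value = get_path(doc, path)
        if value is not None and predicate(value):
            results.append(doc)
    return results


print([d["name"] for d in find(products, "price", lambda p: p < 20)])
# ['T-shirt', 'E-book']
print([d["name"] for d in find(products, "specs.ram_gb", lambda gb: gb >= 8)])
# ['Laptop']
```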

Document databases are popular with developers because developers can rework their document structures as needed to suit their application, reshaping their data structures as application requirements change over time. This flexibility speeds development because, in effect, data becomes like code and is under the control of developers. In SQL databases, intervention by database administrators may be required to change the structure of a database.
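One common way to evolve document structures without a database-wide migration is to upgrade old documents lazily when they are read (sometimes called schema-on-read). The Python sketch below assumes a hypothetical `schema_version` field and a hypothetical change in which a single `name` attribute was split into `first_name` and `last_name`; documents written in either shape can coexist in the same collection.

```python
def upgrade(doc: dict) -> dict:
    """Return a copy of a user document converted to the current (v2) shape."""
    doc = dict(doc)  # shallow copy, so the stored document is left untouched
    if doc.get("schema_version", 1) < 2:
        # v1 stored one "name" string; v2 stores first and last names separately.
        first, _, last = doc.pop("name", "").partition(" ")
        doc["first_name"] = first
        doc["last_name"] = last
        doc["schema_version"] = 2
    return doc


old_doc = {"_id": 7, "name": "Ada Lovelace"}  # written before the change
new_doc = {"_id": 8, "first_name": "Alan", "last_name": "Turing", "schema_version": 2}

for stored in (old_doc, new_doc):
    print(upgrade(stored))
```

Application code then works with a single shape, and an upgraded document can be written back the next time it is saved. The trade-off is that the upgrade logic lives in the application rather than in the database.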

Key-value stores

The key-value data model is simple, but it is quite efficient and powerful. A key-value store has a simple application programming interface (API) and lets the user store data in a schema-less manner. Each item consists of two parts: a string that represents the key and the actual data, referred to as the value, together forming a "key-value" pair. The value is usually a data type of a programming language or an object. These stores are similar to hash tables in which the keys are used as indexes, which can make lookups by key faster than in an RDBMS. The data model is therefore simple: a map or dictionary that lets the user request values by the specified key. Modern key-value stores favor high scalability over consistency, so ad-hoc querying and analytics features such as joins and aggregate operations have been omitted. Key-value stores provide high concurrency, fast lookups, and options for mass storage. One weakness of key-value stores is the lack of a schema, which makes it much more difficult to create custom views of the data.

The key-value store is the simplest type of NoSQL database. Every data element is stored as a key-value pair consisting of an attribute name (or "key") and a value. In a sense, a key-value store is like a relational database with only two columns: the key or attribute name (such as state) and the value (such as Alaska). The key is usually a simple string of characters, and the value is an uninterpreted sequence of bytes that is opaque to the database. The data itself is usually a primitive data type (string, integer, array) or a more complex object that an application needs to persist and access directly.
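A minimal in-memory Python sketch of this model follows. `KeyValueStore` is a hypothetical class backed by a Python dict (a hash table); it accepts only `bytes` values to mirror the idea that values are opaque to the store, so the application serializes its own objects (here with JSON). The `state:AK` key naming is just an illustrative convention.

```python
import json
from typing import Dict, Optional


class KeyValueStore:
    """Minimal in-memory key-value store: string keys, opaque byte values."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}  # a Python dict is a hash table

    def put(self, key: str, value: bytes) -> None:
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("values must be bytes; serialize objects first")
        self._data[key] = bytes(value)

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)  # lookup by key only; no query on values

    def delete(self, key: str) -> bool:
        return self._data.pop(key, None) is not None


store = KeyValueStore()
# The application serializes its object; the store never looks inside the bytes.
store.put("state:AK", json.dumps({"name": "Alaska", "capital": "Juneau"}).encode("utf-8"))
state = json.loads(store.get("state:AK"))
print(state["capital"])  # Juneau
```

Because the store only understands keys, a question such as "which states have the capital Juneau?" would require scanning and decoding every value in application code, which is the weakness around custom views mentioned above.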

Column-Oriented Databases

Column stores in NoSQL are actually hybrid row/column stores, unlike purely columnar relational databases. Although they share the concept of column-by-column storage with columnar databases and columnar extensions to row-based databases, they do not store data in tables but in massively distributed architectures. In column stores, each key is associated with one or more attributes (columns). A column store lays out its data so that it can be aggregated rapidly with less I/O activity, and it offers high scalability in data storage. Stored data is ordered according to the sort order of the column family. Column-oriented databases are suitable for data mining and analytic applications, where this storage method suits the common operations performed on the data.

Column-based (also called ‘wide-column’) models enable very quick data access using a row key, column name, and cell timestamp. Because the schema is flexible, the columns don't have to be consistent across records: you can add a column to specific rows without adding it to every record. The wide-column data model, like that found in Apache Cassandra, is derived from Google's Bigtable paper.
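The row key / column name / timestamp addressing can be sketched in Python as nested dictionaries. `WideColumnTable` is a hypothetical, in-memory class: real wide-column systems persist sorted data to disk and distribute rows across nodes, which this sketch omits. It keeps every timestamped version of a cell, as Bigtable can; Cassandra, by contrast, mainly uses cell timestamps to resolve conflicting writes (last write wins).

```python
from collections import defaultdict
from typing import Any, List, Optional


class WideColumnTable:
    """Toy wide-column table: row key -> column name -> timestamped versions."""

    def __init__(self) -> None:
        # Rows and columns are created on first write, so each row stays sparse.
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key: str, column: str, value: Any, timestamp: int) -> None:
        versions = self._rows[row_key][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda version: version[0])  # keep oldest -> newest

    def get(self, row_key: str, column: str, as_of: Optional[int] = None) -> Any:
        """Return the newest value written at or before `as_of` (latest if None)."""
        versions = self._rows.get(row_key, {}).get(column, [])
        for timestamp, value in reversed(versions):
            if as_of is None or timestamp <= as_of:
                return value
        return None

    def columns(self, row_key: str) -> List[str]:
        return sorted(self._rows.get(row_key, {}))


table = WideColumnTable()
table.put("user#1", "email", "a@example.com", timestamp=100)
table.put("user#1", "email", "ada@example.com", timestamp=200)
table.put("user#1", "city", "London", timestamp=100)
table.put("user#2", "phone", "555-0100", timestamp=150)  # different columns per row

print(table.get("user#1", "email"))             # ada@example.com
print(table.get("user#1", "email", as_of=150))  # a@example.com
print(table.columns("user#2"))                  # ['phone']
```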

Graph databases

Graph databases store data in the form of a graph. The graph consists of nodes and edges, where nodes represent objects and edges represent the relationships between them; nodes can also have properties. Many graph databases use a technique called index-free adjacency, meaning that every node holds direct pointers to its adjacent nodes, so millions of records can be traversed without index lookups. In a graph database, the main emphasis is on the connections between data. Graph databases provide schema-less, efficient storage of semi-structured data. Queries are expressed as traversals, which can make relationship-heavy queries faster than the equivalent joins in a relational database. The model is whiteboard friendly, and graph databases are often described as easy to scale. Many graph databases, such as Neo4j, are ACID compliant and support rollback.

The modern graph database is a data storage and processing engine that makes the persistence and exploration of data and relationships more efficient. In graph theory, structures are composed of vertices and edges (data and connections), or what would later be called "data relationships." Graphs mirror the way people think about specific relationships between discrete units of data. This database type is particularly useful for visualizing and analyzing connections between different pieces of data, or for helping you find them. As a result, businesses use graph technologies for recommendation engines, fraud analytics, and network analysis. Examples of graph-based NoSQL databases include Neo4j and JanusGraph.
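Index-free adjacency and traversal-style queries can be illustrated with a short Python sketch. Each hypothetical `Node` object stores direct references to its neighbours, so a traversal follows those references instead of looking rows up in an index, and `traverse` is a plain breadth-first search limited to one relationship type. This is a teaching model only; it is not how Neo4j or JanusGraph store data internally, and their query languages (such as Cypher or Gremlin) are not shown.

```python
from collections import deque
from typing import List


class Node:
    """A graph node holding its properties and direct references to neighbours."""

    def __init__(self, name: str, **properties) -> None:
        self.name = name
        self.properties = properties
        self.edges = []  # list of (relationship type, Node) pairs

    def connect(self, relationship: str, other: "Node") -> None:
        self.edges.append((relationship, other))


def traverse(start: Node, relationship: str, max_hops: int) -> List[str]:
    """Breadth-first traversal following one relationship type up to max_hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for rel, neighbour in node.edges:  # follow pointers; no index lookup
            if rel == relationship and neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour.name)
                frontier.append((neighbour, hops + 1))
    return found


alice, bob, carol, dave = (Node(n) for n in ("alice", "bob", "carol", "dave"))
alice.connect("FRIEND", bob)
bob.connect("FRIEND", carol)
carol.connect("FRIEND", dave)
alice.connect("WORKS_WITH", dave)

# People within two FRIEND hops of Alice, e.g. candidates for a recommendation.
print(traverse(alice, "FRIEND", max_hops=2))  # ['bob', 'carol']
```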

Why NoSQL databases?

  • Support distributed database architectures. 
  • Provide high scalability, high availability, and fault tolerance. 
  • Support large amounts of sparse data.
  • Are geared toward performance rather than transactional consistency.
  • Often store data as key-value pairs.
  • NoSQL databases generally provide flexible schemas that enable faster and more iterative development.
  • The flexible data model makes NoSQL databases ideal for semi-structured and unstructured data.
  • NoSQL databases are generally designed to scale out by using distributed clusters of hardware instead of scaling up by adding expensive and robust servers.
  • Some cloud providers handle these operations behind the scenes as a fully managed service.
  • NoSQL databases are optimized for specific data models and access patterns, which enables higher performance than trying to accomplish similar functionality with relational databases.
  • NoSQL databases provide highly functional APIs and data types that are purpose built for each of their respective data models.

Characteristics of NoSQL databases

NoSQL avoids:

  • Overhead of ACID transactions
  • Complexity of SQL queries
  • Burden of up-front schema design
  • Dependence on a DBA
  • Transactions (these must instead be handled at the application layer)

Provides:

  • Easy and frequent changes to DB
  • Fast development
  • Large data volumes (e.g., Google)
  • Schema-less design
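The "Transactions" bullet above leaves consistency to the application. A common application-level technique is optimistic concurrency: read a value together with its version, then write only if the version is unchanged, retrying otherwise. The Python sketch below uses a hypothetical in-memory `VersionedStore` whose `write_if_version` method stands in for the conditional or compare-and-set writes that many NoSQL stores offer in some form; it covers a single key only and is not a multi-key transaction.

```python
import threading
from typing import Any, Tuple


class VersionedStore:
    """In-memory store whose single-key writes can be made conditional on a version."""

    def __init__(self) -> None:
        self._data = {}                # key -> (version, value)
        self._lock = threading.Lock()  # makes each read and conditional write atomic

    def read(self, key: str) -> Tuple[int, Any]:
        with self._lock:
            return self._data.get(key, (0, None))

    def write_if_version(self, key: str, expected_version: int, value: Any) -> bool:
        """Write only if nobody has written the key since `expected_version` was read."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False
            self._data[key] = (current_version + 1, value)
            return True


def add_to_balance(store: VersionedStore, key: str, amount: int,
                   max_retries: int = 5) -> bool:
    """Read-modify-write loop handled entirely in the application layer."""
    for _ in range(max_retries):
        version, balance = store.read(key)
        if store.write_if_version(key, version, (balance or 0) + amount):
            return True  # no concurrent writer interfered
    return False  # gave up after repeated conflicts


store = VersionedStore()
add_to_balance(store, "account:42", 50)
add_to_balance(store, "account:42", -20)
print(store.read("account:42"))  # (2, 30)
```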

Future Prospects for NoSQL

Although NoSQL has evolved at a very high pace, it still lags behind relational databases in number of users. The main reason is that users are more familiar with SQL, while NoSQL databases lack a standard query language. If a standard query language for NoSQL were introduced, it could be a game changer. A few cloud DBaaS providers, such as Xeround, work on a hybrid database model: they offer familiar SQL at the front end and NoSQL at the back end. These databases might not be as fast as a pure NoSQL database, but they provide features of both relational and NoSQL databases to the user, so many of the disadvantages of each may be offset. With further advances in this hybrid architecture, the future prospects for NoSQL databases in DBaaS look promising.

Advantages of NoSQL

  • Provides a wide range of data models to choose from
  • Easily scalable
  • Database administrators are often not required
  • Some NoSQL databases, such as Riak and Cassandra, are designed to handle hardware failures
  • Faster, more efficient, and more flexible
  • Has evolved at a very high pace

Disadvantages of NoSQL

  • Immature
  • No standard query language
  • Some NoSQL databases are not ACID compliant
  • No standard interface
  • Maintenance is difficult