Non-Relational Data Models

A frequently asked question is what differentiates NoSQL databases from relational databases. Which technical features make NoSQL a better choice for high scalability and high productivity?

The two properties that make NoSQL database systems more powerful than relational databases are:

1. The key of the object (data record) is not part of the object itself. This identity key is not just unique per table/type but system-wide, and it is immutable.

2. They are multivalued, i.e. any attribute may be of a complex type. In particular, an attribute may contain a collection (set, array, list, map) of other objects or of references to other objects (via the global key).
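To make the two properties concrete, here is a minimal Python sketch. It assumes a plain in-memory dict standing in for the database; the `Customer` class, the `store` map and the `put` helper are hypothetical names chosen for illustration. The key is generated by the "system", lives outside the object, and is shared across all types, while the `addresses` attribute holds a collection rather than an atomic value.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Customer:
    # Property (1): no id field here; the identity is kept outside the object.
    name: str
    # Property (2): an attribute may hold a collection instead of an atomic value.
    addresses: list[str] = field(default_factory=list)

# Hypothetical system-wide store: one key space shared by every type.
store: dict[uuid.UUID, Any] = {}

def put(obj: Any) -> uuid.UUID:
    """Store obj under a fresh, system-wide key and return that key."""
    key = uuid.uuid4()  # uuid.UUID instances are immutable
    store[key] = obj
    return key

a = put(Customer("Alice", ["Main St 1", "Dock 7"]))
b = put(Customer("Alice", ["Main St 1", "Dock 7"]))
print(a != b)  # identical content, yet two distinct identities
store[a].name = "Alice Ltd"  # changing the content leaves the key untouched
```

Because identity does not depend on content, two objects with identical attribute values remain distinct, and renaming a customer does not change its address in the store.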

Simple enough, and at first sight not very exciting. If you come from an SQL background, you probably wonder what the hype is about. After all, you have working databases without these properties.

But these simple capabilities allow for a completely different design of database systems, and for a clearer modelling of the application domains that use them.

This is in contrast to SQL, where
- the identity, and therefore the address, of a row is determined by its content (its primary key values),
- the scope of a primary key is only a single table,
- a cell value (a column in a row) must be an atomic value.

What are the consequences of these two simple properties?

Property (1) allows for distribution. The database system can manage the key itself and use it to identify the node where an object is stored. It also makes it possible to cluster objects by their closeness to each other instead of by type (table). Finally, a system-wide key makes it painless to represent data models with inheritance.
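As a rough illustration of how a database could use the global key to locate an object, the Python sketch below maps a key to a node by hashing it. The node names and the `node_for` function are hypothetical, and simple hash-mod-N placement is an assumption made for brevity, not how any particular NoSQL product works.

```python
import hashlib
import uuid

# Hypothetical cluster; real systems manage membership dynamically.
NODES = ["node-a", "node-b", "node-c"]

def node_for(key: uuid.UUID, nodes: list[str] = NODES) -> str:
    """Map a global key to the node that stores its object (hash-mod-N placement)."""
    # Use a stable hash: Python's built-in hash() differs between processes for str/bytes.
    digest = hashlib.sha256(key.bytes).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

order_key = uuid.uuid4()
customer_key = uuid.uuid4()
# Objects of different types share one key space; placement ignores the type.
print(node_for(order_key), node_for(customer_key))
```

Hash-mod-N reassigns most keys whenever a node is added or removed; production systems commonly use consistent hashing or range partitioning instead, and clustering related objects together would require keys that carry some placement hint.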

Property (2) makes it much easier to store really complex objects (documents). It also does away with many of the problems associated with normalisation and denormalisation.

The combination of both makes the join obsolete and massively reduces the need for indices. An object can store a collection of references to other objects directly. Using this collection, the database can find the referenced objects and retrieve them directly from the right node in the storage network, without the need for a join. It also makes it possible to represent graphs directly.
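The following Python sketch shows what "a collection of references" means in practice. It assumes a dict as a stand-in object store and uses short strings such as `"p1"` instead of system-generated keys for readability; `resolve` and `reachable` are hypothetical helpers. Following references is a direct key lookup, and a breadth-first traversal over those collections walks the graph they form.

```python
from collections import deque

# Hypothetical object store: global key -> object (plain dicts for brevity).
store = {
    "p1": {"type": "Person", "name": "Ann", "knows": ["p2", "p3"]},
    "p2": {"type": "Person", "name": "Bob", "knows": ["p3"]},
    "p3": {"type": "Person", "name": "Cid", "knows": []},
}

def resolve(refs):
    """Fetch referenced objects by key: no join table, no secondary index."""
    return [store[k] for k in refs]

def reachable(start):
    """Breadth-first traversal of the graph formed by the 'knows' collections."""
    seen, queue = {start}, deque([start])
    while queue:
        for ref in store[queue.popleft()]["knows"]:
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen

print([p["name"] for p in resolve(store["p1"]["knows"])])
print(sorted(reachable("p2")))
```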

Let us look at a typical business example – order management:

Orders contain items, each consisting of a count and a product, and each order has a customer. A product may have several suppliers, and a supplier supplies several products.

In the relational world this requires six tables: Order, Order_Item, Product, Supplier, Product_To_Supplier_Link and Customer.
The SQL query is probably longer than the actual data returned, because you need five joins to retrieve the data.
You get a highly redundant flat list that repeats order, item, customer and product data for every supplier that can supply a product in the order.
Scaling this is hard because all six tables must be joined.
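To see the redundancy, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The schema and sample data are hypothetical; the Order table is named `orders` because ORDER is a reserved word in SQL, and the item count column is called `quantity`. Retrieving one order with its items, products, customer and suppliers takes five joins across the six tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customer(id));
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE order_item (order_id INTEGER REFERENCES orders(id),
                         product_id INTEGER REFERENCES product(id), quantity INTEGER);
CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product_to_supplier_link (product_id INTEGER REFERENCES product(id),
                                       supplier_id INTEGER REFERENCES supplier(id));

INSERT INTO customer VALUES (1, 'ACME Ltd');
INSERT INTO orders VALUES (10, 1);
INSERT INTO product VALUES (100, 'Bolt'), (101, 'Nut');
INSERT INTO order_item VALUES (10, 100, 5), (10, 101, 3);
INSERT INTO supplier VALUES (1000, 'S1'), (1001, 'S2');
INSERT INTO product_to_supplier_link VALUES (100, 1000), (100, 1001), (101, 1000);
""")

# One order, five joins.
rows = conn.execute("""
SELECT o.id, c.name, i.quantity, p.name, s.name
FROM orders o
JOIN customer c ON c.id = o.customer_id
JOIN order_item i ON i.order_id = o.id
JOIN product p ON p.id = i.product_id
JOIN product_to_supplier_link l ON l.product_id = p.id
JOIN supplier s ON s.id = l.supplier_id
WHERE o.id = 10
ORDER BY p.name, s.name
""").fetchall()

for row in rows:
    print(row)  # order and customer repeat on every row; 'Bolt' repeats per supplier
```

With two items and three product/supplier links, the result already has three rows, each repeating the order and customer data.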

If you use a NoSQL database, you need only four collections: Order, Product, Supplier and Customer.

The order items can be embedded directly in the order, because they depend entirely on it.

The Product_To_Supplier_Link table is not required: the Product object can embed a collection of references to the suppliers of this product, and the Supplier object can embed a collection of references to the products it supplies. No indices are required, because the references can be looked up directly.
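The same order can be sketched in the document model. This is a minimal Python illustration, assuming four plain dicts as stand-ins for the Order, Product, Supplier and Customer collections; `new_key` and `load_order` are hypothetical helpers. Items are embedded in the order, and the many-to-many relation between products and suppliers is expressed by reference collections on both sides.

```python
import uuid

def new_key() -> str:
    """Hypothetical system-generated global key."""
    return str(uuid.uuid4())

# Four collections, each a map from global key to document.
customers, products, suppliers, orders = {}, {}, {}, {}

acme, s1, s2, bolt, nut = (new_key() for _ in range(5))
customers[acme] = {"name": "ACME Ltd"}
# Many-to-many without a link table: both sides embed reference collections.
products[bolt] = {"name": "Bolt", "suppliers": [s1, s2]}
products[nut] = {"name": "Nut", "suppliers": [s1]}
suppliers[s1] = {"name": "S1", "products": [bolt, nut]}
suppliers[s2] = {"name": "S2", "products": [bolt]}

order_key = new_key()
orders[order_key] = {
    "customer": acme,
    # Items are embedded: they have no identity outside the order.
    "items": [{"count": 5, "product": bolt}, {"count": 3, "product": nut}],
}

def load_order(key: str) -> dict:
    """Assemble a nested view of one order by following references."""
    order = orders[key]
    items = []
    for item in order["items"]:
        product = products[item["product"]]
        items.append({
            "count": item["count"],
            "product": product["name"],
            "suppliers": [suppliers[s]["name"] for s in product["suppliers"]],
        })
    return {"customer": customers[order["customer"]]["name"], "items": items}

print(load_order(order_key))
```

The result is nested rather than flat, so the customer appears once and each supplier is listed once per product. One trade-off of storing references on both sides is that the application (or the database) has to keep the two collections consistent when a supplier relationship changes.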

I think that, besides better scalability and performance, the NoSQL variant is easier to program and models reality more closely.