NoSQL gets a lot of “heat” for not having a good, direct definition, and the term itself only gives some clues about what it is not.
NoSQL is a fairly new label for something that is a database but differs from the usual relational model. We can draw a parallel by going back 30 years: back then, probably no one knew what a relational database was either. Let’s use this and start with a short historical and motivational view of NoSQL. PS: I was not a developer back then.
This article is the first part of a series. The goal is to gather some knowledge about this topic.
Starting in the 1980s
Relational databases emerged, bringing the ACID properties: Atomicity, Consistency, Isolation and Durability. These properties are taken for granted nowadays. They also brought the SQL language. Although there are different flavors, SQL is common enough across systems that it can be treated as a standard in practice.
Relational databases also allowed for a simple and very common integration mechanism between two or more systems: data can be shared easily by reading or writing rows in a table on a shared relational database. This is still a very common integration pattern today.
In the 1990s
Object databases started to gain strength, although they had been around for some time. Most importantly, they implement a different database paradigm, one that tried to solve the impedance mismatch problem. The impedance mismatch arises with relational databases, from the need to map the objects our application holds in memory to tables in a relational database. This page https://en.wikipedia.org/wiki/Object-relational_impedance_mismatch is a good reference for this issue.
Saving an application object into this database model should be a direct operation: conceptually, no mapping is necessary when an object database is used. Despite these gains, object databases were not able to replace relational databases.
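To make the mapping problem a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the Order and OrderLine classes, the table layout, and the plain dict that stands in for a store that keeps an object whole. It is not how any particular object database works; it only contrasts splitting one object across two tables with saving it in one piece.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class OrderLine:
    product: str
    quantity: int


@dataclass
class Order:
    order_id: int
    customer: str
    lines: List[OrderLine] = field(default_factory=list)


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical layout: one table per "level" of the object.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute(
        "CREATE TABLE order_lines ("
        "order_id INTEGER REFERENCES orders(id), product TEXT, quantity INTEGER)"
    )


def save_relational(conn: sqlite3.Connection, order: Order) -> None:
    # The single in-memory object has to be taken apart into rows.
    conn.execute(
        "INSERT INTO orders (id, customer) VALUES (?, ?)",
        (order.order_id, order.customer),
    )
    conn.executemany(
        "INSERT INTO order_lines (order_id, product, quantity) VALUES (?, ?, ?)",
        [(order.order_id, line.product, line.quantity) for line in order.lines],
    )


def load_relational(conn: sqlite3.Connection, order_id: int) -> Order:
    # ...and put back together again when it is read.
    customer = conn.execute(
        "SELECT customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT product, quantity FROM order_lines WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    ).fetchall()
    return Order(order_id, customer, [OrderLine(p, q) for p, q in rows])


def save_whole(store: Dict[int, str], order: Order) -> None:
    # The dict is a stand-in for a store that keeps the object in one piece.
    store[order.order_id] = json.dumps(asdict(order))


def load_whole(store: Dict[int, str], order_id: int) -> Order:
    raw = json.loads(store[order_id])
    lines = [OrderLine(**line) for line in raw["lines"]]
    return Order(raw["order_id"], raw["customer"], lines)


if __name__ == "__main__":
    order = Order(1, "Ana", [OrderLine("book", 2), OrderLine("pen", 5)])

    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    save_relational(conn, order)
    print(load_relational(conn, 1))

    store: Dict[int, str] = {}
    save_whole(store, order)
    print(load_whole(store, 1))
```

Both paths return an equal Order, but the relational one needs a schema, two inserts and two queries that must be kept in sync with the class definition. That extra mapping code is the mismatch.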
In the 2000s
As internet availability grew, two things started to have a big impact:
- The generation of enormous amounts of data that had to be stored and processed;
- Access from anywhere on the globe became more and more frequent, so latency became an issue to consider. Even the speed of light contributes significantly here, because data can now be stored far from where we are. For example, a distance of 10,000 km adds about 100 ms to each round trip over optical fiber. Therefore, data needs to be spread around the globe to provide quick access.
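The round-trip figure in the list above can be checked with a few lines of Python. This is a back-of-the-envelope sketch: it assumes that light in optical fiber travels at roughly two thirds of its speed in vacuum, and it ignores routing, switching and processing delays, which only add to the total.

```python
SPEED_OF_LIGHT_VACUUM_KM_S = 299_792.458
# Assumption: signal speed in optical fiber is roughly 2/3 of the speed in vacuum.
FIBER_SPEED_KM_S = SPEED_OF_LIGHT_VACUUM_KM_S * 2 / 3


def round_trip_ms(distance_km: float, speed_km_s: float) -> float:
    """Time in milliseconds for a signal to travel the distance and come back."""
    return 2 * distance_km / speed_km_s * 1000


if __name__ == "__main__":
    distance_km = 10_000
    print(f"vacuum: {round_trip_ms(distance_km, SPEED_OF_LIGHT_VACUUM_KM_S):.1f} ms")
    print(f"fiber:  {round_trip_ms(distance_km, FIBER_SPEED_KM_S):.1f} ms")
```

In vacuum the round trip takes a little under 67 ms; with the fiber assumption it comes out at about 100 ms. No amount of hardware removes this delay, which is why placing data closer to users matters.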
In this environment, relational databases do not always meet customer needs, because of flexibility, price or performance issues:
- Flexibility, because relational databases enforce a rigid set of rules in the relational model, which constrains application development;
- Price, because of the need for more powerful machines and the software licenses that go with them;
- Performance, because relational databases do not naturally keep up with the increasing workload. Huge amounts of CPU, memory, storage and throughput are needed to perform in this environment, and for the relational model vertical scaling can only get you so far.
Scaling Vertically versus Horizontally
The usual scaling method for relational databases is a vertical one: whenever the workload increases, we move to a bigger machine with more hardware resources.
On the other hand, the big internet companies adopted a horizontal scaling paradigm. This is a cluster environment with lots and lots of machines; in other words, more machines are added to handle bigger workloads.
Relational databases do not thrive in this paradigm, because they hardly benefit from using more machines. Hence, NoSQL-like databases arose to take advantage of horizontal scaling. Companies like Google and Amazon started researching this area. As a result, Google created BigTable and Amazon created Dynamo, the design that later inspired DynamoDB.
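The idea of spreading data over many machines can be sketched in a few lines of Python. This is a deliberately naive illustration, not how BigTable or Dynamo work: the ShardedStore class and the node names are hypothetical, each "machine" is just an in-memory dict, and keys are placed with a simple hash-modulo rule. Real systems generally use more elaborate schemes such as consistent hashing, because a plain modulo moves most keys whenever a node is added or removed.

```python
import hashlib
from typing import Any, Dict, List, Optional


def node_for(key: str, node_names: List[str]) -> str:
    """Pick the node that owns a key using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(node_names)
    return node_names[index]


class ShardedStore:
    """Hypothetical key-value store split across several in-memory 'machines'."""

    def __init__(self, node_names: List[str]) -> None:
        self._names = list(node_names)
        self.nodes: Dict[str, Dict[str, Any]] = {name: {} for name in self._names}

    def put(self, key: str, value: Any) -> None:
        self.nodes[node_for(key, self._names)][key] = value

    def get(self, key: str) -> Optional[Any]:
        return self.nodes[node_for(key, self._names)].get(key)


if __name__ == "__main__":
    store = ShardedStore(["node-a", "node-b", "node-c"])
    for i in range(1000):
        store.put(f"user:{i}", {"id": i})

    # Each node ends up holding only a part of the data set.
    for name, data in store.nodes.items():
        print(name, len(data))
    print(store.get("user:42"))
```

Every read or write touches exactly one node, so adding nodes adds capacity. Operations that span many keys, such as joins or multi-row transactions, are what make this layout hard for a relational database.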
Relational databases dominated the market, so why are NoSQL databases being used now?
I think this is important. Dominance and usage are usually the result of a combination of several factors. What has been changing:
- Developers hide databases behind integration layers. This makes the database simpler to use and also makes it easier to replace one database or persistence method with another;
- Cloud growth means that, in some cases, we can try different approaches with less infrastructure effort;
- Handling big quantities of data introduced new needs. NoSQL provides good development flexibility and can also take advantage of cluster solutions.
Finally, what is NoSQL?
Common NoSQL databases characteristics
NoSQL doesn’t have a clear, direct meaning. The term is often read as “Not Only SQL”. A more exact name would be “non-relational database”, which would emphasize the new model paradigm rather than the SQL language itself. The fact that some NoSQL databases actually support some form of SQL adds even more to the confusion.
But the name is catchy and so common that it will probably stay. The point is: defining NoSQL is hard. Therefore, we will do the next best thing and introduce the dominant traits of these database systems:
- Non-relational;
- Usually (but not always) cluster-friendly;
- Mostly open source;
- No schema / schema-less;
- Driven by the needs of the internet (cluster and big data friendly).
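To illustrate the schema-less trait from the list above, here is a small Python sketch. The insert and find helpers and the users collection are hypothetical and live purely in memory; they do not mirror the API of any specific NoSQL product. The point is only that two records with different fields can sit side by side without a table definition being changed first.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def insert(collection: List[Document], document: Document) -> None:
    """Store a copy of the document as it is; no schema is checked."""
    collection.append(dict(document))


def find(collection: List[Document], **criteria: Any) -> List[Document]:
    """Return the documents that have every given field with the given value."""
    return [
        doc
        for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    users: List[Document] = []
    insert(users, {"name": "Ana", "email": "ana@example.com"})
    # A second record with a different shape needs no migration.
    insert(users, {"name": "Rui", "twitter": "@rui", "tags": ["admin"]})

    print(find(users, name="Rui"))
    print(find(users, email="ana@example.com"))
```

The flexibility has a cost: since the store does not enforce a structure, the application has to cope with fields that may be missing.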
By these common traits, almost any non-relational database can be called a NoSQL database! There are probably published studies in all these areas dating back 30 or 40 years. Out of curiosity, the name NoSQL in its current sense is said to date from 2009: the term emerged as a Twitter hashtag for a meetup where people talked about these subjects, even though some of these traits were not new at the time.
Next steps
NoSQL has a lot to explore. It offers a more direct paradigm that can simplify specific needs, such as querying JSON objects inside SQL Server, along with the related performance considerations.
My plan for the next article(s) is to:
- Present some of the NoSQL data models;
- Talk about the no-schema or schema-less feature;
- Discuss the aggregate concept and how it relates to the CAP theorem.
Further reading
There is a lot of information online. Personally, I like Martin Fowler’s approach, and some of the information here comes from his content. You can check his website here: https://martinfowler.com/nosql.html. He also has a book: https://martinfowler.com/books/nosql.html.