While SQL has been the big dog since the 70s in terms of database management, NoSQL has really come into its own since the late 2000s. In fact, NoSQL has become a powerful and important tool for data analysts and data scientists.
To that end, we wanted to take a look at what NoSQL actually is, what its benefits over SQL are, and what data models it includes.
NoSQL differs fundamentally from SQL in the philosophy behind its design. Unlike SQL, which is built around a predefined design called a schema, NoSQL doesn’t actually need one. This makes NoSQL exceedingly flexible in how data is structured: you can have a variety of databases working together, or even different types of data in the same database.
Given this flexibility, one might think that NoSQL is simply better than SQL, but that’s far from the truth. The reality is that NoSQL is oftentimes used to both complement and supplement SQL databases. Each design philosophy has its uses, as well as advantages and disadvantages, which we’ll go over below.
In addition, Marie Starck, a previous contributor to this blog, shared her thoughts on how these two types of database compare:
SQL databases are also great for:
- Reliability: Most SQL databases offer some level of ACID compliance, which keeps data consistent even when an operation fails partway through
- Relationships: One of the main features of SQL databases is the ability to divide your data into tables and create relationships between them. For example, you could have customers, orders and products tables. By creating relationships between them, you can avoid duplicating data.
Unfortunately, the main advantage of SQL databases, their structured schema, also makes them complicated to change. Any changes to the columns or the relationships have to be made carefully. Scaling is also an issue, as SQL databases are typically scaled vertically, meaning you boost the database server itself rather than adding more servers.
Enter NoSQL databases. Contrary to SQL, data in NoSQL databases can be stored in various ways such as key-value pairs, documents (usually JSON) or even graphs. The main advantages of NoSQL databases are:
- Flexibility: NoSQL databases don't require a strict data model. Your schema can be dynamic and change.
- Scaling: Unlike SQL databases, NoSQL databases scale horizontally, meaning you can add servers and split your database across them.
- Performance: SQL databases often have to run complex queries, joining several tables, to gather all the data an application needs. NoSQL databases typically store data that is read together in one place, which can make those reads faster.
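To make the flexibility point concrete, here is a minimal sketch in Python. It assumes SQLite (through the standard `sqlite3` module) as a stand-in for a relational database and plain dictionaries as a stand-in for a document collection; the `users` table and the `twitter` field are hypothetical.

```python
import sqlite3

# Relational side: the columns are fixed by the table's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")

sql_error = None
try:
    # Storing a new attribute fails until the schema is changed.
    conn.execute("INSERT INTO users (id, name, twitter) VALUES (2, 'Grace', '@grace')")
except sqlite3.OperationalError as err:
    sql_error = err

# A migration is needed before the new attribute can be stored.
conn.execute("ALTER TABLE users ADD COLUMN twitter TEXT")
conn.execute("INSERT INTO users (id, name, twitter) VALUES (2, 'Grace', '@grace')")

# Document-style side: each document carries whatever fields it needs,
# so new records can gain a field without any migration.
users = [
    {"_id": 1, "name": "Ada"},
    {"_id": 2, "name": "Grace", "twitter": "@grace"},
]
with_handles = [u["name"] for u in users if "twitter" in u]
print(with_handles)  # ['Grace']
```

The trade-off is that nothing stops a misspelled field from slipping into a document, which is one reason some teams still apply a schema to their NoSQL data.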
In SQL (Structured Query Language), a query is a command or statement used to retrieve data from a relational database management system. Queries are how you communicate with the database: they request specific information or perform operations on the stored data. SQL is the standard language for interacting with relational databases, and queries are a crucial part of working with them.
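As an illustration, the sketch below builds the customers, orders and products tables from the relationships example earlier and runs one query that JOINs them. It assumes SQLite via Python's standard `sqlite3` module; the table layouts and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products  (id INTEGER PRIMARY KEY, title TEXT NOT NULL, price REAL NOT NULL);
-- Orders only reference customers and products, so product details
-- are stored once instead of being duplicated in every order.
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    product_id  INTEGER NOT NULL REFERENCES products(id),
    quantity    INTEGER NOT NULL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO products  VALUES (1, 'Keyboard', 50.0), (2, 'Monitor', 200.0);
INSERT INTO orders    VALUES (1, 1, 1, 2), (2, 1, 2, 1), (3, 2, 1, 1);
""")

# One query reassembles the related data with two JOINs.
query = """
SELECT c.name, SUM(p.price * o.quantity) AS total
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
JOIN products  AS p ON p.id = o.product_id
GROUP BY c.name
ORDER BY c.name
"""
totals = conn.execute(query).fetchall()
print(totals)  # [('Ada', 300.0), ('Grace', 50.0)]
```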
One of the primary advantages of NoSQL is that it scales horizontally, rather than vertically like SQL.
This matters because of how SQL databases are usually grown: the typical way to expand one is to buy a bigger, more expensive server. That can mean thousands and thousands of dollars spent on a single piece of hardware, ballooning project costs and often becoming a major limiting factor.
NoSQL, on the other hand, can be expanded by simply adding another server (a shard) and spreading the data across it. This avoids having to keep buying ever more expensive hardware and puts large databases within reach of most projects. It also allows near-limitless expansion, since the performance of a single machine has a ceiling but the number of machines does not (within reason, of course; all resources are finite).
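How data gets spread across shards varies by database. One common technique is consistent hashing, which Amazon's Dynamo design and Apache Cassandra both build on. Below is a minimal Python sketch of it; the `HashRing` class and the server names are hypothetical, not any product's API. Its key property is that adding a server only moves the keys that now belong to the new server.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable hash; Python's built-in hash() is randomized per process.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hashing ring that maps keys to shard servers."""

    def __init__(self, vnodes: int = 100):
        self.vnodes = vnodes          # points per server, to even out load
        self._points: list[int] = []  # sorted positions on the ring
        self._owner: dict[int, str] = {}

    def add_server(self, name: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{name}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = name

    def server_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring has no servers")
        # First server point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing()
for server in ("db-1", "db-2", "db-3"):
    ring.add_server(server)

keys = [f"user:{n}" for n in range(10_000)]
before = {k: ring.server_for(k) for k in keys}

ring.add_server("db-4")  # scale out by one server
moved = sum(before[k] != ring.server_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")
```

With four servers, about a quarter of the keys should move in expectation, all of them to `db-4`; the rest stay where they were, which is what makes adding servers cheap.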
Another great thing about NoSQL is that it can be highly optimized for querying. Whereas SQL queries often need to JOIN several tables together, NoSQL databases typically store related data together, so many reads need no JOINs at all. As a result, you avoid the resource strain that comes with carrying out a large number of JOINs, which also frees up resources for other work.
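For comparison with the relational customers/orders/products layout, here is a minimal sketch of the denormalized alternative, using a plain Python dictionary as a hypothetical document collection. Each customer document embeds its orders, so one key lookup returns everything without a JOIN.

```python
# Hypothetical denormalized "customers" collection: each document embeds
# its orders, with product details copied into every order.
customers = {
    1: {
        "name": "Ada",
        "orders": [
            {"product": "Keyboard", "price": 50.0, "quantity": 2},
            {"product": "Monitor", "price": 200.0, "quantity": 1},
        ],
    },
    2: {
        "name": "Grace",
        "orders": [{"product": "Keyboard", "price": 50.0, "quantity": 1}],
    },
}


def order_total(customer_id: int) -> float:
    # A single lookup fetches the customer and all their orders.
    doc = customers[customer_id]
    return sum(o["price"] * o["quantity"] for o in doc["orders"])


print(order_total(1))  # 300.0
```

Note that the keyboard's price now appears in two documents; if it changes, every copy has to be updated. That duplication is the price paid for JOIN-free reads.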
Finally, because NoSQL doesn’t necessarily require a schema, you may not need to hire an expensive specialist to design one up front. In a similar vein, projects don’t need to spend as much time and money on a schema design phase as they would with SQL.
That being said, NoSQL databases can certainly have a schema, and one can help a great deal; they just don’t need one to function.
Thankfully, the disadvantages of NoSQL are minimal, and most can be worked around.
For starters, data consistency in NoSQL is problematic, to say the least, compared to relational systems such as MySQL or Microsoft SQL Server. Those systems follow ACID principles (Atomicity, Consistency, Isolation, Durability) to keep transactions reliable, whereas the majority of NoSQL databases offer weaker guarantees, such as eventual consistency or “Check and Set” (compare-and-set) operations on individual records.
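To show what a “Check and Set” guarantee looks like, here is a minimal in-memory Python sketch. The `CasStore` class is hypothetical and not any particular database's API: each key carries a version number, and a write succeeds only if the version the client read is still current.

```python
import threading
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Versioned:
    value: Any
    version: int


class CasStore:
    """Toy key-value store with a per-key check-and-set (CAS) operation."""

    def __init__(self) -> None:
        self._data: dict = {}
        # One lock makes each individual call atomic within this process.
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[Versioned]:
        with self._lock:
            return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        with self._lock:
            current = self._data.get(key)
            version = current.version + 1 if current is not None else 1
            self._data[key] = Versioned(value, version)

    def cas(self, key: str, value: Any, expected_version: int) -> bool:
        """Write only if nobody changed the key since expected_version was read."""
        with self._lock:
            current = self._data.get(key)
            if current is None or current.version != expected_version:
                return False
            self._data[key] = Versioned(value, current.version + 1)
            return True


store = CasStore()
store.set("stock:keyboard", 10)

# Two clients read the same version, then both try to decrement it.
a = store.get("stock:keyboard")
b = store.get("stock:keyboard")
a_ok = store.cas("stock:keyboard", a.value - 1, a.version)  # succeeds
b_ok = store.cas("stock:keyboard", b.value - 1, b.version)  # rejected: stale version
print(a_ok, b_ok, store.get("stock:keyboard").value)  # True False 9
```

The losing client has to re-read and retry. CAS protects one key at a time, whereas an ACID transaction can update many rows as a single unit, which is why CAS is the weaker guarantee.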
Next, NoSQL does not normalize data to remove duplicates the way SQL does. SQL’s approach is actually a byproduct of the era it was created in, when hard drive space was small and exceedingly expensive (100GB could take up a whole, massive room). Thankfully, storage has since become one of the cheapest things you can buy in computing, besides the racks you place the electronics in.
As such, even though NoSQL can require considerably more storage than an equivalent SQL database, it’s not that big an issue, although it’s still worth considering.
Another, perhaps more substantial, disadvantage is that not only is NoSQL relatively new, but there are also dozens of data models. As a result, experts in the field can be hard to find, which isn’t the case for SQL. In fact, much of the knowledge you’ll rely on will probably come from the database’s own documentation or from niche, database-specific forums.
Thankfully, documentation is generally pretty good for NoSQL databases, so there shouldn’t be too many issues there.
Finally, while being able to mix and match databases is a big advantage, it’s also a double-edged sword: it adds a level of complexity that might not exist with SQL alone. For example, if your graph database isn’t suited to efficient range queries, you may need to add a second database just for those.
A key-value store does essentially what it says on the tin: every piece of data is stored as a value and retrieved by its key. The value is the actual data point itself, whereas the key is the pointer to that data. This is one of the most widely used NoSQL data models. Amazon’s own research was especially influential here: its 2007 Dynamo paper helped popularize the model and led to the development of DynamoDB.
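Python's standard library happens to ship a simple on-disk key-value store, the `dbm` module, so it makes a convenient illustration of the get/put/delete-by-key model. It is a local file store, not a distributed database like DynamoDB, and the `cart:…` keys below are hypothetical.

```python
import dbm
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "carts")
    # "c" opens the database, creating it if it doesn't exist yet.
    with dbm.open(path, "c") as db:
        # Keys are strings the application chooses; values are opaque
        # bytes to the store, so we serialize them ourselves.
        db["cart:1001"] = json.dumps({"items": ["Keyboard", "Monitor"]})
        db["cart:1002"] = json.dumps({"items": []})

        # Reads and deletes go through the key only; the store
        # cannot query inside a value.
        cart = json.loads(db["cart:1001"])
        del db["cart:1002"]
        still_there = "cart:1002" in db

print(cart["items"])  # ['Keyboard', 'Monitor']
print(still_there)  # False
```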
As you can imagine, because you don’t have to store information relationally, this type of database is made for high-performance requirements, such as the Amazon store.
Document stores hold each record as a self-contained document, typically in JSON or XML. In a relational database, the same information would usually be broken up across several tables and stitched back together with JOINs, which adds another layer of complexity. Since NoSQL doesn’t need information to be relational, each document can be stored and retrieved as a single unit, which can be much more resource-efficient. Interestingly enough, there are even XML-specific NoSQL databases.
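Actually, document stores are also sometimes considered a type of key-value store, although they provide a richer set of metadata extraction and processing: the database can look inside each document to index and query its fields, whereas a plain key-value store treats the value as an opaque blob.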
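The difference from a plain key-value store can be sketched in a few lines of Python. The `DocumentCollection` class below is hypothetical: documents are stored as JSON by id, like key-value pairs, but `find` can also match on fields inside them. Real document databases index those fields instead of scanning every document as this sketch does.

```python
import json


class DocumentCollection:
    """Toy document store: JSON documents addressed by id, queryable by field."""

    def __init__(self):
        self._docs: dict[str, str] = {}  # id -> serialized JSON document

    def insert(self, doc_id: str, doc: dict) -> None:
        self._docs[doc_id] = json.dumps(doc)

    def get(self, doc_id: str) -> dict:
        # Key-value style access by id.
        return json.loads(self._docs[doc_id])

    def find(self, **criteria) -> list:
        """Return documents whose top-level fields equal every criterion."""
        results = []
        for raw in self._docs.values():  # linear scan, for clarity only
            doc = json.loads(raw)
            if all(doc.get(field) == value for field, value in criteria.items()):
                results.append(doc)
        return results


books = DocumentCollection()
books.insert("b1", {"title": "Dune", "author": "Frank Herbert", "year": 1965})
# Documents in the same collection may have different fields.
books.insert("b2", {"title": "Children of Dune", "author": "Frank Herbert", "series": "Dune"})
books.insert("b3", {"title": "Neuromancer", "author": "William Gibson"})

titles = [d["title"] for d in books.find(author="Frank Herbert")]
print(titles)  # ['Dune', 'Children of Dune']
```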
As you may have guessed, column-oriented databases store information by column, rather than by row. It goes a bit deeper than that in the wide-column stores usually grouped under NoSQL, such as Apache Cassandra and HBase: data is organized into column families, and each row, identified by a key, can hold its own set of columns, so different rows don’t need to share the same ones.
Why is it built like that? Storing values column by column means a query that aggregates a single field, such as summing a price column, only has to read that column rather than every full row, which makes querying and general data aggregation much faster.
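The aggregation benefit can be illustrated with a small Python sketch that holds the same hypothetical orders in a row layout and a column layout. Python lists only mimic the idea; real column stores lay each column out contiguously on disk and often compress it.

```python
# The same small table of orders in two layouts (hypothetical data).
rows = [
    {"order_id": 1, "customer": "Ada", "price": 50.0},
    {"order_id": 2, "customer": "Ada", "price": 200.0},
    {"order_id": 3, "customer": "Grace", "price": 50.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


columns = to_columns(rows)

# Row layout: summing prices walks every full record.
total_from_rows = sum(r["price"] for r in rows)
# Column layout: only the "price" list is read; other columns are untouched.
total_from_columns = sum(columns["price"])
print(total_from_rows, total_from_columns)  # 300.0 300.0
```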
Object-oriented databases store information transparently: objects from your application are saved as they are, without first being mapped into tables, and that is really the main component of this data model. Of course, you can also store any type of data as an object, which makes the model versatile when you need to store and deal with multiple types of data.
This type of data model is great for things like research databases or web-scale projects.
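Python's standard `shelve` module gives a rough feel for storing objects transparently. To be clear, `shelve` is not an object-oriented database, just a persistent dictionary of pickled objects, but it shows an object going in and coming back out with no table mapping in between. The `Experiment` class and its data are hypothetical.

```python
import os
import shelve
import tempfile
from dataclasses import dataclass, field


@dataclass
class Experiment:
    name: str
    samples: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lab")
    # Store the object as-is under a key; no columns or tables involved.
    with shelve.open(path) as db:
        db["exp-1"] = Experiment("baseline", [0.91, 0.87], {"owner": "Ada"})

    # Reopen the store and get the same kind of object back.
    with shelve.open(path) as db:
        loaded = db["exp-1"]

print(loaded.name, loaded.samples)  # baseline [0.91, 0.87]
```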
With these types of databases, data is generally stored as a tree. This parent-child structure suits geospatial data and is fairly widely used in geo-tagging applications. Another well-known use of this model is the Windows Registry, which organizes its settings as a hierarchy of keys.
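A minimal Python sketch of the tree model, loosely imitating how registry-style keys are addressed by a path from the root. The `Node` class, `lookup` function and the `ExampleApp` key are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    values: dict = field(default_factory=dict)    # data stored at this node
    children: dict = field(default_factory=dict)  # child name -> Node

    def child(self, name: str) -> "Node":
        # Return the named child, creating it first if needed.
        if name not in self.children:
            self.children[name] = Node(name)
        return self.children[name]


def lookup(root: Node, path: str) -> Optional[Node]:
    """Follow a backslash-separated path from the root, parent to child."""
    node = root
    for part in path.split("\\"):
        node = node.children.get(part)
        if node is None:
            return None
    return node


root = Node("HKEY_CURRENT_USER")
root.child("Software").child("ExampleApp").values["Theme"] = "dark"

print(lookup(root, "Software\\ExampleApp").values["Theme"])  # dark
print(lookup(root, "Software\\Missing"))  # None
```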
Sometimes known as ‘Graph’, this type of database is exceptionally good at storing information that is naturally modeled as a graph, such as a social network.
Data is generally stored as nodes, with the links between nodes stored as relationships. These relationships can carry data of their own, as well as a direction, which helps you trace paths across sets of otherwise disparate data. As such, this type of database lets you suss out connections that might not otherwise have been immediately visible.
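Here is a minimal Python sketch of that model: nodes with properties, directed relationships that carry their own data, and a breadth-first search that traces a path. The people, company and relationship names are hypothetical. A real graph database indexes each node's relationships, so it would not scan the whole list as `neighbours` does here.

```python
from collections import deque

# Nodes carry properties.
nodes = {
    "ada": {"label": "Person"},
    "grace": {"label": "Person"},
    "alan": {"label": "Person"},
    "acme": {"label": "Company"},
}
# Relationships: (from, type, to, properties); direction matters.
relationships = [
    ("ada", "FOLLOWS", "grace", {"since": 2019}),
    ("grace", "WORKS_AT", "acme", {"role": "engineer"}),
    ("alan", "WORKS_AT", "acme", {"role": "analyst"}),
]


def neighbours(node):
    # Follow outgoing relationships only.
    return [dst for src, _type, dst, _props in relationships if src == node]


def shortest_path(start, goal):
    """Breadth-first search over directed relationships."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path("ada", "acme"))  # ['ada', 'grace', 'acme']
print(shortest_path("ada", "alan"))  # None: alan -> acme points the other way
```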
Not only that, but it’s also great for presenting threaded information. A well-known example is FlockDB, the graph database Twitter built to store its social graph.
This is a bit of a tricky question to answer, because there is no unified method for how NoSQL databases work. Even the idea of not needing a schema doesn’t always hold, since NoSQL databases can absolutely use a schema, and some implementations do. The only real common trait among NoSQL data models is that they aren’t relational.
Ultimately, NoSQL databases are more of a design philosophy that describes how specialized database management systems should be created. In fact, individual NoSQL databases have their own specializations and are made for very specific uses, rather than being general like SQL.
As you can see, NoSQL is both diverse and powerful but isn’t necessarily the end-all and be-all of database management. It does take some effort to identify what database to use, and then furthermore learn the specifics of how that database runs.
Nonetheless, NoSQL is an important part of our daily lives and knowing more about it can only make you a better programmer and/or data scientist.