Will NOSQ replace SQL?


NOSQL databases like MongoDB are making a big splash in the computer industry. But are they set to replace the tried and tested technology of the past fifty years?

Here is my take on it, with a bit of history thrown in.

7,000 years ago, the Mesopotamians invented bookkeeping.  Since then, records were kept in ledgers: sheets of paper divided into columns.  In 1890, Herman Hollerith invented the punched card and started a revolution in record-keeping.  Cards were divided into fixed columns, just like ledgers, but you could sort them and slice and dice the data in ways that the Babylonians could only dream of.  With computers, magnetic tapes replaced cards, but the processes were essentially the same.

Things changed in a big way in the 1960′s

Disk storage entered the world of commercial computing. The disk space was divided into files and each file divided into records. IBM invented ISAM (Indexed Sequential Access Method) which allowed the software to read the records individually or in sequence. You could also link records, so the customer number in a purchase order record, allowed you to find the customer record and add this data into the mix.  

This was formalised as the relational model in 1970 and eventually implemented in numerous database management systems such as MySQL or Oracle. Access to the data was standardised in a coding standard called SQL. But a Mesopotamian would recognise the concept. A relational table was a lot like a ledger. It had rows and columns just the same.

Seven millennia of record keeping and fifty years of the relational model are now being challenged by a new model. The generic name is NOSQL. The market leader is MongoDB, but Oracle have a product as well and there are two open-source systems (more of them in a future blog). 

There are different flavours of NOSQL, but in these systems, database tables of rows and columns have gone. Instead, there are collections of free-form records called 'documents' or 'aggregates'. In the relational model, an invoice record (say) has a separate file of invoice lines. In a document-based system the invoice document has all of the lines within it.

The terminology has changed, but this table is a rough translation between SQL and the most popular type of NOSQL (Document based). An open-source system (CouchDB) has dispensed with collections, and you just get a database of documents all mixed together - a bit like my desktop. 

It’s creating a big impression on the industry. I am adding MongoDB to my lockdown project sudsjs.com and I will report on that later.

Benefits claimed by MongoDB

  1. Handle large volumes of data at high speed with a scale-out architecture
  2. Store unstructured, semi-structured, or structured data
  3. Enable easy updates to schemas and fields
  4. Be developer-friendly

These are pretty weak.

1. There can be a performance benefit to document-based systems because it has fewer records to read for many queries. However, it is not suited to situations where you want to query data in all sorts of unpredictable ways. You get the invoice data in one read, but if you want to analyse sales by product you might have issues.

2. I have never seen unstructured put forward as a benefit in an IT situation before. A mechanic tells me he/she has an unstructured approach to fixing my car I am taking my car somewhere else.  

To be fair what they actually mean is data structure is not limited to rows of information. There are of course applications that this fits, but I don't recall ever needing this capability for routine commercial or banking systems.

3. MongoDB doesn't require a pre-set document structure (aka schema).  In some situations, this could be a game-changer, but for most this is not an issue. 

4. Developer-friendly - please.

Document based NOSQL systems come with huge benefits for some applications. For run-of-the-mill commercial applications, probably not so much. Look for applications that:

  • Deal with very high volumes
  • The queries you will make on the database are predictable
  • Record contents can vary from record to record (e.g., product specifications that could be different for different classes of products).


No comments:

Post a Comment