Handling data consistency in a Document Database

You know what an invoice looks like.  We have all seen thousands of them - probably too many. But invoices illustrate an interesting problem - and I am not talking about your VAT  return. I am talking about the new kids on the database block - document databases.  An example is MongoDB. These can store all the data for one invoice in a single structured record (called a 'document').   

In a traditional Relational Database, the data is held in flat tables, like big spreadsheets. Invoices will be stored in two tables, one for information about the invoice, like customer and date (call it the 'invoice header'). The second table will store all the invoice lines. One row per line. The lines will be linked back to their invoice by an invoice number.  But what happens if the invoice header disappears, but the invoice lines are still there?

For example, invoice 476 once existed, but now it is gone. The invoice line records are orphaned.  This problem is a big deal because now the tables don't balance. Sales by product (which will come from the invoice lines table) will not balance with the total invoices. Your accountant will have a fit of the vapours and CTO's have been fired for less.

The problem with invoice 476 may have happened because 
  1. the software is designed to delete the invoice lines automatically when you delete the invoice but crashed halfway through the transaction, or
  2. the application software allows a user to delete the invoice header without deleting the lines, and someone did it. 
Traditional relational database management systems (RDMS) have ways for guarding from both of these issues. 

Problem (1) requires something called 'transaction commit'. This 'rolls back' the database updates if a transaction doesn't complete. This is common in classic relational database systems and catered for by MongoDB since version 4.0 (2018).   

Problem (2) however is a bit more complicated. This requires something called 'referential integrity'. The RDMS knows about the relationship between these records and does not allow deletes in this situation.  As far as I can see MongoDB does not provide this.   

Ahah! Says the MongoDB expert. That is not an issue because you would not structure the data that way. MongoDB allows you to include the invoice lines and invoice header all in the same record (they call it a document). So that you delete the invoice, including the header and lines in one easy movement.

So, this is not an issue - right? 

Kind of. This might remove the issue with invoice lines in our example. But what happens if someone deletes the customer record. Or a product record.  And so on and so on. Real-life database structures tend to be complicated.

Your application must check for referential integrity and not rely on the database system.  You should do this anyway, whatever the database system. But it is very hard to test for, and it is often forgotten. So, testers beware.  

No comments:

Post a Comment