How do I decide whether to go with a normalized vs. denormalized data structure for my application?
How do I decide whether to go with a normalized vs. denormalized data structure for my application?
First off, an example of a normalized model:
<model name="asset" label="Asset">
<field name="name" label="Name" type="text" />
<field name="customer_name" label="Customer Name" type="text:name" />
<field name="customer_address" label="Customer Address" type="text:name"/>
<field name="model_name" label="Model Name" type="text:name" />
<field name="model_number" label="Model Number" type="text:number" />
<display>{name}</display>
</model>
<model name="customer" label="Customer">
<field name="name" label="Name" type="text:name" />
<field name="address" label="Address" type="text:address" />
<display>{name}</display>
</model>
<model name="model" label="MSTR: Model">
<field name="name" label="Name" type="text:name" />
<field name="number" label="Number" type="text:number" />
<display>{name} {number}</display>
</model>
Now, the same model, denormalized:
<model name="asset" label="Asset">
<field name="name" label="Name" type="text" />
<belongs-to model="model" />
<belongs-to model="customer" />
<display>Model: {model} | Customer: {name}</display>
</model>
<model name="customer" label="Customer">
<field name="name" label="Name" type="text:name" />
<display>{name}</display>
</model>
<model name="model" label="MSTR: Model">
<field name="name" label="Name" type="text:name" />
<field name="number" label="Number" type="text:number" />
<display>{name} {number}</display>
</model>
In the normalized model, the user likely selects a customer
and a model
from dropdown lists, and the developer copies the relevant fields from each onto the asset
field. This means data between the asset
and the related models are not tied together. If, for instance, a customer
address is changed, the developer will have to manually update the text fields on the asset
, likely by manually running a script in production to retain data integrity.
In the denormalized model, the integrity of the data between asset
, model
and customer
is maintained. In the same scenario as described above, if the customer address is updated, the developer does not have to do any work to update the customers address on the asset
object. The address can simply be retrieved via asset.customer().address
.
A good rule of thumb is to denormalize the model if you are going to re-use the data (e.g. customer
has-many assets
). This will ensure data integrity, which is crucial for any operational database.
There are cases where normalized data models might be useful, e.g. for report objects where you care about the state of an object at some point in time (i.e. what the asset
and related belongs-to models were set to at the time of creating a PDF report, or something similar)
@MachielReyneke There is definitely a case for normalized models, especially when using objects for e.g. reporting, thanks for the comment!