5 Situational Design Patterns for Data Modelling

Martin ter Haak
17 min readMar 12, 2023

--

Photo by Vasilis Chatzopoulos on Unsplash

In the previous part, I presented a number of solutions to common data modelling problems. These so-called patterns speed up software development, lead to a more consistent implementation, and improve communication. I started with the most fundamental design patterns. If you have done any data modelling, you have most likely used them (maybe without realizing you’re applying a pattern!).

In this part of the series, we are continuing with the more situational design patterns. Their purpose only becomes clear in the niche challenges you may encounter. Also in some cases, they extend on one or more fundamental patterns from part 1, so be sure to check that out if you haven’t.

As before, I show multiple solutions for each pattern with increasing levels of flexibility from L0 to L4. You can think of L0 models as being the least flexible and L4 models being the most flexible. Flexibility also comes at a cost though, as you will see later. Here is a reminder of their meanings:

  • L0: solves the problem ad-hoc
  • L1: better, but still some downsides
  • L2: solves the problem in one place in a generic way so that future changes are easier to make
  • L3: allows reapplying pattern in the same place of the model without changing the schema
  • L4: allows reapplying pattern anywhere in the model without changing the schema

For a refresher on the used nomenclature and the diagram conventions, have a look at the Preliminary of part 1.

Property pattern

Problem: make the attributes of an entity type flexible.

Why would you want this? Well for one it could be that you don’t know upfront which attributes you need to capture for a certain entity type. The requirements are not clear and you want to leave the possibility open of changing the attributes later on.

Another reason is that you know that instances of the same entity type may have different attributes. For example, for one User entity you might know their middle name, birthday and link to their personal blog. But for another, you don’t have these or they are irrelevant. You could include all these properties as optional attributes in your data model, which means having a long list of sparse attributes on the User entity. Or you could apply a more generic approach of storing the properties as key-value pairs as you will see in a moment.

L0 solution

Property pattern L0 solution: properties modelled as regular attributes.

Let’s start with the simplest thing you can do, which is … nothing different I guess? I wouldn’t call this flexible at all but it’s still useful to show because there are cases to be made for this non-flexible model.

If you model every possible property as a separate attribute that means that any change to them will require a schema migration. As we know, schema migrations can be a pain. But they do have the benefit of forcing you to stay strict with the model. The name, datatype, cardinality and optionality of all properties are made explicit so you don’t need to check that, for example, a User has only one birthday. Data integrity is easily maintained by foreign keys, unique indices and other constraints in the database schema.

Another benefit of this L0 solution is that the model is very easy to understand. It’s clear what all the possible properties are. Also, you don’t need to perform potentially expensive joins to find out which property values a specific user has.

L1 solution

One drawback of the L0 solution is that property values are repeated. This means that if a value needs to be changed across entities, all those entities need updating. Now, normally this is not a big issue, but if it is, then this L1 solution helps out. It extracts the properties into a separate entity with their own IDs. Because of this, they may be reused by the subject entities.

Property pattern L1 solution: property values extracted into their own entity type and referenced using an ID.

This solution solves the problem of duplication in L0. But still has the issue that the number of properties for a subject is fixed. Also, the attribute names are embedded in the model and thus are not easily changeable. This is a major restriction if you need subject entities that may have a variable number of properties. Let’s move on…

L2 solution

The L2 solution that I present here solves the limitations of the L1 solution.

Property pattern L2 solution: a subject entity to whom zero or more property entities are linked.

By moving the foreign key relationship from the <Subject> entity to the <Property> entity, we have reversed the direction of the relationship. Each subject may now have any number of properties and properties can be easily added and removed as well. Furthermore, the property name has become its own attribute, which makes it changeable without a schema migration.

As you may have noticed, here we use what is generally called key-value pairs. This is a generic approach to setting values for arbitrary keys on your object.

What this model gains in flexibility, it loses in rigidness. The model no longer enforces that a subject must have certain properties, so you will need to account for this in your software. Essentially every property is now optional, and there can be multiple properties with the same name too (unless you set a uniqueness constraint on (<subject> id, name)).

Furthermore, we lost L1’s feature to reuse property values across multiple subjects. And the property name is repeated too.

L3 solution

For the L3 solution, we change the subject-property relationship to a many-to-many relationship by introducing the <Subject property> associative entity. The name attribute is kept in the <Property> entity, but the value is stored in the associative entity. This means that separate <Subject> entities can have different values for the same property. The name is deduplicated and may be easily changed too.

Property pattern L3 solution: <Subject property> associative entity between the subjects and properties.

This solution does not allow the reuse of property values. Also, values cannot be restricted to a limited set of possible values. If you require either, then you could consider extracting out the values to a <Value> entity, similarly to how the names were extracted to a <Property> entity. By the way, this is just an application of the Type pattern described in part 1, in case you’d like a more extensive explanation.

L4 solution

To meet the requirements of an L4 solution, we are looking for ways to make adding properties to any existing entity type without schema changes possible. For this, I came up with two approaches.

The first is to follow the loose relationship approach. By putting the subject entity name and subject ID in the <Subject property>, we can now associate a property to any entity in the model. Of course, because there is not a foreign key relationship anymore, maintaining data integrity is a challenge. For example, if you remove a <Subject> entity, you need to remove <Subject property> entities loosely linked to that subject as well.

Property pattern L4a solution: a loose relationship from <Subject property> to any subject.

This solution is not recommended in most cases. It just gets too messy when properties can belong to any entity type.

The second approach is to model all the data concepts the same. One entity type for all subjects, one for all properties, and one entity type for all associations. This model is extremely generic and thus great for data warehouses where you expect data of many different schemas to come in. To model the types of subjects and associations, there are the <Subject type> and <Association type> entity types.

Here we just combined the Type pattern L2 solution, Network pattern L2 solution (both from part 1), and the Property pattern L2 solution to achieve a highly generic and small model. Isn’t it beautiful how the patterns are starting to come together? The perk of composability!

Sadly, because this model is so generic, very few business rules are enforced by the model. Keep that in mind when considering this one.

Shapeshifter pattern

Problem: make the datatype of a property flexible.

There is a saying “the only constant is change”. And one of the most expensive aspects to change about your data model is the datatypes. Unless you have accounted for this from the start, changing a property’s datatype will require a schema migration. It may also be possible that instances of the same property need to have different datatypes, like sometimes a string and other times a number. For this problem, I present the Shapeshifter pattern.

L0 solution

The simplest thing you can do is to repeat the attribute multiple times for the same property with different datatypes. For each property, one attribute is non-null and the others are null.

Shapeshifter pattern L0 solution: attribute repeated for every expected datatype of property.

There are quite a lot of downsides to this approach. First, you need to know upfront which datatypes your property can possibly be in. Also, you can only use datatypes that are natively supported by the database. Moreover, you need multiple attributes on your entity type to store only one value. And finally, you need to somehow make sure that only one attribute is non-null. So plenty of opportunities to improve here.

L1 solution

All the issues of the L0 solution can be solved with the L1 solution that I am about to show you. However, for it to work, the value must be serialisable. That means that it can be represented in a string. The idea is to serialisable your value and store it in your model together with the name of the datatype. Then, to retrieve it, you deserialise it from string to the original datatype. In some cases, it could make more sense to serialise your value to bytes instead (for example, to encrypt your data at rest, save space or just because it’s your data’s original datatype).

Shapeshifer pattern L1a solution: property serialised to string and stored together with the datatype name.

Another option is to store the value in its JSON representation. This serialisation format can represent strings, numbers, maps, arrays, booleans and null. Bytes are not supported in JSON but you can always base64 encode them.

JSON is very popular and thus you don’t need to look far for a library that converts your data to it. In addition, there is no need to store the name of the datatype because that is encoded in the JSON syntax. Some database systems natively support a JSON column type. This is not required but it does make querying and manipulating JSON data a lot easier.

Shapeshifter pattern L1b solution: property serialised to JSON.

Storing the value in JSON has this drawback though: you may only use the datatypes supported by JSON. For example, if you’re looking to store a date, then you still need to store its datatype somewhere, just like in the L1a solution.

L2 solution

The L1 solution is nice because it allows you to easily model your property as many different datatypes. However, it only solves the problem once. If you know that your subject is going to need multiple properties with a dynamic datatype, then it’s not a good solution. For that, we need an L2 solution.

Shapeshifter pattern L2a solution: property extracted into its own entity type.

The <Property> entity type is extracted out while applying the same trick as in the L1a solution of storing the serialised value and its datatype. Also storing the value in JSON as in the L1b solution is possible.

Now that we have created a one-to-many relationship, a subject may have any number of properties with a dynamic datatype.

You will notice that this solution is very similar to the Property pattern L2 solution. This is not a coincidence. Both patterns aim to make properties more flexible. But while the Property pattern concerns itself with the names, cardinality, and optionality of attributes, the Shapeshifter pattern concerns itself only with the datatype of attributes.

Just to add, another option is to store all properties in a single JSON object on the entity. This is possible because a JSON object can hold multiple key-value pairs. It does mean that you need to rely on your database’s JSON capabilities to read and write a single property. Without it, you need to read and write all the properties at once.

Shapeshifter pattern L2b solution: all properties of one entity serialised to a single JSON object.

The L2b solution is extremely flexible for all the data you are allowed to store on every subject. But for that, it sacrifices the possible constraints you could set on the attributes. Whether that’s a show-stopper is situation-dependent.

L3 solution

A drawback of the L2 solution is that properties cannot be reused. So if you need to update some value, then you would need to find all properties with that value and update them. That issue can be solved by replacing the one-to-many relationship in the L2a solution with a many-to-many relationship, just like we did for the L3 solution of the Property pattern.

Shapeshifter pattern L3 solution: a many-to-many relationship between subjects and properties.

This solution makes the reuse of a property value possible. It’s as easy as adding a link from a <subject> id to an <property> id in the <Subject property> associative entity.

L4 solution

An L4 solution would not make sense for this pattern because your model would be too generic and you’d have no explicit relationships.

Temporal pattern

Problem: model information that is time-dependent.

We all know the world changes around us all the time. Some facts are only true during certain points or periods in time. In order to model this, we need to associate every bit of information with the time it is applicable to.

Usually, a fact is tied to either a single point in time when an event occurred or a time interval bounded by a start and end time (or just a start time and the assumption that the previous fact has been superseded — see image below). Information that applies to multiple discontinuous time periods is also possible but out of scope (though it is easy to achieve with a one-to-many relationship).

Above: a start and end point for each fact allows non-continuous and overlapping intervals during which they apply. Below: a single start point for each fact means their intervals must be continuous.

The concepts that can be made time-dependent are entities, properties and associations. Making an entity time-dependent is as simple as adding more attributes. For adding information to a property inspiration can be taken from the Property pattern. Adding time information to an association is essentially a specific application of the Context pattern from part 1.

L0 solution

I start by showing an L0 solution for a temporal property. That is done by simply adding extra attributes to the subject to denote when the property is valid for. In the diagram below I show both cases: time point and time interval.

Temporal pattern L0 solution for adding time to a property: extra attributes stored on the subject (left: time point, right: time interval).

This works and is easy to understand. However, it does mean having a bunch of extra attributes on your entities just to keep track of the times. Also, unless the attribute names make it obvious, it’s not clear which property each time corresponds to. And even worse, you cannot model multiple values for the same property over time.

L2 solution

We skip the L1 solution (because I couldn’t think of a sensible one) and go straight to L2. For L0 I showed a solution for a temporal property, so here I show a solution for a temporal context.

Because we are at L2, this solution adds time information to all the associations between subject and object. This can be achieved by a time attribute on the <Subject object> associative entity, for the case of a time point (time interval is similar).

Temporal pattern L2 solution for context: time stored on the associative entity.

I would normally recommend this level of genericity because compared to L0 it is more obvious which information the time relates to. L3 can be good if you want to reuse the same context or property. L4 is usually unnecessary and too flexible.

L3 solution

For completeness, I also add an L3 solution, for a temporal property again this time. It is based on the Property pattern L3 solution by utilising a many-to-many relationship between <Subject> and <Property> to allow properties to be reused. The only difference is the addition of the time attribute on <Subject property>.

Temporal pattern L3 solution for attribute: time stored with the value on the associative <Subject property> entity.

Even though you could argue that value and time are still duplicated, usually, it is undesirable to reuse them.

L4 solution

You could achieve an L4 solution using loose relationships but I advise against that because they make filtering and aggregations quite challenging (and you often want to make queries like “get me the information during this time interval”).

Changelog pattern

Problem: keep a historical record of the changes to the data.

This pattern is akin to the Temporal pattern. But while the Temporal pattern is concerned with the world time, the Changelog pattern cares about what you could call the “system time”.

These sound similar but there is an inherent difference. Imagine for a moment that you are managing a database containing the list of US presidents and their presidency term (i.e. world time). One day you realize that you made a mistake in one of the terms and it needs correction. So you make the change and also store the time at which you made the change. That timestamp is system time.

Difference between world time and system time

Saving the system time separately from the world time is useful because it allows you to go back in time and see what has changed in your database. That is essential for audit purposes, debugging and rolling back unintentional changes. For that reason, in addition to tracking the time when a change was made, we might also want to track who made the change, why, and other contextual information that will be helpful for later.

Just as in the Temporal pattern, we can decide the granularity at which we keep track of changes. This can be on the attribute level, entity level, or even for a group of entities and attributes. This choice depends on whether the changes happen at discrete points in time (a.k.a. “versions” or “snapshots”) or continuously. Also, storage and performance constraints need to be taken into account — storing a timestamp for each bit of information can become quite unwieldy.

I have seen this pattern applied only at the most specific L0 and most generic L3 and L4 levels. Because L4 applies the pattern to all entity types, while L3 applies the pattern to a single entity type, L3 can be deduced from L4. For that reason, I will only present the L0 and L4 solutions.

L0 solution

Here we apply the pattern to each change event that we want to track. For example, when an entity was created, last modified, or deleted. That might sound terrible because there are so many changes we could track. But it’s often the first step for all, but the largest upfront-designed systems, and in many cases, it’s good enough. Its main advantages are that it is simple to implement and very easy to understand. All you do is add attributes with specific names like created_at, created_by, created_from where you need them. This only works well for changes to full entities and not to specific attributes or relationships though.

At the point where the number of event types you would like to track becomes unwieldy, you could consider higher-level solutions. These solve the problem for many event types at once and so lead to a massively reduced model size. Also, they have the benefit of modelling all change events the same, which makes the processing and display of events much more straightforward.

But we are getting ahead of ourselves, here is a simple L0 solution of the Changelog pattern.

Changelog pattern L0 solution: extra attributes for every possible change.

L4 solution

The L4 solution which I am about to show is the most generic approach to creating a changelog. It essentially introduces a global registry for tracking all possible changes to all entities. If you need to remember everything that has changed in your database, look no further. I’m sure AWS uses some form of this solution.

Changelog pattern L4 solution: global changelog with loose relationships to the entities.

However, a challenge with this solution is consistently recording all changes that have occurred. In the L0 solution, there are mandatory attributes to be set on each entity (some databases set them for you if missing), which makes it difficult to forget. But in the L4 solution, the data model does not enforce the changelog to be updated on every change. So you need to add these guardrails yourself.

Party 🥳 pattern

Problem: model an entity type that can be either an individual or a group of individuals.

Let’s say you’re creating a data model for some legislation. In this model, you would like to define a “party” entity type that can be either a single individual or a group, like an organization.

Here I describe a few approaches for tackling this.

L1 solution

We are skipping the L0 solution and going right to L1. This is actually a good starting point. The solution that I am about to show satisfies the requirement for a single entity type. A subject can link to a party, which is either an organisation or an individual (not both!). Organisations are composed of individuals.

Party pattern L1 solution: a <Party> entity type with a mutually exclusive relationship to either an <Organisation> or an <Individual>.

As it’s L1 there are bound to be some limitations. In this case, a party can only be a single organisation or individual. But what if you would like to represent multiple organisations as a single party? Moreover, it is not possible for a party to consist of multiple individuals without grouping them in the same organisation.

L2 solution

The L2 solution solves some of the limitations of L1. It adds a many-to-many relationship between the party and its members. So a party can contain any number of organisations or individuals. In addition, the added <Organisation member> associative entity makes it possible for an individual to be part of multiple organisations.

Party pattern L2 solution: one many-to-many relationship between a party and its members, and another between the organisations and individuals.

This is a nice amount of flexibility in your data model but if it is not enough read on…

L3 solution

Here we extend the L2 solution in two ways. First, we add a <Subject party> associative entity to create a many-to-many relationship between subjects and parties. This is useful if you want the subject to be able to link to multiple parties. Second, we add a relation from <Party> to itself, which is also called a recursive relation (see Hierarchy pattern L2 from part 1). This allows a party to consist of other parties which in turn consist of even more parties, ad infinitum.

Party pattern L3 solution: L2 with improvements (the part of the model right from <Party member> is the same as L2 and left out for that reason).

The two improvements allow you to model the most complex party structures. It does, however, also require schema changes when new individual or organisation entity types need to be added. If you would like to avoid this, you need an L4 solution.

L4 solution

As you have probably noticed by now, L4 solutions can always be achieved with the use of loose relationships. As far as I know, this is the only way to create a relationship with a yet-to-be-modelled entity type, without schema changes. For this pattern, the idea of the L4 solution is to create a loose relationship from the party to its members.

I will leave it to your imagination to visualise this solution’s ER diagram.

Also, note that I would only recommend this approach if you expect many new individuals and organisations to be added to your data model. As mentioned before, the loose relationships make it challenging to maintain the integrity of your data model.

Conclusion

That’s a wrap folks. These are all the situational design patterns I could think of for now. When I learn more I will be sure to add them to this post. If you have good suggestions, I’ll be happy to add them as well.

In the future, I might write about data schemas as well. These are like the data modelling design patterns described, except that they are more often used to structure a whole database. Composability is less of a concern. They are especially good for storing data from highly different sources in a structured way. For that reason, they are popular for the design of data warehouses, whose purpose is storing and integrating heterogeneous data. Examples of data schemas are the star schema, snowflake schema, and the data vault schema.

--

--

Martin ter Haak
Martin ter Haak

Written by Martin ter Haak

The (occasionally sensible) thoughts of a senior software engineer

Responses (3)