[Software Architecture: The Hard Parts][Chapter 6] Pulling Apart Operational Data Part 2: Decomposing Monolithic Data
Introduction
Hello everyone! In this series of articles we're going to be going through decomposing monolithic data, decomposing monolithic databases is hard but one particularly effective technique for breaking apart data is to leverage what is known as the five-step process. This evolutionary and iterative process leverages the concept of a data domain as a vehicle for methodically migrating data into separate schemas, and consequently different physical databases.
Five Step Process
The five-step process is briefly as follows;
Analyze the database and create data domains
Assign tables to data domains and move tables
Separate database connections to data domains
Move schemas to separate database servers
Switch over to independent database servers
A data domain is a collection of coupled database artifacts- tables, views, foreign keys and triggers- that are all related to a particular domain and frequently used together within a limited functional scope.
The architect has to be able to know and understand the data domain boundaries for each domain and the cross-domain boundaries (foreign keys, views, stored procedures) that need to be broken
When extracting a data domain, these cross-domain dependencies must be removed. This means removing foreign-key constraints, views, triggers, functions and stored procedures between data domains.
When getting rid of cross-domain boundaries, each service must call the service responsible for the data instead so as not to violate the bounded context concept.
Analyze Database and Create Data Domains
The first step is to identify specific domain groupings within the database. At this point, all services should still be using one big monolithic database. This step might take some time and iteration to fully be able to identify the data domains in the database.
Assign Tables to Data Domains
The next step is to group tables along a specific bounded context, assigning tables that belong to a specific data domain into their schema. A schema is a logical construct in database servers
Schemas contain objects such as tables, views, functions and any database artifact.
When tables belonging to different data domains are tightly coupled and related to one another, data domains must necessarily be combined creating a broader bounded context where multiple services own a specific data domain.
Separate Database Connections to Data Domains
In this step, the database connection logic within each service is refactored to ensure that the services connect to a specific schema and have read and write access to the tables belonging only to their data domain.
This step is the hardest since all the cross-schema access must be resolved at the service level. Refactoring to call the service responsible for the data domain instead of accessing its database directly.
This state is known as data sovereignty per service which occurs when each service owns its data, It's considered the nirvana state for distributed architecture. It has its Benefits and shortcomings
Benefits:
Teams can change the database schema without worrying about affecting changes in other domain
Each service can use the database technology and database type best suitable for the use case
Shortcomings:
Performance issues occur when services need access to large volumes of data
Referential Integrity cannot be maintained in the database, resulting in possibly bad data quality
All database code (functions, stored procedures) that access tables belonging to other domains must be moved to the service layer.
Move Schemas to Separate Database Servers
Now what's left is to physically move the data domains to different databases. This is often necessary because while each service accesses its domain schema, it still creates a single architecture quantum since it's only one database.
When migrating you have two options; Backup and Restore or Replication
Backup and Restore
Typically you would backup the existing data of the schema and restore it on the new database, However, this would require downtime so data is synced.
Replication
We would replicate the data from the old database server to the new one, and switch over the connection afterwards. However, this approach requires some overhead to maintain the connection but no downtime is required.
Switch Over to Independent Database Servers
Once schemas are fully replicated, the service connections can be switched. This is the last step in getting the data domains and services to act as their independent deployable units.
Once the data domains have been physically separated, you can optimize the individual database servers for availability and scalability and also analyze the data to determine the most appropriate database to use. Introducing polyglot database usage in the system
Summary
Decomposing monolithic data is a complicated task that requires a lot of careful assessment and design. But with the use of the five-step process, it becomes more of a straightforward process which is still complicated but you'll always know the next task ahead.
That's it for this one hope you enjoyed it and till we meet again in the next one!