Change Data Capture (CDC) is a set of software design patterns used to identify and track changed data in a database so that action can be taken on those changes. CDC is also an approach to data integration, built on identifying, capturing, and delivering the changes made to enterprise data sources.
The CDC workflow is as follows: the change data is captured, transformed into a structure that the destination database supports, and finally loaded into the target database.
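The capture, transform, and load steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ChangeEvent shape, the first_name/last_name/full_name columns, and the dict standing in for the target database are all hypothetical and not part of any Postgres API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """One captured change (hypothetical shape, for illustration only)."""
    op: str               # "insert", "update", or "delete"
    key: int              # primary key of the changed row
    row: Optional[dict]   # new column values; None for a delete


def transform(event: ChangeEvent) -> ChangeEvent:
    """Reshape a source row into the structure the target expects.

    Assumption: the target stores a single full_name column instead of
    separate first_name and last_name columns.
    """
    if event.row is None:
        return event
    full_name = f"{event.row['first_name']} {event.row['last_name']}"
    return ChangeEvent(event.op, event.key, {"full_name": full_name})


def load(target: dict, event: ChangeEvent) -> None:
    """Apply one transformed change to the target (a dict stands in for it)."""
    if event.op == "delete":
        target.pop(event.key, None)
    else:
        # Inserts and updates are both applied as an upsert.
        target[event.key] = event.row


def run_pipeline(captured: list, target: dict) -> dict:
    """Transform and load every captured change, in order."""
    for event in captured:
        load(target, transform(event))
    return target


# Step 1 (capture) is simulated by a hand-written list of changes.
captured_changes = [
    ChangeEvent("insert", 1, {"first_name": "Ada", "last_name": "Lovelace"}),
    ChangeEvent("insert", 2, {"first_name": "Alan", "last_name": "Turing"}),
    ChangeEvent("update", 1, {"first_name": "Ada", "last_name": "King"}),
    ChangeEvent("delete", 2, None),
]
warehouse = run_pipeline(captured_changes, {})
print(warehouse)  # {1: {'full_name': 'Ada King'}}
```

Applying changes in the order they were captured matters: the update to row 1 and the delete of row 2 only produce the right result because they arrive after the corresponding inserts.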
PostgreSQL
Before looking at the various types of Postgres CDC, it helps to take a closer look at PostgreSQL itself. This relational database can handle workloads ranging from OLTP to data warehouse analytics. Generally, organizations use a relational database to handle transactional workloads and then perform aggregated reporting and analytics in a separate data warehouse.
The point to be careful about is that the data warehouse must always hold the latest version of the data in the transactional database. Time-bound reporting requirements are often not met by a daily or hourly batch sync between the databases. The best solution to this problem is a continuous sync between the two systems through Postgres CDC.
Benefits of Postgres CDC
PostgreSQL, also known as Postgres, is a widely used open-source relational database. So, what are the benefits of Postgres CDC?
First, change events can be captured in real time, so data warehouses and other downstream systems are kept in sync with PostgreSQL. Second, Postgres CDC reduces the load on Postgres, as only the relevant changes are processed. Finally, use cases that need access to the changes made to Postgres can be implemented efficiently without changing the application code.
Types of Postgres CDC
There are three types of Postgres CDC. Each is described below along with its strengths and weaknesses.
Postgres CDC – Trigger-based
In this form of Postgres CDC, triggers identify the Insert, Update, and Delete changes taking place in the table of interest. For every change identified, a row is inserted into a change table to build a changelog, and all change events are stored in the audit.logged_actions table.
Trigger-based Postgres CDC only captures changes inside the database itself. When change events have to be delivered to other data stores, the Postgres table that holds the changes has to be queried repeatedly, which can be a complex and tedious activity. Trigger-based Postgres CDC is supported in version 9.1 and later of the database.
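The changelog pattern described above can be tried out with a minimal sketch. To keep it runnable without a database server, it uses Python's built-in sqlite3 module as a stand-in for Postgres, so the trigger syntax is SQLite's: in Postgres the trigger would instead call a trigger function, typically written in PL/pgSQL. The customers and customers_changelog tables are hypothetical names chosen for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

-- Change table: one row per captured change event.
CREATE TABLE customers_changelog (
    change_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    operation  TEXT NOT NULL,
    row_id     INTEGER NOT NULL,
    old_name   TEXT,
    new_name   TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER customers_ins AFTER INSERT ON customers
BEGIN
    INSERT INTO customers_changelog (operation, row_id, new_name)
    VALUES ('INSERT', NEW.id, NEW.name);
END;

CREATE TRIGGER customers_upd AFTER UPDATE ON customers
BEGIN
    INSERT INTO customers_changelog (operation, row_id, old_name, new_name)
    VALUES ('UPDATE', NEW.id, OLD.name, NEW.name);
END;

CREATE TRIGGER customers_del AFTER DELETE ON customers
BEGIN
    INSERT INTO customers_changelog (operation, row_id, old_name)
    VALUES ('DELETE', OLD.id, OLD.name);
END;
""")

# Ordinary statements against the watched table; the triggers fire implicitly.
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")

changelog = conn.execute(
    "SELECT operation, row_id, old_name, new_name "
    "FROM customers_changelog ORDER BY change_id"
).fetchall()
for entry in changelog:
    print(entry)
```

Every statement against customers now performs an extra insert into the changelog, which is where the added execution time of trigger-based CDC comes from.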
Strengths
- Since any change can be captured immediately, real-time processing of the events is possible.
- Useful metadata is added to the change events automatically, such as the session user name and the transaction ID that led to the change.
Weaknesses
- As the execution time of the original statement is increased, trigger-based Postgres CDC hurts the performance of the database.
- For triggers to work, changes have to be made to the Postgres database.
- A separate pipeline has to be set up to poll the table populated by the trigger function when change events are synced with any data store other than the same Postgres database.
- Creating and managing triggers is a complex activity.
Postgres CDC – Query-based
In this approach, the table is queried repeatedly using a timestamp column that records the last time each row changed. Each query returns everything that has changed since the previous query.
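A minimal sketch of this polling loop is shown here, again using Python's built-in sqlite3 module as a stand-in for Postgres so that it runs without a server. The orders table, its updated_at column, and the poll_changes helper are hypothetical names, and the timestamps are set by hand to keep the example deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; the SELECT would look the same there.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
)


def poll_changes(conn, last_seen):
    """Return the rows modified after last_seen and the new high-water mark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # If nothing changed, keep the old mark so no row is skipped later.
    new_mark = rows[-1][2] if rows else last_seen
    return rows, new_mark


conn.execute("INSERT INTO orders VALUES (1, 'new', '2024-01-01T10:00:00')")
conn.execute("INSERT INTO orders VALUES (2, 'new', '2024-01-01T10:05:00')")
first_batch, mark = poll_changes(conn, "1970-01-01T00:00:00")

conn.execute(
    "UPDATE orders SET status = 'shipped', updated_at = '2024-01-01T11:00:00' "
    "WHERE id = 1"
)
conn.execute("DELETE FROM orders WHERE id = 2")
second_batch, mark = poll_changes(conn, mark)

# A third poll with no intervening changes returns nothing.
third_batch, mark = poll_changes(conn, mark)

print(first_batch)   # both inserted rows
print(second_batch)  # only the updated row 1; the deleted row 2 is not reported
print(third_batch)   # []
```

The second poll returns the update to row 1 but gives no sign that row 2 was deleted, and the third poll does work even though nothing changed.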
Strengths
- No changes need to be made to the database to implement query-based Postgres CDC, provided the schema already has a timestamp column that records when rows were modified.
Weaknesses
- The performance of the PostgreSQL database is affected, as the query layer is used to extract the data.
- Query-based CDC needs continuous polling of the monitored table, so resources are wasted when the data has not changed.
- A column that tracks the time each row was last modified is required.
- Unless soft deletes are used, query-based Postgres CDC cannot capture Delete changes.
Postgres CDC – Logical Replication-based
This type of Postgres CDC replicates data between different PostgreSQL instances, which can run on different systems. Introduced with version 9.4 of PostgreSQL, it is based on the write-ahead log (WAL) on disk, which records all events that change the data inside the PostgreSQL database, such as Insert, Update, and Delete.
Many database systems use a replication model. In Postgres, logical replication is enabled in the configuration file (by setting wal_level to logical), and the logged changes are then decoded by an output plugin. In versions older than 10, a decoding plugin has to be installed manually, as the built-in pgoutput plugin only arrived with version 10. Managed PostgreSQL services such as AWS RDS, Google Cloud SQL, and Azure Database also support logical replication.
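What a consumer does with decoded changes can be sketched in Python. The sample message below is hand-written in the general shape of the JSON produced by the wal2json output plugin (format version 1). Treat the field names as an assumption to check against the plugin's documentation, and decode_changes as a hypothetical helper. In a real setup the messages would be read from a replication slot, for example through the pg_logical_slot_get_changes function.

```python
import json

# Assumption: a simplified, hand-written message resembling wal2json output.
SAMPLE_MESSAGE = """
{"change": [
  {"kind": "insert", "schema": "public", "table": "customers",
   "columnnames": ["id", "name"], "columnvalues": [1, "Ada"]},
  {"kind": "update", "schema": "public", "table": "customers",
   "columnnames": ["id", "name"], "columnvalues": [1, "Ada L."],
   "oldkeys": {"keynames": ["id"], "keyvalues": [1]}},
  {"kind": "delete", "schema": "public", "table": "customers",
   "oldkeys": {"keynames": ["id"], "keyvalues": [1]}}
]}
"""


def decode_changes(message):
    """Turn one decoded transaction into a list of simple change events."""
    events = []
    for change in json.loads(message)["change"]:
        event = {
            "op": change["kind"],
            "table": f"{change['schema']}.{change['table']}",
        }
        # Inserts and updates carry the new row values.
        if "columnnames" in change:
            event["row"] = dict(
                zip(change["columnnames"], change["columnvalues"])
            )
        # Updates and deletes identify the affected row by its old key.
        if "oldkeys" in change:
            old = change["oldkeys"]
            event["key"] = dict(zip(old["keynames"], old["keyvalues"]))
        events.append(event)
    return events


events = decode_changes(SAMPLE_MESSAGE)
for event in events:
    print(event)
```

Unlike a polling query, the log carries the delete as an event of its own, identified by the key of the row that was removed.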
Strengths
- Downstream applications always have access to the current data from PostgreSQL, because log-based CDC captures changes to the data in real time.
- All changes such as Insert, Update, and Delete are recorded by this form of Postgres CDC.
- Logical replication-based Postgres CDC has little impact on the performance of the PostgreSQL database, as the changes are read directly from the log files on the file system rather than through queries.
Weaknesses
- Logical replication-based Postgres CDC is not supported by versions that are older than 9.4.
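When all three types of Postgres CDC are compared, logical replication comes out as the best: it captures every type of data change in real time, without the trigger overhead or repeated polling that the other two approaches require.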