In the modern digital business landscape, data security is a primary concern: confidential data must not be breached in any way. This caution is not limited to attacks by malicious actors; it also covers recording the changes made to a database in a way that preserves its historical value.
Several approaches have been tried in the past to achieve these goals, including complex queries, triggers, timestamps, and data auditing, but none fully met the requirements. Change Data Capture (CDC) technology was introduced to close this gap.
One widely used implementation is Microsoft's SQL Server CDC. In this post, we cover the main aspects of SQL Server CDC: its development by Microsoft, the concept behind it, how it works, and its types.
The Launch of Microsoft SQL Server CDC
In SQL Server 2005, capturing changes relied on “after insert”, “after update”, and “after delete” triggers defined on the source tables. Though this was a step in the right direction, users considered it too complex and intrusive.
Acting on this feedback, Microsoft introduced Change Data Capture as a built-in feature in SQL Server 2008, letting developers capture and archive changed data without additional programming. This form of SQL Server CDC became very popular and is still in use today.
The rest of this post examines this version of SQL Server Change Data Capture in detail.
Concept of SQL Server Change Data Capture
SQL Server CDC records the insert, update, and delete activity applied to a SQL Server table and makes the details of those changes available to users in a simple relational format. For each modified row, it also captures the column information and metadata needed to apply the changes to a target database or environment.
The recorded changes are kept in change tables that mirror the column structure of the tracked source tables. Consumers access the change data in a systematic way through table-valued functions.
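As a rough illustration of what range-based access to change data looks like, the Python sketch below models a change table as a list of rows and a query function that returns the changes within a range of log sequence numbers (LSNs). In SQL Server itself this role is played by generated functions such as `cdc.fn_cdc_get_all_changes_<capture_instance>`. The names `ChangeRow` and `get_all_changes`, the sample data, and the use of plain integers as LSNs are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRow:
    start_lsn: int   # simplified stand-in for the LSN of the change
    operation: str   # "insert", "update" or "delete"
    data: dict       # captured column values for the modified row


def get_all_changes(change_table, from_lsn, to_lsn):
    """Return the change rows whose LSN lies in [from_lsn, to_lsn], in LSN order."""
    if from_lsn > to_lsn:
        raise ValueError("from_lsn must not exceed to_lsn")
    selected = [row for row in change_table if from_lsn <= row.start_lsn <= to_lsn]
    return sorted(selected, key=lambda row: row.start_lsn)


# Hypothetical change table for a tracked "customers" table.
customers_ct = [
    ChangeRow(10, "insert", {"id": 1, "name": "Ada"}),
    ChangeRow(11, "insert", {"id": 2, "name": "Grace"}),
    ChangeRow(15, "update", {"id": 1, "name": "Ada L."}),
    ChangeRow(20, "delete", {"id": 2, "name": "Grace"}),
]

recent = get_all_changes(customers_ct, from_lsn=11, to_lsn=15)
print([(row.start_lsn, row.operation) for row in recent])
```

A consumer that remembers the last LSN it processed can call such a function repeatedly and pick up only the changes it has not yet seen.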
A typical consumer of Change Data Capture is an ETL (Extract, Transform, and Load) application. Change data is first extracted from the source database, transformed to match the structure of the target data repository, and finally loaded into the intended destination, typically a data warehouse or a data mart.
This differs from earlier approaches. Previously, keeping the warehouse copy of the source tables current typically meant refreshing a full replica of the source again and again. SQL Server CDC instead provides a steady stream of change data, structured so that consumers can apply it to dissimilar target representations of the data.
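The extract, transform, and load steps can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the extract step is stubbed out with a hard-coded list, the warehouse is a plain dictionary, and `transform` and `apply_changes` are hypothetical names. The operation codes are the ones SQL Server stores in the `__$operation` column of a change table.

```python
# Operation codes as stored in the __$operation column of SQL Server change tables.
DELETE, INSERT, UPDATE_BEFORE, UPDATE_AFTER = 1, 2, 3, 4


def transform(row):
    """Hypothetical transform: reshape a source row into the warehouse layout."""
    return {"customer_id": row["id"], "full_name": row["name"].strip().title()}


def apply_changes(changes, warehouse):
    """Apply extracted change rows, in order, to a dict keyed by customer_id."""
    for operation, row in changes:
        if operation == UPDATE_BEFORE:
            continue  # the pre-update image is not needed to refresh the target
        record = transform(row)
        key = record["customer_id"]
        if operation == DELETE:
            warehouse.pop(key, None)
        elif operation in (INSERT, UPDATE_AFTER):
            warehouse[key] = record
        else:
            raise ValueError(f"unknown operation code: {operation}")
    return warehouse


# Extract step, stubbed out: change rows as (operation, captured column values).
extracted = [
    (INSERT, {"id": 1, "name": " ada lovelace "}),
    (INSERT, {"id": 2, "name": "grace hopper"}),
    (UPDATE_BEFORE, {"id": 1, "name": " ada lovelace "}),
    (UPDATE_AFTER, {"id": 1, "name": "ada king"}),
    (DELETE, {"id": 2, "name": "grace hopper"}),
]

warehouse = apply_changes(extracted, {})
print(warehouse)
```

Because only the changed rows are processed, the load touches two customers here rather than re-reading the whole source table.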
Workflow of SQL Server Change Data Capture
Change Data Capture tracks and records all changes made to user tables and stores them in relational tables that can be queried with T-SQL. Whenever Change Data Capture is enabled on a database table, a change table mirroring the tracked table is created.
The source table and its change table have the same column structure, with one exception: the change table has additional metadata columns that identify the type of change made to each row. DBAs can use these change tables to audit the tracked tables and review the activity that has occurred since SQL Server CDC was enabled.
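To make the layout concrete, the sketch below builds the name and column list of a change table from a source table. The metadata column names and the `cdc.<schema>_<table>_CT` naming reflect SQL Server's defaults as we understand them; newer versions may add further metadata columns, and a custom capture instance name changes the table name. The helper functions themselves are hypothetical.

```python
# Metadata columns that SQL Server CDC adds to a change table.
CDC_METADATA_COLUMNS = [
    "__$start_lsn",    # commit LSN of the transaction that made the change
    "__$end_lsn",      # reserved by SQL Server
    "__$seqval",       # orders changes within a single transaction
    "__$operation",    # 1 = delete, 2 = insert, 3 = update (before), 4 = update (after)
    "__$update_mask",  # bit mask showing which captured columns changed
]


def change_table_name(schema, table):
    """Default change table name, assuming the default capture instance."""
    return f"cdc.{schema}_{table}_CT"


def change_table_columns(source_columns):
    """Metadata columns first, then the captured source columns."""
    return CDC_METADATA_COLUMNS + list(source_columns)


print(change_table_name("dbo", "customers"))
print(change_table_columns(["id", "name"]))
```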
The source of change data for SQL Server CDC is the SQL Server transaction log. Whenever inserts, updates, or deletes are applied to tracked source tables, entries describing them are added to the log. The capture process then reads the log and adds the details of each change to the change table associated with the tracked table.
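The log-reading step can be pictured with a small simulation. The sketch below is a simplified model, not SQL Server's actual capture job: the transaction log is a list of dictionaries in LSN order, LSNs are integers, and `capture` is a hypothetical function that copies entries for tracked tables into per-table change lists while remembering how far it has read.

```python
from collections import defaultdict


def capture(log, tracked_tables, change_tables, last_lsn=0):
    """Scan log entries newer than last_lsn and copy those that belong to
    tracked tables into the matching change table.
    Assumes the log is ordered by LSN. Returns the highest LSN processed."""
    for entry in log:
        if entry["lsn"] <= last_lsn:
            continue  # already processed in an earlier scan
        if entry["table"] in tracked_tables:
            change_tables[entry["table"]].append(
                {"lsn": entry["lsn"], "operation": entry["op"], "row": entry["row"]}
            )
        last_lsn = entry["lsn"]
    return last_lsn


transaction_log = [
    {"lsn": 1, "table": "customers", "op": "insert", "row": {"id": 1, "name": "Ada"}},
    {"lsn": 2, "table": "scratch", "op": "insert", "row": {"id": 9}},
    {"lsn": 3, "table": "customers", "op": "update", "row": {"id": 1, "name": "Ada L."}},
]

change_tables = defaultdict(list)
position = capture(transaction_log, {"customers"}, change_tables)

# A later change arrives; the next scan resumes from the saved position.
transaction_log.append(
    {"lsn": 4, "table": "customers", "op": "delete", "row": {"id": 1, "name": "Ada L."}}
)
position = capture(transaction_log, {"customers"}, change_tables, last_lsn=position)

print(position, [change["operation"] for change in change_tables["customers"]])
```

Note that the writes to the untracked `scratch` table appear in the log but never reach a change table.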
Types of SQL Server Change Data Capture
There are two main approaches to CDC on SQL Server: log-based and trigger-based. It is usually worth evaluating the first before turning to the second.
Log-based CDC: In this form of CDC, the system reads the database's transaction log files to identify changes made to the source database. These changes are then replicated to the target database.
Pros: This method is very reliable, since every logged change is captured and none is missed. It also has minimal impact on the production database system, as there is no need to alter the schema of the source tables.
Con: The main downside is that this method is fairly complex and is only available on databases that support reading changes from the transaction log.
Trigger-based CDC: In this form of CDC, triggers are defined on the database tables and fire whenever a change occurs.
Pros: The main benefit is that the cost of extracting the change data is quite low. It is also easy to implement: changes are captured as soon as they occur, a log of all changes is kept in shadow tables, and triggers are supported directly in the SQL API of many databases.
Cons: The main downside is that triggers add overhead and are sometimes disabled during certain operations, in which case changes go unrecorded. There is also an adverse impact on database performance, as every change to a row may require multiple writes to the database.
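The trigger-based approach is easy to try out. The sketch below uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for SQL Server, purely so that it runs without a server; SQL Server's trigger syntax differs (it exposes `inserted` and `deleted` pseudo-tables rather than `NEW` and `OLD`). The table and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Shadow table: one row per captured change.
CREATE TABLE customers_changes (
    change_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    operation  TEXT NOT NULL,
    id         INTEGER,
    name       TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER customers_after_insert AFTER INSERT ON customers
BEGIN
    INSERT INTO customers_changes (operation, id, name)
    VALUES ('insert', NEW.id, NEW.name);
END;

CREATE TRIGGER customers_after_update AFTER UPDATE ON customers
BEGIN
    INSERT INTO customers_changes (operation, id, name)
    VALUES ('update', NEW.id, NEW.name);
END;

CREATE TRIGGER customers_after_delete AFTER DELETE ON customers
BEGIN
    INSERT INTO customers_changes (operation, id, name)
    VALUES ('delete', OLD.id, OLD.name);
END;
""")

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")

changes = conn.execute(
    "SELECT operation, id, name FROM customers_changes ORDER BY change_id"
).fetchall()
print(changes)
```

Each statement against `customers` causes a second write to `customers_changes`, which is the extra write load described in the cons above.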