Table of Contents
Problem Statement
Change Data Capture
Imagine you’re designing the architecture of Instagram. When users visit a profile, they need to instantly see who this person follows.
Approach & potential issues
This requires querying a database table containing follower-followed relationships. To make this lookup fast, we need an index on the follower column.
But here’s the problem. To generate a user’s newsfeed, we need to quickly find all the people who follow a particular user. To make this query fast, we need an index on the followed column.
Running both these queries efficiently at the same time can be challenging. We could maintain two separate databases – one indexed by follower and another indexed by followed. But this creates unnecessary write overhead. When a user follows someone, we’d need to update both databases simultaneously. This synchronous update significantly slows down the follow action and introduces potential consistency issues if one update fails while the other succeeds.
Solution
Change Data Capture (CDC) solves this problem elegantly. Instead of updating both databases synchronously, we only write to the primary database indexed on follower. CDC monitors the database’s transaction logs and captures every change made to the follower table. These changes are then pushed to Apache Kafka as events.
A stream processor like Apache Flink continuously processes these events from the Kafka queue and updates the secondary database indexed by followed.
This asynchronous process ensures that the follow action remains fast since it only needs to write to one database.
The second database eventually catches up through the CDC pipeline.
Please visit https: https://codeandalgo.com for more such contents.
Leave a Reply