From YouTube: Tech Talk | Using Delta as a Change Data Capture Source

Description

Join us for an online tech talk on Delta Lake presented by Denny Lee and Paul Roome. Tech talks include a technical presentation with slides and a demo, with time for Q&A at the end.

While it is common to use Delta Lake as a sink for change data captured from traditional data sources, customers are increasingly asking how to use Delta tables as a source for a change data capture (CDC) process. In other words, how can we read a stream of changes from a Delta table so that they can be propagated downstream?
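
To make "a stream of changes" concrete, here is a minimal sketch in plain Python. It is not Delta Lake's API and not the approach shown in the talk: the function name `diff_snapshots`, the `insert`/`update`/`delete` labels, and the dictionary-based snapshots are all assumptions made for illustration. It compares two versions of a keyed table and emits one change record per row that was inserted, updated, or deleted.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
# Hypothetical in-memory stand-in for one version of a table: primary key -> row.
Snapshot = Dict[str, Row]


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[Dict[str, Any]]:
    """Return the change records that turn `old` into `new`, ordered by key."""
    changes: List[Dict[str, Any]] = []
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before is None:
            # Key exists only in the newer version.
            changes.append({"key": key, "op": "insert", "row": after})
        elif after is None:
            # Key exists only in the older version.
            changes.append({"key": key, "op": "delete", "row": before})
        elif before != after:
            # Key exists in both versions, but the row content differs.
            changes.append({"key": key, "op": "update", "row": after})
    return changes


# Two illustrative versions of a small customer table (made-up data).
v1 = {"c1": {"tier": "free"}, "c2": {"tier": "pro"}}
v2 = {"c1": {"tier": "pro"}, "c3": {"tier": "free"}}
changes = diff_snapshots(v1, v2)
```

Comparing full snapshots is only practical for small tables. It is used here because it shows what a change record contains, not because it is how a large Delta table would be read.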

Some example use cases include (but are not limited to):

- After cleaning the data following the Delta Architecture (bronze, silver, and gold tables), propagate this data to multiple downstream systems.

- An e-commerce company uses a Delta table to store features for each of its customers, sourced from multiple upstream systems. Any change to customer data is propagated to downstream ML models so that they can provide the latest product recommendations to the customer.

- A large software company uses a Delta table to process and store hundreds of terabytes of customer telemetry data. Changes in this table need to be sent to a downstream consumer that updates a range of dashboards and analytics.

In each of these cases, we want to capture a change stream from a Delta table and send it somewhere for further processing. In this session, we will discuss the architecture, use cases, and solutions.
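
The "send it somewhere" step can be sketched in plain Python as well. This is an illustration under stated assumptions, not Delta Lake's API or the solution presented in the session: the change record shape (`key`, `op`, `row`), the function `apply_changes`, and the two in-memory "downstream" stores are all hypothetical.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_changes(target: Dict[str, Row], changes: Iterable[Dict[str, Any]]) -> None:
    """Apply insert/update/delete change records to a downstream key -> row store."""
    for change in changes:
        op, key = change["op"], change["key"]
        if op == "delete":
            # Ignore deletes for keys the target never received.
            target.pop(key, None)
        elif op in ("insert", "update"):
            # Copy the row so downstream stores do not share mutable state.
            target[key] = dict(change["row"])
        else:
            raise ValueError(f"unknown change type: {op!r}")


# Hypothetical change stream captured from a source table (made-up data).
change_stream = [
    {"key": "c1", "op": "update", "row": {"tier": "pro"}},
    {"key": "c2", "op": "delete", "row": {"tier": "pro"}},
    {"key": "c3", "op": "insert", "row": {"tier": "free"}},
]

# Two hypothetical downstream consumers that start from the same earlier state.
dashboard_store = {"c1": {"tier": "free"}, "c2": {"tier": "pro"}}
feature_store = {"c1": {"tier": "free"}, "c2": {"tier": "pro"}}

# Fan the same change stream out to every consumer.
for sink in (dashboard_store, feature_store):
    apply_changes(sink, change_stream)
```

Each consumer receives only the rows that changed, so the cost of propagation depends on the size of the change stream and not on the size of the source table.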

Agenda: 10AM PDT - 11AM PDT (GMT-8)

9:00AM - 9:50AM - Tech Talk
9:50AM - 10:00AM - Q&A

Link to GitHub for the notebooks: https://github.com/databricks/tech-talks