From YouTube: Generating Surrogate Keys for your Data Lakehouse with Spark SQL and Delta Lake

Description

For this tech chat, we will discuss a popular data warehousing fundamental: surrogate keys. As we have discussed in other Delta Lake tech talks, the reliability Delta Lake brings to data lakes has revived many data warehousing fundamentals, such as Change Data Capture. Surrogate keys are unique and carry no business meaning, so they remain stable over time when joining domain (or dimension) data to fact data. Generating them can be difficult on single-node systems and is even more complex on distributed systems. In this session, we will discuss the history and value of surrogate keys and the requirements a good strategy must meet to implement this data warehousing fundamental in your Delta Lake.
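To make the distributed difficulty concrete, here is a minimal plain-Python sketch of two ways to key rows that are already spread across partitions. It is not taken from the talk's notebooks. The Python lists stand in for Spark partitions, and the function names and sample data are hypothetical. The first function copies the bit layout Spark documents for `monotonically_increasing_id` (partition ID in the upper 31 bits, record number in the lower 33 bits). Those keys are unique with no cross-partition coordination, but they leave large gaps. The second function produces gap-free keys that continue from an existing maximum. The price is an extra pass to count rows per partition.

```python
from itertools import accumulate

# Hypothetical batch of new dimension rows, already split across
# "partitions" the way a distributed engine would hold them.
partitions = [
    ["alice", "bob"],
    ["carol"],
    [],
    ["dave", "erin", "frank"],
]


def coordination_free_keys(parts):
    """Unique, increasing keys with no cross-partition step.

    Uses the layout documented for Spark's monotonically_increasing_id:
    partition index in the upper 31 bits, position within the partition
    in the lower 33 bits. Keys are unique but not consecutive.
    """
    return [
        ((pid << 33) + i, row)
        for pid, part in enumerate(parts)
        for i, row in enumerate(part)
    ]


def consecutive_keys(parts, current_max_key=0):
    """Gap-free keys that continue after the existing maximum key.

    Needs one extra pass to count rows per partition. A prefix sum of
    those counts gives each partition its starting offset.
    """
    counts = [len(p) for p in parts]
    # offsets[pid] = number of rows in all earlier partitions
    offsets = [0] + list(accumulate(counts))[:-1]
    return [
        (current_max_key + offsets[pid] + i + 1, row)
        for pid, part in enumerate(parts)
        for i, row in enumerate(part)
    ]


print(consecutive_keys(partitions, current_max_key=100))
# [(101, 'alice'), (102, 'bob'), (103, 'carol'), (104, 'dave'), (105, 'erin'), (106, 'frank')]
```

The trade-off is the one the session frames as requirements. Uniqueness alone is cheap. Keys that are both unique and consecutive require an extra counting step before any key can be assigned.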

You can find the notebooks for this video at: https://github.com/databricks/tech-talks/tree/master/2020-08-25%20%7C%20Generating%20Surrogate%20Keys%20for%20your%20Data%20Lakehouse%20with%20Spark%20SQL%20and%20Delta%20Lake