Delta Lake / Delta Lake DW Techniques

These are all the meetings we have in "Delta Lake DW Techni…" (part of the organization "Delta Lake"). Click into individual meeting pages to watch the recording and search or read the transcript.

25 Aug 2020

For this tech chat, we will discuss a popular data warehousing fundamental: surrogate keys. As discussed in various other Delta Lake tech talks, the reliability that Delta Lake brings to data lakes has led to a resurgence of many data warehousing fundamentals, such as Change Data Capture, in data lakes. Surrogate keys are unique and carry no business context, so they can stand the test of time when joining domain (or dimensional) and fact data. Generating them can be difficult in single-node systems and even more complex in distributed systems. In this session, we will discuss the history and value of surrogate keys and the requirements for a good strategy to implement this data warehousing fundamental in your Delta Lake.
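As a rough illustration of the idea (not the approach from the talk or its notebooks), the sketch below assigns integer surrogate keys to business keys in plain Python. The function name `assign_surrogate_keys` and the `CUST-…` identifiers are hypothetical. It runs in a single process; coordinating the next key across many workers is the part that gets harder in a distributed system.

```python
from typing import Dict, Iterable


def assign_surrogate_keys(
    key_map: Dict[str, int], business_keys: Iterable[str]
) -> Dict[str, int]:
    """Return a new mapping that keeps existing surrogate keys and adds new ones."""
    result = dict(key_map)
    # Continue numbering after the highest key already handed out.
    next_key = max(result.values(), default=0) + 1
    for business_key in business_keys:
        if business_key not in result:
            result[business_key] = next_key
            next_key += 1
    return result


# First load: two customers identified by hypothetical business keys.
keys = assign_surrogate_keys({}, ["CUST-A17", "CUST-B42"])
# Second load: one known customer and one new customer.
keys = assign_surrogate_keys(keys, ["CUST-B42", "CUST-C03"])
print(keys)  # {'CUST-A17': 1, 'CUST-B42': 2, 'CUST-C03': 3}
```

The known customer keeps its key across loads, which is what lets fact rows keep joining to the same dimension row even if the business key format later changes.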

You can find the notebooks for this video at: https://github.com/databricks/tech-talks/tree/master/2020-08-25%20%7C%20Generating%20Surrogate%20Keys%20for%20your%20Data%20Lakehouse%20with%20Spark%20SQL%20and%20Delta%20Lake

Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here: https://databricks.com/databricks-named-leader-by-gartner
  • 2 participants
  • 57 minutes
enterprise
microsoft
doug
sql
introduce
dbas
users
setup
thanks
washington

28 May 2020

We will discuss a popular online analytical processing (OLAP) fundamental: slowly changing dimensions (SCD), specifically Type 2. As discussed in various other Delta Lake tech talks, the reliability that Delta Lake brings to data lakes has led to a resurgence of many data warehousing fundamentals, such as Change Data Capture, in data lakes. Type 2 SCD in data warehousing lets you keep track of both historical and current data over time. We will discuss how to apply these concepts to your data lake in the context of market segmentation for a climbing eCommerce site.
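To make the Type 2 idea concrete, here is a minimal in-memory sketch in plain Python, not the implementation shown in the talk. The `DimRow` fields, the `apply_scd2` helper, and the segment names are all hypothetical, and the sketch assumes each customer has at most one current row. When a tracked attribute changes, the current row is closed and a new current row is appended, so history is preserved.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    customer_id: str          # business key (hypothetical)
    segment: str              # tracked attribute (hypothetical)
    valid_from: date
    valid_to: Optional[date]  # None means the row is still open
    is_current: bool


def apply_scd2(
    rows: List[DimRow], customer_id: str, segment: str, as_of: date
) -> List[DimRow]:
    """Apply one incoming record to the dimension using Type 2 semantics."""
    out: List[DimRow] = []
    needs_new_row = True
    for row in rows:
        if row.customer_id == customer_id and row.is_current:
            if row.segment == segment:
                needs_new_row = False  # attribute unchanged, keep the row
                out.append(row)
            else:
                # Close the old version instead of overwriting it.
                out.append(replace(row, valid_to=as_of, is_current=False))
        else:
            out.append(row)
    if needs_new_row:
        out.append(DimRow(customer_id, segment, as_of, None, True))
    return out


dim = apply_scd2([], "c1", "boulderer", date(2020, 1, 1))
dim = apply_scd2(dim, "c1", "alpinist", date(2020, 5, 1))
for row in dim:
    print(row)
```

After the second call the dimension holds two rows for `c1`: the closed "boulderer" version ending on 2020-05-01 and the open "alpinist" version that is flagged as current.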

Speaker:
Douglas Moore, Solution Architect

I’m passionate about helping customers find value in data analytics and helping the people I work with succeed. 25+ year data veteran, ranging from embedded systems to massive cloud-based data lakes. My early career interest centered around producing 3D animations of Finite Element Modeled Elastic Waves. Career-wise, I came for the data visualizations and stayed for the computation and data. Past roles have included: Solutions Architect, Data Architect, CTO, Engineer. Current Specialties: Big Data Strategy & Architecture, Data Lakes, Streaming, Delta Lake, Spark, and Databricks.
  • 3 participants
  • 1:01 hours
meetup
meetups
chat
joining
forum
thanks
visit
community
onboard
summit

30 Apr 2020

Join us for an online tech talk on Delta Lake presented by Denny Lee and Paul Roome. Tech talks include a technical presentation with slides and a demo, with time for Q&A at the end.

While it is common to use Delta Lake as a sink for change data captured from traditional data sources, customers are increasingly asking how to use Delta tables as a source for a change data capture (CDC) process. To put it differently: how can we read a stream of changes from a Delta table so that they can be propagated downstream?

Some example use cases include (but are not limited to):

- After cleaning the data following the Delta Architecture (bronze, silver, and gold tables), propagate this data to multiple downstream systems.

- An e-commerce company is using a Delta table to store features related to each of its customers, sourced from multiple upstream systems. Any change to customer data is propagated to downstream ML models so that they can provide the latest product recommendations to the customer.

- A large software company is using a Delta table to process and store 100s of TBs of customer telemetry data. Changes in this table need to be sent to a downstream consumer for updating a range of dashboards and analytics.

In each of these cases, we want to capture a change stream from a Delta table and send it somewhere for further processing. In this session, we will discuss the architecture, use cases, and solutions.
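To show the shape of such a change stream, the sketch below compares two snapshots of a keyed table in plain Python and emits insert, update, and delete events. This is only an illustration under stated assumptions: `diff_snapshots` and the sample rows are hypothetical, and it does not reflect how the session or Delta Lake itself derives changes.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Change = Tuple[str, str, Row]  # (operation, key, row payload)


def diff_snapshots(old: Dict[str, Row], new: Dict[str, Row]) -> List[Change]:
    """Compare two snapshots keyed by primary key and list the changes."""
    changes: List[Change] = []
    # Sort the keys so the output order is deterministic.
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            changes.append(("insert", key, new[key]))
        elif key not in new:
            changes.append(("delete", key, old[key]))
        elif old[key] != new[key]:
            changes.append(("update", key, new[key]))
    return changes


before = {"c1": {"plan": "free"}, "c2": {"plan": "pro"}}
after = {"c1": {"plan": "pro"}, "c3": {"plan": "free"}}

for change in diff_snapshots(before, after):
    print(change)
# ('update', 'c1', {'plan': 'pro'})
# ('delete', 'c2', {'plan': 'pro'})
# ('insert', 'c3', {'plan': 'free'})
```

A downstream consumer, such as a dashboard refresh or a model update, would process these events rather than rereading the whole table. Full snapshot comparison becomes expensive at the table sizes mentioned above, which is why a dedicated change stream is attractive.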

Agenda: 10AM PDT - 11AM PDT (GMT-7)

9:00AM - 9:50AM - Tech Talk
9:50AM - 10:00AM - Q&A

Link to GitHub for the notebooks: https://github.com/databricks/tech-talks
  • 3 participants
  • 53 minutes
discussion
delta
meet
disruption
understanding
changelog
enterprise
demos
important
workflows