From YouTube: Tech Talk Series Part Two: Boost Delta Lake Performance with Data Skipping and Z-Order

Description

Join us for the second in a four-part series with Salesforce Engineering.

Abstract:
When building a data lake, partitioning strategy is one of the most critical decisions to make. A poorly optimized partitioning strategy can generate small files and undermine read and write performance. Besides traditional file-based partitioning with partition pruning, Databricks provides another option: Data Skipping and Z-Ordering (https://docs.databricks.com/delta/optimizations/file-mgmt.html) with I/O pruning and file compaction. In this talk, we will share how our thinking about partitioning strategy evolved while building the Engagement Delta Lake. Using this real-world use case, we will explain why and how we leverage Data Skipping and Z-Ordering to boost Delta Lake performance.
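
As a rough illustration of the two ideas named in the abstract, the sketch below shows the general principle only, not Delta Lake's actual implementation. Data skipping keeps per-file minimum and maximum values for a column and ignores files whose range cannot match a filter; Z-ordering interleaves the bits of several column values so that rows close together in multiple dimensions land in the same files. All names here (FileStats, files_to_scan, z_value, the part-N.parquet paths) are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file statistics for one column."""
    path: str
    min_val: int
    max_val: int


def files_to_scan(files, lo, hi):
    """Data skipping: keep only files whose [min, max] range overlaps [lo, hi]."""
    return [f.path for f in files if f.max_val >= lo and f.min_val <= hi]


def z_value(x, y, bits=16):
    """Z-order (Morton) code: interleave the low `bits` bits of x and y.

    Bit i of x goes to position 2*i, bit i of y goes to position 2*i + 1.
    Sorting rows by this value keeps rows that are close in both x and y
    near each other, which tightens the per-file min/max ranges.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Three hypothetical files, each covering a disjoint range of one column.
files = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
]

# A filter on values between 120 and 150 only needs to read one file.
selected = files_to_scan(files, 120, 150)
print(selected)
```

On Databricks, the corresponding layout step is performed with the OPTIMIZE command and its ZORDER BY clause rather than hand-written code like this; see the documentation linked in the abstract for details.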

Part 1: Engagement Activity Delta Lake - https://youtu.be/a7_I1Qi1LoU

-----------------
Speakers
-----------------

Zhidong Ke, Software Engineer PMTS, Salesforce
Zhidong is passionate about designing distributed systems, real-time and batch data processing, and building applications.

Yifeng Liu, Software Engineer LMTS, Salesforce
Yifeng is a software engineer with extensive experience in big data processing and distributed systems, and is interested in building high-volume, high-complexity, low-latency data pipelines and frameworks.

Aaron Zhang, Software Engineering PMTS, Salesforce
Aaron is an experienced software engineering leader whose interests and areas of focus include engineering secure, fault-tolerant, high-volume systems built on microservices.

Heng Zhang, Software Engineering PMTS, Salesforce
Heng is a software engineer who is interested in and specializes in microservices, distributed systems, and big data.