16 Apr 2020
In the earlier Delta Lake Internals tech talk series sessions, we described how the Delta Lake transaction log works. In this session, we will dive deeper into how commits, snapshot isolation, and partition and file changes work when performing deletes, updates, merges, and structured streaming writes.
In this webinar you will learn about:
- A quick primer on the Delta Lake Transaction Log
- The fundamentals of running DELETE, UPDATE, and MERGE
- The actions performed when running these commands
If you want to join the live conversation on zoom, follow the link on our online meetup: https://www.meetup.com/data-ai-online/events/269776125/
- Watch Part 1, Unpacking the Transaction Log: https://youtu.be/F91G4RoA8is
- Watch Part 2, Enforcing and Evolving the Schema: https://youtu.be/tjb10n5wVs8
- 2 participants
- 55 minutes
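The copy-on-write behavior this talk covers — a DELETE never edits a data file in place, but rewrites the touched files and records the swap as remove/add actions in a new commit — can be sketched as a toy model in plain Python. The file names, action fields, and function below are illustrative only, not the actual Delta Lake protocol or API:

```python
# Toy sketch of copy-on-write DELETE: rewrite only the data files that
# contain matching rows, and log the file swap as one atomic commit.

def delete_rows(files, commits, predicate):
    """Apply a DELETE: rewrite affected files, record remove/add actions."""
    actions = []
    for name, rows in list(files.items()):
        kept = [r for r in rows if not predicate(r)]
        if len(kept) != len(rows):          # this file is touched by the DELETE
            new_name = name + ".v2"
            files[new_name] = kept          # write the rewritten file
            del files[name]
            actions.append({"remove": name})
            actions.append({"add": new_name})
    commits.append(actions)                 # one commit per operation
    return files, commits

# Two "data files", each modeled as a list of rows.
files = {"part-0": [1, 2, 3], "part-1": [4, 5, 6]}
commits = []

delete_rows(files, commits, lambda r: r == 5)
print(files)    # part-0 untouched; part-1 rewritten without row 5
print(commits)  # [[{'remove': 'part-1'}, {'add': 'part-1.v2'}]]
```

Note that the untouched file is never rewritten — this is why readers holding an older snapshot can keep reading consistently while the commit lands.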
2 Apr 2020
Online Tech Talk hosted by Denny Lee, Developer Advocate @ Databricks with Andreas Neumann, Staff Software Engineer @ Databricks
Link to Slides - https://github.com/dennyglee/databricks/blob/c42c3939e2b6ae8a97d70207554105ffaaf26206/notebooks/Users/denny.lee%40databricks.com/Delta%20Lake/Delta%20Lake%20-%20Enforcing%20and%20Evolving%20the%20Schema.pdf
Link to Notebook - https://github.com/dennyglee/databricks/blob/master/notebooks/Users/denny.lee%40databricks.com/Delta%20Lake/Diving%20into%20Delta%20Lake%20-%20Enforcing%20and%20Evolving%20Schema.py
Link to Diving into Delta Lake Part 1: https://www.youtube.com/watch?v=F91G4RoA8is
Link to Online Meetups Playlist: https://dbricks.co/youtube-meetups
Abstract:
Data, like our experiences, is always evolving and accumulating. To keep up, our mental models of the world must adapt to new data, some of which contains new dimensions – new ways of seeing things we had no conception of before. These mental models are not unlike a table’s schema, defining how we categorize and process new information.
This brings us to schema management. As business problems and requirements evolve over time, so too does the structure of your data. With Delta Lake, as the data changes, incorporating new dimensions is easy. Users have access to simple semantics to control the schema of their tables. These tools include schema enforcement, which prevents users from accidentally polluting their tables with mistakes or garbage data, as well as schema evolution, which enables them to automatically add new columns of rich data when those columns belong. In this webinar, we’ll dive into the use of these tools.
In this webinar you will learn about:
- Understanding table schemas and schema enforcement
- How does schema enforcement work?
- How is schema enforcement useful?
- Preventing data dilution
- How does schema evolution work?
- How is schema evolution useful?
- 3 participants
- 55 minutes
26 Mar 2020
Online Tech Talk hosted by Denny Lee, Developer Advocate @ Databricks with Burak Yavuz, Software Engineer @ Databricks
Link to Notebook: https://github.com/dennyglee/databricks/blob/master/notebooks/Users/denny.lee%40databricks.com/Delta%20Lake/Diving%20Into%20Delta%20Lake:%20Unpacking%20The%20Transaction%20Log.py
The transaction log is key to understanding Delta Lake because it is the common thread that runs through many of its most important features, including ACID transactions, scalable metadata handling, time travel, and more. In this session, we’ll explore what the Delta Lake transaction log is, how it works at the file level, and how it offers an elegant solution to the problem of multiple concurrent reads and writes.
In this tech talk you will learn about:
- What is the Delta Lake Transaction Log?
- What is the transaction log used for?
- How does the transaction log work?
- Reviewing the Delta Lake transaction log at the file level
- Dealing with multiple concurrent reads and writes
- How the Delta Lake transaction log solves other use cases including Time Travel and Data Lineage and Debugging
See full Diving Into Delta Lake tutorial series here:
https://databricks.com/diving-into-delta-lake-talks
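The file-level log-replay idea described above can be sketched as a toy model: each commit is a numbered JSON file of add/remove actions, and the current snapshot is whatever data files survive a replay of every commit in order. The directory layout and field names below are illustrative, not the actual Delta Lake protocol:

```python
# Toy sketch of transaction-log replay: commits are ordered JSON files,
# and the table snapshot is reconstructed by replaying them in sequence.
import json
import os
import tempfile

log_dir = os.path.join(tempfile.mkdtemp(), "_delta_log")
os.makedirs(log_dir)

def commit(version, actions):
    """Write commit N as a zero-padded file: 00000000.json, 00000001.json, ..."""
    with open(os.path.join(log_dir, f"{version:08d}.json"), "w") as f:
        json.dump(actions, f)

def snapshot():
    """Replay every commit in order to compute the live set of data files."""
    live = set()
    for name in sorted(os.listdir(log_dir)):
        with open(os.path.join(log_dir, name)) as f:
            for action in json.load(f):
                if "add" in action:
                    live.add(action["add"])
                if "remove" in action:
                    live.discard(action["remove"])
    return live

commit(0, [{"add": "part-0"}])                         # initial write
commit(1, [{"add": "part-1"}])                         # append
commit(2, [{"remove": "part-0"}, {"add": "part-0b"}])  # rewrite (e.g. a DELETE)
print(sorted(snapshot()))  # ['part-0b', 'part-1']
```

Because commits are totally ordered by version number, replaying the log up to any earlier version yields that version's snapshot — which is the mechanism behind the time-travel use case mentioned in the talk.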
- 3 participants
- 53 minutes