14 Jun 2022
Join us for Module 3: SQL and the Transaction Log - Tuesday, June 14
- Delta Lake SQL
- Time Travel
- Transaction Log Fundamentals
This three-part workshop teaches what Delta Lake is and how to use Apache Spark and Delta Lake to build reliable, large-scale distributed data pipelines. It covers the Delta Lake features that, together with Spark SQL and Spark Structured Streaming, bring ACID transactions and time travel (data versioning) to batch and streaming ETL workloads. Slides, demos, exercises, and Q&A sessions together introduce the concepts behind the modern data lakehouse architecture.
Quick links:
https://delta.io/
https://github.com/jaceklaskowski/spark-delta-lake-workshop/blob/main/notebooks/Delta_Lake.py
https://github.com/badal-io/datastream-deltalake-connector
https://groups.google.com/g/delta-users
- 2 participants
- 1:04 hours
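Time travel works because every commit in the transaction log records which data files were added and which were removed, so replaying commits 0 through N yields the files that make up version N. Delta Lake SQL exposes this through `VERSION AS OF` and `TIMESTAMP AS OF` clauses. The following minimal Python sketch models the idea in memory; the `Commit` class, the file names, and the rule "pick the latest commit at or before the timestamp" are simplifying assumptions, not Delta's implementation (it ignores checkpoints, metadata actions, and retention).

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Commit:
    """A hypothetical, stripped-down commit: a timestamp plus file changes."""
    version: int
    timestamp: datetime
    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)


def snapshot_at_version(log: list[Commit], version: int) -> set[str]:
    """Replay commits 0..version in order; return the set of live data files."""
    if not 0 <= version < len(log):
        raise ValueError(f"version {version} does not exist")
    files: set[str] = set()
    for commit in log[: version + 1]:
        files.difference_update(commit.removed)
        files.update(commit.added)
    return files


def version_at_timestamp(log: list[Commit], ts: datetime) -> int:
    """Latest version committed at or before ts (simplified rule).

    Assumes the log is ordered by version, so timestamps are ascending.
    """
    times = [c.timestamp for c in log]
    idx = bisect_right(times, ts) - 1
    if idx < 0:
        raise ValueError("timestamp is before the first commit")
    return log[idx].version


log = [
    Commit(0, datetime(2022, 6, 14, 10, 0), added=["part-0.parquet"]),
    Commit(1, datetime(2022, 6, 14, 11, 0), added=["part-1.parquet"]),
    # e.g. an UPDATE or DELETE that rewrote part-0 into part-2
    Commit(2, datetime(2022, 6, 14, 12, 0),
           added=["part-2.parquet"], removed=["part-0.parquet"]),
]
v = version_at_timestamp(log, datetime(2022, 6, 14, 11, 30))
print(v, sorted(snapshot_at_version(log, v)))  # 1 ['part-0.parquet', 'part-1.parquet']
```

Replaying from version 0 keeps the sketch short; a real reader would start from the most recent checkpoint instead of the beginning of the log.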
31 May 2022
Join us for Module 2: DML and Schema - Tuesday, May 31
- Create, Insert, Update, Delete, Merge
- Schema Enforcement and Evolution
This three-part workshop teaches what Delta Lake is and how to use Apache Spark and Delta Lake to build reliable, large-scale distributed data pipelines. It covers the Delta Lake features that, together with Spark SQL and Spark Structured Streaming, bring ACID transactions and time travel (data versioning) to batch and streaming ETL workloads. Slides, demos, exercises, and Q&A sessions together introduce the concepts behind the modern data lakehouse architecture.
Quick links:
https://delta.io/
https://github.com/jaceklaskowski/spark-delta-lake-workshop/blob/main/notebooks/Delta_Lake.py
https://github.com/badal-io/datastream-deltalake-connector
https://groups.google.com/g/delta-users
- 2 participants
- 58 minutes
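Schema enforcement means that a write whose columns do not match the table schema is rejected, while schema evolution (for example, the `mergeSchema` write option) lets a write add new columns to the table. The sketch below models only those two behaviours, using plain dictionaries that map column names to type names. `check_write`, `SchemaMismatchError`, and the exact rules (types must match exactly, columns missing from the batch are allowed, new columns are appended) are simplified assumptions, not Delta Lake's actual validation logic.

```python
class SchemaMismatchError(Exception):
    """Raised when a batch does not fit the table schema (hypothetical name)."""


def check_write(table_schema: dict[str, str],
                batch_schema: dict[str, str],
                merge_schema: bool = False) -> dict[str, str]:
    """Return the table schema after the write, or raise SchemaMismatchError.

    Simplified rules (an assumption, not Delta's full logic):
    - a column present in both schemas must have the same type;
    - columns missing from the batch are allowed (they would be written as null);
    - new columns are rejected unless merge_schema is True, in which case
      they are appended to the end of the table schema.
    """
    for name, dtype in batch_schema.items():
        if name in table_schema and table_schema[name] != dtype:
            raise SchemaMismatchError(
                f"column {name!r} is {dtype}, table has {table_schema[name]}")
    new_cols = {n: t for n, t in batch_schema.items() if n not in table_schema}
    if new_cols and not merge_schema:
        raise SchemaMismatchError(f"unexpected columns: {sorted(new_cols)}")
    return {**table_schema, **new_cols}


table = {"id": "bigint", "name": "string"}
batch = {"id": "bigint", "name": "string", "email": "string"}
try:
    check_write(table, batch)  # enforcement: the extra column is rejected
except SchemaMismatchError as err:
    print("rejected:", err)  # rejected: unexpected columns: ['email']
# evolution: with merge_schema=True the new column is appended
print(check_write(table, batch, merge_schema=True))
# {'id': 'bigint', 'name': 'string', 'email': 'string'}
```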
19 May 2022
Join us for Module 1: Introduction to Delta Lake - Thursday, May 19
- Bringing Reliability to Data Lakes (Concepts)
- Convert existing tables to Delta Lake [SQL]
- Unified Batch and Streaming [Python, SQL]
This three-part workshop teaches what Delta Lake is and how to use Apache Spark and Delta Lake to build reliable, large-scale distributed data pipelines. It covers the Delta Lake features that, together with Spark SQL and Spark Structured Streaming, bring ACID transactions and time travel (data versioning) to batch and streaming ETL workloads. Slides, demos, exercises, and Q&A sessions together introduce the concepts behind the modern data lakehouse architecture.
Quick links:
https://delta.io/
https://github.com/badal-io/datastream-deltalake-connector
https://groups.google.com/g/delta-users
- 2 participants
- 1:01 hours
3 Mar 2021
This talk is brought to you by the Istanbul Spark Meetup.
Abstract: This live coding session is a gentle introduction to the latest and greatest of Delta Lake (https://delta.io/).
You will learn what Delta Lake is and which challenges it aims to solve, how it builds on features of the recent Apache Spark 3 release, and why it can complement your data processing workloads.
In this talk, Jacek discusses the tagline on the Delta Lake home page: "Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads."
You will learn about time travel and data versioning using Spark tables in Spark SQL and Spark Structured Streaming.
Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here. https://databricks.com/databricks-named-leader-by-gartner
- 2 participants
- 1:53 hours
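The ACID guarantee in that tagline rests on how commits are written: each commit is a new, numbered JSON file in the table's `_delta_log` directory, and the storage system must ensure that only one writer can create a given version file. A writer that loses the race re-reads the log and tries the next version (optimistic concurrency). This is a minimal local-filesystem sketch of that idea, assuming hard links via `os.link` as the put-if-absent primitive and toy `add` actions that carry only a `path`. `try_commit` and its helpers are hypothetical; real Delta Lake also checks whether concurrent commits logically conflict before retrying.

```python
import json
import os
import tempfile


def commit_path(log_dir: str, version: int) -> str:
    """Path of the commit file for a version, zero-padded to 20 digits."""
    return os.path.join(log_dir, f"{version:020d}.json")


def latest_version(log_dir: str) -> int:
    """Highest committed version in log_dir, or -1 if the log is empty."""
    versions = [int(n[:-5]) for n in os.listdir(log_dir)
                if n.endswith(".json") and n[:-5].isdigit()]
    return max(versions, default=-1)


def try_commit(log_dir: str, actions: list[dict], max_attempts: int = 3) -> int:
    """Optimistically commit actions as the next version and return it."""
    for _ in range(max_attempts):
        version = latest_version(log_dir) + 1
        # Write the whole commit to a temporary file first, so readers never
        # see a half-written commit file.
        fd, tmp = tempfile.mkstemp(dir=log_dir, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
        try:
            # os.link raises FileExistsError if the target already exists,
            # so at most one writer can publish a given version.
            os.link(tmp, commit_path(log_dir, version))
            return version
        except FileExistsError:
            continue  # lost the race: re-read the log and try the next version
        finally:
            os.remove(tmp)  # the temporary name is not needed either way
    raise RuntimeError(f"could not commit after {max_attempts} attempts")


with tempfile.TemporaryDirectory() as log_dir:
    v0 = try_commit(log_dir, [{"add": {"path": "part-0.parquet"}}])
    v1 = try_commit(log_dir, [{"add": {"path": "part-1.parquet"}}])
    print(v0, v1, sorted(os.listdir(log_dir)))
    # 0 1 ['00000000000000000000.json', '00000000000000000001.json']
```

Publishing through a link (or rename) keeps each commit all-or-nothing for readers, while the exclusive create is what serializes concurrent writers.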
27 Aug 2020
Delta Lake’s transaction log brings high reliability, performance, and ACID-compliant transactions to data lakes. But how exactly does it accomplish this?
Working through concrete examples, we will take a close look at how Delta manages and uses the transaction log to supercharge data lakes.
In this tech talk, you will learn:
- Enabling and configuring OSS Delta Lake
- Creating Delta Lake tables
- Using history() to view metadata and table versioning
- How Delta manages the log files
- What goes into the transaction logs for various DML operations
- How Delta constructs snapshots of data
- The small file problem and how to mitigate it
- How to construct time travel queries
- Configuring how long Delta tables retain deleted files and log entries
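Each commit file holds one JSON action per line, for example `commitInfo` (which includes the operation name), `add` for new data files, and `remove` for files that are logically deleted. A DML statement such as `DELETE` typically shows up as `remove` actions for the rewritten files plus `add` actions for their replacements. The sketch below summarizes a `_delta_log`-style directory in a way loosely similar to what `history()` reports; `read_history`, `write_commit`, and the stripped-down action payloads are hypothetical, and real commits carry many more fields (plus checkpoint files that this sketch ignores).

```python
import json
import os
import tempfile


def write_commit(log_dir: str, version: int, actions: list[dict]) -> None:
    """Write a toy commit file: one JSON action per line (hypothetical helper)."""
    with open(os.path.join(log_dir, f"{version:020d}.json"), "w") as f:
        f.write("\n".join(json.dumps(a) for a in actions) + "\n")


def read_history(log_dir: str) -> list[dict]:
    """Summarize each commit: version, operation name, and add/remove counts."""
    history = []
    # Zero-padded names sort in version order; non-JSON files are skipped.
    for name in sorted(n for n in os.listdir(log_dir) if n.endswith(".json")):
        counts = {"add": 0, "remove": 0}
        operation = None
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                if not line.strip():
                    continue
                action = json.loads(line)  # one action per line
                if "commitInfo" in action:
                    operation = action["commitInfo"].get("operation")
                for kind in counts:
                    if kind in action:
                        counts[kind] += 1
        history.append({"version": int(name[:-5]), "operation": operation, **counts})
    return history[::-1]  # newest first, the order table history is usually listed in


with tempfile.TemporaryDirectory() as log_dir:
    write_commit(log_dir, 0, [
        {"commitInfo": {"operation": "WRITE"}},
        {"add": {"path": "part-0.parquet"}},
        {"add": {"path": "part-1.parquet"}},
    ])
    write_commit(log_dir, 1, [
        {"commitInfo": {"operation": "DELETE"}},
        {"remove": {"path": "part-0.parquet"}},  # file that contained deleted rows
        {"add": {"path": "part-2.parquet"}},     # its rewritten replacement
    ])
    for entry in read_history(log_dir):
        print(entry)
# {'version': 1, 'operation': 'DELETE', 'add': 1, 'remove': 1}
# {'version': 0, 'operation': 'WRITE', 'add': 2, 'remove': 0}
```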
Speaker: Louis Frolio is a Senior Technical Instructor at Databricks. Drawing on his career in data and AI, Louis trains Databricks business partners on Databricks and Spark. He holds two master's degrees, one in Applied Physics from the University of Massachusetts and another in Strategic Analytics from Brandeis University. Louis lives in New England with his wife and son. A former professional chef, he still considers himself a culinarian and spends his personal time exploring the world of food.
The notebooks for this video can be found at: https://github.com/databricks/tech-talks/tree/master/2020-08-27%20%7C%20How%20Delta%20Lake%20Supercharges%20Data%20Lakes
- 2 participants
- 1:02 hours
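A common mitigation for the small file problem is compaction: rewrite many small files into fewer, larger ones and commit the rewrite as a single transaction that removes the old files and adds the new ones. The sketch below covers only the planning step, using a simple greedy bin-packing by file size. `plan_compaction`, the 64 MB target, and the file names are illustrative assumptions, not Delta Lake's algorithm or defaults.

```python
MB = 1024 * 1024  # illustrative unit


def plan_compaction(file_sizes: dict[str, int], target_bytes: int) -> list[list[str]]:
    """Group small files into bins of at most target_bytes each (greedy sketch).

    Files already at or above the target are left alone. Bins holding a single
    file are dropped, because rewriting one file on its own gains nothing.
    """
    bins: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    small = sorted((size, path) for path, size in file_sizes.items()
                   if size < target_bytes)
    for size, path in small:
        # Close the current bin when the next file would overflow it.
        if current and current_size + size > target_bytes:
            bins.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        bins.append(current)
    return [b for b in bins if len(b) > 1]


# Hypothetical file sizes; "e" is already large enough to be left alone.
files = {"a": 10 * MB, "b": 20 * MB, "c": 30 * MB, "d": 50 * MB, "e": 200 * MB}
print(plan_compaction(files, target_bytes=64 * MB))  # [['a', 'b', 'c']]
```

Sorting by size first is one simple heuristic; it tends to fill bins with the smallest files, which are the ones that hurt read performance most.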
8 Jul 2020
This presentation walks through the daily struggles of a data engineer and covers what it really takes to build robust end-to-end big data solutions.
About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unifie...
Connect with us:
Website: https://databricks.com
Facebook: https://www.facebook.com/databricksinc
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc/
- 2 participants
- 41 minutes