Delta Lake / Delta Lake Tutorials

These are all the meetings we have in "Delta Lake Tutorials" (part of the organization "Delta Lake"). Click into individual meeting pages to watch the recording and search or read the transcript.

14 Jun 2022

Join us for Module 3: SQL and the Transaction Log - Tuesday, June 14

-Delta Lake SQL
-Time Travel
-Transaction Log Fundamentals

This 3-part workshop teaches what Delta Lake is and how to use Apache Spark and Delta Lake in your data architectures to build reliable, large-scale distributed data pipelines. It shows the Delta Lake features that, alongside Spark SQL and Spark Structured Streaming, bring ACID transactions and time travel (data versioning) to your ETL batch and streaming workloads. Together, the slides, demos, exercises, and Q&A sessions should help you understand the concepts of the modern data lakehouse architecture.

Quick links:
https://delta.io/
https://github.com/jaceklaskowski/spark-delta-lake-workshop/blob/main/notebooks/Delta_Lake.py
https://github.com/badal-io/datastream-deltalake-connector
https://groups.google.com/g/delta-users
  • 2 participants
  • 1:04 hours
  • Keywords: lake, delta, san, francisco, features, vinnie, hey, temperatures, summit, ciao

31 May 2022

Join us for Module 2: DML and Schema - Tuesday, May 31

-Create, Insert, Update, Delete, Merge
-Schema Enforcement and Evolution

Quick links:
https://delta.io/
https://github.com/jaceklaskowski/spark-delta-lake-workshop/blob/main/notebooks/Delta_Lake.py
https://github.com/badal-io/datastream-deltalake-connector
https://groups.google.com/g/delta-users
  • 2 participants
  • 58 minutes
  • Keywords: lake, visiting, chat, seattle, location, hey, currently, delta, webinar, summit

19 May 2022

Join us for Module 1: Introduction to Delta Lake - Thursday, May 19

-Bringing Reliability to Data Lakes (Concepts)
-Convert existing tables to Delta Lake [SQL]
-Unified Batch and Streaming [Python, SQL]

Quick links:
https://delta.io/
https://github.com/badal-io/datastream-deltalake-connector
https://groups.google.com/g/delta-users
  • 2 participants
  • 1:01 hours
  • Keywords: thanks, italy, chat, hey, yousek, ciao, anybody, seattle, joining, travel

3 Mar 2021

This talk is brought to you by the Istanbul Spark Meetup.

Abstract: This live coding session is a gentle introduction to the latest and greatest of Delta Lake (https://delta.io/).

You will learn what Delta Lake is and what challenges it aims to solve. You will hear about how Delta Lake builds upon the features of the recent Apache Spark 3 and why it can complement your data processing workloads.

During this talk, Jacek will talk about the slogan from the main page of Delta Lake: "Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads."

You will learn about time travel and data versioning using Spark tables in Spark SQL and Spark Structured Streaming.

Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here: https://databricks.com/databricks-named-leader-by-gartner
  • 2 participants
  • 1:53 hours
  • Keywords: sparknet, kubernetes, docker, delta01, databricks, platform, streams, cloud, yapila, kafka

27 Aug 2020

Delta Lake’s transaction log brings high reliability, performance, and ACID-compliant transactions to data lakes. But how exactly does it accomplish this?

Working through concrete examples, we will take a close look at how the transaction logs are managed and leveraged by Delta to supercharge data lakes.

In this tech talk you will learn:
- Enabling and configuring OSS Delta Lake
- Creating Delta Lake tables
- Using history() to view metadata and table versioning
- How Delta manages the log files
- What goes into the transaction logs for various DML operations
- How Delta constructs snapshots of data
- The small file problem and how to mitigate it
- How to construct time travel queries
- Configuring Delta tables for deleted files and log retention

Speaker: Louis Frolio is a Senior Technical Instructor at Databricks. Drawing on his career in data and AI, Louis trains Databricks business partners on Databricks and Spark. He has two master's degrees, one in Applied Physics from the University of Massachusetts and one in Strategic Analytics from Brandeis University. Louis lives in New England with his wife and son. A former professional chef, Louis still considers himself a culinarian and uses his personal time to explore the world of food.

The notebooks for this video can be found at: https://github.com/databricks/tech-talks/tree/master/2020-08-27%20%7C%20How%20Delta%20Lake%20Supercharges%20Data%20Lakes
  • 2 participants
  • 1:02 hours
  • Keywords: hi, chat, bonjour, louis, seattle, danny, delta, shortly, madagascar, youtube

8 Jul 2020

Take a walk through the daily struggles of a data engineer in this presentation as we cover what is truly needed to create robust end-to-end big data solutions.

  • 2 participants
  • 41 minutes
  • Keywords: workflows, data, delta, pipeline, streaming, dbas, delays, processing, problems, advanced