Delta Lake / Getting Started with Delta Lake

These are all the meetings we have in "Getting Started with…" (part of the organization "Delta Lake"). Click into individual meeting pages to watch the recording and search or read the transcript.

15 Sep 2020

Join Michael Armbrust, head of the Delta Lake engineering team, to learn how his team built upon Apache Spark to bring ACID transactions and other data reliability technologies from the data warehouse world to cloud data lakes.

Apache Spark is the dominant processing framework for big data. Delta Lake adds reliability to Spark so your analytics and machine learning initiatives have ready access to quality, reliable data. This webinar covers the use of Delta Lake to enhance data reliability for Spark environments.

Topic areas include:
- The role of Apache Spark in big data processing
- Use of data lakes as an important part of the data architecture
- Data lake reliability challenges
- How Delta Lake helps provide reliable data for Spark processing
- Specific improvements that Delta Lake adds
- The ease of adopting Delta Lake for powering your data lake

See full Getting Started with Delta Lake tutorial series here:
https://databricks.com/getting-started-with-delta-lake-tutorial-series/

Get the Delta Lake: Up & Running by O’Reilly ebook preview to learn the basics of Delta Lake, the open storage format at the heart of the lakehouse architecture. Download the ebook: https://dbricks.co/3IIcVCg
  • 2 participants
  • 58 minutes
lake
databricks
spark
database
cloud
streaming
insights
downstream
webinar
delta

12 Mar 2020

Online Tech Talk with Denny Lee, Developer Advocate @ Databricks

A common data engineering pipeline architecture uses tables that correspond to different quality levels, progressively adding structure to the data: data ingestion (“Bronze” tables), transformation/feature engineering (“Silver” tables), and machine learning training or prediction (“Gold” tables). Together, these tables form what we call a “multi-hop” architecture. It allows data engineers to build a pipeline that begins with raw data as a “single source of truth” from which everything flows. In this session, we will show how to build a scalable data engineering pipeline using Delta Lake.
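The multi-hop idea can be sketched in plain Python. This is a conceptual illustration only, not the Delta Lake or Spark API: the sample records, the function names (`to_bronze`, `to_silver`, `to_gold`), and the per-user total standing in for a Gold table are all assumptions made for the example.

```python
import json
from collections import defaultdict

# Hypothetical raw input lines, as they might arrive from an ingestion source.
RAW_LINES = [
    '{"user": "a", "amount": "10.5"}',
    '{"user": "b", "amount": "oops"}',
    '{"user": "a", "amount": "4.5"}',
    'not json at all',
]

def to_bronze(lines):
    """Bronze: store every raw record untouched (the single source of truth)."""
    return [{"raw": line} for line in lines]

def to_silver(bronze):
    """Silver: parse and validate records, adding structure and skipping bad rows."""
    silver = []
    for row in bronze:
        try:
            record = json.loads(row["raw"])
            silver.append({"user": str(record["user"]),
                           "amount": float(record["amount"])})
        except (ValueError, KeyError, TypeError):
            # Malformed rows stay in Bronze but do not reach Silver.
            continue
    return silver

def to_gold(silver):
    """Gold: aggregate Silver rows into a table ready for downstream use."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["user"]] += row["amount"]
    return dict(totals)

bronze = to_bronze(RAW_LINES)
silver = to_silver(bronze)
gold = to_gold(silver)
print(len(bronze), len(silver), gold)
```

Because Bronze keeps the raw lines, the Silver and Gold steps can be rerun from it whenever the parsing or aggregation logic changes.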

Delta Lake is an open-source storage layer that brings reliability to data lakes. It offers ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs.

In this session you will learn about:
- The data engineering pipeline architecture
- Data engineering pipeline scenarios
- Data engineering pipeline best practices
- How Delta Lake enhances data engineering pipelines
- The ease of adopting Delta Lake for building your data engineering pipelines

See full Getting Started with Delta Lake tutorial series here:
https://databricks.com/getting-started-with-delta-lake-tutorial-series/

Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here: https://databricks.com/databricks-named-leader-by-gartner
  • 2 participants
  • 58 minutes
delta
dashboard
streaming
webinar
demos
data
downstream
insight
infrastructure
lee

5 Mar 2020

Online Tech Talk with Denny Lee, Developer Advocate @ Databricks

Lambda architecture is a popular technique in which records are processed by a batch system and a streaming system in parallel, and the results are combined at query time to provide a complete answer. Strict latency requirements for processing both old and recently generated events made this architecture popular. Its key downside is the development and operational overhead of managing two different systems. Past attempts to unify batch and streaming into a single system have not been very successful. With the advent of Delta Lake, however, many of our customers are adopting a simple continuous data flow model that processes data as it arrives. We call this the Delta Architecture. In this session, we cover the major bottlenecks to adopting a continuous data flow model and how the Delta Architecture solves them.
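The contrast between the two models can be sketched in plain Python. This is a toy illustration under stated assumptions, not Delta Lake code: the event shape, the `cutoff` timestamp separating batch from recent data, and the names `batch_view`, `speed_view`, `lambda_query`, and `ContinuousTable` are all hypothetical.

```python
from collections import Counter

def batch_view(events, cutoff):
    """Batch layer: recompute counts over events up to the cutoff timestamp."""
    return Counter(e["key"] for e in events if e["ts"] <= cutoff)

def speed_view(events, cutoff):
    """Speed layer: count recent events the batch layer has not covered yet."""
    return Counter(e["key"] for e in events if e["ts"] > cutoff)

def lambda_query(events, cutoff, key):
    """Lambda model: two separate code paths, merged at query time."""
    return batch_view(events, cutoff)[key] + speed_view(events, cutoff)[key]

class ContinuousTable:
    """Continuous flow model: one code path updates one table per event."""

    def __init__(self):
        self.counts = Counter()

    def append(self, event):
        self.counts[event["key"]] += 1

    def query(self, key):
        return self.counts[key]

# Hypothetical event stream used only for this example.
EVENTS = [
    {"key": "click", "ts": 1},
    {"key": "view", "ts": 2},
    {"key": "click", "ts": 3},
    {"key": "click", "ts": 4},
]

table = ContinuousTable()
for event in EVENTS:
    table.append(event)

print(lambda_query(EVENTS, cutoff=2, key="click"), table.query("click"))
```

Both approaches return the same answer here; the difference is that the Lambda version maintains two pipelines plus the merge logic, which is the overhead the talk description points to.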

See full Getting Started with Delta Lake tutorial series here:
https://databricks.com/getting-started-with-delta-lake-tutorial-series/


Other links:
Notebook: https://dbricks.co/dlw-01
Slides: https://dbricks.co/BeyondLambdaSlides
https://delta.io
https://mlflow.org
Spark + AI Summit Discount Code: DennySAI020
Register for Spark + AI Summit: https://databricks.com/sparkaisummit/north-america
  • 1 participant
  • 58 minutes
complexity
streaming
pipelines
processes
delta
lambda
troubleshooting
schemas
api
thinking

27 Feb 2020

Online Tech Talk with Denny Lee, Developer Advocate @ Databricks

Planning for data science initiatives requires a holistic view of the entire data analytics realm. Data engineering is a key enabler of data science, helping furnish reliable, quality data in a timely fashion. Delta Lake, an open-source storage layer that brings reliability to data lakes, can help take your data reliability to the next level.

In this session you will learn about:
* The data science lifecycle
* The importance of data engineering to successful data science
* Key tenets of modern data engineering
* How Delta Lake can help make reliable data ready for analytics
* The ease of adopting Delta Lake for powering your data lake
* How to incorporate Delta Lake within your data infrastructure to enable Data Science

QUICK LINKS:
Download the notebook: https://dbricks.co/dlw-01
https://delta.io
https://mlflow.org
Spark + AI Summit Discount Code: DennySAI020
Register for Spark + AI Summit: https://databricks.com/sparkaisummit/north-america
  • 2 participants
  • 59 minutes
spark
server
cloud
infrastructure
delta
webinar
kubernetes
hi
discussion
lee

19 Feb 2020

We're re-igniting the Spark Online Meetup! In this live meetup, Denny Lee (Engineer and Developer Advocate at Databricks) interviews Delta Lake engineer Burak Yavuz.

Read more here: https://delta.io/
Learn more about Delta Lake Connectors: https://github.com/delta-io/connectors
Join Delta Community Slack: https://dbricks.co/DeltaSlack
  • 4 participants
  • 44 minutes
geeks
brock
hi
whatnot
delta
denny
came
developer
streaming
thinking