Delta Lake / Conference Talks


These are all the meetings we have in "Conference Talks" (part of the organization "Delta Lake"). Click into individual meeting pages to watch the recording and search or read the transcript.

16 Dec 2022

Modernize Regulatory Reporting: Get Ready for T+1 Settlement - Antoine Amend, Databricks; Stephen Goldbaum, Morgan Stanley; Ephrim Stanley, Goldman Sachs and Moderated by Ashley Trainer, Databricks

Speakers: Ephrim Stanley, Ashley Trainor, Antoine Amend, Stephen Goldbaum
Financial regulators and capital market participants have laid out a path to shorten the standard settlement cycle to the trade date plus one business day (T+1) by September 3, 2024. Shortening the settlement cycle can mitigate risk, increase overall efficiency, and allow for better use of capital. Panelists will cover the advantages of a unified, open-source technology platform (such as Morphir or Legend) and application interoperability in managing this significant transition. In this talk, the speakers will:
Demonstrate the benefits of the Lakehouse in the ingestion, processing, validation, and transmission of regulatory data.
Address the need for organizations to ensure consistency, integrity and timeliness of regulatory pipelines.
Show how capital market firms could bring full transparency and confidence to the regulatory data, reducing operational costs and adapting to new standards like T+1.
Given the timeline of implementation, 2022 has been labeled the year of impact analysis and securing budgets and management buy-in.
  • 4 participants
  • 24 minutes
volatility
transition
transaction
trading
finos
liquidity
2021
regulation
decisioning
backend

30 Jul 2022

The Pace of Innovation in Delta Lake - Vini Jaiswal, Delta Lake
  • 3 participants
  • 33 minutes
data
databricks
lake
iot
storage
meaningful
ai
decision
developers
workloads

19 Jul 2022

Delta Lake has quickly grown in usage across data lakes, driven by the growing number of use cases that require the DML capabilities it brings. Beyond support for ACID transactions, users want the ability to interactively query the data in their data lake. This is where a query engine like Trino (formerly PrestoSQL) comes in. Starburst provides an enterprise version of the popular Trino MPP SQL query engine and has recently open sourced its Delta Lake connector.

In this talk, Tom and Claudius will talk about the connector, its features, and how their users are expanding the functionality of their data lakes with improved performance and the ability to handle colliding modifications. Get started with this feature-rich and open stack without needing a multi-million dollar budget.

Connect with us:
Website: https://databricks.com
Facebook: https://www.facebook.com/databricksinc
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/data...
Instagram: https://www.instagram.com/databricksinc/
  • 6 participants
  • 34 minutes
trino
lakehouse
users
infrastructure
process
initiative
issue
nodes
streeno
delta

19 Jul 2022

Delta Lake is becoming a de facto standard for storing large amounts of data for analytical purposes in a data lake. But what is behind it? How does it work under the hood? In this session we will dive deep into the internals of Delta Lake by unpacking the transaction log, and also highlight some common pitfalls when working with Delta Lake (and show how to avoid them).

  • 1 participant
  • 46 minutes
delta
informations
lake
leak
streaming
databricks
pipelines
apis
technical
internals

19 Jul 2022

Data + AI Summit Keynote talk from Michael Armbrust

  • 1 participant
  • 9 minutes
delta
lake
infrastructure
dock
analysts
databases
complexity
updates
streaming
committers

19 Jul 2022

After three years of hard work by the Delta community, we are proud to announce the release of Delta Lake 2.0. Completing the work to open-source all of Delta Lake while tens of thousands of organizations were running it in production was no small feat, and we have the ever-expanding Delta community to thank!

Join this session to learn how the wider Delta community collaborated to bring these features and integrations together. This includes:

Integrations with Apache Spark™, Apache Flink, Apache Pulsar, Presto, Trino, and more.

Features such as OPTIMIZE ZORDER, data skipping using column stats, S3 multi-cluster writes, Change Data Feed, and more.

Language APIs including Rust, Python, Ruby, GoLang, Scala, and Java.

  • 5 participants
  • 38 minutes
lake
important
maintaining
infrastructure
delta
databrick
process
transactional
storage
proprietary

24 May 2022

Denny Lee from the Delta Lake project discusses in detail the new Native Delta Lake connector for Presto.
  • 4 participants
  • 50 minutes
lake
infrastructure
delta
database
staging
databricks
warehousing
logs
cloud
currently

12 May 2022

No description provided.
  • 1 participant
  • 26 minutes
databases
lake
infrastructure
databricks
warehouses
storage
delta
azure
reason
proprietary

16 Dec 2021

Delta Lake Connector for Presto - Denny Lee, Databricks

Delta Lake is an open-source project that enables building a lakehouse architecture on top of existing storage systems such as S3, ADLS, GCS, and HDFS. We, the Presto and Delta Lake communities, have come together to make it easier for Presto to leverage the reliability of data lakes by integrating with Delta Lake. In this session, we would like to share the design decisions and internals of the Presto/Delta connector.

For more info about Presto, an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes, see: https://prestodb.io/
  • 2 participants
  • 33 minutes
presta
prestacon
presto
pre
databricks
delta
interface
meta
mlflow
lake

14 Jul 2020

Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Comcast, GrubHub, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, Presto has in the last few years experienced unprecedented growth in popularity in both on-premises and cloud deployments over object stores, HDFS, NoSQL, and RDBMS data stores. Delta Lake, a storage layer originally invented by Databricks and recently open sourced, brings ACID capabilities to big datasets held in object storage. While initially designed for Spark, Delta Lake now supports multiple query compute engines. In particular, Starburst developed a native integration for Presto that leverages Delta-specific performance optimizations. In this talk we show how combining Presto, Spark Streaming, and Delta Lake into one architecture supports highly concurrent and interactive BI analytics. Furthermore, Presto enables query-time correlations between S3-based IoT data, customer data in a legacy Oracle database, and web log data in Elasticsearch.

About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unifie...
  • 1 participant
  • 33 minutes
presto
functionality
databases
analytics
stuber
streaming
superset
enterprise
engine
boosting

21 Oct 2019

This talk will start by explaining the optimal file format, compression algorithm, and file size for plain vanilla Parquet data lakes. It discusses the small file problem and how you can compact the small files. Then we will talk about partitioning Parquet data lakes on disk and how to examine Spark physical plans when running queries on a partitioned lake. We will discuss why it's better to avoid PartitionFilters and directly grab partitions when querying partitioned lakes. We will explain why partitioned lakes tend to have a massive small file problem and why it's hard to compact a partitioned lake. Then we'll move on to Delta lakes and explain how they offer cool features on top of what's available in Parquet. We'll start with Delta 101 best practices and then move on to compacting with the OPTIMIZE command. We'll talk about creating a partitioned Delta lake and how OPTIMIZE works on a partitioned lake. Then we'll talk about ZORDER indexes and how to incrementally update lakes with a ZORDER index. We'll finish with a discussion on adding a ZORDER index to a partitioned Delta data lake.

  • 9 participants
  • 39 minutes
lake
delta
spark
chat
park
gig
querying
updated
thanks
amsterdam