Delta Lake / Delta Lake Tech Talks

These are all the meetings we have in "Delta Lake Tech Talks" (part of the organization "Delta Lake"). Click into individual meeting pages to watch the recording and search or read the transcript.

8 Nov 2022

There are a lot of use cases of Delta tables on AWS. Noritaka Sekiyama, Principal Big Data Architect at AWS Glue, will demonstrate how to get started with processing Delta tables on Amazon S3 using AWS Glue, Amazon Athena, and Amazon Redshift on Tuesday, November 8, 2022 at 4:00PM PDT.

Learn more about Delta Lake: https://delta.io/
Noritaka Sekiyama: https://www.linkedin.com/in/moomindani/
Denny Lee: https://www.linkedin.com/in/dennyglee/
Join us on Slack: https://go.delta.io/slack
Delta Lake Releases: https://github.com/delta-io/delta/releases
  • 3 participants
  • 48 minutes

3 May 2022

Change data capture is a popular method for unobtrusively ingesting data from SQL sources. In this talk, we will show how to easily incorporate your SQL data sources in near-real-time into Databricks and Delta Lake on Google Cloud. We will provide a short introduction to change data capture (CDC), Google Datastream (serverless CDC on Google Cloud), Databricks, and Delta Lake. In addition, we will give a walk-through of our new open source Spark Structured Streaming connector, which provides an easy-to-use, easy-to-configure way to link Datastream to Delta Lake.
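
As a rough illustration of the change data capture idea in this abstract, the Python sketch below replays an ordered stream of change events onto an in-memory table keyed by primary key. The event shape (`op`, `key`, `row`) and the `apply_changes` helper are hypothetical and are not the Datastream or connector format; a real pipeline would more likely express the same logic as a merge into a Delta table.

```python
from typing import Dict, Iterable, Optional, TypedDict


class ChangeEvent(TypedDict):
    op: str               # "insert", "update", or "delete" (assumed vocabulary)
    key: int              # primary key of the affected row
    row: Optional[dict]   # new row image; None for deletes


def apply_changes(table: Dict[int, dict], events: Iterable[ChangeEvent]) -> Dict[int, dict]:
    """Replay change events in order and return the resulting table state."""
    result = dict(table)  # copy so the caller's snapshot is not mutated
    for event in events:
        if event["op"] == "delete":
            result.pop(event["key"], None)       # deleting a missing key is a no-op
        elif event["op"] in ("insert", "update"):
            result[event["key"]] = event["row"]  # upsert the latest row image
        else:
            raise ValueError(f"unknown op: {event['op']}")
    return result


snapshot = {1: {"name": "ada"}, 2: {"name": "bob"}}
events = [
    {"op": "update", "key": 2, "row": {"name": "robert"}},
    {"op": "insert", "key": 3, "row": {"name": "cy"}},
    {"op": "delete", "key": 1, "row": None},
]
current = apply_changes(snapshot, events)
```

Treating inserts and updates as the same upsert keeps the replay idempotent for repeated row images, which is one common way to tolerate redelivered events.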

Quick links:
https://delta.io/
https://github.com/badal-io/datastream-deltalake-connector
https://databricks.com/blog/2022/02/03/google-datastream-integration-with-delta-lake-for-change-data-capture.html
  • 3 participants
  • 49 minutes

21 Apr 2022

Join us for a live tech talk and learn about architecting for data quality in the lakehouse with Delta Lake and PySpark. After the presentation, we’ll have time for questions. Excited to have you join us!

From null values and duplicate rows to modeling errors and schema changes, data can break for millions of reasons. To combat this, teams are increasingly adopting best practices from DevOps and software engineering to identify, resolve, and even prevent this "data downtime" from happening in the first place. Join Prateek Chawla and Ryan Kearns as they walk through how data and ML engineers can solve for data quality across the data lakehouse by applying data observability techniques. Topics to be discussed include: how to optimize for data reliability across your lakehouse's metadata, storage, and query engine tiers, building your own data observability monitors with PySpark, and the role of tools like Delta Lake to scale this design.
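
To make the idea of a data observability monitor concrete, here is a minimal sketch of two checks mentioned in the abstract, null values and duplicate rows. It is written in plain Python so it runs without a cluster; the function names and the 10% threshold are assumptions for illustration, not the speakers' implementation. In PySpark the same checks would usually be expressed as aggregations over a DataFrame.

```python
from collections import Counter
from typing import List, Sequence


def null_rate(rows: Sequence[dict], column: str) -> float:
    """Fraction of rows whose value for `column` is missing or None."""
    if not rows:
        return 0.0
    nulls = sum(1 for r in rows if r.get(column) is None)
    return nulls / len(rows)


def duplicate_count(rows: Sequence[dict], key: str) -> int:
    """Number of extra rows beyond the first for each repeated key value."""
    counts = Counter(r.get(key) for r in rows)
    return sum(c - 1 for c in counts.values() if c > 1)


def check_batch(rows: Sequence[dict], column: str, key: str,
                max_null_rate: float = 0.1) -> List[str]:
    """Return human-readable alerts for a batch (threshold is an assumed default)."""
    alerts = []
    rate = null_rate(rows, column)
    if rate > max_null_rate:
        alerts.append(f"null rate for {column} is {rate:.0%}")
    dups = duplicate_count(rows, key)
    if dups:
        alerts.append(f"{dups} duplicate value(s) for key {key}")
    return alerts


batch = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 7.5},
]
alerts = check_batch(batch, column="amount", key="id")
```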

Links:
Exercises: http://github.com/monte-carlo-data/data-downtime-challenge
Jupyter Notebooks: http://github.com/monte-carlo-data/data-observability-in-practice
  • 4 participants
  • 1:01 hours

24 Feb 2022

The data lakehouse combines the best elements of data lakes and data warehouses into a single platform to help data teams operate efficiently. With this modern data stack and Lakehouse capabilities, we can enable multiple types of data transformations to co-exist while eliminating the data silos in data teams. That means better data flows, simpler operational maintenance, and overall better data products! But what happens when a transaction's logic spans more than one table? How can we attain cross-collection consistency with foreign keys using multi-statement transactions? Getting this wrong might corrupt our data products! In this session, you will learn how to leverage Delta Lake and LakeFS to reach cross-collection consistency when operating on multi-statement transactions.

Speaker Bio:
As Vice President of Developer Experience at Treeverse, Adi helps build lakeFS, a git-like interface for the data lakehouse. In her work, she brings her vast industry research and engineering experience to bear in educating and helping teams design, architect, and build cost-effective data systems and machine learning pipelines that emphasize scalability, expertise, and business goals. Adi is a frequent worldwide presenter and the author of O'Reilly's upcoming book, "Machine Learning With Apache Spark." Adi is also a proud Databricks Beacon supporting Data & AI practitioners around the world! Previously, she was a senior manager for Azure at Microsoft, where she focused on building advanced analytics systems and modern architectures.

When Adi isn’t building data pipelines or thinking up new software architecture, you can find her on the local cultural scene or at the beach.
  • 3 participants
  • 48 minutes

8 Feb 2022

In this tech talk, we'll explore how Delta Sharing has been extended to support sharing Delta Tables on both Azure and Google Cloud Platform. We'll dive into some of the great enhancements to the Delta Sharing project over the past two releases, including some cool features like query limits, to reduce the size of the dataset you’d like to explore, adding a Share expiration time, automatic refresh of presigned file URLs for long running queries, as well as other enhancements to the Sharing Server protocol.

To conclude, we’ll talk about some of the upcoming milestones in the Delta Sharing project and end with a live Q&A session to get your feedback and to answer some of your burning questions.

*This meetup will be recorded and will be available online here: https://youtube.com/playlist?list=PLTPXxbhUt-YVPwG3OWNQ-1bJI_s_YRvqP

A few resources:

https://delta.io/sharing/
https://github.com/delta-io/delta-sharing
Speakers:

Will Girten is a Sr. SSA at Databricks, where he's helped some of the largest customers at Databricks build modern, enterprise Delta Lakes in the cloud. He specializes in building efficient and reliable ETL pipelines for fast data engineering and BI workloads. Prior to Databricks, Will worked as a Data Architect helping federal customers build intelligent data lakes in Healthcare and Government verticals.

Ryan Zhu is a Staff Software Engineer at Databricks, an Apache Spark committer and a member of the PMC. He is one of the core developers of Delta Lake, Delta Sharing and Structured Streaming.
  • 4 participants
  • 42 minutes

27 Oct 2021

Why Data is Eating the Universe: The Coming Age of Massive Sky Surveys
From Killer Asteroids to Dark Energy: How Apache Spark and Delta Lake can enable the next generation of discoveries in Astronomy

-----------------------------------------
Please note that this is an event hosted by the Seattle Spark+AI Meetup group. If you're local to the Seattle area, check them out! https://www.meetup.com/Seattle-Spark-Meetup
-----------------------------------------

Over the past decade, astronomy has morphed into an extremely data-rich field, with numerous telescope projects dedicated to scanning the sky every night in order to find and measure the properties of the tens of billions of visible objects in the sky. For example, Rubin Observatory’s Legacy Survey of Space and Time (LSST; http://lsst.org) will be the most comprehensive optical astronomy project ever undertaken. Starting in 2024, the LSST will take panoramic images of the entire visible sky twice each week for 10 years, building up the deepest, widest image of the Universe. The resulting hundreds of petabytes of imaging data for close to 40 billion objects can enable scientific investigations ranging from the properties of near-Earth asteroids to characterizations of dark matter and dark energy. Yet at the same time, the sheer data volume and richness make it a difficult dataset to analyze using classical data management tools.

This is where Spark can help. At the UW’s DIRAC Institute, we’re about to embark on a 5-yr LINCC Frameworks project to develop analysis frameworks on industry-standard solutions, and enable astronomers to scalably work with petabytes of data stored both in cloud and on traditional HPC resources. Spark, combined with astronomy-specific extensions we developed, enabled us to prototype a system that gave our researchers exploratory access to large astronomical datasets. In this talk, we will describe the challenges of astronomical data analysis, how we tweaked Spark to analyze 2Bn of astronomical time-series data, some hopes and visions for a (cloud-based) future, and how you could get involved with the largest data analysis problem in the history of optical astronomy.

-----------------------------------------
Speakers
-----------------------------------------
Mario Juric: I’m interested in astronomical ‘Big Data’: developing and applying methods and algorithms that let us use large data sets to answer research questions. Major astronomical surveys of today are routinely collecting hundreds of terabytes of images, creating databases with billions of objects and several billion measurements. Astronomers working on large surveys are becoming part data scientists. In my research, I go where the data takes me: I’ve worked on topics ranging from asteroids in the Solar System and Galactic structure to the scale structure of the universe. My current focus is using survey data to understand the structure and evolution of the Milky Way. I also lead the Data Management team for the Large Synoptic Survey Telescope, a project to build the largest sky survey ever undertaken.

Colin Slater: I work on understanding interactions between the Milky Way and the population of dwarf galaxies in the Local Group. This includes observing the tidal debris left behind by dwarfs as they fall onto the Galaxy, along with modeling the changing properties of dwarfs as they become satellites of the Milky Way. Much of my work uses data from the Pan-STARRS survey. I am part of the LSST Data Management System Science Team, and I support that project with analyses of the scientific requirements and expected performance of the survey.

Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here. https://databricks.com/databricks-named-leader-by-gartner
  • 3 participants
  • 47 minutes

7 Oct 2021

Delta Lake committers Christian Williams and R. Tyler Croy from Scribd discuss with Denny Lee from Databricks the technical and business requirements around the Delta Rust API project: kafka-delta-ingest.

This project aims to build a highly efficient daemon for streaming data through Apache Kafka into Delta Lake and has been in production at Scribd for the last four weeks after six months of active development.

Come to learn about why they built it and how it's going.

Resource links:
https://github.com/delta-io/kafka-delta-ingest
https://kafka.apache.org/
https://delta.io/

Speakers:

R. Tyler Croy leads the Platform Engineering organization at Scribd and has been an open source developer for over 14 years. His open source work has been in the FreeBSD, Python, Ruby, Puppet, Jenkins, and now Delta Lake communities. The Platform Engineering team at Scribd has invested heavily in Delta and has been building new open source projects to expand the reach of Delta Lake across the organization. Tyler is also a Databricks Beacon.

Denny Lee is a developer advocate at Databricks, where he works on Delta Lake, Apache Spark, Data Sciences, and Healthcare Life Sciences. He has previously built enterprise DW/BI and big data systems at Microsoft, including Azure Cosmos DB, Project Isotope (HDInsight), and SQL Server, and served as the Senior Director of Data Sciences Engineering at SAP Concur. Denny holds a Master's in Biomedical Informatics from Oregon Health Sciences University.

Christian Williams is a senior engineer on Scribd's Core Platform team. He has done application and data engineering for 15 years working with a wide range of languages and platforms, most recently working with Kafka, Delta Lake, Rust, and AWS to deliver streaming data ingestion. Before working in software Christian was also one of the fastest sandwich artists in the greater Jacksonville area.
  • 4 participants
  • 53 minutes

23 Sep 2021

We're starting to get feedback from the Delta Lake community on more integrations per the proposed 2021 H2 roadmap. In this talk, Vini and Denny will recap the features released in 2021 H1, including Spark 3.1 support and Delta Sharing, and what the community is asking for in the future roadmap for Delta Lake OSS. There are callouts for OPTIMIZE, Apache Heron, and Trino CTAS support as well as the current integration efforts around Apache Flink, PrestoDB, Apache Pulsar, LakeFS, and Nessie. Don't forget the standalone readers and writers and the Rust API optimizations.

What are your favorites? Come participate, engage with the community and get your voices heard for Delta Lake!

Learn more: https://delta.io/
Join the Delta Lake community slack: https://dbricks.co/delta-users-slack
Join the Delta Lake google group: https://groups.google.com/g/delta-users
Roadmap Survey: https://forms.gle/LGMQtEbEjGezvfPz6

Meet our speakers:

Vini Jaiswal is a Developer Advocate at Databricks who helps data practitioners implement scalable data architectures and AI applications. She brings several years of data and cloud experience working with Unicorns, Digital Natives, and some of the Fortune 500 companies, including roles as VP - Data Science Engineering Lead at Citibank and Data Analyst at Southwest Airlines. She holds an MS in Information Technology and Management from the University of Texas at Dallas.

Denny Lee is a Developer Advocate at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premise and cloud environments. He also has a Masters of Biomedical Informatics from Oregon Health and Sciences University and has architected and implemented powerful data solutions for enterprise Healthcare customers.
  • 3 participants
  • 48 minutes

31 Aug 2021

This is a recording of the live tech talk hosted on August 31, 2021.

Join us for a tech talk about LakeFS and Delta Lake. Paul Singman from Treeverse and Denny Lee from Databricks discuss multi-table transactions with LakeFS and Delta Lake.

LakeFS enables you to manage your data lake the way you manage your code, allowing for a collaborative development environment and CI/CD deployment of data. Delta Lake is an open-source project that enables building a Lakehouse architecture on top of existing storage systems such as S3, ADLS, GCS, and HDFS. In this session, Paul and Denny will discuss how the integration of LakeFS and Delta Lake simplifies your multi-table pipelines.

Speakers:

Paul leads DevRel for lakeFS, after several years as an ML engineer at Equinox Fitness. He enjoys contextualizing the latest data trends and technologies in blog posts and talks, instead of getting caught up in the hype surrounding specific tools.

Denny Lee is a developer advocate at Databricks, where he works on Delta Lake, Apache Spark, Data Sciences, and Healthcare Life Sciences. He has previously built enterprise DW/BI and big data systems at Microsoft, including Azure Cosmos DB, Project Isotope (HDInsight), and SQL Server, and served as the Senior Director of Data Sciences Engineering at SAP Concur. Denny holds a Master's in Biomedical Informatics from Oregon Health Sciences University.
  • 3 participants
  • 45 minutes

26 Aug 2021

Join us for part three of a three part tech talk series: Upgrading from legacy to the cloud with Scribd. This is the final session, Moving Ad Hoc Users to the Cloud. Alexander Kushnir, R. Tyler Croy, and Hamilton Hord from Scribd discuss with Denny Lee from Databricks the technical and business issues around moving ad-hoc jobs to the cloud as part of Scribd’s migration from legacy environments to the cloud.

In this session, we dive into a variety of topics including exploratory non-dev use cases, how Scribd moved development into Databricks, model training use cases, and shared cluster resources. Listen to how Scribd engineers used Delta Lake to solve their production distributed cloud data issues.

Part One Recording: Replicating Data to the Cloud - https://youtu.be/vGv6AcPp7Zs

Part Two Recording: Moving Batch Jobs over to the Cloud - https://youtu.be/siVvtalssrI
  • 4 participants
  • 50 minutes

25 Aug 2021

Join us for part two of a three part tech talk series: Upgrading from legacy to the cloud with Scribd. Part Two: Moving Batch Jobs over to the Cloud.

Part One: Replicating Data to the Cloud Recording: https://youtu.be/vGv6AcPp7Zs

Abstract: Alexander Kushnir and Stas Bytsko from Scribd discuss with Denny Lee from Databricks the technical and business issues around moving batch jobs to the cloud as part of Scribd’s migration from legacy environments to the cloud. Instead of performing the migration as a big bang, there was a byte-by-byte migration performed incrementally to minimize any disruption to the business. Listen to how Scribd engineers used Delta Lake to solve their production distributed cloud data issues.

Speakers:

Stas Bytsko is the team lead of the Data Engineering team at Scribd with a somewhat mysterious, and possibly exciting, past.

Alex Kushnir is a Data Architect and a Tech Lead of the Data Engineering team at Scribd. Throughout his almost 20-year career, he has acquired experience in various software engineering domains: desktop applications, web development, mobile APIs, distributed computing, cloud architecture, big data. He has designed and implemented solutions utilizing various data stores: relational databases, document databases, key/value stores, object stores. For the past 5 years, he has focused on distributed computing in cloud environments utilizing various big data tech stacks, and he’s a big fan of Apache Spark.
https://www.linkedin.com/in/alexander-kushnir-2b96114a/

Denny Lee is a developer advocate at Databricks, where he works on Delta Lake, Apache Spark, Data Sciences, and Healthcare Life Sciences. He has previously built enterprise DW/BI and big data systems at Microsoft, including Azure Cosmos DB, Project Isotope (HDInsight), and SQL Server, and served as the Senior Director of Data Sciences Engineering at SAP Concur. Denny holds a Master's in Biomedical Informatics from Oregon Health Sciences University.
  • 3 participants
  • 46 minutes

24 Aug 2021

Join us for part one of a three part tech talk series: Upgrading from legacy to the cloud with Scribd. This series will run August 24-26 at 9AM PT each day. Come join us!

Part One: Replicating Data to the Cloud

Alexander Kushnir (https://www.linkedin.com/in/alexander-kushnir-2b96114a/) and Maksym Dovhal from Scribd discuss with Denny Lee from Databricks the technical and business issues when migrating Scribd’s systems from legacy environments to the cloud. We discuss many technical issues ranging from S3 eventual consistency, cross-cloud consistency, metastore consolidation issues, utilizing multiple catalogs for the same Delta tables, data replication from on-premises to the cloud, and more. Listen to how Scribd engineers used Delta Lake to solve their production distributed cloud data issues.
  • 3 participants
  • 46 minutes

18 May 2021

Join Delta Lake committers Burak Yavuz, Shixiong (Ryan) Zhu, Tathagata Das, QP Hou, and R. Tyler Croy for a fun and informative “Ask Me Anything” session on the journey of Delta Lake.

One week before Data+AI Summit 2021 (https://databricks.com/dataaisummit/north-america-2021), this is your chance to ask questions of key contributors to the Delta Lake project, especially on the how and why! So come prepared with your questions! And yes, there will be a special appearance by Michael Armbrust as well!
  • 8 participants
  • 58 minutes

29 Apr 2021

Join us for the final in a four part series with Salesforce Engineering.

Link to complete series recording: https://delta.io/news/salesforce-engineering-delta-lake-tech-talk-series/

Abstract:
As we build our Engagement Delta Lake on Databricks Workspace, one of the challenges is how to automate the integration testing of our Spark jobs in the CI/CD pipeline. We came up with two designs to tackle the challenge: Namespace Deployment and Scenario Based Testing. In this talk, we will discuss the rationale and implementations of the two designs.

Part 1: Engagement Activity Delta Lake - https://youtu.be/a7_I1Qi1LoU
Part 2: Boost Delta Lake Performance with Data Skipping and Z-Order - https://youtu.be/CwJeKANlSLo
Part 3: Global Synchronization and Ordering in Delta Lake - https://youtu.be/OtYXc6ud2bQ


-----------------
Speakers
-----------------

Zhidong Ke, Software Engineer PMTS, Salesforce
Zhidong is passionate about designing distributed systems, real-time/batch data processing, and building applications.

Yifeng Liu, Software Engineer LMTS, Salesforce
Yifeng is a software engineer who has extensive experience in big data processing and distributed systems and is interested in building high-volume, high-complexity, low-latency data pipelines and frameworks.

Aaron Zhang, Software Engineering PMTS, Salesforce
Aaron is an experienced software engineering leader with interests and areas of focus in engineering secure, fault-tolerant, high-volume systems built on microservices.

Heng Zhang, Software Engineering PMTS, Salesforce
Heng is a software engineer who is interested in and specializes in microservices, distributed systems, and big data.
  • 7 participants
  • 54 minutes

1 Apr 2021

Join us for the second in a four part series with Salesforce Engineering.

Abstract:
When building a data lake, partitioning strategy is one of the most critical decisions to make. A poorly optimized partitioning strategy can generate small files and undermine read and write performance. Besides traditional file-based partitioning with partition pruning, Databricks provides another option: Data Skipping and Z-Ordering (https://docs.databricks.com/delta/optimizations/file-mgmt.html) with I/O pruning and file compaction. In this talk, we will share the evolving thinking behind our partitioning strategy when building the Engagement Delta Lake. Using this real-world use case, we will elaborate on why and how we leverage Data Skipping and Z-Ordering to boost Delta Lake performance.
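
The interplay between Z-ordering and data skipping can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Databricks implementation: `z_value` interleaves the bits of two integer columns (a Morton code), rows are sorted by that value and cut into four hypothetical "files", and each file keeps per-column min/max statistics that a reader uses to skip files.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return z


def file_stats(rows):
    """Per-file (min, max) statistics for each column, as used for data skipping."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return {"x": (min(xs), max(xs)), "y": (min(ys), max(ys))}


def files_matching(stats, column, value):
    """Indexes of files whose min/max range could contain `value`."""
    return [i for i, s in enumerate(stats) if s[column][0] <= value <= s[column][1]]


def split_into_files(rows, rows_per_file=4):
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


points = [(x, y) for x in range(4) for y in range(4)]  # a 4x4 grid of (x, y) rows

# Layout 1: rows sorted by x, then y (a plain linear sort).
linear_stats = [file_stats(f) for f in split_into_files(sorted(points))]

# Layout 2: rows sorted by their Z-order value.
z_sorted = sorted(points, key=lambda p: z_value(*p))
z_stats = [file_stats(f) for f in split_into_files(z_sorted)]

linear_hits = files_matching(linear_stats, "y", 0)  # filter on the second column
z_hits = files_matching(z_stats, "y", 0)
```

In this toy layout, a filter on `y` must read every file under the linear sort but only half of them under the Z-ordered layout, while a filter on `x` still prunes files in both.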

Part 1: Engagement Activity Delta Lake - https://youtu.be/a7_I1Qi1LoU

-----------------
Speakers
-----------------

Zhidong Ke, Software Engineer PMTS, Salesforce
Zhidong is passionate about designing distributed systems, real-time/batch data processing, and building applications.

Yifeng Liu, Software Engineer LMTS, Salesforce
Yifeng is a software engineer who has extensive experience in big data processing and distributed systems and is interested in building high-volume, high-complexity, low-latency data pipelines and frameworks.

Aaron Zhang, Software Engineering PMTS, Salesforce
Aaron is an experienced software engineering leader with interests and areas of focus in engineering secure, fault-tolerant, high-volume systems built on microservices.

Heng Zhang, Software Engineering PMTS, Salesforce
Heng is a software engineer who is interested in and specializes in microservices, distributed systems, and big data.
  • 7 participants
  • 41 minutes

23 Mar 2021

Apache Spark™ has become the de-facto open-source standard for big data processing for its ease of use and performance. The open-source Delta Lake project improves Spark’s data reliability, with new capabilities like ACID transactions, Schema Enforcement, and Time Travel.

Join us in this meetup to learn more about the performance improvements in Apache Spark 3.0 including Adaptive Query Execution (AQE), Dynamic Partition Pruning (DPP), and handling skewed queries!

Topics to be covered include:

* The new Adaptive Query Execution (AQE) framework within Spark 3.0 can yield query performance gains. Based on a 3TB TPC-DS benchmark, two queries had more than a 1.5x speedup, and another 37 queries had more than 1.1x speedup.
* With Dynamic Partition Pruning (DPP), we can significantly speed up performance by pruning partitions based on the joins between the fact and dimension tables common in star schema design.
* Showcasing transactional support as part of DataSourceV2 with Delta Lake
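
For readers who want to try the features listed above, the sketch below collects the Spark SQL settings that control AQE and DPP and renders them as `spark-submit --conf` arguments. The configuration keys are standard Spark 3.x settings; the values, the `SPARK3_TUNING` name, and the `to_submit_args` helper are illustrative assumptions, not recommendations from the talk.

```python
# Spark SQL settings related to Adaptive Query Execution (AQE) and
# Dynamic Partition Pruning (DPP). Values here are illustrative only.
SPARK3_TUNING = {
    "spark.sql.adaptive.enabled": "true",                     # turn on AQE
    "spark.sql.adaptive.coalescePartitions.enabled": "true",  # merge small shuffle partitions
    "spark.sql.adaptive.skewJoin.enabled": "true",            # split skewed join partitions
    "spark.sql.optimizer.dynamicPartitionPruning.enabled": "true",  # turn on DPP
}


def to_submit_args(conf: dict) -> list:
    """Render a config mapping as `--conf key=value` pairs for spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


submit_args = to_submit_args(SPARK3_TUNING)

# With PySpark installed, the same settings could instead be applied while
# building a session, for example:
#   builder = SparkSession.builder
#   for key, value in SPARK3_TUNING.items():
#       builder = builder.config(key, value)
```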
  • 1 participant
  • 57 minutes

18 Mar 2021

In part one, we’ll talk about how they built the engagement activity Delta Lake to support Einstein Analytics for creating powerful reports and dashboards and Sales Cloud Einstein for training machine learning models.

Abstract:
At Salesforce, customers use High Velocity Sales (https://www.salesforce.com/products/sales-cloud/tools/high-velocity-sales/) to intelligently convert leads and create new opportunities. To support it, we built the engagement activity platform to automatically capture and store user engagement activities using Delta Lake, which is one of the key components supporting Einstein Analytics (https://www.salesforce.com/products/einstein-analytics/features/) for creating powerful reports and dashboards and Sales Cloud Einstein (https://www.salesforce.com/products/sales-cloud/features/sales-cloud-einstein/) for training machine learning models.

Topics include: 1. Ingesting the data. 2. Incremental reads. 3. Supporting exactly-once writes across tables. 4. Handling mutations with cascading changes. 5. Normalizing tables in the data lake.

-----------------
Speakers
-----------------

Zhidong Ke, Software Engineer PMTS, Salesforce
Zhidong is passionate about designing distributed systems, real-time/batch data processing, and building applications.

Yifeng Liu, Software Engineer LMTS, Salesforce
Yifeng is a software engineer who has extensive experience in big data processing and distributed systems and is interested in building high-volume, high-complexity, low-latency data pipelines and frameworks.

Aaron Zhang, Software Engineering PMTS, Salesforce
Aaron is an experienced software engineering leader with interests and areas of focus in engineering secure, fault-tolerant, high-volume systems built on microservices.

Heng Zhang, Software Engineering PMTS, Salesforce
Heng is a software engineer who is interested in and specializes in microservices, distributed systems, and big data.
  • 8 participants
  • 51 minutes

16 Mar 2021

In this talk, Tristan Nixon, a Solutions Architect at Databricks and Ricardo Portilla, Lead Solutions Architect at Databricks, will demonstrate how data teams can leverage an open-source package tempo (available in Python and Scala) to advance time series use cases with Delta Lake and Spark.

In particular, we will show how operations ranging from resampling to AS OF joins and descriptive analytics on up to millions of time series can be done in parallel using a simple interface.
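
For readers unfamiliar with AS OF joins, the following single-machine Python sketch shows the semantics: each left-hand row is matched with the most recent right-hand row whose timestamp is less than or equal to its own. The `asof_join` helper and the sample data are hypothetical and are not tempo's API; tempo applies the same idea to Spark DataFrames in parallel.

```python
from bisect import bisect_right


def asof_join(left, right):
    """For each (ts, value) in `left`, attach the latest `right` value with ts <= left ts.

    Both inputs are lists of (timestamp, value) tuples; unmatched rows get None.
    """
    right_sorted = sorted(right, key=lambda r: r[0])
    right_ts = [ts for ts, _ in right_sorted]
    joined = []
    for ts, value in left:
        idx = bisect_right(right_ts, ts) - 1  # last right row at or before ts
        match = right_sorted[idx][1] if idx >= 0 else None
        joined.append((ts, value, match))
    return joined


trades = [(1, "t0"), (5, "t1"), (10, "t2")]
quotes = [(2, 100.0), (4, 101.0), (9, 99.5)]
result = asof_join(trades, quotes)
```

The binary search keeps each lookup logarithmic in the number of right-hand rows, which matters once a series holds many observations.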

Speakers:

Tristan Nixon is a Solutions Architect at Databricks. Tristan has been working in data science and ML engineering for over 15 years, in industries from Education to Telecoms and Chemical Manufacturing. He joined Databricks about a year ago, where he acts as an SME for time series and Natural Language Processing (NLP).

Ricardo Portilla is a Solutions Architect at Databricks. Ricardo works with data teams to put data engineering, data analytics, and data science use cases into production. He has been at Databricks for ~3 years helping customers with use cases in all verticals and previously worked in the financial industry for 7 years.
  • 3 participants
  • 59 minutes
chats
streaming
thanks
meetup
ai
joins
visit
tempo
summit
salvador

26 Jan 2021

Deep learning has come a long way over the past few years. With advances in cloud computing, frameworks, and open source tooling, working with images has gotten simpler over time. Delta Lake has been amazing at creating a tabular structured transactional layer on object storage, but what about images? Would you like to know how to gain a 45x improvement in your image processing pipeline? Join Jason and Rohit on Data Collab Lab as we find out!
  • 5 participants
  • 49 minutes
dataworks
databricks
episode
processing
view
enterprise
hosting
streaming
lee
lake

17 Dec 2020

The Delta Architecture pattern has made the lives of data engineers much simpler, but what about improving query performance for data analysts? What are some common places to look at for tuning query performance? In this session we will cover some common techniques to apply to our Delta tables to make them perform better for data analysts' queries. We will look at a few examples of how you can analyze a query and determine what to focus on to deliver better performance results.

About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unified-data-analytics-platform

See all the previous Summit sessions: https://databricks.com/sparkaisummit/north-america/sessions

Connect with us:
Website: https://databricks.com
Facebook: https://www.facebook.com/databricksinc
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks/
Instagram: https://www.instagram.com/databricksinc/
  • 1 participant
  • 31 minutes
optimizations
delta
benchmarking
workflow
data
analyze
staging
schedule
querying
checkpoints

11 Nov 2020

Notes from the Perf Lab with Fish and Joe

Have you ever had poorly performing queries in your Spark jobs? Want to learn how some of the most skilled practitioners dissect data performance problems at scale? Join us on the next Data Collab Lab as Franco and Denny compare notes with Chris Fish and Joe Widen about tuning some of the toughest data problems around!

Check out more details here: https://www.meetup.com/data-ai-online/events/274420324/

-----------------
Guests
-----------------
Chris Hoshino-Fish is a Solutions Architect at Databricks. Chris is an active member of the Performance Subject Matter Expert group and a former Principal Consultant focused on Data Engineering, working with several Fortune 500 Databricks customers. Prior to Databricks, Chris worked for an adtech company as a data engineer managing pipelines using Apache Spark for 3.5 years. Chris has a B.A. in Computational Mathematics from University of California, Santa Cruz.

Joe Widen is a Solutions Architect at Databricks. Joe leads the Performance and Delta SME horizontal initiatives along with making customers successful with the Databricks Unified Analytics Platform. Joe has been working with Spark and more generally Hadoop for 5 years, with previous stops at Hortonworks and Capital One.

-----------------
Hosts
-----------------
Franco Patano is a Solutions Architect at Databricks, where he brings over 10 years of industry experience in data engineering and analytics. He has architected, managed, and analyzed data applications both big and small, with open source and proprietary software, utilizing SQL, Python, Scala, Java, and Apache Spark, as well as experimenting with data science. Prior to Databricks, Franco worked as a Data Architect and Analyst in the Commercial Real Estate, Banking, and Education industries for organizations large and small.

Denny Lee is a developer advocate at Databricks, where he works on Delta Lake, Apache Spark, Data Sciences, and Healthcare Life Sciences. He has previously built enterprise DW/BI and big data systems at Microsoft including Azure Cosmos DB, Project Isotope (HDInsight), and SQL Server as well as the Senior Director of Data Sciences Engineering at SAP Concur. Denny holds a Masters in Biomedical Informatics from Oregon Health Sciences University.
  • 4 participants
  • 54 minutes
spark
bottleneck
databricks
problems
tasks
geeky
starters
performance
worry
experience

27 Oct 2020

Apache Spark™️ has become the de-facto open-source standard for big data processing due to its ease of use and performance. And the open-source Delta Lake project enhances Spark’s lead with new capabilities like ACID transactions, Schema Enforcement and Time Travel. These features help ensure that data lakes and data pipelines can deliver high-quality, reliable data to downstream data teams for successful data analytics and machine learning projects.

In this tech talk, we will discuss the top tuning tips for Apache Spark 3.0 and Delta Lake on Databricks. Come prepared to ask your questions and join Joe Widen, Chris Hoshino-Fish, and Denny Lee to discuss when to use which join operations, how to pick your machine sizes, how to help speed up your merge operations, and how to make your jobs easier!

Link to slides and the notebooks used in this tutorial: https://github.com/databricks/tech-talks

Chapters
0:00 Welcome
02:52 Use the latest version of DBR
04:53 Picking the best join strategy
13:39 Use Apache Spark 3.0 and AQE
26:27 Partition Pruning
28:36 Data Skipping
31:24 Z-Ordering
39:34 Databricks Delta Lake and Stats
44:39 Optimizing Merges
47:24 Picking good instance types

Speakers:

Chris Hoshino-Fish is a Solutions Architect at Databricks. Chris is an active member of the Performance Subject Matter Expert group and a former Principal Consultant focused on Data Engineering, working with several Fortune 500 Databricks customers. Prior to Databricks, Chris worked for an adtech company as a data engineer managing pipelines using Apache Spark for 3.5 years. Chris has a B.A. in Computational Mathematics from University of California, Santa Cruz.

Denny Lee is a developer advocate at Databricks, where he works on Delta Lake, Apache Spark, Data Sciences, and Healthcare Life Sciences. He has previously built enterprise DW/BI and big data systems at Microsoft including Azure Cosmos DB, Project Isotope (HDInsight), and SQL Server as well as the Senior Director of Data Sciences Engineering at SAP Concur. Denny holds a Masters in Biomedical Informatics from Oregon Health Sciences University.

Joe Widen is a Solutions Architect at Databricks. Joe leads the Performance and Delta SME horizontal initiatives along with making customers successful with the Databricks Unified Analytics Platform. Joe has been working with Spark and more generally Hadoop for 5 years, with previous stops at Hortonworks and Capital One.

To join the zoom live chat:
https://www.meetup.com/data-ai-online/events/274093223/
  • 4 participants
  • 52 minutes
tuning
spark
tips
delta
databrick
server
gig
hadoop
lee
lake

20 Oct 2020

Are you struggling with Tableau performance in the cloud? Do you use Tableau Server to extract data from Delta Lake? Want to learn how you can create a hyperleaup from your Delta Lake to Tableau that will decrease latency to insights? Join us for another fun AMA session as we collaborate with Will Girten and Soham Bhatt, and figure out what exactly a hyperleaup is!

Link to notebook: https://github.com/databricks/tech-talks

-----------------
Guests
-----------------
Soham Bhatt is a Senior Solutions Architect at Databricks based out of Seattle, WA. He is passionate about helping his customers in their journeys to build modern Data Lakes for Advanced Analytics and ML/AI. Before Databricks he worked at Toyota Motors on building their next generation Big Data Platform. Prior to that his background was in building Enterprise Data Warehouses for Fortune 100 companies with Kimball methodologies and now he loves guiding his customers with best practices as they convert those EDWs into modern Data Engineering Platforms in AWS and Azure.

Will Girten is a Sr. RSA at Databricks. He's helped some of the largest federal customers at Databricks build modern, enterprise Delta Lakes in the cloud. He specializes in building efficient and reliable ETL pipelines for fast data engineering and BI workloads.

-----------------
Hosts
-----------------
Franco Patano is a Solutions Architect at Databricks, where he brings over 10 years of industry experience in data engineering and analytics. He has architected, managed, and analyzed data applications both big and small, with open source and proprietary software, utilizing SQL, Python, Scala, Java, and Apache Spark, as well as experimenting with data science. Prior to Databricks, Franco worked as a Data Architect and Analyst in the Commercial Real Estate, Banking, and Education industries for organizations large and small.

Denny Lee is a developer advocate at Databricks, where he works on Delta Lake, Apache Spark, Data Sciences, and Healthcare Life Sciences. He has previously built enterprise DW/BI and big data systems at Microsoft including Azure Cosmos DB, Project Isotope (HDInsight), and SQL Server as well as the Senior Director of Data Sciences Engineering at SAP Concur. Denny holds a Masters in Biomedical Informatics from Oregon Health Sciences University.

To join the live chat, check out the meetup page: https://www.meetup.com/data-ai-online/events/273893266/
  • 5 participants
  • 39 minutes
franco
ai
thanks
come
malcolm
databricks
denny
summit
logistical
introing

8 Oct 2020

How Scribd Uses Delta Lake to Enable the World's Largest Digital Library

A Discussion with Scribd Engineers on Delta Tables and the Transaction Log

Join us for the next Data Collab Lab with Franco and Denny, where we interview QP and Tyler from Scribd for a fun AMA session on How Scribd Uses Delta Lake to Enable the World's Largest Digital Library. In this session, we will talk with Scribd engineers about how they transitioned from legacy on-premises infrastructure to AWS and how they use, implement, and optimize Delta tables and the Delta transaction log. Come ready with questions, as this will be a fun interactive session.
  • 4 participants
  • 56 minutes
script
geeks
delta
workflows
chat
server
scribb
developer
challenges
danny

25 Aug 2020

For this tech chat, we will discuss a popular data warehousing fundamental: surrogate keys. As we have discussed in various other Delta Lake tech talks, the reliability brought to data lakes by Delta Lake has brought a resurgence of many of the data warehousing fundamentals such as Change Data Capture in data lakes. Surrogate keys are unique and lack any business context, so they can stand the test of time when joining domain (or dimensional) and fact data. Generating them can be difficult in single-node systems and even more complex in distributed systems. In this session, we will discuss the history and value of surrogate keys and the requirements for good strategies to implement this data warehousing fundamental in your Delta Lake.
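
As one illustration of why distribution complicates key generation: a sequential counter needs coordination between workers, whereas a hash of the source identifiers does not. The sketch below shows the hash-based approach only as an assumed example; it is not necessarily a strategy recommended in the talk, and the function name `surrogate_key` and the sample inputs are hypothetical.

```python
import hashlib

def surrogate_key(*business_key_parts):
    """Derive a deterministic 64-bit integer key from the source identifiers.

    Any worker can compute the key independently, so no shared counter is needed.
    Trade-off: collisions are possible in principle, although unlikely at 64 bits.
    """
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join(str(part) for part in business_key_parts)
    digest = hashlib.sha256(joined.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

# Hypothetical source system name and source record id.
key = surrogate_key("crm", "42")
```

The same inputs always produce the same key, which makes reprocessing idempotent; the resulting integer itself carries no readable business meaning.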

You can find the notebooks for this video at: https://github.com/databricks/tech-talks/tree/master/2020-08-25%20%7C%20Generating%20Surrogate%20Keys%20for%20your%20Data%20Lakehouse%20with%20Spark%20SQL%20and%20Delta%20Lake
  • 2 participants
  • 57 minutes
enterprise
microsoft
doug
sql
introduce
dbas
users
setup
thanks
washington

18 Aug 2020

On August 18th, 2020, join Apache Spark and Delta Lake committers Burak Yavuz, Tathagata Das, and Denny Lee for an illuminating “Ask Me Anything” session. Whether you would like to know more about the history of Apache Spark or the current bleeding-edge use cases of Spark 3.0 and Delta Lake, this is the session to ask your questions!

Learn more at delta.io

---
Speakers:

Burak Yavuz is a Software Engineer and Apache Spark committer at Databricks. He has been developing Structured Streaming and Delta Lake to simplify the lives of Data Engineers. Burak received his BS in Mechanical Engineering at Bogazici University, Istanbul, and his MS in Management Science & Engineering at Stanford.

Tathagata Das is a Staff Software Engineer at Databricks, an Apache Spark committer and a member of the Apache Spark Project Management Committee (PMC). He is one of the original developers of Apache Spark, the lead developer of Spark Streaming (DStreams) and is currently one of the core developers of Structured Streaming and Delta Lake. Previously, he was a grad student in UC Berkeley at AMPLab, where he conducted research about data-center frameworks and networks with Scott Shenker and Ion Stoica.

Denny Lee is a developer advocate at Databricks, where he works on Delta Lake, Apache Spark, Data Sciences, and Healthcare Life Sciences. He has previously built enterprise DW/BI and big data systems at Microsoft including Azure Cosmos DB, Project Isotope (HDInsight), and SQL Server as well as the Senior Director of Data Sciences Engineering at SAP Concur. Denny holds a Masters in Biomedical Informatics from Oregon Health Sciences University.
  • 4 participants
  • 54 minutes
delta
updates
streaming
meetup
spark
demos
webinar
lake
hey
comments

28 May 2020

We will discuss a popular online analytical processing (OLAP) fundamental, slowly changing dimensions (SCD), and specifically Type 2. As we have discussed in various other Delta Lake tech talks, the reliability brought to data lakes by Delta Lake has brought a resurgence of many of the data warehousing fundamentals such as Change Data Capture in data lakes. Type 2 SCD within data warehousing allows you to keep track of both the history and current data over time. We will discuss how to apply these concepts to your data lake within the context of the market segmentation of a climbing eCommerce site.
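
The core of Type 2 is that a change never overwrites a row: the current version is closed and a new version is appended. Below is a minimal in-memory sketch of that rule in plain Python; the column names (`start_date`, `end_date`, `is_current`) and the segment values are assumptions for illustration, not taken from the talk.

```python
from datetime import date

def apply_scd2(dimension, customer_id, new_segment, effective):
    """Close the current row for customer_id if the segment changed, then append the new version."""
    for row in dimension:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["segment"] == new_segment:
                return dimension  # nothing changed, so history stays as it is
            # Close the old version instead of overwriting it.
            row["is_current"] = False
            row["end_date"] = effective
    dimension.append({
        "customer_id": customer_id,
        "segment": new_segment,
        "start_date": effective,
        "end_date": None,       # open-ended until the next change
        "is_current": True,
    })
    return dimension

# Hypothetical history for one customer of a climbing shop.
dim = []
apply_scd2(dim, 1, "boulderer", date(2020, 1, 1))
apply_scd2(dim, 1, "alpinist", date(2020, 5, 28))
```

After the second call the table holds two rows for customer 1: the closed "boulderer" version and the current "alpinist" version, so both history and current state can be queried.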

Speaker:
Douglas Moore, Solution Architect

I’m passionate about helping customers find value in data analytics and helping the people I work with succeed. 25+ year data veteran, with experience ranging from embedded systems to massive cloud-based data lakes. My early career interest centered around producing 3D animations of Finite Element Modeled Elastic Waves. Career-wise, I came for the data visualizations and stayed for the computation and data. Past roles have included: Solutions Architect, Data Architect, CTO, Engineer. Current Specialties: Big Data Strategy & Architecture, Data Lakes, Streaming, Delta Lake, Spark, and Databricks.
  • 3 participants
  • 1:01 hours
meetup
meetups
chat
joining
forum
thanks
visit
community
onboard
summit

14 May 2020

Cloud computing has fundamentally changed how companies operate: users are no longer subject to the restrictions of on-prem hardware deployments, such as physical limits on resources and onerous environment upgrade processes. With this convenience and flexibility come challenges in properly monitoring how your users utilize these conveniently available resources. Failure to do so could result in problematic and costly anti-patterns.

In this tech conversation, Denny Lee will interview Craig Ng and Miklos Christine to discuss the best practices on how to process and analyze Databricks audit logs using Delta Lake and Structured Streaming. We will discuss and demonstrate how administrators can utilize audit logs to track resource usage and identify these potentially costly anti-patterns.

Agenda: 10AM PDT - 11AM PDT (GMT-7)

10:00AM - 10:50AM - Tech Talk
10:50AM - 11:00AM - Q&A

To join the live chat, check out the meetup page: https://www.meetup.com/data-ai-online/events/270455958/
  • 4 participants
  • 54 minutes
meetup
meetups
chat
streaming
vlogs
join
online
thanks
delta
speakers

7 May 2020

Join us for an online tech talk on Delta Lake. Tech talks include a technical presentation with slides and a demo, with time for Q&A at the end.

Abstract:
The General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) both aim to guarantee strong protection for individuals regarding their personal data and apply to businesses that collect, use, or share consumer data, whether the information was obtained online or offline. Compliance remains one of the top priorities for companies, and they are spending a lot of time and resources on being GDPR and CCPA compliant.

Your organization may manage hundreds of terabytes worth of personal information in your cloud. Bringing these datasets into GDPR and CCPA compliance is of paramount importance, but this can be a big challenge, especially for larger datasets stored in data lakes.

Learn how you can use Delta Lake, which is created by Databricks and powered by Apache Spark™, to manage GDPR and CCPA compliance for your data lake. Because Delta Lake adds a transactional layer that provides structured data management on top of your data lake, it can dramatically simplify and accelerate your ability to locate and remove personal information (also known as “personal data”) in response to consumer GDPR or CCPA requests without disrupting your data pipelines.

Join our Tech Talk to learn:
- The compliance challenges big data and data lakes create for organizations.
- How Delta Lake improves data lake management and makes it possible to quickly find and surgically remove or modify individual records.
- Best practices for GDPR and CCPA Compliance using Delta Lake.
- Use of “Pseudonymization” (https://en.wikipedia.org/wiki/Pseudonymization) and structuring pipelines to locate and remove the identifier to destroy the linkage between the pseudonyms and identifiers.
- Demo on how to easily fulfill data requests with Delta Lake and Databricks.
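
The pseudonymization item above can be sketched as follows: data tables store only a random pseudonym, a separate lookup maps real identifiers to pseudonyms, and deleting the lookup entry severs the link without touching the data tables. This is a minimal in-memory illustration in plain Python, not the talk's implementation; the names `lookup`, `events`, `record_event`, and `forget` are hypothetical.

```python
import uuid

lookup = {}   # identifier -> pseudonym (the only place the real identifier lives)
events = []   # downstream data stores pseudonyms only

def pseudonym_for(identifier):
    """Return the existing pseudonym for an identifier, creating one on first use."""
    if identifier not in lookup:
        lookup[identifier] = uuid.uuid4().hex  # random, so it cannot be reversed
    return lookup[identifier]

def record_event(identifier, action):
    events.append({"user": pseudonym_for(identifier), "action": action})

def forget(identifier):
    """Delete the mapping row; returns True if a mapping existed."""
    return lookup.pop(identifier, None) is not None

record_event("alice@example.com", "login")
record_event("alice@example.com", "purchase")
was_forgotten = forget("alice@example.com")
```

After `forget`, the two events still share a pseudonym and remain usable for aggregate analytics, but nothing in the system links that pseudonym back to the original identifier.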

Agenda: 10AM PDT - 11AM PDT (GMT-7)

10:00AM - 10:50AM - Tech Talk
10:50AM - 11:00AM - Q&A

To join the live chat, check out the meetup page: https://www.meetup.com/data-ai-online/events/270370715/
  • 4 participants
  • 1:04 hours
gdpr
governance
providers
data
gpr
privacy
disclaimer
session
ai
darin

30 Apr 2020

Join us for an online tech talk on Delta Lake presented by Denny Lee and Paul Roome. Tech talks include a technical presentation with slides and a demo, with time for Q&A at the end.

While it is common to use Delta Lake as a sink for change data captured from traditional data sources, customers are increasingly asking how to use Delta tables as a source for a change data capture (CDC) process. Put differently: how can we read a stream of changes from a Delta table so that they can be propagated downstream?

Some example use cases include (but are not limited to):

- After cleaning the data following the Delta Architecture (bronze, silver, and gold tables), propagate this data to multiple downstream systems.

- An e-commerce company is using a Delta table to store features related to each of their customers sourced from multiple upstream sources. Upon any customer data change, this is propagated to update downstream ML models to provide the latest product recommendations to the customer.

- A large software company is using a Delta table to process and store 100s of TBs of customer telemetry data. Changes in this table need to be sent to a downstream consumer for updating a range of dashboards and analytics.

In each of these cases, we want to capture a change stream from a Delta table and send it somewhere for further processing. In this session, we will discuss the architecture, use cases, and solutions.
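
To make concrete what such a change stream contains, the sketch below derives insert, update, and delete records by comparing two snapshots of a keyed table. It is a conceptual illustration in plain Python under the assumption that each row has a unique key; it does not show how Delta Lake itself exposes changes, and `diff_snapshots` and the sample rows are hypothetical.

```python
def diff_snapshots(old, new):
    """Compare two {key: row} snapshots and return a list of (operation, key, row) changes."""
    changes = []
    for key, row in new.items():
        if key not in old:
            changes.append(("insert", key, row))
        elif old[key] != row:
            changes.append(("update", key, row))  # carries the new row image
    for key, row in old.items():
        if key not in new:
            changes.append(("delete", key, row))  # carries the last known row image
    return changes

# Hypothetical customer table at two successive versions.
v1 = {1: {"tier": "free"}, 2: {"tier": "pro"}}
v2 = {1: {"tier": "pro"}, 3: {"tier": "free"}}
changes = diff_snapshots(v1, v2)
```

A downstream consumer can apply these records in order to keep its own copy in sync without re-reading the full table.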

Agenda: 10AM PDT - 11AM PDT (GMT-7)

9:00AM - 9:50AM - Tech Talk
9:50AM - 10:00AM - Q&A

Link to GitHub for the notebooks: https://github.com/databricks/tech-talks
  • 3 participants
  • 53 minutes
discussion
delta
meet
disruption
understanding
changelog
enterprise
demos
important
workflows

23 Apr 2020

Join us for an online tech talk on Delta Lake. Tech talks include a technical presentation with slides and a demo, with time for Q&A at the end.

Predictive Maintenance (PdM) is different from other routine or time-based maintenance approaches as it combines various sensor readings and sophisticated analytics on thousands of logged events in near real time and promises several fold improvements in cost savings because tasks are performed only when warranted. The top industries leading the IoT revolution include manufacturing, transportation, utilities, healthcare, consumer electronics & cars. The global market size for this is expected to grow at a CAGR of 28%. PdM plays a key role in Industry 4.0 to help corporations not only reduce unplanned downtimes, but also improve productivity and safety. The collaborative Data and Analytics platform from Databricks is a great technology fit to facilitate these use cases by providing a single unified platform to ingest the sensor data, perform the necessary transformations and exploration, run ML and generate valuable insights.

Agenda: 10AM PDT - 11AM PDT (GMT-7)

10:00AM - 10:50AM - Tech Talk
10:50AM - 11:00AM - Q&A

To join the live chat, check out the meetup page: https://www.meetup.com/data-ai-online/events/270166033/
  • 3 participants
  • 57 minutes
webinar
monitoring
streaming
chat
server
hi
iot
workshops
troubleshooting
delta