From YouTube: Delta Lake Community Office Hours (2022-05-26)
Description
Join us on May 26, 2022 at 9:00 AM PST for the Delta Lake Community Office Hours! Ask your Delta Lake questions live and join our guest speakers, Scott Sandre and Ryan Zhu, alongside Vini Jaiswal from Delta Lake!
Ask us your #DeltaLake questions. These sessions let our community ask questions about Delta Lake OSS and learn what we are building, what we plan to build, and what has recently been released. These sessions are live, and the recordings are available on the Delta Lake YouTube channel.
Quick links:
https://delta.io/
https://github.com/badal-io/datastream-deltalake-connector
https://groups.google.com/g/delta-users
A
In the meantime, please tune in to our Databricks LinkedIn page and YouTube channel. That's where you can find questions, or you can post your own questions. And please say hi and tell us where you're from, for those who are new to our session.
These sessions are live and occur every two weeks on Thursdays. We try to keep it at 9 a.m. Pacific time, or 12 p.m. Eastern time, and we bring a panel of contributors and champions of Delta Lake to this session, who answer your questions live about the Delta Lake open source software, and any questions about how we are working with the rest of the open source ecosystem.
So, a mini recap from our last session: we had a fun Trino plus Delta session, where we did some live music. We discussed Trino and the connector, and we hope to have fun on this session too. So, please ask away your questions.
A
We will be monitoring the channels on LinkedIn and YouTube. Awesome. And without further ado, let's start by introducing our panel. We have awesome contributors, Scott and Ryan, with us today, who will be answering your questions. So Scott, why don't you go ahead and introduce yourself?
B
For sure. Hi everyone, and thanks, Vini, for having me. My name's Scott. I've been working on Delta Lake for just about a year now. Currently I'm working hard on the next release, on some exciting new features that I'm sure I can talk about today. And yeah, looking forward to answering any of your questions.
C
Yeah, hello, I'm Ryan. I'm a software engineer at Databricks, and I have been working on Delta Lake since the beginning, which is already almost five years. And I'm pretty excited to be here. I would like to hear a lot of feedback on the project, and I can also try to help if you have any questions.
A
Awesome, awesome. Thank you, Scott and Ryan. So we have a few feeds coming in from LinkedIn. Wow, there are people joining from everywhere. All right, the first... oh, the questions are already coming in. So I'm going to read out the first question.
C
No, basically, right now, for the Delta Lake open source project, we focus on building Delta Lake itself. It's kind of like a data lake format, so we don't want to make our scope much larger right now.
A
Yeah, and just to add to that: we do have other efforts in the streaming area, but I think on these office hours we will focus on the things we are working on in Delta Lake, and we will help you out with a lot of the ins and outs of what is happening within the software. So ask away those questions. All right, the next question is: can we use Delta Lake for a heavy-volume load of data, like 600 gigabytes of data, for atomic operations?
B
I'm curious if this is a 600 gigabyte commit. But again, writing the data files is different than actually committing to the transaction log, which is just metadata, and which is very quick and easy. So again, it depends on the exact use case.
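To illustrate the point: a large write produces many Parquet data files, but the commit that makes them visible is a single small JSON entry in the table's `_delta_log` directory. A minimal sketch with PySpark and delta-spark (the paths here are hypothetical):

```python
from pyspark.sql import SparkSession

# Spark session with the Delta Lake extensions enabled.
spark = (SparkSession.builder
         .appName("big-atomic-write")
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

# Writing even ~600 GB is just a normal append; all the new data files
# become visible atomically when the single commit succeeds.
big_df = spark.read.parquet("/data/staging/events")  # hypothetical input
big_df.write.format("delta").mode("append").save("/data/delta/events")
```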
A
Yes, and also, Delta Lake is built for scaling your storage, operations, and transactions. So it's definitely meant for big data workloads. Awesome. So we already have the next question: please explain the difference between Serializable and WriteSerializable isolation.
A
Yeah, I agree, Ryan. I think we also have a talk around this, so I will share the document and the talk in the channel shortly after, and in our emails.
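For readers following along: WriteSerializable is Delta Lake's default isolation level, and Serializable is the stricter option that enforces a total serial order across reads and writes. The level is set as a table property; a minimal sketch (table path hypothetical):

```python
# Tighten a table from the default WriteSerializable to Serializable.
spark.sql("""
    ALTER TABLE delta.`/data/delta/events`
    SET TBLPROPERTIES ('delta.isolationLevel' = 'Serializable')
""")
```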
Another question we have is: do we have connectors to bring metadata into Purview or any other catalog services?
C
So basically, currently Delta Lake tries to manage the metadata by itself, which means we can ensure it's atomic, and you won't see any inconsistencies like you might between, say, a Hive metastore or another external metastore service.
A
Yeah, so just adding on top of that: if there is any kind of connector that you don't see, or if you want to see any kind of feature built into Delta Lake, we have a great SLA and our teams are working hard to determine the priorities. So please engage with us on GitHub and Slack. We will be happy to answer your questions.
A
Awesome. There is a flood of questions, so it's hard to pick, but the next question is: how different is Delta Lake from a lakehouse? Can you give some use case examples? Pretty cool question, actually. Who wants to take this?
B
For sure. I'm sure, Vini, you can comment on it as well, but Delta Lake is just an underlying technology that enables you to build a lakehouse. The lakehouse is an architectural paradigm for managing all of your end-to-end data use cases, and Delta Lake is just one technology in that chain that enables you to do so.
A
Exactly, and I think I will add on to what Scott said by giving some use case examples. You know, lakehouse is a term that we realized; it's not something that we developed, it's something that we realized from a lot of use cases, working with customers over the last four years. It just started becoming a thing: people are doing BI, analytics, machine learning, AI, everything together on a single platform, and that helped us coin a term, which is lakehouse. And Delta Lake is just an ideal enabler of, or foundation of, an ideal open source lakehouse. So hopefully that helps answer your question.
C
Yeah, so basically Delta provides ACID support, which means it supports transactions, and the OPTIMIZE command is also built on top of this, which will ensure your table never gets corrupted. For example, if you cancel an OPTIMIZE command, you will see either that the command completed as one atomic change, or that it never took effect at all.
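As a quick way to see that atomicity: the table history only ever shows commits that fully succeeded. A minimal sketch (path hypothetical):

```python
# Every row here is a completed atomic commit; an OPTIMIZE that was
# cancelled midway simply never appears in the history.
(spark.sql("DESCRIBE HISTORY delta.`/data/delta/events`")
      .select("version", "timestamp", "operation")
      .show(truncate=False))
```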
A
Exactly, yeah, that's right. ACID transactions basically enable us to help with that situation. That's why a lot of users use it: to make sure that their data is not corrupt and that it's reliable. Data reliability is one of the guarantees that Delta Lake provides, so yeah.
C
Yeah, I think this question is probably not really a Delta Lake question.
A
Yeah, exactly. And also, as far as Delta Lake is concerned, it's an independent project, and you can configure any kind of storage system underneath. And whatever storage system you are using underneath, follow the best practices on what its impacts and limitations are.
A
You know, interestingly enough, when you build data pipelines and your data volumes are growing, there might be limits that cloud providers or any vendor will impose on your environment and infrastructure, so be mindful of reading that documentation as well, and I'm happy to share some more. Cool.
C
Yeah, so basically Delta Lake doesn't really support unstructured data today, but with a number of libraries in Spark you can actually easily parse semi-structured and unstructured data into structured data and store it in Delta Lake. And Delta Lake also supports schema evolution, with which you can handle a lot of use cases like this: whenever you have a schema change in semi-structured data, you can still ingest that data into Delta Lake.
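A minimal sketch of that flow, assuming PySpark (paths hypothetical): parse the semi-structured input with Spark, then let Delta's `mergeSchema` option absorb a new column on write:

```python
# Day-two data may carry columns the table has not seen before.
parsed = spark.read.json("/landing/events/day2")

(parsed.write.format("delta")
       .mode("append")
       .option("mergeSchema", "true")  # evolve the table schema on write
       .save("/data/delta/events"))
```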
B
To add on to that, Ryan: Delta Lake works really well with the medallion architecture of bronze, silver, and gold tables. So what Delta Lake really enables you to do is import all of your data as bronze tables, without parsing it or performing any kind of computation or transformation on it, and then gradually aggregate it, parse it, and clean it up into your silver and gold level aggregates, which is just a really efficient pipeline and a really safe way to manage your data.
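A minimal sketch of such a medallion pipeline (the paths and column names are hypothetical):

```python
from pyspark.sql import functions as F

# Bronze: land the raw records as-is, no parsing or transformation.
spark.read.json("/landing/raw_events") \
     .write.format("delta").mode("append").save("/lake/bronze/events")

# Silver: parse, clean, and deduplicate the bronze data.
bronze = spark.read.format("delta").load("/lake/bronze/events")
silver = (bronze.filter(F.col("event_id").isNotNull())
                .dropDuplicates(["event_id"]))
silver.write.format("delta").mode("overwrite").save("/lake/silver/events")

# Gold: business-level aggregates ready for dashboards.
(silver.groupBy("event_type").agg(F.count("*").alias("event_count"))
       .write.format("delta").mode("overwrite").save("/lake/gold/event_counts"))
```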
A
Exactly. Awesome. We have another question, that is: is Delta Lake able to read data from web services, meaning some kind of external protocol?
C
Yeah, I think it basically depends on what the web service is. For example, S3 also provides its data through a web service; actually, it even looks like storage. So if your web service can provide all the storage requirements that we ask for, then we should be able to support it by building, for example, a new implementation of LogStore on top of this web service. But it really depends on what features your web service provides.
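For context, Delta Lake routes its transaction-log writes through a pluggable LogStore, and since Delta 1.0 an implementation can be registered per URI scheme. A hedged sketch (the class name and `mysvc` scheme are hypothetical; the class would have to provide the atomic, put-if-absent style guarantees the LogStore interface requires):

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         # Hypothetical LogStore backed by a custom web service,
         # registered for paths like mysvc://bucket/table.
         .config("spark.delta.logStore.mysvc.impl",
                 "com.example.delta.MyWebServiceLogStore")
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())
```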
A
Exactly. And, you know, Delta Lake supports a wide variety of sources. You might have Kafka streaming coming in, which is ingesting data from your source systems into Delta Lake. So any variety of systems, for example media files, or an IoT device which is sending sensor data; it could be multiple things, which Delta natively supports. And then another question is: can you explain Delta Lake with one real-time scenario or use case?
A
I can, yeah, I can answer that question. So, for example, say you have a real-time feed from an IoT device. I'm not really sure what your industry is, but I'm just going to take an IoT example. So imagine you have 150 sensors on different trucks, and they are sending data in. You can actually have that feed go into S3, and top it off with Delta Lake.
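A minimal sketch of that ingestion path with Spark Structured Streaming (the broker, topic, schema, and paths are all hypothetical):

```python
from pyspark.sql import functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

schema = StructType([
    StructField("truck_id", StringType()),
    StructField("sensor", StringType()),
    StructField("reading", DoubleType()),
    StructField("ts", TimestampType()),
])

# Continuously pull sensor readings from Kafka and land them in Delta.
readings = (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")
            .option("subscribe", "truck-sensors")
            .load()
            .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
            .select("r.*"))

(readings.writeStream.format("delta")
         .option("checkpointLocation", "/lake/checkpoints/truck_sensors")
         .start("/lake/bronze/truck_sensors"))
```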
A
How you use Delta Lake in that particular scenario is: whenever you have a detection event, for example a real-time detection event, like a driver's behavior going sideways based on your past analysis, you can actually put models around it and wait for those signals, to help you detect those signals and immediately take action on them.
A
And Delta helps you with that, because you can have either a batch or a streaming feed of data coming in from those devices, and it helps you monitor it, or put it into your machine learning pipeline, in either batch format or streaming format. And you can basically use those events for tracking. Or maybe there has been corrupted data, or maybe you had a new release of your software and something goes wrong.
A
You can roll it back using the transaction log from Delta Lake, and it helps you save those already-detected events. So that's one of the use cases; I'm sure there are so many more that I would be happy to share. We have a whole use case repository, so I'm happy to share that link as well.
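A hedged sketch of that rollback, using Delta's time travel plus the RESTORE command (which shipped in Delta Lake 1.2); the path and version number are hypothetical:

```python
# Inspect the snapshot from before the bad software release.
before = (spark.read.format("delta")
          .option("versionAsOf", 42)
          .load("/lake/bronze/truck_sensors"))

# Roll the live table back to that version in place.
spark.sql("RESTORE TABLE delta.`/lake/bronze/truck_sensors` "
          "TO VERSION AS OF 42")
```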
All right. Another question is: what sort of transaction volume does Delta Lake support for inserts or deletes?
A
Wouldn't this just be a basic property of Delta Lake, regardless of transaction volume? This is a Delta characteristic, right?
A
Yeah, I think that's right. Maybe you have a particular use case in mind, and maybe the preliminary question is how you're defining it; that may be helpful. Yeah. Also, another question is: is it mandatory to have bronze and silver layers? Can we combine them into one storage?
B
So it's certainly not mandatory, but it enables some really powerful features to be built on top of it. And if you have more specific questions about how to build your pipeline, feel free to message us on Slack or post an issue on our GitHub. I'm not quite sure what you mean by combining into one storage, but perhaps this is a discussion we can take offline.
A
Yeah, definitely. And I will also add to this: while Scott said it's not mandatory, it's one of the best practices we have seen working well for customers. That's why in our medallion architecture we consider three types of layers, because we deal with a lot of different use cases.
A
For example, somebody is working on machine learning, somebody is working on analytics. So analytics users, who are building just the dashboards, don't have to have raw data; they would be lost if you gave them raw data. So it's very helpful to give them business-level aggregates, and that's why gold tables are really helpful.
A
Another scenario is: if you have PII and non-PII data, you don't want to give everybody in your organization access to all of your data. So you make a silver table where all the data goes in, and not everybody has access to it. Then you make a bronze table as an extraction of that, and that can be a non-PII table which you can freely give out. And, you know, also work towards having some kind of governance around your data; that helps, just as a best practice. That's why we have that in place.
A
So it's not mandatory, but it is something that we recommend, as we have seen it working well with customers. All right, another question is: is there a plan to integrate any data governance standards or data quality rules, to make data management more robust right from the source through multi-tiered targets?
A
So I think this is going towards expectations, so it's not really a Delta Lake question, but feel free to correct me, Ryan or Scott: this is something in Delta Live Tables. What you can do is define certain expectations and quality rules to ensure that, for whatever you have coming in, you have a live lineage of whatever is happening within your data. So that's the architecture you can follow. All right? Correct, Scott, Ryan, anything you'd add?
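A hedged sketch of such an expectation in Delta Live Tables (DLT is a Databricks product; the `dlt` module is only available inside a DLT pipeline, and the table and rule names here are hypothetical):

```python
import dlt
from pyspark.sql import functions as F

@dlt.table
@dlt.expect_or_drop("valid_reading", "reading IS NOT NULL AND reading >= 0")
def clean_sensor_readings():
    # Rows violating the expectation are dropped and surfaced in the
    # pipeline's data quality metrics.
    return dlt.read("raw_sensor_readings").withColumn(
        "ingested_at", F.current_timestamp())
```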
A
All right, all right. Let me see if there are any questions.
A
Is there any way to restart the pipeline with minimal streaming initialization time?
C
So basically, it would be great to know the exact details of what you mention here. You mention that the data grows; so what kind of data is it? Is it the Delta log, that is, the metadata, or is it just the Parquet files? But in general, if the data growth is just the Parquet files increasing a lot, then this should not be a problem.
A
Yep, that works. All right, another question is: can we use incremental processing in live Delta tables?
A
Yeah, yeah. And I think, maybe, if this is meant as how you separate batch, or how you define a stream: maybe it's like a trigger. If you set up trigger characteristics, that would help you determine when you want data to arrive. So set those trigger points, like how frequently you want to trigger, and the mode: is this happening in batch or stream mode?
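A minimal sketch of those trigger choices on the same Delta source (paths hypothetical): a fixed micro-batch cadence versus a run-once, batch-style incremental pass:

```python
stream = spark.readStream.format("delta").load("/lake/bronze/truck_sensors")

# Streaming mode: process newly arrived data every five minutes.
(stream.writeStream.format("delta")
       .option("checkpointLocation", "/lake/checkpoints/silver")
       .trigger(processingTime="5 minutes")
       .start("/lake/silver/truck_sensors"))

# Batch mode: drain whatever is new since the last run, then stop.
(stream.writeStream.format("delta")
       .option("checkpointLocation", "/lake/checkpoints/silver_once")
       .trigger(once=True)
       .start("/lake/silver/truck_sensors_once"))
```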
A
And then, Scott, you are working on CDF for the next release. Can you touch on that? Like, what's CDF in terms of Delta Lake, and what is that exact feature?
B
For sure. So yeah, the next feature, or the feature I'm currently working on developing, is called Change Data Feed, or CDF for short. And what this enables users to do is actually capture row-level changes, instead of per-file-level changes, for streaming queries, which serves a lot of downstream use cases, and it's been a really highly demanded feature. So I'm really glad we're finally getting around to developing it.
A
That's awesome. And this CDF, or change data feed, means it will only track the changes that happen, and it will add statistics to your transaction logs. Is that the feature?
B
Exactly. So, you know, if there are only deletes, then we'll actually just tell you which rows were deleted, and if there are only inserts, we'll tell you which rows were inserted. But then there are more complicated cases, for example updates and merges, and in cases like that, we'll actually tell you what the pre-update and the post-update values were, and then it's up to you in your downstream application to decide what you want to do with that information.
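A hedged sketch of consuming CDF with the API as it later shipped (Delta Lake 2.x): the table needs the `delta.enableChangeDataFeed` property set, and each change row carries a `_change_type` such as insert, delete, update_preimage, or update_postimage (path hypothetical):

```python
changes = (spark.read.format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 0)   # replay changes from version 0
           .load("/lake/silver/truck_sensors"))

# Pre/post images let downstream apps decide how to apply each update.
changes.select("truck_id", "_change_type", "_commit_version").show()
```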
A
Awesome, that would be a nice feature. Another question was, you know, in terms of the next release that we are prepping: I think we are soon going into our summit, which is happening in the last week of June. What are the other exciting features that you both are working on? Maybe we'll go with Scott first and then Ryan.
B
Sure, yeah. The next really exciting feature is Z-order optimize. So OPTIMIZE is already in Delta Lake, and it allows you to essentially compact your Parquet data files, which reduces the size of the transaction log.
B
Like Ryan said, this speeds up the Delta log operations. And what Z-order does is: it's essentially a new algorithm to improve, or optimize, the OPTIMIZE command, and it improves exactly how we determine the best way to compact your Parquet files. So what it's really useful for is after you have partitioned your data.
B
You now have all of these actual data columns, and it's a really hard problem to decide how to best co-locate or group files together based on n arbitrary data columns. So Z-order is one such algorithm; it's a space-filling curve which is basically able to determine which files are best to compact together. And the main benefit of this shows up in your subsequent queries.
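A hedged sketch of the command as it later landed in open source Delta Lake 2.0 (the table path and columns are hypothetical): compaction plus Z-order clustering on the columns you filter by most:

```python
# Rewrites files so rows are clustered along both columns, improving
# data skipping for queries that filter on truck_id and/or ts.
spark.sql("""
    OPTIMIZE delta.`/lake/silver/truck_sensors`
    ZORDER BY (truck_id, ts)
""")
```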
B
I'm not exactly sure which sources you're referring to. CDF is a feature that can be applied to any change to the transaction log, regardless of where the source is coming from. So yeah, I'm curious what exactly they're referring to.
A
So maybe, if they post it... Adam, please post what you are referring to. This is just an extended feature on how we are evolving the transaction logs, so it doesn't matter: whatever Delta Lake supports, this will support as well. Cool. So the next question is for you, Ryan: what exciting things are you working on?
C
Yeah, for me: while I'm working on Delta Lake in general, I'm also spending a lot of time working on Delta Live Tables, which is a different project, but it's also built on top of Delta Lake. It allows people to, for example, define your tables and tell us the relationships between these tables, and then we will help you manage all of these tables automatically. So basically I'm putting a lot of time into this project as well.
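A hedged sketch of declaring that relationship in Delta Live Tables (again, `dlt` exists only inside a DLT pipeline, and the names are hypothetical): referencing one table from another with `dlt.read` is what lets DLT infer the dependency graph and manage refresh order automatically:

```python
import dlt

@dlt.table
def bronze_events():
    return spark.read.json("/landing/raw_events")

@dlt.table
def daily_event_counts():
    # The dlt.read reference tells DLT this table depends on bronze_events.
    return dlt.read("bronze_events").groupBy("event_type").count()
```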
A
That's awesome. And there are a lot of questions on Delta Live Tables, so... I think we already have a demo and a blog on Delta Live Tables, so I'm happy to post them on the Slack channel; stay tuned in that area as well. And there have been a lot of questions around Databricks SQL and some Databricks-related questions.
A
Unfortunately, this is the forum for only our team, which is working on Delta Lake, so I will paste the channel link so that you can ask the appropriate questions about Databricks and Databricks SQL there. And I will also paste the links for our GitHub and Slack channels: if this was not the forum where your questions could get answered, or if you are joining offline, please feel free to join us on Slack and GitHub. We definitely will get back to you on those. Yeah, and on the recording:
A
We are recording these sessions, and the recording will be available shortly after, Naveen, if that's what you want. Cool, so that's about it. We are almost at our time, and I appreciate everybody who asked great questions. I highly encourage you to keep working with us, and if you are passionate about contributing, we also have some good first issues on GitHub, so please do engage with us. And thank you so much, Scott and Ryan, for being awesome panelists for this session.
A
Hopefully we see you soon, next time. Yep. And one more thing: if you want to join our summit, we will have a lot of meetups, we will have a lot of machine learning and AI visionaries, and I'm super excited. We will also have a contributor meetup, and if you want to meet us in person, I will send you the registration link, so please check that out. Yeah. Thank you all.