From YouTube: Delta Lake Community Office Hours (2022-08-18)
Description
Join us on August 18, 2022 at 9:00 AM PDT for the Delta Lake Community Office Hours! Ask your Delta Lake questions live and join our guest speakers Gerhard Brueckl and Scott Sandre, alongside Vini Jaiswal from Delta Lake!
Ask us your Delta Lake questions. These sessions allow our community to ask questions about Delta Lake OSS and get to learn what we are building, planning to build and know about recently released features. These sessions are live and the recordings are available on the Delta Lake YouTube channel.
Quick links:
https://delta.io/
https://go.delta.io/slack
https://groups.google.com/g/delta-users
https://go.delta.io/github
A
I'm waiting for my link to pop up on LinkedIn, one second... awesome. Long Island, whoa. We have San Francisco, Canada. So for those who are new to the session, these sessions are live and occur every two weeks on Thursdays.
A
Last
week
last
month
we
took
a
break
because
we
did
a
lot
of
work
during
the
summit
and,
as
a
result,
you
have
awesome
features
to
work
with
on
delta
lake.
So
if
you
missed
any
previous
amas,
no
sweat,
the
recordings
are
available
on
our
linkedin
and
youtube
channel.
So
please,
please
subscribe
to
our
channels
and
also
there
is
an
official
webinar.
A
You know, that is brought to you every two weeks. Also, since this is the official webinar of Delta Lake, we want to make sure that we are fostering an open and welcoming environment for everybody. So please make sure that you are abiding by our code of conduct, which means please do not add anything to the Q&A or ask questions that would be in violation of our code.
A
So
please
be
respectful
to
your
fellow
participants
and
presenters.
I
will
drop
in
a
link
as
well.
So
you
can
review
the
code
of
conduct
and
please
have
your
questions
coming.
We
want
to
make
sure
that
you
know
we
get
your
questions
answered
on
any
open
source
delta
lake.
A
If
you
know
if
you
are
doing
a
project
from
from
the
beginning
or
if
you
are
already
working
on
something
cool,
we
would
like
to
know
if
you
have
any
questions,
if
you
are
hitting
any
roadblocks
or
if
you
want
to
know
about
any
future
roadmap
or
features
that
we
are
building
so
without
further
ado,
let's
do
a
quick
round
of
introduction
of
the
panel.
So
scott,
you
want
to
go
first.
B
For sure, thanks Vinnie, thanks for having me, super excited to be here. Hi everyone, I'm Scott, I'm a software engineer on the Delta Lake team at Databricks. I'm super excited to be here, happy to talk about a lot of the new features we have coming out with Delta Lake 2.1, or any questions you have about our big recent release, Delta Lake 2.0, as well as a lot of our other connectors. We have the Flink connector that we've been working on, so happy to answer any questions there.
C
Hi everybody, Matthew Powers. I am a longtime Spark blogger and Spark open source nerd. I just recently joined the Delta Lake team, working as a developer advocate, so looking forward to writing more Delta Lake content.
A
Awesome
great
to
have
you
matt,
you
have
made
a
huge
impact
in
the
spark
community,
and
now
we
have
you
on
delta
lake
as
well,
so
welcome
welcome.
Would
you
like
to
you
know,
tell
about
any
exciting
features
that
you
have
seen
in
delta
lakers
recently,
so
the
community
will
know
from
your
perspective
as
well.
C
Yeah,
well,
I
think
I'll
give
just
like
my
high
level
delta
lake
experience.
I
was
using
playing
vanilla,
parquet
lakes
and
suffered
from
all
of
those
bugs.
You
know
not
having
transactions
schema
mismatches,
trying
to
compact
small
files,
so
I
mean
the
new
features
are
amazing,
but
just
the
old
features
too,
like
I've
faced
all
of
those
plainville
parque
lake
problems
and
just
having
them
all
magically
solved
by
delta
lake
has
made
me
love
delta
lake
lots.
A
Awesome
so
we
have
a
lot
of
people
joining
in
mariella,
punet
hammond
welcome
everyone!
Please
post
your
questions
on
youtube
and
linkedin.
You
can
find
our
linkedin
live
link
from
delta
lake
channel
awesome.
So
I
have
a
question
here.
What
are
some
of
the
recently
released
features
from
delta
2.0
scott?
Do
you
want
to
answer
that.
B
Yeah
for
sure,
well,
yeah,
the
biggest
news
with
delta
2.0
was
just
the
fact
that
delta
lake
is
now
fully
open
sourced
and
every
feature
that
you
see
in
some
sort
of
like
in
databricks
has
now
come
to
delta
lake,
which
is
just
really
good
news.
You
know
we're
happy
to
give
the
best
features
to
all
of
our
open
source
users
and
customers,
so
that
was
super
exciting
one
such
feature
was
the
optimize
the
order
command.
B
So
this
is
a
really
useful
utility
to
help
solve
the
small
files
problem.
This
is
a
problem
that
you
can
experience
when
you
have
constant
streams
coming
into
your
delta
table,
writing
lots
of
small
parquet
files
and,
after
a
certain
amount
of
time,
all
these
small
files.
They
will
increase.
Your
list
calls
to
your
various
cloud
providers.
So
what
we're
able
to
do
in
a
really
really
smart
way
is
let
you
basically
sort
your
table
along
and
arbitrary
dimensions
and
compact.
B
These
files,
together
through
what's
called
like
a
space
filling
curve,
and
we
open
sourced
that
in
2.0
and
got
like
a
lot
of
really
positive
feedback
to
that.
People
were
really
excited
about
that.
So
that's
probably
my
favorite
feature
recently,
but
delta
2.1
is
coming
up
soon
as
well.
So
we
can.
We
can
chat
about
that
later
too.
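As a minimal sketch of how that looks from PySpark on Delta Lake 2.0, here is compaction and Z-ordering on a hypothetical table; the path and the Z-order column are placeholders for illustration:

```python
from delta.tables import DeltaTable

# Assumes a SparkSession with the Delta Lake extensions already configured.
table = DeltaTable.forPath(spark, "/tmp/events")  # hypothetical table path

# Plain compaction: coalesce many small Parquet files into fewer, larger ones.
table.optimize().executeCompaction()

# Z-order compaction: cluster the data along a chosen column so that queries
# filtering on it can skip more files.
table.optimize().executeZOrderBy("event_date")

# Equivalently, as SQL:
spark.sql("OPTIMIZE delta.`/tmp/events` ZORDER BY (event_date)")
```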
B
Yeah, for sure. So your Delta table will exist in some cloud storage, like S3, Azure, GCS, et cetera, and Delta keeps track of all the metadata files for your table.
B
So
you
first
read
from
the
delta
log,
which
is
all
the
metadata
that
tells
you
which
data
files,
parquet
files
to
actually
read
and
over
time,
as
you
have
more
and
more
of
these
metadata
files,
just
performing
the
list
call
to
your
cloud
provider
say
hey:
what
are
the
files
on
my
table
that
can
start
adding
up
as
there's
more
files,
there's
longer
lists
so
by
compacting
your
files
together.
B
That
actually
just
reduces
the
list
overhead,
but
another
cool
thing
to
add
on
actually
is
what
we
have
a
really
exciting
pr
right
now
from
the
delta
community.
That's
actually
aiming
to
optimize.
Our
list
calls
on
s3
to
make
them
not
be
like
a
function
of
the
length
of
the
table,
but
just
like
constant
list
calls,
which
would
be
like
a
huge
speed
improvement
on
s3.
So
that's
just
one
little
tidbit.
I
wanted
to
share.
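To make that layout concrete, here is a small sketch of what sits under a Delta table and what a reader has to list before it knows which Parquet files to load; the local path and file names are illustrative and assume a table already exists there:

```python
import os, json

table_path = "/tmp/events"                      # hypothetical local Delta table
log_path = os.path.join(table_path, "_delta_log")

# The table directory holds the data (Parquet files); _delta_log holds the
# metadata: one JSON commit per table version, plus periodic checkpoints.
print(sorted(os.listdir(table_path)))           # part-*.snappy.parquet files and _delta_log
print(sorted(os.listdir(log_path)))             # 00000000000000000000.json, 00000000000000000001.json, ...

# Each commit file is a list of actions (protocol, metaData, add, remove, ...)
# that together define that version of the table.
with open(os.path.join(log_path, "00000000000000000000.json")) as f:
    for line in f:
        print(list(json.loads(line).keys()))
```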
A
That's
awesome
so,
adding
on
to
that
there
was
another
feature
that
I
think
was
in
the
works,
which
was
change.
Data
feed
was
was
that
release
card.
I
think
so
right.
B
Yes,
it
was
yeah
that
was
also
released
in
delta
2.0,
so
change
data
cdf.
That's
our
solution
on
delta
lake,
for
the
capture,
data
change,
problem
cdc
and
what
we
basically
let
users
do
is
now
capture
row
level
changes
as
opposed
to
file
granularity
changes
with
like
very
very
little
to
almost
no
performance
overhead.
B
So
now,
when
you
are
writing
your
data,
doing
upserts
updates
merges
deletes
whatever
we're
able
to
capture
just
the
row
level
changes
in
like
separate
parquet
files
such
that
when
you
read
you're
able
to
know
if
a
row
was
removed
or
updated,
etc,
and
that's
there's
a
lot
like
downstream
use
cases
of
that
feature
and
again,
we've
gotten
like
really
great
feedback
to
that.
So
I'm
curious
to
see
how
people
are
using
it
and
what
exciting
things
they're
doing
with
that.
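A brief sketch of how Change Data Feed is switched on and read back in PySpark; the table name and the starting version are placeholders:

```python
# Enable CDF on an existing table (it can also be set at CREATE TABLE time).
spark.sql("""
    ALTER TABLE events
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Read the row-level changes recorded since a given table version.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)
    .table("events")
)

# Each row carries _change_type (insert / update_preimage / update_postimage /
# delete), plus _commit_version and _commit_timestamp.
changes.show()
```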
A
Awesome,
that's
that's
wonderful!
You
know.
Cdf
is
one
of
the
very
popular
features
that
have
been
asked
by
a
lot
of
people
who
want
to
make
sure
that
you
know
what
do
they
do
for
the
upcoming
tables
or
upcoming
records.
So
that's
really
helpful.
I
know
for
those
who
are
just
tuning
in,
we
also
have
gerard
in
the
background.
He
was
unfortunately
not
feeling
well
this
morning,
but
he
has
volunteered
to
answer
your
questions
through
slack.
He
worked
on
amazing
power,
bi
integration.
A
So
if
you
do
have
any
questions
related
to
that,
he
will
answer
in
the
chat
cool.
So
we
have
more
questions
on
our
linkedin
channel,
which
is
will
delta
lake
version.
2.0
support,
spark
3.1.1.
B
3.1.1? So Delta Lake 2.0 supports Spark 3.2, and currently we are not supporting cross-Spark-version support. A few people have asked for it, but there's a lot of work overhead for that sort of support, and right now we're prioritizing the latest Spark version and adding the best and latest Delta features. So if there's growing support and demand for that feature, that's a discussion we can have later on for cross-version Spark support, but right now, no, it won't work with 3.1.1.
A
Yeah,
so
sama,
if
you
want
to
you,
know,
give
this
feedback
in
our
github
repo,
where
we
have
like
roadmap
discussion,
we
can
possibly
you
know,
you
know,
make
that
into
consideration
in
the
future.
Awesome
yeah.
B
Go
ahead
I'll
share
a
link
in
the
in
our
zoom
chat
and,
if
you
could
share
it
to
linkedin
or
wherever
that
shows
the
the
delta
and
spark
cross
cross
version,
support.
A
Awesome,
let
me
look
at
the
youtube
all
right,
so
there's
a
question
around
support
for
dropping
columns.
A
Is
somebody
read
through
it
and
is
that
supported
now
in
2.0
version.
B
Yes,
it
is
matthew
if
you
ever
want
to
jump
in
for
these.
I'm
not
sure
how
familiar
you
are
with
the
latest
features,
but
feel
free
to
to
chime
in
as
well
but
yeah
drop
column
was,
is
support
added
in
delta
2.0.
A
That's awesome. So Matt, from your perspective, what is some of the momentum you are seeing in the Delta Lake community?
C
Well,
before
I
jump
into
that,
I
just
wanted
to
add
a
little
thing
for
the
drop
column,
which
was
so
basically
parquet.
Files
are
immutable
and
I'm
actually
kind
of
confirming
my
knowledge
here
with
scott,
so
parquet
files
are
immutable
so
prior
to
2o.
C
In
order
to
drop
a
column,
you
would
basically
need
to
read
in
all
of
the
data
and
then
write
it
out
less
that
column
so,
like
dropping
a
column,
was
a
big
data
processing
exercise
versus
what
we
do
now
is
we're
just
in
the
metadata
in
the
transaction
log
saying
we'll
just
ignore
that
going
forward
type
of
type
of
situation
and
without
the
column,
mapping
that
was
not
possible.
Pretty
much
and
column
mapping
was
added
in
one
two
is
that
correct.
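As a sketch of what that metadata-only operation looks like in practice (the table name and column are placeholders), column mapping is enabled first and the column is then dropped without rewriting any Parquet files:

```python
# Column mapping is what makes a metadata-only DROP COLUMN possible.
spark.sql("""
    ALTER TABLE events SET TBLPROPERTIES (
        'delta.columnMapping.mode' = 'name',
        'delta.minReaderVersion'   = '2',
        'delta.minWriterVersion'   = '5'
    )
""")

# With column mapping enabled, dropping a column only records a change in the
# transaction log; the underlying Parquet files are left untouched.
spark.sql("ALTER TABLE events DROP COLUMN obsolete_field")
```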
C
Yeah,
so
so
you,
I
think
to
your
other
question
vinnie.
So
what
are
the
kind
of
big
trends?
I
think
one
of
the
big
trends
I'm
seeing
is
just
more
adoption,
more
connectors
for
delta
lake.
We
have
so
many
connectors.
Everybody
seems
to
be
supporting
us
now,
which
is
so
important
because
you
know
when
you're
building
a
big
data
etl
pipeline,
you
always
want
to
be
able
to
go
from
one
system
to
to
the
next,
like,
oh
yeah,
we're
building
this
big
data
processing
pipeline,
and
then
we
want
to
make
a
delta
lake.
A
That's
awesome,
I
think,
releasing
delta
standalone
has
opened
a
lot
of
doors,
for
you
know
other
integrators,
other
integration
to
come
in
the
picture,
and
I
think
I
yeah
that's
a
very
critical,
or
that
was
a
very
critical
release
that
we
made
back
in
december
yeah.
A
So
in
terms
of
connectors,
I
think
some
of
the
things
we
were
working
on
around
flink
right,
something
that
that
was
missing
was
source.
Is
that
something
that
we
are
working
on?
Scott.
B
It's
something
that
we
worked
on
and
solved
yeah,
we
with
the
release
of
delta
connectors
0.5.
Two
weeks
ago.
We
added
support
for
the
flink
source.
So
now
we
have
a
flink
connector
that
suppose
that
supports
both
streaming
reads
and
writes
with
exactly
one's
guarantees.
Yeah,
it's
great
again.
We've
gotten
I've
gotten
a
couple
dms
actually
about
that
people
saying
thank
you.
B
We've
been
waiting
for
this,
and
so
now
that
we
have
both
the
sync
and
source
working
with
the
data
stream
api,
we're
now
working
on
adding
catalog
and
sql
support
for
our
flink
connector.
So
that's
our
next
major
initiative,
we're
still
in
like
the
design
phase
right
now,
but
we're
actually
we've
actually
gotten
a
few
issues
about
this
on
our
repo
and
we're
so
we're
excited
to
in
a
couple
weeks,
share
the
design,
dock
and
and
get
people's
feedback
on
our
approach
and
our
api
decisions,
etc.
A
You
know
it's
very
interesting.
A
lot
of
people
give
feedback
that
we
just
think
about
a
future
and
that's
our
community
releases
it
it's
it's
just
so
fast.
So
that's
a
very
good
feedback
cool.
I
think
another
question
is,
I
am
a
data
engineer
and
new
to
delta
lake.
May
I
know
from
where
I
should
start
to
learn
delta,
like
I
think
we
have
a
lot
of
resources,
but
I
would
let
panelists
answer
this
question.
C
I
personally
started
at
the
quick
start.
I
fired
up
a
jupiter
notebook.
I
went
through
the
examples
and
I
just
studied
what
was
happening
in
the
transaction
log.
That's
that's
how
I
personally
I
was
able
to
grok.
Definitely
because
at
the
first
it
seemed
a
little
mysterious,
but
then
once
I
was
like
performing
operations
and
seeing
which
transaction
log
entries
were
made
that
that's
what
made
me
understand
it,
but
luckily
it's
a
beautiful
abstraction.
So
you
actually
don't
need
to
do
that.
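For anyone starting the same way, a minimal quick-start sketch in PySpark; the path is illustrative and this assumes the pyspark and delta-spark packages are installed:

```python
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

# Wire the Delta Lake extensions into a local SparkSession.
builder = (
    SparkSession.builder.appName("delta-quickstart")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Write a tiny Delta table, read it back, and inspect its history.
spark.range(5).write.format("delta").mode("overwrite").save("/tmp/delta-quickstart")
spark.read.format("delta").load("/tmp/delta-quickstart").show()
spark.sql("DESCRIBE HISTORY delta.`/tmp/delta-quickstart`").show(truncate=False)
```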
B
Yeah,
like
matt,
said,
I
don't
think
I
you
don't
necessarily
need
to
know
exactly
all
the
details
about
the
delta
protocol.
That's
all
abstracted
away
for
you,
we're
just
here
to
help
you
solve
your
problems
and
meet
your
business
needs.
What
does
get
exciting
from
my
perspective
is
those
little
details
about
the
delta
protocol,
because
one
feature
that's
on
our
h2
roadmap
for
this
year.
B
That,
I
think,
is
really
exciting,
is
called
deletion
vectors
and
what's
what
makes
it
so
exciting
for
me,
is
exactly
those
little
details
so,
for
example,
the
problem
that
it's
trying
to
solve
is
that
whenever
you
perform
an
update
on,
let's
say
a
single
file
in
your
delta
table,
because
we
support
multi-version
concurrency
control,
we
never
modify
that
parquet
file
in
place.
Of
course,
we
just
rewrite
a
new
one.
B
That
means
you
can
time
travel
back
in
time
and
see
historical
versions
of
your
table,
but
sometimes
you're
only
updating
a
few
rows
in
that
part
k
file.
Yet
here
we
are
going
and
rewriting
the
entire
par
k
file
and
what
deletion
vectors
help
us
do
is
within
a
certain
threshold.
We
will
just
be
writing
the
changes
to
a
separate
file,
which
means
you're,
not
rewriting
the
entire
4k
file.
B
So
little
detail
little
details
like
that
once
released
like
once
fully
developed
that
those
will
significantly
speed
up
your
reads
and
rights,
your
rights,
sorry
only
but
yes,
little,
details
like
that
are
really
fun.
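Since time travel came up, a quick sketch of reading an older version of a table in PySpark; the path and version number are placeholders:

```python
# Every commit produces a new table version, so older versions stay queryable.
v0 = (
    spark.read.format("delta")
    .option("versionAsOf", 0)             # or .option("timestampAsOf", "2022-08-01")
    .load("/tmp/delta-quickstart")
)
v0.show()

# DESCRIBE HISTORY shows which versions exist and what operation created each one.
spark.sql("DESCRIBE HISTORY delta.`/tmp/delta-quickstart`").select(
    "version", "timestamp", "operation"
).show(truncate=False)
```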
A
Yeah,
I
think
that's
a
good
point,
scott,
because
you
know,
as
a
data
engineer,
you
you
like
to
know
how
those
mini
details
come
into
the
picture.
What
what?
What
can
make
your
data
engineering
pipeline
robust
so
sometimes
yeah?
It's
it's
really
helpful
to
know.
Like
you
know
what
is
the
protocol?
What
is
it
doing?
So
I
think
we
have
a
lot
of
resources
if
you
do
want
to
go
to
delta
dot
io.
A
We
have
like
learn
page
where
you
can
start
with
getting
started,
guide
spin
up
your
local
instance
or
you
know
in
the
cloud
we
are
also
working
on
making
more
tutorials
accessible
for
the
community
so
that
they
can
get
up,
started
and
running
yeah.
So
please
check
out
delta
dot,
io
and
github.
I
pasted
the
link
in
linkedin.
B
Yes,
that
is
a
great
question.
I
yeah
you're
testing
my
knowledge
of
spark
sql
right
now.
I
don't
think
we
support
it
right
now
and
I
also
don't
see
it
on
our
h2
roadmap.
So
this
is
something
where,
if
this
is
something
that
this
user
wants,
I
think
they
should
message
us
on
slack.
Join
our
community,
create
an
issue
and
start
a
conversation
about
that,
because
currently
there
hasn't
been
many
asks
for
that.
So
that's
not
something.
B
That's
not
a
feature
that
we've
actually
prioritized,
but
if
that's
something
that
the
community
wants,
then
we're
happily
happy
to
work
on
it.
A
Another question is about clusters and the clusters API. I think that is more specific to Databricks; I think you are using Databricks, so I will paste some resources to help you, Sharon. Yeah, this session is only for Delta Lake OSS. Cool. So I think we are coming close to the hour, close to our 30 minutes, but I do want to make sure that we call out some specific features that the community is working on in the second half of the roadmap, Scott.
A
What
did
we
miss
like?
What
are
some
of
the
features
that
our
prioritized
based
on
our
feed
feedback
from
the
community.
B
Yeah
great
question:
the
biggest
feature
was
that
was
really
really
demanded.
The
past
couple
weeks
has
been
support
for
spark
3.3
and
I'm
happy
to
announce
that
with
delta
lake
2.1,
which
a
preview
was
just
released
yesterday,
for
that
we
we
have
added
support
for
spark
3.3.
So
with
that
comes
like
a
lot
of
all
the
any
kind
of
improvement
you
get
with
the
latest
version
of
spark.
That's
now
brought
into
delta
lake
and
there's
also
some
extra
sql
syntax
support.
That's
pretty
exciting
too.
B
Preview
was
yesterday
and
then
oh
preview,
so
the
goal
of
our
previews
is
to
get
community
feedback.
Are
there
any
bugs
any
noticeable
performance
decreases?
Even
though
we
do
our
own
internal
benchmarks,
get
early
feedback
from
the
community
work
up
some
kinks
and
then
release
the
final
version
in
a
couple
weeks.
A
Awesome
awesome
ravi
is
asking
how
shall
I
research
on
integration
with
delta
lake
yeah,
so
I'm
gonna
drop
links
here.
We
have
worked
on
a
lot
of
integration,
so
I'm
gonna
drop
links
for
you
for
our
roadmap.
You
can
check
it
out.
There.
A
Marcos
is
saying
very
excited
about
2.1
release,
especially
time
travel
and
show
column
support.
Oh
thank
you.
A
Awesome
there
are
more
people
joining
in
now
any
closing
thoughts
from
you,
matt.
C
I
think
closing
thoughts
is
we
just
have
a
really
friendly
nice
community
and
we
encourage
everybody
to
join.
Our
slack
ask
questions,
ask
questions
on
stack,
overflow
and
we're
always
happy
to
help
and
we're
very
friendly.
B
If
I
could
add
on
to
that,
not
only
do
I
want
you
to
join
our
community,
I
want
you
to
make
prs,
because
we've
had
a
lot
of
really
good
features
added
in
the
past
couple
weeks
after
the
launch
of
delta
2.0,
when
people
got
excited
about
delta
lake
and
people
have
been
adding
great
features,
so
a
couple
of
those
actually
have
been
our
dml
commands
like
delete,
update,
merge,
etc.
B
The
sql
commands
are
now
are
actually
returning,
some
really
useful
metrics
from
that
operation,
which
was
an
api
that
wasn't
there
before.
So
this
is
something
where
users
wanted.
This
wanted
to
see
this
result
and
they
made
the
pr
themselves.
So
I'm
glad
that
people
are
are
contributing.
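As an illustration of what that looks like (the table path is a placeholder, and the exact metric columns returned depend on the command and the Delta version), the DataFrame returned by a SQL DELETE now carries operation metrics instead of being empty:

```python
# Run a DML command through SQL; the returned DataFrame now carries operation
# metrics (e.g. how many rows were affected) rather than being empty.
result = spark.sql("DELETE FROM delta.`/tmp/delta-quickstart` WHERE id > 3")
result.show()

# The same metrics are also recorded in the table history.
spark.sql("DESCRIBE HISTORY delta.`/tmp/delta-quickstart`").select(
    "version", "operation", "operationMetrics"
).show(truncate=False)
```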