From YouTube: Delta Lake Community Office Hours (2022-03-31)
Description
Join us for the next Delta Lake Community Office Hours and ask us your #DeltaLake questions. The Delta Lake community AMAs occur bi-weekly on Thursdays at 9AM PST. These sessions allow our community to ask questions about Delta Lake OSS and learn about what we are building, what we are planning to build, and recently released features.
A: Yeah, why don't you check LinkedIn, Danny? I think I'm still waiting for the URL, but...
A: Awesome, everybody who is joining, we are just getting started. Hi everyone.
A: From Italy! I am from San Francisco. We are joining from all different places, so why don't you put down where you are joining from, and then we will start with introducing our panelists. Great. And for any questions related to Delta: these are the Delta Lake Community Office Hours, so we encourage you to ask questions on open source Delta Lake. It can be questions about the feature releases that we did recently or what is coming up on the roadmap.

A: So with that, I will kick it off with introductions to the panelists. We have Scott, Venki, and Denny, contributors to Delta Lake OSS. So why don't we start with Scott. Scott, why don't you introduce yourself?
C: Sure, thanks Vini. Good morning, everyone, or good afternoon, wherever you're from. I'm Scott, I'm on the Delta Lake ecosystem team here at Databricks, and what I've been working on recently is a variety of open source features for Delta Lake, as well as expanding the connector ecosystem. For the former, one exciting feature I worked on recently was open-sourcing column stats generation and data skipping.

C: So that's a big boost to read performance, and I'm sure that in the next release, 1.2, people are really going to enjoy that. And for the connector ecosystem, we are working on releasing the Delta Flink sink, and we're actively working on developing the Delta Flink source, and that's all really exciting. Thanks.
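A minimal sketch of what the newly open-sourced data skipping looks like from the reader's side, assuming Delta Lake 1.2 on Spark; the table path and column name are made up for illustration. Per-file min/max column stats recorded in the transaction log let the planner prune files that cannot match a selective filter.

```python
from pyspark.sql import SparkSession

# delta-spark must be on the classpath, e.g. started with
#   --packages io.delta:delta-core_2.12:1.2.0
spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Files whose recorded min/max for `event_date` fall outside the predicate
# can be skipped entirely, without ever being read.
spark.read.format("delta").load("/tmp/events") \
    .where("event_date = '2022-03-31'") \
    .show()
```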
D: Sure, thanks Vini. I'm Venki, and I'm also part of the Delta ecosystem team at Databricks. I've been working on some parts of the connector-related work with Delta and also improving the Delta Lake project. Recently I worked on a file compaction feature, some improvements to restoring a Delta table to earlier snapshots, and some bug fixes here and there. On the connector ecosystem side,

D: I worked on the Presto and Delta Lake integration, and also the Trino and Delta Lake integration, so those are available now in the respective Trino and Presto projects. In the future, we're planning to work on improving the file compaction to capture partial progress and also adding Z-ordering so that queries will be faster.

D: Yep, and I'm looking forward to reviewing any pull requests or any issues that you post on the Delta Lake projects. Thanks.
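Since Venki mentions the restore work, here is a minimal sketch of what that looks like as it later shipped in Delta Lake 1.2; the table path, version, and timestamp are made up, and `spark` is a Delta-enabled SparkSession as in the earlier snippet.

```python
# Roll a table back to an earlier snapshot recorded in the transaction log.
spark.sql("RESTORE TABLE delta.`/tmp/events` TO VERSION AS OF 5")

# Restoring by timestamp is also supported:
spark.sql("RESTORE TABLE delta.`/tmp/events` TO TIMESTAMP AS OF '2022-03-01'")
```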
A: That's exciting, thank you, Venki. Already questions are coming through for you, great to hear, so we will get to those. But let's get Denny's introduction. Denny, why don't you introduce yourself?
B: Thanks very much, Vini. Hi everybody, my name is Denny. I'm a developer advocate here at Databricks, long time Spark and Delta Lake guy, so I'm here to answer any of those Delta Lake questions. But I figure with Scott and Venki here, I probably won't be needed that much today, so hopefully that stays true.
A: I think we need a little bit of everything from the panelists, so thank you. Awesome. So yeah, hi everybody, I am Vini Jaiswal, developer advocate at Databricks, and I'm excited to get started with our questions.
A: So let me start with the question on Trino. The question is: Trino launched a Delta connector; will Trino be able to leverage optimizations done on Delta, such as Z-order and bloom filters, while reading?
D: So the optimization has two components: file compaction and Z-ordering. For file compaction, Trino should be able to make use of it, because it's independent; you're just compacting the small files into bigger files, so the planner should be able to read less metadata and prune more, depending on how the data is laid out.

D: For Z-ordering, I am still investigating how much work is needed and how to implement it in Delta Lake. When we implement Z-ordering in Delta Lake, we are also planning to look at how Trino and other projects can make use of it. So I don't have an exact answer now, but stay tuned for this one; we'll soon have an issue in the project with the details.
A: Yeah, your work slate seems so busy with all these requests. Thank you; it's exciting, and it's getting the community more excited about it. All right, so let's take a question from YouTube: are we planning file skipping without Z-ordering at the time being? I'll repeat the question: are we planning file skipping without Z-ordering for the time being?
C: That's what's already been added to the master branch of Delta Lake, and that's what will be included in 1.2. Of course, adding Z-ordering will make file skipping that much better, but the first version is up right now.
A: Awesome, thanks Scott. And for those who want a link to the roadmap: we just released data skipping, as Scott mentioned, and in Q2 we will be working on Z-ordering, so I'll put the link in the chat. All right, the next question is: will the upcoming DynamoDB log store work with DynamoDB-compatible alternatives like...
C: I can... I think I can take that. Oh okay.

C: I'm the one working with an open source contributor on that, so it's a good question for me. That is a great question. So currently it won't; currently we are hard-coding this to work just with DynamoDB, because this is our first attempt at solving this problem. But one relevant implementation detail of our solution is that there's a specific class that interacts with all of our metadata and interacts with all of our cloud stores.

C: It's called a LogStore, and what we've done is we've abstracted away the whole problem of interacting with a cloud store that doesn't give you mutual exclusion; one example is S3. We've implemented one version which does use DynamoDB, and that's our first version, but it's completely open-ended, and there can definitely be other implementations that use any other key-value store or any other external locking store as a solution. So if this is something you're interested in, file an issue, and we can...
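To make the LogStore abstraction Scott describes concrete, here is a hedged sketch of how an alternative implementation gets plugged in: Delta resolves transaction-log I/O for a path scheme through Spark configuration, so a community-built class backed by a different key-value store could be swapped in the same way. The class name below is the DynamoDB-backed one that was being developed at the time; treat the exact names as assumptions, not a released API.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Route transaction-log reads/writes for s3a:// paths through a
    # LogStore implementation that uses DynamoDB for mutual exclusion.
    .config("spark.delta.logStore.s3a.impl",
            "io.delta.storage.S3DynamoDBLogStore")
    .getOrCreate()
)
```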
B: Yeah, one thing I'd like to add to Scott's point: don't forget, the reason why the DynamoDB log store is required is because we need a lock store, as opposed to a log store, in this particular case, due to the fact that S3 is missing put-if-absent and has consistency issues. So this is not applicable to, excuse me, not applicable to Azure or Google Cloud Storage.

B: There are rumors that this will eventually be fixed within S3 as well. But the community member that Scott's currently working with has actually been running this type of system in production for a while now, so it's pretty cool. Basically, Scott's been working very closely with that team to make sure that it's applicable and more generic for many other use cases that are also using S3. So I just wanted to add a little tip in.
B: That's actually already in production for the last three or four months as part of the kafka-delta-ingest project, and they're bringing it back into delta-rs. kafka-delta-ingest, as the name sounds, is for Kafka to write directly to Delta Lake, and it happens to be utilizing the Rust API to do that; some set of classes are part of kafka-delta-ingest, and now they're moving those classes, well, actually more like crates and modules, up to delta-rs itself,
B: in order to be able to go do that. So right now the primary focus is to get that part done first, and there are actually a couple of PRs already open for that. Now, related to that, because you did ask specifically about Python bindings: the Python bindings would actually come shortly after that part is done. Right now there are already Python bindings that work with delta-rs for the purpose of reads, so as soon as the write path is done, the Python bindings will be working on top of that afterwards.
B: I believe the target was around the Q2 time frame, but the best thing to do, specifically, would be to go ahead and join the Delta Users Slack. There's actually a delta-rs channel where you can go ahead and chime in and ask your questions, and we can probably provide a lot more details. That actually leads to the second question you just asked, which is about the partitions and things of that nature.

B: Honestly, I think that's still up in the air in terms of those specific details, and I would advise joining the Delta Users Slack again, the delta-rs channel; it's literally #delta-rs, and all of us are there to answer your questions specifically on that. So hopefully that does answer your questions.
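As a pointer to the read support Denny mentions, here is a minimal sketch using the existing delta-rs Python bindings (`pip install deltalake`); the table path is made up.

```python
from deltalake import DeltaTable

dt = DeltaTable("/tmp/events")
print(dt.version())            # current snapshot version
print(dt.files())              # data files in the current snapshot
table = dt.to_pyarrow_table()  # materialize the table via PyArrow
```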
A: It does, thank you, Denny. There's a question about: is OPTIMIZE available with the OSS now? I think a part of it is; I would say file skipping is available, but...
D: So the OPTIMIZE file compaction is available on master, but it's not yet released. It will be released soon, in the next couple of weeks.
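For reference, a minimal sketch of the OPTIMIZE file compaction Venki describes, as it landed on master for the Delta Lake 1.2 line; the table path and predicate are made up, and `spark` is a Delta-enabled SparkSession as in the earlier snippet.

```python
# Compact small files across the whole table...
spark.sql("OPTIMIZE delta.`/tmp/events`")

# ...or only within selected partitions.
spark.sql("OPTIMIZE delta.`/tmp/events` WHERE event_date >= '2022-03-01'")
```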
A: Awesome, thank you. And then...
B: Sorry, I did want to add: I think he's downplaying all the amazing work that he and the other team members have worked on. It's actually really exciting that we have OPTIMIZE being brought into Delta OSS. I do want to call out that there are going to be a bunch of blogs and, for that matter, community AMAs like the one we have here right now, specifically for us to go into some of the details. But I think Venki's being a little humble and modest about how cool of a project that particular feature is, so hats off to the team for getting this out already with Delta 1.2.
A: Thank you, thank you, yeah. That's an amazing feature; I think it was requested for over a year, so thank you for finally releasing it soon. All right, the next question is: how do you see the focus on, and the future of, using manifest files with the connectors that you're releasing?
D: Yeah, so I can talk about that. It's mainly from a maintenance point of view: you have to keep the manifest for each partition in sync whenever you make any updates to the table. That's the reason why we developed the PrestoDB and Trino native connectors, so that you can just rely on the Delta log as the source of truth, and the user, or the data engineer, doesn't have to worry about doing the extra work of generating the manifest and everything.

D: I don't know about the future; there may still be some use cases that need the manifest-based approach. But in cases where the maintenance is a burden, especially the Presto and Trino cases, we are trying to get around that and use the Delta log directly.
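For contrast with the native connectors, here is a minimal sketch of the manifest-based approach Venki describes; the symlink-format manifest has to be regenerated after every table update, which is exactly the maintenance burden the native connectors remove. The table path is made up, and `spark` is a Delta-enabled SparkSession as in the earlier snippets.

```python
from delta.tables import DeltaTable

deltaTable = DeltaTable.forPath(spark, "/tmp/events")

# Writes _symlink_format_manifest files that manifest-based engines
# (e.g. Presto/Athena) read. Must be re-run after every update to the
# table, or the manifest falls out of sync with the Delta log.
deltaTable.generate("symlink_format_manifest")
```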
A: Awesome, thanks Venki, yeah, that answers this question. The next question is: any plans on enabling tagging of the objects on Delta Lake? Can we use an S3 lifecycle policy with tags to clean up inactive files or objects?
B: I'm going to take a stab, unless you want to take a stab at it first, Scott? Oh yeah, all right, cool. So the concern with using the S3 lifecycle policy is that it actually would not be interacting with the Delta Lake transaction log, so you wouldn't actually be able to ensure consistency of the data, especially from a transaction perspective. If you think about what an S3 lifecycle tag, or lifecycle policy, does: you can automatically delete files based on whether they're too old.

B: But the problem is that it's not actually interacting with the Delta Lake transaction log to determine whether those files are maybe old but still valid for the purpose of the data that you're trying to query, say, because you've decided to keep a really long history of the data, or, for that matter, it was all inserts and there were no updates. So it's perfectly fine that you've got data that's two years old but still valid, because you want to query your table from two years ago, right?

B: So there are discussions, nothing concrete yet, in terms of: is there a way to intermix the two? Could you tag with the S3 lifecycle policy but then also interact with the transaction log somehow? But remember, when you talk about the S3 lifecycle policies, it's just simply a tag that says go delete the files that have been tagged. So it's going to be a lot more complicated than just simply saying, okay, let's tag anything that's old. Long story short, it's not that simple. Maybe we can make use of lifecycle policies with metadata, but even then, this is not a simple thing to solve. Long story short.
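The transaction-log-aware way to clean up stale files, in contrast to an S3 lifecycle rule, is VACUUM, which consults the Delta log and only deletes files no longer referenced by the retained history. A minimal sketch; the table path and retention window are made up, and `spark` is a Delta-enabled SparkSession as in the earlier snippets.

```python
# Delete files that are no longer referenced by any table version
# within the last 7 days (168 hours) of history.
spark.sql("VACUUM delta.`/tmp/events` RETAIN 168 HOURS")
```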
C: I would always just like to know what people's use cases are. You know, if this is a feature someone wants, I'd be curious to learn about why they want it. What's missing from the Delta protocol and Delta Lake that makes you want this? It'd be great just to talk and learn more about the problem you're trying to solve.
A: Yeah, I agree. And you know, for those joining and interested: it's always helpful for our team, when we are brainstorming on what to build, to understand the use cases, and there are different industries where you're trying to solve different problems. So thank you. Another question is...
B: I'm a little confused on this one, honestly, because if you're trying to get an explain plan in terms of Spark, there would already be one: the logical plan would be generated within Spark, and you can go read it. Is the context within maybe some other system? Because Delta, from the standpoint of what it is, is a file system, right, I mean a storage layer. So it has the metadata to do that, but it's whatever's actually running the queries themselves...
D: There was recently one issue on the open source side, I think, where the user wants to build a MERGE command and then call explain so that they can see the physical plan. Currently that is not available, so I believe that's what the question is about. Regarding plans for when that will be available, I'm not sure, but if that is something you want, feel free to put your comments on the JIRA, sorry, the GitHub issue, so we can prioritize it.
A: Yeah, so Rajendra, if you can provide more context, Venki and Denny are here to help and support, so we'll look forward to it. Thank you. The next question is: auto optimize is known best for streaming loads, but what would be the bottlenecks when used with batch workloads?
D: I think it's basically one of the benefits of auto optimize: if you are inserting small files, which is the case in streaming, it helps to compact them as soon as they come in, so in those cases it helps. In the batch case, it depends on what size of files the batch job is generating; if you have a similar situation, then auto optimize may help.
A: Thank you. I think there is kind of a feature request question. The question is: will there be a direct Delta table or Parquet to Grafana integration in the future? I do understand that it may be up to the Grafana team to develop the plugin, but are there any considerations in the future for this type of connector?
B: I mean, honestly, I would create a GitHub issue for something like this and get votes from the community, because that would basically tell both the Delta Lake community and, for that matter, the Grafana community that this is something we should all work on together. So, to just provide a little context:

B: all the issues that you see in the roadmap that we pasted into both the LinkedIn and YouTube channels, for the Delta Lake roadmap, are based off of Slack messages, GitHub issues and, oh sorry, of course the Delta Lake Google Group. We're getting tons of feedback from the community on all of these features, and that's actually how we prioritized what we have here. So it's not to say that we're not interested in Grafana; it's more a matter of, candidly,

B: at least personally, I don't think we've been asked that question until now. So I would definitely chime in and add an issue in GitHub, or for that matter in the Delta Users Slack, and just see if you can get other people involved with it. That would definitely allow both the Grafana teams and the Delta Lake teams to say: yes, maybe we should go work on this together. So...
A: Yeah, and we can provide you different channels to get started with our team and with our open source community to build the connector. All right. Are there plans to move... okay, I'm not sure about this question, but I'm going to read it anyway. Are there any plans to make vacuum have no effect on CDF (change data feed) logs?
B: Yeah, I'm probably going to fumble this a little bit, but I'm going to give it a try anyway. Okay, so first things first: change data feed has been added to the Delta Lake roadmap, as we've had a lot of feedback, and we're targeting around the Q2/Q3 time frame for this one. So we haven't really dug in deep in terms of how we're going to be implementing it, but we have some pretty good ideas and designs. I just want to call that out first.

B: Now, related to that: the change data feed itself would contain pretty much any changes. So basically there's a Delta table that contains the deltas, pun intended, of your table, and every single insert, update, whatever you're doing, is recorded in this table. So if you're running a vacuum on the change data feed, in essence what you're doing is removing any reference to the fact that the change occurred. And so I guess I sort of need to understand a little bit better:

B: is there maybe a context that I'm missing here? Because in the end, running a vacuum on the table itself, in essence the snapshot table or the static table, makes sense, because you're saying I don't want the history anymore. But the change data feed itself would actually contain every single change, and that's sort of the point of CDF. So unless I'm missing something, and it's possible, by the way, so I apologize if I'm missing the context here.
A
But
I
think
that
you
provided
a
little
bit
more
context
today,
so
that's
that's
helpful
yeah
on
how
how
they
can
go
about
doing
auditing
and
working
with
cdf.
So
if
you
have
any
follow-up
for
whoever
asked
the
question,
please
use
slack
and
we
can
provide
more
details.
So,
thank
you
all.
You
know
I
think
before
we
leave.
A
There
are
a
lot
of
questions,
so
it's
amazing
to
see
how
much
engagement
you
you
provided
through
our
live
event,
and
we
will
get
through
all
the
questions
we
will
try
to
do
it.
I
have
one
more
announcement,
so
all
right
ready
for
this.
We
now
have
a
delta
lake
linkedin
home
now,
so
please
follow
and
share
with
your
network.
It's
this
is
the
url
linkedin.com
company
delta
lake.
A
We
really
appreciate
your
support
and
you
know
being
here
with
us,
always
showing
the
support
by
as
a
ques
question,
by
asking
questions
and
thank
you
to
the
panelists
who
answered
all
the
questions
in
like
a
very
detailed
manner.
So
thank
you
all
any
closing
thoughts.
B: Just want to say thank you, everybody, for asking questions. Sorry if we haven't chimed in on everything, but join us on exactly what Vini called out: the Delta Lake LinkedIn and, of course, the delta-users Slack channel, where we're all super active as well. So...