From YouTube: Delta Lake Community Office Hours (2022-09-22)
Description
Join us on September 22, 2022 at 9:00 AM PDT for the Delta Lake Community Office Hours! Ask your Delta Lake questions live and join our guest speakers, alongside Vini Jaiswal from Delta Lake!
Ask us your #DeltaLake questions. These sessions let our community ask questions about Delta Lake OSS and learn what we are building, what we plan to build, and what was recently released. These sessions are live, and the recordings are available on the Delta Lake YouTube channel.
Quick links:
https://delta.io/
https://go.delta.io/slack
https://github.com/delta-io/delta/releases
https://groups.google.com/g/delta-users
A: Matthew, I'm wearing the same t-shirt from Data + AI Summit.
A: Let me find the exact link of the stream, one second.
A: It doesn't show me the exact link. Okay, no, that's a company page. Okay, got it. All right, sounds good.

So hello, everyone, and welcome to the Delta Lake Office Hours! Please tune in to our feed on the Delta Lake LinkedIn page, where we are live, and we are also live on our YouTube channel. Just for the sanity of these office hours, a quick reminder for those who are new to the session: these sessions are live and occur every two weeks on Thursdays at 9 AM Pacific. We bring a panel of contributors and champions of Delta Lake to answer your questions, whether you are getting started on Delta Lake or wondering what is coming up on the roadmap and what we have built. If you missed any previous AMAs, don't worry about it; we have recordings available that you can watch on either LinkedIn or YouTube on our Delta Lake channels. And since this is an official webinar of Delta Lake, it's subject to a code of conduct. Please do not post anything or ask anything that would be in violation of that code.

So without further ado, we have Scott and Matt on our panel. Why don't I give the room to them for introductions? Scott?
C: Sweet, thanks, Vini! Hi everyone, I'm Scott. I've been on here a few times before. I'm an engineer at Databricks working on Delta Lake; I've been working on this product, this open source project, for over a year, and on a lot of cool features in the past several releases. I'm really excited to be here and answer your questions.
B: Hello, my name is Matthew Powers. I'm a developer advocate at Databricks and have been a big Delta fan for quite some time, since it was released. Recently I've been focusing on a Delta acceptance testing project that we'll be chatting about, and also writing a bunch of Delta blog posts that are getting me even more excited about Delta. Every time I do Delta stuff, it just makes me happier about the product.
A: Awesome. So because you're so happy, Matthew, what are you bringing for the community? I know you are working on some exciting things, so if you want to give a quick sneak peek.
B: Yeah, I'll give a high-level overview of the Delta acceptance testing project. It's something I'm collaborating on with Scott and also a bunch of other members of the community. So basically, at a high level, we want to create some reference tables, and those are going to be Delta Lake tables, and we want the connectors to be able to run integration tests against those reference tables. We have a pretty vast ecosystem of connectors.
B: Like Trino and Presto and Rust and pandas, we want to make sure all those Delta Lake implementations can implement all the core functionality that Delta supports. So we're going to help drive that with the Delta acceptance testing project. I think it'd probably be helpful to have Scott rephrase the same thing in different words.
C: Yeah, the goal here is to make sure that any client implementation that reads and writes Delta tables does so correctly. We want the Delta table to be a single source of truth that looks the same to any client, no matter how you're accessing it or writing to it. So this is just making sure that different clients are able to actually work together, because this is really reflective of how people use Delta. Sometimes you'll have one workload, or one team, that's using one client, and another team that maybe wants to do something a lot more lightweight, so one is using Spark and the other is using pandas or Rust or the Delta Standalone reader, and things like that. So making sure that we have a standardized way to test and validate how these different clients interact is really important to the overall success of the Delta ecosystem.
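To make that cross-client idea concrete, here is a minimal sketch of the kind of check the acceptance tests aim for; the table path and the choice of the deltalake (delta-rs) and delta-spark packages are illustrative assumptions, not the actual DAT harness.

```python
# Read the same reference Delta table with two independent clients and
# assert they surface identical data. Assumes pyspark, delta-spark, and
# deltalake are installed; the table path is a hypothetical placeholder.
import pandas as pd
from deltalake import DeltaTable  # Rust-based client (delta-rs), no JVM
from pyspark.sql import SparkSession

TABLE_PATH = "./reference-tables/basic-append"

# Client 1: Spark with the Delta Lake connector.
spark = (
    SparkSession.builder.appName("dat-check")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)
spark_df = spark.read.format("delta").load(TABLE_PATH).toPandas()

# Client 2: delta-rs reading the very same table.
rust_df = DeltaTable(TABLE_PATH).to_pandas()

# If the table is a single source of truth, row order aside, both
# clients must agree.
cols = sorted(spark_df.columns)
pd.testing.assert_frame_equal(
    spark_df[cols].sort_values(cols).reset_index(drop=True),
    rust_df[cols].sort_values(cols).reset_index(drop=True),
)
```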
A: Does that mean we also provide some kind of base framework for the community? If they want to do any benchmarks or build any connectors, will they be able to use that framework, something like that? Or am I getting it totally wrong?
C: ...the expected, you know, the expected data that's represented the same in your own system as it is in the table format.
A: That's pretty cool. And we also have the benchmark framework, which was released with 2.0. That's great, because it takes away a lot of complex setup from the community. So thank you. Another question is around Flink: I know that we were working on several things with Flink. What are some of the things that are coming up on the roadmap?
C: Yeah, great question. So, for those that don't really know much about Flink: perhaps your version of Delta Lake is the Delta Lake implementation on Spark, which is our most popular repo. But Delta is just a file format, and it should be able to work with any compute engine, not just Spark. Flink is one such compute engine.
C: It's able to provide really, really low latency and was built from the ground up with streaming in mind, whereas Spark was initially geared more towards batch jobs and figured out streaming a little bit later. So the Flink connector is a connector for Delta Lake that supports Flink sources and Flink sinks, and those two were released this year. So we're actively working on it, which is really exciting, and we have our first, you know, companies out in the open source community using it for the first time, and we're getting great feedback from them.
C: So source and sink is what's been developed so far this year. Currently we're working mainly on upgrading the Flink version that we support, just upgrading our versions along the way, and there are some little bumps in the road that we're problem-solving as we go. We're also working on adding SQL and catalog support.
C: So the goal here is for our Flink connector to integrate completely with SQL queries, as well as to provide our own Delta catalog, which is just kind of necessary to solve some of the problems when you're integrating with a metastore. So yeah, those are both actively in development, and there's actually a public design doc in our GitHub issues in the connectors repository, if people want to go in and leave some feedback.
A: So, Scott, one question on that. If I'm using Flink integrated with Delta Lake, is the idea that I can run some SQL queries on Flink to be able to actually query any metadata information from Delta Lake?
C: Any metadata, any actual data, for sure. Okay, and again, a cool thing to highlight here is the fact that all these different compute engines can work together. So one common use case is: you could have the Flink connector appending to an append-only table, and that's just really, really low latency; that could be one of your pipelines. Then you can have a Spark query on the Delta Lake table that's running OPTIMIZE compaction to compact your data for faster reads later on.
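For readers who want to picture the Spark side of that pipeline, here is a minimal sketch; it assumes Delta Lake 2.0+, where OPTIMIZE is available in open source, and the table path is a hypothetical placeholder.

```python
# While a Flink job appends small files at low latency, a periodic Spark
# job can bin-pack them into fewer, larger files for faster reads.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("compaction-job")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Compact the table that the Flink connector has been appending to.
spark.sql("OPTIMIZE delta.`/data/events`")
```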
A: Got it, got it, thank you. And in terms of these engines in general, I do see a lot of trends around streaming, and people use different tools. So, Matthew, what are your thoughts on the trends in streaming? What are the popularly used methods there?
B: It's a great question, and I think that in the streaming space there are a lot of different ways you can go with this. Like, let's say you have a Kafka stream and you'd like to get the data into a Delta Lake table.
B: What I understand, Scott, is that you can. And let's say you want to do some transformations to the data in the Kafka stream before you put it in the Delta Lake table: I think that the Flink connector that Scott was just referencing would be one way to do that, and I think another way is that you could read the Kafka stream directly with Spark Structured Streaming, do some transformations, then output it into a Delta Lake table. So I'm actually kind of curious about some of those high-level design decisions, Scott.
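A minimal sketch of the second option Matthew describes, Kafka into Delta via Spark Structured Streaming; the broker address, topic, and paths are hypothetical, and it assumes the spark-sql-kafka package is on the classpath alongside delta-spark.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    # Kafka delivers raw bytes; a simple transformation before landing.
    .selectExpr("CAST(key AS STRING) AS key",
                "CAST(value AS STRING) AS value",
                "timestamp")
    .filter(col("value").isNotNull())
)

(events.writeStream
    .format("delta")
    .option("checkpointLocation", "/checkpoints/events")  # exactly-once bookkeeping
    .start("/data/events"))
```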
C: Yeah, it all depends, on a customer-by-customer or user-to-user basis. It depends what kind of latency you're looking for, or what kind of cost management you want to implement, or what the legacy architecture at your company looks like. You know, Delta and Spark is a very, very robust, battle-tested solution; it's been in the open source community for over four years now, and it's been continuously getting better and better, whereas these new connectors are still being created. You know, we're still working out some kinks along the way, but we still want people using this connector and giving us feedback, right, because they want to make it better. So yeah, that's all I can really say on that.
A: Got it, got it, awesome. There's another question, which is unrelated to this, so feel free to say no: "Should we use DLT or dbt modeling?" So, Abhishek, I'm not really sure what you are trying to build, but... DLT here is Delta Live Tables. Awesome.

Another question: there is very helpful information captured in the operation metrics of Delta Lake. I know that when I used some of the operation metrics, for compliance and things like that, it really helped to understand the user history and all those cool details. Are there any...?
C: We have APIs that expose all this metadata for you; I'm sure you can go on our website and explore the different APIs for getting the table history and for getting operation metrics for a given transaction. Each commit to the Delta log has a commitInfo action under the hood that stores these operation metrics, so you can see the number of target rows deleted, or the number of source rows copied if you're doing a merge, and things like that. All of this is really easily accessible.
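A minimal sketch of pulling those metrics out with the Python API; the table path is a hypothetical placeholder.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("history-demo").getOrCreate()

# Each row of history() is one commit; operationMetrics carries counts
# such as numTargetRowsDeleted or numSourceRowsCopied for a MERGE.
(DeltaTable.forPath(spark, "/data/events")
    .history()
    .select("version", "timestamp", "operation", "operationMetrics")
    .show(truncate=False))
```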
A: Got it, got it. And I remember something that was coming up on the roadmap; I'm not really sure which feature it was, but it was related to some operational metrics. I will be excited to see that feature.
C: Yep. So, for one, if change data feed is enabled on your table and you perform any sort of DML operation, like insert, upsert, merge, delete, etc., we keep special CDC, CDF (change data feed) related operation metrics in that commit as well. Operation metrics are a public API; they're first-class citizens, and they're heavily supported throughout Delta. So when we added a new feature like CDF, we made sure to update the operation metrics accordingly. Does that...?
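And for completeness, a minimal sketch of reading the change data feed itself; it assumes Delta Lake 2.0+ with delta.enableChangeDataFeed set on the table, and the path and starting version are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cdf-demo").getOrCreate()

changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 0)
    .load("/data/events")
)

# _change_type distinguishes inserts, deletes, and both halves of an update.
changes.select("_change_type", "_commit_version", "_commit_timestamp").show()
```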
A: So, Matthew, maybe I can add to that, if that makes sense. Anytime a user performs an action, Delta stores a record of the adds and deletes, and it also stores what query performed that operation and how many rows changed, how many rows were impacted. So all this information gets collected into operation metrics that a user can query.
A: This allows a user, or, you know, an audit team, to see which user performed what action on a Delta table. For example, say I'm missing a row which was supposed to be GDPR-compliant data for a user, and now that user is requesting, "Do you have my information?" You can show them: hey, we have this record.
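A minimal sketch of that audit angle, filtering the history for data-changing commits; whether fields like userName are populated depends on the writer, and the table path is hypothetical.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("audit-demo").getOrCreate()

# Surface who ran which DML operation, and with what parameters.
(DeltaTable.forPath(spark, "/data/customers")
    .history()
    .filter(col("operation").isin("DELETE", "UPDATE", "MERGE"))
    .select("timestamp", "userName", "operation", "operationParameters")
    .show(truncate=False))
```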
A: Just one thing that we don't capture right now is, I think, the vacuum operation. That's something that I think we are discussing; I'm not really sure, Scott, if we have made any PRs on it, but I'm pretty sure that would be a good enhancement to this.
C: Yeah, for sure. You know, the community can give us feedback; I would love to hear how people want it to work. And Matthew, this seems like a great future blog post for you.
A: All right. I think we are also working on some user-facing blogs and tutorials. So, Matthew, do you want to give us a sneak peek at some of the things that are coming up?
B: Yes, I do. So one blog post I just drafted and will be publishing soon is on how to convert a Parquet table to a Delta Lake table. I think one thing that's really cool is that this is actually an in-place operation. So let's say you have a bunch of Parquet files sitting on disk, and you want to convert them to a Delta Lake table to get all the advantages of the nice features of Delta Lake, but you don't want to do an expensive rewrite of the data. The Delta OSS API exposes this convertToDelta method, and you can kind of just take those Parquet files and convert them to a Delta Lake table. All that entails is making the Delta log directory, and then, all of a sudden, your Parquet data is a Delta Lake table and you can enjoy all the wonderful benefits that Delta provides.
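A minimal sketch of that in-place conversion via the convertToDelta API; the paths and partition column are hypothetical placeholders.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("convert-demo").getOrCreate()

# Writes a _delta_log directory next to the existing Parquet files;
# the data files themselves are not rewritten.
DeltaTable.convertToDelta(spark, "parquet.`/data/legacy-table`")

# A partitioned table needs its partition schema spelled out, e.g.:
# DeltaTable.convertToDelta(spark, "parquet.`/data/legacy-table`", "date DATE")
```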
B: So I will be publishing that post, hopefully today, on the delta.io blog. And in doing that work, I noticed that we hadn't yet exposed an option for users to skip gathering statistics during this conversion, so I opened an open source request, or sorry, an open source issue, for that one, and it's a good first issue. So if somebody wants to tackle that, that's something you could jump on.
A: Awesome, yeah, looking forward to it, Matthew. There are some more questions in our chat. So one of the questions is: "I tried vacuum, but it did not work as we expected, especially for streaming." So, Morgan, if you actually can provide more details; I will paste the link for the Slack channel, because I think I would like to learn more about what exactly your parameters are and what you are trying to run. So hopefully I can help there, or somebody else can, yeah.
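For context on which parameters matter for a question like Morgan's, a minimal vacuum sketch; the path and the 168-hour (7-day) retention are placeholders.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("vacuum-demo").getOrCreate()

# Deletes data files that are no longer referenced by the table and are
# older than the retention window. Retention below 7 days is rejected
# unless spark.databricks.delta.retentionDurationCheck.enabled is false.
DeltaTable.forPath(spark, "/data/events").vacuum(168)
```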
C: Secondly, I'm curious how you're partitioning your data, if at all; that could be kind of a bad smell in the design of your architecture, perhaps, because you could be touching a lot of files with your upserts. And secondly, or thirdly, I wanted to call out a feature that we're working on called deletion vectors, which is designed to really help with these fast writes and to slightly reduce storage costs.
C
In
terms
of
typically,
when
you
do
an
update
now,
you
have
to
completely
rewrite
that
file.
Even
if
you
only
changed
one
row
and
then
it's
just
that
initial
file,
but
with
deletion
vectors,
we
can
make
that
change
of
that
one
grow.
A
very,
very
lightweight
sort
of
metadata
right.
Instead,
which
means
you're
not
fully
duplicating
that
file
and
and
features
like
this
once
they're
once
they're
added
to
Delta
I
think
it
really
help
you
save
some
cost.
There.
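Deletion vectors had no public API at the time of this session, so purely as a conceptual illustration of the idea Scott describes (not Delta's actual implementation):

```python
# Instead of rewriting a whole immutable file to drop one row, the writer
# records deleted row positions in a small sidecar structure, and readers
# apply it as a mask at scan time.
rows = ["alice", "bob", "carol", "dave"]   # contents of one data file
deletion_vector = {1}                      # row position 1 ("bob") deleted

# Read path: skip flagged positions rather than reading a rewritten file.
visible = [row for i, row in enumerate(rows) if i not in deletion_vector]
assert visible == ["alice", "carol", "dave"]
```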
A: Awesome, thank you. There's one more question, and I think that's all we have time for. Gary is asking: is there a difference in performance between using the library directly on Spark and using Databricks?
C: Yeah, so that's a great question, and I would say that any sort of API or feature that's not yet in Delta Lake, we have committed to fully open sourcing and adding eventually. That's what we announced with Delta 2.0, and that's why we're really excited to be adding all these features, like deletion vectors, to Delta Lake.
C
But
then,
when
you
talk
about
this
performance
difference
between
spark
and
databricks,
that's
where
it's
not
really
Delta
lake
anymore
databricks
has
a
proprietary
engine
called
Photon,
and
that
is
much
faster
than
spark
than
Apache
spark
for
for
various
reasons,
but
that's
not
really
related
to
Delta
Delta.
Our
goal
here
is
to
have
the
table
format
and
the
apis
be
fully
compatible.
A: Awesome, yeah. So I pasted the links for our Slack, GitHub, and other Delta Lake channels; feel free to join those and ask as many questions as you like. I know this happens bi-weekly, but you can always reach out any day on Slack or GitHub. So thank you all; we had a wonderful Q&A here. Thank you, Scott and Matthew, for doing your due diligence and answering a bunch of questions.
A: We will have these office hours next on October 6th, so I hope to see you there. Thank you!