From YouTube: Delta Lake Community Office Hours (2022-09-08)
Description
Join us on September 8, 2022 at 9:00 AM PDT for the Delta Lake Community Office Hours! Ask your Delta Lake questions live and join our guest speakers, alongside Vini Jaiswal from Delta Lake!
Ask us your #DeltaLake questions. These sessions allow our community to ask questions about Delta Lake OSS and get to learn what we are building, planning to build and know about recently released features. These sessions are live and the recordings are available on the Delta Lake YouTube channel.
Quick links:
https://delta.io/
https://go.delta.io/slack
https://github.com/delta-io/delta/releases
https://groups.google.com/g/delta-users
A: If you missed any of the previous AMA office hours, no sweat: there are recordings available on our LinkedIn and YouTube channels. Just a quick reminder as well, this is an official Delta Lake webinar and is therefore subject to a code of conduct. Please don't do or post anything in the Q&A, or ask questions, that would be in violation of that code. I will paste the link to the code of conduct.
A: That was a thing I had to mention, and I hadn't mentioned it yet. Without further ado, we would love to know where you are dialing in from. We have a panel from across the world: California, France. Cool. In the meantime, while you are joining our channels, we will go ahead and kick off the introductions of our panelists. I will start with the person on my right. Florian, why don't you give a quick intro?
B: Thank you, Vini. Hello, everyone. I'm Florian and I work at Back Market, a marketplace where we sell refurbished devices, and I'm a contributor to delta-rs, a native Rust library for low-level access to Delta tables. We provide Python bindings as well.
C: Hi, my name is Nick Karpov. I'm a developer advocate here at Databricks. I've been working on the Delta project for several years in my capacity as a field engineer, and I recently joined the dev advocacy team. So I'm excited to be here and talk about Delta.
D: Hi, I'm Allison, and I'm a software engineer here at Databricks. I work on the Delta Lake project, and this includes things like managing the 2.1 release, so I'm happy to talk about things related to that. I've also worked a little bit on the connectors repo and the Delta Standalone project.
A: Pretty cool. Last one, we have Ryan.

E: [inaudible]
A: That's awesome; we have an exciting range of contributors here. I'm Vini Jaiswal, developer advocate at Databricks, and I have been associated with the Delta Lake project for several years now, so I'm happy to answer your questions. Please post your questions here. We recently made three major announcements for our project, so we will cover what is available in Delta Lake 2.1 with Spark 3.3.
A: We also have a release of the Python/Rust bindings, so you can ask questions around that. We also shipped a fix for a bug in a recent release, so that's how quickly we are releasing features. Pretty exciting.
A: So what are some of the things that were fixed in the Delta Python/Rust bindings? Florian, would you be able to take that question?
B: Yeah, thank you. Basically, we released a breaking change with the latest version of the Python bindings, 0.6.0. We changed the way we read the storage, so we allow more configuration options for the different cloud providers that we support, and this we can change using a new Rust crate.
B: We would like to evolve with the rest of the Python ecosystem, and there are also major improvements to the way we read and parse the statistics of Delta tables using the library. We also provide better schema support, especially for dates and decimals.
A: That's awesome. For those who don't know what the Delta Rust bindings are, can anybody on the panel give a quick overview of what that project is?
E: Basically, we used Rust to write a reader and writer based on the Delta protocol. We already have the Delta protocol, which defines how to read and write the Delta transaction log, and delta-rs builds that from scratch using Rust itself, so you don't need Spark.
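(Editor's note: a minimal sketch of what that looks like from Python, assuming the `deltalake` package, the delta-rs Python bindings, is installed and `./my_table` is an existing Delta table; no Spark cluster is involved.)

```python
from deltalake import DeltaTable

# Open the table by reading its transaction log directly
dt = DeltaTable("./my_table")

print(dt.version())   # current table version
print(dt.files())     # Parquet files that make up this version
df = dt.to_pandas()   # load the data into a pandas DataFrame
```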
A: Yep, that's what it is. Also, if you have any projects you are working on and you would like to build a connector with Delta Lake, we welcome contributions from everyone. Because we released the Standalone reader and writer back in December, Delta Lake opens a whole new room for connecting with other ecosystem projects, so definitely check out our GitHub repo and put your PR there.
A: We have a very good reviewing schedule, so we will be happy to connect with you. There is a question around whether Delta Lake works with EMR. I'm pretty sure it does, and you can configure it on EMR; if you need the binaries, they are available on our GitHub repo. Would anybody like to add anything?
E: Basically, EMR ships Apache Spark on YARN, and as long as you are using Spark, it's pretty easy to use Delta Lake. We have a quick start document that tells you how to start using Delta Lake with Spark.
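(Editor's note: the quick start boils down to launching Spark with the Delta package and extensions; a sketch assuming Spark 3.3 and Delta Lake 2.1, using the coordinates published to Maven Central.)

```shell
pyspark --packages io.delta:delta-core_2.12:2.1.0 \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
```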
A: And we are working on more tutorials for those quick starts, so we are also improving our documentation. Like Allison mentioned, please definitely let us know if we are lacking anywhere in the documentation, and our team will be happy to update it. Awesome. There is another question on anything related to governance, data governance, that we have added in Delta Lake, or data quality, something like Great Expectations.
E: Yeah, basically there is support for, for example, constraints. You can define requirements for a column; it's pretty much SQL standard. You can just add a constraint on a column and set whatever expectation you would like for that data, and then with this check constraint we will verify the data quality first before writing to the Delta table.
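(Editor's note: a minimal sketch of a check constraint in Delta's SQL syntax; the table and column names are made up for illustration.)

```sql
-- writes that violate the constraint are rejected
ALTER TABLE events ADD CONSTRAINT valid_score CHECK (score >= 0 AND score <= 100);
```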
A: Yep, exactly. Also, if you have any PII data, or some kind of compliance that should be enforced, or you need separation of tables for your users, you can also apply a salt key or some kind of pseudonymization to your tables. We also recommend the bronze/silver/gold level architecture, so you don't have to give all the users in your organization access to all tables. You can basically apply ACLs.
A: Sorry, ACLs, meaning access-level controls at the table level for your users, to govern your data in that regard as well. So there are different ways and approaches you can take. Would anybody else like to add any other recommendation?
C: You can also, and this is a little bit more custom, perhaps leverage the user commit metadata on a per-commit basis to write additional metadata for a given committer, and then leverage that as the basis of additional information to improve governance. You can use that in your readers to do checks, allow-lists, things like that. It would require more work than just using the metadata, but the metadata could be the basis of that.
C: Well, the technical ability to add metadata as part of a commit is kind of foundational. As an example, within Databricks specifically there is metadata regarding which user actually committed this data, from what cluster, things like that. In the open-source Delta project, you could achieve similar things using the user commit metadata.
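(Editor's note: in open-source Delta this is the `userMetadata` field of the commit info; a sketch of setting it from SQL, with a made-up table and an arbitrary tag string.)

```sql
-- tag subsequent commits in this session with custom metadata
SET spark.databricks.delta.commitInfo.userMetadata = owner=data-eng;purpose=backfill;

INSERT INTO events VALUES (1, 'click');

-- the tag appears in the userMetadata column of the table history
DESCRIBE HISTORY events;
```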
A: Also, I think I read somewhere in our release notes that we recently added some more operational metrics to the Delta Lake project. Is that right?
D: I think it's less that we added new operation metrics, and more that we exposed them directly after performing the operation. Previously, you'd have to go to DESCRIBE HISTORY and retrieve the metrics for previous versions and previous commits, but we just added it to the SQL commands now, so that you get those operation metrics directly back from your command.
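(Editor's note: a sketch of the difference, with a made-up table name; in Delta Lake 2.1 the DML commands return their operation metrics directly, where previously you had to dig them out of the history.)

```sql
-- 2.1: the command itself returns metrics such as num_affected_rows
DELETE FROM events WHERE score < 0;

-- previously: look up the metrics of the last commit afterwards
DESCRIBE HISTORY events LIMIT 1;
```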
A: Yeah, and I think this is a very powerful feature, having that exposure to operation metrics. I have seen a lot of use cases in the field where it could be used, for backtracking your data as well as a lot of other different use cases, so having that functionality is pretty helpful. Awesome. There's another question: do you need to reshuffle before merging in Delta Lake, or is it enabled by default?
C: I think the merge consists of two joins, which will perform a shuffle, and there is a repartition-if-needed flag that you can add so that your files do get shuffled and repartitioned into larger files per partition.
C: But if the question is whether you need to repartition before you run MERGE, then probably not.
A: Like merge repartitionBeforeWrite equals true, or something; I think we have documentation around that.
C: Particularly on the source side, I guess, that could have an impact, but on the writer side you don't really have control, except for the flag, which you can disable because it's enabled by default.
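(Editor's note: the flag being discussed is exposed as a Spark conf in open-source Delta; a sketch of toggling it.)

```sql
SET spark.databricks.delta.merge.repartitionBeforeWrite.enabled = true;
```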
B: We can think about Databricks as a way of decentralizing the data sets produced, so in a way it's a more decentralized way to fetch everything and make it available for every use case that we can find. So in a way, yes.
A: Awesome. I would definitely like to dive into some of the 2.1 features. It would be helpful to know why we decided to release some of the features in 2.1, like what were the most-asked features, if our panelists can give a quick answer on that.
D: Well, I think probably the biggest thing was really the upgrade to Spark 3.3. This has been something that we had been meaning to do for a while, and it was pushed back a little bit because of 2.0, and I think it's something that people have requested a lot. So that was the major thing pushing the 2.1 release, but I also think that a lot of the other main improvements that came along with it had to do with SQL syntax.
D: So yeah, a lot of different changes on the SQL side. Other than that, there have been a lot of other improvements, as well as bug fixes and things like that, a lot of which were contributed by community members, which is awesome. I'm happy to talk about anything in more detail, if anyone wants.
A: Thanks for the overview, Allison. Time travel is definitely a very good highlight and, of course, Spark 3.3. Any other features from your perspective, Nick, that will help unblock any use cases?
C: Sorry, I was on mute earlier when you asked me. I don't know if it was mentioned, but maybe the available-now trigger that was introduced in Spark 3.3; that could certainly help. It's not exactly a Delta use case, but as far as the Delta Spark connector goes, it's definitely a notable improvement that can help people implement their streaming workloads.
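(Editor's note: a minimal sketch of the available-now trigger in PySpark 3.3; the Delta paths are made up for illustration. It processes everything available in batches and then stops, rather than running continuously.)

```python
(spark.readStream.format("delta").load("/data/events")
    .writeStream.format("delta")
    .option("checkpointLocation", "/data/_ckpt")
    .trigger(availableNow=True)   # new in Spark 3.3
    .start("/data/events_out"))
```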
A: Yeah. Nick, quick question: doesn't the latest version always get retained?
A: And also, I think vacuum is a soft delete for seven days if a retention duration is not specified. If the retention duration is zero, then it will delete everything, but if you don't specify a retention duration, the data is still there, softly, for seven days, in case you want the ability to read those table versions back within seven days.
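(Editor's note: a sketch of the retention behavior in SQL, with a made-up table name.)

```sql
-- default: only files older than the 7-day retention period are removed
VACUUM events;

-- removing everything immediately requires an explicit retention
-- and disabling the safety check
SET spark.databricks.delta.retentionDurationCheck.enabled = false;
VACUUM events RETAIN 0 HOURS;
```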
A: Awesome. All right, let me see if we have any more questions.
A: I think one question is when foreign-key/primary-key relationship constraints are coming. I think it's in the way you define the table, right?
A: Yep, good call-out. All right, so I think that's all the questions we have. Thank you, Nick, for sending the link. Cool. Any other closing remarks from anybody on the panel?
C: If there are none from anybody else, I'm just really excited to hear from the rest of the community what new features you'd like to see, so please don't hesitate to reach out.
A: Awesome. And if you do want to associate yourself with the project, if you want to contribute something, we have a list of good first issues, so I will send the link in the chat. You can start with the good first issues. Don't hesitate just because it's your first contribution or something; just get started, and we will be happy to help you get there. Awesome. Thank you all.