From YouTube: 2021-09-17 delta-rs open development meeting
Description
Discussing recent progress made in the project and some production use-cases
A
So we're going live. We're live, hooray! All right, so I'm actually curious to start with Neville. If you wouldn't mind sharing what the empty nested list bug is and how that's going.
B
Cool. The empty nested list bug is a bug that we've encountered, or that Misha and Christian encountered, when writing a list that has a struct, but the struct is empty.
B
So the issue there is that, when calculating the right definition levels to write, if for example you need to have three definition levels, you end up having two definition levels, and then it triggers an assertion error in the parquet code, where you've got, let's say, three values but you're trying to write two definition levels.
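The mismatch described above can be sketched with a simplified model. This is not the arrow/parquet Rust code, just an illustration of the counting that goes wrong: the writer must emit one definition level per value slot, and a null or empty container still consumes a level, so undercounting those slots trips the values-versus-levels assertion.

```python
# Simplified model of Parquet definition levels for an optional
# list<struct> column. Illustrative only, not the delta-rs/parquet code.

def definition_levels(rows):
    """Return one definition level per leaf value (or null/empty slot).

    Levels for an all-optional list<struct{...}> column:
      0 = row is null
      1 = list is empty
      2 = struct element is null
      3 = struct field is present
    """
    levels = []
    for row in rows:
        if row is None:
            levels.append(0)       # null row still emits one level
        elif len(row) == 0:
            levels.append(1)       # empty list emits one level, zero values
        else:
            for item in row:
                levels.append(2 if item is None else 3)
    return levels

rows = [[{"a": 1}, None], None, []]
print(definition_levels(rows))  # [3, 2, 0, 1]
```

If the empty-container branches above were skipped, the level count would no longer match the slot count, which is the shape of the assertion failure being discussed.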
B
So I've been working on it for quite a while, on and off. I've been quite busy with work and other stuff, but I was working on it earlier today, and the test case that Misha created, I got that to pass. But I've got regressions on two other test cases, so I'm just looking at those, and then once I've resolved them I'll be able to submit a pull request.
B
Yeah, and that's me. I haven't been doing anything else, at least nothing arrow-related.
C
The dead-letter queue: so in kafka-delta-ingest we now have an option which enables a dead-letter queue table, which catches those errors and writes the invalid records to another table, so the whole stream will not be aborted just because of a single row. So we've sort of got a workaround for it, just by filtering those records. To be honest, we only hit it for one table, where we had a list of structures, and from what I've seen it's like one or two messages per, you know, several million a day, so not a big deal from the kafka-delta-ingest standpoint.
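The dead-letter pattern just described can be sketched in a few lines: instead of aborting the whole stream on one bad record, invalid records are routed to a separate dead-letter table. The names here are illustrative, not the actual kafka-delta-ingest API.

```python
import json

def process_batch(raw_messages, write_good, write_dead):
    """Route parseable records to the main sink, failures to a dead-letter sink."""
    for raw in raw_messages:
        try:
            record = json.loads(raw)
            write_good(record)
        except ValueError as err:
            # Capture the bad payload and the reason instead of crashing
            # the whole ingest stream.
            write_dead({"payload": raw, "error": str(err)})

good, dead = [], []
process_batch(['{"id": 1}', 'not json', '{"id": 2}'],
              good.append, dead.append)
print(len(good), len(dead))  # 2 1
```

In the real service the two sinks would be delta tables; the point is that one malformed row out of millions lands in the dead-letter table rather than stopping the stream.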
C
Other than that, we went live with 24 streams, with the top message rate at around 300 messages per second. Not that big, but it seems to be working; each one of them runs on a quarter of a CPU, and the top stream consumes about 700 megabytes of RAM, and that's only because the messages in those topics are pretty large, like 3 kilobytes per message. So yeah, that's it.
A
I think, you know, in case Denny watches this later: what that means is that we have production delta writers consuming right now with Rust, which is really cool. We did it.
C
It does not yet work on the arrow 6.0.0 snapshot, which means we have to comment that out of the build, because it has been failing. And we do need this 6.0.0 arrow release, because it has map support, which is essential for checkpoint usage, as the schema has a map structure in it.
C
So until we are on an arrow stable release, especially the datafusion bits, it will be turned off until that point.
D
Yeah, for those who are using the latest master branch, or main branch, of delta-rs: you will have to update to use the latest master branch of arrow-rs as well. So that will be the arrow 6.0.x release. So datafusion will be broken for now, until datafusion catches up to the 6.0 release.
C
Good. Do we have any time estimate for when the 6.0.0 release will land for arrow?
A
Yeah, they haven't even cut a new release of that, so yeah, we would still have to disable the Azure SDK.
A
I think, given all that's changed in the Rust binding, it would be good for us to try to get to a release in the next month or so, because there's some really good stuff there.
D
Checkpoints, well... so technically we can release some of the features. We can release the features that were merged before we merged the arrow 6.0 snapshot branch. So if we want to do that, we can do it. Otherwise we'll have to wait for the main arrow release before we can do another one.
B
Arrow 5.0 was released on the 17th of July. I'd estimate that we've got about three to four more weeks till 6.0 is released.
D
Yeah, so this is not a full list, I might have missed some important ones, but this is what I had in mind. We have Connor, who has been a really active contributor to the project and has been picking up a lot of help-wanted tickets. For example, he added a create-table feature to delta-rs, so you can now create new tables using the Rust binding. Brandon has added GCP support to delta-rs, which is a big milestone.
D
So we now support all major cloud providers. Yuan Zhou, who has been a new and very active contributor, has also been helping with all these help-wanted tickets. He added the batched deletes, which will make vacuum a lot faster for large tables.
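The speed-up from batching deletes can be sketched as follows: group the stale files into fixed-size batches so each storage call removes many objects at once, instead of one round-trip per file. This is illustrative only, not the delta-rs implementation; the batch size of 1000 is an assumption modeled on S3's per-request delete limit.

```python
def batched(keys, batch_size=1000):
    """Yield the keys in fixed-size chunks."""
    for i in range(0, len(keys), batch_size):
        yield keys[i:i + batch_size]

def vacuum(keys, delete_batch):
    """Delete all keys, one storage round-trip per batch; return call count."""
    calls = 0
    for batch in batched(keys):
        delete_batch(batch)
        calls += 1
    return calls

# 2500 stale files -> 3 delete calls instead of 2500.
print(vacuum([f"part-{i}.parquet" for i in range(2500)], lambda b: None))  # 3
```

For a large table with millions of tombstoned files, cutting the number of network round-trips by three orders of magnitude is where the vacuum speed-up comes from.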
D
This idea was suggested by Daniel. He also added a lot of other features; for example, he went through the whole partition serialization codepath and made sure all the types are supported and covered. Florian added a filesystem argument for the Python binding.
D
So when you're loading a delta table, you can pass a filesystem argument, which basically lets you support filesystems that are not supported by pyarrow by default; you can pass any kind of filesystem object to read the objects from any remote store that you want. Florian also added Glue catalog support, so that's also pretty cool, yeah.
D
With that, you don't have to hard-code the table's S3 path, or whatever path you want. Actually it has to be S3, because it's AWS Glue, so it only works with AWS, but you can pass in the table name once you have the Glue integration set up. So that's also pretty cool. Florian also audited all the table fields, or actually all the delta commit action fields.
D
So for the fields that have to be optional, we make sure we are setting them as optional, and for the fields that are not optional, we're not using optional. This has an impact on compatibility with the official reference implementation.
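One way to picture why the optional-field audit matters: an optional action field should be omitted from the serialized JSON entirely when it is unset, rather than written as an explicit null that another reader may choke on. A minimal sketch, with field names borrowed from the Delta commit action shape but not taken from the delta-rs serializer:

```python
import json

# Fields of an action that the protocol treats as optional (illustrative set).
OPTIONAL_FIELDS = {"partitionValues", "size", "tags"}

def serialize_action(action):
    """Serialize an action dict, dropping unset optional fields entirely."""
    cleaned = {k: v for k, v in action.items()
               if not (k in OPTIONAL_FIELDS and v is None)}
    return json.dumps(cleaned, sort_keys=True)

print(serialize_action({"path": "part-0.parquet", "size": None, "tags": None}))
# {"path": "part-0.parquet"}
```

This is the JSON analogue of Rust's `#[serde(skip_serializing_if = "Option::is_none")]`: absent, not null.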
D
We ran into problems where there were fields that were supposed to be optional, and we were not setting them as optional, causing some crashes in the official reader. And I think there is one more thing that Misha did that's really critical, but I forgot.
C
Yeah, maybe that's what we've been doing with Christian.
C
Oh yeah, okay, we've added a config which actually enables a lot more options, such as the retention log policy. I noticed that in Spark, after each checkpoint, they try to remove logs and checkpoints which are older than 30 days. We don't have that, and that would be nice to have.
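The retention behaviour described for Spark can be sketched with a simple cutoff computation: after a checkpoint, delete log and checkpoint files whose modification time is older than the retention window (30 days by default). File listing and deletion are stubbed out here; this is not the actual cleanup code.

```python
from datetime import datetime, timedelta, timezone

def expired_log_files(files, retention=timedelta(days=30), now=None):
    """files: list of (name, modified_datetime) tuples.

    Return the names of files older than the retention window.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - retention
    return [name for name, modified in files if modified < cutoff]

now = datetime(2021, 9, 17, tzinfo=timezone.utc)
files = [("00000000000000000001.json", datetime(2021, 8, 1, tzinfo=timezone.utc)),
         ("00000000000000000099.json", datetime(2021, 9, 16, tzinfo=timezone.utc))]
print(expired_log_files(files, now=now))  # ['00000000000000000001.json']
```

A real implementation also has to be careful never to delete log entries newer than the last checkpoint, since readers still need them to reconstruct table state.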
D
Tyler, that would be the "introduce DeltaConfig and tombstones retention policy" one. Yeah, that's the one. So this replicates all the config options from the reference implementation.
C
And then another critical bug with Spark has been fixed by Christian: that's the "fix checkpoint compatibility for remove fields" one. So the issue is that there have been breaking changes from delta 0.7 to delta 1.0, in that they have a dynamic schema.
C
So there's a specific flag, extendedFileMetadata, and we were writing this incorrectly. When Spark reads a delta 1.0 dataset, if extendedFileMetadata is false, then those size, tags, and partitionValues fields should be omitted from the parquet schema, but if it's true, then they should be present, and we were just not writing it right.
C
If that's missing but those fields are in the schema, Spark would fail. So this change introduces a dynamic schema that includes those columns or not, depending on that extendedFileMetadata flag. So we're trying to match the delta 1.0 version.
D
It says you need to provide these extra fields if extendedFileMetadata is set to true, but it didn't say that if it is not set to true, for example if it's false or not set at all, you cannot include any of these fields. Otherwise, you will crash the reference implementation.
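The rule just stated can be sketched as a small builder for the remove action: the extra fields (partitionValues, size, tags) are written only when extendedFileMetadata is true; when it is false or unset they are omitted entirely. Field names follow the Delta transaction log's remove action, but this is a sketch, not the delta-rs checkpoint writer.

```python
def remove_action(path, deletion_timestamp, extended=False, **extra):
    """Build a remove action dict per the extendedFileMetadata rule."""
    action = {"path": path,
              "deletionTimestamp": deletion_timestamp,
              "dataChange": True}
    if extended:
        # Only the extended form carries the file-metadata fields.
        action["extendedFileMetadata"] = True
        action.update({k: extra.get(k)
                       for k in ("partitionValues", "size", "tags")})
    return {"remove": action}

compact = remove_action("part-0.parquet", 1631836800000)
full = remove_action("part-0.parquet", 1631836800000, extended=True,
                     partitionValues={}, size=1024)
print("size" in compact["remove"], full["remove"]["size"])  # False 1024
```

Writing the compact form with the extra fields present (or null) is exactly the shape that crashed the reference reader.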
C
Yeah, that's what I was talking about when we commented out datafusion.
A
Cool. To change topics slightly: one of the things I was speaking with Denny about was scheduling a meetup to talk about kafka-delta-ingest and the road to production.
A
I think it's unfortunately probably going to be too late in the evening for you, Misha, but it'll be right in the middle of the day for Christian, so I think we'll be able to present some of what y'all have done there. I think we're shooting for October 7th, which is a Thursday a couple of weeks away.
D
I think there is also a Delta, oh sorry, not Delta, a Delta Lake office hour that folks might be interested in.
A
Yeah, that's what Denny just announced in the events channel of the Delta users Slack. The Delta Lake community office hours are going to be alternating, I think it's every other week, every two weeks, yeah. So this Thursday, that is September 16th, we'll be starting at 9 a.m. Pacific, and then two weeks from then we will be doing 4 p.m. Pacific, to try to accommodate the global audience. The goal of the office hours is really to have, you know, general Delta Lake discussion. He's going to be bringing some of the committers from Databricks to that. I think it's on my calendar; he's asked me to show up for a couple of those. But yeah, we'll see how it goes.