From YouTube: 2021-01-19 delta-rs open development meeting
Description
Tentative agenda
* Neville shares how Arrow writer support is going
* We discuss the low-level transaction API here: https://github.com/delta-io/delta-rs/discussions/67
* Time permitting, review any bugs/features we think merit discussion.
A
This is not strict, but let me go ahead and click the live button on the YouTubes. All right, we're live. So this is our first open delta-rs development meeting. I figure we're going to try to do these every couple of weeks while there's some significant development going on in the delta-rs library or the bindings on top of it.
A
First up on the list — and Neville is traveling, so hopefully his connection stays pretty strong — is for Neville to share how some of the Arrow writer support that we need is going.
B
Okay, thanks. Hi everyone. So I've been working on the Arrow Parquet write support, mainly for nested lists, structs, and combinations thereof. The current writer — well, before I started with this work — only really supported primitive types and very basic lists, so it's been extending all of that work. I've worked on it for the past few months now, because there have been quite a lot of edge cases to cover and deal with — you know, the different semantics between how Arrow represents the data and how Parquet interprets it.
B
As of this past Sunday/Monday morning, I opened a pull request that's really sort of the penultimate work in getting the write support working. The pull request enables calculating the nesting — the definition levels — for arbitrarily nested list types and combinations thereof. It's the one step that I was missing before fully enabling, you know, nested write support. So I'm nearly there; I'm expecting that in the next two weeks we should have write support in the parquet library.
A
The pull request is this one — that's 9240. Let me open it up real quick. This is compute.
B
So it's mainly getting the pull request reviewed. I've added as much detail as I could, but I'm still going to add some more to make it easier for non-Parquet experts to also review it. Just looking at the logic — it does get a bit intricate when it comes to dealing with, you know, lists on top of lists, but it should be something that one could be able to follow just by going through the pull request.
B
If there are any Parquet experts in the chat — or, if we've got some time, go through it — you can probably suggest some things. So what I said in the PR was that, because I had to iterate on a few different implementations to get it right, I haven't done optimizations like bit-packing the masks, etc. It also makes it easier to review.
B
You
know,
instead
of
looking
at
the
number,
seven
and
figuring
that
it's
it's
one,
one
one
or
something
it's
easier
for
you
to
just
see
true
true,
true
kind
of
thing.
So
once
we've
once
you've
gone
through
that,
then
I
can.
I
can
start
optimizing
it,
but
yeah,
I
think,
as
many
hands
as
possible.
Looking
at
the
request,
there's
also
a
few
other
smaller
stuff
that
I'll
be
opening
more
pull
requests
for
during
the
course
of
the
week,
but
some
of
them
relate
to
the
right
side
of
things.
B
One of the challenges I've had is that it's quite difficult to test that you're writing correctly if you don't have read support. So, outside of using Spark — outside of using PySpark and PyArrow to read files — I've been working also on some of the read support, so that I can just make sure that the Parquet implementation is round-tripping correctly.
D
The writer that I'm targeting now is the Arrow writer, which instantiates a Parquet writer internally, and I'd like to just kind of throw that out there for Neville while we're on the call: is this the right thing for me to be targeting, at a high level, to execute Parquet writes? Is that correct?
B
There are still a few things missing, which I suppose will be the work that I'll be doing after the write support. So you'll notice that it's very heavily tied to a file on the file system. We want to make it more generic, so that the underlying Parquet writer that it creates ultimately allows you to write to a bunch of bytes instead of writing to a concrete file. But yes, that's—
D
The
correct
I
feel
like
I
could
work
around
this
at
present,
though,
because
from
what
I've
seen
what
I'm
seeing
in
the
arrow
writer
it
looks
like
it
does,
take
an
iterator
so,
rather
than
going,
I
know
the
documentation
says
I
need
to
go
through
that
value.
Error.
That's
provided
that
wraps
a
buffered
or
a
file,
but
I
feel
like
I
should
be
able
to
do
this
without
actually
having
a
file
right.
D
I
could
create
my
own
iterator
on
an
in
on
an
in-memory
buffer
of
json
values
and
pass
that
instead
of
evaluator
does
that
sound.
A
Cool. Anything else, Neville? Nothing else — thanks. Cool. Well, what I was hoping we would be able to cover next — and maybe, since screen sharing isn't working well for me — QP, would you be open to sharing your screen and just opening up the discussion that we've had on the low-level write API from the delta-rs side?
E
Yeah,
I
can
try
to
see
if
it
works,
screen,
share.
E
I'm going to jump into the discussion. So we have — Tyler opened this discussion here about the write support.
E
So I guess — I don't know if everyone has gone through this yet, but I can do a quick overview of what the proposal is. Basically—
E
Yeah, basically, we don't have any write support in delta-rs right now, and we are at the stage where we should be able to start implementing the write support, including transaction log commits.
E
So the overall, high-level view we have right now is to first implement a set of low-level APIs. If we look at the Delta protocol, which is open here, there's a set of actions defined for the transaction logs. So the idea is to first implement this set of actions — for example: add file, remove file, change metadata, set transaction identifiers, all that stuff.
E
All these low-level APIs — we'll first implement them, and then we can build higher-level abstractions on top of them. With these low-level actions, we should be able to have a fully working end-to-end demo with pure Rust code. So what I suggested is to use the Scala implementation as inspiration, to see what kind of API we can design to work with a transaction. For example, I gave some pretty rough dummy code here, which is basically: with a table object, we can add some other metadata, and then at the end we can do a transaction commit, which will write all the actions and commit a new transaction log to the Delta table. So this is a rough example that I gave, but I'm happy to see what other people think about it.
D
So I feel like what you wrote here makes perfect sense to me as an overall API. But then, as I'm thinking about our incremental deliverables, I'm thinking: starting a new transaction, to begin with, is not going to do anything special. In the long run, it's going to interact with, you know, a transaction coordinator that can arbitrate between multiple simultaneous potential transaction writers — but we're going to ignore that at first, right? Just to get kind of the baseline in place. Yeah.
E
I think for the initial implementation we should just target single-writer support instead of multi-writer, so there will be no coordination. — Gotcha.
A
Okay — with how you sketched this out, QP: I mean, the transaction commit is when we'd actually be doing a write, but with the add file / remove file, those would be paths of some form that have to go in the transaction log, wouldn't they? Those would be path-like.
A
Would they have to be S3 things? Like, if I had an S3 object — would the low-level API be: I pass a URI in here, like an S3 URI?
E
This is up for discussion. I was—
E
It's a path relative to the table root.
D
Should we handle a case where some other party is trying to leverage the same API and chooses to use absolute paths instead of relative paths? Or would that create more problems than it's worth? But—
E
Anyway, I think when we do the transaction commit — that's when we're going to grab the next transaction id and then try to commit; that's where we do the optimistic write. And then, I think for the first iteration, we will just do it with the local file system, without S3, and then, well—
D
Well, I mean, if I understood the previous conversation with Neville, we don't have support yet to provide a writer that writes to S3.
A
My understanding from what you were proposing last week — when we had chatted a little bit about this privately, QP — was that this API would know nothing about Parquet file writes. This would just be taking, like — when you say add file a.parquet, as an example — that a.parquet file would be somewhere already; like, it would have to already be in the—
A
And the way I'm interpreting that is: the discussion about the native cloud storage providers is sort of moot, in that—
D
So basically — I think I hear what you're saying, Tyler — in this case, the paths might be in S3 that we're adding here to the Delta transaction log within this transaction scope, but it actually doesn't matter at all to the transaction log. We're just updating the transaction log wherever it happens to be.
A
Our simple test would be like — you'd have a Delta table on disk somewhere, and you would go put a Parquet file there, which, from the Delta standpoint, doesn't exist until you go through this transaction and actually add it to the transaction log. Right?
B
Let me find one — there's a PR that's semi-abandoned by someone, where they were implementing the ability to write to a memory buffer. Let me find that one, because I was going to use that as a start and then abstract away the file system after that. I'll post it in the chat now.
A
Saif Islam — and we've got Florian, who's joining us. Welcome, Florian! Hello. So, QP, is there anything else on the — or anybody else? I mean, this topic is open for discussion, but anybody else that wanted to discuss the transaction API?
A
Okay — just a time check for everybody: I think I had scheduled this originally for 30 minutes, so we've got 10 minutes left. We don't have to use the 10 minutes; I wanted to leave the last part of the meeting open to open PRs or bugs or things that we wanted to discuss that might merit a synchronous conversation.
E
One thing that I think is worth working on — but I don't think it's checked yet — is: we have these golden test data sets prepared from Databricks, and I don't think we are passing all of them for reads.
E
I tried a bunch of them, and there are low-hanging fruits — it looks like a lot of them are failing for the same problem, and it seems like a pretty quick fix. So I think it's worth spending some time to get us passing all the tests for the golden data set, and making sure that's part of the CI/CD pipeline.
E
I don't, but I think that's something we should.
A
Yeah — for those of you that may not have seen the pull request: we've got this hard-coded Azure storage account, and we've got a hard-coded delta S3 bucket, both in accounts that I control, and I populated both of those with the golden data set from the connectors repo.
A
I
think
it
is
so
they're
not
they're
not
present
on
disk
for
for
local
development,
but
that's
pretty
easy
to
just
do
a
you
know:
a
get
sub
module
if
we
want
to
go
that
route,
but
they
are,
at
least
in
the
storage
buckets
for
testing.
D
Yeah — I want to—
A
So Neville asked about the medium term — keeping the tests running in Tyler's S3 account. Yes: I actually wrote a blog post about this, which I can share later, but the S3 account that is running those has a maximum budget of five dollars a month. If we somehow ever exceed that budget just storing S3 objects for the golden tests, I'll be very, very surprised. Yeah.
A
So, basically, unless Denny has something official for the Delta project as a whole, I'm perfectly happy to keep hosting these. They won't be disappearing anytime soon.
D
Neville, the file you linked here — just doing a quick peek at it, it's a little different from the API I was expecting. I was thinking there would be, like, a trait that would be passable that would do the write. It's fine that it's not! So basically, what I'm reading here is, in—
B
No — next to the chat and the hand, the hand that you raise, there's a screen icon there where you can share your screen. Well, if it's on my side, it should be the same for others.
D
I was expecting to be able to pass in a trait — or, basically, a struct that implements a trait, right — that would do the write. But it looks like the support that's added by this PR is more in the direction of writing directly to an in-memory buffer, which is fine.
D
I think — because I can write to an in-memory buffer and then handle the file write to S3 myself — but I just wanted to make sure that I was on the same page, because where we were going with our conversations, I thought it was more in the realm of passing in a writer instead. If that makes sense?
B
Yeah,
that
makes
sense
I
can
answer
it.
So
there's
there's
a
bunch
there's
a
couple
of
constraints
that
are
quite
challenging
to
solve,
so
one
of
them
is
that
the
the
one
one
of
the
three
conditions
is
that
you
you
have
to
be.
We
have
to
support
track
clone,
which
makes
it
difficult
for
you
to
to
do
it
arbitrarily.
So
this
pr
sort
of
goes
halfway
there,
because
when
you,
when
you
can
sub
supply,
you
know
just
a
buffer,
a
vec
buffer
the
benefit.
B
There
is
that
in
this
pr,
the
the
triclone
constraint
has
actually
been
abstracted
out
to
moved
out,
making
it
easier.
So
what
I
would
practically
do
here
on
this
pr
is
that
I'm
going
to
take
it
as
it
is,
and
then,
instead
of
passing
in
a
vac,
I'm
going
to
then
pass
a
trait
that
has
a
few
fewer
restrictions
than
what
we
currently
have.
So
this
is
sort
of
the
foundation
to
get
to
the
point
where
we
pass
a
struct.
I'm
sorry,
a
trait
sorry,
something
that
implements
the
right
trait.
B
So
yes,
this
is
when
working
with
this
pr,
it's
not
going
to
be
the
end
of
it,
but
I'm
going
to
take
it
further
so
that
we
can
be
able
to
pass
in
anything
effectively
as
long
as
it
can
write
data.
E
So it doesn't have to be a buffer, right? As long as you pass in a thing that implements the Parquet writer trait — which, after this PR gets merged, will only require the Write and Seek traits — it doesn't really matter what this struct is doing underneath. As long as it implements Write and Seek, this struct can be writing to, you know, memory, or S3, or whatever. Yeah.
A
Okay,
one
thing
I
wanted,
maybe
with
oil
and
qp
here
we've
got
this
open:
pull
request
for
tokyo,
bumping
tokyo
to
1.0,
which
qp-
and
I
already
had
some
discussion
around
this
in
in
the
pull
request.
This
is
pull
request
76
on
delta
rs,
just
so
you
know,
but
this
is
dependent
on
an
aero
aero,
pull
request
to
upgrade
to
tokyo,
1.0
and
I'm
curious
qp
from
your
perspective,
like
how
important
is
it
for
you
or
to
you
that
we
get
to
tokyo,
1.0.
A
Okay — the impression that I got from the Arrow pull request, which I'll drop in our Slack channel, is that we might not see this get merged for a while.
B
Yeah, we will have to wait for that PR. Yeah — I haven't followed the discussion around it. Do you know, QP?
E
There's some discussion about how to do version pinning within the Cargo.toml file. I think that was the main discussion that's remaining, but the PR—
A
What was not clear to me, Neville: there's a comment from alamb — I'll link that directly in the Slack — saying that we should merge this, or that it would be great to merge after 3.0 ships. I don't know enough about the Arrow development timelines, but if Tokio 1.0 doesn't get into Arrow for the next few months, then QP's pull request is probably just going to stagnate while we change delta-rs, because we have a lot of development planned in the next few months.
B
I think Andrew was speaking more from — we sort of had a merge freeze, I can call it that, where we weren't merging any pull requests, but that has been resolved now, because I see that there's a bunch of PRs that have been merged. Typically, once the release maintenance branch has been cut, we can still continue with development as usual. So I'll chime in on the discussion and see what the versioning issue is. I mean, we had a — we had a breaking change in the parquet-format crate, where we sort of upgraded from version 2.6 to 2.7 of the format, and it broke a couple of people's code. So I think that was the concern, but we resolved that. So I'll look into why we're being pedantic about pinning the versions.
A
From a release standpoint, the thing that's important to understand — and this is really for everybody that's contributing — is that we can release new versions of the Python binding, and we currently do; this is what QP cut, I think 0.2.1, over the weekend, with the Rust dependencies pinned to SHAs or branches from git. But in order to release the delta-lake crate — the native Rust binding — we can't have any references to git-based dependencies. Everything has to be a real released dependency on crates.io. So if we wanted to be on Tokio 1.0, for example, we could continue releasing the Python wheel and continue developing delta-rs for some of the downstream projects that we have here at Scribd, but nobody would be able to depend on the native Rust binding from any sort of release standpoint — they would have to point to our git for everything, and to me that's—
B
I think that's fine. So, for Tokio 1.0 — the issues that are there, or the questions about our own pinning — we can resolve that probably by the end of this week and then get that merged in. But it means that we'll then have to really wait for version 4.0 of Arrow.
B
Yeah, I think in general there should be an appetite — if anybody wants to use delta-rs — there should be an appetite to pull from git in the short term. Because, at least from what I've seen, a few relatively big projects that are moving a bit quickly are using Arrow and Parquet from git—
B
—instead of the released version. Okay, because, yeah, the problem is really that, with the other language implementations, there's quite a lot of release work that has to be done; it's not as smooth as what we do with Rust. So, as a result of that, there isn't that much appetite to do releases more frequently. But there are some people—
E
Right, yes.
A
I
just
think
that,
like
I,
I
agree
with
your
viewpoint
much
more
now,
qp
of
let's
keep
these
versions
independent,
okay!
Well,
for
me,
I
don't
know
if
anybody
else
has
topics,
but
for
me
I've,
I've
accomplished
everything
that
I
wanted
with
this
meeting.
C
Thanks for the welcome — my first code with Rust was very complicated on my side, because I'm more comfortable with Java, Scala, and Python, but feel free to share with me details regarding my pull request.
A
Well,
welcome
to
the
party
thank
you
for
the
pull
request
and
yeah
everybody
and
have
a
good
day
have
a
good
good
week
or
so.
The
next
time
this
is
scheduled
is
two
weeks
from
now
at
9am,
pacific
time.
Thank
you
all
for
joining
I'm
going
to
go
ahead
and
end
the
stream.