From YouTube: 2021-03-30 delta-rs open development meeting
Description
Tentative agenda:
* Neville to share the 2.6.0 parquet writer updates
* Misha sharing the DynamoDB lock work
* Review rustdoc progress
* Granting committer privileges to Christian
* Surfacing writer stats through arrow/delta-rs
A
All right, welcome everybody to the regular — as in every-other-week — delta-rs open development meeting. For today's tentative agenda we had a whole bunch of topics suggested, so I tried to cut it down a little bit, and if we get through these then we'll go through everything else that we had in the Slack thread. At a high level: I wanted Neville to share the parquet 2.6 writer updates; I wanted Misha to then talk more about the DynamoDB lock work that he's been doing; QP, I wanted to review some of the rustdoc progress to see what we still need to do there; I also wanted to make sure that we discussed granting Christian committer access; and then hopefully we'll have time to talk about surfacing writer stats through arrow — or sorry, through the parquet crate — or up through delta-rs, which is a topic that Christian will lead. But let's go ahead and get started with you, Neville.
B
C
On my side, for the update: I've materially completed the parquet 2.6.0 write support. There's only one item pending in terms of data types that need to be written, and that's the Decimal128 data type. It's dependent on an open PR on the arrow side, which we're still working through, so it's still in a draft state. With that said, it's really just the logical type that's pending; the underlying type is a fixed-length binary type.
C
We can already write that, so it's just semantics — waiting for it to be ticked off, and then we can say we're completely done with that. I've just attached the umbrella Jira in the chat now; I'll also put it on Slack. It covers the final work that I've been doing on the write support — you know, supporting nanosecond timestamps.
C
I think that was the big one for a couple of people. What I've also done in the past week: when trying to write tests, or benchmarks, or even just check what's working and what's not working, it's been very difficult, because you sort of have to write the schema by hand and then generate the data by hand. So I opened the PR, which I mentioned today, to allow us to generate arbitrary random data.
C
You just provide the schema that you want — very useful if you've got deeply nested structures, you know, with a struct, a struct of a list, etc. — and then you supply the number of records you want, and it generates that for you. With that, I'm also creating some benchmarks to look at a couple of to-dos that I'm left with in terms of cleaning up.
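A rough sketch of that idea, assuming the arrow and rand crates; the function name and shape here are illustrative, not the PR's actual API, and only a flat Int32 column is handled (the real generator covers nested structs, lists, and so on):

```rust
use std::sync::Arc;

use arrow::array::Int32Array;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use rand::Rng;

// Build a RecordBatch of random values for a given (flat) schema.
fn random_batch(schema: Arc<Schema>, num_records: usize) -> RecordBatch {
    let mut rng = rand::thread_rng();
    let columns = schema
        .fields()
        .iter()
        .map(|field| match field.data_type() {
            DataType::Int32 => {
                let values: Vec<i32> = (0..num_records).map(|_| rng.gen()).collect();
                Arc::new(Int32Array::from(values)) as arrow::array::ArrayRef
            }
            other => unimplemented!("random data for {:?} not sketched here", other),
        })
        .collect();
    RecordBatch::try_new(schema, columns).expect("schema and columns should match")
}

fn main() {
    let schema = Arc::new(Schema::new(vec![Field::new("value", DataType::Int32, false)]));
    let batch = random_batch(schema, 1_000);
    println!("generated {} rows", batch.num_rows());
}
```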
C
You know, the places where I wrote a vector of booleans instead of a bitmap, just so that I could diagnose things easily without having to worry about bit-manipulation logic. There are a few of those things that I need to clean up; I'll round them up and put them in a single Jira so that I can track them. But that's it from my side.
A
Explained perfectly — well done, thank you! It's fantastic that the support is coming in. I think I can speak to Scribd's specific use case in that I'm pretty sure we don't have any 128-bit decimals floating around, but it's great to know that that work is coming in shortly. Yep.
A
All right, the next topic that we had: Misha, if you could go ahead and lead us through it and share the work that you've been doing on the DynamoDB lock support — and I'll drop this link in the Slack channel as well. Yep.
C
So the pull request is still in draft mode; I just need to figure out integration tests and then the creation of the client within the S3 storage. Other than that, the DynamoDB lock workflow is this: it's ported from the Java implementation, meaning it supports all of the major features such as acquiring the lock, updating, and releasing, with one simple addition from my side.
C
There's a specific scenario where the Java implementation is lacking — and I addressed that in this implementation — which is where the worker that holds a lock dies and then more than one worker tries to acquire this lock. In the Java implementation, when they try to acquire a lock there's a lease duration for which they wait — by default it's 20 seconds — and only then is it able to expire the lock. But there's an issue:
C
if the lock has changed, the wait is not extended by another lease duration. So, for example, if you have more than one worker and both of them wait for 20 seconds, only the faster one will get the lock; the other will fail with a timeout. That shouldn't be the case for us.
C
I extended that logic with a check: if the second worker — the one that does not acquire the lock — sees that the actual lock has changed, i.e. that the record version number has changed, it will then extend its wait by another lease duration. That may cause it to wait for more than 20 seconds in total, but if the faster worker that acquired the lock releases it, then we get the normal workflow anyway.
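A minimal sketch of the acquisition behavior Misha describes, with hypothetical type and field names rather than the draft PR's actual API: keep waiting while the record version number keeps changing, and only take over once a full lease passes with no change.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

// Hypothetical lock record as read back from DynamoDB; field names are
// illustrative, not the actual schema used by the draft PR.
#[derive(Clone)]
struct LockItem {
    record_version_number: String,
    lease_duration: Duration,
}

// Instead of failing after one lease duration, restart the wait whenever the
// record version number changes; only acquire once a full lease elapses with
// no change (the previous holder is then presumed dead).
fn acquire_lock(mut observed: LockItem, fetch_latest: impl Fn() -> LockItem) -> LockItem {
    let mut deadline = Instant::now() + observed.lease_duration;
    loop {
        let latest = fetch_latest();
        if latest.record_version_number != observed.record_version_number {
            // The lock changed hands (or was refreshed): wait another lease period.
            observed = latest;
            deadline = Instant::now() + observed.lease_duration;
        } else if Instant::now() >= deadline {
            // No change for a full lease: take over the lock.
            return observed;
        }
        sleep(Duration::from_millis(500));
    }
}
```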
C
That's the small addition from me, for our use case, compared to the Java implementation.
A
Can I ask — and I think, QP, you've been collaborating on this a little bit — I see in this draft pull request that there's a new feature, like a new Cargo feature for dynamodb, that we'd be adding. I was hoping that maybe one of you could share the reasoning behind adding another feature flag here.
C
Yeah, I can answer that. We have not yet integrated the DynamoDB lock into the S3 storage, but the idea is that not every use case will need the DynamoDB lock — for example, if it's a simple single-worker setup, or they might not be using AWS S3 or something like that, so they won't need a DynamoDB lock at all.
C
So we hide that behind the feature, and only those that need a locking mechanism for a multi-worker environment, and have access to DynamoDB, will then leverage the DynamoDB lock.
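A small sketch of what that gating could look like; the feature name `dynamodb` and the types are illustrative, not necessarily what the draft PR uses (in Cargo.toml the feature would pull in the AWS/DynamoDB dependencies):

```rust
// Only compile the lock integration when the (illustrative) `dynamodb`
// Cargo feature is enabled, so single-writer users never pull in that code.
#[cfg(feature = "dynamodb")]
mod dynamodb_lock {
    /// Stand-in for the lock client that only exists in dynamodb-enabled builds.
    pub struct DynamoDbLockClient;
}

fn main() {
    #[cfg(feature = "dynamodb")]
    let _client = dynamodb_lock::DynamoDbLockClient;

    #[cfg(not(feature = "dynamodb"))]
    println!("built without the dynamodb lock feature");
}
```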
C
Also, recently I had a meeting with Christian on kafka-delta-ingest, and we were talking about how we can benefit from a delta-rs DynamoDB lock: there's an additional metadata field where we can store data — for example, for kafka-delta-ingest we might store a Kafka offset, and for delta-rs we might store the latest delta version.
A
Sorry, Misha, I may have lost the key piece of context: that's a metadata field in the DynamoDB row? Yes — okay.
C
It's also in the Java implementation, but for the sake of simplicity I wasn't including it in delta-rs. We recently figured that it might be beneficial for us, though: for kafka-delta-ingest to store the latest offsets, and for delta-rs you might use it for the latest version. So instead of relying on optimistic concurrency — where a worker tries to create a new delta version in a loop with an atomic rename — it could just start from the latest version.
B
C
Also, just to add: in this DynamoDB lock I've tried to copy everything — the DynamoDB schema, the structures, and the final code — to be similar to the Java implementation, so that in case it ever becomes an actual part of that library upstream, we will be able to migrate easily. And yeah, that's it for me.
D
A
Okay, I trust y'all will be able to find each other to have that discussion. Next up — QP, rustdocs: how are we doing?
E
Yep, this part is easy. We have a work-in-progress PR open for adding rustdocs across our Rust codebase. Florian has already added docstrings for all the Python bindings, and the goal is that once we have docstrings added to all the public interfaces, we will enable the missing-docs lint rule, so that all future code changes require docstrings on any new public interfaces. That way we can make sure our public interfaces stay documented.
E
This follows the same rule that the arrow crate has set up; we're just doing the same thing here. Anyone is welcome to send pull requests to the docstring branch we currently have.
E
I think we still have about 160 public interfaces that we need to add docs for, but once we get the CI to pass on that branch, we'll merge it into main.
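For reference, a minimal sketch of the lint setup being described, in the spirit of what the arrow crate does (exact placement in delta-rs may differ):

```rust
// In the crate root (lib.rs): fail the build whenever a public item lacks docs.
#![deny(missing_docs)]

//! Crate-level documentation is itself required once the lint is on.

/// Every public item now needs a doc comment like this one,
/// otherwise CI rejects the change.
pub fn example_public_function() {}
```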
D
I plan on adding a PR for another big pass on this later. I'd like to get this in as soon as possible, because we're going to keep growing the surface area of code where rustdoc isn't yet required, right? So the sooner the better — I'll take another pass this evening.
A
What I'd recommend for anybody that's working on this — and this goes for anybody that's watching the video — is to also announce in the Slack channel what area you're working on for adding rustdoc, to make sure that two folks don't spend a lot of time adding the same rustdocs. QP — so this is pull request 156 — I'm assuming once this goes green we merge it, and then that's that, right? Yep.
A
I believe the next thing that I wanted to have us discuss: QP suggested that Christian — who is the big letter C in this video — be granted commit privileges to the delta-rs repository. I think Florian, QP, and myself are committers; I don't know if Misha or Neville are committers at the moment. Let me have a look.
E
A
So, QP, you put this up, so I'm assuming you're in favor? Yep.
A
All right, I'm also cool with this. So, Christian, I'm clicking the buttons now — hooray! Great power, great responsibility, yada yada. Yep — so Christian's been granted write access; welcome aboard, thank you for your contributions, etc. I guess we should go on to the next topic. This is actually the last topic that I had said we definitely should get through, and maybe, Neville and Christian, this is y'all: the stats support bubbling up through delta-rs, something or other.
D
Yep, I'm mostly just going to defer to Neville on this, but basically the decision we're trying to reach here is: for deriving the stats to include in each add action for a Delta transaction, where do we want that code to live? Do we want it to live in the parquet crate, the arrow crate, or in kafka-delta-ingest, running compute kernels from arrow? Neville's done some research on this, so I'll hand it off to you.
C
Cool, thank you. So broadly there are two options. The first option is to compute the stats using the arrow compute functionality, so we've got the min and max kernels — I think we only need to calculate minimum and maximum; the null count is already provided by the record batch, and then there's distinct counts, which I think is the only other one, but it's more optional than the others.
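A small sketch of option one using the arrow crate's aggregate kernels on a single column; the integration point into the writer is not shown here.

```rust
use arrow::array::{Array, Int32Array};
use arrow::compute::{max, min};

fn main() {
    // One column of a record batch, standing in for e.g. an "age" column.
    let ages = Int32Array::from(vec![Some(1), Some(15), None, Some(7)]);

    // Vectorized kernels from arrow compute.
    let min_age = min(&ages);           // Some(1)
    let max_age = max(&ages);           // Some(15)
    let null_count = ages.null_count(); // 1 — already tracked by the array

    println!("min={:?} max={:?} nulls={}", min_age, max_age, null_count);
}
```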
C
The second option is to surface the parquet statistics for the file that has just been written — because we compute the stats as part of writing the file — and then use those statistics as part of the metadata to populate the Delta statistics.
C
You know, we've got 32-bit int, 64-bit int, float, double, and then we've got byte array and fixed-length byte array. With the byte arrays, we don't actually have any methods in arrow yet to compute the minimum — you know, if you're looking at an IP address, for example — sorry, not an IP address, a MAC address, as an example — which you'd store as a fixed-length binary byte array, we don't have any functionality to calculate what the minimum and the maximum are, and parquet actually requires those.
C
Now, with the parquet option there's a slight challenge. I've been exploring it in the background — I've been thinking about it, but not actually trying it out. The challenge there is that Christian is writing data to an in-memory buffer and then writing that in-memory buffer to S3. So after you write the file and you close the file, we currently don't have a way of returning the statistics of the file without actually reading the file again.
C
Now, to read the file again: if it were a file on disk it would be fine — you'd instantiate the file reader, just read the metadata, and get the statistics. But with an in-memory file that has just been written with a bunch of data, we actually don't have any functionality to read data from an in-memory source.
C
I added some text on this in the Slack channel just before the talk — so that's option two. On that option, I'm looking at the writer: the close function returns an empty result, but I'm looking at whether we could return the file metadata instead. I can potentially share my screen — let me see if I can do that.
A
D
Yeah — it's mostly for optimization of reads when running queries from Databricks. So Spark SQL queries, or Spark queries that you run from a Databricks notebook against a table that we would write from delta-rs, can leverage these statistics to optimize the query and give better read performance.
C
Yeah, to add on to that: you would have a file that has a bunch of chunks — chunk one, chunk two, etc. — and then, if you've got the statistics for each column — and we also do this in DataFusion — let's say, for example, you're only interested in data where somebody's age is between 18 and 20, and here we know that the maximum is 15.
C
Let's say it's one year to 15 years; then you'd effectively skip this chunk instead of touching it at all. So if we have the — so this is at a parquet — can you confirm you can see my screen?
C
Sorry — this is at a parquet level, so the file would have different chunks. But then, beyond that, what Delta would then do is, for the whole file — if this is one of, let's say, 100 files — we'll then say, okay, we've got 2 to 50 here, and, let's do this quickly, 56 to 90 there — very old people here.
C
What you have with these statistics is that you'd have a minimum of one year and a maximum of 90 for the file. So if you're looking for somebody who's, let's say, 92 years old, then in this case you won't even touch this file at all. So we need those statistics; otherwise, if you don't have them, whatever reader reads what we've written will be forced to go in here and scan to find what it needs. Yeah — great.
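A toy illustration of the file-skipping this enables, with made-up stats matching the example above:

```rust
// With a per-file min/max recorded in the Delta log, a reader can rule files
// out before opening them.
struct FileStats {
    min_age: i32,
    max_age: i32,
}

fn may_contain(stats: &FileStats, lo: i32, hi: i32) -> bool {
    // The query range [lo, hi] can only match rows if it overlaps [min, max].
    stats.max_age >= lo && stats.min_age <= hi
}

fn main() {
    let file = FileStats { min_age: 1, max_age: 90 };
    assert!(may_contain(&file, 18, 20));  // overlap: the file must be read
    assert!(!may_contain(&file, 92, 95)); // no overlap: skipped entirely
    println!("ok");
}
```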
B
C
So this guy here, in this trait — the FileWriter trait, before this change that I'm exploring... I'm just letting the compiler guide me to see what else I need to change; if I get stuck, I get stuck — if I perish, I perish. So this guy here: when you close the file, it returned an empty result. I'm just exploring what happens if we return the actual file metadata, because with the file metadata you'll then have — with the file metadata you'll then be able to see this one.
C
Yes, it's this one — sorry, I keep creating random scratch examples just to try stuff out. With the file metadata, what you'll then be able to do — I'm here. So here's an example of the random data generator thing I was talking about: I generate a large enough record batch — I think here we've got like 20 million records — I write it to the parquet file, and then to get the statistics, after closing it, I need to read back the file. But Christian wouldn't be able to do this currently.
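A rough sketch of the change being explored — having close hand back file metadata instead of an empty result — using illustrative stand-in types rather than the actual parquet crate API:

```rust
// Illustrative stand-ins for per-column statistics and file metadata;
// these are NOT the actual parquet crate types.
struct ColumnStats {
    min: Option<Vec<u8>>,
    max: Option<Vec<u8>>,
    null_count: u64,
}

struct FileMetaData {
    num_rows: i64,
    // One Vec<ColumnStats> per row group.
    row_group_stats: Vec<Vec<ColumnStats>>,
}

trait FileWriter {
    // At the time of this discussion close effectively returns nothing; the
    // exploration is to hand back the metadata so callers writing to an
    // in-memory buffer can build Delta `add` stats without re-reading the file.
    fn close(&mut self) -> Result<FileMetaData, String>;
}
```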
C
E
So when you calculate the stats for a byte array in the parquet crate, are you not using arrow to do that calculation?
C
No, we're not using arrow. It's actually a bit inefficient, because we're doing it on a row-by-row basis — okay, now I want to look for the code — we're doing it on a row-by-row basis. It's one of the things that I want to look at improving, because what happens is, when you write the file, you can provide the stats yourself — that's if you've pre-computed them; it's an option.
C
If you provide nothing, it'll then compute the stats for you on a row-by-row basis. So where we could use arrow is actually to compute the stats for the chunks, or the batch or whatever, and then pass them to the writer so that it doesn't compute the stats itself. But the problem with that is that sometimes there's a bit of a disconnect between the arrow record batch size and the parquet row groups. So you would have — in this instance —
It's!
It's
10
million
records
that
I'm
writing
here,
but
the
because
the
rust
implementation
actually
doesn't
split
the
the
batch
into
you
know
smaller
chunks.
It's
writing
the
whole
thing
in
one.
Go
sorry
I'll
stop
scrolling
in
a
bit
once
I
reach
I'm
going
here,
we
are
so
I've
written
a
481
megabyte
file,
which
is
one
row
group.
So
that's
that's
a
problem
from
a
parallelism
for
a
reader
perspective.
C
So if Spark were to read this file that has been written by Rust, it wouldn't be able to parallelize it, because now you've got only one gigantic row group. So this is related to the other item that I was talking about earlier.
C
But now, going back to the discussion at hand, of whether you could compute the statistics from arrow: the problem there is, if you compute the statistics at the parquet-write level — if you compute the statistics for the whole record batch and the record batch has 10 million records — what happens when parquet writes a chunk of, let's say, a hundred thousand records? You still have to compute the stats for just that hundred thousand. So I'm still exploring that, but hopefully that specifically answers your question, QP.
E
So it sounds like the parquet writer has its own row-group split, regardless of the size of the record batch? Yes. Okay, so I guess the stats that are passed in from outside of the writer aren't really useful in that case.
A
Yes — unless... From a performance standpoint, I kind of don't understand why one would pre-compute the stats at the arrow level, because it's not like you would pre-compute on one node and write on another; you'd basically just be deciding where you want the CPU overhead to be within a single process either way, wouldn't you?
C
Yes and no. If you're able to get the record batch sizes to be equal to the chunk sizes, computing that in arrow is more efficient, because if you're calculating, say, the sum of a column, you're going to vectorize the computation, whereas if you're doing it on a row basis — yeah, okay, you probably only save a few milliseconds, well, depending on how big your data is.
E
So the parquet write stats calculation — it's all row by row; there's no vectorization at all?
C
The Rust library computes row by row. It's actually part of the problem — well, the main performance issue — with the Rust parquet writer and reader. Unlike the C++ implementation, for example, where they rewrote the parquet write and read support from the very low level, we took a more convenient route. Where's that thing... you've got the low-level column
C
writer — you've got the low-level column writer, and this is where the stats are actually computed. So you sort of say the minimum is the minimum of — oh no, no, sorry, this is not it — oh yeah, here it is. So this is where you're computing your stats; we do it row by row, and the reason why we do this is that we use this function of writing the batch —
C
the internal batch — instead of what the C++ implementation did, where they reworked the whole support from scratch. That's why it took them over a year or so to eventually get the write support completed. What we did instead is we're using the low-level column functionality.
C
We materialize the arrow values into, you know, the primitive types, which is a bit inefficient because we're creating an allocation; then we compute the definition and repetition levels — we're creating two more allocations — and then we compute the minimum and maximum if we need them.
C
So in the long term — and this is work I'd like to probably do in, yeah, the next year or so — what we need to do is be able to go from an arrow column without computing the definition and repetition levels separately, and, in the ideal state —
C
we would iterate through the arrow column and compute these things as we go along. For primitive values it's easier, but once you get to lists and structs, where it's deeply nested, it becomes a bit tricky. But this is the sort of direction that we'd like to take, and when we take that direction, even the computing of stats will probably be a bit easier, because we would then be able to say: well, for this —
C
for this column chunk that we're about to write, we only want to write, say, ten thousand of the hundred thousand records in this arrow column. So let's do a zero-copy slice of the arrow column, compute the stats for that quickly, and then pass them here as an option, instead of having to calculate them row by row here.
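A minimal sketch of that slice-then-compute idea using the arrow crate; how the result would be passed to the parquet writer is left out, since that part is still being explored.

```rust
use arrow::array::{Array, Int32Array};
use arrow::compute::{max, min};

fn main() {
    // A full arrow column of 100k values...
    let column = Int32Array::from((0..100_000).collect::<Vec<i32>>());

    // ...and a zero-copy slice covering only the 10k rows that would go into
    // the current parquet column chunk.
    let chunk = column.slice(0, 10_000);
    let chunk = chunk
        .as_any()
        .downcast_ref::<Int32Array>()
        .expect("slice of an Int32Array is still an Int32Array");

    // Vectorized stats for just that slice, which could then be handed to the
    // writer instead of being recomputed row by row.
    println!("min={:?} max={:?}", min(chunk), max(chunk));
}
```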
F
Production — yeah, we'll talk about it in the next meeting, I think, but I'd be glad to share insight on the use cases we've faced using delta-rs in production.