From YouTube: D3L2: The Genesis of Delta Rust with QP Hou
Description
For this next session of D3L2, we are happy to have a conversation with QP Hou who led the genesis of the Delta Rust project. How did the Delta Rust project start? Why build an open-source data engineering project using Rust and Delta Lake? Learn more about this popular Delta project with QP Hou.
Learn more about Delta Lake: https://delta.io/
QP Hou: https://www.linkedin.com/in/qingpinghou/
Denny Lee: https://www.linkedin.com/in/dennyglee/
Join us on Slack: https://go.delta.io/slack
Delta Rust on GitHub: https://github.com/delta-io/delta-rs
A
Perfect. Hi everybody, I just want to give us a couple of minutes to get the live stream going on both LinkedIn and YouTube. My name is Denny. This is the next session of D3L2, with an old colleague and friend, QP Hou. Like I said, we're just going to give it a couple of minutes to get LinkedIn and YouTube up and running.
A
B
A
Perfect. And then, let's see, you can chime in through Zoom, or in the chat on YouTube or LinkedIn. Eventually I'll be able to speak English today, by the way; it's apparently not my primary language today. OK, it looks like we've got Dan from Georgia. Welcome aboard. Let's see what else here. Oh yep, we have YouTube online and we have LinkedIn live.
A
This is great. George from Vancouver, hey George, how's it going, buddy? All right, let's just start the show. For everybody who's wondering, this is D3L2, because I don't have a very good imagination for what to call this. It's literally a bunch of alliteration: Delta Lake Discussions with Denny Lee, hence D3L2. This is a vidcast and podcast series. For the vidcast, you can obviously ask questions by typing in LinkedIn, YouTube, or Zoom.
A
We're glad to answer questions. We're also going to re-stream this to Spotify as a podcast. With the podcast you obviously can't ask questions, but next time go ahead and join us on the Delta Users Slack, and you can talk to us there. Having said that, why don't we start off with the basics: who's this QP guy? QP, why don't you introduce yourself?
B
Hi, hello everyone, I'm QP. I'm currently leading the core software team here at Neuralink, and my team is mostly responsible for building all the internal software and infrastructure to make sure that Neuralink is the most efficient biotech company in the world. Prior to that, I was a tech lead at Scribd, where I worked on the data and ML infrastructure. During that time I actually worked on a really cool project called delta-rs, and I think that's what started this whole podcast.
A
A
B
Well, actually I majored in math and CS. It was computational math, actually. And before that, my...
A
B
I got into computers mostly because I had a really bad, shitty computer and I wanted to play games that required a lot of memory and disk space, so I spent a lot of time understanding how Windows works and optimizing it. Later, when I got into college, I got really fascinated by the open source community. I think most of my education actually came from just reading open source code, understanding how people do things in the open, and participating in communities.
B
I literally started by contributing translations between [unclear] and English when I was early in college. Then I started working on projects reverse engineering network protocols. At the time when I was in college, the network only worked on Windows, so I actually had to reverse engineer the whole protocol to make a Linux client, so that I could actually run Linux on my desktop. That's how I got into Linux, and then, yeah, a lot of hacking in the community.
B
I think my biggest open source project, well, my first big open source project, was an open source e-reader software called KOReader. Shameless plug. This is a project that I worked on, I think, in my senior year in college, and today I still think this is maybe the most powerful open source reader for devices out there.
B
Basically, I worked with a bunch of community hackers to jailbreak Kindles and get open source reader software running on a Kindle device, so that we could read PDFs and any other formats we wanted. I think at the time it was actually the biggest Lua project on GitHub.
A
So
that
is
so
cool,
so
basically
yeah.
So
basically
it's
just
it's
the
the
hacker
mentality
that.
B
B
Myself, I think I'm the kind of person who actually needs time to recover from talking to people, so talking actually consumes my energy.
A
A
I
know
professional,
let's
be
careful
with
this
wording
here,
okay,
but
but
the
context
was
that,
like
yeah,
when
I
first
started
this
entire
Journey,
which
ultimately
led
me
to
like
doing
advocacy
in
the
first
place.
A
Well,
it
was
exactly
that
I
was
I,
was
I,
I
mean
I,
wasn't
talented
by
any
such
imagination,
but
I
definitely
consider
myself
more
of
a
coder
in
the
first
place
right
and
it
exactly
to
your
point,
like
I
was
like
going
like:
oh
yeah,
if
it's
an
email
where
I
don't
have
to
talk
to
them
in
person
and
I
could
just
address
it
in
documents
and
I'm
like
okay,
that's
easier.
It's
like
as
soon
as
like
I
actually
have
to
show
up
to
something
like
oh
nice,.
B
B
A
Yeah,
exactly
no,
you
practice
it
eventually.
It
becomes
sort
of
second
nature
but
yeah
yeah,
but
most
people
like
it
goes
back
to
like
the
what's
the
phrasing
extroverted
introvert
or.
A
Exactly, so it's definitely one of those little things. OK, so now you're an extroverted introvert who decided to hack some Windows stuff to make it work on Linux and to create an awesome e-reader. This is great. No, seriously, that's awesome: you're fighting the power against Microsoft Windows. I'm a former Microsoft employee, so fair enough, I get it, no worries. But I'm curious, then: what ultimately drew you into data engineering? Again, for the folks who may not be familiar, QP's...
B
Yeah, all of that, because in the e-reader software that we wrote, we actually have a really fancy feature that can reflow PDF pages. PDFs are usually rendered with a fixed page size.
B
So they look really bad on a really small screen like a Kindle. We had an algorithm that reflows the page based on the screen size, so that a PDF paper is actually readable on really small screen devices. You can imagine that the whole reflow algorithm is heavily based on manual heuristics. It was also around the time when deep learning became popular, so we looked at these papers.
B
We thought, OK, we should use deep learning to reflow the page smartly, with more context, so that the model can learn about all the context in the papers and reflow the page better. That's how I got into machine learning, and then I joined a startup and started working on ML infrastructure.
A
B
After that, I joined Scribd, because they were hiring someone to build the ML data platform, and it's also a company that provides a subscription for people to read books. So this was a really good match of my skill set and my passion. That's why I joined Scribd, and that's how I started working on data and ML infrastructure full-time. At the time, we were working on modernizing our data infrastructure.
B
We were running Spark on-prem in our own data center, and we wanted to migrate to the cloud using S3 and Databricks. The first decision we made was to migrate our simple Parquet tables into Delta Lake, because it looked really awesome.
B
I guess at the time it was obvious that I should learn about this technology that we'd be adopting, and I had also started picking up Rust around the same time. So I thought: what's the most efficient way for me to learn both technologies? That is actually to write Delta Lake from scratch in Rust, so I can learn both of them. That's how I started the whole project.
A
B
Actually
closer
to
OCR,
oh.
A
B
A
This is actually fun stuff, but we could have a whole other conversation about OCR. OK, so you're doing OCR. Actually, maybe I'll dig into it a little bit. Were you primarily doing the OCR with deep learning, using TensorFlow, PyTorch, or some other project, and then windowing in order to figure out how to recognize the characters? I'm just curious.
B
So it's not at the character level. The reason that we wanted to do OCR instead of NLP is that we wanted this algorithm to work for any document format. The best way to do that is an abstraction where we render any document format to the screen, get the raw pixels, and then reflow on the raw pixels instead of on the text that we extracted from the documents.
B
B
The model should be designed to understand the high-level structure of the document and be able to break it apart. Once it understands that, it can go further down, character by character, to reflow. But it does not actually understand what's in a character. It only understands, I guess, the structure of the document, cuts it into different pieces, and then reflows.
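The "cut into pieces, then reflow" step can be pictured with a small Rust sketch. This is only an illustration under stated assumptions, not the algorithm QP's project used: the `Segment` type, the `reflow` function, and the greedy packing rule are all hypothetical, and the segmentation itself (the part a model would learn) is taken as given.

```rust
/// A rectangular piece cut out of the rendered page (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Segment {
    width: u32, // pixel width of the piece
}

/// Greedily pack segments, in reading order, into rows no wider than
/// `screen_width`, leaving `gap` pixels between neighbours.
/// Returns, for each output row, the indices of the segments placed on it.
fn reflow(segments: &[Segment], screen_width: u32, gap: u32) -> Vec<Vec<usize>> {
    let mut rows: Vec<Vec<usize>> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut used = 0u32; // pixels already occupied on the current row

    for (i, seg) in segments.iter().enumerate() {
        let needed = if current.is_empty() {
            seg.width
        } else {
            used + gap + seg.width
        };
        if !current.is_empty() && needed > screen_width {
            // The piece does not fit: close the row and start a new one.
            rows.push(std::mem::take(&mut current));
            used = seg.width;
        } else {
            used = needed;
        }
        current.push(i);
    }
    if !current.is_empty() {
        rows.push(current);
    }
    rows
}

fn main() {
    // Five pieces from a wide page, reflowed for a 300-pixel-wide screen.
    let page: Vec<Segment> = [120, 80, 200, 60, 150]
        .iter()
        .map(|&w| Segment { width: w })
        .collect();
    println!("{:?}", reflow(&page, 300, 10));
}
```

Because the sketch works only on widths, it never looks at what a piece contains, which mirrors the point above that the model understands structure rather than characters.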
A
It,
oh
so,
okay,
so
you
are
basically
doing
the
windowing
concept.
So
basically
there's
an
assumption
of
what
the
structure
of
the
document
is
yeah
and
then
you're.
Basically,
in
essence,
building
I'm.
Sorry
I
forgot
what
you
just
said,
but
because
I'm
using
windowing
in
my
head
basically,
but
the
idea
is
that
in
essence
it's
basically
you
you
can
so
that
way
you
can
extract
out
each
in
essence,
each
character
throughout.
A
A
In
between
exactly
yes,
oh
that's
so
cool
yeah,
the
only
reason
I
bring
that
up
is
because
actually
I
was
doing
something
similar,
not
quite
okay,
but
something
similar
when
I
was
actually
at
a
concur
where
we're
actually
trying
to
resign
receipts.
Yeah
yeah
I
can
see
that
right,
so
we're
doing
OCR
receipts
and
we're
basically
looking
at
the
you
know.
For
example,
let's
just
say
you're
talking
about
a
restaurant
receipt
right.
B
A
A
Let's be clear, smarter people than me were working on this. But what was tricky for us was that there was the clear printed text, but we also needed to handle handwritten numbers, because, especially in Western countries, you have to write in the tip and the total. So we had to OCR the tip and the total, which are the handwritten parts.
A
OK, so perfect. You decided that this ML world is completely the world you want to be in, which is great. I completely grok that; in fact, that's part of the reason why I joined Databricks myself, because of my belief in the ML world. But then it seems like you got sidetracked by what many early ML people do, which is: I need to fix the infrastructure first, right?
B
A
B
To be honest, we didn't run into any serious problem. It started more as a learning project, and once I got it working, I saw a path to actually using this project in production. That's when Tyler, Alex, and I, on my Scribd team, started thinking about it.
B
Let's say we use this in production: what will we actually be able to gain? Just from first principles, the first obvious gain is that we can reduce memory and CPU usage by a lot by replacing some of the jobs that we have in production. That's how we started taking this project seriously.
A
Gotcha. So, for example, and this is now a slightly shameless plug for an upcoming D3L2 session in which I interview Tyler, the context is that you're basically trying to figure out ways to save money, right? And you say that if we were able to reduce the amount of CPU and memory used to process all this data, then hey, you've got something to work with. There's a bunch of implications, though, like why are you...
A
A
B
If I remember correctly, I think the primary reason was actually reliability, not saving money. That came as a pretty surprising side effect after we saw that 90% CPU usage reduction.
A
B
I think part of the reason was that, at the time, the Spark streaming cluster we had did not support autoscaling, and it was a pain point for us. On an almost weekly basis, we had to manually come in and scale the cluster up or down to handle traffic spikes.
B
So that was the biggest pain point for us, and it wasn't easy to implement a really reliable autoscaling solution on top of what we had, because it's a really complex piece of software and it's kind of hard to understand the whole thing end to end and customize it. So at the time I thought: what if we actually build this thing from scratch and only implement what we need?
B
We can come up with a much simpler implementation of this system, and then we'll be able to actually autoscale it. The first example is that the whole system is designed to be stateless, so you can spin up as many workers as you want for a particular Kafka topic, and it will automatically spread the load between these workers. That makes it really easy to autoscale: you can set up alerts based on a threshold, and then the alert can trigger the scaling.
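Why stateless workers make autoscaling simple can be sketched in a few lines of Rust. This is a hedged illustration, not how the Scribd service was implemented: in practice Kafka's consumer-group protocol handles the partition assignment, and both functions below, along with the lag-based scaling rule, are assumptions made up for this example.

```rust
/// Spread partitions 0..num_partitions over `num_workers` interchangeable
/// workers, round-robin. Since the workers hold no state, any worker can take
/// any partition, so recomputing the assignment is all that scaling requires.
fn assign_partitions(num_partitions: u32, num_workers: u32) -> Vec<Vec<u32>> {
    assert!(num_workers > 0, "need at least one worker");
    let mut assignment = vec![Vec::new(); num_workers as usize];
    for p in 0..num_partitions {
        assignment[(p % num_workers) as usize].push(p);
    }
    assignment
}

/// Hypothetical threshold rule: one worker per `lag_per_worker` messages of
/// backlog, never fewer than one and never more than the partition count.
fn desired_workers(total_lag: u64, lag_per_worker: u64, num_partitions: u32) -> u32 {
    assert!(lag_per_worker > 0 && num_partitions > 0);
    let wanted = (total_lag + lag_per_worker - 1) / lag_per_worker; // ceiling division
    wanted.clamp(1, num_partitions as u64) as u32
}

fn main() {
    // A traffic spike pushes the backlog to 2,500 messages on a 6-partition topic.
    let workers = desired_workers(2_500, 1_000, 6);
    println!("workers: {}", workers);
    println!("assignment: {:?}", assign_partitions(6, workers));
}
```

The cap at the partition count reflects a property of Kafka itself: within one consumer group, a partition is read by at most one consumer, so extra workers beyond that number would sit idle.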
A
Got it, got it. OK, so this is super interesting because, again, we were just talking about ML and OCR, and then we shifted to the fact that we're trying to spin up infrastructure that autoscales based on a certain threshold; in other words, the number of topics or the throughput of the topics is
A
dictating exactly how much CPU and memory you need and whether you need to spin up more workers, which of course calls for the stateless environment. OK, so let's focus on that reliability perspective. Again, part of the journey is ML and the need to build the infrastructure to process all this data. What were the reliability issues that eventually led you to Delta in this case?
B
So we were actually already using Delta at the time. It was actually a reliability issue with the Spark streaming job that streams from Kafka to Delta tables.
A
Gotcha, gotcha. OK, so Delta was there for you and you were feeling comfortable, but it was basically the autoscaling of streaming that sort of messed you up. OK, fair enough. I apologize to my Databricks stockholders right now. So then that's what led you to Rust. Why not Go or Python? What made you go down the route of building all this stuff in Rust?
B
So at the time I was learning Rust. This is, I guess, the first production Rust code I wrote, and the second serious Rust project I wrote. The first project I worked on was a PDF parser written from scratch for the KOReader project, and then the second one...
B
I had just finished the PDF parser, and I wanted to actually write something I could put into production. At the same time, I needed to learn how this Delta Lake thing works. So that's how I came up with the idea to just write this thing in Rust and learn both concepts at the same time.
B
There was also a technical reason behind that, in the sense that Tyler and I always had this vision that we could build a simple Rust core of the Delta Lake implementation and make it easily accessible to other languages, without requiring all those languages to load the JVM at runtime. So this is where we thought that, you know...
A
OK, this is really cool. I'm going to talk about that in a little bit, but I did want to go backwards a little bit, because you mentioned a couple of times that this is your second serious project, or your first production project. That's great, but I also want to call out that QP is being excessively humble here.
B
A
A
Such a huge impact is because you have extensive knowledge from your journey from ML to infrastructure. You've also been very involved with the Arrow project, the Airflow project, and things of that nature. Basically, with every single system that you care about, you end up getting uber involved, to the point where you're basically a maintainer for these projects, or at least a hardcore contributor. So I did want to call out that you're playing down
A
The
massive
skill
set
that
you
have.
That's
all
I'm
saying
thanks
for
coming
out,
yeah,
that's
all
okay!
So
now
you're
playing
with
rust,
you
like
the
language,
let's,
let's
pause
for
a
second
and
just
talk
about
that
for
a
second
and
incidentally,
that's
the
funny
things
that
I
actually
did
the
same
thing
with
Florian.
We
had
a
d3l2
session
honestly
two
weeks
ago
and
so
yeah,
basically,
by
the
way,
when
I
started
this
Pro
this
project,
this
vidcast
podcast,
I
I
did
not.
A
Exactly
it's
very
contagious
So
for
anybody.
Who's
been
listening
to
the
beginning
for
the
audience
of
three
that
has
listened
to
this
from
the
beginning.
I
apologize,
we're
gonna,
go
Rat,
Hole
again
on
Rust,
so
now
so
what
you
start
off
with
the
statement
like
oh
I,
I
I
was
playing
with
rust.
So
why
don't
we
start
with
that?
Why
rust
like?
Why
did
you
start
playing
with
it?
In
the
first
place,
yeah.
B
I used to write quite a bit of C and C++ code, especially highly concurrent C code. It's really, really hard to get that right: lots of race conditions, lots of memory bugs, and I actually spent a lot of my time debugging and fixing these memory bugs. So when I first saw Rust, the fact that it can guarantee zero memory bugs if the program compiles was insane to me.
B
I never thought this was something that could actually happen. So for me it was a no-brainer that I had to learn this language and see how well it actually works. At first I wasn't really buying into it. I actually tried Rust three times, to be honest, because it was not easy to fight with the compiler. The third time I tried, it worked.
B
I finally understood how the borrow checker, ownership, and lifetimes work. After that I was really productive. I can give an anecdote: I've been working on DataFusion, the Rust query engine, for almost two years now. It's a highly concurrent query engine as well, and during these two years, at least for me,
B
I've never run into a single memory bug or race condition bug. All the bugs I ran into came from my own shady code. So this is literally writing highly concurrent, low-level programs and not ever having to think about race conditions or memory bugs. This is really liberating. This by itself makes me so much more productive.
A
Yeah, so the fact that you don't have to think about garbage collection at all. And what's interesting for me, as I'm learning it myself, is that the concept of ownership isn't obvious. It's also a little confusing at times because integers get copied for you, basically, but strings don't, right? But go ahead, sorry.
B
B
But I think once you get familiar with how the compiler works, it's not a problem anymore. Initially, though, it was really frustrating, like: why can't you just understand that this thing is correct?
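The integer-versus-string confusion mentioned above comes down to Rust's `Copy` and `Clone` traits. A minimal sketch (the helper functions `double` and `shout` are made up for illustration):

```rust
/// Takes an integer by value. `i32` implements `Copy`, so the caller's
/// variable is duplicated implicitly and stays usable afterwards.
fn double(n: i32) -> i32 {
    n * 2
}

/// Takes a `String` by value. `String` owns heap memory and is not `Copy`,
/// so passing it moves ownership into this function.
fn shout(s: String) -> String {
    s.to_uppercase()
}

fn main() {
    let n = 21;
    let d = double(n);
    println!("{} {}", n, d); // `n` is still valid: it was copied

    let name = String::from("delta");
    let loud = shout(name.clone()); // explicit clone: `name` stays valid
    println!("{} {}", name, loud);

    let moved = shout(name); // no clone: `name` is moved and is now unusable
    // println!("{}", name); // uncommenting this is a compile error (use of moved value)
    println!("{}", moved);
}
```

The commented-out line is the kind of rejection that feels frustrating at first: the compiler refuses any use of a value after its ownership has moved elsewhere.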
A
Gotcha. OK, so perfect. The idea is that, in essence, Rust itself provides all these additional guardrails to ensure that, exactly as you called out, you don't run into memory problems and you don't run into race conditions, which is great. So then I'm just curious: did you start Delta Rust before DataFusion, or did you start DataFusion before Delta Rust?
B
So I got into Arrow and DataFusion because of Delta Rust. Delta Lake is based off JSON and Parquet, so we
B
had to read and write Parquet from Rust. The only Parquet Rust implementation at the time was in the Arrow Rust project. The Arrow Rust project implemented both the Arrow format and the Parquet format, so that you can convert data between the two. That's how I got into Arrow and started helping to fix bugs and add missing features that I needed for delta-rs, and that's how I eventually became a committer and PMC member for the Arrow project.
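To make "based off JSON and Parquet" a little more concrete: a Delta table keeps its data in Parquet files and records which files belong to the table in a transaction log of JSON commits, each listing actions such as adding or removing a file. The sketch below replays such a log to find the live files. It is a simplified model, not delta-rs code: the `Action` enum and `replay` function are hypothetical, JSON parsing is skipped, and real commits carry more action types and metadata.

```rust
use std::collections::BTreeSet;

/// Simplified stand-ins for the file-level actions in a Delta commit.
#[derive(Debug, Clone)]
enum Action {
    Add(String),    // a Parquet data file becomes part of the table
    Remove(String), // a Parquet data file is logically deleted
}

/// Replay commits in version order and return the set of live data files.
fn replay(commits: &[Vec<Action>]) -> BTreeSet<String> {
    let mut live = BTreeSet::new();
    for commit in commits {
        for action in commit {
            match action {
                Action::Add(path) => {
                    live.insert(path.clone());
                }
                Action::Remove(path) => {
                    live.remove(path);
                }
            }
        }
    }
    live
}

fn main() {
    let commits = vec![
        // version 0: initial write of two files
        vec![
            Action::Add("part-0.parquet".to_string()),
            Action::Add("part-1.parquet".to_string()),
        ],
        // version 1: rewrite part-0 into part-2
        vec![
            Action::Remove("part-0.parquet".to_string()),
            Action::Add("part-2.parquet".to_string()),
        ],
    ];
    println!("{:?}", replay(&commits));
}
```

A reader of the table only needs the resulting file list plus a Parquet reader, which is why a Rust Parquet implementation was the missing piece.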
A
Which is pretty cool. OK, so let's go backwards a little bit, back to Delta Rust, because that's how you started the whole process. Basically, for you it was just: hey, I love Rust, I love the fact that it's handling memory for me, and I need to build a more reliable system that I can more easily autoscale.
B
A
This can't be right, we're probably missing half the data; that's the only way this works, right? No, I get it, we'll talk about that in a sec. But I'm just curious: what was it like deciding to take on this project? Delta was basically JVM-based and Scala-based, and you've got a bunch of folks at Databricks who were going like...
A
A
Yeah,
but
but
still
like
yeah
yeah,
you
corrected
the
protocol.
Spec
yeah
I
was
saying,
like
you
corrected
it
right,
so
I'm
just
like.
So
what
was
it
like?
Like
how'd,
you
feel
like
what
made
you
decide
that
yeah,
let's
just
go,
do
this.
This
is
this
is
what
we're
gonna
go.
Do
I
mean
obviously
having
Tyler
saying
yes,
let's
go
freaking.
Do
it
because
I'm
pissed
off
about
reliability,
certainly.
A
You
know
I'm
just
curious
like
if
that
was
really
it
or
like
anything
else
that
was
like
sort
of
the
driving
factor
for
you,
yeah.
B
I mean, part of it was also being a Rust fanboy at the time; we geeked out about Rust all the time. The other thing is, I guess it's more of an engineering philosophy for me: I love simple systems and simple implementation designs. In this particular case, when we actually looked at how the Delta spec works and what we were actually doing, moving bytes from Kafka to Delta tables...
B
B
A
B
B
A
Unequivocally, yes, there's no debate on that one. OK, so perfect. You started building Delta Rust and, just like you mentioned before, that's actually what got you working on Arrow and on DataFusion. I want to tell people about these, because not everybody understands what Arrow really is and what DataFusion really is. So, in your own words, since you're a PMC member, why don't you explain to the audience why these technologies are so important?
B
Arrow is a columnar in-memory data format. If you think about columnar formats, another really popular one is Parquet. The difference between Arrow and Parquet is that Parquet is optimized for on-disk storage, where we do a bunch of encodings to reduce the size on disk. For Arrow, the biggest design trade-off is that we trade size for runtime compute efficiency, in the sense that all the Arrow data will...
B
If you pass this Arrow data between different processes or language runtimes, we guarantee that you don't have to do any serialization or deserialization. It's the same bytes: if you pass them to a different process, a different runtime, or a different function, they should be able to read the data as is, and that makes data exchange really efficient. That's one thing. Also, because it's column-based, it makes analytical workloads really efficient as well.
B
When you need to do any kind of aggregation over the data and it's already in a columnar format, your query will run a lot faster. So that's the whole idea of Arrow. And then DataFusion: it's basically a new query engine built from scratch, started by Andy, and the whole idea of that is:
B
We want to build a query engine using pure Rust, based off the Arrow memory format. The query engine itself is optimized for OLAP. That's the idea behind that.
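The point about columnar data and aggregations can be shown with plain Rust vectors. This sketch does not use the Arrow crate and leaves out most of what Arrow specifies (validity bitmaps, standardized buffer layouts, zero-copy sharing); the `Row` and `Columns` types and their field names are invented for the example.

```rust
/// Row-oriented layout: each record's fields sit next to each other.
struct Row {
    user_id: u64,
    pages_read: u32,
}

/// Column-oriented layout: each field is stored in its own contiguous array.
struct Columns {
    user_id: Vec<u64>,
    pages_read: Vec<u32>,
}

/// Pivot rows into columns.
fn to_columns(rows: &[Row]) -> Columns {
    Columns {
        user_id: rows.iter().map(|r| r.user_id).collect(),
        pages_read: rows.iter().map(|r| r.pages_read).collect(),
    }
}

/// Aggregating over rows walks past every `user_id` just to reach `pages_read`.
fn total_pages_rows(rows: &[Row]) -> u64 {
    rows.iter().map(|r| r.pages_read as u64).sum()
}

/// Aggregating over a column scans one dense array and touches nothing else.
fn total_pages_columns(cols: &Columns) -> u64 {
    cols.pages_read.iter().map(|&p| p as u64).sum()
}

fn main() {
    let rows = vec![
        Row { user_id: 1, pages_read: 10 },
        Row { user_id: 2, pages_read: 32 },
    ];
    let cols = to_columns(&rows);
    println!("{} users", cols.user_id.len());
    println!("{} {}", total_pages_rows(&rows), total_pages_columns(&cols));
}
```

Both functions return the same total; the difference is in how much memory each one has to read to get there, and that gap grows with the number of columns a query does not need.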
B
And the story is, I believe Andy started this project because he was writing a book about how query engines work.
B
A
B
A
Like
any
other
open
source
project,
there's
always
a
there's
a
there's,
an
ebb
and
flow
to
it.
So
no
no
I
get
that
no
worries.
I
I,
don't
think
we're
keeping
this
thing
for
a
posterity
right.
So
oh
Dan
just
asked
a
question:
hey
who's
the
person
writing
the
book
on
query
engines
again
and
I
believe
you're
referring
to
Andy
Grove
right.
Yes,
exactly
yeah,
exactly
so
Andy
Grove
Dan,
that's
the
person
you're
looking
for
I'm,
hoping
to
actually
get
him
involved.
A
One
of
these
sessions
as
well,
because
he
he
and
I
have
like
rather
a
bunch
of
side,
slack
conversations
but
I,
don't
think
I
ever
get
it.
Considering
we're
talking
about
each
other's
projects,
all
the
freaking
time
exactly
funny,
as
hell.
B
He actually made the book free, so anyone can download it.
A
A
A
Oh
sorry,
if
I
can
actually
use
a
keyboard
correctly,
this
okay
I'm,
going
to
put
it
directly
into
our
LinkedIn
and
to
our
YouTube
So
Perfect.
This
is
a
small
segue.
We
are
busy
upselling
somebody
else's
book.
This
is
great.
We're.
A
We're
not
sponsored
by
him
so
no
worries.
This
is
basically
because
we
we
think
this
stuff
is
great
and
now
Dan.
Thank
you
very
much
for
helping
us
out
like
this.
Okay,
all
right
Okay,
so
we've
talked
about
how
you
gotten
the
data
Fusion.
Now,
let's
specifically
talk
about
the
metrics
and
the
stats
now.
A
that come out of it. So you built Delta Rust, and you've built a really simple system, and I mean that as a compliment, in the best way possible. You were able to recreate the entire Delta protocol without using any of the Scala or the JVM, nothing that was originally written for Spark. You've built a very simple protocol implementation, and again, you made some feature changes with DataFusion or with PyArrow.
A
Sorry, Arrow; we'll talk about PyArrow later. Cool, now you've got it up and running. What happened that resulted in you seeing this 90% reduction in utilization? I'm just curious, so tell...
A
B
Yeah, so shout out to the team. I was mostly working on the delta-rs implementation, and then there's another team, Christian and Misha, who actually worked on the kafka-delta-ingest project. That project leverages delta-rs: it reads from Kafka and then writes to Delta tables. They were the ones who actually worked on that ingestion service. So we got it into production, and then
B
the cluster. We were using Datadog at the time, and the metrics came back so low that we thought there was probably something wrong with how we emit metrics from Rust. But then we double-checked everything, and everything checked out. So that's how we found out: wow, this is actually a lot more efficient than what we expected. I think this is the power of having a really simple system and simple implementations.
A
Right. To provide context for everybody, QP's talking about Christian and Misha from Scribd. They created the kafka-delta-ingest project; that's the project QP is referring to. What's really cool about this is that you built the system for simplicity and for reliability, and because of the efficiencies around memory, you literally got the performance improvements for free in this case.
A
B
Right, we're only executing the code that's needed to do that job. So from the code execution point of view, it's also a lot more efficient than the previous version.
A
So
so
from
that
standpoint,
then,
like
the
the
the
interesting
context
around
this
is
that
it
seems
like
especially
with
the
ability
for
rust
to
be
that
much
more
memory
efficient.
From
that
standpoint,
it
seems
like
you're
able
to
like
forget
about
the
90
part,
though
obviously
that's
great,
it's
just.
It
seems
like
you're
able
to
be
that
much
more
efficient
with
the
existing
CPU
as
opposed
to
needing
to
distribute
the
problem
out.
Yes,.
B
Exactly. That's also funny, because when we started designing the system, we designed it to be really scalable. We were expecting tens, or maybe 50, workers per topic. What ended up happening is that for most of the topics we only need one worker, and we only had to spin up two to three workers for the larger topics, of which we only have maybe a couple. All of the other topics work with basically a single instance.
A
So Candace, our wonderful Linux Foundation events coordinator, is on the line right now. Don't worry, I'm not making you talk right now, but she's probably heard this story too many times, so I want to apologize to her explicitly here, because yet again I'm bringing up the story of how I used to work with Frank McSherry. He was at Microsoft Research; I was at Microsoft, back in 2007.
A
Yeah, exactly, another time. But the reason I bring this story up, which I love doing, is that Frank McSherry is a really, really smart guy. I just happened to be lucky enough to be in the room, and he and I would have these friendly debates every so often. And by the way, to be clear about how much respect I have for him: he's the CTO of Materialize.
A
OK, so he's a really, really smart guy, and he and I would have these friendly debates. I was basically saying: no, no, the wave of the future is distribution. We don't have to care about CPU utilization or memory utilization anymore. We can be inefficient, it doesn't matter; we're just going to distribute our way out of the problem. And of course that naturally led to why I'm working on Spark. And he's like: no, no, we can be more efficient.
A
"We can be more efficient," which led him to basically doing Rust. If you look at the GitHub repo for Materialize, a lot of the code is written in Rust. Why? Because he's got a streaming database that's extremely efficient. So what's interesting is that, sure, maybe during the time that Spark became the de facto big data engine, I won the debate.
A
But this is me again saying: yeah, I think Frank actually won the debate. Because, to your point exactly, you over-engineered, thinking that you would need all that stuff, and in the end you're...
B
Yeah, not at all. I actually built the Python binding mostly for demo purposes.
B
It's easier to demo using Python than using Rust, because if I show the Rust code to people, they won't really understand it, since the syntax might not be familiar to them. So it's easier for me to show the Python code and show them: hey, you can actually read this without the JVM. So that was how I started.
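Editor's sketch: one way to see why reading a Delta table does not need a JVM is that the transaction log is a directory of newline-delimited JSON commit files, so any language can replay it. The toy below uses only the Python standard library. It is not the deltalake package's API and not how delta-rs is implemented; the helper names (write_commit, active_files, build_demo_table), the file names, and the demo table are all made up for illustration, and checkpoints and protocol checks are ignored.

```python
import json
import tempfile
from pathlib import Path


def write_commit(log_dir, version, actions):
    """Write one commit file: newline-delimited JSON, one action per line."""
    path = log_dir / f"{version:020d}.json"
    path.write_text("\n".join(json.dumps(a) for a in actions) + "\n")


def active_files(table_dir):
    """Replay the JSON commits in version order and return the live data files.

    Simplified on purpose: ignores checkpoints, protocol versions, and metadata.
    """
    live = set()
    log_dir = Path(table_dir) / "_delta_log"
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)


def build_demo_table(root):
    """Create a toy log with hypothetical file names (no real Parquet data)."""
    log_dir = Path(root) / "_delta_log"
    log_dir.mkdir(parents=True)
    # Version 0 adds two files.
    write_commit(log_dir, 0, [
        {"add": {"path": "part-0000.parquet", "size": 100}},
        {"add": {"path": "part-0001.parquet", "size": 200}},
    ])
    # Version 1 removes one file and adds another.
    write_commit(log_dir, 1, [
        {"remove": {"path": "part-0000.parquet"}},
        {"add": {"path": "part-0002.parquet", "size": 150}},
    ])
    return Path(root)


with tempfile.TemporaryDirectory() as tmp:
    demo_files = active_files(build_demo_table(Path(tmp) / "demo_table"))
    print(demo_files)  # ['part-0001.parquet', 'part-0002.parquet']
```

A real reader would also have to handle checkpoints, schema metadata, and the Parquet data files themselves, which is the work the Rust core takes on.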
A
Okay. Oh sorry, by the way, I want to chime in. Dan, I'm really sorry that I can't actually have you on this line, because what you're saying is really interesting, so I'm just going to quote what he says here. Dan's saying: "I've written a number of algorithms and software in C++", and he's rewriting them in Rust on Windows, Mac, and Linux, "and in every single instance the performance is either the same or slightly better, but the memory consumption of Rust is always better, sometimes consuming only 50% of what C++ is consuming." Right?
A
So yeah, Dan, thank you for that call-out. This is why you're hearing us being so, like, I'm not sure I'm a Delta evangelist anymore; I'm pretty sure I'm a Rust evangelist. So yeah, Dan, we completely agree with you. I just wanted to call that out. But let's go back to the Python thing. Okay, you built it for demo purposes only, just so you could show the code base, because it was easier to show it than showing the Rust crates.
A
Okay, so let me see if I've got this straight. This project, which is deltalake, as in pip install deltalake, is on PyPI. Incidentally, I think last month there were about 10.5 million downloads for Delta. Okay, wow. Somewhere between 1.3 and 1.5 million of them were just deltalake, i.e. the Delta Rust Python bindings. Okay.
B
I guess everyone is using my demo work. But yeah, part of that is also that we wanted to prove the point that if you build this thing in Rust, we can share the same core implementation between multiple languages. And building the Python binding, this initial demo, was actually really minimal work, because there's a really good Rust crate that makes it really easy to write Python extensions in Rust. So yeah, that worked.
A
No, that's awesome. I think I'm actually pretty much done with most of my questions. I guess the only thing I had left is more like: where do you see this project going in the future? I mean, you're the one who created it, as in, you had a manager that was a Rust geek too and, like you said, it was one of your first production systems, right?
A
Sorry, I apologize: it was one of the first production systems in Rust. So where do you see it going? What do you think we should be addressing in the community for Delta Rust? And don't limit it just to Delta, by the way; just Rust in general, because I feel that there's a lot of, oh God, I was about to go marketing on you, a lot of synergies between us. So I apologize for that one, but yeah.
B
Yes, so to be fair, I'm not the main person who's driving this project anymore. I'm really happy that this is now an actual community-driven project, where Will and Robert have been doing most of the community work at the moment, and a lot of the implementation work as well. So for me, I still have a couple of personal itches, I guess, that I want to scratch on the Rust project.
B
I guess one of the biggest ones is that, even though the project is actually really efficient in terms of CPU and memory usage, we think we can actually get another 10-fold improvement in memory usage by basically switching the data structure that we use in delta-rs. That is something I really want to see happen, and it would make delta-rs, I guess, literally the most efficient Delta Lake implementation out there.
B
The main idea is this: the current way that delta-rs keeps all the Delta metadata in memory is a row-based format. If we switch to a columnar format, we can reduce the memory overhead by 10-fold or even 100-fold, depending on what metadata we're loading in memory. So that would have a huge impact on runtime efficiency.
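Editor's sketch: the row-based versus columnar point can be illustrated, very loosely, with a standard-library Python toy. It compares one dict per file against one packed 64-bit array per field. The field names, the file count, and the helper names are hypothetical. The measurement reflects CPython object overhead (it assumes CPython's sys.getsizeof), not delta-rs's actual Rust data structures, so it shows the direction of the effect rather than reproducing the 10-fold or 100-fold figures mentioned above.

```python
import sys
from array import array

N_FILES = 10_000  # hypothetical number of data files tracked by one table


def row_based(n):
    """Row layout: one dict per file, so every row pays per-object overhead."""
    return [
        {
            "size": 1_000 + i,
            "modification_time": 1_700_000_000_000 + i,
            "num_records": 100 + i,
        }
        for i in range(n)
    ]


def columnar(n):
    """Columnar layout: one packed array of signed 64-bit integers per field."""
    return {
        "size": array("q", range(1_000, 1_000 + n)),
        "modification_time": array("q", range(1_700_000_000_000, 1_700_000_000_000 + n)),
        "num_records": array("q", range(100, 100 + n)),
    }


def row_bytes(rows):
    """Approximate footprint: the list, each dict, and each integer value."""
    total = sys.getsizeof(rows)
    for row in rows:
        total += sys.getsizeof(row) + sum(sys.getsizeof(v) for v in row.values())
    return total


def columnar_bytes(cols):
    """Approximate footprint: the outer dict plus each packed array buffer."""
    return sys.getsizeof(cols) + sum(sys.getsizeof(col) for col in cols.values())


rows_size = row_bytes(row_based(N_FILES))
cols_size = columnar_bytes(columnar(N_FILES))
print(f"row-based: {rows_size:,} bytes")
print(f"columnar:  {cols_size:,} bytes")
print(f"ratio:     {rows_size / cols_size:.1f}x")
```

The columnar version stores each field as a contiguous buffer of fixed-width values, so the per-row bookkeeping disappears; that is the general reason a columnar layout tends to shrink in-memory metadata.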
A
Really cool to know. Actually, we've got a quick question from Dan. Dan, by the way, make sure you join us on Slack; you've got some great questions and great call-outs, and we'd love to talk to you. He's wondering if, and I'm going to butcher this guy's name, so I apologize, Jon Gjengset's Noria DB has influenced any of your work in Rust so far. That's what Dan's asking.
B
I have not, yeah. I'm not really familiar with this project. I have heard of it, but I've actually not looked into the code. Okay.
A
That's cool. No, no, that's cool that basically a lot of this was done without some of these other influences, which, admittedly, I don't know much about either; I was just curious. Before we end today's session, I did want to go back to the part where you said you're glad it's a community-driven project. You called out both Robert Pack and Will Jones. They...
A
It's one of those days. But yeah, it's one of those things where you've got to be really happy that this project was able to start with you and, for that matter, your company, and you've left and joined a different company, and yet the project is just continuously growing. So I'm just thinking, yeah, you've got to be really happy about that.
B
Yeah, exactly. And also kudos to both of you at math, for doing all this community work. It wouldn't have been there without all of...
B
No, well, actually, I want to call out that you're the one who actually brought delta-rs into the delta-io org, right?
A
We did it together. It's all good, man. This was definitely all of us working together on this stuff. I don't want to play up my responsibilities, especially because you guys were the ones who created it. So no, let's not do that. All right, and.
B
My secret plan is to be able to adopt delta-rs at Neuralink, so I can start working on this full time again. Perfect.
A
Perfect. Well, we'll definitely put this as a side note, but let me know how I can help with that. Yes, for sure. Perfect! Well, you know what, I think we're done for today in terms of questions and answers. But QP, this is great. Like always, I really appreciate you taking the time.
A
I really appreciate everything you've done as the genesis, from the standpoint of even today's session, the genesis of Delta Rust. And yeah, we'll catch up soon, because I need to head down to your neck of the woods anyway. So.