Description
Denny Lee from the Delta Lake project discusses in detail the new Native Delta Lake connector for Presto.
A

Okay, hopefully you're able to see my slides, so just give me a nod if there are any issues. Perfect, thank you very much. So today we're going to talk about building reliable data lakes with Delta Lake, but obviously the star of the show is Presto. So let's go ahead and start the show here.
In case you're wondering who I am and why I've decided to join you all here today: I'm a senior staff developer at Databricks, a long-time Brickster. I've been working with Apache Spark since 0.6 and with Delta Lake since its inception. Previously I was the senior director of data science engineering at SAP Concur, and prior to that I was a principal program manager at Microsoft for Azure Cosmos DB, for Project Isotope (what is now known as Azure HDInsight), and also for SQL Server. So, a little bit of experience when it comes to data. Hopefully we're going to make this a fun session, and I'm glad to chat with you all on all things, well, you know, Delta Lake.

If you're not familiar with what Delta Lake is: it's an open-source storage format that brings ACID transactions to big data workloads on cloud object stores, and it's the key ingredient for building your lakehouse.
Why do I harp on this concept? I want to talk about the evolution of data management from that perspective, especially given that I used to be part of the SQL Server database team. (By the way, there's a little bit of background noise; I'm not sure what happened there. Whoever's not muted, if you can mute yourself, please. Thank you very much.) So, why do we need lakehouses?
Well, let's start by talking about data warehouses, because this is literally my historical context: "Hey, let's go build a data warehouse." I was part of the SQL Server team, like I mentioned before, and I helped build some of the largest SQL Server implementations you could find. Pretty sweet, right? And we said, yes, let's go build these data warehouses; they're purpose-built for BI and reporting. In addition to helping build SQL Server data warehousing itself, I was also part of the SQL Server Analysis Services team, the BI side, which eventually turned into what is now known as Power BI. We were building these super Analysis Services cubes that allowed really fast BI and ad-hoc querying that would come back in seconds. To provide a little context, I was actually one of the first, if not the first, Microsoft employee to present at Hadoop Summit and Hadoop World, where we showcased the creation, in collaboration with Yahoo, of what was at the time the largest Analysis Services cube on the planet. It was 24 terabytes, and the source of that cube was a 2,000-node Hadoop cluster. So, you know, a little bit of data. Pretty cool; this is awesome.
We had the best BI and the best reporting, but there inherently were problems with this. There was no support for video or audio or text. There definitely wasn't any support for data science or machine learning. We had these star schemas that you had to follow, and when you say "limited support for streaming," that's actually us being a little too nice; frankly, there was no support for streaming. And certainly there were closed and proprietary formats: in the case of SQL Server, or in the case of Analysis Services, and this is similar to any other data warehouse, these were specific formats that you had to work with. And of course it was extremely expensive to scale out. To give you some context: while Hadoop itself was designed very well for scaling out, the 24-terabyte cube solution I talked about was a single box that we basically had to maximize to the nth degree. We also had to do hardware cloning in order to be covered if there were any concerns about downtime, and we ended up using (not my choice, by the way) an Oracle RAC server as the staging server, so, the most expensive staging server you'll ever see. A really, really expensive way of doing it.
Obviously not ideal, and that's why we created data lakes. I was part of Project Isotope, which, like I mentioned before, was the precursor to Azure HDInsight. We were a nine-person team that said, hey, let's bring Hadoop into Microsoft. This was during the Ballmer years, so that was a lot of fun when we pulled that one off. And because of that: hey, we got Hadoop, we got it running, so now you can do your data science, now you can do your ML, and you're good to go. And sure, maybe the queries were slow, but if I had terabytes, hundreds of terabytes, or petabytes of data, the query could actually finish, as opposed to if I tried to chuck it into SQL Server, where it would just never finish.
Now, of course, there were standard problems with data lakes: poor BI support, complex setup, poor performance, and unreliable data swamps. I really want to harp on that last point, this idea of unreliable data swamps. You hear about data swamps, data whatever, but the context is that we pitched an idea, and I'm guilty of this by the way, so I want to call myself out on it: the idea that you could solve all of your problems by simply doing schema-on-read. The idea was that, no problem, we were done; you build your data lakes and we magically solve the problem every time. Now, for anybody who is using Presto, or for that matter Spark, Delta Lake, or any of these other systems, we know that statement is not even remotely close to true.
So what we need is somewhere in between, the best of both worlds, and that's what this concept of the lakehouse is really about. The reason why is that people say, "Well, no, let me just have one side of it running warehouses and one side of it running data lakes." BI and reporting would go off the warehouses, and the machine learning, data science, and real-time work would all go off the data lakes, and you're perfectly fine. Except what you run into is basically this really messed-up lambda architecture, where you have to reconcile the data sitting on one side versus the other side.
So this is where we go back to saying: well, then, really what you need is a lakehouse, which is the best of both worlds. I'm saying I can take the transactional consistency that I've got with warehouses, but also the flexibility that I've got with data lakes, so I can handle all the different data domains, whether that's streaming, BI, data science, machine learning, or whatever else. And in order for you to build that lakehouse, this is the context: you start with Delta Lake, which allows us to have that scalable, open, general-purpose transactional data format. That's half of the solution. What's the other half of the solution? Using a high-performance query engine, of course. Today we're going to talk about Presto, but it's applicable to all of the above. And that's the context here: this is how you can go build your lakehouse, by allowing yourself that high-performance query engine on top of a scalable, open, general-purpose transactional data format.
Cool, all right. So then, in order for Delta Lake to work, it's not just about scalable storage for your data; it's also about a scalable transaction log for your metadata, the metadata that defines what's really going on with your data. We're going to talk about that a lot more in a second. For the scalable storage, let's talk about the good old-fashioned cloud object stores: S3, ADLS Gen2, it doesn't matter. Perfect.
Well then, what is a scalable transaction log? In the same folder as your Delta Lake table, in the path of the table, we put an additional folder called _delta_log. Inside that folder is a bunch of JSON files, and then subsequently Parquet files. What is that? It's the Delta transaction log. It is a scalable transaction log: a sequence of metadata files that track all the operations made on the files that make up your table.
Okay, so you store it in cloud storage along with the table, so that it's portable. In other words, if you move the table from one system to another system, as long as you move that folder, everything underneath that table path, the transaction log comes with it. So it's portable. And because it's made up of JSON files, with a Parquet file at every 10th checkpoint, you can read and process the metadata, which tracks all the files, in parallel. That's the key thing: in other words, you're taking advantage of the distributed scale that you already have in storage to scale your transaction log. That's what Delta Lake allows you to do. All right, so it's important that these transaction log commits are done as ordered, atomic commits.
In other words, they either happen or they don't. So, for example, say I'm doing an insertion of new rows of data, and those rows of data are comprised of, let's just say, 001.parquet and 002.parquet. Perfect. In the first entry into the transaction log, 000.json, what happens is that we record the fact that the table is now comprised of 001.parquet and 002.parquet. Perfect. Then we're about to do an update. An update basically says: okay, I'm going to do an update, which ends up deleting a whole bunch of files (because it's a merge as well, let's just say), and what ends up happening is that we generate a new set of rows. Those new rows make up 003.parquet. The table now, at this checkpoint, is comprised of just 003.parquet, because we actually removed 001.parquet and 002.parquet.

The point is that at a particular version, or a particular point in time, the table was comprised of 001.parquet and 002.parquet, and then, at a later time, after the update transactionally completed, it's comprised of 003.parquet. That list of files that makes up the table, in this case 003.parquet, is included in the transaction log itself.
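To make that sequence concrete, here is a minimal sketch in SQL of the two operations just described, with comments showing the commits each one would append to _delta_log. The table, column, and file names are illustrative, not taken from the talk.

```sql
-- Hypothetical table; file names below mirror the narration.
-- Commit 00000000000000000000.json records:
--   add 001.parquet, add 002.parquet
INSERT INTO events VALUES (1, 'click'), (2, 'view');

-- The update rewrites the affected files rather than editing them
-- in place. Commit 00000000000000000001.json records:
--   remove 001.parquet, remove 002.parquet, add 003.parquet
UPDATE events SET action = 'tap' WHERE action = 'click';

-- A reader now resolves the table to just 003.parquet by replaying
-- the log, never by listing the storage bucket.
SELECT * FROM events;
```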
So when any system like Presto is attempting to read what makes up that table, as opposed to doing a file listing of your cloud object store, which can be extremely slow (the listing of objects on a cloud object store is extremely slow), what it does instead is say: let me just go to the transaction log, where that list of files is already there. In this case, 001.json contains the list of files. In this example it's only one file, but the point is that for any large-scale system it contains the list of files, and then Presto can go ahead and send its workers to read all those different files.
Okay, so that's the context, and that's why it's so important for us to have these transaction log commits: they allow us to have consistent snapshots. Either the reader, if the query from Presto occurred before the update completed, knows it's supposed to read 001.parquet and 002.parquet; or, if another query occurs a millisecond later, at a point in time where the update had completed, then for that second reader it'll say: oh, I'll read only 003.parquet. In other words, no dirty reads. There's nothing in between: when you query the table, you know exactly which transaction it refers to at that point in time, and then you'll know exactly which files make up the table at that point in time. Okay. Okay, sorry!
Because it's so important for us to provide ACID transactions, the whole context is that we do it via mutual exclusion on the log commits. Basically, if concurrent writers are trying to write to the same table at the exact same point in time, they have to agree on what the order of those changes is. Now, we typically follow optimistic concurrency control, meaning that most of the time we're hopeful that these writes will in fact not interfere. That's the idea of optimistic concurrency control.
Why do we allow for that? For example, if there's one writer that's trying to write to one partition, but another writer that's trying to write to another partition, under optimistic concurrency control that's perfectly fine: they're not actually interfering with each other. They're writing to the same table, but they're writing to different partitions, so that's not that big of a deal.

But what about when they're both trying to write to the exact same partition? Even under optimistic concurrency control, one will fail to complete, because the other one actually takes precedence. This is what we mean when we say the concurrent writers need to agree on the order of those changes. So, for example, writer one goes ahead and writes the initial transaction log commit, 000.json. Writer two, no problem, says, hey, I'll do my insertions, and they go to 001.json. So far so good. But what if they're both trying to do updates or insertions, or whatever else, to the same partition of the same table? Well then, what happens? Writer two and writer one are going to fight with each other over which one gets to commit, and in this specific example only writer two wins; writer one would fail. Depending on the situation (and this is included within the Delta transaction protocol), the client will then get to auto-retry and see if it works. Different scenarios play out differently: if, for the sake of argument, you're trying to do an update and the underlying data has changed, the reality is that you most likely will have to fail and allow the user to recognize, hey, the data is different since when you ran it; you might need to do something different now. Versus, if it's just an insertion, it probably would not interfere at all; it'll just automatically retry and do the insertion. Okay, so far so good, everybody?
Cool, all right. So, because we rely on these cloud object stores, or, for that matter, HDFS, we're leveraging the scale of these storage systems. The whole premise is that, even as we're doing all of these things, there isn't a single point of failure. That's the whole purpose of working with S3 or ADLS Gen2 or GCS: it's scalable infrastructure, so there is no single point of failure. It's not like a single node's disk that could fail; you're writing to cloud object stores. Now, in terms of storage support for concurrent writes: S3 has always allowed concurrent writes with Delta Lake, by the way, but through a single driver. For multiple drivers, HDFS, Azure, and GCS have the mutex, specifically put-if-absent consistency guarantees, out of the box, so they never had that problem.
This is a problem for any system that's running on S3. The way most systems have solved it, including how Delta solved it, is that we use DynamoDB as a lock store (not log store; lock store). In other words, it determines who gets the lock on the transaction log at that point in time, so that we can ensure the commits are done in the proper order. We're not sure when put-if-absent guarantees are going to be added to S3; we've had discussions with them for quite some time on this one. But nevertheless, included when we released Delta Lake 1.2 (was it last month or the month before?) is the DynamoDB-based S3 multi-cluster writes support, and we actually recently blogged about that as well. So, in the end, this allows you to not have a single point of failure for any of these systems.
Now, what makes Delta Lake unique? I'm not going to dive too deep; I'm going to keep it relatively high-level, because I don't want this to be just a marketing pitch. I want to show you a demo and dive into the details. But like I said before, Delta Lake's key features include ACID transactions: it protects your data with the strongest level of isolation. Because of its design, it handles scalable metadata: petabyte scale of data, with the excessively large sizes of metadata that often go with it. With time travel, you're able to access and revert to earlier versions of your data for audits, rollbacks, or reproduction, and this inherently also gives you an audit history to go with it as well.
We can ensure that either the write happened or the write did not happen; it's atomic. And so it allows us to run batch and streaming concurrent writers and readers all at the same time without running into too much trouble, which is pretty sweet. In the process of doing that, we do support schema evolution and enforcement. In other words, if you have a table and it says it is comprised of these five columns, we can enforce that: if another set of files, or a set of data, being inserted into the table decides to add a sixth column, we can enforce the schema and say, nope, you're not allowed to make changes to that table. Yet, at the same time, you can tell the table during that query: no, no, we're actually going to allow the schema to evolve. So, no problem, go ahead and do that.
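As a rough illustration of that enforcement-versus-evolution choice, here is a hedged Spark SQL sketch (writes go through a writer such as Spark; the Presto connector discussed here is a reader). The table and column names are made up, and the config shown is the Delta Lake setting for automatic schema evolution on MERGE.

```sql
-- Schema enforcement: the table has five columns, so a write that
-- smuggles in a sixth column is rejected by default.
INSERT INTO customers
SELECT id, name, city, state, zip, loyalty_tier  -- extra 6th column
FROM staging_customers;
-- => fails with a schema mismatch / too-many-columns error

-- Schema evolution: opt in explicitly, and the table's schema grows
-- to include the new column as part of the write.
SET spark.databricks.delta.schema.autoMerge.enabled = true;
MERGE INTO customers t
USING staging_customers s ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```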
Other things include constraints and generated columns, so you can go ahead and partition using a generated column. For example, one of the most common approaches: say your data comes in with a timestamp, but you really want to partition not by the timestamp but maybe by the day, you know, a day, week, month type of scenario. Well then, you create a generated column to do that instead.
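Here is a hedged sketch of that timestamp-to-date pattern using Delta Lake's SQL syntax for generated columns (again on the writer side; all names are illustrative):

```sql
-- The raw event time arrives as a timestamp, but the table is
-- partitioned by a date column that Delta derives automatically.
CREATE TABLE events (
  event_id   BIGINT,
  event_time TIMESTAMP,
  event_date DATE GENERATED ALWAYS AS (CAST(event_time AS DATE))
)
USING DELTA
PARTITIONED BY (event_date);

-- Writers only supply event_time; event_date is filled in on write.
INSERT INTO events (event_id, event_time)
VALUES (1, TIMESTAMP '2022-05-01 09:30:00');
```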
And then the DML operations are applicable to Scala, Java, SQL, whatever, and they include MERGE, UPDATE, DELETE, and all these other things. So Delta Lake gives you all of those capabilities. Within the context of Presto, what we've done is we started with the Delta Standalone project, which allows you to automatically read the metadata that makes up the transaction log. Subsequently, Presto is able to read that metadata from the transaction log, and then its workers are able to go read directly from those Delta tables, natively. In the past we actually had to use a manifest file, which was great (being a little sarcastic, honestly) if you were okay with regenerating that manifest on an hourly basis, or on some very slow cadence. But if you are dealing with streaming and batch all running at the exact same time, that's actually problematic, to put it lightly. Okay, thank you, perfect.
All right, so, Delta Lake. This slide is fast-paced information; do not worry, I'm not going to ask you to read it. But the point is that we open-sourced Delta Lake at 0.1 back in 2019. In December 2021 we released 1.1, and we recently released 1.2, so we're continuously adding more and more to Delta Lake as we speak. Like I said, just released with Delta 1.2, maybe a month ago now: data skipping with column stats is now included.
The S3 multi-cluster writes that I referred to before are included, along with compaction of small files with the OPTIMIZE command, restoring a previous version using the RESTORE command, and renaming columns. More is also being added to the roadmap, and this is what we're targeting for the next version of Delta Lake, around the end of June, because we have the Data + AI Summit coming up, and hey, a bunch of the Presto folks are going to be at Data + AI Summit too. So please do join us there, whether you want to join us physically in San Francisco or virtually; I'll have a slide at the tail end of this which actually gives you a promo code as well. Included is not just OPTIMIZE, but now OPTIMIZE with Z-ORDER as well, so we're adding that capability too; that's being targeted for around Q3, I believe. And we're also including generated change data feeds as well as dropping columns.
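For reference, a minimal sketch of those maintenance commands in Delta Lake SQL. At the time of this talk, OPTIMIZE and RESTORE shipped in 1.2 while the ZORDER clause was still on the roadmap, so treat the second statement as targeted syntax rather than settled API; the table name, column, and version number are illustrative.

```sql
-- Compact many small files into fewer large ones (Delta 1.2+).
OPTIMIZE events;

-- Compact and co-locate rows by a commonly filtered column, so that
-- data skipping can prune more files (roadmap item at the time).
OPTIMIZE events ZORDER BY (event_date);

-- Roll the table back to an earlier version if a bad write lands.
RESTORE TABLE events TO VERSION AS OF 12;
```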
So we have a lot of really cool things that are still in the woodworks for us, and hopefully you're able to go ahead and dive into those details. And so, let's see. Oh, okay, before I go into the demos, I just want to say: look, Delta Lake covers many different cloud platforms, with many different APIs and languages, with many different query and SQL engines, and many different ETL and streaming engines. So it is a system that has a very broad integration and connector ecosystem.
That's really cool for you to work with. So, by all means, if you have any questions about any of these things, you're more than welcome to join us. We also typically have Delta Lake community office hours every two weeks; in fact, the next one's on the third, I believe. That way the community can definitely ask us any questions. And so, what are some of the key native connectors?
The one we're going to show right now is the Presto one, which includes a Delta reader that was first included in PrestoDB 0.269, which is pretty cool. All right, so, let's see, let me switch to demo mode, and then I'll probably jump a little back and forth. Okay, so, all right, you're all familiar with this view, of course. And in this case, thank you, Rohan, for letting me leverage Ahana to run the queries against; it allowed me to make my life a little bit easier.
All right, perfect. So here we go; everybody's familiar with good old-fashioned Presto. I've already connected to that environment that Rohan set up, and you'll notice the catalogs here: basically I've got multiple schemas that I can work with. I'm going to specifically query the available schemas within my Delta Lake catalog. So far so good. And by the way, if you're wondering, I'm doing a little bit of switching back and forth, so apologies for that, but I figured it was easier. I've got a very large screen, so I don't want to share my full screen, because then usually nobody can see anything.
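The statements behind this part of the demo look roughly like the following Presto SQL. The catalog and schema names are assumptions based on the narration, not the exact ones on screen; the catalog name depends on how the Delta connector's catalog properties file is named.

```sql
-- Browse what the Delta connector exposes.
SHOW SCHEMAS FROM delta;
SHOW TABLES FROM delta.tpch;

-- Peek at a table's columns before querying it.
DESCRIBE delta.tpch.customer;
```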
But basically, if you go back to this here, you can see that the queries came through perfectly fine in my Presto cluster right here. So, no problem. I'm going to go back to my terminal here; there we go, boom. All right, so what's next? Let's show the tables. And as I'm running these queries, by the way, you're more than welcome to ask any questions; again, you can unmute yourself or you can put them in chat. I'm going to specifically take a look at the customer Parquet table, so let's see what's inside there. Okay, so let's definitely take a look at what's inside this table. All right, I think it's running a little slow today, but only because I think Rohan and I probably set it up using spot instances, so we brought that one on ourselves, didn't we?
C

So, Danny, I have a question about the querying part, actually. What I want to ask is: suppose Presto is trying to query a Delta table. Is it only reading the latest JSON file from the transaction log, or does it read all the files from the transaction log?
A

Oh, that's a great question. So, in the end, what it typically does is put all the transaction log state in memory, but what it reads is the Parquet checkpoint files plus up to the last nine JSON files. That also reflects the fact that when you get to the tenth commit, it becomes a Parquet checkpoint file again. The reason it does that is so that it has the full historical context of what the table is. Now, saying that: if you only query the last JSON file, then the list of the files that make up that table is contained inside it. So, exactly to your point, you can theoretically just do that; that's actually how we generate the manifest files themselves, initially. And so, take what I'm running right now.
This query, for example, this query here, where I'm querying the table and just looking at the results: what it's basically doing is looking at the last JSON file to get what the table is comprised of, and then Presto goes ahead and queries the files that make up that table.
Correct. So, basically, when we make up the transaction log, there are multiple JSON files that make up the transaction log, but at the tenth file a checkpoint kicks in. That checkpoint basically says: okay, take the previous 10 JSON files and make them into a Parquet file. That way, if your cluster crashes, as opposed to trying to iterate through all those JSON files, it can just iterate through the Parquet files. That's the context. Makes sense?
C
A
Honestly, it was relatively arbitrary. I mean, it was basically the fact that we knew we didn't want to have that many JSON files, because it would slow down the reading of the metadata; it would slow down the reading of the transaction log. That's the reason why you'd definitely want to be able to checkpoint like that. So we just arbitrarily chose 10. That's really it.
That's correct. So, specifically, there's a command called VACUUM that you can execute that basically clears out any old files, whether that's data files (i.e., Parquet data files) or your JSON log files. The idea is that by default, when you run a vacuum on its own, it will clear out anything that's older than seven days for data and anything that's older than 30 days for logs. Now, the key important thing: it's not removing data the current table still needs. If you've got data from, like, five years ago that's inside there and still part of the table, it's not removing anything.
That's the reason why you want to go ahead and clear that out. Now, there are plenty of people that actually leave things accumulating for an extended period of time. In batch scenarios, I think that's perfectly fine, but in streaming scenarios I would definitely not advise that; I would definitely advise clearing these things out. Yeah, makes sense.
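A minimal sketch of that retention behavior in Delta Lake SQL, run from the writer side (the table name is illustrative; note that retaining less than the default requires disabling Delta's retention safety check):

```sql
-- Default: remove files no longer referenced by the current version
-- of the table that are older than the 7-day retention window.
VACUUM events;

-- Preview what would be deleted without deleting anything.
VACUUM events DRY RUN;

-- Aggressive cleanup for high-churn streaming tables: keep only 48
-- hours of history. This also limits how far back time travel works,
-- and requires spark.databricks.delta.retentionDurationCheck.enabled
-- to be set to false.
VACUUM events RETAIN 48 HOURS;
```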
No problem. So, for example, I'm going to go ahead and run this query, and I'm also going to answer Dennis's (or Denis's) question here. Okay, so I'm going to run a quick query just to get the count of the table right now. And to your point: yeah, basically right now it's going to the most recent (the last) JSON file to determine what files make up the table, and then it basically goes and queries and comes back with the results.
So it is currently planning and running right now, and, like I said, it's probably a little slow today, so my apologies for that. Saying that: Dennis (or Denis), you've asked the question, "As far as you know, views are not supported by the Delta catalog. Are there any plans to support views?"
So, one of the things I would ask you to do, though, is to go ahead and create a GitHub issue on the Delta GitHub, so that people can vote on it. My take on stuff like this is that a lot of the asks, we prioritize based on what the community feedback is. Right now there have been massive asks for things like dynamic partition overwrites, S3 multi-cluster writes, the change data feed capabilities, and so forth and so forth. So we've been focusing more on those. It's less of a "we don't want to do it" and much more of a "what is the community asking for"; that's what we, together, are working on.

Now, just to finish up this query here, I did want to call out that, basically, when I query the customer table, it currently has about, that's right, 2.6 million rows inside. Okay. So, excuse me, no problem at all.
But what happens if I go ahead and query an earlier version of this table? Well, all I have to do... oh, of course, when I run it, that's when it fails. Duh. This is what happens when you start doing live queries. There you go. So in this case I'm taking the same query that I just specified, but I'm now adding this @v1: this tells it the version of the table.
You can actually also specify the timestamp, but it's easier to write when I just use the version number, so that's what I'm going to do. The context, basically, is that by adding that @ sign I can choose the version of the table. There are multiple versions of this table, so right now it's querying not the current version of the table but the second version, because v0 is the first.
So the second version, v1 of that table, is actually being queried right now. This will come back with a much higher number, because what happened is that in between version one and the current version of the table, we basically deleted a bunch of rows from that table. And so, back to your initial point about the vacuuming: what happens if version one of this table is really, really old? Like, say it's older than seven days. It is possible that when I run the vacuum, I'll delete the files that allow me to calculate the fact that there are 15 million rows inside that table at that version, because we no longer need them.
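Here is roughly what those time-travel queries look like in Presto, following the @ syntax demonstrated here. The schema and table names echo the demo's customer table but are assumptions, and the exact identifier quoting may vary by connector version.

```sql
-- Current version of the table: about 2.6 million rows in the demo.
SELECT count(*) FROM delta.tpch.customer;

-- The same table as of version 1 (v0 is the table's creation),
-- which still counts the rows deleted by later versions.
SELECT count(*) FROM delta.tpch."customer@v1";

-- Any other retained version works the same way.
SELECT count(*) FROM delta.tpch."customer@v5";
```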
So, if you want to keep specific versions reachable, there are definitely other options in play, but the idea is that if you don't want that table's history to grow too much, because you don't care about the old history, then you can run VACUUM to basically clear it up. So, as you see, that's pretty cool. Now, in this case, because I've got multiple versions of the table sitting in here, I can basically just specify, for example, version five.
And I can see the number of rows that's inside there as well. So that's the ability to go ahead and look at the different versions of the table, and clearing out the versions you no longer need will basically make things a lot more performant as well. Oh, and yeah, to follow up on the question about the Delta or Presto repository with respect to views: part of the work will definitely be in the Presto repository, but part of it is actually in the Delta repository.
So, remember, the reason why I'm saying part of it is Presto and part of it is Delta is this. What Presto normally does is say: hey, let me look at the catalog, and the catalog will tell me the initial location of the table. I specify the table name and it tells me the base location. That's great. But it also often tells me other pieces of information, and in the case of Delta, we don't want that. We only want you to specify the table name; that gives the table path, and then, subsequently, Presto uses the Delta Standalone project, the Delta Standalone reader, to say: okay, I have the name, it tells me the path; now, with the path, go get that information. And what it's doing is reading the transaction log to get that information. So that's why I'm saying that part of the work when it comes to views will definitely be on the Presto side, in terms of saying: hey, Presto, here's a view, and the view is comprised of, what, a bunch of SELECT statements against, let's just say, a table. But then that table itself needs to be translated into "how do I query that" from the standpoint of Delta, which is, for example, working off the query path.

To use a very explicit way of showcasing this idea: instead of specifying the table name, I can literally specify the path. So, instead of going SELECT * FROM the Delta catalog's TPC-H SF100 table by name, I'm saying: no, no, just give me delta, then "$path$" (you actually specify it like this, dollar path dollar), then the S3 path. So this is actually an S3 path with the files that are inside it. Now, this is an extremely small table, so you can just query it directly from the path. And this path that you're seeing here, in essence, is what's actually inside the Presto catalog. So that way, when we specify the Delta table: got it, simply give me the initial path, then tell the Delta Standalone project to go get the metadata from there, because the metadata would be under that path, in the sample table's _delta_log folder. That's the context.

So, in the end, this is the quick callout I wanted to show you about how cool the Presto Delta native reader is: you're able to make sense of the table, query it, and run natively, without actually generating and reading a manifest file.
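A hedged sketch of the path-addressed form described above; the bucket and folder names are placeholders, and the $path$ spelling follows the narration.

```sql
-- Query a Delta table through its registered catalog name...
SELECT * FROM delta.tpch.customer;

-- ...or point the connector straight at the table's storage path.
-- The connector locates <path>/_delta_log and reads the file list
-- from the transaction log, with no manifest involved.
SELECT * FROM delta."$path$"."s3://my-bucket/sample_table";
```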
Okay. And Rohan asked a wonderful question. Let me go ahead and switch back to my slides, because I did have a couple more things to show before I go back. Oh, and by the way, here it is again: thank you again, Rohan, for the Ahana cluster that allows me to showcase all the queries that we're running and all the active workers and so forth and so forth. And the question: is there a survey coming up for 2022 for users to request new features or provide feature feedback?
Yes, we will be having a survey; we're probably targeting around August or September for the next Delta Lake survey. You'll get free t-shirts, by the way, free swag, when you go ahead and fill out the survey. But between now and then, like I say, you're more than welcome to go ahead and ping us in the GitHub issues and get all that information from there.
Okay. Seeing that, I did want to finish up by talking about the incredible scale of Delta Lake. We're talking about more than 450 petabytes of data processed a day on Databricks alone; 75% of the data scanned is all on Delta; and there are more than 5,000 companies running Delta Lake in production, which is pretty cool.
Oh, sorry, there we go. So, how do you engage with us? Just like I've been implying and calling out before: you can go to delta.io; we have the Delta Users Slack, the Delta Lake YouTube channel, and the Delta Users Google Group; definitely ping us in the Delta Lake GitHub issues; and there's the Delta Lake LinkedIn and the Data + AI online meetup.
So there are lots of ways to engage with us, and that didn't even include our Stack Overflow tag as well. And then, like I said, we have community office hours, or AMAs, every two weeks. For example, this one that I just posted here is from February 17th. Why? Because I wanted to call out a cool appearance by Apple's Dominique Brzezinski, who has been involved with Delta Lake since its inception. In fact, there's a good, cool story about him and Michael Armbrust getting together during Spark Summit 2017, specifically to go work on what ended up becoming Project Tahoe, which itself became Delta Lake. So, like I said, that's how to engage with Delta Lake; go make use of this stuff.
The key thing I did want to call out is that, not just about Delta Lake, there are plenty of sessions from Presto, from our friends there, as well at Data + AI Summit, which is at the end of June, June 27th through the 30th. It is a hybrid format, so you're more than welcome to join us virtually, or, if you're in San Francisco, you're more than welcome to go ahead and join us physically there in San Francisco.
But if you're going to join us physically, sign up soon, because tickets are actually running out. Saying that, I do have a code here, this D-A-I-S one, for 25% off the conference pass and of training as well. So, of course, I should spell better, but nevertheless, that's the context. So, that's it for me today.
B
Awesome, Denny, what a great presentation and demo; appreciate it. Thank you so much. I don't know if somebody's raising their hand because they have a question; if you do, you can unmute yourself, or you can put it into the chat and then we can ask it.
D
Yeah, so this is Surya; a couple of questions here. So, right now we are using the open-source version of Delta Lake, and what we're trying to do is get incremental data from Kafka, and then we are merging that data into Delta Lake, which has a large number of records, right? So...
C
D
What we found out was that initially, when we start the process, when there isn't much Delta log, the merge process takes about 12 seconds or something, and later on, when the Delta logs increase, the merge process keeps on increasing: the 12 seconds will turn into 60 seconds after two weeks of time, right? Sure, yeah. So we are also running this compaction process, where we merge these smaller files into larger files, and, you know, we also run the vacuum once every day.
D
Is there something else we can do, you know, to optimize this? We also have partitioning on the Delta lake as well.
A
Okay, so, without rat-holing, let me provide... I'm actually going to answer your question, but I want to give you the shorter version, because the longer version is going to take up much more time. Basically, first of all: definitely do chime in on the Delta Users Slack. All of us are actually there answering questions exactly like the one that you've asked. Now, specifically, because you're running a merge as you're writing into it:
The fact is, it really depends on whether the merge is actually looking at all of the historical data or only at the current data. If you're looking at all of the historical data, candidly, there isn't much you can do except increase the number of workers involved, in order to be able to handle the load that's coming in, because of the fact that you're going back to historical data and making changes, whether that's an update, a delete, or whatever else is included with that. Now, saying that: are there ways to, for the sake of argument, increase the number of workers even if you're not increasing the number of nodes, or to partition slightly differently? These are all definitely in play and can help things out. The other thing to note as you're debugging is that you want to figure out how often the merge is happening.
If the merge is happening, for the sake of argument, specifically within the context of the current day's data, and there actually isn't that much history involved, what I'm inclined to say is that you probably want to run the vacuum more often than just once a day. In those typical situations, you probably even want to set up a second cluster; it's small, probably even just a single node, honestly, which will simply go ahead and vacuum up the files to decrease the number of both log files and data files, thus reducing the sizes of everything, such that your merge can run faster. Okay, so, like I said, that was the short version; there's a longer version where we'd probably really need to understand how to debug your scenario a little bit better.
So it's not that... like I said, I do want to answer a couple of other questions, but by the same token: definitely make use of the Delta Users Slack, because those types of questions often require us to dig into the details, and it'll be a lot easier for us to dive into those scenarios there. If that's okay with you? Yeah.
B
A
Oh, there we go. Okay, no problem. And Dennis (Denis) asks: after adding Z-order support to Delta Standalone, will it work for the Delta connector? Yes... no, it will not, not on its own. How do you make use of the column stats to determine which files are to be scanned, which is basically what Z-order is built around? That's what you'd need to work with. So this is definitely, again, a two-parter: there are parts of it that we can definitely do in the Standalone, but there are parts of it that we actually need to go do with the Presto community.
B
Hey, just out of curiosity, while we wait for another question to come in: who's using Presto with Delta Lake today? If you want to just do a quick virtual hand raise, I'd be interested to see how many folks today are using it, or may be thinking about using it. Surya, we know you are, okay. Cool; Dennis too, awesome. Okay, good, it's good to see, and hopefully, after this, even more.
B
A
Definitely, yeah. Like I said, definitely ping us on the Delta Users Slack; I just dropped it in there. If you have any other questions: we love working with the Presto community, and we hope to see more of you involved, so that we can go ahead and work on more features together, faster. So, absolutely.
B
All right, well, I think that's it; it looks like we don't have any more questions. So I will reiterate: two great events coming up. We have the Data + AI Summit, with a ton of great sessions around Delta Lake, and I think there are a few Presto ones in there too, which is cool. And then PrestoCon Day, fully virtual, happening in July. We'd love to see folks at both of those; I think it's a great crossover between the two communities. And with that, Denny, thank you again.