From YouTube: Delta Lake Community Office Hours (2022-05-12)
Description
Join us on May 12, 2022 at 8:30 AM PST/11:30 AM EST for the Delta Lake Community Office Hours! Ask your Delta Lake questions live and join Claudius Li, Brian Olsen, Tom Nats of Trino and Denny Lee and Vini Jaiswal from Delta Lake!
Ask us your #DeltaLake questions. These sessions allow our community to ask questions about Delta Lake OSS and get to learn what we are building, planning to build and know about recently released features. These sessions are live and the recordings are available on the Delta Lake YouTube channel.
Quick links:
https://delta.io/
https://github.com/badal-io/datastream-deltalake-connector
https://groups.google.com/g/delta-users
A
Sounds good, exactly. Yeah, this is an amazing start to the Delta Lake office hours. If you are tuning in, feel free to join through our LinkedIn page or YouTube page and ask your questions away. Or, just to start with, please say where you're from and say hi to us; we love seeing where people are joining from. So, for those who are new to the session:

These sessions are live and occur every two weeks on Thursdays at 9:00 AM; because of some scheduling we had to do it at 8:30 AM today. We bring a panel of contributors and champions of Delta Lake to answer your questions live on LinkedIn and YouTube. So please ask away your questions about open source Delta, the open source community, the Trino community, the Delta Lake community, and we are happy to answer those questions for you. So, a mini recap.

From our last session, we had Scott, Fabian, and Denny, who provided amazing insights in our recent review of Delta Lake 1.2. We talked about column generation, data skipping, OPTIMIZE, and the Delta Lake sink connector, and we also touched on a few upcoming features like Z-ordering. So please make sure you check out our release notes and the blog; I will paste the link in the chat. So, without further ado.

I am very excited for today's community office hours, where Trino and Delta Lake meet. We have Brian, Claudius, and Tom from Trino, and Denny and myself from Delta Lake. So Brian, why don't you go ahead and give a little introduction about yourself?
B
Okay, well, thanks Vini, and thanks for having us here. I am Brian Olsen. I am a developer advocate at Starburst; we're the vendor company that builds around Trino and does a lot of contributions on the open source side as well. I've been working with Trino for, I don't know, five years now, back when it was called Presto, and we've really enjoyed the product.

I really liked a lot of what it had, so I started getting a little more involved in the community and started contributing a little bit myself, on the Elasticsearch connector in particular. After that, one thing led to another, and now I'm talking about this cool query engine non-stop and getting to hang out with cool people like Denny and Vini, and obviously Tom and Claudius.
D
So we have a lot of proprietary Starburst code, some of that is getting contributed to Trino, and we're building a lot directly in Trino so that everyone can take advantage of it. That's just a great combination with the contributions that Databricks is making to Delta Lake, because, as you saw in the Trino community broadcast a while back, you can use Trino and open source technologies like Delta Lake and MinIO, just start using a lakehouse, and get started right away. It's really exciting.
E
All right, my name is Tom Nats, director of customer solutions here at Starburst. I used to work with Denny and Vini, it seems like 30 years ago, so it's been a long, long time.

One of the first things I did at Starburst was push Delta Lake, because I was pretty excited about it when I worked at Databricks and I wanted to make sure it gets adopted more. I went through the pains of building out large Hadoop clusters, and I came from a database world, so when I got into doing updates, deletes, and all those headaches, I just wanted to run the other way.

So I was very excited to learn about Delta Lake, and hopefully I influenced it here, and we also fixed it recently. The more the better on something like this: the more people we get around using the technology, the better for everybody.
A
Yeah, and it was great working on the healthcare customers, back when our product still had a long way to go. It was just an amazing ride, so look at where we are now, right? Amazing communities.
C
Sure, thank you very much. Hey everybody! The name's Denny Lee. I've been working with Apache Spark and, more importantly, Delta Lake since its inception, and I'm a long-time database guy, just like Tom; that's why Tom and I know each other. I'm super happy to work with the Trino folks. Back when I was in the database world, I was actually working on BI with Analysis Services too.

So the fact that we got Trino and Delta Lake together super excites me. I'm so glad we finally got this session together; we've known each other for a while, yet it took us, what, three years to finally get on air together. So yeah, we're a little behind schedule on that one, but otherwise super happy.
A
Awesome. So, like Denny, I'm also a developer advocate, working on advocacy efforts for the Delta community, and I have been with the Delta project since its inception as well. I have seen a lot of implementations as well as use cases with customers, working on features across a journey of about four years, I guess, so that's exciting. That's why I love talking to all the Delta enthusiasts and the community ecosystem that works with Delta.

All right, it also looks like we have people from all over the globe, which is awesome. One of the questions I will start with: we just released the Delta and Trino connector. So why don't we give the audience a quick rundown of what this feature is meant to be, and what problem it solves?
D
So the problem that we were solving was already solved for enterprise customers, right? We'd had a Delta Lake connector that did reads and writes, and we'd been adding optimization capabilities. The problem is that Trino customers just didn't have this at all, so their options for Delta Lake were just not there. Okay, but why do people want Delta Lake? People have been using Hive since day one, and there were a lot of advantages, right, scale, and Trino could make a lot of that access significantly faster by parallelizing it, but...

These benefits of lakehouses, having these flexible schemas and this enormous scale, are great, but people still need ACID, right? It turns out that saying "hey, can we just ignore ACID for a while?" sort of works, but it's really hard; it makes your entire life more difficult. So you can use Delta Lake, you can do consistent updates and row-level inserts and deletes, and just treat it like a regular database.
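To make the "treat it like a regular database" point concrete, here is a minimal sketch of row-level operations on a Delta table, assuming PySpark with the delta-spark package; the table path, column names, and predicates are illustrative placeholders, not anything from the session.

```python
# Minimal sketch: row-level DML on a Delta table with PySpark + delta-spark.
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit
from delta.tables import DeltaTable

spark = (SparkSession.builder.appName("delta-dml-demo")
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaSparkSessionCatalog")
         .getOrCreate())

# Placeholder path to an existing Delta table.
tbl = DeltaTable.forPath(spark, "s3://bucket/path/to/delta-table")

# Each call commits atomically to the Delta transaction log (ACID),
# so concurrent readers always see a consistent snapshot.
tbl.delete("user_id = 42")                    # row-level delete
tbl.update(condition="country = 'UK'",        # row-level update
           set={"country": lit("GB")})
```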
E
Yeah, I'll jump in on that too. We've hit this inflection point with a lot of our customers in the last year or so: they're getting these very, very large tables, with hundreds of thousands of partitions, and a lot of that data is statistics and partition information.

All that stuff is stored in the Hive metastore, and if you're on AWS you're using Glue, and we've hit the wall on performance with a lot of these customers. We can cache that metadata, but they're using maybe Spark or another process to add files or add partitions, so you have a cache, but you have to invalidate that cache.

So we've really struggled lately, because that is the single point of failure that is really causing performance issues for a lot of our customers. So we encourage them to get to Delta Lake because, as you may know, the majority of all that data is not stored in the Hive metastore.

Now all that data is stored in the file system, where it can be read in parallel, so we're solving a lot of problems, not only on the DML side but actual performance issues with a lot of our customers, because they've grown from smaller tables to these huge, humongous tables and nobody's ever deleting data. So I think that's another thing people don't really understand: the performance benefits of Delta Lake.
B
In particular, that file listing, you were talking about it as well, Denny, in the Cinco de Trino talk, is just something that immediately brings any of your queries to a screeching halt, if they run at all. One thing I wanted to point out: I believe there was a workaround that people in the Trino community used.

I did want to mention that we had a lot of people doing this hack, because there was just such huge demand. They're saying, hey, for instance, we're on a Databricks setup and we want to use Trino for the interactive stuff, but we have no way to do this. So our initial approach, and I think, Tom, you wrote a blog on this, was how to put this together somewhat...

...without having an official connector that actually interfaces with Delta Lake. For a couple of years we didn't have a really good way to solve that for the open source community, and so that's another big thing for a lot of the people that are maybe hearing this for the first time: there's now an official connector.
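For readers hearing about the connector for the first time, here is a rough sketch of what wiring it up can look like; the catalog name, metastore URI, host, and table are placeholders, and the exact catalog properties should be verified against the Trino Delta Lake connector documentation for your version.

```python
# Hypothetical Trino catalog file, e.g. etc/catalog/delta.properties:
#   connector.name=delta_lake
#   hive.metastore.uri=thrift://metastore.example.com:9083
#
# Once the catalog exists, a Delta table is queried like any other table.
# Sketch using the Trino Python client (pip install trino).
from trino.dbapi import connect

conn = connect(host="trino.example.com", port=8080, user="analyst",
               catalog="delta", schema="analytics")
cur = conn.cursor()

# The connector reads the Delta transaction log to find the live Parquet files,
# so no manifest files or full object-store listings are needed.
cur.execute("SELECT event_date, count(*) FROM page_views GROUP BY event_date")
print(cur.fetchall())
```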
C
You generate the manifest file, and sure, you could auto-update and auto-regenerate that manifest file, but if you were even remotely in a scenario where, even at the batch processing level, let alone stream processing, you're making constant modifications, inserts, or whatever to the data, the idea of constantly modifying that manifest file was never going to survive. And, just like what Tom was calling out, if you've got even moderately sized tables, let alone the massive ones, let alone if you're trying to keep all of the history, the reality is that that's a massive amount of data. And just like what Brian called out, doing a cloud object store file listing takes forever, assuming it even works at all. So basically the hacks and the manifest file were not really good situations.

These were good temporary hacks, but this is where the community, both our Trino and Delta Lake communities, worked together to say: you know what, we're going to go ahead and actually make this work. So hats off to the Starburst folks; by the way, I seriously want to give hats off to them. They went ahead and did the vast majority of the work, so I want to call them out for getting this through.

If you want, go ahead and ping us on either the Trino Slack or the Delta Lake Slack and let us know: what other features do you want, what other things are potentially missing, or what advancements would you like to see? We'd love to engage with the community to see what else we should be doing.

We have a plan that we're currently working on, which we'll publish shortly, but again, we always want to engage with you all in terms of what else we can go do. So I'll get off my soapbox now.
A
No, definitely. But there is one question: you talked about the manifest file. So what are these files? How are they generated?
C
For those who are new, yes, to generate a manifest file there is basically a generate manifest command. I don't happen to remember the syntax off the top of my head, but basically there's a generate manifest syntax. What the manifest is, is basically a single file that lists all of the Parquet files that make up your Delta table.

That's basically what it means. In other words, when you wanted Trino or any other system that wasn't already natively working with Delta Lake, what it could do is just read that manifest file to figure out what was going on: okay, here are the files you need, and then you're good to go. The problem, of course, is that if you're modifying, updating, inserting, or deleting data, which is pretty common in a data lake last time I checked, you're going to constantly do those modifications, right?

That means you have to constantly regenerate that manifest file. Now, there is auto manifest generation after the first time you do it that actually takes care of that for you, but that's only viable if you're doing modifications on a very infrequent basis, and in almost any data lake I can think of, that's not the primary mode of operation anymore. If you had asked me that question 10 years ago, sure, maybe; now, definitely not.
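For reference, here is a minimal sketch of the generate-manifest syntax Denny is describing, assuming Spark with the delta-spark package; the table path is a placeholder.

```python
# Minimal sketch: generating a symlink-format manifest for a Delta table.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = (SparkSession.builder.appName("manifest-demo")
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaSparkSessionCatalog")
         .getOrCreate())

path = "s3://bucket/path/to/delta-table"  # placeholder

# One-off: write _symlink_format_manifest/ listing the current Parquet files.
DeltaTable.forPath(spark, path).generate("symlink_format_manifest")

# Equivalent SQL form, plus the table property that re-generates the manifest
# automatically after each write (still only practical for infrequent updates).
spark.sql(f"GENERATE symlink_format_manifest FOR TABLE delta.`{path}`")
spark.sql(f"""
  ALTER TABLE delta.`{path}`
  SET TBLPROPERTIES (delta.compatibility.symlinkFormatManifest.enabled = true)
""")
```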
E
That's the quickest way of doing it, and then you would just treat it as a regular Delta Lake table.
B
Yeah, essentially you just make a copy of that table; you're essentially writing simultaneously, maybe to both, while you're doing the migration, and then once you've checked that everything's pretty much the same, you're getting the same checksums out of both of those tables, then you can switch over and start treating the Delta table, at that point, as your primary table.
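A rough sketch of that dual-table migration check, assuming a Trino setup with a `hive` catalog for the existing table and a `delta` catalog for the new one; every catalog, schema, table, and column name here is illustrative.

```python
from trino.dbapi import connect

conn = connect(host="trino.example.com", port=8080, user="analyst")
cur = conn.cursor()

# 1. Copy the existing Hive/Parquet table into a Delta Lake table with CTAS.
cur.execute("""
    CREATE TABLE delta.analytics.page_views
    AS SELECT * FROM hive.analytics.page_views
""")
cur.fetchall()

# 2. Compare row counts and an order-insensitive checksum before cutting over.
cur.execute("SELECT count(*), checksum(user_id) FROM hive.analytics.page_views")
before = cur.fetchone()
cur.execute("SELECT count(*), checksum(user_id) FROM delta.analytics.page_views")
after = cur.fetchone()
assert before == after, "tables diverged during migration"
```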
D
If you want to work at it hard enough, you can always hack something together, and maybe it works well or maybe it works not so well. Often the more relevant question is: how easily can you do this, and how well does it work? So with increasing amounts of Delta Lake in open source, with the Delta connector in open source, with these working together and all of the code base being visible, adding a lot of these functions in some cases doesn't unlock something that you couldn't do before; it just makes it much, much easier and makes it work much better.
A
That's a very good point, Claudius, because all of these new systems are built on the foundation of open source, and for us to really advance and solve all the data engineering problems, it's required that we all work together and form a community, so that systems are simplified and easier to use for end users. So thank you for adding that information, Claudius. Our next question is: how do we stop implicit conversion of input data to custom Trino UDFs?

Is that a question that you can answer, Brian?
B
So, is this specifically asking whether there is a way to essentially turn off the UDF functionality in Trino, so that you're not relying on any system-specific conversions? If that's how I'm understanding the question, it essentially comes down to this: there's no way to just turn off UDFs or stop these kinds of conversions in any query engine, really. It's actually opening things up.

It's helping you out with solving various edge cases. One of the cool things, one of the strengths around Trino, I would say, is that you can write your own UDFs, write your own plugins, and add them to the system, and that usually gets around some of the edge cases that are difficult to solve with just pure SQL. But yeah, there's not a way to stop it.

In terms of thinking about how to maintain this, I think the bigger, broader question would be: how do I keep myself from getting stuck in one solution, and stay purely on SQL? Let's say I want to move between Trino and Spark, and I'm maybe even analyzing both of those; I want to stay as close to standard SQL as possible. I think that's where the question is, and that's really going to come down to you as the implementer: maybe locking down who actually writes that SQL, how it actually gets generated, and having very tight control over what can and can't be used if you're trying to keep focused on just following the SQL standard. I'm hoping that's what the question was.
A
Yeah. So on that, Brian, you mentioned some of the highlights of Trino. What are the other shiny features, what else does Trino solve?
B
So the first thing is interactive, right. A lot of times it comes down to: I'm trying to do a quick scan over this data, the data analyst or BI situation where you're running queries over petabyte-scale data and you really only care about maybe one customer out of all of that. Maybe that would only return a terabyte, and then you have a couple of predicates that would ultimately filter the data down to maybe a couple of gigs or something like that. What Trino is really good at is making very optimal query plans to avoid a lot of extra work, and doing a lot of the work in a pipelined, parallel fashion, so that the results come back blazingly fast.

Traditionally, we've avoided fault tolerance in the name of avoiding any kind of extra work, so that we can get the answer back as quickly as possible, and if something does fail, it's so quick that you can just rerun it. That encompasses a whole workload we like to call interactive analytics, and interactive analytics would definitely be the one thing that people know Trino and Presto for, so that's definitely up there.

The other thing that we're very well known for is query federation. Say you have data sitting in Elasticsearch, for instance, or data sitting in a real-time database like Pinot, and you want to be able to ask questions or run a machine learning algorithm over all these systems. Let's say in the real-time situation you want to get that answer back really fast, or even get the union of, say, your historical data sitting in Delta Lake and your real-time data coming in through Apache Pinot. You want a union of both of those data sets to spit out somewhere and then maybe analyze it in a different table, or even just analyze it on the fly. Trino allows you to do this query federation super easily.

It becomes this very easy way to not only query the data ad hoc from multiple places, but you can also, as Tom mentioned before, do a CTAS query, CREATE TABLE AS, and quickly pull from one data source and move it into another. Let's say the time that it takes for all of your real-time data to make it into Delta Lake is a little bit longer with your batch pipeline, so you want one process that brings it in a little earlier so that you can do some faster query analytics over Delta Lake. That would be one option, where you can actually do a SELECT from this real-time or near-real-time data store, maybe Elasticsearch, and put it into my data lake.

Those are the broad strokes of why people really like Trino alongside a lot of these other query engines.
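As a rough illustration of the federation pattern Brian describes, here is a minimal sketch, assuming a Trino deployment with `delta`, `pinot`, and `elasticsearch` catalogs already configured; all catalog, schema, table, and column names are made up for the example.

```python
from trino.dbapi import connect

conn = connect(host="trino.example.com", port=8080, user="analyst")
cur = conn.cursor()

# Union historical events in Delta Lake with the latest events in Apache Pinot,
# all in one federated query, with no ETL step needed just to ask the question.
cur.execute("""
    SELECT event_date, count(*) AS events
    FROM (
        SELECT event_date FROM delta.analytics.page_views      -- historical
        UNION ALL
        SELECT event_date FROM pinot.default.page_views_recent -- real-time
    ) AS combined
    GROUP BY event_date
    ORDER BY event_date
""")
print(cur.fetchall())

# Or land near-real-time data from Elasticsearch into the lake with CTAS,
# ahead of the slower batch pipeline.
cur.execute("""
    CREATE TABLE delta.analytics.page_views_early
    AS SELECT * FROM elasticsearch.default.page_views_live
""")
cur.fetchall()
```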
A
So, Brian, before we close that question, there's a question related to that. You mentioned CTAS, right? So doesn't CTAS mean that they will have to have data in either a siloed or redundant fashion, if we are using CTAS for Trino and Delta?
B
So you kind of kick the can down the road. Essentially you're saying: I'm not sure if this is something that I need to do at full scale; I'm just analyzing and looking over my data right now, and I don't necessarily need to do ETL or some big batch job right now.

I just want to pull data from this location and pull data from that location, and it gives you this option of just ad hoc pulling data from both of these siloed locations. People say data silos, but you could also call that a data mesh nowadays. So essentially I have data in all these different locations, and everybody's kind of doing their own thing, because this is what we get.

We get the giant spaghetti monster where everybody's got their own data solution. What's nice about Trino from a query federation standpoint is that, as an analyst, I don't have to say: okay, let me go out to this team and have a meeting with their team leader, then go to another team and this other team to meet with all their team leads, before I can get them to agree that we're going to run this big giant ETL process, put all this stress on their servers, and get everything into one location. Here I can just do a quick sweep around as an analyst: I can immediately go, hey, I'm going to connect to your data server, and your data server, and your data server, and now I can interactively run queries that aren't going to stress out any of those servers.

It's just going to be these small pushdown queries that get executed on them, smaller tidbits of queries, and then anything that can't get pushed down gets streamed into Trino, and Trino does the rest of the processing and slicing. So that's how everybody ends up with silos, especially through acquisitions and everything: data siloing and data separation.
A
Yeah, that's really helpful. We only have five minutes left, but there are so many questions that the community is asking, so I'll just take one or two more. The next question is: is there any Trino plus Delta Lake performance benchmark available, and case studies that we can refer to? We can definitely share the resources, but any quick callouts that anybody would like to add?
D
On the topic of benchmarks in general: yes, there are TPC benchmarks and so on. I would generally encourage people to ask: what's your data set and your query pattern? Because that's what's actually going to matter, and there's just so much variance with any performance benchmark; if you change what the data is or how you access it, you can make any engine look good or terrible.
C
Yeah, and I definitely want to concur with Claudius on this one. Sure, you can run it on TPC-DS if you want some form of standardization, that's fine, I guess, but the reality is that any modifications to the data depend on: are you running this on AWS? Are you running this on MinIO? Are you running, you know...

There are all these variables that are going to come into play and change everything, so I'm with Claudius on this one: really test it out with what you have. And don't forget, Brian also has, actually, can we repost Brian's blog? He has a blog about how benchmarking is bad. So Brian, why don't you chime in?
B
I just get really frustrated, being an engineer that's trying to figure things out top to bottom, what's good, where do I get started. Every single time you go to any vendor site, and even open source sites are starting to get into this practice a little bit, it kind of sucks: you go there and what do you expect to see? A benchmark...

...that shows whatever vendor you're at in a ridiculous, weirdly skewed comparison, and you're like, what did they do to get that benchmark? That's a question you need to ask. So my current approach, and the way that I like to work with the community, is that I actually want to help them.

I want to help them figure out how to create a benchmark, ideally for cheap, because your boss isn't always up for letting you spend a whole bunch of money on benchmarking; it would be nice if you just had a never-ending budget, I understand that. So my hope is, and I'm actually working now on some resources for this...

How do we enable the community to just run their own benchmarks on their own data, with their own data shapes, on the specific systems that they're using? The answer is always "it depends", and so benchmarks don't really say anything; it's literally just technical peacocking. That's really all it comes down to. I wouldn't trust too many benchmarks too much, and there are obviously third parties that maybe do a better job and try to be as unbiased as possible.

Maybe give some weight to those; I'm not saying it's not worthwhile. But anyway, you should ideally try to do your own benchmarks, or get a group of your friends or local businesses together to do a benchmark and help analyze the results. It's so silly, these games we play, anyway.
B
You need to train people, you just need to help people out and figure out cost-effective ways for them, and you can do a lot of that with the cloud these days. Maybe you just have a benchmarking fest where you get together with a bunch of your friends, open source benchmarking, hashtag open-source-benchmarking. Let's make it happen.
A
That's where we can have Brian put on the rabbit hat with the electric guitar at the same time. Because that was a question for you, Brian: do you ever put on a rabbit hat and play the guitar at the same time?
B
Go to the Trino YouTube. The front thing there is actually me literally playing Margaritaville, well, a parody of it, in the Trino bunny hat. So go check out the Trino YouTube and please subscribe.
A
Awesome. So, one last quick callout: I did share the resources for the Trino and Delta communities, for you to join the Slack channels and ask away your questions, because I see that there are amazing questions that our data engineering journeys go through. Also a quick callout: we have amazing Delta sessions, a developer lounge, and community events, and we will also have Delta contributors from the ecosystem at our summit in June.

I shared a blog if you want to check out who is coming, because I'm pretty sure you will love the people who are doing the keynote, and there are amazing sessions for Delta Lake as well. I also shared a link to register. So hopefully that's helpful.

With that, if you don't have any more thoughts, and since our time is running out, let's call it. Great.