From YouTube: Delta Lake Community Office Hours (2022-04-28)
Description
Join us on April 28, 2022 at 9 AM PST/12 PM EST for the Delta Lake Community Office Hours! Ask your Delta Lake questions live, including but not limited to the Apache Flink, PrestoDB, and TrinoDB connectors. Join Fabian Paul from Ververica, and Scott Sandre, Denny Lee, and Vini from Databricks!
The session will be hosted live and the recordings are available on the Delta Lake YouTube channel.
A
And we are also live on Zoom, so we have three channels today. Please say hi and tell us where you're from. For those of you who are new: these sessions are live and occur every two weeks on Thursdays at 9 AM Pacific time. We bring a panel of contributors and champions of Delta Lake to answer your questions about Delta Lake, open source software, and the things we are doing with other open source ecosystem projects. So welcome, all of you.
A
All right, so without further ado, let's go ahead and get started with introductions. Fabian, do you want to kick it off?
B
Sure, hi everyone. I'm Fabian, I'm currently an active committer and contributor to the Apache Flink project, and I'm currently employed with Ververica. I'm basically here today to talk about the new Flink Delta sink, where you can write your data from a Flink pipeline into your Delta Lake.
A
Awesome, thank you for being here, Fabian. Scott, you're up next.
C
Hi everyone, I'm Scott. I'm a software engineer on the Delta ecosystem team here at Databricks, and I'm here today to talk about a bunch of things, including Delta Lake 1.2.0, a huge new release which includes a lot of features that our team worked on, including data skipping and adding multi-cluster writes to S3, as well as our connector ecosystem, which includes the Flink sink that Fabian's here to talk about as well.
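For readers who want to try the S3 multi-cluster writes Scott mentions, here is a minimal sketch of how the experimental feature is enabled in Delta Lake 1.2. It assumes the DynamoDB-backed LogStore artifact is on the classpath; the bucket name, DynamoDB table, and region are placeholders, so double-check the 1.2 release notes for the exact configuration keys.

```python
# Hedged sketch: Delta Lake 1.2's experimental S3 multi-cluster writes,
# which coordinate concurrent writers through a DynamoDB table.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("delta-s3-multicluster")
    # Route transaction-log writes for s3a:// paths through the
    # DynamoDB-backed LogStore (the mutual exclusion S3 itself lacks).
    .config("spark.delta.logStore.s3a.impl",
            "io.delta.storage.S3DynamoDBLogStore")
    .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName",
            "delta_log")   # placeholder DynamoDB table name
    .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.region",
            "us-west-2")   # placeholder region
    .getOrCreate()
)

# Any cluster configured this way can now safely append to the same table.
spark.range(100).write.format("delta").mode("append").save("s3a://my-bucket/events")
```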
A
Awesome, awesome. Denny?
D
That's awesome, I'm excited for this panel. So, a mini recap from last session: we had TD and Ivana, who provided amazing insights on the key features of Delta 1.2, which included column stats generation and data skipping, which is a big boost to read performance. We also discussed the file compaction feature and improvements to restoring a Delta table to earlier snapshots.
A
We also chatted about the Delta connectors for other open source tools, like the Delta Flink sink and updates on the Delta Flink source, and we got a few insights on future releases, like Z-ordering and some of the other features we can still hear about from our panelists today. In case you missed it, the recording is still available; I'll share that in the chat.
A
So this brings us to today's session, where I'm excited to talk about other things: Delta Lake, data processing, streaming, and the Flink connector, with Fabian from Ververica, the original creators of Apache Flink itself. And then we have other contributions from Scott, Denny, and other Delta Lake contributors. So we will discuss all that without further ado.
B
Yeah, maybe let me start with a few quick words about Apache Flink. Apache Flink is very similar to what probably most of you have already seen or heard from Spark: it's a unified batch and streaming engine, but lately we are mostly focusing on our streaming execution, because we see batch mainly as a special case of a streaming application.
B
So the new connector can basically be used in both streaming and batch use cases, but we are mostly focusing on streaming use cases, where you can ingest into your data lake with low latencies and high throughput.
B
So the idea is: currently we are still working on different API levels of the Delta sink, but in the current situation you can use it right away. I think it has been released with one of the Delta releases; probably Scott knows exactly which release it came with. So yeah, I'm looking forward to your feedback from using the project.
A
Awesome. Denny and Scott, from your standpoint, how did the release go? What were the features, or things, that got you excited while working on this project?
C
We want it to work with every compute engine, so the delta connectors repo contains both Delta Standalone, which is our Spark-less writer and reader for all the Delta log metadata, as well as all of our future connectors, which right now includes the Flink sink that was just released; and then in the next release, in the coming months, we also hope to release the Flink source. So a lot of exciting work going on there.
D
I mean, really, I think Fabian and Scott covered most of the call-outs, but if you are interested in testing, or interested in participating and doing some additional PRs, please join us in the Delta Users Slack. I'm probably going to be a broken record on this one, but join us there. There's actually a Flink connector channel specifically to talk about these things as well, and we'd love to get your feedback. This has been a very popular project.
D
Lots of folks have been asking us tons of questions about this from the community in general, so we'd love to have you test it. That's pretty much my little addition.
A
Awesome. So with that note, Denny, there are some questions around: what are the limitations of this connector?
B
Yeah, I can start. I think one of the current limitations of the connector is that in the first version we only support appending to tables. So basically we can create a table and then append to existing ones, but in the future we also plan to support upsert statements, where we can basically update existing tables. That is currently not supported yet.
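For context on what upsert support would mean, this is the kind of statement Delta Lake already handles today through Spark SQL; a minimal sketch where "events" and "updates" are placeholder table names. The plan Fabian describes is to bring equivalent semantics to the Flink connector.

```python
# Upsert into a Delta table with MERGE INTO via Spark SQL; "events" and
# "updates" are placeholder tables assumed to be registered already.
spark.sql("""
    MERGE INTO events AS t
    USING updates AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```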
A
Got it. And are there any others? And if this is the limitation, what is the plan for making it available, in the next release or so? Can you give us some idea?
B
Yes, so I think the next few releases will mainly concentrate on bringing the Delta sink to the other APIs that Flink supports. One very interesting project that is currently being worked on is bringing it to SQL, so that it becomes way easier for businesses to work with the data.
B
You can just use your usual Flink SQL statements, and you do not have to worry about any internals or any of the connector APIs. And one thing that is becoming more and more important in modern businesses is the use of catalogs, so that you can share the different Delta tables within your organization or between different teams. I think this feature will also go live once the support for Flink SQL is released.
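As a rough illustration of where this is heading, here is a hypothetical PyFlink sketch of registering and appending to a Delta table with plain Flink SQL. The SQL support was still in progress at the time, so the 'delta' connector identifier and the 'table-path' option are assumptions for illustration, not a released API.

```python
# Hypothetical sketch only: Flink SQL support for Delta was unreleased
# at the time, so the connector options below are assumptions.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Register a Delta table so that plain Flink SQL can write to it.
t_env.execute_sql("""
    CREATE TABLE events (
        id BIGINT,
        ts TIMESTAMP(3),
        payload STRING
    ) WITH (
        'connector' = 'delta',
        'table-path' = 's3a://my-bucket/events'
    )
""")

# Append-only insert, matching the sink's current capabilities;
# assumes a table named source_stream was registered elsewhere.
t_env.execute_sql("INSERT INTO events SELECT id, ts, payload FROM source_stream")
```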
B
Yeah, I can start, and then probably Scott can take this over. On the question of Spark versus Flink: I would say there's no clear winner in terms of performance or any of these kinds of metrics. I think it really depends a bit on whether you're working in a larger organization.
B
What kind of expertise is already in your company? It depends on whether there are more people already familiar with Spark or more people already familiar with Flink. And I would probably say that the current Delta Lake support is still better with Spark, which is where it originated.
D
Yeah, I'll just add a little bit. It's exactly as Fabian called out: if you're already familiar with the Flink ecosystem, that makes a ton of sense; if you're already familiar with the Spark ecosystem, that makes a ton of sense. Each has its own strengths. We're not going to do a competition here.
D
What we're really talking about is the fact that both systems require reliability from the underlying storage. The fact is, your cloud object store, no matter which distributed system you're working with, will ultimately have issues when it comes to orphaned files, and that's where a transaction log is going to be super helpful to ensure the reliability of your data. So that's what we're really focusing on today, just like Fabian called out.
D
There are some aspects where, because Delta Lake was initially built with Spark, we have a little bit more code base available for that, admittedly. But as you can tell, Flink is super important and super popular. So we started with the DataStream API for starters, which is around Flink 1.12, and then we're currently working on the Table API; that's what Scott had already called out.
D
That's Flink 1.14 onwards; correct me if I'm wrong with the version numbers, by the way, since I'm trying to do this all from memory. We're also trying to work on the Flink Delta source, so we're getting these features in. And that's actually what I meant by pinging us on the Delta Users Slack, or especially the Flink connector channel: if there are other features that you would like to see faster, and especially if you're interested in making pull requests of your own, please do chime in, because we'd love to, number one, get the help, and number two, get your opinion and feedback on what's working and what should be prioritized.
D
To the person who asked the question: are you by any chance referring to Kedro, the McKinsey open source project?
D
If it's that: Kedro is sort of a project that actually works on the MLflow side. That's not to say it can't work with Delta Lake; it can, actually. But I'm not entirely sure what the question is, so I apologize; if you can clarify it a little bit, I can probably help explain that one. But long story short, with Kedro we actually have a partnership.
D
Well, a community, really, that's working together with Kedro and MLflow and Delta Lake, but it's still very early stages. So I didn't want to promise something yet, because again, very early stages.
A
Yeah, I think that helped, Denny, and you hit on the right product, Kedro, which works with them also. I think Deepish is nodding: yes, this is what I was asking about. Awesome. There's another question on when we should use Delta Lake when we can store data in ADLS Gen2.
D
I can tackle that, and others can chime in afterwards. So in general, ADLS Gen2 works perfectly great with Delta Lake. Whether you're working within the context of, for example, something like Azure Databricks, or within the context of Delta OSS, where you're running your own VMs within the Azure environment and saving directly to ADLS Gen2, it's actually still the same process. Now, ADLS Gen2, as opposed to S3, and this is similar for GCS as well, is different in one respect.
D
They actually have this concept of put-if-absent consistency that S3 does not have. So it does actually have that full transactional consistency, which allows us to have multi-cluster writes: multiple JVMs, multiple drivers writing to the same storage system. In fact, Scott, you can probably chime in afterwards to talk about the S3 multi-cluster writes, by the way. But to answer your question about ADLS Gen2: irrespective of the fact that it does have those consistency guarantees, there is more to it.
D
The thing is, there's still the issue that, again, whether it's Flink or Spark or any other distributed system doing the write, you could still leave orphan files, and a transaction log will protect that data. So because of that, I would almost always run my stuff as a Delta Lake table on ADLS Gen2 to protect the data. A bit long-winded, I realize, but the context is: yes, use Delta Lake even with the put-if-absent consistency that ADLS Gen2 has.
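A minimal sketch of what writing a Delta table straight to ADLS Gen2 looks like from open source Spark, assuming the hadoop-azure and Delta packages are on the classpath; the storage account, container, key, and the df DataFrame are placeholders.

```python
# Hedged sketch: writing a Delta table to ADLS Gen2 from Delta OSS.
# Account, container, key, and df are placeholders.
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
    "<storage-account-access-key>")

(df.write
   .format("delta")
   .mode("append")
   .save("abfss://mycontainer@mystorageaccount.dfs.core.windows.net/events"))
```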
C
Yeah, I would just add on some more pros of using Delta Lake in general, of course, whether it's on ADLS Gen2 or any other cloud storage. It's not just reliability, as Denny said. There are also features: time travel, schema evolution, schema enforcement. There are all these great things that can help you protect your data and query it more efficiently.
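A quick sketch of the time travel Scott mentions, assuming an existing SparkSession with Delta Lake configured; the table path is a placeholder.

```python
# Read an earlier snapshot of a Delta table by version or by timestamp.
df_v0 = (spark.read.format("delta")
         .option("versionAsOf", 0)                # first version of the table
         .load("/tmp/delta/events"))              # placeholder path

df_apr = (spark.read.format("delta")
          .option("timestampAsOf", "2022-04-01")  # snapshot as of this date
          .load("/tmp/delta/events"))
```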
A
That's awesome, Scott. So I think that brings us to the next question. Drew is asking: in the last few sessions it was said that OPTIMIZE for Delta Lake was going to be made open source in Q1; any updates on the roadmap? That's perfect, because we already released it. So, Scott, do you want to tackle that question: what OPTIMIZE does, and expand a little bit on that feature?
C
Yeah, for sure. So OPTIMIZE was released in Delta Lake 1.2, and what OPTIMIZE does is help you compact your files. What this does is help you solve the small file problem: you have these streaming writes, you have all these files coming in, which is great, Delta Lake is able to handle that, and we partition our data, but compacting the files into larger ones helps with reads and helps with the listing problem on various cloud stores. I'm sure Denny can add more benefits of this feature.
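A minimal sketch of the compaction Scott describes, using the OPTIMIZE SQL command that shipped in Delta Lake 1.2; the table path and partition predicate are placeholders.

```python
# Compact small files in a Delta table (Delta Lake 1.2+).
spark.sql("OPTIMIZE delta.`/tmp/delta/events`")

# Optionally limit compaction to recent partitions to bound the work.
spark.sql("OPTIMIZE delta.`/tmp/delta/events` WHERE date >= '2022-04-01'")
```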
A
Great. Denny, do you want to add more?
D
No, actually, I think Scott said it succinctly. The reality is that the blog for Delta 1.2 is coming out, and we're going to have a lot more explanations and a deep dive into it. So if you want to wait till our next community AMA, we'll dive even deeper as well. But the context is: it's a great feature that allows for faster performance, and just like Scott called out, if you have any more questions in the interim, either ping us here right now or in the Delta Users Slack.
A
Awesome, Denny. So questions are flooding in; I'll ask another one: moving from a traditional RDBMS data warehouse to a lakehouse with Delta Lake, what data modeling changes should we make, if any, and what are some considerations we should take into account when doing this migration?
D
Oh, that's a wonderful question. Zachary, I think you're the one who asked the question live, so again, I'll provide the context. By the way, part of the reason why I love it, Zachary, is because I'm actually formerly from the SQL Server team, so I used to build data warehouses; I completely grok where you're coming from. So the question really is: if I'm coming from a traditional relational data warehouse, and hey, I'm going to move to the lakehouse with Delta Lake, that's awesome.
D
What considerations do I have to take into account? Okay, so from a data modeling perspective, whether we're talking about OLAP dimensional models, or third normal form, or things of that nature: in general, that's not necessary. That doesn't mean you can't do it, by the way; I want to be very clear about this. It's not that you can't build a star schema within the context of a lakehouse. It's just no longer necessary to do this.
D
By the same token, if you are going ahead and taking an existing system, I would actually transition it almost exactly as you would have designed it in a relational data warehouse first, purely from the standpoint of making that transition easier. But then, over time, you'll recognize there are various advantages, especially when you're working with Spark or Flink or any of these other systems that are interacting with Delta Lake. For example, things like: do I need the equivalent of identity columns, or UUIDs?
D
Do I need to build surrogate keys? Some of them will be necessary, and in fact we actually have multiple sessions, with myself and Douglas Moore, about how to take data warehousing concepts and do exactly that: surrogate keys, type 2 slowly changing dimensions, things of that nature. I'll find some time to drop those links into the LinkedIn and Zoom chats. But again, these are common processes that you traditionally do with RDBMSes.
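As one small illustration of the surrogate key point, here is a hedged sketch of generating a key while loading a dimension into Delta; source_df and the column names are placeholders, and monotonically_increasing_id yields unique but non-contiguous values.

```python
# One possible way to add a surrogate key to a dimension table in Spark.
from pyspark.sql import functions as F

dim_customer = source_df.withColumn(
    "customer_sk", F.monotonically_increasing_id())  # unique, not sequential

(dim_customer.write
    .format("delta")
    .mode("overwrite")
    .save("/tmp/delta/dim_customer"))  # placeholder path
```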
A
Awesome, thank you very much. There's another question on how we are making integrations with Power BI simple, so that it can be used over Delta Lake.
C
A great question. So we do already have a Power BI connector in the delta connectors repository; feel free to check that out. It was actually contributed by an external contributor, so the author would know more about it than I would. But if you have any questions about it, again, join us on the Slack; feel free to ask about the specific connector, and I'm sure we can help get you some more information.
A
Yeah, that's awesome. And we have also released a lot of other integrations with the open source ecosystem, as well as reporting tools and data processing tools, so definitely check out our website, where we have all this information, and engage with us on Slack or GitHub so that we can see what you are considering building; we will be happy to follow along. Great.
A
So there is another question: you mentioned bringing more performance into Delta Lake, and you also mentioned the transaction log. What are some of the features released in 1.2 that people can find beneficial for processing big data?
C
Yeah, great question. So we've already mentioned OPTIMIZE and talked a little bit about that, and that will give you huge performance improvements, but so will data skipping. Data skipping and file stats collection are pretty much the same solution: this major feature we added lets you collect per-column stats as you're writing to the Delta log, with very minimal write overhead. What this means is that, in addition to partition skipping during reads, we can actually look at the per-column stats to determine which data files to skip. Various benchmarks have concluded different things about the actual read improvement.
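Data skipping needs no extra API; a selective read like the sketch below can prune whole files whose per-column min/max stats rule them out. The path and column names are placeholders.

```python
# Data skipping happens automatically at read time: files whose recorded
# min/max stats cannot match the predicate are skipped entirely.
df = (spark.read.format("delta")
      .load("/tmp/delta/events")                        # placeholder path
      .where("event_date = '2022-04-28' AND user_id = 42"))
df.show()
```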
A
That's great. And then there are some questions about how to join this live channel: I pasted a link in the chat for Slack and Git, so please check that out. I think this remains a very popular question for all our AMAs. Next: any support for Python connectors? Most existing connectors are JVM-based.
D
I can probably chime in a little bit on this one. So, long story short: there actually is a Python binding that's part of the Delta Rust API, so you can check that out. The delta-rs API actually has Python bindings that allow for reads, and there's currently work being done on the delta-rs writer side as well; subsequently, we'd love to have help advancing those Python bindings to support writing too.

D
There's also Ruby support, in addition to, of course, Rust itself, and there's also work being done by the community for Golang as well. By the way, just as an FYI, it's still very much early stage; in other words, we can't put it up on the website yet, but there is actually work being done on that front as well. So just as an FYI.
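A short sketch of the delta-rs Python bindings Denny describes, installed with `pip install deltalake`; the table path is a placeholder, and the writer side was still in progress at the time, so this shows reads only.

```python
# Reading a Delta table with the delta-rs Python bindings (no JVM, no Spark).
from deltalake import DeltaTable

dt = DeltaTable("/tmp/delta/events")           # placeholder path
print(dt.version())                            # current table version
print(dt.files())                              # data files in this snapshot
df = dt.to_pyarrow_table().to_pandas()         # materialize as pandas
```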
A
Awesome. There is a question about migration.
C
Yeah, so one thing I would love is help on getting more Python bindings. For example, we have the JVM-based Delta Standalone library in the delta connectors repository, and what we want this library to be, and what it is right now, is the source of all metadata interactions with the Delta log not involving Spark.
C
This is our Spark-less standalone library, and currently it's JVM-based, but we would love for an open source community member to go and help us, or make a project plan or a design doc, to add Python bindings to it. Because, again, this is the source of all metadata interactions, and we'll be adding more features to it over time for a Spark-less connector, adding Python bindings would have a much larger scope of impact, since our Hive, Flink, etc. connectors are already using it as well.
A
That's an awesome call-out, I think. So thank you, Mayank and Andy, for asking great questions. There's another question from Drew; he's asking: when is the Rust API for writing scheduled for release?
D
Right now it's currently planned for Q2, which is basically, the way I figure it, around June or July, roughly. We actually have regular updates on it, so just join the Delta Users Slack; there's actually a delta-rs channel as well, so you can definitely ping your questions there. But right now, at least, that's where we're currently at.
D
Okay, so it's always "it depends," okay? Part of the reason I'm saying "it depends" is because for some people, some designs, what they have done is they've left the dimension data in their relational store, and so what they're doing is running queries that join the relational data with their lakehouse, with the actual fact data sitting in Delta Lake. That's a perfectly valid system, and especially common when you do a lot of ad hoc queries.
D
So this is a completely normal system, especially when they're using some other system to ensure the dimensional data integrity. In other words, if you have something like type 2 slowly changing dimensions, you will over time need to update them, and this is actually controlled by a third-party system, not necessarily by the same folks who run the data platform where Delta Lake is residing.
D
On the other hand, if you do have full control of the system, it is also just as common to say: forget it, I actually don't need a type 2 slowly changing dimension, because I'm actually just storing the actual dimensional value directly in the Parquet files that make up the Delta Lake table. That's actually a very valid approach too. The concern usually is, when you're looking at your favorite BI tool, like Power BI or Looker or Tableau, that the generation of the actual dimension data then becomes complicated.
D
So then, again, all of a sudden you are inclined to build a star schema, but that schema may be built after the fact. When I talk about the medallion architecture, you're building it as post-gold tables versus actually building them with it. Now, suffice it to say, I gave you an answer, but it isn't a straightforward one, is it? And that's more or less the point.
D
It really depends on your environment and how your pipelines are structured, and on whether it makes sense to actually stay with a star schema or to go ahead and remove it. And so, like I said before, if you already have it, I wouldn't necessarily throw it away right away; I would definitely allow yourself to migrate slowly off of it.
D
Like I said before, there are distinct advantages to having a star schema, and there are advantages to not having one. And, incidentally, all the advantages and disadvantages of a star schema, pretty much every one that you can think of within a database, like, for example, the replication of data and things of that nature, actually apply to the lakehouse. So, frankly, you probably already know the answer, which is pretty sweet. Hopefully that answers that question.
A
That's awesome. I think that gave a very good overview and answered a bunch of questions. Denny, thank you. So with that, just one last question to all three of you: if you all can take a moment and tell us what is coming up next, what are the features that you are excited about and working on? Fabian, I'll start with you.
B
I think I'm really excited about seeing the Delta source in place, because that basically enables a nice end-to-end experience. You can basically use your Flink pipeline to process any tables you already have and write into other Delta tables. I think this really enables using Flink in a full processing mode for Delta tables.
A
So, end-to-end streaming dreams coming true with Flink and Delta, right? That's awesome! How about you, Scott?
C
Yeah, two exciting features I'm really looking forward to. One is Z-order, which will help you really optimize how you run the OPTIMIZE command, the compaction that solves the small file problem; that's one really exciting feature I'm looking forward to. The other is change data feed, which will actually let you capture row-level changes from your Delta table, which is something that a lot of users have been asking for, and that's actually what I'm working on right now.
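Since change data feed had not shipped at the time of this session, the following is only a hedged sketch of how reading row-level changes might look; the table property and reader options are assumptions here, and the path and version are placeholders.

```python
# Hypothetical sketch: change data feed was still in development, so
# treat the property and option names below as assumptions.
spark.sql("""
    ALTER TABLE delta.`/tmp/delta/events`
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

changes = (spark.read.format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 5)   # placeholder starting version
           .load("/tmp/delta/events"))
# Each row would carry change metadata such as the change type,
# commit version, and commit timestamp.
changes.show()
```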
D
Fabian and Scott already took my two favorite ones, but in addition to that, I actually did want to call out all these awesome integrations. We recently released the Presto one, we recently released the Trino one, and there's actually an upcoming one with Pulsar. For me it's less about the newer features and more about just integrating more and more, and improving the features we currently have: for example, improving the Presto connector and working with the Trino community on improving their reader and writer.
A
That's awesome. Thank you all; this was a wonderful session. We got a lot of insights, and I'm pretty sure there were a lot of people who asked great questions and got amazing insights into our project. It seems like a lot of people asked about the Slack channel link, so please do join; we welcome all the new integrations and requests from you. Very exciting. So thank you all for joining. Again, a reminder:
A
Please
join
us
through
all
previous
channels.
I
posted
a
link
in
the
linkedin
for
all
different
channels
that
are
available
for
delta
lake,
and
then
this
sessions
are
live
every
two
weeks
on
thursdays.
9
am
so
please
look
forward
to
joining
us
next.