From YouTube: Delta Lake Community Office Hours (2022-02-03)
Description
Join us for the next Delta Lake Community Office Hours and ask us your #DeltaLake questions. Thanks!
B: All right, hi everybody. We are joining our second office hours for this year. We have Scott, Ryan, Denny, and myself on the panel. So why don't you introduce yourself, Scott? You want to go ahead?
B: Awesome, thank you, Scott and Ryan. Denny, why don't you introduce yourself?
A: Oh, thanks very much, Vinnie. Hi everybody, my name is Denny Lee. I'm a long-time Brickster, long-time Spark and Delta Lake guy, just here to answer some questions. I did want to call out right away that you probably have some questions on the 2022 H1 roadmap that was published in a blaze of glory last night, so we're going to post it in there. But Vinnie, why don't you go ahead and start the show, and meanwhile I'll go ahead and post the roadmap to both LinkedIn and YouTube concurrently.
B: Awesome, thank you for the introductions, all the panelists. We have attendees from Virginia, Bangalore, Seattle, Argentina (oh my god, nice), England, Bristol, Atlanta. Wow, so many folks from around the globe; so excited to have you here. So, as Denny mentioned, we just released the roadmap yesterday night. If you have any questions about the roadmap, please ask away, and I will be monitoring the YouTube as well as LinkedIn channels, so please post your comments over there.
B: So let's begin with our first question. Scott, do you want to provide an update on the Flink Delta connector? I think that's really hot in the market right now.
C: Yeah, for sure, Vinnie, I'd be happy to. So for those that don't know, just this past year we finally completed both the read and the write functionality of our Delta Standalone library. What this library does is let any connector, in a single JVM and without using Spark, actually integrate with Delta Lake, which is great because it's really helping to add a lot more connectors to our ecosystem.
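Conceptually, what Delta Standalone (and any non-Spark connector) has to do is replay the table's JSON transaction log to work out which Parquet files are currently live. The real library is Java/Scala; here is a minimal, self-contained Python sketch of just that log-replay idea, with hypothetical file names:

```python
# Hypothetical commit entries, in the spirit of Delta's _delta_log JSON files:
# each commit is a list of actions; "add" makes a file live, "remove" retires it.
commits = [
    [{"add": {"path": "part-000.parquet"}}, {"add": {"path": "part-001.parquet"}}],
    [{"remove": {"path": "part-000.parquet"}}, {"add": {"path": "part-002.parquet"}}],
]

def live_files(commits):
    """Replay commits in order to compute the current snapshot's file list."""
    files = set()
    for actions in commits:
        for action in actions:
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)

print(live_files(commits))  # ['part-001.parquet', 'part-002.parquet']
```

Any engine that can do this replay (plus write new commit files atomically) can read or write a Delta table, which is why one small library can power so many connectors.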
C: So one of those connectors that we're working on right now is the Flink sink; we're also working on a source as well. The Flink sink is coming along: we're just finalizing the public API, making the API as distinct and useful to people as it could possibly be, and we're finishing up some Javadocs and some examples. So we hope to be releasing that soon.
B: So a follow-up question, Scott. Does it mean that if I have Flink as my system, would I be able to read from Delta Lake, or would I also be able to write to the Delta Lake?
C: So there's both the sink, which is writing, and the source, which is reading. From our feedback from the community, the sink was a higher priority, so that's what we're doing first: we're letting Flink connectors and Flink engines write into Delta. And then soon enough, as we come along with the source, you'll also be able to read from Delta Lake as well.
B: That's awesome. Yeah, so for all the streaming folks out there, if you have, like, lots of streams going on, you know, if you're connected to a stream, this is exciting news. Now you can speed up your queries and make sense of your data using Delta Lake. Exciting, awesome. And then, Denny?
A: No problem. Actually, you know what, Vinnie, why don't you go ahead and figure out a question; I did want to add one little tidbit on behalf of Scott. Scott did forget to call out that his Delta Standalone project is actually so crucial that, in fact, it's not just the Flink connector that builds on it.
A: It actually is the basis for the Hive connector, right, that was recently released, and also the PrestoDB reader, which we announced, I think, sometime in December. So that's already been merged in, and if you're specifically looking for it, it is part of the 269 snapshot of PrestoDB. I don't know why I remember the number, but it is what it is. But nevertheless, it is actually crucial.
A: Also, we're working with the Pulsar team and some other teams using the exact same thing for integrations, so yeah, just wanted to let you know. We actually had a blog post, which I think all of us here wrote, that basically covered what we deemed the ubiquity of Delta Standalone; that actually calls this out. So I just wanted to add that little bit of a call-out.
B: Thank you, Denny, for adding that. So one question for you would be: as you were working on, you know, publishing this roadmap from the community, what are some of the highlights from the roadmap that the community can look forward to?
A: Sure. Ryan, I'm actually going to target you first, but I'll do it this way: why don't we talk a little bit about the optimize features, just because I know you're heavily involved with those. So, basically the performance side: specifically OPTIMIZE, OPTIMIZE ZORDER, and data skipping. Let's start with that with you, Ryan.
D: Yes. So first we will try to open source the OPTIMIZE command, which, for the first motion, will support file compaction: you can run the OPTIMIZE command to compact small files. And in the next step we will support Z-order, for example OPTIMIZE with ZORDER BY. Z-order basically tries to sort your table in a clever way, so we can do, like, much better data skipping.
D: Basically, we are also going to open source the file statistics for Delta writes: we generate the stats for everything you write to the Delta table, and then we will leverage these file stats to do, for example, data skipping. There will be a lot of performance improvements here, and we are pretty excited to see how fast queries can become after all these features are released.
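To make the data-skipping idea Ryan describes concrete, here is a small sketch with made-up file names and numbers (not Delta's actual stats format): each file carries a min/max for a column, and a reader can prune files whose range cannot match the predicate, without ever opening them.

```python
# Hypothetical per-file statistics, like those Delta records for each write.
file_stats = {
    "part-000.parquet": {"min": 0,   "max": 99},
    "part-001.parquet": {"min": 100, "max": 199},
    "part-002.parquet": {"min": 200, "max": 299},
}

def files_to_scan(stats, lo, hi):
    """Keep only files whose [min, max] range can overlap values in [lo, hi]."""
    return sorted(path for path, s in stats.items()
                  if s["max"] >= lo and s["min"] <= hi)

# A query like WHERE id BETWEEN 120 AND 180 only needs one of the three files.
print(files_to_scan(file_stats, 120, 180))  # ['part-001.parquet']
```

The speedup comes entirely from reading less: the more tightly the data is clustered (which is what OPTIMIZE ZORDER helps with), the narrower each file's min/max range, and the more files can be skipped.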
B: Yep, perfect, thanks, Ryan. So there was a Z-order component in the Delta engine which was available in the Databricks Runtime. So if you were already using that co-locating of data through Z-order, that's now going to be available in Delta Lake. Any other call-outs, Denny?
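The "co-locating of data" mentioned above is often implemented with the classic bit-interleaving (Morton / Z-order) trick: sorting rows by a key whose bits alternate between two columns keeps rows that are close in both columns close on disk, which is what makes multi-column data skipping effective. A toy sketch of that idea, not Delta's actual implementation:

```python
def z_value(x, y, bits=8):
    """Interleave the bits of x and y into a single Z-order (Morton) key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # even bit positions come from x
        z |= ((y >> i) & 1) << (2 * i + 1)   # odd bit positions come from y
    return z

points = [(0, 0), (1, 1), (0, 7), (7, 0), (1, 0), (0, 1)]
# Sorting by the interleaved key clusters points that are near in BOTH
# dimensions, unlike a plain sort on x alone or y alone.
print(sorted(points, key=lambda p: z_value(*p)))
```

A lexicographic sort on one column gives perfect skipping on that column and none on the other; the interleaved key trades a little of each for useful skipping on both.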
A: Oh, sure. So basically, if you look at the link (hopefully you all have it; if you need me to send the link again to LinkedIn, sometimes it doesn't come through, just let me know and I'll post it back out there), the roadmap is drawn from the Google Group messages, the community AMAs we've done in the past, and the survey. We had this huge survey last year, where we sent out 700 t-shirts, and you all should have them now (and we will fix that, by the way, so they'll get out faster). We're pretty sure this is what you all want. But it is a discussion, so by all means please chime in on that GitHub repo and add comments on things that we may have missed.
D: Yeah, I can talk about this. Basically, we are adding support for, for example, renaming a column in your Delta table to a different name, or dropping a column: say a column holds sensitive data and you don't need this column anymore, then you can just drop this column. This is, I think, a long-standing ask in the community, because it is pretty hard.
B: Perfect. Greg says Databricks is going to be a long project for him; that's awesome, Greg, we would love to have your feedback on the roadmap as well, as it's been a long-term project for you. Great. There is a question about Kafka Delta Ingest, and the question is: same question for a Confluent Kafka source and sink from and into Delta Lake.
A: No problem. So actually we recently, I think late last year, did a tech talk specifically on Kafka Delta Ingest; I'll try to see if I can find it and post it directly inside LinkedIn here, since the question came from there. So the question basically is: can I go ahead and write from Kafka directly to a Delta Lake? And so, yes, you absolutely can.
A: In the past, a lot of people would often use Kafka through Spark Structured Streaming into Delta Lake, and yes, that's a normal tactic too. But the real ask is like: no, no, I just want to do it natively. And so some of our Delta committers from Scribd and Back Market created a project called kafka-delta-ingest. It's actually built on top of the Delta Rust API, so it's very, very memory-optimized. And so, long story short...
A
It
actually
goes
ahead
and
allows
you
to
take
your
coffee
topics
and
write
directly
to
the
to
delta
lake
following
the
delta
protocol,
and
so
it's
built
on
top
of
the.
As
I
noted
it's
built
on
the
delta
rest
api,
but
what's
interesting
about
the
delta
rest,
api
itself
can
read
it
can't
write,
but
the
kafka
delta
ingest
can
go
ahead
and
write.
A
So
one
of
the
call
outs
that
we
did
in
the
the
the
road
map
actually
is
the
delta
rust
folks
are
working
on
taking
the
high
level
apis
that
are
in
the
kafka,
delta,
ingest
and
merging
them
back
into
delta,
rust
itself
such
that
delta
rust.
You
can
go
ahead
and
read
and
write.
Why
is
that
important?
Because
in
addition
to
the
core
rust
api
being
able
to
do
that,
so
you
now
you
can
use
rust
to
go
ahead
and
read
and
write,
don't
forget
the
rest.
A
Api
also
has
python
bindings
and
also
ruby
bindings.
So
subsequently
that
means
you
can
go
ahead
and
write
to
delta,
using
through
the
rust
api
use
it
with
python
and
ruby.
Subsequently.
Now,
of
course,
there's
still
a
little
bit
work
to
do.
You
have
to
update
the
bindings.
We
will
always
invite
you
all
to
go
ahead
and
join
us
in
the
community
to
talk
to
it.
A
We
actually
have
delta
rust
meetings
every
two
weeks,
so
if
you
want
to
go
ahead
and
chat
with
us,
then
by
the
same
token,
there's
the
delta
rush
channel
just
join
us
there
too.
So
hopefully
that
answers
that
question
I
I
did
want
to
call
out
actually-
and
this
is
the
scott
now
also
on
the
the
roadmap.
We
had
called
out
something
super
important
in
this
case.
It's
the
s3
multi-cluster
rights,
so
scott
you've
been
super
involved
with
that
project
working
with
the
community
as
well.
C: Yeah, for sure, Denny, thanks. So late last year we had an open source contributor make a PR to add S3 multi-cluster write functionality to Delta Lake. What this is essentially solving is the fact that S3 doesn't give us one of the critical things that Delta Lake needs when performing writes, which is mutual exclusion.
C: The way that this open source contributor solved that is by using another external store, DynamoDB, to give us that mutual exclusion. So they created the PR, and I was the engineer on the ecosystem team who was assigned to review it and tweak the design a little bit, and so now we're fully working with them.
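The core trick Scott describes is a conditional "put if absent" on an external store: whichever writer registers commit file N first wins, and every other writer for that version must retry with N+1. A minimal in-memory sketch of that mutual-exclusion idea (in the real connector, DynamoDB's conditional writes play the role of this toy store; names here are illustrative only):

```python
class CommitStore:
    """Stand-in for an external store (e.g. DynamoDB) with atomic put-if-absent."""
    def __init__(self):
        self._entries = {}

    def put_if_absent(self, key, value):
        # In DynamoDB this would be a conditional write; here, a plain dict
        # check suffices because this sketch is single-threaded.
        if key in self._entries:
            return False  # someone else already committed this version
        self._entries[key] = value
        return True

def commit(store, version, writer):
    """Try to claim commit file `version`; the caller retries on False."""
    return store.put_if_absent(f"{version:020d}.json", writer)

store = CommitStore()
print(commit(store, 3, "writer-A"))  # True: writer-A wins version 3
print(commit(store, 3, "writer-B"))  # False: writer-B must rebase and retry
print(commit(store, 4, "writer-B"))  # True: retry at version 4 succeeds
```

This is needed on S3 specifically because S3 has no atomic "create only if not exists" rename; other object stores that do provide it don't need the external table.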
A: Apologies, I'm actually looking at the YouTube link, and since it involves Scott, I'm just going to stick with him and then send it back to you, Vinnie. There's a great question from Oliver: any visibility for the Standalone connector (the one, you know, your project) for Scala 2.13, basically Spark 3.x? Just any visibility on that.
C
So
2.13,
I
believe,
was
actually
added
recently
and
the
next
connectors
release
like
that
will
be
available
and
in
terms
of
spark
integration,
you
don't
need
to
worry
about
spark
integration,
because
the
standalone
doesn't
use
spark
so
no
dependency
issues
there.
So
yeah
the
next
connectors
you're,
not
4.0
release
you
can
you
can
expect
scala
213.,
I'm
great
that
someone
asked
about
that.
Actually,
because
we
were
just
talking
about
that
in
one
of
my
meetings.
So
I'm
glad
someone
cares.
B
Awesome
there
are
a
few
questions
about
what's
the
topic,
so
these
are
the
delta
office
hours,
so
delta
lake
is
a
project
for
the
community
who
are
doing
you
know,
data
pipelines,
so
any
you
know
any
challenges
you
face:
building
delta
lake
or
setting
up
your
pipeline
or
maybe
even
like,
connecting
to
other
tools
in
the
big
data
ecosystem.
Please
bring
that
away
and
we
always
host
the
sessions
we
also
host,
like
you
know
some
demos,
as
well
as
tech
talks
on
specific
topics.
B
So
if
you
are
just
getting
started-
and
you
want
to
learn
more
about
the
delta
lake
project,
you
can
definitely
you
know,
tune
in
on
our
youtube
or
tune
in
to
our
slack
channel
google
groups
etc.
B
Hope
that
answers
some
of
your
questions
on
the
comments
and
then
there
was
a
question
about
I'm
currently
needing
to
transform
the
spark
delta
data
frame
to
pandas
data
frame
to
torch
for
modeling.
However,
the
data
size
is
too
big
for
trying
to
call
two
pandas
function.
Is
there
a
way
to
get
around
this?
I
think
the
question
is
around
like
how
you
can
use
the
pandas
frame
for
the
big
data,
so
I
posted
a
link
on
this
last
youtube
channel.
B
Please
check
that
out.
There
is
a
config
release
on
the
pi
pi
page,
but
anybody
from
the
panel
have
you
seen
this
issue
occurring
with
big
data
on
pi
data.
Sorry,
on
python,
api
is
koalas
the
right
option,
danny.
A
It
should
be
qualis
is
the
right
option,
though.
We've
called
it
now.
The
pan
is
api
for
spark,
but
this
is
a
delta
lake,
since,
let's
skip
the
spark
questions
for
now,
if
that's
okay,
we
have
spark
sessions
separately
for
that,
so
let's
definitely
stick
to
delta
lake
yeah.
So
if,
if
there
are
no
other
questions,
one
thing
one
thing
we
can
definitely
do
is
also
talk
about
integrations
if
you'd
like
and
by
the
way
I
did
notice
folks
on
linkedin,
there's
there's
actually
two
small
problems.
A
One
thing
I
just
don't
realize
that
the
linkedin
actually
still
says
databricks
university
alliance,
so
maybe
that
may
have
explained
why
there
there
was
some
confusion.
So,
yes,
this
is
what
we're
currently
talking
about
delta
lake
and
then
shiv
our
one
of
our
old
buddies
from
our
previous
community.
Amaze,
yes,
go
ahead
and
ask
your
question:
please
pop
it
down
linkedin!
Meanwhile,
let
me
go
ahead
and
at
least
provide
a
little
context
on
the
integrations
aspect
of
the
the
delta
roadmap.
A
Okay
and
so
as
as
we
already
noted,
we've
already
talked
about
presto
db
and
we've
already
talked
about
flink.
Okay,
now
the
to
be
being
a
little
bit
more
specific
when
it
comes
to
flink.
What
we're
doing
is
we're
talking
about
it
from
the
standpoint
of
the
streams
api,
which
is
flank
1.12,
1.13,
well,
more
1.12,
anyways,
we're
also
currently
working
in
terms
of
working
on
a
timeline.
A
I'd
say
around
q2
q3,
the
flink
sync
for
the
table
api
as
well,
which
which
is
basically
we're
talking
about
1.14
1.15
versus
apache,
flink,
okay,
so
so
for
those
folks
that
are
interested
inside
the
github
repo,
we
actually
also
link
directly
to
the
flink
delta
connector
channel.
So
you
can
talk
to
us
and
work
with
us
there,
whether
you
want
to
test
you
whether
when
to
contribute
things
of
that
nature.
A
Okay,
there
is
an
apache
data
delta
source
for
apache
pulsar,
the
the
beta
code's
actually
already
up
and
running,
but
they're
updating
it
to
work
with
the
latest
version
of
delta
standalones
for
member
optimization.
So
once
that's
done,
we'll
be
able
to
announce
that
as
well
so,
but
the
code
base
is
actually
already
sitting
in
the
apache
pulsar
repo.
Let's
see
what
else
we
have.
A
Oh
just
as
scott
noted,
we
also
are
talking
about
the
delta
source
for
apache
flink,
okay,
I.e
the
ability
for
apache
fund
to
read
from
delta
right
now.
The
thing
we've
been
talking
about
is
writing
to
apache
flint
for
apache
flink
to
right
to
delta.
Now
we're
going
to
be
talking
about
delta
being
able
to
apache
flink
to
read
from
delta,
so
that
is
the
delta
source
for
apache
flank.
That
is
also
targeted
for
q2q3
time
frame
as
well.
A
Okay,
I
already
talked
about
the
delta
rust
writer,
so
that's
cool
with
that
one,
but
then
the
other
key
thing-
and
actually
this
is
just
literally
an
update-
the
community
we've
specifically
florian.
So
I'm
going
to
call
our
buddy
florian
and
hopefully
I
can
get
him
to
come
to
our
next
community
ama.
A
So
he
can
go
ahead
and
ask
questions
so
because
he's
much
more
knowledgeable
design
than
I
am,
but
we
actually
have
started
working
on
the
idea
of
a
delta
source
for
google
bigquery,
okay,
so
that
basically
allows
bigquery
to
natively
read
delta
lake
tables.
Okay,
so
we
actually
already
created
a
delta
bigquery
connector
channel.
I
think,
but
there'll
be
more
about
that
soon,
but
the
we're
targeting
also
q2q3
but
florian
is
taking
the
lead
on
this
one
he's
one
of
the
delta
committers
who
is
based
out
of
back
market
in
france.
A
B: Great, thanks for the insights on the connectors, Denny. Now, the next question is: any plans for point-in-time join queries to be supported natively by Delta?
D: Maybe I think we need some clarification about what point-in-time join means. If it is a Spark feature, probably it should be asked of Spark, not Delta Lake. I think, basically, Delta doesn't do, like, the SQL planning of this stuff; we basically rely on Spark to build the SQL plan and execute it, and then just do the read from Delta or the write to Delta.
B: Sorry about that. Okay, so I think the next question is: "When I Z-order a table and do a SELECT statement on it using my Z-ordered column in the WHERE clause, it is super fast. But if I do a WHERE value IN and then put in the list of values, it takes the same time as individual queries would take. Why is this so?"
B: Got it. Thank you, Ryan, that's a helpful suggestion. And he also responded that, you know, we can skip it if it's not relevant, and he can reach out to us on Slack. Thank you, awesome. And then there are a few questions about Databricks-specific Delta, so I'm going to post a link in the LinkedIn chat where you can ask and get involved in the Databricks community as well. So let me, you know, take the next question.
A
Yeah,
I
suggest,
being
sickly,
because
we
only
have
two
minutes
left.
Why
don't
we
just
answer
the
last
one
more
question
and
I
just
noticed
the
question
youtube.
So
why
don't
we
just
do
that?
One
which
is
alexandra
yeah
had
basically
said
he
saw
zeorda
on
the
robot,
it'll
be
open
source
and
yes,
the
the
simple
answer
to
that
question
is
yes,
it
will
be
open
source
we're
targeting.
A: OPTIMIZE file compaction is targeted for Q1, and Z-order specifically for Q2. So yes, it is; by all means, please go ahead and chime in directly on the GitHub, or the Delta users Slack for that matter.
A: By the way, there is a 2022 H1 roadmap channel as well, so you can even just chime in there specifically to ask us questions on that. But go ahead and chime in there, because, yeah, it is unequivocally being open sourced; we are working with the community rapidly on this as well. So yeah.
B: Yes, thank you, Denny. And I posted some links in the chat; hopefully you find them useful. I think a lot of your questions were around, you know, some connectors as well as the latest in the roadmap, so please follow us on those links and just have a discussion with us. Another cool thing, you know, as Denny mentioned: I think in Q1 the main highlight is data skipping, and I'm really excited for that feature. So anyway.
C: I'll just say thanks for having us, Vinnie; glad to answer everyone's questions, and see everyone in two weeks.