Description
As a follow-up to our session "Why did we migrate to a Data Lakehouse on Delta Lake for T-Mobile Data Science and Analytics Team," Robert Thompson and Geoff Freeman, Members of Technical Staff at T-Mobile, continue their in-person discussion with Denny Lee on how their data lakehouse improves their data science and data analytics efforts.
Quick Links
Blog: https://delta.io/blog/2022-09-14-why-migrate-lakehouse-delta-lake-tmo-dsna/
Join us on Slack: https://go.delta.io/slack
Join the Google Group: https://groups.google.com/forum/#!forum/delta-users
A: We're just waiting to get the LinkedIn and the YouTube links up and running, but we are actually live together at the T-Mobile campus in cloudy Factoria, in Bellevue, Washington, so give us one or two more minutes to get ourselves up and running. Chime in through LinkedIn, YouTube, or WebEx, and tell us where you're based out of. Like I said, the three of us are live together, based out of the cloudy Bellevue, Washington area, in Factoria.
A: All right, so, sorry, since I haven't gotten the LinkedIn and YouTube links yet, I'm just wondering.
A: All right, if you can just ask Carly to send me the LinkedIn links, then we're good to go.
A: Send them to me by Slack. Are we good to go? Let's see, okay, perfect! Thank you kindly, all right. Well then, let's start the show while we're waiting for everything. Hi there, my name is Denny Lee. You are currently live from cloudy Factoria, Bellevue, Washington, at the T-Mobile campus.
A: This is "Implementing a Data Lakehouse for Improved Data Science and Analytics at T-Mobile." I'll start off by having the gentlemen here to my left and my right, or however you see it, introduce themselves: a little bit about who they are and why we're even talking to them today.
B: My name is Geoff Freeman. I'm a solutions architect on the procurement team at T-Mobile. I have been here for about three years. Before here, I was at Cruise for a little while, building autonomous vehicles, and before that I was at Microsoft for 13 years. My background is in cloud computing and data warehousing.
D: I am Robert Thompson. I'm also a solutions architect here at T-Mobile, on the same team. Our desks are side by side.
A: All right, well, hey! We actually had a webinar, I want to say a year ago now, but let's rehash a little bit. I actually want to start that rehash by having each of you share a little bit of your history, because, as opposed to me, which is very boring, you all have very interesting histories on how you got into this world in the first place. So, Robert, since you spoke last, why don't you start off a little bit?
D: You know, computer science is what I've always wanted to do. I got a computer when I was like 10 and was pretty much self-taught. I went to school in Louisiana, at Louisiana Tech, for computer science, did that for a little while, and then left and went to LA and worked on some planes. I worked on the Prince of Saudi Arabia's plane; that's the most famous one.
A: Yeah, that's your history! Let's see if we can shut off your video, because it looks like your video is interfering, and it's only showing yours as opposed to the panel. So it looks like the camera is turned off, right? Or is the camera on right now? Of course, you see it.
A: At least right now, for some reason, we can only see you, Robert, on the restream that's going out to LinkedIn, so I want to shut off yours just to see what happens. No! No! I'm sorry.
A: So I'm saying, why don't you shut it off and see if that helps. Sorry, we're still working on the technical stuff; let's see if this actually helps us. Okay, bring it up. If only we had some technical people in here. I know, I wish we had some people that knew what they were doing, but apparently that's not us.
A: Yes, apparently not. All right, so right now we're going to wait a couple of minutes just to see if this takes, because there's usually about a minute of delay on what's broadcast to LinkedIn.
A: Oh, it looks like we're good. Okay, all right. We are okay. For people who are on LinkedIn and YouTube, please validate and let us know that you actually are seeing us right now. I believe you are now, but we want to confirm that, so just chime in in the comments, please.
A: Logged into LinkedIn and YouTube... I haven't done that yet, so all right! Well, meanwhile, as I wait for that: Geoff, why don't you talk a little bit about yourself, your history, how you got involved in all this?
B: What do I like to do? I like to cook and I like to play with computers, and when I did a little pros-and-cons list, it turned out it was pretty heavily weighted toward the computers. So my first programming job was at a startup called Complete in Portland, Oregon, where I was the back-end and data-access-layer programmer. I worked there for a few years, and then I decided I was going to move to Seattle just to see what would happen, and I got a job at Microsoft, working for Sasha Berger and Mosha Pasumansky, and the first thing they said was: hey, we're building a distributed version of Analysis Services.
B: And it's about that time that I got introduced to Denny, and Denny had some experience building absurdly large things at that point.
B: Yeah, that would be the pot calling the kettle black. And then I worked for Bing for a little while. That's where I met Robert, and that's where Denny gave us, like, the largest machine that had ever been built for testing out tabular models.
B: Yeah, and then, while I was there... so I worked on that for Bing for a while, and then I was working in Azure on the Data Lake Analytics team, which, for anybody who used U-SQL (all three of you), taught me a lot about cloud data processing. From there I went to Cruise, where, basically, the amount of data that was coming off of these cars was just truly absurd.
B: So I basically built giant data shovels and a catalog, and when I left there, Robert was telling me he was doing some interesting things, having some interesting problems, at T-Mobile, and I decided, you know...
A: Right, which is basically just a small rehash for folks that didn't watch the previous video: what were the problems that you originally were facing? And when Robert said, "hey Geoff, come join us," what was the transition that you ended up making?
D: To start with that: we had a bunch of data silos, and no one could tell you the source of truth for a particular attribute. Like, where did this come from? "I got it from here." Well, where did that come from?
D: So there was no way to know the source of truth for pretty much anything. I mean, you had a couple of source systems, but it was tough to get access to those systems. You were told to go here, or here, or here, and then...
B: The big thing, the big source of contention that we had, was that we had analytical workloads and operational workloads trying to run at the same time on this Azure data warehouse that, you know, isn't really meant for that sort of combination of workloads. And the first thing we identified is: hey, we can't scale this up any further; or rather, it wouldn't be worth the money.
B: We can still use the data warehouse for presentation, but we'll just move those ETLs off. So we did a proof of concept with one of our allocation programs where we just did it all in Spark: we compiled it all to Spark using Databricks, output it to Delta tables, and then pointed the actual program that had been reading from the warehouse at those Delta tables instead, and we saw huge, immediate benefits like that.
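The proof of concept Geoff describes, with ETL jobs writing Delta tables while downstream programs read from them, works because Delta tables are versioned: a committed write publishes a new table snapshot without disturbing in-flight reads, which is what removes the read/write contention they were hitting on the warehouse. A minimal plain-Python sketch of that idea follows; the class, the `tower` rows, and all names are illustrative, not T-Mobile's code and not the actual Delta Lake API.

```python
# Sketch of a Delta-style versioned table: an ETL writer publishes a whole
# new version atomically, while readers keep an immutable snapshot of the
# version they started with, so ETL writes never block analytical reads.

class VersionedTable:
    def __init__(self):
        self._versions = [[]]          # version 0: empty table

    def latest_version(self):
        return len(self._versions) - 1

    def read(self, version=None):
        """Readers get an immutable snapshot; concurrent commits don't affect it."""
        if version is None:
            version = self.latest_version()
        return list(self._versions[version])

    def commit(self, rows):
        """An ETL job publishes a complete new version in one atomic step."""
        self._versions.append(list(rows))

table = VersionedTable()
table.commit([{"tower": "T1", "status": "planned"}])     # ETL run 1 -> version 1

snapshot = table.read()                                  # analyst query starts here
table.commit([{"tower": "T1", "status": "built"}])       # ETL run 2 -> version 2

assert snapshot == [{"tower": "T1", "status": "planned"}]  # reader unaffected
assert table.read()[0]["status"] == "built"                # new readers see version 2
```

Real Delta Lake implements this with a transaction log over Parquet files, which is also what makes point-in-time reads of older versions possible.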
A: So, when you talk about scale, I'm curious: what are the type of numbers we're talking about, like the amount of data?
B: Honestly, the amount of data for that is not huge. We're talking about, I don't know, P3 overall was probably analyzing 30 gigabytes of data, right? The problem was not the amount of data. The problem was the workload contention that we were seeing on a dedicated warehouse.
B: Right, yeah. The major problem we were having was concurrency, both in the number of users but also in the types of ETL, which would frequently block...
A: Each other, right. Gotcha.
B: Also, we would have operational workloads blocking the ETL, and then analytical workloads were waiting on the ETL to run. So, how do we continue to support these operational workloads that people had built organically, without necessarily having the authorization to do so? I mean, not that it was prohibited, but when all this data was centralized here, a lot of it had been centralized for the first time, and people just went: oh my God.
B: "We can build such powerful reporting off of this!" So they would go build reports without having, like, an SLO agreement, right? But by the time they were told, hey, you know, we did not agree to support this, right...
B: So the combination of analytical workloads and operational workloads would cause all this contention, both from the number of people that you can have concurrently querying a data warehouse, and from the simple question of whether the workload is going to complete, right?
B: So the biggest thing, the biggest driver for us to move to a lakehouse architecture from the very get-go, was just workload isolation, right? The fact that it scales out linearly instead of up was also a huge bonus. The fact that now we don't have to do as much data engineering to move data around was a huge bonus. But the real crux of what drove us in that direction was concurrency.
A: So I guess there's an implication also that your environments are very, very heterogeneous to begin with anyway, right? You've got lots of different systems trying to access this data, lots of different workloads trying to process updates into the same set of tables, and you also have a very diverse environment that's actually querying these tables at the exact same time. Can you provide a little bit of context?
D: You have to pick the proper location to build the tower, because if you build it in the wrong spot, and the fiber is on the other side of the river, and you have to drill a hole under the river in order to run the fiber, that's a problem, right? So all of these things go into, you know, where should I build the next tower?
A: So basically you're saying that, from the standpoint of the data that you're centralizing: before this, everybody had to go to very different environments to do the logistics of even building a tower. So it's like, okay, I can access one system that tells me whether I'm legally clear to purchase this land, and I have to access a completely different system that says where the heck the fiber is, yeah. And that's just the simple problem, let alone everything else, right? Yeah, okay, got it.
B: You know, DocuSign or whatever for the contracts. T-Mobile until, like, 2017 was just a marketing company with a technology side, yeah.
A: Not to mention, I'm just thinking to myself, different states and different counties have different records, with different formats, that provide you the information just on the land leases, let alone the locations of the land, let alone the rights versus the legal documentation that goes with it. Yeah, they're all in different forms, with the complete lack of standardization that we all know. If you go to a single county, even within the county it's actually going to mess you up, let alone going across counties or across states. Okay, wow.
A: Yeah, right, because you need to basically somehow standardize all of this data so people can actually go query it. So that's why your ETL processes were so complicated; or, not were, are still complicated, because it's actually about trying to standardize and conform all of the... forget about any of the facts, just the dimensions alone.
A: Okay, so, damn, okay. So that's why it was crucial to have this lakehouse, because, basically, before this you actually had to go to all these different systems. In order to be able to go from 300, sorry, 100 a month, to 1,500 a week, that wouldn't even be possible. It's not even physically possible to query all of that. It wasn't.
A: In every business, there's always that one person, that one guy or gal, that basically has the Excel spreadsheet. You know, this goes back to our Power Pivot and Power BI days: the one guy or the one gal, right, that knew the business domain, and he or she would be able to go, oh yeah, I'm gonna grab these 20 different sources and process these things; I want to create this one spreadsheet that merges it all together, and then, boom, you're good to go. Except...
A: You know, if we were just having the conversation to be like, cool, I get to run another cool pivot report, an Excel report or a Power BI report. But obviously that's no longer the case for you guys, right? You actually have other scenarios, other problems that you have to solve. In other words, what you did per the last webinar was describe, in essence, how you were able to scale up from 100 a month to 1,500 a week. Okay, great. What are the problems now?
B: We have got our system to the point where it's easy for us to ingest new data, take that data, and run ETLs that are going to shape it and present it. The two big problems that we're having right now are: one, how do we get the rest of the enterprise to do that?
B: That's one track, and the enterprise is going down that path with some velocity, which is great. And the other path is: how do we then make this data easily discoverable, and give people confidence in the quality of that data, where it's come from, and what it means? Okay.
A: So let's actually break those two things out. Absolutely. So, with data discoverability, the last thing you just said, and before that, you were talking about basically making this available for the enterprise, right. Is this, by any chance, the templated CDF stuff that we had been talking about? Okay, so yeah, let's definitely focus on that. Let's actually talk about that for the audience.
D: We're actually trying to templatize our platform altogether, so that we can hand off templates and say: okay, secure team over here that does marketing, sure, you should also build the same type of platform that we have in order to service that domain.
B: Got it. The benefit now is that all of your data is in Delta tables, which makes it really easy for us to share it across teams without having to move stuff, right? So you get all the data mobility off of that. The templatized change data feed, though, that we were talking about before, is: how do we use the power of Delta tables and the change data feed? Because there are still systems that require their own back end, or that require some other way of reading data.
B: Whether that's an ERP or some sort of web app or whatever, they need a live database that they can go query. So how do we take what's in the lakehouse and most efficiently get it there? What we just built the first version of, for our system, is reading from the change data feed. So, at what level of expertise are we talking to here, with everybody? If people need me to explain the change data feed, I can.
B: Yeah, so with Delta tables, as of Delta 2.0 (great job, Denny), yes: we've always been able to see a point in time with a Delta table, but Delta Lake 2.0 introduced the change data feed, which makes it possible not just to say "show me the state of the table at a given time," but "show me everything that's changed since that time." So you can say: the last time I loaded my table was yesterday at noon.
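To make that distinction concrete, here is a toy model in plain Python, not the actual Delta Lake API: time travel answers "what did the table look like at time t?", while the change data feed answers "what changed after time t?". The commit log, the integer timestamps, and the rows are all illustrative; the change-type names are loosely modeled on the feed's insert/update/delete row types.

```python
# A hypothetical commit log: (timestamp, change_type, row).
commits = [
    (1, "insert", {"id": 1, "status": "planned"}),
    (2, "insert", {"id": 2, "status": "planned"}),
    (3, "update_postimage", {"id": 1, "status": "built"}),
]

def snapshot_at(t):
    """Time travel: replay the log up to t to reconstruct the table state."""
    state = {}
    for ts, change, row in commits:
        if ts <= t:
            state[row["id"]] = row
    return state

def changes_since(t):
    """Change data feed: only the changes after t, not the whole table."""
    return [(change, row) for ts, change, row in commits if ts > t]

# "Show me the table as of time 2" vs "show me what changed after time 2".
assert snapshot_at(2) == {1: {"id": 1, "status": "planned"},
                          2: {"id": 2, "status": "planned"}}
assert changes_since(2) == [("update_postimage", {"id": 1, "status": "built"})]
```

The practical win is that a downstream consumer only has to process the (usually small) change set rather than re-reading the entire table on every load.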
B: Show me all the things that have changed since yesterday at noon, and then you just get a little output that says: this row was added, this row was deleted, here's the new version of this row. And you can load that into a staging table in your system and then do a merge. Depending on, you know, whatever type of system it is, the syntax may differ, but the concept is the same: I'm going to create a table that looks just like my Delta table or my Parquet file...
B: ...that says, here's all the things that have changed. You load it into that staging table, and then you use a merge based on whatever keys you have on that table. And instead of every time having to have, like, a rigid ETL, where you have to have strict standards of timing or strict standards of structure, you can just point it at a table, make a table in the database...
B: ...that looks just like it, and then, on a regular basis, you run something that keeps it in sync, and it becomes very, very easy to say: all of these tables that I have in my...
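A plain-Python sketch of that stage-and-merge step follows. In a real target system the apply step would be a SQL MERGE (or the equivalent upsert) keyed on the table's primary key, and the staged rows would come from Delta's change data feed rather than a hard-coded list; the change-type names and `id`/`status` columns here are illustrative.

```python
# Hypothetical change-feed output staged since the last load, as
# (change_type, row) pairs: one insert, one update, one delete.
staged_changes = [
    ("insert", {"id": 3, "status": "planned"}),
    ("update_postimage", {"id": 1, "status": "built"}),
    ("delete", {"id": 2, "status": "planned"}),
]

def merge(target, changes, key="id"):
    """Apply a staged change batch to the downstream table, keyed on its primary key."""
    for change, row in changes:
        if change == "delete":
            target.pop(row[key], None)
        elif change in ("insert", "update_postimage"):
            target[row[key]] = row        # upsert: matched -> update, else insert
        # rows carrying the old values (the "pre-image" of an update) are
        # not needed for a merge, so they would simply be skipped here
    return target

# Downstream table as of the previous sync.
downstream = {1: {"id": 1, "status": "planned"},
              2: {"id": 2, "status": "planned"}}
merge(downstream, staged_changes)

assert downstream == {1: {"id": 1, "status": "built"},
                      3: {"id": 3, "status": "planned"}}
```

Because the merge is keyed rather than positional, the same generic routine can be pointed at any table with a declared key, which is what makes the pattern templatizable.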
A: ...about getting it across to the enterprise: getting all these folks to be able to jump on board so they can leverage exactly these systems, so they can go and do their own scaling, not just operational data scaling and concurrency, but also their own business scaling. It goes back to, I'm going to repeat it again, 100 towers a month, to 1,500 a week, right? That's a massive jump that still blows my mind.
B: They just set up their Power BI or their Excel or whatever to point to our serverless endpoint, and they can query the data lake directly, right? But if they do have something more, say they're setting up machine learning that's going to take all of the towers that are going online and then find all the people, you know, based off of Pitney Bowes data or whatever, then all they do is they set up...
A: We've had this specific meeting any time we're talking about the fact that people need to be trained into this environment. There are very many vendors, which shall not be named, that will make the attempt to say: oh, let's take a DBA or an SSAS dev, and they can automatically be a data engineer or a data scientist.
A: Yeah, that's never going to work. Everybody who's trying, who wants to learn, absolutely they can learn that, don't get me wrong. In other words, if you're a former SSAS dev or a former DBA, absolutely you should. Yeah, absolutely; we're not discouraging people, quite the opposite, we're encouraging people. We just also don't want to tell people: oh yeah, magically, tomorrow, all of a sudden, you're a data engineer. I apologize for throwing that one in.
B: In all seriousness, T-Mobile is a big company with a lot of moving parts. It's gonna take a while, but there's a lot of recognition of the value of this, and so there are a number of teams that, even if they're not going to do it exactly like we have, have already moved in this direction. Right.
B: That's important, actually, because it's already showing the value. Like, when teams have their stuff in Delta tables, then in order for us to share that data, or combine it, all we have to do is grant permissions. It blows people's minds, yeah. I'm not kidding, people are like: oh my gosh, that was it? It was that easy? Yes: store your data in a data lake, in Delta tables, and sharing data becomes, like, a click of a button.
B: There are, you know, obviously pieces around that. You need to have the governance in place, right? You need to make sure that the way in which you're sharing data, the way you're controlling access, is something that you can audit, that you can rationalize, that you can standardize.
B: Yeah, yeah, but the key there is that now you're talking about a procedural hurdle to sharing data, and not a technological one. Right, right.
A: Make sure that when you share the data, you know exactly who you're sharing it with, how long you're sharing it for, the whole thing. And so, as much as people want to tell you, "oh, I'm frustrated": no, no, that's the whole point. You actually have both the scalability, and your frustrations are now policy, and that's a good frustration to have, let's be very clear.
A: So, okay, in some ways I've already implied why data discovery is so important with that last diatribe of mine. So yeah, why don't you take it, instead of me doing the diatribe.
D: So, in order for your users or your developers to trust the data: if they build a report and they go to an executive at some point, the executive, every time, is going to say, "where'd this data come from?"
D: We're going to show: this is where the data came from, and this is how it got to you. We're going to show every step of the way, and how that attribute got its number or its value, right. And that's what we're working on; that's what we're trying to build into this whole system now, and it's being a little cantankerous.
A: Like, with a standard SAP or ERP system of some type, right, where I've got like eight different sources, REST APIs or whatever else, and data warehouses that I'm dragging data in from, the lineage on just that alone is already relatively complicated. In your case, it seems like it's a permutation of God knows what at this point.
B: You had a bunch of the people who were the domain experts doing stuff in Excel on their desktops. Those processes, even though they might have sourced them from, or pointed them back to, the data that was centralized in our Azure data warehouse at first, still live on, right? They're still driving their business off of that. But what's happened, once you have all that data in the same place, is that previously those people were answerable to nobody, right? Who's going to tell them that their data is wrong?
B: But now that you have all this data together and you can do reconciliation, right, there come those questions about, oh, what is the correct way for this to actually be bubbled up, and how should it actually be structured? And even as you create the correct answer to that, their reports still live on, yeah.
B: Right. And now that all this data is centralized, anybody can just go find it; you know, if you have access, you just go find it, and you're like: oh, this is the way that so-and-so has been doing their reporting, and they gave me access to their view, and now I'm going to use that and go build something else off of it. The challenge there is...
B: ...your golden tables: how do you direct them to your golden tables, right, amongst all of these different things that have been floating around? How do you socialize that? And so one of the big things that we're working on right now is using Databricks Unity Catalog. I promise I'm not shilling for Databricks; I do not get paid. I think it's quite a lot in the other direction, actually.
B: Well, I mean, compared to the other options it's not bad. But because we're...
B: ...doing our ETLs in Databricks, we're onboarding to their Unity Catalog. We've already had a data catalog, but Unity Catalog is going to make it a lot easier for us to point people to: this is the actual source of truth, and this is the easiest place to get it. But it also has all of that lineage that Bobby was talking about, where all we have to do is export that into whatever the enterprise data catalog system is.
B: And with that, it becomes possible for people to recognize: oh, I've been reporting off of, you know, Dougie's view, that just drives his particular business and nobody else's. And they can see that it came from here, and it came from here; instead, here's the actual, like, business-approved view, and we can go redirect from there, right.
B: And so this becomes great for reporting, but the place where we're really looking forward, toward the future, is that this makes it possible to ensure the data that you're using to drive any ML processes, when you move from "we're reporting on what happened" to "we're going to predict what we think is going to happen," as a batch, in the future...
B: That's the piece that we're looking forward to with that. Discoverability is not just "how do we write better reports for SOX compliance," but "how do we make sure that the automation is what we think it is," right?
A: So one of the things that I definitely like talking about, especially now that we're shifting more toward the machine learning side of the house, right, is that there's a term coined as "explainable AI," which is basically this idea that I'm going to run a machine learning model. That's great!
A: And I have some idea what my features are, but I still need to be able to explain how I came to the conclusion that I got in the first place, because, most of the time, a lot of machine learning models are black boxes, right? So you have no idea, and I'll use a decision tree in context, right: you have no idea why it says, like, alpha is less than 0.2.
A: What does that even mean? To say that, okay, for the sake of argument, we're gonna go target a marketing campaign for Factoria, for some reason, you know; that type of thing, right? So even our attempts with traditional, like, not even general, easy machine learning models, there we go, like decision trees, comparatively, versus AI, especially, like, your LLMs or whatever else: we can't explain all that stuff unless we actually have full lineage, full policies, and one central place to go.
D: That is. So, having this lineage, even when you get down to the features, right, like the next column... One of the Databricks guys said it the best, and I forgot who it was, but he's like, "machine learning is just another T." Yep. Yes, it's just another T, and I was like, that's perfect, I'm going to steal that. And yes: the output of that T is another table, and the feature is just a column in that table.
D: Right, like, let's just break it down and oversimplify it, right? So, having the whole thing discoverable from end to end, all the way down to this next column, so the next guy can build another T on top of the T on top of the T, right... I mean, just being able to explain that all the way through is powerful, right? I can't think of a better word, but we'll...
B: How do we get there? To push back on people wearing Databricks hats just a little bit: I don't think we're actually going to, like, centralize it all in UC (oh yeah, amazing catalog), because a big part of the automation that we have built is in Azure Data Factory. So, unfortunately, ADF doesn't export...
B: Well, so he's building something that's going to output it to OpenLineage, but...
B: ...a job of exporting that lineage. So people will still go to Unity Catalog to discover data, but from a lineage perspective, I don't know if we're going to have it in an enterprise-wide catalog.
B: Our team has our own data catalog. But to bring it back to Delta tables: I mean, you could do the same thing with Astronomer and Datakin, or whatever it is your orchestration engine is. The really cool thing about Delta tables is that they're easy to automate and scale out, and as soon as you're doing it in a consistent manner, you're only having to figure out your lineage from one engine, right?
B: If every team has their own version of whatever, like, if every pipeline is custom, then it's going to be impossible for you to standardize that reporting, right? But when you centralize on one technology, like Delta tables, and then you scale it out using automation, using whatever your automation engine is going to be, right, it makes it possible to turn your attention to other things. Right, right.
A: No, I appreciate you doing that callout, because that's almost the whole point. You're gonna basically choose whatever technologies you're able to, whether it's a limiting factor or whether it's an "oh, now I can do something that I couldn't do before." The whole point is that you're not limited: you can just do what you need to do in order to be able to focus on the next thing, and then the next thing, and the next thing, so you can make this whole automation concept a lot simpler.
B: Yeah, exactly, yeah. One of the things I like to tell people on our team is that I want to code myself out of a job.
B: And so, you know, the frustration that people might have with "oh, now I have to go deal with governance" or, you know, the process type of workflow, is evidence that you've done a good job coding yourself out of a job, yeah. And now you get to turn your attention to this process piece, which might not be as exciting as data engineering, depending on what sort of nerd you are, but...
A: Yeah, actually, with that, I think that's a great segue to end it. So, no, no, literally: the lakehouse allows you to focus on more interesting problems, so I think that's a great way to close. If you have any other questions, please do chime in. By the same token, I think we're actually technically even out of time, so...
A: Thanks for helping us out, all of us, and Carly behind the scenes, thank you very much as well. So, without further ado, that should be it; we're done for today. Thank you very much, everybody.