From YouTube: NEW! Lineage Impact Analysis
Description
Gabe Lyons & Dexter Lee (Acryl Data) give a demo of Lineage Impact Analysis - using DataHub to understand the impact of changes on downstream dependencies.
Learn more about DataHub: https://datahubproject.io
Join us on Slack: http://slack.datahubproject.io
Follow us on Twitter: https://twitter.com/datahubproject
A: So, continuing the lineage theme, Dexter and I are going to talk to you about a new feature: Lineage Impact Analysis. I'm really excited to talk about this feature at town hall, especially because I know it has been requested so much by folks in the community, and I'm really happy to be able to finally present it. We've incorporated a lot of that feedback, and I feel like this has been a really collaborative project between us and all sorts of folks in the community. So thank you, and I'm excited to demo it.
So what is Lineage Impact Analysis? Essentially, for a dataset, you can now view a collection of all of its downstreams together in one grouping. Not just that first layer that we were showing you previously, but all the layers, brought together into one collection.
A: Within this collection you can browse, but you can also filter, using all the different types of filters you're used to from search. You can filter by tag, platform, and entity type, and, with the recent update we shipped, by owner. There's also a free-text search box to search across all of the downstreams. And we added a new filter just for the impact analysis section: the level of dependency.
A: So you can say, for example, that you only want to see things that are two or three layers away from the current entity you're looking at. Finally, once you have that collection and you're viewing it as one thing: what are you going to do with it? We've added the ability to download the collection you've filtered as a CSV, so you can take it with you and drive any other action based on it.
A: On the motivation front, I think about two main use cases here: a proactive side and a reactive side. The reactive side is the DataOps use case. Maybe something's gone wrong with your dataset and you need to know: what do I do about it? Who do I alert? Who do I inform? How do I make this right? So you can go in and find who's depending on your dataset.
A: The proactive side is similar: who do you have to discuss a change with, and who might be impacted? If you're going to deprecate a dataset, change a column, do a backfill, or something like that, who do you need to reach out to? And again, thanks to the community for the great conversations that helped us understand how to build this feature and what exactly impact analysis should be.
A: He was super helpful in helping us clarify how we could make this most useful. All right, so now I want to go into a live demo of this feature so you can see it in action. I've made this raw events Kafka dataset. Maybe this is some event stream that we have going on, and all of a sudden I realize the event stream is delayed and I need to know what to do about it.
A: Now I can jump into the Impact Analysis section of the Lineage tab and see not just that first layer of lineage, but all the layers beyond it. So I jump in, we do a little search across the graph, and actually I have to give a shout out to ibru: this is their little animated gif that they contributed.
A
Now
is
our
loading
indicator
for
the
impact
analysis
section,
and
this
after
I
jump
into
that
impact
analysis
view.
I
see
all
the
different
entities,
all
the
different
layers
deep.
So
you
can
see
here
in
the
top
right
corner
of
the
entity.
Pro
is
how
far
away
the
connection
is
from
this
source
data
set.
So,
for
example,
dim
users
is
a
few
steps
away
and
I
can
open
this
filter
panel
and
say
you
know.
A: So that's essentially the end of the demo, but as one final easter egg: we thought this download-as-CSV feature was super cool, and so something we did just for fun was add it to the search page. If I search for raw events here, you'll see that same menu, and I can download any group of search results as a CSV.
A: It also shows up in all the different embedded search elements. So if I go, for example, to my ownership page, you can download a CSV there too, and whatever filters you've applied will be included as well. I'm really excited to hear your feedback on this universal download-as-CSV, so let us know what you think, and of course on impact analysis as well. Yeah, so that's the demo.
A
What's
on
this
slide,
but
it's
also
determined
by
you
folks
in
the
community,
so
please
let
us
know,
try
it
out
and
I
really
look
forward
to
reading
your
feedback
now
I'll
hand
it
over
to
dexter,
and
he
can
talk
about
the
amazing
engineering
and
architecture
that
went
into
this
feature
before
you
switch
over.
Would
you
mind
talking
about
the
api
really
quickly?
Oh.
A
Yeah,
so,
in
addition
to
being
able
to
view
this
through
the
ui,
we
also
have
an
api
that
you
can
get
to
query
and
get
this
information
and
in
the
api,
it's
not
just
those
columns
that
you
see
in
the
csv,
but
you
can
fetch
any
metadata
about
these
entities,
so
you
can
programmatically
say
you
know.
I
want
to
get
all
the
down
streams
across
all
levels
for
this
data
set
and
just
give
me
you
know
you
can
provide
filters
and
search
queries,
just
like
you
can
in
the
ui.
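As a rough sketch of what such a call might look like, here is a query built against DataHub's GraphQL `searchAcrossLineage` API. The query shape, field names, endpoint path, and the example urn are assumptions for illustration and may differ between DataHub versions:

```python
import json

# Hedged sketch: GraphQL document for fetching all downstreams of a dataset.
# Field names follow DataHub's searchAcrossLineage API but may vary by version.
SEARCH_ACROSS_LINEAGE = """
query impact($input: SearchAcrossLineageInput!) {
  searchAcrossLineage(input: $input) {
    total
    searchResults {
      degree
      entity { urn type }
    }
  }
}
"""

def build_impact_request(dataset_urn, query="*", start=0, count=100):
    """Build the JSON body for a downstream impact-analysis query."""
    return {
        "query": SEARCH_ACROSS_LINEAGE,
        "variables": {
            "input": {
                "urn": dataset_urn,
                "direction": "DOWNSTREAM",  # all levels, not just one hop
                "query": query,             # free-text search, as in the UI
                "start": start,
                "count": count,
            }
        },
    }

body = build_impact_request(
    "urn:li:dataset:(urn:li:dataPlatform:kafka,raw_events,PROD)"  # example urn
)
payload = json.dumps(body)
# An HTTP client would POST `payload` to <your-datahub-host>/api/graphql
# with an Authorization header; omitted here.
```

Filters (tag, platform, owner, degree of dependency) would go into the same `input` object, mirroring the filter panel in the UI.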
B: Awesome. All right, so let me go really quickly through how the back end works. To do this, we had to change some fundamentals of how lineage works in the back end. First, let me go over how it was working before. After the no-code change that we did last year, we were able to put Relationship annotations on our models, to note, for example, that the field called dataset in an UpstreamLineage aspect marks a DownstreamOf lineage relationship between these entity types. These relationship annotations are converted into edges in the graph index. Here are some example edges: a logging event is DownstreamOf the sample Hive dataset, the user-creation data job Consumes this dataset, and so on and so forth.
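The example edges above can be sketched as plain (source, edge type, destination) triples. The entity names and the `neighbors` helper are illustrative stand-ins, not DataHub's actual identifiers or code:

```python
# Illustrative edges as they might appear in the graph index,
# abbreviated names instead of full urns for readability.
EDGES = [
    ("logging_events", "DownstreamOf", "SampleHiveDataset"),
    ("user_creation_job", "Consumes", "SampleHiveDataset"),
    ("user_creation_job", "Produces", "logging_events"),
]

def neighbors(edges, node, edge_type, outgoing=True):
    """Follow edges of one type out of (or into) a node."""
    if outgoing:
        return [dst for src, typ, dst in edges if src == node and typ == edge_type]
    return [src for src, typ, dst in edges if dst == node and typ == edge_type]
```

Note that a single stored edge can be read in either direction, which is exactly the subtlety discussed next.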
How is each edge produced? The entity on the left side is the source urn, and the entity on the right side comes from the values inside the aspect. So which entity is the source urn and which is a value in the aspect decides the direction of the edge in the graph index.
Now, the problem is that this is not necessarily the direction of the edge in the lineage graph. If you look above, we go from dataset to dataset, and then dataset to data job, and so on, in the lineage graph on the top right.
B: Oh sorry, that's a typo; it Produces this dataset, and that is the same as the lineage direction on the graph above. So what we had to do before was have the front end whitelist a bunch of these edges and figure out how to query for them on our graph service. Our graph service has no knowledge of how lineage works; it only knows about these edges.
B: So we went through an initial set of iterations (go to the next slide, Gabe) to build this lineage registry, where it's all no-code. Given a relationship annotation, we add a few fields: isLineage: true means the edge will show up in the lineage graph, i.e. it's an actual lineage edge, along with some other metadata about the relationship. With that, we can build a lineage registry that says: given an entity type, what are its upstream entities?
B: How do we query for them in the graph database? And what are the downstream edges, and how do we query for those? For example, if you look at this data job's lineage registry, it says that to get the upstream entities of the data job, you need to look for these types of edges in the graph database. One is Consumes: basically, what the data job consumes is actually an upstream of the data job.
B: As well as DownstreamOf: whatever this data job is DownstreamOf is an upstream of the data job, right? It's a little confusing, but bear with me. And then the same thing for downstreams: here are the edge types to look for in the graph database to find the downstreams.
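That registry idea can be sketched as a mapping from entity type and lineage direction to the graph edges to follow. The class and field names here are illustrative assumptions, not DataHub's actual code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EdgeSpec:
    name: str        # edge type in the graph index, e.g. "Consumes"
    outgoing: bool   # True: follow edges out of this entity; False: into it

# Hypothetical lineage registry: which edges mean "upstream" or
# "downstream" for each entity type, regardless of stored edge direction.
LINEAGE_REGISTRY = {
    "dataJob": {
        # What a data job Consumes, or is DownstreamOf, is its upstream.
        "UPSTREAM": [EdgeSpec("Consumes", True), EdgeSpec("DownstreamOf", True)],
        # What a data job Produces, or what is DownstreamOf it, is its downstream.
        "DOWNSTREAM": [EdgeSpec("Produces", True), EdgeSpec("DownstreamOf", False)],
    },
    "dataset": {
        "UPSTREAM": [EdgeSpec("DownstreamOf", True)],
        "DOWNSTREAM": [EdgeSpec("DownstreamOf", False), EdgeSpec("Consumes", False)],
    },
}

def edges_for(entity_type: str, direction: str) -> list[EdgeSpec]:
    """Which edges to query in the graph index for a lineage direction."""
    return LINEAGE_REGISTRY[entity_type][direction]
```

The key point is that the "confusing" direction flip lives entirely in the `outgoing` flag, so callers never need to reason about it.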
B: Now the contract between the front end and the back end is much simpler. The front end just asks for all the upstream and downstream lineage, without knowing anything about edge types or anything like that. The lineage registry in the back end decides which edges need to be fetched from the graph index, and fetches those specific edges. As a side effect, our lineage graph has also improved, since it now surfaces all of these edges.
B: So basically we had to do a simple BFS across the graph index. Given the set of entities from the last hop, fetch all the downstream edges: that's one hop to the next set of entities. We keep track of all the visited nodes and keep hopping until we reach the leaf nodes. We batch all the requests to minimize the number of queries, so we don't query per entity; we query per hop.
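The per-hop batched BFS can be sketched like this. This is an illustrative re-implementation, not DataHub's actual code; a plain dict stands in for the graph index, and the dataset names are made up:

```python
def downstreams_per_hop(graph, start):
    """graph: dict mapping urn -> list of direct-downstream urns.
    Returns dict mapping each reachable urn -> degree (hops from start)."""
    visited = {}         # urn -> degree of dependency
    frontier = {start}
    degree = 0
    while frontier:
        degree += 1
        # One batched "query" per hop: expand the whole frontier at once,
        # rather than issuing one query per entity.
        reached = set()
        for node in frontier:
            reached.update(graph.get(node, []))
        # Track visited nodes so cycles and diamonds terminate.
        frontier = reached - set(visited) - {start}
        for node in frontier:
            visited[node] = degree
    return visited

graph = {
    "raw_events": ["clicks", "views"],
    "clicks": ["dim_users"],
    "views": ["dim_users"],
}
result = downstreams_per_hop(graph, "raw_events")
# result == {"clicks": 1, "views": 1, "dim_users": 2}
```

The recorded degree is exactly the "level of dependency" number shown on each entity in the impact analysis view.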
B
So
perhaps
this
one
request
elastic
search
and
then
we
cache
the
final
set
of
earns
all
the
errors
that
are
impacted
by
this
data
set.
Now,
once
we
have
the
final
set
of
earns,
we
query
the
search
request,
so
we
do
search
across
entities
the
same
way.
We
do
the
search
on
the
main
search
page,
but
with
this
added
urn
filter,
saying
anything
returned
by
the
search
needs
to
be
among
these
urns
that
are
impacted
by
the
data
set.
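That urn filter can be sketched as follows. In DataHub the filter is pushed down into the Elasticsearch query itself; here it is shown as a plain post-filter for clarity, with a stand-in search function and made-up urns:

```python
def search_with_urn_filter(search_fn, query, impacted_urns):
    """Run a regular search, then keep only hits whose urn is in the
    precomputed impacted-urn set from the lineage BFS.
    search_fn(query) -> list of result dicts with a 'urn' key."""
    allowed = set(impacted_urns)
    return [hit for hit in search_fn(query) if hit["urn"] in allowed]

results = search_with_urn_filter(
    lambda q: [{"urn": "u1"}, {"urn": "u2"}, {"urn": "u3"}],  # stand-in search
    "*",
    {"u1", "u3"},  # impacted urns cached from the traversal
)
# results == [{"urn": "u1"}, {"urn": "u3"}]
```

Because this reuses the ordinary search path, every existing facet, filter, and the CSV export work unchanged inside the impact analysis view.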
B: By doing so, we're able to support the embedded search experience you saw in the demo. Once we have the Neo4j implementation of that first part, we'll have this working on Neo4j as well. If you have any questions on this, please ping either Gabe or me; we're open to any suggestions.