From YouTube: Fine-Grained Lineage & Timeline API
Description
Shirshanka Das & Ryan Holstien (Acryl Data) review Fine-Grained Lineage & demo the new Timeline API during February 2022 Town Hall.
Learn more about DataHub: https://datahubproject.io
Join us on Slack: http://slack.datahubproject.io
Follow us on Twitter: https://twitter.com/datahubproject
Shirshanka: This is the most interesting part of my day, because I get to tell you all of the cool stuff that has happened in DataHub and is about to land. First thing is column-level lineage; we're actually calling it fine-grained lineage.
Where are we at? We have committed the model, and it supports both dataset-to-dataset as well as dataset-to-job lineage. We have APIs and documentation for how to add and query this data. So if you have programmatic integrations that you are dying to kick off, go at it; you'll find that documentation exactly where Maggie was talking about.
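As a rough sketch of what a programmatic fine-grained lineage entry looks like, here is a stdlib-only Python mock-up of one column-level mapping. The URNs, dataset names, and field names here are hypothetical, and the exact payload structure may differ by DataHub version, so check the dataset docs page mentioned above for the authoritative examples.

```python
# Hypothetical sketch of one fine-grained (column-level) lineage entry for a
# dataset's upstream lineage. Field names follow the style of the docs'
# examples but are illustrative, not an exact spec.

def field_urn(dataset_urn, field_path):
    # Schema-field URNs wrap a dataset URN plus a column path.
    return f"urn:li:schemaField:({dataset_urn},{field_path})"

# Hypothetical upstream and downstream datasets.
upstream = "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)"
downstream = "urn:li:dataset:(urn:li:dataPlatform:hive,user_summary,PROD)"

fine_grained_lineage = {
    "upstreamType": "FIELD_SET",        # lineage from a set of upstream columns...
    "upstreams": [field_urn(upstream, "user_id")],
    "downstreamType": "FIELD",          # ...to a single downstream column
    "downstreams": [field_urn(downstream, "user_id")],
}

print(fine_grained_lineage["upstreams"][0])
```

The same shape applies whether the lineage hangs off a dataset or off a job that reads and writes datasets.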
If you go to the doc site and you look on the left rail, drop down to Metadata Modeling > Entities, and drop into Dataset, you'll see a section there on fine-grained lineage. Over there you'll have examples of exactly how you can add this lineage, both to a dataset or to a job that reads or writes to a dataset, and then how to query that lineage information.
So that's just the beginning, obviously, but it's a good checkpoint for us. The next step is going to be actually building out integrations with existing sources and producing this kind of fine-grained lineage from those existing emitters. We're still doing an inventory of which sources are actually good targets to start out with, so please let us know if there are ones that you think are good ones for us to go after.
Next slide. All right, so we've talked about schema version history ad infinitum, and as we started building it, we actually built one version of it and then realized a better way to do it. So we're actually calling this the Timeline API. What do we mean by that?
It's a unified timeline of changes to entities in the metadata graph, and it's computed across all of the individual, fine-grained changes that are happening to entities. So you essentially get a unified timeline of any entity, like a dataset, and you can then filter, as a consumer, on whatever categories you care about. So, for example, we've got this entity here; this looks like a Hive dataset, and we want to see what happened to this dataset from seven days ago.
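"Seven days ago" ends up as an epoch-millisecond start time on the request. Here's a stdlib-only sketch of building such a query; the endpoint path, host, and parameter names are illustrative assumptions (the server's Swagger page, shown later, is the authoritative reference), not taken from the talk.

```python
import time
from urllib.parse import quote, urlencode

# Hypothetical dataset URN; the route and parameter names below are
# illustrative -- check your server's Swagger page for the real ones.
urn = "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)"

# "Seven days ago" as epoch milliseconds.
start_ms = int((time.time() - 7 * 24 * 3600) * 1000)

query = urlencode({"categories": "TECHNICAL_SCHEMA", "startTime": start_ms})
url = f"http://localhost:8080/openapi/timeline/{quote(urn, safe='')}?{query}"
print(url)
```

From there it's an ordinary GET request; the response is the list of change transactions described below.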
As part of this change, we also finally added an OpenAPI server to the DataHub backend. I don't know how many people love Rest.li, but it's a very hard-to-use REST API.
A
So
we've
we
want
to
move
towards
open
api
as
our
public
facing
rest,
endpoint
and
as
part
of
the
timeline
api
work
we
actually
put
in
the
work
to
add
in
an
open
api
server.
If you want to think about how it's technically built: the metadata model is split up into aspects, and each aspect is very fine-grained. So, for example, we have an aspect called schemaMetadata, we've got another aspect called globalTags, and we've got another aspect called datasetProperties, and there are lots and lots of aspects, and individual changes are happening to each one of these aspects. So you might have a schema changing, and that might be happening over in the schemaMetadata aspect.
You might have tags being added, and sometimes that happens through the schemaMetadata aspect, and sometimes it happens through the globalTags aspect. So there's a lot of complexity in the lower-level model and how changes are happening in there, but what we wanted to provide was actually a semantic way to look at all of these changes in a single way.
So at the top level there are categories like technical schema, tag, and documentation, and these high-level categories actually span across these fine-grained aspect changes, so you can then create a singular, unified timeline across all of these categories. You can almost think about this unified timeline as a projected view that is merged across all of these individual version timelines.
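That "projected view" can be pictured as a k-way merge of per-aspect change streams into one timestamp-ordered timeline. This is a toy sketch of the idea, not DataHub code; the aspect names and event tuples are made up for illustration.

```python
from heapq import merge

# Toy per-aspect change streams, each already sorted by timestamp:
# (timestamp_ms, category, description).
schema_metadata_changes = [
    (100, "TECHNICAL_SCHEMA", "field user_id added"),
    (300, "TECHNICAL_SCHEMA", "field email removed"),
]
global_tags_changes = [
    (200, "TAG", "tag pii added"),
]

# The unified timeline is just the merged view of the individual
# version timelines, ordered by timestamp.
unified = list(merge(schema_metadata_changes, global_tags_changes))
print(unified)
```

A consumer filtering by category would simply drop the streams (or events) it doesn't care about before merging.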
So it's pretty cool. It was a lot of fun working on this; a huge shout-out to Ryan and Surya, who I collaborated with in building this, and it's just the beginning.
I think now that we have this, we're going to be able to build simple things, like schema version history, documentation history, and all sorts of different experiences, on top of this much more general API. So that's kind of the high-level visual of how to think about it.
But how do you get it? Well, it's already there. The CLI was released last night, so you can try it on the quickstart using the latest CLI, 0.8.27.1. The server code is also in, obviously, so it's part of quickstart; it will be released officially as part of 0.8.28, so that's the next release coming up. In terms of caveats, it's supported only for datasets right now, so the entity type has to be a dataset, and it supports these change categories: technical schema, tags, documentation, glossary terms, and ownership.
Ryan: I've got the UI up here, and you'll notice that we've added a couple of links in the dropdown here; specifically, I'm interested in the OpenAPI one. So this is our new Swagger page that Shirshanka mentioned. It has the OpenAPI spec for the timeline endpoint, and here you can explore it just like any other Swagger page.
So if we add a URN and a category of change that we want to search for, then we can get this nice little response: a list of changes that have happened to that particular dataset. As for the schema of the response: basically, we split each change into a transaction-level change that includes a list of change events, and those list out the individual changes that have happened in each category.
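A toy illustration of that nesting, with made-up field names modeled on the description (transaction-level entries, each carrying a list of change events):

```python
# Hypothetical response shape: the key names here are illustrative,
# not the exact OpenAPI spec -- see the Swagger page for the real schema.
response = [
    {
        "timestamp": 1644537600000,   # transaction time, epoch millis
        "semVer": "1.0.0",            # semantic version computed for it
        "changeEvents": [
            {"category": "TECHNICAL_SCHEMA", "operation": "ADD",
             "description": "field user_id added"},
            {"category": "TECHNICAL_SCHEMA", "operation": "ADD",
             "description": "field email added"},
        ],
    },
]

# Flatten transactions into per-event lines.
for txn in response:
    for event in txn["changeEvents"]:
        print(txn["semVer"], event["category"], event["description"])
```

Each transaction groups everything that changed together, while the events inside it give the fine-grained, per-category detail.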
We have this datahub timeline command that we've added. It has two mandatory parameters, URN and category, and you can specify a start and end time and a few other parameters as well. So if we send that in, we get a list of all the changes that have happened to the technical schema, which is linked to the schemaMetadata aspect.
So we'll see, for the first set of changes, that we got all of these fields added, and then we've put in some changes where we modified and removed some of the fields in there. One of the things you'll probably notice is that we have a timestamp of each transaction that happened, and a semantic version that was computed.
These are all relative; basically, we compute this as a result of what changes happened. So if a major version change happened, like a backwards-incompatible change where a field is removed, then we'll bump it to a major version. But if something less significant happens, then it would be a minor or a patch.