From YouTube: New! Airflow Run History in DataHub
Description
Tamás Németh (Acryl Data) gives a demo of upcoming improvements to the Airflow/DataHub Integration during March Town Hall.
Learn more about DataHub: https://datahubproject.io
Join us on Slack: http://slack.datahubproject.io
Follow us on Twitter: https://twitter.com/datahubproject
And just to recap what we have currently and how we model it: we have the DataFlow and the DataJob in our model. The DataFlow represents a pipeline, which in the Airflow world is a DAG, and the DataJobs are the tasks in the Airflow world.
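To make that mapping concrete, here is a minimal sketch of how a DAG and a task translate into DataFlow and DataJob URNs with the DataHub Python SDK (the DAG and task ids are hypothetical):

```python
from datahub.emitter.mce_builder import make_data_flow_urn, make_data_job_urn

# A DAG maps to a DataFlow: orchestrator + DAG id + cluster.
flow_urn = make_data_flow_urn(orchestrator="airflow", flow_id="example_dag", cluster="prod")

# Each task in the DAG maps to a DataJob scoped to that flow.
job_urn = make_data_job_urn(
    orchestrator="airflow", flow_id="example_dag", job_id="transform_task", cluster="prod"
)

print(flow_urn)  # urn:li:dataFlow:(airflow,example_dag,prod)
print(job_urn)   # urn:li:dataJob:(urn:li:dataFlow:(airflow,example_dag,prod),transform_task)
```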
But actually, that's just half of the reality, and there are a few drawbacks to modeling only this part. Let me give you a simple example, just to show what I mean by a drawback. Let's say you have your precious data: it's very valuable, it's unique, and it's super expensive. That's why I'm demonstrating with the Mona Lisa.
So you have your daily pipeline runs, and then somebody changes something in your pipeline. Maybe they remove some output, so one day your super expensive data will change: somebody drew a face onto your Mona Lisa. And you are doing your daily job without worrying or knowing about anything, until people call you or notify you: hey, there is a fire burning behind you. And then you have to figure things out.
Okay, so my model changed, my data changed, and I need to figure out when and what happened. But in this case, if you just use our current model, which basically gives you only the latest snapshot of your pipeline definition, you are out of luck; even Airflow itself does not give you any help there.
So that's why we decided to model the other part of the reality, which is basically the runs. This new entity is the DataProcessInstance, which captures the run of a DataFlow or the execution of a DataJob: in Airflow terms, the DAG executions and the task executions. And not just that: with this new model we are also storing a history of these pipeline definitions.
So with that, you can see over time what your pipeline looked like in history. And beyond storing this, we can also store the different status changes, because in reality your pipeline is not just a definition: based on the definition you instantiate your pipeline and your tasks, and those instances have a status, like started, failed, or completed.
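Conceptually, capturing a run and its status changes looks roughly like this with the new high-level Python API (a sketch, not necessarily the exact code from the talk; the ids are hypothetical and names may differ across SDK versions):

```python
import time

from datahub.api.entities.datajob import DataFlow, DataJob
from datahub.api.entities.dataprocess.dataprocess_instance import (
    DataProcessInstance,
    InstanceRunResult,
)
from datahub.emitter.rest_emitter import DatahubRestEmitter

emitter = DatahubRestEmitter("http://localhost:8080")

# The definition side: a DataFlow (the DAG) and a DataJob (a task).
flow = DataFlow(orchestrator="airflow", id="example_dag", cluster="prod")
job = DataJob(flow_urn=flow.urn, id="transform_task")

# The run side: one DataProcessInstance per execution of the task.
run = DataProcessInstance.from_datajob(datajob=job, id="manual__2021-03-01")

# Status changes are emitted as events: started, then completed or failed.
run.emit_process_start(emitter, int(time.time() * 1000))
run.emit_process_end(emitter, int(time.time() * 1000), result=InstanceRunResult.SUCCESS)
```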
So with this new model, which is going to land soon, we are able to capture this information. But I guess the model itself is a bit boring, so maybe let me show you with a real demo how this would look.
Here is a lineage Airflow pipeline from the Airflow examples, with some changes I added, and I have the lineage backend and the DataHub connection set up. As you can see, I have a few datasets defined.
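For context, the lineage backend the demo relies on is configured in airflow.cfg; this is a sketch based on DataHub's documented setup (the connection id and options shown are the documented defaults, not taken from the demo):

```ini
[lineage]
backend = datahub_provider.lineage.datahub.DatahubLineageBackend
datahub_kwargs = {
    "datahub_conn_id": "datahub_rest_default",
    "capture_ownership_info": true,
    "capture_tags_info": true,
    "graceful_exceptions": true }
```

The datahub_rest_default connection itself would point at your DataHub GMS endpoint, e.g. http://localhost:8080.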
In the end, if you check it, this will basically generate the cat1, cat2, cat3, cat4, cat5 datasets: five categories as datasets. Let's change this, remove three of them, and run our pipeline. All right, trigger.
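The datasets come from the inlets and outlets declared on the Airflow tasks, which the lineage backend emits to DataHub on every run. A minimal sketch of what such a task could look like (the DAG, task, bucket, and table names here are hypothetical, not the demo's actual code):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from datahub_provider.entities import Dataset

with DAG(
    dag_id="datahub_lineage_example",
    start_date=datetime(2021, 3, 1),
    schedule_interval=None,
) as dag:
    ingest = BashOperator(
        task_id="ingest_categories",
        bash_command="echo 'processing categories'",
        # Declared inlets/outlets become the run's inputs and outputs in DataHub.
        inlets=[Dataset("s3", f"my-bucket/cat{i}") for i in range(1, 6)],
        outlets=[Dataset("snowflake", "analytics.public.categories")],
    )
```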
Okay, so this pipeline should run, and it finishes. If I now go to DataHub and you have emitted pipelines, this screen may be familiar to you. If you click on this last one, there is a new tab called Runs. If you go to the Runs page, it basically shows the last couple of pipeline runs.
As you can see, it shows the date when it was executed, there is a run id, and there is the status. And what I think is super interesting here: you can also see how your pipeline's inputs and outputs change over time. So basically, as you can see, previously we had four S3 files as input, and with the latest run there is only one. And maybe you are interested: okay, but what happened there? I'm interested in the logs.
Or maybe, if it's in a failed state, you are interested in what happened, and you don't want to go to Airflow and hunt down that specific run, which can be hard. So there is a button as well which, if you click it, basically brings you to the logs of that execution.
So that was the demo. And this code, what it actually does is just create one DataFlow and one DataJob; with the new high-level API this will be much simpler.
You may have to hit present mode. Oh yes, yeah. Can you see it now? So yeah, with this new API, which is going to land soon, you can basically do it in three or four lines of code, where previously it was like 50, 70 lines or even more. And actually, the Airflow demo I showed was using that new API.
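For reference, a minimal sketch of what such a few-line emission could look like with the new high-level API (based on DataHub's Python SDK; the ids are hypothetical and exact names may differ from the version shown in the demo):

```python
from datahub.api.entities.datajob import DataFlow, DataJob
from datahub.emitter.rest_emitter import DatahubRestEmitter

emitter = DatahubRestEmitter("http://localhost:8080")

# One DataFlow (the pipeline) and one DataJob (a task in it), then emit both.
flow = DataFlow(orchestrator="airflow", id="example_dag", cluster="prod")
job = DataJob(flow_urn=flow.urn, id="transform_task")

flow.emit(emitter)
job.emit(emitter)
```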