From YouTube: Great Expectations Outcomes in DataHub
Description
John Joyce (Acryl Data) gives a demo of Displaying Data Quality Checks in the DataHub UI during the February Town Hall.
Learn more about DataHub: https://datahubproject.io
Join us on Slack: http://slack.datahubproject.io
Follow us on Twitter: https://twitter.com/datahubproject
Basically, Great Expectations is a way to define assertions, or tests, on particular data assets and then evaluate them repeatedly over time to track the quality of a dataset.
The goal is to maintain the dataset's quality over time as it changes. Here are some examples I pulled from the Great Expectations documentation. You would define these in your Python code: basically an expectation of what you would expect from a dataset, either at the table level or at the column level.
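As a rough sketch of the kind of examples shown (the file, table, and values below are illustrative, not from the talk, and the quick pandas-backed API is used for brevity):

```python
import great_expectations as ge

# Load a batch of data to validate (illustrative CSV path).
df = ge.read_csv("yellow_tripdata_sample_2019-01.csv")

# Table-level expectation: the row count should fall within a range.
df.expect_table_row_count_to_be_between(min_value=1000, max_value=50000)

# Column-level expectation: values should come from a finite set.
df.expect_column_values_to_be_in_set("vendor_id", value_set=[1, 2, 4])
```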
So what we got from the community was a request to display the outcomes of Great Expectations assertion suites in DataHub, for all the perfectionists out there. We had to make one modification to that request. The solution they wanted to see was, as an end user, to be able to see the types of assertions, and specifically the results of running the assertions, associated with a dataset inside of DataHub.
A
Additionally,
with
the
requirement
to
be
able
to
see
the
assertion
runs
over
time
or
over
history,
and
so
now
I'm
just
going
to
jump
right
into
a
demo
where
I'll
talk
about
how
to
actually
configure
the
integration
and
then
I'll
run
some
expectations
and
show
you
what
the
output
looks
like
once.
We've
ingested
that
into
data
hub,
so
I'm
going
to
step
over
to
a
local,
great
expectations,
project
that
I've
got
here.
We've already gone ahead and defined a set of expectations that I'd like to run against this table: basically, just some tests or assertions. A few of the ones we have in here are, you know, expecting the table columns to match a predefined list, expecting the row count to be between a particular minimum and maximum, and expecting a column to always have values that fall into a finite set.
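A minimal sketch of a suite with those three expectations, assuming the v3 API and using illustrative column names and bounds (the transcript doesn't show the actual values):

```python
from great_expectations.core import ExpectationSuite
from great_expectations.core.expectation_configuration import ExpectationConfiguration

suite = ExpectationSuite(expectation_suite_name="taxi_suite")  # illustrative name

# Table columns must match a predefined, ordered list.
suite.add_expectation(ExpectationConfiguration(
    expectation_type="expect_table_columns_to_match_ordered_list",
    kwargs={"column_list": ["vendor_id", "pickup_datetime", "fare_amount"]},
))

# Row count must fall between a minimum and a maximum.
suite.add_expectation(ExpectationConfiguration(
    expectation_type="expect_table_row_count_to_be_between",
    kwargs={"min_value": 1000, "max_value": 50000},
))

# A column must only ever contain values from a finite set.
suite.add_expectation(ExpectationConfiguration(
    expectation_type="expect_column_values_to_be_in_set",
    kwargs={"column": "vendor_id", "value_set": [1, 2, 4]},
))
```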
Now I'm going to go into a checkpoint file that I've configured. You can see that we configure this to run against the taxi Jan '19 table, and we're using that suite, which is just the group of expectations that I previously showed you. There's also an interesting configuration called action_list, and this is where we're going to configure the integration with DataHub. Actions in Great Expectations are a way to run code once a checkpoint has been hit.
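The demo uses a checkpoint YAML file; as a sketch of the same configuration expressed through the Python API (the checkpoint name, datasource details, and server address below are assumptions, not values from the talk), the DataHub action is just one more entry in action_list:

```python
from great_expectations.data_context import DataContext

context = DataContext()

context.add_checkpoint(
    name="taxi_jan_2019_checkpoint",  # illustrative name
    config_version=1,
    class_name="Checkpoint",
    validations=[{
        "batch_request": {
            "datasource_name": "my_sql_datasource",  # assumption
            "data_connector_name": "default_inferred_data_connector_name",
            "data_asset_name": "taxi_jan_2019",      # assumption
        },
        "expectation_suite_name": "taxi_suite",
    }],
    action_list=[
        # Standard actions: persist results and rebuild Data Docs.
        {"name": "store_validation_result",
         "action": {"class_name": "StoreValidationResultAction"}},
        {"name": "update_data_docs",
         "action": {"class_name": "UpdateDataDocsAction"}},
        # The DataHub action pushes assertion results to a DataHub instance.
        {"name": "datahub_action",
         "action": {
             "module_name": "datahub.integrations.great_expectations.action",
             "class_name": "DataHubValidationAction",
             "server_url": "http://localhost:8080",  # your DataHub GMS endpoint
         }},
    ],
)
```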
So now I'm going to run through the process of ingesting an assertion run into DataHub. The first thing we'll need to do in our Great Expectations environment is actually just install the Great Expectations plugin of Acryl DataHub. Now, I've already got that installed, so I'm just going to skip that step. But once we've done that, we can run this checkpoint, and hopefully this will execute the suite as well as push data into my local DataHub.
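For reference, the install and run steps look roughly like this (the checkpoint name is the illustrative one from the sketch above):

```shell
# Install the DataHub plugin for Great Expectations
pip install 'acryl-datahub[great-expectations]'

# Execute the checkpoint; the DataHub action fires after validation
great_expectations --v3-api checkpoint run taxi_jan_2019_checkpoint
```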
A
What
you
can
see
inside
of
here
is
a
top
level
summary
saying
that
all
of
the
assertions
that
datahub
is
aware
of
are
currently
passing
for
this
data
set.
You
can
also
see
kind
of
human
english
descriptions
of
each
type
of
assertion,
and,
if
you
hover
over
it,
you
can
see
the
native
grade
expectations
operator
that
was
run.
Now I'm going to go ahead and show you what failing expectations look like, and for that I'll go to this Feb '19 table. We can see that this one is actually failing one of the nine assertions that we know about, and we can go over here to see that this is the assertion that's failing.
So, yeah, this is pretty much the demo. Initially we will support Great Expectations, but we've modeled this in a fairly general-purpose way with this concept of assertions, such that we can support things like Deequ, among other types of validation systems. So now I'm going to navigate back to the presentation here.
A quick configuration recap for Great Expectations in particular: in your Great Expectations environment, you're going to want to install the Acryl DataHub Great Expectations plugin, add the DataHub validation action to any checkpoints you have, and execute them. Then you can view the results in DataHub as assertions.
Just briefly, I'll talk about how this works under the hood, particularly the modeling. We have a new entity on DataHub that we call Assertion. An assertion can be associated with other entities, and it does exactly what you would think: it just defines conditions that are executed against a particular entity.
A
We
also
have
a
special
time
series
aspect:
we've
added
called
assertion
run
event,
and
this
is
basically
what
powers
that,
over
time
view
historical
view.
Every
time
an
assertion
runs,
an
assertion
run
event
will
be
produced
to
give
different
information
about
the
assertion
run
like
its
status,
its
results,
maybe
how
much
time
it
took
things
like
that,
and
in
this
case
we
have
just
a
basic
relationship
between
the
assertion
entity
and
the
data
set.
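As a sketch of what one such event might carry (the field names here are assumptions based on the description in the talk, not the actual aspect schema):

```python
# Hypothetical payload of a single assertion run event, shown as a Python dict.
assertion_run_event = {
    "timestampMillis": 1645000000000,       # when this run happened
    "runId": "2022-02-16T10:00:00Z",        # identifier for the evaluation run
    "asserteeUrn": "urn:li:dataset:(...)",  # the entity the assertion targets
    "status": "COMPLETE",                   # lifecycle status of the run
    "result": {
        "type": "SUCCESS",                  # SUCCESS or FAILURE
        "nativeResults": {                  # engine-specific details, e.g.
            "observed_value": "42013",      # the value Great Expectations saw
        },
    },
}
```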
Now I'll quickly cover the availability. In DataHub version 0.8.28, which is the next release, we will be shipping support for dataset table- and column-level assertions from Great Expectations; support for the Great Expectations v3 API, which is their latest API; pushing assertion results, as you saw, in real time via that checkpoint action; and support for the SQLAlchemy execution engine inside of Great Expectations. Now, there are other engines, like Spark and pandas, but those will not be in the v1 support.
Finally, we'll have a GraphQL API which will allow you to check the assertion status for a particular dataset. I think this is actually pretty powerful, because it allows you to build automated workflows that only proceed if, say, their input datasets are passing their most recent assertions.
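A minimal sketch of that gating pattern, assuming a local DataHub GraphQL endpoint and an illustrative query shape (the endpoint, URN, and field names are assumptions; consult the DataHub GraphQL docs for the real schema):

```python
import requests

# Assumed local DataHub GraphQL endpoint and an illustrative dataset URN.
GRAPHQL_URL = "http://localhost:8080/api/graphql"
DATASET_URN = "urn:li:dataset:(urn:li:dataPlatform:postgres,public.taxi_jan_2019,PROD)"

# Illustrative query: list the assertions attached to a dataset.
QUERY = """
query datasetAssertions($urn: String!) {
  dataset(urn: $urn) {
    assertions(start: 0, count: 100) {
      assertions {
        urn
      }
    }
  }
}
"""

def fetch_assertion_urns(urn: str) -> list:
    """Fetch the URNs of assertions DataHub associates with a dataset."""
    resp = requests.post(GRAPHQL_URL, json={"query": QUERY, "variables": {"urn": urn}})
    resp.raise_for_status()
    payload = resp.json()["data"]["dataset"]["assertions"]
    return [a["urn"] for a in payload["assertions"]] if payload else []

# A pipeline could gate on this: fetch each assertion's most recent run
# event (same pattern, one more query) and proceed only if all passed.
if not fetch_assertion_urns(DATASET_URN):
    print("No assertions registered for this dataset; no quality gate to apply.")
```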
Finally, I'll just talk about where we see this feature going, starting with some improvements we already have to make to the Great Expectations connector. We want to support other execution engines, like the Spark and pandas engines I alluded to, and support for legacy APIs, depending on the feedback we get from the community.
I think there are some people who are still working with the v2 APIs, as v3 is actually kind of new. We also want support for cross-dataset assertions, which is an advanced feature in Great Expectations, and support for conditional expectations, which are basically expectations that only apply to a subset of a table based on some filtering criteria.
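For context, a conditional expectation in Great Expectations attaches a row_condition to an ordinary expectation; a small sketch with illustrative file and column names:

```python
import great_expectations as ge

df = ge.read_csv("yellow_tripdata_sample_2019-01.csv")  # illustrative file

# Only rows matching the row_condition are checked; all others are ignored.
df.expect_column_values_to_be_between(
    "fare_amount",              # illustrative column
    min_value=0,
    max_value=500,
    row_condition='vendor_id==1',
    condition_parser="pandas",  # row_condition uses pandas query syntax
)
```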