From YouTube: Embedding Workflows in DataHub
Description
Kartik Darapuneni (Included Health) shares his experience embedding Looker, Querybook, and Jupyter into DataHub.
Learn more about DataHub: https://datahubproject.io
Join us on Slack: http://slack.datahubproject.io
Follow us on Twitter: https://twitter.com/datahubproject
A
So we are now going to move over to our community case study. Today we have Kartik, and I'll pass it over to you, sir.
B
Thanks, Maggie. Let me share my screen and walk through our presentation. I feel like every time I come here, I come away with more work to do, so it's a testament to some of the cool things that are happening. But anyway, hello, everyone. My name is Kartik and I'm one of the tech leads at Included Health, working on the data platform.
B
More specifically, my team focuses on making the interface between humans and data as simple as possible. So today I'll be talking about how we've leveraged DataHub and some of the custom workflows we've created around it, specifically by embedding different tools that we have directly into DataHub to make our workflows more seamless. So, to start: what is Included Health, and why is data so important here? Included Health is a new kind of healthcare company, and we're focused on giving our members a service where all of their medical needs are included under one company. I know that's kind of vague, so what does that really mean? It could be as simple as finding a new primary care doctor, whether that's in person or virtually, or getting expert medical opinions on their more complex medical conditions.
B
One of the core ingredients for success is clean, usable, managed, and understandable data. Given that, what is the focus of our data platform? At the end of the day, it's really about simplifying existing data and making sure that we deliver the right care at the right time for all of our members, based on a deep understanding of our data. As I mentioned earlier, one of the focuses of my team specifically is the interface between humans and data and, as such, we've taken extensive advantage of the amazing open source and closed source tools.
B
We use JupyterHub for more programmatic interactions with data, and then Querybook by Pinterest to do ad hoc reporting and SQL-based data exploration. I'll show some screenshots and examples of what these tools look like if you're unfamiliar with them. We started this platform about two years ago. We rebuilt it, and we went from having no data to now having almost 7,000 different data entities within our ecosystem. So pretty quickly it became clear that you can't just use the tools on their own, because the workflows cross different tools and different domains, and we really needed a tool like DataHub that acts as the search and discovery platform to enable the workflows that we really want at scale.
B
So, to give a quick example of what these workflows look like at Included Health: a user typically starts by searching for existing data of various entity types through DataHub, perhaps reading through the metadata that's been ingested into DataHub or has been added later on by owners or users of that data.
B
I typically have tons of tabs open and I'm hopping back and forth, and I lose track of what I'm doing and it becomes a little frustrating. So again, how do we simplify this? That's the mission of our team: how do we make this simpler and easier to use?
B
The way I think about this is: we have this great search and discovery platform through DataHub, so how do we bring the rest of the tooling that we have into that ecosystem in a way that feels more seamless? That's where embedding really comes into play, and that's what I'm going to talk about today. The hypothesis is that if we can embed the right context in the right location, we can have a majority of our users never leave DataHub.
B
If you want to understand the metadata, the ownership, the domains that it falls in, all of that content is explorable here. It's really nice, because when you go to a dashboard, you want to know all of the metadata, and Looker specifically doesn't give you a lot of room to add more text or more context beyond some of the primitives they provide.
B
So taking advantage of the UI that DataHub provides is nice, while still giving the rich visualizations that Looker gives. One of the example workflows that we see here is that each Looker dashboard is composed of different charts, Explores, or tiles, and so using the lineage, we can go from a Looker dashboard all the way to a chart and see what the content of that chart is. And so, if you wanted to explore from this particular tile, and I want to slice it by something.
B
So that's one tab that I can close. The second example I want to show is querying. As I mentioned, we use Querybook by Pinterest to do ad hoc execution of SQL, and this particular tool can connect to lots of different SQL execution engines in the background, from SQLAlchemy to BigQuery to whatever you really need. And so in this example.
B
Another example is notebooks. I mentioned you can do ad hoc reports through Querybook, and so here's an example where we have a SQL query that we've written and the output of it produces a chart. And so, if you wanted to share the different reports that people have compiled.
B
Instead of throwing a bunch of different links to the different tools, we now have the ability to send just the DataHub link to people, and this enables us to drive more traffic through DataHub. More people will add the metadata that's required, instead of having to jump back and forth between tools and have the perpetual question of, well.
B
And lastly, I also mentioned datasets. I know there's the profiling aspect of DataHub, but what if you want to do more than that? You want to query the table, you want to join it to existing tables, and have a richer way of interacting with the data than just the profile view. We've again embedded the query execution tab per dataset, where it pre-populates the query for you. You could imagine an example use case of this being looking at query history and saying, this person has run this particular query a lot, let me go look at it again, and having an easy way of getting to the right data at the right time to make the actual life cycle easier for people, so they're not worrying about the tools but about: how do I work with data, and how do I deliver insights and value to our company faster?
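As an illustration of the pre-populated query tab described above, here is a minimal Python sketch. It is not Included Health's implementation: the `base_url` and the `engine` and `query` parameter names are hypothetical, and only the dataset URN layout follows DataHub's standard format.

```python
import re
from urllib.parse import urlencode

# DataHub dataset URNs look like:
#   urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
_DATASET_URN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([^,)]+)\)$"
)


def prepopulated_query_url(dataset_urn: str, base_url: str, limit: int = 100) -> str:
    """Build a link to a query tool with a starter query filled in.

    `base_url` and the `engine`/`query` parameter names are hypothetical;
    a real query tool defines its own URL scheme.
    """
    match = _DATASET_URN.match(dataset_urn)
    if match is None:
        raise ValueError(f"not a dataset URN: {dataset_urn!r}")
    platform, table, _env = match.groups()
    sql = f"SELECT * FROM {table} LIMIT {limit}"
    # urlencode escapes spaces and '*' so the SQL survives as a query parameter.
    return f"{base_url}?{urlencode({'engine': platform, 'query': sql})}"


if __name__ == "__main__":
    urn = "urn:li:dataset:(urn:li:dataPlatform:bigquery,analytics.members.visits,PROD)"
    print(prepopulated_query_url(urn, "https://querybook.example.com/embed/query"))
```

An embedded tab could then point an iframe at the returned URL, so the user lands on a runnable query for the dataset they were already viewing.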
B
So to summarize: instead of having bespoke tools that are sitting outside of our ecosystem, we've really brought them together under one platform, or what looks like one platform for a lot of people. We've abstracted away the need to think about these different tools and what the URLs for each of them are, and really DataHub becomes the hub for all data.
B
Obviously, I've gone a little overboard with embedding, so maybe we can clean up some of our workflows, but I think there's enough value here that hopefully it will spark some of your own ways of embedding different workflows into the tool. I also wanted to talk a little bit about gotchas around embedding content, because this is a web-based product, and hacking and man-in-the-middle attacks are always a concern.
B
What are some of the things that you have to work around if you do want to embed content, and what are some of the things that we had to work around to get there? The first is authentication. Every one of our tools uses single sign-on based authentication, and one of the things that you'll quickly find is that any tool that needs to set cookies will be blocked because of, I think it's from 2020 or something, a security patch to not allow cookies. So you have to explicitly set specific headers, as described in the link here. I'll share this presentation, so if you do end up needing to embed and you want to look at the content, feel free to take a closer look.
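The slide link is not captured in this transcript. One likely candidate, stated here as an assumption, is the SameSite cookie change that browsers began enforcing around 2020: a cookie without an explicit SameSite attribute is treated as `SameSite=Lax`, so it is not sent when the tool is loaded inside a cross-site iframe. A cookie meant for an embedded context needs `SameSite=None` together with `Secure`. A minimal Python sketch using only the standard library (the cookie name and value are made up):

```python
from http.cookies import SimpleCookie


def embedded_session_cookie(name: str, value: str) -> str:
    """Return a Set-Cookie header value usable from a cross-site iframe.

    Assumption: the embedded tool is served over HTTPS, which the
    Secure attribute requires. The samesite key needs Python 3.8+.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel["path"] = "/"
    morsel["httponly"] = True    # keep the cookie away from page scripts
    morsel["secure"] = True      # browsers require Secure with SameSite=None
    morsel["samesite"] = "None"  # allow the cookie in third-party contexts
    return morsel.OutputString()


if __name__ == "__main__":
    print("Set-Cookie:", embedded_session_cookie("session", "abc123"))
```

How these attributes are configured in practice depends on the tool; many web frameworks expose them as session cookie settings rather than requiring hand-built headers.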
The other one, and I'm not a web developer, so this is the bane of my existence, is the CORS issue, which is cross-embedding of different websites into one page. Thankfully, we use Kubernetes to launch everything, and so it's easy to add new proxies through nginx. That's been how we've solved it. If there are better ways to solve this, I would love to hear them.
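Strictly speaking, whether a page may be shown inside an iframe is governed by the `X-Frame-Options` and `Content-Security-Policy: frame-ancestors` response headers rather than by CORS, although both tend to surface together when embedding. A reverse proxy is one place to adjust them. The talk does not show the nginx configuration, so the following Python sketch only illustrates the kind of header rewrite such a proxy might perform; the function name and the origin are hypothetical.

```python
from typing import Dict


def allow_framing_by(headers: Dict[str, str], parent_origin: str) -> Dict[str, str]:
    """Rewrite upstream response headers so `parent_origin` may frame the page.

    Simplification: any existing Content-Security-Policy is replaced
    wholesale. A real proxy would more likely preserve the other
    directives and change only frame-ancestors.
    """
    blocked = {"x-frame-options", "content-security-policy"}
    # Header names are case-insensitive, so compare in lowercase.
    rewritten = {k: v for k, v in headers.items() if k.lower() not in blocked}
    rewritten["Content-Security-Policy"] = f"frame-ancestors 'self' {parent_origin}"
    return rewritten


if __name__ == "__main__":
    upstream = {"Content-Type": "text/html", "X-Frame-Options": "DENY"}
    print(allow_framing_by(upstream, "https://datahub.example.com"))
```

Serving the embedded tools under the same origin as the parent page through the proxy is another option, since same-origin frames avoid most of these restrictions.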
And then finally, we use load balancers in lots of different places around our embedded sites. One of the things you'll see, specifically if you use Flask, for example, is that there are some issues with proxies and how Flask works, so you'll have to set some headers to make sure Flask works the way you intend when it's embedded. Again, it's not super hard, but it's not something that you want to spend more than an hour or two on, so hopefully this will help unblock you if this is a feature that you're interested in implementing for your own workflows.
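The headers in question are usually the `X-Forwarded-*` headers that a load balancer or proxy adds. For Flask, Werkzeug ships a `ProxyFix` middleware (`werkzeug.middleware.proxy_fix`) for this purpose. The talk does not say which headers were set, so the sketch below is an assumption: a dependency-free WSGI middleware that shows the idea by trusting the last `X-Forwarded-Proto` and `X-Forwarded-Host` values, which is only safe when exactly one trusted proxy sits in front of the app.

```python
def trust_forwarded_headers(app):
    """Wrap a WSGI app so it sees the scheme and host the browser used.

    Assumes a single trusted proxy: the last value in each header is
    the one that proxy appended.
    """
    def middleware(environ, start_response):
        proto = environ.get("HTTP_X_FORWARDED_PROTO")
        host = environ.get("HTTP_X_FORWARDED_HOST")
        if proto:
            environ["wsgi.url_scheme"] = proto.split(",")[-1].strip()
        if host:
            environ["HTTP_HOST"] = host.split(",")[-1].strip()
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Tiny WSGI app that reports the external URL it believes it has."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    url = f"{environ['wsgi.url_scheme']}://{environ['HTTP_HOST']}/"
    return [url.encode()]


if __name__ == "__main__":
    environ = {
        "wsgi.url_scheme": "http",          # what the app sees behind the proxy
        "HTTP_HOST": "10.0.0.5:5000",
        "HTTP_X_FORWARDED_PROTO": "https",  # what the browser actually used
        "HTTP_X_FORWARDED_HOST": "tools.example.com",
    }
    print(trust_forwarded_headers(demo_app)(environ, lambda status, headers: None))
```

Without a correction of this kind, an app behind a TLS-terminating load balancer tends to generate `http://` redirects and links pointing at its internal host, which then fail inside the embedding page.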
B
That is the end of my talk. Hopefully, this was interesting. If you're interested in getting a more specific demo or just understanding more about how we use embedding, feel free to reach out to me on Slack; that's the Slack handle in the DataHub Slack channel. And in general, if you're interested in Included Health and what we're doing here, or contributing to the open source community, there is the careers page on our company website, or feel free to reach out to me directly as well. That's it for me. If there are any questions, I can take them, or if we're short on time I'll hand it off to the next presenter.
A
Yeah, I think so. Ben had a question around forking: did you have to fork DataHub?
B
Yes, we did need to fork it. We also do a lot of different custom things with DataHub, and so there have been lots of different things we've edited.
A
Awesome, I love this so much, Kartik. Thank you so, so much. This is insanely amazing. I'm also a huge fan of keeping people in DataHub for as long as possible, so I'm right there with you. Lots of love in the chat, so I recommend taking a look there.