Description
Dane Curran, Edward Morgan, and Levi Le from Raft submitted this hack during the inaugural Metadata Day 2022 Hackathon, winning Community Favorite Hack!
"We decided to tackle one of the biggest asks we've had from data scientists and analysts interested in using Datahub to look for interesting datasets: how can I easily get from a user-friendly metadata UI (Datahub) to a place where I can interact with the data? To accomplish this, we wanted to add a new "Visualize with Superset" button in Datahub that links to the dataset loaded in Apache Superset, a tool that can be used for SQL exploration and data visualization."
See the code submission here: github.com/raft-tech/datahub-metadata-day-2022
All right, hi, and thank you for reviewing our submission. My name is Edward Morgan, and I am a data engineer at Raft.
For our hackathon proposal, we decided to add a feature to DataHub that we've had requested more than once: how do you get from the DataHub UI, which is a really rich way to view and search for metadata about a dataset, to actually playing with the data, even if it's in a web or browser-based fashion? How do I get to actually see what is in those datasets?
So that's what we did. Just to take you through where we're at: we are deployed on Kubernetes. I'm running locally in kind, which is Kubernetes in Docker, and we've got Kafka, Elasticsearch, Postgres, Superset, and Trino up. Trino and Postgres are where we're sourcing data from; Trino is SQL over everything.
The datasets that we're actually looking for are in Postgres. We ingested them using the Trino plugin, and that's how we'll access them. Superset is an Apache open source project that provides a really easy way to explore and visualize your data. It has a UI in the browser where you can look at things in a SQL editor and then generate charts from those. So what we wanted was a single-click way of going from DataHub to Superset.
So let's say I'm a data user. I'm looking in Trino, and there are these two datasets that I'm interested in, these AIS datasets from NOAA. We can see the schema and maybe some documentation, some properties, some ownership. We know how large they are, but if we actually want to visualize and view them, that's where we're adding some functionality.
So we see up here that we added a new button to the sidebar of the dataset entity, to visualize with Superset. If we click that, what happens behind the scenes is that the DataHub front end calls a mutation that we made to go back to DataHub GMS. On GMS, it carries out a sequence of steps to set up the data connection in Superset to Postgres, where the data is actually residing.
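The kind of step sequence described here can be illustrated with a minimal Python sketch. It is not the code from the submission: it only assumes the standard DataHub dataset URN layout and Trino-style `catalog.schema.table` names. The dataset name `postgres.public.ais_2020` and the helper names are hypothetical, and the payload field names are modeled on Superset's dataset REST API and should be checked against the Superset version in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRef:
    platform: str  # data platform name, e.g. "trino"
    name: str      # fully qualified dataset name
    env: str       # fabric / environment, e.g. "PROD"


def parse_dataset_urn(urn: str) -> DatasetRef:
    """Split a DataHub dataset URN into platform, name, and environment."""
    prefix = "urn:li:dataset:("
    if not (urn.startswith(prefix) and urn.endswith(")")):
        raise ValueError(f"not a dataset URN: {urn!r}")
    body = urn[len(prefix):-1]
    # The platform URN comes first, the environment last; the name sits between.
    platform_urn, rest = body.split(",", 1)
    name, env = rest.rsplit(",", 1)
    platform = platform_urn.rsplit(":", 1)[-1]
    return DatasetRef(platform=platform, name=name, env=env)


def superset_dataset_payload(ref: DatasetRef, database_id: int) -> dict:
    """Build a request body for registering the table in Superset.

    Assumption: the name has the Trino form catalog.schema.table, and
    database_id is the id of an existing Superset database connection.
    """
    _catalog, schema, table = ref.name.split(".", 2)
    return {"database": database_id, "schema": schema, "table_name": table}
```

In a flow like the one shown in the demo, the resulting payload would be sent to Superset after the database connection exists, and the user would then be redirected to the dataset in Superset.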
That setup uses information coming from the dataset name, the data platform name, and information that we can include in the deployment, like environment variables for the DataHub GMS pod, which is how we passed in some information. Something we could expand on outside of a hackathon is how to pass that in in a more dynamic way, for example using secrets for passing in credentials. So from the data user's perspective: we are in DataHub, we click a button, we're in Superset. We can run a query and see the results in Superset.
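Passing connection details through environment variables, as mentioned above, might look roughly like the following Python sketch. The variable names (`SUPERSET_PG_USER` and so on) are hypothetical and are not taken from the submission; in a Kubernetes deployment the same variables could be populated from a Secret rather than set in plain text.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def postgres_uri_from_env(env: Mapping[str, str] = os.environ) -> str:
    """Build a SQLAlchemy-style Postgres URI from environment variables.

    All variable names here are illustrative assumptions.
    """
    user = env["SUPERSET_PG_USER"]
    password = env["SUPERSET_PG_PASSWORD"]
    host = env.get("SUPERSET_PG_HOST", "localhost")
    port = env.get("SUPERSET_PG_PORT", "5432")
    database = env["SUPERSET_PG_DATABASE"]
    # Percent-encode credentials so characters like "@" do not break the URI.
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
```

Accepting the environment as a parameter keeps the function easy to exercise with a plain dictionary instead of the real process environment.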
You can also do things like create charts and dashboards and then view that lineage in DataHub, which is really nice. And just to show that it's not all hard-coded: we've got this NOAA 2020 dataset, and there's also a NOAA 2021 dataset, so we can do the same thing, and we see that the NOAA 2021 dataset is what's being selected here. It's reusing the database connection that it set up previously, so it sort of lazily evaluates that and then sets it up.
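The connection reuse described here is a get-or-create pattern. A minimal Python sketch, under the assumption that creating a connection returns some numeric id, is shown below; the class and parameter names are hypothetical, and the submission may implement this differently (for example by asking Superset whether the database already exists).

```python
from typing import Callable, Dict


class ConnectionRegistry:
    """Create a database connection on first use and reuse it afterwards."""

    def __init__(self, create: Callable[[str], int]) -> None:
        self._create = create          # called only when no connection exists yet
        self._ids: Dict[str, int] = {}  # database name -> connection id

    def get_or_create(self, database_name: str) -> int:
        if database_name not in self._ids:
            self._ids[database_name] = self._create(database_name)
        return self._ids[database_name]
```

With this shape, clicking the button for a second dataset in the same database would trigger only the dataset registration, not a second connection setup.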
We intend to expand upon this to make it a little more full-featured, and also to support data sources other than Postgres over Trino. Trino is nice because it runs SQL over everything, but including other data sources like Kafka might be useful. We decided to tie this all into DataHub because DataHub is the central way in which people are viewing datasets, and we think this is a really great addition because it allows people to go directly from dataset exploration to data exploration.
We think that's really important. I think that's about all for our hackathon topic. We've pushed our code up, and we hope that you take a look and enjoy it. Thank you for giving us the opportunity.