From YouTube: Case Study: Contributing to DataHub
Description
Eric Cooklin (Stash) shares his experience collaborating with the DataHub Community and contributing back to the Open Source Project.
Learn more about DataHub: https://datahubproject.io
Join us on Slack: http://slack.datahubproject.io
Follow us on Twitter: https://twitter.com/datahubproject
Recently at Stash we added some dataset transformers to the upstream repository, and we were asked if we would like to explain why we did that, what the use case was, and what our experience was like, because this was also our first time contributing to an open source project. So how easy was it? Were there any snags? Spoiler alert: it was pretty easy.
A little bit of background: my name is Eric, and I'm joining you from Austin, Texas. I've been a data engineer at Stash for two years. Stash is an industry-leading subscription platform empowering everyday Americans to build wealth. As you may know, middle-class Americans are struggling to build wealth: the U.S. stock market is the largest generator of wealth in our history, and yet 45 percent of Americans aren't currently invested, and the majority of Americans live paycheck to paycheck.
Stash is a personalized financial platform with tools for investing, banking, insurance, and more that helps people grow their wealth for the long term (more on that after the presentation). This journey started back in Q2 of last year, when our team was looking at the data catalog offerings out there, both from paid vendors and from the open source community. Like many of you, we ended up choosing DataHub because we really fell in love with the product.
It was one of the only options that really matched our technology footprint, and the openness of the platform meant it could grow as we grew. Some of the cool features, like data lineage and the business glossary, were really appealing to us, and we knew we wanted something with strong support. This open source community seemed really active and really strong, and as we learned, that's a huge benefit for us and probably for a lot of others as well.
In Q3 we started using the tool internally and began onboarding some of our business teams, trying to capture what they know by working with their SMEs and analysts.
We identified some low-hanging fruit to add to DataHub to make sure the out-of-the-box user experience would be really strong. On top of the metadata that comes from the various platforms we're ingesting, we wanted to add Confluence links, create some tags, and add to the business glossary, to really help the first users get familiar with the platform and have that wealth of data right at their fingertips.
But the big question was: how do you add these into DataHub efficiently? From the meetings with the business teams we knew which datasets needed which terms, tags, and links, but there wasn't really an out-of-the-box way to get that information into DataHub.
A
There's
the
graphql
api,
which
kind
of
was
the
obvious
solution,
we'll
just
read
it
in
send
it
up
blah
blah
blah
it
works,
but
reading
into
kind
of
the
source
code
we
identified
that
hey.
There's
data
set
transformers
out
there
that
already
modify
the
data
set
entities.
What
if
we
can
just
use
those
ourselves
to
add
like
tags
or
business
glossary
terms,
so
that's
kind
of
talking
with
the
project
management.
So we used a dataset transformer that was already available, pattern_add_dataset_ownership, whose config is on the screen right here. In the rules section there is a regex pattern and then an array of URNs, and the dataset transformer applies all the URNs in the array to whatever datasets match the regex pattern. So we figured this should be pretty straightforward to copy over for glossary terms, tags, and so on.
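The rules mechanism described above can be sketched in plain Python. This is only an illustration of the pattern-to-URN mapping logic, not DataHub's actual config classes; the dict keys and helper name here are hypothetical:

```python
import re

# Hypothetical stand-in for the transformer's "rules" config:
# each regex pattern maps to an array of URNs to attach to matching datasets.
rules = {
    r".*\.users$": ["urn:li:corpuser:data-eng"],
    r".*\.payments.*": ["urn:li:corpuser:payments-team"],
}

def urns_for_dataset(dataset_urn: str, rules: dict) -> list:
    """Collect every URN whose regex pattern matches the dataset URN."""
    matched = []
    for pattern, urns in rules.items():
        if re.search(pattern, dataset_urn):
            matched.extend(urns)
    return matched

print(urns_for_dataset(
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod.payments_ledger,PROD)",
    rules,
))  # -> ['urn:li:corpuser:payments-team']
```

Because the mapping is just "regex over the dataset URN, then append URNs", the same shape works whether the URNs being applied are owners, tags, or glossary terms.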
And it turns out it was pretty straightforward. If you're not familiar with how this works, here's a very quick overview, because I don't want to take too much time here: when the pipeline runs, right before it writes each record to whatever sink is defined, it passes the record to a transform function, and the transform function simply says: if there's a transformer, transform the record, as shown on the right side of the screen.
It does that using a method called transform_one, which, as you can see, takes in a metadata change event and outputs a metadata change event. So that's what we'd be extending. As an example, here's what we did for dataset terms: on the left side of the screen, the dataset terms class just extends that dataset transformer and, most importantly, its transform_one method actually does something.
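In spirit, the extension looks something like the sketch below. The class and method names mirror what the talk describes (a base transformer whose transform_one maps one metadata change event to another), but the types are simplified stand-ins rather than DataHub's real MCE and builder classes:

```python
import re
from dataclasses import dataclass, field

@dataclass
class MetadataChangeEvent:
    """Simplified stand-in for an MCE: a dataset URN plus its glossary terms."""
    dataset_urn: str
    glossary_terms: list = field(default_factory=list)

class DatasetTransformer:
    """Base class: transform_one maps one MCE to one MCE (default is a no-op)."""
    def transform_one(self, mce: MetadataChangeEvent) -> MetadataChangeEvent:
        return mce

class PatternAddDatasetTerms(DatasetTransformer):
    """Attach glossary-term URNs to every dataset whose URN matches a rule's regex."""
    def __init__(self, rules: dict):
        self.rules = rules  # regex pattern -> list of glossary-term URNs

    def transform_one(self, mce: MetadataChangeEvent) -> MetadataChangeEvent:
        for pattern, term_urns in self.rules.items():
            if re.search(pattern, mce.dataset_urn):
                mce.glossary_terms.extend(term_urns)
        return mce

# The pipeline calls transform_one on each record just before writing to the sink:
transformer = PatternAddDatasetTerms({r".*\.accounts$": ["urn:li:glossaryTerm:PII"]})
mce = transformer.transform_one(MetadataChangeEvent("prod.accounts"))
print(mce.glossary_terms)  # -> ['urn:li:glossaryTerm:PII']
```

The design point the talk makes is that all the plumbing (the pipeline hook, the config parsing) already exists in the base and the ownership transformer, so a new entity aspect only needs its own transform_one body.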
It doesn't just pass: we use the MCE builder to add a glossary term to the MCE and then pass it back. That was already implemented by the pattern_add_dataset_ownership class, so for us it was literally copy and paste; that's how straightforward and easy it was. And on the right side of the screen is the magic that makes pattern_add_dataset_ownership work: there's already a config model that takes in the regex patterns and parses them through.
So we really just copied that implementation and used it, and that's the crux of it all. This was our first time contributing, so we didn't go for some huge feature. We just thought: we have this problem, and maybe others have the same problem, so let's explore it and contribute it back upstream. If the community doesn't like it, they can always just pass on it, so the risk was really low.
We had the flexibility from project management and, most importantly, I think we had the community to back us up. Reading through the Slack channels and through the GitHub was pretty invaluable to our experience: we could see that other people were hitting the same problems, like setting up the dev environment, and that they were really easy to fix. With the documentation, the environment setup, the testing, and all that, the contributing experience was pretty straightforward and really easy.
I think we had a few comments from Shirshanka and the team, but honestly it was really straightforward, and I really can't stress how invaluable that Slack communication was. Using the search function, there's a lot of good information in the community already.
Obviously we probably wouldn't have gotten as far as we did without that, even with an addition as simple as ours. Thank you to the Stash team for all your support. And if you're interested in joining our team, there's a link on your screen right there, stash.com/about/careers. We're growing, and if our mission sounds like something you'd love to help build out, please reach out to me either on Slack or on LinkedIn.