From YouTube: How the Hurb Team adopted DataHub
Description
Patrick Braz (Hurb) shares how he and his team successfully adopted DataHub within their organization during the February'23 Town Hall.
Presentation Deck:
https://docs.google.com/presentation/d/1cFZ1hhMtuXM1yU5ZHvT2aTGT89pUg33dR5ckWEzskQY/edit?usp=sharing
Learn more about DataHub: https://datahubproject.io
Join us on Slack: http://slack.datahubproject.io
Follow us on Twitter: https://twitter.com/datahubproject
Hello everyone, my name is Patrick. I'm an engineer at Hurb, and today I want to present to you the DataHub implementation journey at Hurb. I separated this presentation into four sections: in the first one I want to present a little about Hurb; in the second section I will talk about the challenges that we had and why we decided to use DataHub; in the third section I want to talk about the adoption steps; and finally I want to show you how we are today and what plans we have, where we want to go.
So, talking a little about Hurb: we are a Brazilian online travel agency, and we offer travel deals that include hotel accommodations, activities, and travel packages.
But how is that possible, and what are the challenges involved? Hurb counts on a solid data-driven culture across the company, and on a team of data analysts, data scientists, machine learning engineers, and data engineers that is growing and working hard to build a secure framework for the company's decision making. With that, we have to deal with different issues in our daily routine.
The first is resource cataloging and data asset discovery. The question is: what's the value of having many data assets if you cannot find them or discover their purpose? This was a usual problem that we had. The second is traceability of data origin, which plays an important role when strategic decisions rely on information or data.
With traceability of data origin, we can discover how data was transformed, whether it was transformed correctly, and whether the data has flowed to a specific location. And finally, the most important one is building a single source of truth. At Hurb we deal with different services, and our primary services are Metabase and BigQuery. We can catalog our assets within these two different services, but cataloging different assets using distinct services can cause metadata inconsistencies.
A
So
with
these
problems
with
this
challenges
in
mind,
we
started
to
think
about
a
data
governance
project,
so
the
first
step
was
to
create
a
project
requirements
documentation,
so
this
documentation
consolidates
all
the
requirements
in
a
clear
and
concise
manner.
So
the
idea
was
to
move
to
quit.
How
can
you
say:
what's
the
map
map
problems
and
expectations
from
the
two
like
how
users
will
use
the
platform?
How
can
we
re-engage
collaborators
to
use
the
platform,
how
applications
will
communicate
and
other
questions?
After we created the requirements document, we started to search for data catalog tools, and we found DataHub. Why did we decide to implement DataHub inside our company? I can cite four main points that drove our decision to implement DataHub. First of all is the user-friendly interface.
We have a solid self-service culture inside the company, so we want to allow any collaborator to access and navigate our services in a single platform, where they can find the assets they want and build analyses that can help them in their daily routine.
Another point is the contribution opportunity: we have a strong open-source culture inside the company, and we want to position ourselves as a solid Brazilian company. And finally, the built-in ingestion sources, which cover our primary services, Metabase and BigQuery.
So I have summarized this journey in these steps. First, we started with the POC phase; after that, we started to host our own DataHub instance, and with the DataHub instance inside our Kubernetes cluster we began the customization phase. Finally, I will present what our integration stack looks like today.
In the POC phase we tested all the features available at the moment, and the integrations with our primary services, BigQuery and Metabase. It's important to note that we used VMs to deploy DataHub with custom Docker Compose files, so we could change environment variables to test different behaviors of DataHub.
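As a rough illustration of this kind of testing (not Hurb's actual setup), a DataHub ingestion recipe can also be exercised programmatically with the `Pipeline` API; the hosts and credentials below are placeholders:

```python
from datahub.ingestion.run.pipeline import Pipeline

# Minimal sketch: run a Metabase -> DataHub ingestion recipe in-process.
# connect_uri, username, password, and the GMS server are placeholders.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "metabase",
            "config": {
                "connect_uri": "https://metabase.example.com",
                "username": "svc-datahub@example.com",
                "password": "...",
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-gms.example.com:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()  # raise if the run reported failures
```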
A
This
phase
was
important
because
of
decisions
that
we
make
for
the
future,
so
one
of
the
most
important
was
to
disable
fronti
the
front
end
injection.
So
we
want
to
use
data
Hub
as
a
source
of
Truth,
so
the
ingestion
through
UI
was
not
an
exciting
feature
for
a
science.
We
could
not.
We
could
control
the
ingestion
through
back-ends
and
that's
why
we
decide
to
orchestrate
our
injection
with
a
partial
flow
so
to
use
our
flow
is
our
injection
orchestrated?
We separate the dependencies with the KubernetesPodOperator, so Airflow just needs to start pods and provide the parameters for the execution. For that, we created a DAG factory that gathers the different ingestion recipes and builds the ingestion DAGs.
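A minimal sketch of that DAG factory pattern, assuming Airflow 2.x; the recipe list, container image, and mount path are illustrative, not Hurb's actual configuration:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

# Illustrative recipe list; in practice these could be discovered
# from a folder or a configuration repository.
RECIPES = ["bigquery.yml", "metabase.yml"]

def build_ingestion_dag(recipe: str) -> DAG:
    dag_id = f"datahub_ingest_{recipe.removesuffix('.yml')}"
    with DAG(dag_id=dag_id, start_date=datetime(2023, 1, 1),
             schedule="@daily", catchup=False) as dag:
        KubernetesPodOperator(
            task_id="run_ingestion",
            name=dag_id,
            image="acryldata/datahub-ingestion:head",  # official ingestion image
            cmds=["datahub", "ingest", "-c", f"/recipes/{recipe}"],
            # Airflow only starts the pod and passes parameters; all
            # ingestion dependencies live inside the container image,
            # not in the Airflow workers.
        )
    return dag

for _recipe in RECIPES:
    _dag = build_ingestion_dag(_recipe)
    globals()[_dag.dag_id] = _dag  # register each DAG with the scheduler
```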
Talking about the Kubernetes deployment phase, we were faced with some issues in the development experience; how can I say, managing environment variables across multiple deployments and multiple values files is costly. So, although we know subcharts are recommended as a good practice, we decided to refactor the DataHub community Helm chart into a flattened version. Besides that, we started to test the usage of ConfigMaps in separated scopes to manage environment variables across different applications. Those were the most important decisions, and one thing that is important to note is that we are planning to open source our Helm charts in the future, I think this year.
Talking about the customization phase: we know that there is already an Airflow integration with DataHub, but we had problems with its dependencies, so we decided to implement a new integration on our side, based on the community integrations, and to use the new Airflow concept of Dataset objects, so data engineers can enrich metadata during Airflow DAG development. With this integration we can not only take advantage of triggering the execution of DAGs by dataset changes, but we can also automatically build lineage with the lineage backend.
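Hurb's integration is in-house, but the underlying Airflow Dataset mechanism (available since Airflow 2.4) looks roughly like this; the URIs and DAG names are made up for illustration:

```python
from datetime import datetime

from airflow import DAG
from airflow.datasets import Dataset
from airflow.operators.python import PythonOperator

# Illustrative dataset URI; the producing task declares it as an outlet.
orders = Dataset("bigquery://example-project/analytics/orders")

with DAG("produce_orders", start_date=datetime(2023, 1, 1),
         schedule="@daily", catchup=False) as producer:
    PythonOperator(
        task_id="transform",
        python_callable=lambda: print("orders table rebuilt"),
        outlets=[orders],  # metadata an integration can also turn into lineage
    )

# This DAG runs whenever the dataset above changes, instead of on a cron.
with DAG("consume_orders", start_date=datetime(2023, 1, 1),
         schedule=[orders], catchup=False) as consumer:
    PythonOperator(task_id="refresh", python_callable=lambda: print("refreshed"))
```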
Okay, so what does our stack look like today? I created this diagram to show the integrations that we have today. And I forgot to talk about Anomalo: Anomalo is a data quality platform that we use to build our data validations. Airflow manages all the ingestion into DataHub.
With this integration, we have visibility of quality control from source to destination. And one thing that is very important: we are frequently using the impact analysis feature to see who is impacted by some change, or by a data issue that Anomalo finds.
talking
about
our
roadmap.
A
One
important
thing
that
we
are
currently
working
in
is
customizing
our
front
end,
so
we
find
that
it's
very
important
to
use
the
company
visual
identity,
and
this
will
help
employers
to
identify
with
the
two
and
increase
engagement,
and
this
is
very
important
because
we
want
to
adapt
data
Hub
or
so-called
data
Urban
internally,
as
a
data
product
for
the
company
and
other
things
that
we
are
planning
to
do
is
customize
the
metadata
model
to
Inc
to
include
apis
and
Metric
entities.
Most of our data sources are our APIs, and now we are working to build our semantic layer, so these two entities will be very important for us. Another thing is to integrate our machine learning models and services into our stack; this will help us build our full data lineage, from the sources to the services that use them. And finally, we want to use the Actions framework.
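For context, the Actions framework reacts to metadata change events with pluggable actions. A minimal sketch of a custom action, following the shape of the `datahub-actions` Action interface (the class and what it does are illustrative):

```python
import json

from datahub_actions.action.action import Action
from datahub_actions.event.event_envelope import EventEnvelope
from datahub_actions.pipeline.pipeline_context import PipelineContext

class LogEventAction(Action):
    """Illustrative action: print every metadata change event it receives."""

    @classmethod
    def create(cls, config_dict: dict, ctx: PipelineContext) -> "Action":
        # No configuration needed for this sketch.
        return cls()

    def act(self, event: EventEnvelope) -> None:
        # React here instead: notify a channel, open a ticket, etc.
        print(json.dumps(json.loads(event.as_json()), indent=2))

    def close(self) -> None:
        pass
```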