Description
Taufiq Ibrahim (Bizzy - Warung Pintar Group) shares Bizzy's experience choosing a data catalog and working closely with the DataHub open-source community
A: Good morning, everyone. I'm Taufiq, from Indonesia. Right now I'm working at Bizzy, now part of Warung Pintar Group, and I'm going to share our case study with DataHub and how we developed the Redash source connector. Next.
A: Yeah, here are a few things about Bizzy, which is now part of Warung Pintar Group. Bizzy was founded in 2015 as a B2B marketplace, and then we went through multiple company restructures: there was a merger, there were sales, and then we were acquired by Warung Pintar in early 2021.
A: Now we are serving around 600 FMCG brands, and around 230k retailers across Indonesia. So actually we have two kinds of business here. One is the supply side, which is working with the distributors and FMCG brands, and the other is the retail part, where we work with what we call warung, which is actually an Indonesian word for grocery retailers. Yeah, thanks.
This is the data ecosystem at Warung Pintar Group and Bizzy.

A: So we have several legacy stacks coming from existing platforms from corporate enterprises, like SAP, but we also have more modern architecture, like cloud-based applications. So we have a mix of technology stacks: you can see that we have Airflow, and we still have SSIS here. Then we broke this stack into an operational part and an analytical part, and we also have the operational domain, which is actually the ERP and the application databases.
A: We do some batch processing in operational data engineering, and stream processing, which is actually quite different from most analytical data engineering.
A: We also touch the production databases, like updating data and synchronizing data from multiple sources, and then we also do change data capture from the application databases using Kafka Connect, sinking it into multiple destinations, like operational reporting DBs, and also writing into BigQuery, which is processed by Airflow to be served by several BI and reporting tools.
A: You can see that we have multiple reporting services: we have the old legacy stack, SQL Server Reporting Services; we have Metabase; we have Redash; and we also have Jupyter. Why do we have so much stack here? Because we've been through multiple mergers and sales, and we need to maintain most of it, because the users still need to use it. That's why metadata, and the lineage things, are really important here.
A: So we can understand all the data more easily. Yeah, next. So why do we need a data catalog at Bizzy? The first reason is that we have endless repeated questions from anyone: where the data is, how it is produced, who owns it. The questions are repeated every day by different people, and we keep answering them. It's also difficult to do lineage and impact analysis, because we have lots of data sources and a lot of reporting that uses the data.
A: It's quite difficult to search. If we want to change or modify some data, what is the impact on the other applications, on the reporting, something like that. Yeah.
A: So this is our journey with the data catalog, starting at the beginning of 2020.
A: We just created a simple manual data lineage in Google Sheets, and then we moved on to do a PoC with Apache Atlas, but we found that it was too complex and too Hadoop-centric at the time, so we stopped that PoC. Then we also did a PoC with Amundsen, but at the time it wasn't really answering what we needed. Then, at the end of 2020, we found DataHub, and we started doing a PoC and then development with DataHub. Next.
A: So these are some reasons why we chose DataHub, mostly because DataHub pretty much matches our data stack, especially Kafka Connect and BigQuery and Kafka, because DataHub uses Kafka a lot, right? So it really matched our requirements. And then the no-code ingestion, the YAML recipes, that's really, really helpful for us, and then the development of source connectors and sink connectors.
A: It's really, really helpful. The documentation was really helpful, and the other feature that we really love is that we can show the dashboard link from the app, click on it, and we will be brought right into the dashboard itself. And now we have role-based access, and we can limit what users can do. That's just really awesome right now. Yeah, next.
A: So this is our DataHub integrations usage here at Warung Pintar Group. We have databases, mostly RDBMS, like MySQL, SQL Server, Postgres. We also have BigQuery and Kafka, and there are two source integrations that we contributed: Kafka Connect and Redash.
A: The basic thing is actually: as long as you can construct the URNs, then you'll be fine. Previously we had several legacy lineages stored in a Google Sheet, and we just parsed them into MCEs and wrote that lineage right into DataHub, even though the source is not a working plugin.

A: We just push it directly to DataHub as MCEs.
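That "parse the sheet, push MCEs" step can be sketched roughly as below. This is a minimal illustration, not Bizzy's actual script: the CSV column names and the helper function are hypothetical, and a real version would wrap each pair in DataHub's UpstreamLineage aspect and emit it via the DataHub Python emitter.

```python
import csv
import io

def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a DataHub dataset URN from its parts."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

def sheet_to_lineage(csv_text: str):
    """Parse a CSV export of a legacy lineage sheet into
    (upstream_urn, downstream_urn) pairs, ready to be wrapped in an
    UpstreamLineage aspect and emitted to DataHub."""
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        up = make_dataset_urn(row["src_platform"], row["src_table"])
        down = make_dataset_urn(row["dst_platform"], row["dst_table"])
        pairs.append((up, down))
    return pairs

# Hypothetical sheet contents, exported as CSV:
sheet = """src_platform,src_table,dst_platform,dst_table
mysql,sales.orders,bigquery,warehouse.fact_orders
"""
print(sheet_to_lineage(sheet))
```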
Yeah, thanks. So why did we develop the Redash integration? Because after the merger, we found that Warung Pintar Group uses Redash a lot: from data analysts to product teams to HR teams, they use Redash a lot. They practically love to learn SQL, and they can use Redash quite well. It's actually developed based on the Superset source, and the other reason is that it actually helped the PoC to be approved internally.
A: Yeah, this is an example of the recipe for our Redash source. You can find it in the documentation on GitHub. Basically, what you need is the connection URL of the Redash server itself (this is the self-hosted one, I mean not the hosted Redash, but the open-source one), and we need the API key. Then we can limit the pages, for testing purposes.
A: Optionally, there is a setting that defaults to true; if you want to ingest draft or unpublished dashboards and charts, you can set it to false.
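A recipe along the lines described might look roughly like this; the values are placeholders, and the exact option names should be checked against the DataHub Redash source documentation:

```yaml
source:
  type: redash
  config:
    # URL of the self-hosted, open-source Redash server
    connect_uri: http://localhost:5000
    # Redash user API key
    api_key: REDASH_API_KEY
    # Limit the number of API pages fetched; handy for testing
    api_page_limit: 1
    # Defaults to true; set to false to also ingest
    # draft/unpublished dashboards and charts
    skip_draft: true

sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
```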
A: Yeah, with this dashboard we can see what is inside, but unfortunately, in the current development, we haven't ingested the ownership yet. Then we have the "View in Redash" button here, so we can see what it actually is.
A: This is the Redash dashboard that is actually querying the usage of Redash itself. So if you see here, we can see that.
A: But currently we haven't done things like mapping to the actual tables, because that's going to require SQL parsing, like LookML does, but we haven't developed that yet.
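The table-mapping idea could eventually look something like the naive sketch below. This is purely illustrative (a regex over FROM/JOIN clauses); a real implementation would need a proper SQL parser, since this misses CTEs, quoting, subqueries, and much more:

```python
import re

def referenced_tables(sql: str) -> set[str]:
    """Very rough extraction of table names following FROM/JOIN.
    Real ingestion would use a proper SQL parser instead."""
    pattern = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
    return set(m.group(1) for m in pattern.finditer(sql))

print(referenced_tables(
    "SELECT q.id FROM queries q JOIN users u ON q.user_id = u.id"
))
```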
B: I think someday you can probably demo us the massive lineage graph that you showed me once, which I...
A: If you want to, yeah, I can do that right now; I already prepared it for you. So one of the few things that we hope DataHub can address is lineage visualization. I think for now, most of the data catalog tools actually have the same problem.
A: If we see here, this is actually the lineage that came from our legacy lineage Google Sheets, which I just pushed into DataHub, and you can see that this is quite a large lineage graph, and when you have this...
B: Yeah, I think we will ship people Oculus glasses so they can fly through these kinds of lineage graphs, yeah.
B: Like someone once demoed.
B: All right, but yeah, point completely taken. I think lineage graphs look beautiful until they become incomprehensible, and I think that's something we as an entire industry have to actually tackle. Yeah, cool. Let's move on to the rest of the slides, and I will share them here.
A: On the DataHub development experience, coming from me and our team: actually, the contribution of the Kafka Connect source was my first open-source contribution ever on GitHub, yeah, and I thought that the community is very welcoming; they are very supportive. I even got some private messages, like Shirshanka asked me whether I still want to contribute, something like that. It's very, very supportive, yeah, and the documentation is very helpful, like how to add a new ingestion source.

A: It's really, really helpful, in a standard way.
A: Yeah, these are our to-dos and future works. Internally, we are currently in a PoC state.

A: We are still in the PoC stage, but we will socialize it and get user feedback starting from next week, and I hope that this will have an impact on our organization and, as expected, for DataHub, yeah.
We already talked about the lineage for large graphs, and then we are also interested in operational data quality metrics, something like lag metrics and row counts, just to check for anomalies on a daily basis, something like that, yeah. That's all for me. Thank you so much.