Description
Maggie Hays and Shirshanka Das from Acryl Data give an update on the DataHub Community and Project for the month of October:
- Community growth + DataHub Swag preview
- Q4 Roadmap Update
- Contributor shout-outs
- Release highlights
- Improvements to User & Group Management
- Nested field support for Hive + Trino
Join us at our next Town Hall - RSVP here: https://forms.gle/g8EpCLnohtPLLtdg6
A
Let's take a look at what the community's been up to. To date, we're nearing 600 members. This is bonkers to me; I remember when there were maybe 150 members in the community, so the fact that we grew by 200 in one month is just mind-boggling. For those of you who are new or have not joined us yet, we have office hours every Tuesday and Thursday, so we welcome you to bring any and all ideas, questions, and troubleshooting; we're there for you. And also a reminder: we're all about collaboration.
A
So if you want to help contribute, if you're looking for ideas, or you're looking for support, head over to the contribute channel and we'll get you set up. And if you're just really proud of something you've done with DataHub, I personally would love to hear about it. I geek out about this stuff, so join us in show-and-tell and let us know. The other thing is, we have a very silly but wonderful way of showing appreciation for one another: if someone's made your day or gone out of their way to help you, please say thanks by giving them a virtual taco with our Slack bot, HeyTaco.
A
Is it silly to send a virtual taco? Absolutely. But it's okay to be silly and also show thanks. Coming up in future months, we're going to start building out some redemption programs for HeyTaco: if you accumulate tacos, you can trade them in for swag and other little perks and bonuses. All you have to do is tag someone's username, tell them what you appreciate about what they did for you, add in the taco emoji, and bada bing bada boom, you've given a taco. Also, here's an exclusive peek at some upcoming swag. Is that a fanny pack? Yes, we are absolutely going to have a DataHub-branded fanny pack.
A
We are well into Q4, so we have a little bit of work to do to update our published roadmap; more to come there. But I just wanted to give a heads-up that the core DataHub team is going to be working on building out additional support within the metadata model, specifically for schema history and column-level lineage, which is a widely requested feature for us, and for data quality, specifically targeting Great Expectations while also building out a more generalized data quality metadata model. We're also going to be building out support for multiple data platform instances.
A
We're also going to be focusing on improved support for dbt, and figuring out a better way to organize its entities. Another widely requested feature from the community is better handling of stale metadata: if something is actually deleted or removed upstream or in production, it will also be removed or soft-deleted from DataHub, to minimize confusion about whether those assets are even still available for you.
The last one is Spark dataset lineage; that's on the horizon as well. And then just a general call for community support: we're still hearing a lot of interest in building out a Tableau connector, and ClickHouse is also a big one. I think they just raised something like 250 million dollars this week, so we're seeing a lot more activity there. So, Tableau and ClickHouse: if you are interested in contributing, either in the build or the design, please reach out to me and I can help facilitate the community collaboration. That's all from me. Shirshanka, I'm going to pass it over to you and we'll talk about project updates for October.
B
Awesome. So, first off, there was a lot of activity in the last couple of releases, and three new companies officially joined our community as adopters. Peloton: I think everyone knows who they are, so I'm really excited that we're all going to be getting Pelotons from them shortly.
B
All right. People might know Arun from being a long-time DataHub contributor; he was at Expedia, and it looks like DataHub jobs are a thing. So Arun, why don't you share a little bit about your journey from Expedia to Peloton?
C
I was one of the initial contributors for DataHub within Expedia Group, and we were able to get some good traction there. I recently moved to Peloton. Peloton is in a stage of rapid growth; they're trying to get a data discovery platform, and DataHub suited what they're looking for. So the community definitely helps in getting you visibility and getting your job search around; that is something I can guarantee.
B
Awesome. There are a few other companies that have added their logos. DFDS: if you don't know who they are, they're actually a really large Danish shipping company, the busiest shipping company according to Wikipedia.
B
The next one is actually a crypto company, and they're obviously much earlier in their journey as a company than DFDS or Peloton. But it's very nice to see companies at very different stages of their journey deciding that they need something like DataHub and then deciding to adopt it, so it's very nice to see the wide range of adoption of this project. That said, let's go into the release details.
B
We had 30-plus contributors from around 20-plus companies. It's sometimes hard to know from the GitHub handles whether people are from the same company or not, but I did my best, and I think we've got about 20 companies contributing to the project right now, which is great. That's substantial growth from around 10-plus last month, so we've almost doubled the number of companies sending contributions into the project, along with a lot of new contributors.
B
Thankfully, GitHub now gives a nice release-notes highlight showing who the new contributors are. So thank you to all of you for the new contributions you made this month. Hacktoberfest, I think, had something to do with it, but I would like to believe that this is actually going to be a trend in terms of contributors.
B
Shout-outs: a huge shout-out to Enrico for improving the testing infrastructure; he battled the CI system like crazy. Ben Marty has been working really hard on getting DataHub to work on M1; thanks for all your work and patience so far, and hopefully by the time the next town hall rolls around we'll actually have an M1 quickstart deployment working. Sim Bunsel, as usual, for always following up on small things and sending PRs. David Schmidt sent his first PR to meta-world, which is still under wraps but is a public GitHub repo where we're starting to store recipes and best-practice examples of how to work with DataHub. David contributed an example of how to write a custom source and run it on your end without having to fork DataHub, which was pretty cool; I recommend you check it out.
B
Remy Salman, of course, has been quietly improving the Looker and dbt sources, so I love all the contributions he's making. And back at you, Maggie, for contributing the Features overhaul page. I think it was long overdue, and I really like how nicely you've laid it out with GIFs and everything, so people have a very easy way of understanding how to use DataHub.
B
It's actually been quite fun, and I'd highly recommend joining them to geek out about metadata model designs, troubleshooting, and pretty much anything. Kitty Danielle, for being very patient with us, as well as sending lots of interesting problems and solutions to the community. Aseem and Remy, of course: thanks for helping people out as well. And Jared Martin, a newcomer, but with lots of great questions and engagement.
B
So thanks for all the engagement from all of you. Cool. Then, moving on to details about the release and the upcoming release: 0.8.16 is out, along with the matching Python release.
B
Support for Redshift usage landed, along with support for external tables in Redshift and a few small improvements in representing Redshift types. There's a host of other improvements; you can go over the commit history and see all of them. Trino landed, along with some improvements to Hive; MongoDB got improvements in handling large document sizes; there's BigQuery lineage, which we'll have a talk on later today; and some performance improvements that we snuck in as well. I don't know how many of you have been running Looker ingestion.
B
Looker ingestion usually takes a while, because most people have a lot of dashboards and a lot of charts. We added parallelism to the Looker source, so it's now ingesting metadata much faster. Similarly, on the DataHub REST sink side, we added a max_threads config variable.
B
It currently defaults to one, just because we didn't want to randomly surprise the community with a lot of parallelism. But if you know what you're doing, you can go into the config and change max_threads to, say, 10, 20, or 30, and see a real (not exponential, but multiplicative) improvement in your ingestion throughput. In some cases we were able to get Looker ingestion times down from 50 minutes to about five minutes, so it has a huge bang for the buck, because you're basically I/O-bound calling Looker and then calling DataHub REST to get the metadata in. On the DataHub Kafka sink you obviously won't see those throughput problems, because the Kafka sink automatically batches and sends data off asynchronously; it's the DataHub REST sink that we improved. Very excited about that. Cool, moving forward.
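As a rough sketch of what enabling both of these could look like, here is a hypothetical ingestion recipe pairing the Looker source with the datahub-rest sink. The URLs, secrets, and thread count are placeholders, and option names can vary by release, so check the docs for your DataHub version:

```yaml
# Hypothetical recipe: Looker source feeding the DataHub REST sink.
# All values below are placeholders, not real endpoints or credentials.
source:
  type: looker
  config:
    base_url: https://company.looker.com       # placeholder Looker instance
    client_id: ${LOOKER_CLIENT_ID}             # from your Looker API3 key
    client_secret: ${LOOKER_CLIENT_SECRET}
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080              # placeholder GMS endpoint
    max_threads: 10   # defaults to 1; raise only if your server can absorb it
```

Since the REST sink issues one request per metadata event, raising max_threads mainly helps when ingestion is I/O-bound, as described above.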
D
You can see all the users and groups that you've ingested into DataHub, as well as those that are active. In some cases you may batch-ingest users from, say, AD, and then you can actually see whether a user has logged in via SSO, using a little "active" badge, which is pretty handy. You can remove users and groups; there are a lot of cases where you're manipulating how users are ingested and changing things.
D
So we've tried to make it a little bit easier to see where you're at and then fix things up on the fly. You can remove users and groups, you can create new groups through the UI (as opposed to ingesting them from a third-party source), and you can add and remove members from groups. Of course, this is all integrated with the policy system, ownership, and so on, so you can do the entire workflow of creating a group, adding members, and then adding responsibilities to that group.
B
Yeah,
I
blocked
my
screen
accidentally
back
here,
so
so,
quick
updates
on
what
happened
on
the
ingestion
side,
nested
field
support,
landed
for
high
ventrino.
It
was
a
long-standing
community
request,
so
we're
happy
it's
available
in
zero,
eight,
sixteen
one
and
beyond,
I
would
recommend
moving
to
zero
eight.
Sixteen
two,
it
just
looks
like
how
it
looks
on
the
screen
very
simple.
B
Maybe I want to transform metadata in flight and set owners. One of the things people often run into is that someone has gone into the UI and added more owners or made some changes, but if the ingestion system runs, it can override that ownership metadata. We were thinking about how to improve the situation, and one of the things we came up with, which is very simple but can actually help quite a bit, is using DataHub itself during the transformation process to assist with the transformation. We're calling it server-assisted transformers, and what that leads to is the transformer having access to the DataHub graph while it is making the transformation happen. If you click the next animation: transformers now have access to a context object. This is coming soon.
B
It's not in the code yet, but transformers will have access to a context object which allows them to get at the graph, which lets them first see who the existing owners are and then decide what they want to do. This allows transformers to do sophisticated things like patching owners, or figuring out the new set of owners that they need to write, so they're not deleting owners that should not be deleted. Here are a couple of screenshots of what that might look like: you get a pipeline context in your transformer which has access to the graph, and then in your transformer, if you've got a configuration called semantics, you can look at the semantic you want and apply that change. It's pretty simple, but I'm very excited about it.
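Since this feature isn't in the code yet, here is a purely illustrative sketch, in plain Python rather than the real DataHub API, of the patch-versus-overwrite decision a server-assisted transformer could make. The function name `merge_owners` and the two semantics labels are hypothetical:

```python
def merge_owners(existing, incoming, semantics="PATCH"):
    """Combine owners already on the entity (as read from the DataHub graph)
    with owners supplied by the ingestion recipe.

    semantics="OVERWRITE" reproduces today's behavior: the recipe wins.
    semantics="PATCH" keeps UI-added owners and only appends new ones.
    """
    if semantics == "OVERWRITE":
        # Ignore what the server has; write exactly what the recipe says.
        return list(incoming)
    # PATCH: start from the owners the server already knows about,
    # then append recipe owners that are not already present.
    merged = list(existing)
    for owner in incoming:
        if owner not in merged:
            merged.append(owner)
    return merged
```

The point of the graph access is the `existing` argument: without it, a transformer only ever sees `incoming` and has no choice but to overwrite.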
B
Cool. Next is an interesting thing we did, which is to use DataHub itself to represent the metadata model. This has been a common ask: people have looked at the DataHub metadata model, which is scattered across the code base in many, many PDL files, and it can often be challenging to understand how everything relates to everything. So we built a bit of an automated system that can process all of these PDL and Avro files and produce a metadata model, and we used dot to render it.
B
Obviously it looks kind of messy and complicated, like real metadata models do, but it's a very easy way to look at how everything relates to everything, and at all the relationships that have already been created by the metadata model. If you actually want to browse it live, you can click on the link right here, which is datahubproject.io, and then navigate to the entities page. So Maggie, do you mind clicking on that?
B
You'll be able to see pretty much all of the entities we have modeled, represented just as datasets for now. If you, for example, click into the dataset entity, you'll be able to see the entire schema of the dataset model, including the URN and the dataset key. It's all modeled as structs, so you can expand them out and look at them in all their glory.
B
So Maggie, if you don't mind, take a look at ownership.
B
Oh, there it is: ownership. If you expand out owners, you'll be able to see that the relationships that exist between datasets and owners are actually modeled as foreign keys. If you click on that foreign key link, you'll see that it's a link to CorpUser, and that there's another foreign key relationship to CorpGroup. All of this is generated automatically off of the metadata model itself, so I'm very excited that we're able to automatically generate a bunch of stuff like this.
B
If you scroll down further, you'll also see the time-series aspects we're able to capture; they're marked as temporal. You can see the dataset profile aspect as well as the dataset usage statistics aspect in there. So have fun with this: it's loaded up into the demo, and we'll ship it into the code base as well, as something you can build locally to browse the metadata model on your own local DataHub.