From YouTube: Data Team Group Conversation Preview: 2020-08-13
Description
Preview session for the upcoming Data Team Group Conversation livestream scheduled for 2020-08-13. Hear about the Data Team's responsibilities, objectives, and direction from Rob Parker.
https://docs.google.com/presentation/d/1N9Jq4DcLWhkJKLw9tHFr7UIwbZUUR1q_euS9Ez65moQ/edit?usp=sharing
Hi everyone, my name is Rob Parker. I run the data team; I'm the Senior Director of Data and Analytics. Thanks for joining today's group conversation covering the data team. We all have very important questions we want to answer: questions focused on sales and sales performance, on customers and customer performance, or questions that span our business functions and divisions.
Many of these questions are very important to helping GitLab grow, but they're very difficult, if not impossible, for us to answer today. If we think about the nature of these questions, what they're really driving toward is two major business workflows that occur at GitLab. The first is lead to cash and operations: questions around sales performance, or Marketing's performance in driving new qualified leads through the marketing funnels. The second is around our product: how is our product performing?
What does our adoption of feature X or Y or Z look like? Are our customers consuming our features per their purchased quotas and subscription amounts? All of these questions are critical to moving GitLab forward, but we also need to be able to trust the data we're receiving. We don't want to make a thousand-dollar, ten-thousand-dollar, or million-dollar decision based on bad data, so the data needs to be accurate and reliable. We of course want to drive toward a single source of truth, so that we don't get different answers from different sources.
The data needs to be organized and labeled, and we're all looking for that self-service capability, data democratization, where everybody in the company has access to the data when they need it. All of these are problems the data team is working on. We're part of the Finance organization, and we build out and work on solutions like key performance indicators.
As an example of where we are today, we're at level one, and some of the things you might expect to be easy for us are actually not easy for us. Recently, in Q2, we had the opportunity to work on a major cross-functional initiative to generate email lists for handover to Marketing, for communicating updates to our customers.
These are fairly large lists; we're talking about millions of records, so these aren't lists that are easy to just generate in a spreadsheet. Plus, the lists contain sensitive data, so we have to handle them with encryption and security in mind. For us, generating these three initial lists took over 15 person-days; with a well-designed dimensional model in place, which is the direction we're heading, requests like this become far simpler.
Both of these solutions are going to be based on a set of new data models we're building that are very much subject-focused. It's very difficult to take a large database that contains over 600 tables and 98 billion rows of data and just say, "Here, take this away, have fun with it." So the approach we're taking is to focus on specific subject areas, such as product, geolocation, or customer segments, provide very robust training materials for that content, and then roll those out separately.
So let's dive in here for a second: how do we actually turn 98 billion rows of data into a self-service offering? What we've chosen to do is create a reference solution; there's a link here, in our handbook, that you can take a look at. We'll establish the reference solution as the standard, and over the course of Q3 we're going to build out two brand-new self-service, subject-area-focused solutions to this standard, deliver them to over 25 self-service team members, and take a look at the results.
All of our data flows through similar paths: we land data into our environment, and we organize everything toward the ultimate destination of the dimensional model. This is how we achieve a single source of truth. If all data is organized the same way in this dimensional model, it supports self-service through the dashboard development capability in Sisense, because it all looks the same: you expect customers to be in the customer table, and you expect product lists to be in the product table. That's the way we organize it.
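The "everything lives in its expected table" idea described above is a star schema: facts joined to conformed dimensions. As a minimal sketch (the table and column names here are illustrative placeholders, not the team's actual models), a Sisense dashboard tile and an analyst writing raw SQL would both answer a question like "charges by customer segment and product" through the same model:

```sql
-- Hypothetical star-schema query: one fact table joined to two dimensions.
-- Whether issued by a BI tool or typed by hand, the same dimensional model
-- yields the same answer -- the single-source-of-truth property.
SELECT
    dim_customer.customer_segment,
    dim_product.product_name,
    SUM(fct_charges.charge_amount) AS total_charges
FROM fct_charges
JOIN dim_customer ON fct_charges.customer_key = dim_customer.customer_key
JOIN dim_product  ON fct_charges.product_key  = dim_product.product_key
GROUP BY 1, 2
ORDER BY total_charges DESC;
```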
On top of that, self-service through SQL analysis also drives through the same dimensional model. So whether you're querying the data through Sisense or through SQL, against this dimensional model you're going to get the same results. On top of this entire business flow, we're releasing a framework we call the Trusted Data Framework, which builds business-friendly data validations in all along these business processes. If we expect a very important customer to exist in the final dimensional model, with a certain set of criteria, we can build that in as a test.
A
If
we
expect
a
certain
number
of
rows
to
be
integrated
from
zora
during
our
normal
zorro
refresh.
That
also
is
implemented
as
part
of
our
trusted
data
framework.
Over
time
we
build
out
these
full
suite
of
trusted
data
tests
and
we
build
out
the
notion
of
having
improved
capabilities
and
test
assertions
built
across
our
entire
stack.
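Both expectations above, a must-exist customer and a row-count threshold after a Zuora refresh, fit a common data-testing pattern (the one tools like dbt use): each test is a query that returns rows only when the expectation fails. A sketch under assumed, hypothetical table and column names:

```sql
-- Trusted-data-style checks: each query returns rows ONLY on failure.
-- All table/column names below are illustrative placeholders.

-- 1. A very important customer must exist in the final dimensional model
--    with the expected attributes.
SELECT 'missing_vip_customer' AS failed_check
WHERE NOT EXISTS (
    SELECT 1
    FROM dim_customer
    WHERE customer_name    = 'Example VIP, Inc.'
      AND customer_segment = 'Enterprise'
);

-- 2. A normal Zuora refresh should land at least a plausible number of rows.
SELECT 'zuora_row_count_too_low' AS failed_check
FROM zuora_invoice_items
HAVING COUNT(*) < 1000000;
```

A scheduler can then run the whole suite after each refresh and alert on any query that returns a row.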
The way we actually deliver this is through what we call fusion teams. Fusion teams are a new concept for the data team that we've just rolled out in early Q3, but they're really the vehicle for us to create these very business-friendly solutions. If you looked at the data team in the past, we were organized in silos: we had a finance team member and an engineering team member. But the way the business works, if you remember, is across those two major business flows: lead to cash and product release.
Of course, like all GitLab teams, we are driven by our OKRs. We have a variety of OKRs we're focusing on in Q3. The call-out here around our self-service solutions is OKR 2: certifying 25 team members in self-service by the end of the quarter. The data team is really excited to work on this, and we believe we're going to have a very robust self-service solution. We invite any GitLab team member who's interested in this to let us know; you can add a comment to the doc.
Thanks for listening. You can contribute in a variety of ways: you can provide feedback directly in our Google Doc about this content, or you can jump into any of our Slack channels and ask questions relevant to this. We really invite your feedback and comments. Data is not a one-team solution; it's an everyone solution. Everybody has a stake in making data a reality, making self-service a reality, and helping to scale GitLab's data acumen. Thanks for your time, and see you at the livestream.