From YouTube: Discussion about product analytics export functionality
Description
Max Woolf, Dennis Tang, and Sam Kerr discuss the problem, proposal, and requirements around exporting user data for Product Analytics.
A
Perfect. So we're here today to talk through some questions about the requirements around exporting data from Product Analytics. We're currently scoping that to two different issues, but for this discussion we're talking about the one about exporting to JSON files.
B
That's right. The overall aim is that data in GitLab's new Product Analytics capability should be portable: exportable and movable to other tools. That's the general gist of what we're trying to aim for. Would you agree with that, Sam?
A
Maybe a little bit softer on that last part about integrating with other tools. That's a value-add feature, and it's something we want to do, but it's not a critical requirement. Really, the base-level requirement is that we need to make sure customers and users don't feel like they're locked into GitLab because we're holding their data and won't give it to them.
B
Okay, so that's a useful "why". The problem I've come up with here is that in the issue itself, we're discussing implementing an API that connects to Cube and then exports all the user data, which sounds right. However, Cube doesn't really allow you to do that. My understanding of Cube, at least, is that it takes data and produces information based on dimensions and measures.
B
So the idea that we could use Cube to output a list of raw JSON objects, I don't think, is possible. And even if it were, it's not really what Cube is designed for, so I'm not sure how scalable it would be. Which leads me to two questions. I guess one is: is the outcome here that we just want access to the raw data?
B
Do we want to be able to take the raw data that's in ClickHouse and just provide that as a JSON file to the user? Or do we want to produce something that Cube can give us, which is probably going to be aggregated by some sort of time window, and then provide that? Because that's essentially what Cube does anyway: it just produces JSON that you can then turn into graphs, if you want to.
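For context, that aggregated output is what Cube's REST API serves at its documented `/cubejs-api/v1/load` endpoint. A minimal sketch of the kind of query involved, assuming a local Cube deployment and a hypothetical `Events` cube with a `count` measure and a timestamp dimension (the schema names are illustrative, not GitLab's actual setup):

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Cube query: a count of events bucketed by day.
# Cube returns rows shaped by measures and dimensions, not the raw
# underlying events -- which is the limitation discussed above.
query = {
    "measures": ["Events.count"],
    "timeDimensions": [
        {"dimension": "Events.timestamp", "granularity": "day"}
    ],
}

url = (
    "http://localhost:4000/cubejs-api/v1/load?query="
    + urllib.parse.quote(json.dumps(query))
)
# Cube expects an auth token in the Authorization header; placeholder here.
req = urllib.request.Request(url, headers={"Authorization": "CUBE_API_TOKEN"})
with urllib.request.urlopen(req) as resp:
    rows = json.load(resp)["data"]
    print(rows)  # e.g. [{"Events.timestamp.day": "...", "Events.count": "42"}, ...]
```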
B
So, yeah, I guess I want to straighten out whether the outcome here is that we want access to ClickHouse data that hasn't been edited or analyzed in any way whatsoever, which is what I'm looking for. In which case, right now we don't have that direct connection from GitLab to ClickHouse, and Cube handles all of that security stuff that stops the data leaking between projects. So that's just not something we've really considered right now. So yeah, that's where we are.
A
Yeah, so I think there are two parts to the question you asked. One is about the raw data: that's definitely where we want to be, with the intention being that you could take all your data from GitLab in a lossless format, and you could do any sort of analysis or whatever you want to do with it after it's out of our system.
A
The second part, about JSON specifically, I'm a little bit more flexible on. We picked that from an earlier discussion because we thought it would be simpler to do versus some other format, but JSON itself is not a hard requirement. If there's something easier to use, that's fine!
A
Sorry, the reason I say that is I saw Rob's comments about Cube having a backup option, which it looks like just exports a zip file. A zip file would be totally fine as well, if it's feasible to go that route.
C
I don't think format's the main concern here, though; it's more about how we're exporting the data, or where from, right? And I agree with Max's assessment that Cube is not the right tool, because I think one thing we could have clarified is that exporting is just that.
C
From my perspective, from our production readiness conversations as well, I think there might be some prerequisite steps we would need to require for us to feel comfortable with that, since Max is mentioning that, you know, Cube has been handling all these security contexts and stuff like that. I really just last-minute dropped the link to CHProxy, which handles authentication, rate limiting of requests, and things like that. So at some point, I guess, this is turning into my perspective on it, and I apologize for that; I should have just started with that perspective.
C
But then, even if we implement CHProxy, which is just another set of security tools, and I do like that you can map users to others. Anyway, I think we need a layer in front of ClickHouse before we expose it, is what I'm saying, because it seems like that's what we would need for this exporting job. And that doesn't even get into the further details of, like, where do we store these artifacts?
C
You know, eventually we're talking about gigabytes, if not terabytes, of data. How does that work? So my...
B
Yeah, you're basically exactly right. At the moment, ClickHouse is not accessible to the internet, essentially, and that buys us the security we need, in the sense that you can't access it. The only way you can access it is via Cube, and Cube does all the stuff that says you can't access this row, but you can access this table, based on who you are. Which is great, and that's exactly what we need, yeah.
B
The bigger problem we're going to have is if we need to export ClickHouse data in some sort of automated fashion. How will we do that? We'll need to do exactly the same thing we're doing with Cube. So, in short, I agree with Dennis, in the sense that anything's possible; we can do anything we want. Whether or not we can do that as part of an MVC is very much up for debate, I would say. I mean, it comes down to priorities.
A
Well, let me ask another question about the zip file, because the thing that piqued my interest there was not the change in format versus JSON, but the fact that now we'd be managing a single binary blob. That might open up some simpler options for us, because I agree: exposing ClickHouse directly to the internet is not the direction I think we should go in, for a myriad of concerns.
A
If we have a backup zip file, though, would that open up the option for us to store that file somewhere, say in an S3 bucket, and send the user an email saying, "Here's where your backup is, go download it"?
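For what it's worth, the "email a download link" flow usually leans on pre-signed URLs so the bucket itself stays private. A minimal sketch with boto3, with a hypothetical bucket and key layout (nothing here was decided in this call):

```python
import boto3

# Placeholder names; the bucket/key layout is illustrative only.
BUCKET = "product-analytics-exports"
KEY = "group-123/project-456/export.zip"

s3 = boto3.client("s3")

# Generate a time-limited download link instead of making the object public.
url = s3.generate_presigned_url(
    ClientMethod="get_object",
    Params={"Bucket": BUCKET, "Key": KEY},
    ExpiresIn=7 * 24 * 3600,  # link expires after 7 days (the SigV4 maximum)
)
print(url)  # this is what the notification email would contain
```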
B
I mean, it's a solution, but it doesn't solve the problem that we have. It just pushes the problem further down the line, where we're potentially going to be sending gigabytes or terabytes of data. If we're sending that, essentially, between one data center and another data center, then there's a cost implication in terms of bandwidth there, especially for very, very large files. But if we ignore all that, yes, that can totally be done. That's probably...

B
Right, that helps. Yes, we can totally do that from a technical point of view; if we put cost and bandwidth to the side, that's doable. But the concern, then, is that we need to write the logic and the code that says: when I back up project X, that needs to export the particular ClickHouse database.
C
We have existing configuration options for this right now, I believe, with regards to project exports and those artifacts. Those will write to your server, in a writable directory if the instance resides there, or to cloud storage if you have that defined. So I guess, for the purposes of this MVC, and ignoring the extreme edge cases of, you know, terabytes of data and not having enough space for that: do you think that's a suitable MVC start?
C
What would have to happen, it sounds like, is that instead of doing this backup option that Rob has posted, which I think does make sense, we would at least require some type of minimal backup service to interface with ClickHouse or CHProxy, which I think would be the preferred option. So then there are a couple of dependencies here, right? We have CHProxy, and we have a backup service to actually execute these backup arguments and then say: hey, once you've done...
B
I mean, the one thing we've got going for us here is that ClickHouse has native integration with S3 specifically, which probably reduces a little bit of the complexity, if we set that up.
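As a concrete illustration of that native integration, ClickHouse can write a backup straight to S3 from SQL. A rough sketch over ClickHouse's HTTP interface; the table, bucket, and credentials are placeholders, not anything agreed in this discussion:

```python
import urllib.request

# Placeholder values; the real database/table layout and bucket were
# not decided in this discussion.
backup_sql = """
BACKUP TABLE analytics.events
TO S3(
    'https://product-analytics-exports.s3.amazonaws.com/project-456/backup',
    'AWS_KEY_ID',
    'AWS_SECRET'
)
"""

# ClickHouse accepts SQL statements POSTed to its HTTP interface (port 8123).
req = urllib.request.Request(
    "http://clickhouse:8123/",
    data=backup_sql.encode("utf-8"),
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # backup id and status on success
```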
B
That will allow us to say: okay, cool, we'll take this project and send it to our storage, using the predefined S3 credentials if it's been set up to do so, and that can be downloaded. That bit I'm not worried about; I think that seems totally fine, and it will take as long as it takes in terms of size and bandwidth. Yeah, the bigger concern is those dependencies on CHProxy.
B
And how we interface with that safely, because my biggest concern is that we end up introducing some sort of vulnerability that means someone can export the data from one project, and then access it and store it in S3, and then how do we make sure...
C
...that we can download it, yeah. So, as I understand it, CHProxy allows you to define arbitrary users to then interface with clusters or databases that actually have users authenticated to them. So I wonder if there's some type of way we can set that up dynamically, so that the backup job only has access for that window when it's initiated, and that's it, or something like that. But that's very investigatory, for CHProxy specifically.
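To make the "dynamic user" idea concrete, one possible shape (purely a sketch, not a validated design) is generating a short-lived CHProxy user entry when a backup job starts. CHProxy is configured via YAML, so a job runner could render something like the following; the user names, limits, and cluster layout here are all hypothetical:

```python
import secrets
import yaml  # PyYAML; CHProxy reads a YAML config file

def ephemeral_export_user(project_id: int) -> dict:
    """Build a CHProxy user entry scoped to one export job.

    Hypothetical mapping: the proxy-side user forwards to a pre-created
    read-only ClickHouse user, with tight limits so a single export
    can't monopolize the cluster.
    """
    return {
        "name": f"export-{project_id}-{secrets.token_hex(4)}",
        "password": secrets.token_urlsafe(24),
        "to_cluster": "analytics",       # cluster name is illustrative
        "to_user": "readonly_exporter",  # read-only ClickHouse-side user
        "max_concurrent_queries": 1,
        "max_execution_time": "30m",
        "requests_per_minute": 5,
    }

config = {
    "server": {"http": {"listen_addr": ":9090"}},
    "users": [ephemeral_export_user(456)],
    "clusters": [
        {
            "name": "analytics",
            "nodes": ["clickhouse:8123"],
            "users": [{"name": "readonly_exporter", "password": "..."}],
        }
    ],
}
print(yaml.safe_dump(config, sort_keys=False))
```

The entry would then need to be removed when the job finishes, which is exactly the investigatory part mentioned above.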
C
I think that's the last question we need in terms of requirements, because we were thinking about something similar in a previous group, with Compliance, for example. When we wanted to push events somewhere, we just said: okay, for our first iteration we're going to support only AWS, right? So, which...
B
Amazon only right now. I'm just looking now, but judging by the backup documentation, it only supports AWS S3. Yeah, I'm pretty sure you can back up to a local disk, so you can generate the file, but you've got to find somewhere locally to store that file, and then you could, if you wanted to, send that to GCS. But again, that's another dependency we'd have to, yeah...
C
Because that's a stacked dependency then: I have to attach another storage volume to the ClickHouse instance to save to, which then has to be big enough, to transfer to GCS, right? Yeah, which is fine. And that's the question, Max: should we go away from AWS? These are the things we have to figure out.
B
There could be a plugin for it. This seems like a problem where we're probably not the first people to have it, so that may be worth looking into. All I'm doing is looking at the official ClickHouse documentation, but it's fairly extensible, so it's possible there's a way to do this elsewhere.
B
I was going to say, for self-managed, no, because, well, it depends on the size of the project. But I've run into this problem before administering self-managed GitLab instances: you've got GitLab backups, and if you're storing them on the same servers that you're running GitLab on, eventually they just fill up. And you know, if your instance is 10, 50, 100 gigabytes, a terabyte, that can cause trouble, and you can end up losing those files, essentially. So yeah, it feels like we should be saying to people using this:
B
if you're going to back up from Product Analytics, don't store it locally over a certain size, for example.
C
Your question, Sam, was: can we use that location to simplify things? That doesn't really work, because that's the final location where the user can retrieve it, but we're talking about the service in between. We don't want to expose ClickHouse; we want to limit its access to anything, right, as much as possible. So we don't want that writing to the GitLab server.
A
Yeah, yeah, that makes sense; thanks for explaining that. Something else that comes to mind on this is that we will need to put some sort of rate limit or abuse protection in here, because if you click export 5,000 times a second with a script or something, that could just flood us completely, yeah.
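As a sketch of the kind of guard being described here (the actual enforcement, per the reply below, would live at the CHProxy level and in the existing Rails controls), a per-user throttle on the export endpoint could be as simple as a fixed-window counter; the window size and limit are arbitrary illustrations:

```python
import time
from collections import defaultdict

# Arbitrary illustrative limits: at most 2 export requests
# per user per hour.
WINDOW_SECONDS = 3600
MAX_REQUESTS = 2

_requests: dict[tuple[int, int], int] = defaultdict(int)

def allow_export(user_id: int) -> bool:
    """Fixed-window rate limiter keyed by (user, hour bucket)."""
    bucket = int(time.time()) // WINDOW_SECONDS
    key = (user_id, bucket)
    if _requests[key] >= MAX_REQUESTS:
        return False  # reject: over the per-window limit
    _requests[key] += 1
    return True

# The export endpoint would check this before enqueueing a backup job:
if not allow_export(user_id=42):
    print("429 Too Many Requests")
```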
C
So that will be implemented at the CHProxy level, as well as in the backup service, since we should already have controls in Rails for rate limiting. So yeah, okay. I guess the summary, then, is that this is obviously a lot more complicated than we thought, and I don't think it's immediately actionable. So potentially, then, it means, I don't know, for Max, perhaps it might be best to look into either session aggregation or funnels, yeah.
B
Yeah, I'd probably agree. I mean, I'm happy to look into this stuff, but I think, by the sounds of it, the initial problem is an infrastructure one rather than a GitLab code problem. In terms of other things, I'm looking at the aggregation thing as well, and it's throwing up some potential issues too, but that's a discussion for later, I think, for another call.
A
Yeah, it sounds like this is a lot more complicated than originally speculated. So how about we do this for next steps: it almost sounds like we need to have this issue sit in planning breakdown for a while to get an implementation plan against it, potentially with a spike issue.
A
If it's that big, we also need some requirements updates, as well as some implementation notes that I'll need y'all's help on, to kind of capture what we discussed today. Some of the things that I think will need to be done in the requirements updates, and I can do these, let me know what y'all think, are to remove some of the specificity around JSON files, or to re-clarify JSON as just a placeholder for something simple, and change it if it's easier to do as a zip file,
A
a tar file, whatever. Also, talking a little bit more about the security requirements: customer A can't export customer B's data, can't see other customers' data, that sort of thing, as well as the abuse and inappropriate-use protections and rate limits.
A
Okay, what do you all think about that from a requirements perspective? I don't want to talk about implementation details in requirements, just to make sure we have the freedom to go and do whatever we need to in terms of solving the problem.
A
...to export the dashboard, because: I'm an analyst, I built this really cool thing that shows my boss here's what we need to do. Here you go, here's the report. Great, I get a promotion. This is really targeted more at the pain point of: I need to change vendors, or I'm moving to a different system and I need to export all my data wholesale.
B
In which case, I could probably do with some input from either or both of you about where, because I was probably going to spend the majority of this milestone looking at this, until I dug into it and realized that maybe that's not happening. So, how should I think about where my effort will be best concentrated this milestone: have a look at pre-aggregation, funnels...
C
...or anything else? I think sessions, ultimately, just off the top of my head, because the direction we wanted to get towards is shipping dashboards, and I think sessions play a big part of the audience dashboard. Okay, so if there are complications with that, though, I'm happy to set up another call to workshop that, yeah.
A
Right, I'd agree with the sessions direction. That's one of the key things people are going to be looking for once that's available.
B
Yeah, well, in that case, can you set up a call with you and me tomorrow, Dennis, like tomorrow afternoon my time? We can go over that then, sure.
C
Okay, sounds good. I'll try to find a spot on the calendars. Alrighty, so we'll create some more issues out of this and set up some spikes, and then we'll try to figure out where it...
C
We're going to have to be more thoughtful about this, because there's no easy one-click thing I can knock out of this, unfortunately.
A
Cool, well, thanks for the discussion. This was helpful for me; hopefully it was for you all as well. Definitely.