From YouTube: Product Analytics Sync - Data Export
A: So the thing we're trying to solve, well, it's not a problem exactly, is that we want to be able to export the underlying data from ClickHouse, or in other words, get the data out in some way so that it's portable. There are two options that we've got so far.
A: One is, in my mind at least, low effort, medium reward, and the other is high effort, high reward. The low-effort option is using Cube.js to surface a data export. As far as I'm aware, there's no engineering work there, only documentation work. Just to recap for anyone watching: by listing a set of dimensions, which map pretty much one-to-one to ClickHouse columns, or ClickHouse rows I should say, we can essentially output a table.
A: There are a few glaring downsides to this, one of which is that the data is returned in JSON format. Whoever's exporting data using this method would have to either be okay with that or process it themselves into CSV or whatever other format they required; by default it would be in JSON.
A: A couple of other problems are that there's a SQL row limit, whether that's set in Cube by us I'm not sure, but even if it's not, there will come a point where, if you're trying to export a hundred thousand, a million, ten million rows, you're going to hit limits somewhere, and that's going to be slow or even impossible.
A: In some cases we could probably do some magic with pre-aggregations, but then we'd end up storing essentially an entire copy of a ClickHouse table in a pre-aggregation, which doesn't really solve any problems. Whoever's exporting this data would probably also have to implement pagination, so it could get long, and it could get complicated.
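The pagination mentioned here could be sketched along these lines, assuming an offset-and-limit style API; the `fetch_page` lambda is a stand-in for real calls through the proxy, not an actual client:

```ruby
# Sketch of offset-based pagination over a Cube-style query API.
# `fetch_page` stands in for a real call through the proxy API and is
# expected to return an array of rows for the given window.
def export_all_rows(page_size:, fetch_page:)
  rows = []
  offset = 0
  loop do
    page = fetch_page.call(limit: page_size, offset: offset)
    rows.concat(page)
    break if page.size < page_size # a short page means the data is drained
    offset += page_size
  end
  rows
end

# Fake data source standing in for the ClickHouse-backed results.
data = (1..25).map { |i| { "eventId" => i } }
fetch = ->(limit:, offset:) { data[offset, limit] || [] }
all = export_all_rows(page_size: 10, fetch_page: fetch)
puts all.size # 25
```

This keeps each request under whatever row limit Cube enforces, at the cost of one round trip per page.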
The main glaring upside to this is that there's no engineering work to do right now.
A: So anyone with maintainer access to a project could, in theory, send this query, or something similar, via the proxy API and get the correct data out, as long as they had access to Product Analytics.
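For illustration, a Cube-style load query could look like the following minimal Ruby sketch; the dimension names, the row limit, and the endpoint path in the comment are assumptions for the example, not the documented schema:

```ruby
require 'json'

# Hypothetical Cube.js "load" query. The dimension names below are
# placeholders for whatever the documented schema exposes; they map
# roughly one-to-one to ClickHouse columns.
query = {
  dimensions: %w[
    TrackedEvents.eventId
    TrackedEvents.eventName
    TrackedEvents.derivedTstamp
  ],
  limit: 5_000 # Cube caps rows per request, so large exports need paging
}

# JSON body that would be POSTed to the proxy API (the endpoint path
# below is an assumption for the sketch):
#   POST /api/v4/projects/:id/product_analytics/request/load
body = JSON.generate(query: query)
puts body
```

The response would come back as JSON rows, which is where the format and row-limit caveats above apply.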
B: Okay, so I guess that's where my point of confusion was, because I was looking at the screenshot of the playground, which is not enabled in production mode. Gotcha. It is actually enabled right now in the production cluster, which shouldn't be the case, but that's a different problem. But I think what you just said, that the input and the output all happen through our proxy with the Cube API, right? So that's okay.
A: Yeah, so it would require us to document what the columns are. I guess if we moved to Snowplow that could change, but again, that's something we could document as part of our standard documentation process. So yeah, in terms of no engineering work.
A: That does work, with the caveats that I've outlined in terms of limits. Obviously the hope is that those using this will hit hundreds of thousands, if not millions, of data points, which would be great for us in terms of user adoption, but will give us trouble, and we'd need to load test it, or test it pretty hard.
C: That really clarifies my point of confusion, because I wasn't quite sure, when we said no additional work, whether we were talking about just the backend piece. It sounds like, since you just said maintainers, I as a customer would be able to use the docs and go directly to do the download, with scaling considerations, which I think gets me back to where I started.
C: My initial understanding was that a customer could use the documentation to get what they need, which I'd be very happy about for our first boring sort of iteration. Definitely, on the scale comment: if we do get to that point, and I'm assuming we will, that seems like something we could come back and revisit, but at least for our first iteration of export.
C: Really, our goal is to make sure people aren't locked into GitLab: if they want to get their data to take it to another service provider, or if they want to do some custom processing with Ruby or Python or whatever, we're not locking their data in. So that makes me feel good about this, since we clarified that point. Thank you for walking through that. Okay.
A: That makes sense; I think we're on the same page there. In my mind at least, it makes for a do-nothing first iteration, which is always nice. In terms of not locking people in, this goes some way towards that, short of the more extensive solution, which I'll talk about in a minute.
A: There's nothing to stop us, if we're only talking about a handful of customers initially, and one of those is very keen on exporting their data, from manually doing so on their behalf.
A
Obviously,
that's
not
a
scalable
solution,
but
if,
if
someone
wants
their
data,
we
have
it
and
we
can
give
it
to
them,
we
can
even
upload
it
into
object,
storage
for
them
from
clickhouse,
which
leads
me
nicely
to
sort
of
the
better
point
of
a
better
word
solution,
which
is
to
export
the
clickhouse
data
directly
from
clickhouse
to
object,
storage
of
their
choice
or
whatever
it's
configured
to
so
github's
already
got
support
for
object,
storage,
I,
assume,
S3
and
gcp,
or
possibly
others
I'm,
not
really
sure,
and
the
idea
would
be
to
start
a
background
job
which
would
export
clickhouse
table
using
native
clickhouse
functions
for
exporting,
save
that
file
locally
on
the
file
system
and
then
upload
it
to
whichever
defined
object.
A: That storage is whatever the self-managed instance has configured, or in the case of .com, wherever we keep object storage. The first thing that jumps out at me is that on .com, I assume we're the administrators of the object storage, which means we could be storing potentially very large files, and there's a cost implication there.
A
But
the
the
big
thing
here
is
that
it's
scalable
in
the
sense
of
we
just
use
clickhouse
to
export
the
file
and
any
any
number
of
Records
should
be
able
to
handle
this,
especially
if
it's
running
a
background
job.
But
it
will
take
time.
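As a rough sketch of that native export step, ClickHouse's `INTO OUTFILE` clause (as used from clickhouse-client) can write query results straight to a local file; this helper only builds the statement, and the table name and path are illustrative placeholders:

```ruby
# Builds a ClickHouse export statement using the native INTO OUTFILE
# clause. Table name, path, and format are illustrative placeholders,
# not the real Product Analytics schema.
def export_statement(table:, path:, format: 'CSVWithNames')
  <<~SQL.strip
    SELECT *
    FROM #{table}
    INTO OUTFILE '#{path}'
    FORMAT #{format}
  SQL
end

sql = export_statement(table: 'tracked_events',
                       path: '/tmp/tracked_events.csv')
puts sql
```

The exported file would then be picked up by the background job and pushed to object storage.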
A: This feels like the scalable solution, in the sense that even if we only make the download available for, I don't know, a few hours or 24 hours, and then remove the file from object storage, we're providing it as an export. It won't be available forever; if you want it again, you'll have to generate a new export, which should keep costs down to a fairly manageable level.
A: Yeah, this is the solution, but the big red flag, as I'm sure Dennis is thinking right now, is that it requires direct access from the GitLab application to ClickHouse, which currently we only have via Cube's abstraction layer. So that becomes an infra task then.
B: Yeah, and we don't want to directly expose ClickHouse anyway; we want to implement a proxy in front of it, which I've mentioned before, CH proxy, and we'd have to map out that whole interaction model. But we've spoken about this particular solution before; I think it was the last time we actually had a synchronous discussion about exporting. So we'll have to map that out, but ultimately, I think, yeah.
A: So the implementation would be made up of a background job, an export background job, which would, in the background, make the ClickHouse call to generate the file. I don't know how the internals work in ClickHouse; I don't know if exporting a single table locks anything else.
A: So we need to make sure that it doesn't affect any other consumers or any other projects. But assuming it doesn't, the ClickHouse command, which we'd run directly via CH proxy, would export a file, which we'd save somewhere locally, temporarily. That same background job would then use fog, for whichever object storage was configured, upload the file somewhere useful, and make it available to a set number of users.
A: I assume we'd have a signed URL, which we would then pass to whoever generated the export job, and put a lifecycle rule on that file for it to expire after a certain amount of time, at which point we can delete it. That person has access to their data and can do what they like with it.
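Put together, the job described above could look roughly like this; the collaborators are injected so the flow can run in isolation, and every class and method name here is a hypothetical stand-in (real code would call ClickHouse via CH proxy and upload via fog):

```ruby
# Rough sketch of the export background job. Both collaborators are
# injected stand-ins: `clickhouse` for the CH proxy call, `storage`
# for a fog-backed object store.
class ProductAnalyticsExportJob
  URL_TTL = 24 * 60 * 60 # seconds before the signed URL (and file) expire

  def initialize(clickhouse:, storage:)
    @clickhouse = clickhouse # responds to #export(table, local_path)
    @storage = storage       # responds to #upload and #signed_url
  end

  def perform(table)
    local_path = "/tmp/#{table}-export.csv"
    # 1. ClickHouse writes the table to a local file (native export).
    @clickhouse.export(table, local_path)
    # 2. Upload the file to whichever object storage is configured.
    key = "exports/#{table}/#{Time.now.to_i}.csv"
    @storage.upload(key, local_path)
    # 3. Hand back a signed URL; a lifecycle rule would delete the
    #    object after URL_TTL so exports are not kept around forever.
    @storage.signed_url(key, expires_in: URL_TTL)
  end
end

# In-memory fakes standing in for CH proxy and fog.
class FakeClickhouse
  attr_reader :exports

  def initialize
    @exports = []
  end

  def export(table, path)
    @exports << [table, path]
  end
end

class FakeStorage
  attr_reader :objects

  def initialize
    @objects = {}
  end

  def upload(key, path)
    @objects[key] = path
  end

  def signed_url(key, expires_in:)
    "https://storage.example/#{key}?expires_in=#{expires_in}"
  end
end

ch = FakeClickhouse.new
store = FakeStorage.new
job = ProductAnalyticsExportJob.new(clickhouse: ch, storage: store)
url = job.perform('tracked_events')
puts url
```

Injecting the ClickHouse and storage adapters keeps the job testable before the CH proxy interaction model is mapped out.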
A: Well, in which case I will write up two issues, one for each of these, and ping you both on them, just so we make sure we're in agreement about what it is we want to do. We can schedule the first one as soon as you like, Sam, and the second one, as Dennis said, might just take a bit longer to think through exactly how it's going to work.
B: Yeah, I think for the second one, the export to object storage, you might be involved. I believe there will be follow-up investigations for the frontend, in terms of how to trigger the job, or whether we do API only, but also for the infrastructure.
B: It's an investigation, which is why I said you might actually be involved. But I think once we have the plan laid out from your perspective, from the backend, then we can start the investigations from the other angles. Okay.
A: Cool, well, I'll do that this afternoon, so you should have those issues today or tomorrow. Right, awesome.
C: This was good; I'm glad we got the chance to sync up on this on a call, much easier than going back and forth on the issue. Oh yeah, for sure.
B: Yep, as far as the export is concerned, I think we're good, and then we'll follow up on these and get working on the documentation side of it.