From YouTube: 2023-06-12 Analytics Section Sync Recording
A
So, okay, based on the follow-up question, I just wanted to add a point at the beginning of the conversation, because I got the impression that there was a little bit of confusion about what Service Ping is and what Snowplow is, or what the distinction between those two is. I think clarifying that will help us be more direct in the follow-up points.
A
But this is not event-level data. This is aggregated information based on the number of rows in the database (the PostgreSQL database), like the number of rows in the issues table, or other information like the instance license MD5 or SHA hash. So, various kinds of information. All in all we have roughly two thousand metrics in Service Ping, and all of these metrics come together as a single JSON object sent via an HTTP POST request.
A
So it's definitely a very different structure from the event-based data, which reports atomic bits of information via single events. The second part of the conversation is Snowplow for SaaS, for GitLab.com, which reports event-based data that gets collected and aggregated later on in the downstream systems. Those two are very independent from each other and not really compatible.
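For illustration, here is a minimal sketch of the two shapes being contrasted above. Every field name and value is hypothetical; this is not the actual Service Ping or Snowplow schema.

```python
# Illustrative only: an aggregated, instance-level Service Ping-style payload
# versus a single event-level record. All fields below are made up.
import json

# Aggregated payload: one JSON object with counts, posted periodically.
service_ping_style_payload = {
    "uuid": "00000000-0000-0000-0000-000000000000",
    "version": "16.0.0",
    "counts": {
        "issues": 12345,          # e.g. number of rows in the issues table
        "projects": 678,
        "merge_requests": 9012,
    },
}

# Event-level record: one atomic action at one point in time.
event_style_record = {
    "event": "issue_created",
    "user_pseudonymized_id": "a1b2c3",
    "timestamp": "2023-06-12T10:15:00Z",
}

print(json.dumps(service_ping_style_payload, indent=2))
print(json.dumps(event_style_record, indent=2))
```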
B
I guess it's probably me that misunderstood that, because I'm pretty sure you've explained this to me before, but I appreciate you highlighting it again, just so myself and others are aware. Is it correct to say that Snowplow kind of came after Service Ping, or did they come around at roughly the same time? The reason I ask is that there are multiple events, thousands of events, being collected in Service Ping that aren't in Snowplow.
A
I don't have very specific insights, but I believe Service Ping was a little bit earlier than Snowplow. The main driver for the difference, for those two tools diverging from each other, was the ability to report from self-managed instances, and we got a lot of pushback and constructive feedback from the wider community.
A
That was when we tried to introduce event tracking at the instance level. Service Ping in its aggregated form is much more privacy-conscious, because it doesn't really report any information about what a singular user is doing (the number of issues or the number of projects doesn't really tell you anything about the user), and it's also a much smaller data stream coming from the instance. So the instance administrators feel much more in control: they can just review the payload and see exactly what is going out of the instance. With the Snowplow events, which could be millions, it's simply impossible to monitor every bit of data that's coming out.
A
There are a number of initiatives to bring those two data sets closer together, also driven by the fact that we have a bit of a challenge: how to replicate the Service Ping metrics per customer at the SaaS level. Service Ping runs at the instance level, and for self-managed, one instance is one customer; for SaaS that's not the case, it's thousands of customers combined together, which makes the life of customer support and customer success harder.
A
So there are a number of initiatives that help break down the SaaS instance into the particular namespaces which represent the customers, and for that we're also using the Snowplow events to mirror or replicate a summary of the Service Ping metrics, but not all of them. There is also an initiative called internal events tracking, which was recently started, where we try to provide a more cohesive, singular events API, because we've seen the confusion that you faced.
A
This is where we've come to on GitLab: we've had a lot of feedback that people are really confused about which events to track, and about what this tool does versus what the other tool does. So we recently started the initiative to bring all those tools together and provide a singular interface, but it's a very early start, and we're basically wrapping up the groundwork to even start building on top of it.
B
Thanks for that; that rings a lot of bells with what you previously explained to me about the method of collection. I mean, some people still send it manually; it's part of, I think, the license check for Service Ping in some contracts, for people that are in air-gapped environments, for example, or who just aren't regularly sending that payload.
B
So that's obviously a lot more of a process to figure out when we then have to start collecting potentially millions of rows of event data, things like that. So that makes sense, and I also now remember the problem with .com: we don't have that same level of granularity, since we are a much larger instance with different cohorts of customers, and things like that.
B
So thanks for clarifying that. Cool. So, on to why Nikolai clarified that: I had a couple of questions that I really wanted to bounce off of Analytics Instrumentation. There have been a few discussions that I've had with some members of Analytics Instrumentation, some with Product Analytics as well, and I wanted to understand better, with all of us in the room, what's actually possible.
B
What we can maybe move forward with, or what's not actually possible, in terms of using what I thought was Service Ping data (but which, to correct myself, is actually Snowplow data from .com) and then potentially using that as a way for us to instrument GitLab.com, or at least view that GitLab.com event data within product analytics dashboards. So I'm just trying to figure out which person and which point to refer to; I guess we can just go in order.
B
If you go to 2a1 here, which I'm currently highlighting in the agenda, I'm asking: is it worth potentially setting up some kind of ETL where we can grab data (and I'll clarify that to be Snowplow data from .com) and periodically import it into the production cluster, for the purpose of using it for product analytics features with the .com data that we already collect? And then, well, I see you have a point there.
A
I think the subject of the ETL was covered by Basti down below, with the suggestion to use different endpoints and not really backfill the data, but rather keep collecting going forward. But I will ask Basti to voice that, because I think he has a broader suggestion to share.
B
Cool. I'll just quickly cover my point (the one Basti responded to asynchronously), just for context. I was recalling that in our early discussions, when we wanted to implement Snowplow for product analytics, we wanted to use the self-describing event schema, and part of that was because what we currently use on .com for Snowplow is an older schema. So that was my understanding, but Basti, you've clarified it here.
C
Yeah, I think that understanding was correct. Our .com schema right now uses what I think Snowplow calls structured events, which is an event that always has, I think, a label, an action, a category, so a bunch of different string identifiers, and that's a kind of deprecated structure. I think it was originally inspired by Google Analytics, so it's roughly the same structure that Google Analytics uses, but Snowplow by now recommends these self-describing events and not using this old structure.
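For illustration, a minimal sketch of the two event styles being discussed; the field names and the schema URI below are hypothetical, not GitLab's actual definitions.

```python
# Illustrative only: the rough shape of a legacy "structured" Snowplow event
# versus a self-describing event. Values and the schema URI are made up.

# Structured event: a fixed set of string identifiers
# (category / action / label / property / value), similar to Google Analytics.
structured_event = {
    "se_category": "projects:issues:index",
    "se_action": "click_button",
    "se_label": "create_issue",
    "se_property": None,
    "se_value": None,
}

# Self-describing event: a reference to a versioned JSON schema plus data
# that must validate against that schema.
self_describing_event = {
    "schema": "iglu:com.example/issue_created/jsonschema/1-0-0",  # hypothetical schema URI
    "data": {
        "project_id": 123,
        "label_count": 2,
    },
}
```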
C
I think if we were really building Snowplow from scratch on gitlab.com, regardless of product analytics, we would also use self-describing events. So in theory, I think we could still do something like an ETL where we take the Snowplow data. The Snowplow data from .com is actually running on AWS, in an Amazon Kinesis pipeline, and then put into an S3 bucket. So there's a bunch of S3 buckets which just hold text files with the Snowplow data, and in theory we could grab those and actually transform them.
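As a sketch of what that grab-and-transform step could look like: the bucket name, key prefix, and column positions below are assumptions for illustration, and the real enriched-event format has far more columns than shown here.

```python
# Sketch of the ETL idea: read Snowplow enriched-event files out of S3 and
# reshape legacy structured events into a self-describing-style payload.
import csv
import io
import json

import boto3

BUCKET = "example-snowplow-enriched"   # hypothetical bucket name
PREFIX = "enriched/2023/06/"           # hypothetical key prefix

s3 = boto3.client("s3")

def iter_enriched_rows(bucket, prefix):
    """Yield tab-separated enriched-event rows from every object under the prefix."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            yield from csv.reader(io.StringIO(body.decode("utf-8")), delimiter="\t")

def to_self_describing(row):
    """Fold the legacy category/action/label fields into one illustrative payload."""
    category, action, label = row[0], row[1], row[2]  # column positions are illustrative
    return {
        "schema": "iglu:com.example/legacy_event/jsonschema/1-0-0",  # hypothetical
        "data": {"name": f"{category}_{action}_{label}"},
    }

for row in iter_enriched_rows(BUCKET, PREFIX):
    print(json.dumps(to_self_describing(row)))
```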
C
For example, our custom event right now has, I think, just a name. So in theory you could imagine taking this label, action, and so on, and either putting them into specific properties or just concatenating them into one long string, transforming it that way and then ingesting that data. At the same time, I think what's important to differentiate here is between events and page views, or the other standard events within Snowplow.
C
So there are page views; there are page pings, which tell you roughly how long a page view has been going on; and there are things like link tracking, so events that are specifically sent from certain plugins. Those are not affected by this; they're a special kind of event and not part of this custom self-describing event thing. So page views and so on we could just take and use as-is, in theory, but they're all part of the same big bunch of S3 buckets.
C
So just by looking at the bucket you can differentiate between those two. And then (we already thought a bit about this, and it's what Nikolai was already referring to) there is this confusion among our internal users, so people in GitLab who need to instrument an event, around: okay, what kind of instrumentation do I need to use?
C
Do I need to use this Service Ping Redis stuff? Do I need to use Snowplow? For that reason, I think what we would prefer, and what we're already doing right now, is encapsulating it all in one API. So you just call one method, called track event or something, and in the background we use all those separate systems automatically.
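For illustration, a minimal sketch of that kind of single entry point. The helper stand-ins and routing rules below are assumptions made up for the example, not GitLab's actual internal events API.

```python
# Sketch of a single track_event call that fans out to the relevant backends.
from collections import Counter

# Stand-ins for the real backends.
redis_counters = Counter()
snowplow_queue = []
product_analytics_queue = []

IS_SAAS = True  # assumption: event-level forwarding only happens on the SaaS instance

def track_event(name, **properties):
    """Record one product event through every backend that applies."""
    # Aggregated counters (Service Ping style): bump a counter, keep no per-user detail.
    redis_counters[name] += 1

    # Event-level streams (Snowplow and the product analytics cluster):
    # only where event forwarding is enabled.
    if IS_SAAS:
        snowplow_queue.append({"event": name, **properties})
        product_analytics_queue.append({"event": name, **properties})

# Instrumentation code only ever calls track_event; it never needs to know
# which backend the event ends up in.
track_event("issue_created", namespace_id=42)
```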
C
So we send things to Service Ping if it's on a self-managed instance, we send things to Snowplow if it's on .com, and you would then use the same API to send events to our product analytics cluster as well. That would be our idea. Then, in the beginning, we could figure out a way not to send all events at once, because what's also important to consider is the amount of events we have on GitLab: that's around 60 million events per day.
C
Around 7 million of those are page views, and this kind of volume, I think, also takes a toll on the infrastructure we have. So even if we just ETLed it into our system, ClickHouse would still need to be able to handle the millions of events that would accrue over time.
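As a rough back-of-the-envelope check on that volume: the daily figures are the ones quoted above, while the one-year window is an arbitrary assumption.

```python
# Rough volume estimate based on the figures mentioned in the discussion.
events_per_day = 60_000_000      # quoted above
page_views_per_day = 7_000_000   # quoted above
days = 365                       # assumed window for the estimate

print(f"~{events_per_day * days / 1e9:.1f} billion events per year")          # ~21.9
print(f"~{page_views_per_day * days / 1e9:.1f} billion page views per year")  # ~2.6
```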
B
The fire hose, like that... but no, I'm actually particularly interested in the fact that we have that much data coming in, so that we could actually use it for testing. It would be interesting to see. And my understanding (correct me if I'm wrong) from your response here is that we could do an ETL for old events, to bring the old event data into the new one.
B
But given the recent encapsulation efforts, it might be better just to move forward with sending new events to the product analytics clusters and doing it that way. That way we're not really spending any effort on old data; we just do it for new events and still basically get the same value, and since we have enough events, it wouldn't matter anyway.
C
Except if for some reason we want to test our system with billions of events at once; then we could, because all this data is still available in S3 buckets, and there are S3 buckets going back a long time. So if for some reason we need billions of rows of data, we can ingest it. We can, for example, write a script to ingest just the page views, if we're only interested in looking at graphs of page views, or transform the events that are there into a structure that's feasible for us to work with. And there is also existing knowledge about working with this data: the data platform team, for example, recently wrote scripts to go through all of this data to remove IP addresses, so they already had Python scripts running on the existing S3 bucket data.
A
ClickHouse has an integration with S3, as far as I know, so it would probably be possible to just connect the ClickHouse instance directly to the S3 buckets with the event files and build the whole ETL pipeline just in ClickHouse.
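For illustration, roughly what reading those files through ClickHouse's s3() table function over its HTTP interface could look like. The endpoint, bucket URL, target table, and column list are placeholders, and a real Snowplow enriched file has many more columns.

```python
# Sketch: ask ClickHouse to pull Snowplow files straight from S3 via the
# s3() table function. All names here are placeholders.
import requests

CLICKHOUSE_URL = "http://localhost:8123/"  # assumed ClickHouse HTTP endpoint

query = """
INSERT INTO events_raw (collector_tstamp, event, se_category, se_action, se_label)
SELECT collector_tstamp, event, se_category, se_action, se_label
FROM s3(
    'https://example-snowplow-enriched.s3.amazonaws.com/enriched/2023/06/*.tsv',
    'TSV',
    'collector_tstamp DateTime, event String, se_category String, se_action String, se_label String'
)
"""

response = requests.post(CLICKHOUSE_URL, data=query, timeout=300)
response.raise_for_status()
```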
B
Cool, yeah. Not to mention just the billions-of-rows scale, but also if we found it interesting enough that we wanted to connect the historical data we currently have with what we're collecting now, or when we do set that up. A random follow-up question that's come out of that, just to understand how much historical data is still on S3: did we ever get around to defining a data retention policy, or do we still have basically everything that we've ever collected? Okay, cool. So that's interesting.
C
And as far as I know, it's all also in S3. So it's not only in the data warehouse, because S3 is our actual backup of all these events in case the data warehouse would go down, so we could re-ingest them, but...
A
Okay, and we can go back by month, by year, whatever we decide fits. So we have a little bit of control over how far back in the past we want to go.
B
Yeah, that'll be really important, because that'll make the process a lot easier, especially for making sure we have enough space for all of it in the cluster, or whether we need multiple clusters, but also depending on our interest, whether we really want to go back that far or not. So it's good that it's organized like that. I didn't write it down, but as an action item I'll create something, probably an epic around this and some issues, to pursue this further. But I appreciate, Nikolai and Basti, hearing your input on this.
B
Sorry, I can't type; I'm terrible at taking meeting notes, since I can't type and talk at the same time. So then I asked about using the browser SDK, and at this point, well, I think it's still relevant, because what we're looking at as far as Snowplow is concerned is, specifically, that we're now collecting page views as well. I'm not sure if I should even mention this question, but basically the reason I asked it was: could we use a browser SDK on .com if it were able to collect events in the same way that Snowplow currently does on .com? But, Basti, to your point, that requires the use of the Kinesis pseudonymization pipeline, which would then basically be an additional part that wouldn't really mesh well with our current flow as far as pseudonymization is concerned. So I'll maybe table this part for now, if people are happy to read through it at this point.
C
I think it might still be... I mean, we have time, so I think it still might be helpful to just lay out all the options, because we have this ability to either, like we said before, ETL all the data in, or just new data in. But handling this amount of data with an ETL pipeline can also be a challenge.
C
I mean, we'd just have to talk to our data platform team, who handle this every day. The alternative is to actually add these additional SDKs.
C
And there, I think, the main point that's important to understand is that we are pseudonymizing quite a bit of stuff right now for gitlab.com, and it's written down in our terms of service or something that this data isn't collected. So it would be important for us to do the same if we ever add an additional layer of collection to gitlab.com. There are enrichers and kind of easy ways to do pseudonymization with Snowplow, but mostly on specific properties. The one thing that I think we are doing specially, which is not easily covered, is actually looking at URLs, because in theory, in GitLab, namespaces or group names or project names could expose personal data, like the name of a project, and so on. So those are actually getting pseudonymized, or something like that, in this pipeline.
C
I think the important part here is also that in the current pipeline this is running on Kinesis, but the part that does the conversion is a Ruby Lambda, so it's just a bunch of Ruby code. We could theoretically also try to modify that to run with Kafka as well, and the actual code to do the pseudonymization could probably stay the same, because the structure of the Snowplow data is the same.
C
It's just the connection between them that's a different type of connection: instead of doing it with Kinesis, you do it with Kafka. There are also alternatives to replicating that code, because it's not that much, the actual part that takes the data and pseudonymizes it. You could also convert that, for example, into JavaScript, which would be easy to use as an enricher: Snowplow has the possibility to just add JavaScript enrichers directly, so you don't need to set up a separate function or anything.
C
So this would also be a possibility, but I think the big part here is that this is very different from what our customers probably want, because this is a very GitLab-specific thing. So either we would have to have a different cluster, I think, just for the GitLab data to go through (which, I don't know, maybe is a good idea anyway), or, alternatively, have the code look at: okay, is this a GitLab project? Then do the pseudonymization, and if not, then not.
C
The one thing is, I think, that with the Ruby SDK we can theoretically just choose to send data that doesn't need to be pseudonymized: as long as we don't send namespaces, user IDs and so on, we could just start sending events without any additional user information. With the browser SDK it's different, because it's the browser, it's page views being tracked, and so on. So as soon as you put that in, you're going to send information that is potentially PII data.
C
So, theoretically, on the Ruby side we could start implementing without this being in place; on the JavaScript side (the web, mobile web, browser side), no.
C
Yeah, so it's similar to our current cluster, the Snowplow part and so on. This is also just set up on AWS; it's not in Kubernetes, it's EC2 instances for the Snowplow collectors, and Kinesis instead of Kafka, but otherwise it's a similar pipeline. It's just then not ending up in ClickHouse, but rather in these S3 buckets, which are then getting ingested into Snowflake, which is our data warehouse.
A
Around the page views, I want to specify one thing: I'm like 95% sure that the URL pseudonymization you mentioned actually happens on the server side, in GitLab, when the page is loaded into the browser.
A
So the Snowplow browser SDK is already sending pseudonymized URLs, because on Kinesis we don't have a way to map the namespace names to the namespace IDs. So the GitLab server resolves it: it gets the HTTP request, finds the namespace name or project name, resolves it back to the ID, prepares the pseudonymized URLs, and sends them with the HTML. From that, the URL is replaced just for Snowplow and sent back through the pipeline.
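For illustration, a rough sketch of that server-side masking step: before the page is rendered, namespace and project names in the path are swapped for numeric IDs, so the browser-side SDK only ever sees pseudonymized URLs. The lookup tables and the placeholder format below are assumptions, not GitLab's actual implementation.

```python
# Sketch: replace the namespace and project segments of a request path with IDs
# before the URL reaches the browser-side tracker.
NAMESPACE_IDS = {"my-secret-group": 42}   # hypothetical name -> ID map
PROJECT_IDS = {"my-project": 1337}        # hypothetical name -> ID map

def mask_path(path):
    """Swap the first two path segments (namespace/project) for opaque IDs."""
    segments = [s for s in path.split("/") if s]
    if len(segments) >= 2:
        segments[0] = f"namespace{NAMESPACE_IDS.get(segments[0], 0)}"
        segments[1] = f"project{PROJECT_IDS.get(segments[1], 0)}"
    return "/" + "/".join(segments)

print(mask_path("/my-secret-group/my-project/-/issues/1"))
# -> /namespace42/project1337/-/issues/1
```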
B
Well, I guess first we have to get it out to customers so they can try it, but, you know, Basti mentioned we're not sure whether they would want this. But if we're building this for application developers, and they're able to see (and now I can't say it anymore) pseudonymized namespaces when they're doing it in their applications, that might actually reduce the barrier for them to be able to collect event data without collecting PII.
B
So for the same reasons that we're trying to get instrumentation set up more, maybe there are potentially customers that have a similar situation as well, but I guess we won't know which one will have to come first. At any rate, I think we would likely need a separate cluster and environment for this specifically, which we can do now: connecting a project to a different cluster.
B
So we could theoretically connect gitlab-org/gitlab to a separate cluster, which would then have something to do the pseudonymization (I can say that version of it), and then later on we can explore whether other customers need it, and maybe actually build it in for everyone. But anyway, bottom line is I'll create an issue for us to really outline this, since we've started getting into the technical design for it. But cool, good discussion. Thank you, thanks.
B
Everyone, I think we're at time at this point. Basti and I had wanted to just give an overview of what's going on; Basti has already written up what's happening, and I will write down what's happening too, but we're already at time. So unless there's anything anyone else would like to call out, I'll just pause for a second, in case anyone wants to talk about anything in the few minutes we have left.
B
All right, well, good to see everyone. Thanks for the discussion, and I hope everyone has a good rest of their Monday and a good rest of your week. Take care.