From YouTube: 2021-05-11 CNCF TAG Observability Meeting
B: Happy post-KubeCon, everybody!
B: Okay, okay! Well, let's start in, let's say, a minute. So if you haven't signed in yet, please do.
B: All right — while we get started: this is a CNCF meeting, so the code of conduct applies and, as usual, we are kind to our colleagues. Today we have a couple of guests who are going to do some quick presentations; they're listed in the agenda. We've got the author of promdump, Ivan Sim, and we also have a very cool talk by Pixie.
B: So Zain is going to walk us through Pixie Labs and what they've been up to. I also wanted to extend a welcome to everyone who may be here for the first time. If anyone is here for the first time, or you've never introduced yourself before, do you want to take a quick second and just say hi?
A: Small world — hey, I can introduce myself. I've been lurking on the channel a little bit and I've attended a few other meetings. My name is Dave Cruzanti; I work at Comcast. We do a lot of stuff with observability, most of it not very homogeneous, so I'm trying to get some things going there, and I've just been contributing. Happy to be here.
D: Hey everybody, my name is Ivan. I work for Red Hat — first time here. I got an email from Matt, who invited me to come and share a personal side project that I've been working on.
A: I work in the open source program office, as well as on some other things, and observability is definitely high on our list of topics to do more work on. So it's been great to see what you all have been producing.
A: Great to have you — thanks! Welcome!

Hey, I'm Matthias! I haven't been here for a while — I think I joined the first couple of meetings and then transitioned to another company. I previously worked at Red Hat; now I work at Polar Signals. So, hi again, everybody!

Hello everybody, my name is Anaïs. I'm mainly here today because of the presentation — I'm really curious about Pixie. I just joined the SRE team at Civo, and before that I worked at Codefresh.
B: I think that's everyone — cool. Thank you for saying hi, everybody; I don't think we missed anybody. So I'll just take a very quick minute at the top: last week we talked about launching some initiatives for 2021.
B: I'll follow up on this with a blog post a little later today, but very quickly — I won't go through them all in detail — we have about five or six work streams, depending on how you count, and in the last week there have been a couple of additional suggestions. So if there are no objections, I think we'll plan to just use GitHub Projects and make a kanban board for the definition of the working groups.
B: So, if you're interested, the things we talked about last week — again, I won't elaborate — were: generating a big list of vendors and the projects they contribute to; a plan for intentionally engaging with other CNCF projects over the year, with a prioritization and a common interview template, if you will; a plan to foster in-person meetups once the world gets vaccinated; generating personas; and curating case studies — a compendium of case studies and other existing content that we want to surface from CNCF members and partners.
B: So those were the five that we talked about last week. If folks are interested, we can collaborate in GitHub and Slack, and then maybe two weeks from now we can present — once we've sorted it out online together — how we want to self-organize and self-manage. A number of people have reached out to me personally and said, "Hey, I'm really interested in helping drive one of those things," which is awesome to see. So, with that —
D: Yeah, thanks everybody — thanks for having me. Today I'd like to share a side project that I recently built and published. It's basically a kubectl plugin that allows SREs to dump and restore Prometheus persistent blocks. And a big thank-you to Bartek, who provided some feedback and took time out of his busy weekend to chat.
D: So the gist of it is: late last year I was talking to one of the Red Hat software engineers, and he told me there has been an increasing number of issues and bugs on our OpenShift customers' clusters where it would be super helpful if we could get a dump of their Prometheus data.
D: Even today, we usually tell the customer representatives: "Hey, capture the Prometheus TSDB data — here are about 15 lines of bash code; try to run it in their environment." And what that bash script does, really, is just grab everything.
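A minimal sketch of that kind of script — pod name, namespace, and paths are invented for illustration:

```sh
# Tar up the entire Prometheus data directory inside the pod...
kubectl -n monitoring exec prometheus-0 -c prometheus -- \
  tar -czf /tmp/promdata.tar.gz -C /prometheus .
# ...then copy the archive out to the local machine.
kubectl -n monitoring -c prometheus cp prometheus-0:/tmp/promdata.tar.gz ./promdata.tar.gz
```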
D: Now, it becomes more challenging if it is a production Prometheus instance with maybe 100 gigs of data — or 200, or 500 gigs. As we compress and tar up this file and try to restore it — say it's a 500-gig data directory — it's almost impossible for me to run it on my laptop.
D: You know, to try to reproduce the issue and see what's going on, I have to spin up clusters, port the file over, et cetera, et cetera. So I've been thinking about it: really, when we diagnose bugs, all we care about is a very specific chunk of data that falls within a very specific time window.
D: So I started researching into it and, as you folks might know, Prometheus currently offers two ways to dump data: one is through promtool, which has a tsdb subcommand, and then there's also a snapshot HTTP endpoint.
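For reference, those two mechanisms look roughly like this — a sketch; the snapshot endpoint only works when Prometheus is started with --web.enable-admin-api:

```sh
# Dump samples from a local TSDB data directory with promtool.
promtool tsdb dump /prometheus-data
# Ask a running server to snapshot its TSDB (admin API must be enabled).
curl -XPOST http://localhost:9090/api/v1/admin/tsdb/snapshot
```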
D: Both of them don't really do what we want because, first of all, they dump everything — again, 200 gigs of data and metrics that we don't need. And secondly, with promtool there are GitHub issues, I believe, that have been raised about how the output is somewhat limited, such that you can't really just restore it onto another Prometheus instance.
D: So that led me to do some research in the evenings and look through code, to find out: is there a way to do this? And it turns out that Prometheus has this tsdb package that offers a very nice API.
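What makes a time-window dump feasible is the on-disk layout: each persistent block is a ULID-named directory whose meta.json records the block's time range, so a tool can pick just the blocks that overlap the requested window. A sketch — the ULIDs and timestamps below are invented:

```sh
$ ls /prometheus
01F56GRY9D2REDB9WCTS3B4Y2G  01F5DYCQ1A4RF0DE9WCTS3B4Y2G  chunks_head  wal
$ cat /prometheus/01F56GRY9D2REDB9WCTS3B4Y2G/meta.json
{"ulid":"01F56GRY9D2REDB9WCTS3B4Y2G","minTime":1620691200000,"maxTime":1620698400000, ...}
```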
D: It does exactly what we wanted: capture data blocks, but allow us to filter them by a time range. So that's kind of it, and I want to go into a very quick demo — there are really only three commands that I need to run. Right now I have two kind clusters running; one of them is listening on port 9090, and there's this demo http_requests_total metric.
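The demo setup is roughly the following — a sketch; the contexts, namespace, and service names are invented:

```sh
# Two local kind clusters, each running Prometheus, forwarded to local ports.
kubectl --context kind-demo-1 -n monitoring port-forward svc/prometheus 9090:9090 &
kubectl --context kind-demo-2 -n monitoring port-forward svc/prometheus 19090:9090 &
```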
D: It's just a dummy metric that I created for demonstration purposes, and as you can see on this cluster there's some weirdness going on: the summation rate dropped all of a sudden — some weird stuff happened here. Pretend this is a 200-gig data directory; I don't want to dump and restore the entire Prometheus instance.
D: I only care about this time frame here, to help me reproduce and diagnose the problem. Now, I have a second instance of Prometheus listening at port 1991.
D: Okay, cool. So I have one kind cluster running here, and the first thing I'm going to do — so, promdump is just a kubectl plugin, and it provides maybe two subcommands and one main command. Actually, let me just make sure I'm looking at the right clusters.
D: Okay, so I want to run a very quick meta first, just to make sure I'm pointing at the right cluster.
D: What that essentially does is: the plugin — the CLI — sends a POST request to the exec subresource of the Prometheus container, attaches to the container's stdin and stdout, and streams a second binary — a second application, essentially — into the container via stdin. Then it tells the container: "Here's a stream of data; when you get it, untar it, run this command, and then stream the output back to me via stdout."
D: So what we see here is the meta subcommand output: it gives me a sense of how much data I'm dealing with — the head block, the persistent blocks, details about Prometheus storage. TL;DR: there's the head block in memory, memory-mapped files, and immutable persistent blocks.
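For reference, that first command looks roughly like this — a sketch; the flag spellings are approximate, not authoritative:

```sh
kubectl promdump meta -p prometheus-0 -n monitoring -c prometheus
```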
D: Persistent blocks are more like quote-unquote long-term storage, where the data becomes immutable and gets compressed and basically just sits on disk. As you can see, my head block in memory has about 25K series, and the data size in persistent blocks is maybe 37 megs or so.
D: So that gives me a sense of: okay, how much data am I dealing with? Now I'll dump the data that I'm interested in. If I switch back to my Prometheus console — maybe I'm only interested in the data between, say, around four o'clock and two o'clock there — so I'll tell it to go and look for that data.
D: What date is this? This is the 11th, and the times are all in UTC. So I set the start time equal to 2021, the 11th, and then — what was it?
D: Okay, let's say one o'clock in the afternoon. And then I'm just going to redirect the output to a .tgz file on my local filesystem.
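So the dump invocation looks roughly like this — a sketch; the timestamps are the demo's, the flag spellings approximate:

```sh
kubectl promdump -p prometheus-0 -n monitoring \
  --start-time "2021-05-11 04:00:00" --end-time "2021-05-11 13:00:00" \
  > dump.tar.gz
```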
D: Okay, so now I should have a local file here — 23 megs of data. Again, not too bad; pretend it's 200 gigs in total. Let's do a tar -tzf and see what we got. Essentially, it captured the persistent block that falls within the period of time I'm interested in, and at the same time it also captured everything in the chunks_head and the WAL files — I talked with Bartek briefly about this.
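The inspection step, roughly — a sketch; the archive entries are invented but follow the TSDB layout:

```sh
tar -tzf dump.tar.gz
# 01F56GRY9D2REDB9WCTS3B4Y2G/meta.json
# 01F56GRY9D2REDB9WCTS3B4Y2G/index
# 01F56GRY9D2REDB9WCTS3B4Y2G/chunks/000001
# chunks_head/000001
# wal/00000002
```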
D: I think the head block does allow you to query by time range as well, but it's not as easy to split. So I just took the easy way out and said: hey, give me everything in the head block and everything in the WAL files, because those are pretty much confined to about two hours of data anyway — relatively small compared to all the other persistent blocks in the container.
D: So I've got my data. Now I'm going to switch over to my other cluster — I hope it's still running — okay, there it is. I run the meta command just to see what's already in the container. This is a new Prometheus instance that I just installed via Helm, and you can see from the last column here that it's less than half an hour old, so there are no persistent data blocks yet — it does have some head-block data, though.
D: So what we're going to do is run the restore subcommand, with -t telling it to pick up this dump file — and I think that should be all I need. All right, that was really quick — relatively quick, because it's just 27 megs of data, so not a lot there. Let's run the meta command again — oops, did it pick it up? Okay, yeah.
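The restore step, roughly — a sketch; flag spellings approximate:

```sh
kubectl promdump restore -p prometheus-0 -n monitoring -t dump.tar.gz
kubectl promdump meta -p prometheus-0 -n monitoring   # verify the blocks arrived
```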
D: So now you can see I have persistent blocks here. Now, for those of us who are more observant, you might notice: hey, how come there are two blocks here? When promdump talks to the tsdb package, there's really no way to say: "Hey, break up your persistent blocks — give me a quarter of one, or half of one."
D: Having slightly more data is hopefully better than having missing data, right? So I've restored it, and now I'm going to have to restart my pod. The Prometheus server has a PVC underneath it, so killing the pod will not destroy the restored data. The reason I need to restart it is just to give Prometheus a chance to replay all the WAL files and pick up the data that has been restored. So now I'm going to watch it, and if I've done everything correctly, this should not crash.
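That restart amounts to something like the following — a sketch; the pod name is invented, and the PVC preserves the restored data across the restart:

```sh
kubectl -n monitoring delete pod prometheus-0   # Prometheus replays the WAL on startup
kubectl -n monitoring get pods -w               # watch that it comes back without crashing
```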
D: When trying this on OpenShift, sometimes after the data is restored it will crash, complaining about the head blocks not being in order — but that seems to happen only on OpenShift. On plain vanilla Kubernetes — kind, minikube, that kind of thing — I haven't seen any data corruption at all so far in all my testing with data restoration. Okay, so this restored. Now I need to redo my port-forward to the second instance of Prometheus, because, as we all know, port-forwarding to a service is really just port-forwarding to one of the pods.
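Re-establishing it amounts to — a sketch; the local port is illustrative:

```sh
kubectl -n monitoring port-forward svc/prometheus 19090:9090
```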
D: Yeah — so if we compare the chart here, it's almost identical, but not quite; there's some missing data. So what we're seeing here, essentially — I guess there's not enough data here; I wish I had more. Actually, no, let's try this.
D: If I look at that — okay, in this case it may have restored everything that we need. I wish I had enough data to show it more clearly.
D: Ideally, only this chunk would show up, and that chunk wouldn't get restored, because it falls under a different block. So it's kind of like what I was saying: I tell the tsdb package, "Hey, I need data from this time range," and it just grabs everything — the entire block, or blocks in this case — and restores it. Here, this slice is the head chunk — the part that got replayed as the Prometheus pod restarted — and this is the data that was sent over in the restore, landing in those one or two persistent data blocks, because we asked for a time range that spanned two blocks.
D: That's one way to do it, anyway. So that's it, in general. I've had a chance to demo it to our teams, and the support team found it helpful, so I hope it will be something that the community will find helpful too.
C: I ran into a similar issue with an etcd backup tool that I wrote a couple of years ago, and it was not fun, because you can't reproduce stuff if you don't have the data. And you can't blame anyone, because obviously you can't just give someone external a dump of your production system — absolutely not. But that is an issue, and maybe you are smarter than me and found some way around it; I didn't.
D: Yeah — I actually opened an issue on the Prometheus GitHub repo. I was hoping that if enough people are interested in it, I can move the entire source code to the prometheus-community repo, so others can help me maintain it and I don't have to spend all my evenings trying to figure out bugs and stuff like that.
D: But I always remind people that this is not a backup-and-restore tool; it's just an SRE tool with a very specific SRE use case in mind. Prometheus backup and restore is an entire thing — an entire product — and I haven't quit my day job to do that. But yeah, I hope that helps scope it.
E: We could play with some tool that would essentially, for example, remote-read the data from Prometheus using its APIs — selecting the actual series, by series and by time — and essentially build the block ourselves, because this is what our backfilling tools in Prometheus already do, from recording rules, from CSV, and from JSON. So if we could point them at an existing Prometheus, that could be another input for such a block, which could be much, much smaller, because it would have only the one series that you really care about.
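For context, the backfilling entry points being referred to look like this — a sketch; these promtool subcommands exist in Prometheus releases of roughly that era, so check your version:

```sh
# Build TSDB blocks from an OpenMetrics-format sample file.
promtool tsdb create-blocks-from openmetrics samples.om ./data
# Backfill the output of recording rules by querying an existing server.
promtool tsdb create-blocks-from rules \
  --start 2021-05-11T04:00:00Z --end 2021-05-11T13:00:00Z \
  --url http://localhost:9090 rules.yml
```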
E: So that's something that would maybe make your tool even more lightweight. And you're also welcome to propose this project for the prometheus-community repository — though it's worth mentioning that this is not a way to just remove the project from your area of interest; it's not a random place full of maintainers with lots of time. Rather, it's how to make it a more adopted, more visible project for everyone else, with higher chances of having someone else help you. But yeah, just ideas to improve it.

We could also work on a better importing procedure. Right now you have to restart the whole Prometheus, and maybe even remove the actual existing data directory. There are ways to reload that dynamically, or maybe even add blocks on top of other blocks so they get vertically merged or vertically queried — I think some of those things are already doable, maybe behind some flags. So we could improve this behavior as well. But, just ideas, yeah.
B: With my MC hat on — we've got to cut it there. Thank you so much for showing us; please join us in our Slack channel. I bet there are lots of other ideas, and I'm actually interested in this discussion, but we need to move on to guard our time. So thank you so much again. Zain, are you ready to present? Cool — welcome, take it away.
F: I notice it says "SIG Observability" — it's a TAG now, right? All right — well, thanks, everyone, for having us over here. We're super excited to be part of the community. Just for context: I'm Zain Asgar. I'm currently the GM and GVP of Pixie and open source at New Relic, and prior to this I was the co-founder and CEO of Pixie Labs; we were acquired at the end of last year.
F: I'll go through the story of why we built Pixie and what Pixie does, and we'd love to get your feedback and your thoughts. As of about two weeks ago, everything over here is open source, so we're happy to get feedback and also collaborate on building this out in the open.
F: So why are we building Pixie now, and what problems were we aiming to solve? I'm not going to go through this entire marketing pitch, but basically: software is more decoupled.
F: Most of the Pixie team actually comes from a data systems and machine learning background, and observability is actually a relatively new area for us — so please feel free to give us product feedback. But some of the challenges we noticed: when you're building production systems, you usually end up adding a bunch of instrumentation, and a lot of this instrumentation is added up front.
F: Some of it is added as the programs are built out and services are deployed. We wanted to find a way to do, I'd say, 80% of this automatically, quickly, or in some way extensible after the fact — and we understand that in many cases it's actually useful to manually add instrumentation, where you have specific business logic you want to capture.
F: Another area we really looked at, especially as we started to tap into new data sources, is just the sheer data volume that's generated — especially captured metrics, traces, and logs — which you then dump over to some centralized backend, usually hosted by some provider. And then, just the extensibility of the interfaces.
F: So one of the things we did with Pixie is move to this model where — similar, I think, in many ways to Prometheus — everything is hosted inside of the cluster. One of the differences is that we focus a little bit more on the logs and traces; we do do metrics, but we're less optimized for some of the metric use cases.
F: More specifically, we moved away from having very rigid, predetermined collection to doing everything as code, with on-the-fly, no-instrumentation collection — we'll see how this works in a couple of minutes. Pixie also moves away from a cloud-only model to this edge-plus-cloud model.
F: One of the things we heavily leverage — I think someone earlier mentioned eBPF — is eBPF inside of Pixie, to automatically go and capture data and also to allow Pixie to be extended. Part of the challenge there is that you're basically flooded with so much data that it's very difficult to move it off the machine, so we had to build our compute platform to handle all this data while it's sitting on the actual machines where it's first collected.
F: Everything is API-driven and scriptable using this Python dialect. Pandas, if you're coming from the data systems world, is a pretty common way to represent data programs, and that's what we use in Pixie.
F: So, from a product perspective, what is Pixie? Again, it's instant, code-driven debugging. Part of our goal is: how much data can you get within less than five minutes of installing Pixie? That's the experience we optimize for — you get Pixie installed, and then what instant visibility can you get? We've been primarily focused on APM — application performance monitoring — and debugging, to provide this baseline level of visibility and start to get into code-level context.
F: Some of the people in the broader Pixie community — like Jana at AWS — have been looking at doing some security stuff with Pixie, because our scripting system is pretty extensible; I'll show that in a second. So, the way Pixie works at a very high level is that we have these things called Pixie Edge Modules. It's a little hard to see over here, so let me just blow this up.
F: Cool. So we have these Pixie Edge Modules, which install themselves as DaemonSets on Kubernetes nodes, and then we have another layer called Vizier, which basically serves as our semi-centralized data system. So our data system is split between storage on the edge and storage inside of the centralized collection system. And then we do have a cloud system to manage things like authentication, RBAC, metadata, search, and stuff like that.
F: All the components over here are actually open source, and you can launch your own public or private Pixie cloud, connect multiple clusters to it, and get administration across them. Or you could use our essentially free community hosting, which we have on Pixie Labs — it basically just hosts the open source version of Pixie for everyone to use. So there are two main components to Pixie.
F: As I mentioned: the cloud and the Vizier. Both of these are available, and then you can interface with the CLI — which is available on our GitHub repo over here — the web UI, or the APIs, which are either located in this directory or a copy in this repo.
F: So you can install Pixie with our CLI, and once you install it you can immediately — within a couple of minutes, depending on how long it takes Kubernetes to deploy all the pods and services — get visibility into your application. Let me actually start with the cluster-level view.
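For reference, the install flow as I recall it from Pixie's public docs at the time — treat the URL and commands as approximate, not authoritative:

```sh
bash -c "$(curl -fsSL https://withpixie.ai/install.sh)"   # install the px CLI
px deploy                                                 # deploy Pixie to the current cluster
```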
F: So there are a bunch of services running on this cluster; some of them have been making connections or requests to other services. You can get pretty deep inspection of what's going on — how much traffic is flowing, how many errors, all that stuff. Everything over here was done without adding any manual instrumentation to the code, or relying on any instrumentation
F: that's already in the code. I'll dive into something quickly — like "plc", which is where Pixie Cloud is hosted on this account over here — and you can actually go in and see where the requests are being made. Over here you can see that there's a proxy service that talks to the API service, which talks to the authentication service. We basically discover the whole graph and everything automatically.
F: This was a query that was done; this is the response status, and this is the response body. It's been truncated because it's long, but we actually do full-body tracing of requests, and the system works regardless of whether you're using SSL or not. In most cases — even all of this traffic is actually SSL — we can capture the SSL traffic and show you what's in it.
F: One of the things we've been really focused on is getting code-level context. If you're trying to debug slow requests and you click on the API server, you can see how a specific pod is behaving. And if you ever run into a performance issue, you can narrow down the time window and then see where your code performance issues are happening.
F: Part of our long-term goal is to make Pixie as developer-friendly as possible, which in our mind means getting more and more code-level stuff into Pixie and making it easier for engineers who are actually trying to debug performance issues — like, "Oh, where in the code should I go and look?" I don't want to dive into too many details of exactly how the flame graph works, but basically wider bars mean you're spending a lot of time over there.
F: So it's a good area to go debug. Just to add one little thing on here: you can see that our context starts at the entire cluster and goes down to a specific pod and container. We can actually profile across the entire cluster and tell you how different pods have different performance profiles.
B: Yeah — Zain, do you want questions inline, or do you want to finish?
F: Take them in any order — I was just going to show two more things quickly. One of the things I wanted to quickly show is that everything inside of Pixie is done using scripts. We have this Python dialect, PxL script — it's basically based on Pandas — so you can say, "Here's a DataFrame," and then you can do all sorts of operations on it. Our goal is to basically build up a bunch of scripts with the community.
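To give a flavor, here is a minimal PxL sketch (PxL is Pixie's Pandas-like Python dialect; the table name is as I recall it from Pixie's docs, and the service name and aggregation are illustrative):

```python
import px

# Pull the last 5 minutes of auto-traced HTTP events.
df = px.DataFrame(table='http_events', start_time='-5m')
# Keep one service, then count requests per path.
df = df[df.ctx['service'] == 'pl/api-service']
df = df.groupby('req_path').agg(requests=('latency', px.count))
px.display(df)
```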
F: There are already a lot of them that you can run. Our long-term goal is to be able to build workflows for specific things or specific types of problems — like, "Oh, I want to debug something that's slow: what are the best things to look at, what is the right flow?"
F: So everything is done using this Python dialect, compiled and executed on the fly. It's pretty powerful, in the sense that we can actually go — this one is super ugly, but here the code is actually from bpftrace.
F: You can use it in many different areas — in this case it's basically aggregated by source and destination endpoints. So, I guess, a step back: we have the notion of pre-built probes that are always instrumented, but you can also add temporary probes that you want to instrument and capture data from. Some of the other functionality we have, for the pre-built stuff, is:
F: say you want to take a look at Postgres data. We support many different database protocols, so you can actually see what the exact Postgres queries are — here's a Postgres query that was executed, and it got parsed; we have the raw string of all the Postgres events that are occurring. You can do similar things for MySQL or various other databases that we support.
F: So those are the built-in features, but then we're extensible by being able to add either bpftrace code or dynamic tracing for specific needs in your program.
F: I know I went through a lot, but feel free to ask me questions — and check it out and give us feedback. I'll stop there.
C: Yeah, that's amazing — it's really fascinating what you can do there. I have two questions, but the one I'd really like to understand — and maybe you can help me — is: if you compare it with Hubble, what are the differences or the overlaps, or when would you use one or the other? Is it totally different? What would you say in terms of comparing Pixie with Hubble?
F: Yeah, let's see — I'm trying to remember exactly which...
C: It's essentially the same high-level pitch, in the sense of using eBPF to provide that kind of insight.
F: Yeah, I remember when Hubble was coming out — I think it came out a little bit before we open-sourced Pixie, or rather right when we were launching Pixie as a product. I think the main difference is that Hubble is focused on generating metrics from the data.
F: If I understand correctly — whereas we help you do a lot more of the raw analysis. And in addition to the eBPF stuff, Pixie comes with an entire data system behind it. Like I said, we originally come from this machine learning and data systems background.
F: So for us, eBPF was: how do we get lots of data into Pixie very quickly, so we can actually do more things with it? It kind of came from that direction, whereas I think Hubble is coming from the other direction. So I definitely think there's a future where they're both complementary and can work together.
B: Yeah, thank you. I had a quick question, if I could, briefly — also sort of around exporting. Is there any facility in Pixie to export not a continuous stream of things, but findings? When you were talking about a systemic view across all pods of a cluster, for example: you might be doing a right-sizing experiment, where you're going from a staging cluster to a production cluster and you want to put in place the right kind of quotas or resource limits based on actual performance.
B: Not, you know, whatever defaults were either copy-pasted or put in, potentially, alongside what's running. Or if you wanted to identify problematic areas and generate some sort of actionable report that you could then give to a team and say: "Hey, everything seems to be working across these two dozen services, but these two are having problems." Is there any kind of reporting feature or export of findings?
F: Yeah — in some ways, our standard answer to that would be: you should just write a script that exports the findings, and then you could stream the output of the script. We do have streaming support, so you can aggregate the data — say you want to do this over a 30-minute window or something. We are also adding the ability to manage persistent scripts, so you can say, "Hey, run this script in the background" — and then, obviously, it flags errors
F: if the script stops for some reason. But run the script in the background, and if the script says "every 30 minutes make a request somewhere" to give some update, it will do that. Or you could call the script through the API and just watch the results, or request the results via the CLI.
B: Yeah — I was kind of thinking of running things at scale, across multiple regions and things like that. At least from folks I've talked to, you often have a fairly small team doing a whole lot of things. So if there was some way to say: of all the things happening, we've predefined these sorts of alerts or conditions — here are the things to look at.
F: I think that absolutely makes sense. We haven't focused on those use cases, but that's actually been brought up to us many, many times by various people — including Kelsey, who was really trying to convince us to do resource optimization stuff. We ultimately focused on performance monitoring, but I think he's actually going to demo some resource optimization stuff using Pixie in a couple of weeks — so we'll see how that works out — basically watching resources and being able to adapt resources on
C: Kubernetes. We have another question: Arthur asked how the SQL analysis works with stuff that runs outside — not in the cluster — and I think that's a really good question.
F: Yep — the short answer is that as long as one of the network endpoints is located in the cluster, you won't be able to tell the difference. So if you have a database running outside of the cluster, but the service accessing it is inside the cluster, it'll still work fine. We'll switch the tracing automatically from server side to client side, depending on which end is located within the cluster.
F: We obviously can't get you the information about the database itself — for example, we do JVM metrics and stuff, and you won't be able to get that if it's running on a different node. But over time — we haven't really prioritized this, but it's highly requested — we want to be able to run our PEMs, which is our DaemonSet, on an arbitrary Linux VM and have them phone the data home, as long as you have one Kubernetes cluster.
F: You can have as many VMs as you want. We do rely on Kubernetes to host our data system, and that's pretty hard for us to move away from. But as long as you have one Kubernetes cluster, you can connect as many Linux VMs as you want. Thanks.
E: A question — coming from the Prometheus world, I usually care about...
F: I guess I should have been a little more clear about this up front, but Pixie doesn't do long-term retention, and it doesn't do things like alerting. Our goal is to collect lots of data for a short period of time and then help you work through it and process that large volume of data.
B: Thanks. So are you saying that there's a plan — or there already is the ability — to effectively treat Pixie as an exporter of sorts, from a Prometheus perspective? Like, you've got this sea of data from which you could derive interesting time series, either directly or by computing rates live, and then provide a scrapable endpoint. Is that what you meant by exporting?
F: Our goal would be more the former: you have a script that's basically generating metrics from all this data, and then those metrics can get exported to Prometheus, either by us exposing an endpoint or through some push gateway or something — we haven't quite figured that out yet. There is a prototype that basically uses the Prometheus push gateway, but we're open to suggestions.
B: I'd love to talk after, then — we've written a Prometheus aggregating push gateway for metrics from our front end for the same thing: we needed to get some visibility but had the problem of way too much data, so we aggregate, versus a standard push gateway. But — awesome.
E: Thank you for explaining this; this is quite epic. I wonder what's involved in installing this eBPF kind of instrumentation inside my cluster. What should I do? I have a Kubernetes cluster — what steps do I need to take to ensure that things are installed properly, and that Pixie knows how my service is actually named and which service endpoints I really care about?
F: Yeah — right now we capture everything; we plan to provide more configuration around "don't record this" or "record more info about this." But in the current state, we connect to the Kubernetes API and discover all the services and pods and everything, so it's all transparent. It actually only takes about two or three minutes to install Pixie, and the dashboards I showed you were not specifically configured — they should just automatically work on those clusters.
F: We do have some challenges with certain environments. Running on things like kind — Kubernetes-in-Docker — can be very challenging for us, because kind actually runs on your local Linux machine. But on typical Kubernetes clusters — even minikube, or GKE, or AKS, or EKS, or whatever — we don't have any issues with self-hosted.
F: The continuous profiler, for example, is about half a percent of overhead. If you just use the continuous profiler, it's pretty constant — it'll use up half a percent of your CPU regardless of what your server load is. And typically, for the network tracing, we see somewhere between two to four percent.
A: So it's proportional to the traffic, more or less?
F: The regular network tracing and the database stuff is proportional to the traffic. The continuous profiler is independent of the traffic and of the load, because we basically do stack sampling — so it's a constant overhead, even if the server load varies.
B: On the data model underneath Pixie — where you're storing things both in node-local storage and elsewhere — is that one security domain, if you will? Is it sort of: you have access to all the things, or not? With eBPF, obviously, you can get inside SSL tunnels, you can access all kinds of things, and when combined with a script, this is basically god mode plus visibility.
F: Yeah, that's what we call it — that god-mode visibility for clusters. Our goal is actually — sorry, Matt, I think I spoke over you. Were you asking about plans for adding more security?
B: Well, no — I was curious whether there's currently an RBAC model, or whether there are any ways to compartmentalize the data. Is it just access to all or nothing, or is there a security framework or model to allow for protecting some of the data that may be sensitive?
F: Right now it's sort of all or nothing. We do want to move to an RBAC model where we can restrict people from accessing certain tables, or even certain fields in certain tables, but we're not there yet. For example, we could prevent users from accessing encrypted traffic — but, again, we're not there yet. That's something on our roadmap that we plan to build out, because it's obviously very important as we scale out.
B: Sure — and I know we're getting short on time; this has been fascinating, I could watch this all day. But how should people get involved if they're interested in this — say they want to play with it, or they have ideas, or they want to make it better?
F: I'd say three things. One — as someone pointed out, there's a Slack community, so please join the Slack community. It'll be super helpful for us to get feedback over there, and it's an area where you can have a lot more discussion with somebody on our team.
F: The other thing is to file GitHub issues. We're not very actively asking for contributions yet, but if there's some area you're really interested in working on, file an issue and we'll figure out a process for that. We will be opening up for much more active contributions in a couple of months — it's just that we're trying to get everything organized and figured out, because we started out as a SaaS product and we're making everything open source.
F: Yeah — so I think there are a couple of areas where, in the short term, we could actively use contributions. One of them: we're working on a Grafana plugin for Pixie. Part of our thing is that we want to work with all the other tools in the ecosystem — our goal is not to build some amazing UI, or —
F: We have a UI for debugging, but we want people to be able to use Grafana and other popular tools. We're probably actually going to flip the switch on the Grafana repo today, so we'd be very happy to get contributions and help on that. The other thing, obviously — I think we mentioned the Prometheus side — is that we'd love to get some help figuring out the right way to do that.
B: Any other questions before we break? We're almost at time. All right — well, thank you so much to everyone who had questions and discussion, and for the presentations themselves. This is really exciting stuff to see, and I will see you all online and next week.