From YouTube: eBPF in Microservices Observability - Jaana Dogan, AWS
Description
Don’t miss out! Join us at our next event: KubeCon + CloudNativeCon Europe 2022 in Valencia, Spain from May 17-20. Learn more at https://kubecon.io The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.
Anyway, everybody is here for a reason. eBPF is kind of like this Swiss Army knife of a toolkit. My reason to be here is to talk about observability, and within observability I'll mainly be talking about microservices and how we are using eBPF.
So, first of all, I'm not a Linux developer like some of the people here. I generally work on monitoring, observability, and performance. Tooling, multi-tenancy, and more are my areas, with a microservices focus. Especially if you think about the last ten years, with container orchestration and so on, it became so much easier to pack things, deploy things, and scale up and down. So we have this huge new world with a growing number of microservices, topology changes, all the other components, and so on.
I'm not sure if you've seen this before; this is from Brendan Gregg. It shows all the canonical tools we used to use back in the day to diagnose Linux, and as you can see, there are all these different layers. Some of the panelists mentioned briefly that this world was great because it was rich, with so many toolsets, but those tools were talking to some very concrete, inflexible APIs to read the diagnostics data.
This model lived for a long time, but it just didn't scale, and eBPF came out as a result. eBPF is a more programmable way to hook into the kernel and get the events, and then, in user-space programs, you can take that data out and do whatever you want with it: enrich, filter, or aggregate.
To give you a very, very brief intro, this is how eBPF works: you write eBPF programs, you hand them off to a verifier and a JIT compiler, and then you can attach them to certain places. In this example, I'm attaching to the sockets to be able to read the network data. Then there are BPF-map-style data structures where you can collect the data, that is, the events coming from the sockets, and BPF maps are accessible by user-space programs.
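The split described above, kernel-side aggregation into a map plus a user-space reader, can be sketched with a toy model. This is not real eBPF code (a real program is restricted C loaded via the verifier); it is only a hedged illustration of the map pattern, and all names here are invented.

```python
from collections import defaultdict

# Toy model of a BPF map: the "kernel side" updates per-key counters on
# each socket event, and the "user side" periodically drains the map.
class ToyBpfMap:
    def __init__(self):
        self._counters = defaultdict(int)

    def update(self, key, nbytes):
        # What the attached program would do per event: aggregate in the
        # kernel so only compact summaries cross into user space.
        self._counters[key] += nbytes

    def snapshot(self):
        # What a user-space reader would do: read, then enrich/export.
        data = dict(self._counters)
        self._counters.clear()
        return data

events = [
    (("pid:101", "nginx"), 512),
    (("pid:101", "nginx"), 256),
    (("pid:202", "postgres"), 1024),
]

m = ToyBpfMap()
for key, nbytes in events:
    m.update(key, nbytes)

print(m.snapshot())
# {('pid:101', 'nginx'): 768, ('pid:202', 'postgres'): 1024}
```

The point of the pattern is that per-event work stays in the kernel; user space only sees the aggregated view.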
People like to do profiling, and continuous profiling, sometimes based on the data coming from these hooks. You can hook into system calls and collect them; there's a really big security use case for that, because people like to audit and monitor the system calls going on on a machine. Network events are another example: you get network telemetry out of the box. And the other one is kernel tracepoints.
Before jumping further into eBPF, I want to recap some of the bigger challenges that we had in microservices. When I say microservices, think about it at a grander scale; think also about your Kubernetes cluster and how many different components there are. Don't just fixate on your own services, because we sometimes know a lot more about our own services than about the other components in a cluster.
One of the bigger challenges in microservices is that this is not a world where we just monitor virtual machines or processes anymore. We primarily care about the critical path: a user request comes in, hops through different services all the way to the database and storage, and we care about the health of that critical path. Our user doesn't necessarily care about one service being up or down; we can maybe restart their request on a different replica of the same service. But they do care about the health of their critical path, because that's their experience, and if something goes down, as you can see in this case with a downstream service, our critical path is broken. So it's very, very important for us to understand what's actually going on in a critical path and what is broken.
In later years I've been working on distributed tracing, and distributed tracing was becoming much, much more popular because of the growing number of services and different things in our critical paths. So this is our first challenge. The other challenge is the context. A couple of people on the panel mentioned this: we have all these different services in the chain, and downstream services don't always have the same context.
If you make a request from an upstream service, you can't really capture telemetry data at the downstream services with the context related to that upstream service. Or you have this big cluster, a multi-tenant environment, and you want to capture the telemetry with your cluster name, pod name, and all of that, in order to narrow down your telemetry. If you don't have context, it just becomes much, much harder; context matters a lot.
This is a typical M x N problem. We usually have multiple processes, and there are multiple RPCs handled by each process. Then you have containerization as the namespace and orchestration as the logical grouping, and you want to capture as much of this type of context as possible, to be able to figure out where the issue originated.
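The layering just described, RPC inside process inside container inside orchestration, can be sketched as a simple enrichment step. All the names, fields, and the lookup table below are invented for illustration; a real collector would build this index from the container runtime and the orchestrator.

```python
# Toy sketch of the M x N context problem: many RPCs per process, many
# processes per pod. A single raw event only becomes useful once the
# whole chain of identity is attached to it.
def enrich(event, process_index):
    ctx = process_index[event["pid"]]  # pid -> orchestration context
    return {**event, **ctx}

# What a collector might learn from the runtime / Kubernetes API.
process_index = {
    4321: {"cluster": "prod-eu", "namespace": "checkout",
           "pod": "checkout-7d9f", "container": "app"},
}

raw = {"pid": 4321, "rpc": "GET /cart", "latency_ms": 12}
print(enrich(raw, process_index)["pod"])  # checkout-7d9f
```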
Or, when you're narrowing down your telemetry, you want to quickly see what is being affected, in order to understand your blast radius and to follow the critical path and its health. The other thing we started to do is, when there's an issue, we first debug what is in the critical path of our request. In the monolith times it was more common to just go and debug certain functions or syscalls and so on. Now step one is to debug the critical path; step two, you can go and dig through and maybe understand what's going on in a specific service. And this is where correlations make a lot of difference.
We actually have another talk with Morgan McLean at the conference this year that talks a little bit about the challenges and about how some of the ways we do correlation are making life easier when it comes to troubleshooting. The other challenge, as someone else mentioned today, is that there's too much data in an environment like that.
Not just with eBPF; there's already too much event data, and you really want to have some runtime controls, or a control plane, to be able to say, "Hey, I just want to enable more data," or disable it. That type of thing becomes very important because of the enormous amount of data we produce. And for every customer I talk to, every team I'm working with, instrumentation itself is a huge burden.
I used to work at Google on the instrumentation team; now I'm leading parts of instrumentation at Amazon. There's a huge amount of work in aligning on the data that you produce: consistency of the labels, the shape of the data, the naming of the data. It's a long, long process, and because it's such a gradually moving area, you always end up being inconsistent in the data you produce.
Extensibility at runtime is really, really critical, because we want to be able to enable and disable things based on the situation, in order to troubleshoot more. Given there's so much data, it's costly to always keep that kind of firehose up and running, so you want to be able to enable and disable. And we talked about context: we want to decorate and enrich the data, so it becomes much easier when we're navigating it.
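The enable/disable control described above can be sketched as a tiny collection toggle. This is a made-up, minimal model of the idea, not any real control plane's API; the flag names and event shapes are invented.

```python
# Toy runtime collection toggle: a control plane flips flags, and the
# collector consults them before emitting expensive telemetry.
flags = {"syscall_audit": False, "http_payloads": False}

def collect(event_kind, payload):
    if not flags.get(event_kind, False):
        return None  # the firehose stays off by default
    return {"kind": event_kind, "payload": payload}

assert collect("http_payloads", "GET /") is None
flags["http_payloads"] = True  # e.g. an operator enables it during an incident
print(collect("http_payloads", "GET /"))
```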
So where does eBPF help? eBPF gives us a lot of the interesting things we talked about in the panel, like network diagnostics. You can get TCP, UDP, and HTTP high-level network events out of the box, and you can turn them into metrics.
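Turning raw network events into metrics, as just mentioned, is mostly an aggregation step. Here is a hedged sketch of that step in user space; the event fields and service names are invented, standing in for whatever an eBPF socket hook would surface.

```python
from collections import defaultdict

# Fold per-request events into per-service metrics: request count,
# error count (5xx), and total bytes.
def to_metrics(events):
    out = defaultdict(lambda: {"requests": 0, "errors": 0, "bytes": 0})
    for e in events:
        m = out[e["service"]]
        m["requests"] += 1
        m["bytes"] += e["bytes"]
        if e["status"] >= 500:
            m["errors"] += 1
    return dict(out)

events = [
    {"service": "cart", "status": 200, "bytes": 512},
    {"service": "cart", "status": 503, "bytes": 48},
    {"service": "auth", "status": 200, "bytes": 128},
]
print(to_metrics(events)["cart"])
# {'requests': 2, 'errors': 1, 'bytes': 560}
```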
I've specifically mentioned metrics here, but you don't have to stop there: you can get very raw events, and you can also inspect protocols. For example, this is a screenshot from Pixie. I just ran Pixie on my cluster, and this is all the inbound HTTP traffic coming to my service.
Without me making any changes or anything, you can also see a sample of the slow requests, and you can go and inspect what actually happened. This is another example: Cilium has this component called Hubble, and it comes with a nice UI. It's just so easy to install these things on your Kubernetes cluster: you run the Cilium command to enable Hubble, it deploys a couple of components, and then there's also a command to bring up the UI. You can see my services in my cluster talking to the world, and in the bottom section you can see all the different specific requests, with some metadata about them.
The other thing that not a lot of people are talking about is distributed traces. Distributed tracing is a very tough topic because it requires you to propagate trace headers. But if you already have a trace header in the incoming request, eBPF can actually help you generate the data, because as soon as I see an incoming request with such a header, I can generate a distributed tracing span. So if you generate your distributed tracing headers at your load balancer or something, you don't actually have to instrument all of your web services.
You just need to make sure that you're passing the distributed tracing header around, and you can get the data. And you can go and make modifications to the type of data produced: you can add more attributes, you can do things more programmatically to enrich the data, and so on. So this is actually a very cool thing that not a lot of people are talking about.
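The span-from-header idea above can be sketched concretely. Assuming the incoming request carries a W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`), an observer that merely sees the request, as an eBPF hook could, can emit a child span without the service being instrumented. The span shape below is invented for illustration, not OpenTelemetry's actual API.

```python
import secrets

def span_from_traceparent(header, name):
    # traceparent: 2-hex version, 32-hex trace id, 16-hex parent span id,
    # 2-hex flags ("01" means the caller sampled this trace).
    _version, trace_id, parent_id, flags = header.split("-")
    return {
        "name": name,
        "trace_id": trace_id,          # join the caller's existing trace
        "parent_span_id": parent_id,   # the caller's span becomes our parent
        "span_id": secrets.token_hex(8),
        "sampled": flags == "01",
    }

hdr = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
span = span_from_traceparent(hdr, "GET /cart")
print(span["trace_id"], span["sampled"])
```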
The other thing is continuous profiling, and a lot of people have talked about it. One of the things I like about eBPF is that it enables very low-overhead continuous profilers, and there are so many of them nowadays. What's interesting about a continuous profiler is that it unwinds the stack, so you can see kernel code invoking user-space programs and see the entire profile without breaking anything. The same capability, by the way, exists in some of the other projects I mentioned, like Pixie.
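What a continuous profiler does with those unwound stacks can be sketched as stack folding: each sample is a full stack (kernel frames included), and identical stacks are folded into counts, which is the classic flame-graph input format. The frame names below are invented examples.

```python
from collections import Counter

# Fold sampled stacks: join each stack's frames with ";" and count
# how often each distinct stack was seen.
def fold(samples):
    return Counter(";".join(stack) for stack in samples)

samples = [
    ["entry_SYSCALL_64", "sys_read", "vfs_read"],  # kernel-side stack
    ["main", "handle_request", "parse_json"],      # user-side stack
    ["main", "handle_request", "parse_json"],
]
folded = fold(samples)
print(folded["main;handle_request;parse_json"])  # 2
```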
There's also the extensibility side of it, which makes us really happy, because as I mentioned, sometimes you don't need this data all the time; there's so much of it. Being able to extend is great: as I mentioned, you can hand off an eBPF program and enable some more collection, and some of the control planes, like Pixie, are actually making that more streamlined.
So you can pass in a bpftrace program, and it takes it and distributes it to the existing agents on the existing nodes, and you can collect more data, which is very cool.
The other thing is decorating with context. I'm not sure if I'm running out of time, but as I mentioned, as you are collecting data in a user-space program, this is where I think the magic comes in, because in that same context you can actually look up additional metadata.
In this case, I'm talking to the Kubernetes API server to read which cluster I'm in, which namespace I'm in, which pod I'm in, and so on. The type of data coming from eBPF events for networking, for example, is a source IP and a destination IP. You don't know much about what's going on; if you just export it as it is, it's not that useful. But if you can resolve what services those are, what pods, or whatever additional metadata you know about those IPs, then it becomes useful. So it's really nice to be able to decorate things with context.
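The IP-to-pod enrichment just described can be sketched in a few lines. The lookup table and names here are invented; in practice it would be populated from pod metadata fetched from the Kubernetes API server.

```python
# Resolve the two IPs of a raw network event against pod metadata,
# falling back to the bare IP when nothing is known about it.
pod_by_ip = {
    "10.0.1.4": {"pod": "frontend-abc", "namespace": "web"},
    "10.0.2.9": {"pod": "cart-xyz", "namespace": "shop"},
}

def resolve(event):
    src = pod_by_ip.get(event["src_ip"], {}).get("pod", event["src_ip"])
    dst = pod_by_ip.get(event["dst_ip"], {}).get("pod", event["dst_ip"])
    return f"{src} -> {dst}"

print(resolve({"src_ip": "10.0.1.4", "dst_ip": "10.0.2.9"}))
# frontend-abc -> cart-xyz
```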
This is a profile from Pixie, and as you can see at the bottom, the data is broken down by namespace, pod, container, and PID, so you can narrow down and navigate the data. It's very useful when you have an incident and you just want to go and focus on one specific thing.
So I mentioned several projects; if you want to take notes: Cilium and Hubble do a lot of things, and Pixie does so many. Flowmill was an earlier project that has been sort of merging into OpenTelemetry now. Prodfiler is a continuous profiler, and Parca, from one of the Prometheus maintainers, has just been released as a continuous profiler based on eBPF.
So what is coming up next? I think a lot of people have different ideas; these are mine. I feel there's a burden because a lot of people are still struggling to write eBPF programs, so maybe a higher-level language. I'm not sure what it would look like, but it might make things more streamlined.
We're also talking about more platforms supporting eBPF; Windows is a very interesting example.
At AWS we have different, more restricted platforms like Fargate, where we have a very tiny VM on a different virtualization layer, Firecracker, so we're looking into making eBPF available in these places. And the other problem is that some people are very far behind in terms of kernel version, so I hope people will be moving up because of all the goodness coming from eBPF.
eBPF programs are verified and sandboxed, but if you are enabling and disabling some of these things in production, and especially copy-pasting someone else's C code, it's just not great. It would be super nice if we were able to distribute them in a signed, or otherwise safer, way.
So that's something to discuss with the larger community. I just want to thank you; I hope I didn't run out of my time. If you have any questions, find me here or email me. There's also an afterparty, by the way; I realize Pixie is having an afterparty on the rooftop. If you're around, if you're in L.A. in person, you have to RSVP; I highly recommend you check it out.