Description
Step by Step Kubernetes Observability with eBPF - Denis Jannot & Lin Sun, Solo.io
In this talk, we will explore how you can use eBPF to get insights into the communications happening in a Kubernetes cluster. We will write an eBPF program and then use the BumbleBee (https://github.com/solo-io/bumblebee) open source project to build and deploy it. This program gathers information about all the network communications happening in the cluster and publishes the corresponding metrics, which we store in Prometheus. We will then deploy a service that gets the metrics and correlates them with the Pod and Service IP addresses to build a graph displaying all the communications.
A
All right, now we are ready to talk about Kubernetes observability with eBPF, step by step. Let me quickly introduce our speakers. This is Denis; he's the Director of Field Engineering at Solo.io. A fun fact about Denis is that all the demos he's going to show today were written by him, by himself. I'm the Director of Open Source at Solo.io. I'm a very long-time contributor to the Istio project and one of the founding maintainers there. A fun fact about me: I recently became a CNCF Ambassador.
A
The first few things that really caught my eye when I started looking at eBPF last year: it's a sandboxed environment, and it's extensible. Thomas talked about how eBPF allows us to extend the kernel the way JavaScript allows us to extend web pages. It has different hook points, which enable us to do the extension, and last but not least, it's supposed to be very safe. Like I mentioned to you, we have a strong background in Istio and Envoy, so when I look at eBPF, I connect it to WebAssembly.
A
The next thing I looked at, as a user, is the why: networking, observability, and security at layer 3 and layer 4. This is very interesting, because if you look at Istio and Envoy, they are solving the exact same categories, but at layer 7. How do you do traffic shifting? How do you do RBAC and service authorization?
A
How do you do observability at layer 7?

When I first started learning eBPF, I started with BCC, the BPF Compiler Collection. How many of you have actually written an eBPF program before? Yeah, a few of you, many hands, very good. This is a great toolkit that helped me get started with my first program, but I found a couple of issues with it. First, the user-space program and the kernel program are both written in Python. For me, I'm a Python developer, so the first thing I actually did was on my Mac. I'm a Mac user.
A
I tried to run that program, python helloworld.py. Of course, that didn't work, because it actually requires the system to be able to run a BPF program: it needs to have the right kernel and the right kernel headers. So finding the right runtime was actually a lot harder than I thought. I think I spent a few hours on this and, after some Google searching, finally found a Vagrantfile that allowed me to run my first hello world program.
A
The next thing I looked at is actually what the presenter before us was talking about: libbpf and compile once, run everywhere. How many of you love Java and Go, which allow you to take the binary and run it everywhere?
A
Yeah, many of you. So what's nice about libbpf is that it's a program loader. It allows you to write the kernel and user-space programs in C, and then you can compile them in advance. As long as you have the right kernel level, you can run them and execute the program. The compiled binary has your eBPF program compiled in, so it's ready to go. With that, I'm going to pass the mic to Denis to show us a demo of this.
B
Yes, thanks Lin. I'll try to do this with one hand. So obviously the goal here is... oh, where did it go? That's fine, it's okay. So the goal is not to do a full demo of libbpf, but because we are going to show you how we can do observability in Kubernetes with eBPF, what we wanted to do is start with the foundation. If you want observability, you want to know which service talks to which service, so you need to get the source IP and target IP of the network communications happening. There is this tcpconnect example, which is quite straightforward, that you can use to get this data, and the first way you can do it is by using tooling like libbpf and CO-RE.

So I'm going to start with this, then Lin is going to explain what we contributed to the community with the BumbleBee project, then I'm going to do the same program with BumbleBee, and then we are going to do step-by-step observability from there. So what we have here is these two files.
B
We have the kernel file here. This is the BPF program that we want to load in the kernel and, as you can see, there is a hash map where the key is this struct, so the source IP and the destination IP address, and the value is the number of requests that correspond to this pair. Then what we want is just to display this information.
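The map described here can be illustrated outside the kernel. The following is a minimal Python sketch, not the talk's C program: it only models the data structure, a table keyed by the (source IP, destination IP) pair whose value is the number of connections seen. The names ConnKey, record_connection, and render, and the sample IP addresses, are hypothetical.

```python
from collections import Counter
from ipaddress import IPv4Address
from typing import List, NamedTuple


class ConnKey(NamedTuple):
    """Models the map key described in the talk: source and destination IP."""
    saddr: IPv4Address
    daddr: IPv4Address


def record_connection(table: Counter, saddr: str, daddr: str) -> None:
    """Increment the per-pair counter, as a kernel program would on each TCP connect."""
    table[ConnKey(IPv4Address(saddr), IPv4Address(daddr))] += 1


def render(table: Counter) -> List[str]:
    """Format one row per pair: source, destination, number of requests."""
    return [f"{key.saddr} -> {key.daddr}: {count}" for key, count in sorted(table.items())]


# Hypothetical sample events standing in for observed TCP connects.
connections: Counter = Counter()
for src, dst in [
    ("10.0.0.5", "10.96.0.10"),
    ("10.0.0.5", "10.96.0.10"),
    ("10.0.0.7", "10.96.0.20"),
]:
    record_connection(connections, src, dst)

rows = render(connections)
for row in rows:
    print(row)
```

In the real program the increment happens in the kernel and the rendering in user space; the sketch keeps both in one process only to show the shape of the data.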
B
We want to display the source IP, the destination IP, and the number of requests, and just to do that you need to write this complex program in C as well. It's a lot of lines of code, if you think about it, just to display what I get from the kernel program. I'm going to do that so that we have one working example, and after that we are going to do the same in another way. So let's start by copying these files. Let me go to this directory here.
B
I'll do a lot of copy and paste, so that should be fine with one hand. This will be my first one-handed live demo, so let's try. I just take these files into my current directory. Obviously, in this environment I have all the prerequisites, so this seems very easy to do. The most difficult part would probably be to have all these prerequisites, like the kernel headers and all that stuff, but I have them here.
A
Can you guys hear me? Is the microphone better? Hopefully. Thank you. So this really got us thinking: what if we could write only the kernel program and not need to worry about the user-space program? How many of you like Docker and the Docker experience, where with the Docker CLI you can use push, build, run?
A
So this is where BumbleBee comes in. By the way, I know the screen is a little bit tiny, so if you scan that QR code you will be able to visit bumblebee.io. Essentially, BumbleBee provides a Docker-like experience for eBPF: it enables you to easily build your eBPF program, publish it to your OCI registry, and have somebody else, or it could be you, run the program by pulling the image from the OCI registry.
A
So essentially, what BumbleBee does is allow you to focus on writing the eBPF program. BumbleBee provides the user-space program for you and automatically generates metrics in Prometheus format for you.
A
So it's a really nice way for you to get started with eBPF and start learning eBPF. Here at Solo, we found out that learning eBPF is hard, and we spent a lot of time debugging eBPF programs, so we decided to open source BumbleBee. In fact, a couple of the BumbleBee maintainers are sitting right at the table in front of me.
B
Thanks Lin. So yeah, I think what Lin was highlighting is the fact that you obviously have some cases where writing a user-space program makes sense, because you want bidirectional communication with the kernel program. But when it comes to observability, that's often not the case. We've seen a use case before where you can filter out some of the data in the kernel, but whatever your program collects, you just want to display it or process it in a simple way.
B
So that's why we have this idea of having the user-space program built for you. Let me go back here and go to the next steps. We use Instruqt for that, and we have a workshop where you can go through all of this one step at a time. We deliver that workshop very often, so if you're interested, just take a look at our website.
B
So now, the first thing I'm going to do is just get the BumbleBee CLI. Oops. So I just copy and paste this, and the first benefit you will see is that I don't need to worry about prerequisites. I don't care about...
B
Yeah, can you hear me? Yes? Okay, cool, thanks. So you don't need to worry anymore about your prerequisites. What you need is just Docker, which is quite easy. Obviously you have some kernel requirements when you load the program, but to compile it you really just need Docker. You get this CLI, and then you can run bee init to have a skeleton created for you. I say I want a C program for a network use case, and I get a skeleton here.
B
It looks the same, and to show you that it's really similar, I'm just going to do a diff here of the two files: the file that we used with libbpf before and the file we use right now. So let me try to paste this. No, no, that's fine, thanks. I'm just trying to understand why it doesn't do the copy.
B
That's always funny. I get some strange demo effects sometimes, but never copy and paste not working. No worries, we'll fix it. Perhaps it's the Wi-Fi that is not very happy again. The good news is that I have a plan B already in place, so give me one second: I'm switching to my phone, and that should be better.
B
Oh yeah, it was really just slow. Okay, cool. So what's the difference? You see it's really minor. What I want to highlight is that you can take whatever source code you have for libbpf: you take just the kernel program and forget about the user-space program. You just have this naming convention: if I want the user-space program to treat this hash map as a counter, I just add .counter. Everyone should be able to do that, so it's really straightforward.
B
Bear with me, we'll do it. So you see, I just say I want to build this program, and you can see that the format of the second argument is really like a Docker image, an OCI image: localhost:5000 means that I run my registry locally, solo is my repository, tcpconnect is my image name, and v1 is the tag. It's localhost:5000 because I have my registry running locally on my machine.
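To make the four parts of that reference concrete, here is a small Python sketch that splits a string of the form registry/repository/name:tag. It is a simplified illustration, not BumbleBee's or Docker's actual parser: it assumes exactly one repository segment and an explicit tag, and the names ImageRef and parse_image_ref are hypothetical.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str    # e.g. "localhost:5000", a registry running locally
    repository: str  # e.g. "solo"
    name: str        # e.g. "tcpconnect"
    tag: str         # e.g. "v1"


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'registry/repository/name:tag' into its four parts.

    Simplified: requires an explicit tag and exactly three path segments.
    """
    # Split on the last colon so the registry port is not mistaken for the tag.
    path, sep, tag = ref.rpartition(":")
    if not sep or "/" in tag:
        raise ValueError(f"missing tag in {ref!r}")
    registry, repository, name = path.split("/")
    return ImageRef(registry, repository, name, tag)


ref = parse_image_ref("localhost:5000/solo/tcpconnect:v1")
print(ref)
```

Splitting on the last colon matters here because the registry address itself contains one.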
B
So hopefully my network will let me move to the next step; that will be challenging with this network. Let's try again. Okay, I export this again. Now it is built anyway, so that's fine. The next thing I want to do once I've built it is to push it. It's like the Docker experience: with Docker you do a build with a Dockerfile; here you do a build with a BPF program, and then you do a bee push, which will push this to my registry.
B
That registry is running on my local machine here. Finally, I can do a bee run, which will simply run this program from my image, and you can see something very similar to what we've seen before, with a slightly nicer UI. But I will not tell you that the value is this UI. The value is that I didn't write any code to get this UI, and the thing that I really like about it is not the UI per se; it's the fact that it automatically exposes all this data as Prometheus metrics. So now I can go to this endpoint and see exactly the same data that I can see in my UI here, available as metrics. You remember we said at the beginning that the goal is to show you Kubernetes observability, so we have achieved the first piece.
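As an illustration of what a consumer of such an endpoint has to do, the sketch below parses a few lines of the Prometheus text exposition format into (name, labels, value) tuples. The metric name tcp_connections_total, its labels, and the sample values are hypothetical, not the names BumbleBee actually emits, and the parser handles only the simple labelled-sample case.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical scrape output; real metric and label names may differ.
SAMPLE = '''\
# HELP tcp_connections_total Hypothetical counter, one series per source/destination pair.
# TYPE tcp_connections_total counter
tcp_connections_total{saddr="10.0.0.5",daddr="10.96.0.10"} 2
tcp_connections_total{saddr="10.0.0.7",daddr="10.96.0.20"} 1
'''

# name{label="value",...} number
LINE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Return one (name, labels, value) tuple per labelled sample line."""
    samples: List[Sample] = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match:
            labels = dict(LABEL.findall(match["labels"]))
            samples.append((match["name"], labels, float(match["value"])))
    return samples


samples = parse_metrics(SAMPLE)
for name, labels, value in samples:
    print(name, labels["saddr"], labels["daddr"], value)
```

In a cluster you would normally let Prometheus scrape and store these samples rather than parse them by hand; the sketch only shows that the same source, destination, and count triple survives the trip from the map to the metrics endpoint.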
B
So what we are going to do next is pretty simple: we are going to run a DaemonSet that deploys this program on all of my Kubernetes nodes, so that I don't gather information just from my local machine but from all the different Kubernetes nodes. Then we are going to use Prometheus to capture these metrics, and we are going to display a nice graph at the end. To do that, we just deploy a demo application.
B
A lot of people are probably familiar with this Bookinfo application. If you are not, it's very simple: it's an application composed of multiple microservices that talk to each other. If you want to show a graph of services communicating with each other, you need to have these services, so that's why we use this example. If I look at this kubectl get pods, I use -o wide just to show you that we have these containers deployed on multiple nodes.
B
You see master, worker one, and worker two, just to show that we are going to be able to display a graph with these communications happening on different nodes. The machine where I am right now, you see, is called root@master. So I am right now in an SSH session on the master, where I have this image that I pushed to the registry. Remember I told you about the registry: if I do a docker ps, you can see here that I have my registry running. So my BPF program is already there.
B
So I'm going to do a very simple deployment of Prometheus, really the basic community version of Prometheus, and after that I'm going to deploy this BumbleBee image as a DaemonSet.
B
But what may be surprising at the beginning is that I'm not going to run the image I built, because it's not a Docker image. What I built is an OCI-compliant image that contains my eBPF program. So I'm going to run a DaemonSet with a Docker image that has this bee CLI, and the argument of this command, which is called bee, is where it can find my image. So it's not that you run the BPF program as a Docker image.
B
It's really the bee program itself that runs from a Docker image. So you can build your eBPF program and then distribute it through a registry. If you have an OCI registry, which is the case for most people today, you can use it to distribute your program across many different machines.
B
Let's try it again. Yeah, so I run it as a DaemonSet and I create the PodMonitor. For people who are familiar with Prometheus, it's a way to tell Prometheus: go and scrape the metrics from this pod. So I have this DaemonSet, which means I have one pod per node, and it exposes these metrics natively, as I have shown you before. So I'm just going to tell Prometheus to go and capture these metrics here. That's kind of the modern way to do it.
B
A lot of you are perhaps more familiar with the annotations that you put on the pods. That's the old way; the new way is that you create a PodMonitor and declaratively say what you want to scrape. Finally, I'm going to generate traffic. I'm just going to call the front end of the Bookinfo application, which is called productpage, and this productpage will call the other services, so this just generates some traffic.
B
For the last step, I built a very small program, a small UI, that does two things. It connects to Prometheus to get all these metrics, so it will get things like source IP, destination IP, and number of requests. But the problem is that nobody cares about IPs: if I build you a graph that says this IP talks to that IP, you don't really care.
B
What you want is a pod talking to a service, because if you look at the source and target IPs, the source IP is a pod IP in the Kubernetes world, and the target IP is generally a service IP. So what this program does is also connect to the Kubernetes API server to correlate all this data. That's why I create this ClusterRoleBinding, to say that my program is allowed to talk to the Kubernetes API server.
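The correlation step can be sketched as a pair of lookups. The Python below is an illustration under stated assumptions, not the demo's actual code: the two dictionaries stand in for what a program would build from the Kubernetes API (a pod's address is reported in status.podIP and a service's in spec.clusterIP), and the function name build_edges, the pod and service names, and the IP addresses are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def build_edges(
    samples: Iterable[Tuple[str, str, int]],
    pods_by_ip: Dict[str, str],
    services_by_ip: Dict[str, str],
) -> Dict[Edge, int]:
    """Turn (source IP, destination IP, count) rows into named graph edges.

    Sources are resolved as pods; destinations as services first, then pods.
    Addresses that cannot be resolved are kept as raw IPs.
    """
    edges: Dict[Edge, int] = {}
    for saddr, daddr, count in samples:
        src = pods_by_ip.get(saddr, saddr)
        dst = services_by_ip.get(daddr) or pods_by_ip.get(daddr, daddr)
        edges[(src, dst)] = edges.get((src, dst), 0) + count
    return edges


# Hypothetical lookup tables, standing in for data listed from the API server.
pods_by_ip = {"10.0.0.5": "productpage-v1", "10.0.0.7": "reviews-v1"}
services_by_ip = {"10.96.0.10": "reviews", "10.96.0.20": "ratings"}

# Hypothetical rows, as they might be read back from Prometheus.
samples = [
    ("10.0.0.5", "10.96.0.10", 2),
    ("10.0.0.7", "10.96.0.20", 1),
    ("10.0.0.5", "192.0.2.1", 4),  # destination outside the cluster
]

edges = build_edges(samples, pods_by_ip, services_by_ip)
for (src, dst), count in edges.items():
    print(f"{src} -> {dst}: {count}")
```

Keeping unresolved addresses as raw IPs means traffic leaving the cluster still appears in the graph instead of being silently dropped.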
B
So it's still creating it, but you can already see the BumbleBee program. You see three pods because I have the master, worker one, and worker two. Okay, so again, just waiting for it to be ready.
B
And that works. So I can see here basically that I have... let me change this again. In fact, what's funny is that if I refresh I will have even more data, because I will have the program itself trying to get the data from Prometheus, and so on. So you can see here, for example, for people who are familiar with it, the productpage.
B
So that's it! I don't think we have... do we have a closing slide? I'm not sure, let me check, but I think that was mostly it. That's what we've done, across nodes; that just gives you an overview of it. As I said, we have this workshop where you start from zero and do it all by yourself, so don't hesitate to register. We run it every month across different time zones. So thanks, everyone.
D
Hello, are you familiar with the eBPF exporter from Cloudflare?
D
But I think the underlying concept seems very similar to me, and I'm really excited to see this. One question that I had on this was: when you're deploying it as a DaemonSet...
B
Can you hear me? Yeah. So I think that what we did here, the DaemonSet and the Kubernetes part, was just an excuse to show you a real use case instead of a program that would just display some boring information. The goal of the project is not to simplify the way you deploy in production or manage the life cycle of this program.
B
What tooling to use to run these different things in parallel, or whatever, is not something that is in the scope of the project right now. It's like Docker: you would push your image, and then Kubernetes came to solve all these other problems. So it's kind of the same. Perhaps there will be a Kubernetes of eBPF, because there will be so many BPF programs running on the same machine that you want to tackle all these things, but that's definitely not the scope of the project.
E
Thanks. I really love what you do with the OCI-compliant image for distribution.
E
But my question is whether there will be some kind of mechanism to check them, maybe to sign them, so that it doesn't become something where I can download whatever, malicious eBPF programs for example. Will there be something provided? Because now you are a runtime, and I can already imagine my developers going crazy, downloading stuff like in the wild Docker times, things from strange sources, and running it on a production cluster.
B
Yeah, that's a good question. Basically, what we expect is that people will use a public registry when they play with it and want to try out stuff on their laptop. But we don't expect that people will allow users to run packages coming from a public registry, and it's quite easy to block that. So we expect customers to use their own local Artifactory or local registry that is OCI compliant, and to manage that kind of policy there.
C
Yeah, I think that's what the kernel BPF signing work is about. Any more questions? Yes.
B
Yeah, I think we discussed Rust as something that was asked for, but it's definitely an open source project. We want people to participate, so feel free to open issues there if you have preferences and so on. That's definitely the idea. I think... yeah, you wanted to add something?
G
So maybe I would add to that: I'm the founder of Solo.io. The idea of this project was to help the community. We are using it internally, and we thought it would help the community a lot, so we open sourced it. The purpose is to bring it to the CNCF. We are in the process right now of submitting it as a project in the CNCF. We are not planning to keep it for ourselves. So, as I said, please come help us make it successful. Okay.