From YouTube: SPIFFE at GitHub - Eric Lee
Description
We've been rolling SPIFFE out internally at GitHub to empower teams to manage interoperable Production Identity documents. In this talk we'll give a brief overview of how we've deployed SPIRE and leveraged its plugin system to integrate with our internal systems and tooling.

GitHub's mission is to be the home for all developers. We are the world's largest source code repository hosting company, and hosting Git is also something we do.

The goal of this talk is to provide something of a practitioner story: a team trying to make this available internally at a company that has around eleven years of infrastructure.

GitHub is not fully running on a public cloud, but it runs on multiple public clouds, some of which may start with the letter A, and I really want to talk about two implementation details of how we operate SPIRE today.

The first is how we operate our agents, and the second is how we generate custom node selectors to support registration entries for vending SVIDs to workloads. I'll try to wrap it up with some takeaways, learnings, and outcomes that we've achieved on the team. And as a full disclaimer: this is how we do it, not how to do it. I'd like to thank Ben Bury from my team, who reminded me to give this disclaimer, because we don't want to present our work as what you should do.

In the past two years we've been ramping up the product offerings and generating more traffic worldwide. We've taken measurements internally of TCP flows, and they show roughly linear growth in internal traffic. So there are more things talking to each other inside the DMZ than before, and there are more data centers than before.

On top of that there's been a lot of hiring, and net-new services, products, and packages. I'm trying to remember what I can and can't talk about, but go look at the changelog; it's very well written. There are a lot of things coming out, and there's a lot of software behind what's coming out.

In addition, the past two years have brought acquisitions: GitHub was acquired, and npm and Semmle joined us. These acquisitions bring their own infrastructure, their own opinions, their own systems, and so we took great pains to be sympathetic to how our colleagues are coming to the organization and how they want to work with us.

So how do we plan for all this variation in what runs at GitHub and where it runs? We were initially interested in SPIFFE because it's extensible and open. For example, I think Evan and Andrew's presentation talked about the upstream CA plugins, of which SPIRE is itself one. We run Vault internally, and we don't necessarily want to build a parallel PKI infrastructure just to support SPIRE.

And point three, which I think didn't actually make the slide, is that we have workloads sitting behind L4 load balancers. So some groups use just JWT-SVIDs, some groups use X.509-SVIDs, and we use both. We can support your use case whether or not you're mediated by a load balancer.

Talking a little bit about the approach: good tools have gradations of power, and we're trying to make our platform offering as modular as possible. Visually you could think of it as a pyramid where, as you go up, the area shrinks, and that narrowing is curation.

A
So
at
the
very
bottom
we
have
these
interfaces
of
x509,
svid
and
jadasvid,
where,
if
teams
were
to
actually
conform
to
these
themselves,
they
could
potentially
just
be
in
spec,
because
this
is
an
open
standard.
This
is
more
of
a
utility
than
a
strategy
component
of
how
github
uses
technology
and
in
the
center
we
want
to
be
the
team
that
operates.
A
centralized,
spire
infrastructure
stands
up.
That team stands up the servers, manages the data store, manages the infrastructure automation for agents, and provides a workload API out of the box for teams in whatever execution environment they're running in, for teams that don't necessarily want to deal with raw infrastructure. At the very top, where we hope to land almost everybody, is development tools: shared libraries and packages. We've also, like I think everybody in industry has at one point or another, developed a sidecar; it's kind of like making your own web framework in 2020, everyone just does it.

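To make that "workload API out of the box" tier concrete, here is a minimal sketch of what a consuming workload can do. It assumes the open-source go-spiffe v2 library and an illustrative agent socket path; neither is taken from the talk, and this is not GitHub's internal library.

```go
// Minimal sketch: a workload fetching its X.509-SVID from the
// Workload API socket that the platform provides.
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/spiffe/go-spiffe/v2/workloadapi"
)

func main() {
	ctx := context.Background()

	// The socket path is an assumption; it is whatever path the
	// platform exposes in the workload's environment.
	source, err := workloadapi.NewX509Source(ctx,
		workloadapi.WithClientOptions(
			workloadapi.WithAddr("unix:///run/spire/sockets/agent.sock")))
	if err != nil {
		log.Fatalf("creating X509Source: %v", err)
	}
	defer source.Close()

	// The source keeps the SVID rotated in the background; grab the
	// current one and print its SPIFFE ID.
	svid, err := source.GetX509SVID()
	if err != nil {
		log.Fatalf("fetching X.509-SVID: %v", err)
	}
	fmt.Println("workload identity:", svid.ID)
}
```
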
We've developed an external-authorization-speaking sidecar, external authorization as in Envoy, so we can actually use Envoy to inject and validate JWT tokens coming in and out of your service.

This use case is particularly applicable to dynamic languages, where we may not want to go too deep into the app. We don't want to do too much surgery on the workload, or be intimately involved in the internals of something whose authentication we're really just trying to mediate using SPIFFE.

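The talk doesn't show the sidecar itself, but conceptually an ext_authz-style check boils down to validating a presented JWT-SVID against the trust bundle. The following is a hedged sketch of that idea only, again assuming go-spiffe v2; the socket path, audience, and listen address are made up and this is not GitHub's sidecar.

```go
// Sketch of an ext_authz-style check endpoint that validates a
// bearer token as a JWT-SVID using the local Workload API.
package main

import (
	"context"
	"log"
	"net/http"
	"strings"

	"github.com/spiffe/go-spiffe/v2/svid/jwtsvid"
	"github.com/spiffe/go-spiffe/v2/workloadapi"
)

func main() {
	ctx := context.Background()

	// JWTSource keeps JWT trust bundles from the Workload API up to
	// date; the socket path is an assumption.
	source, err := workloadapi.NewJWTSource(ctx,
		workloadapi.WithClientOptions(
			workloadapi.WithAddr("unix:///run/spire/sockets/agent.sock")))
	if err != nil {
		log.Fatalf("creating JWTSource: %v", err)
	}
	defer source.Close()

	http.HandleFunc("/check", func(w http.ResponseWriter, r *http.Request) {
		token := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
		// Validate signature, expiry, and audience against the bundle.
		svid, err := jwtsvid.ParseAndValidate(token, source, []string{"example-audience"})
		if err != nil {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		// Pass the verified caller identity upstream.
		w.Header().Set("X-Client-Spiffe-Id", svid.ID.String())
		w.WriteHeader(http.StatusOK)
	})

	log.Fatal(http.ListenAndServe("127.0.0.1:8080", nil))
}
```
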
So, take one: we initially started experimenting with SPIRE running in Kubernetes.

We run Kubernetes internally, and we wanted to leverage that team's good work and all of their gains in reliability and operability, rather than managing VMs and metal ourselves. The reference architectures we've seen in the community are Kubernetes Services to run the SPIRE servers, and agents running as DaemonSets, one per node.

We observed some issues after kicking this around for the first month or so, in particular with agents. DaemonSets can't be made highly available.

They're unbounded in downtime between deploys, and you're relying on the kube scheduler to place a pod to replace the DaemonSet's pod, so that was a challenge.

Workloads also can't rely on SPIRE being available at startup, because of this non-determinism related to the scheduler. So all workloads, or whatever curation we provide to users, would have to implement some sort of retry or blocking mechanism to poll or wait for the workload API. Not the end of the world, but another small piece of complexity, rather than relying on the invariant of a workload API being there, ready and waiting for your workload on startup. And something that's kind of a subtlety is the dual of that race.

So we avoid the problem of pod scheduling order by avoiding the Kubernetes scheduler entirely: we make SPIRE part of the second-party software we lay down on a kube node before the kubelet starts to take work from the API server.

This mitigates some of the race conditions. It's probably still good, resilient practice to poll or wait for the workload API, but the problem is largely mitigated by just making sure the workload API is resident before the pod is started. And the dual maintenance goes away, because everything is just one set of infrastructure automation. That is actually the systemd logo; I went into Google image search. I think it's a green light being pointed to, or maybe it's the letter A, or maybe it's the missing VCR button, I don't know.

One last thing: systemd allows an ordering of units, so we can say that, for workload attestation, we would like to start after the kubelet starts.

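As a minimal sketch of that unit ordering: the unit name, binary path, and config path below are assumptions for illustration, not GitHub's actual automation.

```ini
# /etc/systemd/system/spire-agent.service (illustrative)
[Unit]
Description=SPIRE Agent
# Order the agent after the kubelet so workload attestation does not
# log spurious "cannot reach the kubelet" errors at boot.
After=network-online.target kubelet.service
Wants=network-online.target

[Service]
ExecStart=/opt/spire/bin/spire-agent run -config /etc/spire/agent/agent.conf
Restart=on-failure

[Install]
WantedBy=multi-user.target
```
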
That way there's no false signal of errors about being unable to contact the kubelet, things like that. Be kind, rewind, yeah.

So if we're running this as a systemd unit, how do we expose it to pods? A redacted, modified version of the wall of YAML for a deployment is on the left: we take the underlying domain socket, put it into a volume, and simply mount that into the container within the pod.

Kubernetes is kind of a matryoshka doll, but you know, pods live in templates inside of deployments; that's what this illustrates. Essentially the punch line is that the view from within the pod is identical to a workload running on metal or a VM.

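The slide itself isn't reproduced in the transcript, but the few lines it describes amount to a hostPath volume plus a volume mount. A sketch, where the names and the socket directory are assumptions rather than GitHub's actual values:

```yaml
# Illustrative only: names and paths are assumed, not GitHub's.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-service
  template:
    metadata:
      labels:
        app: example-service
    spec:
      containers:
        - name: app
          image: example-service:latest
          volumeMounts:
            # The few lines teams add: mount the agent's domain socket
            # directory from the node into the container.
            - name: spire-agent-socket
              mountPath: /run/spire/sockets
              readOnly: true
      volumes:
        - name: spire-agent-socket
          hostPath:
            path: /run/spire/sockets
            type: Directory
```
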
We also don't use mutating webhooks or any sort of pre-deploy machinery to place these. Our experience has been that just instructing teams to add these few lines for the volume, and guaranteeing that whatever cluster they're running on provides this domain socket, gives us a lot of mileage and avoids a lot of magic. Folks have told us they appreciate the transparency of how things work and what's actually going on.

The second thing I wanted to talk a little bit about today is generating custom node selectors. As I said earlier in the talk, GitHub runs in multiple clouds. We use multiple container orchestrators, and containers outside of orchestrators, which is also a lot of fun.

We run bare Docker for some workloads. The consequence of this is that we can build a service once and run it N ways in M places. If we think about how to vend identity to all of these workloads, there's one dimension that's the same, which is the selectors we can gather about the workload using workload attestation, whatever workload attestation mechanism we're using.

We had to do a little bit of extra work to propagate notions similar to what you get out of the out-of-the-box node attestors into our selector library.

One thing I can share about how we run our sites is that machines have their own per-machine certs. So we can leverage the x509pop node attestor and use some of that cert and key material to pull some notion of, and verify, the identity of something trying to phone home to the SPIRE server.

Using agent_path_template in the x509pop node attestor, with a little bit of Go templating, we pull out the common name from the per-machine cert, which does contain the fully qualified domain name of the machine we're trying to bootstrap an agent on. So we go from a spire-agent SHA-1 hash to an actual fully qualified domain name when you do a spire-server agent list. Let's keep going.

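On the server side that looks roughly like the following; this is a sketch based on the upstream x509pop plugin's documented agent_path_template support, and the CA bundle path and template value are assumptions, not GitHub's actual config.

```hcl
# Illustrative server-side NodeAttestor block; values are assumed.
NodeAttestor "x509pop" {
    plugin_data {
        # CA bundle used to verify the per-machine certificates.
        ca_bundle_path = "/etc/ssl/certs/internal-machine-ca.pem"

        # Template the agent SPIFFE ID from the cert's common name
        # (the machine FQDN) instead of the default fingerprint.
        agent_path_template = "{{ .PluginName }}/{{ .Subject.CommonName }}"
    }
}
```
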
The consequence of having only this one verifiable piece of information, because the server doesn't necessarily trust the claims of the agent during node attestation, is that we have to key off of this one datum, and from the server side consult some other trusted API to gather more information about the agent and where it runs. Is it a Kubernetes node? Is it a file server? Is it, you know, a bastion machine?

Things like that are not something we can take at face value from the node attestor, so we actually had to write a custom node resolver that pairs with the x509pop node attestor. We bundle it as an OS package, because the interface is just protobuf and gRPC, and we provision the server with knowledge of an allow list of what metadata to pull back, because every node has a set of metadata and, for the purposes of registration entries, we only care about a subset of it.

The real result of this is that we get extra selectors, specific to GitHub and how we run infrastructure, for use in registration entries. As a reminder, registration entries can be written with workload selectors or node selectors; that's the "what application" piece and the "where it runs" piece of how you vend identity.

This is a snippet of configuration to illustrate what my mouth noises actually mean, in terms of what it would look like on a SPIRE server. What I've tried to highlight is that this is also x509pop.

So we're not arbitrarily executing random things on the system path, and the way we distribute this plugin command is just as a base OS package, using our internal packaging machinery. Below that, the plugin data is interpreted in our own code for the node resolver: it's the allow list of node attributes to pull in and turn into selectors, because if we pulled everything in, there's no actual utility in knowing, say, what type of rack switch a machine is connected to; things like that are just superfluous.

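The actual snippet isn't in the transcript, so here is a hedged reconstruction of the shape such an external-plugin stanza could take; the resolver name, binary path, checksum placeholder, and attribute names are hypothetical, not GitHub's configuration.

```hcl
# Illustrative only: "gh_registry" and its attributes are hypothetical.
NodeResolver "gh_registry" {
    # The resolver is an external plugin distributed as a base OS
    # package; the server talks to it over gRPC/protobuf.
    plugin_cmd      = "/usr/libexec/spire/noderesolver-gh-registry"
    plugin_checksum = "<sha256 of the plugin binary>"

    plugin_data {
        # Allow list: only these attributes from the internal registry
        # API are turned into node selectors.
        allowed_attributes = ["site", "role", "environment"]
    }
}
```
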
An example of this: the one verifiable claim, the common name out of the machine cert, is used as the key into our internal registry API to pull out these other selectors, which I've highlighted in white with a kind of canary-yellow box.

It looked different when I was making it, but the point is the same: it's all highlighted there, and you can see these are prefixed similarly to how we structure the node resolver. So it's a gh prefix rather than x509pop:subject or x509pop:ca, and these are now available to us for writing registration entries, in addition to workload selectors.

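For illustration, registration entries keyed on such selectors could be written roughly like this with the stock spire-server CLI; the trust domain, SPIFFE IDs, and selector values here are made up, and the "gh_registry" prefix matches the hypothetical resolver name above.

```sh
# Hypothetical values throughout.
# Alias a set of nodes by the custom selectors the resolver produced
# ("where it runs").
spire-server entry create \
    -node \
    -spiffeID spiffe://example.internal/node-group/file-servers \
    -selector gh_registry:role:file-server \
    -selector gh_registry:site:dc1

# Vend an SVID to a workload on those nodes, keyed on a workload
# attestation selector ("what runs"), under that node alias.
spire-server entry create \
    -parentID spiffe://example.internal/node-group/file-servers \
    -spiffeID spiffe://example.internal/service/uploads \
    -selector unix:uid:1000
```
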
Some other observations we've made, on universalizing authorization: I think this is something we talk a lot about in the community. SPIFFE is for authentication, and authentication only, having no opinion about authorization, and that has actually been a point of leverage for us, because systems and teams have either invented their own, may have an interest in standardizing, or may not have an interest in standardizing, and fine-grained ACLs are not something we necessarily want to try to provide parity with.

In our opinion, at the top of the pyramid from one of the earlier slides, we really just want to give people documents they can verify are good or not good. There are no goals to build policy languages, or enforce policy languages, to replace what we have already. And I think I'm bumping up against time, but one other observation we've made is that being forced to write registration entries to identify the shapes of workloads is a forcing function for discussions about blast radiuses and security perimeters.

If you can't differentiate between two workloads cohabiting a machine with a registration entry, that means they have the same level of privilege, inappropriately so. Rather than looking at it as "oh, we can't isolate this from this other thing because they're running as the same user or they're in the same group," we inverted that and saw these exercises in discriminating between what gets which SVID as opportunities to improve our security posture.

That's invited some interesting conversations about everything from how we build machine certs to how we run certain things in Kube. So there's that, and I'm going to go to the last slide, which is just the silhouette, and stop sharing here.

I think we partner with teams, and it's situational. There are workloads that literally run in dual modes, so it actually necessitates them getting both documents even though it's the same executable; we have code paths where they're both libraries and they have main functions. So it depends.

You know, team to team to team. My perspective is that it's sort of like when you enter a code base with fifteen million unit tests, and you change one character and half of them break. Avoiding that kind of fragility, with wide enough registration entries, is probably the preferred approach, because if something is so brittle that you need a control loop to endlessly reconcile it by Docker image ID, or you need to keep track of nodes appearing and dying, then that's probably the wrong shape.

But every organization is different; mandates are different, priorities are different. We largely just partner with the teams, try to have the discussion, and facilitate what SPIRE can do for them.

That's great guidance, thank you, and certainly food for thought for the attendees. Thank you very much. To echo the last comment in chat: Eric, you're an awesome presenter. Nice work.