Description
Infrastructure as Software with GitOps - Justin Garrison, Amazon
The cloud has enabled abstractions and automation, but Infrastructure as Code (IaC) doesn't scale. You can use declarative YAML or imperative scripts and still lose control. Infrastructure as Software (IaS) allows you to control and scale infrastructure with the same practices as applications. GitOps is an implementation of IaS with lots of benefits over IaC. We'll look at how it's different, when you should use it, and where it potentially breaks down.
A
I was sitting in front of floor-to-ceiling windows on the top story of the building where I worked. It wasn't a very tall building, but the view was enough. It was a view of a freeway that I was going to sit in traffic on in a few hours, and I felt like I was succeeding at a lot of things. I had deployed Kubernetes for my organization.
A
I had set up all of these things that let developers use templates and deploy applications quickly, and I was a good sysadmin. I was a good systems engineer by automating those things. If you're familiar with CoreOS, which is what I was using at the time, it had the ability to do automatic updates, and most people turned those off because they were worried that things would break.
A
It has to be software. It has to be something that is running that manages those APIs. And everything that I thought I did right about Kubernetes was that I had just built automation. I had built the same thing I had done over and over again, and I had just automated pieces of it. But she was right that we had to write software.
A
We couldn't rely on infrastructure as code anymore. Infrastructure as code was just the automation piece, and it didn't scale as well, because you had to trigger it at certain times and you had to make sure it worked. From that point I was like, okay, what is the thing that exists that does infrastructure as software? What is something that is running software that takes in data, calls APIs, and then applies that? And the very first thing that was obvious to us was the Kubernetes controller.
A
The Kubernetes controller just calls APIs over and over again. It looks at the state that it wants, it looks at the state that it has, and it makes it happen. That was the very core piece of infrastructure as software: this idea that we should actually be managing our infrastructure with these control loops. Once we started looking around, we found this pattern of infrastructure as software showing up in other places. If you're familiar with it, Netflix has Chaos Monkey. Chaos Monkey is like the opposite of infrastructure as software, but it does the same thing.
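The control loop described here can be sketched in a few lines. This is a minimal illustration rather than how any real Kubernetes controller is written: `FakeCloud`, its methods, and the server names are hypothetical stand-ins for a real infrastructure API.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for a real infrastructure API."""
    servers: set[str] = field(default_factory=set)

    def list_servers(self) -> set[str]:
        return set(self.servers)

    def create_server(self, name: str) -> None:
        self.servers.add(name)

    def delete_server(self, name: str) -> None:
        self.servers.discard(name)


def reconcile(desired: set[str], cloud: FakeCloud) -> list[str]:
    """Run one pass of the loop: observe, diff, act. Returns the actions taken."""
    actual = cloud.list_servers()
    actions = []
    # Create whatever is desired but missing.
    for name in sorted(desired - actual):
        cloud.create_server(name)
        actions.append(f"create {name}")
    # Remove whatever exists but is no longer desired.
    for name in sorted(actual - desired):
        cloud.delete_server(name)
        actions.append(f"delete {name}")
    return actions


if __name__ == "__main__":
    cloud = FakeCloud(servers={"web-1", "old-1"})
    desired = {"web-1", "web-2"}
    print(reconcile(desired, cloud))  # first pass corrects the drift
    print(reconcile(desired, cloud))  # second pass has nothing left to do
```

A real controller would call `reconcile` repeatedly, which is what lets it repair changes that nobody committed anywhere.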
A
It's software that runs, takes data in, and calls APIs, but it breaks things on purpose. These chaos monkeys would degrade the state of your infrastructure on purpose with software, and it wasn't a one-time git push. It wasn't a repository of YAML files. The chaos monkey had to constantly look at the state of the world and ask, "Is this right?"
A
Once I heard about GitOps, I was like, it's infrastructure as software. It's exactly the thing that we learned from Kubernetes controllers, that we saw over and over again at large scale, at very high impact, in high-velocity environments: you had to manage it as software. GitOps is a core definition of that. It's an implementation of infrastructure as software. It just does this control loop.
A
They have 52 cores in them and 256 gigs of RAM. They're Snowball Edge devices, and I'm using them today to represent standing up our infrastructure. If you're using AWS, this is essentially like a 16xlarge, right? That's how big one of these boxes is. We'll ship them to you and you can run edge compute with them.
A
You can put storage on them; I think they have 42 terabytes of storage. They're pretty nice to have at the edge. But this was what I was thinking of when I had on-prem servers: I would stand these up. And what do you call two Snowballs on top of each other? It's a snowman. But when I was running infrastructure, I would deploy it and I would run these scripts
A
that would stand up my infrastructure and get it all set, so that every time I wanted to run it, I would trigger something, Jenkins or whatever, and it would actually deploy this. It was great because I had automated that piece of it, but I couldn't touch it. I couldn't do other things to it. I realized time and time again that infrastructure as code wasn't enough, because someone would come along and do something to the infrastructure, something would break, and nothing brought it back.
A
Nothing stood those back up. Nothing brought it back to the desired state until we ran that infrastructure as code again. That repo of code that we had that stood these up, that's what would actually bring it back to a state where we could say, "Okay, now we're good." And the main thing with those controllers, with infrastructure as software, is that you always have something watching this state, so you don't have to worry about it, like, "Oh, it's gonna go... No!"
A
No one tell my coworkers. They're rugged, it's fine. But I wanted to come and talk about why GitOps is really powerful, and the core principles of GitOps illustrate this really well. And yes, you're going to have more than one controller managing your infrastructure. You're going to have Kubernetes controllers, and you'll probably have cloud controllers that do different things. There are all these controllers that are implementations of software. And the first thing that I wanted to touch on again
A
is this: what is the code piece? We always say infrastructure as code, everywhere, infrastructure as code. Okay, that's fine, but we all know that running this isn't infrastructure as code, right? That's not it, because you can't run it at your command line. You don't want people manually doing this. So what you really want...
A
When did it become software? Where was the piece in there where it was like, "Oh, now it's software," or, "Oh, it's done, it's no longer software"? Software is code while it runs. It's only in that running state. It's only the code with electricity applied to it, when it goes through the processor.
A
Right, now it's kind of software. Terraform is a controller. Terraform looks at a desired state, it looks at the current state, and it reconciles them, right? It's just a command-line version of it. It runs locally. It's exactly what a controller inside of Kubernetes is doing. It's exactly what Flux and GitOps are doing. It's doing the same sort of reconciliation, and that's fine, but again, it's not what GitOps is proposing, because in this case you could...
A
Right, is that infrastructure that's continuously reconciled at some level? Terraform is going to always just do this. But that is really bad. That's going to hit limits. That's going to hit all sorts of things. That doesn't scale. We need...
A
GitOps is a very specific way of doing something like this so that we can help it scale. I really like looking at the principles, because that informs us of what other things are doing similar things. They all kind of go along the same route. They all have similar principles in how they're going to scale and what they're going to do. So we looked at software as being only code that has electricity applied to it, and here are these principles that, again, you've been hearing all day.
A
For a long time I falsely believed that declarative meant nothing was imperative, and that is absolutely not true. When I run terraform apply, I declare my state in my tfvars, my, you know, Terraform manifests, but Terraform does everything imperatively, right? It builds this DAG that then says, "Okay, I need to do this first, then I do this." That is exactly imperative.
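The step from a declared state to an ordered list of actions can be illustrated with a topological sort. This is a sketch of the general idea only, not Terraform's actual planner: the resource names and the `plan` helper are hypothetical, and the standard-library `graphlib` module stands in for the real graph walk.

```python
from graphlib import TopologicalSorter

# Declarative input: each hypothetical resource lists what it depends on.
# Nothing here says in which order things should be created.
resources = {
    "vpc": [],
    "subnet": ["vpc"],
    "security_group": ["vpc"],
    "instance": ["subnet", "security_group"],
}


def plan(declared: dict[str, list[str]]) -> list[str]:
    """Turn the declared dependency graph into an imperative step order."""
    sorter = TopologicalSorter(declared)
    # static_order() yields every node after all of its dependencies.
    return list(sorter.static_order())


if __name__ == "__main__":
    for step, name in enumerate(plan(resources), start=1):
        print(f"step {step}: create {name}")
```

The input stays declarative, but what comes out is a sequence of steps, which is the imperative part the speaker is pointing at.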
A
B
A
All right, so there's my declarative state. Did I put that in here? Yeah. So I have my tfvars in there. That's my declarative state. But what is immutable? I think that's... Okay.
A
That's it, right? I'm immutable now. I can't write to it anymore. That's all we did, that's all we had to do, and now we're immutable. Congratulations, you're at step two of GitOps. Well, kind of. This is the fundamental piece of what we want you to do. We want you to be immutable, we want you to be versioned, and the main thing we want to do is actually...
A
Pulling in GitOps does two very important things, and I really like that they put this as a core piece of GitOps. I know there are different pieces here, but there are two big reasons you want to pull your desired state, the data that you have. One is for scalability, because if you have one or two clusters or one or two servers, you're probably fine.
A
If you have a thousand, it's going to be a little bit harder to push that out everywhere, whereas with a pull it's a lot easier. We, technology in general, have known how to serve files for a very long time. We can serve websites, we can serve static stuff, we can cache them. But the compute side of it is intensive, and all of the API calls that result in that action happening take a long time.
A
The downside, of course, is that you have to run more controllers. You have to run more software in places that only have a limited scope of what they can apply to. But the benefit there is that if something breaks in that one limited controller, it doesn't have a large blast radius. It has a very limited one, for security, for downtime, for all these other things. So this pull aspect is really good for scalability.
A
We want to do something like this, right? We're going to pull that data down. Let's say it stores our variables file. Once we pull it down, then we're almost set to go, right? Because then I can run that terraform apply, and I can assume that, you know, Bash being Bash, if that fails my code would exit, and I don't actually have a problem of half of the variables files applying somewhere.
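The concern about a half-written variables file can be handled by pulling into a temporary file and swapping it into place only once the download has succeeded. The sketch below is one way to do that, not the speaker's demo script: `pull_desired_state` and the injected `fetch` callable are hypothetical names, and `fetch` stands in for whatever actually retrieves the file (for example a git pull or an HTTP download).

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def pull_desired_state(fetch: Callable[[], bytes], target: Path) -> bool:
    """Fetch the desired state and swap it into place atomically.

    Returns True if the file on disk changed, False if it was already current.
    If fetch() raises, the existing target file is left untouched.
    """
    data = fetch()
    if target.exists() and target.read_bytes() == data:
        return False
    # Write next to the target so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), prefix=target.name + ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, target)  # readers see the old file or the new one
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        target = Path(folder) / "terraform.tfvars"
        changed = pull_desired_state(lambda: b'version = "1"\n', target)
        print("apply needed" if changed else "already up to date")
```

A controller would run its apply step only when this returns True, so a failed or partial pull never reaches the apply.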
A
On this one, continuous reconciliation: it was kind of a trip for me, because I didn't know how continuous that meant. Does that just mean while true? It doesn't. It's not about being continuous in that sense. At one end, we can just say you run every loop: every time you finish, you try again. On the other side is our traditional infrastructure as code, which ran only when a file changed. I don't know if anyone here did config management for a long time.
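Between "while true" and "only when a file changes" there is a middle ground: reconcile when the desired state changes, and otherwise re-check on a slow timer. The sketch below shows one such policy under that assumption; the `Trigger` class and its `max_interval` parameter are hypothetical and not taken from any GitOps tool.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trigger:
    """Decide when to reconcile: on change, or after max_interval seconds."""
    max_interval: float
    last_digest: Optional[str] = None
    last_run: Optional[float] = None

    def should_run(self, desired_state: bytes, now: float) -> bool:
        digest = hashlib.sha256(desired_state).hexdigest()
        changed = digest != self.last_digest
        stale = self.last_run is None or now - self.last_run >= self.max_interval
        if changed or stale:
            # Record what we are about to reconcile and when.
            self.last_digest = digest
            self.last_run = now
            return True
        return False


if __name__ == "__main__":
    trigger = Trigger(max_interval=300)
    for now, state in [(0, b"v1"), (10, b"v1"), (20, b"v2"), (400, b"v2")]:
        print(now, trigger.should_run(state, now=float(now)))
```

Slow-moving layers such as DNS or networking could use a long `max_interval`, while frequently changing layers could use a short one.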
A
Once I changed my Puppet manifest, I applied that manifest to servers, and that was fine as long as I kept a sort of normal cadence of changes. But in infrastructure you have a low level of, say, DNS and network infrastructure. These things don't change very often, and if you're changing those with a while loop, you will likely break something, and that is scary. So you wanted something beyond the...
A
Fixing Terraform state files: if anyone has done that, that is a bad night on call. I am sorry if you've ever had to go through that. But that was a downside of infrastructure as code, right? I thought I had my Terraform as code, my Terraform manifests, and when I was on call and I got this call, I was like, "How could it be down? I have infrastructure as code."
A
I want to do that on infrastructure changes too, but we're going to simplify it, because this is a crappy demo of what we're doing here. So let's say, if you've never used iwatch... oh, and I've got to find it. There it is, because I couldn't remember this one off the top of my head.
A
iwatch will watch for file system changes and run a command. So we're telling iwatch: as soon as a file is written in this folder (close_write is the event it's going to look for), run my infrastructure-as-code script. And so over here I can actually...
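The demo uses iwatch for this. For readers without that tool, the same "react when a file in this folder is written" idea can be approximated by comparing directory snapshots. This is a polling sketch using only the Python standard library, not an inotify-based watcher; `snapshot` and `changed_files` are hypothetical helper names.

```python
import os
import tempfile
from pathlib import Path


def snapshot(folder: Path) -> dict[str, tuple[int, int]]:
    """Map each file name in folder to (mtime_ns, size)."""
    result = {}
    for entry in os.scandir(folder):
        if entry.is_file():
            stat = entry.stat()
            result[entry.name] = (stat.st_mtime_ns, stat.st_size)
    return result


def changed_files(before: dict, after: dict) -> list[str]:
    """Names that are new, or whose (mtime_ns, size) differs from before."""
    return sorted(name for name, meta in after.items() if before.get(name) != meta)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        before = snapshot(folder)
        (folder / "terraform.tfvars").write_text('version = "3"\n')
        for name in changed_files(before, snapshot(folder)):
            print(f"{name} changed; a real controller would run its apply script here")
```

Polling is simpler but slower to react than an event-based watcher, and like the demo it only sees changes on disk, not changes made directly to the infrastructure.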
A
There we go, now we get version three. There it is. Thank you. See, we've got that in code review. In this case it's obviously still looking for file changes on disk, but GitOps is doing that reconciliation. One of the cool things about Flux and Argo and these things is that they're doing that two-way sync, and it's one step beyond what a typical controller like this is going to do, where I'm only looking at my local files. Traditionally we're only looking for a git push, and what you actually want to do
A
is look at the full state of the infrastructure. You say, "Hey, I'm just going to check it." Terraform's going to go out there and run terraform plan every once in a while and say, "Hey, I think something's different now." And you can hook into these sorts of signals. At Amazon we have EventBridge. There are all these different ways that you can look at what is going on in the infrastructure, and you can trigger things based on any changes, right?
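Running a plan every once in a while to ask whether something is different can be scripted with the `-detailed-exitcode` flag of `terraform plan`, which exits with 0 when there are no changes, 2 when changes are present, and 1 on error. The wrapper below is a sketch: `detect_drift` and `run_terraform` are hypothetical helpers, and the runner is injectable so the logic can be exercised without Terraform installed.

```python
import subprocess
from typing import Callable, Sequence

# -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
PLAN_COMMAND = ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"]


def run_terraform(command: Sequence[str]) -> int:
    """Run the command and return its exit code (needs terraform on PATH)."""
    return subprocess.run(list(command), check=False).returncode


def detect_drift(run: Callable[[Sequence[str]], int] = run_terraform) -> bool:
    """Return True if the plan reports differences, False if it is clean."""
    code = run(PLAN_COMMAND)
    if code == 0:
        return False
    if code == 2:
        return True
    raise RuntimeError(f"terraform plan failed with exit code {code}")


if __name__ == "__main__":
    # A fake runner keeps this example runnable without Terraform.
    print(detect_drift(lambda command: 2))
```

What to do when drift is found (alert, open a change for review, or re-apply) is a policy decision this sketch leaves out.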
A
You can apply infrastructure as software principles to anything. It's not just the Kubernetes pieces, because again, you're going to have different scopes for these controllers. If you think that you're going to write one controller and it's going to do everything, it's like writing one Terraform main file that does everything. You don't want that scope. You don't want that blast radius for one file or for one controller. So you do want to separate these things out. You want to have that limited scope.
A
Having pull-based, limited scope inside of environments is a great idea, and then applying this whenever there's a change on the infrastructure or on files is really that last piece. And that's really all I have for why infrastructure as software and GitOps work so well together. GitOps is an implementation of infrastructure as software, and that is the main direction that all of this should be going in. I think there's been great progress with GitOps in general so far. So thank you.
C
Well, well, well done! Thank you so much for that presentation. We do have a few minutes for questions. Are you available to take some questions? For questions, you can raise your hand, or we have a mic in the middle. You can just jump in with any questions.
A
I will say I am showing off running Kubernetes on these in the AWS booth tomorrow with EKS Anywhere, which also implements GitOps. So if you want to see them in action with Kubernetes, come by the booth tomorrow at 10:30.
C
While they're thinking of one: you were mentioning that... can you guys hear me? Okay, because I can't hear the mic feedback. You were mentioning that you felt like, on some level, it wasn't declarative because it relied on an imperative system for operation.
C
So, I mean, in that case there is no such thing as declarative, right? Because everything always relies on an imperative operation. Even if you built a language that was only declarative, at the end of the day it has to be put into bytecode, which is imperative on the CPU. So I was wondering if you'd maybe just speak on that for another moment or two.
A
Yeah, there's also no such thing as immutable, which was kind of mind-blowing for me to think about. It's like, no, actually everything changes over time once we run it. Those were two things where, as a system, I kept thinking, "Oh well, this is how it has to be," but then realizing that the main benefit was to the humans.
A
What we thought was immutable, it was okay, because my view of what's immutable is, "I want this application deployed," but then I have this other controller that comes in and says, "Oh, but I need to scale it up." I'm like, well, I didn't tell it to scale up, so it's not immutable, because I told it five replicas but something else figured out it needed ten. At that point I'm like, well, now it's not immutable. So is it bad? No, it's wonderful, because I didn't have to do it.
A
If I need to get there in five seconds, or, you know, five minutes, and it's a 30-minute drive away, I can't request an Uber, because I'm not going to get picked up in time. So I need some control there. I have to get there fast. You know, I have to get to the hospital, my wife's going into labor. Let's go. I'm not going to sit there and say, "Well, let's call a cab."
D
A
Please don't write your own Bash controller for GitOps. Use one that's fairly trusted, because you're going to have those bugs. Gaining your own experience adds more trust, but a lot of that comes from knowing how the system breaks in your environment and knowing, "Oh, my process didn't align with how that thing was intended to be used." And so once you have some sort of intention-based thing where you're like...
A
E
Hi, great talk. One question. You didn't show it, but it's always in my head when I use Argo CD. If we revert in the case of GitOps, what is the best practice: to revert it through the UI in Argo CD, because you have a history and the possibility to revert, or, since the single point of truth is Git, to go to Git and revert it there and then have auto-sync?
E
A
I worked at Disney+ for a while. We were managing infrastructure and we built our own controllers for how we were managing our clusters, and we had that same question: "Hey, what do we do? How do we revert? We deployed something that was bad. How do we get back to a known good state?" And yeah, we could go to Git. All of our state was stored in Git. We could go to Git and say
A
"Oh, I'm going to make the new head the old version." And we had so many weird things in software that never assumed that latest would be something that's not the most recent. They said, "Oh no, latest is the most recent as far as a timestamp goes," and a lot of systems still deal with timestamps. So what we decided for our system was that we never moved head back to an older version. We always went...
A
We always did a git revert, which pushes one more head version ahead. It says, "I'm going to take that thing and I'm just going to undo this commit," but it's a new commit with a new timestamp. It always moves forward. And so in software and infrastructure, a lot of those systems are just easier to mentally
A
think of when the current state is always the state with the newest timestamp. Figuring out what the newest timestamp would be, and the newest commit as far as the history of dates, is, just because we live in a time-synchronized world, easier to reason about in a lot of ways for all those controllers. So I don't know if Argo has a specific way to do those reverts, but I know in other cases it's just a lot safer.
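The "always move forward" rule can be modeled with an append-only history, where undoing a change adds a new entry instead of moving the head back. This is a toy model, not Git itself and not how Argo CD handles rollbacks: the `Commit` class and the helper functions are hypothetical, and an increasing sequence number stands in for the timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sequence: int  # stands in for a timestamp; always increases
    state: str     # the desired state recorded by this commit
    message: str


def commit(history: list[Commit], state: str, message: str) -> list[Commit]:
    """Append a new commit; earlier entries are never rewritten."""
    return history + [Commit(len(history) + 1, state, message)]


def revert_last(history: list[Commit]) -> list[Commit]:
    """Undo the newest commit by appending a new one, as git revert does."""
    if len(history) < 2:
        raise ValueError("nothing to revert to")
    bad, previous = history[-1], history[-2]
    return commit(history, previous.state, f'Revert "{bad.message}"')


def current_state(history: list[Commit]) -> str:
    """The newest entry is always the current state."""
    return max(history, key=lambda c: c.sequence).state


if __name__ == "__main__":
    history = commit([], "replicas=5", "initial")
    history = commit(history, "replicas=50", "scale up")
    history = revert_last(history)
    print(current_state(history), len(history))
```

Because the revert is itself the newest entry, any system that assumes "latest means most recent" still ends up on the known good state.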
C
The Argo maintainer in me says: let's talk after this. I think we have time for one more question while they're setting up for the next talk. One more question from somebody? Big pressure, you've got to get a good question. No? Okay! Well, let's give one more round of applause. Thank you so much.