From YouTube: Organizing Teams for GitOps and Cloud Native Deployments - Sandeep Parikh, Google Cloud
Description
Organizing Teams for GitOps and Cloud Native Deployments - Sandeep Parikh, Google Cloud
Large scale Cloud Native deployments typically include multiple teams running multiple applications across multiple environments - but how should teams be organized to enable efficient software delivery? How should responsibilities be split between platform, DevOps, and application teams? In this talk we’ll walk through the different approaches teams can adopt for organizing Git repos, handling upstream dependencies, and managing software rollouts. This talk will go in-depth about repo structure and strategies for managing the release process, as well as how to enforce policies across configs and manifests.
My name is Sandeep and I've been with Google Cloud for almost seven years. I've had several different roles and titles over that time, but ultimately they've all revolved around helping teams adopt and optimize for cloud in some form or fashion. You can always find me at circus monkey, that's crcs mnky, on Twitter if you've got questions about GitOps, DevOps, or anything else that comes to mind. Now, there's quite a lot I want to cover with y'all today, so we're going to move through it pretty quickly.
But how do we measure that software delivery performance? Well, in our research program we have found a valid and reliable way to do just this. There are two metrics representing speed and two metrics representing stability. For the speed metrics, we have deployment frequency, which is how often you deploy, and we also have lead time for changes, which is basically measuring the time from a commit all the way to that commit being deployed into prod. And then there are the two metrics on the stability front.
Now, these four metrics can be applied to any kind of software delivery, whether it's web or mobile, firmware, what have you. And using these metrics, we can actually bucket teams into specific categories: low, medium, high, and elite software delivery performing teams. But these are just trailing indicators of software delivery performance, and that's where the leading indicators come in. Now, we don't have time to go through all of the analysis from DORA, but we know that there are specific leading indicators that drive software delivery performance and have a positive impact, for the purposes of this slide.
Now, that's just a little bit about DevOps, and it's important because I think one of the big takeaways for us from DORA is that it's not about the tools, it's about the process and the people involved, and that's what really drives the stability and the velocity improvements.
They usually amount to having a lot of individual teams pushing code to a lot of deployment environments, and those deployment environments are spread across multiple regions. It's a simplistic view, but ultimately it encompasses quite a bit of complexity. So let's try to break it down, starting with some foundations and some assumptions.
So why do teams even want GitOps? Well, it's because they want to get out of the imperative operations business, right? Those sorts of approaches are hard to scale, hard to fix, and hard to roll back, especially in case there is a problem. So we don't want to take that approach anymore; instead we adopt a GitOps approach that gives us some very specific properties, namely being declarative: a system managed by GitOps must have its desired state expressed in a declarative fashion.
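For illustration, here is a minimal sketch of what that declarative desired state stored in Git might look like, a plain Kubernetes Deployment manifest that the GitOps tooling continuously reconciles against the cluster (the names and image here are illustrative, not from the talk):

```yaml
# Desired state, checked into Git; the GitOps controller makes the
# cluster match this manifest rather than us running imperative commands.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
  labels:
    app: my-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: registry.example.com/my-service:1.4.2
          ports:
            - containerPort: 8080
```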
So if there is an imperative operation against a particular cluster, against a resource that's coming from a Git repo, we know that that imperative operation will actually get overwritten on the next reconciliation loop. Those are the principles we get by adopting a GitOps-style approach. Now, with GitOps out of the way, I want to talk about some of the other assumptions that I want to make up front. For our notion of many teams, we're going to categorize them into some pretty coarse buckets.
Forgive me the coarseness: we have application teams, operations teams, and platform teams. For infrastructure, we'll be assuming Kubernetes as your cloud native deployment, which makes sense, and some GitOps tooling. We don't need to be specific about which GitOps tooling, whether it's Argo CD, Flux, or Config Sync; just know that most of what we're going to talk about involves one of these popular GitOps tools. And then there are the deployment regions to consider.
Well, then we have the app operators. They're responsible for deployment manifests and making sure the app or service is up and running. And then finally we have the platform admins. They cover the infrastructure bits, not necessarily the compute layer of Kubernetes itself (though they may), but the things just one level up from Kubernetes: RBAC, quotas, resource limits, all that sort of work. It's the kind of initial infrastructure that has to get laid down on Kubernetes before application teams can run and scale.
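As a rough sketch of that initial layer a platform team might lay down for one team, here is a namespace, a quota, and an RBAC binding (team names, quota values, and group names are illustrative assumptions):

```yaml
# Per-team baseline the platform team owns: a namespace, resource limits,
# and edit access for that team's developer group.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    pods: "50"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-editors
  namespace: team-a
subjects:
  - kind: Group
    name: team-a-developers        # hypothetical IAM/IdP group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                       # built-in Kubernetes "edit" role
  apiGroup: rbac.authorization.k8s.io
```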
If your team gets a namespace as its only playground, then you're probably living in a multi-tenant world, and this tends to be the case for the long tail of application teams deploying to Kubernetes environments. Now, regardless of which approach you take, single-tenant or multi-tenant, the platform team still has a role to play, so let's explore where they fit into this equation as well.
Ops teams may have shared repos with platform teams, or they may have distinct repos. In the case on the left, the GitOps setup process is simple, but the organizational process may be more challenging because there's more coordination involved if two teams are sharing the same repo. On the right, you've kind of flipped that problem on its head: you've made the GitOps setup more complex with distinct repos, but the organizational setup is easier.
Multi-tenant approaches essentially look very similar. The only difference here is the scale and complexity of the repo management. In a shared repo approach, that management may have to be accomplished via things like PR reviews on protected branches. Or you can have distinct repos, where platform and ops teams are completely separated; this simplifies most of the day-to-day Git management, but it does make the GitOps configuration much, much more complicated with the distinct repo approach.
The GitOps tools that are out there today have different ways of supporting this, like Argo CD's Application, or app of apps, model. So there are different approaches out there; that's one with Argo. With Config Sync there are root repo and separate shared repo options as well. But ultimately you're putting the complexity back onto the GitOps tooling, and you're simplifying the work on the organization.
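As one concrete sketch of that, here is roughly what an Argo CD app-of-apps parent Application could look like: it points at a repo path that contains further Application manifests owned by other teams, so the complexity lives in the tooling rather than in one shared repo (repo URL and paths are illustrative assumptions):

```yaml
# Parent "app of apps": Argo CD syncs this Application, and the manifests
# under apps/ are themselves Application objects pointing at team repos.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: team-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/platform-config
    targetRevision: main
    path: apps/
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```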
So let's take a look at an example workflow where application, operations, and platform teams all have separate repos, but they are effectively able to collaborate without stepping on each other. It starts with the dev team writing code for their applications and building those artifacts, and then those artifacts are stored in some sort of artifact repository.
Now, if we build upon that example, we need to talk through some additional considerations as well. For starters, what if config and infrastructure, that is, the ops and the platform teams, had a shared repo between those two teams? Well, how do those teams need to work together? Is the repo owned by the platform team, or is it owned by the ops team? Are there weird permissions or protected branches that we have to worry about if it's owned by platform?
There's also the option that maybe there's an approval process at continuous delivery time, before objects get pushed to Kubernetes, and maybe that's done by the platform team. Maybe it only applies to prod and not to the other deployment environments like dev, QA, and staging, because we want to stay out of people's way as much as possible and let them work quickly.
These are all the sorts of things that need to be understood, and again, it's not one size fits all. Every organization is different in its own ways, and everyone views the division of responsibility and ownership in different ways. So instead of trying to figure this out with tools, which is not going to work, don't let the tools drive this process.
But if there's no clear indication of ownership or permission or responsibility, then you're left kind of wondering how we're going to fix and understand all of this. Now, versioning is the next topic. Versioning is relatively straightforward, but there are a couple of considerations to remember. This is by no means hard and fast guidance that's 100% correct, but for many organizations and their teams, a branch per non-production environment tends to work well and provides a pretty clear process and lineage for GitOps deployments.
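As a minimal sketch of the branch-per-environment model, here is what a staging environment might look like with Argo CD, where the Application simply tracks the head of the staging branch (names and repo URL are illustrative assumptions):

```yaml
# Staging environment tracks the head of the "staging" branch.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service-staging
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/my-service-config
    targetRevision: staging        # branch per non-prod environment
    path: k8s/
  destination:
    server: https://kubernetes.default.svc
    namespace: my-service-staging
```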
Of course, the other GitOps tools out there like Flux and Config Sync support similar approaches; I just wanted to put up one example here. Now, for releasing to prod, we move away from the head of any particular branch to something a little bit safer. The safest approach is always to use a commit hash.
Instead of trying to update a whole bunch of GitOps controller CRDs with different commit hashes all the time, we could take a slightly different approach and use tags. Tags are the next best option after a commit hash. Their only downside is that they are not immutable, so we're back to good process and hygiene around Git being really, really important to keep this from becoming a problem and to keep it from getting abused.
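Continuing the same illustrative sketch, a prod Application would pin its revision rather than tracking a branch head; the commit hash and tag below are placeholders:

```yaml
# Prod pins an exact revision instead of a branch head.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/my-service-config
    # Safest: pin to an exact commit hash...
    targetRevision: 3f9c2b7d1a4e5f6c8b9d0e1f2a3b4c5d6e7f8a9b
    # ...or, next best, pin to a release tag:
    # targetRevision: v1.4.2
    path: k8s/
  destination:
    server: https://kubernetes.default.svc
    namespace: my-service
```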
Now, regardless of whether you deploy to prod via commit hash or tag, you want to employ some good basic principles and practices. First and foremost, those CRs that specify repo hashes or tags should themselves be deployed in a declarative manner, not using any imperative approaches like the CLI or some other imperative approach. Then you'll want to build out some sort of distinct delivery process, outside of your application pipelines, to deliver these updated CRs that specify new branch names or new commit hashes or new tag names. And that deploy process should match what your organization wants, whether they want to do kind of a blue-green deployment and switch 100% of the traffic over, or they want to do a canary-style process where small percentages of traffic are shifted over to newer versions of the application. That's really, again, back to what your teams want as the outcome.
That means: is the automation going to check health checks and readiness checks before continuing to progress further into the deployment, or is there a human who makes that decision and says, we're going to deploy 20%, I'm going to check the numbers, then I'm going to deploy up to 50%, and so on and so forth? Either way, it should be written down and transparent to every application team, so they know how that deployment process to prod works.
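The talk doesn't prescribe a tool for that canary process, but as one illustration, here is a minimal sketch using Argo Rollouts (my choice for the example, not the speaker's; names, image, and step values are assumptions). An empty pause waits for a human to promote, while a pause with a duration lets the automation continue on its own:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-service
spec:
  replicas: 4
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: registry.example.com/my-service:1.4.2
  strategy:
    canary:
      steps:
        - setWeight: 20            # shift 20% of traffic to the new version
        - pause: {}                # wait for a human to check the numbers and promote
        - setWeight: 50
        - pause: {duration: 10m}   # or let automation continue after a soak period
```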
Now, another way that dev, ops, and platform teams can collaborate is via an upstream dependency process. This is often done using things like Helm charts, so in this section I wanted to quickly mention another approach. We're not going to spend a ton of time on it because we're flying pretty quickly through all this, but I want to make sure y'all are aware of what other options are out there, especially ones that match the GitOps model much more closely, and that approach is called kpt.
We don't have time for a full-on kpt tutorial or walkthrough, but I'd recommend y'all take a look at kpt.dev. I like to think of kpt as basically another way to use package management semantics, but with bundles of Kubernetes config. That's it. Now, one example that often comes to mind when we talk about upstream dependencies is this idea of having approved software packages that can be used by application teams. So you could think of things like Redis or MongoDB.
Maybe the platform team or the security team has approved using Redis, but they've done it with very specific configuration details. They don't want anyone just grabbing the Redis image and deploying it on their own; they want a carefully controlled Redis artifact and Redis configuration that gets deployed.
So how do those platform teams then share that with their application teams or their ops teams? This package management approach is pretty helpful because, one, they can pull that package in for the bundled Redis configuration, but it also provides the opportunity for them to update it as well. As the platform team updates that configuration or revs the version of the Redis deployment, the application teams and the ops teams can pick up that update, again using regular old package-management-style semantics.
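As a rough sketch of what that looks like with kpt, a consuming team fetches the approved package with `kpt pkg get` and later picks up the platform team's changes with `kpt pkg update`; the package's Kptfile records which upstream repo and ref it came from. The repo URL, directory, and version below are illustrative assumptions:

```yaml
# Kptfile inside the team's local copy of the approved Redis package,
# recording the upstream it was fetched from and how updates are merged.
apiVersion: kpt.dev/v1
kind: Kptfile
metadata:
  name: redis
upstream:
  type: git
  git:
    repo: https://github.com/example-org/approved-packages
    directory: /redis
    ref: v1.2.0
  updateStrategy: resource-merge
```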
So that's why I like this approach, and it's worth looking into. Now, the last topic I want to talk about as it relates to teams and GitOps and cloud native is around guardrails and guarding against danger. I'll be using some key terms as we talk about this in the next section, so I want to quickly define them up front. First is policies. Policies are rules that tell us how we can configure a resource, pretty straightforward. When you're using Kubernetes, policies can specify things like what labels are allowed on a pod, or requiring images to have specific tags, that sort of thing. Now, policy management is the mechanism that helps us with the ins and outs of a policy. Think of this as the framework, the runtime, helping us manage or pull in external data, packaging, testing, that kind of stuff. And the last part is policy enforcement, and that really refers to the actions that will be taken and the scope of those actions.
The policy management aspect comes from Open Policy Agent. That's a really broad and popular framework for managing the policy bits, and the policy enforcement aspect comes from a sub-project of Open Policy Agent called Gatekeeper. Gatekeeper essentially packages up OPA, Open Policy Agent, and delivers it as a custom Kubernetes admission controller. So it's there to allow or deny admission to the cluster based on whether you violate a policy or not.
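To make the "policy" piece concrete, here is a minimal sketch of a Gatekeeper ConstraintTemplate along the lines of the well-known required-labels example; it defines a reusable policy (in Rego) that constraints can then apply with specific parameters. The template is a standard community example rather than something from the talk:

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        # Violation if any required label is missing from the object.
        violation[{"msg": msg}] {
          provided := {l | input.review.object.metadata.labels[l]}
          required := {l | l := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("missing required labels: %v", [missing])
        }
```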
With this new admission controller in the form of Gatekeeper, the Kubernetes API checks with Gatekeeper and says, this object wants to enter the cluster, how do we decide what to do with it? Gatekeeper gets that request and provides a yes response or a no response, and it's just that simple.
When the enforcement happens, Gatekeeper reviews the incoming object and compares it to all the policies that are there. It checks the namespace scope, the object type scope, and the policy rule itself, and whether that policy is just there for auditing purposes or whether it's there to deny entry altogether. Then it makes the decision and hands it back to the Kubernetes API, and the Kubernetes API then rejects admission or allows admission.
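A minimal sketch of that scoping, assuming the K8sRequiredLabels template above is installed: the Constraint below limits the rule to Deployments in the prod namespace and chooses whether violations are denied or only audited (names, namespace, and label are illustrative):

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: deployments-must-have-team-label
spec:
  enforcementAction: deny        # use "dryrun" to audit without blocking
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]    # object type scope
    namespaces: ["prod"]         # namespace scope
  parameters:
    labels: ["team"]             # the policy rule's parameters
```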
So that means when commits are pushed, we can have enforcement happen right there. As you push a commit, there's a test that gets kicked off, and that test comes back and says, hey, this is actually going to violate a production policy, you have to go fix this. I can't pass the build, whether it's your application or, you know, a Deployment object for Kubernetes, until you fix this, because it's going to violate a policy. And you can have that same approach work on a PR review as well.
When a PR comes in, the PR is automatically tested and says, okay, this object or this application is going to violate a production policy. You do that by having the infrastructure team or the platform team write those policies. Those policies are available for all teams to see and they're able to pull them in, so they get the latest and greatest every time they do a commit or do a PR.
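One way that CI check might look, as a hedged sketch: a GitHub Actions-style job that evaluates rendered manifests against the platform team's Gatekeeper templates and constraints with the gator CLI. The repo layout, the step details, and the assumption that gator is already available on the runner are all mine, not the speaker's:

```yaml
# Hypothetical CI workflow: fail the build if any manifest in the PR
# violates the platform team's Gatekeeper policies.
name: policy-check
on: [pull_request]
jobs:
  policy-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes gator (the Gatekeeper CLI) is installed on the runner,
      # manifests live under manifests/, and the platform team's
      # ConstraintTemplates/Constraints have been pulled into policies/.
      - name: Evaluate manifests against policies
        run: gator test -f=manifests/ -f=policies/
```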
We should also have another policy evaluation or enforcement element at delivery time, just in case, to catch any last-minute things that might have bypassed an approach or come in a different way. And then, finally, we want to stick with the standard OPA Gatekeeper approach, which is to run it at the Kubernetes cluster, basically as a bouncer for the front door. So anybody that's going to violate policy through an imperative operation or some other API client also gets blocked right at the Kubernetes door as well.
Now, we covered a lot of ground, but there's one thing I want you to take away from all this. The most important takeaway through the whole thing is that this is not a one-size-fits-all approach. Doing GitOps for cloud native with teams is a human and a process problem. There is no one approach that works for every single team, and no one approach scales to every single organization.
Instead, you want to take a deeply collaborative approach and work with your teams early and often on documenting process and understanding. That documentation could cover things like what each team's role and responsibility is, and be specific, even if it's down to things like, hey, this team only writes Pod and Service manifests or Deployment manifests, great; this other team is only responsible for config, you know, ConfigMaps or Secrets, perfect. What we want to have is a clear idea for everybody on all these teams.