Description
Like Uncle Ben said, “With availability-centric topology and incredible scale comes great upgrade complexity”, or something like that. Join Gurney, Joydeep, and their guests Ian Miller and Jun Chen, members of Red Hat’s Telco Engineering team, to discuss Topology-Aware Lifecycle Management (TALM). TALM aims to tame the complexity of upgrades and configuration changes across a fleet of edge appliances by integrating topology awareness into lifecycle operations.

Gurney: Hey folks, welcome to the Cloud Multiplier. I am here, as always, with my co-host Joydeep Banerjee, and today we are glad to welcome two guests from Red Hat's Telco Engineering team. We have Ian and Jun here today, so welcome to the show, folks. Today we're going to be talking about (and this is the brief bit I know ahead of time) taming upgrades at massive scale. So we're going to be talking about upgrades and configuration changes, and all the problems you face with those big changes at great scale. It'll be really interesting today. I've been promised some cool demos, but before we kick off: Ian, Jun, do you want to tell us a little bit about yourselves? Give us the intro, talk about what you've been doing. We'll start with Ian, I guess; you're at the top of my screen.

Ian: Sure. I'm really glad to be able to join you guys today. My name is Ian Miller and I work here at Red Hat in the Telco 5G RAN group, where we're working on bringing OpenShift out to the 5G RAN edge of the network. I've been involved in networking within the telco space for quite a while, and I'm just really excited to be bringing that to OpenShift and helping OpenShift succeed in that area.

Gurney: That is amazing. How about you, Jun? Tell us about yourself a bit.

Jun: Sure, yeah. Jun Chen. I recently joined the same group as Ian a few months ago. Prior to that I'd been working with telco customers for a long time, so my last few months have been all about this TALM thing. Glad to have a chance to showcase it.

Gurney: That is awesome. Well, welcome, folks. Someone in chat has already said "long live telco cloud," so a lot of the issues we're going to talk about today are definitely telco scale. I've peeked around their open source repo a little bit so far; speaking of which, I'll go ahead and drop that in chat. But before we get into that, we have our usual pile of off-topic topics. To start with, I guess we'll go around the room. Joydeep?

Joydeep: Yeah, I'm about 75 pages in, so this is my ritual: when I finish my work every evening, I close the laptop and open the book, and usually I cannot make it more than five pages at a time, because you read something and then, did I understand that? You have to think about it, and then you have to refer back to something else. It's fascinating! Someday, Gurney, I'll take this to the stream and say: okay, this is a dump of a causal model in our space, guys.

Gurney: I'm going to do that. That is awesome. Since our last stream (I told Joydeep a little ahead of time, because we were nerding out a bit about it), for the first time in my life, having been in the industry for not all that long, I've started playing around with... well, I originally started, and this is me, a person who works at Red Hat, saying this, on Ubuntu and Debian-based distros in college. Then we had a little bit of work on various Raspbian-based Raspberry Pis, a little bit of embedded computing as we all did, and I've kind of settled into Fedora lately, as it seems a lot of folks have. Because I was shocked to find, you know, I'm going to use Fedora, I work at Red Hat, all of us RPMs, and then I discovered that most of the communities for the devices and other things in my home and in my computing just had very, very good defaults for Fedora. So it worked flawlessly on my laptop. But now I've picked up a device that runs Arch Linux, which has been interesting, and it's not just a normal one. A couple of my co-workers have a Steam Deck as well, and it runs a consumer-facing distro based on Arch, which has been very interesting. I found out one other Red Hatter was a maintainer in the Arch community, and said, yeah, that was kind of an interesting surprise: that Valve told us, "We're going to use Arch for our new handheld that we're going to put in normal consumers' hands," and then trusted them to use this Arch Linux-based distro with a pretty thick layer of UI over top of it. So that's been pretty wild. Joydeep, have you gone through the journey of building and installing Arch? I'm told it's a reading comprehension test, basically.

Gurney: Oh yeah, I do; it's in this little case. I won't show it off too much on stream, because most people can look it up or already know. I just find it very interesting that at this point we've reached the point where we have Android and iOS out there in the wild, we have Windows and macOS (and macOS is a Unix system)...

Gurney: Yes! Hey, there we go, we found our segue. I think this goes really well into saying: okay, so you've started putting Linux on everything. We've started putting Linux everywhere. Maybe we put it on a cell tower. Maybe we put it on every cell tower. Maybe we make a flavor of RHEL, and a flavor of Kubernetes, that runs really well on a small device on a cell tower.

Ian: Yeah, good question, and a great segue. At those kinds of scales there are a lot of different issues that start to pop up, so we'll dive in here. For just a moment, I'm going to give a nod to what Joydeep was saying; you were asking about things we're watching right now. Well, it's hard to peel myself away from the images coming out of the James Webb Space Telescope recently. When not focused on that, we are deep into scaling up at the edge within the telco environment, so happy to talk about that as well.

Ian: So, like you said, when you start scaling up, dealing with managing a fleet of clusters that numbers in the thousands or tens of thousands, that kind of scale starts to bring some of its own unique challenges, and certainly I could not even begin to run down the list of all the issues you may run into. But within what we've been doing in the telco space, there are some really interesting challenges we've had to tackle around lifecycle events: the various things that are going to go on over the course of the life of your cluster, and trying to manage them within an environment that has really demanding needs for uptime and availability, and that is really sensitive to any sort of disruption to the operational environment. When cell phone service goes offline, nobody's happy.

Ian: So there's a lot of sensitivity around that. We started working on something we'll talk more about here, called the Topology-Aware Lifecycle Manager. This is an operator we've developed that can be used to help address some of these issues with managing lifecycle events, or changes, or potentially disruptive things that happen to clusters at scale. A lot of that sensitivity comes in for various different reasons: there may be service level agreements with that...

Gurney: I think the service level agreement may have been breached there. This is interesting, because coming into the show my computer did lock up and we had to reset, so clearly we're running in the same zone here. I can hide Ian real quick, so we'll pivot; I'll go ahead and branch a little bit, and we'll see when Ian's connection comes back. Joydeep, I can watch him on the side. He has a network connection of 0 out of 10.

Joydeep: You know, he was saying, and perhaps Jun can speak to this, something very interesting: my cell phone connection, when I'm talking, might be dropped if some stupid stuff is going on, and the stuff that Jun and you guys are working on can prevent that? Are you kidding me?

Gurney: Yeah. So we take computing and networking topology; we may have a primary and a backup, an A and a B, a blue or a green, on a tower site. We may also have two towers, is what you're saying, that cover each other, have some coverage overlap, and can take over for each other. So we're not just topology-aware for computing: we're taking computing topology and distributing it over the physical topology of the land by putting it on the cell towers. That's amazing.

Gurney: Right, that's interesting. Amazing! Oh yeah, I also wanted to add one note for the audience: I actually linked their open source repo in chat, but it is under a different name than TALM. TALM is used a lot: Topology-Aware Lifecycle Manager... manager? Management? Jun, is it manager or management? Manager. Got it.

Joydeep: Yeah, I guess the other question, Jun, is: is this related only to telco, or only to 10,000 or 100,000 clusters? I mean, what if I have, let's say, 20 clusters, important clusters, in which I'm running my production? Gurney runs some of these, right? You run the infrastructure for us, and if it's not working, Gurney, we won't cut you slack.

Jun: Yeah, sure. This started with a more telco-focused requirement, but the work is truly generic: wherever you need to manage a relatively large number of clusters.

Gurney: Awesome. Wherever topology can play a role. We've got Ian back. Ian, all you missed was a little bit of kickoff, a little bit of intro, and we talked about mapping computing topology onto the physical topology of a network of cellular towers, and we also...

Ian: Yeah, great, sounds like we're good. So, I was just talking about uptime and service disruption?

Ian: My apologies for that, but I'll just pick up from there. I'm glad you touched on that: the issues we're talking about really are not a telco-specific thing. It's really whenever you've got large-scale topology, lots of clusters, and whether it's SLAs that you need to ensure you're meeting, for obviously contractual reasons, or it may be that your operations team, through prudence and experience over time, has said: you know what, doing an upgrade of my entire fleet of clusters simultaneously, at the same moment, is probably not the best idea. And so you've got these different things...

Ian: ...that say: I want to be able to have a higher level of control over these lifecycle events. And certainly when you're talking about a large scale like that, automation really is key. So we're looking to bring some tools that build on other existing tools, and bring in these things that allow us to manage these lifecycle events in a topology-aware way. So it probably makes sense to talk a little bit about what we mean by topology.

Ian: Clearly we're talking about thousands of clusters, tens of thousands of clusters, being managed, so we've got large scale. But those clusters may also have some sort of service-level overlap, whether that's a logical overlap between the clusters or, in the case of cell service, some amount of geographical overlap, and you want to make sure that, hey, if I'm going to go upgrade a cluster...

Ian: ...if I'm going to do something that's potentially disruptive, let me not take all of the cell phone towers in Manhattan down simultaneously. Let's at least share that between Manhattan and Philadelphia, or wherever, and have some sort of geographical awareness. Or, like I said, if you've got logical service overlap, you may want to make sure that logically you've got some redundancy built into your system, and so you may want to not take down two clusters that are logically overlapped. So yeah, topology can span across scale, but also service availability as well, and...

Gurney: I imagine that could impact someone else's service availability as well, because I remember a project I worked on a while ago where there was a concern of: okay, if we run this as a managed application (that was the use case), if we run this in this data center in this region of this cloud platform, we run this number of them on the same physical networking interface. So I can imagine, if you decided to take all of Manhattan down for an upgrade at the same time...

Ian: Yeah, exactly, and actually that's a great segue, because one thing I haven't started to dive into is: what are some of these disruptive events that we're really intending to manage? One of those would definitely be an upgrade of the base operating system and OpenShift, and sometimes that content is not small.

Ian: The update may be fairly large, and if you're dealing with bandwidth constraints, that topology may need to take into account that you need to manage how many clusters are sharing links. So you may build that into your topology, and you certainly don't want to overwhelm the servers that are serving up that data. So you want to be able to not only do it in a topologically aware way, but also do it in progressive waves, so that you're not doing more than some limit that you've tested...

Ian: ...that you know you can support on whatever your content delivery servers are. So there are a lot of different ways that topology can be sliced up, and one of the things we tried to do within TALM is to not bake in knowledge of what those different mechanisms would be, but to provide the set of tools that puts that into the user's hands and says: you get to define what topology looks like, you get to define what a progressive rollout of changes looks like, you get to define how this will stage through and work its way through, whether you're staging change set one followed by change set two, or doing things simultaneously.

Ian: So we tried to build some tooling within TALM that allows users to do that. One of the key use cases we were focused on when we were doing this (I've named it already) is OpenShift upgrades, and making sure that when you do an OpenShift upgrade, that may potentially be a disruptive event for that cluster.

Ian: If you have a highly available cluster, it's certainly far less disruptive, but not zero risk either. And again, that comes back to your operations team: maybe it's not disruptive, but that doesn't mean they necessarily want to roll out that upgrade simultaneously to the entire network.

Ian: So there are a lot of different reasons why this comes into play. But a lot of times, when we're dealing at this scale out at the edge, the kind of clusters we're dealing with are single-node OpenShift, and within that context you do have a service-disruptive event when you're doing an OpenShift upgrade. So again, lots of different reasons, but OpenShift upgrades were certainly one of them. OLM operator updates are another; again, non-zero...

Ian: ...non-zero risk, may or may not be disruptive, but again the kind of thing that we want to be able to roll out. And within the context of operator updates, there's a really good built-in mechanism within OLM operators that allows you to subscribe to a registry and automatically keep in sync with that registry. So the ability to work through and pull those updates is built in, but within an environment like the telco environment, you may not want to do all of your operator updates simultaneously, and so TALM provides some of the functionality there.

Joydeep: What this reminded me of was my prior life working for an entertainment company. The thing that I knew is: technically, we can be ready to push something out, but then the business guys have real knowledge, which we had no clue about, which really goes to decide whether you push it out or not. What you were telling us struck the same chord: you are providing flexibility, I guess through APIs, for the user to customize however they want to.

Ian: Yeah, exactly. There are a lot of great features in OpenShift that allow this functionality, and a lot of great features within ACM that support and enable a lot of this. The gap we were trying to fill is: give the user the tools to say, we can time it when we want to time it.

Ian: We can batch it the way we want to batch it, and we can roll it out in a controlled manner that allows us to meet whatever our operational constraints are, whatever that industry may be. There are real reasons why they may not want to do a whole lot of things simultaneously, and so we're trying to give that additional set of tooling.

Ian: Tooling that builds on that great base and says: here's the additional functionality that you need, in an operational sense, to go forward in your network and be able to do the kind of updates you're looking to do. I guess the last one I'll mention is that really any configuration change could potentially be a risk. It doesn't even have to be what we consider the major lifecycle events; really any configuration change could potentially be something that a customer may want to manage this way.
A
Okay,
the
the
question
chat
very
related.
I
need
I
I
should
not
have
put
a
caption
on
related
before
we
move
on
is
a
question
about
using
satellite
as
a
local
repo,
so
they're
talking
about
the
openshift
upgrade
repo
server.
I've
worked
with
a
very
little
bit.
I
don't
know
if
you've
done
any
work
with
that
to
to
get
that
content
closer
to
the
edge
where
you're
going
to
actually
run
that
upgrade
or
not.
I
assume
that's
a
complimentary
tool.

Ian: Yeah, that's a great question, and again it gets into the complexities of what topology means. In bandwidth-constrained networks, you may want and need to move functionality further out toward the edge of the network, and certainly move that content out toward the edge of the network. Jun, maybe I can hand over to you a little bit here.

Ian: There are some primary use cases that we focused on within TALM, but then there's some additional functionality as well that comes along with TALM that allows things like this. So in terms of moving, or pre-caching, content further out at the edge of the network: Jun, can I hand over to you on that?

Jun: Sure. Yeah, so we have this built-in feature where we can look at the upgrade you want to achieve...

Jun: We can make all the clusters involved pre-download all the artifacts that are required, so that when you actually enable this upgrade to start, all these clusters already have the artifacts local, right on the node.

Jun: We know that at the edge we often have limited bandwidth or flaky connections, which is not good for bulk downloads. So it's important that, before we start the upgrade, we've already prepped this relatively risky step. That's what we do for this.

Joydeep: So I guess, Jun, what you're talking about here is, again, real-world systems. You have to complete the maintenance within a certain time; you have to complete the upgrade within a certain time. So you only start the upgrade once you've made sure all the prereqs, like downloading and things like that, are done, and you are allowing those to be done prior. Yeah.
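
To make the pre-caching flow concrete, here is a minimal, hypothetical sketch of such a CR, assuming the field names exposed by the upstream cluster-group-upgrades-operator (preCaching, enable, remediationStrategy); treat it as illustrative and check the repo's README for the exact schema.

```yaml
# Hypothetical sketch: pre-cache upgrade artifacts before enabling the rollout.
# Field names assume the upstream ran.openshift.io/v1alpha1 API; verify
# against the cluster-group-upgrades repo before using.
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: upgrade-with-precache
  namespace: default
spec:
  preCaching: true        # pull required images onto each node ahead of time
  enable: false           # nothing rolls out until this is flipped to true
  managedPolicies:
    - ocp-upgrade-policy  # illustrative policy name
  clusterSelector:
    - fleet=shuttles      # illustrative label selector
  remediationStrategy:
    maxConcurrency: 3     # at most three clusters per batch
    timeout: 240          # overall rollout timeout, in minutes
```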

Ian: Yeah, you'll probably hear us say the word "progressive" quite a few times during the course of this, because it really is about enabling, rather than one rapid, big, monolithic thing happening across your fleet, breaking it up into chunks. I've talked a good bit about breaking it up into logical chunks...

Ian: ...for overlap, in that sense. And what Jun was just describing is breaking it apart into chunks time-wise, allowing different phases of the change to be done in two separate events. As Joydeep said, you may have certain windows of time where you're allowed to make those changes, or allowed to do those things, based on your SLAs or whatever it happens to be. So that feature of TALM allows you to do that pre-caching and then initiate the upgrade.

Joydeep: Just one physical question, Gurney, Ian, Jun. You guys are talking about these single-node OpenShift telco things. Are they those small boxes we see while driving by, mounted on a tower in no man's land sometimes? Are you talking about those kinds of things?

Ian: Yeah, there are a lot of different areas where those servers can be deployed. It could be right there at the cell phone tower, at the base station, distributed out at the edge; you've seen how many towers there are, so you get a sense of the kind of size and scope of what we may be talking about here. And then progressively further back into the network, there are a lot of different places where OpenShift has some real fantastic ability to address problems, so it definitely does span...

Ian: ...a lot: the edge all the way back toward the core of the network. And depending on where you are within that network, different cluster topologies (single node versus compact clusters versus a larger-scale full cluster) can come into play as well.

Gurney: I guess, and you're probably about to go straight into a demo where you have this, so it might be good timing, I'm curious: does TALM do some work to discover the topology and understand some of these constraints for the user? Or does the user define it and say: TALM, this is what my network, what my fleet, looks like, beyond the things that you can determine or discern?

Ian: Great question, and a good segue. So again, TALM is a tool, and it's something that builds on top of other components of the solution here.

Ian: As you're doing a lifecycle event, some sort of progressive rollout that you want to do... let me throw up a slide, and if I haven't answered your question, you can definitely double down on it; happy to continue to dive deeper.

Ian: Hopefully it's reasonably legible here. I wanted to throw this slide up to try to give a sense of where TALM fits within the broader pieces of the solution. As I mentioned before, TALM is an operator; it runs on the hub cluster and builds on features that are available within Advanced Cluster Management (ACM). And really, the unit that TALM uses for rolling out changes to the network is policies, so the user has the chance to describe what they want...

Ian: ...the end state of their network to look like within policy. I won't dive super deep into policy, because I know you just had a great session on this within the last few weeks. So if folks haven't heard that, I'll throw in the plug: a great deep dive into policy is available in the show archives. But the unit of work is policy.

Ian: So the user here can describe, whether it's an OpenShift upgrade, or a change to configuration, or an OLM operator update, they can describe that in a policy, and then TALM has the ability to say: all right...

Ian: ...let's take that, let's look at the set of clusters that it's bound to, and let's start to progressively roll that out through the network. There are a lot of different ways that can manifest itself, and, like I said, across different lifecycle events, but that's the unit: TALM iterates over those policies, and it'll do them in order; you can actually specify an order to the policies that you want it to remediate. And then you say: across this large set of clusters...

Ian: ...I want you to go roll it out five at a time, ten at a time, five hundred at a time. Whatever that increment, or wave size, is, it'll do that many concurrently, and then it'll move on to the next set, and the next set. So to answer your question, Gurney: it's a combination of the placement rules and the placement bindings that go along with policies, along with cluster labels that allow you to do some selection, that let you define how you want to roll things out.
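
As a rough sketch of the building blocks Ian describes, here is what the ACM placement machinery looks like: a PlacementRule that selects clusters by label, and a PlacementBinding that ties a Policy to that selection. The names (fleet=shuttles, config-policy) are invented for illustration.

```yaml
# Sketch of ACM placement: select managed clusters by label and bind a
# Policy to that selection. All names are illustrative.
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: shuttles-placement
  namespace: default
spec:
  clusterSelector:
    matchLabels:
      fleet: shuttles            # pick the managed clusters by label
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: shuttles-binding
  namespace: default
placementRef:
  name: shuttles-placement
  kind: PlacementRule
  apiGroup: apps.open-cluster-management.io
subjects:
  - name: config-policy          # the Policy carrying the desired state
    kind: Policy
    apiGroup: policy.open-cluster-management.io
```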
A
That
makes
sense,
so
basically
you
build
you,
you
you
build
via
the
building
blocks
of
policy
and
labeling
and
and
all
of
these
other
constructs
and
you're
able
to
build
a
structure
that
says
here's
what
my
network
looks
like
and
then
you're
allow
you're
able
to
build
actions
that
you
want
to
carry
out
on
that
network.
So
it's
kind
of.
A
A

Ian: So let's dive through an example and see if that helps. Apologies: I can't fit quite as much on the screen and make it legible simultaneously, so we'll see how this goes.

Ian: I've scripted this out a little bit to simplify, but I'll talk through the steps and show some different pieces here. The first thing I'm going to do is apply a couple of policies that are going to describe my changes within the network. You'll notice on the left side of my screen here, in the red: these are the actual sites.

Ian: I actually have five of them configured up on this hub cluster, and we're going to roll out a set of changes to those five. The first thing that's happening here (I'll zoom in a little bit) is just creating two policies. You'll see the first policy here and then the second one here, and it creates the policy and the associated placement rules and placement bindings. Sorry, there we go. All right, so it applied those to the hub cluster in the bottom right here.

Ian: This is a view of the hub cluster, and you can actually see the policies applied here: two inform-based policies. That is one of the key things to what TALM is doing: we create all of our policies as inform-based policies, so they don't take immediate effect in changing the clusters that are out in the network, but you do get that immediate visibility. So if I jump over into ACM, this is the ACM policy governance view.

Ian: Let me zoom in a little bit; hopefully that makes it a bit more readable. Again, you can see these five clusters, and you can see that there are two policies that are not compliant, because these describe a change that I want to make but haven't made yet. On the left side here, under config, you'll see the two policies: one is creating a ConfigMap and the other one's creating a Secret. Trivial changes, but good for demonstration.
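
For reference, an inform-based policy like the first one in the demo might look roughly like this; the structure follows the standard ACM policy framework, with all names invented for illustration. With remediationAction set to inform, it only reports compliance and changes nothing on the managed clusters.

```yaml
# Sketch of an inform-based ACM policy requiring a ConfigMap to exist.
# It reports non-compliance but pushes no changes until enforced.
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: config-map-policy
  namespace: default
spec:
  remediationAction: inform     # visibility only; nothing is changed
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: config-map-policy-config
        spec:
          remediationAction: inform
          severity: low
          object-templates:
            - complianceType: musthave
              objectDefinition:
                apiVersion: v1
                kind: ConfigMap
                metadata:
                  name: demo-config
                  namespace: default
                data:
                  example: "hello"
```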

Ian: So you can see, under these clusters: no ConfigMap, no Secret. We're basically sitting in a state where we've described the change but not rolled it out yet. So the next thing I want to do is apply a ClusterGroupUpgrade CR. This CR is what describes to TALM what you want to do; Jun's going to give a deeper walkthrough of what's in there, but there are two high-level things I want to point out.

Ian: We list off the policies that we want it to remediate, so you can see here the ConfigMap and the Secret policy, and we tell it what clusters we want it to apply to; I'm just doing it by label here. All of these clusters appear to be named after space shuttles, so the label fleet=shuttles is common to all of them.

Ian: So I'm basically saying: I want to update all five of these, but I want to do at most three at a time. When I created that ClusterGroupUpgrade CR, you can see here that it has enable set to false, so it's giving us a status saying: hey, the upgrade has not started yet. The next thing I need to do is go enable it (that's just a simple patch to that ClusterGroupUpgrade CR), and now TALM is actually remediating those clusters.
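
A sketch of the ClusterGroupUpgrade CR being described, again assuming the upstream ran.openshift.io/v1alpha1 schema; the names and exact selector syntax are illustrative.

```yaml
# Sketch of the demo's CGU: two ordered policies, label-based selection,
# batches of three, created in the disabled state.
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: shuttles-cgu
  namespace: default
spec:
  enable: false              # created disabled; status reports "not started"
  managedPolicies:           # remediated in this order
    - config-map-policy
    - secret-policy
  clusterSelector:
    - fleet=shuttles         # all five shuttle clusters
  remediationStrategy:
    maxConcurrency: 3        # at most three clusters per batch
```

The "simple patch" Ian mentions would then be something along the lines of `oc patch clustergroupupgrade shuttles-cgu -n default --type merge -p '{"spec":{"enable":true}}'`.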

Ian: Let me zoom in on this screen here a little bit. You'll see that, in addition to the inform policies, we now have enforce copies of those, and this is how it's actually pushing those changes out to the network: it's taking those, in this case, three clusters at a time. I'll jump back here to the ACM view, where it's a little easier to see. You can see it's remediating those three clusters; the first policy is now done, and the second one is about to be done.

Ian: The reason it says four here is, remember, we have an inform and an enforce copy, and that enforce copy will disappear. So the first batch of three is done; it's now moved on to the second batch, the two remaining clusters. It's remediating those, and in about the next 20 seconds or so those will complete, and all five clusters will have the change. You can see the ConfigMap here has been populated based on the first policy.

Ian: That's because TALM has completed its work, and you can see it moved to the state "upgrade completed." And, I didn't mention this at the beginning: TALM will label the clusters before and after, to let you know what's going on, so you can actually track status through those labels as well. So that was a super fast run-through, and I know I jumped around the screens a little bit.

Ian: Apologies for the jumping, but as you can see, we went from non-compliant to fully compliant across the entire fleet of clusters, and we did it in two batches, as TALM progressively rolled that out. I'll pause there.

Ian: Yeah, so TALM will create those batches itself. The way TALM is built today, you get to define what gets included in the set, and I did it by the fleet label. So imagine the scenario that you had, where I had some amount of overlap, say evens versus odds, and I don't want evens and odds to be offline simultaneously.

Ian: I could easily build a ClusterGroupUpgrade CR that said: go roll this out progressively, 50 at a time, 500 at a time, on all of the even nodes. Then, when the even nodes are complete, you can hand off and go ahead and do an update of all of the odds, again 50 or 500 at a time. By doing that, you get the ability to say: I don't want to have these overlapping services down simultaneously.

Ian: Yeah. So everything this operator is doing is in units of the policies that you create. Logically, what it's doing is saying: you've created one or two or even a dozen inform-based policies, and now you're instructing TALM to go out and enforce those inform-based policies. So rather than flipping a switch in the policy, saying "enforce," and having it apply simultaneously everywhere, it's going to slowly roll that out, at the rate and in the batch size that you've defined. And again, you have control, by labels, over which sets of clusters, because those policies may apply to the entire fleet of 10,000 clusters, but using labels to select within TALM...

Ian: ...you can do a subset of that and say: maybe I only want to do a hundred clusters out of my 10,000 initially, and let that soak for a week as a canary set. It'll roll that out, and then you can say: that's been successful for this week or this month; now I want to roll it out to the rest of the 10,000, 500 at a time.
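
The canary pattern Ian describes could be expressed either by labeling a small subset and targeting it with its own CR, or, assuming the upstream API's remediationStrategy.canaries field, by naming the canary clusters directly so they are remediated before any other batch:

```yaml
# Hypothetical sketch: remediate the named canary clusters first, then the
# rest of the selection in batches. Field names assume the upstream API.
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: fleet-rollout
  namespace: default
spec:
  enable: true
  managedPolicies:
    - ocp-upgrade-policy
  clusterSelector:
    - fleet=production
  remediationStrategy:
    canaries:                 # soak these before touching the rest
      - cluster-canary-1
      - cluster-canary-2
    maxConcurrency: 500
    timeout: 480
```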

Joydeep: Right. So technically, in policy-speak terms for Advanced Cluster Management, what you're stating is: for the initial policy that you create, if you really want to make sure that a ConfigMap exists, you just create an inform policy for that, which will basically report that the ConfigMap doesn't exist. Then TALM will pick up and ensure that the ConfigMap is indeed created, at the time and pace you set in the API.

Gurney: To describe my shock: we have literally had this problem before. We've asked the question: okay, we need to do a dark rollout of an update, we want to do that for some percentage, and we want to make sure we don't have increased API error rates before we set it live. I've even worked with tech along these lines; Joydeep, you may have worked with it some too. There is a certain package that I'm remembering, for a UI...

Ian: But I did want to talk about a couple of other things. We mentioned operator updates as well, and operators are a little bit unique in that they have updates available in a registry. When you go update that registry, we want to be careful that the update in the registry does not immediately propagate out. So operators have the ability to be set into a manual mode, where updates are only applied when they're told to, and TALM has some features built into it that allow operators to be handled specially, and for TALM to actually act like a user...

Ian: ...going down to that cluster and saying: I want to approve the operator update on this particular cluster. Jun, I'm going to kick this off here in a moment; do you want to give a little deeper dive on how TALM deals with operators, and how it's actually doing the work around operator approvals?

Jun: Yeah. For operator upgrades, and even for OCP upgrades, TALM can look into the policy and recognize that these policies are for upgrades, and do specific things. One example is the pre-cache example.

Jun: We already talked about that: we will look at the versions and do the download beforehand. Another example is that when there is an operator upgrade, TALM will monitor the subscription status on those clusters and do the manual approval that's normally done by an operator, once we reach what's called the "upgrade pending" status. So that's the logic for operator upgrade policies.
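
The OLM mechanism Jun is referring to is the Subscription's install plan approval mode; this part is standard OLM rather than TALM-specific. With Manual approval, a new version waits in an unapproved InstallPlan until someone, or TALM acting on the user's behalf, approves it.

```yaml
# Standard OLM Subscription with manual InstallPlan approval: new versions
# from the catalog wait unapproved instead of installing automatically.
# Channel and namespace values are illustrative.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  channel: stable
  name: cluster-logging
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual   # updates sit in UpgradePending until approved
```

Approving by hand would look something like `oc patch installplan <name> -n openshift-logging --type merge -p '{"spec":{"approved":true}}'`; per Jun, TALM watches the subscription status and performs this approval itself when the rollout calls for it.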

Ian: So what's happening in this particular demo is TALM is working through: we had installed the 5.3.8 version of cluster logging, and we told it through a policy to go upgrade to 5.4.2, and it's working its way through, again in batches of a maximum size of three, updating those operators. Again, apologies, it's hard to see text on a screen here, but you can see in some cases...

Ian: ...this "endeavor" one has already been upgraded to 5.4.2, and the last two sites are in the process of being updated right now. They'll move to 5.4.2 as soon as TALM recognizes the upgrade-pending state and switches the manual approval to true. There we go, it just did it, and you can see that immediately that operator is now updating. So again, I just wanted to demonstrate that TALM deals with OpenShift upgrades, configuration changes, and operator updates as well.
A
Okay
yeah
this
is
this-
is
a
generic
enough
tool
that
I
can.
I
can
use
a
policy
to
teach
it
that
I
need
you
to.
I.
I
need
this
to
look
like
this
and
and
to
enact
that
change.
I
need
you
to
change
this
setting.
So
that's
what
it's
doing
there.
It's
toggling
that
manual
yeah
exactly
yeah,
that's
wild.
A
We
did
have
a
question.
I
wanted
to
surface
I'll
I'll
splash
it
up
here,
but
how
can?
How
can
you
convince
your
team
and
your
management
to
do
frequent
updates,
there's
a
push
and
there's
resistance
to
perform
upgrades?
I
know
we've
seen
this.
I
am
part
of
this,
sometimes
as
a
person
who
operates
a
bunch
of
open
shift
infrastructure,
any
incentives
to
update
frequently.
Why
should
we
not
miss
out?
I
I
think
I
think
this
is
a
good
place
to
say
it
sounds
like
from
my
perspective.
A
Talum
is
the
best
tool
in
the
world,
for
if
I
have
redundant
infrastructure
I
can
actually
have
a
blue
green
environment
and
I
can
bring
blue
green.
I
can
bring
blue
up
to
date
using
this
tool,
and
then
I
can
wait
and
see
how
that
behaves
and
then
green
can
come
along
with
it
once
things
are
healthy
once
this
has
proofed
that
out.

Ian: Yeah, I feel like that could be a show fully in and of itself: how do we convince folks to do more frequent updates and keep as current as possible? It's a great topic. Relative to TALM, and relative to what we've been talking about here: one of the ways you convince people to do updates faster is to make it safer, to reduce risk.

Ian: People's desire to not update is, in general, a risk/reward equation. So if we can provide tools and mechanisms to lower that risk, I certainly think that's part of the puzzle. I won't go so far as to say it's the whole puzzle, but I think it does come down to reducing risk.

Gurney: And I think caching that content helps too. For me, an OpenShift upgrade can take like two hours. If I can cache that content, if I can make sure those updated images are there and that upgrade doesn't have to pull a bunch of content, it happens even faster. That means my window for something to go wrong is so much tighter, so much slimmer, and that's amazing.

Joydeep: Yeah, and this does strongly incentivize you to do upgrades more frequently, because, as Ian mentioned, it takes care of certain things and makes it a little bit more solid, but depending on...

Ian: It helps to shift that risk/reward balance, because on the other side of that question is: why would I want to upgrade? And with the constant set of CVEs and security threats, there is real motivation to want to do updates, to stay current on the latest versions of things, so that security flaws, issues, holes, whatever, are closed. So if you can provide tools that lower the risk, operationally, of rolling those out, it helps to shift that balance a bit.

Gurney: The old risk/reward chart.

Gurney: I hope that answers the question; please shout out in chat if you have any other questions. Thank you for that. That was awesome.

Ian: Yeah, certainly happy to take more questions if you guys have more; that's great. But I did want to put this up here: I promised that we would do a little bit of a deep dive into how TALM is configured and some of the options that are here. So Jun, maybe I can hand off to you and let you do a bit of a...

Jun: ...deeper dive in here? Yeah, sure. I think we've covered most of them. The first part is the generic name and namespace. Then, starting with the spec: the actions part we briefly mentioned. That's another nice feature, where you can label your target clusters at different points of the upgrade process, like before or after, so that you can easily see which ones are in flight and which ones are completed. So that's the actions part. Then the cluster selector...

Jun: ...we talked about. There's that enable flag. The other thing I want to mention is that this enable part is really important, because, for example, the pre-caching and all the other validation (verifying the managed policies do exist, and that you do have clusters matching the labels) can all be done beforehand. Before you actually flip this enable flag, you know everything, as much as we can: everything's downloaded and your policies are in good shape.

Jun: Then there's the list of the policies, and we enforce them in order. The other thing I want to mention is that, within each batch...

Jun: ...we progress each cluster independently. It's not like we do policy one on all the clusters before we move on to the next one; within the batch, they can actually go at their own pace. Then the last one is the remediation strategy, where we define the batch size, essentially, and the overall timeout. I think that's it.
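
Pulling Jun's walkthrough together, here is a fuller, still hypothetical sketch of the spec, covering the fields discussed: the actions labels, cluster selection, the enable flag, the ordered policy list, and the remediation strategy. Field names and nesting again assume the upstream API; verify against the repo before relying on them.

```yaml
# Hypothetical annotated CGU spec, matching Jun's walkthrough of the fields.
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-example
  namespace: default
spec:
  actions:
    beforeEnable:
      addClusterLabels:
        upgrade: in-progress     # mark clusters as in flight
    afterCompletion:
      addClusterLabels:
        upgrade: done            # mark clusters as completed
  clusterSelector:
    - fleet=shuttles             # which clusters are in scope
  enable: false                  # validation and pre-caching can run first
  managedPolicies:               # enforced in this order on each cluster
    - config-policy-1
    - config-policy-2
  remediationStrategy:
    maxConcurrency: 3            # batch size
    timeout: 240                 # minutes; exceeding it yields "upgrade timed out"
```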

Joydeep: So this is the API. Let me play it back. What you're defining here is: hey, I have config policy one and config policy two, so first roll out config one and then roll out config two, and roll it out three clusters at a time in parallel. And we select the clusters per the cluster label, fleet equal to shuttles. And then you are instructing: before you start, apply these labels, and after you complete, apply these labels. That's the API. Fantastic.

Jun: Yeah. So if, for some reason, your clusters can never become compliant with one of, or all of, the policies within this timeout period, then the CGU status will say "upgrade timed out."

Ian: It allows you to keep from getting stuck. Talking at the scale of ten thousand, you might have a cluster go offline or something like that, and you don't want that to hold up the rollout. So there are timeouts built in that allow you to come back and deal with those clusters after the fact, and figure out what went wrong, whether it went offline or whether there was an issue.

Gurney: That way you know what you need to remediate. But also, at the scale of, like, 10,000 items, I can imagine there's probably one or two or ten or a hundred that are down at any point in time for some reason. We're talking physical hardware sitting out in the wild, so there's a decent chance you're going to have some level of acceptable outage at any point in time.

Ian: Yep, exactly. This is designed to get the maximum amount of work done that it can, to move your entire fleet forward based on these policies, but it's not trying to re-implement anything about policies. Policies are a really good tool for managing state and for describing how you want things, and they've got the visibility, so you can see which clusters are compliant and which ones aren't. This is just an adder on top of that.

Gurney: Amazing. That is awesome. And then, oh yeah, the other important question that I saw; I'm stealing Joydeep's question here, by the way. A little back room: Joydeep and I have random thoughts beforehand that we make sure we write down; I have them typically in the shower. The good shower thought is: does GitOps fit well into this paradigm? So I've pushed a change, everything is driven by GitOps, I have my fleet of clusters that's all driven by GitOps.

Ian: You hit the nail on the head. I'm slightly embarrassed that you said GitOps before I did; we're all about GitOps. Again, we're dealing with scales of thousands and tens of thousands of clusters; that's got to be manageable in a really rigorous way, and GitOps is a fantastic way to do that. That's a whole topic on its own, but yes, we haven't supplanted anything in those flows.

Gurney: So I can make my GitOps change and not, you know, finger-check a change to 10,000 clusters at the exact same time, with no remediation, no timeout, or anything.

Gurney: Yeah, so, yes, exactly: git blame to see who to blame for this one. That canary column, I'm really curious about that; it's very interesting that I can have that defined. You push a change, a PR comes in, it runs through CI, it gets merged, and then you still have one extra protective layer of canary in that rollout to see if something goes wrong, so your ops team won't blare up immediately and say: oh no, everything's wrong.

Ian: Yeah, exactly. Jun highlighted a couple of things here, these labels, for example, that are not core and central to the story of progressively rolling out the state and the changes, but these are really features that go with that additional theme of: let's make this really usable in an operational environment; let's put some additional tools in here that make it easier for the people that are using this.

Ian: I think we've got one more slide here that talks about a few of those other features, and so, at least briefly, I wanted to touch on these. Or, Jun, I think you can do a much better job touching on these than I can.

Jun: So yeah, I think we've touched on pretty much all of that. Chaining: that's where we do blue/green, one CR for blue and one CR for green, and chain them together, so that the one doesn't start until the other one completes. And we talked about the sequencing and the ordering...

Jun: We do enforce the policies, on every cluster, in the same order as they show up in the CR. And then we talked about the pre/post actions, and pre-caching. And I think you hit this really well too: you make a change in the policy, but it doesn't take effect, yet you can see which clusters will be impacted if we do make it happen. So I think we covered this one pretty well.

Jun: Yeah. And okay, maybe this is just one use case in my mind: if you make a change, you expect maybe this set of clusters, say half of your clusters, is supposed to go non-compliant. But you made a mistake, and you see all of them go non-compliant; then you know right away, this is something I need to take another look at.

Ian: That's huge. The other one is: maybe you made a mistake, there was a typo, and that managed to get its way through. You have the opportunity to review that in the inform policy prior to pushing it to your clusters. If it were an enforce-based policy, the moment you hit git push and that got synced to the hub, it's going to start going live immediately. So again, you have the ability to see the scope and impact of all of your changes.

Gurney: So A goes through waves, B goes through waves. So I can have a region that has an A and a B that are labeled separately, a blue and a green, and I can do blue, and then, if blue's up and successful and meets these criteria, I can do B in a chain afterward. And the same for regions: maybe I update northeast, and then I go southeast, and then I do central, and we just go through these regions as a chained series of upgrades.

Jun: Yes. One more note on chaining: it definitely can be used the way you just described, but it can also be used on the same set of clusters, where you want to group your change sets into two pieces and you want the first group to be applied to all the clusters before you start the second piece on any one of them. So that's another dimension.

Gurney: So if I have three SRE teams doing rollouts for three different applications: app A is going to make a change through that sequence, they're on call, okay, it completes, and then B starts their rollout and makes their change. Or even dependent changes...

Ian: That is awesome, yeah. Another really practical example: I want to go do all of my OpenShift upgrades, and then I want to follow that with my operator updates; I want to do them in that order. So yeah, a lot of different...

Ian: ...ways to bend it to the type of operational scenarios that you want.

Ian: You can really tailor the way that the change occurs, tuned to what you want to happen.

Ian: So yeah, we've covered a lot of ground. There are certainly more aspects of TALM that we can dive into, and we're always happy to take questions. But we kind of mentioned it up top, and I did want to mention it again: TALM is building on top of a lot of really good technologies, and we really benefit from those.

Ian: Obviously we've talked a lot about policy and ACM and the tools that ACM is providing. Within our use cases, it intersects incredibly well with the initial deployment of clusters and the assisted service, and we've gotten fantastic help from our integration and field teams, who give feedback on this and help work through some of these real operational constraints.

Ian: So a lot of really good technology, and a lot of good folks, went into building this out. And I can't leave off one of my favorites: testing these things at scale. That sheds some really interesting light on how valuable it can be, and also on where things start to rattle, and it gives us the opportunity to tune those up and really have this roll out at scale and deal with large-scale fleets like that.

Ian: Over 2,000 clusters is the typical scale environment that we've been working in. So we've rolled this out to thousands, and certainly then you can start to scale this out more horizontally as well, so you can get to those really large-scale deployments by replicating this out.

Gurney: So, the last question I always ask. We're right about at time, so we'll wrap up; it has been a pleasure having you. But the most important two questions. First, how can people get their hands on this? I have OKD, I have OCP. I'm guessing, from the installing section of your readme, that your git repo is the best place to go, right?

Ian: Yeah, you can absolutely do that. This is a project in the upstream repos; I think you posted the link up front, and if not, we can certainly drop a link here to the cluster-group-upgrades repository on GitHub. We certainly have the downstream versions as well within Red Hat, built out as an operator, and it just builds on top of all that great work in ACM and the policy engines.

Ian: It'll be published as the Topology-Aware Lifecycle Manager, I believe.

Gurney: Amazing. Okay, well, I think that wraps us up, unless you guys have any parting thoughts. The only other one is: send us an email at cloudmultiplier at redhat.com if you have any questions, want any links to that downstream repo, or have follow-ups for Ian and Jun; we will loop them in on those emails if they come in.

Ian: We certainly enjoyed it. These issues are a lot of fun to dig into, and folks are always welcome to reach out to us; we're glad for questions and comments, and would love that.

Gurney: Thanks for joining us. It's been magnificent. Okay, we're going to roll the intro as an outro once again; I think we're going to stick with it at this point, it's just habit. We'll see everyone in two weeks with a fresh new topic that isn't at the top of my mind, or the tip of my tongue, right now, but we do have it scheduled already. So see everyone in two weeks. Thanks!