From YouTube: 2020-04-30 - Cluster API remediation meeting
B
Okay, so there are basically two proposals that were submitted. The reboot remediation one is the more recent one; it was spun off from Red Hat, and I don't know how involved you were with that one. And then there is the second one, the external remediation controller, where the idea is basically to move the remediation part, which at the moment is just deleting machines from the machine health check, into an external provider remediation controller.
B
So
we
would
like
to
well
discuss
both
if
we
have
time
but
probably
focus
first
on
the
on
the
first
one,
because
it's
much
more
straightforward
and
like
less
things
to
change
with
regard
to
the
with
a
current
the
current
states
in
in
in
cluster
API
I,
don't
know
if
you
want
to
take
over
now
file
for
this
for
this
proposal.
C
Sure. So basically we are coming from the bare metal world, where we have bare metal servers. Deleting a machine means that we need to reprovision it, and that usually takes a lot of time; it could be hours, since we may need to download some images, etc. It's not like a cloud provider, where you delete the machine and it's reprovisioned in just a few seconds or minutes. So we would like to avoid machine deletion if it's possible.
C
So at first we would like to try maybe to reboot the OS. Maybe a reboot will be quicker, and maybe it will resolve the transient issue that led to the health check failure; and if it doesn't, maybe then we might want to reprovision it again. But the main motivation for us is to start with a reboot in order to save valuable time on transient errors where we want to remediate the machine. Does that make sense?
B
So you were maybe considering some kind of escalation path, where you would try steps that are stronger remediations each time. For example, you would try a couple of times to reboot, maybe once or twice, and then, if it's not successful, you would take a stronger action, which would be reprovisioning; and if it still doesn't work, then maybe a machine deletion. That means you probably end up on another host, and that could fix the problem that you're hitting.
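The escalation path described here can be sketched as a small ladder of increasingly strong steps. This is only an illustration of the idea; the step names and the number of reboot attempts are invented for the example, not part of either proposal:

```python
# Illustrative sketch of the escalation path discussed above: try weaker
# remediation steps (reboot, up to twice) before falling back to stronger
# ones (reprovision, then machine deletion). Names and counts are made up.

ESCALATION_LADDER = ["reboot", "reboot", "reprovision", "delete"]

def next_remediation_step(failed_attempts):
    """Return the next step to try given how many attempts already failed,
    or None once the ladder is exhausted."""
    if failed_attempts < len(ESCALATION_LADDER):
        return ESCALATION_LADDER[failed_attempts]
    return None
```

A real controller would persist the attempt count somewhere (e.g. on the machine object) so the ladder survives restarts.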
B
So basically, the first goal that we're trying to achieve is just to introduce something that is less radical than machine deletion. That's what is behind this first proposal about the reboot remediation: maybe not completely taking the remediation part out into its own provider, but at least allowing for another way to try to resolve the problem that we're having with the machine.
C
Okay, so first of all, we are using a Metal3 cluster, where we use the IPMI protocol to talk with the hardware and power it off and power it on; we connect using the management interface of the hardware. The flow basically goes like this: a machine is detected to be unhealthy, and we want to power it off.
C
If we didn't do this, we might end up with corruption: we delete the node from the cluster, but maybe this node is still running somewhere, and meanwhile the scheduler has assigned its workload to some other node. We might end up with corruption for stateful apps that run only one instance.
C
Yes. First the machine health check controller detects that it's unhealthy. Then we power off the host, and this is our way to make sure that the node is really down, that it's not just hung and maybe still writing data into some storage or something else. So only after we power the host off, and we know that it's off, can we safely assume that we can delete the node from the cluster.
B
No, because the actual power cycling of the infrastructure machine depends on the infrastructure provider. If we are talking about a Metal3 node, for example, then we will use IPMI to shut it off, but it will probably be a different procedure to do that on any other infrastructure provider.
E
I think one thought there is that providers with VMs will have their own fault-tolerant capabilities and services running, and this might cause conflict if we're treating bare metal and VMs the same way. I guess I don't know how it's implemented; I'm just curious whether this could be done in a provider-specific way. So if it's bare metal, you would maybe do this; if it's a VM, you would not do this, and you would instead rely on the services of the hypervisor.
F
There was a closed issue basically along this line: some things support reboot, some things support power cycling, and maybe not vice versa. The thing I came up with was that the infrastructure provider provides a list of capabilities, those get synced to the Machine, and then well-behaved clients can see what capabilities this infrastructure provider supports, make some change to the Machine, and that will get carried out down the line. I can link it today.
F
The client could be the machine health checker; it could be something else. One idea I had was a snapshot controller: if you want to take a snapshot of it, you need to know if you can power it down. On AWS, most standard instances can be stopped and started, that's pretty easy; however, spot instances don't support stopping, so that would not be a valid behavior. So that would be up to the infrastructure provider to determine, based on the individual machine.
F
Yes, and in that way your remediation controller, like a snapshot thing, can look at those fields and say: am I allowed to do this action that I want to do? Or, in the case of the machine health check controller, it might have different logic, since some platforms support native reboot and some platforms only support power on/power off. It can ingest that and then make an intelligent decision.
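The capability-driven decision described above can be sketched as follows. The capability names and the fallback order are assumptions for illustration; they are not the actual Cluster API types or constants:

```python
# Sketch (not real Cluster API types) of a client making a decision from
# capabilities synced onto a Machine's status. A well-behaved client picks
# the gentlest action the infrastructure provider actually advertises.

def plan_remediation(capabilities):
    """Pick reboot if supported, else a power cycle, else fall back to
    today's delete-based remediation."""
    caps = set(capabilities)
    if "reboot" in caps:
        return "reboot"
    if {"power-off", "power-on"} <= caps:
        return "power-cycle"
    return "delete"
```

Under this sketch, a spot instance that advertises neither power capability would simply fall through to deletion rather than erroring.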
F
Maybe some instances are integrated with some kind of storage platform, and I want to give those a good old-fashioned reboot first, for whatever reason. This would still be there; it would just be informing the client. If the machine health checker says this needs to be rebooted, but the machine doesn't support reboot, well, I'm not going to try to go through the steps and reboot this thing, because it's not going to work; I'm going to maybe throw an error of some kind, or whatever.
A
Without discussing or debating the merits of adding power states to machines, I think you probably could get the same type of behavior that you're interested in if we did external remediation, because then you could have separate code that is managing the power states of your infrastructure machines, right?
G
I was wondering whether a reboot could be something that a user may just want to do on a machine, or that some controller other than the machine health checker might want to do to a machine for some reason. Actually having some way to request a reboot for a machine that is external to the MHC means, one, that the machine health checker doesn't need to know anything about the machine, but, two, that it could be reused elsewhere by other people as well. So that might be an alternative that could be added to this and investigated.
F
So
you
basically
need
to
come
up
with
some
kind
of
API
that
says:
I
want
to
reboot
this
thing
and
have
I
already
Requested
that
so
I'm
not
just
constantly
hammering
it,
but
there's
kind
of
a
disconnect
between
requesting
and
reboot
getting
a
reboot,
especially
there's
multiple
actors
in
place.
And,
yes,
we
also
want
to
add
the
ability
for
eight
end
user
to
potentially
reboot
a
machine
or
some
other.
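One way to get the "have I already requested that" property is to record the request as an annotation and only set it once. This is a sketch of that idempotency idea; the annotation key is hypothetical, not a real Cluster API annotation:

```python
# Illustrative sketch of a reboot-request contract expressed as a machine
# annotation: the requester records the request once and does not hammer
# the API while a request is still outstanding. Key name is invented.

REBOOT_REQUESTED = "example.cluster.x-k8s.io/reboot-requested"

def request_reboot(machine):
    """Set the reboot request if none is pending; return True if newly set."""
    annotations = machine.setdefault("annotations", {})
    if REBOOT_REQUESTED in annotations:
        return False  # request already outstanding; don't re-request
    annotations[REBOOT_REQUESTED] = "true"
    return True
```

Whoever carries out the reboot would remove the annotation on completion, closing the loop between requesting and getting a reboot.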
A
What about the fact that not all infrastructure providers support the same set of power states? I mean, you mentioned spot instances don't necessarily, and there may be other providers that have mismatches. Are we potentially opening the door for this to be a bit messy, just because people could get confused about what's available and what's not as it pertains to remediation and power states?
F
That's the infrastructure capabilities thing I was talking about earlier, where the infrastructure provider determines what it can actually do with this machine, and those capabilities get copied to the machine object, or rather the machine object copies them, with status field syncing like we're doing with some of the other things. I have a PR out for this, actually. This way, well-behaved clients and administrators can refer to that status field of a machine to see what can actually be done with it: does it support reboot or not?
F
Exactly. So it's only going to support capabilities that we know about and can advertise. I mean, we could open it up broadly and just sync capabilities from the infrastructure and not care what those capabilities are called, and then third-party actors, if they happen to know what one of those capabilities is, can consume it. The PR that I have out right now is using constants, the power-off/power-on kind of stuff, so it is a finite set; but from a technical point of view, there's nothing really stopping an infinite set.
F
It
just
might
start
to
kind
of
get
in
the
weeds
about
trying
to
control
the
behavior
of
the
machine
controller
itself,
whereas
it's
gonna
have
to
know
about
some
of
those
capabilities,
particularly
around
like
power
and
stuff,
like
that,
depending
on
how
we're
actually
going
about
setting
the
power
state.
If
that
becomes
part
of
the
spec,
then
the
Machine
controller
needs
to
know
something
about
a
power
off/on
capability,
so
it
can
try
to
help
validate
or
actually
enforce
that
state.
H
So how do we deal with potential conflicts between different components and their expectations around these behaviors? For example, looking at the cloud provider for AWS, we can easily implement an additional power state, it's there, but we would also be conflicting with the cloud controller manager and the behaviors that it implements. So how would we ensure, as we're adding these features, that we don't end up with conflicting services trying to take actions against the cluster?
F
Well, I'll respond to that. This is actually something I brought up yesterday at the meeting; I've got an upstream KEP called node maintenance lease. When you have multiple actors, not necessarily related to Cluster API, that want to do disruptive things to a node, we put some information tied to the node, rather than the Machine, saying: hey, I've got control of this node right now, I'm going to disrupt it in some way, or I'm going to prevent it from being disrupted, and we're all coordinating on that.
F
So if two components want to do some kind of power state management, the first thing they should do is check for that maintenance lease, to make sure that somebody else isn't actively working on something. In our particular use case, if we're talking about, say, the machine health checks, and the administrator wants to take the machine down for maintenance, then we need to have some way to coordinate those things. We don't want the machine taken down for maintenance by an admin saying "hey, power off" while at the same time the machine health checker says "hey, what happened to this node? I'm going to delete it." We don't want those two things to happen, so we need some way to coordinate that action anyway, and what better place to do it than at the node? That way, other things unrelated to Cluster API can be part of the same abstraction.
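The check-the-lease-first rule can be sketched very simply. The lease is modeled here as a plain per-node holder map, not the actual Kubernetes Lease type from the proposal:

```python
# Sketch of the coordination idea above: before doing anything disruptive,
# a component checks whether another actor already holds the maintenance
# lease for the node. Lease modeled as {node_name: holder}; illustrative only.

def may_disrupt(leases, node_name, actor):
    """An actor may disrupt a node only if nobody, or it itself, holds
    the node's maintenance lease."""
    holder = leases.get(node_name)
    return holder is None or holder == actor
```

In the admin-versus-health-checker scenario above, the admin's lease on the node would make the health checker back off instead of deleting it.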
H
I don't disagree there. The only complication is that that doesn't exist upstream yet, and our support contract around management clusters and workload clusters is such that it wouldn't necessarily be in place yet. So do we make that a requirement, a new minimum version requirement on either the management cluster or the workload cluster, that is needed to support that?
F
There are a couple of different ways that we can go about doing this. I originally just wanted to make this an annotation; the input that I received from Clayton was to use these lease objects instead. That's been a primitive in Kubernetes for a while, a core type, so we wouldn't necessarily have to wait for a release of Kubernetes. We could do one of those get-or-create things if we wanted to roll out the feature ahead of time.
A
I haven't had a chance to add this comment yet, but I'll say it here. I do think that, with external remediation, when the external actor has done its work, it could go back to the MHC and annotate it to say: I finished my work, please reevaluate. Then the MHC could make that determination again and reprocess it, rather than putting the burden on the external remediator to validate that it's healthy and then remove the annotation and hope that the MHC doesn't read it while the timing is slightly off.
B
That
could
even
be
independent
of
the
remediation,
like
that
the
Machine
health
check
would
just
put
the
annotation
and
leave
it
there
until
the
node
is
healthy
again,
then
it
would
be
on
the
remediation
controller
side
to
like
half
the
proper
timeouts
in
place.
Saying
like
I,
expect
that,
like
I,
don't
know
like
five
minutes
after
rebooting,
the
Machine,
the
annotation
or
whatever
should
be
gone
right.
If
it's
still
there,
it
means
that
might
step
failed
and
I
should
then
and
like
take
a
further.
So
the
step,
probably.
G
If I can chip in there: I was thinking about this problem earlier, because I've had a few conversations this week with various different people about it. The conclusion that I came to is that perhaps the external remediation/reboot thing should leave the annotation on the machine that says "this needs a reboot" until either some timeout happens, say it thinks it turned the machine back on 15 minutes ago.
G
It
then
says:
okay,
this
probably
isn't
great,
or
it
detects
that
the
load
has
come
back,
because
once
the
node
has
come
back
and
re-register
the
machine
health
check
should
be.
It
was
redo.
It's
like
proper
health
checking
and
you
would
assume
it's
either
gonna
be
healthy
or
not.
The
conditions
on
it
will
be
relevant
again,
because
the
time
stamps
would
have
just
been
updated
because
it's
a
new
node
so
like
as
Nia,
was
saying
a
few
minutes
ago
like
if
we
can
keep
the
annotation
on
there
until,
like
the
the
node
comes
back.
B
So
like
whether
the
the
machine
remediation
takes
steps
to
remediate
or
not
as
long
as
the
machine
has
check,
finds
the
node
unhealthy,
it
would
leave
the
annotation
there
and
then,
if
the
machine
has
health
check
figures
out
that
the
node
is
now
healthy
again,
it
would
then
remove
this
annotation.
Hence
the
the
Machine
remediation
controller
would
know
that
the
remediation
actually
succeeded.
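This annotation-plus-timeout contract can be sketched from the remediation controller's point of view. The outcome names and the default timeout are illustrative only:

```python
# Sketch of the contract just described: the machine health check keeps an
# unhealthy marker on the machine until the node passes checks again, and
# the remediation controller judges its step by the marker's fate.

def remediation_outcome(marker_present, seconds_since_step, timeout=300):
    if not marker_present:
        return "succeeded"   # MHC removed the marker: node is healthy again
    if seconds_since_step > timeout:
        return "escalate"    # marker outlived the step: try a stronger action
    return "wait"            # still within the step's timeout
```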
B
Yes, and that was actually part of the problem. Well, I'm sorry, I have no idea why all this text was struck through, but initially the idea was to have a status, to put it one way or another, so as to be able to give the status back. But there are two people who raised their hands; Michael first.
G
I was going to add that I'm not sure a two-way communication would actually be necessary, because the machine health checker watches for events on nodes and machines, and you'd expect that when it's becoming healthy again, one of those would trigger a reconciliation for the machine health check anyway. Good point.
B
Well, the reason we proposed the CRD at first is that, from our perspective, it kind of looks a bit cleaner with an actual API that we could validate, and we could make sure that everything flows the way we are expecting. Also, it would allow us to have the same principles as there are now for the infrastructure provider, for example.
B
I'm sorry, would it matter whether, if we have multiple external remediation providers under the hood, they would all have to be watching the same CRD? If we have a CRD, it seems that may be a clearer contract than having them watch another object just to check the annotations.
A
With bootstrap providers, in practice you'll probably only have one that you're using per cluster; but because the bootstrap provider configuration details are a reference to another resource in the namespace, you theoretically could use different bootstrap providers for a single cluster.
B
We could have a kind of similar approach: if in the machine health check we link to, for example, a template for a provider-specific CRD, then the remediation CR that would be created for the machine health check would be directly the external remediation provider's CRD.
A
Yeah, we could go that route if there's a need to have provider-specific information in a remediation request. Oh, sorry, I was going to say: alternatively, if we don't need something that's provider-specific, we could have a generic machine remediation CRD, and if the assumption is that infrastructure providers are implementing remediation, then they could all watch all the machine remediation custom resources and only act on the ones that are for machines that they are owning and provisioning.
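That watch-everything-but-act-on-your-own idea can be sketched as a simple filter. Object shapes here are plain dicts standing in for the hypothetical generic MachineRemediation resources, not real CRDs:

```python
# Sketch of the generic MachineRemediation idea: every infrastructure
# provider watches all remediation requests but acts only on those whose
# target machine it owns and provisions. Field names are invented.

def requests_for_provider(requests, provider):
    """Return only the remediation requests owned by this provider."""
    return [r for r in requests if r["machine"]["provider"] == provider]
```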
B
Internally, in what we are planning for Metal3, even if we use an annotation from the machine health check to transfer the information towards the external remediation, we are anyway planning to have a CRD that would be specific to our external remediation controller. So if we can make this more general, in the way that it would be kind of a template, let's say, as we have now a KubeadmConfigTemplate, then the machine health check could directly generate the machine remediation CR. Then we would be able to skip the step of the annotation and store all the state that we need internally, in this remediation provider, in that CRD. And if we have a clear contract, then the machine health check would be able to fetch from the status whatever field is needed, the same way that the machine controller does with the infrastructure provider machine.
A
I mean, plus one to getting templating in place. This is kind of what we discussed last week as well: have an infrastructure-provider-specific CR that could then be created on demand. This would effectively give a way to do external remediation, but users could attach any object to a machine health check resource, and then the whole lifecycle, going from unhealthy to either back to healthy or deletion, would be in the hands of the external controller.
I
We haven't discussed short-circuiting, but I guess if we create a new remediation request, we can potentially have the final contract say, I don't know, maybe something like a ready state, like we already have today, so that we can understand whether the new object we created is ready or not; and if it's ready, we can reevaluate. Yeah, we'll need to also add watchers and things like that.
A
I think it's maybe fairly simple just to say: if you are a controller that's managing multiple replicas, which applies to KCP and MachineSet, and you're evaluating your machines, you look to see if one of the machines is annotated as unhealthy, and if you encounter that, well, I guess if it's doing the delete remediation, the controller would do that itself, right? Yes. I think we should still offer deletion as the default case. Regardless, we can just add a new annotation that says the strategy is external, meaning "don't do anything," and then the MachineSet or KCP or whoever else looks at it, as we're saying, just checks whether the strategy is either empty or not there; if it's anything else, it won't do anything and would just expect something else to take care of it.
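The default-versus-external split just described reduces to one check in the owning controller. The annotation key here is invented for illustration; it is not an agreed-upon Cluster API annotation:

```python
# Sketch of the rule above: the owning controller (MachineSet or KCP)
# performs the default delete remediation only when no external strategy
# is set on the unhealthy machine. Annotation key is hypothetical.

STRATEGY = "example.cluster.x-k8s.io/remediation-strategy"

def owner_should_delete(machine):
    """Delete only when the strategy annotation is empty or absent;
    anything else means an external controller owns remediation."""
    return machine.get("annotations", {}).get(STRATEGY, "") == ""
```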
A
Basically we needed some pre-delete behavior, which ties into what Michael is asking for and one of the other proposals, where we needed to be able to manipulate etcd before we delete. If the only actors involved are the MHC marking it as unhealthy and a delete controller deleting it as the remediation, then there's no opportunity for KCP to easily remove it from etcd. Or maybe there is and I just didn't come up with it.
D
I don't think it's ever going to be as simple as "just delete"; we can't just react to anything with simple scale-down and remove-from-etcd logic. So maybe what it means is that a deletion strategy only makes sense on MachineSets, and, you know, KCP has its own remediation strategy.
J
The thing is that, in my opinion, deletion should be, or is, the charge of the owning controller, so it should not be considered a remediation strategy. So basically the default is that there is no remediation strategy, and reboot or whatever other remediation strategies we have are probably a separate distinction.
I
Oh, I see. I think we should probably keep the same remediation strategy with the owning controller. In the case of external remediation, if the machine is deleted externally, KCP is able to react to that and delete that machine from etcd today; when you want to scale down, it will be able to reconcile the etcd members.
A
Right
so
I
see
your
comment
about.
If
hooks
existed,
then
we
wouldn't
need
any
external
or
mediation
controller.
The
I
think
the
one
issue
there
is
that
you
like
we
had.
We
hooks,
are
like
pre
drain
and
pre
delete
which
both
in
the
current
flow
require
that
you
hit
delete
or
that
you
delete
a
machine
and
that
it's
it's
you
know
pending
deletion,
so
I'm,
not
sure
that
hooks
would
strictly
solve
the
problem.
G
Sorry, I didn't get to finish what I meant before. No, it was more that the external remediation controller and the MHC wouldn't need to know what's running on the machines. They wouldn't care if it's a KCP machine or if it's got some storage or something like that. Oh yeah, that's quite a nice benefit of doing the hook thing: we just need to come up with some mechanism that allows these hooks to run before the remediation controller does its remediation. For deletion, that's easy.
A
I think the external remediation is going to end up being more powerful and give you the escalation path. I know it's listed as a non-goal, but I think that if we go and just try to do the power-based recovery, we need to do the power primitives and we need to add them to the Machine API, which I think is going to be a harder sell than getting this work done with an external remediator.
F
Right, we're aiming to bring the machine back in the same exact configuration that it's already in, so I don't really see a need, other than draining user workloads, when it reboots. So I don't see where all the hooks are necessary for that. We could definitely come up with something like that, but that's my impression.
B
If
I
may
to
react
to
this,
the
idea
of
like
doing
it
with
an
external,
an
external
controller,
would
be
that
we
are
not
only
able
to
just
say,
like
yeah
reboot,
but
then
also
see
that
if
the
reboot
failed,
then
we
would
go
to
a
like.
We
actually
would
have
a
path
defined
by
the
controller
on
the
actions
that
we
would
yeah.
We
would
take
yes.
B
It's not the machine controller that deletes the machine itself, but its owner; so, for example, it's the MachineSet or KCP that deletes the machine. This is to allow, for example, KCP to remediate that machine properly by doing a scale-up before a scale-down, or removing it from etcd, and things like that.
A
I'm fine with that: if we can put together a flow in the future where we have a pre-drain/pre-delete hook option, where the MHC issues the delete and KCP does a hook where it says "go remove this from etcd," and if we can make that work, sure. But I think we're in a transitional state, as you mentioned, Joel, so this will allow us to move forward and experiment a little bit, see how things go, and we can certainly revisit. I mean, as I said, Michael, I'm fully on board with the hooks.