From YouTube: OpenShift Commons Gathering 2019 Santa Clara: Future of cgroups (Breard, Heo & Brandenburger)
Description
OpenShift Commons Gathering 2019 Santa Clara
Future of Linux Control Groups
Ben Breard, Red Hat
Tejun Heo, Facebook
Filipe Brandenburger, Google
A (Ben Breard): I've been mainly focused on everything we're doing on the core OS, the RHEL side of the house, and how we're tying into OpenShift, which was talked about earlier this morning. I've also spent a lot of time working on systemd and some of the larger container technologies and so forth, and we're going to talk about a super important topic today. If you're familiar with control groups, raise your hand; has anybody ever heard of these? So, like half the room?
A: No, no, leave your hands up now if you've ever logged into a Linux box within the last five years. I'm a little surprised at that, but okay. Now what about if you've logged into Facebook, come on, guys, or used some of Google's services? Every hand in this room should be up; alright, that's pretty much everybody. Okay, so you have all interacted with Linux control groups, whether you've known it or not. This is one of the primary kernel APIs that we use for containerization, isolation, accounting, and all this kind of stuff. So anyway, we're really excited that these guys came to be with us today; thank you guys for being here, it's a big deal that they're here. Tejun Heo is the upstream maintainer for control groups, from Facebook. So, Tejun, tell us.
C (Tejun Heo): If you think about a web server, this is from our production web servers; we have a lot of web servers, but everybody has a lot of web servers, right. If you think about a web server in a large fleet, it's not going to just have that web server on it. It's going to have a lot of other things running: Chef or other machine-maintenance stuff and all that, and sometimes those things go wrong.
C: You run Chef, it runs yum and whatever, and somebody makes one innocuous one-line change, and sometimes that just leaks a lot of memory. Imagine that happening; this is simulating that. The purple line, just consider the little purple line, is the RPS, requests per second, while we're doing load testing, so the test web server is fully loaded.
C: If you look at the first red line, that's where we start a 10 MB/s memory leak in a part of the system which is just a support part, the management part, not the main workload, and it starts leaking memory. After about, I don't know, four or five minutes, it consumes whatever memory is left in the system, and the system starts thrashing. Then it dips, because there's no memory, and this is a hard-disk machine.
C: Hard disks are really slow; if you run out of memory, you're accessing the hard drive, and it's slow, so the graph dips. The kernel reclaims some memory, it manages to come up again, then it comes down again, and it just dies there. That flat line is just loss of data points; the machine completely checks out after a while. We disabled our remediation mechanism for this test, so it stays down longer.
C: With remediation it would come back a little bit sooner, but it's still the same thing. After about, I don't know, half an hour, the machine cold-reboots and then comes back up again. Now imagine this happening, synchronized, across a lot of machines, and that does happen in the fleet. It's kind of surprising when it happens, and it's really scary: some change rolls out, some bug triggers at the same time everywhere. That's really scary.
C: If this happens on a lot of machines at Facebook, Facebook is going down; nobody's going to be happy, everybody's getting paged, so it's not a happy situation. Now look at the green line: that's the same thing, exactly the same testing, but with resource control set up to protect the main workload from the rest of the system. At the first 10 MB/s leak we started the same thing: it drops a bit, drops a bit more, then recovers, and it's completely fine. So we started it again, another leak; same thing.
C: So from purple to green, that's a lot of improvement, right? We all want to have that. We are the resource control group at Facebook, and this is our mission statement: work-conserving full-OS resource isolation. To unpack that a bit: work-conserving means that we don't want to pay. We want to have resource isolation, but we don't want to pay overhead, nominally. It's not like we can go into our production tiers
C: and ask the teams to put severe restrictions on their allocations. We want them to be able to keep doing whatever they've been doing, and we want to layer resource isolation transparently on top; that's our goal. And if you think about it, it sounds simple, right? If you have control groups, which, we're told, can categorize workloads and distribute resources, it should be easy. Now, "fbtax" is the term we use for the management part of the system.
C: Every host in our fleet has to pay a tax to be inside the Facebook fleet, so there's fbtax, and so the project became fbtax2. We want to protect the main workload from malfunctions in the tax part, and we chose this project because it's the minimum: if you have working resource isolation, this should be possible, and it's the minimum you should be able to achieve. So this is the minimum viable product in terms of memory and IO isolation.
C: We mostly investigated memory and IO isolation, because CPU isolation is easier and its difficulties are of a different kind, so we concentrated on memory and IO for this project. And these are the requirements. When something misbehaves in system.slice, or in the rest of the system that is not the main workload, the impact on the main workload should be limited; the main workload should be able to survive.
C: It might not be 100% perfect, but the impact should be something like 10-20 percent for a short while, so that the fleet can stay up. And, as I said before, we didn't want our applications to be changed at all; we wanted resource control to be layered on top transparently. And of course it has to be work-conserving; we don't want any performance regression. We can't sell this if teams have to pay five or ten percent; it just doesn't fly. So, it sounds simple, right?
C: So, one of the problems... I don't see any clock here; let me... okay, sure. So there were a lot of challenges in different areas, but the biggest one, the one in terms of memory management, memory control, was this. If you look at cgroup v1, there are two knobs for memory control: one is memory.limit_in_bytes, and the other one is memory.soft_limit_in_bytes.
C: There are subtle differences between them, but what they ultimately do is put a hard cap on how much memory that cgroup can consume, and this never really worked well. We still tried to use it; I mean, it's the obvious thing to do, right? I want to protect the main workload from system.slice, so we put a memory limit on system.slice, and things should be fine. It didn't really work out, because it turned out that under load, machines are often oversubscribed.
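For reference, the v1 approach being described looks roughly like this; a sketch, assuming the legacy memory controller is mounted at /sys/fs/cgroup/memory and systemd has created system.slice there, with illustrative values:

    # Hard-cap system.slice's memory in cgroup v1 (the approach that didn't work out)
    echo 1G > /sys/fs/cgroup/memory/system.slice/memory.limit_in_bytes
    # The "soft" variant only takes effect under global memory pressure
    echo 512M > /sys/fs/cgroup/memory/system.slice/memory.soft_limit_in_bytes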
C: Not constantly; if the machine were constantly oversubscribed, it couldn't sustain the workload. But it would nominally, temporarily oversubscribe here and there when something happens, and it might throttle a bit, but the machine would be able to sustain that. The problem with putting hard limits on memory consumption is that if you put the limit too low, if you restrict the management part too hard, then the whole system suffers, because the management part is constantly thrashing.
C: If you set a memory limit on something, and it's lower than that thing's natural working set, it's going to generate a lot of IOs, because it doesn't have a lot of memory. The kernel's memory management kicks in and kicks out what it thinks are cold pages, which are actually the active working set, and soon after, you fault them back in. That just generates a lot of IO, and whether you have swap or not doesn't really matter.
C
All
your
code
pages
get
swapped
kept.
You
know
48
out
and
48
back
in
so
that
just
generates
all
other
wires
and
if
you're
heading
like
IO
storm
happening
in
in
the
measurement
part.
This
point
of
fact
your
main
workload
right
if
main
okhla
test
anything
any
I/o,
it's
gonna
get
affected,
and
so
yeah
tell
us
another
problem
that
we
noticed
and
if
you
remember
the
first
scrap
that
I
showed
you
right.
There's
like
this.
Can
you
mean
a
stretch
where
there's
no
data
point
being
reported
right?
C: The machine is still alive: it's powered up, it's running full tilt; if you look at the energy consumption from the management interface, it's consuming all the power there is. The problem is that the kernel's way of recognizing that the system doesn't have enough memory is kind of crude, and, in a sense, it has to be really conservative, because you don't want the kernel to be killing things willy-nilly.
C: So the kernel's criteria for triggering OOM kills are really conservative, and that often means you fall into a condition where the system is really not doing anything; the only thing it's doing is thrashing, but the kernel would still think that it seems to be making forward progress. So your service is down, but the kernel thinks everything is okay.
C: That's how you get that twenty-minute stretch of the machine being unresponsive, and then something external has to resolve it by rebooting the machine. Obviously that's not good, and it also combines with the first point: if you set a memory limit, the really interesting thing is that you can fall into this thrashing condition even with free memory available.
C
A
cigarette
has
memory
limit
the
workload
you
know
hits
against
it
and
it
goes
it
tries
to
go
over
buddy
can't
so
it
keeps
thrashing,
and
then
they
can
actually,
you
know,
bring
down
the
whole
system
to
make
the
whole
system
on
these
pansit.
So,
by
selling
memory
limit
you
actually
made
your
system
worse.
C: And on the IO side: if you create a journal entry, that creates strict ordering there, so you can't really defer it; it just has to be executed right away. But if you think about it, it should still be charged to the one who caused that IO. None of the existing IO controllers did that, which means that if somebody causes a lot of metadata IOs, or a lot of swap IOs, they would get away with it without being charged, and that obviously breaks isolation. So we worked a couple of years on all this.
C: It took a lot longer than we expected, and these are the solutions that we came up with. In cgroup v2 there are memory.low and memory.min; so there's low, min, high, and max. High and max: high is the best-effort limit, max is the absolute limit; if you try to go over it, you're going to get killed. Low is the other way around: low is a best-effort guarantee; the kernel might break it if it's in an emergency.
C: Min is stricter than that: the kernel would kill something else before breaking it. So low and min lift up, and high and max push down. And another really nice property that we added to low and min is that the protection is proportional, in the sense that, let's say your working set size is 10 gigabytes, and it varies over time; say it swings between 9 and 11 gigabytes, and that's without any problem.
C: What that does is that you can set the protection at, say, 8 gigabytes, and it will keep the protection proportional, gradually fading the protection beyond that point, so you don't have to get the number exactly right. You can just ballpark it conservatively, and it'll still give you sufficient protection. That made configuration a lot easier, and we can basically use almost the same configuration everywhere; not everywhere, but almost everywhere, so it helps a lot in terms of operational simplicity.
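As a sketch of those v2 knobs for the 10-gigabyte working-set example above (the paths and slice name are illustrative; this assumes a unified hierarchy at /sys/fs/cgroup with the memory controller enabled):

    cd /sys/fs/cgroup/workload.slice
    echo 8G > memory.low     # best-effort protection; a conservative ballpark below the ~10G working set
    echo max > memory.max    # no hard cap on the main workload, so it stays work-conserving
    cat memory.high          # "max" unless set; high is the best-effort limit, max the absolute one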
C: And Josef Bacik on our team implemented io.latency. This is completion-latency-based IO control, and the one thing special about this controller, which we hope to add to other controllers too, is that it handles back-charging, meaning that if a cgroup does a shared IO, like a metadata or swap IO, it will be let through, because otherwise there would be priority inversions; but the cgroup gets charged later. Like a credit card: you spend first, but you get charged later, and you pay for it.
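A minimal sketch of what configuring io.latency looks like; the device numbers and target value are illustrative, and the interface takes "MAJOR:MINOR target=<time>" with units as documented in the kernel's cgroup-v2 admin guide (microseconds, to my reading):

    # Ask io.latency to keep this cgroup's completion latency at or below
    # the target on device 8:0 (typically /dev/sda; adjust for your disk)
    echo "8:0 target=10000" > /sys/fs/cgroup/workload.slice/io.latency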
C: So it maintains overall isolation. And the thing is, I said that memory and IO are conjoined: if you try to control memory, you have to control IO together with it; otherwise you're just pushing on one side and having it leak out the other side. This is one of the fundamental differences between cgroup v1 and cgroup v2: cgroup v1 is per-controller, per resource type, and everything is completely independent.
C: In cgroup v2 there's a concept of a resource domain: when you create memory pressure, it's tied to the same resource domain that the IO controller can look at, so you can control both memory and IO on the same resource domain. That's one of the critical enabling things about cgroup v2 that makes this possible.
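Mechanically, the way both controllers end up active on the same resource domain in v2 is via cgroup.subtree_control; a sketch, with an illustrative slice name:

    # Enable the memory and io controllers for children of the root cgroup
    echo "+memory +io" > /sys/fs/cgroup/cgroup.subtree_control
    # Each child now sees both controllers on the same resource domain
    cat /sys/fs/cgroup/workload.slice/cgroup.controllers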
C: And we also added PSI, pressure stall information. What PSI tells you is how short of a specific resource the workload is. For example, if it says a workload is under 20% memory pressure, it means the workload is 20% slower because it didn't have enough memory over the averaging window; it has different averaging intervals. That helped a lot in terms of allocating resources and monitoring workload health.
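PSI is exposed as plain files, system-wide and per cgroup; a quick look (output shape abbreviated, cgroup path illustrative):

    # System-wide memory pressure (kernels >= 4.20 with PSI enabled)
    cat /proc/pressure/memory
    #   some avg10=1.53 avg60=0.87 avg300=0.73 total=...   <- some tasks stalled
    #   full avg10=0.22 avg60=0.11 avg300=0.09 total=...   <- all tasks stalled
    # Per-cgroup equivalent on cgroup v2
    cat /sys/fs/cgroup/workload.slice/memory.pressure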
C: If you remember what I said about the kernel OOM killer: the problem with the kernel OOM killer was that it couldn't tell whether the workload was healthy or not, so it would kick in too late to be useful. Using PSI, we get a canonical way of telling whether the workload is healthy or not. If something is slowed down by, I don't know, if your web server is slowed down by 40%, it's obviously not healthy; it's not doing a good job. And the kernel OOM killer would never kick in at that level; the kernel OOM killer only kicks in when the pressure goes up to ninety-something percent.
C: So, based on PSI, excuse me, we implemented oomd; it's open source, like everything else. It watches system metrics; PSI is the main source, but it also watches other metrics, and it's really configurable. So you can express things like: if the workload is suffering more than 5%, and the management part is doing more than this, then we know the system part is messing up the workload.
C: One problem there was that ext4's journaling creates really bad priority inversions, where a high-priority cgroup would end up waiting for a low-priority cgroup. I'm sure it can be fixed, but our team has a lot more btrfs expertise than ext4 expertise, so we fixed everything in btrfs and we're just switching over to btrfs. But this should be fixable in other filesystems too.
C: So, that's that. This is a similar test; we're in the process of certifying, or, you know, qualifying, this on different service tiers and deploying it. This is a more modern SSD machine, and again, the green line at the top: three memory leaks, and it doesn't even matter; it's fine.
C: The purple line, that's not good, but the difference is more striking now, because we have more IO, better IO. And this is memcached; it's a similar test, and the graph coloring is not great, but the green line at the top, which is barely visible, is the protected machine, and the orange line, which is going away, is obviously the unprotected one. This was with, I think, a 50 MB/s leak.
C: So we have this minimum viable product, in terms of work-conserving memory and IO isolation, in this fbtax2 setup, which is the workload-protection and host-protection scheme. And I said that it's the minimum viable product. What that means is that of the pieces I talked about, all these things, if you take out one, it's not going to work; I mean, it may work to a certain extent.
C: What we're working on next is, first, no regression whatsoever in terms of disaster readiness, meaning that when the main workload wants to spike up, it should be allowed to, as if there were no side workload. And the other thing we're working on, which might be more interesting to you guys, I guess, is when you put multiple containers or workloads on the same system and you want to say: this guy gets 20%, this guy gets 40%. That doesn't work reliably yet, mostly because of IO isolation, so we're working on that.
C: So, one takeaway that I want to leave you with: as I said, having just one component configured doesn't really help you much; it might even hurt you. So it might be an interesting thing to think about: if anybody wants resource isolation in their system, it's not a single knob; it's a whole profile of configurations that protect all the affected sides. And with that, I'm going to hand it over.
B (Filipe Brandenburger): Of course, we want all the components, and we want to look at PSI, and we're not all the way there yet, but cgroup v2 is basically the first step toward that. So, earlier in the session we were talking about the components in the stack, and runc is basically the component that ends up running the container. So whether you're using OpenStack, or, sorry, OpenShift, or Kubernetes, whether you're using CRI-O or containerd,
B: or Docker, you mainly end up using runc, and runc created this library called libcontainer to abstract all these steps of creating a Linux container. That's what we see today, and that's basically what we want. But libcontainer is not cgroup-v2-friendly, and that's what we're trying to fix. systemd is the path toward cgroup v2, because everybody loves systemd, and essentially systemd has been embracing it.
B: In hybrid mode, it's basically using version 1 to control all of the limits, but version 2 is already mounted there; and there's unified mode, in which only cgroup v2 is mounted, and that's where we want to go. Now, libcontainer has this systemd cgroup driver, and it makes all these cgroup-v1 assumptions, writing directly to the cgroup tree, and it doesn't work at all with the unified hierarchy. So I'm starting a plan, you know, a three-step plan, to fix the systemd cgroup driver.
B: The first step is actually setting systemd properties instead of writing to the cgroup tree. When you start a systemd unit, be it a service unit or a scope unit (a scope unit is mostly what container managers use), you can tell it which kind of memory limits, CPU limits, and so on to use. So you're basically abstracting it: you're telling systemd, these are the limits I want.
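That first step looks roughly like this from the command line; a sketch, where the wrapped command and the unit name are illustrative:

    # Ask systemd for a scope with resource properties instead of writing
    # into the cgroup tree; systemd maps these onto v1 or v2 as appropriate
    systemd-run --scope -p MemoryHigh=2G -p CPUWeight=200 -- /usr/bin/my-workload
    # Properties can also be changed on a running unit
    systemctl set-property my-container.scope MemoryHigh=1G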
B: Then systemd can figure out whether it's on cgroup v1, or cgroup v2, or, in the future, cgroup v3; it's basically going to give you that kind of API. But while systemd is what you use to write these properties and modify them, reading the statistics is something where you want to go directly to the cgroup tree, which was delegated to you. So you're going to have to detect whether you're running on the unified hierarchy or not, but there are some simple and documented ways to do this.
B: You check the filesystem type at /sys/fs/cgroup: if it's a cgroup2 filesystem directly, then you know you're using the unified hierarchy, and you can detect the hybrid case as well.
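Spelled out, that check looks like this:

    stat -fc %T /sys/fs/cgroup
    #   cgroup2fs -> unified hierarchy (pure v2)
    #   tmpfs     -> v1 or hybrid; in hybrid mode the v2 tree is
    #                additionally at /sys/fs/cgroup/unified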
B: And step three is to fix delegation. Delegation is a concept in systemd where you create a scope unit and you give it to the container manager, in this case runc or containerd or the kubelet, and once they get this unit, they're free to use the subtree below it however they want.
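A sketch of what requesting a delegated subtree looks like (the manager binary is illustrative):

    # A scope whose cgroup subtree systemd promises not to touch,
    # so the container manager can create sub-cgroups inside it
    systemd-run --scope -p Delegate=yes -- /usr/bin/my-container-manager
    # In a unit file the equivalent is "Delegate=yes", or a controller list
    # such as "Delegate=memory pids" to delegate only some controllers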
B: One problem with doing this is that the OCI specs, which basically come from the Docker image and Docker specs created a few years ago, were written with cgroup v1 in mind, because that's what was pervasive at the time, and so a lot of the items the spec lets you set don't really map to cgroup v2. In some cases we can do translations; in some cases we can ignore settings: if you set something that's not available on cgroup v2, we can ignore it.
B: But the main thing is that one of the big motivations for moving to cgroup v2 is that we want to start making good use of these limits. With the memory limits, traditionally we only had the hard limit at the top and the soft limit for a, sort of, reservation. That's not as good as the new reservation limit, which is memory.low. And the hard limit, for instance, in cgroup v1, is something we don't even really use right now in Kubernetes, because we don't want OOMs in our containers; we're basically monitoring and evicting pods instead, and we would really like to be able to put some pressure on containers when they're going above their assigned limits, and memory.high is actually a great knob to do that. So we want to start using those new limits, and we'll probably need to add that to the OCI specification as well.
A: Fantastic. So, it's interesting: for well over a decade we've had cgroup v1 in place, so it's one of the most well-established underlying APIs that we have to work with. It is pervasive, and any type of resource allocation today uses it. Contrast that, though, with what we hear from customers and those running large OpenShift or Kubernetes environments.
A: There's a big push right now toward running this stuff on bare metal, because, well, why should I pay the virtualization tax if I already own these systems? One of the challenges there, and we were actually talking about this over lunch, is that there is overhead in the system, and cgroup v2 is not a magic bullet that's going to just solve all of that challenge, but it's actually probably one of the best knobs and levers we can pull.
A: When you look at it from the operating-system point of view, we gain a lot from sensible defaults in Linux; if you've ever done any performance tuning, you know there's a reason why things are set the way they are out of the box. So when you install RHEL or Fedora or any flavor of Linux, you expect these three things listed at the top to basically just work out of the box.
A: Otherwise it's not a good experience, and that's why this work is so important. When we look at other container engines, systemd-nspawn and LXC already support v2, so it'll be fantastic once OCI and the runc stuff work as well. Now, on the Kubernetes side, this is actually probably the longest road we have ahead of us, because, again, we talked earlier about how the OCI specs are very v1-centric; well, so is the kube API, in several ways.
A: So this is probably the longest road, but we can't actually start on it until we get the higher-level stuff done. So, sorry, y'all have got some homework. The work is going on in the specs; there are meetings happening on this stuff, like this week.
A: This is all actively in progress right now. But there's an interesting lesson here that we can all learn from: when you write a technical spec, there's a cost to writing it against a very specific implementation, and that's kind of why we're having to go in and deal with this now. And then there are some of the other controllers that are used commonly in containers:
A: a few of these haven't actually landed upstream; they're mostly done, but they haven't gone out in a mainline kernel. I think the cpuset one landed, and freezer is in line, so that'll all be lined up here really soon. But again, the thing that really concerns me about this is that we don't want the ecosystem to be dependent on one version versus the other. That's bad; it's a bad experience. We don't want a deb-versus-RPM kind of situation here.
A: So when things like OpenJDK do a quick check and read the cgroups to see, am I running in a container, or do I have the whole system, right now that's a v1-specific call. We need to get to a place where more of user space isn't written around a particular implementation of cgroups, because otherwise that problem goes up to the cluster view:
A: we have to actually track and taint nodes, or label nodes, rather, to know where things can run, and that's a problem nobody wants to deal with. So we've got to get over to v2 before this gets out of control. On that note, from the distribution side of the house, we're working on flipping Fedora to default to v2 in Fedora 31, so that's the November timeframe, and the criteria for that is that libvirt
A: and the runc stuff have to be in place, or else we can't flip the switch. We know Kubernetes is not likely to be done with that at that point, and of course you can always easily boot a system into v1; that's not a problem, so you'd opt into v1 in that case. On the RHEL side of the house, by the way, RHEL 8 is in beta; everybody here has used it, of course. Just for the recording: all heads are nodding.
A: You're laughing; we don't have to cut this out. So RHEL 8 is going to continue to default to v1, but in an upcoming minor release (I said 8.1, but maybe 8.2) we will also have full support for v2. So RHEL 8 is our release that's going to kind of bridge this gap and live in this dual cgroup world.
A: I did reach out to some of the fantastic people at SUSE; they don't have a specific date, but they think it may be possible to flip, maybe even before Fedora, so we'll see how that goes. I'm not actually aware of other distributions' plans, but once one distro goes through this stuff, a lot of people normally follow suit.
A: So now, if anybody wants to try v2, it's super easy to get your hands on it. In fact, you can just mount it: mount type cgroup2, "none", and then pass it a path. By the way, "none" is a '90s reference, so if you're under the age of 30 you may not remember that; that's okay. It's really easy to just get the hierarchy and start looking at the controllers.
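That one-liner, spelled out (the mount point is illustrative):

    mkdir -p /mnt/cgroup2
    mount -t cgroup2 none /mnt/cgroup2   # "none" is the dummy source device
    ls /mnt/cgroup2                      # cgroup.controllers, cgroup.subtree_control, ...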
A: A better way to use v2, though, is actually to boot your system with the systemd unified cgroup hierarchy option: you just pin that on the kernel boot line, everything works, and you have the unified hierarchy.
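Concretely, that's one parameter on the kernel command line; on Fedora or RHEL that could look like this (the grubby invocation is an illustration):

    # Boot every installed kernel with the unified (v2) hierarchy
    grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"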
A: systemd will do a best-effort translation of any of the old settings, so CPU shares become CPU weight and so on for these higher-level controls; that's the newer terminology there. So it's really easy to just fire this up and actually use this stuff. If you have systems today that aren't doing containers, virt, and kube, this is stuff you can go ahead and leverage to get the benefits, like Tejun walked us through, that Facebook's getting, which is fantastic. I can't wait.
A: Okay, and just a couple of other quick hitters here. You can run this hybrid mode, where you have v1 and v2 available, but the same controller can't be used in both places, so it's kind of useless; we really recommend you pick one or the other, that's ideal. And then, if you just want to disable v1 controllers one at a time, you can do that as well. So that's really all we had.
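Disabling individual v1 controllers is also a kernel command line parameter; a sketch:

    # Keep specific controllers out of v1 so the v2 hierarchy can own them:
    #   cgroup_no_v1=memory,io    (specific controllers)
    #   cgroup_no_v1=all          (all of them)
    grubby --update-kernel=ALL --args="cgroup_no_v1=memory,io"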
A: We just wanted this to be an awareness talk: what is the value of cgroup v2, and why is it important that we go there? We don't want the ecosystem to split; we don't want more user space being attached to one version or the other. This should be a low-level implementation detail that your container runtime or your wonderful init system abstracts away for you. That's really why we wanted to raise awareness with everybody. We've got a few links here; again, we'll make these slides available.