From YouTube: State of Container Security | Urvashi Mohnani, Sally O'Malley, Red Hat | OpenShift Commons Briefing
Description
State of Container Security
Urvashi Mohnani and Sally O'Malley, Red Hat
OpenShift Commons Briefing
March 2020
B
Hey, I'm Sally. I'm a software engineer on the OpenShift workloads team. For the past few years, Urvashi and I have been getting together and giving a container security talk every few months. We'll get together with Dan Walsh and talk about what's new; Dan's always presenting on security. When there are enough cool features to talk about, we'll put them together and present them, and we like to share them with as many people as we can.
A
Okay, so before we dive into container security, let's talk about Goldilocks and the three bears, to recap the story. It's a story about a girl, Goldilocks, who ventures into the forest and finds an empty house. Being adventurous as she is, she enters the house and sees there are three types of everything there: three types of chairs, three bowls of porridge, three types of beds, et cetera. She decides to go sit on the first chair and realizes it's not comfortable; it's too hard. That was Papa Bear's chair.

A
She then moves to the second chair and realizes it's the opposite: it's way too soft, still not comfortable, and that was Mama Bear's chair. She then sits on the third chair and finds that it's just right, and that was Baby Bear's chair. As we progress through the story, we see that Goldilocks always leans towards the "just right" option for the things in the house: she goes for the bowl of porridge that's neither too hot nor too cold, she goes for the bed that's neither too hard nor too soft, et cetera.
A
So when it comes to container security, we can look at it with a similar lens. The first level is where we have all our security features enabled, but we run into challenges when trying to run our applications. This is the Papa Bear model of container security: too hard. And oftentimes, when we hit such snags, we end up overcompensating and disabling all of our security features. That's the Mama Bear model: too soft.
B
Please don't make Dan cry, right? So that's what our talk is about. But first, let's just talk about container images. When you run a container, there are three inputs to the system. First, the OCI image format: when a developer crafts an image, they include things like the entry point, the user, volumes, whatever goes in a Dockerfile. These get translated to a JSON file, the OCI image spec.
B
Next, there's the container engine. Much of the security of a container comes from hard-coded values in the container engine, like which namespaces get spun up, which cgroups get assigned, which seccomp profiles are applied, things like that. And the third input is from the user. Users can override and set aspects of running containers by passing flags to the run command, like volumes or capabilities or port forwarding.
B
Those are the three inputs, and then the container engine takes its own hard-coded defaults, the user inputs, and the information from the image, and creates an OCI runtime spec. That's the JSON file that the runtime then uses to launch the container. The runtime is runc or crun or Kata Containers.
B
So that's how that all works. Now, the middle part, the container engine, like I mentioned, has mostly been hard-coded, but at Red Hat we've worked to split the monolithic engine that was Docker, with the Docker daemon, into four different functions with distinct tools. CRI-O is for running locked-down containers in production; it's meant to run containers in Kubernetes pods. Podman is for running containers locally while developing and experimenting; the Podman CLI is pretty much one-to-one with the Docker CLI. And then Skopeo is the tool just for moving images between registries.
B
You can move an image from one remote registry to another without ever pulling that image down to your system. And then Buildah is for building container images. So the idea is that by setting security separately for each of these tools, you end up with the highest level of security, rather than the least common denominator that you had with Docker. And also, without the Docker daemon, these tools can run rootless. Red Hat is constantly experimenting with ways to run containers most securely.
B
For example, in OpenShift we now run image builds inside of a container with Buildah, and with Buildah there's no leaking of information from a Docker socket. This has made OpenShift image builds more secure. We also lead the way in running containers rootless; we'll talk more about that later. But speaking of rootless, the most secure way to run any container is by setting the user inside the container to be non-root.
B
This is the default in OpenShift: regular users can't run as root inside their application containers. And in almost every case, if you think you need to be root inside your container, you're probably doing something wrong. For example, a few cases where you might think you need to be root: if you're running a web service and you need to bind to port 80, you can instead use port forwarding from the host to run as non-root in the container. Another common reason for running as root is installing packages in your container.
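As a minimal sketch of that port-forwarding idea (the image name, port numbers, and server command here are just illustrative, not from the talk):

```shell
# Run an unprivileged web server that listens on an unprivileged
# port (8080) inside the container, and publish it on the host.
# Nothing inside the container needs root or CAP_NET_BIND_SERVICE;
# clients just talk to the published host port.
podman run --rm -d --name web \
    -p 8080:8080 \
    registry.fedoraproject.org/fedora \
    python3 -m http.server 8080
```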
B
You should never have a package manager inside your container. You should install packages at image build time, like within a multi-stage Dockerfile or with Buildah. These encourage minimal images that don't require root. However, there are containers that do need privilege: system containers, or special containers whose purpose is to manage things on your host. There are such containers in OpenShift and in Kubernetes, so for these, there are many ways to secure them: you can use Linux capabilities, seccomp filters, SELinux, user namespaces.
B
All of these can work together to lock down even your most vulnerable containers, and this is what we're here to talk about today. We're going to try to make it easy for you to run securely, because we all know that when it comes to security, if it's easier to disable a feature than it is to configure it, chances are it's going to get disabled.
A
All right, so one of the ways we currently enforce security is by limiting the power of root using capabilities. Capabilities are chunks of root power: each capability gives root the privileges it needs to carry out certain actions. For example, if I run a container and I disable all the capabilities in it, the root user inside it can't really do anything privileged.
A
Currently, we have 37 different capabilities, out of which we enable 14 by default when we run our container workloads. These 14 were originally defined by the upstream Docker project back in around 2013. But do we really know what these 14 capabilities are? The answer is usually no, but here is the list of the 14. After further examining what exactly these capabilities do, we found that a few of them are not entirely critical to running your container workloads. The first one is AUDIT_WRITE.
A
Audit
right
gives
you
the
ability
to
write
certain
information
into
the
auditing
subsystem,
the
back
in
the
day
when
containers
were
starting
off,
people
thought
that
the
only
real
way
of
running
jobs
in
your
container
was
to
SSH
into
it,
and
for
that
to
be
possible,
you
needed
the
SSH
daemon
running
inside
the
container.
Now
the
SSH
daemon
won't
run
unless
it
had
the
audit
write
capability
enabled
now.
We
know
that
that's
not
entirely
true.
A
We
have
tools
such
as
part
of
an
exec
or
docker
exact,
that
let
us
do
exactly
that,
and
there
are
really
no
other
applications
or
tools
that
need
the
SSH
daemon
running
inside
the
container.
So
why
have
this
enabled
by
default?
For
all
the
container
workloads
I
can
one
is
make
node
make
node
gives
you
the
ability
to
create
device
nodes
on
the
system.
This
can
be
pretty
dangerous
as
it
can
be
used
to
attack
the
kernel
and
the
reason.
A
The
main
reason
that
this
is
enabled
by
default
is
that
certain
a
bunch
of
packages
need
to
make
a
device
nodes
when
they're
being
installed.
But
if
you
have
a
different
tool
to
build
your
container
images
such
as
bilder
and
you
have
a
different
tool
to
run
your
containers
in
a
production
cluster
such
as
cryo,
you
can
run
your
containers
more
securely
in
your
production
of
production
environment
by
disabling
this
capability
by
default,
while
having
it
enabled
in
your
build
tool
without
compromising
the
functionality.
A
The third one is SYS_CHROOT. This just gives you the ability to chroot inside the container. No real application uses this, so we're not sure why it needs to be enabled by default for all containers. The fourth one is NET_RAW, and NET_RAW gives you the ability to create any type of raw IP packets.
A
All right, we have some demos for you. So let's look at a demo where I can drop the NET_RAW capability without compromising the ping ability in my container. Here I am running a basic image, and ping works as expected, because my NET_RAW capability is still enabled. Now, using the --cap-drop flag, I'm going to drop the NET_RAW capability, and as expected, it will not work: I can no longer ping in there.
A
So if we want to drop this capability but still have ping, there is a way around it, by enabling this sysctl here. What this sysctl does is that if your group ID falls in the range of 0 to 1000, you get the ability to ping inside your container. So let's try that out: I'm running a container here where I have dropped my NET_RAW capability and I have enabled that sysctl, and as you can see, ping works as expected.
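A minimal sketch of what that demo looks like (the image name is illustrative; the sysctl is `net.ipv4.ping_group_range`, which lets processes whose group ID is in the listed range use unprivileged ICMP sockets):

```shell
# With NET_RAW dropped, classic raw-socket ping fails:
podman run --rm --cap-drop=net_raw \
    registry.fedoraproject.org/fedora \
    ping -c1 8.8.8.8    # fails: raw sockets are not allowed

# Enabling the ping_group_range sysctl lets ping fall back to ICMP
# datagram sockets, so it works even without NET_RAW:
podman run --rm --cap-drop=net_raw \
    --sysctl 'net.ipv4.ping_group_range=0 1000' \
    registry.fedoraproject.org/fedora \
    ping -c1 8.8.8.8
```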
A
Right, so as you saw, we said that we can further reduce the default capabilities, down to a list of ten, for all our container workloads. But usually, as an end user, it can be confusing as to what exactly the capabilities are that we need to run certain containers. The image developers are the people that best know exactly which capabilities are needed to run the containers they are building. So say I am an image developer and I know that my container only needs the SETUID and SETGID capabilities to function as expected.
A
I can set this up as a label or an annotation on the image when I'm building it, and when my container engine, such as Podman, launches this container, it will know to start it with only the SETUID and SETGID capabilities and not the default 14. We have a demo for this as well. So here is a Dockerfile, and in it, as you can see, I've set that label saying that my container only needs the SETUID and SETGID capabilities.
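A sketch of what that might look like, assuming the `io.containers.capabilities` image label that Podman checks (base image, tag, and command here are illustrative):

```shell
# Build an image whose label tells the engine it only needs
# SETUID and SETGID (Containerfile written as a heredoc for brevity):
cat > Containerfile <<'EOF'
FROM registry.fedoraproject.org/fedora
LABEL io.containers.capabilities=setuid,setgid
CMD ["sleep", "1000"]
EOF

podman build -t minimal-caps .

# Podman reads the label and starts the container with just those
# capabilities instead of the default list:
podman run -d --name demo minimal-caps
podman top demo capeff    # shows the container's effective capability set
```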
A
So when I run this container and use the podman top command to see what capabilities it has, you can see here that it's only the SETUID and SETGID capabilities. So now let's run a container image that doesn't have such a specification, and we will see that it runs with all 14 of the defaults. Now, what happens when an image developer says that the image needs capabilities that fall outside of this list of 14?
A
So here I have a Dockerfile where I'm saying that I want my image to run with the NET_ADMIN and SYS_ADMIN capabilities. Podman will run your container, but it will log an error saying that you're not allowed to run with these capabilities, as they don't fall in the default 14. It will still run your container, but with the default 14: you will not have NET_ADMIN or SYS_ADMIN enabled, just the default 14 list.
A
However, if you run the same image and you specify these capabilities yourself using the --cap-add flag, you will see that Podman actually launches the container with the two capabilities, even though they fall outside the default 14. The reason we do it this way is that we don't want users to end up pulling random images off the internet and running them without realizing which capabilities they have enabled. Anything that falls outside of the 14 is usually an enhanced capability that container workloads don't usually need, so why would you be running with it?
A
But if, as an end user, you really believe that you need those capabilities, then Podman will not stop you from doing that. So this was just a way of showing how we can further lock down our containers and move more towards the Papa Bear model, by leaving it to image developers to restrict which capabilities are needed to run their containers.
B
Great, so we showed how easy it is to set Linux capabilities. How about limiting syscalls? Well, processes communicate with the kernel through syscalls, so one way to attack a host is to gain access to the kernel through syscalls. Just by turning on seccomp in the kernel, you go from having eight or nine hundred syscalls down to about 450. Seccomp is a kernel feature; it was added in 2005. So just by turning that on, you're already better off. When running containers, just about everybody runs with a default seccomp filter.
B
This was developed upstream by Docker, by Jessie Frazelle, and it whitelists about 300 syscalls. You can find it on your system, under /usr/share/containers. That's better, but can we do even better? Recently Aqua Security did a study where they looked at the containers out there and found that most only require 40 to 70 syscalls, and at Red Hat we found the same to be true. The problem is that it's really difficult to figure out which syscalls a container needs; they're not the same.
B
Those 40 to 70 are different for each container. So that was the problem we looked at last summer. We had an intern on the runtimes team, through Google Summer of Code, and he worked with Valentin Rothberg and Dan Walsh to come up with a tool to do just that: it will generate a seccomp profile based on a container that you give it. So here's the way it works.
B
It's an OCI hook, the oci-seccomp-bpf-hook. An OCI hook is a helper program that gets launched by the runtime just after a container is created but before it starts. It hooks into the kernel through BPF, and it watches all the syscalls on your system. It records those syscalls that are in a given container's PID namespace, and when that container exits, a seccomp profile is generated with the whitelist. A seccomp profile is just a JSON file that's a whitelist of syscalls.
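To make the "just a JSON file" point concrete, the general shape of such a profile looks roughly like this (the syscall list here is a made-up minimal illustration, not a real generated profile):

```shell
cat > /tmp/example-seccomp.json <<'EOF'
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "openat", "close", "fstat", "exit_group"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
EOF
# Any syscall not on the whitelist fails with an error
# (the defaultAction above returns EPERM-style errors).
```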
B
All the recorded syscalls that the container used go into the whitelist. We'll show this running in a demo; it's pretty cool. But the idea is that an application developer can run this hook through their entire CI/CD pipeline, testing every code path and use case and edge case. You can run it in a test or a production environment for a few months and just continuously watch the seccomp profile until it stabilizes and there are no new syscalls being added.
B
So let's see how it works in this demo. Here's just a look at the hook itself: you can see it's in oci/hooks.d; that's where the binary is. For any container that's launched with the trace-syscall annotation, a seccomp profile will be generated, and the user passes a path to where they want that file generated.
B
So now we'll just run a simple Fedora image, and we're going to pass that annotation to tell the hook to start, and we just ran ls. Now we can look at the profile that was required just to run ls in a Fedora container. You can see there are about 30 or 40 syscalls required just to run ls.
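A sketch of that round trip, assuming the `io.containers.trace-syscall` annotation and `of:` output prefix used by the oci-seccomp-bpf-hook project (image name and output path are illustrative):

```shell
# 1. Trace: run the container with the hook's annotation; the hook
#    records every syscall the container makes into /tmp/ls.json.
podman run --rm \
    --annotation io.containers.trace-syscall=of:/tmp/ls.json \
    registry.fedoraproject.org/fedora ls /

# 2. Enforce: run again with the generated profile; only the
#    recorded syscalls are allowed.
podman run --rm \
    --security-opt seccomp=/tmp/ls.json \
    registry.fedoraproject.org/fedora ls /
```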
B
So now we can turn around and run that container again using that generated seccomp profile, and you can see it works just fine. But behind the scenes, that container is completely locked down: only those shortlisted syscalls, the 30 or 40, are allowed, out of the 300. So great. What happens if we need to run ls -l? Let's check it out. We'll use this same profile to run ls -l, and it errors out, because apparently ls -l requires more syscalls. You can see that in the audit log; hopefully you can see it.
B
Yes, there are some syscalls it was trying to use when it stopped, and there are a few listed, but there are probably more, because that's just all that the audit log caught. So let's go back and run the container again with the annotation, to catch any new syscalls that are required for ls -l. Here we have a new file generated, and we can run with that new file, run ls -l, and hopefully it'll work.
B
Now let's look at the difference between the two seccomp profiles, one for ls and one for ls -l. I found this interesting. The plus signs are the added syscalls that you need to run ls -l. What I found interesting is connect: I was surprised that you need connect to run just ls -l. What socket is being connected to here? Well, when you run ls -l...
B
If you look at the output, your UID is mapped to your username. So instead of outputting 0 for a file's ownership, it says root, or instead of saying 1000, it will say somalley. That lookup uses NSS, which in the background uses nsswitch and the SSSD daemon, and that's where the connect syscall comes into play. That's an interesting tidbit. But that's how the hook works; you can download it and check it out for yourself. But to implement this...
B
The runtimes team has been working on a plan, and what they've come up with is that an image developer should ship the seccomp profile within the image and include a label on the image that will tell the container engine: this image has a profile, use it. The reason is that, again, application developers know best how their containers should run, so it makes sense for the developer to include this with their image. So again, just like with capabilities...
B
Just like with capabilities, say you have a syscall that's outside of the default: you might be met with a decision. The container engine might error out and tell you that it won't run this image, because it has syscalls outside the default, or you can tell the engine: hey, I trust this image, let's run it anyway. So those are some things still being worked out, but the hook itself is on GitHub. Please download it, try it out, it works great. And that's it for the hook. Yep, yeah.
A
Okay, so another way we currently enforce security is using SELinux, which is a tool we all know and love. The way SELinux works is that it's a security model based on type enforcement, where files and processes are given different types, and access is restricted based on which types you can access. In the past seven years, almost every container CVE that has occurred has been a file system breakout, and guess what: SELinux has blocked each and every one of them. So SELinux is the best tool to protect your file system from container escapes.
A
A problem with SELinux occurs when we use volumes. When we are mounting volumes into a container, we're essentially taking a part of the OS and exposing it inside the container. Since container processes are only allowed to read files that have the container_file_t label, a lot of these system directories on the host are not accessible, because they don't have that label.
A
A way to make this work is to use the lowercase z (or uppercase Z) mount option when you mount a volume into your container. What this does is relabel the content on your host, so your container processes can then access it. Now, this all works fine if the directory that you're mounting into your container is going to be solely owned and used by your container.
A
But if you have other applications running on your host that need to access the same directory, they will end up breaking, because they won't recognize the new label that the content has been relabeled with. The only way to make this work, then, would be to disable SELinux confinement on your container, causing you to run in a very insecure fashion and pushing you all the way towards the Mama Bear model of security.
A
So we have a quick demo where we can look at how this works. Here I am running a container, and I'm mounting in my home directory as read-only and the /var/spool directory as read-write. But these are system directories, which do not have the right label for container processes to read them.
A
So, as expected, when I try to read the home directory in my container, I get permission denied, and the same thing with /var/spool. And since I mounted that one as read-write, if I try to write to it, I wouldn't be able to, because I haven't relabeled the content. So now let's use udica to generate a custom policy for this container: I run my container with the volumes that I want mounted in, and let udica inspect it and create an SELinux policy for me.
A
So when you run this, what udica does, basically, is create a new label type for your container process, and this is what gives you access to the content on your host. And then it's very simple: it tells you exactly what you need to run to load this new custom SELinux policy. As you can see down here, I have run that command, and it takes a few seconds to load.
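A sketch of the udica workflow described here (the container name, policy name, and mount paths are illustrative, and the semodule line is roughly what udica prints for you to run):

```shell
# Run the container with the volume mounts you need:
podman run -d --name mycontainer \
    -v "$HOME":"$HOME":ro -v /var/spool:/var/spool \
    registry.fedoraproject.org/fedora sleep 1000

# Let udica inspect it and generate a custom SELinux policy:
podman inspect mycontainer | udica my_container

# udica tells you exactly how to load the policy, roughly:
semodule -i my_container.cil /usr/share/udica/templates/base_container.cil

# Re-run the container confined by the new type, no relabeling needed:
podman run -d --security-opt label=type:my_container.process \
    -v "$HOME":"$HOME":ro -v /var/spool:/var/spool \
    registry.fedoraproject.org/fedora sleep 1000
```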
A
The great thing about udica is that you don't need to be an SELinux policy expert to figure out exactly what customizations are needed to get this to work without relabeling the content; it does everything automatically for you. So, my new policy has loaded, and I'm going to run the container again. For this, I'm going to set the new label here that I got from udica, and there goes my container, running.
A
Now, if we look at the process running on our host machine, we will see that it is running with this new label, as expected. So now, when I exec into the container and try to do exactly the same thing I was doing earlier to access my home directory, it works fine: I no longer get permission denied. And when I go to /var/spool...
A
I do the same thing: I can read it, and now let me try writing to it, and you can see here I am able to write to it. So udica lets you create custom policies that let you volume-mount into a container without having to use relabeling and without having to disable SELinux confinement completely.
B
That's really cool. We've talked about Linux capabilities and SELinux and seccomp filtering; let's talk now about user namespaces. As I mentioned earlier, Red Hat has been leading the way in driving forward user namespaces in Linux. Just a little background: a namespace is what gives an isolated view of your system with regard to a set of resources. So, for example, in a container you're in a PID namespace: you only have access to processes in that PID namespace. Similarly, within a user namespace...
B
...you only have access to the range of UIDs and GIDs that are in that user namespace. This provides just an extra layer of isolation, and a privileged root user inside the container can then be mapped to a non-privileged user on the host. So if a process were to break out of that container, it wouldn't be a privileged user on the host. That's the idea, and in fact, UID separation has always been the standard security tool in Linux systems.
B
So Podman and Buildah do some really cool things with user namespaces. User namespaces are the reason why you can run these tools rootless, and they're also really effective at providing separation. You can imagine, if you had a Kubernetes environment, it would be a huge boost in security if every container was separated by user namespaces. But sadly, nobody is using user namespaces for container separation yet.
B
There's still some work, though, some issues to work out, and again, Red Hat has been and is leading the way with this work. One problem is that there's still no support in Kubernetes for user namespaces, so UID shifting with volumes in Kubernetes is difficult; it requires kernel support that isn't quite there yet. When mounting a volume from a host to a pod, the ownership of files is just not automatically updated, and the community has been working on this for years.
B
So you need to chown all those files, and that is prohibitively slow. But the container storage team, led by Nalin, and the kernel storage team, with Vivek, have been working on this, and they recently added a new feature to overlayfs to make chowning and assigning file ownership in user namespaces much faster. So things are moving forward. Also, Giuseppe Scrivano has been working on a prototype in Kubernetes using user namespaces. If anyone can figure it out, it's these three guys.
B
Also, as an aside, Giuseppe rewrote runc in C over, like, a weekend, allowing containers to run with cgroups v2, but that's a different story. The runtimes team at Red Hat is working to move this forward; it's just taking some more time. I do have a demo, though, that shows how user namespaces are really effective at separating containers, and in Podman this is easy. Here I run a container and map UID 0 inside the container to 100,000 on the host.
B
Now I can run a second container and map UID 0 inside the container to 200,000 on the host, for the next 5,000 UIDs. With podman top, you can see now that on the host this is running as 200,000, while inside the container it's root, and you can see with ps that all the processes are owned by 200,000.
B
So you can see now: if a process were to break out of the first container, it would be 100,000 on the host. It wouldn't have elevated privileges, and it would have no access to the second container running as 200,000. It wouldn't even see that container; the container storage is separated. So that's just an example of how Podman uses user namespaces.
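The two-container setup described above can be sketched like this (the offsets are the ones used in the talk; the image name is illustrative, and --uidmap takes containerUID:hostUID:count):

```shell
# Two containers, each in its own user namespace:
podman run -d --uidmap 0:100000:5000 \
    registry.fedoraproject.org/fedora sleep 1000
podman run -d --uidmap 0:200000:5000 \
    registry.fedoraproject.org/fedora sleep 1000

# Inside each container the process runs as root (UID 0); on the host,
# the same processes show up as UID 100000 and 200000 respectively:
ps -eo user,comm | grep sleep
```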
A
All right, that is a really great tool, but as we saw in the demo, every time we run a container and we want it to use a user namespace, we have to set a specific UID map for each container we run. As an end user, that can be pretty tedious to keep track of: if you're running hundreds of containers, you need to know which ranges...
A
...you've already used and which are still available. So, to make it easier on the end user and to help them move towards Papa Bear easily, we have a new flag in podman run called --userns. When you set this to auto, Podman will automatically pick a different user namespace for every container that you run, and it will guarantee uniqueness. There are still some issues that we're working through, and we plan to keep improving it in Podman until it's feature-complete and we're happy with its stability.
A
So here I'm just going to run a container and set that flag to auto, and when we look at the user it's running with on the host, we see it's running as user 1 billion. Now, if I run another container in a similar way and we see what user it's running with on the host, it's running as 1 billion and 1,024. The reason it picked an offset of 1,024 is that the default size Podman automatically picks is 1,024. But let's say your container needs a wider range, say a range of 5,000.
A
We can do that with the same flag: just add a colon and size= whatever size you want. So here I wanted to have 5,000, and when I run my container, we will see that it started the UIDs from 2,048, which is 1,024 later than the container I ran before this, because the range of that previous container was 1,024. And if I look at the UID map in this container, we will see that the range here is set to 5,000. So this is still a work in progress.
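A sketch of the --userns=auto usage shown in the demo (image name illustrative):

```shell
# Let Podman pick a unique user namespace for each container:
podman run -d --userns=auto \
    registry.fedoraproject.org/fedora sleep 1000

# Ask for a wider range (5000 UIDs) for a container that needs one:
podman run -d --userns=auto:size=5000 \
    registry.fedoraproject.org/fedora sleep 1000

# Inspect the mapping a container was given:
podman run --rm --userns=auto \
    registry.fedoraproject.org/fedora cat /proc/self/uid_map
```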
B
All right, so finally, the last thing I want to talk about is the containers.conf file, a feature being added in Podman now. It's a central location where you can set security configurations system-wide, for all of your containers and all of your tools. So, for instance, the distro might put a containers.conf in /usr/share/containers, an administrator could include a containers.conf in /etc/containers that overrides the shared one, and then a user could override that further with a containers.conf in their home directory.
B
Some things that you'd use containers.conf for: for example, removing those four capabilities that we talked about earlier, the four that nobody really needs. You could remove them system-wide, for all of your containers and all your tools, in containers.conf. If you wanted to enable ping, you could add that sysctl back in for all of your containers, and you wouldn't have to remember that long command with the specific sysctl that Urvashi showed in the demo. So that's one way you'd use containers.conf.
B
Another way: some of these commands, like the Buildah ones, have tons of flags; you could have ten or twenty flags and parameters that you need. If a containers.conf file contains those flags, that just makes it easier for a user to run that image. The same thing goes for high-performance computing and very high-security environments, where there's a lot of configuration required with every container; adding a containers.conf file just makes the configuration easier.
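A sketch of what the per-user override described above might contain, assuming the `default_capabilities` and `default_sysctls` keys in the `[containers]` table (the list here is the default 14 minus the four discussed earlier, which leaves the list of ten):

```shell
# Write a user-level containers.conf that drops the four capabilities
# nobody needs and re-enables ping via the sysctl, for every container:
mkdir -p ~/.config/containers
cat > ~/.config/containers/containers.conf <<'EOF'
[containers]
default_capabilities = [
  "CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL",
  "NET_BIND_SERVICE", "SETFCAP", "SETGID", "SETPCAP", "SETUID",
]
default_sysctls = [
  "net.ipv4.ping_group_range=0 1000",
]
EOF
```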
B
Yeah, okay. So let's just run a Fedora container, and here are all the capabilities, the fourteen defaults. Now let's edit our containers.conf file down to just those ten capabilities, taking out the four we talked about earlier that nobody needs. I'm going to pass this containers.conf in an environment variable to the podman command; you won't normally have to do that, this is just for the demo.
B
I think that's it; that's the end of the demo and the end of our talk. We do want to thank Máirín Duffy, who did all the artwork on the slides. Anything else there, Urvashi?