Description
Join Jamie Poole and Scott Solkhon from G-Research as they discuss their approach to bare metal deployments.
The Research User Group meets regularly to discuss and advance Research Computing using cloud native technologies.
If you'd like to learn more about the CNCF End User community or join as a member, you can find more information at: https://www.cncf.io/enduser
A
All right, so welcome everyone to another session of our Research User Group. Today's topic has been requested for a long time and we never actually got to cover it, so: bare metal. We have Jamie and Scott from G-Research to present how they are handling this and to teach us how we should do it. So yeah, over to you two.
B
Okay, all right, so yeah. Me and Scott are going to talk to you a little bit about bare metal Kubernetes at G-Research. We're not necessarily telling you exactly how to do it, or that this is the only way it can be done, just talking a bit about our adventures with it: what we're up to and what we've learned along the way.

In terms of introductions, both of us have actually been to CERN to visit Ricardo and co, and managed to get the obligatory photo in front of, I think, ALICE, so we put those next to each other. But yeah, I'm Jamie Poole. Most of you probably know me because I co-host this with Ricardo every other week anyway, although I haven't been around for a few weeks.
D
I'm a cloud engineer: I work mostly on OpenStack, and a lot of that is Ironic. Cool.
B
A very, very brief bit about G-Research, for those who don't know: we're a fintech company based in London. We run a large distributed research platform for teams of quants to look for patterns in real-world, noisy financial data sets for our clients. And currently, we've been saying this for a while but it's still true, we're migrating large amounts of our batch compute workloads from Windows and HTCondor onto Kubernetes and Linux and containerization, and all that good stuff.
D
Okay, so a little bit of background first of all. What is Ironic? Ironic is an integrated OpenStack service which aims to provision bare metal machines instead of virtual machines. Ironic supports vendor-specific plugins which implement additional functionality, such as moving machines between different networks. The main thing for this talk, really, is to focus on the different states we have in Ironic. It's not limited to these, but the main ones are enrolling, cleaning, holding and provisioning.
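To make those states concrete, here is a minimal sketch, assuming openstacksdk and a clouds.yaml entry; the cloud name is hypothetical:

```python
# Minimal sketch: list Ironic nodes with their current provision states.
import openstack

conn = openstack.connect(cloud="gr-cloud")  # hypothetical clouds.yaml entry

# provision_state moves through values such as "enroll", "manageable",
# "cleaning", "available" and "active" over a node's lifecycle.
for node in conn.baremetal.nodes(details=True):
    print(node.name, node.provision_state, node.resource_class)
```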
D
So how does it work under the hood? Ironic is pretty straightforward. What it does is IPMI and PXE: it mixes those with a RAM disk image, and then it turns machines on and off and moves them between different networks as they move through different parts of the build. Ironic can be deployed standalone, but the most common way to do it, and probably what you'd see in a production environment, is to sit it beside other OpenStack projects such as Nova, Neutron and Glance.
Just a bit of background: Nova is used for deploying VMs, virtual machines; Neutron is your networking; and Glance is an image catalog. Ironic will use those different services to get images or change networks or whatever it needs to do.
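As a rough illustration of how those neighbouring services are addressed programmatically; every name here is a placeholder, not something from the talk:

```python
# Sketch: look up resources in the services Ironic sits beside.
import openstack

conn = openstack.connect(cloud="gr-cloud")

image = conn.image.find_image("flatcar-stable")     # Glance: image catalog
network = conn.network.find_network("tenant-vlan")  # Neutron: networking
flavor = conn.compute.find_flavor("baremetal-gpu")  # Nova: flavors
print(image.id, network.id, flavor.id)
```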
D
The good thing as well is that when a bare metal machine is deleted by the user, it's cleaned and then just returned back into the available pool, and then someone else can just pick it out of that pool. This is a really high-level diagram, just to show the enrollment stage that we've got.
So if you look on the left there, there's a few open source products we use. One is Kayobe, which is a subproject of the Kolla Ansible project in OpenStack; that's used to deploy OpenStack, and we also use it to enroll new bare metal nodes into Ironic as well. Essentially it's just a bunch of Ansible, and we use Jenkins to orchestrate that, I guess. So if we look at what the enrollment phase actually does: first we go through pre-inspection.
First of all, there are the pre-steps, before you can actually look at the nodes and work out what's going on in there. We create a record of the node in the OpenStack API, we set the resource class, and we apply some baseline BIOS and iLO settings. The important bit here is the resource class. A resource class essentially just defines a type of node: you might have a certain type of GPU node, or CPU node, or specialist hardware.
You define that as a resource class. It basically says: this is what my server should look like; it should have this much RAM, these disks, and all that kind of stuff.
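A hedged sketch of those enrollment pre-steps with openstacksdk; the driver choice, credentials and resource class name are all placeholders, and a real Kayobe-driven enrollment would set rather more than this:

```python
# Sketch: enroll a node into Ironic and tag it with a resource class.
import openstack

conn = openstack.connect(cloud="gr-cloud")

node = conn.baremetal.create_node(
    name="gpu-node-001",              # hypothetical
    driver="ipmi",
    driver_info={
        "ipmi_address": "10.0.0.10",  # BMC address (placeholder)
        "ipmi_username": "admin",
        "ipmi_password": "secret",
    },
    # "This is what my server should look like"; Nova flavors later map to
    # it via a resources:CUSTOM_GPU_NODE=1 extra spec.
    resource_class="gpu-node",
)
print(node.id, node.provision_state)  # new nodes start out in "enroll"
```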
D
So we define all that, and we say: this is what I expect these new nodes to look like. Then we go through to the next phase, which is inspection, which is an Ironic state. It will turn the server on, PXE boot into the RAM disk, and then it will discover what hardware is there, check for things like cabling issues, and identify the switch it's plugged into, so that when we move it, we know which switch to log into to actually move the port. And then it will create those ports in Ironic.
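Driving that inspection step through the API looks roughly like this (the node name is hypothetical); the ports it creates carry the switch details just mentioned:

```python
# Sketch: trigger inspection and read back the discovered ports.
import openstack

conn = openstack.connect(cloud="gr-cloud")
node = conn.baremetal.find_node("gpu-node-001")

conn.baremetal.set_node_provision_state(node, "inspect")
conn.baremetal.wait_for_nodes_provision_state([node], "manageable")

# Each port records a NIC's MAC address and, in local_link_connection,
# the switch and switch port it is cabled to.
for port in conn.baremetal.ports(details=True, node=node.id):
    print(port.address, port.local_link_connection)
```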
D
What that allows us to do is then basically cross-check between the resource class and what the server actually has inside it. Because if you think about it, if you've got a big pool of servers, when you hand one back and you take a new one, you want to make sure that you're getting the same server back. Well, not literally the same, but one that has the same spec.
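The cross-check itself can be as simple as comparing the inspected properties against what the resource class promises; this is purely illustrative, with a made-up spec table:

```python
# Sketch: compare inspected hardware against the expected spec.
import openstack

EXPECTED = {"gpu-node": {"memory_mb": 512000, "cpus": 128}}  # made-up specs

conn = openstack.connect(cloud="gr-cloud")
node = conn.baremetal.get_node("gpu-node-001")

for key, want in EXPECTED[node.resource_class].items():
    got = int(node.properties.get(key, 0))  # filled in by inspection
    if got != want:
        print(f"{node.name}: {key} mismatch, want {want}, got {got}")
```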
D
So once we've done the inspection, we then know where it's plugged in and which switch port, so Neutron will then move it to what's known as the cleaning VLAN, which essentially corresponds to the cleaning state. And then we go through what's known as cleaning. Cleaning runs inside the RAM disk image again, and what it does is boot into it, and it will have a set of steps, basically Python scripts, and it will just run through those in order of priority.
So we do things like updating firmware and verifying that the iLO settings are all correct, NTP, setting up the storage; we wipe the hard disks, and then, if there are GPUs in there, we'll check their health as well. Once it's done that it should be almost good to go, so finally we run some tests on it, some burn-in tests, and then we move it to the holding VLAN, and it goes from cleaning into the holding state.
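Cleaning can also be kicked off manually with an explicit list of steps. The disk-erase step below is a standard Ironic deploy-interface step; the firmware, NTP and GPU-health steps described here would be site-specific hardware manager steps, so they are not shown:

```python
# Sketch: run manual cleaning with one explicit clean step.
import openstack

conn = openstack.connect(cloud="gr-cloud")
node = conn.baremetal.find_node("gpu-node-001")

conn.baremetal.set_node_provision_state(
    node,
    "clean",
    clean_steps=[
        # Fast metadata wipe; a full "erase_devices" pass is slower but stronger.
        {"interface": "deploy", "step": "erase_devices_metadata"},
    ],
)
conn.baremetal.wait_for_nodes_provision_state([node], "manageable")
```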
D
That essentially means it's ready for a user to pick up on the other side. So if we just look back at that diagram, you should understand it a little bit more now. Nodes come in from the left, through our automation, into the Ironic API; we inspect them, and then they get moved to wherever they are in the data center. A conductor, basically, is a microservice in Ironic whose purpose is to look after a group of nodes.
So you might have a common set of nodes, or an area in a data center, like an availability zone or something like that; they all just get bunched up, and then it's all ready for people to use on the other side.
So, moving on to the deployment side. There's a person there: they will pick a flavor, a network, an AZ and an image, and then that transforms into some stuff that happens in Nova, in OpenStack, and then out pops a bare metal node on the other side. So we'll just take a little bit of a closer look at what happens.
The user requests the new bare metal machine via Terraform, in our case, but I mean you can just do it via the API if you really want to. The flavor selected is the thing that maps to the resource class. So earlier, when I said you've got a resource class, it's like a type of server.
On the OpenStack side of the process, you hit the Nova API, then that will talk to Placement and the scheduler, and that will basically look in the pool and say: what's available, give me the first node off the top, or the first hundred nodes, or the first thousand nodes, whatever. Once the nodes are selected, Neutron will then go and move all of those into the provisioning VLAN, so then they go into that provisioning state.
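In SDK terms (rather than the Terraform actually used here), the request looks roughly like this; every name is a placeholder:

```python
# Sketch: ask Nova for a bare metal server. The flavor's resource-class
# extra spec is what steers placement onto an Ironic node.
import openstack

conn = openstack.connect(cloud="gr-cloud")

server = conn.compute.create_server(
    name="k8s-worker-001",
    flavor_id=conn.compute.find_flavor("baremetal-gpu").id,
    image_id=conn.image.find_image("flatcar-stable").id,
    networks=[{"uuid": conn.network.find_network("tenant-vlan").id}],
    availability_zone="az1",
)
# Bare metal builds take minutes, not seconds, so wait generously.
server = conn.compute.wait_for_server(server, wait=1800)
print(server.name, server.status)
```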
D
So
we
go
from
holding
back
into
provisioning
and
then
this
is
the
state
where
we
get
ready
for
the
user
to
use.
So
in
machine
provisioning.
We
turn
the
server
on
using
ipmi.
We
pixie
boot
into
the
round
disk
image,
and
then
we
apply
a
few
bios
settings
that
might
be
like
hyper
threading
on
or
off.
That's
probably
the
most
common
one.
But
you
can
you
can
configure
anything
you
want
as
long
as
it's
available
via
the
api
and
then
we
pull
the
user's
image
from
glance.
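That BIOS tweak can be expressed as a step on Ironic's BIOS interface; it's shown here as a manual clean step, whereas in the Nova-driven flow it would run via a deploy step or deploy template, and the setting name is vendor-specific and purely illustrative:

```python
# Sketch: apply a BIOS setting through Ironic's bios interface.
import openstack

conn = openstack.connect(cloud="gr-cloud")
node = conn.baremetal.find_node("gpu-node-001")

conn.baremetal.set_node_provision_state(
    node,
    "clean",
    clean_steps=[{
        "interface": "bios",
        "step": "apply_configuration",
        "args": {"settings": [
            # Hypothetical: some vendors spell hyper-threading "LogicalProc".
            {"name": "LogicalProc", "value": "Enabled"},
        ]},
    }],
)
```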
D
So
the
user
will
specify
that
image
when
they
actually
build
the
machine.
They
don't
want
to
run
the
ram
disk
image,
because
that's
what
all
our
tools
in
it
hasn't
got
their
tools
in
it.
So
this
would
in
our
case,
be
flat.
Car
would
get
pulled
from
glance,
which
is
the
image
service
in
openstack,
so
that
basically
explains
the
blocks
there
on
the
right,
so
yeah
request
comes
in
schedules.
D
Nova
compute
will
coordinate
some
stuff
in
neutron
to
move
it
to
the
right
vlan
and
then
the
ironic
conductor
will
pull
the
image
down,
put
it
on
the
node
then,
and
then
from
there.
All
we
need
to
do
is
move
the
vlan
again
into
the
the
requested
vlan
that
the
user
wanted,
and
then
we
just
restart
the
server
and
then
the
server
will
just
boot
into
it
into
an
os
and
then
present
a
prompt
screen
that
the
user
can
log
into
that
means.
D
Yeah
then,
hopefully,
we've
got
everyone
got
their
metal
servers
and
they're
happy
to
go
and
use
their
their
fleet
of
servers.
Now
that
is
all
good
until
the
users,
then
is
finished
with
the
server.
So
the
idea
between
ironic
is
sort
of
cattle,
not
pets,
so
you
use
a
server
for
a
lot
of
time
or,
however
long
you
need
it
and
then
you
hand
it
back,
go
through
cleaning
and
then
it
goes
available
ready
for
someone
else
to
use.
So
it's
really
really
flexible.
D
So
yeah
the
user
deletes
the
server
neutron
goes
and
moves
the
server
to
cleaning.
We
go
through
those
same
cleaning
steps.
So
if
the
firmware
has
changed
since
since
users
handed
back
the
machine,
then
that
will
get
updated,
wipes
all
the
disks.
So
it's
all
nice
and
secure
when
the
new
user
use
gets
given
the
server
and
yeah.
We
checked
that
it
hasn't
been
tampered
with
or
anything
like
that,
as
well.
D
Just
for
extra
security
checks
and
then
yeah
neutron
will
finally
move
it
into
holding
and
then
then
it
becomes
available
again
so
just
to
recap
on
those
states.
So,
first
of
all
you
roll
the
node
into
ironic.
Then
it
sits
there
and
it's
ready
for
the
user
to
use
and
then
we've
got
cleaning
between
holding
and
provisioning
really
so
yeah
enrolling
cleaning
holding
provisioning.
That's
about
it
really
ot.
B
Thank you. All right, and then moving on to how we use this for Kubernetes and research purposes, I suppose. Historically we've always run Kubernetes at G-Research on OpenStack. For a long time we've been doing it on VMs, but more recently we've started moving on to building clusters using bare metal; this process is pretty much the same regardless.
B
We
just
use
a
different
flavor
as
scott
was
talking
about
so
the
way
we
tend
to
do
it
is
we
define
our
clusters
in
terraform
terraform
code
in
github.
B
We
then
use
terraform
enterprise
in
our
case,
to
build
the
clusters
into
openstack
using
ironic
machines
are
built
and
configured
using
the
flat
car
operating
system
and
then
flat
car
uses
ignition
to
pull
down
user
data
and
configure
a
very
minimal
kubernetes
installation.
So
our
initial
bootstrap
task
basically
gets
us
a
small
bare
metal
server.
Sorry
collection
of
servers
running
a
kubernetes
cluster,
pretty
vanilla.
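A rough sketch of that hand-off: the Ignition payload below is a trivial placeholder (a real bootstrap config would lay down kubelet units, certificates and so on), and Nova expects the user data base64-encoded:

```python
# Sketch: boot a Flatcar server with Ignition user data.
import base64
import json
import openstack

# Placeholder Ignition config; real ones are generated by tooling.
ignition = {"ignition": {"version": "3.3.0"}}

conn = openstack.connect(cloud="gr-cloud")
server = conn.compute.create_server(
    name="k8s-bootstrap-001",  # hypothetical
    flavor_id=conn.compute.find_flavor("baremetal-cpu").id,
    image_id=conn.image.find_image("flatcar-stable").id,
    networks=[{"uuid": conn.network.find_network("tenant-vlan").id}],
    user_data=base64.b64encode(json.dumps(ignition).encode()).decode(),
)
```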
B
Once we have a minimal cluster, we then apply our more detailed Kubernetes configuration on top. Typically we do that these days using Jenkins or Argo CD, or a combination of the two; we're actually undergoing a bit of a migration at the moment. That's just how we then deploy all of our sort of desired-state Kubernetes configuration on top: things along the lines of ingress controllers and Calico and all the other bits and pieces that we want to have in our clusters, to make them look and feel like our desired GR clusters.
B
Once
we've
done
that
we
then
deploy
armada.
So
this
is
an
application
which
I've
talked
about
a
bunch
of
times
at
this
in
this
forum.
So
I
won't
go
into
too
much
detail
now,
but
this
is
just
the
overall
architecture
diagram
of
the
the
application
which
we
typically
deploy
on
top
of
these
clusters.
So
you
can
see
here,
the
the
blue
boxes
at
the
bottom
of
the
screen
are
kubernetes
clusters
in
this
sort
of
new
world
of
metal.
These
are
all
but
high
performance
bare
metal
clusters.
Quite
a
large
number
of
nodes.
B
We
tend
to
scale
up
to
about
a
thousand
and
then
one
our
model
server
sitting
on
top,
which
allows
our
users
users
to
submit
jobs,
to
run
on
the
hardware,
a
couple
of
notes
just
on
some
benefits,
we've
seen
so
far,
so
that
this
is
really
the
reasons
for
us
moving
to
this
model
in
the
first
place.
B
So
it's
still
early
days,
but
the
things
sorts
of
things
we've
seen
are
increased
stability,
so
certainly
for
things
like
gpu
intensive
workloads,
we
had
seen
some
issues
when
we
were
running
on
virtualization
that
have
just
completely
evaporated
since
movement
spare
metal.
It's
certainly
been
a
lot
simpler
than
trying
to
debug
sort
of
kernel
level
issues
within
the
within
the
virtualization
layer,
just
to
move
to
their
metal
and
not
really
worry
about
it.
B
Some
other
benefits
we've
seen
are
things
like
increased
network
throughput
between
nodes
and
external
resources,
being
able
to
use
bgp
peering
very
easily
can
be
done
with
vms,
but
it's
a
little
bit
more
complex
for
us
as
well.
Typically,
we
end
up
with
much
larger
nodes,
because
your
bare
metal
servers
tend
to
be
a
bit
bigger
than
your
average
virtual
machine.
By
definition,
I
suppose-
and
for
us
it's
just
simpler
estate
management
as
well,
so
we
have
fewer
layers
between
our
our
workloads
and
our
hardware.
B
However, there are limitations as well, so some of the things we've noticed so far. Certainly a slower provisioning time, which is actually completely expected: as you can imagine, when you're provisioning a bare metal server you're basically turning on a real machine, and you have to wait for it to power on; with a virtual machine all of that is sort of abstracted away from you, you don't really see it. There's also a lot more precise quota management required.
B
I
think
you
can
be
a
little
bit
more
fast
and
loose
when
you're
running
a
large
virtual
estate.
You
can
oversubscribe
things
and
you
know
over
subscribe,
cpus
and
things
like
that.
It's
much
harder
to
do
in
a
bare
metal
environment,
you're
very
much
constrained
by
the
physical
resource
you
actually
have.
It
is
a
little
bit
less
flexible
in
some
ways
and
there's
some
features
of
virtualization,
which
we
don't
get
as
a
side
effect.
So
things
like
being
at
a
snapshot
of
vm
are
quite
useful.
B
So we've tended to take the approach of just starting from scratch and building new clusters as bare metal from the beginning, rather than trying to add it into existing virtual clusters. But yeah, in summary: for us, we're now using bare metal Kubernetes for our highest-performance workloads. We're still also making heavy use of virtualization where appropriate, so for all sorts of classic Kubernetes clusters, if you like, for services and so forth, we're still making good use of VMs. But for the clusters where we really care about performance, and we're running lots and lots of high-throughput jobs, we are now moving to bare metal, and OpenStack Ironic is our metal-as-a-service choice.
A
Awesome, thank you Jamie and Scott, that was a nice summary. If anyone has any questions, feel free to just go for it and ask. I have a couple, but I'll leave the floor to others first.
E
I'm here, I have a question: did you look at any other tool suites besides Ironic, or were you set on Ironic? Because I believe it's an OpenStack project, right?
B
It is, yeah. We have looked at some other things; we're relatively opinionated about it, I suppose, because we've already got quite a foothold in Ironic, sorry, in OpenStack, using lots of other OpenStack services, as Scott mentioned. We actually have, I think independently, ahead of Ironic, rolled our own metal-as-a-service system internally, which does work as well, but it's kind of nice to be able to use the off-the-shelf open source tooling that fits in nicely with the rest of our ecosystem.
F
I was wondering, you mentioned the provisioning times being slow, whether there was any looking at pre-provisioning sort of expected images that you're going to spin up. I know that when we had the OnMetal service at Rackspace, before we switched over to Ironic, part of the whole plot was to pre-spin-up these bare metal servers.
B
It's not something we've used yet. I mean, certainly there are some things we've looked at in our processes where we can save time. Scott, maybe you can cover this, but some of the things like BIOS settings, where we want to make sure we eliminate the requirement for reboots and things like that during the process.
D
So yeah, there's a little bit of that we can trim around. The way it's designed is you have one big pool of nodes, and you can have multiple tenants using that; where we don't have that, there's some stuff that we can sort of pull out. For example, the BIOS settings are applied at provision time; if they're static, then you just don't have to reapply them every time.
D
You
just
have
to
make
sure
in
cleaning
and
I
was
tampered
with
them
and
then
you
can
save
a
bit
of
time.
There
also
there's
lots
of
things
you
can
do
about
caching,
images
and
that
kind
of
thing,
and
actually
for
us,
that's
something
that
has
been
relatively
easy,
because
it's
because
these
these
kubernetes
clusters
tend
to
use
the
same
image
and
then
we
have
lots
of
them
that
use
the
same
image.
So
everything
gets
cached
and
it's
all
kind
of
hot
at
all
times,
pretty
much
so
there's
more
things.
B
One thing which I noticed, which surprised me actually, my own reaction to it: the first time I saw it take 20 minutes to build a server, I was like, oh, this is a nightmare, this is going to make everything really slow and difficult, because I'm used to a VM spinning up in 30 seconds or something. But actually, when you're used to it and you're doing things at large scale and in bulk, it changes.
It doesn't really matter if one server takes 20 minutes if you can build hundreds or thousands simultaneously. You actually end up caring a lot more about reliability, and being comfortable that your automation will just work, that you can walk away from it and come back later and everything will be up and running. It would be much worse if it was faster but less reliable. So I always err on the side of reliability over performance, personally. For the build, that is; once it's up and running, we want performance as well, obviously.
Yeah, I mean, 20 minutes is slightly anecdotal. I would say that's our current experience for a certain type of flavor, but it's of the order of minutes now, not seconds.
C
So is that the mode of operation for everybody? If somebody submits a particular workload, they get provisioned a particular resource or resource type? It's not that some things are long-lived and people kind of swap, or, you know, interchangeably use the same standing resource?
B
It depends how you choose to use it. In our model, what we do is we have a bunch of hardware built into clusters ahead of time, which sit there and are used relatively constantly by a collection of different users; so in effect it's a large pool of hardware all being shared by lots of different people. We're quite lucky in the sense that we've got, in the grand scheme of things, a relatively small pool of researchers all doing quite a similar thing.
B
So
we
can
be
quite
prescriptive
about
the
hardware
that
they'll
get
so
we
have
a
smallish
number
of
flavors
of
cpu
nodes
and
similar
gpu
nodes
and
potentially,
in
future
other
accelerators.
It might be the case that other companies, or other organizations even, want to do more of a, I guess, metal-as-a-service or cluster-as-a-service offering up to users, to actually create their own; that would be a possibility. But for us, we take the approach where we provision it, we being the infrastructure and platform teams, and then our users within our organization just use what we've provisioned for them.
[question inaudible]

B
It depends what you mean. Generally speaking, what we have is these pools of compute, which we understand the sort of flavors and qualities of, and then we have a bunch of tools and software which allow users to run jobs on them. It's not a ticketing system; it's really a case of: they are already set up with access to this large pool of compute, and then they can submit jobs.
G
[question inaudible]

B
Yeah, so it looks loosely like this. We actually have multiple of this whole picture, in fact, but if we just look at one of these as an example: imagine this is a data center. We have many of these clusters under here, each one of them a cluster itself. The cluster, I suppose, in and of itself may even last for years: we might have created it, you know, a couple of years ago, it's still running now, and we'll still be running jobs on it. The nodes themselves we tend to quite frequently rebuild, because I think we actually have a bit of a fetish for rebuilding stuff in GR, making sure everything comes back clean and tidy.
We'll probably also eventually, I think, move to a more rolling cluster-rebuild process as well, because obviously the cluster is long-lived state itself, which could get dirty or out of sync somehow; it shouldn't, but it's possible. But no, generally speaking, the clusters themselves live for quite a long time, and then the nodes within them are of the order of tens of days, a couple of months maximum.
G
So this architecture is really at that facilitator level, where you're building environments for individuals, and you're keeping it moving with whatever the ongoing research is, and the reason...
B
Yeah, short time scales on the pods and whatever, yes, yeah, exactly. So we've got time scales where the pods themselves are anything from seconds up to a couple of weeks, say, and then the nodes and the clusters last for a lot longer, and they're just sort of running this primordial soup of user workload. But also, just using Ironic, or any kind of metal as a service, is also just a useful thing.
Even if you don't do this model. So we have a model where we create clusters and then effectively offer, you can think of it as, namespace as a service; that tenancy is the thing which we offer people on the existing clusters. Whereas, I think, or certainly the last time we were talking about it over at CERN, they're doing more sort of cluster as a service, so people can ask for their own clusters, which then may use something like Ironic. In fact, do you do that, Ricardo?
A
I had a question, because you mentioned it; it's a kind of follow-up to the last one. You described the workflow with GitHub and then the provisioning using Terraform. Do you also use this for cluster upgrades, or do you just redeploy the cluster from scratch?
B
That's a good question. So the cluster bootstrap thing we tend to use as kind of a one-time thing to build a cluster. If we have quite a long-lived cluster, then we can actually do all of our upgrades from that point onwards, if that makes sense. So things like upgrading Kubernetes itself, we have a bunch of tooling to do that, so we can do in-place cluster upgrades, even the kubelet on all the nodes as well, because that itself is containerized. And similarly, operating system upgrades.
Those we can do in a rolling fashion, because we have this model where, underneath the long-lived cluster, the nodes get rebuilt sort of sequentially, with error budgets and so forth, so that we don't do the whole thing at once. But we have options: we can also, if we want, just completely, you know, coordinate, wait for stuff to drain, and then blow it away and rebuild it all, if we want to do upgrades that way. But yeah, we tend to just do it in place.
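A tiny sketch of the cordon half of that rolling loop, using the Kubernetes Python client; the eviction and the Ironic rebuild underneath are elided, and the node name is hypothetical:

```python
# Sketch: cordon a node before its bare metal host is rebuilt underneath it.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

def cordon(node_name: str) -> None:
    # Mark the node unschedulable; draining pods (gated by disruption
    # budgets) and the actual rebuild would follow.
    v1.patch_node(node_name, {"spec": {"unschedulable": True}})

cordon("k8s-worker-001")
```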
A
Okay. And the other question I had: there's quite a lot of activity in this space trying to manage the clusters as if they were Kubernetes resources, and then just building on things like Argo to kind of make everything uniform.
B
I would definitely consider it, and I'm very excited about it, and I would like to do it at some point, but it's just never quite been high enough up the priority list for us in our world. I think what it would end up doing is effectively replacing Terraform; I think basically we would be going straight from GitHub.
B
Well,
I
suppose
we'd
have
something
to
bootstrap
our
initial
cluster
somehow
and
then
cluster
api
would
then
go
off
and
talk
straight
to
openstack,
but
hopefully
everything
we've
already
done
would
then
continue
to
integrate
nicely,
and
we
would
just
use
that
directly.
So
I
think
it's
really
a
question
of
how
well
supported
openstack
is
by
the
cluster
api
right.
I
haven't
checked
recently,
but
yeah.
It
would
be
very
interesting
to
do
that
cool,
certainly
a
limitation.
It effectively maintains a big graph of resources which it has to walk every time you make any kind of change, and especially when every resource is actually a remote thing that has to be gone off to and checked, you can imagine that ends up translating into a lot of API calls, which can be quite slow and expensive.
H
Other questions in the chat? Someone there, you go.

E
I have another question.
B
We have a networking team who's more responsible for that. So within our organization we have a few different functions, and different areas responsible for different things: we've got an infrastructure function and a platform function, me and Scott from both of those respectively.
There is a team within the infrastructure function who deals with networking specifically, but what we're definitely finding is that having more cross-functional teams is really powerful. So I've got people in my team who have really strong networking skills and understand that kind of stuff, including down to the hardware, and I actually suspect that over time we're probably going to need to develop some kind of special cross-functional team that just looks at performance and tuning of the estate, basically, because we need to be able to do it all the way from top to bottom.
E
Yeah, you know, we have a lot of bare metal and VMs, and there's this friction with the networking team, professional disagreements, maybe, over how the switches should be managed. And I saw during your presentation that you were switching VLANs, and it seemed like you had a decent amount of control.
B
Yeah, I think we do. Our networking team is quite up to speed with everything that we're doing as well. And I mean, like you say, though, there's always friction sometimes between teams, because different teams sometimes operate at different rates, and when we've got responsibility shared across groups it can be tricky. But we've got quite a singular purpose at the moment: there's a particular large project happening right now which involves a lot of this stuff.
E
[question inaudible]

B
Yeah, I mean, up to about a thousand nodes in a given cluster. We could go further; we've actually decided arbitrarily to stop about there, but that was in fact one of the reasons for the Armada architecture, so that we could have many of these things, because we're aware that past a certain limit Kubernetes can't really scale much further.
I think the official limit is still 5,000 nodes, but I think anyone who goes to conferences knows that you have to do quite a lot, and bend over backwards, to get that far. So we have a model where we just go to about a thousand and then just plug in more clusters horizontally, and it scales quite well that way. (And is Armada a G-Research project?) Yes; it's open source, but yes, it's come from G-Research.
A
I have one more. You mentioned the issues with GPUs and stability, and the improvements from moving to bare metal. Were you doing PCI pass-through in VMs, I guess? And do you remember which specific issues you had?
B
We were, yes. I can't remember the specific issues, but we were basically getting unexpected errors: things were being reported as, you know, not-a-number and that kind of thing, just mathematical errors which shouldn't have been happening, and under quite nice circumstances as well. Not when we were running out of memory; you know, you'd have to have a few different failures happening a certain way, but somehow we managed to always hit this scenario quite frequently, and then we just thought, well, hey, look, rather than try and debug all this...
A
Yeah, the reason I ask is because we have been seeing this with simulations on virtual machines recently, and yeah, that is a tempting solution.
A
Yeah, the issue is: how many GPUs do you have per node, on average?

B
For us, up to about eight.

A
All right. The issue is that for Kubernetes clusters this is easy to handle, but if you have a mix of VMs and Kubernetes clusters using those GPUs, virtualization actually allows you to expose them on a multi-GPU node quite easily.
A
Well, if you just dedicate bare metal nodes to people directly, as you would do with VMs, then you're basically giving them a really nice way to waste, well, reserve, resources.

B
Yeah, that's true, yeah.

A
I think that's the reason why, for Kubernetes clusters, it's kind of a no-brainer that you can go bare metal for GPUs and just schedule directly.
A
What would be the suggestion if I just want to do Kubernetes on bare metal? What's the best option, or the least complicated option, to get stuff up and running, you know, sort of...
B
...in a quick fashion? Yeah. I mean, I don't know, because I know what we do, and we've obviously got quite opinionated about using OpenStack and Ironic. I think one thing that can probably be said of Ironic and OpenStack is that it can be quite complex, and it's probably quite difficult to get up and running.
D
There's two projects that you'd probably want to look at, which sort of lower that barrier to entry. One is called Bifrost, which will allow you to just run Ironic from, like, a laptop; that's good for bootstrapping new environments where you don't already have a control plane. And then the other is what we actually use here to deploy all of our OpenStack, which is Kolla Ansible.
D
That
is
it's
basically,
a
collection
of
ansible
roles
and
basically
a
lot
of
the
hard
work's
been
done
for
you.
So
a
lot
of
it
just
kind
of
works
out
the
box
and
you
can
deploy
it.
Vanilla,
open,
stack
really
easily
to
like
a
couple
of
vms
on
your
machine
or
if
you've
got
a
couple
of
bare
metal
nodes,
you
can
deploy
a
control
plane
there
with
relatively
little
openstack
experience
tuning
it
and
getting
it
to
large
scales.
A
Yeah, I think that comes back to this idea that we've had for a while, which is to do these recipes for different sorts of workloads that are kind of specific to research environments. I think the deployment on premises, on bare metal, is something that is not super common, maybe because most users will be using public cloud providers or some sort of commercial virtualization solution that is already available.
B
Or pointers, I guess; we don't even have to have a recipe for it and say, do this. But we can say: we, as a collective, have done these things, we know that they work well, or can be made to work. But yeah, that's good. We didn't mention it in this presentation, but all of our compute is on-prem. I suppose anyone using a cloud provider can also just use bare metal through whatever they support as well; I think they all do now.
G
I think that'd be very valuable for the university community. From what I can tell, most of the bare metal clusters are hand-created via various methods, so what works, and what works well, would definitely be useful.
A
I
I
guess
the
the
dream
is
really
this
idea
of
the
cluster
api,
where
you
you
put
some
effort
into
the
bootstrap
cluster
that
you
do
by
hand,
but
then
everything
else
is
kind
of
coming
automatically
via
the
cluster
api.
I
don't
know
how
far
it
gets,
because
then
you
still
need
this
kind
of
metal
as
a
service
component
somewhere.
A
But yeah, so maybe we take this as an action, just to send around a survey like we did the last couple of times, asking specifically about bare metal deployments. Yeah.
A
...talk this time, but we should probably just circulate a slot, lunchtime or something, where we all get together. And yes, it all started in Barcelona, so we might.
A
Still escaping this jam session, I can see. I know, I'm sorry.
A
All right, that's it. I don't have anything else for today, unless anyone else wants to raise something.
A
All
right,
otherwise,
we
have
a
container
sage
in
two
weeks
and
after
that
could
come
so
yeah
thanks
everyone
for
attending
and
we'll
follow
up
also
in
the
in
this
live
channel.
Thank
you.
Everybody
great
to
see.