From YouTube: Managed Kubernetes — Next Gen Academic Infrastructure? - Viktória Spišaková & Lukáš Hejtmánek
Description
Don’t miss out! Join us at our upcoming event: KubeCon + CloudNativeCon Europe 2023 in Amsterdam, The Netherlands from April 17-21. Learn more at https://kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.
Managed Kubernetes — Next Gen Academic Infrastructure? - Viktória Spišaková & Lukáš Hejtmánek, Masaryk University
As an example, let me introduce the research and educational infrastructure in the Czech Republic, apart from the actual supercomputing center. We have two main kinds of infrastructure available to scientists: an HPC infrastructure and a Kubernetes infrastructure.
The HPC infrastructure consists of 32,000 CPU cores, it has 15 petabytes of storage capacity, and it is used by three thousand active users. Those users are running about 20,000 jobs every day, and we also have 360 GPUs of various kinds. This infrastructure is based on the PBS Pro batch system. The other one is the Kubernetes infrastructure.
It consists of 2,500 CPU cores. It has about one hundred terabytes of dedicated storage capacity, backed by a flash-only storage array. It is currently used by about 200 users, who are running 1,000 pods every day, and this infrastructure is equipped with 50 GPUs. Some of them are NVIDIA A100s that are yet to be installed, and we will experiment with the MIG technology as well. This Kubernetes infrastructure is based on the Rancher (RKE) distribution.
Speaking of managed Kubernetes, what can you imagine under that term? Basically, it means that a DevOps team manages the infrastructure. We offer tight integration with the rest of our infrastructure, like the HPC, and we aim to offer many components that allow easy deployment of user applications. We have, for instance, several storage classes, such as NFS, Samba, SSHFS, or CVMFS.
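As a minimal sketch of how a user consumes one of these storage classes, a PersistentVolumeClaim only has to name it; the storage class name `sshfs` below is an assumption for illustration, not necessarily the exact name used on our clusters.

```yaml
# Hypothetical example: claim storage from one of the offered storage classes.
# The class name "sshfs" is an assumed illustration.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: home-storage
spec:
  accessModes:
    - ReadWriteMany        # network filesystems typically allow shared access
  storageClassName: sshfs  # e.g. nfs, smb, sshfs, cvmfs
  resources:
    requests:
      storage: 10Gi
```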
We also integrated CephFS, but this storage class uses a special version of the CephFS driver. This driver has been patched so that we are able to change the user ID and group ID that are locally visible, so it does not matter under which user ID the container runs. The patch is public as a pull request upstream, but as far as I know, it is still not merged. We also have a Onedata storage class, and both of these storage classes are implemented as FUSE CSI drivers.
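The idea of the patch can be sketched as a storage class that carries the desired ownership mapping; the `mappedUid`/`mappedGid` parameter names below are purely illustrative assumptions, since the actual patched CephFS CSI driver may expose this differently.

```yaml
# Illustrative sketch only: a storage class whose (patched) CSI driver maps
# the locally visible owner of the mounted volume. Parameter names are assumed.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs-mapped
provisioner: cephfs.csi.ceph.com
parameters:
  clusterID: ceph-cluster   # assumed cluster identifier
  mappedUid: "1000"         # hypothetical: uid shown inside the container
  mappedGid: "1000"         # hypothetical: gid shown inside the container
reclaimPolicy: Delete
```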
We have a workaround so that the CSI driver can be restarted without breaking the mount points.
Next, we have an integration of the DNS system for Ingresses and load balancers. It means that a DNS name is created for such a service, be it an Ingress or a LoadBalancer. We also provide Let's Encrypt certificates, both for Ingresses and also for non-web services.
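A minimal sketch of what this looks like from the user's side, assuming the common ExternalDNS and cert-manager tooling (the hostname, issuer name, and secret name are illustrative assumptions):

```yaml
# Hypothetical Ingress: ExternalDNS creates the DNS record for the host,
# and cert-manager obtains a Let's Encrypt certificate for it.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt   # assumed issuer name
spec:
  tls:
    - hosts: [my-app.example.org]
      secretName: my-app-tls
  rules:
    - host: my-app.example.org    # assumed name under the shared subdomain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```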
We provide a single sign-on service based just on an annotation. If users want to use single sign-on for their application, they just need to add an annotation to the Ingress, and single sign-on is automatically registered and provided.
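As a sketch of the idea, under the assumption of an OAuth2-proxy-style setup behind the NGINX Ingress controller (the auth URLs and hostnames are assumptions, not our exact configuration):

```yaml
# Hypothetical single sign-on via Ingress annotations (NGINX auth subrequest).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-sso
  annotations:
    nginx.ingress.kubernetes.io/auth-url: "https://sso.example.org/oauth2/auth"
    nginx.ingress.kubernetes.io/auth-signin: "https://sso.example.org/oauth2/start"
spec:
  rules:
    - host: my-app.example.org
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```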
We also offer shared GPUs. It means that a single GPU can be shared by multiple containers or multiple users, but there are no guarantees about the resources consumed from the GPU. We also have a slightly modified GPU operator from NVIDIA that enforces the GPU allocation.
The users are allowed to run only unprivileged containers, which can be a bit limiting, but on the other hand, we do not force users to use any particular user ID, so they can have any user ID they want.
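A minimal sketch of what such an unprivileged pod looks like; the concrete user ID and image are illustrative, and the arbitrary user ID is exactly the point:

```yaml
# Hypothetical unprivileged pod: no root, no privilege escalation,
# but an arbitrary user ID chosen by the user.
apiVersion: v1
kind: Pod
metadata:
  name: unprivileged-example
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 4242          # any user ID the user wants
  containers:
    - name: app
      image: registry.example.org/my-app:latest   # assumed image
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
```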
Users also cannot install custom resource definitions or any other cluster-scoped resources; these operations are forbidden, and only an administrator can do so. It basically means that the DevOps team has to install such resources, but we want the users not to struggle with maintaining the infrastructure, maintaining Kubernetes, and maintaining all the components that need to be run, so that a user can focus only on their own application or workload and fully utilize the services the DevOps team provides.
However, we do not offer just an infrastructure. We go a bit further and prepared some prefabricated applications, such as JupyterHub and BinderHub, which are, needless to say, famous and very popular. Our JupyterHub offers integration with HPC storage systems via SSHFS, and we also have two special instances of JupyterHub. One is RStudio running inside JupyterHub, so a user can get RStudio in one click, integrated with the HPC storage systems; the other one is AlphaFold on demand.
This application is based on a collaborative Jupyter notebook, and we also integrated the Mol* viewer that allows users to preview the folded protein right in the application. JupyterHub and BinderHub are web applications that have their own login system, but next to those, we prepared other applications that are accessible directly in Rancher as Rancher applications. Those applications are mainly based on remote desktops, and we offer software such as MATLAB, ANSYS, the VMD viewer, or IBM CPLEX.
All these applications are based either on the VNC technology and protocol or on the WebRTC protocol. In the latter case, the user is given a fully 3D-accelerated desktop that is capable of almost anything. We also prepared containers that allow users to SSH into them over the network, and those containers behave much like virtual machines: the user does not have root access in the container, but on the other hand, using some tricks and hacks, the user can install any package or anything in such a container.
So it behaves much like a virtual machine. We also offer some web-based applications, such as code-server or Neo4j, as well as other applications such as a personal MinIO server or a personal Samba server. "Personal services" means that the user can run MinIO or Samba on their own and can connect their local computer to this service via S3 or the very popular Samba protocol, for instance from a Windows system.
Here you can find some examples of our prefabricated applications. On the top left, you can see RStudio running in JupyterHub. Below it, you can see the form for AlphaFold on demand; most of the parameters used by the standard AlphaFold scripts can be filled in there and in the next two screens. On the bottom right, you can see the Mol* viewer that offers a preview of the folded protein, shown above it on the top right.
So now let me reveal some implementation details. First, for remote desktops, our solution is completely unprivileged. It means that none of the participating containers needs privilege escalation or runs as root; everything runs just as a regular user. However, it required a patched X server, and it also requires some minor changes to the NVIDIA GPU operator. As I mentioned, we enforce the GPU allocation, and this enforcement denies sharing a GPU among containers, because NVIDIA_VISIBLE_DEVICES=all is ignored if it is the only request for a GPU.
However, we use a publicly available GPU-sharing workaround, and with this sharing we can share the GPU between the X server container, the desktop container, and the streamer container.
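To make the enforcement concrete, here is a hedged sketch of the two ways a container may ask for a GPU; in our modified setup the environment-variable route alone is ignored, and only the explicit resource request counts (the image name is an assumption):

```yaml
# Illustrative pod: the explicit resource request is honoured; setting only
# NVIDIA_VISIBLE_DEVICES=all without a request is ignored by the modified operator.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  containers:
    - name: cuda-app
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "all"          # ignored on its own in our setup
      resources:
        limits:
          nvidia.com/gpu: 1     # the enforced way to obtain a GPU
```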
I also mentioned that we offer an integration with the DNS system. However, we have no solution for name conflicts: currently, any user can select any domain name under some specific subdomain.
However, this subdomain is shared among all the users, so name conflicts can arise, and this currently has no solution with the ExternalDNS driver. Also, with Let's Encrypt certificates, there is one problem with the DNS challenge, because we offer to obtain certificates also for the whole subdomain.
That holds for both ExternalDNS and Let's Encrypt certificates, and in this case every user is able to get any certificate in this domain, because there is no real validation of the request, and we are not aware of any possible solution to this problem. Probably one of the solutions could be to create distinct DNS zones for each user or each group of users, but this is currently not implemented.
We decided to use Kubernetes also for sensitive data processing. We set up a small cluster that is dedicated only to sensitive data processing. This cluster is separated from the public cluster; however, this single small cluster is used by all the users that want to process sensitive data. We are working on ISO 27001 certification, which is equivalent to the NIST 800-53 certification.
But, as I have said, the single cluster is shared by distinct users, which brings some isolation challenges, mainly related to the usually single Ingress instance, and also to the Istio instance, which is not multi-tenant by default.
We do not run just a few web applications or remote desktops on our Kubernetes infrastructure; we also run HPC jobs on a pretty regular basis. Currently, we run the HPC jobs via workflow managers. We use two of them: one is Snakemake and the other one is Nextflow. Snakemake is integrated with the Task Execution Service from the GA4GH initiative, and Nextflow is directly integrated with Kubernetes.
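Since Nextflow drives Kubernetes directly by spawning one pod per task, the workflow's service account needs pod-level permissions in its namespace; the following RBAC sketch is an assumption about a typical setup, not our exact manifests:

```yaml
# Hypothetical RBAC for a workflow manager that creates task pods in its namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: workflow-runner
  namespace: analysis           # assumed namespace
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "pods/status"]
    verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: workflow-runner
  namespace: analysis
subjects:
  - kind: ServiceAccount
    name: nextflow              # assumed service account name
    namespace: analysis
roleRef:
  kind: Role
  name: workflow-runner
  apiGroup: rbac.authorization.k8s.io
```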
As Lukáš already mentioned, there are limitations of HPC in Kubernetes. These limitations are eventually beneficial, because they bring research opportunities for the community. There are plenty of areas where research can be conducted; we started with scheduling challenges because they were the most prominent to us. I would like to present to you some of our research interests, the problems we tackled, the solutions we found, and new areas we would like to scrutinize.
Firstly, I am going to talk about effective resource allocation in Kubernetes. As we all attend the HPC day, I believe the majority of you have at some point asked, answered, discussed, or just come across the question of effective scheduling in any computing environment. Scheduling is an omnipresent topic, because everyone tries to come up with the best scheduling strategy that will accommodate the most jobs on all nodes, where no job will wait too long and cluster usage will be above 90% with no downtimes.
Sadly, this is not the reality, and we all experience a plethora of problems. We come from an academic environment, where computational resources are provided more or less for free to all researchers and academics. This is a very different approach from commercial providers, where you can prepay nodes for a desired time or follow a pay-as-you-use model.
When you pay for compute resources, you naturally don't want to pay providers more than necessary, not to mention if you have specific requests on resources such as graphical cards, whose usage can be really pricey. From the opposite point of view, providers reach very high resource usage because they combine the offered plans in a very smart and efficient way, and they overcommit very much.
However, burstable loads, as we call the applications with unstable resource usage, are not the only case that makes scheduling in Kubernetes hard. We distinguish between two types of these bursty jobs. One are long-running services that are used, say, three times a week for two hours; the second type are computations characterized by dynamic variation, where most of the time the resource usage is low, but for some short time, perhaps a more complex part of the computation, resource consumption spikes.
A third problem is posed by interactive jobs, which are common in HPC, for example when working with software like MATLAB or ANSYS. If an interactive workload is created, the user doesn't want to wait too long until the job moves from the waiting queue to running; they want to work instantly, or within a span of approximately two to three minutes. In Kubernetes, you can set a higher priority on the interactive job, but then you must decide which pod can be terminated.
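A sketch of the priority mechanism mentioned here, assuming illustrative names and values; pods of lower priority are the ones the scheduler may preempt:

```yaml
# Hypothetical priority class for interactive workloads.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: interactive-high
value: 1000000              # higher value = scheduled (and preempting) first
preemptionPolicy: PreemptLowerPriority
description: "Interactive jobs that must start within minutes."
---
apiVersion: v1
kind: Pod
metadata:
  name: matlab-session
spec:
  priorityClassName: interactive-high
  containers:
    - name: matlab
      image: registry.example.org/matlab-desktop:latest   # assumed image
```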
Lastly, the fourth scheduling problem is tied more to academia, where you need to enforce fairness and, at the same time, account everyone for their resource usage. Kubernetes does not implement any built-in accounting or user fairness if we talk about multi-tenant clusters, but these are crucial concepts. Imagine that you have a user who spawns too many interactive jobs: this user will use all of the resources, and a new user might never get to compute.
The good news is that there are some solutions to the problems we posed. One possible solution to the need to reserve resources is described in the manuscript linked below. The solution is based on the existence of small or large (it doesn't matter) jobs that can be evicted easily; maybe they do checkpoints, maybe their inherent logic counts with restarts.
Another, much easier solution would be to create separate clusters, where each cluster is dedicated to accommodating a specific workload type. One more solution is the vertical autoscaler, which should be available from Kubernetes version 1.25; the vertical autoscaler is able to scale resources on a running container.
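A hedged sketch of a VerticalPodAutoscaler object as provided by the autoscaler add-on (the target deployment name and bounds are assumptions; depending on the cluster version, applying a recommendation may still restart the pod):

```yaml
# Illustrative VerticalPodAutoscaler for a bursty deployment.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: bursty-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: bursty-app        # assumed workload name
  updatePolicy:
    updateMode: "Auto"      # let the VPA apply its recommendations
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "4"
          memory: 8Gi
```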
Now I will move from effective resource allocation to HPC in Kubernetes. We have been researching the potential of the Kubernetes platform to run big workloads, such as analyses of genomics data using a workflow manager. We asked ourselves two questions: can HPC work in Kubernetes, and will short-living tasks perform better in Kubernetes? We answered those questions by performing several genomics analysis runs on different infrastructures, those being a traditional HPC environment with the OpenPBS batch scheduler and, as the second environment, a Kubernetes cluster. We compared NUMA-aware and non-NUMA-aware Kubernetes environments with a NUMA-aware OpenPBS environment.
From our observations, we can safely state that for Kubernetes to perform as well as, or even better than, a traditional HPC environment, proper NUMA configuration is the most important aspect of success. We have configured just the standard Kubernetes NUMA settings, so no custom solutions or deep system administration work was needed.
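For illustration, the standard knobs amount to kubelet settings along the following lines; the concrete values, especially the reserved memory, are assumptions and must match the node's real topology:

```yaml
# Illustrative kubelet configuration enabling NUMA-aware placement.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static                 # pin exclusive CPUs for Guaranteed pods
memoryManagerPolicy: Static              # allocate memory from the same NUMA node
topologyManagerPolicy: single-numa-node  # reject placements spanning NUMA nodes
reservedMemory:
  - numaNode: 0
    limits:
      memory: 1Gi                        # assumed reservation for system daemons
```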
We also found out that the NUMA memory manager has limitations, because the Kubernetes scheduler does not see the amount of available memory within each NUMA node; it observes only the whole state of the cluster.
It happened to us that many pods were rejected from the cluster due to an UnexpectedAdmissionError, which is unrecoverable. This error is caused by not enough memory on the NUMA node assigned to the pod. It truly happened just because the scheduler thought that enough memory was available overall, but the pod was assigned to a specific NUMA node which didn't have the memory.
Additionally, the time elapsed from a job being scheduled to it running is much shorter in Kubernetes, because container images are cached and therefore start almost immediately, whereas in the OpenPBS environment there is a bit of setup which, with a larger number of jobs, significantly delays the whole computation. As a net effect, runs in Kubernetes were much more stable overall.
In these figures, you can see the graphical interpretation of our results. The upper left picture shows that the average duration of long-running processes of the genomics analysis is the highest for the non-NUMA-aware Kubernetes environment; if we configure NUMA, the time is identical to, or just slightly higher than, OpenPBS. On the other side, the upper right picture shows that if we compare short-living tasks, Kubernetes, whether with NUMA or non-NUMA configuration, performs significantly better than OpenPBS.
To sum the infrastructure comparisons up, we just saw that Kubernetes is certainly capable of accommodating HPC workloads, and its performance could improve even more. We found out that the Kubernetes scheduler acts almost as a LIFO (last in, first out) queue, because it does not preserve the queue order, and the implemented exponential back-off just makes more mess in the queue.
We all hear about and really feel the rising prices of electricity, and we listen to the stories about how not deleting emails adds to climate change by keeping the servers on. The majority of cluster providers would agree that there are times when huge clusters are just turned on but not utilized, or utilized with really low effectivity.
A small container will truly be better for a static website than starting a whole virtual machine. Moreover, we can tune the hardware based on CPU usage, and powering the nodes on and off based on true usage is an idea to work on.
We came up with the thought that some cluster nodes could be dedicated to running specific workload types, similar to scavenger jobs or short-lived jobs. If there is a sudden spike in the number of pods of this type, a new node could be dynamically added to the cluster.
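One way to sketch such dedicated nodes, under the assumption of a taint-based setup (the taint key and label below are illustrative): the node repels ordinary pods, and short-lived pods opt in with a toleration and a node selector.

```yaml
# Hypothetical pod pinned to nodes dedicated to short-lived jobs.
# Assumes nodes tainted with: workload-type=short-lived:NoSchedule
# and labelled with:          workload-type: short-lived
apiVersion: v1
kind: Pod
metadata:
  name: short-lived-task
spec:
  nodeSelector:
    workload-type: short-lived
  tolerations:
    - key: workload-type
      operator: Equal
      value: short-lived
      effect: NoSchedule
  containers:
    - name: task
      image: registry.example.org/task:latest   # assumed image
```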
Lastly, I would like to mention the concept of the hybrid cloud, which can be seen as a solution to the scheduling problem as well. The idea is pretty straightforward and is based on connecting the HPC world with the Kubernetes world. The HPC world usually has more resources or better scheduling capabilities, and the Kubernetes world is perfect for other, let's say short-lived, workload types. We are currently working on an implementation of an OpenPBS connector, which would allow moving pods from Kubernetes to the PBS world transparently.