From YouTube: Knative Community Meetup May 2022 - Serverless Research with STeLLAR and vHive with Dmitrii Ustiugov
Description
Serverless clouds boost developer productivity by taking over cloud infrastructure management, allowing developers to focus on their service's business logic. This division of labor opens opportunities for systems researchers to innovate in serverless computing. However, leading serverless providers rely on proprietary infrastructure that is ill-suited for systems research in academia. In this demo, I will talk about vHive, a full-stack open-source ecosystem for benchmarking, experimentation, and innovation in serverless clouds, which is now in use at 25+ universities and companies worldwide.
A
…about this session. So, without further ado, I'm really happy to introduce today's speaker, Dmitrii Ustiugov. Dmitrii received his PhD from the University of Edinburgh and is now joining ETH Zurich as a postdoctoral researcher, and his presentation is "Turbocharging Serverless Research with STeLLAR and vHive". So, Dmitrii, over to you.
B
Thank you, it's great to be here. Can you enable sharing the screen?
B
So serverless allows service developers to focus on writing their code as a set of functions, whereas the cloud providers take care of all the heavy lifting, which is automatic scaling of function instances according to the traffic changes. This division of labor boosts developer productivity, allowing fast time to market for their products. So say that developers need to write a video analytics application with two functions.
B
First,
the
function,
the
first
function,
the
codes
frames
which
comes
from
a
camera
in
video
fragments
from
the
camera,
and
it
invokes
the
second
function
to
recognize
objects
in
the
string.
So
in
serverless,
this
application
can
be
written
with
two
functions
composed
with
simple
function
calls
and
I'm
sure
everyone
is
familiar
with
that.
So
this
flexibility
also
comes
with
pays
your
goal
combined
with
space,
you
go
billing,
where
the
developers
only
pay
for
the
cloud
resources
which
functions
actually
use.
B
So
this
is
why
service
clouds
simplify
the
programming
burden
and
also
on
have
cost
advantages
compared
to
the
conventional
clouds.
B
From
the
serverless
call
provider's
perspective,
the
situation
is
completely
different
and
to
accommodate
changes
in
the
key
invocation
traffic
in
front
of
each
function.
The
provider
scales
the
number
of
instances
of
each
function
on
demand,
so
a
function
can
have
from
zero
to
virtually
infinity
number
of
instances
which
can
change
at
any
time.
B
As a result, providers tend to over-provision the number of instances according to their understanding of the cost-performance trade-off. The second example of a serverless problem is inefficient communication across functions in a service, which is related to their stateless nature: because providers scale instances on demand, instances are not allowed to hold shared state, so that any instance can process any invocation. These are entire research directions, with a lot of papers coming out every year.
B
Our first work in serverless characterizes the state-of-the-art cloud offerings, and it was the first to compare cloud performance across different providers, even with proprietary infrastructure. This was acknowledged by the leading Amazon engineer Marc Brooker even before the paper got published.
B
To analyze the various components of the deep serverless stack, we introduced the vHive ecosystem, for which we had a full-day tutorial at ASPLOS, a top conference in systems, this year. Everything is on YouTube, so you can take a look. And there are a lot of universities which use vHive today for their research studies and also for coursework. At the same time, we have a lot of collaborators and contributors across many companies, and this keeps growing over time, and many of these companies have research groups, even several of them, which use vHive for either evaluating their accelerator products or for understanding serverless workflows.
B
Using
beehive,
we
innovated
in
several
systems,
domains
with
the
first
work
called
reap
snapshots,
which
improves
the
reaction
time
to
traffic
changes
by
accelerating
launching
new
instances
of
functions
by
using
far
functional
working
set
of
air
snapshots.
This
reap
technique
is
already
supported
in
in
the
latest
version
of
aws
firecracker
and
the
second
work
called
expedited.
Data
transfers
or
xdt
introduces
a
cloud,
a
novel,
cross-function
communication
fabric
for
serverless
clouds,
which
enables
data
transfers
at
the
wire
speed,
in
conjunction
with
existing
servers
of
the
scaling
infrastructure.
B
And
this
is
the
work
we
actually
prototype
in
k
native
and
there
is
another
work
which
called,
which
is
called
jukebox,
which
got
recently
published
in
the
top
conference,
which
talks
about
specializing
micro
architecture
for
service
workloads.
B
So
in
this
talk,
I'm
going
to
start
by
presenting
the
performance
analysis
of
the
commercial
clouds
that
we
did
with
two
code
stellar,
which
we
deviced
and
then
I'll
talk
about
the
experimentation
framework
that
we
hive
represents
and
show
you
how
to
address
the
real
world
problems,
whether
it's
a
cold
start
or
serverless
functions,
communication
and
I'll
briefly
touch
about
the
future
work
for
future
directions
of
the
ecosystem.
B
And so, to launch a new function instance, first a function image, for example a container image, has to be retrieved from the storage service.
B
So, to reason about the overall performance of serverless clouds, we devised a method and toolchain to analyze the performance of each of these fundamental components in isolation.
B
With
this
insight,
we
introduce
stellar
an
open
source
framework
for
performance,
analysis
of
serverless
clouds
by
configuring
function,
characteristics
and
the
load
scenarios.
Stellar
can
stress
any
of
these
components
in
isolation,
and
we
showcase
some
of
the
results
by
comparing
three
leading
cloud
providers.
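To make that concrete, here is a minimal sketch of this style of black-box probing in Go; the endpoint URL, the 15-minute idle gap, and the sample count are illustrative assumptions, not STeLLAR's actual code or parameters:

```go
// Cold-start probing sketch: idle long enough that the provider is likely to
// deprovision the instance, then time an end-to-end invocation.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func timeInvocation(url string) (time.Duration, error) {
	start := time.Now()
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return time.Since(start), nil
}

func main() {
	// Hypothetical HTTP-triggered function endpoint.
	url := "https://example.execute-api.us-east-1.amazonaws.com/prod/hello"
	for i := 0; i < 10; i++ {
		time.Sleep(15 * time.Minute) // let the instance go cold (assumed keep-alive window)
		if lat, err := timeInvocation(url); err == nil {
			fmt.Printf("cold-start sample %d: %v\n", i, lat)
		}
	}
}
```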
B
We noticed that Python container deployments are significantly slower and more unpredictable than the others. One possible explanation of this phenomenon is that a Golang program is compiled as a static binary, suggesting that both the zip and the container image comprise the same binaries, which are likely to be stored in the same storage service. Meanwhile, for Python, a container-based deployment shows higher median and tail latencies compared to the corresponding zip deployment. We attribute this behavior to the fact that Python imports modules dynamically, requiring on-demand accesses to multiple distinct files in the function image.
B
When combined with the container-based deployment method, we hypothesize that this results in multiple accesses to the function image storage, since the container runtime splits the image into chunks and loads them on demand. The additional accesses to the image store would explain the high cold-start times and latency variability which we observed for Python container-based deployments. In particular, the combination of the runtime choice and the deployment method can have a severe impact on the overall performance: compared to zip, container-based deployments for Python can significantly increase the response time.
B
With the first function saving the payload to storage and the second function loading the payload from storage, we capture these latencies with timestamps taken by the user code in these functions. On the left chart, one can see median latencies shown with solid lines and tail latencies shown as dashed lines. On the right chart, the CDF lines are for one-megabyte and one-gigabyte payload transfers, respectively.
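A minimal sketch of that timestamping scheme in Go, with the storage client abstracted behind a hypothetical interface (BlobStore and its methods are stand-ins for a real SDK such as an S3 client):

```go
// Transfer-latency measurement sketch: timestamp before the upload and after
// the download; the difference is the storage-based transfer latency.
package main

import (
	"fmt"
	"time"
)

// BlobStore is a hypothetical stand-in for any object storage client.
type BlobStore interface {
	Put(key string, data []byte) error
	Get(key string) ([]byte, error)
}

// measureTransfer mimics the two functions' user code in one process; in the
// real setup the two timestamps are taken in separate function instances.
func measureTransfer(store BlobStore, key string, payload []byte) (time.Duration, error) {
	start := time.Now() // producer-side timestamp
	if err := store.Put(key, payload); err != nil {
		return 0, err
	}
	if _, err := store.Get(key); err != nil {
		return 0, err
	}
	return time.Since(start), nil // consumer-side timestamp
}

func main() {
	fmt.Println("plug in a concrete BlobStore (e.g., an S3 wrapper) to take samples")
}
```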
B
So
clearly,
storage
transfers
are
slow
and
have
very
unpredictable
response
time,
for
example,
for
one
megabyte
transfer
in
google.
This
delay
the
result
in
150
millisecond
median
and
around
six
second
tail
latencies,
which
corresponds
to
more
than
order
of
magnitude
gap
between
the
tail
and
the
median.
B
So the first problem is related to loading a function's initialization state from the storage, which is at the core of the cold-start problem. The second issue related to data movement is the cross-function communication, which has to happen through the storage and which contributes a lot to the median and the tail latency. So, for today's systems, there is huge headroom for improvement on both of these issues, and it's fair to say that today's clouds are bound by data movement. So this is the problem that we targeted first in our research.
C
So the reason, I think, that storage is external is to enable scaling, to keep functions stateless. So did you evaluate actual scaling of functions, not just looking at the latency of execution? Like, what happens when you increase the number of invocations, say, 1000 times: does the storage become a bottleneck there, or does keeping functions stateless actually keep scaling very nice? And also cost, because if you look at it, cost is maybe the most important thing; performance I can buy.
C
You
know
like
dedicated
instances
and
whatnot
and
I
get
very
low
cold
start
and
maybe
dedicated
storage.
But
then
I
will
be
paying
a
lot
and
also
I
cannot
scale
that
much.
B
This
is
an
excellent
question
and
we
actually
did
experiments
with
bursting
vacations
where
we
invoke
the
same
function
like
1000
times
in
a
short
period
of
time,
and
we
actually
see
completely
different
behavior
from
different
providers.
B
The scaling is not like a cold start, because it's rather easy to have a caching layer in front of the storage, but all the providers have different ways to deal with scaling. So this is what we observed. And regarding the cost, this is an excellent question; I'll talk about it a bit later.
B
In
one
of
the
works,
the
cost
of
transmitting
the
femoral
storage
is
comparable
with
the
compute
costs
of
function
execution.
So
we
know
it
is
like
30
to
60
percent
of
the
cost
going
into
the
storage
cost,
and
we
are
talking
about
s3,
which
is
rather
slow,
storage,
so
cost
is
definitely
a
problem.
D
Sorry, I worked on the Google infrastructure; it is fascinating to see this measured from the outside.
B
You
know
we
have
like
this
is
an
overview.
You
know
overview
talk,
so
we
can
go
as
deep
as
you
like.
Actually,
and
we
can
have.
You
know
a
much
deeper
dive
together
with
you
and
your
team.
D
I'm
not
working
at
google
anymore,
but
but
it's
really
interesting
to
see
lambda
actually
has
a
slightly
different
architecture
than
google
cloud.
I
don't
know
what
azure's
internal
architecture
is,
but
I
link
to
a
youtube.
Video
aws
has
a
great
how
lambda
works.
Talk
from
reinvent
2018.
B
Yeah
for
sure
just
to
share
yeah
so,
and
we
also
compared
like
all
three
providers,
so
you
can
actually
see
which
provider
is
stronger,
in
which
sense
so.
E
A
quick
question
I
have
because
you
said
the
communication
happens
through
storage.
I'm
going
to
assume
that
not
all
communications
happens
through
a
storage
like
you
can
just
do
an
hp
call
potentially
to
another
function.
Do
you
measure
those
things.
B
Yeah
we
do
measure
so
the
problem
with
inline
communication
when
you
put
arguments
in
the
http
packet
is
that
it's
limited,
so
you
can
transmit
a
lot
of
data,
so
it's
usually
like
several
megabytes
and
another
problem
is
that
it
really
limits
the
function
execution
model.
So
you
cannot
efficiently
support
scatter
gather
patterns,
for
example,.
B
All
right
cool,
so
stewart
allowed
me
allowed
us
to
pinpoint
and
evaluate
the
data
movement
issues
at
the
component
level.
So
now
we
need
to
dive
deeper
into
this.
So
what
we
did
is
we
tried
to
analyze
the
tools
available
for
service
research
in
like
open
source
academic
environment.
So
what
we
looked
at
is
first,
the
production
servers,
deployment
and
these
systems
feature
complex,
distributed
software
stacks
with
many
proprietary
components
and,
on
the
other
side,
the
two
chains
which
are
available
to
the
academic
researchers.
B
They
are
insufficient.
Often
insufficient
and
academics
often
focus
on
distinct
components
rather
than
complete
systems,
and
also
many
prototypes
rely
on
technologies
like
containers,
while
most
provider
like
at
least
the
biggest
providers,
they
moved
on
to
lightweight
virtualization.
B
So
what
we
had
to
build
is
a
complete
open
source
framework
for
service
research.
The
good
news
is
that
there
are
so
many
companies
which
open
source
their
key
components
and
we
integrated
them
in
a
single
representative
framework
for
serverless
experimentation.
So
we
adopted
k
native
and
kubernetes
as
a
function
as
a
service
programming
model
and
the
orchestration
framework
for
sandboxing
technologies.
B
We
support
firecracker
and
juveniles
and
micro
vms,
along
with
the
vanilla
containers
that
k
native
supports
and
for
life
cycle
of
micro,
vms
and
other
control,
plane
messages
we
support
container
d,
just
like
a
native
and
jpc,
is
used
across
the
whole
stack,
which
comes
with
a
lot
of
advantages,
for
example,
for
metrics
collection
and
profiling.
B
So
what
behind
framework
today
is?
It
is
representative
of
production
clouds,
and
it
includes
only
open
source
reaction
grade
components
and
what
we
are
actively
working
on
and
keep
expanding
is
the
tool
chain
for
holistic
benchmarking
both
end
to
end
and
per
component,
which
includes
a
representative
suit
of
the
workloads
distributed,
tracing
support
and
also
full
system.
Psychoaccurate
simulation
support
in
j5
cpu
simulator.
B
So
we
have
allowed
us
to
innovate
in
three
different
system
subfields,
and
this
talk
will
focus
on
two
works,
only
yeah,
so
the
first
one
is
innovation
in
the
operating
system
and
the
cold
starts,
and
the
second
one
is
actually
the
communication.
B
First
is
the
time
that
the
scheduler
takes
to
select
the
worker
host
to
launch
a
new
instance
and
according
to
aws,
this
is
the
shortest
and
most
predictable
delay.
So
I
actually
exclude
this
from
the
analysis,
even
though
in
key
native,
it's
not
as
it's
not
the
case.
It's
still
quite
long
and
the
second
one
is
the
second
category
is
related
to
loading
the
state
from
the
storage,
and
this
depends
on
the
function
instance.
Isolation,
technology
and
the
third
phase
is
actually
the
the
processing
in
our
experiment.
B
So the snapshot is taken when the MicroVM has a function instance which runs inside and is completely ready to serve invocations. During the restoration, first, the hypervisor loads and restores the state of the virtual machine monitor and the emulated devices. Second, the hypervisor maps the guest memory file into the main memory without populating the memory contents, and this is important. And then the hypervisor resumes the execution of the virtual machine from the point at which the snapshot was taken.
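The second step is the crux: the guest memory file is mapped but not read. A minimal sketch of that step in Go, assuming a Linux host (an illustration of the mechanism, not Firecracker's actual code):

```go
// Lazily map a guest-memory snapshot file: without MAP_POPULATE, no page
// contents are read here; each first touch after the VM resumes triggers a
// page fault that is served from disk.
package main

import (
	"fmt"
	"os"
	"syscall"
)

func mapGuestMemory(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close() // the mapping outlives the file descriptor

	info, err := f.Stat()
	if err != nil {
		return nil, err
	}
	// MAP_PRIVATE gives copy-on-write semantics, so guest writes do not
	// modify the snapshot file on disk.
	return syscall.Mmap(int(f.Fd()), 0, int(info.Size()),
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_PRIVATE)
}

func main() {
	mem, err := mapGuestMemory("guest_mem.snap") // hypothetical snapshot file
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("mapped %d bytes without populating them\n", len(mem))
}
```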
B
On this chart, you can see the measured latency breakdown as pairs of stacked bars for each function type, with the left bars showing the warm invocation latencies and the right bars showing the cold invocation latencies. You can notice that the right bars are much higher than the left bars. In particular, there is a big orange fraction, which is related to establishing a new connection and which is not even visible in the warm case, because it's not there. And second, you can notice that the green part, which is the function processing, increased by an order of magnitude.
B
And
the
deal
is
that
these
functions
are
written
in
python
and
use
quite
a
lot
of
different
functionality
inside
the
guest
printing
system,
for
example,
the
guest
networking
stack
now
we
should
recall
that
firecracker
doesn't
populate
the
guest
memory
with
its
contents
from
the
snapshot
and
instead
of
relies
on
lazy
paging,
which
results
in
a
series
of
page
faults
arising
after
the
function.
It
resumes
its
execution.
B
These
page
faults
are
processed
one
by
one
and
take
a
lot
of
time,
because
many
of
them
require
retrieving
their
contents
from
disk.
That's
why
we
found
that
the
disk
accesses
upon
the
page
faults
dominate
the
whole
cold
start
latency.
We
traced
these
page
folds
and
found
out
that
the
following
the
following
key
observation:
when
functions
execute,
they
touch
almost
the
same
set
of
memory
pages,
which
means
that
the
function
have
stable
working
sets
across
invocation.
B
So
now,
if
you
imagine
that,
like
a
function
which
rotates
an
image,
it
kind
of
makes
sense
that,
whether
it
retains
a
cat
image
or
a
dog
image,
you
will
still
engage
the
same
python
modules.
For
example,
it's
the
same
networking
stack
and
so
on.
So
this
led
to
an
intuitive
solution
to
record
and
prefetch
the
memory
pages
from
this
workings
working
sets
of
for
each
function.
B
So first, the entire working set file is read from the storage in a single I/O operation. Then all these pages are installed eagerly, at once, into the guest memory, and this allows us to avoid the bulk of the page faults, except for the rare accesses to the pages outside of the working set, which are still retrieved from the storage on demand. This way, REAP snapshots accelerate all the cold starts after the first invocation, at the cost of a little extra storage.
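A sketch of that record-and-prefetch step, under the assumption of a packed working-set file whose pages appear in the same order as the recorded offsets (a simplification, not the real vHive/Firecracker implementation):

```go
// REAP-style prefetch sketch: one sequential read of the packed working-set
// file, then an eager copy of every recorded page into the lazily mapped
// guest memory, so those pages never fault.
package reap

import "os"

const pageSize = 4096

// WorkingSet describes the pages recorded during the first invocation.
type WorkingSet struct {
	Offsets []int64 // guest-memory offsets of the recorded pages
	File    string  // packed page contents, in the same order as Offsets
}

// Prefetch installs all recorded pages into guestMem at once.
func (ws *WorkingSet) Prefetch(guestMem []byte) error {
	packed, err := os.ReadFile(ws.File) // the single I/O operation
	if err != nil {
		return err
	}
	for i, off := range ws.Offsets {
		copy(guestMem[off:off+pageSize], packed[int64(i)*pageSize:int64(i+1)*pageSize])
	}
	return nil
}
```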
B
So this plot shows the cold-start latencies of the different functions. Each pair of bars corresponds to a single function type, and in each pair the left bar stands for the vanilla Firecracker snapshots, while the right bar stands for the REAP snapshots. First, one can see that REAP significantly reduces the time of restoring the connection between the function server inside the MicroVM and the rest of the infrastructure, showing the efficiency of prefetching the networking stack and the commonly used and reused code. And second, the function processing fraction is reduced by more than four times on average for all these workloads, and the overall speedups for all these functions are significant as well. So it's a software technique which delivers a multi-fold acceleration, quite impressive, and this all comes with just a small extra file being recorded.
B
So
the
way
it
works
is
that
we
found
that
it's
the
lazy
paging,
which
causes
the
long
series
of
page
faults
and
what
rip
does
it
introduces
selective
eager
pre-folding
by
in
moving
the
working
set
pages
all
inside
the
guest
memory
at
once,
exploring
the
trade-off
of
just
a
little
bit
extra
storage
for
this
working
set
files.
B
To
get
rid
of
the
bulk
of
the
page
faults-
and
this
is
already
supported
in
aws
firecracker-
these
do
support
like
user
page
fault
handling,
which,
however,
they
don't
really
open
source.
The
only
like
one
day,
wirecracker
had
why
the
code
is
available.
B
D
I had a couple questions over in the chat. This was running on vHive, yeah? Were the container images already prefetched locally onto the disk for the timing?
B
Well,
in
this
case,
in
this
case,
it
was
just
loading
the
snapshots
right,
so
the
dev
mappers,
the
shutter,
was
already
full,
so
okay,
this
was.
B
Yeah,
this
is
actually
a
great
question
and
like
in
the
production
system,
this
would
be
another
fraction
as
well.
D
There's
also
some
work
with
container
d
and
crio
to
fetch
docker
images
on
a
sort
of
lazy
basis
on
demand.
Basically,
because
if
you
use
the
default
mechanism,
you
need
to
pull
down
and
unpack
all
the
tars
and
write
them
all
out
to
the
file
system.
Before
you
can
start
your
container
yeah.
B
That's another cold-start hit, yeah, absolutely. So with the guest memory, like, with snapshotting, the devmapper should not be on the critical path, but it can be for, like, you know, parts of the file system which are not memory-mapped, and that can be a big hit, because, like, Docker images are compressed all over the place and across the board. So decompression takes way more time than, you know, mapping and even page faults. So we need to restructure that part.
D
There's the eStargz effort that attempts to restructure that compatibly within the Docker ecosystem, but it sounds like these had already been unpacked into Firecracker's memory-mapped things, yeah.
B
So
this
is
something
that
we're
just
about
to
merge.
Actually,
so
we're
going
to
merge
the
support
for
that,
and
then
we
actually
wanted
to
explore
how
this
extends
to
multi-node
setup
and
that's
very
interesting
as
well,
because
there
is
a
lot
of
cost
radius
performance
trade-offs
all
around
that.
If
you
can
store.
D
…both the memory-mapped things and those prefetch maps in shared storage, you can get a lot of scalability, exactly.
B
Yeah
but
the
most
simple
case
you
just
store
them
together,
basically,
but
obviously
like
we
can
take
a
look
at
how
to
scale
it
better.
E
Yeah, that was my question, about, like: are you moving snapshots across the worker nodes? It's interesting; I'm glad you're thinking about that. One thing I was gonna ask, but then you answered it here: I'm surprised the page faulting seems to be just, like, a Firecracker issue, and, I'm assuming there's a reason, why didn't they introduce the, like, re-hydrating of the guest memory, right? Do you know what that reason is?
B
Well,
it's
actually
it's
not
a
firecracker
specific
issue.
It's
just
related
on.
You
know
the
host
os,
relying
on
lazy
patient
across
the
board
right.
The
way
lazy
patient
is
the
default
policy.
So
if
you.
E
B
Right, yeah, exactly. So if you use gVisor snapshotting, you would probably also have, like, either lazy paging or eager paging, which means moving everything, which is also suboptimal, as we find.
E
Yeah, that's what I thought. I guess, to clarify, what is in the snapshot is probably useful for me to know: are you including the guest memory, or just the, like, just the app, essentially?
B
It's
the
subset
of
guest
memory
pages
and
like
the
like,
the
hypervisor
doesn't
need
to
know
what's
inside,
so
we
actually,
we
are
completely
dumb
and
be
in
the
oblivious
to
what
we
snapshot
in
this
working
set
files.
But
it's
like
it's
gonna,
be
the
modules
and
the
libraries
which
are
touched
upon
the
invocations,
for
example,
not
the
ones
which
are
used
for
which
are
attached
during
boot
time,
for
example,.
D
But
if
you're
using,
for
example,
java
this
wouldn't
capture
the
jitted
code
after
hotspot,
would
it.
B
That's
a
good
question
actually,
so
I
think
this
question
is
how
much
the
applications
of
wolves
right
during
their
lifetime-
and
there
is
nothing
preventing
us
from
recapturing
updating
this
memory
pages
and
so
on,
and
also
if
the
updated
code
resides
in
the
same
exact
guest
memory
pages,
then
it's
going
to
be
reused.
Naturally,.
D
B
That's 100% true, and this is not something that we add to the security problem, like, the security model; this is an inherent problem for snapshots in general, and actually AWS is working on this issue. And there are, like, more issues related to that; there is, for example, ASLR, which is also compromised by default, yep. So there are a bunch of things that need to be fixed, and, as far as I understand, AWS folks already started patching Linux, at least for…
D
…generators. I would love to find a standardized way to do this and to fold that into Knative. But, yeah, ASLR is address space layout randomization.
B
That's
exactly
right.
I
can
just
point
you
to
one
of
the
archive
papers
that
aws
folks
published,
and
they
at
least
enumerate
the
problems
and
the
patches
that
they're
they
were
in
work.
This
was
here
that
would
be
very
interesting
yeah.
Definitely
it
would
be
good
to
actually
somehow
connect
to
the
community
like
to
exchange.
You
know
more
information,
maybe
questions
answered,
and
so
on.
I
see
a
lot
of
like
you,
know,
good
questions
and
reactions.
B
All
right,
so,
let's
talk
about
the
communication
now
so
service
workloads
are
diverse
and
many
of
them
have
a
lot
of
communication.
As
you
know
so,
to
devise
a
solution
for
fast
cross-function
communication,
we
need
to
account
for
special
characteristics
of
service
programming.
Mode
first
functions
are
stateless
and
they
cannot
hold
state
or
share
it
directly
with
other
functions
due
to
scalability.
B
So this leaves functions to communicate only through an external storage in the general case, and there's a fundamental contradiction here, because serverless compute scales purely on demand and it is stateless, but it has to be coupled with this classic, stateful, always-on service, which is storage. And as a result, often the storage people choose, like S3, is slow, because it's not devised for such workloads, and it's also pricey; like, for the workloads we consider, it's up to 70% of the cost.
B
Let's consider the same video analytics service which I presented in the beginning: the first function decodes frames from the video fragments coming from the camera and invokes the second function for object recognition. So in serverless, the deployment of these two functions would look somewhat like this. So first, let's assume that the service has been operating for a while, so there are already active instances for each of these functions, and everything starts with the decoder function producing a frame in which it wants to recognize objects.
B
So now, if you look at the sequence of these actions, it is clear that the real fundamental problem is due to the fact that the decoder function instance, which is the source of the object transfer, doesn't know the destination of this transfer, and that is why an external storage service needs to be on the path.
B
So we introduce a technique called XDT, or expedited data transfers, which enables communication without storage. Again, a decoder function instance produces a frame, and then the user code invokes the second function directly, with the object as an argument, which can be a heavy object. Then the runtime in that decoder function instance keeps this object buffered in the memory of the decoder function, creating an XDT reference.
B
The runtime then replaces the object in the invocation request with the XDT reference and forwards this lightweight packet, with the object stripped, to the scheduler. Like in the baseline case, the scheduler chooses the most appropriate instance of the recognition function and forwards the invocation to the runtime inside that instance, and the runtime there uses the reference to pull the object directly from the source. So this way, XDT achieves direct object-transfer communication, in conjunction with the existing autoscaling and load balancing, which is driven by the scheduler.
B
A few words about the implementation. So the key design principle of XDT is in separating the serverless control and data planes, which we implement in the XDT SDK, which is kind of conceptually similar to the AWS SDK that is used for packaging the functions. So at the transfer source, the SDK offers the same API as the baseline system, an invoke API or a get/put API, transparently extracting all the large objects from the control plane requests. The invocation itself still goes through the scheduler, whereas the heavy objects go through the expedited data plane, and at the destination the SDK reassembles the original invocation with all the objects before passing the control to the user code. So note that the objects stay located inside the memory of the source function instance and add no extra footprint compared to the baseline, and the transparent data plane separation allows cloud providers to choose any protocol.
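A minimal sketch of the source side of that split, with hypothetical types and names (not the actual vHive/XDT SDK): the payload stays buffered at the source, and only a small reference travels through the scheduler, which the destination later uses to pull the payload directly.

```go
// XDT-style control/data plane split, source side.
package xdt

import "sync"

// Ref is the lightweight token that replaces a heavy payload in the
// invocation request (hypothetical format).
type Ref struct {
	SourceAddr string // where the destination pulls the payload from
	ObjectID   string
}

// Buffer keeps outgoing payloads in the source instance's memory.
type Buffer struct {
	mu      sync.Mutex
	objects map[string][]byte
}

// Stash buffers the payload and returns the reference to send through the
// control plane (the scheduler) instead of the payload itself.
func (b *Buffer) Stash(selfAddr, id string, payload []byte) Ref {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.objects == nil {
		b.objects = make(map[string][]byte)
	}
	b.objects[id] = payload
	return Ref{SourceAddr: selfAddr, ObjectID: id}
}

// Take hands the payload to the consumer that dereferenced it and frees the
// buffer slot, so the transfer adds no lasting extra footprint.
func (b *Buffer) Take(id string) ([]byte, bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	p, ok := b.objects[id]
	delete(b.objects, id)
	return p, ok
}
```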
B
We evaluated three workloads, namely video analytics, ensemble model training, and MapReduce, and here I put the latency breakdown for a single request going through each of these workloads as it flows. Each of the workloads, as you can imagine, has several functions, each of which has a compute phase, shown in the orangish colors, and communication phases, shown in the bluish colors. So you can see that, for all the workloads, XDT significantly reduces the communication fraction, accelerating the overall execution for all of these workloads.
B
So,
to
recap:
the
system
design
lessons
here,
the
key
problem,
why
modern
serverless
systems
require
functions
to
communicate.
Certain
external
storage
relates
to
the
fact
that
it
is
the
scanner
that
makes
decisions
for
the
destination
of
the
transfer,
so
exete
delays
the
actual
transfer
until
that
decision
effectively
replacing
eager
storage,
page
transfers
with
lazy
transfer
but
through
a
high
bandwidth,
low
latency
fabric.
B
So
the
main
away
from
this
work
is
that
storage
is
actually
not
required
because
the
data
can
be
buffered
at
the
source
and
quickly
consumed
and
recycled
by
their
destination.
E
Would I want to run this in a serverless fashion, or do I want to run it in, like, a classic, whatever people used before, like Apache Storm, where we have topologies, or Heron is, like, the new version there? Like, is there a functional benefit? I guess, given the use case, if I know I'm going to have a continuous stream of video and I need to do processing, should I use classical approaches, as maybe what I'd call it, versus, like, a serverless approach?
B
Yeah,
this
is
an
excellent
case.
This
is
an
excellent
question
and
the
answer
is
that
in
this
case
it's
we
do
support
complete
skill
to
zero
as
compared
to
classic
approaches.
So
I
just
showed
this
simple.
You
know
warm
everything
case,
but
you
can
imagine
that
cold
starts
can
be
overlapped
with
data
transfer
for
further
efficiency.
B
So
we
deliver
efficiency
similar
to
a
streaming
system.
Wheeze
the
scale
down
capability.
B
But
it's
a
great
noise.
This
is
like
a
common
serverless
systems.
They
are
not,
you
know,
compatible
with
streaming
services,
but
with
xdt,
where
we
decouple
the
data
transfer
from
their
like
transfer
of
references,
xt
references,
then
we
can
steer
xt
reference
through
streaming
engines
like
kafka
and
then
like
scatter
gather,
reconcile
like
join,
and
all
of
it
is
totally
possible.
So
this
is
something
that
we
plan
to
work
on
in
future.
D
I
understand
why
s3
is
an
attractive
target
for
that
kind
of
storage,
but
have
you
looked
at
using
using
less
durable,
but
still
external
storage
services
like
redis
or
aerospike,
or
something
like
that
and
how
that
compares,
because
yeah
s3
does
a
lot
to
make
sure
that
once
you've
handed
the
handed
it
data
before
it
returns,
it's
actually
landed
on
some
disks
somewhere
in
a
way
that
you
won't
ever
lose
it
again.
B
Yeah,
this
is
a
great
question
and
the
answer
is
we
can
exit
you
can
perform
better
than
radis.
We
actually
evaluate.
B
Like
we
have
been
invalidating
with
xct
versus
elastic
cache,
which
that
is
really
is
from
aws
and
we
can
perform
better
because
it's
fewer
network
round
trips
and
we
are
not
bound
by
their
redis
replica
itself,
like
if
you
scale
your
producer
instances,
consumers
to
like
tens,
hundreds
and
so
on
you
can.
You
can
scale
your
bandwidth
together
so
now.
The
question
is:
how
to
you
know,
organize
the
programming
model
so
that
we
can
have
continuous
scaling
of
compute,
and
then
you
get
the
bandwidth
scale
and
automatically
session.
A
Hey, I'll just interject here real quick. Thank you, Dmitrii, so much. I know people are going to have to start leaving, as we've only got about a couple minutes left.
A
And then we will hand it back over to Dmitrii, as well as to Evan and Carlos and any other TOC members or anybody else who has an announcement that they would like to make for today's meeting.
D
And if you think you might be eligible to vote, you can try logging into Elekto, and it will tell you. Also, if you've been doing a lot of stuff with the Knative community and you aren't listed as eligible to vote, which means at least 50 interactions with one of our GitHub projects, let steering know ASAP and they might be able to do an exception, but obviously that would be very last minute, given that voting closes tomorrow.
G
A one-day event happened during KubeCon last week, and the videos are available online already. So if people want to watch those, you can go ahead, and also there are other Knative talks in the main schedule by others.
A
Yeah
I'll
make
sure
that
that
is
included
great.
B
Yeah, we have some more; actually, we had some questions for the Knative, you know, engineers, which can help us in further research.
G
We have some former researchers.
B
Cool
yeah
and
we
would
actually
appreciate
the
connections
to
like
people
who
work
on
this
thing
and
like
many
feedback
over
there.
D
And
you
can
talk
to
me
too,
but
dave
is
the
working
group
lead
and
paul
as
well
isn't
on
his
way
there,
I
think,
on
the
the
serving
side
of
things.
E
A weekly meeting; it alternates. So, for example, I think, like, today we cancelled it because of this meeting; conflicts might shift it around, but it's on the Knative calendar. So there's an alternate, this is where the timing doesn't work out for the EU. So essentially, like, maybe what I'll do is I'll shift the… if you pick a date, I can shift that meeting earlier to accommodate European time.
E
Yeah, because right now it kind of favors, like, one time zone: it favors Japan, which one of our contributors is in, and the other time zones just favor kind of the majority, which is, like, Eastern and West Coast time.
E
If you want to ping me, it's just dprotaso on the Knative Slack. And, like, with the date, I can shift that time to accommodate your team, if you want to have it. So I think, for the majority part, like, a bunch of us are mostly, like, Eastern and West Coast, so maybe, like, a 10 o'clock start is not too extreme for us, or nine, and it might be okay for you.
B
So, yeah, I just want to mention, like, what we're going to work on. What we are working on right now, like, a bit further, is that we are actually looking at the scheduling space, and we are working on a methodology to replay traces from production clouds, like Azure Functions, on, like, small-scale clusters, so that we keep all the characteristics of, you know, cold starts and memory footprints and so on.
B
Another is that we have a benchmark suite that is called vSwarm, with a lot of, like, more than 40 workloads right now, with small functions, like, multi-function workloads and so on. And we also look at simulation of the hardware and plan to extend to emerging clouds, like the edge, but we haven't found Knative-enabled platforms for now; so if there's any feedback from that side, it would be great. And I think there were a few questions that we wanted to ask.
C
Hi, I have one; I was asking it before. You had this diagram with Knative, and then all the other systems that are public cloud serverless, but there are today Knative-API-compatible systems, like Google Cloud Run, and I think there are two versions there, and there is also IBM Code Engine available. So they are all serverless, and you can run, I think, all those workloads on them, but it would be very interesting to find out, you know, how they compare, or whether they do those optimizations.
B
Yeah,
this
is
a
it's
a
great
point,
yeah,
like
obviously,
with
all
these
platforms,
we
have
visibility.
You
wanted
to
like
some
of
the
you
know,
user
level
infrastructure,
some
optimizations,
that
we
do.
They
need
a
further
deeper
visibility
or
modifications
to
like
hypervisor
or
like
q,
proxies,
for
example,
so
xdt
we
basically
implemented
in
two
process
and
the
sdk,
but
some
of
it
is
possible
yeah
good
point.
Thank
you.
D
Why Knative allows container concurrency greater than one: so, Knative evolved out of Google App Engine and also from looking at some other platforms like Cloud Foundry, and in a lot of those they allow you to run a full-fledged application. And, as you noticed, cold start can be a little bit of a pain, and also, if you allow only one request at a time, you have to scale really high in terms of concurrency and also cost, but a lot of the time these functions are doing…
E
And Lambda allows for more than one request at a time, if you configure it, I think. Does it? I think it was something in the…
D
Okay,
so
historically,
aws
lambda
had
a
model
where
the
worker
calls
out
to
the
controller
process
and
it
could
only
have
a
single
basically
ticket
in
flight
at
a
time.
This
allows
you
to
do
some
simplified
programming
models
like
having
globals
and
not
have
to
worry
about
it.
It
sounds
like
maybe
in
the
last
couple
of
years
they
may
have
relaxed
that
somewhat,
but
by
default
they
started
with
one
and
that
gave
a
simpler
programming
model
and
if
you
know,
there's
only
one
request
in
flight
at
a
time.
D
If
you
see
an
outbound
connection,
you
know
it's
associated
with
which
request
is
associated
with,
whereas
if
you
have
more
than
one
container
request
at
a
time-
and
you
see
an
outbound
request-
you
don't
know
which
one
the
request
is
associated
with.
Unless
you
do
some
really
fancy
ugly
stuff
that
google
app
engine
did
early
on,
but
I
don't
recommend
it
to
anyone.
B
D
Potentially, but you can also run into the same problems if you reuse a container for more than one request, and they do that.
D
The next question was about a Knative worker running less than 200 pods, while AWS, their workers will do a thousand or so; it's in that AWS internals talk. So Kubernetes assigns an IP address for each pod, and the default Kubernetes IP address layout restricts you to about 110 pods, which is a kubelet-level flag, max-pods. Theoretically, you could do lots of big workers by changing a bunch of Kubernetes-level flags; in practice,
D
Very
few
people
do
that,
but
it's
due
to
ip
address,
limited
limitations
in
kubernetes
and
yes,
the
kubernetes
scheduler
is
a
bottleneck.
We've
talked
about
what
would
it
look
like
to
change
kubernetes
so
that
you
could
pre-schedule
a
pod
but
not
start
it
yet
and
lease
those
resources
back
to
the
cubelet
for
lower
priority
tasks?
That's
going
to
be
a
major
engineering
effort.
D
My
estimate
would
be
one
and
a
half
to
two
years
for
four
to
five
engineers,
at
least
one
of
whom
is
pretty
senior
and
deep
in
the
kubernetes,
the
kubernetes
ecosystem.
So
yes,
we're
aware.
Yes,
it's
a
blocker,
also
propagating
ip
address.
Decisions
in
kubernetes
also
ends
up
being
part
of
our
slow
path,
because
you
don't
know
what
I
p
address.
D
A
pod
is
going
to
have
until
it's
assigned
to
a
node
and
that
node
cni
has
made
a
decision
about
that
ip
address
and
then
whoever's
upstream,
like
the
activator
that
needs
to
call
needs
to
find
out
what
ip
address
that
pod
decided
on.
So
if
you
could
either
have
the
activator
be
on
the
same
node
so
that
the
latency
for
that
was
near
zero
or
if
you
knew
that
ip
address
in
advance,
which
starts
to
hit
you
up
against
those
ip
addressing
limits.
B
So
yeah,
that's
a
very
good,
very
good
production
points
and
we
are
actually
actively
looking
at
the
scheduling
right
now
and
we
are
trying
to
understand
the
bottlenecks
and
we
are
trying
to
see
which
policy
which
we
should
use
like
from
the
tune.
Theory
perspective
for.
B
F
Yeah, if you come to the working, the serving working group, you'll find all of us at the same time; it's probably the best time to start, and then we can, like, have other meetings started, like, pick it up from there.
B
Yeah,
amazing,
that's
pretty
much
what
I
wanted
to
like
talk
about
and
thanks
a
lot.
I
hope
it
was
interesting
and
we
have
a
lot
of
projects
in
flight
both
in
edinburgh
and
in
the
th,
and
the
beehive
is
getting
a
lot
of
popularity.
So
actually
yeah
join
the
community,
follow
us
and
see
how
we
can
do
knowledge
transfer,
for
example,
both
ways.
D
B
Where to find all these works? Like, in a DM, probably; I need some kind of… you can…
G
Yeah, I think we can close. And thank you, Greg, for hosting this first-time meetup; appreciate it. Yeah, it's my pleasure.