From YouTube: Knative tech talk - Build and Autoscaling
Description
This tech talk gives a brief introduction to the Build and Autoscaling projects in Knative.
Build is presented by Jason Hall from Google.
Autoscaling is presented by Joe Burnett, also from Google.
Jason Hall: The model that Knative Build uses is largely based on Google Cloud Build's model, mostly for historical reasons: that's what I had worked on since 2015 and at the time we founded Knative. That model is also somewhat similar to CircleCI's containerized build process, if you've used that, but they've all diverged since then. Build has received significant contributions from people at Google, Pivotal, Red Hat, and many others. It is a team effort, and we wouldn't have been able to do it without everybody.
Build's resource model is fairly light and fairly simple, especially compared to Serving and Eventing. We have three custom resources. One is Build: a Build optionally specifies some source, like a git repo to build from, and the steps to take on that source. Steps are required; they are steps to run in order, in containers, on the cluster. When the build controller sees a Build, it starts a pod, watches that pod, and then reports the status back through the Build's status. Build logs are exposed from the underlying pod.
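To make that concrete, here is a minimal sketch of a Build (the field layout follows the knative/build v1alpha1 API; the repo URL and step images are placeholders, not taken from the talk):

  apiVersion: build.knative.dev/v1alpha1
  kind: Build
  metadata:
    name: example-build
  spec:
    source:
      git:
        url: https://github.com/example/app.git   # optional source to fetch
        revision: master
    steps:                                         # required: run in order, each in its own container
    - name: test
      image: golang
      command: ["go"]
      args: ["test", "./..."]
    - name: build-and-push
      image: gcr.io/kaniko-project/executor
      args: ["--destination=gcr.io/example/app"]
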
So if your build, say a Maven build or something, emits logs, those will be available in the underlying pod (asterisk, mostly, for something we will talk about later). In addition to Build, we have another resource called BuildTemplate, which is basically a shareable, reusable, parameterized build process. The template specifies some steps and says: I will run steps A, B, C, where each of them can be parameterized in some way, and it is then instantiated with a Build. When you create the Build, you say: go and instantiate this template, filling in these parameters with these arguments.
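A hedged sketch of that template-and-instantiation relationship (shapes follow the knative/build v1alpha1 API; this is a simplified stand-in, not the published Kaniko template):

  apiVersion: build.knative.dev/v1alpha1
  kind: BuildTemplate
  metadata:
    name: kaniko
  spec:
    parameters:
    - name: IMAGE
      description: Where to push the built image
    steps:
    - name: build-and-push
      image: gcr.io/kaniko-project/executor
      args: ["--destination=${IMAGE}"]           # ${IMAGE} is filled in per Build
  ---
  apiVersion: build.knative.dev/v1alpha1
  kind: Build
  metadata:
    name: app-build
  spec:
    source:
      git:
        url: https://github.com/example/app.git  # placeholder
        revision: master
    template:
      name: kaniko
      arguments:
      - name: IMAGE
        value: gcr.io/example/app
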
There's a library of reusable build templates, the Knative build-templates repo. We have templates for Buildah, Kaniko, Bazel, BuildKit, and maybe half a dozen others. Related to BuildTemplate is ClusterBuildTemplate, which is just a cluster-scoped version of a BuildTemplate; it can be referenced from any namespace. So vanilla BuildTemplates are namespaced; ClusterBuildTemplates are across the whole cluster.
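On the consuming side, the only difference is the template kind; a sketch (names and namespace are placeholders):

  apiVersion: build.knative.dev/v1alpha1
  kind: Build
  metadata:
    name: uses-cluster-template
    namespace: team-a               # any namespace can reference the cluster-scoped template
  spec:
    template:
      kind: ClusterBuildTemplate    # instead of the default, namespaced BuildTemplate
      name: kaniko
      arguments:
      - name: IMAGE
        value: gcr.io/example/team-a-app
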
The intention there was for an operator with sufficient permissions to install a template across the whole cluster, and for everybody to use that same build template across the cluster. As I mentioned before, the build lifecycle is: when you create a Build resource, the build controller validates it using the admission webhook. It does a couple of simple things, like: did you specify steps? If you referenced a template, is that reference valid? Things like that.
The build controller translates that request into a pod. Basically, the steps you specify become init containers in the pod. If you specify a source, we prepend a container to that list of containers that knows how to fetch the source. We do a lot of crazy stuff to make credentials work.
You can specify git credentials that are SSH credentials, you can specify username-and-password git credentials, and you can specify Docker username-and-password credentials, which authorize requests to push images at the end of the build to a private registry, or to pull from a private registry. So the controller sees the Build, translates it into a pod, starts the pod in the same namespace, and watches that pod for updates as it progresses.
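The credential wiring roughly looks like this (a hedged sketch following knative/build's auth conventions: an annotated Secret attached to a ServiceAccount that the Build runs as; names and values are placeholders):

  apiVersion: v1
  kind: Secret
  metadata:
    name: github-basic-auth
    annotations:
      build.knative.dev/git-0: https://github.com   # which host this credential is for
  type: kubernetes.io/basic-auth
  stringData:
    username: my-user          # placeholder
    password: my-token         # placeholder
  ---
  apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: build-bot
  secrets:
  - name: github-basic-auth
  ---
  apiVersion: build.knative.dev/v1alpha1
  kind: Build
  metadata:
    name: private-repo-build
  spec:
    serviceAccountName: build-bot   # the build pod uses these credentials to fetch the source
    source:
      git:
        url: https://github.com/example/private-repo.git
        revision: master
    steps:
    - name: list
      image: ubuntu
      command: ["ls"]
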
By way of illustration, this is how it's been used in Serving. A Serving Configuration can specify a build in its spec. That build can specify the source you want to build and how to build it; in this case, it's using a build template called Kaniko. The idea here is that the details of how Kaniko works and what it does are entirely hidden from the deployer, from the user. They don't really care, and they don't have to care, how Kaniko works.
As long as that template is installed in the namespace, or a ClusterBuildTemplate is installed on the cluster, they just have to say: use Kaniko, I don't care how, and push this image, my image. And then in the Configuration, the revision template says: use that same image (asterisk, which we'll talk about a little more later). When you create that Revision, the revision controller will first start the build and watch the build status while the build is ongoing.
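Roughly what that looked like in the pre-v1beta1 Serving API (a hedged sketch of the old v1alpha1 Configuration shape; names and images are placeholders):

  apiVersion: serving.knative.dev/v1alpha1
  kind: Configuration
  metadata:
    name: my-app
  spec:
    build:                        # the just-in-time build, run before the revision starts
      source:
        git:
          url: https://github.com/example/app.git
          revision: master
      template:
        name: kaniko              # how the image gets built is hidden behind the template
        arguments:
        - name: IMAGE
          value: gcr.io/example/app
    revisionTemplate:
      spec:
        container:
          image: gcr.io/example/app   # the same image the build pushes
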
So I think that meets the basic needs of a just-in-time build during a Serving deployment. Builds give you a sequential list of steps defined as containers. They let you build images; they actually let you do anything. If you want to specify whatever steps you want, steps that run unit tests or container image scans or update GitHub or whatever, you can do that. But primarily people just use them for building container images that are then run in that downstream Revision.
"Build" is a bit of a misnomer; builds can really do anything. They can run tests, for instance. They barely type their inputs: you can tell a Build that it's building a git repo or a Google Cloud Storage object, but other than that it doesn't know anything about what that means, and it doesn't report anything about its output. So it doesn't say "I built this image, with this digest, specifically"; it just says "I succeeded in building the image you asked me to build."
The distinction there is that it won't tell you exactly what it built; it will just say: you told me to build X, and I did it. That's a bit of a gap if it were to do anything else: if it was running unit tests, it wouldn't be able to tell you any sort of structured information about what those unit tests were and which ones passed or failed. So that's a bit limiting.
As an implementation detail, translating the steps into init containers in the pod made persistent logging very complicated. It turns out that init containers aren't great in Kubernetes: in some cases init container logs are dropped before all of the init containers are done running, which can be problematic, and even if they persist until the end of the pod, they're not persisted for long after the pod. And because the init containers run in serial before the pod's regular containers, there's no way to specify a logging sidecar, for instance, to collect those logs and persist them somewhere else. So you really only have a split second to check those logs before they disappear forever into the ether. There has also never been any automatic triggering of builds, or automatic deployment process in general, for Serving: builds must either be manually started, or started as part of a manually started Serving deployment.
The Knative Build Pipeline experiment had Tasks and TaskRuns: Tasks are like BuildTemplates, TaskRuns are like Builds. And then we created another layer of resources on top of that, called Pipeline and PipelineRun. Pipelines define many Tasks to run, possibly concurrently, with typed inputs and outputs passed between them, and PipelineRuns are the executing instantiations of a Pipeline, and so on.
At the end of the day, TaskRuns, much like Builds, start and watch pods. At first, TaskRuns created Builds, which started pods, and updates were bubbled back up; we eventually removed the Build intermediary there. PipelineResource is another resource that the Build Pipeline experiment added, which was there to be able to type the inputs and outputs. So a thing says: I rely on a git repo, and that git repo is now a PipelineResource.
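A hedged sketch of those resources in the Tekton v1alpha1 shape (a typed git input declared by a Task and bound by a TaskRun; URLs and images are placeholders):

  apiVersion: tekton.dev/v1alpha1
  kind: PipelineResource
  metadata:
    name: app-source
  spec:
    type: git                     # the typed input: a git repo
    params:
    - name: url
      value: https://github.com/example/app.git
  ---
  apiVersion: tekton.dev/v1alpha1
  kind: Task
  metadata:
    name: run-tests
  spec:
    inputs:
      resources:
      - name: source
        type: git                 # the Task declares what kind of input it expects
    steps:
    - name: test
      image: golang
      command: ["go"]
      args: ["test", "./..."]
  ---
  apiVersion: tekton.dev/v1alpha1
  kind: TaskRun
  metadata:
    name: run-tests-once
  spec:
    taskRef:
      name: run-tests
    inputs:
      resources:
      - name: source
        resourceRef:
          name: app-source        # bind the declared input to a concrete resource
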
(There are only so many words.) Yeah, it was a success. I think people were really happy with the level of expressiveness and flexibility that the Build Pipeline project gave them; the ability to report inputs and outputs like that was pretty compelling. We dropped our dependency on Knative Build and started pods directly.
We dropped the init containers for better persistent logging, which turned out to be really useful for usability. It was so successful that we moved that project out of Knative into another project called Tekton, which was donated to the Continuous Delivery Foundation and moved out to its own GitHub org. So the build-pipeline repo is no longer there; it just redirects to tektoncd/pipeline. Similar to how we have a knative/build-templates repo, there is a tektoncd/catalog repo.
But at the end of this experiment, and the success of Build Pipeline and Tekton Pipelines, we were faced with an uncomfortable situation. Knative Build still exists today; Tekton Pipelines exists today. They largely do the same thing, but they don't share much, if any, code. They share a little bit of knative/pkg code, they share some concepts, and they share some copied, forked code, but mostly they're separate efforts. Log persistence in Knative Build is still hard.
We still don't have typed inputs or typed outputs in Knative Build, and Tekton has all of these things. Aside from duplicated effort, this also led to a lot of user confusion. Users show up at Knative and say: I want to use Build. They use it for a little while, and then they say: this isn't really solving what I need. Someone eventually says: you should try Tekton. And they say: Tekton looks very similar to this.
Why do you have both? Why do I need to figure out how to use both, or which one of them to use? The Build working group spent a couple of months trying to resolve this split, trying to think of a good way to resolve it. We thought about having the Knative Build components depend on a Tekton installation, whether that's one installed when you installed Build, or one that we required you to have installed already. Both of those have downsides, pretty serious ones.
Operators shouldn't have to manage a matrix of compatible versions of Knative Build and Tekton Pipelines. We don't want the default Knative installation to require a Tekton installation, and we don't want the default installation of Knative to install Tekton for you; if so, which version? It's a gigantic headache. Another option was that Tekton would produce a library for creating and watching pods that Build would consume.
That also has some overhead, as far as who is responsible for adding features to that library. At the same time, we were going through this existential crisis of: what even is Build? What is Build for? Historically, like I said, Build was for just-in-time builds as part of a deployment process.
That's great, and it's a really good getting-started user workflow, but where do tests happen in that world? Where do more complicated things, integration tests especially, happen in that world? It's not a very clean story in that case, and really what we want, and what users want, is CI/CD. That is the best practice we should be pushing on our users: don't do all of this work as part of a deployment; do all of the work, and then, if it's successful, deploy. So that was a lot of the overarching discussion. We were having tactical discussions about how to resolve this split technically, and strategic discussions about what Build is good for. I think the result was: we should instead make CI/CD easy to adopt from the start for Knative, and see where we go from there.
Meanwhile, separately, on another thread, Knative Serving v1beta1 had a proposal to stop embedding builds in Serving, I think largely informed and spurred by the discussion we were having about whether just-in-time builds are a good thing or not. This is a slide from that proposal embedded into this slide, which makes it a meta-slide, but basically the result was: you know how I said that in a Serving Configuration you can specify a build? After v1beta1, you cannot.
This slide says: leave integration of build and serving to a separate orchestration concept, which vaguely could mean almost anything. It could mean Jenkins, it could mean Travis, it could mean a hosted service like Google Cloud Build, it could mean Tekton, it could mean a client. It could mean a lot of things, and likely it will mean a lot of things.
So where does this leave Build after v1beta1? Serving will no longer depend on Build. Something else could depend on it if it wants to, but I think the limitations of Build are such that, personally, I don't think it makes sense to take any new dependency on Build. And so Build is sort of this free-floating entity in Knative today.
Tekton is a separate, non-Knative project that is more mature and more powerful, and more people are contributing to it. It has all the shared history of Knative Build, plus more development since it left in February. And so just this Tuesday, Vincent Demeester from Red Hat proposed, in the build repo, to deprecate Knative Build in favor of Tekton Pipelines. This is a hot-off-the-presses proposal, so don't be surprised if this is the first you are hearing about it.
It was discussed some yesterday in the Build working group (the recording of which, I am realizing now, I haven't put up). It is going to be discussed at today's TOC meeting, in about an hour and 15 minutes, if you're curious and want to come to that, and it will be discussed at steering committee meetings and things like that as well.
Don't expect this to happen quickly if it gets ratified. If it gets accepted, it will still be a process of slowly deprecating it and responsibly guiding it into the ground. So what can you do now? You can go read this proposal and discuss it. If you really, really like Build, if you're using it today and really love it, and it solves a specific problem that nothing else would solve for you, please let us know.
That is why it is a proposal and not an edict: we would love your feedback. Is there anything that Tekton can't do that Build can, that you're using Build for? What is your ideal developer experience for deploying Knative apps? Is that something in the kn CLI? Is that something in CI/CD, or more? Please let us know; we welcome and need your feedback.
Joe Burnett: Can you see my screen? Yeah? Okay, great. So let me introduce myself. My name is Joe Burnett and my email is at Google. I'm the scaling working group lead. I've been working on App Engine and then Knative for a couple of years; well, on Knative since it started. I kind of just fell into autoscaling, because it was a thing to do when Knative was first getting started, and I've been working on it ever since. It's been a really fun and interesting problem space.
What I'm not going to touch on is the core algorithm of the Knative autoscaler and the metrics pipeline, because I think that deserves some focused discussion. So this is part one of two: Markus Thömmes with IBM (sorry, excuse me, with Red Hat; he moved companies) is going to talk on the core algorithm and the metrics pipeline in the next autoscaler talk in this series. Okay, so let's start.
Let's talk about autoscaling in general. Autoscaling is about balancing performance with cost. When I talk about performance, usually I'm talking about latency: how long does it take to give a response? Usually it means that the request is able to get the resources it needs to be processed as fast as possible, without being throttled or delayed. You can optimize for performance; that's just giving yourself enough elevation, provisioning enough resources that you can handle anything that is thrown at you.
Now, these are hard to mix, and serverless is about getting more of both of them. So what does it mean to be serverless? Well, serverless autoscaling needs to be very fast to scale up, it needs to be pretty fast to scale down, and when you're not using it, it shouldn't cost you anything; the resources should just be there when you need them. That's what kind of makes it this magical, sparkly serverless thing.
You just put your code out there, it has the resources it needs when it needs them, and you just pay for what you use. For any of you military helicopter enthusiasts, it's like flying nap-of-the-earth: having just the resources that you need, just in time. And it's a little bit harder than simpler autoscaling use cases.
So, routing and autoscaling are like two sides of the same coin: whenever you talk about one, you're going to talk about the other, and trade-offs in one affect the other. As far as routing goes, you could take a centralized approach, where you put all the knowledge in one place: you have the requests come to one place, it sees what requests are there, it makes precise autoscaling decisions, and it sends the work exactly where it needs to go.
E
An
example
of
this
is
the
open
whisk
project,
which
is
you
know,
does
exactly
this.
It's
very
efficient
at
low
and
rapidly
changing
load.
It's
good
for
a
function
framework.
However,
the
scale
is
limited
by
having
a
central
choke
point.
Everything
has
to
come
to
one
place
so
that
you
can
make
a
decision
in
there
and
then
send
it
in
to
where
the
work
is
gonna
happen.
Daler
end
of
the
spectrum
is
a
decentralized
routing
and
auto
scaling
system.
Now, this is kind of where Knative started, with Istio. The assumption is: okay, we don't have all the knowledge in one place. In particular, in mesh mode with Istio, requests just go directly to where they're needed; routing is very, very decentralized, so those scale limits are much, much higher. It's very efficient at high load; we know this from how we've used it inside of Google, and it's efficient.
It's best at relatively stable load, meaning if the requirements aren't changing dramatically, it works very well. And since you don't have everything right there to make an absolutely central autoscaling decision, you need some sort of feedback controller: you make a change, you observe the system, you make another change, you observe the system. So there's a feedback loop that you operate on, rather than the more precise scaling of a centralized system. Knative has some of both.
It started very much on the decentralized end of the spectrum, and we've been including some features of centralized routing as well. So what I'm going to do is walk you through how Knative scale-to-zero works, and you'll see how some of these parts fit, how we balance between decentralized mesh mode and a more centralized sort of queuing mechanism.
Before we do, I wanted to touch on the Knative entities, just to make sure everybody's familiar with them; this should just be a refresher. The Service: you give the Service a container, and you say, I want you to run this container in this way. Part of that is telling it if it has a concurrency limit, such as: this thing can only handle one request at a time. That's where you tell it; you say, hey, this is the limitation of my container.
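For reference, that knob is the containerConcurrency field on the Revision template; a sketch using the current v1 API shape for brevity (the image is a placeholder):

  apiVersion: serving.knative.dev/v1
  kind: Service
  metadata:
    name: hello
  spec:
    template:
      spec:
        containerConcurrency: 1        # this container can only handle one request at a time
        containers:
        - image: gcr.io/example/hello  # placeholder
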
The Service creates some other entities: the Configuration, which is the canonical snapshot of code and configuration that you want to run, and an immutable list of Revisions, which are stamped out every time you change the Configuration. The Revisions are the things that actually run; they create pods by way of Deployments and actually run your code. And then the Route is the thing that references Revisions and says: okay, how do you want these requests to be delivered?
Do you want requests just to be sent to whichever Revision is most recent and healthy? Do you want them traffic-split? Do you want to do a blue-green deployment, and so on? These are the public Knative entities that I think you should be fairly familiar with.
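The traffic-splitting choices live in the Route, or in the Service's traffic block that generates it; a sketch in the current v1 shape (revision and image names are placeholders):

  apiVersion: serving.knative.dev/v1
  kind: Service
  metadata:
    name: hello
  spec:
    template:
      spec:
        containers:
        - image: gcr.io/example/hello:v2   # placeholder
    traffic:
    - revisionName: hello-v1   # an older, pinned revision (hypothetical name)
      percent: 90
    - latestRevision: true     # whatever revision the template above produces
      percent: 10
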
For the purposes of decomposition, extensibility, and internal mechanics, we've created a couple of internal custom resources. One is called the PodAutoscaler, and I gave a KubeCon talk on this entity specifically, at the last KubeCon in Seattle in December last year, called "Scaling from 0 to Infinity", which maybe over-promised a bit; feel free to go take a look at that.
For the scale-to-zero mechanism and the way we handle endpoints, Victor has created the ServerlessService, SKS, which is a proper Kubernetes Service that is populated with whatever endpoints are currently capable of serving your code; that may be a running pod, or it may not be. You'll see how that works in just a minute. First, I want to show you what scale-to-zero looks like in Knative. When you first deploy a Revision, the Route creates some ingress stuff and creates the Istio routes.
The Revision also creates a ServerlessService, which creates a private service and a public service. The pods, when they become healthy, get put into the private set of endpoints, and the ServerlessService then copies those over into the public endpoints, so ingress can find those pods and send requests to them. The serving path is fairly straightforward. And because this is a mesh, if those pods want to send requests to other Revisions, they don't go back through ingress.
They just go directly to the other pod, because the service routing is programmed onto all of the proxies. All of these, the ingress and the pods, have sidecars, so they are all programmed by Istio to know where to send their requests. So this is the initial mode: very decentralized routing. Now, the first thing that we introduced is an autoscaler. There's this cluster-wide component, shown here, that observes the metrics coming from the pods.
It actually has a system to scrape Prometheus metrics from the pods and ask: okay, how many requests are you working on right now? It keeps track of the average concurrency, tries to maintain a desired target average concurrency, and scales up and down to achieve that. The actual mechanics of that are quite interesting, and that's what Markus is going to talk about when he comes to talk in this series.
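The target that the autoscaler tries to maintain is tunable per Revision through an annotation; a sketch (the annotation key is the real autoscaling.knative.dev one, the value and image are arbitrary placeholders):

  apiVersion: serving.knative.dev/v1
  kind: Service
  metadata:
    name: hello
  spec:
    template:
      metadata:
        annotations:
          autoscaling.knative.dev/target: "10"   # aim for ~10 in-flight requests per pod on average
      spec:
        containers:
        - image: gcr.io/example/hello            # placeholder
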
Well, what happens when you stop receiving traffic? Oftentimes either you deploy another Revision and stop using this one, or maybe your Revision is something that is only called a couple of times a day. For whatever reason, you stop receiving traffic, and we need to come all the way back down to the ground. Well, the ServerlessService, this is the reason we have it: it takes a different set of endpoints, the activator's endpoints, and copies them into the public endpoints.
The activator knows how to look at the headers, figure out which Revision a request is meant for, and proxy the request to those pods when they become available. So the activator caches requests when there are no pods, when the deployment is scaled to zero, and waits for pods to be available before proxying the request. This is where we start to get a more centralized system: since all of the requests for the Revision come to this one place, it acts as a revision-level queue.
Each pod has a small queue on it, so that it can put work into a pending state while it's processing a request, but here we actually have a higher-level queue in the activator. To make the system work well, there are a couple of other things we do. Kubernetes will still be populating those private endpoints: whatever pods are available, it puts into the private endpoint set. The activator watches those private endpoints, so it is aware of how many pods are up and running.
It uses that to throttle the number of requests it sends to the downstream pods, so that it doesn't overwhelm them. This is important because you don't want to take all of the requests you've received and dump them on the first pod that shows up; you want to give that pod just as much work as it can handle, so that when new pods show up, you can give them work too. So in a way it is load-balanced a little bit.
It's not precise; it's not as though we're saying, okay, pod, you take this one, and you take this one, and you take this one, although that is something we have discussed in the Knative scaling working group. This is a really interesting area of discussion, how intelligent we can make this routing; we could do a lot with regard to using these resources efficiently from the activator.
The other additional piece of this system is that once the activator gets some requests, it gives a signal to the autoscaler to say: hey, I have this backlog of requests, and it is this big. The autoscaler can then make a decision to create as many pods as necessary. So, when you get your very first request at scale zero: suppose you're scaled to zero, and a request comes in and lands on the activator. It doesn't go anywhere, because the throttle is at zero, since there are no private endpoints.
Say I'm a request coming down: I get proxied to the activator, and I don't go anywhere because there are no private endpoints. A signal goes to the autoscaler that there's a backlog. The autoscaler says: oh well, I see that you have five requests, I'm going to make five pods, and it scales up. The pods get created, they get health-checked, and they get put into the private endpoints.
That is watched by the activator. The activator opens up the throttle and says: okay, five requests, you can go through. Those requests get proxied to the pods, the pods process them, they do whatever they do, and then metrics are returned to the autoscaler. If you continue to receive requests, maybe you get more on an ongoing basis, they still just kind of flow through here and keep going through.
Then, if you reach a certain threshold, once you're scaled above zero and you're up and running, getting some number of QPS, the ServerlessService starts copying the private endpoints into the public endpoints, taking the activator out of the serving path, and requests start going directly to the pods again. So you're kind of back where you started. That's the gist of how scale to and from zero works.
Now, there's more we can do here. The next thing I was going to talk about is what we're going to do next, so maybe I'll talk a little bit about some new work to provide a guaranteed burst capacity. Remember, when you first give the container to the Service, you say: hey, I want you to run this container, and I want you to do it in this way. One thing you could specify is, it would be possible to say:
I would also like you to make sure that at any given time you can handle an additional 1000 QPS. Since the ServerlessService has full control over where requests are routed, through this private and public endpoint mechanism, we can leave the activator in the path when we're below the desired target burst capacity, and accept a little bit of overhead from the extra hop through the activator, as well as a little bit more load on the activator from sending all the requests through it. So we mix in a little bit of centralized routing at low scale, or when we're below a certain threshold. I call this dual-mode routing, because we're switching back and forth between a decentralized and a centralized routing mechanism.
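In today's Knative Serving this idea surfaces as an annotation; a sketch of the knob as it exists now, rather than exactly what was proposed at the time of the talk (the value and image are placeholders):

  apiVersion: serving.knative.dev/v1
  kind: Service
  metadata:
    name: hello
  spec:
    template:
      metadata:
        annotations:
          # keep the activator in the request path until the revision has roughly
          # this much spare concurrency to absorb a burst on its own
          autoscaling.knative.dev/targetBurstCapacity: "200"
      spec:
        containers:
        - image: gcr.io/example/hello   # placeholder
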
These are called serve mode and proxy mode. This can be very powerful. The ultimate goal, of course, is that you would just give a container to a Service, and a Revision would just run it. You would send it some requests and it would serve them, and the latency would be pretty stable; if you send it nothing, it just goes to zero; if you send it a whole lot, it scales up as high as you need it to, as long as you have cluster capacity.
The goal is for this thing to be kind of magical. That's the serverless autoscaling aspect of it. So that's the main piece I wanted to show in this tech talk; it sets up the environment of our autoscaling system. As I mentioned, target burst capacity is something Victor is working on, which is really cool. It generalizes scale to zero: it's really not just scale to zero, it's whenever you're below this threshold, use the activator as a buffer.
Another thing we're working on is cold-start latency. Greg Haynes from IBM has been spending a lot of time looking at why it takes so long for the first request to get serviced, which is about six seconds; that's much too slow. It's not really the very fast scale-up that we were promising, and there are a couple of reasons for it.
The kubelet and the container runtime interface, whatever you happen to be running, usually contribute a large chunk of latency just in starting the pod. Readiness probes take some time to get going; I think the smallest interval you can configure is one second, and Envoy has to start before you can send readiness probes through to your container. And then there's network programming: for example, the pod that starts may not be on the same node, so you have to wait for everybody to know how to get from here to there.
CPU scaling is nice because you don't really need to configure it; it's just a percentage, and it's one of the easiest things to use. But it doesn't scale to zero right now, and Knative needs to inject some of its knowledge about requests to enable scale to zero, because you'll never really use exactly zero CPU, and you can't get off the ground from zero. So you need Knative networking's awareness of requests in flight in order to scale to and from zero there.
So anyway, that's something that will be coming up, and there's a bunch of other stuff we're working on in the autoscaling space. There's a very lively discussion each week in the scaling working group, every Wednesday at 9:30 a.m. PST; Slack is always a good place to ask questions; and we've recently landed a 2019 roadmap outlining some areas we want to invest in. So if you want to know more, come ask and play around with it; if you want to work on it, come check out the roadmap and see what issues are available.
One of the engineers, Jacques Chester, has implemented a simulator which is a lot of fun to play with, and it's really powerful for understanding how the algorithm works; I think maybe that will be of more interest after Markus's talk next time. That's all I wanted to present. Do you have any questions?
Audience: [question inaudible]
Joe Burnett: Whether it's going to handle that in the most optimal way: we don't have plans right now to really dig deep into sophisticated algorithms. There are a lot of ways you could tackle it. You could use what's called a PID controller, which is a better mathematical model for describing these changes: it has a proportional piece that makes bigger changes in response to bigger error, an integral piece which understands how error has been accumulating over time, and a derivative piece, so it can see if things are curving in the other direction.
A PID controller would probably do better on a slow, continuous ramp, at recognizing that you have that kind of profile. You could also do machine learning to do predictive autoscaling. We haven't gotten into that, because we're still focused on really making the autoscaling system that we have rock-solid.
We're really still in an alpha state, and a lot of the ServerlessService work, making sure that we can handle traffic and not drop it on the floor, and generally making all these components a lot more robust, is more important right now than improving the autoscaler algorithm. But if you need it, you can do it yourself.
By that I don't just mean pull requests welcome; what I mean is that we have an escape hatch, and it's this PodAutoscaler here. If you're saying, okay, I know exactly what scale I should be using right now, I've got my own algorithm and I just want you to use this: the PodAutoscaler can be annotated with a class, and you can provide a different controller for that class. In fact, Knative comes with two controllers for the PodAutoscaler resource.
One will run the Knative autoscaling, and one will just turn around and create a generic Kubernetes HPA resource; that's actually how we support scaling on CPU. So you can go implement your own. The KubeCon talk I mentioned, "Scaling from 0 to Infinity", really walks you through what that would look like, and it even gives an example of an alternative controller. It's a pretty large component to replace, because you have to collect your own metrics and implement your own autoscaling algorithm.
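A sketch of that escape hatch via annotations (the annotation keys and class names are the real autoscaling.knative.dev ones; this example picks the HPA class scaling on CPU, and the image is a placeholder):

  apiVersion: serving.knative.dev/v1
  kind: Service
  metadata:
    name: cpu-scaled
  spec:
    template:
      metadata:
        annotations:
          # which controller reconciles this revision's PodAutoscaler:
          # kpa.autoscaling.knative.dev (the Knative default) or hpa.autoscaling.knative.dev
          autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
          autoscaling.knative.dev/metric: cpu
          autoscaling.knative.dev/target: "80"   # target 80% CPU utilization
      spec:
        containers:
        - image: gcr.io/example/app              # placeholder
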
As Markus pulls those two apart, it will be easier to have an algorithm that operates over the existing Knative metrics; you'll be able to replace smaller and smaller pieces as we tease the system apart. Right now, you can still replace the PodAutoscaler controller and implement a predictive algorithm if you want. I think Ben Browning has actually recently worked on this and implemented a PodAutoscaler reconciler that autoscales on a workload queue, on a Kafka queue. So that's something.
Audience: [question inaudible]
Joe Burnett: Knative as a whole has a principle of being, let me see if I can get this right, decoupled on the top and pluggable on the bottom. Meaning: you can use Knative Serving independently, you can use Knative Build independently (but you shouldn't, because Jason said there's a better thing now), and you can use Knative Eventing independently, and they work, but they compose together. And within each of these projects, they're pluggable, so you can actually plug in different pieces.
Audience: [partly inaudible] The question is, and maybe you already answered it: the upstream HPA takes custom metrics, and they have a new, or fairly recent, API to take custom metrics. Are we pushing our metrics in there? Or is that what you're talking about with pluggable, that we can plug any metrics into the HPA or the KPA using the same API?
Joe Burnett: I can talk on both of those; let me actually answer the second one first. Yes, we're definitely planning on doing deeper integration with the v2 Kubernetes HPA. There's a v1, a v2beta1, and a v2beta2; they differ only a little bit, in what things you can specify, and v2beta2 is probably the best one.
We do plan to provide the Knative metrics, that is, the concurrency of each Revision, as a custom metric in the cluster. As part of this decoupling, Markus is actually implementing a custom metrics adapter, so that anything in the cluster can access our metrics in a standard Kubernetes way, including the HPA.
That's pretty powerful. Ultimately, if the Kubernetes autoscaler becomes as good as or better than ours, we can just throw ours away; there's not really a strict need for us to have our own autoscaler. The other thing it enables is custom metrics. Actually, this change just landed a couple of days ago in knative/serving: we are creating v2 objects now, v2beta1 objects.
If you have a metrics endpoint and a way to scrape it, you can tell the Service: by the way, you should be scaling on this metric name, and that will be plumbed through for you. There's somebody working on that, actually; he chats about it in Slack sometimes. So definitely, yeah, we plan to integrate with that. Now, on HPA scale-to-zero in upstream Kubernetes, I think it's a very...
The 2018 scaling roadmap enumerates a little more clearly our principles in designing the system, and those haven't really changed. Our goals are: make it fast; first of all, it has to be a solid, fast autoscaling system. Goal number two is make it light, meaning you should be able to just give us your pod and we'll do the thing for you, light on configuration. And the third thing is: make everything better.