From YouTube: SF JAM Hosted Jenkins @ Google
Description
Scott Zhu talks about Kokoro, the internally hosted Jenkins service at Google.
A
I'm the manager for the team at Google that's doing internally hosted Jenkins. We've been working for about the last year on bringing Jenkins into Google and hosting it for our developers. It's replacing a prior system which was proprietary and closed source, and this is a little bit more of the story of our journey working to onboard teams onto Jenkins.
A
There's a prior presentation by my previous TL, which is up on YouTube, documenting some of the contributions we made to core and remoting to help improve the scalability of Jenkins, because we thought that was really important for us. Scott is my TL; he's going to be talking about our journey over the last year, our experience hosting Jenkins, and some of the stuff we've done.
B
Hi, I'm Scott. I'm a software engineer at Google, and I'm currently the TL for the hosted Jenkins team. Within Google we call our instance Kokoro. When we decided on a name, we had a small vote for it. Our previous tool is called Pulse, and that name is related to heart and heartbeat, so we thought we'd give the new one a code name that's also related to heart. "Kokoro" is the Japanese word for heart.
B
So that's how we got the code name. Okay, so today I will talk a little bit about the scale of the hosted Jenkins instance within Google, some of the work we have done to make it more stable, reliable, and scalable, and also some of our learnings and takeaways.
B
We want to consolidate all of this together and give teams a better experience. Part of the reason is to deprecate the old tool, and also to avoid ad hoc, self-hosted usage of Jenkins. Currently, as I said, we run about 1k builds per day on about 100-ish agents for Windows and Linux, and there are 200 projects and 200 daily active users.
B
You might wonder, since Jenkins already supports Windows and Linux, why we needed extra work there. We take a different approach to hosting slaves — sorry, the Jenkins agents; I keep forgetting not to say slaves. We want a sandboxed experience, so we run all those Windows and Linux agents as VMs, and we actually run the agent binary outside of the VM itself.
B
So it's kind of like a paired instance: whatever build is queued for you runs within the Windows or Linux VM, so it's more like a sandbox, and we actually reboot the Windows or Linux VM afterwards so that it's a clean environment for the next build. That way you can guarantee the build is reproducible and hermetic. We put a fair amount of work into supporting the Windows and Linux flavors of builds.
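
The talk doesn't include code for this, but as a rough illustration of the "agent binary outside the VM" pattern, here is a minimal Python sketch, assuming a plain SSH-reachable build VM (the host name and build command are hypothetical stand-ins for Kokoro's internal VM tooling):

```python
import subprocess

VM_HOST = "build-vm-01"  # hypothetical VM; Kokoro's real VM manager is internal

def run_sandboxed_build(build_cmd: str) -> int:
    """Run one build inside the VM over SSH, then reboot the VM.

    The agent process itself lives outside the VM, so a misbehaving
    build can only damage the disposable VM image, not the agent.
    """
    # Execute the build inside the sandbox VM.
    result = subprocess.run(["ssh", VM_HOST, build_cmd])

    # Reboot the VM so the next build starts from a clean, hermetic state.
    subprocess.run(["ssh", VM_HOST, "sudo reboot"], check=False)
    return result.returncode

if __name__ == "__main__":
    raise SystemExit(run_sandboxed_build("make -C /workspace all"))
```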
B
We're also trying to support macOS and iOS builds, and that's currently our first priority. We have teams who want to use our tools; they come to us with various requirements and use cases, and we discover new ones quite constantly. We expect that maybe later this year we can hit something like 10k builds per day.
B
Okay, so the work we have done. There are parts of the work we invested in, like automating the Jenkins failover, that really benefited us from the beginning, so that we can have a scalable and reliable instance, and also work like project configuration in source, which we invested a lot in. During that time, I think, Pipeline wasn't that mature.
B
Okay, jumping to the build config. For project config in source, I think there are already existing plugins and approaches to do that. It's kind of like my favorite xkcd: there are 14 competing standards, and it's ridiculous — we should invent one for ourselves to rule them all — and now we have 15 standards. So there are existing options like YAML and the Pipeline DSLs.
B
From our perspective, we wanted certain features. Currently I list them as issues, but I would say they're features we really wanted to have. Within Google we have a giant single code base — the one we're using right now is code-named Piper; you can search for it online — and we want projects to put their configs in that same repository. Copying shared configs between projects is painful and doesn't scale.
B
So we want projects to be able to share their configs with each other. We also split the project config and the build config into two places, because we want the build configuration — your build steps — pinned at a certain version rather than always read from head. That way, for example, if you want to build an old version of your software, you still can, rather than reading all your config from head, which may break your build. Okay, so the builds, as I said, are generated at runtime.
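
To make the "pinned config" idea concrete: a minimal sketch, assuming a Git checkout (the real system reads from Piper, and the config path here is hypothetical). The point is that the build config is read at the revision being built, not at head:

```python
import subprocess

def read_build_config(repo_dir: str, revision: str,
                      path: str = "ci/build.cfg") -> str:
    """Read the build config exactly as it existed at `revision`.

    Reading at the build's own revision (instead of head) keeps old
    releases buildable even after the config has changed at head.
    """
    return subprocess.run(
        ["git", "-C", repo_dir, "show", f"{revision}:{path}"],
        capture_output=True, text=True, check=True,
    ).stdout

# Example: rebuild release v1.2 with the config it shipped with.
# config = read_build_config("/src/myproject", "v1.2")
```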
B
Probably that applies to a lot of you, but we think it's very beneficial for us, because our instance is kind of big and we don't want people randomly coming to the Jenkins UI, clicking buttons and changing stuff. What we do is completely disable the UI for project config. Any time you want to change a project config, you have to create a changelist and send it out for review; after people OK it, the change gets committed, and then your config change takes effect. That way we can actually track every change that gets made to a project, and the scenario where a change randomly breaks someone else's project can be avoided. Also, since you cannot create or modify the config yourself, we have a service that automatically listens for commits of project configs and then creates, modifies, and deletes the projects for you.
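
The commit-listener service itself is internal, but the Jenkins side of such a sync can be done with Jenkins' standard REST endpoints (createItem, config.xml, doDelete). A minimal sketch, assuming a hypothetical `changed_configs()` feed that yields (job name, config.xml body, action) tuples from the commit stream; the URL and credentials are placeholders:

```python
import requests

JENKINS = "https://jenkins.example.com"   # hypothetical URL
AUTH = ("sync-bot", "api-token")          # hypothetical service account
XML = {"Content-Type": "application/xml"}

def sync_job(name: str, config_xml: str, action: str) -> None:
    """Mirror one committed project config into Jenkins via its REST API."""
    if action == "create":
        requests.post(f"{JENKINS}/createItem", params={"name": name},
                      data=config_xml, headers=XML, auth=AUTH).raise_for_status()
    elif action == "modify":
        requests.post(f"{JENKINS}/job/{name}/config.xml",
                      data=config_xml, headers=XML, auth=AUTH).raise_for_status()
    elif action == "delete":
        requests.post(f"{JENKINS}/job/{name}/doDelete",
                      auth=AUTH).raise_for_status()

# for name, xml, action in changed_configs():  # from the commit listener
#     sync_job(name, xml, action)
```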
B
Okay, another piece of work I actually like a lot: reducing the master's workload. At the very beginning, when we started the project, we were thinking: we're Google, and we run stuff at scale.
B
We saw the backup size accumulate and increase very quickly, and that increased our backup and restore times until it became a problem. So the first thing we did was use external log services. Normally, every single agent, when it starts up, streams all its logs directly back to the master, and the master saves them somewhere; unless your project config says how much build history to keep, they stay there forever.
B
What we did is redirect all the agent logs directly to external log services, so we don't save those logs on the master at all. The log goes straight to the logging service; it never goes back to the master. The master only keeps a reference to the log entry, and that reduces the backup data, the traffic on the master, and the load on the master a lot.
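
The internal log service isn't public; as a rough analogue, here is a minimal sketch using Google Cloud Storage (the bucket name is hypothetical). The agent uploads the build log itself and hands the master only a short reference:

```python
from google.cloud import storage

BUCKET = "kokoro-build-logs"  # hypothetical bucket

def archive_build_log(build_id: str, log_path: str) -> str:
    """Upload a finished build log and return the reference the master keeps.

    The master stores only this short URI, so its backups stay small no
    matter how large or numerous the logs are.
    """
    blob = storage.Client().bucket(BUCKET).blob(f"logs/{build_id}.txt")
    blob.upload_from_filename(log_path)
    return f"gs://{BUCKET}/logs/{build_id}.txt"
```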
B
Some other aspects, like SCM polling, we also don't do on the master; we have an external service handle that. For example, for git changes — I'm not sure about the current git plugin, but my guess is it periodically polls the git repository to see if any changes came in. We have an external service to handle that instead, plus an API interface, so any time that external service sees a change, it can trigger a build on our master, and the master will kick off the build for you. That also reduced the master's traffic and load.
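
Stock Jenkins already exposes a remote-trigger endpoint, so an external poller only needs something like the following sketch (the job name, parameter, and credentials are hypothetical):

```python
import requests

JENKINS = "https://jenkins.example.com"  # hypothetical URL
AUTH = ("poller-bot", "api-token")       # hypothetical service account

def trigger_build(job: str, revision: str) -> None:
    """Ask the master to start a build once the poller sees a new commit."""
    requests.post(
        f"{JENKINS}/job/{job}/buildWithParameters",
        params={"REVISION": revision},
        auth=AUTH,
    ).raise_for_status()

# trigger_build("drive-sync-linux", "deadbeef")  # called by the change listener
```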
B
We also save artifacts directly to external storage, which saves a lot of space on the master itself. If you package a lot of binaries on the master, sometimes you want to keep them for a fairly long time, because you want to compare against old versions, and we have another storage system inside Google which keeps those binaries for you.
B
So when we hit this case, we were thinking: we should probably have at least a warm standby sitting there, watching whether the live master is still up; otherwise it should quickly grab the mastership and start serving users. We rely on the Google-internal master election service for this, and we have three shards there.
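
The master election service is internal to Google, but the same warm-standby pattern can be sketched with ZooKeeper's leader-election recipe via the kazoo library (the hosts, path, and identifier are hypothetical):

```python
from kazoo.client import KazooClient

def serve_as_master() -> None:
    """Placeholder for restoring the backup and starting to serve users."""
    print("Won the election: restoring backup and starting Jenkins...")

zk = KazooClient(hosts="zk1.example.com:2181")  # hypothetical ensemble
zk.start()

# Every shard runs this; whoever holds the lock is the live master,
# the others block here as warm standbys until the leader dies.
election = zk.Election("/kokoro/master-election", identifier="shard-1")
election.run(serve_as_master)
```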
B
Currently that gives us the benefit that every time the master switches, our downtime is about one minute. During that minute you will not be able to access the Jenkins UI or do anything, but afterwards there is a live server again. In-flight builds, though, will get corrupted, and we actually re-queue them and re-run them.
D
[inaudible question]
B
We actually have that — it will be covered in the next few slides. We have very dynamic agent scaling: the agents bring themselves up, connect to the master, and get authenticated by a certain Google-internal mechanism. So we don't manually configure those agents or make them connect to the master; there's an automatic mechanism for us to do that, and it works at scale.
C
[inaudible question]
B
I think — correct me if I'm wrong, David — the build queue also gets backed up. If there's an unexpected termination of the job, the build itself might get lost, but in certain cases, for example when we deploy anything, we do keep the master's build queue in the backup itself. So when the new standby comes online, it actually has the up-to-date history of the build queue, and then it re-queues all the pending jobs.
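
A minimal sketch of that queue handoff, assuming pending builds can be serialized (all names here are hypothetical; stock Jenkins persists its own queue in queue.xml):

```python
import json

QUEUE_BACKUP = "/backup/pending-builds.json"  # hypothetical backup location

def backup_queue(pending: list[dict]) -> None:
    """Persist the pending-build queue alongside the regular master backup."""
    with open(QUEUE_BACKUP, "w") as f:
        json.dump(pending, f)

def restore_and_requeue(submit) -> None:
    """On the new master, re-queue every build that was pending at failover."""
    with open(QUEUE_BACKUP) as f:
        for build in json.load(f):
            submit(build)  # `submit` is the master's enqueue function
```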
B
Yes, so the master election service always knows who the new master is. For example, the instance in the middle here will try to grab the master lock from the master election service, and then the election service knows: the second instance is the new master now, and anyone who comes asking who the new master is, I should tell them it's the second one.
B
No, the agents actually talk to the master election service. It doesn't use an SSH connection; we have a Google-internal RPC mechanism for that. It's basically not an SSH connection or the old JNLP — it's an RPC procedure. Yes, there is an open-source version of it; you can search for gRPC.
B
Yes, the agent currently has a very small interval before it retries the master. The one-minute downtime is between the two masters during failover: it takes some time for the new master to grab the master backup data, load it, and initialize itself. That's where the one minute goes, and meanwhile the agents periodically retry, asking who the new master is.
B
It's more that the master itself always talks to the master backup data. Currently we are not mounting it as a file system, though in the future we probably could. Right now the master sends the backup data to a storage system, and we copy it back afterwards. That's why we want to reduce the master backup size as much as possible: so we can reduce that copy.
C
But these big design changes are kind of new for us at Jenkins. That was the first big long-lived branch — we had a branch going for six weeks, which is unheard of in the eight years I've been in the project. So at the contributor summit we're going to be talking about the beginnings of those baby steps that we can take.
B
Okay, there's also what I mentioned: we have an external service which automatically manages project creation, editing, and deletion, and we've done work to dynamically make agents come online, register themselves, and be ready to build. So I would say we don't have any agent-management overhead right now. We run agents as VMs, and it's pretty standard and templated: for example, if I want to bring up 100 more agents, I can just change a config file and they will be online in about five minutes.
B
So as long as we have enough resources for that. The tool which enables us to do this is the Google-internal scheduling service, and we run all our jobs on it. There is an open-source version of it called Kubernetes; you could try that and see if it works for you. And as I already mentioned, the agents will automatically reconnect to the master if they lose the connection with the old one, and they talk to the master election service to do that.
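
With Kubernetes standing in for the internal scheduler, "change a config file and get 100 more agents" corresponds to bumping a replica count. A minimal sketch with the official Python client; the deployment name and namespace are hypothetical:

```python
from kubernetes import client, config

def scale_agents(replicas: int) -> None:
    """Resize the pool of templated Jenkins agent pods."""
    config.load_kube_config()
    client.AppsV1Api().patch_namespaced_deployment_scale(
        name="jenkins-agent",   # hypothetical Deployment of templated agents
        namespace="ci",
        body={"spec": {"replicas": replicas}},
    )

# scale_agents(200)  # 100 more agents, online once the scheduler places them
```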
B
So the service is quite stable for us right now. There was only one unplanned outage during the first eight months of the project, and that outage was actually not related to Jenkins itself — it was related to our master election service. So I would say it's quite stable. We have also done some work to support Mac builds. Currently there are some challenges there, and I think this is also true for some of the users here: running a Mac pool, or Mac fleet, is a little bit hard to manage.
B
We have to set up special data centers just to host those Mac machines and run builds from there, and that also prevents us from using that perfect Google scheduling service to schedule them, so we have to do extra work to serve the Mac use case. Currently we have a design which is somewhat similar to our existing use case.
D
[inaudible question]
A
Virtualization — we did experiment with it. With the 2012 Mac minis we made it work for two, but without key expansion capabilities, and the 2014 isn't powerful enough. Who knows, maybe someday. I think Apple did that on purpose, because they were cannibalizing their Pro sales. The price differential between the minis and the Pros is ridiculously steep — such a jump that they have to justify it somehow.
C
[inaudible question]
B
We have a server which hosts base Mac images, and every time a Mac reboots it grabs a fresh image from there, so we don't leave behind any bad bits which could affect the next build. We also turn off some of the services on the Mac itself — for example the sleep/suspension service, and the screen saver mode that freezes and turns off your Mac after 20 minutes.
B
If there's no activity it turns itself off, and you don't have an easy way to wake it up if you don't have physical access to it. We also turn off some of the auto-updates, like Xcode and Java updates. Our sister team had this experience running Macs: one day the builds suddenly break because there's a pop-up for an Xcode update, and it prevents you from doing further builds until someone actually clicks the Yes button.
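
The exact fleet tooling is internal, but the individual knobs are plain macOS commands. A minimal provisioning sketch (run as root on each Mac, applying the settings the talk describes):

```python
import subprocess

# Settings from the talk: no sleep, no screen saver, no auto-updates.
HARDENING_COMMANDS = [
    # Never suspend the machine, even when idle.
    ["systemsetup", "-setcomputersleep", "Never"],
    # Disable the screen saver (idleTime 0 = never start it).
    ["defaults", "-currentHost", "write", "com.apple.screensaver",
     "idleTime", "0"],
    # Stop the software-update pop-ups (e.g. Xcode) that block builds.
    ["softwareupdate", "--schedule", "off"],
]

def harden_mac_agent() -> None:
    """Apply the build-agent settings once per fresh Mac image."""
    for cmd in HARDENING_COMMANDS:
        subprocess.run(cmd, check=True)
```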
B
And that's it. We are part of the developer infrastructure team within Google, and we work on the build and release services for Google; currently we manage the hosted Jenkins instance in Google for different teams. This is John, our manager; that's David; and Shane, who is the new team member joining us. There are also Sabu and Mercy, and Patrick, who is our systems administrator, but they couldn't attend today.
B
Our old existing build tool mainly targeted iOS, Windows, and Linux builds, mostly for client binaries. So the major teams using our service right now are, for example, Google Drive sync — those clients run on Windows, Mac, and Linux — and also some other teams like Google Earth; they have to package binaries and deploy them to users.
A
Anything that comes up on the screen saying "signed by Google" — all of those things fit our use case. So when you're installing something and it says "signed by Google", it came from our service. Now, some of them are still being built by the old system — the Mac ones and the iOS ones — but all the Linux and Windows ones are ours.
B
Within Google, I would say our service is a very small one, only hosting the client binaries. Google has a gigantic CI system just for, like, the web services. John is also a manager of that service; I can't say how large it is, but it's much larger than our service.
B
Since we invented our own config format back when Pipeline wasn't that mature, we still consume one single build script and run maybe one build for you. But we do plan to try to move to Pipeline, because the Jenkins community has spent a lot of effort making that work, and we might integrate with Pipeline in the future.
B
That's right. I think if we feel that our plugin or our work actually fits both the internal and external use cases, we will definitely open-source it. We have already pushed some of our changes and plugins to the open-source world, and I think macOS won't be an exception to that.
A
If we start hosting it for our cloud customers, it will definitely be hosting Macs as well, so they'll just be able to spin one up and have it work — whereas right now in cloud you can't find a Mac to be had. So if you need to build on Mac: I think eventually — I can't say exactly when — we'll have something. I think a lot of people would like that.
B
Yes — on the original author of our RPC slave: we actually have another engineer within Google trying to open-source that. There's an external version, which is like a gRPC slave. I'm not sure about its status, but I do see code reviews coming in for it, so I might check and see whether it's really open-sourced yet.
C
I run ci.jenkins.io, which is the Jenkins project on Jenkins. We recently migrated infrastructure, so I rebuilt the instance. We use LDAP and the matrix authorization strategy; I disabled anybody but admins from being able to configure anything, and the approach we've gone with, using pipelines, is to say:
C
If you provide a Jenkinsfile, then you can have a job on this instance, and thus far that's worked out fairly well. Before — and the project's been going for a long time — having people with access to go change these things, whether on their project or somebody else's, had over time resulted in a lot of untrackable and unauditable changes. So when we did the migration to the new infrastructure, we actually didn't migrate a lot of jobs, because I couldn't figure out who the hell did what, or why.
B
Currently, most of the plugins running on our instance are actually written by ourselves, because we have such a highly customized environment that most off-the-shelf plugins don't just work out of the box. In the process of writing those plugins we take a look and balance between different user requests, but we haven't yet heard a use case where a user came to us and said: I want to use this...