Description
Update on Operator SDK - Rob Szumski
Operator Metering - Chance Z
Operators @ Pantheon - Daniel Feinberg
GCP Spark Operator - Chaorun Yu (Lightbend)
A
Alright, so welcome again to another Operator Framework call. This is the October meeting. I had put a call out on the email list to see if anyone had joined. Usually what I like to do is have someone external to Red Hat who's working on an operator talk. I couldn't track anyone down to commit to coming this time; next time the folks from PingCAP said they would come and talk about the TiDB operator that they've created and how they operate it, so there'll be one next week.
A
If someone who comes on the call wants to take a risk and share their operator story, just let us know and we'll do that today. What I have on the agenda today is to get an Operator SDK update from Rob Szumski and a little talk about operator metering from Chance. I was also just looking at the Google group to see what else was bubbling up in topics; after we do the updates, maybe we can talk about some of the stuff that's on there.
B
So we have a few different efforts underway for getting a more structured versioning and release process going for the SDK, and some of this discussion has been happening on some of the PRs up in the repo, if you've been following that. What it comes down to is that we're going to start rapidly progressing from our current status to more of a 1.0 status.
B
Has
been
progressing
on
a
separate,
PR
and
there's
about
to
be
merged
here
soon
and
further
complicating
that
we're
adding
a
few
new
types
of
operators
to
the
operator
SDK.
So
we
have
the
ansible
based
operator
sdk,
getting
merged
into
the
mainline.
That
has
also
been
updated
to
use
this
new
library
under
the
hood,
and
so
what
we'd
like
to
do
is
we'll
tag
a
release
of
that
once
we
get
it
merged
gets
close
to
start
testing.
B
That is going to change around some of the mechanics of how your operators work and how they call the SDK under the hood, but your baseline logic for what you're doing in Go is going to remain the same. So what we're going to do is have a beta period for that, and then once we iron out all the bugs, which will hopefully not take too long, we'll call that the 1.0 and that'll be our stable API.
D
I just wanted to follow up on what you mentioned. We already have the controller-runtime refactoring changes merged onto the master branch, and we have our latest release, version 0.0.7, I believe; that's the one before any of the controller-runtime changes. So up until that release you shouldn't face any breaking changes.
E
Alright, I'm Chance. I work at Red Hat, previously CoreOS; I came in with the acquisition and I've been with CoreOS and now Red Hat for a little over three years. Rob asked me if I would give a brief overview of what operator metering is and, if possible, also a demo. I thought I'd start by just giving the base idea of what metering is for, what its purpose is, what we'll be able to solve with it, and where you can find out more information.
E
So the project name is operator metering, but I want to preface that with the fact that this isn't necessarily only geared towards operator use cases, though that is probably the best way to get better integration: if you do have an operator, you can leverage metering in a more Kubernetes-native way.
E
So
the
base
idea
of
metering
is
that
we
work
closely
together
with
your
monitoring,
stack
and
potentially
other
data
sources
to
collect
data,
store
it
for
a
long
term
and
then
provide
the
ability
to
report
on
it
over
over
time
and
slice
and
dice
it.
The
way
you
need
so,
let's
see
just
to
give
a
quick
example
of
kind
of
what
everything
actually
looks
like
we.
E
What I'll do is actually go through this in more detail, but the rough rundown is that it starts with a Prometheus query: you start ingesting the data through the reporting operator. The reporting operator then has the ability to query it using some SQL that either you write or we write, and you get that query run by creating a Report or a ScheduledReport, which actually says what you want to report on.
E
So
in
the
background,
I
already
have
an
installation
of
metering
running
by
default.
It
runs
a
set
of
pods
for
storing
our
data
for
querying
it
and
then
also
for
the
part
that
runs
collection
and
the
actual
queries
themselves.
In
particular.
The
primary
component
here
is
the
reporting
operator.
It's
the
one
that
does
the
data
collection
we
theists,
and
it's
also.
What
query
is
the
database,
which
is
presto
for
all
the
real
work
on
the
underlying
data?
We
use
ACS
HDFS
for
storage,
but
that
is
something
you
can
change.
E
You
can
actually
also
use
s3
natively,
as
basically
a
file
system
is
where
you
can
think
of
it,
or
you
can
also
use
a
local
disk
that
is
mountable
on
all
the
pots,
still
like
NFS
cluster
ifs
deficit.
Anything
that's
mountable
by
many
pots,
all
so
be
used
as
a
storage
in
for
for
this,
instead
of
HDFS.
E
Alright, so we have a number of custom resource definitions. The top one here is the Metering resource; it's the resource that tells the operator to install everything. Our metering operator installs the pods listed above, Presto, Hive, the reporting operator; it does all that through the Metering resource, which is basically the config resource for installation.
E
We
have
some
presser
tables
which
people
don't
usually
interact
with,
but
they're
kind
of
restoring
some
of
our
state
and
then
the
rest
are
all
things.
I
would
expect
the
end
user
to
deal
with
so
there's
a
report,
data
source,
which
is
basically
incoming
data
or
data
that
already
exists
and
I'll
show
you
that
there
are
report
generation,
queries
which
are
the
sequel
queries
that
we
saw
before
there's
the
report.
E
For
me,
this
query,
which
is
a
Prometheus
ul
expression
for
collecting
data
out
of
Vitas,
and
then
there's
reports
and
scheduled
reports,
which
are
the
parts
to
actually
act
upon
that
data.
The
storage
locations
are
a
way
for
configuring.
Whether
or
not
you
want
your
data
to
be
stored
in
HDFS
s3
or
a
local
file
system,
for
example.
E
I will make this slightly smaller and see if I can get a nice break in here. This is a large Prometheus query expression that gets the containers' memory usage and then groups it at the pod level, so you get pod-level usage information instead of just container-level. Then we do a bunch of joining; basically, at the end we align it with other Kubernetes data so that we have the pod name, the node name, and the namespace. This query is just run by the configuration of the data source.
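A minimal PromQL sketch in the spirit of that expression (the metric and label names here are illustrative assumptions, not the project's actual query, which is much longer):

```promql
# Illustrative only: sum container memory usage up to the pod level,
# keeping the Kubernetes labels the report later joins on.
sum(container_memory_usage_bytes) by (pod, namespace, node)
```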
E
Which,
actually
is
what
Maps,
eventually
down
to
a
real
table,
table
query.
So
the
report
data
source
has
a
Prometheus
query
name,
which
is
the
name
of
the
pot
of
the
report
from
UT
square.
We
looked
at
before
cloud
usage
memory
bytes
and
that
configures
the
operator
to
actually
go
and
collect
this
periodically.
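As a rough sketch, a ReportDataSource tying a collected Prometheus query to a table might look something like this (the API group/version and field names are assumptions reconstructed from the talk, not verified against the released CRDs):

```yaml
apiVersion: metering.openshift.io/v1alpha1   # assumed API group/version
kind: ReportDataSource
metadata:
  name: pod-memory-usage-bytes
spec:
  promsum:
    # name of the ReportPrometheusQuery whose PromQL results are collected
    query: pod-memory-usage-bytes
status:
  # filled in by the operator once the backing database table exists
  tableName: datasource_pod_memory_usage_bytes
```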
E
This
section
normally
can
have
some
extra
options
for
like
how,
often
to
poll
and
like
pump
sizing
for
like
how
much
data
to
grab
at
once,
but
by
default
it
would
just
use
some
cool
defaults
and
then
we
have
a
status
just
like
every
CR
which
store
there's
other
information
about
this
resource.
In
this
case,
the
table
name
field
is
set
indicating
there's
a
database
table
created
for
this
and
that
we're
collecting
the
data.
So
now
that
we
have
a
data
source,
we
can
actually
query
it
from
our
database.
Using
a
report
generation,
query.
E
So
there
can
be
many
report
generation
queries
that
act
on
the
data
sources.
That's
kind
of
the
idea
is
that
the
data
source
is
the
underlying
raw
data,
and
then
you
can
have
one
zero
or
more
queries
actually
utilize,
that
underlying
data.
That
way,
you
don't
actually
have
to
collect
the
data
more
than
once
for
processing
in
different
ways,
just
obviously
useful.
E
So
a
report
generation
query
just
like
everything
else
has
a
name
all
this
other
stuff
is
auto-generated
because
Hayes
likes
to
fill
in
the
metadata.
We
have
a
comest
set
of
columns,
which
is
basically
what
we
expect.
This
query
to
output,
in
terms
of
like
a
database
schema
if
you're
familiar
was
like
a
sequel
table.
This
is
roughly
what
that
map's
to
is
the
columns
in
that
table
and
then
some
extra
information
for
how
to
display
it.
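A skeletal ReportGenerationQuery along the lines described (the column list, the SQL, and the table-templating helper are illustrative sketches; the project's real queries are considerably longer):

```yaml
apiVersion: metering.openshift.io/v1alpha1   # assumed API group/version
kind: ReportGenerationQuery
metadata:
  name: namespace-memory-usage
spec:
  columns:                      # the schema this query is expected to output
    - name: namespace
      type: string
    - name: memory_usage_bytes
      type: double
  query: |
    SELECT namespace, sum(memory_bytes) AS memory_usage_bytes
    FROM datasource_pod_memory_usage_bytes
    GROUP BY namespace
```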
E
Is this the one? So we have an hourly report, two reports actually: one for memory usage and one for CPU usage. What these do is run the SQL query specified by the generationQuery field, and they run according to a particular schedule. We can do hourly, daily, monthly, whatever you want really, and we also support cron for the more flexible use cases as well. It will report on data starting at the reportingStart time until the reportingEnd; here we don't have a reportingEnd.
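A minimal ScheduledReport in that spirit (again a sketch; the API group/version, schedule shape, and timestamp are assumptions following the talk, and may differ from the released CRD):

```yaml
apiVersion: metering.openshift.io/v1alpha1   # assumed API group/version
kind: ScheduledReport
metadata:
  name: namespace-cpu-usage-hourly
spec:
  generationQuery: namespace-cpu-usage   # which ReportGenerationQuery to run
  schedule:
    period: hourly                       # hourly/daily/monthly, or cron
  reportingStart: "2018-10-01T00:00:00Z"
  # no reportingEnd: the report keeps running and backfills from reportingStart
```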
E
So
it's
gonna
report
forever,
which
is
what
I
want
for
this
purpose
and
it
what
retro
actively
go
back
and
fill
in
the
data?
That's
missing
from
the
start,
assuming
we
have
the
data
collected
from
Prometheus
already,
and
so
as
I
showed
before
this
has
been
running
for
about
ten
hours
since
last
night,
so
that
we
actually
have
some
more
than
just
a
few
rows
of
data
before
I
show
you
that,
though,
we
can
see
the
status
to
indicate
like
where
it's
at
in
the
report.
E
I'm using Routes; you can use LoadBalancer Services or NodePorts as well. I have a Route that is configured to expose my endpoint at a particular domain name here, so I actually already have a command set up for this. Yes, you can query it, but it's not really anything I'm too worried about; this is a CI cluster. I set it up with auth using the OpenShift OAuth proxy, and I'm querying the route that I just showed before, and this is the endpoint.
E
The
API
v1
schedule
reports
it
and
then
it
hard
to
see
but
I'm
querying
for
a
particular
report,
which
is
the
namespace
B
usage
hourly
and
I'm.
Getting
it
in
type
of
the
de
format,
don't
agree
that
we
basically
get
the
results
as
tab.
Tab,
delimited
format
for
each
column
period
start
is
the
start
time
for
the
given
scheduled
interval.
So
it's
an
hourly
report,
so
each
period
starts
a
period.
End
isn't
one
hour.
E
The
namespace
is
the
namespace
that
we're
calculating
on
a
start
and
data
and
are
the
minute
max
of
the
values
in
that
time
range.
And
then
the
policy
of
usage
for
seconds
is
the
CPU
every
instance
in
time,
multiplied
by
the
resolution
of
that
data,
all
added
together
to
give
us
an
actual
CPU
core
usage
seconds,
and
then
we
do
this
for
every
hour.
So
we
have
13
to
14
14
to
15
and
everything
up
down
to
basically
the
last
hour
15
to
16.
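The core-seconds arithmetic just described can be sketched in a few lines of Python (a toy illustration of the calculation, not code from the project):

```python
def cpu_core_seconds(samples, resolution_seconds):
    """Sum CPU usage over an interval, as described in the talk.

    samples: CPU core readings (cores in use at each scrape instant).
    resolution_seconds: spacing between scrapes.
    Each reading contributes value * resolution to the total core-seconds.
    """
    return sum(value * resolution_seconds for value in samples)

# Three scrapes at 60s resolution: 0.5, 0.5 and 1.0 cores in use.
print(cpu_core_seconds([0.5, 0.5, 1.0], 60))  # 120.0
```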
E
You
see
we
have
as
well
with
memory
and
you
can
see
the
same
value
except
it's
in
bytes
and
it's
a
different
set
of
values
for
the
these
Pollock's
usage
information.
So
this
is
actually
all
coming
from
node
exporter
at
the
end
of
the
day
and
then
Prometheus
collects
annoyed
exporter
data
and
we
do
some
extra
processing
with
the
sequel
and
the
Prometheus
query
to
get
it
into
this
format.
But
that
means
basically
that
if
we
ever
needed
to
change
power,
it
works.
E
All
we
have
to
really
do
is
modify
one
of
these
resources,
like
the
the
namespace
memory
usage
request.
I
can
just
edit
it
and
modify
it
to
my
needs.
So
that's
the
rough
idea
of
like
how
you
interact
and
use
me,
but
there
are
other
things
you
can
do
we're
currently
working
on
a
another
concept
which
is
see.
E
This
is
a
regular
report,
but
the
concept
is
that
you
can
have
custom
inputs
where
you
could
imagine
having
a
query
that
maybe
is
specific
to
a
particular
namespace
and
you
could
add
the
old
inputs
to
it
that
customize
the
behavior
so
that
it
filters
everything.
That's
not
the
namespace
that
you
want.
Maybe
you
know
your
CI
test
namespace,
and
you
only
want
to
report
on
that
namespace.
E
This
is
something
that
we
just
added
in
is
finally
being
worked
on,
so
I
don't
have
a
great
demo
of
it
has
never
expertise,
utilize,
these
custom
inputs
very
ugly
yet,
but
that's
something
that
we
just
released
and
then
we're
also
working
on
with
this
future,
a
concept
of
roll-up
which
allows
you
to
calculate
really
granular
reports
that
have
like.
Maybe
you
say,
like
the
hourly
interval
that
I
was
showing.
But
then
this
could
be
rolled
up
into
a
daily
report
which
basically
aggregates
the
hourly
results.
E
Yeah,
so
that's
the
rough
idea.
I,
don't
really
have
a
whole
lot
more,
given
that
all
of
this
is
just
custom
resources.
The
real
power
here
is
that
you
can
program
it
using
kubernetes.
Just
the
same
way,
you
can
program
anything
else.
Thank
you,
Burnett
ease.
You
have
an
operator
that
wants
to
interact
with
this
system.
It
can
do
so
using
typical
kubernetes
technologies
like
operators
or
true
CTL.
B
So here's kind of how it all comes together, a few examples of how you can use this in a real environment. Chance showed us all the reports and all the stuff under the hood, but at the end of the day, say you want to do showback for a number of different teams: each team has three different projects, and they have a certain budget. You can run the reports Chance was just talking about and get the usage on Amazon.
B
We
can
actually
correlate
to
a
dollar
amounts
which
is
really
cool
using
the
Amazon
billing
API,
and
so
you
can
get
those
into
Excel
and
just
you
know,
sort
them
and
group
them
by
the
different
namespaces
and
total
things
up
just
manually
or
because
these
are
all
just
using
CSVs.
You
can
actually
import
these
into
your
business
intelligence
tool
of
choice.
Whatever
you
want
to
do
and
make
dashboards
out
of
these
and
have
a
more
automated
flow,
you
can
also
start
doing
a
number
of
kind
of
like
augmented
math.
B
If
you
wanted,
if
you
want
to
call
that
on
so,
if
I've
got
like
two
bare
metal
clusters,
for
example
and
I
know
how
much
that
they
I'm
leasing
the
hardware
for
a
certain
amount,
maybe
I've
got
a
block
of
bandwidth.
You
know
whether
I
use
it
or
not,
don't
cost
any
money,
and
so,
if
you
wanted
to
combine
all
of
that
infrastructure
cost
together,
you
know
of
your
shared,
like
enterprise
math
device.
For
example.
B
You
can
take
some
of
the
usage
from
these
reports
and
combine
it
with
that
fixed
cost
and
multiply
that
stuff
together
and
then
show
that
back
to
your
team,
whether
in
you
know
another
Excel
document
or
in
that
bi
tool
hook
it
up
to
any
of
the
other
cost
reporting
that
you
might
be
doing.
Email
reports,
that
type
of
thing-
and
my
favorite
use
case
for
this
of
all,
is
you
can
shame
teams
that
are
under
utilizing
what
they've
reserved.
B
So
if
you're,
you
know
asking
for
more
than
2x
what
you're
using
on
the
cluster
itself,
you
can
start
shaming
those
teams,
you
know
calculate
the
ratio
of
what
they're
using
like
or
not
list
them
out,
so
exactly
which
apps
need
to
be.
You
know
yanked
down
to
size
or
even
just
go
ahead
and
do
that
for
them.
You
could
have
automation,
that's
running,
that's
automatically
adjusting
people's
resource
limits
and
that
type
of
thing.
So
it's
pretty
exciting.
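That reserved-versus-used ratio check is easy to sketch once the report totals are in hand (a toy illustration; the team names and the 2x threshold are just the example from the talk):

```python
def over_reservers(teams, threshold=2.0):
    """Return teams whose reservation exceeds `threshold` times their usage.

    teams: mapping of team name -> (reserved_cores, used_cores),
    e.g. totals pulled from the metering report CSVs.
    """
    flagged = {}
    for name, (reserved, used) in teams.items():
        # teams with zero usage are maximally over-reserved
        ratio = reserved / used if used else float("inf")
        if ratio > threshold:
            flagged[name] = ratio
    return flagged

usage = {"team-a": (8.0, 2.0), "team-b": (3.0, 2.5)}
print(over_reservers(usage))  # {'team-a': 4.0}
```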
B
This
is
you
know
all
using
cluster
metrics
that
we
have
today,
and
that
is
one
whole
use
case
for
this.
But
you
can
also
you
know,
export
custom
metrics
from
your
operators,
and
so
that
is
kind
of
the
the
other
use
case
of
this.
So
the
cluster
monitoring
and
gaining
insights,
for
that
is
great,
but
if
you've
got
a
database
operator
and
it's
emitting
vetrix
for
the
different
things
that
is
tracking
internally
the
number
of
rebalance
operations
or
things
like
that
that
are
critical
to
how
it.
A
All right, well, I think that probably covers off metering; I'm not seeing any questions. Daniel Feinberg from Pantheon has been asking a lot of questions in the chat, with updates around the controller; I think he joined late and we missed him. So I'm going to unmute you, Daniel, and you can introduce yourself and what you've been working on, and we can kick off a conversation.
F
So I'm Daniel Feinberg, a senior engineer, SRE, at Pantheon. We are a Drupal and WordPress hosting platform; we also offer developer tools to agencies, and we host about, I don't know, 150,000 sites; it varies quite a bit depending on the day as customers join. We host a large-scale Cassandra infrastructure; we use Cassandra in two different major pieces of our platform, and so we're in the process of building a Cassandra operator.
F
That
building
that
are
talking
about
so
our
other
operator
is
a
machine
operator
that
is
also
right
now
being
built.
It's
not
running
in
production.
We
have
our
standard
operator
operating
production
sandra
clusters
in
a
very
simple
way.
Our
larger
monolithic
database
will
be
migrated,
hopefully
by
the
end
of
the
month.
Probably
in
the
first
week
of
next
month
is
more
realistic
on
to
the
operators
management
plane.
F
So
our
machine
operator
is
actually
we
may
we
wrangle
and
maintain
systemd
containers
in
large
quantities.
All
of
our
customer
code
runs
and
systemd
containers,
no
docker,
no
images
system,
D,
namespaces,
C
groups,
all
the
base
stuff
we've
been
around
about
seven
years
and
when
they
built
this
out,
docker
wasn't
there
yet,
and
so
we,
the
large
monolith
in
our
system
is
an
orchestration
plane
and
our
goal
is
to
slowly
piece
that
out
into
operators
in
kubernetes
wrangling,
our
system
D
containers
our
first
step.
F
There
is
putting
couplets
on
all
of
our
our
servers
that
run
the
system,
D
containers
and
bringing
in
a
provisioning
tool
that
is
being
built
as
an
operator
it
will
deploy.
Daemon
sets
to
manage
services
on
each
server
that
worked
a
knitter,
a
great
for
customer
load
on
our
servers
with
those
assistant
e
containers.
So
the
machine
operators
super
interesting
and
unfortunately
won't
be
open
source.
But
what
the
big
the
big
deal
for
us
is
that
we're
moving
into
a
way
where
kubernetes
will
be
our
central
database
for
infrastructure.
F
It
will
we'll
be
using
it
like
right
now.
The
machine
builder
operator
doesn't
really
do
much
beside
daemon
sets
and
allows
us
and
our
monolith
provisional
the
system
via
containers
still
and
so
kind
of
utilizing
at
CD
and
kubernetes
in
its
multi
region
way.
We
use
gke
at
Google
to
our
multi
zonal
way
to
store
all
of
our
operational
information,
whether
or
not
it
affects
kubernetes
and
using
its
event-driven
system
and
operators
to
operate
on
kubernetes
resources,
but
also
external
things
that
kubernetes
isn't
managing
so
kind
of
a
hybrid
operator
there.
F
Well, because we have two different kinds of use cases for Cassandra: one is a multi-region, eventually consistent setup, and the other is three-way data replication of the metadata for a distributed file system. We maintain a distributed file system for our customers; the metadata is stored in Cassandra, and the files are stored in GCS on Google. The two different use cases have had us broaden out the build and spec of the management of the Cassandra cluster.
F
So ours will be specific to our containers: you'll have to use our images with our operator, because we have some logic inside the entrypoint files of the Docker images that does calculations which can't be done at the level of the operator, to set up configurations and things. So there are definitely design decisions we've made that are based on our feature set, but I believe that by open-sourcing it we can surface things like, oh, we do repairs differently.
F
Hoping early November; as soon as we get our monolith migrated and see that it's stable, then I'm hoping to put that out. You know, Sean just asked about the controller-runtime; I missed the beginning, and part of my project this weekend is hopefully migrating to the controller-runtime version of the Operator SDK, so I'm wondering if maybe someone can speak to how hard that's going to be and how much change that's going to be across the codebase for a user.
D
So
I
can
kind
of
jump
into
that
dannion.
So
you
don't
have
the
migration
guide
out
yet
we're
planning
to
have
it
out.
I
guess
this
coming
week,
along
with
release
that
tags
on
changes
on
the
master
but
kind
of
at
a
high
level.
There
is
a
fair
amount
of
change
involved
in
just
the
interfaces
changed,
because
we've
moved
over
from
using
the
SDK
api's
SDK
dot.
Watch
as
to
give
the
handler
that
you
have
in
the
SDK.
You
move
those
over
to
using
the
controller
runtimes
controller
package.
D
Essentially,
most
of
your
reconciliation
code
would
probably
stay,
as
is
it's
just
that
you
would
need
to
update
the
project
layout
just
a
bit.
So
if
you
look
at
the
master
branch
today
and
if
you
try
to
create
like
a
samples
and
catch
the
operator
as
an
example
you'll
be
able
to
see
just
what
the
new
project
new
clothes
looks
like.
So
there
is
a
fair
amount
of
change
involved
in
actually
taking
your
reconcile
cold
and
just
moving
it
over
to
a
controller
package
and
then
just
kind
of
changing
the
interface
to
the
reconcile
code.
D
So
obviously,
when
you
would
get
events
sent
to
your
handle
now
you
basically
get
like
an
object
key
that
you
use
to
look
up
the
object
and
the
cache,
for
instance.
So
there's
some
changes
around
the
edges,
but
I
think
by
and
large,
the
bulk
of
your
like
operator
code
should
stay
the
same
and,
if
you're
using
multiple
consume
resources
within
your
SDK
base
project
today.
C
Just as another data point, we moved the Ansible operator over to the controller-runtime a while back, and as he was saying, most of our core logic didn't have to change; we just changed the interfaces it was calling, so that was really nice. The rest of it was moving files to new locations. So as far as the amount of work goes, it should just be about moving files into the right locations or renaming them.
C
But
as
far
as
like
the
core
logic,
it
shouldn't
change
that
much.
Your
main
file
will
change
some
because
you
need
instead
of
a
manager
and
do
some
of
that
other
stuff.
But
you
can
kind
of
just
take
the
core
like
the
controller
one
times,
like
example,
just
like
take
that
and
that
they'll
mostly
work
for
you,
so
just
add
them
as
you're
starting
to
go
through
things.
A
All
right,
then,
I
was
asking
in
the
chat.
If
anyone
else
had
any
updates
and
Sharon
was
gonna.
Ask
for
a
slot
today
about
GCP
spark
operator.
Oh
I
apologize
for
that
sermon
that
got
lost
in
the
email
threads.
We
have
15
minutes
if
you'd
like
to
take
it
I'd
quite
like
to
hear
it.
So
why
don't
you
share
your
screen
and
take
it
away
and
it's
a
little.
A
So while he's setting up, I just wanted to make an announcement: we did get a room at KubeCon North America to host a Kubernetes Operator Framework hands-on workshop. It is on December 14th, the morning of Friday the 14th, in Seattle. It's not actually at the convention center; it's going to be at the Seattle Sheraton. So if you are interested in joining that, I will send the information out on the mailing list after this. So, Chaorun, take it away.
G
Ok
yeah
sure
today,
I
can
talk
a
little
bit
about
the
Google
cloud
platform
spark
operator.
I.
Think
in
two
weeks
ago
I
attended
a
meeting
where
Chiri
presented
his
spark
operator,
but
this
was
from
Google
I
think
it
has
more
momentum
and
more
contributors
and
users.
So
let's
look
at
what
it's
about
so
have
a
few
things.
I
want
to
talk
about
for
today,
so
to
get
started
out.
G
Just
talk
briefly
about
what
the
operator
pattern
is,
but
I
think
most
people
are
already
familiar
with
it,
but
the
this
treaty,
Peace
Park
operator,
is
basically
an
implementation
of
this
pattern
and
then
I'll
talk
about
the
architecture
of
the
of
this
operator,
how
to
install
it
and
what
are
when
some
of
its
basic
features
are
and
then
I'll
talk
about
a
COI
tool.
That's
provided
in
this
operator
project
called
the
spark
CTL.
It
makes
some
of
the
workflow
with
managing
spark
drops
the
easier,
as
we'll
see
then
comes
a
feature
called
militating
animation
webhook.
G
This
is
a
feature
that
the
spark
oratory
leverages
to
to
provide
lots
of
flexibility
in
customizing.
Your
spark
driver
and
executor
paths.
I
think
this
is
one
of
the
most
useful
features
in
this
project
and,
as
the
last
thing
I'll
talk
about
exporting
and
looking
at
Prometheus
matric
for
Matias
matrix
with
this
spark
operator.
I'll
conclude
with
some
future
things
that
you
can
contribute
to
this
project.
G
What it really is, is an application-specific controller that extends the Kubernetes API and makes management, creation, and configuration of this complex application easier. The way it does that is with an event loop that's constantly running: the operator component keeps observing for events, listening for the creation of a new custom resource, for example; then, when something happens, the operator evaluates the current status and what it should do, and then it acts on that, and this event loop keeps going.
G
Then
that's
quickly
cut
into
the
gist
to
talk
the
specifics
of
the
GCP
spark
operator.
It
was
created
by
this
guy
called
Ian
Ali
at
Google,
and
it's
not
open
source.
The
link
is
provided
on
the
slide.
The
approach
that
it
takes
to
managing
spark
jobs
is
that
it
creates
two
customer
resource
definitions
or
CR.
These
one
called
spark
application.
Another
cost
schedule
spark
application,
so
those
are
these
represents
the
abstractions
of
a
structure
and
they
are
what
make
scratch
jobs-
citizens
in
cadiz.
G
We'll see what one looks like in a minute, but once you have the Spark job spec documented in the YAML, you use kubectl, or the sparkctl tool that we'll talk about, to submit your YAML to the API server. Once the server receives your request to, for example, create a new SparkApplication or ScheduledSparkApplication, there's a component called the controller in the Spark operator that gets this request, assembles those configurations, and passes them on to another component, the submission runner.
G
The other basic features: because it uses a YAML document for the spec of a job, and YAML is declarative in nature, it's easy to do things like version control. And because, under the hood, what it really does is translate the spec into a spark-submit command, everything that spark-submit takes in the way of configuration options, the Spark operator also supports; you only need to figure out what to put in the YAML for the translation. There's good documentation, so it's easy to figure out.
G
It
also
supports
a
crown
like
scheduled,
spark
jobs.
So
that's
what
the
ECR
the
scheduled
spark
application
is
for
and-
and
the
interesting
feature
is
mutating
animation
black
book
the
operator
uses
that
to
enable
product
customization,
you
can
mount
configure
config
maps
or
volumes
in
your
driver
and
executor
parts.
We'll
see
that
in
a
few
slides-
and
you
can
also
use
the
spark
operator
to
enable
automatic
job
pre-submission.
If
you
would
like
to
change
the
specs
of
an
existing
start
job
or
to
restart
it
if
upon
failure.
G
Let's talk about prerequisites. It requires Kubernetes 1.8 and above, because it relies on garbage collection of custom resources, which is only available starting with 1.8. And if you would like to use the mutating admission webhook, then Kubernetes 1.9 and above is required, because that only becomes a beta feature starting with 1.9. Exactly which distribution of Kubernetes you install the operator on doesn't really matter; personally, I've used it on GKE and OpenShift and both worked fine. As for how to install it...
G
It's
easy
to
install
because
there's
a
there's,
an
incubator
chart
on
the
central
helm,
charts
repo
yeah,
you
basically
added
the
people,
how
people
and
they
started
just
what
it
would
do
for
any
other
standard
chart,
and
there
are
other
options
to
customize
it.
For
example,
you
know
you
would
like
to
install
it
in
a
different
name
space
or
there
are
some
components
that
you
would
like
to
enable
or
disable.
But
you
you
can
look
at
the
the
link.
There's
a
concise
documentation
that
you
can
look
at.
I
won't
go
into
details
here
now.
G
Let's
take
a
look
at
a
sample
llamo
what
it
looks
like
so
here.
So
basically
you
would
like
you
name
your
Spock
job
expert
PI,
and
you
would
like
to
run
it
in
default.
Namespace
yeah!
That's
where
you
specify
your
namespace
and
you
provide
your
image,
your
main
class
application
file.
It's
all
standard
things,
then
you
can
configure
our
driver
pod
to
with
some
memory,
resource
requirements
or
service
account
without
you
use
and
how
many
executor
instances
you
would
like
to
launch
and
resource
requirements
for
executor.
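A SparkApplication along the lines of the spark-pi example being described might look roughly like this (a sketch from memory of the project's examples; the API version, image tag, and exact field names are assumptions and may not match the current CRD):

```yaml
apiVersion: sparkoperator.k8s.io/v1alpha1   # assumed API group/version
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: gcr.io/spark-operator/spark:v2.4.0   # illustrative image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark
  executor:
    instances: 2
    cores: 1
    memory: 512m
```

Submitting and inspecting it then works like any other resource, e.g. `kubectl apply -f spark-pi.yaml` followed by `kubectl get sparkapplications`.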
G
So
this
is
a
very
simple
llamo
and
you
can
have
all
sorts
of
other
configurations
as
long
as
this
proximity.
Of
course
it
and
you
figure
out
the
corresponding
speck
in
the
llamo
but
yeah.
This
is
what
it
looks
like
the
basic
operations
very
easy,
because
now
the
CRTs
are
there.
Then
you
can
just
create
a
spark
job.
For
example,
just
as
you
would
create
a
pot
cookie
I'll
apply
the
llamo
and
took
place
all
the
jobs.
Okay,
get
spark
applications,
the
name
of
the
the
CRD
you
have
to
get
other
details,
for
example,
events.
G
Let's
now
look
at
the
the
customs
tôi
có
provided
in
the
in
this
project
called
spark
CTL.
So
here
I
said
it's
a.
It
complements
cube
CDL
to
make
some
operations
easier,
but
but
I
would
say
that
it
it
can
fully
replace
coups
to
do
when
working
with
working
with
spark
application
or
schedule
spark
applications.
Here
are
these
because
yeah,
because
everything
that
Cuba
CDL
can
do
in
Sparks,
they
all
can
do
and
it
makes
things
easier.
For
example,
listing
our
spark
drops.
G
Besides that, there are a few other features. sparkctl also supports port-forwarding to the Spark web UI. Again, this is something that kubectl can do, because with kubectl you can figure out the pod first and then do a port-forward on that pod; but here you don't need to find the pod, you just know the Spark job name, here spark-pi. So it's easier.
G
It
also
supports
staging
local
dependencies
to
s3
and
GCS,
and
so
for
your
dependencies
that
you
specified
in
your
spark
idle
llamo.
You
can
specify
your
a
GCS
pocket
or
a
tree
bucket
to
upload
them
to
to
a
remote
place,
but
you
need
to
configure
your
authentication
and
stuff
up
front.
The
details
are
in
the
documentation
which
I
won't
talk
about
here.
G
Ok,
so
now,
let's
go
to
the
mutating
animation
graphic.
So
this
feature
is
a
it's
a
feature
about
of
kubernetes
itself
rather
than
the
operator,
but
the
sparkle
operator
leverages
this
feature
to
to
enable
a
flexible
customization
of
pods.
What
this
feature
is,
is
it's
a
so-called
animation
controller
that
intercepts
requests
to
the
API
server
and
the
modify
stand
object
before
the
object
is
persisted
as
I
mention
it's
a
beta
feature
in
1.9
above
and
the
SPARC
operator
uses
this
feature
to
achieve
mostly
three
use
cases.
G
The
first
use
case
is
Mountain
config,
mats
and
Driver
and
executive
pause.
The
second
feature
is
the
monkey
volumes.
The
third
feature
is
setting
positive
affinity
and
I.
Definitely
things
like
what
what
notes
you
actually
run
on,
or
which
knows
it
would
like
to
avoid.
Let's
look
at
the
view
sample
use
cases.
So
when
would
you
like
to
month
amount
complete
maps
in
your
spark
job
parts?
So
here's
a
here's
an
example.
So
it's
a
very
common
to
have
some
custom
configurations
for
a
job
in
Sparky,
Falls,
calm,
partying,
that
Sh
or
log4j
properties.
G
So you first create these files as config maps, and then in your YAML file you simply refer to the config maps that you created. Then, when the Spark CRD object, the Spark job, is created, those config maps that you created beforehand will be automatically mounted inside the pods, inside the driver and executor pods, and your Spark job will be automatically configured as desired.
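A minimal sketch of what this could look like in the SparkApplication YAML (the field and config-map names here are assumptions for illustration, hedged against the CRD version in use):

```yaml
apiVersion: sparkoperator.k8s.io/v1alpha1
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  # Config map created beforehand, e.g.:
  #   kubectl create configmap spark-config \
  #     --from-file=spark-defaults.conf --from-file=log4j.properties
  sparkConfigMap: spark-config
```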
G
Another use case is supplying Hadoop configurations to access HDFS, for example. You need core-site.xml and hdfs-site.xml. These files, again, you can create as config maps and refer to the config maps in the YAML, which would then bring in the config maps and mount them in the pods once your job starts. This way you get connectivity with HDFS.
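Sketched the same way (hypothetical names; the field name is an assumption based on the operator's API):

```yaml
spec:
  # Config map holding core-site.xml and hdfs-site.xml, created with e.g.:
  #   kubectl create configmap hadoop-config \
  #     --from-file=core-site.xml --from-file=hdfs-site.xml
  hadoopConfigMap: hadoop-config
```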
G
Another use case for the mutating admission webhook is mounting volumes. Here's a use case that I've been working on myself: the use of the Spark history server. In this case, both driver and executor pods of a Spark job need to log events to the same volume, which is also the volume used by the history server pod itself, for things like displaying them on the UI, for example.
G
So here, for example, we have a PVC-type volume, and then, in order to have the driver and executor pods log to that volume, you need this section here with the volume mounts: you specify the name of the volume that's declared here, and then the path that you would like to mount the volume at. This way the volume is available at that mount directory, and then you can log events there, which is configured here.
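A minimal sketch of such a spec, assuming hypothetical volume and claim names and a hypothetical /mnt/spark-events mount path (the slide's actual values were not captured in the transcript):

```yaml
spec:
  volumes:
    - name: spark-events
      persistentVolumeClaim:
        claimName: spark-history-pvc   # shared with the history server pod
  driver:
    volumeMounts:
      - name: spark-events
        mountPath: /mnt/spark-events
  executor:
    volumeMounts:
      - name: spark-events
        mountPath: /mnt/spark-events
  sparkConf:
    # Point Spark event logging at the shared volume:
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "file:/mnt/spark-events"
```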
G
So these are some use cases for the mutating admission webhook. I think it's a pretty useful feature, but it's an optional component; you can disable it if you don't want to use it. Last but not least, let's talk about Prometheus metrics. The Spark operator configures a Prometheus JMX exporter to run as a Java agent in the operator pod itself, but it also supports emitting Prometheus metrics in the driver and executor pods themselves.
G
So
so
the
two
sets
of
metrics
are
in
a
an
application-specific
metric.
For
example,
spark
driver
app
status,
stop
duration,
so
this
is
a
metric,
that's
specific
for
that
job.
Coming
from
a
driver
or
executive
park,
there's
also
a
set
of
metrics
that
are
a
higher
level,
for
example,
spark
app
running
count,
so
these
are
metrics
that
are
specifically
provided
by
the
operator
pod
itself.
So
these
are
application,
metrics
application
level
metrics,
but
note
that
tweaks
to
expose
driver
and
executor
metrics.
G
So for the first set, the Spark application image that you specified in your YAML needs to contain the Prometheus JMX exporter Java agent jar; otherwise the metrics won't be exported. But once you have that jar available in your image, it's easily configurable to have those metrics exported. Here's an example YAML file that configures the driver and executor metrics to be exported; this is for the first set of metrics that I showed on a previous slide. The operator pod itself already configures itself to export application-level metrics.
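As a hedged sketch of what such a monitoring section might look like (field names are assumptions based on the operator's API, and the jar path depends on where the JMX exporter agent was baked into the application image):

```yaml
spec:
  monitoring:
    exposeDriverMetrics: true
    exposeExecutorMetrics: true
    prometheus:
      # Path of the JMX exporter Java agent jar inside the application image
      jmxExporterJar: /prometheus/jmx_prometheus_javaagent.jar
      port: 8090
```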
This slide is just the general thing about how you look at those metrics. You can look at them in the Prometheus UI, or, for example, if you would like to verify the list of metrics exported and advertised by the Spark operator pod itself, you can find the pod and then do a port forwarding on it. The default port is 10254, and once you have that port forwarded, you can go to the metrics endpoint.
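For example (illustrative only; the namespace and pod name are placeholders, and the commands need a live cluster):

```shell
# Find the operator pod, then forward its metrics port (default 10254):
kubectl -n spark-operator get pods
kubectl -n spark-operator port-forward <spark-operator-pod> 10254:10254
# In another terminal, fetch the metrics:
curl http://localhost:10254/metrics
```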
G
On the current status of the project and future work: it's fully compatible with Spark 2.3, and Spark 2.4 is being tested with the 2.4 release candidate versions. It's currently alpha, but it will be upgraded to beta once 2.4 is officially released. We've been trying it out here at the lab, and we are actively evaluating and contributing to this project. Our past contributions included the Helm chart and the integration with Prometheus and the Spark history server. The project is still in its early stage, and it requires lots of testing to make it mature.
G
So
more
integration
tests
are
I,
think
that
needs
to
be
added,
and
we
are
also
working
on
that
also
Kerberos
support.
That's
currently
lacking
also
spark
Cpl,
doesn't
have
very
good
support
for
scheduled
spark
application.
That's
also
something
to
be
added.
Just
a
few
words
about
the
team.
I
mean
I
worked
at
the
light
bends.
The
team
I
working
is
called
fast
data
platform,
so
we
are
a
the
product.
Is
a
curated
fully
supported
the
platform
that
helps
you
help
to
helps
developers,
design,
build
and
run
data
pipeline,
and
all
mortgages
is
on
streaming.
G
The history server, for example. Currently it's looking good, so I think this is a promising project with a lot of activity, so I encourage you guys to try it out and maybe consider contributing. Yeah, I think I went a little bit too fast because we're short on time, but if you have questions you can shoot me an email, or just talk to me. Okay.
A
Would be great. I will post all that up on the mailing list a little bit later today. So thank you, everybody. I don't see any questions in the chat at the moment, but I will post all this, and Chaorun, you can reach out to him on the mailing list as well. So thanks again, and we'll be meeting again on the third Friday of next month, but you can always find us on the Google group mailing list. So that was a really great talk.