From YouTube: OpenShift Commons Briefing: Operator Metering with Chance Zibolski and Rob Szumski (Red Hat)
Description: From the Operator Framework SIG, October 2018. Operator Metering with Chance Zibolski and Rob Szumski (Red Hat).
Chance Zibolski: I'm Chance. I work at Red Hat, and previously CoreOS; I came in with the acquisition and have been with CoreOS and Red Hat for a little over three years. Rob asked me if I would give a brief overview of what Operator Metering is and, if possible, also a demo. So I thought I'd start by giving the basic idea of what metering is, what its purpose is, what we aim to solve with it, and where you can find more information.
So the project name is Operator Metering, but I want to preface that with the fact that this isn't necessarily only geared towards operator use cases, though that is probably the best way to get better integration: if you do have an operator, you can leverage metering in a more Kubernetes-native way.
The basic idea of metering is that we work closely with your monitoring stack, and potentially other data sources, to collect data, store it long term, and then provide the ability to report on it over time and slice and dice it the way you need. Let's see; just to give a quick example of what everything actually looks like.
So what I'll do is actually go through this in more detail, but the rough rundown is that it starts with a Prometheus query: you start ingesting the data through the reporting operator, the reporting operator then has the ability to query it using SQL that either you write or we write, and then you get that query to run by creating a Report or a ScheduledReport, which actually says what you want to report on. So, in the background, I already have an installation of metering running.
By default it runs a set of pods for storing our data and querying it, and then also the part that runs collection and the actual queries themselves. The primary component here is the reporting operator: it's the one that does the data collection from Prometheus, and it's also what queries the database, which is Presto, to do all the real work on the underlying data. We use HDFS for storage, but that is something you can change.
You can actually also use S3 natively (as basically a file system is the way you can think of it), or you can use a local disk, and anything that's mountable by many pods, like NFS, GlusterFS, or CephFS, could also be used as a storage backend for this instead of HDFS.
Alright, so we have a number of custom resource definitions. The top one here is the Metering resource; it's the resource that tells it to install everything. Our metering operator installs the pods listed above (Presto, Hive, the reporting operator), and it does all that through the Metering resource, which is basically the config resource for installation.
These are the ones I would expect the end user to deal with: there's a ReportDataSource, which is basically incoming data or data that already exists, and I'll show you that; there are ReportGenerationQueries, which are the SQL queries that we saw before; there's the ReportPrometheusQuery, which is a PromQL expression for collecting data out of Prometheus; and then there are Reports and ScheduledReports, which are the parts that actually act upon that data. StorageLocations are a way of configuring
whether you want your data to be stored in HDFS, S3, or a local file system, for example. So, starting from the bottom up, I'll start with the data collection portion. We have these ReportPrometheusQueries, which are really simple: they are an actual Prometheus expression, PromQL. So let me make this a little easier to read.
I will make this slightly smaller and see if I can get a nice break in here. This is a large Prometheus query expression that gets the containers' memory usage and then groups it at the pod level, so you get pod-level information instead of just container-level, and then at the end we do a bunch of joining with other Kubernetes data so that we have the pod name, the node name, and the namespace. This query is just referenced by the configuration of the data source.
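A ReportPrometheusQuery of this shape might be sketched as follows. The API group/version, the resource name, and the simplified PromQL are assumptions reconstructed from the talk, not copied from a real installation:

```yaml
# Hypothetical sketch of the resource described above; field names assumed.
apiVersion: metering.openshift.io/v1alpha1
kind: ReportPrometheusQuery
metadata:
  name: pod-usage-memory-bytes
spec:
  # Simplified stand-in for the larger on-screen expression:
  # container memory usage summed up to the pod level.
  query: |
    sum(container_memory_usage_bytes{container_name!="POD"})
      by (pod_name, namespace, node)
```

The real expression in the demo also joins in other Kubernetes metadata, which this sketch omits.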
That is what eventually maps down to a real table via the query. So the ReportDataSource has a Prometheus query name, which is the name of the ReportPrometheusQuery we looked at before, pod-usage-memory-bytes, and that configures the operator to actually go and collect this periodically.
This section can normally have some extra options, like how often to poll and chunk sizing for how much data to grab at once, but by default it will just use some defaults. And then we have the status, just like every CR, which stores other information about this resource; in this case the table name field is set, indicating there's a database table created for this and that we're collecting the data. So now that we have a data source, we can actually query it from our database using a ReportGenerationQuery.
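As a sketch, a ReportDataSource tying collection to that Prometheus query might look like the following; the field names and status shape are assumptions based on the description above:

```yaml
# Hypothetical sketch; apiVersion and spec layout are assumptions.
apiVersion: metering.openshift.io/v1alpha1
kind: ReportDataSource
metadata:
  name: pod-usage-memory-bytes
spec:
  promsum:
    # Refers to the ReportPrometheusQuery by name.
    query: pod-usage-memory-bytes
status:
  # Set by the reporting operator once the backing database table exists.
  tableName: datasource_pod_usage_memory_bytes
```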
So there can be many ReportGenerationQueries that act on the data sources. That's the idea: the data source is the underlying raw data, and then you can have zero, one, or more queries that utilize that underlying data. That way you don't actually have to collect the data more than once to process it in different ways, which is obviously useful.
So a ReportGenerationQuery, just like everything else, has a name; all this other stuff is auto-generated because Kubernetes likes to fill in the metadata. We have a set of columns, which is basically what we expect this query to output in terms of a database schema: if you're familiar with SQL tables, this is roughly what it maps to, the columns in that table, plus some extra information for how to display them.
Reports can take in custom inputs, so you can override their default behaviors and program them dynamically a little bit; right now most reports only allow overriding the start and end dates. And then there's the actual SQL query, which is just ANSI SQL that has Go templates in it that get processed before the query is run, to allow us to do things a little more dynamically.
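Putting those pieces together, a ReportGenerationQuery might be sketched like this. The column list, the template delimiters and functions, and the table reference are all illustrative assumptions, not the real queries shipped with the project:

```yaml
# Hypothetical sketch; apiVersion, template syntax, and names are assumptions.
apiVersion: metering.openshift.io/v1alpha1
kind: ReportGenerationQuery
metadata:
  name: pod-memory-usage-hourly
spec:
  # Expected output schema of the SQL below.
  columns:
    - name: namespace
      type: string
    - name: pod_usage_memory_byte_seconds
      type: double
  # ANSI SQL with Go-template placeholders processed before it runs.
  query: |
    SELECT
      namespace,
      sum(amount * "timeprecision") AS pod_usage_memory_byte_seconds
    FROM {| dataSourceTableName "pod-usage-memory-bytes" |}
    GROUP BY namespace
```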
Is this the one? So we have an hourly report (two reports, actually: one for memory usage and one for CPU usage), and what these do is run the SQL query specified by the generation query field, and they run according to a particular schedule. We can do hourly, daily, monthly, whatever you'd like; we also support cron for the more flexible use cases as well. It will report on data starting at the reporting start time until the reporting end.
We don't have a reporting end, so it's going to report forever, which is what I want for this purpose, and it will retroactively go back and fill in data that's missing from the start, assuming we have the data collected from Prometheus already. As I showed before, this has been running for about ten hours since last night, so we actually have more than just a few rows of data. Before I show you that, though, we can see the status, which indicates where it's at in the report.
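A ScheduledReport along the lines of the one in the demo might look like this; the names, field spellings, and timestamp are assumptions reconstructed from the talk:

```yaml
# Hypothetical sketch; apiVersion and field names are assumptions.
apiVersion: metering.openshift.io/v1alpha1
kind: ScheduledReport
metadata:
  name: namespace-cpu-usage-hourly
spec:
  # The ReportGenerationQuery whose SQL this report runs.
  generationQuery: namespace-cpu-usage
  schedule:
    period: hourly          # hourly/daily/monthly, or a cron expression
  reportingStart: "2018-10-15T00:00:00Z"
  # No reportingEnd: keep reporting forever, backfilling missing periods.
```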
A
If
we're
back
long
for
anything,
you
checked
where
the
last
period
I
ran
for
is
and
then
because
I'm
on
open
ship.
But
this
works
with
regulators
as
well
I'm
using
routes
you
can
use,
load,
balancer
services
or
no
ports
as
well.
I
have
a
route
that
is
configured
to
expose
my
endpoint
at
a
particular
domain
name
here,
so
I
actually
already
have
a
command
set
up
through
create.
Yes,
you
can
query
it,
but
it's
not
really
anything.
I'm too worried about; this is a CI cluster. I set it up with auth using the OpenShift auth proxy, and I'm querying the Route that I just showed before, and this is the endpoint, the API v1 scheduled reports one. It's a bit hard to see, but I'm querying for a particular report, which is the namespace CPU usage hourly report, and I'm getting it in tab-separated format: we basically get the result in a tab-delimited format for each column. period_start is the start time for the given scheduled interval.
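The query from the demo might be reproduced with something like the following; the hostname, the exact endpoint path, and the query parameters are assumptions pieced together from the talk, so check them against your own installation:

```shell
# Hypothetical reporting-operator route exposed by OpenShift (assumed hostname).
HOST="https://metering.example.com"
REPORT="namespace-cpu-usage-hourly"
URL="${HOST}/api/v1/scheduledreports/get?name=${REPORT}&format=tab"
echo "$URL"
# With the OpenShift auth proxy in front, the request would carry a token, e.g.:
#   curl -k -H "Authorization: Bearer $(oc whoami -t)" "$URL"
```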
So it's an hourly report, so each period_start to period_end is one hour. The namespace is the namespace that we're calculating on; data_start and data_end are the min and max of the values in that time range. And then the pod usage CPU core seconds is the CPU usage at every instant in time, multiplied by the resolution of that data, all added together to get us an actual CPU core-usage-seconds figure, and then we do this for every hour.
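That core-seconds calculation can be sketched in a couple of lines of shell; the sample numbers are made up, and the two-column layout (a usage value followed by the sample resolution in seconds) is an assumption used purely for illustration:

```shell
# Each line: <cpu cores used at that instant> <resolution of the sample in seconds>
# core-seconds = sum over samples of (value * resolution)
printf '0.5 60\n0.25 60\n1.0 60\n' |
  awk '{ total += $1 * $2 } END { print total }'
# prints 105: 0.5*60 + 0.25*60 + 1.0*60
```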
So we've got 13:00 to 14:00, 14:00 to 15:00, and everything down to basically the last hour, 15:00 to 16:00. You can see we have the same with memory: the same kind of values, except they're in bytes and it's a different set of values for the pod usage information. This is actually all coming from node-exporter at the end of the day; Prometheus collects the node-exporter data, and we do some extra processing with the SQL and the Prometheus query to get it into this format.
This is a regular report, but the concept is that you can have custom inputs, where you could imagine having a query that is specific to a particular namespace, and you could add inputs to it that customize the behavior so that it filters out everything that's not the namespace you want. Maybe it's your CI test namespace, and you only want to report on that namespace. This is something that we just added and it's currently being worked on, so I don't have a great demo of it, because none of our queries utilize
these custom inputs very heavily yet, but that's something that we just released. And then, alongside this feature, we're also working on a concept of roll-ups, which allows you to calculate really granular reports, say at the hourly interval I was showing, and then roll them up into a daily report, which basically aggregates the hourly results.
Yeah, so that's the rough idea. I don't really have a whole lot more. Given that all of this is just custom resources, the real power here is that you can program it using Kubernetes, the same way you can program anything else in Kubernetes. If you have an operator that wants to interact with this system, it can do so using typical Kubernetes technologies like operators or kubectl.
Rob Szumski: So here's how it all comes together, a few examples of how you can use this in a real environment. Chance has shown us all the reports, all the stuff under the hood, but at the end of the day, say you want to do showback for a number of different teams: each team has three different projects, and they have a certain budget. You can run the reports Chance was just talking about and get the usage on Amazon.
We can actually correlate that to a dollar amount, which is really cool, using the Amazon billing API, and so you can get those into Excel and just sort them, group them by the different namespaces, and total things up manually; or, because these are all just CSVs, you can actually import them into your business intelligence tool of choice, whatever you want to use, make dashboards out of them, and have a more automated flow. You can also start doing a number of automated things:
email reports, that type of thing. And my favorite use case for this of all is that you can shame teams that are under-utilizing what they've reserved. So if a team is asking for more than 2x what they're actually using on the cluster, you can start shaming those teams: calculate the ratio of what they're using, list them out, say exactly which apps need to be yanked down to size, or even just go ahead and do that for them.
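That reserved-versus-used ratio is simple arithmetic over the report output. As a sketch with made-up numbers (the CSV column layout is an assumption, not the actual report schema):

```shell
# Columns: namespace, core-seconds requested, core-seconds actually used.
# Flag any namespace requesting more than 2x what it used.
printf 'team-a,7200,1800\nteam-b,3600,3000\n' |
  awk -F, '$2 / $3 > 2 { printf "%s over-reserved: ratio %.1f\n", $1, $2 / $3 }'
# prints: team-a over-reserved: ratio 4.0
```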
You could have automation running that's automatically adjusting people's resource limits, that type of thing. So it's pretty exciting. This is all using the cluster metrics that we have today, and that is one whole use case for this. But you can also export custom metrics from your operators, and that is kind of the other use case for this. So cluster monitoring and gaining insights from that is great.