From YouTube: Keptn Community Meeting - October 28th, 2019
Description
Discussion of Keptn Quality Gates
https://docs.google.com/document/d/1Vebjqs2JRtcH_GHBXTqddyowKGTUxeMCxCUIUvFd23U
Dirk: will provide a sample SLO file that includes the upper and lower boundaries by the next community meeting
Christian: Details on SLI pass criteria and logical combination of those
Florian: Provide information on how to write SLI providers in the next community meeting
A: Hopefully you see what I see. I want to continue the discussion of the Keptn quality gates. To the action items from the last meeting: Andy and Rob said they would provide a sample SLO file that includes some more details for the next community meetings. We've already managed to incorporate that sample SLO into the Keptn Quality Gates use case document, and I will briefly sweep through the changes and point out what is different now. Then Christian from the Keptn team will enlighten us more on the details of how the pass and warning criteria actually work. And this was an action item for me, but I asked one of my colleagues to actually do it, so Florian will provide us with information on how to write SLI providers in this community meeting.
A: I got the file from Andy and Rob, and with that I would just jump right into what has changed in the Keptn Quality Gates use case document. The prerequisites stay the same. We renamed the service level indicator to response time; we discussed this last time, because "request latency" was not that well received by some, and I also think that response time is more fitting.
A: Thank you very much. Coming to the service level objectives configuration, a few changes happened. This is an example SLO file, and Christian will walk us through a very detailed service level objective configuration later on, but bear with me. We've discussed the filter section previously, where you can, for example, provide the ID of a Prometheus scrape job, and you can of course override project, stage, or service values if needed.
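A minimal sketch of the top of such an SLO file, with the filter section described here (the filter keys are assumptions for illustration, not an authoritative schema):

```yaml
spec_version: "0.1.0"
filter:
  # hypothetical filter key: the ID of a Prometheus scrape job;
  # project, stage, and service values can be overridden here if needed
  scrape_job_id: "my-scrape-job"
```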
A: For comparison, we have the possibility to define whether you want to compare with a single result or with several results, and also to define a filter if you only want to compare to previously passed results, or if you also want to consider warning results in the comparison, for example. As for the objectives themselves, they consist of a reference to an SLI.
A: So this is the name of the SLI, in this case error rate, and then you need to define the pass criteria, and you can define warn criteria if you want to, but more on that in detail later. Due to many requests, we have reinstated the scoring in the SLO file, where you have the possibility to define weights for objectives, and you always have a total score for an evaluation. It is a value between zero and one, so it can be displayed as a percentage value, and it does not change over time.
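An objective of this shape, together with a total score section, might look like this (a sketch following the SLO format under discussion; the thresholds are made up):

```yaml
objectives:
  - sli: error_rate
    pass:              # criteria for a full point
      - criteria:
          - "<=1%"
    warning:           # optional criteria for half a point
      - criteria:
          - "<=2%"
total_score:
  pass: "90%"          # overall percentage needed for a pass
  warning: "75%"
```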
A: So you always have a good reference to the previous builds when it is always a percentage; it's better than an absolute value. These are the big changes that came from the proposal that Andy and Rob made. We have the comparison with a single result, which we had already, and also with several results, where we have a filter so that you include, for example, passed or warning results, and you can define the number of comparison results you want to compare with: in this case three, and I also think three is the default value.
A: So nothing really changed in the user walkthrough. There was one open question: Henry had a question about providing a data source, and Florian will answer this question later on. But first I would like to go through a detailed example of the service level objective file. You see there is all kinds of documentation in there, where each and every field is documented pretty neatly, except the total score. But I would like to hand the microphone over to Christian, to give him a chance to walk us through the detailed example of the service level objectives.
C: You could also try to extend the criteria here, which we've done for the warning case. The idea here is that we want to warn as long as the change is relatively small, say between plus 15 percent and minus 8 percent. That would be a relative change with an upper bound and a lower bound, but we also want to say it needs to be less than 500 milliseconds.
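A warning block that combines these three criteria could be written like this (sketch; in the format being discussed, criteria inside one list entry are combined with AND):

```yaml
warning:
  - criteria:
      - "<=+15%"   # relative change: upper bound
      - ">=-8%"    # relative change: lower bound
      - "<500"     # absolute bound: response time below 500 ms
```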
C: So, for instance, these three criteria would be connected with an AND: relative change less than 15 percent, more than minus 8 percent, and, in total, less than 500 milliseconds. To see an example of when this would be true: if we had a response time of 300 milliseconds in the first run, and in the second run we had a response time of 400 milliseconds, this would result in a relative change of 33 percent.
C: It would be, in total, less than 500 milliseconds, but this is an AND criterion, and 33 percent is already above the 15 percent, so this would fail. When would it not fail? If the first run were 300 milliseconds and the second run were 310 milliseconds, that would be a relative change of about 3 percent, I think, and 3 percent is less than 15 percent, it's more than minus 8 percent, and the total time is also less than 500 milliseconds.
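The two runs just described evaluate against those three ANDed criteria as follows (annotated sketch of the same warning block):

```yaml
warning:
  - criteria:
      - "<=+15%"   # 300 ms -> 400 ms: (400-300)/300 = +33%, violated
      - ">=-8%"    # +33% and +3.3% both satisfy the lower bound
      - "<500"     # 400 ms and 310 ms are both below 500 ms
# 300 ms -> 400 ms: one criterion violated, so the AND fails.
# 300 ms -> 310 ms: about +3.3% change, all three criteria hold.
```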
C: With this warning criteria we can make sure, let's say we have 20 test runs over time and this value, the response time 95th percentile, is continuously rising, that at some point it's maybe 490 milliseconds that we have stored as an average, and the new value that we get is maybe 501 milliseconds.
C: So this would be a criterion you would combine with AND, and in another case you would combine it with an OR; obviously there can be different use cases where one or the other is more important. We have one use case here (I have a typo here): let's say you want to count SQL statements. For instance, you could say that if the relative change of SQL statements between test runs is exactly zero percent, so it's the same amount every time, then this should obviously pass, because no SQL statement has been added and none has been removed.
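Such an objective for the SQL statement count could look like this (sketch; the SLI name is hypothetical, and the thresholds follow the example discussed next):

```yaml
- sli: sql_statement_count   # hypothetical SLI name
  pass:
    - criteria:
        - "=0%"     # pass only if the count is exactly the same as before
  warning:
    - criteria:
        - "<=5%"    # a small relative drift only yields a warning...
        - "<100"    # ...as long as the absolute count stays below 100
```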
C: The second criterion says it needs to be less than 100, and it's 104, so this is also not fulfilled; it is definitely not a pass. Then we can go and look at the warning criteria, and the warning criteria say: oh, it's less than a 5 percent change, so the warning is okay. So it's a warning; we will send a warning. What would have happened if this value had been 98 and that value had been 99? So we had 98 statements recorded for the last couple of runs, and then 99 in the new run.
C: So with that kind of syntax, it allows any combination of AND as well as OR statements that we can use for specifying our lower bounds, upper bounds, and thresholds. The same is obviously true, let's say, for security vulnerabilities. What if we only want one criterion, so it doesn't matter what is happening: we do not want any security vulnerabilities detected, or if one is detected, we don't want to go on. So we only pass if the absolute number is zero, and nothing else.
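That single-criterion gate for security vulnerabilities might be expressed like this (sketch; the SLI name is hypothetical):

```yaml
- sli: security_vulnerabilities   # hypothetical SLI name
  pass:
    - criteria:
        - "=0"   # absolute count: zero detected vulnerabilities, or the gate fails
```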
C: These are essentially some of the cases. You can obviously define more than two. So say you have a third criterion here that you want to connect with an OR statement; let's say, I don't know, if it's above five thousand milliseconds it's okay again, because for whatever reason you want to say: if it's really, really high, then we don't care.
A: For comparison, as before, we have two different possible comparison varieties. One is where you only compare with a single value, and then of course you say compare with a single result; and you can compare with several results, in which case you would put "several results" here, and you can define filter criteria for the previous results that you want to include in the comparison. The default is "all", so regardless of whether the previous evaluation has passed, resulted in a warning, or failed, it is included in the comparison.
A: That is the default value, and other possible values are "pass", where only the passed evaluations are included, or "pass or warn", where passed evaluations or evaluations that resulted in a warning are included in the comparison. And you can define the number of comparison results you want to include, and the aggregate function. I think I explained that like ten minutes ago, but there we go again. Then we can define objectives, and this is what Christian already presented to us, and now it's time to talk about the scores again.
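Put together, the comparison section described here might look like this (sketch using the field names from the Keptn SLO format under discussion):

```yaml
comparison:
  compare_with: "several_results"      # or "single_result"
  include_result_with_score: "pass"    # "all" (default) | "pass" | "pass_or_warn"
  number_of_comparison_results: 3      # 3 is also the default
  aggregate_function: "avg"
```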
A: So by default we will handle it like this: if the evaluation of one SLI is successful, it will result in one point; if the result is a warning, it will result in half a point; and if it fails, it will result in zero points. And you have the possibility to define weights for each SLI. By default, the weight of each SLI is 1, so the maximum number of points that can be achieved in this example is three, because we have three SLIs defined.
A: Now, if I say the security vulnerabilities SLI is that important to me, and I want it to have more weight in the evaluation, I can, for example, just set the weight here to two; then it counts twice as much. So if it passes, it counts for two points, and if there is a warning, it counts for one point. This is of course also reflected in the overall maximum score, and then the actual score that is calculated from the current evaluation is divided by the maximum score, and that yields the total score of that evaluation.
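The weighting and scoring rules just described work out like this (annotated sketch; pass = 1 point, warning = 0.5 points, fail = 0 points, each multiplied by the weight):

```yaml
objectives:
  - sli: response_time_p95
    weight: 1          # the default
  - sli: error_rate
    weight: 1
  - sli: security_vulnerabilities
    weight: 2          # counts twice as much
# Maximum score: 1 + 1 + 2 = 4 points.
# If response_time_p95 passes (1.0), error_rate warns (0.5), and
# security_vulnerabilities passes (2.0), the total score is
# (1.0 + 0.5 + 2.0) / 4 = 0.875, i.e. 87.5%.
```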
A: It's good that you can weight some SLIs more than others, but there is also the concept of key SLIs: if one of the key SLIs fails, then the entire evaluation should fail, regardless of the other results. To accommodate for that... I think this is missing in the example right now, but let me just write it here.
A: Let's add it to the SQL statement example here: if the evaluation result of this SLI fails or yields a warning result, then the result of the entire evaluation is also fail or warning, because the key SLI flag is set to true here. I think these are all the details there are to know about the scoring, the weights, the key SLIs, and the comparison modes that we have. So, are there any questions at this point in time with regard to the service level objectives definition?
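The key SLI flag added to the SQL statement example would look something like this (sketch; the SLI name is hypothetical, and the field name follows the Keptn spec under discussion):

```yaml
- sli: sql_statement_count   # hypothetical SLI name
  pass:
    - criteria:
        - "=0%"
  key_sli: true   # a fail or warning here fails or downgrades the whole
                  # evaluation, regardless of the total score
```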
C: I think one question that we raised earlier, which we can discuss in this round: if a key SLI is defined, should there be only one key SLI, or is it okay if there are multiple key SLIs? And if there are multiple key SLIs, should the logic be such that if, let's say, there are four key SLIs, one of them has a warning and another one has a fail...
E: ...you could manipulate the weight to give it a higher priority. So to me, there are different ways to achieve it, and the key SLI was just a specific use case: if one metric, maybe two, was really, really important and it failed, it was just a mechanism to force the whole thing to fail.
C: We actually discussed the use case of why the weights are not enough. Just to add to that: let's say you have not just four metrics but 100 metrics. If you then increase the weights of certain metrics, all the other metrics all of a sudden become absolutely useless, and therefore the key SLI would be a little bit better for explaining what is happening.

E: That's a very valid point; I agree with you.
E: I think it's very good, I really like it. My question would be, and maybe it's outside the spec: what is the first implementation going to be? CLI? REST? Both? Is the data store part of this, and presumably the UI as well, as a later change?
E: But it sounds like it's going to be sort of an asynchronous kind of mechanism, where I submit my evaluation, I receive a Keptn context, and then I have to use that Keptn context and query for results as a second step?

A: Correct.

E: Okay, so there'll be some sort of... So if I'm in a code pipeline, you know, this scenario of, say, Jenkins: I don't know how you might do that in Jenkins.
A: Most likely not, at least not in the beginning. The reason why it's an asynchronous call is that it might take, in fact, several minutes to gather all of the SLI values for the evaluation. The evaluation part is the quick part, of course, but the gathering of the SLI values takes a considerable amount of time, which is why it makes no sense to make it a synchronous call.
E: Considerable... I mean, I know we don't decide, but it might be... you know, there are advantages to...
E: Because I think that's what we're going to try, I guess, as soon as you guys have it ready. I know that I will immediately try to incorporate the quality gate into at least two, maybe three different pipeline types, you know, like Jenkins, Azure DevOps, and then we'll probably try it in Concourse as well, so we'll definitely want to try it. That way we can have good examples out there for how to do it.
E: Just one thought I had was about the indicator file itself; maybe you have to go back. There's the same concept of a data source? I'm sorry if you already showed this, but will we support multiple data sources, like we did before, within the indicator file itself?
A: This is the perfect segue to Florian's topic, so maybe he can share the bigger picture of what the lighthouse service actually does, how the different, let's say, SLI providers then communicate with the lighthouse service, and how the definition of custom SLIs could work in the future.
I: All right, you should now see a sequence diagram. So, since we reimplemented the whole logic of how we're evaluating results: previously, if you recall, we used the Pitometer service, and one of the main pain points we had there was adding another data source. For example, in addition to Dynatrace and Prometheus, we wanted to support some other data source like Neotys NeoLoad.
I: Now we wanted to make this whole thing more flexible and more extensible, so the way it works now is that the evaluation service, or lighthouse service, will be called, and this lighthouse service will trigger the retrieval of metrics from an external data source by sending a certain type of Keptn event, which we will look at closely in a few seconds. Basically, a new data source, if we want to implement one, will be implemented as an HTTP service.
I: That service receives the required SLI values, so, for example, the error rate, the throughput, and the response time that we saw in the document earlier. Then it will retrieve the metrics, and as a result the data source service should send out another event that contains the values for those SLIs. Now we're going to look at an example of those requests.
I: So in that case, the incoming event that the data source service should be able to process is of the type "keptn internal event get-sli", and this event will basically contain the type of the desired SLI provider, in this case Prometheus. Then it will also always contain the project, the service, and the stage, a start and an end timestamp, to enable the data source to calculate the duration of the tests and the exact timeframe, and an array containing the names of the indicators that should be retrieved.
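An incoming get-sli event of this shape might look roughly like this (sketch; the event format was still under development at the time, and the project, service, stage, and timestamp values are made up):

```yaml
type: sh.keptn.internal.event.get-sli
data:
  sliProvider: prometheus
  project: sockshop          # example values, not from the meeting
  service: carts
  stage: hardening
  start: "2019-10-28T15:44:27Z"
  end: "2019-10-28T15:54:27Z"
  indicators:
    - throughput
    - error_rate
    - response_time_p50
```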
I: So in that example, we have the throughput, the error rate, and the 50th percentile of the response time, and the way the data source retrieves those values is completely up to the data source itself. The only obligation of that service is to transform the result into an event that looks like this. So what are the most important properties here? First of all, the type needs to be "keptn internal event get-sli done". Just on a side note:
I: this is still under development and subject to change, but we will of course provide documentation on how to write those evaluation services, or data source services. The payload of the event will contain the actual values for the retrieved metrics. So in this case we have the name of the metric, e.g. throughput, then the value, then an indicator.
I: That indicator shows whether the retrieval of the metric or SLI value was successful. If it was not successful, we also have the possibility of including a message that describes the reason why it couldn't be retrieved. So in this case, for example, we weren't able to retrieve the response time P50 as an SLI value.
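A get-sli done event with one failed retrieval might then be sketched like this (field names assumed; the format was still subject to change):

```yaml
type: sh.keptn.internal.event.get-sli.done
data:
  indicatorValues:
    - metric: throughput
      value: 1250            # example value
      success: true
    - metric: response_time_p50
      value: 0
      success: false
      message: "no data points returned for the query"   # hypothetical message
```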
I: We can send back an object that looks like that. We also already have an example of one of those data source services available in the keptn-contrib organization. If you go there, you will find the Prometheus SLI service, and in the develop branch you will see an example implementation of such a service, along with a detailed readme, which describes, for example, how you can override queries and how to define the Prometheus endpoint.
I: If you, for example, need access to the credentials of an external service, you could do that with a Kubernetes secret, and if you want to configure certain aspects of the implementation of your data source service, you can, for example, use a config map, as we did here. And of course, if you want to implement your own service, you can go to this example, but you can also contact us directly and ask for help if you want to contribute a data source service.
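Credentials and configuration for such a service could be wired up with standard Kubernetes objects, for example (a sketch; the object names and keys are hypothetical):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-datasource-credentials   # hypothetical name
type: Opaque
stringData:
  API_TOKEN: "<token for the external data source>"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-datasource-config        # hypothetical name
data:
  endpoint: "http://my-datasource.example.com"
```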
I: There are three predefined metrics, the error rate, the throughput, and the response time, that are also described in the Google Drive document about this use case, and those metrics have to be implemented by the data source service. But we will also allow users to define any type of query and to extend those.
I: Exactly. So, for example, if you have two projects, say a shop, you can configure Keptn to retrieve SLI values for this project either from Dynatrace or from Prometheus or from another potential data source. But currently you cannot use multiple data sources within one project. If you have another project, you can of course configure another data source for that project.
C: And architecturally speaking, there is nothing stopping us from extending this functionality at a later point, so that we have, say, a Prometheus data source for metric A and a Dynatrace data source for metric B. But this is something that requires a little bit more thought, and it's not something we support right now.

E: Yeah, okay.
E: Yeah, I would think that we would want that. I guess I was thinking of the scenario of just an individual metric like error rate, where you could only have one error rate. But I would think we'd want the ability to say that some of the metrics come from Dynatrace and some of the metrics come from NeoLoad, as an example, and to be able to combine those in a single evaluation.
A: If not, then we will just end the meeting. If there are any questions, just reach out to us through the usual channels: open a GitHub issue, write us on our Slack channel, write us an email, call us, find us and talk to us, whatever suits your needs. Thanks for joining, and see you in two weeks.