Description
Originally part of the Konveyor project, the Pelorus project is designed to simplify the data collection needed around DORA metrics. Learn more about DORA DevOps metrics and the Pelorus project from Red Hat's own Wesley Hayutin and Michal Pryc.
Learn More: pelorus.readthedocs.io/
A
You have permanent responsibilities for it. Now, though, it's done. No, no, love it. Good afternoon.

Good afternoon, and welcome to episode 62. Wow, am I really at 62 now? Yeah. So this week's episode is on DORA metrics, which are DevOps metrics, and Pelorus, which is a very neat tool that originally came out of the Konveyor project, for gathering and measuring those. But real quick, first, before we begin with that, a couple of minor announcements. The first one is that ArgoCon North America, which is co-located with KubeCon in Chicago this October. October, yeah, something like that.

A
The CFPs for that close on August 6th, so that's coming up in a couple of short weeks here, so get those CFPs in.

A
We always like to watch those videos and figure out what people are really doing with Argo. The other minor announcement is that I am leaving on vacation next week and I will be out for two weeks. Therefore, the next GitOps Guide to the Galaxy stream will be sometime in August, the second week of August I think, instead of the last week of July. So we're skipping the stream in two weeks; I will be on vacation and celebrating my birthday. Instead, you may send me birthday wishes over Twitter, as per usual.

A
You know, I'm also accepting messenger pigeons and, you know, LinkedIn messages, and any other ways of telling me that you're celebrating my birthday, because it's the best holiday.

A
Perfect, okay, so then we'll get started. I will introduce Wes now and Michal, who are two of the brains behind Pelorus, we'll put it that way, and Wes, I'll let you introduce yourself and pass the baton.
C
All right, I get a full brain. Thank you for the compliment. Wes Hayutin; I've been at Red Hat for around 16 years, engineering manager for Pelorus, the community project.

A
That was really fast, you guys, but good for you, good for you. You two obviously don't like to talk about yourselves; you're not going to be streaming hosts anytime soon. We have to be really good at talking about ourselves, right, Johnny? That's...
C
Traditionally, before we became so famous, a pelorus was known as a navigational instrument used on ships to help determine what the ship's bearing is; think of it as a simplified compass without a true north. We hope that Pelorus, the software engineering project, also isn't going to tell you exactly where to go, but will help align your organization to where you want to go and keep you on track while you compare your organization to the bearing that you've set.

C
All right, let me attempt to get to the next slide. Okay, so we've established what the name is, so how do we help software organizations find their bearing? We are collecting metrics from various traditional sources, your SCM, your issue tracker, and OpenShift directly, and bringing those metrics together in a dashboard that hopefully is fairly easy for organizations to read, so they can start having conversations about how they want to steer their organization toward the bearing that they want to set.

C
The dashboarding and the information you get from Pelorus will help your organization, or any organization, realize what kind of business value they are getting from the various changes, processes, and procedures that they are implementing in the organization, or that they already have established. You can compare those against the various teams in your organization, or look at it as an aggregate as well; you can see all your organizations together, to see what kind of business value you're getting out of your changes, and make corrections: see who's doing something well, see who's doing something poorly, compare and contrast, and get on the right bearing.

C
All right, so this is just a small sliver of what Pelorus looks like. It's fairly easy to install; we're in the community operators for OpenShift. Installing it and getting some sample data in, we've done it and we've timed it, takes about five minutes, so hopefully folks in the community also find it easy to get it up and running, get some sample data in there, and get a feel for how things work.
C
Okay, so basically, as we're collecting information on various teams and their software delivery performance, and we'll get to that in a moment, it really boils down to a classic science experiment. You take your control, you measure that; you introduce some change, some experiment, you measure that; compare and contrast; rinse and repeat, over and over again, trying to constantly improve, which everyone in the software industry is always trying to do: do things a little bit better every day.

C
Not everyone is measuring those changes, though, and that's what we're trying to avoid: does it feel like we're getting better? Sure, it feels like we're getting better; people are saying we're getting better. That kind of boils down to gut checks. We're trying to help organizations avoid the gut check and actually have real metrics to understand if their changes are having any real, meaningful impact.
A
So I'm actually going to interrupt you here, because there's some context that I think you and I share, Wes, that other people might not. On Tuesday I participated in a Dynatrace panel on the state of SRE, and they asked me the question: what is the most surprising thing that you've experienced in the last year in your SRE practices? And I had recently come from a position of trying to help advance SRE culture across the broader Red Hat organization.

A
What I found was that teams were doing a lot, but with very little data and very little way of understanding and contextualizing the data that they did have. And it's not like Red Hat is unique in this position, right? I've talked to people all across the industry, and it's basically been the case that people are doing a lot with gut checks and general impressions, and not necessarily having great, hard data to really know how their perception compares against reality.

A
So my favorite metric here is actually change rate failure, or change failure rate, excuse me; that's the dyslexia happening right there. The change failure rate is probably my favorite of the DORA metrics, because that one is data, but it's also kind of a gut check: "I think we're getting better," and okay, here's something you can actually aim that towards, like, oh yeah, we are getting better now. Or, if you're only "getting better" because you're deploying a lot less frequently than you were before...

A
...maybe you're not really getting better, and that's a whole other soul-searching thing you have to do as an organization. But that's why I really like this project, and it's why I asked you guys to come here and talk about this: because holistically, across the industry, I think a lot of us are still, from an SRE perspective, fighting fires.

A
We're still not really looking at preventative stuff as much as we would want yet, and it's hard to make good, as you say, bearing, directional decisions without the data. So I want to re-emphasize: people are doing a lot of really awesome things without data, and this solves that problem.
C
Yeah, I've had that same experience. Some of these gut checks will come out at the end of a release, during a retrospective, and some of the best organizations I've seen do bring some data to that kind of retrospective.

C
But what I have not seen is, across a large organization, being able to break it down application by application or team by team: being able to understand who's doing some things particularly well and who's not, and what we can learn from each other in that way. That doesn't come out of gut checks, because people aren't going to call out other teams that way publicly; that's just not the way it works.

C
But if we're all unbiasedly bringing in data from our processes and then putting it up for the world to see, and, with good intentions, raising those problems up, or demonstrating who's doing well and who's doing poorly with the intention of correcting that bearing, saying "oh, I can learn something from this team over here," then I think this product helps. It's...

B
You've got this gut feeling that it's your team doing this, and it's like, no, I've got data that backs us up: you are introducing a regression quarterly, you know, and it's causing this thing. So it's tangible data that you can use to make a difference and essentially get better.
C
Yeah, that's awesome. Well said. Okay, so overall, what we're measuring is software delivery performance, and we're measuring that with the four key DORA metrics: lead time for change, deployment frequency, mean time to restore, and change failure rate; we'll get into a better definition of what those things are. Also, if this is starting to sound compelling to you or your organization, you can deep dive, and some good books to check out are Accelerate especially, and I like Lean Thinking as well.

C
So if this is starting to sound like something you need to have for your organization, I highly recommend reading these books here, especially those two. Today we're going to go kind of Cliff Notes edition, never fully recommended, but here we go. So what are those four key DORA metrics, and what do they mean? We'll start with lead time for change on the left, and this is kind of a measure of your market agility.
C
You want a low lead time for change, which is the time from when your software engineer committed the code to when it was actually put in a production build. So small changes, small features getting out to customers, bug fixes coming in often: this is what you want to see. You don't want to see the monolithic, huge release that overwhelms your customers and introduces a lot of new bugs all at once; that's not the ideal, according to some of the reading that I previously mentioned. Same thing with deployment frequency. Now I have a lawnmower here.
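As a rough illustration of the arithmetic behind lead time for change (a sketch with invented timestamps, not how Pelorus implements it):

```python
from datetime import datetime, timezone

# Hypothetical events for one change: when the commit was made and when the
# build containing it reached production.
commit_time = datetime(2023, 7, 20, 9, 15, tzinfo=timezone.utc)
deploy_time = datetime(2023, 7, 20, 11, 45, tzinfo=timezone.utc)

# Lead time for change = time from commit to running in production.
lead_time = deploy_time - commit_time
print(f"Lead time for change: {lead_time.total_seconds() / 60:.0f} minutes")  # 150 minutes
```

The lower that number stays across many small changes, the more agile the delivery.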
C
Okay, so moving on to deployment frequency, again an indicator of small batch size. You want to have lots of small deployments getting out to production; the more often you're doing it, the better you are at it, and the easier it is for your customers to consume.

B
And I just want to give an example of this, right. Within our team, you know, we do validated patterns and stuff like that, and there was one time I had this huge commit, right, and it had like 400 commits in it. It was ridiculous. It was massive.

A
When your customers or other software engineers just get big, bucket-sized changes dumped on them... I mean, how many things have I seen go wrong because a dependency dropped just a bucket of changes and something was missed, because the changelog was too big for one person to truly consume and comprehend, right? It happens all the time. So it's how you're a good friend to your friends and co-workers, and it's how you're a good vendor to your customers. It's...
A
...probably a little bit better for that, I think. You know, one of the fun things is that, although we have a lot of Red Hatters who watch this stream and participate in the chat, we do actually have quite a few customers who watch the stream and occasionally make it to participate in the chat too. So we actually get to live that "our customers are our friends" here on GitOps Guide.

C
Hi friends, hi friends. So the small size, I think, also comes into this: small batch size and frequent deployments come into that. On the far right is change failure rate, your favorite metric, Hillary. In case you have to roll back: if you have to roll back a huge monolithic change, oh, the pain. You might...
C
...be rolling back a feature that some other customer needs. Not okay. So what is change failure rate? It's the percentage of deployments, overall or in a certain time period, that you had to roll back to fix, and so obviously you want to keep that as small as possible, and in the worst-case scenario, if you do have to do it, the small batch size really helps.
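As a tiny illustration of that percentage (invented counts, not Pelorus output):

```python
# Hypothetical counts for one application over a chosen time window.
total_deployments = 20
failed_deployments = 2   # deployments that had to be rolled back or hotfixed

change_failure_rate = failed_deployments / total_deployments
print(f"Change failure rate: {change_failure_rate:.0%}")  # 10%
```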
C
Mean time to restore is a little bit easier to understand. It's simply how long it took once you did roll back: you opened up a bug, rolled back, and then how long did it take to get a fix in, push it out to production, verify it, close the bug, and...

A
...know that the incident is fixed, whether by a push to production or by a manual intervention. Because we're also using mean time to restore to track how we're doing against our error budgets and our SLOs, which is not really part of the intent of the metric, I don't think.

A
But it's one of the ways we started applying the metric, because it helps us understand cost, not just the dollars spent on cloud compute, but also the dollars spent on people's time and effort.

C
Yeah, I could see the same thing being opened up if there was a network outage or something like that and your service was down; if you wanted to track that, absolutely, I think that's totally appropriate. All right, so we know what the four metrics are, vaguely, Cliff Notes style.
C
What do software organizations do that impacts these four things? This is where I kind of start going crazy, so calm me down a little bit if you need to. In every software organization I've ever worked in, all of these things on this slide that I have under each metric are done a little bit differently, and they're also done a little bit differently each week and each month.

C
Let's use engineer engagement as an example. Are the engineers on your team highly motivated, coming running to their keyboards in the morning, drinking coffee, banging out features and bug fixes? Or is it coming up on a holiday, is it almost Christmas, and everyone's burnt out and just needs to get away? That happens on every team.

C
It's kind of like when you take your car to the dealership: you never want to get your car worked on on a Friday, because the folks working there are not thinking about your car, they're thinking about the weekend, and you don't want to get it done then. This engineer engagement is variable even across the best of teams, across time, and obviously very different across personnel. So it would be really neat if we could see that through the lead time for change. Same thing with code reviews.

C
Is it easy for them to find bugs themselves, with automated unit tests and integration tests? Is that running through CI? And is your CI fast feedback, or do you have to put in your patch and then come back the next day and read the results, when you're out of context and you need to get back into flow state?
A
So I'm not sure how much of a real question this is, because I do not know, but the question is: is code review relevant at all? And feel free to answer it too, because I don't know if you actually expected me to address it; feel free to clarify the question. Is code review relevant at all? I don't even know how to take that question; it's an interesting thought experiment. I will say something I've seen done well.

A
Some of the teams at Red Hat have somebody who's sort of almost on call to do the code reviews for a period of time, right, and then that changes; whoever's supposed to do the code review will alternate through a rotation, to help reduce code review burnout, and the people who are doing the code...

A
...reviews are not necessarily expected to do feature work during that period of time, so that they're not context shifting too much and compromising the integrity of the review.

C
I think code reviews are an opportunity for more experienced, senior engineers to help grow newer engineers with their patches. As long as everyone is doing it with good intentions, being nice and trying to help folks out, code reviews are an amazing learning opportunity for everyone. It's also something where I've recently thought that maybe a first pass with AI would be a good thing: hook into your PRs to automatically have whatever AI agent...
D
I also think that code reviews are important here because they may affect the change failure rate at some point: the fewer reviews there are, the fewer eyes on the code, and the higher the change failure rate may be, right? Yeah, yeah.

C
There's a big variance in code reviews. I've seen it done really well, where you're not only looking at the code, you're pulling the patch, trying it out, and really giving that feedback. It's going to be something where everyone's experience is a little bit different.

C
Well, to this point, an example would be: let's say there's a team that has a high lead time for change but a very low change failure rate. Maybe that difference is all in the quality of the code reviews; I don't know, but you wouldn't know unless you measured it. Moving on to deployment frequency: there's some really cool new stuff out in the industry, GitOps, Argo, MLOps, to help you get your code from GitHub to production.
C
All automated, all very cool stuff that will help your deployment frequency go up. If you're not cutting edge, if you don't have those things, there's something as basic as checklists: when I used to help deploy, let's see, it was Satellite hosted, back in the day, all we had was checklists. This is a long, long time ago. If I get wheeled into the emergency room to have my leg cut off, and I look down at my leg and the correct leg has the mark on it...

C
...for the one to be cut off, I would be very happy and more confident. I'm kind of paranoid about that. But checklists: surgeons use them, pilots use them. If that's all you've got, use it. Security scans, improved change management, all those kinds of things help your deployment frequency go up, and they vary greatly from team to team. I'm going to keep re-emphasizing that, because we want to learn from each other, experiment within teams, and take the best practices. That's your bearing; that's...

C
...what Pelorus will help you find. Flipping over to change failure rate, again on the far right: this is the percentage of bugs found in production. Are you doing deep planning and design, looking for edge cases while you're in design? Are your test environments like your production environments? Are you having retrospectives? Are you implementing "do not repeat yourself" when it comes to errors? All of those things vary from team to team and also help reduce your change failure rate. Mean time to restore: same kind of thing.

C
What's your rollback strategy? Do you have a backup of your production instance? Can you just take the last known good, or N minus one, and send it out instantly? Do you have failover across regions? Maybe you can do it that way. But getting from outage back to running as quickly as possible varies from product to product and is key for customer success.
A
So we have these four key ones that they've come up with, and they kind of left the door open: DORA metrics could be these four plus additional metrics that are really important to your team and your organization. As we look at the things within these things, I'm sure there's room for other metrics to be invented in a way that helps your team and your organization find its bearing. Everywhere I've worked, every team I have worked on...

A
...we have worked differently, but in a way that worked for the team, so I think that's really important, kind of with that whole conversation about no true north, just a bearing. I think that's important context, because a lot of the folks who watch this stream have been in the industry a long time and have that same shared experience, but some folks are newer to the industry and won't have that perspective.

C
As we've discussed, probably every team that uses Pelorus will have unique needs, and I think Michal will touch on that in some of the upcoming slides, I hope.
D
Okay, so far, Wes, thank you for your slides and for explaining what the DORA metrics are. In this part of the presentation we focus on how Pelorus measures those metrics. To start with, we first need to better understand the definition of what an exporter is, and then we will go to the architecture of Pelorus and each of the exporter definitions: what it captures and how it then translates into those DORA metrics.

D
So Pelorus is an operator, currently a community operator, as was explained on the first slide, available in the OpenShift Marketplace. We have opened a ticket to make it also work on native Kubernetes. We...
D
It installs a couple of components. On the right side of the slide you can see that it installs Prometheus and Grafana, and this is a simplified example, because later we'll go to a slightly more interesting example architecture with exporters. An exporter is a Prometheus concept; it's not something that Pelorus invented. It's a Prometheus concept for gathering data from different...

D
...sources. Every once in a while Prometheus just looks for the data there, and then it stores it in its own database for visualization, with some rules on top of that. Pelorus also installs, or sets, some of the rules inside Prometheus and Grafana, and this is the secret sauce, because those rules are then what becomes visible to us in the UI.
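For anyone who hasn't seen one, here is a minimal sketch of the exporter concept using the prometheus_client Python library; the metric name and values are invented for illustration and are not the actual Pelorus metrics.

```python
import time
from prometheus_client import Gauge, start_http_server

# An exporter is just an HTTP endpoint exposing metrics in the Prometheus text
# format; Prometheus scrapes ("pulls") it on its own schedule.
demo_metric = Gauge(
    "demo_last_event_timestamp",   # hypothetical metric name
    "Unix timestamp of the last observed event (illustrative only)",
    ["app"],
)

if __name__ == "__main__":
    start_http_server(8080)        # serve /metrics on port 8080
    while True:
        # A real exporter would fetch this from an external API (Git host,
        # issue tracker, cluster API); here we just record "now".
        demo_metric.labels(app="sample-app").set(time.time())
        time.sleep(30)
```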
D
We integrate with Prometheus directly there. Pelorus currently has three main types of exporters, commit time, deploy time, and failure, and as we go through these slides we will understand what the function of each exporter is. There was also a new exporter introduced, the webhook exporter, and this one does it all for us: basically, the webhook exporter can serve as any type of exporter, plus some extra types as well.

D
Some teams, some organizations, have special requests, so the webhook exporter also allows us to address those special needs. It's a pluggable architecture, just Python code, that allows us to gather more information than just those three metrics currently.

D
Okay, so this is a very simplistic diagram to show what I have already said. We can see those three exporters, commit time, failure time, and deploy time, pulling information from the given data sources. So this is a pull method: we connect to different APIs, to different services, and pull this data, and then our Prometheus instance, which is also deployed as part of the Pelorus operator, pulls the data from those exporters. On the right side...

D
...there is the webhook exporter. The webhook exporter acts as a kind of proxy to Prometheus: we push the data to it from any source, so it can integrate easily with third-party CI systems and failure tracking systems. We have a well-established structure for the data that needs to be sent to that webhook, and it will then expose this data to the Prometheus instance.
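Purely to illustrate the push model, here is a small sketch using the requests library; the URL, path, and payload fields are assumptions for the example, not the real Pelorus webhook schema, which is documented with the project.

```python
import time
import requests

# Hypothetical webhook exporter endpoint reachable from a CI job.
WEBHOOK_URL = "http://pelorus-webhook-exporter.example.svc:8080/pelorus/webhook"

# Illustrative payload for a "deployment happened" event; field names are made up.
event = {
    "app": "sample-app",
    "event_type": "deploy_time",
    "image_sha": "sha256:abc123",
    "namespace": "sample-app-prod",
    "timestamp": int(time.time()),
}

# A pipeline step could push this after a successful rollout; the webhook
# exporter then exposes it for Prometheus to scrape.
resp = requests.post(WEBHOOK_URL, json=event, timeout=10)
resp.raise_for_status()
```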
D
Okay, so now a bit about the exporters. The commit time exporter is the one that...

D
...connects the commit hash together with the image SHA. So we are building a container, with some image, with some SHA available to us, that was created from a particular commit in the Git repository, or...

D
...if we are using the webhook, then we can use any repository here. This information is then combined into one metric, which consists of information such as: what application we are deploying, what the commit hash is, what image SHA was used to deploy that application, the namespace where the application was deployed, and a timestamp of when this event happened.
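To make that concrete, here is a hedged sketch of what such a metric could look like when exposed through prometheus_client; the metric and label names are illustrative, not the exact series Pelorus exports.

```python
from prometheus_client import Gauge

# One time series per (app, commit, image, namespace); the sample value is the
# commit timestamp, so a deploy timestamp can later be joined against it.
commit_time_metric = Gauge(
    "demo_commit_timestamp",   # hypothetical name
    "Unix timestamp of the commit that produced an image (illustrative)",
    ["app", "commit_hash", "image_sha", "namespace"],
)

commit_time_metric.labels(
    app="sample-app",
    commit_hash="1a2b3c4",
    image_sha="sha256:abc123",
    namespace="sample-app-prod",
).set(1_689_840_000)  # e.g. the commit's author timestamp
```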
D
So
having
this
information,
we
can
either
use
on
the
this
webhook
exporter
or
we
can
automatically
query
some
of
the
apis
and
power
supports
the
from
the
git
endpoints
GitHub,
GTA
and
gitlab
Azure
devops.
Also
it
supports
the
image
stream
in
openshift,
so
openshift
have
concepts
of
the
internal
registry
and
the
objects.
Their
image
streams
can
be
annotated,
labeled,
and
then
we
can
take
advantage
of
this.
By
taking
this
information
and
the
commit
information
from.
B
D
D
D
This
goes
outside
of
openshift
objects
and
apis,
and
it
allows
developers
to
embed
this
information
directly
to
the
container
image.
Then
we
have
a
failure.
Exporter
and
the
failure.
Exporter
is
capturing
the
timestamp
at
which
the
failure
in
a
production
occurred,
and
it
was
resolved.
Why?
In
production,
because,
for
example,
we
can
measure,
we
can
store
our
failures
in
a
jira,
but
not
all
of
the
failures.
There
are
production,
one
from
the
dura
perspective.
We
are
really
interested
in
the
ones
that
are
affecting
our
deployments
on
a
production.
D
So
somehow
we
need
to
filter
out
from
all
the
jira
tickets
from
all
the
jira
cards,
the
ones
that
are
actually
causing
the
failure
on
the
production
and
every
deployment.
Every
group
every
organization
may
use
a
different
layout
of
jira
cars
of
jira
labels.
That's
why
we
have
our
own
defaults
plus.
We
have
included
a
custom
query,
so
really
anyone
can
adjust
this
failure,
exporter
to
its
own
jira,
workflow
and
based
on
a
well-known
giraffe
queries.
Also,
this
failure
exporter
allows
us
to
acquire
apis
from
the
GitHub
issues:
servicenow
pager,
Duty,
Azure,
devops,.
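As a hedged illustration of that kind of filtering, the sketch below uses the jira Python package to pull only the issues a team has chosen to mark as production failures; the server URL, project, and label are assumptions standing in for whatever convention your own workflow encodes through the exporter's custom query.

```python
from jira import JIRA

# Hypothetical Jira instance; authentication details are omitted/illustrative.
jira = JIRA(server="https://jira.example.com", token_auth="REDACTED")

# A JQL query narrowing all tickets down to production failures for one app;
# the project key and label are examples of a team-specific convention.
jql = 'project = SAMPLE AND type = Bug AND labels = "production-failure" ORDER BY created ASC'

for issue in jira.search_issues(jql, maxResults=50):
    opened = issue.fields.created            # when the failure was raised
    resolved = issue.fields.resolutiondate   # when it was resolved (may be None)
    print(issue.key, opened, resolved)
```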
D
Because the webhook, as we said, is a push data model that serves it all. Then there is the deploy time exporter, which captures the timestamp at which a deployment happened in a production environment. An application can also be deployed to a staging environment, or to some testing environment, some other namespace, but we are really interested in the ones that are in production.
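A rough sketch of that idea of looking only at production workloads, using the official kubernetes Python client; the namespace and label selector are assumptions for the example, and this is not the actual Pelorus deploy time exporter code.

```python
from kubernetes import client, config

# Use the pod's service account when running inside the cluster.
config.load_incluster_config()
apps = client.AppsV1Api()

# Only look at the namespace the team treats as production, and only at
# workloads carrying the label used to mark monitored applications.
deployments = apps.list_namespaced_deployment(
    namespace="sample-app-prod",    # hypothetical production namespace
    label_selector="app.kubernetes.io/name=sample-app",
)

for d in deployments.items:
    # The creation timestamp of the Deployment (or of its ReplicaSets/pods)
    # is the kind of event a deploy time measurement cares about.
    print(d.metadata.name, d.metadata.creation_timestamp)
```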
D
So it's really important to understand how we are deploying the application and how we are building the application, to make sure that Pelorus gathers the events that are really interesting, the ones that are affecting production and not staging. There are many, many different configuration options, and the first piece of homework anyone should do when trying to adopt Pelorus, for it to be useful, is actually to look at how the end-to-end application lifecycle, the application pipeline, is done, and then add those exporters to fit those application pipelines. Okay.

D
Thanos is not on this diagram, so as not to pollute it, but this diagram is here to show that we may have multiple clusters, with one Pelorus instance per cluster, and then, for that particular cluster, we may have the exporters that serve the needs of that particular cluster, or of a particular application living in that cluster. On the left side we can also connect this Pelorus instance, and there have been some attempts at that, to any external Prometheus exporter, because this is, as I said, Prometheus...

D
Okay, okay, thank you. Yeah, I don't see your faces or anything that happens in the chat, so let me know if there's anything there. Okay, and on the left side we can also see an external Prometheus exporter, because maybe there is another instance running somewhere else, off the cluster, so we also allow that, to customize the deployment of Pelorus. We also connect everything through S3 storage, to ensure that if the cluster goes away, or there is a redeployment on another cluster, this data is not lost. Why?

D
Because the data for the DORA metrics is the most important thing for us. Really, the purpose of Pelorus is to store this data for a longer period of time. It's pretty hard to actually show that in this demo, because here we store the data for only one to three days, but the real value of this tool is to monitor the deployments and monitor the commits across...

D
...a bigger set of instances, of clusters. This slide shows the Pelorus dashboard we use. So, let's go back: we use Grafana here, in the top left corner. We also use the community operator to deploy Grafana, and it points either to Prometheus directly or, if we are using the S3 storage, to a Thanos query endpoint, to aggregate this data from multiple clusters. Grafana then lets us represent this data in those four nice, simple views, and this is the view that Pelorus provides.
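Anything Grafana shows here can also be pulled straight from the Prometheus (or Thanos) HTTP API. A hedged sketch, with an invented metric name and URL, since the real recording rule names live in the Pelorus source:

```python
import requests

# Hypothetical in-cluster Prometheus/Thanos query endpoint.
PROM_URL = "http://prometheus.example.svc:9090"

# /api/v1/query evaluates a PromQL expression; the metric name below is a
# placeholder, not an actual Pelorus recording rule.
resp = requests.get(
    f"{PROM_URL}/api/v1/query",
    params={"query": "avg_over_time(demo_lead_time_minutes[6h])"},
    timeout=10,
)
resp.raise_for_status()

for sample in resp.json()["data"]["result"]:
    print(sample["metric"], sample["value"])
```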
D
How Pelorus collects this data is one thing: we know that it connects to different APIs, it stores information from the OpenShift objects themselves, and we can push data to those exporters. But I think we also need to understand how this lives in the application pipeline. So, in a good DevOps environment, and I have split the phases into some simple blocks here, we have continuous integration: this is everything that happens before pushing some code to the delivery phase.

D
So there are some builds happening, some tests, some small CI, and the best case is if it's all automated, of course. Writing the code itself cannot yet be automated, though we don't know, in a couple of years from now maybe it will be easier to write code. But here we just capture the point where the code was reviewed, or not.

D
The source from which the application is being built: we take this artifact, call it an artifact, and then we move it to the production phase, and from that point we know that the code was committed at some point in time, and this is the first point that Pelorus is interested in: the commit time. The commit may have happened even a year prior to the build time; however, Pelorus, as we know, has a direction.

D
It's not a tool for going back in history, so we are really invested in the events that happen from the time that Pelorus was deployed; even if we do gather data from the past, sometimes we just ignore it. So the next part is the production phase, where we build the source code, we test it again and again and again, and I hope that the testing phase is good enough not to introduce problems, and then we create some artifact. It can be a container image; it can be just a tarball.

D
It can be just the source code stored in the Git repository, because, for example, OpenShift allows you to build the application from source to image, and then the artifact is just the code itself. Then this artifact is sent to a staging environment, a testing environment, further environments, we don't know, but at the end it's good to have it in production. So this is the point where Pelorus is looking at the second event, looking at when the application actually landed in the deployment, and going back, we could see in the slides...

D
...when the commit time was, when that first event occurred, and based on that we can calculate some of the DORA metrics. Then our application is working, and of course bugs happen, because why not, and the application fails, because bugs can cause production application failures, and we are invested in those events: the application failed, so we are checking when the application failed.

D
The entire application pipeline can then start from the beginning, because there needs to be a fix, there needs to be an artifact, there needs to be a delivery phase, and a new production-ready application, and then, when the application is fixed, we see the third data point, the event that the application is working again, and this is what our failure exporter does. I hope this is understood, and now let's focus a bit on the four metrics from the...
A
Just a quick time check, because we've got about 10 minutes to go, and I know that you did have a demo that I think you wanted to do, so I don't know if you... okay.

D
So I will be fast; the demo will occupy three minutes, and this will take around five more, maybe three. Let's see. So, tell me why...

D
The change failure rate, which was also explained at the beginning, is also a very simple calculation: it's the number of failed changes divided by the total number of changes to the system. Under the hood, the secret sauce, the Prometheus rules, are a bit more complex than just this, but we don't need to go there: just the number of failed changes over the total number of changes to the system, and a "change to the system" can be many things here. But if someone is interested, the rules are in the source code.
D
Then there is the mean time to restore, which comes from the failure exporter: how long does it take to restore service in production when a service incident has occurred? And this is the average over a period of time, so we can say, again, that within a one-month or two-month time frame our average was 20 minutes to restore from a production failure. This can also be applied to one particular application, or across a group, or across all the applications that we are monitoring. Okay, so, I'm sorry, I'll move back.
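A small sketch of that averaging, with made-up incident times, just to pin down the arithmetic:

```python
from datetime import datetime, timezone

# Hypothetical production incidents: (failure opened, failure resolved).
incidents = [
    (datetime(2023, 7, 1, 10, 0, tzinfo=timezone.utc),
     datetime(2023, 7, 1, 10, 25, tzinfo=timezone.utc)),
    (datetime(2023, 7, 9, 14, 30, tzinfo=timezone.utc),
     datetime(2023, 7, 9, 14, 45, tzinfo=timezone.utc)),
]

# Mean time to restore = average of (resolved - opened) over the window.
restore_minutes = [(end - start).total_seconds() / 60 for start, end in incidents]
mttr = sum(restore_minutes) / len(restore_minutes)
print(f"MTTR over this window: {mttr:.1f} minutes")  # 20.0 minutes
```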
D
We have in our source code, which is here, a demo, where anyone can go and quickly run it against an OpenShift environment, and this demo will deploy a Tekton pipeline and build a simple application using source-to-image or a binary build, and then...

D
...also label everything for Pelorus to monitor properly, and this demo will give us a pretty nice feeling for how Pelorus works. For today I created a slightly different environment, where I was using the webhook exporter to send the data, and I created three applications: a GitOps Guide to the Galaxy application, then a second application, the Galaxy application, and a third application, which is actually called here the second sample application. And here, in the Grafana dashboard, we can select a different time slot.

D
The average lead time for change within those six hours was 6.7 minutes; I hope that everyone understands now what the lead time for change is. Then the deployment frequency: we got 23 deployments over this period of time, and we got a couple of failures in production. You can see here there's also a very short-lived failure. So we have one, two, three, four, five failures, and we can see that over the 23 deployments there were five failures, and over this time it took 8.6 minutes on average to fix them. Yeah.
C
And one of the things I really like about this view for organizations, coming back to your bearing: obviously we haven't been running this particular instance for a year or two, but the real value of having this running for that long is: where were we, where are we now, where were we three months ago, six months ago?

C
And let's compare those, or even a year ago: time enough for these organizational changes to work their way in. Kind of like raising interest rates, it takes a while for the effect to be read. So that's where I think, on a day-to-day basis, this isn't going to be the most useful information, because change doesn't happen that quickly, but really being able to see, like a diary or a log from a year ago, where you were in your organization, that proves to be extremely valuable, I think. Exactly.
A
I think it's also really important, when you look at these metrics: in the actual DORA DevOps research there are these concepts of DORA "high" and DORA "elite" performers and so forth, and I think, rightfully so, those get a lot of pushback. Because if you slow down your deployment frequency slightly, maybe you drop out of DORA elite into DORA high, but your change failure rate also drastically dropped off and is now almost zero percent, right?
D
Okay, so to finish this presentation, I'd like to point to our documentation. It's under dora-metrics.io, and we put a lot of effort into making it readable and pretty clean. You have a quick start, an installation tutorial, and a long configuration list, because, as I said, it's important to understand the process and adapt those exporters to your needs, and we made quite a nice path for contributors to take easy first steps in the Pelorus project. Basically, everything is done through the Makefile.
D
Also, actually, Wes added a pretty nice thing, which is called "make help", which is unique here, so it shows you everything that make will do for you. Yeah.
A
Yeah, okay, okay, that's great, that's fantastic. Okay, well, that timing was perfect; we are exactly at the top of the hour. Thank you so much for coming here and talking to everybody about these concepts and showing us the demo. I really appreciate it. I don't have any closing thoughts, and I don't see anything in the chat, so if you snooze, you lose; just tweet me questions later and I'll relay them through the Slack ether to get the answers, should you so desire. And I think... is there a...?

C
Yeah, GitHub is the best way to do it. There are the GitHub discussions and issues; discussions, probably, is the key one, yeah.

A
Yeah, perfect. So yeah, you can find these folks on the discussions on the GitHub project, which I linked... somewhere here it is, I'll show it. There you go: pause your screen, copy the comment from the YouTube interface, however you do it. We will let these guys go. We are going to... I will hit end stream.