From YouTube: Incident Management
Description
Kevin, Group Product Manager for the Monitor, Configure, and Release stages, presents on increasing operational efficiency and end-to-end insight and visibility with Incident Management.
Key Topics:
1. What is Incident Management?
2. Learn about GitLab Incident Management: what it can do and what is coming up next
3. Feel equipped to ask customers about what they are using today and tee up future conversations on Incident Management
A
I'm the group product manager for the Monitor, Release, and Configure stages, and today I am here to talk about incident management, which is a relatively new set of features we have in GitLab. We recently released on-call schedule management, so we thought it would be great for you all to learn a little bit about it.
A
And since there are so few of us here, feel free to interrupt me at any time with questions. If you want to dive into a particular area, I don't mind pausing and answering your questions as soon as you have them.
A
So what is incident management? Or rather, what's the problem we're trying to solve? At the end of the day, no one wants downtime, but it happens all the time, and downtime is expensive for companies. Companies that experience downtime obviously want to recover from it as quickly as possible.
A
So incident management is all about that: it helps coordinate the efforts to reduce downtime. As an aside, contrast that with monitoring: monitoring is about knowing when something is wrong, or helping diagnose what led to the problem. On the other side, responding to incidents is also hard and stressful. For SREs or infrastructure engineers who are constantly in the loop of responding to downtime, this leads to burnout.
A
We hear a lot from the folks we interview that a constant problem is alert fatigue: having to constantly respond to downtime. You see a lot of people in that role constantly churning when organizations get into that state, and it's something that's hard to recover from. Because of this, incident management has become a must-have tool for many organizations.
A
Looking at the competition, what we've learned is that a lot of companies still have homegrown solutions for incident management, but there are several well-known brands in this space. Large organizations typically buy one of these services, and small organizations typically have some homegrown tool, using Slack as a central way to communicate. One trend that we do see in this space is consolidation of incident management tools into other workflow tools or monitoring tools. The one exception is the market leader, PagerDuty, which still stands as a standalone tool.
A
All right, this is the slide I was showing; I had a nice graphic of a cloud losing money earlier that you didn't miss. Anyway, cool. So what does incident management actually do? There are a few jobs that incident management tools are hired for. First, it's a centralized place to collect alerts.
A
Alerts are typically raised in monitoring tools, and organizations can often have more than one monitoring tool in use at a time, so it's nice not to have to go to different places to see the alerts raised from different parts of your tech stack. Once an alert comes into an incident management tool, the tool typically tries to hydrate the alert with more information, so that incident responders can understand more about what it is, and if it's important enough, they will raise an incident.
A
Another thing the incident management tool does is gather all the people who are going to respond to the incident; this is when paging happens. Typically, the setup is that you have a schedule, or a plan for who's going to be on call, and once an alert comes in, you send off a page so that people will come take a look, even if it's outside normal working hours. Once the team is gathered, the incident management tool is responsible for automating some of the tasks to facilitate coordination among team members.
A
And, lastly, incident management tools are typically responsible for facilitating post-incident review. This is an important step, because by gathering what has happened in the past and what the team has learned from it, they can make gradual improvements and iterate over time. All right, so I'm going to move on to show you what we've built so far.
A
Okay, so I'll just give the schedule a name. I can add a description (it doesn't matter) and the time zone, and add the schedule. Once you add a schedule, you'll see that a schedule by itself is not enough. Another concept within our incident management tool is the idea of a rotation.
A
I set it to start a few days ago, at the beginning of the month, and I'll set it so that, in this example, each person is on rotation for seven days. There are other things you can set, such as an end date for this particular rotation, or restricting it to a specific time interval. Oftentimes incident response teams will have specific hours that they work. At GitLab, for example, we have people located around different parts of the world, depending on where they live.
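The rotation logic described above can be sketched as a tiny model: given an ordered list of participants, a start date, and a fixed shift length (seven days in the demo), work out who is on call on a given day. This is an illustrative sketch, not GitLab's actual implementation; the function and parameter names are made up for this example.

```python
from datetime import date

def on_call(participants, rotation_start, today, shift_days=7):
    """Return who is on call today in a fixed-length rotation.

    participants: ordered list of names; rotation_start: the date the
    first shift began; shift_days: length of each shift.
    """
    elapsed = (today - rotation_start).days
    if elapsed < 0:
        raise ValueError("rotation has not started yet")
    # Shifts cycle through the participant list in order.
    return participants[(elapsed // shift_days) % len(participants)]
```

For example, with a rotation started on May 1st, `on_call(["alice", "bob", "carol"], date(2021, 5, 1), date(2021, 5, 9))` falls in the second seven-day shift, so "bob" is on call.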
A
What we'll be adding in the next few milestones is the concept of an escalation policy. Whenever an alert comes in, the escalation policy is responsible for figuring out how to contact the people that need to respond to the alert. So you could set it up so that the person receives a page, receives a ping in GitLab, receives an email, or receives a phone call. That will be coming up in the next few milestones.
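An escalation policy of the kind described can be modelled as an ordered list of steps, each with a delay and a set of notification targets; as time passes without acknowledgement, more steps fire. A minimal sketch, assuming a simple step/delay shape (the data layout and names are hypothetical, not GitLab's):

```python
def escalation_targets(policy, minutes_since_alert):
    """Return every target that should have been contacted by now,
    given an ordered list of escalation steps."""
    notified = []
    for step in policy:
        if minutes_since_alert >= step["delay_minutes"]:
            notified.extend(step["notify"])
    return notified

# Example policy: email immediately, page after 5 minutes,
# phone the manager after 15 minutes with no acknowledgement.
policy = [
    {"delay_minutes": 0,  "notify": ["email:oncall"]},
    {"delay_minutes": 5,  "notify": ["page:oncall"]},
    {"delay_minutes": 15, "notify": ["phone:manager"]},
]
```

At minute 0 only the email target fires; by minute 15 all three channels have been tried.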
A
So once this is done, the next thing I'm going to do is trigger an alert. I'm just going to use our alert integration; there's a capability to send a test alert to set things up. So I just sent this alert, and what you'll see at some point here is that it notifies the responder that something has happened. Clicking through the email brings you directly to the alert details page, but before we show that, here is the alerts list page.
A
And you'll see the alert that was just triggered a minute ago. This is a place where an SRE team or infrastructure team can monitor throughout the working day, when they're not on call, to understand what is happening across their system. It's a single location where all the alerts are hopefully aggregated, and you can do certain filtering and sorting to facilitate the workflow for the SRE team. Clicking into the alert...
A
Actually, one other thing I want to show you real quick: it's important for alerts to be actionable. I just sent multiple alerts of the same type, which is very typical in a real situation, because when something goes wrong, the monitoring engine will keep sending additional alerts as long as the threshold is met. If you have to respond to the same alert over and over, that's not particularly useful.
A
What the team has built is the ability to aggregate alerts of the same type together, so you'll notice that there have actually been four events of the same type. Clicking into this alert, on the main details page you'll see all the information that was supplied by the originating source.
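The aggregation behaviour described above (four identical firings collapsing into one row with an event count) can be sketched by grouping raw alerts on a fingerprint key. This is an illustrative model of the idea, not GitLab's implementation; the `fingerprint` field name is an assumption.

```python
def aggregate_alerts(alerts):
    """Group raw alerts by fingerprint, keeping one entry per group
    with an event count, so repeated firings of the same alert show
    up as a single row."""
    grouped = {}
    for alert in alerts:
        key = alert["fingerprint"]
        if key not in grouped:
            # Copy the first occurrence and attach a counter.
            grouped[key] = dict(alert, events=0)
        grouped[key]["events"] += 1
    return list(grouped.values())
```

Four identical "CPU high" firings plus one "Disk full" firing would aggregate into two rows, with event counts 4 and 1.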
A
If you have a monitoring tool that's more deeply integrated (currently that's only Prometheus), the metrics will also show up here, and any actions you have taken will show up in the activity feed.
A
So, looking at this incident details page, what this is is actually a different issue type. If we go to Issues, something that's relatively new is that you can also create an incident manually, should your workflow dictate that that's how things are done. But in our example, we can create an incident directly.
A
Directly from our alerts, that is. The new incident issue type links to all the alert details, and you'll notice there's something specific about this issue type that's different.
A
Under this tab, under incident settings, you can activate a specific timer: a countdown timer to notify the team how much time has elapsed. Typically, teams aim to respond to an incident within a certain amount of time.
A
From within the incident page you can, for example, publish this information to a status page, or collect all the information as you work through the incident to resolve the issue. The comments automatically become a timeline that the team can review once the incident is resolved.
C
The first one is: I want to understand if the alerts are rolled up to the group level.
A
Eventually. Well, currently it's only at the project level. Okay.
C
And so that kind of leads to my next question, which is: what size customer is GitLab's incident management capability well suited for?
A
Cool, let me walk through what we have today and what's coming up next real quick. What we have today is integration; by integration, specifically, we have an HTTP endpoint that's able to receive webhook alerts from various monitoring tools, and you can map whatever the fields are for a specific alert to the way that GitLab displays information within our system. As you saw, responders can triage their alerts in a single location, and they can also triage incidents in a single location.
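The field mapping just described (arbitrary webhook payload in, GitLab's alert fields out) can be sketched as a simple per-integration dictionary translation. The mapping function below is illustrative, not GitLab's implementation; the target field names (`title`, `severity`, `description`) follow the generic alert shape, but treat the exact set as an assumption.

```python
def to_gitlab_alert(raw, field_map):
    """Map a monitoring tool's webhook payload onto the fields the
    alerts endpoint understands, using a per-integration field map
    of source key -> target key. Unmapped source keys are dropped."""
    return {target_key: raw[source_key]
            for source_key, target_key in field_map.items()
            if source_key in raw}

# The resulting dict would then be POSTed as JSON to the project's
# alert integration endpoint with its authorization token
# (endpoint path omitted here; see the integration settings).
```

For example, a payload with `alarm_name` and `level` keys can be mapped to `title` and `severity` with `{"alarm_name": "title", "level": "severity"}`.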
A
What we just released was on-call schedule management, which I showed you at the very beginning of the demo, and we're continuing to make more improvements to it; there's some more polishing needed for this specific feature. What we're working on next is paging and escalation.
A
What we built today, we believe, is a fit for customers who have homegrown incident management solutions, specifically because right now we haven't had a lot of feedback on the entire workflow.
A
The problem that we've heard over and over again is that they feel it's much too expensive, especially when they have to buy a seat for everyone that may, at some point, interact with an incident, which is not all the time. So they want to consolidate their toolset into something that they already have. So it's for customers that want to cut costs on incident management.
C
Sorry, thank you, thanks for answering that question. I was wondering if we have plans to dogfood incident management at GitLab.
A
We do plan to. Incident management has a pretty high bar before it can be put into production, and we at GitLab similarly have a high bar, because we want to be able to rely on the system 100% of the time.
A
The plan to dogfood is actually to start by running what we've been calling game days, to simulate incidents that are similar to what we see happening in reality, and gradually, over time, as we build up more confidence in the product, we're going to start using it ourselves. We're not quite there yet today.
A
Yeah, no problem, appreciate that. So, Mark, that's a great question too. I want to talk about two things. Number one is the workflow that I showed earlier when I was sharing my screen: this workflow is very similar across all the incident management tools.
A
The difference between modern incident management tools, like PagerDuty or Opsgenie, is typically in how they group things together. We, for example, call a plan a "schedule", a group of people who fit within a schedule a "rotation", and how we page people an "escalation policy".
A
I would say that one thing we are not doing today is building specific custom integrations with all the monitoring tools. This is one of the ways that tools like PagerDuty or Opsgenie have become really popular, but pursuing the route of building a specific integration with every single monitoring tool people could be using is really expensive, and it takes a huge investment of effort right now.
A
Our working theory is that it's possible that what we have today is good enough, especially considering the cost savings they would get by switching over to GitLab incident management, but that's also something we need to validate further. Any customers using it today? For the end-to-end flow, not yet, because we literally just released on-call schedule management, but we are seeing a ton of people already using incidents specifically as a separate issue type.
C
So, John, just from a format standpoint, I know this is a skills exchange. Is it also meant to be a discussion forum, or is it more delivering of the information and the updates?
C
So, just as somebody who speaks to customers on the front line representing GitLab's capabilities, I think this is really cool. I would probably position it, in the enterprise space, as something that can help a customer streamline their management of an application: an enterprise being able to monitor the application's performance through Prometheus (assuming they're using GitLab for deploy as well) and tying that all together.
A
And how would you define it?
C
So I think it's more a question of, you know, if I think about large enterprises, banks, utilities and such: they have a whole ops console with an incident management platform that probably has some AI or log-scraping combination to identify which incidents are actually an issue, and then correlates that with others to say, okay, there's enough correlation here to raise a ticket and create an issue that is going to call attention in ServiceNow or something like that, that drives a workflow, and so on and so forth.
C
It's quite elaborate. But these large enterprises may have an app they're deploying that's of particular importance, and there is a team that is going to be responsible for triaging that and wants closer, better observability of what's happening in that app they're responsible for. These are the people who will go out and buy their own copies, or their own use, of AppDynamics or New Relic or Datadog or something like that for their app. That's probably what I see, just imagining the fit for my customers.
C
I think this fits better with that, in terms of our current capabilities and how it would be useful from an end-to-end standpoint at a project level.
A
Yeah, that makes a lot of sense. I'd love to connect with you further and talk specifically about what customers might be a good fit. We're super open to this type of feedback in particular, because we do understand it is limited today, especially relative to the main tools their people are going to be using.
A
Cool. I see there are a few other questions from Samir and Adrian; Adrian, I think, is on the call, but let me finish the content I prepared. So actually, I won't cover this, because we were just talking about it. I hope you care about incident management because it adds value to the GitLab platform.
A
This additional value proposition should help with conversations, and eventually we see this as a way to expand to a completely different set of users than the ones the GitLab application is directly a fit for today. So as we grow this capability, I think it should get more and more interesting for most of your customers.
A
This information will be available to you later, and additional resources are mainly on our Direction page. I'd love to hear from you if you have additional thoughts or questions after this call. And I just realized I am not sharing my screen again, so I was talking without showing anything; I really apologize for that. But anyway, I'm going to turn it over to questions. So Samir says: a customer has an AWS instance and is trying to figure out how to use incident management.
A
Typically it's logs, but I wouldn't say it's typically tying logs to alerts. More often the case is that an alert is generated within a system. So for AWS, you can have alarms within CloudWatch, and you can have those alarms be sent to GitLab, so they will show up within GitLab incident management in the alert list, and the alerts then would trigger a page to an incident responder that is set up within a schedule.
A
The more established players within incident management focus a lot these days on using AI to become smarter with alert management. One of the main problems for a lot of teams is alert fatigue, where too many alerts come in all the time; at some point, you don't pay as much attention to them as you need to, which may prolong incidents when they do occur.
A
So a lot of times the solution to that is to monitor what has happened in the past and use that information to get smarter, having a computer figure out which alert is actually important. Currently we don't have specific plans for AIOps integration, but it is something that we're thinking about and hope to eventually introduce in the future.
D
I think I can probably revert it; it's sort of a pricing question really, and I think you more or less answered it. It was looking at: do we expect the users of incident management to be existing GitLab users? Because in many customers, I could see they may well not be. And then you mentioned the existing tools are expensive; I wondered, you know, what is expensive? Because the features in incident management look to be spread: some are in the Free tier, some are in Premium, some are in Ultimate.
D
So if we've got a user that wouldn't be on GitLab, that we had to put on Ultimate to get incident management, how would the price of that compare to, say, an existing enterprise-grade incident management tool?
A
Yeah, great question. So certain features are just available in Core, and other features are available in Premium today. Right now there is no incident management capability that's Ultimate, and I'll explain why that is. We build incident management based on some existing pieces within GitLab. For example, an incident is an issue, and an issue is obviously available at the Free tier. So if you're just using an incident issue, that's freely available, but that's not the incident management workflow.
A
We imagine the incident management workflow will be at the Premium tier. That would include things like schedule management, various or multiple alert integrations, and eventually alerts and incidents at the group level; things of that nature will all be at the Premium tier.
A
Our first iteration is likely that users will bring their own Twilio account to integrate with GitLab. Since they are responsible for their Twilio account and GitLab's not paying for that, they're free to use it as they wish.
A
But if we are offering a tranche of pages that GitLab is managing, thus far the latest thinking is that that should still be a Premium feature rather than an Ultimate feature. When we talk about Ultimate features in the future, it's more things like AIOps and things of that nature.
A
Yeah, and the status page: it's placed at Ultimate, but it's really an early-stage product that, frankly, doesn't do a whole lot today.
A
But I can see that eventually being an Ultimate feature, because of who we will be competing with in that space; typically people charge for it, and we're targeting the larger enterprises that really need that feature. Okay.
A
Yeah, so let's talk more about pricing in this space. Just taking PagerDuty as an example, PagerDuty charges 20 bucks per head; there are more variations of that, but that's like a typical on-paper price. And it gets expensive fast, because if you're a DevOps shop building microservices, all of a sudden all these people need to be available on call in some fashion.
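The "gets expensive fast" point is simple arithmetic worth making concrete. A back-of-envelope sketch (the ~$20/user/month figure is the on-paper price quoted above; the pool size is a made-up example):

```python
def annual_seat_cost(seat_price_per_month, responders):
    """Yearly cost of a per-seat incident management tool for a
    given number of people who need on-call access."""
    return seat_price_per_month * 12 * responders

# e.g. $20/user/month across a 25-person on-call pool:
# annual_seat_cost(20, 25) -> 6000 (dollars per year)
```

And that pool tends to grow with a microservices organization, since everyone who might ever touch an incident needs a seat.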
B
But I guess, before we wrap up: any famous last words?
B
Questions? Comments? All right, well, I hope you all enjoy the rest of your day; I want to give you the time back. Thanks so much for the presentation, Kevin, I enjoyed it. Thank you.