From YouTube: Monitoring Future Vision Workflows Discussion
Description
Conversation with Sid (CEO, GitLab), Sarah (Product Manager, Monitor, GitLab), and Kenny (Director of Product, Ops, GitLab).
https://about.gitlab.com/direction/monitor/#future-vision
Sid: Hey, I've been noticing that we have a great vision and we've got some great worksheets, where we talk about how to triage, how to resolve, and how to improve monitoring further, but we have some things that are still very minimal and are getting lapped. Today, for example, the logs are not aggregated; the logs all live in the individual containers, which is kind of a bad practice.
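For context on why per-container logs are painful: with nothing aggregating them, searching logs means visiting every container through the Kubernetes API, and the logs vanish when the pod does. A minimal sketch of that manual workflow (the namespace and looping script are illustrative assumptions, not anything GitLab ships):

```python
import json
import subprocess

def dump_all_container_logs(namespace="default"):
    """What un-aggregated logging looks like: visit every container.

    Without aggregation, the only way to search logs is to pull them
    pod by pod and container by container via kubectl, and anything
    from a deleted pod is already gone.
    """
    pods = json.loads(subprocess.check_output(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"]))
    for pod in pods["items"]:
        name = pod["metadata"]["name"]
        for container in pod["spec"]["containers"]:
            print(f"=== {name}/{container['name']} ===")
            subprocess.run(["kubectl", "logs", "-n", namespace,
                            name, "-c", container["name"]])
```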
Kenny: And I'll just say: there are two halves of our Monitor group. One is the health side, which Sarah is the product manager on, and the other one is the APM side, where I have a new product manager named Doug, who came to us from Elastic and who was not able to attend. We have on the schedule for 12.5 the installation of Elastic into your Kubernetes cluster, so that we can start aggregating the logs, and he also has an issue this release that, at a minimum, lets…
Sid: I think those are exactly the right things: take the existing functionality, although very, very minimal, and make it at least work, and then also work on Elasticsearch. And I want to check an assumption. The assumption is that when you install Elasticsearch, GitLab will automatically start sending all your logs there, with whatever it is, Logstash or something else, Filebeat, on that? Cool, that's great.
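The shipping step Sid is asking about is normally handled by a log shipper such as Filebeat or Logstash running in the cluster. As a rough sketch of what such a shipper does, here is a hypothetical Python version of the core step, using Elasticsearch's real bulk API; the endpoint, index name, and field layout are illustrative assumptions:

```python
import json
from datetime import datetime, timezone

import requests

ES_URL = "http://elasticsearch:9200"  # hypothetical in-cluster endpoint

def ship_container_logs(log_path, pod, container, index="container-logs"):
    """Bulk-index one container's log file into Elasticsearch.

    Stands in for what a shipper like Filebeat does continuously:
    tail each per-container log file and forward every line with
    enough metadata to search across the whole cluster later.
    """
    lines = []
    now = datetime.now(timezone.utc).isoformat()
    with open(log_path) as f:
        for line in f:
            lines.append(json.dumps({"index": {"_index": index}}))
            lines.append(json.dumps({
                "@timestamp": now,
                "message": line.rstrip("\n"),
                "kubernetes": {"pod": pod, "container": container},
            }))
    resp = requests.post(
        f"{ES_URL}/_bulk",
        data="\n".join(lines) + "\n",
        headers={"Content-Type": "application/x-ndjson"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["errors"] is False  # True when every line indexed
```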
Sid: And Sarah, I'm sorry if this is not super relevant for you, because I think you've been doing great work on the health side, and I like a lot of that; there's a lot of progress, and I'm not worried. The only question there, and I think I asked it before: what's important is how we're doing on auto-remediation. Is that on track to launch this year? Auto-remediation is: hey, a vulnerability was found in a dependency I use, and it gets updated without me doing anything.
Kenny: Yeah, correct, in the sense that it doesn't create an incident; it just creates a merge request. Sarah's world in health is really about incidents and that kind of firefighting response. There's also a bunch of work that Sarah recently did, an opportunity canvas review around error tracking, which is important not just for production applications, but for developers.
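A toy outline of the auto-remediation flow described above, sketched against GitLab's REST API (commit a fix to a branch, then open a merge request). The instance URL, project id, and helper name are hypothetical; the real feature lives inside GitLab's security scanning rather than a script like this:

```python
import requests

GITLAB = "https://gitlab.example.com/api/v4"  # hypothetical instance
PROJECT = 42                                  # hypothetical project id
HEADERS = {"PRIVATE-TOKEN": "<token>"}        # access token placeholder

def open_remediation_mr(dep, fixed_version, patched_file, patched_content):
    """Open a merge request that bumps a vulnerable dependency.

    Mirrors the shape of auto-remediation: a branch with the fix is
    created and a merge request is opened; no incident is involved.
    """
    branch = f"remediate-{dep}-{fixed_version}"
    # 1. Commit the patched dependency file to a new branch.
    requests.post(
        f"{GITLAB}/projects/{PROJECT}/repository/commits",
        headers=HEADERS,
        json={
            "branch": branch,
            "start_branch": "master",
            "commit_message": f"Update {dep} to {fixed_version}",
            "actions": [{
                "action": "update",
                "file_path": patched_file,
                "content": patched_content,
            }],
        },
    ).raise_for_status()
    # 2. Open the merge request for a human to review and merge.
    mr = requests.post(
        f"{GITLAB}/projects/{PROJECT}/merge_requests",
        headers=HEADERS,
        json={
            "source_branch": branch,
            "target_branch": "master",
            "title": f"Resolve vulnerability: update {dep} to {fixed_version}",
        },
    )
    mr.raise_for_status()
    return mr.json()["web_url"]
```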
Sarah: And then we are further embedding it in the GitLab development workflows: correlation between Sentry errors and releases, Sentry errors and merge requests, surfacing all of this intelligence that we get from our open source partner in different places within GitLab. The ultimate goal being: how much information can we provide someone within GitLab so that they don't ever need to go to Sentry, without actually rebuilding all of the workflows and functionality?
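On the mechanics of correlating Sentry errors with releases and merge requests: the usual building block is tagging every event with a release identifier when the SDK is initialized. A minimal sketch using the standard sentry_sdk API and the CI_COMMIT_SHA variable that GitLab CI provides; this illustrates the Sentry side of the correlation, not GitLab's integration itself:

```python
import os

import sentry_sdk

# Tag every event with a release (here, the commit SHA that GitLab CI
# exposes) so that errors can later be correlated back to releases and
# the merge requests that shipped them.
sentry_sdk.init(
    dsn=os.environ["SENTRY_DSN"],             # project DSN from Sentry
    release=os.environ.get("CI_COMMIT_SHA"),  # set by GitLab CI
    environment="production",
)
```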
Sid: That is a strategy that I'm loving. I think it's so great that we're first using Sentry to the maximum, so we can get value to our customers as soon as possible, and then we don't forget about the long-term goal of having everything in a single interface. But I agree that, even when it's in the GitLab interface, we can still, for example, aggregate errors and make sure that errors that are about the same thing get combined together.

That's such a hard thing that takes years of experience, and Sentry is doing such a good job at it. They're so good about their open source values; everything on sentry.com ships in their open source client. So I love that you're combining that, and I think viewing Sentry as a piece of infrastructure, like we view Prometheus, for example, is exactly the right path.
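To make concrete what "combining errors about the same thing" involves: grouping typically fingerprints each event by a normalized stack trace, so repeats of one bug collapse into a single group. A deliberately simplified illustration follows; Sentry's real grouping is far more sophisticated (frame weighting, exception types, server-side rules):

```python
import hashlib
import re

def fingerprint(stack_trace):
    """Toy grouping key: hash a normalized stack trace.

    Strips details that vary between occurrences of the same bug
    (hex addresses, line numbers, other literals) so that repeats
    hash to the same group key.
    """
    normalized = re.sub(r"0x[0-9a-f]+", "ADDR", stack_trace)
    normalized = re.sub(r"\d+", "N", normalized)
    return hashlib.sha1(normalized.encode()).hexdigest()[:12]

groups = {}
for event in ("ValueError in worker.py line 10 at 0x7f3a",
              "ValueError in worker.py line 10 at 0x9b2c"):
    groups.setdefault(fingerprint(event), []).append(event)

assert len(groups) == 1  # both occurrences collapse into one group
```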
A
I'm
I'm
gonna
keep
trying
to
have
a
question.
That's
at
least
relevant
to
you
I'm,
not
sure
this
is
one
but
I'm
interested
in
it
as
I'll
ask
anyway,
when
you
look
at
the
health
of
things,
it's
really
important
to
get
kind
of
an
overview
of
the
entire
cluster
and
I
think
there's
a
nice
cluster
view
in
data
dog
rat
hat
is
making
progress
Jiali
that
has
a
nice
cluster
view.
Sarah: That's part of the future vision that Kenny, Doug, and I are collaborating on. My vision of what that would look like is holistic: your infrastructure systems and where they are running, with different visual indications that allow you to easily drill in on that part of the service. And so, as you pick a system or a service, your view expands to include just that section of your application. You're provided with the metrics, the aggregated logs, the stack traces, the synthetics, the impact on your user.

Additionally, business metrics that you've set within GitLab, and where you are at or have exceeded your service level objective thresholds. Splunk does this a little bit, where they have the bigger view and then a really easy way to drill in and then link to other services that give you more insight. So: yes, it's on the roadmap; no, we do not have mocks right now, but that's the ultimate goal, from a health perspective, of where I would like to be. Cool?
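A short worked example of what "exceeded your service level objective thresholds" means in numbers: a 99.9% availability SLO over a window of ten million requests permits 10,000 failures, so 12,345 observed failures blow the budget. The function below is an illustrative sketch, not a GitLab feature:

```python
def error_budget_report(slo, total, failed):
    """Report error-budget consumption against an SLO.

    slo: target success ratio, e.g. 0.999 for "three nines".
    total/failed: request counts over the SLO window (e.g. 30 days).
    """
    budget = (1 - slo) * total  # failures the SLO permits
    consumed = failed / budget if budget else float("inf")
    status = "EXCEEDED" if failed > budget else "within budget"
    return (f"allowed failures: {budget:.0f}, observed: {failed}, "
            f"budget consumed: {consumed:.0%} ({status})")

# 10M requests at a 99.9% SLO -> 10,000 allowed failures.
print(error_budget_report(0.999, 10_000_000, 12_345))  # EXCEEDED
```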
Kenny: I think we see it a little bit. Well, first of all, from our dogfooding it: we're starting to build a project within GitLab called GitLab Services that has a number of different internal, smaller applications running on it, and so we're starting to see more usage there. And we're seeing customers ask for it, maybe in a roundabout way, because they're doing infrastructure as code, where they're managing their platform as a specific GitLab project and then using Terraform to deploy and, kind of, monitor the health of it. So today, to the extent that people aren't doing that directly with Kubernetes, we're seeing it as infrastructure as code. We'd like to continue to support teams who are doing it with Kubernetes, and we have a dogfooding use case today. Cool.
Sid: If there's a problem, it's really important to go from the alert, to probably the metrics that caused the alert, probably to the logs, to look at what specifically is going wrong. And a way to do that is to select a time in that incident and then view the relevant logs. I think, for example, Datadog does a great job of showing that. When are we going to get there with GitLab, and if so, what's the route to that?
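The "select a time in the incident, then view the relevant logs" workflow maps naturally onto a time-range query against the aggregated log store. A sketch against Elasticsearch's search API, reusing the hypothetical endpoint and index from the earlier shipping example:

```python
import requests

ES_URL = "http://elasticsearch:9200"  # hypothetical endpoint, as before

def logs_for_incident(start, end, index="container-logs"):
    """Fetch log lines inside an incident's time window.

    start/end are ISO-8601 timestamps taken from the selected slice
    of the metrics chart; the range filter narrows the search to
    exactly the period when the alert fired.
    """
    query = {
        "query": {"range": {"@timestamp": {"gte": start, "lte": end}}},
        "sort": [{"@timestamp": "asc"}],
        "size": 500,
    }
    resp = requests.post(f"{ES_URL}/{index}/_search", json=query, timeout=30)
    resp.raise_for_status()
    return [hit["_source"] for hit in resp.json()["hits"]["hits"]]
```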
Kenny: We should also have each group be assigned some of these workflows, so that we don't lose sight of things like: oh, Doug is working on logging, but Sarah really needs that log aggregation time slicing for triage. So that we don't lose sight of the interconnection, because the categories in the monitor world today are just the legacy effect of the different companies that used to have point solutions here, but it's becoming more about this combined workflow. Yeah.
Sid: For sure, yeah. With us it started a bit earlier; we're a single application for the DevOps workflow. But it's clearly monitoring where the consolidation started earlier, and you see all the great companies, like Datadog, like Honeycomb; even Splunk, I think, is adding some metrics to their offerings, or they acquired a metrics company. So I think the consolidation is already in full swing there. It was... SignalFx, that's it, thank you. Great acquisition, great company.
Sarah: You want to do that? Yeah, give me a moment. Okay, so while I search for these issues, I'll just give another recap of the plan. We're going to start by making the view of Sentry errors in GitLab far more robust, and then adding detailed error views as you want to drill in, and then an easy connection to creating issues, which will allow us to leverage the incident management workflows we just spent the last quarter building. The next milestone following that will be all about correlation: how can we tie errors tighter into the GitLab workflow?
Okay, so what we're looking at is our first iteration on the error list within GitLab. You have the ability to filter, to search, and to click out to view an error in Sentry, and that connects you to a handful of different actions that you can take on errors within Sentry. So again, our goal is to provide enough information so that nobody needs to leave the tool; give them enough pertinent details so they can decide if they need to ignore, triage, or create issues beyond this. Cool.
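The actions Sarah lists sit on top of Sentry's web API: listing a project's error groups and updating an individual issue's status. A minimal sketch with hypothetical org and project slugs and an elided auth token:

```python
import requests

SENTRY = "https://sentry.io/api/0"
HEADERS = {"Authorization": "Bearer <token>"}  # auth token placeholder
ORG, PROJ = "my-org", "my-project"             # hypothetical slugs

def list_errors(search="is:unresolved"):
    """List a project's error groups, as the GitLab error list does."""
    resp = requests.get(
        f"{SENTRY}/projects/{ORG}/{PROJ}/issues/",
        headers=HEADERS,
        params={"query": search},
    )
    resp.raise_for_status()
    return resp.json()

def ignore_error(issue_id):
    """One of the actions surfaced next to each error: ignore it."""
    requests.put(
        f"{SENTRY}/issues/{issue_id}/",
        headers=HEADERS,
        json={"status": "ignored"},
    ).raise_for_status()
```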
Sid: Cool. Yes: what's taking us so long to improve the logging, Kenny? Not about people, but is there an organizational thing? Were there too few engineers? What's there, missing direction? Or is it just that we focused on metrics, and it was so much work to get the charts into GitLab, and we did that part?
Kenny: Part of it is that, I would also remind you, the monitoring team is responsible for the self-monitoring too, and so we've spent a lot of time, as part of the self-managed scalability working group, improving our access to metrics for the GitLab instance itself. But I think, if I'm being honest, part of it is also direction. We spent some time asking: is it Elasticsearch, or is it some other tool?
Sid: That makes sense. Yeah, I've looked around to see alternatives, because Elastic requires so much compute if you do that at, like, exabyte scale, but it doesn't seem like there's anything else. And I think the alternative we might have considered was the one from the Prometheus people, with, like, indexing. That's right.