What is AIOps Marcel Hild (Red Hat)
Recorded live at the OpenShift Commons AIOps SIG
March 25 2019
A
Thanks, Diane. My name is Marcel Hild, and I'm working in a group in Red Hat's Office of the CTO called the AI Center of Excellence. Obviously we're looking into all things AI-related, and I'm specifically focused on the broader theme of AIOps: what it means for Red Hat and the general community.
A
So changes to applications were driven by the developers, and they change their systems multiple times a day if you look at continuous rollouts and continuous deployments. So what we did: we disconnected the components and brought them together again via microservices. And if you look at it, the same concept applies throughout the whole stack: distributed compute, distributed storage, distributed applications and orchestration of services. With cloud-native tooling such as containers and Kubernetes, we're basically able to scale out infinitely. Obviously, this comes at a certain price.
A
More components mean more complexity. But then, I mean, hey, IT operations in the DevOps world: we need to know when something isn't working, and we need to know it now, because we've committed to those five nines of uptime, SLAs and stuff. So to control these complex systems, we need to introduce some sort of instrumentation; some telemetry is required. And so we produce more metrics, more logs, more stuff.
A
And again, we do this at every layer of the stack, because every persona brings different needs: developers need traces, operations folks need latency and uptime metrics. So how can a single human possibly comprehend such a system? We're creating complex systems of rules, alerts and thresholds, and guess what: we can't keep up with updating our alerts, because the systems being monitored change at a faster pace.
A
It might look like magic, but at the bottom it's a classification problem: if input A, then output X. And actually we're doing this in all these fields that have a great and powerful monetary background, like showing you the most relevant cat images, or that get great media coverage, like beating human players in almost every game out there. But what about operations?
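To make the "if input A, then output X" framing concrete, here is a minimal sketch of a classifier in plain Python. It is not something shown in the talk: the nearest-centroid technique, the feature names (`cpu_load`, request latency) and the labels are all assumptions chosen for illustration.

```python
from math import dist
from statistics import fmean


def fit_centroids(samples):
    """Average the feature vectors of each label into one centroid per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training data: (cpu_load, request_latency_ms) -> label
training = [
    ((0.20, 80.0), "healthy"),
    ((0.30, 95.0), "healthy"),
    ((0.90, 450.0), "degraded"),
    ((0.95, 520.0), "degraded"),
]
centroids = fit_centroids(training)
print(classify(centroids, (0.85, 400.0)))
```

In this toy setup the latency feature dominates the distance because the features are not scaled; a real system would normalize its inputs and use far more training data.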
A
What about ops? I think here we're just starting to apply all these techniques to our very own special field. In other words, if your website is slow because your storage is slow, a computer can tell you that. And even better, if your website is slow because somebody flipped a bit somewhere in a not-so-distant system, then with sufficient input and with sufficient training data a computer can possibly also tell you that: yes, your website is slow, and if you flip this bit back, it's going to be fast again.
A
So what's AIOps anyway? Gartner coined this term some years ago, and it goes like this: AIOps platforms are software systems that combine big data and AI or machine learning functionality to enhance and partially replace a broad range of IT operations processes and tasks, including availability and performance monitoring, event correlation and analysis, IT service management, and automation. There's a lot of words in there, and I've provocatively highlighted these words: AI replaces IT operations. Because, yeah, people tend to think like this, like it replaces truck drivers at some point.
A
Although the self-driving cluster will probably be marketed sooner than we think, I don't think that IT operations will be replaced anytime soon. But we will certainly use big data and machine learning to support our monitoring and automation needs. It's just that: it's just another tool to make you more effective and efficient, a tool to support us and not to replace us. And indeed this is something that we as Red Hat believe in, and we are invested in it.
A
That's [unclear] from our CTO office, and so not only for operations, but also in our development processes, we have to apply that piece of machine learning support too. Again, this is how Gartner describes an AIOps platform: at the center there's big data and machine learning, and then it's a cycle of continuous insights being delivered to these three domains here. Monitoring, in the upper left corner, will benefit from smart alerting and dynamic thresholds.
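As an illustration of what a dynamic threshold could look like, here is a minimal sketch in plain Python. The talk does not specify an algorithm; the rolling mean plus `k` standard deviations rule, the `window` and `k` parameters, and the sample latencies are assumptions made for this example.

```python
from collections import deque
from statistics import fmean, pstdev


def dynamic_alerts(values, window=5, k=3.0):
    """Yield (index, value) for points that exceed the mean of the
    preceding `window` points by more than k standard deviations."""
    recent = deque(maxlen=window)
    for i, value in enumerate(values):
        if len(recent) == window:
            mean = fmean(recent)
            spread = pstdev(recent)
            if value > mean + k * spread:
                yield i, value
        recent.append(value)


# Hypothetical latency samples in milliseconds, with one spike.
latencies_ms = [100, 102, 98, 101, 99, 100, 180, 101, 99]
print(list(dynamic_alerts(latencies_ms)))
```

Unlike a fixed "alert above 150 ms" rule, the threshold here follows the recent behavior of the metric, so it does not need to be updated by hand when the baseline shifts. Note that with a perfectly flat window the spread is zero and any increase triggers an alert, so a production rule would likely add a minimum margin.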
A
As seen before, the service desk will move from a reactive to a more proactive engagement model, with higher efficiency when it comes to troubleshooting and stuff. And ultimately your actions are highly automated, at some point with less and less human interaction. I think this perfectly aligns with the four phases of AIOps. So, first: without data you're nothing. Data is the new oil, and so we need to get our data collection straight.
A
We need to make sure that we have systems that emit the required telemetry and that we're able to store it for a longer term than just your two days of retention period. Plus you need some tools for visualization: with images you convey meaning. And a log file entry, that might be obvious to you, the author of that log-emitting system. But what about all the metadata in that entry? How do you paint a broader picture over time?
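One simple way to get from individual log entries to a picture over time is to bucket them, for example per minute and log level. The following is a minimal sketch under stated assumptions: the log line layout (`timestamp level message`) and the sample entries are hypothetical, not taken from any real system discussed in the talk.

```python
from collections import Counter
from datetime import datetime


def count_per_minute(lines):
    """Count log entries per (minute, level) bucket.

    Assumes each line looks like '<ISO timestamp> <LEVEL> <message>'.
    """
    counts = Counter()
    for line in lines:
        timestamp, level, _message = line.split(" ", 2)
        minute = datetime.fromisoformat(timestamp).replace(second=0, microsecond=0)
        counts[(minute.isoformat(), level)] += 1
    return counts


# Hypothetical log entries.
log_lines = [
    "2019-03-25T10:00:12 INFO request served",
    "2019-03-25T10:00:48 ERROR storage timeout",
    "2019-03-25T10:01:05 ERROR storage timeout",
    "2019-03-25T10:01:30 ERROR storage timeout",
]
for (minute, level), n in sorted(count_per_minute(log_lines).items()):
    print(minute, level, n)
```

The resulting counts are a time series that can be stored long term and plotted, which is where the visualization tooling mentioned above comes in.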
A
Then we need some tooling to help us discover patterns in that data and help us understand these patterns and correlations, because, make no mistake, there won't be a one-size-fits-all solution for everybody. You will still need to assist the computer, and the computer will support you in your understanding of that problem domain. You're still in the driver's seat. And after learning from the past, you want to apply your knowledge to some future events.
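A very basic form of discovering correlations is to compare two metric series directly. The sketch below computes the Pearson correlation coefficient in plain Python; the choice of Pearson correlation, the metric names and the sample values are assumptions for illustration, echoing the "slow website, slow storage" example from earlier in the talk.

```python
from math import sqrt
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = fmean(xs), fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples taken at the same points in time.
storage_latency_ms = [5, 6, 5, 20, 22, 6]
page_load_ms = [200, 210, 205, 640, 700, 215]
print(round(pearson(storage_latency_ms, page_load_ms), 3))
```

A coefficient close to 1 only says the two series move together; deciding whether slow storage actually causes the slow page loads is still up to the human in the driver's seat.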
A
Slowly, standards like OpenMetrics or OpenTracing are emerging. I think we can accelerate the speed of adoption of such standards and common open source tooling by having a voice in the definition of such standards, and I think a SIG is a great way and place to do such a thing. And with that, I would like to open it up for discussions.