From YouTube: Overcoming Tomorrow's Operational Challenges with AIOps, Phil Tee, Moogsoft, OpenShift Commons AIOps SIG
Description
Overcoming Tomorrow's Operational Challenges with AIOps
Phil Tee CEO, Moogsoft
OpenShift Commons AIOps SIG
April 29 2019

Perfect. Super. Well, thanks for inviting us to speak. We're super excited to see Red Hat really dive into the AIOps stuff. It's an area and a market that we at Moogsoft have been in very early, going all the way back to the early history of the company in 2012.

We've long been convinced that AI, in the particular guise of machine learning and data science, has a pivotal role in how people operate modern infrastructure. And I can't go forward without making a naked pitch for what's going to happen on May 15th, which is our AIOps Exchange in San Francisco. If you've not heard of it, please google it and please register; it'd be great to have fellow travelers along. So, without further ado.

I want to talk a little bit about what we do with AIOps to help people with operational challenges that derive principally from the modern cloud-native stack. My personal journey with this has been pretty lengthy. I've been involved in service assurance and operations since the early 1990s, having co-founded Micromuse (Netcool) and a bunch of other startups in this space, culminating in Moogsoft, and this little slide right here gives you a brief history of time.

In terms of what I've seen change and morph over that period: if you go back to the original mainframe era, which was dying out as I entered the market, service assurance was actually done extremely well. Part of the reason for that was a single vendor. It was very, very simple, it was pretty low scale, and change cycles were glacial, to say the least.

Then, of course, we've all lived through the distributed computing wave, which started to erode the ability to deliver service in a very straightforward way, just because scale and complexity started to enter the environment with a multi-vendor, standards-driven, best-of-breed type of approach. What's really going on now is the latest phase: we're all living in the cloud-native, digitally transformed enterprise, which is spectacularly complex and high scale.

We are seeing an almost chaotic deployment paradigm, where large enterprises have moved to a fully agile DevOps application development toolchain, so change cycles are almost instantaneous. That's the first thing. Second, there's very, very large scale, and that requires something very different to be done.

This is almost a dumbed-down slide in a lot of ways, just to punch the point home. In yesterday's datacenter, if you were writing an application, you wrote the app to run on a specific operating system, to run on an actual physical server. There were two to three interactions between those three components. You had to worry about making sure you had the right operating system version and the right hardware, and that the app was built in the right way; the app was built to optimize for the bare metal. But still, it was pretty contained.

Then you look at today, and this is actually a slice through our own cloud in terms of how we deploy our application. The app is deployed by Jenkins and configured there; that is driving Kubernetes, which is driving Docker, which runs on a VM that is controlled by a hypervisor, which runs on a host operating system that runs on bare metal that belongs to Amazon. There are all these very different interactions and dependencies in there.

That's driving two orders of magnitude more considerations just for one application. And of course, as our friends at Federator.ai were pointing out, we're also navigating the whole multi-cloud and instance management world as well. Net-net, the complexity has gone through a complete phase transition, a step change from what it was before.

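To make the layered dependencies in that slide concrete, here is a minimal sketch of dependency-impact analysis in general, assuming a hypothetical `RUNS_ON` map whose component names only loosely mirror the stack described (application, Kubernetes, Docker, VM, hypervisor, host, bare metal). It inverts the "runs on" edges and walks upward to list everything a single lower-layer failure could affect. It is an illustration, not a description of Moogsoft's topology model.

```python
from collections import defaultdict, deque

# Hypothetical "component -> what it runs on" edges, loosely mirroring the
# layered stack from the talk. Names are illustrative only.
RUNS_ON = {
    "app": ["kubernetes"],
    "kubernetes": ["docker"],
    "docker": ["vm"],
    "vm": ["hypervisor"],
    "hypervisor": ["host-os"],
    "host-os": ["bare-metal"],
    "bare-metal": [],
}


def dependents(runs_on):
    """Invert the edges: for each component, list who runs directly on top of it."""
    inverse = defaultdict(list)
    for component, bases in runs_on.items():
        for base in bases:
            inverse[base].append(component)
    return inverse


def impacted_by(failed, runs_on):
    """Breadth-first walk upward: everything that transitively depends on `failed`."""
    inverse = dependents(runs_on)
    seen, queue = set(), deque([failed])
    while queue:
        node = queue.popleft()
        for parent in inverse[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(impacted_by("hypervisor", RUNS_ON)))  # ['app', 'docker', 'kubernetes', 'vm']
```

In a real estate the map would be discovered rather than hard-coded, and every extra component that fans out at a layer adds to the set of things one failure can touch.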
So what does that mean? Where, say, a Wells Fargo used to compete with a Bank of America or a Citibank, now they're having to compete with digitally transformed startups. This is across every segment and every part of the economy, and it drives a step change in how you manage service. It really comes down to scale and complexity, driven by a massive uptick in the number of moving parts in one application. Application infrastructures are much more chatty; the number of data points is measured in billions.

The number of failure modes goes up to around 10^120, which we worked out for a large enterprise and for Yahoo a while back. By the way, the reason that number is of interest to me, as an ex-theoretical physicist, is that it's actually more information than is stored in the observable universe. That gives you an idea of how bad it is: the complexity that people are having to deal with is really significant.

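As a rough, hypothetical illustration of how quickly combinatorial state spaces reach numbers like 10^120 (this is not how the figure quoted above was derived): if each of n components can independently be only "healthy" or "failed", the system has 2^n joint states. The sketch finds the smallest n for which 2^n reaches a given power of ten, using exact integer arithmetic.

```python
def components_to_exceed(power_of_ten: int) -> int:
    """Smallest n such that 2**n >= 10**power_of_ten.

    Models each component as a two-state (healthy/failed) switch, so n
    components give 2**n joint states. Python ints are arbitrary precision,
    so the comparison is exact.
    """
    target = 10 ** power_of_ten
    n = 0
    while 2 ** n < target:
        n += 1
    return n


if __name__ == "__main__":
    print(components_to_exceed(120))  # 399
```

So under this deliberately crude two-state model, roughly 400 independent components are already enough; real components have many more than two states.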
So what to do? Well, if you look across a lot of our customers, what they're seeing is that being overwhelmed by data while lacking information is driving all kinds of consequential issues with the quality of service assurance: the amount of data that DevOps and SRE teams have to deal with, the number of incidents actually being captured by monitoring, and how you proceed to the resolution of an issue.

Traditional service management is very siloed and very linear, and the net-net is that it's a poor fit for what you need. So this is really the appeal of inserting AI and data science into operations. It kind of stems from this: back in the day, when things were a lot more predictable and a lot lower scale, you had a separation of state and measurement, where you could analyze the system you were monitoring and work out its failure modes.

Now you can't work them out; it is too complicated. So you can't go through a rules- or model-based approach that monitors for the known failure modes. You really have to combine state and measurement and use the data you're getting from the monitored systems to give you indications of where there may be service-threatening impacts, allowing a much more fluid approach. Long story short, you've got to use data science techniques to look in the monitored events and metrics from your systems for the clues as to where service may be being compromised.

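One very simple example of letting the monitored data, rather than a hand-written rule, decide what is abnormal is a statistical outlier test on a metric stream. The sketch flags samples that sit more than `threshold` standard deviations from the mean of a trailing window. The window size, threshold, and sample data are arbitrary assumptions, and this is a generic technique, not Moogsoft's algorithm.

```python
from statistics import mean, pstdev


def zscore_anomalies(samples, window=10, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean by
    more than `threshold` population standard deviations."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical latency samples in milliseconds with one spike.
    latency_ms = [100, 102, 99, 101, 100, 98, 103, 100, 101, 99, 100, 450, 101]
    print(zscore_anomalies(latency_ms))  # [11]
```

Note that once the spike enters the trailing window it inflates the standard deviation, which is why the sample right after it is not flagged; production detectors usually use more robust statistics.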
In Moogsoft, our incarnation of this is really a pipeline of algorithms that starts on the left-hand side, where we take anything. We take application logs, we take metrics from collectd and StatsD, we take traps, we take indications from element management systems; it really doesn't matter. We use a bunch of information-theoretic algorithms to suppress noise, because, by the way, most of that data is junk and useless. Then what we really do is look in those synthesized flows for groups of self-correlated events that indicate an underlying causal thing has occurred, which has caused the pattern of alerts to be sent to us. We mine that data for these groups of alerts.

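The talk does not spell out Moogsoft's entropy algorithm, so what follows is only a generic, hypothetical sketch of the information-theoretic idea behind noise suppression. Each alert is scored by its self-information, -log2(p), where p is how often its normalized text appears in the stream. Frequent, repetitive alerts carry few bits and fall below a cutoff, while rare ones are kept. The digit-masking normalization and the `min_bits` cutoff are assumptions.

```python
import math
import re
from collections import Counter


def normalize(message: str) -> str:
    """Crude normalization: lowercase and mask digit runs, so that
    'node 3' and 'node 7' collapse to the same template."""
    return re.sub(r"\d+", "<n>", message.lower())


def self_information(messages):
    """Map each normalized message to -log2(relative frequency), in bits."""
    counts = Counter(normalize(m) for m in messages)
    total = sum(counts.values())
    return {msg: -math.log2(c / total) for msg, c in counts.items()}


def suppress_noise(messages, min_bits=2.0):
    """Keep only messages whose normalized form carries at least `min_bits`."""
    info = self_information(messages)
    return [m for m in messages if info[normalize(m)] >= min_bits]


if __name__ == "__main__":
    stream = [f"Heartbeat OK from node {i}" for i in range(7)] + ["Link down on eth0"]
    print(suppress_noise(stream))  # ['Link down on eth0']
```

Here the heartbeat template makes up 7 of 8 messages (about 0.19 bits each), while the single link-down alert carries 3 bits and survives the cutoff.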
We call them situations, and because we're extracting the entire narrative of the impact, we're able to take a collaborative approach: inviting people into a virtual incident room to try to remediate the thing, automatically triggering automation, and then mining the interactions for insights for future occurrences. I'm just going to let the animation finish; this is a slightly more detailed view of it.

What it really boils down to is four pillars. Entropy is the information-theoretic algorithm that we use to try to eliminate the noise. Correlation is the grouping, using combinations of unsupervised and supervised machine learning: we can take indications from time, topology, and text in the data to form these clusters of alerts. We then have a series of information-theoretic and supervised machine learning algorithms to look for root cause inside those groups of alerts.

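As a hedged illustration of correlating by time and text (two of the three signals mentioned; topology is left out for brevity), the sketch greedily groups alerts into "situations". An alert joins an open group if it arrives within `window` seconds of that group's latest alert and its word set is similar enough, by Jaccard similarity, to some alert already in the group. The `Alert` and `Situation` types, the 60-second window, and the 0.3 cutoff are assumptions; the talk does not describe Moogsoft's clustering beyond its use of supervised and unsupervised machine learning.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    ts: float   # arrival time in seconds (hypothetical field)
    text: str


@dataclass
class Situation:
    alerts: list = field(default_factory=list)

    @property
    def last_ts(self) -> float:
        return self.alerts[-1].ts


def tokens(text):
    return set(text.lower().split())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def correlate(alerts, window=60.0, min_similarity=0.3):
    """Single greedy pass over time-ordered alerts."""
    situations = []
    for alert in sorted(alerts, key=lambda a: a.ts):
        words = tokens(alert.text)
        best = None
        for s in situations:
            if alert.ts - s.last_ts > window:
                continue  # group has gone quiet for too long
            sim = max(jaccard(words, tokens(a.text)) for a in s.alerts)
            if sim >= min_similarity and (best is None or sim > best[0]):
                best = (sim, s)
        if best is not None:
            best[1].alerts.append(alert)
        else:
            situations.append(Situation([alert]))
    return situations


if __name__ == "__main__":
    alerts = [
        Alert(0, "db01 disk latency high"),
        Alert(20, "db01 disk latency critical"),
        Alert(30, "web03 http 500 errors"),
        Alert(45, "db01 disk io errors"),
        Alert(500, "db01 disk latency high"),
    ]
    print([len(s.alerts) for s in correlate(alerts)])  # [3, 1, 1]
```

The late `db01` alert at t=500 starts a new situation because the earlier group has been quiet for longer than the window, even though its text matches.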
Some of them are pretty natural. Some of them use terms that you might not be familiar with, like vertex entropy, which is actually fundamental research that we did and published into how you can take hints from the interrelationships of entities to work out where a service-impacting incident is most likely to arise.

So there are some peer-reviewed journal articles around that. Then, as part of this collaborative war room, we're looking for things like situation similarity, using that similarity to prompt remediation hints and to accrete a sort of living runbook, if you like. And if there are any fellow Brits on the call, I always chuckle when I see the little icon at the bottom right, which is supposed to be a brain.

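A minimal, hypothetical sketch of the situation-similarity idea: compare a new situation's features (affected entities and alert keywords) against past, resolved situations and surface the resolution notes of the most similar ones as remediation hints. The `PAST` history, the choice of Jaccard similarity, and the cutoffs are all assumptions made for illustration.

```python
def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical history of resolved situations: (features, resolution note).
PAST = [
    ({"db01", "disk", "latency"}, "Failed over db01 to replica; replaced disk."),
    ({"web03", "http", "500"}, "Rolled back web03 deployment."),
    ({"db01", "cpu", "saturation"}, "Killed runaway query on db01."),
]


def remediation_hints(features: set, history=PAST, top_k=2, min_similarity=0.2):
    """Return up to `top_k` (similarity, note) pairs, most similar first."""
    scored = [(jaccard(features, past), note) for past, note in history]
    scored = [s for s in scored if s[0] >= min_similarity]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    print(remediation_hints({"db01", "disk", "io", "errors"}))
    # [(0.4, 'Failed over db01 to replica; replaced disk.')]
```

Feeding each newly resolved situation back into the history is one simple way such a "living runbook" could accrete over time.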
We like to think of ourselves as the AIOps leader, and I'll give you a little bit of the reason why we make that claim. We've been at this since 2011. We've filed 50-plus patents; I think 14 was the number granted as of Friday, and we expect about 20 to be granted this year.

On top of that, we've raised about a hundred million in funding since 2012, from Goldman Sachs, Redpoint, and Wing. We're about a year away from being cash-flow generative, so we'll never need to raise finance again; we're on our way to scale. We've got 150 of the [inaudible] using Moogsoft, and the team is comprised of a bunch of folks who really have been here before.

This will be my third at-scale company in this space, and we've got leaders from Splunk, Qualys, AppDynamics, and other businesses in there as well. If you look at the breadth of the IP that we've built, this is just a vista across places where we have either filed patents or produced peer-reviewed papers.

All of that has generated IP that is in the product. Everything from significance and timestamps to text-based similarities, the use of topology, the use of entropy on graphs, and the use of deep learning across correlation, entropy, feedback, and root cause forms part of our platform. And that's it for me. Again, anybody is welcome to come join us on the 15th of May at the Four Seasons; just hit our website, it's pretty prominent. Any questions? And thank you for listening.