From YouTube: OpenShift Commons AIOps SIG Full Meeting March 25 2019
Description
OpenShift Commons AIOps SIG Full Meeting March 25 2019
What is AIOps - Marcel Hild (Red Hat)
ARCANNA - Bogdan Dass (Siscale)
DiskProphet - Brian Jeng (ProphetStor)
co-chairs: Diane Mueller and Marcel Hild (Red Hat)
A: Right, so let's just get started. Welcome, everybody, to this very first OpenShift Commons AIOps SIG meeting. If you haven't yet joined the Google Group, the link is there in the HackMD note, so please join. That's what we're going to be using as our open mailing list for the group, for Q&A, and for sharing resources, events, and other things.
A: If you don't know me yet, my name is Diane Mueller. I am the director of community development at Red Hat for the cloud platform, which covers lots of different things: OpenShift, Operators, all the workloads that run on OpenShift, and all the related upstream projects I get to play in, such as Kubernetes and the CNCF projects. My colleague Marcel Hild approached me about a month or two ago about setting up this SIG, so I'll let Marcel introduce himself and we'll go from there.
B: Can you see my screen, the red one? Absolutely, cool. So thanks, Diane. My name is Marcel Hild and I'm working in a group in Red Hat's Office of the CTO called the AI Center of Excellence. Obviously we're looking into all things AI-related, and I'm specifically focused on the broader theme of AIOps and what it means for Red Hat and the general community.
B: So changes to applications are driven by the developers, and they change their systems multiple times a day if you look at continuous rollouts and continuous deployments. What we did was disconnect the components and bring them back together via microservices, and if you look at it, the same concept applies throughout the whole stack: distributed compute, distributed storage, distributed applications, and orchestration of services. With cloud-native tooling such as containers and Kubernetes, we're basically able to scale out almost infinitely. Obviously, this comes at a certain price.
B: More components mean more complexity, but hey, we're IT operations in the DevOps world. We need to know when something isn't working, and we need to know it now, because we've committed to those five nines of uptime SLAs and so on. So to control these complex systems we need to introduce some sort of instrumentation; some telemetry is required, and so we produce more metrics, more logs, more stuff. And again we do this at every layer of the stack, because every persona brings different needs: developers need stack traces, operations folks need latency and timeout metrics. So how can a single human possibly comprehend such a system? We're creating complex systems of rules, alerts, and thresholds, and guess what: we can't keep up with updating our alerts, because the systems being monitored change at a faster pace.
B: It might look like magic, but at the bottom it's a classification problem: if input A, then output X. We're actually already doing this in all the fields that have great and powerful monetary backing, like showing you the most relevant cat images, or that get great media coverage, like beating human players in almost every game out there. But what about operations?
B: What about ops? I think here we're just starting to apply all these techniques to our very own special field. In other words, if your website is slow because your storage is slow, a computer can tell you that. Even better: if your website is slow because somebody flipped a bit somewhere in a not-so-distant system, then with sufficient input and sufficient training data a computer can possibly also tell you that yes, your website is slow, and if you flip this bit back it's going to be fast again.
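The speaker's framing can be restated as a minimal, hypothetical sketch (not from the talk): root-cause identification treated as supervised classification over telemetry features. All feature names, labels, and values below are invented for illustration.

```python
# Minimal sketch: root-cause identification framed as supervised classification
# over telemetry features. Feature and label names are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: [p99_latency_ms, storage_iops, error_rate, cpu_util]
X_train = np.array([
    [900.0, 120.0, 0.02, 0.40],   # slow-storage incident
    [850.0, 110.0, 0.01, 0.35],
    [300.0, 900.0, 0.20, 0.30],   # bad-deploy incident
    [250.0, 950.0, 0.25, 0.28],
    [120.0, 980.0, 0.00, 0.90],   # CPU-saturation incident
])
y_train = ["storage_slow", "storage_slow", "bad_deploy", "bad_deploy", "cpu_saturation"]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# New observation: the website is slow; which subsystem is the likely culprit?
print(clf.predict([[880.0, 115.0, 0.03, 0.42]]))  # likely ['storage_slow']
```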
B: So what's AIOps anyway? Gartner coined this term some years ago, and it goes like this: AIOps platforms are software systems that combine big data and AI or machine learning functionality to enhance and partially replace a broad range of IT operations processes and tasks, including availability and performance monitoring, event correlation and analysis, IT service management, and automation. There are a lot of words in there, and I've provocatively highlighted these: AI replaces IT operations. People tend to think like this, the way it replaces truck drivers at some point.
B: Although the self-driving cluster will probably be marketed sooner than we think, I don't think that IT operations will be replaced anytime soon. But we will certainly use big data and machine learning to support our monitoring and automation needs. It's just another tool to make you more effective and efficient, a tool to support us and not to replace us, and indeed this is something that we at Red Hat believe in and are invested in.
B: This is another good quote from one of our team members in our CTO office: the key next step for systems management and software development is the replacement of heuristics and fixed limits with learned models. So not only in operations, but also in our development processes, we have to apply that kind of machine learning support. Again, this is how Gartner describes an AIOps platform: at the center there's big data and machine learning, and then it's a cycle of continuous insights being delivered to these three domains.
B: Here, monitoring in the upper left corner will benefit from smart alerting and dynamic thresholds, as seen before. The service desk will move from a reactive to a more proactive engagement model, with higher efficiency when it comes to troubleshooting. And ultimately your actions are highly automated, at some point with less and less human interaction. I think this perfectly aligns with the four phases of AIOps. First, without data you're nothing; data is the new oil, and so we need to get our data collection straight.
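As a rough illustration of the "dynamic thresholds" idea mentioned above (a minimal sketch of one possible approach, not something shown in the meeting): instead of a fixed alert limit, the threshold is derived from the metric's own recent behaviour, for example a rolling mean plus a multiple of the rolling standard deviation.

```python
# Hypothetical sketch of a dynamic alert threshold: flag points that exceed a
# rolling mean + k * rolling standard deviation instead of a fixed limit.
import numpy as np

def dynamic_threshold_alerts(values, window=30, k=3.0):
    values = np.asarray(values, dtype=float)
    alerts = []
    for i in range(window, len(values)):
        recent = values[i - window:i]
        threshold = recent.mean() + k * recent.std()
        if values[i] > threshold:
            alerts.append((i, values[i], threshold))
    return alerts

# Example: a latency series with one spike at the end.
latency = list(np.random.normal(100, 5, 200)) + [220.0]
print(dynamic_threshold_alerts(latency)[-1])  # the final spike is flagged
```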
B: We need to make sure that we have systems that emit the required telemetry and that we're able to store it for longer than just your two days of retention period. Plus you need some tools for visualization, because images convey meaning. A log file entry might be obvious to you, the author of that log-emitting system, but what about all the metadata in that entry? How do you paint a broader picture over time?
B: Then we need some tooling to help us discover patterns in that data and help us understand those patterns and correlations, because make no mistake, there won't be a one-size-fits-all solution for everybody. You will still need to assist the computer, and the computer will support you in your understanding of that problem domain; you're still in the driver's seat. After learning from the past, you want to apply your knowledge to future events.
B: You want to know that your application needs to scale out before you actually hit all that traffic. And last but not least, you finally want to know the reason why something is failing, either on the spot or in a post-mortem. This is the classic needle-in-the-haystack problem: let the computer guide me to that flipped bit somewhere that caused my outage.
B: We're slowly getting there. Standards like OpenMetrics or OpenTracing are emerging, and I think we can accelerate the speed of adoption of such standards and common open source tooling by having a voice in the definition of those standards. I think this SIG is a great way and place to do such a thing, and with that I would like to open it up for discussion.
A: The sort of style that we use for SIG meetings here in the Commons is that I and Marcel, as chairs, will ask people to come and share their perspectives, to sort of spur conversation and to do some background education. Today we have with us two folks who have been doing a lot of work in this space, from Siscale. Bogdan Dass is our next speaker, and he's going to talk a little bit about Arcanna.
C: I will, as soon as I finish the presentation. So hopefully everybody should now see a very big screen showing Arcanna. Yes, that's great. So hello everyone, my name is Bogdan Dass, I am a principal solution architect with Siscale, and I'm here to talk to you about a solution that we at Siscale have been developing. It's called Arcanna; it's a short name for a very long name, actually: automated root cause analysis.
C: Nobody knows where the issue is; maybe hours later, nobody even knows where to get started on fixing that issue. The problem here was very well pointed out by Marcel earlier. I wanted to use this image to talk about searching for a needle in a haystack, but I think Marcel put it much better: it's like a cat-and-mouse game, and the mice are multiplying like crazy. Just a few years ago you had your physical server and you had your application.
C: You have many more places in which something can go wrong, and identifying the true culprit when something does go wrong is becoming a more and more difficult task. But we also have some very nice, very useful technology that can help us, and since I don't know if everybody here is familiar with Elasticsearch and the Elastic Stack, I will just do a very quick presentation of them. First of all, Elasticsearch started as a tool for searching through huge amounts of text.
C
It
is
also
a
very
powerful
way
of
dealing
with
time
series
data,
and
nowadays
we
are
seeing
elasticsearch
being
used
more
and
more
for
monitoring,
because
it's
a
very
it
works
very
well
as
a
kind
of
no
sequel
database.
You
can
just
populate
it
with
the
time
series
data,
the
matrix
that
you
want
to
collect
and
then
aggregate
correlate
work
with
those
metrics
also
around
elasticsearch.
We
have
an
entire
ecosystem.
Now
it's
the
elastic
stack.
It
was
originally
called
ELQ
for
elastic
search.
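To make the "aggregate and correlate metrics" point concrete, here is a minimal, hypothetical sketch (not from the presentation) using the Python Elasticsearch client to bucket a CPU metric per minute; the index name, field names, and cluster address are invented.

```python
# Hypothetical sketch: query a metrics index in Elasticsearch and average a
# CPU field per minute. Index and field names are illustrative only.
# (elasticsearch-py 7.x-style call with an explicit request body.)
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-1h"}}},
    "aggs": {
        "per_minute": {
            "date_histogram": {"field": "@timestamp", "fixed_interval": "1m"},
            "aggs": {"avg_cpu": {"avg": {"field": "system.cpu.total.pct"}}},
        }
    },
}

resp = es.search(index="metrics-*", body=query)
for bucket in resp["aggregations"]["per_minute"]["buckets"]:
    print(bucket["key_as_string"], bucket["avg_cpu"]["value"])
```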
C: If you've ever had to collect information from multiple devices belonging to multiple vendors, you already know this issue. I want to know what user has performed a specific action, and all the actions are logged, but what is the field for the user? Is it user, username, user.name, or nginx.access.user_name? It's very difficult to correlate data when the fields being used differ between different tools and different vendors, and this is where Elastic has come up with a very nice idea. It's called the Elastic Common Schema.
C
It's
an
open
source
specification
that
defines
a
common
set
of
documents,
filled
documents
fields
for
data.
Once
you
apply
this
elastic
common
schema
once
all
your
data
is
indexed.
In
the
same
way,
it
becomes
easy
to
correlate
data
from
different
data
sources.
So
that's
one
problem
that
I
do
not
say
it
is
solved,
but
it
is
in
the
process
of
being
solved.
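A tiny, hypothetical illustration of the normalization step that a common schema enables (my sketch, not Siscale's or Elastic's code): mapping vendor-specific user fields onto an ECS-style user.name field before indexing.

```python
# Hypothetical sketch: normalize vendor-specific field names onto the
# Elastic Common Schema style field `user.name` before indexing a document.
VENDOR_USER_FIELDS = ["username", "user_name", "nginx.access.user_name", "user"]

def to_ecs(raw_event: dict) -> dict:
    ecs_event = dict(raw_event)
    for field in VENDOR_USER_FIELDS:
        if field in raw_event and raw_event[field]:
            ecs_event["user"] = {"name": raw_event[field]}
            break
    return ecs_event

print(to_ecs({"nginx.access.user_name": "alice", "message": "GET /login"}))
# -> {'nginx.access.user_name': 'alice', 'message': 'GET /login', 'user': {'name': 'alice'}}
```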
C: The second problem when collecting data is this one. This is an actual demo created by the people at Elastic. It shows a real-life troubleshooting scenario using Elasticsearch: a problem that occurred in an application. It's a basic application with multiple processes, multiple microservices making up the application, and at some point we get an alert. As usual, the alert doesn't say too much: poor performance on the server. From there we go to the dashboards and, without going into the details, we start to dig. We start to look at what has happened: when did the problem start?
C: We see that there seems to be a problem with one of the containers running on one of the nodes. We go into that container and we see some spikes in the CPU usage. We go there and we look at the processes. We see that there is a backup process that actually runs at a certain interval, and everything becomes slow. Well, that backup process is doing the writing.
C: Now, I went very quickly through all of this, but the problem here is that there are many sources of data, many places where something could go wrong, and many times we do not know where to start. We start digging: look at the servers, look at the network, look at the application. In the end we will manage to isolate the problem, but it takes a lot of work and a lot of time, and the question was: can we do things better?
C: Can we improve the time it takes to identify the actual root cause? We believe we can. The problem here is: how do we make sense of the mountain of data? How do we use the machine to help us, to give us a starting point, to point us in the right direction to identify the root cause? Enter Arcanna. This part here is what we already have: this is Elasticsearch collecting data from our infrastructure.
C: Then we try to identify the probable root cause for those events, and with that we can engage the appropriate team. Once the problem has been solved, the feedback actually goes back into Arcanna: we tell the system what has happened, whether the determination was correct or not, and the system learns from our feedback.
C: Again, we have our system: we have Elasticsearch with all the data, and we have Arcanna, which is basically a plugin for Kibana, the data visualization console for Elasticsearch. Inside, we are adding a TensorFlow-based machine learning model that actually gets access to all the data. So the machine learning system looks at the data and tries to identify what the root cause might be.
C: Is that enough? Is the determination correct? We don't know; right now it may not be. But we provide feedback: after the troubleshooting steps have been completed, after the root cause has been positively identified, the user provides feedback through Arcanna. The user tells the system: yes, you're right, this was the actual root cause; or no, that was not correct, the actual root cause was something else.
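As a rough, hypothetical sketch of what a supervised root-cause classifier with a feedback loop could look like (my own illustration; Arcanna's actual model, features, and labels are not shown in the talk): a small Keras network over event feature vectors, fine-tuned whenever an analyst confirms or corrects a prediction.

```python
# Hypothetical sketch (not Arcanna's code): a small supervised classifier over
# event feature vectors, with a feedback loop that retrains on analyst labels.
import numpy as np
import tensorflow as tf

NUM_FEATURES = 8          # e.g. counts/severities extracted from correlated events
ROOT_CAUSES = ["out_of_memory", "disk_full", "bad_deploy"]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(len(ROOT_CAUSES), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Initial training on historical, labelled incidents (synthetic data here).
X = np.random.rand(64, NUM_FEATURES)
y = np.random.randint(0, len(ROOT_CAUSES), size=64)
model.fit(X, y, epochs=5, verbose=0)

def predict_root_cause(event_features):
    probs = model.predict(np.array([event_features]), verbose=0)[0]
    return ROOT_CAUSES[int(np.argmax(probs))], float(np.max(probs))

def feedback(event_features, confirmed_root_cause):
    """Analyst confirms or corrects the prediction; fine-tune on that label."""
    label = np.array([ROOT_CAUSES.index(confirmed_root_cause)])
    model.fit(np.array([event_features]), label, epochs=1, verbose=0)

cause, confidence = predict_root_cause(np.random.rand(NUM_FEATURES))
feedback(np.random.rand(NUM_FEATURES), "out_of_memory")
```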
C: It was this other thing, and the system learns. All the data also goes back into the Elastic Stack, into Elasticsearch, and with this information the system continually improves over time; it learns to identify the root cause correctly. This is what we have now, but what about the future? How can this system be used in the future? I need to specify that we are not there yet, but think about a future in which we can actually take action when we are reasonably confident that the root cause has been correctly identified.
C: What if we are more than 80 percent sure that the issue was a backup process running on the database server? Can we go in and automate the solution? We believe we can: if we have a certain confidence threshold and we are above that threshold, we just go in. We have an Ansible script; the script goes to the server and takes corrective action.
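A minimal, hypothetical sketch of that confidence-gated automation (my illustration; the playbook name, threshold, and host are invented): only trigger the remediation playbook when the model's confidence exceeds the configured threshold, otherwise hand off to a human.

```python
# Hypothetical sketch: trigger an Ansible playbook only when the root-cause
# confidence exceeds a threshold; otherwise notify the on-call team.
import subprocess

CONFIDENCE_THRESHOLD = 0.80

def remediate(root_cause: str, confidence: float, target_host: str) -> None:
    if confidence >= CONFIDENCE_THRESHOLD:
        # Playbook name is illustrative only.
        subprocess.run(
            ["ansible-playbook", f"remediate_{root_cause}.yml",
             "--limit", target_host],
            check=True,
        )
    else:
        print(f"Confidence {confidence:.0%} below threshold; "
              f"notifying the on-call team about {target_host} instead.")

remediate("backup_overload", 0.86, "db-server-01")
```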
C: We might be heading to a point where the problem is solved before the users even notice it. It will not apply to all problems, but if it applies to 50, 60, 70% of the problems, it will free up a lot of time and a lot of resources for the people actually doing the investigation today. Just to show you what the interface looks like: this is the Arcanna interface. If you have ever worked with Elasticsearch, it will look very familiar, because it is nothing more than another plugin for Kibana.
C: This is where you define the machine learning jobs. This is where you tell it what fields to take into consideration for the ML job, and of course you can also rename some of the fields if you need to do so; you can rename them from the interface. The ML job starts running, and in the end we get an output like this.
C: These are the events that were identified, and Arcanna believes that these three are part of the same set of symptoms; they have the same underlying root cause. We have a web server reporting an internal server error, a 500 error message; we have a SQL query server saying that it's unable to write to disk; and we have a server that is out of memory. Arcanna believes that this out-of-memory condition was the root cause, and that we should investigate this particular server first. Is it correct, or is it not?
C: We go in, we investigate, we follow our investigation and troubleshooting steps as usual, and in the end, these are actually toggles: you can switch them between root cause and symptom. In the end you can go in and tell the system: yes, good job; or no, that was not correct, try to do better next time, and the system will improve.
C: Keep in mind that there already is a level of machine learning in the Elastic Stack. Elasticsearch already has unsupervised machine learning that can reduce some of the noise: it can detect anomalies, it can detect when something deviates from normal. We are adding on top of that: we are adding the supervised machine learning component and the automated root cause analysis. So the tools that we have go up to step three that Marcel mentioned earlier.
C: Now we are adding step four, the automated RCA, automated root cause analysis. And of course, on top of that you can add plays: you can notify the correct teams, you can add playbooks for automatic remediation if the root cause identification is reasonably confident, and you can always provide feedback, and the system will learn from the feedback you provide.
C: So that's it for Arcanna. Of course, if anybody has any questions about the system, I will be glad to answer them. Just please don't ask me too much about the machine learning part; I am not a developer, a lot of that is magic to me. I would have to ask my colleagues who have actually written the code for that.
B: Yeah, one remark from me, and this is Marcel again. I really like the setup, that you are trying to plug into existing technology, because I think it's crucial that you need to have your monitoring stack in place before you can actually put some AI or machine learning on top of it. So it nicely integrates with what people already have and then improves their solution, rather than starting from scratch. I also think that the feedback loop is a very important thing that most people often overlook, because in the end a machine can only be as smart as you train it to be.
So maybe one question for your developers would be: is there a way to audit this? Did you analyze how much better the network got over time, and how much feedback was required to train it up to a certain point where it could reliably identify some of the root causes?
C: I can't say for sure, but I do have some good news here. I forgot to tell you about this in the presentation: this technology will be open source. And, as Colleen has said, everything will depend on the size of your net, the complexity of your network, the type of data you're collecting, the type of issues you're encountering, how many of them are repeated, how many of them are new, and so on. But everything, all the code, will be open sourced.
B: That's very, very good news, and I saw that in your other talk. Actually, one of our team members has also prototyped a similar solution, and I already see some room for collaboration there. We also plug into Elastic, and we train a model, not a neural network model but a self-organizing map, to flag anomalies in log files. You're going one step further by actually pinning down some root causes; we're only looking at a stream of log file messages and want to detect something anomalous in the content of those messages.
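For readers unfamiliar with the self-organizing-map approach Marcel mentions, here is a small, hypothetical sketch (my own, using the third-party MiniSom library; the feature dimension, thresholds, and random vectors stand in for real log-line features): score each log-message vector by its distance to the trained map, and flag high scores as anomalous.

```python
# Hypothetical sketch: flag anomalous log messages by their distance to a
# self-organizing map trained on "normal" message feature vectors.
# Uses the third-party MiniSom library; random vectors stand in for e.g.
# hashed token counts or embeddings of log lines.
import numpy as np
from minisom import MiniSom

FEATURE_DIM = 16
normal_vectors = np.random.rand(1000, FEATURE_DIM)   # vectors from normal logs

som = MiniSom(x=10, y=10, input_len=FEATURE_DIM, sigma=1.0, learning_rate=0.5)
som.random_weights_init(normal_vectors)
som.train_random(normal_vectors, num_iteration=5000)

def anomaly_score(vector: np.ndarray) -> float:
    """Distance between the vector and its best-matching unit on the map."""
    bmu = som.winner(vector)
    return float(np.linalg.norm(vector - som.get_weights()[bmu]))

threshold = np.percentile([anomaly_score(v) for v in normal_vectors], 99)
suspicious = np.random.rand(FEATURE_DIM) * 3.0        # clearly out of range
print(anomaly_score(suspicious) > threshold)           # likely True
```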
A: All right, well, I don't see any other questions in the chat, and we do have another SIG member who's going to give a talk today: Brian Jeng from ProphetStor is going to talk about DiskProphet. So I'd like to hand the floor over to him. If you could take over the screen sharing, Brian, and make sure your sound is on, we'll pick that conversation up too. Please keep asking questions if you have them, and Bogdan, thank you very much for the presentation.
E: You should see it there.
E: So in 2016 we partnered with another big company, which actually presented at Red Hat Storage Day in Seattle, that wanted to build a petabyte Ceph cluster for an OpenStack cloud, and they found there were three major stability issues with the Ceph cluster that were sort of blocking their project. The first one was that every time a disk failed or an OSD failed, the CRUSH map would change, which would cause placement group re-peering and backfilling, or the cluster would rebalance to heal itself.
E: But it essentially did the same thing. We could predict disk failures six weeks in advance, and then they drew out all this architecture stuff. But the most important thing is this graph at the bottom right. You can see that there's a normal workload here of around 400 or so IOPS, and then, when they simulated a disk failure by just pulling a disk, they found that the cluster performance dropped below 200, so it dropped around 40 to 50% in IOPS and persisted that way.
E: Sorry, it persisted that way for the whole duration of the test, so 800 minutes, around 12 hours or so. Versus with our disk prediction: you can see that by knowing a disk is about to fail in advance, we can take pre-emptive measures. We can disable the cluster rebalancing, then remove the disk and replace it within an hour, and have the performance go back up in a fraction of the time.
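A rough sketch of the "disable rebalancing, replace the disk, re-enable" workflow the speaker describes (my own illustration, driving the standard ceph CLI from Python; the OSD id and the replacement step are made up, and a real replacement usually involves more steps):

```python
# Hypothetical sketch: pre-emptive disk replacement driven from Python via the
# standard ceph CLI. The `noout`/`norebalance` flags are real Ceph cluster
# flags; the OSD id and the simplified replacement step are illustrative.
import subprocess

def ceph(*args: str) -> None:
    subprocess.run(["ceph", *args], check=True)

def replace_failing_osd(osd_id: int) -> None:
    # Stop the cluster from trying to heal itself while we swap the disk.
    ceph("osd", "set", "noout")
    ceph("osd", "set", "norebalance")
    try:
        ceph("osd", "out", str(osd_id))
        input(f"Physically replace the disk behind osd.{osd_id}, then press Enter...")
        ceph("osd", "in", str(osd_id))
    finally:
        ceph("osd", "unset", "norebalance")
        ceph("osd", "unset", "noout")

replace_failing_osd(42)
```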
E: Then the same company tested our prediction engine against 20,000 drives over the course of 90 days, and they found that we had an accuracy rate of 96% and a recall rate of 97%, and the recall rate is actually the more important statistic here. It's the number of correctly predicted failed disks over the total number of failed disks. So out of every 100 disks that failed, we would correctly predict 97 of them. And then this just shows that we're already integrated in the Ceph community; we're called the diskprediction module.
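A quick sketch of the recall figure the speaker defines (my own arithmetic illustration; the counts are invented, chosen only to reproduce roughly 97% recall):

```python
# Hypothetical sketch of the recall definition given above:
# recall = correctly predicted failed disks / total disks that actually failed.
predicted_and_failed = 97   # failed disks that were correctly predicted (illustrative)
total_failed = 100          # all disks that actually failed (illustrative)

recall = predicted_and_failed / total_failed
print(f"recall = {recall:.0%}")   # -> recall = 97%
```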
E: You can use that with Ansible, Chef, Puppet, any kind of automation software, to make it simple for a mass deployment. Our biggest account right now is actually in Michigan; there are three universities, Wayne State, Michigan State, and the University of Michigan, and the way they're set up is that all three of these campuses share a single giant Ceph cluster, and they put all their research data on this Ceph cluster.
E: So they have to make this Ceph cluster as resilient as possible, and what we provide is the disk predictions, allowing them to monitor the health of their disks before they fail. I'm just going to go through a quick live demo; I'm going to switch screens here. Can you guys see my web browser?
E: So this would be where you would go for the disk details. And then, as we alluded to earlier, we also have prediction for capacity and performance. Over here we have the cluster capacity, but we also go down to the OSD level. I'll just use pools because it's more interesting, and then we can predict future usage, future capacity, for up to the next ninety days.
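A minimal, hypothetical sketch of such a capacity forecast (my own illustration using a simple linear trend fit on synthetic data; real capacity models are typically more sophisticated):

```python
# Hypothetical sketch: forecast pool capacity usage 90 days out by fitting a
# linear trend to daily usage samples (synthetic data for illustration).
import numpy as np

days = np.arange(180)                                     # last 180 days
used_tb = 50 + 0.3 * days + np.random.normal(0, 1, 180)   # ~0.3 TB/day growth

slope, intercept = np.polyfit(days, used_tb, deg=1)
forecast_90d = slope * (days[-1] + 90) + intercept

print(f"growth ≈ {slope:.2f} TB/day, projected usage in 90 days ≈ {forecast_90d:.1f} TB")
```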
E: Yeah, because if they wanted a lightweight version of our predictor, we just gave them one with less baggage, which would be only 70% accurate, that they could enable locally. But it wouldn't use all the metrics that are provided for the full prediction; it was requested by them to have a local, lightweight package. Okay, yeah.
A: And you may all notice that I've added as many names as I recognized in the chat into the attendees list. If I got your affiliation wrong, please hop into the HackMD notes and correct me. We've got almost ten minutes left, so I'd like to talk a little bit about some of the goals for this group. One is that we're just trying to reach out and build a community around AIOps, and make sure that we have some of the resources that people are looking for and requiring.
A: So thank you both, Bogdan and Brian, for sharing your insights and your tooling. That's a great start, and if there are other topics that people want to talk about or present on, or questions you have, please reach out to us: again, sign up through the Google Group and ask for those. If there's anyone here, looking at the chat, that has any questions... not seeing any. I'm hoping that some of you will have suggestions for upcoming topics so we can move forward. We were planning on doing this on Mondays at 9 o'clock.
A: So if you're interested in getting together, then please reach out to Marcel or myself and we'll start coordinating a face-to-face, sometime probably in September. Marcel, if you wanted to add a few more words in here: I've added a few resources down at the end, and if everybody could send me PDF versions of their slide decks to dmueller at redhat.com, that would be great and I'll add them in as well.
A: I'll post the video of this session to the Google Groups list, and I'll create a YouTube playlist for these topics, edit them, and get them up, hopefully in the next 24 hours or so. Is there anything else anyone would like to add while we're here? I'm just checking the chat again, and I'm not seeing anything, so hopefully I've gotten everybody's affiliation correct. If not, I posted the link already into the Google group and we can correct it from there. Thanks again, everybody, for attending, and we'll be back again in another month.