From YouTube: Hard Drive Lifeguard - Manpreet Singh
Description
2015 HTM Challenge Application submission
B
Hi, my name is Manpreet, and I'm working on the Hard Drive Lifeguard project for the hackathon. I will give you a brief background on the project. While working with the developer community, I see that most of the quality time is spent on operational burden while solving problems on both the application and infrastructure sides, and I strongly believe that this can be solved using artificial intelligence techniques.
B
Traditional machine learning techniques require a lot of labeled data to perform classification. On the other hand, NuPIC, which can perform online learning, predictions, and anomaly detection, is a much better choice here.
To brief you on the architecture: data ingestion takes place through different collectors that run on HBase clusters. SMART attributes are collected into a common database, from where they are fed to the algorithms to perform anomaly detection.
B
I will take you through the data set and the demo. The data set consists of 64 SMART attributes, and this data was collected from multiple drives at one of the universities. Each line in the data section contains data from one SMART read, temporal in nature, with the last column being 1 or 0, a class attribute that defines failed or good drives.
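The data format described above can be sketched as a small parser. This is a minimal sketch under assumptions: the delimiter, the two-attribute sample rows, and the field layout (attributes first, 1/0 label last) are hypothetical; real rows would carry 64 SMART attributes.

```python
# Minimal sketch of parsing the SMART data set described above: each line
# holds the SMART attribute readings followed by a 1/0 class label
# (1 = failed drive, 0 = good drive). Delimiter and sample rows are assumptions.

def parse_line(line):
    """Split one CSV line into (attributes, label)."""
    fields = line.strip().split(",")
    attributes = [float(f) for f in fields[:-1]]
    label = int(fields[-1])  # 1 = failed drive, 0 = good drive
    return attributes, label

# Hypothetical two-attribute example rows (real rows would have 64 attributes).
sample = ["35.0,120.0,0", "58.0,990.0,1"]
rows = [parse_line(line) for line in sample]

# Split into good and bad drives, as the presenter describes later.
good = [attrs for attrs, label in rows if label == 0]
bad = [attrs for attrs, label in rows if label == 1]
```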
B
If I go to the Lifeguard runner, the first thing I do is select the feature vector using the z-score technique. I split the data into good and bad. Then NuPIC good and bad models are created. The idea is that when new data comes in, it goes through the good model and the bad model, and anomaly scores are calculated. Based on the anomaly scores, the classification is then done.
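The two-model scheme above can be sketched as follows. This is a stand-in, not the actual NuPIC HTM models: a simple per-feature z-score deviation plays the role of the anomaly score, and a new record is labeled by whichever model (good or bad) finds it less anomalous. All data values are hypothetical.

```python
import math

class SimpleAnomalyModel:
    """Stand-in for a NuPIC model: scores how far a record deviates
    from the per-feature mean of its training data (z-score based)."""

    def fit(self, records):
        n = len(records)
        self.means = [sum(col) / n for col in zip(*records)]
        self.stds = [
            max(math.sqrt(sum((x - m) ** 2 for x in col) / n), 1e-9)
            for col, m in zip(zip(*records), self.means)
        ]
        return self

    def anomaly_score(self, record):
        # Sum of absolute z-scores across features: higher = more anomalous.
        return sum(abs(x - m) / s
                   for x, m, s in zip(record, self.means, self.stds))

# Train one model on good drives and one on failed drives (hypothetical data).
good_records = [[35.0, 100.0], [36.0, 110.0], [34.0, 105.0]]
bad_records = [[55.0, 900.0], [60.0, 950.0], [58.0, 920.0]]
good_model = SimpleAnomalyModel().fit(good_records)
bad_model = SimpleAnomalyModel().fit(bad_records)

def classify(record):
    # A record is labeled by whichever model finds it less anomalous.
    if bad_model.anomaly_score(record) < good_model.anomaly_score(record):
        return "failed"
    return "good"
```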
B
From the data that we printed for the sample data, it can very clearly be seen that it started learning after a while, and the anomaly score becomes zero here. This is what I could achieve, and I would like to thank the Numenta community for helping me out to achieve this. I hope that from the alpha and beta stage, I'm able to get this onto gamma and prod someday. Thank you very much.
A
But the charting still confuses me: I don't really understand what he's charting there. I tried to work it out with them on the post, and this could probably be worked out with some communication here, but I'm still sort of confused. He seems to think he got good results, but from the charts I don't really see that. So that's my takeaway, yeah.
A
[inaudible] I believe I'd advise him to retry: pick one or a small handful of those values that were changing the most, or at least most applicable to the problem, look through it with his own eyes, and just create models with that small number of fields, yeah.
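The suggestion above (keep only the few fields that change the most) can be sketched as a variance ranking. The field names and readings are hypothetical, purely for illustration.

```python
# Hedged sketch of the suggestion above: rank SMART fields by how much
# they vary across readings and keep only the top few for modeling.
# Field names and readings below are made up.

def top_varying_fields(records, field_names, k=2):
    """Return the k field names with the largest variance across records."""
    n = len(records)
    variances = []
    for i, name in enumerate(field_names):
        col = [r[i] for r in records]
        mean = sum(col) / n
        variances.append((sum((x - mean) ** 2 for x in col) / n, name))
    # Sort descending by variance and keep the top k names.
    return [name for _, name in sorted(variances, reverse=True)[:k]]

readings = [
    [30.0, 5.0, 100.0],
    [55.0, 5.0, 140.0],
    [42.0, 6.0, 900.0],
]
fields = ["temperature", "spin_retry_count", "reallocated_sectors"]
selected = top_varying_fields(readings, fields, k=2)
```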
F
I think in general these are kind of great problems, but often you need to spend as much engineering time verifying that you're getting valid results, because it's often very hard to tell, and you need to really expect engineering time on that. So he might have done that, but I couldn't tell that he had.
D
I think this is a nice application idea, because hard drives are really mission-critical devices, obviously. And you could imagine there would be fluctuations in some measurements before a catastrophic failure, and I could imagine they would be temporal in nature, not spikes or anything like that. So to the extent that you can detect those fluctuations before a hard drive failed, that could be pretty important. Agreed, there's potential here, I think.
E
Yeah, right, interesting. So, being familiar with what SMART collects: some things are failure rates, and if you see a discontinuity in failure rates, then it's worth learning about. Other things, though, do have temporal drift, like ambient temperature, and in fact there you probably do want to use a hard threshold, mm-hmm, and not... yeah.
F
I think in general you could build one model, and if you built the model to do classification, you'd have to have labeled data, and we don't... we don't. We've done this internally ourselves. There's not a lot of good tools, I think, right now available to make it easy for other people to do that. So it's totally capable of doing that, the algorithms are totally capable of doing that, but we haven't really exposed it. And, well...
F
We found that, since we focus on anomaly detection, people were being clever and they were using anomaly detection for classification like this, and it seems to work pretty well. So is this really the beautiful way of doing it? No. Does it work? Yeah. And now a lot of people have done it. So, but ultimately we can... we can do it; I don't know, we can do a better job.