►
From YouTube: Clustering in ML.NET
Description
Video to show how to perform the clustering machine learning algorithm with K-Means in ML.NET.
Text version of this sample - https://jonwood.co/blog/2019/3/2/clustering-in-mlnet
Data used - https://www.kaggle.com/dongeorge/seed-from-uci
Sample code - https://github.com/jwood803/MLNetExamples/tree/master/MLNetExamples/SeedClustering
Contact:
Twitter: https://twitter.com/JWood/
Blog: https://jonwood.co/
A
Hey,
so
in
my
other
in
Medinah
videos,
I
used
datasets
that
had
a
label
in
them
so
that
the
algorithms
can
use
that
to
make
a
predictive
model.
However,
there
are
some
datasets
that
don't
have
those
labels
on
them,
but
we
could
still
use
them
for
some
machine
learning
models
when
popular
type
is
called
clustering,
which
is
where
the
algorithm
will
attempt
to
cluster
or
a
categorize.
The
data
into
sections
that
have
similar
patterns
and
we
can
create
a
close
for
your
model
in
a
mode
net.
A
The
data
I'll
be
using
in
this
video
is
the
wheat
seed
data
which
you
can
find
at
angle.
This
data
has
different
attributes
of
different
wheat
seeds,
such
as
area
perimeter,
length
and
width
of
each
seed.
This
data
will
determine
if
each
seed
is
in
a
certain
variety
of
wheat,
such
as
comma
Rosa
or
Canadian.
A
Alright,
so
I'm
back
in
visual
studio
with
the
usual
dotnet
core
console
project
loaded
up
and
I
already
have
a
mode
that
it
downloaded,
and
my
data
is
in
the
solution
here.
One
thing
I
tend
to
forget:
sometimes
when
I
have
data
in
the
solution
like
this
is
to
remember,
to
mark
the
file
to
be
copied
to
the
output.
Otherwise,
the
program
won't
be
able
to
find
it.
Now,
if
you've
seen
my
other
videos
that
you
videos
that
use
em
on
its
new
AP
app,
then
you
know
what
happens
next.
A
I'll
start
by
defining
the
training
data
location
and
then
create
the
Emma
context.
Instance.
Next
to
reading
the
data
are
use.
The
same
context,
data
that
text
reader
class
that
we've
done
a
few
times
before
and
then
tell
out
how
to
read
in
the
file
and
the
separate
area
is
going
to
be
a
comma
and
it's
gonna
have
a
header.
A
A
A
Now,
since
this
data
set
already
has
the
labels,
we
can
go
through
it
and
notice
that
there
are
only
three
unique
labels,
so
we
can
specify
the
closure
count
to
be
3
here
in
berry
Mon.
You
would
not
always
have
this
in
your
data
set,
so
you
would
have
to
experiment
with
other
cluster
counts,
to
get
a
optimal
cluster,
and
now
that
we
have
the
pipeline.
I
can
call
a
fit
method
on
it
and
passing
in
the
training
data
and
to
perform
the
same
transformations
on
the
testing
data.
A
So
now
that
we
have
our
model
created,
let's
now
evaluate
on
it
to
do
that.
I
can
call
a
context
that
clustering,
that
evaluate
method
and
give
it
a
transform
to
testing
data,
then
specify
the
score.
Column
is
named
score
and
the
features
column
is
named
features
and
the
metric
I'll
be
looking
at
is
the
average
minimum
score
and
you
can
think
of
this
score
for
clustering
as
the
average
distance,
from
all
examples
to
the
center
point
of
their
cluster
and
so
the
lower
the
number
here
then,
the
better.
A
So
with
the
prediction
engine
created,
I
can
now
call
the
predict
method
on
it
and
give
it
a
seed
bit
a
class
instance
as
the
input
and
I
have
some
random
values
for
the
fields
on
this
class.
Well,
I'll
paste
that
in
and
to
see
what
cluster
this
is
predicted
predicted
to
be
and
I
can
look
at
this
selected
cluster
ID.