Description
Data preparation gets your data into a workable state so that a machine learning algorithm can use it. This video shows how to select columns, drop columns, take rows, shuffle rows, and filter rows in ML.NET to start preparing your data.
Code - https://github.com/jwood803/MLNetExamples/blob/master/MLNetExamples/DataPrepRowsColumns/Program.cs
Contact:
Twitter: https://twitter.com/JWood/
Blog: https://jonwood.co/
So often in machine learning, you will need to perform some data preparation steps on your data to get it ready for the machine learning algorithms, and there are quite a few things you can do to prepare data. I have these as a series of videos, and in this video I'll start by showing how you can select and drop columns and take, shuffle, and filter rows in ML.NET.
Alright, so we're here in Visual Studio with a console project loaded, and as you can see, I have some setup already done here. I have the ML.NET NuGet package already installed, I've already created my context, and I've already loaded in the data. We're using the housing data set again here, and I already have my input schema set up.
The first thing I want to show is how you can select columns. These are the columns you want to keep from your original data set. SelectColumns is a transform, so it's context.Transforms.SelectColumns, and we give it the column names as strings.
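In ML.NET, SelectColumns returns an estimator: you call `Fit` and then `Transform` to get a view that contains only the named columns. To make that behavior concrete without the library, here is a minimal plain-Python sketch that treats each row as a dict. The `select_columns` helper and the sample values are made up for illustration; this is not ML.NET code.

```python
from typing import Dict, List

Row = Dict[str, float]

def select_columns(rows: List[Row], *names: str) -> List[Row]:
    """Hypothetical helper (not an ML.NET API): keep only the named columns."""
    missing = [n for n in names if rows and n not in rows[0]]
    if missing:
        raise KeyError(f"unknown column(s): {missing}")
    # Build new row dicts containing just the requested columns, in the order given.
    return [{n: row[n] for n in names} for row in rows]

# Made-up rows that use schema-style column names like the ones in the video.
rows = [
    {"Longitude": -122.2, "Latitude": 37.9, "HousingMedianAge": 41.0, "TotalBedrooms": 130.0},
    {"Longitude": -122.2, "Latitude": 37.9, "HousingMedianAge": 21.0, "TotalBedrooms": 1100.0},
]

selected = select_columns(rows, "HousingMedianAge", "TotalBedrooms")
print(selected[0])  # {'HousingMedianAge': 41.0, 'TotalBedrooms': 130.0}
```

In ML.NET itself the rough equivalent is `context.Transforms.SelectColumns("HousingMedianAge", "TotalBedrooms")`, followed by `Fit(data).Transform(data)`.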
Now, one thing to note here: if you look at our original data set, the headers are spelled differently. They have underscores and all that, but in our input schema we removed the underscores and even used different casing. Since we load the data with this schema class, when you call SelectColumns you need to pass in the column names as they're defined in the schema class, not the headers from the original data file.
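As far as I know, a column loaded through an ML.NET schema class takes the name of the class property (a `[ColumnName]` attribute can override that), which is why the raw CSV header no longer works. The plain-Python sketch below imitates that renaming step. The snake_case to PascalCase rule in the hypothetical `to_schema_name` helper is an assumption; the schema class in the video may name things differently, and the data lines are made up.

```python
import csv
import io
from typing import Dict, List

def to_schema_name(header: str) -> str:
    """Hypothetical naming rule: 'housing_median_age' -> 'HousingMedianAge'."""
    return "".join(part.capitalize() for part in header.split("_"))

def load_rows(text: str) -> List[Dict[str, str]]:
    """Read CSV text and rename every header to its schema-style name."""
    reader = csv.DictReader(io.StringIO(text))
    return [{to_schema_name(k): v for k, v in row.items()} for row in reader]

# Made-up data lines with underscore-style headers like the raw file.
raw = (
    "longitude,latitude,housing_median_age,total_bedrooms\n"
    "-122.2,37.9,41,130\n"
    "-122.2,37.9,21,1100\n"
)
rows = load_rows(raw)

print(sorted(rows[0]))  # ['HousingMedianAge', 'Latitude', 'Longitude', 'TotalBedrooms']
# The raw header name is gone after loading; only the schema-style name exists.
print("housing_median_age" in rows[0], "HousingMedianAge" in rows[0])  # False True
```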
To see our data easily, I have a DisplayColumns helper. It takes in the IDataView from ML.NET, calls the Preview method on it, and tells it to give us only the first five rows. From the preview I get all the rows, and then I just print out the keys and the values for each row, so that's the column name and the value of the column.
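The helper itself is in the linked Program.cs; in ML.NET, `Preview(maxRows)` returns a `DataDebuggerPreview` whose `RowView` entries expose `Values` as column-name/value pairs. Here is a rough plain-Python analogue of that preview-and-print loop. The `display_columns` name and its return value are assumptions made so the output can be checked; they are not part of ML.NET.

```python
from typing import Dict, List

def display_columns(rows: List[Dict[str, float]], max_rows: int = 5) -> List[str]:
    """Hypothetical helper: print 'column: value' for the first max_rows rows.

    Returns the printed lines so they can be inspected.
    """
    lines: List[str] = []
    for row in rows[:max_rows]:         # similar in spirit to Preview(maxRows: 5)
        for key, value in row.items():  # similar to iterating a row's key/value pairs
            lines.append(f"{key}: {value}")
        lines.append("")                # blank line between rows
    print("\n".join(lines))
    return lines

# Ten made-up rows; only the first five are printed.
rows = [{"HousingMedianAge": float(age), "TotalBedrooms": 100.0 + age} for age in range(10)]
printed = display_columns(rows)
```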
With that transform set up, we can call DisplayColumns on the transformed data, and I'll add a Console.ReadLine so we don't need any breakpoints. So let's run this and see what it looks like. All right, we just get the housing median age and the total bedrooms, the two columns that we told it to select.
I'll comment that out, and next is DropColumns. Instead of selecting the columns you want to use in your algorithms, DropColumns just drops the columns you don't need. Similar to SelectColumns, this is a transform, so it's context.Transforms.DropColumns, and again we just give it the names of the columns we want to drop as strings. I'll drop the latitude and longitude columns.
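DropColumns is the mirror image of SelectColumns: you name what to remove instead of what to keep. A minimal plain-Python sketch of that idea follows; `drop_columns` is a hypothetical helper and the rows are made up, so treat it as an illustration rather than ML.NET behavior.

```python
from typing import Dict, List

Row = Dict[str, float]

def drop_columns(rows: List[Row], *names: str) -> List[Row]:
    """Hypothetical helper: return copies of the rows without the named columns."""
    to_drop = set(names)
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]

# Made-up rows; latitude and longitude come first, as in the raw file.
rows = [
    {"Longitude": -122.2, "Latitude": 37.9, "HousingMedianAge": 41.0, "Population": 300.0},
    {"Longitude": -122.2, "Latitude": 37.9, "HousingMedianAge": 21.0, "Population": 2400.0},
]

dropped = drop_columns(rows, "Latitude", "Longitude")
print(list(dropped[0]))  # ['HousingMedianAge', 'Population']
```

The original rows are left untouched, much like an ML.NET transform produces a new view rather than editing the source data.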
If we go back to our original data set, latitude and longitude are the first two columns. So if I run that, we see the output starts at housing median age, and we don't see latitude or longitude anywhere in our data.
The next thing we can do is shuffle our rows. If you want a random shuffle, for sampling your data or something like that, you can call the ShuffleRows method. This one is not a transform; it's on the context's Data property, so it's context.Data.ShuffleRows, and we give it our data. We also have the option to give it a seed parameter; giving it a seed tells it to shuffle the same way every time.
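The point about the seed is easy to check in a small sketch: two shuffles with the same seed produce the same order. This plain-Python version uses `random.Random(seed).shuffle` on an in-memory list; the `shuffle_rows` helper is hypothetical. As I understand it, ML.NET's `ShuffleRows` shuffles through a fixed-size pool (its `shufflePoolSize` parameter) rather than permuting everything in memory, so treat this as a simplification.

```python
import random
from typing import Dict, List, Optional

Row = Dict[str, float]

def shuffle_rows(rows: List[Row], seed: Optional[int] = None) -> List[Row]:
    """Hypothetical helper: return a shuffled copy; the same seed gives the same order."""
    shuffled = list(rows)                  # copy so the input list is left untouched
    random.Random(seed).shuffle(shuffled)  # seed=None draws fresh randomness each call
    return shuffled

rows = [{"RowId": float(i)} for i in range(10)]
a = shuffle_rows(rows, seed=42)
b = shuffle_rows(rows, seed=42)
print(a == b)  # True: same seed, same order
```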
Using context.Data.TakeRows, we got just the first two rows from my data set.
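TakeRows keeps only a given number of rows from the start of the data; in ML.NET it is `context.Data.TakeRows(data, count)`. A plain-Python sketch of the same idea, with a hypothetical `take_rows` helper and made-up rows:

```python
from itertools import islice
from typing import Dict, Iterable, List

Row = Dict[str, float]

def take_rows(rows: Iterable[Row], count: int) -> List[Row]:
    """Hypothetical helper: keep only the first `count` rows."""
    # islice stops after `count` items, so a long or lazy iterable is not fully read.
    return list(islice(rows, count))

rows = [{"RowId": float(i)} for i in range(10)]
first_two = take_rows(rows, 2)
print(first_two)  # [{'RowId': 0.0}, {'RowId': 1.0}]
```

Taking rows after a seeded shuffle gives a reproducible random sample instead of always the top of the file.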
The last thing we can do is filter our rows. This is also on the Data property of the context, and we can filter rows by column. But I want you to note that there is also a FilterRowsByMissingValues method. We don't have any missing values in this data set, but keep in mind that it's there if you need it.
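Since this data set has no missing values, the video doesn't demonstrate FilterRowsByMissingValues. Here is a plain-Python sketch of the idea: drop every row where any of the listed columns is missing. Treating both `None` and NaN as missing is an assumption of this sketch (ML.NET represents missing floating-point values as NaN), and `filter_rows_by_missing_values` is a hypothetical helper, not the ML.NET method.

```python
import math
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]

def is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def filter_rows_by_missing_values(rows: List[Row], *columns: str) -> List[Row]:
    """Hypothetical helper: keep rows with no missing value in the listed columns."""
    return [row for row in rows if not any(is_missing(row[c]) for c in columns)]

# Made-up rows: one NaN in TotalBedrooms, one None in Population.
rows = [
    {"TotalBedrooms": 130.0, "Population": 300.0},
    {"TotalBedrooms": float("nan"), "Population": 2400.0},
    {"TotalBedrooms": 500.0, "Population": None},
]
kept = filter_rows_by_missing_values(rows, "TotalBedrooms")
print(len(kept))  # 2
```

Only the listed columns are checked, which is why the third row survives despite its missing population.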
Say I want to filter on the population column. I'll set the lower bound to zero and the upper bound to a thousand. As you can probably tell, this one works only with numerical data.
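My understanding of ML.NET's `FilterRowsByColumn` documentation is that it keeps rows whose value lies in the half-open range from `lowerBound` (inclusive) to `upperBound` (exclusive). Under that assumption, a population of exactly 1000 would also be filtered out. A plain-Python sketch with a hypothetical `filter_rows_by_column` helper and made-up values:

```python
import math
from typing import Dict, List

Row = Dict[str, float]

def filter_rows_by_column(rows: List[Row], column: str,
                          lower_bound: float = -math.inf,
                          upper_bound: float = math.inf) -> List[Row]:
    """Hypothetical helper: keep rows where lower_bound <= row[column] < upper_bound."""
    return [row for row in rows if lower_bound <= row[column] < upper_bound]

rows = [{"Population": p} for p in (300.0, 2400.0, 999.0, 1000.0, 0.0)]
kept = filter_rows_by_column(rows, "Population", lower_bound=0, upper_bound=1000)
print([r["Population"] for r in kept])  # [300.0, 999.0, 0.0]
```

With the default bounds every finite value passes, so the filter only bites on the side you set.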
Now we can display those. Looking at population here, we got nothing over a thousand. If we look at our original data set, we can see that in the first couple of rows the population is actually over a thousand, so the filter removed those rows. All right.