Description
Using ML.NET to remove stop words in text data.
Code - https://github.com/jwood803/MLNetExamples/blob/master/MLNetExamples/StopWords/Program.cs
ML.NET NLP Playlist - https://www.youtube.com/playlist?list=PLl_upHIj19ZzYBP8I7l9MDQY3r6HbVxWw
ML.NET Playlist - https://www.youtube.com/playlist?list=PLl_upHIj19Zy3o09oICOutbNfXj332czx
Contact:
Twitter: https://twitter.com/JWood/
Blog: https://jonwood.co/
Gear used (affiliate links):
Mic - https://amzn.to/2YEXtxI
Mouse - https://amzn.to/2ZtASoQ
Hey everyone. In the last ML.NET video that we did, we went over tokenization for natural language processing. To continue that natural language processing type of pre-processing for text data, I'll go over how to remove stop words from your text data. Now, stop words are words that we, as people, use for extra context, to help people understand what we mean when we're talking. But for machines, stop words just add extra noise. They don't really mean anything in terms of the overall text, so we remove that noise and leave only the words that we actually care for the machine to train on for our natural language processing.
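As a rough illustration of the idea, here is a minimal sketch in plain C# with no ML.NET involved. The tiny stop-word list and the StopWordSketch class name are assumptions made up for this example; real stop-word lists are much longer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordSketch
{
    // Hypothetical, deliberately tiny stop-word list for illustration only.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "this", "is", "a", "and", "it", "one"
        };

    // Keep only the tokens that are not in the stop-word list.
    public static string[] RemoveStopWords(IEnumerable<string> tokens) =>
        tokens.Where(token => !StopWords.Contains(token)).ToArray();

    public static void Main()
    {
        var tokens = new[] { "This", "is", "a", "test", "sentence" };

        // prints "test, sentence"
        Console.WriteLine(string.Join(", ", RemoveStopWords(tokens)));
    }
}
```

The case-insensitive comparer is a design choice of this sketch so that "This" and "this" are treated the same.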
Alright, so I'm in Visual Studio. Here I have a .NET Core console project loaded, and I already have something set up here: I have ML.NET installed, using version 1.3.1, and I have a couple of data classes here, the text data class.
So we can do the same thing that we did before in the previous videos and use context.Transforms.Text.TokenizeIntoWords. We use Tokens as our output column and Text as our input column, and we set the separators like we did before. We set it as a new character array, and we'll do a space, a period, and a comma.
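The tokenization step described here might look roughly like the sketch below. The TextData and TextTokens class shapes and the TokenizeSketch wrapper are assumptions for this example; only the column names and separators come from the narration.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

// Assumed input shape: a single text property.
public class TextData
{
    public string Text { get; set; }
}

// Assumed output shape: the tokens produced by the pipeline.
public class TextTokens
{
    public string[] Tokens { get; set; }
}

public static class TokenizeSketch
{
    public static string[] Tokenize(string text)
    {
        var context = new MLContext();

        // The tokenizer needs no training, so an empty data set is enough to fit it.
        var data = context.Data.LoadFromEnumerable(new List<TextData>());

        // Output column "Tokens", input column "Text"; split on space, period, and comma.
        var pipeline = context.Transforms.Text.TokenizeIntoWords(
            "Tokens", "Text", separators: new[] { ' ', '.', ',' });

        var model = pipeline.Fit(data);
        var engine = context.Model.CreatePredictionEngine<TextData, TextTokens>(model);

        return engine.Predict(new TextData { Text = text }).Tokens;
    }

    public static void Main()
    {
        var tokens = Tokenize("This is a test sentence, and it is a good one.");
        Console.WriteLine(string.Join(" | ", tokens));
    }
}
```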
But after we get the tokens, we can append onto this pipeline. This is where we can use the transforms again, and this time, on the Text property, we can use RemoveDefaultStopWords. In here we can give it the same output column of Tokens and the same input column of Tokens, since Tokens is the output column of the previous transform.
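A minimal sketch of chaining the two transforms, assuming the same hypothetical TextData class as in the project. The BuildModel helper and the StopWordPipelineSketch class name are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

// Assumed input shape: a single text property.
public class TextData
{
    public string Text { get; set; }
}

public static class StopWordPipelineSketch
{
    public static ITransformer BuildModel(MLContext context)
    {
        // Neither transform learns from data, so fitting on an empty set is enough.
        var data = context.Data.LoadFromEnumerable(new List<TextData>());

        var pipeline = context.Transforms.Text
            // Step 1: split the "Text" column into the "Tokens" column.
            .TokenizeIntoWords("Tokens", "Text", separators: new[] { ' ', '.', ',' })
            // Step 2: read "Tokens" and write the filtered result back to "Tokens".
            .Append(context.Transforms.Text.RemoveDefaultStopWords("Tokens", "Tokens"));

        return pipeline.Fit(data);
    }

    public static void Main()
    {
        var model = BuildModel(new MLContext());
        Console.WriteLine(model != null ? "Pipeline fitted." : "Pipeline failed.");
    }
}
```

Reusing Tokens as both input and output means the filtered tokens replace the unfiltered ones for anything later in the pipeline.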
Then we can perform a kind of prediction on it, where it runs those transforms on some input data. So here's engine.Predict, and we create a new text data class and give it some input text to perform the pre-processing on. Let's see, we can do "This is a test sentence, and it is a good one." Just do that as an example there. And like the previous video, I have this print tokens method, which is a helper method to print out the tokens to the console.
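Put together, running the transforms on one sentence could be sketched as follows. The class shapes, the Filter wrapper, and this version of the PrintTokens helper are assumptions for illustration and may differ from the code shown in the video.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

// Assumed input shape: a single text property.
public class TextData
{
    public string Text { get; set; }
}

// Assumed output shape: the tokens left after stop-word removal.
public class TextTokens
{
    public string[] Tokens { get; set; }
}

public static class DefaultStopWordsSketch
{
    public static string[] Filter(string text)
    {
        var context = new MLContext();
        var data = context.Data.LoadFromEnumerable(new List<TextData>());

        var pipeline = context.Transforms.Text
            .TokenizeIntoWords("Tokens", "Text", separators: new[] { ' ', '.', ',' })
            .Append(context.Transforms.Text.RemoveDefaultStopWords("Tokens", "Tokens"));

        // The prediction engine runs the fitted transforms on a single input object.
        var engine = context.Model.CreatePredictionEngine<TextData, TextTokens>(pipeline.Fit(data));

        return engine.Predict(new TextData { Text = text }).Tokens;
    }

    // Hypothetical helper: writes one token per line to the console.
    public static void PrintTokens(string[] tokens)
    {
        foreach (var token in tokens)
        {
            Console.WriteLine(token);
        }
    }

    public static void Main()
    {
        PrintTokens(Filter("This is a test sentence, and it is a good one."));
    }
}
```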
So here's the text tokens for that. Let's run it and see what we get. And I forgot to add the Console.ReadLine, so I'll add that real quick. Let's run this again, and our console will stay up this time. All right, so we see we only get three words back here: "test", "sentence", and "good". Everything else ML.NET considered a stop word, so it got rid of it. So that's how you can use the built-in default stop words that ML.NET provides.
It's the previous predict that we had before, just with a new engine. I'll keep the same text as it was, and I'll print out these tokens and just run this and see what we get. All right, so that came back. We see we get the original results with the default stop words, but in the second set we get the set where we just removed our custom stop words, which was just two words.
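A custom stop-word step can be sketched with the RemoveStopWords transform, which takes the words to drop. The two words used here ("and" and "a") are placeholders chosen for this example, since the narration does not say which two words were used; the class shapes and the RemoveCustomStopWords wrapper are also assumptions.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

// Assumed input shape: a single text property.
public class TextData
{
    public string Text { get; set; }
}

// Assumed output shape: the tokens left after stop-word removal.
public class TextTokens
{
    public string[] Tokens { get; set; }
}

public static class CustomStopWordsSketch
{
    public static string[] RemoveCustomStopWords(string text, params string[] stopWords)
    {
        var context = new MLContext();
        var data = context.Data.LoadFromEnumerable(new List<TextData>());

        var pipeline = context.Transforms.Text
            .TokenizeIntoWords("Tokens", "Text", separators: new[] { ' ', '.', ',' })
            // Only the words passed in are removed; the default dictionary is not used.
            .Append(context.Transforms.Text.RemoveStopWords("Tokens", "Tokens", stopWords));

        var engine = context.Model.CreatePredictionEngine<TextData, TextTokens>(pipeline.Fit(data));

        return engine.Predict(new TextData { Text = text }).Tokens;
    }

    public static void Main()
    {
        // Placeholder custom stop words for illustration.
        var tokens = RemoveCustomStopWords(
            "This is a test sentence, and it is a good one.", "and", "a");

        Console.WriteLine(string.Join(" | ", tokens));

        // Keeps the console window open until Enter is pressed.
        Console.ReadLine();
    }
}
```

Because only two words are filtered out, far more of the sentence survives than with the default dictionary.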
So we get a lot more data back here. And that's not only how to use the default built-in stop word dictionary that comes with ML.NET, but also how you can remove your own stop words, in case you just want to use a small subset of a list of stop words. And so thanks for watching, and we'll see you all next time.