Description
Using ML.NET to tokenize text data to get it ready for machine learning algorithms.
Hey everyone, so in this video I'll go over some natural language processing in ML.NET. Specifically, I'll show how to tokenize text data so you can pre-process it and get it ready to be used with a machine learning algorithm. But first, real quick:
What exactly does it mean to tokenize text data? When you do that, you essentially split it into each word, or even into each letter, of the input data.
And actually, I forgot to bump up this text here. Sorry about that. There you go, this should be better.
So, we create the MLContext, and we create the empty data, which is just an empty list of our TextData class. With that, we create the IDataView from the empty data set with the LoadFromEnumerable method.
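The setup described so far might look something like this. This is a minimal sketch assuming the Microsoft.ML NuGet package; `TextData` is the simple single-column input class assumed here, the exact names in the video may differ:

```csharp
using System.Collections.Generic;
using Microsoft.ML;

// Input schema: a single text column. (Illustrative class name.)
public class TextData
{
    public string Text { get; set; }
}

class Program
{
    static void Main()
    {
        var mlContext = new MLContext();

        // An empty list is enough: the tokenizer only needs the schema,
        // not any training data, to build its transform.
        var emptyData = new List<TextData>();
        IDataView data = mlContext.Data.LoadFromEnumerable(emptyData);
    }
}
```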
And similar to creating a machine learning pipeline and doing prediction on it, we'll create a prediction engine: we give it the input and output schemas and pass in the model. To tokenize our string, we call Engine.Predict, passing in our TextData with its Text property set to some text that we want to tokenize.
So let's see what we're going to use. We'll give it a text of, you know: "ML.NET is great for machine learning and even deep learning."
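Putting the pieces together, the word-tokenization step might be sketched like this, again assuming the Microsoft.ML package and illustrative `TextData`/`TextTokens` class names:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

public class TextData
{
    public string Text { get; set; }
}

// Output schema: the tokens produced by the transform.
public class TextTokens
{
    public string[] Tokens { get; set; }
}

class Program
{
    static void Main()
    {
        var mlContext = new MLContext();
        var emptyData = mlContext.Data.LoadFromEnumerable(new List<TextData>());

        // Split the Text column into word tokens, stored in a Tokens column.
        var pipeline = mlContext.Transforms.Text.TokenizeIntoWords("Tokens", "Text");
        var model = pipeline.Fit(emptyData);

        var engine = mlContext.Model.CreatePredictionEngine<TextData, TextTokens>(model);
        var result = engine.Predict(new TextData
        {
            Text = "ML.NET is great for machine learning and even deep learning."
        });

        Console.WriteLine(string.Join(Environment.NewLine, result.Tokens));
    }
}
```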
So each word got split out on its own. You'll notice we still have some punctuation here, though, and one way to fix that is with the separators parameter: we can pass in a period, a comma, and any other punctuation that we want it to recognize.
So let's run this again and see what the difference is. There you go: now we have no comma here, no punctuation. So you can use that separators parameter to handle all the punctuation within your text data. Alright!
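The separators tweak is just a different call to the same transform. A sketch of the changed pipeline line, under the same assumptions as before (Microsoft.ML package, illustrative `TextData`/`TextTokens` classes):

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

public class TextData
{
    public string Text { get; set; }
}

public class TextTokens
{
    public string[] Tokens { get; set; }
}

class Program
{
    static void Main()
    {
        var mlContext = new MLContext();
        var emptyData = mlContext.Data.LoadFromEnumerable(new List<TextData>());

        // Passing extra separator characters tells the tokenizer to split on
        // punctuation as well, so it doesn't stay attached to the words.
        var pipeline = mlContext.Transforms.Text.TokenizeIntoWords(
            "Tokens", "Text", separators: new[] { ' ', '.', ',' });

        var engine = mlContext.Model.CreatePredictionEngine<TextData, TextTokens>(
            pipeline.Fit(emptyData));

        var result = engine.Predict(new TextData
        {
            Text = "ML.NET is great for machine learning and even deep learning."
        });

        Console.WriteLine(string.Join(" | ", result.Tokens));
    }
}
```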
So that's how you tokenize into words. What if you want to tokenize into the different characters, or letters, within your input data? In kind of a similar way, we create a little bit of a pipeline using a similar transform: Transforms.Text.TokenizeIntoCharactersAsKeys instead of TokenizeIntoWords. We'll pass in Tokens as our output column and Text as our input column, and for useMarkerCharacters we'll just set that to false. Now, along with that, I'll append another transform.
That's the Transforms.Conversion.MapKeyToValue transform. Because our input column is Tokens and our output column is also Tokens, we can just pass that one name in, since it's the same for both. We do that because the transform is going to tokenize as keys, so we need to map those keys back to values.
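The character-level pipeline described above might be sketched like this, with the same assumptions as the earlier examples (Microsoft.ML package, illustrative `TextData`/`TextTokens` classes):

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

public class TextData
{
    public string Text { get; set; }
}

public class TextTokens
{
    public string[] Tokens { get; set; }
}

class Program
{
    static void Main()
    {
        var mlContext = new MLContext();
        var emptyData = mlContext.Data.LoadFromEnumerable(new List<TextData>());

        // Tokenize into characters; the output is stored as key types,
        // so MapKeyToValue converts the keys back into readable characters.
        var pipeline = mlContext.Transforms.Text
                .TokenizeIntoCharactersAsKeys("Tokens", "Text", useMarkerCharacters: false)
            .Append(mlContext.Transforms.Conversion.MapKeyToValue("Tokens"));

        var engine = mlContext.Model.CreatePredictionEngine<TextData, TextTokens>(
            pipeline.Fit(emptyData));

        var result = engine.Predict(new TextData
        {
            Text = "ML.NET is great for machine learning and even deep learning."
        });

        Console.WriteLine(string.Join(Environment.NewLine, result.Tokens));
    }
}
```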
Then, after that, we do the Fit on that transform pipeline, still on that empty data that we created before. We create an engine, call Predict on it, and, you know what, we'll give it the same input here.
All right, so here's our first one, where we tokenized each word, and here's our second one: each character has been tokenized on its own. And these kind of question-mark things here indicate that there's a space.
Alright, that's about it. I just wanted to show you how you can tokenize within ML.NET, to kind of prepare for some natural language processing.
So thanks for watching everyone, I hope you learned from this video, and if you enjoyed it, please like and subscribe so you can get more content.