From YouTube: Natural Language Processing in ML.NET: Producing N-Grams
Description
How to produce n-grams from text data in ML.NET.
Code - https://github.com/jwood803/MLNetExamples/blob/master/MLNetExamples/NGrams/Program.cs
N-Gram article - https://blog.xrds.acm.org/2017/10/introduction-n-grams-need/
ML.NET NLP Playlist - https://www.youtube.com/playlist?list=PLl_upHIj19ZzYBP8I7l9MDQY3r6HbVxWw
ML.NET Playlist - https://www.youtube.com/watch?v=8gVhJKszzzI&list=PLl_upHIj19Zy3o09oICOutbNfXj332czx
Contact:
Twitter: https://twitter.com/JWood/
Blog: https://jonwood.co/
Gear used (affiliate links):
Mic - https://amzn.to/2YEXtxI
Mouse - https://amzn.to/2ZtASoQ
Hey everyone, so in this video I want to show you how you can produce n-grams in ML.NET. But before I go straight to the code, I want to very briefly go over what n-grams are and how they are useful. The more technical definition is that n-grams are sequences of n words together, and n can be however many you want; usually you'll see these as two-grams or three-grams. Let's go through just a couple of examples.
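As a quick illustration (this snippet is mine, not from the video's code), here is what bigrams look like for a short token list:

    using System;
    using System.Linq;

    // For the tokens ["the", "quick", "brown", "fox"], the bigrams (n = 2) are
    // "the quick", "quick brown", and "brown fox"; the trigrams (n = 3) would
    // be "the quick brown" and "quick brown fox".
    string[] tokens = { "the", "quick", "brown", "fox" };
    var bigrams = tokens.Zip(tokens.Skip(1), (a, b) => $"{a} {b}");
    Console.WriteLine(string.Join(", ", bigrams)); // the quick, quick brown, brown fox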
The input column that we're going to use is going to be the tokens column that we created in the two transforms above, and we can tell it what n-gram lengths we want to use. So if we want to get bigrams, trigrams, or anything else above that, we can set it here; I'll just limit it to two. And to actually limit the n-gram length, we need to set the useAllLengths parameter to false.
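Here is a minimal sketch of what the pipeline described so far might look like. The column names ("Text", "Tokens", "Ngrams") and the use of TokenizeIntoWords and MapValueToKey as the two upstream transforms are my assumptions based on the narration, not a copy of the video's code:

    using Microsoft.ML;
    using Microsoft.ML.Transforms.Text;

    var mlContext = new MLContext();

    // Tokenize the input text, map the tokens to keys (ProduceNgrams expects
    // key-typed input), then produce the n-grams.
    var pipeline = mlContext.Transforms.Text.TokenizeIntoWords("Tokens", "Text")
        .Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))
        .Append(mlContext.Transforms.Text.ProduceNgrams("Ngrams", "Tokens",
            ngramLength: 2,       // limit to bigrams
            useAllLengths: false, // otherwise all lengths up to ngramLength are produced
            weighting: NgramExtractingEstimator.WeightingCriteria.Tf)); // discussed below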
Otherwise, it's going to get the other lengths of n-grams as well. We can also give it a weighting; here that's the weighting criterion, and we have a couple of different choices. Let's briefly go over these.
First, we have Tf, which is term frequency; that gets the frequency, the number of times the term appears within the corpus. The corpus, in terms of natural language processing, is pretty much our input text data, so it's going to be what we have up here.
Then we have Idf, which is inverse document frequency, which tells how rare the term is within the corpus. And then you have TfIdf, which is the product of the term frequency and the inverse document frequency.
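For reference, the standard formulations (my summary; the video doesn't spell these out) are: tf(t) is the raw count of term t; idf(t) = log(N / n_t), where N is the number of documents and n_t is the number of documents containing t; and tf-idf(t) = tf(t) × idf(t). So a term that appears often, but only in a few documents, gets a high tf-idf score.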
For this example, I'll just keep it as term frequency.
Here we go. Now that we have our n-gram pipeline, we can fit it on our data here.
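Fitting and transforming would look something like this (again a sketch, assuming a data IDataView loaded earlier):

    // Fit the pipeline and run the data through it.
    var transformer = pipeline.Fit(data);
    var transformedData = transformer.Transform(data);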
So now we have our data transformed into n-grams here, but we still have it as an IDataView. There are some more steps we can do to actually produce the n-grams themselves.
The first thing we need to do is get the n-gram slot names, and we get that with the GetSlotNames method from the ML.Data namespace; this takes in a reference to a VBuffer of ReadOnlyMemory of characters. So let's create that.
I'll call it slotNames and just set it to the default value of this type. Let's see, we do have an error here: we need to upgrade our language version to use that, and Visual Studio can do that for us. Then we just pass the ref of slotNames, and from there I can get the n-grams column using the GetColumn method.
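Concretely, that part might look like this (a sketch; I'm assuming the n-gram output column is named "Ngrams"):

    using System;
    using Microsoft.ML.Data;

    // GetSlotNames fills a VBuffer with the n-gram names; the "default"
    // literal on a declared type needs C# 7.1 or later.
    VBuffer<ReadOnlyMemory<char>> slotNames = default;
    transformedData.Schema["Ngrams"].GetSlotNames(ref slotNames);

    // Each row is a vector of n-gram weights, aligned with the slot names.
    var ngramsColumn = transformedData.GetColumn<VBuffer<float>>("Ngrams");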
Okay, so now we have this reference to the n-grams column, which is what we get from this transform up here. Let's do a Console.WriteLine: I'm going to do a foreach loop over each row in our n-grams column, and we're going to do another foreach within there, for each item in the row's items, and then we just do our Console.WriteLine.
Here we can use the slot names item at that item's key index. I'll just create an empty line after each row, and then I'll do a Console.ReadLine so the console doesn't disappear when I run this.
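The printing loop might look like this (sketch, same assumed column and variable names as above; System.Linq is needed for ToArray):

    // Print each n-gram's name and its weighting value, one block per row.
    var slots = slotNames.DenseValues().ToArray();
    foreach (var row in ngramsColumn)
    {
        foreach (var item in row.Items())
        {
            Console.WriteLine($"{slots[item.Key]} - {item.Value}");
        }
        Console.WriteLine(); // empty line between rows
    }
    Console.ReadLine(); // keep the console window open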
Now let's actually run this and see what we get. Here we go: we got our n-grams from the first input, and then the first and the second input together here. You can see we get two items here, so we do get our bigrams.