From YouTube: DFFML Overview and Status Update: 2020-03-19
Description
As of 2022-09-23 not much has changed fundamentals wise so this is still a solid overview.
- https://intel.github.io/dffml/main/examples/icecream_sales.html
- https://intel.github.io/dffml/main/examples/shouldi.html
- https://intel.github.io/dffml/main/examples/notebooks/index.html
All right, so what is DFFML? The goal is that we've got this thing that makes machine learning easy to use. It helps us make datasets, it helps us make machine learning models, and it makes it easy to go use those models once you've trained them.

So the basic flow of machine learning is: we've got a problem, and we know that, with enough examples, a human could say "that's what I bet the answer would be." Well, we're going to make it so that the computer does that, and that's machine learning. So we just need to come up with all those examples, and then we need to choose the right algorithm, or model.
The model is what's going to give us high accuracy on this problem, given those examples. Once we have the right examples and the right algorithm, which together produce a model with high accuracy, we know that we've basically solved the problem. That's the gist of things.

And then we want to go use that thing, and DFFML should make that easy too. So we've got three pieces here. We've got the dataset generation side of things, which is powered by directed graph execution. A directed graph is basically like a flow chart, just like this one, and these directed graphs are sequences of things that happen: mostly data-scraping or data-transformation steps.
For example, we'd have some operations that would go and grab the weather for a city given the month, and we could correlate the temperature to the ice cream sales. Then, if we get a new city, we have another operation in this directed graph execution thing which generates the feature data: one of our features is now the temperature, which we're generating based off the city name. So we're taking our existing dataset and combining it with this new generator operation, which is going to generate that temperature value, and we have a new dataset that has city, temperature, month, and ice cream sales. That would be generating the whole dataset. Then, if I get a new city coming in and somebody says "well, what's going to be my ice cream sales in this city?", I can give them a prediction on that single record. That record would basically be one row in that Excel file or that database, saying what the city is. So we go and scrape the weather data, and now we can predict how many ice creams we're going to sell.
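That enrichment step can be sketched in plain Python. The function names and weather values below are made up for illustration; in DFFML this step would be an operation running inside a dataflow rather than a direct function call:

```python
def lookup_temperature(city, month):
    # Stand-in for an operation that scrapes a weather service.
    fake_weather = {("Portland", 7): 27, ("Phoenix", 7): 41}
    return fake_weather[(city, month)]

def enrich(record):
    """Combine an existing record with the generated temperature feature."""
    enriched = dict(record)
    enriched["temperature"] = lookup_temperature(record["city"], record["month"])
    return enriched

# Existing dataset: city, month, and the ice cream sales we observed.
dataset = [
    {"city": "Portland", "month": 7, "ice_cream_sales": 300},
    {"city": "Phoenix", "month": 7, "ice_cream_sales": 900},
]

# Generate the full training dataset: city, temperature, month, sales.
training_data = [enrich(r) for r in dataset]

# A new city arrives with no sales figure yet: enrich that single record
# the same way, then hand it to the trained model for a prediction.
new_record = enrich({"city": "Phoenix", "month": 7})
```

The point is that the same generator runs both when building the whole dataset and when a single new record comes in for prediction.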
After we've generated all that data, all the examples, we're going to feed them to these machine learning models. So the first thing we do is model training, and model training is basically: we give the model all the examples, and we give it the thing that we want it to predict, and then we rerun it through this training process, which is different for each model, until it comes out with an accuracy that we like. That's the accuracy assessment.
Then the other side of this is, once somebody comes in and we have this model with high accuracy and we want to use it: give me a new city, predict me the ice cream sales. That's prediction, using the trained model. And then, of course, through all of this, we have to store and load this data somewhere. All of this lives somewhere: it might be a CSV file, it might be an Excel file, it might be an actual database.
It might be a JSON file; it could be whatever you want, because we've got these abstractions and you can store it wherever you'd like. So the core features here, like we just talked about, are machine learning, dataset generation, and dataset storage. For machine learning, we can do that from the command line, we can do it from Python, or we can do it over HTTP, with an HTTP server that exposes all of this stuff.
Like, you know, a web service. And so for the dataset generation — wait a minute, my slides didn't advance.
There we go. So for the dataset generation, that's where we're using that directed graph execution from the last slide, and this is a concurrent execution environment with managed locking. What that means is we're running all of these little functions that scrape, say, the temperature, like I was talking about, and we're running all of those at the same time.

Well, concurrent is slightly different than parallel, but you can basically think of them as running at the same time. For certain things which you can't operate on at the same time, though, the execution environment will make sure that your functions don't run at the same time. So if two things take a city name, and only one can operate on a city name at a time, then only one will run at a time.
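A minimal sketch of that managed-locking idea, using `asyncio` directly rather than DFFML's orchestrator (the operation names and the per-city lock table here are invented for illustration; DFFML manages this locking for you):

```python
import asyncio
from collections import defaultdict

# One lock per city name: two operations that act on the same city
# are serialized, everything else runs concurrently.
city_locks = defaultdict(asyncio.Lock)
log = []

async def scrape_weather(city):
    async with city_locks[city]:
        log.append(f"weather:{city}")
        await asyncio.sleep(0.01)  # stand-in for network I/O

async def scrape_sales(city):
    async with city_locks[city]:
        log.append(f"sales:{city}")
        await asyncio.sleep(0.01)  # stand-in for network I/O

async def main():
    # All four tasks start "at the same time"; the two Portland tasks
    # take turns because they share a lock, and likewise for Phoenix.
    await asyncio.gather(
        scrape_weather("Portland"),
        scrape_sales("Portland"),
        scrape_weather("Phoenix"),
        scrape_sales("Phoenix"),
    )

asyncio.run(main())
```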
All right, so we've got this high-level Python API, and this is basically what it looks like when you're actually training the models, assessing the accuracy, and making predictions. You basically say model equals whatever model you want, and here's the features, which is the input data that we want to train on. These are the examples.
Each one of these is an example record, and we want to have the model look at the years, expertise, and trust, and we want it to predict the salary. So it's going to be looking at this data and saying: okay, given this data, try to predict the salary number. So this trains the model, and then we say: okay, let's see what kind of accuracy this model is at.
This is a linear model, so, as you can see, it's 10, 20, 30, 40, 50, 60. We provide it with some more examples that are consistent with this linearity, for the linear regression model, and of course we're going to get an accuracy of 100 percent there. Obviously, if we provided different data, we would get a different accuracy, but for the sake of the example, that's what we're doing. Then we make a prediction, and in this case we're not going to pass in the true salary. We have just years, expertise, and trust, and we're saying: well, predict me what the salary might be. As you can see, once again it's all linear, so we chose input examples that are going to give us the next value in this sequence. That's basically what it looks like using this stuff from Python. And here — oops, it looks like we're in the middle of this GIF; I don't know why it doesn't reset the GIF on changing the page.
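The train, assess accuracy, predict flow just described can be sketched with a hand-rolled one-feature linear regression, using the perfectly linear 10, 20, 30... salary data from the example. This is a conceptual sketch, not DFFML's actual `Model` API (see the DFFML quickstart for the real classes):

```python
def train(examples):
    """Fit salary = m * years + b by least squares on the example records."""
    xs = [r["Years"] for r in examples]
    ys = [r["Salary"] for r in examples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    b = mean_y - m * mean_x
    return m, b

def accuracy(model, examples):
    """Fraction of held-out examples the model predicts exactly."""
    m, b = model
    hits = sum(1 for r in examples if round(m * r["Years"] + b) == r["Salary"])
    return hits / len(examples)

def predict(model, record):
    m, b = model
    return m * record["Years"] + b

# Training data: salaries 10, 20, 30, 40; test data continues the line.
training = [{"Years": y, "Salary": 10 * (y + 1)} for y in range(4)]
test_set = [{"Years": 4, "Salary": 50}, {"Years": 5, "Salary": 60}]

model = train(training)
print(accuracy(model, test_set))       # 1.0 for this perfectly linear data
print(predict(model, {"Years": 6}))    # next in the sequence: 70.0
```

Because the data is exactly linear, the accuracy comes out at 100 percent and the prediction is just the next value in the sequence, as in the talk.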
So here we're going to do the same thing that we just did in Python, but from the command line. The first thing we do is obviously install the package, and then we create these training, test, and predict CSV files, which are comma-separated values, kind of like your Excel file type thing. There are the files; you can see I'm listing the directory. We now train the model. We do the same thing: we say, okay, we want the scikit linear regression model.
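The CSV files for this walkthrough might look like the following. The column names and values are modelled on the DFFML quickstart and may differ from the exact files shown in the video; the commented `dffml train` invocation at the end is likewise an approximation, since flags vary between releases:

```shell
# Create the training and test files the walkthrough refers to.
cat > training.csv <<'EOF'
Years,Expertise,Trust,Salary
0,1,0.1,10
1,3,0.2,20
2,5,0.3,30
3,7,0.4,40
EOF

cat > test.csv <<'EOF'
Years,Expertise,Trust,Salary
4,9,0.5,50
5,11,0.6,60
EOF

ls training.csv test.csv

# With dffml installed, training would then look roughly like this
# (check `dffml train -h` for the flags in your version):
#   dffml train -model scikitlr \
#     -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
#     -model-predict Salary:float:1 \
#     -sources f=csv -source-filename training.csv -log debug
```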
Here's the features that I want you to care about as the input data: years, expertise, and trust. I want you to predict the salary, and the source of your data is going to come from this training.csv file. And I turn on some debug logging to show you that, yes, something actually happened: the input number of records was four.
You can see that things happened there. Now we assess the accuracy, changing the source to the test.csv file: 100 percent. And then finally we ask it for a prediction, and when we do a prediction here, we're going to see the JSON output, where we're looking at, okay, there's the salary and there's the confidence in that prediction.
So yeah, this is just sort of the updates since October. We've got some unsupervised models, which sort of throw things into different groups, and then we've got some natural language processing models that got added. We switched from TensorFlow 1 to TensorFlow 2, we've got this easy-to-use Python API, and we also have a better tutorial on creating your own models.
There's this tool called shouldi. It's this meta static analysis tool, which is what we're going to do with dataflows, and they're basically these graphs, these flow charts. You can just think of it like a call graph, when you're calling one function. So imagine these ones in the darker purple are functions operating on that top package, and there's an input.
You basically get the input data, which is the package, and we're passing it to these various functions, which are the other shades of purple here. Every time you see an arrow, it's acting as one of the inputs to that function. So safety check takes two inputs: one of the inputs is the package name, and one of the inputs is the version, which is going to come from this pypi latest package version function. So basically, any time you see an arrow going in...
That's where the output of one function has become the input of another function, or we got it from outside the network: we were provided it, instead of it being the output of a function.
So the nice thing about these things is that we can take operations that were written as part of this shouldi package — come on. Oh, it really doesn't want to change the slides for me. All right, well, there's a slide right here that it does not want to show in between these two; I wonder if I could get out of this and it'll show it. There we go. So basically we can take this — I shouldn't have gone back a slide; it doesn't want to update. Oh well.
Basically, you saw that last slide, how we had mainly just this one dataflow, without the long arrow pointing to the new one. We can extend these dataflows: we can basically say, okay, here's all these functions and here's how they're linked together, and now here's some more functions; throw them all into the same dataflow. And we talked about concurrency, which means they're basically all running at the same time.
As long as you've got the inputs to your function ready. So that's sort of the end of the status update here, and it sort of gives you an idea of what we're doing. And then we can cover more of, you know, where I need your...