From YouTube: 6. Introduction to Dask
Description
From the NERSC NVIDIA RAPIDS Workshop on April 14, 2020. Please see https://www.nersc.gov/users/training/events/rapids-hackathon/ for all course materials.
A
So the idea of Dask is that it provides us with a single API which allows us to enable parallel processing of our workloads, which is especially important when you are working with large datasets, where essentially we become memory-bound or compute-bound, and to enable the power of using multiple nodes and multiple GPUs, we use Dask. So what is Dask? Dask is essentially a distributed compute scheduler, which is built to scale Python.
A
The idea is that you can write your code on your laptop and then simply scale it to a supercomputing cluster. The idea is that you want to use multiple workers, and this really helps, especially with RAPIDS, where you have one worker per GPU, so you will have the number of workers set to the number of GPUs you have. And why Dask? What are the advantages that Dask offers?
A
A major advantage that Dask offers is PyData nativity. What that means is that the APIs we are already familiar with, whether it's NumPy or scikit-learn, can easily be translated across multiple nodes, as the API itself remains almost the same, so the user becomes agnostic to the fact that the code is right now running on a single GPU, or a single CPU, or across N nodes.
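As a minimal sketch of that API parity (assuming NumPy and the `dask` package are installed), the same call names work in both libraries:

```python
import numpy as np
import dask.array as da

# The NumPy version: eager, fully in memory.
np_result = np.arange(10).sum()

# The Dask version: the same method names, but distributed-ready;
# compute() triggers the actual execution.
da_result = da.arange(10, chunks=5).sum().compute()

assert np_result == da_result == 45
```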
A
Now let's look at what a NumPy array looks like when you're working with Dask. Essentially, we again will have multiple partitions here. The example is a 2D NumPy array, but it can be an N-dimensional NumPy array, which will be partitioned, and Dask essentially becomes a wrapper around those partitions. And when we want to go from NumPy to CuPy, so that we can get better parallelism with our GPUs, we simply translate it.
A
The idea here is that we split our execution part from our task-graph creation part. In this example, you can see that if you wanted to create a NumPy array of ones of size 50, you would simply call np.ones; if you want to do the same thing in Dask, you simply call da.ones with size equal to 50, but there's another argument here that's important, which is chunks. So here we set our chunk size to 5.
A
What this translates to is that we want to chunk the NumPy array into chunks each of length 5. But at this point, when we create x = da.ones(...), we essentially haven't created anything; this is just a task graph that we create. Only when we execute the task graph will these tasks actually be executed, and then the data will come into memory and the processing will happen. This essentially is a lazy operation. So at this point we only have this task graph.
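The example above can be sketched as follows (a minimal illustration assuming the `dask` package is installed):

```python
import numpy as np
import dask.array as da

# Eager NumPy: the array of ones exists in memory immediately.
np_ones = np.ones(50)

# Lazy Dask: this only records a task graph of 10 chunks of length 5;
# nothing is materialized yet.
x = da.ones(50, chunks=5)
print(x.numblocks)   # (10,) -> ten chunks

# Only .compute() executes the graph and brings the result into memory.
result = x.compute()
```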
A
Now let's look at a small computation that we do on the NumPy array here. Essentially, we call sum on the previous array that we created. How the task graph for this changes is that we now have a reduction operation here, so for those of you who have experience with MPI reduce, this will look familiar; it is similar to the reduction that's happening here in Dask.
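A sketch of the reduction described above, assuming the `dask` package is installed:

```python
import dask.array as da

x = da.ones(50, chunks=5)

# .sum() appends a tree reduction to the task graph: each of the 10
# chunks is summed locally, then the partial sums are combined,
# much like an MPI reduce.
total = x.sum()

# Still lazy; compute() runs the graph.
value = total.compute()
print(value)   # 50.0
```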
A
Now, as I said, if you look at this, the API is fairly simple and it will be similar to what you would write in NumPy, but under the hood Dask will create a complex task graph for you. So this is an example of a statistical computation, and you can see how that task graph gets created.
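The exact statistic from the slide isn't reproduced here, but as an illustrative sketch (assuming the `dask` package is installed), a hand-written variance is enough to see a non-trivial graph build up behind a few lines of NumPy-style code:

```python
import dask.array as da

# A hand-written variance as a stand-in for the slide's computation.
rng = da.random.RandomState(42)
x = rng.normal(0, 1, size=(1000,), chunks=100)
y = ((x - x.mean()) ** 2).sum() / x.size

# Nothing has run yet; y is a task graph Dask built for us, already
# far larger than the three lines above suggest.
n_tasks = len(dict(y.__dask_graph__()))

result = y.compute()
```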
A
Dask offers us really nice visual debugging tools, which allow us to actually see how the scheduling is happening. In this task graph, which is essentially the graph of a machine learning workflow, we can see that the red here signifies the tasks that we have to hold in memory, and the blue here signifies the tasks that have already been scheduled and whose memory is now freed up. So the Dask scheduler essentially handles all the garbage collection for us.
A
So all the task orchestration as well as the memory management is taken care of by the Dask distributed scheduler. The scheduler essentially will say: okay, on this nth worker, please schedule this task; and once the task is completed and the memory is no longer needed, it will free up the task. Also, the scheduler is a single-threaded scheduler, and it is essentially responsible for deciding which task goes to which worker.
A
Okay, now let's see how we actually create a Dask cluster. Creating a cluster is essentially just what we see on the screen, and this API should remain constant whether we are doing it on a local machine or we are creating one really big cluster. Those of you who go into the lab after this will experiment and will see how we can create a LocalCUDACluster, and that essentially should translate as well to when you are doing it on a multi-node system.
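The GPU version in the lab comes from the `dask_cuda` package; as a minimal sketch here, the CPU-only `LocalCluster` follows the same create-cluster-then-Client shape (assuming `dask.distributed` is installed; `LocalCUDACluster` additionally needs GPUs):

```python
from dask.distributed import Client, LocalCluster
# GPU analogue, one worker per GPU (requires dask-cuda and GPUs):
#   from dask_cuda import LocalCUDACluster
#   cluster = LocalCUDACluster()

cluster = LocalCluster(n_workers=2, threads_per_worker=1)
client = Client(cluster)

# Any work submitted through `client` now runs on the cluster's workers.
result = client.submit(lambda a: a + 1, 41).result()
print(result)   # 42

client.close()
cluster.close()
```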
A
In essence, you can switch out your local CUDA cluster for a different cluster. Say, if you are doing this on Kubernetes, you can simply switch the cluster by using this statement. An important thing to note here is that the rest of the code remains the same. So you really don't have to worry whether you wrote the code on your laptop, or somewhere else, or whether it's being deployed somewhere else, except for the cluster-creation step, essentially these four lines; everything else downstream remains the same.
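A hedged sketch of that swap, using the `dask_kubernetes` package's `KubeCluster` as the alternative backend (the pod specification is deployment-specific and omitted, so treat this as a shape, not a runnable recipe):

```python
from dask.distributed import Client

# Local development:
#   from dask_cuda import LocalCUDACluster
#   cluster = LocalCUDACluster()

# Same workflow on Kubernetes: only the cluster line changes.
from dask_kubernetes import KubeCluster

cluster = KubeCluster()   # pod spec / worker template omitted here
client = Client(cluster)

# Everything downstream of Client(...) stays exactly the same.
```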
A
The other thing to note is that a lot of times, when you're working on multi-node workflows, communication often becomes a bottleneck, and Dask natively uses TCP sockets for communicating data between workers or from the scheduler to a worker. What UCX, unified communication, provides us is that it allows us to access any hardware acceleration we have for communication, so it may use specialized hardware like NVLink or InfiniBand. And to enable using UCX with Dask, the RAPIDS team wrote Python bindings for UCX: UCX natively is in C++, but we wrote Python bindings, and what this unlocks for us is that we can now use UCX with Dask for Python-based communication.
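Per `dask-cuda`'s documented options, switching the cluster from TCP to UCX is roughly one keyword change (a sketch; it needs GPUs plus the UCX-Py bindings installed):

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Ask the cluster to talk UCX instead of the default TCP sockets,
# so NVLink and InfiniBand can carry worker-to-worker traffic.
cluster = LocalCUDACluster(
    protocol="ucx",
    enable_nvlink=True,
    enable_infiniband=True,
)
client = Client(cluster)
```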
Now let's look at an example where this becomes important. Everything that you see here, again, will be available on the Dask dashboard you'll see in the lab, where you can actually see how your tasks are being scheduled as well as their timing. So in this example, we have four workers and we're doing an SVD with CuPy arrays lying underneath. If you're doing it just over TCP, the teal-colored tasks are computation Dask is doing, and the red here is communication. As you can see, after our initial computation, the communication becomes a bottleneck. Once we switch out our TCP for UCX, that bottleneck goes away, and we bring our task time down from 25 seconds to 17 seconds. Why this is important is that a lot of times, our workflows become really communication-bottlenecked, and with UCX you can accelerate even that part.
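The benchmark runs SVD on GPU (CuPy-backed) Dask arrays; a small CPU sketch of the same call, assuming the `dask` package is installed, looks like this (`da.linalg.svd` expects tall-and-skinny chunking):

```python
import dask.array as da

# CPU stand-in for the benchmark's CuPy-backed arrays: a tall-and-skinny
# matrix, chunked along the long axis only.
x = da.random.random((1000, 20), chunks=(250, 20))

# Lazy SVD; u, s, v are themselves Dask arrays until computed.
u, s, v = da.linalg.svd(x)

singular_values = s.compute()
print(singular_values.shape)   # (20,)
```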
A
So the idea of this presentation is: we saw how we can get speedups on single nodes, either by using CuPy, cuDF, cuML, or Numba. And now, when you want to unlock bigger workflows that either (a) don't fit on a single machine, or (b) are communication-bound or require multiple GPUs, we can do them with Dask without a lot of code changes.
A
Okay, now, how do we get the software? I think this should have come at the start, but we have our own GitHub page, because RAPIDS is an open-source project, so you can look at what's in development there. And installing any of our libraries is as simple as installing the conda packages. So you can look at the Anaconda page itself for our releases. That's all I have for you right now, and I'm open to any questions about the workflows.
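The install commands look roughly like the following; the exact package list, channels, and version pins change per release, so check the RAPIDS release selector rather than copying this verbatim:

```shell
# Illustrative shape of a RAPIDS conda install (version pins omitted).
conda install -c rapidsai -c nvidia -c conda-forge cudf cuml dask-cuda
```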
C
Just to add one additional input there. So, you know, there's mpi4py, which is useful, but it has its pain points; it can be limiting and more difficult. One particular difference is really what I would describe as loose versus tight coupling. The MPI parallelism model is pretty tightly defined, and that's fantastic; it provides a lot of benefits through its guarantees.
C
Dask has a very flexible relationship with parallelism, and as a result of that, you can use Dask in areas where you might not be able to as cleanly use MPI; and where you can use both of them, in certain areas it can still perform extremely well. So there's a lot of flexibility that Dask provides out of the box that would be very difficult to consistently and cleanly do with a different paradigm.
C
Yes, we do. Rather than talk about that here, because I think that's a really deep conversation that might be better as a follow-up, perhaps later, I can instead say that there is actually an example of this: there's a library in the Dask repository on GitHub called Dask-MPI, which is something that I encourage you to look into, and perhaps we can chat about that later. Okay.
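For context, the `dask_mpi` library referenced here exposes an `initialize()` entry point; a hedged sketch of the usual pattern (the script must be launched under an MPI runner, so this is a fragment, not standalone code):

```python
# Launched as: mpirun -np 4 python this_script.py
from dask_mpi import initialize
from dask.distributed import Client

initialize()        # rank 0 runs the scheduler, rank 1 this client code,
                    # and the remaining ranks become workers
client = Client()   # connects to the scheduler initialize() started
```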