From YouTube: Raphael Dussin 2020 05 11
A
All right, so good afternoon, everybody. I'm going to talk to you about a software stack that's becoming very popular. I'm going to talk about it in the context of MOM6, since this is the ocean working group, but it's something that you could use for a lot of different ocean and atmospheric models. So if you don't know about xarray, I hope you're going to join us and be part of the fun, and if you already know about it, well, I hope you might still learn a thing or two.
A
What I'm going to quickly describe is a bit of the Jupyter ecosystem, xarray, and zarr, and then what I want to spend a little more time on is the demonstration, because talking about Python packages and such is fun, but I think it's actually more useful to use them and see how they work for real.
A
So first, for those who are not familiar with the Jupyter ecosystem: the whole thing started with the IPython notebook and then became a fusion of Julia, Python, and R, and that's where the Jupyter name comes from. The idea is that you're going to use your web browser as an IDE, and this IDE is going to communicate with your interactive Python, Julia, or R session. That server side can be running either on your local machine or on a remote server.
A
That can be your analysis machine or that can be in the cloud. So that's the idea with the Jupyter notebook. JupyterLab is just a more functional IDE that came after the Jupyter notebook; that's what I recommend and that's what I use. JupyterHub is an addition on top of that which can run multiple Jupyter sessions for multiple people on your server, and at NCAR you have a good example.
A
So next, a really cool piece of software that's going to help us do a lot of science very easily is xarray. We're used to thinking about our data as N-dimensional arrays, and then manipulating things based on indexes into those arrays.
A
What xarray does is add labels to all those dimensions, which means that you don't have to think about "oh, what is the index of my time dimension or my spatial dimension" — xarray knows about that. xarray was inspired by the data model used in netCDF: a dataset is similar to a netCDF file, that is, a set of different arrays, but you can also build a dataset that spans multiple files.
A
So you don't have that one-file, one-dataset restriction. Your data arrays have labeled dimensions and labeled coordinates, and you can actually use those labels when you apply methods, which makes plotting or computing more high-level. You're not thinking about indexes anymore; you're thinking about dimensions. I've got two examples that are pretty simple. One is just a plot where I slice a piece of data and tell it: okay, show me that part — and I don't have to know or care how my dimensions or arrays are stored, or anything. Same thing if I want to compute a climatology: I just tell it to average over the time dimension and everything happens; I don't have to worry about it.
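A rough sketch of that label-based workflow — not code from the talk; the file name and the MOM6-style variable and dimension names (thetao, time, z_l, yh, xh) are placeholders:

    import xarray as xr

    ds = xr.open_dataset("ocean_monthly.nc")   # hypothetical file

    # Select by label, not by integer index: nearest grid point to 40N, 30W
    point = ds["thetao"].sel(yh=40.0, xh=-30.0, method="nearest")
    point.isel(z_l=0).plot()                   # surface time series there

    # Climatology: just name the dimension to average over
    clim = ds["thetao"].mean(dim="time")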
A
One thing that I'd like to point out, since we're working with ocean models, is that if you have multiple sets of coordinates — for example, in MOM6 you're going to have cell centers and cell corners — xarray doesn't know the relationship between those coordinates.
A
So that's where xgcm becomes very handy, because xgcm is going to add that knowledge to xarray, and with that you're going to be able to perform differentiation or interpolation operations on a staggered grid.
A
So next is Dask.
A
One thing that's important to understand is that xarray can work on top of either NumPy arrays or Dask arrays. NumPy arrays basically compute eagerly: once you type your operation in, it's going to execute it. Dask, on the other hand, is going to delay it: what Dask does is build a graph of the operations it should do, and wait until the last minute.
A
So only when you actually ask for a result does it actually perform the operations. You could see it as a lazy student who just makes a big to-do list and then waits until the last minute, when he has to present something, to do all the computation at once and then show you the plot.
A
So what's going to make the difference between using NumPy or Dask under the hood is the chunks argument. If chunks — which are basically the small bits of data — are specified, then we're going to go into Dask and lazy mode.
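A minimal sketch of that switch, assuming a generic netCDF file (names and chunk sizes are placeholders): without chunks you get eager NumPy arrays, with chunks you get lazy Dask arrays.

    import xarray as xr

    # Eager: variables load as NumPy arrays when accessed
    ds_numpy = xr.open_dataset("ocean_monthly.nc")

    # Lazy: the chunks argument switches the backing arrays to Dask
    ds_dask = xr.open_dataset("ocean_monthly.nc",
                              chunks={"time": 1, "z_l": 35})

    # Nothing is computed yet; this only extends the task graph
    sst_mean = ds_dask["thetao"].isel(z_l=0).mean(dim="time")
    result = sst_mean.compute()                # now the work happens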
A
When we are in Dask mode, we can define a cluster, and this is going to allow us to do multi-threaded operations that can leverage all the computing power that we have on our local machine; or you can launch Kubernetes clusters on the cloud; or you can submit to a job queue — so your SLURM or PBS jobs — with the dask-jobqueue package.
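As a sketch of those cluster options (the queue name and resource sizes below are made-up placeholders), a local cluster and a dask-jobqueue SLURM cluster could be set up roughly like this:

    from dask.distributed import Client, LocalCluster

    # Local cluster: use the cores on the analysis machine
    cluster = LocalCluster(n_workers=4, threads_per_worker=2)
    client = Client(cluster)          # the client also exposes the dashboard URL

    # Or, on an HPC scheduler, with the dask-jobqueue package:
    # from dask_jobqueue import SLURMCluster
    # cluster = SLURMCluster(queue="batch", cores=8, memory="32GB")
    # cluster.scale(jobs=4)
    # client = Client(cluster)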
A
And what's really cool about that is that, because we're splitting our computation into small bits that are easier for the computer to handle, we can actually work on datasets that wouldn't fit into memory. That's the OOC — out-of-core — computation that you might have heard of, or might hear about in the future.
A
So one other thing I was looking at is zarr. Zarr is one of those new formats that's also becoming popular and that's very useful for cloud storage, and we were interested in seeing if it would be a good solution for our needs. So, first question: why bother? Well, it has pretty good compression, so it's going to save us a lot of space, and that's why it's interesting. It was first designed for cloud object storage.
A
So whether or not it's a good solution for more traditional infrastructure is still something we're looking into. What we've seen is that how you cut your dataset into small files or small chunks matters: the rule of thumb is that chunks should be around 10 to 100 MB, and I'm going to make another demonstration on that later on. There can also be different types of store. The most common one is going to be the directory store, on your left.
A
Well, basically, every chunk is saved as a file, or as an object in the cloud. You can also have a zip store — for example, on the right, that zip file is actually just a zip with all the little chunks in it. So why did I get interested in the zip store? Well, basically, each chunk is one file, so if you take a big simulation with 3D monthly fields and you break them down into chunks —
A
well, that amounts to a lot of files. That's what I get here, and on our file system the number of inodes is something that we have to be careful about. That's why the zip store is actually pretty convenient, because it turns 26,000 files into one. Then come the performance differences, and last I checked, I tried to do the exact same computation with the zarr directory and zip stores, and the performance is similar — so no loss in performance from zipping the chunks. So it looks pretty good.
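A hedged sketch of writing the same chunked dataset to a zarr directory store and to a zip store (file names and chunk sizes are illustrative, not the ones used in the talk):

    import xarray as xr
    import zarr

    ds = xr.open_dataset("ocean_monthly.nc",
                         chunks={"time": 1, "z_l": 35})

    # Directory store: one small file per chunk on disk
    ds.to_zarr("ocean_monthly.zarr", mode="w")

    # Zip store: all the chunks packed into a single file, which is much
    # friendlier to file systems with inode limits
    store = zarr.ZipStore("ocean_monthly.zarr.zip", mode="w")
    ds.to_zarr(store)
    store.close()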
A
One caveat, though, is that the zip stores are not as commonly used as the directory store, so there are still some bugs that you can find, and we worked on fixing some of those.
A
So now, let's move to the demonstrations. First, I'd like to say that we have the MOM6 analysis cookbook, which is a community effort, and if you are interested in xarray and trying to work with xarray, there are a lot of examples there. If you want to contribute some diagnostics that you haven't found in the cookbook, then please submit a pull request, and we'll be very happy to have your contribution.
A
So, just to give you a little tour — this is not very big, is it? — these are different notebooks showing a little bit of everything, from setting up your Dask cluster and getting started, to time and space operations, and then we move into more advanced topics like horizontal remapping, or flooding, or doing a comparison with observations.
A
So we're also participating in the documentation for xgcm, which I'm going to talk about a little bit more in the demonstration. So let's get started with that. Let me make it a little bigger for you guys — is that all right for you?
A
Okay, nobody's complaining, so I'm going to go ahead. All right, so I'm going to load the dataset that I've got here locally, but the next cell is this exact same dataset on the THREDDS server, so you can play a little bit with it. A little warning, though: if you try to run it with the Dask cluster, that's not going to work, and if you try to run it without the Dask cluster, it's probably going to take a long time. So that's the limit on how reproducible that notebook is.
A
So I'm loading my dataset — it all happens instantly because nothing is actually loaded into memory except the metadata — and this is my dataset: some MOM6 half-degree data. Let's look at what I have: I've got some grid metrics and I've got velocities, temperature, and salinity. Okay, the first bad idea that I want to show is going to be an introduction to broadcasting.
A
So that's one example of what you shouldn't do, and that's where I'm going to introduce xgcm, and where xgcm is going to be very useful in our case. xgcm is going to tell xarray what the relationship is between those different variables.
A
So, for example, on my x axis I'm going to tell xgcm that xh is going to be the center and xq is going to be located to the right of xh, and the same thing for y. I can also do the same for z: in that case, I know that I have one more z_i than z_l, so I'm going to specify it as outer rather than inner, and I can also tell it that it's periodic in the X dimension.
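A minimal sketch of that grid definition with xgcm, assuming MOM6-style coordinate names (xh/xq, yh/yq, z_l/z_i); the exact arguments in the notebook may differ:

    from xgcm import Grid

    grid = Grid(
        ds,
        coords={
            "X": {"center": "xh", "right": "xq"},    # xq sits to the right of xh
            "Y": {"center": "yh", "right": "yq"},
            "Z": {"center": "z_l", "outer": "z_i"},  # z_i has one more point than z_l
        },
        periodic=["X"],                              # periodic in X
    )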
A
So with that, I can pretty simply create my temperature and my salinity on the U point by using an interpolation over the axis X. I could do the same for V, but in that case I don't really have a use for it, so let's scroll past it. Now my array has the right coordinates — it's on yh, xq — so that's my U point, so everything's fine.
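The interpolation step might look roughly like this (thetao/so are MOM6-style placeholder names; ds and grid come from the sketches above):

    # Interpolate tracers from the cell center (h point) to the U point along X
    thetao_u = grid.interp(ds["thetao"], axis="X")
    so_u = grid.interp(ds["so"], axis="X")

    print(thetao_u.dims)   # now on (time, z_l, yh, xq): the U point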
A
So now, let's say that I want to compute some potential density. I don't have an xarray-based function for it, but I know how to do it and I've got an old Python function for it, so I can use that function — but there's more than one way of doing it. I'm going to define that function the same way that I would have done with NumPy, and so the first thing you think is:
A
Oh, let's just apply my function to my data array. In this case, that works fine. Why? Because the operations that I'm doing are simple enough that it's not triggering eager computation. But if you're using functions that are a little bit too complex, this could trigger the computation, and if you trigger the computation, then it's going to build the whole dataset, and that's probably not what you want — and it might not even fit into memory.
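One common way to keep such a NumPy-style function lazy on Dask arrays is xr.apply_ufunc; the sketch below uses a toy linear equation of state — not the actual density function from the talk — applied to the U-point fields from the previous sketch:

    import numpy as np
    import xarray as xr

    def sigma0_numpy(temp, salt):
        """Toy potential density anomaly (kg/m^3), for illustration only."""
        rho = 1025.0 * (1.0 - 2e-4 * (temp - 10.0) + 8e-4 * (salt - 35.0))
        return rho - 1000.0

    sigma0_u = xr.apply_ufunc(
        sigma0_numpy, thetao_u, so_u,
        dask="parallelized",             # wrap the Dask chunks, stay lazy
        output_dtypes=[np.float64],
    )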
A
So now, let's say, okay, I'm interested in the Denmark Strait overflow. I'm going to take a slice along my coordinate to have a quick look at what my region looks like, and from there I can say: okay, I'm going to take 23.5 as the longitude where I'm going to cut, and I'm going to cut between those two latitudes.
A
So when I'm doing that, I'm actually selecting from the whole dataset, and I can show what my transport would look like — and that's how it is. Everything plots very quickly, because it only takes into memory the piece that you need. So all of that is really cool for prototyping some diagnostics. So I had an idea for a diagnostic.
A
It's actually not a good diagnostic, but I'm going to show it anyway, just for the sake of computing something. Let's say I want to take the transport in layers that are heavier than a certain density. I can very easily do that by taking my transport and applying the where function, with the condition that the density has to be more than 27.8, and then just summing over the vertical and the Y dimensions.
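Stitched together, that (admittedly rough) diagnostic might look like the sketch below; umo is a placeholder name for the zonal transport, sigma0_u comes from the previous sketch, and the longitude sign and latitude bounds are illustrative:

    # Section at a fixed longitude and a latitude band
    sec_transport = ds["umo"].sel(xq=-23.5, method="nearest").sel(yh=slice(64, 68))
    sec_sigma0 = sigma0_u.sel(xq=-23.5, method="nearest").sel(yh=slice(64, 68))

    # Keep only layers denser than 27.8, then sum over depth and Y
    overflow = sec_transport.where(sec_sigma0 > 27.8).sum(dim=["z_l", "yh"])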
A
And so creating that cluster is going to give me back a dashboard, so now I'm ready to run my computation. I'm just going to start it, and I can take a look at my dashboard to see what's actually going on, rather than just waiting and staring at a line. Now I can see what's going on — that's my dashboard in action. It should take only 30 seconds, but I'm afraid that maybe the meeting is slowing the computer down a little bit; it should be pretty fast.
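In code terms that step is roughly just asking for the result and watching the dashboard (client is the Client from the cluster sketch earlier; overflow is the lazy diagnostic built above):

    print(client.dashboard_link)            # open this URL to watch the workers

    overflow_values = overflow.compute()    # triggers the whole task graph
    overflow_values.plot()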
A
It's not accurate, because I'm using the mean velocity with the mean density, so there's a lot of flow that I'm actually not capturing, but I get my results. One of the things that I wanted to highlight is the importance of the chunks. I didn't talk about it much, but when I loaded my dataset I made a choice for my chunks — one time frame and thirty-five depth levels — which is basically 55 MB per chunk.
A
So I tried different combinations, and what you can see is that if you take chunks that are too big, you can really degrade the performance of your computation, because you're taking something that's so big it barely fits into memory, and the computer really struggles with it. I don't even know if that one managed to go through. And then there's a sweet spot: basically anything between 10 and 100 MB is where you get the best performance.
A
But then, if you make your chunks too small, you're also going to start degrading your performance. So here the second message is that it's very easy to degrade the performance of your computation, so always try to think about the chunks and what the optimal size is, and maybe do a couple of tests before you deploy something into production, to make sure that you're actually using Dask in its sweet spot.
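A hedged sketch of that kind of chunk-size experiment (the numbers are illustrative; the point is to aim for roughly 10-100 MB per chunk):

    import xarray as xr

    too_small = xr.open_dataset("ocean_monthly.nc", chunks={"time": 1, "z_l": 1})
    sweet     = xr.open_dataset("ocean_monthly.nc", chunks={"time": 1, "z_l": 35})
    too_big   = xr.open_dataset("ocean_monthly.nc", chunks={"time": 120})

    # Inspect the chunk layout before committing to it
    da = sweet["thetao"].data                      # the underlying Dask array
    print(da.chunksize)                            # shape of one chunk
    print(da.nbytes / da.npartitions / 1e6, "MB")  # average size per chunk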
A
So what I like with Jupyter is that Jupyter is going to give you the same experience whether you're running on your laptop, on the cloud, or on your HPC system, so it always feels familiar. It's very easy to prototype your analysis and then deploy it in a more production workflow, and that's something that is very useful.
A
xarray allows you to very easily write some high-level diagnostics, and xgcm is also a companion tool that is very useful: it adds all that staggered-grid awareness to xarray. Dask gives you a tool that allows you to do very performant parallel computation, but be careful with the chunking. Zarr also gives good compression, but chunking is also important there. And what we're trying to do is really contribute to community software, and not build a one-size-fits-all package that does everything we think you should be doing.
A
Instead of doing that, we just contribute to tools that already exist — teaching you to fish instead of giving you a fish, kind of. So that's all for me.
C
Yeah, sorry — I was just trying to find the button. Thanks, Raph, for that nice demonstration.
C
There are a lot of these disparate examples out there for how to use Dask and xarray. One comment and one question. The comment that I would have — or I guess it's part of the question — is: are you using CMIP6 output for these data, or are you using output from one of your custom runs?
A
C
Because one of the things that I've been trying to convince folks up here at CSDMS to do is — you know, we have CMORized data, right, CMIP6-compliant data. So making these examples a little more concrete, where you can just download the data from ESGF, get these metrics, and get it all working, I think would be really helpful to kind of spread it. And then I had a second question.
C
The big thing that's always kind of been a pain for me is figuring out the right chunks. So do you have — I mean, as you saw, there are differences in performance — do you have any general rules of thumb about what chunk sizes to start with?
A
Yeah, what I was saying is that the rule of thumb here is that something between 10 and 100 megabytes per chunk is usually where you get the best results. In my case — you can see it in the notebook — the best performance I had was with my chunks in that range, so I would say around those lines. It's not completely well known what the best size is, so you kind of have to try; there's a lot of trial and error in that.
C
D
A
So the issue with OPeNDAP is that the Dask cluster is actually not going to work with OPeNDAP, and I think that's because OPeNDAP is serial by definition, so all the distributed computing is actually not working. You can try with the notebook — I've tried several times to get it to work with OPeNDAP, and in that case it doesn't work. So that's still something that's not working properly.
A
But if you were to compute things, say, on the cloud, where everything is stored in zarr with chunks, you would try to get your chunk size to also be around 100 MB, so that you can fit a bunch in memory but not overload your memory, and not have chunks so small that you need to make lots of I/O calls, which also have an overhead. So, yeah.
B
Any last questions for Raph? We put the GitHub — he has a GitHub link in his presentation. We have also copied that notebook to the GitHub site for the webinar series, and that's in the chat box right now, so if you want to play with this later, you can.