Description
The 27th Annual CESM Workshop will be a virtual event. The Workshop will begin with a full-day schedule on 13 June 2022, with presentations on the state of the CESM, presentations by the award recipients, and two presentations from our invited speakers in the morning, followed by 15-minute highlight and progress presentations from each of the CESM Working Groups (WGs) in the afternoon.
To learn more:
https://www.cesm.ucar.edu/events/workshops/2022/
B
All right, I'll give people another minute. A minute to join; it looks like people are trickling in. I'll give people another minute or so before we dive in.

B
All right, I'll go ahead and get started. Welcome, good morning, and welcome to the Software Engineering Working Group session. I'm one of the co-chairs of the Software Engineering Working Group, Bill Sacks from NCAR CGD. The other co-chair is Ligia Bernardet from NOAA. Do you want to say a brief hello for those who don't know you?
B
We can use this time to have a more extended discussion that emerges from any of the talks this morning, or really to discuss any other topics that people would like to discuss in this group. And then, following that, we are planning to keep the Zoom session open from 12 all the way until one o'clock Mountain Time, for a sort of informal networking and conversation over lunch. So I hope you'll stick around and join us.

B
You know, it's always a little hard having these remote meetings, so we're hoping to have some recreation of some actual socializing at that time, so we can catch up and chat. Just a few logistics: the meeting is being streamed to YouTube and recorded. Depending on whether there are interesting things in the chat, we might also record the chat, in which case I believe that also records private chat.

B
So don't arrange your next date in the private chat here. For questions, you can either use the raise-hand feature on Zoom or type your question or comment in the chat. People who are on here with the web-based interface to Zoom don't, I think, have the raise-hand feature, so you can use the chat if you have a question for the speakers.

B
We'll give speakers a warning at the 16-minute mark, though you can let us know if you want an earlier warning as well. And then one more thing: some of us are going to go this evening to the Rayback Collective in Boulder, meeting around 5:15 pm, to socialize and have some dinner, and we hope, if you're local and able to, that you can join us for that.

B
We hope to see some of you there. And then finally, just a reminder of the NCAR code of conduct, which applies during this meeting and all other meetings: please try to offer constructive feedback, share the air, acknowledge teamwork, encourage innovation, show appreciation, and consider new ideas.
C
All right, great. So I think that, given the variety of stuff I'm talking about, if there are questions in the chat that are relevant to a given topic and there's some confusion, please just stop me. I'm happy to answer, because I do have some more time, and so we can spread out some of the questions; sometimes it's hard to get them all in at the very end. So what I want to talk about is what I believe are the transformative capabilities, at least in my opinion, of the new coupling infrastructure.
C
It enables new science to be done with CESM and also enables completely new collaboration with modeling efforts, not just in the United States at NOAA, but also with two European efforts that are also using CESM, and I'll have some details of this coming up. The outline is very terse; there's a lot of information coming in each of these bullets. The first thing I want to talk about is how we communicate information between components with this new coupling infrastructure.

C
I think some of you have already seen some of these presentations, but I'll be talking about some of the new milestones we've achieved with this new coupling infrastructure, which is called CMEPS, and I'll try to get all the acronyms defined.

C
One thing I think is extremely important is enabling an infrastructure that has the capability to do what I call hierarchical modeling development, which is the ability to selectively turn off feedbacks and isolate the parts of the system that you're most interested in developing with your science, and that's, at least to some extent, enabled by the data models.

C
We have the new data models, which are the Community Data Models for Earth Prediction Systems, or CDEPS. Then I'll wrap up with what we're trying to do to address the challenges of ultra-high-resolution simulations, particularly our collaboration with EarthWorks right now, and I'll summarize, finally, with the new ESMF release (that's the Earth System Modeling Framework that we're using), which addresses some of the bottlenecks that we found when we tried to start up these ultra-high-resolution simulations.
C
The first question: how do you enable effective communication of components with each other? As you all know, the CESM model has what we call a hub-and-spoke architecture. Components exchange boundary data (currently it's still two-dimensional boundary data) with a central mediator, which is this hub, called the NUOPC Community Mediator for Earth Prediction Systems. That hub, CMEPS, is targeted to be able to regrid data from a set of source components to your target component.

C
It merges that data using the fractions that are needed to merge, say, data from ice, ocean, and land to the atmosphere, as an example, and it also carries out the atmosphere-ocean flux calculation, downscaling of data from the land ice component, and so on. The key part is that, to be able to communicate with CMEPS, each component has what's called a NUOPC cap, which is just a very lightweight translational layer.

C
The cap takes the data structures in the source component, translates them to ESMF/NUOPC, and then the mediator acts on them, with the data sent via the connectors, these lines here. Then, when you get data back to your target component, you need to translate the data structures from ESMF/NUOPC back to the data structures for your component, and that's all that the cap does. Sorry, I went ahead a slide. So this is a very busy slide, but it contains a lot of the updates that we've done.
C
What I'm trying to show in this slide is that these are the prognostic components that are being used, not just by CESM but by other modeling efforts. So if you look at CESM and at what its components are, it has CICE6, MOM6, WAVEWATCH III, CISM, MOSART for the river along with a new component called mizuRoute, CAM, and CTSM. And what you see here in the gray shaded ovals is that those components, CICE6, MOM6, and now WAVEWATCH III, are actually being shared directly with NOAA.

C
The other thing to point out is that the other bullets in color, such as the oceans MPAS-Ocean, NEMO, and BLOM, are really being used by other modeling efforts. The MPAS ocean is being used by EarthWorks, NEMO is being used by CMCC in Europe, and BLOM is being used by NorESM in Norway, and they're using this exact same coupling infrastructure.

C
Similarly, the bullets in red, the UFS atmosphere and the land-specific components Noah-MP and LM4, are being used by NOAA. So what you have here is a coupling infrastructure that truly enables interoperability between several modeling efforts, and there is interest in going past this; in particular, there's interest from the Navy in potentially using CMEPS in their modeling system.
C
The key point that is different between NUOPC/CMEPS and our older coupling infrastructures is that CMEPS is a component that is identical in functionality to, say, the atmosphere component: it is its own unit and can be shared. What's different, and what I haven't shown on the slide, is the drivers that drive this whole system. Those drivers are very lightweight, they drive the temporal evolution of the system, and they are model specific.

C
So the only difference between these various component utilizations is the drivers. And of course, what we also have is the ability (I'll talk about this a bit later) to swap out prognostic components for data components in these various scenarios, and that enables you to selectively turn off feedbacks in the system. That is very powerful for being able to do hierarchical model development at the component level, where you will drive, say, CAM with a data ocean or a slab ocean.
C
So we have written a completely new set of data models, which I described briefly last year, called the Community Data Models for Earth Prediction Systems. They are NUOPC compliant and CMEPS compliant, and they have a lot of new functionality that will be very powerful, both for sharing with other modeling efforts and for being used within a component to ingest forcing data.

C
So what are some of the benefits of CMEPS? For one, it's easier to introduce new grids. One of the things that ESMF/NUOPC does that is a really big change in the system is online regridding: it creates the route handles, that is, the mapping weights between a source component and a destination component, at runtime and in parallel. Before, when we were using our old coupling infrastructure, for a fully coupled pre-industrial run you needed to create 25 offline mapping files for the set of component combinations.
C
Now we are down to four, and the reason it's four is that these are runoff-to-ocean mapping files that need to be customized; at some point in the future, ESMF is very interested in creating those as well. But imagine all of the overhead: you needed not just to create the mapping files but to put them into the scripting system so that the system would know about them. So it's disk space, it's scripting complexity, and it made the generation of new grids extremely cumbersome.

C
Another new addition: before, when you introduced a new grid, you also needed to create offline land fraction files, because the fraction that is land on the land grid is obtained by mapping the ocean mask, which is one or zero, conservatively, using first-order interpolation, to the land grid. So, in addition to creating the mapping files, you also needed to do this: each new component grid required generating offline fraction files, again putting them in the scripts and figuring out how you were going to access them.

C
So now, what we have is that inside both CDEPS and the CTSM cap, these land fraction files are generated at runtime by conservatively remapping the ocean mask to the land fraction, and so you no longer have to do this. Again, a really great simplification.
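As a rough illustration of the land-fraction idea just described (a sketch only, not the actual CDEPS/CTSM cap code, which is Fortran and uses ESMF-generated route handles), here is a minimal Python version, assuming the conservative mapping weights are already available:

```python
# Minimal sketch: first-order conservative remapping of a 0/1 ocean mask onto
# the land grid to obtain the land fraction at runtime. The weight triplets
# (dst, src, w) are assumed to come from the online regridding step; w is the
# fraction of destination (land-grid) cell `dst` overlapped by source cell
# `src`, and the weights for each destination cell sum to 1.
import numpy as np

def land_fraction(ocean_mask, dst_idx, src_idx, weights, n_land_cells):
    """Return land fraction on the land grid: 1 minus the remapped ocean mask."""
    ocean_frac = np.zeros(n_land_cells)
    # Accumulate w * mask into each destination (land-grid) cell.
    np.add.at(ocean_frac, dst_idx, weights * ocean_mask[src_idx])
    return 1.0 - ocean_frac

# Tiny example: two land cells, three ocean cells (mask: 1 = ocean, 0 = land).
mask = np.array([1, 0, 1])
dst = np.array([0, 0, 1, 1])       # destination land cells
src = np.array([0, 1, 1, 2])       # overlapping ocean cells
w   = np.array([0.6, 0.4, 0.5, 0.5])
print(land_fraction(mask, dst, src, w, 2))   # -> [0.4 0.5]
```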
C
This is hot off the press. It doesn't relate directly to CMEPS, but it relates to... sorry, I don't know why this is happening.

C
Okay, it relates to the ability to introduce new grids, through a completely new land surface dataset generation tool.

C
Before, when you introduced a new grid for the land, you needed, as part of that process, to create 17 offline mapping files and then use those mapping files in a surface dataset generation code that only ran on one processor. As we were going to higher resolutions, this took more and more time: one metric is that it took over two days to generate a surface dataset at 7.5 kilometers for an MPAS grid.
C
Now we have a completely new land surface dataset generation tool, using ESMF/NUOPC and parallel I/O, which has totally changed this: we can create the same MPAS surface dataset in 10 minutes. You do need to use a lot more processors, but you now have scalability. And some of the new raw datasets, which those 17 offline mapping files used to handle, are getting to higher and higher resolution.

C
So one of the challenges we had was: how could you map a 30-second soil texture dataset, which has 724 million points, to create a new surface dataset? You couldn't before; you were reaching a bottleneck. Now we can do it.
C
Also, we are leveraging ESMF dynamic masking, which is this great capability that can be used not just for mapping, but also for doing things like calculating the standard deviation of surface elevation statistics. So I think those three things that are outlined are where ESMF, NUOPC, and CMEPS are real game changers for what we can do.

C
This is very extensible, user friendly, and easy to use, because all of the complexity is hidden in the ESMF library. And so not only can you run Greenland and Antarctica in one simulation now, but we have the capability in CMEPS to add an arbitrary number of ice sheets that can be coupled at runtime, and we have carefully validated this. A lot of this work has been done in close collaboration with Bill Sacks, and so I think, again, Bill Lipscomb and his group are extremely excited to be able to do this.

C
So you only need to pass in one field from the ocean to CMEPS, where the multiple levels are contained in what is called an ungridded dimension, and you have this new Antarctic ocean coupling capability.
C
Another thing we have been able to do is improve computational efficiency using the new ESMF managed threading. Basically, the idea is that if you had a component that was threaded four ways and another component that was not threaded, then before, if they had to share the same nodes, the second component, component B, could only use one quarter of the cores in the node. That led to a lot of idle cores and poor HPC resource utilization, so ESMF last year introduced a completely new capability called managed threading.

C
I want to point out that it's there, and it's a high priority for us to try to get it working out of the box as we move forward. In terms of the atmosphere-ocean flux calculation, we have a completely new capability that I think will greatly facilitate scientific development, particularly if the ocean and the atmosphere and ice grids are at two different resolutions, and that's using what's called the exchange grid. Basically, the exchange grid is simply the union of a target and destination grid.
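As a loose illustration of the exchange-grid idea (a sketch only, not the CMEPS implementation, which is Fortran built on ESMF), fluxes are computed on the overlap cells formed by laying the two grids on top of each other and then aggregated back to either parent grid with area weights:

```python
# Rough sketch of the exchange-grid idea: each exchange cell is an overlap of
# one atmosphere cell and one ocean cell, so the flux can be computed from the
# highest-resolution information on both sides and then area-averaged back to
# either parent grid. All names and the bulk formula are placeholders.
import numpy as np

# Hypothetical exchange cells: (atm_cell, ocn_cell, area).
xgrid = [(0, 0, 0.7), (0, 1, 0.3), (1, 1, 1.0)]

def flux(t_atm, t_ocn):
    # Placeholder bulk formula; the real calculation is far more involved.
    return 1.5 * (t_ocn - t_atm)

def fluxes_on_atm_grid(t_atm, t_ocn, n_atm):
    """Compute flux per exchange cell, then area-average onto atmosphere cells."""
    f_sum = np.zeros(n_atm)
    a_sum = np.zeros(n_atm)
    for ia, io, area in xgrid:
        f_sum[ia] += area * flux(t_atm[ia], t_ocn[io])
        a_sum[ia] += area
    return f_sum / a_sum

print(fluxes_on_atm_grid(np.array([287.0, 288.0]), np.array([290.0, 286.0]), 2))
```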
C
So what does this imply? Thanks to Adam Herrington for showing how this worked initially in a tropical cyclone case. The main idea in the tropical cyclone tests is that, when you look at the stresses between the atmosphere and ocean while running a very high resolution atmosphere, say a quarter-degree spectral element with a one-degree ocean, you want the stresses to align with the winds, and that's what's happening here.

C
What you have here is an excellent example of the latent heat flux mapped to the atmosphere grid when you do the calculation on the exchange grid; on the left you can see that when you do it on the ocean grid, you get a very blocky structure. So this is now in CMEPS. Extensive simulations were done, including coupled hundred-year runs, and we have been given the go-ahead to make this the default, and that is what's going to happen.
C
I just want to add that it's not just us looking at the exchange grid: at NOAA, the seasonal system has also explored using the exchange grid, because currently they have been doing their atmosphere-ocean flux calculation in the atmosphere, and this enables you to actually bring it into the mediator, parallel to what we're doing in CESM.

C
In addition, what Ufuk Turuncoglu has done is bring in the ability to use different science for calculating the atmosphere-ocean flux, by treating the CMEPS mediator as a host model for the CCPP framework.

C
Dom will talk about this a bit later, but what this enables you to do now is have various atmosphere-ocean flux calculations that can be explored, all using the exchange grid, which is the correct grid to use both for refined grids and, regardless of refinement, whenever the atmosphere-ocean flux is computed on two grids that are very different. And finally, we're looking at new capability in CICE.
C
We are sharing this code with EMC, and it's going to be the default in upcoming CESM3 development tags. This brings in not just provenance and collaboration and new capabilities, but the ability to explore new functionality that we simply couldn't do before with our old WAVEWATCH code.

C
In addition, we're looking at new WAVEWATCH III and CICE coupling. You need that because the wave field breaks the sea ice into small floes, and the sea ice concentration then feeds back to the wave field. What we're trying to do is send 25 new spectral data fields from the wave model to CICE, and with ESMF/NUOPC you don't need to send 25 separate fields: you can actually pack all of these into one field that has an ungridded dimension covering all 25 fields.
C
So again, this is just another way of greatly simplifying coupling that would have been much harder to do before. And finally, very briefly, we are supporting CESM to do DART ensemble-filter data assimilation.

C
I haven't shown that yet; that is being worked on and validated. But at the same time, we're working with the JEDI group, which has a new operational data assimilation system, to create a layer on top of the NUOPC driver that would enable coupled data assimilation, via sending three-dimensional states to JEDI, thereby permitting a path forward to do data assimilation.
C
The key point is that, with these data models (and this was true with our old data models), you're able to ingest multiple data sources, each with completely different spatial and temporal resolutions, and also customize how they are handled. The key point is that all data is read in with parallel I/O, and you can now easily ingest both 2D and 3D forcing fields.

C
What's also new is that we have an online interface for CDEPS, where the prognostic components can call the shared code in CDEPS that does the time interpolation and online regridding directly from the component. That's starting to be used increasingly throughout the system, say for getting nitrogen deposition read in in the atmosphere NUOPC cap and so forth, and it's used extensively in CTSM already.
C
The collaboration with NOAA, who are using CDEPS now in their operational system, is that we have brought in a whole new set of data forcings, particularly for the atmosphere, that test and enable CDEPS to run with many new forcing scenarios that we simply couldn't do before. So this is the strength of having community software that is used by more than one modeling system: they bring new functionality and test it in totally different scenarios, so what you have is a much more extensible and robust system.

C
So you're not running the MPAS atmosphere; you're running the MPAS dynamical core in CAM. The MPAS ocean is what's going to be used, and in terms of being able to run on the MPAS grid, we have an MPAS sea ice model that is analogous to CICE6 but has a different dynamical core; it will use CTSM and, of course, CMEPS. So now we're taking CMEPS and exercising it in ultra-high-resolution configurations, and we ran into some problems when we started doing this, which is what's next.
C
Thank you very much. So, with ESMF, we encountered problems with memory scalability in terms of ingesting the grid for this ultra-high resolution, and we ran into several other memory bottlenecks. We worked very closely with the ESMF group, who created the ESMF 8.3 release on June 8th, and basically the main feature of this release that was applicable to us is that we now have scalable mesh creation from file.

C
This was done in very close collaboration with Jim Edwards and Bob from CSEG and with the ESMF group. So we now have scalable mesh creation, and the parallel I/O library that is used inside ESMF, which was an old PIO library, has now been migrated to PIO2, and lots of other memory scalability issues have also been addressed. So suddenly we can now run 7.5-kilometer runs with no problem, and we're looking at going further now.
C
I want to really give thanks. I didn't include the detailed list of people that have helped with this, because the list was getting very long, but I do want to have a call-out to the ESMF core team, Jim Edwards, Ufuk Turuncoglu from ESMF, and Rocky Dunlap, who really stepped up, and Bill Sacks, who stepped up to get some of these features in place.
A
Thank you so much, Mariana. We have time for one or two quick questions, so if you have a question, please type it in the chat or raise your hand, and then you will be able to unmute and ask your question.

C
I think the answer is yes, but I have to check with Rocky. I'm trying to remember the limitations, which meshes are limited, but I'll get back to you on that, Frank.
A
All
right
any
additional
questions.
D
Yeah,
I
just
wanted
to
clarify
something
that
marianna
said
earlier
about
the
esmf
aware
threading
that
does
work
out
of
the
box.
It's
ready
to
go.
A
Okay, I see Steve has his hand up, but we are at time, so I'm going to ask Steve to move his question to the general Q&A session at the end, and we're going to move on. Thank you, Mariana. Our next speaker is Dom Heinzeller, and he's going to talk to us about the Common Community Physics Package, the CCPP update. So, Mariana, you can stop sharing.

A
You can go ahead and share yours.
E
It works now, thanks. All right, so good morning, everyone. On behalf of the CCPP developer team at the DTC, I want to give you a brief update and summary of the Common Community Physics Package.

E
I've talked to this audience before, but to refresh your memory, this is a bit of the history of the CCPP. The work on it started in 2017, to develop the future UFS infrastructure for atmospheric physics. The goal behind this effort was to facilitate the improvement of physical parameterizations and the transition from research to operations.

E
The idea is to make it easier to add new schemes, modify them, or transfer them between different models. CCPP consists of three different packages: an infrastructure component, the framework; a library of compliant parameterizations, the physics; and comprehensive documentation. Here, on this image on the right from Rocky, you can see the CCPP in the Unified Forecast System, and we'll discuss this a little more on the next slide.
E
Oh, what I also wanted to say is that nowadays there is an experimental mode to run chemical parameterizations in the CCPP, not just physical parameterizations. The CCPP framework itself is not part of the actual model code; it's more like a code generator, also referred to as a data broker, that relies on documented interfaces for both the host model and each of the physics schemes. We refer to these as metadata tables. At build time, the framework parses the tables and connects the variables by one of their attributes, the CCPP standard name.

E
It then generates Fortran interfaces between the host model and the physics, and, as this figure suggests, these interfaces can hook up the physics with the atmospheric driver in different ways: for example, in a traditional way, where you'd call the physics right from the atmosphere driver, but also inside the dynamical core, which is known as fast or inline physics.
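As a loose illustration of the data-broker idea described above (the dictionaries, names, and units here are hypothetical; the real CCPP framework parses Fortran metadata tables and generates Fortran caps), matching by standard name is what lets units and dimensions be checked once, at build time:

```python
# Loose illustration of the "data broker" idea; not the CCPP implementation.
host_vars = {
    "air_temperature": {"local_name": "state%t", "units": "K"},
    "air_pressure":    {"local_name": "state%p", "units": "Pa"},
}

scheme_args = [  # what a (hypothetical) scheme declares it needs
    {"standard_name": "air_temperature", "units": "K", "intent": "inout"},
    {"standard_name": "air_pressure",    "units": "Pa", "intent": "in"},
]

def build_call(scheme_name, args, host):
    """Connect scheme arguments to host variables and emit a call signature."""
    actual_args = []
    for arg in args:
        var = host.get(arg["standard_name"])
        if var is None:
            raise KeyError(f"host does not provide {arg['standard_name']}")
        if var["units"] != arg["units"]:
            raise ValueError(f"unit mismatch for {arg['standard_name']}")
        actual_args.append(var["local_name"])
    return f"call {scheme_name}_run({', '.join(actual_args)})"

print(build_call("my_scheme", scheme_args, host_vars))
# -> call my_scheme_run(state%t, state%p)
```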
E
These
metadata
tables
they
are
used
together
with
inline
oxygen
documentation
in
the
ccpp
physics
schemes
to
generate
complete
scientific
documentation,
as
it's
shown
here
in
the
screen
screenshot,
so
the
scientific
documentation
and
the
fact
that
each
of
the
variables
that
a
parasitorization
needs
to
be
executed
are
described
in
full
in
those
tables.
One
of
the
ways
to
accelerate
the
development
of
physical
parameterizations.
It's
much
less
likely
that
you
get
units
or
something
like
that
wrong
or
dimensions.
E
The suite definition file shown here, for the operational GFS version 16 suite in the UFS, contains a surprisingly large number of schemes, at least for many of you, I guess. That's because of a fundamental difference between CCPP and other physics-driver-based modeling systems: all the glue code that resides inside these classical physics drivers and connects the different parameterizations, often tens of thousands of lines of code, must be converted into what we call interstitial schemes.

E
Some of the features of the XML suite definition files are shown here: for example, the ability to assemble schemes in groups that can be called individually or all together; subcycling, for calling schemes at a higher frequency or with a smaller time step; and user-defined ordering of the schemes. One word of caution here: if users want to change the order of schemes, then they also need to make sure that some of the logic in the interstitial schemes works as intended, because some of it still depends on the order.
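To make the groups-and-subcycling idea concrete, here is a tiny, made-up suite definition in the spirit of the XML files described above, parsed with the Python standard library; the element and attribute names are assumptions for illustration, not the exact CCPP schema:

```python
# Illustrative only: a minimal suite-definition-like XML with groups,
# subcycling, and scheme order, and a small parser that lists the schemes.
import xml.etree.ElementTree as ET

SUITE_XML = """
<suite name="demo_suite">
  <group name="radiation">
    <scheme>rad_sw</scheme>
    <scheme>rad_lw</scheme>
  </group>
  <group name="physics">
    <subcycle loop="2">
      <scheme>pbl</scheme>
      <scheme>microphysics</scheme>
    </subcycle>
  </group>
</suite>
"""

suite = ET.fromstring(SUITE_XML)
print("suite:", suite.get("name"))
for group in suite.findall("group"):
    for node in group:
        if node.tag == "subcycle":
            schemes = [s.text for s in node.findall("scheme")]
            print(f"  group {group.get('name')}: {schemes} x{node.get('loop')}")
        else:
            print(f"  group {group.get('name')}: {node.text}")
```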
E
Also, as Mariana already mentioned, the ESMF team recently contributed the exchange grid capability to the UFS, in which CMEPS was modified to act as a CCPP host model and to perform the atmosphere-ocean flux computations. And there was a typo there that I just skipped over. All right.

E
Up until today we have had five releases of CCPP, each time with an updated and growing list of supported physics and one or more host models. From the very first version on, CCPP was released with the single column model that is also developed by the DTC, and it's one of the key components of UFS hierarchical system development. CCPP was also part of all the UFS medium-range and short-range weather app releases so far, and more are coming soon from EPIC.
E
For example, the upcoming CCPP version 6 is going to support four suites with the UFS short-range weather app: the operational GFS version 16, the Rapid Refresh Forecast System version 1 beta, the Warn-on-Forecast suite, and the HRRR suite, and then two additional suites with the single column model.

E
All right, changing gears a little bit. An important effort undertaken in the last year focused on the CCPP standard names. They are really one of the key aspects of the CCPP, because they are used to communicate variables between a host model and the physics.

E
Whenever possible we try to use standard names provided by the CF convention, but for many of the variables we had to come up with additional names; we had to create new names, and it was difficult because we didn't have any clear rules in the beginning, so we definitely saw a proliferation of often fully constructed names coming up. To address this issue, the DTC worked with NCAR and the community to put in place a set of rules for creating new names and to assemble a dictionary of the current standard names that are in use.
E
This is not the only standard-name effort that has been made in the past, but unfortunately, again, this set of standard names is not connected with any of the other efforts in the United States, for example the ESMF standard names, the physics constants dictionary, or the JEDI IODA data conventions. All of these standards were defined independently, and I personally see this as a bit of a missed opportunity; hopefully something can be done there to make them more interchangeable.

E
All right. Over the last few years a team was assembled to discuss code management practices for CCPP physics. This team has participants from various institutions such as the DTC, the NRL, NOAA, and NCAR. The main topics here are to discuss what the future collaboration on CCPP should look like and how to come up with standards for that. As you can see here, there are common interests, like parameterizations for some processes.
E
One interesting discussion point was centered on the idea of having a single, authoritative CCPP physics repository for all models. That's currently not the case: right now we have an authoritative CCPP physics repo for the SCM, UFS, and NEPTUNE that's managed by the DTC, and we've got an NCAR repository for CCPP-compliant physics that are shared among the WRF, MPAS, and CM1 models; there's a reference to Laura Fowler's talk that explains that in a bit more detail.

E
So, as most of you know, in 2019 NOAA and NCAR signed a memorandum of agreement to co-develop the CCPP framework as a single system to communicate between models and physics; it was actually on one of Mariana's last slides, about that ongoing, NSF-funded project as well. The idea was to jointly define the requirements for the next-generation framework and then converge on one single common framework with superior and extended functionality.
E
Here is the original timeline for converging to a common CCPP framework for both sides, and, as you can see, we decided to approach this from two angles: NCAR CGD started developing the next-generation code generator, called capgen, based on the joint requirements that we fleshed out after the MOA was signed, and then NOAA and the DTC incrementally adopt some of those new features and design specifications, so that a future transition to capgen in the UFS would be almost seamless, with very little disruption for the users and developers.

E
So it sounds a little sad, probably, but there is also some good news for CCPP on the horizon regarding the transition to operations at NOAA. This is the schedule for when CCPP will become operational in the various NOAA models: it starts in 2023 with the Hurricane Analysis and Forecast System and the Rapid Refresh Forecast System, and then later in GFS and GEFS as well.
E
And while there is some uncertainty about the future implementation and development of CCPP at NCAR as part of SIMA, we're still hopeful that one day, not too far out, we'll see CESM and MPAS ship with the one CCPP framework under the hood, and that NCAR will join NOAA and the Navy and other organizations like NASA and DOE, where individual groups have already started to experiment with CCPP.

E
I've got a bit more time left, so let me quickly go to the additional material. What I didn't mention is that, because CCPP consists of a code generator and has a very flexible metadata standard that gives you a lot of information about the variables it needs to handle, there's a ton of opportunities for development.
E
If
the
index
ordering
is
not
correct,
vertical
flipping
calculation
of
derived
variables,
potential
temperature
from
temperature
and
geopotential
a
visualization
tool
that
shows
you
how
a
variable
travels
through
a
suite,
is
it
modified
somewhere?
Is
it
just
read
in
or
is
it
bypassing
a
certain
scheme
entirely?
That's
actually
already
in
progress.
It's
basically
finished
at
that
that
project,
better
error
handling,
to
include
traceback
information,
being
able
to
split
to
specify
whether
you
want
to
have
time
split
or
process
split
and
schemes
or
groups
of
schemes
in
the
suite
definition
file.
E
Automated
saving
of
physics
scheme
states
for
restarts,
we've
got
more,
there's
also
ideas
about
creating
a
generalized
way
to
create
either
ccpp
or
neuropc
capsule
physics,
so
that
you
can
run
them
either
inline
or
as
a
separate
component
and
the
ability
to
leverage
gpus
by
automatically
offloading
this
code
onto
gpus
in
the
order
generated
caps,
and
then
there's
also
some
idea
about
how
to
decompose
and
recombine
grid
columns
into
different
surface
types.
For
selected
physics
that
have
been
floated
around,
so
that's
basically
all
I
had
to
say
I'm
going
back
to
my
summary
slide.
E
Thank
you
for
your
attention
and
I'm
looking
forward
to
your
questions
either
now
or
later
on,
offline.
A
All right, thank you so much, Dom. We definitely have time for some questions. I welcome you to raise your hand, if you have that function, or to put them in the chat.

A
Let's see. I don't see any raised hands yet, but Adam Herrington has a comment in the chat. Adam, do you want to unmute yourself and just speak your comment?

D
Yeah, I just wanted to voice some disappointment that the CCPP is stalled right now pending the SIMA review. Can you say anything more about it?
E
I can tell you that I'm going to be on the SIMA review panel starting from next week, or I think it is in two weeks' time, and then hopefully I'll be able to tell you more. I'd have to defer you to my dear colleague Steve Goldhaber, who maybe has a little more information on that. Sorry.

A
There is another question in the chat, from Richard Loft. Richard, do you want to unmute yourself and ask?
G

E
So we've had a project funded at NOAA GSL to prototype some of this work, and it's been partially implemented: we GPU-ized one of the schemes in CCPP, the Grell-Freitas convection scheme, and then sort of hard-coded the calls into the model.

E
We would bypass the framework for this, but just to demonstrate that there is a physics package in CCPP that can be called and run on GPU, and that has shown great potential in terms of speed-up. With respect to when the work on the framework would start: my personal opinion is that it's relatively high up in the priority list, after the next GFS operational implementation, and Ligia, please correct me if I'm wrong, because I know that EMC is also looking into GPUs for the future.
G
Yeah, okay. Could I just follow up real quick on that: is that written up anywhere, the technique that you used to bypass, or to GPU-ize, the Grell-Freitas?

E

A
And just to add, in terms of prioritization, for GPU offload or many of the other developments that Dom mentioned: we are hoping to conduct a CCPP visioning workshop in which we would discuss that. We don't yet have the funding for conducting that workshop, but we're still assembling funding, and if things work out, that would happen in late 2022 or early 2023. That would be a great venue for discussing, in a multi-institutional setting, what our priorities are and how to go forward to fund some of those priorities.
A
So the SIMA review is really important because, you know, Steve Goldhaber and colleagues already did a lot of development on this next-generation CCPP framework, and some decisions need to be made about whether that goes forward for NOAA or not, so that developments such as GPU support can be done on top of either the existing system or this next-generation system.

A
And with that, I'm going to thank Dom for his presentation. If you have more questions for Dom, please save them for the final Q&A or put them in the chat, and we're going to move on to the presentation by Brian Dobbins about CESM and new technologies: clouds, containers, and accelerators (GPUs). So go ahead, Brian.
F
Hi everybody, yeah, I'm Brian Dobbins; I'm a software engineer in CGD. I've got a lot of material here, so I'm going to go pretty quick, but I'm happy to also have offline conversations with anybody. To jump right in, I'm going to start by talking about CESM in the cloud. We've been doing runs in the cloud for a while now, but we finally have production science runs and production workshops, so we've really made a lot of progress lately. On the left you see these production science runs; these are large.

F
It's a one-degree WACCM case, about 15 million CPU hours, going to generate about a petabyte of output, and the whole workflow, from running to post-processing to archiving, is all done on AWS. On the right, we recently used the same infrastructure to deploy a JupyterHub to run a training for CTSM. There are about 60-plus users; it automatically creates the accounts, makes it nice and easy, multiple queues, node types. So these have been really successful projects, and we're hoping to make these tools available to the community soon.
F
So how do we use the cloud? To start, you just need a cloud account and credentials. Then you have to learn the cloud interface. Then you have to choose a node type, then a network type, a storage type, a storage size, and a cloud region; create a YAML file; configure instance settings; configure network settings; launch cloud resources; update the OS and tools; install the libraries and compilers; install and configure CESM; and then you can do your science. Now, I've actually made this a little bit simpler than it is.

F
I skipped a few things to get this all to fit here, and this is clearly not something that we want the user community to do, because it is not sustainable for them. So our approach is that we're creating an API that makes this a little bit simpler: the idea is that you have a user with cloud credentials, they access this API, and it does all of those steps for you.
F
This offers several key advantages. It's a lot easier for scientists. We can support multiple clouds with a single interface, so you don't need to learn two different cloud interfaces, and we can add features: you can just click a button and select to add a JupyterHub; it can phone home, so if you have trouble with a run and we want to support that, you can enable remote access via an encryption key from NCAR; and tools like the post-processing containers and so on, we can deploy automatically for people.

F
When I talk about an API, that's a pretty abstract idea, so, just as an example, here are two interfaces that I've sort of played around with, though we don't have anything up and running for the public yet. On the left you see a website; this is pretty easy: you select what kind of cluster type (here I did single user), and you don't see any account information.
F
You just enter your access keys, you click the button, and in about 10 minutes it'll give you an SSH address that you can connect to. On the right, I'm running the verbose version, so it's showing you what it's doing: finding credentials, checking the mode, selecting the node type, and it will return an SSH command that you can use to connect to your cloud resources.
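None of this tooling is public yet, so purely to make the idea concrete, here is a hypothetical sketch of what a thin command-line client for such an API could look like; the module, function, and argument names are invented for illustration and are not the actual CESM cloud tooling:

```python
# Hypothetical sketch only: a thin client for a cloud-launch API like the one
# described above. Names and arguments are invented for illustration.
from getpass import getpass

def launch_cesm_cluster(access_key, secret_key, cluster_type="single-user",
                        region="us-west-2", verbose=True):
    """Stand-in for the API call that provisions nodes, storage, and CESM."""
    steps = [
        "finding credentials", "checking quota", "selecting node type",
        "launching instances", "installing compilers and libraries",
        "installing and configuring CESM",
    ]
    if verbose:
        for step in steps:
            print(f"[cesm-cloud] {step} ...")
    # A real client would call the cloud provider here, wait for the cluster
    # to come up, and then report how to reach it.
    return "ssh -i ~/.ssh/cesm-cloud.pem user@cluster.example.org"

if __name__ == "__main__":
    key = input("Access key: ")
    secret = getpass("Secret key: ")
    print("Connect with:", launch_cesm_cluster(key, secret))
```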
F
So the cloud is really easy, or we can make it very easy; that's the upside. But let's talk about one of the downsides, which is the cost. Right now, with the on-demand pricing you see in the table above, prices have improved: they've gone from 11 cents an hour on the older generation of nodes to about three cents an hour.

F
But the Derecho cost equivalent, with my back-of-the-envelope calculation, is about 0.3 cents an hour, so for compute it's still more expensive to use the cloud. Another thing, and I've added this slide because there's been a lot of talk about cloud data and cloud-based analysis lately: when we do HPC, we often think about compute costs and compute hours, and we take it for granted that data transfer is free. In the cloud, that's not the case, and there are costs.
F
For example, if you have to store 20 terabytes in the cloud per month, that's not too bad. If you wanted to download that, you pay a decent chunk, and that's per download: if somebody else downloads it again, you pay that again. So, for example, we have this ARISE data that we're hosting in the cloud, and if you were to download the whole thing, it's $30,000.

F
If 10 people download it, that's $300,000. Thankfully, there are three ways of hosting data in the cloud. One is that you get generous support from AWS and the Amazon Sustainable Data Initiative, and they will host it for free; they've done this with the CESM LENS data and this ARISE data. Otherwise, either the host has to pay for it, which is a potentially unbounded cost, or the downloader pays. This is problematic for open science in the cloud.
F
Okay, so moving on really quickly to containers. We talked about the cloud offering configurable environments and the benefit of that, because we can pre-install things; containers do the same, but on your own hardware. This makes it really easy to provide ready-to-run tools to the community: it greatly simplifies porting, and you can ensure some cross-system compatibility.

F
You just use the base container, and you still get a consistent environment and analysis platform across all users.

F
Now, containers aren't limited to CESM itself; we're also containerizing a variety of our tools. We have some CAM topography tools; we recently containerized the CESM time-series generation tool, which is a big one that's gotten a lot of interest; and Allison Baker is going to talk later about some compression tools, and I've started working on a container for those. We can make these parts of workflows for people, so you don't need to install your own software; you can kind of just use the container, it's done for you.
F
On desktops and laptops this has typically been Docker, and on HPC systems we usually use Singularity, which has now been renamed Apptainer.

F
If this is of interest to you and you don't know how to use these, because these are some new skills ("I need to use Docker", "let me do Singularity"), get in touch and we'll happily help you out with that. One of the benefits of these configurable environments, the cloud and containers, is that you can unify them so that what you see on one is what you see on the other. This is really helpful because it means that if you're a new user of CESM, you can learn on your own laptop and then use the cloud, and transitioning to that is trivial, because it's the exact same environment.
F
So you don't need to worry about any differences, different paths, different directories; it's all very easy. On the topic of standardization, another thing we're working on is standardizing Jupyter environments. We call this EASE, the Earth Analysis Science Environment, and it's just a pre-installed conda environment with the Pangeo stack and various CESM tools. The idea is that we don't want students to have to go through conda installs and modifying their own environments; we want everything to sort of work out of the box.

F
This comes with our new diagnostics and various packages, and it will be updated on a rolling basis.

F
At the end of the day, we want the same EASE kernel to be available on Derecho and Cheyenne and in our cloud and container environments, so any NCAR environment would have the same kernels available, ready to use out of the box, no configuration necessary. Okay, so one of the biggest topics that I think people always wonder about is accelerators in CESM, and there are four key challenges here that I'm going to talk about; that's the goal today.
F
Let's start with performance versus efficiency. We don't have a GPU-capable version of CESM, so as a proxy I took MPAS-Atmosphere version 6 at one-degree resolution with 58 vertical levels (these are the CAM7 vertical levels), in double-precision mode, and I ran it on the Derecho hardware, so these are the Derecho GPUs and the Derecho CPUs. You can see this impressive difference: basically, the GPU is about 2.75 times faster than the CPU, so roughly 3x on a single node, and that is impressive. It's more power efficient; it's a really big win. Now, this is not the whole story, though.
F
If we scale out this one-degree run, then we get a different story. The GPU needs a lot of parallelism to perform well, so it rapidly runs out of that parallelism and doesn't scale as well, whereas the CPU scales very well. So here the GPUs are more efficient, but the CPUs are able to perform much better.

F
Now, in this case, this is using an older version of MPAS-Atmosphere that has a known issue with lacking GPU-direct communications, so this will improve a little bit. But at the end of the day, for the one-degree workhorse resolutions, there are still advantages to both. This is a great result for GPUs; there's really great value in the efficiency, and there's also great value in the speed of the CPU. So, for example, for paleo runs you're definitely going to want a CPU.
F
Another big question when we talk about this is the issue of science capability versus capacity. This is something we talk about in HPC with systems, and it boils down to: should NCAR focus on enabling a few uniquely large-scale runs, or on supporting a high volume of science at workhorse scales? There's not a right or wrong answer here.
F
These are both great approaches, and I think we need input from the community. I'm a bit of a numbers guy, so I like to visualize things and have data, so I did a bit of an analysis here. Cheyenne has 145,000 cores; that's about 1.27 billion core-hours per year with no downtime, 24/7, 365 days a year. About 55% of Cheyenne goes to CESM; that's about 700 million core-hours. And if we do a B1850 coupled run at one degree, it takes about 2,200 core-hours per simulated year. So, with nothing else being done with the system, no analysis, no higher resolution, no lower resolution, we get around 320,000 simulated years per year of Cheyenne.

F
If we were to do a 3.75-kilometer run and we apply linear scaling to this, that's 32x in the lat-lon directions and 32x in the time step, and we get about 9.7 simulated years per year. So, with 55% of the system dedicated to one user, that would be less than one model year per month. There's great science to be done at this resolution.
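As a quick sanity check of the back-of-the-envelope numbers quoted above, here is the same arithmetic written out; the roughly 2,200 core-hours per simulated year figure is the per-simulated-year cost quoted in the talk for the one-degree B1850 run:

```python
# Back-of-the-envelope check of the throughput numbers quoted above.
cores = 145_000                                  # Cheyenne cores
core_hours_per_year = cores * 24 * 365           # ~1.27 billion with no downtime
cesm_share = 0.55 * core_hours_per_year          # ~700 million core-hours

cost_1deg = 2_200        # approx. core-hours per simulated year (B1850, 1 degree)
print(f"1-degree throughput: {cesm_share / cost_1deg:,.0f} simulated years/year")

# 3.75 km with naive linear scaling: 32x in each horizontal direction and 32x
# more time steps, i.e. 32**3 times the cost per simulated year.
cost_3p75km = cost_1deg * 32**3
print(f"3.75 km throughput: {cesm_share / cost_3p75km:.1f} simulated years/year")
```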
F
But it's a question of what we are aiming for: are we aiming for large-scale runs, or are we aiming for a lot of runs of different science at workhorse resolutions? This is where we need input from the community. On code portability: this was a concern for us that is no longer a huge one, thanks to some great work that CISL has been doing. Basically, a lot of the early work on accelerators was done with OpenACC on NVIDIA GPUs.

F
I'm sorry, that's my dog barking. We had concerns about OpenACC because we are a community model; we need portability. We can't tell people "you have to run NVIDIA GPUs", and the Intel GPUs wouldn't support it. So we were leaning towards OpenMP, and CISL has been working on both OpenACC and OpenMP, and they've been using this Intel converter and exploring that. This is a great path forward for us, because it enables us to look at embracing OpenACC early and having automatic conversions.
F
So it's not a sunk cost for any development work we do. Finally, one of the biggest things, of course, is the people time. Models of similar complexity have taken around 10, or greater than 10, FTE-years to GPU-ize. So do we focus our efforts on this, or on science features, or usability features? Again, an open-ended question; there's value in both. And can we get additional funding or community help to do both of these? That would be great.

F
Finally, there's still relatively low adoption of GPUs in academic systems. If you look at the Top 500 US academic sites, I think GPUs are on around 20% of the nodes, and a little less than that in an informal XSEDE campus champions survey.
F
Finally, I'm going to move very quickly to an extra slide I added after the discussions on Monday. One thing that Everett mentioned is serving underserved communities and the global use of CESM.

F
The one challenge we're seeing now is that, as our use increases via clouds and containers, it's coming from novel IPs, there are more users, and all that input data is being funneled through NCAR. Well, that doesn't need to be the case: we can put data on remote servers and have data go through there. Jim Edwards did some great work on enabling this in CIME, so we just need to have the data hosted somewhere, and we're talking to cloud vendors about that.
F
Longer term, because of cloud egress charges, you really need some sort of mesh network, which relies on the fast networks at university sites to pull in data from elsewhere, and this is something that we're beginning to think about and want to pursue as we move forward. So, I know I moved very quickly through all of this. Our summary is: we have CESM and JupyterHub usable via AWS for science and training; it's pretty easy to use, but not as cost effective as on-prem systems.

F
Containers are great; they enable ready-to-run applications, and we're looking at some new tools. Again, the time-series one gets asked about a lot, so we'll try to make that available to people soon. And the big one: our GPU approach is a careful balance of technology, science, systems, and people, to do what's best for the CESM community.
F
The technology is very exciting, but we're trying to be a little bit conservative in how we embrace it, to see where it's really going to serve us best and to balance those issues of resources with time and systems. Finally, data access: as I said, it's a growing issue, and we're looking at some ideas there. So, sorry I went through that very fast, and I apologize for my dog barking.

A
So if you have questions, please raise your hand or put them in the chat, and then we'll ask you to unmute and ask your question.
A
So, Brian, maybe you can talk a little bit more about this aspect of choosing what parts of the modeling system... oh, I see Rich has his hand up, so, Rich, go ahead; I'll ask my question later.

G
Okay, yeah, I just want to make a comment about this, which, you know, Brian, you and I have talked about. I think the best way to convey all this about GPUs is that GPUs are all about data parallelism, and that can come from either the size of the ensemble or the number of points available in the grid of the model, for example.

G
So, you know, I think the issue is that small ensembles and low-resolution problems are not in the GPU wheelhouse.

G
That's just to sort of follow up on that and flesh it out. Back to your question.
A
I was just going to ask you to talk a little bit more about how you're working to select which parts of the system to offload to GPU versus CPU; I'm imagining you're not testing the entire system on GPUs.

F
Oh, I think that question in the chat was actually for Dom, from the last talk. Yeah, I mean, we have a talk coming up from Jian Sun about some GPU microphysics, but we're still sort of taking a large-scale look and not working to massively convert things to GPU yet; we're kind of letting the technology simmer down a bit.
A
Okay, I don't see any questions in the chat. I see a comment from Eric; I did get kicked out for a couple of minutes there, so if something happened, I don't see it. So at this time I don't see any questions or any other hands raised.

A
Okay, well, if not, let's just go ahead and move on to the next talk. We're going to continue on the topic of GPUs, and we're going to have Jian Sun talk to us about enabling the execution of PUMAS on GPUs. And Jian, we can see your screen; it is not in presentation mode yet... yep, looks great, so please go ahead.
H
Okay, thank you, Ligia, and thank you, Brian, for giving a nice background introduction to GPU computation, which I think is very helpful for my talk as well. Good morning, everyone; I'm Jian from NCAR. Today I'm going to present the work that I collaborated on with John, Sheri, Brian, Andrew, and Kate during the past two years. The title of this presentation is "Enabling the execution of PUMAS on GPUs". Here, PUMAS refers to the cloud microphysics scheme used in CAM right now, and the presentation will be outlined in the following sections.

H
First, I will describe briefly what cloud microphysics is. Then we will do an overview of the PUMAS code in CAM. After that, I will describe the methodology we used to offload the CPU PUMAS code to the GPU, along with some preliminary results. Finally, I will draw conclusions and mention some future work.
H
It also requires some water vapor calculations from the CAM code, which adds an additional 1,100 lines of code. Considering that the CAM physics code contains about 0.4 million lines of source code, PUMAS only represents about 1.2 percent of the total CAM physics code, but it can contribute about eight percent of the total computational time of CAM physics.

H
We know that CAM is already highly scalable on CPUs through MPI and OpenMP. Therefore, the strategy we use in our work to speed up PUMAS is to offload it to the GPU. Instead of working directly in the whole CAM code, we start from a KGen kernel for PUMAS, which makes it easy for us to do the code development, debugging, and testing. In this particular work, we chose the directive-based GPU offload method to do the GPU porting, and we have explored both the OpenACC and the OpenMP offload directive methods.
H
The reason we use the directive-based programming model is that it keeps a single source code for both the CPU and GPU versions of PUMAS, which improves the maintainability of the source code. By using the directive-based method, we are basically adding pragmas to convert the CPU code to GPU code; therefore, the readability of the code is only minorly affected.

H
One thing I want to highlight, and I think Brian also mentioned this in his talk, is that in this work we used Intel's auto-migration tool to convert OpenACC directives to OpenMP offload directives, and it worked very well in our case. I put the link to that tool in this talk, and everyone who is interested is more than welcome to check it out and apply it to your production code.
H
When we port CPU code to the GPU, we usually don't expect the results to be bit-for-bit, because we are running the code on different platforms and we may have to use different compilers as well. So the big question we asked ourselves during this work is: when we observe a difference, is this difference expected, or is it due to a code bug that we introduced during the implementation? In this work, we use the CAM ensemble consistency test to examine the GPU code and ensure its correctness before we look at any performance data.

H
Here, the GPU results only account for the GPU computation, excluding the data movement between the CPU and the GPU. The x-axis is the input data size and the y-axis is the performance metric, with a higher number meaning better performance; both the x-axis and the y-axis are plotted in log scale. You can see that the GPU performance is consistently worse than the CPU performance, which is against our original expectation.
H
So I will show some examples that we think are critical for GPU performance. The first example is a loop dependency in the original implementation. Here I give the example from the original PUMAS code: in this particular code, we can see that when we calculate the precipitation fraction variable, we have a vertical dependency in the k loop. Therefore, if we add GPU pragmas directly to these loops, we will end up running both loops in serial on the GPU, because we have to run the outer loop in serial first.

H
This really doesn't make sense on the GPU, because the column calculations are independent. Therefore, the solution for this case is that we reverse the loop order and put the column loop on the outside, and in this case we can at least achieve some parallelism on the GPU.
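The actual PUMAS code is Fortran with OpenACC/OpenMP directives; purely as a structural illustration of the reordering just described, here is a small Python/NumPy sketch with invented variable names, where the vertical recurrence stays serial within a column while the independent columns provide the parallel dimension:

```python
# Structural illustration only (the real code is Fortran with OpenACC/OpenMP
# directives): a vertical recurrence that must run serially in k, but is
# independent across columns, so the column loop is the one to parallelize.
import numpy as np

def precip_fraction(cloud_frac):
    """Toy top-down recurrence per column: carry the running maximum downward."""
    ncol, nlev = cloud_frac.shape
    frac = np.zeros_like(cloud_frac)
    # Independent across columns: this (vectorized) dimension is what maps to
    # parallel GPU threads, while the k recurrence stays serial inside.
    frac[:, 0] = cloud_frac[:, 0]
    for k in range(1, nlev):                 # serial: depends on level k-1
        frac[:, k] = np.maximum(frac[:, k - 1], cloud_frac[:, k])
    return frac

cf = np.random.default_rng(0).random((4, 6))   # 4 columns, 6 levels
print(precip_fraction(cf))
```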
H
Another example that we found to be critical is that in the PUMAS code there is a very expensive calculation called sedimentation, and this sedimentation is repeated for the different hydrometeors. In the CPU version of the code, you can see that the five sedimentation calculations are done in serial, but if we look at the implementation in detail, they are actually independent from each other. So for the GPU, it's really not necessary to follow the same calculation workflow.

H
There are other optimization examples that I didn't list here, but after some optimizations we get new performance data for the same KGen kernel. This is a similar plot to the previous slide, with the dashed lines referring to the original performance data and the solid lines referring to the improved performance data, that is, the performance with the optimized GPU code.
H
First, we can see that, although we have done a few GPU optimizations, the performance on the CPU is barely affected, which we think is really good. But for the GPU, compared with the original one, the new performance is significantly improved, and in this case we can see that at a certain point the GPU is able to outperform the CPU, and when we increase the problem size the GPU shows more benefits.
H
H
So in the previous slides we have shown that we can benefit from GPU porting of the PUMAS kernel. The next question we asked ourselves is whether we can maintain the same speedup in a real CAM production run. So in this particular case we did a CAM simulation with the F2000 component set; we used the FV dycore at one degree and performed a one-day simulation.
H
So here is a bar plot: on the x-axis is the data size that is uploaded to the GPU per kernel launch, and the y-axis is the time per sub-step per MPI rank, so a lower y-value means better performance. First we look at the blue and the green bars, which refer to the CPU results on Cheyenne and Casper. We can see that, in our cases with different data sizes, the Casper results are consistently better than Cheyenne's, which is expected because Casper has a relatively newer CPU architecture. When looking at the GPU results, if we use the default value of pcols equal to 16, we can see the GPU performance is much worse than the CPU one, because we spend most of the time on the data movement between the CPU and the GPU.
H
However, if we increase the data size and reduce the data-movement frequency between the CPU and GPU, the GPU performance improves quickly, and in the end we are able to achieve a two to three times speedup compared with the best performance on CPU, which is on Casper. Therefore, from this plot we can see that the GPU-enabled PUMAS can also outperform its CPU version, even when we are using it in a practical CAM simulation.
H
If we focus on the GPU result, look into the details of it, and decompose the time contributions from different perspectives, we can see that for different pcols values the data movement from CPU to GPU and from GPU back to CPU contributes much more than the computation on the GPU itself. So from this pie chart we can clearly see that the data transfer can be more time consuming than the computation for the PUMAS case, and that should be our next focus of optimization in the future.
H
So here is a short summary of what we have done. First of all, we have fully offloaded PUMAS to the GPU by exploring the OpenACC and OpenMP offload techniques. We evaluated the performance on CPU and GPU, and the results show that even though we are using only one GPU per node, we can achieve some promising speedup compared with one CPU node. One thing I want to point out is that, in my opinion, OpenMP offload is relatively new compared with OpenACC, and in our work we find that OpenACC still performs slightly better than OpenMP offload.
H
So that's what we have found so far. Talking about the next steps: since we got encouraging results from one GPU per node, it is natural to think of using multiple GPUs per node and seeking further potential speedup on the GPU, and since in our results we find that the GPU favors large problem sizes, in this case we are interested in whether the GPU speedup can be even larger.
H
So this work is funded by the NSF EarthWorks project and NCAR core co-funding. Besides the co-authors listed in the presentation title, I also want to give a huge thanks for the contributions and the help from different people from different labs and organizations, with their names listed on this slide. So with that, I would say thank you for your attention, and I'm happy to take any questions.
A
Yep, thank you so much, Jian. So we definitely have time for some questions. If you have one, please raise your hand or enter your name or question in the chat and we'll ask you to unmute.
G
Yeah, Jian, one thing that I didn't catch: you talked about using asynchronous kernel launches in order to...
H
H
G
That's this stuff that you showed here, but how much did that actually speed up the code compared to without it, like if you turned that off?
H
It's about 10%, and I would like to say this is really case by case, because by running GPUs asynchronously we need to synchronize them at some point, and synchronization is a very expensive operation in GPU computing. So it's really a case-by-case choice, I would say.
H
A
Okay, Jian, so have you been able to keep your code base unified, for, you know, use on both GPU and CPU, or do you have to keep separate codes?
H
So yeah, thanks for this question. Let me go to my slides here. Okay, yeah. So, as I mentioned, the method we use in this work is called the directive-based parallel programming model, and a nice feature of this method is that we can keep a single source code for both CPU and GPU, which means that we only have a single source code, but we can enable it on either CPU or GPU by some choice of compiler flags.
A
H
A
Is this pretty much the only physics code in CAM/CESM that is using GPUs, or is this one of a variety?
H
Yes, thanks for this question as well. So, to my knowledge, PUMAS is currently the only one brought into CAM for a real GPU simulation. I know RRTMGP, which is a radiation scheme, also has OpenACC and OpenMP offload directives, but I personally haven't tested it so far. I think some people in the audience might have experience, and hopefully someone can jump in and clarify.
A
G
Yeah, I just thought I'd add: if you look at Jim Hurrell's presentation from the Atmospheric Working Group yesterday, there's a status slide which shows the status of these things. Jian is right: there's an RRTMGP version which is being worked on by Brian Medeiros and company, getting that into the physics suite, and there's also work being done on CLUBB under a DOE contract to GPU-ize it using offload directives.
A
Well, I think that's it for questions. I don't see any other raised hands, so I think we are going to bring this first session, or the first half, to a close. So thank you to all the speakers so far. We're going to head into a break and reconvene at 10:25 for the second set of talks, but before we go there, let me ask Bill if there are any announcements or anything we need to know.
B
No, that sounds great; we'll see you in 20 minutes. I guess I'll just remind people, if they joined a little late, that we would be happy to see you this evening at 5:15 at the Rayback Collective, for those who are local and able to join us for a little social gathering this evening. Thanks for a great set of talks for this first set, and thanks, Alicia, for moderating.
D
B
As we get started on the second half: first, for those who are just joining us for the second half, we would like to invite you to join some members of the Software Engineering Working Group this evening at 5:15 at the Rayback Collective in Boulder, if you're local. So, yeah, we hope to see a number of people there. And then just a reminder that, as in other meetings, we're following the NCAR code of conduct here: offering constructive feedback, sharing the air, acknowledging teamwork, encouraging innovation, showing appreciation, and considering new ideas.
B
So with that said, yeah, I'd like to turn it over to Jesse Nusbaumer, who's going to be presenting on the atmosphere diagnostics framework. So take it away, Jesse.
I
Great, thanks, Bill, and thanks all for letting me speak. Real quick: there's an asterisk here, because everyone's been calling it the Atmosphere Diagnostics Framework, and then we found out 10 minutes ago that way back in the original documentation it was supposed to be the AMWG Diagnostics Framework. So, well, it's definitely the ADF anyway.
I
I'm just going to talk about the package, and I first want to thank my collaborators: Cecile and Brian, who have kind of been with me from the beginning and are the ones who even kind of started this whole project; Justin, who is a new associate scientist who's taking more and more of this work on (it might eventually kind of become his); and then Julie and Danny, who have also been contributing a lot to the development, and also doing a lot of the thankless organizational work that we all need but no one wants to do; and then Andrew Gettelman, I guess, for, you know, senior-scientist things. So let's get started.
I
Let me go for it; there we go. So, you know, CAM: the AMP group has the AMWG diagnostics, which have been around, actually, I looked, for over 20 years, and it basically provides push-button diagnostics capability. Basically, you just send it the paths to your CAM history files and it outputs a bunch of tables and plots, and it puts it all on a website, and it's still being used.
I
I used it in grad school, people use it in the community, it's still being used for CAM7 development, and, you know, I want to give a shout-out to it. I'll be amazed if the ADF lasts 20 years, and so, you know, any credit to software that can be around for that long and still be used regularly. In terms of the actual software itself,
I
you know, it's basically just a bunch of NCL scripts that are wrapped with a C-shell wrapper, and so that's kind of where the issue is, right? So we have to move on. One of the reasons is because, you know, NCL is being deprecated; it's just no longer really supported by CISL, and although it's still used, and still can be used, you know, eventually there will come a time where it'll just be really hard to maintain.
A
I
With the AMWG diagnostics it's also hard if you want to change, you know, contours or levels or anything like that, and it's particularly difficult to work with vertical levels different from what the original CAM had, which is a problem because a lot of the new versions of CAM have many different level sets.
I
You know, we have a 58-layer and a 93-layer, and for all of these, you know, we need the diagnostics packages to be agnostic to those differences. And then finally, which is, you know, just because this package has been around so long, it's lacking a lot of modern software practices, right? There's no open development, it's not on a repo really of any kind, and it doesn't have any sort of testing or CI systems or anything like that.
I
So, you know, given these issues, we realized we had to move on and build a new package to kind of deal with this, and so a lot of us in AMP got together and we discussed it, and we kind of looked through the community. You know, diagnostics, in my opinion, is in some ways kind of a,
I
if not saturated, a very full area, right? There are a lot of packages. We looked at things like ESMValTool, which is developed, I think, by DOE; we looked at MDTF, which is developed by NOAA; they all kind of do the same thing. And then, during this, myself, Brian, and Cecile developed, in like two days, a quick Python thing, like "this is what it might look like if we, you know, started from scratch," and that's how we landed on it.
I
We landed on that strawman. So that strawman is now the ADF, and, you know, in some ways that was great, but in other ways it did kind of immediately add a lot of technical debt. So if, you know, in those first six months, you saw it, you might have been like, "Ooh, that wasn't the best-written code," and that was part of the reason why. Then, just real quick, there's the URL for folks who want to look at the repo.
I
So what were the general design principles of the ADF? Most of these design principles I received from AMP, or, you know, the general kind of user-group community that we're focusing on. One, which is again just like the old AMWG diagnostics: we want push-button capability, right?
I
Basically, I just give it one or two inputs, like the paths to my model data and the path to where I want all of the diagnostics written, and then I just literally type "go" and it just runs, right? And so then the ADF will generate all these processed files, you know, climatologies, regridded data sets; generate tables with statistics and plots; and then finally try to put it all together in a website, and in the future
I
you might also see it put together in, like, a notebook or a book collection. One of the really strong things they wanted was for it to be error-tolerant. What this means, you know, in this case is: if I have analyses that require, say, the zonal wind field, but my model data is actually missing zonal wind, then instead of the ADF just dying, it just says, "Oh, I can't find zonal wind, I'm just going to skip this particular analysis and try to move forward."
I
So basically the ADF will try its hardest to run to completion, even if it has to skip a lot (and of course it will print warnings that it's going to skip a lot), or even if it has to kind of bail on certain things. And this is because there are certain analyses which are much more expensive than others, so we don't have to redo the expensive analyses that were fine just because another analysis downstream had a bug in it.
I
The other thing that was really hammered home to me was that they wanted it to be really easy to port, and this is actually almost the most important thing for the community; they cared about it more than, like, performance or even, like, code readability, things like that. In some ways this is not as big of a concern now, you know, as Brian Dobbins talked about with the container.
I
You know, there are now systems that kind of solve this for you, but in general, you know, at the time we wanted to make sure that you could at least easily install everything you needed with conda. The other thing, which the ADF does a little differently than other diagnostics, is that we make a really concerted effort to use a minimal number of external Python packages, which I'll show later. You know, part of this is because, even with containers, right, the fewer packages you have, I've got to imagine, the easier it is to maintain over time.
I
Even during the development of the ADF, I ran into dependency hell, and so, you know, the more packages you have, the more likely that is, and so just trying to keep it to the necessary packages was kind of our goal. Even if we have to write some code in-house, that's fine, rather than passing it off to a package that might not be around five years from now. And then, you know, they also wanted it to be relatively easy to modify.
I
Some of this is just kind of basic modularization, and, you know, we also wanted to have inputs. As I mentioned earlier, with the AMWG diagnostics you couldn't really change things like the contour range or the color table, and so in the ADF we now have a YAML file that you can modify. So in this case, PSL is sea level pressure, a variable from CAM, and basically in the YAML file there's something like this: it tells the ADF, okay, if you have sea level pressure, we're going to use the orange colormap and we're going to use this range. We also have the ability to do some basic unit conversions, right, so it'll come in in pascals and convert to hectopascals, things like that.
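A minimal sketch of what such a per-variable entry might look like (the key names here are illustrative assumptions, not the actual ADF schema):

```python
import yaml

# Hypothetical ADF-style variable defaults; key names are illustrative only.
example = yaml.safe_load("""
PSL:                                    # sea level pressure from CAM
  colormap: Oranges                     # colormap for lat/lon plots
  contour_levels_range: [980, 1052, 4]  # min, max, step
  scale_factor: 0.01                    # Pa -> hPa
  new_unit: hPa
""")
print(example["PSL"]["new_unit"])   # -> hPa
```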
Then, finally, almost all the diagnostics now are moving to Python, so the ADF is also Python. These are actually all of the packages we're using; we chose them either because they have a very large community base, right, like xarray or pandas, or because they're developed by NCAR.
I
I
Hopefully these packages will be maintained for a long time and play nice with each other. You know, this is kind of a complicated diagram, but basically the main takeaway is that the top two files are the only files that the average user should modify, and even then, the only one that they would have to modify is this config file, which basically contains, you know, the information again like paths, and maybe some settings like "Oh, I don't want to run this."
I
There are objects that inherit from each other, and that object kind of does sanity checking and calculates a lot of the derived information and metadata that we need, and then it calls, essentially, a function that calls a list of scripts. This is where, if you're developing, you can add your script, following a certain API, and then the ADF will just call it. One piece that's kind of in the ADF now, but maybe not necessary going forward,
I
is the time series generator; we just needed one at the time. But again, as Brian Dobbins pointed out, you know, there's a time series container, and there's talk about eventually CESM outputting time series, so we'll see. But the rest, you know, calculating climatologies, regridding to your observational data sets, and then doing analysis and plotting, and then putting it on a website, is basically kind of the workflow of the ADF.
A
I
That does not look super great on my monitor; it looks better in the terminal anyway. So the API is: if I have an analysis script, like myscript.py, how would I bring it into the ADF? Well, all you have to do is, at the top, add a function header, and the function has to have the same name as your script, so myscript.py has to define "myscript", and then the only thing you have to bring in is the ADF object itself.
I
The idea being that the ADF object will contain all of the data you need to run your analysis, and so we've developed basically a lot of different kinds of "get" functions, or methods, to grab, you know: are we comparing against obs or a baseline? What are the names of the model run cases? And then we even have some other functionality.
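Roughly, a user script then looks something like the sketch below (the getter names on the ADF object are illustrative assumptions for this example, not the actual ADF API):

```python
# myscript.py -- a hypothetical ADF analysis script.
# The ADF calls a function with the same name as the file and passes in
# the ADF object, which carries paths, case names, and config options.

def myscript(adf):
    # These getter names are made up for illustration; the real ADF object
    # exposes similar information through its own methods and attributes.
    case_names = adf.get_cam_case_names()   # model run case name(s)
    compare_obs = adf.compare_obs           # obs or baseline comparison?
    plot_dir = adf.get_plot_location()      # where figures should be written

    print(f"Running myscript for {case_names}; plots go to {plot_dir}")
    # ... open climatology files, compute the diagnostic, save plots ...
```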
I
If you want to add your own config option to that YAML file, for instance, there's just a quick way to read it in. Right now, one of the downsides is that each independent script still has to open its own files all the time, so we're hoping things like that will lessen as we bring in intake-esm, so individual scripts won't have to do their own querying and reading and those sorts of things. I'm going to skip this because it's going to be a live demo, and live
I
demos, of course, are notoriously "successful" all the time, so I figured I'd save it for the end, and I'll come back to that. So, real quick, in terms of the current status and issues (and I'm flying through this): you know, we actually hope to have version one, which is kind of like having the base functionality of the original AMWG diagnostics, ready by the end of the summer, maybe early fall. There are two things that are holding us back. One is we're actually just lacking observational data sets.
I
This is kind of something that we have to pass on to the science community and be like, "Hey, what data set do you want to compare, you know, the cloud variables against, or the radiation?" We're also still missing some plots: we have, as I'll show, a lot of the kind of basic plot types, but there are still certain plot types missing, you know, like time series or meridional overturning, things like that. We have a controller to try to move this forward, and we have a weekly hackathon to address these issues.
I
So, you know, scientists are involved in trying to add these plots and trying to flesh out the ADF; if you want to join, feel free, we're happy to take any help we can get. One of the downsides also is that it currently only works with monthly output. This is an issue, you know: there are a lot of diagnostics, like the MJO or looking at diurnal cycles, where you need higher-frequency output, and again we're hoping that intake-esm will help with that.
I
Another issue is that the ADF right now works fine with kind of standard model runs, you know, even down to maybe half a degree, but when you get to really long time periods, thousands of years, or really high resolution, it currently struggles, and so we need to
I
we need to improve performance, particularly via Dask. And then one thing is that, personally, I got caught in the "faster, better, cheaper" conundrum, so I've had to kind of put a lot of the good software engineering practices, like unit testing, on the back burner, which means we have relatively low code coverage. And so, you know, going forward I kind of want to fix a lot of these, right: implement intake-esm to allow for an easier, simpler API with scripts and also to manage sub-monthly data, and bring in Dask, particularly for memory. I should back up: you know, the constraint for these high-resolution data sets isn't actually the CPU, it isn't the flops, it's the memory. It's just hard to load that much data into memory using a lot of, you know, kind of off-the-shelf stuff, so we have to figure out a way to distribute it.
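One common way to do that kind of distribution is to open the output lazily in chunks with xarray and Dask, roughly along these lines (the file pattern and chunk sizes here are made up):

```python
import xarray as xr

# Hypothetical history-file pattern; chunk sizes are illustrative only.
ds = xr.open_mfdataset(
    "case.cam.h0.*.nc",
    combine="by_coords",
    chunks={"time": 12},   # lazy Dask arrays, roughly one year per chunk
)

# Computations stay lazy until .compute()/.load(), so the climatology below
# is evaluated chunk by chunk instead of loading everything into memory.
climo = ds["T"].groupby("time.month").mean("time")
climo = climo.compute()
```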
We also want to, you know, bring in additional plots, and also, as you'll see, the website could use some beautification, which can get subjective.
I
We also want to develop notebook interfaces, especially on the front end, right, because notebooks are great at kind of being a tutorial for things, so that creates a front end to help explain how to run through the ADF. But then also, at some point, it'd be nice to have it at the back end,
I
so that whenever you run through the ADF it then outputs a notebook you can run again, so you can share it if you have a really specific diagnostic. And then, finally, increase unit testing and code coverage. And the other thing, which I want to give a shout-out to the community about, is this: you know, the model development world seems to have really good, robust regression testing and integration tests, right; there are hundreds of regression tests, I think, for, I'm sure, you know, CESM. But for diagnostics that doesn't seem to be as common, and in particular there are a lot of concerns about, like, well,
I
how do you do regression? How do you determine if a plot, which ends up being like a PNG file, is different from a previous PNG file? So it'd be great to, you know, kind of reach out to the wider community and figure out what would be good integration and regression testing for diagnostics in CESM, particularly when we make small changes that shouldn't change answers. Okay, so, oops, actually I'm going to exit full screen and show the website real quick.
you're.
K
I
As
you
can
see,
it's
pretty
basic
right
now,
but
you
can
tell
all
the
different
kinds
of
information
we
have.
So
you
have
tables.
So
you
click
on
tables.
It
gives
you
for
each
run
simulation.
This
is
a
case
versus
a
case,
so
a
camera
versus
a
camera,
so
I
can
select
on
one.
It
gives
you
a
whole
lot
of
information
for
each
variable.
I
I also have comparisons, right, so I can see, like, what's the difference between the two cases, over on the right-hand side, and what the units are. Then we have different plot types. So, like, lat/lon: here's low cloud, so you go, okay, I get the plots for low cloud, and if it doesn't fit on your screen super well, like this one, you know, kind of getting cut off at the edge, you just click it, and then you get the full thing and you can download it.
I
We have zonal plots, both as kind of just two-dimensional plots and then also, if you have a three-dimensional variable, which I'm actually not sure if this one has... oh yeah, geopotential heights,
H
I
not particularly interesting, but, right, you can also then get height-latitude plots. And then, finally, we also have polar plots at the moment, to plot, you know, precipitation over Antarctica or Greenland, those things. Anyway, so, yeah, I'll stop sharing, right at like 15 minutes, actually a little over, but yeah, thanks for listening. Let me get my screen back.
B
We do have a few minutes for questions, so feel free to raise your hand or put a question in the chat. Let's see, I think I first saw a question from Jim Edwards in the chat. Jim, do you want to unmute and ask your question? Sure, I'm just...
D
I
Yeah, so right now we're waiting on GeoCAT developing that uxarray system, which is designed to manage that. So until that comes online, yeah, right now you do have to regrid to a regular lat/lon grid, but hopefully, when that comes out, we'll be able to use non-regular grids.
B
C
Sorry, I think that... so, just suggesting this: this looks great. I'm wondering where you see the future, let's say for when we release CESM3, of integrating this into the total workflow of a CESM run. Not even just for CESM3, but as we move forward, it would be great to start integrating diagnostics routinely, so that you spit them out as part of the run, and we're not doing that. So what's your vision for that?
I
Yeah, that was part of our original... I guess I didn't actually have an original-design-principles slide, but...
D
I
And so it should be; assuming the environment has the Python modules you need, yeah, it's literally just modifying a YAML file and then running. It's literally running a script, so it can be put as a job in a scheduler or just run like that. And then also, we haven't...
I
This could be a discussion with CSEG or whoever, you know; you don't actually have to use a YAML file per se. You can just bring in the object and then add the information directly in Python. So, yeah, it's designed to just be... it can certainly be run as a compute job, and there are a few things we still need to do.
B
Rich Loft, I see you have a question in the chat. Do you want to ask your question?
G
No, actually, it got answered. You know, basically it's around the same question that Jim Edwards asked, about connecting with unstructured grids and with the effort that Raijin represents to introduce, you know, unstructured xarray objects for parallel processing. So, yeah.
B
Let's take one more question from Brian Dobbins, and then, I see there are a couple of other questions, but we're going to have to move on after that, and we can either have some discussion in the chat, or there will be time at 11:30 for some further discussion where this could be addressed more. So, Brian, do you want to go ahead?
F
Yeah, I mean, I think this ties into the same workflow questions that Mariana was asking about, and this is just, you know, you had this workflow of generating the diagnostics, and it also seemed like there was some automation for generating an intake-esm catalog, and I was just curious what the thinking was there in terms of where that sits.
F
I
Yeah, I mean, you know, we haven't fully decided this on the ADF side yet, but, you know, we might have it as, like, an option: you can either have it generate an intake-esm catalog or not, if you don't want it for some reason. But it's just... I know a,
I
like, in the ocean group, and I think also the land group, though I'm not as familiar with that, you know, they have a lot of diagnostics that expect a catalog, and so by having a catalog generated...
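For context, downstream diagnostics that "expect a catalog" typically open it along these lines (the catalog path and search keys below are hypothetical; the exact column names depend on how the catalog is built):

```python
import intake

# Hypothetical catalog produced for a run; the JSON points at a CSV of assets.
cat = intake.open_esm_datastore("my_case_catalog.json")

# Query the catalog instead of globbing files by hand, then load as xarray.
subset = cat.search(component="atm", variable="PSL", frequency="month_1")
dsets = subset.to_dataset_dict()   # dict of xarray Datasets keyed by group
```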
B
Thanks. Our last two talks for the morning are both on the topic of lossy compression, so the first one will be from Allison Baker, on fine-tuning evaluation metrics for lossy compression of CESM data. Allison... okay, great, I see your screen.
G
J
Okay, so, as Bill just said, this is part one of two talks on lossy compression, and this is work that I've been doing primarily with Dorit Hammerling and Alex, who will be giving the next talk, as well as Haiying Xu, and many other people have contributed to this effort over the years.
J
So, as we all know, computers are getting faster, and thanks to the work that a lot of people (oops, just randomly changing slides), a lot of people in this group, have done, we can generate data at a crazy fast rate. This plot is data I got from Dave Hart; it's four years of data for our GLADE usage and campaign store.
J
You can see the green line is the campaign storage. Right now the capacity is 92 petabytes and we're about 60% full, and you can see that it's growing pretty much linearly. The current CISL plan is that once the capacity is increased to between 100 and 120 petabytes, that will be it. So we will run out of storage.
J
J
So, for those of you who aren't that familiar with compression, there are basically two different types. One is lossless compression: this means that when we compress the data and then we reconstruct it, we end up with the same information we started with. This is what gzip does; this is what the NetCDF zlib deflate does.
J
With lossy compression, what we started with and what we get after reconstruction are not exactly the same. And the reason we need to look at lossy compression is that, unfortunately, lossless compression is not that effective on data that comes from numerical simulations. This isn't particular to CESM; it's just a general statement that's true. This is because, on this numerical data, if you look, I output the temperature with, you know, say, 64-bit precision, and there are a lot of numbers there.
J
J
But the issue then is that we have to be careful about what information we're losing; hopefully we lose the information that was random noise and didn't matter in the first place. So the goal in our work overall has been: we want to reduce storage, but we don't want to negatively impact our science.
J
So we've been working on looking at lossy compression and CESM data for several years, and I just wanted to kind of go over some of the challenges that we face. I mean, the first one is: understandably, scientists are reluctant to lose any information that might be important, so we've really tried to have this mantra of doing no harm, and of trying to figure out how best to evaluate the information loss for climate data.
J
Lossy compression has obviously been around for a long time and is really popular in things like, you know, video and images, but in a lot of applications, you know, if the image still looks okay, if your movie still looks okay, you don't really care what was lost. But that's not true with scientific data, so we have to be careful: the variables have different characteristics, there are spatial and temporal dependencies, and overall that evaluation has been the focus of our work, and it's what I'll be talking about, and Alex as well.
J
So the first thing we did in this journey was to establish feasibility, that, you know, it wasn't a horrible idea to use lossy compression on climate data, and we did that by thinking of ensemble-based metrics. The idea was that, you know, at minimum we wouldn't want any compression-induced differences to exceed the ensemble variability, and this is sort of the standard we think of when porting to other machines and such.
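A toy version of that kind of check might look like the following, with purely illustrative thresholding on synthetic data rather than the actual CESM ensemble consistency methodology:

```python
import numpy as np

rng = np.random.default_rng(0)
ensemble = rng.normal(size=(30, 192, 288))     # fake 30-member ensemble field
original = ensemble[0]
reconstructed = original + rng.normal(scale=1e-3, size=original.shape)  # "decompressed"

# Compression-induced differences should stay well inside the member spread.
ens_std = ensemble.std(axis=0)
diff = np.abs(reconstructed - original)
print("fraction of points where |diff| exceeds ensemble std:",
      float((diff > ens_std).mean()))
```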
J
The next thing we did was give scientists access to some of this data that we had applied lossy compression to, and this was an experiment that we jokingly called the Pepsi challenge; if you're old enough, you'll know the reference up there in the upper right. The idea was letting scientists look at the data and see if they could tell the difference between what had been compressed and what hadn't.
J
This data is still out there; two of the LENS1 runs have been compressed, and it's very difficult to notice using standard analysis tools. But of course, if you want to be clever, you can figure out which are compressed and which aren't. But the question is: does that matter? And so we come to, okay,
J
we really need to be looking at compression at these fine spatial and temporal scales, and doing this kind of analysis is really important. Simple metrics, like, you know, the root mean square error, which maybe is enough if you're just looking at an image on your phone or something, aren't enough for climate data, where, for example, on the right I have pictures looking at the effect of different compressors on the contrast variance. So these are the kinds of things that we need to look at.
J
J
So we've been working on developing a suite of metrics, and I have some of them listed in this blue box here. Those are things like, you know, correlation coefficients that are good at noticing outliers; the KS test, which notices changes in distribution; relative errors; and then visual similarity, which is one thing that we've focused on. It's quite critical for post-processing analysis; I mean, we just saw Jesse's talk, where, you know, looking at these different plots is quite important and often the first interaction with the data.
J
So a visual similarity metric is basically a metric that tells whether, or to what degree, two images are alike, and we actually had over 100 participants in an evaluation study a few years ago, where we let people say whether they could see a difference or not between the data. From that we determined that the structural similarity index measure was very useful, along with a corresponding threshold. And actually, Jesse, I think this is something that would be interesting for you to use in regression tests for determining differences in your images, but anyway, I'll stay on track here. So from there we developed this data structural similarity index, which is just a variant so that we could apply it directly to the floating-point data.
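For a sense of how an SSIM-style comparison is applied to a 2D field, here is a sketch using the image SSIM from scikit-image on synthetic data; the DSSIM described here is a separate variant tailored to floating-point climate data, so this is only an analogy:

```python
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(1)
field = rng.normal(size=(192, 288))                        # fake 2D model field
recon = field + rng.normal(scale=0.05, size=field.shape)   # fake decompressed field

score = structural_similarity(
    field, recon,
    data_range=float(field.max() - field.min()),  # required for float inputs
)
print(f"SSIM between original and reconstructed field: {score:.4f}")
```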
J
We just want more of a general idea of how similar images are, so this has been really useful for us, and in fact it's kind of the primary metric we've been using. Here's a picture of just the surface temperature: these use two different compressors, they reduce the data by the same amount, and the image on the right is of much higher quality.
J
It's hard to see at this resolution, but if you zoom in, maybe you can see that there are some blocking artifacts that the compressor on the left has left behind, and we don't want those. And it's nice that the DSSIM picks this up well from the data, not even from the image, just from the data.
J
So this bottom plot: I don't expect you to be able to read the variable names at the bottom, but these are all the daily variables from the CESM1 LENS data set, and the reason I have this here is just to basically show you that, for a fixed quality parameter like the DSSIM, different variables are more compressible than others. So the height of the bar shows the amount of compression for each variable, and the colors are just different thresholds.
J
Here's a plot that I think is useful for understanding how these metrics work. So I have five different variables, each in a different color, and we've used the zfp compressor. As we look at the plot from right to left, the solid lines indicate the file size; they're going down in a linear fashion, so as we're increasing the amount of compression we're decreasing the file size.
J
Now we look at the corresponding dotted lines, and when you go from right to left here, you see that initially the compression does not affect the quality at all, because getting a 1 means that we haven't changed the quality of the plot. But notice that these DSSIM values are not linear: at some point they just drop off and the quality quickly degrades. So when you think about getting optimal compression with these compressors and using a metric, you think: I want to stop compressing right before the quality drops off.
J
There are three compressors that we're looking at actively at the moment, and our main criterion is that they needed to work with NetCDF data, obviously, because that's what our output data is in right now. This has been pretty hard until recently, because most of the off-the-shelf compressors did not support NetCDF data; we had to pull out the data, compress it, and, you know, stick it back into the NetCDF file. But now, in the last year, the two leading DOE compressors, zfp and SZ, both have registered HDF5 filters that allow us to do compression through NetCDF-4.
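As one example of what using such a registered filter can look like from Python, assuming the hdf5plugin package, which exposes the zfp HDF5 filter to h5py (the parameter choice here is arbitrary, and the exact keyword options should be checked against the hdf5plugin documentation):

```python
import numpy as np
import h5py
import hdf5plugin   # assumption: provides the registered zfp HDF5 filter

data = np.random.default_rng(2).normal(size=(10, 192, 288)).astype("float32")

# Write an HDF5/NetCDF-4-style file with the zfp filter applied per chunk.
with h5py.File("compressed_example.h5", "w") as f:
    f.create_dataset("T", data=data, chunks=(1, 192, 288),
                     **hdf5plugin.Zfp(precision=12))

# Any reader with the filter available can read it back transparently.
with h5py.File("compressed_example.h5", "r") as f:
    recon = f["T"][:]
print("max abs difference:", float(np.abs(recon - data).max()))
```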
J
That has been huge for us, as far as being able to use it and having others use it easily. Another method I'll mention is Charlie Zender's bit grooming algorithm. This approach is quite nice: it's basically a pre-filter, it's available through the NCO tools, it's easy to use, it's easy to understand, it's available now, and it's been quite successful when we've tried it out.
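Invoking that path from a workflow script might look roughly like this (file names are placeholders; check the NCO documentation for the exact flags and their current behavior):

```python
import subprocess

# Keep roughly 3 significant digits for all variables and write NetCDF-4
# with deflate, so the trimmed bits compress well losslessly afterwards.
subprocess.run(
    ["ncks", "-7", "-L", "1", "--ppc", "default=3",
     "input.nc", "groomed_output.nc"],
    check=True,
)
```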
So here's an idea of what kind of compression you could get for CAM data, for example.
J
This is all with zfp, and I've categorized the thresholds I've used as conservative, middle ground, and aggressive, and we have the compression ratios. So, for example, the variable on the left: if you look at the orange bar, it goes up to about a six. This means that, for the amount of compression that I've termed aggressive, I can reduce the file size six times more than the lossless-compressed file, so it's about a 12x reduction over the original file and about a 6x reduction over the lossless file.
J
So again, you can see that the different variables react differently to the same quality metrics, and I would say that, you know, even though I've labeled this "aggressive", it's not really that aggressive in the grand scheme of things, and I think it would be very realistic to expect these kinds of compression rates for lossy compression. Just quickly: the variables I showed on the last page are from this CAM test set that we've been looking at extensively; it's available on GLADE.
J
So this slide is basically what Alex is going to talk about, but I want to give some motivation for it, and that is: now I've shown you that there are all these different compressors,
J
there are all these different parameter choices, all the variables act differently, and it frankly seems kind of overwhelming to think of using this in practice. So clearly, what's needed for this to be practical is some kind of tool to automatically select, for a given variable and temporal output and grid size, the right amount of compression, and this is what Alex will be talking about.
J
So I think I pretty much covered all these lessons learned and I'm not going to go over them here. But, you know, working closely with scientists is important, we're always looking for feedback, and I just want to end with this thought: my hope is that applying lossy compression is going to be something that's not suspicious and scary, but just something that you do when running the model, like you would choose your grid resolution and output frequency and precision.
J
B
All right, thanks a lot, Allison. I do see one question in the chat, from Kevin Raeder, so why don't we take that as we transition, and then maybe we can take some more questions on lossy compression after Alex's related talk. Kevin asks: do these compression tools handle variables that use special values or masking values to denote an absence of data at some points?
J
Yes, they do now, now that you can use them through the filters. Before, that was something that Haiying and I had to figure out how to handle ourselves, because we had to pull the data out of NetCDF, compress it, and then basically stuff it back in. But now that these tools are available with the HDF5 filters, or Charlie Zender's just through the NCO tools, all of that is automatically taken care of. So it's really huge, and that was why we had started with CAM, because it has fewer missing values and things like that.
J
But this is really a big development, and it's going to make it easier for everyone to use these methods.
B
Great. So, yeah, in the interest of time, let's move on; again, I think we can take some more questions for Allison after Alex's talk, if people have them. But let's transition to Alex, who will be giving another talk on lossy compression: predicting optimal lossy compression settings for CESM LENS data, as Allison just introduced. So, Alex.
B
K
All right. So, yeah, today I'm going to be talking about our process for predicting optimal lossy compression settings for CESM LENS data. My name is Alex; I'm a PhD candidate in statistics at the Colorado School of Mines. Allison, who you just heard talk, is my collaborator at NCAR, and Dorit Hammerling
K
is my research advisor. So, currently, to obtain the sort of compression ratios that Allison showed in the previous presentation, we're taking a sort of brute-force approach to figuring out what the optimal compression settings are, meaning we're trying every combination of compression algorithm and parameter setting that we have available to us, and running a suite of metrics on all of that compressed data, to figure out which one can pass all these metrics and yet still give us a high compression ratio. And it's very computationally intensive.
K
So what we want is a model that we can use to predict this compression level in a way that's a lot more computationally efficient, and to do that we're trying to frame this as a classification problem. In a classification problem, as a sort of quick review, we have some data that we're using as input to the model and corresponding labels, which in this case are the compression settings, that we're trying to match with it. The figure on the right sort of illustrates what I'm talking about.
K
So the output classes that we are looking at here are, like I said, the compression algorithm we want to use, which at this stage can be zfp, bit grooming, or SZ, and for each of those there's a slightly different parameter, our compression level, I guess, that we're trying to tune. So for zfp it's the level of precision; for bit grooming it's the number of significant digits of the original data we're trying to maintain; and for SZ
K
we have an absolute error tolerance that we can specify. These parameters are the output classes of our model, and we select a series of possible levels that cover the range of likely optimal compression settings for all of our variables. And, as Allison briefly mentioned, we're using four metrics to determine if a data set has been optimally compressed, and that includes the data SSIM.
K
As input to this model, we have the first two years of the CESM Large Ensemble CAM variables in daily output, which is about 47 different variables and 730 time steps for each variable, so we have several thousand samples.
K
And so, as I mentioned, currently we're taking a sort of brute-force approach to finding what the optimal compression level is, and I was going to go into just a little more detail to show exactly what that means. So, on the right, each column of this plot represents a single variable, and then, as we go down the plot, we are increasing the data SSIM, which is the visual similarity of the data set that we're working with.
K
So as we cross each of these thresholds, say the aggressive threshold, meaning our data is considered similar under the aggressive DSSIM threshold, which is 0.95, at that point we take the very next compressed data set. So in this case, for LHFLX, we're looking at zfp with precision 12, and that's our optimal compression level according to the data SSIM. And as we go on, we also find optimal compression levels using different thresholds for the DSSIM, the middle-ground and conservative thresholds, and this is sort of generic for each of our thresholds.
K
We repeat this process also for the correlation coefficient, the spatial relative error, and the Kolmogorov-Smirnov test, and we repeat it for every variable and every time slice to come up with the ideal compression settings for a single variable, and we end up taking the most compressed setting that passes those metrics.
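Schematically, that brute-force selection is just a loop like the one below, shown here with fake "compressors" and placeholder thresholds rather than the actual settings and metric values used in this work:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(3)
original = rng.normal(size=(192, 288))

# Fake "compressors": keep a given number of decimal digits, standing in for
# zfp/bit-grooming/SZ settings ordered from most to least aggressive.
settings = [1, 2, 3, 4, 5]
compress = lambda data, digits: np.round(data, digits)

def passes_all_metrics(orig, recon):
    # Placeholder thresholds: bounded max error and no detectable
    # distribution change according to the Kolmogorov-Smirnov test.
    return (np.max(np.abs(orig - recon)) < 5e-3 and
            ks_2samp(orig.ravel(), recon.ravel()).pvalue > 0.05)

chosen = next((s for s in settings
               if passes_all_metrics(original, compress(original, s))),
              "lossless")
print("chosen setting (digits kept):", chosen)
```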
K
K
So this requires selection of features from these data sets; we're going to extract these features from the data sets and use them as input to the model, and we extract them from the uncompressed data sets, because we don't want to have to compress the data at several different levels first, which sort of defeats the whole point. It also requires selection of appropriate models and model setups for this type of classification problem.
K
So there are a lot of existing classical statistical models that we can apply to this sort of problem. We have random forest and boosting models, which are sort of shown on the right here; they are sort of a branching tree that starts at the top and works its way down,
K
applying some rule to the input features and, as we go down to the very end leaves of the tree, giving us a label for what our data set's optimal compression level or compression algorithm will be. There are also a few others:
K
there are support vector machines, linear discriminant analysis, deep learning, and aggregate models that look at the results of several other models to come to their conclusions, by taking the mode compression level or something like that. And we also have other models, such as convolutional neural networks, that are sort of implicit-feature models and discover the features on their own.
K
So the explicit-feature models require us to come up with features for our data, which are used directly by the model to infer a label. So far, for this sort of crude first pass at creating these models, we've come up with eight different features that are somewhat useful in helping us predict the optimal compression level. Those include the mean, the variance, the north-south contrast variance, the east-west contrast variance, and so on.
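A stripped-down sketch of that explicit-feature classification idea, with fake data, made-up features, and made-up labels, just to show the shape of the approach:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)

def simple_features(field):
    # A few cheap summary statistics of a 2D field (illustrative only).
    return [field.mean(),
            field.var(),
            np.abs(np.diff(field, axis=0)).mean(),   # north-south contrast
            np.abs(np.diff(field, axis=1)).mean()]   # east-west contrast

# Fake training set: low-variance fields labeled "aggressive", others "conservative".
fields = [rng.normal(scale=s, size=(64, 128)) for s in rng.uniform(0.1, 2.0, 200)]
labels = ["aggressive" if f.var() < 1.0 else "conservative" for f in fields]

X = np.array([simple_features(f) for f in fields])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

new_field = rng.normal(scale=0.3, size=(64, 128))
print("predicted setting:", clf.predict([simple_features(new_field)])[0])
```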
K
K
We sort of designed it with compression in mind, but it's not specific to that use case, and it allows us to perform all of this analysis, to generate these features and to compare differences between the data sets, and it also allows us to do things like plot the data, to give us a better sense of how the original data compares to the compressed data. And, as I said, I developed this along with Allison, and Dorit has been really helpful in providing guidance for this package as well.
K
For each of these models, we run a parameter sweep over several possible values for the model parameters, which vary depending on what particular model you're looking at.
K
We can come up with an idea of which parameters for that specific model are performing the best, and once we know the ideal parameters for each model, we can test each of our models using a left-out set of testing data, which includes different variables that have different features from the original training data. And we don't have great results yet, mostly because the features we've chosen, as I said, are sort of crude and don't quite give us perfect separation between the classes that we're trying to predict.
K
But we expect that as we try our convolutional approach we might come up with some more features. So the challenge here is to find features that hold generically across these data sets, which may look entirely different: in this case, these two data sets are optimally compressed using the exact same compression settings, but it's hard to tell, just visually, what it is about these two data sets that makes them optimally compressed using the exact same settings. And so for that we are going to try a convolutional approach.
K
So, as I said, the explicit-feature models are simple and fast and easy to interpret, but we need to find better features that we can feed into these models, and the nice part about a convolutional network is that it discovers features on its own through the process of training the network, and we can use those new features to feed back into the explicit models to improve them.
K
And, as a sort of quick review of the convolutional neural network approach: we have some input data set; in this case we could potentially input the entire data set at a single time slice.
K
We convolve that using a series of filters, or kernels, to come up with intermediate levels, which are called feature maps, which can represent different aspects of the image. So you might have a kernel that selects for edges in the image, like horizontal, vertical, or diagonal edges; you might have kernels that smooth the image; and whatever features arise can be seen in these feature maps, which are then condensed to sort of summarize
K
the overall presence of the features in a region of the map. That process is repeated, and at the end we use the generated features in a dense neural net, like normal, to predict the optimal compression level or settings.
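A minimal sketch of that kind of network in Keras; the layer sizes, input shape, and number of output classes are placeholders rather than the architecture actually used in this work:

```python
import tensorflow as tf

n_classes = 5   # e.g. candidate compression levels (placeholder)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(192, 288, 1)),                   # one 2D field per sample
    tf.keras.layers.Conv2D(16, 3, activation="relu"),       # single hidden conv layer
    tf.keras.layers.MaxPooling2D(2),                        # condense feature maps
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),            # dense head "like normal"
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```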
K
So at this stage the model is not very sophisticated. We're subsampling these images by a factor of 36 to reduce the time to fit the model, and we're using the image data directly, and, as I showed before, if our data looks completely different, it's not surprising that the CNN can't pick up on features of the data that are maybe not visible from just looking at the raw data.
K
We're also modeling only one step of this process, so we're only modeling either selection of the optimal compression algorithm, or the optimal compression parameters conditional on a single algorithm; we're using a single hidden convolutional layer and a small collection of CESM daily variables; and we're not including any of the explicit features.
K
K
Looking at how this model comes up with features, though, as I said, may give us insight into how we can improve the explicit-feature models later on. So the end goal here for the convolutional net is to have a sort of two-stage model, where we have one model that first selects the optimal compression algorithm, SZ, zfp, or bit grooming, and then, conditional on that, we have a separate model, which may or may not be a convolutional neural net,
K
that selects the actual optimal parameters within that given compression algorithm. We'll be using some interpretability techniques, such as saliency maps, to investigate and improve the other models, or we may end up finding that the CNN approach itself is the most ideal approach; we're not really sure yet. And we're also going to try pre-processing the input data, so instead of looking at the raw data we might use something like the gradient fields of the input data
K
as our input. And a quick summary of sort of how this relates to CESM: we're hoping to make this an automatic process that happens whenever you run a simulation, so there's just one automatically generated configuration file, like a YAML or a JSON, that supplies the compression settings. It'll have some preset default settings, and it only really needs to be edited if space or accuracy of the data are of special importance; otherwise, the user doesn't really need to do anything special. The compression will just run, sort of under the hood, and nothing else needs to be done. And that's all I have, so thanks for listening.
B
All right, thanks very much for that great talk, Alex, and I do see one question already in the chat, from Brian Dobbins. Brian, do you want to ask your question?
J
Yeah, so, yeah, it really depends, Brian; it depends on the compressor. Some of them, like the zfp compressor that I showed results for, which is a transform compressor, are quite sensitive to the chunking. I usually do chunks as individual time slices. With some variables you can get better compression if you compress multiple slices at once; with some variables you get worse.
J
The same is true with 2D and 3D variables: with 3D variables, does it make sense to compress them as a 3D chunk, or to do 2D slices? Again, that depends on the variable. I mean, compression really works by finding patterns in the data, so for variables that have good correlation it makes sense to do them in big chunks, but variables like cloud fraction are extremely difficult, and for those it makes sense to do smaller chunks. So, yeah, did that answer your question?
F
Right, correct, I guess, yeah. I'd also be interested in seeing sort of what the difference rates are between the different methods for a variable, and, like, different chunk sizes, because if it's relatively minor, then picking something that performs well might be good enough. But yeah.
J
J
It's usually minor, except that there are some variables that are just hard. I mentioned cloud fraction because it's a fraction, so all the values are between zero and one, but the smallest non-zero value is, like, order ten to the minus twelve, I think, so there's a huge range in the data, and stuff like that is hard for the compressors.
G
Rich here. Yeah, I was just going to respond to Brian that I put a message in the chat: Haiying, about a year ago, did some chunking of high-resolution MPAS data and then parallelized those chunks, to look at the scaling and performance of parallel compression, and I can share her poster on that topic if you'd like to look at it.
B
Thanks, thanks for that comment. I have a question going back to, I think it was, your previous slide here, on how we would apply it to CESM. So, can you help? I'm having a little trouble understanding this. Is the idea that, sort of, based on what you find from this, we would kind of pre-specify, for different variables, the optimal compression technique for that variable, or is there something else?
K
Yes, so I guess, well, maybe Allison wants to add something here, but I guess my understanding is, yeah: if we know what the optimal compression settings are for that variable, then they would sort of automatically be applied, but otherwise we would apply our model to assess what we predict to be the optimal compression settings, and we would have some error checking to account for that.
B
J
If there is... I guess... oh, can I comment? Yeah. So I guess what I'm envisioning is: you know, you have options for all the possible output variables, so hopefully, with this tool, we build some kind of database with, you know, all these possible options, and then, depending on how someone configures their CESM run and which variables they want to output, we create this file knowing that, okay, for this variable they're using this grid resolution and they're doing daily output.
J
B
So, extending that to the scenario where someone introduces a new output variable, or a time frequency that's different from something that you've exactly trained your model on, do you have a vision for that? Like, would people find a similar variable to specify, or would it be simple enough for a user to run this tool on the new variables?
J
J
K
That is sort of, I guess, the whole point of the model: if we introduce a new variable, then we'll be able to accurately predict what the optimal compression settings for that new variable would be.
D
Right, well, so this is sort of a tangential point, but I guess I'm trying to understand where in the process this would go, because, I mean, currently we're limited to writing out single history slices of our variables, and I assume your compression algorithms operate on the time-series output, so that's still an extra stage. So I would assume you would plug this sort of thing in along with the post-processing tool that generates the time series.
D
J
I mean, I guess my thinking on this is that the very first thing I'd like to do is start with data sets that are sitting out there taking lots of space, like some of the LENS data sets, and apply post-processing to them. But certainly, going forward in the future, I would hope that we can write the data directly to time series in that compressed format, unless the user specifically says don't compress it.
D
The concern I have, I guess, with it is that I think what you're saying is that you have to figure out the optimal mechanism for each variable, and also for each resolution of that variable, and, I guess, potentially, like, if you run the same variable for paleoclimate, you might need something different than for future climate, right? Or is that right? Do you think the time representation might matter, as well as the resolution?
J
I think Brian put something like that in the chat, Brian Dobbins: like, you could just go with what you think is a good-enough option, or you could say, "I want to be pretty conservative." And of course there would always be the option to say, "Well, these three variables I want losslessly compressed; I don't want to lose any information, because I'm going to study those to death; but the other 300 I'm not going to look at that closely, so go ahead and compress those a lot."
J
From Gary Strand: with the CESM LENS1 data, everyone downloads the same 10 variables and looks at them, and the other 300 are just sitting there taking up space. So we could consider that, okay, for these 10 that everyone really looks at we'll go very conservative, but for the 300 that people don't really look at we're going to go more aggressive.
D
Yeah, that makes perfect sense; thanks for adding that in. I'm just thinking about all these hundreds of variables that are out there and trying to figure out the optimal thing for them, but I think you're right that often people don't care; they only care about a few variables and not everything. So yeah, awesome, thanks for that.
B
A question kind of following up on that and on Brian's comment in the chat asking about default options: can you give some sense for how bad... you know, let's say we wanted to get started on this sooner rather than later, given the importance of this; how bad would things be if we did try to pick something sort of default for a lot of variables that was maybe on the more conservative side?
B
K
That sort of goes back to that figure that Allison had in her presentation, doesn't it, about the amount of compression we can get. There's sort of a baseline level of compression we can get for most of the variables of about two to one, and I think if we compressed all the data at that same conservative level, it would be reasonable to expect that sort of saving.
B
Like, just arbitrarily, could you say something like, oh, we can at least get an extra two to one by using some conservative threshold for lossy compression that works for every variable, or 99% of variables, or something like that? Or is there just too much spread in the lossy results to be able to do something like that?
J
I think on average you could certainly say that. On that plot Alex was talking about, my quote-unquote conservative one had about an additional 2x improvement over the lossless one. All these compressors have different ways of controlling the error too. In theory a scientist could just say, well, if the relative error between the original and the compressed data is, you know, 0.001 on every variable, I'm good with that.
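A small illustrative Python check of the kind of pointwise relative-error tolerance just mentioned; the 0.001 threshold comes from the example above, while the array shapes and synthetic data are placeholders:

    import numpy as np

    def max_relative_error(original, compressed, eps=1e-30):
        """Largest pointwise |orig - comp| / |orig|, guarding against zeros."""
        original = np.asarray(original, dtype=np.float64)
        compressed = np.asarray(compressed, dtype=np.float64)
        denom = np.maximum(np.abs(original), eps)
        return float(np.max(np.abs(original - compressed) / denom))

    # Synthetic data standing in for an original field and its lossy version.
    field = np.random.rand(192, 288) + 1.0
    lossy = field * (1.0 + 1e-4 * np.random.randn(*field.shape))
    err = max_relative_error(field, lossy)
    print("max relative error:", err, "within 1e-3 tolerance:", err <= 1e-3)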
J
K
I was actually just trying to remember if there is sort of a lower limit to how much additional compression we can get. There might have been a couple of variables where we couldn't get any additional improvement over the lossless result, but I think for the most part you could definitely compress to at least, say, one and a half times over the lossless. I'm not sure you can get to two for every variable, though, but there's definitely some kind of lower bound there.
J
J
Probably, that's probably good, yeah. And Brian Dobbins commented in the chat, as far as making this practical: I recently have been working with Brian, because obviously for cloud data the impact of data storage is immediately obvious in terms of your bill for how much space you're taking up, so Brian and I are starting to work together on some of this compression. I hope the more people can play around with it, the more comfortable people will get, and the more feedback we'll get from the community, the better.
F
Not yet, yeah, I'm still very early in this. My goal is... so CESM has this workflow capability that Jim Edwards has added, and we could have a workflow that calls the container that does this, but I'm very, very early in the planning with this. I've been able to test the NCO stuff in the container on some existing time series data, but I want to get the time series generation into that as well.
F
So that would be a full, singular solution; there's still some work to be done, but as soon as I get time and make a bit of progress, I'll coordinate with Allison and anybody else who's interested. Maybe we can get some eyes on it from scientists and get it into workflows for testing.
B
Yeah, I just have a naive question. Well, I have two naive questions. First, with all of these compressors, do all the standard tools still work, or is there anything the end user needs to have enabled?
J
So with using the compressors through NetCDF filters like we are, it will just be as if you're reading a losslessly compressed file now. So as long as you have the right NetCDF/HDF5 installation, and obviously we would make sure the right one was on Cheyenne, then it won't be any different for you.
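For example, with the netCDF4 Python bindings (one possible reader; the file and variable names below are placeholders), reading a filter-compressed file looks just like reading any other NetCDF-4 file, and you can inspect which filters were applied:

    from netCDF4 import Dataset

    # Placeholder file/variable names; any NetCDF-4 file is read the same way.
    with Dataset("history.compressed.nc", "r") as ds:
        temp = ds.variables["TS"]
        data = temp[:]              # decompression happens transparently on read
        print(temp.filters())       # which compression filters were used
        print(temp.chunking())      # chunk sizes chosen when the file was written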
B
H
B
And then my second question is: say someone has a pile of external hard drives full of hundreds of gigabytes of NetCDF files. What's the first step in just grabbing these compressors and compressing away?
J
Well, for sure you should already be using lossless compression through NetCDF, and then the easiest one to use that I mentioned was Charlie Zender's method, because that's simply available through the NCO tools. I wrote a document on how to do this and shared it with Brian, and I'm happy to share it with you, but it's fairly straightforward.
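As one illustration of the general quantize-then-compress idea, here is a rough Python sketch using the netCDF4 library rather than the NCO command line the speaker refers to; the file names and the choice to retain about three decimal digits of precision are assumptions for the example, not settings from that document:

    import numpy as np
    from netCDF4 import Dataset

    with Dataset("field_in.nc") as src, Dataset("field_out.nc", "w") as dst:
        dst.setncatts({a: src.getncattr(a) for a in src.ncattrs()})
        for name, dim in src.dimensions.items():
            dst.createDimension(name, None if dim.isunlimited() else len(dim))
        for name, var in src.variables.items():
            lossy = np.issubdtype(var.dtype, np.floating)
            out = dst.createVariable(
                name, var.dtype, var.dimensions,
                zlib=True, complevel=4, shuffle=True,
                # Keep ~3 decimal digits (the lossy step); floats only.
                least_significant_digit=3 if lossy else None,
            )
            out.setncatts({a: var.getncattr(a) for a in var.ncattrs()
                           if a != "_FillValue"})  # _FillValue is set at creation
            out[:] = var[:]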
B
J
Then for the more, like, DOE compressors, SZ and ZFP, you do have to have a special build of NetCDF/HDF5, as Jim pointed out in the chat. And Jim also pointed out that you won't have to use NCO for Charlie Zender's tool with the next NetCDF release.
J
It's actually going to be a part of that. So I like that one: you might not get as good compression rates, but it's very simple to use, very easy to understand, and all your tools are going to work automatically. So it has some really great advantages, and I think it's a good place to start. Sounds good, thank you.
B
You know, we've been talking about compression of output data, and I see that there's an interesting comment from mattvay in the chat. Do you want to unmute and share your thought?
D
If you want to run, like, 1850 to 20-something, yeah, to the...
B
So yeah, the question being: would it be feasible for us to compress all the atmospheric forcings and other input data to CESM, so that people can get compressed input data? Yeah.
E
B
It makes sense to me. I don't know if others, like Jim or Brian, have thoughts on that, or Allison and Alex, if you've looked at all at input data rather than output.
F
Yeah, I'd be curious to hear. I don't even know, to be honest, whether we use any compression on our input data now, but I'm very interested in this because, again, with the cloud the charges for transfers are related to the size of the data. So if we can use compressed data, it's faster, it's cheaper; that's my interest in this. There are questions
F
that I think aren't answered yet. Allison and I have talked about doing a run with restarts compressed to conservative values, to see what the variance is relative to uncompressed runs. Because if you're talking input data, you know, lossless... Jim just responded that input data is not currently compressed, not even lossless. Maybe there's something we could do there, but then you maybe don't have that backwards compatibility. But lossy compression would be really interesting, to see if there are areas where that's even viable. I don't know.
J
I've kind of steered away from even suggesting using lossy compression on restart files or input files, because my feeling is that's a step beyond accepting its use on the output files, and it will have a much greater effect on the data. So my feeling is: let's get it adopted for the output data first. But yeah, in the future it would be fun to look at, though, Mariana.
C
That's a great question about input data. The question I have is: say we start compressing input data, do we get rid of the other input data? Because then you have two copies, and which one are you going to use? So it brings in a question of backwards compatibility, because...
B
We never overwrite our data, given our strong requirement for backwards compatibility. I think we'd have to do it for data sets moving forward, and could start maybe with some of these big atmosphere forcing data sets; at least for offline runs they're a big barrier.
C
I think that could actually make a huge difference in how we store it, even on Cheyenne. Forget about the web servers we have; some of these, particularly as we're going to higher and higher resolution... that could be a really big saving.
B
F
Yeah, on this note, I was just going to say, and this is where I don't know enough, but if we can read lossy compression with certain versions of NetCDF, and we can detect that when we reach out to the input data servers, we could possibly set up a secondary input data service; like I said, set up regional ones for the cloud that have the lossy compression, and default back to the lossless ones from NCAR if that's not available.
F
So I think there are ways to test this and sort of get the lossy stuff out, and maybe track stats as to how many people are still using older versions of NetCDF. You'd have to think of a way to do this, but I think there are some options there, and maybe that's the path forward. Yeah.
B
Cool. I have a kind of broader question for Allison and Alex, which is...
B
What is your sense of the level of support you get versus resistance from the community when you've talked to others about this? And, related to that, are there things that others from the software engineering working group can do to support this effort and help you see it through, so that we can get this adopted soon? Because I know it's an important thing.
J
I think certainly the fact that we are going to run out of storage soon has been very helpful in motivating people's interest. Gokhan has been extremely supportive; he's actually partially funded a lot of Alex's work, and I feel like people are definitely more interested. I'll say in honesty, I have had trouble getting CAM scientists to look at these data sets that I generated.
J
It's been about a year since I generated them. I have gotten more interest from some of the other groups, and now, with the advances in NetCDF compression, we're going to start looking at some of the other model groups' data. I definitely feel like people are interested, and yeah, it's just that we've been working on this for a long time, but it's really only recently that the technology is coming together so that it could be put into practice. And even so...
J
I think the SZ method will work with PnetCDF, but I don't think ZFP does yet, so there are still some issues to be worked out as far as doing it in parallel. But I guess I'm still just aiming for kind of the low bar of compressing some of the data sets that are sitting out there, not even worrying about doing it in real time or for new things; let's go after some of the old stuff that's taking up space. That's my thought, but certainly, I mean...
J
I think that this group can definitely make it easier to use lossy compression, maybe make it the default, maybe work with the scientists to help them get experience with the data, so that, again, it's not this scary, unknown thing. I want it to be part of, you know, when they sit down to think about which variables they're going to output. And, as Rich mentioned in the comments, that's obviously the best type of compression: to stop outputting stuff that you don't look at.
J
B
Okay, thanks. I see there are a lot of questions and comments in the chat, so let me try to summarize them, and for the people who asked them, feel free to jump in if I'm misstating these. Cheryl Craig asks about bit-for-bit comparisons, and I guess this gets to the reproducibility of the lossy compression algorithm itself.
B
So if we have a baseline where we've run lossy compression to store that baseline, and then we rerun the code and apply the same lossy compression algorithm and parameters, will the results still be bit-for-bit with the baselines?
J
As far as I know, if you're using the same versions of everything, it should be; there's no inherent randomness in these compressors that we're using, at least.
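A trivial sketch of that kind of bit-for-bit check, assuming two files produced with identical compressor versions and settings (file and variable names are placeholders):

    import numpy as np
    from netCDF4 import Dataset

    def bit_for_bit(path_a, path_b, varname):
        """True if the stored values of one variable are identical in both files."""
        with Dataset(path_a) as a, Dataset(path_b) as b:
            va, vb = a.variables[varname], b.variables[varname]
            va.set_auto_maskandscale(False)   # compare raw stored values
            vb.set_auto_maskandscale(False)
            return np.array_equal(va[:], vb[:])

    print(bit_for_bit("baseline.nc", "rerun.nc", "TS"))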
B
Okay, good. And Dave Bailey asked, what's the latest on uncompressing to read, in terms of CPU time? So I guess, Dave, you're asking: how long does it take to uncompress these files?
J
So doing it through NetCDF, I haven't noticed a difference between reading a losslessly compressed NetCDF file and one that's lossy compressed. It's being taken care of by NetCDF, and most of your data, I think, is already losslessly compressed, so I don't think you're really going to notice a difference beyond whatever difference you already experienced when you went from the original to the lossless.
D
J
G
And last year, before I retired from NCAR, we looked at the cost of two different algorithms, one of which was ZFP: what is the overhead of the compression and decompression step relative to the overall time it takes to do the I/O? And it's algorithmically dependent.
G
G
So it's actually that if you have a nice decompression algorithm, for example, your read times will be significantly lower if the file is smaller. But the important thing is, we never tested this on,
G
you know, every variable with, let's say, all the different algorithms that I know of that were listed by Allison, so some may be more expensive than others, and it's worth looking at the overhead: like, per petabyte, how many core hours do you actually have to spend decompressing? And, David, I could share that poster with you so you could kind of see what I'm talking about.
G
I think another subtle thing that crops up is, if you try to do this stuff in parallel and you have a block size, I think the reproducibility issue might crop up if you don't do it with the same block size in two different runs, and I don't know the answer to that.
J
That's true, it definitely depends on the block sizes; everything really has to be exactly the same. So usually, when I do compression, I specifically pick the chunk sizes for NetCDF so that I know what they are, and I don't leave it up to the algorithm to pick.
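For instance, with the netCDF4 Python bindings you can pin the chunk sizes explicitly when creating a compressed variable instead of accepting the library's defaults; the dimension names and sizes here are just illustrative:

    from netCDF4 import Dataset

    with Dataset("out.nc", "w") as ds:
        ds.createDimension("time", None)   # unlimited
        ds.createDimension("lat", 192)
        ds.createDimension("lon", 288)
        # Explicit chunking: one time slice per chunk, full horizontal field.
        ts = ds.createVariable(
            "TS", "f4", ("time", "lat", "lon"),
            zlib=True, complevel=4, shuffle=True,
            chunksizes=(1, 192, 288),
        )
        print(ts.chunking())   # -> [1, 192, 288]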
G
Yeah, and I think that's unfortunate. We try to stay away from that, say with MPI within the models, where you can run with different rank counts and you won't get a different answer; but it just may not be possible here. That's kind of deep in the algorithm, though; there might be a way to patch it up somehow.
J
Yeah, and Rich brings up a good point, Dave, that it really does depend on the algorithm. Charlie Zender's Bit Grooming algorithm will take absolutely the same time to read data compressed with it as with regular lossless compression, because it's basically a pre-conditioning type filter, so the file won't look any different and it will take the exact same amount of time; whereas with ZFP and SZ there's a little additional work that goes on. So all of this is very algorithm-dependent as well.
D
Jim raised the issue about HDF5 parallel performance versus PnetCDF parallel performance, and I guess my thinking on this is, you know, this is not really an offline process. This is an online thing where we're going to be initializing the model; we're going to be reading in grid information, initial data sets, potentially restart files, things like that. So this is all about the online tools that we have available, and so we need to build it into whatever NetCDF reading and writing we're doing within the model.
J
One thing I think we have to keep in mind too is, like Rich said, if the file size is smaller, it's going to take less time to read. And the other thing, beyond that, is we're really at the point where storage is so much more precious than CPU hours, so part of me wants to say, who cares if it takes five extra minutes at the beginning of the run if you're going to produce 30 fewer terabytes of data? I mean, really, who cares, you know? So...
J
D
The people who care about this are, like, the data assimilation people, you know, who are going to be writing out frequent restarts, and they're going to really care about that sort of performance. Yeah.
D
D
G
Yeah, I was just going to add that, with the experiments that we did with the parallel compression, we looked at a cost comparison using some sort of figures of merit for the cost of disk space versus the cost of CPU hours, and it comes down decidedly on the side of data storage.
G
How long would you have to store the data, and how much does reserving that amount of storage for that length of time cost compared to the CPU cost? Because data storage is more like renting an apartment, and CPU hours are ephemeral, right? So when you look at it that way, the retention time at which compressing the data breaks even from a cost perspective is very small. I forget what the number was, but it was on the order of 48 hours.
G
If you're going to keep the data for more than 48 hours, you probably should compress it. But this is again with one algorithm and a particular data file; we were using single very high resolution history files, like a 178 gigabyte file from MPAS, so your mileage might vary. But I didn't get the sense that Allison is wrong, you know, that who really cares about the CPU costs.
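A back-of-the-envelope sketch of that break-even calculation; every number below (storage price, core-hour price, compression cost, compression ratio) is a made-up placeholder rather than a figure from the poster being mentioned:

    # Hypothetical figures of merit, for illustration only.
    file_tb           = 0.178   # file size in TB (e.g. a ~178 GB history file)
    compression_ratio = 2.0     # assume 2:1 compression
    core_hours        = 2.0     # assumed core-hours to compress the file
    usd_per_core_hour = 0.05    # assumed price of a core-hour
    usd_per_tb_day    = 0.70    # assumed price of keeping 1 TB on disk for a day

    tb_saved         = file_tb * (1.0 - 1.0 / compression_ratio)
    saving_per_day   = tb_saved * usd_per_tb_day
    compression_cost = core_hours * usd_per_core_hour

    break_even_days = compression_cost / saving_per_day
    print(f"break-even retention: {break_even_days:.1f} days")
    # Past this retention time, compressing is cheaper than storing uncompressed.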
B
B
We officially have a minute or so left, but there was a lot of activity in the chat. Do people have questions or comments that they still want to bring to the whole group, either on this or on stuff we might have missed from Jesse's earlier talk, in the last few minutes?
F
Yeah, maybe building on Jim's question here, Allison: are you guys in touch with any of the people that are doing PnetCDF? Do you see a potential for compression in that? Is that the same stuff you're doing? What's the path forward there?
J
J
But more recently, I know that the SZ folks out of Argonne have started working with the PnetCDF people. I went to their GitHub page the other day and there wasn't a lot there, so I don't know what the status is, but I think it's only good for us if the DOE wants it there, because they probably do have the money to get some of this in there.
D
I can only say I don't know either, but I do know that E3SM gave the PnetCDF developers quite a bit of funding in the last couple of years, and I think this is on their task list.
J
I think the algorithm developers are pretty motivated for this to happen; I feel like both the ZFP and SZ developers would be thrilled if we said, oh, we're going to use your algorithm to compress all our data, so we do have some leverage there. But unfortunately, yeah, somebody has to do the work. I'm optimistic that all this is going to come around as more and more simulation groups are realizing they need to use lossy compression, so I'm optimistic about it, but yeah.
B
B
All right, well, thank you to all the speakers today, and thanks, all, for this very interesting and useful discussion. Clearly there's a lot of interest in the lossy compression, and it's exciting to see this continue to move forward. So this is where we are officially closing the session, but some of us will be staying on for the next hour, in which we try to recreate the experience of chatting and having lunch together, so please feel free!
B
If you can, stick around and we can chat. I think we're going to set up breakout rooms for that, if we can figure out how to do that, so that you can get together with someone you wanted to catch up with, if you'd like. So yeah, we're...
F
B
Well, we could stick here, and then, if people want, you can let me know and we can set something up. Cool.
B
I'm going to personally step out for just a few minutes to get myself a little bit of food and take a five-minute break, but I will be back shortly, and others, feel free to stay on and chat.