From YouTube: Day 4 - WCRP Digital Earth Workshop
A
Good morning, everybody. All right, so flus, colds, and no COVID so far, are taking hold of this conference in an unexpected way, in that two of our invited virtual speakers had to cancel because they're both sick with the flu. So Oli Fuhrer won't be here at nine o'clock and Laura Rosanna won't be here after the coffee break. What we have decided to do is ask Richard and Sherry, who were the contributed talks before lunch, to speak now. Then we'll have Milan Klöwer's invited talk at 9:45. Then we have a coffee break, and we'll go into breakouts straight after the coffee break. So your afternoon breakout session becomes a morning breakout session, we'll have lunch, and then, depending on how you're going in your breakout sessions, you can reconvene after lunch or not. It's up to you, how you feel, how far you've got.

So, basically, the afternoon is a bit at your discretion. Work for as long as you want, or as little as you want, and then have the afternoon off. That was the better option compared with finishing the meeting today, which we could also have tried, because there are a few people who are deliberately not here today who will be here tomorrow for the closeout of the meeting, the plenary discussion, and the reporting of the breakout groups.
A
Andrew will add something. I just want to remind you: send any of Andrew, Andreas, myself, Kathy, or Cath your acronym proposals for what we should call these models. "K-scale" and "kilometer scale" weren't appealing. I have to report that on a car ride Gretchen Mullendore, Cass, and I came up with a wonderful acronym, so there is competition. That's just to needle you all a bit. We have one and we love it.
C
Yeah, I'm going shopping. We were discussing it; I'm going shopping this afternoon for prizes, so this is going to happen. Just the other thing to note, if you have the afternoon off: obviously there are many opportunities in Boulder. There is actually an NCAR shuttle that we can get people scheduled onto; it will leave from right out here and will take you across town.
D
It's probably worth noting that there are a number of mountain trails that begin at the Mesa Lab. So if you take the shuttle up there, there are lots of beautiful trails that take you into the forest and the hills. I've been here all week, but my wife has been hiking and tells me the sumac is changing a beautiful red, and it's bright and lovely. I think the aspens are past peak, and there aren't a lot of aspens right here in the Front Range, but the sumac are changing and it's quite lovely. So if that's appealing to you, head to the Mesa Lab and grab a trail.
A
Oh, I saw it on TV, that's all; I don't know what they use. All right, so why don't we get started. Our first speaker today is Richard Loft, followed by Sherry, and they're doing a tag team. They have asked to do two times 12 minutes and then only take questions at the end. So that's how we're going to do it. Richard, go ahead.
E
Thank you very much. I'll try to live up to the opportunity to pretend to be Oli Fuhrer in some ways. So today, as was mentioned, Sherry and I are going to tag team a discussion about the computational science and software engineering aspects of the Earthworks project.

As came up on Tuesday, we were already looking at trying to port MPAS to GPUs, in a curiosity-driven way, to see if directive offload could actually work reasonably well for a model. The objective was to produce an NWP forecasting system with about 30 percent of the planet refined to three kilometers. Essentially, the Weather Company wanted three-kilometer resolution over the populated areas of the world, i.e. the land, and they also recognized that MPAS had local mesh refinement baked in.
E
It took about two years to get MPAS fully ported to GPUs as a stand-alone meteorological model, and when we were done, we were able to run on Summit, I have here about 4,200 GPUs, that's about 700 nodes of Summit, at full resolution, three kilometers. We had to take some shortcuts, one of which was that we didn't want to port the radiation because it was in progress elsewhere. So we adopted a lagged radiation, where we used the heterogeneous node fully, using the CPUs to calculate radiation.
E
Just a couple of results from that exercise: the left shows weak scaling. This is holding the amount of computation per computational element constant and scaling the resolution proportionate to how many devices you use, and these flat curves show that it weak-scales. Then the strong scaling studies: the orange, green, and brown curves show 10, five, and three kilometers, and the blue is 10 kilometers but run on NCAR's Cheyenne system.
E
And that's one year per day at that time on Summit. So for Earthworks, the premise in the proposal is: can we take this NWP success and translate it into an Earth System model? The target was global quasi-uniform resolution: put all the components on one mesh, at the same resolution. That obviates the need to do interpolation or some kind of regridding that potentially introduces errors.
E
The target is these global storm-resolving resolutions, focused here on the icosahedral-type grid with 41 million points. That's the target we settled on when the proposal was written in 2019, and the goal, the moon-landing-type goal for us in the proposal, was half a simulated year per day at 3.75 kilometers.
E
One of the things we have to do is leverage CESM; we want to run CESM at high resolution, as does the SIMA team at NCAR and the CESM core team. So we've been looking with them at some high-resolution results for an Earth system configuration that looks like atmosphere and land with a data ocean, and this just underscores what happens on a CPU-based system. When you run this, you see the atmosphere.
E
The blue piece of this pie dominates computational time; it's about 90 percent of the time, with long turnaround times. This is 512 nodes, about an eighth of our system (I say "our" because I worked at NCAR for 27 years), so an eighth of their system. To get a run in requires draining a large chunk of the machine, so you can wait in the queue for a week to try and get a big run like this done.
E
When we ran this, we actually revealed some issues with how the model scaled. This particular run spent as much time initializing the land surface as executing the model. That's been fixed, but you have to run at these scales in order to find these kinds of problems. And then there's the slow throughput: when you're actually running this test case, you run at about 0.08 simulated years per day at seven and a half kilometers.
E
So the Earthworks strategy, just using that as an example: what we've done is use regional refinement of MPAS to reduce the cost of tuning the climate parameterizations at meteorological length scales, so we're trying to avoid running fully refined models at ultra-high resolutions to tune them, for cost reasons. Target heterogeneous computing with GPUs, that's obviously the point here, but we need to accelerate things; these GPUs can accelerate this code to something more reasonable than 0.08, based on the timings.
E
You can't really get much throughput if you need to use a large fraction of the computer, so we need to go after really big machines, exascale machines, as the target platforms for these things, so that we can actually get some turnaround time. And I think it's important to at least mention that our project also recognizes the huge data problem that is entailed by this kind of approach.
E
You can read it if you want, but essentially in this plot the cool colors are CPU-based results, basically Intel processors, and the warm colors are GPU-based results, either OpenMP or OpenACC offload. You can see that CPUs and GPUs have very different characteristics, and they show up in the physics: the number of columns per node that's optimal for a CPU is essentially exactly opposite of a GPU.
E
Now, in this setup we have just embedded a GPU-based physics in with the CPU model, so you're actually not seeing all of the benefit on the GPU, because of the data transfer back and forth.
E
This is where the atmosphere dycore is, and you would think this would be the easy part, but MPAS 7 is quite a bit different from MPAS 6, because MPAS 7 is designed to go into the climate model. So there's some restructuring, but I think the key point here is that we have some latency issues and some computational issues that we discovered in MPAS 7, associated with MPI wait and with some computations and variables that we didn't capture as being GPU-resident, and we're working on this.
E
But this advantage seems to persist in the newer architectures that have come out since the old result, and we think 10 kilometers at four simulated years per day on 256 A100s is a reasonable simulation rate.
E
With the ocean, we've got six times faster than a Broadwell node on the system we tested on, which is at Nvidia. That equates to about 12 simulated years per day for a particular test case called EC60to30. When we take that down to three kilometers, we think we can get multiple years per day out of the ocean with a relatively small number of GPUs compared to the atmosphere, so we're in good shape, I think, with the ocean.
E
And then, with data, it's like: be careful what you wish for. We estimate the model could produce, with hourly output, one and a half petabytes per simulated year. That's a lot! So we plan to use data compression, parallelism in the form of Dask (the Dask Python library) for workflow, and chunk it out as Zarr chunks.
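A rough back-of-envelope check of that estimate. Only the 41-million-point mesh comes from the talk; the level count, variable count, and precision below are illustrative assumptions, not the actual Earthworks output list:

```python
# Rough check of the "~1.5 PB per simulated year with hourly output" figure.
columns       = 41e6        # ~3.75 km quasi-uniform mesh (from the talk)
levels        = 100         # assumed vertical levels
fields_3d     = 10          # assumed number of 3D fields written hourly
bytes_per_val = 4           # single precision
hours_per_yr  = 365 * 24

snapshot = columns * levels * fields_3d * bytes_per_val      # bytes per hourly write
per_year = snapshot * hours_per_yr
print(f"{snapshot / 1e9:.0f} GB per hourly snapshot")        # ~160 GB
print(f"{per_year / 1e15:.1f} PB per simulated year")        # ~1.4 PB
```

Under those assumptions the hourly 3D stream alone lands in the same ballpark as the quoted 1.5 PB, before any 2D fields or compression.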
G
Thanks, Rich. I'll be talking about the software engineering challenges that we faced in this project, so basically talking about how we got the results that Rich just showed. Oops, where am I? Sorry. So I threw a whole bunch of stuff on a slide to show how complex the software development project really is, and I'm going to start off by pointing out, let's see here, there it is, this first diagram up here in the top right-hand corner.
G
We are coupling MPAS sea ice with MPAS Ocean within the CESM infrastructure, which presented a whole bunch of software engineering challenges in order to get that working. The other challenge we have is that these models continue to develop, so they're not static; they're being developed orthogonally to our own project. So how do we coordinate porting code to GPU as the code is changing while we're trying to do our work? We have that extra complexity as well.
G
We are also coordinating this effort across many different organizations. The project itself is led through Colorado State University, but we're also coordinating the effort between many labs here at NCAR; we have the private sector, with help from Nvidia and Rich's new LLC; and there are some efforts within the Department of Energy as well. We also have a very complex software stack that we're trying to coordinate across many of the different systems that I show here. So how do we do this?
G
So, when you think of software engineers, I'm going to throw this out here: we're typically, stereotypically, introverts. We like our offices; we don't like to coordinate. But in efforts like this we just can't do that; we have to coordinate. So an important part of this project is getting everybody in the same room, or Zoom conference call nowadays, and making sure that all voices are heard and that they're equally important.

In all these conversations, I personally like to celebrate all successes, so every little accomplishment we make, we celebrate as a team, and keep everybody motivated and moving forward. It's also important as a group to create that clear vision and path, and everyone who is part of the project should understand how their contributions fit into the bigger picture.
G
We can transfer team members back and forth to help each other across the project, but it's also important to empower all team members. As the lead of the software engineering effort, I always try to find opportunities to create leadership roles for the team members, so they have a chance to grow their leadership skills, but also their technical skills.

We also need to coordinate our effort with the scientists. From our experience, the scientists have to be equally invested in the project. We can't go ahead as software engineers and say "we know best, this is what you guys are going to do." They need to be invested in these projects as well. We also need to be aware of the science planning that's going into it, just as the scientists need to be involved with where we're going software-engineering-wise.
G
We need to be in constant communication with each other. As a team, part of our team is embedded in the science meetings, and we also brief the scientists in separate meetings on our planning for the engineering effort. It's also very, very important to have an exit strategy planned: as we complete these projects, how is the software going to be maintained after the work has been completed? That's always been kind of a difficult one.
G
When we think about the software development, it's very important to have version control systems. As a project, we've set up a repo that sits parallel to CESM, and we did that because, even though all the changes we're making in regards to CAM and the GPU port are going right into CAM, we are also, remember, coupling MPAS sea ice and MPAS Ocean into this infrastructure, which doesn't fit in the long-term plans of CESM.
G
So we had to create a parallel repo for this work, and the way that this repo actually works is that it just contains an externals config file, which is basically a recipe of exactly all the different repos, all the software stack within Earthworks and within CESM. It pulls everything down from that recipe. So it's really lightweight, but all the information is there.
G
As a project going the GPU porting route, we're like all other institutions: which way do you go? We went the OpenACC route for most of our development, mainly because of the research that we have right now. From this plot, Rich showed a similar plot earlier, we are seeing better performance with OpenACC right now, so we're going in that direction.
G
But we are aware that OpenMP offload is still being developed, so it's still on our radar, and we have a separate research project that's looking into: OK, we went the OpenACC route; what if we have to switch to OpenMP offload? OpenMP offload has the advantage that we can run on Intel GPUs. Right now, with OpenACC, we can run on Nvidia GPUs and on AMD GPUs, but we can't run on Intel GPUs. So, with performance improving with OpenMP and the possibility of needing to run on Intel architectures, we've investigated a tool from Intel that I highly suggest if anybody else is in the same situation. We found that it gets you about 90 percent of the way there in performance as well as portability. We've actually tried it with some complex codes, like CM1, a cloud model that's developed here, as well as this PUMAS MG3 work.
G
I also wanted to give a shout-out to Project Raijin, which had a little bit of a blurb earlier. Basically, what this is going to enable us to do is seamlessly analyze data on unstructured grids. Right now we're in the situation where we have to regrid all of our data before we analyze it. This will provide a seamless transition to reading in the data and then automatically analyzing it within Xarray and Dask.
G
So we will be able to exploit that parallelization within our workflow very easily with this work; stay tuned for that. And it wouldn't be a computational talk without talking about testing infrastructure. During development, for GPU work, it is very, very critical to test often and worry about performance later. It's very easy to lose correctness when you're doing GPU work, so it's very important to be testing every iteration, just to make sure that we, as software engineers, aren't changing answers for you guys.
G
We did that with the, I'm sorry, the MG3 PUMAS work, even though it's not a standalone test, and we were able to iterate back and forth very quickly on the development and then put it back into the model. The CLUBB work we're just actually starting; CLUBB is very nice, it has its very own testing infrastructure and it's a standalone application on its own. So in order to do this porting, what we actually implemented,
G
actually we didn't, UW-Milwaukee did: Vince Larson and Gunther did this, where they added the multi-column capabilities, so we can iterate back and forth on the development quite easily with this multi-column capability they added. Now, after development, we have some CAM regression tests that have been added, so we can move on to a different part of the project and the science can still keep going, but they'll
G
let us know if something breaks. The way that they do that is with two different tests. We call one a smoke test, where it just tests: does it run? And we also have a bit-for-bit comparison test, which compares between CPU and GPU, so we can tell if our answers are different.
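A minimal sketch of what such a bit-for-bit CPU-versus-GPU check can look like; the file names and variable names below are hypothetical, and the actual CAM regression tests are more involved than this:

```python
import numpy as np
import xarray as xr

# Hypothetical history files from otherwise identical CPU and GPU runs.
cpu = xr.open_dataset("case_cpu.cam.h0.nc")
gpu = xr.open_dataset("case_gpu.cam.h0.nc")

for var in ["T", "Q", "PS"]:                       # illustrative variable names
    a, b = cpu[var].values, gpu[var].values
    # Compare the raw bytes: any differing bit anywhere counts as a failure.
    identical = a.tobytes() == b.tobytes()
    print(var, "bit-for-bit" if identical else
          f"differs, max abs diff {np.max(np.abs(a - b)):.3e}")
```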
G
So what are we promising with this? In December of this year, we're hoping to get out more of our CPU capabilities and the configurations that we have. Starting next year in May, we're going to be releasing more of our GPU offload versions of these packages and a 15-kilometer-resolution configuration, and then a year from now we're going to release more of our GPU offload capabilities at another resolution, while also looking at releasing our diagnostics and analysis packages, and then, finally, hopefully, May 2024.
G
So how are we doing? When we look at our CPU route, we're doing fairly well; we're pretty much on target. As Rich says, for the iteration process, as soon as we start getting to the higher resolutions, we can get about one job through the queue a week.
G
So that's really limiting our throughput, but we are making good progress, and, as Rich said, we're sitting at about 0.08 simulated years per day with the atmosphere-land configuration at 7.5 kilometers; we're hoping to speed that up by about 12 times. And where are we with our GPU effort? We're doing fairly well with that as well. As Rich mentioned, we have already completed the dynamical core, the MPAS 7 version, and we've also completed the PUMAS port.
G
Right now we're validating the radiation, RRTMGP; we've hit a couple of compiler issues with that one, but we're hoping to work them out. We're currently working on the CLUBB port, we're just actually starting that, so we're hoping to get it done within the next year, and we're also evaluating the other physics within CAM itself to figure out exactly how much work it's going to be and where the slowdowns are in other parts of the package that need to be ported to GPUs. I think that was my last slide, so, huge thanks.
H
Nikolay from AWI here; we're doing something similar. That was a great talk. My question is about your testing: you're developing code on GitHub and you have to test it, so probably every pull request gets tested somewhere. Is it tested on GitHub Actions? Basically, my question is: how do you test your code on the HPCs? Because our admins basically freaked out when we asked them about doing unit testing on HPCs, and they said no, no, it's never going to happen.
G
That's probably why we haven't heard anything back from them. Basically, our testing right now is the regression tests that they do for CAM: every time they're getting ready to release or do another tag in CAM, that's when they run those particular tests. When we're actually testing during development, for me in particular, I port a module, run the test, and see if the answer is different. So, unfortunately, we don't have any CI, continuous integration, right now.
G
Like I said, we have the email in; we'll see, and I'll let you know how that goes.
H
Yeah, that would be great to hear, because for us it's pretty hard right now.
E
Yeah, can I just say two things about this? One, we have the added issue of not just testing at one center, because we're targeting DOE exascale systems, so we've written essentially portability-test-type small allocation requests. When the testing gets to a certain size, they start to ask what the science headline related to it is.
E
There's some kind of donut hole there. And I guess the second issue is that part of a pull request is, a lot of times, a code review, which is a manual process, and that takes up some time. So we can't always just trigger an automatic test.
I
A quick question: when the Earthworks proposal was submitted, you indicated there was a target in simulated years per day, I can't remember what the exact number was, at 3.75-kilometer resolution. Now you're two or three years into the project; you know a lot more about the capabilities and all that. Are you on target, in a coupled system at 3.75 kilometers, to accomplish that goal by May 2024? Or can you sort of...
E
Yeah, from the results on the A100, I'm comfortable with the notion that we can hit that. There are some dependencies: we have to get CLUBB ported, and we have to get the radiation actually running on GPUs. One of the problems with the GPU thing, from my projection point of view, is that it's an L-shaped ROI curve. Essentially you don't get any speedup until everything's ported, so that there's no host-device traffic between the different components; once you get everything over, then you see this big benefit. That's why I say it's L-shaped. Managers like linear ROI: I invested a million dollars and I got this much improvement. That's part of the scary part in making these projections, but right now, looking at the ocean and looking at the atmosphere, I think we might.
F
Thanks, Rich and Sherry; that's actually the best overview of Earthworks I've seen so far. My question is: are you planning to host any workshops, similar to the CESM workshops or WRF workshops, or will it be as easy as running one of our standalone models? Eventually, once it's all done, can a grad student just go to GitHub and download it, or is it going to be more complicated, so that we need to bring people in and teach them how to do it?
C
Yeah, sure, if a grad student has an exascale machine, no problem. I mean, Earthworks is a configuration targeted at a specific thing, the global three-and-a-half-kilometer coupled model, and it is supposed to be a compset of CESM. So in that sense, yes, it will be possible, but the computational load is going to be excessive. There are going to be other versions of CESM using these components that will be easier: the regionally refined stuff, things like that. Again, there should be compsets; it's going to be a community model, so it is going to be there, accessible, able to be explored, but the computational resource for this is definitely the exascale part, though that doesn't mean there aren't other parts. If that answers your question.
E
And there is a plan to hold a workshop; I just don't remember the details of where. So we've got a workshop planned. I think one thing is that you need that version one out there, even at low resolution, so people can kick the tires on it, and then you have an opportunity to hold a workshop where people can test things after they've worked with it.
J
I'll try to be brief. Thank you for your talk; it's very informative. I liked that you made a comment about land, kind of like "oh dear, we have to deal with that," but given Martin Best's talk from Tuesday, and looking at being on different resolutions or grids or whatever, how might you think the land could be accommodated? Because whenever we look at Earth system components, it's ocean and sea ice and atmosphere, and oh yeah, land is always underneath there.
E
Well, it's exactly because of what Martin said about the land possibly getting much more expensive. I also think Tim Schneider mentioned that when they run sort of WRF-Hydro type experiments, the land consumes about 25 percent of the resources, whereas, when you look at the pie chart I showed at seven and a half kilometers for CESM, the land is five or six percent.
E
So if it grows, then the whole idea that we're just going to keep it on the CPU and not deal with it stops working, because you don't have enough CPU power to calculate it. At some point there's a possible bifurcation point where you say, okay, the hell with it, I'm going to port it to GPUs, and that's not factored into our current work plan. But it may have to be.
E
Okay, yeah, all of that: plants, birds, rivers, everything, all of that 25 percent. If it becomes bigger, then it can't be ignored. That's something I learned by raising children.
A
I'm a complete outsider, but I saw a version of the NCAR land model that had hillslope effects in the talks about Alaska, so you must have a version floating around where you can play and see what the consequences of making it more complicated might be. It might be one of those cases where people don't talk to each other, but there's stuff out there to try, to see how much more expensive it might be. We don't have time for any further questions, I'm afraid.
A
So our next presentation will be by Milan Klöwer, who is joining us from Oxford, I presume. There he is, and he will talk about data challenges when we go to these very high resolutions. We can see your presentation now, Milan, and after 35 minutes I'll shout into the microphone, because that's the only way you're going to hear me. All right, take it away. Thank you.
K
Perfect, yeah, thanks everyone, thanks for having me. Yes, I'm in Oxford right now, so I did not make the big trip over the Atlantic, but I do want to talk about different challenges around data. Being at Oxford, yes, we collaborate with ECMWF, but many of the things that I want to talk about are probably a bit more high level, a little bit less about technical implementation
K
details. There's a lot of the view from ECMWF spiced in, but also, because I'm not directly working at ECMWF, I'm approaching everything from a slightly more naive point of view, and I've actually found that very helpful for stirring up discussions and questioning whether we should always do things in this way or that way, and I hope, therefore, you will also have a lot of discussion for me after this talk.
K
There are a lot of collaborators, people I've talked to over the last year or so, in order to understand the different perspectives around data challenges, especially when we produce a lot of data. The very first plot I actually just put together yesterday, because I received some data on how
K
the archive has scaled over the last decades, and I basically want to ask the question of whether we will actually enter the Google regime, meaning: how long does it take us to have acquired as much data as Google had a couple of years ago in their archives? Which is, well, just an estimate from XKCD. But you can see that the archive at the moment, well, that was December of
K
last year, was beyond half an exabyte, and if we project that forward, and this is literally no more science than hand-drawing a couple of exponential curves in there, then by 2030 to 2040 we will be somewhere at several exabytes, and if we didn't do any compression and didn't do any cleanup, we would actually hit the zettabyte, which is obviously an enormous amount of data that I just can't imagine.
K
But you can also see that there's a big scope for compression. If we simply, relative to where we are now, had a tenfold compression, then we gain something on the order of almost 10 years.
K
If we had a 100-times compression, then we would gain probably almost 15 years or so in this race towards the Google regime, and this obviously poses a massive challenge and really motivates us to think better about how we store our data and how we make it accessible to our users.
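The "years gained" can be put on the back of an envelope: if the archive grows by a factor g per year, a one-off compression factor C buys roughly log(C)/log(g) years before the archive is back to the same size. The ~35% annual growth below is an assumed number, chosen only to illustrate the shape of the argument, not an ECMWF figure:

```python
import math

growth = 1.35                          # assumed ~35% archive growth per year
for compression in (10, 100):
    years = math.log(compression) / math.log(growth)
    print(f"{compression}x compression buys ~{years:.0f} years")
# ~8 years for 10x and ~15 years for 100x under this assumed growth rate.
```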
K
Because, on the other hand, while our archives have been exploding, we have actually not thought that much about how we represent our data in a bitwise way. Back when ECMWF started the forecasts, they used double-precision floats; the IEEE standard was actually only officially introduced a couple of years later, and only last year did they switch to single precision.
K
And while there was probably a bit more movement on the data compression side, I believe it did not receive as much focus as it probably should have, given the problem we'll be facing soon. So I want to give a little overview of the different aspects of data compression that are out there, because I feel like, in this community around high resolution,
K
you can often group it into two groups, or two schools. The first school is, let's say, the transformation school, really the physical perspective, meaning you ask the question: I have some data,
K
what is the best basis function to represent this data? So I just have some coefficients that are multiplied with these basis functions, and that represents my data really well. The idea is obviously that you somehow approximate the spatial structure, spatial or even temporal structure if you were to compress in time. I just gave the example of the spectral transform that you could think of for representing data in terms of spectral coefficients; there are also EOFs that people have probably heard of: you could just truncate them, and then you have some kind of data compression from that.
K
All of them, or most of them, are fairly expensive to compute, because you have to do a lot of floating-point operations in order to get into your transform space and also to get back. And because you represent your data with some kind of underlying basis function, it is often difficult to bound the error, to know a priori
K
what your error could be: if your data does not really fit the basis function that you're using, you can easily go beyond some bounds, and that's quite difficult for many of these approaches. Random access also isn't really easy to do. Say you want to know what the temperature in New York is: with spectral coefficients, in spherical harmonics for example, you would need to transform everything back in order to know one point, and that obviously makes random access a bit tricky. The second school, and that's what
K
I would like to highlight a little bit, is the school around precision and information theory. I will talk about this in a bit more detail later, but one of the underlying properties of that school is that you don't really think about the spatial structure in terms of the physical perspective, and most of the time transforming your data into the new encoding, whatever that encoding is, is relatively cheap.
K
If you understand, for example, how floating-point numbers work, or how a linear quantization or logarithmic quantization works, your error bounds are also relatively rigid, and random access is usually much easier, because it's more straightforward how to chunk your data into pieces, and so you can easily say, I just want to decompress this one chunk, and here you go.
K
Tensor trains, for example, back to the school of transformations, are not something that I've worked with, but I definitely would like to present them as a thing we should look out for in the future, because I think some of these approaches could be really promising.
K
The tensor train approach is basically the idea that you have lots of tensors that you multiply together in order to represent your n-dimensional array, which could obviously have spatial and temporal dimensions as well as an ensemble dimension, and so on and so forth. And at least the few papers I've seen on these so far basically claim to be really, really accurate at super high compression factors. So, for example, this picture that I've put here on the left.
K
I don't quite understand how this is possible to achieve, but they're really good at representing smooth data. Just as a comparison, there are other techniques which also belong to this group of transformations: for example, zfp splits your data into little blocks and then fits little basis functions into these blocks, and that's why, if you ramp up the compression a bit too hard, you get this block structure appearing. It's similar for SZ, which basically fits a couple of functions in there, and that's why you see these facets on what should otherwise be smooth surfaces. So tensor trains, I
K
think, in general, are really good to keep in mind if you want to represent smooth data. It is not clear to me how they generalize to data that has super sharp edges on one end of your field and is rather smooth on the other, and I think they're more on the end of being fairly expensive.
K
It is also unclear to me how they generalize to, for example, unstructured grids, because you then unravel your, let's say, two-dimensional field into a single dimension, and I think that takes away a lot of compressibility for these methods, because they can't really exploit the neighborhood of a given grid cell.
K
The next approach, which I also haven't worked on, but I saw one or two papers on recently, so I want to put it out there because I think it could be really promising for certain applications of data compression (we'll talk about these use cases for data compression in a bit),
K
is the idea that you could also use a neural network to compress your data. The idea is really that you say: I have some kind of n-dimensional array that has some coordinates, x, y, z, time, ensemble, whatever dimensions you can think of, and you train a network to return the scalar at a given coordinate, given the input coordinates. You start by defining an architecture for your neural network, you train it with some kind of loss function, for example minimizing the mean square error, and in the end you store the information of your original data set in the coefficients of the neural network. So decoding is literally just providing your x, y, z, t, whatever coordinates to this neural network; it does the inference, and you get out your one single point.
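A toy version of that idea, fitting a small network to value = f(lon, lat) on a smooth synthetic field and treating the trained weights as the "compressed" representation; this is only a sketch of the concept, not one of the published coordinate-network compressors:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Smooth synthetic 2D field on a 64 x 32 grid.
lon, lat = np.meshgrid(np.linspace(0, 2 * np.pi, 64),
                       np.linspace(-np.pi / 2, np.pi / 2, 32))
field = np.sin(3 * lon) * np.cos(2 * lat)

# "Compression": store only the network weights that map coordinates to values.
X = np.column_stack([lon.ravel(), lat.ravel()])
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=3000, random_state=0)
net.fit(X, field.ravel())

# "Decompression" is just inference, at training coordinates or anywhere in between.
print(net.predict([[1.0, 0.3]]))          # evaluate at an arbitrary coordinate
```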
K
The cool thing about this, and this is why I kind of like it, is that it basically interpolates automatically, because it tries to fit a function through all your dimensions, and so it doesn't really matter where you evaluate that function, as long as, obviously, your evaluation and training data set are coherent enough.
K
You could just pick a point that wasn't actually used to train your neural network in the first place. But I guess it also comes with a lot of difficulties that I'm quite excited to understand in the coming years, because I think more and more people are looking into this. In general, it is a method that is rather expensive in terms of compressing and decompressing things, but I've seen papers that claim that factors above a thousand x
K
should be possible. But it's really unclear to me how easily you can control the error and so on, so I want to put it out there, but I'm not so sure it's something we would use directly. Going to the very other end of the spectrum, and this is, for example, the standard that's currently used at ECMWF with the GRIB data format,
K
there is the idea of linear quantization. I just want to quickly mention that: you take your data set, you look for your minimum and your maximum, and then you split the range in between equidistantly, and this is where the linear aspect comes in, equidistantly with some bits. Let's say you have 24 bits available; you choose the size of every number representation, and then you split the range into two to the power of 24 quanta.
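A minimal sketch of that linear quantization, a simplified stand-in for what GRIB-style packing does, not the actual GRIB code:

```python
import numpy as np

def linear_quantize(x, nbits=24):
    """Encode x as unsigned integers on a uniform grid between min and max."""
    xmin, xmax = float(x.min()), float(x.max())
    nquanta = 2 ** nbits - 1
    scale = (xmax - xmin) / nquanta
    codes = np.round((x - xmin) / scale).astype(np.uint32)   # bucket indices
    return codes, xmin, scale

def linear_dequantize(codes, xmin, scale):
    return xmin + codes * scale

x = np.random.gamma(shape=2.0, scale=1.0, size=10_000)       # some positive data
codes, xmin, scale = linear_quantize(x, nbits=16)
xhat = linear_dequantize(codes, xmin, scale)
print("max abs error:", np.max(np.abs(x - xhat)), "<= half a bucket:", scale / 2)
```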
K
You then round your original data into these buckets, and these buckets represent your data. The problem is that you really have to know what kind of data distribution you're actually dealing with, because, and this is just one of my favorite counterexamples, if you, for example, try to compress nitrogen dioxide with this,
K
you will see a bit-pattern histogram as shown here at the bottom: most of your values go into the very first buckets, and most of the other buckets are basically empty, because your data is logarithmically distributed rather than linearly distributed, so it doesn't really fit in there. And if you then look at things like the entropy, you see that seven bits are effectively unused, simply because all the values are in the first buckets and almost nothing is in the other buckets.
K
You can directly see that, for example, your first bit doesn't contain any information, because you basically know it's going to be a zero, not a one, which would encode the second half of your range. An alternative is logarithmic quantization, where between the min and the max you distribute your quanta slightly differently, for example with log spacing. If you use the same data for that, it actually turns out to be better, because nitrogen dioxide is more logarithmically distributed.
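The same point can be made numerically: for a roughly lognormal variable, the Shannon entropy of the bucket indices (the average number of bits per value that actually carry information) is far below the nominal bit width under linear quantization, and much closer to it after quantizing in log space. The synthetic lognormal sample below only stands in for a trace gas like NO2:

```python
import numpy as np

def bucket_entropy(codes):
    """Shannon entropy (bits per value) of the quantization bucket indices."""
    counts = np.bincount(codes)
    p = counts[counts > 0] / codes.size
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=2.0, size=100_000)          # stand-in for NO2-like data
nbits = 16
edges_lin = np.linspace(x.min(), x.max(), 2 ** nbits)         # equidistant buckets
edges_log = np.geomspace(x.min(), x.max(), 2 ** nbits)        # log-spaced buckets
print("linear:", bucket_entropy(np.digitize(x, edges_lin)), "of", nbits, "bits used")
print("log   :", bucket_entropy(np.digitize(x, edges_log)), "of", nbits, "bits used")
```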
K
You can see that your first bit then carries approximately one bit of information, because it splits the histogram into more or less two equal parts. And so it's quite interesting that, and this is one of the challenges we're also facing, we basically need to know what our data looks like.
K
What are the statistics of our data, in order to choose an appropriate compression method? That's a big challenge, and I think this is definitely something that has to be worked on more, to automate it and to develop standards for how different compression methods should be used. And this is probably where I would like to criticize a bit: at some point, when they came up with their compression methods, like using linear quantization, they probably just thought, yeah, that's the easiest way.
K
And what do we do with that data if we compress it in a certain way? I think this really highlights one of the more underlying things that we have to understand when it comes to data compression: depending on the method we're using, there are different control knobs, different things we can choose, that make a compression method work better or not so well. The first one, and this is basically the linear or logarithmic
K
quantization, is really the idea that you choose the number of bits. You choose the size of your data set a priori, and then, at the end, you get some error out of it along with your compressed data, and this might be rather unsatisfying, because you kind of need to work your way back
K
in order to understand what error actually happened in your data set given a certain size. You might be, let's say, positively surprised if it's really small, but you would rather control the error than the size. So the second group of compression methods, which many of the transformations fall into,
K
is that you choose the maximum error that you're happy to tolerate, and often this is literally just one scalar that you choose. For example, for the zfp library that I mentioned earlier, which splits everything into these little blocks, you say, I want to have a maximum absolute error of x, and then press play, and it compresses, and at the end you see how small your data set is.
K
So you really go from this error to a size and then have your compressed data. However, all of the methods that fall into groups one and two are such that the compression step and the information-loss step are combined, so you can't really disentangle them, which is why I've mentioned group number three, which might be especially relevant in our applications.
K
The point there is that there are actually methods that let you choose the errors independently, literally for every number, and I'll give one example later, and you can combine them with basically any lossless compressor you can think of. So you take these two steps, introducing an error because you've somehow truncated your data, and choosing the actual compression step, and make them independent. This, I think, gives us a lot of flexibility, because we have much better control over the error.
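A minimal sketch of that separation for float32 data: first round the mantissa to a chosen number of bits (the lossy, error-controlling step, shown here as a simplified round-to-nearest, not the exact rounding that BitInformation.jl implements), then hand the result to any lossless compressor, zlib in this example:

```python
import numpy as np
import zlib

def round_mantissa(x, keepbits):
    """Round float32 values to `keepbits` mantissa bits (simplified round-to-nearest)."""
    assert 0 < keepbits < 23                      # float32 has 23 explicit mantissa bits
    drop = 23 - keepbits
    ui = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    half = np.uint32(1 << (drop - 1))             # for rounding to nearest
    mask = np.uint32(0xFFFFFFFF ^ ((1 << drop) - 1))
    return ((ui + half) & mask).view(np.float32)

x = np.cumsum(np.random.randn(100_000)).astype(np.float32)    # smooth-ish synthetic series
raw = len(zlib.compress(x.tobytes()))                          # lossless only
for keepbits in (15, 7):
    y = round_mantissa(x, keepbits)
    size = len(zlib.compress(y.tobytes()))
    print(f"keep {keepbits:2d} mantissa bits: {raw / size:.1f}x smaller than lossless alone, "
          f"max abs error {np.max(np.abs(x - y)):.2e}")
```

Because the rounding alone fixes the error, the lossless compressor downstream can be swapped freely without changing the values at all.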
K
We can even later say, oh yeah, we still want to keep the same error, but we want to have another compressor, which we know is not going to affect our errors. That probably makes all the scientists who are going to work with that data happy, because they know they still get the same errors as before, but maybe it's smaller, or decompresses faster or slower, and so on and so forth. So, in order to phrase this challenge a bit better,
K
I think we need to look at which use cases we usually have for data compression, and so I tried to come up with a couple of examples to highlight how data compression can be different, especially at really high resolution where, obviously, we have a lot of data to deal with. Case number one, just as an example, is the typical case of reanalysis data, meaning one institution, for example, has produced the data set. It's a certain data set
K
that is, well, one data set, but it's basically used by many, many different people, let's say ERA5 or something like this. So it's a data set where, if it was small, it would be really beneficial, because you could easily download it to different servers, to different institutional clusters, and people could reuse it quickly, because it is a data set that is compressed once but decompressed many, many times by different users.
K
Decompression speed is really the key there, but less relevant might be the compression speed, because you may say, oh, we actually would be happy to accept a slower compression speed for a much smaller file size. And then something like portability is obviously important, because it's going to be used by different people, and they may say, oh, I just want one grid location, but give me all the time steps, meaning you have random access into that data.
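In practice that access pattern is what chunked formats are for. A hedged sketch with xarray and a Zarr store; the store path and variable name are made up for illustration and are not specific to ERA5:

```python
import xarray as xr

# With a chunked store, "one grid point, all time steps" only touches the chunks
# that contain that column, not the whole archive.
ds = xr.open_zarr("reanalysis.zarr")                           # hypothetical store
point = ds["t2m"].sel(lat=40.7, lon=-74.0, method="nearest")   # hypothetical variable
series = point.load()                                          # reads only the needed chunks
```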
K
Case number two, what I've phrased here as the research simulation, means: I want to work on this project, I want to do this little experiment with my model. You run it, and what's important is the decompression speed, because you may read it out many, many times, and you also don't want much of an overhead
K
when you write the data. You want random access, but maybe size is not that relevant, because you're only going to keep it for a couple of months on your computer and then you put it into some long-term storage archive, which is then case number three, I
K
think, where really size is the absolute thing that matters, and portability may not be that relevant, because it might be just a file that is stored there and isn't actually meant to be touched that often anyway, so decompression speed is maybe not that relevant, and random access not that relevant either. And then, let's say, case number four, operational: I probably couldn't phrase it better than to say literally everything is important, because you produce a lot of data,
K
you want to distribute it to a lot of people, they want access on a frequent basis, and so on and so forth. So really the question that we should ask is: what do we compress? Because this is, I think, really the difference between the idealized cases that you see for a lot of different compression algorithms, where everything looks nice and smooth and you're like, hey,
K
we can compress that by a factor of three thousand. Yeah, but our data doesn't look as much like a textbook example as your paper claims. And this is one of the examples that I produced by just looking at the different variables that are in the Copernicus Atmosphere Monitoring Service, one of the services that is operationally produced at ECMWF; it's basically the chemistry forecast of the atmosphere. Here I basically tried to put as many histograms into one plot as possible.
K
You can see that some of them are, because it's a log scale, absolute spikes, so their range is really small and they're more linearly distributed. Other variables span many, many orders of magnitude; there are super multimodal distributions; some values all have large uncertainties somehow; and possibly many, many zeros: if you think about precipitation, there are many areas in the world where it currently doesn't rain. Then the problems we're also facing are that sometimes some fields are super smooth,
K
some really have strong gradients, and that may change from location to location, from vertical layer to vertical layer. Sometimes we store things on unstructured grids, or sometimes we even want to store things in spectral coefficients, and there's masked data, and so on. So, for the kind of data we're dealing with, the challenge of finding a really good compressor is all the bigger, and so I really want to ask this more underlying question, to lead over to the topic that I actually want to talk about,
K
which is the question of what information is actually there in a given data set. When you think in terms of information, there are definitely different dimensions. We're here at a kilometer-scale workshop, so resolution is absolutely important for us: the higher the resolution, the more data we will produce,
K
and the more information is in that data. But we also look at many different numbers of variables, and all of them come with numbers that have some kind of precision, and so, if we think about data compression, we kind of truncate this information space somewhere. I literally made a little bend around the resolution axis there, because Andrew Gettelman has said many times now that we do not want to question the usefulness of kilometer scale, so definitely keep the resolution; I do not want to stir any of that up.
K
But we can compromise somewhere in terms of precision, and I think this is really where I would like to define what I call the real information problem of lossy data compression. It is where you say: I have an original data set, I may have some research questions that I would like to ask, and I want to know what the smallest subset of this data set is which can still answer exactly that same question in a qualitative way.
K
So, for example, your hurricane is still going to make landfall at roughly the same time. And so the question really is, if we think about it in terms of compression: what compression error is okay to have? I just literally pulled a number out of the hat and said 303.25 times some unit, plus some uncertainty, and the question really is: depending on the application, on the use case, there might be certain digits in there that we trust and other ones that we do not trust and would rather throw away in order to save memory.
K
If you think about this in terms of, let's say, Kelvin, because it's some temperature, 303.25 Kelvin, and your use case is a weather forecast that you want to communicate to users on their phones, then you may say, okay, 303 is good enough. But obviously, if you think about something else, then maybe you want to preserve the 0.3. And maybe, if it's about, I don't know, millimeters of rain over a certain period, you say, wow, it's an extreme weather event,
K
so our model is probably not good enough to represent it anyway, so we could just truncate it to 300, and it's basically good enough, because the uncertainty of that forecast is masking that precision anyway. So really, this notion of what is an acceptable error depends on a lot of different things that I've just mentioned, and so the question,
K
the underlying question that we want to ask, is: is there any way that the uncertainty in our data can be estimated if it is unknown? Because for some of our variables, I'm pretty sure that people have a good idea of what the uncertainty is, let's say temperature, but there are a lot of variables where we don't know: is it 10 to the minus 5, is it 10 to the minus 3, or is it something else?
K
And this is what we wanted to tackle: find a framework that answers that question, at least approximately, given some data. To motivate that a bit more, imagine you have some data, represented as some numbers, 0.05-something, with some digits at the end. The question is: do you trust those digits at the end or not? Probably not, and if you don't trust them, for compressibility reasons you should throw them away. In terms of encoding this into bits, the question therefore is
K
where do you cut, where do you put the line to distinguish between the stuff that's real, that you would like to keep, and the stuff that you would rather throw away, because those are the high-entropy bits that are really not well compressible? Ideally, you do want to cut them off in order to get a smaller file size, which, by the way, also means that you can directly communicate your uncertainty within the data set without encoding the uncertainty explicitly, which I find is a great opportunity.
K
So, here in the vertical we compare all the bits that are sitting in the vertical, and you may end up with a bit stream of bits that are in adjacent grid points. Just as an example, in this bit stream you would have zeros mostly followed by zeros, and once a one appears, it either remains one or switches back to zero, and so you can put this into a joint probability matrix.
K
This is just an example here: you put it into probabilities of the transitions between zero and one, one and zero, zero and zero, and one and one, and then you can calculate the mutual information, so basically the question of what one bit tells me about the next, and whether there is any information in that.
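A simplified 1-D sketch of that calculation, the mutual information between one bit position in neighbouring values; it follows the idea described here, not the exact implementation in BitInformation.jl (which, among other things, also filters out insignificant information):

```python
import numpy as np

def bit_mutual_information(x, bitpos):
    """Mutual information (bits) between bit `bitpos` of adjacent float32 values."""
    ui = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    b = (ui >> bitpos) & 1                       # the chosen bit of every value
    a, c = b[:-1], b[1:]                         # pairs of neighbouring grid points
    # joint probabilities of the transitions 00, 01, 10, 11
    pj = np.array([[np.mean((a == i) & (c == j)) for j in (0, 1)] for i in (0, 1)])
    pa, pc = pj.sum(axis=1), pj.sum(axis=0)
    mi = 0.0
    for i in (0, 1):
        for j in (0, 1):
            if pj[i, j] > 0:
                mi += pj[i, j] * np.log2(pj[i, j] / (pa[i] * pc[j]))
    return mi

field = np.cumsum(np.random.randn(100_000)).astype(np.float32)       # smooth-ish 1-D "field"
info = [bit_mutual_information(field, b) for b in range(31, -1, -1)]  # sign bit first
print(np.round(info, 3))  # large where neighbours share a bit systematically, ~0 for noisy trailing mantissa bits
```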
K
If there's no information, it basically suggests that we should throw those bits away, and so we came up with this bitwise real information content framework that we hope is going to be helpful in addressing this umbrella question in the future. Imagine you start with some gridded data; it doesn't have to be structured, it can even be unstructured, and you have some kind of space-filling curve that moves along,
K
as long as your next point is somewhere adjacent in space. Then, for every bit position, so let's say the sign bit, the exponent bits, the mantissa bits, you look at the bits that are in the surrounding grid points. If they're all identical, then your entropy is zero anyway, so the information also has to be zero, because entropy is the upper bound of your information. But the mutual information, for example if a lot of zeros are next to each other and a lot of ones are next to each other,
K
your mutual information is actually approaching one bit, and so you can do that for every single bit position, and then, in the end, you come up with one of these graphs, as you can see at the bottom, which maxes out at one bit; that's basically the maximum information you can have for a given bit position in your data. And then, for example, you see that
K
while this might look a bit different for other data, most of the data that I've looked at is qualitatively something like this. And so the question that you can then ask is: how many bits do I actually have to retain in order to preserve a certain amount of information? Here in this example, for instance, the purple dashed line is where we would cut if we wanted to preserve 99 percent of the real information in our data set and remove the rest for compressibility.
E
He's back.
K
Yep, that's fine, I'll just show a few more slides and then we can go over to the questions, if that's okay for you. Yeah, thumbs up, perfect, great. And so the idea is, then, that instead of thinking about error norms, I want to motivate people to think in terms of preserved information, because this is actually a quantity that moves with the statistics of your data better, and it's, hopefully, and we have to see whether we actually understand this metric in its full breadth, a metric
K
that is a bit more useful for data that you don't know much about. In this example we compressed, for instance, water vapour, and preserving 99 percent of the information gives us a compression factor of roughly 40 compared to double precision.
K
Whereas, obviously, if we were to go all the way to the right, throwing away about 20 percent of the information, we would actually get visual artifacts, so we're well away from that and want to avoid it. But also, in terms of high-resolution modelling, we've applied this technique to some satellite data, for example, here's brightness temperature, and it worked rather well.
K
However, and this is probably where we get into the Bad and the Ugly aspect of this workshop, every time you calculate this bitwise information content you end up with what is basically an average over the entire domain that you look at. Meaning here, for example, you would get one average information value covering both these patchy cloud features and, for example, the smoother sea surface temperature signal that you can see here over the Black Sea.
This obviously poses a problem, and we realized it especially when looking at fields like precipitation, which can be quite patchy in the tropics but rather smooth, at least at this resolution, in the mid-latitudes. If you then calculate the bitwise information content of that, you get one number suggesting how many bit positions to keep.
But in the end that number might be dominated by the tropics, and you may end up cutting off more precision in the extratropics than you actually want to. A similar problem that we faced was that some people, for example, used floating-point numbers, truncated them and expected an absolute error that was uniform.
But it actually isn't, because floating-point numbers are logarithmically encoded, so the little transitions in the amplitude of the error that you can see here are exactly at the 4, 8, 16 and 32 degree isotherms, and that is also one of the problems.
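That jump in absolute error at each power of two is easy to reproduce; a small check with made-up temperatures around the 4, 8, 16 and 32 degree thresholds, reusing the round_mantissa helper from the sketch above (keepbits=5 is an arbitrary choice):

```python
import numpy as np

# round_mantissa(): the mantissa-rounding helper defined in the earlier sketch
temps = np.array([3.9, 4.1, 7.9, 8.1, 15.9, 16.1, 31.9, 32.1], dtype=np.float32)
rounded = round_mantissa(temps, keepbits=5)
for t, r in zip(temps, rounded):
    print(f"{t:5.1f} degC -> {r:8.4f}   abs error {abs(float(t) - float(r)):.4f}")
# the worst-case absolute error doubles each time the value crosses a power of
# two, because the spacing between representable numbers (the ULP) doubles there
```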
K
If you think about this quantization that happens, you basically filter out gradients, which may cause problems here or there, I don't know, but it basically means that neighboring grid points that you really can't significantly distinguish get put onto the same level. I'll skip ahead. In the end we just want to say that we've been working on different implementations, so I've been writing this package called BitInformation.jl, which is my Julia reference implementation.
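What putting neighboring grid points onto the same level does to a weak gradient can be seen directly; a sketch with a made-up temperature profile, again reusing round_mantissa from the earlier sketch:

```python
import numpy as np

# round_mantissa(): the mantissa-rounding helper defined in the earlier sketch
field = np.float32(300.0) + np.linspace(0.0, 1.0, 1000, dtype=np.float32)  # ~0.001 K between points
rounded = round_mantissa(field, keepbits=7)

# near 300 K (which is about 1.17 * 2^8) keeping 7 mantissa bits means a
# quantization step of 2^8 * 2^-7 = 2 K, so the whole 1 K gradient collapses
print("distinct values before:", np.unique(field).size,
      "after:", np.unique(rounded).size)
print("fraction of neighboring pairs made identical:",
      float(np.mean(np.diff(rounded) == 0.0)))
```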
Other people have taken it up, developing Python, xarray, netCDF, GRIB and HDF interfaces, so this is all something we're working on to understand how it can be used and how it can be useful for people, and I hope it triggers some discussion around how we actually store our data. At this point I would like to end this talk, and I'm very excited to hear any of your thoughts. Thanks.
L
Hi Milan, Peter Lauritzen here from NCAR. A couple of comments: to make use of data compression, I think there's a lot we can do on the actual model side as well. For example, if you look at budgets in the model, they're very sensitive to any kind of truncation. So, for example, the energy tendencies I talked about yesterday: I'm subtracting two huge numbers to get something of order one out.
However, if I compute those inline in the model, then it doesn't matter if you start compressing the data. So I just want to call out that I think it's really important that we compute a lot inline in the model, and then we can really make use of data compression. You can make similar arguments for functionally related quantities in the atmosphere, such as chemical species. The other comment I have is that if you're a dynamicist, there are some diagnostics that require zonal means.
If you compute those after the fact, you need really high-resolution 3D data, and one issue as we move into these unstructured-grid models is that it's not trivial to do zonal means. That is, for example, something we're working on at NCAR, so that you can compute these kinds of eddy statistics inline, and again in that way massively reduce the amount of output, and then you could do compression on it afterwards.
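Peter's point about budgets can be made concrete with a toy example: if two large states are rounded individually and differenced offline, the small tendency drowns in rounding error, whereas the tendency computed inline in the model and only then rounded survives. All numbers here are made up, and round_mantissa is the helper from the earlier sketch.

```python
import numpy as np

# round_mantissa(): the mantissa-rounding helper defined in the earlier sketch
rng = np.random.default_rng(3)
state = 2.6e9 + rng.normal(0.0, 1e6, 10_000)        # e.g. column energy at step n (made up)
tendency = rng.normal(0.0, 1.0, 10_000)             # the small signal we actually care about
state_next = state + tendency                       # column energy at step n+1

# offline: store both states rounded to 7 mantissa bits, difference them afterwards
offline = (round_mantissa(state_next, 7).astype(np.float64)
           - round_mantissa(state, 7).astype(np.float64))

# inline: compute the tendency inside the model first, then round only the small number
inline = round_mantissa(tendency, 7).astype(np.float64)

print("true mean tendency:     ", tendency.mean())
print("offline, rounded states:", offline.mean())   # swamped by a quantization step of ~1.7e7
print("inline, rounded output: ", inline.mean())    # close to the truth
```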
K
I absolutely agree, and this is why I've been advocating a bit more for doing the compression directly within the floating-point format, simply because it is a format that we understand relatively well, where people have put a lot of thought into how to design rounding modes so that they are bias-free. Meaning that if we truncate in the floating-point format with such a rounding mode, you should, at least theoretically, be able to not distort any budgets that you're calculating, which in the end are just a sum over a lot of rounded values. But I do agree.
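The bias-free property is easy to check numerically: truncating mantissa bits pulls every value toward zero and biases a global budget, while rounding to nearest leaves the mean essentially untouched. The round-half-up used in round_mantissa above is close enough to unbiased for this illustration; proper round-to-nearest, ties-to-even removes the remaining half-ULP bias. The field is again a made-up stand-in.

```python
import numpy as np

# round_mantissa(): the mantissa-rounding helper defined in the earlier sketch
def truncate_mantissa(a, keepbits):
    """Truncate (toward zero) float32 values to `keepbits` mantissa bits."""
    raw = np.ascontiguousarray(a, dtype=np.float32).view(np.uint32)
    mask = np.uint32(0xFFFFFFFF - ((1 << (23 - keepbits)) - 1))
    return (raw & mask).view(np.float32)

rng = np.random.default_rng(4)
field = rng.uniform(200.0, 320.0, 1_000_000)                 # stand-in temperatures in K
true_mean = field.mean()

for name, f in [("truncate", truncate_mantissa(field, 7)),
                ("round   ", round_mantissa(field, 7))]:
    print(name, "error in the mean:", f.astype(np.float64).mean() - true_mean)
# truncation biases the budget low by a sizeable fraction of the quantization
# step, rounding keeps the mean essentially unchanged
```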
This is absolutely something that we should also look out for with, let's say, the more transformation-based algorithms: it might be really tricky with them to guarantee that any kind of means, averages and so on are preserved, and this is not immediately clear for those ones, whereas as long as you have a bias-free rounding mode, this should basically all tick the boxes.
C
Hi Milan, it's Andrew Gettelman. Thanks for your participation today, starting earlier this morning our time, in the afternoon your time.
Again, great talk. I had a sort of general comment and thought: thinking about the concept we have of stupidly parallel processing, maybe there's a form of stupid compression where we just don't output stuff. This is kind of along the lines of what Peter was saying, that we think ahead about what we're going to use from these simulations, from what we get out of a CMIP model.
At NCAR we actually call these MOAR runs, Mother Of All Runs, where we output everything, but maybe that's not the model we should think about. We should think about only outputting what we need to output, based on the analysis. It's almost like taking the information content one step further, through the analysis, to "do you need it at all?", and some of this again is what Peter was saying about doing things online. But I'm wondering if you had comments on that, and whether there are ways of taking this information content even further upstream.
K
Yeah, no, absolutely. I mean, this is where my little graph came in, where I tried to make the little bend around the resolution, because I think this is really what we have to tackle in the end: our information space is really high-dimensional, including what applications we use and who the end user is in the end.
And I do agree, if we have a clearly defined objective that our data is supposed to serve, for example give us the temperature forecast at a given location, then it is relatively clear what we want from the data compression. However, and this is probably the caveat I want to mention, we all somehow work in research, right? So there might be someone coming up with a research question a few years down the line, saying "I want to look at this".
And suddenly that is an application which requires something different from the data. I think this is the underlying problem that we are facing with all this data compression. That's why everyone is basically careful, everyone says "oh no, I don't want to throw this data away, because I may use it at some point down the line", and so yeah, this is really where we have to find some kind of common sense.
C
I mean, I guess one example of something we always do, which is the first thing we do: we usually compress with resolution. But remember, we have to think about resolution in 4D, and that includes resolution in time, and then we lose information, but that's a convenient way for us to do it. We can probably think about that with respect to high resolution as well: what times should we compress, or what times do we just throw out? So anyway, just a thought.
A
Thanks, Andrew. Christian Jakob, yeah, I have a question too. So you mentioned, you know, you had your precipitation example, which I really liked, where you saw that the extratropics behave differently from the tropics, and then you compress it as if it was a global field. In principle we don't have to do this. The one advantage we have is that we know the system that we're modeling.
K
I have a better solution, and this is exactly this method that I framed as round plus lossless, where you literally split the truncation and the actual compression into two different steps, meaning that really the first thing you do...
A
Yeah, but there may be other advantages in chunking in space, right? A lot of research questions concern the tropics, so many people download global fields and throw away 70% of them straight away because they're looking at a certain aspect. So I'm just saying there may be opportunities to be smart. The last question, though, goes to Pier Luigi, and then we'll move on.
B
Hi Milan, just following on from this idea of future uses of the data, and I think you hinted at it in your presentation: there are also mathematical considerations, so we may want to take first or second derivatives. It's not just what we want to do with the data, it's what operators we want to apply to the data. Is there a way to take that into account?
K
Yeah, this is basically what all the transformation-based data compression methods that I've outlined a little bit claim they're better at, because in the end they're basically fitting some kind of basis function to your data, and as long as that basis function is nicely differentiable, you end up with a compressed data set where, once you compute, let's say, the gradients, they don't really get much distorted.
I do see these advantages, but I find it tricky at the moment to really see it: I'm still looking for applications where someone calculates some gradients or something and it absolutely goes wrong if you use, for example, this round-plus-lossless method. So if you have a good application, please say so, like "oh yeah, look at this data set: if you compress it and then compute the gradient, you see the problem". Please share it with me.
I do see that it is absolutely a use case that's necessary for us, but I don't necessarily see yet that the gradient preservation of the transformation-based methods makes it absolutely worth considering them over the other class of compression techniques.
A
Thank you again, Milan, for the great talk. You can see from the reactions that everybody liked it. So we'll let Milan go home and have dinner, I suppose, and we'll go on.