From YouTube: CI WG demo: Building Cyberinfrastructure for Multidisciplinary, Multiscale Agroinformatics
Description
Date: 11/2/2019
Presenter: Jim Wilgenbusch
Institution: Minnesota Supercomputing Institute
Midwest Big Data Hub
A
... they might be using to describe these data. But I can say from personal experience that, especially when we start talking about developing cyberinfrastructure, the fact that these things exist, and that there are many of them, leads to important implementation questions. Which one do we follow, for example? That gets into interesting arguments and cultural warfare that are sometimes difficult for CI developers to sort out by themselves when you get into the actual data.
A
In fact, I would propose that one of the biggest obstacles to realizing some of the promises of the big data revolution is that data are messy. The fact of the matter is that the non-sexy side of big data is actually rolling up your sleeves and figuring out how to clean it.
A
A little bit of an example of that: in some data sets that we're collecting (they're digital data sets, and they've actually been published) we see things like this, where the management conditions are full of different languages, different spellings, different capitalizations, and so forth. There are different ways that you might say "low nitrogen," and that could have a pretty significant impact if you started to actually do analyses on these different activities. So the broken data issue is certainly a huge challenge. It's not specific to ag, or to digital agriculture, but it is tremendous in terms of working on these problems.
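To make the "low nitrogen" problem concrete, here is a minimal sketch of how variant spellings, capitalizations, and languages might be folded onto one label before analysis. The lookup table, the label names, and the function names are all hypothetical and are not taken from the GEMS platform; a real cleaning step would need a far larger, curated mapping.

```python
import re
import unicodedata

# Hypothetical lookup table: normalized spelling -> canonical label.
# The entries are invented for illustration only.
CANONICAL = {
    "low nitrogen": "low_nitrogen",
    "low n": "low_nitrogen",
    "bajo nitrogeno": "low_nitrogen",  # Spanish spelling with the accent stripped
    "high nitrogen": "high_nitrogen",
}


def normalize(raw: str) -> str:
    """Lowercase, strip accents, and collapse punctuation and whitespace."""
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def canonical_label(raw: str):
    """Return the canonical label, or None so the value can be reviewed by hand."""
    return CANONICAL.get(normalize(raw))
```

Returning None for unknown values, rather than guessing, keeps unrecognized management conditions visible so that a person can decide how to map them.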
A
I just wanted to throw this out there because this isn't a new problem, right? It has been around a little over a decade now. Jim Gray commented in 2007 that the tools for capturing data, both at the mega-scale and at the milli-scale, are really just dreadful. And here we are again, a decade later, and I think it's the same now.
A
What are some of the big privacy concerns around how data are being shared and aggregated? I think, when looking at infrastructure and developing CI in particular, these are really important issues to look at, because they can be major roadblocks in terms of getting work done. I say that because I've done a good amount of work in the academic health space, where we actually have pretty well-defined boundaries around handling data and how to do it, and likewise for financial information.
A
There are federal laws and, for both of these cases, there are also state laws that govern how those data are stored and shared, but there actually isn't a whole lot right now in the agricultural space. There are some groups, many of them industry-led, that are beginning to develop some prototypes of standards. But it's still, largely speaking, the Wild West in terms of policies and rules around how to handle agricultural data.
A
We worked for about two years in the state of Minnesota to actually get legislation on the books. This just passed this August, and under it data that are stored on our platform, which I'm going to be describing in just a minute, are actually considered to be private, non-public data. That's important, because there are a couple of ways that, once data go onto state-run systems, people can actually get them, and that would potentially keep people from contributing to common repositories. These are similar to FOIA or the data privacy acts that people have within their own states to request those data. So we're now able to put the data onto the platform that I'm going to describe and give some assurance of privacy for that data, which has been extremely important in terms of working with farmers and private companies.
A
A last general challenge that I'm going to face is really the scope of the things that we want to do. The datasets are incredibly diverse: genomic, environmental, management, and socio-economic (hence the name GEMS) are the data that we're interested in getting to be interoperable, so that we can make broad inferences about various phenomena related to the food value chain.
A
We need to develop cyberinfrastructure that really scales from small data, which is actually a lot of the problems that we work on, to big data. People typically think of "big" in terms of volume, but we all know that there's more to it than that. And then develop models, and there's a lot of room in this space, that span diverse data types and also time and space. On the social end, and you could definitely make an argument that this also bleeds into technology, I would say: develop standards that will be useful to the community, while also recognizing that data are going to be messy, and if your platform doesn't deal with that reality, then it's probably not going to work well. Also promote FAIR data (I think everybody here knows what FAIR is) while also respecting data privacy concerns. That's critical, and in the ag space it's critical because 64% of R&D in ag is now done in the private sector. So that's a recent change.
A
In the last decade, we flipped over to actually funding more research in the private sector in ag. It's a hopeful story. The hopeful story is that we start off with visions of what we might want to have, and we need all of these intervening technologies before we can actually realize the airplane that moves large numbers of passengers fast. Without those intervening technologies, it won't work in the same way.
A
We have, in 1792, the Farmer's Almanac, which tries to help the farmer make better decisions in terms of what they're going to do with ag. We feel like there have been a ton of intervening technologies that make it realistic for us to develop a platform to address some of these questions that we have, questions that maybe we put to the Farmer's Almanac before, but for which we could use more data-driven solutions in the future. So the timing is right, and I think, again, I'm preaching to the choir here.
A
What it buys us is that there are obviously different values to running on these different platforms. If you have to do long-running, compute-intensive jobs, you want the cluster. Where there are serious privacy concerns, which we actually are supporting, we're not going to host the platform; it's going to be hosted behind company firewalls. And then, of course, for developers it's important to be able to move that platform onto laptops, so people can move around and make changes to the platform very easily.
A
The specific contributions that we're making can be wrapped up into two parts. GEMS Share, essentially, is what controls access to the data: who sees it, when, and what they specifically see. Somebody commented earlier that this sounds a little bit like what iRODS might do, and that's right.
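The "who sees it, when, and what they specifically see" idea can be pictured with a small sketch. This is not how GEMS Share is implemented; the Grant record, its fields, and the helper functions are hypothetical, and are only meant to show the three questions (who, when, what) as data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Grant:
    """Hypothetical access grant: who may see which columns, and when."""
    user: str            # who
    columns: frozenset   # what they specifically see
    start: date          # when access begins (inclusive)
    end: date            # when access ends (inclusive)


def visible_columns(grants, user, on):
    """Union of the columns that `user` may see on the date `on`."""
    allowed = set()
    for grant in grants:
        if grant.user == user and grant.start <= on <= grant.end:
            allowed |= grant.columns
    return allowed


def filter_row(row, allowed):
    """Drop every field of a record that the user is not allowed to see."""
    return {key: value for key, value in row.items() if key in allowed}
```

Under this model, a user with no grant that is active on a given date sees an empty set of columns, so access is denied by default.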
A
It supports this notion of open, private, and pooled data sets, which are also really critical for some of the things that we're engaged in. And it really is beyond data, in the sense that what we call products include not just data sets but also workflows. Right now, a workflow largely means a Jupyter notebook, but we have actually worked on wrapping up other, more complicated workflows that operate outside of a notebook. GEMS Tools ...
A
Likewise, we actually do some guessing now on (I think we're up to 16) ontologies and vocabularies, so that we notice column headers that fit some of those, as well as some of the data within them, and we make some suggestions. But there's a pulldown-style menu where the user can change that if we got the guess wrong: AGROVOC, say, as opposed to the crop research ontology. A couple of projects that we're working on: genomes growlers. It operates across the G by E by S.
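The guessing step described above (match column headers against known vocabularies, suggest a term, let the user override it) might look roughly like the sketch below. It is an assumption-laden illustration, not the GEMS implementation: the vocabulary names and term lists are invented placeholders, and plain string similarity from Python's standard difflib module stands in for whatever matching the platform actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical, tiny stand-ins for real vocabularies; the names and terms are invented.
VOCABULARIES = {
    "vocab_a": ["plant height", "grain yield", "sowing date"],
    "vocab_b": ["soil ph", "air temperature", "precipitation"],
}


def suggest(header, cutoff=0.8):
    """Return the best (vocabulary, term) guess for a column header, or None."""
    cleaned = header.lower().replace("_", " ").strip()
    best, best_score = None, 0.0
    for vocab, terms in VOCABULARIES.items():
        for term in terms:
            score = SequenceMatcher(None, cleaned, term).ratio()
            if score > best_score:
                best, best_score = (vocab, term), score
    return best if best_score >= cutoff else None


def annotate(headers, overrides=None):
    """Suggest a term per header; user overrides (the pulldown) always win."""
    overrides = overrides or {}
    return {h: overrides.get(h, suggest(h)) for h in headers}
```

Keeping the user's choice separate from the automatic guess means a wrong suggestion costs one correction in the menu rather than a silently mislabeled column.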
A
This group has been growing, and they have been giving us outstanding feedback to really drive the tool and keep it grounded in important ways. A little description of what that is, specifically from the IAA standpoint. What we've done: of course, we're at a university, so we've got an awesome opportunity to train students, and we're doing that now through our BICB program (bioinformatics and computational biology), and we have actually gotten some fantastic students out of that program working with us now. You can find more information at our newly minted website.