Description
Laval University Case Study
Guest Speaker: Guillaume Moutier (Laval University)
Data Science Services (Valeria)
A
Well, that was a really interesting presentation from Shepherd, because in fact it covers mostly what I was about to say. You will see that the tools are a bit different, but the approach is very much at the same level.
Laval University is in Quebec City, Canada. We are ranked number six in Canada for research, which means we have data scientists all over the campus (here is a picture of the campus). That makes a population of about three thousand people who would qualify as data scientists, with our researchers, graduate students and so on, and those are the people we are trying to help with a data science platform that very closely resembles what has just been presented.
So, a few words about our data science services project. We call it Valeria; that is kind of a brand name. The idea is to provide professional services, meaning consulting, support and training, around all those data science and machine learning topics. Of course, we base those services on the technology layer that we have: our own data centres on campus, plus Compute Canada and Calcul Québec, which are resources freely available to researchers throughout Canada, and of course other cloud services that we can also leverage.
So the idea is, exactly as was just presented, to have a central data lake with some storage and compute capacity, on top of which we put the ingestion and management tools. Those tools are mostly for transferring and ingesting the data, because in research we rely more and more on sensors and on data scraping from everywhere, and the resources we currently have, whether at Compute Canada or on campus, do not really address this new side of data ingestion. So we have to provide it in our new platform. Of course, there are also tools for processing, visualization and analysis, and for storage, access and catalogues.
A
But
those
are
mostly
technological
solutions
and
we
want
to
abstract
this
layer
of
technology
to
our
researchers
so
that
they
are
able
to
use
it
in
a
much
easier
way
and
that's
where
we
will
provide
you.
The
service
access
to
data
scientists,
slash
data
engineers
who
will
help
our
researchers
to
take
the
most
of
everything
that
we
are
putting
up
in
trigger
the
project.
We
are
also
working
at
the
data
management
framework.
A
We
were
talking
earlier
about
access
and
security,
and
things
like
that.
That
will
be
part
of
this
management
and,
of
course,
of
course,
support
and
training.
So,
if
I
look
at
the
architecture,
we
are
quite
lucky
here
on
the
campuses.
We
have
four
different
data,
centers
ways,
of
course,
optical
fibers,
covering
the
campus
with
connections
from
up
to
100
gigabits.
That's
also,
we
have
our
own
wherever
on
network
I
burn
network
all
throughout
the
see,
that
is
to
connect
our
research
centers,
that
we
have
in
different
hospitals,
and
things
like
that.
We made some technological choices for this project. The first was to renew the network, with 100-gigabit links between the data centres and 40 gigabits where that is not necessary. We also chose Ceph object storage, especially because it lets us use several data centres and spread our data across different physical sites.
Those sites are completely isolated geographically, but in fact it is a single storage cluster, spread by the network across the data centres. We come from the HPC world, where things are mostly based on Lustre as the file system and the like, but here it is a complete change of architecture. We go with Ceph, which will make it much easier for us to interact with the data and move it around. Of course it poses other challenges, which I will talk about a little bit later.
It will be a completely shared service, which means we do not yet know which parts will grow, whether it is the storage or the compute. So from the start we had to decouple everything, and once that was done we chose to use Ceph for the storage. In fact there is no layer of Hadoop: we can use Spark, TensorFlow and all these new things directly, and we avoid the standard Hadoop stack.
If we look at the technology landscape, we have our data scientists at the top left, who will keep using their standard data analysis tools, JMP, RStudio, SPSS or something like that, but we want to offer them the applications, tools and services that you see at the top of the diagram for data ingestion and data discovery, which is very important for us.
It is not just about having data in a central data lake; it is about finding it. So we will add that kind of tool, such as Dataverse or CKAN catalogues, or iRODS for the data management side. Of course, we have Jupyter notebooks on demand, and we still have the HPC services, which are much more traditional. On the bottom part you can see the different applications and tools that we will use, and in fact we do not pretend to invent anything in this project.
Those are standard applications that you can find out there; our project is much more about integrating all of them into a seamless experience for our users. And here, if it works, I can give a small demo of the kind of experience we want to offer. So this is the portal, and please keep in mind that it is only a small proof of concept at this stage.
Here, once I am logged in through Keycloak, I have direct access to the portal, where I can find Jupyter, GitLab and, for storing the data, CKAN, of course, plus other, more standard services such as databases, S3 storage and so on. So here, as the researcher, I am able to launch JupyterHub, where I have a selection of different types of notebooks that I can choose from.
For example, let's say I want to do some SciPy work; I click, and everything is instantiated. Everything is based on Kubernetes on OpenShift, so as I launch my notebook you can see a new pod being created directly under my name because, with the integrated authentication, I can gather all the data that is necessary:
the username, of course, but maybe also some other restrictions and things like that. I cannot show it to you right now, but we have a new version of this where, depending on my credentials, I would be able to select how many CPUs I can run with, how much memory, and whether I am allowed to use GPUs, things like that. And here I have my Jupyter notebook, which right now is backed by S3 storage.
As we are running inside OpenShift, we had to decouple the storage, and we did not want to use persistent volumes in OpenShift for this kind of thing; it is better to have everything on the Ceph storage, which makes it much easier later to interact with other applications and tools. So there you have it, but those are just standard notebooks, which is not really that interesting on its own.
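As an illustration of what S3-backed notebook storage looks like in practice, here is a minimal sketch using boto3 against a Ceph RADOS Gateway endpoint; the endpoint, bucket and credential names are placeholders, not the actual Valeria configuration.

```python
# Minimal sketch: persisting notebook data to a Ceph S3 (RADOS Gateway)
# endpoint instead of an OpenShift persistent volume. Endpoint, bucket and
# credential variables are illustrative placeholders.
import os
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ceph-rgw.example.ulaval.ca",   # hypothetical Ceph RGW endpoint
    aws_access_key_id=os.environ["S3_ACCESS_KEY"],
    aws_secret_access_key=os.environ["S3_SECRET_KEY"],
)

# Persist a result produced in the notebook...
s3.upload_file("results.csv", "my-project-bucket", "experiments/results.csv")

# ...and read it back from any other tool that speaks S3.
s3.download_file("my-project-bucket", "experiments/results.csv", "results_copy.csv")
```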
We are talking about petabytes of data and very heavy compute; we have some genomics computations that take a few weeks to run. So we wanted to make sure we would get sufficient performance with Ceph as the Spark backend, and the results of that joint benchmarking work are currently being published. The short answer is yes, it works. Of course there is some drawback in terms of performance, but given the versatility we gain by using object storage, and the cost efficiency, it is much, much better for us to run directly against Ceph than to stand up our own Hadoop.
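For context, this is roughly what pointing Spark at a Ceph object store through the S3A connector looks like; the endpoint, credentials and dataset paths below are illustrative assumptions, not the benchmark setup itself.

```python
# Minimal sketch of Spark reading directly from object storage via S3A,
# with no HDFS layer involved. All values are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("ceph-s3a-sketch")
    .config("spark.hadoop.fs.s3a.endpoint", "https://ceph-rgw.example.ulaval.ca")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")  # typically needed for Ceph RGW
    .getOrCreate()
)

# Read a dataset straight from the object store and run a simple aggregation.
df = spark.read.parquet("s3a://genomics-data/variants/")
df.groupBy("chromosome").count().show()
```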
As I said, the network has been redesigned, and, as I just showed you, we have a working proof of concept with JupyterHub on OpenShift, with self-provisioning of the different flavours of notebooks. I should add that we also have OpenShift working with radanalytics for Spark, plus another extension with Dask for scheduling other compute processes.
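A minimal sketch of the Dask side of this, assuming the platform exposes a scheduler service that notebook users can connect to; the service address is hypothetical.

```python
# Minimal sketch: a notebook hands work to a Dask scheduler, which fans it
# out to worker pods. The scheduler address is a placeholder for whatever
# in-cluster service the platform actually exposes.
from dask.distributed import Client
import dask.array as da

client = Client("tcp://dask-scheduler.valeria.svc:8786")  # hypothetical service name

# A computation too large for the notebook pod itself: a ~16 GB random array.
x = da.random.random((200_000, 10_000), chunks=(10_000, 10_000))
result = x.mean(axis=0).compute()   # executed on the Dask workers
print(result[:5])
```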
Finally, we have complete single sign-on across all the solutions, which really makes things very much easier for everyone. As for what is next: we are working on the CKAN integration with Jupyter, that is, the ability, directly from Jupyter, to interact with CKAN and preload all your datasets. The datasets you have bookmarked, or your favourite datasets, will be imported directly into your Jupyter notebooks.
We are also working on some other CKAN extensions. You will be able to say, for example: here is a dataset that I find interesting and I want to run some pre-analysis on it. It should be very easy, with only one button, to spawn a notebook, have it connected to the right dataset, import the dataset, and then interact with it directly without doing almost anything, along the lines of the sketch below.
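A rough sketch of the kind of CKAN-to-notebook preloading described here, using CKAN's standard action API; the CKAN URL and dataset name are invented for illustration.

```python
# Minimal sketch: look up a dataset through CKAN's action API and pull its
# first resource into the notebook as a DataFrame. URL and dataset id are
# hypothetical examples.
import requests
import pandas as pd

CKAN_URL = "https://ckan.valeria.example.ca"   # hypothetical CKAN instance
DATASET = "river-temperature-2019"             # hypothetical dataset id

meta = requests.get(
    f"{CKAN_URL}/api/3/action/package_show",
    params={"id": DATASET},
).json()["result"]

resource_url = meta["resources"][0]["url"]     # first file attached to the dataset
df = pd.read_csv(resource_url)
print(df.head())
```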
We still have to work on the Globus integration.
Globus is the kind of system used to exchange data between different data centres. For example, Compute Canada has big facilities in the West, near Vancouver for instance, or in Montréal, so whenever we want to leverage that HPC power we have to move the data from point to point, which we can do with Globus at very high speed.
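For reference, a transfer of this kind can be expressed with the globus-sdk roughly as follows; the endpoint UUIDs, paths and token handling are placeholders, and a real deployment would use a proper Globus Auth flow rather than a raw token.

```python
# Minimal sketch: request a Globus transfer between two endpoints, for
# example from a campus endpoint to a Compute Canada site. All identifiers
# and paths are placeholders.
import globus_sdk

transfer_token = "..."  # would come from a Globus Auth flow in practice
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)

tdata = globus_sdk.TransferData(
    tc,
    source_endpoint="SOURCE-ENDPOINT-UUID",
    destination_endpoint="DEST-ENDPOINT-UUID",
    label="valeria-to-hpc",
    sync_level="checksum",
)
tdata.add_item("/project/raw/genomes/", "/scratch/genomes/", recursive=True)

task = tc.submit_transfer(tdata)
print("transfer task id:", task["task_id"])
```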
We also want to integrate Vault to store all the client secrets, because as we use S3 storage, those are usually not public resources.
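A minimal sketch of that Vault integration using the hvac client, assuming the S3 credentials live at a KV version 2 path; the address, token and secret path are placeholders.

```python
# Minimal sketch: fetch S3 client credentials from HashiCorp Vault so that
# secrets never live in the notebook image. All values are placeholders.
import hvac

vault = hvac.Client(url="https://vault.valeria.example.ca", token="s.xxxxxxxx")

secret = vault.secrets.kv.v2.read_secret_version(path="projects/river-temp/s3")
s3_creds = secret["data"]["data"]          # KV v2 nests the payload under data.data

access_key = s3_creds["access_key"]
secret_key = s3_creds["secret_key"]
```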
We are also working on Kubeflow, and we are developing a custom KubeSpawner to use inside OpenShift so that, as I said, we can gather different pieces of information and launch notebooks with different parameters, plus a few other things I do not remember right now; the general idea is sketched below.
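A rough sketch, in a jupyterhub_config.py style, of what such a custom KubeSpawner could look like; the subclass name, role names and resource limits are invented for illustration and are not the actual Valeria spawner.

```python
# Rough sketch of a custom KubeSpawner that adjusts notebook resources from
# information gathered at login time. Names and limits are hypothetical.
from kubespawner import KubeSpawner

class ValeriaSpawner(KubeSpawner):          # hypothetical subclass name
    async def start(self):
        # Claims forwarded by the authenticator (for example Keycloak roles)
        # are available through the user's auth_state before the pod exists.
        auth_state = (await self.user.get_auth_state()) or {}
        roles = auth_state.get("roles", [])

        if "gpu-researchers" in roles:      # hypothetical role name
            self.extra_resource_limits = {"nvidia.com/gpu": "1"}
            self.cpu_limit = 8
            self.mem_limit = "32G"
        else:
            self.cpu_limit = 2
            self.mem_limit = "4G"

        return await super().start()

c.JupyterHub.spawner_class = ValeriaSpawner
```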
But that is about it and, of course, everything we do, all those new technologies, integrations and so on, will be published open source for everyone.
B
This is amazing stuff, I promise you. This is not the Kubeflow meeting. I will say that the last point you had under what's next, particularly the part about integrating compute resources into Jupyter, is probably the most requested feature we have had in Kubeflow. I would love to get your thoughts on it.
A
Yes, I will come back to that at the end, but sure, definitely, those are things we are touching on; we want to test them and push them to their limits. This integration is very important because, you know, our researchers are not IT people, and they are not even data scientists yet. The mandate I have from the management of the university is clearly stated: we want our researchers to do research, not to lose time setting up infrastructure, setting up libraries and finding the right versions that work. That is a loss of time for us.
B
Being able to just describe the number of CPUs, the number of GPUs and your container image, and to create an entire new cluster for a job to run in: it spins it up, creates it, runs it, stores the data and then shuts down. And, you know, the bridge between that and doing it from JupyterHub should be very, very small, so I would love to get your thinking on how we help data scientists execute that very step.
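To make that lifecycle concrete, here is a minimal sketch of a describe-and-run batch job using the plain Kubernetes Python client; the image name, namespace and resource figures are placeholders, and this is only a stand-in for whatever Kubeflow or the platform would actually orchestrate.

```python
# Minimal sketch: describe CPUs, GPUs and a container image, submit it as a
# Kubernetes Job, and let the cluster clean it up after it finishes.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="demo-train-job"),
    spec=client.V1JobSpec(
        ttl_seconds_after_finished=300,     # auto-cleanup once the job is done
        backoff_limit=0,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="train",
                        image="registry.example.org/genomics:latest",  # placeholder image
                        command=["python", "run_analysis.py"],
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "4", "memory": "16Gi"},
                            limits={"cpu": "8", "memory": "32Gi", "nvidia.com/gpu": "1"},
                        ),
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="my-project", body=job)
```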
A
One thing that comes to mind, which might be of some interest to you: as we come from the HPC world, we have managed to integrate CVMFS inside the notebook images. That way we have direct access to precompiled modules for computation and the like, without having to bake them into the image. So we keep the images as small as possible while still having access to a huge number of different scientific libraries, without having to do anything.
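A small sketch of what that buys a notebook pod: if the CVMFS repository is mounted into the container, the software tree is simply visible on the filesystem, and the image itself stays small. The repository path below is the one commonly used for the Compute Canada software stack; the site-packages path added to sys.path is a hypothetical example.

```python
# Minimal sketch: check that the CVMFS software stack is visible inside the
# notebook pod. In practice software is activated through Lmod modules; for a
# pure-Python tree one could also point sys.path at it directly.
import os
import sys

CVMFS_ROOT = "/cvmfs/soft.computecanada.ca"

if os.path.isdir(CVMFS_ROOT):
    print("CVMFS software stack is visible inside the pod")
    sys.path.insert(0, os.path.join(CVMFS_ROOT, "custom/python/site-packages"))  # hypothetical path
else:
    print("CVMFS is not mounted in this container")
```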