From YouTube: RustConf 2020 - Under a Microscope: Exploring Fast and Safe Rust for Biology by Samuel Lim
Description
Under a Microscope: Exploring Fast and Safe Rust for Biology by Samuel Lim
Ever wondered what goes on behind the scenes of breakthroughs in understanding proteins, viruses, our own bodies, and more?
Take a deep dive as we journey through some of the workings of computational biology at large, along with its advantages and pitfalls. In this talk, we will see how Rust bridges the biological sciences with safe, performant, and scalable systems, and discuss how you can play a role even as a fresh Rustacean.
A
RNA can help us investigate the differences between individual cells, as in single-cell RNA analysis, or in groups of cells or communities, as we would see in bulk RNA. We can look at specific expressions of genes or sets of genes, interrogate them by themselves, and look at how their sequences compare.
A
And as a note, as many of you have probably been affected by COVID-19: COVID-19 is an RNA virus, which means that the virus's entire genetic sequence is contained in the capsule, and its information format is RNA. So we've looked a little bit at how RNA comes to be, why it's important, and how sequencing can play into that a little bit.
A
It comes to computing, where we actually need to process the information that we've gathered. RNA-seq processing is how we can quantify, compute, and analyze the data that we've taken after we've left the wet lab, and after we've done our isolation of different samples, of different reads and different fragments.
A
Everything, from our information to more information and the inferences we want to gather, can go digital. And the Rust, and the applications in Rust: coming soon, I promise. From this basic understanding of the mechanisms of RNA and RNA-seq, there's a simple methodology that we can take, and that is: we read the information from the files and the experimental samples that we've taken and turn them into data streams that we can manipulate.
A
We map and align these data streams to reference data where they're applicable, so we can take the information that we have, position it, compare it, see the similarities and the differences, and categorize it. And we can finally analyze the output, depending on whether we want to quantify the categories and the expressions of different genes or different sequences, and we can send the results we have for further processing in other pipelines or other programs.
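The methodology just described (read, map and align, analyze, export) can be sketched in a few lines of Rust. This is a minimal illustration, not code from the talk: the `Reference` type, the `map_read` and `quantify` functions, and the gene names and sequences are all hypothetical, and "mapping" is reduced to an exact substring search against each reference.

```rust
use std::collections::BTreeMap;

/// Hypothetical reference entry: a gene name and its sequence.
struct Reference<'a> {
    name: &'a str,
    sequence: &'a str,
}

/// Map a read to the first reference that contains it exactly, returning
/// the reference name and the 0-based position of the match.
fn map_read<'a>(read: &str, references: &[Reference<'a>]) -> Option<(&'a str, usize)> {
    references
        .iter()
        .find_map(|r| r.sequence.find(read).map(|pos| (r.name, pos)))
}

/// Count how many reads mapped to each reference (a crude expression count).
fn quantify<'a>(reads: &[&str], references: &[Reference<'a>]) -> BTreeMap<&'a str, usize> {
    let mut counts = BTreeMap::new();
    for read in reads {
        if let Some((name, _pos)) = map_read(read, references) {
            *counts.entry(name).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    // Read: in a real pipeline these would be streamed from files.
    let references = [
        Reference { name: "geneA", sequence: "ACGTACGTGGCA" },
        Reference { name: "geneB", sequence: "TTGACCATGGAT" },
    ];
    let reads = ["ACGT", "GGCA", "CCAT", "AAAA"];

    // Export: tab-separated counts that another program could consume.
    for (name, count) in quantify(&reads, &references) {
        println!("{name}\t{count}");
    }
}
```

Real aligners tolerate mismatches and gaps and use indexes rather than a linear scan; the sketch only shows how the stages hand data to one another.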
A
Now, RNA-seq tools are a broad spread. They can be focused on many different analyses, or on different methods to achieve an analysis. To name a few, some of them may be concerned with the quantification, the categorization, and the analysis of expression of different genes and different RNA sequences within our data stream. Each has its own uses and advantages, but most are largely disjoint in terms of their programmatic tooling.
A
This is one example of a direct translation, where we take not only basic configuration values like verbosity, and we have flags for that, but we can also take subcommands and other options. And if something is not relevant to the functionality that we want to define right now, in the abstraction we want to define right now, we can skip it, thanks to crates like structopt and pico-args.
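To make the idea of translating a command-line interface concrete, here is a small sketch that uses only the standard library rather than structopt or pico-args, so that it stays self-contained. The `Config` struct, the `parse_args` function, and the `--threads` option are hypothetical and are not taken from the tool shown in the talk. It handles a repeatable verbosity flag, one subcommand, one valued option, and silently skips options it does not model yet.

```rust
/// Hypothetical configuration for an RNA-seq command-line tool.
#[derive(Debug, Default, PartialEq)]
struct Config {
    verbosity: u8,
    subcommand: Option<String>,
    threads: Option<usize>,
}

/// Parse arguments (without the program name) into a `Config`.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Config, String> {
    let mut config = Config::default();
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        match arg.as_str() {
            // Each -v raises the verbosity by one level.
            "-v" | "--verbose" => config.verbosity = config.verbosity.saturating_add(1),
            "--threads" => {
                let value = args.next().ok_or("--threads needs a value")?;
                let parsed = value
                    .parse()
                    .map_err(|e| format!("bad --threads value: {e}"))?;
                config.threads = Some(parsed);
            }
            // Options this abstraction does not model yet are skipped.
            // (A skipped option that takes a value would need extra care.)
            other if other.starts_with('-') => continue,
            // The first bare word is treated as the subcommand.
            other => {
                if config.subcommand.is_none() {
                    config.subcommand = Some(other.to_string());
                }
            }
        }
    }
    Ok(config)
}

fn main() {
    match parse_args(std::env::args().skip(1)) {
        Ok(config) => println!("{config:?}"),
        Err(message) => eprintln!("error: {message}"),
    }
}
```

A derive-based crate replaces the hand-written `match` with annotations on the struct, which is what makes a field-by-field translation of an existing tool's options practical.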
A
And it's continuing to grow over time. When data can not only grow over time, but can grow orders of magnitude in size just from the process of a single step in the pipeline, performance does matter. So, in a way, we can actually think about sequencing and the general process of analysis in three distinct steps: we read the information and parse the data; we map and align, and we parallelize operations; and we analyze and export the data that we need for further analysis.
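As a rough illustration of the "parallelize operations" step, the sketch below splits a batch of reads across scoped threads from the standard library and combines the partial results. The per-read work (`gc_count`, counting G and C bases) is only a stand-in for real mapping work, and both function names are hypothetical.

```rust
use std::thread;

/// Per-read work: count the G and C bases in one read.
fn gc_count(read: &str) -> usize {
    read.bytes().filter(|&b| b == b'G' || b == b'C').count()
}

/// Split `reads` into roughly equal chunks, process each chunk on its own
/// scoped thread, and sum the partial results.
fn parallel_gc_count(reads: &[&str], workers: usize) -> usize {
    let workers = workers.max(1);
    // Ceiling division; `chunks` panics on a size of 0, so keep it at least 1.
    let chunk_size = ((reads.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = reads
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|read| gc_count(read)).sum::<usize>())
            })
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<usize>()
    })
}

fn main() {
    let reads = ["GGCC", "ATAT", "GCAT"];
    println!("total GC bases: {}", parallel_gc_count(&reads, 2));
}
```

Scoped threads can borrow `reads` directly, so no copying or reference counting is needed, and the compiler checks that no worker outlives the data it reads.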
A
If sequence data were the simplest we could possibly conceive, we would have a continuous stream of fragments of bases, joined together continuously. Realistically, we require more than just a continuous stream: we require more information, and we require structure around it. Now, how does that structure look? One example would be the FASTQ format, where we take in not only the sequence information, which is crucial to our analysis, but also the identifier, which is the identification of what sequence we're looking at, and the quality scores.
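The structure of a FASTQ record (identifier, sequence, quality scores) maps naturally onto a Rust struct. The parser below is a minimal sketch that assumes the common four-line layout: an `@` header, one sequence line, a `+` separator, and one quality line. The `FastqRecord` type and the `read_fastq` function are hypothetical names, and multi-line records are not handled.

```rust
use std::io::{self, BufRead};

/// One FASTQ record: identifier, bases, and per-base quality characters.
#[derive(Debug, PartialEq)]
struct FastqRecord {
    id: String,
    sequence: String,
    quality: String,
}

fn invalid(message: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, message)
}

/// Fetch the next line, treating end of input as a truncated record.
fn next_line<R: BufRead>(lines: &mut io::Lines<R>) -> io::Result<String> {
    lines
        .next()
        .unwrap_or_else(|| Err(invalid("truncated FASTQ record")))
}

/// Parse four-line FASTQ records from any buffered reader.
fn read_fastq<R: BufRead>(reader: R) -> io::Result<Vec<FastqRecord>> {
    let mut lines = reader.lines();
    let mut records = Vec::new();
    while let Some(header) = lines.next() {
        let header = header?;
        if header.is_empty() {
            continue; // tolerate blank lines between records
        }
        let id = header
            .strip_prefix('@')
            .ok_or_else(|| invalid("header must start with '@'"))?
            .to_string();
        let sequence = next_line(&mut lines)?;
        let separator = next_line(&mut lines)?;
        if !separator.starts_with('+') {
            return Err(invalid("expected '+' separator line"));
        }
        let quality = next_line(&mut lines)?;
        // Each base must have exactly one quality character.
        if quality.len() != sequence.len() {
            return Err(invalid("quality and sequence lengths differ"));
        }
        records.push(FastqRecord { id, sequence, quality });
    }
    Ok(records)
}

fn main() -> io::Result<()> {
    let input = "@read1\nACGT\n+\nIIII\n@read2\nGG\n+\n!!\n";
    for record in read_fastq(input.as_bytes())? {
        println!("{} has {} bases", record.id, record.sequence.len());
    }
    Ok(())
}
```

Collecting into a `Vec` keeps the sketch short; for large files an iterator that yields one record at a time would avoid holding everything in memory.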
A
Parsing in Rust is not just a general feat. We actually do need some features specific to biological file formats sometimes, and we can actually measure this. Thanks to Professor Heng Li at Harvard, we have been able to quantify some of these basic benchmarks for common analyses and parsing.
A
And once we've parsed all these files, we need to do basic processing on them, which includes mapping and alignment, the basis of most bioinformatics pipelines. And not all mapping and alignment is created equal: some tools are better at expression analysis, and some are better at quantifying different parts of RNA.
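One way to see why alignment approaches differ is to look at the simplest one that tolerates errors. The sketch below slides a read along a reference without gaps and keeps the placement with the fewest mismatches (Hamming distance). It is an illustration only, with hypothetical function names; production aligners use indexes and gap-aware scoring and are not implemented this way.

```rust
/// Number of mismatching positions between two equal-length byte slices.
fn hamming(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b).filter(|(x, y)| x != y).count()
}

/// Slide `read` along `reference` and return `(offset, mismatches)` for the
/// best ungapped placement. Ties go to the leftmost offset.
fn best_ungapped_alignment(read: &str, reference: &str) -> Option<(usize, usize)> {
    let (read, reference) = (read.as_bytes(), reference.as_bytes());
    if read.is_empty() || read.len() > reference.len() {
        return None;
    }
    reference
        .windows(read.len())
        .enumerate()
        .map(|(offset, window)| (offset, hamming(read, window)))
        .min_by_key(|&(_, mismatches)| mismatches)
}

fn main() {
    let reference = "ACGTACGTGGCA";
    for read in ["GGCA", "ACGA"] {
        match best_ungapped_alignment(read, reference) {
            Some((offset, mismatches)) => {
                println!("{read}: offset {offset}, {mismatches} mismatch(es)")
            }
            None => println!("{read}: cannot be placed"),
        }
    }
}
```

The scan costs time proportional to the read length times the reference length, which is one reason real tools invest so heavily in indexing and parallelism.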
A
So the commonality between these tools is that parallelism and efficiency are actually no longer optional in RNA-seq processing; it's an assumption of the field. And so, in some of the rewrites of these tools, we had to defer expertise to designers of Rust systems and the community more at large. So the commonality between these different tools is actually that parallelism and efficiency with our time and our memory are no longer optional. With RNA-seq processing, most computers...
A
This includes Cargo, where we have an actual build tool, similar to pip, similar to Snakemake and CMake, all brought together and cohesive, in the sense that you can test, that you can make, that you can build, that you can run, that you can compile all these different things, all these different tools, and all these different crates together.
A
And we have a crates ecosystem where, if we know the Rust code compiles, we know that it will compile everywhere that Rust is. In that sense, we can continue to build upon different crates, different tools, and different libraries, based on the assumption that we know it works abroad and across. And when something is not available in this ecosystem, and when something is so domain-specific that we really need a tool from somewhere else...
A
The biggest asset of the Rust programming language going forward may not just be the language itself, but also its community, and the community mentorship model is what biologists can continue to take and learn from Rust beyond the language, even as they go further. Thank you for joining this talk. I hope you enjoyed it.