From YouTube: DevoWorm (2020, Meeting 35): biological problems in software, Hacktoberfest, Multivariate Analysis
Description
Discussion on working with and encoding biological problems in software, Hacktoberfest updates, and a presentation on PCA, tSNE, and UMAP.
B
What's that? How are you? Oh, okay. I haven't done a lot directly for DevoWorm stuff. Have you submitted the abstract yet for the DevoWorm stuff?
A
Yeah, I did, two for Neuromatch, correct.
B
Okay, because we can talk elsewhere about the other one for the other group. But yeah, I haven't done a lot of DevoLearn yet, but I will be doing more with them.
A
Well, that's okay. I think that's fine. I think the Neuromatch stuff is keeping you pretty busy, and then there's the rip stuff we're doing with the other group, so it's kind of taking all the time right now. But I think something good will come out of that, hopefully. That work is a little bit... I don't know if it's far afield, but it's definitely something they probably won't...
D
Hi, Dick. Hello, how are you? All right. Okay, I'll do some quick rearranging. Okay, there you go, one fell off. And I have a meeting right in the middle of your meeting.
D
Okay, all right. Oh, Susan can't make it today. But I hope you will schedule with her a repeat of her talk.
A
A
Yeah, that was a pretty good talk. It was on her optical imaging work. Oh, okay, good. And I got fairly good feedback. I mean, we don't have any other experts in the group, so... Oh yeah, last week I...
A
Good, yeah. It gave her some practice in giving the talk. It was pretty interesting, I think, to other people. Okay, good. Well, welcome to the meeting. Sorry I have to keep shifting the time, but I had a prior engagement that was at one time and then shifted to another time. I have to make that meeting, so I have to move this one. So I hope people are watching, or if you're not at the meeting, you're watching this later.
A
So that being said, we're going to talk about Hacktoberfest, which is the event going on on GitHub where, if you make a certain number of commits to a repository that's participating, you get a free t-shirt from Google, and that's enough incentive for some people to make commits to things. There's a whole marketing campaign around it, and we'll talk about that in a bit.
A
A
And then, now, you emailed me this week about that. You had a problem, and that problem was that you have a lot of ideas, but you don't know how to implement them in software or something like that. I know how, I'm just extremely small.
D
The problem is that learning a language without anybody around to help is very difficult, because all languages, all computer languages, have folk knowledge which is not transmitted with the language itself.
A
Yeah, see, that's a problem. I mean, there are people who go online and chat about different problems. Usually, if you're in a class, you might have homework and you solve the problem.
D
D
So there's a lot of assumed knowledge, right? I mean, I used to speak Fortran IV, but that's ridiculous. It's now quite a dead language, and the subsequent Fortran is pretty opaque too, the one that they have now.
D
Yeah, there's a new version of Fortran, but for a Fortran IV programmer, it's a lot to learn.
D
So anyway, I've tried Mathematica, and so far my code looks like Fortran IV.
D
A
A
Yeah, definitely. I mean, I've done a lot of stuff in MATLAB, and I've done a lot of stuff in Python, which is a newer language. So that's what I'm familiar with, and of course now Python seems to be the open source standard. When we go to Google Summer of Code, for example, the standard is usually Python, and it's not that hard of a language to learn, but it's a bit hard to...
A
...really get into it, like, how do you do X? You have to put together the pieces: you figure out, okay, this requires a loop, or this requires some module, and you have to load it in and get it running. So it's a challenging thing, but fortunately there are a lot of tools out there that are available for people to make their work easier.
A
So I think Jesse was part of Neuromatch this summer. Neuromatch is a summer school, and they're running it virtually, and they're trying to get people to run simulations in, I guess... So Jesse, do you want to relay your experience on that?
B
Oh, I mean, they had...
B
A
B
We did basically everything in Google Colabs, and I really liked that they had very good videos. I liked the tutorials, but there was a little bit of ping-pong back and forth when you're trying to learn or write code within the Colabs.
B
That was probably the strangest part for me. But overall, I liked what they were trying to do. I'm a bit delayed with everything I have right now, but I'm still working on my own, slowly going over the material again. But I really liked that they had a very good...
B
B
It was hard with the fact that there was just so much content, and it was very difficult to get to the level of depth in the amount of time. You know, we had one day on reinforcement learning. It was extremely oversaturated in terms of material, but it's a reference to go back to and look at again, and I think it's definitely setting the bar for that kind of learning environment in the future.
A
Yeah, I think that's good. So let's back up. You mentioned Google Colab, so they actually have notebooks now that you can execute code in. I think we've talked about the Jupyter notebooks in the group previously, where you have these little kernels, which are these little windows, and you put a piece of code in and you can run it in real time.
A
So you install this editor on your machine, for Colab or for Jupyter, and it gives you this composition notebook, and in each part of the notebook there's a kernel, or a window, where you can enter some code and run it in real time, so you can test it to see if it works. We've used the notebooks in Google Summer of Code for a lot of things. They're very convenient and, like Jesse said, they use them in lesson plans for scientific simulation.
A
So in this case they were looking at neuroscience simulations or other things, and it allows you to do everything you would do in something like Python, but it's very interactive. So Colab is one example, Jupyter is another example, and those actually run not just with Python but with other languages as well. And then in our other group, for Google Summer of Code...
A
...actually, we've done programming in other languages. So Python is a standard, but it's not... I think we had one project in something called Kotlin, which is like an offshoot of Python, and then another one in Julia, which is actually a scientific simulation language that is up and coming. So you can see that there's this proliferation of languages, even now, just within Python and Python-related stuff.
A
So it's always challenging to get things moving and keep on top of things, but I think the notebooks are a good way to approach the problem. I think having one or two standard languages in the research group is another good way to do it. In terms of Dick's problem...
A
...I mean, we still don't really address how you make this easier for a single researcher to go in and say, "I want to use Python for this problem. How do I put it together?" Although, I think Jesse mentioned, in Neuromatch they have these tutorials, so they work very hard to create tutorials on different exercises in Python. So you might have an exercise on...
A
...how do you make a simulation of a cell, or just a cell body? That could be done as a tutorial that people could, you know, get...
A
A
Oh yes, the annual... so this last week we had the annual conference of OpenWorm. Every year they have a get-together of everyone on the board and all the senior contributors, and they present on different things, and I presented on some of the stuff we've been doing in this group.
A
It was very interesting, because you only get to see the board members like once a year, and there are a lot of connections there with venture capital and other areas of business. I think Stephen Larson is the person who's in charge of coordinating a lot of the OpenWorm stuff, and he's trying to get a board together that really is going to push the foundation forward. So we talked about the different projects.
A
They were really interested in what's going on in DevoWorm, so I presented a number of the things that were going on here, and they were pretty impressed. Overall, it was a pretty good meeting, and you get to see people you only meet in Slack every once in a while, and then you see them on video conference, because OpenWorm used to have general meetings where everyone would show up, but that doesn't happen anymore.
A
So, oh, I wanted to get back to the conversation about learning code and all that. So Krishna, you've been contributing to the DevoLearn data science tutorials.
A
E
E
Did you make it, Bradley? Which one? The issues, yeah. I created an issue regarding creating a label for Hacktoberfest.
A
E
E
A
Yeah, it's fine. Oh yeah, let's see, Dick says "Oktoberfest is..." Yeah. So Oktoberfest is a German holiday, like a month long, where they have beer and everything, and Hacktoberfest is a variant of that. We'll talk about that in a minute. So Krishna, you had some contributions to the data science demos. In the DevoLearn organization we've been putting together some demos for data science, so there we have some notebooks as well that show the demos.
E
It's not all about Python. I have also pushed SQL and Bash code, the Linux command line and SQL, because these are the parts that people often ignore, and SQL is the basis: you can say there can't be data science without databases. And now I'll be pushing R code. I was working on some genomic data analysis in R. Once I draft all the...
E
I'll be pushing a six- or seven-lesson structure for beginners on how to do gene analysis in the R language, because most of the time people only work in Python, and things like SQL and Bash and R are often ignored.
E
So I wanted it to be, you know, a cocktail of all the things.
A
Not just Python-focused. Oh yeah, that's what we were just talking about. Dick learned Pascal at one time and did a lot of stuff in it, and then eventually the language kind of evolved away from where he was. And that just happens. I think with Python...
A
...it's happening to some extent, where it's changing a lot, so you always have to keep on top of it. But I think it's good that we have tutorials for that, and so that's another place to go. I mean, it's hard to do. Yeah, it's really hard. What was that?
A
B
So I was looking at it again. I answered it.
B
A
B
I'm wondering, is there a specific divide, like...
B
...demos specifically for data science?
C
B
I worked on something in the past, the learner imaging thing, and I'm not sure if that'll be more learning or...
A
It might be an education thing, I guess. We started the repos. Let me share my screen and show that, since I've been mentioning this a couple times now. So here's DevoLearn, and this is data science demos, and this is education. In education, I think we have nothing yet. Oh, we have the Journal of Open Source Science paper, which is still going. We're trying to get it done.
A
I've had some things come up in the interim, but this is where we would put this. So this is scientific education content. This Journal of Open Source Science paper is on the DevoLearn platform, a paper that explains it and gives you further references for what's going on in that. So I think what you're talking about would probably fit into this repo, and so it would be...
B
A
Yeah, it would be, because the data science demos, I think, are largely going to be quick tutorials on different topics. If I want to know how to manipulate some data or maybe run some Python code, this is a place... okay, Dick... a place for you to post those.
A
I mean, that's what I was envisioning. These repositories always have some sort of drift where they get too far afield and have to be reorganized, but I think that's a good way to do it: if it's data science, like how do I manipulate data or analyze data, something specific to that, that's where it would go.
B
B
E
I created the resources section mostly for, for example, important GitHub repos of other people, or if, for example, freeCodeCamp has a great video, you can share that video in the resources section. The resources section is basically for things that we don't own.
A
A
This is for things that might be background reading, and we want to be careful with attribution there. And then this is networks. This is stuff that I think Mayukh committed from some of his work on networks from this summer. There's more to put up there, but I haven't put it up yet. Tutorials...
A
These are the straightforward code tutorials. So this one here, command line basics .ipynb: this .ipynb is a notebook, and they don't render very well in GitHub. But if you download it and have an editor, you'll see that it's basically an editor with some windows, and you can put code in and run it.
A
This is a different format for a tutorial. This is Krishna's tutorial on SQL, which is Structured Query Language, and this is actually a nice way to teach code, because you have this description of what it is, and I think it's pretty accessible. Then you ask, "What can it do?" and you have a bunch of things that it can do.
A
A
It's a little bit vague in terms of where you bring the data from, but it's a good way to show what it can do, just as a specific set of examples: inserting new values, updating values, deleting records. And putting memes in is kind of a way to soften the blow of learning a pretty dry subject. And then you have references down here. So I think that's a good start for demos. I think there's a lot more to do with demos.
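As a hedged illustration of the operations named above (inserting new values, updating values, deleting records), here is a minimal sketch using Python's built-in `sqlite3` module. The table name `embryos`, its columns, and the sample rows are hypothetical and are not taken from Krishna's tutorial.

```python
import sqlite3


def run_demo():
    """Create a tiny in-memory table, then insert, update, and delete rows."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk
    cur = conn.cursor()

    # Hypothetical table: one row per observed embryo.
    cur.execute(
        "CREATE TABLE embryos ("
        "id INTEGER PRIMARY KEY, species TEXT, cell_count INTEGER)"
    )

    # Inserting new values.
    cur.executemany(
        "INSERT INTO embryos (species, cell_count) VALUES (?, ?)",
        [("C. elegans", 4), ("C. elegans", 8), ("zebrafish", 16)],
    )

    # Updating a value.
    cur.execute("UPDATE embryos SET cell_count = ? WHERE id = ?", (28, 2))

    # Deleting records.
    cur.execute("DELETE FROM embryos WHERE species = ?", ("zebrafish",))

    conn.commit()
    rows = cur.execute(
        "SELECT id, species, cell_count FROM embryos ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

Using an in-memory database sidesteps the question of where the data comes from, which keeps the example self-contained.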
A
Dick says, "My problem was that 40 years ago I left all programming to my students." That's kind of what we've been doing with the machine learning stuff here in DevoWorm: I've been getting students who are really good at machine learning and recruiting them to work on some of this stuff for DevoLearn. I don't know if I really understand all the machine learning, so I'd probably run into the same problem there, but there's...
A
Maybe we'll talk about some specific problems that Dick wants to solve, and maybe we can post them in a way that recruits people to work on them, or something like that. I don't know if that's something you'd be comfortable with, Dick.
D
Yeah, sure. I sent four problems. I guess the first two, some people are responding to them.
D
And now, what we know about fractals and biology is that at some point you get down to the molecular or atomic level, and it can't be fractal anymore, right? So the problem here is to take a bottom-up approach and try to simulate the motion of molecules that we think might have something to do with the motion of the diatom, and see if that motion is jerky, and then what parameters would allow us to match the model to observations.
D
Yeah, okay. So one hypothesis is that the diatom actually moves like a rocket ship. That was proposed... Oh god...
C
D
D
So that... so...
A
A
B
D
I generally default to what are called lattice gas models, which basically involve discretization of space and of events. There may be other approaches, continuum approaches, etc. But in any case, that's the basic problem: can we make a model that imitates the actual observations of the jerky motion of diatoms?
D
D
And there are things you can do, like try to make predictions: how would the diatom move against a force, things like that. So you could also try to do simulations of experiments that are plausible in the lab, and there's a classical experiment, for example, going back to, I think, maybe the 1960s, done by Margaret Harper.
D
D
The one alternative is that the diatom trail, the gooey stuff that they leave behind, gets stretched and snaps, and as it snaps, it's like holding on to a rubber band and letting it go: things move suddenly, very fast.
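To make the rubber-band picture concrete, here is a deliberately simple toy sketch of stick-slip motion in one dimension. It is not the lattice gas model mentioned earlier and is not fitted to any diatom data. Every name and parameter value (`force`, `stiffness`, `drag`, the snap threshold range) is an assumption chosen only for illustration. A constant motor force pushes the cell forward while a spring-like trail holds it back. When the trail is stretched past a randomly drawn threshold it snaps, a new trail attaches at the current position, and the cell briefly speeds up.

```python
import random


def simulate_stick_slip(steps=2000, dt=0.01, force=1.0, stiffness=4.0,
                        drag=1.0, snap_min=0.05, snap_max=0.2, seed=0):
    """Toy overdamped model: dx/dt = (force - stiffness * stretch) / drag.

    snap_max must stay below force / stiffness, otherwise the trail could
    balance the motor force and the cell would stall instead of snapping.
    """
    rng = random.Random(seed)
    x = 0.0                      # cell position
    anchor = 0.0                 # where the current trail is attached
    snap_at = rng.uniform(snap_min, snap_max)
    positions, speeds, snaps = [], [], 0

    for _ in range(steps):
        stretch = x - anchor
        v = (force - stiffness * stretch) / drag   # slows as the trail stretches
        x += v * dt
        if x - anchor >= snap_at:                  # trail snaps, then re-attaches
            anchor = x
            snap_at = rng.uniform(snap_min, snap_max)
            snaps += 1
        positions.append(x)
        speeds.append(v)
    return positions, speeds, snaps


if __name__ == "__main__":
    positions, speeds, snaps = simulate_stick_slip()
    print("snaps:", snaps)
    print("slowest speed:", round(min(speeds), 3), "fastest:", round(max(speeds), 3))
```

In a sketch like this, "jerkiness" shows up as the gap between the slowest and fastest speeds. Matching a model to observations would mean tuning such parameters against measured trajectories, which is beyond what this toy does.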
C
D
A
A
A
Great. And then there's also the question of how you evaluate across models. I think that would be something sort of beyond the immediate problem, but... well...
D
And he's doing some very small colonies, just a few cells, and whether or not that motion is smooth or jerky could be determined.
E
Sure, yeah, I'm not from...
E
Yes, of course. So is there any change in their structure or something?
D
E
D
Yeah, so you don't expect any significant change of shape when they collide. There are experiments where they collide with the beam of...
A
So there's this thing called Hacktoberfest. Let's see, it's right here, I have the image. Oh, let me share my screen, I didn't do that yet. So there's this event called Hacktoberfest. Like I said, it runs through the month of October and GitHub hosts it. It's a play on Oktoberfest, of course, which is a German holiday where they celebrate the entire month, and so this is Hacktoberfest.
A
This is for DevoLearn, or, you know, for all of DevoWorm, but we're focusing on DevoLearn for this event. The idea is that, throughout the month of October, people can go to the GitHub repository and make commits to the repo. It could be anything that they want to do: add in some text, add in a tutorial, make changes to some code. And then, if they get like five or more commits, they get something free from GitHub.
A
It's a t-shirt usually, and it has the Hacktoberfest name and GitHub on it, and it motivates people, I guess, to do some commits. So, let's see, this is DevoLearn, and this is where the action is occurring for Hacktoberfest. It started on the first, and we've already had, I think, about 12 people contribute to it, mainly in DevoLearn, here in this repo. So they were...
A
I think this was driven by Mayukh and Ujjwal, where they talked to people they knew at their school and they started to make commits to the repo. So we have a bunch of commits that are... I don't know if you can see it... under Actions? No, Pull requests. We've had probably about 12, or maybe 10, pull requests in the last couple of days. It started on the first and it's going to continue through the month, and we've already had a pretty good amount of interest in it.
A
So one of the things you can do, if you want to participate in Hacktoberfest, is go to the issues in any of the DevoLearn repositories. In this case, DevoLearn has a good number of issues to look at, and you look at these issues that have been generated. I think Mayukh is mostly generating these in the devolearn part of DevoLearn, but I'm not sure if Ujjwal has done the same for C. elegans DevoLearn. There's one issue open here.
A
This is "add contact info". I don't know, it's a question. But basically people will go through these issues, they'll pick an issue, and then they'll address it. What you do is make a fork of this repository, which is basically just your own version. You would go to Fork and it would create it.
A
I've been involved, and Mayukh's been involved in this, where we're all reviewing these changes, basically just to make sure that they're not malicious or that they're not junk, which can happen. And then a lot of times there will be errors that they make, or things they need to add, that you can suggest they fix before their pull request is accepted, and that happened, I think, twice in this batch of pull requests.
A
So it's a nice process for managing version control. We have files that we are constantly making version changes to, and GitHub is designed so that you can make those changes without overwriting things while being able to track all the changes, and it's very effective.
A
That's why we do a lot of things on GitHub. So that's basically how a pull request works. If you want to contribute to this, you're welcome to do so.
A
This is just something that Google puts on, or GitHub puts on, as a way to encourage participation in open source. So hopefully we get a lot more of these. There's also, I think, an issue we had to solve about verifying the repo for participation in Hacktoberfest, but I think that's been solved now.
A
A
A couple of people have made two pull requests already, and they're just little changes, I think. And then I'm making a list of further interactions. So Krishna's on here. He made a pull request.
A
I think it was before Hacktoberfest, but I put him on the list, and I put melvin m on the list because she contributed something to the tutorials. I'm just making a list of people who are contributing to DevoLearn and some of the other repositories in DevoLearn, and following up to see what they're doing.
A
And so that's a good census tool, and I think it'll drive interest in the group as a whole. Hopefully some of these people will join Slack, or maybe we can have some discussion on GitHub about things, if that's where they're living, if that's where they're checking in. Maybe we can get more people involved in some of these problems that we're posing in the meetings.
A
But it's a good thing to know, and I think it's a good thing to get people involved in this. So that's all I have to say about Hacktoberfest.
A
A
If you don't really have facility with GitHub, and some people don't, especially if they're coming from the biological side, because it is more of a computer-science-centric tool, then you can contact people or join the Slack channel, and that's another way into the community.
A
I don't want to exclude people who don't have much to say in terms of GitHub commits, because that is, again, maybe very code-centric.
A
So if we have people who want to contribute in other ways, it wouldn't really be part of the Hacktoberfest system, but we'd still be interested in hearing what you have to say. So that's a good overview, I think, of that.
B
A
B
C
A
Oh, I'm not sure either. I think some of the organizations might have events surrounding Hacktoberfest, like maybe get-togethers or something. It's just a way to encourage people to participate, so they might have meetings, maybe open events where people can come and interact at a little bit higher level than GitHub commits. I'm not really sure. You'd have to show me an example, though.
B
I don't know if that was something that other people... that we're trying to do or not.
A
A
We might have a meeting where we discuss their contributions and then maybe things that they might follow up on, and it might be more involved projects, things that aren't just isolated commits, so that maybe they can get a little bit more exposure to DevoLearn or DevoWorm. That would be good, but we'll talk about that later.
A
So the last thing I wanted to talk about today is this presentation that I wanted to give, and if you have to leave at the top of the hour, that's fine, but I just wanted to go through this. This is something I've been trying to wrap my head around for a while now: how to understand multi-dimensional analysis in developmental biology. So what I have here are three different methods.
A
A
But then there are two other methods that have been more recently applied, mainly to molecular data, and these are t-SNE and UMAP. These are probably things you haven't heard of so much, but if you read any sort of molecular biology paper in development or in other areas of biology, you will encounter these methods, and so I wanted to demystify some of them. Again, I'll make these slides available afterwards, and it'll be recorded, so if you can't make the whole talk, you can go back and check it out.
A
Then there's this step where you're picking the eigenvectors with the highest eigenvalues, and then you're projecting the data points onto these.
A
You're taking the data, you're extracting information about their core variance, and then you're taking that model of the variance and mapping the data points back onto it, and then you're getting a map of those data and how they're distributed. So this is from a Hacker Noon tutorial, but there are books on this. I just wanted to give a couple of blog posts on this that maybe are more accessible.
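The steps just described (find the eigenvectors with the highest eigenvalues, then project the data onto them) can be sketched in plain Python. This is a minimal illustration, not the code from the tutorial being shown. It uses power iteration with deflation to find the top components, which is an assumption made here to keep the example dependency-free. In practice a library routine would normally be used. The sample data are made up.

```python
import math


def center(rows):
    """Subtract the column means so each variable has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iterations=500):
    """Power iteration: repeatedly multiply and renormalize a vector."""
    d = len(matrix)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * mv[i] for i in range(d))
    return eigenvalue, v


def pca(rows, k=2):
    """Return the top-k (eigenvalue, eigenvector) pairs and projected scores."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        value, vector = top_eigenpair(cov)
        components.append((value, vector))
        # Deflation: remove the component just found before looking for the next.
        cov = [[cov[i][j] - value * vector[i] * vector[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(r[j] * vec[j] for j in range(d)) for _, vec in components]
              for r in centered]
    return components, scores


if __name__ == "__main__":
    # Made-up measurements lying roughly along the line y = 2x.
    data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
    components, scores = pca(data, k=2)
    for value, vector in components:
        print("variance:", round(value, 4), "direction:", [round(x, 3) for x in vector])
```

The first column of `scores` is the position of each point along the main axis of variance. Plotting the first two columns against each other gives the usual two-component scatter.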
A
So this is an example of what people usually do. They'll take their data, analyze it using PCA, and then they'll get maybe the top two dimensions, or top two principal components, and plot them in a bivariate graph. It'll show the scatter of the data, but also these groups, and the clusters are basically organized along some axis of variance. It's not really clustering like a clustering algorithm.
A
So this is an example here of taking morphological data from a flower and creating a PCA biplot, which shows the top two axes of variance, and you can define different features in the data by looking at the different vectors. So this vector is sepal width, this is sepal length, this is petal width, and this is petal length. You can define these clusters, and then you can define it by species, and the PCA analysis gives you an idea...
A
It's sort of looking at the shared variance and the variance between groups. These species basically fall in separate clusters, so the setosa species is very different from the versicolor species, and virginica overlaps with the versicolor species. If you were to just kind of eyeball it in nature, you may or may not see that distinction, but this puts some numbers on it.
A
A
So PCA geometrically projects data onto a lower-dimensional space. You take all the dimensionality of your data set and you map it to a series of dimensions that are defined by the vectors I mentioned before, and they try to make them as orthogonal as possible so that you're getting some sort of independent set of variables, basically, that you can compare to one another.
A
So that's the first step. This is, of course, where you can help to identify clusters in the data. If you compare it with a hierarchical linkage analysis, or clustering analysis: this is a hierarchical linkage method, so it basically forms a tree with nested sets in it, and you can see that this matches up with this cluster analysis. But it's not an absolute one-to-one map. The PCA analysis will reveal features that the cluster analysis doesn't, and vice versa.
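To show what "a tree with nested sets" means in practice, here is a small sketch of agglomerative clustering with single linkage, written in plain Python. The slide does not say which linkage rule was used, so single linkage is an assumption, and the five sample points are made up.

```python
import math


def single_linkage(points):
    """Repeatedly merge the two closest clusters; return the merge history.

    Each entry is (sorted member indices of the new cluster, merge distance).
    Read top to bottom, the history describes a tree of nested sets.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] | clusters[b]
        merges.append((sorted(merged), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.2, 1.0), (5.0, 5.0), (5.1, 5.0), (9.0, 9.0)]
    for members, distance in single_linkage(pts):
        print(members, round(distance, 3))
```

Each merge produces a set that contains the earlier, smaller sets, which is the nesting a dendrogram draws.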
A
So it's not exactly like a clustering analysis type of clustering. And then there are some limitations of PCA. One of them is that it may miss non-linear patterns in the data. Another is that it misses non-orthogonal patterns in the data. So if the data is well organized and highly structured, it will pull out these clusters. But if it's not, if there are a lot of interactions in the data or there is unexplained variance, it's going to be harder to find a good...
A
A
That way, we think of it as a hardcore quantitative method, and that brings us, of course, to t-SNE, which is t-distributed stochastic neighbor embedding, and that's a lot of words for something that is really just a dimensionality reduction technique. Like I said, you're interpreting these sets of points, and you'll see with t-SNE that that's actually more true, even though this is a more advanced method than PCA. So this started off...
A
Geoff Hinton and company actually published a paper on this in 2008, "Visualizing Data using t-SNE", and this is a machine learning sort of approach, and so in that sense it's more advanced than PCA.
A
Like more simple dimensionality reduction techniques... there's a technique called multi-dimensional scaling, which it's also closely related to, and that's actually a very simple method. But this was the state of the art maybe 10 years ago. So for this t-SNE algorithm, there are two steps.
A
The first is to construct a probability distribution for high-dimensional objects. So you take what PCA does and you look at the high dimensionality of your data, but you construct a probability distribution for it, and in this probability distribution, objects that are similar in some way have a higher probability of occurring and dissimilar objects have a lower probability.
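A minimal sketch of this first step, under simplifying assumptions: for each point, distances to every other point are turned into probabilities with a Gaussian kernel, so that near neighbors get high probability and distant points get low probability. Real t-SNE tunes a separate bandwidth per point from a "perplexity" setting. Here a single fixed `sigma` is used for clarity, and the sample points are made up.

```python
import math


def conditional_probabilities(points, sigma=1.0):
    """Row i holds p(j | i): how likely point i is to pick j as its neighbor."""
    n = len(points)
    probs = []
    for i in range(n):
        weights = []
        for j in range(n):
            if i == j:
                weights.append(0.0)          # a point is never its own neighbor
            else:
                sq_dist = math.dist(points[i], points[j]) ** 2
                weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
        total = sum(weights)
        probs.append([w / total for w in weights])   # normalize row to sum to 1
    return probs


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
    for row in conditional_probabilities(pts):
        print([round(p, 4) for p in row])
```

For the first point, the neighbor at distance 0.5 receives almost all of the probability and the point at distance 3 receives very little, which is the "similar objects get higher probability" behavior described above.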
A
A
But then you also have this low-dimensional map that you want to map these data to, and you're also constructing a probability distribution on that. This probability distribution for the low-dimensional map is based on minimizing what they call the KL divergence for each point. KL divergence is the Kullback-Leibler divergence.
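For discrete distributions $P$ and $Q$, the Kullback-Leibler divergence is $\mathrm{KL}(P \,\|\, Q) = \sum_i p_i \log (p_i / q_i)$. The short sketch below computes it in plain Python. The two example distributions are made up, and the function assumes that `q` is nonzero wherever `p` is nonzero.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) for two discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing; q_i must be > 0 where p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]
    q = [0.4, 0.4, 0.2]
    print("KL(p || q) =", round(kl_divergence(p, q), 4))
    print("KL(q || p) =", round(kl_divergence(q, p), 4))
    print("KL(p || p) =", kl_divergence(p, p))
```

The divergence is zero when the two distributions match and positive otherwise, and it is not symmetric: swapping the arguments generally gives a different number.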
A
...clusters, but the cluster validity is a little tricky. It requires a lot of visualization and interpretation. So KL minimization is done using something called gradient descent, which is an optimization method that is very common in machine learning, but that still doesn't solve a lot of your problems of validating the clusters.
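As a hedged, much-reduced illustration of "minimizing KL divergence by gradient descent", the sketch below fits a three-outcome distribution `q` to a fixed target `p`. This is not the t-SNE update itself, which moves map points rather than probabilities directly. Here `q` is parameterized as a softmax of free values `z` (an assumption made for this example), in which case the gradient of KL(p || q) with respect to `z` is simply `q - p`.

```python
import math


def softmax(z):
    """Turn arbitrary real values into a probability distribution."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]   # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def fit_by_gradient_descent(p, learning_rate=0.5, steps=2000):
    """Minimize KL(p || softmax(z)) over z, starting from a uniform guess."""
    z = [0.0] * len(p)
    history = []
    for _ in range(steps):
        q = softmax(z)
        history.append(kl_divergence(p, q))
        gradient = [qi - pi for qi, pi in zip(q, p)]   # d KL / d z
        z = [zi - learning_rate * gi for zi, gi in zip(z, gradient)]
    return softmax(z), history


if __name__ == "__main__":
    target = [0.7, 0.2, 0.1]
    fitted, history = fit_by_gradient_descent(target)
    print("fitted:", [round(v, 4) for v in fitted])
    print("KL at start:", round(history[0], 4), "KL at end:", history[-1])
```

Each step nudges the parameters in the direction that lowers the divergence. As the transcript notes, driving this number down says nothing by itself about whether the resulting clusters are valid.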
A
So
this
is
like
the
mnist
data
set
using
t-sne
and
so
mnist
is
this
data
set
that
they
use
in
machine
learning
that
has
handwritten
numbers
from
zero
to
nine.
And
it's
like
different
handwriting.
You
know
different
people
doing
different
sort
of
swirls
and
things
with
their
handwriting,
so
the
fours
look
very
different
across
the
different
samples
and
the
idea
is
to
be
able
to
identify
all
these
as
a
four
in
this
row
of
fours.
So
all
the
variation
in
the
fours
should
be.
A
That's
defined
as
a
cluster
for
three,
so
it
actually
does
a
pretty
good
job
on
this
MNIST
data
set.
It
classifies
those
instances
correctly.
So
all
these
light
blue
instances
are
threes
that
are
then
put
in
the
category
of
three
in
this
map
and
the
threes
cluster
together.
Now
you
can
see
there
are
a
couple
of
reds
here
which
are
one
I
think.
A
Oh
no,
eight!
So
you
see
the
eights
are
being
misclassified
as
fives
and
as
threes
a
little
bit,
so
it's
a
little
messy,
but
the
idea
is
that
the
proper
classification
should
be
in
one
of
these
clusters,
and
you
can
see
it
looks
very
much
like
some
sort
of
postmodern
art
thing,
kind
of
like
a
Jackson
Pollock
almost,
and
this
is
the
problem
with
interpreting
this:
it
creates
a
very
pretty
image,
but
it's
hard
to
interpret,
and
so
the
basic
rule
of
t-SNE
is
that
points
within
clusters
are
meant
to
be
similar
in
some
way,
even
if
they're
misclassified,
but
it's
hard
to
say
whether
these
clusters
are
less
similar
than
those
clusters.
So
for
these
two
clusters,
the
one
at
the
top
here
and
the
one
over
here,
it's
hard
to
say
whether
these
are
less
similar
than
this
cluster.
A
So
this
is
a
meta
analysis
of
all
these
data
sets
and
what
they're
going
to
do
is
they're
going
to
map
it
to
a
t-SNE
model
and
see
if
they
can
identify
different
cell
types.
First
of
all,
they
tried
all
these
different
methods
on
the
data
and
they
got
some
performance
statistics
on
this.
One
data
set
is
Tasic
et
al.,
so
I
think
Tasic
et
al.
is
what
we're
going
to
focus
on
here.
A
This
is
adult
mouse
cortex
using
Smart-seq2,
so
you're
looking
for
a
bunch
of
classes
here,
and
you
have
this
thing,
perplexity
and
random
initialization,
so
perplexity
is
the
main
parameter
that
you're
looking
for
in
t-SNE,
and
it's
roughly
the
effective
number
of
neighbors
that
each
point
takes
into
account.
That's
the
key
parameter,
and
it's
a
way
of
algorithmically
sorting
the
data
out
and
trying
to
find
a
good
mapping
between
high
and
low
dimensional
space.
As
you
can
see,
t-SNE
produces
a
lot
of
clusters
here.
Whether
the
clusters
mean
anything
is
a
different
issue,
and
so
one
of
the
things
we
can
do
is
we
can
look
at
out-of-sample
mapping.
So
in
this
example,
we
have
two
data
sets.
A
We
have
a
reference
t-SNE
atlas
and
a
data
set
of
interneurons,
and
so
we're
able
to
map
those
interneurons
onto
this
reference
atlas
and
we're
able
to
do
it
fairly
successfully,
and
so
actually
t-SNE
has
a
number
of
parameters.
So
perplexity
is
the
measure
of
global
structure.
So
remember,
I
told
you
that
with
these
individual
clusters
you
have
a
very
good
measure
of
similarity
within
clusters,
but
then
between
clusters
we
really
don't
know
what's
going
on.
Well,
with
perplexity,
if
we
set
this
at
a
different
level,
we
can
refine
this
global
structure.
So
it's
like
a
top-down
way
of
saying
you
know
we
want
to
look
at
like
how
many
clusters
we
have
or
how
similar
the
clusters
will
be,
but
it's
not
really
a
fine
level
of
control.
It's
a
general
measure
of
global
structure,
so
it
turns
out
t-SNE
is
not
very
sensitive
in
this
respect
and
there's
another
method,
UMAP,
which
is
much
more
sensitive
and
we'll
talk
about
that
next.
Perplexity
is
the
main
free
parameter
in
t-SNE.
So
this
is
the
one
parameter
you
can
play
with
to
improve
your
results,
but
there
are
others
such
as
learning
rate
and
number
of
iterations,
which
you
can
also
play
with
to
improve
your
result.
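As a rough sketch of what the perplexity setting does: perplexity is 2 raised to the entropy (in bits) of a point's neighbor distribution, so it behaves like an effective number of neighbors. Implementations typically search for a kernel width per point that hits the requested perplexity. The code below assumes a Gaussian kernel and a plain bisection search; the function names, search bounds, and distances are illustrative.

```python
import math

def neighbor_probs(sq_dists, sigma):
    """Gaussian neighbor probabilities for one point, given squared distances."""
    d_min = min(sq_dists)  # shifting by the minimum avoids underflow for small sigma
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    entropy_bits = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy_bits

def sigma_for_perplexity(sq_dists, target, lo=1e-3, hi=1e3, iters=100):
    """Bisection search for the kernel width whose perplexity matches the target."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if perplexity(neighbor_probs(sq_dists, mid)) > target:
            hi = mid  # too many effective neighbors: narrow the kernel
        else:
            lo = mid  # too few effective neighbors: widen the kernel
    return (lo + hi) / 2.0

# Squared distances from one point to its seven neighbors (assumed toy values).
sq_dists = [0.5, 1.0, 2.0, 4.0, 9.0, 16.0, 25.0]
sigma = sigma_for_perplexity(sq_dists, target=3.0)
print(sigma, perplexity(neighbor_probs(sq_dists, sigma)))
```

A larger target perplexity widens the kernel so that each point spreads its probability over more neighbors, which is why this one number changes how much broader structure the map reflects.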
A
So
this
was
like
10
years
ago.
Now,
in
the
last
couple
years,
we've
come
up
with
UMAP,
Uniform
Manifold
Approximation
and
Projection
for
dimensionality
reduction.
That's
a
long
set
of
words,
but
basically
it's
an
improvement
on
t-SNE.
It
works
in
a
similar
way,
but
it's
better
apparently
and
so
they're
using
this
more
and
more
now
and
you'll
see
it
in
papers
and
I'll
show
you
at
the
end
that
there's
an
atlas
that
uses
it
that
involves
C.
elegans
developmental
data.
This
paper,
McInnes,
Healy,
and
Melville,
sort
of
lays
out
the
method.
A
So
this
pretty
picture
here
is
what
you
get
with
UMAP
and
if
you
compare
UMAP
and
t-SNE,
we
know
that
t-SNE
does
not
scale
well,
especially
for
stuff
like
single-cell
sequencing
analysis,
so
snRNA-seq
or
other
methods.
It
really
doesn't
do
a
very
good
job
of
scaling
well
from
small
to
large
data
sets
and
your
average
next
gen
data
set
is
very
large,
so
we
want
to
have
a
method
that's
robust
to
that
size
of
data.
Also,
t-SNE
performs
non-parametric
mappings
between
higher
and
lower
dimensions
and
does
not
rely
on
features,
and
so
PCA
has
something
called
loadings,
which
are
features
that
you
can
use,
but
t-SNE
doesn't
have
that
and
so
you
want
to
avoid
sparsity,
which
is
fragmented
clusters.
They
call
them
manifolds,
but
the
manifolds
just
have
like
unconnected
clusters,
that's
what
they
call
sparsity.
So
if
you
have
a
bunch
of
clusters
that
are
just
very
far
apart,
that's
also
bad
because
it's
sparse
it
doesn't
give
you
a
lot
of
information
about
like
you
know.
A
There's
this
lack
of
global
distance
preservation
in
t-SNE
and
so
to
solve
this,
UMAP
actually
uses
a
stochastic
gradient
descent
model,
so
they
use
this
model
of
gradient
descent,
but
they
use
it
in
a
way
that's
very
different
from
what
t-SNE
is
using,
which
is
they're
using
it
sort
of
as
a
way
to
optimize
KL
divergence
and
other
things.
B
A
So
you
know
if
we
use
like
a
normal
distribution,
we
may
miss
a
lot
of
the
outliers
or
a
lot
of
the
things
that
are
out
of
distribution
and
if
you
use
an
exponential
probability
distribution
in
the
high
dimensional
case,
you
pick
up
a
lot
of
that
variation
that
you're
going
to
miss
just
by
assuming
that
you're
just
trying
to
fit
them
into
normally
distributed
categories.
A
There's
a
lot
of
potential
for
interactions,
so
the
distance
metric
that's
used
in
UMAP
varies
across
the
manifold
or
space
and
the
nearest
neighbor
graphs
result
from
fuzzy
simplicial
sets,
which
is
a
form
of
topological
data
analysis
which
we've
talked
about
in
this
group
before,
but
it's
an
interesting
sort
of
wrinkle
to
this,
because
then
we're
bringing
in
a
new
method
and
it's
a
different
and
exciting
method
that
might
solve
some
of
the
problems
that
we've
had
with
t-SNE.
UMAP
also
uses
a
number
of
nearest
neighbors
instead
of
a
perplexity
measure.
B
A
Number
of
nearest
neighbors
as
a
proxy
or
as
a
high
level
parameter,
and
so
the
parameter
d
min
demonstrates
a
set
of
uniformly
connected
points.
This
is
sort
of
this
nearest
neighbors
approach.
Using
a
nearest
neighbors
approach
instead
of
something
like
perplexity,
leads
to
tightly
packed
clusters.
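The nearest neighbors idea can be illustrated with a brute-force search. This is only a sketch of the k-nearest-neighbor step, not UMAP itself, and the function name and toy points are assumptions. In the umap-learn Python package the corresponding settings are the `n_neighbors` and `min_dist` parameters, which may be what the parameter mentioned above refers to.

```python
import math

def k_nearest_neighbors(points, k):
    """For each point, return the indices of its k closest other points."""
    neighbors = []
    for i, p in enumerate(points):
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

# Toy data (assumed): three points near the origin and two points far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
graph = k_nearest_neighbors(points, k=2)
print(graph)
```

With a small k each point only sees its immediate neighborhood, which favors tight local clusters; raising k links each point to more distant points and brings in more of the broader structure.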
A
So
you
get
this
more
dense
local
information
and
then
they
use
binary
cross
entropy
as
a
cost
function
instead
of
KL
divergence,
and
I
think
there
are
some
problems
with
KL
divergence
in
terms
of
its
rigor
and
completeness
for
what
you're
looking
at
here,
so
binary
cross
entropy
is
a
better
choice.
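One way to see the difference between the two cost functions is to compare them on a single pair of points. The sketch below uses the per-pair form of the cross entropy described in the UMAP paper, which adds a second term for the probability that two points are not connected; the function names and numbers are illustrative assumptions.

```python
import math

def kl_term(p, q):
    """Contribution of one pair to the KL divergence: p * log(p / q)."""
    return p * math.log(p / q)

def fuzzy_cross_entropy(p, q):
    """Binary cross entropy for one pair: attraction term plus repulsion term."""
    attraction = p * math.log(p / q)
    repulsion = (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return attraction + repulsion

# A pair that is far apart in the data (p small) but drawn close in the map (q large).
p, q = 0.01, 0.9
print(kl_term(p, q))
print(fuzzy_cross_entropy(p, q))
```

For this pair the KL term is close to zero, so placing dissimilar points next to each other costs almost nothing, while the cross entropy assigns a large penalty through its repulsion term. That is one reading of why the cross entropy keeps more of the global layout.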
A
Finally,
there's
this
tool
that
is
an
interactive
atlas
of
single
cell
sequencing
data
for
C.
elegans
neurons
in
development,
and
I
think
it
is
just
for
development,
so
this
is
a
lineage-resolved
molecular
atlas
of
C.
elegans
embryogenesis
at
single-cell
resolution,
and
this
is
a
Science
paper
that
came
out
last
year.
This
is
the
VisCello
tool.
A
You
can
see
it
on
GitHub
and
there's
also
a
Shiny
app
that
allows
you
to
actually
just
generate
these
plots
directly.
So
if
you
go
to
the
Shiny
app,
you
input
your
parameters
and
it
produces
these
clusters
which
look
like
sort
of
the
t-SNE
clusters,
but
they're
much
denser
and
there's
more
global
structure
here.
So
what
you
have
is
you
have
a
bunch
of
cell
types
and
they're
color-coded,
so
you
can
see
that
they
cluster
within
their
groups,
and
basically
this
tells
you
something
about
how
different
they
are
from
one
another.
So
the
ciliated
amphid
neuron,
for
example,
fits
into
a
couple
of
clusters,
and
it
tells
you
that
you
know
maybe
something
in
relation
to
some
other
cell
types,
and
this
is
based
on
you
know
all
of
the
single
cell
data
that
they
have
for
the
cell,
so
they
have
a
bunch
of
sequencing
data
for
the
cell.
It
defines
this
cell
state
and
this
is
sort
of
a
summary
of
that
in
a
low
dimensional
space.
And
so
we
have
this
two-dimensional
space.
A
So
we
have
this,
we
think
about
in
terms
of
the
lineage
tree.
The
developmental
cells
that
are
born
250
minutes
are
responsible
for
all
this
variation.
A
You
get.
So
that's
just
sort
of
to
set
this
up.
You
get
these
axes
like
you
do
in
PCA.
For
this
analysis,
v1
and
v2.
These
two
axes
represent
the
dimensions
of
a
two-dimensional
projection
of
the
source
data.
So
this
is
just
a
two-dimensional
map
of
all
this
variation
and
distances
on
the
manifolds
are
defined
by
single-cell
transcriptional
profiling
of
all
cells.
So
these
distances
between
clusters,
for
example,
are
a
product
of
this
mass
profiling
of
a
single
cell
and
basically
a
summary
of
that
distance.
A
So
you
have
I
apologize
for
going.
Oh,
I
see
Dick
had
to
leave,
but
that's
okay,
so
I
mean
I've
been
trying
to
figure
out
how
to
present
that
for
a
while-
and
I
just
wanted
to
give
you
a
primer
and,
like
I
said,
if
you
have
you
know,
this
is
something
that
you'll
probably
encounter.
If
you
read
developmental
biology
papers
with
like
molecular
data
in
them,
don't
be
scared
off
by
the
the
figures
and
the
techniques.
This,
I
think,
is
a
nice
primer
for
that.
B
No
questions,
but
I
appreciate
that
overview
thing
together.
E
A
Yeah
yeah
no
problem
I'll
share
the
slides
later
and
then
yeah.
So
thanks
for
attending.
Happy
Hacktoberfest,
and
we'll
talk
offline
on
Slack
and
next
week,
we'll
have
another
meeting,
maybe
we'll
follow
up
on
the
programming
issues
like
you
know
how
to
solve
a
problem,
and
hopefully
we'll
have
more
Hacktoberfest
activity,
yeah,
all
right!
So
Christian,
I
look
forward
to
seeing
your
paper.
E
Yeah
I'll
I'll
just
send
you.
I
have.