Description
On Sep 27th, 2020, Vijay Kiran, Teodor Heggelund, and Daniel Slutsky interviewed Anthony Khong.
We began with a short presentation about Geni -- a Clojure dataframe library that runs on Apache Spark. The name means "fire" in Javanese.
https://github.com/zero-one-group/geni
Then the conversation revolved around more general subjects, such as Apache Spark in general, other projects by Anthony, and the Clojure data science ecosystem.
Note that Daniel's comment at 57:41 is wrong.
SICMUtils enables some options for auto differentiation, as commented by Markus Agwin:
https://clojureverse.org/t/video-recording-scicloj-interview-3-anthony-khong-about-geni/6643/3
A
Hello, so welcome everyone. This is the third Scicloj interview, and with us today we have Anthony, who wants to share the work he has been doing on Geni. Hello, Anthony.
A
And as co-interviewers I have with me Vijay and Daniel.
C
Hey, hello. It's me, Vijay. I'm a Clojure programmer and super interested in Spark and all the related things. You may know me from some other audio stuff that I do for Clojure. So that's it, I guess. Over to you, Daniel.
D
Hi, hello, Daniel here. I'm a Clojure person too, even though not at the moment at my day job, and I'm so happy to be hearing what is going on at the moment with what Anthony is doing.
A
Great, yeah. So the main focus of today is Anthony, and the work he wants to present is on Geni, which is a Clojure interface to Apache Spark.
A
We haven't seen much focus on Spark so far in the Scicloj data science discussion, so we're hoping that may change, and that we might understand a bit more and make this as good as possible.
A
We're going to start out with Anthony presenting Geni, and after that we want to dig into the details, so it will be up to Daniel, Vijay and me to take notes and figure out where we want to dig deeper during the presentation. Yeah, so over to you, Anthony: what is Geni, and how do we think about it?
B
So I'm going to be talking for 15 minutes before we get into just the informal chat, I guess. And actually, I've presented Geni before, I think about a month ago, at Scicloj as well, but that was in a lightning talk, so I had only five minutes. So this is going to be an elongated version, where I can elaborate a little bit more on some of the details and some of the design goals for Geni.
B
Hopefully, so yeah. The plan of attack for today is: I'm gonna briefly talk about what it is, then some of the design goals that go into developing Geni, and some of the future plans, tentative, but we'll go through them. So what is Geni? I'd like to call it a Clojure dataframe that runs on Spark, and the first thing is that it's an idiomatic Clojure dataframe.
B
What
that
means
is
that
it
should
be
nice
to
read
enclosure,
and
it
should
be
nice
to
write
as
well
enclosures
it
doesn't.
It
shouldn't
look
foreign,
and,
apart
from
that,
it's
it's
a
it's
a
data
frame
library
right,
so
you'd
expect
that
it'll
be
able
to
do.
You
know
some
of
the
typical
stuff
like
reading
reading
data
from
file
counting
number
of
rows
and
seeing
what
columns
you
have
right.
B
So this is an example of a group-by aggregate operation in Geni. It will look similar, familiar even, if you're a Spark user, because a lot of this resembles Spark and the method chaining that's very common in Scala, but it should also look like Clojure. So we can see that it understands keywords, and yeah, it uses the threading macro. All right, so this is group-by, aggregate and sort.
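The kind of query he is describing looks roughly like this. This is only a sketch based on Geni's README; the dataframe `df`, the column names, and the exact aggregation helpers here are assumptions, not the example shown in the talk:

```clojure
(require '[zero-one.geni.core :as g])

;; Hypothetical dataframe `df` with :brand and :price columns.
;; Keywords name columns, and the ordinary threading macro chains
;; the group-by, aggregate and sort, Spark-style.
(-> df
    (g/group-by :brand)
    (g/agg {:avg-price (g/mean :price)
            :max-price (g/max :price)})
    (g/order-by (g/desc :avg-price))
    g/show)
```

Note how the same shape reads naturally both to a Spark user (method chaining) and to a Clojure user (threading macro plus keywords).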
B
So this is an example of how you build a supervised learning pipeline, and it looks very similar to Spark as well. In Spark you'd just point to which columns are your feature columns, and then maybe you want to do a PCA on that, just to do dimensionality reduction, and train an XGBoost model, right, and then you put everything in a pipeline like this.
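A pipeline along those lines might look as follows. Again, this is only a sketch from Geni's ML namespace; the stage option keys, the column names, and the logistic-regression stand-in for XGBoost are assumptions:

```clojure
(require '[zero-one.geni.ml :as ml])

;; Point at the feature columns, run PCA for dimensionality
;; reduction, then train a model; he mentions XGBoost, but a
;; plain logistic regression stands in here.
(def pipeline
  (ml/pipeline
    (ml/vector-assembler {:input-cols [:x1 :x2 :x3]
                          :output-col :raw-features})
    (ml/pca {:input-col  :raw-features
             :output-col :features
             :k          2})
    (ml/logistic-regression {:features-col :features
                             :label-col    :label})))

;; Fitting returns a model that can transform new data.
(def model (ml/fit training-df pipeline))
```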
B
So that's what training a machine learning model looks like. And the third thing is that there's RDD support. An RDD is a resilient distributed dataset, which is just lower-level Spark. A lot of what you do in Geni is use the built-in functions that are available already, but when you need to do custom computations, you need to drop down into lower-level Spark, and that's where you use RDDs.
B
And finally, it comes with its own command line interface. I'm going to elaborate more on this in a bit, but the idea being that it needs to be fast for you to start up and start querying stuff. That's why it comes with that, but more on that in a bit. But I mean, a fair question would be: why?
B
Why would you be using this? I think it's sort of like an intersection, or rather the union, of why Spark and why Clojure, and maybe why Scicloj. I won't elaborate much on why Clojure, but for why Spark I think there's a lot that we can say. First, there's a lot of developers, and there's even a company backing it. Then, speed and scalability.
B
Very often it's top of the class in terms of a big data framework, and you get that for free. It runs everywhere, so the same code will run on your laptop, on your desktop, on a cluster, which is nice. It's mature, you know.
B
It's production-ready, and one thing I really like is that it has a nice composable API that really looks like SQL, which is something you don't get with pandas or R's data.table, so that's definitely a plus. And why Clojure? I think the REPL experience is really unparalleled; that's one thing that we can really emphasize with Clojure. And, you know, it sparks joy. There's something about doing your data analysis in Clojure that's really nice, that... I don't know.
B
I can't quite put it into words, but the experience is amazing.
B
So that's just a very quick introduction of what it is. The second section I want to talk about just elaborates on some of the design goals. For me, working as a data scientist, I work with this feedback loop a lot: you have some idea about the data that you're working on, then you need to translate that into a query, and you wait for your query to give you some answers, then you build on that idea to get more ideas.
B
You know, you just keep on iterating through this loop, and for me a lot of the design goals really go into optimizing this loop. You need to be able to go through your ideas very, very quickly, and one of the important factors is that getting started must be fast.
B
And secondly, once you have an idea, you want to translate it into a query. That also needs to be fast, so that goes into the DSL and the conciseness of your queries. And finally, with Spark you get this really nice query speed. You get the results quickly enough that your train of thought doesn't get interrupted. Just to elaborate more on that:
B
The first thing is a fast and accessible REPL. Very often you'll be thinking about the dataset that you're working on, and you just have that one question that you want to answer, and Python is amazing for this, because it starts up very quickly. Then you import pandas.
B
Do your query, and you're done. You get away with this if your data is small enough and your query is instantaneous, so Python is really good, and I want to be able to do this as well with Clojure and Geni. So if I need to do a lein new, that's too long, that's too much time; if I need to do a require, that also takes some time; and I need to pick my dependencies.
B
So literally, you type in geni, you step into the REPL with Geni required, and you can just start typing straight away. It's still not as fast as Python, unfortunately, and a lot of that actually comes down to Spark being, you know, not so fast to start. R and Python are definitely really good at this. But if your query, or that one question that you're trying to answer, is going to take you more than a couple of minutes...
B
Then seven seconds is not bad. Apart from that, I think you'd still probably want to go for R or Python. The second thing is translating that idea into a query, so being able to write queries very fast. There's nothing stopping you from writing pure interop; this is what you'd write if you're dealing with vanilla Spark via the Scala interop. A couple of things:
B
Clojure data structures, and it's just a little bit nicer and quicker to write. Then also, you want to be able to interact with Clojure directly, so that if whatever you're doing returns you a Scala sequence, or a Spark Row, which is very much like a Scala case class, then you need to do this unpacking. That's not very nice. So when you're done with the stuff, it should give you Clojure data structures, so we pay...
B
You know, particular attention to that. And finally, the other arrow in the loop is query speed, and you get this from Spark for free; there's a lot of people working on Spark performance. So this is just a nice group-by aggregate example, where you write to disk 24 million rows with a million groups.
B
Then Geni is, you know, very competitive here, because it's just Spark. I think in the last lightning talk I said that Python was really slow, but that's just because I was writing my pandas probably not so optimally; it can be competitive as well. But Geni is up there, like it's fast.
B
Some other goals: I'd like to say that Geni is to Spark what Clojure is to the JVM and ClojureScript is to JavaScript, in that it really embraces the host. It's just using all the facilities, and the spirit is really captured by David Nolen's quote from his talk Parasitic Programming Languages, where he's talking about creating mobile apps on ClojureScript, saying React Native is, as he puts it, somebody else's problem.
B
That's somebody else's problem, and in the same way, Spark is not Geni's problem; it's somebody else's problem. You just need to make sure that the bridge is good, so that when you're writing it, it feels like Clojure, without getting in the way of Spark.
B
So that's the idea. Also, you get this nice feature coverage, which is not possible if you're writing everything from scratch. In the core namespace there are approximately 400 functions, and there's a lot of machine learning models there already, with RDD support as well.
B
So I couldn't imagine writing this from scratch and being competitive in terms of performance and everything else; there's just no way. And you know, I'm only doing this spending like half an hour to an hour a day, every day, on Geni.
B
So this is what being parasitic allows you to do, and also being very shameless about borrowing other idioms. Obviously Clojure idioms: stuff like remove, stuff like, I don't know, inc and dec. They're not available in Spark, but as Clojure developers we have a pretty good idea of what these things mean, so they should be there to operate on columns. And then, I used pandas.
B
I used to use pandas a lot, and there's a lot of stuff that's pretty neat there that doesn't exist in Spark, and Geni just shamelessly borrows it. Again, this goes back to writing fast queries: it's there, and whatever comes to your mind, you write it then and there. And an easy getting-started experience: this is a bit of a pet peeve of mine with Clojure.
B
Right, I want the getting-started experience to be as easy as possible, starting from a clean slate. This is very much taken from, inspired by, borkdude's babashka and clj-kondo, where you have an install script, you make it executable, you move it to your path, and you're good to go. That's all you need to do, and it should work. The only other dependency is Java. That's it, nothing else, not even Leiningen.
B
And also beginner-friendly documentation, you know, getting started. I don't want people to get lost, but I haven't had much feedback on this; it'd be great if some people tried it out and we got feedback on what's missing in the docs. And finally, it's a bit of a hobby project for me, so it should be fun, and some of the stuff that I like doing is 100% test coverage.
B
It doesn't really mean much, but I like looking at it, so that sparks joy, and also this kind of meticulous continuous integration pipeline, you know, just testing everything. I don't know, I kind of like it, so yeah.
B
That's probably the last goal. And just very quickly, some future plans: there are four main modules in Spark. There's Spark SQL, there's the Spark RDD, there's Spark's machine learning, and there's Spark Streaming. So I'm working on Spark Streaming, so that we get this nice coverage of the entire library. Then, better documentation: I think Django has an exceptional documentation style.
B
I think we could probably learn a lot from them, so I think that's next. And integration with other Clojure data libraries: I'm specifically thinking about a zero-copy path to tech.ml.dataset. I think we've talked about that a lot, but also, I don't know, Notespace and Oz. Spark doesn't come with a nice visualization library, and it'd be nice to have that as well.
B
More borrowed idioms: my experience is in pandas, Dask and Spark, but that's about it, so other idioms from R and from Julia, let's say, would be awesome. And also a smoother experience when you're deploying it on a cluster, because that's what Spark is good at; we should leverage the free facilities, which at the moment are not there. So those are just some future plans that I have, and that's it for my 15-minute introduction of Geni. Thanks for listening.
A
I kind of would like to take a step back first and just go into the history, because when I hear you present, I hear that you're making data science work both in a technical and a business sense. But I'm wondering, how did you get there, and why did you end up with the combination of Spark and Clojure? Can you...
B
Yeah, so working with data on a daily basis, I guess, started when I was doing my master's in Oxford, in applied statistics. The main language was R, so we all used RStudio, and that was okay. That was nice.
B
No complaints there. Then I moved to an algorithmic trading startup, where we used a lot of Python, a lot of pandas as well, and I started running into a few issues, performance-wise. I wouldn't say it was bad, but it was hard to predict, and you had to do a lot of profiling to make stuff work.
B
Some of the stuff is pretty unintuitive, and you know, it's very permissive of side effects as well, so whatever pipeline you come up with is not so amazing, I would say. And then I moved to Agoda, which is, you know, a bigger tech company.
B
They use Scala Spark a lot, and honestly, Spark was awesome. You had this luxury of working in a big company where computational resources are just not an issue; you're encouraged to think that storage is unlimited, and you just work away with that. But the problem I had is that the main way you're encouraged to work with it is through a notebook, and this rubbed me the wrong way, really.
B
And notebooks, I think, are very problematic. Maybe it's okay for research, but when you start to productionize it, you need tests, you need all the other stuff; you need your terminal, I feel. So what I did when I was there was, you know, there's the notebook, and I'm just gonna connect to it via the command line interface, but then there's still some issues there.
B
So you're still really running a notebook, so you don't have much control over your environment, or maybe I didn't know how to do it, and also back then it took like a minute to start up. So it was...
B
I don't know, I missed some things from Python and pandas, like the really fast connection and just getting stuff done then and there. But at the same time I loved the Spark performance and the API, the fact that it really looks like SQL. So fast forward to 2019.
B
So let's say you want to add another library to whatever it is you're working on. Then you need to sort of recompile your jar and send it to the master, and it's just a lot of hassle.
B
I don't know, and then I've got this sort of hacky way of connecting via the command line interface, and the whole experience is just not so nice. And again, I was working on a pretty big code base, so maybe it was me not understanding how to maneuver some things, but I did miss the fact that you can just type ipython and then you're good to go.
A
Because the notebooks, the Spark notebooks, were hosted for you, in a sense.
B
Yeah, yeah, right. So everybody was, I'm guessing, connecting to the same thing; I mean, there's the resource scheduler, so you ask for resources and then they give them to you. But getting the extra dependencies there, it's just not... again, it could be my shortcomings.
B
So I'd just get a very bloated jar with everything that I could potentially want and then just put it there, which is again probably not so good when you try to productionize it. But yeah, so I missed some things about Python, I really loved some things about Scala Spark, and I'd always wanted to do Clojure.
B
I'd been wanting to learn it for a couple of years before actually jumping into the language and executing a couple of projects using it. Then I thought, you know, one day I've got to execute this data project, and I'm using Python, pandas and Dask, waiting for ages, and I thought, you know, if we could just do this in Spark...
B
It'd be okay, but then my other data analysts would have to learn Scala. Or, I don't know, for some reason I didn't really consider PySpark; maybe I should have, but yeah. So I thought, you know, I'm gonna wrap some things in Clojure, and we already know how to do Clojure, because we've done Clojure before, and then, hey.
B
You know, some stuff actually worked really nicely, and I got like a 30x speed-up on some of the important queries, and that was good, and then I kept building on Geni. That was, I guess, the story.
A
So I'm still curious about a few bits, because you were working at this fintech place and then you were working for the big place, but right now you're...
A
For, or you're one of the founders of, Zero One.
A
So what led you? What led you there?
B
Oh, so I got married in 2018.
B
And then my job at Agoda was based in Bangkok, and I wanted to move back home to Indonesia, so that was the motivation. I thought, you know, before having a baby, if there's ever a time for me to start my own thing, and maybe potentially fail, who knows, that was the time. So I thought, you know what, I'm going to quit and do my own thing, and yeah.
A
Well, I guess you started the company. Fortunately, it worked out, and you're doing data science stuff. I'm...
B
We're a service-based company, so we work with people that have data; I'm not necessarily doing stuff with our own company's data.
B
There isn't much data there, really, but we work with one of the biggest retailers in Indonesia, and they have millions and millions of customers. I still wouldn't call it big data, because it's like a few hundred gigabytes of stuff. It's fine; it'll run on a single machine. So yeah, that's what I'm working on now.
C
First of all, I mean, it's a really, really nice project, by the way. I've been doing some Spark for some time, mostly Scala-driven things, and recently I think most of the code is driven by Spark SQL rather than any of the other things, because SQL has more reach; there are more people who can write SQL.
C
So how is the interop story for Geni here? Is it something that is calling the Scala API behind the scenes, or how is it working?
B
Oh, okay. So for Spark SQL it's easy enough to call the Scala API, and for machine learning as well, but for RDDs and for Spark Streaming it gets quite tough, I think, to work with the Scala interop. So Geni is using the Java interop for Spark Streaming.
C
Okay, and because I see that you're on Spark 3 already, or is it on Spark 2, or...
B
I mean, the thinking with that is that, with Geni, as far as we know, we're the only ones using it.
B
So there's nothing to break, really. We have lein-ancient going on there, and we just make sure we keep up to date until we do a release, like a beta release, right, yeah. When we use it in production, then we can finally say to people that, you know, it's...
C
Yeah, so you're using Dataset, but I'm curious about the mapping, because Datasets are statically typed; that's the main idea behind Datasets compared to DataFrames, at least. So how does the conversion to Clojure-driven things work? If you get a Row, then how is it converted into Clojure maps?
B
You just get whatever Scala thing you get, and then we've just got some rules of thumb: if it's a sequence, convert it to a Clojure sequence, yeah.
B
If it's a Java array, convert it to a sequence, and so on, and so on.
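The rule-of-thumb coercion he describes could be sketched like this in plain Clojure. This is illustrative only; the function name and the exact rules are hypothetical, not Geni's actual internals:

```clojure
;; Recursively coerce interop results into Clojure data:
;; Java arrays and java.util.List instances (how Scala Seqs often
;; surface through the Java API) become vectors; everything else
;; passes through untouched.
(defn ->clojure [value]
  (cond
    (nil? value)                     nil
    (.isArray (class value))         (mapv ->clojure value)
    (instance? java.util.List value) (mapv ->clojure value)
    :else                            value))

(->clojure (into-array Long [1 2 3])) ;; => [1 2 3]
```

As he notes right after, code like this does a lot of type checking per value, which is why the coercion layer is not the most performant part, but it only runs on collected results, not inside Spark's own computation.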
B
I mean, actually, that part of the code is not super performant, I would imagine.
B
It's doing a lot of checking, yeah. But thankfully, for a lot of the stuff that you do on Spark, the heavy lifting is done inside of Spark.
B
When you collect, when you get the stuff out, there isn't so much stuff there.
B
It's been working okay for me.
C
Yeah, yeah. So, if I understand correctly, if I download Geni now, it's essentially talking to local Spark by default, right? Yes.
B
For a cluster, I've only tried it with GCP Dataproc, where you can do the same thing, but with your Spark config already there, and then it'll connect to YARN, yeah.
B
Instead of local. But that's one part of the project that we really need to work on, because at the moment a lot of it is just local. We haven't had the use case where we need a big cluster to execute the stuff that we need to execute. But honestly, Spark local: people like to say that it's not good for small to medium data, but it's actually pretty good.
C
Yeah, totally. I mean, I understand that from the performance point of view. For example, the cluster that I work with at work...
C
Its storage capacity is 380 terabytes, with real metal nodes, like 20 beefed-up metal nodes. And it's not just the data side; it's also the idea behind it, because the cluster is Kerberized, you know, everything is secure.
C
Via, you know, Kerberos tokens and all that fun stuff. I would say that would certainly open it up to a lot more folks who want to do this. As you said, most of the data is living on a cluster, and yeah, in terms of computational resources...
C
It's much better, because I don't need to copy all my parquet files, which are probably, you know, a terabyte, onto my computer and then do the work; and it's not even allowed in some environments, because you have to work on the cluster and get a key. So do you have any plans in that direction, or do you think the design itself is compatible with that mode for Geni?
B
I think it is; it's just that I haven't really explored it, but it's in the pipeline, yeah. I will try to write guides to deploy stuff on those three things at least: GCP Dataproc, AWS EMR and Databricks, yeah.
B
For those three things, ideally, you can use Geni.
C
Yeah, yeah, because exploration-wise it's awesome. You're familiar with Clojure, and you're giving people who are familiar with Clojure access to Spark really quickly, without them having to go to Scala or Python.
C
Obviously, the next step would be: okay, I built my program, it looks awesome, but I want to deploy this on the cluster now. So that's going to be, I think, a really, really interesting point there. It's nice.
B
It's something we're going to look into, yeah. For now, our use case is batch jobs run locally, as a drop-in replacement for pandas as our dataframe. That's...
C
It means it's really hitting the right sweet spot, I would say, because you're taking advantage of, as you said, Spark's computational engine and then giving a Clojure way into it. Otherwise I'd need to do maps and, you know, the general Clojure data processing. I think this makes it way better.
C
I think it's on par with, you know, data frames and pandas.
C
You're getting that fancy thing, and usually Spark is more on the cluster side rather than on the local side. So that's the...
B
Yeah, so even on Spark's subreddit on Reddit, people are saying, hey, if your stuff fits in memory, you don't need Spark.
C
Yeah, yeah. I mean, the thing is, it's also because you know the functions, you know the framework, so I don't need to learn another new framework to do my work, which is just the interesting part. And then with the same code you can scale it up to 200 machines, without translating it into something else. Yeah.
C
There's lots of interesting tech there, but there must be some challenges, right?
B
Yeah, yeah, okay. I mean, the first thing is that I don't have that much experience in Clojure itself; I've been using it professionally since December 2019, so it hasn't been a year. So learning the language itself is something. Then, yeah, the Scala interop can really be tricky, but a lot of that has actually been solved; you take bits and pieces from other libraries.
B
Yeah, and I think you're okay, but then some stuff, like traits and implicits, doesn't translate so nicely.
C
Yeah, implicits are already a pain in Scala, so I would really not want to have them in Clojure. No, no.
B
I don't think they translate so well to Clojure anyway, yeah.
B
I mean, with the conventions and everything, one pain point is that you need to do a lot of reflection, yeah.
B
Inspecting, like, okay, what is this guy actually requiring?
B
How do you deal with that, yeah. And I ran into a lot of trouble with the RDDs, because functions need to be serializable, which is tough. I'm basically taking a lot from sparkplug, which is Amperity's. I mean, I know that they're using Spark, but I'm not sure they're pushing everything to GitHub; but whatever is there is actually super useful, and that's used in Geni as well.
B
Yeah, but I claim no credit for that. That's taken straight out of sparkplug, using that model and then wrapping the other stuff, the RDD methods, on the RDD ecosystem, yeah.
B
You can use it with Geni. Okay.
C
But it is very impressive, by the way. I mean, you just started Clojure; I wouldn't pick this as my first project if I was...
C
With just six months, or, you know, just getting started with the programming language, and then going to this level. Spark is a complex beast; I've been working with Spark since Spark 1, a pretty long time ago, when there was no Spark ML or anything, and also Spark MLlib and that fiasco. But using a completely new language and then trying to work on this super complex thing, that's really admirable, by the way.
C
What do you call that, like a Clojure Spark thing, a REPL?
B
And then, can I rewrite my current query that's running in Python, yeah, in Clojure...
B
So that's kind of the story. For me, Spark SQL and Spark ML were pretty okay to wrap in Clojure. RDDs and streaming are a bit more difficult, and for me, even if it didn't have RDDs or streaming, it would still be useful, yeah.
C
Yeah, of course, yeah. I mean, streaming is reasonably new; it started with Spark 2.2 or so, structured streaming and everything. But I'm curious how that maps to idiomatic Clojure as well, and I think you're making me explore that a bit. So I'm curious about the streaming side of it, because there are these streams and types there and all that stuff. So yeah.
B
At this point, it's just about making sure all the methods are there to be used.
B
You kind of have to be familiar with the Spark ecosystem, yeah, and then there's like a one-to-one thing to calling the right methods, and you're okay, probably, but...
B
Definitely not at the moment; I don't think I know enough about RDDs and streaming to build another grammar on top of them.
C
Yeah, yeah. And is it just your project, or how many people are working on this?
B
Just me, actually, but then...
B
But then next, if we have a good fit for another data project, I'll teach the other team, my entire team, to use it.
B
My team is like three people, so yeah, we'll see how that goes. But at the moment it's just a little bit of fun. I spend like one or two pomodoros every day, just...
B
Yeah, trying to do stuff, adding stuff to Geni, yeah. But you know, there's a lot of little stuff. I don't know, I'm just tinkering about, really.
C
But it is impressive, especially when you show the test suite and also the CI stuff, that you are making sure that everything is working properly. That's extremely disciplined, I must say. I mean, I've been in the industry for almost 20 years, and even normal commercial projects don't have this kind of, you know...
B
It definitely follows Uncle Bob's clean code kind of thing.
B
I kind of like it, but yeah, no, that's how we write production software as well at Zero One Group, so it's just a practice that carries over to our open source project.
C
Yeah, nice. So you picked up Clojure, and now you're on Spark. I was curious about your experience of learning Clojure itself.
B
I programmed in JavaScript, Python, and Scala, mainly. But then I've always had this huge interest in Haskell.
B
I still want to dabble in it. I'm a huge fan of functional programming; that's really one thing that I really, really like. But I don't think that in Indonesia, or anywhere, starting a software team with Haskell... that carries a lot of risk. Still, I wanted it to be a functional programming language.
B
I think I was left with F#, OCaml, and Clojure. Those were the three that I was seriously considering.
B
And then we met someone who really had a lot of experience in Clojure, and it became a no-brainer. You want to have that mentor, right, showing you the light when you encounter these edge cases. So we went with Clojure. That was a switch from Python to Clojure, in December 2019.
C
Impressive. Anyway, I'll certainly give it a try, because I haven't tried Clojure-driven things. As I said, my day-to-day work is with a big cluster, with lots of data, and with an enterprise-y way of dealing with things, because I work in fintech and we have a lot of regulations.
C
Around the data and everything, and security and all that stuff. So I'm curious to give it a try and see how it pans out. And I really like the documentation and the way that you're managing the project. It's super nice.
B
Yeah, thank you. The one thing I was looking at is that it might not be compatible with Livy, if that's what you're using.
B
Okay, so because with Livy you're sending text, either Python code or Scala code, right? So that wouldn't work. Then I thought, you know, you have something very similar to that, which is nREPL. So if you have a master that's running on the cluster, and you can connect to the nREPL on that master, then you should be good to go.
C
Yeah, so I just need to start the driver with an nREPL, and then you can...
B
Connect to that one, okay. So hopefully that works. What I did with Dataproc was just grab the Geni source code and put it in an uberjar.
C
Nice, that's super nice. This is a really fascinating project, by the way. So congratulations on putting it into production and using it, and also on open-sourcing it. That's giving back to the community, and that's really admirable, especially with the effort you're putting into maintaining it and the documentation, and setting a good example, I would say. I think I'm speaking too much again. Daniel? Teodor?
C
Please, please stop me if I'm going on.
A
I wanted to ask you a little bit about your team, because, Anthony, you mentioned the rest of your team, and you were curious about whether you were going to introduce Clojure and Geni to them.
A
Can you talk a bit about how you're working right now, what their competence is, and what kind of obstacles you see in bringing them over to Clojure and Geni?
B
Yeah
yeah,
so
I've
got
this
pet
peeve
right
of
data
scientists
not
being
able
to
be
as
being
not
fully
versed
in
software
engineering.
So
I
think
that's
that's
an
you
know.
You
encounter
data
scientists
so
without
software
engineering
background,
but
I'm
adamant
that
that's
not
something
that
I
want
to
have
in
the
company
right
so
that
if
you're
a
researcher
and
my
team,
you
have
to
be
able
to
do
some
back-end
programming
as
well,
and
you
dabble
with
the
software
team
as
well.
B
Sometimes
so
so
everyone
in
the
team
knows
closure.
So
that's
that's
not
really
an
issue,
but
then
we
we
haven't
had
more
data
projects.
If
I'm
honest
with
you,
we've
got
like
mathematical
modeling
projects,
so
yeah,
so
we
haven't
really
had
the
chance
to
do
that.
So,
but
then
the
next
one,
probably
yeah.
B
Yeah,
so
the
what
I'm
referring
to
like
mathematical
modeling
is
that
you
you
go
to
the
client
and
then
you
see
their
processes
and
then
you
make
your
assumptions
and
then
usually
it
boils
down
to
an
optimization,
a
constraint,
optimization
problem.
If
it's
linear,
then
great,
you
use
something
like
or
tools,
linear
programming
and
you're
done.
B
If
it's
non-linear,
then
you
use
some
other
tools,
even
genetic
algorithm
right.
So
these
kind
of
things,
that's
what
at
least
my
team
is
dealing
with
at
the
moment.
B
Yeah
so
one
project
that
we
successfully
delivered
as
for
a
flower
company,
you
know
so
flour
to
make
bread,
noodle
and
all
of
these
stuff
right,
so
they
make
their
flour
based
by
milling
the
wheats.
B
But
then
weeds
are
agricultural
products,
so
their
prices
fluctuate
their
protein,
wet
gluten
and
moisture
fluctuate
their
price
that
their
delivery
might
be
late
right,
so
they
have,
they
must
produce
flour
that
has
constant
quality
and
they
have
their
sales
forecast
with
ever
fluctuating
raw
material
right.
So
what
we
made
for
them
is
this
optimizer
that
which
is
just
a
constraint,
optimization
problem,
a
large-scale
one.
It
has
like
100
000
constraints
or
something
like
that.
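The blending problem described above can be sketched in miniature. This is a hedged illustration only: the prices, protein values, and two-wheat setup are invented for the example (not the client's model), and it uses a brute-force grid search instead of a real LP solver like OR-Tools, purely to show the shape of the constraint optimization.

```python
# Toy version of the flour-blending problem: choose fractions of two
# wheats (invented numbers) to minimise cost subject to a minimum
# protein constraint.

def cheapest_blend(wheats, min_protein, step=0.01):
    """Brute-force search over blend fractions of two wheats."""
    best = None
    steps = int(round(1 / step))
    for i in range(steps + 1):
        x = i * step  # fraction of wheat A; wheat B gets 1 - x
        protein = x * wheats[0]["protein"] + (1 - x) * wheats[1]["protein"]
        if protein < min_protein:
            continue  # infeasible blend: protein floor not met
        cost = x * wheats[0]["price"] + (1 - x) * wheats[1]["price"]
        if best is None or cost < best[1]:
            best = (x, cost)
    return best

wheats = [{"price": 300.0, "protein": 14.0},   # expensive, high protein
          {"price": 220.0, "protein": 10.0}]   # cheap, low protein
frac, cost = cheapest_blend(wheats, min_protein=12.0)
print(round(frac, 2), round(cost, 2))  # → 0.5 260.0
```

The real system replaces the grid search with a proper solver, since 100,000 constraints rule out brute force, but the structure — inputs, constraints, objective — is the same.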
B
It tells them, for every flour, what raw material to use and what they should be buying.
B
No, yeah, for me it's just mathematical modeling, just an optimization problem. You have a bunch of inputs, you create the problem, you solve it, and you give it back to them. There is no machine learning involved, no data cleaning involved, because you know exactly what you're going to be given. So no, not really.
C
Yeah, yeah, I was wondering, because you're talking about data science and data modeling sorts of things: what's your opinion about Clojure for data science? Obviously this is a Scicloj interview, and we're interested in data science work in Clojure. Python is more or less the lingua franca of data science, and you have experience in Python as well.
B
I think if Clojure had the same kind of ecosystem that Python has... It's a much nicer language, and with the REPL and everything, it's a much nicer environment to be working in as a data scientist.
B
Because, again, it all goes back to that feedback loop. Exploratory data analysis and data cleaning are one thing, which you can do with Spark SQL, but then there's plotting as well, which you can't really do with Geni, yeah.
B
But the ecosystem is not there, right?
B
You'll run into problems: there isn't a mature NumPy kind of counterpart in Clojure, there isn't some kind of scikit-learn there, yeah.
B
So yeah, you're missing a lot of things.
B
I think NumPy is a huge thing: you do a lot of things in NumPy. NumPy, scikit-learn, and something like Torch, TensorFlow, or JAX — an automatic differentiation library.
B
Another thing: a lot of people like working with notebooks. I don't, but I think it's very important, to gain that market share and to make a complete data ecosystem. And as far as I understand it — I haven't played around with that, you'll probably know more than me, Daniel — it's probably not there, right? There isn't a reason for you to jump ship, let's say, from a Jupyter notebook to Clojure.
C
Yeah, it's a bit of an unfair comparison, right, because Python had a head start. I think NumPy is, what, 10 or 20 years old already now.
C
It's actually older than Clojure itself. But I think the best option is — because Clojure has been, you know, compatible with its host — to utilize the stuff that's already there. I think that is its spirit. If I can comment on the project that you're building: this is the same spirit, right? Utilize Spark and then provide a fantastic experience using Clojure to take advantage of that system. So probably, Daniel, you know more about this as well.
C
I mean the Python-Clojure interop work: if we can get there, then suddenly it opens up a host of possibilities, because that's what happened in the ClojureScript ecosystem, right? Because it's on JavaScript, you suddenly have access to all these things, and then we just need to make sure that the tools are around, you know.
C
If that works in a good way, then it opens up a lot of possibilities, because you can't possibly say, okay, I'm going to start a NumPy clone in Clojure. I'm pretty sure that after building two matrix computation functions I'd be like, done, I'm not going to build all of it. That would be a humongous task. So I think taking advantage of what is there, and making sure that the interop is awesome — that's how we got away with it in Clojure.
C
You know, using every possible Java library out there, at least in the beginning. And if you look at most of the libraries, the Java interop things have a very, very tiny surface area: they just expose what they built as a really small wrapper around the Java stuff.
C
So, anything that you can pick up. I would say I think that would be a nice approach to fill the gap. Otherwise — we're such a small community compared to the number of available Python programmers in general — it would be a monumental task to say, okay, we're going to redo scikit-learn or NumPy. And then you still have to reach out to the Java ecosystem anyway, because if you want what NumPy gives you, then you need proper...
C
...good precision for all the numerical computation. That means you need to drop down to the Java level and use some library there.
D
About NumPy: I think Dragan's work on Neanderthal is a huge project, and it does bring really high-performance numerical programming with arrays, so that is not missing anymore, I guess. All the linear algebra parts are really mature. But I guess when you go into some specific applications, like time series analysis...
B
And I think really the answer to that, at least in my opinion, is just to wrap NumPy and be able to use it very nicely in Clojure. I mean, Dragan's doing an amazing job, but there's no way to compete with hundreds of developers working on NumPy. They just have a lot more surface area. If you could just wrap it nicely, and make it so that it's nice to interact with other parts of Clojure code, I think that's probably the way to go.
A
I
think
we
have
an
advantage
in
some
sense,
though,
and
when
we
write
closure
tools,
we
tend
to
write
them
small
and
that's
been
said
so
often
that
it's
become
this
kind
of
trope,
but
in
this
case
comparing
using
jenny
to
spark
notebooks.
A
If
you
want
to
use
numpy,
you
have
to
deploy
your
thing
and
you
have
to
control
your
dependencies,
and
you
have
to
do
that
piece
of
work
that
also
it's
kind
of
a
solved
problem
in
the
closure
space
and
in
in
that
same
sense.
I
I
really
like
that
in
jenny,
you
just
you
just
make
this
little
piece
that
can
be
used
in
whatever
way
we
want
to,
and
you
also
made
the
cli
to
make
it
easy
to
get
started
with,
but
it
can
be
used
from
the
repel.
A
So
it's
not
like
the
only
way
to
use
jenny
is
through
the
cli.
So
that's,
for
instance,
since
I'm
using
emacs,
I
won't
be
able
to
get
the
same
kind
of
editor,
help
that
I
would
be
getting
otherwise.
A
So
I
feel
like
this,
this
approach
of
building
the
small
composable
things
in
a
sense
counterbalances
the
huge
ecosystems
that
we
have
to
yeah
compete
with.
B
I guess, yeah. For me, Geni is kind of a smallish project, right, because I'm not doing any of the computations; it's just wrapping. So we could still have NumPy in Clojure. It's not going to be such a huge project, because you're just wrapping stuff. Okay, I don't know, this is again... But also, as soon as you do that: what NumPy are you using? What Python are you using?
C
But
that's
the
that's!
The
impedance
mismatch
that
you
have
when,
when
you're
bringing
in
you
know
interop
with
a
language
like
python,
which
is
a
bit
of
a
different
model
underlying
model
compared
to
java
stuff,
so
I
think
the
the
main
advantage
being
on
jvm
is
that
you
know
you
can
just
you
can
interrupt
its
color.
You
can
drop
the
jruby
whatever
you
know,
because
they're
all
underlying
tech
is
still
java
and
jvm.
So
it's
much
easier
there
I'm
curious
in
terms
of
the
yeah
and
then
how
lip
python
cld
is
working.
C
So
I
might
need
to
look
it
up
how
far
we
are
at
on
that
one.
B
It's
all
there
with.
B
Yeah, so I had to run that on Linux, okay. But really, for me, the startup time is super, super important, because you want to get started as soon as possible.
B
So even something as simple as going through the cookbook or some of the guides and saying, hey, this doesn't work, right? And at the moment none of the functions have any docstrings, by the way. I really want to find a way to just import all of the Scala docstrings and put them there.
B
That's
that's
in
the
works
and
then
yeah
some
some
help
on
like
deploying
stuff
in
the
cluster.
I
think
that
that's
that's
going
to
be
a
big
deal
and
yeah
yeah
feed
feedback.
All
around
I
think
would
be
would
be
awesome
because
at
the
moment
like
I
know
it
works
for
me.
I
don't
know
if
it
works
for
someone
else.
B
Yeah
someone
did
raise
an
issue
right
saying
like
hey
the
the
tests
they
don't
work
turns
out
that
they
only
work
if
you're
at
certain
time
zones,
just
terrible
so
much
for
100
test
coverage
right.
B
So
completely
repeatable,
but
only
by
by
certain
devices,
no.
D
B
That's
terrible,
like
you
know,
I
really
think
like
there's
it's
it's
only
going
to
get
more
robust
if
people
use
it
so
these
kind
of
issues
yeah.
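Time-zone-dependent test failures like the ones described are usually a symptom of code that turns an instant into a local date using the machine's default zone. A small sketch of the mechanism (in Python rather than Clojure, with example zones chosen for illustration): the same instant renders as two different calendar dates depending on where the test machine happens to be.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One fixed instant in UTC...
instant = datetime(2020, 9, 27, 23, 30, tzinfo=timezone.utc)

# ...rendered as a *local date* in two different zones.
jakarta = instant.astimezone(ZoneInfo("Asia/Jakarta")).date().isoformat()
new_york = instant.astimezone(ZoneInfo("America/New_York")).date().isoformat()

print(jakarta, new_york)  # → 2020-09-28 2020-09-27
```

A test that asserts on the local date will pass in one of these zones and fail in the other, which is exactly how a suite can be "completely repeatable, but only on certain devices."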
C
Yeah, nice. I'm not sure — this might be a small digression — but I think there was this internet joke, actually based on some kind of fact, a long time ago, that an email can only go about 500 miles.
C
So there is a university, and there is this sysadmin guy there, who posted the story later. They keep sending emails, and then they get a complaint from a university professor or something, people saying, "I cannot send email beyond 500 miles" or something. And it was fascinating, because how can emails stop after 500 miles? And then apparently they realized that, based on the number of hops, and based on the...
C
...speed of light, essentially, the signal passing through. At some point there was a router misconfiguration, and that is the one that was bouncing the emails back; it wasn't getting a reply. So from the people's point of view it was 500 miles, and from a tech point of view it is completely different. It was a fascinating story, a fun thing. So it's something like that: okay, all the tests work, but only as long as you're in my time zone.
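The arithmetic behind the 500-mile story is simple. In the well-known retelling (the ~3 ms figure comes from that retelling, not from this conversation), the misconfigured server timed out connections after about 3 milliseconds, and light only travels so far in that time:

```python
# "500-mile email": how far can a signal travel in a ~3 ms timeout?
SPEED_OF_LIGHT_KM_S = 299_792.458
KM_PER_MILE = 1.609344

timeout_s = 0.003  # ~3 ms connect timeout from the story

radius_km = SPEED_OF_LIGHT_KM_S * timeout_s
radius_miles = radius_km / KM_PER_MILE

print(round(radius_miles))  # → 559
```

So any server more than roughly 500-and-some miles away could never answer before the timeout fired — physics disguised as a mail bug, much like a time zone disguised as a flaky test.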
C
Nice,
okay,
but
it's
it
it's
super
cool
by
the
way,
so
I'll
certainly
give
it
a
try.
I'm
curious
about
the
cluster
side
of
it,
because
that
will
be
my
if
this
works,
then
that
will
be
a
main.
You
know
thing
for
me,
because
you
know
I
don't
run
any
spark
programs
on
locally
unless
you
know
just
for
experimentation.
C
Rest
of
the
work
is
happening
on
the
cluster
itself,
so
I'm
curious
about
your
roadmap
and
streaming
and
these
things
as
well
so
yeah.
I
think
yeah.
B
Probably
documentations
yeah
and
yeah
cluster-
I
don't
know
I
might
get
someone
from
my
team
to
to
do
the
the
the
cluster
side,
yeah
yeah.
A
Yeah, I'm saying that I think we're past the hour since we started, so perhaps let's take a final round of questions and then finish up, if that's okay with everyone.
B
Yeah, I think there's an initiative to try it out on beginners. I'm big on trying to make it as easy as possible to get started.
B
So
any
feedback
on
that
would
be
great,
and
maybe
we
can
work
it
out
together
how
to
make
it
as
easy
as
as
possible,
and
also
another
pet
peeve
of
mine
is
like,
whenever
you're
trying
to
tackle
a
problem,
enclosure
you're
confronted
with
a
lot
of
libraries
right
and
then
you
need
to
do
your
research
and
and
that
takes
a
while
right
and
which
is
kind
of
bad
as
well,
because
I'm
kind
of
contributing
to
this
right,
because
there's
tech
ml
data
set
already,
which
is
pretty
established
and
making
it
clear
for
people
like
this-
is
what
you
use
this
for,
and
you
use
a
tmd
for.
B
I
think
that
I
don't
know
the
answer
to
that
at
least
not
not
100.
I've
got
some
ideas,
but
then
making
it
clear
for
people
where
to
go,
I
think
is,
is
very
important
and
also
that
that
that
bridge
we
can
have
a
zero
copy
path
to
tmd,
so
that
people
are
not
forced
to
pick
one
over
the
other
with
making
and
being
locked
in.
That
would
be
great
as
well,
but
then
yeah,
I'm
not
entirely
sure
where
to
to
start
with
that.
B
So
I
need
to
ask
chris
and
also
library
integrations.
I
think
integration
with
with
oz
would
be.
It
would
be
amazing
and
note
space.
So
I'm
thinking
you
know
in
my
head,
like
like
node
space,
being
a
drop
in
replacement
to
the
rebel
and
you'll
just
do
stuff
on
the
rebel
and
that
but
then
you'll
have
nice
image.
B
Nice
charts
nice
plots
on
your
browser
instead
of
just
you
know,
just
your
terminal,
so
I
think
those
are,
I
think,
some
some
of
the
more
immediate
stuff
that
I
think
would
be
would
be
great
to
tackle.
C
No, I think I've pretty much asked every possible question already. And yeah, I know that you're using Vim to develop all this stuff, right, Anthony?
B
Yes, yes, yeah. I'm thinking the same thing, actually. It'd be nice to have the Geni CLI run a server in the background, and then the startup time would be one or two seconds instead of seven. That'd be nice.
C
As I said, it's a fascinating project. I really admire your hard work behind this one. I'm curious where you're going to take it, so I'll certainly give it a try, and if I have hiccups or something I'll reach out to you, so you can help me out a bit. And I'm curious about the cluster stuff.
C
How I can put it into my workflow as well, because Spark is something that I'm using almost 10 hours a day. That's basically my work, so this is going to be a fascinating project for me to try out. Thank you.
C
Yeah, yeah, I mean, I don't want to cross-promote something, so I want to keep things a bit separate. This is not about other work I do; it's mostly about Scicloj and Geni. That is the main thing. But I'm really happy to be here and to understand things a bit better — I didn't know this much about this project.
A
Okay, so that's it! Thank you for watching the third Scicloj interview, with Anthony, about Geni.