Description
On Sep 27th, 2020, Vijay Kiran, Teodor Heggelund, and Daniel Slutsky interviewed Anthony Khong.
We began with a short presentation about Geni -- a Clojure dataframe library that runs on Apache Spark. The name means "fire" in Javanese.
https://github.com/zero-one-group/geni
Then the conversation revolved around more general subjects, such as Apache Spark in general, other projects by Anthony, and the Clojure data science ecosystem.
Note that Daniel's comment at 57:41 is wrong.
SICMUtils enables some options for auto differentiation, as commented by Markus Agwin:
https://clojureverse.org/t/video-recording-scicloj-interview-3-anthony-khong-about-geni/6643/3
A
Hello, so welcome everyone. This is the third Scicloj interview, and with us today we have Anthony, who wants to share the work he has been doing on Geni. Hello, Anthony.
A
And as co-interviewers I have with me Vijay and Daniel.
C
Hey, hello. It's me, Vijay. I'm a Clojure programmer and super interested in Spark and all the related things. You may know me from some other audio stuff that I do for Clojure. So that's it, I guess. Over to you, Daniel.
D
Hi, hello, Daniel here. I'm a Clojure person too, even though not at the moment at my day job, and I'm so happy to be hearing what is going on at the moment with what Anthony is doing.
A
Great, yeah. So the main focus of today is Anthony, and the work he wants to present is on Geni, which is a Clojure interface to Apache Spark.
A
We haven't seen much focus on Spark so far in the Scicloj data science discussion, so we're hoping that may change, and that we might understand a bit more and make this as good as possible.
A
We're going to start out with Anthony presenting Geni, and after that we want to dig into the details, so it will be up to Daniel, Vijay and me to take notes and figure out where we want to dig deeper during the presentation. Yeah, so over to you, Anthony: what is Geni, and how do we think about it?
B
So I'm going to be talking for 15 minutes before we get into just the informal chat, I guess. And actually, I've presented Geni before, I think about a month ago, at Scicloj as well, but that was in a lightning talk, so I had only five minutes. So this is going to be an elongated version, where I can elaborate a little bit more on some of the details and some of the design goals for Geni.
B
Hopefully, so yeah. The plan of attack for today is: I'm gonna briefly talk about what it is, then some of the design goals that go into developing Geni, and some of the future plans, tentative, but we'll go through them. So what is Geni? I'd like to call it a Clojure dataframe that runs on Spark, and the first thing is that it's an idiomatic Clojure dataframe.
B
What
that
means
is
that
it
should
be
nice
to
read
enclosure,
and
it
should
be
nice
to
write
as
well
enclosures
it
doesn't.
It
shouldn't
look
foreign,
and,
apart
from
that,
it's
it's
a
it's
a
data
frame
library
right,
so
you'd
expect
that
it'll
be
able
to
do.
You
know
some
of
the
typical
stuff
like
reading
reading
data
from
file
counting
number
of
rows
and
seeing
what
columns
you
have
right.
B
So this is an example of a group-by aggregate operation in Geni. It will look similar, familiar even, if you're a Spark user, because a lot of this resembles Spark and the method chaining that's very common in Scala, but it should also look like Clojure. So we can see that it understands keywords, and yeah, it uses the threading macro. All right, so this is group-by, aggregate and sort.
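The kind of query he is describing looks roughly like this. This is only a sketch based on Geni's README; the dataframe `df`, the column names, and the exact aggregation helpers here are assumptions, not the example shown in the talk:

```clojure
(require '[zero-one.geni.core :as g])

;; Hypothetical dataframe `df` with :brand and :price columns.
;; Keywords name columns, and the ordinary threading macro chains
;; the group-by, aggregate and sort, Spark-style.
(-> df
    (g/group-by :brand)
    (g/agg {:avg-price (g/mean :price)
            :max-price (g/max :price)})
    (g/order-by (g/desc :avg-price))
    g/show)
```

Note how the same shape reads naturally both to a Spark user (method chaining) and to a Clojure user (threading macro plus keywords).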
B
So this is an example of how you build a supervised learning pipeline, and it looks very similar to Spark as well. In Spark you'd just point to which columns are your feature columns, and then maybe you want to do a PCA on that, just to do dimensionality reduction, and train an XGBoost model, right, and then you put everything in a pipeline like this.
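A pipeline along those lines might look as follows. Again, this is only a sketch from Geni's ML namespace; the stage option keys, the column names, and the logistic-regression stand-in for XGBoost are assumptions:

```clojure
(require '[zero-one.geni.ml :as ml])

;; Point at the feature columns, run PCA for dimensionality
;; reduction, then train a model; he mentions XGBoost, but a
;; plain logistic regression stands in here.
(def pipeline
  (ml/pipeline
    (ml/vector-assembler {:input-cols [:x1 :x2 :x3]
                          :output-col :raw-features})
    (ml/pca {:input-col  :raw-features
             :output-col :features
             :k          2})
    (ml/logistic-regression {:features-col :features
                             :label-col    :label})))

;; Fitting returns a model that can transform new data.
(def model (ml/fit training-df pipeline))
```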
B
So that's what training a machine learning model looks like. And the third thing is that there's RDD support. An RDD is a resilient distributed dataset, which is just lower-level Spark. A lot of what you do in Geni is use the built-in functions that are available already, but when you need to do custom computations, you need to drop down into lower-level Spark, and that's where you use RDDs.
B
And finally, it comes with its own command line interface. I'm going to elaborate more on this in a bit, but the idea being that it needs to be fast for you to start up and start querying stuff. That's why it comes with that, but more on that in a bit. But I mean, a fair question would be: why?
B
Why would you be using this? I think it's sort of like an intersection, or rather the union, of why Spark and why Clojure, and maybe why Scicloj. I won't elaborate much on why Clojure, but for why Spark I think there's a lot that we can say. First, there's a lot of developers, and there's even a company backing it. Then, speed and scalability.
B
Very often it's top of the class in terms of a big data framework, and you get that for free. It runs everywhere, so the same code will run on your laptop, on your desktop, on a cluster, which is nice. It's mature, you know.
B
It's production-ready, and one thing I really like is that it has a nice composable API that really looks like SQL, which is something you don't get with pandas or R's data.table, so that's definitely a plus. And why Clojure? I think the REPL experience is really unparalleled; that's one thing that we can really emphasize with Clojure. And, you know, it sparks joy. There's something about doing your data analysis in Clojure that's really nice, that... I don't know.
B
I can't quite put it into words, but the experience is amazing.
B
So that's just a very quick introduction of what it is. The second section I want to talk about just elaborates on some of the design goals. For me, working as a data scientist, I work with this feedback loop a lot: you have some idea about the data that you're working on, then you need to translate that into a query, and you wait for your query to give you some answers, then you build on that idea to get more ideas.
B
You know, you just keep on iterating through this loop, and for me a lot of the design goals really go into optimizing this loop. You need to be able to go through your ideas very, very quickly, and one of the important factors is that getting started must be fast.
B
And secondly, once you have an idea, you want to translate it into a query. That also needs to be fast, so that goes into the DSL and the conciseness of your queries. And finally, with Spark you get this really nice query speed. You get the results quickly enough that your train of thought doesn't get interrupted. Just to elaborate more on that:
B
The first thing is a fast and accessible REPL. Very often you'll be thinking about the dataset that you're working on, and you just have that one question that you want to answer, and Python is amazing for this, because it starts up very quickly. Then you import pandas.
B
Do your query, and you're done. You get away with this if your data is small enough and your query is instantaneous, so Python is really good, and I want to be able to do this as well with Clojure and Geni. So if I need to do a lein new, that's too long, that's too much time; if I need to do a require, that also takes some time; and I need to pick my dependencies.
B
So literally, you type in geni, you step into the REPL with Geni required, and you can just start typing straight away. It's still not as fast as Python, unfortunately, and a lot of that actually comes down to Spark being, you know, not so fast to start. R and Python are definitely really good at this. But if your query, or that one question that you're trying to answer, is going to take you more than a couple of minutes...
B
Then seven seconds is not bad. Apart from that, I think you'd still probably want to go for R or Python. The second thing is translating that idea into a query, so being able to write queries very fast. There's nothing stopping you from writing pure interop; this is what you'd write if you're dealing with vanilla Spark via the Scala interop. A couple of things:
B
Clojure data structures, and it's just a little bit nicer and quicker to write. Then also, you want to be able to interact with Clojure directly, so that if whatever you're doing returns you a Scala sequence, or a Spark Row, which is very much like a Scala case class, then you need to do this unpacking. That's not very nice. So when you're done with the stuff, it should give you Clojure data structures, so we pay...
B
You know, particular attention to that. And finally, the other arrow in the loop is query speed, and you get this from Spark for free; there's a lot of people working on Spark performance. So this is just a nice group-by aggregate example, where you write to disk 24 million rows with a million groups.
B
Then Geni is, you know, very competitive here, because it's just Spark. I think in the last lightning talk I said that Python was really slow, but that's just because I was writing my pandas probably not so optimally; it can be competitive as well. But Geni is up there, like it's fast.
B
Some other goals: I'd like to say that Geni is to Spark what Clojure is to the JVM and ClojureScript is to JavaScript, in that it really embraces the host. It's just using all the facilities, and the spirit is really captured by David Nolen's quote from his talk Parasitic Programming Languages, where he's talking about creating mobile apps on ClojureScript, saying React Native is, as he puts it, somebody else's problem.
B
That's somebody else's problem, and in the same way, Spark is not Geni's problem; it's somebody else's problem. You just need to make sure that the bridge is good, so that when you're writing it, it feels like Clojure, without getting in the way of Spark.
B
So that's the idea. Also, you get this nice feature coverage, which is not possible if you're writing everything from scratch. In the core namespace there are approximately 400 functions, and there's a lot of machine learning models there already, with RDD support as well.
B
So I couldn't imagine writing this from scratch and being competitive in terms of performance and everything else; there's just no way. And you know, I'm only doing this spending like half an hour to an hour a day, every day, on Geni.
B
So this is what being parasitic allows you to do, and also being very shameless about borrowing other idioms. Obviously Clojure idioms: stuff like remove, stuff like, I don't know, inc and dec. They're not available in Spark, but as Clojure developers we have a pretty good idea of what these things mean, so they should be there to operate on columns. And then, I used pandas.
B
I used to use pandas a lot, and there's a lot of stuff that's pretty neat there that doesn't exist in Spark, and Geni just shamelessly borrows it. Again, this goes back to writing fast queries: it's there, and whatever comes to your mind, you write it then and there. And an easy getting-started experience: this is a bit of a pet peeve of mine with Clojure.
B
Right, I want the getting-started experience to be as easy as possible, starting from a clean slate. This is very much taken from, inspired by, borkdude's babashka and clj-kondo, where you have an install script, you make it executable, you move it to your path, and you're good to go. That's all you need to do, and it should work. The only other dependency is Java. That's it, nothing else, not even Leiningen.
B
And also beginner-friendly documentation, you know, getting started. I don't want people to get lost, but I haven't had much feedback on this; it'd be great if some people tried it out and we got feedback on what's missing in the docs. And finally, it's a bit of a hobby project for me, so it should be fun, and some of the stuff that I like doing is 100% test coverage.
B
It doesn't really mean much, but I like looking at it, so that sparks joy, and also this kind of meticulous continuous integration pipeline, you know, just testing everything. I don't know, I kind of like it, so yeah.
B
That's probably the last goal. And just very quickly, some future plans: there are four main modules in Spark. There's Spark SQL, there's the Spark RDD, there's Spark's machine learning, and there's Spark Streaming. So I'm working on Spark Streaming, so that we get this nice coverage of the entire library. Then, better documentation: I think Django has an exceptional documentation style.
B
I think we could probably learn a lot from them, so I think that's next. And integration with other Clojure data libraries: I'm specifically thinking about a zero-copy path to tech.ml.dataset. I think we've talked about that a lot, but also, I don't know, Notespace and Oz. Spark doesn't come with a nice visualization library, and it'd be nice to have that as well.
B
More borrowed idioms: my experience is in pandas, Dask and Spark, but that's about it, so other idioms from R and from Julia, let's say, would be awesome. And also a smoother experience when you're deploying it on a cluster, because that's what Spark is good at; we should leverage the free facilities, which at the moment are not there. So those are just some future plans that I have, and that's it for my 15-minute introduction of Geni. Thanks for listening.
A
I kind of would like to take a step back first and just go into the history, because when I hear you present, I hear that you're making data science work both in a technical and a business sense. But I'm wondering, how did you get there, and why did you end up with the combination of Spark and Clojure? Can you...
B
Yeah, so working with data on a daily basis, I guess, started when I was doing my master's in Oxford, in applied statistics. The main language was R, so we all used RStudio, and that was okay. That was nice.
B
No complaints there. Then I moved to an algorithmic trading startup, where we used a lot of Python, a lot of pandas as well, and I started running into a few issues, performance-wise. I wouldn't say it was bad, but it was hard to predict, and you had to do a lot of profiling to make stuff work.
B
Some of the stuff is pretty unintuitive, and you know, it's very permissive of side effects as well, so whatever pipeline you come up with is not so amazing, I would say. And then I moved to Agoda, which is, you know, a bigger tech company.
B
They use Scala Spark a lot, and honestly, Spark was awesome. You had this luxury of working in a big company where computational resources are just not an issue; you're encouraged to think that storage is unlimited, and you just work away with that. But the problem I had is that the main way you're encouraged to work with it is through a notebook, and this rubbed me the wrong way, really.
B
And notebooks, I think, are very problematic. Maybe it's okay for research, but when you start to productionize it, you need tests, you need all the other stuff; you need your terminal, I feel. So what I did when I was there was, you know, there's the notebook, and I'm just gonna connect to it via the command line interface, but then there's still some issues there.
B
So you're still really running a notebook, so you don't have much control over your environment, or maybe I didn't know how to do it, and also back then it took like a minute to start up. So it was...
B
I don't know, I missed some things from Python and pandas, like the really fast connection and just getting stuff done then and there. But at the same time I loved the Spark performance and the API, the fact that it really looks like SQL. So fast forward to 2019.
B
So let's say you want to add another library to whatever it is you're working on. Then you need to sort of recompile your jar and send it to the master, and it's just a lot of hassle.
B
I don't know, and then I've got this sort of hacky way of connecting via the command line interface, and the whole experience is just not so nice. And again, I was working on a pretty big code base, so maybe it was me not understanding how to maneuver some things, but I did miss the fact that you can just type ipython and then you're good to go.
A
Because the notebooks, the Spark notebooks, were hosted for you, in a sense.
B
Yeah, yeah, right. So everybody was, I'm guessing, connecting to the same thing; I mean, there's the resource scheduler, so you ask for resources and then they give them to you. But getting the extra dependencies there, it's just not... again, it could be my shortcomings.
B
So I'd just get a very bloated jar with everything that I could potentially want and then just put it there, which is again probably not so good when you try to productionize it. But yeah, so I missed some things about Python, I really loved some things about Scala Spark, and I'd always wanted to do Clojure.
B
I'd been wanting to learn it for a couple of years before actually jumping into the language and executing a couple of projects using it. Then I thought, you know, one day I've got to execute this data project, and I'm using Python, pandas and Dask, waiting for ages, and I thought, you know, if we could just do this in Spark...
B
It'd be okay, but then my other data analysts would have to learn Scala. Or, I don't know, for some reason I didn't really consider PySpark; maybe I should have, but yeah. So I thought, you know, I'm gonna wrap some things in Clojure, and we already know how to do Clojure, because we've done Clojure before, and then, hey.
B
You know, some stuff actually worked really nicely, and I got like a 30x speed-up on some of the important queries, and that was good, and then I kept building on Geni. That was, I guess, the story.
A
So I'm still curious about a few bits, because you were working at this fintech place and then you were working for the big place, but right now you're...
A
For, or you're one of the founders of, Zero One.
A
So what led you? What led you there?
B
Oh, so I got married in 2018.
B
And then my job at Agoda was based in Bangkok, and I wanted to move back home to Indonesia, so that was the motivation. I thought, you know, before having a baby, if there's ever a time for me to start my own thing, and maybe potentially fail, who knows, that was the time. So I thought, you know what, I'm going to quit and do my own thing, and yeah.
A
Well, I guess you started the company. Fortunately, it worked out, and you're doing data science stuff. I'm...
B
We're a service-based company, so we work with people that have data; I'm not necessarily doing stuff with our own company's data.
B
There isn't much data there, really, but we work with one of the biggest retailers in Indonesia, and they have millions and millions of customers. I still wouldn't call it big data, because it's like a few hundred gigabytes of stuff. It's fine; it'll run on a single machine. So yeah, that's what I'm working on now.
C
First of all, I mean, it's a really, really nice project, by the way. I've been doing some Spark for some time, mostly Scala-driven things, and recently I think most of the code is driven by Spark SQL rather than any of the other things, because SQL has more reach; there are more people who can write SQL.
C
So how is the interop story for Geni here? Is it something that is calling the Scala API behind the scenes, or how is it working?
B
Oh, okay. So for Spark SQL it's easy enough to call the Scala API, and for machine learning as well, but for RDDs and for Spark Streaming it gets quite tough, I think, to work with the Scala interop. So Geni is using the Java interop for Spark Streaming.
C
Okay, and because I see that you're on Spark 3 already, or is it on Spark 2, or...
B
I mean, the thinking with that is that, with Geni, as far as we know, we're the only ones using it.
B
So there's nothing to break, really. We have lein-ancient going on there, and we just make sure we keep up to date until we do a release, like a beta release, right, yeah. When we use it in production, then we can finally say to people that, you know, it's...
C
Yeah, so you're using Dataset, but I'm curious about the mapping, because Datasets are statically typed; that's the main idea behind Datasets compared to DataFrames, at least. So how does the conversion to Clojure-driven things work? If you get a Row, then how is it converted into Clojure maps?
B
You just get whatever Scala thing you get, and then we've just got some rules of thumb: if it's a sequence, convert it to a Clojure sequence, yeah.
B
If it's a Java array, convert it to a sequence, and so on, and so on.
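The rule-of-thumb coercion he describes could be sketched like this in plain Clojure. This is illustrative only; the function name and the exact rules are hypothetical, not Geni's actual internals:

```clojure
;; Recursively coerce interop results into Clojure data:
;; Java arrays and java.util.List instances (how Scala Seqs often
;; surface through the Java API) become vectors; everything else
;; passes through untouched.
(defn ->clojure [value]
  (cond
    (nil? value)                     nil
    (.isArray (class value))         (mapv ->clojure value)
    (instance? java.util.List value) (mapv ->clojure value)
    :else                            value))

(->clojure (into-array Long [1 2 3])) ;; => [1 2 3]
```

As he notes right after, code like this does a lot of type checking per value, which is why the coercion layer is not the most performant part, but it only runs on collected results, not inside Spark's own computation.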
B
I mean, actually, that part of the code is not super performant, I would imagine.
B
It's doing a lot of checking, yeah. But thankfully, for a lot of the stuff that you do on Spark, the heavy lifting is done inside of Spark.
B
When you collect, when you get the stuff out, there isn't so much stuff there.
B
It's been working okay for me.
C
Yeah, yeah. So, if I understand correctly, if I download Geni now, it's essentially talking to local Spark by default, right? Yes.
B
For a cluster, I've only tried it with GCP Dataproc, where you can do the same thing, but with your Spark config already there, and then it'll connect to YARN, yeah.
B
Instead of local. But that's one part of the project that we really need to work on, because at the moment a lot of it is just local. We haven't had the use case where we need a big cluster to execute the stuff that we need to execute. But honestly, Spark local: people like to say that it's not good for small to medium data, but it's actually pretty good.
C
Yeah, totally. I mean, I understand that from the performance point of view. For example, the cluster that I work with at work...
C
Its storage capacity is 380 terabytes, with real metal nodes, like 20 beefed-up metal nodes. And it's not just the data side; it's also the idea behind it, because the cluster is Kerberized, you know, everything is secure.
C
Via, you know, Kerberos tokens and all that fun stuff. I would say that would certainly open it up to a lot more folks who want to do this. As you said, most of the data is living on a cluster, and yeah, in terms of computational resources...
C
It's much better, because I don't need to copy all my parquet files, which are probably, you know, a terabyte, onto my computer and then do the work; and it's not even allowed in some environments, because you have to work on the cluster and get a key. So do you have any plans in that direction, or do you think the design itself is compatible with that mode for Geni?
B
I think it is; it's just that I haven't really explored it, but it's in the pipeline, yeah. I will try to write guides to deploy stuff on those three things at least: GCP Dataproc, AWS EMR and Databricks, yeah.
B
For those three things, ideally, you can use Geni.
C
Yeah, yeah, because exploration-wise it's awesome. You're familiar with Clojure, and you're giving people who are familiar with Clojure access to Spark really quickly, without them having to go to Scala or Python.
C
Obviously, the next step would be: okay, I built my program, it looks awesome, but I want to deploy this on the cluster now. So that's going to be, I think, a really, really interesting point there. It's nice.
B
It's something we're going to look into, yeah. For now, our use case is batch jobs run locally, as a drop-in replacement for pandas as our dataframe. That's...
C
It means it's really hitting the right sweet spot, I would say, because you're taking advantage of, as you said, Spark's computational engine and then giving a Clojure way into it. Otherwise I'd need to do maps and, you know, the general Clojure data processing. I think this makes it way better.
C
I think it's on par with, you know, data frames and pandas.
C
You're getting that fancy thing, and usually Spark is more on the cluster side rather than on the local side. So that's the...
B
Yeah, so even on Spark's subreddit on Reddit, people are saying, hey, if your stuff fits in memory, you don't need Spark.
C
Yeah, yeah. I mean, the thing is, it's also because you know the functions, you know the framework, so I don't need to learn another new framework to do my work, which is just the interesting part. And then with the same code you can scale it up to 200 machines, without translating it into something else. Yeah.
C
There's lots of interesting tech there, but there must be some challenges, right?
B
Yeah, yeah, okay. I mean, the first thing is that I don't have that much experience in Clojure itself; I've been using it professionally since December 2019, so it hasn't been a year. So learning the language itself is something. Then, yeah, the Scala interop can really be tricky, but a lot of that has actually been solved; you take bits and pieces from other libraries.
B
Yeah, and I think you're okay, but then some stuff, like traits and implicits, doesn't translate so nicely.
C
Yeah, implicits are already a pain in Scala, so I would really not want to have them in Clojure. No, no.
B
I don't think they translate so well to Clojure anyway, yeah.
B
I mean, with the conventions and everything, one pain point is that you need to do a lot of reflection, yeah.
B
Inspecting, like, okay, what is this guy actually requiring?
B
How do you deal with that, yeah. And I ran into a lot of trouble with the RDDs, because functions need to be serializable, which is tough. I'm basically taking a lot from sparkplug, which is Amperity's. I mean, I know that they're using Spark, but I'm not sure they're pushing everything to GitHub; but whatever is there is actually super useful, and that's used in Geni as well.
B
Yeah, but I claim no credit for that. That's taken straight out of sparkplug, using that model and then wrapping the other stuff, the RDD methods, on the RDD ecosystem, yeah.
B
You can use it with Geni. Okay.
C
But it is very impressive, by the way. I mean, you just started Clojure; I wouldn't pick this as my first project if I was...
C
With just six months, or, you know, just getting started with the programming language, and then going to this level. Spark is a complex beast; I've been working with Spark since Spark 1, a pretty long time ago, when there was no Spark ML or anything, and also Spark MLlib and that fiasco. But using a completely new language and then trying to work on this super complex thing, that's really admirable, by the way.
C
What do you call that, like a Clojure Spark thing, a REPL?
B
And then, can I rewrite my current query that's running in Python, yeah, in Clojure...
B
So that's kind of the story. For me, Spark SQL and Spark ML were pretty okay to wrap in Clojure. RDDs and streaming are a bit more difficult, and for me, even if it didn't have RDDs or streaming, it would still be useful, yeah.
C
Yeah, of course, yeah. I mean, streaming is reasonably new; it started with Spark 2.2 or so, structured streaming and everything. But I'm curious how that maps to idiomatic Clojure as well, and I think you're making me explore that a bit. So I'm curious about the streaming side of it, because there are these streams and types there and all that stuff. So yeah.
B
At this point, it's just about making sure all the methods are there to be used.
B
You kind of have to be familiar with the Spark ecosystem, yeah, and then there's like a one-to-one thing to calling the right methods, and you're okay, probably, but...
B
Definitely not at the moment; I don't think I know enough about RDDs and streaming to build another grammar on top of them.
C
Yeah, yeah. And is it just your project, or how many people are working on this?
B
Just me, actually, but then...
B
But then next, if we have a good fit for another data project, I'll teach the other team, my entire team, to use it.
B
My team is like three people, so yeah, we'll see how that goes. But at the moment it's just a little bit of fun. I spend like one or two pomodoros every day, just...
B
Yeah, trying to do stuff, adding stuff to Geni, yeah. But you know, there's a lot of little stuff. I don't know, I'm just tinkering about, really.
C
But it is impressive, especially when you show the test suite and also the CI stuff, that you are making sure that everything is working properly. That's extremely disciplined, I must say. I mean, I've been in the industry for almost 20 years, and even normal commercial projects don't have this kind of, you know...
B
It definitely follows Uncle Bob's clean code kind of thing.
B
I kind of like it, but yeah, no, that's how we write production software as well at Zero One Group, so it's just a practice that carries over to our open source project.
C
Yeah, nice. So you picked up Clojure, and now you're on Spark. I was curious about your experience of learning Clojure itself.
B
I programmed in JavaScript, Python, and Scala, mainly. But then I've always had this huge interest in Haskell.
B
I still want to dabble in it. I'm a huge fan of functional programming; that's really one thing that I really, really like. But I don't think that in Indonesia, or anywhere, starting a software team with Haskell... that carries a lot of risk. Still, I wanted it to be a functional programming language.
B
I think I was left with F#, OCaml, and Clojure. Those were the three that I was seriously considering.
B
And then we met someone who really had a lot of experience in Clojure, and it became a no-brainer. You want to have that mentor, right, showing you the light when you encounter these edge cases. So we went with Clojure. That was a switch from Python to Clojure, in December 2019.
C
Impressive. Anyway, I'll certainly give it a try, because I haven't tried Clojure-driven things. As I said, my day-to-day work is with a big cluster, with lots of data, and with an enterprise-y way of dealing with things, because I work in fintech and we have a lot of regulations.
C
Around the data and everything, and security and all that stuff. So I'm curious to give it a try and see how it pans out. And I really like the documentation and the way that you're managing the project. It's super nice.
B
Yeah, thank you. The one thing I was looking at is that it might not be compatible with Livy, if that's what you're using.
B
Okay, so because with Livy you're sending text, either Python code or Scala code, right? So that wouldn't work. Then I thought, you know, you have something very similar to that, which is nREPL. So if you have a master that's running on the cluster, and you can connect to the nREPL on that master, then you should be good to go.
C
Yeah, so I just need to start the driver with an nREPL, and then you can...
B
Connect to that one, okay. So hopefully that works. What I did with Dataproc was just grab the Geni source code and put it in an uberjar.
C
Nice, that's super nice. This is a really fascinating project, by the way. So congratulations on putting it into production and using it, and also on open-sourcing it. That's giving back to the community, and that's really admirable, especially with the effort you're putting into maintaining it and the documentation, and setting a good example, I would say. I think I'm speaking too much again. Daniel? Teodor?
C
Please, please stop me if I'm going on.
A
I wanted to ask you a little bit about your team, because, Anthony, you mentioned the rest of your team, and you were curious about whether you were going to introduce Clojure and Geni to them.
A
Can you talk a bit about how you're working right now, what their competence is, and what kind of obstacles you see in bringing them over to Clojure and Geni?
B
Yeah
yeah,
so
I've
got
this
pet
peeve
right
of
data
scientists
not
being
able
to
be
as
being
not
fully
versed
in
software
engineering.
So
I
think
that's
that's
an
you
know.
You
encounter
data
scientists
so
without
software
engineering
background,
but
I'm
adamant
that
that's
not
something
that
I
want
to
have
in
the
company
right
so
that
if
you're
a
researcher
and
my
team,
you
have
to
be
able
to
do
some
back-end
programming
as
well,
and
you
dabble
with
the
software
team
as
well.
B
Sometimes
so
so
everyone
in
the
team
knows
closure.
So
that's
that's
not
really
an
issue,
but
then
we
we
haven't
had
more
data
projects.
If
I'm
honest
with
you,
we've
got
like
mathematical
modeling
projects,
so
yeah,
so
we
haven't
really
had
the
chance
to
do
that.
So,
but
then
the
next
one,
probably
yeah.
B
Yeah,
so
the
what
I'm
referring
to
like
mathematical
modeling
is
that
you
you
go
to
the
client
and
then
you
see
their
processes
and
then
you
make
your
assumptions
and
then
usually
it
boils
down
to
an
optimization,
a
constraint,
optimization
problem.
If
it's
linear,
then
great,
you
use
something
like
or
tools,
linear
programming
and
you're
done.
B
If
it's
non-linear,
then
you
use
some
other
tools,
even
genetic
algorithm
right.
So
these
kind
of
things,
that's
what
at
least
my
team
is
dealing
with
at
the
moment.
B
Yeah
so
one
project
that
we
successfully
delivered
as
for
a
flower
company,
you
know
so
flour
to
make
bread,
noodle
and
all
of
these
stuff
right,
so
they
make
their
flour
based
by
milling
the
wheats.
B
But
then
weeds
are
agricultural
products,
so
their
prices
fluctuate
their
protein,
wet
gluten
and
moisture
fluctuate
their
price
that
their
delivery
might
be
late
right,
so
they
have,
they
must
produce
flour
that
has
constant
quality
and
they
have
their
sales
forecast
with
ever
fluctuating
raw
material
right.
So
what
we
made
for
them
is
this
optimizer
that
which
is
just
a
constraint,
optimization
problem,
a
large-scale
one.
It
has
like
100
000
constraints
or
something
like
that.
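The blending problem described above can be sketched in miniature. This is a hedged illustration only: the prices, protein values, and two-wheat setup are invented for the example (not the client's model), and it uses a brute-force grid search instead of a real LP solver like OR-Tools, purely to show the shape of the constraint optimization.

```python
# Toy version of the flour-blending problem: choose fractions of two
# wheats (invented numbers) to minimise cost subject to a minimum
# protein constraint.

def cheapest_blend(wheats, min_protein, step=0.01):
    """Brute-force search over blend fractions of two wheats."""
    best = None
    steps = int(round(1 / step))
    for i in range(steps + 1):
        x = i * step  # fraction of wheat A; wheat B gets 1 - x
        protein = x * wheats[0]["protein"] + (1 - x) * wheats[1]["protein"]
        if protein < min_protein:
            continue  # infeasible blend: protein floor not met
        cost = x * wheats[0]["price"] + (1 - x) * wheats[1]["price"]
        if best is None or cost < best[1]:
            best = (x, cost)
    return best

wheats = [{"price": 300.0, "protein": 14.0},   # expensive, high protein
          {"price": 220.0, "protein": 10.0}]   # cheap, low protein
frac, cost = cheapest_blend(wheats, min_protein=12.0)
print(round(frac, 2), round(cost, 2))  # → 0.5 260.0
```

The real system replaces the grid search with a proper solver, since 100,000 constraints rule out brute force, but the structure — inputs, constraints, objective — is the same.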
B
It tells them, for every flour, what raw material to use and what they should be buying.
B
No, yeah, for me it's just mathematical modeling, just an optimization problem. You have a bunch of inputs, you create the problem, you solve it, and you give it back to them. There is no machine learning involved, no data cleaning involved, because you know exactly what you're going to be given. So no, not really.
C
Yeah, yeah, I was wondering, because you're talking about data science and data modeling sorts of things: what's your opinion about Clojure for data science? Obviously this is a Scicloj interview, and we're interested in data science work in Clojure. Python is more or less the lingua franca of data science, and you have experience in Python as well.
B
I think if Clojure had the same kind of ecosystem that Python has... It's a much nicer language, and with the REPL and everything, it's a much nicer environment to be working in as a data scientist.
B
Because, again, it all goes back to that feedback loop. Exploratory data analysis and data cleaning are one thing, which you can do with Spark SQL, but then there's plotting as well, which you can't really do with Geni, yeah.
B
But the ecosystem is not there, right?
B
You'll run into problems: there isn't a mature NumPy kind of counterpart in Clojure, there isn't some kind of scikit-learn there, yeah.
B
So yeah, you're missing a lot of things.
B
I think NumPy is a huge thing: you do a lot of things in NumPy. NumPy, scikit-learn, and something like Torch, TensorFlow, or JAX — an automatic differentiation library.
B
Another thing: a lot of people like working with notebooks. I don't, but I think it's very important, to gain that market share and to make a complete data ecosystem. And as far as I understand it — I haven't played around with that, you'll probably know more than me, Daniel — it's probably not there, right? There isn't a reason for you to jump ship, let's say, from a Jupyter notebook to Clojure.
C
Yeah, it's a bit of an unfair comparison, right, because Python had a head start. I think NumPy is, what, 10 or 20 years old already now.
C
It's actually older than Clojure itself. But I think the best option is — because Clojure has been, you know, compatible with its host — to utilize the stuff that's already there. I think that is its spirit. If I can comment on the project that you're building: this is the same spirit, right? Utilize Spark and then provide a fantastic experience using Clojure to take advantage of that system. So probably, Daniel, you know more about this as well.
C
I mean the Python-Clojure interop work: if we can get there, then suddenly it opens up a host of possibilities, because that's what happened in the ClojureScript ecosystem, right? Because it's on JavaScript, you suddenly have access to all these things, and then we just need to make sure that the tools are around, you know.
C
If that works in a good way, then it opens up a lot of possibilities, because you can't possibly say, okay, I'm going to start a NumPy clone in Clojure. I'm pretty sure that after building two matrix computation functions I'd be like, done, I'm not going to build all of it. That would be a humongous task. So I think taking advantage of what is there, and making sure that the interop is awesome — that's how we got away with it in Clojure.
C
You know, using every possible Java library out there, at least in the beginning. And if you look at most of the libraries, the Java interop things have a very, very tiny surface area: they just expose what they built as a really small wrapper around the Java stuff.
C
So, anything that you can pick up. I would say I think that would be a nice approach to fill the gap. Otherwise — we're such a small community compared to the number of available Python programmers in general — it would be a monumental task to say, okay, we're going to redo scikit-learn or NumPy. And then you still have to reach out to the Java ecosystem anyway, because if you want what NumPy gives you, then you need proper...
C
...good precision for all the numerical computation. That means you need to drop down to the Java level and use some library there.
D
About NumPy: I think Dragan's work on Neanderthal is a huge project, and it does bring really high-performance numerical programming with arrays, so that is not missing anymore, I guess. All the linear algebra parts are really mature. But I guess when you go into some specific applications, like time series analysis...
B
And I think really the answer to that, at least in my opinion, is just to wrap NumPy and be able to use it very nicely in Clojure. I mean, Dragan's doing an amazing job, but there's no way to compete with hundreds of developers working on NumPy. They just have a lot more surface area. If you could just wrap it nicely, and make it so that it's nice to interact with other parts of Clojure code, I think that's probably the way to go.
A
I
think
we
have
an
advantage
in
some
sense,
though,
and
when
we
write
closure
tools,
we
tend
to
write
them
small
and
that's
been
said
so
often
that
it's
become
this
kind
of
trope,
but
in
this
case
comparing
using
jenny
to
spark
notebooks.
A
If
you
want
to
use
numpy,
you
have
to
deploy
your
thing
and
you
have
to
control
your
dependencies,
and
you
have
to
do
that
piece
of
work
that
also
it's
kind
of
a
solved
problem
in
the
closure
space
and
in
in
that
same
sense.
I
I
really
like
that
in
jenny,
you
just
you
just
make
this
little
piece
that
can
be
used
in
whatever
way
we
want
to,
and
you
also
made
the
cli
to
make
it
easy
to
get
started
with,
but
it
can
be
used
from
the
repel.
A
So
it's
not
like
the
only
way
to
use
jenny
is
through
the
cli.
So
that's,
for
instance,
since
I'm
using
emacs,
I
won't
be
able
to
get
the
same
kind
of
editor,
help
that
I
would
be
getting
otherwise.
A
So
I
feel
like
this,
this
approach
of
building
the
small
composable
things
in
a
sense
counterbalances
the
huge
ecosystems
that
we
have
to
yeah
compete
with.
B
I guess, yeah. For me, Geni is kind of a smallish project, right, because I'm not doing any of the computations; it's just wrapping. So we could still have NumPy in Clojure. It's not going to be such a huge project, because you're just wrapping stuff. Okay, I don't know, this is again... But also, as soon as you do that: what NumPy are you using? What Python are you using?
C
But
that's
the
that's!
The
impedance
mismatch
that
you
have
when,
when
you're
bringing
in
you
know
interop
with
a
language
like
python,
which
is
a
bit
of
a
different
model
underlying
model
compared
to
java
stuff,
so
I
think
the
the
main
advantage
being
on
jvm
is
that
you
know
you
can
just
you
can
interrupt
its
color.
You
can
drop
the
jruby
whatever
you
know,
because
they're
all
underlying
tech
is
still
java
and
jvm.
So
it's
much
easier
there
I'm
curious
in
terms
of
the
yeah
and
then
how
lip
python
cld
is
working.
C
So
I
might
need
to
look
it
up
how
far
we
are
at
on
that
one.
B
It's
all
there
with.
B
Yeah, so I had to run that on Linux, okay. But really, for me, the startup time is super, super important, because you want to get started as soon as possible.
B
So even something as simple as going through the cookbook or some of the guides and saying, hey, this doesn't work, right? And at the moment none of the functions have any docstrings, by the way. I really want to find a way to just import all of the Scala docstrings and put them there.
B
That's
that's
in
the
works
and
then
yeah
some
some
help
on
like
deploying
stuff
in
the
cluster.
I
think
that
that's
that's
going
to
be
a
big
deal
and
yeah
yeah
feed
feedback.
All
around
I
think
would
be
would
be
awesome
because
at
the
moment
like
I
know
it
works
for
me.
I
don't
know
if
it
works
for
someone
else.
B
Yeah
someone
did
raise
an
issue
right
saying
like
hey
the
the
tests
they
don't
work
turns
out
that
they
only
work
if
you're
at
certain
time
zones,
just
terrible
so
much
for
100
test
coverage
right.
B
So
completely
repeatable,
but
only
by
by
certain
devices,
no.
D
B
That's
terrible,
like
you
know,
I
really
think
like
there's
it's
it's
only
going
to
get
more
robust
if
people
use
it
so
these
kind
of
issues
yeah.
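Time-zone-dependent test failures like the ones described are usually a symptom of code that turns an instant into a local date using the machine's default zone. A small sketch of the mechanism (in Python rather than Clojure, with example zones chosen for illustration): the same instant renders as two different calendar dates depending on where the test machine happens to be.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One fixed instant in UTC...
instant = datetime(2020, 9, 27, 23, 30, tzinfo=timezone.utc)

# ...rendered as a *local date* in two different zones.
jakarta = instant.astimezone(ZoneInfo("Asia/Jakarta")).date().isoformat()
new_york = instant.astimezone(ZoneInfo("America/New_York")).date().isoformat()

print(jakarta, new_york)  # → 2020-09-28 2020-09-27
```

A test that asserts on the local date will pass in one of these zones and fail in the other, which is exactly how a suite can be "completely repeatable, but only on certain devices."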
C
Yeah, nice. I'm not sure — this might be a small digression — but I think there was this internet joke, actually based on some kind of fact, a long time ago, that an email can only go about 500 miles.
C
So there is a university, and there is this sysadmin guy there, who posted the story later. They keep sending emails, and then they get a complaint from a university professor or something, people saying, "I cannot send email beyond 500 miles" or something. And it was fascinating, because how can emails stop after 500 miles? And then apparently they realized that, based on the number of hops, and based on the...
C
...speed of light, essentially, the signal passing through. At some point there was a router misconfiguration, and that is the one that was bouncing the emails back; it wasn't getting a reply. So from the people's point of view it was 500 miles, and from a tech point of view it is completely different. It was a fascinating story, a fun thing. So it's something like that: okay, all the tests work, but only as long as you're in my time zone.
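The arithmetic behind the 500-mile story is simple. In the well-known retelling (the ~3 ms figure comes from that retelling, not from this conversation), the misconfigured server timed out connections after about 3 milliseconds, and light only travels so far in that time:

```python
# "500-mile email": how far can a signal travel in a ~3 ms timeout?
SPEED_OF_LIGHT_KM_S = 299_792.458
KM_PER_MILE = 1.609344

timeout_s = 0.003  # ~3 ms connect timeout from the story

radius_km = SPEED_OF_LIGHT_KM_S * timeout_s
radius_miles = radius_km / KM_PER_MILE

print(round(radius_miles))  # → 559
```

So any server more than roughly 500-and-some miles away could never answer before the timeout fired — physics disguised as a mail bug, much like a time zone disguised as a flaky test.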
C
Nice,
okay,
but
it's
it
it's
super
cool
by
the
way,
so
I'll
certainly
give
it
a
try.
I'm
curious
about
the
cluster
side
of
it,
because
that
will
be
my
if
this
works,
then
that
will
be
a
main.
You
know
thing
for
me,
because
you
know
I
don't
run
any
spark
programs
on
locally
unless
you
know
just
for
experimentation.
C
Rest
of
the
work
is
happening
on
the
cluster
itself,
so
I'm
curious
about
your
roadmap
and
streaming
and
these
things
as
well
so
yeah.
I
think
yeah.
B
Probably
documentations
yeah
and
yeah
cluster-
I
don't
know
I
might
get
someone
from
my
team
to
to
do
the
the
the
cluster
side,
yeah
yeah.
A
Yeah, I'm saying that I think we're past the hour since we started, so perhaps let's take a final round of questions and then finish up, if that's okay with everyone.
B
Yeah, I think there's an initiative to try it out on beginners. I'm big on trying to make it as easy as possible to get started.
B
So
any
feedback
on
that
would
be
great,
and
maybe
we
can
work
it
out
together
how
to
make
it
as
easy
as
as
possible,
and
also
another
pet
peeve
of
mine
is
like,
whenever
you're
trying
to
tackle
a
problem,
enclosure
you're
confronted
with
a
lot
of
libraries
right
and
then
you
need
to
do
your
research
and
and
that
takes
a
while
right
and
which
is
kind
of
bad
as
well,
because
I'm
kind
of
contributing
to
this
right,
because
there's
tech
ml
data
set
already,
which
is
pretty
established
and
making
it
clear
for
people
like
this-
is
what
you
use
this
for,
and
you
use
a
tmd
for.
B
I
think
that
I
don't
know
the
answer
to
that
at
least
not
not
100.
I've
got
some
ideas,
but
then
making
it
clear
for
people
where
to
go,
I
think
is,
is
very
important
and
also
that
that
that
bridge
we
can
have
a
zero
copy
path
to
tmd,
so
that
people
are
not
forced
to
pick
one
over
the
other
with
making
and
being
locked
in.
That
would
be
great
as
well,
but
then
yeah,
I'm
not
entirely
sure
where
to
to
start
with
that.
B
So
I
need
to
ask
chris
and
also
library
integrations.
I
think
integration
with
with
oz
would
be.
It
would
be
amazing
and
note
space.
So
I'm
thinking
you
know
in
my
head,
like
like
node
space,
being
a
drop
in
replacement
to
the
rebel
and
you'll
just
do
stuff
on
the
rebel
and
that
but
then
you'll
have
nice
image.
B
Nice
charts
nice
plots
on
your
browser
instead
of
just
you
know,
just
your
terminal,
so
I
think
those
are,
I
think,
some
some
of
the
more
immediate
stuff
that
I
think
would
be
would
be
great
to
tackle.
C
No, I think I've pretty much asked every possible question already. And yeah, I know that you're using Vim to develop all this stuff, right, Anthony?
B
Yes, yes, yeah. I'm thinking the same thing, actually. It'd be nice to have the Geni CLI run a server in the background, and then the startup time would be one or two seconds instead of seven. That'd be nice.
C
As I said, it's a fascinating project. I really admire your hard work behind this one. I'm curious where you're going to take it, so I'll certainly give it a try, and if I have hiccups or something I'll reach out to you, so you can help me out a bit. And I'm curious about the cluster stuff.
C
How I can put it into my workflow as well, because Spark is something that I'm using almost 10 hours a day. That's basically my work, so this is going to be a fascinating project for me to try out. Thank you.
C
Yeah, yeah, I mean, I don't want to cross-promote something, so I want to keep things a bit separate. This is not about other work I do; it's mostly about Scicloj and Geni. That is the main thing. But I'm really happy to be here and to understand things a bit better — I didn't know this much about this project.
A
Okay, so that's it! Thank you for watching the third Scicloj interview, with Anthony, about Geni.