From YouTube: SimPEG Meeting, August 23rd, 2023
Description
Weekly SimPEG meeting from August 23rd.
A discussion on solvers and bringing back MUMPS.
B
I'm Joe Capriotti. I was working at UBC for a while as a postdoc on SimPEG development; now I'm down at Mines as a research associate, still involved with SimPEG as much as I can. I still do a bunch of support and pull request review when I can, and going in there, so I've interacted with you a bit on the pull request you already put in, yeah.

D
Yeah, my name is Devin Cowan, nice to meet you, Matt. I wear a lot of hats; I'm doing SimPEG development. One of the things I'm active in is trying to build up all the user tutorials, which makes it a little bit easier for new users. I'm spending a lot of my time right now working on some of these new natural source systems like QAMT and MobileMT, and then also some plate modeling and how we are going to do that for time-domain problems, working with tetrahedral meshes and that kind of stuff. So a little bit all over the place.

E
Yeah, so Thibaut, data scientist in geophysics with KoBold, so working with Matt, and I'm very happy to see all this very dedicated work, at all levels, with the solvers getting pushed to open source.
H
Hello, I'm Dom. I'm in Vancouver. I used to be at UBC too; now I'm at Mira Geoscience as a scientific programmer, and I'm all over whatever code improvement we can do for SimPEG.

G
Hi, Kalyn Martens, thanks Lindsey. I work for SJ Geophysics. At our company we're interested in doing large-loop EM and MT, for starters at least, and IP inversions. So that's why we're interested in SimPEG.

I
Yeah, so I'm Matt, a software engineer at KoBold Metals, and I work with Thibaut. In addition to a lot of the machine learning work that I've been doing, I do a lot with our compute infrastructure, trying to make things faster, and so a lot of the work that I've ended up doing lately has been accelerating our geophysical inversions and trying to figure out various ways to support various architectures, as well as making the inversions run as fast as possible.

A
I'm thrilled to have you here. I'm Lindsey Heagy, an assistant professor at UBC, and I've been involved in SimPEG for quite a while at this point. But yeah, normally we start with announcements; I don't know if anybody has any quick announcements or things to bring up for the group, and then we can perhaps hand the floor over to Matt.
A
Excellent. And then John and I will be pinging some of you for newsletter updates, because we'd like to get that out in the next week or so, so please get back to him if there's anything you'd like to share in that, and feel free to ping us if there are things you would like to include, like a new pull request with MUMPS. So Matt, yeah, please take it away.
I
Certainly, I have some material I can share here. I'll go ahead and share my screen. Looks like I got a "participant screen sharing is disabled."

I
Looks like it's working all right. So the first thing that I'd like to share here: I have a set of benchmarks that I've run recently. We do a lot of work in the cloud, we run on AWS, and Thibaut put together a small forward simulation benchmark for us to compare performance across instance types, and so I just wanted to show some of the variability of performance across instance types as well as the effects of solver opts. And so we can see here:
I
You know, we've got our baseline Pardiso, which ships as part of MKL, and then we have the MUMPS solver that we've been using at KoBold for probably the last year or so, as we've been using our M1 and other Apple silicon machines. The MUMPS software right now is not as performant as Pardiso, at least in cloud usage, but it has allowed us to do local development on M1 laptops.

I
I'll show some benchmarks also run on my M1, and we can see here that with the solver opts, if we're hinting to either Pardiso or MUMPS that we are dealing with a symmetric or SPD matrix, we see significant performance increases in both cases.

I
The MUMPS pull request that I have up right now standardizes the attributes across both of these solvers, so that you can specify the same solver opts for both of them, as well as standardizing the behavior of those solver opts. So previously, when you were hinting to the MUMPS solver that a symmetric matrix was in use, it was actually treating that matrix as Hermitian, and that's different from the way that our other solvers work.
I
I also have, in the MUMPS PR, the capability to deal with Hermitian matrices, but that's through additional means: in SciPy's sparse matrices there's the .T attribute, which is the transpose, and then there's the .conjugate() method, which lets you take the complex conjugate, and so .T.conjugate() is how to deal with that kind of stuff.
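As a minimal sketch of that SciPy idiom (the matrix here is just an illustrative example, not anything from the PR):

    import numpy as np
    import scipy.sparse as sp

    # A small complex sparse matrix that is Hermitian but not symmetric.
    A = sp.csr_matrix(np.array([[2.0 + 0j, 1.0 + 1j],
                                [1.0 - 1j, 3.0 + 0j]]))

    A_herm = A.T.conjugate()   # conjugate (Hermitian) transpose
    A_sym = A.T                # plain transpose, what a "symmetric" hint assumes

    print((A != A_herm).nnz)   # 0: A equals its conjugate transpose (Hermitian)
    print((A != A_sym).nnz)    # nonzero: A is not symmetric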
A
If we need to up the memory, we can generally use a larger instance, up to a certain point, at which point I think there's going to be out-of-core kind of stuff that we would need to be doing. You know, but that's not something we've looked at.

B
I don't have quantitative numbers for this; I do have qualitative, painful ones.

I
These cloud numbers are also hobbled at this point by a Dask-related issue. The environment management for all of those numbers I ran basically spins up Dask clusters under the hood, and it seems like once Dask is involved, things run slower with both Pardiso and MUMPS, but this was not an issue that I could reproduce with the SuperLU solver.
I
So taking a look at these numbers here, this is actually pretty similar to what we would be seeing on a local Apple Intel machine, and I think I've got an i9 here, so it does pretty well, and we can also see that supplying various hints to this simulation increases performance. So I want to...

G
Are these hints kind of standard and also available?

I
The instance architecture here, so these are x86-64. So, for example, an r6a: R is the memory-optimized class of AWS instances, 6 is the sixth generation, and the "a" means it's an AMD processor. The ARM processors here are AWS's Graviton processors, so images that are built for those will work with the M1 as well.
I
On the M1 here, the three numbers that we're looking at are unsymmetric, symmetric and SPD, and we can see the corresponding numbers on M1: unsymmetric, and then symmetric and symmetric positive definite. I would expect that these numbers will go down as we deal with that Dask-related issue, which I have not been able to chase down.

I
The effect of the changes is making it possible to run MUMPS in a standardized way, you know, on instances where we can already run Pardiso, but also extending our capability to ARM instances, standardizing the interface between the two so that the same solver opts can be used there.
I
There are kind of two different sets of changes. One of them is ready for review, which is what's in this PR, and then the next one is not ready for review, and that's how we package this properly. So right now, with pymatsolver as it stands, we'll take a look at the main branch.

I
Here we have this MUMPS interface that's slightly shimmed out. This interface works, and we've been using it at KoBold for the last year or so, but it's something that could use some love and care, and that's what I've done here. But then the next step after this, and Joe and I were discussing this next step in some of the PR comments here, is: how do we make this available? It seems like there are a couple of different problems that we would want to solve. Right now we have a Fortran interface, and that's something that's going to get kind of tricky to support, especially with the deprecation and removal of distutils; there are some complications with numpy.distutils, which is the standard way that f2py extensions get built. And f2py is a tool that's used to take this kind of Fortran interface here and make it into something that Python can understand. MUMPS also has a C interface, and Cython is a great way to write Python code that needs to talk to C; pydiso already does this, and a Cython extension is probably going to be the future here. So porting this MUMPS interface to Cython, which could live inside pymatsolver or be broken out into something like pydiso. So I think that what we have right here, as far as the packaging goes:
I
This is good enough for our use at KoBold Metals, because we have very tight control over our environment. But what ends up happening with this branch, and why this is not included with these changes here, is that this branch modifies pymatsolver in such a way that if MUMPS is not available on your system, pymatsolver cannot be installed. That's not releasable. It's great for us, and it leads to this kind of benchmarking capability and really solid results, but that would be the next step: okay, we've got some love and care applied to MUMPS, leaving the packaging status quo as it is, where someone would need to DIY the installation. The next step would be taking this packaging capability and making it into something that we can release to the world as: this is how to use pymatsolver and have MUMPS available.
I
It would be great to be able to interface with the version of MUMPS on conda. That would be: conda install mumps, conda install pymatsolver, and you're good to go. There might be an intermediate package that we would do; I don't know what it would be called. It would be some sort of MUMPS interface. I'm hesitant to try to do anything with the existing open-source MUMPS interfaces, because they've been dormant for so long, and because our MUMPS interface needs are so limited, there's very little that we actually need to do. We don't need a general-purpose interface to all of MUMPS's capabilities; we can get feature parity with Pardiso with a very, very small amount of glue code. So we could do that MUMPS interface ourselves and maintain it, in the same way that pydiso is maintained. That's probably the most pragmatic way to do it, and if someone decides later to write an excellent and well-maintained MUMPS interface, we can then try transitioning to that.
B
I've definitely been interested in the MUMPS stuff for a while; I just haven't had the time to sit down and play with it. I should probably have a little bit more time coming up soon, and I'm supposed to be getting a new Apple silicon machine, so I'll have a use for doing this. I think I already mentioned on there that I had played around a little bit on my own computer with the Apple solver as well; not sure if you ever got a chance to do that.

I
I did look at UMFPACK as a potential sparse solver, but the performance that I was getting was nowhere near MUMPS.

B
Yeah, so there's a few things: did you try any of the, specifically the CHOLMOD solvers?
I
I have them on my radar, but I have not tried them. The really nice thing with UMFPACK was there's a package called scikit-umfpack, and I just wrapped UMFPACK directly and had a solver that I could use. So the wrap-direct stuff that lives in solvers.py here, that's how it works, and it made it very quick to try UMFPACK. Unfortunately, the results were not great, but I was really happy that it was so easy to wrap that solver.
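For context, this is roughly the shape of that kind of quick experiment; a minimal sketch assuming scikit-umfpack is installed so SciPy routes its sparse factorization through UMFPACK (the test matrix here is purely illustrative):

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    # With scikit-umfpack installed, ask SciPy to use UMFPACK for sparse solves.
    spla.use_solver(useUmfpack=True)

    # Small stand-in system; a real benchmark would use a simulation matrix.
    n = 2000
    A = sp.eye(n, format="csc") * 4.0 + sp.random(n, n, density=5e-4, format="csc")
    b = np.random.default_rng(0).standard_normal(n)

    solve = spla.factorized(A)        # factor once (UMFPACK when available)
    x = solve(b)                      # reuse the factorization per right-hand side
    print(np.linalg.norm(A @ x - b))  # residual check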
B
Yeah, there's a good CHOLMOD tool; scikit-sparse, I think, interfaces with CHOLMOD pretty well. Part of what that means is you can only do Cholesky factorizations, and those are pretty fast things to do, as you can see from just taking advantage of the matrix being symmetric and positive definite with the solvers there. I've looked into a bunch of different solvers myself, and it's just like, okay, well, for most of the things that we were doing, Pardiso was working pretty well, but now that we're moving on to this stuff, it definitely makes sense to start getting back into these other solvers.
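A minimal sketch of that scikit-sparse / CHOLMOD route, assuming the package is installed and the system matrix is genuinely SPD (the matrix here is a toy example):

    import numpy as np
    import scipy.sparse as sp
    from sksparse.cholmod import cholesky

    # Toy SPD system; in practice this would be the SPD matrix from a simulation.
    n = 1000
    L = sp.eye(n, format="csc") * 2.0 - sp.eye(n, k=1, format="csc")
    A = (L @ L.T).tocsc()              # SPD by construction
    b = np.ones(n)

    factor = cholesky(A)               # sparse Cholesky factorization via CHOLMOD
    x = factor(b)                      # solve A x = b with the cached factor
    print(np.linalg.norm(A @ x - b))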
B
Yeah, I think the only issue with MUMPS is that it's not as easily installable on Windows; you can't get all the features on it. I don't think there's an MPI build, and you can't do a bunch of the reordering heuristics on Windows. It still works, though.
I
They worked differently for the MUMPS solver, and so now what you can do, taking a look at the benchmark itself, we can read this benchmark code. We can see here, okay, which solver do you want, and so now, if the solver is Pardiso or the solver is MUMPS, we import the correct thing, but then we don't have to use different solver options for different solvers, and the solver object behavior is also standardized.
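A minimal sketch of that standardized pattern, assuming the option names described in the pull request (is_symmetric, is_positive_definite) and that both solver classes are importable from pymatsolver on the branch in question:

    import numpy as np
    import scipy.sparse as sp
    import pymatsolver

    # Toy SPD system standing in for a simulation matrix (A, b are placeholders).
    n = 500
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)

    # Pick the solver by name; with the standardized interface the same hints
    # apply to either class.
    solver_name = "Pardiso"            # or "Mumps" on the branch with the MUMPS PR
    Solver = getattr(pymatsolver, solver_name)

    Ainv = Solver(A, is_symmetric=True, is_positive_definite=True)
    x = Ainv * b                       # factor once, then solve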
B
One of the things that I'd like to do, and we should probably put up an issue on SimPEG itself, is to make use of these flags in more of the simulations than we do now, at least by setting them as default flags for some of these things, because for most of the systems that we solve, we should know whether they're positive definite or symmetric.
E
Like I said, we've already switched to that branch of pymatsolver, so I'm actually using it. It's very nice with the solvers too, that you can set the solver option on the simulation object. So it's just simulation.solver_opts, and you specify that it's symmetric and positive definite, and with a single line of code in SimPEG you end up with a 15 to 20 percent speed increase, which is very nice.
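A minimal sketch of that one-liner on a SimPEG simulation; the mesh, survey, and mapping are placeholders set up elsewhere, and the option names follow the pymatsolver branch under discussion:

    from SimPEG.electromagnetics import time_domain as tdem
    from pymatsolver import Pardiso

    # mesh, survey, and mapping come from the usual SimPEG setup (omitted here).
    sim = tdem.Simulation3DMagneticFluxDensity(mesh, survey=survey, sigmaMap=mapping)

    # The one-liner: hint that the per-time-step systems are symmetric positive definite.
    sim.solver = Pardiso
    sim.solver_opts = {"is_symmetric": True, "is_positive_definite": True}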
E
Oh yeah, 100 percent, we should make that the default within the SimPEG simulation object at some point, but I just wanted to emphasize that it's a one-liner.
I
One thing that we're doing right now is we're running a lot of simulations, like we're running an ensemble of simulations, and then we would run simulations across a lot of machines. And what I've found, and this is very, very confusing for me, is that if I have a piece of code, let's say I break this simulation out into simulation.py, and then I have a piece of code that starts up a Dask cluster and then submits this simulation function to that Dask cluster, I see the simulation run slower there. I also see the simulation run slower if I create a Dask cluster, do absolutely nothing with it, and then do a subprocess run of a simulation.py that just has the simulation in it. It's like, no, that should have absolutely no awareness of the Dask cluster. It's a very deeply confusing issue for me.
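A minimal sketch of the second, more puzzling setup being described; the file name simulation.py is from the discussion, and the worker counts are just placeholders:

    import subprocess
    from dask.distributed import LocalCluster, Client

    # Create a Dask cluster but never hand it any work.
    cluster = LocalCluster(n_workers=4, threads_per_worker=1)
    client = Client(cluster)

    # Run the forward simulation in a completely separate process; it never
    # touches the client or cluster, yet the observed wall time is still slower.
    subprocess.run(["python", "simulation.py"], check=True)

    client.close()
    cluster.close()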
I
And I tried reproducing it; it's reproducible with both the Pardiso and MUMPS solvers. I took a simulation and vastly reduced the number of time steps in it and tried reproducing it with the SuperLU solver, just to see if I could have something minimally reproducible that I could send to somebody else, like: hey, you don't have to install this software, you can install SimPEG and then off you go with this reproducer. I couldn't reproduce it that way, so I sort of said I'll leave that one for later, but...

I
No, it's just an idle Dask cluster. It's just sitting there, so yeah, I don't understand why it's happening. I'd love to get to the bottom of it.
B
The client, or not the client, the thing that controls the clients, the scheduler. Sure, there's a scheduler, but there's a main thread that, say, is running on the main machine, and then when you start up the distributed scheduler, or a bunch of Dask clusters, there's one that specifically...

I
SuperLU is one of the wrapped solvers; it's the SciPy sparse solver, and...
B
And I know they've been trying to update it to more recent versions, but it's been struggling on conda-forge. There are a few pull requests on the feedstock to update it to more recent versions, but none of them have passed. So there's a mumps-mpi and a mumps sequential package; there's those two kinds of packages, yeah.

I
There's a really nice set of CMake files called scivision/mumps, and that has been very, very useful for getting MUMPS compiled and built with various options. The options changed a little bit from 5.5 to 5.6, so that was sort of...
I
Yeah, all of the work that we've been doing has been with the sequential MUMPS, and "sequential" isn't quite the whole story: it uses both OpenBLAS and OpenMP threading, it just doesn't run MPI, and that gives us feature parity with Pardiso. So now it would be really nice if we could improve that single-threaded performance as much as possible, because if we throw more machines at something, then the cost just balloons. So efficiency in the single-threaded case has been something that we've been trying to pay attention to.
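Since the sequential build still threads through OpenBLAS and OpenMP, single-threaded measurements need the thread pools pinned; a minimal sketch of one way to do that with threadpoolctl (Ainv and b are the solver object and right-hand side from the earlier sketches):

    from threadpoolctl import threadpool_limits

    # Cap the OpenMP / OpenBLAS thread pools for this block so the timing reflects
    # genuinely single-threaded solver performance; it does not change the result.
    with threadpool_limits(limits=1):
        x = Ainv * b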
A
Well, this is really exciting, Matt. As you can see, some of us were kind of laughing as you pulled up the MUMPS Fortran code, and you can see in git it's been seven years since it was last touched. So I'm really excited you're pushing on this. I guess, I know you've been chatting a bit with Joe on the pull request, what do you both sort of see as next steps here?
B
And looking at the Fortran, I'm happy to take that on. So at least, if somebody can get MUMPS working, if somebody can install it, the option is still there. And I see what you've done with the pull request; it looks good, and it's hard to argue with the feature parity. So if someone can install it with that bare-bones interface that's there right now, that's fine. We can figure out distribution and how to do all those other steps later, I think, but that's kind of what I see.
I
For a year it's worked great, at least in our use case. It doesn't look like writing the Cython one will be that difficult, especially with the precedent of pydiso and then a couple of the other MUMPS interfaces. There are just some annoying C-versus-Fortran things, like the ICNTL arrays and one-based versus zero-based indexing, so MUMPS says: hey, define these macros and you'll be sorted.

I
One of the things that I did was move that conjugate option that was in Fortran out to a numpy conjugate call, to reduce the amount of C kind of stuff that we would need to do. So I think it's going to be fairly tedious, but not difficult, right.
I
Whether we break that out or not, I would be happy to do it either way, whichever one is easier. I'm just wondering about integration kind of stuff: how straightforward is it to do the integration testing of all of the pieces and have someone have really high assurance that the MUMPS extension is working? That would be one argument in favor of keeping it in there, but we can look at other options as well.
I
But if the stub is there, then I think it would be a matter of submitting PRs to that stub. So get the infrastructure in place in sort of a minimal way that makes it easy to contribute, because it sounds like, Joe, you might have some time coming up soon.
B
I might have some time kind of soon, but if we've got the place for it, then it's one fewer thing that we need to do, and it would probably be pretty quick to at least set the place up.
D
Yeah, I have something that's maybe somewhat related. I'm also working on large-loop time-domain EM stuff, and I'm also wanting to implement the B field. I'm currently working on solving this problem on a tetrahedral mesh, and I'm putting conductances on the faces so that you can do plate-like structures. So I'm not sure if Thibaut's mentioned it, but we're working on some very similar problems, and one of the issues that I have, maybe, do you mind if I screen-share a little bit of math? It should be okay. It sort of comes up if you want to use the B formulation on a tetrahedral mesh.
D
This is basically the system that you have to solve at each time step, and you end up having in your system the inverse of one of these mass matrices. When you have a mesh that's like a tensor mesh or a tree mesh, this is diagonal, so taking the inverse of it is trivial.

D
If you have an unstructured mesh, it's no longer diagonal, and so, in a way, it's kind of like you have to do a solve within your solve, and I'm just looking for ways that I'm going to attack this. Like, can you define this using the Pardiso solver, and then define A using a Pardiso solver, in sort of a nested way, and have it work out, these kinds of things.
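For readers without the slide, a rough sketch of the kind of backward-Euler system being described (the notation here is assumed, not copied from the screen share): with $\mathbf{b}$ on faces and $\mathbf{e}$ on edges, each time step solves something of the form

$$\left(\frac{1}{\Delta t}\,\mathbf{I} \;+\; \mathbf{C}\,\big(\mathbf{M}^{\sigma}_{e}\big)^{-1}\mathbf{C}^{\top}\mathbf{M}^{1/\mu}_{f}\right)\mathbf{b}^{\,n+1} \;=\; \frac{1}{\Delta t}\,\mathbf{b}^{\,n} \;+\; \text{source terms},$$

where $\mathbf{C}$ is the discrete curl and $\mathbf{M}^{\sigma}_{e}$ is the edge conductivity mass matrix. On tensor and tree meshes $\mathbf{M}^{\sigma}_{e}$ is diagonal, so its inverse is trivial; on a tetrahedral mesh it is a general sparse matrix, which is the solve-within-a-solve issue above.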
D
Yeah, well, at least for what I'm doing: if you're going to model dB/dt, then you use the E formulation, no problem. If you want to solve in terms of the magnetic fields, then, I mean, I haven't looked at the H formulation, and maybe you can use that when you're not putting in conductances. But if you have a conductance, you're kind of stuck with E on edges and B on faces, and you wind up with this.

A
It's something we should try, though. I mean, we should have the option, to at least have a sense of what the accuracy is, and there might be a lot of settings where that works just fine.
H
I'm gonna ask you a question, Matt, just to circle back to direct solvers. It's kind of an anecdotal observation, and I wish John was here, but he experimented a lot with Pardiso, and, if I recall, he observed that Pardiso eventually gets to a saturation point where, even if you give it more threads, it wasn't actually solving the problem faster. I would be curious to know if you experimented with that, or if MUMPS: you know, first, have you observed that, and second, does MUMPS have a workaround for this?
E
I was just gonna say, personally I don't have numbers, but I have observed that with Pardiso, I think something like 48 was kind of the maximum thread count I was getting, but even before that, the difference between, for example, 32 and 48 CPUs was minimal. It was kind of like an asymptote, and past 48 I had no gain at all. But that's just a personal observation; I don't have numbers. So, Matt, yeah.
I
I have some numbers that I can share here. So this was another benchmark.

I
You know, 32 times more threads doesn't mean 32 times faster; it's barely two times faster, and then we see a similar kind of saturation here, going from one thread on this x86-64 to 32 threads.

I
It's the same benchmark. I think this was... let me get you exactly what was timed there. I'm gonna unshare and check, because you don't want to watch me explore the file system.
H
That would be pretty disappointing if putting in 12 threads is not at least, you know, much faster than a single thread. Yeah.
I
Here we go, again. So what exactly is timed here: we set up the simulation and then we run the simulation, but what we're really looking at here is the wall times, and going from the MUMPS single thread to the MUMPS 32 thread, the time about halves. So it's about twice as fast, and it's the same thing with Pardiso, so that saturation is definitely a thing that happens.

I
I have more profiling capability now than when I first ran these, so...
I
I guess one question related to these sparse matrices: I saw that, a good while ago, there was an issue discussed in the pymatsolver issue history about looking at the Panua implementation of Pardiso, which has significantly diverged from Intel's MKL implementation.

I
The Panua folks claim that the two implementations diverged in 2006. Looking at the Intel MKL release notes, there have been a couple of comments over the last few years on changes involving the Pardiso implementation, but it's been pretty light, so I don't know whether the Intel version of it is really actively developed.
H
Yeah, and this one is thread-safe, right? We haven't talked about this, but if we could have a thread-safe direct solver, that could change the way we line things up when we're doing the parallelization on the inversion side of things.

H
That being said, you said something in the chat that triggered something in my mind: the storage of the factorization. If we could reuse the factorizations of the direct solvers, right, do you have hooks right now to easily save the factorization of the solver out to disk?
H
Because then, instead of requiring the solver to be thread-safe, we could just regenerate them, you know, have a mechanism to generate multiple solvers and then just reuse whatever has already been computed, and that would be pretty neat, yeah.
I
So one question about the thread safety here: I've gone very deep on looking at the solvers, but I don't have any familiarity with how SimPEG is sequencing or ordering computations.

I
So one possibility here as well, depending on the sequencing: is there something where we could be doing something where, okay, we factor, and then we're solving A x = b; can we make b into a large matrix instead of a vector? That could potentially bring in, you know, other BLAS extensions.
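A minimal sketch of that blocked right-hand-side idea, assuming the pymatsolver-style interface from earlier (the matrix and right-hand sides here are toy placeholders):

    import numpy as np
    import scipy.sparse as sp
    from pymatsolver import Pardiso

    # Toy system; in practice A is the simulation matrix and each column of B
    # is one right-hand side (for example, one source or one time step).
    n = 500
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    B = np.random.default_rng(0).standard_normal((n, 8))

    Ainv = Pardiso(A, is_symmetric=True, is_positive_definite=True)
    X = Ainv * B    # one blocked solve instead of eight separate vector solves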
I
I was doing some experimentation with the effect of the, sorry, the GEMMT OpenBLAS extension, taking a look at what the performance differences are. It looks like it's only really active when we give a symmetric hint, but I didn't see a really remarkable increase in the performance delta between unsymmetric and symmetric, between when GEMMT was available to MUMPS and when it wasn't. But I'm thinking that might be due to the types of computation that it's doing, if you're doing something that's not really all that parallel.
H
Yeah, it is definitely faster: instead of calling MUMPS or Pardiso multiple times on a vector, if you give it a matrix, it's definitely going to be faster to solve. That being said, eventually you're limited memory-wise; you cannot give it a B that is too large, because then you have to generate that right-hand side right before you give it to the solver.

H
So it's kind of, and, you know, it would also, if we can parallelize over those solves, that means you could technically cast it over multiple machines, right? So you don't need to do it all at once in one place; we could also spread it around, yeah.
I
Yeah, and that's something too, getting into the types of parallelization that are easy for people to manage. You know, HPC kind of stuff: if you're doing Slurm job orchestration and then MPI stuff, that's great. But if there are things that you can do that are easier to manage, like cloud-enabled stuff... I do a lot of AWS work.

I
So, you know, if you can be running on a Dask cluster, the environment management for that is far easier, and so broadcasting that saved matrix factorization across the cluster can be way easier than getting things involved with MPI, just from the environment management of what software you have installed on the cluster.
H
That was good, thank you. I need to get going, but great talk. Thanks, Matt, for the work; it's good to see it coming, and I'll try it for sure.
A
Thanks everyone, thanks so much Matt. I don't know if folks, if anybody wants to stick around, you're welcome to. I also need to drop off, but I really appreciate this, Matt. It's super cool to see, and it's great that you're picking up at this level of the code and making optimizations here. I think it's really exciting, and, as Joe mentioned, there are some things we can do in the base simulations to start to really take advantage of some simple wins.