From YouTube: Introduction to Perlmutter
Description
Jay Srinivasan of NERSC presents an Introduction to Perlmutter. Recorded live via Zoom at GPUs for Science 2020. https://www.nersc.gov/users/training/gpus-for-science/gpus-for-science-2020/ Session Chair: Oisín Creaner
B
I'm... yeah.
A
Thank you all for inviting me to give you an introduction to Perlmutter. I'm Jay Srinivasan, the project director of the formal project that we have at NERSC to bring in the system.
A
To give a quick introduction to NERSC: I think most people here might be familiar with NERSC. We are the mission high performance computing facility for the Office of Science, and on the left you see a bunch of statistics about NERSC. We have thousands of users, hundreds of projects, and a lot of codes, and what that means is that we have a really diverse workload, characterized by simulation, by data analysis, and, more recently, by learning as well.
A
Some comments here - okay. So what that also means is that the systems we get - and you can see the roadmap here from earlier in the decade - have to be able to cater to that really diverse workload. So, starting in 2013,
we had Edison, followed by Cori, which gave our users an introduction to the manycore era. Then, later this year and into 2021, as I'll talk about, we're going to have the ninth generation of our systems, which we're calling Perlmutter, and that will have a mix of both CPU and GPU nodes. And then later in this decade we'll get to NERSC-10.
A
So what is Perlmutter? Right from the get-go, when we started the project, we decided it was going to be a system optimized for science. So what does that mean? It's a system that provides a substantial increase in performance over Cori, which is our current KNL-based system -
three to four times Cori. It has a mixture of both GPU-accelerated and CPU-only nodes that meet these three pillars of our needs: simulation, data analysis, and learning.
That's compute, storage, and networking. The data stack that's optimized for this system will enable us to support both analytics and machine learning at scale.
A
All of this is connected together using the next generation of interconnect by Cray, called Slingshot. That's an Ethernet-compatible interconnect, and what it does is basically open up the inside of these machines, in a sort of seamless fashion, to the outside world. It enables data movement much more easily than was possible before on systems like Edison or even Cori. The GPU-accelerated nodes, as I'll talk about, have four NVIDIA GPUs with the latest in cores and interconnect,
and they'll have one AMD Milan CPU, which is the next generation from what people are obviously on now, the Rome line. We'll have over 6,000 Ampere GPUs. The interconnect, as I've talked about, is this Ethernet-compatible high-performance interconnect, and we expect we'll be capable of terabit connections to and from the system.
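To make the four-GPU node layout concrete, here is a minimal C sketch of what a program sees on such a node, assuming the CUDA runtime API (illustrative only; this is not code from the talk, and the device names shown are whatever the driver reports):

```c
/* Minimal sketch: enumerate the GPUs visible on a node.
 * Assumes CUDA is available (compile with: nvcc list_gpus.cu -o list_gpus).
 * On a Perlmutter-style GPU node one would expect four A100 devices. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        fprintf(stderr, "no CUDA devices visible\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("GPU %d: %s, %.1f GiB memory\n",
               i, prop.name,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```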
A
So here's a chart similar to what I just showed before, but it also shows how we're bringing in the system. Because of various timing issues, and the way things roll out from the technology perspective,
we're bringing it in in two phases. We'll bring in the first phase late this year, and that will consist of the GPU-accelerated nodes, all of the storage, and all of the associated nodes that will help us run this as a system - that includes the login nodes and the nodes for high-memory workflows and so forth - as well as the storage and access to external storage. So it'll be fully integrated into NERSC, just like the systems we have.
The Milan CPUs will come in later in 2021, as will the client side of the high-performance interconnect, the Slingshot part.
A
All of those blades will go into a compute rack, and each rack will have 64 blades, so we'll have either 128 nodes or 256 nodes per rack, depending on whether it's a GPU-accelerated blade or a CPU-only blade. All of those racks are put together to give us the Perlmutter system, which has 12 GPU racks and 12 CPU racks. The other part that's important, when it becomes a whole system, is how you get it all to work together, and that's probably of most importance to you all as users.
A
We have activities in the areas of the network, the storage, the application readiness work, and the system software work, to ensure that this system will work really well for our users and be a productive system. And you can see that all four of those areas involve technologies that are new with Perlmutter - not just new to Perlmutter, but new technologies overall. The high-speed network is a brand new technology from Cray.
The all-flash storage is going to be one of the first times that all-flash storage is run on a shared, Lustre-based file system at this scale, for the diverse kinds of workloads that we have. Obviously GPUs aren't new, but running them in production, for our diverse workload, at the scale of users and the scale of science that our users do, is new, so getting our apps ready for it is an important part of that effort. And the system software that ties everything together is also a new revision of the system.
A
So if there are any questions, please speak up - I don't know how Regina or others want to handle the questions. You're going to have plenty of talks today on...
A
Right. Obviously we didn't have as grand a release as if people had been able to attend GTC in person, but hopefully everybody has listened to the talks from GTC and seen the features that Ampere has. We'll be getting the A100, which is an implementation of the GA100 GPU, and you can see all of these statistics -
A
You
know
stats
and
speeds
and
feeds
on
there
and
we're
looking
at
things
like
you
know,
almost
20
teraflops
with
the
tensor
core
on
fp64
and
so
forth.
So
the
other
features
that
really
are
of
importance
to,
we
believe
to
our
users
and
in
fact,
that
the
talks
over
the
next
couple
of
days
are
addressing
are
things
like
you
know?
How
are
these
things
connected
together
and
how
are
you
going
to
be
able
to
use?
the four of them that we have on each node on Perlmutter effectively, with NVLink 3. This Multi-Instance GPU technology that Ampere puts together is very interesting.
We're going to have it available on Perlmutter, of course, and to use it effectively we're going to have to integrate it and expose that technology to our users, to our workload manager, and to how users access the system from a scheduling point of view. That's something we're going to be looking at very closely between now and when Perlmutter comes into service.
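As a hedged sketch of what exposing MIG to software can look like, the NVML C library that ships with the NVIDIA driver lets a tool query whether a GPU has MIG mode enabled (illustrative only; this is not how NERSC's scheduler integration is implemented):

```c
/* Sketch: query MIG mode with NVML (link with -lnvidia-ml).
 * Illustrative only; the actual workload-manager integration
 * described in the talk is not shown here. */
#include <stdio.h>
#include <nvml.h>

int main(void) {
    if (nvmlInit() != NVML_SUCCESS) return 1;

    unsigned int count = 0;
    nvmlDeviceGetCount(&count);

    for (unsigned int i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        unsigned int current = 0, pending = 0;
        if (nvmlDeviceGetHandleByIndex(i, &dev) != NVML_SUCCESS) continue;
        /* MIG is an A100-generation feature; older GPUs return
         * NVML_ERROR_NOT_SUPPORTED from this call. */
        if (nvmlDeviceGetMigMode(dev, &current, &pending) == NVML_SUCCESS)
            printf("GPU %u: MIG %s\n", i,
                   current == NVML_DEVICE_MIG_ENABLE ? "enabled" : "disabled");
    }
    nvmlShutdown();
    return 0;
}
```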
The TF32 support - and I think Jack and others talked about this - there's going to be a number of talks on mixed-precision work, and that's going to be very interesting. So I think that technology is going to be explored as well over the next couple of days, for the system that we have.
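To make the TF32 point concrete: on Ampere, libraries can route ordinary FP32 math through the tensor cores using the TF32 format. A minimal sketch, assuming cuBLAS from CUDA 11 or later (illustrative; not code from the talk):

```c
/* Sketch: opt an FP32 GEMM into TF32 tensor-core math on Ampere.
 * Assumes CUDA 11+ cuBLAS (compile with: nvcc tf32.cu -lcublas). */
#include <cublas_v2.h>

void sgemm_tf32(cublasHandle_t handle, int n,
                const float *dA, const float *dB, float *dC) {
    const float alpha = 1.0f, beta = 0.0f;
    /* TF32 keeps the FP32 exponent range but rounds the mantissa
     * to 10 bits, letting tensor cores accelerate FP32 GEMMs. */
    cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);
    /* Restore strict FP32 behavior for code that needs it. */
    cublasSetMathMode(handle, CUBLAS_DEFAULT_MATH);
}
```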
I just wanted to touch on one specific aspect of the all-flash file system.
A
So this one: obviously it's all-flash, it's Lustre-based, and it's going to be fast, it's going to be usable, and it's optimized. It's fast across multiple dimensions: it has high bandwidth because it's all-flash, and it has excellent IOPS performance - sorry, this should say 3.2 - with 3.2 million file creates per second.
It's going to be usable for our users: there are 35 petabytes of usable capacity essentially on the machine. It's not a remote file system that you have to access through a small network pipe or anything; it's on the machine, part of the same fabric that the compute nodes - the GPU and the CPU nodes - are on,
so people are able to use it. And we're going to have data movement capabilities that are new, that allow people to move data seamlessly between the tiers. What are the tiers we're talking about? The storage on the machine, the external file systems that we have, and obviously things like archival storage and so forth. And then finally, there's a number of optimizations: Lustre clearly works at scale, as people know from Cori, but with all-flash
you have to worry about things like: how does small-file I/O perform? How do we take advantage of the high IOPS that are there? And so forth. We're making sure that the file system is optimized for that.
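The small-file concern is easy to picture with a sketch of the kind of microbenchmark one might run (illustrative only, not a NERSC tool): time how many small-file creates per second a single client can do, the metadata-heavy pattern that all-flash Lustre is meant to handle well.

```c
/* Sketch: time N small-file creates from one client. Real file-system
 * benchmarking uses many clients in parallel (e.g. tools in the
 * mdtest family); this only illustrates the access pattern. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

int main(void) {
    enum { N = 10000 };
    char path[64];
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; ++i) {
        snprintf(path, sizeof path, "f%05d.dat", i);
        int fd = open(path, O_CREAT | O_WRONLY, 0644);
        if (fd >= 0) { write(fd, "x", 1); close(fd); }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d creates in %.2f s (%.0f creates/s)\n", N, s, N / s);
    return 0;
}
```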
A
Hopefully all of this makes your lives easier when you do use the system - and that includes things like system software and scheduling. How do we schedule these multiple resources? And, drilling down into the node, how do you make sure that things like the Multi-Instance GPU technology are available in a fashion that makes it useful for multiple users to use the same physical GPU, partitioned into multiple instances, and things like that?
The workflow architecture enables people to support data-intensive workloads and things like that; how do we make sure that's available on a system that's going to be new, that has a new system software stack, and that has this diverse workload on it? The storage
I just talked about: making sure that all of those features available on Perlmutter are tested and ready for our users. And the networking as well, which enables us to take advantage of this Ethernet-compatible network: making all of the features that allow Ethernet to connect up to the outside world in a seamless fashion available on the inside of the system as well.
A
Yeah, I just have a couple more slides, I think. The other aspect of this effort that's useful is the NESAP program, which I think you're going to hear a little bit about, obliquely.
A
I think that will show you how well we're doing. In fact, that's what has given us confidence that the GPU system and the CPU system - the way we've divided up the resources on our Perlmutter system into those two technologies - is going to be useful for our users. It's in fact what motivated us to ensure that we get the GPUs in and make them available to our users as soon as possible.
There's other work that isn't formally part of the project, but that NERSC is doing: we're working with PGI to enable OpenMP GPU acceleration, an effort that I think you're going to hear about a little bit. Obviously, performance portability is a key aspect of making sure that these systems are usable and productive for our users - not just at NERSC, because nobody runs just at NERSC; people run across resources in the whole DOE complex.
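A minimal sketch of what OpenMP GPU offload looks like in user code, assuming the `-mp=gpu` support in the NVIDIA HPC (formerly PGI) compilers that this effort targets (illustrative; not code from the talk):

```c
/* Sketch: a SAXPY offloaded to the GPU with OpenMP target directives.
 * With the NVIDIA HPC compilers this would be built roughly as:
 *   nvc -mp=gpu saxpy.c
 * The same source, minus the flag, still runs on CPU-only nodes,
 * which is the performance-portability point. */
#include <stdio.h>

void saxpy(int n, float a, const float *x, float *y) {
    #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

int main(void) {
    enum { N = 1 << 20 };
    static float x[N], y[N];
    for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
    saxpy(N, 3.0f, x, y);
    printf("y[0] = %.1f\n", y[0]);  /* expect 5.0 */
    return 0;
}
```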
A
So, in summary: we're really excited to be able to introduce Perlmutter, the system that's optimized for science, to our broad user base. Our staff have been engaging with our users to ensure readiness for the system, and the effort that the postdocs here put in to bring this
A
This
set
of
sessions
to
our
users
is
sort
of
testament
to
that.
We
have
a
very
strong
training
effort
in
collaboration
with
nvidia
and
createhpe,
to
give
new
information
about
the
new
technologies
and
promoter
to
our
users,
and
we
really
look
forward
to
working
with
you
all
to
make
promoter
a
very
productive
platform,
the
next
generation
platform
for
our
users
and
I'll
stop
there
happy
to
take
any
questions
that
you
have.
B
We had a question there from Hugo. Hugo, do you want to unmute yourself?
C
Yeah, hello Jay. You showed the slide with the synergy of the different teams working on the network, on the app readiness, and the other boxes, for building Perlmutter and making it an efficient machine.
How would you describe the synergy between all these factors that makes Perlmutter a good supercomputer and easy to use by the end users? How do all these teams work together to make it transparent to the end user?
A
Yeah, that's a good question. Perlmutter is really the most important focus of NERSC right now, I would argue. What that means is that at NERSC we have a formal project that says: okay, we're going to bring in the next generation of the system that we have. Over 50 percent of NERSC staff are directly working on the effort to bring Perlmutter into production, and so, while we've split it up into these different groups, really
A
We,
we
have,
you
know
weekly
meetings
to
to
bring
in
to
bring
together
all
of
the
efforts
that
these
people
are
doing.
All
of
these
people
are
going
to
be
participating
and
are
already
participating
in
some
of
the
early
test.
A
Testbed
hardware
that
we
have,
which
isn't
yet
at
the
scale
where
we
can
expose
it
to
users
or
it
isn't,
doesn't
necessarily
have
all
of
the
technology
that
would
make
it
useful
to
expose
it
to
users,
but
for
the
staff
effort,
it's
possible
for
the
staff
to
start
getting
access
to
some
or
all
of
this
technology
gradually
between
now
and
the
end
of
this
year,
when
we
have
how
money
are
in
there,
and
so
all
of
those
efforts
that
people
are
working
on
sort
of
mesh
together
in
in
the
back
end,
if
you
will
right
without
necessarily
being
exposed
to
the
users
right
away
at
this
point.
A
But
it
is
part
of
one
big
project
that
you
know,
like
I
said
over
over
half
of
nurse
staff
are
directly
working
on
promoter
related
activities
at
this
point
right
and
have
been
for
the
last
year
really.
B
I'm going to need to move on, so if you do have any further questions, please do put them in the Q&A and we will bring them up.