From YouTube: NERSC Today and Over the Next Ten Years
Description
NERSC Now and Over the Next Ten Years, presented by NERSC Director Sudip Dosanjh. Recorded in Berkeley at NUG 2013, the annual meeting of the NERSC Users Group.
We do have other collaborations, but our primary mission is serving the needs of our users through high performance computing and extreme data analysis. As Kathy mentioned, we're seeing a growing importance of data at NERSC, and I'll be talking about that throughout the talk. As I mentioned, NERSC is unusual in that we have a long history.
There are other DOE computing facilities that are more recent, but NERSC was established back in 1974, and in 1996 NERSC moved to Berkeley Lab, which had an impact on those communities. HPSS became the mass storage platform in 1999, we established a facility-wide file system in 2005, and then we started our collaboration with the Joint Genome Institute to provide all of their computing back in 2010. A number of these milestones have really had to do with our growing mission in data.
We also work with computer companies to deploy advanced HPC and data resources. We deploy a wide range of different types of systems, often first of a kind, and push the state of the art in technology in different ways. We deployed Hopper, which was one of the first Cray petascale systems, with the new Gemini interconnect.
One thing that's different for us is that we directly support the DOE science mission; we're the primary computing facility for the DOE Office of Science. We allocate about 10% of the resources ourselves, but most of the allocations at NERSC are really done by the Office of Science. The six program offices allocate their base allocations, and then, when we put a new system on the floor, they can submit proposals for over-target allocations, and the Deputy Director for Science prioritizes those over-target requests.
So it's really about serving the needs of the DOE Office of Science. The chart up there shows the breakdown among the different offices, and the other thing is that the usage shifts as DOE priorities change; what we notice, for instance, is that materials science has gone up over the last 10 years.
There have been a number of notable accomplishments using NERSC resources. Simulations at NERSC were key to two Nobel Prizes, in 2007 and 2011. Data resources and services played an important role in two of Science Magazine's top 10 breakthroughs of 2012, Smithsonian Magazine's five surprising scientific milestones of 2012, and four of Science Magazine's insights of the last decade.
The other thing to note here is that a number of these, in addition to simulation, have involved data. Both the discovery of the Higgs boson and the measurement of the theta-13 neutrino mixing angle were focused on data. The three genomics results at the bottom were focused on data. And there's a supernova that was caught within hours of its explosion in 2011.
That was data transferred from the telescope to the NERSC systems and analyzed, and telescopes from around the world were redirected that same night. So we are seeing this shift that Kathy mentioned. The other thing that's really different for us is that we support a very broad user base: we have 4,500 users and we typically add 350 users per year.
The user base is geographically distributed; we have users in 47 states, and we have multinational projects, so we have users around the world. We have 10 states with over 100 users and 13 states with 50 to 99 users. The story I always tell is that I was on a Southwest flight wearing my Hopper shirt, I hadn't shaved, I had my tattered jeans on, and someone came up to me and said, "Do you work at NERSC?" So apparently we have users everywhere.
Other facilities might have a dozen or a few dozen users, with maybe a dozen code teams and codes that they worry about. We have 600 codes, and in terms of algorithms we have all kinds of different things, ranging from fusion to density functional theory to climate to lattice QCD, so we have to serve the very broad needs of this community. We also have people running at all kinds of different scales, as Kathy mentioned.
This is showing the job size breakdown on Hopper, which has about 153,000 cores. In red are jobs that use over 65,000 cores, and you can see that we have lots of people using over 65,000 cores and over 15,000 cores, and then we have very high volumes of smaller simulations. So we have to be able to support this very diverse workload, and we really have an operational priority, which is providing highly available HPC resources backed by exceptional user support.
We try to maintain very high availability of our resources, so we always have one large HPC system available at all times, and we try to have two systems on the floor if at all possible, because it usually takes several months to get one of these systems stabilized. Right now, both Argonne and Oak Ridge have been upgrading their systems and they've not been available for a period of time; you'll notice we don't do that, and really we can't do that.
Given our mission and our user base, our goal is really to maximize the productivity of our users, so we provide one-on-one consulting. This shows the number of tickets over time. We've handled this with essentially constant staff over the last 10 years, and ten years ago we were seeing about 3.4 tickets per user.
We've been asked by DOE to do strategic planning, so we've been busy doing that for several months, and I'm going to talk a little bit about that: what we project as the future needs and challenges, and then our strategy. Richard and Harvey Wasserman do these requirements reviews with the six program offices. There are reviews every three years with each office, and a number of you have probably attended some of those. The program managers invite a representative set of users.
Richard and Harvey work hard to have them identify their science goals and representative use cases, and based on those use cases they try to back out what the requirements are; then they rescale the estimates to account for users that are not at the meeting. We aggregate the results across the six offices, and then we try to validate them against other sources, including what we hear at this meeting.
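As a rough way to formalize that rescale-and-aggregate step (this is my reading of the process, not a formula from the reviews), you can think of it as scaling the attendees' stated needs by the share of each office's allocation they represent, then summing over the six offices:

\[
R_{\text{total}} \;\approx\; \sum_{o=1}^{6} \left( \sum_{i \in \text{attendees}(o)} r_i \right) \cdot \frac{A_o}{\sum_{i \in \text{attendees}(o)} a_i}
\]

where \(r_i\) is user \(i\)'s projected requirement, \(a_i\) is that user's current allocation, and \(A_o\) is office \(o\)'s total allocation.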
This tends to underestimate the need, because we're missing future users. But if we project out, the black line there is the historical NERSC trend over time of the computing that's available to the users. This is normalized in terms of Hopper, which is 1.3 petaflops, so one unit there is one Hopper-year, and it shows that over time the growth has been pretty linear.
We've been going up pretty linearly on this logarithmic scale, and shown in red are the actual hours delivered. When we deploy Edison, this is where we're going to be with Hopper and Edison. What's going on is that, with general-purpose x86 processors, it's going to get harder and harder to stay on this trend line.
We made the strategic decision with Edison to make it an x86-based system, because the users, in general, really weren't ready for GPUs or accelerators. But you can see that if we continue down that path we would fall below the historical trend line, so with NERSC-8 we're really looking at some kind of system with a more energy-efficient architecture.
That would let us get back closer to the trend, and we have a range here that depends on the budget; at the higher end of the budget we'll be able to get back on the trend line. Of course, the important thing is really what the users need, and it turns out that the users need a lot more than that. This is the aggregation from the requirements reviews.
If you plot this on a logarithmic scale, as I was discussing with Richard, you don't quite see it as much, but if we plot it on a linear scale, you can see that in the NERSC-8 time frame, if we were at the lower end here, we would be more than a factor of five below what's been identified as the science needs of the different offices.
The other thing we've been looking at is the data traffic into and out of NERSC, and that's also going up linearly on this logarithmic plot. There are a couple of notable things. One is that you see this slight drop here, but that was really because of some improvements in software and TCP auto-tuning. Then you saw this jump as we started to see more traffic; this is really from high energy physics. And this doesn't count the JGI traffic, so there would be another step on top of that.
What people are doing is transferring away lots of data, and we've gone up to a petabyte per month in terms of data traffic out. But for the last four years we've actually seen more data coming in than going out, and a number of times we've seen more than a petabyte per month. This plot includes JGI, whereas the previous one didn't, but we're seeing really staggering amounts of data coming into NERSC.
This is memory, this is processors, and this is instruments, things like sequencers and detectors. We're used to thinking that processors are on this Moore's Law curve, and that's a very fast rate of improvement, but instruments are improving at a much faster rate than Moore's Law.
If you look at things like cost per genome, it's dropping much faster than Moore's Law, and if you look at expected data production rates from things like the light sources, we're seeing that they're going to get up to terabits per second in the next five to ten years. When we were talking to the light sources, they were projecting their data needs: they're at about 65 terabytes per year now.
In 2009 they were expecting to get to about 1.9 petabytes per year in 2013, and if you just extrapolate this trend line out, they'd be up to exabytes in 2021. And there are other communities that we deal with that are going to be generating hundreds of petabytes of data that they need to analyze, and what they're seeing is that in a lot of cases you really can't analyze all the data.
And you really can't compare across data sets; a lot of scientific discovery comes from comparing across data sets, and they have very limited ability to do that right now. I won't spend as much time on this, but, as Kathy was pointing out, the computer industry roadmaps are not going to meet the mission needs that we're seeing here; there are great challenges with the technology as we go forward.
This would be the same amount of computing as that, but when you actually try to program it, and you look at the amount of memory that's available and the memory bandwidth, it's going to be a real challenge to get the same scientific productivity out of this system that you got out of the previous one. We really need to meet these challenges through hardware and software: we're going to need to rewrite some of the codes, but we're also going to need to influence the computer industry.
What I'm showing here is that we've been on this trend line which, as Kathy was showing, has been exponentially improving, but we're really beginning to see a fall-off in terms of what people are actually achieving on these systems; it's not keeping up.
That's the backdrop as we begin preparing for NERSC-8, and the other thing we're doing is really looking at how we can influence the computer industry to ensure that these systems can meet the mission and science needs of the Office of Science. Our second objective is to increase the productivity, usability, and impact of DOE user facilities by providing comprehensive data systems, and it's not just hardware; it's also the software and services that are needed to be able to do this.
Kathy already mentioned the new facility. Deploying this facility is critical in terms of being able to provide both the power and the space that's needed. This was a shot taken just down the hill; the retaining wall is in place and the foundation is being completed. Within the next couple of months the foundation will be completed and they'll start working on the structure. The plan is to move in early 2015, in the first quarter of 2015.
In terms of the different objectives that I mentioned, the first was providing usable exascale computing and storage systems. As I mentioned, we made NERSC-7 an x86-based system, but NERSC-8 will be our first pre-exascale system; we'll have another pre-exascale system, NERSC-9, in 2019, and an exascale system in 2023. Our strategy is pretty much what we've been doing in the past, which is having an open competition for the best solution, so we don't pick ahead of time which systems we're going to buy.
We really run a competition to see what people propose and try to pick the best of those. We focus on the performance of a broad range of applications, not a single benchmark. Our goal is not to build the best Linpack machine; it's really to build a system that works on our broad range of applications, because of the diversity in the codes and the algorithms.
We really need general-purpose architectures. What's new is that we want to do earlier procurements so we can have a greater influence on the design. There are these DOE FastForward and DesignForward efforts that I'm very involved with; these are collaborations with processor and memory companies, with system integrators, and with interconnect companies, and we're working very closely with them so that the research they're doing benefits DOE applications.
We want to provide support for legacy code, although that's going to be at less than optimal performance, and we would like to be able to get reasonable performance with MPI plus OpenMP, at least in the near term. NERSC-8 will support other programming models, and we're really not pre-selecting those; that will be based on the procurement that we're doing. We're also going to support optimized libraries so that, hopefully, people can get some of the performance just by using libraries that are highly tuned and optimized for these systems.
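To make the MPI-plus-OpenMP point concrete, here is a minimal, hypothetical sketch of that hybrid style (not a NERSC code; the problem, sizes, and build line are illustrative assumptions): MPI ranks provide the inter-node parallelism while OpenMP threads share the work within each rank.

/* Hypothetical hybrid MPI + OpenMP sketch; build with something like:
 *   mpicc -fopenmp hybrid_sum.c -o hybrid_sum
 * The problem (summing a range of integers) is purely illustrative. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;
    /* FUNNELED threading: only the main thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const long n = 1000000;          /* elements handled per rank (illustrative) */
    double local = 0.0;

    /* On-node parallelism: OpenMP threads share this rank's portion of the work. */
    #pragma omp parallel for reduction(+:local)
    for (long i = 0; i < n; i++) {
        long gi = (long)rank * n + i;    /* global index across all ranks */
        local += (double)gi;
    }

    /* Inter-node parallelism: MPI combines the per-rank partial sums. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d total=%.0f\n",
               nranks, omp_get_max_threads(), total);

    MPI_Finalize();
    return 0;
}

The appeal of this style is that the same code can run one MPI rank per node with many OpenMP threads, or many ranks with few threads, depending on what works best on a given architecture.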
Longer term, we really need a broader effort to converge on the next programming model, and that's something we're also looking at: how can we drive this? MPI plus OpenMP is probably more evolutionary, but we want to leave room for something that's revolutionary and much better. In some sense we are very focused on performance, on improving the performance of our codes on next-generation architectures, but programmability is really critical as well; we don't want systems that are next to impossible to program.
In terms of transitioning the codes, we're beginning to deploy testbeds. There are a number of testbeds at NERSC to help you, and us, gain experience with new technologies and to better understand some of the trade-offs. We're going to have in-depth collaborations with some selected users, and we're trying to cover the algorithm space with those.
I think it's really critical to note again that all the users will be impacted. When I get asked what fraction of those 600 codes will eventually need to make this transition, I think eventually all of them have to, because otherwise you're going to be stuck at today's performance levels. You're not going to see this Moore's Law improvement in your codes unless you're able to make the leap to next-generation architectures.
As I mentioned, we also want to influence industry more. We want to make sure that these future systems meet our needs and are more programmable and reliable. As part of that, we're partnering with Los Alamos and Sandia on our procurements in 2015 and 2019. We're already seeing that the larger size of these procurements is giving us more leverage and more interest: we had 10 different companies respond to our draft RFP.
We want to provide industry with greater information on our workload. There are all these co-design efforts, but in some sense companies like Intel and NVIDIA are really going to be more influenced by what the overall workload looks like than by one particular application. Those co-design efforts are important, but we also need to provide them with broader information through things like instrumentation and measurement. As I mentioned, we're already actively engaging with FastForward and DesignForward.
We have the computer architecture lab that's been established by ASCR; that's a Berkeley and Sandia collaboration. We want to serve as a conduit for information flow between the computer companies and our user community. In terms of our extreme data strategy, we're partnering with DOE experimental facilities to identify some of the requirements and create some early successes. We're developing and deploying new data resources already, but our plans are to deploy systems that are really focused on data in the in-between years, so in 2017 and 2021 we would deploy systems that are really focused on data analysis.
We want to provide a new class of HPC expertise. What we would like to do is enable people to rely on NERSC for data analysis the same way they currently do for computation, and we really have a unique opportunity here, with ESnet and all the ASCR research that's funded, to create end-to-end solutions in this space.
We think that's going to continue into the future, so we will have these compute-intensive architectures where the goal is to maximize the computational density and local bandwidth for a given power and cost constraint, and we're going to try to maximize the bandwidth density near the compute.
For data-intensive architectures, the goal is really to get the maximum data capacity and global bandwidth for a given power and cost constraint, so you want to bring more storage near the compute or, conversely, embed more compute near the storage. This also requires very different software and programming environments; people are interested in running databases, for instance.
One system would be the natural follow-on to the data systems that we currently have, deployed in 2017 and 2021. In blue, what I'm showing is the aggregate need that's been identified in the various requirements meetings; the yellow line is the historical trend line for NERSC; the light blue would be if we're limited by our current budget and power; and the red would be if we're limited only by budget.
In that case it's really the largest system that we could buy for a certain amount of money. In green is if we're limited by power; ASCR has told us that each of the computing centers should plan for a maximum of 30 megawatts, so we would be reaching 30 megawatts out here, and this would be that trend line. The bottom line here is that if we want to get anywhere near what the users have told us they need,
we really need to be much more aggressive in deploying some of the hardware that's projected for exascale systems, and we really need some active research with industry to try to push this curve up. For environmental reasons, and just for cost, it's very unlikely that anyone is ever going to want to deploy more than 30 megawatts of computing, and even staying at that level, which is a lot, we're going to be well short. So I'll just close
by saying that we do have a strategy and a plan for meeting the ever-growing computing and storage needs that we've identified with the community, and we really want to enable the science teams with the nation's largest data-intensive challenges to rely on NERSC to the same degree they already do for modeling and simulation. So I'll close with that. Hopefully that gave you some ideas of where we're headed.