National Energy Research Scientific Computing Center (NERSC) New User Training 2018, 20 Apr 2018

From YouTube: New User Training: 09 Parallel I/O

A

Hi everyone, my name is Jenny. I'm also from the task group. Today I'm going to talk about parallel I/O. I will tell you a little bit about what I/O is and some common I/O issues at NERSC, show you the typical HPC I/O stack, and talk a little bit about performance versus productivity, which is relatively new. The last slide is a case study, Athena's mysterious I/O, and how we solved their problem.

A

So, first, what is I/O? IO can be something pretty, something beautiful, like in the Indian Ocean. And IO can help you make more money: a lot of startups choose .io as their domain extension.

A

There are millions of websites that choose .io. Good question: that IO means exactly input and output, similar to what we mean here. Interestingly, when I came across the border into this country, I was asked one question: what's the purpose of your trip? I told the officer I'm coming to this country to do my PhD studying I/O, and then the officer said, explain that to me. So I pointed to his keyboard and said, this is input, and what do you see on the monitor?

A

That's the output. And what is parallel I/O? Clearly, many more keyboards and many more monitors. So that's how I passed the border. Clearly, I/O isn't top secret, like nuclear technology, that could stop me from coming to this country. It also isn't something that can make you money. But why do we care about I/O? Here are some common I/O questions, or common I/O issues. First, users will ask:

A

You guys claim that your system, Cori, has a peak bandwidth of around 700 or more gigabytes per second. Why can I only get one percent of that? That's a really common question. The second is scalability. If you have a little bit of I/O knowledge, you know that with more I/O processes and more storage servers you can potentially scale your application to a larger scale, so why isn't the performance actually scalable? And third, the metadata issue.

A

Mostly, we have a limited number of metadata servers, and that becomes the bottleneck for metadata performance. And last but not least, the pain of productivity. Users come to us and say, I'd like to try Python, Spark or TensorFlow for my data analytics or data analysis job, but why is the I/O so slow? Those are the typical, common I/O questions, and that's why, before I show you how to optimize I/O, I think it is important to explain the complex HPC I/O stack and why I/O can become a bottleneck.

A

There are two major factors behind this I/O issue. First, we have a complex HPC I/O stack. We know that we have some hardware downstairs, and we know that we can run our application on the system. Between those two layers there is a bunch of I/O stack and I/O middleware, including the parallel file system.

A

Lisa also mentioned that. Then there is I/O middleware like MPI-IO, high-level I/O libraries like HDF5, NetCDF and ADIOS, and nowadays people are starting to use more Python, which offers a productive interface like h5py. All those layers come together to serve your application and run on top of the hardware and the parallel file system. This is really complex. Whenever your application issues a data request, or I/O request, it has to go through this hierarchical stack, and anything that happens in between can slow down your I/O. So that's one major issue: I/O is not free, and sometimes it will be the bottleneck.

A

The second major factor is the difference between human and machine in how we describe the real-world problem. For scientific data, we typically try to describe the model in a way that is close to what we are familiar with. For example, in climate science we have 3D data: latitude, longitude and height. That's how we describe the weather or climate. But the hardware layer can only understand bytes.

A

So that's the difference, the mismatch, between the application and the hardware. Those are the two major factors that bring I/O issues here, and that's also why we need a lot of middle layers to bridge the mismatch between the application and the hardware.
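To make the mismatch concrete, here is a minimal sketch of how a 3D index could be turned into a byte position. It assumes a row-major layout of 4-byte floats and a tiny made-up grid; the function name byte_offset and the grid shape are hypothetical and do not come from the talk.

```python
import struct

def byte_offset(lat, lon, height, shape, itemsize=4):
    """Byte offset of element (lat, lon, height) in a row-major array.

    Assumption: the array is stored in C (row-major) order with a fixed
    element size. Real file formats add headers and may chunk the data.
    """
    n_lat, n_lon, n_height = shape
    if not (0 <= lat < n_lat and 0 <= lon < n_lon and 0 <= height < n_height):
        raise IndexError("index out of range")
    linear_index = (lat * n_lon + lon) * n_height + height
    return linear_index * itemsize

# Hypothetical 2 x 3 x 4 grid whose values equal their linear index.
shape = (2, 3, 4)
values = [float(i) for i in range(2 * 3 * 4)]
raw = struct.pack("<24f", *values)  # what the storage layer sees: bytes

# The application asks for (lat=1, lon=2, height=3); the storage layer
# only sees "4 bytes starting at this offset".
offset = byte_offset(1, 2, 3, shape)
(value,) = struct.unpack_from("<f", raw, offset)
print(offset, value)
```

The middle layers of the stack do this kind of translation for you, together with everything else that a flat byte stream does not describe.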

A

So here's the secret of slow I/O. We often hear about contiguous I/O and non-contiguous I/O. What does that mean? Let's look at it from the disk level.

A

When we talk about contiguous I/O, it means the data are located close together. Imagine you have an image, and you store that image on disk in row-major order. By contiguous I/O we mean that we can read the file contiguously: first row, second row, and so on. Now consider the case of non-contiguous I/O.

A

That means you want to read the image column by column. With such a non-contiguous I/O pattern, the disk will have to read some block, then jump to another block, and it will seek a little bit, which causes some latency. In this simple calculation you can see that there is quite a dramatic difference between the contiguous I/O cost and the non-contiguous I/O cost.
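The calculation on the slide is not captured in the transcript, so the following is only a sketch of the same idea. It counts how many separate contiguous requests a row-by-row and a column-by-column read of a row-major image need, then applies an assumed cost model. The 10 ms seek time and 100 MB/s bandwidth are illustrative assumptions, not NERSC numbers.

```python
def row_order_offsets(rows, cols, itemsize):
    """Byte offsets visited when reading a row-major image row by row."""
    return [(r * cols + c) * itemsize for r in range(rows) for c in range(cols)]

def column_order_offsets(rows, cols, itemsize):
    """Byte offsets visited when reading the same image column by column."""
    return [(r * cols + c) * itemsize for c in range(cols) for r in range(rows)]

def count_requests(offsets, itemsize):
    """Number of contiguous runs; each run needs its own seek."""
    runs = 0
    expected_next = None
    for off in offsets:
        if off != expected_next:
            runs += 1
        expected_next = off + itemsize
    return runs

def estimated_seconds(n_bytes, n_requests, seek_s=0.01, bandwidth=100e6):
    """Assumed cost model: one seek per request plus transfer time."""
    return n_requests * seek_s + n_bytes / bandwidth

ROWS, COLS, ITEMSIZE = 100, 100, 1  # hypothetical 100 x 100 one-byte image
total_bytes = ROWS * COLS * ITEMSIZE

row_runs = count_requests(row_order_offsets(ROWS, COLS, ITEMSIZE), ITEMSIZE)
col_runs = count_requests(column_order_offsets(ROWS, COLS, ITEMSIZE), ITEMSIZE)

print("row by row:      ", row_runs, "request(s)",
      estimated_seconds(total_bytes, row_runs), "s")
print("column by column:", col_runs, "request(s)",
      estimated_seconds(total_bytes, col_runs), "s")
```

Under these assumptions the seek term dominates the column-wise read, which is the effect described above.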

A

But that story is totally different on SSDs, because they have no moving parts. I will focus on the traditional HPC hardware, and in the next session Wahid will talk more about the burst buffer.

A

We did some studies, and we think the I/O challenge will become more severe in the next few years, in 2020 or 2025. Here's some simple data. If your application currently produces 10 to 100 terabytes of data, then two years from now that application can produce 3 times more data, and seven years from now it will be 22 times more data. Given that huge amount of data, loading it efficiently into memory to continue your data analysis will be challenging.

A

The parallel file system comes in to leverage multiple disks and multiple object storage servers, which can bring parallelism and performance to your application. At NERSC we use Lustre and GPFS. Those are really popular parallel file systems across HPC facilities, and that's the layer right on top of the I/O hardware.

A

This diagram shows our current architecture. We have Cori Haswell and Cori KNL. Those two partitions both connect to the LNET routers, which are 130 servers. The LNET routers redirect your I/O requests to the storage through the parallel file system, Lustre. In total we have 148 object storage servers. As additional information, you can spread your data over more object storage servers with a simple command like stripe_large followed by a dot, where the dot means your current directory.

A

You can specify that your data be stored on seven OSTs instead of just one OST. To check the striping information, you can simply type lfs getstripe to get the current striping information of your current directory.
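As a rough illustration of what striping does, here is a minimal sketch that spreads a file round-robin over several OSTs and reports how many bytes land on each. The 1 MiB stripe size, the 10 MiB file and the function names are assumptions for the example. The OST numbers are positions within the file's layout, not real server IDs.

```python
MIB = 1 << 20

def ost_for_offset(offset, stripe_size, stripe_count):
    """Index (within the file's layout) of the OST holding this byte."""
    return (offset // stripe_size) % stripe_count

def bytes_per_ost(file_size, stripe_size, stripe_count):
    """Bytes stored on each OST for a file striped round-robin."""
    totals = [0] * stripe_count
    offset = 0
    while offset < file_size:
        chunk = min(stripe_size, file_size - offset)
        totals[ost_for_offset(offset, stripe_size, stripe_count)] += chunk
        offset += chunk
    return totals

# Hypothetical 10 MiB file with a 1 MiB stripe size.
one_ost = bytes_per_ost(10 * MIB, MIB, 1)
four_osts = bytes_per_ost(10 * MIB, MIB, 4)

print([n // MIB for n in one_ost])    # everything on a single OST
print([n // MIB for n in four_osts])  # load shared across four OSTs
```

With one OST, a single server handles every request for the file. With more stripes, several servers can work on different parts of the file at the same time.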

A

On top of the parallel file system is the I/O middleware. I/O middleware comes in to bring some optimization and I/O scheduling to speed up your I/O. For example, MPI-IO provides collective I/O and non-blocking I/O. By default, an application will typically use just independent I/O, in which all the processes of the application do their I/O by themselves, without any coordination.

A

If we turn on collective I/O, then before the processes actually access or read the data, they communicate first. They share their access information to figure out the optimal way to access the data. Mostly, collective I/O can optimize your non-contiguous I/O. If you access the data non-contiguously, then turning on collective I/O can sometimes bring a performance benefit. But sometimes we still want to use independent I/O, because collective I/O has a communication phase, and that can cause some synchronization cost.
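The coordination step can be pictured with a small simulation. This is only a sketch of the request-merging idea, not how any MPI library implements collective I/O; real implementations also move data between processes. The rank count, block size and function names are hypothetical.

```python
def strided_requests(rank, n_ranks, block, n_blocks):
    """(offset, length) extents for a rank that owns every n_ranks-th block."""
    return [((i * n_ranks + rank) * block, block) for i in range(n_blocks)]

def merge_extents(extents):
    """Merge extents that touch or overlap into sorted contiguous extents."""
    merged = []
    for offset, length in sorted(extents):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            last_off, last_len = merged[-1]
            new_end = max(last_off + last_len, offset + length)
            merged[-1] = (last_off, new_end - last_off)
        else:
            merged.append((offset, length))
    return merged

N_RANKS, BLOCK, N_BLOCKS = 4, 1024, 8  # hypothetical job and access pattern

# Independent I/O: every rank issues its own small, strided requests.
per_rank = [strided_requests(r, N_RANKS, BLOCK, N_BLOCKS) for r in range(N_RANKS)]
independent_requests = sum(len(reqs) for reqs in per_rank)

# Collective I/O (idea only): ranks share their extents first, and the
# combined access turns out to be one large contiguous region.
all_extents = [extent for reqs in per_rank for extent in reqs]
collective_extents = merge_extents(all_extents)

print(independent_requests, "independent requests")
print(collective_extents, "after sharing access information")
```

The sharing step is the communication phase mentioned above, which is why collective I/O is not always a win when the access pattern is already contiguous.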

A

On top of the I/O middleware we have the high-level I/O libraries. I would say that layer is closer to human beings and to our data. Take HDF5, for example. Whenever we talk about HDF5, it's not just I/O, it is a data model. That layer can help you describe your problem easily and can also manage the I/O for you. So you don't have to learn a lot about the MPI-IO details: how to turn on collective I/O, how to figure out the layout and set the file view, and things like that. You can just use HDF5 to focus on your data and your problem, and the I/O part can be handled well by HDF5 internally.

A

And when using HDF5, if you want to try parallel I/O, you simply add those red lines, and that immediately changes your serial code into parallel code.

A

The last layer is the productive interface. You may have heard about Spark, which is a big data framework, and TensorFlow, which you can use for your deep learning applications, and so on. When using this kind of productive software, you have to pay attention to its I/O interface. We provide some recommendations for this productive software: for example h5py, which is the productive interface at the Python layer, and also TensorFlow I/O. At NERSC we have people working on that, so it will be released soon.

A

Good question. Mostly, text files and other file formats that are popular in the commercial world are well supported in the existing software. But when running that in an HPC environment, we want to leverage the parallel file system. To do that, we have to leverage parallel I/O, and typically a text file interface or I/O plugin doesn't support those parallel I/O features. So we typically try to convert those text files into the HDF5 format, just because we have some nice parallel I/O interfaces on top of that.

B

A

Don't understand so York so.

B

She should they stopped using a web stream and.

A

It depends. Some people say, we don't have any I/O problem, because their data is tiny and they can immediately load it into memory. In that case you really don't want to bother with parallel I/O, turning on MPI-IO, and things like that. In that case, probably stick with what you have. But if you will produce 100 gigabytes, or even terabytes, you really want to think about the file format and how the I/O piece will work.

A

And here's an example of using h5py and how to turn on collective I/O with one line of code, "with dset.collective". Then you can really leverage and benefit from collective I/O. Here is a code comparison: on the left is h5py, and on the right is HDF5 written in C. You can see that the first few lines map to an entire page on the right, the next few lines map to another page, and the last few lines map to a third page.

A

So you can see that in terms of coding effort, writing in h5py, or Python, can be really productive. But in terms of performance, the question again is: when you gain some productivity, how much performance can you afford to lose? We did some studies and found that in most cases it's okay. The Python layer doesn't add too much overhead on top of the C layer. But in some cases, for metadata-heavy operations or operations that involve communication, there is some performance loss from using the productive interface. Last slide.

A

Okay, so it's about Athena's I/O. Athena is an astrophysical code, used in a wide range of problems like the interstellar medium and star formation. When the user first came to us, he asked: I want to know how much time I/O is taking in my code. I want to see if there is some I/O bottleneck. So I used Darshan, which is also something we recommend for I/O profiling.

A

I simply profiled his code and showed him this plot. We found that 40 percent of his code's time is spent doing I/O, which is useless. For years his code has been wasting his time doing this input and output, when what he really cares about is the analysis and the science, right? Then later we also figured out a few more details about this code's I/O pattern. We found that every few seconds his code produced a tiny HDF5 file.

A

The I/O pattern is what we call non-contiguous, and the number of processes is in the thousands. Then we tried turning on collective I/O, and here is what the user emailed us. The user made that change and found that it actually solved his problem. The I/O used to take 40 percent of the time, and now this is where it is. Okay, thank you. If you have any questions, feel free to email our consultants, or look at this website, and you can easily find us. Thank you.