From YouTube: PGO Deep dive - special benchmarking meeting
B: Hi everyone. Over the last few weeks at Microsoft I've been studying the performance benefit we get from profile-guided optimization. This is the research I did and the results I've seen; I wanted to present it and hear what people think about it.

So let's start with the basics: what is PGO? PGO stands for profile-guided optimization. You first compile an instrumented binary, which means the compiler adds probes to the control-flow graph of the source code. Once you have the instrumented binary compiled, you run it through the training scenarios, and the instrumented binary records all that information and dumps it into profile data files. Once your training scenarios are done, you take those profile data files and recompile the binary to get the optimized binary.
What that does: from the profile information, the compiler knows which hot code paths were taken while the training scenarios were executed, and it optimizes the code for those paths. That's why it is an optimized binary compared to the normal binary.
Some of the benefits we get from PGO: it enhances program locality — basically, it knows which basic blocks jump to which, so it brings them closer to each other. There is also the benefit of virtual call speculation: if you have multiple derived classes and some virtual calls, it knows from the profile information which derived class was most frequently called into, and it can optimize for that. It also does function inlining and better register allocation, because it knows what kind of assignments you were doing while recording that scenario, and it learns the branch behavior for branch prediction.
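The virtual-call speculation mentioned above can be shown in miniature with a C function pointer (the same profiling applies to C++ virtual dispatch). Everything here — the function names and the guarded rewrite — is a hypothetical hand-written sketch of the transformation the compiler performs for you:

```c
typedef int (*op_fn)(int);

int add_one(int x) { return x + 1; }  /* dominant target during training */
int negate(int x)  { return -x; }     /* rarely seen target */

/* Before PGO: a plain indirect call -- opaque to the optimizer. */
int dispatch_plain(op_fn f, int x) {
    return f(x);
}

/* After PGO, conceptually: the profile says add_one is almost always
 * the target, so the compiler emits a guarded direct call that it can
 * inline, keeping the indirect call only as a fallback. */
int dispatch_speculated(op_fn f, int x) {
    if (f == add_one)
        return x + 1;   /* speculated, inlined fast path */
    return f(x);        /* cold fallback for any other target */
}
```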
B: So, for example, take a look at this graph. Say the flow goes from A to B once; then there is a condition where one side is taken just once but the other side goes into block C a hundred times; C has another condition where ninety times it goes through the right flow and ten times through the left flow; and then from E it goes to F ninety times.

Normally, in the normal binary, the basic blocks will be arranged in source order — A B C D E F, just whatever the compiler has seen. But with profile-guided optimization, it knows the path C-E-F was taken more than C-D-F, so it moves block E closer to C. Your jumps are then not far jumps but near jumps, and the infrequently executed blocks get moved down — for example, block D has been moved down. That's one example of how it benefits from the information it records; there's a lot of other information gathered, like what I called out on the previous slide, and using all of that it produces the optimized binary.
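The layout decision in this A-to-F example has a manual counterpart: GCC's __builtin_expect lets you state the 90/10 bias by hand, and the compiler then keeps the likely successor on the fall-through path — exactly what PGO infers automatically from the recorded counts. The function and threshold below are invented for illustration:

```c
/* PGO would learn that this branch is taken ~90% of the time and lay
 * out the hot successor (the "block E" of the example) right after the
 * test ("block C"), so the common case is a short fall-through jump. */
int classify(int reading) {
    if (__builtin_expect(reading < 90, 1)) {
        return 1;   /* hot path, kept near its predecessor */
    }
    return 0;       /* cold path ("block D"), moved out of line */
}
```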
B: There are also some case studies. ASP.NET Core recently published a blog where they used a PGO binary and saw five to ten percent improvements in the startup of, for example, the MusicStore app. Chromium posted a blog recently talking about building Chromium with PGO binaries; Mozilla has had that for a while, and Python does it too.
B: So with that, I wanted to experiment with how beneficial it is for Node. Basically, I chose a training set: the Acme Air benchmark, the TechEmpower benchmark, the core benchmarks that Node has, and the top ten node modules that most other modules depend on. I've included a link to where I took this list from, and the right-hand side shows the ten modules I used. I just ran the unit tests of those ten modules — and obviously npm install to install them. That was my training set. I produced a PGO binary using this training set and then measured the performance improvement on Acme Air, TechEmpower, and the core benchmarks.
B: This is the result I saw, based on a Node build as of January 3rd. On Acme Air I saw about a five percent improvement; on TechEmpower, seven percent; and there are a lot of improvements in the core benchmarks. I have the details of the individual benchmark improvements listed in my GitHub issue, but to summarize: buffer has 25 to 50 percent improvement, querystring up to 30 percent, timers have up to 20 percent, and HTTP around 15 percent. And that's without hand-coding anything — just running the training scenarios and recompiling the Node binary with that training data.
B: I didn't have to do anything else.

A: Of all these benchmarks and improvements — were there any that went down?
B: No — at least on this slide. Basically, whatever training scenarios I used, I measured the performance on those same benchmarks, so it's very likely I wouldn't have seen any regression on those benchmarks. But it is definitely possible that some other scenarios might have regressed.
B: And that's a point I'll talk about in the challenges section. But yeah — whatever training scenarios you run, those are the benchmarks that see the improvements. The other experiment I did was to use just Acme Air and TechEmpower as the training set, and there I saw a fourteen percent improvement in TechEmpower, versus the seven percent I saw previously. So it depends on what training set you've used.
B: So the challenges with this PGO approach: you need to choose the right training set, which means you want to make sure that you have improved the common scenarios without regressing the uncommon scenarios much. That's definitely a trade-off, but you want the most common scenarios to benefit from the PGO binaries.
The other challenge is that you need robust automation for executing the training scenarios, because if we want to do this in CI, what we are saying is: you build Node, you execute the training scenarios, and then you rebuild Node — and that's where you get the PGO binary. So you want to make sure all the training scenarios run without failures, that there is recovery if there are failures, and that the profile data files are not missed when we execute the training set. That's one challenge. Another is that the profile data files are not shareable across architectures, platforms, or builds, so you can't take just one profile data file and rebuild the optimized binary from it on different architectures. You have to do this exercise separately for every platform.
B: We'd have to rebuild per platform. I haven't tried it across architectures — I can give it a try — but as far as I know, and from what I've read, you need to make sure the PGC files are generated separately. Part of the reason is that there are multiple files, at least on Windows: multiple files are produced once you build the instrumented binary, and those files are used when you build the optimized binary. So the first build has to happen on the same matrix as the second. And that brings me to the last challenge, which is that the build time will increase: it will be roughly double, plus the time spent running the training set.
B: So the downsides are: you have to build twice, you need to find the right training set, and there has to be some way to evaluate the side effects on other scenarios. I just wanted to get the community's thoughts on how we can address those side effects and downsides, and what people think of shipping a binary like this.
A: What would be interesting for me, anyway: if we trained it on Acme Air only, do we actually see degradation in the micro-benchmarks? Or, if we trained it on the micro-benchmarks, do we see degradations in Acme Air? It would be worth trying that exercise, because if we can't find a case where it makes other things worse, then it makes you a little bit less worried about that, right?
A: That would be an interesting data point. The other one is the build front — there'd be a significant challenge there, I think, because I don't think doubling the build time is going to fly with people for regression testing. At the same time, that's all the testing we have when we go to do a release.
A: Right — the thing is, the current CI jobs for releases don't actually run any tests, but they do build again. I believe what we do is build and run the tests against a particular commit in the regular CI, and once we say everything's good, we run the release job, which just creates the binaries from that tag.
B: Yeah, right — definitely the build time is doubled, plus the time for the training scenarios. But again, that depends: if we just trained with Acme Air and saw significant improvements in the micro-benchmarks, then it might be worth even a try, right?
A
Yeah
I
yeah
I'm
just
trying
to
think
how
to
like
I
think
first
yeah
we'd
have
to
see.
Is
there
I
mean
I,
don't
know
if
there's
any
way
to
really
mitigate
that
doubling
of
the
time
other
than
did
like
it's
too
bad.
You
couldn't
do
the
training
once
and
then
reuse
that
training
data,
even
if
it
wasn't
quite
as
good
right
yeah,
but
it
doesn't
sound
like
you
can
do
that
right.
Right.
A: I'm wondering — thinking along what Gareth is mentioning there — the first step might be to try to integrate into the makefile a target which would basically build. I guess the thing is, it would have to build, run the tests, run the training scenarios, and then —
B: We run the benchmarks, and then you set a flag saying: okay, now I have run the benchmarks, it is time to generate the optimized binary. That flag tells the build to use the profile data files to produce the binary. As for where those profile data files get generated: they land in the same directory where node.exe is.

A: Okay — so those could be copied, then?

B: Yes.
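The two-phase make targets being discussed could look roughly like this; the target names and the flag plumbing are hypothetical, not something in Node's actual Makefile:

```make
# Hypothetical sketch -- not Node's real Makefile.
pgo-instrument:
	./configure
	$(MAKE) CFLAGS="-fprofile-generate" CXXFLAGS="-fprofile-generate"
	$(MAKE) test benchmark   # training scenarios; dump the profile data files

pgo-use:
	$(MAKE) CFLAGS="-fprofile-use" CXXFLAGS="-fprofile-use"
```

On Windows the same two phases would map onto MSVC's instrumented and optimized link steps instead of the gcc flags shown here.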
A: Those two extra targets might be useful. And I don't think it needs to happen at the same time — we could separate it. Say a new release gets published at a particular hash; you could then envision a job that goes off in the background and says: ah, there's a new release, I'm going to build it in this PGO-enabled mode, I'm then going to run this set of benchmarks, and —
A: — then do make dist in this particular configuration. Now, the wrinkle with that is, we do have our release machines separated out, and I don't think we'd want to be running all the benchmarks and that kind of stuff on those machines — there just aren't as many of them. It might be something like we would run them on the regular farm instead.
A: But for example, on Linux we may test across Fedora and a bunch of other distributions, but we build on CentOS. I think we do at least have some test machines at the earlier levels as well; I just know our release machines are at the earliest levels, so that we actually get support across the different versions.
B: Yeah — definitely the builds have to be compatible. For example, if you generate the profile data files with one version of Visual Studio, I don't think you'd be able to build the optimized binary with a different one. But moving the files between machines is definitely possible, and that's how I did it.
A: Yeah, okay. So there'd be a bunch of wrinkles in trying to get that process in place. I just can't see us integrating it into our regression tests, and even getting it into our standard releases is going to be harder. But there might be an alternate flow where there's a second set of binaries — the PGO-enabled binaries. There'd be a bunch of work related to that, but I think the first step — actually making it easy, with make targets, to do the steps — isn't a bad thing, because then if somebody wants to try it out, they can. And like Gareth said, if you want to build Node yourself and optimize it for your app, that's not a bad thing. And then once that's there —
A: — if we have enough people willing to actively work on it, we could work through the steps of an offline sort of flow that generates builds people could at least try out to start with, like: here's the optimized binary versus the unoptimized one.
B: Yeah — I'm experimenting with some other partners within Microsoft that use Node. I recently shared with them the binary I optimized, so they can just try it for their app. I haven't heard back from them yet, but that was just yesterday.
D: I think there's probably still quite a bit to learn, especially if we look at the sort of optimizations that are being applied. It could point out areas of Node that aren't normally optimized very well, and perhaps code changes could be made to exploit those same optimizations in a normal binary, if there is something that we notice.
B: While building, it does call out that it has optimized X number of functions for speed, but it doesn't say exactly which functions. I can dig through and see if I can get that information out, and then we could hand-optimize those functions.
A: That way, like Gareth was saying, if somebody wants to build it themselves, they can turn on a few options to do that. That sounds good — and we would want those options anyway if we were ever going to use this in the regular binary production, right?
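For anyone who does want to build an optimized Node themselves: on gcc toolchains, Node's configure script has since grown switches for exactly this two-phase flow (--enable-pgo-generate and --enable-pgo-use). The training command below is only a placeholder — you would substitute your own workload:

```shell
./configure --enable-pgo-generate   # phase 1: instrumented build
make -j4
./node benchmark/run.js http        # placeholder training scenario
./configure --enable-pgo-use        # phase 2: rebuild using the profile
make -j4
```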
A: So it sounds like there are some decent next steps. Any other things we should talk about or discuss?
B: That's about it. This was just a first step — to see what people think about it, based on the results that I've seen. From what I hear, it's a good thing to try out as a next step, and the main pain points, I guess, are finding the right training set and the extra build time.
A: I think if we were completely convinced that it's always going to give us, say, a ten percent win, then you get to the question of how much effort we'd have to go through to actually produce binaries that give us that win. I think the flow I just walked through, where it's basically an offline process —
B: — is the way to go, as far as we can, right. I've been through that benchmark, and it is more realistic because it creates a lot of objects, versus Acme Air, which doesn't create many. So that's definitely a realistic test. If we can see improvements with PGO on those benchmarks and also on the micro-benchmarks, then it's worth pursuing all of those, I guess.
A: I think so, yeah. Long term it sounds good — ten percent is enough benefit that it's worthwhile; it's just work to get there as well. We need to build out our benchmarks, but if you have time to invest in it, it seems worthwhile.
A: Okay, sounds good. So for the viewers: what I'm going to do is look on node-dev — if you go to node-dev and ask a question, I'll check to see if there are any there. It'll just take me a second to log in.
A: I don't see any questions there yet, but we'll give people a minute or two in case. For anyone wondering what that is: it's the IRC channel, node-dev on IRC. That's one way people can ask us questions. I'm just trying to think if there's any other way — I don't know if there's a way for people to ask questions through YouTube.
A: There is a YouTube live chat which I've opened, though I don't know if people can access that directly from the outside. That's another way you could ask a question. Could we share a link below? I'm not sure it's just a link — basically, you have to join node-dev.