From YouTube: Gareth Ellis - Node.js Live London
Description
Gareth Ellis, a runtime performance analyst at IBM and a member of the Node.js benchmarking working group, provides a brief introduction to benchmarking and performance testing, the different approaches you can take to performance testing, what to do if you identify a regression, and the tools that you can use for benchmarking. He also provides an overview of what is happening in the Node.js benchmarking working group.
So what we're going to talk about today, then: I'm going to start off with a brief introduction to benchmarking and performance testing, look at some of the key challenges and the different approaches that you can take to performance testing, and then what to do if you identify a regression and how to find out what it is.
So, an introduction to benchmarking, then. One of the most important things when benchmarking or performance testing is to change one thing and one thing only between the different runs and the different things that you're comparing. Typically, the thing that you change will be whatever it is you're wanting to check the performance of: if you're wanting to check whether the latest version of your application code performs the same as or better than previous versions, you'll just be comparing the old version of the code with the new one.
But if you've changed more than one thing and something does go wrong, it's going to be very difficult to try and work out what it is that's causing the issue. It's worth mentioning that performance testing is quite different from functional testing. In functional testing, typically you'll run something a number of times and, if it works, great. With performance testing, there is no single answer that you'll get out of your benchmark, and this is one of the key challenges.
So no matter how many times you run your benchmark, chances are each time you run it you'll get a slightly different answer. Whether you're measuring, say, startup time, the chance of it starting in the same number of milliseconds each time is fairly small. So one of the things that you need to be aware of is this sort of fundamental run-to-run variance.
This can sometimes lead to false positives if you've got a theory about something that's going to happen. You might think, oh, my new application code is definitely going to run faster. You might just do one run of your old one: okay, that took 200 milliseconds. Then one run of your new one: that takes 190. Brilliant, it's faster, job done. But typically you'll get a large range of results, so it's very important to ensure that you run your benchmark a good number of times to give you an idea of the sort of expected variance in your scores.
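To give a feel for that, here is a minimal sketch, assuming a made-up bench() workload and a run count of ten, that repeats a measurement within one process and prints the spread you actually saw (for real comparisons you would run the whole process fresh each time, as described below):

'use strict';

// Hypothetical workload; swap in whatever you actually want to measure.
function bench() {
  const start = process.hrtime();
  for (let i = 0; i < 1e6; i++) Buffer.from([1, 2, 3, 4]);
  const [s, ns] = process.hrtime(start);
  return s * 1e3 + ns / 1e6; // elapsed milliseconds
}

const results = [];
for (let i = 0; i < 10; i++) results.push(bench());

const min = Math.min(...results);
const max = Math.max(...results);
const mean = results.reduce((a, b) => a + b, 0) / results.length;
console.log(`min ${min.toFixed(1)} ms, max ${max.toFixed(1)} ms, mean ${mean.toFixed(1)} ms`);
console.log(`spread ${(((max - min) / min) * 100).toFixed(1)}%`);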
It's also worth mentioning that if you go and run it, say, ten times and you find you've got a fifteen percent difference between the lowest and the highest number, that could well mean that you would find it difficult to measure, say, a regression or an improvement of five or ten percent. To be able to measure that sort of thing you'd need to run it a good number of times. Something that can help to reduce the variance is making sure you've got a consistent environment.
So, each time you run your benchmark or test your application code, it's good to try and have the machine that you're running it on in the same state. One of the things that we do is try to reboot our machines before each run. The longer a machine has been up, the more you'll find that the performance may change slightly, and whilst it might seem quite good to keep things consistent by staying on the same boot, if you then have to reboot your machine, say you've installed a kernel update or something like that, you're back in a state that is going to be very difficult to compare with the state you think you're normally in. So we found that rebooting the machine before each run at least gets us back to a position that we can easily recreate.
Something else is making sure that the machine is isolated from outside interference. That could be making sure your co-workers aren't logging into the machine that you're doing your testing on and running things that take away all the CPU, or, if your benchmark is also using the network, having a private network so you can make sure that somebody else isn't transferring a big file across the network and affecting your scores.
Something else that you can try is interleaving the runs of the two things that you're trying to compare. That would be doing one iteration of your, say, good build and one iteration of the build that you're testing, and alternating between the good build and the one that you're testing.
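A minimal sketch of that interleaving, assuming two node builds at made-up paths and a bench.js that prints a score:

'use strict';
const { execSync } = require('child_process');

// Hypothetical paths to the "good" build and the build under test.
const builds = [
  ['good', '/opt/node-good/bin/node'],
  ['test', '/opt/node-test/bin/node']
];

for (let i = 0; i < 10; i++) {
  // Alternate between the two builds on every iteration rather than
  // running all of one and then all of the other.
  for (const [name, bin] of builds) {
    const score = execSync(`${bin} bench.js`).toString().trim();
    console.log(`iteration ${i}, ${name}: ${score}`);
  }
}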
The final key challenge is jumping to conclusions. It's very easy, as I said before, to say: oh, I think my recent code change is going to improve startup time.
There are two different approaches you can take towards benchmarking, the first one being micro benchmarks. These can be quite good for measuring a specific function or API change, for example creating a new buffer, and they're good for comparing key characteristics. However, there are some downsides to micro benchmarking as well.
One is that you risk not measuring exactly what you think you're measuring. V8 has an optimizing compiler, and it's looking for ways to improve your code to make it run faster. It may notice that some expensive operation, for example assigning a variable from the result of some function, produces a value that you never actually do anything with, so it might just take that work away completely. You think, brilliant, it's really, really fast, but actually you're not testing what you think you are.
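A sketch of that pitfall, with a made-up expensive() function: in the first loop the result is never used, so the optimizer is free to throw the work away; keeping a value that depends on every result is one simple guard.

'use strict';

function expensive(n) {
  let total = 0;
  for (let i = 0; i < n; i++) total += Math.sqrt(i);
  return total;
}

// Naive timing: nothing uses the result, so the optimizing compiler may
// eliminate much of the work and the number you get back means very little.
let start = process.hrtime();
for (let i = 0; i < 1e4; i++) expensive(1e4);
console.log('unused result:', process.hrtime(start));

// Guarded timing: print something derived from every result so the work
// cannot simply be dropped.
let sink = 0;
start = process.hrtime();
for (let i = 0; i < 1e4; i++) sink += expensive(1e4);
console.log('result kept:', process.hrtime(start), sink);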
The other approach towards benchmarking is whole system benchmarking, so benchmarking perhaps an expected customer use case or a larger application. Something that we use in the community is Acme Air, which is a fictional airline company, and users can simulate creating themselves an account, booking flights, checking in, all that sort of stuff. This does have downsides as well: the more you test and the more code is involved, the more scope there is for variability. So, you've run a good number of tests and you think you've found a regression. What do you do now?
The first thing is to check: are you sure? Have you definitely not missed something, and actually measured somebody logging in and running something that's eating away at your CPU? Have a look at the expected variance: if you've got a variance of ten percent and you think you've found a one percent regression, you need to be quite sure, otherwise it's going to be very difficult, as you go through trying to make changes, to detect whether you've fixed it or not. If you're sure there is a regression, then you can have a look at what's changed.
Is it your application code? Is it node? Is it that you've moved to a different machine? There are a few different things that you can use to try and work out what the cause of the regression is. There are various tools, which we're going to look at in a second; another option would be just doing a binary search of the code changes between your good and your bad build, which is useful in some situations, maybe not so useful in others.
So, if we're looking at Node.js, we need to understand that there are a lot of places a regression could come from. It could be perhaps a change to some of the native JavaScript libraries. It could be that you've just upgraded to a new version of node that has pulled in a new version of V8.
It could also be that the new version has pulled in an OpenSSL security fix, which can sometimes cause performance regressions because you may not have been doing everything that you were supposed to be doing before. It could be a libuv update. It could be that you've pulled in a new dependency or an updated module that has had an adverse effect on your performance. There are some different tools that we can use. We could use a JavaScript profiler: there's one built into node through V8, and there are also other external packages.
So, as an example, this is a micro benchmark that we've been running at IBM, and it simulates creating a buffer from an array of numbers. We go and repeat this operation a large number of times and then run it through a test harness, which keeps going either until it gets good quality data or until it hits the maximum number of iterations that we've defined.
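A stripped-down sketch of that kind of benchmark (the real harness, iteration counts and quality checks are not shown; Buffer.from here stands in for however the buffer is actually created):

'use strict';

// Build an array of numbers to turn into a buffer.
const input = [];
for (let i = 0; i < 1024; i++) input.push(i % 256);

// Repeat the operation a large number of times and report the elapsed time.
const iterations = 1e5;
const start = process.hrtime();
for (let i = 0; i < iterations; i++) Buffer.from(input);
const [s, ns] = process.hrtime(start);
console.log(`${iterations} iterations in ${(s * 1e3 + ns / 1e6).toFixed(1)} ms`);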
You can run that by passing --prof on your node command line. It will then go and generate a file called isolate-, then some hex, then -v8.log in your current directory. You can then use the post-processor, which is built into node, by passing --prof-process and then this isolate log that it's created, and it will give you a load of different bits of output and metrics. appmetrics, as I said before, is another option, installable from npm.
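A minimal sketch of that workflow; bench.js below is just a stand-in workload, and the exact hex in the log file name will differ on your machine:

// 1. Run with the V8 profiler enabled:
//      node --prof bench.js
//    This writes a file such as isolate-0x<hex>-v8.log into the current directory.
// 2. Post-process it with the tool built into node:
//      node --prof-process isolate-0x*-v8.log > profile.txt
//
// bench.js can be any workload, for example:
'use strict';
let sum = 0;
for (let i = 0; i < 1e6; i++) sum += Buffer.from([i & 0xff])[0];
console.log(sum);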
But if it's a massive change, for example you've just upgraded from node 0.8 to node 6, then there are going to be a lot of changes to go through to try and work out what's caused the regression. git bisect is an option which could help you, if the change that you're looking at is in git. So, before, I showed you the V8 profiler, --prof. The profile at the top here is from node 4.3.2, and we can see the hottest method.
It is this lazy compile of fromObject, and it gives us the line number in the native JavaScript buffer library. We can see that about 23.9 percent of the ticks are happening in this lazy compile; when we go to node 4.4, that jumps up to 47 percent, so that's certainly something that's worth looking at. The lazy compile is part of the compilation, so it's not necessarily a regression in fromObject itself.
It's the compilation that's taking a lot of the ticks. With perf, the system profiler, you can do a similar sort of thing. There's a massive number of options that you can pass, which you can get from the perf man pages, and by passing --perf-basic-prof on your node command line you supply perf with the V8 symbols, so it can match the JITted or compiled code to what it actually was. And again we go with this example.
We can see twenty-three percent of the time in node 4.3.2 and forty-six percent of the time in node 4.4, so we're seeing something to do with compilation. There are some extra options that we can pass in; again, these are V8 options that we can pass straight to node: tracing optimizations and also deoptimizations, --trace-opt and --trace-deopt. This is what happens when we go and do that on node 4.4: we can see, first of all, that the profiler spots that fromObject is a hot method.
It's being called a lot of times, so it's very hot and therefore worthy of being compiled, so V8 goes and compiles it using Crankshaft. It goes into some further optimizations, completes the optimization, and then straight away goes and deoptimizes it. So there's a good chance that this is what's using some of our time.
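As a rough sketch of how you would capture that kind of trace on your own code (bench.js is a placeholder, and the exact output format is V8's and changes between versions):

// Pass the V8 tracing flags straight through node:
//   node --trace-opt --trace-deopt bench.js
// The output then shows which functions get picked up by the optimizing
// compiler and which ones get deoptimized again.
'use strict';
function hot(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) total += arr[i];
  return total;
}
let sum = 0;
for (let i = 0; i < 1e6; i++) sum += hot([1, 2, 3, 4]);
console.log(sum);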
After an issue and a pull request, it turned out that the problem was that in node 4.4 we'd gone through and, in all the for loops, changed the step variable to be declared with let rather than var, which the current optimizing compiler in V8 has an issue with. This will be fixed when TurboFan becomes the default, but in node 4.4 it isn't the default.
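To illustrate the kind of change involved, this is a hedged sketch rather than the actual Node.js commit: the loop counters had effectively moved from the var form to the let form, and at the time Crankshaft could not optimize functions containing let, which is what showed up as the optimize-then-deoptimize churn.

// Before: loop counter declared with var.
function sumVar(arr) {
  var total = 0;
  for (var i = 0; i < arr.length; i++) total += arr[i];
  return total;
}

// After: loop counter declared with let. Functionally the same here, but
// Crankshaft (the optimizing compiler in V8 at the time) bailed out on it.
function sumLet(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) total += arr[i];
  return total;
}

console.log(sumVar([1, 2, 3]), sumLet([1, 2, 3]));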
So we reverted the change and got the performance back, which is good. The next bit I want to talk about, then, is what we've been doing in the node community benchmarking working group. The working group's goal, or mandate, is to track and evangelize performance gains between node releases. So we've been defining use cases: where is it that people have been using node, and what are the areas that we can be looking at, trying to get as many real-world examples of people using node as possible?
That way we can make sure we're looking in the right places for any performance regressions. We've identified some benchmarks, but we still have more to identify. Indeed, if you're aware of any, you're more than welcome to come and raise an issue, or even submit a pull request with some more benchmarks, so we can be running those in the community on regular builds and spot any regressions that may be coming in, which means that you don't end up deploying them in production and then having issues.
We've been running and capturing the results, so you can go to benchmarking.nodejs.org, where we've got a set of graphs of the current benchmarks, with results tracking node 0.12, node 4, node 6 and also the master branch. We've currently got 13 members and we have meetings roughly every month to month and a half, something like that.
Here are some of the use cases that we've defined so far. The first one would be back-end API services, so REST or REST-like, typically over HTTP on public infrastructure. The main focus there would be trying to ensure that you can get good performance over public infrastructure, where things such as latency and bandwidth may be a concern. Then there are service-oriented architectures.
The next one would be microservice-based applications, so nimble, low-resource, quick-startup apps. Typically these sorts of things may also use some different types of networking, perhaps UDP, to try and get stuff to happen as quickly as possible, and we want to make sure that we can track these so that we don't go and regress them in node. Then there's generating and serving dynamic web page content, so things such as Express, Hapi, Koa, React, all that sort of thing, very popular frameworks.
We need to make sure that we've got benchmarks that cover these, so we don't go checking in changes that could potentially affect lots and lots of users. Then there are single page applications, which is typically where the main GUI of an application is served via an HTTP request and then further updates are done over either WebSockets or HTTP/2. And finally, agents and data collectors.
For all of those use cases there's a number of metrics that we'd probably be interested in looking at: consistently low latency, the ability to support high concurrency, high throughput, fast startup, shutdown and restart times, and also low resource usage. As for benchmarks that we've currently got running in the community, we've got some that are tracking startup time, we also look at the footprint of a small process and the time to require modules, which is something that lots of people are going to be hitting, and we have Acme Air running, which measures throughput.
We look at the response time and also at footprint measurements whilst the application is running. It's all well and good if it's very small at startup, but once the application gets going most people are going to have quite a bit of load applied to it, so we want to make sure that it's not growing out of control. We've also recently checked in a Dockerfile, so you can go and build your own Docker image, which will compare two versions of node and then throw out a comparison at the end.
I'd encourage you to have a look at that. We've got a number of other benchmarks in progress as well, looking at the performance of URL, and also trying to run the benchmarks which are in the Node.js source and actually graph some of the results from those. As I mentioned before, at benchmarking.nodejs.org we've got lots of graphs like this.
This one is a graph where higher is better, so we can see there at the top that the purple and the blue, which are node 6 and node master, are faster than previous releases, which is what we want to see. There are lots more graphs on benchmarking.nodejs.org, so I'd encourage you to go and have a look there. And then finally, how you can get involved: as I mentioned before, github.com/nodejs/benchmarking, go and have a look at what's going on.