Cloud Native Computing Foundation KubeCon + CloudNativeCon North America 2019 (San Diego), 22 Nov 2019

From YouTube: Networking Optimizations for Multi-Node Deep Learning on Kubernetes - Rajat Chopra & Erez Cohen

Description

Networking Optimizations for Multi-Node Deep Learning on Kubernetes - Rajat Chopra, NVIDIA & Erez Cohen, Mellanox

Training a neural network may take days or weeks, even on a top-of-the-line GPU. To reduce training time, distributed computation is often employed to spread the work across multiple GPUs and multiple nodes. Horovod is the best example of such a scalable architecture. At NVIDIA, in collaboration with the community, we have configured Kubernetes and multi-node infrastructure to deliver performance that scales as we add more GPUs and nodes. This talk presents the problems and solutions related to networking discovered during this journey. The non-exhaustive list includes solutions like CNI for multiple networks using SR-IOV, enabling RDMA over IB and Ethernet (RoCE) to provide low latency, high throughput and direct GPU-to-NIC connectivity (GPUDirect), enforcing PCI affinity of GPUs with respect to network interfaces, using source-based routing within pods for L3 networks, and much more.

https://sched.co/UabV

A

All right, good afternoon, everybody. Welcome to today's session, Networking Optimizations for Multi-Node Deep Learning on Kubernetes. Before we get going, I'd like to remind everybody to please rate the session afterwards on the schedule app. Also, if you have any translation needs, we've got translation going on over here: just punch in that URL, type in the code, and then you have the app on your phone.

A

Please use headphones, though, rather than the phone's speaker. I'd like to introduce the speakers for today: Erez Cohen, vice president for CloudX and AI programs at Mellanox, and Rajat Chopra, principal software engineer with NVIDIA, working on AI and deep learning infrastructure. With that, I'd like to hand it over to you guys. Thank you.

B

Thank you very much. So, good afternoon, everybody.

B

We thought to start with "why machine learning", but to be honest, these days it's quite obvious. Machine learning is everywhere. Even this little thing we have here that translates is all machine-learning based. So really, machine learning is everywhere, and we definitely want to run machine learning in Kubernetes as well.

B

This is just a basic definition of what machine learning is, from Wikipedia. In essence, it is the ability to allow machines to learn from data and program themselves. But the real development over the past few years is around a subfield of machine learning called deep learning, or deep neural networks, which is basically an implementation of machine learning inspired by the brain; in other words, taking a biological implementation, if you like, and translating it into a software implementation. These are the very basics of neural networks.

B

I will not do a neural networks course here, but I do want you to understand the basics, so it will help us understand the challenges of running at large scale. What you see here is a single neuron, which has three inputs on the left-hand side, x0, x1 and x2, and with each input we associate a weight: w0, w1 and w2. Those weights are very important. Those weights will define whether this neural network works properly or not.

B

So we need to tune them, and we'll talk about how we do that in a minute. Those inputs and the weights go into the cell body. There's a simple function there that decides whether this neuron should fire or not. So a single neuron is pretty simple, and a single neuron doesn't really help us a lot, but when we combine them together, interesting things start to happen. This is a neural network that does image processing and predicts the digit in an image: 0, 1, 2, all the way to 9.
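The single neuron described above can be sketched in a few lines of Python. This is only an illustration: the talk says there is "a simple function" in the cell body without naming it, so the step activation and the `threshold` parameter here are assumptions, and the input and weight values are made up.

```python
def neuron(inputs, weights, threshold=0.0):
    """Weighted sum of the inputs, followed by an assumed step activation."""
    total = sum(x * w for x, w in zip(inputs, weights))
    # The neuron "fires" (returns 1) only if the weighted sum exceeds the threshold.
    return 1 if total > threshold else 0


# Three inputs x0, x1, x2 with their weights w0, w1, w2 (illustrative values).
x = [1.0, 0.5, -1.0]
w = [0.2, 0.4, 0.1]
fired = neuron(x, w)
```

Training, in this picture, is the search for values of `w` that make the neuron (and the network built from many of them) fire on the right inputs.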

B

You can see the output on the right-hand side and the input on the left-hand side; this is an image. This is a 3-layer network and, as you can see, there are many connections between the neurons; the neurons are all interconnected. Now, on every connection there are weights, as we said earlier, so you can assume that there are quite a lot of weights in this neural network. But neural networks can be much bigger.

B

This is a 12-layer neural network and, as you can imagine, there are many more parameters, many more weights, that need to be tuned for this neural network to work efficiently. And this is definitely not the biggest neural network. Actually, today we're seeing neural networks that are multiple hundreds of layers, and they can be very, very large and very complex.

B

Now, when we build a neural network, in the beginning it doesn't act as we want it to. We need to train it, we need to teach it, and the way we teach a neural network is through a process where we feed it information that is pre-classified or pre-tagged. A good example,

B

a very classical example, is image processing, where we are trying to teach a system to distinguish between dogs and cats. So we will feed it images of dogs and cats, where we know which are dogs and which are cats, and on the output we will check what the result was, and we will try to feed back and fix those weights

B

that I mentioned earlier, through some kind of algorithm, until the point where we feed it a dog image and it says it's a dog with good probability, and a cat image and it says it's a cat. At that point we say that the neural network is trained, and then we can move to the inference phase. The inference phase is basically the execution part. This is where we actually push in untagged data. It can be an image where we don't know exactly what it is.

B

It will tell us if it is a dog or a cat. Of course, this is a simplified example. The training phase is very challenging. It is a very compute-intensive process. This is why the NVIDIA GPU is such a wonderful solution, because it allows us to drive computation very fast. But what we see is that those neural networks are growing in complexity, and the input data is growing as well, and actually today, if we look at modern models and problems, it takes us weeks to train a model.

B

Obviously, that is very challenging, because a week is a long time, but also we need to train quite often. There are two reasons why we need to train often. One is that developing a model is not a linear process, like writing C code or any other language you are used to. You write the model, you train it, you see how it behaves, you change it, and then you train it again, and you go on and on. Now, if you do that and it takes you a week between trainings, that's very inefficient.

B

Another reason is that, just like in life, you have to keep on learning. You build the model, you train it, great, you put it in production, but things change. The world is dynamic, and we need to be able to keep on teaching the model and make it more accurate. So what we really need is to accelerate our training time, and the only way to do that is scale-out computing: basically, add more computers. So how do we do scale-out training in machine learning?

B

There is a paradigm called data parallelism. Basically, instead of having a single computer that has all the input data, we add more computers, we scale out, and we split the input data between the different computers. So in this example, each computer will get 1/6 of the data and then do local training. Now, if each of the computers does local training, you will end up with six different models, because each of them was working on different data elements, so that will not work.

B

What we really need to do is combine those machines' power together, and the way to do that is basically to split the data into mini-batches, let's say 32 images, do local training on each node, and then communicate the results and combine them together through the network. It can be to a single computer, it can be between the computers; there are different ways to do that. And eventually, through this process, when we iterate it multiple times (it can be hundreds of thousands or millions of iterations),

B

we will get a single end model that is trained. Now, when we look at the network element in the training phase, we see that the communication pattern is very, very challenging. It will usually be very high performance, high throughput. Those models can be transferring tens and hundreds of gigabits per second, so it's a lot of data from every computer, and you usually have a lot of computers doing that. It will have a very high message rate and low latency requirements, and it is a collective operation in nature.

B

What I mean by collective operation is that all those nodes are working together. They work on a mini-batch, they communicate, and they wait for each other, so they work as a collective, and as a collective you need to wait for everybody and synchronize everybody together. This type of communication pattern very much reminds us of high-performance computing networks, which are all the supercomputers, and we know from many years of experience that advanced network techniques such as RDMA and GPUDirect are critical for efficient training.
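The data-parallel scheme described above, where each node trains on its own slice of the data and the results are combined in a collective step, can be sketched as a toy simulation. Everything here is an assumption made for illustration: the one-parameter model `y = w * x`, the function names, and the in-process `allreduce_mean` that stands in for the real network collective. It is not how any particular framework implements it.

```python
def local_gradient(w, batch):
    """Gradient of the mean squared error of y = w * x over one node's batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def allreduce_mean(values):
    """Stand-in for the collective: every node ends up with the same average."""
    return sum(values) / len(values)


def train(shards, steps=200, lr=0.05):
    w = 0.0  # every node starts from the same weight
    for _ in range(steps):
        # Each node computes a gradient on its own data only.
        grads = [local_gradient(w, shard) for shard in shards]
        # All nodes wait for each other, then apply the identical update,
        # so there is still a single model rather than one per node.
        w -= lr * allreduce_mean(grads)
    return w


# Six samples of y = 3x, split across three simulated nodes.
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]
shards = [data[0:2], data[2:4], data[4:6]]
w = train(shards)
```

In a real cluster, `allreduce_mean` is where the network sits: every step blocks until all nodes have exchanged their results, which is why throughput and latency between nodes matter so much.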

B

I mentioned RDMA. Let me just give you a quick explanation of what RDMA is. RDMA is remote direct memory access. It is a transport service. You guys probably know TCP and UDP; it's in the same layer as those, but it was designed much later and it is much more advanced in terms of feature set.

B

It provides the ability to do read and write over the network, not only send and receive, and it provides kernel bypass. Kernel bypass is when the application talks directly to the hardware, bypassing the kernel, and by that we get very low latency. It also provides full hardware offload, at least with the Mellanox NICs, which means that we can transfer hundreds of gigabytes without any CPU intervention. So the CPU load is zero and the efficiency is very, very high. RDMA started from a technology called InfiniBand, but today it's part of Ethernet.

B

It is called RoCE, and the software interface is not sockets. It is an interface called verbs, which is very important, because the normal net devices that we pass to the container don't provide the RoCE or RDMA interface. We need a different interface.

B

GPUDirect is a technology that gives us better efficiency in sending and receiving data between the GPU and the network. What you see on the top right-hand side is how you get data in and out of the GPU memory without GPUDirect. What you would typically do is copy data from the GPU memory to the host memory, copy it from the host memory to another buffer in the host memory for the network, and send it out. That's not very efficient.

B

GPUDirect is a technology that was developed by NVIDIA and Mellanox together about 10 years ago, which allows the NIC and the GPU to communicate directly with each other. So the NIC can access the GPU memory and send and receive data directly from there. Obviously, it provides much better efficiency. So how do we enable RDMA and GPUDirect in Kubernetes?

B

So today we are using SR-IOV as a mechanism to expose RDMA and GPUDirect. SR-IOV is a PCI specification; it stands for single root I/O virtualization. Basically, what does it mean? It means that you can take a PCI device, slice it, and provide the slices, called virtual functions, to an application. In the image at the bottom here, on the left-hand side you see the standard configuration, and on the right-hand side you see the SR-IOV one.

B

You can see that every virtual function coming from the device has a net device, but also the RDMA devices, the devices that are needed for RDMA, which can be mapped into the container. There are CNIs and device plugins to provision SR-IOV. It is completely standard and upstream, and you can find the links on this page. So from a container perspective, when the SR-IOV device plugin and CNI plugin are activated, the container will see both an RDMA device,

B

this is the IB dev that you see here, as well as a net device under the namespace of that container, and it will get a slice of the NIC as part of the SR-IOV interface.

B

So how does it look from an orchestration perspective, at the Kubernetes level? First, the SR-IOV network device plugin will advertise the SR-IOV capabilities: basically, how many virtual functions each device has. Then, when you launch a pod, it will declare that it needs SR-IOV, if it does, or rather that it needs a virtual function. The Kubernetes scheduler will then run the pod on the right host, with those resources, and the device plugin will allocate the virtual function that will be connected to that pod.

B

The allocated device is communicated to the SR-IOV CNI, because we need the net device to run on that interface, and then the SR-IOV CNI will move the virtual function's net device into the pod's namespace.
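The orchestration flow described above (advertise virtual functions, request one in the pod, schedule onto a host that has it, allocate it) can be modelled with a small toy simulation. This is not the Kubernetes scheduler or the SR-IOV device plugin API; the `Node` class, the `schedule` function and the VF naming scheme are all hypothetical and only mirror the order of the steps.

```python
class Node:
    """A host whose (simulated) device plugin tracks free virtual functions."""

    def __init__(self, name, vf_count):
        self.name = name
        self.free_vfs = [f"{name}-vf{i}" for i in range(vf_count)]

    def advertise(self):
        # Step 1: report how many virtual functions are available.
        return len(self.free_vfs)

    def allocate(self, count):
        # Step 3: hand specific virtual functions to the pod.
        allocated, self.free_vfs = self.free_vfs[:count], self.free_vfs[count:]
        return allocated


def schedule(nodes, vfs_requested):
    """Step 2: pick the first node that advertises enough virtual functions."""
    for node in nodes:
        if node.advertise() >= vfs_requested:
            return node, node.allocate(vfs_requested)
    return None, []  # the pod stays pending


nodes = [Node("host-a", 1), Node("host-b", 4)]
chosen, vfs = schedule(nodes, vfs_requested=2)
```

In the real system the last step, moving the virtual function's net device into the pod's network namespace, is done by the SR-IOV CNI and has no counterpart in this sketch.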

B

In Kubernetes today, you're not allowed to run more than one interface or one CNI in a pod, and for that, Multus or similar plugins were developed. Multus is a meta plugin, which allows you to provide multiple interfaces to a pod. There will always be one interface which is the primary, the master interface. This is the standard kind of eth0 that you are using, and this is where all the security groups and policies and so on will be applied. But then Multus gives you the ability to connect additional CNIs, one or more.

B

It can be more than one, and in our case it is obviously SR-IOV, and those interfaces,

B

although they provide connectivity, don't have any security policies around them, and they are not represented in the Kubernetes interface. And with that I will hand over to Rajat.
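As a rough sketch of what a pod that combines Multus and SR-IOV might look like, the helper below builds a pod manifest as a plain Python dictionary. The annotation key `k8s.v1.cni.cncf.io/networks` is the one Multus reads to attach secondary networks; everything else (the function name, the network names, the resource name `example.com/sriov_rdma`, the image) is a hypothetical placeholder, since the real names depend on how the device plugin and network attachments are configured in a given cluster.

```python
import json


def pod_manifest(name, image, networks, vf_resource, vf_count=1):
    """Build a pod spec that asks for secondary networks and SR-IOV VFs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # Multus attaches one extra interface per comma-separated network.
            "annotations": {"k8s.v1.cni.cncf.io/networks": ",".join(networks)},
        },
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                # Extended resources such as VFs use equal requests and limits.
                "resources": {
                    "requests": {vf_resource: str(vf_count)},
                    "limits": {vf_resource: str(vf_count)},
                },
            }],
        },
    }


manifest = pod_manifest(
    name="trainer",
    image="registry.example.com/trainer:latest",   # hypothetical image
    networks=["sriov-rail-1", "sriov-rail-2"],     # hypothetical network names
    vf_resource="example.com/sriov_rdma",          # hypothetical resource name
    vf_count=2,
)
manifest_json = json.dumps(manifest, indent=2)
```

The primary interface (eth0) does not appear in the manifest at all; only the additional interfaces are requested through the annotation.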

D

Now welcome to part two of the post-lunch session on the last day of the conference. I'm sure you guys are excited; I'll try to make it interesting. Funny thing I noticed: when Erez said RDMA, the AI thing running over there was saying "our DNA".

D

That's what I was noticing, and it's more clever than I was afraid of. Anyway, yeah, we get to...

B

Probably my accent, or...

D

I'm smarter than... All right, all right. So, Erez talked about all the ingredients that you would need to get deep learning going on Kubernetes for multiple nodes. You need RDMA and everything; he put it on the list of all the ingredients. Say I put all these ingredients together. You will have a large data center, hopefully; if you want to scale out and push further and further, you have a large data center, and you should be able to put these pieces together, and things would start to work? I'm afraid not.

D

You will run into issues, and those are the issues which I'm going to list. If you're ever going to try this in your data center, if you're going to bring this infrastructure up, you're going to have to keep these issues in mind. Some of them are functional issues: you will see that stuff is not working. And some of them are performance issues:

D

you will see that stuff is working at as low as 1% of the performance that you expected, or that I would have shown in the slides. So I will go through some of this list, and we'll probably keep a count of the things that I am going to list, and in the end I'm going to show some performance graphs on what we see when we put in these optimizations. Do we see the expected result or not? All right. So, we'll start with priority flow control and explicit congestion notification stuff.

D

You can read up on Google what it is. It's broadly about saying: when the network is going to be shared, with either RDMA or converged Ethernet, you're going to have trouble with other traffic. So let's separate things out with priority flow control and make a separate channel. You need to configure that on the host. The ECN bits, which is congestion notification, are just saying: you're facing congestion, please back off, don't send more traffic, let's not make it worse, that kind of thing.

D

These are things which you have to configure on your switch or router. And then there are other things: you need to get this ACS thing set up in the BIOS. If you don't do that, you will see some performance suffering. And there's GPUDirect, which Erez has introduced. If you don't put this kernel module in... Imagine now you have 20 boxes in your data center, or 200 boxes, and you didn't load one of the kernel modules on one of the boxes; you just missed one.

D

What is the beauty, or the single most important principle, of high-performance computing? The slowest guy is going to guide the performance of the entire flock. When you see the whole thing running at 20 percent performance and ask what's going wrong, well, please go through the checklist. We suffered this, so believe us. The final part is BlueFlame registers, if you are using Mellanox cards. If you have smaller packets, 64 bytes or less or something, then you probably want to tune this further. You don't want to see a 5% drop. It's not that the 5% of performance by itself is important;

D

I translate that to 5% of your money going to waste when you've invested millions of dollars in that infrastructure. How many of these are there? Four of these. All right, we keep going. You don't believe me, probably, so these are the graphs that we measured when we didn't do PFC and ECN. The one on the left is showing four interface cards, the red, orange, green and blue, trying to operate over some period of time, which is the x-axis, and the y-axis is gigabits per second, and the max

D

you could have gotten is a hundred gigabits per second; that's the card throughput. And you can see that you don't get things right. One of the cards is just working at 20%, 20 gigabits per second. We just enable that, and you see the graph on the right, and you see nearly everyone working together, like brothers should, and you get above 90%. But you can see, as anyone can, that there are some small spikes that come down. So we were at

D

the four counts of things that we had optimized. I'm going to go next to how to remove these spikes. Also, if there's one single important thing that you're here for, if you want to learn how to get deep learning infrastructure at scale with Kubernetes and everything, this is the one thing which will improve your performance multifold, make it scalable, make it reliable. The problem statement is that I've got these boxes, you know, the GPU boxes.

D

Each box has eight GPUs, or 16 GPUs, or 4 GPUs, or whatever your favorite configuration is, and let's connect them to the network, and the network has these top-of-rack switches or something. How do you connect them? If you're anyone like me, stupid or still learning, this is what I did: I put these boxes in, and when the rack was full, I plugged them into the switch, and then went to the other rack,

D

put in those boxes, plugged them into the top-of-rack switch, and then connected the top-of-rack switches to the leafs and spines or whatever it is. It does not work. It would work for any other thing that you would try to do; it will not work for deep learning. Why? Because deep learning has a peculiar thing: when you try to share those mini-batches with your parameter servers and everything, what's going on is that,

D

if you see this diagram on the top right, the information sharing happens in and through a ring or a parameter server or a tree or something. Whatever the information is, it goes through NIC 1, comes out of NIC 1, covering all the GPUs, collecting all the parameters that each of those GPUs communicated, goes to the other GPU node and communicates with NIC 1 there as well, and goes to the third node and communicates with NIC 1 again. All the NIC 1s want to talk to each other. All the NIC 2s want to talk to each other.

D

All the NIC 3s want to talk to each other. Never would you have a situation in the deep learning frameworks where NIC 1 is trying to talk to NIC 2. No: NIC 1 talks to NIC 1, NIC 2 to NIC 2, 3 to 3, 4 to 4. So it's a minor point, but a very clever point. I'm saying it's the single most important thing you have to learn, so wake up and listen to this. When you wire up your servers, connect all the NIC 1s to switch 1, all the NIC 2s to switch 2, all the NIC 3s to switch 3,

D

even if they're from the same box. It's just a simple measure, and the result is that when all the NIC 1s perform their communication pattern, they will have no extra hop. One hop, right? They all go to the same switch and get back. There is no extra hop required. Now, an extra hop is bad when you're talking about RDMA over Ethernet. A simple principle, very easy to understand; it's just that I struggled

D

so much with putting all of these into the same single switch and everything. And the one important thing that you can learn from here is that the more ports you get on your switch, the more you can scale your cluster. If you have a 128-port switch, you will be able to connect 128 machines on one switch. That would mean nearly 1,000 GPUs.
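The wiring rule above (NIC i of every box goes to switch i) and the scale it allows can be written down as a short sketch. The function names and the `switch-N` labels are made up for illustration, and the 8-GPUs-per-box figure is taken as an assumption from the boxes mentioned earlier in the talk.

```python
def rail_wiring(num_nodes, nics_per_node):
    """Map each (node, nic) pair to a switch: NIC i always lands on switch i."""
    return {
        (node, nic): f"switch-{nic}"
        for node in range(num_nodes)
        for nic in range(nics_per_node)
    }


def one_hop(wiring, nic, node_a, node_b):
    """True if the same-numbered NICs of two nodes share a switch."""
    return wiring[(node_a, nic)] == wiring[(node_b, nic)]


def max_gpus(switch_ports, gpus_per_node):
    """One switch port per node per rail, so ports bound the node count."""
    return switch_ports * gpus_per_node


wiring = rail_wiring(num_nodes=128, nics_per_node=4)
all_one_hop = all(
    one_hop(wiring, nic, 0, node) for nic in range(4) for node in range(128)
)
gpus = max_gpus(switch_ports=128, gpus_per_node=8)
```

With 128 ports and 8 GPUs per box the bound works out to 1,024 GPUs, which matches the "nearly 1,000 GPUs" quoted in the talk.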

D

Five points did I make; this is the single most important one. Sixth point: if you don't enable this, things are going to be dysfunctional. This is source-based routing. Now, Erez has introduced Multus, with which you have multiple interfaces in the pod. When you have multiple interfaces, all of them are possibly on different networks. Do you know how routing works? If I want to go from a box to some IP address, my destination routing table says: oh, you want to get to that

D

subnet? Use this NIC. I say: hold on, I'm deep learning, okay? I know which NIC to use. Don't tell me to use that NIC; I know which NIC to use. I want to connect NIC 1 to NIC 1. Don't tell me that your routing table says to use NIC 4 for that subnet. Well, you need source-based routing here, so that when your deep learning framework is trying to share parameters, it consults the source-based routing, which says traffic from NIC 1 will always go out through NIC 1.

D

You choose the source interface and you choose the destination interface, and you need this plugin to make this work. So we always put this together. If you have a quick question, I can answer now, or we can take this later.
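To make the idea of source-based routing concrete, here is a small sketch that generates the kind of Linux policy-routing commands that pin traffic from each interface's address to that same interface. It is not the plugin mentioned in the talk, whose implementation is not described; the function name, the table numbers, and the addresses are assumptions. The `ip rule add from ... table ...` and `ip route add default via ... dev ... table ...` forms are standard iproute2 syntax, and `net1`, `net2` follow the default naming Multus gives secondary interfaces.

```python
def source_routing_commands(interfaces, first_table=101):
    """Return iproute2 commands giving each interface its own routing table.

    interfaces: list of (device, source_address, gateway) tuples.
    """
    commands = []
    for offset, (dev, addr, gateway) in enumerate(interfaces):
        table = first_table + offset
        # Packets whose source address belongs to this NIC use its own table...
        commands.append(f"ip rule add from {addr} table {table}")
        # ...and that table only knows a route out through the same NIC.
        commands.append(
            f"ip route add default via {gateway} dev {dev} table {table}"
        )
    return commands


# Hypothetical addressing for two secondary interfaces inside a pod.
commands = source_routing_commands([
    ("net1", "10.0.1.5", "10.0.1.1"),
    ("net2", "10.0.2.5", "10.0.2.1"),
])
```

With rules like these in place, the framework selects a NIC simply by binding to that NIC's address, and the destination-based main table no longer gets to redirect the traffic to a different interface.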

D

Yeah, a layer 2 network, clearly, and you will not have VLANs then, and you will never have the possibility of going and expanding your network beyond, say, a 648-port switch. He asked whether we can do without layer 3. If you don't want layer 3, yes: if you stick to layer 2, you don't need this. But you want to scale out, and we wanted to scale out, so we said, well, let's make it generic. Good question, thank you. This is roughly the final thing that I'm going to talk about.

D

We can spend a lot of time on this and, if you want, we can close sooner, but these are real performance measurements. Now, I could have gone on to hundreds of GPUs, but I wanted to restrict it and see, with just 5 boxes put together, just 40 to 48 GPUs put together at different scales, what numbers I got on ResNet-50, which has 23 million parameters to be shared across its 50 layers, when you do deep learning training on 14 million images. The y-axis is the number of images per

D

second, the x-axis is the number of GPUs that are used, and the three graphs are for different batch sizes. See, clearly: in SGD

D

we trust. SGD is the algorithm that we use, stochastic gradient descent, to share those parameters, and that's why you would see why the batch sizes are important, why your accuracy is important, why the speed of training is important, and how these things play together. But as the network guys, we want to ask: what did I do all this for? And you can see in the third graph, with just five boxes put together, what happens if you did not set up your network right.

D

The red line is the RDMA, highly optimized stuff, and the blue line is also a hundred gig; I'm not comparing against the wrong NICs. The blue line was also with hundred-gig NICs, and you see a 2x performance difference. That's half your money wasted with just five boxes put together. With five or six boxes put together, so 48 GPUs, you would see a 2x performance loss. Now, as you start to scale into hundreds and thousands of GPUs, you can calculate what's going to happen.

D

The top left graph is the bigger batch size, and there are three lines there. One is the orange line, which is the ideal theoretical maximum that you could ever get, which is what I did with one GPU: I took that and said, what if I put in 48 GPUs? I just took that number and multiplied by 48. That's the ideal you could get. And if you look, in most of these graphs, up to 8 GPUs you definitely get similar scaling, and after 8 you start to peter off a little bit.

D

You know, the slope is not one anymore, and that's because one box had 8 GPUs. It's only when the second box comes in that you use the actual network. You use the PFC, the ECN; you use the optimizations that I've been talking about. Who likes to see half of their data or half of their money going to waste? Nobody. We're talking millions of dollars here. I'll prove my point in the end, with a joke.
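The "ideal" line described above is just the single-GPU throughput multiplied by the number of GPUs, and how far a measurement falls below it can be expressed as a scaling efficiency. The sketch below shows that arithmetic. The function names are made up, and the throughput figures are illustrative placeholders, not measurements from the talk.

```python
def ideal_throughput(single_gpu_rate, num_gpus):
    """Images per second if scaling were perfectly linear (slope of one)."""
    return single_gpu_rate * num_gpus


def scaling_efficiency(measured_rate, single_gpu_rate, num_gpus):
    """Fraction of the ideal throughput that was actually achieved."""
    return measured_rate / ideal_throughput(single_gpu_rate, num_gpus)


# Placeholder numbers: 100 images/s on one GPU, 48 GPUs in total.
ideal = ideal_throughput(100.0, 48)
# A cluster delivering half of the ideal is the "half your money wasted" case.
efficiency = scaling_efficiency(2400.0, 100.0, 48)
```

An efficiency that stays near 1.0 up to 8 GPUs and drops afterwards is the signature the speaker points at: the drop starts exactly where traffic leaves the box and crosses the network.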

D

Or something, in conclusion. I won't stretch it. We can talk about this later, and these graphs are important. These are real measurements, and we're not done yet; we're going to improve this further.

D

We're going to pull these red lines up and try to see how they scale and everything. So I'm putting together labs with 800 GPUs and a thousand GPUs, to see how far it goes. And then you've seen the MLPerf numbers and everything: we want to train ResNet-50, or the more complex models, 50 and 150 layers and whatnot, not in weeks as it used to take, not in days, nor in hours.

D

In these results, the old model trained in five minutes, so you can train and retrain things like those. Well, this is the summary, and it's coming to a close, finally. The upshot of the matter is that multi-rail RDMA on Kubernetes is possible, and it scales very well, as we have shown with the performance numbers. There are a lot of things to take care of, and you can go over the slides; we will publish them, of course. Further work would be storage: you've got to plan for it, or we'll talk about it.

D

Ambient temperature is an important thing. If it's a hundred degrees outside, you will see that the performance of your data center is suffering, and you would say: I did everything right, what's wrong? Well, it's just hot outside. There are other things too, like weather and pressure. The biggest principle: the slowest link will guide the performance of the entire job. It's an HPC thing. And to close: does anyone know on what principle a plane flies?

D

This was asked of me when I was doing my pilot training several years ago. I took my first flight with an instructor, and the instructor said: so, you've learned a little bit about how planes fly. How do they fly? And I was naive. I said, you know, like how networks work on bits and switching, it has all the Bernoulli's principle things. No. The only principle is: you put in money, it flies. This is networking: put money in, of course, and just take care of a few things. Thank you.

B

So if any of you have questions, we have a few more minutes, and you're more than welcome to contact Rajat or myself. Yes?

E

Do you have performance numbers for bare metal versus Kubernetes that you could illustrate on this?

D

This is Kubernetes on bare metal, okay?

E

So if you remove the infrastructure and just run it purely on bare metal, what would this look like? That's what I was trying to figure out.

D

Exactly the same. If the argument is that your namespaces introduce anything, they're not in the data path, okay? With SR-IOV, the whole idea is that there is no bridge, there is nothing. The SR-IOV idea is that you take the physical interface, or nearly the physical interface, a virtual function out of it, and pull it right into the pod. And as Erez said, it doesn't work on sockets; it works on verbs, so the data plane is not even there.

D

It doesn't even know what the namespace and everything is. Well, it does, but it's not in the path. Yes?

B

I would maybe just add one thing that we didn't talk about here: the next phase that we will do is to enable all the SDN controllers with the bypass. So, in the very near future, we'll be able to run all those goodies with the full SDN and without any performance impact. That's the next phase. Any additional questions? Yeah, over there.

C

If we are slicing the GPU into vGPUs, will we see any issues here, like...

D

Multiple jobs will run the...

D

The problem we're trying to solve: we're taking one job and we're going to take thousands of GPUs, and you're saying that we take one GPU and split it among multiple jobs, so we can't compare these. I mean, I know that this is a thing that we need to work on; that is a problem, as you said, but it is not this problem statement. This problem statement is: I've got a model, it takes hours to run and optimize. How can I bring it down to minutes?

D

So this is not GPU sharing. This is taking everything in your data center for one job, finishing in minutes, and coming back with the new parameters.

F

If you have a large GPU cluster in which you run different trainings in parallel, like different sets of trainings, when you keep running workloads your cluster somehow gets fragmented; you cannot do perfect placement, you cannot put the jobs next to each other. So when you're using RoCE instead of InfiniBand, how much noise,

F

you know, how much noise can it survive, or hotspots in the network? Because with InfiniBand there is always, like, a path you can count on; somehow the nodes will reach each other within a

D

Certain latency.

F

Latency, with a certain pathway, without noise. But with RoCE, that is not the case.

D

Your question, if I can tell right, is that with RoCE you're sharing the network somewhere, and with InfiniBand you get the clear channel and everything. So this work is showing RoCE, but it's not exclusive to RoCE; it includes InfiniBand, first of all. So the principles should apply, and this should work on IB as well. But coming back to your question: yes, if you clog a network, you will get clogged results. If you clear a network, you will get clear results.

D

There's nothing else to it. But don't forget the PFC and ECN bits, right? If you have ECN on, you would have a normally functioning cluster, and if you are still overloaded, at least you will not die thrashing.

B

I will just maybe quickly add to that: in general, when you're running multiple applications in a cloud which is high performance, you need to do placement in a network-aware manner. If you place your workloads without any network awareness, you may have challenges, and it is not available yet as part of Kubernetes. But this is something that HPC guys are definitely...

F

You can do network-aware placement, but it will get fragmented eventually, because you take four nodes and use them for a job. Then

F

the resource gets freed, and then you only need two, and then two are isolated. You may not have four next to each other, right?

D

You're right. I mean, this is a problem that we faced in our data center, and we have built a new scheduler for deep learning jobs on Kubernetes. To begin with, for these large jobs, we do gang scheduling, so that we know that the gang scheduler will put all these jobs on the nodes that are allocated. So we demarcate these nodes, saying: this is a node which will run jobs which want eight GPUs.

D

This is a node which will run jobs which want four GPUs, which means two jobs can fit if it has eight GPUs, or four jobs if there are sixteen GPUs on that box. And this is a separate box, labeled separately (the scheduler knows the nodes, the racks, the rows and everything), which is going to run only jobs which want one GPU, which means we can fit sixteen jobs if it has sixteen GPUs, or eight jobs if it has eight. That's how we try to avoid that fragmentation.

D

It is an integer programming problem, and if you say, I want to solve everything and not have fragmentation, yes, it's a real problem. You will have a lot of what are called efficiency issues, utilization issues.
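The approach described above, nodes set aside for jobs of one GPU size plus all-or-nothing gang placement, can be illustrated with a toy placement function. This is not the scheduler the speakers built; the `gang_place` function, the pool layout and the node names are hypothetical, and a real scheduler would also account for racks, rows and network topology.

```python
def gang_place(pools, gpus_per_worker, workers):
    """Place every worker of a job, or none of them.

    pools maps a GPU-request size to {node_name: free_slots}, where a slot
    holds exactly one worker of that size.
    """
    pool = pools.get(gpus_per_worker, {})
    plan = []
    remaining = workers
    for node, free in pool.items():
        take = min(free, remaining)
        plan.extend([node] * take)
        remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        return None  # gang scheduling: no partial placement is committed
    for node in plan:
        pool[node] -= 1
    return plan


pools = {
    8: {"node-1": 1, "node-2": 1},  # 8-GPU boxes reserved for 8-GPU workers
    4: {"node-3": 2},               # one 8-GPU box split into two 4-GPU slots
    1: {"node-4": 8},               # one 8-GPU box for single-GPU jobs
}
big_job = gang_place(pools, gpus_per_worker=8, workers=2)
too_big = gang_place(pools, gpus_per_worker=4, workers=3)
small_job = gang_place(pools, gpus_per_worker=1, workers=3)
```

Because a node only ever hosts workers of one size, a freed slot can always be reused by the next job of that size, which is how the fixed partitioning trades some utilization for less fragmentation.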

A

We're at the top of the hour at this point, so, all right, okay.

A

For any other questions, catch up with the guys afterwards outside. And thank you for coming to the session. Thank you.
