Description
A very special edition from the Red Hat APAC Managed OpenShift Black Belt team. Join our Red Hat experts Nethali Zoysa and Suresh Gaikwad for a LIVE Q&A chat accompanying a replay of their recent presentation on resiliency on ROSA and ARO. You'll learn about the key concepts behind scaling and high availability - from types of scaling to load balancing, they've got it all. There are technical examples and real-world recommendations to make your clusters amazing. And our presenters will be standing by LIVE in the chat to answer questions and share experiences!
A
Welcome to Hybrid Cloud happy hour! Today we'll be bringing you some really cool content from our managed cloud services team, the APAC Managed OpenShift Black Belts. Internally we refer to them by an acronym, of course, for their name: MOBB, or the Mobbs. The Mobbs work closely with our managed OpenShift customers and field teams, supporting real-world installs of ROSA, ARO and more. They get to play in this space every day and see all the ways that a managed OpenShift service is proving helpful to our customers in reaching their goals.
A
And with this experience they have gained a massive amount of knowledge on how to do real-world deployments the right way. So today I'm excited to share with you their recent session on scaling and high availability in ROSA and ARO, and as an added bonus, both our presenters are standing by live in the chat to answer your questions and share their stories and experiences, so be sure to say hi. Today we're featuring two Mobbs.
A
First up is Nethali, covering scaling in OpenShift. Nethali dives into the different types of scaling, resource management and more. You'll learn the concepts behind how it works, with sections on optimizing and best practices. After that, Suresh presents on high availability. He deep dives into everything from application probes to load balancing capabilities and controlling pod placement, everything to make your clusters safe, and you won't want to miss the best practices section, where Suresh gives you a rock-solid foundation for making your OpenShift cluster always available.
B
Scaling refers to the ability of OpenShift to automatically adjust the number of running instances of an application based on demand. This can be achieved manually or automatically. When it comes to manual scaling, to instantly change the number of replicas an administrator can use the oc scale command to alter the size of a job, deployment, replication controller, and so on.
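For illustration, a minimal sketch of manual scaling with the oc CLI (the workload names here are placeholders, not from the session):

  # Scale a deployment to 5 replicas (deployment name is illustrative)
  oc scale deployment/frontend --replicas=5

  # The same command works for stateful sets and replication controllers
  oc scale statefulset/db --replicas=3

  # Verify the result
  oc get deployment/frontend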
B
A vertical scaling mechanism involves the dynamic provisioning of resources, such as RAM or CPU of cluster nodes, to match the application requirements. This is essentially achieved by tweaking the pod resources based on the workload's consumption. This scaling technique automatically adjusts the pod resources based on usage over time, thereby minimizing resource wastage and facilitating optimum cluster utilization.
B
Works
best
with
long
running,
homogeneous
workloads
such
as
databases,
it
has
few
limitations
also,
for
example,
vertical
perotto
scaling
is
not
ready
to
use
with
jvm-based
workloads
due
to
limited
visibility
into
actual
memory
usage
of
the
workload
other
than
horizontal
scaling
and
vertical
scaling.
We
have
even
a
multi
-dimensional
scaling.
It
combines
horizontal
and
vertical
scaling
simultaneously.
This
is
a
less
frequent
and
not
recommended
more
complex
to
manage,
because
defined
in
the
when
to
scale
horizontally
or
vertically
depends
on
many
parameters
which
are
sometimes
hard
to
predict.
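As a sketch of what vertical pod autoscaling looks like in practice, here is the upstream VerticalPodAutoscaler resource; the target name is illustrative and this assumes the VPA operator is installed:

  apiVersion: autoscaling.k8s.io/v1
  kind: VerticalPodAutoscaler
  metadata:
    name: frontend-vpa
  spec:
    targetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: frontend          # workload to autoscale (illustrative)
    updatePolicy:
      updateMode: "Auto"      # VPA may evict pods to apply the new requests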
B
The third type is cluster scaling. Cluster scaling involves increasing or reducing the number of nodes in the cluster based on node utilization metrics and the existence of pending pods. The cluster autoscaler typically interfaces with the chosen cloud provider, but in the OpenShift Container Platform implementation, cluster autoscaling is integrated with the Machine API by extending the compute machine set API. What this really means, as you can see on the screen, is that OpenShift has custom resources called machine pools, machine sets and machines, specifically on AWS infrastructure.
B
You can consider machine sets and machines similar to a ReplicaSet and the replicas of pods. You can change the number of replicas in a ReplicaSet to change the pods; similarly, you can change the replicas of a machine set to change the number of machines, which is controlled by the machine set. But when it comes to AWS, you have another higher-level abstraction called a machine pool, which basically controls even the machine sets.
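A rough sketch of the two levels (the machine set, cluster, and machine pool names are placeholders), assuming the standard oc and rosa CLIs:

  # OpenShift Container Platform: scale a compute machine set directly
  oc scale machineset my-cluster-worker-us-east-1a --replicas=3 -n openshift-machine-api

  # ROSA: adjust the machine pool, which manages the underlying machine sets for you
  rosa edit machinepool --cluster=my-cluster --replicas=3 worker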
B
So let's try to understand some key concepts under the hood of scaling. The first one is resource requests and limits. The amount of resources a container needs can be optionally specified using resource requests and limits. So what is a resource request? When a resource request is specified for a container, the kube-scheduler uses this information to decide on which node the pod is placed.
B
Then what is a resource limit? When a resource limit is specified for a container, the kubelet and container runtime enforce that limit so that the running container is not allowed to use more of the resource than the limit. This is used to prevent a pod from using up all the compute resources of a node.
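A minimal sketch of requests and limits on a container (the image and values are illustrative):

  apiVersion: v1
  kind: Pod
  metadata:
    name: example-app
  spec:
    containers:
    - name: app
      image: quay.io/example/app:latest   # placeholder image
      resources:
        requests:
          cpu: 250m        # used by the scheduler to pick a node
          memory: 256Mi
        limits:
          cpu: 500m        # enforced by the kubelet and container runtime
          memory: 512Mi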
B
In some cases, the kubelet even reserves the limit, which is greater than or equal to the request. Because of this reservation, the scheduler may refuse to place a pod on a node even though the actual memory or CPU usage on that node is very low, since the capacity check fails. This protects against a resource shortage on the node when resource usage later increases towards the peak request rate.
B
Kube-reserved: this reserves the resources necessary to run the Kubernetes agents, such as the kubelet, the container runtime, etc. System-reserved, as its name says, is the resource needed to run the operating system and system daemons. And last, there is the eviction threshold.
B
Memory pressure at the node level sometimes leads to what we call OOMs, out-of-memory events. This affects the entire node and all the pods running on it; the node can go offline temporarily until memory has been reclaimed. To avoid or reduce the probability of system OOMs, or system out-of-memory events, Kubernetes reserves some memory via a specific flag called eviction-hard: the kubelet attempts to evict pods whenever the memory available on the node drops below the reserved value. For this reason, resources reserved for eviction are not available for pods.
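On self-managed OpenShift these reservations would typically be tuned through a KubeletConfig resource, following the upstream kubelet configuration field names; on managed offerings like ROSA they are handled for you, and on recent OpenShift versions some values are auto-sized. A rough, illustrative sketch only:

  # Label the worker MachineConfigPool so the KubeletConfig below can select it
  oc label machineconfigpool worker custom-kubelet=reserved-resources

  apiVersion: machineconfiguration.openshift.io/v1
  kind: KubeletConfig
  metadata:
    name: reserved-resources
  spec:
    machineConfigPoolSelector:
      matchLabels:
        custom-kubelet: reserved-resources
    kubeletConfig:
      systemReserved:            # kept for the OS and system daemons
        cpu: 500m
        memory: 1Gi
      kubeReserved:              # kept for the kubelet, container runtime, etc.
        cpu: 500m
        memory: 1Gi
      evictionHard:              # kubelet evicts pods below this free-memory threshold
        memory.available: "200Mi"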
B
In the worst-case scenario, the horizontal pod autoscaler can take up to about a minute and a half to trigger the autoscaling: you can add up the 10 seconds, 60 seconds, plus 15 seconds, which is roughly one and a half minutes. Okay, now let's say the horizontal pod autoscaler wants to create another pod, but your nodes do not have the resources. In that case the cluster autoscaler comes into the picture, because you have to create another node. The cluster autoscaler has a reaction time of 10 seconds.
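For reference, a minimal HorizontalPodAutoscaler that would drive the kind of scaling described above might look like this (the target name and thresholds are illustrative):

  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: frontend-hpa
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: frontend           # workload to scale (illustrative)
    minReplicas: 2
    maxReplicas: 10
    metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU crosses 70%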
B
The cluster autoscaler checks for unschedulable pods in the cluster every 10 seconds. Once one or more pods are detected, it will run an algorithm to decide how many nodes are necessary to deploy all the pending pods and what type of node groups should be created. The entire process takes no more than 60 seconds: on a cluster with fewer than 100 nodes the average latency should be about 5 seconds, and if your cluster has around 100 to 1,000 nodes, it is said that this time takes no more than 60 seconds.
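In OpenShift the cluster autoscaler is configured declaratively; a rough sketch, where the resource limits and machine set name are placeholders:

  apiVersion: autoscaling.openshift.io/v1
  kind: ClusterAutoscaler
  metadata:
    name: default
  spec:
    resourceLimits:
      maxNodesTotal: 20        # never grow the cluster beyond 20 nodes
  ---
  apiVersion: autoscaling.openshift.io/v1beta1
  kind: MachineAutoscaler
  metadata:
    name: worker-us-east-1a
    namespace: openshift-machine-api
  spec:
    minReplicas: 1
    maxReplicas: 6
    scaleTargetRef:
      apiVersion: machine.openshift.io/v1beta1
      kind: MachineSet
      name: my-cluster-worker-us-east-1a   # machine set to scale (illustrative)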
B
Let's imagine that our cluster has fewer than 100 nodes, so this becomes 30 seconds. Okay, the next step is node provisioning. It depends mainly on the cloud provider, but it's pretty standard for new compute resources to be provisioned in three to five minutes, so let's say it takes three to five minutes. And last is the pod creation time: launching a container generally happens in a couple of milliseconds, but some applications may take time to boot, depending on the kind of application, for example JVM-based ones.
B
Those take some time, and there may also be some operations to perform; for example, some data injection might take time. More importantly, downloading the container image could take from a couple of seconds up to minutes, depending on the size and the number of layers of the image. So that's how horizontal autoscaling works, and the time involved in each and every step.
B
So with that explanation we have a good understanding of the pod autoscaling lead time, as shown in the picture with the default timings: for the horizontal pod autoscaler reaction, 10 seconds for pod CPU and memory scraping, 60 seconds for metrics aggregation, and 15 seconds for the HPA to check the metrics. And if you need to create another new node, the cluster autoscaler reaction time adds 10 seconds plus an additional 30 seconds, node provisioning takes around three to five minutes, and then there is pod creation.
B
Am I happy with this? Are you happy to wait six or seven minutes to handle a sudden surge in demand? No, right? So in that case we have to see how to optimize this autoscaling. Let's go through it from top to bottom. OpenShift uses kube-controller-manager flags tuned for optimal HPA performance. These flags have already been optimized; they are fixed and you have no control over them. So you have no control over the HPA side, you cannot reduce that time.
B
Also, the OpenShift cluster autoscaler reaction time is 10 seconds, so you have no control over that either. As for node provisioning time and pod creation time, you have no direct control there: you cannot reduce the node provisioning time, and you cannot reduce the pod image download time itself, but you can have some workarounds to optimize them. So let's have a look. The first one is node provisioning time.
B
Minimize the creation of new nodes if possible. Choosing the right instance type for your cluster has dramatic consequences for your scaling strategy. For example, if your node only has space for a few pods, as you can see on the left side, you have to provision new nodes for additional replicas, incurring another six to seven minutes of delay: the lead time to trigger the horizontal pod autoscaler, the cluster autoscaler, and the provisioning of the compute resources on the cloud provider.
B
But if your node has space for a large number of pods, as you can see on the right side, this is reduced dramatically because you have enough space. Choosing a large instance type also has another benefit: the ratio between the resources reserved for the kubelet, the operating system and the eviction thresholds, which we discussed earlier, and the resources available for your pods is greater. So select the right instance, but not necessarily the biggest instance.
B
In all cases you have to do some research and try to select the optimum, large-enough node size. There is a big efficiency factor dictated by how many pods you can have on a node: cloud providers limit the number of pods, and sometimes it's dictated by the underlying network on a per-instance basis. You have to consider all of this. You should also consider that if you have only a few nodes, the impact of a failure of one node is very high.
B
Another approach is to keep a spare, empty node ready in advance. The cluster autoscaler does not have this functionality built in, but we can have a nice workaround for it. So let's see how it works: run a low-priority deployment with requests large enough to reserve an entire node. As you can see in this picture, on the right side there is a low-priority placeholder pod which always reserves the spare node.
B
Consider this low-priority pod as a placeholder, and as soon as a real pod needs the resources, you can evict the placeholder pod and deploy the real, high-priority pod. Let's see how it works: the new application pod comes in and evicts the placeholder pod, and now, since there is no node left for the placeholder pod, the cluster autoscaler will create another new spare node and your low-priority placeholder pod will appear there.
B
So pay extra attention to the memory and CPU requests, because they are what reserve the space of your new spare node. You may provision a single large pod whose requests roughly match the available node resources. To make sure that the placeholder is evicted as soon as a real pod, which is the application pod, is created, you can use pod priorities.
B
Pod priority indicates the importance of a pod relative to other pods. When a pod cannot be scheduled, the scheduler tries to evict lower-priority pods in order to schedule the pending pods. You can configure pod priorities in your cluster with a pod priority class. So this is the third option that you can try, and it is the one we are proposing. These are some workarounds you can use to optimize the node provisioning time.
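Tying these ideas together, a sketch of the placeholder approach (class name, sizes, and pause image are illustrative, not from the session): a negative-priority class plus a deployment whose requests hold a node's worth of capacity.

  apiVersion: scheduling.k8s.io/v1
  kind: PriorityClass
  metadata:
    name: placeholder
  value: -1                        # below the default of 0, so real workloads preempt it
  globalDefault: false
  description: "Low-priority placeholder pods that hold spare capacity"
  ---
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: capacity-placeholder
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: capacity-placeholder
    template:
      metadata:
        labels:
          app: capacity-placeholder
      spec:
        priorityClassName: placeholder
        containers:
        - name: pause
          image: registry.k8s.io/pause:3.9     # does nothing; its requests reserve the node
          resources:
            requests:
              cpu: "3"                         # size these close to one node's allocatable
              memory: 12Gi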
B
The next one is pod creation time. If your company allows caching the image in advance, then always try to cache the image in advance. There are some companies which have the image pull policy set to Always for some reason, for example security; in that case you have no chance with this, because the image always has to be pulled from the registry. But if your company allows it, you can optimize pod creation time by downloading the image in advance.
B
There are some mechanisms you can use to cache the image onto the nodes in advance; for example, an image puller using a DaemonSet. This uses a DaemonSet to download the images onto each and every node. There are also some operators, as you can see, such as the Kubernetes image puller, kube-fledged, etc., and you can even use the placeholder pod on the spare node to download the image in advance.
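A bare-bones version of the DaemonSet approach (the image names are placeholders, and the init container assumes the image ships a shell) simply pulls the image on every node and then idles:

  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: prepull-app-image
  spec:
    selector:
      matchLabels:
        app: prepull-app-image
    template:
      metadata:
        labels:
          app: prepull-app-image
      spec:
        initContainers:
        - name: pull
          image: quay.io/example/app:latest    # the image you want cached (placeholder)
          command: ["sh", "-c", "true"]        # exits immediately; the pull is the point
        containers:
        - name: sleep
          image: registry.k8s.io/pause:3.9     # keeps the DaemonSet pod alive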
B
Now for some best practices. There is a lot happening with every new release; the performance and compatibility of different Kubernetes versions with the autoscaler releases are thoroughly tested and documented. It's highly recommended to use only an autoscaler version compatible with the Kubernetes control plane version, to ensure the cluster autoscaler appropriately simulates the Kubernetes scheduler.
B
The second one is: be specific on resource requests and limits. It is crucial to ensure that pod resource requests are comparable to their actual consumption. As a recommended best practice, cluster administrators should leverage historical consumption statistics and ensure each pod requests resources close to its actual usage.
B
In some cases this is problematic, since your nodes will often be over-committed. Over-committed nodes can lead to excessive evictions, more work for the kubelet, and a lot of pressure on the node, so it's maybe not a good practice. The second option is that you can set the request and the limit to the same value; in Kubernetes this is often referred to as the Guaranteed quality-of-service class, and it refers to the fact that it is improbable that the pod will be terminated and evicted.
B
The Kubernetes scheduler will reserve the entire CPU and memory for the pod on the assigned node. Pods with the Guaranteed quality of service are stable, but also inefficient. For example, if your app uses 256 megabytes of memory on average but you reserve two gigabytes, roughly 1.75 gigabytes sit unused most of the time. So we have to ask the question: is it worth it if you want that extra stability?
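For completeness, a Guaranteed-QoS container is simply one where requests and limits are set to the same values (the image and numbers are illustrative):

  apiVersion: v1
  kind: Pod
  metadata:
    name: guaranteed-example
  spec:
    containers:
    - name: app
      image: quay.io/example/app:latest   # placeholder image
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
        limits:
          cpu: "1"       # equal to the request
          memory: 2Gi    # equal to the request, so the pod gets the Guaranteed QoS class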
B
This is one of the practices you can use to increase efficiency: when your requests match the app's actual usage, the scheduler will pack your pods onto your nodes efficiently. Use this mechanism, where the requests match the actual average usage, when you want to optimize your cluster and use the resources wisely; for example, you can use this kind of mechanism for a web application.
B
The fact is, the autoscaler relies on each node's utilization and on the pod scheduling status to trigger scaling changes. Calculating node utilization relies on a simple formula: dividing the sum of all resources requested by the capacity of the node. As a result, missing resource requests for one or many pods affect the calculation of node utilization.
B
To achieve this, a template node is created by the autoscaler, on which all cluster-wide scaling operations are performed. To ensure the autoscaler and the template node perform accurately, it is ideal to have node groups in which the nodes have the same resource footprints. And the fifth point is: specify disruption budgets for all pods. Kubernetes supports defining a pod disruption budget as a cluster object for voluntary or involuntary disruptions of workload replicas. To prevent incurring losses, cluster administrators should define pod disruption budgets to ensure the autoscaler maintains the required minimum set of pods.
C
Hello everyone. Today we will talk about how to achieve high availability of applications when deployed on OpenShift clusters. We will cover how high availability can be achieved with the help of application probes, load balancing techniques, scheduling strategies, pod disruption budgets, and pod priority and preemption, and then we will discuss some of the best practices to be followed to achieve high availability.
C
When we talk about high availability, it's important to consider cluster high availability along with application high availability, though we will be focusing on application high availability today. When deploying ROSA, ARO or OSD clusters, always ensure the clusters are deployed across multiple availability zones.
C
OpenShift includes two types of probes for evaluating the status of applications. A liveness probe examines whether an application is functioning properly and, if not, causes it to be restarted. A readiness probe examines whether an application is ready to receive traffic and, if not, causes it to be removed from the service endpoint list.
C
The TCP socket probe makes a pure socket connection to the identified port and is only considered successful if the connection can complete. Neither HTTP GET nor TCP socket probes require anything special of the pods or containers being probed. The most powerful probe is exec: this probe actually results in the specified executable being run inside the designated pod's container.
C
If a readiness probe fails a sufficient number of times, the pod is marked as not ready, and this causes it to be removed from the endpoint list of any service that maps to it. In this way, additional traffic going to the service is no longer balanced across pods that are not ready. Once the pod passes the readiness checks again, it is marked ready, and this results in the pod being re-added to the endpoint list, and the traffic will again be distributed to that pod.
C
Now, there are some other important settings when configuring these application probes: initialDelaySeconds, which is how long to wait after the pod is launched before beginning to check; timeoutSeconds, which is how long to wait for a successful connection (this applies to HTTP GET and TCP socket probes only); periodSeconds, how frequently to re-run the probe; and failureThreshold, which states how many consecutive failed checks are needed before the probe is considered failed.
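Putting those settings together, liveness and readiness probes on a container might look roughly like this (the image, paths, port, and timings are illustrative):

  apiVersion: v1
  kind: Pod
  metadata:
    name: probed-app
  spec:
    containers:
    - name: app
      image: quay.io/example/app:latest   # placeholder image
      ports:
      - containerPort: 8080
      readinessProbe:
        httpGet:
          path: /ready              # illustrative endpoint
          port: 8080
        initialDelaySeconds: 5      # wait after launch before the first check
        periodSeconds: 10           # how often to re-check
        timeoutSeconds: 2           # how long to wait for a response
        failureThreshold: 3         # consecutive failures before marking the pod not-ready
      livenessProbe:
        tcpSocket:
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 20
        failureThreshold: 3         # consecutive failures before the container is restarted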
C
Now let's take a look at the load balancing capabilities we have. OpenShift provides built-in load balancing capabilities which can distribute the traffic across multiple replicas of your application. This helps to ensure that no single replica is overloaded and that traffic can be automatically rerouted to healthy replicas if one fails. You can even split the traffic between multiple services for A/B testing, blue-green and canary deployments.
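As a sketch of traffic splitting with an OpenShift route (the service names and weights are illustrative; the same split can be set with oc set route-backends):

  apiVersion: route.openshift.io/v1
  kind: Route
  metadata:
    name: frontend
  spec:
    to:
      kind: Service
      name: frontend-v1
      weight: 90                 # 90% of traffic to the current version
    alternateBackends:
    - kind: Service
      name: frontend-v2
      weight: 10                 # 10% canary traffic to the new version
    port:
      targetPort: 8080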
C
We do not want to have all our pods on the same node, because that would make having multiple pods irrelevant when nodes must be taken offline. OpenShift already does a good job at this: the scheduler by default will try to spread your pods across the nodes that can run them. First of all, we can use node selectors on pods and labels on the nodes to control where a pod is scheduled; with node selectors, OpenShift schedules the pods on the nodes that carry matching labels.
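For example (the node name and label key/value are illustrative), labeling a node and targeting it from a pod spec:

  # Label the nodes you want to target (placeholders)
  oc label node worker-1 workload=frontend

  apiVersion: v1
  kind: Pod
  metadata:
    name: frontend-pod
  spec:
    nodeSelector:
      workload: frontend        # only nodes carrying this label are candidates
    containers:
    - name: app
      image: quay.io/example/app:latest   # placeholder image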
C
It's a good practice to dedicate some nodes to your mission-critical applications to avoid impacts, such as a resource crunch, on these application pods, which may be caused by other pods running on the same node. When we want more granular control over scheduling, affinity and anti-affinity rules can be used. Using these rules, you can spread the pods of a service across nodes, availability zones or availability sets to reduce correlated failures, or co-locate them when latency or performance matters.
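A sketch of an anti-affinity rule that spreads replicas of the same app across zones (the app label and image are illustrative; topology.kubernetes.io/zone is a standard node label):

  apiVersion: v1
  kind: Pod
  metadata:
    name: frontend-replica
    labels:
      app: frontend
  spec:
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:   # soft rule; required... makes it strict
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: frontend                              # avoid zones already running this app
            topologyKey: topology.kubernetes.io/zone       # standard zone label on nodes
    containers:
    - name: app
      image: quay.io/example/app:latest   # placeholder image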
C
For application pods to get scheduled on the specific nodes dedicated to your mission-critical applications, taints and tolerations can be used. Taints and tolerations allow the node to control which pods should or should not be scheduled on it.
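A rough sketch of a taint on a node and a matching toleration on a pod (the node name, key and value are placeholders):

  # Taint the dedicated nodes; pods without a matching toleration will not land there
  oc adm taint nodes worker-1 dedicated=critical:NoSchedule

  apiVersion: v1
  kind: Pod
  metadata:
    name: critical-app
  spec:
    tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "critical"
      effect: "NoSchedule"      # tolerates the taint applied above
    containers:
    - name: app
      image: quay.io/example/app:latest   # placeholder image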
C
Now, even using affinity or anti-affinity rules does not guarantee an even spread of your replicas: with the preferred mode, an even spread is not guaranteed, and with the strict anti-affinity mode you may end up running a single replica per node, which presents a fault-tolerance problem and may not be an efficient use of the resources available on the nodes. This is where topology spread constraints come into the picture. Topology spread constraints give you more granular control during scheduling of the pods: you can use pod topology spread constraints to control the placement of your pods across nodes, zones or other user-defined topology domains.
C
By using a pod topology spread constraint, you get fine-grained control over the distribution of pods across failure domains, to help achieve high availability and more efficient resource utilization. Let's take a look at what a pod topology spread constraint definition looks like. Looking at this example, let's consider that you want to run seven replicas of your application across three availability zones.
C
The next field in this example is topologyKey, which is a key of one of the node labels. The next field is whenUnsatisfiable; it is the action the scheduler should take when it can't satisfy the conditions specified in the topology spread constraint. It could be ScheduleAnyway, which means that even if the condition isn't matched the pods will still be scheduled, or DoNotSchedule, which means that if the condition isn't matched the pods are not scheduled.
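Following that example, a constraint that spreads the seven replicas across the three zones might look like this (the app label and image are illustrative):

  apiVersion: v1
  kind: Pod
  metadata:
    name: frontend-replica
    labels:
      app: frontend
  spec:
    topologySpreadConstraints:
    - maxSkew: 1                                 # zone replica counts may differ by at most one
      topologyKey: topology.kubernetes.io/zone   # node label that defines each domain
      whenUnsatisfiable: ScheduleAnyway          # or DoNotSchedule to reject placement instead
      labelSelector:
        matchLabels:
          app: frontend                          # which pods are counted toward the spread
    containers:
    - name: app
      image: quay.io/example/app:latest          # placeholder image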
C
You can also benefit from the descheduler in situations where your nodes are over-utilized or under-utilized, where new nodes are added to the cluster, or where other scheduling conditions have changed. Now let's take a look at different disruptions and how we can deal with them. Before discussing the pod disruption budget, let's understand what a disruption is. Pods do not disappear until someone, a person or a controller, destroys them, or an unavoidable situation occurs.
C
Even if a node goes down for some reason or an application pod fails, the scheduler should be able to schedule the pods on other available nodes. For fault tolerance, always run multiple pods, at least two, for your application, such that the application can tolerate a single pod failure. Also spread these pods across availability zones using the scheduling techniques, so that even in the case of an AZ failure the application can still run and efficiently serve clients.
C
Let's take a look at how we can deal with voluntary disruptions, as these are disruptions initiated by the application owner or an admin. Always use GitOps practices for your Kubernetes manifest files; with GitOps, Git will be the source of truth, even for your application manifest files.
C
Another way to deal with these voluntary disruptions is to use a deployment strategy. A deployment strategy is a way to change or upgrade an application. The aim is to make the change without any downtime, in a way that the user barely notices the improvements. By default in OpenShift we use the rolling deployment strategy. A rolling deployment slowly replaces instances of the previous version of your application with instances of the new version of the application, and it waits for the new pods to become ready via a readiness check.
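A rolling strategy on a Deployment is sketched below (the names, replica counts and surge values are illustrative):

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: frontend
  spec:
    replicas: 4
    strategy:
      type: RollingUpdate
      rollingUpdate:
        maxSurge: 1          # at most one extra pod above the desired count during the rollout
        maxUnavailable: 0    # never drop below the desired count
    selector:
      matchLabels:
        app: frontend
    template:
      metadata:
        labels:
          app: frontend
      spec:
        containers:
        - name: app
          image: quay.io/example/app:v2   # placeholder image and version
          readinessProbe:                 # the rollout waits for this to pass
            httpGet:
              path: /ready
              port: 8080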
C
This is a disruption budget which states that, no matter what, we will always have a minimum set of application pods up and running at all times in the case of voluntary disruptions. A pod disruption budget specifies the minimum number or percentage of replicas that must be up at a time. If you specify pod disruption budgets, OpenShift respects them when preempting pods, at a best-effort level. It limits the number of pods of a replicated application that are down simultaneously from voluntary disruptions.
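A minimal pod disruption budget might look like this (the selector and count are illustrative):

  apiVersion: policy/v1
  kind: PodDisruptionBudget
  metadata:
    name: frontend-pdb
  spec:
    minAvailable: 2          # or use maxUnavailable, e.g. "25%"
    selector:
      matchLabels:
        app: frontend        # the replicas this budget protects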
C
Now, even with all this in place, consider a disastrous situation where we don't have enough resources available to run the entire workload in the cluster. This is where pod priority can help to some extent. Let's talk about pod priority and preemption.
C
This is specifically useful when you have unforeseen failures, like AZ failures, or have minimal resources available in the cluster. Pod priority ensures your pod gets the highest priority when the scheduler tries to schedule the pods. The default priority is zero, and priorities range from 0 up to around 2 billion. Pod preemption allows the cluster to evict, or preempt, lower-priority pods so that higher-priority pods can be scheduled and run if there is no available space on a suitable node. Pod priority also affects the scheduling order of pods and the out-of-resource eviction ordering on the node.
C
The scheduler schedules a higher-priority pod sooner than pods with lower priority if its scheduling requirements are met. You can apply pod priority and preemption by creating a PriorityClass object and associating pods with that priority using priorityClassName in your pod specifications. By default in OpenShift we have three priority classes, and you can also have one global default priority class in the cluster, which means that if a pod doesn't specify a priority class, this default priority class will be used.
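Roughly, a custom priority class and its use from a pod spec look like this (the class name, value and image are illustrative):

  apiVersion: scheduling.k8s.io/v1
  kind: PriorityClass
  metadata:
    name: mission-critical
  value: 1000000             # higher value means higher priority (pods default to 0)
  globalDefault: false       # set true on at most one class to make it the cluster default
  description: "For business-critical application pods"
  ---
  apiVersion: v1
  kind: Pod
  metadata:
    name: critical-app
  spec:
    priorityClassName: mission-critical   # associates the pod with the class above
    containers:
    - name: app
      image: quay.io/example/app:latest   # placeholder image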
C
Now, when we talk about best practices: always ensure you run multiple replicas of your application pods; you can use a ReplicaSet or replication controller to ensure we have X number of replicas running at all times. Always spread your application pods evenly across nodes; you can use the different scheduling strategies we discussed to spread these application pods evenly across the nodes. Then specify a disruption budget for your pods; use PDBs so that applications will have a minimum set of replicas running at all times.
C
That holds even during voluntary disruptions, but be aware that pod disruption budgets cannot prevent involuntary disruptions from occurring. Then define a descheduler policy: even with all the scheduling techniques we have, once a maintenance event has been completed, pods may be scheduled in an unbalanced manner, so to be better prepared for the next maintenance event we may have to make sure pods are scheduled evenly again.
C
The next one is: design your application so that it can tolerate losing pods. You need to instrument your application to do any necessary cleanup when it receives a SIGTERM, and this phase has to be quick; with OpenShift, by default, a pod gets 30 seconds from when it receives the SIGTERM to clear all its in-flight connections. In the case of a web application, there is no time at this point to wait for the existing client sessions to conclude.
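One way to sketch this in the pod spec (the image, drain delay and timings are illustrative; the application itself must still handle SIGTERM) is a preStop hook plus the termination grace period:

  apiVersion: v1
  kind: Pod
  metadata:
    name: graceful-app
  spec:
    terminationGracePeriodSeconds: 30     # the default; time between SIGTERM and SIGKILL
    containers:
    - name: app
      image: quay.io/example/app:latest   # placeholder image
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 5"]   # brief pause so endpoints update before shutdown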
C
This leads us to the next principle: it shouldn't matter which pod receives a request. Because there is no way to cleanly finish in-flight sessions when a pod gets killed, it may happen that subsequent requests of an in-flight session go to a different pod after the pod that was managing the session is killed. Your application needs to be designed around this. It might be okay to lose the session in some cases, but in most circumstances you will want to give your customers the best user experience, which means not losing the session; for that, the application should be stateless.
C
The most important thing is to share nothing across multiple clusters. The clusters should be completely independent, without sharing any resources. You might be tempted to share the storage layer or some services outside the cluster; unless those services provide their own high-availability solution, a cluster should not share any service with another cluster.
C
Now, even in the case of a cluster failure, or an application failure in one of the clusters, your application will still be available to your end users. Now consider a situation wherein your application is writing some data to, or reading some data from, an Oracle database which is outside your OpenShift cluster.