Red Hat OpenShift Cloud Tech Tuesdays | Red Hat Livestreaming, 27 May 2021

Previous Meeting Next Meeting

⏯

youtube image

►

From YouTube: Cloud Tech Thursdays: Scaling OpenStack at CERN

Description

Cloud Tech Thursday explores the full modern open source cloud stack, from hardware to serverless. Learn about new ideas, projects, and releases around Kubernetes, OpenStack, hybrid cloud enablement, and many other topics.

This episode: How OpenStack is scaled to meet the needs of CERN

A

Good morning, good afternoon, good evening, wherever you're hailing from welcome to another episode of cloud tech, thursdays here on open shift tv, I'm chris short, executive producer of openshift tv, we are joined by a bunch of friends here from red hat, as well as a special person from amy will tell us, amy hi how you doing today.

B

Good, let me go ahead and introduce my compatriots. First, we have josh burkus, who is the kubernetes community person here at red hat? We have mike perez, who is the seph storage community architect and myself, amy marisch, who is the openstack community person here at red hat, and we are very pleased to announce that today we have balmera moreira from cern to talk about scaling. These openstack cloud at cern romero.

C

So hello, my name is belmir murdaira, I'm a computer engineer at cern. um I joined around 12 years ago.

C

Initially I was working server consolidation and how to virtualize the batch service. That is a huge service at cern but rapidly move my focus into cloud computing and how to deploy and manage a large-scale infrastructures.

C

um I'm also a member of the openstack technical committee and also co-chair of the openstack large scale scene, and I'm really happy to be invited to be here to talk with you about the cern infrastructure.

A

D

Cool you said you have a short presentation.

C

It's not that short. uh Okay, just slides, fair.

A

D

Okay, I think it'll be.

C

Enough for the hour, um I can start sharing.

B

It's certain we're talking large scale here right, can't.

C

A

B

C

All right, can you see it.

D

Yes, come on, they, they accelerate things to the speed of light.

A

D

Especially the speed.

A

C

So that's why I'm gonna through very fast through this one, all right so yeah. I can start so this session, it's about how we scale openstack at cern. So currently we run thousands of nodes and thousands of virtual machines and we go through through the steps from the beginning in 2013 to today, but maybe the audience is not familiar with cern. So in the next couple of slides I'll, give you an overview about the organization and the role of the cern cloud infrastructure in the organization.

C

So cern is the european organization for nuclear research. It was established in 1954, initially with only 12 member states, and this number has been growing over the years. Currently, there are 23 member states, black sits in the border between france and switzerland, and it's very very close to geneva.

C

The mission of the organization is to do fundamental research in the particle physics field. Cern is the biggest international scientific organization in the world. More than 10 000 scientists from more than 100 countries work in the organization.

C

Not everyone is a staff member. It's mostly people from different universities around the world.

C

So, to help understand the universe cern provides a unique range of particle accelerator facilities.

C

The slater complex at cern is a succession of different machines, accelerators that accelerate particle beams to higher energies very close to the speed of light. The lhc is one that you can see. Clearly in the satellite picture is the stern largest accelerator. It's also the world's largest accelerator with a 27 kilometers diameter, and it crosses two countries, france and switzerland, for comparison. You can see the geneva airport here and to have an idea about the size of this machine.

C

Here is the cern main site, and these are other accelerators, where the particle beams travel before being injected in the large iron collider.

C

um So the lhc has flourished not only one but two beams of particles that travel in opposite directions, and they collide in very precise points that are the experiments so atlas, alice, cms and lhcb.

C

So what is quite fascinating to me is that all of this is 100 meters underground, and this is how the tunnel looks like you can see the magnets. So these big blue pipes that you see here they are around 10, 000 of them in all tunnel.

C

Each magnet can measure between 5 to 15 meters and can weight up to 35 tons inside each magnet. There are two pipes where the particle beams travel in opposite directions. These are electromagnets so and superconducting. This means that they can conduct electricity without any resistance.

C

However, to achieve superconductivity states, they need to be at very, very low temperatures, so minus 271.5 degrees celsius, which is even lower than the temperature of outer space. As you can imagine, the cooling process takes a few weeks and requires tons and tons of helium.

C

So these are the experiments, atlas, cms, lhcb and alice. These are the particle detectors where the collisions occur. These machines are huge. They have up to 45 meters, long 25 meters in diameter and more than 12 000 tons, and, of course, everything is 100 meters on the ground mike visited this some time ago.

C

A detector is basically a digital camera, but they can take up to 40 million picture seconds. This produces up to one petabyte of raw data every second, of course, we cannot handle all this data. Well, our search system doesn't support this. So what physicists do they have triggers in the experiments that try to identify in real time the interesting events and everything else is discarded.

C

So at the end we end up with the few gigabytes per second that are stored for analysis.

C

Even though per year we store around 90 petabytes of data. That needs then to be analyzed.

C

So this is how it looks an event after reconstruction.

C

And with all these pictures, uh physicists can have a representation of the collision events. The the analysis of all these data gives gives physicists insights how the particles interact, but detectors are not only undergrounds at the cern site. They are also in space. So this is ams the alpha magnetic spectrometer that was installed in the international space station in to measure antimatter and cosmic rays and to search for dark matter.

C

All these data that ams generates is transferred to earth to be analyzed, and most of it is analyzing the cern clouds okay. So this was a very brief introduction about the mission of the organization and what it does so now, let's start talking a little bit about the cloud infrastructure.

C

um So to process all of these data and to support the the scientists all around the world cern also provides compute resources and to the scientific community.

C

Over 90 of the resources in the compute in the data center are provided um through the cerns, openstack private clouds, and to understand the motivation uh why we build a private clouds, we need to go back through the beginning, so 29, 2011 and and then we will see the evolution of the cloud infrastructure over the years and some of our architecture decisions.

C

If you have any question, please interrupt me at any time. So this is the data center in geneva, it's a building from the 70s. It was designed to have a prey super computer in the middle. At that time, of course, this evolved. It was upgraded over the years. Now it's a normal data center.

C

This it has two floors. This is one of the floors, um but one of the limitations that we have in this data center is the power capacity. Currently it has a power capacity of 4 megawatts and is not easy to extend data center. So that's why? If you visit now, the center you'll see that most of the racks are not completely full, usually they are all full and power constraints is one of the reasons.

C

So this is another data center that the cern operative.

C

From 2013 to 2019 over six years and there we only run compute nodes for the openstack clouds. All the compute nodes for computing for processing um were for the openstack cloud, so this is in hungary and it was a huge challenge for us, because when we launch in production our cloud infrastructure, we had two different locations, one in cern geneva and the other in hungary.

C

So the challenge was not only to deploy openstack at that time, but also to run these different locations transparently and we're going to talk a little bit more about this later, and these are another location when, where we run our cloud infrastructure, these are now compute containers ident with high density for computing, and you can see some when they were installed and this cooling hardware being installed right. So this is one of our dashboards for monitoring and you can see the size of our current quality infrastructure.

C

So we have around um 300 000 cores uh in the clouds um 3 400 users. uh More than 4 000 projects around 30, 000 visual machines. This changed a lot. We are in the process that we are decommissioning a lot of hardware because it we are replacing it. So that's why you see this big drop in compute nodes and number of vms that we added beginning of the mode.

C

We also have a lot of services in the cloud, so we have ironic supervision bare metal you. We have around 8 000 bare metal nodes, magnum clusters. um Usually most of these clusters are kubernetes more than 600 and also volumes for cylinder. You can see that we have a lot of block storage more than three petabytes.

D

So quick question from the audience kind of unrelated, which is: um can you direct our audience member to where they can look at jobs that are available at certain.

C

E

I've already sold everybody right.

B

To work at cern with.

C

Y'all um there is a web page for jobs. I think, if you search jobstern, you will immediately find it I'll.

B

Go ahead and lock it up.

C

A

Job cern, let's see oh here we go careers careers.cern, that's all it is yep. That's all you need I'll drop it into that. Are you already dead? Thank you. Okay,.

B

Y'all are faster than I am yep.

C

All right so going back to 2011., um so that was a period of change, so we are in a period where um we had a lot of computing requirements. They were increasing a lot and the lhc was running. um There was a need for more computing resources and we had these power constraints in the computer center in geneva. So we need to expansion options in a different site.

C

Also, we need a business continuity.

C

It's nice to have different locations, because data center doesn't run only processing jobs, batch jobs for to analyze the lhc data, but also it runs all the services for the organization so having a business continuity plan and on top of that, the disaster recovery.

C

So that's, why uh cern opened at the international standard to for all the member states to have another data center and the hungary one? So that's why we got that data center in hungary, so the project started in 2011 and the data center was ready in 2013 uh just in time for the launch of our private openstack cloud infrastructure, where.

D

Where hungry is it.

C

So it's very close to budapest.

D

C

And when I say wiegner vigner was the exact location but he's budapest.

C

Okay, and then we are in the situation that we know that we can have a new data center, also more servers, but we are still using our manager managing tools from the beginning of the 2000s tools that were developed in-house.

C

It was a time that there was nothing available actually to manage a data center of our size in 2000, so we needed to build all these tools. However, reality in 2011 was completely different. There were a lot of open source projects that probably were doing. Definitely they were doing a better job and we did much more functionality than the tools that we made in a house and then other problem was when an organization build their own tools.

C

Attracting people to work on those tools is actually very difficult because they only add value inside the organization and when you arrive there, you need to learn them, and if you live, that knowledge will not it's not interesting for other organizations and companies. So it was a time that we needed to really adopt. Opens the open source tools available to to manage our data center, um so we start looking to the all the options available so why to build uh also cloud infrastructure.

C

At that time, everything was running on physical machines and it's a completely shift, um not only uh the way we manage the data center, but also for all the users that that's what we needed to tell them. Well now we need to transfer these workloads to virtual machines and, as you can imagine, a lot of them were server others they liked it to have their machines in center and control them, and this was a huge cultural change.

C

But, as everyone knows so, cloud infrastructure brings a lot of advantages: improve operational efficiency, the resource efficiency, so the the possibility to consolidate a lot of servers in one resource.

C

It's it brings a lot of advantages and something that was quite easy to sell was to re-improve the responsiveness, because at that time, if someone needed a machine, they needed to fill a lot of forms, and maybe, after a few weeks or even months, they will get to the physical machine to work on having called the infrastructure. With an api sole service, they could immediately take that machine, so we started identifying neutral chain and the things that we clearly need was a configuration management tool. So there were a few options at that time.

C

um At that time we decided for puppet and this what we've been using since then puppet is not only used to configure the openstack infrastructure, but is worldwide used through the organization to configure all the it services, for example, monitoring tools, and there are a lot of projects open source projects that were in the define and that we are using today, so kibana search collect the fluent the um that we are using to manage not only the monitor, not only the openstack resources, but all the resources in data center under the cloud manager tool.

C

So it was a time that there is not much available and when we started really looking into this, this was 2009. um I started looking to open ebla, open ebla was a open source tool. Actually, we start looking to open about to visualize the batch system and we we were quite successful. We did huge scalability tests because one of the concerns at that time it was these open source cells- were able to scale to our needs. So we were able to create more than 15 000 visual machines.

C

You know on openable, which was amazing at that time.

C

um That was one of the options, um but also cern, beginning of 20 2006 um was managing a small visualization tool that was built on top of this microsoft system, center virtual machine manager, where the cern team basically built a web interface on top of it, and it was a basic web interface where the certain users could go and create virtual machines, selecting an image and basically was it. So there was no api interaction. It was only that web interface.

C

um It was a virtualization tool, but it was quite popular, so it had in 2011. It had thousands of virtual machines running in that microsoft infrastructure, but that was a time where openstack was released 2010 and that was a game changer. So, with all the industry support to this new cloud tool, I think it was clear from the beginning that was the right choice for us to to invest in openstack, to understand this tool and to join the community.

C

So we started investigating basically openstack from the beginning.

C

um This is a presentation that I gave in january 2011 uh to my management to basically describe my findings about openstack, so this was based on the first release of openstack austin.

C

um You can find the presentation in this link. It's quite funny now going back all these years and see this presentation.

D

I'm glad that I did.

C

D

Given the timing, did you consider eucalyptus or cloud stack at all.

C

Cloud stack was not there yet yeah, it was only after uh eucalyptus, it was an option, but it there was not a lot of deployments running eucalyptus at at least at our scale. So it was not. It wasn't never considered as a good option for us.

B

Yeah I started with grizzly. I can't imagine, starting with austin.

C

Well, it was quite a challenge going back now into these slides and uh revisiting. All of this is it's amazing, so in this slide, this is the architecture of nova at that time and yeah.

C

We believe that nova was complicated at that time with these diagrams. Well, when you see it today, it's completely different. It was a time where also where there was only two projects that was swift and nova, nothing else so glance. I think it was only available on boxer or katus, so everything was no good even to create users, you needed to do nova manager user, create something like this.

E

Nova was doing more than just virtual machines at that point right, so it was doing storage even at that time. Maybe we even had networking that was being done. Some people would call that the good old days of a flat network before we moved on to quantum and then moving on to neutron.

C

E

No network still.

B

C

B

But the texan in me has to just clarify, even though it is spelled b e x, a r is bare because we pronounce things funny.

C

A

Texans do or just.

B

We have some weird city names, fair.

A

County is one.

B

C

Thank you amy to raise that.

C

All right, so this was 2011, and so we needed to to get our hands dirty on this. um So we created uh several prototypes and the goal was to adding functionality in this different prototype. So we started with with what we call guppy, because so it was a very small and fragile animal and you see that the animals over time get more stronger with more functionality.

C

um Well, this was deployed the first prototype with the fedora 16, and why? Because at that time there was the federal cloud sick team that released the rpms for fedora.

C

They were not available for rail, yet centers, so we use the fedora because of that also it was a big exchange for us because this it was using kvm and we wanted to use kvm because we thought it was the future.

D

C

Then, in our open ebla tests we are always using zen, and but that was a pivotal moment also for kvm. It was integrated on rail six. I believe, and uh it was the only one supported there.

C

We started from the beginning using the openstack puppet modules, and actually we helped them to to develop the initial puppet modules um with the puppet labs. At that time it was funny times um and yeah that was just initially initially tests. We went then to a different release, and this was always closed on it for us to test, and you see that at this time we already moved to center six, our scientific linux, six um and also I preview.

C

So you remember that infrastructure that I told you that we had at cern running on top of this microsoft infrastructure.

C

So we wanted to continue to support that and we at that time believed that the easy way to do it was also to have hyper-v in the openstack infrastructure and then move those machines into openstack, but using the same virtualization layer.

C

um So you see the challenge: it's completely new product trying to scale this or to move our own infrastructure to openstack two data centers and then two different visualization technologies, um keystone ldap integration. What was also tried during this version? uh You can imagine that we have a huge that directory.

C

um Then the last prototype we opened to some of our community. We tried to put all of the services into aj and we had in this prototype more than 600 compute nodes.

C

This was already beginning of 2013 and basically we launched our cloud infrastructure in july 2013. from the beginning. It was clear that um if we get serious with this, we need to engage with openstack community, because there is no way that we alone will be able to solve what the issues will need the help of the committee. So you see that, from early on, we started attending meetups, um also helping the community and the organizing meetups. This one was at cern in end of 2013., um and this was my first openstack summit.

C

This was 2012, I think, in san francisco. It's a long long time ago- and this was the keynote realm- is how small it it was at that time.

C

Right so the cern cloud infrastructure, so we started using scientific linux 6., so that was in 2013.. um Later we moved to centos center 7.

C

um from the beginning that we are using the rdo packaging and it's great to have all these packages and all the best thing that red that does. uh However, we still have projects that we need some internal packages. So we need to rebuild all of this and from the beginning that we are using the upstream public modules for for openstack.

C

So some some considerations uh when we start to building this so the number of compute nodes, um we started very small with only a few hundred compute nodes, but we knew that we wanted to move all the data center to to openstack. So at the end, is a few thousand compute nodes. So is this tool able to to scale to these numbers at that time?

C

If you look back, there are not a lot of big sites using openstack, so that was always a concern uh different locations, so the data center that I just mentioned at the member of openstack projects. That was a time that every week was a new openstack project popping up. um It was very hard to to follow all of this, and then there was all these split of functionality. That also was happening so, for example, nova volumes moving to cinder um nova network moving to quantum.

C

So this was very difficult to when we are trying to deploy a series infrastructure very difficult to manage.

C

um Also then, we add a large number of users and projects um a lot of users every month um leave the organization um around 100 users leave the organization and one other users come back to the organization or join the organization.

C

There is always this constant move of people at cern, um so we needed to automate all of this um automation of creation of the projects when the people leave the organization, all the project removal all these automation needed to be invented, and then, of course, so we're having a large infrastructure, all the automation that is needed to manage the infrastructure itself. All the procedures just needs to be figured out because everything is new, so the kind of workloads that we run in the infrastructure- um mainly it's physics, data analysis.

C

So all the data from the lhc experiments and many other experiments, I.t services uh and then many other infrastructure that is required for the organization, the experiments services to run the experiments itself, engineering services to develop different tools for the experiments um and also personal vms. So any user at cern has the possibility to to have a project and run their desktops, their personal vms in the infrastructure.

C

So most of our visual machines are ephemeral.

C

They they consume more than 80 of all the cpu cards available in the clouds, and this is what is used for the lhc data processing and the four, because this is a very specific use case. We have special configuration for this virtual machine, so we do cpu pass through.

C

uh We have no more flavors, but also they are ephemeral because they are only processing jobs, so things like lime, migration. We are not really interested for for this kind of visual machines.

C

Then we have the pets that are all the service vms, um where performance is less important. But what is really important is that we can keep these virtual machines running and the live. Migration is a huge requirement for for all of those. um It's multi, it's a mix of operating systems that we have here. We have a lot of windows vms as well.

D

So these mostly like people's personal environments,.

C

So these are people-personal environments, but also a lot of services. Most of the services of the.

C

Organization, so 2013 is when we finally open the private cloud infrastructure to our users. uh We started with two cells and cells at that time was a quite new concept. Not a lot of people were using that um we decided to use cells instead of regions, even if we had two different data centers, mainly because we wanted this to be as easy as possible for the users to migrate their workloads to the infrastructure I've. These were users most of them.

C

They they are not computer service, but they need to to manage their their applications, uh physicists that they have their projects.

C

uh We wanted to be as easy as possible for them to move their workloads from physical servers to cloud infrastructure. So that's why we wanted to reduce it maximum the number of concepts.

C

So that's why we decided from the beginning to use nova cells- and this was cells v1 at that time, so we had one cell in geneva and the other cell in at that time. We wanted to have a j everywhere, because we believed it was a good architecture.

C

um We didn't want to have a single point of failure. We accelerometer um bad idea. We moved back uh some time, then after um glance was safe backhands, but at the beginning, really in 2013 it was actually all the images were stored on afs, because the cyph cluster at that time was not ready.

C

So we had for a few months all the images stored on afs and we were still doing some cinder tests, aj proxy. That continues and then we had. We had three masternodes per person, because this iv ability write in q again grab team q with cluster with the milker queues.

C

I think nothing special, very, very common architecture. So this is the diagram that you can see that we have two cells geneva vigner, um and this is all services that are running at at that time um for cellulometer we're running mongodb for the database.

C

There was no gnocchi at that time, stacktalk. We we started running that at that time it was. It was very good to have a perspective of the service, um and then we we keep more or less the the same architecture. This is the top cell. um The architecture of cells v1 is completely different from cells v1. So that's why? You see, you may not recognize the service that we have here from the current architecture of openstack.

C

So this was the vm growth since we launched for our users in in july 2013, and you can see that this is the cumulative number of vms that were created in cloud. This is only until april 2017, but the pattern keeps the same after 2017, and this is the number of vms um growing. So you see that this was very well adapted by our community and also we worked quite hard to basically move all the compute all the physical nodes from the data center to the infrastructure.

C

There are weeks that we are moving more than 100 nodes into the cloud infrastructure, and this was not new order. This was converting the existing servers into compute nodes.

D

For a while, so when did you get have all of the you know, standalone servers been converted, or is that something you're still working on.

C

um So all the servers that were that are dedicated to computing uh are converted. However, not everything in the data center was converted to compute nodes and not everything runs on top of the openstack. One example is storage, so does it doesn't make sense uh to run the storage service on top of openstack, so um they continue bare metal machines, managed by the storage team.

B

Okay, so those bare metals aren't ironic, but yet you do have ironic as well as magnum clusters in the system.

C

Right so ironic is a quite recent product in in our clouds. So I, if I remember well in 2019, it's available since 2018 2019 um ironic started to uh for us. You know it was a requirement because a lot of people a lot some of the use cases that that we had um didn't fit well virtual machines, so people that really needed a huge virtual machines, a full node virtual machine. So there is not didn't make a lot of sense to visualize uh that environment for them, because they were losing a little bit of performance.

C

So we will apply it bare metal, the ironic service and initially um our goal was to to have an api for for for the users to interact with the cloud with bare metal the same way they interacted with virtual machines. But we had bigger goals for ironic, not only that not only having a pool of parameter nodes for people to use, but also to change all the workflows in the data center and what our goal at that time was to manage all the resources in the data center using bare metal, including the compute nodes.

C

So currently, all the compute nodes in the infrastructure are managed by ironic. um We have this kind of exception and we have a lot of this in our infrastructure.

B

Yep so yeah, so another follow-up question I kind of want to ask is what determines whether it's a vm, a container or on bare metal.

B

So what determines where the workload goes?.

C

So it's quite ironic that we are talking about all this inception and all of these concepts- and I have the slight scale and quite simple, because.

B

We'll go ahead and go through and we'll we'll get to it.

C

um No, so that is a decision of the user, uh so the user have all these apis, and if he has quota, you can create uh bare metal nodes or you can create a virtual machine or you can create a container depending on the in the application that he wants to deploy.

C

Okay, I I think I still have one slide about ironic. I can go then deeper on it.

C

Okay, so, as I said, scale implies simple, because if from the beginning, you know that we're gonna manage nodes um dark, the architecture needs to be simple, so this is an overview about our architecture, um something that we decided from the beginning was to ease away to the openstack services. So we don't have a few physical nodes like most of people, three physical nodes and we run all the openstack control plane in those nodes.

C

So we just we try to distribute as much as possible. So we have machines that only run keystone. We have around 16 of them. We have machines that only run glance, api, neotrend and so on. Why? Because all this isolation allow us to upgrade all these different openstack components independently and allow us to focus in one problem at a time, then for nova.

C

Also, we have uh this kind of architecture we run all the api is completely isolated. Then we have the level one that is the main control plane for the for cell zero, where we have the schedulers and conductors.

C

No, no vnc proxy and placement as well is completely isolated from nova and even the databases.

C

We have completely isolated instances for each openstack project, so we have a mysql instance for keystone, one mysql instance for glance and for nova.

C

We have one independent bicycle instance: per cell and again is to have this insulation, and then we have the the cells itself and they are very, very simple: they have the control, plane again is related and then uh all the compute nodes so with which this represents that we have one control plane or only one server, acting as a control plane for around 200 compute nodes and in total we have around 80 cells.

E

Can you speak about the benefits of isolating things into different cells.

C

Can you repeat mike.

E

Yes, can you uh talk about the benefits of uh taking the sales approach of isolating different services within child cells, like uh how did you reach that requirement in your infrastructure.

D

And, and also for for my sake for cells were talking about sort of groups of physical servers, yeah.

C

Exactly it's a logical partition of the nova deployment.

B

So it can be a rack, it can be a data center, but it's.

C

B

Of servers that are together.

C

Yeah, I think this slide is good for this. Let's explain this, um so we decided to to go through the cell routes because we didn't have at the beginning to have this region concept to our users, but actually cells have a lot of benefits.

C

Basically, they can act as the failure domains um also, they can allow you to configure those servers in that cell in a particular way, and we are looking at those advantages um basically to to the player infrastructure. For example, you can see here that the rv ability zones are basically sets of different cells.

C

Meaning that if this cell goes down, the variability zone is not completely down it's just graded, okay, and because we have so many cells means that the control plane that we have for each one. It's only one server, meaning that if that server goes down, the workloads continue to run, because it's only the apis for this particular cell.

C

For these particular resources that will not be available and also it has the advantage to distribute all the control plane, meaning that for each cell, we're going to have a completely independent rabbet team queue, meaning that the scaling rapid mq will never be a problem for us, because the cluster, it's very small, it's just at maximum of 200 compute nodes.

C

So that's why we never reached scalability issues for ravity mq in nova. I think I have. I have a slide where I go through the advantages of cells and compare it with regions.

C

That answer your question mike.

E

Yeah yeah, I just wanted to give the sense to the audience of the benefits of it. Thank you.

C

um But what I would like to show with this slide is that we also run all these services that I show you in this slide: keystone, glass, newton, all the openstack control plane. We run it on top of the cloud itself, so the control plane doesn't run in physical machines dedicated for the control plane or doesn't run in a different clouds only for the control plane. It's running the cloud itself that it manages.

C

um So that's why we have this inception again. It's like ironic, as I mentioned early, that manages also the compute nodes, even if it's an openstack project.

C

So you see that in the servers side by side with the user vms, we have keystones, we have glances vms and so on and keystone, for example, and all the other servers are distributed between the different availability zones, the same availability process that we give to our users. We use it and also distributed between the different cells. So that's why we reach the high variability for the control plane.

B

It's nice to see using availability zones. I don't think people take advantage of it as much as they should.

C

We are exposing availabilities on store users from the beginning.

D

C

D

You started out with two widely separated data centers.

C

D

I actually weird techie question here: what what's the actual line lag between uh cern and hungary.

C

Oh exactly just in time.

D

C

All right so between cern and vigner, it's 1 000, 600 kilometers, and that translates to around 24 milliseconds of latency, um and this was at the beginning. We are trying not only to figure out how to set up the new data center, but also to set up openstack on top of it. Then we had this legacy, this latency issue as well.

C

um What you see in this slide as well is the connections between vigner and cern. So we add, two network links, 100 gigabits uh bandwidth between the two centers completely redundant. So that's why you see two and um after a few years we also added a third one. So we add a connection um with uh with a total of 300 gigabits per second between the two centers.

C

And this is what basically for us this was like connecting. It was a cloud interconnect um with the peering networks, because it was the same network for people that are used to public clouds.

C

Of course, having the data center there um that had some architecture implications. For example, um we run the databases, we started to run the databases in geneva, but the latency was so high, and that was a time that we didn't have nova conductor. So all the compute nodes were connecting to the databases at that time.

C

Also because we were using cells v1, the scheduler was purcell, so we had a scheduler in vigner that work on connecting to the databases in geneva, so the user experience at beginning vigner was not that great because it was sometimes very slow.

C

So we continued to iterate on this, so databases were deployed in the beginner data center.

C

um Another thing was ceph because the staff cluster was in geneva, so at the beginning, because the latency it was a very bad experience for users to have block storage in vigner, so in 2015, um saf cluster, the storage team deployed the saf cluster in vigner, for these use cases or block storage for the cloud um and other things, for example, um a glance cache.

C

uh Initially, we thought that 24 milliseconds was a lot of latency and we needed to transfer a lot of images to vigner, so we decided to implement class cache at vigner, but that turns out it's not really needed, because after some time the images are cached at compute at the compute node level.

C

So it was just an overhead that we had in architecture. So we removed that- and that was the case. That glance was the nodes were actually contacting lions in geneva without any issue. So you see there were different and several architecture constraints that we were figuring out over the years.

C

So the data centering vigner was operational between uh or cern was using the data center. It continues to run between 2013 and 2019. We knew this from the beginning. It was uh a contract for only four five years that was extended one more year um by the end of 2018. We are running there, 17 over sales, um three hundred compute nodes and uh for the availability zones. There we had the 78 nodes so early 2019, we started decommissioning the clouds in the vignette center and, as you can imagine, that was another challenge.

C

So we needed to remove all these cells from the infrastructure, and this was completely in november 2019 and one interesting part is that uh 2 500 servers actually returned to geneva because they were late purchase and it was still very good servers so and this service was the ones that were added to the computer containers that I show you at the beginning. In that picture,.

B

Now is wigner totally going away. Are you just moving some servers around because you mentioned.

C

So now we don't use the neura at all anymore.

B

C

All right so cells versus regions, so I list here some of the advantages of cells and regions to try to to make it more clear why we went for sales at the beginning. uh Basically, sales is too sharp than nova deployments. It's. It only applies to nova.

C

There is no other service that has cells, and that is actually an issue, uh isolite server domains and it's completely transparent users and one of the things it's uh also logical partition for operators and allow us to to have different configurations for a particular cells and distinguish the different cells with different configurations, which is important for us, for example, for the batch use case that has a completely different configuration for when comparing to the service cells regions.

C

uh They are completely independent, openstack environments, um so you need to basically deploy all the services, all the openstack projects in the plan independently in that region.

C

But you have that fault, tolerance, it's a completely different environment that should be managed in completely isolated way. So that is the big advantage, and actually now we are running multiple cells. By multiple I mean three cells, so in 2013 um for us it was more simple to manage one small cell than two small cells.

C

However, when the infrastructure grow to at this point, it's more simple actually to manage uh two or three small clouds than one big one, and one of the main reason reasons that also we move into regions is because neutron neutron doesn't have this logical partition with cells, and it was a big point of failure. So if neutron was down or or anything was affecting neutron, it was visible in all the clouds.

C

So all the users will see this and partitioning neutron between two three different regions uh allow us to to improve the reliability of the cloud. A lot neutron agents are quite chatty with drivetmq cluster, so needs to be a very big right, mq cluster, and that is always an issue to maintain also the new, um the the regions that we have now um they are per use case, so one region is focused in the I.t services and user vms.

C

The other two regions is more for or it's only for lhc data processing. So they only have that use cases which, for us logically also makes sense the way we manage this.

D

So the way you're describing nutrient, it sounds like a downgrade. So what are you getting out of upgrading to neutral.

C

Well, we we were forced to upgrade to neutron, because nova network is not supported anymore. However, for the old regions we still, we are still running um nova network. So we have six six cells where we still run nova network, um because we still, we are still evaluating how we gonna migrate. This is quite scary, migrating from nova network to neutron without interruption of the vms.

C

um Just thinking about that the vms could um lose network connectivity for some time. It's very scary for us and those cells are running important service for the organization. So that's why we are still trying to figure out how to how to do it in the right way.

C

So simple doesn't always plays well with the eye availability, but we try to to work around this. So that's why we have multiple cells for availability zone which allow us to if a cell goes down to for the availability zone, not to be completely down, which is a very good feature.

C

um The cell control plane is not highly available and if it dies.

C

There is no workload interruption, just people are not able to connect to to their vms, using the openstack apis or to to openstack operations using the the apis in that particular cell. However, um this simplifies a lot the the deployment because we have around 80 cells. uh If we had availability through all these cells control plane, we will have a lot a lot of control plane to manage and then, as I already mentioned, we also run the the control plane.

C

On top of the cloud itself, drive team queue, it's very challenging to scale and maintain um so we try to not run right into clusters at all, and what we find is that um if we have very small clusters like we have purcell reptimq is quite stable um and not having the complication of writing. Q clusters simplifies a lot of deployment and operations.

C

um Do you want me to go faster? I don't know if we have a hard line to.

B

Finish, how are we doing.

A

uh I mean we're fine on time. You can go over if you want no problem.

C

Okay, I'll I'll try to go faster now, so my sequel, databases um again like rabbitmq, um we don't have a cluster for my for the mysql databases, so we have independent mysql instances and the funny thing is most of them run on top of the cloud infrastructure.

C

So the storage is in a netapp solution, but the instance itself. Most of them run on top of the cloud infrastructure, except few exceptions that we have to for bootstrap issues that we run at them in on physical servers.

C

So openstack and when we have a cloud infrastructure where we want to have a lot of functionality, that translates to a lot of openstack projects, and these are the current openstar projects that we run, and you see the version that we have uh now in our cloud, and you see that they are. They are in different versions and is this isolation of deployment that allow us to have this so nova still in stein. One of the main reasons that we still stand is because we still have those cells running over network and upgrading.

C

Now, it's very, very risky. We have a lot of patches for another network to continue to run, um but you see that most of these services are running very, very recent releases, and that is one thing that we always try to keep up with the openstack release cycle.

C

um Having so many open many many openstack projects and managing uh all of this, and most of them manage uh thousands of resources. For example, sender manages thousands of volumes with the petabytes of storage behind um it's always tricky. For example, ironic. We have now around 8 000 nodes that are managing ironic and we started reaching scalability issues in ironic.

C

um So again, that is one of those things that you you get if you are scaling the infrastructure, and fortunately there is this functionality, conductor, groups that is more or less like nova cells in ironic and now we are taking advantage of this so splitting logically, the ironic deployments scale also means staging, even if we try to upgrade most of the services every six months, the configuration is always changing, um so the configuration through the private models is always changing so trying to have a ci cd testing.

C

Everything before we deploy the production is quite important and we have a staging process process to have everything. First, on pre-stage testing in a small number of nodes and qa and then going through different master levels until reach. Everyone in the infrastructure test stack is what we call to our testing infrastructure. Very few few nodes to test upgrades and new configuration options.

C

Scale also translates into automation. um We are using several projects for automation, uh for example, rally to prove infrastructure every day. Raleigh deploys thousands of virtual machines in infrastructure just to make sure that every cell is okay and everything is running as intended run. Deck is a project that we use a lot for operations.

C

For example, we have different teams. The repair team, for example, doesn't have access to the openstack resources. However, we have all these procedures and run back jobs that they can trigger. For example, when a node is needs a repair, they can trigger a job that will basically try to lie migrate. All the instances in that node notified users, if that is not possible, that that node needs any repair intervention and then we have misreal.

C

Mral is also an openstack project that we use for workflows, for example, uh for all the projects, creations and all the projects, removal when a user leaves organization so going through all the resources from users and make sure that they are deleted, and all of this is automated scale also means permanent changes.

C

So upgrades through the openstack release cycle is every six months, um so we run 15 openstack projects. So, as you can imagine, every it's almost an upgrade day for us um and then also we have the open operating systems distributions upgrade. So we started with scientific linux 6.

C

at some point we needed to upgrade to center 7.. This is never easy.

C

There is no easy way to move from six to seven: it's a required reinstallation. In our case, and now we are facing again the move from center, seven to synthesize and stream, um and we are working on this hardware commissions. So every around five years um copy nodes need to be the commission um and, as you can imagine, a lot of live migrations need to happen to try to do this transparent to the users.

C

So recently uh we just um removed around um or we like migrated around 900 visual machines, because we are the commissions some sales and we continue to do it. um This is a lot of work. We wrote a recent blog post. You can follow our work here.

C

Security, um as you all know, so, meltdown spectre. A couple of years ago, created a lot of fuzz, so we needed to actually reboot um most of our cloud infrastructure because of this um also disable hyper threading, reducing the number of cars available and these operations. When you have thousands of nodes, it's a lot of planning um a lot of work currently for kernel upgrades.

C

We are trying to automate this because when we have all these, these compute nodes running they run for years and the upgrade in the kernel is quite difficult without disrupting the user. So we are trying to automate this to basically having a tool that continues, live migrating instances in the infrastructure and when the compute node is empty. Just reboots, the compute node for the kernel upgrade.

C

And scale it's of course teamwork. um So what cern or openstack team it's um six seven people, but over the years we added the participation of thousands dozens of uh different students, fellows project associates um that joined the team for a few periods of time and contributed a lot for for this project.

C

So these are my slides. I didn't intend to take so long to to go through them.

A

B

Nah, this was great yeah, you know for people who didn't know what cern was. I think they got a really good understanding of it, and you know the infrastructure. The fact that you are running different versions of openstack based on what project is something that most people don't do, and you all have really good reasons for why you're doing it.

A

C

So I'm happy to answer your questions um also, the audience they they can follow me on twitter. Ask me questions there. um Some questions through the email, I'm happy to answer them.

A

Wonderful be great.

B

Thank you so much belmaro for joining us today. This was really good.

A

Very informative.

C

A

Never would have expected the number of problems cern has with infrastructure. uh Okay,.

B

A

B

Yep I got it, have you ever considered any other bare metal provisioning other than ironic, interesting.

C

uh So ironic was a quite easy decision for us because we had all this infrastructure based on openstack, so it was the natural choice for us also having the possibility to have exactly the same api, to create virtual machines um and bare metal notes.

C

It was quite attractive and it's, I think it's a really advantage for us, so the user doesn't need to learn a different api, different command line to this.

C

So ironic was always, I think, the most attractive solution to us.

B

I know we're going to have a little bit of a lag but go toss. One did that answer your question.

D

Well he's responding. I do have one of my own since I'm coming here from container land right. um One of my questions is: are you already running some workloads that are distributed as container images and, if not, are you planning on.

C

So at cern they are different teams that are using containers for for to deploy their applications. So, as you saw in the in one of the slides, we have more than 600 magnum clusters, and most of them are almost all of them are kubernetes clusters.

C

um So there are a lot of applications uh using containers as their deployment method. We also start playing with the containers to deploy openstack.

C

We recently we are experimenting with one region trying to deploy openstack uh on top of kubernetes using the elm charts, uh openstack elm shards, and we are experimenting with it currently. For example, in production we have half of the glance requests going through a glance that is deployed in a kubernetes cluster.

C

That answers your question. Josh.

A

Out of just sheer curiosity, I mean how many maintainers of openstack do you have like I'm.

C

Always curious about team.

A

Topologies and size and stuff like that.

C

So we are seven car members that, um but then we have all these fellows and project associates that joins our team, um but usually they don't do operations and they do more investigative work like looking for different projects, evaluate different openstack projects or kubernetes associated projects, and then, if we think it's worth to to invest in those projects, uh so then we go further and we try to implement them to deploy them in the cloud.

A

C

So currently we have um some people doing work on gpus trying to understand how to to have gpus in the clouds uh other people looking into how to have functions as a service in the cloud, for example, we have all these different projects that are always going on.

A

I'm I can only imagine how many different projects are going on at any given moment, given the resources that are available.

A

Have you collaborated with any other scientific organizations about your infrastructure, specifically.

C

Oh yeah sure, um so at the beginning we collaborated a lot with nectar, for example, that is a scientific, scientific research network in australia, and at that time they were using openstack. They are still doing and they were quite big and we changed a lot of ideas how to deploy openstack and more recently, we collaborate with the ska. That is the square kilometer array, basically, is the biggest or will be the biggest telescope telescope in the world, uh the sites in south africa and also australia for observation, and we did interesting projects with them.

C

For example, pre-interval instances.

C

So they are not available in openstack by default, so we collaborated with ska to develop this um also running kubernetes clusters on bare metal. There was a lot of work that was done in collaboration, for example, with ska in this area.

A

Cool awesome all right, so I dropped links to both uh nectar and ska in the chat. If folks are curious about what those organizations are all about, uh if there's anything else feel free to reach out to me question wise short at redhead.com, I can pass along to belmiro and the team here, um but without any further questions I think we'll just wrap up here. So thank you very much from here. This was awesome presentation.

D

Thank you so much yeah thanks.

C

Thank you so much for having me.

E

Yeah nice, seeing you.

A

All right take it easy and stay safe reminder. Folks, red hat is having a recharge day tomorrow, so we will be off the air completely and monday is a u.s holiday. So we will see y'all on tuesday stay safe out. There.

B

B