From YouTube: Application Delivery & Life Cycle Management - Francesco Giannoccaro (UK Health Security Agency)
Description
Accelerate Application Delivery and Life-Cycle Management to Support the Growing Complexity of Public Health Science
Francesco Giannoccaro (UK Health Security Agency)
This OpenShift Commons Gathering was held on July 6th, 2022 live in London, England
https://commons.openshift.org
Good afternoon, everyone, and thanks for joining this session. My name is Francesco Giannoccaro, and I will take about 20 minutes of your time to share my experience working with colleagues over these last two years during the pandemic.
I work at the UK Health Security Agency, where I'm Head of High Performance Computing and Hosting Services. The UK Health Security Agency was created in 2021 by bringing together three existing organizations: Public Health England, where I worked previously; the JBC, the Joint Biosecurity Centre; and a division of the NHS called NHS Test and Trace.
The latter two, the JBC and NHS Test and Trace, were created during the pandemic, and since October 2021 all three bodies have been acting as a single agency.
We have about 8,000 people, the majority of them scientists, working in the organization. The mission of the organization is to protect the public from the impact of infectious disease, spanning from any aggressive pathogen to chemical, biological and nuclear incidents. So it's quite a broad mission, and a lot of expertise has come into the organization.
The main scientific services that the organization provides to the country are the pathogen genomics services: screening through DNA analysis of pathogens, both viruses and a number of other pathogens. Don't think just about the coronavirus; think more broadly of tuberculosis, Legionella, monkeypox, all the aggressive pathogens that are potentially the origin of national or global outbreaks. The agency's activity also includes antimicrobial resistance monitoring, understanding how those pathogens develop resistance to antibiotics, and predictive models to understand how those outbreaks evolve.
This is an ever-growing complexity, a scenario that constantly evolves for a number of reasons. We live in a global society where people move and travel from country to country and between continents, and therefore transmissible diseases travel at a different speed in modern days. But there is also a positive side.
Think about the ability to have significantly more data to analyze and to provide to scientists to understand how we can fight those diseases. That large amount of data is at the same time a challenge, of course, in making the best use of an increasing volume of data.
We also need increasing capacity in terms of computational resources, especially in a scenario like this, which fortunately happens perhaps once in a generation. This is not something every generation goes through, but suddenly there is an incredible volume of data coming from different organizations around the world, which share that data within the scientific community, and therefore there is the need to scale at a speed that is difficult to cope with. The infrastructure that the organization has relies partly on on-premise data centres and partly on off-premise capability running on commercial cloud.
We have two main data centres, one in North London and one in Southwest England, at Porton Down near Salisbury, and in those data centres we have both storage and computational capacity. During the pandemic we expanded those on-premise resources but, most importantly, we enabled the ability to analyze and leverage resources available from public cloud environments.
In order to make those resources available, connectivity of course plays an important role when you have a significant amount of data generated by the whole genome sequencing machines every day. Connectivity has also been a challenge: we are talking about petabytes of data produced globally every month.
The data related to DNA sequencing is growing at a speed higher than in any other science domain. Before the pandemic it was already doubling every six months and, of course, now it is growing even faster. So managing that amount of data in an effective way is a challenge in itself, and it is important not only to provide resilience for this data but also to facilitate access to it. Imagine: this is not data that sits in a database.
The data that the scientific machines produce is essentially structured as files, and in order to facilitate sharing that information with the scientific community around the world, it is important to catalogue it: to add metadata to those files that allows browsability of those systems. So technology around cataloguing data and making it easily accessible is pivotal.
The way we have been approaching this increasing demand for computational resources has been to add hardware to the data centres in our infrastructure, the ones I mentioned in North London and Southwest England, but also, I'd say, to build a lasting capability on commercial cloud. In order to leverage those resources, it is important to give workloads the portability they need to run seamlessly in different environments, and that portability is one of the main issues that containers help to resolve.
Containerizing those workloads, those scientific applications, is ultimately what enables easy use of the elastic resources available in commercial cloud. We started using containers in the HPC domain, the high performance computing area, and we started that journey a few years ago with colleagues within the organization, primarily bioinformaticians.
These are colleagues who are tech savvy, who readily approached and understood the benefit of moving away from multiple versions of their software compiled in what is technically referred to as modules in an HPC environment. Moving that number of modules into containers is a process that I started a couple of years ago, and it has enabled the portability I was mentioning before. In this specific case the container engine is one called Singularity.
It follows exactly the same concepts as other container engines like Docker, so the benefits, of course, are simplified portability, the fact that you have the different versions of a runtime all in a single object, and the fact that this specific container engine is already engineered to talk to job schedulers in HPC environments. In HPC, of course, an important role is played by the job scheduler, the component that spreads the workloads across a number of physical nodes. We have then been trying to introduce and present those benefits to our scientists.
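As an illustration of how a module-based workload becomes a portable container job, here is a minimal Python sketch that builds a batch script wrapping a tool in a `singularity exec` call. The talk does not name the scheduler or any paths; Slurm, the bind path, the image and the command below are all illustrative assumptions.

```python
def slurm_singularity_script(image_path, command, cpus=4, mem_gb=8):
    """Build a minimal Slurm batch script that runs an HPC workload
    inside a Singularity container rather than a compiled module.
    The bind path and resource values are illustrative assumptions."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        # --bind mounts shared data into the container (hypothetical path)
        f"singularity exec --bind /data:/data {image_path} {command}",
    ])

# Hypothetical bioinformatics job: same workflow as a module-based run,
# except the execution environment is now the container image.
script = slurm_singularity_script("bwa_0.7.17.sif", "bwa mem ref.fa reads.fq")
```

Submitting the generated script with `sbatch` keeps the user-facing workflow identical; only the runtime moves into the container, which is what gives the workload its portability.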
Clearly, the audience is different. When you talk with colleagues who are already working in an HPC environment, they are already more knowledgeable about some of the concepts, whereas scientists such as biologists and epidemiologists need a way to translate their ordinary way of working, which is basically using their laptop and an IDE, often working in R, for instance. Allowing them to use a scalable environment is slightly more complicated.
It is more complicated than working with people who have already been using a command line interface, as in an HPC environment. Kubernetes provides scalability; however, on its own it is not sufficient to deploy or move the applications that scientists use on their laptops in a way that lets them easily use that level of scalability. OpenShift offers a wider ecosystem of components, all integrated in a way that is easy to use for colleagues who don't already have technical familiarity with components like load balancers, shared storage, network isolation and network encapsulation. Using OpenShift, we have significantly eased that journey of moving applications from scientific workstations, desktops or laptops into an environment that can be easily scaled.
The benefits we have seen are not only on the development side, where we've seen many gains and very positive feedback from our scientists. The security aspect is also quite important for them.
On their laptops they feel in control of complying with information governance policy, and to move into a broader environment, once you start mentioning public cloud, you have to explain the whole security envelope you are going to provide in order to offer the same level of security as what they do today. Again, OpenShift pays very strong attention to security. The fact that the image registry, for instance, is constantly maintained and therefore security patched is a big plus.
A
It's
not
like
downloading
an
image
from
you
know,
docker,
hub
or
or
or
git
is
it's
using
a
base
image
that
is
already
been
assessed
in
terms
of
specifically
in
terms
of
security,
then
the
other,
of
course
benefit
that
we
have
seen
from
from
those
colleagues
is
the
fact
that
the
the
old
ci
cd
capability
is
a
really
streamlined
within
their
the
same
environment.
They don't need to learn much about how to monitor the load on their systems, or how to understand and scale some of the components within pods or within an environment.
On logs, the fact that we have an easy way to ship these logs to the more central logging capability the organization has, in this case through Splunk, has been a plus they didn't have before, and it has been much appreciated. But the benefits are also numerous from an operator perspective, from within the team that I like to work more closely with.
As I said, we work in quite a complex environment. You probably remember from the first slides the ecosystem of open source technology: we have bare metal high performance computing environments running Linux.
We have environments that can scale and burst dynamically, for instance on OpenStack, where we run different types of applications in an elastic way. OpenStack, for instance, runs OpenShift, but it also provides bursting capability to the HPC environment. Components like OpenStack and HPC are quite intensive to maintain, because they have very different modules within the same environment, and having a more orchestrated way for these components to talk to each other, to scale automatically and to provide the level of flexibility you see in commercial public cloud solutions is one of the benefits that OpenShift provides from an operator perspective.
Here, for instance, you see that security again is held at a very high level, with SELinux and MCS (multi-category security) capabilities, which mean in essence that every container is treated as a process and is fully isolated from all other processes. So there is not only no need to give high privileges to the processes that run in this environment; beyond the fact that they don't run as the root user,
they are also, at a lower level, additionally isolated from one another. The case studies that I would like to briefly touch on today, the ones that we have seen, are quite broad. In addition to the normal web applications, which is probably what the majority of people using OKD or OpenShift run, web environments with a database, and which are also part of our use cases,
I'm going to touch briefly on, for instance, the batch processes we have been running on OpenShift, which are probably more specific to a scientific organization. We run, of course, a number of web systems that present the results of the analysis for a number of outbreaks, and again, don't think just about Covid, the coronavirus, but also about other, less well known outbreaks that happen around the world, like the collaboration with the WHO on Legionella, measles and rubella, which are all outbreaks that have happened in different countries.
An important pathogen is tuberculosis, which to this day still affects a significant number of people. This is a clear example where we have a technical partner that works on the code: the image is produced by external developers and they push their image into our environment. They, of course, use the same base image that we have approved in terms of security, and then, when we tag a build as production, that environment becomes available to the public.
We also have internal components, for instance, that we run on OpenShift and that are used in collaboration with other scientific organizations around the country. This is basically a consolidated simulation tool used by scientists and academia to produce consensus around the Covid R parameter, the parameter that gives a sense of how the infection spreads from individual to individual, what is called the reproduction number.
We do this in collaboration with the UK Defence Science and Technology Laboratory, and the output is ultimately analyzed by, I think, SPI-M, the Scientific Pandemic Influenza Group on Modelling, to model how the infection spreads. Another application is the UK NSC recommendations website.
That is the National Screening Committee recommendations site. This is a website accessible to the public and to clinicians to look up clinical conditions, to understand whether a specific condition needs to go through a screening programme.
You can find how the screening works, and the site is also used to coordinate consultations about the screening itself. From an OpenShift perspective, this is completely managed by the CI/CD capability that OpenShift itself provides.
In this case, we use OpenShift to run analyses that are managed through a specific pipeline, in this case Apache Airflow. Users were previously running these batch jobs, again, on their scientific workstations.
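The talk does not show the actual pipeline, but conceptually each batch step that used to run on a workstation becomes a container that the pipeline launches on the cluster. As a sketch of the pattern, here is the kind of Kubernetes Job manifest such a step produces; the namespace, image, command and job name are all invented for illustration.

```python
def batch_job_manifest(name, image, command, namespace="science-batch"):
    """Sketch of the Kubernetes Job a pipeline step could launch when a
    batch analysis moves from a workstation onto OpenShift.
    All names in this manifest are illustrative assumptions."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "command": command,
                        # containers run unprivileged, never as root
                        "securityContext": {"runAsNonRoot": True},
                    }],
                    "restartPolicy": "Never",  # batch semantics: run once
                }
            },
            "backoffLimit": 2,  # retry a failed analysis at most twice
        },
    }

job = batch_job_manifest(
    "variant-calls",
    "registry.internal/pipeline:1.4.2",
    ["python", "analyse.py", "--input", "/data/run42"],
)
```

The scheduler (Airflow, in this talk's case) then only has to submit and monitor such Jobs, which is what lets the same analysis scale well beyond a single workstation.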
They didn't have enough compute capacity to run analyses on large data sets in a short time and, as I was mentioning before, the volume of data to analyze has been growing significantly. So the need to avoid a workstation taking hours for an analysis, or a laptop that always had to stay on, started the process of thinking about how we could run that type of batch processing on Kubernetes, in this case on OpenShift.
There is a process for building the container images, and that also allows us to lock down specific versions of a runtime environment, which is important for scientific reproducibility. We want to make sure that when we publish some data, we also keep track of the entire process involved in producing those results.
All the code and libraries involved in producing the scientific data that is ultimately shared with the scientific community need to be kept, and that is a capability that we now have.
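One way to picture that reproducibility requirement: each published result carries a record of the exact container image, library versions and code revision that produced it. This is a minimal sketch of such a record; the field names, image reference and versions are invented for illustration, not taken from the talk.

```python
import json

def provenance_record(image_ref, packages, code_commit):
    """Sketch of a provenance record kept alongside published results:
    the locked-down container image, the library versions inside it,
    and the code revision. Field names are illustrative assumptions."""
    return json.dumps({
        "container_image": image_ref,                 # immutable tag or digest
        "packages": dict(sorted(packages.items())),   # pinned library versions
        "code_commit": code_commit,                   # exact code revision
    }, indent=2)

record = provenance_record(
    "registry.internal/phylo-pipeline@sha256:abc123",
    {"pandas": "1.4.2", "biopython": "1.79"},
    "9f2e1c0",
)
```

Because the image reference is a digest rather than a mutable tag, anyone can later rerun the analysis in exactly the environment that produced the shared data.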
This is one of the systems that accesses a number of different data sources and performs different types of data mash-up, query and analysis, and one of the challenges here was ensuring that we were complying with information governance policies.
So we built a small module that uses a technology called Kerberos to perform authentication at user level: the user who started the batch process, and who is therefore going to query some specific sensitive data, is also logged for information governance requirements.
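A hedged sketch of that pattern: the batch job obtains a Kerberos ticket for the submitting user with a standard keytab-based `kinit`, and the access is written to an audit log. The `kinit -k -t` flags are the standard MIT Kerberos ones; the principal, keytab path and logger name are illustrative assumptions, not details from the talk.

```python
import logging

def kerberos_batch_auth(principal, keytab_path):
    """Sketch of user-level authentication for a batch job: build the
    kinit invocation that obtains a ticket for the submitting user,
    and log who triggered the data access for information governance."""
    # kinit -k -t <keytab> <principal>: authenticate non-interactively
    cmd = ["kinit", "-k", "-t", keytab_path, principal]
    # audit trail: record which user initiated the sensitive query
    logging.getLogger("governance").info(
        "batch auth requested for %s", principal)
    return cmd

cmd = kerberos_batch_auth(
    "a.scientist@EXAMPLE.ORG", "/etc/keytabs/a.scientist.keytab")
```

Running the queries under the submitting user's own principal, rather than a shared service account, is what makes the governance log meaningful.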
Here again a wide number of components are involved throughout the whole pipeline that Airflow manages, and these are ultimately computed in a container environment, specifically on OpenShift. The results of those analyses are then published through dashboards. Coming back to the way colleagues were working before:
To share those dashboards, those reports, they were saving the data into a Git repository, and everyone who wanted to see a report had to download that set of data onto their own system, and only then were they able to access that type of dashboard. With OpenShift we have removed that complexity and all those steps, so here again we have accelerated the ability of scientists to do their daily work.
Python and R are the two development runtimes that we see used most often. Specifically, scientists who run predictive models and data analytics make a lot of use of R, with interactive analysis that they normally do through a component called RStudio on their own laptops.
So they have this IDE, this development environment, where they choose the different data sets and run their analyses.
What we're working on now is a way for them to connect that IDE, that environment, directly with Kubernetes within OpenShift, so that from the code they put together they can deploy the actual computational tasks in parallel across a number of containers in OpenShift. Again, that is a capability that has allowed us to facilitate and accelerate the use of resources at scale.
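The dispatch described here happens from R, but the fan-out pattern itself is language-neutral. As a minimal Python sketch of the idea, one analysis is split into roughly equal chunks, one per container task; the chunking scheme is purely illustrative.

```python
def fan_out(items, n_tasks):
    """Sketch of the fan-out step when an IDE-driven analysis is
    dispatched to a cluster: split the work into roughly equal chunks,
    each chunk destined for one container task."""
    chunks = [[] for _ in range(n_tasks)]
    for i, item in enumerate(items):
        chunks[i % n_tasks].append(item)  # round-robin assignment
    return [c for c in chunks if c]       # drop empty chunks

# e.g. ten samples spread across four container tasks
tasks = fan_out(list(range(10)), 4)
```

Each chunk would then be handed to one container, so the interactive session only coordinates results instead of doing the heavy computation itself.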
I think that's primarily what I wanted to say. Of course, I also want to mention that all of this has been possible primarily through the use of open source technology, so I want to use this opportunity to thank the open source community around the world: not only the people working on OKD and OpenShift, but more broadly everyone who invests their time and resources in the broad ecosystem of open source technology.
A
These
enable
open
science
and
make
possible
for
organizations
to
accelerate
at
speed
a
number
of
activities
are
otherwise
very
complex
and
and
probably
costly
to
run
in
a
different
way.
So
thanks
again
to
radat
for
the
effort
that
he's
putting
on
the
open
source
and
and
to
the
open
source
community
in
general.