Description
OpenShift and Machine Learning at ExxonMobil with Cory Latschkowski of ExxonMobil.
Filmed on October 28th, 2019 in San Francisco.
So, my name is Cory Latschkowski, and I'm with what's called the Upstream Integrated Services Technology Enablement group at ExxonMobil. I'm going to disclaim really quickly: these slides were recycled slides that were approved by legal, so it's going to be a little painful. As you're going through this first part, I'm going to tell you a few stories that don't necessarily have slides. I'm grateful to be here. I'm grateful to Exxon for letting me come and share some of these experiences, and also to Red Hat for inviting me.
If you haven't heard of ExxonMobil, we're sort of a large organization. Some people would say that at our core we're actually a risk management company that happens to deal in oil and gas. We take safety very seriously, and that builds a culture around it. Also, a brief intro to me: I've been with ExxonMobil for about eleven years, maybe twelve now; I've lost track. I've moved around a bit.
I was an Active Directory domain admin for a research company for a few years. I eventually moved into HPC, high-performance computing, at ExxonMobil, where I was focused on large data processing. I became an RHCE, and ironically I don't think I had any Red Hat subscriptions that I was actually managing at the time. Then I moved to cybersecurity: I was an SME for internal digital forensic cases, did a log aggregation project, and worked with Hadoop, Splunk, and some other technologies. Looking back at it, there would have been a really interesting use case in there.
We had a data breach analysis that we had pulled in, and machine learning would have been really fun with some of that. About two years ago, I was pulled in as the platform architect for OpenShift. Why did I leave cybersecurity? It sounded really cool to work with Kubernetes and with OpenShift. Just curious: how many people in here started their journey with OpenShift before version 3?
Anybody in here? Man, you are a brave soul. We started out around version 3.5. I'm going to talk about this a little bit more later, but that's sort of where we started out. I came onto a team that was an agile team. It was part of a digital transformation effort, and we did everything from the full stack, hardware, to onboarding of customers.
One of the big wins there that I want to share: with such a large organization, there's a lot of overhead and process. One of the things we did was stand up an OpenShift instance in a cloud provider with GitLab, and we used GitLab as basically the authentication provider. We said, if you have a company email address, you can go and use this, and that was a huge win.
Another one of the big wins there was partnering with Red Hat. We didn't internally have the experience to pull this off, so we had to partner with Red Hat and also pull in contractors to build this team. I guess you could say we also had some accidental wins: we got lucky. We got some really good people who worked really well together.
A
They
were
integrated
during
some
restructuring
and
Leedy,
but
by
the
name
of
Audrey
Resnick.
She
partnered
with
Red
Hat
contractor
who
was
a
full-stack
developer
to
help
with
some
of
this.
This
work,
okay,
yep
we're
good.
So in this picture you'll see there's a Jupyter notebook running. This was what I like to call the beginning of the snowflake factory. We started out with: you get a Jupyter notebook, you get a Jupyter notebook, okay, everybody gets one. That was not a very consistent experience, and when you're doing machine learning you want to have reproducible results, so that created a few challenges.
So the goals of the work were to create an interactive, reproducible, and collaborative environment for the data scientists, and Jupyter notebooks were selected, as you're seeing here.
The main thing was: people were running these in the HPC environment, on Linux and Windows, on port 8000 or whatever you liked; it was all over the place. So the goal was to move from this local PC environment to more of an OpenShift one, and this was a huge win for the data scientists, to start having this standardization. It forced a lot of different things as well: it inherently made for more of a DevOps approach to things. I'll probably explain another example of some lessons learned there later.
But this accelerated a lot of the POCs that were being done. This is sort of the model they went through: more of an agile model, then pushing code. One of the big things was using S2I, source-to-image, to deploy these proofs of concept, then having a way to demo that, and then getting feedback from it.
One of the big wins here is also that, because OpenShift was risk-assessed, a lot of the controls had already been documented. Also security, so that was a big one, as well as bringing in the dependencies for your model: we had a security pipeline to bring in the artifacts for Python, which went into a Nexus repo, and the images that came out of some of these base images for the Jupyter notebooks were also part of that Nexus repo.
So we did a few things here. One is we took some situations that would generally take anywhere from months to deploy, using waterfall and some of the overhead of our internal procedures, and changed that down to minutes. Before this effort first started, between the data scientists and developers, I think they were producing one or two proofs of concept that were actually getting to customers.
Right now, I think we're around 70-plus POCs being produced by the same group. One big thing is source-to-image: not only the Jupyter notebooks, but also using source-to-image to deploy these POCs, and reusing data and connections to data. That was a huge one. In these images, a lot of the connections to data were through SQL Server or Oracle, and having those drivers already built into the images was very helpful.
Another thing that we looked at recently, working with Red Hat and Will Benton, was the idea of doing source-to-image model training: actually doing your training during the build process. So we're also seeing a CI/CD pipeline maturing.
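To make the source-to-image model training idea concrete, here is a minimal sketch of the kind of training script an S2I build could run, so the fitted model ships inside the image itself. Everything here is hypothetical: the data points, the `model.pkl` file name, and the ordinary-least-squares fit are stand-ins for illustration, not our actual pipeline.

```python
# Hypothetical sketch of "training during the build": an S2I build could
# run a script like this, baking the trained model into the image.
import pickle

# Toy training data; a real build would pull this from a data source.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

# Fit y = a*x + b by ordinary least squares, stdlib only.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

# Persist the fitted model as a build artifact.
with open("model.pkl", "wb") as f:
    pickle.dump({"slope": a, "intercept": b}, f)
```

Because the model is produced at build time, every deployed pod starts from the same artifact, which is exactly the reproducibility the notebooks effort was after.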
We had a few things that we learned in trying to solve this. One of the biggest problems is: where's your data? We had on-prem databases in various countries, and certain agreements limited where we could move or access data. And while development and deployment in Jupyter notebooks was much better, there were also some lessons learned, context that had not been captured along the way.
For example, here's a horrible story to tell: somebody was using Jupyter notebooks on their local machine, and they moved to OpenShift. They started working on it, they had done a lot of work, and their pod scaled down. Okay. Someone hadn't told them about persistent volumes, so you can imagine how that was a very painful but good lesson. That was sort of a journey in itself, getting to understand that context.
Also, one size does not fit all. We found that data scientists are very special, and one size does not fit all. We tried that anyway, and we found out, if you're thinking about the MVP model, some of them just wanted shoes: they didn't want a skateboard, they didn't want a bicycle or a car, and we were trying to get them all to work on the bicycle.
So again, we just focus on basic fundamentals, like webhook integrations and using integrations with Jenkins and OpenShift.
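To sketch what those webhook fundamentals look like in practice, here is a hypothetical push-event check. The `X-Webhook-Token` header name, the secret, and the branch filter are all made-up illustrations, not the actual GitLab or Jenkins API.

```python
# Minimal sketch of the webhook idea: a Git push event arrives with a
# shared-secret header, and if it checks out we would kick off a build.
# Header name, secret, and payload shape are hypothetical.
import hmac
import json

EXPECTED_TOKEN = "s3cret"  # hypothetical shared secret

def should_trigger_build(headers: dict, body: bytes) -> bool:
    """Return True if the webhook is authentic and targets the main branch."""
    token = headers.get("X-Webhook-Token", "")
    # Constant-time comparison to avoid timing leaks.
    if not hmac.compare_digest(token, EXPECTED_TOKEN):
        return False
    payload = json.loads(body)
    return payload.get("ref") == "refs/heads/main"
```

In a real setup this check would sit behind the HTTP endpoint that GitLab or Jenkins posts to; the point is just that a push can mechanically trigger a build on OpenShift.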
So here's some of what I can legally share with you. This image here is actually some of the flow modeling that was done with machine learning: understanding what the well flows are going to be over the lifetime of the well, using TensorFlow, PyTorch, scikit-learn, all those libraries and dependencies, and building them into the Jupyter notebooks and other Python base images. Currently we're looking at actually using GPUs on OpenShift and seeing the benefits there.
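The actual flow models aren't something I can share, but as a toy illustration of predicting flow over the lifetime of a well, here is a simple exponential decline curve with entirely made-up numbers; the real work used TensorFlow, PyTorch, and scikit-learn rather than a closed-form curve.

```python
# Toy illustration of lifetime well-flow prediction (not ExxonMobil's
# actual models): exponential decline q(t) = q0 * exp(-d * t).
import math

def flow_rate(q0: float, d: float, t: float) -> float:
    """Predicted production rate at time t (years), from initial rate q0
    and a constant fractional decline d per year."""
    return q0 * math.exp(-d * t)

def cumulative(q0: float, d: float, t: float) -> float:
    """Closed-form integral of the rate from 0 to t."""
    return q0 * (1.0 - math.exp(-d * t)) / d

# Hypothetical well: 1000 units/day initial rate, 30%/year decline.
rate_after_5y = flow_rate(1000.0, 0.3, 5.0)
```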
So these are some of the efficiencies that we're finding here: we're not just optimizing slightly; these are millions of dollars, if not billions, that we're finding we can save, or at least avoid in cost, in certain areas. In natural language processing, we're taking a lot of technical texts and trying to create a repository library around that. And also, as was talked about earlier, sharing those machine learning models as APIs through things like Open Data Hub.
One of the lessons learned, also, was with GPUs, and this may or may not be applicable to you, but hopefully it is. We found that, because GPUs are billed at a premium in the cloud, you may want to do an analysis to decide if you want to just buy that new rack of GPU-laden servers every month instead of paying a cloud provider to run it for you. We had several budgets that were burned through by some of the data scientists running their models in the cloud. We also learned to train models locally.
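That cloud-versus-on-prem GPU analysis can be as simple as a break-even calculation. All of the rates below are hypothetical placeholders; substitute your own hardware quote and cloud pricing.

```python
# Back-of-the-envelope cloud-vs-on-prem GPU analysis like the one
# described. Every number here is hypothetical.

def breakeven_months(server_cost: float, hourly_cloud_rate: float,
                     utilization: float = 1.0) -> float:
    """Months of cloud GPU spend that would pay for buying the hardware."""
    monthly_cloud_cost = hourly_cloud_rate * 24 * 30 * utilization
    return server_cost / monthly_cloud_cost

# Example: a $90,000 GPU server vs. $12/hour of cloud GPU time, fully
# utilized: the hardware pays for itself in roughly 10.4 months.
months = breakeven_months(90_000, 12.0)
```

The utilization factor is what burned our budgets: models left training around the clock in the cloud push you toward the buy side of this equation much faster than occasional bursts do.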
So we broke it down into some really basic questions when we go into some of these discussions. Where is your data? Because the question being asked by the data scientists was: well, where do I put this? Do I put it in the cloud? Where do I put my data? Where do I run my application? So we broke it down into these three questions: where is your data, what is your data (and discuss data sovereignty), and where are your customers: internal, external, or a mixture?
Ultimately, this is far more important than what your current abilities are. So one of my favorite conversations that I had recently: we were in a room of very intelligent people, and I'm pretty sure I was the only one without a PhD in that room, and someone asked, "Why is it called a cloud?" I was like, okay. I looked around and was like, does anyone else want to field this question? I realized nobody in the room actually knew the answer.
I was like, we should just Google this, guys. But no, it actually turned into a really good discussion. I talked about doing network diagrams back in the 90s and how the cloud shape was abstracting those details; it was just abstraction and trust, and that's what cloud was all about. Also, it's really easy to draw. But it sparked a conversation that was very helpful, and we realized that we're never too smart to learn more.
So, talking specifically about some of the effort and some of the lessons learned that came out of the first part of this journey: I'm part of, again, the upstream data science enablement team. I was the platform architect, and I was moved over because of this unique set of skills, or experience, that I had, to try to add context. My team is specifically there to fill in these gaps in knowledge and also in culture.
These are big gaps; we have a lot of non-IT engineers at ExxonMobil. Gaps in machines create failure, and we want to avoid that in our culture and in these efforts with data science. So, consulting with the data scientists: we spend probably fifty to sixty percent of our time doing that right now. In doing that, we also get a lot of really good feedback, which turns into education. This turns into building "success skills," as we've termed it, which are really just developer practices.
A lot of data scientists at ExxonMobil did not come from a development background. They didn't grow up in that world, so they're not familiar with Git branching; some of them aren't even familiar with Git at all. It's a challenge; there's a wide spectrum of people we're working with. Another big one is collaboration and partnering. Listening to one of the internal talks at ExxonMobil, we were looking at the number of patents that were released over the last decade.
It was very low, and we realized that collaboration is something we don't do well, so we're focused on doing that. Our team's purpose is not to be a linchpin in collaboration, but to be an enabler: to get out of the way of it and help it happen organically. We're also focusing on self-service; that was a big one.
A
We
want
enough
of
a
paved
path
for
our
data
scientists
to
be
able
to
use
these
tools,
and
that
was
why
Jupiter
notebooks
came
in
for
collaboration
with
other
people.
That's
also
that's
also
one
purpose
that,
as
our
team
someone,
someone
asked
us
the
other
day.
So
what
do
you
guys
really
do
and
we're
like?
Well,
we
force
awkward
conversations
and
that's
literally,
what
we
do
is
we
come
in
and
people
are
like
well,
can
you
help
us
and
I'm
like
sure,
but
can
we
do
a
peer
review
of
your
code
and
they're
like
well?
So some of the things that we've discussed in retrospectives is what actually builds a successful enablement team. One big thing is we leave our egos at the door: if we don't know something, we tell someone that, and we go figure it out, or we figure it out with them. We're also full-stack developers, so we work on making others successful. As a team, we demonstrate what's called healthy disagreement.
We all have very strong opinions at times, but we know how to disagree appropriately, and we demonstrate that to data scientists who inherently don't collaborate; they're scared to, or it's just not natural for them. So we try to be a good example in that area. Also, to give you an idea, there are four people on this team, and we're supporting about 90 data scientists. In no way is that the perfect ratio.
Don't take that back home with you. We definitely have a lot of work that we do, but we've seen a lot of success even with a small number of people. And here, this is basically the legally released picture of some of our data scientists: this is them collaborating around a Jupyter notebook.
One of the things that is the best to see is when they really understand these things, and when we can answer those really simple questions, like why it's called a cloud, and talk about the fundamentals and context around OpenShift. It has been a huge enabler for our data scientists, and I'm glad to be sharing that with you. If you have any questions, I hope we'll talk later. Thank you.