From YouTube: OpenShift Commons ML SIG April 5 Full Mtg Recording
Description
Diane Mueller - co-chair
Introductions
AIOps SIG announced
Drew Minter (UBIXLabs) – UBIX Platform Intro
A
Four minutes after the hour, I think that's polite enough time. Welcome, Jonathan and Tushar, and everybody else, and please feel free to jump in. I wanted to do a quick announcement about a new SIG that's just been created here, AIOps. We had our first meeting about a week ago. All the videos and everything are here on the Commons OpenShift SIGs HTML page, which I'll pop over to here, and there's a mailing list, a Google Group, for it as well that we've just started up on the topic of AIOps, if you have a chance or an interest in applying your tooling to operations. This is sort of the beginning of a conversation that we've been having internally at Red Hat with a number of partners, so we're socializing that and creating a group around it. You can join that, or you can join directly through the AIOps group itself. So you can see there's not a lot of topics there, but we are just getting started, so join us if you're interested in that space. Today, let's see who we have on the call. What I usually try to do is make everybody introduce themselves, especially the ones that I don't know, so if you want, let's go through the list here. Bob, if you want to just introduce yourself, that would be great. I'll unmute you now.
B
A
C
Sure, everyone. Drew Minter, CTO and chief data scientist at UBIX Labs; I've been there for five years. My history with OpenShift and Red Hat really only started a couple months ago, but you know, we've been very open source friendly, stem to stern, and I'll be doing the demo to introduce you to our platform, which will be at Red Hat Summit in more detail.
A
D
F
My name is Stefan; I'm based in Germany. I have a background in AIOps from the monitoring space. I'm a long-term Hewlett Packard Enterprise, now Micro Focus, employee, responsible for monitoring products and also machine learning areas and AIOps, but I'm on the way to transitioning to a solution architect role in Germany.
G
A
Otherwise, what I'd like to do today is, and I don't see our second speaker, Jeff Bean, having joined us yet, so I'm going to ask Drew Minter to share his screen. I'm going to stop sharing my screen and ask Drew to share his and walk us through one of the topics for today around the UBIX platform, tell us what that is and what it can do. That would be great to hear.
E
A
C
All right. So, you know, this looks like a pretty technical audience, and so in service of that I've cut out about two-thirds of the slides that we normally cover. But again, it's not always one person in AI who has the entire scope of where we're playing, so I still think it's instructive to get a little bit of background on where we're coming from.
G
C
As a company, just briefly before I go through our messaging: at first we were originally building something that would compete with Databricks back in 2013, but we decided to build something easier than Scala, our own DSL. That started in March of 2013, back in Spark 0.5 I believe it was. I came in right around when Spark 1.0 happened in May of 2014, and we came out of the Frost Data Capital infrastructure. We had a restart.
C
While there's a significant amount of promise, there's also a lot of challenge, as far as the landscape: the amount of data that's being generated, the amount of demand for applications and, just frankly, how hard it is to actually deliver not only a quality model but also the processes around it. You know, there's been a lot of Six Sigma in other places where things have become commoditized, very highly automated and repeatable processes.
C
And even though conceptually, you know, we've got CRISP-DM and SEMMA or what have you, operationally it's kind of all over the place. There's also a famous diagram that Google published in one of their documents showing the level of technical debt that is taken on that is really external to the modeling process itself.
C
We're really trying to help on all levels of the entire process, certainly with the emphasis on the citizen data scientist and the hardcore data scientists and professionals having sort of one place to be able to work together, but we see more there. So, you know, when we look at the data science process:
C
We've taken an idea from, you know, continuous improvement and continuous releases, and we want to make sure that we're facilitating a comprehensive view, so that regardless of the type of data or the type of analytics you're doing, and the type of automation, where we would be sort of the sense-making of the sensory part and there would be some other control loop integrated with that, we can essentially be plugged in at all levels.
C
And so when you look at it from a technology perspective, how we're looking here: again, we will be porting for Red Hat Summit, a phase where we're going to be actually deployed on OpenShift as a platform, as availability. I'll talk a lot more about that later, but we knew that we needed to have a reference platform for all of our own research on being able to have model refreshes online, etc.
C
So what I'm going to do is introduce you to the deployment infrastructure we have, give you an idea of some of our DNA if you will, and then we're going to spend the majority of the time going through a couple of workflows. One of them will be a traditional sort of IoT scenario of a pump failure and how to predict it and prevent it in the future, and then there's going to be another one.
F
C
So the short answer is that it's grayed out, in that, you know, JupyterLab has so much momentum behind it that I think it would be foolish for any software developer to try to make their own notebook infrastructure. You know, even Databricks, I'm sure, at some point will have to do something more with JupyterLab, is my prediction, but I do have a nice slide on that subject.
C
Okey doke, so what we're looking at here is the interface that I use to spin up the instances that we'll be looking at today in the demo. And so, because we are a dyed-in-the-wool open source stack (we had actually considered having part of our software be open source at one point), we have the capacity to not only spin up our own instances of these major subsystems, but also, as we're planning, for the first version in OpenShift.
C
The images we'll be using are our own ones, but we're certainly transitioning; we expect soon after Red Hat Summit to be able to start using, you know, the OpenShift images for Kafka, etc., and we already have a configuration available for them. But for development purposes, a lot of times having just one server is enough, if you're not doing heavy streaming work.
C
So with this information, we're expecting that we'll be able to give invitation codes to people at Red Hat Summit, and then they can come to the site and put in credentials. There'd also be another version for OpenShift that will give similar availability, so people can start playing with this on their own account.
C
But one thing I just want to quickly show is that, because we manage this and you can go across clouds, we can actually manage instances centrally, and so here you can see some of the management tools we have. Let me just go down real quick to show.
C
So, any questions? This is usually when I stop talking about infrastructure and dependencies, so if there are questions about deployment, hardware, software requirements or dependencies. Just as a minor roadmap item that's not in the presentation: after Red Hat Summit we're also looking at being able to, you know, we've been doing a lot more deep learning, and we found that for a lot of occasions
C
we need to be able to switch more dynamically from CPU workloads to, like, GPU support. So we're looking at more dynamic container management and also extending the data science language, which is a layer that unifies all the access to these services that we use internally, to be able to have that support deep learning on GPU as well.
A
No, not seeing anybody questioning you yet, so we'll hold that till the end. Okay.
C
So this is our infrastructure, as far as the Solution Space application that we built on our DeepSpace, or our instance management if you will, that coordinates and deploys all those different pieces. One thing we also have, a benefit of the layer that we built, is that we can essentially back up and restore any of the models and any of our scripting and operators; those can be deployed to other versions and other instances of UBIX, so that you can do federated learning pretty easily.
C
But, you know, we're going to go kind of end to end in our demonstration today, looking at the process here. So what I'm going to do is just take our basic CSV load, and we're going to take data that's from a pump. The one that we worked with was the size of a house, but we've simplified the number of sensors from it for demo purposes.
C
So I'm going to name this training data, and this is normally all we need to do for citizen data scientists to be able to get the data in, but we also have much more extended capabilities because we have native Spark integration. So, for example, you can choose how to store the data frame, or the RDD, depending on how you look at it.
C
However you want to use the different storage options that are built into Spark as well; but in this case we'll just keep it simple for demo purposes, and so I'm going to start this one. What I'm also going to do is start a load for another set of data that we will have for later on in the demo here. This is essentially sort of fast-forwarding to see some of the model performance: what do you do when you're in production?
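For anyone who wants to map those two load steps onto plain open-source Spark, a minimal PySpark sketch follows. The UBIX DSL itself isn't shown in the talk, so the file names, the registered view names, and the choice to cache are assumptions for illustration only.

```python
# Minimal PySpark sketch of the two CSV load steps described above.
# File names, view names, and the caching choice are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pump-demo-load").getOrCreate()

# Load the pump sensor CSV with a header row and inferred schema.
training = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("pump_training.csv")
)

# Register it under a name so later steps (and SQL clients) can reference it,
# and cache it since the demo reuses it repeatedly.
training.createOrReplaceTempView("training_data")
training.cache()

# A second data set, held back to simulate "production" batches later on.
scoring_batch = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("pump_scoring_batch.csv")
)
scoring_batch.createOrReplaceTempView("scoring_batch")
```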
C
Running a little bit hot on my computer there, okay. Normally what you'll see is the screen refresh right away; there we go, okay. So you can see that there are several different steps that we run beyond just the basic load of the data, and we do this for two reasons. One reason is that, you know, we found from researching a thousand different workflows that data scientists generally have a lot of sort of basic questions
C
they have for data that they want to have answered, and so we try to stay sort of one step ahead of the data scientist in our recommendations. Plus, we also use that information for feeding our learning engine, by taking basic profile information and deriving a feature space for meta-learning, or what we call our meta space. Sorry, just give me a moment.
C
And so I'll show you, as soon as this comes back, that there is a feature similarity that we do, and we use an OpenML data set, which is a meta-learning research site, so that if people are interested in some of the profiling of their data, they have a reference they can look at externally.
C
You know, I assumed a technical audience, so I'm always prepared for more things than I get to sometimes, but anyway. Just to go back to what I was talking about before here, you can see, for example, when I talk about our DSL language, this is essentially loading from HDFS into Spark memory, into the structure internally that we use for our task SDK.
C
So, for example, people can learn the DSL and write their own tasks, and we can learn some from them, as long as they have the right annotations of what data they produce for the actions that we want. And then, even though this is the DSL dialect that's being used in this phase, we also have JavaScript support; that's how we accomplish our Airflow integration and the MLflow integration that we're working on right now. But just to go back to what I was talking about with the feature similarity here.
C
So this is an example: essentially, for all of the sensors that came in, these are datasets that are from the OpenML research site, these are where we saw similar features, and then this is sort of the cluster they belong to and the likelihood rank on the scale there. They tend to be part of that question that helps us narrow the search space when we're making recommendations for the data science workflow.
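As a rough illustration of that meta-space idea (not UBIX's actual implementation), a dataset can be profiled into a handful of generic meta-features and ranked against reference datasets pulled from OpenML. The particular meta-features, reference dataset names, and file name below are assumptions.

```python
# Sketch of "meta space" similarity: profile a dataset into a small meta-feature
# vector and rank reference OpenML datasets by cosine similarity.
# The meta-features chosen here are generic; UBIX's actual profile is unknown.
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.metrics.pairwise import cosine_similarity

def meta_features(df: pd.DataFrame) -> np.ndarray:
    numeric = df.select_dtypes(include=np.number)
    return np.array([
        np.log1p(len(df)),                        # size
        df.shape[1],                              # dimensionality
        numeric.shape[1] / max(df.shape[1], 1),   # fraction of numeric columns
        numeric.skew().abs().mean(),              # average skewness
        numeric.kurt().mean(),                    # average kurtosis
        df.isna().mean().mean(),                  # overall missingness
    ])

# Reference datasets pulled from OpenML (names are examples only).
references = {}
for name in ["diabetes", "kin8nm"]:
    data = fetch_openml(name=name, version=1, as_frame=True)
    references[name] = meta_features(data.frame)

# Profile the local pump training data and rank the references.
pump = pd.read_csv("pump_training.csv")
target = meta_features(pump).reshape(1, -1)
ranked = sorted(
    references.items(),
    key=lambda kv: -cosine_similarity(target, kv[1].reshape(1, -1))[0, 0],
)
for name, _ in ranked:
    print("similar reference dataset:", name)
```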
C
Taking that, I'm copying the name of the table that we just created and I'm plugging it into this ugly little URL that has the JWT token for authentication and then just the name of the table. Because, you know, with self-service analytics on one hand, and data science, with very handcrafted model fact tables or facets, on the other,
C
however you want to look at your inputs to the models, there's kind of been a breakdown of the corpus callosum of the data brain, corporate-wise. And so our philosophy essentially is that if you have the right authorization, you should be able to have access to any work that anybody is doing as soon as it's finished, without the publishing having to come, as before, behind an isolation and governance scenario on the instances. So this is an example of the data.
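The "table name plus JWT" URL pattern described above might be consumed roughly like the sketch below; the host, path, query parameter, and response format are hypothetical, and only the general shape (a named table fetched over HTTP with a bearer token) comes from the talk.

```python
# Sketch of pulling a published table over HTTP with a JWT, as described above.
# Host, path, parameter names, and response shape are assumptions; only the
# general pattern (table name + bearer token) comes from the talk.
import io

import pandas as pd
import requests

UBIX_HOST = "https://example-ubix-instance.example.com"   # hypothetical host
JWT_TOKEN = "<jwt issued for this instance>"

resp = requests.get(
    f"{UBIX_HOST}/api/tables/training_data",               # hypothetical path
    headers={"Authorization": f"Bearer {JWT_TOKEN}"},
    params={"format": "csv"},
    timeout=30,
)
resp.raise_for_status()

# Hand the result straight to pandas, a BI tool, or anything else that can
# consume CSV over HTTP.
table = pd.read_csv(io.StringIO(resp.text))
print(table.head())
```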
C
or any of your BI clients; this is just one example of our integration here. Another example, and this is coming soon, it will be live on stage for the fifth, is in a slightly different, newer version of the software: this is the classic iris dataset, look here.
C
Okay, so here's our inline analytics. One recent feature that we've added is that you can look at the view of the process itself and take anything from any stage where there's been a new sort of standing query or options generated, and have it brought directly into an inline tool. We're still rebranding, essentially; for those of you in visual analytics, this is Vega Voyager, now embedded within UBIX.
C
But, you know, this is something that we'll be able to have as a longer-term way so you're not having to copy and paste the URL that we saw earlier there. Anyway, we're still just kind of in the data world; we haven't gotten to the fun stuff of the AI yet. So, just as a quick look at the data from the training set there.
C
If we look at the data that's in there, we can see, just as a rough cut, that the pump seal pressure is the most anomalous feature in the data set. And so what I can do is select that pump seal pressure, and there's a query to our learning engine that uses the profile of the data and the meta features, the meta-space features that we've essentially generated, and it makes a recommendation of some tasks that we can do next. And so this particular task is an annotation.
C
We make a distinction between whether it's the user domain that's driving the feature engineering or whether it's an analytic domain; that's separate from the taxonomy, another type of sort of semantics that you're looking at here. So this is one of our basic ones, because, since we know there's a date/time and there's an anomaly,
G
C
we now have the data shaped in a way that we can usually make a predictive model that we could use to prevent this type of failure in the first place. And so what I can do is select that anomaly that was just created, I get a recommendation for a classification, and so we can use that as the target. Then we can go back here and select the features that we want to use.
C
Model 1.0, it will be perfect, and I will go ahead and give it a name. And so from a UBIX perspective, you know, this is where we are: we're building with MLflow integration, and I'm actually a little bit intrigued with it, especially the ability to have bake-offs outside of our cluster. But because we are native, we do have native Spark ML at scale-out and native H2O Sparkling Water at scale built in.
C
So, you know, it's not going to require a lot of different options to have some variety here. And so what we come up with is essentially two different models that came out of the bake-off, and there's a significantly better scenario with the Spark candidate; you'll get to see more of that later. But the nice thing is that essentially we have everything encapsulated, and with the same type of idea that we showed of automatic availability within the rest of the organization, we don't really look at deployment as being anything but more of a governance issue.
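To make the bake-off idea concrete, here is a small sketch that trains two candidate classifiers on the same split and keeps the better AUC, using Spark ML only. The sensor column names and the specific pair of algorithms are assumptions; the talk only says that several native engines (Spark ML, H2O Sparkling Water) are compared.

```python
# Sketch of a two-model "bake-off" on the pump data using Spark ML only.
# Column names and the particular algorithms are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression, GBTClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("pump-bakeoff").getOrCreate()
raw = (spark.read.option("header", True).option("inferSchema", True)
       .csv("pump_training.csv"))

sensor_cols = ["seal_pressure", "flow_rate", "vibration"]        # hypothetical names
data = (VectorAssembler(inputCols=sensor_cols, outputCol="features")
        .transform(raw)
        .withColumn("label", col("anomaly").cast("double")))     # anomaly flag as target

train, test = data.randomSplit([0.8, 0.2], seed=42)
evaluator = BinaryClassificationEvaluator(metricName="areaUnderROC")

candidates = {
    "logistic_regression": LogisticRegression(maxIter=50),
    "gradient_boosted_trees": GBTClassifier(maxIter=50),
}
scores = {name: evaluator.evaluate(est.fit(train).transform(test))
          for name, est in candidates.items()}
print(scores, "-> winner:", max(scores, key=scores.get))
```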
C
Mapping the name of the bake-off to this endpoint: this is written in the DSL that we have here, so we can also configure it as needed. What's coming in here, in the features parameters, is essentially a JSON representation of one sensor reading; these are the same features that we were looking at before, just with slightly different values. And so with the right authentication, based on the JWT header, and also having the right email that matches it being sent in,
C
we essentially have this model available for anybody who wants to automatically look at the data and start using it someplace later on. And so what we get back from this round trip here is metadata that shows the data types of what we used in the model, and also feedback on whatever columns you pass through it, and then we also get a yes/no answer.
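A minimal sketch of that scoring round trip is below. The endpoint URL, header names, and JSON field names are assumptions; the transcript only establishes a JSON sensor reading going in, with a JWT and a matching email, and metadata plus a yes/no answer coming back.

```python
# Sketch of calling the published bake-off model endpoint described above.
# URL, headers, and JSON schema are assumptions.
import requests

ENDPOINT = "https://example-ubix-instance.example.com/api/models/pump-bakeoff/score"
JWT_TOKEN = "<jwt issued for this instance>"

reading = {
    "seal_pressure": 87.2,      # hypothetical sensor values
    "flow_rate": 13.4,
    "vibration": 0.021,
}

resp = requests.post(
    ENDPOINT,
    json={"features": reading, "email": "operator@example.com"},
    headers={"Authorization": f"Bearer {JWT_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
result = resp.json()
print(result.get("metadata"))       # column names/types the model expects
print(result.get("prediction"))     # the yes/no anomaly answer
```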
C
We try to organize our task development so that they publish operator functions, or analytic assets, that can be reused, so that we can have tasks generating tasks, essentially just compiling those into, you know, a deterministic workflow. But in simpler terms, at this point I want to just fast forward the timeline to a week later, where we use the data set that we built earlier.
C
I can add it to that sentry task, and this gives me all the configuration to evaluate how things have been going since the model's been in production. This sample is just a simple one for CSVs, but we do have support in our DSL for Kafka and Spark streaming constructs, so you can do this as a continuous batch; it's just easier to follow this way. And so, if we look here, you can see the physical model that's associated with the bake-off model.
C
That's the best one at this point, and you can see the drop in the AUC score. So it's not doing as well as we were thinking previously, and usually the most common problem is that there's been drift in the data. So we also provide you with information about the distribution of the training set features versus the batch itself, and you can see that they're close enough that there really hasn't been any change
C
that would be a representative difference. So this is concept drift, as a polite way of saying it: the model itself has encountered something new that you haven't thought about before and now have to think about. Normally that's kind of a new exercise, but because of the way we've architected the dependencies of our analytic assets and metadata, we have subscriptions that you have.
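The monitoring step just described, recomputing AUC on a new batch and comparing training versus batch feature distributions, can be sketched with scikit-learn and scipy. The per-feature two-sample KS test below is a stand-in for whatever statistic the platform actually uses, and the file and column names are assumptions.

```python
# Sketch of the "sentry" check described above: recompute AUC on a production
# batch and compare each feature's distribution against the training set with
# a two-sample KS test (a stand-in; the talk does not say which statistic
# UBIX uses).
import pandas as pd
from scipy.stats import ks_2samp
from sklearn.metrics import roc_auc_score

train = pd.read_csv("pump_training.csv")          # hypothetical file names
batch = pd.read_csv("pump_scoring_batch.csv")     # new data with ground truth

feature_cols = ["seal_pressure", "flow_rate", "vibration"]

# 1) Model quality on the new batch (assumes batch["score"] holds predictions).
print("batch AUC:", roc_auc_score(batch["anomaly"], batch["score"]))

# 2) Data drift: a small p-value flags a feature whose distribution has moved.
for name in feature_cols:
    stat, p = ks_2samp(train[name].dropna(), batch[name].dropna())
    flag = "DRIFT?" if p < 0.01 else "ok"
    print(f"{name}: KS={stat:.3f} p={p:.4f} {flag}")

# If no feature drifts but AUC still drops, suspect concept drift: the
# relationship between the features and the label has changed.
```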
F
To recap a bit: in the beginning you mentioned that the mode is aimed at the citizen data scientist. Do you really target these more high-level users? Because I see a lot of details, which is great by the way, but is the citizen data scientist really the target user for your tool?
C
Well, that's an excellent question. You know, we started kind of in the middle of what we consider our process from a technology standpoint, because there's really a citizen data scientist as someone who hasn't embraced all the true complexity of data science. And we know that in some ways this view of building tasks may be very intuitive to people who are data professionals, you know, people who do ETL, people who do data development, BI developers.
C
What have you. There is a simpler sort of dashboarding approach of adding a layer of insights on top of the kind of raw annotations here. So you can think of this as the workbench that we built first for our own research, for our own people, but we also wanted to have the breakdown of the tasks in a way that could be
G
C
I think that's your earlier question about how we want to make it easier for data scientists to be available in this ecosystem. So I think you're spot-on in saying that this seems a little bit more technical than the citizen data scientist might be, but probably too simple for most data scientists; where we're serving this is sort of the core of the infrastructure, almost the audit trail of what we see from the other interfaces.
C
A
Please go ahead, because it's really interesting to us. I think this is all new to me because, as I said, I only heard about UBIX, U-B-I-X.
C
Ubiquitous analytics was the origin of the name, and when we were thinking of trying to compete directly with Databricks, we were calling it Ubiquity. But the interesting note, just as a technical piece, is that I think there's somebody I saw who was at Databricks once upon a time. I'm just going to go in, just as a legacy instance code here, if you want to kind of go a little bit deeper.
C
I'll just show you here, and I just restarted this instance before the demo ran, I think it's usually about 60 seconds. So this, essentially, if you want to see an idea of all of what's running: the whole demo itself took 606 Spark jobs that were run in the background here, and you know, considering that usually they measure things in single jobs,
C
it's because it's one big batch that people are bringing to Databricks; they're very excited. But the challenge we have is that we built this data science layer to unify a lot of non-Spark technologies in the same, you know, AST that we create, and so we have to kind of break up our DSL into a more modular language to make a Spark-only version that will run in their library system.
C
Okay, so this is where the OpenShift work starts. Essentially I'm giving a preview to people on the call; I just didn't get a chance to get this completely tested to make sure that everything ran properly before the meeting, but I'm close to essentially having the solution.
C
So I'll talk a little bit more: essentially, if you look at the second half of 2019, we're rebranding the management workbench, DeepSpace, and what we were looking at here, the Solution Space. Inside Solution Space is essentially a little bit of a taste of that: our visualization, also our chat, being able to run tasks from the chat interface, and also just a lot of plugins using Plotly, and ways to have analytics that are driven by that, like for semi-supervised learning or labeling, what have you, in the domain space.
C
One thing on the subject of JupyterLab is that, you know, we are very passionate, I mean, about Jupyter. We really felt like we had to wait for JupyterLab to come along to be able to have a lightweight way to engage data scientists, because, from our perspective, ultimately we have a project we're working on with our deep AI team of being able to translate Python from JupyterLab notebooks into UBIX tasks, or at least recommend replacements that are already scaled out.
C
They don't have to be rewritten for production; we already have in the engine the ability to take Python and R code, and even our workspaces, without recoding, and integrate them as well. But we know there are a lot of use cases where people want to go directly from a Python tool and put it into a Solution Space here. So we call that that integration, where, you know, we have sort of a Jupyter version of Solution Space.
C
So, just in conclusion, talking about where we're trying to go: we've taken on a very, you know, big tent, if you will, and I think that's why it's taken us so long to get to market, but we're really excited about what we've done and we're really excited about joining the OpenShift community and embracing it. And so that's the core of it. So, you know, again.
C
D
C
That's an excellent question. I'll just briefly show you some of the tasks available. So as an example, what we used was, well, this one has a little bit more robust instrumentation than some of the other tasks we currently have, but essentially we have a lot of different ways to get at static data that's directly available. For example, we have wrappers that take advantage of the AWS parallel APIs for, you know, loading files in certain formats.
C
We also have JDBC that we wrap around for Spark SQL, but we have also made a deep investment in streaming, in kind of two ways, in that all of our logical operations for our pipe functions, if you will, in our DSL can take static or streaming data as a, you know, as a producer of data. So we can not only listen to a Kafka agent; you can basically set up Kafka topics and read from Kafka topics directly in our DSL, but you can also.
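As a rough open-source equivalent of that static-or-streaming pipe idea, reading sensor events off a Kafka topic with Spark Structured Streaming looks like the sketch below; the broker address, topic name, and JSON schema are assumptions, and this is not the UBIX DSL.

```python
# Sketch of reading sensor events from a Kafka topic with Spark Structured
# Streaming. Broker address, topic name, and the JSON schema are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, DoubleType, TimestampType

spark = SparkSession.builder.appName("pump-stream").getOrCreate()

schema = StructType([
    StructField("timestamp", TimestampType()),
    StructField("seal_pressure", DoubleType()),
    StructField("flow_rate", DoubleType()),
    StructField("vibration", DoubleType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")   # hypothetical broker
    .option("subscribe", "pump-sensors")               # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("reading"))
    .select("reading.*")
)

# The same transformations written for a static DataFrame can now run
# continuously; here micro-batches are simply printed to the console.
query = events.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```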
D
The reason why I'm asking is, as you move more to OpenShift, there are different services that you could be layering on top of, or consuming, that give you all that data acquisition and aggregation from different sources with different connectors, etc., in general, that you could be consuming out of the box to facilitate getting data from different points. So I would encourage you to look at that and see if you can get some more value added from the partnership, sorry, from the OpenShift services.
C
Yeah, it's definitely on the roadmap; you know, sort of the price for admission is to make sure our instances work first, but I definitely welcome that. I mean, we've been working with them as a partner, and, I don't know if you've seen, Domo has a lot of sort of this pretty connector view for when you want to actually get at a bunch of services. That looks a lot like what I saw in OpenShift, and I was like, why don't
C
No, I appreciate it, because, like I said, we've certainly tried to concentrate on having a very broad footprint for the product, you know, across sort of the data science discipline, and at least having good hooks. But to your point, now that we're building partnerships for cloud integration, trying to take advantage of those things that are uniquely OpenShift is definitely very high on our list. So, okay, thank you.
A
You know, I'm not seeing any more questions yet. I do want to save a couple of minutes at the end, so if you want to do another little demo for about five minutes, that would be great, and then I just want to talk about future events, and people can be thinking about future topics. That would be great.
C
So what we're looking at here, this has about 60 or 70 different features itself. It doesn't take very long to look at, but the goal here is, you know, we were looking at a binary classification problem and we had a user-domain feature that we engineered with the annotation. This demo, just to give an idea, is what I call our force multiplier demo, which is essentially trying to show some of the ways that we've taken
C
research from sort of the current Python ecosystem, and also our own learning engine research, to be able to help data scientists get through problems a lot faster. It's a similar scenario to the one we talked about before; I don't need to go through all the demo pieces for the background, it's the same task, but I'm going to scroll down here to, yeah, sales price. Okay.
C
So what I'm doing here is choosing: we have a main target of sales price, and I'm choosing some of the features that I want to use as ways to look at distinct values, essentially to do subpopulations, if you will, for calculation. So I'm going to have my list of a few features that I'm going to add to the feature synthesis as slicers, if you will.
C
Then I'll pick the ID column that we can use, and then what I'm going to do is look at, for the sales price, a series of different partitions of the sort of distributions of subpopulations based on those. And, you know, again, as long as the grain is the same, it's one path of execution there. Oops, sorry.
C
Well, you know, we're pretty close to our funding, so I'll be able to get some more testers pretty soon here, but anyway. So, behind the scenes, what we're essentially going to be doing here is to build.
C
Yeah, well, I'll probably do it later today, because Doug owns the showcasing, sort of branding stuff that we do here. But I'll just show here real quick; these are just times from another, slightly different version here. So essentially, in selecting that, we have a deep feature synthesis, and then what we do is set that as the main aggregation target and, putting in the ID, we set up the different aggregations, and then what we get out is essentially, what it does is it looks
C
at the ones it uses as main groupings, and it looks for other places where it can find other aggregations of those that make sense, so that you have a very deep decomposition. So essentially in about two minutes you get about 500 features that would, by themselves, be about 70 different SQL queries to generate. And then the next step is that we essentially wrap around the garden-variety scikit random forest our own learning algorithm that we use for feature selection.
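Deep feature synthesis, the idea popularized by the featuretools library, boils down to stacking aggregations over chosen ID and slicer columns. A minimal pandas sketch of the idea follows, using the sales-price example; the file and column names are assumptions.

```python
# Minimal pandas sketch of what deep feature synthesis does: pick an ID and a
# few "slicer" columns, then stack aggregations over each grouping so that a
# handful of base columns fans out into many derived features.
# Column names are assumptions drawn from the sales-price example in the talk.
import pandas as pd

sales = pd.read_csv("house_sales.csv")           # hypothetical file
slicers = ["neighborhood", "house_style", "year_built"]
numeric_cols = ["sale_price", "lot_area", "living_area"]
aggs = ["mean", "min", "max", "std", "count"]

features = sales[["id"]].copy()
for slicer in slicers:
    grouped = sales.groupby(slicer)[numeric_cols].agg(aggs)
    # Flatten the (column, agg) MultiIndex into names like sale_price_mean_by_neighborhood.
    grouped.columns = [f"{c}_{a}_by_{slicer}" for c, a in grouped.columns]
    derived = (
        sales[["id", slicer]]
        .merge(grouped, left_on=slicer, right_index=True)
        .set_index("id")
        .drop(columns=[slicer])
    )
    features = features.join(derived, on="id")

print(features.shape)   # a few base columns fan out into many derived features
```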
C
So essentially, you know, we're taking the same metric that we used to build all of the features and we're using that as the target to predict. And essentially what we get out of it is that we do two rounds: one where we do random sets of a hundred different features from the five or six hundred that are in there, and then we apply our learning algorithm; then we make other rounds. So essentially all the purples are rounds that were done using our algorithm, seeded by the orange one.
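A rough sketch of that selection loop, with a plain scikit-learn random forest standing in for UBIX's own learning algorithm (which is not public), might look like this; the subset sizes and round counts are simplified assumptions.

```python
# Rough sketch of the feature-selection rounds described above: score random
# subsets of ~100 features with a plain scikit-learn random forest, keep the
# most important ones, then run a final round seeded by those survivors.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

X = pd.read_csv("synthesized_features.csv").fillna(0)   # hypothetical DFS output
y = X.pop("sale_price")

rng = np.random.default_rng(0)
survivors = set()

# Round 1: several random subsets of up to 100 features each.
for _ in range(5):
    subset = list(rng.choice(X.columns, size=min(100, X.shape[1]), replace=False))
    forest = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=0)
    forest.fit(X[subset], y)
    ranked = sorted(zip(subset, forest.feature_importances_), key=lambda t: -t[1])
    survivors.update(name for name, _ in ranked[:20])

# Round 2: a final pass seeded by the survivors of the random rounds.
final = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=0)
final.fit(X[list(survivors)], y)
selected = sorted(zip(survivors, final.feature_importances_), key=lambda t: -t[1])[:25]
print([name for name, _ in selected])
```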
C
G
A
Awesome. How about you put up a final slide about how to get more information and how to contact you? Okay, then we'll kind of move into, I'll steal back the screen for a few minutes after you do that. I'm seeing they'll get there, so anyways, we're at the top of the hour, folks, and I just wanted to mention a couple of things. We're not going to meet next month, because it's right in the middle of Red Hat Summit, and you know how crazy that is, and we're.
A
So if there are any other adjacent events that you're all planning on attending, please let me know about them, so that maybe I can plan the gathering to be next to that, and you only have to travel once if you're coming from outside of the Bay Area. With that, Drew, the question I have for you is: where is UBIX based? Are you in the Bay Area? Where are you?
C
We've always been a geographically diverse company. I'm based in the Dallas Metroplex, our CEO lives in the Madison, Wisconsin metro, and our majority shareholders, and the Frost Data Capital ecosystem we came from, were in San Juan Capistrano. At one point we had seven FTEs there, and we've always had contractors in the Philippines or Ukraine or Romania. So, you know, we've been following the sun, yes.
A
We all are these days, so that's great. Just keep it on your radar and watch for announcements on the mailing list; I'll add you guys to the mailing list too. Thank you, Drew, really, for taking the time to do this today; I know it was short notice. I just wanted to let people know that Jeff Bean did ping me, and he's with Ververica; they're in the middle of things since they, formerly data Artisans, were acquired, and so that's.
A
Those are Mondays, and we're going to skip May, so our next meeting will be June 7th. And please, Stefan and people who are new to this group, David, if there are topics you want to talk about or projects you're working on that you want to showcase, let me know, and I will happily add you to the agenda. But with that, thank you Drew and everybody.