Description
Join data scientists Sophie Watson and Chris Chase for a hands-on Office Hour about data science. Be ready with your questions and learn a few things along the way.
In this episode, we talk all things data science: how data is processed, how models are trained and tested, and what data scientists eat for lunch.
Twitch: https://red.ht/twitch
A: There you go, Sophie. We are talking about day-to-day data science, which is almost a tongue twister. I like how you did that there. So, how are you, Sophie?
C: Very jealous here; it's gonna hit 100 here as well. Sorry, y'all.
C: Yeah, so we wanted to talk. I know we've had a few sessions where we chatted about data science and all things data science, and we focused on data engineering last month, with Carl and Guillermo. So obviously, if you missed that, go check it out. That was a pretty good discussion, but I feel like people don't know what data scientists actually do. So, Chris, what do you do?
A: ... roles like getting them to learn how to use containers to run their models, and run everything kind of at their own pace. So I know that you look at a ton of data. You create algorithms, or use algorithms, to parse it and, you know, find things within the data that can help organizations down the road, and maybe detect future things.
A
It
is
pretty
opaque
to
me.
I
will
say,
because
a
lot
of
math
has
involved
and
I
have
like,
if
there's
a
phobia
of
math,
I
have
it.
It
took
me
10
years
to
take
an
algebra
class,
so
yeah,
it's
pretty
bad
anyways.
How.
A
I
can
count
like
if
we
get
past,
like
the
basics
of
like
the
the
little
calendar
in
your
os
right,
like
cosigns
no
lost
on
me,
algorithms,
not
lost
on
me.
I
kind
of
get
like
sorting,
algorithms
and
stuff
like
that.
I
get
that,
but
the
whole
the
whole
thing
with
like
sand
doing
math
like
so
like
that
whole
thing,
no,
I
haven't.
No,
I
don't
have
any
desire
to
learn
that
level
of
math
detail.
Yeah.
B: The one thing that I think most people don't realize is how much time we spend with our customers, trying to understand what the problem is that they're trying to solve. That leads us to ask more questions of the data. I always look at the data and tell the customers: the data is going to tell a story, but I need to ask you questions. You know, how do you use the data?
B: They may look at all of the data and what the user is trying to accomplish, and they may say: well, you know, you can solve your questions here just by using Tableau.
C: Correct, yeah, right. And there's a lot of data scientists for whom the role isn't to ever train a model and make predictions. It is simply to produce reports and process that information, which you then pass back to stakeholders so that they can make some decisions.
C: I think it's trendy to call everybody a data scientist. So yes, a lot of people that traditionally we might have called business analysts five or ten years ago can now feasibly be called data scientists.
C: It really varies from company to company. When Audrey and I talk to our customers, you get such a range of what it means. Our first question... you know, some account team might call and say: hey, can you talk to the data scientist? So we'll sit down with the data scientists, and the first thing to ask is: so, what do you actually do?
C: It's trying to figure out what they do, and then what their pain points are, and then whether we can swoop in with containers, like he said, and ease some of the friction in their day-to-day going forward.
B: I was just gonna say: sometimes they know that they have a problem, and they know that they can find it in the data. But when you start looking at the data, you can go: did you know that you could do this with your data? So, for example, I worked with a mining company, and they were looking at their data for truck optimization, and they were saying, we have to spend 13 million dollars and buy these new graders to grade the roads. And I was looking at the data going, well...
B: Did you look at your data to see how much time your current machines are being used? And they're like: no, we just know that they're all over the place, but we don't know if things are getting done. So asking the questions and probing is very important, because sometimes you'll find these little hidden gems that your customer didn't know about, right?
A: Yeah, like, there's so much potential, because you come in with a fresh set of eyes, right? It's almost like a consultancy within a larger organization, right? Like: yeah, I know you think you need to grade the roads, but do you realize that road isn't used often? Or, you know, something like that, right? Like, yeah.
C: Okay, thanks for terrifying us. So yeah, Audrey, and yourself, Chris, also talked about how math, "maths" as Brits call it, comes into this. And I feel like there's two classes of people: one that thinks data scientists just train models, and the other that thinks data scientists sit down and do, like, absolute extreme, next-level mathematics: pen and paper, Greek letters, all day, every day.
C: Yeah, so I think, when you choose a particular model or algorithm, or you process your data in a particular way, it's important to understand the underlying mathematical and statistical assumptions that you're making about your data, or about the relationships between variables in your data.
B: Yeah, I would have to say that was probably two years ago for me. I do have some colleagues, ex-colleagues from the company that I worked for before, and yeah, they did sit down and do that. But I would say that, generally, that's very rare.
B: I would say, when you look at a data science problem, 90% of the time is looking at the data and trying to understand it very well. So that means you could be doing a number of small analyses, where you're looking at some charts of some of the data, or some graphs, and then going back and asking the customer questions.
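[Editor's note: to make that "looking at the data" step concrete, here is a minimal first-pass sketch in Python with pandas; the file name and columns are hypothetical, invented purely for illustration.]

```python
# Sketch of a first look at a data set; "telemetry.csv" and its
# columns are hypothetical stand-ins, not from the episode.
import pandas as pd

df = pd.read_csv("telemetry.csv")
print(df.shape)                        # how much data is there?
print(df.dtypes)                       # what type is each column?
print(df.describe(include="all"))      # ranges, means, obvious oddities
print(df.isna().mean().sort_values(ascending=False))  # fraction missing per column

# A couple of quick charts to take back to the customer (needs matplotlib):
df["speed_kph"].hist(bins=50)
df.plot.scatter(x="engine_hours", y="fuel_used")
```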
B: I remember one of my mentors from five years ago saying that if you get a problem and some data, and within a week you go ahead and create a solution, you probably didn't spend enough time on the data. And I find that actually very true, once you start immersing yourself in the data, taking a look at the relationships between the different data, and thinking about what the customer wants.
A: I mean, I gotta ask: is it frustrating, to an extent, when people don't realize all the data they have, and they give you a problem, or give you data, and are just like, hey, find something? Do you feel like you're looking for a needle in a haystack, or hunting for ghosts, sometimes? Or is it...
A: At what point in time does an organization come to the data scientist and say: hey, we could really use your help right now? I mean, what do you do to make that connection between what your day-to-day work is and organizational goals?
C: Yeah, so, I mean, first and foremost, it starts with a lot of discussions: understanding what success means for the stakeholders, what they want to find out, what information they've got. How will you know if you've done a good job? You know, when is enough enough? So I think it starts with a lot of discussions. Audrey?
B: Yeah, I know, for me, the most fun that I ever had with, I would say, a large data set, one that kept on generating more opportunities, was for an open pit mine. That sounds terrible, but it was for oil sands, and trying to determine...
B: How can I track my graders to see what kind of optimization is being done? Those are kind of very general, high-level looks at what you can do. But, I mean, that project for me lasted a year and a half, just because we kept on finding more valuable insights as we went along. And I mean, the first time we talked with this customer, they were saying: yeah, we want to do this. And we're scratching our heads, and we're like: well...
B: ... past a certain marker or checkpoint, you would get the speed of the truck. You know how...
B: With the time and the speed, you could get distance, fuel consumption, everything. And they were just sitting on this for a number of years, because nobody really knew what to do with the data.
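[Editor's note: a minimal sketch of the derivation Audrey describes: timestamped speed readings at checkpoints give you distance, and an assumed consumption rate gives fuel. The readings and the litres-per-km figure are invented.]

```python
# Distance and fuel estimated from checkpoint speed readings.
# Timestamps, speeds, and the consumption rate are invented.
from datetime import datetime

readings = [  # (checkpoint time, speed in km/h)
    (datetime(2021, 6, 1, 8, 0), 30.0),
    (datetime(2021, 6, 1, 8, 12), 42.0),
    (datetime(2021, 6, 1, 8, 30), 25.0),
]

distance_km = 0.0
for (t0, v0), (t1, v1) in zip(readings, readings[1:]):
    hours = (t1 - t0).total_seconds() / 3600
    distance_km += (v0 + v1) / 2 * hours  # average speed times elapsed time

LITRES_PER_KM = 0.8  # purely illustrative rate for a haul truck
print(f"distance ~{distance_km:.1f} km, fuel ~{distance_km * LITRES_PER_KM:.1f} L")
```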
B: So, for a data scientist, I think that's kind of the treasure chest of data science: where you can go into a problem and there's just so much to look at, and so many insights that you can gain. And I would have to say the most important part of being a data scientist, and Sophie...
B: You can agree with me or give your input on it. It is really sitting down with the people that are doing the day-to-day tasks. I'll add one more item, and then we can put it over to Sophie. I remember talking to this one mine engineer and saying: well, how do you know where all the trucks are in the mine? And he's like: oh, well, on the back of my crossword puzzle page, I've written down where everybody starts.
B: Then I have to jump in my pickup truck and track them. And I'm like: good god, there's got to be, you know, an easier way that we can do this. And we could do that, eventually, again because we knew the GPS coordinates; we could find out where the person was within the last half hour. But these are the sort of things that you can find.
C: Oh gosh, okay, I'm just going to talk for the next 40 minutes, so we never get to the audience questions. Sounds scary. So I think I should throw out a disclaimer that, although my title is data scientist, and I spend a lot of time with data scientists, and...
C: We're often kind of helping them and advising them in their roles and daily work. I've had some real fun thinking about recommendation engines in the past, so...
C: Exactly. So, first up: even though at the end of the day we can think of all of these recommendation engines as just suggesting products to users, you've got to think about the domain. Because in the movie case, we want similar things to continue to be recommended to the users. Versus, in the case of, I just bought an ironing board: I don't want another ironing board, because I just bought an ironing board.
C: So I really like things like that. In terms of recommendation engines and algorithms and models, there's, you know, a known set of algorithms that people use for these. When I go and approach a problem like this, I'm not writing a new recommendation algorithm from scratch. I'm not getting the paper out, and Greek letters, and kind of making something up. There's known algorithms that we'll take for this. But you can't just throw your data in and expect a good answer.
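[Editor's note: one example of those "known algorithms" is item-item collaborative filtering. A minimal sketch, with an invented toy ratings matrix; real systems use dedicated libraries and much more careful preprocessing.]

```python
# Item-item collaborative filtering via cosine similarity on a toy
# user-by-item ratings matrix (rows = users, columns = items; invented).
import numpy as np

R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
], dtype=float)

norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)  # cosine similarity between items

user = R[0]
scores = sim @ user            # similarity-weighted sum of this user's ratings
scores[user > 0] = -np.inf     # don't re-recommend items they've already rated
print("recommend item", int(np.argmax(scores)))
```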
C: You've got to think about what you're gonna return to the user, and what's important. And it goes back to that notion of: how do I know if I've done a good job? So I think, yeah, the recommendation stuff was so fun, because at every turn there was a new facet to think about. Like he said with the films, you know: oh, do you recommend things with the same actors in?
C: Exactly. Or do you recommend things that are set in the same area? So, when we moved to Oklahoma and then lockdown started, we decided we were going to watch every film that had ever been set in Oklahoma.
C: I can let you know that we got like three films in and didn't finish the third film, but there's still hope. But, things like that: there's so many facets of ways that you could recommend things and chain things together. And then it falls down immediately as soon as you transfer that to a different domain, even though it's the same algorithm recommending things.
C: The other thing that I got really hooked on was Spotify. So, Spotify...
C: Are you sure that some of your kids aren't actually... exactly.
A: No, my account is safe from the kids. So, okay.
C: Right. So it's not like when we buy something on Amazon, or we watch a film and we rate it five stars; you don't really do that with music. You instead just listen to it a lot, and perhaps in repeated patterns, and so it's incorporating that information in there. So not only is it saying, which songs have I listened to before; it's saying, which songs have I listened to many times.
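[Editor's note: what Sophie describes is usually called implicit feedback. A well-known encoding, from Hu, Koren, and Volinsky's paper on collaborative filtering for implicit feedback datasets, turns play counts into a preference plus a confidence weight, roughly c = 1 + alpha * plays. A tiny sketch with invented counts:]

```python
# Implicit feedback: play counts become preference + confidence,
# in the style of Hu/Koren/Volinsky. The counts are invented.
import numpy as np

plays = np.array([
    [12, 0, 3],   # user 0's play counts for three songs
    [0, 25, 1],   # user 1's
])

preference = (plays > 0).astype(float)  # did they listen at all?
alpha = 40.0                            # scaling constant suggested in the paper
confidence = 1.0 + alpha * plays        # heavy listening => high confidence

print(preference)
print(confidence)
```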
C: The other thing it does with those daily playlists, which is fascinating, is it kind of makes coherent playlists out of them. It's like somebody has sat down and crafted you a personal mixtape. And there's kind of four different daily playlists, and they all cover similar genres, similar moods; it's not always that, like, playlist one is pop, playlist two is rock. It just blows my mind. I think it's so clever how they are using their recommendation engine. And they're not just recommending things that you've never listened to before.
C: So, occasionally they do; they mix them in. They've got some algorithm somewhere that's determining how often they mix new songs into your playlists. But they're also recommending things that you've already listened to, and putting those in your playlist as well. So again, it's different from the movie recommendations and the online shopping recommendations, where it's not going to recommend you things that you bought before, unless it's something like printer paper, in which case there's another algorithm running.
A: So, a question; someone's curious: is there a right path, or a known good path, to switch from data scientist to ML engineer with MLOps expertise? "I really enjoy training models and creating the APIs, and I would also like to take advantage of OpenShift to achieve that." So that's kind of a long question to basically say: hey, is there a path for me to work more on APIs? Right.
B: Yeah, poor guy's got three legs. I think that path is actually pretty seamless; it's very similar to kind of what I've done. You can start in data science, but if you get interested in any of the MLOps, it's very easy to slide into that role. Right now, I'm working with a number of MLOps engineers, looking at how we deploy some of our models and what could be future best ways for deployment.
B: You're not only looking at how you're doing your model delivery; sometimes you have to look at how the whole picture is put together. And I would say that, if you're looking to get more into MLOps, at least within Red Hat, there are a number of really good courses that you can take. There's like an intro to OpenShift, an intro to Kubernetes, and then it just goes from there, and you can go down the MLOps stream. So I think that switch, again: very easy. It's a very good switch, and it makes you more valuable.
A: Yeah, I just dropped the link in chat for you. And, "hindshaves", "hinch avenues", maybe? I don't know how to say that username, but I just dropped a link in chat for the Katacoda courses. Okay.
C: Chris, can I share my screen? "Absolutely." All right, let's see if we can get technology to play. Can you see my screen? "Yes." Okay, so I think, yeah: when we think about MLOps as the discipline and the process of operationalizing machine learning, then it's a really important thing to think about, and...
C: The reason that it's tricky, and the reason that there's this need for the MLOps engineer, is because data science is hard, but data scientists aren't application developers. So we don't necessarily know best practices about things like version control, source control, kind of making code repeatable. In the same way that you'd want a standard application to be repeatable, it's really important that your machine learning code is completely repeatable and reproducible, particularly in industries where things are audited.
C: So, for example, if I have to go back and say, why did I deny this person a mortgage on date X? Then you've got to say: okay, well, what model was in production on date X? What data was that trained on? What particular parameters were running for that model, etc. And so, obviously, perhaps because it's Red Hat, that's where you think: containers and containerization, and the container images; everything's going to be immutable.
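[Editor's note: a minimal sketch of the kind of audit record Sophie is describing, answering "what model, data, and parameters were live on date X?". The field names, files, and hashing scheme are illustrative only, not a specific Red Hat tool.]

```python
# Append-only audit record for "what was in production on date X?"
# All names and values here are illustrative placeholders.
import hashlib
import json
from datetime import date

def sha256_of(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

record = {
    "date": date.today().isoformat(),
    "model_image": "registry.example.com/mortgage-model@sha256:...",  # immutable image digest
    "training_data_sha256": sha256_of("train.csv"),   # hypothetical training file
    "hyperparameters": {"max_depth": 6, "learning_rate": 0.1},
    "code_commit": "abc1234",
}

with open("model_audit_log.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```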
C: And so I think, if you have that skill of being able to produce these APIs, to kind of put these models into production and interact with the models, but you're also aware of the underlying data science, then you're in really good stead to make sure that we don't make any silly decisions going forward.
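[Editor's note: as an illustration of "producing these APIs", here is a minimal model-serving sketch using FastAPI, assuming a scikit-learn-style model saved as model.joblib; the file name and feature names are hypothetical.]

```python
# Minimal model-serving sketch; model.joblib and the feature names
# are hypothetical stand-ins, not a specific product's API.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # hypothetical pre-trained model

class Features(BaseModel):
    income: float
    debt: float
    credit_score: float

@app.post("/predict")
def predict(f: Features):
    X = [[f.income, f.debt, f.credit_score]]
    return {"prediction": int(model.predict(X)[0])}

# Run with: uvicorn serve:app  (if this file is saved as serve.py)
```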
A: ... you know, activity the day before, that kind of thing; being able to tell me, like: hey, you should be... You know, there's all these stats out there that people are looking at, to say: hey, that's the magic number for people to look at. You know, if this number is low, you're good; if this number's high, you should probably take a break, kind of thing. And, you know, we're all searching for, like, the holy grail of data, essentially, to kind of give us an idea of how our body's going to respond.
B: Improvement... I know medicine, in particular, just within the last couple of years: they've always had a lot with imaging, and I would say, not that that's the only area that's really leaned into data science, but they've done a very good job of, say, taking x-rays, looking at x-ray data, CT scans, or even video, which they would use in colonoscopies, to try to do anomaly detection, where they're looking for either cancerous tumors or benign tumors.
B: That's actually been very useful. Or, I know in the area of COVID, what they've done is they've taken x-rays of normal chests, those with pneumonia, and those with COVID: can they possibly create a model that could detect COVID very quickly? And in that respect, they've done not too bad of a job with the data that they have. But remember, within medicine it's not only these x-rays; I mean, they're using it for DNA analysis.
B: They are using it for heart analysis. I know, gosh, even like 10 years ago, there was a company that I was talking to that was actually monitoring a person's heart rate from home.
C: Now, that being said, I think everybody can think of an example where AI has been ridiculously wrong. For example, you know, if you train AI to identify different types of footwear, so: high heels, sneakers, football boots, etc.
C: You could do that with photos; we've got object detection, so we can train a model. But it's really easy to confuse it. All you've got to do is wear your high heels on grass, and it actually thinks that they're going to be football boots, because it's picking up the grass, which is usually in the background of football boots, rather than picking up something about the shoe itself.
C: And, I mean, that's just a benign example. You know, there's chat bots that have gone very awry, very quickly.
C: I wouldn't put my life in the hands of an AI diagnosis, but I think there is information that we can use. Like you said, Chris: you know, don't go outside today, because it's awful out there. That's, you know, that's useful information, and that's not detrimental.
A: You could die, right? Like...
D: Not only do I show up late, I show up with technical difficulties. You know, don't trust my data science model right now; that's what's happening, I guess.
C: And AI... you know, Audrey was talking about the really exciting work that's going on, all the medical imaging stuff, or the forecasting and prediction stuff, and then I was just being a natural pessimist and saying: don't get ahead of yourselves, folks.
A: You know, right? Like, stuff like that: like, the environmental things aren't taken into account, and we look too closely at... if you drill too far down, you get data that's very skewed, and kind of works for 80% of the world, but not the other 20%, kind of thing. I feel like that is going to be a continuing problem in the industry.
A: As you know... I mean, the Theranos trial started; I'm just going to throw that out there, right? Like, there's only so much you can do with AI and ML right now, right? Like, you can't change all of an industry with it unless you truly are discovering some new way of doing things. That being said...
A: Are we in the right environment for data science to flourish and machine learning to flourish? Or is it, in your opinion, a little too early for us to put full faith in our AI and ML? And it sounds like: yeah, I don't trust all the algorithms just yet, from the Sophie perspective, I mean. Is there anything... obviously the music suggestion engine works for you, Sophie, but it doesn't work for me, necessarily, so...
B: I think people in general should be able to trust AI to a greater extent. I mean, you may not realize it, but on the highways we do have a lot of autonomous trucks, especially in the Tucson region. There is a company that produces autonomous freight haulers ("wow"), and they test those vehicles, and they haven't had any accidents in the last number of years. They've done testing through various weather conditions. Of course, weather in Arizona: mostly sunny, but you can get the dust storms.
B: So there actually may be situations where you may not even be aware that AI is working well. And as well, too: within the Imperial Valley in California, and here just outside of Yuma, we have a lot of agriculture that relies on AI in terms of moisture detection.
B: Do we need to, you know, water some of the crops? So there are a lot of things that AI is doing very, very well. But I think, with any new technology, as soon as you start looking at something, there are going to be a few hiccups.
A: Having just spent my weekend at a roller coaster park, or at least a day of my weekend at the roller coaster park: past a thrill level of two, I shouldn't ride the ride, according to the warning signs. So, yeah. But that is just because, you know, it's like: are you pregnant? Do you have spinal issues? Do you...? You know, it's like this list of things, you should not ride, and it's literally on every sign, on every roller coaster. So, you know, it's one of those things where it's like...
A: So, I mean, data science day-to-day: where are you seeing, you know, actual things happening with data science in the real world right now?
D: Yeah, now, that's a good question. I mean, as Audrey mentioned, you know, there are fantastic examples of AI and ML that are happening all around us, right? I mean, some of them are big examples, right, physically big, like trucks on a highway, which is pretty interesting. But a lot of it, too, is, you know, if you think about customer interactions, like the interactions you have with shopping, right? You have, I guess, the music recommendation algorithm...
D: ... you guys have already discussed. But there's also, you know, next-best-action type models that are out there: like, what products can we recommend to a customer? And how can we recommend things in the moment, right? It's not: how can we recommend things, and then have you interact with, you know, a sales associate, who then calls you later that afternoon and says: oh, you know, the item you're actually interested in is in stock, please come back to our store to...
D: ... you know, purchase that. Like, by that point, the opportunity is already gone, right? So, a lot of the... I mean, I don't want to call anything easy, because it's not, right? I mean, the only thing that's easy about data science is introducing bias, right? We don't want that. Maybe the more straightforward things, or the models that I've seen a lot of interaction with, are those models that exist in internet shopping, right, surfacing recommendations. So, recommendation engines and next best actions: those are the good examples, because it's an in-the-moment decision.
D: Your model will take the information it has, create an inference, and present it back to you. And, you know, Sophie mentioned shoes, right? If I'm shopping and I see an advertisement for shoes, it's not that big a deal. You know, it's not going to upset me, even though I have no interest in shoes, right? It's a benign example. So we see a lot of these, where the model can act quickly, and even when it's wrong, the downsides are limited, right?
A: I read that they obviously used a ton of data to break down kidney function in patients that had COVID, and, like, the long-term effects of COVID: it's starting to look like the data is indicating that it affects long-term kidney function.
A: That is truly interesting, because this is something that's going to be with us forever. You know, we're not going to eradicate this overnight, kind of thing. So, yeah, learning the long-term effects of getting it is vitally important, because there's a whole class of people now that are COVID survivors, and they might have long-term issues if we don't go look at the science now, right, and start looking at it and seeing what's changing in those people that had, you know, naturally contracted coronavirus. Yeah.
C: Right, and that brings us back to kind of the two different types of data scientists, Chris, in my opinion. Because the thing that you're talking about is kind of taking that data and analyzing it, with no sort of pressure on time, and then making some conclusions and a report, and then feeding that back to somebody. Versus Carl talking about kind of the AI that's ingrained in these systems. So, as a data scientist, we'd have to think, if we were trying to solve this problem...
C: Okay, well, which algorithms can I use? Because I could probably make a better recommendation if I had three weeks to churn through all of this customer's data, and every other customer's data, and pass it through a really deep neural net, and then flip it and reverse it, and then do something else with it, and then come up with this recommendation for them. But by that point, they've already... they're gone, right?
C: Yeah, so it comes back to kind of thinking about sitting down with the stakeholders, understanding what's important, how you'll define success in the project, and then figuring out where to go next.
D: Yeah, I mean, in both of these cases, right, it comes down to the quality of the data, and being able to ask the right questions. I mean, you've probably already talked about this, since I'm late, but, you know, the real mark of a data scientist is coming up with appropriate hypotheses, being able to test them, and then understanding the impact, for the inevitable iteration that's going to happen as we continuously improve these models. And with something like COVID and long COVID, and, you know, all of the unknowns that are out there...
D: I mean, we simply don't have enough data. We have intelligent people who can ask the right questions, so we're moving in the right direction. But if you think about longitudinal studies, I mean, a lot of these studies happen over 20, 30, 40 years; I mean, we're talking decades, right? So we're only scratching the surface on long COVID.
A: Yeah, and, like, the systems that are put in place today to start studying these things will still be, like, churning away five years from now to continue to study them, right? So... oh, absolutely. Yeah, it's pretty wild. So, we've got about 10 minutes left. Is there anything we want to share for that data scientist out there that's trying to, you know, break through in their work today and find something awesome in the data? "Ask for more data."
A: More data, more data. I'm just kidding. Some of the things that I've seen, right, like in the financial sector, with AI models or ML models, or just models in general, I should say, because I don't really know what I'm talking about: it's, you know, like, person X has these accounts; they probably would appreciate this product, kind of thing. Or, person Y is applying for a mortgage, so they need to make sure their credit score is as high as they can get it.
A: So here's what they recommend doing, or not doing, while you're going through the mortgage process, that kind of thing, right? Like: don't buy a car when you're trying to get a mortgage. Like, that's great advice, because it raises red flags to the people in the mortgage business, right? So, like, I've seen those examples out there, kind of in my day-to-day work. What other examples do you think are helpful, that data scientists have created over time?
B: ... to a lot of the unrest or disruption. I mean, they're taking a harder look at the emotion and the context of various conversations that are happening, and flagging them.
B: Whatever natural language processing they're using for that, or whatever algorithms they've developed, I think, have been very interesting, just over the past couple of years. And that's allowing a lot of those individuals within those social media companies to actually ban people, or ban groups.
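[Editor's note: a toy sketch of the kind of text-flagging pipeline Audrey alludes to, TF-IDF features plus a linear classifier; the six training examples are invented, and real moderation systems are vastly more sophisticated than this.]

```python
# Toy message-flagging sketch: TF-IDF + logistic regression.
# Training examples are invented; this is only an illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "have a great day", "thanks for the help", "lovely weather",
    "I will hurt you", "you people are worthless", "go attack them",
]
labels = [0, 0, 0, 1, 1, 1]  # 1 = flag for human review

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

for msg in ["attack them now", "thanks so much"]:
    print(msg, "->", "flag" if clf.predict([msg])[0] else "ok")
```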
B: I would say those are more visible, and those are actually more interesting. And I think it's good, for a lot of the... you know, the ones where they're actually banning a lot of people or groups, to provide just, overall, more stability and safety. Though other people will say: well, what about the freedom of expression, right? Well...
D: I mean, what I'm hearing you say, Audrey, is that, as data science, artificial intelligence, and machine learning become more prevalent, right, we have the interaction of data and inference and modeling with society as a whole. And that's a whole other topic that we could, you know, spend hours discussing, and that's a really cool thing. I mean, as you know, it's a dynamic system, too. So as society changes with data science and artificial intelligence, we're going to have to go back and update these models, and it's a continuously iterative process.
A: So, it's funny that you mention social media; I have, you know, a different, an opposing view. I like sports, so I get nothing but ads for sportsbooks and casinos, all day long, right? Like, that's it; that's all I see on Twitter for ads. Like, when I'm using the native Twitter apps, it's just non-stop casinos and sportsbooks. I am very much anti-gambling.
A: So, right: like, gambling is an addiction. The models aren't addressing that, right? They're just throwing all this... you know, a lot of sportsbooks are, you know, throwing money at ads, so the ads are getting thrown up to people. And it's like: how do you say, this ad is not healthy for me, right? Like, that is the problem that I have, right? Like, there's no way to report an ad as: oh, this is actually bad for me; or, I'm not allowed to do this by law.
B: There are some times, when you click off an ad, they'll say: what do you not like about this ad? Or: why are you not interested? So you can put that information there, and that will get back to the company, whether it's Google or whoever is popping that ad up on whatever device or browser you're using.
B: That Twitter one is a really good point, though. I mean, what happens when those ads come up to somebody who's fighting an addiction? You know, they're...
D: No, yeah, I mean... I don't think it's... maybe it has been defined, but it's really: whose responsibility is this, right? Is it, you know... is it the person serving the ad? Is it the infrastructure? I mean, like, I don't know if we have an answer for that; maybe we do. I'd love to, you know, hear a comment in that regard. But to me, that strikes me as an incomplete solution to the advertising problem, right? They aren't considering the new flow of data as you interact with the system. Sorry, Audrey.
B: ... that you're looking at. So, like, whether it's a social media company or just a web page browser: remember, there is payment for certain vendors to offer their services or their products. So there is a balance that that company is striking, and, you know, it's the bottom line, sometimes, right? No...
A: I... trust me, I completely get it. Like, I know that, you know, these wonderful services that I use routinely: I don't pay money for them, so they have to make money somehow. I get that. But there has to be a better way, right? Like, that's what I'm trying to say. You know, if someone from Twitter's out there watching, feel free to DM me.
A: I wish there was a better way for, you know, like, ad-deterministic things to happen, right? Like: oh, he's literally blocked every single sportsbook account on Twitter; we probably shouldn't show him any more sportsbook stuff. You know? Because they pop up every day. It's just a particular pet peeve of mine. So, and...
C: ... what the machine learning algorithm can do, and what other functionality we need to bring in from other aspects of the system. You know, essentially, you're kind of talking about just encoding a rule, right: do not show Chris X. And so then it's filtering those out of the recommendation. And it's going back to the stakeholders and thinking: how will we know if we've done a good job? I think everyone will know they've done a good job when Chris is happy.
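[Editor's note: that "encode a rule" layer is often just post-filtering the model's ranked output. A minimal sketch, with invented ad names, scores, and blocklist:]

```python
# Post-filtering ranked recommendations against a user's blocklist.
# Ad identifiers, scores, and blocked topics are all invented.
ranked_ads = [
    ("sportsbook_a", 0.93),
    ("hiking_gear", 0.81),
    ("casino_b", 0.78),
    ("running_shoes", 0.60),
]

blocked_topics = {"sportsbook", "casino"}  # e.g. inferred from blocked accounts

def allowed(ad_id: str) -> bool:
    return not any(ad_id.startswith(topic) for topic in blocked_topics)

shown = [ad for ad, score in ranked_ads if allowed(ad)]
print(shown)  # ['hiking_gear', 'running_shoes']
```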
A: ... let them know what types of ads you would prefer to see. So I have gone through that data, and it's like: okay, yeah, I've cleaned up some of it, because it was just, like, way off, right? Like: show me ads about the San Francisco 49ers. Like, I do not care about the San Francisco 49ers; no offense to anybody...
A: ... that's a 49ers fan; I'm not. So it's weird what it picks up on, right? Like, it thinks I'm part of the Green Party; not part of the Green Party in the UK, I'm in the US, right? Like, it's really weird. But, yeah. So, yes, there are things that I can do; but when companies target specific demographics, that's where I find the problem, right? Like, is that the right way to do things? I don't know...
B: Right. Again, you're going into that whole topic where, Carl is right, we could spend a whole couple hours discussing the ethics involved in the money flow, and how certain demographics may or may not be targeted, yeah. And, yeah, that's the thing with AI: you have to use it ethically.
A: Thank you all for watching out there. Coming up next on the channel here, in an hour, we're gonna be sitting down with some of our managed service folks to talk about managed cloud service offerings in the cloud. So it should be a nice little conversation, here at 11 a.m. Eastern, 1500 UTC, so feel free to join in. And you can catch this crowd again in a month; and then, in two weeks, we'll have the data service office hour, which will contain data storage folks. So stay tuned, folks; it's gonna be a fun ride. Stay safe out there, everybody.