From YouTube: CESM Workshop: Machine Learning, CESM-related Efforts
Description
The 26th Annual CESM Workshop will be a virtual workshop with a modified schedule on its already scheduled dates. The virtual workshop will begin with a full-day schedule on 14 June 2021, with morning presentations on the state of the CESM, by the award recipients, and by three invited speakers, followed in the afternoon by roughly 15-minute highlight and progress presentations from each of the CESM Working Groups (WGs).
On 15-17 June 2021, working groups and cross-working groups hold half-day sessions, some with presentations and some that are discussion only.
A: Welcome to the machine learning session. Let me just clear my screen. Katie Dagon from NCAR and I, Christiane Jablonowski from the University of Michigan, are hosting the session. Before we get started: we have a slate of speakers and then also discussion periods, and I would like to remind you (oops, now I'm going too fast) of the code of conduct.
A: Okay, before we get started, we would like to advertise a few events, and this actually includes past events. These are opportunities for people who are maybe new to machine learning. Some are upcoming opportunities, like this one: here we are looking at an opportunity this summer for grad students and postdocs (but really everybody) to attend and participate in an artificial intelligence summer school, organized by NCAR and partners.
A: This is an opportunity for newcomers to get started with machine learning, with a focus on trustworthy AI. If you are really new to machine learning, maybe before you attend this summer school, take a look at last year's summer school from NCAR. That was a virtual summer school held at NCAR last year, and all recordings and lecture notes are online. It really provides a fundamental background in AI and machine learning for the Earth system sciences. I highly recommend it.
A: Last but not least, or at least a few more opportunities: we have now twice had AGU tutorials on machine learning. Here you see links to the events that happened in 2019 and 2020, organized by Karthik Kashinath from Lawrence Berkeley National Laboratory. It's a hands-on tutorial, so you have access to Google Colab notebooks, for example, and the presentations are online, including a recording. Again, a wonderful opportunity to get started.
A: There is an upcoming workshop in a NOAA series; this is now the third instance of that series, a workshop on leveraging AI in the environmental sciences. Note the deadline: if you want to contribute a talk, it is tomorrow. So there's still time, but the deadline is tomorrow. The workshop will be in September this year, partly in person and partly virtual. There are additional opportunities as well; these are past events, but notes and recordings are all online.
A: These include the second NOAA workshop on leveraging AI in the environmental sciences. There is also a U.S. climate working group with a webinar series that just ended, this May I believe, and then there were also a few webinars from the European Centre for Medium-Range Weather Forecasts.
A: Again, I recommend you check these out. Okay: today's goal of this cross-working-group session is to bring people together. As we know, machine learning is a rapidly growing field, and often machine learning efforts are pretty scattered; that's even true within NCAR, and Katie will give us an overview of the NCAR activities. The goals of this cross-working-group session are to network, to inform each other about ongoing or planned ML activities related to CESM, and to provide a discussion forum.
B: Thanks, Christiane. I'll also say that the slides we were just presenting should be posted on the CESM workshop website. If you want to follow some of those links, just check back after the workshop and you should be able to find the slides. So, we have a great set of talks this afternoon.
C: Thanks everybody for joining us, and thanks for organizing the session. It's always pretty exciting to see such a large community, both within CESM and within machine learning, coming together and thinking about these problems. So thanks again. All right: what I want to show you are pieces of work we've been up to in the last few years, tackling the problem of ocean process parameterizations. Then I'll give you a little update on M2LInES, a large international collaboration using scientific machine learning to improve parameterizations in the ocean, sea-ice, and atmospheric components of climate simulations, with NCAR of course being a big partner, and CESM front and center.
C: Of course I'm going to focus on the ocean component, because I'm an oceanographer. I'm the one presenting and getting the invite, but all the credit goes to members of my group: Tom Bolton, who was a PhD student with me back when I was at Oxford (that work is on GitHub), and Arthur Guillaumin, who was a postdoc at NYU and is starting a faculty job back in the UK in September.
C: Okay, I don't need to convince you; I'm preaching to the choir. We all love climate models, and they're great: we use them for prediction and we use them to understand processes. But we also know they have limitations. Here I'm showing dynamic sea level in a one-percent-CO2 experiment. Up here is the CMIP5 ensemble, and down here is the same for the CMIP6 ensemble, and what we see, of course, is a strong signal in dynamic sea level.
C: You can think of it as integrated ocean heat content, with a very large signal in the North Atlantic and in the Southern Ocean, for example, in either ensemble; those basins respond very strongly to one percent CO2 in terms of sea level. What we see here on the right is the model spread in CMIP5 and CMIP6, and unfortunately, again in the North Atlantic and Southern Ocean, the spread is actually as large as the signal.
C
So
in
many
regions
of
the
ocean,
the
uncertainty
or
the
you
know.
Basically,
the
the
response
of
the
climate
models
can
be
very
different
in
regions
where
actually
the
the
responses
might
be
very
strong
and
we
were
able
to
actually
understand
the
spread
or
at
least
pin
it
down
to
the
ocean
component
to
the
ocean
models
and
the
parameterization
within
them
in
particular.
Mixing
and
eddies
are
are
kind
of
critical
pieces
here
that
actually
set
the
spread
and
set
the
uncertainty.
C: Of course, we've been dealing with parameterizations in climate models for a very long time, and they're usually based on physical understanding of the processes, written down as mathematical expressions. So here, again preaching to the choir, our idea is: can we use machine learning to parameterize those processes? Let's not assume we know what the physics is going to look like mathematically.
C: Let's just give the algorithm a choice, given the data. The idea is to take data, either from observations (unfortunately in the ocean we're a little limited, so we focus mostly on high-resolution simulations), and ask the algorithm to pick out the missing terms: what is the missing forcing at coarse resolution that we would need to add to faithfully represent the physics that is unresolved? Everything I'm going to show you tries to blend physics and machine learning together.
C: So we're not going in completely blind, but I'll give you a couple of examples. As I said, we're going to come up with parameterizations of small-scale processes. For example, that would be a 100-kilometer grid box, which is the resolution of a typical climate model, and all the physics underneath it, the ocean turbulence, is unresolved and must be parameterized. That's what my group has really been focused on: ocean turbulence parameterization at the mesoscale, so roughly 10 to 100 kilometers.
C: There are a few other papers tackling different aspects. There's one paper on vertical mixing, which is a very nice way to use machine learning, a neural network, and of course some of our colleagues here at NCAR have been working on parameterizing mesoscale kinetic energy; I think that paper might be out already.
C: Here I'm going to show you a couple of examples from our group: what we found, what works, what doesn't, and where we're going with it. In the first example we take high-resolution simulations, filter and coarse-grain them, and extract the missing forcing that the coarse-resolution model should have. So basically we're trying to diagnose a parameterization from a high-resolution model, and we're going to ask a convolutional neural network to learn that missing forcing given the large-scale, resolved velocity. Again, you can think of it as looking for some function, given the data, that depends only on the resolved scales, and in our case the resolved scales are the resolved velocity fields.
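The filter-and-coarse-grain step described here can be sketched in a few lines. This is a minimal illustration rather than the speaker's actual pipeline: it block-averages a high-resolution velocity field onto a coarse grid and diagnoses the missing (eddy) zonal-momentum forcing as the difference between the coarse-grained advection and the advection computed from the coarse-grained fields. The grid sizes, filter, and function names are all assumptions.

```python
import numpy as np

def coarse_grain(field, factor):
    """Block-average a 2D field onto a grid `factor` times coarser."""
    ny, nx = field.shape
    return field.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))

def subgrid_forcing(u, v, factor, dx=1.0):
    """Missing zonal-momentum forcing the coarse model should see:
    S = coarse(u du/dx + v du/dy) - (u_bar du_bar/dx + v_bar du_bar/dy)."""
    def adv(u_, v_, d):
        dudx = np.gradient(u_, d, axis=1)
        dudy = np.gradient(u_, d, axis=0)
        return u_ * dudx + v_ * dudy
    ub, vb = coarse_grain(u, factor), coarse_grain(v, factor)
    return coarse_grain(adv(u, v, dx), factor) - adv(ub, vb, dx * factor)
```

A network would then be trained to predict `subgrid_forcing` from the coarse-grained velocities alone.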
C: Now, what is the neural net doing? I'm not going to go into the details of the architecture; I just want to say one thing. It will certainly optimize over this data set to find the best function, but we have to trick it, because the neural net doesn't know about conservation principles. It has no idea about the physics: it just sees images, tries to match those images, and finds the best fit.
C: What we've done in this work: we know we need to conserve momentum, because at the end of the day I need to take that parameterization and put it in a coarse-resolution model. If I have a net input or a net sink of momentum, nothing good is going to happen, as you can imagine. So what we do is learn the different components of the tensor, and at the end we have a fixed layer that takes the divergence of that tensor.
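The momentum-conserving trick (learn tensor components, then apply a fixed divergence as the final layer) can be illustrated with a hedged numpy sketch. The real model is a convolutional network; the centered periodic differencing below is an assumption, chosen because it makes the conservation property easy to see: the forcing sums to zero over the domain, so no net momentum is injected.

```python
import numpy as np

def divergence(Txx, Txy, dx=1.0):
    """Fixed (non-trainable) final layer: the zonal momentum forcing is the
    divergence of the learned stress-tensor components. With periodic
    centered differences, every entry of Txx and Txy appears once with a
    plus sign and once with a minus sign, so the forcing sums to zero."""
    ddx = (np.roll(Txx, -1, axis=1) - np.roll(Txx, 1, axis=1)) / (2 * dx)
    ddy = (np.roll(Txy, -1, axis=0) - np.roll(Txy, 1, axis=0)) / (2 * dx)
    return ddx + ddy
```

However the network fills in `Txx` and `Txy`, the resulting forcing conserves momentum by construction.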
C: The second example is a little closer to what we like as climate scientists and as physicists. Let's take a step back and, rather than asking the algorithm to learn a humongous function, try to constrain it a bit. We're still going to do the same exercise: take a high-resolution model, diagnose the missing forcing for a coarse-resolution model, but now use a different machine learning algorithm, called a relevance vector machine.
C: It's a sparse Bayesian regression, and what the regression does is this: we give it a library of functions, and the library of functions is based on data. So we give it, basically, images of velocities, gradients of velocity, higher-order derivatives, and so on, and we have the algorithm prune through that library of functions and select the ones that best match the forcing. The beauty of it is that you don't end up with thousands of weights multiplying things you can't interpret.
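As a sketch of this idea (not the group's actual code), scikit-learn's `ARDRegression`, a sparse Bayesian method closely related to the relevance vector machine, can prune a small library of candidate terms built from a velocity field. The library and the synthetic "missing forcing" target below are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(0)
u = rng.standard_normal((64, 64))        # a stand-in "velocity" snapshot

# Candidate library: gradients, a Laplacian, and a nonlinear product.
dudx = np.gradient(u, axis=1)
dudy = np.gradient(u, axis=0)
lap = np.gradient(dudx, axis=1) + np.gradient(dudy, axis=0)
library = np.column_stack([f.ravel() for f in (dudx, dudy, lap, dudx * dudy)])

# Synthetic target built from only two library terms.
target = (2.0 * lap - 0.5 * dudx * dudy).ravel()

# Sparse Bayesian regression prunes the library to the relevant terms.
model = ARDRegression().fit(library, target)
```

The fitted `model.coef_` stays interpretable: each weight multiplies a named physical term, unlike the thousands of weights in a deep network.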
C: I'll show you the good and the bad about those two approaches. The first step, of course, is to see whether we're doing a good job, so we do some testing offline. As I said, what we're trying to do in these idealized simulations is learn the true missing forcing. So we have x here and y here; that's the mean missing mesoscale forcing in the momentum equation, and that's the standard deviation. That's what we're trying to learn.
C: This is what the neural net does, the mean and standard deviation; this is what the equation discovery gave us, again mean and standard deviation; and those are the correlations between what the machine learning algorithm has learned and the truth. In both instances you can see that both machine learning algorithms do a great job overall. The neural net does better in many regions of this domain and does worse in others, where there's actually very low turbulence.
C: So maybe it doesn't matter there, but it can't predict the waves very well. The relevance vector machine ended up with a tensor; I'm not explaining the physics here, because that would take me more than the five minutes I have, but it really depends on the stretching and shearing of the fluid, which we know are important ingredients for parameterizing mesoscale turbulence. So in both instances the networks do a great job.
C: The advantage of the equation discovery, even though it has lower skill offline, is that we were able to understand all of its pieces. With the neural net it was much harder, because I still have no idea why it does such a great job (both of them generalize very well, by the way). But this is just to give you a flavor of the type of methods we can use, working directly with data, to extract information for parameterizations of the missing forcing.
C: All of that is offline; at the end of the day I need to plug it into a coarse-resolution model. So, of course, we started with something very simple: a barotropic double-gyre model. If you're an oceanographer, it's the simplest possible model you could come up with. We have a jet in the middle, a bit as before, and this is a coarse-resolution model with no parameterization. That's the zonal velocity, again with x and y over here.
C: That's the standard deviation of the zonal velocity, and that's the higher resolution: this was 30 kilometers and that's less than four kilometers. So there is a lot more turbulence; the flow is a lot more energetic. If we add the parameterization, this is what the flow field looks like when we plug the equation discovery into the coarse resolution, and you can see we're basically recovering a lot of the turbulence in the flow field.
C: And that's if we implement the neural network; actually the neural network does a better job. In both instances we had to tune down the parameterization. With the equation discovery the model blew up: even though it was an equation, the model became unstable very quickly, so we had to halve the coefficients.
C: The neural network never became unstable, but rather gave us a solution that was highly unphysical. It basically completely forgot that there was wind forcing and gave us a gigantic eddy that took up essentially the entire domain. There was no jet anymore; there was just a massive inverse cascade happening, sweeping up all the pieces of turbulence.
C: Again, I'm telling you the good and the bad: there are many things that can happen. But nonetheless, in both instances, with some tuning, we're able to recover the properties of a high-resolution simulation without the cost of the high resolution.
C: All of that is very idealized. Next up is implementing it in more complex models: we started in baroclinic models, but of course MOM6 is our next step, as part of M2LInES, which, as I mentioned at the beginning, is a new international collaboration. I was asked to say specifically what we would implement, so: we're going to implement some of the equation-discovery parameterizations.
C: Both the momentum and buoyancy ones that we discovered in those pre-trained models. We're also going to implement the deep-learned parameterizations; those are a little trickier because of the form of the parameterization. I talked about this one, and there's a new one that we trained using data from coupled climate models, CM2.6, and that's actually a stochastic parameterization, where we learn the mean and the standard deviation of the missing forcing.
C: So those are the learned parameterizations that we're going to directly implement in MOM6. But of course we're doing a lot more in M2LInES. We're going to tackle mesoscale and submesoscale parameterizations for momentum and tracers; we're going to look at vertical mixing as well, again momentum and tracers; and parameterization ideas at the interfaces, both ocean-atmosphere and ice-atmosphere. Marika Holland is tackling that problem at NCAR. In the atmosphere, both the boundary layer and momentum transport by clouds: Judith Berner, also at NCAR, will tackle that. For the ocean, many people are involved, both at GFDL and IPSL and across many universities. So there is a wide range of parameterizations and processes that we're going to tackle, and I just want to close with this: it's a very large project with many, many partners, and it involves many components, using high-resolution simulations and data-assimilation products, together with ML and theory, to come up with new parameterizations that we can plug into the GFDL, NCAR, and IPSL models to improve prediction. This is an exciting project, supported by Schmidt Futures, with many colleagues who are absolutely essential to it. So thanks again for having me, and hopefully there's time for questions.
B: Thanks very much for that excellent talk, Laure. Yeah, let's take a few questions; we can also push into the discussion time at the end. I forgot to mention that if you have a question, you can use the raise-hand feature in Zoom, or you can type it in the chat.
D: Thank you very much. Could you tell me whether there's any effort, or advantage, in what you do with your inputs: do you normalize them, or do they keep their units?
C: They have units, yeah, so that's a great question. Usually we try to normalize them, because you want to weight by the standard deviation, so that helps you. There's a lot of pre-processing that goes into it, by the way; here I went through the 12-minute version, but what you do with the data at the beginning makes a big, big difference.
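A minimal sketch of the normalization being described: standardizing each physical input (which carries units) to zero mean and unit variance before training. The speaker's actual pre-processing is more involved; this just shows the basic idea.

```python
import numpy as np

def standardize(x, eps=1e-12):
    """Strip the units and scale from a physical input field by removing
    its mean and dividing by its standard deviation, so that all inputs
    enter the network with comparable magnitudes."""
    return (x - x.mean()) / (x.std() + eps)
```

In practice the mean and standard deviation are computed on the training set only and reused, unchanged, on validation and test data.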
B: Thanks. Yeah, I have a question about how we might be able to understand a little bit more why the neural network is doing such a great job. Do you have thoughts on interpretation of that, or explainable techniques?
C: Yeah, absolutely, we're definitely tackling this as well. There are many techniques, as you know, since you're doing a lot of them now. We're using feature maps and we're calculating Jacobians, and what we're finding right now, as a first step, is that a lot of the gradient and strain and stress terms that we had with the equation discovery are what the neural net is learning as well. But it does the higher-order moments better, and that part I still don't understand. So right now what we're doing is giving the algorithm the equation, then learning the residual from it and seeing whether we can interpret the residual from the neural net, trying to go at it that way. But it's hard to understand what the neural net is doing, especially for hierarchies. Yeah.
E: Okay, yeah, thanks for having me. Today I'll be talking about some of the work that myself and Christiane Jablonowski are doing on trying to emulate simplified physical processes in CAM. So it's similar to Laure's talk, but now we're talking about the atmosphere. First, some overall motivation.
E: As we all know, machine learning has become a very intriguing and useful tool for us as scientists, particularly in atmospheric and Earth science, and one of the biggest applications for us has been to see whether or not we can emulate or improve physical parameterizations in our climate models. That's where we began this work, when I came to the University of Michigan new to atmospheric science and new to data science.
E: So it's been a long time coming, but we're finally excited about some of these results.
E: We care about choices like the number of neurons if you're using a neural network, or the number of trees in your random forest, whatever the case may be, and in order to address these kinds of fundamental questions we've been implementing this in a hierarchy of extremely simplified atmosphere models. If you look at a diagram of model complexity, the simplest climate model you can make is 2D-type deterministic tests, all the way up to an AMIP model with realistic, state-of-the-art physics.
E: We fall here in the middle: we're in this 3D dynamical core, but we're keeping it very, very simple, with a dry Held-Suarez test, which some of you may be familiar with, and then a moist version of that as well, where we also allow it to rain. In the dry setup we have two forcings. There is the horizontal-velocity forcing, which is a very linear function; we're not going to tackle that with machine learning, because you can emulate it with just linear regression if you want. But in the temperature forcing, this Newtonian temperature relaxation, you actually introduce a little bit of nonlinearity through the latitude dependence of k_T and T_eq, which puts it just outside the range of linear regression, though it's still extremely simple. So we're going to try and emulate that. Then, in the moist setup, you start to introduce condensation, boundary-layer mixing, and heat fluxes, and of course you allow it to rain; all three of these are nonlinear.
E: The temperature tendency is definitely dominated by the Newtonian relaxation, so essentially that's going to be the easiest, but it's still highly nonlinear compared to the dry case. Then for q you have only the nonlinear terms, slightly more complicated terms, and then, of course, precipitation is an integration over the vertical.
E: So that is also a much more complicated problem, and it's what I've been spending the majority of the last year trying to emulate. Machine learning (we're all here because we have some kind of interest in it) is fundamentally just determining functional relationships between the inputs and outputs of a data set, and for most of the modern machine learning techniques, not all of them but most, the more data you have, the better. The ones we focus on are random forests for the majority of this work, though I do have a neural network in here as well.
E: I hope we can get to it; we'll see how time goes. The models are built in Python using established libraries: scikit-learn for the random forests and Keras for the neural networks. We've also incorporated Sherpa, a library out of UC Irvine that helps us tune the hyperparameters of our machine learning models. It can tell me the best choice for the number of trees in my forest, or the number of neurons in my neural network, and many more things; there's a pretty much countless number of options, so it becomes a bit taxing if you don't have some kind of optimizer. Diagram-wise, I want to introduce what a random forest and a neural network are.
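Sherpa's own API isn't shown in the talk, so as a hedged stand-in, here is the same idea with scikit-learn's `GridSearchCV`: searching a small, made-up grid of random-forest hyperparameters and keeping the best-scoring combination. Sherpa automates this kind of search (plus smarter strategies) for Keras and scikit-learn models.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the model-output training set.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# Exhaustively try each hyperparameter combination with 3-fold cross-validation.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [10, 50], "max_depth": [3, None]},
    cv=3,
)
search.fit(X, y)
```

After fitting, `search.best_params_` holds the winning hyperparameters and `search.best_estimator_` the refit model.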
E: A random forest is an ensemble of decision trees that are randomly initialized; each decision tree is fit individually to come up with a prediction based on your data, and your final prediction is an average of all the trees' predictions. Something unique, and useful, about a random forest is that quantities like precipitation, which we know cannot be negative, inherently can't be predicted as negative values; that's part of why they're so powerful. Neural networks, on the other hand, are a system of interconnected neurons with nonlinear activation functions, which introduces the nonlinearity, and computationally they're very efficient once you've implemented them to run either in parallel or on a GPU.
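The non-negativity property follows from how a random forest predicts: each tree returns an average of training targets in a leaf, and the forest averages the trees, so predictions can never leave the range of the training targets. A small synthetic check (the "precipitation" here is invented):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
# A non-negative, precipitation-like target.
precip = np.maximum(0.0, X[:, 0] + 0.5 * X[:, 1])

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, precip)
# Predict on unseen inputs: averages of non-negative training targets
# stay non-negative, so the emulated precipitation cannot go below zero.
pred = rf.predict(rng.standard_normal((200, 3)))
```

The flip side is that a forest also cannot extrapolate beyond the largest value it saw in training.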
E: I'll talk quickly about the model configuration and where we get our data sets. We're using two configurations. Most of the work with our random forests is done on what we're calling the two-degree grid with the CESM 2.1 finite-volume dynamical core. Both configurations are run with 30 vertical levels, but the two-degree grid is, of course, 1.9 by 2.5 degrees lat-lon. We run that one for 60 years and collect weekly output; we're not collecting every time step or every physics time step, but sporadically throughout this 60-year atmosphere run. This is mostly used for our tendencies, dT/dt and dq/dt, and also for precipitation, with random forests only.
E: We also have an older one-degree-resolution run, a three-year run with hourly outputs, which gives the same order of magnitude in the total number of data points. We used that for a precipitation emulation about a year ago, which gave promising results, and I'm hoping to talk quickly about that; but most of the work is going to be with the random forests on the two-degree data set. As far as splitting up our data: we use about 50 years for training and validation, we leave a gap, and then for all my results we use what's called the test data, which is about six years of that weekly output for the two-degree case. So we're running all of this offline, but on data the model has never seen before, so that we don't have any overfitting issues.
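The train/gap/test split described here can be sketched as a chronological split over time-ordered samples. The counts below (weekly samples over 60 years, a 6-year test set, a 4-year gap) follow the talk; the helper itself is illustrative.

```python
import numpy as np

def chronological_split(n_samples, n_test, gap):
    """Split time-ordered sample indices into train and test sets,
    leaving `gap` samples between them so that autocorrelated weather
    states cannot leak from training into testing."""
    test_start = n_samples - n_test
    train = np.arange(0, test_start - gap)
    test = np.arange(test_start, n_samples)
    return train, test

# 60 years of weekly samples, last 6 years for testing, 4-year gap before them.
train, test = chronological_split(60 * 52, n_test=6 * 52, gap=4 * 52)
```

Shuffled random splits would be inappropriate here, since adjacent weekly samples are correlated.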
E: If it does poorly, then we know we overfit; if it does well, then we're roughly where we want to be. I also want to introduce the idea of an R-squared. A lot of you are probably familiar with R-squared, but essentially, the closer to one, the better our machine learning has learned whatever we're trying to emulate; the lower, around zero, the worse it is at emulating; and you can even have something negative, which is definitely possible.
E: A negative value essentially means that the unexplained variance of your machine learning model is more than the actual variance of your data, so it didn't learn anything. So keep an eye out for basically white space on my R-squared plots; that just means that in those regions it didn't work.
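This R-squared convention (1 is perfect, near 0 is no skill, and negative means the emulator is worse than just predicting the mean) can be checked directly with scikit-learn on a toy example:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# A close emulation explains most of the variance: R^2 near 1.
good = r2_score(y_true, [1.1, 1.9, 3.0, 4.1])

# An anti-correlated "emulator" does worse than predicting the mean,
# giving a negative R^2 (the white space on the R^2 plots).
bad = r2_score(y_true, [4.0, 3.0, 2.0, 1.0])
```

`r2_score` computes 1 - SS_res / SS_tot, so nothing prevents the ratio from exceeding one when the residuals are larger than the data's own variance.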
E: So, some results for the dry case. This is the very simple case where we have the Newtonian temperature relaxation. On the left is a zonal-mean, time-mean plot of the actual CAM output over the testing data; in the middle we have the machine learning, a random forest, and we see it's very close. This is what we should expect to see.
E: It's a very simple problem, though it was something we weren't seeing for quite a long time; we recently got it working, finally. And then (oops, clicked away from something) the R-squared for that reinforces how skillful the predictor is: 0.96, I think, was the minimum. So we're looking at something that learns very well, but it's learning something very easy. It's easy to get excited, but also, you know, take it with a grain of salt.
E: That was a very easy problem, so the more interesting results start with the moist case, for that dT/dt. This is the same CAM model on the left, the random forest in the middle, and then the difference on the right, and we see that, overall, the structure is there.
E: We're hovering right around that minus 0.1 to 0.1 Kelvin per day, which is encouraging, and it's further reinforced in our R-squared: over pretty much the majority of the domain it's around where we want to see it, 0.9 and above, with a little bit of negative R-squared here at the poles and in the lower levels.
E: We do have this poor performance close to the surface, and I think I know where that's coming from; I have a new run going right now to try to address it. Essentially, it's because I'm hyperparameter-tuning with Sherpa on a level around 800 to 850 hectopascals, so it may be tuned well for everything above that level, but not below it.
E: So I'm trying something a little closer to the surface right now; we'll see if that improves the low-level difficulties. For dq/dt, this is probably the most difficult, and also the most challenging for machine learning to emulate. Overall, structurally, in the zonal-mean time-mean it looks similar, but we're overshooting, I should say, in the equatorial region with the moisture. Even worse, in the R-squared we're seeing significant negative values, which basically means it didn't learn well, and where it did learn, it's not anything to be too excited about: the 0.2 to 0.6 range. Definitely something that needs to be improved upon. This was a first attempt, just last week, so I do expect I can improve it somewhat in the next couple of weeks.
E: The precipitation, however, was a little more encouraging, kind of in between the dT/dt and the dq/dt. On the left here, this is now just a time mean, since it's a surface field, so you have latitude on one axis and longitude on the other, and we see pretty good agreement between the equatorial region and the mid-latitudes, where most of the excitement is actually happening.
E: The difference plots also look good. I think if I had done a zonal mean on this, the line plot would have been a little more interesting, but I do like the panels too. These results actually just finished this morning, so I didn't have a great amount of time to mess around with them, but the R-squared is also encouraging.
E: We have 0.6 and above for most of the regime, and then these pockets of low, sometimes negative, R-squared in the transition region between where the activity is, at the equator and in the mid-latitudes; just outside that equatorial region seems to be a little difficult for the machine learning to predict. But we do see an overall skill of around 0.85, and that's encouraging for sure.
E: Also, this was the first attempt with a random forest, so I'm excited for a first attempt to be this skillful. And just to go back: we don't see any negative precipitation on these mean plots for the machine learning predictor, which, as I mentioned, is part of the benefit of using a random forest.
E: I will quickly play this video, which was generated by Christiane from our old one-degree simulation, using a neural network to parameterize the large-scale condensation, or precipitation, and we see there's a lot of skill: the flow of the precipitation over the test data is very similar between the two. This is not a mean, just an eye test in a movie. And here I show the time mean, just like the one before, but we see these little pockets: on the right we see the skill in the peaks, in the equatorial region and the mid-latitudes, but we also see pockets that accumulate negative precipitation in the mean. Not just a little bit, but enough to be noticeable, and this could cause instabilities or other problems if we were to couple it back to the GCM.
E: So, you know, neural networks are good, but they are difficult to tune, at least in my experience, and they do come with their negatives as well. That's something we're interested in diagnosing and addressing in the future. I'll just leave the summary up, because I think I'm running late, though I don't know how late we started. So hopefully we can answer some questions, but yeah, I'll conclude there.
B
Thank you, Garrett, thanks very much, really interesting work, and thanks for including those late-breaking results. Let's take one quick question; we do have one that just came in the chat, so we'll read that: thanks for your talk, what is the reason behind leaving a four-year gap between the training and test data?
E
Of course, four years is well beyond what I actually need as far as a gap there; a few weeks to a month would be fine. But I used four years just because I wanted to take the last part of my data for testing, and I didn't want to test on too much, because you're just overloading your computational resources; let's say I'm running it on about six years.
E
If I ran it on nine years, you're not gaining a whole lot more insight into the predictability of your machine learning model. So I leave four years; for the three-year case I left only about three to four weeks, which is more in line with where you want that gap to be, to avoid autocorrelations with the climate signals. In these simplified test cases it's not as important, but it's good practice for when you do go into more complicated, more realistic things.
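The train/test split with a buffer described above can be sketched in a few lines. This is a generic illustration, not the speaker's code; the series length and gap size are invented for the example:

```python
import numpy as np

# Toy daily "climate" series: 10 model years, 365 samples per year.
n_years, per_year = 10, 365
t = np.arange(n_years * per_year)

# Hold out the final years for testing, and discard a buffer between the
# training and test periods so autocorrelated samples don't leak across.
gap = 28                      # ~ a month; the speaker used up to 4 years
test_len = 2 * per_year       # last two years for testing

test_idx = t[-test_len:]
train_idx = t[: -(test_len + gap)]

# No training sample falls within `gap` steps of the test period.
assert train_idx.max() + gap < test_idx.min()
```

The point of the gap is that adjacent climate samples are correlated, so a contiguous split without a buffer lets information leak from training into testing.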
B
F
Okay, so hopefully you can all see my slides. Hello, I'm a postdoc at Colorado State University working with Dr. Elizabeth Barnes, and today I'm going to be talking about a slightly different aspect of machine learning that we haven't discussed quite yet: the idea of explainable machine learning.
F
We have this machine learning model that makes a prediction, but machine learning is often regarded as a black box: how is the model using its internals, like its neural networks, to actually make its prediction? To get everyone started: my background is in climate science, so something I often think about is maps, and in particular, let's think of a map of surface temperature.
F
One way we can think about maps of temperature is to calculate the global mean temperature. So right here you're seeing a time series from observations; I'm using the third generation of the NOAA 20th Century Reanalysis dataset. You see your classic climate change and variability signal going forward until 2015.
F
Another way we can think about global temperature, which many of you are familiar with, is running climate models. So here I'm adding the CESM1 Large Ensemble. As all of you are aware, large ensembles allow us to really think about natural variability and noise through each ensemble member, in addition to averaging across members to understand the forced response and the patterns of regional climate change and variability.
F
And of course, then we have everything else that's in the model, things like internal variability, which those ensemble members really allow us to capture. But it's really challenging, if we're thinking about climate change attribution, to disentangle what is being affected by things like greenhouse gases versus industrial aerosols.
F
What you're doing here is taking the CESM1 Large Ensemble and, for different simulations, fixing one of the forcings. In my case, let's think about AER+: it's your fully forced CESM Large Ensemble, but with greenhouse gases fixed at 1920 levels, so the predominant, or dominant, forcing in this case will be aerosols.
F
As I've already mentioned, these different external forcings can affect regional climate variability, and aerosols in particular remain a big uncertainty, even for understanding 20th-century historical climate change. So the question is: using explainable AI, can we gain new insight into understanding forced climate signals from these different forcings?
F
So I'll return to that idea of a surface temperature map, and I'm going to set up a very simple problem for a neural network; the problem itself is really not interesting per se. To explain what I mean by that: I'm going to take a temperature map, with every latitude and longitude point as an input, as one sample, and feed it into an artificial neural network.
F
This is a very shallow neural network; adding hidden layers doesn't really affect its accuracy. The prediction from this neural network is some metadata about the file: in particular, what year is that map coming from? Now you could argue: well, I already know that; if I read in, let's say, a NetCDF file of temperature.
F
I already know the year of the temperature map, but what's really interesting is that we can now use these explainable AI methods to understand how the neural network is making its decision. There are many different methods for explainable AI; I'm going to focus on one called layer-wise relevance propagation, or LRP, in case you're not familiar with it.
F
Essentially the concept, through a simple example: let's say you're doing an image classification problem. I'm inputting an image of a wolf into the neural network, and I'm hoping it classifies it as a wolf. In this case it does.
F
But now I can use the explainable AI method called LRP, and it's going to provide me a heat map of where the neural network is looking, what contributed to its decision that it was a wolf. In this case, you're going to get a heat-map value for every point of that image, and you can see that it resembles a wolf. We can do that for other examples.
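The core idea of LRP can be shown exactly in its simplest special case: for a single linear layer, the basic LRP-0 rule attributes the output back to each input in proportion to that input's contribution, and the relevances sum to the explained output (the conservation property). This toy example is only that special case, not a full multi-layer LRP implementation; the numbers are invented:

```python
import numpy as np

# For a single linear layer z = sum_i x_i * w_i, the LRP-0 rule assigns
# each input the relevance x_i * w_i -- its share of the output.
x = np.array([1.0, 0.5, -2.0, 0.0])     # inputs (think: pixels)
w = np.array([0.3, -0.8, 0.1, 5.0])     # learned weights

z = x @ w                               # network output
relevance = x * w                       # per-input relevance ("heat map")

# Conservation: relevances sum to the output being explained.
assert np.isclose(relevance.sum(), z)
```

In a deep network this redistribution is applied layer by layer from the output back to the input, which is what produces the heat maps shown in the talk.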
F
Here's an image of a volcano, and now we can see an outline of a volcano from the LRP, showing which points helped the neural network make an accurate decision.
F
And lastly, I like sharks: here's an image inputted into this neural network, which correctly classifies it as a great white shark, and the heat map from this explainable AI shows where the neural network looked to make its decision. I'll add that layer-wise relevance propagation is, of course, not the only method of explainable AI, and it's not necessarily perfect; it's subject to interpretation.
F
So again, I'll return to my problem: the idea of taking a temperature map from climate models and observations and predicting the year, which of course is not that interesting. But now we can produce these heat maps of where on the temperature map the neural network looked to be able to predict the year. This allows us to no longer think of machine learning models as black boxes. So, to get to the data, and just to show what the raw data looks like for these different large-ensemble simulations:
F
In the first simulation, where aerosols are dominating and greenhouse gases are fixed, we have cooling during the late 20th century due to increasing aerosols. In the simulation with fixed aerosols, we see even greater warming across the world due to greenhouse gases evolving through time. And in the ALL simulation, where both aerosols and greenhouse gases evolve over time, the temperature trends are somewhat less warm, due to that aerosol interaction, when compared to GHG+.
F
So now I'm finally going to get to the prediction of our simple neural network, again the less interesting aspect of it. What you're seeing here is the prediction output. How to read these plots: again, I'm predicting the year, and the white line is the one-to-one line. We want our data to follow the one-to-one line, which would indicate that the model correctly predicts the year of the maps.
F
And now you can see where the differences really emerge. In the case where there are no time-evolving greenhouse gases, our model does not correctly predict the year in AER+. We can also see it does pretty well in GHG+ and in ALL, the typical CESM1 Large Ensemble. One proxy for how well the observations are being predicted is the slope, or the R-squared, of these predictions, and in this case, what's really interesting is the following.
F
We can then make sure that this wasn't just by chance, by trying different combinations of training and testing data; in this case, I'm drawing training and testing data from different ensemble members in each of these three simulations. So we can test different combinations of random initialization seeds and of training and testing data; these are histograms of the slope for the observations.
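The robustness check described here, repeating the fit over many random seeds or splits and collecting the slope of predicted versus true year into a histogram, can be sketched as follows. The noisy "predicted year" series below is synthetic; it only stands in for the retrained model's output under each seed:

```python
import numpy as np

# Toy check: for many random "retrainings", fit predicted year against
# true year and collect the slope, so one lucky split can't mislead us.
true_year = np.arange(1950, 2016).astype(float)

slopes = []
for seed in range(200):
    local = np.random.default_rng(seed)
    # Stand-in for the model's prediction under this seed: truth + noise.
    pred_year = true_year + local.normal(0, 5, len(true_year))
    slopes.append(np.polyfit(true_year, pred_year, 1)[0])

slopes = np.asarray(slopes)
# A histogram of `slopes` (np.histogram(slopes, bins=20)) summarizes the
# spread; values clustered near 1 mean the prediction tracks the year.
assert abs(slopes.mean() - 1.0) < 0.1
```

If the distribution of slopes sits well away from zero across all seeds, the skill is not a fluke of one particular train/test combination.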
F
What's nice about LRP is that we get a heat map for each year over time. Instead of a simple regression problem, where we would get one map of regression weights, here we're actually getting LRP maps over time. How to read these: the whiter, or higher, relevance values are more important for the network in making its decisions. I'll point out that one aspect of LRP is that it leverages nonlinear patterns, or correlations, across space.
F
So we can't directly infer that surface forcing over a particular region is related to higher relevance. One realistic example of that is the Arctic: we know there's a big climate change signal there, even with the large internal variability, but in our simulations the neural network is almost never using the Arctic to make its predictions.
F
I realize there are a lot of maps on this slide. If we break down this period, from the mid-20th century into the mid-21st century, we can look at our three different simulations and compare them: where is the network looking in the one that's aerosol-driven, versus the greenhouse-gas-driven one, versus the realistic forcings on the right?
F
So now we can run our network many different times, to see whether that was just a fluke of the hyperparameters, and make a histogram of the relevance over different regions. Here I'm outlining the North Atlantic: a histogram that sits further right indicates there was more relevance in the North Atlantic for GHG+ in making its prediction for observations. We can then compare different regions; if you focus on Southeast Asia, we can see the same for GHG+, where there are fixed aerosols.
F
I'll just end with my key points. I really believe there is a lot of potential for explainable AI to reveal patterns of climate change and variability, as a pattern-recognition method across space that leverages those nonlinear relationships. An interesting result is that, at least for how our artificial neural network is being trained, it's producing a higher correlation with the actual observations without time-evolving aerosols.
F
Now, the interpretation of why that is occurring is difficult, but potentially these patterns of forced climate change, such as over the North Atlantic, may be closer to the forced signals in observations, suggesting again the importance of looking at our climate models and how sensitive they are to very small changes in aerosol forcing. A paper was recently released about this, and I'm happy to take any questions. Thank you.
B
G
Hey Zack, nice talk, as always. I have a quick question about LRP: there's a paper submitted by someone in your group about how LRP may not be the best explainable AI method.
G
F
So what we've done, in the paper you're referencing, is take a look at the pros and cons of the interpretation of the different backpropagation rules. We've repeated these results with different LRP propagation rules that get rid of some of the issues that have arisen, and we actually see that the patterns, at least the spatial patterns in our setup, remain the same.
F
One other opportunity for this type of work, particularly for comparing different climate models, is an explainable AI method called backwards optimization, which would really allow us to look at the differences between climate models, maybe useful for, say, CMIP5 versus CMIP6. So there are a lot of really exciting explainable AI methods that let us dive in, and they're especially useful for climate science.
B
H
Oh okay, I actually don't know how to swap displays.
H
So the people involved: I'm Kevin Raeder, working in the Data Assimilation group at NCAR. I had a lot of help putting together this dataset from the team; Jeff Anderson is the PI, plus lots of other helpers, whom I have listed there. I'd like to thank Ian Grooms for making the first query about using this dataset for machine learning.
H
It hadn't occurred to me yet, but I think it may have promise. I'd also like to thank Katie and Maria for exploratory discussions of using machine learning in the CESM context.
H
Labeled datasets are usually difficult and expensive to produce because of the large amount of time needed to label the data. In my case, we'll see that the difficulty was not labeling the data; it was producing the data, and the labeling came along with it. Context for this data: a reanalysis is a picture of the state of the atmosphere, or any other system you have, which uses the information in both the model hindcasts and the observations, and takes account of the uncertainties in both of them.
H
Some of it is labeled, or highly labeled; I'm hoping you all can shed some light on that. As I mentioned, we know about data assimilation and these datasets, but we don't know much about machine learning.
H
The atmosphere is CAM6 at one-degree resolution with 32 levels; that's called the workhorse model for CESM. It had an active land model, active sea ice, and an active river model, so those are time-evolving and creating their own data, and it was forced by data SSTs (sea surface temperatures) from AVHRR.
H
Consistent here means that it's a balance of the information in the observations and the model hindcast, and it explicitly accounts for the uncertainties in those two main sources of information. Those uncertainties are represented by the observational errors that come along with the observations and by the model ensemble spread.
H
So we feel like this might be an unprecedented dataset, but I'd like your input on that. It's 80-member ensemble output, which is pretty big by most standards in the atmospheric sciences; the Large Ensemble experiments and others like them, and ensemble data assimilation, have shown the value of having a large ensemble.
H
So the first thing is the instrument that took the measurements and the quantity observed, like the temperature. It also includes the actual observation value, for instance 290 kelvin. It includes an observation error estimate, which we use in the assimilation process but which might be useful in machine learning also. Then it also has an ensemble of model estimates of the observation; that's another crucial part of the assimilation process, but maybe a unique form of data for machine learning.
H
Other highlights: it comes with quality-control labels for both the input and the output observation types, and it also contains the observation locations and times. There are up to a million observations in each of these assimilation time windows, and we have 13,000 assimilation windows, so I think that's a lot of data to work with.
H
One thing I wanted to mention, too, is that combining the observation error with this ensemble spread gives something we call the total spread, which is a better measure of the consistency of the observations with the model. So that's something that can be derived and used.
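The total spread can be derived from the dataset exactly as described: the observation error and the ensemble spread combined in quadrature. A small sketch, with invented numbers (the variable names are mine, not the dataset's field names):

```python
import numpy as np

# Total spread: observation error and ensemble spread in quadrature,
# used to judge how consistent an observation is with the model.
obs_error = 1.2          # observation error std dev (e.g. kelvin)
ensemble = np.array([288.9, 290.4, 289.7, 290.1, 289.4])  # model estimates

ensemble_spread = ensemble.std(ddof=1)
total_spread = np.sqrt(obs_error**2 + ensemble_spread**2)

obs_value = 290.0
# Innovation measured in units of total spread (a consistency check):
z = (obs_value - ensemble.mean()) / total_spread

assert total_spread > max(obs_error, ensemble_spread)
```

Because both uncertainty sources are stored with each observation, this quantity can be recomputed for any subset of the data.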
H
Another dataset we have is generated as follows; I'll give you the background. CESM can be configured to run surface components such as CLM or POP using a data atmosphere read from a file instead of calculated in CAM. So here's the data atmosphere: it feeds data to the coupler, and the coupler gives it to whichever components want it. These are surface forcing files, usually fluxes, sometimes just the state at the surface.
H
These are two-dimensional gridded data; there's very little metadata in them, but the data values themselves are very useful for running these surface models. I'd also like to point out that it's expensive to produce these, or to reproduce them if you wanted to try that. These data forcing files can be used to force all of these surface components.
H
A third form of the output data: we have an ensemble mean of the CAM model state, which, as I mentioned before, is the surface pressure, temperature, etc., and that's available every six hours. At the same time, we have the 80-member ensemble, which is available weekly. I've sketched it out here graphically: each column is one time, and we've got our 0-hour, 6-hour, 12-hour, and so on.
H
So this is my final slide; I'd like to open it up for discussion. The questions I'm particularly interested in are: which datasets seem most, or least, useful for machine learning? Here's a summary of the four of them. How would they be useful?
B
Thanks, Kevin, for summarizing here. I think these are really helpful questions to think about, along with points like cloud computing, which you brought up. Any questions or responses to these discussion points? Feel free to raise your hand or put something in the chat. We can also be thinking about these throughout the rest of the session and come back to them at the end, but I'll pause for a moment here and see if there are any questions.
G
Sorry, I was going to see if someone else raised their hand; if someone has a question I can definitely yield, but I had a quick question, Kevin. I was wondering if you are also making available the observations (oh yeah, I see it says observation files) that were used to create the reanalysis product. I'm just thinking more broadly about areas where there are very limited observations and what that means for training a machine learning model, etc.
H
Yeah, the complete observation set is available within this dataset, and you can make maps of the distribution, you can analyze them statistically; they're all available for use.
H
Great. I lost my cursor; there we go. That's what the instrument is, and the quantity observed. So it may say satellite cloud-top wind velocity, or it may say a radiosonde balloon moisture measurement; that's all part of the metadata that's in the observations.
G
B
Thanks so much. Yeah, certainly having such a large data size is definitely a benefit for machine learning and the amount of data that we need for training. I think it probably just depends on the question, and hopefully people can be thinking about that during the rest of the session. There's a comment in the chat that I'll just read before we move on.
H
B
I would think that frequency distributions of observed and modeled variables would be important for testing the veracity of machine learning methods. So, similar to what Maria was saying, we can use datasets like this for validation of machine learning algorithms, which I think is another potentially powerful use for this dataset.
B
Great, thanks, Kevin. I think we should move on to our next speaker, who is Chris Fletcher from the University of Waterloo.
H
I
Okay, great, thank you very much. I will try and share mine; give me a thumbs up or something if you can see my title. Looks great? Okay, perfect. Well, thanks very much for the introduction, thanks for having me here today, and thanks to everyone who's watching for coming along. I'd like to begin this talk by showing this beautiful picture of a very high-resolution global simulation; this is from a paper in JAMES last year.
I
This is a 1.4-kilometer global climate simulation, and I think it's fair to say that if we all had our top-10 list of things that we would like as climate modelers, a very high-resolution global climate model or Earth system model would be right at the top, because to support decision making and adaptation efforts we really need that high-resolution simulation, particularly for hydrologic change.
I
Unfortunately, computational resource limitations mean that we're far from this right now, so the work I'm going to describe today is an attempt to use machine learning as a way to augment, or improve the efficiency of, model development for higher-resolution ESMs. This is a proof of concept, or an early study, and it's work in conjunction with a PhD student in computer vision at the University of Waterloo.
I
That's Will McNally, along with Jack Virgin, a climate scientist and a PhD student in my group in geography as well, and I'd like to acknowledge Microsoft and the AI for Earth program, through the AI Institute at Waterloo, for funding.
I
So I've already said we need high resolution; this is pretty clear. But in terms of CMIP6, we're basically at the hundred-kilometer grid-spacing scale, and so to get to that high resolution, various methods that I'm lumping into the general term "downscaling" are required. We've heard many talks this week at the workshop about variable resolution, where you have a global model operating mostly at lower resolution and you focus in on a particular region.
I
At high resolution, you can do dynamical downscaling with a limited-area model, or high-resolution statistical downscaling. All of these, in my opinion, are sub-optimal: the optimal solution would be a higher-resolution global ESM, but we're prevented from having that by computational limitations, and one of the places where there's a barrier is the following.
I
I think one barrier to getting these high-resolution ESMs even off the ground is in tuning and calibration. So what I'm going to show today is a way that we believe we can use a machine learning technique from computer vision, namely convolutional neural networks, to support this calibration effort and reduce the amount of CPU time required to run simulations with the higher-resolution models.
I
We heard a nice talk earlier in the week from Cecile Hannay about tuning and calibration of CESM, and Cecile outlined the multi-step calibration and tuning process really nicely. What I'm going to focus on today is the way we believe we can insert machine learning right into this tuning.
I
The tuning and calibration process, at its heart, is about finding the optimal values of uncertain parameters, and I believe we can use machine learning to help us do that; I'll show some results about that. I'm going to present results that are purely in the atmosphere-land-only, uncoupled framework, but I do believe there is potential to extend this methodology to incorporate the fully coupled, multi-component modeling framework.
I
That's what's required to find energy balance and optimize the calibration for transient simulations, for example, with CESM. So: a set of simulations that we ran with a fairly old version of CESM, in particular the atmosphere component CAM4, with fixed prescribed pre-industrial SSTs and sea ice.
I
What we're trying to do is investigate the impact on the atmospheric model simulation of a number of uncertain atmospheric parameters, so we're running a perturbed physics ensemble, or PPE; in fact we're running three of them, each of a hundred members, at different resolutions. So we have a one-degree, or f09, PPE; that's our highest resolution.
I
In this example, we also have a two-degree version and a four-degree version, the f19 and the f45, and the nine parameters are shown here, along with the ranges we vary them across. I'm not expecting you to take this information home; the actual details of the parameters we're varying are not that important for this presentation. It's more an example of how the calibration of uncertain parameters can be incorporated and improved in this machine learning architecture.
I
We've run each of the members in the PPE for three years, so 36 months, and all of the results I'm showing you today will be an analysis of the annual-mean, three-year mean, so the climatological mean. We have to upscale the lower-resolution outputs to the higher-resolution grid, 192 by 288, just to make the numerics of the CNN work. The convolutional neural network, or CNN, is a technique borrowed from computer vision, and normally in computer vision
I
you go from right to left in terms of the order of operations. In computer vision, the task is to start with a complex image and progressively simplify it to identify the key features, similar to Zack's talk, where he was showing the identification of a type of animal or a kitchen item or whatever. Now, Will McNally, the grad student who's working with me on this project,
I
had the insight that if you invert this process, you can actually go from a very low-dimensional, in fact nine-dimensional, input of parameter values. So we have those nine parameters, and then through a series of convolutions and reshapings you can increase the complexity right up to the point where the model outputs, as its predictions,
I
this fully resolved global array of seven output variables: things like low cloud fraction, total precipitation, net radiative flux at the top of the atmosphere, all that kind of thing. All of these outputs come from a single iteration through this CNN architecture.
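The "inverted CNN" dataflow, going from a 9-vector of parameter values up to a full lat-lon map, can be sketched with shapes only. This is not the authors' architecture: the real model uses learned transposed convolutions, while here the weights are random and the upsampling is nearest-neighbour, purely to show how nine numbers can be expanded to the f09 grid:

```python
import numpy as np

rng = np.random.default_rng(3)

params = rng.uniform(size=9)                    # the 9 perturbed parameters

# Dense layer to a coarse 12 x 18 "latent map" (weights random here).
W1 = rng.normal(size=(9, 12 * 18))
coarse = np.tanh(params @ W1).reshape(12, 18)

def upsample2x(a):
    """Nearest-neighbour 2x upsampling (stand-in for a transposed conv)."""
    return np.kron(a, np.ones((2, 2)))

x = coarse
for _ in range(4):                              # 12x18 -> 192x288
    x = upsample2x(x)

assert x.shape == (192, 288)                    # the 1-degree output grid
```

In the trained model there would be one such output map per predicted variable (seven in the talk), all produced in a single forward pass.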
I
So we provide as input the nine parameters, and the CNN, once it's fully trained, will output this array of global maps of our output variables. We use a cross-validated training and testing methodology to assess the accuracy of this CNN at predicting outputs from CAM4, and just to show you what it looks like in the one-degree mode:
I
We have a 100-member ensemble; we randomly sample 80 cases, then try to predict the unseen 20 cases, and we repeat that whole thing 40 times so we can get a sampling distribution and assess the skill. We assess the skill using not a sum of squares but a skill-score metric from Pierce et al. (2009).
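The skill metric is described as similar in spirit to the Kling-Gupta efficiency: it combines pattern correlation, variance ratio, and bias. The sketch below implements the standard KGE form as a hedged stand-in, not the Pierce (2009) score actually used in the talk:

```python
import numpy as np

def kge(pred, obs):
    """Kling-Gupta efficiency: 1 is a perfect match."""
    r = np.corrcoef(pred, obs)[0, 1]            # pattern correlation
    alpha = pred.std() / obs.std()              # variance ratio
    beta = pred.mean() / obs.mean()             # bias ratio
    return 1 - np.sqrt((r - 1)**2 + (alpha - 1)**2 + (beta - 1)**2)

obs = np.array([2.0, 3.0, 5.0, 4.0, 6.0])
assert np.isclose(kge(obs, obs), 1.0)           # perfect match -> 1
assert kge(obs + 1.0, obs) < 1.0                # bias is penalized
```

A metric of this shape is appropriate when matching a spatial map, because it rewards the right pattern and the right amplitude, not just a small squared error.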
I
This is a very common metric, very similar to the Kling-Gupta efficiency; it takes into account bias and RMSE, but also the spatial pattern of the data, the correlation and the variance ratio.
I
So it's appropriate when you're trying to match a spatial map. Another important detail to mention before I move on is that we're using the CNN not to predict the raw output from CAM4 but to predict the differences in the output that are due to the parametric changes. In each ensemble member we're perturbing nine parameters, and what we want to predict is the impact those nine parameters have on a particular output field.
I
Okay, so here's the first results slide. The left column shows the output from CAM4 for a single realization in the ensemble; again, these are differences due to the parameters being perturbed. We've got the RESTOM, the top-of-atmosphere (or top-of-model) radiative balance field, which is a fairly smooth, fairly flat field, and then we've got the total precipitation, which of course has a lot of spatial structure.
I
This is the one-degree case, and then the two columns here are the CNN-MSE and the CNN-SS columns.
I
So we can look at skill scores and we can look at MSE, and overall we can see that the accuracy of this CNN is pretty good for an individual realization. Zooming in on the precipitation field, and averaging over all realizations, the cross-validated skill score is about 0.8 for this CNN using the skill-score loss function. For precipitation in particular, which is the hardest target for the neural net,
I
it's at 0.73, and it's producing maps that look like this. If you compare the maps on the right and the left, there's a lot of detail here in the Asian monsoon regions, the northern part of South America, the separation of the ITCZ; there's a lot of detail that the CNN is able to capture. Just to remind you: the only information this model has been given is those nine parameter values at the beginning.
I
Okay, now for the final part of my talk: a slightly more applied example of how we think this might be used operationally down the road to make calibration and tuning more efficient. I mentioned at the beginning that we actually have three PPEs.
I
We train different versions of the CNN, gradually increasing the number of high-resolution examples that we show it, so we're progressively giving it more and more of the data that it really needs to make predictions of high-resolution outputs. Once again we hold out 20 unseen cases and repeat the whole thing 40 times to assess the skill, and we can compare our CNN output to the baseline we would obtain by simply predicting the climatological mean difference due to the parametric changes.
I
So here's the main result slide for the talk. Starting in the panel on the left: on the y-axis is the skill score, and on the x-axis the skill is shown as a function of the number of higher-resolution cases incorporated in the training, so a value of zero means there are no high-resolution cases.
I
Overall, we see a plateauing in the skill, for all seven outputs we predict, around 40 cases. So basically, if you can run your higher-resolution model about 40 times, you've seen about as much skill gain as you're going to see; there are diminishing returns above 40 cases. This is where the efficiency comes from.
I
You can run the lower-resolution versions of the model, and they provide a fair amount of information about what the higher-resolution outputs are going to look like, and then you can feed the CNN smaller numbers of higher-resolution runs. In that way you can save yourself a lot of time, because you don't have to run as many of those combinations of cases at high resolution at the beginning.
I
Okay, I'm going to conclude with a few thoughts. We would argue that this method, using a convolutional neural network, could potentially support the calibration of higher-resolution models; the prediction skill from the CNN is reasonably good.
I
I think that is a rather subjective measure, though, and it would be really interesting to have discussions with model developers as to whether the precipitation field, for example, that we end up predicting from the CNN would be useful information, or whether the finer-scale details in the original CAM4 simulation, which aren't predicted by the CNN, would be required.
I
I've
made
a
statement
here
that
you
know
in
in
this
in
this
setup
the
having
the
cnn
in
place
of
just
running
the
higher
resolution
model.
100
times
you
know,
reduces
the
amount
of
cpu
time
we
need
by
about
20
to
40
percent.
So
there
is
a
considerable
cost
saving
and
you
know
you
might
argue
that
if
those
relationships
across
scales
were
to
hold
that
that
saving
might
be
even
more
profound
as
you
go
to
to
get
higher
resolutions.
I
The highest resolution in this example is just one degree, the standard CMIP6 resolution; it's not particularly high. So the key question for extending this work is whether or not we can push this to 0.25 degrees, 0.1 degrees, or further, and maybe to time-evolving and multi-component situations. I'm going to leave it there, just to acknowledge my co-authors, Will McNally from computer vision and Jack Virgin; we have a manuscript in preparation. Thanks a lot for your attention.
B
Thanks, Chris, for that really nice talk. We did have a question come in via the chat during the talk, so we'll read it. From Chuck: just as there are multiple methods of downscaling, are there multiple methods of upscaling, and does the choice of upscaling method significantly influence the end results?
I
Yeah,
that's
a
great
question
thanks,
so
the
upscaling
happens
basically
to
get
our
data
sort
of
con
conforming.
So
I
forget
where
I
mentioned
it
here
yeah.
So
we
have
sort
of
different
different
resolutions
of
our
training
data.
We
have
to
get
them
all
kind
of
conforming,
and
so
you
can
either
sort
of
degrade
the
high
resolution
or
you
can
upscale
the
low
resolution.
We
we
decided
to
sort
just
use
a
bilinear
interpolation
to
get
these
lower
resolution
fields
onto
the
one
degree
grid.
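The bilinear regridding step described here can be sketched in a few lines: interpolate along one axis, then the other. This is a minimal NumPy illustration on a synthetic lat/lon field, not the study's actual regridding code (which would typically use a dedicated package such as xESMF).

```python
import numpy as np

def bilinear_regrid(field, lat, lon, new_lat, new_lon):
    """Bilinearly interpolate a 2-D (lat, lon) field onto a new grid."""
    # first interpolate every source-latitude row along longitude
    tmp = np.array([np.interp(new_lon, lon, row) for row in field])
    # then interpolate every target-longitude column along latitude
    return np.array([np.interp(new_lat, lat, tmp[:, j])
                     for j in range(len(new_lon))]).T

lat = np.linspace(-90, 90, 91)        # 2-degree source grid
lon = np.linspace(0, 358, 180)
coarse = np.outer(np.sin(np.radians(lat)), np.cos(np.radians(lon)))
new_lat = np.linspace(-90, 90, 181)   # 1-degree target grid
new_lon = np.linspace(0, 359, 360)
fine = bilinear_regrid(coarse, lat, lon, new_lat, new_lon)
print(fine.shape)  # (181, 360)
```

At grid points shared by both grids, the interpolated field reproduces the source values exactly, which makes this kind of regridding easy to sanity-check.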
I
We tested a few other upscaling methods, such as cubic interpolation, and it didn't seem to make a material difference to the conclusions; our CNN skill was fairly robust to those changes. But we haven't explored any more advanced upscaling methods, and it would be interesting to know whether those might have an effect.
B
Yeah, related to that: it's kind of unique that you have the different resolutions for the PPEs, so I was wondering if you see significant differences in the PPE spreads at different resolutions, since you do have consistent parameters. Does that impact the spread of the resulting ensemble members?
I
Yeah,
that's
a
really
good
question
actually
off
the
top
of
my
head.
I
I
don't
know
I
need.
I
need
to
look
at
that.
That
would
be
something
very
interesting
to
look
at.
I
do
recall
from
the
anderson
and
lucas
paper,
which
is
the
the
the
previous
example
that
I
I'm
aware
of
where
I
have
this
multi-resolution
ppe-
that
they
did
show
that
the
distributions
the
pdfs
of
the
the
outputs
for
the
different
resolutions.
I
think
that
was
a
cam
5
study.
B
Great, thanks. I think there are a bunch more questions in the chat, but I'm wondering if we should move on to the last talk, which will be given by me.
B
Okay, I hope you can see that. Yes? Great. So, thanks, everybody, for the talks so far. I've been tasked with trying to give an overview of all the CESM-related machine learning projects here at NCAR, as well as some collaborations with external folks. So let's jump right in; I don't think I have to motivate this too much.
B
I
thought
I'd
just
break
it
down
by
cesm
components
and
we
actually
do
have
a
pretty
good
sampling
of
different
projects
thanks
to
input
that
different
folks
at
ncar
sent
me
on
what
they're
working
on,
and
so
I'm
going
to
start
with
a
couple
of
projects
that
I've
been
involved
in
and
the
first
one
is
actually
from
the
land
side,
so
very
similar
to
the
work
that
chris
was
just
talking
about.
I've
done
some
emulation
and
calibration
with
ppes
and
clm5,
using
machine
learning
to
emulate
clm
and
then
I'm
moving
to
the
atmosphere
side.
B
We'll also highlight some work by Andrew Gettelman and others looking at parameterization in CAM6. Then, staying in the atmosphere, Maria Molina is leading work on using machine learning for Earth system prediction, with a number of us here at NCAR and also a few external folks.
B
We'll
also
highlight
some
work
by
alice
duvivier,
looking
at
moving
into
polar
modeling.
So
looking
at
process
understanding
for
sea
ice
and
then
finally,
we'll
highlight
some
work
by
scott
bachmann
and
marcus,
using
sort
of
a
framework
to
couple
hpc
with
machine
learning
and
that's
with
the
ocean
model.
That's
with
mom
6.
B
Okay, so starting with the projects I'm involved in. Again, it's really nice to follow Chris's talk, because what I've done with the land model, with CLM5, is to train a series of artificial neural networks to emulate CLM5 output, given different sensitive parameters as input. What this does is allow many fast computations with an emulator instead of running the full model.
B
So
you
can
test
lots
of
different
parameter
values,
and
this
is
work
with
ben
sanderson,
rosie
fisher
and
dave
lawrence,
and
so
really,
what
you
can
do
here
is
do
a
series
of
emulation,
calibration
and
testing
procedures
so
on
the
left,
I'm
showing
sort
of
assessing
the
skill
of
the
clm
emulator
for
a
particular
output
variable.
B
That's
then
run
with
those
optimized
parameter
values
of
relative
to
observations,
and
you
can
compare
that
with
the
default
model
bias
on
the
right.
So
what
we're
finding
is
that
we're
getting
improvement
in
some
regions
and
degradation
in
others.
This
is
for
again
for
gpp,
looking
spatially
kind
of
highlighting
the
difficulty
of
calibrating
globally
with
these
different
parameters,
but
there's
more
detail
in
this
paper
that
came
out
last
year.
B
So
what
we'd
like
to
do
is
use
machine
learning,
based
detection
to
automate
the
classification
of
different
synoptic
scale,
weather
features,
and
so
what
we're
showing
here
is
that
we've
been
able
to
apply
some
existing
algorithms
that
are
both
based
on
pre-trained,
convolutional,
neural
networks
or
cnns,
and
on
the
left.
I'm
showing
detection
of
atmospheric
rivers
and
tropical
cyclones-
and
this
is
using
the
pre-trained
climate
net
algorithm.
B
There's
a
paper
here
per
bottle
that
has
more
details
on
that
algorithm
to
detect
ars
and
tcs
globally
in
high
resolution
or
quarter
degree
coupled
cesm
simulations
and
then
on
the
right.
I'm
showing
a
different
machine
learning
algorithm
called
dl
front,
and
this
is
to
detect
different
front
types,
and
this
was
developed
by
jim
bart
and
ken
kunkel
at
nc
state.
So
there's
a
reference
here
for
more
details
on
that
algorithm.
B
Some
detection
of
you
know
different
type
of
extreme
weather
event,
but
then
our
overall,
our
overarching
goal,
is
to
connect
the
identified
features
with
extreme
precipitation
events,
and
so
I'm
just
showing
an
example
of
this
here
where
we
have
a
snapshot
of
the
detected
fronts
and
then
the
90th
percentile
precipitation
plotted
on
top
of
that.
So
what
you
can
see
is
there
are
some
areas
where
the
extreme
precipitation
lines
up
nicely
with
the
frontal
systems.
B
Okay,
so
moving
right
along
to
the
work
I
mentioned
by
andrew
dettleman,
also
with
jack
chen
and
david
john
gagne,
and
so
this
the
goal
of
this
work
is
to
machine,
learn
the
warm
brain
processes
so
emulating
the
cloud
microphysics
in
cam
using
a
neural
network.
So
the
question
is:
can
we
do
the
warm
rain
processes
better?
I'm
not
going
to
go
through
all
the
details
of
the
emulation
here,
but
I
would
direct
you
to
andrew's
recent
paper
in
james.
B
But
if
you
look
at
the
top
right
figure,
they're
showing
here
the
emulator
performance,
which
is
quite
good
so
comparing
the
emulator
on
the
y-axis
with
the
bin
model,
this
is
tel
aviv
university
bin
microphysical
model
on
the
x-axis
and
so
showing
very
good
agreement
between
those.
And
this
is
plotting.
A
rain
mass
tendency
and
the
colors
are
frequency,
but
then
he
also
notes
that
the
bin
code
is
different
than
the
original
model,
which
is
the
I
believe,
the
morrison
and
gentleman
cloud
microphysics
scheme.
B
So
there
are
differences
there
and
then
on
the
bottom
here,
they're
showing
the
onset
of
precipitation,
which
does
look
different
between
the
control
and
the
emulator
and
so
andrew
circled
sort
of
the
region
here
of
lower
effective
radiuses,
where
you
can
see
differences
in
the
rain
rate,
which
is
the
colors
here
as
you
increase
the
liquid
water
path.
But
another
important
takeaway
is
that
this
is
an
opportunity
for
a
large
speed
up
and
calculations,
because
the
neural
network
is
quite
fast.
B
So
the
next
project
I
want
to
highlight
is
led
by
maria
molina,
and
this
is
looking
at
machine
learning
for
earth
system
predictability.
This
is
with
yaga
richter,
sasha
glanville,
myself,
kirsten
mayer,
zane,
martin
from
csu
julie,
crown,
ichihu
and
jeremiah.
B
So
the
idea
here
is
to
use
actually
an
unsupervised
learning
technique
which
we
haven't
heard
too
much
about
so
far.
Today,
we've
heard
a
lot
of
different
examples
of
supervised
learning
and
the
specific
technique
here
is
psalms
or
self-organizing
maps,
and
so
the
idea
is
to
use
this
technique
to
group
synoptic
scale
patterns
without
the
need
for
any
pre-existing
labels.
B
So
the
way
this
could
work
is
you
know,
providing
an
input
vector,
for
example,
containing
week
three
mean
winds,
geopotential,
height
and
precipitation
over
the
us,
and
then
letting
the
sun,
letting
the
self-organizing
map
determine
different
groupings
of
these
climatological
patterns,
and
so
maria's
provided
an
example
here
of
the
week
three
mean
precipitation
anomaly:
these
are
from
the
cesm2
s2s
simulations
sub,
seasonal,
reforecasts,
and
so
again,
what
the
song
is
doing
here
is
kind
of
grouping
different
precipitation,
anomaly
patterns.
You
can
see
you
know,
patterns
where
you've
got
drying
in
the
southeastern
u.s
versus
wedding.
B
I
believe
the
arrows
are
winds,
but
I'm
not
100
sure
and
there's
nine
groupings
here.
The
number
of
categorizations
from
the
som
is
somewhat
of
a
user-defined
parameter,
but
you
can
see
how
it
it
separates
into
different
patterns
with
different
sample
sizes
across
those,
and
then
you
can
also
look
sort
of
upstream
of
those
week,
three
precipitation
anomalies.
B
So
here's
just
two
very
different
precipitation
anomaly
patterns
week:
three,
but
then
also
the
the
preceding
outgoing
long
wave
radiation
patterns
week,
one-
and
so
you
know
you
can
think
about
possible,
teleconnections
and
so
teleconnections
here,
if
you
look
at
the
olr
and
the
tropical
pacific,
so
the
next
steps
would
involve
predicting
the
synoptic
scale
patterns
of
u.s
precipitation
on
sub-seasonal
time
scales,
so,
for
example,
starting
from
olr
patterns
and
then
sort
of
switching
over
to
more
supervised
learning
technique.
B
For
example,
cnn,
to
predict
these
different
precipitation
anomalies
and
some
of
the
questions
are
you
know
how
robust
are
the
patterns
delineated
by
the
sum?
Can
we
leverage
these
methods
to
think
more
about
extreme
events
and
also
improving,
potentially
improving
s2s
prediction
skill
so
moving
right
along
to
polar
modeling
alice
to
vivier,
maria
molina
and
marika?
B
So what is SmartSim? It's a development library dedicated to converging AI and numerical simulation models, as shown here. A lot of our climate models are written in C and Fortran, and SmartSim works to connect those models with the modern data science stack, thereby enabling inference, training and analysis through that connection. By the modern data science stack, we mean things like JupyterLab and Dask. Scott has provided an example simulation.
B
I'm
going
to
start
this
here,
where
they're
using
smart,
sim
with
mom,
6
and
so
what's
plotted
here
is
eddie
kinetic
energy,
so
the
use
case
is
predicting
eddy,
kinetic
energy,
using
the
machine
learning
to
hopefully
help
improve
parameterizations
of
mesoscale
turbulence.
So
I'll
start
the
simulation
again
because
it's
pretty
cool,
so
this
is
looks
like
a
pre-industrial
control.
Daily
output
run
over
the
course
of
a
year,
so
very
fine
scale,
sort
of
turbulence
features
that
you
can
see
here.
B
Okay
and
then
just
a
few
more
details
on
their
end
car
collaboration.
I
believe
there
is
a
paper
that's
been
submitted.
A
I don't see any specific question, just a comment from Maria: "great overview," with which I agree. One of our motivations was that we have a pretty scattered landscape, even at NCAR, as you just heard, of machine learning scientists and data scientists who work with us, the Earth system scientists; even within NCAR it's not quite, in my view, an organized effort.
A
So
I
think
we
could
transition
into
the
discussion
period,
and
this
is
one
chat
question,
but
I
think
it's
for
for
for
a
different
person.
Katie,
do
you
want
to
bring
up
the
discussion
questions.
A
Yep,
let
me
I
can
share
my
screen.
Okay,.
A
Okay, we only have maybe 14 more minutes, but potential discussion points are listed here. We don't need to stick to these; this is really an open-ended discussion and an opportunity for us as a community to exchange ideas. One question I put up here is whether NCAR can help facilitate communication among us, the people participating in this cross-working-group session, and machine learning scientists who are maybe also interested in getting into CESM-related machine learning research.
A
So
a
question,
and-
and
we
might
not
have
the
answer
today-
but
can
ankara
really
help
facilitate
that
discussion?
Can
encar
connect
us
scientists
that
that
you
know
who
do
machine
learning,
related
research
with
csm
or
even
other
models
that
encore
like
wolf
or
other
models?
And
if,
yes,
how
would
we
do
that?
And
this
is
an
open-ended
question,
of
course,
but
other
discussion
points
as
you
see
them
here,
and
this
came
up
in
many
of
the
presentations
we
talked
about
the
machine
learning
workflows,
even
the
tools
like
sherpa
or
others.
A
These
are
hyper
parameter,
optimization
tools.
We
also
just
heard
from
katie
about
tools
to
link
python
and
fortran
codes,
and
I
think
this
is
actually
a
very
important
topic,
as
we
are
looking
at
emulators
for
csm
or
other
models
or
the
month
six
months,
even
that
we
heard
from
from
xena
there
may
be
questions
concerning
the
computational
resources
needs.
Encode
does
provide,
of
course,
gpu
capabilities
with
casper
and,
of
course,
the
next
system
that's
coming.
A
We heard about physical constraints and how we integrate them into machine learning algorithms, and about improving parameterizations, although I guess we heard that's a mixed bag: emulation of physical parameterizations is not always an improvement. Maybe we didn't concentrate much here on uncertainty quantification, but these are all points of interest to this group.
G
Yeah, I'm happy to go first. I love the first bullet point: how could we work to connect everyone, particularly at NCAR, where we have scientists working across so many different components of CESM and with so many different areas of expertise? But I'm curious to hear from everyone here, or whoever would like to speak up: how can that be done?
G
Or
can
one
of
these
efforts
be
done
to
connect
everyone
without
it
being
more
like
imposing
on
others
time,
because
I
think
in
this
virtual
world
a
lot?
Maybe
people
are
having
more
and
more
meetings
or
you
know
just
finding
a
struggle
for
time
balance.
So
yeah
are
there
efforts
that
others
are
part
of,
and
that
have
seen
success
or
any
suggestions
for
how
something
like
that
could
be
done
or
led.
A
Maybe
I
can,
you
know,
provide
my
own
thoughts,
but
I
of
course
invite
everybody
to
contribute,
so
a
first
step
could
be
and-
and
it
would
of
course
require
some
resources
to
build
up-
maybe
even
just
a
simple
web
page
on
one
of
the
end
car
servers
to
facilitate
information
exchange,
for
example,
let's
post
our
references,
can
we
maybe
list
the
csm
related
machine
learning
references
in
one
location
that
we
have
a
kind
of
a
go-to
place
to
see?
What's
going
on,
at
least
in
the
csm
world?
A
Of
course
we
have
many
other
activities
in
the
community,
but
that
could
be
a
starting
point.
Maybe
with
a
few
you
know
highlights
from
recent
papers,
maybe
updated
once
a
month,
or
so
you
know
not
too
frequently,
but
a
place
where
we
at
least
would
go
to
to
find.
Maybe
even
collaborators
from
the
end
card
team
and
see
what
the
activity
is.
D
Yeah
so
well,
first
of
all,
thanks
katie
for
sharing
some
of
the
work
that
we've
done
with
hpe.
D
So,
as
katie
said,
I'm
sort
of
mostly
interested
in
doing
the
improvement
of
parameterizations,
and
I
was
sort
of
wondering
if
ncar
could
maybe
act
as
a
host
for
some
kind
of
repository
for
like
transfer
learning,
so
that
we
can
share
our
you
know:
trained
neural
networks
or
parameterizations
with
each
other.
You
know
thinking
about
a
conventional
turbulence,
closure
parameterization
right.
It's
like
we
want
to
share
that
idea
with
other
researchers.
D
You
know
we
either
share
the
equations
or
maybe
share
numerical
algorithms,
but
with
machine
learning,
it's
a
little
bit
different
right,
we're
sharing
the
results
of
training.
You
know
a
neural
network
or
something
so
you
know
if
we
wanted
to
share
our
training
that
produced
that
mom
6
movie,
that
katie
showed
you
know
another
another
scientist
or
another
modeling
center
may
not
have
the
ability
to
run
the
very
expensive
model
that
we
use
for
the
training,
but
we'd
still
like
to
be
able
to
you
know
share.
D
You
know
share
the
results
that
we
have
with
them
right,
because
I
think
that
would
be
quite
nice
and
I
think
ncar
is
perhaps
uniquely
situated
to
do
that,
since
we
have
the
computational
resources-
and
we
have
you-
know
we're
hosting
these
gigantic
data
sets.
So
why
not
hope
be
like
a
host
for
these
kinds
of
model?
Improvements
too.
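Sharing "the results of training" rather than the training itself boils down to serializing trained weights so that anyone can reload them and reproduce the predictions without rerunning the expensive model. A minimal sketch with a hypothetical two-layer network and NumPy's `.npz` format (real repositories would more likely use framework-native formats such as ONNX or PyTorch checkpoints):

```python
import numpy as np
import os
import tempfile

# hypothetical trained network weights (one hidden layer, 10 inputs, 1 output)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(10, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 1)), np.zeros(1)

def predict(x, W1, b1, W2, b2):
    """Forward pass of the small network."""
    return np.tanh(x @ W1 + b1) @ W2 + b2

# "publish" the trained result to a shared location
path = os.path.join(tempfile.mkdtemp(), "shared_param.npz")
np.savez(path, W1=W1, b1=b1, W2=W2, b2=b2)

# another center "downloads" and reloads it
loaded = np.load(path)
x = rng.normal(size=(5, 10))
same = np.allclose(predict(x, W1, b1, W2, b2),
                   predict(x, loaded["W1"], loaded["b1"],
                           loaded["W2"], loaded["b2"]))
print(same)  # True: identical predictions without retraining
```

The point is that the downstream user needs only the weight file and the architecture definition, not the training data or the compute that produced them.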
B
Yeah, I think that's a great point, Scott. I know Gokhan has his hand up, but I'll just mention that in our extreme weather detection collaboration, we've been able to leverage models that were trained elsewhere and then do that sort of transfer learning, taking those models and applying them to CESM output. It's less computationally heavy on our end, because we don't have to do the training from zero.
D
My comment was along similar lines, and I think Bailey just commented similarly. Listening to all the talks, well, not that many actually, it looked like many people are using different approaches, and I guess I don't have a sense of how common these algorithms, scripts, or codes are.
D
I see this field as more or less in its early stages, almost in its infancy. People are probably trying different methods and different techniques, but in five years' time we may end up with a whole suite of these algorithms. Is it possible for CESM to host some of the algorithms that are accepted by the community, or identified as robust? I can see that this could diverge quite a lot, so what is the plan?
B
Yeah,
I
guess
I'll
I'll
just
also
mention
that
some
of
the
early
machine
learning
for
earth
science
tutorials
have
played
a
similar
role,
and
this
is
also
where
we
can
leverage
the
work.
That's
been
done
in
sizzle
because
to
prepare
for
those
tutorials
they've
created
a
set
of
no
jupiter
notebooks
or
python
scripts,
using
some
of
the
common
machine
learning
libraries
that
we've
heard
you've
heard
about
today.
B
So
I
think
maybe
that's
happening
a
little
bit,
maybe
where
cesm
can
be
more
specific
to
some
of
the
techniques
we
would
want
for
looking
at
different
climate,
modeling
questions
and
there's
certainly
some
best
practices
and,
as
dave
said,
maybe
some
sort
of
standard
ml
toolkit,
because
there
are
a
lot
of
choices
to
be
made
along
the
way.
I
think
garrett
kind
of
hinted
at
that
in
his
talk,
so
there's
not
really
one
size
fits
all,
but
yeah.
D
I
mean
an
interesting
thing,
is
that
I
don't
know
in
late,
1990s
machine
learning
came
up
again
and
many
people
essentially
were
working
on
it
and
nothing
came
out
of
it.
I
don't
know
whether
there
was
actually
lack
of
momentum
behind
it
or
not,
but
it
was
actually
it
comes
in
cycles
almost
and
we
need
to
make
sure
that
this
one
sticks
around.
I
guess.
G
Yeah
thanks
mom
katie
gokan.
I
wanted
to
reply
to
your
suggestion
or
idea.
I
think
it's
a
great
point
it'd
be
interesting
to
see
if
maybe
like,
while
katie
mentioned,
that
the
models
so
far
that
are
being
built
and
all
these
different
projects
and
things
like
that
are
highly
specialized
for
a
specific
question
and
the
architectures
can
vary
greatly.
G
It
would
be
interesting
to
maybe
try
an
effort
where
we
have
some
sort
of
like
vanilla,
that's
what
they
call
like
the
simpler
cnns,
but
some
sort
of
vanilla,
cesm
applied
neural
network
that
maybe
can
be
provided
to
the
community
specific
for
cesm
and
maybe
and
have
that
be
untrained
and
others
can
train
it
for
their
own
application.
G
I
don't
know
just
a
totally
random
thought
that
I
had,
as
you
were,
explaining
that
to
try
to
maybe
motivate
more
people
to
train,
but
something
that's
like
very
standard
like
like
we
just
kind
of
packaged
it
and
provided
it
I
don't
know,
that's
just
a
random
thought.
I
had.
E
One thing that I've learned, and this could just be a product of my workflow, is that it's definitely not one-size-fits-all by any means. And I did want to commend CISL and their staff for getting things implemented that I've been interested in using, whether it's Sherpa or the XGBoost library for boosted forests, things that weren't in the Python libraries on Casper; they've been really helpful in getting those installed.
E
I
know
it's
not
what
you're
talking
about
with
some
kind
of
a
robust
everybody
can
can
kind
of
use
this
machine
learning
model,
but
it
is.
It
is
very
helpful
to
be
able
to
use
those
resources
and
have
that
and
have
that
you
know
ability
to
to
kind
of
do
do
what
whatever
you
you
as
long
as
you
can
justify
that
you're
gonna
get
use
out
of
it.
You
know
they're
really
helpful
at
getting
those
things
together
for
you.
So.
G
Hi,
I
also
want
to
say
something
about
what
maria
and
gokun
was
saying,
and
actually
those
are
great
points.
One
thing
that
I
think
our
community
should
stay
vigilant
about
is
like
the
the
basically
progresses
in
the
like
the
new
algorithms
that
come
come
up
and
basically
are
very
successful
and
can
be
used
like
in
the
in
the
transfer
learning.
G
So
there
are
like
a
lot
of
like
competitions,
and
there
are
like
a
lot
of
like
newer
algorithms
coming
up
that
are
like
a
basically
a
version
of,
for
example,
see
a
convolutional
neural
network
cnns,
but
they
are
like
kind
of
like,
for
example,
something
like
unit
or
alex
net,
or
something
like
that.
G
Those
things
I
think
it
would
be
important
for
us
to
stay
vigilant
after
mostly
some
like
progresses
on
those,
because
there
are
like.
I
think
there
are
a
lot
of
like
newer
algorithms
or
newer,
basically
neural
network
structures
that
are
like
successful
and
can
be
used
in
in
like
other
fields,
and
we
can
use
it
with
like
transfer
learning
a
bit
like
csm
or
with
climate
data.
G
Can I reply to that? Yeah, that's a great point. I think we could find a way to get the community more involved in the computer science conferences on machine learning, like NeurIPS, and there's that NeurIPS Earth science workshop. We already have AGU, AMS, et cetera, but getting the CESM community involved in some of these large machine learning conferences would be great for staying up to date on some of these newer models that are coming.
A
So we are past our allotted time. We can keep this channel open for interested participants, if you still have a little bit of time and there is more interest in this discussion. Of course, this was brief, and we couldn't address all the points we could have potentially discussed, so please view this as the start of the conversation, because I think there's a lot of potential for machine learning and CESM.
A
So
the
action
item
for
us
as
organizers
will
be
to
think
about.
Maybe
can
anchor
facilitate
communication,
as
this
can
be
at
first
pretty
simple.
I
am
not
envisioning
a
huge
effort
right
now,
but
you
know
I
I
suggest
we
give
it
a
go
and
see
whether
we
have
a
better
platform
to
use
with,
and
this
is
really
specific
to
csm.
Of
course
we
have
access
to
all
these
other
resources
ago,
ams
and
so
forth,
but
I
think
it
would
be
good
place
to
start.
You
know
a
discussion
about
csm
related.