From YouTube: 04 - TensorFlow 2.0 Ecosystem - Josh Gordon
Description
Deep Learning for Science School 2019 - Lawrence Berkeley National Lab
Agenda and talk slides are available at: https://dl4sci-school.lbl.gov/agenda
But if you go to tensorflow.org, what you should do is ignore pretty much everything on the website except what you find at tensorflow.org/beta. Go to Learn and then TensorFlow (our website has been a little slow recently, not sure why) and then click on TF 2.0 beta. I know it's sort of hidden up here; you'll find it more easily through the URL. What you'll find on this tab are all the latest tutorials and guides and everything we're writing for version 2.
If you haven't learned TensorFlow 1: well, first of all, you do not need TensorFlow 1 knowledge at all to use TensorFlow 2, and if you have not previously used TensorFlow 1, you should just start directly with TensorFlow 2. Google has learned a lot, and the community has learned a lot, about developing deep learning frameworks in the last few years. The field itself is advancing really fast, but so is the software engineering side, and as we've learned about the needs of Googlers and the community, we've adjusted the library to match.
So, first of all, every single tutorial and guide that you find on this page is runnable end to end, and we're really proud of this. It means the code all works. All of these tutorials are actually Jupyter notebooks on GitHub, and the way we build our website is that we have a script that reads each notebook, converts it to HTML, and sticks it on the web page. So for any of these, you can go to GitHub.
If you want, you can download one and run it locally, or you can run any of these in Colaboratory. Let me give you a quick overview of Colab. I know you're going to be using lab hardware today, but Colab is awesome if you're at home. Colab is a free Jupyter notebook environment provided by Google Research; it's basically Jupyter notebooks in the cloud. What's important about Colab for our examples is that it comes with a free GPU. So when you open up a Colaboratory notebook... whoops.
So don't stick any data on there that you want to keep. You'll notice I'm connected now, and that means I can use this exactly like a Jupyter notebook. At the start of a lot of our tutorials there's one thing you'll need to keep in mind: by default, Colab has all sorts of software installed on it. It's got scikit-learn, it's got TensorFlow 1, it's got PyTorch, it's got Keras, all sorts of great libraries.
You need to install TensorFlow 2 in Colab at the start of all our 2.0 beta tutorials, so you run that first. Commands that start with a bang run as shell commands inside Colab: if you do !ls, you'll see the directory you're in on your VM; if you do !pip, you can install software. This is the only thing that takes a moment, but you'll have to kick it off when you start any of our tutorials. Here's how you enable a GPU.
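As a rough sketch, the first cell of those beta tutorials looked something like this (the exact version pin is an assumption; it moved forward with each beta release):

```python
# Install the TF 2.0 beta inside Colab; the pinned version is an
# assumption and changed as new beta builds shipped.
!pip install tensorflow-gpu==2.0.0-beta1

import tensorflow as tf
print(tf.__version__)  # should report a 2.0 beta build
```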
If you go to Edit and then Notebook settings, you can pick a hardware accelerator. Don't choose a TPU: TPUs will require a small amount of code changes, and we're working on making that no code changes, but for right now, GPUs on a single machine will just work out of the box, and your code is identical. There's nothing else you need to do.
So you can enable the GPU, and when you reconnect and hover over this guy up here, you'll see it's connecting (I know, it's like two-point text) to a GPU back end, which is great. All right, before I explain what this tutorial does, I want to show you the code for a neural network, and what we're looking at here is this.
If we had written and defined our model like this, we would have multi-class logistic regression. The simplest way to define a neural network in TensorFlow 2 is using what we call the Sequential API, which says: I'm going to define my neural network as a stack of layers, one after the other. Ninety-plus percent of machine learning problems in practice fall into exactly this pattern.
What you can see is that if instead we wanted a neural network, rather than a logistic regression or linear model, what we could do is this: we add another layer, we choose the width of the layer, and we give it a ReLU activation, and now we have a neural network. And if we wanted a deep neural network (this is a new laptop, so I'm not familiar with the keyboard), now we have a deep neural network, and now we have a deeper neural network.
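A minimal sketch of that progression (the 28x28 input shape and layer widths are assumptions, in the style of the MNIST beginner tutorial):

```python
import tensorflow as tf

# A single dense softmax layer: multi-class logistic regression.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Add hidden ReLU layers and it becomes a (deeper and deeper) neural network.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
```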
One thing that's a blessing and a curse about deep learning, as you're finding out, is that there's a ton of concepts. No one should come to a school like this new to this stuff, see terms like dense layers and ReLU, and go, oh yeah, I totally have intuition for what's happening at all these layers. It takes a long time to learn the concepts. But what I hope you've just seen is that writing the code is much easier today than it used to be, which is a really good thing.
Sequential models (and I'll explain what Keras is in a second, because this is slightly weird): when people develop with this style of code, it feels imperative; it feels exactly like regular Python programming. But what you're actually doing behind the scenes is defining a data structure, and your data structure here is very simple: it's a stack. What this means should be interesting even if you've been developing with Keras for a while and didn't realize it.
When you say model.compile here, we're using a very high-level API to set up our model. You can choose the way you want gradient descent to work (maybe you've seen SGD, maybe you've seen Adam or things like that); you can choose optimizers out of the box. You can choose your loss function: categorical cross-entropy is what you would use when you're doing classification; it's basically a loss function that compares two probability distributions. And you can choose the metrics that you want printed out.
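A sketch of that compile step, assuming the model above and integer labels (with one-hot labels you would use plain categorical cross-entropy instead):

```python
# Optimizer, loss, and metrics are all chosen out of the box here.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```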
What's interesting is that when you say model.compile, at compile time we can obviously do checks, basically making sure the shapes of your layers are compatible, and that means we can catch programming bugs before you start training and running your models, which is a really good thing. The next style of code, which I'll show you in a minute, is a little bit more flexible than this, but we can't check things at compile time.
You can train your model with a single line of code, and whenever you train a deep learning model, the very first thing you should look at is overfitting and underfitting. One of the reasons deep neural networks are so successful for things like classification is that they're super powerful: if you define a really deep network (by deep I mean the number of layers), you make the layers wide enough (by that I mean the number of neurons, or units, per layer), and you train the thing for a long time,
it's probably going to memorize your training data. Because of that, it's also very likely to badly overfit and do a horrible job on your validation and testing data. There are lots of different knobs you can tune to prevent networks from overfitting: there are things you can do like adding regularization, like L2 or dropout, and you can reduce the size of your layers.
You can mess around with your optimizer, but the most important thing is just this single number here: epochs. An epoch means you've used every example in your training set once to update all the weights of your model. So an epoch is a single sweep through your training set, a single round of gradient descent with every example. The longer you train these things, the more tightly you're going to fit the training data, and so you can basically figure out the right number.
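Training really is one line; a sketch (the x_train/y_train and validation names here are assumptions):

```python
# epochs is the knob being discussed; validation_data makes Keras report
# the validation loss alongside the training loss each epoch.
history = model.fit(x_train, y_train,
                    epochs=10,
                    validation_data=(x_val, y_val))
```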
You make a plot: you plot the loss on your training data, and you also plot the loss on your validation data. When you start training, all your weights are initialized randomly, so the loss on both your training and validation data will be decreasing. As your model begins overfitting, or memorizing the training data, the loss on your validation data is going to start increasing; it's going to get worse and worse and worse, and when that begins to occur, that's the right number of epochs to train for.
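A sketch of that plot, using the History object returned by fit() above (assumes matplotlib is available):

```python
import matplotlib.pyplot as plt

# Training loss keeps falling; validation loss turns upward once the
# model starts memorizing. Stop training around the turning point.
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()
```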
So you train these models until your validation loss starts increasing. A lot of what people focus on when they start learning deep learning (and one cool thing about giving this talk at the lab) is messing around with stuff like this: what's the right model architecture, what types of layers should I use? But in practice, this is by far the smallest part of doing deep learning successfully. It's all about thinking about your problem. What are you trying to model, or what are you trying to classify?
How do you evaluate it? How do you know you've done a good job? How do you know your models are going to work in production, when they're deployed on data you've never seen before? It's really thinking about your experiment that's hard, setting up the right experiment; once you have that done, this part you can mess with and figure out. Anyway, that's what you'll start on today. Let me just run this code really quickly.
Anyway, it's running model.fit right now. There are also methods like model.predict, and you can see that for every epoch it's reporting the loss on the training data; there are parameters you can set to have it automatically report the loss on the validation data too. Anyway, here's the first thing that's weird. Who has used a library called Keras before? Okay, so half of you. Here's what's interesting: if you go to keras.io (keras.io is wonderful), this is an independent open-source project, nothing to do with TensorFlow.
If you do pip install keras, you get what you find at keras.io. Behind the scenes, Keras will automatically install another deep learning library, and that can be TensorFlow, it can be CNTK, it can be MXNet, whatever. What Keras is, and why Keras is so successful (and you've just seen this in the getting-started-for-beginners example), is that Keras is an API spec. Keras basically says: there are different ways to define your deep neural networks. One such way is the Sequential API, where you define a model and you add layers to it.
You can compile it. What Keras doesn't say anything about is how to multiply matrices quickly. So when you actually need to train these things, Keras uses another library behind the scenes to do the math, and another library to handle getting the work onto the GPU or whatever. Anyway, this was extremely successful. This was the first deep learning library that was really, really user-focused, really clear and easy to use, and we loved it on the TensorFlow side.
So now it's also built into TensorFlow. If you do pip install tensorflow, every example that you find at keras.io will work if you change the imports: you take an example, and instead of importing from keras.models, you import from tensorflow.keras.models, and the rest is the same. TensorFlow 2 is a superset of Keras, and it has stuff that you won't find on that web page. If you're new to deep learning, keras.io is a wonderful place to start: anything you learn at keras.io will feed directly into TensorFlow, so you're not wasting your time.
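For example, a sketch of porting a keras.io snippet, where only the import lines change:

```python
# Standalone Keras (what keras.io shows):
#   from keras.models import Sequential
#   from keras.layers import Dense

# TensorFlow's built-in Keras: same code, different import.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])
```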
In fact, there's a whole book that I'd really strongly recommend: Deep Learning with Python by Francois Chollet, which is by far the best book to start your deep learning journey from the practical developer side. It's 40 bucks, there's no math, it's not an academic textbook; it's basically "here's how you do the thing."
So if your goal is "show me the simplest way to train an image classifier or a text classifier" (and by simplest I mean simplest but not black box; it's not "model equals my magical text classifier, train it", you're at least defining the model piece by piece so you understand what's happening inside), it's wonderful: Deep Learning with Python. TensorFlow adds a lot on top of this, and right now we're talking only about Python. So let me show you some Python that's different in TensorFlow 2, which you won't find at keras.io.
If we go back to our getting-started page and we look at TensorFlow for experts, this is a very similar model to what we saw in the beginners example (it's another MNIST thing, which maybe I'll walk you through in a sec), but the model is defined in a very, very different way.
If you've been doing deep learning for a while, this might look like Chainer or PyTorch. What we're doing here is defining our model by subclassing a class defined by the library; here in TensorFlow we call it Model, and different frameworks will call it different things. This should feel a lot like object-oriented NumPy development: you set things up in the constructor, and by the way, of course, you can add parameters as you like.
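A minimal sketch of the subclassing style (the layer sizes here are assumptions, not necessarily the tutorial's):

```python
import tensorflow as tf

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Layers are created in the constructor...
        self.flatten = tf.keras.layers.Flatten()
        self.d1 = tf.keras.layers.Dense(128, activation='relu')
        self.d2 = tf.keras.layers.Dense(10, activation='softmax')

    def call(self, x):
        # ...and the forward pass is ordinary, imperative Python.
        x = self.flatten(x)
        x = self.d1(x)
        return self.d2(x)

model = MyModel()
```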
If you have a tensor, you can just call t.numpy() and now you're back in NumPy, which is great. So this is wonderful for learning, not the basics, excuse me, but exactly the details of what's going in and out of these layers: what are the shapes, what does my data look like? It's great for debugging. One thing that's new in TensorFlow 2 that I also want to show you is that there are two ways to train, and by the way, this is also a Keras model.
Keras has three ways of defining models. It has Sequential, which you should always start with and always use first, because sequential models are the easiest to debug and they're also the easiest to share with friends. If I'm looking at code from a student and she writes it using the Sequential model and there's a bug, I can immediately see it; it takes like 30 seconds tops. If she writes it using this subclassing style, it can take me 15 minutes to find it.
That's because this style is new and there are few standards for how you write your code this way. With the Sequential model, your code is a data structure; with the subclassing style, your model is Python bytecode, which means you can do whatever you want, but it's also tricky to debug. So that's the trade-off. They're both wonderful; you can't go wrong with either. There's also the functional API in Keras, which some people really love. Sequential is a stack; the functional API is what you would use if your model is a DAG, a graph.
It gives you a little bit more expressivity than the Sequential model. All of these models can be trained in two different ways. Regardless of whether you use the functional, Sequential, or subclassing style, you can train your model with model.fit, which you should always start with unless you want to poke around with the details. And here's how you poke around with the details: in TensorFlow 2, you can also train your models with what we call a gradient tape. Basically, what a deep learning library is,
in a nutshell, is a matrix multiplier, because under the hood almost all these layers go forward and backward by multiplying matrices. This is true of TensorFlow, CNTK, MXNet, all of them: they multiply matrices, and they can also do that on a GPU, great. All of the deep learning libraries have different ways of defining layers, and they have automatic differentiation, and that's what we're seeing here under the hood. TensorFlow uses reverse-mode autodiff, and this is basically writing model.fit from scratch.
The tape will trace all the operations that are nested in this "with" block, and it literally plays them back to compute the gradients. So here, this is our forward pass: we're making some predictions on images and calculating our loss, which is a single number. What's great is that if we say tape.gradient, we're saying: TensorFlow, please give me the gradients of the loss with respect to all the variables in my model. If you print those out, you'll get the raw gradients; they're Python lists.
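A sketch of that custom training step (names like images, labels, loss_fn, and optimizer are assumptions):

```python
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

def train_step(images, labels):
    with tf.GradientTape() as tape:
        predictions = model(images)          # forward pass, traced by the tape
        loss = loss_fn(labels, predictions)  # a single scalar
    # Gradients of the loss w.r.t. every trainable variable in the model.
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
```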
This means that if you happen to be much smarter than me, and you're doing research on optimization and implementing, say, the new Berkeley optimization method, you can implement it in just regular Python, and it's very, very easy to poke around with. One cool thing that TensorFlow 2 does, the only piece of non-standard Python: if you did TensorFlow 1, you learned about sessions and graphs and placeholders, and all that stuff is very cool, but TensorFlow 2 does away with it.
If you do tf.function, here's what it will do. One source of slowdown in deep learning libraries is that behind the scenes, TensorFlow is a C++ engine and we're writing code in Python. When these operations are actually executed, we go from Python to C++, compute the result, and send it back to Python, so we're ping-ponging back and forth line by line, which is slow. If you add @tf.function, what you're saying is: hand this entire function to the C++ back end. I'm not a compilers engineer, but if you are, then you probably know all the different optimizations
you can do to code if you analyze it statically. So: compile the code, optimize it, compute it, send back the result once. This can give you anywhere from a 0 to 10x speedup on your code, and it's a single line; it's awesome. Anything that you can stick inside tf.function, you can stick inside a TensorFlow SavedModel if you're exporting things. The catch is that tf.function makes your code slightly harder to debug; the error messages might make less sense.
So the way this works: when you're developing your models, don't use tf.function; develop your model, debug it, all that stuff. When you've finished developing, if you care about speed (which I often don't, but if you do), add a single tf.function. The way to find the best practices for this is to look at our advanced tutorials. You don't need to add it on top of everything; it's recursively applied, so you don't need to sprinkle your whole code with it.
Usually we just stick it on top of our training loop, and that's it. So we like this a lot; it's super user-friendly.
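A tiny sketch of the decorator (the function and values here are made up purely for illustration):

```python
import tensorflow as tf

# The decorated function is traced once and then runs as a graph,
# avoiding the per-op Python/C++ round trips described above.
@tf.function
def scaled_sum(x):
    return tf.reduce_sum(x) * 2.0

print(scaled_sum(tf.constant([1.0, 2.0, 3.0])))  # tf.Tensor(12.0, ...)
```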
What's really relevant for the lab is distributed training. In the guides on tensorflow.org/beta, you'll find a guide for distribution strategies, and I want to show you what distribution strategies are. Here's some Keras model that we've defined to do whatever, and we want to run it distributed. The most common case of distributed training is one machine with multiple GPUs, and this is called data parallelism.
There's a parameter called batch size, which is how many examples you use per round of gradient descent. Larger batch sizes mean more accurate updates. So the simplest way to do distributed training, given one box with a lot of GPUs, is to increase your batch size as you add GPUs. Let's say the most one GPU can handle is 32: with two GPUs, you give 32 examples to each GPU, they each do the forward pass and backward pass, and you average the gradients.
What's nice about distribution strategies is that that's the complete code for data parallelism, and more strategies are being developed: different strategies for different network configurations, different numbers of machines, and different numbers of accelerators. It's really cool stuff, and what I like about them is that they're super user-friendly.
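A sketch of the single-machine, multi-GPU case with MirroredStrategy (the model itself is a placeholder):

```python
# Build and compile the model inside the strategy's scope; fit() then
# splits each batch across the available GPUs and averages the gradients.
strategy = tf.distribute.MirroredStrategy()
print('Replicas in sync:', strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
```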
Even on a single machine, by the way, a big gotcha is your data input pipeline. Let's say you're doing something like training an image classifier and you're reading examples off disk. A huge slowdown here is what's called GPU starvation: one issue might be that your GPUs are faster than the code you've written to read images off disk. If you have a small dataset, the easiest way around this is to just use NumPy: load the whole thing into memory, you don't have to worry about it, have a nice day. If you have more energy to invest, you can use something called tf.data.
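A sketch of a tf.data pipeline for images on disk (the glob pattern and image size are assumptions); the point is to overlap reading and training so the GPU stays fed:

```python
import tensorflow as tf

def load_image(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    return tf.image.resize(image, (224, 224)) / 255.0

dataset = (tf.data.Dataset.list_files('images/*.jpg')
           .map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
           .batch(32)
           .prefetch(tf.data.experimental.AUTOTUNE))  # overlap I/O and compute
```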
So, we haven't published these yet; they should be on the website later this week. I just sort of pirated the code and uploaded it, but that's okay, it's about to be open sourced anyway. What this is is a tutorial for image segmentation: it will train an image segmentation model on the Oxford Pets dataset. The reason I wanted to give you a link (and you can just jot this down
A
So
you
have
it
later:
it's
bitly,
/tf
seg,
it's
just
a
jupiter
notebook
and
what's
nice
about
this,
is
it
runs
in
about
five
minutes
or
less
so
a
lot
of
our
advanced
tutorials
like
cycle
gand
I'll,
show
you
in
a
sec
can
take
a
little
bit
longer
to
train,
but
this
is
fast
enough
that
you
can
do
it
almost
interactively,
and
so
it's
a
really
nice
advanced
example.
That's
fun
that
you
can
play
with
quickly,
so
TF,
seg
and
I
can
give
you
the
things
later
too
and
there's
another
one.
A
that I'll show you; I'll walk you through this. I just made these slides a second ago, so it's a little funny. This is a code example for Deep Dream, and it's based off a GitHub repo that I have, but this one is cleaned up so it runs a lot faster, using TensorFlow 2 best practices; mine's kind of crap. So this is bit.ly/tfdream.
Oh, yes: the reason this requires sign-in is that I didn't have time to upload it to my GitHub account, so what this is is just a notebook sitting on my Google Drive. If you can't access it, I'll fix that in a sec; I just want to get it to you. If other people are having trouble accessing it, I'll fix that right after the talk; I probably just messed up the sharing settings. So: tfseg and tfdream.
Let's do this: let me talk briefly about TensorFlow beyond Python, and then I want to talk a little bit about deep learning very, very quickly. I know you've covered a little bit of convolution; I just want to say a few more words about it. Then we'll do linear regression, just to show you the mechanics of writing TensorFlow 2 code from scratch, and then we'll do Deep Dream, and there's a reason we're doing Deep Dream and linear regression together.
Every time I pick up my laptop to do this, I unplug things; it's horrible. What we're looking at (carefully, carefully; it is not meant for this many people) is a model called PoseNet, and even though I'm filming you right now, this is relatively private: no data is being sent to the cloud. This is all running locally in Chrome, and it works in Firefox too, your favorite browser. It's entirely in JavaScript, and it's GPU accelerated, so it's fast. And what's interesting is, I
just bought this a couple of days ago because my personal laptop died. So this is my home laptop, and it's the cheapest MacBook you can buy right now. It's still not cheap, right, it's like 1,100 or 1,200 bucks, but the point I'd like to make is that even on this hardware, this is running at, I can't see it, probably like 20 frames a second. This is fast, and it's doing something that's pretty sophisticated.
So the reason I'm showing you JavaScript (and this may not be relevant to the stuff you're working on at the lab, but it's really cool)... did I destroy it? No, it still works. Great question: is it related to the Xbox? My guess would be that it's not related to the Xbox, I suspect, but I could be totally wrong. The Xbox has some sort of radar-type device to physically measure your distance; this is purely vision.
Why on earth would we go to JavaScript, which is probably even slower? As soon as I saw stuff like this, I realized I was totally wrong. The reason we care about doing machine learning in JavaScript, and this is a huge game-changer: of course we want JavaScript developers, and people using other languages who haven't switched, to be able to develop deep learning, obviously. But the real reason we care about JavaScript is that it runs client-side, so this gives you another deployment option.
As a Python developer, the way I deploy models is to start up a REST API, and the crappy way to do that is with Flask or whatever you want. If you're at a large place like a lab or a company, you can use something called TensorFlow Serving, which is part of the TensorFlow ecosystem. It's exactly the same code that Google uses to serve models internally. You can download it; it's a C++ library; it will load models you saved in Python and throw up a REST API.
So that's high performance, but it takes some time to set up. Deploying in JavaScript means you can just push models out to your users, and they run client-side, which is really, really cool. It's a new paradigm, it's only about a year old, and we're seeing tons of cool applications all over the place. Since I'm talking about the ecosystem: TensorFlow is a very, very big project.
A huge thing right now is Swift for TensorFlow, and this is something Chris Lattner and others are working on.
Swift is a modern language; it's compiled, it's fast, and there are a lot of engineering hours being poured into this project, basically implementing TensorFlow in Swift. There's a whole class you can take on it from a company called fast.ai. It looks really promising, so if you happen to be a Swift developer, that is a completely legit place to be; there's no need to use Python. For R, J.J. Allaire from RStudio also did a phenomenally good job implementing TensorFlow in R.
One really useful project is TensorFlow Hub, and TensorFlow Hub is a library of pre-trained models. I want to give you some caveats: they're working on upgrading TensorFlow Hub for TensorFlow 2 right now, so some examples work with TensorFlow 2 and some don't, just FYI. Let me talk a little bit about deep learning, and then we'll do some more code. Deep learning is representation
learning, and I want to add a few more words on convolution. Usually when I teach deep learning, I start with convolution instead of dense layers because, if we had more time, it's very easy to interpret what a single dense layer is doing; but when you have a DNN defined as a stack of dense layers, who knows exactly what features the subsequent layers are looking at. You can see it very easily with convolution, though. Convolution is not a machine learning concept, and it's something you've probably used.
This is just a code example, in SciPy, to do convolution on an image to detect edges. Convolution is how all the filters in Photoshop work, for sharpening things and blurring things and finding edges and stuff like that. Something interesting: SciPy has a bunch of built-in pictures, and this one shows the astronaut picture, so, I like astronauts.
My first question is: does anyone know who the astronaut built into SciPy is? Because this is a science-y place; she's famous enough to get built into SciPy. Yeah, I know you could use your phone. Anyway, it's Eileen Collins, and she was the first woman to command the Space Shuttle Columbia, which is a big deal. Anyway, I just want to show you what convolution means. You'll see this a lot in deep learning: we take terms that mean a lot of things.
If you're an electrical engineer, you know more about convolution than I ever will. In deep learning, by convolution we mean slide, and here's how we slide to detect edges; I'll show you this fast and then slow. Convolution starts with something called a kernel, or a filter (you'll see in deep learning there are multiple names for the same thing, all the time, just to make it fun to learn). Here's a filter that can detect edges. There are a couple of important things about it. The first thing you notice is that this filter has nine numbers.
Eight of them are negative one; one is eight. The intuition for why this can detect edges is: if we take the dot product of this filter and an area of the image, the dot product is going to be zero if all the pixels have the same intensity, and it's going to be a larger number if the pixels are different, that is, if there's an edge.
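A hedged sketch of that edge-detection convolution (the exact code on the slide may differ, and loading the astronaut image from scikit-image is an assumption):

```python
import numpy as np
from scipy import ndimage
from skimage import data

# The edge-detecting kernel: sums to zero over flat regions,
# responds strongly where neighboring pixels differ.
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])

image = data.astronaut().mean(axis=2)    # convert to grayscale
edges = ndimage.convolve(image, kernel)  # large values near edges
```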
If you think about it, if you wanted to detect edges in an image using a dense layer, you're going to have a bad time, to quote South Park. You're going to need a very wide layer, and that dense layer needs to learn to detect edges separately at every chunk of the image. You'll have different neurons that learn to say: is there an edge in the top-left corner? You'll have more neurons: is there an edge next to that? This is stupid. With just these nine numbers, though, we can find edges all across the image.
So it's many, many, many times more efficient. This actually isn't much more than linear regression: y = mx + b, to find the best-fit line, is two parameters, the slope and the intercept, and here it's just seven more, and we get edges everywhere in the image. So convolution has a lot of nice properties. Here's what I mean by the dot product, just so you can see, and this is how the library works.
We take an image, we take our filter, and we have an output image. We drop the filter on some chunk of the input image and take the dot product: that's just 1 times 2, plus 0 times 0, plus 1 times 1, and so on and so forth. You sum it up, and that's the output pixel. Then we convolve, meaning we slide, and we take the dot product again, and we get another output pixel. We convolve and we convolve, and now we have an output image. And there's more to it, too.
Here's what the deep in deep learning means. I stole these slides from a friend of mine, Martin Gorner; he's a much better artist than I am. Here's an image, and an image isn't 2D, it's 3D: an image often has three color channels, red, green, and blue, and if you printed the shape of this thing in NumPy, you might see ten by ten by three, for height, width, and depth (red, green, blue). And what's interesting is... too much Diet Coke.
We can convolve in 3D in exactly the same way we convolve in 2D, so we're still doing a dot product, but now our filters are three-dimensional; filters always pass through the entire depth of the image. So if we start convolving this filter, we're still taking a dot product, and as we slide we get output pixels, and I'm just going to fly through it: if we slide for a long time, we get an output image. In Photoshop, you write the filters by hand; here, they're learned.
Great question: if your image doesn't have symmetry, or the filters don't evenly divide the image, you can use concepts like padding and stride to deal with that. But for right now, just to fly through it (excellent question), I'm going to pretend the filter evenly divides the image; you don't have to worry about that. Padding and stride are also, by the way, part of how you deal with images that are different sizes.
If you look at the TensorFlow API docs for convolution (maybe I'll show you that in a sec), you'll see a whole slew of options. But here's the important point in deep learning: with more filters, you get more output images. These might be edges in different orientations, and you can have as many as you like. Let me show you what such a layer might look like in code.
If we pull up our beginner tutorial on convolutional neural networks: this tutorial is a bit of a lie. It's not really a CNN tutorial; it's trying to say, "I am the minimum amount of code you need to quickly train a convolutional neural network," but it doesn't teach you about CNNs. FYI, maybe we'll update it later. What we're looking at with this layer here is saying: give me 32 filters. So let's say the input image to this layer was 10 by 10 by 3.
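A sketch of the kind of layer that tutorial uses (the 3x3 kernel size and the padding choice are assumptions):

```python
# 32 filters, each sliding over the full depth of the input.
# With a 10x10x3 input and 'same' padding, the output is 10x10x32:
# one output image per filter.
layer = tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same')
```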
Each filter outputs an image, so that output would be 32 deep, all detecting different features. And what's really nice is that the deep learning part happens at the next convolutional layer. At the first layer, the filters are learning features of pixels, which are basically edges and colors. At the next layer, if we have a bunch of filters again, we can have a very, very deep image, but now these filters are learning features of features, and features of edges are basically shapes. At the next layer,
you're learning features of features of features, which are textures and whatnot. So deep learning is this: you learn a hierarchy of features, all of which are learned automatically, and it's a really powerful concept. What's great about it is that if you start your deep learning journey with convolution, you can visualize exactly what these filters are detecting, and you can actually do tricks with them using things like Deep Dream. And then, only at the very end, by the way,
(the filters are very easy to understand at the first couple of layers, but if you have a CNN that's 20 layers deep, who knows what layer 19 is detecting; you can use Deep Dream to poke around with it) at the very, very end is when you have a dense layer, and your dense layer does the actual classification. So what your CNN is doing is acting as a really, really cool feature preprocessor that you get for free. And there are a couple of different cool research directions, too.
I may not have the slide I'm looking for. What you learn about when you start deep learning is image classification, which is really, really important: given a picture of a cat or a dog, predict whether it's a cat or a dog. Key skill; you can spend a long time on that. A much more interesting question is: given that the model says you've got a cat, why did it say I have a cat?
What features is the model looking at that it used to make the prediction? You might not care about that for cats, but it's useful for doing basic science in other domains. I'm sure you've all heard about this: Lily Peng, a few years ago, became really famous for doing work on diabetic retinopathy detection, so you've probably seen these pictures.
If you Google around for "Google research blog diabetic retinopathy," you'll find she did an experiment where, you know, a patient takes a picture of their retina, and she tried to classify the scans of the retinas as diseased or healthy, to assist ophthalmologists. Basically, Lily did really amazing work, not in writing a fancy image classifier, but in applying it to an important domain. So that was image classification applied somewhere where it mattered.
Can you predict somebody's blood pressure based on a picture of their eye? The answer turned out to be yes, and then also: which pixels in the eye are indicative of blood pressure? And you can actually see them. Anyway, image classification and interpreting how these things work are sort of two sides of the same coin; both are important.
So let me show you some... all right, let's do this: I'm going to point you to a couple of examples, then I'll walk you through linear regression, and then I'm going to walk you through Deep Dream. In terms of examples, here are some of the latest tutorials we just published. (I'm good, I've got Diet Coke, but I've been having too much Diet Coke. Thanks.)
Great question. The question was: within the same layer, do the filters all need to be the same shape? Usually you want the output to be the same shape; all the outputs here are these rectangular volumes, which just makes it easier for the next layer. But you could absolutely have different shapes in your output, and you could absolutely have different shapes of filters.
There are things like ResNet that have, you know, skip connections and things between the layers. But one such paper, which you just hinted at, is called Inception. One question in computer vision research is: what's the right filter size? And what Inception basically said was, we don't know; so at each layer, Inception will run, like, a one-by-one filter, a three-by-three, a five-by-five, and it basically combines the results. It's sort of "let's try everything," and it helped. So yes, you can have different filter sizes. In our tutorials,
almost always we'll just have a single size. One challenge with deep learning is that it's hyperparameter soup: for almost any paper, you'll find a million different parameters you can play with. How many layers? What are the sizes? What are the different activation functions? (Thank you so much.)
You can experiment for a long time developing these models (I was going to say divining; it's a bit more of an art than a science right now). One such project, speaking of the TensorFlow ecosystem, and I don't have slides for this: if you Google "Keras Tuner," Keras Tuner is a library that makes it easy to do hyperparameter tuning, which means trying different combinations of things and seeing which works well. I only know a couple of ways to do this manually. One way to tune hyperparameters, which you should not do, is grid search.
That's trying combinations of a couple of different settings, and it's very slow. Slightly faster than grid search is random search. The reason you don't do grid search is that settings that are very close to each other often have basically the same performance, so you're wasting time by doing grid search; so you use random search. Even better, if you're a mathematician, which I am not, there are all sorts of search algorithms you can apply to hyperparameter tuning, and the Keras Tuner library has some of these built in. It looks really, really, really good.
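A hedged sketch of Keras Tuner's random search, based on the early kerastuner API (names and arguments may have shifted between releases):

```python
from kerastuner.tuners import RandomSearch

def build_model(hp):
    model = tf.keras.Sequential()
    # Let the tuner pick the hidden layer width instead of grid search.
    model.add(tf.keras.layers.Dense(
        units=hp.Int('units', min_value=32, max_value=256, step=32),
        activation='relu'))
    model.add(tf.keras.layers.Dense(10, activation='softmax'))
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

tuner = RandomSearch(build_model, objective='val_accuracy', max_trials=10)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val))
```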
In TensorFlow right now, we have really, really good GAN tutorials, and the reason is that they're fun to look at, so we've been spending a lot of time developing them, and we have this nice sequence of GANs. What a GAN is: almost all the problems you look at while you're learning are classification problems (given a picture, classify it), or you might do regression (you know, predict a price, or a probability, or the weather).
A much harder problem is image generation. If I say to you, don't classify the image, but synthesize me a picture of a cat, that type of problem is of a very different order than classifying things. The reason it's hard is that in deep learning, everything we do needs a loss function: all these DNNs are trained by gradient descent, and the way we get the gradients is by backprop.
The problem is, if you want to synthesize a picture, we need a gradient that tells us whether our picture is good or not. The insight, in 2014, from Ian Goodfellow, is that the way we can generate images is by training two networks in parallel. We use one model, the discriminator, and the discriminator is just an image classifier; its only job is, given a picture, to say: is this a real cat, or is this a cat that somebody synthesized?
We have a second network, which is a generator, and the generator starts out knowing nothing about cats; we teach it to generate increasingly realistic cat photos over time by training it against the discriminator. This is called adversarial training, and it gives us a loss function that we can optimize against. By the way, all these tutorials should also link to the papers they're based on, so you can read more detail. Our first GAN tutorial is important because it runs fast, but it's boring.
It works with MNIST, which you'll see, you know, forever. What we're looking at here is just a little GIF the tutorial produces that shows you the digits it's learning to generate over time: it starts with random noise, and they get increasingly better. By the way, just a detail: we seed the generator with random noise so it doesn't learn to produce exactly the same image again and again and again, and we've fixed the random seed for each of these plots, which forces it to generate the same image,
so you can actually see the progression. Anyway, DCGAN: great, proves the point. That was a later paper, but very quickly you can do much more sophisticated things with GANs. This is a model a lot of you have probably heard about recently; it's called pix2pix, and it's from a wonderful group out of Berkeley. There are a whole bunch of datasets for it; the input images here are these facades, well, not beautiful ones: the output is beautiful.
Here are these (probably grad-student-produced) cartoon drawings of facades, and here are the buildings they correspond to, and this is the output of the pix2pix model. The reason I'm showing you this is that this little web page will run end to end in Colab: if you click the Run button, it will download exactly the dataset you see here, train the model, and show you the output, which is this. So it's beautiful. I mentioned experimental design being important.
Another thing that's obviously important, I'll just say, is not being a... and I bet a lot of you have heard about pix2pix recently just because some people built a crappy company based on pix2pix, which I think is now sunset. There's a lot of good work you can do with deep learning; you can think about,
say, how these models are analyzing patients' eyes. But if you're also, like, a teenager, you can do really silly, pointless things, and that's just something we're dealing with as a community right now. Anyway, another beautiful, beautiful paper from the same group at Berkeley is CycleGAN, and this is real; this is what the tutorial makes. With CycleGAN,
there are cases where it's very hard to collect paired training data. One such example is day and night: if you have a picture of downtown San Francisco during the day, it's hard to get exactly the same picture at night, because cars move around and stuff like that. But there are also datasets where it's almost impossible to get paired training data, because the paired training data doesn't exist in nature.
What the authors of this paper realized is that although you can't get a one-to-one mapping, what you can do is get a directory of horses and a directory of zebras. The adversarial learning problem here is that the generator produces an image of a zebra, and the discriminator can't figure out whether it's real or fake: could this image of a zebra belong in my zebra directory? The loss function also forces the generated image of a zebra to closely match the input image of the horse.
If you stack the two up, you'll see they're almost identical. So we have these two loss functions, and if you look at the code, you'll see it's almost identical to pix2pix. In fact, that's how we wrote the tutorial: we import the entire pix2pix model and slightly change the loss function. A lot of these cool tricks in deep learning are just thinking up new loss functions that describe the problems you care about, and then training models. Yeah? Yes, good eyes.
So, the question was about the background noise and stuff like that in the output. There are a couple of reasons for the background noise. Almost all of these tutorials will run in a few minutes; CycleGAN is one of the few that does not, and it starts to push the limits of Colab. Colab is meant for interactive research or interactive development; we just run our tutorials in Colab because that's what we expect users to do before they install TensorFlow on their local machine, and we didn't train this one that long.
Yeah, there's a wonderful journal called Distill, which is nuts. It's at distill.pub, that's d-i-s-t-i-l-l dot pub, and it has some of the best work around on understanding exactly what these networks are doing under the hood to classify images or do whatever you want. It's research in interpretability, and it's the best that I'm aware of; all of their articles have these beautiful, interactive demos.
If you want to learn about checkerboard artifacts in convolution, they have a whole little piece explaining exactly why those happen; it's an artifact of just the way filters work. So distill.pub is nuts: they publish very rarely, but they maintain a super high quality bar. I think it was started at Google by Chris Olah, but he left recently, and there are contributors from all over the place.
Another thing that might be of interest, and we just published this tutorial, I think, about a week ago, so I'm not an expert in this area: if you'd like to learn about adversarial examples, you've probably heard about these. An adversarial example is an image. This is a panda; to me, this looks like a panda. But by adding this noise to the panda, we can trick the classifier into thinking that it's something totally different. Adversarial
examples are interesting because they reveal weaknesses in the way these models work. Often we'll train these image classifiers and go, yes, pat self on the back, I have this super 99%-accurate model; but really, under the hood, it's not doing what we think it is. This ties in really nicely to the work on interpretability: it would be good if we understood these models better, so we'd have more confidence in the way they work.
Yeah, great question: for this, and for the GANs, what level of TensorFlow are we using to write them? The answer is that it's often a mix. So, let's check. Here, with adversarial examples, we're using the gradient tape, and the reason is that we need the gradients. The simplest way to create an adversarial example, which is what's implemented here, is to get the gradients of the loss with respect to the image.
Basically (and I could be wrong), we get the gradients just as if we were going to do a normal step of gradient descent, and then what we do is take a giant step, really quickly, in the wrong direction. Maybe, under the hood, all these images lie on some manifold, and we're just jumping way off it, and that totally fools the classifier. Because we need the gradients, we're writing this part with the gradient tape.
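A hedged sketch of that fast-gradient-sign idea ("model", "image", and "label" are assumed to exist already):

```python
loss_object = tf.keras.losses.CategoricalCrossentropy()

with tf.GradientTape() as tape:
    tape.watch(image)                    # the image is an input, not a variable
    prediction = model(image)
    loss = loss_object(label, prediction)

# Gradient of the loss with respect to the input image itself.
gradient = tape.gradient(loss, image)

epsilon = 0.1                            # step size: a giant step the wrong way
adversarial_image = image + epsilon * tf.sign(gradient)
```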
However, to actually get the image classifier, we're using basically regular Keras. In addition to TensorFlow Hub, there's something wonderful called Keras Applications, and this is what I would personally recommend: Keras, in both standalone Keras and tf.keras, has a whole box of famous image models built in. Here we're downloading one such model, called MobileNet, and MobileNet has gotten very popular recently because there's a lot of interest in running models on phones, or in web browsers, or on mobile devices generally.
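A sketch of loading a pretrained model from Keras Applications (MobileNetV2 here; the tutorial may use a different variant):

```python
# Downloads ImageNet weights on first use.
pretrained = tf.keras.applications.MobileNetV2(weights='imagenet')

# Helper for turning output probabilities back into readable ImageNet labels.
decode = tf.keras.applications.mobilenet_v2.decode_predictions
```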
That relates to one research direction recently which isn't rocket science but is super, super valuable: basically, how can we train accurate models with fewer parameters? Fewer layers, smaller layers, more efficient functions, so they can run on different devices; there's always the speed-accuracy trade-off. Anyway, MobileNet has a whole bunch of different versions that run fast on different devices.
So this one is a mix. And if you look through the GANs, like DCGAN, you might see that one model is defined using the Sequential API, and you might find another defined using something else. The nice thing about TF 2.0 is that you have different options based on what you're doing, and you can mix and match.
Another really good collection (we're going to separate these out when we launch the library) is for when you want to learn a lot about how this works under the hood. So if you're saying, Keras is great, but I have my own idea for the Berkeley layers library or anything else like that, and you want to write your own, let me point you at resources you can use to figure out how to do that. On the tutorials web page there's this guides section, which will break off later.
If you want to learn exactly how Keras layers and models work, this one is really, really excellent, and it will also introduce the Sequential, functional, and subclassing APIs, which is great. We also have this awkwardly named, needs-to-be-expanded collection which walks you through exactly what tensors are and how they interoperate with NumPy, how exactly tf.function and AutoGraph work, and stuff like that. Some of these guides are excellent, so there are lots of details for you to chug through.
And yeah, you can see this is under active development, so you can see there are different strategies supported right now for different styles of TensorFlow. By the way, in TensorFlow 2, Keras is what we call the recommended API, so if you're starting TensorFlow 2 now, you should use the Keras libraries. TensorFlow is a huge project, and there's another wonderful API called Estimators.
These were originally inspired by scikit-learn, but they grew to become a little bit more complicated. They're very, very popular internally, they're totally supported in TensorFlow, they're wonderful, and they're fast. But if you're starting today, you should probably start with Keras, just because it's a little bit easier to use. If you have existing code that happens to use these things, it's still supported; no worries. So, your question was: in TensorFlow 1.6, it was difficult to write layers (yes, I agree with you); has this problem been solved in TensorFlow 2?
Yes, it has, so I'm very, very happy with TensorFlow 2. I don't want to make an Apple joke, like "courage," but it took courage for the team to pivot the library, and this wasn't BS, unlike the courage from Apple to ditch the adapters, or ditch the ports, I forget exactly what they did, but yeah. So this guide will show you how to write custom layers, and what's really nice about it, as with all the guides, is that you can run it.
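A minimal sketch of a custom Keras layer in the style that guide teaches: weights are created in build(), the math goes in call():

```python
class MyDense(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # Weights are created lazily, once the input shape is known.
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='zeros', trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
```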
TensorFlow 2 is a big project, and one of my favorite things about Google is that it's very much bottom-up. If I had an idea, which I don't, but if I had an idea for, like, the Josh library (instead of Keras, I want to do JoshLib), I could probably take a swing at it, and if it was good, maybe I could open-source it and get users. So we have a lot of people trying a lot of different ideas.
This is how Keras came to be: no one told the Keras team, no, don't do this, so they did it, and it worked really well. We've had a lot of ideas that haven't worked so well. Slim, by the way, works extremely well, but it has a relatively small user base that has done really, really good work. In particular, Slim has a ton of really awesome
pre-trained models; they've done an excellent job. TF-Slim has a GitHub repo with, like, 10, 20-plus famous image classifiers with complete code and all that. It's great, but it's not what we're standardizing on. It's not that Slim is bad; we just picked Keras because it's got a larger user base and it's a little bit more mature and easier to use. I don't know if Slim is deprecated or not. Another thing that's interesting about TensorFlow: when you're writing your data input pipeline, you've basically got two choices.
You can use NumPy, which is what you should start with, and then, if you feel like it, you can graduate to tf.data. tf.data can be faster, but it's also harder to use, and it's a trade-off: basically, if you have an engineering team, tf.data is what you want; if you're a single developer just hacking around, probably start with NumPy and use tf.data if you feel like it. It's just that performance tuning takes some hours to get right.
So if you're writing your input pipeline with tf.data, you should probably benchmark it and start playing around with it. Anyway, I'm going to show you linear regression from scratch, and then I'm going to show you Deep Dream, and the code is almost the same, which is exactly why I want to show you this. Surprisingly! Yes?
Can I talk about TF-Agents? Unfortunately, no; I'm not a reinforcement learning expert, and I've never used TF-Agents. My manager does know a lot about it; you can find him on Twitter, Magnus Hyttsten, or check out the TF-Agents GitHub repo. Almost certainly there's somebody in the room who knows a lot about reinforcement learning and can talk to you about it; you can try and find them during the workshop.
That's another interesting thing about TensorFlow, by the way: on the TensorFlow GitHub site, you will find a whole zoo of different projects like TF-Agents that are being implemented in TensorFlow, in the actual TensorFlow code base. You'll find a ton of them, and this is a really, really nice thing, both for us at Google and outside of Google.
Sketch-RNN is awesome. It looks like a toy for kids, but it's not. If you learn about RNNs, you'll learn about classifying text and about generating text, which is great; we have a tutorial that will teach you how to generate Shakespeare. You can also apply the same idea of generating text to drawings, and that's Sketch-RNN. It's tiny, but here it's loaded for pineapples, so if I start drawing a pineapple, Sketch-RNN is going to try to autocomplete my pineapple.
So this is extremely cool, and it's also very surprising, right? Obviously this is not going to put an artist out of work any time soon, but you can imagine a more serious implementation of this, where you had a tool that... you know, I get writer's block a lot; maybe you could help artists with artist's block, or if my job was to generate clip art, maybe I could see a bunch of possibilities to speed up the process. What's cool is, probably, if I start drawing an octopus... that was amazing.
Yes? (No glasses, can't see you... I see now.) Yes, exactly: it was an app that Google made that had people draw stuff, and it's called Quick, Draw!. You'll notice there's a privacy note somewhere on here: when you play Quick, Draw!, there's nothing identifiable, but it saves your drawings. What's interesting is that Quick, Draw! used to be really easy; it would be like, draw, I don't know, a truck. Now it's really, really hard, because they have a lot of data in there enriching the training set.
So I can't do this... it says draw a camera, but if you start drawing a camera, Quick, Draw! will guess things like glare or suitcase. Oh, I know, it's a camera. Anyway, that drawing goes into the Sketch-RNN database, and what's interesting about the drawings is that I think of drawings as pictures; they're not, they're sequences. And because we have a sequence of brush strokes, you can train an RNN to continue the scene. That's how Sketch-RNN came to be, and on the Magenta website, which is magenta.tensorflow.org,
they have implementations of all this stuff. I should also mention, it's awesome that they share their code. A lot of our tutorials on the website are meant to be relatively minimal examples; it's not "let's train the world's most accurate image classifier," it's "show me some code that will get me started."
These examples, though, are intended to be awesome: they're the code directly from the papers, so they take a lot more time to go through, but if you're serious about learning how this stuff works, it's all right there, which is super, super cool. The other project I wanted to mention, when I just randomly went to the GitHub repo, is called Mesh TensorFlow, and this is probably really interesting to Berkeley Lab: it's for, like, super-distributed training. There's a talk from the TensorFlow Dev Summit
you can watch that will go into Mesh TensorFlow in more depth, if you're, say, a statistician, which I am not. This is another cool thing about deep learning, by the way: I'm an average Python developer, which means I can help people out with their deep learning models, and I'm okay with ML, but you'll see, right off the bat, that there are all these really, really deep sub-disciplines.
Another cool thing: if you feel like contributing to the TensorFlow ecosystem, our whole docs repo is on GitHub. So if you see something in one of these tutorials that can be improved, or you see something that doesn't make sense, please file a pull request or raise an issue, and we'll do our best to fix it.
A
Another really interesting project is federated learning, and this might be of interest to you if you're doing research in privacy. Federated learning asks this question: let's say all of us are users, and we want to train a model that can tag our photos.
A
So Google Photos does this now, but let's say Google Photos doesn't exist. You have a picture on your phone, and we want a model that says: that's a picture of you on vacation with your dog. Let's say that we want to train this model together, but none of us wants to upload our images to a server, which we don't. How can all of us learn a model together while keeping our data private? That's called federated learning, and it's a really cool research area.
A
There's an implementation in TensorFlow, and there's an article on our blog that you can read about it; that's TensorFlow Federated. Something to be aware of for some of these projects, by the way: there's a ton of them, and what I would do before you dive into one is check the activity log. You want to find projects that are being actively developed, maintained, and worked on; there might be some stuff in here that's a little bit older.
A
So let me just explain why we have this. What this notebook does: this is writing linear regression at the lowest level possible. You could do this with Keras, but this is pretending we didn't have it. What this notebook does (I'll go really quickly and just give you the highlights): it generates some random data, a noisy distribution, and, as you might expect, it finds the best-fit line. The other thing this notebook does, in case...
A
...this is the first time you've seen it, and you're new to gradient descent and you want to poke around with exactly how gradient descent works: at the end, the notebook has code to produce this plot. What we're looking at here is that when we do linear regression, we start with a random guess for m and b, the slope and intercept. Our random guess might be up here; it's plotting what those values were, and on the z-axis it's plotting the loss, the squared error. And then, at each step of gradient descent...
A
...you can see how the loss decreases, and it's just a nice diagram. The reason I like this is that it's real. I've seen this diagram in a lot of slides, including slides that I've made, but it's nice just to have a little code that makes it. It also makes it easy to think about gradient descent, right? So here we know that linear regression has a global minimum; deep neural networks do not, as a piece of trivia. It's been a long time since I took calculus, but I remember it, and I hadn't heard of neural networks...
A
...they weren't a thing then. But when I took calculus, I learned about local minima and global minima, right? And if somebody had told me at the time, "hey, with these DNNs, it's unknown if they have a global minimum, and if they do, we don't know if we can ever find it," I would have said, "right, okay, then training...
A
...these things with gradient descent is probably not going to work," because that's what I learned in school. And my intuition would have been totally wrong, and it turns out that a lot of people made the same mistake. It turns out that to train a DNN to be accurate, you don't need to find a global minimum; you just need to find some point on the surface that works well enough, and it turns out that we can find points that work extremely well. And also, because these DNNs have so many parameters...
A
...apparently it's much harder to get stuck in a local minimum that's very bad. This also makes things easier when you learn about deep learning: there's a whole box of optimizers you can use. This notebook just uses gradient descent written by hand, but you'll learn about things like RMSprop and Adam and stuff like that, and a lot of those ideas have good intuition behind them. So you might look at this and say: well, you know, when we have our initial guesses for m and b, they're probably really bad, because they're random guesses.
A
So when we get the gradient, we might want to take a very large step; and then, after we've taken a bunch of steps, our guesses are probably getting a little bit better, so we'll take slower and slower steps, and you might invent the idea of an adaptive learning rate, or a decaying learning rate. Other things you might invent: if you saw the surface, you might come up with things like momentum, to help you roll out of little local minima and stuff like that. Anyway, it's just really nice.
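Those ideas map directly onto the optimizers Keras ships with; roughly, as a minimal sketch:

```python
import tensorflow as tf

# Momentum: keep rolling through small bumps in the loss surface.
sgd_momentum = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

# A decaying learning rate: big steps while the guess is bad,
# smaller and smaller steps as it improves.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1, decay_steps=1000, decay_rate=0.96)
sgd_decay = tf.keras.optimizers.SGD(learning_rate=schedule)

# Adaptive per-parameter step sizes, the idea behind RMSprop and Adam.
adam = tf.keras.optimizers.Adam(learning_rate=0.001)
```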
A
The thing I wanted to show you... I just want to show you two things, and then we'll go into DeepDream. For these DNNs, you always need three ingredients. You need a forward pass, or a way to make predictions, and here's the way we make predictions: in TensorFlow 2 these are tensors, but it looks exactly like regular Python. Our forward pass is y = mx + b; so, given an x, predict y. Then our loss function is the squared error. And, oh yeah...
A
What we need is the gradients of the loss with respect to m and b, and the way we get that is: given our training data, we make some predictions, we calculate our loss, and then, outside of the "with" block, we use the gradient tape to get the gradients directly. This is also a really nice example to have, because you can just print these things out and see exactly what they are and what they represent, which is really nice.
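A minimal sketch of that loop, in the spirit of the notebook (not its exact code):

```python
import tensorflow as tf

# Noisy synthetic data scattered around a known line.
x = tf.random.uniform([100])
y = 3.0 * x + 2.0 + tf.random.normal([100], stddev=0.1)

# Random-ish initial guesses for the slope and intercept.
m = tf.Variable(0.0)
b = tf.Variable(0.0)

def predict(x):
    # Forward pass: y = mx + b, written as ordinary Python on tensors.
    return m * x + b

def squared_error(y_true, y_pred):
    # Loss: mean squared error.
    return tf.reduce_mean(tf.square(y_true - y_pred))

learning_rate = 0.1
for step in range(200):
    with tf.GradientTape() as tape:
        loss = squared_error(y, predict(x))
    # Outside the "with" block, ask the tape for dloss/dm and dloss/db.
    dm, db = tape.gradient(loss, [m, b])
    # Gradient descent by hand: step against the gradient.
    m.assign_sub(learning_rate * dm)
    b.assign_sub(learning_rate * db)
```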
A
Another thing, by the way, about this style of code: if you're doing gradient clipping or something like that, you can implement it in regular Python. But what I want you to look at is the training loop; so, it's that. And now I want to explain DeepDream, and it's going to look very, very similar. This is the new example that will be on the website, hopefully this week.
A
So the result was an LSD trip. This was one of the original meme makers: if you were a Reddit user and you had your hands on DeepDream early, you now have a lot of karma, because people just banged out psychedelic images, and they're really cool. So when you look at this, what do you see? First of all, what's the picture that this started life as?
A
Starry Night, by van Gogh, okay. And what has Starry Night become? Or: what is in Starry Night now that van Gogh might not have put in his original painting? Eyes, animals... and because this is by far the highest-resolution screen I've ever presented on, by the way (this is nice), I can see there are wheels, and that looks to me like a dog, right?
A
It hit the internet. So this is a generative model; by a generative model, I mean DeepDream is producing this image, we're not doing classification. All of the things that you see in DeepDream appear in ImageNet, the famous big image database. And the reason that we see lots of eyes and dog faces... there's a cute little nose...
A
ImageNet happens to have lots of pictures of dogs, flowers, snakes, cars, stuff like that. Normally in deep learning, what you do is you have a model, and the model has variables, or parameters, and you adjust the parameters to fit the data; you train the classifier by tweaking these weights. In DeepDream, we start with a pre-trained image classifier, and the goal is not to adjust the classifier at all. DeepDream is an experiment to understand how image classifiers work.
A
What are the convolutional layers that I showed you earlier actually doing? I had this kind of hand-wavy thing, like, "yeah, layer 4 is detecting textures." DeepDream asks: is it really detecting textures, and can we see what the filters are detecting? So the idea of DeepDream is that we're going to start with an input image, and we're going to modify the image to increasingly excite a filter in a pre-trained image classifier.
A
So say we downloaded MobileNet or VGG, or a model that you trained yourself. In the forward pass, we take an image and pass it through the classifier: it goes through layer 1, layer 2, layer 3, blah blah blah, and the softmax at the end says "it's a cat." In DeepDream, we stop at the layer that we care about. So I might stop at layer 4, and I'll ask the model to actually print out the activations, the things that come out of the ReLUs at layer 4. That will give me a list of numbers.
A
There's a very small amount of code, which is why I find it so interesting. Here we're downloading the model; this is the Keras application, it's pre-trained, and we're getting the ImageNet weights right here. If we had a cat image in memory and called base_model.predict on it, it would probably say it's a cat. It's an image classifier. Great!
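Roughly what's on screen, assuming InceptionV3, which is the model the TF2 DeepDream tutorial uses:

```python
import tensorflow as tf

# Download a pre-trained classifier with its ImageNet weights.
# (Assumption: InceptionV3, as in the TF2 DeepDream tutorial.)
base_model = tf.keras.applications.InceptionV3(weights='imagenet')

# With the classification head attached, predictions on a batch of
# correctly sized images come back as ImageNet class scores, e.g. "cat":
# preds = base_model.predict(batch_of_299x299x3_images)
```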
A
Yeah, if you do model.summary, by the way, you'll see a giant list of all the layers in the model, and the important thing here is that these layers have names. And what we're doing is this: the first thing we need is a forward pass. So when we push our image through the Inception model, we get the output of some layers, and here we've selected these layers. And there are two ways to do DeepDream.
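A sketch of that layer selection; the layer names here are the ones the tutorial picks for InceptionV3, so check model.summary() for your own model:

```python
import tensorflow as tf

# base_model is the pre-trained InceptionV3 loaded above.
# Pick layers by the names shown in model.summary().
names = ['mixed3', 'mixed5']
layers = [base_model.get_layer(name).output for name in names]

# A model whose forward pass returns those layers' activations
# instead of the final softmax.
dream_model = tf.keras.Model(inputs=base_model.input, outputs=layers)
```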
A
Our loss (and there's some code here to simplify this) is just the sum of the activations. Normally we do gradient descent; here we're doing gradient ascent, so we actually want to maximize this loss. We want to modify the image to make this number higher, which means that whatever features these filters are detecting, there's more of them in the image. And then, in gradient ascent...
A
...here we get the loss. So this function will forward the image through the network and sum up the activations in some layer. We need to go in the opposite direction, so we're taking the negative of it, and the magic of autodiff is that once we have it set up this way... Before they cleaned this code up, it looked basically the same as the linear regression, only slightly tighter. The magic is autodiff, because we have everything in TensorFlow.
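A simplified sketch of that ascent step (the tutorial layers more tricks on top of this):

```python
import tensorflow as tf

def calc_loss(img, model):
    # Forward the image through the network; sum up the chosen activations.
    activations = model(tf.expand_dims(img, axis=0))
    if not isinstance(activations, list):
        activations = [activations]
    return tf.reduce_sum([tf.reduce_mean(a) for a in activations])

def dream_step(img, model, step_size=0.01):
    with tf.GradientTape() as tape:
        tape.watch(img)
        loss = calc_loss(img, model)
    grads = tape.gradient(loss, img)
    grads /= tf.math.reduce_std(grads) + 1e-8
    # Gradient *ascent*: step up the gradient so the activations get larger.
    return tf.clip_by_value(img + step_size * grads, -1.0, 1.0)
```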
A
The reason this one is a little bit lower-res is this code: there's a whole bag of tricks that you can add to this to generate higher-res, really pretty images, and this is the minimum amount of code to make it work. That's what we get from these filters, but you can play with it to detect different things, and here we're getting lots of eyes and stuff like that. So DeepDream is this really, really, really cool result.
A
What DeepDream is doing is proving that, yes, in the process of training a CNN, you're learning filters; the automatic feature engineering magic is learning filters that detect things we see in the world. And if you look at the older implementation in TensorFlow 1, which has all the tricks (but it's way, way longer)... ignoring the code: the authors of this library visualized every single filter in every layer of a pre-trained CNN, and you can see exactly what you see in a lot of diagrams.
A
So that's layer one. As you move up the network, you'll start to see filters that are responding to textures of different types. These are a little bit harder for me to interpret, but the point is that the patterns are getting more abstract as we move up the network. And the deeper you go, they still don't make sense to me, but they start to resemble things that you might, I don't know, be able to name...
A
...if you really tried. All right, some are pretty, and as you go really deep, you start to get things that are semantic. So here, whatever this is (and it looks like some strange combination of cute dogs and eyes and snakes, and who knows)... what this literally is, is some filter. So if, say, layer 5 of the network is conv2d_64, this might be, like, the eighth filter in that convolutional block, and this is an image that will make that filter super excited.
A
So whatever that filter detects is right here. The reason it's tessellating across the image is that the convolution is sliding; that's why we see the same pattern repeating. But this is a really big deal; it's an amazing insight. And if you play with this for a long time, you'll see some things that are really creepy, because of the snakes. This one is pretty, though; those look like trees.
A
Thanks, Google, for the photo. So this is the MIT Stata Center. Anyway: you start with a photograph, and you start with a painting, and you try to produce a new image that merges the two. By the way, the way you would do style transfer now is with a GAN, which is both simpler and works better, but this is a very close friend of DeepDream. And what you do... we're not just stacking these images. This is also one of these magical artifacts of: given that we have an image classifier...
A
...what else can the classifier do? The idea is this: if we forward both of these images through the classifier, the layers close to the input of a CNN detect edges and shapes, and those are texture-like things, right? The layers close to the output detect eyes and stuff like that, and those are content-like things. So we can write a loss function: we start with an image that's random noise, and the goal is, when we forward this image through the network...
A
There's some math, but anyway, if you scroll through, you'll see the loss function. So that's style transfer. The way you would do style transfer today is CycleGAN, and the authors shared these graphics with us; they're from the paper, but they gave us the high-resolution versions, which I really appreciate. With CycleGAN you can do style-transfer-ish things, right, except they're a little bit higher quality. So you can go from photos to different artists, you can transfer between different artists, and you can do winter to summer.
A
One thing you could do, too (and you won't have time for it during the workshop; you need to train this for like 10 hours, so it's going to be difficult to do in Colab, and you'll want to use your own hardware; the lab has fast hardware, so leave it running overnight and you'll have a good time): winter-to-summer works really, really well. One thing I want to show you about the CycleGAN tutorial...
A
...is that if you want to modify CycleGAN to go from summer to winter or whatever, you can just change a single keyword in that tutorial, run the model, and it will do it. Or you can collect your own directories of images; you could transfer between Berkeley and Livermore or whatever you want. So it's really, really cool. This uses a thing called TensorFlow Datasets; by the way, TensorFlow Datasets is, conveniently, different from tf.data (TensorFlow data).
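Loading one of those datasets looks roughly like this; the dataset name is the one TensorFlow Datasets uses for the summer/winter Yosemite photos, but check the TFDS catalog:

```python
import tensorflow_datasets as tfds

# Load the unpaired summer/winter photo collections as tf.data pipelines.
# ('trainA'/'trainB' are the two image domains in the cycle_gan datasets.)
dataset, info = tfds.load('cycle_gan/summer2winter_yosemite',
                          with_info=True)
train_summer = dataset['trainA']
train_winter = dataset['trainB']
```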
A
TensorFlow Datasets is a large collection of datasets, things like MNIST and other famous ones like ImageNet, that you can import in tf.data format. Great question: what do we do for feature engineering in TensorFlow? Yes, there's a bunch of stuff. So in deep learning, the best place to start... there are, broadly, two classes of machine learning problems.
A
So, first: if you have structured data (and by structured data, I mean you've got a spreadsheet or a CSV file where the rows could be customers, and the columns, which are your features, might describe things like demographic data; ignoring fairness and privacy for the moment: age, gender, income, whatever), it's a small number of features that are very meaningful to us. When you have data like that, traditional models like trees work extremely well; it's very, very hard to beat a decision tree with deep learning...
A
...until you have a lot of data. The other type of problem you can have is a deep learning problem, where you have lots of features, like pixels or words, where individual pixels don't mean much to us, but, because of this feature engineering trick, the network can transform them into more meaningful representations. So you have these two flavors of problems. Deep learning does work for structured data too, but often you need more data, lots of data, before you start outperforming those methods. So if you have a structured data problem, a really strong baseline is a decision tree, and slightly stronger is a random forest.
A
A gradient boosted tree is probably going to be even better. Start there, and then you can see if deep learning can compete. I'll come back to feature engineering in deep learning in a sec. Here's what you can do with deep learning, though, that you can't do with trees: Kaggle and Petfinder have this really awesome new dataset, and the authors at Petfinder gave us permission to use it, which I really appreciate.
A
The reason this is a cool dataset (it's an important problem, but it's also a cool dataset) is that it has three types of data. It has structured data, or tabular data, which is basically fields like the ID of the pet, the name of the pet, the breed of the pet, the gender; these are scikit-learn-style fields, and you might use a tree for them. It also has pictures of the pets, and presumably, if you looked at a picture of the pet, that would be an informative feature. And it has free text.
A
That's something that they wrote, like, you know: "Fluffy is a six-year-old whatever, and she's really awesome and playful." So we have these three types of data: unstructured text, tabular data, and images. And what this means is that this is a good use case for deep learning on structured data, because you can train a joint model that takes all of these things at once. That's really when you want to use deep learning on structured data.
A
Let me point you to the tutorials we have, which are okay; we're working on improving them. I'm not sure this belongs in "machine learning basics," but it's there, a bit haphazardly: "classify structured data." This is just a starting point, so don't copy and paste it and try to train it on a large dataset; it's importing a small dataset, about 300 rows, from the Cleveland Clinic for heart disease, and it's predicting...
A
...whether a patient has heart disease based on this data. And what this is doing is showing different ways that you can represent that data for a neural network. So we do do some feature engineering, even with structured data.
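A sketch of that kind of representation, using the feature-column API the tutorial is built on; "age" and "thal" are columns from that heart-disease CSV, and the bucket boundaries are arbitrary:

```python
import tensorflow as tf

# A raw numeric column, used as-is.
age = tf.feature_column.numeric_column('age')

# The same column, bucketized: the model sees age ranges instead of
# exact ages. (Boundaries chosen here just for illustration.)
age_buckets = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])

# A categorical column, one-hot encoded from a fixed vocabulary.
thal = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_vocabulary_list(
        'thal', ['fixed', 'normal', 'reversible']))

# A Keras layer that turns a dict of raw features into a dense vector.
feature_layer = tf.keras.layers.DenseFeatures([age_buckets, thal])
```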
And let me point you to a tool that you can use; it's called Facets. The way to find this tool: Facets is from a team called PAIR, who do people-and-AI research. So if you search for "PAIR Facets," you'll find this tool.
A
It's not a TensorFlow tool; it's just a useful thing, but it can demonstrate what we do. Facets has this nice thing here where you can upload a CSV file and visualize it; there's a nice little button you can use. The CSV file that's already here is from the US Census (it's a subset of the 1990 census), and the goal (this is like a perfect storm for fairness) is to predict...
A
...whether somebody makes more or less than fifty thousand dollars a year. So we can color the dataset by... nothing will happen... nice. So what I've done is colored the dataset: blue dots are less than 50k, red dots are more than 50k. What's cool is that if you click on a dot, you'll see the row from the CSV file that corresponds to that dot. So this person is 38, they had a capital gain, and this type of data makes sense. Actually, I'm surprised; I think I clicked on a blue dot.
A
So, less than 50k: I'm surprised that somebody had a capital gain that large and made less than 50,000, so I think this is probably an outlier; but anyway, they're a high school grad. Anyway, structured data. What "facets" means here is bucketing; this is basically a tool to poke around with your data and get to know it. So let's say I wanted to facet it, or bucket it: what I could do is bucket it by age.
A
So now we've divided it into age buckets, and we can see that these kids very rarely make more than 50k, and as you move toward people that are in, I don't know, their prime income years or whatever, you see that the ratio changes. And you can bucket it again: if you wanted to poke around, you could bucket it by whatever you want; you could do it by education and jobs and so on. "Facet" is a fancy word for bucket, but one type of feature engineering you might do in deep learning...
A
...is bucketing your data. You might try to make it easier for the model: if you knew, off the top of your head, that it didn't matter whether someone was 33 or 34 or 35, you might get rid of those features and just replace them with simpler ones by bucketizing the data. I think more interesting than this is another tool that they've just released; it's called the counterfactual tool, or rather, it's the What-If Tool.
A
Yes? Oh, thanks, I should stop talking. Thanks, I'll give this last slide... am I out of time? Thanks for reminding me. So here's the last slide, and then I'll give you books and stop talking. This is a tool called What-If, and it's new to me. It finds something called counterfactual examples, and what that means is: let's say you are predicting...
A
I'll give you two books. So the last one is Deep Learning with Python; that's the Keras book, and it's awesome. If you want the TensorFlow 2 book, there's only one TensorFlow 2 book, by Aurélien Géron; it's the top one there. Only get the second edition, which is not released yet, but you can start reading it on O'Reilly's website; they have a free trial. Only get the second edition: the first one teaches TensorFlow 1, which you don't want. So thanks very much, and I'll be around during the hands-on workshop; I can help with any questions.