GitHub Security Lab, 22 Jul 2020


From YouTube: CodeQL Live Episode 1


A

Welcome, everyone, to CodeQL Live episode one. Today we have Nico Waisman, senior director of security research at GitHub, in conversation with Aditya Sharad, engineering manager for CodeQL core technology. Take it away, gentlemen.

B

Thank you, AJ. I'm excited to be here today. How are you doing, Aditya?

C

Hi Nico, I'm excited to be here as well. I think this will be great.

B

All right, so this is going to be the first of, hopefully, multiple CodeQL Live Twitch streams, or shows, whatever you want to call them. The idea is to look at a variety of vulnerability classes, learn about each vulnerability class itself, and then look at QL and how we can model some of those bug classes and build those patterns with CodeQL.

C

Great. And for those of you joining, by way of introduction: we work at GitHub, with the GitHub Security Lab. I am an engineering manager working on the CodeQL core technology. If you want to find out more about what CodeQL is and how you can use it to find vulnerabilities, check out the GitHub Security Lab website or the link codeql.com to get you started while we talk about it.

B

Sounds good. On my side, I'm Nico Waisman, head of the GitHub Security Lab, mainly focused on security research. Again, if you want to learn about some of the work we're doing, go to the exact same website, securitylab.github.com. All right.

B

Let's start with our first edition, which is going to be around a very old vulnerability class, but one that is still alive, I would guess. We're going to work around string copying, and I know some people will be facepalming right now, because strcpy bugs are as old as, you know, pretty much security itself.

B

I was trying to track down the first vulnerabilities around strcpy, and it's really hard to track down the first one. I do remember one of the first papers about stack overflows. This is not specific to strcpy, but a lot of stack overflows were closely tied to the strcpy function.

B

One of the first papers on stack overflow exploitation was the well-known "Smashing the Stack for Fun and Profit", and that was in 1996, so pretty old, and I'm sure there was older stuff. I think Mudge also wrote a paper on stack overflows at some point, I think even earlier than "Smashing the Stack" from Aleph One. Anyway, for those who are new,

B

we're going to talk a little bit about what the bugs are, and then I'm going to look at one specific bug class related to string manipulation, which is not the one you are probably thinking about, the most common one. All right, we're going to do this interactively; this is not scripted at all. So send your questions as we go. We'd love to learn what you think.

B

If we're not clear enough, let us know, and we will be more than happy to stop and try to address some of those questions. So let me start by sharing my screen. This is going to be a virtual machine running Ubuntu.

B

We are not going to work a lot on exploitation, at least for this edition. We want to concentrate on vulnerability classes more than anything else. So let me start with strcpy, and to do that, let's write a very basic C program that has this vulnerability.

B

Hopefully my C is good enough. I haven't done a lot of C in a while.

C

You're not supposed to write stuff on the fly, live, right?

B

I was about to say that. So please be merciful with us, because we are going to be doing that a lot during this stream and in the future. I'm going to make a lot of mistakes, so please have some patience. We're going to do just a head start, and I promise you this is not going to be about the normal strcpy overflow, but just to get us into context.

B

We're going to do a very basic strcpy. We're going to have three buffers: buffer1 of 20 bytes, buffer2 of 20, and then a third buffer of 20. And, to be honest, instead of 20, let's use MAX_SIZE, as people actually do in real code.

A

Okay, here we go.

B

All right, so what we're going to do is overflow buffer2, which sits in the middle between buffer1 and buffer3, and see how that goes. Just to add some content, I'm going to fill buffer1 with the character A. All right, here we go. And the same is going to happen... I know, I forgot.

B

The same is going to happen with buffer3, and the one we are going to overflow, pretty simply, is buffer2. For those who don't remember, or never dealt with C in the past: strcpy copies a string, up to the null terminating byte, from the source to the destination. In this case, buffer2 is the destination and the source is the first argument of this program.
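
A minimal C sketch of the setup described here. It is not the program typed on screen: the names `demo_frame` and `overflow_middle_slot` are hypothetical, and the three buffers are modelled as three 20-byte slots inside one array (an assumption made so that the overrun stays inside memory the program owns and the behaviour is well defined).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SIZE 20

/* Hypothetical stand-in for the stack frame: three adjacent 20-byte slots
   inside one array, so the overrun never leaves memory we own. */
struct demo_frame {
    char bytes[3 * MAX_SIZE];
};

/* Fills the slots with 'A', 'B' and 'C', copies src into the middle slot
   with strcpy, and returns how many bytes of the third slot changed. */
size_t overflow_middle_slot(struct demo_frame *frame, const char *src)
{
    char *slot2 = frame->bytes + MAX_SIZE;
    char *slot3 = frame->bytes + 2 * MAX_SIZE;
    size_t clobbered = 0;

    /* Demo-only guard: keep the whole write inside the 60-byte array. */
    assert(strlen(src) < 2 * MAX_SIZE);

    memset(frame->bytes, 'A', MAX_SIZE);
    memset(slot2, 'B', MAX_SIZE);
    memset(slot3, 'C', MAX_SIZE);

    /* strcpy copies until the NUL in src; it never looks at the slot size. */
    strcpy(slot2, src);

    for (size_t i = 0; i < MAX_SIZE; i++) {
        if (slot3[i] != 'C') {
            clobbered++;
        }
    }
    return clobbered;
}
```

With a 30-character argument, strcpy writes 31 bytes (30 characters plus the terminator), so 11 of them land in the third slot; with anything up to 19 characters the third slot is untouched.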

B

I'm not used to using Ubuntu on this machine, so I'm going to have some problems with my shortcuts. All right, so we have our exercise here.

B

To include string.h, but I think it still compiles. So now I'm going to run the program and, theoretically, it should

B

crash. Let's do 30 bytes. Here we go.

B

Before we do that, as a way to show the content, we're going to print the content of buffer one and buffer two. So buffer one is going to be... [inaudible]

C

I think it's pretty interesting just to see how you set something up so that you can look at what's happening in memory and then look at the results in a very simple way.

B

Yeah. Instead of debugging... oh, again, this is going to keep happening. Instead of going and debugging with gdb and all of that, and wasting that time, what we're going to do is basically create two strings around the buffer we want to overflow. Then, when we overflow, we're going to see that A and B are getting overwritten by our string. This is, again, pretty simple.

B

We're going to use C, just... and I didn't recompile it. All right, A and B. What's going on? Did I set it to 0x20? No, that's 20.

C

Okay, so I'm going to talk through my understanding of this, because this is very different from the kind of process that I normally follow. You copied a string from the input into buffer two, but because buffer two sits in contiguous memory between buffer one and buffer three, what you've written into buffer two has now overflowed out into buffer three.

B

Correct. What we see here is that, if you look at the stack, we have, theoretically, buffer one here, buffer two, and then buffer three. This one is indeed full of A's, this one is full of B's, and then, when we perform the strcpy, it goes all the way with the C's, up to overwriting part of what is over here in buffer three.

C

Right, so why does it let you do that? It feels like, if you've said the thing is of size 20, then the copying function should be able to stop.

B

Probably in high-level languages, but in C, strcpy is very, very straightforward. It will copy without looking at the size of either the destination or the source. Okay. So the first thing that people thought of to fix

B

that was to use strncpy, which does exactly the same thing, it copies, but it uses a size boundary: destination, source, and the size. The size you want to pass is the size of the destination, and this is an important fact, because if you look at real code, you will be surprised how many of these happen in the wild: people sometimes pass the size of the source.

C

Right, so if your source is much bigger than your destination, then this will very obediently take the size that you give it, and it will write beyond what the destination is able to fit.

B

Exactly. I'm pretty sure, maybe not with strncpy, but maybe you have seen that with memcpy, which is similar, but for arbitrary memory. I'm sure there have been a lot of bugs like that in 2020. So this is not something from the past; these are definitely things that happen now. You might see it with strncpy, though probably not much anymore.

B

So the problem with strncpy is the following. What we're going to do here is make that mistake: I'm going to call strncpy and pass the size of the source. In here, what you should actually do is pass the size of the destination, not the source.

B

So we're going to do that. The problem with strncpy is that, if the string you're trying to write is exactly the size of the destination (or longer), it will not copy the null terminating byte. That means that if you print that string, or do any other operation on it, it might or might not find a null at the end of that string, and that can lead to many problems later in the life of that string.
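
A small C sketch of the missing terminator, assuming a 20-byte destination as in the demo. The helper name `strncpy_left_terminator` is hypothetical; it only reports whether a NUL byte ended up anywhere inside the destination after the copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SIZE 20

/* Copies src into a MAX_SIZE-byte buffer with strncpy and reports whether
   the buffer contains a NUL terminator afterwards. */
bool strncpy_left_terminator(const char *src)
{
    char dst[MAX_SIZE];

    assert(src != NULL);

    /* Correct size argument: the size of the destination. */
    strncpy(dst, src, sizeof dst);

    /* strncpy pads with NULs only when src is shorter than the size. */
    return memchr(dst, '\0', sizeof dst) != NULL;
}
```

A 19-character source leaves room for the terminator; a source of 20 characters or more fills the buffer completely and leaves it unterminated, even though no byte was written out of bounds.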

C

We're going through the history of string copying functions, I guess, right? So we started with strcpy, which did not

C

let us specify the size, and strncpy is a little bit better because you can say what the size is, but it doesn't quite do what you expect, because it doesn't add the null terminator. Okay.

B

Exactly. So the answer to all of that was to use strlcpy, which I think came out of the BSD team. What strlcpy does is exactly the same thing as strncpy, but instead of using the full size, it copies at most the size minus one and adds a null byte at the end of the buffer.

B

So you make sure that whatever you copy is going to have a null byte at the end. Of course, if the size you pass is bigger than the destination, then you are going to have an overflow. If you're not using it the right way, you're going to have an overflow.

C

But at least, if you give it the right size, then it's going to do something sensible with the right size.

B

Exactly. And we can try to do that.

B

Let's see... I did it again. This is going to happen all the time. Sorry.

B

And what we're going to do here is also print buffer two.

B

Yep. So one of the things that I forgot is the include, because strlcpy is not in the regular library: the include is bsd/string.h, and the way we have to compile it is to add -lbsd. Here we go, simple. Now we do the overflow. As you can see, it was limited. And you can see here an interesting fact: for buffer one, we actually wrote exactly 20 bytes, so there's actually no null byte at the end.

B

However, when you declare a stack variable, there is padding around it, and that's why we are not seeing a lot of trash in here. We do see a weird character here that should not be there, but that's a remnant of the stack, because with this memset we never actually set a null byte at the end of those buffers. But what's important here is buffer two.

B

So in buffer two, we wrote exactly MAX_SIZE. In reality, if we had done it with strncpy... and we can do that. Let's do it with strncpy and see what it looks like.

C

If we write MAX_SIZE minus one, because there's a null byte at the end, it looks a little bit shorter in the output.

B

Exactly. So, as you can see, with strncpy it actually wrote 20 bytes, and then we see this weird character, which is right at the buffer size plus one, and then there's a null byte after that. With strlcpy we don't have this problem. Let me compile it again and, as you can see, it's printed well; there's no weird character at the end of this buffer. All right!

B

Okay, that's a vulnerability class from 15 years ago or so. Now let me show you the actual vulnerability class that we want to discuss in this session. One interesting thing, if you look at strlcpy, and hopefully I have the man pages... oh, I did it again. Oh no. All right, so if you run man strlcpy, and hopefully you have the man pages, the description here is very basic.

B

What it says is that these are functions that copy and concatenate strings respectively, that they are safer and less error prone, and that they guarantee that the resulting string is null terminated. That's perfect. Now, if you look at the return values,

B

you will see something very interesting in this specific line. It says: the strlcpy and strlcat functions return the total length of the string they tried to create. And then...

A

What's key here.

B

Yep, the key thing here is "tried to create". So what exactly does "tried to create" mean in this context? Let's try to figure out what it means. What we're going to do is create a variable, size_t length, then I'm going to assign the return value to it here and print it afterwards.

B

Let's see how that goes. What did I do? %ld, I was going to print, so...

C

So that's kind of...

B

That's very interesting, right? It is.

C

...to print. Okay, yeah, I mean, maybe you should...

B

Let me do the following: I'm going to ask what the size of buffer two is. All right, so we have the size returned by strlcpy and the actual strlen of buffer two.

B

All right, I guess we're going to see it in a little bit. So yeah, now we're getting into a good place. So we have...

C

Go ahead, go ahead. Yeah, again, I'm just going to try and talk myself through this. We asked it to write 40 characters.

C

Of course, it realized that it did not have room to write 40 characters, so it only wrote 19 and left one space for a null terminator. This means that if we ask for the length of what went into the buffer as a string, we get 19.

C

But if you ask for the value reported back by the copying function, it's 40. It's the amount that you asked it to write, not how much it actually wrote. And so, if you're expecting one to be the other, you're probably going to have a bad time.
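
A C sketch of the difference between the reported length and the stored length. Because strlcpy is not available in every C library, this uses `sketch_strlcpy`, a hypothetical helper written here to mirror the documented behaviour (copy at most size minus one bytes, always terminate when size is non-zero, return the length of the source). It is an assumption-laden stand-in, not the libbsd implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strlcpy: truncates to size - 1 bytes,
   NUL-terminates when size > 0, and returns strlen(src), which is the
   length of the string it tried to create. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);

    if (size != 0) {
        size_t n = (src_len < size - 1) ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}

/* Reproduces the numbers from the conversation: a 40-character input
   copied into a 20-byte buffer. */
void demo_return_value(void)
{
    char buffer2[20];
    char input[41];

    memset(input, 'X', 40);
    input[40] = '\0';

    size_t reported = sketch_strlcpy(buffer2, input, sizeof buffer2);

    assert(reported == 40);          /* what it tried to create */
    assert(strlen(buffer2) == 19);   /* what is actually in the buffer */
}
```

The two assertions in `demo_return_value` are the two numbers printed in the demo: the return value tracks the input, while strlen tracks the truncated copy.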

C

Is that a thing that you might do? Would you expect that? Would you use the return value?

B

That's a very good question, and the answer is that it depends on the developer. This is one of the interesting things about auditing C code, especially for me, since I have a very short memory, as people on my team have experienced: sometimes, when I'm auditing, I ask myself, what exactly will this return?

B

And I keep asking that question for a lot of different functions. That's partly my lack of memory, but at the same time it works as an advantage, because sometimes you really do forget what certain functions are supposed to return, or what an argument is supposed to do. So there is a whole practice around auditing by using man pages, or by actually looking at the implementation of certain code. The interesting thing is also that

B

sometimes people create wrappers around functions. For example, with strlcpy, you saw that to compile it we need the libbsd library. What some projects and libraries do about that is write their own wrappers. What they say is: if strlcpy is already implemented, use that strlcpy; otherwise, use my own implementation of strlcpy. strlcpy is one example of that.

B

But you will see that a lot with other functions too. The interesting thing is that sometimes you have to look at that implementation, because it might be different from the actual strlcpy implementation.

B

It might work exactly the same, but the return value might be different, and callers might then expect different things than they would from a normal strlcpy. Through that difference, you might find vulnerabilities.

C

It's interesting in this case that, even if you're experienced at auditing this kind of code, this doesn't stick in your memory. So it's pretty reasonable to expect that someone else, maybe coming into a new code base, or developing something and jumping around between what they're working on, might also not remember the subtleties of what this returns and what the intended meaning is. So these are really non-intuitive, in that

C

it's not immediately obvious what you're going to get back out, and whether it's the number, or half the number, or the number minus one that you expect.

B

So let's make an example of that and of how this can potentially be exploited. The way we have seen it in a lot of different code bases is the following.

B

What we see is this first strlcpy. When you see that someone is storing the return value of the strlcpy, that means they're going to use the return value, right? Nobody just gets it for nothing. There are some exceptions, because sometimes we want to check whether we are copying more than the actual MAX_SIZE,

B

so there's some remainder left. But in general, when you see someone using the return value from strlcpy, that means they're going to use it for something. A good example of that is another strlcpy, to write some other string into the same buffer. What they do is copy into the same buffer two, and use the return value as an offset.

C

Okay, so you're copying things in sequence into the same buffer. Oh, you're really doing string concatenation, I guess. You're building up a long string which has a bunch of segments that you've placed in.

B

Exactly. In this case we're using the first argument and the second argument of the program and copying both into buffer two. And this is valid: for example, if buffer two is big enough and we're not copying more than we're supposed to, then we will be copying these two strings into the same buffer, and that will work pretty well. So what they do is

B

use the length that we see here as an offset for where to start copying the second string, argv[2]. Then, in terms of the size, since they don't want to overflow, they use MAX_SIZE minus that length.

C

Okay, so after the first copy, I hope that I've copied length bytes of something into the buffer. Then I start at the end of that string that I copied, and the amount that I copy the second time is the space that I think is left. Okay, so that would make sense if the length were really the length that I actually copied, I guess.

B

So let's start with that and use it in a valid way. We're going to compile that and run the exercise with some X's and some Z's, and what we're seeing is exactly that: the first argument and then the second argument are copied into buffer two. Everything looks fine.

B

The length is five bytes, because the length, remember, is the return value of the first strlcpy, which is those five X's. The final string length is the strlen of buffer two after we did the two strlcpy calls, so that's ten bytes. Sounds good. Now let's start having some fun. We start adding more X's here, and so far, so good. What happens here is that

B

the first argument is 19 bytes... actually, let me do it with Python, so it's much clearer. We're going to copy 19 bytes, and then we're going to add some Z's at the end. That's perfect; it works well. Why did it work well? Because the return value was 19 bytes, so it actually computed MAX_SIZE minus 19, which is one, and that's okay: we try to copy one byte.

B

But since strlcpy copies size minus one, it actually ends up copying nothing at all and, as we can see here in the result, we got a normal buffer two without the Z's included.

B

If we go one byte less, 18, then we see the one Z at the end. Right. What happens if we go one byte more? So far, so good. But if you do 30, then something will happen.

C

So we started writing in the wrong place, because we wanted the Z's to go into buffer two, but we ran out of room in buffer two, and I guess the offset where we started writing was outside of buffer two.

B

And let me do the following, so we can look at how the stack looks: I'm going to order it this way.

C

That makes sense. Let's do that.

B

So, as you can see here, what happened is that buffer two copied all the C's, then the Z's were copied after that, and the Z's ended up overflowing into buffer three. And you can see that the length returned when we did the first strlcpy was 30 bytes.

C

But it didn't actually copy 30. We asked it to copy 30 and it copied 19.

B

It's getting interesting, right? Okay.

C

So this is quite subtle.

B

Yeah, it's quite subtle. So let's understand what's going on here. To do that, we're going to replicate which arguments are actually being passed to the second strlcpy.

B

So we're going to print it out: this is going to be strlcpy of buffer2 plus length, then argv[2], and then MAX_SIZE minus length. So here the offset is going to be %d with length, and the size argument is going to be MAX_SIZE minus length. Basically, we are printing out what's going to happen with the second strlcpy.

B

Oh, I forgot to add a line break. Here we go.

B

So now, again, what's happening is the following. First of all, there's the vulnerability right there: no matter what the size is, we're starting to copy at buffer2 plus 30, and remember that the size of buffer two is 20. So whatever we are writing here is already going to overflow; it's past the boundaries of the buffer.

B

Then, of course, we're copying the content of argument two. And what is the size we're copying? Minus 10. That seems bad. That is definitely bad. The way strlcpy works, the third argument is a size_t, and that means it's an unsigned integer. So...

C

If you give it something negative, it's going to underflow into something really, really large.
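
The wraparound can be checked in isolation with a few lines of C. The function name `second_copy_size` is hypothetical; it simply performs the MAX_SIZE minus length subtraction in size_t, as the second call in the demo does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The size passed to the second copy in the pattern under discussion. */
size_t second_copy_size(size_t max_size, size_t length)
{
    return max_size - length;   /* unsigned: wraps when length > max_size */
}

void demo_wraparound(void)
{
    /* 20 - 19 leaves one byte, which only has room for the terminator. */
    assert(second_copy_size(20, 19) == 1);

    /* 20 - 30 is not -10: in size_t it wraps to SIZE_MAX - 9. */
    assert(second_copy_size(20, 30) == SIZE_MAX - 9);
}
```

Printing that value with a signed format specifier shows it as minus 10, which is why the demo output looks negative even though the function receives a very large unsigned size.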

B

Yes, okay, yes, it's unsigned. So that's where the magic happens. Of course, in this case, strlcpy is not going to overflow all the way. If it were memcpy, what would happen is that it would take that size and overwrite that many bytes. So if the size is minus 10, which is a very large number as an unsigned value, it's going to try to copy that number of bytes.

B

But since strlcpy always stops when it sees a null byte, it's only going to copy as much as argv[2]. Okay. So that's what's happening: we end up overflowing into buffer three.
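
A C sketch of the whole pattern, under the same assumptions as before: `sketch_strlcpy` is a hypothetical stand-in with strlcpy-like behaviour, and the three buffers are modelled as slots of one 60-byte region so that the stray write is observable without leaving memory the program owns. `unchecked_concat` is likewise a made-up name for the two-call sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SIZE 20

/* Hypothetical stand-in with strlcpy-like behaviour. */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);

    if (size != 0) {
        size_t n = (src_len < size - 1) ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}

/* region holds three MAX_SIZE slots; the middle one plays buffer2.
   Returns the offset inside region where the second copy started. */
size_t unchecked_concat(char *region, const char *first, const char *second)
{
    char *buffer2 = region + MAX_SIZE;

    /* The return value is the length of first, not the bytes stored. */
    size_t length = sketch_strlcpy(buffer2, first, MAX_SIZE);

    /* Demo-only guard so the stray write stays inside the region. */
    assert(length + strlen(second) < 2 * MAX_SIZE);

    /* Bug: length is trusted as the offset, and MAX_SIZE - length wraps
       to a huge size_t when length > MAX_SIZE. */
    sketch_strlcpy(buffer2 + length, second, MAX_SIZE - length);

    return MAX_SIZE + length;
}
```

With a 5-character first string the second copy starts at offset 25, inside the middle slot. With a 30-character first string it starts at offset 50, which is 10 bytes into the third slot, and the size check no longer limits anything.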

B

And that's the bug. We can see multiple different bugs around that: the second function can be a strncpy, it can be an snprintf, it can be a memcpy.

B

It can be any function that copies memory. It could be a strlcat too.

C

Expecting a size argument, right? And then, if we're using a size argument that's derived from this value that strlcpy gave us, it's not necessarily going to be at the position that we expected. Okay.

B

So now that we've identified the vulnerability class... Of course, to exploit that on modern computers, you'd have to go about bypassing all the different security mitigations we have nowadays. But the way you would do it a long time ago is that you would just overwrite the return address, and then you get control of EIP. Now,

B

the interesting thing about this specific vulnerability is that it's not just a stack overflow. If you can control length, then you can modify the offset to move the write around, potentially bypassing any stack canary protection and things like that with that offset. So there are many ways you can potentially exploit that on modern operating systems.

C

But it won't even notice, the way it would if you'd smashed the stack canary or done other things like that. You can bypass that because you control exactly where you're writing.

B

Exactly. So, setting that aside, let's start to look at QL. Let me stop sharing, and let's see how someone would start modeling that vulnerability class.

C

Great. So what we've talked through here, and there have been some very interesting comments in the chat about the value of talking about the vulnerability side of it and then getting to the QL... I think this is very interesting, in that

C

I want to know very clearly what the problem is when I start to write a query, or at least have some idea of the pattern I'm looking for, and I think what you've shown me here gives me a good idea of that. I'm looking for these particular string copying functions, I know what their arguments mean now, and I also know that the return value is interesting and the size arguments are interesting.

C

So with that knowledge, let's write some CodeQL.

C

And what have I got here? I've got the CodeQL extension for Visual Studio Code. If you've been following along with GitHub Security Lab and associated technology, this should be familiar. I've got a database of some open source code: this is Apache Guacamole.

C

I've taken this from a historical point in time, so it's a fairly old database, but it's got some interesting patterns in it, and I've got a starter workspace which has the QL standard queries and libraries and the ability for me to write my own.

C

So let's write a new query for ourselves. In order to start doing anything interesting, I need the standard libraries for CodeQL for C and C++. And what do I want to look for first? I guess, to start with, I want to see whether this API you described is being used. Do we use these string copying functions? Are they in the code base at all? And maybe I can describe them in a nice way, given that we know them by name. We know they're a particular kind of function

C

call. So let's say something like... a string copy call. And what kind of thing might this be? It's probably a function call. I'm just going to make my editor a bit bigger.

C

So I'm using FunctionCall from the standard library. Now that I've defined my own class, which will describe a set of values, I need to say what it means to be a member of this set of values.

C

Well, I can get the name of this... oh, I should get the target of the function call. Because it's a call, the target will tell me the function that I care about, and now I can get its name. You mentioned a couple: strlcpy and strlcat. Let's just use those. strlcpy, there's no O there, there we go, and strlcat. Let's start with those for now. And I know some other information about these.

C

I can get the return value, because that's at the point where I call, but maybe I also want the size argument. So let's write a member predicate on this class which returns an expression: the argument of that particular call.

C

Let's say it gets the size argument, because that's what it does. And what do I want to say?

C

this.getArgument. Now, strlcpy and strlcat have the size as the third argument, if I remember your example correctly. Because we are programmers in C, we use zero indexing, so this is two, and I equate this to the result variable. So it says this returns the argument at position two of this call. All right, so let's look for these, from such a call.

C

And let's run this query.

B

And I got nothing. Nothing. Have you tested this query before?

C

So, I guess, you mentioned that people write wrappers around these functions, right? So maybe names are a very strict thing to match on. Maybe they've got their own version of it, something like that.

C

What we can do is try to match this a little bit more generally, so let's say something like

C

I know, matches... let's do a regex match.

C

And then I can have anything at the beginning, and then I have strlcpy or strlcat.

C

Let's see if this does any better. I can account for the fact that they might have written some naming prefix at the beginning.

C

That is a little bit better. Okay, now, because this is Guacamole, they have Guacamole naming, so everything is guac_. Great. Now, what do these look like?

B

Not for this session, but I will definitely look at guac_strlcpy and make sure that the return value is doing exactly what we're expecting it to do. But it is, right? Because we already checked it out.

C

We've got to look at it. Okay, so I see a bunch of these calls; there's a whole bunch of them. There are some that are guac_strlcat and some that are guac_strlcpy. That's good; I can handle both of those patterns reasonably well.

C

This one doesn't use the return value, so maybe that's not very interesting. Oh, here we have the length: we use the length, but then we check it and do some kind of bounds checking, so maybe that one is okay. This one kind of looks like the pattern that you described, right? We do this string copy, we get the length out, and if it's beyond a certain size, we bail. But then we use it again, in exactly the pattern that you described.

C

You subtract the length and then you do another copy, and you actually keep doing this a few times.

B

And that's exactly what you will end up seeing in functions that implement this in the correct way: as soon as they get the length, they check it against the third argument of strlcpy, right?

C

Which is what we had in the size over here.

C

This one doesn't seem to check it against the length. It checks it against zero, but it doesn't look like it checks it against the bounds that you might expect it to be limited within, so that one looks a little bit more concerning. Now, of course, this query is not super interesting. I've just looked for all the calls; you can do this with grep. But we know that there are some interesting examples in these results.

C

Maybe we can improve our query to actually look for the pattern that we care about. What is interesting in this problem that you described is that we want to take the return value of this call and see whether it ends up in the length parameter of another of these calls. Sorry, the size parameter of another of these calls.

C

So, to do that, we talk about something much more complex in CodeQL: the flow of data through the program. I want to see that the value in the first place flowed into the expression in the second place.

C

How do we do that? Well, we import the libraries that allow us to reason about data flow.

B

Which describes a lot of vulnerabilities, right? A lot of vulnerabilities are about data flow: there's a source that ends up in a dangerous place, a source of a certain kind and a sink of a certain kind.

C

It's a nice pattern. When we write it, it's a little bit of boilerplate, and the audience will see this in a moment when I write the query, but it's a very powerful paradigm where you can describe this kind of common situation: here are sources, here are sinks, here is the flow of information between them. And a surprising number of queries, as you said, just fall out of this pattern. They all kind of look the same.

B

Even if you talk about web, right, an SQL injection is like that: there's a source, something that you control, and then there's a sink, where the value you control ends up in a method that queries a database.

C

Right, and I guess cross-site scripting is the same: if your sink is writing it on the page, then you've got the same kind of vulnerability. Okay. So, given that all these queries, all these problems, have the same form, we describe them in a similar way. We write these configurations, which are really just boilerplate, to describe the sources and the sinks.

C

So in this case, I'm writing my own configuration class, which extends TaintTracking::Configuration. That does most of the work. I need to give my configuration a name; it doesn't matter what this is.

C

My configuration needs to have a couple of predicates in it. It needs an isSource and an isSink predicate, both of which describe the endpoints of this problem. So let me write isSource first.

C

My source is a DataFlow::Node. This refers to a point of the program that might have a value. What did I say the sources were? I said it was just the return value of such a call, right? If I get the length out, it might be surprising.

B

Because it all starts when someone gets a return value out of strlcpy or strlcat.

C

Right, and so that's what I want. Such a return value is an expression, and in particular it is one of these calls.

C

So I say that the source is an expression and it belongs to the set of interesting string-copying calls that I described previously.

C

I need to do something similar for sinks. We'll call it sink, so I don't confuse myself. Now my sink is not the call itself; my sink is the size argument of such a call, and so I can do something like equals...

C

Take any string-copying call c and get its size argument.

B

So let me understand what you've just done. We are defining a taint tracking configuration to model the data flow that you were mentioning, and then we are overriding two different predicates. One is isSource, which describes the source of the data flow, and in this case what we're looking for is the return value of a call to strlcpy or strlcat. And that data has to flow all the way to the same function, or a different function, that is also strlcpy or strlcat.

B

But instead of looking at the return value, as we did for the source, now we are looking at getSizeArgument, which, as we defined, is the third argument of those functions that we mentioned. So we're going from strlcpy to strlcpy, basically.

C

And they might be the same, they might be different. You know it's possible for it to satisfy this condition in both cases, but we're not insisting that they are exactly the same, uh because that would be very narrow.

C

So I've described this configuration, and you have described it back to me. Now I need to use it in a query. Let's add our select clause.

C

We can use these path nodes so that the editor can display a nice path that shows us how information gets from one to the other. I use my configuration, because that describes the problem. And what do I know about this? I want to say that the config has flow between the source and the sink, and I will select them in a particular order. That order is only interesting because we want to format the results in a nice way. So I can say: here is the return value.
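
The source, sink, and "has flow" vocabulary boils down to reachability in a graph. The following Python sketch models that idea only; it is not how the CodeQL libraries are implemented, and the node labels and the helper `find_flows` are invented for illustration.

```python
from collections import deque

def find_flows(edges, sources, sinks):
    """Return (source, sink) pairs where the sink is reachable from the source."""
    flows = []
    for source in sources:
        seen = {source}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if node in sinks:
                flows.append((source, node))
            for succ in edges.get(node, ()):
                if succ not in seen:  # visit each node once per source
                    seen.add(succ)
                    queue.append(succ)
    return flows

# Hypothetical flow steps: each key flows into the listed successors.
edges = {
    "return value of strlcpy #1": ["length"],
    "length": ["size - length"],
    "size - length": ["size argument of strlcpy #2"],
    "return value of strlcpy #3": ["unused variable"],
}
sources = ["return value of strlcpy #1", "return value of strlcpy #3"]
sinks = {"size argument of strlcpy #2"}

flows = find_flows(edges, sources, sinks)
```

Only the first source is reported, because the third call's return value never reaches a size argument.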

B

Before we actually run it, can you walk me through any? Because I can see that you're not using any on the source, for example, and you are using it on the sink.

C

That's a good question. any is kind of cool. We could use any in the source. We could do something like this.

C

This is perfectly valid and it's exactly the same as what I wrote before. So I guess you have a choice. You have a value that we've described, the value is source.asExpr(), and we want to say that it belongs to a particular class.

C

So you can do this in one of two ways. Either you can equate it to something that belongs to the class, which is what this equals and then the any expression does, or you can just say directly that it belongs to the class. "Belongs to" is instanceof, and so instead of an equality, we're saying it belongs to this class or this set.

C

In the second case, we don't really have a class for the size arguments. We could maybe write one, but we haven't. Instead, we can get them as values. We can get them as the return expression, the return value from this predicate, and so I'm going to equate these.

C

But I can only call getSizeArgument if I have one of these string copy calls floating around, and so I can use any to find such a value. For anyone who has seen exists in any of the CodeQL tutorials or other queries, these are quite similar. You could equally well write this as, um, the...

B

We have the same question on Twitch right now: what's the difference between exists and any?

C

The audience is ahead of me. These are semantically equivalent. I could say there exists a string copy call, and call it c, such that the sink as an expression is c's size argument.

C

So these are the same. These are just equivalent ways of writing the same query. This is kind of the nice thing about CodeQL: there are many ways to write the same thing. The difference does not matter as long as it is clear to you as the writer and ideally to your readers.
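
The equivalence of the two styles can be illustrated with plain sets. This is a Python analogy only, with made-up call facts; note that Python's built-in `any()` plays the role of the "exists" style here, which is unrelated to CodeQL's `any` expression.

```python
# Hypothetical facts: (call id, function name, size argument expression).
calls = [
    (1, "guac_strlcpy", "sizeof(buf)"),
    (2, "guac_strlcat", "size - length"),
    (3, "memcpy", "n"),
]

def is_string_copy_call(call):
    return call[1].endswith(("strlcpy", "strlcat"))

candidate_sinks = ["sizeof(buf)", "size - length", "n", "42"]

# Style 1 ("any"): build the set of all size arguments, then test membership.
size_args = {call[2] for call in calls if is_string_copy_call(call)}
sinks_style_1 = {e for e in candidate_sinks if e in size_args}

# Style 2 ("exists"): keep e if some call c exists whose size argument is e.
sinks_style_2 = {
    e for e in candidate_sinks
    if any(is_string_copy_call(c) and c[2] == e for c in calls)
}
```

Both styles produce the same set, which mirrors the point that the choice is about readability rather than meaning.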

B

So there's no optimization from using any rather than exists, or anything along those lines that we should know about?

C

Yeah, in this case the engine, the optimizer, will compile them down to effectively the same thing, so there's no performance gain from doing this. My intuition for deciding how to write something, in terms of style, is that this is kind of a one-time-use variable, right? If I were saying more things about c, then I would probably use an exists, because it's quite useful then to declare it as a local variable and say some stuff about it.

C

In this case, I only want c so that I can get its size argument, so it feels like a natural place to use any. But, like I said, it's not meaningful in terms of the behavior of the query, so pick whichever one you find most intuitive. Now I've written this query with this particular structure. I will also import a little extra library that allows me to visualize the graphs in the editor, and I will add some metadata.

C

This is not super interesting from the point of view of you as a researcher, but just to make the tooling work. We say that this query is formatted in such a way that it will produce graph results or path results, and now I will run the query.

C

Of course, this will take a little bit longer because we have started to use a much more sophisticated library which needs to be compiled and then evaluated in order to understand the flow of data through the program.

B

And let's talk a little bit about the flow of data. Is this happening inside a function? Is it happening between functions? How does that work?

C

So there are different levels of the data flow and taint tracking libraries. There is a local variation, local taint tracking, and this will track the flow of information within a single function.

C

This is quite feasible to compute, so we can actually compute it for every single function in the program.

C

However, this is pretty limited. In our example, suppose you'd done this strlcpy call in some other function, and then you returned the value back out, and then it ended up in some other function.

C

Local taint would just completely fail to catch that. So it's a good place to start in terms of writing your query, but it's limited. If you're writing a query, if you're listening in, you might want to start with something local, and then you can gradually build your way up to using this configuration. And the difference in using this configuration, and this hasFlow or hasFlowPath

C

situation, is that this is global flow. It is interprocedural: it is capable of reasoning about the flow of information across function boundaries. And we can inspect, in some of the results, whether we actually managed to get any interesting results in this code base that cross function boundaries.
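
The difference between local and global flow can be sketched as a choice of which edges the analysis is allowed to follow. This Python model is an illustration under assumptions: the function names `copy_name` and `caller`, the node labels, and the single call edge are all hypothetical.

```python
def reachable(start, edges):
    """Return every node reachable from start by following edges."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for succ in edges.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen

# Nodes are (function, value) pairs. Local edges stay inside one function.
local_edges = {
    ("copy_name", "return value of strlcpy"): [("copy_name", "returned length")],
    ("caller", "length"): [("caller", "size - length")],
}
# Assumed interprocedural edge: the callee's result becomes the caller's variable.
call_edges = {
    ("copy_name", "returned length"): [("caller", "length")],
}

source = ("copy_name", "return value of strlcpy")
sink = ("caller", "size - length")

local_only = reachable(source, local_edges)
with_calls = reachable(source, {**local_edges, **call_edges})
```

With local edges alone the sink is never reached; adding the call edge connects the two functions, at the cost of a larger graph to explore.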

C

Of course you pay a price for that. It's slightly more expensive to compute, which is why we have to narrow down our sources and our sinks to something pretty manageable.

C

If these are very huge, then your query is going to take quite a while to compute.

C

So what have we got here? We've got a set of results. I'm curious. I can expand these paths. This is always quite fun.

B

So what we're seeing here is that number one is the source and number two is the destination, right? Or the sink. Yeah, indeed.

C

These are similar. Now remember, we spoke about the kind of bounds checks, but we haven't done anything about them in our query, so this is just, from an auditing point of view, finding all the calls to string copy whose return value might end up in this length argument. And we notice that it handles this subtraction operation fine.

C

This is because we use taint tracking and not data flow, so we're fine with these minor changes to the value. As long as some value is derived from the length, probably with some arithmetic involved, we're still going to catch it.

C

And this is one of the interesting ones, I think, that we described, right? There's no bounds check against the length. I won't go through all of them. Oh, this one has some possible paths; there's some variation. There is a conditional which is guarded by this macro over here.

C

I can jump to the definition and see what it is.

C

So I see that there's this macro that actually does a size check before it attempts the subtraction, and it bottoms out at zero if you're doing something invalid. So these are false positives, because there is a sensible size check, and we can think about how to tackle that in a few moments.

C

But for now we are at least convinced that this analysis is perfectly capable of looking through a few layers of indirection and describing to you these multiple possibilities for how you might end up in the size argument of one of these string copy calls again like I said I won't go through all of them, but it's quite interesting that you're capable of doing that.

B

Let's look at some other ones where it says fs.c.

C

Let's look at fs.c.

C

Oh, well, there are no checks at all. No, not

A

Even an attempt.

C

to check. This convert function is doing some kind of path computation, right? Yeah.

B

Now, of course, to actually be able to exploit that, you need to be able to send a source that is bigger than the destination, like this.

C

Right, so you need your parent, whatever that is, to be bigger than the max path.

B

Sometimes that's possible, sometimes not.

C

In this application, you need a file system path that is longer than whatever they're setting for max path. Maybe that's possible.

B

And this is actually the function that we reported to Guacamole when we audited this functionality. It was not a bug in itself, because we couldn't find a place where the parent was bigger than the RDP FS max path. However, we let them know that the pattern there could be potentially dangerous in the future, so they should look at it.

C

Yeah, and even when you're looking at the results of these queries, what we've written here is, I think, an exploratory query. It's a great starting point. Frankly, it doesn't give you that many results, so it's quite manageable, but it's a good starting point to say: what am I looking at in this code that is probably buggy? And then you can use your expert knowledge in terms of whether this is exploitable, what to do with it, that kind of thing.

B

We have a question from the audience: what is the difference between a top-level expression and, say, a call? In the U-Boot challenge, there was a step where we had to find the top-level expression of calls.

C

Good question. So the U-Boot challenge, again for those not familiar, is one of the tutorials on GitHub Learning Lab that involves analyzing the source code of U-Boot and finding particular calls. Those were macros, if I remember correctly. So there's a distinction here, in our libraries and in how we model C and C++ code, between functions and macros, function calls and macro

C

invocations. Macros are, in my personal opinion, quite annoying to model and to reason about when you're writing a query, because they can expand to almost anything. They can have any syntactic element; it doesn't have to be complete or fully valid syntax, it is just some fragment of code. However, we can give you the invocations of these macros, and then you can look at the values that come out of them.

C

So in those cases the U-Boot tutorial asks you to find the macro invocations and the corresponding expression, the main call that comes out of this, because those macros really just wrap function calls most of the time. Here we are in a nicer world. We are dealing with function calls; these are not macros, although we could maybe change the query to handle that case. And so the function call is already an expression.

C

If we follow up the inheritance hierarchy using jump to definition in the editor, we see that it is a particular type of expression. So here we don't need to do this top-level expression business, because we're already at the top-level expression. If you're reasoning about macros, work with care: you will probably need to find the corresponding expression for the macro invocation.

B

All right, so what happens if we want to add snprintf to this data flow? snprintf is similar to what we have seen here, but the size constraint is the second argument, not the third, so that is going to change the way we are looking at this, right?

C

All right. Let me just write down the patterns in a boring fashion. strlcpy is dest, source, and size. strlcat is pretty much the same. And then I have snprintf, which is, I forget, the first one. It's

C

something like...

B

Then the format string, and then your values.

C

Okay, so I've got to handle these separately, and that also affects my getSizeArgument predicate, right? Because I want this index to be based on what I find there. So there are some options for doing this. Because we've written a class, to me the most natural way is to say that this is a value that is associated with the class. We allow you to declare fields. This is a little bit like a field might be in Java or C++ or some other language like that.

C

So we can say: size arg index.

B

Someone has a question: how can CodeQL know what c.getSizeArgument() is on any function? And this is exactly what Aditya is about to explain.

C

Yeah, I think some folks have also asked this a little bit earlier, where they were trying to make this more general. You can do this with an exists written within your characteristic predicate; that's this predicate here that defines what belongs to the class. But then you have this problem: how do I work out what that value is here, in this other predicate? So we're going to declare it as a field.

C

Just remember that everything in CodeQL can have multiple values. It doesn't have to be just one, so you have to handle these with a bit of care. Now, this is not bound to a value; it's complaining at me. So let's bind it. Let's write these different possibilities. One way I could do this is to write something like:

C

I have a name. The name is

C

getTarget().getName(). And what else do I know about this? Now I have a bunch of different possibilities.

C

Excuse me, let's go back here. So my first case is that it's strlcpy, so I can say name

C

equals strlcpy and size arg index equals 2, or something similar for

C

strlcat, where the size argument index is also 2, or the name is snprintf

C

And the index is 1.

C

And now I need to close my exists. I'll format it just to make sure I can read this in a reasonable way. So I say that the name is the name of the called function. If it's strlcpy or strlcat, then I get index 2. And of course I've not bracketed this correctly, so let me make sure I do that. Otherwise, my ands and ors will get mixed up.

C

That's better. So either it's strlcpy or strlcat and the index is 2, or it's snprintf and the index is 1. And now I had better use the correct index over here.

C

This is slightly more general. I lost the handling that I had of guac_strlcpy, of these prefixes, so I could do something like this.getTarget().getName().matches with a wildcard for anything at the start, plus this name.

C

I think this is a reasonably neat way of describing all the patterns that we care about.
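
The name-to-index association described here can be modelled as a small lookup table. This is a Python sketch of the idea, not the CodeQL class from the stream; `SIZE_ARG_INDEX`, `size_arg_indexes`, and `size_arguments` are hypothetical names, and suffix matching stands in for the prefix wildcard.

```python
# Zero-based position of the size argument for the three functions discussed.
SIZE_ARG_INDEX = {"strlcpy": 2, "strlcat": 2, "snprintf": 1}

def size_arg_indexes(function_name: str) -> set:
    """Return every index associated with the name (possibly none).
    A prefix such as guac_ is allowed, mirroring the wildcard match."""
    return {
        index
        for base, index in SIZE_ARG_INDEX.items()
        if function_name.endswith(base)
    }

def size_arguments(function_name: str, args: list) -> list:
    """Pick the size argument(s) out of a call's argument list."""
    return [args[i] for i in sorted(size_arg_indexes(function_name)) if i < len(args)]

strlcpy_size = size_arguments("guac_strlcpy", ["dest", "src", "sizeof(dest)"])
snprintf_size = size_arguments("snprintf", ["buf", "sizeof(buf)", '"%s"', "name"])
memcpy_size = size_arguments("memcpy", ["dest", "src", "n"])
```

Returning a set rather than a single number reflects the remark that a name could, in principle, be associated with several indexes or with none.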

B

So let me see if I understand. We're looking at any function call whose target, which is what we are actually calling, matches strlcpy, strlcat, or snprintf, and for each of those we are specifying the argument index of the size that we're looking for: for strlcpy, number 2; for strlcat, number 2; and for snprintf, number 1.

C

Yeah, and then we attach that value to the current value in the class, and we are allowed to access it over here. This example is neat because there is only one possible value for each such name, but you could write many. You could have two or three, and you could do all sorts of stuff like that.

B

Are we assigning the value to size arg index? Is that how you should think about it?

C

Probably a better way to think about it is as asserting that two things are equal. These are not assignments, also because you can write them the other way around.

C

This is perfectly legal, and so this trips up a lot of people who are used to assignments. Yeah, I think of it as an assertion of equality. Any variable can have a whole bunch of possible values, determined by its type. So in this case an integer variable must be, you know, a 32-bit integer, and you then decide what the possible values are narrowed down to. So by writing such an equality assertion, you are asserting that in this situation, the value of this must be 2.

C

Right, someone in the chat mentions it's like a filter that takes a multi-valued result set. That's another helpful way to describe it. It's worth developing your own mental model of how you want to think about these. I like to say there are sets of possible values, or tables of possible values, and predicates, and then they are narrowed down through these assertions.
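
The "filter, not assignment" mental model can be written down as a table of candidate tuples that a condition narrows. This Python sketch is an analogy only: the small index range stands in for all integers, and `holds` is a hypothetical name for the condition.

```python
from itertools import product

names = ["strlcpy", "strlcat", "snprintf", "memcpy"]
indexes = range(4)  # assumption: a small finite range standing in for all integers

def holds(name: str, index: int) -> bool:
    """The condition from the discussion, written as a test on a candidate pair.
    Note that 1 == index and index == 2 read equally well in either direction."""
    return (name in ("strlcpy", "strlcat") and index == 2) or (
        name == "snprintf" and 1 == index
    )

# Start from every possible (name, index) pair and keep those that satisfy it.
table = {(n, i) for n, i in product(names, indexes) if holds(n, i)}
```

Nothing is assigned anywhere: the equalities only decide which rows survive, and `memcpy` ends up with no row at all.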

B

Yeah, when I started with CodeQL, I had to build that mental model of filtering rather than assigning, personally, because I kept thinking about it the wrong way. Starting to think about filtering instead of assignment helped me understand better what I'm looking for.

C

Yeah, and another common one that turns up is just multi-valued: things can have multiple values. It makes sense when you need to reason about things happening many times in your code base, right? There won't be only one of these calls; there could be many. But it takes some time to get used to that.

B

Now, the way you set up this configuration for the data flow means that the data can flow from snprintf to snprintf, or from snprintf to strlcpy, or any of those combinations.

C

Yeah, right now I'm not really insisting that it's the same kind at each end. I can flow from any one of these into the size argument of any other one of these. Maybe that's okay. I think that seems reasonable, at least for the problem we're describing. We could be fancier and add it to the configuration, and say this is a configuration that only applies to this particular kind of string function, but it feels like overkill.

B

So this sort of reminds me of that vulnerability that Kev found some time ago in, I think it was rsyslog, if I've got that right. He was not looking at strlcpy or strlcat, but rather snprintf, right? I think that's the way it works. You might have the database around, or...

C

I do have the database. Okay, conveniently, let's run this query on that database. For the audience, this was the open source project rsyslog. Again, this database is from April 2018, so that was the time at which they fixed it.

C

We're not dropping zero days here. They had this interesting vulnerability, the same pattern that we described, but with snprintf, which one of our colleagues in the Security Lab found and reported. Let's see if I can find the right one. I think it...

C

Let's look at the code. So you've got the same pattern over here, where in this case you call snprintf, I think in a loop.

B

That's so interesting, because data flow understands the loop, and understands that the flow of information will eventually move from the first call to the second call in the loop, right?

C

It is able to. Many people ask about this: does it get stuck if there's a loop? Does it, you know, stack overflow? But we're reasoning about what the program does statically, so it will be a recursive calculation to figure out that here is a graph which has a cycle in it. We're not going to try and unroll all the possible values that the loop might have, because we're not running the program, which is kind of nice.

C

As you said, it does know that this is a loop. So this statement will be followed, in some control flow path, by potentially the same statement again, and that means that the return value that got added into iAllNames might end up back in here, in this subtraction, where we subtract iAllNames. I believe for snprintf you have to add some other conditions that say your format string is going to allow you to add something unbounded.

C

So you need something like a %s or similar in the format string. Otherwise, if you fix it to, say, an integer with a limited number of places, then you can't get it to overflow, so it's fine.
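
The loop pattern with an unbounded `%s` can be simulated in a few lines. This Python sketch only models the arithmetic: `snprintf_model` mimics the documented snprintf behaviour of returning the length the full output would need, `size_t` is assumed to be 64 bits, and the format string, buffer size, and names are invented.

```python
SIZE_MAX = 2**64 - 1  # assumption: size_t is an unsigned 64-bit integer

def snprintf_model(size: int, fmt: str, *values) -> int:
    """Return the length the fully formatted text would need.
    Real snprintf truncates to size but still returns this length."""
    return len(fmt % values)

def append_all(names, buffer_size: int):
    """Mimic: offset += snprintf(buf + offset, size - offset, "name: %s; ", name)."""
    offset = 0
    remaining_sizes = []
    for name in names:
        remaining = (buffer_size - offset) & SIZE_MAX  # unsigned subtraction
        remaining_sizes.append(remaining)
        offset += snprintf_model(remaining, "name: %s; ", name)
    return offset, remaining_sizes

# Three hypothetical 10-character names, each formatting to 18 characters.
final_offset, sizes = append_all(["a" * 10, "b" * 10, "c" * 10], 32)
```

On the third iteration the offset (36) already exceeds the buffer size (32), so the computed remaining size wraps to a huge value.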

B

I mean, you could potentially do it, but you will see more cases around strings rather than integers.

C

And I guess in all cases you could still have this bugginess of starting in the wrong place. You just wrote it in the wrong part of memory. It might not overflow, but it's not doing what you wanted it to do.

B

That's very interesting. I know it's not in the scope of this conversation, but think of it as part of a blue team. What we have done so far is basically exploratory queries, right? We are looking at code, we understand a specific vulnerability class, and we start building a class to find different cases.

B

As we can see, in Guacamole we find 21 different entries that have that specific pattern, but of course, out of the 21 entries, a bunch of them have guards and all of that. For me, as a security researcher, that's good enough. 21 results is something that I can look at in 20 or 30 minutes, tops.

B

At least we identified two or three functions that I think are good findings. But as part of a blue team, connecting CodeQL into my CI/CD, if out of 21 I have, let's say, three good entries, then having 18 false positives

B

might bring some friction.

C

It's not great if you have that. I guess you want developers to grow their security knowledge, and you can't have that if they're unhappy with the results that you are giving them. I think that's a really interesting point you make about what different security teams, these different users, might need. In terms of queries, there are really two ways of thinking about a query. There are two possible kinds of query that you might write.

C

We've done kind of an exploratory query. This is something that Security Lab researchers will use quite a lot, and folks who are contributing to our bounty programs will often write. We're exploring the code base, right? We're auditing it, we're figuring out vulnerable patterns, we're getting more precise as we go, we're building up a query to be more meaningful,

C

like you saw me do today. But we accept that, as long as the number of results is manageable in some human sense, it's okay if there's a little bit of noise, because we know to manually look through them. The other kind is really a production query, something that you would run, as you said, in CI. You would run it on,

C

you know, GitHub code scanning, if you had that enabled. Then you want something precise. You want to be able to say with reasonable fidelity: this is probably a problem, and here is all the information you need to fix it.

B

And like everything that happens in production, it takes more time and it has to be of much better quality than whatever came out of a little hack week.

C

You know we also try to encourage people to run this kind of analysis at code review time, and if it's coming on your pull requests, you really want them to be sensible results, otherwise, you're going to be deeply unhappy with whoever is giving you this uh tooling and analysis.

C

So with that in mind, I think it's useful as a query writer just to think about exploratory versus production. Often an exploratory query can make its way into production. That is perfectly reasonable.

C

Now to the specifics: how can we do that for this query? Well, the main obstacle here is that we haven't worked out the places that might sanitize the length value, or bound it or restrict it, in other words, make it safe.

C

We've only talked about the sources and the sinks, and if you're following along with this terminology, there's this other aspect: what is a sanitizer for this problem? With these examples that we saw, which we verbally agreed were false positives, the pattern was that there is a check, right? And then in one branch of the check you don't get the same value, it's just zero.

C

The data flow analysis will take care of that, but in this branch we know that this is a safe length. It's been checked, it's all right to use.

C

So let's look at the configuration and see what it enables us to do. It has a bunch of predicates defined on it. I'm just going to close these results for now. It has isSource and isSink, which you've seen me use already. It then has isSanitizer, so that's a place where you might have a single node in the data flow graph where you want to stop the flow.

C

This is good if you're talking about something like SQL injection or cross-site scripting, and there's maybe a sanitization function that does escaping, and then, if an argument goes into that, it's safe.

C

Assuming your sanitizer knows what it's doing, we won't get into that.

C

What we have here is a little bit more involved. It's not so much a single place that gets sanitized as a condition: if you're inside that condition, you're safe, and if you're outside that condition, you're sunk. And so there is a hook for that. It's a little bit more involved to explain, but we can use it. These, I think, are called barrier guards or sanitizer guards.

C

So the definition here says it holds if data flow through nodes guarded by guard is prohibited, and if we describe these in our configuration, the analysis will take them into account when it calculates flow.

C

But this means I need to construct something of this barrier guard type, a guard or a condition that validates some expression, and I need to describe what it checks in what branch. So, let's start writing one of these and I'll talk myself through it and convince myself at least that I'm on the right track.

C

First, code snippets are useful. If I do something like: if length is less than some bound,

C

then I guess it's safe to use length, and if not, then it's not safe to use it.

C

That's a simple check for the overflow. There's the dual of this, because I don't know in which direction my comparison might happen, which looks like this. If it's only bounded below, and you saw something that checks against zero but not the upper bound, well, I'm still unsafe in that case. But it's safe if I'm bounded above.

C

So I want to handle these cases. Let's give myself a class name, let's say ComparisonGuard, and it will extend DataFlow's BarrierGuard class.

C

Now, as with all classes, I need a characteristic predicate to say what is in this class. What is the value that happens here?

C

I believe these kind of comparison operations are called relational comparisons, and I will make sure by using the autocomplete, because I cannot have all of this in my head at all times.

C

So a relational operation is one of these: greater than or less than, possibly with an equals too. Great, good enough to start with, but the compiler is still not happy. It says the class must implement this method defined on the supertype, called checks. Well, let's do that.

B

I have never used a barrier guard, so I'm really excited about what's happening right now.

C

They're quite neat. I find them a little bit difficult to write on the fly, so this is a little bit nerve-wracking, but we will see how it goes. I believe this also accepts a boolean.

C

It does accept a boolean, so let's give it one.

C

Okay, so what is this saying? This is kind of confusing to reason about, but checked is the thing that gets checked. I want it to be this length in this example. This is why I wrote the examples, because you can't live without them. So checked is going to be this, and parity is: in which of these branches is it safe?

C

So I will say there are two possibilities. checked equals this, which I'll cast back to a relational operation so that I can get the greater and the lesser operands.

C

I don't care about left and right, because I don't want to reason about whether the programmer wrote it in a certain direction. I want to know which is the bigger end of the operator. So either I am the greater operand, which looks like this, and so I will bring this comment down here, and I guess the polarity, or parity,

C

is false: it's the false part of this that I want. I think this is what I want here. So if I'm in the place where the condition does not hold, it is safe, because the condition itself only bounds me below. Or the other case is what I had up here.

C

Something similar, as long as I remember to switch the senses around: I am the lesser operand (not the left operand, the lesser operand) and parity is true. All right. A good way of checking whenever you're writing pieces of a query is to use quick evaluation.

C

I should have done this for the source, but I was in a rush. You can run the command CodeQL: Quick Evaluation to evaluate just this predicate and see what it finds.

C

This will probably find a whole bunch of stuff because it's now going to find every single place in the program that there is a greater than or less than comparison, but it's nice to know that at least I found the pattern that I care about.

B

Will the pattern return the body or the actual comparison?

C

It will give me three things. Because I evaluated this member predicate, I will get this, which is the comparison operation; it will give me checked, which is the expression that got checked; and it will give me the boolean parity. Perfect. This has done some interesting things with the formatting.

C

Here we go. So in this case, checked is the bigger one and the parity is false, and so on. If I click through enough of these I'll eventually find the one that we care about, but I'm not going to do that here.

B

Exactly, that's a number that you should not go through by hand.

C

Right. Making sure that your predicates work is always a good stepping stone as you're writing these queries, and you can do the same for sources and sinks. I'm not sure if you can see my right-click menu; you can see it now, there's a bit of a lag. CodeQL: Quick Evaluation is your friend here. You want to evaluate your predicates to make sure that individually they are correct before you put them into a bigger query.

C

Okay, with that in mind, let's run the query.

B

Are we not supposed to connect it to the configuration? Oh, you already know.

C

Indeed, you are supposed to connect it to the configuration, and I have not done so, so let me do that now.

B

Sometimes CodeQL works in magical ways.

C

It has some magical extension points where you can add things, but that's generally a good rule of thumb: check whether you actually used what you wrote.

C

Now I do need to use, I believe it's called, isSanitizerGuard.

C

And I will need to give it a barrier guard argument, and I just ask it to use the class that I defined up here, which I called ComparisonGuard.

C

Now I will run my query and actually use the thing that I wrote.

C

While that runs, I'm just having a look at the chat. There is a question about restricting quick evaluation to some subset of the code base, because you want to check that you filtered some of these. That's a great question. You can't do that by changing the tooling (let me move this out of the way for a second). However, you can always modify your predicate's body to add some filter on the location.

C

So what you might want to say is: I care about sources within a particular file. I can get its location, I can get the file, and I can get a whole bunch of things about the file. Maybe I want just the file name, just the base name, and I can do something like that.

C

This is a good trick for narrowing down your predicates while you test them, and often for narrowing down your results as well, because you want to see that you've caught the problem in this particular location, starting at this particular point. But, and I make this mistake all the time, don't forget to take it out when you are done using it. Otherwise, you will be very sad about why you do not find the results that you meant to find.

C

So that is a useful trick: just use the locations to match. All right.

B

Before we actually go into the results: in the past, without this barrier, we got 21 results, and now it seems we have 11.

C

That's right. Let me just make sure; that's not what I wanted, that was a different code base. Here we are: we had 21 on guacamole-server, and now we have only 11.

C

Let's have a look at them. Wait, am I even on the same code base? I'm not on the same code base. Let's run it on guacamole-server, just for ease of comparison.

C

Oops, that's not the one I meant to run.

C

A question from the chat: where do you specify in the comparison guard that the guard compares the length? Good question.

C

So the comparison guard here is anything that does a less-than or greater-than check, because I don't know how far through the program this has flowed. It might be called length, it might not be called length.

C

This is very loose right now. The really precise way to do this is to say that this is comparing against some known length that I am then using in the string copy call. There are other libraries that you can use to say that two expressions are the same, but it's quite hard to reason about this across function boundaries. It's quite hard to say that these two are the exact same value, and they might be aliased.

C

They might be exactly the same. So we're taking a very imprecise, heuristic approach: we're just saying, filter out any place where you've got a bounds check of some form, based on the direction of the bounds check.

B

But that is using the data flow from the source automatically, right? We're not defining the source as part of this check, because otherwise it would hit every check. So it has to be somehow related to data flowing from the source to that check.

C

Yes, and the barrier guard will only stop the flow. We start with the sources that we defined, we trace the flow through, we stop if we find these barrier guards, and then we see if we can eventually reach a sink.

B

So, interesting: we used to have 21. Now we have eight results.

C

So I will get rid of my history so there is a bit more space here. I have one of these calls that is not compared, as far as I can tell; it's not on this path.

C

We are in the else block of this comparison, so yep. We have one of these similar patterns.

C

This one is: we check that the length is positive, but we don't check that the length is bounded above.

C

Here this one always happens, regardless of whether that bound is checked.

C

Yep, these are all quite similar. This one just has no checks at all, so there's nothing to puzzle over here. I think these are quite similar: they are different parts of a library that is designed to handle file system operations or file system path construction.

C

That's awesome. Okay, so I am happier about those results, because the query doesn't get fooled by those remaining macros that actually do the correct recalculation of the length, and eight is even more reasonable, manageable even. Some people have pointed out that this kind of sanitizer is either too loose or too strict: it might sanitize out less or more than you want. You can refine that if necessary, but I think it's a really good starting point to see whether you're getting precise results already.
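The kinds of results that survive the guard can be sketched in C as follows. These functions are deliberately flawed and entirely hypothetical (`DST_SIZE`, `copy_lower_bound_only` and `copy_regardless_of_check` are invented names, not code from guacamole-server); they only mirror two of the shapes described above: a check that bounds the length from below but not from above, and a copy that runs whether or not the bound was checked.

```c
#include <assert.h>
#include <string.h>

#define DST_SIZE 8 /* hypothetical destination size */

/* Lower bound only: `length > 0` says nothing about `length <= DST_SIZE`. */
static void copy_lower_bound_only(char *dst, const char *src, int length) {
    assert(dst != NULL && src != NULL);
    if (length > 0) {
        memcpy(dst, src, (size_t)length); /* FLAWED: length may exceed DST_SIZE */
    }
}

/* A bounds check exists, but the copy is not inside the guarded branch. */
static int copy_regardless_of_check(char *dst, const char *src, size_t length) {
    int fits = 0;
    assert(dst != NULL && src != NULL);
    if (length < DST_SIZE) {
        fits = 1; /* the check only records a flag */
    }
    memcpy(dst, src, length); /* FLAWED: runs on both branches */
    return fits;
}
```

In the first function the copy sits in the true branch of a comparison where the length is the greater operand, so it is not protected; in the second the copy is reached from both branches, so no branch-based guard can cover it.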

B

We have a couple of questions. Someone is asking: if you want to check for general conditions, we could use an if statement, correct?

C

If we trace through the definitions of these barrier guards, they are already the things that are in a condition. There's this condition that says it is within an if statement, a switch statement, or a loop condition, so it's already done that for you. Of course, if you want to write pieces of the query yourself, you can always say: look for if statements, look for the condition inside the if statement, and so on. But if you're reasoning about barriers and barrier guards, it's there already.

B

We have another question: does isSanitizerGuard only consider things that flow from source to sink? Does that mean any expression from source to sink, or anything that uses the source?

B

I'm not sure I fully understand the question.

C

So think of a configuration as defining different building blocks. The source is one. Each of these building blocks can exist in its own right. You can have sources of some vulnerable input, of some surprising value like we have here. They exist in the code, you describe what they are, and you find them. Similarly, you can have sinks that say: here is a place in the code where I'm doing something dangerous.

C

You can have sanitizers, which say: I am comparing some value. I don't know what the value is, but I'm comparing it against some other condition; I am doing some kind of check, that sort of thing.

C

Now the configuration brings all of these together, and the real magic is this hasFlowPath predicate. If we look through its definition and see what the reasoning is doing, it is using all of the definitions that we have here and putting them together.

C

It is saying: find me paths through the code where I can start at a source and end in a sink without going through any of the sanitizers that you have defined, or that I have defined in the standard library. Putting all of those together is what the configuration and the hasFlowPath predicate do. That gives you a result that says there is a flow of data in this situation.

C

So you can describe these independently, but to be part of a final result of a flow query they must be reachable together, and what it gives you is paths from the source to the sink.
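The way a configuration combines sources, sinks and sanitizers can be pictured with a toy model. The following C sketch is only an illustration of the idea (reachability from a source to a sink that never passes through a barrier); the `struct flow_graph` type and the `has_flow` function are hypothetical names invented here, and nothing about it reflects how CodeQL actually implements its analysis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8 /* toy limit, chosen for illustration */

struct flow_graph {
    size_t node_count;
    bool edge[MAX_NODES][MAX_NODES]; /* edge[a][b]: data can step from a to b */
    bool is_sink[MAX_NODES];
    bool is_barrier[MAX_NODES];      /* node sits in the safe branch of a guard */
};

/* Depth-first search that refuses to pass through barrier nodes. */
static bool reaches_sink_from(const struct flow_graph *g, size_t node, bool *visited) {
    if (g->is_barrier[node] || visited[node]) {
        return false;
    }
    visited[node] = true;
    if (g->is_sink[node]) {
        return true;
    }
    for (size_t next = 0; next < g->node_count; next++) {
        if (g->edge[node][next] && reaches_sink_from(g, next, visited)) {
            return true;
        }
    }
    return false;
}

/* True if some path leads from `source` to any sink without crossing a barrier. */
static bool has_flow(const struct flow_graph *g, size_t source) {
    bool visited[MAX_NODES] = { false };
    assert(g != NULL && g->node_count <= MAX_NODES && source < g->node_count);
    return reaches_sink_from(g, source, visited);
}
```

Marking a node on the only path as a barrier makes the result disappear, while barriers that are off the path change nothing, which matches the point that a guard only matters when flow from a source actually reaches it.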

B

All right, I think we are getting close to the end. Do you have any other questions from the stream chat before we actually wrap up?

C

There have been some questions about guard conditions, but I think the others in the chat have done a great job of answering them. The thing inside the condition is the guard: the thing inside the if, for example, is the guard condition. If you're unsure, again, you can always experiment.

C

Get a database of some open source code, or make one from some open source code, write a little predicate that says "find me all guard conditions", evaluate it, and see what you get. That will give you a good intuition for what these components of the libraries are actually doing.

C

Another question: how do you get all results back? If you open the CodeQL view in your VS Code IDE, it gives you a nice query history. You can see all the queries that you ran in the current session. You can click on them, you can see the results, you can see the query, you can rename them, and other nice things like that.

C

This UI has been improving very rapidly over the last few months, so it's got a whole bunch of nice features for tracking your results, and we are open to ideas. So talk to us on Discussions, or talk to us by filing issues, if you're trying this out and you run into friction in your workflow.

B

I think that if you go to Show Query Text, it will actually show you the original query.

C

I believe so, yeah. It gives you a read-only buffer of the query as it was at the time it was run. This is nice if you've iterated on a query but you just want to remember what it looked like at the time.

B

Someone asks: where can I find a copy of this query? I think that we can easily save it and put it on our repo, github.com/github/securitylab. We can definitely do that, or we can put it on the Discussions.

C

We can put it on discussions too. That'll be a great place to continue the conversation.

C

If you also want to see similar queries that do these kinds of patterns, look in your copy of the CodeQL libraries, if you've got one. This is github.com/github/codeql. For most languages you can see all the standard queries, and if you open SnprintfOverflow, for example,

C

this is a more complicated version of the query that we wrote. There's a lot of stuff going on here that talks about overflowing that I've not really gone into, but it essentially finds this pattern of calls to things like snprintf where the result might flow into the size argument.
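The underlying C behaviour is that snprintf returns the length the output would have had, not the number of bytes it actually stored, so the result can be larger than the buffer. A minimal sketch of the problem and one defensive variant follows; `reported_length` and `append_clamped` are hypothetical helper names invented for this illustration, and clamping is only one of several reasonable ways to handle truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns what snprintf reports: the length the output WOULD have had.
 * Using this value as an offset or remaining size is the risky pattern. */
static size_t reported_length(char *buf, size_t size, const char *text) {
    assert(buf != NULL && text != NULL && size > 0);
    int n = snprintf(buf, size, "%s", text);
    return n < 0 ? 0 : (size_t)n;
}

/* Appends `text` at offset `used` and returns the new offset,
 * clamped so that it never points past the terminating NUL. */
static size_t append_clamped(char *buf, size_t size, size_t used, const char *text) {
    assert(buf != NULL && text != NULL && size > 0);
    if (used >= size - 1) {
        return size - 1; /* buffer already full */
    }
    size_t room = size - used;
    int n = snprintf(buf + used, room, "%s", text);
    if (n < 0) {
        return used; /* output error: offset unchanged */
    }
    if ((size_t)n >= room) {
        return size - 1; /* output was truncated */
    }
    return used + (size_t)n;
}
```

With an 8-byte buffer, `reported_length` on a 10-character string returns 10 even though only 7 characters and a terminator were stored; feeding that 10 back in as an offset is what leads to the overflow.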

C

So it's nice to see how these get used in the queries that we run in production.

B

Perfect. All right, so just to wrap up: I think we discussed the vulnerabilities around the return value of string copy and memory functions, and we took two different approaches.

B

One is sort of a red teaming approach, which is about building a fast query that finds a manageable number of results that you can go through later by manual analysis, and you want to do that, right?

B

You don't want to start adding constraints while you're exploring code, because that's what you're doing: you're exploring code. You might find vulnerability classes that are exactly what you're looking for, but you might also find things that you were not looking for through these basic queries that you're building. As we went through, those easy queries are much faster and easier to build.

B

Now, on the blue team side, you have to think about all the potential false positives under one umbrella, so that requires much more investment. But obviously it pays off pretty well, because then you can add the query to your DevSecOps pipeline, and every time a developer makes a mistake or follows that bad pattern,

B

you can detect it right away, before it actually gets introduced into the code. Okay, so for everyone here: we have a Slack channel if you want to ask more questions later. Also, under our public GitHub repo, github.com/github/securitylab, we have Discussions, which is a forum where you can ask questions about CodeQL or anything related to open source security. And finally, you can also visit our website, securitylab.github.com, which has a huge amount of resources.

B

You can download CodeQL there and run it on your own code base, as long as it's open source or for academic use. We have learning labs. We have a CTF. We also have a bug bounty that is currently live, where you can write your own queries and get them merged into the open source queries that we currently have, and obviously you get paid for that. We also have a bug bounty for finding real vulnerabilities in open source projects with a CodeQL query.

B

The bug bounty has been very active lately, and we are more than happy to get more and more results. So, happy to expand this community, and happy to be here in this Live QL. Thank you very much, Aditya, for helping us. I hope to continue having you as the expert in CodeQL, and we will look at other bug classes in the future.

C

Happy to be here. And to answer the last few follow-up questions in the chat: if you're asking about CodeQL training, follow the resources that Nico mentioned. We have learning labs that you can follow, and we'll hopefully have some more of these. For enterprise customers we can of course arrange whatever is necessary, but in the open source and security research world we have plenty of resources available for you to jump in on.

B

Yeah, there's a workshop that Aditya and Luke did at Satellite. It's a two-hour workshop on CodeQL in JavaScript and Java, if I'm not wrong. That's right. You can follow that, with examples and all of that, to take your first steps in CodeQL. And obviously we can continue with this Live QL. If you like it, please let us know in the Slack, on Twitter, or on the Discussions.

B

Let us know what you want to hear: if there's anything in CodeQL you want to explore, or a vulnerability class that you are curious about, we are more than happy to work on that in the future. We hope to have other security researchers and other people from the community talk about their vulnerability classes and, hopefully, find a way to model them in CodeQL. Anyway, thank you very much, everyone, for staying with us, and see you next time.