GitHub GitHub Satellite 2020 - Workshops, 8 May 2020

From YouTube: Finding security vulnerabilities in Java with CodeQL - GitHub Satellite 2020

Description

CodeQL is GitHub's expressive language and engine for code analysis, which allows you to explore source code to find bugs and security vulnerabilities. During this beginner-friendly workshop, you will learn to write queries in CodeQL and find known security vulnerabilities in open source Java projects.

GitHub Satellite: A community connected by code

On May 6th, we threw a free virtual event featuring developers working together on the world’s software, announcements from the GitHub team, and inspiring performances by artists who code.

More information: https://githubsatellite.com
Schedule: https://githubsatellite.com/schedule/

A

Okay, so hi everyone. My name is Luke Cartey and I'm a security solutions architect in the professional services team at GitHub. Day-to-day I work with our customers to help them deploy and use the cool security features in our products, and one of my favorite parts of the job is teaching people how to write CodeQL, so I'm genuinely really excited to be presenting today's session on finding security vulnerabilities in Java with CodeQL.

A

So today, what we're going to cover is what CodeQL is and how you use CodeQL to identify patterns in source code, and then we will provide a hands-on session where we'll guide you through the process of writing a query to find a known security vulnerability in Apache Struts. We're going to start off with a short slide deck, about 15 minutes, and then we'll move into the hands-on session. The session is designed for total beginners.

A

So don't feel like you need any existing experience of CodeQL. Also, although this session is focused on analyzing a Java project, you don't need extensive experience of Java either, and the CodeQL skills that you learn today will actually be transferable to analyzing other languages with CodeQL. Now, if you want to follow along with the hands-on session, you will need to set up the CodeQL development environment. The link is on the screen here to the repository containing the prerequisites.

A

I think that's been shared with you already, but if you haven't set this up yet, now would be a good time to start. We also have a Slack channel where you can ask questions, and we have a few helpers on the call here who are also on the Slack channel.

A

So we have Anders, Arthur, and Aditya, and they'll be answering questions on the channel itself. I'll also be checking in with them periodically to see if there are answers that we want to share with the whole group.

A

What is CodeQL? Well, the key idea behind CodeQL is analyzing code as data. What we do is create a database of facts about your program, then provide an expressive, declarative query language called QL for identifying patterns in that database.

A

So what can I do with it? Well, the main use case is to find bugs and security vulnerabilities in your code. Developers are human and, unfortunately, humans make mistakes. Not only that, but they make the same types of mistake over and over again. What CodeQL provides is a way to describe and automatically search for the patterns underlying those mistakes.

A

The CodeQL language is very high-level and very expressive, which makes it very quick and easy to write or refine queries and to provide really accurate results. What this allows us to do is help automate repetitive security code review by taking the expertise of security researchers and allowing them to express their knowledge in this codified, readable, executable form in these queries.

A

So some might ask: why a new language? Aren't there already enough languages in the world? Well, CodeQL has a fairly unique combination of features which makes it ideally suited to writing queries for analyzing code. First, it is a logic programming language. What this means is that you specify logical conditions that describe the patterns you want to find.

A

These conditions are combined with operators such as "and", "not", or "or". Second, it is declarative, which means that the order in which you specify the conditions does not matter and that there are no side effects. Together, these features mean you can focus on describing what you want to find, not how you want to find it. Instead, we provide an advanced query optimizer, which takes the high-level declarative CodeQL query and automatically compiles it to an efficiently executable form.

A

Thirdly, QL is object-oriented, and this may seem very strange for a query language. We'll explain exactly what this looks like in more detail in a few slides' time, but the benefit here is the same as in other object-oriented languages: we can encapsulate data and operations on that data together, and we can benefit from inheritance, composition, and other object-oriented patterns.

A

This also allows us to provide very rich standard libraries that include a comprehensive representation of the program, but also simple-to-use libraries for performing common program analysis tasks, such as data flow analysis, control flow analysis, and range analysis.

A

Finally, we should also mention that CodeQL is a read-only query language. There are no options for inserting rows into the database. Instead, the database is built in a single step from a particular checkout of the source code. So what does a query actually look like?

A

Well, the basic syntax of QL will look familiar to anyone who's used SQL or similar query languages.

A

There's a query clause, and this query clause is made up of three parts: a from part, a where part, and a select part. So from, where, and select. Here the from clause specifies some variables that will be used in the query.

A

The where clause specifies some conditions on those variables in the form of logical formulas, and the select clause specifies what the result should be. It can refer to variables that are defined in the from clause. There's also a series of imports at the top of the query, which lets us reuse logic defined in other libraries. So this import java allows us to import the standard library for Java.
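
The slide itself is not part of the transcript. What follows is a minimal sketch of the kind of query being described, assuming the class and predicate names the speaker reads out (IfStmt, Block, getThen, getNumStmt). The variable names and the alert message are assumptions, and newer releases of the Java library rename Block to BlockStmt.

```ql
import java

// from: declare the variables, each with a type and a name.
from IfStmt ifStmt, Block block
where
  // where: logical conditions on those variables ("=" is equality, not assignment).
  block = ifStmt.getThen() and
  block.getNumStmt() = 0
// select: a program element plus a message to report at its location.
select ifStmt, "This if statement has an empty then block."
```

Read top to bottom, the three clauses line up with the from, where, and select parts described above.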

A

Let's take a closer look at this example and see if we can figure out what it's doing. The from clause is defined as a series of variable declarations, where each declaration has a type and a name. For example, this declaration here is for a variable called ifStmt, and it has the type IfStmt, which is from the CodeQL standard library for Java, which we imported above. Types represent sets of values.

A

So, for example, this IfStmt type here represents the set of all if statements in the program. Variables also represent sets of values, initially constrained by the type of the variable. So this variable here initially represents the set of all if statements in the program, and we will later refine that with the conditions that we apply. Similarly, the second variable here, block, represents the set of all blocks in the program, i.e. elements surrounded by curly braces, typically a sequence of statements.

A

Now the where clause allows us to specify some conditions on these variables and combine them with logical operators such as "and". Here we provide a condition that specifies the block variable is equal to ifStmt.getThen(). There are two important aspects here. Firstly, ifStmt.getThen() shows the object-oriented nature of CodeQL: getThen is an operation that is provided on the type IfStmt, and it returns the then part of an if statement.

A

So if you think of what an if looks like, typically it's: if some condition, then something happens, else something else happens. getThen is the thing that happens if the condition is true. So this is demonstrating the object-oriented nature of CodeQL. The second aspect here that's important is that this equals here is equality, not assignment. We are not assigning this value into the variable block. We are saying that it is equal to ifStmt.getThen().

A

We're specifying a logical condition, and essentially it's saying: find me all pairs of blocks and if statements in the program where the block equals the then part of the if statement.

A

Now, we can see that the where clause also has a second condition here that we combine with "and", meaning that both conditions must be true. The second condition says block.getNumStmt() = 0. Again, we can read this in a sort of natural-language way: it says find me the blocks in the program whose number of statements is 0, in other words, empty blocks.

A

So if we step back a bit here, we can see that this query is actually finding if statements which have an empty then block. Finally, if we have a look at the last part of the query here, the select clause, this is actually what defines the result of the query. Here we select an element in the program and a message that we're going to report for this element. This is how we report static analysis issues in your code base.

A

It's a location in the code base and a message at that location. Now, one way to think of the select is that it's going to produce effectively a table of results, where the first column is the if statements and the second column is the message associated with those if statements. What we're going to do now is take a brief tour through some of the reuse and encapsulation features in CodeQL before we actually start the hands-on session. These are the building blocks of queries.

A

So the first feature that we'll introduce you to is predicates. These provide a way to encapsulate portions of logic in the program so that they can be reused. You can think of them as a mini from-where-select. Like a select clause, they produce a set of rows in a result table. The difference here is that you can name the table of results and you can reuse it. So here's an example.

A

So we have the query that we had on the previous slide, and what we might want to do is write multiple queries that need to identify empty blocks in the program. We can do this by introducing a predicate which identifies the set of empty blocks.

A

This is done by using the predicate keyword, then providing a name for our predicate so that we can reuse it, and then a series of variable declarations, similar to the from part of a from-where-select. Now the body of the predicate, the bit inside here, is effectively equivalent to the where clause that we had before.

A

What we can say is that isEmpty takes a single variable declaration, block, and it's going to represent the set of all blocks in the program which are empty. Now, how do we define that? Well, we need to put this condition in the body of the predicate, and this condition is the same condition that we had before.

A

We know that this condition defines what it means to be an empty block; we're now just giving this a name. So this predicate will represent the set of blocks in the program that are considered to be empty.

A

We can then use this predicate to simplify our query by using it as a logical condition in the where clause, to say that the then part of the if statement must be empty. You can think of this in one of two ways. You can think of it in the logical way, where the logical conditions of the predicate are effectively inlined here, or you can think of it in a sort of set way.

A

You can think: well, let's take the set of all if statements and filter it down to the ones where the then blocks are empty. And if we look at these two queries, the same logical conditions are being applied on both sides, and we'll get the same set of results for both.
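
The predicate version of the query can be sketched as follows. The name isEmpty is the one the speaker uses; the alert message is an assumption, and Block is called BlockStmt in newer releases of the Java library.

```ql
import java

// A named, reusable condition: holds for every block with no statements.
predicate isEmpty(Block block) {
  block.getNumStmt() = 0
}

from IfStmt ifStmt
// The predicate call acts as a logical condition on the then part.
where isEmpty(ifStmt.getThen())
select ifStmt, "This if statement has an empty then block."
```

Because the body of isEmpty is the same condition the original where clause applied, this query should report the same if statements as the version without the predicate.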

A

So the next feature that we'll look at today is classes. Classes allow you to define new types in CodeQL and, like all types, they describe sets of values. We've already seen two classes in CodeQL, Block and IfStmt, and those are defined in the standard libraries. For example, we could define a new CodeQL class to represent the set of empty blocks. The way that we define this is we use the keyword class.

A

We provide a name for our class, in this case EmptyBlock, and then we provide a set of supertypes, in this case Block. All classes in QL must have at least one supertype, and the supertypes define the initial set of values in our class. In our case, EmptyBlock starts with all the values in the Block class.

A

However, a class that can only represent the same set of values as another class is not particularly interesting. We can therefore provide what we call a characteristic predicate, which is this thing here. It looks a little bit like a constructor, but don't be deceived, it isn't really. Effectively, what it allows you to do is define some additional conditions that restrict the set of values further.

A

So in this case, what we're saying is we start off with the set of all blocks in the program, and what it means for a Block to be an EmptyBlock is, well, the same condition that we had before: that the number of statements in that block equals zero. You'll notice that within here we can use this magic variable called this, which refers to the block that we're starting with and allows us to define logical conditions on the instance of the class we're using.

A

Note that a value can belong to more than one of these sets of values, which means it can have more than one type. As a simple example here, empty blocks are clearly both Blocks and EmptyBlocks.

A

Now so far, this class is actually equivalent to the predicate solution that we saw previously. We're in fact supplying the same set of conditions, and we will calculate the same set of values as with the predicate case. In both cases we're effectively calculating a table that says: here are all the empty blocks in the program.

A

The difference is actually in how we use it. One way in which we can use it is to say ifStmt.getThen() instanceof EmptyBlock. This is a logical condition that says we only want those if statements whose then part is within the set of empty blocks.

A

Now we can actually use this another way. The other way is by defining a temporary variable here, and for this temporary variable we can specify the type directly as EmptyBlock. Now when we apply this logical condition here, we're only starting with the set of empty blocks to begin with, so we're only going to match this logical condition against the empty blocks. So again, this is equivalent to the previous queries that we've seen.
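
A sketch of the class-based version, assuming the names used in the talk (EmptyBlock extending Block). The alert message is an assumption, and Block is called BlockStmt in newer releases of the Java library. The query shows the second usage, with a variable typed as EmptyBlock; the first usage is kept as a comment.

```ql
import java

// A new type: the subset of Block values that satisfy the characteristic predicate.
class EmptyBlock extends Block {
  // Characteristic predicate: "this" ranges over all Block values.
  EmptyBlock() { this.getNumStmt() = 0 }
}

// Usage 2: declare a variable whose type is already restricted to empty blocks.
from IfStmt ifStmt, EmptyBlock block
where ifStmt.getThen() = block
select ifStmt, "This if statement has an empty then block."

// Usage 1 (equivalent), as a type check on the then part:
//   from IfStmt ifStmt
//   where ifStmt.getThen() instanceof EmptyBlock
//   select ifStmt, "This if statement has an empty then block."
```

Both forms apply the same logical conditions as the earlier predicate version, so they should produce the same results.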

A

Okay, I've done more than enough talking now, so let's move on to the hands-on part of the workshop. If you haven't already done so, please follow the link on the slide and get VS Code and the CodeQL extension set up. If you have any questions, then please do ask on the Slack channel. As I said, Anders, Arthur, and Aditya are all there to help answer your questions.

A

Arthur, do we have any interesting questions on the Slack channel that we want to bring up?

C

There was one question about whether the workshops will be recorded and sent out for later viewing, I believe.

A

They are being recorded, and I think they will be shared later.

C

Great, thanks for confirming. Cool, all right, thanks.

A

Let me switch over from the slide deck.

A

Okay, so what I have here is my VS Code window. I've got the markdown document here that's referred to in the repository, and you can find it there as well. I'll be following along as we go through, so you don't necessarily need to have it open yourself, but you might find it helpful as we go through.

A

Okay, let's get started. This workshop is on finding an unsafe deserialization issue in Apache Struts. Serialization is the process of converting in-memory objects to text or binary output formats, usually for the purpose of sharing or saving program state. The serialized data can then be loaded back into memory at a future point through the process of deserialization. In languages such as Java, Python, Ruby, and C#, deserialization provides the ability to restore not only primitive data but also complex types, such as library and user-defined classes.

A

So this provides great power and flexibility, but it also introduces a significant attack vector if the deserialization happens on untrusted user data without restriction. I'm sure you're all familiar with Apache Struts. It's a popular open source MVC framework for creating web applications in Java. In 2017, a researcher from the predecessor of the GitHub Security Lab found a CVE, CVE-2017-9805, which was an XML deserialization vulnerability in Apache Struts, and it was severe enough that it would allow remote code execution.

A

The problem occurred because included as part of the Apache Struts framework is the ability to accept requests in multiple different formats, or content types. Apache Struts actually provides a pluggable system for doing this through the ContentTypeHandler interface, and that interface provides an interface method, toObject.

A

So you can define a new content type handler, say for XML or JSON, by implementing this interface and defining your own toObject method, which takes data in the first parameter in the form of a Reader and uses that to populate the target object, which is taken as the second parameter. Now, typically this reader, this input, is provided directly from the user without any sort of validation, so it can't really be trusted.

A

Typically, the way a content type handler works is that it takes this reader and effectively deserializes the contents to populate the target object. Now this shouldn't happen without any kind of validation, because if it does, then the untrusted user data could be deserialized in a fairly arbitrary way, and this is what caused this particular CVE.

A

So what we're going to do in this workshop is write a query to find the CVE in a database that's specifically built from the known vulnerable version of Apache Struts. Okay. As I said before, if you haven't already, please follow the setup instructions for Visual Studio Code. You'll need to install the Visual Studio Code IDE and then install the CodeQL extension for Visual Studio Code. You can install it from the extensions panel here.

A

You also need to set up the starter workspace, and the reason for this is that the starter workspace has links to the standard libraries. If you don't set up the starter workspace, you don't get the standard libraries and you aren't able to easily write queries. Once you have the starter workspace, you'll need to open it within VS Code.

A

You also need to download and unzip the vulnerable database, and you'll need to choose this database in CodeQL by using Ctrl+Shift+P to open the command palette and selecting CodeQL: Choose Database. Then what you need to do is create a new file in the codeql-custom-queries-java directory called UnsafeDeserialization.ql. You can see that's what I've done here: I've got this UnsafeDeserialization.ql file, and I've got it open underneath here.

A

If you get stuck, or if you want to look up more documentation later on, there's actually extensive public documentation on how to write CodeQL. There's a general landing page for learning CodeQL, there's one specifically for Java, and there's a lot of public documentation as well about using the CodeQL extension for VS Code.

A

Okay, so the way this workshop is going to work is that it's split into several steps, and it's up to you: you can either write one query per step, or you can write a single query that you refine at each step. I'm going to go for the approach of writing a single query and refining it at each step. Each question has a hint associated with it, and that hint describes some useful classes and predicates in the standard libraries for Java.

A

You can explore these in your IDE, using the autocomplete suggestions and also the jump-to-definition command. What I'm going to do is, after reading out each of these questions, pause and give you the opportunity to answer it for yourself. I'll leave a gap of around two minutes for these first ones and maybe a little bit longer for some of the later ones.

A

What I'll do is actually start a timer in VS Code, set to two minutes, and at the bottom here it will give us a timer to work to, just so we keep to time. Don't worry if you get left behind: we're going to build in a couple of pauses to answer questions and things, and if it's going too fast or too slow, you can ask questions on the Slack channel at the same time.

A

Now, before we get started, I'm just going to show you a couple of features of how to actually use VS Code to write CodeQL. You can see we've got this UnsafeDeserialization.ql query file here, and we've added the import at the top for the standard library. What I'll do is just write a simple query here.

A

So let's write a query just to find all the if statements in the program. You can see here we get autocomplete as soon as I've typed from. We can hit Ctrl+Space here to bring up the autocomplete as well. You can see it's giving us type suggestions, because the first thing we need to do is define a CodeQL type.

A

So, IfStmt: you can see that it's provided us with the IfStmt type as an option. We can hit enter, we can provide a name for this if statement, and we can provide a select of the if statement. Again, you can see it's autocompleted that, and we can save that file. Note here that you can actually write a from-where-select without the where; in fact, only the select is mandatory. Right, now we can run this query. There are two ways that we could run this query.

A

One is to right-click in this file here and click CodeQL: Run Query. The other way, which I typically use, is to open the command palette again, that's Ctrl+Shift+P in VS Code, and choose the CodeQL: Run Query option here. So you can run this query, and you see we get the results back on the right-hand side. The results are numbered, and the ones that are elements in the program, that have locations in the program, are actually clickable links.

A

So if I click on this first one, you can see it takes us to the read-only copy of the Apache Struts source code, at the line at which this program element exists. We can see here that this was all the if statements in the program, there are eight and a half thousand of them, and if we click on them, it will take us to the line where each one of these is defined.

A

So that's how you browse the results.
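
The warm-up query typed in this demo can be sketched as below. The variable name ifStmt is an assumption; the transcript only says that a name is provided.

```ql
import java

// No where clause: every if statement in the database is reported.
from IfStmt ifStmt
select ifStmt
```

Each row in the results view is one if statement, shown as a clickable link to its source location.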

A

Okay, so section one: finding XML deserialization. XStream is a Java framework which is used for serializing and deserializing Java objects to and from XML, and it's used by Apache Struts. It provides a method called fromXML, which is used for deserializing XML to a Java object. By default, the input is not validated in any way. XStream does come with some validation features, but they are not on by default, and so it is vulnerable to remote code execution exploits.

A

So in this section, what we'll do is identify some calls to fromXML in the code base. Okay, so question one here is to find all method calls in the program. I'll just start a timer here and give you a couple of minutes to try that.

A

Remember, if you get stuck, you can ask questions on the Slack channel.

A

Remember, there's also a hint here, so if you have the markdown document open yourself, you can press this button here and it will show you a hint.

B

Hey Luke, this is Jason jumping in. It seems like we might have lost audio. If you can hear me, just letting you know.

A

I mean, oh yeah.

B

Loud and clear, okay.

A

Did you miss the question on that previous part?

B

We did, so let's go ahead and repeat that.

A

Okay, let's repeat the question then, sorry. So question two here is to update your query to report the method being called by each method call. So, we're starting.

A

Okay, let's go through the answer to question two. So we have the query from MethodAccess call select call, and the question is to update the query to report the method being called by each method call. If we expand the hints, the suggestions are to add a CodeQL variable called method with the type Method, to use the predicate getMethod that MethodAccess has for returning the method, and finally to add a where clause.

A

So what we're going to do is, as suggested, add a variable whose type is Method and whose name is method. We're then going to add a where clause, and in the where clause we're going to say call.getMethod() equals method, and then we're going to report that as part of the select clause. I'm going to run this query.

A

So this is similar to the if statement and empty block query. Again, we have two variables here and we're just relating them with a logical condition. We're saying: find all pairs of method accesses and methods in the program where the method being called by the method access is this method variable, and just report all of these. Now what we can see is we actually have two columns of results, where the first column is the method call, so this is the method call from before, and the second column here is the method being called.

A

So you can see the relationship between these two and, if you look up above, again we have around 68,000 results.
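
The answer to question two, as dictated above, can be sketched like this. MethodAccess is the class name used in the workshop; recent releases of the Java library call it MethodCall.

```ql
import java

// Question 1 was simply: from MethodAccess call select call
from MethodAccess call, Method method
// Relate each call to the method it invokes.
where call.getMethod() = method
select call, method
```

The results table has one column for the call site and one for the method being called.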

A

Okay, so question three: find all calls in the program to methods called fromXML. You can update this query here just by adding an additional condition.

A

Okay, we'll go through the answer to question three. If we expand the hint here, you see it says Method.getName() returns a string representing the name of the method. So the way that we change this is by adding a second condition to the query. We say that for the method that we've identified as the target of this call, the name of the method equals fromXML.

A

You can think of this as applying an additional logical condition here, saying only report the methods whose name is fromXML. Let's run this query.
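
A sketch of the query after adding the name condition for question three, again using the workshop-era class name MethodAccess (MethodCall in recent library releases):

```ql
import java

from MethodAccess call, Method method
where
  call.getMethod() = method and
  // Keep only calls whose target method is named fromXML.
  method.getName() = "fromXML"
select call, method
```

The string comparison is an ordinary equality condition, combined with the first condition using and.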

A

And we can see we now only get two results. Remember, where we started at the top we had 68,000 results, and we've now trimmed them down to just the fromXML method calls. Again, we can click in here and there and look at them. All right, question four: the XStream.fromXML method deserializes the first argument, i.e. the argument at index 0. Update your query to report the deserialized argument.

A

Before we start that, there is one optimization we can actually do for this particular query. In general, for this query, we are not actually going to need the method itself. The thing we'll be interested in is the call, so we could actually just remove the method from the select clause here.

A

If I remove this method, then the only thing that we're reporting here is the call and not this method variable. And if we have a look at what we're doing here, we could actually inline this. The way that we can inline it is by deleting this method variable and then inlining it in this condition.

A

Logically, this is exactly the same, apart from the fact that we're only reporting the call. Again, you can read this as we read the other logical conditions: for a method call, get me the method, get the name of that method, and assert that it is called fromXML. So this is just a simplification of this query, and if we run it again now, we can see we still get the same two results here. Okay.
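
The simplified query, with the method variable removed and the condition inlined as a chain of predicate calls, can be sketched as:

```ql
import java

from MethodAccess call
// For the call, get its method, get that method's name, and require it to be fromXML.
where call.getMethod().getName() = "fromXML"
select call
```

This reports only the call sites, which is all the later steps need.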

B

So, question four.

A

Again, the XStream.fromXML method deserializes the first argument, i.e. the argument at index zero. So let's update this query to report the deserialized argument.

A

And do let us know in the Slack channel if the timings here are too long or too short, if you want a bit more time or a bit less time. We can tweak this as we go through.

A

Okay, let's go through the answer. The hint here is that MethodAccess.getArgument(int i) returns the argument at the i-th index, and that arguments are expressions in the program, which are represented by the CodeQL class Expr. So the suggestion here is to introduce a new variable to hold the argument expression. We can do that by saying Expr arg. We can then add an additional condition here to tie together the arg and the method call. So what we're going to say is that the arg is equal to call.getArgument(0), because the argument that is deserialized is at index zero, the first argument of the call.

A

We're going to add arg to the select here, and we run this query.

A

You can see that we now have the same two results that we had before, but we now have two columns: the first column is the call to fromXML, and the second column is the thing that's being deserialized in each case.
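
A sketch of the answer to question four, using the variable names dictated above (call, arg):

```ql
import java

from MethodAccess call, Expr arg
where
  call.getMethod().getName() = "fromXML" and
  // Argument positions are zero-based, so index 0 is the first argument.
  arg = call.getArgument(0)
select call, arg
```

The first column is the fromXML call and the second is the expression passed as its first argument.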

A

Okay, it's the final question in this section. Recall from the presentation that predicates allow you to encapsulate logical conditions in a reusable format.

A

So this question is about converting your previous query into a predicate which identifies the set of expressions in the program which are deserialized directly by fromXML. We've got a template here that you can follow, and this actually introduces a new concept in CodeQL, which is this exists. You can think of an exists as a mechanism for introducing temporary variables with a restricted scope and, a bit like predicates themselves, you can think of them as their own mini from-where-select.

A

So you have a series of variable declarations at the top here and then some conditions in the body that must hold. We're using this exists here because we actually don't care about the method access itself outside this predicate. What we want to get out of it is the argument that is going to be deserialized.

A

Okay. So again, the question is: convert your previous query into a predicate of this form.

A

Okay, so let's go through the answer here. Let's start off by copying the template that we provided above. What we're essentially going to do is take the contents of the where clause that we wrote before and put them in the body of this exists, because the conditions that we want to apply here are effectively the same. I will need to do a little bit of renaming, because we previously called the method access call, and now we're going to call it fromXML.

A

From a set point of view, the way to think about this is that we now have a predicate that names a table of results representing the set of all expressions in the program that we think are directly deserialized from XML by fromXML.
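
A sketch of the predicate for question five, together with a query that uses it. The predicate name isXMLDeserialized is an assumption, since the template is not captured in the transcript; the variable name fromXML follows the renaming described above.

```ql
import java

// Holds if arg is the first argument of some call to a method named fromXML.
predicate isXMLDeserialized(Expr arg) {
  // exists introduces fromXML as a temporary variable scoped to this body.
  exists(MethodAccess fromXML |
    fromXML.getMethod().getName() = "fromXML" and
    arg = fromXML.getArgument(0)
  )
}

from Expr arg
where isXMLDeserialized(arg)
select arg
```

The method access is hidden inside the exists because only the deserialized argument matters to anything that uses the predicate.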

A

Now, what we're going to do is update the query itself so that it actually uses this. We now only need the expression, so we change it.

A

So, like with the example that we saw in the slide deck, we're going to refer to this predicate and report the argument. We run this query, and you can see that we get the two results, and again these are the same two results that we've been following through for the last three steps. Okay, so that's the end of section one. I'm going to ask Arthur now if there are any questions on the Slack channel and any feedback on timings, whether you want it a little bit longer or a little bit slower. Arthur, over to you.

C

Let me unmute first.

C

One question that just came in: what's the difference between using Expr or argument in the from, in this case?

A

The difference between Expr and arg in the from? So in the from clause, Expr is the type of the variable and arg is the name of the variable. Is that the question, what the difference between those two aspects is?

C

Yes, or maybe it's about using getArgument. It might be that they didn't type in a method name, I'm not really sure.

C

Whether there is an argument type, actually.

A

Is there an argument type? Oh, I see. Okay, I understand. So actually, if we have a look at these predicates here and mouse over them, we get some context-specific help about what each one does. You can see the getArgument predicate here gets the argument at the specified zero-based position in this method access, and you can also see that it has a signature here that helps us understand what parameters it takes and what it might return.

A

So you can see in this case that getArgument actually returns something of type Expr, an expression, and that's because of the model of the program.

A

If you think about what's actually stored in our database, it's effectively a model of the program in a tree structure known as an abstract syntax tree, or AST. That tree structure represents the hierarchical nesting of the program. So if we think about what a method call looks like, like this one, it's going to be represented as a method access in the database, and it's going to have children in this AST for the qualifier of the call and for each of the arguments of the call. Those arguments can be any type of expression in the program.

A

So it doesn't make sense to have a specific QL type for arguments, because an argument can be any type of expression. You may as well use the same QL class that we have defined for expressions. So hopefully that answers that particular question. Is there any feedback on timings or anything?
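To make the tree picture concrete, here is a minimal sketch of a query that lists the children of each fromXML call described above. It assumes the CodeQL Java library of the period of the talk; the restriction to fromXML is only there to keep the result set small.

```ql
import java

// For each call to a method named `fromXML`, report its qualifier and each of
// its arguments. Both children are plain `Expr` values: there is no separate
// QL class for "argument".
from MethodAccess call, Expr qualifier, int i, Expr arg
where
  call.getMethod().getName() = "fromXML" and
  qualifier = call.getQualifier() and
  arg = call.getArgument(i)
select call, qualifier, i, arg
```

Calls without an explicit qualifier produce no rows here, because getQualifier has no result for them.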

C

Not at the moment. Someone's asking if there's human-readable documentation, outside of Visual Studio, available for all the predicates and libraries.

A

Yes, we have a help site, help.semmle.com. In fact, I think it's referred to at the end of the markdown document. And then there's documentation up here in the documentation links: Learning CodeQL for Java, I think that has links to the standard library for Java.

A

So you can see all this QL documentation. The stuff that comes up in the overlay here is all published to the public help site, and Learning CodeQL for Java also has tutorials for using different aspects of the standard library, such as data flow or the AST nodes. So you should be able to find everything that you need there, as well as in VS Code.

C

Yeah, another question is about the pipe symbols in the exists statement.

C

So I think the question is why there can be two, that you can write exists and then a condition and another condition separated by the bar.

A

Yeah, okay. So I think the question is that you could do something like that, right? You're allowed to do that. Is that the question?

C

Yes, I think so. Why does it exist, what's the difference, if any, and what's preferred?

A

Well, I'll start off with what kind of thing an exists is. An exists is a special kind of component in QL called a quantifier, specifically an existential quantifier. If you've done any formal logic before, you'll recognize this terminology. The structure of this exists is that it has a series of variable declarations and then some conditions within the exists. Typically, in the simple case, you'll just see us using the standard logical connectives to combine the formulas that we need to specify for the body of the exists.

A

Now, there's also a two-part version of this, and the reason a two-part version exists is that there are other quantifiers. In particular, there's a quantifier called forall, which again will be recognizable terminology if you've done any formal logic. It has a series of declarations, a range for those declarations, and something that has to be true for all of them.

A

Now, in this case that doesn't particularly make sense, but the exists syntax that uses a second bar is there because the forall syntax requires the bar. In actual fact, for exists this is entirely equivalent: written the way I just described it, with the two bars, we have a range and something that needs to be true for at least one thing in the range.

A

Well, if you think about that a little bit, that's actually just the same as combining these two conditions with and. Now, one area where it can be a little bit helpful to use this is if you have something like this, where you might say arg equals fromXML.getArgument(0), or maybe say that another argument could be deserialized as well. Then you could write it like this, but you can see

A

we've had to add brackets here because of the precedence of and and or, so we've got to be clear about which ones group together. So you'll sometimes see in QL the use of a bar here to avoid that, because the bar avoids the question.
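The two spellings discussed here can be sketched side by side. This is an illustration under assumptions, not the code shown in the session: the predicate names are made up, and argument index 1 is a hypothetical second position used only to create an or.

```ql
import java

// One-bar form: declarations | formula.
// `and` binds more tightly than `or`, so the disjunction needs brackets.
predicate deserializedOneBar(Expr arg) {
  exists(MethodAccess fromXML |
    fromXML.getMethod().getName() = "fromXML" and
    (arg = fromXML.getArgument(0) or arg = fromXML.getArgument(1))
  )
}

// Two-bar form: declarations | range | formula.
// The second bar separates the two parts, so no brackets are needed.
predicate deserializedTwoBar(Expr arg) {
  exists(MethodAccess fromXML | fromXML.getMethod().getName() = "fromXML" |
    arg = fromXML.getArgument(0) or arg = fromXML.getArgument(1)
  )
}

// Both predicates describe the same set of expressions.
from Expr arg
where deserializedOneBar(arg) and deserializedTwoBar(arg)
select arg
```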

A

So hopefully that answers the question.

C

Yes, I think it does. Thank you. Is there anything else? There was one advanced question. It was actually answered quite a lot already, but it's about whether CodeQL queries can be run on the output of binary tools such as LLVM, so on things other than source code.

A

So I think there are two answers to that question. If you think about the fundamental process that we're doing here, which is building a database of facts and then writing queries on that database of facts, then there's no theoretical reason why you couldn't do this on binaries as well. There's no limitation there.

A

So that's the theoretical answer. The practical answer is that we don't support analyzing binaries at the moment. The only type of binary support that we do have is for C#: for C# we can analyze DLLs, assemblies that are written in .NET and contain bytecode, which is called CIL. Now, one of the reasons that we don't support this is that it can be tricky. Firstly, the main purpose of what we do is static analysis of source code, so it hasn't necessarily been our focus to look at binary analysis, because it's a different area and a different topic, and often has different use cases.

A

One of the challenges with binary analysis is that what we usually focus on is trying to report results to developers so that they can actually fix them. If you do a binary analysis, then it's tricky to correlate that back to some source code where you can actually report an issue. So this is one reason why, from a pragmatic point of view, we haven't really looked at it.

A

Now, the C# case is actually quite interesting, because what we do for C# is analyze the assemblies that your source code depends on. When you do a build of a C# project, typically the compilation depends on some DLLs, some assemblies, and we actually analyze those assemblies and put a database representation of the bytecode in the database, alongside the database representation of your source code.

A

Then we use that to analyze data flow through your libraries as well as your source code, and so that's one area where binary analysis, or bytecode analysis, can be really useful for source code analysis. Now, as I say, that's something that we have for C# at the moment. We don't have it for other languages.

A

Java is one where we could potentially take the same approach, and I think that's definitely an area of interest, but it's just not one that we've got planned right now.

B

Well, that's a good question. Thanks.

A

Any other questions?

C

I don't think so. Thank you, Luke.

A

All right, great. Okay, hopefully everyone's ready to move on to section two. Section two is about finding implementations of the toObject method from ContentTypeHandler. Like predicates, classes in CodeQL can be used to encapsulate reusable portions of logic, and we saw this in the slide deck. Classes represent single sets of values and, importantly, they can also include operations, known as member predicates, that are specific to that set of values.

A

We've seen numerous instances of this already: we saw IfStmt and IfStmt.getThen, MethodAccess and MethodAccess.getMethod, and Method and Method.getName. One thing I didn't show you before was the jump-to-definition feature that's available for CodeQL. If you right-click on a type, you can go to definition. You can also do that using F12, and you can actually see the source code of the class that we're depending on here. So actually all the CodeQL classes,

A

all the standard library, is open source. It's all available on GitHub in the github/codeql repository; there's a link at the bottom there. You can jump to the definition here and see which member predicates are defined. So you can see here for MethodAccess the definitions of things like getAnArgument, getArgument, and the getMethod member predicate we were looking at before. That's often helpful when you're exploring the standard libraries.

A

So what we're going to do, the first question here, is to create a CodeQL class called ContentTypeHandler to find the interface org.apache.struts2.rest.handler.ContentTypeHandler. As a reminder, there's a template here of what a class looks like. In this case we're going to extend the standard library type called RefType. RefType stands for reference type, and this is basically the set of things like classes in Java that are referred to by reference.

A

Okay. So the question here is to write this class and to fill in the characteristic predicate with the appropriate conditions. And remember, there's a hint down here if you get stuck.

A

I'm going to provide two minutes again for these ones. If you feel like you need a little bit longer, just let us know on the Slack channel and we'll wait a little bit.

A

Let's go through the answer. So again, the question was to write a CodeQL class called ContentTypeHandler to find this particular interface, and you can use this template to fill in. What we're going to do is actually write this out. So we can say class ContentTypeHandler. Remember, classes have a name and they have to extend an existing type.

A

In this case we're going to extend RefType, which covers classes, interfaces, type parameters, arrays and so forth. That would be a valid class in and of itself, but it still only represents the set of all the RefTypes in the program. So we need to provide a characteristic predicate that specifies what conditions make ContentTypeHandlers special compared with the rest of the RefTypes, and the answer here is that they have a name which looks like this.

A

If we have a look at the hint, it says: use RefType.hasQualifiedName, which takes two parameters, package name and class name, to identify classes with the given package name and class name. We've got a little example here of what that looks like: you can say from RefType r where r.hasQualifiedName with a package name and a class name, in this case java.lang and String, and then you can select r. And remember,

A

as we talked about before, within the characteristic predicate you can use the magic variable this to refer to the RefType. Okay, so we're going to say this.hasQualifiedName, and we're going to copy this out, because it comes in two parts here: one specifies the package and the other the type name.

A

And there we go. So what we're saying here is that a ContentTypeHandler is a RefType which has the qualified name org.apache.struts2.rest.handler.ContentTypeHandler. Now, it would be nice to be able to test this without having to write a new query, and in fact we can do that. We can right-click either on the class or on the characteristic predicate and choose Quick Evaluation.
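A sketch of the class as described, assuming the CodeQL Java library of the time. The small query at the end is not part of the exercise; it is only there so the file can be run as a whole instead of using Quick Evaluation.

```ql
import java

/** The interface `org.apache.struts2.rest.handler.ContentTypeHandler`. */
class ContentTypeHandler extends RefType {
  // Characteristic predicate: narrows the set of all RefTypes down to the
  // ones with this qualified name. `this` refers to the RefType itself.
  ContentTypeHandler() {
    this.hasQualifiedName("org.apache.struts2.rest.handler", "ContentTypeHandler")
  }
}

from ContentTypeHandler handler
select handler
```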

A

You can run quick evaluation on predicates, on classes, and on complete formulas, and this is a way that you can very easily start to debug and further understand your CodeQL. We've written this class, and we want to make sure it's returning the right thing before we continue on with the next part of the query. You can see this has returned one result.

A

It has returned the interface ContentTypeHandler, and that's what we expect, because there's only one definition of this interface in the codebase. All right, so question two: create a CodeQL class called ContentTypeHandlerToObject for identifying methods called toObject on classes whose direct supertypes include ContentTypeHandler. Okay, that's a bit of a mouthful. If you're not sure where to start, you'll be able to go to the hint here.

A

But if you break it down step by step, you should be able to write a query for that. I'll start the timer: two minutes.

A

Okay, we'll go through the answer now. If we expand out the hint here, there are a few hints. The first one is: use Method.getName to identify the name of the method. I think we've seen that before. To identify whether the method is declared on a class whose direct supertypes include ContentTypeHandler, you will need to, firstly, identify the declaring type of the method using Method.getDeclaringType;

A

secondly, identify the supertypes of that type using RefType.getASupertype; and thirdly, use instanceof to assert that one of the supertypes is a ContentTypeHandler.

A

So what we're going to do here is write out a new class,

A

ContentTypeHandlerToObject, and it's going to be used for identifying methods called toObject, so we are going to extend Method.

A

We need to add a characteristic predicate. So far this is pretty similar to the RefType one that we have above, except that we're starting from the set of all methods. We want to refine it to only those methods that are called toObject and that are on something that implements this ContentTypeHandler interface. Okay, so the first thing we're going to put in is that this.getName equals toObject.

A

Then the second part that we're going to put in is this aspect here. Firstly, we need to know what type this method was declared on. We can say this.getDeclaringType, which gets the type in which this member is declared. Now, having got that type, the type this method is declared in, we want to know whether it extends ContentTypeHandler, because not all the methods named toObject in the program may be on things that extend ContentTypeHandler.

A

So the way we're going to do that is with getASupertype.

A

Now, what we want to say is that the set of supertypes here includes the ContentTypeHandler interface. The way we can do that, and we actually saw this only a few times, is to say instanceof and then ContentTypeHandler.
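Putting the three hints together, the class might look like the sketch below. It assumes the CodeQL Java library of the time and repeats the ContentTypeHandler class so that the file stands alone; the closing query is only there to make it runnable.

```ql
import java

class ContentTypeHandler extends RefType {
  ContentTypeHandler() {
    this.hasQualifiedName("org.apache.struts2.rest.handler", "ContentTypeHandler")
  }
}

/** A `toObject` method declared on a type that directly extends or implements `ContentTypeHandler`. */
class ContentTypeHandlerToObject extends Method {
  ContentTypeHandlerToObject() {
    // 1. the method is called toObject
    this.getName() = "toObject" and
    // 2. take the type it is declared in, 3. take one of that type's direct
    // supertypes, and 4. require that supertype to be a ContentTypeHandler
    this.getDeclaringType().getASupertype() instanceof ContentTypeHandler
  }
}

from ContentTypeHandlerToObject toObjectMethod
select toObjectMethod
```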

A

So, just checking in: are there any questions on the Slack channel that we should go back to?

C

Just one came in: is there any preference between getName equals a string, or hasName with a string argument?

A

No, I think that's just personal preference. I think the question here, to clarify, is that you can also do the same thing like this.

A

There's really no preference. Under the hood, these are both predicates: this one is a predicate that has a return type, a result, and this one is one that takes a parameter. Under the hood, it's calculating the same thing. It's more about personal preference as to what reads better to you. For simplicity, I've been using the same one all the way through, making sure I use this.getName equals, so I'll continue to do that.

A

But you're entirely free to use hasName as well.
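The two spellings can be compared directly. In this sketch the predicate names viaGetName and viaHasName are made up for illustration; getName and hasName are the member predicates discussed above.

```ql
import java

// Predicate with a result: compare the returned name with a string.
predicate viaGetName(Method m) { m.getName() = "toObject" }

// Predicate with a parameter: pass the expected name in.
predicate viaHasName(Method m) { m.hasName("toObject") }

// Lists the methods matched by both spellings.
from Method m
where viaGetName(m) and viaHasName(m)
select m
```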

C

Thank you. I don't see any other questions, although other people are still typing. Would you like to wait a sec?

A

Let's give it a few seconds and see if anything else comes in.

C

There was also a question earlier: is it possible to analyze dependencies, to identify code that has a specific dependency, things like that?

A

Yes, so we do this for some of our languages, and there are two ways that we can look at dependencies. As I said before, for compiled languages like C# or Java, when you compile the code you have to provide a reference to the libraries that you're compiling against. To populate our database with the facts about the program, we observe that compilation.

A

We look at the arguments to the compiler and we see that you've passed in these particular dependencies, and so when we build our database of facts, we store a reference to the fact that you depend on these particular binaries. Now, it's not always a straightforward process to go back from those binaries to find out what particular version they were, but it's possible for some languages. For other languages, like JavaScript or Python,

A

we depend a little bit more on package references in the repository itself. In those cases we can build those package references into the database, and we can potentially query those as well. Now, this is an interesting question because it touches on the vulnerable dependencies topic, and GitHub actually has a native feature that looks for vulnerable dependencies in your source code. It doesn't use CodeQL.

A

It's just looking at your dependency definitions, things like your Maven POM files, or your packages for Python, or your npm module dependencies for JavaScript. That's a lightweight solution, and what it can do is match those against a database of known vulnerabilities for different library versions, and it reports that within GitHub itself. So that's a feature that's already in GitHub. You can already use that today, on open source code and private code as well. Now, the value that CodeQL can bring

A

on top of that is slightly different. As I said, you can have the dependencies in the database.

A

What QL can also bring is a knowledge of which aspects of those libraries you use. One problem with the typical vulnerable dependency analysis today is that it'll say you're depending on a vulnerable version of library X, but it doesn't know if you're using it in a way that actually makes you vulnerable.

A

For example, if there's just one API call in the library that is known to be vulnerable, and that's why a CVE was raised on it, the vulnerable dependency analysis that doesn't use CodeQL doesn't know whether you use it. So one interesting aspect here is whether you can use CodeQL to help identify unsafe uses of vulnerable libraries, and there are actually a couple of examples of public queries that do that. One, I think, is the JavaScript prototype pollution query.

A

If you want to look at the help afterwards, that's an example where we look both at the dependencies of your code and at what your source code does with that dependency, in order to find whether you're using it in an exploitable way. Okay, next question.

C

Thank you. Another question came in when you started talking: does CodeQL database creation require buildable code? Would partial code also work?

A

Yes, it requires buildable code. One reason for this is that for compiled languages it is generally better to observe what the typical build does for that project. What we do, in fact, is that you provide us with the build command, we run your regular build, and we watch out for whenever the compiler is invoked. Every time the compiler is called, we mimic what the compiler does to populate our database of facts.

A

The reason that we do this is that for compiled languages it is often important for accuracy to actually see what the build does. There are numerous examples of the build generating files as it goes along, which help tie the project together, or of builds using multiple different versions of dependencies, so you don't know which version is going to be used until the build actually occurs.

A

These aspects are only really revealed when you observe a build, and so for that reason, for compiled languages, we want a build command in order to be able to do that. Unfortunately, this means that we can't analyze partial projects. If you can compile something, then we can build a database for it, so if you can, say, compile one component of a project, then you can build a database for that component. But if you can't compile it, then unfortunately you can't build a database for it.

A

I should note as well that this is quite important for Java and C#, and very important for C and C++. It is extremely difficult to do any kind of sensible source code analysis on C and C++ without running a build, because C and C++ files often use the preprocessor, which effectively arbitrarily rewrites the files at build time.

A

So for C and C++ it's absolutely critical that you have a build in place in order to get an accurate picture of how the project fits together.

B

That's another good question.

C

You're welcome. Another question is: are there any plans for supporting functional languages such as Clojure, or other languages that target the JVM?

A

So at the moment we don't have any other languages on the roadmap, but we are continually evaluating the set of languages that we support. The considerations there are typically the popularity of the language, the rate of change of that popularity, that is, whether the language is becoming more or less popular, and also obviously commercial considerations as to what our customers are interested in. So this is definitely an area where we value feedback from the community.

A

So if there are particular languages you'd like to see us support, let us know, and we'll feed that into the decision-making process as well.

C

One question is whether the QL language is open source.

A

So the language itself is not open source. I should say there's a language specification that we publish openly. The implementation of the language, the compiler and in particular the query optimization engine, is all closed source. The aspect which is open source is the queries themselves: the queries for all of our languages are open source, and they're in the github/codeql repository. The motivation here really is that we think the way to best help the world secure

A

their code is through community-powered security. One of our motivations is to allow security researchers, and I'm sure there are many security researchers on this call who have deep technical knowledge of finding security vulnerabilities, to encode at least some of that knowledge as these queries, and then to share them with the community by open sourcing the queries. We think that over time this is going to become an extremely effective way to deal really quickly with new and emerging security vulnerabilities. We had a good example of this through one of our customers, actually, Microsoft.

A

A couple of years ago, you might remember, there was the Zip Slip vulnerability. That was, well, at least labelled by Snyk, and they found a bunch of cases in open source software. At the time, and still at the moment, Microsoft was one of our biggest customers and one of our biggest users of CodeQL, and they saw this Zip Slip vulnerability, which is essentially a tainted path vulnerability. They saw that revealed, and the very next day, in about twenty minutes,

A

they wrote a query to find instances of Zip Slip in C# code. The day after that, they ran it on dozens of their largest codebases, and the month after that, they released the query to be open sourced. So there's a nice pattern here of, firstly, being able to rapidly identify new and emerging security vulnerabilities and, secondly, contributing those checks back to the community so that everybody benefits. All right, thanks.

A

Okay, let's move on to the next question, the next exercise on here, and we can come back to more questions later on. All right, so question three: toObject methods should consider the first parameter as untrusted input. Write a query to find the first (i.e. index 0) parameter for toObject methods. The way I suggest you do this, if you've been following along with what I've been doing, is

A

to just comment out this query for now and write a new query underneath here. I'll start the timer for two minutes.

A

Okay, let's go through the answer here. Rereading the question: toObject methods should consider the first parameter as untrusted user input, so we just want to write a query to find the first parameter for toObject methods. If we look at the hints, it says to use Method.getParameter to get the i-th index parameter, and to create a query with a single CodeQL variable of type ContentTypeHandlerToObject. So let's do that.

A

We're going to call this variable toObjectMethod, and we're going to select toObjectMethod.getParameter. We want the zeroth parameter. So this is going to return the places in the program, the parameters in the program, that we think are taking in untrusted user input. You can see that there are eight of them here.

A

So there are eight implementations here in this version of Apache Struts that implement the toObject method. You can see here we have the HTML handler, whose toObject method is empty, and the JSON-lib handler, which is obviously doing some JSON processing. We have a multipart form one, which doesn't do anything in toObject, the XStream one, which is handling XML, and a couple of test ones here as well. So we seem to be picking out the right things. And again, if we have a look at what's happening here, look at the JSON one.

A

The second column here is representing the parameter that we think is taking in untrusted user input.
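A sketch of the query for this step, with the two classes from section two repeated so that the file stands alone. It assumes the CodeQL Java library of the time; selecting the method next to the parameter is an addition for readability, not something the transcript confirms.

```ql
import java

class ContentTypeHandler extends RefType {
  ContentTypeHandler() {
    this.hasQualifiedName("org.apache.struts2.rest.handler", "ContentTypeHandler")
  }
}

class ContentTypeHandlerToObject extends Method {
  ContentTypeHandlerToObject() {
    this.getName() = "toObject" and
    this.getDeclaringType().getASupertype() instanceof ContentTypeHandler
  }
}

// Index 0 is the first parameter, the one treated as untrusted user input.
from ContentTypeHandlerToObject toObjectMethod
select toObjectMethod, toObjectMethod.getParameter(0)
```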

A

All right, so that's the end of section two. Are there any more questions that we want to cover before we move on to section three?

A

We'll assume not if Arthur isn't unmuting, so let's move on. All right, section three: unsafe XML deserialization. We've now identified places in the program that receive untrusted data, and places in the program that perform potentially unsafe XML deserialization. What we want to do now is combine those together and ask the question: does this untrusted data flow to the potentially unsafe XML deserialization call? In program analysis, we typically call this a data flow problem. Data flow helps us answer

A

questions like: does this expression ever hold a value that originates from a particular other place in the program?

A

Now, the way to think about data flow problems is to visualize them as finding paths through a directed graph. This directed graph has nodes, which are elements in the program, and edges that represent the flow of data between those elements. Once you have this graph, if a path exists between two nodes, then data flows between those two nodes. I've got a little example here. Consider this example Java method.

A

It's a function that returns an integer and takes an integer parameter that we've called tainted. You can see that this value propagates through the method in different ways. Firstly, the parameter is accessed and stored in x. Then, depending on the value of some condition, x is either stored in a variable called y, which is then used in a call to foo, or x is just returned. And in the final case we return -1.

A

Now, if we think about what the data flow graph for this method will look like, it'd be something like this. We will have a data flow node that represents the parameter called tainted, which represents this piece of source code here. We also have a data flow node representing the access of that parameter here, which is going to be an expression.

A

This is a parameter access expression and, as you can see from the graph, we have a flow between them. So there is a node for this element, a node for this element, and an edge between those two to show that data flows between the two.

A

Now, of course, this parameter is stored in the variable x, and x is then used in one of two ways: either it's assigned to y, or it's used in this return. What does that mean for the data flow graph? It means there are going to be two nodes in the data flow graph, one to represent the access of x here and one to represent the access of x there. You can see this is represented by these two nodes here.

A

They both have edges from this access of tainted here, because data flows from here to both of these cases. Then, finally, this x is stored in variable y, and it's accessed in the call to foo,

A

in the argument to the call to foo, which we know is an expression. So there's another expression node here, y, to represent this bit of the program, which again has data flow from the assignment of x into the variable y. So this is the data flow graph that we'd be analysing for something like this, and the thing to bear in mind is that this is all about flow through this graph.

A

If there's a path through this graph from, say, tainted to y here, then data flows between the two. Now, CodeQL for Java provides data flow analysis as part of its standard library. You can import it using semmle.code.java.dataflow.DataFlow, and the library models nodes using the DataFlow::Node CodeQL class. This data flow graph is separate and distinct from the AST that we talked about before, so you effectively have in the database one tree that represents the basic structure of the program, and the data flow graph

A

is a separate representation. The reason for this is that we want to provide some flexibility in how data flow is modelled, so we don't want to be tied to only having data flow nodes for things that appear in the AST. There are a small number of data flow node types. The most common ones, the ones we saw above, are the expression nodes and the parameter nodes, but there are other types as well, things like definitions and so forth.

A

So in this section, what we'll do is create a data flow query by populating this particular template, to find flow from the untrusted user input to this XML deserialization.

A

Now, what I'm actually going to do is write out this data flow template from scratch in my query, to explain what each of these features is for. Feel free to copy the template from the markdown file once we've finished going through the explanation. All right, so the first thing I'm going to do is add a new import to our query, semmle.code.java.dataflow.DataFlow, and this imports the data flow library so that we can use it.

A

Now, the way this data flow library actually works is that you have to define what we call a data flow configuration. The data flow configuration specifies what the sources are for this data flow problem and what the sinks are; in other words, the things that we want to find flow from and flow to. The reason that we have a configuration for this is that what we're doing here is global data flow, or interprocedural data flow. We're looking across the whole program as we see it in the database, and we're actually finding flow across method boundaries, across file boundaries, across the whole program.

A

Now, because of the scale of this, it's not possible up front to build a table saying this is how data flows from every node in the program to every other node in the program. So what we're doing when we define a configuration here is specifying the conditions that make up the sources and the sinks, so that we can calculate this data flow on a more restricted set of data.

A

Now, we actually have a counterpart to this global data flow, called local data flow, and that we can actually compute for the whole program, because we restrict it to the data flow dependencies within a single method. That doesn't need a configuration; it has a simple predicate that you call, which says whether there is local flow between x and y, for example. Okay.
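A minimal sketch of local data flow, assuming the DataFlow::localFlow predicate and the parameterNode/exprNode helpers from the Java library of that era. It looks for parameters whose value reaches the first argument of a call within the same method. MethodAccess is the class name used at the time of this session; later CodeQL releases renamed it, so the name may need adjusting.

```ql
import java
import semmle.code.java.dataflow.DataFlow

// Local (intraprocedural) flow: no configuration class is needed.
// Find calls whose first argument receives the value of a parameter
// of the enclosing method.
from Parameter p, MethodAccess call
where
  DataFlow::localFlow(DataFlow::parameterNode(p),
    DataFlow::exprNode(call.getArgument(0)))
select call, "The first argument of this call receives the value of parameter $@.", p,
  p.getName()
```

Because the analysis stays within one method, this query would miss any flow that passes through another method on the way.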

A

So what we're going to do for this particular problem is define a configuration. We're going to call it StrutsUnsafeDeserializationConfig, and it's going to extend this library class called DataFlow::Configuration.

A

We should note that this is a special kind of CodeQL class. We call it a configuration class, and its purpose is slightly different from the classes that we've seen before: it's designed as a wrapper, a container for a series of related predicates. In this case, that's a predicate for defining sources and a predicate for defining sinks. Defining these within a single class means that we can tie the source and sink definitions together.

A

The way these configuration classes work is that, at the moment, they extend string, and so you have to define a unique string in the characteristic predicate for this particular configuration.

A

Okay, so we have a name for this now. Now, there are actually two predicates that we need to implement to specify what the sources and sinks are. If we jump to definition on Configuration, we can see what they are: we want an isSource predicate and an isSink predicate.

A

So let's copy those out and paste them in here.

A

These are abstract, because they're things that we need to implement, so we're going to specify that these are overriding predicates.

A

Okay, so we're going to need to fill this in to specify what the sources for our problem are, and we're going to need to fill this in to specify what the sinks for our problem are. That's going to be our first question when we get to the questions. Before we do that, I want to show what this actually looks like in a query. The way this works in a query is that you specify the configuration you're interested in.

A

We can stub these out. You'll notice this is complaining, because we haven't got any conditions in the body here. If we don't want to apply any conditions at all, we can use a shortcut, so let's just do that for the moment.

A

Now we can see StrutsUnsafeDeserializationConfig appears in autocomplete, so we can use that. Typically, you specify the configuration you're interested in, you have a variable for each of the source and the sink, and then in the where clause you say config.hasFlow(source, sink). So this is applying a condition, saying that this is part of this particular configuration.

A

Yeah, this is part of this particular configuration, and hasFlow will effectively calculate a table of sources and sinks. These are things that adhere to the source and sink predicates above and that are connected, directly or indirectly, by data flow. Okay, so we're going to say select, and we're going to specify a message here for the unsafe deserialization.
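The template described so far can be sketched roughly as follows. It uses the class-based DataFlow::Configuration API as described in the session (later CodeQL releases moved to a different, module-based API). The none() bodies are placeholder stubs chosen for this sketch, since the session does not name the shortcut used, and the metadata values and message text are illustrative assumptions.

```ql
/**
 * @name Unsafe XML deserialization
 * @kind problem
 * @id java/unsafe-deserialization
 */

import java
import semmle.code.java.dataflow.DataFlow

class StrutsUnsafeDeserializationConfig extends DataFlow::Configuration {
  // Configuration classes extend string, so the characteristic
  // predicate assigns a unique name.
  StrutsUnsafeDeserializationConfig() { this = "StrutsUnsafeDeserializationConfig" }

  // Placeholder: to be filled in with the definition of the sources.
  override predicate isSource(DataFlow::Node source) { none() }

  // Placeholder: to be filled in with the definition of the sinks.
  override predicate isSink(DataFlow::Node sink) { none() }
}

from StrutsUnsafeDeserializationConfig config, DataFlow::Node source, DataFlow::Node sink
where config.hasFlow(source, sink)
select sink, "Unsafe XML deserialization"
```

With both predicates stubbed out this way, the query compiles but returns no results until the sources and sinks are defined.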

A

So this is the template. Okay, so there are two final questions here, and we'll do them both together: complete the isSource and isSink predicates using the queries that you wrote for section 2 and section 1. I'll just start a timer there.

A

Okay, so let's go through isSource. The hint here is that you can translate from a query clause to a predicate by converting the variable declarations in the from part to the variable declarations of an exists, placing the where clause conditions, if any, in the body of the exists, and adding a condition which equates the select to one of the parameters of the predicate. And remember to include the ContentTypeHandlerToObject class you defined earlier.

A

So we've still got the classes we defined earlier here, and what we're going to do is take this query that we wrote before and put it into the isSource predicate here.

A

All right, so as it says, the way that we convert this is to change the from to an exists, so we have an exists over ContentTypeHandlerToObject.

A

It says to place the where clause conditions, if any, in the body of the exists. We don't have any where clause conditions for this one. And it says to add a condition which equates the select to one of the parameters of the predicate. So what we're going to say is that the source is equal to toObjectMethod.getParameter(0).

A

Now, we can't actually say that directly because, as I mentioned before, data flow nodes are not the same as AST nodes. You can see this is even highlighted here: Node is incompatible with Parameter. The way this actually works is that data flow nodes have a converter on them, which allows you to get the equivalent AST node for your data flow node. In this case, it's called asParameter.

A

Okay, now we have the source defined.

A

What we'll do is give you one more minute so you can complete isSink, and then we'll go through isSink.

A

Okay, let's go through the isSink answer. The hint here is just to complete the same process as above, so again, let's copy this query in.

A

What we're going to do is, as before, convert this to an exists. As it said in the hint above, we're going to convert the where clause to a condition in the body of the exists, and we're going to equate a parameter of the predicate to the select. Before, we used asParameter because it was a parameter. In this case, it's actually an expression that we're interested in, because the argument to the fromXML call is an expression.

A

So we say sink.asExpr() equals arg, and save that here. Okay, so we now actually have a completed query that we could use. What we can do at this point, just to validate that we have the right things, is click on these isSource and isSink predicates, run them, and validate that we have the results that we expect. So here we can see those eight in parameters.
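Putting the two predicates together, the completed query might look like the sketch below. The ContentTypeHandler and ContentTypeHandlerToObject classes were written earlier in the workshop and are not shown in this part of the session, so their definitions here are a reconstruction and should be treated as assumptions, as should the metadata values. MethodAccess and DataFlow::Configuration are the names used at the time of the session and may differ in later CodeQL releases.

```ql
/**
 * @name Unsafe XML deserialization
 * @kind problem
 * @id java/unsafe-deserialization
 */

import java
import semmle.code.java.dataflow.DataFlow

/** The Struts REST plugin interface `ContentTypeHandler` (assumed qualified name). */
class ContentTypeHandler extends RefType {
  ContentTypeHandler() {
    this.hasQualifiedName("org.apache.struts2.rest.handler", "ContentTypeHandler")
  }
}

/** A `toObject` method declared on a direct subtype of `ContentTypeHandler`. */
class ContentTypeHandlerToObject extends Method {
  ContentTypeHandlerToObject() {
    this.getDeclaringType().getASupertype() instanceof ContentTypeHandler and
    this.hasName("toObject")
  }
}

class StrutsUnsafeDeserializationConfig extends DataFlow::Configuration {
  StrutsUnsafeDeserializationConfig() { this = "StrutsUnsafeDeserializationConfig" }

  // Source: the first parameter of a `toObject` method.
  // `asParameter()` converts the data flow node to its AST parameter.
  override predicate isSource(DataFlow::Node source) {
    exists(ContentTypeHandlerToObject toObjectMethod |
      source.asParameter() = toObjectMethod.getParameter(0)
    )
  }

  // Sink: the first argument of a call to a method named `fromXML`.
  // `asExpr()` converts the data flow node to its AST expression.
  override predicate isSink(DataFlow::Node sink) {
    exists(MethodAccess fromXML, Expr arg |
      fromXML.getMethod().getName() = "fromXML" and
      arg = fromXML.getArgument(0) and
      sink.asExpr() = arg
    )
  }
}

from StrutsUnsafeDeserializationConfig config, DataFlow::Node source, DataFlow::Node sink
where config.hasFlow(source, sink)
select sink, "Unsafe XML deserialization"
```

The sink matches on the method name only, which keeps the sketch short; a stricter version could also check the declaring type of fromXML.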

A

And we should have the two arguments to fromXML here. Now, if we run this query, we should actually be able to find the sink of the unsafe XML deserialization.

A

You can see we've got one result, and actually, if we click into this, this is in fact the CVE that our colleagues on the GitHub Security Lab team reported. It's taking an untrusted user input, the Reader in, and it's passing it straight to this xstream.fromXML, and it's doing nothing to protect this in any way.

A

So XStream itself has some options that you can set before calling fromXML to prevent arbitrary deserialization, but none of those are actually used here at all, and so this allowed remote code execution, arbitrary code execution. Now, in this case, it's kind of easy to verify that this is a true positive, because the parameter, which is the source, and the argument, which is the sink, are both in the same method. That is often not the case for security vulnerabilities.

A

Often they are spread out across the program, and there may be dozens of steps that go from the source to the sink. So how can we improve our query to actually report the path? The very final part of this workshop is going to be converting this to what we call a path problem query.

A

Now, there are five parts to converting a query to a path problem query. The first is converting the @kind from problem to path-problem. A query, as we've seen on a lot of occasions already, can have some metadata at the top, and one of the pieces of metadata is the @kind, which can be problem or path-problem. For a problem query, we'll get just the result that we saw before, with slightly improved formatting.

A

And you can see we get this one result.

A

What we want to do is convert this @kind from problem to path-problem, and this tells the CodeQL toolchain to interpret the results of this query as path results. The second thing that we'll need to do is add a new import, DataFlow::PathGraph, which reports the path data alongside the query results.

A

The third thing we'll need to do is change the source and sink variables from DataFlow::Node to DataFlow::PathNode, to ensure the nodes retain path information. The fourth step is to use hasFlowPath instead of hasFlow, and the fifth step is to change the select to report the source and sink as the second and third columns.

A

The toolchain combines this data with the path information from the import of PathGraph to build the paths. We are close to running out of time here, so I'm going to leave this as an exercise you can complete on your own. The solution is available here under the expansion, and we have a bit more information at the bottom here.
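The five steps above can be sketched as follows. This is not necessarily identical to the workshop's published solution: the two helper classes are reconstructed from earlier in the workshop and should be treated as assumptions, the metadata values are illustrative, and the API names are those in use at the time of the session.

```ql
/**
 * @name Unsafe XML deserialization
 * @kind path-problem
 * @id java/unsafe-deserialization
 */

// Step 1: the @kind above is path-problem rather than problem.

import java
import semmle.code.java.dataflow.DataFlow
// Step 2: import the path graph so path data is reported with the results.
import DataFlow::PathGraph

/** The Struts REST plugin interface `ContentTypeHandler` (assumed qualified name). */
class ContentTypeHandler extends RefType {
  ContentTypeHandler() {
    this.hasQualifiedName("org.apache.struts2.rest.handler", "ContentTypeHandler")
  }
}

/** A `toObject` method declared on a direct subtype of `ContentTypeHandler`. */
class ContentTypeHandlerToObject extends Method {
  ContentTypeHandlerToObject() {
    this.getDeclaringType().getASupertype() instanceof ContentTypeHandler and
    this.hasName("toObject")
  }
}

class StrutsUnsafeDeserializationConfig extends DataFlow::Configuration {
  StrutsUnsafeDeserializationConfig() { this = "StrutsUnsafeDeserializationConfig" }

  override predicate isSource(DataFlow::Node source) {
    exists(ContentTypeHandlerToObject toObjectMethod |
      source.asParameter() = toObjectMethod.getParameter(0)
    )
  }

  override predicate isSink(DataFlow::Node sink) {
    exists(MethodAccess fromXML |
      fromXML.getMethod().getName() = "fromXML" and
      sink.asExpr() = fromXML.getArgument(0)
    )
  }
}

// Step 3: source and sink are PathNodes, so they retain path information.
from StrutsUnsafeDeserializationConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
// Step 4: hasFlowPath replaces hasFlow.
where config.hasFlowPath(source, sink)
// Step 5: source and sink are reported as the second and third columns.
select sink, source, sink, "Unsafe XML deserialization"
```

The isSource and isSink predicates are unchanged from the problem query; only the metadata, the extra import, the variable types, the flow predicate, and the select differ.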

A

So if you want to find out a bit more about how this vulnerability was originally identified, there's actually a blog post on the GitHub Security Lab page that explains how we found it. And although today we have created a query from scratch to find this problem, it can actually also be found with one of our default security queries, UnsafeDeserialization.ql.

A

You can actually see this working on a vulnerable copy of Apache Struts, so this is the one that we actually built the database from. You can see a clone of this on GitHub here, and it's been analyzed on LGTM.com, which is our free open source analysis platform. So if you click on this link here, it will take you to the result in XStreamHandler on the actual vulnerable version of the code, as found by the out-of-the-box query.

A

Now, there's a wealth of information that you can move on to after this. There are a few tutorials available for Java; in particular, the one on analyzing data flow is probably an interesting one to pick up next. We also have full CodeQL training courses for Java available for free online. They're all in the format of a slide deck that you can work through. And you could also, as I mentioned at the start, enter the CodeQL Java capture-the-flag challenge that we've launched for Satellite.

A

That's on the GitHub Security Lab website, and you have a chance to win a prize, so you can put your newfound CodeQL skills to use. We also have some older capture-the-flag challenges there as well: we have ones for C and C++ and for JavaScript that you can try out. No prizes for those ones anymore, they've finished, but they're good for learning anyway.

A

We also have a CodeQL course on GitHub Learning Lab. And if you want to find out more about how you can find vulnerabilities using CodeQL, then the GitHub Security Lab research blog is a really good place for security researchers to go to understand how the GitHub Security Lab actually uses CodeQL on a daily basis to find new and interesting security vulnerabilities. And of course, as I've mentioned numerous times already, all of the CodeQL queries and all the libraries are open source, so you can click this link and visit those.

A

Just as importantly, if you write a new query and you want to contribute it back to the community, you can do that as well, and we have a contributing guide here that you can follow to do that. So thank you very much for listening.

A

I don't think we quite have time to answer any more questions on the audio channel, but we'll be on the Slack channel for a bit longer. So if you have any burning questions, please ask them there, and we'll all be there to answer your questions. So thank you very much for turning up. Thank you very much to my helpers and to the team that set this all up, and hopefully talk to you soon about CodeQL. Thanks.