GitLab Tanuki Tech, 14 Dec 2020

From YouTube: TT230: GitLab Manage, Monitor, and Protect

Description

This is a Tanuki Tech session on 12/14/2020.

For more on Tanuki Tech, see here: https://about.gitlab.com/handbook/marketing/revenue-marketing/sdr/tanuki-tech/

For more on the speaker, see here: https://www.linkedin.com/in/christopher-wang-0835b226/

A

Welcome to today's session. This completes our GitLab story. What we're talking about is the last three stages in GitLab; we've been powering through them. The goal of today's session is ultimately to have better customer conversations by talking about, in particular, the Manage, Monitor, and Defend stages, and ultimately to understand our customers. Why would they care about this? What are we trying to do? Why did we add it in the first place?

A

Ultimately, this is part of what our customers are buying, so it matters that you can articulate and explain it. It's part of what they're getting, and maybe you can entice them more. So we're just finishing and rounding out that story today.

A

So, last time. In GitLab we talk about these things all the time: developers create merge requests; after that, all of our CI runs; after that, it really depends on the type of application. For certain applications, we'll create a new installer for you and we'll store that on the GitLab server itself.

A

If you have a different type of application, you can update your servers. So what we're talking about is cases number two and three from our last class, either upgrading to new servers or updating existing servers, and then answering the question of: now what? Obviously there are three product categories left, so the story continues, and that's what we're talking about today.

A

So what's really left when we're talking about a DevOps platform, or one tool for the entire DevOps life cycle? Well, there are still a couple of things that we need to add in. What we're talking about with these three stages is more a business-level understanding of what's happening in my organization. As an organizational leader, I need metrics to assess the health of my organization. That's something every VP and every C-level cares about: how fast is my code coming along? Am I efficient? What are my bottlenecks?

A

That's what we're talking about here. If you think about an airplane pilot flying through the atmosphere at 30,000 feet, they have to have lots of gauges. What are your gauges for your engineering organization? That's what we're talking about for this first one. Question number two: as an engineering organization, I need to monitor my environment to proactively mitigate risks. Anyone that read the New York Times or the Wall Street Journal yesterday knows that the U.S. Commerce Department actually got hacked.

A

It seems like it was a pretty major thing. We don't know what information they were able to get out, but worst-case scenario, they learned how to print US dollars. That'd be a big problem, right? So we're talking about having metrics for security and also for performance: do I have enough servers? Are my servers fast enough? When do I need to buy new servers?

A

So that's what we're talking about. I sort of conflated number two and three, but long story short, it's all of these higher-level organizational metrics: how healthy is my organization? How fast am I? Am I efficient? Am I proactively getting the right equipment?

A

Am I proactively making sure that everything's secure? That's what we're talking about today, and these are the Manage, Monitor, and Defend stages. So what have we talked about so far? Well, we've talked about the previous six stages. These were covered in our other 200-level classes, and what we are doing today is just finishing this story: the Manage, Monitor, and Defend stages.

A

So let's jump right in and talk about the Manage stage first. What this is all about is giving VPs, C-levels, and even, I'd say, senior directors visibility into their organization. Certain questions that it answers: how quickly am I developing code? Are we fast? How do we compare to other groups? How do we compare to my peers and my competition? We need to have some sort of metrics for this. Otherwise someone will just say, "We're doing good."

A

Well, how good? Other questions that we answer: how often are problems surfacing? What is the quality of my applications? Is the quality good? There needs to be some sort of metric for that. How does it improve in time? Is it 20% better? Is it 50% better? We give you metrics for this.

A

Other things are metadata on our projects, which helps with hiring and things like that, and ultimately: what are the bottlenecks in my development process? One of the themes that I'll talk about in this class is the danger of relying on word of mouth. The danger is that if I'm some VP with eight direct reports, and this happens across every team and every organization, there's one person that's really good at talking, is really great

A

politically, doesn't actually do that good of a job, but knows how to work the right channels. What happens when you promote someone like this is that everyone else who's actually doing the work gets mad. They feel like it's unfair, it's bad for culture, it's bad for morale, and you're not promoting the right people. So trying to figure out what the bottlenecks are, and who the people are that are actually doing a great job, that's what we're talking about here. Ultimately, metrics are needed for leaders to accurately resolve organizational challenges.

A

Without them, you don't have a compass, you don't have gauges, you don't have a roadmap. Think of all those Todd calls that we have: what's the first thing that we do? We go through the metrics, which ones are going well and which ones aren't. So that's what we're talking about in the Manage stage. Let's actually look at the product itself.

A

This is the first chart that I want to display, and what it talks about is code velocity. How efficient is my engineering organization? Is it increasing in time? Is it decreasing during the summer months, because maybe people are less motivated? And if it is, as an organizational leader, what sort of things can I do in my organization to increase motivation then?

A

Are there spikes all the way at the end of the release? Let's just say that there's a September release and there's a huge spike. Well, I now understand that developers are usually (a) under a lot of pressure, (b) just trying to get stuff in, and (c) all that stuff, because it's getting shoved in all at once, is probably not the highest quality. So knowing how to interpret these charts allows me to make better leadership decisions across my organization.

A

So once again, this is what we call throughput. It's the number of merge requests that are merged in every given month, and what I'm looking for is: do I have any unexpected dips? Is that morale? At the end of a release, do I have a huge spike? That's a red flag to me. It means that people are just pushing their stuff through because there's a deadline, and usually I'm going to pay for it in the long run.
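A minimal sketch of the throughput gauge described above, assuming you already have the merge date of each merge request. The function names, the sample dates, and the "twice the median month" spike rule are illustrative assumptions, not how GitLab computes its chart.

```python
from collections import Counter
from datetime import date
from statistics import median


def monthly_throughput(merged_dates):
    """Count merged merge requests per (year, month)."""
    counts = Counter((d.year, d.month) for d in merged_dates)
    return dict(sorted(counts.items()))


def flag_spikes(throughput, factor=2.0):
    """Return months whose count exceeds `factor` times the median month.

    The factor is an assumed threshold; tune it for your own organization.
    """
    if not throughput:
        return []
    typical = median(throughput.values())
    return [month for month, n in throughput.items() if n > factor * typical]


# Hypothetical data: a quiet summer, then a rush before a September release.
merged = (
    [date(2020, 6, 10)] * 40
    + [date(2020, 7, 12)] * 35
    + [date(2020, 8, 9)] * 38
    + [date(2020, 9, 28)] * 120
)
throughput = monthly_throughput(merged)
spikes = flag_spikes(throughput)
print(throughput)
print("Months to look at more closely:", spikes)
```

The same dictionary can be scanned for unexpected dips by comparing against a fraction of the median instead of a multiple.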

A

The other thing to take a look at is the idea of iteration. In continuous integration and continuous deployment, the whole idea is to iterate, so merge in small changes.

A

All the time. Do not have big changes, like a thousand-line code change. So you need some sort of metric for figuring out: am I actually implementing CI/CD? That's what this line change metric tells you.

A

If all of my changes are plus one line, plus 23 lines, plus 48 lines, plus one line, plus 18 lines, then we're doing a great job. In GitLab in general, from our organizational metrics, we are doing a great job with CI/CD. I am seeing some merge requests that are plus 600 lines, plus 300 lines, and the good thing on our behalf is that there aren't very many of these. Those are not as good.

A

If I really want to develop and implement the best practice of CI/CD, I want to see 18-line changes, 68-line changes. That basically tells me I am on track in implementing CI/CD and getting all the business value out of CI/CD.
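As a rough illustration of the line change metric, the sketch below summarizes how large merge requests are. The sample sizes are the ones mentioned in the session; the 300-line cutoff for a "large" change is an assumption chosen for the example.

```python
from statistics import median


def change_size_report(lines_changed, large_threshold=300):
    """Summarize merge request sizes.

    `large_threshold` is an assumed cutoff for changes that are too big
    to count as small, iterative work.
    """
    sizes = sorted(lines_changed)
    large = [n for n in sizes if n >= large_threshold]
    return {
        "median_lines": median(sizes),
        "large_count": len(large),
        "large_share": len(large) / len(sizes),
    }


# Added-line counts per merge request, taken from the examples in the talk.
sample = [1, 23, 48, 1, 18, 68, 600, 300]
report = change_size_report(sample)
print(report)
```

A low median with only a small share of large changes is the pattern the speaker describes as being on track with CI/CD.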

A

This next chart is more for the manager or senior manager level. It covers all of the outstanding merge requests, and what it's telling me is the amount of time that it took to review them. If I were a manager in charge of maybe one or two teams, I'd go through every couple of days, look at all the merge requests, look at the review time, and pick out the ones that are taking too long. Okay.

A

This is way too long, right? This is over two years. A typical organization is not going to see merge requests that are that old. But what I'm going to do is say: okay, for the ones that are too old, we have to figure out what happened. Was it that someone left the company, and so there's this outstanding merge request?

A

Well, we've got to get someone else to pick up the slack and get things in. The other thing that happens when there's a large review time is that developers are fighting over the best implementation. That happens all the time across organizations. If a fight is breaking out and people are arguing back and forth, then as a leader at the manager level, I have to come in and do my part to make sure that work gets done. Is it getting nasty?

A

How can we just make this about the business and get our code in? What we don't want to see is high review time. For the merge requests that have high review time, I would go in as a manager and see how I could help. That's my goal, and I would do that multiple times a week.
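The manager's routine above (scan open merge requests, pick out the ones that have waited too long) can be sketched as follows. The titles, dates, and the seven-day limit are hypothetical; in practice the list would come from wherever you export merge request data.

```python
from datetime import datetime, timedelta


def stale_merge_requests(open_mrs, now, max_review=timedelta(days=7)):
    """Return (title, age) for open merge requests older than `max_review`.

    Results are sorted oldest first so the worst cases get attention first.
    The seven-day default is an assumption, not a recommended standard.
    """
    stale = [
        (title, now - opened)
        for title, opened in open_mrs
        if now - opened > max_review
    ]
    return sorted(stale, key=lambda item: item[1], reverse=True)


# Hypothetical open merge requests: (title, date review was requested).
now = datetime(2020, 12, 14)
open_mrs = [
    ("Fix login timeout", datetime(2020, 12, 12)),
    ("Refactor importer", datetime(2020, 11, 20)),
    ("Abandoned experiment", datetime(2018, 10, 1)),
]
stale = stale_merge_requests(open_mrs, now)
for title, age in stale:
    print(f"{title}: waiting {age.days} days")
```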

A

Here's issue analytics. This really talks about your application quality. As you can see, take GitLab and the number of customer issues that are coming in for our product. It starts off not very high, and it keeps on creeping up. Well, this tells me either (a) more customers are using our product,

A

or (b) the quality of our product is decreasing. So there are two variables moving at the same time. But for a different organization, let's just say that your customer adoption is about level, you have a major release in April, and then a ton of issues come in during May, June, and July. Well, we now know from a graphical perspective

A

that we rushed code through in April and we're paying for it in May, June, and July. So now, as an organizational leader, I have a business-level decision to make: either (a) I'm okay with this, okay with hiring a lot of support engineers to figure this stuff out, or (b) maybe it makes more sense to not release so early, get things more solid, more tested, more vetted, and not pay for it for the next three months. This is an organizational-level decision.

A

There may be a business case for releasing early. It could be that you have a major conference coming up and you just need to have something to talk about. That happens all the time. But this is what we're talking about here with time series data on our issues. You can see over here all the issues that are coming in, and ultimately this is another dashboard for your organization. Over here we can see the severity of all of our issues coming in. So a priority

A

one is a really big deal, like our GitLab server does not turn on. Our entire organization is stuck because our GitLab server doesn't turn on, and we have an Ultimate license. These are the ones where I tell a support engineer: go look at the priority ones first. I'm going to assign my senior and principal support engineers to these sev ones. Now let's take a look at this undefined category.

A

What undefined means is that, generally speaking, a customer comes in and makes a support ticket, and then our engineers get to decide: is it priority one? Is it priority two? Is it priority three? Is it priority four? So your undefined support tickets are really your backlog. They haven't been triaged yet and, as you can see, the number of undefined support tickets is creeping up and up and up for GitLab. So now you have a bunch of questions, like: is this okay?

A

Is this something that, as an organizational leader, I am okay with, or do I need to hire more support engineers to get this untriaged issue count lower? Our metrics give us visibility into that. Let's just say that we hire a bunch. I want to see this decrease in January, February, and March, and this is my graphical way of finding out: is the business decision that I made, hiring more support engineers, paying off for my organization over time?
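To make the backlog question concrete, here is a small sketch that counts untriaged tickets per month and checks whether the backlog keeps growing. Representing an untriaged ticket as a severity of `None` and months as `"YYYY-MM"` strings are assumptions made for the example.

```python
from collections import Counter


def untriaged_by_month(tickets):
    """Count tickets per month that have no severity assigned yet."""
    counts = Counter(month for month, severity in tickets if severity is None)
    return dict(sorted(counts.items()))


def is_growing(series):
    """True if every month has more untriaged tickets than the month before."""
    values = list(series.values())
    return len(values) > 1 and all(b > a for a, b in zip(values, values[1:]))


# Hypothetical tickets: (month opened, severity or None if not yet triaged).
tickets = [
    ("2020-10", 1), ("2020-10", None),
    ("2020-11", 3), ("2020-11", None), ("2020-11", None),
    ("2020-12", None), ("2020-12", None), ("2020-12", None), ("2020-12", 2),
]
backlog = untriaged_by_month(tickets)
print(backlog)
print("Backlog growing every month:", is_growing(backlog))
```

Running the same count after a hiring decision shows whether the untriaged number actually comes down in the following months.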

A

So this is the dashboard to look at for that. One last chart: which developers to hire. Generally speaking, all of your managers are going to tell you, "I want more headcount, I want more headcount." Well, not everyone gets headcount. If there's one team that does all the Ruby development and another team that does all the JavaScript development, I need some metrics to figure out how many developers I actually need.

A

Otherwise it can get really out of line. One of my gauges for that is: what percent of all the code in my organization is Ruby? Well, over here, it's around 70%.

A

So that means that around 70% of my developers should be Ruby developers. For JavaScript it's around 20%. What I'm looking for here is not an exact relationship between my developer percentages and what I have in this chart. What I am looking for is that there's not a big discrepancy. If there's a big discrepancy, that raises a flag to me at the VP level: there's something that's out of line here. Being off by 10 or 20 percent, that's okay. Being off by 30 or

A

40 percent, that's going to raise some flags: why are we so out of line here? So I know that I just went through a bunch of charts. Let's go through them one at a time. Throughput: how efficient is my development? Code review: can I help with any of these outstanding merge requests, or are my developers fighting? Issues by month:

A

This can tell me if morale may be low, if I maybe need to spice things up during the summer months or something. It also tells me what the quality of my releases is. If I have a release in June and then a spike in July, that's a problem. To a certain extent it's normal, but if it's too high, then we're just releasing bad code, and our customers don't like it because it doesn't work.

A

Another one is: how are my customer support tickets trending in time? Do I need to make new hires to alleviate this? The last one is: what sort of developers should I hire, how many of them, and at what percent? This is just like how a plane flies: as a VP or C-level, I have to have a lot of gauges, and these are some of the gauges that I have for my organization. So let's actually just talk about this a little bit.
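The language-share gauge from the recap above can be sketched as a simple comparison. The code shares of roughly 70% Ruby and 20% JavaScript come from the session; the developer shares and the 30-point flag threshold are assumed values for illustration.

```python
def staffing_gaps(code_share, dev_share, flag_at=30.0):
    """Compare each language's share of code with its share of developers.

    Returns {language: (gap_in_points, flagged)}. A positive gap means
    relatively more developers than code. `flag_at` follows the rule of
    thumb from the talk: small gaps are fine, gaps of 30 or more raise a flag.
    """
    gaps = {}
    for lang in sorted(set(code_share) | set(dev_share)):
        gap = dev_share.get(lang, 0.0) - code_share.get(lang, 0.0)
        gaps[lang] = (gap, abs(gap) >= flag_at)
    return gaps


code_share = {"Ruby": 70.0, "JavaScript": 20.0, "Other": 10.0}
# Hypothetical team composition, deliberately out of line with the code base.
dev_share = {"Ruby": 35.0, "JavaScript": 55.0, "Other": 10.0}

gaps = staffing_gaps(code_share, dev_share)
for lang, (gap, flagged) in gaps.items():
    print(f"{lang}: {gap:+.0f} points{' (flag)' if flagged else ''}")
```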

A

The only reason why we are able to give you so many great business-level organizational metrics is that GitLab is one tool for the entire DevOps life cycle. If you had some of your CI in GitHub, some of your CI in Jenkins, some of your deployment in something else, and some of your security stuff in something else, it would be much, much more difficult to get some of these metrics, because each of those tools has only a piece of that story. We collect metrics on your entire organization from end to end.

A

That's the reason why we are uniquely positioned to give you organizational-level metrics. We are better at this than a lot of other people simply because we are collecting metrics across your entire development life cycle. Otherwise, I would have to write some integration, make some chart, and integrate it into Jenkins and GitHub and something else to create one of these composite charts. That's just a mess. No one wants to do that.

A

Can I help clarify anything about what we just talked about before we jump into the Monitor stage?

A

All right, pretty clear for everyone?

C

Cool, I just want to add something, Chris. It's really interesting to see how you use the rest of the analytics, but actually the one that I use the most to showcase what you just said is the value stream.

C

Is that part of what we were seeing, or is it another thing?

A

Yeah, it's the last slide that I mentioned, and I actually forgot to mention it over here: demo organizational bottlenecks. I need to go pull this up real quick. Thanks for bringing it up, Bruno. So let's talk about value stream analytics real quick. This is something that we are maturing, but in all honesty, if I were a VP or C-level, this is one of the reasons why I'd buy GitLab. We talked earlier about the danger of relying on word of mouth.

A

The danger of relying on word of mouth is that, if I'm a C-level, I have so much stuff going on. I'm not in the weeds of any individual group, because that's why I delegate down. A lot of those people do rely on word of mouth, and obviously the problem with relying on word of mouth is the sample case that we just talked about. Value stream analytics can be set up to be basically whatever you want it to be.

A

In our particular instance, we set it up so that this is the amount of time that something sits in an issue before someone picks it up. After someone has assigned themselves to it, this is the amount of time before they start coding. This is the amount of time that it takes for someone to code it. At the end of it, a merge request is created, and that's the event that triggers the next phase. Test: this is the amount of time that the test phase of our CI run takes, which is really quick.

A

Part of the reason why is that GitLab is great at this. Review: the amount of time that it takes for a merge request to get reviewed and finally merged, seven days. Staging: we don't collect data on this, but let's just say that your environment's different and has three staging categories, because sometimes you deploy to the cloud, sometimes you deploy to on-prem, and sometimes you deploy to a test environment. So you could have stage one, stage two, and stage three.

A

This is completely customizable, but at a high level, what we are talking about with value stream analytics is collecting metrics on each of the phases in your development life cycle. We just saw how we do that here with GitLab. Here is how this helps with reducing reliance on word of mouth: let's just say that I'm an engineering VP. I've got a director that's in charge of testing. I've got a director that's in charge of development.

A

I've got some project managers that are in charge of creating issues and assigning them out, and they're all telling me, "I'm doing such a great job." The fact of the matter is: well, we just missed the last three releases, and everyone's blaming each other. So I now need some metrics to actually understand what's going on in my organization, and that's what value stream analytics does.

A

What a lot of people do is assign almost one business unit to each of these categories. Your group of developers falls under the code category, your test and QE people fall under the test category, and so on and so forth.

A

Your project managers fall under this category, and now I have metrics on which phases in my development cycle are my bottlenecks and which ones are my relative strengths, so that I can have more of a metrics-based performance conversation when people are asking for promotions. That's what we're talking about with value stream analytics. Thanks for bringing it up, Bruno.
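The idea of timing each phase and then finding the slowest one can be sketched as below. The stage names loosely follow the ones walked through in the session; the event keys (`created`, `assigned`, `first_commit`, `mr_opened`, `merged`) and the sample timestamps are hypothetical and do not reflect GitLab's data model.

```python
from datetime import datetime
from statistics import mean

# Each stage is (name, start event, end event). Event keys are assumed names.
STAGES = [
    ("issue", "created", "assigned"),
    ("plan", "assigned", "first_commit"),
    ("code", "first_commit", "mr_opened"),
    ("review", "mr_opened", "merged"),
]


def stage_hours(items):
    """Average number of hours spent in each stage across all work items."""
    result = {}
    for name, start, end in STAGES:
        durations = [
            (item[end] - item[start]).total_seconds() / 3600 for item in items
        ]
        result[name] = mean(durations)
    return result


def bottleneck(averages):
    """Name of the stage with the highest average duration."""
    return max(averages, key=averages.get)


# Two hypothetical work items with a timestamp for every event.
items = [
    {
        "created": datetime(2020, 12, 1, 9),
        "assigned": datetime(2020, 12, 1, 13),
        "first_commit": datetime(2020, 12, 2, 13),
        "mr_opened": datetime(2020, 12, 3, 13),
        "merged": datetime(2020, 12, 10, 13),
    },
    {
        "created": datetime(2020, 12, 2, 9),
        "assigned": datetime(2020, 12, 2, 17),
        "first_commit": datetime(2020, 12, 3, 17),
        "mr_opened": datetime(2020, 12, 5, 17),
        "merged": datetime(2020, 12, 12, 17),
    },
]
averages = stage_hours(items)
print(averages)
print("Bottleneck stage:", bottleneck(averages))
```

Because the stages are just a list of start and end events, adding the extra staging phases mentioned above would only mean appending more entries to `STAGES`.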

C

Thank you, Chris. Yes, for the first questions that you mentioned in the first slide, like how fast we develop and all that, that's my go-to resource to point out. It seems very clear: deployment frequency, deploys, and all that. And the quality is probably more based on the issues and all the other analytics, right?

A

Yeah, exactly. And the other thing is, you want to see improvement too. So tell me about the last 30 days. Tell me about the last 90 days. Tell me about the last seven days. Am I trending in the right direction? Let's just say that we implemented some new organizational change, or we hired some new directors and got fresh ideas. Well, tell me about the payoff. I've got to know about the payoff, and the ability to just toggle through some of this stuff gives me time

A

series data for figuring out: are some of these strategic bets that I'm making for my organization winning, and ultimately, did I make the right choice? With every change for an organization, there's risk. We can go in the wrong direction.

A

Are you going in the right direction? You need some metrics for that.

A

Okay, cool. The last thing I'd say with this is that without knowing where your bottlenecks are, you really have very little way of improving your organization in the first place. Think of Todd when he talks about first-order net new logos: if he didn't know that because he didn't have a chart, that would severely impact our ability to win in 2021.

A

Okay. So let's talk about the Monitor stage next. Actually, let's just stop briefly. Can I help clarify anything that we went through in the last couple of slides?

A

Awesome, let's continue, then. So let's talk about the Monitor stage. The question that this helps our leadership figure out, and this is more engineering leadership and not as much business value, is: how do we quickly identify and respond to problems in our applications and servers? This is a little bit different. The first question was almost primarily business value; this one's going to be more in the weeds. We're talking about someone like a VP of infrastructure. This is the type of thing that they care about.

A

So questions like: how quickly do my engineers identify, diagnose, and respond to application issues? Creating that support ticket and figuring out the answer to it, how fast is that? For a good organization, maybe one to two days; for a bad organization, maybe a couple of weeks. So we need a metric for the time from when someone raises a support ticket to when we have a response, know what's going on, and are on our way to solving it.

A

How do managers make sure that applications have enough resources: enough servers, enough CPU, enough memory? Are our websites too slow? Are they fast enough? We need to have metrics for this if we're Amazon.com. Many of you who took our earlier sessions know that the difference between a fast website and a slow website is maybe 50% in sales. So for Amazon.com or Netflix, absolutely.

A

You absolutely need metrics for figuring out how fast your websites are. How is this trending in time? Are we okay during the holiday season? Are we okay on a three-day weekend? That's what we're talking about here. We include powerful tools to bootstrap this process, and one caveat is that most of the features that we're talking about here work for Kubernetes-based environments.

A

So that's the caveat. To start talking about what we do around this, it's important to understand how normal people do this without all of these tools, and I used to be one of these people. Just to tell this story a little bit: I was a test engineer for half a decade. Every server that you have has maybe around 10 to 15 log files, and these are just gigantic files of text. If you create a new folder, it's logged.

A

If someone logs in to the system, it's logged. So to solve an issue, you're basically looking through these massive files trying to find something like: oh, there was an incorrect system call at time 2018.0906, and this should have been something else. It's very, very specific.

A

You're looking almost for a needle in a haystack. Another thing is that when a problem happens, you get this thing called a stack trace. A stack trace is basically the computer's way of saying something bad happened. What happened? A 500 happened. Why did it happen? It happened because of this. If you look at the code here, it tells you basically what happened. I wouldn't say exactly what happened, but pretty close to it. So when a customer issue comes in,

A

the number one thing that you need is log files, which will tell you what was actually happening on the server, and if you can get a stack trace, that also helps you. You need this information.

A

So what's the problem? The problem is that looking through all these log files is really, really hard. Each server may have 10, 15, 20 log files, and let's just say that you're Netflix. You don't have 10 servers; you have 100,000 servers. So do you want me to log into one server, look through all the log files, which are 5,000 pages long, log back out, log in to the next one, and eventually find the server where the problem happened? It just doesn't scale.

A

The other problem is that some problems are very difficult to trace. Just think about how complicated technology is. This is a fake example to illustrate it, but it's like asking: why was I in such a bad mood today? Well, the weather is really bad today in Raleigh, and something else happened, and I'm stressed out because of Christmas, and I didn't get a promotion.

A

Something like that: 10 things lined up and then some bad outcome happened. It's the same thing with servers. It's literally like 10 people tried to log in, we only wrote the code for three simultaneous logins, and then this other thing happened where our load balancer

A

did this really weird thing that only happens during an update, and the update was happening at this specific time, and then this specific user got an error message. It's just really, really hard to trace. But your job is to figure this stuff out, because if people don't like your product because they run into these problems, then ultimately fewer people are going to buy your things. Those people are going to be vocal, and they're going to tell other people not to buy your product.

A

The ultimate problem is that this creates situations where quality engineers report a problem and developers deny it. So now you have a situation where a quality engineer comes in and does two weeks of work, maybe literally 60 to 80 hours of looking through these systems, and creates a report for the developer to go fix something. The developer says, "I don't see this on my system. I don't know what you're talking about. I'm not going to work on this." So how often did this happen when I was a quality engineer?

A

In a given year, out of all of the support tickets that I went through as a quality engineer, I'd say developers basically denied maybe 20 percent of them. Ultimately, what happens is that now you have deadlock. There's no progress being made, and all the while the customer is not happy, and also the team's not happy.

A

All this is to say, long story short: it is hard to be a quality engineer. It is hard to accurately diagnose a problem. Without an accurate diagnosis, developers don't know what to do. Sometimes they purposely don't know what to do, because they don't want to work on stuff, and it ends up in very nasty situations a lot of the time. So there needs to be some better way of doing things, and what we're talking about is ultimately paid tools for monitoring your servers.

A

Let's just say that we're looking for something like this in a stack trace. As opposed to me manually logging into each server and looking through all of them, what if I had some sort of third-party machine that was monitoring all of the log files for all the servers in my environment? That's what we're talking about here, and that's actually something that GitLab does. We can monitor your log files for you and report errors as they occur. That means it's all automated.

A

I don't have to have some QE person log into the machine and look through this 5,000-page block of text trying to figure out when the mistake happened. Because GitLab is looking at your log files in real time, it can automate the creation of an issue as soon as the mistake happens. The reason why this is good is, one, it saves QE a lot of time, and two, it is an objective source of reporting to show developers.

A

So it's not like an argument where someone is trying to say something and it's not getting through. Now, all of a sudden, GitLab, which is a robot and an all-automated solution, has reported this. It's hard to really argue with what our tools are saying.

A

So we have automated creation of issues as errors surface. As soon as that server problem happens, as soon as that stack trace shows up in our log file, we'll create an issue. Then, when a customer support ticket comes in, you can go to the issues list or go to the support board. This is all fake.

A

This isn't a good example, but it will create an issue for you that says what happened, on what server, in what log file, and when. Now, all of a sudden, it has automated a lot of the stuff that I used to do by hand. This would have saved me a ton of time if I had had it in my environment. Let's talk about some of the business value that we get out of this: reducing developer/QE friction, so those nasty situations we were talking about, and ultimately automating some of the quality engineer workload.
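To illustrate the general idea of turning log errors into issues automatically, here is a toy sketch. It is not GitLab's implementation: the log format, the `find_errors` helper, the server name, and the issue fields are all assumptions made for the example.

```python
import re

# Assumed log format: "<date> <time> <LEVEL> <message>"; stack trace lines
# that belong to an error are indented underneath it.
ERROR_START = re.compile(r"^(?P<time>\S+ \S+) ERROR (?P<message>.*)$")


def find_errors(server, log_name, text):
    """Scan log text and build one issue-like record per ERROR entry.

    Each record says what happened, on which server, in which log file,
    and when, plus any indented stack trace lines that follow.
    """
    issues = []
    current = None
    for line in text.splitlines():
        match = ERROR_START.match(line)
        if match:
            current = {
                "title": match["message"],
                "server": server,
                "log_file": log_name,
                "time": match["time"],
                "stack_trace": [],
            }
            issues.append(current)
        elif current is not None and line.startswith((" ", "\t")):
            current["stack_trace"].append(line.strip())
        else:
            current = None
    return issues


# Hypothetical log excerpt.
sample_log = "\n".join([
    "2020-12-14 09:15:02 INFO user alice logged in",
    "2020-12-14 09:15:07 ERROR 500 Internal Server Error",
    "    at SessionPool.acquire (pool.rb:42)",
    "    at LoginController.create (login.rb:17)",
    "2020-12-14 09:15:09 INFO folder /tmp/x created",
])
issues = find_errors("web-01", "production.log", sample_log)
for issue in issues:
    print(issue)
```

A real setup would run something like this continuously across every server and open an issue from each record, which is the manual work the speaker describes doing by hand.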

A

This isn't really a case where we have to be afraid of automating away people's jobs. The people who actually do this stuff

A

really want some sort of solution that is going to allow them to do this better. As someone who used to do it, it was a pain in the butt to go through all these servers, look through all the log files, and spend maybe four days trying to recreate a situation, only to create some issue that the developers don't agree with and don't even do anything with. I would have really, really appreciated having some sort of third-party system to automate all of that stuff.

A

Okay, and ultimately what we're talking about here, actually, this is different. I'm sorry, different slide. The other type of environmental metric that we really need to keep track of, and I alluded to this earlier, is how fast my website is performing, right? On the bottom here are annual financials for amazon.com. Five years ago they were at 100 billion, in 2019 they were at 280 billion, and they're around 400 billion now, right? So let's actually take a look at some of these things.

A

So, just some studies. Pinterest increased new customer signups by 15 percent after they reduced wait times for the website, made the website faster. The British Broadcasting Corporation found that they lost 10 percent of users for every additional second their site took to load.

A

So, basically, when all of your revenue comes from displaying ads, and ad revenue is how you make money, you are losing 10 percent of your customer base if your tech isn't fast enough. Google found that 53 percent of mobile website visits were abandoned if the site took longer than three seconds to load. All that is to say: imagine a 5, 10, or 15 percent reduction in customer adoption for amazon.com and how much that would affect their business.

A

So if the growth we're talking about is 60 billion, to not invest in your servers and to end up with slower speeds may actually reduce that 60 billion to a 55 billion, maybe 50 billion, increase from new adoption, right? That's what the data shows from all of these industry metrics, and so it makes a lot of sense for Amazon, Netflix, and all these other sites to really invest in their technology and make sure their stuff is as fast as possible.

A

So how does GitLab actually do this, right? Well, before we jump into that, let's talk about what we need to do to get this to work. At a bare minimum, we have to make sure that we have the right number of servers, right? So maybe we need to buy new servers. And then sometimes you need to address a more specific bottleneck, where you don't actually need more CPU and memory.

A

Maybe you have a network bottleneck because your people are downloading too much stuff, so you can basically increase network capacity by itself, maybe buy more wiring, bigger wiring, more wiring systems, as opposed to just putting in more servers, which isn't going to solve your problem.

A

So ultimately, what we're talking about here is that we need a tool to monitor our environment so our business can proactively identify and respond when more resources are needed. My guess is that this very rarely comes up in y'all's customer conversations. Is that correct?

C

Sorry, Chris, can you repeat the question?

A

Does this subject come up in customer conversations? Do you all talk about this in your conversations? My guess is no.

C

I don't. Rarely.

B

Yeah, this is normally solutions architect, infrastructure, how-to-set-things-up type of stuff. That's farther down the line, yeah.

C

Sometimes, yeah. Well, sometimes they don't even know we have CI, so we don't get to talk about monitoring.

A

Yeah, that's a good point. So we can go through some of the session pretty quickly then. Let's talk about how we do this. We do this with this thing called Prometheus, and you get this really awesome dashboard. It tells you CPU usage, memory usage, network usage. It gives you time series data over here. So if customers are complaining, hey, your app is really slow, now I can go log into Prometheus and say: oh wait, we ran out of memory at that point in time.

A

That's why it was slow, right? So now we have a system to make sure that we are doing our part as business-level leaders to be in that elite tier of website load times, which ultimately is going to drive revenue for my business. That's what we're talking about here.
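The "we ran out of memory at that point in time" lookup can be sketched against time series samples like the ones a monitoring dashboard plots. This is a minimal illustration under stated assumptions: the `(timestamp, usage)` sample format, the 0.95 threshold, and the `saturated_windows` helper are hypothetical and are not part of Prometheus or GitLab.

```python
from typing import List, Tuple

# Assumed sample shape: (unix timestamp in seconds, memory used as a fraction of capacity).
Sample = Tuple[int, float]


def saturated_windows(samples: List[Sample], threshold: float = 0.95) -> List[Tuple[int, int]]:
    """Return (start, end) timestamps of periods where usage stayed at or above threshold."""
    windows = []
    start = None
    prev_ts = None
    for ts, usage in samples:
        if usage >= threshold:
            if start is None:
                start = ts  # a saturated period begins here
        elif start is not None:
            windows.append((start, prev_ts))  # period ended at the previous sample
            start = None
        prev_ts = ts
    if start is not None:
        windows.append((start, prev_ts))  # still saturated at the last sample
    return windows


def was_saturated_at(windows: List[Tuple[int, int]], ts: int) -> bool:
    """Check whether a customer-reported time falls inside any saturated period."""
    return any(start <= ts <= end for start, end in windows)
```

Given the time of a customer complaint, `was_saturated_at` answers whether memory pressure lines up with the slowdown, which is the kind of question the dashboard is being used for here.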

A

One caveat about Prometheus: Prometheus is really for Kubernetes. So if you hear Kubernetes, you can talk about it. We have a lot of stuff that makes this easier, but we're not talking about stuff that runs outside of.

A

Yeah, there are some additional things that I want to tack on here. We already talked about resource monitoring. Something additional that I haven't talked about is automated alerts, right? So let's just say that we're having a spike in customers using our servers, we're about to run out of memory, and we need to go do something quick. Well, we can actually customize when alerts are sent. We can send a Slack message to your director of infrastructure saying, hey, our app is about to run into some performance problems.

A

You need to go create new cloud servers and get this fixed, right? So that's another thing that we do. And ultimately, what we're talking about here is that Prometheus is pretty good. There are better solutions for a lot of the things that we talked about, and some of the names that you may hear are Heroku and New Relic.
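The alerting idea described above (decide when an alert fires, then message someone in Slack) can be sketched as follows. This is an assumption-laden illustration, not how GitLab or Prometheus configure alerts: the 0.9 threshold, the function names, and the webhook URL passed by the caller are hypothetical. The only external fact relied on is that Slack incoming webhooks accept a JSON body with a `text` field.

```python
import json
import urllib.request
from typing import Optional


def build_alert(server: str, memory_used_fraction: float, threshold: float = 0.9) -> Optional[dict]:
    """Return a Slack-style message payload if memory use crosses the threshold, else None."""
    if memory_used_fraction < threshold:
        return None
    return {
        "text": (
            f"Warning: {server} is at {memory_used_fraction:.0%} memory. "
            "The app may run into performance problems; consider adding servers."
        )
    }


def send_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Slack incoming webhook URL supplied by the caller."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Keeping the threshold decision in `build_alert` separate from delivery in `send_to_slack` means the "when do we alert" rule can be customized and checked without touching the network.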

A

These are really, really, really big businesses, and just like how we are a quarter circle in some of these things, they are the complete circle on the maturity page, right? All right, so let's actually just bring everything home. Who actually read about what I talked about with the hack of the US Commerce Department? I did. It's crazy. Yeah, it's really a big concern. What did you read about it, Josh?

A

What are some of the things that you're thinking about with it?

D

Well, I mean, there's still a lot we don't know about what happened, but it's a significant breach, likely from Russia. So I've already seen a couple of my colleagues on the intelligence community side of things sending out emails to customers pointing to all that's happened and kind of drawing their attention to our security offerings and whatnot. So.

A

Yeah, cybersecurity is something I've spent a lot of time researching and reading up on, because I think that in the next 10 years this will be one of the defining issues of our age. I'm not talking about some small thing where technology changes something that people don't even notice. I think that identity theft and cybersecurity are going to redefine our planet in the next 10 years. So let's just talk about some metrics. What do the analysts actually say?

A

Well, the analysts say that by 2021, cybercrime is projected to hit around 6 trillion dollars globally. That is more money lost than all of the money in drug trafficking and all of the money in violent crime, and basically what we're talking about is that cybercrime is going to be pretty much like.

A

Let's actually show you.

A

So, to put this in perspective, that is more money than the vast majority of economies on the entire planet generate. France, for example, has a GDP of around maybe 2 trillion, so there's more money lost as a wealth transfer to cybercrime than all the economic output of France in 2021. That's what we're talking about.

A

So it's a really big problem. From a business perspective, everyone's read about lawsuits. These are becoming more and more common. Even the US federal government was hit; we just talked about that. The worst-case scenario is that Russian hackers learn how to print the US dollar. It's theoretically possible. If that's the case, then our currency could collapse. These are real issues, and it requires the smartest people on this planet to figure out something new, because the current paradigm is not working.

A

So what we're talking about, just finishing up the story, is the Protect stage, right? We have some very limited defense capabilities. Out of all of the product stages that we have, Protect is probably the least mature, and what we're really talking about here is server defense.

A

When we talked about the security stage, we were talking about scanning your merge requests for problems. What we're talking about here, and why this is a separate stage, is that this is going to defend your servers that are running actual web applications and things like that. So that's why it's two separate stages and how we split them out. It's definitely not mature, but if you're SMB and you don't have anything, it's definitely better than what you're doing right now.

A

Obviously, many enterprises have 100 or 150 different security tools, and they're not going to be super interested in this stuff. But the way to think about it is that if you don't have anything right now, then it's way better than not doing anything, and to have it all in one in GitLab will ultimately save you time and money: one development panel, one less tool to evaluate, one less license cost.

A

So it can definitely be something that people are interested in. Another theme here is that most of our defend capabilities are limited to Kubernetes clusters.

A

So if our web servers are running in Kubernetes, that's what we're talking about. We have certain features for defending the clusters themselves, and we have additional features for defending specifically web apps if they're running in Kubernetes. So let's talk about what we have here. Another caveat is that this roadmap is changing all the time, so some of these slides may actually be out of date. But when I made this presentation about a month and a half ago, this was our current offering. Offering number one.

A

We have a Kubernetes cluster, so each of these is a server, right? This is a server, this blue thing's another server, and what we're talking about here is that out of our server comes network data, right? So when someone's accessing our website, we're giving out an HTML fragment.

A

That's how someone accesses our website, right? So what we can actually do is monitor all of the data that goes out of each of these servers. I can now put up some sort of thing that's monitoring all the web traffic that's going out, and it's like: HTML fragment, good. HTML fragment, good. SSH key, that's unusual, right? Someone's username and password, that's unusual, right? So that's what we're talking about here: monitoring the network traffic that's leaving your servers, and also monitoring the network traffic

A

that's going between the internals on each server. Not to get too granular here, but if you imagine under the hood of your car, there are all these pipes going around. I can monitor whether or not the right stuff is going from one component in my server to another, right? And if I can put up a lot of watchers, then I can figure out, hey, this is unusual, maybe I should block this thing. That's what we're talking about here.
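As a rough illustration of the "HTML fragment, good; SSH key, unusual" idea, the sketch below flags outbound payloads that look like private keys or credentials. It is a toy example under stated assumptions and does not describe how GitLab's network monitoring is implemented: the two regular expressions and the `classify_outbound` helper are hypothetical, and real traffic inspection involves far more than pattern matching on text.

```python
import re
from typing import List

# Hypothetical patterns for content that should rarely leave a web server.
SUSPICIOUS_PATTERNS = {
    "private key": re.compile(r"-----BEGIN (?:RSA |OPENSSH |EC )?PRIVATE KEY-----"),
    "credentials": re.compile(r"(?i)\b(?:password|passwd)\s*[=:]\s*\S+"),
}


def classify_outbound(payload: str) -> List[str]:
    """Return labels for every suspicious pattern found; an empty list means nothing was flagged."""
    return [
        label
        for label, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(payload)
    ]


def should_block(payload: str) -> bool:
    """A watcher could block, or at least report, any payload with a finding."""
    return bool(classify_outbound(payload))
```

An ordinary HTML response produces no findings, while a payload containing a key header or a `password=` pair is flagged for review or blocking.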

A

The second thing that we're going to talk about is firewalls. Firewalls are something that we've all messed around with and had to customize. The whole idea is that if you have a web application running in Kubernetes, then we have this thing called ModSecurity, and what that does is it's specifically going to.

A

You can basically say, hey, I want to only have my app serve the United States. If traffic comes from IP addresses that are associated with some country that we're at war with or something like that, then we want to block all that stuff. I'm trying to not get too political here, but we can block IP addresses by the country that they're from, and we can block specific IP addresses. Say there's some customer, and it's a DDoS attack, right?

A

So I'm just going to hit GitLab SaaS 10 million times and try to degrade its service. I can block their IP address. So that's what we're talking about here: traffic can be blocked by country and can be blocked by specific IP address, and that's another way to protect your web applications.
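The two blocking rules described above, by country and by specific IP address, come down to a membership check like the one sketched here. This shows the logic only and is not ModSecurity's rule syntax or GitLab's configuration: the country codes, the country-to-network mapping, and the blocked address are made up, using address ranges reserved for documentation.

```python
import ipaddress

# Hypothetical country-to-network mapping (documentation-only address ranges).
# A real firewall would consult a maintained GeoIP database instead.
COUNTRY_NETWORKS = {
    "AA": [ipaddress.ip_network("192.0.2.0/24")],
    "BB": [ipaddress.ip_network("198.51.100.0/24")],
}

BLOCKED_COUNTRIES = {"BB"}
BLOCKED_ADDRESSES = {ipaddress.ip_address("203.0.113.7")}  # e.g. a DDoS source


def is_blocked(client_ip: str) -> bool:
    """Return True if the client is blocked by specific address or by country."""
    address = ipaddress.ip_address(client_ip)
    if address in BLOCKED_ADDRESSES:
        return True
    for country in BLOCKED_COUNTRIES:
        networks = COUNTRY_NETWORKS.get(country, [])
        if any(address in network for network in networks):
            return True
    return False
```

For example, `is_blocked("203.0.113.7")` is true because of the per-address rule, and `is_blocked("198.51.100.20")` is true because that range is mapped to a blocked country in this made-up table.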

A

So, rounding everything out, thanks for attending today's session. I'll stick around for another couple of minutes if I can help clarify anything. If you want credit for this class, fill out the test, which I'll generate at the end of this session. But thank you all for coming. I hope this was helpful.

C

Thanks, Chris. Yeah.

C

Thank you very much.