Description
Slides: https://docs.google.com/presentation/d/15CzbqO3leXOnH3Pwz94zYRzeOT8g92YQK7wC-Ii8HzU/edit
Meetup: https://www.meetup.com/everyonecancontribute-cafe/events/282736146/
0:00 Introduction
1:40 Presentation start
3:16 3 Pillars of Observability: Metrics, logs, traces
10:47 Profiling
11:37 Overlap of Observability
15:47 Known and Unknown
17:14 Observability example: Docker Hub Rate Limits
18:55 OpenTelemetry & Tracing History
23:08 Use case: CI/CD Observability https://gitlab.com/gitlab-org/gitlab/-/issues/338943
25:31 Use case: Quality Gates
28:41 From DIY Monitoring to Observability
30:40 o11y.love as learning collection
31:30 Group discussion
A: I think it's the 47th edition of the everyone can contribute cafe. After last year's session, where we talked about Raycast, a workflow application on macOS, shared by Michael Eichner, we sneaked the Opstrace acquisition into the meetup, with Seb and Matt joining live. We thought about, well, what's going on with observability, how does it help with CI/CD pipelines and so on, and this somehow sparked the idea for today's meetup. To get the conversation going, I've prepared a short slide deck, just to bring everyone up to speed on what we are talking about, so if you're new to monitoring or observability you can get a feeling, get an idea. I don't want to do any frontal screen sharing where I'm speaking, you're listening, and we're not joining the conversation. So whenever you have a question, want to discuss a specific topic, or something is unclear, I would encourage you to just jump in and unmute yourself. That being said, Niklas helped me prepare the slides for today, and I think we should just get going.

The thing is, where to start with observability? Something I always think about when talking about observability and monitoring... let me quickly make the slide bigger.
A: We had state changes over time, metric data points and everything else, and in the past years we have been moving on to defining service level objectives (SLOs) together with an agreement (SLA). So I'm agreeing with my customer to have 99.5 percent availability; we have objectives which are normally higher than the agreed level, because we basically want 100 percent, but not really; and there are certain indicators. This moved into defining the four golden signals from the Google SRE books: latency, traffic, errors, saturation. But at a certain point you want to see more, to get more insights, to monitor things, to observe things.
A: There has been quite some discussion about this in the past years. So let's get things started with the first pillar, which could be metrics. We have Prometheus; simply said, Prometheus is a daemon which collects metrics from endpoints, does some auto-discovery and much, much more. In the end it's a simplified way of getting insights into your application and your services by collecting metrics.

It has its own query language, called PromQL. There are certain functions which allow you to aggregate and calculate the metrics to present them in a different view. This is helpful for writing your own queries, generating dashboards, and later on also working with alerting. The thing is, as a developer it's often hard to start with that. Where do I actually add my code? Where do I start with metrics for my monitoring, for my SLOs?

A: You can define your own metrics with app instrumentation, but the thing is also: you have infrastructure monitoring, like memory, CPU, I/O on the node, maybe in your Kubernetes cluster, on the pods, on the cluster nodes and much, much more. For your services there are Prometheus exporters, for example for Docker and other specific environments, and there are client libraries available to make your life much easier.

Learning this can be playful: take the Python example shown on the screenshot, build something, and for example deploy it into Kubernetes. In Kubernetes, install the Prometheus Operator, use the custom resource definitions for ServiceMonitor, and inspect the metrics. That can be a fun way to learn and move on: okay, I've implemented metrics now. But the thing is, metrics are not the only thing we want to look at.
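The Python example from the slides isn't reproduced in this transcript, but the idea can be sketched. In practice you would use the official `prometheus_client` library; this stdlib-only sketch just shows the text exposition format that Prometheus scrapes from a `/metrics` endpoint, and the metric name `app_requests_total` is invented for illustration:

```python
# Counters live in process memory; a real client library manages this for you.
REQUESTS = {"/": 0, "/api": 0}

def observe(path):
    """Increment the request counter for one handled request."""
    REQUESTS[path] = REQUESTS.get(path, 0) + 1

def render_exposition():
    """Render the counters in the Prometheus text exposition format,
    i.e. what a scrape of /metrics returns."""
    lines = [
        "# HELP app_requests_total Total requests served.",
        "# TYPE app_requests_total counter",
    ]
    for path, count in sorted(REQUESTS.items()):
        lines.append(f'app_requests_total{{path="{path}"}} {count}')
    return "\n".join(lines) + "\n"
```

Serving this string over HTTP at `/metrics` is all a scrape target needs; PromQL then aggregates over the `path` label.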
A: So we are thinking about signals, something which describes a certain state or a certain thing which happens, and we think about metrics, logs, events, traces and profiles (or profiling). We also need to break up monoliths into microservices, and there are so many things to unpack that it's a great way to focus on app instrumentation the first time. A similar thing holds for logs.

A: There are so many decisions to be made, so many tools involved and stacks to be evaluated, that it makes sense to really focus on evaluating the options, and as a developer to focus on: how do I log things? What is structured logging? How can I improve the performance of my logs? And so on. This also evolved in parallel to metrics over the past years.

It's still a long story with many, many decisions to be made, which leads us to traces, distributed traces in this specific regard. Nicholas, please correct me if I'm saying something wrong. The thing is, traces and spans work in a different way to logs: a span has a start and an end time and some context, so you are telling the user where a specific thing happened. You also need code additions if there is no automated tracing, and you need to learn about specific implementations, backends and collectors. Jaeger is a tool which can do it, Grafana Tempo is a tool which can do it, and there is a collector with OpenTelemetry as a specification. So again, a long learning curve, many things to unpack, and tracing became the third pillar of observability, at least probably.
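The "a span has a start, an end, and some context" idea can be made concrete with a deliberately simplified sketch. This is not the OpenTelemetry API, just the underlying data shape; every name and field here is illustrative:

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    """A toy span: an operation name, start/end timestamps, a trace id
    shared by all spans of one request, and a link to the parent span."""
    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None
    attributes: dict = field(default_factory=dict)

    def finish(self):
        """Close the span and return its duration in seconds."""
        self.end = time.monotonic()
        return self.end - self.start

# One incoming request becomes a root span...
root = Span(name="GET /checkout", trace_id=uuid.uuid4().hex)
# ...and each downstream call becomes a child span in the same trace.
db = Span(name="SELECT orders", trace_id=root.trace_id, parent_id=root.span_id)
db.attributes["db.system"] = "postgresql"
db.finish()
root.finish()
```

A tracing backend like Jaeger or Tempo reassembles all spans sharing one `trace_id` into the waterfall diagram discussed below.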
B: People also knew traces before. You can see traces, well, not every day, but if you're working with a browser and need to debug your frontend locally: when you click on Inspect and open the Network tab, everything in there is also a trace. In the end, every request that is made is a trace.

So why is tracing nothing entirely new? Because traces were already established. You go to the Network tab, refresh the site, and then you see, at the top, the diagram, this trace in the end. Since you're doing multiple requests while loading something, you see what is happening; whether it zooms after that depends on how the website is configured.

B: Google's presentation app, for example, doesn't load things fully in parallel; it goes step after step. The interesting point is that you see the total time, which is the longest span in the end, and then you can also see parallel requests. Where parallel requests often happen is in distributed or service-oriented systems.

B: So, for example, think about a simple microservice architecture with three services: a booking service, an order service and a transfer service, and you want to get insight into this architecture: which service called whom, with which data, and how long does it take? All of that is what distributed tracing helps with in the end. The big benefit is that you can decide which data you want to put in to get value out of it. In short, it's about important choices.
A: This was a quick live tryout of the browser developer tools, thanks for the reminder. That was also the thing I was immediately reminded of when I saw Jaeger for the first time, I think in 2017 or 2018, and thought: oh, I can totally debug slow websites with this. There's certainly more behind it, and many, many new things came about.

I'm not planning to talk all day, but here's the thing: profiling also came around and became popular with open source in the past year, providing application performance insights, for example in C. When a function is called too often or takes too much time, you can see that in a graph and analyze it, similar to how you work with traces. It's an additional data source, so we could theoretically speak of the four pillars of observability now, but there's a discussion going on whether it's the pillars or something else with observability for now.

A: We also thought about discussing the overlap with observability. We have been used to collecting logs and metrics over the past decade; distributed tracing came a little later, because open source tools were being developed with OpenTracing and OpenCensus, and with open source the adoption just got wider and much more powerful, also in community building.

A: Where everything comes together, we actually need to put in profiling as a data source, as a way to observe things, to correlate, to analyze things. So we need all this data to be available, but there is a certain overlap in what is collected. For example, I could write a trace and a span similar to how I write logs in my application, defining timing points and durations and saying:

A: okay, the request from the client to the website depends on my microservices architecture in the background, and the query goes to the HTTP server, to the database cluster in the backend, to the Redis cache, and to the frontend again. What is blocking the service, for example?
B: I think it's also interesting where this comes from and how I would use it myself. Probably some people are already using it. At least the two intersections, metrics and logging, are mostly the standard in production apps, I would say, because you use them on a daily basis: you get metrics to get the data you want to see.

B: So this is a step that probably a lot of people do, and it's mostly the basic system that everyone has. It's also interesting what you see in this data regarding low volume versus high volume, meaning how much information needs to be stored. Metrics are mostly only numbers, so it's not that much data that needs to be stored. Logs can be really big; you need a big system for doing all the log handling.
A: I think data volume is a good point; you need storage for that. If you're planning to evaluate something or plan for the long term across metrics, traces and logging, I think the logs will consume the most space if they are not aggregated. And it needs backends, it needs tools which might need availability, high availability, distributed systems and so on. It's a challenge, and you might be seeing the three pillars of observability being discussed on the internet.

You might also see something which mentions the knowns and the unknowns, like the unknown unknowns we are not aware of and do not understand.

A: This is something mentioned, for example, in the honeycomb.io blog posts around what data they collect, which events and signals. I think they're using a column-based storage engine in their cloud, and they collect basically everything with the agent and the Beelines, so they have a lot of data available, and you might be drawing a conclusion from something. I think this is on the next slide.

A: Like: did you know that maybe DNS resolution latency increases your cloud costs? This is something I would probably not be thinking about; I know that DNS is always the problem, but in the end it's not really a known fact. We might come to understand it if we're just collecting all the data we have available. On the other side there are the known things, where monitoring means monitoring the state of your application: either ping works or ping does not work.

A: Let me see what else is there. One example which happened last year, and we discussed it in the everyone can contribute cafe, were the Docker Hub rate limits, where we didn't really know what would be happening. We knew that there would be limits when you're doing a docker pull, and after a while, I think 100 pulls in six hours or something, it didn't work anymore.

A: So we thought about what could be affected: our CI/CD pipelines, because we're using Docker, cloud native deployments, Kubernetes clusters and so on; organizations behind NAT, which is still a thing in modern infrastructure; and also cloud providers which act behind certain IP ranges. So we had a known state, we could simulate something; we wrote a Prometheus exporter back then and could monitor something.
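Docker Hub reports the pull limits in `ratelimit-limit` and `ratelimit-remaining` response headers, with values like `100;w=21600` (100 pulls per 21600-second window). A small sketch of the parsing an exporter like the one mentioned has to do; the header dict here is hard-coded sample data rather than a live registry request, and the gauge names are invented:

```python
def parse_ratelimit(value):
    """Parse a Docker Hub rate-limit header value such as '100;w=21600'
    into (count, window_seconds)."""
    count_part, _, window_part = value.partition(";")
    window = int(window_part.split("=", 1)[1]) if window_part else 0
    return int(count_part), window

def remaining_pulls(headers):
    """Extract the numbers an exporter would expose as Prometheus gauges."""
    limit, window = parse_ratelimit(headers["ratelimit-limit"])
    remaining, _ = parse_ratelimit(headers["ratelimit-remaining"])
    return {
        "dockerhub_limit_total": limit,
        "dockerhub_limit_remaining": remaining,
        "dockerhub_limit_window_seconds": window,
    }

# Sample headers, shaped like those returned by a HEAD request to the registry.
sample = {"ratelimit-limit": "100;w=21600", "ratelimit-remaining": "76;w=21600"}
```

Alerting on `dockerhub_limit_remaining` dropping toward zero turns the "unknown" rate-limit failure into a known, monitorable state.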
A: But the thing is, if you cannot detect that, or you would like to detect it: an unknown state could be that you're deploying something in your cluster, the CI/CD pipeline kind of works, but you have log lines with "too many requests". The problem is you cannot reliably detect that, and your customers see different prices on your website: they think they bought something for a hundred dollars, but actually it costs 200, because you increased the price and the update didn't reach them. This could be a problem for many businesses, and we thought about how to really understand all these things.

Now, coming from tracing, metrics and logs, and a little bit of profiling, to the idea of unifying all that: OpenTelemetry was founded. Just to circle back a little bit in time: in 2016 the open source projects OpenCensus, which I think was driven by Google, and OpenTracing were formed for distributed tracing.

A: OpenTracing was a specification plus client library implementations which allowed you to instrument your code and send traces to tools like Zipkin, Jaeger, Datadog, Lightstep and so on. This turned into dual development and overlaps, so OpenTelemetry was founded, aiming to merge OpenTracing and OpenCensus, and it became a CNCF project in the observability space.

A: The project was created, and in 2021 it also added metrics and logs to its agenda and became an incubating project, so hopefully soon it will be GA, ready to use, and graduating. Most recently, last week, OpenTracing has been deprecated, or there is the announced plan to deprecate it, which is linked. So OpenTelemetry is here to stay.

A: Now, what is it? How can you use or combine it with your existing tool stack, and how can you get started? One thing you need to understand is that there is a collector, or a sidecar, which consumes the traces and the metrics, but you still need to provide your own backends: for example, Jaeger for traces, Prometheus for metrics.

A: If you want to instrument your application, there are client libraries and SDKs in development which allow you manual instrumentation, with C++, Go and so on, which is linked in the getting-started documentation. Certain languages also allow you to do auto-instrumentation, something which I think places a proxy somewhere in the code and then does automated tracing or automated instrumentation.

A: For a visual overview, think of having the agent or the collector service running in your Kubernetes cluster, being fed from pods, from virtual machines and so on; this is a picture I copied from the docs. Other examples of adoption range from Kubernetes system components, where tracing has been added in 1.22 as alpha, I think, so the adoption is going further, to more ideas on using OpenTelemetry, for example with Opstrace tracing.

A: There has been an early demo shared last week or this week. The other idea is to link metrics with traces; this is called exemplars, if you run into that term. And one other thing which came about is CI/CD observability: tracing your pipelines to find out job durations and so on, which can also be achieved with OpenTelemetry.

A: And just keeping in there: we talked about tracing, what a trace and a span look like, the backends and so on; this is a copy from the earlier slide. The idea is that you can see, for example, that your cloud resource costs are very high because you have many long-lasting CI/CD jobs which are failing all the time, and you actually don't need them.

A: You have slow caches for the CI/CD in your infrastructure, and you have a certain network latency when containers are being pulled. This is something which is, I would say, hard to detect unless you're scraping the logs and trying to build that understanding by yourself. Now the idea is, and I have been working on this in the past two weeks, to define the jobs to be done.
A
This
is
described
in
this
gitlab
issue,
which
is,
I
think,
15
pages
long.
Meanwhile,
but
yeah.
The
idea
is
to
really
start
the
implementation
and
make
ci
cd
observability
a
breeze
in
the
future.
A
Okay,
the
other
thing
which
is
like
interesting
in
observability-
and
this
is
something
for
left,
shifting,
slos
quality
gates,
which
is
something
we
discussed
in
2020
so
like
one
year
and
some
months
ago,
around
captain
using
the
knowledge
of
metrics
and
alerts.
A
A
This
is
one
of
the
more
advanced
things
you
can
do
with
metrics
and
left
shift
dslos,
so
captain
basically
plays
the
quality
gate
and
you
can
use
either
the
graphic
graphical
interface
or
also
define
your
own
yammer.
If
you
want
to
no
it's
it's
very
easy
to
try
out,
and
there
are
many
tutorials
available
to
instrument
and
not
insurance,
to
measure
the
service
level
objectives
for
your
application.
A
For
me
personally,
this
would
have
helped
me,
for
example,
detect
certain
cic
c
plus
plus
co-routine
crashes,
on
the
stack,
but
only
with
with
1000
api
clients,
which
you
don't
have
in
a
development
environment
and
having
having
that,
for
example,
in
a
test
environment
would
certainly
have
helped
with
quality
gates,
not
merging
it
and
not
releasing
it
to
customer
environments.
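The quality-gate idea boils down to: measure service level indicators in a test environment, compare them against objectives, and fail the pipeline on a miss. A toy sketch of that evaluation, not Keptn's actual SLO format; the indicator names and thresholds are invented:

```python
# Hypothetical SLOs: each maps an indicator to a "must not exceed" threshold.
OBJECTIVES = {
    "error_rate_percent": 0.5,   # at most 0.5 % failed requests
    "p90_latency_ms": 250.0,     # 90th percentile latency budget
}

def evaluate_quality_gate(measured):
    """Return (passed, violations) for measured SLI values vs. the objectives.
    A missing measurement counts as a violation (treated as infinite)."""
    violations = [
        f"{name}={measured.get(name)} exceeds objective {limit}"
        for name, limit in OBJECTIVES.items()
        if measured.get(name, float("inf")) > limit
    ]
    return (not violations, violations)

# In a pipeline, a failing gate would block promotion to the next stage.
ok, problems = evaluate_quality_gate({"error_rate_percent": 0.2,
                                      "p90_latency_ms": 410.0})
```

A tool like Keptn adds the surrounding machinery: querying Prometheus for the SLI values, scoring partial passes, and reporting the result back to the delivery pipeline.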
A
This
is
a
definition
how
it
works,
and
this
is
a
playground
demo,
but
yeah.
The
thing
is,
captain
should
be
acting
as
a
can
can
act
as
a
quality
gate.
Promises
for
slos
and
simulating
a
production
environment
or
incidents
is
hard,
so
we
could
be
adding
chaos
engineering
to
that.
A
A
In
the
observability
space
in
the
monitoring
space
here
in
chaos,
engineering
similar
thing-
you
want
to
kill
your
parts
asynchronously
randomly
you
maybe
want
to
chaos
into
chaos,
engineering
with
network
connections-
maybe
I'm
I'm
creating
something
around
dns,
so
bgp
routing
and
still
verify
that
everything
is
operational
and
the
service
level
objectives
are
still
matching
see.
What
else
do
we
have
here?
A
And
use
the
benefits
of
cloud
environments.
The
other
thing
is
like
to
see
the
value
in
logs
metrics
and
traces
to
get
started
quite
easy
and
see
how
far
you
can
get
in
your
observability
story.
A
A
The
other
thing
which
is
interesting
or
which
which
hopefully
comes
around
is
like
some
machine
learning
which
allows
us
to
correlate
metrics
and
traces
in
the
future
and
to
easy
easily
make
it
more
easy
to
identify
any
bottlenecks
or
ci
cd
pipelines
being
blocked
or
external
resources
being
the
root
cause
of
it,
and
the
other
thing
I
want
to
highlight
is
cd
events,
which
is
a
newly
formed
specification
or
2b
form
specification
for
continuous
delivery
events,
which
sounds
very
interesting
to
join
and
spark
the
conversation
not
only
for
continuous
delivery,
but
also
for
cloud
native
environments.
A
A
A
A
C: I was just gonna ask, maybe just in general... I've been working on building an open source continuous profiler for about a year now, and one of the things I'm kind of interested in is: obviously there are tons of tools out there, tons of signals you can add to your workflows.
B
Probably
I
can
give
a
little
bit
view
on
my
view
on
this
so
because
we
started
we're
using
mostly
only
open
source
observability
to
it,
so
we
don't
use
any
vendor
right
now
and
what
we
did
so
in
my
past
company.
So
my
past
experience
we
used
for
standards,
but
in
my
current
company
we're
doing
a
full
other
way.
B
So
then
you
have
only
a
big
big
boiler
of
mud
of
information,
and
then
you
don't
know
where
to
start,
and
then
you
probably
ignored
here,
because
it's
not
important,
then
you
get
a
configuration
drift
probably
and
the
main
benefit
is
then
really
starting
with
less
information,
and
you
will
get
up
and
up
and
up
because
not
everyone
started
with
one
hundred
percent
and
not
everyone
is
the
google.
B
B
This
needs
to
be
different
components
and
mostly
all
newcom
components
that
coming
up
are
mostly
modular
monoliths
in
the
end,
so
that
you
can
turn
off
features
on
and
off,
but
you
don't
need
to
have
multiple
deployments
need
to
check
how
it's
working,
of
course,
the
blast
releases
a
bit
a
little
bit
higher,
but
practically
mostly
most
people
don't
hit
that
so
and
for
the
specialized
should
go
into
the
specialized
world.
That's
also
totally
fine
so,
but,
as
I
said,
I
think
not.
Everyone
has
the
same
problems
and
that's
also
why
not.
B
C
B
A
lot
of
insights
from
the
community
when
you
talk
to
people
how
they
handle
this
problem,
and
you
probably
oh,
this
was
a
little
bit
complex.
What
we
currently
doing,
probably
we
should
reshape
it,
and
I
think
the
main
role
is
overall
to
reduce
the
complexity
that
human
can
take
it
and
the
machine
can
take
that
complexity.
B
B
And
I
think
also
what
a
difference
can
be
if
you're
an
early
adopter
or
not.
So
if
you
are
starting
really
early
with
the
tools,
then
they
are
not
so
complete
and
you
grow
with
them
when
you,
when
you're
on
top
on
that,
then
it's
also
probably
not
so
problematic
to
get
this,
but,
for
example,
in
prometheus
yeah
we
didn't
know
we
had
like
we
implemented
for
us,
also
the
remote
right
internally
for
some
tools
that
we
are
using,
and
this
was
like
a
new
feature.
B
I
know
I
use
now
prometheus
for
five
years,
mostly
in
different
setups,
but
it's
also
hard
to
keep
up
on
all
the
information,
but
the
community
is
also
changing
all
the
parts
so
because
everyone
has
different
requirements.
It's
also
fine.
A: Speaking of keeping track of all the changes: Prometheus is building an agent mode, behind a feature flag at the moment, which is built on remote write, making it easier to have something like what the Pushgateway was in the past. Instead of just actively scraping metrics and connecting to the services, you may want to push something to Prometheus via remote write. This is going on, and I think it's a feature flag that's currently being tested.

The other thing: when I'm looking at CI/CD observability in GitLab, and this is the feature request issue I created, I'm worried about what happens when I'm adding this line of app instrumentation for OpenTelemetry.

A: For example, does it impact the application performance? Not just on a small installation where some CI/CD pipelines run, but on a large-scale system. I'm not sure if there are benchmarks or metrics available from early adopters allowing you to say: okay, this is a good thing to turn on by default, or: okay, this is something I don't want turned on by default.

A: The problem then is you're not collecting data while it's off, and in the case when you track a problem, turning on tracing and wishing for data to have been generated in the past is not possible. It's super hard. It's similar to: hey, we didn't collect logs from the server, and now we cannot SSH into it because it's broken. It's super hard to measure and define for a specific environment and say: this works that way.

A: One thing to really look into for this: have some sort of playground environment, staging environment, something like a dev environment, and really try to measure that for your use case, for your application. It adds more workload for yourself, but I do see the benefit of learning and understanding how it's being done, maybe onboarding new team members, documenting it for yourself, or even contributing to the wider community and helping provide feedback.
C: How do you kind of balance those kinds of competing aspects?
A
I
think
you
need
to
make
a
mistake
and
get
flawed,
for
example
by
logs,
so
you're
learning
by
mistakes.
An
incident
happens
and
you
have
100
gigabytes
of
log
files.
You
need
to
search
in,
and
your
elastic
search
cluster
is
not
happy,
for
example.
So
this
helps,
if
you
want
to
be
proactive
about
it,
I
think
taking
my
developer
head
on
I'm
thinking
of
I
needed
to
learn,
for
example,
structured
logging.
A
Previously
we
just
in
c
plus
plus
we
logged
everything
we
did
throw
our
sec
traces
and
the
logs
were
not
just
one
line,
it
was
multi-line,
it
was
not.
Users
couldn't
read
it,
but
developers
were
happy.
Developers
were
not
happy
because
users
created
back
reports
because
of
the
statutories,
but
in
the
end
I
think
having
a
common
sense
of
this
could
be
something
interesting
to
log,
for
example,
you're
starting
an
http
request,
you're
ending
it
finding
some
timing
points
which
are
helpful
for
logs
and
traces.
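Structured logging in this spirit can be as simple as emitting one JSON object per event instead of free-form multi-line text. A stdlib-only Python sketch; field names like `event` and `duration_ms` are just conventions chosen here, not a fixed schema:

```python
import json
import sys
import time

def log_event(event, **fields):
    """Emit one machine-parseable JSON log line per event."""
    record = {"ts": time.time(), "event": event, **fields}
    print(json.dumps(record, sort_keys=True), file=sys.stderr)
    return record  # returned so callers (or tests) can inspect it

# Timing points around an HTTP request, as discussed above.
start = time.monotonic()
log_event("http_request_start", method="GET", path="/checkout")
# ... handle the request ...
log_event("http_request_end", method="GET", path="/checkout",
          duration_ms=round((time.monotonic() - start) * 1000, 2))
```

One-line JSON records are cheap to parse and index, which is exactly what makes the "search 100 gigabytes during an incident" situation survivable.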
A
I
think
it's
a
it's
a
good
way
to
start
and
say
I
really
want
to
see
when
the
client
is
doing
an
http
request.
When
does
it
start?
When
does
it
end,
and
then
I
have
the
black
box
in
between,
I
could
go
the
route
of
saying,
I'm
just
adding
logs
in
the
different
thread
or
a
different
application,
and
then
I
have
the
tools
to
to
search
and
cover
that,
or
maybe
I'm
thinking
of
hey.
A
This
could
be
a
trace
we're
starting
here
and
we
are
forwarding
basically
the
trace
id
to
the
other
application,
and
then
I
get
to
see
a
timeline
with
specific
spans
and
I
also
have
the
possibility
to
add
more
context
to
it,
because
the
log
line
is
just
it's
text
or
it's
json
or
it's
something
else,
but
sometimes
you
really
want
to
add.
This
has
been
executed
in
a
docker
environment
with
version.
A
Whatever
specific
other
text
you
can,
you
can
add
or
enrich
to
the
trace
context,
and
this
helps
you
to
see.
Oh,
the
bottleneck
is
because
we're
using
a
two
old
version
of
docker
in
the
in
that
environment
and
for
some
reason
the
customer
requests
always
go
that
route.
Maybe
we
need
to
fix
our
aha
proxy
or
something
else,
so
I
think
thinking
of
use
cases
and
incidents
into
your
environment,
which
you
hopefully
have
from
the
past,
can
be
really
helpful
to
say
this
is
a
starting
point.
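Forwarding the trace ID between services is standardized by the W3C Trace Context `traceparent` HTTP header, which OpenTelemetry uses for propagation by default. A small sketch of building and parsing that header; the IDs here are randomly generated for illustration:

```python
import re
import secrets

def make_traceparent(trace_id=None, span_id=None, sampled=True):
    """Build a W3C traceparent header: version-traceid-spanid-flags."""
    trace_id = trace_id or secrets.token_hex(16)  # 32 hex chars
    span_id = span_id or secrets.token_hex(8)     # 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

def parse_traceparent(header):
    """Return (trace_id, parent_span_id), or None if malformed."""
    m = re.fullmatch(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})", header)
    return (m.group(1), m.group(2)) if m else None

# Service A starts a trace and sends the header downstream...
outgoing = make_traceparent()
# ...service B parses it and creates child spans in the same trace.
trace_id, parent_span = parse_traceparent(outgoing)
```

In practice the SDK injects and extracts this header for you; the point is only that the "trace ID forwarded to the other application" is a plain HTTP header.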
B: I can also recommend the book "A Philosophy of Software Design" by John Ousterhout. When we talk about complexity, we mostly think only about systems being complex to understand, but what I mostly saw at my last companies was the other problem, which defines it a little bit differently, or adds to it:

B: it's also hard to make changes in a system when the system is really complex. If it's not easy to make a change, you have a big, complex system, and then you probably don't make the change; that's why people are afraid of making these changes. Changes need to be simple and have low impact. That's the reason we have all these technologies. Kubernetes is complex, of course, but it also delivers value, of course: we can run workloads in parallel.

B: We can probably also use one cluster for doing development and production. Of course, if the root cause takes the whole thing down, that's another problem, but we can easily test in the same system, and we have all these tools. But I think the other problem coming with that is a really steep learning curve.

B: This is probably not something you can use out of the box, because you need to understand the system. You can't just take it from companies that are using it, fitted to their use case, apply it to yourself and expect the same results. They built it for a reason: because they want to save money. Saving money doesn't have to mean saving infrastructure cost; it can mean saving people time, because then they can work on other topics to bring the product further. So there's a lot of stuff ongoing.

B: That's also the reason why you should probably not jump on every hype train that's coming up, and sometimes stay off Twitter and all the interesting stuff that is happening out there. A lot of people are doing interesting stuff, but we have enough information out there. The bigger problem is finding the right content for you, working on a simple problem, or working on one problem with focus instead of spreading out into different spaces and digging a new hole every time.

B: Other questions? Or we can talk about other topics. We can talk about blockchain, we can talk about Rust, I'm in for that.
A: There was a new SDK for Rust with OpenTelemetry, which would also be interesting, and I'm planning to work on adding telemetry to CI/CD, or at least try it out and see how far I get. I also want to create more learning resources on how to get started, not just the five-minute success where you add something to the code and something says "hello world", but where you really find a use case in an application: this is the HTTP request, this is where it starts,

A: this is where it ends; finding a real use case, or maybe breaking something. I will be giving a talk next week at Chaos Carnival; I'm thinking of using chaos engineering to break something which then gets alerted, so you get to see something you probably cannot simulate in a staging environment. I also thought about...

A: I don't know if you're familiar with that, but there is a tool called Kube DOOM which actually kills pods, which one could use for chaos engineering in Kubernetes, and then also see how distributed tracing behaves. For example, it would be interesting to see the traces when something is randomly being killed.

A: That also provides insights and ideas for real-world incidents, because most often an incident is just there, it's an S1, high priority, and you need to react with the things you already have.

A: To be honest, for me it was fun to learn more about scanning with Grype and Trivy and to see how these tools work. No, it's container scanning; dependency scanning is something else again. You get to understand the potential problem of pinning the Docker container in your pipeline or in your deployments to a specific version and never updating it again, while still shipping CVEs and vulnerable software, applications, dependencies and so on.
B
The
signing
press
container
images
also
with
crosic
from
the
chained
art,
dress.
A
This
would
be
interesting
for
maybe
april
well,
let's
see
about
it
first,
I
would
love
to
collaborate
on
blockchain,
rust,
observability
and
other
things.
We
can
use
the
gitlab.com
group
namespace.
Maybe
we
will
create
a
new
group
or
we
just
use
observability
or
something.
B
Provide a slide and all this stuff. But I need to check which blockchain we will use. There are multiple options already: there's not one blockchain, of course, so they're still different systems. It's like when you're talking about a distributed system, this can mean that or could mean that. So there's not one solution for the problem, and they also have different trade-offs, because they're also working with different consensus mechanisms and so on.
B
But I will make a short overview of that, and probably bring you a little bit from the point of what Bitcoin is, what Ethereum is, what Solana is, as these are the most common chains that everyone knows. And then probably also how to program on that, because, in simple words, blockchain is nothing different from an immutable database in the end, and you can also do something like stored procedures in database language, and this is like a smart contract, a program, or how.
A
I think, to be honest (when did I start, March 2020?), it took me one and a half years to really fully understand what I can do with OpenTelemetry, and that it's a specification and a collector, but that I do need to bring my own backend.
A
And also seeing the adoption of the clients and SDKs, which helped me understand: okay, this is how I can instrument it with tracing.
A
Yeah, similar thing with understanding Kubernetes: I need to generate the YAML and I need to understand all the components, which on its own is not enough, because you need to find a use case and deploy an application, a service. And once that pipe in your head gets going, you kind of get addicted to doing more.
A
Okay, anything else you want to chat about? Otherwise I would just stop the recording and.
D
One last question: are there any good tracing libraries for non-server use cases, so for desktop clients, mobile clients?
B
Because it pushes the data. If you have the problem that it needs to be connected to something: tracing mostly works in a way that you push the data, so it will not be pulled, okay? That was the problem with getting Prometheus up in a local client; there was a talk at PromCon Europe 2016 about that, from which there is also a blog post. Because the client is pushing the data into the system, it's quite easy to do that, or at least technically possible; otherwise it would be the other way around.
B
It would also be possible, but of course a lot of security teams have problems implementing this, because you're probably mostly NATed and all that stuff. But you can do this with OpenTelemetry.
A
You can, I think you can use the JavaScript library from OpenTelemetry and build it with npm or with Node.js.
A
I think you are, we are thinking about a Visual Studio Code extension, maybe. Correct me if I'm wrong with that assumption.
D
Not directly. I'm just searching for a way, because most stuff I see nowadays is server-centric, so distributed tracing and whatnot, and I mostly have the problem that I need to support classical end-user clients, and pushing the data. Generally, OpenTelemetry is a good foundation to have now a standard way of doing it, but building it into things like C++ applications is a little bit harder, because there is no unified way for networking.
A
So the technical background for OpenTelemetry: it uses gRPC in the background for sending or emitting traces to an OpenTelemetry collector.
A
This is basically a daemon which needs to be running in the network where the application is in, so there needs to be a direct connection. I'm not sure if there is some proxying or forwarding already in place, but I do think that when you implement the OpenTelemetry C++ client in your application's code, so your, the thing is, let me see if I can quickly find it in the OpenTelemetry.
A
Let me see. When you add that, the things you need to do, it's experimental; tracing is stable. Okay, do we have examples?
A
We have a lot of examples. Okay, this is not something I was looking for.
A
The headers, of course, and I think it's an Apache license, so it might not be compatible with GPL. I had that problem in the past, but I created that fork anyway. So I tried playing around with it, with a simple timing example, three years ago. I think it's broken now, because the API changed. But the thing is, you need to kind of initialize the tracer, which is relatively straightforward.
A
I would say, and let me see if I can get a tracer. Okay, we're getting something out of it, and then we're running something, and inside we are creating a scope in a thread, and okay, we are starting a span, and then the thread is joined and then everything is gone. Okay, might not be the best example, but from what I've seen it's, it's.
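The lifecycle walked through here (get a tracer, start a span inside a scope, do the work in a thread, join it, and the finished span ends up at the exporter) can be sketched conceptually. This is a toy stand-in in Python, not the OpenTelemetry C++ API; all the class and method names are made up for illustration:

```python
import threading
import time

class Span:
    """Toy span: a name plus start/end timestamps, used as a scope."""
    def __init__(self, name, tracer):
        self.name = name
        self.tracer = tracer
        self.start = self.end = None

    def __enter__(self):
        self.start = time.monotonic()
        return self

    def __exit__(self, *exc):
        self.end = time.monotonic()
        # Hand the finished span to the "exporter" (here: a plain list).
        self.tracer.finished.append(self)
        return False

class Tracer:
    """Toy tracer: collects finished spans instead of sending them via gRPC."""
    def __init__(self):
        self.finished = []

    def start_span(self, name):
        return Span(name, self)

tracer = Tracer()

def work():
    # Inside the thread we open a scope (span) around the unit of work.
    with tracer.start_span("work-in-thread"):
        time.sleep(0.01)

t = threading.Thread(target=work)
t.start()
t.join()  # once the thread is joined, its span has been finished

for span in tracer.finished:
    print(span.name, round(span.end - span.start, 3))
```

In the real SDKs the list of finished spans would instead be handed to a span processor and exporter that ships them to a collector.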
A
It has gotten a lot easier to add to your code than it was in the beginning, where you had to include one page of things and then define something. I think the most important part is that you instantiate the tracer, or the object which then sends something over there.
A
What was I looking for? We do have this one: the configuration you need for OpenTelemetry from the client side. It's really straightforward.
A
You define a server, you're defining some authorization, maybe as an HTTP header, and you're defining the traces exporter, which can be, for example, Jaeger, or just OpenTelemetry as the collector. And this is something you need on the client, which can be manipulated by environment variables. So you need to ensure that this doesn't get overwritten, which I tried describing over here, and that's basically, well, that should be about it.
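The client-side knobs described here are, in the OpenTelemetry SDKs, plain environment variables, which is also why they can be overwritten from the outside. A minimal sketch; the endpoint and token are placeholders, while the variable names are the standard SDK ones:

```shell
# Where the client ships traces to (an OTLP-capable collector).
export OTEL_EXPORTER_OTLP_ENDPOINT="https://otel-collector.example.com:4317"
# Authorization, passed along as a header.
export OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer <token>"
# Which exporter the SDK uses, e.g. "otlp" towards a collector, or "jaeger".
export OTEL_TRACES_EXPORTER="otlp"
# Name the traces are reported under.
export OTEL_SERVICE_NAME="demo-app"
```

Anything with permission to set environment variables on the client can change these, which is the "doesn't get overwritten" concern above.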
A
In this specific example, it's about adding something with Ruby and Go, which I look forward to trying out. The thing is, telemetry can get quite overwhelming if you're looking into this picture, for example, and my recommendation is to really start as simple as possible: instrument a demo application, use Jaeger tracing as a tracing backend, because Jaeger also provides its own UI. So you can really start simple and use the OpenTelemetry collector in the middle and build your own. I think there are some demo environments around with Docker Compose and probably Kubernetes.
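A minimal Docker Compose sketch of that "start simple" setup, with Jaeger all-in-one acting as both the trace ingest endpoint and the UI (ports per the Jaeger docs; on older releases OTLP ingest may need to be enabled explicitly, as assumed here):

```yaml
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      - COLLECTOR_OTLP_ENABLED=true   # needed on older Jaeger releases
    ports:
      - "16686:16686"  # Jaeger UI
      - "4317:4317"    # OTLP gRPC ingest from instrumented apps
```

An OpenTelemetry collector service could later be added in front of Jaeger once you want processing or multiple backends.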
A
Meanwhile, build your own or use an existing demo environment, and really do short iterations on adding traces and spans to your code, then open up the UI and evaluate what's going on. Or maybe you can use Grafana, you might be using Opstrace in the future: something which is already there, so you don't need to worry about installing 10 different tools and whatnot; just use a simple installation. And when it comes to adding more than that, really instrumenting the application, you already have the git history, the things.
A
You learned, you hopefully documented it, and there are certain examples which allow you to follow along. For this specific thing I have found, you see, this is not an issue anymore; it should be an epic soon.
A
This should be the pull request for the API server, and we can just see the changes. Probably it doesn't make sense to view the changes in the web interface, but there are certain examples out there which implement that, and this is something my users can actually use in production already.
A
So I'm really a fan of learning from someone else, or learning from others how they did it, and especially reading the diffs on what happened, and maybe the mistakes made on the way, or the performance problems which were discovered.
B
Yeah, actually, obviously it could be hard, but I think blockchain is quite hard, and also showing a little bit what you can do on that, and Rust: that's at least an hour, or a lot, at least probably two hours. But I will speak a little bit faster, and we can watch the video in slow motion instead of watching videos faster, yeah. But we can do that. So we can talk about blockchains and Web3. So.