GitLab Observability Open Meetings

16 Nov 2022

https://gitlab.com/gitlab-org/opstrace/general/-/issues/71

1 participant
10 minutes

dashboards

ui

navigation

gitlab

observability

visualize

maintaining

devops

embedding

fork

19 Oct 2022

No description provided.

1 participant
25 minutes

vm

virtual

bootstrapping

provisioning

automated

backend

developing

booting

screen

debug

21 Jul 2022

Tracing, Error Tracking and a little bit about the UI.

1 participant
8 minutes

gitlab

staging

git

dashboards

exporter

demos

grafana

configure

ui

access

1 Jun 2022

A demo of deploying GitLab Observability (Opstrace) locally and using a test application to send and query traces.
The demo uses our new Golang operator code, GitLab auth, a ClickHouse backend, and an early version of our Observability UI (a Grafana fork).

3 participants
21 minutes

deploying

deploys

deployments

git

setup

kubernetes

tooling

repository

linux

dashboards

14 Apr 2020

This is a recording of a session between OpenTelemetry project maintainers and the GitLab Monitor Stage team members on what OpenTelemetry is and does, and how GitLab and the community, in general, can contribute to OpenTelemetry.

5 participants
58 minutes

discussed

open

gitlab

telemetry

general

observability

visibility

ted

editors

flab

1 Jan 2020

Speaker: Andrew Newdigate

GitLab.com’s monolithic Rails application experiences high week-on-week traffic growth. To ensure availability, GitLab’s Infrastructure team track and plan ahead in order to avoid hitting capacity limits in the application, whether these limits be CPU, database connection pools, memory, storage or any number of other finite resources. Hitting these limits could result in hours, or days, of degraded service while workarounds are put in place. With this in mind, the team set about building a set of tools on top of Prometheus recording rules and alerts to provide them with the information they need to be sufficiently forewarned, up to a month in advance, of potential resource saturation issues. If you’ve ever felt that you’re reactively responding to resource saturation issues, this session will provide practical examples of how we’re building a framework for resource planning into our SRE team workflow. We’ll be presenting our open-source solution and explaining how it works for us.

Slides: https://promcon.io/2019-munich/slides/practical-capacity-planning-using-prometheus.pdf
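
The abstract above describes forecasting resource saturation up to a month ahead. As a rough illustration of that idea only, here is a minimal Python sketch that fits a straight line through daily utilization samples and estimates how many days remain before a threshold is crossed. The function name `days_until_saturation`, the sample values, and the 0.9 threshold are all hypothetical, and this is not the team's actual tooling, which the abstract says is built on Prometheus recording rules and alerts. PromQL's `predict_linear` performs a comparable linear extrapolation; whether the talk's framework relies on it is not stated here.

```python
from typing import Optional, Sequence


def days_until_saturation(
    samples: Sequence[float],
    threshold: float = 1.0,
) -> Optional[float]:
    """Estimate days until a resource's utilization reaches `threshold`.

    `samples` holds one utilization ratio per day (hypothetical data,
    oldest first). A least-squares line is fitted through the samples and
    extrapolated forward. Returns None when the trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    # Day indices are 0, 1, ..., n-1, so their mean is (n - 1) / 2.
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))

    slope = sxy / sxx
    if slope <= 0:
        return None  # no upward trend, so no projected saturation

    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    # Report the distance from the most recent sample, never negative.
    return max(0.0, crossing_day - (n - 1))


if __name__ == "__main__":
    # Hypothetical week of disk utilization growing about 2% per day.
    usage = [0.50, 0.52, 0.54, 0.56]
    remaining = days_until_saturation(usage, threshold=0.9)
    if remaining is not None and remaining < 30:
        print(f"warning: projected saturation in about {remaining:.0f} days")
```

A straight-line fit is the simplest possible model; it ignores weekly seasonality and sudden step changes, so a real forecast would need longer windows and some judgment about which resources grow linearly.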

7 participants
28 minutes

capacity

bottleneck

gitlab

redis

throughput

resource

monitoring

server

infrastructure

benchmarking

4 Dec 2019

Andrew & Marin pair on adding a new saturation metric to the GitLab.com monitoring suite. The session resulted in this merge request: https://gitlab.com/gitlab-com/runbooks/merge_requests/1679

This change was a corrective action following production incident https://gitlab.com/gitlab-com/gl-infra/production/issues/1419 and its recurrence exactly one week later in https://gitlab.com/gitlab-com/gl-infra/production/issues/1437

3 participants
60 minutes

capacity

bottleneck

throughput

servers

utilization

scalability

network

nfs

latency

monitoring

4 Dec 2019

Hordur and Andrew discuss how AutoDevOps can be better monitored using the key metrics framework used for monitoring the components of GitLab.com.

This follows an outage in the feature: https://gitlab.com/gitlab-org/configure/general/issues/9

2 participants
50 minutes

monitoring

dashboard

alright

ops

report

metrics

diagnostics

taking

noticed

deploying

30 Jul 2019

A presentation I prepared at the request of the Self-managed Scalability Working Group.

https://docs.google.com/presentation/d/1xx8sOoWsRvw8_wHqBKbujrWQwmK-meDQrSs4zGNY53I/edit?usp=sharing

Issue: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7217

1 participant
17 minutes

degradation

incidents

fail

performance

reports

inefficient

managed

crucial

service

lab

GitLab / Observability

16 Nov 2022

19 Oct 2022

21 Jul 2022

1 Jun 2022

14 Apr 2020

1 Jan 2020

4 Dec 2019

4 Dec 2019

30 Jul 2019