GitLab / Scalability Team

These are all the meetings we have in "Scalability Team" (part of the organization "GitLab"). Click into individual meeting pages to watch the recording and search or read the transcript.

14 Sep 2023

No description provided.
  • 4 participants
  • 24 minutes
router
town
dedicated
servers
technical
public
discussion
git
demo
rails

17 Aug 2023

No description provided.
  • 4 participants
  • 22 minutes
redirection
redis
patch
configuration
staging
gem
cluster
scaling
trouble
queues

13 Jul 2023

No description provided.
  • 6 participants
  • 41 minutes
implement
model
tam
brainstorming
logs
overviews
temland
scaling
dashboards
terminal

11 May 2023

No description provided.
  • 3 participants
  • 8 minutes
server
working
workers
psychic
brainstorming
flipper
mechanism
defer
bit
time

20 Apr 2023

No description provided.
  • 3 participants
  • 38 minutes
bottleneck
contentions
capacity
throughput
overlock
stalled
processes
lock
profiling
database

13 Apr 2023

No description provided.
  • 3 participants
  • 15 minutes
duration
faster
incremental
slow
prometheus
thanos
urgency
rule
capacity
substantially

30 Mar 2023

No description provided.
  • 6 participants
  • 54 minutes
virtualization
vm
virtual
vmware
macos
virtualbox
installations
linux
machine
complicated

16 Mar 2023

No description provided.
  • 6 participants
  • 58 minutes
cache
analyzing
reddit
observability
throughput
measuring
cash
data
patches
compression

2 Mar 2023

No description provided.
  • 3 participants
  • 15 minutes
troublesome
error
cluster
redis
staging
cache
configuration
server
interceptor
patched

16 Feb 2023

No description provided.
  • 5 participants
  • 39 minutes
logging
monitoring
prioritize
important
rpcs
italy
rails
concurrency
services
gitly

2 Feb 2023

No description provided.
  • 6 participants
  • 42 minutes
migrated
migrations
rethink
manage
multi
reads
evaluated
data
shared
replicas

12 Jan 2023

No description provided.
  • 6 participants
  • 53 minutes
uploading
uploads
cache
gitlab
backlog
copy
grab
request
redis
problems

15 Dec 2022

No description provided.
  • 4 participants
  • 29 minutes
capacity
sizing
max
replica
allocations
workloads
scaling
cpus
throughput
saturation

8 Dec 2022

No description provided.
  • 6 participants
  • 40 minutes
memory
capacity
utilization
container
saturation
overloaded
useful
advisory
monitoring
com

1 Dec 2022

No description provided.
  • 3 participants
  • 34 minutes
psychicron
throughput
timeline
scripts
issue
ahead
cluster
redder
prediction
io

15 Nov 2022

Walkthrough and retrospective of the work that the Scalability:Projections team did to migrate Redis rate limiting from running on VMs to running in Kubernetes.
  • 3 participants
  • 22 minutes
redis
staging
registry
deployments
setup
vms
replicated
migrate
kubernetes
observability

10 Nov 2022

No description provided.
  • 5 participants
  • 29 minutes
queues
routing
configuration
users
tweaking
managed
issue
deployments
latency
sidekick

13 Oct 2022

No description provided.
  • 5 participants
  • 31 minutes
pipelines
scheduling
gitly
staging
process
dashboards
ops
cache
thread
ahead

6 Oct 2022

No description provided.
  • 3 participants
  • 1:01 hours
timeline
timeland
io
juncture
saturation
provisioning
context
runs
help
gitlab

15 Sep 2022

No description provided.
  • 4 participants
  • 33 minutes
setups
regions
deployment
planning
send
psychic
capacity
red
partitioned
p6

20 Jul 2022

No description provided.
  • 3 participants
  • 40 minutes
cluster
kubernetes
registry
cache
infrastructure
plan
rollout
project
redis
vms

16 Jul 2022

No description provided.
  • 3 participants
  • 23 minutes
authentication
redis
registry
setup
accessing
ports
passwords
tcp
connect
sentinel

4 Jul 2022

No description provided.
  • 4 participants
  • 45 minutes
reprovisioning
slowdowns
memory
redis
capacity
quickly
eviction
duration
throughput
needs

14 Apr 2022

No description provided.
  • 3 participants
  • 26 minutes
ongoing
thinking
having
concern
liam
overall
rollout
review
teams
scalability

18 Mar 2022

As part of our investigation into a WAL archiving saturation incident (https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6581), we held an ad hoc profiling session that also served as a general introduction to CPU profiling.

Participants:

- Matt Smiley
- Igor Wiedler
- Alexander Sosna
- Biren Shah
  • 4 participants
  • 38 minutes
perf
processor
profiling
processes
throughput
host
postgres
perfscript
script
thread

17 Mar 2022

Liam and Marin discuss the creation of a self-service platform in Infrastructure and how this aligns with the existing design platform.
  • 2 participants
  • 21 minutes
pajamas
infrastructure
formalizing
project
structure
manage
thinking
fancy
understanding
deployment

24 Feb 2022

No description provided.
  • 7 participants
  • 36 minutes
latency
git
redis
servers
bump
monitoring
upgrades
prioritization
slow
noticed

17 Feb 2022

No description provided.
  • 3 participants
  • 22 minutes
dashboard
sli
services
budget
resolution
troubleshooting
sharing
optim
proposal
monitoring

3 Feb 2022

No description provided.
  • 4 participants
  • 12 minutes
incidents
declared
woodhouse
incident
interface
policy
notice
process
improving
disclosure

27 Jan 2022

No description provided.
  • 4 participants
  • 1:14 hours
demoing
staging
deployments
provisioning
process
schedule
prepping
capacity
production
git

13 Jan 2022

No description provided.
  • 2 participants
  • 42 minutes
ssh
efficient
inefficient
git
gitli
server
cache
proxy
commit
ci

25 Nov 2021

No description provided.
  • 9 participants
  • 34 minutes
psychic
functioning
things
analyze
project
issue
magic
help
cluster
bit

11 Nov 2021

No description provided.
  • 5 participants
  • 39 minutes
silences
silenced
silencing
alerting
message
behavior
knowing
interrupts
conversation
slack

28 Oct 2021

No description provided.
  • 7 participants
  • 1:01 hours
workload
testing
scheduled
configuration
servers
prepping
execution
failovers
demoed
shared

7 Oct 2021

No description provided.
  • 4 participants
  • 7 minutes
apt
upgrades
morning
issue
demo
staging
testing
slos
cpu
chef

5 Oct 2021

  • 1 participant
  • 3 minutes
dashboards
gitlab
features
dashboard
grafana
graphs
budget
indicators
project
contributing

30 Sep 2021

No description provided.
  • 4 participants
  • 29 minutes
git
efficient
rpcs
gigabytes
fetches
process
faster
giddily
server
gitly

22 Sep 2021

No description provided.
  • 3 participants
  • 1:01 hours
slowness
durations
throughput
latency
bottlenecks
thresholds
query
reasoning
slowest
milliseconds

9 Sep 2021

No description provided.
  • 4 participants
  • 29 minutes
sli
issue
metadata
deployments
slos
sla
service
registry
discussion
dashboard

2 Sep 2021

No description provided.
  • 6 participants
  • 52 minutes
processors
giddly
processes
servers
cpu
utilization
faster
gopc
graphs
workflows

12 Aug 2021

No description provided.
  • 4 participants
  • 55 minutes
marmite
toasts
delicious
yeast
australians
honey
brew
product
vegemite
salty

6 Aug 2021

No description provided.
  • 4 participants
  • 42 minutes
logging
durations
servers
pinged
slowed
capacity
priority
red
behavior
suspicious

15 Jul 2021

No description provided.
  • 3 participants
  • 20 minutes
process
issue
scheduled
staging
deduplication
users
replica
sidekick
app
binged

14 Jul 2021

Stan, Matt, Andrew, Jason, Marin and others discuss some corrective actions following on from a production incident: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/5158
  • 7 participants
  • 42 minutes
problematic
mitigation
bottlenecked
fixes
monitoring
concerns
caching
service
urgent
inefficient

8 Jul 2021

No description provided.
  • 9 participants
  • 50 minutes
streamrbc
gitli
process
bottleneck
tcb
server
connection
streaming
problems
protocol

24 Jun 2021

No description provided.
  • 8 participants
  • 55 minutes
prometheus
monitoring
troubles
5000
performance
scrapes
version
mirrors
observability
frequency

22 Jun 2021

2 min video showing budget attribution for the Purchase group.


https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1114
  • 1 participant
  • 2 minutes
dashboard
error
budget
scalability
rails
puma
billings
failures
fix
endpoint

17 Jun 2021

No description provided.
  • 8 participants
  • 60 minutes
calculating
reevaluate
trouble
thinking
discussed
theory
effort
figuring
evaluated
experimentation

10 Jun 2021

No description provided.
  • 8 participants
  • 35 minutes
demo
concern
scalability
improvements
ahead
dashboard
launch
discussed
process
problems

7 Jun 2021

No description provided.
  • 4 participants
  • 33 minutes
process
observability
project
assess
operational
task
staging
issue
preparation
monitoring

27 May 2021

No description provided.
  • 7 participants
  • 23 minutes
prometheus
gitlab
repository
dashboards
runbooks
access
repo
copy
repositories
observability

20 May 2021

No description provided.
  • 6 participants
  • 53 minutes
investigations
psychic
redis
throughput
processing
confusing
server
experiment
curious
red

13 May 2021

APAC Scalability team demo - Quang-Minh shows Sidekiq routing rules in Omnibus and the Helm charts, and compression of Sidekiq payloads
  • 2 participants
  • 20 minutes
routing
demo
configuration
helper
vm
process
testing
parts
repository
implementation

12 May 2021

  • 6 participants
  • 30 minutes
queues
migrations
processes
currently
staging
kubernetes
tasks
prioritize
provider
app

6 May 2021

No description provided.
  • 6 participants
  • 44 minutes
bottleneck
database
timings
monitoring
transaction
slow
inefficient
budget
server
dashboard

29 Apr 2021

No description provided.
  • 6 participants
  • 8 minutes
deploying
process
runes
kubernetes
routing
configuration
staging
worker
mel
qa

28 Apr 2021

No description provided.
  • 3 participants
  • 1:15 hours
bottleneck
gitly
italy
suspicious
cache
forked
throughputs
stuff
gcp
rationale

22 Apr 2021

No description provided.
  • 7 participants
  • 50 minutes
sli
monitoring
cluster
issue
proxy
scheduling
observability
feature
dashboard
registry

15 Apr 2021

No description provided.
  • 3 participants
  • 29 minutes
optimizing
sensing
budget
dashboard
planning
schedule
onboarding
decisions
rollout
prioritizing

15 Apr 2021

No description provided.
  • 5 participants
  • 35 minutes
rerouting
catch
queues
processing
setup
workload
tweaking
suggestion
sending
bit

15 Apr 2021

No description provided.
  • 7 participants
  • 38 minutes
react
query
oversight
troubles
runway
replica
complexity
approach
second
incident

8 Apr 2021

No description provided.
  • 3 participants
  • 10 minutes
cache
server
bug
traffic
gitly
stuff
users
repo
security
host

5 Apr 2021

A quick run-through of the Redis Sidekiq scalability test harness from https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/956

It is crude, but has given us good numbers.
  • 1 participant
  • 13 minutes
throughput
redis
experiment
binaries
configuration
processes
kubernetes
scalability
generator
patch

1 Apr 2021

  • 6 participants
  • 51 minutes
sidekick
redis
capabilities
suggestion
experiment
monitoring
demo
deployments
tooling
red

25 Mar 2021

No description provided.
  • 7 participants
  • 27 minutes
recording
error
dashboard
aggregation
feature
mapping
monitoring
gain
help
process

25 Feb 2021

No description provided.
  • 5 participants
  • 42 minutes
git
cache
rpcs
backlog
gigabytes
quickly
processed
servers
queues
logs

4 Feb 2021

No description provided.
  • 6 participants
  • 28 minutes
gitly
git
thinking
gist
gitlie
hosts
epics
gita
gitlab
fork

28 Jan 2021

No description provided.
  • 3 participants
  • 37 minutes
dashboard
dashboards
testing
grafana
git
helpers
bot
tooling
runbooks
graphonet

23 Dec 2020

The group dashboards project we're currently working on:

- https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/360


Grafana folder where these dashboards are stored:
- https://dashboards.gitlab.net/dashboards/f/stage-groups/stage-groups
  • 1 participant
  • 3 minutes
dashboards
gitlab
dashboard
geo
stages
metrics
git
contributions
features
nodes

22 Dec 2020

No description provided.
  • 3 participants
  • 27 minutes
dashboards
thinking
manage
dashboard
issue
virtual
tasks
scalability
improving
plan

17 Dec 2020

No description provided.
  • 7 participants
  • 54 minutes
italy
canary
gitli
issue
alert
repositories
iffy
bother
important
monitoring

16 Nov 2020

An introduction to the Infrastructure team division and main responsibilities at GitLab.


Official Team Structure documentation: https://about.gitlab.com/handbook/engineering/infrastructure/team/



More about Infrastructure at GitLab: https://about.gitlab.com/handbook/engineering/infrastructure/
  • 1 participant
  • 9 minutes
infrastructures
reliability
deployments
coordinating
gitlab
responsibilities
infrastructure
services
kubernetes
team

13 Nov 2020

No description provided.
  • 4 participants
  • 13 minutes
dashboards
nfs
customization
dashboard
trivial
v2
mapping
generated
automatic
package

11 Nov 2020

No description provided.
  • 4 participants
  • 35 minutes
dashboard
chat
slightbot
issue
realizing
feature
dashboards
users
endpoint
interface

4 Nov 2020

No description provided.
  • 5 participants
  • 47 minutes
configuration
vms
proxy
deployments
interface
version
bypass
observability
filter
hosts

7 Oct 2020

No description provided.
  • 4 participants
  • 19 minutes
registry
profiler
gitlab
vlogs
process
querying
longer
vso
repository
regular

2 Oct 2020

Andrew shows Bob how we automatically generate recording rules from high cardinality metrics, and how to include a new feature_category label in that.
  • 2 participants
  • 38 minutes
overviews
attribution
monitoring
issue
aggregation
significant
feature
dashboard
registry
budgeting

16 Sep 2020

No description provided.
  • 9 participants
  • 29 minutes
profiling
prometheus
profiler
process
rpcs
performance
extending
gitlab
logging
seeing

18 Aug 2020

A quick demo of https://gitlab.com/gitlab-com/runbooks/-/merge_requests/2684, which allows SREs to quickly navigate between different observability systems, such as Kibana, BigQuery, Stackdriver and Sentry. The aim is to reduce the mean time to detect (MTTD) for incidents, helping to drive up the availability of GitLab.com.
  • 1 participant
  • 5 minutes
dashboards
service
kibana
grafana
demo
servers
visualizations
profiling
dashboard
data

12 Aug 2020

  • 3 participants
  • 26 minutes
manageable
nfs
initiative
scalability
configuration
issue
talking
repo
organizational
finished

13 May 2020

A demo of the new connection pool metrics recorded in GitLab (https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/153), recorded because the setup needed to test this is quite involved.
  • 1 participant
  • 3 minutes
monitoring
postgres
setup
connection
databases
geo
demoing
hosts
main
graphs

22 Apr 2020

Discussion follows on from the issue https://gitlab.com/gitlab-com/www-gitlab-com/-/issues/7201 "Prepare on OKR for improving SpeedIndex on a benchmark of URLs compared to similar URLs on GitHub"

-----------------------------------------------------------

@andr3 (the Scalability Team's frontend counterpart) and I had a great call about this topic today: https://youtu.be/e2iccdgrY5s

Some points from the call:

1. Optimisations to GitLab.com's SpeedIndex benchmark would mostly fall on frontend teams
1. Breaking our JavaScript bundles down into smaller components to reduce compile times
1. Ensuring that JavaScript bundles are effectively cached between releases (i.e., a production deploy doesn't invalidate the cache)
1. Is there more performance that we can squeeze out by taking advantage of our new Cloudflare setup?
1. Is a target SpeedIndex of 1000 a reasonable goal? @andr3 thinks it's possible
1. Server-side rendering GitLab: https://gitlab.com/gitlab-org/gitlab/-/issues/215365
1. Managing frontend performance and bringing it into Prometheus
  • 2 participants
  • 46 minutes
performance
optimizing
slow
index
speeding
efficient
users
servers
important
benchmark

27 Mar 2020

No description provided.
  • 9 participants
  • 43 minutes
kubernetes
deploying
project
cluster
tcp
network
dashboards
repo
configured
testing

2 Mar 2020

  • 4 participants
  • 35 minutes
considering
weights
manages
concern
capacity
trusten
reconsidering
tend
carefully
priority

26 Feb 2020

Discussion between Grant & Jason, related to the self-managed scalability workgroup's design of reference architecture using the Cloud Native GitLab Helm charts.

We covered things like:
- Why *not* Omnibus in Kubernetes
- Separation of components by concern within the Helm charts
- Scaling workloads vertically and/or horizontally
- Pre-scaling at a minimum of 50% or more of expected load, and to a maximum of 110% (straight to 100% for tests)
  • 2 participants
  • 54 minutes
advanced
kubernetes
cloud
clusters
dashboards
architectures
tweaking
migrating
deploying
backends

20 Feb 2020

A demo of https://gitlab.com/gitlab-com/runbooks/-/merge_requests/1930 which automatically generates Kibana Searches and Visualizations from Grafana, using Jsonnet and the Grafonnet library.
  • 1 participant
  • 4 minutes
dashboard
graphs
workhorse
indexing
demo
dashboards
visualization
elasticsearch
useful
logs

1 Jan 2020

Speaker: Andrew Newdigate

GitLab.com’s monolithic Rails application experiences high week-on-week traffic growth. To ensure availability, GitLab’s Infrastructure team track and plan ahead in order to avoid hitting capacity limits in the application, whether these limits be CPU, database connection pools, memory, storage or any number of other finite resources. Hitting these limits could result in hours, or days, of degraded service while workarounds are put in place. With this in mind, the team set about building a set of tools on top of Prometheus recording rules and alerts to provide them with the information they need to be sufficiently forewarned, up to a month in advance, of potential resource saturation issues. If you’ve ever felt that you’re reactively responding to resource saturation issues, this session will provide practical examples of how we’re building a framework for resource planning into our SRE team workflow. We’ll be presenting our open-source solution and explaining how it works for us.

Slides: https://promcon.io/2019-munich/slides/practical-capacity-planning-using-prometheus.pdf
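
The forecasting idea in the abstract above (being warned ahead of time that a resource is heading towards saturation) can be illustrated with a minimal sketch. This is not the tooling presented in the talk: it assumes daily utilization samples expressed as a fraction of capacity and fits a plain least-squares line to them, similar in spirit to PromQL's `predict_linear`. The function name `days_until_saturation` and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def days_until_saturation(
    daily_utilization: Sequence[float], threshold: float = 1.0
) -> Optional[float]:
    """Estimate days from the last sample until utilization reaches threshold.

    Hypothetical helper: fits a least-squares line to one sample per day and
    extrapolates it. Returns None when the trend is flat or falling.
    """
    n = len(daily_utilization)
    if n < 2:
        raise ValueError("need at least two samples")

    # Day indices are 0..n-1, so their mean is (n - 1) / 2.
    mean_x = (n - 1) / 2
    mean_y = sum(daily_utilization) / n

    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in enumerate(daily_utilization)
    )
    slope = sxy / sxx
    if slope <= 0:
        return None  # no growth, so no projected saturation

    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope

    # Report time relative to the most recent sample, never negative.
    return max(0.0, crossing_day - (n - 1))
```

A result below roughly 30 would correspond to the "up to a month in advance" warning window mentioned in the abstract; a real setup would also need to handle seasonality and noisy data, which this sketch ignores.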
  • 7 participants
  • 28 minutes
capacity
bottleneck
gitlab
redis
throughput
resource
monitoring
server
infrastructure
benchmarking

4 Dec 2019

Hordur and Andrew discuss how AutoDevOps can be better monitored using the key metrics framework used for monitoring the components of GitLab.com.

This follows on from an outage in the feature: https://gitlab.com/gitlab-org/configure/general/issues/9
  • 2 participants
  • 50 minutes
monitoring
dashboard
alright
ops
report
metrics
diagnostics
taking
noticed
deploying

25 Nov 2019

A really quick video that demonstrates how to use the Grafana Explore user interface to drill down into the visualisations in Grafana for deeper ad hoc analysis.
  • 1 participant
  • 1 minute
graphing
dashboard
query
graf
graph
monitoring
explore
pull
gravano
view

15 Nov 2019

Andrew takes Marin through GitLab.com's SLO framework.

Some topics covered include:
* Symptom-based Alerting vs. Cause-based Alerting, RED Method Monitoring, USE Method Monitoring
* How we calculate the SLI, SLA, SLO for each service
* How to use our Grafana graphs to visualise the SLA trend for each service
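
As a rough illustration of the SLI and SLO topics listed above, the sketch below computes a ratio-based SLI and the share of error budget left against an SLO target. It is a generic formulation, not the calculation GitLab.com's framework uses; the function names and the example numbers are assumptions made for illustration.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Ratio of good events to all events; 1.0 when there was no traffic."""
    if total_events == 0:
        return 1.0
    return good_events / total_events


def error_budget_remaining(sli: float, slo: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent).

    The budget is the failure ratio the SLO permits, i.e. 1 - slo.
    """
    allowed_failure_ratio = 1.0 - slo
    if allowed_failure_ratio <= 0:
        raise ValueError("slo must be below 1.0 to leave any error budget")
    observed_failure_ratio = 1.0 - sli
    return 1.0 - observed_failure_ratio / allowed_failure_ratio


# Example: 9,995 good requests out of 10,000 against a 99.9% SLO target.
sli = availability_sli(9_995, 10_000)
remaining = error_budget_remaining(sli, slo=0.999)
```

With these assumed numbers, half of the permitted failures have been used, so about half of the error budget remains.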
  • 3 participants
  • 34 minutes
indicators
slo
sidekick
improving
dashboard
metrics
demo
reliability
monitoring
technical