From YouTube: 25. #everyonecancontribute cafe: Observability with Opstrace

Description

Opstrace starts at 5:56 after introductions.

Blog: https://everyonecancontribute.com/post/2021-04-14-cafe-25-opstrace-observability/
Twitter thread: https://twitter.com/dnsmichi/status/1382365947122581506
Website: https://opstrace.com/

Open source observability is moving fast, and it is hard to keep up. We want to make things easy to deploy and use.

Insights

- Quickstart installation in AWS.
- Opstrace deploys Loki, Cortex, Prometheus, an Ingress Controller, APIs, a UI, and Grafana into a Kubernetes cluster in AWS.
- Authentication uses Auth0; Dex is planned for the future to provide SAML and other SSO options.
- Grafana comes with default dashboards.
- You can send data to Opstrace from a local demo environment with docker-compose.
- Metrics generated by Avalanche, scraped with Prometheus. Log messages scraped with Fluentd.
- Grafana combines Loki (logs) and Prometheus (metrics) as data sources.
- Easy-to-use Prometheus Alertmanager: rules can be configured through an API for automated rule creation, or through a UI. Opstrace proxies the Cortex functionality behind an authentication token and an API interface.
- Roadmap ideas: SLOs and error budgets - generate rules and provide templates out of the box.
- Monitoring cloud vendor metrics requires no Prometheus provisioning. Instead, you send the configuration over the API, and a new cloudwatch_exporter container is deployed to the Opstrace tenant.
- Open discussion with ideas and questions:
- High Availability - out of the box, Cortex comes with 3 nodes by default, and cloud/Kubernetes takes care of failover.
- Which problems are not yet solved with monitoring/observability?
- The current focus is on onboarding: make it easy to get started with open source, with an experience similar to Datadog.
- Improve the usability of Grafana; it should be much more collaborative as a UI. Make it a debug session: instead of using Google Docs or Notion, add text, graphs, etc. there, and have these documents live on, even after a year.
- How to answer any question - links between logs, metrics, traces. Exemplars for linking metrics and traces were released in Prometheus 2.26. More in this Grafana blog post about Tempo and in our 6. Cafe with Tempo, when it was announced in October 2020.
- Integrating Opstrace, e.g. a graph into Merge Requests from a staging deployment.
- Join the issue tracker and Slack to discuss development ideas.
- Thought of integrating Vector for logs?
- What was the intention to create Opstrace?
- We wanted to ask infrastructure questions, and needed to collect data to do so. We love Prometheus, but there is still so much to build.
- Datadog and it runs in your SaaS, first idea was more closed.
- Continued to iterate, we are standing on the shoulders of giants - make it an open source project. It is harder.
- Don’t re-implement everything, work together.
- Reporting dashboards & customization - make it easy to use.
- Incident management integrated with GitLab and alike.
- As a developer, I don’t care about the configuration or the service being run in Kubernetes. I want to see metrics from a staging deployment, and focus on the fun stuff.
- Security comes out of the box - communication between monitoring nodes. GDPR for logs, and compliance levels. What data is stored in the backend?
- We’ll revisit Opstrace in the future and see how things are going. And of course try it ourselves, maybe in a future #everyonecancontribute cafe.
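
The roadmap idea of generating SLO and error-budget rules out of the box can be made concrete with a little arithmetic. The following is a minimal Python sketch, not Opstrace code: the `Slo` class, both helper functions, the metric name `http_requests_total`, and the burn-rate factor of 14.4 are assumptions chosen for illustration. Only the rule fields (`alert`, `expr`, `for`, `labels`, `annotations`) follow the standard Prometheus alerting rule format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Availability target over a rolling window (hypothetical helper)."""
    name: str
    target: float     # e.g. 0.999 means 99.9% of requests should succeed
    window_days: int  # length of the SLO window in days


def error_budget_minutes(slo: Slo) -> float:
    """Minutes of complete outage the SLO tolerates within its window."""
    return (1.0 - slo.target) * slo.window_days * 24 * 60


def burn_rate_alert_rule(slo: Slo, error_ratio_expr: str, burn_rate: float) -> dict:
    """Build a Prometheus-style alerting rule as a plain dict.

    The rule fires when the observed error ratio exceeds `burn_rate`
    times the error ratio the SLO allows (1 - target).
    """
    threshold = burn_rate * (1.0 - slo.target)
    return {
        "alert": f"{slo.name}ErrorBudgetBurn",
        "expr": f"({error_ratio_expr}) > {threshold:.6g}",
        "for": "5m",
        "labels": {"severity": "page"},
        "annotations": {
            "summary": f"{slo.name} is burning its error budget {burn_rate:g}x too fast",
        },
    }


# Assumed example: a 99.9% availability SLO over 30 days.
checkout = Slo(name="Checkout", target=0.999, window_days=30)

# Assumed PromQL for the error ratio; the metric and label names are hypothetical.
error_ratio = (
    'sum(rate(http_requests_total{code=~"5.."}[1h]))'
    " / sum(rate(http_requests_total[1h]))"
)

budget = error_budget_minutes(checkout)
rule = burn_rate_alert_rule(checkout, error_ratio, burn_rate=14.4)
print(f"{checkout.name}: {budget:.1f} minutes of error budget per {checkout.window_days} days")
print(rule["expr"])
```

A template like this could be serialized to YAML and submitted through whatever rules API the platform exposes. How Opstrace would actually accept such rules is not covered in the session notes, so that step is left out here.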