From YouTube: Gaining Observability in OpenShift Using InfluxData
Description
Learn how to quickly set up a robust monitoring solution for OpenShift using InfluxData. We'll touch on best practices for gathering and storing time series metrics with Telegraf and InfluxDB, visualizing that data using Chronograf, and alerting your team when problems occur using Kapacitor.
My name is Russ Savage, and I'm giving a talk about gaining observability in OpenShift using InfluxData. It's a little bit of a mouthful, but we'll talk through what InfluxData is if you're not familiar, and then we'll talk a little bit about best practices with OpenShift and other Kubernetes platforms and give you some nice screenshots. So I've got 20 minutes, and I'm up against lunch. Actually, the last time I gave a talk, the conference ended at 4 o'clock on Thursday and I was at 3:49 on Thursday, so I'm moving up in the world. I think by maybe 2025 I might be keynote. You guys saw it first; you guys were here at the beginning. So congratulations.
As I said, I'm a product manager at InfluxData. I started in October, and I work specifically on two of our products, Chronograf and Telegraf, so come talk to me afterwards about that. For anybody who's not familiar with InfluxData, our booth is literally right over there; Nikko and the marketing team are waving. We deliver a modern engine for metrics and events, otherwise known as time series data. So anything you need to do with data that has a timestamp associated with it: it could be infrastructure metrics, it could be IoT sensors, or, if you're using OpenShift, a lot of container metrics, all that stuff. You can use our platform to collect, gather, display, and keep track of that data.
A couple more marketing slides before we get into the actual details here, but we're really focused on developers and builders. We provide a platform; you take that platform and make it your own. We try to be as easy to use as possible. We're open source; we love open source and we love the community. All the stuff I'm going to show today is open source, and we pride ourselves on being easy to deploy. Our CTO always talks about the fastest time to awesome.
Yeah, and that's exciting. So first off: is anybody using InfluxData today? Anybody using InfluxDB? Anybody using Chronograf or Kapacitor or any of our products? Okay, so nobody really knows what's going on. Anybody who has heard of us knows about our time series database, InfluxDB.
We started building that approximately four years ago and realized very quickly that you can make the most amazing database in the world, fast, performant, enterprise-ready, but if you don't have the things around it, you're not going to see very good adoption. So we started building out other pieces of our platform, starting on the left with Telegraf, our collection agent. It's an open source Go collection agent, very easy to extend and build your own plugins for. We have about 150 plugins out there, maintained by the open source community.
We have internal developers that are maintaining it as well, but most of the code is contributed by the community. You can use that agent to collect and gather metrics and push them into InfluxDB. You can also use the agent to push into other things. We'd obviously rather you push them into InfluxDB, but we also have companies like Wavefront that leverage our agent to push into their system, and we're fine with that too. Data goes into InfluxDB.
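For context (a detail the talk doesn't spell out), the points Telegraf writes to InfluxDB are expressed in InfluxDB line protocol: a measurement name, optional tags, one or more fields, and a timestamp. A hypothetical point might look like:

```
cpu,host=node1,region=us-west usage_idle=87.2,usage_user=8.1 1524820680000000000
```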
A
Next
thing
you
want
to
see
is
what
your
database
data
actually
looks
like
right.
So
a
lot
of
people
here,
leveraging
dashboards,
probably
in
grow
fauna,
for
a
lot
of
their
visualization
tool.
Tooling,
we
have
our
own
called
cronograph.
We
love
grow
fauna.
They
make
an
awesome
product
if
we
use
cronograph
for
managing
the
arrest
of
the
components
in
this
back
some
custom
visualizations
and
really
exploring
your
data,
and
you
can.
It comes in our platform as well. The next thing is, once you start seeing your data in the system, you might want to be alerted when a system goes down or when your CPU starts spiking, all that stuff. That's where Kapacitor comes in: you can build out really complex alerting mechanisms using Kapacitor. All of these components work together as what we call the TICK stack, T-I-C-K; some people call it the InfluxData platform. You can call it whatever you want.
We just recommend you use it. So that's kind of an overview of the architecture of InfluxData and all the different products; we'll talk a little bit more about them in detail. But you guys are at Red Hat Summit, right? What about OpenShift? You know, that little containerized... container system that Red Hat has? Kind of important. So, InfluxData and OpenShift: all of our components of the platform are designed to run in containers and run in any container architecture, including OpenShift. We're a certified technology partner with Red Hat; the partner marketing team had to make sure that I got that in there, so I'm putting the logo there.
So we work with Red Hat to make sure that all of our stuff runs on Red Hat Linux and in their system, and we're really an enterprise-grade metrics system. There are a ton of metrics options out there, and a lot of people are probably using Prometheus for a lot of their metrics collection, which is great. We love Prometheus.
A lot of people that leverage us along with Prometheus come to us because of our enterprise features: enterprise security, high availability, clustering. We have a lot more control over retention policies and how data gets expunged from the system, and multi-tenancy features that are kind of built in. So a lot of people will start out with Prometheus and realize that they need a little bit of a longer-term view on their metrics, or a little bit more security, or metrics aggregated across a bunch of different containers.
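To make the retention-policy point concrete: in InfluxDB 1.x you control exactly how long data lives before it is expunged using InfluxQL. A hypothetical example (the database and policy names are made up):

```sql
-- Keep raw Telegraf metrics for 14 days, then drop them automatically
CREATE RETENTION POLICY "two_weeks" ON "telegraf" DURATION 14d REPLICATION 1 DEFAULT
```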
You want to make sure that you are definitely putting all your monitoring infrastructure into its own namespace, separating it out from your other application namespaces; that's kind of table stakes.
A lot of our tools need persistent volumes. In a containerized world, when things are spinning up and down, they need to be able to access the same data. So InfluxDB needs a place to store that information, and Kapacitor and Chronograf each have little metadata databases that they leverage for keeping track of all those things. Those all need to be put on persistent volumes in your deployments.
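A minimal sketch of what that looks like for InfluxDB (the names, namespace, and size are assumptions, not from the talk):

```yaml
# Hypothetical PersistentVolumeClaim backing InfluxDB's data directory
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: influxdb-data
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
```

Kapacitor and Chronograf would each get a similar, smaller claim for their metadata databases.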
Now, when we talk about visibility in OpenShift or other Kubernetes environments, there are kind of two parts, and you guys probably know this: you need to track the underlying resources on the bare metal, so that's CPU usage, memory, things like that, any sort of AIOps that's happening, but you also need to track the services that are running on those systems.
You guys know DaemonSets in Kubernetes and OpenShift, right? Telegraf is our collection agent, and what we've done internally is configure a DaemonSet for Telegraf. We use that to make sure the Telegraf agent is actually running on every single node in our cluster and reporting metrics back.
A
So
that's
kind
of
that's
kind
of
the
first
piece
right
that
gives
you
visibility
into
kind
of
the
underlying
infrastructure
of
of
your
of
your
cluster
right
in
OpenShift
other
metrics.
We
internally
there's
a
bunch
of
different
ways.
You
could
do
this,
but
we've
used
the
sidecar
pattern
for
this,
so
essentially
for
every
single
pod,
you're
deploying
into
these
open
openshift
environments
or
other
kubernetes
environments.
You
attach
in
a
telegraph
container
right.
A
Those
containers
share
the
network
space,
so
communicating
between
your
application
and
telegraph
is
is
really
easy
and
you
can
set
up
a
set
up
Telegraph
to
scrape
all
the
premiership
Prometheus
metrics
you
want
so
like
I
said,
a
lot
of
people
are
leveraging
Prometheus
as
the
endpoint
format.
The
slash
metrics
scrape
that
data
to
bring
it
directly
into
influx
in
in
an
influx
1/5
latest
latest
DB.
We
actually
can't
accept
Prometheus
read/write
endpoints
directly
into
the
database.
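A minimal sketch of the sidecar Telegraf configuration described above, scraping the application's Prometheus endpoint over the shared pod network (the URLs, port, and database name are assumptions):

```toml
# Scrape the app's /metrics endpoint on localhost (shared pod network namespace)
[[inputs.prometheus]]
  urls = ["http://localhost:8080/metrics"]

# Forward the collected metrics to InfluxDB
[[outputs.influxdb]]
  urls = ["http://influxdb.monitoring:8086"]
  database = "telegraf"
```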
Telegraf will communicate to InfluxDB over TCP, but for your application sending metrics into Telegraf, we recommend trying UDP. That means that if the agent goes down, your application won't break; it'll just fire things off, and if no one's listening on the socket, then those metrics will just get dropped, but at least your application won't stop running. So we just prefer UDP when you're talking directly to Telegraf, and then Telegraf can push the data to InfluxDB via TCP.
So what does that mean once you actually get the data into the system? I talked about a couple of different parts of the ecosystem that you need. So now the data is being collected by Telegraf, the agent, and it's getting pushed into InfluxDB for storage and rapid access. Now you actually want to see what the heck is going on in there. You can use the Data Explorer in Chronograf to browse through your data and quickly chart out the information that's coming into your system, so you can start building out more complex dashboards.
One of the cool things: I work on the Chronograf team, so I'm pretty excited about this. One of the long-standing features that we've been missing is tables, so we're adding that into the next release in the next couple of weeks. Chronograf 1.5 will have table support, so you can attach host lists and bad-actor reports and log data; you can push all of that into Chronograf and see it on a dashboard. That's going to be really exciting; I'm pretty pumped about that.
This is just a quick example. This is actually how we monitor InfluxCloud, which is our enterprise cloud offering; we leverage our tools internally to monitor all of our customers that are using it. So this is an example for a particular cluster, a very small cluster obviously, but it gives us visibility into what container versions they're running, what their IOPS are, what their memory usage is, all that sort of stuff, which is cool.
So data is coming in and you can see all that data. Now you start identifying trends, and now you want to build out alerting on top of that. Kapacitor is the tool that we have to build out alerting. It also does much more advanced stream processing once you start learning how to write TICKscript, but here you can set one up quickly.
A
So,
in
summary,
is
this
nice
little
slide
wipes
across
the
screen
again
in
flux?
Data
read
how
it's
certified
partner
again
need
to
get
that
in
there
all
of
our
products
designed
for
container
architectures
designed
to
be
run
in
a
cloud
from
the
very
GetGo
telling
about
use
telegraph
agents.
It's
really
powerful
agent,
very
flexible,
very
customizable,
use
that
to
gather
all
the
metrics.
A
You
want
out
of
your
systems
and
there's
different
deployment
techniques
in
order
to
accomplish
that
and
then
leverage
in
flux
DB,
where
you
need
enterprise-grade
time
series
data
with
security
with
clustering,
with
high
availability,
all
that
stuff.
So
so
all
the
different
parts
of
our
of
our
stack.
You
know
really
work
together
and
create
an
awesome,
awesome
experience.
A
That's
it
for
me.
I've
got
technically
four
minutes
left,
but
I
encourage
you
guys
if
you're
interested
in
influx
data
and
our
and
our
products,
our
booth,
like
I
said,
is
right
behind
right
behind.
Here
we've
got
people
over
there.
That
would
love
to
chat
about
influx
with
you
guys
and
get
a
better
understanding.