Cloud Native Computing Foundation Online Programs, 20 Mar 2023

From YouTube: How to whiteboard your software catalog taxonomy

Description

Don't miss out! Join us at our upcoming event: KubeCon + CloudNativeCon Europe in Amsterdam, The Netherlands from 18 - 21 April, 2023. Learn more at https://kubecon.io The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.

A

Hey everyone, my name is Zohar, I'm the CEO of Port. Today we're going to talk about how to whiteboard your software catalog taxonomy into your internal developer portal.

A

So essentially, if you're hearing this webinar, you've probably tried out Backstage and you have a bunch of questions, so today I'm going to shed light on what should be taken into account while you're modeling the software catalog, which needs to represent your way of work and architecture.

A

So in this webinar we will cover an introduction to IDPs and the seven pillars that need to be included as part of your IDP. We're going to focus on the software catalog, which is one of its main blocks: how you should choose the right model for it, how to bring data into the catalog from the various sources, and how you can use Backstage plugins to do that.

A

And after you've done all this amazing work, what's next for the IDP? Part of it is self-service for the developers.

A

So it all started two years ago, when Backstage was released to the world by Spotify's engineering, by their platform team. Essentially, they had Backstage as their own IDP, which helped the entire engineering organization use a unified interface where they can consume everything related to DevOps and the development life cycle in a way that they can understand: both by giving them a visibility layer into the developed components, something that they can comprehend and act upon, and by letting them consume resources and services off the shelf and become a self-sufficient engineering organization.

A

So there are a couple of building blocks to an internal developer portal. We are not going to go over each one of them today, we're going to focus more on the software catalog, but just to give a quick overview of what should be included: the software catalog is definitely one of them.

A

The second one is the self-service part, where you can allow developers to act upon the catalog and to consume all kinds of self-service actions, like scaffolding a microservice, creating a development environment for five days, adding an environment variable to a service, and so on and so forth.

A

Software maturity is another important pillar, where you can basically embed your organizational standards for development and make sure that they're being met in a way that your engineering can follow. By using scorecards, you can basically certify your software in terms of production readiness, security, privacy, compliance, and so on and so forth.
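As a rough illustration of the scorecard idea, here is a minimal Python sketch. The `Service` fields, the three rules, and the gold/silver/bronze levels are all hypothetical; a real scorecard would encode your own organizational standards.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    owner: str | None
    has_readme: bool
    on_call: str | None

# Hypothetical rules, each one checks a single standard on a catalog entity.
RULES = {
    "has_owner": lambda s: s.owner is not None,
    "has_readme": lambda s: s.has_readme,
    "has_on_call": lambda s: s.on_call is not None,
}

def scorecard(service: Service) -> dict[str, bool]:
    """Evaluate every rule against one service."""
    return {rule: check(service) for rule, check in RULES.items()}

def level(results: dict[str, bool]) -> str:
    """Map the number of passed rules to a coarse maturity level."""
    passed = sum(results.values())
    if passed == len(results):
        return "gold"
    return "silver" if passed >= len(results) - 1 else "bronze"
```

Keeping each rule as a small named check makes it easy to show developers exactly which standard a service is missing, not just an overall grade.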

A

The fourth pillar is the automation of different workflows. So, for example, auto-terminate a resource that was consumed by a self-service action, or even use the API of the software catalog as part of the CI/CD jobs that need to deploy certain services, where you might want to fail or pass a build according to the data that resides in your software catalog.
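To make the CI/CD example concrete, here is a minimal sketch of a build gate driven by catalog data. The `CATALOG` dictionary stands in for a response from the catalog's API, and the `owner` and `production_ready` fields are assumptions made for illustration.

```python
from __future__ import annotations

# Hypothetical snapshot of catalog data; a real job would fetch it from the catalog API.
CATALOG = {
    "checkout": {"owner": "team-payments", "production_ready": True},
    "search": {"owner": None, "production_ready": False},
}

def gate(service: str, catalog: dict) -> tuple[bool, str]:
    """Decide whether a deploy job may proceed, with a human-readable reason."""
    entity = catalog.get(service)
    if entity is None:
        return False, f"{service}: not registered in the catalog"
    if not entity["owner"]:
        return False, f"{service}: no owner set"
    if not entity["production_ready"]:
        return False, f"{service}: not certified as production ready"
    return True, f"{service}: ok"

def exit_code(service: str, catalog: dict) -> int:
    """Return 0 to pass the build and 1 to fail it, as a CI step would."""
    ok, reason = gate(service, catalog)
    print(reason)
    return 0 if ok else 1
```

A CI step could call `exit_code` and pass the result to `sys.exit`, so the pipeline fails with the reason printed in the job log.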

A

The last pillar for this block is RBAC, role-based access control, where you can basically decide what level of control each one of your users has: who can see specific data in the catalog and who can perform certain self-service actions. This is enforced by RBAC, which is basically a key driver for developer experience, because it removes all the noise from the things that people don't need to see and don't need to know about.

A

So they can just have what they need, and you put them on the golden path to get it, reducing the cognitive load and all the noise behind the scenes.
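A minimal sketch of how role-based access control can filter what each user sees and does. The role names and permission strings below are made up for illustration and are not taken from any specific portal.

```python
from __future__ import annotations

# Hypothetical roles mapped to the permissions they grant.
ROLE_PERMISSIONS = {
    "viewer": {"catalog:read"},
    "developer": {"catalog:read", "action:scaffold_service"},
    "sre": {"catalog:read", "action:scaffold_service", "action:create_environment"},
}

def can(role: str, permission: str) -> bool:
    """True if the role grants the permission; unknown roles get nothing."""
    return permission in ROLE_PERMISSIONS.get(role, set())

def visible_actions(role: str) -> list[str]:
    """Self-service actions to show this role, hiding everything else."""
    granted = ROLE_PERMISSIONS.get(role, set())
    return sorted(p for p in granted if p.startswith("action:"))
```

Filtering the action list per role is what removes the noise: a viewer simply never sees actions they cannot run.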

A

On top of all that, you have the ability and the power to create insights and reports for your organization and for your own needs, so you can keep track of deployment success rate, DORA metrics, MTTR, and so on and so forth. And last, you have the UI, API and ChatOps interface, which is the unified way to consume all these blocks in one single interface.
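As a small sketch of the reporting idea, the following Python computes a deployment success rate and a mean time to recovery from catalog-style records. The record shapes (a `status` field, and pairs of opened and resolved timestamps) are assumptions for illustration.

```python
from __future__ import annotations
from datetime import datetime, timedelta

def success_rate(deployments: list[dict]) -> float:
    """Share of deployments whose status is 'success' (0.0 when there are none)."""
    if not deployments:
        return 0.0
    ok = sum(1 for d in deployments if d["status"] == "success")
    return ok / len(deployments)

def mttr(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time to recovery over (opened, resolved) timestamp pairs."""
    if not incidents:
        return timedelta()
    durations = [resolved - opened for opened, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)
```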

A

So today we are going to focus on the software catalog, which is basically and usually the first step into an IDP, and talk about how you should think about its structure and how you can bring your own structure into it.

A

Okay, so basically, what does a software catalog provide? Essentially, it provides your engineering with a simple way to get answers to very complex questions. Questions like: where is Log4j currently deployed across my infrastructure? What is the current running production version of a given service? Who owns this microservice? Where can I find the API for it? And the list goes on. So you can imagine the software catalog as some kind of a visibility layer into the developed components, modeled the way your engineering understands them.

A

So it can be anything from microservices, versions and environments to cloud resources and cloud accounts, and the list goes on. The software catalog is a way to represent all that, with all the metadata that your engineering needs, in one place and in a way they can understand. And if you think about it, your software catalog will look different from my software catalog and from your friend's catalog, because every organization operates and makes specific decisions differently. All of us are developing software, but we all have different architectures for doing that.

A

So every one of us is some kind of a snowflake, and when you think of a software catalog, you want to be able to bring your way of work into that. So how can you reflect different data models with an IDP?

A

So first, you need to be very opinionated and not compromise on your way of work, and you want to be able to reflect it directly into your IDP, for example with Backstage. For doing that, you really need to think about your data model. There are a lot of commonalities with other organizations, but essentially your software catalog will look different from others. So you really need to be opinionated and not think that you should change something about your chosen way of work.

A

So what should be baked into your taxonomy, into your software catalog? The most famous component, and probably the one that you will start with, is the microservice. So you might want to represent microservices. You might also want to include packages, Kubernetes clusters, pipelines, and a lot of different types of components in the software catalog.

A

So you might look at this list and find it familiar, or think about the commonalities, but I'm sure that there are a couple of components that are not on this list, and you should be able to represent them as well, like your custom resources. And all these kinds of entities essentially have dependencies between them, because we are developing software, and software components depend on one another, especially in the DevOps era.

A

So you have services that are running on environments, and environments that are using cloud resources that are hosted on different cloud accounts, and so on and so forth.

A

So to be able to bring your data model into the software catalog, you need two main building blocks that Backstage provides. The first one is types of entities: you want to be able to define the types that you want as part of the software catalog, and you need some generic way to do that. So you need a way to define schemas of entities. This is the blue part.

A

The orange part is the relations. You want to be able to create relations that reflect the dependencies between the different software components in the software catalog. So these are the two main building blocks that you need to bring your own data model into the software catalog. Okay, so let's build our first model together. Of course, this is just an example, as you're probably going to have a slightly different structure for your way of work.
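The two building blocks, entity types with a schema and relations between them, can be sketched in a few lines of Python. This is only an illustration of the idea, not Backstage's implementation; the `EntityType` and `Relation` classes and the property names are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class EntityType:
    """A type of entity plus a very small schema: property name -> expected type."""
    name: str
    properties: dict

    def validate(self, data: dict) -> list[str]:
        """Return a list of schema violations (empty when the data is valid)."""
        errors = []
        for prop, expected in self.properties.items():
            if prop not in data:
                errors.append(f"missing property: {prop}")
            elif not isinstance(data[prop], expected):
                errors.append(f"{prop}: expected {expected.__name__}")
        return errors

@dataclass(frozen=True)
class Relation:
    """A named, directed dependency between two entity types."""
    source: str
    target: str
    name: str

service = EntityType("service", {"owner": str, "on_call": str, "readme": str})
system = EntityType("system", {"owner": str})
part_of = Relation("service", "system", "part_of")
```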

A

So the first thing that we are going to answer is questions about services, like: who owns this microservice? Where can I find the API docs for it? And so on and so forth. So the first component that we'll have is the microservice component. We are going to create this entity and apply all the different properties that identify it, like the owner, the on-call, the links to the different documentation, the README, and so on and so forth.

A

So this is going to be the first one. The second one is the system component. We basically want to be able to know which services are associated with a specific system within my overall architecture.

A

Then, on a separate note, I probably want to know which Kubernetes clusters I own and where they reside across different cloud environments, because I operate in a multi-cloud environment. So I'm going to create two other entities, a Kubernetes cluster and a cloud provider, because I want to be able to see all the clusters across the different cloud providers; I use GCP and AWS, for example. And you can see the arrows that indicate the dependencies between the Kubernetes cluster and the cloud provider, and between the service and the system.

A

So this way I will be able to see the data with respect to the dependencies that represent my way of work. Now I want to make some kind of a combination between the two, because I can get very strong answers from that.

A

So I want to be able to know what services are running in production right now, for example. And when I say production, I work with Kubernetes clusters that represent environments for me, so I might want to see which services are running across my environments, the Kubernetes clusters. So I'm going to create another type of component that is called a running service.

A

So a running service is connected to a service and a Kubernetes cluster, because I want to be able to get the runtime data about each service that is running across the different environments and to have some kind of a matrix of all the services across all the environments. Then I will be able to see relevant data like the version and the CPU and memory limits for each service in each environment, and so on and so forth.
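Here is a minimal sketch of the running-service idea: each record joins a service with a cluster, and pivoting those records gives the service-by-environment matrix. The record fields and the sample names are invented for illustration.

```python
from __future__ import annotations

# Hypothetical running-service records: one per (service, cluster) pair.
running_services = [
    {"service": "checkout", "cluster": "prod-gcp", "version": "1.4.2"},
    {"service": "checkout", "cluster": "staging-aws", "version": "1.5.0"},
    {"service": "search", "cluster": "prod-gcp", "version": "0.9.1"},
]

def version_matrix(rows: list[dict]) -> dict[str, dict[str, str]]:
    """Pivot records into service -> cluster -> running version."""
    matrix: dict[str, dict[str, str]] = {}
    for row in rows:
        matrix.setdefault(row["service"], {})[row["cluster"]] = row["version"]
    return matrix

def running_in(rows: list[dict], cluster: str) -> list[str]:
    """Names of the services currently running in one cluster."""
    return sorted(r["service"] for r in rows if r["cluster"] == cluster)
```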

A

So I added this type called running service to the taxonomy of my software catalog. I also want to know, for example, what the last deployment of a specific service to production was. Just for root cause analysis purposes, I want to see the thread of deployments for each service. So I'm going to create a deployment kind of entity that is associated with the running service, because each deployment has a logical connection and a reference to the running service that is deployed.
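A short sketch of the deployment entity and its reference to a running service, assuming each deployment stores the (service, cluster) pair it points to. The field names and sample data are hypothetical.

```python
from __future__ import annotations
from datetime import datetime

# Hypothetical deployment records, each referencing a running service.
deployments = [
    {"running_service": ("checkout", "prod-gcp"), "version": "1.4.1", "at": datetime(2023, 3, 1, 10, 0)},
    {"running_service": ("checkout", "prod-gcp"), "version": "1.4.2", "at": datetime(2023, 3, 8, 15, 30)},
    {"running_service": ("search", "prod-gcp"), "version": "0.9.1", "at": datetime(2023, 3, 5, 9, 0)},
]

def deployment_thread(rows: list[dict], service: str, cluster: str) -> list[dict]:
    """All deployments of one running service, newest first."""
    matching = [d for d in rows if d["running_service"] == (service, cluster)]
    return sorted(matching, key=lambda d: d["at"], reverse=True)

def last_deployment(rows: list[dict], service: str, cluster: str) -> dict | None:
    """The most recent deployment, or None if there has been none."""
    thread = deployment_thread(rows, service, cluster)
    return thread[0] if thread else None
```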

A

So I created this deployment kind, and I already have a way to see all the microservices and which services are running as part of each system. I can see each service and where it's currently running in terms of the Kubernetes cluster. I can also see all the Kubernetes clusters that I own across cloud providers, and I can see all the deployments that took place, each referencing the relevant service and environment that it points to.

A

It's very powerful to have this kind of data model already, but I just want to add one last component, which is the package, because I want to be able to see not only the service versions that are running in each environment, but also all the packages that were built as part of the deployment process for each service.

A

If I have an incident or a vulnerability is found, I want to easily find where it's currently running at the resolution of a service, a cluster, or a cloud provider. So this is how I chose to architect my basic model, and this is also how I recommend you think about an initial use case for your software catalog. To accomplish this kind of data model with Backstage, you're provided with several ways to reflect these kinds of entities and relationships.
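With packages added to the model, the vulnerability question becomes a walk over the relations: package to service, service to cluster, cluster to cloud provider. The sketch below assumes three small lookup tables with invented names and versions.

```python
from __future__ import annotations

# Hypothetical relations taken from the example model.
PACKAGES = {  # service -> packages built as part of its deployment
    "checkout": ["log4j-core:2.14.1", "payments-sdk:3.0.0"],
    "search": ["lucene:9.4.0"],
}
RUNNING = [("checkout", "prod-gcp"), ("checkout", "staging-aws"), ("search", "prod-gcp")]
CLUSTER_PROVIDER = {"prod-gcp": "gcp", "staging-aws": "aws"}

def where_is(package_name: str) -> list[tuple[str, str, str]]:
    """List (service, cluster, cloud provider) for every place the package runs."""
    hits = []
    for service, cluster in RUNNING:
        names = [p.split(":")[0] for p in PACKAGES.get(service, [])]
        if package_name in names:
            hits.append((service, cluster, CLUSTER_PROVIDER[cluster]))
    return hits
```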

A

So these are called kinds, and you have five kinds that are provided: the component, the resource, the API, the system, and the domain. For this simple example, we use the component kind to represent the package and the service, the system kind, of course, for the system, and the resource kind to represent Kubernetes clusters and cloud providers. The building blocks that are provided by Backstage are good for showing metadata that is brought in the GitOps way.
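As a hedged sketch of that mapping, the Python below builds entity descriptors in the shape Backstage's catalog descriptor format uses (`apiVersion`, `kind`, `metadata`, `spec`). The entity names, owners, and the `kubernetes-cluster` type value are assumptions for illustration; in practice such descriptors usually live as YAML files next to the code in Git.

```python
from __future__ import annotations
import json

def component_descriptor(name: str, type_: str, owner: str, system: str,
                         lifecycle: str = "production") -> dict:
    """Descriptor for a Component, e.g. a service or a package (library)."""
    return {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Component",
        "metadata": {"name": name},
        "spec": {"type": type_, "lifecycle": lifecycle, "owner": owner, "system": system},
    }

def resource_descriptor(name: str, type_: str, owner: str) -> dict:
    """Descriptor for a Resource, e.g. a Kubernetes cluster or a cloud provider."""
    return {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Resource",
        "metadata": {"name": name},
        "spec": {"type": type_, "owner": owner},
    }

if __name__ == "__main__":
    # JSON is also valid YAML, so this output could be saved as a descriptor file.
    print(json.dumps(component_descriptor("checkout", "service", "team-payments", "storefront"), indent=2))
```

The running service and deployment types from the example model have no direct counterpart among these kinds, which is why they are treated separately as runtime data.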

A

So these are essentially manifest files that reside within your Git repository and are fetched into the Backstage software catalog. But to bring data that is more relevant to the runtime, and to represent it in a nice way that connects to all your resources, you need to use plugins. For example, to bring runtime information about your services, where Kubernetes holds the relevant data, you might want to use the Kubernetes plugin.

A

And to bring data about CI/CD, you might want to use the relevant plugins to reflect deployments. Of course, creating this kind of architecture and reflecting it just the way that you want might require you to do some work and to adjust everything together, but it's something that is achievable with the model that is provided.

A

So essentially, this is called the C4 model, and Backstage provides five kinds to reflect the metadata about your software. The first one is the component. The component is essentially any kind of piece of software, from services to packages to backend services to data pipelines, and so on and so forth, and it is tracked in the source control that you maintain as a service owner.

A

The second one is the API. The API is an important part of the catalog and essentially allows you to represent the connections and the boundaries between different components and the way that they consume one another. So you want to reflect the API definition, whether it's protobuf, GraphQL, or a data schema, and to define the code interfaces between them.

A

These definitions need to be in machine-readable formats, so that further tooling and analysis can be built on top of them. The third kind is the resource. Resources are essentially all kinds of infrastructure pieces, from S3 buckets to Pub/Sub to databases, anything related to resources and specifically cloud resources. By modeling them, you will have a better way to visualize resources and to create tooling around them. The next one is the system.

A

As you have a lot of software components, you want to be able to create some kind of an abstraction and to bundle a couple of resources and components under one umbrella that will be presented as a system.

A

So you want to have some kind of a logical way to combine everything together and give it a name, so you can essentially encapsulate a couple of resources and components under one umbrella.

A

The last one is the domain. The domain is also a way to encapsulate a couple of related entities, and it's very useful for creating a group of systems that share terminology, usually around business purposes. So you might have a couple of business domains within your company, and you want to reflect the software with respect to the business structure.
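The grouping described here forms a small hierarchy: components belong to systems, and systems belong to domains. A minimal sketch with made-up names shows how that hierarchy can be walked in both directions.

```python
from __future__ import annotations

# Hypothetical grouping: component -> system, and system -> business domain.
SYSTEM_OF = {"checkout": "storefront", "search": "storefront", "ledger": "billing"}
DOMAIN_OF = {"storefront": "commerce", "billing": "finance"}

def domain_of_component(component: str) -> str | None:
    """Walk component -> system -> domain; None if any link is missing."""
    system = SYSTEM_OF.get(component)
    return DOMAIN_OF.get(system) if system else None

def components_in_domain(domain: str) -> list[str]:
    """All components whose system belongs to the given domain."""
    return sorted(c for c, s in SYSTEM_OF.items() if DOMAIN_OF.get(s) == domain)
```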

A

So these are the kinds that are provided by the C4 model and by Spotify's Backstage. But you will probably want to add more types of components to the software catalog, types that are not provided by these kinds, so you might want to extend the model to more custom use cases. You might have to write some code for that, but this is absolutely fine.

A

You can do that, but it will not be provided by the out-of-the-box building blocks of Backstage. I will not cover this in this session.

A

So let's talk a little bit about ways to ingest data into the catalog using Backstage. For most components, you will only be able to bring the data using the GitOps approach. So for packages, services, clusters, and so on and so forth, you will need to maintain files in Git as part of your repository, and they will be automatically reflected in the Backstage software catalog.

A

So mostly, the components that are provided by the C4 model are good for reflecting metadata about your software: services, clusters, packages, things like that.

A

For runtime use cases like running services and live deployments, things that are more live and ephemeral, you might want to use plugins and think about how you want to connect them and how you want to reflect this kind of data in the software catalog. So plugins will be better for live data, and the other types of resources will be better for metadata, the GitOps way.

A

So let's talk a little bit about the segmentation of the different plugins that are provided by Backstage. You have the cloud, the CI/CD, the GitHub, the Kubernetes, which is a standalone, and the SSO plugins. With the cloud plugins, you can bring data about CloudFormation, Lambdas, pipelines, and things like that across the different clouds that are being used by your organization.

A

The second type is all the plugins for CI/CD: Jenkins, Tekton pipelines, Travis CI, and so on and so forth. And by the way, these plugins are more for data ingestion; there are other plugins for visualizations and views. So the plugin world of Backstage has a couple of segments, or types of plugins, but this is more about the data ingestion plugins. So you have the CI/CD.

A

You have the GitHub plugins, of course, if you want to reflect data about actions, pull requests, and all kinds of out-of-the-box GitHub information that you want to include. You have the Kubernetes plugin, which is an opinionated plugin that gives you a way to show clusters, live logs about clusters, and basic Kubernetes data, and it gives you one form in which the Kubernetes cluster will be reflected in your Backstage instance.

A

So you can use that, and you can also reflect Argo CD flows as part of it. And then you have the SSO, which is not so related to the data catalog, but it's still important to emphasize that it is also provided by a plugin. So you can use your Okta, or integrate with the LDAP protocol, in order to allow a single sign-on solution for your Backstage instance.

A

So up until now, we spoke about the software catalog. The next phase, after you've done the software catalog, is the self-service part. This is usually a natural way to proceed once you've done that: essentially, you want to allow all kinds of actions on top of the catalog. This is the next phase of your IDP, and it's what we recommend proceeding with in your Backstage installation.

A

So, just to conclude everything that we went through: we talked a little bit about what an IDP is. We took a deep dive into the software catalog and how to model it, what the building blocks are that are provided, and should be provided, as part of your IDP solution, and specifically how you can use Backstage's building blocks to architect the data model. And we talked a little bit about the C4 model.

A

Another thing that we talked about is the way that RBAC can be used to adjust views for your own personas, and how you can take the next phase of the IDP into the self-service part, where you can allow developers to act on their own and to be self-sufficient engineers.

A

So if you have any kind of thoughts or comments, I will be very happy to talk about them. We are really excited about the field. We think Backstage is great, and it's a great community and a great open source project. So really feel free to reach out to me by email, Zohar at getport.io, and feel free to try it out: it's open for self-service and you can try getport.io, it's open and free to use. Thank you so much. I hope you enjoyed this conversation, and please reach out if you have any questions.