Red Hat OpenShift Transformation | OpenShift Commons, 2 Jul 2020

From YouTube: Data Driven Approach to Community Development Diane Mueller (Red Hat) Daniel Izquierdo (Bitergia)

Description

Data Driven Approach to Community Development
Diane Mueller (Red Hat) Daniel Izquierdo (Bitergia)
OpenShift Commons Briefing
July 2, 2020
https://commons.openshift.org/events.html

A

Well, everybody, welcome to yet another OpenShift Commons briefing. This is going to be a fun one for me, because I have a colleague with whom I've been doing some research, Daniel Izquierdo from Bitergia, and we presented a very, very abbreviated version of this a week ago Saturday at ICGSE. So we thought we would do a deep dive today, because there were a lot of questions about how to use the analytic tools.

A

And why we're doing it. So we're going to take this opportunity, when most of you are probably off on vacation, to steal an hour from you and talk about how, in OpenShift Commons and the OpenShift, Kubernetes and CNCF ecosystems, we've taken a data-driven approach to doing community development, and how that has helped me be effective and nurture a healthy, diverse, hopefully very engaged community around OpenShift, OKD, Kubernetes, all the CNCF projects we're incubating, operators, all those kinds of good things.

A

So I'm going to motor through this so that we can get to the deep dive, but I'm going to set the stage first, and then we're going to have Daniel do a bit of a demo of how this all works. So this is the paper that Daniel and I wrote together.

A

This diagram is out of date, though, because it is based on data from GitHub and other sources, which we'll talk about, but it's what I generally refer to as my jellyfish diagram. It basically maps out all of the network relationships between the projects and the people who are contributing to and participating in them across the CNCF, Kubernetes and OpenShift ecosystems, and we'll dive into that a little bit more. And if you know Red Hat,

A

you've probably seen this screen before, and you know that we're really all about open source. We believe very deeply, and it's in our DNA, that open source is the source of all the technology innovation that's happening in the world today. And, you know, GitHub is where we live and breathe. These numbers, again, are a little out of date, and they've grown exponentially; I think it's a hundred and twenty-five million repositories

A

at this point. It's huge, and there are just a few of them on the screen. OKD, which was formerly known as OpenShift Origin, is the one that we're going to focus on a little bit today. If you don't know OKD, it is the OpenShift distribution of Kubernetes. Basically, we like to say it's a function of Kubernetes plus, plus all of the other things we add into it at Red Hat. OpenShift is easy to find. It's going to be GA,

A

hopefully next week: with the 4.5 release of OpenShift we'll have a distro of OKD 4 for you, and you can try it out at okd.io. It basically is a community distribution of Kubernetes. One of the things that happened over the course of time, maybe four years ago, is that we switched from a standalone open source project, which was Origin, to rebasing and re-architecting OpenShift on top of Kubernetes and heavily using containers.

A

So if you're an oldster from OpenShift like me, you still remember gears and cartridges. But when we switched over, we really had to refocus how we looked at what community was. The reality check, and the honest thing, was that for the most part the contributions to Origin were Red Hat based; it was Red Hat dominated. Once we took Red Hat out, there weren't a lot of external folks contributing to the project, and still today, on the value-added parts of OpenShift,

A

the contributions are primarily things that are integrated and added to it, the value adds, by OpenShift. So I'm not going to change my tune on that; that is really where most of it is. But the big change for us, and where the complexity comes in, is this ecosystem-based model that we switched to. If you go to Commons, you'll see there are right now over 585 member organizations that are part of that.

A

These are end users, integrators, cloud providers, upstream project leads: tons of people having conversations that we have to interact with and understand where they're coming from. And then we've seen what I call the rise of the interrelated cloud native ecosystems. Everybody shows this picture. It's crazy, I know, but it actually is very helpful when you filter it down to some of the open source projects that are being incubated, and that's really what I tend to focus on: the ones that are either incubated or graduated.

A

I do, trust me, look at all the sandbox ones too, but for this analysis we're just going with graduated and incubated projects, and then we're adding in the wonderful world of Operator Framework. The vote just took place and it's just been accepted as an incubated project. I think it's going to officially be announced probably next week; I think the 9th is when the press release goes out. So we have to add in for that all the operators that we're building, things that are in OperatorHub, and the Operator Framework itself.

A

So this landscape just keeps growing, and it's impossible to really understand all of the relationships or to know all the people in your community. What I like to say is that in the past, community managers usually focused on one single project and on trying to get people to work on just that one, and we don't have that luxury anymore. There are so many interdependencies among the different projects that are layered on top of OpenShift, are integrated into OpenShift, or run underneath OpenShift, and all of those release cycles,

A

product roadmaps, feature requests, issues, everything you can possibly imagine, you name it, all have an impact on each other. And then there's the human side of it as well, which I think, from a community development point of view, is unknowable without using a data-driven approach. So, you know, I can create all the spreadsheets I want from mailing lists and analyze them up the wazoo by myself, by hand, but about, I think it was... When did we first meet, Daniel?

A

When you first showed me the Bitergia GrimoireLab, was it 2016? I forget.

B

It was 2014, during the OpenStack days. I mean, yeah.

A

I've been looking at this magic for a long time and have been applying it, trying, you know, first the dashboard, which gives you the pie chart and breakdown of the contributors, and then this network analysis stuff. So we've sort of ingrained this into my day-to-day approach to working with the many communities, and I've been able to scale myself in a way that I couldn't formerly do without having that data-driven approach. And these data-driven approaches, the sales teams use them; CRMs are them.

A

You know, those are customer relationship management things. This should be a community relationship management tool; that's the way that we look at this. So basically, what we're doing is applying some data science and analytics to the problem space of understanding who's in your community, how to nurture them, how to support them, and how to reach out and connect with them and connect them to each other. I'm going to stop. I'm going to

A

let Daniel talk a little bit, now that you know how complex the problem is, about the tools that we have and the datasets that we're working with. So Daniel, go right ahead.

B

So the analysis is based on Git repositories. If we think about the usual data sources in any open source community, we have a bunch of them. By data source, I mean the infrastructure that we may be using. You have already mentioned some of them, such as the mailing lists, or we have Slack channels. We have Git repositories; some communities are using GitHub, some of them are using GitLab, and so on. So there are several of them, and typically there are from five to ten data sources

B

if we think about development activities, communication channels, and outreach to the general public. In this case, for today, we are just focusing the analysis on Git, which is a big chunk of the activity, and we are focusing on CNCF, OpenShift and operators. Go to the next slide, please. So, for the tooling, and this is how we are moving from art to science, or how to apply this data-driven approach to community development: we are using GrimoireLab. GrimoireLab

B

is part of the CHAOSS project, which is under the umbrella of the Linux Foundation. CHAOSS is the acronym for Community Health Analytics Open Source Software, and we are working there in two areas.

B

One of them is defining metrics from a technology-agnostic point of view: discussing metrics, bringing in specific definitions of them, and looking for use cases. There are several working groups there, so we are talking about the diversity and inclusion working group, the evolution working group, the risk working group, or the value working group, from an open source perspective. And then, second, a bunch of people are focused on software, and there are several tools there.

B

One of them is GrimoireLab, which is what we are presenting today, and we are part of the original developers here. Then we have Augur, which is similar, although pretty focused on GitHub as far as I remember, and then there are a couple of extra projects around. So, GrimoireLab: this is the architecture that you can see. It's not only about retrieving information; there is preprocessing and post-processing of the existing data.

B

There are specific problems that we have to deal with, such as identities or affiliation management, how to automate all of this, how to have this in a production environment, and how to produce value for the end user. To start from the left side of the chart, we have a bunch of data sources. Some of them we have mentioned: Git repositories, Docker, Jira, we have Bugzilla, and some others.

B

Then, right after this, we have Perceval, which is the tool that retrieves all of this and produces some data transformation. It is your front end to transform any kind of log or API into a JSON document. This is temporarily stored in some database, but at the very end it creates a new index in Elasticsearch. Elasticsearch is the database we are using here, the persistent database, and there we are creating raw indexes. At the same time, there is the tool that you can see right in the middle, GrimoireELK.
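
A minimal sketch of the retrieval step just described, where a log is turned into JSON documents. It does not use Perceval itself or reproduce its real document schema; the simplified log layout, the function name `to_raw_documents`, and the field names are all illustrative assumptions.

```python
import json

# Hypothetical plain-text log: one "key: value" header block per commit,
# with commits separated by a blank line.
SAMPLE_LOG = """\
commit: a1b2c3
author: Jane Doe <jane@example.com>
date: 2020-06-01T10:15:00+02:00
file: pkg/api/server.go
file: docs/README.md

commit: d4e5f6
author: John Roe <john@example.org>
date: 2020-06-02T08:00:00+00:00
file: pkg/api/client.go
"""


def to_raw_documents(log_text, origin):
    """Turn each commit block into a JSON-serializable 'raw' document."""
    documents = []
    for block in log_text.strip().split("\n\n"):
        data = {"files": []}
        for line in block.splitlines():
            key, _, value = line.partition(": ")
            if key == "file":
                data["files"].append(value)
            else:
                data[key] = value
        # Wrap the payload with metadata about where it came from.
        documents.append({"origin": origin, "backend": "git", "data": data})
    return documents


raw_docs = to_raw_documents(SAMPLE_LOG, "https://example.com/repo.git")
print(json.dumps(raw_docs[0], indent=2))
```

Each resulting document could then be stored as one entry of a raw index; the storage step is left out here.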

B

It is the data processor. It is kind of saying: okay, I have a new JSON document, so I am storing this in Elasticsearch, and at the same time I am asking SortingHat: hey, we have a new identity here, what do I do with it? SortingHat is the tool that takes care of all of the identities and affiliations, and SortingHat uses another database. Why do we have this? In this case, this is to be GDPR compliant.
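
The identity question above can be sketched as a grouping problem. This is not SortingHat's actual matching logic or data model; it assumes, for illustration only, that two identities belong to the same person when they share an exact email or username, and the sample identities are hypothetical.

```python
from collections import defaultdict

# Hypothetical identities seen in different data sources.
IDENTITIES = [
    {"id": 1, "source": "git", "email": "jane@example.com", "username": None},
    {"id": 2, "source": "github", "email": "jane@example.com", "username": "jdoe"},
    {"id": 3, "source": "slack", "email": None, "username": "jdoe"},
    {"id": 4, "source": "git", "email": "john@example.org", "username": None},
]


def merge_identities(identities):
    """Group identities that share an email or a username (union-find)."""
    parent = {ident["id"]: ident["id"] for ident in identities}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # (field, value) -> id of the first identity using it
    for ident in identities:
        for field in ("email", "username"):
            value = ident[field]
            if value is None:
                continue
            key = (field, value.lower())
            if key in first_seen:
                union(ident["id"], first_seen[key])
            else:
                first_seen[key] = ident["id"]

    groups = defaultdict(list)
    for ident in identities:
        groups[find(ident["id"])].append(ident["id"])
    return sorted(sorted(ids) for ids in groups.values())


merged = merge_identities(IDENTITIES)
print(merged)
```

Automatic matching like this still needs manual correction by someone who knows the community, since shared usernames do not always mean the same person.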

B

So we have a kind of external or third-party database where we can store everything, and then everyone can opt in or out of the rest of the visualizations and so on, and you can still analyze the information. Let's say that once we have SortingHat doing its job, we have the raw indexes. Then the next step is to enrich those indexes. By enrich,

B

I mean basically creating specific indexes focused on your business model, and the business model we are talking about today is community development. So we are turning those datasets that are in the raw indexes into something more meaningful for the final user. An example: if we think about Git activity, we have a bunch of commits, right? In a commit we see the author of the commit and the committer, we have the date, we have the time zone. We know the files that were modified or moved or copied or created from scratch.

B

Then we have the lines for each of them that were added, removed or modified as well. All of this information can be parsed and transformed. For instance, by default GrimoireLab is producing, as far as I remember, three or four indexes based on Git information. One of them is at the granularity of commits, so we can go there and check who is working with whom, on what commits or file paths, etc. The next one gives you a finer granularity.
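
A minimal sketch of what enriching a raw commit could look like, at commit granularity and at the finer per-file granularity. The raw document layout and every field name here are assumptions for illustration, not GrimoireLab's actual index schema.

```python
from datetime import datetime

# A hypothetical raw commit document; field names are illustrative only.
RAW_COMMIT = {
    "hash": "a1b2c3",
    "author": "Jane Doe <jane@example.com>",
    "date": "2020-06-01T10:15:00+02:00",
    "files": [
        {"path": "pkg/api/server.go", "added": 40, "removed": 5},
        {"path": "docs/README.md", "added": 3, "removed": 0},
    ],
}


def enrich_commit(raw):
    """Flatten a raw commit into fields that are easy to aggregate."""
    name, _, email = raw["author"].rpartition(" <")
    email = email.rstrip(">")
    when = datetime.fromisoformat(raw["date"])
    return {
        "hash": raw["hash"],
        "author_name": name,
        "author_domain": email.split("@")[-1],
        "commit_date": when.date().isoformat(),
        "utc_offset_hours": when.utcoffset().total_seconds() / 3600,
        "files_changed": len(raw["files"]),
        "lines_added": sum(f["added"] for f in raw["files"]),
        "lines_removed": sum(f["removed"] for f in raw["files"]),
    }


def enrich_by_file(raw):
    """Finer granularity: one enriched item per touched file path."""
    base = enrich_commit(raw)
    return [
        {
            "hash": base["hash"],
            "author_name": base["author_name"],
            "path": f["path"],
            "lines_added": f["added"],
            "lines_removed": f["removed"],
        }
        for f in raw["files"]
    ]


enriched = enrich_commit(RAW_COMMIT)
per_file = enrich_by_file(RAW_COMMIT)
print(enriched)
```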

B

There we can go to the level of the file path, so we know where specifically certain organizations or people are participating. So if we have some critical area in our open source project and those developers leave the community, because there is turnover, right, turnover happens, we can look for the right expertise to try to fill that knowledge gap. But we need the data in advance to understand what's going on. Then there is another index that we can create.
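
The knowledge-gap idea can be sketched with file-path data. The sample touches, the `area_of` grouping by the first two directories, and the 50% threshold are all hypothetical choices made for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (author, file path) pairs, one per file touched by a commit.
TOUCHES = [
    ("alice", "pkg/router/route.go"),
    ("alice", "pkg/router/match.go"),
    ("alice", "pkg/router/route.go"),
    ("bob", "pkg/router/route.go"),
    ("bob", "pkg/build/build.go"),
    ("carol", "pkg/build/build.go"),
    ("carol", "pkg/build/cache.go"),
]


def area_of(path, depth=2):
    """Use the first `depth` directories as the 'area' of the code base."""
    return "/".join(path.split("/")[:depth])


def areas_at_risk(touches, departed, threshold=0.5):
    """Areas where departed developers made more than `threshold` of the touches."""
    per_area = defaultdict(Counter)
    for author, path in touches:
        per_area[area_of(path)][author] += 1
    at_risk = {}
    for area, counts in per_area.items():
        total = sum(counts.values())
        lost = sum(n for author, n in counts.items() if author in departed)
        if lost / total > threshold:
            # People still around who have some expertise in this area.
            remaining = sorted(a for a in counts if a not in departed)
            at_risk[area] = {"lost_share": lost / total, "remaining": remaining}
    return at_risk


risk = areas_at_risk(TOUCHES, departed={"alice"})
print(risk)
```

The `remaining` list is a starting point for finding who could fill the gap in each affected area.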

B

For instance, we are creating what we call the onion analysis. We think of open source communities as an onion: it's a bunch of layers, right? At the very center we have the core set of developers. By definition, we define them as those producing 80% of the activity, of the commits

B

in this case. Then we have regular developers, those producing the next 15%, and then we have the long tail of open source developers that we see in any repository, producing one commit, two, three, four, five. Those are the casual developers, and those are typically filling up to 100 percent by producing this last 5%.
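
The 80/15/5 split described above can be sketched as follows. This is one possible reading of the definition, not GrimoireLab's implementation: authors are ranked by commit count and assigned a layer according to the share of commits already covered by more active authors. The sample counts are hypothetical.

```python
from collections import Counter


def onion_layers(commit_authors, core_share=0.80, regular_share=0.95):
    """Classify authors as core, regular or casual by cumulative commit share."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    layers = {"core": [], "regular": [], "casual": []}
    covered = 0  # commits already attributed to more active authors
    for author, n in counts.most_common():
        share_before = covered / total
        if share_before < core_share:
            layers["core"].append(author)
        elif share_before < regular_share:
            layers["regular"].append(author)
        else:
            layers["casual"].append(author)
        covered += n
    return layers


# Hypothetical commit log: one entry per commit, holding the author name.
authors = (["ann"] * 50 + ["ben"] * 30 + ["cid"] * 10
           + ["dee"] * 5 + ["eve"] * 3 + ["fay"] * 2)
layers = onion_layers(authors)
print(layers)
```

With these numbers, two authors form the core (80 of 100 commits), two are regular (the next 15), and two are casual (the last 5).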

B

So from, let's say, just one data source, which is Git, we can start producing specific indexes, right? This is what we mean by enriched indexes. And then at the very end, at the bottom right part of the chart, you see Kibiter, which is a downstream version of Kibana with, let's say, certain extra vitamins, plugins and so on. Everything is open source, by the way. And then there's the end user, who can visualize all of this information, navigate through the data, and create new visualizations.

B

So this is the tooling we are using.

A

A couple of points about this. I think you went a little fast over SortingHat and identity merging, and I just want to harp a little bit on this. Notice all of those different data sources, and think about, if you're listening to this later, how many different email addresses you use across all of them, and the idea that we would know who you are, that we as community managers would know: oh, this is my Stack Overflow persona,

A

oh, this is my Twitter persona, oh, this is my GitHub persona. When you try to untie the knot that is community relationships, having this facility to do the identity merging across all of these different data sources is really huge.

A

It also leads to the other conversation that we have, about anonymity and ensuring that we respect people's privacy, so that if they want to be anonymous somewhere, they are. So in a lot of what you'll see here today, we're really focusing on public identity stuff, stuff that is in GitHub, and that's what we're doing.

A

If people think that they're still anonymous in the world, we need to really let them know that this is a very simple open source tool and engine that people can use, and you're really no longer anonymous; I guess that's the point that I'm trying to get to here. And that brings in another level of conversation about moving from art to science: you know, are we GDPR compliant? Are we following,

A

you know, the legal stuff? I think that's another whole day's worth of conversation too, but just to let people know, we are working within the legal framework of how we are allowed to use this data. So, just to set that stage: I really spend most of my time in that browser box at the end, and a little bit of my time doing some corrections in the identity merging space.

A

So as a community person working with this dataset, you really need to have some domain expertise. If you look at someone's GitHub repos, they may have contributions to Kubernetes and Prometheus, and then there's some gaming platform over in left field. You need to know enough about the ecosystem to know that that gaming platform isn't really, or hopefully isn't really, something that has repercussions for your ecosystem. So having domain expertise about whatever you're analyzing is really important. So I'll move to the next slide and let you explain. There you go.

B

Perhaps just another point: we are not the only ones doing this, right? There are already open source communities providing such information about identities and affiliations, specifically for attribution, which is what we are doing here, to help advance the development of the community and to get everyone on board earlier or faster with the proper tools. Communities such as OpenStack or CNCF already have certain public datasets with specific identities and affiliations for all of their developers, and this is community curated.

B

That means that you, as a member of the community, can go there and say: I am this person, and I've been working at companies A, B and C during these years. Then your contributions will be correctly attributed, and this is, at the end, important for organizations, so they can see specifically what's going on.
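
Attribution by affiliation period can be sketched as a date lookup. The enrollment table, the person key, and the function name `organization_on` are hypothetical; real affiliation datasets have their own formats.

```python
from datetime import date

# Hypothetical enrollment periods: person -> (organization, start, end).
ENROLLMENTS = {
    "jane": [
        ("Company A", date(2012, 1, 1), date(2015, 6, 30)),
        ("Company B", date(2015, 7, 1), date(2018, 12, 31)),
        ("Company C", date(2019, 1, 1), date(2100, 1, 1)),
    ],
}


def organization_on(person, day, enrollments, default="Unknown"):
    """Return the organization a person was affiliated with on a given day."""
    for org, start, end in enrollments.get(person, []):
        if start <= day <= end:
            return org
    return default


# A commit is attributed to whoever employed the author at commit time.
print(organization_on("jane", date(2014, 3, 1), ENROLLMENTS))
print(organization_on("jane", date(2016, 9, 15), ENROLLMENTS))
```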

B

We can have some other discussions about what, for instance, influence means in an open source community. We can talk about specific roles such as maintainers or core developers, who's playing that role, and what company that person is coming from. And if we go for a more aggressive perspective, then we can ask specific questions such as: what are my competitors doing in the technologies that are key for my technology stack? For that you need to have certain knowledge, and this whole data-driven approach is quite useful to understand

B

what's going on there, because you can have specific questions and specific answers to those questions, beyond your perception, right?

A

I think I started out using the network analysis stuff to understand who was in my community, and I always say, when I'm talking to people who do community development, that the most important first step is knowing who's in your community, how to connect with them, and how they're connected to each other. You can do all the content

A

development and write all the documentation you want, but if you don't even know who your audience is or who the participants are in your community, you're going to end up rewriting that or reframing it in some way. But there's also, and we talk about it quite often, the idea that this is one way to see where the community is going.

A

In some of the earlier analyses we've done, as things like Jaeger, OpenTracing and Zipkin took off, you could see people moving from one project to the next. That historical analysis, and hopefully predictive analysis, is the next layer that we might want to add into this too,

A

to see, as Canadians want to do, where the hockey puck is going; that's really what you want to be watching for too. So the baseline stuff that you need to do, in my humble opinion, is really to know who's in your community and how they're all connected to each other.

A

Then, once you have that grasp of your community, you move on and apply that to pay attention to new projects, serverless or Lego or, you know, a bazillion other projects as they pop up, because then you can start watching the key folks and what they're contributing to. It's really amazing what you can learn from this. You can get lost; it's sort of like social media, you can go down a wormhole, but you always come back up and see how things are interrelated.

A

So it's been hugely helpful for developing OpenShift Commons and making sure everybody is properly connected and supported.

B

Indeed, from that perspective, I think it's worth mentioning that before entering the metrics discovery process, it is really useful to have a certain strategy on the table and certain priorities. People tend to have metrics for the pleasure of having metrics, and the problem sometimes is that you may lose track of where you were going. If you have a proper method, strategy and action plan, then you can play with the data, but you know that you have a path, right?

B

So this is the right way to do it.

A

The other thing, and we'll get to the demo in a second here, that's really important for people to understand is that pretty much every large project has a dashboard that shows you the static stuff: who's the biggest contributor to this project and who's doing the most in it. It's a bragging right for corporate contributors or individual ones, and it's a great way to know how to reward people.

A

But it's really almost useless for doing community engagement, those static pie charts and things. You really need to understand the relationships, not the numbers, and I think that's what this demo hopefully will show you a little bit of. So I'm going to stop sharing my screen and let you share yours, and then we'll see how we're doing for time. We're doing okay, because Daniel and I could talk about this for days.

B

Maybe it's time to introduce the concept of personas and how we're playing with this, do you think?

A

Well, first show what you had there for OpenShift, the second tab there, because that's what I think is the basis of the jellyfish for me, the jellyfish diagram we use in the article. This is really the thing that you can't see in screenshots and stuff, but what you can dive into here are the connectors. The large jellyfish there is Kubernetes and the smaller one is OpenShift, and so we can look at the relationships between who's contributing to OpenShift and who's contributing to Kubernetes.

A

So if you keep diving, as the complexity gets bigger, you can start to see individual people: Luca is in there, Seth is in there. Because I've been working in the OpenShift community, I know almost everybody here, but if a new person pops in, then I become aware of it. You can also get list views of this and all kinds of cool stuff.

A

But it also starts to show you, if you zoom back out (I think you've added in Jaeger here), who's working on OpenShift, who's working on Kubernetes, and who's also working on Jaeger. This became important for me when the Jaeger team from Uber and Red Hat said, "Okay, we'd like some help from you, Diane, to get us into incubating status over at CNCF," and I did not know everybody in the community. So I was able to pull in this data,

A

look at who from Red Hat was contributing, who from Uber and other places, and these were my key people to connect with to help move that project through to the next level. The team did an awesome job, and you can see Yuri's there and a bunch of other folks. They may not have been contributing to my project, OKD, Origin, OpenShift, but they were contributing to a key thing in the ecosystem, Jaeger and OpenTracing, that was integral to people successfully using OpenShift and to us deploying it,

A

you know, in over 2000 enterprises. So this was a great way to use the network analysis in this space. Maybe you want to add a few more words in there about how this actually works.

B

So, just to explain a bit more how this works, because we didn't show it in the previous slide, although the picture was already explained: the dots that we see are developers, and those are displayed if they have committed something during the last year. You can see the filter here.

B

It is Kubernetes, in this case, OpenShift and Jaeger. Jaeger, I think, is already a graduated project; here we still have it assigned to incubating. But anyway, we specify this filter here, so we are sure that we are analyzing only Kubernetes, OpenShift and Jaeger. That's why we know this is the data, and this is not any other project in the incubating or graduated ecosystem. The bigger you are, the more commits you have made to that specific project.

B

So we have some dots that are bigger than the others; those dots are developers that have contributed more commits than the average. We can see some of them here. Then we see this number of developers here that have an edge into Kubernetes and an edge into OpenShift. This means that in the last year those developers, all of these here, have contributed to both worlds, in this case Kubernetes and OpenShift. And the same,

B

the same thing happens here: we have these three developers that within the last year have contributed to OpenShift and Jaeger, in this case. And then we can see some others that have contributed to Kubernetes and the others as well.
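
The network just described (dots sized by commits, edges from developers to projects, developers linked to more than one project) can be sketched with plain counters. The commit tuples, developer names and the one-year window below are hypothetical, and this is not how the dashboard itself is implemented.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# Hypothetical commits: (developer, project, commit date).
COMMITS = [
    ("dev_a", "kubernetes", date(2020, 5, 1)),
    ("dev_a", "openshift", date(2020, 6, 1)),
    ("dev_a", "openshift", date(2020, 6, 10)),
    ("dev_b", "jaeger", date(2020, 4, 1)),
    ("dev_c", "jaeger", date(2020, 3, 1)),
    ("dev_c", "openshift", date(2020, 2, 1)),
    ("dev_d", "kubernetes", date(2018, 1, 1)),  # outside the time window
]


def build_edges(commits, today, days=365):
    """Count commits per (developer, project) edge inside the time window."""
    since = today - timedelta(days=days)
    edges = Counter()
    for developer, project, day in commits:
        if since <= day <= today:
            edges[(developer, project)] += 1
    return edges


def node_sizes(edges):
    """A developer's dot grows with their total commits in the window."""
    sizes = Counter()
    for (developer, _project), commits in edges.items():
        sizes[developer] += commits
    return sizes


def connectors(edges, min_projects=2):
    """Developers linked to at least `min_projects` different projects."""
    projects = defaultdict(set)
    for developer, project in edges:
        projects[developer].add(project)
    return {d: sorted(p) for d, p in projects.items() if len(p) >= min_projects}


edges = build_edges(COMMITS, today=date(2020, 7, 2))
sizes = node_sizes(edges)
shared = connectors(edges)
print(sizes)
print(shared)
```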

B

So these are the basics of the network diagram. In addition to this, on top of this, we can specify certain filters, such as the ones we already provide. We can go to the time picker here and select the last month if we are interested, and then we can produce other kinds of datasets or widgets. For instance, you were specifically mentioning the newcomers: we can have a list of the very last people that joined the community.

B

Then we can say, from a community perspective, "Hello, welcome," and you can help them or facilitate the onboarding process and so on. So maybe you can explain a bit more about your specific work there.
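
A newcomers list of this kind can be sketched by finding each person's first activity and keeping those whose first activity falls inside the selected time window. The events, the 30-day default and the function name `newcomers` are assumptions for illustration.

```python
from datetime import date, timedelta

# Hypothetical activity events: (person, date of a commit, issue or comment).
EVENTS = [
    ("dev_a", date(2019, 1, 10)),
    ("dev_a", date(2020, 6, 20)),
    ("dev_b", date(2020, 6, 15)),
    ("dev_c", date(2020, 6, 28)),
    ("dev_d", date(2020, 3, 1)),
]


def newcomers(events, today, days=30):
    """People whose first recorded activity is within the last `days` days."""
    first_seen = {}
    for person, day in events:
        if person not in first_seen or day < first_seen[person]:
            first_seen[person] = day
    since = today - timedelta(days=days)
    recent = [(day, person) for person, day in first_seen.items()
              if since <= day <= today]
    # Most recent arrivals first.
    return [person for _day, person in sorted(recent, reverse=True)]


latest = newcomers(EVENTS, today=date(2020, 7, 2))
print(latest)
```

Here `dev_a` is active in the window but first appeared in 2019, so only `dev_c` and `dev_b` count as newcomers.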

A

So I think one of the things that is hard to tease out is retention of newcomers and engagement with newcomers and new organizations. From my perspective, I'm very organization based, so when a new organization starts contributing to OpenShift or starts using OpenShift, I want to know about it. This data also includes whether they've logged an issue

A

or made a comment on Stack Overflow, all kinds of different places. So this really helps me, as a community development person, understand new entrants and who they are. Then the onboarding process begins: the outreach, making sure that they have what they need. And that doesn't always mean, like, stalking them or throwing information at them.

A

Just being aware is huge, because then, when they show up at maybe your event, or they ask a question, you're aware that they're there in the community, and that gives you a step ahead. And we'll talk about this a little bit later: there's a number of personas that we tease out from this data that really help. Maybe if you dive into the Clayton Coleman historical analysis, that'll help a little bit too.

A

So if you could explain what you're showing here. And if people don't know Clayton, then they don't know Kubernetes; I think that's a bumper sticker somewhere. He is one of the lead contributors and architects for OpenShift and on Kubernetes itself. So watching someone like him evolve over time is a really good example of how someone onboards and gets deeply involved in a project.

B

Yes. So this dashboard contains a bunch of widgets, as you can see, and this is, so far, for the year 2012, so eight years ago. On the left we have the number of commits for each of the projects, repeated for each of the years: for each year we have one of the bars, and we will see more bars in the following years for Clayton. Each of them is split into the different repositories this developer has been participating in.

B

At the same time, on the right we have a network diagram where we can see Clayton in the middle, and then we will see all of the repositories Clayton has been participating in during each of the years. So we will see a kind of snapshot of Clayton for 2012, 2013, 2014, 2015.
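
The per-year snapshots for one developer can be sketched as a grouping of commits by year and repository. The commit tuples and repository names below are hypothetical placeholders, not the data shown in the demo.

```python
from collections import Counter, defaultdict

# Hypothetical commits: (developer, repository, year).
COMMITS = [
    ("dev_a", "origin", 2012),
    ("dev_a", "origin", 2012),
    ("dev_a", "sample-app", 2012),
    ("dev_a", "origin", 2014),
    ("dev_a", "kubernetes", 2014),
    ("dev_b", "kubernetes", 2014),
]


def yearly_snapshots(commits, developer):
    """Map each year to the repositories the developer committed to, with counts."""
    snapshots = defaultdict(Counter)
    for author, repository, year in commits:
        if author == developer:
            snapshots[year][repository] += 1
    return {year: dict(snapshots[year]) for year in sorted(snapshots)}


snapshots = yearly_snapshots(COMMITS, "dev_a")
for year, repositories in snapshots.items():
    print(year, repositories)
```

Each year's dictionary corresponds to one bar on the left (split by repository) and one network snapshot on the right.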

B

So this is the beginning: OpenShift, and then we have Origin, a WordPress example, and then some other samples. Those are the two or three main projects Clayton was contributing to in this case. If we move on, this is 2013. We can see that there are some more repositories: a Python interface, the website for OpenShift, I think, examples for Play and for Java, the client, and some others. So we see how the network is kind of growing.

B

In 2014 we can see OpenShift still has most of the activity for Clayton, but then we start to see certain other projects. Instead of listing each of the projects in the CNCF ecosystem, such as Kubernetes or Jaeger and so on, we have grouped them as graduated and incubating. You will see how this keeps growing. If we go to the specific repositories, we can see that this is Kubernetes, this is the API, and then these are examples of how to use Kubernetes. So this is the whole activity of Clayton in 2014. As we keep advancing, most of the work is still in OpenShift Origin, but more and more commits land in the CNCF ecosystem. In 2016, even more repositories, and then we have incubating projects, so probably some new projects in the CNCF ecosystem, plus all of the graduated ones.

B

Most of them, as you can see, are Kubernetes: examples, community, the API and Kubernetes itself, among others. Then we can go to 2017, and Clayton's network keeps growing. After that we have some activity in the Operator Framework; he has started to participate there. And then 2019: we have the Operator Framework, some incubating projects, graduated projects, and OpenShift. And then, kind of, nowadays, so the last six months approximately. This is mostly what I think we have for Clayton.

A

This is interesting because, had we known nothing about Clayton or the oncoming of Kubernetes, and had we been watching Clayton evolve back in 2012,

A

theoretically, we could have started to see the importance and the rise of Kubernetes from this. Anybody outside of Red Hat probably could have seen it. I think we saw it inside because Clayton was vociferously endorsing the work that was going on in Kubernetes. But I think you can see from this example that there are also ways to start seeing, as people move to other technologies, whether they're edge or IoT, or they start using Open Data Hub or different networking solutions or load balancers or whatever it is,

A

when they start contributing to other projects or posting questions about them. You can start to see where things break down, where things are picking up speed, and where projects are maturing. So it's a really useful set of tooling for people who are ecosystem watchers, like myself.

A

Do you want to add any more to that?

B

I thought we could move to the organization persona specifically. What do you think?

A

Absolutely, and then we can hit the slides after that, definitely, yeah.

B

So yeah, this is the example. In the same way, we can look for specific people, or newcomers: in this case, the newcomers persona.

B

We were discussing that it's important for you, Diane, the newcomers in the sense of new organizations coming into the community, and then the relations with other communities or organizations. So in this case, the example we see right here, this chart is Uber activity in the whole CNCF plus OpenShift plus operators. So the dots again are developers, and then we can see that most of them are in certain specific repositories.

B

So we have it on screen, then we can see it in some more detail: OpenTracing, Jaeger in this case, and then we have the developers working there. You see more OpenTracing, gRPC, Prometheus, OK. And then perhaps, if we move to the next one, we can see how this is related to Red Hat. Diane, maybe you can elaborate a bit more about the importance of connectors.

A

There's a couple of things that this is showcasing. One is that I look at OpenShift through an organization-based set of glasses, so I like to look at, whether it's Uber, who is not an OpenShift customer, how they touch down in our ecosystem, and then people who are end users, what their different spheres of influence are and how we're connected to them. But this also really, you know, shows me, if I need to find someone to talk about not just OpenTracing, but maybe Prometheus or chaos

A

Engineering or, you know, whatever it is. This starts to show me the people who are the influencers or the connectors between projects. So say I'm looking for someone who's done something with Grafana, OpenTracing, Kubernetes and OpenShift to speak internally at Uber, right, or, you know, at a CNCF conference. These diagrams allow me to figure out and trace, not to be using a pun, the relationship back to someone who might either be that person to speak, or know the person, or help

A

Another person speak with a little bit more insight into the project. So it's really been a huge tool to help build peer-to-peer relationships, to help foster collaboration across projects, and to see where cross-pollination between projects is happening organically.
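
The search for a connector, someone who has worked across a given set of projects, can be sketched as a set-containment query over contribution records. This is a minimal illustration under assumptions: the `contributions` pairs and contributor names are made up, and the real diagrams are built from richer data than a list of (person, project) pairs.

```python
from collections import defaultdict

# Hypothetical (contributor, project) pairs, invented for illustration.
contributions = [
    ("dev-a", "jaeger"), ("dev-a", "opentracing"), ("dev-a", "kubernetes"),
    ("dev-b", "jaeger"),
    ("dev-c", "prometheus"), ("dev-c", "kubernetes"), ("dev-c", "jaeger"),
]


def build_index(pairs):
    """Map each contributor to the set of projects they have touched."""
    projects_by_dev = defaultdict(set)
    for dev, project in pairs:
        projects_by_dev[dev].add(project)
    return projects_by_dev


def find_connectors(projects_by_dev, wanted):
    """Return contributors who have touched every project in `wanted`."""
    wanted = set(wanted)
    return sorted(dev for dev, projects in projects_by_dev.items()
                  if wanted <= projects)


index = build_index(contributions)
speakers = find_connectors(index, ["jaeger", "kubernetes"])
print(speakers)
```

An empty result is still useful: it says that nobody in the data bridges those projects yet, which is where domain knowledge and personal contacts have to take over.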

B

So in this case, what we can see in this example are Uber and Red Hat contributions to those areas of the ecosystem: the CNCF graduated and incubating projects, operators and OpenShift. And then the legend of colors is this: purple is Red Hat, and then Uber is this kind of orange color. Then we can see that there are some Uber developers, and then there are relations, because we can see that there are different developers.

A

If you go back up a little bit, that really big dot there is Travis Nielsen. I don't know if you added Rook in here; he's one of the leads on Rook. So it's interesting to see where people pop up in other diagrams as well, and there's a whole slew of work there. So which repository is that one connecting to, that Travis is in the center of? Oh, so.

B

These are all of the incubating projects. These are all the projects there.

A

I admit, that is, yeah, he's there because of Rook, yeah. And that's again why you kind of need some domain knowledge as well. So it's not perfect; it's not going to make you AI-intelligent about who's in your community, but it does give you a big jump start, yeah.

B

And that's a really good point, the domain expertise, because each time, sorry, it usually happens, right: I point to certain data sets and then they say, "Oh, that makes absolute sense because of this and this reason," and then I say, "OK." So it's like I can point to the specific oddities in the data set, or highlight certain areas, and then you say, "Oh, that makes sense because of this," and then you can go there and dig into the data and so on.

B

So that's really, really important: to have this tandem between context knowledge, domain knowledge and expertise.

A

Into the slide.

B

Just to mention it, then. Yeah, yeah, you share your screen.

A

I will share my screen now, and we'll pop back into the slides and talk a little bit about the personas.

A

Here, so we talked a little bit about this, and let's see if we can get this to go forward. There we go. So this is, you know, the uber diagram (I keep using that word), the overall thing, and this again is a screenshot from 2019. It is much bigger now, and I need to take a new screenshot.

A

Well, we dove in a little bit to see how things worked with Jaeger and OpenShift and Kubernetes, and that was really helpful for me, especially when I was first learning a little about Jaeger and OpenTracing. Being knowledgeable about who was in the community was key for me to be able to help them nurture the relationship with the CNCF to get to incubating and now graduated status. And I was not a participant in that community, so I had no foreknowledge other than that.

A

The other thing that it lets you tease out is where other people who you know in the community are, like Greg Swift. I had no idea he had any connection to Jaeger, so it was really pretty cool to be able to do this. And so that first pass at really leveraging the data led us to start talking about OKD personas, because OKD is really the project that I try and foster, along with a few others like Quay and operators and others.

A

But this is really, then, for me, about assigning personas to these folks to help me sort of untangle the community relationships. At the moment I have about five personas that I look at and categorize people as. The tangential personas: people who are working in one community but not working in others, so they're kind of tangential to your project. They may not be working on OpenShift, but they're still important to OpenShift, like Yuri from Uber. Or connector personas, who are working in multiple ones.

A

Those are really good. And then we mentioned earlier the newcomer personas. A very important part of community development is flagging new entrants, fostering them, you know, understanding how long they stay and how long it takes them to get deeply involved. Another very important aspect of community development is identifying project lead personas. So Clayton, of course, was a known entity to anyone inside of Red Hat and pretty much anyone inside of Kubernetes, but we're starting to figure out how to identify other folks, as we want to create more diverse and healthy ecosystems, and someday Clayton might want to retire.

A

So who are we going to level up and put in maintainer and contributor roles? We're doing that, you know, to make sure we have a diverse and healthy group of project leads. And then again, for me, organizational personas: that's when you aggregate everybody from, whether it's Uber, Amadeus or any one of the end users that are using your project, to really understand how they're using it and what other projects they're using.
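
As a rough illustration of how some of these personas could be expressed as rules, here is a minimal sketch. It is a hypothetical rule of thumb, not the method used in this work: the 90-day newcomer window, the `home` project, the extra `core` label for people who only work on the home project, and all contributor data are assumptions made up for the example. Project lead and organizational personas are left out because they need role and affiliation data.

```python
from datetime import date


def classify(projects, first_seen, home, today, newcomer_days=90):
    """Assign one illustrative persona label to a contributor.

    projects:   set of project names the person has contributed to
    first_seen: date of their first recorded contribution
    home:       the project whose community is being analyzed
    """
    if (today - first_seen).days <= newcomer_days:
        return "newcomer"      # recently arrived, regardless of where
    if home in projects and len(projects) > 1:
        return "connector"     # works on the home project and on others
    if home not in projects:
        return "tangential"    # active nearby, but not on the home project
    return "core"              # hypothetical label: home project only


# Hypothetical contributors, invented for illustration.
people = {
    "dev-a": ({"okd", "kubernetes"}, date(2015, 1, 10)),
    "dev-b": ({"jaeger"}, date(2017, 3, 1)),
    "dev-c": ({"okd"}, date(2020, 6, 1)),
    "dev-d": ({"okd"}, date(2016, 5, 5)),
}

today = date(2020, 7, 2)
labels = {name: classify(projects, first_seen, "okd", today)
          for name, (projects, first_seen) in people.items()}
print(labels)
```

Rules like these can only pre-sort people for review; deciding who really is a connector or a future project lead still depends on knowing the community.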

A

So, as we saw, we didn't actually have the data for OpenStack, but we could have gone back further, because Clayton had done some work on a little tiny aspect of OpenStack with me ages ago. And so when we bring in the OpenStack data, you can even tease out how people migrated from OpenStack to Kubernetes, or other aspects.

A

So it's really a very interesting way to see how people show up in communities and where things are going. And the small part of OpenStack was a project called Solum, which was supposed to be OpenStack's platform as a service back in the day. If anyone remembers that, shout out to Adrian Otto. And yeah, so we could really dive into that.

A

The other one we talked about was organizational personas: being able to see where they're working, in what space, you know, where they overlap when they contribute to your project and to other projects. And really I use that on a regular basis to understand what our end users are doing, and to make sure that, you know, if there's a new feature or a new set of technologies out there, like edge or IoT or networking, storage, you name it, that they're looking at or starting to contribute to, we want to make sure

A

We know that, from a product perspective and from a community perspective. So, OK, this is a good one here; I'll just walk through it quickly. This was all of the projects that CERN was contributing to. And then the other persona that we started to look at was when we dove down into an individual person, because we knew Greg Swift, who is now at LogDNA but at the time was at Rackspace, so he had some OpenStack connections.

A

He had that tiny little, or not tiny, I'm sure it was a real contribution to the Jaeger conversations, but we could start to tease out and see where they were playing in that. So it's really kind of interesting, and plus I had all the data from Commons, so hence,

A

Therefore, in 2018 he was also my contributor to the conversations in the community award that we gave him. So there's really lots of great ways to use this. And then, as I mentioned earlier, Yuri, who was tangential but very important to this, as well as other work that's being done at Uber on operators and the operator pattern, using it for M3DB, showcased that at a Commons event, or they were showcasing that at the CNCF event a while back.

A

So getting these advance signals, even if they're weak signals, they're still really important signals for being aware of what people are doing. And hence some influence around the conversations around operators and the operator framework and the operator pattern kind of emerged, and that was pretty important. Again, being able to really look deeply into one of your corporate customers' personas: where they show up, what they're working on, using OCPD on Azure, OCP on OpenStack.

A

Amadeus has been a huge OpenShift Commons community member. They've been onstage at Red Hat Summit, they've been in CNCF talks, but being able to really see where they're going and what new technologies they might be working on matters. We had them on stage talking about Kafka not too long ago, because they were some of the leading lights using Kafka in an enterprise situation and willing to talk about it. So that was a great opportunity to do that. I also mentioned going down wormholes.

A

This might have been one of the wormholes, but it turned out to be not quite as bad of a wormhole as I thought it was. We kind of laugh a little bit about this.

A

One thing is that the data is not always perfect, and every once in a while we do the sorting hat, and this is where we go back to having domain expertise. Teasing out why Kim Min showed up as a contributor to OpenShift turned out to be a misinterpretation of the data, in terms of one of the issues or something that was logged to something. However, it did give me a very weak signal that at Alibaba and Alipay they were looking at OpenShift and OKD and Origin, which then merged into.

A

They eventually had a deployment of OpenShift and OKD there. And I ran into them at one of the CNCF events, or it was a Linux Foundation event, and they came up to me afterwards and said, "Hey, yeah, yeah, this is who I am." But correctly identifying people is pretty important too. And then, as everybody's well aware, we have another problem space too, now that IBM and Red Hat are conjoined twins and are all under one umbrella.

A

Learning who is in the IBM world and also contributing to the different projects, so that we can, you know, make the best of and take advantage of where we have other representation and other network connections in projects. So that's another thing that we've been looking at closely with all of this data. So yeah, those are pretty important relationships.

A

Obviously it really has helped us a lot with the Commons model, which is the ecosystem-based, open source community development that we're working with here at OpenShift and at Red Hat. And really, our goal is not to stalk people or do that; we're really trying to promote peer-to-peer interactions. So it allows us to understand where those interactions are happening across projects and nurture them to, as I always like to say, give away the podium, because it's often not about the code contribution at all.

A

It's more about sharing the information and the knowledge, and making the connections, so that when someone is working on one feature in one project that impacts another one, we get them to connect, or are able to facilitate your feature getting into their roadmap. Making those connections is really what community development is now all about, rather than just trying to get everybody to contribute code to yours. So whatever your metrics are on Stackalytics or whatever the dashboard is, it looks great.

A

We all know that's a wonderful thing, to be the number one contributor to a project or whatever, and our powers-that-be love us to be there. However, the more important thing is that all of the communication and the network of peers is nurtured and healthy and, again, diverse and well engaged, and that people know how to engage with each other. So that has really been the model that we've been going for with OpenShift

A

Commons: giving away the podium, pulling in people to speak at things like OpenShift Commons briefings on topics that you might not have thought were relevant. But once you look at the model you can see, oh, there's this project out there that's about to hit you all like a ton of bricks, so you'd better know something about it. So we'll pull someone in there and give them the podium.

A

So that's really kind of what we've been teasing out over the past couple of years and whenever anyone hears me talk about jellyfish, they probably shut down their ears now. But these are the kinds of tools that we really think help build healthy communities, because it's not possible any longer with the complexity in these communities and these relationships to do it on gut or personal relationships.

A

There are just way too many repos to watch. There are way too many people in those repos. There are way too many relationships, and so much of our companies and our customers and our end users depend on these things being well-oiled machines that we can't really risk it on a gut instinct, or on Diane putting a mailing list into a spreadsheet and doing analytics on it, anymore. We haven't done that for a long time.

A

This is really, then, the thing that's helped us to do this. And so these are some of the conclusions that we have sort of reached. There are many more that we can tease out here, but I think it's pretty obvious that no company, whether it's an end-user company, a technology company or a hosting provider, is really working on just one thing. That's been pretty key, and this data-driven approach has been really helpful for upstream coordination, and that is essential, and these relationships really, really matter to everybody.

A

Having domain knowledge is really, then, key. And this is not really an attack on old-school community management of individuals; that kind of nurturing still needs to happen for your project. You can't abandon that, but it does behoove you to take a more ecosystem approach, and to help you do that with some data-driven tools. And then, as Daniel always tells me, data matters.

A

You've got to clean your data and curate your data, and you need good tools like we've gotten from Bitergia, and I know,

A

I had in the beginning a routine: every Saturday morning, I'd sit down with a cup of coffee and run the report and see who the outliers were, where there was duplication, and where the sorting hat didn't work, and I'd have to go back in and do that cleanup work. So I think that's been, for me, one of the habits that I would like to see more community people develop and incorporate: really to start understanding who's in your community. I can't say that more vociferously.
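
The kind of duplicate hunting described here can be sketched in a few lines. This is a generic illustration under assumptions, not the sorting hat tool's actual API or matching logic: the identity records, field names and helper functions are all invented for the example. It merges records that share a normalized email address and flags names that still map to more than one address as candidates for manual review.

```python
from collections import defaultdict

# Hypothetical identity records collected from different sources.
identities = [
    {"name": "Pat Example", "email": "Pat@Example.com", "source": "git"},
    {"name": "pat example", "email": "pat@example.com", "source": "github"},
    {"name": "Pat Example", "email": "pat@corp.example", "source": "mailing-list"},
    {"name": "Sam Sample", "email": "sam@example.org", "source": "git"},
]


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def merge_by_email(records):
    """Group records that share the same normalized email address."""
    merged = defaultdict(list)
    for rec in records:
        merged[normalize(rec["email"])].append(rec)
    return dict(merged)


def review_candidates(merged):
    """Flag names that appear under more than one email for a human to check."""
    emails_by_name = defaultdict(set)
    for email, recs in merged.items():
        for rec in recs:
            emails_by_name[normalize(rec["name"])].add(email)
    return {name: sorted(emails)
            for name, emails in emails_by_name.items() if len(emails) > 1}


profiles = merge_by_email(identities)
to_review = review_candidates(profiles)
print(to_review)
```

The automatic step handles only the safe case; the flagged names are deliberately left for a person to confirm, since two people can share a name and one person can use many addresses.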

A

That is the key to all of this.

A

If you don't understand the domain and you don't know who's in your community, it's really difficult to do any aspect of community development, be it marketing, content delivery, code onboarding, any aspect of community. And then I always put in that anonymity is dead, because it is. And then maybe what's next: now that we're part of IBM, maybe someday they'll give me access to IBM Watson, and we can tie that in and do some real predictive analysis, or even better, take Open Data Hub and apply it to this,

A

Just like we do for telemetry and other things. So that's kind of where we're at right now in terms of the deep dive here. And Daniel, now that I've talked forever, are there other things you want to add in there? And I'll look and see if we have any questions or if anyone's asked anything.

B

No, no, there are many, many more things to do, though. I have to say that it is a really interesting and fun thing to deal with. I'm pretty happy to have participated and to keep evolving this whole concept.

A

Yeah, the one question that's come in, which I think is a good one too, is: what is the correlation between code collaboration between personas and the company team membership? That's an interesting one. I've used the tooling so far to identify the team from, say, Amadeus or Uber, you know, who is working on the open-source side. It doesn't give me insights into who's behind the firewall. I don't always know everybody at Amadeus, but it does give me a way to do that.

A

We could easily, with this tool, watch the development like we did with Clayton's analysis: instead of just doing an individual, watch the growth of open source participation in different repos for an entire organization. And that would show us, I think, a bit of the correlation between code collaboration between the personas and that. What we haven't done is tagging.

A

The tagging or the grouping of people into those personas is still a hand-wavy Diane thing: when I see someone, I recognize them now as tangential, or I recognize them as another persona. The tooling does not recognize people automagically

A

Yet, as that. And that's where I think maybe the predictive stuff might help us too: to recognize what the path is from being tangential to being connected to multiple things, historically. And, you know, there's only so much time in the day, but these are things that are very much of interest to me to continue to do with this work.

A

You know, as we try and nurture healthy, engaged and diverse communities, these kinds of tooling, and the metadata that we add to, say, the sorting hat and the identity management, will hopefully help us tease out different issues around marginalized communities, and make sure that we give the podium and the support to people across lots of communities, whether they are technology communities or communities of interest in other aspects of their lives.

A

I think that's what we had time for today. And I know, Daniel, it's late where you are; you're off in Spain and I'm in Canada, and we're probably the only people that don't care about the holiday coming up this weekend. But if you have questions, please do reach out to us.

A

We're very happy to make that happen for you. And let me just throw up the last page here, my favorite page, the Canadian one, because yesterday was July 1st, Canada Day, and this is a wonderful Wayne Gretzky quote. The goal here is to skate to where the puck is going, not to where it's been; but where it's been always informs us, hopefully, how to grow new people in your community and keep them engaged.

A

So with that, and we'll see if we can get to the next slide someday, we'll just say thank you. And if you're interested in this topic, please reach out to us and we'll be happy to continue the conversation.