From YouTube: CHAOSS Webinars: GrimoireLab
Description
Overview of the GrimoireLab system, one of the software projects produced by CHAOSS. Webinar in the series of CHAOSS Webinars. April 17th, 2018.
Slides: https://speakerdeck.com/jgbarah/chaoss-webinars-grimoirelab
Let's go to the next slide. CHAOSS is a working group hosted by the Linux Foundation, devoted to producing integrated open source software for the analysis of software development, and to producing a set of metrics which are useful for analyzing the health and the situation of open source software projects. Today we are going to talk about GrimoireLab, which is one of the tools that we have developed with respect to this goal of producing open source software for analyzing software.
This is the kind of stuff that you can produce with GrimoireLab. On the right you have a dashboard; this is a screenshot of a real, public deployment, and if you want you can go and check the real thing. There you have an actionable dashboard, based on Kibana, where you can drill down and have a look at what is happening in the project from many different points of view.
GrimoireLab is capable of analyzing a lot of different data sources: git repositories, Gerrit, Bugzilla, Slack, IRC, mailing lists, and so on, up to some 20 or 25 of them, and of producing this kind of dashboard. You can also produce reports, which are PDF files where you have a description of what is happening in the project from the software development point of view. In addition, GrimoireLab is also a set of Python tools, which means that you can use them from Python: you can write your own code to do anything you may want.
The idea is to have as much information as possible, ideally all the information available in the original data source, and to store it in a database. Once you have all the information in the database, you don't need to go again to the primary sources, so you don't need to retrieve stuff again from the original API or from the original repositories, which usually is much more efficient. So in the end the idea is to have a copy of everything in the database; this is what we call the raw indexes.
So the first step for the data is extraction. Extraction is done mainly by Perceval. Perceval is a set of libraries which try to provide a single API for retrieving information from any of the data sources. Then we have Arthur, which is a way of orchestrating Perceval to retrieve information from a large number of repositories, in parallel and continuously. And we have a processor for all that information, which is GrimoireELK; GrimoireELK is basically driving Perceval and storing the raw information in the database.
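The core design idea here, one uniform fetch-style API in front of very different data sources, can be sketched as follows. This is an illustrative toy in the spirit of Perceval, not Perceval's actual code; all class and field names below are invented for the example.

```python
# Illustrative sketch of a "single API for many data sources" design,
# in the spirit of Perceval. Names and structures are invented for
# the example; they are not Perceval's real classes.
from abc import ABC, abstractmethod
from typing import Iterator


class Backend(ABC):
    """Common interface: every data source yields plain-dict items."""

    @abstractmethod
    def fetch(self) -> Iterator[dict]:
        ...


class ToyGitBackend(Backend):
    def __init__(self, commits):
        self.commits = commits

    def fetch(self):
        for c in self.commits:
            # Every item carries the same envelope, whatever the source.
            yield {"backend": "git", "data": c}


class ToyIRCBackend(Backend):
    def __init__(self, messages):
        self.messages = messages

    def fetch(self):
        for m in self.messages:
            yield {"backend": "irc", "data": m}


def collect(backends):
    """A consumer only needs the shared interface, not the source type."""
    return [item for b in backends for item in b.fetch()]
```

A caller can then mix sources freely: `collect([ToyGitBackend([...]), ToyIRCBackend([...])])` returns one homogeneous stream of items, which is what makes storing everything in a single raw index practical.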
We are using an Elasticsearch database for this part. Then there is something which is quite interesting when you need to analyze software development: identifying the many identities a developer uses. Even for very simple stuff, like counting how many people are participating in the project, you need to find out the different identities that developers are using and merge them. In addition, you can also profile them a bit, for instance recording which company they are working for, among other details.
If you want to track that kind of information, to have statistics based on companies in addition to statistics based on developers, that is the job of SortingHat, which is the tool that we have for managing identities in a MariaDB or MySQL database. That information is combined with the raw indexes to produce the enriched indexes; enriched indexes are indexes designed to be simpler to visualize and simpler to produce reports from.
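The identity-merging problem described above can be shown with a tiny sketch, in the spirit of what SortingHat does: several identities (emails, usernames) map to one profile, so that "how many people participate?" counts people rather than identities. The mapping and commit data are invented for the example.

```python
# Toy sketch of identity merging: four identities, but only two people.
# The mapping table and commits are invented for the example; SortingHat
# keeps this kind of mapping in a MariaDB/MySQL database.

identity_to_person = {
    "jane@example.com": "Jane Doe",
    "jdoe": "Jane Doe",
    "jane.doe@corp.example": "Jane Doe",
    "bob@example.com": "Bob Roe",
}

commits = [
    {"author": "jane@example.com"},
    {"author": "jdoe"},
    {"author": "bob@example.com"},
    {"author": "jane.doe@corp.example"},
]


def unique_contributors(commits, mapping):
    """Count people, not raw identities; unknown identities count as-is."""
    return {mapping.get(c["author"], c["author"]) for c in commits}
```

Without the merge step, a naive count over these commits would report four contributors instead of two, which is exactly why the enrichment phase needs the identity database.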
We also have Mordred, which is the tool that takes care of the configuration of the software. Mordred is usually how you run everything together: you produce a simple configuration file, and in that configuration file you define the data sources and how you want them to be dealt with, and in the end Mordred is capable of producing both the documents and the dashboards.
We have a couple more tools that are not in this diagram, which we are still integrating right now: one for managing identities in a web browser, so that you can do merging of identities, or identification of identities, in the browser; and another one for dealing with the configuration of Mordred, mainly the lists of repositories and projects. We also have a new library for doing enrichment. Those tools are already out, and right now they are in the process of being integrated with everything else.
The next slides go into a bit more detail on some of the projects; I'm going to go very quickly through them, since I would just be repeating myself. First of all, the first step is retrieving information from the data sources. You have the complete list of data sources if you go to the Perceval repository: there you have a list of what is supported by GrimoireLab. And Arthur is the piece orchestrating the data retrieval part.
So Arthur is basically dealing with jobs, and a job is usually the retrieval of information from one repository. Jobs can be run in a loop, incrementally, so that you can go and visit a repository once again and get all the incremental information that you need; I mean, what happened in the repository since the last time I visited it.
Then we have enrichment and, as I said, enrichment is basically a combination of the information about identities and the information from the data source.
So you take the raw indexes and combine them with the information about identities, and you do some massaging of the data. You do things like, for instance, for tickets, it's important to know how long the tickets were open, so this is where you do the computing of simple metrics like how long a ticket has been open, or how long it takes to answer a message.
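One of those enrichment-time metrics can be sketched directly. The field names (`created_at`, `closed_at`) are invented for the example; they are not a claim about the fields GrimoireLab's enriched indexes actually use.

```python
# Toy sketch of a metric computed at enrichment time: how long each
# ticket was (or has been) open. Field names are invented for the example.
from datetime import datetime

tickets = [
    {"id": 1, "created_at": datetime(2018, 4, 1),
     "closed_at": datetime(2018, 4, 11)},
    {"id": 2, "created_at": datetime(2018, 4, 5),
     "closed_at": None},  # still open
]


def time_open_days(ticket, now):
    """Days a ticket was open; tickets still open count up to 'now'."""
    end = ticket["closed_at"] or now
    return (end - ticket["created_at"]).days
```

Precomputing values like this at enrichment time is what makes the enriched indexes easy to visualize: the dashboard can aggregate a ready-made number instead of recomputing it per query.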
This is the way of running it. You just pip install Mordred, which is the main Python package driving everything, and then you run Mordred with a configuration file that you need to produce. You have detailed information in the GrimoireLab tutorial, which explains how to produce these files, the files needed for configuring Mordred.
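A Mordred configuration file is a plain INI-style file. As a rough illustration only, the section and option names below follow the tutorial's examples of that period; check the current GrimoireLab tutorial before relying on them:

```ini
[general]
short_name = MyProject
update = true

[es_collection]
url = http://localhost:9200

[es_enrichment]
url = http://localhost:9200

[git]
raw_index = git_raw
enriched_index = git_enriched
```

Each data source gets its own section naming the raw and enriched indexes it should populate, which is how "you define the data sources and how you want them to be dealt with" translates into practice.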
So today the idea was just to introduce GrimoireLab. You can try it with a single line.
If you have the Docker daemon installed, you can just docker run a container with the right configuration, and it is going to analyze GrimoireLab itself. The only thing that you need is a GitHub token, which you can obtain from the GitHub website free of charge, and then you just run it. Again, you have the details in the GrimoireLab tutorial; by the way, as of today, we are in the process of moving the GrimoireLab tutorial to a new location.