From YouTube: Social Cybersecurity WG: Complex networks, AI and the computational study of terrorism
Description
Presenter: Dr. Gian Maria Campedelli
Institution: University of Trento, Italy
A: It is my pleasure to introduce Gian Maria, who is a postdoctoral research fellow in computational sociology and criminology at the University of Trento, Italy. In 2020 he earned a PhD in criminology from the Catholic University in Milan. From 2016 to 2019 he worked as a researcher at Transcrime, the joint research center on transnational crime of the Catholic University, the University of Bologna, and the University of Perugia. In 2018 he was also a visiting research scholar in the School of Computer Science at Carnegie Mellon University. His research addresses the development and application of computational methods, especially machine learning and complex networks, to the study of criminal and social phenomena, with a specific focus on organized crime, violence, and terrorism. So, Gian, over to you. You will find a lot of interesting folks on the team today to listen to your talk. I will also request everyone to please stay muted; if you have a question, please use the chat feature, and I will be moderating the chat.
B: Thank you very much for inviting me, first of all, and thank you for joining. I'm happy that there are representatives both from the terrorism side and from the computer science and information science side, so I'm very excited to give this talk. Can I share the screen, am I allowed to?

A: Yes.
B
All
right,
so
you
should
see
the
the
presentation
slide.
Correct.
B
So
is
going
to
be
the
title:
complex,
natural,
artificial
intelligence
and
the
computational
study
of
terrorism.
A
lot
of
topics
I'll
try
to
keep
us,
keep
it
as
brief
as
possible.
I'm,
not
very
good
at
timing.
So
please
I'm
in
stop
me
or
give
me
heads
up
if
if
things
are
are
running
out
of
time,
so
who
am
I?
This
is
a
router
philosophical
question.
B
I
wanted
to
keep
it
as
material
as
possible
and
need,
and
actually
anticipate
in
me,
so
I
will
skip
it,
I'm,
just
a
photographer
research,
fellow
now
at
University
of
Trenton,
Italy
and
computational
sociology
and
criminology,
and
as
he
as
he
said,
I'm
mostly
interested
in
terrorism,
organized
crime
and
urban
crime,
especially
violent
crime,
by
through
the
application
and
the
development
of
machine
learning
and
complex
networks
and
craft
learning
and
causal
inference
methods.
B
So
the
talk
outline
will
be
structuring
four
different
kind
of
sections.
First,
one
will
be
a
very
short
overview
on
computational,
modeling
terrorism,
research
and
then
I'll
present,
two
studies
that
I
co-authored
that
are
actually
on
this
on
this
field
and
then
I'll
try
to
wrap
it
up
with
some
concluding
remarks.
B
Hopefully
stimulating
some
discussion
at
the
end,
given
especially
given
the
heterogeneity
of
the
audience
present
here
today
before
starting
I,
just
wanted
to
thank
free
of
the
three
co-authors
that
really
made
it
possible
to
publish
studies
and
to
write
and
to
develop
a
study
so
first
of
all,
Professor
Catherine
Carly
at
Carnegie,
Mellon
and
then
Dr
Kirk,
shank,
who's,
I.
B
Think
now
he
moved
from
Carnegie
Mellon
to
be
at
the
West
Point
Academy
and
then
Dr
mahila
bartolovic,
recently
awarded
a
PhD
in
computer
science
at
Carnegie
Mellon.
Now
working
as
a
software
engineer
in
the
startup
that
Professor
Kaufman
Carly
found
it
so
very
short
review
on
computational
modeling
terrorism,
research.
B
Since
the
70s
there
have
been
appeals
about
the
state
of
Empirical
research
and
terrorism
and
many
Scholars
over
time
try
to
highlighted
various
methodological
issues
that
terrorism,
research,
empirical
science,
terrorism,
research
had.
These
are
mainly
four.
So
we
have
a
lack
of
good,
empirically,
grounded
research,
so
mostly
literature,
reviews
and
opinion
pieces
theoretical
pieces
without
any
kind
of
empirical
backup,
the
other
Reliance
on
secondary
sources,
which
made
it
very
difficult
to
innovate.
B
Information
wise,
the
low
level
of
collaboration
among
Scholars
terrorism
studies
are
really
spread
out
across
different
fields
like
economics,
political
science,
security
studies,
criminology
now
even
information,
science
and
computer
science,
and
just
very
little
effort
to
collaborate
among
Scholars
and
across
Fields
nearly
high
amount
of
one-time
contributors.
B: So in 2014 Marc Sageman talked about this problem of stagnation in terrorism research, especially in light of the massive funding that was allocated to the field after 9/11, which was not able to resolve the many problems and respond to the many open questions about terrorism — questions that are still open. A newer review, published in 2020 by Schuurman, was actually a little bit more optimistic: he pointed out that we are moving towards the solution of some of the long-standing issues, although some are still in place to date. Economics and political science, among the many fields that deal with terrorism, are the ones with the highest methodological standards, which is not surprising and is kind of common for many social phenomena and social problems. And then there is what has happened in the last ten years.
B
You
know
two
areas
in
quantitative,
computational,
social
research
that
fostered
a
lot
of
research.
A
lot
of
interest
in
terrorism.
Research
is
to
do
our
Network
science
and
machine
learning
and
predictive
modeling.
So
the
memorable
items
of
terrorist
networks.
Why
do
I
call
it
invariable
lightness?
Because
you
know
we
have
this
spike
in
interest
in
in
terrorism?
From
the
point
of
view,
social
network
analysis,
a
lot
of
studies,
however,
most
applications
will
rely
on
affiliation
networks
or
Alliance
and
rivalries
networks,
which
are
mostly
characterized
by
cross-sectionality
and
blur
temporal
boundaries.
B: These kinds of applications mostly map physical connections between individuals, rather than more abstract but still meaningful links, in the study of terrorism. And then, on the machine learning side of terrorism research, there was a lot of optimism — machine learning fostered that in a lot of fields — and the increasing data availability, mostly thanks to the efforts of the Global Terrorism Database folks at the University of Maryland and the START Center, attracted scholars from non-social-science fields to apply machine learning and predictive modeling.
B
However,
after
you
know,
a
lot
of
works
that
came
out
came
out
as
vibrant
debate.
That
models
are
not
able
to
meaningfully
forecast,
for
example,
violence
eruptions.
This
was
a
kind
of
a
critic
that
came
out
from
the
conflict
research
agenda
that
is
really
relevant
also
to
terrorism
research
to
date,
so
the
machine
learning
hybrid,
more
excitement
at
actual
results
possible
causes
the
inability
to
capture
complex
interdependencies
between
events,
the
lack
of
cause
and
knowledge
and
theoretical
reasoning.
B
Data
are
still
two
scars
in
spite
of
the
many
efforts
that
have
been
done
and
the
insufficient
spatial
temporal
resolution,
which
makes
it
very
difficult
to
predict
such
a
rare
and
high
impact
phenomena.
B
So
this
talk
in
the
spirit
of
what
I've
just
said.
The
stock
will
try
to
present
to
works
that
hopefully
goes
in
the
direction
of
tackling
some
of
the
problems
that
I
mentioned.
So
the
first
one
is
learning
future
terrorist
Target
through
temporal
metagraphs
games
are
forecasting
future
targets,
leveraging
the
ability
of
graphs
to
capture
retrooperational
dependencies
across
events
and
then
multimodal
metrics
reveal
patterns
of
operational.
Similarity
of
Service
Groups,
basically
present
a
graph
representational
learning
approach
to
detect
behavioral,
similar
ID
between
groups
at
a
global
level.
B
So
here's
the
connection
between
the
content
of
this
talk
and
title,
which
is
the
integration
of
complex
natural,
artificial
intelligence
for
the
study
of
terrorism.
So
the
first
study
we
have
three
issues
with
the
accident:
literature
on
on
terrorism,
operational
information
about
terrorists
or
studies
that
overly
aggregate
temporal
levels
so
like
years
or
months
or
quarters,
which
makes
it
very
difficult
to
have
an
utility
at
a
policy
level.
B
Micro
level
forecasting
only
focusing
on
events
and
lethality
to
discriminate
between
events
by
and
this
overlooks,
the
other
originality
of
attack
characteristics
and
then
computational
attempts
mostly
ignore
Theory.
Blinded
the
letter
data
speak
philosophy,
so
the
aims
of
the
work
are
to
develop
a
forecasting
modeling
framework
to
predict
Target
at
higher
risk
of
being
hit
by
operating,
A,
fine
grain
and
temporal
level,
exploiting
Riches
of
data
in
terms
of
event,
characteristics
and
building
the
power
Theory.
B
In
fact,
the
theoretical
premise
is
here
so
strategic
theories
of
terrorism,
or
is
it
in
a
study
of
conflicts
by
shelling
Samuel
Awards
in
1980,
so
terrorists
we
can
say
that
it
can
adapt
active
adaptive
and
other
sales
setting,
so
the
Strategic
frame,
which
was
mostly
framed
by
MercyMe,
assumes
that
they
operate
within
some
sort
of
collective
rationale.
B
So
we
can
think
of
groups
as
a
single
entity
that
makes
decision
and
the
Strategic
decision
making
is
limited
by
several
constraints
that
influence
the
type
of
attacks
that
a
group
or
an
actual
plot
means
that
there
are
some
characteristics
and
factors,
some
variables
that
impact
and
influence
the
way
in
which
this
events
will
occur
and
the
kind
of
nature
of
these
events
also
terrorism,
does
not
occur
at
random.
We
know
that
it
follows
specific
patterns
like
self-exidability,
self-propagation
and
spatial
concentration.
B
This
is
well
known,
originated
from
the
literature,
Trump
crime
concentration
and
violence
concentration.
So
the
point
is
what,
if
regularity
is
besides,
the
spatial
temporal
component
also
exists
in
terms
of
a
banker
at
risk,
so
the.
B: That is the main research question that we tried to target. And the second question was: okay, but then, if this is true and we want to investigate it, how can we integrate the two fundamental layers, the temporal dependence and the operational dependence of strategic decision-making? This graph kind of summarizes it. We think that the solution is temporal meta-graphs: temporal meta-graphs are able to capture connections over time between events, but also across event characteristics.
B
So
we
move
from
IID
a
research
design
that
are
very
common
in
which
events
are
actually
considered
as
independent
of
one
another.
We
move
to
a
more
kind
of
a
sophisticated
way,
which
is
thinking
about
temporal
dependence,
so
one
event
will
impact
the
probability
that
another
event
will
occur
and
we
go
to
the
temporal
methograph
approach.
Basically,
we
know
there's
a
temporal
dependence,
but
we
also
want
to
investigate
whether
there's
a
value
in
understanding
the
and
their
dependencies
between
event
characteristics.
B
So
we
focus
on
two
countries,
Afghanistan
and
Iraq
for
the
sake
of
numerology
of
events,
but
also
because
they're
quite
different
countries,
even
though
they
are
high
frequency
countries
where
terrorism
was
was
very
prevalent.
But
if
we
see
the
data
down
here,
we
see
that
we
have
different
temporal
distributions.
We
have
different
quantitative
considerations
to
make
about
these
two
countries.
B
We
have
different
ecological
and
contextual
characteristics
that
that
are
linked
to
this
two
countries,
so
Afghanistan
mostly
related
to
the
activity
of
Taliban,
while
Iraq
from
2001
to
2010,
so
the
process
of
many
different
groups.
So
we
thought
that
this
was
too.
B
This
was
were
two
interesting,
let's
say
settings
to
test
our
our
framework,
so
the
analytical
pipeline
here
I
try
to
summarize
it,
but
we
basically
focus
on
each
attacks,
deploy
tactics,
utilize
weapons
and
attack
targets,
as
distinct
Dimensions
tactics
in
the
global
terrorism
database
are
originally
labeled
as
attack
type,
and
by
focusing
on
this
free
Dimension.
We
we
derive
multivariate
two-day
based
time
series
from
2001
to
2018,
in
which
each
time
serious
observation
Maps
the
centrality
of
a
certain
feature
in
its
dimension
in
a
given
unit.
B
So
basically,
the
centrality
of
that
particular
tactics
or
that
particular
weapon,
or
that
particular
Target.
It
is
actually
capturing
how
popular
how
used,
how
prevalent
that
particular
tactic,
weapon
or
Target
was
in
that
particular
time
frame,
which
is
again
a
two-day
based
unit,
and
then
we
forecast
the
most
Central
future
targets,
trying
to
learn
the
patterns
of
Association
in
previous
time
time,
steps
between
tactics,
weapons
and
targets.
B
So
we
try
to
infer
what
will
be
the
next
Target
by
using
information
on
past
attacks
and
on
this
this
passage
reality
values
that
are
a
byproduct
of
the
network.
Representation
of
connections
between
the
different
event
characteristics
and
we
also
test
with
different
input
weights.
The
algorithms
are
tested
with
different
input
weights,
because
we
wanted
to
know
how
bad
how?
How
long
should
we
go
back
in
the
history
of
the
attacks
to
try
to
optimize
predictive
accuracy?
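The "input width" idea just described can be sketched as a sliding-window transformation over the centrality time series. This is only a minimal illustration, not the paper's code: `make_windows` is a hypothetical helper, and the random toy series stands in for the real two-day centrality series.

```python
import numpy as np

def make_windows(series: np.ndarray, width: int):
    """Turn a (T, F) multivariate time series into supervised pairs:
    X[i] holds `width` consecutive observations (the lookback window),
    y[i] holds the next observation (the centralities to forecast)."""
    X, y = [], []
    for t in range(len(series) - width):
        X.append(series[t:t + width])   # lookback window of length `width`
        y.append(series[t + width])     # next-step target centralities
    return np.array(X), np.array(y)

# toy series: 10 two-day units, 4 target-type centralities
rng = np.random.default_rng(0)
series = rng.random((10, 4))

X, y = make_windows(series, width=3)
print(X.shape, y.shape)  # (7, 3, 4) (7, 4)
```

Varying `width` is exactly the experiment described above: a larger window bets on stable behavior, a smaller one on fast-changing actors.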
B
The
longer
we
go
and
it
means
that
the
more
stable
the
activity
of
a
group
of
of
a
certain
terrorist
activity
is
the
shorter.
The
time
frame
the
highest
is
the
frequency
with
which
the
actors
that
are
active
in
that
particular
context
are
changing
their
actions
and
we
experiment
with
different
machine
learning
and
deep
learning
approaches.
But
we
also
compare
between
metagraphs
time
series.
B
So
this
idea
that
by
combining
the
temporal
and
event
characteristic
event
characteristics,
networks
and
resulting
graphs,
we
can
gain
more
knowledge
with
shallow
time
series
which
just
used
it
count
of
features
in
each
time
unit.
And
we
wanted
to
understand
whether
you
know
using
our
our
approach
actually
allowed
us
to
to
optimize
our
predicted
accuracy.
B
So
this
is
just
a
graphical
representation
of
how
we
started
the
the
whole
project
by
representing
starting
from
our
tensor
representation,
in
which
we
have
our
days
and
our
matrices
with
our
days
and
with
our
event,
characteristics
and
the
count
of
each
characteristics
for
each
day
which
are
divided
by
Dimensions.
So
we
have
targets,
we
have
talk
takes
and
we
have
weapons
and
from
that
we
derive
our
our
networks.
And
then
we
calculate
our
centralities
as
a
byproduct
of
prevalence
for
each
of
the
event
features
that
we
wanted
to
study.
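As a toy illustration of turning one time unit's feature counts into a network and reading off a prevalence-like centrality, here is a simplified sketch. It uses weighted degree centrality on a co-occurrence graph built from the counts; the paper's actual meta-graph construction is richer (it also links consecutive time units), so treat this purely as the intuition.

```python
import numpy as np

def cooccurrence_centrality(counts: np.ndarray) -> np.ndarray:
    """Weighted degree centrality of each feature in a co-occurrence
    graph built from a single time unit's feature counts."""
    A = np.outer(counts, counts).astype(float)  # weighted co-occurrence edges
    np.fill_diagonal(A, 0.0)                    # drop self-loops
    deg = A.sum(axis=1)                         # weighted degree per feature
    total = deg.sum()
    return deg / total if total > 0 else deg    # normalize to sum to 1

# one two-day unit: counts over [tactic_a, weapon_b, target_c, target_d]
counts = np.array([3, 2, 1, 0])
cent = cooccurrence_centrality(counts)
print(cent)  # most-used feature gets the highest centrality; unused gets 0
```

Features that never appear in the window end up with zero centrality, while frequently co-occurring features dominate — the "prevalence" reading described above.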
B
We
test
a
different
algorithmic,
architectures,
so
Baseline.
Basically,
we
wanted
to
forecast
centrality.
Is
that
t
plus
one
are
the
ground
Truth
at
time?
T?
So
basically,
it's
assume
that
there's
no
change
while
was
most
Central
in
the
past.
It's
going
to
be
Central
also
in
the
future,
and
that
feed
forward
neural
network,
simple
class
of
neural
net,
no
design
for
sequence
data.
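The naive baseline described here — the forecast for t+1 is simply the observation at t — is a one-liner. A sketch with made-up numbers:

```python
import numpy as np

def persistence_forecast(series: np.ndarray) -> np.ndarray:
    """Naive 'no change' baseline: the forecast for step t+1
    is the observation at step t."""
    return series[:-1]  # aligned with ground truth series[1:]

# toy centrality series: 3 time units, 3 target types
series = np.array([[0.5, 0.3, 0.2],
                   [0.4, 0.4, 0.2],
                   [0.1, 0.6, 0.3]])
pred, truth = persistence_forecast(series), series[1:]
mae = np.abs(pred - truth).mean()
print(mae)
```

Any sequence model has to beat this persistence error to justify its extra complexity.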
B
We
wanted
to
test
whether
we
have
a
temporal
structure
and
then
four
different
algorithms
that
actually
bear
a
temporal
sequence
structure,
so
lstm
CNN
with
a
a
1D,
convolutional
filter
and
then
bi-directional
STM.
So
basically
it's
an
expansion
of
the
LCM.
We
sequin.
We
learn
the
sequence
forward
and
backwards
and
then
a
combination
of
CNN
and
ostm,
so
we
stacked
together
a
1D
convolutional
layer,
a
dense
one
and
then
on
lstm
and
another
dance
for
performance
evaluation.
Basically,
the
centralities
are
and
continues
values
so
from
zero
to
one.
B
We
normalize
them,
but
we're
not
really
interested,
and
so
the
algorithms
actually
learn
the
centrality
value,
but
we're
not
really
interested
in
understanding
how
well
they
were
predicting
the
actual,
continuous
value.
So
we
transformed
the
problem
from
a
regression
one
to
a
ranking
one.
So
basically,
we
developed
two
different
measures
of
accuracy.
Element-Wise
accuracy
which
basically
tells
us
is
the
model
able
to
forecast
at
least
one
of
the
two
most
most
Central
round
truth
targets
for
each
time
unit.
It's
some
it's
it's
a
little
trivial.
B
So
we
wanted
to
know
whether
among
the
two
most
Central,
the
algorithm
is
able
to
pick
at
least
one,
and
then
we
have
cell-wise
accuracy.
So
we
want
to
understand
whether
the
model
was
able
to
correctly
forecast
the
whole
two
item
set.
So
we
have
two
round
Truth,
most
more
most
Central
targets.
We
want
to
understand
what
our
algorithm
is
actually
able
to
produce
those
two
correctly.
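The two ranking metrics just described can be written down directly. A minimal sketch (hypothetical helper names; ties are broken arbitrarily by `argsort`, which the paper's evaluation may handle differently):

```python
import numpy as np

def topk_sets(values: np.ndarray, k: int = 2):
    """Indices of the k highest-centrality targets for each time unit."""
    return [set(np.argsort(row)[-k:]) for row in values]

def element_and_set_accuracy(pred: np.ndarray, truth: np.ndarray, k: int = 2):
    p, t = topk_sets(pred, k), topk_sets(truth, k)
    # element-wise: at least one of the k ground-truth targets recovered
    element = np.mean([len(a & b) > 0 for a, b in zip(p, t)])
    # set-wise: the whole k-item set recovered exactly
    setwise = np.mean([a == b for a, b in zip(p, t)])
    return element, setwise

# toy example: 2 time units, 4 target types
truth = np.array([[0.5, 0.3, 0.1, 0.1],
                  [0.1, 0.2, 0.4, 0.3]])
pred  = np.array([[0.4, 0.4, 0.1, 0.1],
                  [0.3, 0.1, 0.2, 0.4]])
print(element_and_set_accuracy(pred, truth))  # (1.0, 0.5)
```

In the second time unit the model catches one of the two true targets but not both, so element-wise accuracy stays at 1.0 while set-wise drops to 0.5 — exactly the "stricter metric" distinction made above.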
B
So
the
results,
comparing
architecture
and
models
we
have
Afghanistan
on
the
left
and
Iraq
on
the
on
the
right.
We
have
the
blue
dots
for
the
methograph
algorithms
and
the
lighter
say
cyan
ones
are
the
ones
with
the
shallow
times
here
is
what
emerges
is
that
generally,
the
metagraph
approach
is
far
better
in
predicting
the
next
future
Targets.
This
means
that
our
Network,
a
representation,
actually
allow
us
to
to
have
much
more
information
about.
B
What's
going
on
in
those
two
settings,
we
have
variation
in
terms
of
the
ability
of
the
algorithm
and
the
input
width
for
each
of
the
two
settings.
So
the
first
results
again
is
that
graph,
the
read
time
series
I
performed
shuttle
time
series
and
forecasting
because
we
have
a
reach
representation.
Second,
is
that
bi-directional
lstm
in
both
settings?
B: We see that it actually gained much higher predictive accuracy compared to the other approaches. And then the third result is that Afghanistan has the highest set-wise accuracy in the BiLSTM experiment with 30 units as input width, meaning that in Afghanistan we have more stability — which makes sense, because Afghanistan is populated by fewer terrorist actors — while in Iraq the highest set-wise accuracy is lower, and with an input width of five units, so ten days: basically, there are fewer regularities.
B
Let's
say
optimal
means
that
we
have
more
actors
and
that
these
more
actors
are
also
innovating
more
in
the
country.
So
this
makes
it
a
little
tricky
for
the
algorithm
to
to
learn
for
the
batch
configuration
just
to
focus
here.
What
we
see
is
that
on
the
blue,
we
have
the
ground
Truth.
So
the
number
of
time
that
each
of
these
targets
were
on
on
the
on,
among
the
most
the
two
most
Central
in
time
unit,
and
then
we
have
the
predicted
in
in
green.
B
What
we
see
is
that
the
algorithm
is
very
the
algorithm
are
very,
is
very
good
and
and
picking
up
the
two
most
prevalent
while
it
struggles
a
bit
to
Peak
some
of
the
less
prevalent
ones,
which
is
something
that
we
should
and
we
want
to
work
on.
Of
course,
when
data,
when
this
kind
of
complexity
and
heterogeneity
and
and
the
phenomenon
is
linked
to
the
fact
that
we
don't
have
so
much
data,
especially
because
we're
transforming
this
in
time
series
it's
going
to
be
challenging.
B
Probably
some
data
augmentation
technique
will
help
and
we'll
certainly
think
it
through,
and
if
you
have
any
suggestion
at
the
end,
we
I'll
be
I'll,
be
glad
to
hear
so
conclusions
for
this
first
study.
Temporal
method,
graphs
provide
future
context
compared
to
Showtime
series
in
line
with
theoretical
premises
and
our
assumptions
and
hypotheses,
then
the
relevance
of
model
testing
uses
different
amount
of
data,
so
the
behaviors
show
different
products
in
different
countries
and
context,
so
there's
no
one-size-fits-all
solution
and
then
the
promising
approach
in
context
with
high
frequency
to
terrorism.
B
However
limitation,
we
lack
a
distinction
of
different
groups
and
answers
operations
in
this
different.
In
the
same
country,
so
the
Iraq
scenario
is
problematic
for
that
precise
region,
and
then
we
don't
hang
Badge
of
spatial
information
which
probably
might
help
us
getting
more
information
and
maybe
refine
our
prediction
and
the
Kirin
injury
system
does
not
capture
rare
events,
so
911
and
similar
events
gets
lost.
We
have
still
the
to
figure
out
how
to
solid
Black
Swan
problem
so
he's
a
reference.
It
was
a
study
published
in
2021
in
nature
scientific
reports.
B
It's
open
access.
So
if
you
want
to
look
it
up,
it's
it's
it's
it's
there.
Second
study.
Multimodal
networks
reveal
patterns
of
operational
similarity
of
service
of
terrorist
groups.
So,
basically
again,
the
idea
is
to
try
to
combine
machine
learning
or
graph
learning,
with
Network
science
and
complex
metrics,
to
try
to
reveal
something
about
how
terrorism
work.
This
is
not
a
predictive
study,
it's
more
about
kind
of
a
descriptive
inferential
one.
B: Similarity is the actual aim of the study, and it is important to be able to discriminate and study the heterogeneity of groups while looking at weapons and targets, because the weapons and targets — and the tactics that they use — are the most concrete characteristics that lead to attacks in the end, and they help us understand the impact of attacks on human life, but also on the economy and on political stability.
B
Surprisingly
also,
there's
not
a
we
weren't
able
to
find
any
comparative
account
of
heterogeneity
among
terrorist
groups
and
serious
actors
in
a
global
scenario,
which
means
that
we
lack
understanding
of
how
singular
groups
are,
how
different
they
are,
how
they
operate.
Do
they
innovate?
Do
they
change
over
time,
so
we
use
again
the
global
terrorism
database
and
we
consider
terrorist
actors
that
have
plotted
at
least
50
attacks
at
the
global
level,
from
1997
to
2018,
accounting
for
a
total
of
105
groups
and
more
than
42
000
events.
B
And
again
we
focus
on
tactics,
targets
and
weapons.
Here
we
have
a
visual
representation.
First
of
all,
the
number
of
groups
and
their
number
of
attacks.
We
see
that
most
groups
are
actually
around
a
50
to
100
attacks
and
we
have
a
a
small
number
of
outliers,
which
are
mostly
aslamist
actors
that
are
able
to
plug
were
able
to
plot
thousands
and
thousands
of
attacks
in
this
time
period.
We
have
also
the
data
about
years
active,
which
we
see
that
it's
pretty
pretty
pretty
reach
as
a
representation.
B
We
have
groups
that
have
been
active
with
high
frequency,
but
just
for
a
little
amount
of
time,
and
then
groups
that
have
been
active
for
even
more
than
20
years,
for
so
for
all
the
the
time
period
under
consideration,
and
we
also
have
the
disaggregation
of
groups
by
their
ideologies,
see
that
Islam
is
not
angiotism
is
the
most
frequent
one
in
general,
so
we
have
atominationalist
groups
as
a
second
second,
most
frequent,
so
computational
methodology.
What
we
did
here
for
each
year.
B
We
took
this
multi-model
network
framework,
in
which,
basically,
we
had
three
different
matrices
again
for
each
year
and
each
Matrix
was
a
group
by
tactic
Group
by
Target
and
group
by
weapon
Matrix,
where
the
link,
where
weighted
links,
meaning
that
the
weight
was
the
number
of
times
that
that
particular
group
was
separate,
use
that
particular
tactic
or
Target
or
weapon
in
that
particular
year.
So
by
starting
from
this
scenario.
B
Basically,
we
transformed
these
networks,
this
multimodal
framework,
which
is
a
multimodal
bipartite
framework
into
a
multimodal
unit
model
framework
by
using
our
reducible
graph
procedure.
So
basically,
in
the
end,
we
come
up
with
networks
that
connect
groups
and
the
connect
groups
if
they
were
similar
and
use
of
tactics,
targets
and
weapons,
and
that
precise
here
and
then
we
devise
these
multi-view
moderati
clustering
procedure
that
was
first
developed
by
Ian
krugshank
and
it's
a
doctoral
phases
which
allow
us
to
optimize
this
measure
of
modularity
across
different
modes
of
the
networks.
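One common way to reduce a bipartite group-by-feature matrix to a group-by-group similarity network is a cosine-similarity projection. The sketch below is only a stand-in for the graph-reduction step mentioned here — the exact reduction, and the multi-view modularity clustering on top of it, follow Cruickshank's method, which this toy code does not reproduce.

```python
import numpy as np

def unimodal_projection(M: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Project a weighted group-by-feature matrix onto a group-by-group
    similarity network via cosine similarity; self-loops and edges below
    `threshold` are dropped."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                 # groups with no activity stay isolates
    U = M / norms                           # row-normalized behavior profiles
    S = U @ U.T                             # pairwise cosine similarities
    np.fill_diagonal(S, 0.0)
    S[S < threshold] = 0.0
    return S

# toy group-by-tactic counts for one year (3 groups, 4 tactics)
M = np.array([[5, 1, 0, 0],
              [4, 2, 0, 0],
              [0, 0, 3, 3]])
S = unimodal_projection(M)
print(S.round(2))
```

Groups 0 and 1 use overlapping tactics and get a strong edge; group 2 shares nothing with them and stays disconnected, i.e. a candidate outlier in that mode.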
B
And
we
do
this
two
times
so
basically,
the
first
multi-human
or
larger
clustering
computation
is
allow
us
to
filter
out
the
cluster
to
outliers.
So,
basically,
we
filter
out
those
organizations,
then,
in
in
each
of
the
modes
for
each
year,
are
particularly
impressive
in
the
way
and
different
from
the
others
in
the
way
that
they
behaved.
B
So,
for
example,
these
are
the
reducible
graphs
for
97
to
205,
211
and
2018.
For
all
the
free
modes-
and
we
see
that
we
have
outliers,
which
are
basically
isolates,
and
then
we
have
a
core
component
of
groups
that
are
that
seem
very
similar
because
they
are
very
connected.
But
in
the
end,
if
we
look
at
the
nuances
of
the
data
and
the
distribution
of
the
weights
in
each
of
this
networks,
we
see
that
there's
still
something
to
be
discovered
there
and
still
at
originating.
B
That's
why
we
wanted
to
run
a
second
attempt
for
separating
and
refining
the
Clusters.
So
we
detect
some
clusters
in
the
end
again,
the
fact
that
we
use
this
multi
multimodal
base
modularity
allowed
us
to
end
up
with
having
clusters
that
combine
together
the
data
on
tactics,
targets
and
weapons,
and
we
see
that
the
number
of
clashes
doesn't
seem
to
give
us
or
provide
any
relevant
information.
There's
not
a
trend
there.
B
It
seems
pretty
flat,
but
if
we
consider
the
number
of
active
groups
over
time
from
1997
to
2018,
and
then
we
calculate
a
ratio
between
the
Clusters
and
a
number
of
groups,
we
see
that
actually
we
have
a
decreasing
Trend,
meaning
that
probably
this
ratio,
cluster
groups
showing
on
that
one
Trend,
probably
is
indicating
us
as
a
reduction
of
heterogeneity
and
complexity
at
the
global
level.
So
we
add
more
groups,
but
we
have
this
kind
of
similar
number
of
classes
over
time.
B
It
means
that
we
don't
need
more
clusters
to
to
study
and
to
link
the
the
increasing
number
of
groups.
So
the
increasing
number
of
groups
probably
are
groups
that
are
more
similar
to
the
others,
and
we
don't
have
you
know
different
groups
that
are
particularly
editor
genius.
So
we
have
a
general
Sensation
that
probably
heterogeneity
and
complexity
is
reducing
at
the
global
level
over
time
to
try
to
understand
this
fact
better.
B
We
also
look
at
the
stability
of
co-class
drain
so
use
we
use
the
adjusted
random
index
and
the
folks
now
score
to
different
measures
of
how
the
Clusters
were
stable
over
time.
And
what
we
see
is
that
we
have
a
strong
stability
from
2011
to
2018,
meaning
that
group
that
had
a
certain
behavioral
pattern
in
2009
kept
it
quite
fixed
in
the
following
year,
so
they
were
clustered
with
similar
groups
and
those
groups
were
not
changing
their
behavior
in
in
this
time
frame.
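Both stability measures mentioned here compare two partitions by counting pairs of groups that land in the same cluster in both years. A small self-contained sketch using pure pair counting (with real data you would feed in the labels from consecutive yearly clusterings; the toy labels below are made up):

```python
from itertools import combinations
from math import sqrt

def pair_counts(a, b):
    """Classify every item pair by whether it is co-clustered in a, in b."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(a)), 2):
        same_a, same_b = a[i] == a[j], b[i] == b[j]
        if same_a and same_b:        tp += 1  # together in both partitions
        elif same_a and not same_b:  fp += 1  # together only in a
        elif same_b:                 fn += 1  # together only in b
        else:                        tn += 1  # apart in both
    return tp, fp, fn, tn

def adjusted_rand(a, b):
    """Adjusted Rand Index via the Hubert–Arabie pair-counting form."""
    tp, fp, fn, tn = pair_counts(a, b)
    den = (tp + fp) * (fp + tn) + (tp + fn) * (fn + tn)
    return 2.0 * (tp * tn - fp * fn) / den if den else 1.0

def fowlkes_mallows(a, b):
    """Geometric mean of pairwise precision and recall."""
    tp, fp, fn, _ = pair_counts(a, b)
    return tp / sqrt((tp + fp) * (tp + fn))

year_t  = [0, 0, 1, 1, 2]   # toy cluster labels in year t
year_t1 = [0, 0, 1, 2, 2]   # toy cluster labels in year t+1
print(adjusted_rand(year_t, year_t1), fowlkes_mallows(year_t, year_t1))
```

Scores near 1 mean the yearly clusterings barely move (the 2011–2018 regime described above); low or unstable scores flag years where groups switch behavioral company.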
B
We
also
have
a
region
of
stability
in
the
Years
2002
2006,
meaning
that
again,
groups
were
generally
clustered
that
were
General
clusters
together
in
the
past,
are
going
to
be
clustered
together
in
the
future,
meaning
that
the
two
are
maintaining
their
same
kind
of
Behavioral
profile.
However,
before
2002
we
have
a
high
variance,
so
this
is
possibly
due
to
new
actors
coming
in
or
radioactive
groups
that
are
changing
their
operations
significantly
over
time
and
the
fact
that
group,
a
and
Group
B,
where
it
costs,
are
together
into
1998.
B
It's
not
a
guy
D
that
are
going
to
be
clustered
together
in
1999,
actually
there's
a
very
low
probability
that
are
going
to
be
clustered
together,
meaning
that
at
least
one
of
the
two
switch
their
operation
and
and
and
it's
going
to
be
clustered
with
with
someone
else.
B
So
our
idea
was
also
to
understand
what
drives
Dental
clustering.
What
what's?
What's?
What
are
the
drivers
that
made
groups
being
clustered
together
in
in
a
year,
because
this
is
a
sort
of
a
way
to
understand
again
what
drives
operational
similarity
and
we
wanted
to
test
some
hypotheses
that
we
had,
because,
mostly
in
the
literature,
we
had
the
separation
between
groups
that
is
mostly
based
on
their
ideology
or
their
geographical
setting.
B
But
our
intuition
was
that
maybe
we,
if
we
look
at
operational
characteristics
and
their
behavioral
profiles,
we
can
see
that
maybe
the
fact
that
two
groups
are
of
the
same
ideology
or
do
not
have
the
same
ideology
doesn't
mean
much,
and
we
can
see
something.
We
can
see
some
similarities
at
a
behavioral
level,
even
when,
when
two
groups
are
fighting
for
two
very
different
reasons,
so
we
use
exponential
round
graph
models
on
the
fully
connected
component,
like
networks,
the
rifle
cluster.
B
So
basically
each
group
it's
going
to
be
connected
to
all
the
other
groups
that
are
in
the
same
cluster.
These
give
rise
to
fully
connected
networks
or
isolates
when
we
have
groups
that
are
just
isolating
their
own
cluster
and
we
run
this
separate
year
by
year.
Models
using
exponential,
random
graph
modeling.
B
So
we
look
at
some
of
feature
weights
like
is
the
amount
of
resources
activity,
a
driver
of
cool
clustering
and
the
answer
ECS,
as
we
see
on
the
top
left
some
of
future
subplot
when
the
when
the
dots
are
are
blue.
It
means
that
the
result
is
significant
and
we
see
that.
Yes,
the
fact
that
two
groups
share
the
same
amount
of
resources
so
we're
active
and,
in
the
same
way,
to
the
set
to
the
same
extent
to
another
that
was
a
driver,
cool
class
drain.
B
The
number
of
knowns
there
were
features
in
your
Regional
networks
is
also
repertoire
is
also
a
driver
of
co-clustering,
and
this
is
a
repertoire
of
diversity.
So
the
fact
that
two
groups
are
very
little
diverse
or
very
high
diverse.
The
fact
that
two
groups
have
this
characteristics
are
are
correlated
with
the
fact
they're
going
to
be
clustered
together.
B
So
if
we
have
one
group
that
it's
not
really
diverse
and
another
one
that
continues
to
differentiate,
Direction
they're
not
going
to
be
clustered
together,
however,
the
country
holds
true,
so
you
know
if
you
are
very,
not
not
very
Innovative
and
not
very
diverse
you're,
going
to
be
clustered
with
similar
groups
and
actors
yeah.
We
also
look
at
most
common
targets,
most
common
tactic
and
most
common
weapons.
B
So
the
fact
that
you
use
the
same
weapon
mostly
or
you
deploy
the
same
tactic
or
you
go
against
the
same.
The
most
common
Target
are
these
drivers
of
of
operational
similarity
overall,
not
really
much.
We
have
some
significant.
We
have
some
years
for
which
this
some
of
these
are
significant,
but
mostly
we
don't
wear
any
statistical
significance
in
understanding
similarity
as
a
byproxy
of
sharing
the
same
most
common
Target
tactic
and
weapon
and
I.
B
Think
the
most
interesting,
too,
is
that
we
also
look
at
Region
and
ideology,
so
our
groups
operating
in
the
same
region
more
similar,
not
very
much
so
across
the
years
we
have
only
five
years
in
which
this
is
relation
is
is
actually
significant.
B
So
this
means
that
the
fact
that
two
groups
are
operating
in
the
same
scenario
like
Eastern
Europe
or
western
Africa
or
South
America,
doesn't
mean
that
they
have
higher
chance
to
be
operationally
similar,
and
the
most
interesting
one
to
me
is
that
the
fact
that
ideology
is
not
really
a
driver
operational
similarity.
So
the
fact
that
two
groups
are
sharing
the
same
ideology
doesn't
mean
much
we
might
have.
B
So
the
fact
that,
for
example,
two
group
are
Jedis
groups
doesn't
have
doesn't
doesn't
lead
to
a
higher
chance
of
being
clustered
together
than
one
group.
That
is
a
Christian
one
and
one
group
that
is,
she
had
respawn,
and
this
also
holds
true,
for
example,
for
groups
that
are
left-wing
and
right-wing
groups,
so
we
have
Hunter
kiss
or
communist
groups
that
are
very
similar,
for
example,
to
racist
and
Nazi
Nazi
groups
that
are
active
in
the
time
frame
under
under
analysis.
B
So
the
conclusion
of
the
second
study,
multimodal
graphs
allowed
to
capture
operational
similarity
across
terrorist
groups.
This
again
is
a
good
fact
that
embedding
together
complex
networks,
and
so
the
network
representation
of
behaviors
Beyond
physical
representation
of
connections
between
groups
is
important
to
gain
knowledge
about
how
they
operate.
And
then
we
detect
a
reduction
in
our
genetic
variation,
operational
behaviors
in
the
last
years,
and
a
consequent
consequent
higher
degree
of
clustering
stability.
B: That stability holds over time. Then, from the ERGM results, we see that the yearly amount of resources and activity, and repertoire diversity, drive co-clustering in different directions, and the combined measure of the two is also a predictor. However, we see that sharing the same ideology and acting in the same region are not correlated with operational similarity, and we think that this goes against most of the public debate about terrorism that is currently heard in the media, and even in academic outlets.
B
So
these
are
references.
What
this
study
was
published
in
terrorism
and
political
violence
in
2021,
but
you
can
also
find
it
on
archive.
B
Window
with
all
the
supplementary
analysis
as
well
and
all
the
robustness
tests
that
we
that
we
that
we
did
so
before
getting
to
the
end
just
some
concluding
remarks,
so
the
computational
wave
in
crime,
research,
icon,
I,
I,
come
from
criminology,
so
I
see
this
from
the
lenses
of
a
person
that
works
as
well
works
on
terrorism,
as
well
as
on
as
in
crime
and
since
at
least
2010.
B
We
have
seen
a
diffusion
of
computational
approaches
for
the
study
of
crime
and
criminal
Behavior,
and
this
also
Foster
interest
in
the
public
policy
sphere.
You
know
all
the
debates
about
around
predicted
policing
and
criminal
justice
risk
assessment
tools.
B
Some factors, I think, were decisive in this present interest and attention on the computational study of terrorism and of crime: the higher availability of data, the democratization of programming languages and statistical software, the dramatic shift to quantitative measurement of crime, and also the demand for data-driven public decisions, which also opens some business opportunities for academics and analysts working in this sphere.
B
But what about terrorism? Terrorism research has certainly been lagging behind. The trend is similar to the one witnessed in crime research, though: we see that there is an increasing interest, although probably at a slower pace, and there is certainly an increasing fascination with computational methods, sparked especially by the opportunities offered by social media data availability.
B
So Twitter, but also other social media channels, have been labeled as a new El Dorado because of the ability to create original data, data that had not been used before. But still, I think there are many structural problems, mostly four. The first one is the lack of integration and dialogue between disciplines: again, terrorism researchers mostly lack methodological expertise, while computer scientists and statisticians mostly lack the domain knowledge, and the fact that these two communities don't talk to each other is a real problem to me. On one side, on the computer science side, we face the risk of investing much in creating sophisticated models without understanding the phenomenon under analysis; on the other end, terrorism scholars and criminologists have the domain knowledge but do not have the tools to really apply what they know and to create meaningful models for both research and policy. And then, terrorism research has been seen as a playground; again, the problem of one-timers following the hype of the "AI for social good" trend, such that after probably 2016, so after ISIS seemed to have ceased to be a problem for the Western world,
B
people moved to other types of social phenomena, which are certainly important, like COVID disinformation, fake news, polarization, but leaving a sort of desert around the computational study of terrorism, probably until the next ISIS pops up, when attention will again be brought to the topic. And, you know, this process really makes it difficult to create a homogeneous and continuous research agenda that will survive into the future, and it makes it easier just to follow short-term projects that are justified by funding and by the fact that we all have to publish, and so on and so forth.
B
But this doesn't help us to have, you know, the ability to create solid infrastructures and solid research communities. Then, data scarcity: beyond the GTD, which is again the best source and the one I've been working with since my PhD, there is no easily accessible, comprehensive, reliable dataset, which is a problem.
B
Another one is that there is little integration between the offline and the online world: folks working with the GTD are just working on the offline world, and folks working with Twitter data are just working on the online world. They're not talking to each other; there's no integration, although I think there would certainly be promise in trying to look at how, for example, the offline world impacts the online world and vice versa. And then, probably the most important and most technical one.
B
The fact is that we have a rare-event imbalance phenomenon here. Predictive modeling of crime relies on millions of observations, as millions of crimes occur around the world every year, while recorded terrorism events, luckily, are far fewer: around 2,000, probably more now, in the GTD data from 1997 to today. So we're talking about a very different amount of information that we can rely upon for our studies and for our research.
B
So, thanks for your attention. I hope I was good with time; I think we're around 40 minutes. If you have questions, I'll be happy to answer them, or critiques, or whatever we have to discuss. This is my email, by the way, so if you have to go and you want to drop me a line afterwards, my email is open, and I'm also on Twitter, so I'll be glad to hear from you about the talk on those channels as well. Yes, so that was it.
A
Thank you. Thank you, Gian Maria, very much. I'm sure this was a very, very interesting talk, and there must be some thoughts and questions. Let's open the floor for discussion. I'm also monitoring the chat, so if you have any questions you can also use the chat feature, but otherwise unmute yourself and go ahead with the question.
C
Hey, Gian Maria, a question about, I guess, how you're calculating, or how you derive, similarity. I think you mentioned it some in here, but as you went through, I was trying to bring all this together. When you talk about similarity, I forget which slide it was, but you were talking about co-clustering between targets, tactics, weapons, ideology. How is it actually calculated?
C
No, I think it was the one where you were showing us, towards the end, what was driving co-clustering. I was just curious how you are calculating similarity, because here you're showing, I guess, co-clustering, but prior to this you've already derived that they were either similar or not similar. Is the network based on similarity?
B
Yeah, thank you for your question. Of course, I had to be very fast on some details, so it's totally fine; I mean, it was my fault. So the thing is, yeah: the fact that two groups are in the same cluster is how we measure similarity.
B
So two groups that are in the same cluster are, by definition, similar. Now, what we wanted to study is: let us see what are the variables, not used in the original network representation, that are correlated with, and that are drivers of, this similarity. So at the beginning of our framework, what we have is bimodal networks for each year and for all the three dimensions I mentioned: so we have tactics, targets and weapons.
B
We have these group-by-tactic-type, group-by-weapon-type and group-by-target-type networks, which are bipartite networks where the weights are the counts of the instances. Like, I don't know: I'm ISIS, and in 2015 I used bombings 20 times, so that edge weight is going to be 20. So we have these weighted bipartite networks, which are bimodal again, and then we transform these bipartite networks using a radius ball graph, which is, in computational geometry, some sort of nearest-neighbor problem. So basically we take these bipartite networks and transform them into group-by-group networks, and in that case two groups are going to be connected if they share a lot of the weights that were originally placed in the networks.
B
So basically, if you want to think of it in more matrix-like terms: the more similar the rows of the bipartite networks are for two groups, the more likely they are to be connected in the unimodal network. Now, once we have those networks, we run our clustering procedure, because the fact that two groups are connected is not per se assurance of similarity: maybe one group is connected to many other groups, but the connections have different weights.
B
So we want to disentangle that; we want to, you know, derive what are actually the strongest links between groups, and that's how we then create the clusters, by looking at the links of the unimodal networks. So basically, okay, yeah: we have our bipartite networks here, which are these ones, and then we convert them into monopartite ones, and we're going to get these ones here; these are the monopartite networks.
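A minimal sketch of this bipartite-to-monopartite conversion, assuming a simple radius criterion on normalized row profiles; the counts, the threshold, and the distance choice below are illustrative, not the paper's exact construction:

```python
import numpy as np

def group_similarity_network(counts, radius=0.5):
    """Convert a weighted group-by-feature count matrix (one mode of the
    bipartite network) into a group-by-group adjacency matrix: two groups
    are linked when their normalized feature rows fall within `radius`."""
    # Normalize rows so groups with different activity volumes are comparable
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    rows = counts / np.where(norms == 0, 1, norms)
    # Pairwise Euclidean distances between row profiles
    dist = np.linalg.norm(rows[:, None, :] - rows[None, :, :], axis=2)
    adj = (dist <= radius).astype(int)
    np.fill_diagonal(adj, 0)  # no self-loops
    return adj

# Toy counts over three tactics (e.g. bombing, armed assault, kidnapping)
counts = np.array([
    [20, 2, 0],   # group 0: mostly bombings
    [18, 3, 0],   # group 1: a profile similar to group 0
    [0, 1, 15],   # group 2: mostly kidnappings
])
adj = group_similarity_network(counts)
```

Groups 0 and 1 end up linked because their rows are nearly proportional, while group 2 stays unconnected; a clustering algorithm would then be run on this group-by-group graph.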
B
We see that outliers are not connected to anyone; those are the groups that are very distinctive. But then you have a critical mass of groups that seem to be very connected, and, you know, at first glance you might say they're similar, but actually the weights of these connections are very different. So basically the clustering procedure allows us also to extrapolate similarity within those critical masses, and then we have these clusters that actually optimize across the three different modes.
B
And then, when we have these clusters, what we do is: okay, now let's try to use variables that we didn't use. Well, some of them we actually did use, but some were embedded in the procedure, while some others were not. So we wanted to understand: okay, we have the sum of feature weights, which is probably going to be correlated, but in which direction? And then the same with the number of non-zero features: in which direction as well?
B
But then we wanted to understand whether what we derived is actually associated with, for example, the same ideology or the same region, or whether we can actually see an operational-similarity pattern going on also for groups that are very distant, very far away geographically, or very far away on the ideological spectrum.
B
I hope that answers the question, but there are multiple steps in the way in which the similarity is actually derived.
C
No, that is helpful, yeah; that's what I was wondering about. So, I mean, you are taking multiple steps to do that, and I guess, when you get those monopartite graphs: are you measuring any kind of, I'm thinking something in terms of clustering coefficient, modularity, something where you can kind of measure the intensity or the strength of that similarity?
B
Well, that's a very good question. Actually, we also measure clustering coefficient, degree assortativity, degree centrality and stuff like that. It's all in the paper; I didn't include the plots here, but we also look at that to see how the networks change over time. So we do see more clustering over time, which is another proxy for saying that there is probably more homogenization going on over time.
B
So if you check the paper, you'll see that we also look at that, and if you have questions or curiosities, please send me an email; I'll be happy to answer them. Thank you for your questions. Thank you.
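For readers who want to reproduce diagnostics like the ones mentioned, here is a self-contained sketch of the average local clustering coefficient on a toy similarity graph (libraries such as networkx provide this, plus assortativity and centrality, out of the box; the graph below is made up):

```python
def avg_clustering(adj):
    """Average local clustering coefficient of an undirected graph,
    given as an adjacency dict {node: set_of_neighbors}."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than 2 neighbors contribute 0
        # count edges among the neighbors (each unordered pair once)
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to 2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
coeff = avg_clustering(adj)  # (1 + 1 + 1/3 + 0) / 4
```

Tracking this statistic year by year is one way to quantify the "more clustering over time" observation from the talk.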
D
Yeah, thanks, this is great. I have a question about the distribution of the data within the GTD and how that's impacting both papers, right? So we know that there are things like, I'm going to say, 80% of all attacks fall into either bombings or shootings, or something like that, right?
D
So there are a lot of very, let's say, low-fidelity buckets that these events sort themselves into, particularly, I think, in terms of target types and weapons and tactics, to the extent that those are similar to each other, which seems to me like it has some real potential impacts in both of these papers.
D
Right. Like, in the predictive paper in the first half, I wonder to what extent you guys addressed the really naive models here, right? Like, what happens if you just guess the most common outcomes for some of these variables? If 80% of our attacks are bombings or shootings, then what is the divergence from chance of your results with a more sophisticated model? And then similarly here, when we're talking about what drives the co-clustering; this is a great slide,
D
I think, to illustrate the point. Like, we're doing so much flattening of the nuance of what's happening from event to event in the source data that I'm actually really curious what happens when these methods are run on data that have far more specificity about what happened, right? Did you guys dig into the weapon subtypes, the target subtypes, or even consider how this would be different
D
if you had actual textual representations of what was happening, in ways that you could get much smaller, more specific measures of similarity between, you know, event A and event B?
B
Yeah, thank you for the question; that's a great point, actually one of those points that kept me awake a lot of nights over the years. But you're working at the GTD, so you know that way better than me: the problems that you mentioned are there.
B
One thing that I didn't mention before is that we use all the four possible weapons, three possible attack types and three possible targets for each observation, in both cases, both the predictive and, you know, the co-clustering study. So we didn't stop at just the primary one, or the first column, which is something that, unfortunately, many people do in this field:
B
just using, you know, the most common and the most frequent, rather than being able to really discriminate. And actually, the fact that the baseline approach doesn't provide us with very good results is a good way of saying that more sophisticated algorithms are actually able to pick up some of the complexity of the patterns in the data.
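The baseline being compared against here is the floor the question raised: always predicting the most frequent class. A quick sketch, with a made-up label distribution matching the hypothetical 80% figure from the question:

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent class: the floor
    any more sophisticated classifier has to beat on imbalanced data."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Hypothetical attack-type labels: 80% bombings
labels = ["bombing"] * 80 + ["shooting"] * 15 + ["kidnapping"] * 5
acc = majority_baseline_accuracy(labels)  # 0.8: looks high, but is pure chance
```

Reporting a model's gain over this number, rather than raw accuracy, is what makes the "divergence from chance" in the question measurable.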
B
Now, using the subtypes: I tested that in the early days and then I stopped, just because, unfortunately, the number of observations here is not large enough to allow us to go with a feature space that is gigantic; we would easily run into the curse of dimensionality. So, definitely, that's a limitation of the first study. For the second one, I see it as a limitation, but I don't think there would be a problem in testing the approach using subtypes, because in the end we are using a sample of groups that is pretty heterogeneous. Even though, you know, the majority of them are jihadist or nationalist groups, they are jihadist and nationalist groups that have very different characteristics: we have jihadist and nationalist groups that operate in different geographical contexts and with different resources.
B
So, yes, you know, we have the usual suspects that generally use the same bombings and, you know, firearms and stuff like that, but we also see a lot of heterogeneity in general for some groups that have particular ideologies, like, I don't know, environmental and animal-rights groups, or far-left groups and stuff like that. And one thing that is embedded in the system, and that prevents this kind of flattening of the various characteristics, is the fact that we use weights.
B
So we don't just use the fact that a group used a firearm to carry out an attack; we also look at how many times it used it. And by using this multimodal framework, we also implicitly look at the connections: how many times you used a firearm, but also how many times was a firearm used against, like, a police officer, or how many times were firearms used against a civilian.
B
So, yes, if we look at the single modes and single variables alone, you know, they might seem flat, and this is probably true mostly for the predictive paper, but we might have that concern also in the second one. But the precise objective of these papers was to try to combine, and to try to represent, the complexity of terrorist behaviors beyond the single sources of information, which are tactics, targets and weapons.
B
You know, the complex spectrum of combinations of behaviors, the frequency, and the originality coming out of that. So I agree with you that the optimal model, especially in the predictive case, would pick up more nuanced variables like the subtypes and the sub-targets and stuff like that; I think you have three levels of targets, if I'm not wrong. And that was the first feeling we had when we tested that: with this number of observations, it would end up being completely meaningless. So we figured, okay: it's better to have a model that is a little bit less rich but meaningful, rather than having a data structure that is very rich, very nuanced, but that brings out nothing but noise. So that was our trade-off, and the decision on how to design the system, especially in the second paper. I don't know if that answers the question, but...
A
Oh, thank you. Thank you, Gian Maria. Thank you, everyone, for the questions. I know that we are four minutes over the meeting time, and I see there is one more question, but I would like to ask: Gian Maria, do you have a hard stop, or can you entertain another question?
E
Thank you, Doctor. This was, you know, insightful; the findings are really insightful. I just had a question about this slide exactly, the results on what drives co-clustering, the first two items. I wanted to ask you about the amount of resources/activity: what did you mean exactly by that? And also the repertoire diversity. So, yeah, can you just expand on these two?
B
Sure. So when we constructed these variables, we looked at the original bipartite networks: again, for tactics, weapons and targets, for each year, we have a bimodal network with our groups on one side and the tactics, weapons or targets on the other, and we count the number of times that a particular group used firearms, or used bombings, or used whatever.
B
So the first variable here is just the sum of feature weights: basically, we sum the weights for each group in each year and each mode.
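A minimal sketch of these two variables on a hypothetical group-by-feature count matrix (the actual features per mode come from the GTD; the numbers below are made up):

```python
import numpy as np

# Hypothetical yearly count matrix for one mode: rows are groups, columns
# are feature types (e.g. weapon types), entries are usage counts
counts = np.array([
    [20, 5, 0, 3],   # an active group with a varied repertoire
    [2, 0, 1, 0],    # a small group with few attacks
])

sum_of_feature_weights = counts.sum(axis=1)      # proxy for resources/activity
repertoire_diversity = (counts > 0).sum(axis=1)  # number of non-zero features
```

The first quantity measures how much a group does overall; the second measures how many distinct behaviors it employs, which is the "repertoire diversity" asked about.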