Description
Understanding communications between source and target requires deciphering the unique language, semantic and contextual characteristics reflected through sentiment, emotion, intention and divergent thinking. A context-aware and knowledge-enhanced computational approach to the analysis of these narratives breaks down this long-running and complex process into contextual building blocks that acknowledge inherent ambiguity, sparsity, and creativity.
Date: 04/02/20
Presenter: Ugur Kursuncu & Amit Sheth
Institution: University of South Carolina
A
All right, we'll get started, and it is my pleasure today to host two speakers. They will co-present a talk on underlying factors of extremism in cyber-social space: Dr. Amit Sheth and Dr. Ugur Kursuncu. I have known Dr. Amit Sheth for over 10 years now, maybe more, and he has done tremendous work in analysis, especially of social media and other forms of data. He is, as of recently, the founding director of the University of South Carolina's university-wide Artificial Intelligence Institute. He is a fellow of IEEE, AAAI, and AAAS. His current research themes are at the intersection of big data, including physical-cyber-social big data, semantic-cognitive-perceptual computing, knowledge-infused learning, and augmented personalized health. Along with Dr. Sheth, we have Dr. Kursuncu, who is a postdoctoral researcher on the AI Institute team; he received his PhD in computer science from the University of Georgia. His research is in the pursuit of gaining a better understanding of online human behavior with social impact, spanning multiple disciplines from computer science to cognitive, political and health sciences. Specifically, one leg of his work focuses on an enhanced analysis of online communications concerning malevolent activities run by ill-intentioned actors, or orchestrated by malicious groups, that are harming our society at large. So with that introduction, I would like to request Dr. Sheth and Dr. Kursuncu to tell us more about their work. Thank you.
B
Thank you for the kind introduction. Really, the real presenter is Ugur; I'm just getting things started. Ugur is coordinating an interesting activity in our group that involves the participation of a whole bunch of collaborators. A number of them are outside South Carolina, and you will see the names of some of them, or most of them, listed as part of the team on the slide. So good, let's go to the second slide here. Sure.
B
The fact that social media platforms have provided the ability for extremist groups to expand their activity and to recruit is reasonably well documented, and the concern is that the platforms themselves are making inadequate efforts. We will probably remark later on why that may be the case, and why research like the one we are presenting is necessary to get a handle on this, such that they can play a bigger role.
B
Governments want the platforms to act, to use their special responsibilities and remove harmful content. Recently I had a chance to give a talk, and it occurred to me, interestingly, that the amount of use of social media for negative activity might have exceeded the amount used for positive activity; or at least the negative activity is so large in percentage that it is becoming a big, big problem. Next page.
B
Even in the narrower context of COVID-19, you start seeing all these headlines. While in the past we have been aware of the use of these online platforms by the extremists, in any particular crisis or event that happens, these guys are really adapting very fast, it now looks like.
B
One of the best-known examples of misuse of social media platforms has been this problem of the travelers: around a thousand Americans during the time frame mentioned on the slide, and since 2011 three hundred Americans have attempted to travel, or have traveled, to these places, incited by jihadists.
B
Why that domain is very critical: as we will demonstrate to you, it is not a problem that can be solved by standard AI techniques or information retrieval techniques. A deeper understanding of the content and conversations is very important, and that requires you to really model the domains, which in our case will be religion, ideology and violence. Understanding the users and modeling the users is also very important: who is a recruiter, who is a follower, and what are the different stages of radicalization?
B
One of our collaborators is a political scientist at UMass, and he has done empirical work; relevant to our particular case is his radicalization scale. He has laid out the scale in five levels, from none to severe, and you can see the description of what those stages are, from mainstream views...
B
No
support
for
particular
moderation,
all
the
way
to
call
for
action
to
join
the
and
fight
the
use
of
violence,
and
there
are
a
bunch
of
indicators
that
have
been
identified
that
help
us
kind
of
connect.
The
content
with
the
you
know
this
different
stages
of
ridiculous
Asian.
So
these
are
the
kind
of
indie
concepts
of
topics
that
are
discussed
at
different
stages
of
radicalization.
B
And so the idea here is to analyze the content in context, which can give us a deeper understanding of the factors characterizing this radicalization process. As you see, we had access to some highly verified data of this process as actually carried out on the social network, and being able to study it lets us understand the process and how, in a methodical way, a non-extremist person is radicalized by a recruiter.
B
If you have wrong results, then you could have harmful outcomes. That is, if predictions are incorrect, extremists will succeed and you will not be able to counter these kinds of activities; but a false alarm is also very jarring: labeling somebody as an extremist when that person is not one can cause a huge backlash, and you have to be very worried about that. So the quality and reliability of this prediction is very important, but also very challenging.
C
So the data set that we have used in this work was verified by Twitter, and later the accounts were suspended by Twitter. The time frame was about seven years, and the data set includes 538 extremist users; we also have an equal number of non-extremist users, who were specifically drawn from a corpus that contains Muslim users. So, the reason that we actually picked Muslim users...
C
So we wanted to explore the data: what do we have in it? We needed to understand the concepts in the data, the language characteristics of the data, how these people are actually using religious concepts and other mainstream concepts that we see and hear all the time, every day, and what we see in the extremist content. So there are actually three...
C
...contexts: one is religion, another is ideology, and the other is hate. Most of the time they are trying to propagate their ideology, but what are they using to do that? They are using religion, and they are also trying to disseminate hatred against Western countries, using their ideology and their religion as well.
C
So that was one observation we made in the data set, and after that we went into the literature to see what it said about the contextual dimensions. It confirmed that political scientists working on religious extremism today also identify roughly the same contextual dimensions: religion, ideology and hate. We then decided that we needed to represent the data based on these different contexts of the content.
C
The reason this is an important approach is that the distribution of the prevalent terms may be different for each dimension, because, as we know, extremists come in and hijack mainstream concepts: they twist the meaning and assign other meanings that serve their ideology, changing the meaning of these concepts.
C
For example, jihad is an important concept in the religious literature and religious resources, but in extremist ideology jihad is also important, and the meaning is very, very different. To be able to detect extremist content and separate it from non-extremist content, jihad is a very important concept, and we need to disambiguate it across its different meanings. Then we can actually...
C
...represent this ambiguous term in our model based on its different meanings in different contexts. For example, jihad might appear in tweets with different meanings: one person might use jihad for something like striving to be a better person, saying "my jihad is always being kind"; on the other hand, another person uses jihad in the context of ideology, saying "the nation of jihad and martyrdom", so jihad is part of their ideological rhetoric.
C
On the other hand, they use jihad to disseminate hatred against the Western countries, or against their out-group, or against their enemy, and they are using jihad for that purpose as well. It is all the same concept but with different meanings, and to differentiate these from each other we use language models to represent the same term for extremists and non-extremists, so we can see the difference, as well as the similarity, with other keywords.
C
As
you
see
here,
the
jihad
is
actually
much
more
similar
to
other
extremist
concepts,
for
example
Allah
key
or
a
key
dollar
share
or
Islamic
state
media.
So
these
are
the
terms
actually
most
of
the
time
extremes
are
actually
using
Allah.
He
is
a
very
well-known
ideologue
and
Ikeda
is
actually
one
concept
that
the
Islam's
extremes
are.
They
keep
using
Islamic
state
media.
We
all
know
I'm
bothering
on
extremist
people
are
using
jihad,
for
maybe
in
our
course
religious
terms.
Quran
Muslims,
Imam.
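The neighborhood comparison described here can be sketched with cosine similarity; the vectors below are hand-set hypothetical stand-ins for embeddings from two per-dimension models, not trained values:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hand-set vectors standing in for two dimension-specific embedding models.
ideology_model = {
    "jihad":       (0.9, 0.1, 0.0),
    "martyrdom":   (0.8, 0.2, 0.1),
    "state_media": (0.7, 0.1, 0.2),
}
religion_model = {
    "jihad": (0.1, 0.9, 0.1),
    "quran": (0.2, 0.8, 0.0),
    "imam":  (0.3, 0.6, 0.0),
}

def neighbors(model, term):
    # Other vocabulary items ranked by similarity to `term`, highest first.
    return sorted(((cosine(model[term], vec), word)
                   for word, vec in model.items() if word != term),
                  reverse=True)

# Under the ideology model "jihad" sits near ideological rhetoric; under the
# religion model it sits near mainstream religious terms, and the two
# "jihad" vectors themselves are far apart.
```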
C
So
so
being
able
to
make
this
differentiation,
then
we
decided
to
create
different
contextual,
contextual
dimension
models
and
then
for
each
of
these
dimensions,
religion,
ideology
and
hate.
We
are
creating
a
language
model
and
to
be
able
to
do
that,
so
we
didn't
start
from
the
data
actually,
but
we
we
went
to
an
external
knowledge
resource
for
each
of
these
religion,
ideology
and
hate
emotions.
We
utilize
external
ground,
truth
resources,
the
corpora,
which
has
been
identified
by
our
domain
expert
political
scientist,
domain
expert
and
for
religion.
C
C
...we are using the Quran and hadith, because extremist users are actually making references to those resources. For ideology, we are using the books and lectures of the ideologues, who have been identified by the political scientists as well. For hate, we are using a hate speech corpus that was published before and is specific to Twitter. So these are a kind of modular contextual dimensions, and this approach can be applied to many other social problems as well, as long as we identify the correct contextual dimensions.
C
So
how
we
create
user
representations,
as
we
represent
the
words,
so
we
create
their
represent
representations
based
on
their
similarities,
with
respect
to
their
surrounding
surrounding
keywords,
as
well
as
the
whole
corpus.
So
for
that
reason
our
solution
is
actually
a
distributional
similarity
based
representation
and
we
are
doing
that
for
each
of
the
dimensions,
religion,
ideology
and
hate.
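A minimal sketch of such a distributional user representation, assuming a user's per-dimension vector is the average of that dimension model's word vectors over the user's tweets; the embedding values here are hypothetical:

```python
# Hypothetical per-dimension embedding; the real models are trained on the
# ground-truth corpora (Quran/hadith, ideologue texts, hate-speech corpus).
religion_model = {"quran": (0.2, 0.8, 0.0), "imam": (0.1, 0.7, 0.2)}

def user_vector(words, model, dim=3):
    # Average the dimension model's vectors over the words the user used;
    # words missing from the model are skipped.
    vecs = [model[w] for w in words if w in model]
    if not vecs:
        return (0.0,) * dim  # no evidence in this dimension (sparsity)
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))

tweets = ["my jihad is always being kind", "reading the quran with my imam"]
words = " ".join(tweets).split()
vec = user_vector(words, religion_model)  # ≈ (0.15, 0.75, 0.1)
```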
C
So
after
we
generate
representations,
so
we
decided
to
just
compare
the
extremist
people
with
the
non-extremist
people
how
actually
similar
they
are
and
how
dissimilar
they
are
and
as
we
expected
so
for
religion,
they
are
very
similar,
very
strong
similarity,
because
non
x
rays,
people
are
Muslim
people
and
they're
using
the
religious
language
and
x
rays.
People
are
saying
as
well,
even
though
they
are
using
actually
the
religious
language
in
a
different
way.
But
there
is
some
similarity
here,
but
that
is
very
difficult
to
separate
from
each
other
for
a
language
model.
C
On
the
other
hand,
ideology
there
is
somewhat
similarity,
but
not
that
similar
and
for
hate.
Actually,
it
is
it's
not
that
similar
at
all,
even
though
there
are
some
similarities
between
some
of
the
users.
On
the
other
hand,
when
we
compare
extremist
people
within
each
other,
it
is
interesting
that
we
see
for
religion
so
for
religion
we
were
expecting.
Actually
still,
we
will
see
strong
similarity
between
extremist
users,
but
for
religion.
It
is
not
actually
the
case.
There
is
still
strong
similarity
between
extremist
users
most,
but
there
is
a
group
of
users
in
the
x-axis.
C
If
you
see
there
is
a
group
of
users
that
are
not
similar
at
all
with
with
other
users,
so
that
is
actually
creating
some
suspicions
there
may
be
in
our
extremist
dataset
there.
There
might
be
some
outlier
users,
all
pile
users
who
are
labeled
as
extremists,
but
they
may
be
actually
on
extremists.
So
this
is
just
suspicion
for
now,
and
then
we
carry
on
further,
for
I
mean
just
making
sure
of
the
suspicions
about
the
outlier.
We
actually
create
some
visualizations
for
the
extremist
users,
and
what
we
see
here
is
for
each
of
the
dimension.
C
The
ideology
is
blue,
religion
is
green
and
the
hate
is
red
for
religion
and
hate.
We
are
seeing
actually
a
group
of
users
circled
clusters.
They
are
very
independent
from
the
rest
of
the
group
and
they
are
actually
similar
and
far
from
others,
and
there
is
a
very
small
group
as
well
for
the
ideology,
but
it's
not
that
far
compared
to
religion
and
hate.
C
We
randomly
select
an
users
and
each
time
the
same
users
were
clustered
actually
separately
and
far
from
the
rest
of
the
group
for
religion
and
hate.
So
that
is
that
had
created
a
big
suspicion
that
the
data
set
that
has
outliers.
So
we
used
a
clustering,
hierarchical
clustering
algorithm
to
cluster
the
users
like
the.
C
Religion
of
the
mushroom
ideology
dimension
and
the
hate
dimension,
and
we
did
the
Cystic
of
analysis
whether
this
separation
actually
is
significant,
significant
and
and
we
see
that
it
is.
And
now
we
have
selected
some
random
sample
of
seven
four
seven
six
users
55%
and
gave
it
to
our
food
science
expert
and
he
validated
that
these
users
are
actually
outliers
and
a
couple
score
was
83%.
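A toy sketch of the clustering step: single-linkage agglomerative clustering over hypothetical user vectors, with two small far-away groups standing in for the outlier users (the talk's actual algorithm and features are richer than this):

```python
import math

def single_linkage(points, n_clusters):
    # Agglomerative clustering: start with singleton clusters and repeatedly
    # merge the two clusters whose closest members are nearest.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

# A dense "majority" group plus two small far-away groups that would raise
# the kind of outlier suspicion described above (vectors are invented).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50), (51, 50)]
clusters = single_linkage(pts, 3)  # [[0, 1, 2], [3, 4], [5, 6]]
```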
C
The outlier users were mostly talking about things like marriage, Allah, Islamic leaders and so on, which are seemingly not strong indicators of extremist content. On the other hand, since we are using three different dimensions, we know that not all extremist users are equal: some users may be at the beginning of the radicalization process, some may be at the end of it, and for each stage or level of radicalization, the intensity of the radical ideology, religion or hate may be different.
C
So
for
some
users
we
might
have
some
sparse
representations
for
religion
for
ideology
or
for
hate,
and
that
is
going
to
impact
our
our
models
for
performance.
So
for
that
reason
we
have
created
an
approach
for
importing
these
partial
plantations
and
reuse
topic
of
similarity
of
the
contacts
to
be
able
to
account
for
the
sparsity
and
after
we
created
our
models.
So
what
we
have
seen
is
our
tried
national
model.
It
performs
best,
as
you
see
here,
by
the
way
Rih.
C
It
stands
for
religion,
ideology
and
hate,
and
the
reason
that
we
are
using
precision
gaze,
we
want
to
emphasize
the
reduction
in
miss
classification
of
non-extremist
users.
Is
that
chef
actually
point
out
before
we
need
to
reduce
false
positives
and
we
need
to
reduce
the
false
positives,
meaning
we
need
to
reduce
the
risk
classification
of
non
extremists
compared
to
reduction
in
miss
classification
of
extremes.
So
we
might
be
actually
a
better
off
missing
on
some
extremist
users
being
misclassified.
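The precision emphasis described here can be made concrete: with "extremist" as the positive class, a false positive is a non-extremist user flagged as extremist. A small self-contained sketch:

```python
def precision_recall(y_true, y_pred, positive="extremist"):
    # A false positive here is a non-extremist wrongly flagged as extremist,
    # the error the talk most wants to avoid.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = ["extremist", "extremist", "non", "non", "non"]
y_pred = ["extremist", "non", "non", "extremist", "non"]
p, r = precision_recall(y_true, y_pred)  # p = 0.5, r = 0.5
```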
C
Based on the results, what we have observed in this study is that the domain-specific knowledge that we incorporate in the model creation is quite critical: it improves the reduction of false alarms of the models, and it also reduces the likelihood of an unfair mistreatment of non-extremist individuals, or any potential social discrimination.
C
On
the
other
hand,
studying
actually
different
dimensions,
contextual
dimensions
and
how
they
are
impacting
the
model
performance,
so
what
we
have
seen
here
is
the
extremist
use
potentially
employ
actually
religion
along
with
hate.
Maybe
this
is
a
part
of
their
hate
tactic,
so
they
are
actually
doing
this
purposefully
and
it
might
be
possible
that
they
are
just
fine,
their
hatred,
using
religious,
religious
rhetoric.
C
On
the
other
hand,
each
dimension
plays
different
roles
for
each
of
the
levels
of
the
radicalization
which,
which
was
like
five
levels
of
radicalization,
and
the
nuances
in
each
of
these
radicalization
is
quite
different,
because
each
of
the
dimensions
are
actually
playing
role,
different
roles
in
each
of
them
and
capturing
the
Monza's
in
terms
of
linguistic
or
semantic
characteristics.
We
can
actually
do
this
when
we
break
this
down
into
different
dimensions.
We
can
capture
those
nuances
in
a
more
granular
way
and.
C
We
have
more
actually
research
that
we
are
working
on,
how
to
better
understand
the
factors
of
online
extremism,
and
we
are
working
more
on
actually
other
aspects
of
the
of
the
radicalization
process,
how
these
people
are
able
to
perceive
these
people
and
maybe
another
time
we
can
talk
more
about
other
aspects
as
well.
Thank
you.
C
So these are the representations generated from language models; in this case we have used word2vec. To be able to create this, we are just measuring the distance between the representations using cosine similarity, and then creating this heat map.
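A sketch of the pairwise measurement behind such a heat map, with hypothetical two-dimensional user vectors:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical user vectors; the heat map is just the pairwise similarity matrix.
users = {"u1": (1.0, 0.0), "u2": (0.9, 0.1), "u3": (0.0, 1.0)}
names = list(users)
heatmap = [[cosine(users[a], users[b]) for b in names] for a in names]
# u1 and u2 are near-identical; u3 is orthogonal to u1.
```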
C
Actually, it was a group of volunteers called the Lucky Troll Club, and they were identifying likely ISIS supporters and reporting them to Twitter; Twitter was reviewing those reports and then suspending the accounts after verifying that they were actually ISIS supporters. We took the list of users as it was published, then we gathered their past tweets and combined the two data sets accordingly.
C
We created representations using these external resources. For example, for the religion dimension we created one word2vec model; for ideology, another model; and for hate, another model. Using these models we created the user representation of each user, meaning we aggregated the tweets for each user and then created a representation accordingly.
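The aggregation just described can be done in more than one order, and the choice matters when tweets differ in length. A toy illustration with one-dimensional hypothetical vectors:

```python
def avg(vecs):
    # Component-wise mean of a list of equal-length vectors.
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

# One-dimensional hypothetical embeddings; one short tweet, one longer tweet.
emb = {"a": (1.0,), "b": (3.0,), "c": (5.0,)}
tweets = [["a"], ["b", "c"]]

# Order 1: flat average over all words in all tweets.
flat = avg([emb[w] for t in tweets for w in t])              # (3.0,)
# Order 2: average within each tweet, then across tweets.
per_tweet = avg([avg([emb[w] for w in t]) for t in tweets])  # (2.5,)
```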
D
Do you average across the words in a tweet, or across all the words in all the tweets? Or do you first average across words and then across tweets? I'm asking because there are lots of different ways to do this, and the results depend on it, so I wanted to understand how you did that.
C
We have network information, but in this specific study we didn't look at the network interactions; how these people interact with each other is something we are also working on right now. It was interesting that when we looked at the information density for each of the dimensions for the users, there are differences between religion and ideology. For some of the users, the ideology content is much denser compared to the majority, and those ideology-heavy users are a small number of users.
C
So the information density was suggesting that there might be a group of users trying to disseminate ideological content to the followers they are trying to radicalize, and we are actually in the process of identifying those recruiters and followers at the intersection of network analysis and their content as well. Yeah.
E
No, I think it's fascinating. I am a traditional content analysis person; I feel that content is extremely powerful, and I have been a little frustrated that people have gone for network, not content. So wherever you can try to get a useful intersection of the two, instead of just talking about content or just talking about network, it's really, really useful, not just in this context but in general, in terms of leveraging their particular importance.
B
I think, to the points that have been made: what makes this much more exciting, at least for me, is that a lot of work has been done in content analysis, and often the tendency is to let the data speak and see what we can find, what patterns we can find. But when we go into this complex domain, you cannot capture these levels and the steps in the process without having the domain aspect of it, without actually building good domain models, and without taking the step of combining the statistical processing, the word2vec kind of stuff and embeddings, with domain infusion, as we do. You know, our group's key emphasis these days is knowledge-infused deep learning, and this is just one of the instances of that line of work that we are doing.
E
I think it's great, and you're totally preaching to the choir here, as I'm a political scientist. Right now we're looking at Russian disinformation narratives, but I'm starting with the narrative, not with suspected trolls or bots or anything like that, and it's complicated. But I completely agree with you about this signal-noise issue, and I totally think that having a ground-truth corpus is just really, really important.
C
So it is very likely that they failed to separate the accounts that are actually extremist from the non-extremist ones, and there are around 10 percent outliers; maybe that was the reason. On the other hand, it is important to generate true representations based on their meanings with respect to these contexts, religion, ideology and hate, because the very same concepts are going to be represented differently. For example, the vectors, the representations, that we generate here for religion...
C
The representation of the same concept, jihad, for religion is not going to be the same representation for ideology; they are going to be very different from each other, as I showed in the earlier slides. Over here the same concept is very similar to extremist concepts that we already know, and the other jihad representation was close to other concepts that we know are non-extremist.
C
So
this
is
actually
how
we
can
separate
or
the
same
business
ambiguities
concepts
from
each
other.
That
is
going
to
contribute
a
lot
to
the
performance
of
the
model,
specifically
reduction
in
false
alarms
and
once
any
anyone
wants
to
employ
this
kind
of
language
model,
they
need
to
account
for
false
alarm
and
their
implication
because
is
going
to
implicate
millions
of
people.
For
that
is
done.
This
is
significant.
I.
D
So did you use only original tweets, or did you also use the content of retweets? I was wondering because perhaps some of these people, who are not necessarily extremists from your perspective when you look at the way they write, have occasionally retweeted something posted by somebody else, and that might have affected Twitter's decision, but it may or may not have affected your classification, depending on whether or not you consider the content of retweets. So I was curious about how you chose.
C
We were also curious, actually, about these outliers: since the data set was verified by Twitter, what was the reason for that particular so-called mistake, if there is a mistake? The team that verified all the reported users, to reveal whether they were ISIS supporters or not, was actually a general abuse-detection team; they were the team verifying users who were abusing the platform, spreading bad behavior. So that was actually the case.
D
Thank you, that answers my question. Of course, that is always a challenge in these cases, because you can get relatively high, very good performance within the domain with cross-validation, but these kinds of tasks in real-world scenarios are really challenging when you go cross-domain.
D
We find that our models, which are for doing some different detection tasks, do really, really well within domain but very poorly out of domain. So that's always a challenge, of course, for all of these applications. But let me thank you guys.
A
Really great presentation; I learned a lot. Thanks also for the questions, which definitely helped us know a little bit more about the approach. If I may, I'd like to ask one question to Dr. Sheth and Dr. Kursuncu: given this study, and as we know that the behaviors keep evolving, how do you see the application of your research, or the models that you have developed? What would need to be done to detect new forms of behavior, to detect the same users with their new forms of behavior? Not sure if I made my question clear. Sure.
A
When
I
took
there
are
multiple
dimensions
to
that
question
one
is
we
sometimes
see
that
people
share
things
on
Twitter
when
they're
in
the
initial
stages,
but
then
they
move
to
some
private
platforms,
almost
closed
platforms
when
they
are
getting
close
in
recruitment
stages
or
level
of
for
it
from
your
scale.
Yes,
that
is
something
that
we
have
observed,
that
they
move
or
migrate
from
one
platform
to
the
other.
That
is
one
form
of
behavior.
C
On the other hand, we believe that dynamic knowledge graphs can address this question: we can represent the online behavior of these groups in a knowledge graph, and this knowledge graph can evolve over time depending on the changes in this behavior, so that we can incorporate that knowledge in a model in a continuous manner. This is still an open question for us: how you can continuously evolve this knowledge graph first, and then how you can incorporate it in the model.
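One way to sketch such a time-aware knowledge graph, as a hypothetical structure rather than the authors' actual design:

```python
from collections import defaultdict

class DynamicKG:
    # Minimal time-stamped knowledge graph: each (subject, relation) edge
    # records when an object was observed, so the graph can evolve.
    def __init__(self):
        self.edges = defaultdict(list)  # (subject, relation) -> [(object, t)]

    def add(self, subject, relation, obj, t):
        self.edges[(subject, relation)].append((obj, t))

    def current(self, subject, relation):
        # Most recently observed object for this subject/relation, if any.
        observed = self.edges[(subject, relation)]
        return max(observed, key=lambda pair: pair[1])[0] if observed else None

kg = DynamicKG()
kg.add("groupX", "dominant_topic", "open_recruitment", t=1)
kg.add("groupX", "dominant_topic", "migration_to_private_platforms", t=2)
# kg.current("groupX", "dominant_topic") now reflects the newer behavior.
```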
B
How do you detect that those concepts are now there as part of the conversation, and how do you enhance and extend your knowledge graph, or knowledge model, for that domain to include them? That part we have been working on for some time. I have not been able to give enough attention to it because of other problems going on, but it's been a theme that we will be working more on.
B
So
that
means
the
topic
drift
or
concept
reef,
a
kind
of
stuff
that
we
expect
to
capture
along
this
line.
There
is
also
increasing
value
for
incorporating
the
time
concept,
meaning
the
concern
when
a
particular
concept
becomes
dominant
in
a
particular
part
of
conversation,
and
when
we
incorporate
that
into
our
knowledge
graph
to
to
also
have
a
time
information
record.
That
also
becomes
part
of
the
you
know,
domain-specific
approach
that
we
pursue,
but
your
question
will
be
months
more
than
this
and
just
to
know
all
the
ways
that
we
need
to
look
at
I
would.