Browse, search and filter the latest cybersecurity research papers from arXiv
In the aftermath of earthquakes, social media images have become a crucial resource for disaster reconnaissance, providing immediate insights into the extent of damage. Traditional approaches to damage severity assessment in post-earthquake social media images often rely on classification methods, which are inherently subjective and incapable of accounting for the varying extents of damage within an image. Addressing these limitations, this study proposes a novel approach by framing damage severity assessment as a semantic segmentation problem, aiming for a more objective analysis of damage in earthquake-affected areas. The methodology involves the construction of a segmented damage severity dataset, categorizing damage into three degrees: undamaged structures, damaged structures, and debris. Utilizing this dataset, the study fine-tunes a SegFormer model to generate damage severity segmentations for post-earthquake social media images. Furthermore, a new damage severity scoring system is introduced, quantifying damage by considering the varying degrees of damage across different areas within images, adjusted for depth estimation. The application of this approach allows for the quantification of damage severity in social media images in a more objective and comprehensive manner. By providing a nuanced understanding of damage, this study enhances the ability to offer precise guidance to disaster reconnaissance teams, facilitating more effective and targeted response efforts in the aftermath of earthquakes.
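A minimal sketch of how such a depth-adjusted severity score could be computed from a segmentation mask; the three-class weights and the squared-depth area correction are illustrative assumptions, not the paper's published formula.

```python
import numpy as np

# Hypothetical severity weights for the three segmentation classes
# (illustrative values, not taken from the paper):
# 0 = undamaged structure, 1 = damaged structure, 2 = debris.
SEVERITY_WEIGHTS = {0: 0.0, 1: 0.5, 2: 1.0}

def damage_severity_score(seg_mask: np.ndarray, depth_map: np.ndarray) -> float:
    """Depth-adjusted damage score in [0, 1].

    Pixels farther from the camera cover more real-world area, so each
    pixel's severity weight is scaled by its normalized squared depth
    before averaging (one plausible form of "adjusted for depth").
    """
    area_weight = (depth_map / depth_map.max()) ** 2
    severity = np.zeros_like(area_weight)
    for cls, w in SEVERITY_WEIGHTS.items():
        severity[seg_mask == cls] = w
    return float((severity * area_weight).sum() / area_weight.sum())

# Toy usage: a 2x2 "image" with debris and a damaged structure up close.
mask = np.array([[2, 0], [1, 0]])
depth = np.array([[1.0, 4.0], [1.0, 4.0]])
print(damage_severity_score(mask, depth))
```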
Source detection is crucial for capturing the dynamics of real-world infectious diseases and informing effective containment strategies. Most existing approaches to source detection focus on conventional pairwise networks, whereas recent efforts on both mathematical modeling and analysis of contact data suggest that higher-order (e.g., group) interactions among individuals may both account for a large fraction of infection events and change our understanding of how epidemic spreading proceeds in empirical populations. In the present study, we propose a message-passing algorithm, called HDMPN, for source detection under stochastic susceptible-infectious dynamics on hypergraphs. By modulating the likelihood maximization method by the fraction of infectious neighbors, HDMPN aims to capture the influence of higher-order structures and outperform conventional likelihood maximization. We numerically show that, in most cases, HDMPN outperforms benchmarks, including the unmodified likelihood maximization method.
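To make the modulation idea concrete, here is a hedged toy sketch: candidate sources are ranked by a Monte-Carlo stand-in for the likelihood term, multiplied by the fraction of the candidate's infectious neighbors. The paper's HDMPN is a message-passing algorithm, so the simulation-based likelihood, the parameters beta, t, and runs, and the toy hypergraph are all assumptions for illustration.

```python
import random
from itertools import chain

def si_step(infected, hyperedges, beta):
    """One synchronous step of stochastic SI spreading on a hypergraph:
    every hyperedge containing an infectious node exposes its members,
    each of which becomes infected independently with probability beta."""
    new = set(infected)
    for e in hyperedges:
        if infected & e:
            new |= {v for v in e if random.random() < beta}
    return new

def frac_infectious_neighbors(v, hyperedges, observed):
    """Fraction of v's hypergraph neighbors that are observed infected."""
    nbrs = set(chain.from_iterable(e for e in hyperedges if v in e)) - {v}
    return len(nbrs & observed) / len(nbrs) if nbrs else 0.0

def hdmpn_like_score(v, hyperedges, observed, beta=0.3, t=3, runs=500):
    """Monte-Carlo stand-in for the likelihood of v being the source,
    modulated by the fraction of infectious neighbors (the HDMPN idea;
    the actual algorithm uses message passing, not simulation)."""
    hits = 0
    for _ in range(runs):
        infected = {v}
        for _ in range(t):
            infected = si_step(infected, hyperedges, beta)
        hits += (infected == observed)
    return (hits / runs) * frac_infectious_neighbors(v, hyperedges, observed)

edges = [frozenset({0, 1, 2}), frozenset({2, 3}), frozenset({1, 3, 4})]
observed = {0, 1, 2}
scores = {v: hdmpn_like_score(v, edges, observed) for v in observed}
print(max(scores, key=scores.get))
```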
Graph generation is an important area in network science. Traditional approaches focus on replicating specific properties of real-world graphs, such as small diameters or power-law degree distributions. Recent advancements in deep learning, particularly with Graph Neural Networks, have enabled data-driven methods to learn and generate graphs without relying on predefined structural properties. Despite these advances, current models are limited by their reliance on node IDs, which restricts their ability to generate graphs larger than the input graph and ignores node attributes. To address these challenges, we propose Latent Graph Sampling Generation (LGSG), a novel framework that leverages diffusion models and node embeddings to generate graphs of varying sizes without retraining. The framework eliminates the dependency on node IDs and captures the distribution of node embeddings and subgraph structures, enabling scalable and flexible graph generation. Experimental results show that LGSG performs on par with baseline models for standard metrics while outperforming them in overlooked ones, such as the tendency of nodes to form clusters. Additionally, it maintains consistent structural characteristics across graphs of different sizes, demonstrating robustness and scalability.
In the attention economy, understanding how individuals manage limited attention is critical. We introduce a simple model describing the decay of a user's engagement when facing multiple inputs. We analytically show that individual attention decay is determined by the overall duration of interactions, not by their number or by user activity. Our model is validated using data from Reddit's Change My View subreddit, where users' attention dynamics are explicitly traceable. Despite its simplicity, our model offers a crucial microscopic perspective complementing macroscopic studies.
During the COVID-19 pandemic, scientific understanding related to the topic evolved rapidly. Along with scientific information being discussed widely, a large circulation of false information, labelled an infodemic by the WHO, emerged. Here, we study the interaction between misinformation and science on Twitter (now X) during the COVID-19 pandemic. We built a comprehensive database of $\sim$407M COVID-19 related tweets and classified the reliability of URLs in the tweets based on Media Bias/Fact Check. In addition, we used Altmetric data to determine whether a tweet refers to a scientific publication. We find that many users share both scientific and unreliable content; out of the $\sim$1.2M users who share science, $45\%$ also share unreliable content. Publications that are more frequently shared by users who also share unreliable content are more likely to be preprints, slightly more often retracted, have fewer citations, and are published in lower-impact journals on average. Our findings suggest that misinformation is not related to a ``deficit'' of science. In addition, our findings raise some critical questions about certain open science practices and their potential for misuse. Given the fundamental opposition between science and misinformation, our findings highlight the necessity for proactive scientific engagement on social media platforms to counter false narratives during global crises.
Mild Cognitive Impairment (MCI) is a critical transitional stage between normal cognitive aging and dementia, making its early detection essential. This study investigates the neural mechanisms of distractor suppression in MCI patients using EEG and behavioral data during an attention-cueing Eriksen flanker task. A cohort of 56 MCIs and 26 healthy controls (HCs) performed tasks with congruent and incongruent stimuli of varying saliency levels. During these tasks, EEG data were analyzed for functional connectivity based on alpha-band coherence, focusing on Global Efficiency (GE), while Reaction Time (RT) and Hit Rate (HR) were also collected. Our findings reveal significant interactions between congruency, saliency, and cognitive status on GE, RT, and HR. In HCs, congruent conditions resulted in higher GE (p = 0.0114, multivariate t-distribution correction, MVT), faster RTs (p < 0.0001, MVT), and higher HRs (p < 0.0001, MVT) compared to incongruent conditions. HCs also showed increased GE in salient conditions for incongruent trials (p = 0.0406, MVT). MCIs exhibited benefits from congruent conditions with shorter RTs and higher HRs (both p < 0.0001, MVT) compared to incongruent conditions but showed reduced adaptability in GE, with no significant GE differences between conditions. These results highlight the potential of alpha band coherence and GE as early markers for cognitive impairment. By integrating GE, RT, and HR, this study provides insights into the interplay between neural efficiency, processing speed, and task accuracy. This approach offers valuable insights into cognitive load management and interference effects, indicating benefits for interventions aimed at improving attentional control and processing speed in MCIs.
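As an illustration of the GE metric, the sketch below binarizes a coherence matrix at an assumed threshold and computes Global Efficiency with networkx; the thresholding step and its value are assumptions, since the abstract does not specify how the connectivity graph is built.

```python
import numpy as np
import networkx as nx

def global_efficiency_from_coherence(coh: np.ndarray, threshold: float = 0.5) -> float:
    """Binarize an alpha-band coherence matrix at an assumed threshold
    and compute Global Efficiency: the average inverse shortest-path
    length over all pairs of channels."""
    adj = (coh >= threshold).astype(int)
    np.fill_diagonal(adj, 0)          # no self-connections
    G = nx.from_numpy_array(adj)
    return nx.global_efficiency(G)

rng = np.random.default_rng(0)
coh = rng.uniform(0, 1, size=(8, 8))
coh = (coh + coh.T) / 2               # coherence matrices are symmetric
print(global_efficiency_from_coherence(coh))
```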
Time-varying group interactions constitute the building blocks of many complex systems. The framework of temporal hypergraphs makes it possible to represent them by taking into account the higher-order and temporal nature of the interactions. However, the corresponding datasets are often incomplete and/or limited in size and duration, and surrogate time-varying hypergraphs able to reproduce their statistical features constitute interesting substitutes, especially to understand how dynamical processes unfold on group interactions. Here, we present a new temporal hypergraph model, the Emerging Activity Temporal Hypergraph (EATH), which can be fed by parameters measured in a dataset and create synthetic datasets with similar properties. In the model, each node has an independent underlying activity dynamic, and the overall system activity emerges from the nodes' dynamics, with temporal group interactions resulting from both the activity of the nodes and memory mechanisms. We first show that the EATH model can generate surrogate hypergraphs of several empirical datasets of face-to-face interactions, mimicking temporal and topological properties at the node and hyperedge level. We also showcase the possibility of using the resulting synthetic data in simulations of higher-order contagion dynamics, comparing the outcome of such processes on original and surrogate datasets. Finally, we illustrate the flexibility of the model, which can generate synthetic hypergraphs with tunable properties: as an example, we generate "hybrid" temporal hypergraphs, which mix properties of different empirical datasets. Our work opens several perspectives, from the generation of synthetic realistic hypergraphs describing contexts where data collection is difficult to a deeper understanding of dynamical processes on temporal hypergraphs.
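A toy generator in the spirit of the described ingredients (independent node activities plus a memory mechanism); every modelling detail here, from the uniform activity rates to the fixed group size and the memory probability, is an assumption rather than the EATH specification.

```python
import random

def toy_activity_memory_hypergraph(n=50, steps=100, memory_p=0.5, group_size=3):
    """Toy temporal hypergraph: each node fires independently at its own
    rate; simultaneously active nodes form groups; with probability
    memory_p a group re-activates a previously seen hyperedge instead
    of creating a fresh one (a crude memory mechanism)."""
    activity = [random.uniform(0.01, 0.2) for _ in range(n)]
    history, events = [], []
    for t in range(steps):
        active = [v for v in range(n) if random.random() < activity[v]]
        random.shuffle(active)
        for i in range(0, len(active) - group_size + 1, group_size):
            if history and random.random() < memory_p:
                e = random.choice(history)            # repeat a past group
            else:
                e = frozenset(active[i:i + group_size])
                history.append(e)
            events.append((t, e))
    return events

events = toy_activity_memory_hypergraph()
print(len(events), "timed hyperedges generated")
```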
We investigate how Large Language Models (LLMs) behave when simulating political discourse on social media. Leveraging 21 million interactions on X during the 2024 U.S. presidential election, we construct LLM agents based on 1,186 real users, prompting them to reply to politically salient tweets under controlled conditions. Agents are initialized either with minimal ideological cues (Zero Shot) or recent tweet history (Few Shot), allowing one-to-one comparisons with human replies. We evaluate three model families (Gemini, Mistral, and DeepSeek) across linguistic style, ideological consistency, and toxicity. We find that richer contextualization improves internal consistency but also amplifies polarization, stylized signals, and harmful language. We observe an emergent distortion that we call "generation exaggeration": a systematic amplification of salient traits beyond empirical baselines. Our analysis shows that LLMs do not emulate users; they reconstruct them. Indeed, their outputs reflect internal optimization dynamics more than observed behavior, introducing structural biases that compromise their reliability as social proxies. This challenges their use in content moderation, deliberative simulations, and policy modeling.
We present a novel methodology for modelling, visualising, and analysing cyber threats and attack paths, as well as their impact on user services in enterprise or infrastructure networks of digital devices and the services they provide. Using probabilistic methods to track the propagation of an attack through attack graphs, via the service or application layers, and on physical communication networks, our model enables us to analyse cyber attacks at different levels of detail. Understanding the propagation of an attack among microservices within a service, and its spread between different services or application servers, could help detect and mitigate it early. We demonstrate that this network-based influence spreading modelling approach enables the evaluation of diverse attack scenarios and the development of protection and mitigation measures, taking into account the criticality of services from the user's perspective. This methodology could also aid security specialists and system administrators in making well-informed decisions regarding risk mitigation strategies.
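A minimal sketch of probabilistic influence spreading on an attack graph: compromise probabilities are propagated iteratively under an independence assumption on edge-wise exploit probabilities. The node names and the exploit_p parameter are hypothetical, and the paper's exact propagation rules may differ.

```python
import networkx as nx

def compromise_probabilities(G: nx.DiGraph, entry: str, iters: int = 20) -> dict:
    """Iteratively approximate each node's probability of compromise,
    treating exploitation along each incoming edge as independent:
    P(v) = 1 - prod_u (1 - P(u) * exploit_p(u, v))."""
    p = {v: 0.0 for v in G}
    p[entry] = 1.0
    for _ in range(iters):
        for v in G:
            if v == entry:
                continue
            q = 1.0
            for u in G.predecessors(v):
                q *= 1.0 - p[u] * G[u][v]["exploit_p"]
            p[v] = 1.0 - q
    return p

# Hypothetical four-node attack graph spanning service and network layers.
G = nx.DiGraph()
G.add_edge("internet", "web-app", exploit_p=0.4)
G.add_edge("web-app", "auth-svc", exploit_p=0.2)
G.add_edge("web-app", "db", exploit_p=0.3)
G.add_edge("auth-svc", "db", exploit_p=0.5)
print(compromise_probabilities(G, "internet"))
```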
This paper investigates International Research Collaboration (IRC) among European Union (EU) countries from 2011 to 2022, with emphasis on gender-based authorship patterns. Drawing from the Web of Science Social Science Citation Index (WoS-SSCI) database, a large dataset of IRC articles was constructed, annotated with authorship categories based on gender, author affiliation, and COVID-19 as a subject topic. Using network science, the study maps collaboration structures and reveals gendered differences in co-authorship networks. Results highlight a substantial rise in IRC over the decade, particularly with the USA and China as key non-EU partners. Articles with at least one female author were consistently less frequent than those with at least one male author. Notably, female-exclusive collaborations showed distinctive network topologies, with more centralized (star-like) patterns and shorter tree diameters. The COVID-19 pandemic further reshaped collaboration dynamics, temporarily reducing the gender gap in IRC but also revealing vulnerabilities in female-dominated research networks. These findings underscore both progress and persistent disparities in the gender dynamics of EU participation in IRC.
Understanding the functional roles of financial institutions within interconnected markets is critical for effective supervision, systemic risk assessment, and resolution planning. We propose an interpretable role-based clustering approach for multi-layer financial networks, designed to identify the functional positions of institutions across different market segments. Our method follows a general clustering framework defined by proximity measures, cluster evaluation criteria, and algorithm selection. We construct explainable node embeddings based on egonet features that capture both direct and indirect trading relationships within and across market layers. Using transaction-level data from the ECB's Money Market Statistical Reporting (MMSR), we demonstrate how the approach uncovers heterogeneous institutional roles such as market intermediaries, cross-segment connectors, and peripheral lenders or borrowers. The results highlight the flexibility and practical value of role-based clustering in analyzing financial networks and understanding institutional behavior in complex market structures.
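A hedged sketch of such a pipeline: explainable egonet features per market layer, followed by k-means role clustering. The specific feature set (in/out degree and egonet density) and the random toy layers are illustrative stand-ins, not the paper's MMSR-based construction.

```python
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans

def egonet_features(layers, node):
    """Explainable per-layer features: in-degree, out-degree, and the
    density of the node's 1-hop egonet (capturing direct and indirect
    relationships within each market layer)."""
    feats = []
    for G in layers.values():
        ego = nx.ego_graph(G.to_undirected(), node, radius=1)
        feats += [
            G.in_degree(node),
            G.out_degree(node),
            nx.density(ego) if len(ego) > 1 else 0.0,
        ]
    return feats

# Two toy market layers standing in for, e.g., secured/unsecured segments.
layers = {
    "secured": nx.gnp_random_graph(30, 0.10, directed=True, seed=1),
    "unsecured": nx.gnp_random_graph(30, 0.15, directed=True, seed=2),
}
X = np.array([egonet_features(layers, v) for v in range(30)])
roles = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(roles)   # cluster label = candidate functional role per institution
```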
The evolution of cooperation in networked systems helps to understand the dynamics in social networks, multi-agent systems, and biological species. The self-persistence of individual strategies is common in real-world decision making. The self-replacement of strategies in evolutionary dynamics forms a selection amplifier, allows an agent to insist on its autologous strategy, and helps the networked system to avoid full defection. In this paper, we study self-interaction learning in networked evolutionary dynamics. We propose a self-interaction landscape to capture the strength of an agent's self-loop to reproduce the strategy based on local topology. We find that proper self-interaction can lower the critical condition for cooperation and help cooperators prevail in the system. For a system that favors the evolution of spite, self-interaction can save cooperative agents from being harmed. Our results on random networks further suggest that an appropriate self-interaction landscape can significantly reduce the critical condition for advantageous mutants, especially for large-degree networks.
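One way to picture self-interaction in networked evolutionary dynamics is a death-birth update in which the focal agent's own strategy competes with its neighbors' via a self-loop weight; the degree-based self_weight landscape and the toy payoffs below are assumptions for illustration, not the paper's model.

```python
import random
import networkx as nx

def death_birth_step(G, strategy, payoff, self_weight):
    """One death-birth update with self-interaction: a random agent is
    replaced by copying a neighbor's strategy, or keeping its own, with
    probability proportional to payoff; self_weight[v] sets the strength
    of v's self-loop (the 'self-interaction landscape')."""
    v = random.choice(list(G))
    candidates = list(G.neighbors(v)) + [v]
    weights = [payoff[u] * (self_weight[v] if u == v else 1.0) for u in candidates]
    strategy[v] = strategy[random.choices(candidates, weights=weights)[0]]

G = nx.barabasi_albert_graph(20, 2, seed=0)
strategy = {v: random.choice("CD") for v in G}
payoff = {v: 1.5 if strategy[v] == "C" else 1.0 for v in G}   # toy payoffs
self_weight = {v: 1 + G.degree(v) / 10 for v in G}            # toy landscape
death_birth_step(G, strategy, payoff, self_weight)
print(strategy)
```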
The notions of network $r$-robustness and $(r,s)$-robustness have been introduced in the literature to achieve resilient control in the presence of misbehaving agents. However, while higher robustness levels provide networks with higher tolerance against misbehaving agents, they also require dense communication structures, which are not always desirable for systems with limited capabilities and energy capacities. Therefore, this paper studies the fundamental structures behind the $r$-robustness and $(r,s)$-robustness properties in two ways. (a) We first explore and establish tight necessary conditions on the number of edges that undirected graphs with any number of nodes must satisfy to achieve maximum $r$- and $(r,s)$-robustness. (b) We then use these conditions to construct two classes of undirected graphs, referred to as $\gamma$- and $(\gamma,\gamma)$-Minimal Edge Robust Graphs (MERGs), that provably achieve maximum robustness with minimal numbers of edges. We finally validate our work through simulations.
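For readers unfamiliar with the property, here is a minimal brute-force checker for the standard definition of $r$-robustness: every pair of nonempty, disjoint node subsets must have at least one subset containing a node with at least $r$ neighbors outside it. The check is exponential in the number of nodes, so it is only practical for small graphs; the example graphs are illustrative.

```python
from itertools import combinations
import networkx as nx

def is_r_robust(G: nx.Graph, r: int) -> bool:
    """Brute-force r-robustness check (exponential; small graphs only):
    for every pair of nonempty, disjoint node subsets, at least one
    subset must contain a node with >= r neighbors outside it."""
    def r_reachable(S):
        return any(len(set(G[v]) - S) >= r for v in S)
    nodes = list(G)
    subsets = [set(c) for k in range(1, len(nodes))
               for c in combinations(nodes, k)]
    return all(r_reachable(S1) or r_reachable(S2)
               for S1 in subsets for S2 in subsets if not S1 & S2)

print(is_r_robust(nx.complete_graph(5), 2))   # True: K_5 is even 3-robust
print(is_r_robust(nx.cycle_graph(6), 2))      # False: cycles are only 1-robust
```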
Seasonal migration plays a critical role in stabilizing rural economies and sustaining the livelihoods of agricultural households. Violence and civil conflict have long been thought to disrupt these labor flows, but this hypothesis has historically been hard to test given the lack of reliable data on migration in conflict zones. Focusing on Afghanistan in the 8-year period prior to the Taliban's takeover in 2021, we first demonstrate how satellite imagery can be used to infer the timing of the opium harvest, which employs a large number of seasonal workers in relatively well-paid jobs. We then use a dataset of nationwide mobile phone records to characterize the migration response to this harvest, and examine whether and how violence and civil conflict disrupt this migration. We find that, on average, districts with high levels of poppy cultivation receive significantly more seasonal migrants than districts with no poppy cultivation. These labor flows are surprisingly resilient to idiosyncratic violent events at the source or destination, including extreme violence resulting in large numbers of fatalities. However, seasonal migration is affected by longer-term patterns of conflict, such as the extent of Taliban control in origin and destination locations.
This paper presents a social learning model where the network structure is endogenously determined by signal precision and dimension choices. Agents choose both the precision of their signals and which dimension of the state to learn about, and these decisions directly determine the underlying network structure on which social learning occurs. We show that under a fixed network structure, the optimal precision choice is sublinear in the agent's stationary influence in the network, and this individually optimal choice is worse than the socially optimal choice by a factor of $n^{1/3}$. Under a dynamic network structure, we specify the network by defining a kernel distance between agents, which then determines how much weight agents place on one another. Agents choose dimensions to learn about such that their choice minimizes the sum of squared influences of all agents: a network with equally distributed influence across agents is ideal.
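A one-line reasoning step behind the final claim, assuming the stationary influences form a probability vector (an assumption consistent with, but not stated in, the abstract):

```latex
% Assuming stationary influences \pi_1,\dots,\pi_n \ge 0 with \sum_i \pi_i = 1,
% the Cauchy-Schwarz inequality gives
\[
  \sum_{i=1}^{n} \pi_i^{2} \;\ge\; \frac{1}{n}\left(\sum_{i=1}^{n} \pi_i\right)^{2} = \frac{1}{n},
\]
% with equality if and only if \pi_i = 1/n for every agent i: the sum of
% squared influences is minimized exactly when influence is equally
% distributed across agents.
```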
This paper argues for a new paradigm for Community Notes in the LLM era: an open ecosystem where both humans and LLMs can write notes, and the decision of which notes are helpful enough to show remains in the hands of humans. This approach can accelerate the delivery of notes, while maintaining trust and legitimacy through Community Notes' foundational principle: A community of diverse human raters collectively serve as the ultimate evaluator and arbiter of what is helpful. Further, the feedback from this diverse community can be used to improve LLMs' ability to produce accurate, unbiased, broadly helpful notes--what we term Reinforcement Learning from Community Feedback (RLCF). This becomes a two-way street: LLMs serve as an asset to humans--helping deliver context quickly and with minimal effort--while human feedback, in turn, enhances the performance of LLMs. This paper describes how such a system can work, its benefits, key new risks and challenges it introduces, and a research agenda to solve those challenges and realize the potential of this approach.
While the Internet's core infrastructure was designed to be open and universal, today's application layer is dominated by closed, proprietary platforms. Open and interoperable APIs require significant investment, and market leaders have little incentive to enable data exchange that could erode their user lock-in. We argue that LLM-based agents fundamentally disrupt this status quo. Agents can automatically translate between data formats and interact with interfaces designed for humans: this makes interoperability dramatically cheaper and effectively unavoidable. We name this shift universal interoperability: the ability for any two digital services to exchange data seamlessly using AI-mediated adapters. Universal interoperability undermines monopolistic behaviours and promotes data portability. However, it can also lead to new security risks and technical debt. Our position is that the ML community should embrace this development while building the appropriate frameworks to mitigate the downsides. By acting now, we can harness AI to restore user freedom and competitive markets without sacrificing security.
Humans increasingly rely on large language models (LLMs) to support decisions in social settings. Previous work suggests that such tools shape people's moral and political judgements. However, the long-term implications of LLM-based social decision-making remain unknown. How will human cooperation be affected when the assessment of social interactions relies on language models? This is a pressing question, as human cooperation is often driven by indirect reciprocity, reputations, and the capacity to judge interactions of others. Here, we assess how state-of-the-art LLMs judge cooperative actions. We provide 21 different LLMs with an extensive set of examples where individuals cooperate -- or refuse to cooperate -- in a range of social contexts, and ask how these interactions should be judged. Furthermore, through an evolutionary game-theoretical model, we evaluate cooperation dynamics in populations where the extracted LLM-driven judgements prevail, assessing the long-term impact of LLMs on human prosociality. We observe remarkable agreement in evaluating cooperation with good opponents. On the other hand, we notice within- and between-model variance when judging cooperation with ill-reputed individuals. We show that the differences revealed between models can significantly impact the prevalence of cooperation. Finally, we test prompts to steer LLM norms, showing that such interventions can shape LLM judgements, particularly through goal-oriented prompts. Our research connects LLM-based advice to long-term social dynamics, and highlights the need to carefully align LLM norms in order to preserve human cooperation.
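A hedged sketch of the kind of evolutionary model described: a donation game with reputations updated by a judging norm. The stern-judging-like norm table stands in for judgements extracted from LLMs and, like the payoff and imitation parameters, is an assumption for illustration rather than the paper's calibration.

```python
import random

# Hypothetical judging norm standing in for LLM-extracted judgements:
# (donor action, recipient reputation) -> donor's new reputation.
NORM = {("C", "G"): "G", ("D", "G"): "B",
        ("C", "B"): "B", ("D", "B"): "G"}

def simulate(n=100, rounds=20000, b=5.0, c=1.0, mu=0.01):
    """Donation game with indirect reciprocity: donors are judged by
    NORM; discriminators cooperate only with good-reputed recipients;
    strategies spread by payoff-biased imitation with rare mutation."""
    strategies = ("ALLC", "ALLD", "DISC")
    strat = [random.choice(strategies) for _ in range(n)]
    rep = ["G"] * n
    payoff = [0.0] * n
    for _ in range(rounds):
        d, r = random.sample(range(n), 2)
        act = {"ALLC": "C", "ALLD": "D",
               "DISC": "C" if rep[r] == "G" else "D"}[strat[d]]
        if act == "C":
            payoff[d] -= c
            payoff[r] += b
        rep[d] = NORM[(act, rep[r])]
        if random.random() < 0.1:                 # occasional imitation
            a, m = random.sample(range(n), 2)
            if random.random() < mu:
                strat[a] = random.choice(strategies)
            elif payoff[m] > payoff[a]:
                strat[a] = strat[m]
    return {s: strat.count(s) for s in strategies}

print(simulate())
```

Under a norm table like this one, which strategy prevails depends strongly on how defection against ill-reputed individuals is judged, which is exactly where the abstract reports within- and between-model variance.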