Loading...
Loading...
Browse, search and filter the latest cybersecurity research papers from arXiv
The configuration model is a cornerstone of statistical assessment of network structure. While the Chung-Lu model is among the most widely used configuration models, it systematically oversamples edges between large-degree nodes, leading to inaccurate statistical conclusions. Although the maximum entropy principle offers unbiased configuration models, its high computational cost has hindered widespread adoption, making the Chung-Lu model an inaccurate yet persistently practical choice. Here, we propose fast and efficient sampling algorithms for the max-entropy-based models by adapting the Miller-Hagberg algorithm. Evaluation on 103 empirical networks demonstrates 10-1000 times speedup, making theoretically rigorous configuration models practical and contributing to a more accurate understanding of network structure.
Attempts to manipulate webgraphs can have many downstream impacts, but analysts lack shared quantitative metrics to characterize actions taken to manipulate information environments at this level. We demonstrate how the BEND framework can be used to characterize attempts to manipulate webgraph information environments, and propose quantitative metrics for BEND community maneuvers. We demonstrate the face validity of our proposed Webgraph BEND metrics by using them to characterize two small web-graphs containing SEO-boosted Kremlin-aligned websites. We demonstrate how our proposed metrics improve BEND scores in webgraph settings and demonstrate the usefulness of our metrics in characterizing webgraph information environments. These metrics offer analysts a systematic and standardized way to characterize attempts to manipulate webgraphs using common Search Engine Optimization tactics.
We study how participation in collective action is articulated in podcast discussions, using the Black Lives Matter (BLM) movement as a case study. While research on collective action discourse has primarily focused on text-based content, this study takes a first step toward analyzing audio formats by using podcast transcripts. Using the Structured Podcast Research Corpus (SPoRC), we investigated spoken language expressions of participation in collective action, categorized as problem-solution, call-to-action, intention, and execution. We identified podcast episodes discussing racial justice after important BLM-related events in May and June of 2020, and extracted participatory statements using a layered framework adapted from prior work on social media. We examined the emotional dimensions of these statements, detecting eight key emotions and their association with varying stages of activism. We found that emotional profiles vary by stage, with different positive emotions standing out during calls-to-action, intention, and execution. We detected negative associations between collective action and negative emotions, contrary to theoretical expectations. Our work contributes to a better understanding of how activism is expressed in spoken digital discourse and how emotional framing may depend on the format of the discussion.
We study sublinear-time algorithms for solving linear systems $Sz = b$, where $S$ is a diagonally dominant matrix, i.e., $|S_{ii}| \geq \delta + \sum_{j \ne i} |S_{ij}|$ for all $i \in [n]$, for some $\delta \geq 0$. We present randomized algorithms that, for any $u \in [n]$, return an estimate $z_u$ of $z^*_u$ with additive error $\varepsilon$ or $\varepsilon \lVert z^*\rVert_\infty$, where $z^*$ is some solution to $Sz^* = b$, and the algorithm only needs to read a small portion of the input $S$ and $b$. For example, when the additive error is $\varepsilon$ and assuming $\delta>0$, we give an algorithm that runs in time $O\left( \frac{\|b\|_\infty^2 S_{\max}}{\delta^3 \varepsilon^2} \log \frac{\| b \|_\infty}{\delta \varepsilon} \right)$, where $S_{\max} = \max_{i \in [n]} |S_{ii}|$. We also prove a matching lower bound, showing that the linear dependence on $S_{\max}$ is optimal. Unlike previous sublinear-time algorithms, which apply only to symmetric diagonally dominant matrices with non-negative diagonal entries, our algorithm works for general strictly diagonally dominant matrices ($\delta > 0$) and a broader class of non-strictly diagonally dominant matrices $(\delta = 0)$. Our approach is based on analyzing a simple probabilistic recurrence satisfied by the solution. As an application, we obtain an improved sublinear-time algorithm for opinion estimation in the Friedkin--Johnsen model.
Reliable electricity supply depends on the seamless operation of high-voltage grid infrastructure spanning both transmission and sub-transmission levels. Beneath this apparent uniformity lies a striking structural diversity, which leaves a clear imprint on system vulnerability. In this paper, we present harmonized topological models of the high-voltage grids of 15 European countries, integrating all elements at voltage levels above 110 kV. Topological analysis of these networks reveals a simple yet robust pattern: node degree distributions consistently follow an exponential decay, but the rate of decay varies significantly across countries. Through a detailed and systematic evaluation of network tolerance to node and edge removals, we show that the decay rate delineates the boundary between systems that are more resilient to failures and those that are prone to large-scale disruptions. Furthermore, we demonstrate that this numerical boundary is highly sensitive to which layers of the infrastructure are included in the models. To our knowledge, this study provides the first quantitative cross-country comparison of 15 European high-voltage networks, linking topological properties with vulnerability characteristics.
In many real-world scenarios, an individual's local social network carries significant influence over the opinions they form and subsequently propagate to others. In this paper, we propose a novel diffusion model -- the Pressure Threshold model (PT) -- for dynamically simulating the spread of influence through a social network. This new model extends the popular Linear Threshold Model (LT) by adjusting a node's outgoing influence proportional to the influence it receives from its activated neighbors. We address the Influence Maximization (IM) problem, which involves selecting the most effective seed nodes to achieve maximal graph coverage after a diffusion process, and how the problem manifests with the PT Model. Experiments conducted on real-world networks, facilitated by enhancements to the open-source network-diffusion Python library, CyNetDiff, demonstrate unique seed node selection for the PT Model when compared to the LT Model. Moreover, analyses demonstrate that densely connected networks amplify pressure effects more significantly than sparse networks.
The main goal of this paper is to investigate an up and coming crowdfunding platform used to raise funds for social causes in India called Ketto. Despite the growing usage of this platform, there is insufficient understanding in terms of why users choose this platform when there are other popular platforms such as GoFundMe. Using a dataset comprising of 119,493 Ketto campaigns, our research conducts an in-depth investigation into different aspects of how the campaigns on Ketto work with a specific focus on medical campaigns, which make up the largest percentage of social causes in the dataset. We also perform predictive modeling to identify the factors that contribute to the success of campaigns on this platform. We use several features such as the campaign metadata, description, geolocation, donor behaviors, and campaign-related features to learn about the platform and its components. Our results suggest that majority of the campaigns for medical causes seek funds to address chronic health conditions, yet medical campaigns have the least success rate. Most of the campaigns originate from the most populous states and major metropolitan cities in India. Our analysis also indicates that factors such as online engagement on the platform in terms of the number of comments, duration of the campaign, and frequent updates on a campaign positively influence the funds being raised. Overall, this preliminary work sheds light on the importance of investigating various dynamics around crowdfunding for India-focused community-driven needs.
A smart city is essential for sustainable urban development. In addition to citizen engagement, a smart city enables connected infrastructure, data-driven decision making and smart mobility. For most of these features, network data plays a critical role, particularly from public Wi-Fi infrastructures, where cities can benefit from optimized services such as public transport management and the safety and efficiency of large events. One of the biggest concerns in developing a smart city is using secure and private data. This is particularly relevant in the case of Wi-Fi network data, where sensitive information can be collected. This paper specifically addresses the problem of sharing secure data to enhance the quality of the Wi-Fi network in a city. Despite the high importance of this type of data, related work focuses on improving the safety of mobility patterns, targeting only the protection of MAC addresses. On the opposite side, we provide a practical methodology for safeguarding all attributes in real Wi-Fi network data. This study was developed in collaboration with a multidisciplinary team of legal experts, data custodians and technical privacy specialists, resulting in high-quality data. On top of that, we show how to integrate the legal considerations for secure data sharing. Our approach promotes data-driven innovation and privacy awareness in the context of smart city initiatives, which have been tested in a real scenario.
We analyze a simple algorithm for network embedding, explicitly characterizing conditions under which the learned representation encodes the graph's generative model fully, partially, or not at all. In cases where the embedding loses some information (i.e., is not invertible), we describe the equivalence classes of graphons that map to the same embedding, finding that these classes preserve community structure but lose substantial density information. Finally, we show implications for community detection and link prediction. Our results suggest strong limitations on the effectiveness of link prediction based on embeddings alone, and we show common conditions under which naive link prediction adds edges in a disproportionate manner that can either mitigate or exacerbate structural biases.
Preferential attachment is often suggested to be the underlying mechanism of the growth of a network, largely due to that many real networks are, to a certain extent, scale-free. However, such attribution is usually made under debatable practices of determining scale-freeness and when only snapshots of the degree distribution are observed. In the presence of the evolution history of the network, modelling the increments of the evolution allows us to measure preferential attachment directly. Therefore, we propose a generalised linear model for such purpose, where the in-degrees and their increments are the covariate and response, respectively. Not only are the parameters that describe the preferential attachment directly incorporated, they also ensure that the tail heaviness of the asymptotic degree distribution is realistic. The Bayesian approach to inference enables the hierarchical version of the model to be implemented naturally. The application to the dependency network of R packages reveals subtly different behaviours between new dependencies by new and existing packages, and between addition and removal of dependencies.
Scientific research needs a new system that appropriately values science and scientists. Key innovations, within institutions and funding agencies, are driving better assessment of research, with open knowledge and FAIR (findable, accessible, interoperable, and reusable) principles as central pillars. Furthermore, coalitions, agreements, and robust infrastructures have emerged to promote more accurate assessment metrics and efficient knowledge sharing. However, despite these efforts, the system still relies on outdated methods where standardized metrics such as h-index and journal impact factor dominate evaluations. These metrics have had the unintended consequence of pushing researchers to produce more outputs at the expense of integrity and reproducibility. In this community paper, we bring together a global community of researchers, funding institutions, industrial partners, and publishers from 14 different countries across the 5 continents. We aim at collectively envision an evolved knowledge sharing and research evaluation along with the potential positive impact on every stakeholder involved. We imagine these ideas to set the groundwork for a cultural change to redefine a more fair and equitable scientific landscape.
The complex systems keyword diagram generated by the author in 2010 has been used widely in a variety of educational and outreach purposes, but it definitely needs a major update and reorganization. This short paper reports our recent attempt to update the keyword diagram using information collected from the following multiple sources: (a) collective feedback posted on social media, (b) recent reference books on complex systems and network science, (c) online resources on complex systems, and (d) keyword search hits obtained using OpenAlex, an open-access bibliographic catalogue of scientific publications. The data (a), (b) and (c) were used to incorporate the research community's internal perceptions of the relevant topics, whereas the data (d) was used to obtain more objective measurements of the keywords' relevance and associations from publications made in complex systems science. Results revealed differences and overlaps between public perception and actual usage of keywords in publications on complex systems. Four topical communities were obtained from the keyword association network, although they were highly intertwined with each other. We hope that the resulting network visualization of complex systems keywords provides a more up-to-date, accurate topic map of the field of complex systems as of today.
Given its computational efficiency and versatility, belief propagation is the most prominent message passing method in several applications. In order to diminish the damaging effect of loops on its accuracy, the first explicit version of generalized belief propagation for networks, the KCN-method, was recently introduced. This approach was originally developed in the context of two target problems: percolation and the calculation of the spectra of sparse matrices. Later on, the KCN-method was extended in order to deal with inference in the context of probabilistic graphical models on networks. It was in this scenario where an improvement on the KCN-method, the NIB-method, was conceived. We show here that this improvement can also achieved in the original applications of the KCN-method, namely percolation and matrix spectra.
We study the Susceptible-Infectious-Susceptible (SIS) model on arbitrary networks. The well-established pair approximation treats neighboring pairs of nodes exactly while making a mean field approximation for the rest of the network. We improve the method by expanding the state space dynamically, giving nodes a memory of when they last became susceptible. The resulting approximation is simple to implement and appears to be highly accurate, both in locating the epidemic threshold and in computing the quasi-stationary fraction of infected individuals above the threshold, for both finite graphs and infinite random graphs.
Domestic Violence (DV) is a pervasive public health problem characterized by patterns of coercive and abusive behavior within intimate relationships. With the rise of social media as a key outlet for DV victims to disclose their experiences, online self-disclosure has emerged as a critical yet underexplored avenue for support-seeking. In addition, existing research lacks a comprehensive and nuanced understanding of DV self-disclosure, support provisions, and their connections. To address these gaps, this study proposes a novel computational framework for modeling DV support-seeking behavior alongside community support mechanisms. The framework consists of four key components: self-disclosure detection, post clustering, topic summarization, and support extraction and mapping. We implement and evaluate the framework with data collected from relevant social media communities. Our findings not only advance existing knowledge on DV self-disclosure and online support provisions but also enable victim-centered digital interventions.
Community detection is a core tool for analyzing large realworld graphs. It is often used to derive additional local features of vertices and edges that will be used to perform a downstream task, yet the impact of community detection on downstream tasks is poorly understood. Prior work largely evaluates community detection algorithms by their intrinsic objectives (e.g., modularity). Or they evaluate the impact of using community detection onto on the downstream task. But the impact of particular community detection algortihm support the downstream task. We study the relationship between community structure and downstream performance across multiple algorithms and two tasks. Our analysis links community-level properties to task metrics (F1, precision, recall, AUC) and reveals that the choice of detection method materially affects outcomes. We explore thousands of community structures and show that while the properties of communities are the reason behind the impact on task performance, no single property explains performance in a direct way. Rather, results emerge from complex interactions among properties. As such, no standard community detection algorithm will derive the best downstream performance. We show that a method combining random community generation and simple machine learning techniques can derive better performance
In this work we present PercIS, an algorithm based on Importance Sampling to approximate the percolation centrality of all the nodes of a graph. Percolation centrality is a generalization of betweenness centrality to attributed graphs, and is a useful measure to quantify the importance of the vertices in a contagious process or to diffuse information. However, it is impractical to compute it exactly on modern-sized networks. First, we highlight key limitations of state-of-the-art sampling-based approximation methods for the percolation centrality, showing that in most cases they cannot achieve accurate solutions efficiently. Then, we propose and analyze a novel sampling algorithm based on Importance Sampling, proving tight sample size bounds to achieve high-quality approximations. Our extensive experimental evaluation shows that PercIS computes high-quality estimates and scales to large real-world networks, while significantly outperforming, in terms of sample sizes, accuracy and running times, the state-of-the-art.
The emergence of decentralized social media platforms presents new opportunities and challenges for real-time analysis of public discourse. This study introduces CognitiveSky, an open-source and scalable framework designed for sentiment, emotion, and narrative analysis on Bluesky, a federated Twitter or X.com alternative. By ingesting data through Bluesky's Application Programming Interface (API), CognitiveSky applies transformer-based models to annotate large-scale user-generated content and produces structured and analyzable outputs. These summaries drive a dynamic dashboard that visualizes evolving patterns in emotion, activity, and conversation topics. Built entirely on free-tier infrastructure, CognitiveSky achieves both low operational cost and high accessibility. While demonstrated here for monitoring mental health discourse, its modular design enables applications across domains such as disinformation detection, crisis response, and civic sentiment analysis. By bridging large language models with decentralized networks, CognitiveSky offers a transparent, extensible tool for computational social science in an era of shifting digital ecosystems.