Loading...
Loading...
Browse, search and filter the latest cybersecurity research papers from arXiv
This study investigates differential games with motion-payoff uncertainty in continuous-time settings. We propose a framework where players update their beliefs about uncertain parameters using continuous Bayesian updating. Theoretical proofs leveraging key probability theorems demonstrate that players' beliefs converge to the true parameter values, ensuring stability and accuracy in long-term estimations. We further derive Nash Equilibrium strategies with continuous Bayesian updating for players, emphasizing the role of belief updates in decision-making processes. Additionally, we establish the convergence of Nash Equilibrium strategies with continuous Bayesian updating. The efficacy of both continuous and dynamic Bayesian updating is examined in the context of pollution control games, showing convergence in players' estimates under small time intervals in discrete scenarios.
We introduce and study the problem of online omniprediction with long-term constraints. At each round, a forecaster is tasked with generating predictions for an underlying (adaptively, adversarially chosen) state that are broadcast to a collection of downstream agents, who must each choose an action. Each of the downstream agents has both a utility function mapping actions and state to utilities, and a vector-valued constraint function mapping actions and states to vector-valued costs. The utility and constraint functions can arbitrarily differ across downstream agents. Their goal is to choose actions that guarantee themselves no regret while simultaneously guaranteeing that they do not cumulatively violate the constraints across time. We show how to make a single set of predictions so that each of the downstream agents can guarantee this by acting as a simple function of the predictions, guaranteeing each of them $\tilde{O}(\sqrt{T})$ regret and $O(1)$ cumulative constraint violation. We also show how to extend our guarantees to arbitrary intersecting contextually defined \emph{subsequences}, guaranteeing each agent both regret and constraint violation bounds not just marginally, but simultaneously on each subsequence, against a benchmark set of actions simultaneously tailored to each subsequence.
Decentralized data-feed systems enable blockchain-based smart contracts to access off-chain information by aggregating values from multiple oracles. To improve accuracy, these systems typically use an aggregation function, such as majority voting, to consolidate the inputs they receive from oracles and make a decision. Depending on the final decision and the values reported by the oracles, the participating oracles are compensated through shared rewards. However, such incentive mechanisms are vulnerable to mirroring attacks, where a single user controls multiple oracles to bias the decision of the aggregation function and maximize rewards. This paper analyzes the impact of mirroring attacks on the reliability and dependability of majority voting-based data-feed systems. We demonstrate how existing incentive mechanisms can unintentionally encourage rational users to implement such attacks. To address this, we propose a new incentive mechanism that discourages Sybil behavior. We prove that the proposed mechanism leads to a Nash Equilibrium in which each user operates only one oracle. Finally, we discuss the practical implementation of the proposed incentive mechanism and provide numerical examples to demonstrate its effectiveness.
A perfect clone in an ordinal election (i.e., an election where the voters rank the candidates in a strict linear order) is a set of candidates that each voter ranks consecutively. We consider different relaxations of this notion: independent or subelection clones are sets of candidates that only some of the voters recognize as a perfect clone, whereas approximate clones are sets of candidates such that every voter ranks their members close to each other, but not necessarily consecutively. We establish the complexity of identifying such imperfect clones, and of partitioning the candidates into families of imperfect clones. We also study the parameterized complexity of these problems with respect to a set of natural parameters such as the number of voters, the size or the number of imperfect clones we are searching for, or their level of imperfection.
In natural ecosystems and human societies, self-organized resource allocation and policy synergy are ubiquitous and significant. This work focuses on the synergy between Dual Reinforcement Learning Policies in the Minority Game (DRLP-MG) to optimize resource allocation. Our study examines a mixed-structured population with two sub-populations: a Q-subpopulation using Q-learning policy and a C-subpopulation adopting the classical policy. We first identify a synergy effect between these subpopulations. A first-order phase transition occurs as the mixing ratio of the subpopulations changes. Further analysis reveals that the Q-subpopulation consists of two internal synergy clusters (IS-clusters) and a single external synergy cluster (ES-cluster). The former contribute to the internal synergy within the Q-subpopulation through synchronization and anti-synchronization, whereas the latter engages in the inter-subpopulation synergy. Within the ES-cluster, the classical momentum strategy in the financial market manifests and assumes a crucial role in the inter-subpopulation synergy. This particular strategy serves to prevent long-term under-utilization of resources. However, it also triggers trend reversals and leads to a decrease in rewards for those who adopt it. Our research reveals that the frozen effect, in either the C- or Q-subpopulation, is a crucial prerequisite for synergy, consistent with previous studies. We also conduct mathematical analyses on subpopulation synergy effects and the synchronization and anti-synchronization forms of IS-clusters in the Q-subpopulation. Overall, our work comprehensively explores the complex resource-allocation dynamics in DRLP-MG, uncovers multiple synergy mechanisms and their conditions, enriching the theoretical understanding of reinforcement-learning-based resource allocation and offering valuable practical insights
Correlated equilibrium generalizes Nash equilibrium by allowing a central coordinator to guide players' actions through shared recommendations, similar to how routing apps guide drivers. We investigate how a coordinator can learn a correlated equilibrium in convex games where each player minimizes a convex cost function that depends on other players' actions, subject to convex constraints without knowledge of the players' cost functions. We propose a learning framework that learns an approximate correlated equilibrium by actively querying players' regrets, \emph{i.e.}, the cost saved by deviating from the coordinator's recommendations. We first show that a correlated equilibrium in convex games corresponds to a joint action distribution over an infinite joint action space that minimizes all players' regrets. To make the learning problem tractable, we introduce a heuristic that selects finitely many representative joint actions by maximizing their pairwise differences. We then apply Bayesian optimization to learn a probability distribution over the selected joint actions by querying all players' regrets. The learned distribution approximates a correlated equilibrium by minimizing players' regrets. We demonstrate the proposed approach via numerical experiments on multi-user traffic assignment games in a shared transportation network.
Cyber defense operations increasingly require long-term strategic planning under uncertainty and resource constraints. We propose a new use of combinatorial auctions for allocating defensive action bundles in a realistic cyber environment, using host-specific valuations derived from reinforcement learning (RL) Q-values. These Q-values encode long-term expected utility, allowing upstream planning. We train CAFormer, a differentiable Transformer-based auction mechanism, to produce allocations that are approximately incentive-compatible under misreporting. Rather than benchmarking against existing agents, we explore the qualitative and strategic properties of the learned mechanisms. Compared to oracle and heuristic allocations, our method achieves competitive revenue while offering robustness to misreporting. In addition, we find that allocation patterns correlate with adversarial and defensive activity, suggesting implicit alignment with operational priorities. Our results demonstrate the viability of auction-based planning in cyber defense and highlight the interpretability benefits of RL-derived value structures.
There is a broad recognition that commitment-based mechanisms can promote coordination and cooperative behaviours in both biological populations and self-organised multi-agent systems by making individuals' intentions explicit prior to engagement. Yet their effectiveness depends on sustained compliance supported by institutions, especially in one-off interactions. Despite advances in quantitative studies of cooperation and commitment, most applied analyses and policy debates remain largely qualitative, with limited attention to the allocation of scarce institutional resources between enhancing participation and ensuring commitment compliance. Herein, we develop an evolutionary game-theoretic model that explicitly examines the strategic distribution of a limited budget for institutional incentives, namely rewards or punishments, aimed at these two critical objectives within pre-commitment frameworks. Our findings reveal that a reward-based incentive approach consistently yields greater coordination success than a punishment-based approach, with optimal outcomes arising when resources are appropriately distributed between participation promotion and compliance assurance. These findings offer novel insights for designing institutional incentives to promote broad, coordinated adoption of new technologies.
We study the fair allocation of indivisible items to $n$ agents to maximize the utilitarian social welfare, where the fairness criterion is envy-free up to one item and there are only two different utility functions shared by the agents. We present a $2$-approximation algorithm when the two utility functions are normalized, improving the previous best ratio of $16 \sqrt{n}$ shown for general normalized utility functions; thus this constant ratio approximation algorithm confirms the APX-completeness in this special case previously shown APX-hard. When there are only three agents, i.e., $n = 3$, the previous best ratio is $3$ shown for general utility functions, and we present an improved and tight $\frac 53$-approximation algorithm when the two utility functions are normalized, and a best possible and tight $2$-approximation algorithm when the two utility functions are unnormalized.
We initiate the study of mechanism design with outliers, where the designer can discard $z$ agents from the social cost objective. This setting is particularly relevant when some agents exhibit extreme or atypical preferences. As a natural case study, we consider facility location on the line: $n$ strategic agents report their preferred locations, and a mechanism places a facility to minimize a social cost function. In our setting, the $z$ agents farthest from the chosen facility are excluded from the social cost. While it may seem intuitive that discarding outliers improves efficiency, our results reveal that the opposite can hold. We derive tight bounds for deterministic strategyproof mechanisms under the two most-studied objectives: utilitarian and egalitarian social cost. Our results offer a comprehensive view of the impact of outliers. We first show that when $z \ge n/2$, no strategyproof mechanism can achieve a bounded approximation for either objective. For egalitarian cost, selecting the $(z + 1)$-th order statistic is strategyproof and 2-approximate. In fact, we show that this is best possible by providing a matching lower bound. Notably, this lower bound of 2 persists even when the mechanism has access to a prediction of the optimal location, in stark contrast to the setting without outliers. For utilitarian cost, we show that strategyproof mechanisms cannot effectively exploit outliers, leading to the counterintuitive outcome that approximation guarantees worsen as the number of outliers increases. However, in this case, access to a prediction allows us to design a strategyproof mechanism achieving the best possible trade-off between consistency and robustness. Finally, we also establish lower bounds for randomized mechanisms that are truthful in expectation.
We consider a setting where we have a ground set $M$ together with real-valued set functions $f_1, \dots, f_n$, and the goal is to partition $M$ into two sets $S_1,S_2$ such that $|f_i(S_1) - f_i(S_2)|$ is small for every $i$. Many results in discrepancy theory can be stated in this form with the functions $f_i$ being additive. In this work, we initiate the study of the unstructured case where $f_i$ is not assumed to be additive. We show that even without the additivity assumption, the upper bound remains at most $O(\sqrt{n \log n})$. Our result has implications on the fair allocation of indivisible goods. In particular, we show that a consensus halving up to $O(\sqrt{n \log n})$ goods always exists for $n$ agents with monotone utilities. Previously, only an $O(n)$ bound was known for this setting.
We study a Bayesian persuasion setting in which a sender wants to persuade a critical mass of receivers by revealing partial information about the state to them. The homogeneous binary-action receivers are located on a communication network, and each observes the private messages sent to them and their immediate neighbors. We examine how the sender's expected utility varies with increased communication among receivers. We show that for general families of networks, extending the network can strictly benefit the sender. Thus, the sender's gain from persuasion is not monotonic in network density. Moreover, many network extensions can achieve the upper bound on the sender's expected utility among all networks, which corresponds to the payoff in an empty network. This is the case in networks reflecting a clear informational hierarchy (e.g., in global corporations), as well as in decentralized networks in which information originates from multiple sources (e.g., influencers in social media). Finally, we show that a slight modification to the structure of some of these networks precludes the possibility of such beneficial extensions. Overall, our results caution against presuming that more communication necessarily leads to better collective outcomes.
Coordination tasks traditionally performed by humans are increasingly being delegated to autonomous agents. As this pattern progresses, it becomes critical to evaluate not only these agents' performance but also the processes through which they negotiate in dynamic, multi-agent environments. Furthermore, different agents exhibit distinct advantages: traditional statistical agents, such as Bayesian models, may excel under well-specified conditions, whereas large language models (LLMs) can generalize across contexts. In this work, we compare humans (N = 216), LLMs (GPT-4o, Gemini 1.5 Pro), and Bayesian agents in a dynamic negotiation setting that enables direct, identical-condition comparisons across populations, capturing both outcomes and behavioral dynamics. Bayesian agents extract the highest surplus through aggressive optimization, at the cost of frequent trade rejections. Humans and LLMs can achieve similar overall surplus, but through distinct behaviors: LLMs favor conservative, concessionary trades with few rejections, while humans employ more strategic, risk-taking, and fairness-oriented behaviors. Thus, we find that performance parity -- a common benchmark in agent evaluation -- can conceal fundamental differences in process and alignment, which are critical for practical deployment in real-world coordination tasks.
Cyber warfare has become a central element of modern conflict, especially within multi-domain operations. As both a distinct and critical domain, cyber warfare requires integrating defensive and offensive technologies into coherent strategies. While prior research has emphasized isolated tactics or fragmented technologies, a holistic understanding is essential for effective resource deployment and risk mitigation. Game theory offers a unifying framework for this purpose. It not only models attacker-defender interactions but also provides quantitative tools for equilibrium analysis, risk assessment, and strategic reasoning. Integrated with modern AI techniques, game-theoretic models enable the design and optimization of strategies across multiple levels of cyber warfare, from policy and strategy to operations, tactics, and technical implementations. These models capture the paradoxical logic of conflict, where more resources do not always translate into greater advantage, and where nonlinear dynamics govern outcomes. To illustrate the approach, this chapter examines RedCyber, a synthetic cyber conflict, demonstrating how game-theoretic methods capture the interdependencies of cyber operations. The chapter concludes with directions for future research on resilience, cros-echelon planning, and the evolving role of AI in cyber warfare.
In this paper, we consider finite-strategy approximations of infinite-strategy evolutionary games. We prove that such approximations converge to the true dynamics over finite-time intervals, under mild regularity conditions which are satisfied by classical examples, e.g., the replicator dynamics. We identify and formalize novel characteristics in evolutionary games: choice mobility, and its complement choice paralysis. Choice mobility is shown to be a key sufficient condition for the long-time limiting behavior of finite-strategy approximations to coincide with that of the true infinite-strategy game. An illustrative example is constructed to showcase how choice paralysis may lead to the infinite-strategy game getting "stuck," even though every finite approximation converges to equilibrium.
Budget aggregation deals with the social choice problem of distributing an exogenously given budget among a set of public projects, given agents' preferences. Taking a game-theoretic perspective, we initialize the study of budget-aggregation games where each agent has virtual decision power over some fraction of the budget. This paper investigates the structure and shows efficient computability of Nash equilibria in this setting for various preference models. In particular, we show that Nash equilibria for Leontief utilities can be found in polynomial time, solving an open problem from Brandt et al. [2023].
The increasing reliance on cyber physical infrastructure in modern power systems has amplified the risk of targeted cyber attacks, necessitating robust and adaptive resilience strategies. This paper presents a mathematically rigorous game theoretic framework to evaluate and enhance microgrid resilience using a combination of quantitative resilience metrics Load Served Ratio LSR, Critical Load Resilience CLR, Topological Survivability Score TSS, and DER Resilience Score DRS. These are integrated into a unified payoff matrix using the Analytic Hierarchy Process AHP to assess attack defense interactions. The framework is formalized as a finite horizon Markov Decision Process MDP with formal convergence guarantees and computational complexity bounds. Three case studies are developed 1. static attacks analyzed via Nash equilibrium, 2. severe attacks incorporating high impact strategies, and 3. adaptive attacks using Stackelberg games, regret matching, softmax heuristics, and Multi Agent Q Learning. Rigorous theoretical analysis provides convergence proofs with explicit rates , PAC learning sample complexity bounds, and computational complexity analysis. The framework is tested on an enhanced IEEE 33bus distribution system with DERs and control switches, demonstrating the effectiveness of adaptive and strategic defenses in improving cyber physical resilience with statistically significant improvements of 18.7% 2.1% over static approaches.
In this work, we propose the first fully first-order method to compute an epsilon stationary Stackelberg equilibrium with convergence guarantees. To achieve this, we first reframe the leader follower interaction as single level constrained optimization. Second, we define the Lagrangian and show that it can approximate the leaders gradient in response to the equilibrium reached by followers with only first-order gradient evaluations. These findings suggest a fully first order algorithm that alternates between (i) approximating followers best responses through gradient descent and (ii) updating the leaders strategy via approximating the gradient using Lagrangian.