Browse, search, and filter preprints from arXiv—fast, readable, and built for curious security folks.
As autonomous LLM agents increasingly hold real credentials and operate infrastructure without a human in the loop, operators have no standard way to tell an agent that a resource is off-limits. Access controls either let the agent in (it has valid credentials) or hard-fail it (indistinguishable from any other client). We propose a third mode: a lightweight, published in-band deny signal -- the Recuse Signal -- that a server emits over a protocol's existing channels (an SSH banner, a PostgreSQL NOTICE) asking a connecting automated agent to voluntarily withdraw. This is a cooperative governance control, the robots.txt analogue for live access; it is explicitly not a security boundary. Its value is entirely empirical and, to our knowledge, unmeasured: do compliant LLM agents actually honor such a signal? We define the signal as an open mini-standard, implement two zero- or low-footprint adapters (an SSH banner/PAM hook and a PostgreSQL wire-protocol proxy), deploy them on a live production host, and run a controlled experiment in which fresh agents are given a benign operations task and observed for recusal. In a pilot (SSH; OpenAI GPT-4o and GPT-4o-mini; and Claude Code as a deployed agent), the signal cleanly induces recusal -- 100% recusal when present versus 100% task completion in a no-signal control -- and, revealingly, behaves as a cooperative rather than absolute signal: an explicit operator-authorization framing flips the most capable model to proceed, while other agents continue to defer to the on-host policy. We release the standard, adapters, and experiment harness for reproduction.
WebMCP is a newly emerging protocol that enables websites to expose tools directly to AI agents, bypassing traditional user interfaces and introducing new security risks. The dynamic exposure of agent-accessible tools in WebMCP expands the attack surface of web sessions, especially when third-party scripts are involved. In this study, we identify a new potential threat, termed Mid-Session Tool Injection (MSTI), in which attackers leverage third-party scripts to inject malicious tools during an active session. To better characterize this threat, we classify MSTI based on the stage and target of manipulation, distinguishing between Tool Hijacking and Tool Framing. Tool Hijacking modifies the set of tools visible to the agent through mechanisms such as the AbortSignal API or race conditions during tool registration. In contrast, Tool Framing influences the agent's perception of tool roles through metadata fields such as tool name, description, readOnlyHint, and inputSchema. Our implementation demonstrates that both Tool Hijacking and Tool Framing can successfully disrupt the intended functionality of WebMCP. Based on these results, we outline potential mitigation directions and provide security design recommendations for WebMCP, including binding tool identity to its origin, ensuring lifecycle consistency, enforcing data boundaries for third-party tools, and maintaining traceable logs of tool registration and invocation. These findings indicate that MSTI arises from WebMCP's unique tool lifecycle and structured metadata, making the tool surface itself an emerging security concern.
The European Union will introduce the EUDI Wallet by late 2026, which allows users to hold digital credentials (i.e., representations of physical official identity documents) on their devices. This will allow users to securely and privately disclose identity attributes to websites. Although such a system has many benefits, it also introduces risks caused by poor credential disclosure decisions. In this paper, we (i) conduct a large-scale survey on credential disclosure with users and experts and (ii) evaluate the effectiveness and feasibility of our Credential Assistant that displays expert recommendations and user opinions. Our results show that users are likely to overshare (e.g., ~20% of users disclosed their official ID to news websites). This indicates that users struggle to protect their privacy, which will impact the usability of the EUDI Wallet and lead to privacy violations, identity theft, and other abuses of leaked credentials. Finally, we show that our Credential Assistant significantly reduces users' credential disclosure mistakes from ~15% to ~7%. However, it does not fully eliminate poor credential disclosure decisions, indicating that stronger interventions may be necessary, especially for sensitive attributes.
Evasion attacks present a significant challenge to the robustness of machine learning (ML)-based classifiers, particularly in critical applications such as fraud detection and cybersecurity. Although existing defense mechanisms are effective in some settings, they often suffer from limited generalizability and do not systematically improve model robustness across diverse attack scenarios. To address these limitations, we introduce Robust Ensemble of Selectively Strengthened and Augmented Predictors (RESSAP), a novel framework that transforms a single classifier into an ensemble of robust classifiers. Each classifier in the ensemble is trained on a carefully selected subset of features, where feature selection is guided by a resilience metric that accounts for both feature importance and robustness. During inference, a random subset of these classifiers is used to make predictions, increasing unpredictability and improving resistance to adversarial manipulation. In addition, noise-based data augmentation is applied during training to strengthen decision boundaries and improve generalization. Our experimental results demonstrate that RESSAP significantly improves robustness against adversarial evasion attacks while maintaining strong accuracy on clean data. Overall, this model-agnostic framework provides a scalable and flexible defense strategy for enhancing the security of machine learning systems without requiring major changes to existing architectures.
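The randomized inference step described in the RESSAP abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the threshold classifiers, the feature subsets, the majority vote, and the names `make_threshold_classifier` and `randomized_ensemble_predict` are all assumptions chosen for illustration, and the abstract does not say how the sampled predictions are combined.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Classifier = Callable[[Sequence[float]], int]


def make_threshold_classifier(feature_idx: Sequence[int], threshold: float) -> Classifier:
    """Toy stand-in for a classifier trained on one feature subset (hypothetical)."""
    def predict(x: Sequence[float]) -> int:
        # Average only the features this ensemble member is allowed to see.
        score = sum(x[i] for i in feature_idx) / len(feature_idx)
        return int(score > threshold)
    return predict


def randomized_ensemble_predict(
    ensemble: List[Classifier], x: Sequence[float], k: int, rng: random.Random
) -> int:
    """Sample k members at random and return their majority vote (assumed rule)."""
    chosen = rng.sample(ensemble, k)
    votes = Counter(clf(x) for clf in chosen)
    return votes.most_common(1)[0][0]


# Five members, each restricted to a different (hypothetical) feature subset.
ensemble = [
    make_threshold_classifier([0, 1], 0.5),
    make_threshold_classifier([1, 2], 0.5),
    make_threshold_classifier([2, 3], 0.5),
    make_threshold_classifier([0, 3], 0.5),
    make_threshold_classifier([0, 2], 0.5),
]

rng = random.Random(0)
high = randomized_ensemble_predict(ensemble, [0.9, 0.8, 0.95, 0.85], k=3, rng=rng)
low = randomized_ensemble_predict(ensemble, [0.1, 0.2, 0.05, 0.15], k=3, rng=rng)
```

Because the subset of members changes on every call, an attacker cannot know in advance which feature subsets will decide a given prediction; an odd `k` avoids ties in this binary toy setting.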
Large code language models (CodeLLMs) can generate and rewrite programs, enabling functionality-preserving code mutation that may be used to create diverse malware variants and evade signature-based detection. A key security question is whether this mutation capability survives model compression, which would make deployment feasible under limited hardware budgets. We propose SecRL-Prune, a structured pruning framework for CodeLLMs that operates on feed-forward (MLP/FFN) channels. Starting from a pretrained teacher, it learns a layer-wise pruning policy with reinforcement learning using a teacher-student KL-divergence reward. To improve efficiency, we cache the teacher's top-P predictions once and compare the pruned student against this compact target, avoiding simultaneous teacher-student residency in GPU memory. We evaluate SecRL-Prune on HumanEval using pass@k for execution correctness and var@k for code diversity across three 7B CodeLLMs at 10-30% compression. SecRL-Prune consistently preserves higher pass@k and var@k than recent structured pruning baselines under aggressive pruning. In a case study on real malware samples, semantics-preserving mutations from 20%-pruned models substantially reduced detections. These results show that code mutation capability can survive significant structured pruning, highlighting the security relevance of compressed CodeLLMs.
As chatbots increasingly influence daily decision-making, their potential to produce misleading responses poses substantial risks to users. This paper investigates a critical cognitive vulnerability in LLMs: their tendency to uncritically trust external context when presented with fabricated evidence bearing markers of credibility. We introduce Ghostwriter, a two-phase attack framework that first repackages misleading statements with fabricated rationales, then instructs target LLMs to incorporate these viewpoints when responding to relevant queries. Experiments on BBQ, ToxiGen, and our specialized dataset reveal that commercial LLMs without external safety classifiers remain highly vulnerable, while even frontier classifier-guarded models (e.g., GPT-5.4) reduce but do not eliminate the attack. Building on this, we explore multiple defense strategies, among which a tailored safety policy enables gpt-oss-safeguard to achieve an 81% detection rate.
An estimated 5.3 billion mobile phones became electronic waste in 2022. Many of these devices can be repurposed and used in different contexts to extend their lifetime and to reduce ecological impacts. An often overlooked aspect of smartphone reuse is cybersecurity: these devices embed hardware-backed security mechanisms that rely on vendor-controlled provisioning and are designed for a fixed device lifecycle. In this paper, we investigate whether security mechanisms and guarantees remain effective when devices are repurposed outside their original ecosystem. We explore security features in a PinePhone, an open-hardware smartphone, and focus on three core security aspects: boot chain integrity, isolation provided by the Trusted Execution Environment, and the protection of hardware-bound secrets. Our experiments simulate realistic repurposing scenarios and highlight the complexity of reconstructing trust anchors. We generalize our observations to infer requirements for secure repurposing and illustrate how vendor-locked mechanisms hinder the repurposing of a majority of discarded devices.
Image safety classifiers serve as a critical component of contemporary content moderation systems on the internet. However, their resilience against user-style malicious image editing remains underexplored. Such behaviors are highly prevalent in daily scenarios but difficult to fully reproduce. To explore this vulnerability, we introduce RedEdit, a novel black-box red-teaming agent that formulates photo-editing evasion as a combinatorial search problem over edit-tool sequences. It adopts a Vision-Language-Model (VLM)-based proposer to generate semantically targeted candidate edits and a Monte Carlo Tree Search (MCTS) planner to prioritize promising edit paths while backtracking from ineffective ones. Together, the proposer and planner instantiate two key capabilities of human attackers, i.e., domain knowledge and iterative backtracking, respectively, to reproduce this practical threat. Our extensive experiments on UnsafeBench reveal profound systemic vulnerabilities: fewer than two edits on average enable 76.2% of unsafe images to evade detectors, while retaining 93.0% malicious semantics, meaning that such manipulated content remains perceptually malicious to humans while easily bypassing automated moderation. We therefore appeal to the community for more attention to this overlooked practical threat.
Cheating poses a significant threat to the Multiplayer Online Games (MOG) industry by degrading player satisfaction and undermining fairness in competitive gaming. Despite efforts to develop mitigation techniques, cheating remains difficult to detect and prevent in practice. In particular, a class of cheats based on network flow disruption remains unsolved. Learning to detect such attacks requires representative labelled data, but no such dataset exists. To address this gap, we leverage an experimental framework that combines a multiplayer online game with a plug-in capable of both reproducing cheating attacks and collecting logs at two levels: the network layer and the application layer. This paper presents a dataset compiling records of game sessions played by both real players and automated game clients, with cheating actions explicitly logged. To the best of our knowledge, this is the first dataset that provides logs of network flow disruption cheats. It is not limited to such network-based cheats and also contains records of more commonly studied cheats, such as aimbots and wallhacks. This dataset can be used by researchers in academia and industry seeking to develop cheating detection mechanisms for online games. Furthermore, it is designed to be extensible: others can enrich it by creating their own data traces with the proposed framework.
Existing learning-based detectors for Solidity smart contracts reduce vulnerability detection to syntactic pattern matching within single functions, yet many of the most consequential exploits (The DAO, Cream Finance) exist not in any individual function but in the relationship between functions and in the combination of conditions that made the attack feasible. Thus, we propose AttackPathGNN, a graph neural network (GNN) that reframes detection as reasoning over explicit attack paths. Two architectural choices distinguish it from prior GNN-based detectors: (1) a State Interference Graph that links every pair of functions sharing mutable storage through typed, weighted edges and through directed reentrancy-path edges defined by an explicit five-condition predicate; (2) conjunction pooling, a differentiable AND-aggregator over eight named exploit preconditions whose log-sigmoid form causes the per-function exploit score to collapse whenever any single mitigation (a reentrancy guard, an access-control modifier, or SafeMath) is in place. Across five independent training runs, AttackPathGNN attains 92.3 ± 0.2% F1 on the SmartBugs Wild held-out test partition (4.3 ± 0.3% false-negative rate, 90.8 ± 2.5% detection rate on the independently human-labelled SmartBugs Curated benchmark), recovering 6/10 DASP10 categories at 100% on every seed and Reentrancy at 98.7 ± 1.8%. Each prediction is emitted with a structured remediation report, turning each verdict into an actionable, function-level audit finding.
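The conjunction-pooling idea in the AttackPathGNN abstract (a differentiable AND over precondition scores in log-sigmoid form) can be sketched as follows. This is a generic soft-AND under stated assumptions, not the paper's layer: the function names, the logit value of 6.0, and the use of plain floats instead of tensors are illustrative choices.

```python
import math
from typing import Sequence


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def conjunction_pool(precondition_logits: Sequence[float]) -> float:
    """Soft AND: the product of sigmoids, computed as exp of summed log-sigmoids.

    The score is close to 1 only if every precondition logit is strongly
    positive; a single strongly negative logit drives it toward 0.
    """
    return math.exp(sum(log_sigmoid(z) for z in precondition_logits))


# Eight preconditions, all confidently satisfied (hypothetical logits).
all_met = conjunction_pool([6.0] * 8)

# Same function, but one precondition is blocked by a mitigation.
one_mitigated = conjunction_pool([6.0] * 7 + [-6.0])
```

Summing log-sigmoids rather than multiplying probabilities keeps the computation stable and differentiable, which is presumably why a log-sigmoid form is used; in this sketch one mitigated precondition lowers the score by more than two orders of magnitude.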
Malware research primarily studies the results, the methods, and the impact. Even from an offensive security perspective, what is examined is the method, not the development strategy of the offender. This study investigates the behavioral signatures and coding patterns embedded in malware source code. By analyzing a large corpus of leaked malware code and comparing it with carefully selected benign open-source software, we apply static application security testing and compute multiple software metrics. Based on cognitive psychology and criminological theories, our work interprets differences in code structure and quality as behavioral indicators, reflecting distinct motivational structures, risk tolerances, and development strategies of malware authors compared to benign software developers. Our findings reveal that malware code is generally smaller, less documented, and exhibits higher cyclomatic complexity per function, with reduced use of abstraction mechanisms such as classes and closures. Vulnerability analysis further reveals that malware exhibits more issues of the types that benign code typically avoids, suggesting a minimal investment in secure development practices. These patterns imply a development style optimized for expedience, operational secrecy, and evasion rather than long-term maintainability. Nonetheless, the code quality metrics do not deviate from those of benign software enough to be distinctive. By framing code metrics as proxies for behavioral signals and strategic choices, we demonstrate how quantitative software analysis can enrich behavioral cybersecurity research, offering new insights into the practices and priorities of malware developers. Our results pave the way for further research in the behavioral profiling of cyber offenders.
Service discovery is a fundamental process in wireless networks, enabling devices to find and communicate with services dynamically, and is critical for the seamless operation of modern systems like 5G and IoT. This paper introduces PriSrv+, an advanced privacy and usability-enhanced service discovery protocol for modern wireless networks and resource-constrained environments. PriSrv+ builds upon PriSrv (NDSS'24) by addressing critical limitations in expressiveness, privacy, scalability, and efficiency, while maintaining compatibility with widely used wireless protocols such as mDNS, BLE, and Wi-Fi. A key innovation in PriSrv+ is the development of Fast and Expressive Matchmaking Encryption (FEME), the first matchmaking encryption scheme capable of supporting expressive access control policies with an unbounded attribute universe, allowing any arbitrary string to be used as an attribute. FEME significantly enhances the flexibility of service discovery while ensuring robust message and attribute privacy. Compared to PriSrv, PriSrv+ optimizes cryptographic operations, achieving 7.62× faster encryption and 6.23× faster decryption, and reduces ciphertext sizes by 87.33%. In addition, PriSrv+ reduces communication costs by 87.33% for service broadcast and 86.64% for anonymous mutual authentication compared with PriSrv. Formal security proofs confirm the security of FEME and PriSrv+. Extensive evaluations on multiple platforms demonstrate that PriSrv+ achieves superior performance, scalability, and efficiency compared to existing state-of-the-art protocols.
Rule-based Intrusion Detection and Prevention Systems (IDPS) offer precise attack detection as well as mitigation; however, their manually crafted, signature-driven rules limit adaptability to emerging and zero-day threats. Additionally, existing public datasets (e.g., CICIDS2017, UNSW-NB15) focus on traffic classification and provide little structured information to support automatic rule synthesis or prevention logic. To address this gap, we propose Generative Thread Intelligence (GenTI), an LLM-driven benchmark for automatic generation of IDPS rules targeting unseen attacks (GenTI refers to the proposed framework, and GTI refers to the dataset). The dataset (GTI) aggregates over 150k detection and prevention rules from Snort, Suricata, and Emerging Threats, as well as 50k YARA rules, each annotated with protocol behavior, payload signatures, contextual relationships, mappings to Cyber Threat Intelligence (CTI), and actionable response types (alert, drop, reject). On top of this corpus, we design an LLM-based pipeline that transforms analyst prompts and representative payloads into deployable rules via structured prompt engineering, Chain-of-Thought (CoT) reasoning, and a Chain-of-Verification (CoVe) loop for syntactic, semantic, and security validation. The generated rules are executed in real time on Snort/Suricata and evaluated by syntax accuracy, semantic similarity, CTI coverage, security effectiveness, and unseen-attack detection. Our GenTI instantiation achieves a composite rule-quality score of 89.4%, with 94.8% CTI coverage, improving unseen-attack detection from 45% to 87.4% and reducing the false-positive rate from 8.5% to 2.3%. Overall, GenTI establishes the first large-scale benchmark that tightly couples rule-level CTI with LLM-based automation, enabling adaptive, self-evolving IDPS.
The hardness of the Learning Parity with Noise (LPN) problem is a foundational assumption in cryptography, forming the basis of constructions ranging from symmetric-key primitives to public-key encryption and beyond. A central open question is whether the average-case hardness of LPN can be based on worst-case complexity assumptions, as has been achieved for the analogous Learning With Errors (LWE) problem. Existing worst-case-to-average-case reductions for LPN [BLVW19, YZ21] rely on statistical smoothing of linear codes, which inherently limits the resulting average-case hardness to noise rates as large as $1/2 - 1/\mathrm{poly}(n)$, which is insufficient for public-key applications. We explore a new approach towards obtaining such reductions: rather than requiring that random sparse combinations of the rows of the generator matrix of a code be statistically close to uniform, we only require that they be computationally indistinguishable from uniform. This leads to a clean win-win structure: we show that any efficient LPN solver can be transformed into a pair of efficient algorithms $(S, D)$ such that for every matrix $A$ of appropriate dimensions over $\mathbb{F}_2$, either $S$ decodes the code generated by $A$ from random noise, or $D$ distinguishes random noisy codewords of the dual of this code from uniform. By instantiating this reduction with appropriate parameters, we obtain the average-case hardness of LPN with inverse-polynomial noise rate $n^{-α}$ for any constant $α< 1$, assuming the worst-case simultaneous hardness of decoding a code from random noise and distinguishing random noisy codewords of its dual from uniform. In particular, setting $α= 1/2$, our reduction yields LPN hardness in the parameter regime required for Alekhnovich's construction of public-key encryption [Ale03], a regime that was previously inaccessible via worst-case reductions.
Service discovery is essential in wireless communications. However, existing protocols provide limited privacy protection, leaking sensitive device information and opening routes to network attacks. This paper proposes a private service discovery protocol, called PriSrv, which enables both service providers and clients to specify fine-grained authentication policies before establishing connections. PriSrv achieves this via a dual-layer matching architecture: an outer layer filters mismatched entities using public attributes, while an inner layer handles mutual authentication using selectively disclosed private attributes. As a core component, we introduce the primitive of anonymous credential-based matchmaking encryption (ACME), which enables dual-layer matching in a single step to achieve bilateral policy control, selective attribute disclosure, and multi-show unlinkability. To instantiate ACME, we design a fast anonymous credential (FAC) scheme providing constant-size credentials and efficient verification. We demonstrate PriSrv's interoperability by integrating it with popular wireless frameworks including EAP, mDNS, BLE, and AirDrop. Detailed formal security proofs and extensive performance evaluations across desktop, laptop, smartphone, and Raspberry Pi platforms demonstrate that PriSrv provides enhanced privacy guarantees with high usability, achieving secure discovery in less than one second on mainstream mobile devices.
We verify the 'extendedGCD' implementation in Go's standard library ('crypto/internal/fips140/bigmod'), which plays a crucial role in the generation of RSA key pairs. Even though the Go implementation is supposedly a direct port of BoringSSL's implementation, we uncovered two deviations that each break the algorithm's invariants: (1) the Go implementation deviates in the way coefficients are updated, and (2) it permits a larger input domain. We address both deviations: the first by fixing the Go implementation, which yields a 24% speedup on average, and the second by porting an existing proof for BoringSSL and extending it to cover the larger input domain. We prove correctness and termination of the fixed Go implementation using Gobra, a deductive program verifier for Go. Where necessary, we used Lean to prove key lemmata on non-linear arithmetic, which we import into Gobra. Our verification effort reveals three key insights: subtle bugs can slip into even well-reviewed code with surprising ease; formal verification is a powerful tool for uncovering them; and AI agents can facilitate the verification process by iteratively refining invariants and lemmata based on Gobra's error messages.
Protecting proprietary RAG databases from unauthorized redistribution is challenging: existing watermarking methods either inject fabricated relations between real entities, polluting the knowledge base with misinformation, or embed fragile lexical patterns that adversarial paraphrasing easily removes. We propose SentinelRAG, a watermarking framework that embeds style-consistent but fictitious knowledge entries into the RAG database. Our key insight is that synthetic knowledge describing fictitious entities is unlikely to be retrieved by legitimate queries, yet can be reliably triggered through targeted probes known only to the data owner. Experiments on four datasets ranging from 2.9k to 8.8M documents demonstrate that SentinelRAG achieves statistically significant detection ($p < 10^{-5}$) across all tested configurations at only a 0.1% injection rate. Compared to the state of the art, our method significantly reduces the false detection rate while maintaining negligible interference with legitimate user queries.
Autonomous spacecraft require rapid, lightweight, and reliable onboard detection of cyber-RF threats. Using the SPARTA attack model, we analyze the latency-accuracy trade-offs of TinyML-compatible classical models -- Random Forest, Logistic Regression, SVM, and MLP -- for detecting uplink jamming, Fake-NR spoofing, payload manipulation, ground-segment compromise, and unauthorized command injection. We present a physics-informed theoretical analysis of each model's computational complexity, VC dimension, Lipschitz continuity, and latency scaling, supported by empirical measurements on adversarial RF spectrograms generated via BandErasure, FakeNR, and NoiseBurst corruption modes. Results show that Logistic Regression achieves microsecond-level inference with only a 1% accuracy drop relative to Random Forest, making it an effective TinyML baseline for onboard autonomy. The study also identifies opportunities for advancing spacecraft cybersecurity through richer feature encoders and multi-timescale learning architectures, building on recent progress in edge intelligence and trustworthy AI.