Loading...
Loading...
Browse, search, and filter preprints from arXiv—fast, readable, and built for curious security folks.
Showing 18 loaded of 48,585—scroll for more
This paper addresses the fuzzy shortest path problem in directed graphs, where edge costs are modeled as generalized fuzzy numbers with Gaussian membership functions. We interpret height as an indicator of information reliability. Based on this view, we introduce a weighted geometric mean to aggregate heights during the addition of generalized Gaussian fuzzy numbers. We employ a reliability-aware ranking that jointly considers the core, height, and standard deviation of fuzzy edge costs to determine the shortest path, thereby capturing their central tendency, reliability, and variability while keeping Dijkstra-level complexity per relaxation. The method yields routes that are not only cost-efficient but also supported by highly reliable information. To assess robustness, we construct a crisp baseline from the ranking and conduct Monte Carlo alpha-cut sampling--drawing membership levels uniformly and then sampling within the induced intervals--to recompute path costs and quantify sensitivity via the mean percentage deviation and its standard deviation. Finally, a large-scale case study on the FAA air traffic network demonstrates that the proposed GGFN--SPP framework scales efficiently to real-world networks, balances cost and reliability through $α$--cut aggregation and risk-aware ranking, and exhibits stable performance under Monte Carlo simulations with subnormal fuzzy costs.
Modern intrusion detection systems generate thousands of alerts daily, but alert fatigue severely limits security operations effectiveness due to too many false positives or low-impact events. We address this by proposing a principled framework for alert prioritization based on subnormal Gaussian fuzzy numbers, explicitly modeling three sources of uncertainty: threat severity, detection confidence, and organizational risk attitude. Each alert is represented as a fuzzy number with the core indicating severity, spread indicating uncertainty, and height reflecting detection reliability. We apply ranking indices to prioritize alerts, allowing organizations to tune security posture through a risk-attitude parameter. Experimental validation on CIC-IDS2017 and NSL-KDD demonstrates greater robustness than baselines under detector degradation (0.9963 vs 0.8215 NDCGrel@100), with distinct differentiation in mid-confidence alerts and near-parity with baselines under robust detectors. The framework is theoretically grounded, computationally efficient, provides interpretable reasoning, and remains robust across detector families and miscalibration scenarios.
Machine learning systems face diverse threats that undermine robustness, privacy, and fairness. Although many defenses have been proposed, each typically addresses a single risk in isolation. Real-world deployments, however, require these defenses to be composed to meet multiple guarantees simultaneously. The process of composing defenses is complex and not well understood, and its impact on performance and security remains unclear. We present Landseer, a modular framework for integrating machine learning (ML) defenses into the ML lifecycle and systematically evaluating their composition. Landseer encapsulates defenses as containerized modules, allowing existing and new techniques to be plugged in with minimal effort. Its evaluation engine automates experiments across multiple metrics, supporting the study of defenses both individually and in combination. In a preliminary study, we identified 35 state-of-the-art machine learning defenses. After filtering for reproducibility, we analyzed their performance using Landseer's unified evaluation process. Our findings reveal gaps in replicability across defense families and provide insights into the challenges and opportunities in integrating multiple defenses, establishing a foundation for improving the reliability of machine learning systems.
With the rapid proliferation of generative models, such as diffusion models, digital watermarking has emerged as a crucial solution for identifying AI-generated images. Modern post-hoc watermarking schemes use neural networks to achieve an extremely low false-alarm rate while remaining robust to common image transformations. However, there is a lack of comparison between these modern methods and classic ones, particularly in real-world scenarios where robustness and security take precedence over achieving an extremely low false-alarm probability. In this paper, we propose a fair comparison of robustness and security between modern and classic post-hoc watermarking across various types of classic augmentations and recent sophisticated attacks. Our experiments show that, in a realistic scenario, classic watermarking outperforms modern techniques in terms of security while maintaining robustness.
In this work, we propose BAIT (Boundary-Aware Iterative Trap), a three-step jailbreak framework that approaches malicious goals through internal disclosure. BAIT first asks the model to identify the protection boundary, then requires it to refine that boundary, and finally requests a detailed example. By expanding each step upon the model's previous responses, BAIT turns the model's own reasoning and consistency tendency into a disclosure pathway. Experiments on AdvBench, JailbreakBench, AIR-Bench, and SORRY-Bench demonstrate that BAIT consistently achieves strong attack success rates across top-tier large language models, significantly advancing conventional jailbreak baselines. Further analysis reveals that: 1) prevention-oriented framing significantly outperforms direct knowledge request; 2) the refinement step plays a critical role in disclosure escalation; and 3) the first two steps have a certain chance of eliciting harmful content while triggering little filtering.
Counterfactual tuning (CFT) has emerged as a promising paradigm for Large Language Model (LLM) unlearning by training models to generate alternative fictitious knowledge in place of undesired content. However, in this work, we find that this paradigm still underperforms other paradigms in some aspects, and identify two previously overlooked pitfalls underlying this gap: (1) knowledge conflict, where mutual inconsistencies within counterfactual corpora induce conflicting gradients that disrupt parameter optimization, and (2) hallucination spillover, where fitting false targets instills a persistent fabrication bias, inflating hallucination rates on unrelated domains. To systematically diagnose these issues, we introduce RWKU+, an extended benchmark equipped with novel trade-off metrics and gradient-level diagnostic tools. Our work further discusses the limitations and overhead of the paradigm, aiming to provide insights and actionable guidance for more rigorous LLM unlearning research.
As AI systems gain increasing autonomy and execution capability, the number of discovered security vulnerabilities continues to rise. However, many of these vulnerabilities are not fundamentally novel, but instead reflect recurring classes of weaknesses long observed in prior computing systems. Execution-capable AI agents are effectively unbounded, self-modifying programs that interact extensively with multiple layers of the computing stack. This broad interaction surface imposes a significant security burden on developers, who must reason about and secure complex cross-layer behaviors. Prior research has primarily focused on vulnerabilities in open-source agents and agent frameworks. In contrast, it remains unclear whether proprietary agent systems -- developed under stricter coding standards and formal review processes -- exhibit similar security weaknesses. In this paper, we present findings from two penetration tests conducted in 2025 against proprietary agent products and evaluate whether the security posture of AI agents has improved since these assessments.
Prompt injection poses a critical threat to the safe deployment of large language models, yet existing detection approaches are typically evaluated under limited settings that do not reflect real-world operating constraints. In this work, we present a deployment-aware evaluation of prompt injection detection using a multi-model and multi-regime experimental framework. We compare lexical, semantic, structural, and transformer-based detectors across multiple out-of-distribution settings, repeated data splits, and both ranking and thresholded deployment metrics. We introduce interpretable structural signals that capture hierarchy overrides, system prompt spoofing, role redefinition, and evasion patterns, and assess their contribution both within sparse models and in combination with strong encoder baselines. Our results show that detection performance is highly regime-dependent and sensitive to threshold selection, with no single model dominating across all settings. Transformer-based models achieve the strongest overall performance, while structural signals provide modest but consistent gains in certain regimes and improve low false positive rate behaviour in harder scenarios. These findings highlight the gap between ranking performance and deployment effectiveness and underscore the importance of evaluating prompt injection defences under realistic operational constraints. Code will be released.
The Resource Public Key Infrastructure (RPKI) secures the Internet's routing system by defining a complex trust and validation framework for certificates, Route Origin Authorizations (ROAs), manifests, and Certificate Revocation Lists (CRLs). These mechanisms are specified across dozens of RFCs. This paper presents the first comprehensive analysis of the causal link between flaws in RPKI Requests for Comments (RFCs) and vulnerabilities in implementations and real-world deployments. We reveal how vague, conflicting, or underspecified requirements in 50 RPKI RFCs propagate into inconsistent implementation behavior and operational failures. We conduct the first large-scale, impact-driven evaluation of RPKI specifications. Our methodology combines differential fuzzing of major RPKI implementations with Internet-wide crawling and validation log analysis, enabling us to trace practical vulnerabilities back to flawed RFC requirements. We uncover 61 previously undocumented inconsistencies in validation behavior, trace 23 directly to RFC flaws, and identify two novel vulnerabilities that were assigned CVEs. Our findings reveal that these are not isolated coding errors but rather systemic issues inherent in how RPKI standards are written, interpreted, and implemented. To mitigate these threats, we propose concrete recommendations and introduce a novel alerting service that monitors and reports live inconsistencies in RPKI deployments. Our open-source datasets, code, and tools support reproducibility and further research.
Structured data is well handled by gradient-boosted decision trees (GBDT), which are usually trained on vertically partitioned features across mutually distrustful parties. High speed and interpretability make GBDTs popular in finance and healthcare, where neural networks may fall short. Enabling secure computation for GBDTs poses unique challenges, requiring secure record alignment for comparison. Relying on private set intersection (PSI) is a de facto approach. Mistaking PSI for a safety measure actually exposes which record identifiers (IDs) are shared between the datasets. Although circuit-PSI could help, it is costly for generic uses. New ideas are needed to efficiently train in a "dark forest". Aiming to hide the IDs, we initiate the study of anonymous GBDT training on split data held by two parties. Dual circuit-PSI in our design lets the parties alternate as receiver to run pick-then-sum over local features. Via oblivious programmable pseudorandom functions, we propagate circuit-PSI outputs as shared state across runs. Avoiding universal alignment, we resolve the neglected dilemma that ID hiding incurs a cost that scales with domain size. Next, we halve the cost of ciphertext packing used to convert single-instruction multiple-data homomorphic encryption from (ring) learning with errors in prior secure GBDT (Usenix Security' 23) and related secure machine-learning computations. Comparative experiments show our protocol remains competitive with leaky approaches in efficiency. Enabling ID-hiding aggregation, our techniques can extend to other vertically partitioned analytics.
In an era dominated by big data and machine learning, establishing valuable data collaboration has never been more critical. However, such collaborations must operate under regulatory and legal constraints. Two-party Privacy-Preserving Record Linkage (PPRL) emerges to assess the potential collaboration value and also ensure the privacy and security of the involved data. Nevertheless, the substantial computational and communication overheads associated with PPRL hinder its practical adoption in data markets with numerous potential collaborators. Therefore, we present the Screening-then-Linkage framework, which incorporates a lightweight Screening phase prior to the resource-intensive PPRL phase, i.e., PPRS, to mitigate the scalability issue of PPRL. We propose a circuit-PSI-based system, named Appraisal to realize a secure, effective, and efficient PPRS. To reconcile the approximate matching and/or schema-aware setting required in PPRS with the limitations of the circuit-PSI supporting only symmetric functions, we propose a more communication-efficient secure permutation, i.e., Oblivious Attribute/Feature Alignment protocol tailored for PPRS. This protocol supports a broader range of comparison functions and significantly improves efficiency, i.e., reducing communication costs by a factor of 14 compared to the conventional protocol. Our rigorous analysis and comprehensive empirical evaluations demonstrate the security, effectiveness, and efficiency of Appraisal. Appraisal can accommodate up to $850\times$ more records than the SOTA PPRS system, SFour, within the same constraints. Moreover, it is $165 \times$ faster than SOTA PPRL, indicating the Screening-then-Linkage framework substantially decreases the computation time required to identify the most valuable collaborators from a large pool of candidates.
Unmanned aerial vehicle (UAV) swarms are increasingly deployed in vast low-altitude applications, owing to their capabilities in distributed sensing, flexible communication, and autonomous coordination. Nevertheless, the open and highly dynamic operating environment of UAV swarms introduces serious security risks, including GPS spoofing, insider threats, and multi-hop intrusion. These threats are aggravated by limited on-board resources, frequently changing network topology, and the presence of intelligent adversaries. To tackle these issues, this paper proposes a cloud-edge-end collaborative defense framework for UAV swarms. Based on this framework, three complementary mechanisms are developed. First, a cooperative perception scheme is designed to resist GPS spoofing via interactive attack-defense game modeling. Second, a behavior-driven authentication method with trust evaluation is developed to mitigate insider threats. Third, a multi-agent attack forensics framework is devised to intelligently trace the propagation paths of multi-hop attacks in UAV networks. Experimental results validate the effectiveness of the proposed approaches. Finally, several open research directions are outlined.
YARA rules are widely shared across threat intelligence communities to enable collective defence against malware. This practice implicitly assumes that removing metadata (e.g., author fields) sufficiently protects the identity of contributing organisations. To assess the validity of this assumption, we systematically evaluate how much can be inferred from YARA rule text alone. Specifically, using a corpus of 23,305 rules from three major public repositories, we train independent classifiers along four stylometric fingerprint dimensions: individual author, source repository, malware family, and temporal drift, using three complementary methods: lexical n-grams (Burrows' Delta), syntactic AST features (Caliskan-Islam), and fine-tuned CodeBERT. Our results demonstrate that repository origin is almost perfectly recoverable (up to 99% accuracy), individual authors can be re-identified well above chance (76%), and malware family classification reaches 95%. Comparing the same repository attribution task across full-history and time-restricted subsets reveals a 9-18% accuracy gap, providing preliminary evidence of temporal drift in repository fingerprints.To further disentangle content from style, we conduct per-malware family author attribution experiments. Even when the malware family is the same for all samples considered, authors can still be re-identified for five of seven tested families (mean accuracy 74.6%). These findings constitute the first systematic demonstration that YARA rule sharing is a measurable OPSEC attack surface, and that metadata removal alone does not mitigate it.
Retrieval-augmented generation (RAG) increasingly underpins high-stakes applications, yet remains vulnerable to Confundo-style poisoning where adversarially optimized documents manipulate generated outputs. Existing defenses assume that detecting poisoned evidence prevents harm. We show this assumption is incorrect: models exhibit a monitoring-control gap -- they can detect contradictions in retrieved evidence yet still act on poisoned claims. We introduce the Cordon Principle -- no agent capable of final synthesis may access untrusted natural-language evidence -- and realize it through CORDON-MAS, a compartmentalized framework that enforces this principle architecturally by separating evidence extraction, cross-source audit, and answer synthesis into agents with asymmetric memory privileges. Across five BEIR datasets, CORDON-MAS reduces attack success rate by 92.4\% relative to undefended RAG. This reframes RAG poisoning from a detection problem to an information-flow control problem.
Reliable watermarking of panoramic imagery is fundamentally challenged by arbitrary 3D rotations. As panoramas are defined on the sphere, they naturally transform under the action of $SO(3)$, rendering conventional planar representations and augmentation-based robustness strategies inadequate and devoid of theoretical guarantees. To address this, we formulate panoramas as spherical signals and leverage $SO(3)$ representation theory to derive provably rotation-invariant descriptors. While spherical harmonic coefficients transform equivariantly under rotations, the natural invariant constructions are typically limited to zeroth-order statistics which eliminate directional information and severely constrain embedding capacity. In this work, we introduce a principled third-order invariant construction by coupling higher-order $SO(3)$ irreducible representations via tensor products and projecting onto the trivial representation. This yields a spherical invariant bispectrum that preserves phase information while remaining strictly rotation-invariant. Leveraging this property, we embed watermarks into higher-order spherical harmonic coefficients and recover them from invariant bispectral scalars, enabling reliable extraction under arbitrary 3D rotations. We provide a theoretical proof of $SO(3)$ invariance for it and demonstrate experimentally its near-perfect robustness to continuous rotations while maintaining high visual fidelity.
Cross-slice attack attribution in 6G networks requires identifying causal propagation chains through shared infrastructure in under 100 ms. Existing methods struggle to satisfy this strict SLA without sacrificing accuracy, because shared resource contention creates spurious correlations that are indistinguishable from genuine causal links under standard Granger tests. We propose DA-GC, a certified causal attribution framework that integrates resource-conditioned Granger causality with an axiomatically derived Resource Contention Model (RCM) to systematically block resource-mediated confounding. On a 15-slice production-emulation 6G testbed with 1,100 attack scenarios, DA-GC achieves 89.2% attribution accuracy at 87 ms. This represents a 7.9 percentage-point improvement over the strongest baseline at 2.7x lower latency, alongside demonstrated cross-topology generalization and concept-drift resilience. Crucially, DA-GC is backed by a comprehensive formal certification stack. We provide mathematically proven validity certificates for statistical soundness under serially dependent telemetry and piecewise-stationarity. Furthermore, we establish strict security bounds, including an adversarial utilization spoofing breakdown point of $δ^* \approx 0.95$, and define the minimum differential-privacy noise required for a provably private and robust deployment.
Shared library hijacking attacks in the Linux ecosystem, including embedded Linux, are a significant concern. It fundamentally exploits the dynamic linker's library-resolution semantics rather than modifying trusted libraries directly. Prior research has extensively analyzed attack vectors exploiting environment variables, embedded search paths, and dynamic loader internals, demonstrating that hijacking is rooted in fundamental loader behavior rather than isolated misconfigurations. Existing defenses either harden or replace the loader, enforce control-flow integrity after libraries are loaded, or apply file-centric integrity mechanisms such as signatures and measurement frameworks. However, these approaches fail to address a critical gap: none verify whether the shared object actually resolved by the loader is the intended and trusted one. In this paper, we argue that shared library hijacking is fundamentally a loader-resolution authenticity problem and present a loader-centric verification framework that enforces authenticity guarantees for the dynamic linker's resolution process. Our design supports both path-bound and location-independent (i.e., Build-ID-based) identity models combined with cryptographic hashing. We implement our approach on GNU libc (glibc) systems and evaluate it on both general-purpose Linux (e.g., Ubuntu) and embedded Linux (e.g., Buildroot) environments under emulation. Our results demonstrate that our proposed mechanism indeed prevents shared library hijacking attacks.
The Resource Public Key Infrastructure (RPKI) has become essential to secure inter-domain routing. Despite its critical role, RPKI software remains largely untested beyond shallow parsing. Existing fuzzers, like AFL++ or libFuzzer, do not work well for RPKI as they assume a single, self-contained input per execution, while RPKI repositories contain hundreds of interdependent cryptographically linked objects. Existing fuzzers fail to handle this complexity and lack the ability for precise coverage attribution in multi-object repositories, breaking feedback-based exploration and thereby missing most severe vulnerabilities in RPKI validation. In this paper, we overcome these limitations through novel fuzzing techniques, including continuous sampling and using functions as side-channels for per-object coverage attribution in large input repositories. We further show how parsing inputs to a labeled tree allows structural and semantic mutations while preserving cryptographic validity in mutated repositories. We implement our new techniques into a powerful fuzzing tool called CAT, combining non-sequential fuzzing with our template-agnostic ASN.1 mutation engine to achieve 66x throughput improvement over sequential fuzzing and exploring 24 - 47% more unique code paths compared to libFuzzer and previous work. Evaluating CAT on RPKI validators uncovered 21 previously unknown vulnerabilities with 8 CVEs already assigned (CVSS 7.5 - 9.8). These include a buffer overflow, Denial-of-Service (DoS), and exploitable repository-poisoning logic flaws. We open-source CAT to enable reproducibility, further research, and adaptation of our methods to other complex cryptography-based protocols such as DNSSEC and TLS.