Browse, search, and filter preprints from arXiv—fast, readable, and built for curious security folks.
The random subspace method has wide security applications, such as providing certified defenses against adversarial and backdoor attacks and building robustly aligned LLMs against jailbreaking attacks. However, explanations of the random subspace method remain underexplored. Existing state-of-the-art feature attribution methods, such as Shapley value and LIME, are computationally impractical and lack security guarantees when applied to the random subspace method. In this work, we propose EnsembleSHAP, an intrinsically faithful and secure feature attribution method for the random subspace method that reuses its computational byproducts. Specifically, our feature attribution method 1) is computationally efficient, 2) maintains essential properties of effective feature attribution (such as local accuracy), and 3) offers guaranteed protection against explanation-preserving attacks on feature attribution methods. To the best of our knowledge, this is the first work to establish provable robustness against explanation-preserving attacks. We also perform comprehensive evaluations of our explanation's effectiveness against different empirical attacks, including backdoor attacks, adversarial attacks, and jailbreak attacks. The code is at https://github.com/Wang-Yanting/EnsembleSHAP. WARNING: This document may include content that could be considered harmful.
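The idea of reusing the ensemble's byproducts for attribution can be sketched in a few lines. The helper names (`random_subspace_predict`, `attribution_from_votes`) and the agreement-based score below are illustrative assumptions, not the paper's actual EnsembleSHAP estimator:

```python
import random

def random_subspace_predict(x, base_clf, k, n_samples, seed=0):
    """Random subspace ensemble: classify random k-feature subsets of x
    (non-subset features zeroed out) and majority-vote over the results,
    keeping the per-subset votes as reusable byproducts."""
    rng = random.Random(seed)
    n = len(x)
    votes = []
    for _ in range(n_samples):
        subset = frozenset(rng.sample(range(n), k))
        masked = [xi if i in subset else 0 for i, xi in enumerate(x)]
        votes.append((subset, base_clf(masked)))
    labels = [label for _, label in votes]
    majority = max(set(labels), key=labels.count)
    return majority, votes

def attribution_from_votes(votes, majority, n):
    """Score feature i by how much more often subsets containing it agree
    with the ensemble's majority vote -- attribution computed from the
    byproducts alone, with no extra model queries."""
    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0
    scores = []
    for i in range(n):
        with_i = [label == majority for subset, label in votes if i in subset]
        without_i = [label == majority for subset, label in votes if i not in subset]
        scores.append(mean(with_i) - mean(without_i))
    return scores
```

Because the per-subset predictions are already computed for the ensemble vote, the attribution pass adds no model evaluations, which is what makes this kind of estimator cheap relative to generic Shapley sampling.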
We propose a simple detection mechanism for the Gumbel watermarking scheme proposed by Aaronson (2022). The new mechanism is proven to be near-optimal in a problem-dependent sense among all model-agnostic watermarking schemes under the assumption that the next-token distribution is sampled i.i.d.
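For readers unfamiliar with the scheme, here is a minimal sketch of Aaronson-style Gumbel watermarking and the standard sum-of-exponentials detection statistic it builds on. The keyed PRF, the position-based contexts, and the function names are illustrative assumptions (real deployments derive contexts from preceding tokens), and this is the baseline detector, not the near-optimal one proposed in the paper:

```python
import hashlib
import math

def prf_unit(key: bytes, context: bytes, token: int) -> float:
    """Keyed PRF mapping (context, token) to a pseudorandom number in (0, 1)."""
    h = hashlib.sha256(key + context + token.to_bytes(4, "big")).digest()
    return (int.from_bytes(h[:8], "big") + 0.5) / 2**64

def gumbel_sample(probs, key, context):
    """Watermarked sampling: pick argmax_t r_t^(1/p_t), which follows the
    next-token distribution exactly while being keyed to the PRF."""
    best, best_score = None, -1.0
    for t, p in enumerate(probs):
        if p <= 0:
            continue
        r = prf_unit(key, context, t)
        score = r ** (1.0 / p)
        if score > best_score:
            best, best_score = t, score
    return best

def detect_score(tokens, key, contexts):
    """Detection statistic: average of -log(1 - r_i) over the text. Without
    the watermark each term is Exp(1) with mean 1, so an average well above 1
    indicates the watermark key was used."""
    total = 0.0
    for tok, ctx in zip(tokens, contexts):
        r = prf_unit(key, ctx, tok)
        total += -math.log(1.0 - r)
    return total / len(tokens)
```

The argmax trick keeps the output distribution unchanged (distortion-free sampling), while the detector only needs the key and the token sequence, which is what makes the scheme model-agnostic.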
AI agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions embedded in untrusted data can trigger dangerous agent actions. This position paper discusses our vision for system-level defenses against indirect prompt injection attacks. We articulate three positions: (1) dynamic replanning and security policy updates are often necessary for dynamic tasks and realistic environments; (2) certain context-dependent security decisions would still require LLMs (or other learned models), but should only be made within system designs that strictly constrain what the model can observe and decide; (3) in inherently ambiguous cases, personalization and human interaction should be treated as core design considerations. In addition to our main positions, we discuss limitations of existing benchmarks that can create a false sense of utility and security. We also highlight the value of system-level defenses, which serve as the skeleton of agentic systems by structuring and controlling agent behaviors, integrating rule-based and model-based security checks, and enabling more targeted research on model robustness and human interaction.
Assistive technologies increasingly support independence, accessibility, and safety for older adults, people with disabilities, and individuals requiring continuous care. Two major categories are virtual assistive systems and robotic assistive systems operating in physical environments. Although both offer significant benefits, they introduce important security and privacy risks due to their reliance on artificial intelligence, network connectivity, and sensor-based perception. Virtual systems are primarily exposed to threats involving data privacy, unauthorized access, and adversarial voice manipulation. In contrast, robotic systems introduce additional cyber-physical risks such as sensor spoofing, perception manipulation, command injection, and physical safety hazards. In this paper, we present a comparative analysis of security and privacy challenges across these systems. We develop a unified comparative threat-modeling framework that enables structured analysis of attack surfaces, risk profiles, and safety implications across both systems. Moreover, we provide design recommendations for developing secure, privacy-preserving, and trustworthy assistive technologies.
Speculative execution enhances processor performance by predicting intermediate results and executing instructions based on these predictions. However, incorrect predictions can lead to security vulnerabilities, as speculative instructions leave traces in microarchitectural components that attackers can exploit. This is demonstrated by the family of Spectre attacks. Unfortunately, existing countermeasures to these attacks lack a formal security characterization, making it difficult to verify their effectiveness. In this paper, we propose a novel framework for detecting information flows introduced by speculative execution and reasoning about software defenses. The theoretical foundation of our approach is speculative non-interference (SNI), a novel semantic notion of security against speculative execution attacks. SNI relates information leakage observed under a standard non-speculative semantics to leakage arising under semantics that explicitly model speculative execution. To capture their combined effects, we extend our framework with a mechanism to safely compose multiple speculative semantics, each focussing on a single aspect of speculation. This allows us to analyze the complex interactions and resulting leaks that can arise when multiple speculative mechanisms operate together. On the practical side, we develop Spectector, a symbolic analysis tool that uses our compositional framework and leverages SMT solvers to detect vulnerabilities and verify program security with respect to multiple speculation mechanisms. We demonstrate the effectiveness of Spectector through evaluations on standard security benchmarks and new vulnerability scenarios.
Trusted Execution Environments (TEEs) allow the secure execution of code on remote systems without the need to trust their operators. They use static attestation as a central mechanism for establishing trust, allowing remote parties to verify that their code is executed unmodified in an isolated environment. However, this form of attestation does not cover runtime attacks, where an attacker exploits vulnerabilities in the software inside the TEE. Control Flow Attestation (CFA), a form of runtime attestation, is designed to detect such attacks. In this work, we present a method to extend TEEs with CFA and discuss how it can prevent exploitation in the event of detected control flow violations. Furthermore, we introduce HPCCFA, a mechanism that uses hardware performance counters (HPCs) for CFA, enabling hardware-backed trace generation on commodity CPUs. We demonstrate the feasibility of HPCCFA on a proof-of-concept implementation for Keystone on RISC-V. Our evaluation investigates the interplay of the number of measurement points and runtime protection, and reveals a trade-off between detection reliability and performance overhead.
Diffusion-based watermarking methods embed verifiable marks by manipulating the initial noise or the reverse diffusion trajectory. However, these methods share a critical assumption: verification can succeed only if the diffusion trajectory can be faithfully reconstructed. This reliance on trajectory recovery constitutes a fundamental and exploitable vulnerability. We propose $\underline{\mathbf{S}}$tochastic $\underline{\mathbf{Hi}}$dden-Trajectory De$\underline{\mathbf{f}}$lec$\underline{\mathbf{t}}$ion ($\mathbf{SHIFT}$), a training-free attack that exploits this common weakness across diverse watermarking paradigms. SHIFT leverages stochastic diffusion resampling to deflect the generative trajectory in latent space, making the reconstructed image statistically decoupled from the original watermark-embedded trajectory while preserving strong visual quality and semantic consistency. Extensive experiments on nine representative watermarking methods spanning noise-space, frequency-domain, and optimization-based paradigms show that SHIFT achieves 95%--100% attack success rates with nearly no loss in semantic quality, without requiring any watermark-specific knowledge or model retraining.
In low-altitude wireless networks (LAWN), federated learning (FL) enables collaborative intelligence among unmanned aerial vehicles (UAVs) and integrated sensing and communication (ISAC) devices while keeping raw sensing data local. Due to the "right to be forgotten" requirements and the high mobility of ISAC devices that frequently enter or leave the coverage region of UAV-assisted servers, the influence of departing devices must be removed from trained models. This necessity motivates the adoption of federated unlearning (FUL) to eliminate historical device contributions from the global model in LAWN. However, existing FUL approaches implicitly assume that the UAV-assisted server executes unlearning operations honestly. Without client-verifiable guarantees, an untrusted server may retain residual device information, leading to potential privacy leakage and undermining trust. To address this issue, we propose VerFU, a privacy-preserving and client-verifiable federated unlearning framework designed for LAWN. It empowers ISAC devices to validate the server-side unlearning operations without relying on original data samples. By integrating linear homomorphic hash (LHH) with commitment schemes, VerFU constructs tamper-proof records of historical updates. ISAC devices ensure the integrity of unlearning results by verifying decommitment parameters and utilizing the linear composability of LHH to check whether the global model accurately removes their historical contributions. Furthermore, VerFU is capable of efficiently processing parallel unlearning requests and verification from multiple ISAC devices. Experimental results demonstrate that our framework efficiently preserves model utility post-unlearning while maintaining low communication and verification overhead.
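The linear-composability check at the core of the verification step can be illustrated with a toy exponent-based LHH over a prime field. The modulus size, the transparent generator derivation, and the function names below are assumptions for illustration (VerFU's commitment and decommitment machinery is omitted), and model updates are assumed quantized to integers:

```python
import hashlib

P = 2**127 - 1  # a Mersenne prime modulus (illustrative; not a production group size)

def gen(j: int) -> int:
    """Derive the j-th public generator transparently from a hash."""
    return int.from_bytes(hashlib.sha256(b"lhh-gen" + j.to_bytes(4, "big")).digest(), "big") % P

def lhh(vec):
    """Linear homomorphic hash H(v) = prod_j gen(j)^(v_j) mod P, so that
    H(u + v) = H(u) * H(v) mod P."""
    acc = 1
    for j, v in enumerate(vec):
        acc = acc * pow(gen(j), v, P) % P
    return acc

def verify_unlearning(h_aggregate, h_device, unlearned_model):
    """Client-side check: the unlearned model must hash to
    H(aggregate) / H(device contribution), by linearity of the hash.
    (pow(x, -1, P) computes a modular inverse; Python 3.8+.)"""
    expected = h_aggregate * pow(h_device, -1, P) % P
    return lhh(unlearned_model) == expected
```

A departing device only needs its own contribution's hash and the published aggregate hashes to run this check, which is why verification works without access to other clients' raw data.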
Quantum key distribution is often regarded as an unconditionally secure method to exchange a secret key by harnessing fundamental aspects of quantum mechanics. Despite the robustness of key exchange, classical post-processing reveals vulnerabilities that an eavesdropper could target. In particular, many reconciliation protocols correct errors by comparing the parities of subsets between both parties. These communications occur over insecure channels, leaking information that an eavesdropper could exploit. Currently, there is no holistic threat model that addresses how parity leakage during reconciliation might be actively manipulated. In this paper we introduce a new form of attack, namely the Manipulate-and-Observe attack, in which the adversary (1) partially intercepts a fraction $ρ$ of the qubits during key exchange, injecting the maximally tolerated amount of errors up to the 11 percent error threshold whilst remaining undetected, and (2) probes the maximum amount of parity leakage during reconciliation and exploits it using a vectorised, parallel brute-force filter to shrink the search space from $2^n$ down to as few as a single candidate, for an n-bit reconciled key. We perform simulations of the attack, deploying it on the most widely used protocol, BB84, and the benchmark reconciliation protocol, Cascade. Our simulation results demonstrate that the attack can significantly reduce the security below the theoretical bound and, in the worst case, fully recover the reconciled key material. The principles of the attack could threaten other parity-based reconciliation schemes, like Low Density Parity Check, which underscores the need for urgent consideration of the combined security of key exchange and post-processing.
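The search-space reduction in step (2) is consistency filtering against leaked parity constraints. A minimal sequential sketch follows, with illustrative function names; the attack described above is vectorised and parallel, so treat this as the underlying logic only:

```python
from itertools import product

def parity(bits, idxs):
    """Parity (XOR) of the key bits at positions idxs."""
    return sum(bits[i] for i in idxs) & 1

def filter_candidates(n, leaked):
    """Keep only the n-bit keys consistent with every leaked
    (subset, parity) pair. Each linearly independent parity constraint
    halves the space, shrinking 2^n candidates toward a single survivor."""
    survivors = []
    for bits in product((0, 1), repeat=n):
        if all(parity(bits, idxs) == p for idxs, p in leaked):
            survivors.append(bits)
    return survivors
```

With n linearly independent leaked parities, the filter leaves exactly one candidate, which is the sense in which parity leakage can collapse the effective key entropy to zero.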
Mobile messaging apps are a fundamental communication infrastructure, used by billions of people every day to share information, including sensitive data. Security and Privacy are thus critical concerns for such applications. Although the cryptographic protocols prevalent in messaging apps are generally well studied, other relevant implementation characteristics of such apps, such as their software architecture, permission use, and network-related runtime behavior, have not received enough attention. In this paper, we present a methodology for comparing implementation characteristics of messaging applications by employing static and dynamic analysis under reproducible scenarios to identify discrepancies with potential security and privacy implications. We apply this methodology to study the Android clients of the Meta Messenger, Signal, and Telegram apps. Our main findings reveal discrepancies in application complexity, attack surface, and network behavior. Statically, Messenger presents the largest attack surface and the highest number of static analysis warnings, while Telegram requests the most dangerous permissions. In contrast, Signal consistently demonstrates a minimalist design with the fewest dependencies and dangerous permissions. Dynamically, these differences are reflected in network activity; Messenger is by far the most active, exhibiting persistent background communication, whereas Signal is the least active. Furthermore, our analysis shows that all applications properly adhere to the Android permission model, with no evidence of unauthorized data access.
Mobile networks are essential for modern societies. The most recent generation of mobile networks will be even more ubiquitous than previous ones. Therefore, the security of these networks as part of the critical infrastructure with essential communication services is of the utmost importance. However, these systems are still vulnerable to being compromised, as showcased in the recent discussion on supply chain security and other challenges. This work addresses problems arising from compromised 5G core network components. The investigations reveal how attacks based on command and control communication can be designed so that they cannot be detected or prevented. This way, various attacks against the security and privacy of subscribers can be performed for which no effective countermeasures are available.
Existential risk scenarios relating to Generative Artificial Intelligence often involve advanced systems or agentic models breaking loose and using hacking tools to gain control over critical infrastructure. In this paper, we argue that the real threats posed by generative AI for cybercrime are rather different. We apply innovation theory and evolutionary economics - treating cybercrime as an ecosystem of small- and medium-scale tech start-ups, coining two novel terms that bound the upper and lower cases for disruption. At the high end, we propose the Stand-Alone Complex, in which cybercrime-gang-in-a-box solutions enable individual actors to largely automate existing cybercrime-as-a-service arrangements. At the low end, we suggest the phenomenon of Vibercrime, in which 'vibe coding' lowers the barrier to entry but does not fundamentally reshape the economic structures of cybercrime. We analyse early empirical data from the cybercrime underground, and find the reality is prosaic - AI has some early adoption in existing large-scale, low-profit passive income schemes and trivial forms of fraud, but there is little evidence so far of widespread disruption in cybercrime. Such adoption replaces existing means of code pasting, error checking, and cheatsheet consultation, for generic aspects of software development involved in cybercrime - and largely for already skilled actors, with low-skill actors finding little utility in vibe coding tools compared to pre-made scripts. The role of jailbroken LLMs (Dark AI) as instructors is also overstated, given the prominence of subculture and social learning in initiation - new users value the social connections and community identity involved in learning hacking and cybercrime skills as much as the knowledge itself. Our initial results, therefore, suggest that even bemoaning the rise of the Vibercriminal may be overstating the level of disruption to date.
Network traffic classification using self-supervised pre-training models based on Masked Autoencoders (MAE) has demonstrated great potential. However, existing methods are confined to isolated byte-level reconstruction of individual flows, lacking adequate perception of the multi-granularity contextual relationships in traffic. To address this limitation, we propose Mean MAE (MMAE), a teacher-student MAE paradigm with a flow mixing strategy for building an encrypted traffic pre-training model. MMAE employs a self-distillation mechanism for teacher-student interaction, where the teacher provides unmasked flow-level semantic supervision to advance the student from local byte reconstruction to multi-granularity comprehension. To break the information bottleneck in individual flows, we introduce a dynamic Flow Mixing (FlowMix) strategy to replace the traditional random masking mechanism. By constructing challenging cross-flow mixed samples with interferences, it compels the model to learn discriminative representations from distorted tokens. Furthermore, we design a Packet-importance aware Mask Predictor (PMP) equipped with an attention bias mechanism that leverages packet-level side-channel statistics to dynamically mask tokens with high semantic density. Extensive experiments on multiple datasets covering encrypted applications, malware, and attack traffic demonstrate that MMAE achieves state-of-the-art performance. The code is available at https://github.com/lx6c78/MMAE
Encrypted traffic classification is a critical task for network security. While deep learning has advanced this field, the occlusion of payload semantics by encryption severely challenges standard modeling approaches. Most existing frameworks rely on static and homogeneous pipelines that apply uniform parameter sharing and static fusion strategies across all inputs. This one-size-fits-all static design is inherently flawed: by forcing structured headers and randomized payloads into a unified processing pipeline, it inevitably entangles the raw protocol signals with stochastic encryption noise, thereby degrading the fine-grained discriminative features. In this paper, we propose TrafficMoE, a framework that breaks through the bottleneck of static modeling by establishing a Disentangle-Filter-Aggregate (DFA) paradigm. Specifically, to resolve the structural conflict between components, the architecture disentangles headers and payloads using dual-branch sparse Mixture-of-Experts (MoE), enabling modality-specific modeling. To mitigate the impact of stochastic noise, an uncertainty-aware filtering mechanism is introduced to quantify reliability and selectively suppress high-variance representations. Finally, to overcome the limitations of static fusion, a routing-guided strategy dynamically aggregates cross-modality features, adaptively weighing contributions based on traffic context. With this DFA paradigm, TrafficMoE maximizes representational efficiency by focusing solely on the most discriminative traffic features. Extensive experiments on six datasets demonstrate that TrafficMoE consistently outperforms state-of-the-art methods, validating the necessity of heterogeneity-aware modeling in encrypted traffic analysis. The source code is publicly available at https://github.com/Posuly/TrafficMoE_main.
LLM-as-a-Judge (LaaJ) is a novel paradigm in which powerful language models are used to assess the quality, safety, or correctness of generated outputs. While this paradigm has significantly improved the scalability and efficiency of evaluation processes, it also introduces novel security risks and reliability concerns that remain largely unexplored. In particular, LLM-based judges can become both targets of adversarial manipulation and instruments through which attacks are conducted, potentially compromising the trustworthiness of evaluation pipelines. In this paper, we present the first Systematization of Knowledge (SoK) focusing on the security aspects of LLM-as-a-Judge systems. We perform a comprehensive literature review across major academic databases, analyzing 863 works and selecting 45 relevant studies published between 2020 and 2026. Based on this study, we propose a taxonomy that organizes recent research according to the role played by LLM-as-a-Judge in the security landscape, distinguishing between attacks targeting LaaJ systems, attacks performed through LaaJ, defenses leveraging LaaJ for security purposes, and applications where LaaJ is used as an evaluation strategy in security-related domains. We further provide a comparative analysis of existing approaches, highlighting current limitations, emerging threats, and open research challenges. Our findings reveal significant vulnerabilities in LLM-based evaluation frameworks, as well as promising directions for improving their robustness and reliability. Finally, we outline key research opportunities that can guide the development of more secure and trustworthy LLM-as-a-Judge systems.
Lightweight cryptographic primitives are widely deployed in resource-constrained environments, particularly in Internet of Things (IoT) devices. Due to their public accessibility, these devices are vulnerable to physical attacks, especially fault attacks. Recently, deep learning-based cryptanalytic techniques have demonstrated promising results; however, their application to fault attacks remains limited, particularly for stream ciphers. In this work, we investigate the feasibility of deep-learning-assisted differential fault attacks on three lightweight stream ciphers, namely ACORNv3, MORUSv2 and ATOM, under a relaxed fault model, where a single bit-flip fault is injected at an unknown location. We train multilayer perceptron (MLP) models to identify the fault locations. Experimental results show that the trained models achieve high identification accuracies of 0.999880, 0.999231 and 0.823568 for ACORNv3, MORUSv2 and ATOM, respectively, and outperform traditional signature-based methods. For the secret recovery process, we introduce a threshold-based method to optimize the number of fault injections required to recover the secret information. The results show that the initial state of ACORN can be recovered with 21 to 34 faults, while MORUS requires 213 to 248 faults, with at most 6 bits of guessing. Both attacks reduce the attack complexity compared to existing works. For ATOM, the results show that it possesses a higher security margin, as the majority of state bits in the Non-linear Feedback Shift Register (NFSR) can only be recovered under a precise control model. To the best of our knowledge, this work provides the first experimental results of differential fault attacks on ATOM.
Backdoor attacks on federated learning (FL) are most often evaluated with synthetic corner patches or out-of-distribution (OOD) patterns that are unlikely to arise in practice. In this paper, we revisit the backdoor threat to standard FL (a single global model) under a more realistic setting where triggers must be semantically meaningful, in-distribution, and visually plausible. We propose SABLE, a Semantics-Aware Backdoor for LEarning in federated settings, which constructs natural, content-consistent triggers (e.g., semantic attribute changes such as sunglasses) and optimizes an aggregation-aware malicious objective with feature separation and parameter regularization to keep attacker updates close to benign ones. We instantiate SABLE on CelebA hair-color classification and the German Traffic Sign Recognition Benchmark (GTSRB), poisoning only a small, interpretable subset of each malicious client's local data while otherwise following the standard FL protocol. Across heterogeneous client partitions and multiple aggregation rules (FedAvg, Trimmed Mean, MultiKrum, and FLAME), our semantics-driven triggers achieve high targeted attack success rates while preserving benign test accuracy. These results show that semantics-aligned backdoors remain a potent and practical threat in federated learning, and that robustness claims based solely on synthetic patch triggers can be overly optimistic.
The fast pace of modern AI is rapidly transforming traditional industrial systems into vast, intelligent and potentially unmanned autonomous operational environments driven by AI-based solutions. These solutions leverage various forms of machine learning, reinforcement learning, and generative AI. The introduction of such smart capabilities has pushed the envelope in multiple industrial domains, enabling predictive maintenance, optimized performance, and streamlined workflows. These solutions are often deployed across the Industrial Internet of Things (IIoT) and supported by the Edge-Fog-Cloud computing continuum to enable urgent (i.e., real-time or near real-time) decision-making. Despite the current trend of aggressively adopting these smart industrial solutions to increase profit, quality, and efficiency, large-scale integration and deployment also bring serious hazards that, if ignored, can undermine the benefits of smart industries. These hazards include unforeseen interoperability side-effects and heightened vulnerability to cyber threats, particularly in environments operating with a plethora of heterogeneous IIoT systems. The goal of this study is to shed light on the potential consequences of industrial smartness, with a particular focus on security implications, including vulnerabilities, side effects, and cyber threats. We distinguish software-level downsides stemming from both traditional AI solutions and generative AI from those originating in the infrastructure layer, namely IIoT and the Edge-Cloud continuum. At each level, we investigate potential vulnerabilities, cyber threats, and unintended side effects. As industries continue to become smarter, understanding and addressing these downsides will be crucial to ensure secure and sustainable development of smart industrial systems.