Browse, search, and filter preprints from arXiv—fast, readable, and built for curious security folks.
Jailbreak attacks bypass LLM safety alignment, yet their mechanisms remain poorly understood. We provide evidence that attacks do not comprehensively eliminate safety features, but instead selectively suppress specific attention heads. We identify two functionally differentiated types: Adversarially Compromised Heads (ACHs) concentrated in early layers, which are suppressed under attacks, and Safety-Aligned Heads (SAHs) in mid-layers, which maintain robust activations even when attacks succeed. Ablation studies support the causal role of ACHs and the contribution of SAHs to robust activations: suppressing a small number of ACHs is sufficient to induce jailbreak-like behavior on normally refused inputs, while removing SAHs substantially weakens mid-layer safety activations. Token-level attribution further shows that ACH suppression is driven specifically by attack-template tokens, providing a mechanistic account of why attacks can bypass refusal decisions through ACH suppression while leaving internal safety signals sustained by SAHs -- a phenomenon we term Robust Harmful Features. To validate the practical significance of this robustness, we show that simply reading these persistent activations -- without any training -- yields competitive aggregate detection performance with strong adversarial robustness.
Developers may reference vulnerabilities in pull request discussions through both explicit identifiers, such as CVEs or GHSAs, and implicit security-related language (e.g., "unauthorized access" or "SQL injection"). Prior work has primarily focused on explicit identifiers, potentially overlooking vulnerability discussions that lack formal references. Bots and coding agents are becoming more common in pull requests, raising new questions about how different accounts communicate about vulnerabilities. In this registered report, we describe our planned study of vulnerability communication in pull requests by humans, bots, and coding agents. Building on the AIDev-pop dataset, we analyze explicit vulnerability references and implicit security-related signals across pull request titles, descriptions, review comments, commit messages, and timeline discussions. We further investigate whether these references are associated with vulnerabilities introduced or fixed in the modified code and how they relate to pull request review activity and outcomes. This study contributes a large-scale empirical investigation of vulnerability communication practices in modern software development.
The rapid proliferation of automated, multi-vector malware threats poses a significant risk to heterogeneous, resource-constrained cyber-physical networks. Conventional epidemiological models often treat security defenses as static parameters, failing to capture the strategic, asymmetric maneuvers between an attacker and a defender. To address this gap, this paper proposes a Game-Theory-Integrated Modified Multi-Wireless Sensor Epidemic Malware Propagation (GTI-mSEMP) framework. We analyze and compare the operational trajectories of Susceptible (S) and Recovered (R) node populations across three operational regimes: Balanced Matchup, Exploit Surge, and Hardened Defense. Numerical simulation results capture the real-time transient dynamics of the network state variables, demonstrating how the epidemic curve shifts when either the defensive or offensive scaling vectors hold an efficiency advantage. The proposed mathematical and numerical framework provides a rigorous foundation that can be deployed in highly adversarial network environments to evaluate dynamic malware propagation and predict localized node population states.
Large language models (LLMs) have increasingly moved from standalone text generation systems to agents that invoke external tools, access environments, and execute multi-step tasks. However, conventional function-calling benchmarks mainly evaluate task completion and API correctness, while privacy evaluation benchmarks typically focus on final responses or privacy judgments. Neither perspective captures purpose-bound information flow across an executed multi-tool trajectory. Motivated by this limitation in current agent evaluation, ToolPrivacyBench audits whether task-private atoms are routed only to authorized tools and downstream sinks, thereby evaluating both task completion and privacy over-disclosure during tool use. The benchmark contains 2,150 cases, including 1,150 fully synthetic privacy-sensitive business workflows and 1,000 cases adapted from existing multi-tool and function-calling benchmarks. Each case is represented by a policy knowledge base. After an agent executes against mock business backends, the evaluator compares recorded tool arguments and backend audit logs with this policy knowledge base. The evaluation covers nine widely used agents to characterize purpose-bound privacy over-disclosure. The results show that successful tool execution does not imply appropriate privacy disclosure: an agent may complete a task while transmitting unnecessary private information through intermediate tool calls. ToolPrivacyBench therefore formalizes a need-to-know disclosure boundary, under which each tool should receive only the information necessary for its stated purpose, and uses trajectory-level auditing to identify privacy over-disclosure in multi-tool workflows.
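The trajectory-level audit described above can be pictured with a small sketch. It is not ToolPrivacyBench's evaluator: the `ToolCall` record, the `policy` mapping, the atom names, and the exact-string matching rule are all hypothetical, chosen only to show a need-to-know check over recorded tool arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One recorded step of an agent trajectory (hypothetical record shape)."""
    tool: str
    arguments: dict  # argument name -> value sent to the tool


def find_over_disclosures(policy, private_atoms, trajectory):
    """Return (step, tool, atom) triples where a private atom reached a tool
    that the policy does not authorize to receive it."""
    violations = []
    for step, call in enumerate(trajectory):
        allowed = policy.get(call.tool, set())
        sent_values = {str(value) for value in call.arguments.values()}
        for atom_name, atom_value in private_atoms.items():
            # Assumption: an atom counts as disclosed if its exact value
            # appears as an argument value. A real audit would need fuzzier matching.
            if atom_value in sent_values and atom_name not in allowed:
                violations.append((step, call.tool, atom_name))
    return violations


# Hypothetical task: the invoice tool may see the email, the courier tool nothing private.
private_atoms = {"customer_email": "ana@example.com", "diagnosis": "asthma"}
policy = {"send_invoice": {"customer_email"}, "book_courier": set()}
trajectory = [
    ToolCall("send_invoice", {"to": "ana@example.com", "amount": 40}),
    ToolCall("book_courier", {"note": "asthma", "city": "Lyon"}),
]

violations = find_over_disclosures(policy, private_atoms, trajectory)
print(violations)
```

In this toy trajectory both tool calls can succeed, yet the second one carries an atom its tool has no stated need for, which is the kind of case the abstract says task-completion metrics miss.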
Cyber deception research has focused on improving honeypot deception capabilities to increase attacker engagement and extend their interactions to collect more and better intelligence. For SSH honeypots, this relies on the assumption that attackers log in, open a shell, and type. We tested whether this still held by deploying eleven SSH honeypots that served both interactive and non-interactive session requests for fifteen days. We collected 177,622 authenticated sessions and validated our results against an independent Cowrie dataset over the same time window. We found that 99.23% of sessions were non-interactive. Interactive sessions account for only 0.10%. The same pattern held in the comparative third-party dataset used for evaluation. This finding is important because a honeypot that focuses on interactive shells or evaluates success based on session length and the number of commands can miss most authenticated attacks and draw the wrong conclusions about what attackers do after login.
Threshold private set intersection (TPSI) allows parties to reveal their intersection only when its cardinality reaches a prescribed threshold. Existing quantum TPSI protocols typically rely on a third party (TP) to interpret the final results, which deviates from the cardinality-testing paradigm of TPSI. In this paper, we propose a quantum multiparty TPSI protocol with explicit cardinality testing. Our protocol develops a rotation-based quantum construction in which single-photon sequences are sequentially processed through participant-side data rotations, TP--participant masking rotations, and correlated aggregate rotations. This design produces hidden-label measurement vectors: TP can complete the final measurement, but cannot interpret the semantic meaning of the outcomes. Based on these hidden measurements, we further realize the threshold decision through an oblivious linear evaluation (OLE)-based inner product procedure and a lightweight garbled circuit, revealing only \(\mathbf 1[|\bigcap_i X_i|\ge τ]\) before conditional intersection reconstruction. We prove the correctness and security of the proposed protocol, and further validate its feasibility through quantum-circuit simulations implemented on the IBM \textsf{Qiskit} platform.
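As a point of reference, the input/output behaviour that a TPSI protocol is meant to realize can be written in plaintext. The sketch below is only that ideal functionality, with no privacy and nothing quantum; the function name and return shape are assumptions made for illustration.

```python
def threshold_psi_reference(party_sets, tau):
    """Plaintext reference for threshold PSI.

    Returns (reached, intersection): `reached` is the indicator
    1[|intersection of all X_i| >= tau]; the intersection itself is
    returned only when the threshold is reached, otherwise None.
    """
    intersection = set.intersection(*map(set, party_sets))
    reached = len(intersection) >= tau
    return reached, (intersection if reached else None)


# Three parties with a common intersection {2, 3}.
parties = [{1, 2, 3}, {2, 3, 4}, {9, 3, 2}]
print(threshold_psi_reference(parties, tau=2))
print(threshold_psi_reference(parties, tau=3))
```

A secure protocol has to produce the same two outputs while revealing nothing else about the parties' sets.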
LLM-based SSH honeypots can generate believable interactions, but evaluations show they remain somewhat identifiable to determined attackers, indicating the need for better scaffolding. We present a new LLM-based honeypot design that uses a multi-agent, multi-LLM architecture to address the limitations of the previous shelLM LLM honeypot. Our honeypot, called AdvancedShelLM, uses two LLM agents, a Manager and a Worker, that better understand the commands while reducing incorrect responses and increasing deception. It implements an advanced permanent filesystem which, for the first time, allows many simultaneous attackers to see the same changing files. It was evaluated with: (i) unit tests for generative capabilities, (ii) an AI attacker (ARACNE) to assess realism and deception, (iii) human attackers to assess its deceptive capability, and (iv) an Internet deployment to evaluate deception in real-world attacks. In unit test results, AdvancedShelLM achieved a pass rate of up to 99.02%. The AI attacker ARACNE had difficulty deciding whether the system was a honeypot, but showed a slight bias towards answering honeypot, even for a real Ubuntu shell. With human attackers, AdvancedShelLM deceived more humans than Cowrie, but performed similarly to shelLM. The Internet deployment showed concrete evidence that the output of AdvancedShelLM can influence the behaviour of real-life attackers.
Dense embeddings underpin semantic search and RAG, yet a leaked vector store hands much of the underlying text back to whoever holds it. The attacks that make this possible (few-shot alignment, zero-shot inversion, unsupervised cross-space translation) share one weakness: the protected store is a single global geometry that can be aligned to a known one. A secret global rotation, the usual lightweight defence, is no exception: orthogonal Procrustes recovers it once the attacker has about the subspace dimension in known pairs. We introduce Shard, a retrieval-preserving embedding transform that removes this weak axis. The centred embedding is split into a short public prefix (for stage-1 retrieval) and a private residual sharded into C cells under separate secret keys; the residual is reranked under CKKS, where the keys cancel and leave the inner product exact. A single parameter C runs the design from the global-linear baseline it replaces (C=1) to per-document micro-keys (C=N). Because the rerank is full-dimensional, Shard returns the raw-space nDCG@10 that half-SVD truncation gives up; and because the residual is keyed cell-locally, mapping it back to a common frame under a diffuse known-plaintext leak costs roughly C times more anchors (median 200 to 102,400 at C=256), for a few encrypted queries. The short public prefix leaks far less neighbour structure, and a micro-key limit drives the residual graph to zero with an unlinkable, renewable template. The barrier holds against learned, non-linear and unsupervised aligners, and where a matched-utility noise defence de-anonymises almost every probe, Shard de-anonymises none. We are plain about the limits: within a cell the keys cancel, a targeted attacker needs only about d_priv anchors, and an overlapping reference corpus still leaks through the prefix. Shard is an attack-aware geometric defence, not a cryptographic guarantee.
Cyber deception research often assumes that a decoy can be placed wherever there is attacker behavior. This work tests that assumption across MITRE ATT&CK v18.1. We introduce a four-criterion rubric for infrastructure deception and apply it to all 250 ATT&CK techniques. The rubric evaluates whether a defender-controlled decoy can be placed, whether an attacker is likely to interact with it, what intelligence that interaction can yield, and whether the interaction reliably indicates malice. The resulting deception surface is sparse: only 80 techniques (32%) admit a decoy the attacker could plausibly reach. For the remaining 170 techniques, there is no defender-controlled asset in the attacker's path that can be fabricated as a decoy. Decoy placement across those 80 techniques falls into two patterns we call Sweep and Seek. In Sweep, the attacker moves broadly through assets in range and encounters the decoy as part of that activity. In Seek, the attacker looks for a specific kind of asset and interacts with a fabricated version of it. These patterns give a simple placement rule: a decoy must either sit on a sweep path or imitate a sought asset. We also show that decoys usually have useful intelligence potential, but whether an attacker interacts with them at all, and whether that interaction reliably indicates malice, both vary. We release the rubric, decision rules, and per-technique assessment as an auditable baseline for future deception research and deployment planning, and show that infrastructure decoys cannot be assumed to apply to all attacker behavior.
Given \(H\leq G\) finite abelian groups, a transversal \(T\subseteq G\) for \(G/H\) has fixed size \(|G/H|\), but its ambient difference support \(D(T)=T-T\) can vary with the embedding of \(H\) in \(G\). We call \(δ(G,H)=\min_T |D(T)|\) the transversal difference number of the pair \((G,H)\). This invariant is related to finite abelian factorisation, tiling complements, and small-sumset questions, and is motivated by recent work regarding ambient Galois labels in CRT transforms for cyclotomic-subfield homomorphic encryption. We prove various results regarding this invariant, including a general lower bound \(δ(G,H)\geq 2|G/H|-m(G,H)\), where \(m(G,H)\) is the largest order of a subgroup of \(G\) disjoint from \(H\). The bound is sharp for cyclic quotients, and Kneser's theorem gives a cross-transversal estimate leading to exact product families with one nonsplit cyclic coordinate and arbitrary split factors. These results isolate the first genuinely new residual obstruction, namely the same-prime square plane \[ G=(\mathbb Z/p^2\mathbb Z)^2,\qquad H=pG. \] For odd \(p\), this case is the technical core of the paper. Here transversals are graphs of functions \(\mathbb F_p^2\to \mathbb F_p^2\), and \(D(T)\) decomposes into carry-corrected finite-field derivative images. We conjecture that \[ δ(G,H)=(2p-1)^2 \] for all odd primes \(p\), prove the unconditional lower bound \(3p^2-p-1\), and give small-prime, probabilistic, and fixed-polynomial evidence for the conjecture.
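For very small cyclic groups the invariant can be computed by exhaustive search, which makes the definition concrete. The sketch below is an illustration, not the paper's method: it takes \(G=\mathbb Z/n\mathbb Z\) and \(H=dG\) with \(d\) dividing \(n\) (so \(|G/H|=d\)), and it reads "disjoint from \(H\)" as "meeting \(H\) only in 0", which is an assumption about the abstract's wording. All function names are hypothetical.

```python
from itertools import product


def cosets(n, d):
    """Cosets of H = dZ/nZ in Z/nZ, assuming d divides n (index d)."""
    return [[(r + k * d) % n for k in range(n // d)] for r in range(d)]


def difference_support(T, n):
    """D(T) = T - T inside Z/nZ."""
    return {(a - b) % n for a in T for b in T}


def transversal_difference_number(n, d):
    """delta(G, H): minimum |D(T)| over all transversals T (brute force)."""
    return min(
        len(difference_support(T, n))
        for T in product(*cosets(n, d))  # one representative per coset
    )


def largest_disjoint_subgroup_order(n, d):
    """m(G, H): largest order of a subgroup of Z/nZ meeting H only in 0."""
    H = set(range(0, n, d))
    best = 1  # the trivial subgroup always qualifies
    for g in range(1, n):
        if n % g == 0:  # subgroups of Z/nZ are generated by divisors of n
            K = set(range(0, n, g))
            if K & H == {0}:
                best = max(best, len(K))
    return best


for n, d in [(4, 2), (6, 3)]:
    delta = transversal_difference_number(n, d)
    bound = 2 * d - largest_disjoint_subgroup_order(n, d)
    print(n, d, delta, bound)
```

The search enumerates \((n/d)^d\) transversals, so it is only usable for tiny cases; the same-prime square plane with \(p=3\) already has \(9^9\) transversals and is out of reach for this naive loop.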
The widespread collection of fine-grained location data by commercial data brokers creates a re-identification risk that is not widely recognised by the public. While prior research has established that mobility traces are highly unique and that individuals can, in principle, be identified from a handful of spatio-temporal points, such attacks have historically required significant manual effort from skilled analysts, limiting their practical scale. In this feasibility study, we demonstrate in a real-world setting that agentic AI fundamentally changes this threat model. We present an end-to-end pipeline in which large language model agents autonomously search the open web, cross-reference public records and social media, and resolve raw coordinate sequences to candidate identities - without human intervention. We evaluate the pipeline on a spatio-temporal dataset containing simulated location points anchored at and around true home and work addresses, focusing on a high-risk disclosure scenario. Our results demonstrate that, from spatio-temporal data and public sources alone, our agentic AI successfully re-identified 18 of the 25 re-identifiable individuals (72%) and 18 of 43 cases overall (41.9%). We discuss implications for Statistical Disclosure Control (SDC) practice and outline the near-future escalation that data custodians and regulators must anticipate. De facto anonymity - an implicit foundation of SDC practice - is shifting. Agentic AI strengthens the case that re-identification is reasonably likely by any means under the GDPR Recital-26 standard, at costs of minutes-and-dollars per target.
Performance numbers reported for hardware are accepted on trust: the reader cannot recompute them, the apparatus is gone, and the silicon itself can be silently wrong, with fleet studies reporting on the order of one core in a thousand returning incorrect arithmetic with no error raised. We make a reported hardware measurement a tamper-evident, independently checkable record. Every quantity in the text, a table, or a figure is bound, by its content hash, to the observation and the verification behind it; the whole is a hash-linked, append-only structure (a transparency log for measurement) that a verifier audits offline without trusting its producer. Matrix products are verified by a probabilistic identity (Freivalds) at O(k n^2) cost under a tolerance we derive from floating-point error analysis and calibrate to the device's own measured residual floor, so a wrong product is rejected with probability 1 - 2^(-k); quantities with no such identity carry an algebraic checksum and a measured reproducibility class. We then treat the check itself as a security object: a probe seed committed for offline reproducibility is an attack surface, and a probe-aware adversary can hide a corruption in the probe's null space, fooling even a quorum of bit-identical witnesses, while a Fiat-Shamir challenge derived from the claimed output closes this. Driving the device from an unprivileged tenant's reach, with a di/dt power virus and a thermal soak, neither moves the calibrated tolerance nor produces a silent error, placing the physical-fault threat at the rare defective part or the privileged attacker and marking the boundary at which the record must compose with a hardware root of trust. We demonstrate the construction across Blackwell and Hopper GPUs and report a residual-floor and reproducibility map by precision, size, and device.
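The Freivalds identity mentioned above can be sketched in a few lines. This is a textbook version in pure Python, not the paper's construction: the `tol` parameter stands in for the calibrated floating-point tolerance the abstract describes, and the probe vectors come from a local random generator rather than from a Fiat-Shamir challenge derived from the claimed output.

```python
import random


def mat_vec(M, v):
    """Matrix-vector product, O(n^2) for an n-by-n matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def freivalds(A, B, C, k=20, tol=0.0, rng=None):
    """Probabilistically check the claim A @ B == C in O(k n^2) time.

    Each round draws a random 0/1 vector r and compares A(Br) with Cr.
    A wrong C survives one round with probability at most 1/2, so it is
    rejected with probability at least 1 - 2**(-k) over k rounds.
    `tol` is a hypothetical absolute tolerance for floating-point inputs.
    """
    rng = rng or random.Random()
    n = len(C[0])
    for _ in range(k):
        r = [rng.randint(0, 1) for _ in range(n)]
        lhs = mat_vec(A, mat_vec(B, r))
        rhs = mat_vec(C, r)
        if any(abs(x - y) > tol for x, y in zip(lhs, rhs)):
            return False  # a witness that the product is wrong
    return True  # no round found a discrepancy


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
good = [[19, 22], [43, 50]]  # the true product A @ B
bad = [[19, 22], [43, 51]]   # one corrupted entry

print(freivalds(A, B, good))
print(freivalds(A, B, bad, k=40))
```

The check never forms the product A @ B itself, which is why it costs O(k n^2) rather than the cost of a full matrix multiplication. A probe seed that is known in advance lets an adversary place the corruption where every probe vector misses it, which is the weakness the abstract addresses by deriving the challenge from the claimed output.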
Traditionally, the architecture of high-performance computing (HPC) systems is tailored for speed, while highly secure computer systems must sacrifice speed for security. However, a wide range of scientific domains, such as the life sciences, call for a combination of performance and security to allow processing sensitive data at scale. Here, we present RAMSES (Research Accelerator for Modeling and Simulation with Enhanced Security), an HPC system designed from the ground up to deliver high performance within a robust security framework. RAMSES integrates hardware-based memory encryption of AMD processors with state-of-the-art file encryption from IBM Storage Scale and the Thales CipherTrust manager, establishing an HPC platform that ensures continuous encryption throughout the data life cycle - at rest, in transit, and in use - in compliance with major data protection standards (European General Data Protection Regulation, ISO/IEC 27001 certification, and Federal Information Processing Standards). In addition, we implemented advanced operating system hardening, a multi-layered security architecture, and mandatory multi-factor authentication to adapt the HPC environment to increased security demands. Benchmark results from the biomedical sector demonstrate that the performance impact of the secure environment is limited and that the conflicting requirements of speed and security can be integrated while preserving a coherent, flexible, and user-friendly system.
Time-triggered communication protocols rely on trusted components known as guardians to enforce adherence to predetermined network schedules. Network-agnostic guardians offer an efficient and scalable distributed solution with reduced implementation cost and complexity compared to network-aware alternatives. However, this efficiency is based on the guardian's dependence on the controlled node for clock synchronization, which introduces a vulnerability: a malicious node can exploit this dependency to launch timing attacks against its guardian and eventually interfere with messages from other nodes on the network. In this paper, we establish a theoretical lower bound on the attainable clock synchronization precision between a node and its network-agnostic guardian. Building on this result, we introduce a timing attack that leverages the unavoidably imperfect clock synchrony to cause controlled and undetected de-synchronization of the guardian. The attack enables a malicious node to cause collisions with targeted critical network messages. We evaluate the effectiveness of the attack using a FlexRay field bus network model implemented in the OMNeT++ simulation framework. Our results show that the attack is able to remain undetected with 100% success and disrupts the transmission of the critical messages of the target node by causing collisions with them with 100% success.
Fuzzy Labeled Private Set Intersection (FLPSI) lets a receiver learn the labels of enrolled records similar to its query, and nothing else. Constructions based on a set-threshold reduction reach practical performance: a query matches a record when the two agree on a threshold number of components, and the private matching is delegated to an inner set-threshold kernel. We study its homomorphic form, which combines leveled-BFV homomorphic encryption (HE), a garbled circuit, and secret sharing to decide the match under encryption and release the record's label. We identify a composition gap in this kernel: efficiency is bought with a per-trial false-accept probability, but one query runs a trial for every record, so the error compounds with the database size into the kernel's realization soundness error (RSE), the rate at which it accepts a query the plaintext matcher would reject. The RSE is a reliability property of the cryptographic matching layer, not the matcher's accuracy, and a sound kernel must contribute zero or negligible RSE of its own. We formalize this as a composable security property, give a closed-form bound on the receiver's advantage, and close the gap with CSTPSI, a kernel that runs independent token rounds and raises the per-trial bound to a matching power. We prove CSTPSI secure in the semi-honest model. The bound sets the round count: two token rounds suffice for million-scale databases and three for billion-scale at the $10^{-6}$ engineering threshold. Our evaluation confirms this: at a million records the baseline kernel's RSE reaches 100% while CSTPSI holds it at 0 in every measured configuration. For large labels at small to moderate scale CSTPSI is more than 20x faster than the baseline, with up to 93% less communication, converging to the baseline only at million-scale. Our implementation, with a one-command reproducibility harness, is publicly available.
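The compounding effect described above can be illustrated with a short calculation. The sketch assumes independent trials and a per-trial false-accept probability `p` that is purely hypothetical (the abstract does not state one), and it models the effect of `rounds` independent token rounds as raising the per-trial probability to that power. It is not the paper's closed-form bound.

```python
import math


def compounded_error(p, n_records, rounds=1):
    """P(at least one false accept) over n independent trials, each with
    false-accept probability p**rounds: 1 - (1 - p**rounds)**n."""
    per_trial = p ** rounds
    # log1p/expm1 keep precision when per_trial is tiny.
    return -math.expm1(n_records * math.log1p(-per_trial))


def rounds_needed(p, n_records, threshold=1e-6, max_rounds=16):
    """Smallest number of rounds that keeps the compounded error <= threshold."""
    for r in range(1, max_rounds + 1):
        if compounded_error(p, n_records, r) <= threshold:
            return r
    raise ValueError("threshold not reachable within max_rounds")


p = 2.0 ** -20  # hypothetical per-trial false-accept probability
for n in (10**6, 10**9):
    print(n, compounded_error(p, n), rounds_needed(p, n))
```

With this assumed `p`, a single round at a million records gives an error above one half, while `rounds_needed` returns 2 for a million records and 3 for a billion. That matches the round counts the abstract reports, but only because `p` was chosen to make the illustration line up.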
The rapid spread of fake news poses increasing threats to information ecosystems, especially as AI-generated misinformation under Generative Engine Optimization (GEO) poisoning allows adversarially crafted content to be systematically surfaced by retrieval systems, contaminating LLM reasoning. In this paper, we propose Tree of Evidence (ToE), a hierarchical evidence reasoning framework for automated fact-checking that models each claim as a dynamically expanding argument tree. ToE integrates a reinforcement learning-driven multi-source retrieval agent, an evidence evaluation agent, and an argument tree aggregation algorithm to iteratively decompose, retrieve, and verify claims through an explainable evidence chain. We further provide a theoretical analysis of the retrieval process, deriving a formal error bound that guarantees the learned policy converges to a neighborhood of the information-theoretically optimal policy. Experiments across multiple datasets and backbone LLMs demonstrate that ToE achieves improvements ranging from 4 to 24 percentage points over competitive baselines, with particularly pronounced gains on adversarially poisoned inputs.
TinyML models deployed on edge devices are increasingly adopted in safety/security-critical applications, making them a prime target for adversarial example (AE) attacks where inputs are modified to cause misclassifications. However, existing AE detection methods either require white-box model access, which is often unavailable in licensed black-box deployments, or rely on input pre-processing stages that add non-trivial latency and resource overhead, often exceeding what mission-critical applications can afford on their inference path. To address these challenges, we propose AdvScan, a runtime power analysis-based methodology for AE detection that operates in a black-box scenario while inducing minimal latency. AdvScan is based on the observation that AEs produce anomalous neuron activations, which in turn generate distinctive power-consumption signatures. The algorithm initially constructs a baseline distribution of power signatures from known benign inputs; then, at runtime, it applies a one-sample t-test to determine whether a test input's power signature significantly deviates from this baseline, thereby detecting AEs. We evaluated AdvScan using three adversarial example generation algorithms: Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), and Carlini-Wagner (C&W), on three MLPerf Tiny benchmark models implemented on two target devices: the STM32F303RC (ARM Cortex-M4) and STM32L562RE (ARM Cortex-M33) microcontrollers. Across 318,400 total test inputs, AdvScan detects 99.984% of AEs with only 40 false negatives and zero false positives. These results demonstrate the viability of power-based AE detection for secure, accuracy-critical TinyML deployments in black-box environments.
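The detection step can be sketched as a one-sample t-test against a benign baseline. This is a minimal illustration and not AdvScan itself: the power readings, the choice of a per-input sample of readings as the feature, and the critical value are all hypothetical.

```python
import math
import statistics


def t_statistic(samples, mu0):
    """One-sample t statistic for H0: the mean of `samples` equals mu0."""
    n = len(samples)
    standard_error = statistics.stdev(samples) / math.sqrt(n)
    return (statistics.mean(samples) - mu0) / standard_error


def is_adversarial(samples, baseline_mean, critical_t):
    """Flag the input when its power readings deviate significantly from
    the benign baseline mean (two-sided test)."""
    return abs(t_statistic(samples, baseline_mean)) > critical_t


# Hypothetical power readings in mW collected from known benign inputs.
benign_baseline = [10.0, 10.2, 9.9, 10.1, 9.8]
baseline_mean = statistics.mean(benign_baseline)

CRITICAL_T = 5.0  # illustrative threshold; in practice it follows from the
                  # chosen significance level and the degrees of freedom

benign_trace = [10.1, 9.9, 10.0, 10.2, 9.8]
suspect_trace = [11.0, 11.2, 10.9, 11.1, 10.8]

print(is_adversarial(benign_trace, baseline_mean, CRITICAL_T))
print(is_adversarial(suspect_trace, baseline_mean, CRITICAL_T))
```

The check needs only power measurements and a stored baseline, which is what makes it usable without model internals. A trace with zero variance would make the standard error zero, so a real implementation would have to handle that case.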