Browse, search, and filter preprints from arXiv—fast, readable, and built for curious security folks.
Decentralized Finance (DeFi) applications rely heavily on the order in which transactions are executed, making them susceptible to reordering attacks that enable adversaries to extract Blockchain Extractable Value (BEV). While linear blockchain systems such as Ethereum have inspired extensive research into fair ordering mechanisms, DAG-based consensus protocols have remained largely unprotected despite their growing adoption for scalability and performance. In this paper, we introduce Tilikum, a DAG-based ledger protocol that ensures fair transaction ordering without relying on weak edges. Tilikum achieves ordering linearizability by leveraging median-based timestamp aggregation, or batch order fairness, while maintaining low data redundancy and robust garbage collection. We implemented Tilikum in Rust and evaluated it against representative baselines, namely Narwhal/Tusk, Pompē, Themis and FairDAG. Our results show that Tilikum achieves up to $39\times$ higher throughput than other fair-ordering baselines, while fully blocking state-of-the-art DAG-specific reordering attacks.
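The median-based timestamp aggregation mentioned in the Tilikum abstract can be illustrated with a minimal Python sketch. This is not Tilikum's protocol: the function names, the integer timestamps, the use of the low median, and the tie-break on transaction id are all assumptions made for illustration. The idea shown is only that each transaction receives the median of the receive times reported by replicas, so a single outlying report cannot move the assigned timestamp outside the range of the other reports.

```python
from statistics import median_low

def assign_timestamps(observations):
    """Map each transaction id to the low median of its reported timestamps.

    observations: dict of tx_id -> list of receive timestamps, one per replica
    (hypothetical input format, not taken from the paper).
    """
    return {tx: median_low(stamps) for tx, stamps in observations.items()}

def fair_order(observations):
    """Order transactions by median timestamp, breaking ties by tx id."""
    stamped = assign_timestamps(observations)
    return sorted(stamped, key=lambda tx: (stamped[tx], tx))

# Four replicas report when they first saw each transaction.
observations = {
    "tx_a": [10, 12, 11, 50],  # one replica reports an implausibly late time
    "tx_b": [13, 14, 1, 15],   # one replica reports an implausibly early time
    "tx_c": [20, 21, 22, 23],
}

print(assign_timestamps(observations))
print(fair_order(observations))
```

In this example the outlying reports (50 for `tx_a`, 1 for `tx_b`) do not change the resulting order, which follows the bulk of the reported receive times.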
Impagliazzo's five worlds classify computational assumptions along a single axis, the existence of cryptographic primitives. All five worlds implicitly assume that every party, including the adversary, observes the full input, that the observer is always $O_{top}$. This assumption is so natural that it is never stated. This work makes it explicit and relaxes it by introducing a second, orthogonal axis, the observational axis, defined by the observer hierarchy introduced in previous work. Relaxing the assumption reveals structural phenomena, such as the collapse $P^{O_{prof}} = NP^{O_{prof}} \subset P$, that the five-world framework cannot express. We prove that this collapse holds unconditionally in all five worlds, showing that observational blindness and computational hardness are independent. We define the Observer World $W_O$, classify all world-observer pairs, identify the labeled cells (a)--(d), and introduce a parametric family $W_O^{\varepsilon}$ modelling partial violations of observational invariants. The framework also interfaces with physical information limits, including thermodynamic, quantum, and cosmological bounds.
We introduce PRISM (PE Relational Inter-Section Matrix), an open dataset and feature representation for static Windows PE malware detection. Existing benchmarks such as EMBER, BODMAS, and SOREL-20M represent each PE file as a flat one-dimensional feature vector, discarding the ordering of sections and the relational context between them. PRISM instead encodes every binary as a two-dimensional matrix whose rows are individual PE sections in file order, with a global summary row that preserves compatibility with EMBER-style models. We build the corpus from four malware sources (BODMAS, MalwareBazaar, VirusShare, and CAPE) together with SOREL-20M benign software, yielding 83,633 deduplicated matrices and a family-filtered analysis corpus of 49,204 samples across 684 malware families. A formal separability analysis (Fisher Discriminant Ratio, mutual information, and inter-section information gain) shows that the per-section positional structure carries discriminative information that flat representations cannot capture. Under a strictly controlled, sample-matched comparison, a gradient-boosted classifier on the compact PRISM representation recovers nearly all of the binary-detection performance of the same classifier on the much larger EMBER vector, at roughly one-sixth the dimensionality; EMBER retains only a small, consistent advantage confined to the extreme low-false-positive regime, the two being operationally indistinguishable at the decision threshold. We are explicit that this binary task is saturated, so the structural content PRISM preserves is reserved for tasks with greater metric headroom, such as family classification and architectures that exploit the 2D structure directly. The dataset, extraction library, trained models, and full analysis pipeline are released under CC BY-NC-SA and MIT licences.
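To make the PRISM representation more concrete, the following Python sketch builds a small two-dimensional matrix with one row per section in file order plus a global summary row. It is a hedged illustration only: the four per-section features (raw size, virtual size, byte entropy, executable flag), the zero padding, the fixed row count, and the placement of the summary row at the end are assumptions, not the feature set or layout of the released extraction library.

```python
import math
from collections import Counter

WIDTH = 4  # number of per-section features in this sketch (assumption)

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def section_row(data: bytes, virtual_size: int, executable: bool):
    """One matrix row describing a single section."""
    return [float(len(data)), float(virtual_size),
            byte_entropy(data), 1.0 if executable else 0.0]

def build_matrix(sections, max_sections=4):
    """Rows follow file order; short files are zero-padded; summary row is last.

    sections: list of (raw_bytes, virtual_size, executable) tuples
    (hypothetical input format).
    """
    rows = [section_row(*s) for s in sections[:max_sections]]
    rows += [[0.0] * WIDTH for _ in range(max_sections - len(rows))]
    all_bytes = b"".join(s[0] for s in sections)
    summary = [float(len(all_bytes)),
               float(sum(s[1] for s in sections)),
               byte_entropy(all_bytes),
               1.0 if any(s[2] for s in sections) else 0.0]
    return rows + [summary]

# Two toy "sections": constant bytes (entropy 0) and all 256 byte values (entropy 8).
sections = [(b"\x00" * 16, 32, True), (bytes(range(256)), 256, False)]
matrix = build_matrix(sections)
for row in matrix:
    print(row)
```

Because section rows keep their file order, a model consuming this matrix can see, for example, that a high-entropy section follows an executable one, which a flat vector of aggregate statistics would not expose.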
We present a novel approach for applying Large Language Models (LLMs) to threat assessment in the context of foreign peacekeeping missions. Building on the PINPOINT project and its use case, the EU Monitoring Mission in Georgia, we combine an interdisciplinary risk-model with OSINT-based media collection and LLM-supported threat extraction. The proposed workflow maps media contents to mission-relevant threats, extracts structured information and applies several additional LLM-based processing steps to improve relevance and grounding. An evaluation of threats extracted from media documents shows high agreement between automatically generated results and human judgment for core aspects such as threat and mission relevance. These results indicate that LLMs provide a promising approach to support analysts in the context of peacekeeping missions.
Privacy is one of the fundamental rights of individuals in modern societies. Yet, the practical adoption of privacy-preserving technologies in daily interactions remains limited. Zero-knowledge proofs offer strong privacy guarantees but are often hindered by their technical complexity. In this paper, we advance the idea of verifiable QR codes, which enable offline verifiers to verify the proofs encoded in them. Based on this core idea, we build a novel QR-driven zkSNARK proof verification framework (i.e., zQR) for mobile platforms. The framework integrates a blockchain for auditability, non-repudiation, and logging, and large language models for automatic circuit generation. We discuss the security of the framework by considering multiple attack surfaces. Furthermore, we present an experimental evaluation measuring temporal costs (proof generation and verification latency, QR code encoding and decoding latency) and financial costs (blockchain gas consumption). Our results demonstrate the feasibility of zQR as a proof-of-concept framework for privacy-preserving verification on mobile platforms, where proofs are compactly represented as version 19 QR code symbols at a low error correction level. Finally, we discuss potential applications, current limitations, and future directions for the broader adoption of privacy-preserving technologies in daily interactions.
LLMs fine-tuned for security classification are usually evaluated on held-out examples from the same distribution as their training data. We show that this can miss vulnerabilities introduced by fine-tuning itself: models can learn token-level indicator semantics that preserve canonical accuracy while failing under behavior-preserving transformations such as PowerShell alias substitution, command reconstruction, string construction, execution indirection, and case mutation. We study Foundation-Sec-8B-Instruct and its base model, Llama-3.1-8B-Instruct, on matched PowerShell classification cohorts. Causal interventions localize the classification circuit to a late-attention route inherited from Llama rather than created by fine-tuning. Fine-tuning concentrates and semantically specializes this inherited structure, improving baseline behavior while creating transformation-sensitive attack surfaces. A three-tier evasion benchmark finds Foundation-Sec misses on iwr substitution, Invoke-Expression reconstruction, and case-mutated Invoke-Expression/IEX variants that Llama does not share. We also derive a pre-deployment monitoring method: a linear probe at the classification boundary and an indicator-token sign test identify command families where canonical indicators change role after fine-tuning. These signals prioritize red-team variant generation using only canonical inputs. These results caution against treating small task-specific fine-tunes as straightforwardly safer security classifiers: specialization can convert inherited model structure into brittle indicator rules that preserve held-out accuracy while expanding the evasion surface. Robust AI-enabled security will require specifying the full transformation space of the task and monitoring semantic drift through fine-tuning.
We develop a type system for secure information flow where new security levels can be created and inserted into the security lattice dynamically, i.e., even in the middle of an execution of a system. Our system is formalized by extending Kobayashi's type-based secure information flow analysis for Milner's pi-calculus, which is one of the most expressive models (or "languages") supporting both sequential and concurrent computations, with concise syntax, reduction-based semantics, and bisimulation equivalence as a robust formalization of secrecy as non-interference. The development required careful treatment of extensions of lattices themselves as well as deliberate generalization from the simple 2-element lattice (consisting of only High and Low) in the original system.
Physical layer authentication (PLA) authenticates a user by comparing measurements over time, either assuming that they are consistent over time or modeling their evolution. However, these assumptions become problematic when devices are in motion and in indoor environments, due to multipath propagation and obstructions. In this paper, we propose a PLA mechanism for moving devices in indoor environments, where multiple access points (APs) estimate the path loss (PL) and angle of arrival (AoA) of the dominant channel tap from the received signals and compare them with previously collected channel knowledge maps (CKMs). Specifically, the measurements are compared to the CKM entries in the neighborhood of the previously known position. A comprehensive security analysis is conducted under both random and optimal attacks. Numerical results in a representative indoor scenario, with CKMs obtained via ray tracing, validate the effectiveness of the proposed PLA approach.
Unmanned aerial vehicle (UAV) swarms rely on distributed coordination and cooperative communication to support scalable operations, extended coverage, and applications such as surveillance and real-time data exchange. Wireless technologies such as radio frequency (RF) and WiFi are widely used for UAV-to-UAV and UAV-to-ground control station (GCS) communication but introduce significant security challenges. MAVLink, the predominant communication protocol in UAV systems, provides message integrity and authentication but lacks built-in encryption, leaving telemetry traffic vulnerable to eavesdropping. In our previous work, we proposed MAVShield, a lightweight encryption framework for MAVLink communications. In this paper, MAVShield, AES-CTR, Speck-CTR, ChaCha20, and Rabbit are integrated into four custom-built UAVs to establish secure communication links over RF and WiFi channels. Their performance is evaluated through flight experiments using a UAV swarm testbed. Encrypted telemetry data enable autonomous formation control and collision avoidance during flight. For collision avoidance, we develop a modified artificial potential field (APF) algorithm that computes attractive and repulsive forces directly in geodetic coordinates, eliminating Cartesian transformations and reducing trajectory oscillations while avoiding local-minimum trapping. CPU utilization, memory consumption, and packet delivery ratio (PDR) are measured for each encryption scheme. Results show that MAVShield achieves performance comparable to unencrypted communication while outperforming AES-CTR, Speck-CTR, ChaCha20, and Rabbit in overall efficiency. Algebraic cryptanalysis and Wireshark-based traffic analysis demonstrate resistance to key-recovery attacks and protection of telemetry confidentiality. The results indicate that MAVShield is an efficient and secure solution for UAV swarm communication.
With the rapid evolution of LLM-driven agents, the Model Context Protocol (MCP), an open protocol bridging LLMs with external tools, has quickly become foundational to modern agent ecosystems. However, the expanding adoption of MCP has also introduced novel security concerns such as the Tool Poisoning Attack (TPA), which exploits LLM-server interactions to inject malicious prompts. Existing poisoning schemes typically adopt a monolithic plaintext embedding paradigm, which fails to withstand manual inspection or automated detectors. Current research still lacks a systematic analysis of multi-tool poisoning, where multiple tools can be exploited cooperatively to disperse detection risk. In this paper, we introduce ShareLock, a multi-tool threshold poisoning framework that utilizes Shamir's threshold scheme to ensure exceptional stealth and fault tolerance. ShareLock distributes the malicious instruction as benign-looking secret shares across multiple tool descriptions, achieving both information-theoretic secrecy and attack robustness against moderate auditing. After a covert reconstruction trigger is planted during a server update, the aggregated shares reconstruct the hidden instruction, resulting in critical breaches of system assets or private data. To evaluate the realistic threat of ShareLock, we constructed a comprehensive benchmark encompassing four multi-tool scenarios and conducted extensive experiments across mainstream LLMs on two distinct MCP clients. Our results demonstrate that ShareLock significantly outperforms existing single-tool poisoning strategies in tool description-based detection while maintaining an average attack success rate exceeding 90%.
Apple AirDrop and Google/Samsung Quick Share are proximity file-transfer protocols used by over five billion devices, yet their application-layer security properties remain largely unstudied because both stacks are proprietary and undocumented. Both protocols are reachable from wireless proximity without any prior pairing and process complex serialized content (binary plists, CPIO archives, Protocol Buffers, UKEY2 handshakes) inside privileged daemons, making them attractive zero-click targets across multiple operating systems. We perform the first cross-platform reverse engineering and protocol-aware fuzzing study of both stacks. We reconstruct AirDrop's seven-layer state machine and DVZip adaptive compression from binary analysis, build AIRFUZZ, a protocol-aware fuzzer that mutates pre-compression representations, and complement it with targeted hand-written analyses of Samsung's Quick Share service and Google's Quick Share for Windows. We discover six vulnerabilities (V1-V6): three pre-authentication issues in macOS/iOS AirDrop (V1: Swift fatalError DoS in the HTTP path router; V2: unbounded XML plist recursion in Foundation; V3: NULL dereference in Network.framework's HTTP/1.1 parser), two protocol-layer flaws in Samsung Quick Share (V4: pre-authentication OfflineFrame dispatch; V5: D2D encryption bypass for three frame types), and a heap use-after-free in Google Quick Share for Windows (V6) for which Google awarded a bounty. We responsibly disclosed all findings, and Apple, Samsung, and Google have acknowledged the reports.
With a profusion of jailbreaks for LLMs now widely known, a growing concern is that non-expert malicious actors ("the average Jane") could elicit actionable responses to malicious requests. In this work, we examine whether this concern is justified. A non-expert malicious actor requires two ingredients for a successful attack: a powerful jailbreak for their target model, acting on an effective malicious query. For the former, we propose a novel attack strategy based on the multi-armed bandit framework. This allows efficient online learning of the optimal jailbreak from a large choice set via noisy exploration on a small number of queries, with subsequent application of the learnt policy on an exploitation set. For the latter, we curate $\mathrm{FrankensteinBench}$, a safety benchmark of $11,279$ malicious queries drawn from manual curation over $7$ existing benchmarks, along with automated enhancement and generation. Each query is categorized as simple or complex by the technical expertise required to craft it. Our findings confirm the concern. Our bandit-based attack achieves success rates as high as $97\%$ on average over $15$ SoTA open-weight LLMs. Moreover, adding complexity to queries raises the attack success rate by up to $26\%$ on average across models -- making it an effective, automatable prompting strategy.
AI-assisted vulnerability discovery has proven effective for bug classes like memory safety, where instrumentation confirms memory violations and efficiently filters false positives. Many dangerous vulnerability classes, such as cryptographic misuse, however, lack any comparable instrumentation. In this work, we present Chai, an AI-based system that discovers and validates cryptographic misuse vulnerabilities through naturally occurring signals. To achieve this, Chai rethinks the classical technique of differential testing by leveraging AI to 1) improve precision for detecting real security issues in libraries, and 2) repurpose commonly overlooked discrepancies as leads for tangible vulnerabilities in downstream applications. In doing so, Chai inverts the prevailing paradigm of AI vulnerability discovery: instead of auditing one codebase for many flaws, it catalogs flaws at the library level and propagates them across a cryptographic dependency graph, delivering compounding efficiency gains. We evaluate Chai across X.509, JWT, and SAML libraries. Chai discovered a previously unknown critical vulnerability in an SSL library that powers billions of devices, along with security bugs in one library behind a major web browser and another in major Linux distributions. In total, these techniques surfaced over 100 vulnerabilities.
LLM coding harnesses grant agents broad file and shell access, yet the configuration layer that steers them -- rules files, agent definitions, IDE-specific markdown -- is largely unmanaged. A prevalence study of 10,008 public GitHub repositories (n=6,145 agent config files) finds that agent configurations propagate as undeclared shared components: 10.1% of tracked paths are SHA-256 exact duplicates across independent repositories (fork-adjusted, threshold-independent), with 75.5% of clone pairs crossing organisational boundaries. Two further patterns are indicative: configurations are rarely revised (58% single-commit; 0.4 vs 0.6 commits/month age-normalised against CI/CD workflows), and rarely declare permission boundaries (<1% of agent configs vs 33% of Actions workflows, n=31 true positives). We propose a deterministic control plane above the harness that maps one-to-one to these gaps. Rel(AI)Build treats agent definitions as a managed supply chain (SHA-256 content addressing, HMAC-stamped lockfiles, hash-chained audit logs); enforces tiered permissions and attack-derived blocklists before LLM invocation; gates feature work through a phase state machine with requirement-to-file-to-test traceability; compiles a single canonical definition to seven IDE targets; and detects prompt drift via Jaccard similarity. Conformance tests on injected violations confirm each mechanism enforces its stated invariant; developer outcomes remain future work. Governance of this layer must be deterministic and tool-agnostic -- not delegated to further LLM orchestration.
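Several of the mechanisms named in the Rel(AI)Build abstract (SHA-256 content addressing, HMAC-stamped lockfiles, hash-chained audit logs, and Jaccard similarity for drift detection) are standard building blocks. The Python sketch below shows one plausible way to combine them using only the standard library. It is not the Rel(AI)Build implementation: the function names, the lockfile layout, the all-zero genesis hash, and the whitespace tokenization used for Jaccard similarity are assumptions made for illustration.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first audit record

def content_address(text: str) -> str:
    """SHA-256 hex digest used as the identity of a config file's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def _lock_tag(entries: dict, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the lockfile entries."""
    payload = json.dumps(entries, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def stamp_lockfile(entries: dict, key: bytes) -> dict:
    """Attach an HMAC tag so later edits to the entries are detectable."""
    return {"entries": entries, "hmac": _lock_tag(entries, key)}

def verify_lockfile(lock: dict, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(_lock_tag(lock["entries"], key), lock["hmac"])

def _record_hash(prev: str, event: str) -> str:
    return hashlib.sha256((prev + event).encode("utf-8")).hexdigest()

def append_audit(log: list, event: str) -> None:
    """Append a record whose hash covers the previous record's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": _record_hash(prev, event)})

def verify_audit(log: list) -> bool:
    """Walk the chain; any edited or reordered record breaks it."""
    prev = GENESIS
    for rec in log:
        if rec["prev"] != prev or rec["hash"] != _record_hash(prev, rec["event"]):
            return False
        prev = rec["hash"]
    return True

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace-separated tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

demo_key = b"demo-key-not-for-production"
demo_lock = stamp_lockfile({"rules.md": content_address("never run rm -rf")}, demo_key)
print(verify_lockfile(demo_lock, demo_key))
print(jaccard("run tests before commit", "run lint before commit"))
```

A drift detector along these lines might flag a config whose Jaccard similarity to its locked version falls below a chosen threshold; the threshold itself is a policy decision that the abstract does not specify.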
Third-party vendors, such as analytics platforms, cloud services, identity providers, and software suppliers, are increasingly embedded in digital service delivery. While these arrangements enable scale and specialization, they also move customer data and security-relevant practices into environments that customers rarely see, select, or evaluate. This paper examines this problem through a document analysis of the November 2025 OpenAI-Mixpanel security incident. The incident serves as an illustrative case for showing how a security event in a vendor environment can become a governance and accountability problem for the focal organization that maintains the customer relationship. Drawing on organizational trust research and agency theory, the paper argues that third-party cybersecurity risk is both a trust relationship and a delegation problem. Customers trust the visible service provider, while the provider relies on vendors whose security practices are only partially visible and controllable. The paper develops the concept of transitive trust, where customer trust in a digital service depends on the security practices of vendors authorized by that service provider. It then presents the Fortress and Gatekeeper framework, which explains cybersecurity governance boundaries through trust and data flows rather than formal organizational ownership alone. The analysis develops four propositions concerning vendor integration, metadata exposure, vendor assurance, and data proliferation. The paper contributes to cybersecurity governance scholarship by explaining how delegated data processing creates customer-facing accountability and by identifying implications for vendor tiering, data classification, contractual design, continuous assurance, and data minimization.
Spiking Neural Networks (SNNs) have emerged as a revolutionary paradigm compared to traditional Deep Neural Networks (DNNs) in energy-efficient computing, showcasing exceptional capabilities in processing event-driven sensory data for real-time applications like robotics and edge AI systems. However, unlike the extensively studied DNN copyright solutions, SNN copyright protection remains largely underexplored due to the inherent temporal coding complexities and spike-driven computation of SNNs. In this study, we propose a novel active copyright protection framework for SNNs, named SpikeTimer, based on temporal backdoor learning. SpikeTimer partitions neuromorphic data into designated timeslices and embeds authorized tokens exclusively within authorized slices. Furthermore, this temporal segmentation enables SpikeTimer to support multi-user authorization mechanisms and to accommodate token embedding of arbitrary morphology. Based on this, SpikeTimer responds correctly to authorized data containing a token within the correct timeslice, while producing erroneous responses to unauthorized data. Our key innovation lies in establishing a time-dependent authorization mechanism that protects the SNN copyright through temporal token validity. Additionally, SpikeTimer retains its defensive efficacy even under adversarial attempts. Evaluations on multiple neuromorphic datasets show that SpikeTimer achieves around 10% accuracy on unauthorized data with merely around 1.5% degradation on authorized inputs. Moreover, SpikeTimer demonstrates robust resistance against model finetuning and pruning threats.
Multimodal agentic retrieval-augmented generation (RAG) systems expand the attack surface beyond prompt injection to include text poisoning, image injection, direct-query attacks, and orchestrator-level tool manipulation. Existing red-teaming approaches are typically surface-specific and often recycle known attack templates; on text-poisoning benchmarks we measure 73-84% exact duplication. We present MIRROR, a unified cross-surface framework that performs memory-guided Monte Carlo tree search while conditioning candidate generation on retrieved context under an explicit novelty constraint. A deterministic Novelty Gate rejects any candidate matching the retrieval set under normalized comparison, allowing retrieval to inform search priors without enabling prompt copying. Across four attack surfaces on a multimodal agentic RAG target, MIRROR attains 76% ASR on image poisoning compared with 52% for baselines, 97% ASR on orchestrator attacks at half the query cost, and the lowest cross-surface variance (coefficient of variation 0.47). In contrast, specialized baselines collapse across surfaces: suffix optimization reaches 79% ASR on text poisoning but 1% on direct queries. We release ART-SafeBench with 41,815 in-package records and runtime adapters yielding 41,991+ total records across four surfaces.
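The deterministic Novelty Gate described in the MIRROR abstract rejects candidates that match the retrieval set under normalized comparison. A minimal Python sketch of such a filter follows. The specific normalization (Unicode NFKC, case folding, punctuation removal, whitespace collapsing) and the function names are assumptions; the abstract does not state which normalization MIRROR applies.

```python
import re
import unicodedata

def normalize(text: str) -> str:
    """Canonical form used for comparison (assumed normalization steps)."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = re.sub(r"[^\w\s]", "", text)  # drop punctuation
    return " ".join(text.split())        # collapse runs of whitespace

def novelty_gate(candidate: str, retrieved: list) -> bool:
    """Return True only if the candidate matches no retrieved item after normalization."""
    seen = {normalize(item) for item in retrieved}
    return normalize(candidate) not in seen

retrieved = ["Summarize the attached report."]
print(novelty_gate("  summarize THE attached   report", retrieved))  # copy in disguise
print(novelty_gate("List the key findings of the report", retrieved))
```

Because the check is a set lookup on normalized strings, it is deterministic and cheap, but it only catches exact duplicates up to the chosen normalization; paraphrases pass through.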
Lattice basis reduction algorithms have various applications in computational number theory and lattice-based cryptography, but their complexity increases rapidly with the dimension. Motivated by the divide-and-conquer strategy of merge sort and incorporating PotLLL-style deep insertions during recombination, MergeLLL is proposed. In this framework, a lattice basis is split into sub-bases, local reductions are performed independently, and the full basis is reconstructed through hierarchical merging. The approach is focused on improving local lattice structure first before global basis properties are refined, resulting in enhanced Gram-Schmidt orthogonality and numerical stability, while overall computational cost is reduced. The method is naturally parallelizable, allowing efficient multicore and distributed execution. It is shown that the reduction and merging steps preserve the lattice structure through unimodular transformations and achieve logarithmic parallel depth. In experiments on subset-sum and NTRU-derived lattices, improvements over classical lattice reduction algorithms are demonstrated, including better orthogonality, a reduced number of expensive swap operations, and an improved Hermite factor, indicating higher-quality reduced bases.
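For readers unfamiliar with the quality measures named in the MergeLLL abstract, the standard definitions are as follows (the notation is ours, not the paper's). For a basis $B = (b_1, \dots, b_n)$ of a full-rank lattice $L$ with Gram-Schmidt vectors $b_1^{*}, \dots, b_n^{*}$, the Hermite factor and the orthogonality defect are

$$\gamma(B) = \frac{\lVert b_1 \rVert}{\det(L)^{1/n}}, \qquad \delta(B) = \frac{\prod_{i=1}^{n} \lVert b_i \rVert}{\det(L)} = \prod_{i=1}^{n} \frac{\lVert b_i \rVert}{\lVert b_i^{*} \rVert} \ge 1,$$

where the second equality uses $\det(L) = \prod_{i=1}^{n} \lVert b_i^{*} \rVert$. Smaller values of both indicate a better-reduced basis, and $\delta(B) = 1$ exactly when the basis is orthogonal. Writing the basis vectors as the rows of $B$, any reduction or merging step that replaces $B$ by $B' = UB$ with $U$ an integer matrix and $\det U = \pm 1$ (a unimodular transformation) leaves $L$, and hence $\det(L)$, unchanged, so the two measures are comparable before and after such a step.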