Loading...
Loading...
Browse, search and filter the latest cybersecurity research papers from arXiv
We propose a novel covert communication system in which a ground user, Alice, transmits unauthorized message fragments to Bob, a low-Earth orbit satellite (LEO), and an unmanned aerial vehicle (UAV) warden (Willie) attempts to detect these transmissions. The key contribution is modeling a scenario where Alice and Willie are unaware of each other's exact locations and move randomly within a specific area. Alice utilizes environmental obstructions to avoid detection and only transmits when the satellite is directly overhead. LEO satellite technology allows users to avoid transmitting messages near a base station. We introduce two key performance metrics: catch probability (Willie detects and locates Alice during a message chunk transmission) and overall catch probability over multiple message chunks. We analyze how two parameters impact these metrics: 1) the size of the detection window and 2) the number of message chunks. The paper proposes two algorithms to optimize these parameters. The simulation results show that the algorithms effectively reduce the detection risks. This work advances the understanding of covert communication under mobility and uncertainty in satellite-aided systems.
Microarchitectural attacks are a significant concern, leading to many hardware-based defense proposals. However, different defenses target different classes of attacks, and their impact on each other has not been fully considered. To raise awareness of this problem, we study an interaction between two state-of-the art defenses in this paper, timing obfuscations of remote cache lines (TORC) and delaying speculative changes to remote cache lines (DSRC). TORC mitigates cache-hit based attacks and DSRC mitigates speculative coherence state change attacks. We observe that DSRC enables coherence information to be retrieved into the processor core, where it is out of the reach of timing obfuscations to protect. This creates an unforeseen consequence that redo operations can be triggered within the core to detect the presence or absence of remote cache lines, which constitutes a security vulnerability. We demonstrate that a new covert channel attack is possible using this vulnerability. We propose two ways to mitigate the attack, whose performance varies depending on an application's cache usage. One way is to never send remote exclusive coherence state (E) information to the core even if it is created. The other way is to never create a remote E state, which is responsible for triggering redos. We demonstrate the timing difference caused by this microarchitectural defense assumption violation using GEM5 simulations. Performance evaluation on SPECrate 2017 and PARSEC benchmarks of the two fixes show less than 32\% average overhead across both sets of benchmarks. The repair which prevented the creation of remote E state had less than 2.8% average overhead.
Language model deployments in consumer-facing applications introduce numerous risks. While existing research on harms and hazards of such applications follows top-down approaches derived from regulatory frameworks and theoretical analyses, empirical evidence of real-world failure modes remains underexplored. In this work, we introduce RealHarm, a dataset of annotated problematic interactions with AI agents built from a systematic review of publicly reported incidents. Analyzing harms, causes, and hazards specifically from the deployer's perspective, we find that reputational damage constitutes the predominant organizational harm, while misinformation emerges as the most common hazard category. We empirically evaluate state-of-the-art guardrails and content moderation systems to probe whether such systems would have prevented the incidents, revealing a significant gap in the protection of AI applications.
In this work, we explore the possibility of universally composable (UC)-secure commitments using Physically Uncloneable Functions (PUFs) within a new adversarial model. We introduce the communicating malicious PUFs, i.e. malicious PUFs that can interact with their creator even when not in their possession, obtaining a stronger adversarial model. Prior work [ASIACRYPT 2013, LNCS, vol. 8270, pp. 100-119] proposed a compiler for constructing UC-secure commitments from ideal extractable commitments, and our task would be to adapt the ideal extractable commitment scheme proposed therein to our new model. However, we found an attack and identified a few other issues in that construction, and to address them, we modified the aforementioned ideal extractable commitment scheme and introduced new properties and tools that allow us to rigorously develop and present security proofs in this context. We propose a new UC-secure commitment scheme against adversaries that can only create stateless malicious PUFs which can receive, but not send, information from their creators. Our protocol is more efficient compared to previous proposals, as we have parallelized the ideal extractable commitments within it. The restriction to stateless malicious PUFs is significant, mainly since the protocol from [ASIACRYPT 2013, LNCS, vol. 8270, pp. 100-119] assumes malicious PUFs with unbounded state, thus limiting its applicability. However it is the only way we found to address the issues of the original construction. We hope that in future work this restriction can be lifted, and along the lines of our work, UC-secure commitments with fewer restrictions on both the state and communication can be constructed.
Large Language Models (LLMs) have emerged as a powerful approach for driving offensive penetration-testing tooling. This paper analyzes the methodology and benchmarking practices used for evaluating Large Language Model (LLM)-driven attacks, focusing on offensive uses of LLMs in cybersecurity. We review 16 research papers detailing 15 prototypes and their respective testbeds. We detail our findings and provide actionable recommendations for future research, emphasizing the importance of extending existing testbeds, creating baselines, and including comprehensive metrics and qualitative analysis. We also note the distinction between security research and practice, suggesting that CTF-based challenges may not fully represent real-world penetration testing scenarios.
Split inference (SI) partitions deep neural networks into distributed sub-models, enabling privacy-preserving collaborative learning. Nevertheless, it remains vulnerable to Data Reconstruction Attacks (DRAs), wherein adversaries exploit exposed smashed data to reconstruct raw inputs. Despite extensive research on adversarial attack-defense games, a shortfall remains in the fundamental analysis of privacy risks. This paper establishes a theoretical framework for privacy leakage quantification using information theory, defining it as the adversary's certainty and deriving both average-case and worst-case error bounds. We introduce Fisher-approximated Shannon information (FSInfo), a novel privacy metric utilizing Fisher Information (FI) for operational privacy leakage computation. We empirically show that our privacy metric correlates well with empirical attacks and investigate some of the factors that affect privacy leakage, namely the data distribution, model size, and overfitting.
Multi-modal large language models (MLLMs) have made significant progress, yet their safety alignment remains limited. Typically, current open-source MLLMs rely on the alignment inherited from their language module to avoid harmful generations. However, the lack of safety measures specifically designed for multi-modal inputs creates an alignment gap, leaving MLLMs vulnerable to vision-domain attacks such as typographic manipulation. Current methods utilize a carefully designed safety dataset to enhance model defense capability, while the specific knowledge or patterns acquired from the high-quality dataset remain unclear. Through comparison experiments, we find that the alignment gap primarily arises from data distribution biases, while image content, response quality, or the contrastive behavior of the dataset makes little contribution to boosting multi-modal safety. To further investigate this and identify the key factors in improving MLLM safety, we propose finetuning MLLMs on a small set of benign instruct-following data with responses replaced by simple, clear rejection sentences. Experiments show that, without the need for labor-intensive collection of high-quality malicious data, model safety can still be significantly improved, as long as a specific fraction of rejection data exists in the finetuning set, indicating the security alignment is not lost but rather obscured during multi-modal pretraining or instruction finetuning. Simply correcting the underlying data bias could narrow the safety gap in the vision domain.
Poorly designed smart contracts are particularly vulnerable, as they may allow attackers to exploit weaknesses and steal the virtual currency they manage. In this study, we train a model using unsupervised learning to identify vulnerabilities in the Solidity source code of Ethereum smart contracts. To address the challenges associated with real-world smart contracts, our training data is derived from actual vulnerability samples obtained from datasets such as SmartBugs Curated and the SolidiFI Benchmark. These datasets enable us to develop a robust unsupervised static analysis method for detecting five specific vulnerabilities: Reentrancy, Access Control, Timestamp Dependency, tx.origin, and Unchecked Low-Level Calls. We employ clustering algorithms to identify outliers, which are subsequently classified as vulnerable smart contracts.
We revisit the longstanding open problem of implementing Nakamoto's proof-of-work (PoW) consensus based on a real-world computational task $T(x)$ (as opposed to artificial random hashing), in a truly permissionless setting where the miner itself chooses the input $x$. The challenge in designing such a Proof-of-Useful-Work (PoUW) protocol, is using the native computation of $T(x)$ to produce a PoW certificate with prescribed hardness and with negligible computational overhead over the worst-case complexity of $T(\cdot)$ -- This ensures malicious miners cannot ``game the system" by fooling the verifier to accept with higher probability compared to honest miners (while using similar computational resources). Indeed, obtaining a PoUW with $O(1)$-factor overhead is trivial for any task $T$, but also useless. Our main result is a PoUW for the task of Matrix Multiplication $MatMul(A,B)$ of arbitrary matrices with $1+o(1)$ multiplicative overhead compared to naive $MatMul$ (even in the presence of Fast Matrix Multiplication-style algorithms, which are currently impractical). We conjecture that our protocol has optimal security in the sense that a malicious prover cannot obtain any significant advantage over an honest prover. This conjecture is based on reducing hardness of our protocol to the task of solving a batch of low-rank random linear equations which is of independent interest. Since $MatMul$s are the bottleneck of AI compute as well as countless industry-scale applications, this primitive suggests a concrete design of a new L1 base-layer protocol, which nearly eliminates the energy-waste of Bitcoin mining -- allowing GPU consumers to reduce their AI training and inference costs by ``re-using" it for blockchain consensus, in exchange for block rewards (2-for-1). This blockchain is currently under construction.
Encrypted search schemes have been proposed to address growing privacy concerns. However, several leakage-abuse attacks have highlighted some security vulnerabilities. Recent attacks assumed an attacker's knowledge containing data ``similar'' to the indexed data. However, this vague assumption is barely discussed in literature: how likely is it for an attacker to obtain a "similar enough" data? Our paper provides novel statistical tools usable on any attack in this setting to analyze its sensitivity to data similarity. First, we introduce a mathematical model based on statistical estimators to analytically understand the attackers' knowledge and the notion of similarity. Second, we conceive statistical tools to model the influence of the similarity on the attack accuracy. We apply our tools on three existing attacks to answer questions such as: is similarity the only factor influencing accuracy of a given attack? Third, we show that the enforcement of a maximum index size can make the ``similar-data'' assumption harder to satisfy. In particular, we propose a statistical method to estimate an appropriate maximum size for a given attack and dataset. For the best known attack on the Enron dataset, a maximum index size of 200 guarantees (with high probability) the attack accuracy to be below 5%.
The proliferation of autonomous agents powered by large language models (LLMs) has revolutionized popular business applications dealing with tabular data, i.e., tabular agents. Although LLMs are observed to be vulnerable against prompt injection attacks from external data sources, tabular agents impose strict data formats and predefined rules on the attacker's payload, which are ineffective unless the agent navigates multiple layers of structural data to incorporate the payload. To address the challenge, we present a novel attack termed StruPhantom which specifically targets black-box LLM-powered tabular agents. Our attack designs an evolutionary optimization procedure which continually refines attack payloads via the proposed constrained Monte Carlo Tree Search augmented by an off-topic evaluator. StruPhantom helps systematically explore and exploit the weaknesses of target applications to achieve goal hijacking. Our evaluation validates the effectiveness of StruPhantom across various LLM-based agents, including those on real-world platforms, and attack scenarios. Our attack achieves over 50% higher success rates than baselines in enforcing the application's response to contain phishing links or malicious codes.
Speech synthesis technology has brought great convenience, while the widespread usage of realistic deepfake audio has triggered hazards. Malicious adversaries may unauthorizedly collect victims' speeches and clone a similar voice for illegal exploitation (\textit{e.g.}, telecom fraud). However, the existing defense methods cannot effectively prevent deepfake exploitation and are vulnerable to robust training techniques. Therefore, a more effective and robust data protection method is urgently needed. In response, we propose a defensive framework, \textit{\textbf{SafeSpeech}}, which protects the users' audio before uploading by embedding imperceptible perturbations on original speeches to prevent high-quality synthetic speech. In SafeSpeech, we devise a robust and universal proactive protection technique, \textbf{S}peech \textbf{PE}rturbative \textbf{C}oncealment (\textbf{SPEC}), that leverages a surrogate model to generate universally applicable perturbation for generative synthetic models. Moreover, we optimize the human perception of embedded perturbation in terms of time and frequency domains. To evaluate our method comprehensively, we conduct extensive experiments across advanced models and datasets, both subjectively and objectively. Our experimental results demonstrate that SafeSpeech achieves state-of-the-art (SOTA) voice protection effectiveness and transferability and is highly robust against advanced adaptive adversaries. Moreover, SafeSpeech has real-time capability in real-world tests. The source code is available at \href{https://github.com/wxzyd123/SafeSpeech}{https://github.com/wxzyd123/SafeSpeech}.
Spam messages continue to present significant challenges to digital users, cluttering inboxes and posing security risks. Traditional spam detection methods, including rules-based, collaborative, and machine learning approaches, struggle to keep up with the rapidly evolving tactics employed by spammers. This project studies new spam detection systems that leverage Large Language Models (LLMs) fine-tuned with spam datasets. More importantly, we want to understand how LLM-based spam detection systems perform under adversarial attacks that purposefully modify spam emails and data poisoning attacks that exploit the differences between the training data and the massages in detection, to which traditional machine learning models are shown to be vulnerable. This experimentation employs two LLM models of GPT2 and BERT and three spam datasets of Enron, LingSpam, and SMSspamCollection for extensive training and testing tasks. The results show that, while they can function as effective spam filters, the LLM models are susceptible to the adversarial and data poisoning attacks. This research provides very useful insights for future applications of LLM models for information security.
Large language models (LLMs) have demonstrated revolutionary capabilities in understanding complex contexts and performing a wide range of tasks. However, LLMs can also answer questions that are unethical or harmful, raising concerns about their applications. To regulate LLMs' responses to such questions, a training strategy called \textit{alignment} can help. Yet, alignment can be unexpectedly compromised when fine-tuning an LLM for downstream tasks. This paper focuses on recovering the alignment lost during fine-tuning. We observe that there are two distinct directions inherent in an aligned LLM: the \textit{aligned direction} and the \textit{harmful direction}. An LLM is inclined to answer questions in the aligned direction while refusing queries in the harmful direction. Therefore, we propose to recover the harmful direction of the fine-tuned model that has been compromised. Specifically, we restore a small subset of the fine-tuned model's weight parameters from the original aligned model using gradient descent. We also introduce a rollback mechanism to avoid aggressive recovery and maintain downstream task performance. Our evaluation on 125 fine-tuned LLMs demonstrates that our method can reduce their harmful rate (percentage of answering harmful questions) from 33.25\% to 1.74\%, without sacrificing task performance much. In contrast, the existing methods either only reduce the harmful rate to a limited extent or significantly impact the normal functionality. Our code is available at https://github.com/kangyangWHU/LLMAlignment
LLM jailbreaks are a widespread safety challenge. Given this problem has not yet been tractable, we suggest targeting a key failure mechanism: the failure of safety to generalize across semantically equivalent inputs. We further focus the target by requiring desirable tractability properties of attacks to study: explainability, transferability between models, and transferability between goals. We perform red-teaming within this framework by uncovering new vulnerabilities to multi-turn, multi-image, and translation-based attacks. These attacks are semantically equivalent by our design to their single-turn, single-image, or untranslated counterparts, enabling systematic comparisons; we show that the different structures yield different safety outcomes. We then demonstrate the potential for this framework to enable new defenses by proposing a Structure Rewriting Guardrail, which converts an input to a structure more conducive to safety assessment. This guardrail significantly improves refusal of harmful inputs, without over-refusing benign ones. Thus, by framing this intermediate challenge - more tractable than universal defenses but essential for long-term safety - we highlight a critical milestone for AI safety research.
The emergence of blockchain technology has revolutionized contract execution through the introduction of smart contracts. Ethereum, the leading blockchain platform, leverages smart contracts to power decentralized applications (DApps), enabling transparent and self-executing systems across various domains. While the immutability of smart contracts enhances security and trust, it also poses significant challenges for updates, defect resolution, and adaptation to changing requirements. Existing upgrade mechanisms are complex, resource-intensive, and costly in terms of gas consumption, often compromising security and limiting practical adoption. To address these challenges, we propose FlexiContracts+, a novel scheme that reimagines smart contracts by enabling secure, in-place upgrades on Ethereum while preserving historical data without relying on multiple contracts or extensive pre-deployment planning. FlexiContracts+ enhances security, simplifies development, reduces engineering overhead, and supports adaptable, expandable smart contracts. Comprehensive testing demonstrates that FlexiContracts+ achieves a practical balance between immutability and flexibility, advancing the capabilities of smart contract systems.
Many-shot jailbreaking (MSJ) is an adversarial technique that exploits the long context windows of modern LLMs to circumvent model safety training by including in the prompt many examples of a ``fake'' assistant responding inappropriately before the final request. With enough examples, the model's in-context learning abilities override its safety training, and it responds as if it were the ``fake'' assistant. In this work, we probe the effectiveness of different fine tuning and input sanitization approaches on mitigating MSJ attacks, alone and in combination. We find incremental mitigation effectiveness for each, and we show that the combined techniques significantly reduce the effectiveness of MSJ attacks, while retaining model performance in benign in-context learning and conversational tasks. We suggest that our approach could meaningfully ameliorate this vulnerability if incorporated into model safety post-training.
Retrieval-Augmented Generation (RAG) has significantly enhanced the factual accuracy and domain adaptability of Large Language Models (LLMs). This advancement has enabled their widespread deployment across sensitive domains such as healthcare, finance, and enterprise applications. RAG mitigates hallucinations by integrating external knowledge, yet introduces privacy risk and security risk, notably data breaching risk and data poisoning risk. While recent studies have explored prompt injection and poisoning attacks, there remains a significant gap in comprehensive research on controlling inbound and outbound query flows to mitigate these threats. In this paper, we propose an AI firewall, ControlNET, designed to safeguard RAG-based LLM systems from these vulnerabilities. ControlNET controls query flows by leveraging activation shift phenomena to detect adversarial queries and mitigate their impact through semantic divergence. We conduct comprehensive experiments on four different benchmark datasets including Msmarco, HotpotQA, FinQA, and MedicalSys using state-of-the-art open source LLMs (Llama3, Vicuna, and Mistral). Our results demonstrate that ControlNET achieves over 0.909 AUROC in detecting and mitigating security threats while preserving system harmlessness. Overall, ControlNET offers an effective, robust, harmless defense mechanism, marking a significant advancement toward the secure deployment of RAG-based LLM systems.