Browse, search and filter the latest cybersecurity research papers from arXiv
Split Learning (SL) splits a model into two distinct parts to help protect client data while enhancing Machine Learning (ML) processes. Though promising, SL has proven vulnerable to various attacks, raising concerns about how much data privacy it actually provides. Recent works have shown promising results for securing SL through a novel paradigm, Function Secret Sharing (FSS), in which servers obtain shares of a function they compute and operate on a public input hidden with a random mask. However, these works fall short of addressing the growing number of attacks on SL. In SplitHappens, we extend the combination of FSS and SL to U-shaped SL. Like previous works, we exploit the benefits of SL to reduce the communication and computational costs of FSS. However, U-shaped SL provides a stronger security guarantee than previous works, allowing the client to keep the labels of the training data secret without sharing them with the server. This lets us generalize the security analysis of previous works and extend it to additional attack vectors, such as modern model inversion attacks and label inference attacks. We evaluated our approach with two different convolutional neural networks on different datasets. These experiments show that our approach reduces training time and communication costs compared to using FSS alone, while matching prior accuracy.
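To make the U-shaped topology concrete, here is a minimal single-process PyTorch sketch of the data flow the abstract describes: the client runs the first and last model segments (so raw inputs and labels never leave it), while the server runs only the middle segment. The layer sizes, the single shared optimizer, and the omission of the FSS-based protection of the server segment are illustrative assumptions, not the paper's implementation.

```python
# Minimal single-process sketch of U-shaped Split Learning (hypothetical layer
# sizes; FSS-based masking of the server computation is out of scope here).
import torch
import torch.nn as nn

# Client holds the first and last segments; the server holds only the middle one.
client_head = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU())
server_body = nn.Sequential(nn.Linear(128, 64), nn.ReLU())   # runs on the server
client_tail = nn.Linear(64, 10)                              # labels never leave the client
loss_fn = nn.CrossEntropyLoss()

# Single optimizer only because everything lives in one process in this toy.
opt = torch.optim.SGD(
    list(client_head.parameters()) + list(server_body.parameters())
    + list(client_tail.parameters()), lr=0.1)

def training_step(x, y):
    opt.zero_grad()
    # 1) Client forward pass; the smashed activations are sent to the server.
    a1 = client_head(x)
    a1_sent = a1.detach().requires_grad_(True)
    # 2) Server forward pass on the received activations; result goes back.
    a2 = server_body(a1_sent)
    a2_sent = a2.detach().requires_grad_(True)
    # 3) Client computes the output and the loss locally, so labels stay private.
    loss = loss_fn(client_tail(a2_sent), y)
    loss.backward()                      # grads for the tail and for a2_sent
    # 4) Client returns the gradient of a2; server backpropagates its segment.
    a2.backward(a2_sent.grad)
    # 5) Server returns the gradient of a1; client finishes backprop in its head.
    a1.backward(a1_sent.grad)
    opt.step()
    return loss.item()

loss = training_step(torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,)))
```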
The growing reliance on data-driven applications in sectors such as healthcare, finance, and law enforcement underscores the need for secure, privacy-preserving, and scalable mechanisms for data generation and sharing. Synthetic data generation (SDG) has emerged as a promising approach but often relies on centralized or external processing, raising concerns about data sovereignty, domain ownership, and compliance with evolving regulatory standards. To overcome these issues, we introduce SynthGuard, a framework designed to ensure computational governance by enabling data owners to maintain control over SDG workflows. SynthGuard supports modular and privacy-preserving workflows, ensuring secure, auditable, and reproducible execution across diverse environments. In this paper, we demonstrate how SynthGuard addresses the complexities at the intersection of domain-specific needs and scalable SDG by aligning with requirements for data sovereignty and regulatory compliance. Developed iteratively with domain expert input, SynthGuard has been validated through real-world use cases, demonstrating its ability to balance security, privacy, and scalability while ensuring compliance. The evaluation confirms its effectiveness in implementing and executing SDG workflows and integrating privacy and utility assessments across various computational environments.
Vertical Federated Learning (VFL) enables an orchestrating active party to perform a machine learning task by cooperating with passive parties that provide additional task-related features for the same training data entities. While prior research has leveraged the privacy vulnerability of VFL to compromise its integrity through a combination of label inference and backdoor attacks, their effectiveness is constrained by low label inference precision and suboptimal backdoor injection conditions. To facilitate a more rigorous security evaluation of VFL without these limitations, we propose HASSLE, a hijacking attack framework composed of a gradient-direction-based label inference module and an adversarial embedding generation algorithm enhanced by self-supervised learning. HASSLE accurately identifies private samples associated with a targeted label using only a single known instance of that label. In the two-party scenario, it demonstrates strong performance with an attack success rate (ASR) of over 99% across four datasets, including both image and tabular modalities, and achieves 85% ASR on the more complex CIFAR-100 dataset. Evaluation of HASSLE against eight potential defenses further highlights its significant threat while providing new insights into building a trustworthy VFL system.
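As a rough illustration of the gradient-direction idea (not the authors' exact module), a passive party in VFL can score each training sample by the cosine similarity between the gradient it receives for that sample's embedding and the gradient of the single known target-label instance; samples whose gradients point the same way are likely to share the label. The threshold, dimensions, and synthetic data below are assumptions.

```python
# Illustrative sketch of gradient-direction-based label inference in VFL.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def infer_target_samples(embedding_grads, known_idx, threshold=0.8):
    """embedding_grads: (n_samples, dim) gradients the active party returns
    for the passive party's embeddings; known_idx: the one known target sample."""
    anchor = embedding_grads[known_idx]
    scores = np.array([cosine(g, anchor) for g in embedding_grads])
    return np.where(scores >= threshold)[0], scores

# Toy demo: two clusters of gradient directions stand in for two labels.
rng = np.random.default_rng(0)
grads = np.vstack([rng.normal(+1.0, 0.3, (50, 16)),    # target label
                   rng.normal(-1.0, 0.3, (50, 16))])   # other labels
predicted, _ = infer_target_samples(grads, known_idx=0)
print(f"{len(predicted)} samples scored as the target label")
```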
Federated Learning has emerged as a leading paradigm for decentralized, privacy-preserving learning, particularly relevant in the era of interconnected edge devices equipped with sensors. However, the practical implementation of Federated Learning faces three primary challenges: (1) the need for human involvement in costly data labelling processes for target adaptation; (2) covariate shift in client device data collection, caused by environmental factors affecting sensors and leading to discrepancies between source and target samples; and (3) the impracticality of continuous or regular model updates in resource-constrained environments due to limited data transmission capabilities and technical constraints on channel availability and energy efficiency. To tackle these issues, we expand upon an efficient and scalable Federated Learning framework tailored for real-world client adaptation in industrial settings. This framework leverages a pre-trained source model comprising a deep backbone, an adaptation module, and a classifier running on a powerful server. By freezing the backbone and classifier during client adaptation on resource-constrained devices, we allow the domain-adaptive linear layer to handle target domain adaptation, thus minimizing overall computational overhead. Furthermore, this setup, designated as FedAcross+, is extended to encompass the processing of streaming data, thereby rendering the solution suitable for non-stationary environments. Extensive experimental results demonstrate the effectiveness of FedAcross+ in achieving competitive adaptation on low-end client devices with limited target samples, successfully addressing the challenge of domain shift. Moreover, our framework accommodates sporadic model updates within resource-constrained environments, ensuring practical and seamless deployment.
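A minimal PyTorch sketch of the client-side setup described above, with the backbone and classifier frozen and only a domain-adaptive linear layer left trainable; the module shapes, the supervised loss, and the optimizer settings are illustrative assumptions rather than the FedAcross+ implementation.

```python
# Hedged sketch: frozen backbone and classifier, trainable adaptation layer only.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
adapter = nn.Linear(256, 256)         # domain-adaptive linear layer, trained on-device
classifier = nn.Linear(256, 10)

for p in backbone.parameters():
    p.requires_grad = False           # frozen during client adaptation
for p in classifier.parameters():
    p.requires_grad = False

opt = torch.optim.SGD(adapter.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def adapt_step(x, y):
    opt.zero_grad()
    with torch.no_grad():
        feats = backbone(x)           # cheap on-device: no gradients through the backbone
    logits = classifier(adapter(feats))
    loss = loss_fn(logits, y)
    loss.backward()                   # gradients reach only the adapter parameters
    opt.step()
    return loss.item()

loss = adapt_step(torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,)))
```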
Our research uncovers a novel privacy risk associated with multimodal large language models (MLLMs): the ability to infer sensitive personal attributes from audio data -- a technique we term audio private attribute profiling. This capability poses a significant threat, as audio can be covertly captured without direct interaction or visibility. Moreover, compared to images and text, audio carries unique characteristics, such as tone and pitch, which can be exploited for more detailed profiling. However, two key challenges exist in understanding MLLM-employed private attribute profiling from audio: (1) the lack of audio benchmark datasets with sensitive attribute annotations and (2) the limited ability of current MLLMs to infer such attributes directly from audio. To address these challenges, we introduce AP^2, an audio benchmark dataset that consists of two subsets collected and composed from real-world data, and both are annotated with sensitive attribute labels. Additionally, we propose Gifts, a hybrid multi-agent framework that leverages the complementary strengths of audio-language models (ALMs) and large language models (LLMs) to enhance inference capabilities. Gifts employs an LLM to guide the ALM in inferring sensitive attributes, then forensically analyzes and consolidates the ALM's inferences, overcoming severe hallucinations of existing ALMs in generating long-context responses. Our evaluations demonstrate that Gifts significantly outperforms baseline approaches in inferring sensitive attributes. Finally, we investigate model-level and data-level defense strategies to mitigate the risks of audio private attribute profiling. Our work validates the feasibility of audio-based privacy attacks using MLLMs, highlighting the need for robust defenses, and provides a dataset and framework to facilitate future research.
Large language models (LLMs) typically require fine-tuning for domain-specific tasks, and LoRA offers a computationally efficient approach by training low-rank adapters. LoRA is also communication-efficient for federated LLMs when multiple users collaboratively fine-tune a global LLM model without sharing their proprietary raw data. However, even the transmission of local adapters between a server and clients risks serious privacy leakage. Applying differential privacy (DP) to federated LoRA encounters a dilemma: adding noise to both adapters amplifies synthetic noise on the model, while fixing one adapter impairs the learnability of fine-tuning. In this paper, we propose FedASK (Differentially Private Federated Low Rank Adaptation with Double Sketching), a novel federated LoRA framework that enables effective updating of both low-rank adapters with robust differential privacy. Inspired by randomized SVD, our key idea is a two-stage sketching pipeline. This pipeline first aggregates carefully sketched, privacy-preserving local updates, and then reconstructs the global matrices on the server to facilitate effective updating of both adapters. We theoretically prove FedASK's differential privacy guarantee and its exact aggregation property. Comprehensive experiments demonstrate that FedASK consistently outperforms baseline methods across a variety of privacy settings and data distributions.
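The toy NumPy sketch below illustrates only the randomized-SVD-style double-sketching idea, with no DP noise, clipping, or communication: clients sketch their low-rank updates against a shared random test matrix, the server aggregates the sketches and builds an orthonormal range basis, and a second projected aggregate lets it reconstruct the aggregated update. The dimensions and sketch width are assumptions, chosen larger than the aggregate rank so that this toy reconstruction is exact.

```python
# Toy numerical sketch of the randomized-SVD idea behind two-stage sketching.
import numpy as np

rng = np.random.default_rng(0)
d, k, r, n_clients, sketch = 256, 64, 8, 5, 48   # sketch width > aggregate rank

# Each client's local LoRA update is low rank: dW_i = B_i @ A_i.
clients = [(rng.normal(size=(d, r)), rng.normal(size=(r, k))) for _ in range(n_clients)]
dW_true = np.mean([B @ A for B, A in clients], axis=0)

# Stage 1: clients sketch their updates with a shared random test matrix Omega;
# the server aggregates only the sketches (this is where DP noise would enter).
Omega = rng.normal(size=(k, sketch))
Y = np.mean([B @ (A @ Omega) for B, A in clients], axis=0)
Q, _ = np.linalg.qr(Y)                            # orthonormal range basis (d x sketch)

# Stage 2: clients project onto the shared basis; the server reconstructs the
# aggregated update, which it can then refactor into the two global adapters.
P = np.mean([(Q.T @ B) @ A for B, A in clients], axis=0)
dW_hat = Q @ P

err = np.linalg.norm(dW_hat - dW_true) / np.linalg.norm(dW_true)
print(f"relative reconstruction error: {err:.2e}")   # near machine precision here
```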
This paper proposes a novel machine learning (ML) approach incorporating Homomorphic Encryption (HE) to address privacy limitations in Unmanned Aerial Vehicle (UAV)-based face detection. Because of challenges related to distance, altitude, and face orientation, accurate face recognition in dynamic environments requires high-resolution imagery and sophisticated neural networks. However, privacy concerns arise from the extensive surveillance capabilities of UAVs. To resolve this issue, we propose a novel framework that integrates HE with advanced neural networks to secure facial data throughout the inference phase. This method ensures that facial data remains secure with minimal impact on detection accuracy. Specifically, the proposed system leverages the Cheon-Kim-Kim-Song (CKKS) scheme to perform computations directly on encrypted data, optimizing computational efficiency and security. Furthermore, we develop an effective data encoding method specifically designed to preprocess the raw facial data into CKKS form in a Single-Instruction-Multiple-Data (SIMD) manner. Building on this, we design a secure inference algorithm to compute on ciphertext without needing decryption. This approach not only protects data privacy during the processing of facial data but also enhances the efficiency of UAV-based face detection systems. Experimental results demonstrate that our method effectively balances privacy protection and detection performance, making it a viable solution for UAV-based secure face detection. Significantly, while maintaining data confidentiality with HE, our approach incurs an accuracy loss of less than 1% compared to the benchmark without encryption.
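For intuition about CKKS SIMD packing and encrypted linear operations, here is a small sketch using the open-source TenSEAL library; the encryption parameters, the four-element feature vector, and the linear scoring step are assumptions for illustration and do not reproduce the paper's encoding scheme or detection network.

```python
# Minimal CKKS sketch with TenSEAL (illustrative parameters, not the paper's).
import tenseal as ts

context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()               # needed for rotations in dot products

# Pack a flattened face-feature vector into one ciphertext (SIMD batching).
features = [0.12, -0.5, 0.33, 0.9]           # illustrative pre-processed values
enc = ts.ckks_vector(context, features)

# Server-side linear scoring directly on the ciphertext, no decryption needed.
weights = [0.4, -0.1, 0.7, 0.2]
enc_score = enc.dot(weights)                 # homomorphic inner product
print(enc_score.decrypt())                   # only the secret-key holder can decrypt
```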
As the use of differential privacy (DP) becomes widespread, the development of effective tools for reasoning about the privacy guarantee becomes increasingly critical. In pursuit of this goal, we demonstrate novel relationships between DP and measures of statistical disclosure risk. We suggest how experts and non-experts can use these results to explain the DP guarantee, interpret DP composition theorems, select and justify privacy parameters, and identify worst-case adversary prior probabilities.
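One standard way to connect the epsilon-DP guarantee to disclosure risk (illustrative, and not necessarily the exact relationships derived in the paper) bounds an adversary's posterior belief that a target record is present: under pure epsilon-DP the likelihood ratio of any output is at most e^epsilon, so posterior odds are at most e^epsilon times the prior odds. The small sketch below just evaluates that bound for a few hypothetical priors and epsilons.

```python
# Bayesian reading of epsilon-DP: posterior <= e^eps * p / (e^eps * p + 1 - p)
# for a prior p that the target record is in the data (worst case over outputs).
import math

def posterior_bound(prior: float, eps: float) -> float:
    return math.exp(eps) * prior / (math.exp(eps) * prior + (1.0 - prior))

for eps in (0.1, 1.0, math.log(3), 5.0):
    print(f"eps = {eps:.2f}: prior 0.5 -> posterior <= {posterior_bound(0.5, eps):.3f}")
```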
Driving trajectory data remains vulnerable to privacy breaches despite existing mitigation measures. Traditional methods for detecting driving trajectories typically rely on map-matching the path using Global Positioning System (GPS) data, which is susceptible to GPS outages. This paper introduces CAN-Trace, a novel privacy attack mechanism that leverages Controller Area Network (CAN) messages to uncover driving trajectories, posing a significant risk to drivers' long-term privacy. A new trajectory reconstruction algorithm is proposed to transform the CAN messages, specifically vehicle speed and accelerator pedal position, into weighted graphs accommodating various driving statuses. CAN-Trace identifies driving trajectories by applying graph-matching algorithms that compare the constructed graphs against road networks. We also design a new metric to evaluate matched candidates, which allows for potential data gaps and matching inaccuracies. Empirical validation under various real-world conditions, encompassing different vehicles and driving regions, demonstrates the efficacy of CAN-Trace: it achieves an attack success rate of up to 90.59% in the urban region and 99.41% in the suburban region.
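The sketch below is an illustrative simplification of the graph-construction step only, not the paper's algorithm: it segments a CAN speed trace at stops, integrates speed to estimate each segment's length, and stores the segments as a weighted path graph that could later be matched against a road network. The stop threshold, sampling interval, and use of networkx are assumptions.

```python
# Illustrative sketch: CAN speed trace -> weighted segment graph.
import networkx as nx

def trace_to_graph(speeds_kmh, dt_s=1.0, stop_kmh=2.0):
    g, node, dist = nx.Graph(), 0, 0.0
    for v in speeds_kmh:
        if v <= stop_kmh:                       # vehicle (nearly) stopped: close segment
            if dist > 0:
                g.add_edge(node, node + 1, weight=round(dist, 1))
                node, dist = node + 1, 0.0
        else:
            dist += v / 3.6 * dt_s              # km/h -> m/s, accumulate metres
    if dist > 0:
        g.add_edge(node, node + 1, weight=round(dist, 1))
    return g

trace = [0, 0, 12, 30, 45, 38, 5, 0, 0, 20, 50, 60, 30, 0]
g = trace_to_graph(trace)
print(list(g.edges(data=True)))                 # two segments with length weights
```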
Private inference based on Secure Multi-Party Computation (MPC) addresses data privacy risks in Machine Learning as a Service (MLaaS). However, existing MPC-based private inference frameworks focus on semi-honest or honest-majority models, whose threat models are overly idealistic, while maliciously secure dishonest-majority models suffer from low efficiency. To balance security and efficiency, we propose a private inference framework using a Helper-Assisted Malicious Security Dishonest Majority Model (HA-MSDM). This framework includes the five MPC protocols we design and a co-optimized strategy. These protocols achieve efficient fixed-round multiplication, exponentiation, and polynomial operations, providing foundational primitives for private inference. The co-optimized strategy balances inference efficiency and accuracy. To enhance efficiency, we employ polynomial approximation for nonlinear layers. For improved accuracy, we construct a sixth-order polynomial approximation within a fixed interval to achieve high-precision activation function fitting and introduce parameter-adjusted batch normalization layers to mitigate the activation escape problem. Benchmark results on LeNet and AlexNet show our framework achieves a 2.4-25.7x speedup in LAN and a 1.3-9.5x acceleration in WAN compared to state-of-the-art frameworks (IEEE S&P'25), maintaining high accuracy with only 0.04%-1.08% relative errors.
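As a hedged sketch of the activation-fitting idea (the interval, the target activation, and the least-squares fitting method are assumptions, not the paper's construction), a degree-6 polynomial fit to ReLU on a fixed interval yields an approximation that an MPC engine can evaluate using only additions and multiplications:

```python
# Degree-6 least-squares polynomial approximation of ReLU on a fixed interval.
import numpy as np

lo, hi = -4.0, 4.0
x = np.linspace(lo, hi, 2001)
relu = np.maximum(x, 0.0)

coeffs = np.polynomial.polynomial.polyfit(x, relu, deg=6)   # c0 + c1*x + ... + c6*x^6

def relu_poly(v):
    # Inputs are clipped to the fitting interval, mirroring a "fixed interval"
    # assumption (e.g. enforced by batch normalization in the network).
    return np.polynomial.polynomial.polyval(np.clip(v, lo, hi), coeffs)

err = np.max(np.abs(relu_poly(x) - relu))
print(f"max absolute error on [{lo}, {hi}]: {err:.4f}")
```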
The digitization of democratic processes promises greater accessibility but presents challenges in terms of security, privacy, and verifiability. Existing electronic voting systems often rely on centralized architectures, creating single points of failure and forcing too much trust in authorities, which contradicts democratic principles. This research addresses the challenge of creating a secure, private e-voting system with minimized trust dependencies designed for the most versatile personal device: the smartphone. We introduce SmartphoneDemocracy, a novel e-voting protocol that combines three key technologies: the emerging European Digital Identity (EUDI) Wallet for Sybil-resistant identity verification, Zero-Knowledge Proofs for privacy-preserving validation, and a peer-to-peer blockchain (TrustChain) for a resilient, serverless public bulletin board. Our protocol enables voters to register and cast ballots anonymously and verifiably directly from their smartphones. We provide a detailed protocol design, a security analysis against a defined threat model, and a performance evaluation demonstrating that the computational and network overhead is feasible for medium- to large-scale elections. By developing and prototyping this system, we demonstrate a viable path to empower citizens with a trustworthy, accessible, and user-controlled digital voting experience.
Dementia, a neurodegenerative disease, alters speech patterns, creating communication barriers and raising privacy concerns. Current speech technologies, such as automatic speech recognition (ASR), struggle with dementia-affected and atypical speech, further challenging accessibility. This paper presents ClaritySpeech, a novel framework for dementia obfuscation in speech that integrates ASR, text obfuscation, and zero-shot text-to-speech (TTS) to correct dementia-affected speech while preserving speaker identity in low-data environments without fine-tuning. Results show a 16% and 10% drop in mean F1 score across various adversarial settings and modalities (audio, text, fusion) for ADReSS and ADReSSo, respectively, while maintaining 50% speaker similarity. We also find that our system improves WER (from 0.73 to 0.08 for ADReSS and 0.15 for ADReSSo) and speech quality from 1.65 to ~2.15, enhancing privacy and accessibility.
Transparency is one of the key benefits of public blockchains. However, the public visibility of transactions potentially compromises users' privacy. The fundamental challenge is to balance the intrinsic benefits of blockchain openness with the vital need for individual confidentiality. This proposal creates a confidential version of wrapped Ethereum (cWETH) entirely within the application layer. The solution combines an Elliptic Curve (EC) Twisted ElGamal-based commitment scheme to preserve confidentiality with the EC Diffie-Hellman (DH) protocol to introduce accessibility limited by the commitment scheme. To enforce the correct generation of commitments, encryption, and decryption, zk-SNARKs are utilized.
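For intuition only, here is a toy sketch of the twisted ElGamal structure over a small prime-order subgroup of the integers: the Y component g^r * h^m is a Pedersen-style commitment to the amount, while the X component ties the randomness to the recipient's key so the amount stays recoverable by that key holder. The elliptic-curve instantiation, zk-SNARK proofs, and production-size parameters used in cWETH are not reproduced; all values below are toy assumptions.

```python
# Toy twisted ElGamal over a small prime-order subgroup (not production crypto).
from random import randrange   # toy randomness; a real system needs a CSPRNG

q = 1019                      # subgroup order (toy size)
p = 2 * q + 1                 # p = 2039, a safe prime
g, h = 4, 9                   # two generators of the order-q subgroup

def keygen():
    sk = randrange(1, q)
    return sk, pow(g, sk, p)

def encrypt(pk, m):
    r = randrange(1, q)
    return pow(pk, r, p), (pow(g, r, p) * pow(h, m, p)) % p   # (X, Y): Y is the commitment

def decrypt(sk, ct, max_m=200):
    X, Y = ct
    g_r = pow(X, pow(sk, -1, q), p)        # X^(1/sk) = g^r
    hm = (Y * pow(g_r, -1, p)) % p         # recover h^m
    for m in range(max_m + 1):             # brute-force small balances only
        if pow(h, m, p) == hm:
            return m
    raise ValueError("message out of range")

sk, pk = keygen()
c1, c2 = encrypt(pk, 30), encrypt(pk, 12)
c_sum = ((c1[0] * c2[0]) % p, (c1[1] * c2[1]) % p)   # additively homomorphic
print(decrypt(sk, c_sum))                             # -> 42
```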
The emergence of quantum computing presents profound challenges to existing cryptographic infrastructures, whilst the development of central bank digital currencies (CBDCs) has raised concerns regarding privacy preservation and excessive centralisation in digital payment systems. This paper proposes the Quantum-Resilient Privacy Ledger (QRPL) as an innovative token-based digital currency architecture that incorporates National Institute of Standards and Technology (NIST)-standardised post-quantum cryptography (PQC) with hash-based zero-knowledge proofs to ensure user sovereignty, scalability, and transaction confidentiality. Key contributions include adaptations of ephemeral proof chains for unlinkable transactions, a privacy-weighted Proof-of-Stake (PoS) consensus to promote equitable participation, and a novel zero-knowledge proof-based mechanism for privacy-preserving selective disclosure. QRPL aims to address critical shortcomings in prevailing CBDC designs, including risks of pervasive surveillance, with a 10-20 second block time to balance security and throughput in future monetary systems. While the design remains conceptual, future work includes prototype development to validate these models empirically.
We propose a method for using Web Authentication APIs for SSH authentication, enabling passwordless remote server login with passkeys. These are credentials that are managed throughout the key lifecycle by an authenticator on behalf of the user and offer strong security guarantees. Passwords remain the dominant mode of SSH authentication, despite well-known flaws such as phishing and reuse. SSH's custom key-based authentication protocol can alleviate these issues but remains vulnerable to key theft. Additionally, it has poor usability, with even knowledgeable users leaking key material and failing to verify fingerprints. Hence, effective key management remains a critical open area in SSH security. In contrast, WebAuthn is a modern authentication standard designed to replace passwords, managing keys on behalf of the user. As a web API, this standard cannot integrate with SSH directly. We propose a framework to integrate WebAuthn with SSH servers by using UNIX pluggable authentication modules (PAM). Our approach is backwards-compatible, supports stock SSH servers, and requires no new client-side software. It offers protection for cryptographic material at rest, resistance to key leaks, phishing protection, privacy protection, and attestation capability. None of these properties are offered by passwords or traditional SSH keys. We validate these advantages with a structured, conceptual security analysis. We develop a prototype implementation and conduct a user study to quantify the security advantages of our proposal, testing our prototype with 40 SSH users. The study confirms the security problems of SSH keys, including 20% of the cohort leaking their private keys. Our SSH-passkeys effectively address these problems: we find a 90% reduction in critical security errors, while reducing authentication time by 4x on average.
Increasingly, students begin learning aspects of security and privacy during their primary and secondary education (grades K-12 in the United States). Individual U.S. states and some national organizations publish teaching standards -- guidance that outlines expectations for what students should learn -- which often form the basis for course curricula. However, research has not yet examined what is covered by these standards and whether the topics align with what the broader security and privacy community thinks students should know. To shed light on these questions, we started by collecting computer science teaching standards from all U.S. states and eight national organizations. After manually examining a total of 11,954 standards, we labeled 3,778 of them as being related to security and privacy, further classifying these into 103 topics. Topics ranged from technical subjects like encryption, network security, and embedded systems to social subjects such as laws, ethics, and appropriate online behavior. Subsequently, we interviewed 11 security and privacy professionals to examine how the teaching standards align with their expectations. We found that, while the specific topics they mentioned mostly overlapped with those of existing standards, professionals placed a greater emphasis on threat modeling and security mindset.
Quantum Machine Learning (QML) systems inherit vulnerabilities from classical machine learning while introducing new attack surfaces rooted in the physical and algorithmic layers of quantum computing. Despite a growing body of research on individual attack vectors - ranging from adversarial poisoning and evasion to circuit-level backdoors, side-channel leakage, and model extraction - these threats are often analyzed in isolation, with unrealistic assumptions about attacker capabilities and system environments. This fragmentation hampers the development of effective, holistic defense strategies. In this work, we argue that QML security requires more structured modeling of the attack surface, capturing not only individual techniques but also their relationships, prerequisites, and potential impact across the QML pipeline. We propose adapting kill chain models, widely used in classical IT and cybersecurity, to the quantum machine learning context. Such models allow for structured reasoning about attacker objectives, capabilities, and possible multi-stage attack paths - spanning reconnaissance, initial access, manipulation, persistence, and exfiltration. Based on extensive literature analysis, we present a detailed taxonomy of QML attack vectors mapped to corresponding stages in a quantum-aware kill chain framework that is inspired by the MITRE ATLAS for classical machine learning. We highlight interdependencies between physical-level threats (like side-channel leakage and crosstalk faults), data and algorithm manipulation (such as poisoning or circuit backdoors), and privacy attacks (including model extraction and training data inference). This work provides a foundation for more realistic threat modeling and proactive security-in-depth design in the emerging field of quantum machine learning.
In recent years, the increasing awareness of cybersecurity has led to a heightened focus on information security within hardware devices and products. Incorporating Trusted Execution Environments (TEEs) into product designs has become a standard practice for safeguarding sensitive user information. However, vulnerabilities within these components present significant risks: if exploited by attackers, they could lead to the leakage of sensitive data, thereby compromising user privacy and security. This research centers on trusted applications (TAs) within the Qualcomm TEE and introduces a novel emulator specifically designed for these applications. Through reverse engineering techniques, we thoroughly analyze Qualcomm TAs and develop a partial emulation environment that accurately emulates their behavior. Additionally, we integrate fuzzing techniques into the emulator to systematically uncover potential vulnerabilities within Qualcomm TAs, demonstrating its practical effectiveness in identifying real-world security flaws. This research makes a significant contribution by being the first to provide both the implementation methods and source code for a Qualcomm TA emulator, offering a valuable reference for future research efforts. Unlike previous approaches that relied on complex and resource-intensive full-system simulations, our approach is lightweight and effective, making security testing of TAs more convenient.