Browse, search and filter the latest cybersecurity research papers from arXiv
Model inversion attacks pose an open challenge to privacy-sensitive applications that use machine learning (ML) models. For example, face authentication systems use modern ML models to compute and store embedding vectors from the face images of enrolled users. If these vectors leak, inversion attacks can accurately reconstruct user faces from them. Despite a decade of best-effort solutions, there is no systematic characterization of the properties an ideal defense against model inversion should have, even for the canonical example of a face authentication system susceptible to data breaches. In this paper, we formalize the desired properties of a provably strong defense against model inversion and connect it, for the first time, to the cryptographic concept of fuzzy extractors. We further show that existing fuzzy extractors are insecure for use in ML-based face authentication. We do so through a new model inversion attack called PIPE, which achieves a success rate of over 89% in most cases against prior schemes. We then propose L2FE-Hash, the first candidate fuzzy extractor that supports the standard Euclidean distance comparators needed in many ML-based applications, including face authentication. We formally characterize its computational security guarantees, even in the extreme threat model of a full breach of stored secrets, and empirically show its usable accuracy in face authentication for practical face distributions. It offers attack-agnostic security without requiring any re-training of the ML model it protects. Empirically, it nullifies both prior state-of-the-art inversion attacks and our new PIPE attack.
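As a rough illustration of the enrollment/verification interface such a scheme exposes, the sketch below builds a toy quantize-and-hash protector for embeddings (random projection plus SHA-256). It is not the paper's L2FE-Hash construction, carries no real security guarantees, and every parameter is illustrative.

```python
# Illustrative only: a toy interface for protecting face embeddings, showing the
# enroll/verify shape described in the abstract. NOT the paper's L2FE-Hash.
import hashlib
import numpy as np

def enroll(embedding: np.ndarray, dim_out: int = 64, step: float = 0.5, seed: int = 0):
    """Enroll an embedding: return public helper data and a stored digest."""
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(dim_out, embedding.shape[0]))   # random projection
    offsets = rng.uniform(0, step, size=dim_out)            # dithering offsets
    codes = np.floor((proj @ embedding + offsets) / step).astype(np.int64)
    digest = hashlib.sha256(codes.tobytes()).hexdigest()    # only this is compared later
    helper = {"proj": proj, "offsets": offsets, "step": step}
    return helper, digest

def verify(embedding: np.ndarray, helper: dict, digest: str) -> bool:
    """Re-derive the digest from a fresh embedding and compare."""
    codes = np.floor((helper["proj"] @ embedding + helper["offsets"]) / helper["step"]).astype(np.int64)
    return hashlib.sha256(codes.tobytes()).hexdigest() == digest

emb = np.random.default_rng(0).normal(size=512)             # stand-in face embedding
helper, stored = enroll(emb, seed=1)
print(verify(emb, helper, stored))                          # True: re-derives the digest
```

In this toy setting the server stores only the digest and public helper data rather than the raw embedding; a fresh capture within a small Euclidean distance quantizes to the same codes with high probability, with the tolerance controlled by the quantization step.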
As machine learning (ML) models are increasingly deployed through cloud infrastructures, the confidentiality of user data during inference poses a significant security challenge. Homomorphic Encryption (HE) has emerged as a compelling cryptographic technique that enables computation on encrypted data, allowing predictions to be generated without decrypting sensitive inputs. However, the integration of HE into large-scale cloud-native pipelines remains constrained by high computational overhead, orchestration complexity, and model compatibility issues. This paper presents a systematic framework for the design and optimization of cloud-native homomorphic encryption workflows that support privacy-preserving ML inference. The proposed architecture integrates containerized HE modules with Kubernetes-based orchestration, enabling elastic scaling and parallel encrypted computation across distributed environments. Furthermore, optimization strategies including ciphertext packing, polynomial modulus adjustment, and operator fusion are employed to minimize latency and resource consumption while preserving cryptographic integrity. Experimental results demonstrate that the proposed system achieves up to 3.2x inference acceleration and a 40% reduction in memory utilization compared to conventional HE pipelines. These findings illustrate a practical pathway for deploying secure ML-as-a-Service (MLaaS) systems that guarantee data confidentiality under zero-trust cloud conditions.
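For readers unfamiliar with packed HE inference, the following minimal sketch uses the open-source TenSEAL library (not the paper's pipeline) to run a CKKS-encrypted linear prediction; the encryption parameters and model weights are illustrative placeholders.

```python
# A minimal sketch of packed CKKS inference with TenSEAL. Parameter choices are
# illustrative only; real deployments must size the modulus chain for their circuit.
import tenseal as ts

# Encryption context: CKKS with batching ("ciphertext packing") built in.
ctx = ts.context(ts.SCHEME_TYPE.CKKS,
                 poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()                   # needed for rotations inside dot products

weights = [0.25, -0.5, 0.75, 1.0]            # toy linear model
bias = 0.1
features = [1.0, 2.0, 3.0, 4.0]              # client-side sensitive input

enc_x = ts.ckks_vector(ctx, features)        # one ciphertext packs the whole vector
enc_score = enc_x.dot(weights) + bias        # encrypted linear inference
print(enc_score.decrypt())                   # approx [5.6] after decryption
```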
System logs are a cornerstone of cybersecurity, supporting proactive breach prevention and post-incident investigations. However, analyzing vast amounts of diverse log data remains significantly challenging, as high costs, lack of in-house expertise, and time constraints make even basic analysis difficult for many organizations. This study introduces LLMLogAnalyzer, a clustering-based log analysis chatbot that leverages Large Language Models (LLMs) and Machine Learning (ML) algorithms to simplify and streamline log analysis processes. This innovative approach addresses key LLM limitations, including context window constraints and poor structured text handling capabilities, enabling more effective summarization, pattern extraction, and anomaly detection tasks. LLMLogAnalyzer is evaluated across four distinct domain logs and various tasks. Results demonstrate significant performance improvements over state-of-the-art LLM-based chatbots, including ChatGPT, ChatPDF, and NotebookLM, with consistent gains ranging from 39% to 68% across different tasks. The system also exhibits strong robustness, achieving a 93% reduction in interquartile range (IQR) when using ROUGE-1 scores, indicating significantly lower result variability. The framework's effectiveness stems from its modular architecture comprising a router, log recognizer, log parser, and search tools. This design enhances LLM capabilities for structured text analysis while improving accuracy and robustness, making it a valuable resource for both cybersecurity experts and non-technical users.
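The clustering front-end such a system relies on can be approximated in a few lines: group raw log lines so that only compact cluster representatives need to fit into an LLM's context window. The sketch below uses scikit-learn TF-IDF plus k-means on synthetic log lines and is not LLMLogAnalyzer's actual parser or router.

```python
# Illustrative sketch: cluster raw log lines so that only a few representatives
# per cluster need to be handed to an LLM. Log lines below are synthetic.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

logs = [
    "sshd[2101]: Failed password for root from 10.0.0.5 port 4422",
    "sshd[2188]: Failed password for admin from 10.0.0.9 port 5012",
    "kernel: eth0 link up, 1000Mbps full duplex",
    "kernel: eth1 link up, 100Mbps full duplex",
    "cron[881]: (root) CMD (run-parts /etc/cron.hourly)",
    "cron[990]: (root) CMD (run-parts /etc/cron.daily)",
]

vec = TfidfVectorizer(token_pattern=r"[A-Za-z]+")     # ignore IPs, ports, PIDs
X = vec.fit_transform(logs)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# One representative line per cluster is what would be passed to the LLM.
for cluster in sorted(set(labels)):
    rep = next(line for line, lab in zip(logs, labels) if lab == cluster)
    print(cluster, rep)
```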
We present a security framework that strengthens distributed machine learning by standardizing integrity protections across CPU and GPU platforms and significantly reducing verification overheads. Our approach co-locates integrity verification directly with large ML model execution on GPU accelerators, resolving the fundamental mismatch between how large ML workloads typically run (primarily on GPUs) and how security verifications traditionally operate (as separate CPU-based processes), delivering both immediate performance benefits and long-term architectural consistency. By performing cryptographic operations natively on GPUs using dedicated compute units (e.g., Intel Arc's XMX units, NVIDIA's Tensor Cores), our solution eliminates the architectural bottlenecks that can plague traditional CPU-based verification systems when dealing with large models. The approach leverages the same high memory bandwidth and parallel-processing primitives that power ML workloads on GPUs, ensuring integrity checks keep pace with model execution even for massive models exceeding 100 GB. This framework establishes a common integrity verification mechanism that works consistently across different GPU vendors and hardware configurations. By anticipating future capabilities for creating secure channels between trusted execution environments and GPU accelerators, we provide a hardware-agnostic foundation that enterprise teams can deploy regardless of their underlying CPU and GPU infrastructure.
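For context, the sketch below shows the conventional CPU-side version of such an integrity check: hash each parameter tensor of a PyTorch model with hashlib and compare against a reference manifest. The paper's contribution is performing the equivalent cryptographic work natively on GPU compute units, which this sketch does not attempt.

```python
# Baseline sketch of a model-integrity check: hash every parameter tensor and
# compare against a reference manifest. Hashing here runs on the CPU via hashlib;
# the framework described above would move this work onto the GPU itself.
import hashlib
import torch
import torch.nn as nn

def weight_manifest(model: nn.Module) -> dict:
    """Return {parameter_name: SHA-256 hex digest of its raw bytes}."""
    manifest = {}
    for name, param in model.state_dict().items():
        data = param.detach().cpu().contiguous().numpy().tobytes()
        manifest[name] = hashlib.sha256(data).hexdigest()
    return manifest

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))  # stand-in model
reference = weight_manifest(model)          # computed once at deployment time

# ... later, before or during serving ...
assert weight_manifest(model) == reference, "model weights were modified"
print("integrity check passed for", len(reference), "tensors")
```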
Wide-bandgap (WBG) technologies offer unprecedented improvements in power system efficiency, size, and performance, but they also introduce unique sensor corruption and cybersecurity risks in industrial control systems (ICS), particularly due to high-frequency noise and sophisticated cyber-physical threats. This proof-of-concept (PoC) study demonstrates the adaptation of a noise-driven physically unclonable function (PUF) and machine learning (ML)-assisted anomaly detection framework to the demanding environment of WBG-based ICS sensor pathways. By extracting entropy from unavoidable WBG switching noise (up to 100 kHz) as a PUF source, and simultaneously using this noise as a real-time threat indicator, the proposed system unites hardware-level authentication and anomaly detection. Our approach integrates hybrid ML models with adaptive Bayesian filtering, providing robust, low-latency detection capabilities resilient to both natural electromagnetic interference (EMI) and active adversarial manipulation. Through detailed simulations of WBG modules under benign and attack scenarios, including EMI injection, signal tampering, and node impersonation, we achieve 95% detection accuracy and sub-millisecond processing latency. These results demonstrate the feasibility of physics-driven, dual-use noise exploitation as a scalable ICS defense primitive. Our findings lay the groundwork for next-generation security strategies that leverage inherent device characteristics, bridging hardware and artificial intelligence (AI) for enhanced protection of critical ICS infrastructure.
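A conceptual sketch of the dual use of noise described above, assuming synthetic noise windows: derive PUF-style fingerprint bits from sampled noise and feed the same windows to a scikit-learn IsolationForest anomaly detector. This is illustrative only and not the paper's hybrid ML / Bayesian-filtering pipeline.

```python
# Conceptual sketch of dual-use noise: fingerprint bits plus anomaly detection.
# All signals are synthetic stand-ins for WBG switching-noise windows.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
benign = rng.normal(0.0, 1.0, size=(200, 64))          # benign noise windows
attack = rng.normal(0.0, 1.0, size=(20, 64)) + 3.0     # EMI-injection-like shift

def noise_fingerprint(window: np.ndarray) -> np.ndarray:
    """Toy PUF-style response: sign of pairwise sample differences."""
    return (window[::2] > window[1::2]).astype(np.uint8)

print("fingerprint bits:", noise_fingerprint(benign[0])[:16])

detector = IsolationForest(contamination=0.1, random_state=0).fit(benign)
scores = detector.predict(np.vstack([benign[:5], attack[:5]]))   # +1 benign, -1 anomaly
print(scores)
```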
Data poisoning attacks are a potential threat to machine learning (ML) models, manipulating training datasets to degrade model performance. Existing defenses are mostly designed to mitigate specific poisoning attacks or are tied to particular ML algorithms, and most target deep neural networks or binary classifiers. Traditional multiclass classifiers, however, also need protection from data poisoning attacks, as these models are significant in developing multi-modal applications. This paper therefore proposes SecureLearn, a two-layer, attack-agnostic defense that protects multiclass models from poisoning attacks. It comprises two components: data sanitization and a new feature-oriented adversarial training method. To ascertain the effectiveness of SecureLearn, we propose a 3D evaluation matrix with three orthogonal dimensions: data poisoning attack, data sanitization, and adversarial training. Benchmarking SecureLearn in this 3D matrix, we conduct a detailed analysis at different poisoning levels (10%-20%), examining accuracy, recall, F1-score, detection and correction rates, and false discovery rate. The experiments cover four ML algorithms, namely Random Forest (RF), Decision Tree (DT), Gaussian Naive Bayes (GNB), and Multilayer Perceptron (MLP), trained on three public datasets, against three poisoning attacks, and compared with two existing mitigations. Our results highlight that SecureLearn is effective against the selected attacks. SecureLearn strengthens the resilience and adversarial robustness of traditional multiclass models and neural networks, confirming its generalization beyond algorithm-specific defenses. It consistently maintained accuracy above 90%, and recall and F1-score above 75%. For neural networks, SecureLearn achieved 97% recall and F1-score against all selected poisoning attacks.
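A minimal harness for this kind of evaluation (not SecureLearn itself) can be put together with scikit-learn: poison a training split with random label flips at the 10%-20% levels studied, train a multiclass Random Forest, and report accuracy, macro recall, and macro F1 on a clean test split.

```python
# Minimal evaluation harness: label-flip poisoning of a multiclass training set.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def flip_labels(y, rate, rng):
    """Randomly reassign `rate` of training labels (a simple poisoning model)."""
    y = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y[idx] = rng.integers(0, 10, size=len(idx))
    return y

rng = np.random.default_rng(1)
for rate in (0.10, 0.20):                      # the 10%-20% poisoning levels studied
    y_poisoned = flip_labels(y_tr, rate, rng)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_poisoned)
    pred = clf.predict(X_te)
    print(rate,
          round(accuracy_score(y_te, pred), 3),
          round(recall_score(y_te, pred, average="macro"), 3),
          round(f1_score(y_te, pred, average="macro"), 3))
```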
Authentication systems have evolved considerably since the 1960s, when Fernando Corbato first proposed password-based authentication. In 2013, the FIDO Alliance proposed using secure hardware for authentication, marking a milestone in the passwordless authentication era [1]. Passwordless authentication with a possession-based factor has often relied on hardware-backed cryptographic methods. FIDO2, an amalgamation of the W3C Web Authentication specification and the FIDO Alliance Client to Authenticator Protocol, is an industry standard for secure passwordless authentication with rising adoption [2]. However, the current FIDO2 standards use ECDSA with SHA-256 (ES256), RSA with SHA-256 (RS256), and similar classical cryptographic signature algorithms, which makes them insecure against attacks involving large-scale quantum computers [3]. This study explores the usability of the Module-Lattice-based Digital Signature Algorithm (ML-DSA), based on CRYSTALS-Dilithium, as a post-quantum cryptographic signature standard for FIDO2. The paper compares its performance and security with keys based on classical algorithms.
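A signature round trip with ML-DSA can be sketched with the liboqs-python bindings (the oqs package); the algorithm identifier depends on the installed liboqs version, and the message below is a placeholder rather than a real FIDO2 assertion.

```python
# Sketch of a post-quantum signature round trip with liboqs-python. The algorithm
# identifier depends on the installed liboqs version: newer builds expose
# "ML-DSA-65", older ones "Dilithium3".
import oqs

ALG = "ML-DSA-65"                      # or "Dilithium3" on older liboqs builds
message = b"placeholder authenticator assertion to be signed"

with oqs.Signature(ALG) as signer:
    public_key = signer.generate_keypair()
    signature = signer.sign(message)

with oqs.Signature(ALG) as verifier:
    assert verifier.verify(message, signature, public_key)
    print("ML-DSA signature verified, signature bytes:", len(signature))
```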
Federated Learning (FL) enables collaborative model training across institutions without sharing raw data. However, gradient sharing still risks privacy leakage, for example through gradient inversion attacks. Homomorphic Encryption (HE) can secure aggregation but often incurs prohibitive computational and communication overhead. Existing HE-based FL methods sit at two extremes: encrypting all gradients for full privacy at high cost, or partially encrypting gradients to save resources while exposing vulnerabilities. We present DictPFL, a practical framework that achieves full gradient protection with minimal overhead. DictPFL encrypts every transmitted gradient while keeping non-transmitted parameters local, preserving privacy without heavy computation. It introduces two key modules: Decompose-for-Partial-Encrypt (DePE), which decomposes model weights into a static dictionary and an updatable lookup table, where only the latter is encrypted and aggregated while the static dictionary remains local and requires neither sharing nor encryption; and Prune-for-Minimum-Encrypt (PrME), which applies encryption-aware pruning to minimize encrypted parameters via consistent, history-guided masks. Experiments show that DictPFL reduces communication cost by 402-748$\times$ and accelerates training by 28-65$\times$ compared to fully encrypted FL, while outperforming state-of-the-art selective encryption methods by 51-155$\times$ in overhead and 4-19$\times$ in speed. Remarkably, DictPFL's runtime is within 2$\times$ of plaintext FL, demonstrating, for the first time, that HE-based private federated learning is practical for real-world deployment. The code is publicly available at https://github.com/UCF-ML-Research/DictPFL.
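To make the DePE idea concrete, the toy sketch below factors a weight matrix into a static dictionary kept local and a much smaller table that would be the only part encrypted and transmitted; the SVD-based factorization is purely illustrative and not DictPFL's actual decomposition.

```python
# Toy illustration of the dictionary / lookup-table split: only the small table
# would be encrypted and aggregated; the dictionary never leaves the device.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 128))                 # a layer's weight matrix

rank = 16
U, S, Vt = np.linalg.svd(W, full_matrices=False)
D = U[:, :rank]                                 # static "dictionary": stays local
T = np.diag(S[:rank]) @ Vt[:rank]               # small updatable table: encrypt & send

print("full weights:", W.size, "values")
print("transmitted table:", T.size, "values")   # far fewer ciphertexts needed
print("relative reconstruction error:", np.linalg.norm(W - D @ T) / np.linalg.norm(W))
```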
The integration of machine learning (ML) systems into critical industries such as healthcare, finance, and cybersecurity has transformed decision-making processes, but it also brings new challenges around trust, security, and accountability. As AI systems become more ubiquitous, ensuring the transparency and correctness of AI-driven decisions is crucial, especially when they have direct consequences on privacy, security, or fairness. Verifiable AI, powered by Zero-Knowledge Machine Learning (zkML), offers a robust solution to these challenges. zkML enables the verification of AI model inferences without exposing sensitive data, providing an essential layer of trust and privacy. However, traditional zkML systems typically require deep cryptographic expertise, placing them beyond the reach of most ML engineers. In this paper, we introduce JSTprove, a specialized zkML toolkit, built on Polyhedra Network's Expander backend, to enable AI developers and ML engineers to generate and verify proofs of AI inference. JSTprove provides an end-to-end verifiable AI inference pipeline that hides cryptographic complexity behind a simple command-line interface while exposing auditable artifacts for reproducibility. We present the design, innovations, and real-world use cases of JSTprove as well as our blueprints and tooling to encourage community review and extension. JSTprove therefore serves both as a usable zkML product for current engineering needs and as a reproducible foundation for future research and production deployments of verifiable AI.
Program analysis tools often produce large volumes of candidate vulnerability reports that require costly manual review, creating a practical challenge: how can security analysts prioritize the reports most likely to be true vulnerabilities? This paper investigates whether machine learning can be applied to prioritizing vulnerabilities reported by program analysis tools. We focus on Node.js packages and collect a benchmark of 1,883 Node.js packages, each containing one reported ACE or ACI vulnerability. We evaluate a variety of machine learning approaches, including classical models, graph neural networks (GNNs), large language models (LLMs), and hybrid models that combine GNNs and LLMs, trained on data derived from a dynamic program analysis tool's output. The top LLM achieves $F_{1} {=} 0.915$, while the best GNN and classical ML models reach $F_{1} {=} 0.904$. At a false-negative rate below 7%, the leading model eliminates 66.9% of benign packages from manual review, taking around 60 ms per package. If the best model is tuned to operate at a precision level of 0.8 (i.e., allowing 20% false positives amongst all warnings), our approach can detect 99.2% of exploitable taint flows while missing only 0.8%, demonstrating strong potential for real-world vulnerability triage.
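The precision-targeted operating point described above can be selected with a standard precision-recall sweep; the sketch below uses scikit-learn on synthetic scores (not the paper's models or data) to find the lowest threshold meeting a 0.8 precision target and the recall retained there.

```python
# Sketch of the triage-policy step: pick a score threshold that meets a target
# precision and report the recall retained. Scores below are synthetic.
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=2000)                              # 1 = real vulnerability
scores = np.clip(y_true * 0.6 + rng.normal(0.3, 0.2, 2000), 0, 1)   # model confidence

precision, recall, thresholds = precision_recall_curve(y_true, scores)
target = 0.8
ok = precision[:-1] >= target                # precision/recall are one longer than thresholds
threshold = thresholds[ok][0]                # lowest threshold meeting the target
print("threshold:", round(float(threshold), 3),
      "recall at that precision:", round(float(recall[:-1][ok][0]), 3))
```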
Quantum computing is reshaping the security landscape of modern telecommunications. The cryptographic foundations that secure today's 5G systems, including RSA, Elliptic Curve Cryptography (ECC), and Diffie-Hellman (DH), are all susceptible to attacks enabled by Shor's algorithm. Protecting 5G networks against future quantum adversaries has therefore become an urgent engineering and research priority. In this paper we introduce QORE, a quantum-secure 5G and Beyond 5G (B5G) Core framework that provides a clear pathway for transitioning both the 5G Core Network Functions and User Equipment (UE) to Post-Quantum Cryptography (PQC). The framework uses the NIST-standardized lattice-based algorithms, the Module-Lattice Key Encapsulation Mechanism (ML-KEM) and the Module-Lattice Digital Signature Algorithm (ML-DSA), and applies them across the 5G Service-Based Architecture (SBA). A Hybrid PQC (HPQC) configuration is also proposed, combining classical and quantum-safe primitives to maintain interoperability during migration. Experimental validation shows that ML-KEM achieves quantum security with minor performance overhead, meeting the low-latency and high-throughput requirements of carrier-grade 5G systems. The proposed roadmap aligns with ongoing 3GPP SA3 and SA5 study activities on the security and management of post-quantum networks, as well as with NIST PQC standardization efforts, providing practical guidance for mitigating quantum-era risks while safeguarding the long-term confidentiality and integrity of network data.
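A minimal ML-KEM key-establishment round trip, the kind of primitive such a framework would deploy between core network functions, can be sketched with the liboqs-python bindings; the algorithm identifier depends on the liboqs version, and a hybrid (HPQC) deployment would additionally mix a classical ECDH secret into the derived key.

```python
# Sketch of an ML-KEM encapsulation/decapsulation round trip with liboqs-python.
# Identifier depends on the liboqs version: "ML-KEM-768" in newer builds,
# "Kyber768" in older ones.
import oqs

ALG = "ML-KEM-768"                     # or "Kyber768" on older liboqs builds

with oqs.KeyEncapsulation(ALG) as receiver:
    public_key = receiver.generate_keypair()            # e.g. held by an NF's TLS endpoint
    with oqs.KeyEncapsulation(ALG) as sender:
        ciphertext, shared_secret_tx = sender.encap_secret(public_key)
    shared_secret_rx = receiver.decap_secret(ciphertext)

assert shared_secret_tx == shared_secret_rx
print("shared secret bytes:", len(shared_secret_rx))
```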
The telecommunications industry faces a dual transformation: the architectural shift toward Open Radio Access Networks (O-RAN) and the emerging threat from quantum computing. O-RAN's disaggregated, multi-vendor architecture creates a larger attack surface vulnerable to cryptanalytically relevant quantum computers (CRQCs) that will break current public-key cryptography. The Harvest Now, Decrypt Later (HNDL) attack strategy makes this threat immediate, as adversaries can intercept encrypted data today for future decryption. This paper presents Q-RAN, a comprehensive quantum-resistant security framework for O-RAN networks using NIST-standardized Post-Quantum Cryptography (PQC). We detail the implementation of ML-KEM (FIPS 203) and ML-DSA (FIPS 204), integrated with Quantum Random Number Generators (QRNG) for cryptographic entropy. The solution deploys PQ-IPsec, PQ-DTLS, and PQ-mTLS protocols across all O-RAN interfaces, anchored by a centralized Post-Quantum Certificate Authority (PQ-CA) within the SMO framework. This work provides a complete roadmap for securing disaggregated O-RAN ecosystems against quantum adversaries.
Adversarial attacks pose significant challenges to Machine Learning (ML) systems, and especially Deep Neural Networks (DNNs), by subtly manipulating inputs to induce incorrect predictions. This paper investigates whether increasing the layer depth of deep neural networks affects their robustness against adversarial attacks in the Network Intrusion Detection System (NIDS) domain. We compare the adversarial robustness of various deep neural networks across both the NIDS and computer vision domains (the latter being widely used in adversarial attack experiments). Our experimental results reveal that in the NIDS domain, adding more layers does not necessarily improve performance, and may in fact significantly degrade robustness against adversarial attacks. Conversely, in the computer vision domain, adding more layers has a more modest impact on robustness. These findings can guide the development of robust neural networks for NIDS applications and highlight the unique characteristics of network security domains within the ML landscape.
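The robustness probes in such experiments typically start from gradient-based evasion attacks; the sketch below applies a single FGSM step to a stand-in PyTorch classifier with random weights and random flow-feature vectors, purely to illustrate the attack mechanics rather than reproduce the paper's NIDS setup.

```python
# Minimal FGSM sketch. Model and data are random stand-ins, not a NIDS dataset.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 20, requires_grad=True)      # batch of flow-feature vectors
y = torch.randint(0, 2, (32,))                   # benign / malicious labels

loss = loss_fn(model(x), y)
loss.backward()
x_adv = (x + 0.1 * x.grad.sign()).detach()       # FGSM step with epsilon = 0.1

clean_acc = (model(x).argmax(1) == y).float().mean().item()
adv_acc = (model(x_adv).argmax(1) == y).float().mean().item()
print(f"accuracy clean={clean_acc:.2f} adversarial={adv_acc:.2f}")
```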
As the volume of stored data continues to grow, identifying and protecting sensitive information within large repositories becomes increasingly challenging, especially when data is shared with multiple users holding different roles and permissions. This work presents a system architecture for trusted data sharing with policy-driven access control, enabling selective protection of sensitive regions while maintaining scalability. The proposed architecture integrates four core modules that combine automated detection of sensitive regions, post-correction, key management, and access control. Sensitive regions are secured using a hybrid scheme that employs symmetric encryption for efficiency and Attribute-Based Encryption for policy enforcement. The system supports efficient key distribution and isolates key storage to strengthen overall security. To demonstrate its applicability, we evaluate the system on visual datasets, where Privacy-Sensitive Objects (PSOs) in images are automatically detected, reassessed, and selectively encrypted prior to sharing in a data repository. Experimental results show that our system provides effective PSO detection, increases the macro-averaged F1 score by 5% and mean Average Precision by 10%, and maintains an average policy-enforced decryption time of less than 1 second per image. These results demonstrate the effectiveness, efficiency, and scalability of our proposed solution for fine-grained access control.
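The hybrid protection step can be sketched as follows, assuming a detected bounding box: the region's pixels are encrypted with AES-GCM from the cryptography package, while the wrapping of the symmetric key under an Attribute-Based Encryption policy is omitted (ABE is not available in standard libraries); the image and box here are stand-ins.

```python
# Sketch of selective region encryption with a fast symmetric cipher (AES-GCM).
# The symmetric key would then be wrapped under an ABE policy (not shown).
import os
import numpy as np
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

image = np.zeros((480, 640, 3), dtype=np.uint8)          # stand-in image
x0, y0, x1, y1 = 100, 120, 220, 260                      # detected sensitive box

region = image[y0:y1, x0:x1].copy()
key = AESGCM.generate_key(bit_length=256)                # later wrapped with ABE
nonce = os.urandom(12)
ciphertext = AESGCM(key).encrypt(nonce, region.tobytes(), None)

image[y0:y1, x0:x1] = 0                                  # blank the region in the shared copy
print("encrypted region bytes:", len(ciphertext))

# Authorized decryption restores the pixels in place.
restored = np.frombuffer(AESGCM(key).decrypt(nonce, ciphertext, None),
                         dtype=np.uint8).reshape(region.shape)
image[y0:y1, x0:x1] = restored
```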
Problem space: AI vulnerabilities and quantum threats. Generative AI models face vulnerabilities such as model inversion, data poisoning, and adversarial inputs, while quantum threats such as Shor's algorithm break RSA and ECC encryption. The challenge is to secure generative AI models against both classical and quantum cyberattacks. Proposed solution: a collaborative penetration testing suite with five integrated components: DAST/SAST using OWASP ZAP, Burp Suite, SonarQube, and Fortify; IAST via Contrast Assess integrated with the CI/CD pipeline; blockchain logging on Hyperledger Fabric for tamper-proof logs; quantum-resistant cryptography based on lattice-based RLWE protocols; and AI red-team simulations covering adversarial ML and quantum-assisted attacks. An integration layer provides a unified workflow for AI, cybersecurity, and quantum experts. Key results: more than 300 vulnerabilities identified across test environments, a 70% reduction in high-severity issues within 2 weeks, 90% resolution efficiency for blockchain-logged vulnerabilities, and quantum-resistant cryptography that maintained 100% integrity in tests. Outcome: a Quantum AI Security Protocol integrating blockchain logging, quantum-resistant cryptography, and AI red teaming.
RISC-V processors are becoming ubiquitous in critical applications, but their susceptibility to microarchitectural side-channel attacks is a serious concern. Detection of microarchitectural attacks on RISC-V is an emerging research topic that remains relatively underexplored compared to x86 and ARM. The first line of work to detect flush+fault-based microarchitectural attacks on RISC-V leverages Machine Learning (ML) models, yet it leaves several practical aspects in need of further investigation. To address these overlooked issues, we leverage gem5 and propose a new detection method combining statistical preprocessing and association rule mining, with reconfiguration capabilities that generalize the detection method to any microarchitectural attack. A performance comparison with the state of the art reveals that the proposed detection method achieves up to a 5.15% increase in accuracy, a 7% rise in precision, and a 3.91% improvement in recall under cryptographic, computational, and memory-intensive workloads, alongside the flexibility to detect a new variant of the flush+fault attack. Moreover, as the attack detection relies on association rules, their human-interpretable nature provides deep insight into microarchitectural behavior during the execution of attack and benign applications.
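The association-rule stage can be illustrated with the mlxtend library on a handful of synthetic, thresholded hardware-performance-counter observations; the feature names and data below are made up, but they show why the mined rules remain human-interpretable.

```python
# Sketch of rule mining over binarized (thresholded) hardware-counter features.
# The features and observation windows below are synthetic.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Each row: one observation window with thresholded (True/False) HPC features.
df = pd.DataFrame({
    "high_llc_misses":  [True, True, True, False, True, False, True, True],
    "high_flush_count": [True, True, True, False, False, False, True, True],
    "high_branch_miss": [False, True, False, True, False, True, False, False],
    "attack_window":    [True, True, True, False, False, False, True, True],
})

itemsets = apriori(df, min_support=0.4, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.9)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```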
Mobile apps often embed authentication secrets, such as API keys, tokens, and client IDs, to integrate with cloud services. However, developers often hardcode these credentials into Android apps, exposing them to extraction through reverse engineering. Once compromised, adversaries can exploit secrets to access sensitive data, manipulate resources, or abuse APIs, resulting in significant security and financial risks. Existing detection approaches, such as regex-based analysis, static analysis, and machine learning, are effective for identifying known patterns but are fundamentally limited: they require prior knowledge of credential structures, API signatures, or training data. In this paper, we propose SecretLoc, an LLM-based approach for detecting hardcoded secrets in Android apps. SecretLoc goes beyond pattern matching; it leverages contextual and structural cues to identify secrets without relying on predefined patterns or labeled training sets. Using a benchmark dataset from the literature, we demonstrate that SecretLoc detects secrets missed by regex-, static-, and ML-based methods, including previously unseen types of secrets. In total, we discovered 4,828 secrets that existing approaches missed, spanning more than 10 "new" types of secrets, such as OpenAI API keys, GitHub access tokens, RSA private keys, and JWT tokens. We further extend our analysis to newly crawled apps from Google Play, where we uncovered and responsibly disclosed additional hardcoded secrets. Across a set of 5,000 apps, we detected secrets in 2,124 apps (42.5%), several of which were confirmed and remediated by developers after we contacted them. Our results reveal a dual-use risk: if analysts can uncover these secrets with LLMs, so can attackers. This underscores the urgent need for proactive secret management and stronger mitigation practices across the mobile ecosystem.
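As a rough illustration of context-aware detection (not SecretLoc's actual prompts or pipeline), the sketch below sends a decompiled-code snippet containing a fake credential to an LLM via the openai client package; the model name is a placeholder and the key string is not a real secret.

```python
# Illustrative context-aware prompt for secret detection. Not SecretLoc.
# Assumes the `openai` package; the model name is a placeholder and the
# credential string below is fake.
from openai import OpenAI

snippet = '''
public class ApiConfig {
    static final String BASE_URL = "https://api.example.com/v1";
    static final String AUTH = "sk-test-9f8a7b6c5d4e3f2a1b0c";  // fake value
}
'''

prompt = (
    "You are reviewing decompiled Android code. List any hardcoded "
    "authentication secrets (API keys, tokens, private keys) with the field "
    "name and why you believe it is a secret. Code:\n" + snippet
)

client = OpenAI()                                   # reads OPENAI_API_KEY from env
response = client.chat.completions.create(
    model="gpt-4o-mini",                            # placeholder model choice
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```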
The impending adoption of Open Radio Access Network (O-RAN) is fueling innovation in the RAN towards data-driven operation. Unlike traditional RAN, where RAN data and its usage are restricted to proprietary and monolithic RAN equipment, the O-RAN architecture opens up access to RAN data via RAN intelligent controllers (RICs) to third-party machine learning (ML) powered applications, rApps and xApps, in order to optimize RAN operations. Consequently, a major focus has been placed on leveraging RAN data to unlock greater efficiency gains. However, there is increasing recognition that RAN data access by apps could become a source of vulnerability and be exploited by malicious actors. Motivated by this, we carry out a comprehensive investigation of data vulnerabilities on both xApps and rApps, respectively hosted in the Near- and Non-real-time (RT) RIC components of O-RAN. We qualitatively analyse the O-RAN security mechanisms and their limitations for xApps and rApps, and consider a threat model informed by this analysis. We design a viable and effective black-box evasion attack strategy targeting O-RAN RIC apps while accounting for stringent timing constraints and attack effectiveness. The strategy employs four key techniques: a model cloning algorithm, input-specific perturbations, universal adversarial perturbations (UAPs), and targeted UAPs. It targets ML models used by both xApps and rApps within the O-RAN system, aiming to degrade network performance. We validate the effectiveness of the designed evasion attack strategy and quantify the scale of performance degradation using a real-world O-RAN testbed and emulation environments. Evaluation is conducted using the Interference Classification xApp and the Power Saving rApp as representatives of the near-RT and non-RT RICs. We also show that the attack strategy is effective against prominent defense techniques for adversarial ML.
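One of the four techniques, the universal adversarial perturbation, can be sketched as a single perturbation optimized across a whole batch; the PyTorch toy below uses a random stand-in classifier and synthetic features rather than the Interference Classification xApp.

```python
# Toy sketch of crafting a universal adversarial perturbation (UAP) against a
# stand-in classifier. Model weights and input features are random placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(256, 32)                    # synthetic input features
y = torch.randint(0, 4, (256,))
epsilon, step, iters = 0.2, 0.05, 50

uap = torch.zeros(32)                       # one perturbation reused for every input
for _ in range(iters):
    delta = uap.clone().requires_grad_(True)
    loss = loss_fn(model(X + delta), y)     # maximize loss over the whole batch
    loss.backward()
    uap = torch.clamp(uap + step * delta.grad.sign(), -epsilon, epsilon)

acc = lambda inp: (model(inp).argmax(1) == y).float().mean().item()
print(f"accuracy clean={acc(X):.2f} with UAP={acc(X + uap):.2f}")
```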