Loading...
Loading...
Browse, search and filter the latest cybersecurity research papers from arXiv
There is an increase in global malware threats. To address this, an encryption-type ransomware has been introduced on the Android operating system. The challenges associated with malicious threats in phone use have become a pressing issue in mobile communication, disrupting user experiences and posing significant privacy threats. This study surveys commonly used machine learning techniques for detecting malicious threats in phones and examines their performance. The majority of past research focuses on customer feedback and reviews, with concerns that people might create false reviews to promote or devalue products and services for personal gain. Hence, the development of techniques for detecting malicious threats using machine learning has been a key focus. This paper presents a comprehensive comparative study of current research on the issue of malicious threats and methods for tackling these challenges. Nevertheless, a huge amount of information is required by these methods, presenting a challenge for developing robust, specialized automated anti-malware systems. This research describes the Android Applications dataset, and the accuracy of the techniques is measured using the accuracy levels of the metrics employed in this study.
Cyber attacks threaten economic interests, critical infrastructure, and public health and safety. To counter this, entities adopt cyber threat hunting, a proactive approach that involves formulating hypotheses and searching for attack patterns within organisational networks. Automating cyber threat hunting presents challenges, particularly in generating hypotheses, as it is a manually created and confirmed process, making it time-consuming. To address these challenges, we introduce APThreatHunter, an automated threat hunting solution that generates hypotheses with minimal human intervention, eliminating analyst bias and reducing time and cost. This is done by presenting possible risks based on the system's current state and a set of indicators to indicate whether any of the detected risks are happening or not. We evaluated APThreatHunter using real-world Android malware samples, and the results revealed the practicality of using automated planning for goal hypothesis generation in cyber threat hunting activities.
Cryptocurrency blockchain networks safeguard digital assets using cryptographic keys, with wallets playing a critical role in generating, storing, and managing these keys. Wallets, typically categorized as hot and cold, offer varying degrees of security and convenience. However, they are generally software-based applications running on microcontrollers. Consequently, they are vulnerable to malware and side-channel attacks, allowing perpetrators to extract private keys by targeting critical algorithms, such as ECC, which processes private keys to generate public keys and authorize transactions. To address these issues, this work presents EthVault, the first hardware architecture for an Ethereum hierarchically deterministic cold wallet, featuring hardware implementations of key algorithms for secure key generation. Also, an ECC architecture resilient to side-channel and timing attacks is proposed. Moreover, an architecture of the child key derivation function, a fundamental component of cryptocurrency wallets, is proposed. The design minimizes resource usage, meeting market demand for small, portable cryptocurrency wallets. FPGA implementation results validate the feasibility of the proposed approach. The ECC architecture exhibits uniform execution behavior across varying inputs, while the complete design utilizes only 27%, 7%, and 6% of LUTs, registers, and RAM blocks, respectively, on a Xilinx Zynq UltraScale+ FPGA.
This survey systematizes the evolution of network intrusion detection systems (NIDS), from conventional methods such as signature-based and neural network (NN)-based approaches to recent integrations with large language models (LLMs). It clearly and concisely summarizes the current status, strengths, and limitations of conventional techniques, and explores the practical benefits of integrating LLMs into NIDS. Recent research on the application of LLMs to NIDS in diverse environments is reviewed, including conventional network infrastructures, autonomous vehicle environments and IoT environments. From this survey, readers will learn that: 1) the earliest methods, signature-based IDSs, continue to make significant contributions to modern systems, despite their well-known weaknesses; 2) NN-based detection, although considered promising and under development for more than two decades, and despite numerous related approaches, still faces significant challenges in practical deployment; 3) LLMs are useful for NIDS in many cases, and a number of related approaches have been proposed; however, they still face significant challenges in practical applications. Moreover, they can even be exploited as offensive tools, such as for generating malware, crafting phishing messages, or launching cyberattacks. Recently, several studies have been proposed to address these challenges, which are also reviewed in this survey; and 4) strategies for constructing domain-specific LLMs have been proposed and are outlined in this survey, as it is nearly impossible to train a NIDS-specific LLM from scratch.
Large language models (LLMs) remain vulnerable to sophisticated prompt engineering attacks that exploit contextual framing to bypass safety mechanisms, posing significant risks in cybersecurity applications. We introduce Jailbreak Mimicry, a systematic methodology for training compact attacker models to automatically generate narrative-based jailbreak prompts in a one-shot manner. Our approach transforms adversarial prompt discovery from manual craftsmanship into a reproducible scientific process, enabling proactive vulnerability assessment in AI-driven security systems. Developed for the OpenAI GPT-OSS-20B Red-Teaming Challenge, we use parameter-efficient fine-tuning (LoRA) on Mistral-7B with a curated dataset derived from AdvBench, achieving an 81.0% Attack Success Rate (ASR) against GPT-OSS-20B on a held-out test set of 200 items. Cross-model evaluation reveals significant variation in vulnerability patterns: our attacks achieve 66.5% ASR against GPT-4, 79.5% on Llama-3 and 33.0% against Gemini 2.5 Flash, demonstrating both broad applicability and model-specific defensive strengths in cybersecurity contexts. This represents a 54x improvement over direct prompting (1.5% ASR) and demonstrates systematic vulnerabilities in current safety alignment approaches. Our analysis reveals that technical domains (Cybersecurity: 93% ASR) and deception-based attacks (Fraud: 87.8% ASR) are particularly vulnerable, highlighting threats to AI-integrated threat detection, malware analysis, and secure systems, while physical harm categories show greater resistance (55.6% ASR). We employ automated harmfulness evaluation using Claude Sonnet 4, cross-validated with human expert assessment, ensuring reliable and scalable evaluation for cybersecurity red-teaming. Finally, we analyze failure mechanisms and discuss defensive strategies to mitigate these vulnerabilities in AI for cybersecurity.
Reverse engineering (RE) of x86 binaries is indispensable for malware and firmware analysis, but remains slow due to stripped metadata and adversarial obfuscation. Large Language Models (LLMs) offer potential for improving RE efficiency through automated comprehension and commenting, but cloud-hosted, closed-weight models pose privacy and security risks and cannot be used in closed-network facilities. We evaluate parameter-efficient fine-tuned local LLMs for assisting with x86 RE tasks in these settings. Eight open-weight models across the CodeLlama, Qwen2.5-Coder, and CodeGemma series are fine-tuned on a custom curated dataset of 5,981 x86 assembly examples. We evaluate them quantitatively and identify the fine-tuned Qwen2.5-Coder-7B as the top performer, which we name REx86. REx86 reduces test-set cross-entropy loss by 64.2% and improves semantic cosine similarity against ground truth by 20.3\% over its base model. In a limited user case study (n=43), REx86 significantly enhanced line-level code understanding (p = 0.031) and increased the correct-solve rate from 31% to 53% (p = 0.189), though the latter did not reach statistical significance. Qualitative analysis shows more accurate, concise comments with fewer hallucinations. REx86 delivers state-of-the-art assistance in x86 RE among local, open-weight LLMs. Our findings demonstrate the value of domain-specific fine-tuning, and highlight the need for more commented disassembly data to further enhance LLM performance in RE. REx86, its dataset, and LoRA adapters are publicly available at https://github.com/dlea8/REx86 and https://zenodo.org/records/15420461.
Pretrained deep learning model sharing holds tremendous value for researchers and enterprises alike. It allows them to apply deep learning by fine-tuning models at a fraction of the cost of training a brand-new model. However, model sharing exposes end-users to cyber threats that leverage the models for malicious purposes. Attackers can use model sharing by hiding self-executing malware inside neural network parameters and then distributing them for unsuspecting users to unknowingly directly execute them, or indirectly as a dependency in another software. In this work, we propose NeuPerm, a simple yet effec- tive way of disrupting such malware by leveraging the theoretical property of neural network permutation symmetry. Our method has little to no effect on model performance at all, and we empirically show it successfully disrupts state-of-the-art attacks that were only previously addressed using quantization, a highly complex process. NeuPerm is shown to work on LLMs, a feat that no other previous similar works have achieved. The source code is available at https://github.com/danigil/NeuPerm.git.
The Internet of Things (IoT) has recently proliferated in both size and complexity. Using multi-source and heterogeneous IoT data aids in providing efficient data analytics for a variety of prevalent and crucial applications. To address the privacy and security concerns raised by analyzing IoT data locally or in the cloud, distributed data analytics techniques were proposed to collect and analyze data in edge or fog devices. In this context, federated learning has been recommended as an ideal distributed machine/deep learning-based technique for edge/fog computing environments. Additionally, the data analytics results are time-sensitive; they should be generated with minimal latency and high reliability. As a result, reusing efficient architectures validated through a high number of challenging test cases would be advantageous. The work proposed here presents a solution using a microservices-based architecture that allows an IoT application to be structured as a collection of fine-grained, loosely coupled, and reusable entities. The proposed solution uses the promising capabilities of federated learning to provide intelligent microservices that ensure efficient, flexible, and extensible data analytics. This solution aims to deliver cloud calculations to the edge to reduce latency and bandwidth congestion while protecting the privacy of exchanged data. The proposed approach was validated through an IoT-malware detection and classification use case. MaleVis, a publicly available dataset, was used in the experiments to analyze and validate the proposed approach. This dataset included more than 14,000 RGB-converted images, comprising 25 malware classes and one benign class. The results showed that our proposed approach outperformed existing state-of-the-art methods in terms of detection and classification performance, with a 99.24%.
Web-based behavior-manipulation attacks (BMAs) - such as scareware, fake software downloads, tech support scams, etc. - are a class of social engineering (SE) attacks that exploit human decision-making vulnerabilities. These attacks remain under-studied compared to other attacks such as information harvesting attacks (e.g., phishing) or malware infections. Prior technical work has primarily focused on measuring BMAs, offering little in the way of generic defenses. To address this gap, we introduce Pixel Patrol 3D (PP3D), the first end-to-end browser framework for discovering, detecting, and defending against behavior-manipulating SE attacks in real time. PP3D consists of a visual detection model implemented within a browser extension, which deploys the model client-side to protect users across desktop and mobile devices while preserving privacy. Our evaluation shows that PP3D can achieve above 99% detection rate at 1% false positives, while maintaining good latency and overhead performance across devices. Even when faced with new BMA samples collected months after training the detection model, our defense system can still achieve above 97% detection rate at 1% false positives. These results demonstrate that our framework offers a practical, effective, and generalizable defense against a broad and evolving class of web behavior-manipulation attacks.
Host-based cryptomining malware, commonly known as cryptojackers, have gained notoriety for their stealth and the significant financial losses they cause in Linux-based cloud environments. Existing solutions often struggle with scalability due to high monitoring overhead, low detection accuracy against obfuscated behavior, and lack of integrated remediation. We present CryptoGuard, a lightweight hybrid solution that combines detection and remediation strategies to counter cryptojackers. To ensure scalability, CryptoGuard uses sketch- and sliding window-based syscall monitoring to collect behavior patterns with minimal overhead. It decomposes the classification task into a two-phase process, leveraging deep learning models to identify suspicious activity with high precision. To counter evasion techniques such as entry point poisoning and PID manipulation, CryptoGuard integrates targeted remediation mechanisms based on eBPF, a modern Linux kernel feature deployable on any compatible host. Evaluated on 123 real-world cryptojacker samples, it achieves average F1-scores of 96.12% and 92.26% across the two phases, and outperforms state-of-the-art baselines in terms of true and false positive rates, while incurring only 0.06% CPU overhead per host.
The rapidly evolving Android malware ecosystem demands high-quality, real-time datasets as a foundation for effective detection and defense. With the widespread adoption of mobile devices across industrial systems, they have become a critical yet often overlooked attack surface in industrial cybersecurity. However, mainstream datasets widely used in academia and industry (e.g., Drebin) exhibit significant limitations: on one hand, their heavy reliance on VirusTotal's multi-engine aggregation results introduces substantial label noise; on the other hand, outdated samples reduce their temporal relevance. Moreover, automated labeling tools (e.g., AVClass2) suffer from suboptimal aggregation strategies, further compounding labeling errors and propagating inaccuracies throughout the research community.
Dynamic program analysis is invaluable for malware detection, debugging, and performance profiling. However, software-based instrumentation incurs high overhead and can be evaded by anti-analysis techniques. In this paper, we propose LibIHT, a hardware-assisted tracing framework that leverages on-CPU branch tracing features (Intel Last Branch Record and Branch Trace Store) to efficiently capture program control-flow with minimal performance impact. Our approach reconstructs control-flow graphs (CFGs) by collecting hardware generated branch execution data in the kernel, preserving program behavior against evasive malware. We implement LibIHT as an OS kernel module and user-space library, and evaluate it on both benign benchmark programs and adversarial anti-instrumentation samples. Our results indicate that LibIHT reduces runtime overhead by over 150x compared to Intel Pin (7x vs 1,053x slowdowns), while achieving high fidelity in CFG reconstruction (capturing over 99% of execution basic blocks and edges). Although this hardware-assisted approach sacrifices the richer semantic detail available from full software instrumentation by capturing only branch addresses, this trade-off is acceptable for many applications where performance and low detectability are paramount. Our findings show that hardware-based tracing captures control flow information significantly faster, reduces detection risk and performs dynamic analysis with minimal interference.
Malicious software attacks are having an increasingly significant economic impact. Commercial malware detection software can be costly, and tools that attribute malware to the specific software vulnerabilities it exploits are largely lacking. Understanding the connection between malware and the vulnerabilities it targets is crucial for analyzing past threats and proactively defending against current ones. In this study, we propose an approach that leverages large language models (LLMs) to detect binary malware, specifically within JAR files, and utilizes the capabilities of LLMs combined with retrieval-augmented generation (RAG) to identify Common Vulnerabilities and Exposures (CVEs) that malware may exploit. We developed a proof-of-concept tool called MalCVE, which integrates binary code decompilation, deobfuscation, LLM-based code summarization, semantic similarity search, and CVE classification using LLMs. We evaluated MalCVE using a benchmark dataset of 3,839 JAR executables. MalCVE achieved a mean malware detection accuracy of 97%, at a fraction of the cost of commercial solutions. It is also the first tool to associate CVEs with binary malware, achieving a recall@10 of 65%, which is comparable to studies that perform similar analyses on source code.
According to a recent EUROPOL report, cybercrime is still recurrent in Europe, and different activities and countermeasures must be taken to limit, prevent, detect, analyze, and fight it. Cybercrime must be prevented with specific measures, tools, and techniques, for example through automated network and malware analysis. Countermeasures against cybercrime can also be improved with proper \df analysis in order to extract data from digital devices trying to retrieve information on the cybercriminals. Indeed, results obtained through a proper \df analysis can be leveraged to train cybercrime detection systems to prevent the success of similar crimes. Nowadays, some systems have started to adopt Artificial Intelligence (AI) algorithms for cyberattack detection and \df analysis improvement. However, AI can be better applied as an additional instrument in these systems to improve the detection and in the \df analysis. For this reason, we highlight how cybercrime analysis and \df procedures can take advantage of AI. On the other hand, cybercriminals can use these systems to improve their skills, bypass automatic detection, and develop advanced attack techniques. The case study we presented highlights how it is possible to integrate the use of the three popular chatbots {\tt Gemini}, {\tt Copilot} and {\tt chatGPT} to develop a Python code to encode and decoded images with steganographic technique, even though their presence is not an indicator of crime, attack or maliciousness but used by a cybercriminal as anti-forensics technique.
Mobile app markets host millions of apps, yet undesired behaviors (e.g., disruptive ads, illegal redirection, payment deception) remain hard to catch because they often do not rely on permission-protected APIs and can be easily camouflaged via UI or metadata edits. We present BINCTX, a learning approach that builds multi-modal representations of an app from (i) a global bytecode-as-image view that captures code-level semantics and family-style patterns, (ii) a contextual view (manifested actions, components, declared permissions, URL/IP constants) indicating how behaviors are triggered, and (iii) a third-party-library usage view summarizing invocation frequencies along inter-component call paths. The three views are embedded and fused to train a contextual-aware classifier. On real-world malware and benign apps, BINCTX attains a macro F1 of 94.73%, outperforming strong baselines by at least 14.92%. It remains robust under commercial obfuscation (F1 84% post-obfuscation) and is more resistant to adversarial samples than state-of-the-art bytecode-only systems.
Indoor robotic systems within Cyber-Physical Systems (CPS) are increasingly exposed to Denial of Service (DoS) attacks that compromise localization, control and telemetry integrity. We propose a privacy-aware malware detection framework for indoor robotic systems, which leverages hybrid quantum computing and deep neural networks to counter DoS threats in CPS, while preserving privacy information. By integrating quantum-enhanced feature encoding with dropout-optimized deep learning, our architecture achieves up to 95.2% detection accuracy under privacy-constrained conditions. The system operates without handcrafted thresholds or persistent beacon data, enabling scalable deployment in adversarial environments. Benchmarking reveals robust generalization, interpretability and resilience against training instability through modular circuit design. This work advances trustworthy AI for secure, autonomous CPS operations.
Over the last decade, machine learning has been extensively applied to identify malicious Android applications. However, such approaches remain vulnerable against adversarial examples, i.e., examples that are subtly manipulated to fool a machine learning model into making incorrect predictions. This research presents DeepTrust, a novel metaheuristic that arranges flexible classifiers, like deep neural networks, into an ordered sequence where the final decision is made by a single internal model based on conditions activated in cascade. In the Robust Android Malware Detection competition at the 2025 IEEE Conference SaTML, DeepTrust secured the first place and achieved state-of-the-art results, outperforming the next-best competitor by up to 266% under feature-space evasion attacks. This is accomplished while maintaining the highest detection rate on non-adversarial malware and a false positive rate below 1%. The method's efficacy stems from maximizing the divergence of the learned representations among the internal models. By using classifiers inducing fundamentally dissimilar embeddings of the data, the decision space becomes unpredictable for an attacker. This frustrates the iterative perturbation process inherent to evasion attacks, enhancing system robustness without compromising accuracy on clean examples.
Malicious URLs pose significant security risks as they facilitate phishing attacks, distribute malware, and empower attackers to deface websites. Blacklist detection methods fail to identify new or obfuscated URLs because they depend on pre-existing patterns. This work presents a hybrid deep learning model named GNN-GAT-LSTM that combines Graph Neural Networks (GNNs) with Graph Attention Networks (GATs) and Long Short-Term Memory (LSTM) networks. The proposed architecture extracts both the structural and sequential patterns of the features from data. The model transforms URLs into graphs through a process where characters become nodes that connect through edges. It applies one-hot encoding to represent node features. The model received training and testing data from a collection of 651,191 URLs, which were classified into benign, phishing, defacement, and malware categories. The preprocessing stage included both feature engineering and data balancing techniques, which addressed the class imbalance issue to enhance model learning. The GNN-GAT-LSTM model achieved outstanding performance through its test accuracy of 0.9806 and its weighted F1-score of 0.9804. It showed excellent precision and recall performance across most classes, particularly for benign and defacement URLs. Overall, the model provides an efficient and scalable system for detecting malicious URLs while demonstrating strong potential for real-world cybersecurity applications.