Powerful machine learning (ML) models are now readily available online, which creates exciting possibilities for users who lack the deep technical expertise or substantial computing resources needed to develop them. At the same time, such an open ecosystem comes with many risks. In this paper, we argue that the current ecosystem for open ML models contains significant supply-chain risks, some of which have already been exploited in real attacks. These include an attacker replacing a model with something malicious (e.g., malware), or a model being trained using a vulnerable version of a framework or on restricted or poisoned data. We then explore how Sigstore, a solution designed to bring transparency to open-source software supply chains, can bring the same transparency to open ML models by enabling model publishers to sign their models and prove properties about the datasets they use.
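To make the signing idea concrete, here is a minimal sketch (not the paper's implementation) of the payload a model publisher might bind into a signed attestation: digests of the model weights and of a dataset manifest. Actual signing and transparency-log submission would be done with Sigstore tooling; the file names and the claim string below are hypothetical.

```python
# Minimal sketch: assemble the digests a publisher would bind into a signed
# attestation. Real Sigstore signing/verification is done with the sigstore
# client; this only builds the payload. File names are hypothetical.
import hashlib
import json

def sha256_digest(path: str) -> str:
    """Hash a file in chunks so large model weights fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_attestation(model_path: str, manifest_path: str) -> str:
    """Bundle model and dataset digests into a statement to be signed."""
    statement = {
        "model_sha256": sha256_digest(model_path),
        "dataset_manifest_sha256": sha256_digest(manifest_path),
        "claims": ["trained-on-declared-dataset"],  # hypothetical property
    }
    return json.dumps(statement, sort_keys=True)

if __name__ == "__main__":
    with open("model.bin", "wb") as f:
        f.write(b"\x00" * 1024)          # stand-in for real weights
    with open("dataset.manifest", "w") as f:
        f.write("imagenet-subset v1\n")  # stand-in manifest
    print(build_attestation("model.bin", "dataset.manifest"))
```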
This paper addresses the challenge of creating smart contracts for applications represented using Business Process Management and Notation (BPMN) models. In our prior work we presented a methodology that automates the generation of smart contracts from BPMN models. This approach abstracts the BPMN flow control, making it independent of the underlying blockchain infrastructure, with only the BPMN task elements requiring coding. In subsequent research, we enhanced our approach by adding support for nested transactions and enabling smart contract repair and/or upgrade. To empower Business Analysts (BAs) to generate smart contracts without relying on software developers, we then tackled the challenge of generating smart contracts from BPMN models alone. We exploit the Decision Model and Notation (DMN) standard to represent the decisions and business logic of the BPMN task elements, and we amended our methodology for transforming BPMN models into smart contracts so that it also generates the scripts implementing the business logic represented by the DMN models. To support this transformation, we describe how the BA documents, using the BPMN elements, the flow of information along with the flow of execution. Thus, if the BA succeeds in representing the blockchain application requirements using BPMN and DMN models, our methodology and the proof-of-concept tool we developed, called TABS, generate the smart contracts directly from those models without developer assistance.
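As an illustration of the DMN side of this pipeline, the sketch below evaluates a DMN-style decision table with a "unique" hit policy in plain Python. The table, inputs, and outcomes are hypothetical; TABS itself generates smart-contract code rather than Python, so this only conveys the decision semantics.

```python
# Illustrative only: evaluating a DMN-style decision table under a "unique"
# hit policy, the kind of business logic a BA would model and the pipeline
# would turn into generated contract code. Rules here are hypothetical.
RULES = [
    # (condition over the inputs, resulting decision)
    (lambda amount, risk: amount < 1_000 and risk == "low",  "auto-approve"),
    (lambda amount, risk: amount < 1_000 and risk == "high", "manual-review"),
    (lambda amount, risk: amount >= 1_000,                   "manual-review"),
]

def decide(amount: float, risk: str) -> str:
    """Under a unique hit policy, exactly one rule may match."""
    matches = [outcome for cond, outcome in RULES if cond(amount, risk)]
    if len(matches) != 1:
        raise ValueError(f"hit policy violated: {len(matches)} rules matched")
    return matches[0]

print(decide(500, "low"))    # auto-approve
print(decide(5_000, "low"))  # manual-review
```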
Even today, over 70% of security vulnerabilities in critical software systems result from memory safety violations. To address this challenge, fuzzing and static analysis are widely used automated methods to discover such vulnerabilities. Fuzzing generates random program inputs to identify faults, while static analysis examines source code to detect potential vulnerabilities. Although these techniques share a common goal, they take fundamentally different approaches and have evolved largely independently. In this paper, we present an empirical analysis of five static analyzers and 13 fuzzers, applied to over 100 known security vulnerabilities in C/C++ programs. We measure the number of bug reports generated for each vulnerability to evaluate how the approaches differ and complement each other. Moreover, we randomly sample eight bug-containing functions, manually analyze all bug reports therein, and quantify false-positive rates. We also assess limits to bug discovery, ease of use, resource requirements, and integration into the development process. We find that the two techniques discover different types of bugs, and for each bug type there is a clear winner. Developers should choose between these tools according to their specific workflow and usability requirements. Based on our findings, we propose future directions to foster collaboration between these research domains.
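For readers unfamiliar with the fuzzing half of this comparison, the toy harness below shows the core loop: generate random inputs, run the target, and record crashes. Real fuzzers such as AFL++ or libFuzzer add coverage feedback and mutation strategies; the target and its planted bug here are purely illustrative.

```python
# A deliberately tiny illustration of fuzzing: feed random byte strings to a
# target and stop on the first crash. No coverage feedback, no corpus.
import random

def target(data: bytes) -> None:
    """Toy parser with a planted bug: inputs starting with byte 0x42 crash."""
    if data and data[0] == 0x42:
        raise IndexError("planted out-of-bounds read")

def fuzz(iterations: int = 100_000, seed: int = 0) -> None:
    rng = random.Random(seed)
    for i in range(iterations):
        # Generate a short random byte string as the test input.
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 8)))
        try:
            target(data)
        except Exception as exc:
            print(f"iteration {i}: crash on input {data!r}: {exc}")
            return
    print("no crash found")

fuzz()
```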
Software systems have grown into an indispensable commodity used across various industries, and almost all essential services depend on them for effective operation. Software is no longer an independent or stand-alone piece of code written by a single developer, but rather a collection of packages designed by multiple developers across the globe. Ensuring the reliability and resilience of these systems is crucial, since emerging threats target software supply chains, as demonstrated by the widespread SolarWinds hack in late 2020. These supply chains extend beyond patches and updates, involving distribution networks throughout the software lifecycle. Industries like smart grids, manufacturing, healthcare, and finance rely on interconnected software systems and their dependencies for effective functioning. To secure software modules and add-ons, robust distribution architectures are essential. The proposed chapter enhances existing delivery frameworks by including a permissioned ledger with Proof of Authority consensus and multi-party signatures. The proposed system aims to prevent attacks while permitting every stakeholder to verify the delivered artifacts. Critical systems can interface with the secure pipeline without disrupting existing functionalities, thus preventing the cascading effect of an attack at any point in the supply chain.
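Below is a minimal sketch of the multi-party signature check such a pipeline could enforce, assuming Ed25519 keys and a 2-of-3 policy (both assumptions of this example, not details from the chapter): a release is accepted only if enough registered stakeholders have signed its bytes. Key distribution and the permissioned ledger itself are out of scope here.

```python
# Hedged sketch of a k-of-n release signature check using Ed25519 from the
# `cryptography` package. Keys, policy, and artifact are all hypothetical.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def count_valid(artifact: bytes, sigs, pubkeys) -> int:
    """Count how many (signer_index, signature) pairs verify over artifact."""
    valid = 0
    for idx, sig in sigs:
        try:
            pubkeys[idx].verify(sig, artifact)
            valid += 1
        except InvalidSignature:
            pass  # ignore bad or forged signatures
    return valid

# Demo: three stakeholders, 2-of-3 acceptance policy.
keys = [Ed25519PrivateKey.generate() for _ in range(3)]
pubs = [k.public_key() for k in keys]
artifact = b"package-1.2.3.tar.gz contents"
sigs = [(0, keys[0].sign(artifact)), (2, keys[2].sign(artifact))]
assert count_valid(artifact, sigs, pubs) >= 2, "release rejected"
print("release accepted: signature threshold met")
```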
Recognizing vulnerabilities in stripped binary files presents a significant challenge in software security. Although some progress has been made in generating human-readable information from decompiled binary files with Large Language Models (LLMs), effectively and scalably detecting vulnerabilities within these binary files is still an open problem. This paper explores the novel application of LLMs to detect vulnerabilities within these binary files. We demonstrate the feasibility of identifying vulnerable programs through a combined approach: decompilation optimization to make the vulnerabilities more prominent, and long-term memory for a larger context window, achieving state-of-the-art performance in binary vulnerability analysis. Our findings highlight the potential for LLMs to overcome the limitations of traditional analysis methods and advance the field of binary vulnerability detection, paving the way for more secure software systems. Concretely, we present Vul-BinLLM, an LLM-based framework for binary vulnerability detection that mirrors traditional binary analysis workflows, with fine-grained optimizations in decompilation and vulnerability reasoning over an extended context. In the decompilation phase, Vul-BinLLM adds vulnerability and weakness comments without altering the code structure or functionality, providing more contextual information for the subsequent vulnerability reasoning. For vulnerability reasoning, Vul-BinLLM combines in-context learning and chain-of-thought prompting with a memory management agent to enhance accuracy. We evaluate on the widely used synthetic Juliet dataset to assess the feasibility of analyzing and detecting vulnerabilities in C/C++ binaries; our evaluations show that Vul-BinLLM is highly effective in detecting vulnerabilities on the compiled Juliet dataset.
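The paper's exact prompts are not reproduced here, but the sketch below illustrates the general shape of such a pipeline: decompiler output annotated with weakness hints, a retrieved in-context example, and a chain-of-thought instruction assembled into one prompt. All strings are hypothetical stand-ins, and no model is actually called.

```python
# Speculative sketch of prompt assembly for LLM-based binary vulnerability
# reasoning: annotated decompiled code + a few-shot example + a step-by-step
# instruction. Everything below is a hypothetical stand-in.
DECOMPILED = """\
/* WEAKNESS HINT: unbounded copy into fixed buffer (possible CWE-121) */
char buf[64];
strcpy(buf, user_input);
"""

FEW_SHOT = """\
Example:
  code: gets(line);
  reasoning: gets() performs no bounds check, so input longer than the
  destination buffer overflows it.
  verdict: VULNERABLE (CWE-242 / CWE-121)
"""

def build_prompt(code: str, example: str) -> str:
    """Combine a few-shot example and annotated code into one CoT prompt."""
    return (
        "You are a binary vulnerability analyst.\n"
        f"{example}\n"
        "Now analyze the following decompiled function step by step, then\n"
        "give a final verdict of VULNERABLE or SAFE with a CWE id.\n\n"
        f"{code}"
    )

print(build_prompt(DECOMPILED, FEW_SHOT))
```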
This report presents a comprehensive analysis of a malicious software sample, detailing its architecture, behavioral characteristics, and underlying intent. Through static and dynamic examination, the malware's core functionalities, including persistence mechanisms, command-and-control communication, and data exfiltration routines, are identified and its supporting infrastructure is mapped. By correlating observed indicators of compromise with known tactics, techniques, and procedures, this analysis situates the sample within the broader context of contemporary threat campaigns and infers the capabilities and motivations of its likely threat actor. Building on these findings, actionable threat intelligence is provided to support proactive defenses. Threat hunting teams receive precise detection hypotheses for uncovering latent adversarial presence, while monitoring systems can refine alert logic to detect anomalous activity in real time. Finally, the report discusses how this structured intelligence enhances predictive risk assessments, informs vulnerability prioritization, and strengthens organizational resilience against advanced persistent threats. By integrating detailed technical insights with strategic threat landscape mapping, this malware analysis report not only reconstructs past adversary actions but also establishes a robust foundation for anticipating and mitigating future attacks.
Supply chain attacks have emerged as a prominent cybersecurity threat in recent years. Reproducible and bootstrappable builds have the potential to reduce such attacks significantly. In combination with independent, exhaustive and periodic source code audits, these measures can effectively eradicate compromises in the building process. In this paper we introduce both concepts, analyze the achievements over the last ten years, and explain the remaining challenges. We contribute to the reproducible builds effort by setting up a rebuilder and verifier instance to test the reproducibility of Arch Linux packages. Using the results from this instance, we uncover an unnoticed and security-relevant packaging issue affecting 16 packages related to Certbot, the recommended software to install TLS certificates from Let's Encrypt, making them unreproducible. Additionally, we find the root cause of unreproducibility in the source code of fwupd, a critical software component used to update device firmware on Linux devices, and submit an upstream patch to fix it.
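The core check a rebuilder/verifier performs is simple to state: two independent builds of the same source must be bit-identical. The sketch below shows that comparison, with an embedded timestamp (a classic root cause of unreproducibility, as in the fwupd case) standing in for the real-world issue; file names and contents are invented for the demo.

```python
# Minimal sketch of a rebuilder/verifier check: compare the official package
# against an independent rebuild. Paths and contents are hypothetical.
import hashlib

def digest(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def reproducible(official_pkg: str, rebuilt_pkg: str) -> bool:
    """Bit-identical artifacts produce identical digests."""
    return digest(official_pkg) == digest(rebuilt_pkg)

# Demo: an embedded build timestamp breaks reproducibility.
with open("official.pkg", "wb") as f:
    f.write(b"payload built at 2024-01-01T00:00:00")
with open("rebuilt.pkg", "wb") as f:
    f.write(b"payload built at 2024-01-02T13:37:00")
print("reproducible:", reproducible("official.pkg", "rebuilt.pkg"))  # False
```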
Modern software supply chains face an increasing threat from malicious code hidden in trusted components such as browser extensions, IDE extensions, and open-source packages. This paper introduces JavaSith, a novel client-side framework for analyzing potentially malicious extensions in web browsers, Visual Studio Code (VSCode), and Node.js NPM packages. JavaSith combines a runtime sandbox that emulates browser/Node.js extension APIs (with a "time machine" to accelerate time-based triggers) with static analysis and a local large language model (LLM) to assess risk from code and metadata. We present the design and architecture of JavaSith, including techniques for intercepting extension behavior over simulated time and extracting suspicious patterns. Through case studies on real-world attacks (such as a supply-chain compromise of a Chrome extension and malicious VSCode extensions installing cryptominers), we demonstrate how JavaSith can catch stealthy malicious behaviors that evade traditional detection. We evaluate the framework's effectiveness and discuss its limitations and future enhancements. JavaSith's client-side approach empowers end users and organizations to vet extensions and packages before integrating them into their environments.
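To illustrate the "time machine" idea, the Python stand-in below runs a time-gated payload under a sandbox-controlled clock, so a trigger scheduled 30 days out fires within a fraction of a real second. JavaSith intercepts JavaScript/Node.js time APIs; this sketch only demonstrates the interception pattern, and all names are hypothetical.

```python
# Conceptual sketch of a sandbox "time machine": the analysis harness owns
# the clock, so time-based triggers fire immediately during analysis.
import time

class FakeClock:
    """Sandbox-controlled clock: one real second = one simulated year."""
    def __init__(self, start: float, speedup: float = 365 * 86_400):
        self.start_real = time.monotonic()
        self.start_fake = start
        self.speedup = speedup

    def now(self) -> float:
        elapsed = time.monotonic() - self.start_real
        return self.start_fake + elapsed * self.speedup

clock = FakeClock(start=time.time())

def suspicious_extension_tick() -> None:
    """Stand-in for extension code that activates 30 days after install."""
    if clock.now() - clock.start_fake > 30 * 86_400:
        print("time-gated payload triggered: sandbox flags this behavior")

time.sleep(0.1)  # ~36 simulated days pass under the fake clock
suspicious_extension_tick()
```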
Vulnerabilities in open-source software can cause cascading effects in the modern digital ecosystem. It is especially worrying if these vulnerabilities repeat across many projects, as once adversaries find one of them, they can scale the attack easily. Unfortunately, since developers frequently reuse code from their own projects or external resources, some nearly identical vulnerabilities exist across many open-source projects. We conducted a study to examine the prevalence of a particular vulnerable code pattern that enables path traversal attacks (CWE-22) across open-source GitHub projects. To handle this study at the GitHub scale, we developed an automated pipeline that scans GitHub for the targeted vulnerable pattern, confirms each vulnerability by first running a static analysis and then exploiting it in the context of the studied project, assesses its impact by calculating the CVSS score, generates a patch using GPT-4, and reports the vulnerability to the maintainers. Using our pipeline, we identified 1,756 vulnerable open-source projects, some of which are very influential. For many of the affected projects, the vulnerability is critical (CVSS score higher than 9.0), as it can be exploited remotely without any privileges and critically impact the confidentiality and availability of the system. We responsibly disclosed the vulnerability to the maintainers, and 14% of the reported vulnerabilities have been remediated. We also investigated the root causes of the vulnerable code pattern and assessed the side effects of the large number of copies of this vulnerable pattern, which seem to have poisoned several popular LLMs. Our study highlights the urgent need to help secure the open-source ecosystem by leveraging scalable automated vulnerability management solutions and raising awareness among developers.
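The vulnerability class and its common fix, in miniature: joining attacker-controlled input onto a base directory without normalization lets "../" sequences escape the intended root, while the fix normalizes the path and re-checks the prefix. The paths and helper names below are illustrative, not taken from any affected project.

```python
# CWE-22 path traversal: vulnerable pattern vs. a common fix.
import os

BASE = "/srv/static"  # hypothetical web root

def serve_vulnerable(filename: str) -> str:
    # BAD: "../../etc/passwd" resolves outside BASE.
    return os.path.join(BASE, filename)

def serve_fixed(filename: str) -> str:
    # GOOD: normalize, then verify the result is still under BASE.
    path = os.path.normpath(os.path.join(BASE, filename))
    if os.path.commonpath([BASE, path]) != BASE:
        raise PermissionError("path traversal attempt blocked")
    return path

print(serve_vulnerable("../../etc/passwd"))  # escapes the web root
print(serve_fixed("css/site.css"))           # /srv/static/css/site.css
try:
    serve_fixed("../../etc/passwd")
except PermissionError as err:
    print(err)
```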
Large language models (LLMs) have shown promise in software engineering, yet their effectiveness for binary analysis remains unexplored. We present the first comprehensive evaluation of commercial LLMs for assembly code deobfuscation. Testing seven state-of-the-art models against four obfuscation scenarios (bogus control flow, instruction substitution, control flow flattening, and their combination), we found striking performance variations, ranging from autonomous deobfuscation to complete failure. We propose a theoretical framework based on four dimensions (Reasoning Depth, Pattern Recognition, Noise Filtering, and Context Integration) to explain these variations. Our analysis identifies five error patterns: predicate misinterpretation, structural mapping errors, control flow misinterpretation, arithmetic transformation errors, and constant propagation errors, revealing fundamental limitations in LLM code processing. We establish a three-tier resistance model: bogus control flow (low resistance), control flow flattening (moderate resistance), and instruction substitution/combined techniques (high resistance). Universal failure against combined techniques demonstrates that sophisticated obfuscation remains effective against advanced LLMs. Our findings suggest a human-AI collaboration paradigm in which LLMs reduce expertise barriers for certain reverse engineering tasks while requiring human guidance for complex deobfuscation. This work provides a foundation for evaluating emerging capabilities and developing resistant obfuscation techniques.
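As a concrete reference point for one of the studied techniques, the sketch below shows control-flow flattening on a toy function: the same computation rewritten as a dispatcher loop over opaque numeric states, which is what makes the decompiled control flow hard to follow. Real obfuscators operate on compiled code; this Python version only conveys the shape.

```python
# Toy illustration of control-flow flattening (the "moderate resistance"
# tier in the paper's model): equivalent code as a state-machine dispatcher.
def original(x: int) -> int:
    if x > 10:
        x -= 10
    return x * 2

def flattened(x: int) -> int:
    state = 0
    while True:
        if state == 0:          # entry: pick the next state opaquely
            state = 1 if x > 10 else 2
        elif state == 1:        # the "then" branch
            x -= 10
            state = 2
        elif state == 2:        # the merged exit block
            return x * 2

assert original(15) == flattened(15) == 10
assert original(3) == flattened(3) == 6
```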
Ensuring that large language models (LLMs) can effectively assess, detect, explain, and remediate software vulnerabilities is critical for building robust and secure software systems. We introduce VADER, a human-evaluated benchmark designed explicitly to assess LLM performance across four key vulnerability-handling dimensions: assessment, detection, explanation, and remediation. VADER comprises 174 real-world software vulnerabilities, each carefully curated from GitHub repositories and annotated by security experts. For each vulnerability case, models are tasked with identifying the flaw, classifying it using the Common Weakness Enumeration (CWE), explaining its underlying cause, proposing a patch, and formulating a test plan. Using a one-shot prompting strategy, we benchmark six state-of-the-art LLMs (Claude 3.7 Sonnet, Gemini 2.5 Pro, GPT-4.1, GPT-4.5, Grok 3 Beta, and o3) on VADER, and human security experts evaluated each response according to a standardized rubric emphasizing remediation (quality of the code fix, 50%), explanation (20%), and classification and test plan (30%). Our results show that current state-of-the-art LLMs achieve only moderate success on VADER: OpenAI's o3 attained 54.7% accuracy overall, with others in the 49-54% range, indicating ample room for improvement. Notably, remediation quality is strongly correlated (Pearson r > 0.97) with accurate classification and test plans, suggesting that models that effectively categorize vulnerabilities also tend to fix them well. VADER's comprehensive dataset, detailed evaluation rubrics, scoring tools, and visualized results with confidence intervals are publicly released, providing the community with an interpretable, reproducible benchmark to advance vulnerability-aware LLMs. All code and data are available at: https://github.com/AfterQuery/vader
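The abstract states the rubric weights explicitly, so the scoring arithmetic can be sketched directly; the per-dimension scores fed in below are hypothetical grader outputs, not data from the paper.

```python
# Sketch of the stated rubric: remediation 50%, explanation 20%,
# classification and test plan 30%. Input scores are hypothetical.
WEIGHTS = {
    "remediation": 0.5,
    "explanation": 0.2,
    "classification_and_test_plan": 0.3,
}

def vader_score(scores: dict) -> float:
    """Weighted sum of per-dimension grades in [0, 1]."""
    assert set(scores) == set(WEIGHTS), "grader must score every dimension"
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

example = {
    "remediation": 0.6,
    "explanation": 0.8,
    "classification_and_test_plan": 0.4,
}
print(f"{vader_score(example):.2f}")  # 0.5*0.6 + 0.2*0.8 + 0.3*0.4 = 0.58
```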
Motivated by the success of general-purpose large language models (LLMs) in software patching, recent works have started to train specialized patching models. Most train one model to handle the end-to-end patching pipeline (including issue localization, patch generation, and patch validation). However, it is hard for a small model to handle all tasks, as the sub-tasks have different workflows and require different expertise. As a result, even with a 70-billion-parameter model, SOTA methods reach only up to a 41% resolved rate on SWE-bench-Verified. Motivated by this, we propose Co-PatcheR, the first collaborative patching system with small, specialized reasoning models for the individual components. Our key technical novelties are the specific task designs and training recipes. First, we train one model for localization and patch generation: localization pinpoints the suspicious lines through a two-step procedure, and generation combines patch generation and critique. We then propose a hybrid patch validation that includes two models for crafting issue-reproducing test cases with and without assertions and for judging patch correctness, followed by majority vote-based patch selection. Through extensive evaluation, we show that Co-PatcheR achieves a 46% resolved rate on SWE-bench-Verified with only 3 x 14B models. This makes Co-PatcheR the best patcher with specialized models, requiring the least training resources and the smallest models. We conduct a comprehensive ablation study to validate our recipes, as well as our choice of training-data volume, model size, and test-time scaling strategy.
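A minimal sketch of majority vote-based patch selection as the abstract describes it; the validator votes below are invented, and the real system derives them from issue-reproducing test runs and a correctness judge.

```python
# Sketch: pick the candidate patch with the most passing validator votes.
from collections import Counter

def select_patch(votes: list) -> str:
    """votes: (patch_id, passed) pairs emitted by validator runs."""
    tally = Counter(pid for pid, passed in votes if passed)
    if not tally:
        raise RuntimeError("no candidate patch passed validation")
    winner, _ = tally.most_common(1)[0]
    return winner

votes = [
    ("patch-A", True), ("patch-A", True), ("patch-A", False),
    ("patch-B", True), ("patch-B", False), ("patch-B", False),
]
print(select_patch(votes))  # patch-A wins with two passing votes
```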
Many critical information technology and cyber-physical systems rely on a supply chain of open-source software (OSS) projects. OSS project maintainers often integrate contributions from external actors. While maintainers can assess the correctness of a change request, assessing its cybersecurity implications is challenging. To help maintainers make this decision, we propose that the open-source ecosystem incorporate Actor Reputation Metrics (ARMS). This capability would enable OSS maintainers to assess a prospective contributor's cybersecurity reputation. To support the future instantiation of ARMS, we identify seven generic security signals from industry standards, map concrete metrics from prior work and available security tools onto them, describe study designs to refine and assess the utility of ARMS, and finally weigh its pros and cons.
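Since ARMS is a proposal rather than an implementation, the sketch below is purely illustrative: a weighted aggregation of normalized signals into a single reputation score. The signal names and weights are hypothetical stand-ins, not the seven signals the paper identifies.

```python
# Illustrative reputation aggregation; signals and weights are hypothetical.
SIGNAL_WEIGHTS = {
    "commit_signing": 0.20,
    "account_age": 0.15,
    "prior_merged_contributions": 0.25,
    "two_factor_enabled": 0.15,
    "history_of_security_fixes": 0.25,
}

def reputation(signals: dict) -> float:
    """Weighted average of normalized [0, 1] signal values (missing = 0)."""
    return sum(SIGNAL_WEIGHTS[k] * signals.get(k, 0.0) for k in SIGNAL_WEIGHTS)

newcomer = {"commit_signing": 1.0, "account_age": 0.1}
print(f"{reputation(newcomer):.3f}")  # 0.215: a low score flags extra review
```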
This survey explores the interaction between secure multiparty computation (MPC) and machine learning (ML). Recent advances in MPC have significantly improved its applicability in the realm of ML, offering robust solutions for privacy-preserving collaborative learning. This review explores key contributions that leverage MPC to enable multiple parties to engage in ML tasks without compromising the privacy of their data. The integration of MPC with ML frameworks facilitates the training and evaluation of models on combined datasets from various sources, ensuring that sensitive information remains encrypted throughout the process. Innovations such as specialized software frameworks and domain-specific languages streamline the adoption of MPC in ML, optimizing performance and broadening its usage. These frameworks address both semi-honest and malicious threat models, incorporating features such as automated optimizations and cryptographic auditing to ensure compliance and data integrity. The collective insights from these studies highlight MPC's potential in fostering collaborative yet confidential data analysis, marking a significant stride towards secure and efficient computational solutions in privacy-sensitive industries. This paper investigates a spectrum of SecureML libraries, including cryptographic protocols, federated learning frameworks, and privacy-preserving algorithms. By surveying the existing literature, it examines the efficacy of these libraries in preserving data privacy, ensuring model confidentiality, and fortifying ML systems against adversarial attacks. Additionally, the study explores an innovative application domain for SecureML techniques: the integration of these methodologies in gaming environments that utilize ML.
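To ground the survey's subject in code, here is one of MPC's simplest building blocks, additive secret sharing over a prime field: each party holds a random-looking share, and parties can add their shares locally so that only the sum of secrets is ever reconstructed. This is a textbook primitive, not any particular library's API.

```python
# Additive secret sharing: the basis of many privacy-preserving aggregation
# protocols used in SecureML frameworks.
import secrets

P = 2**61 - 1  # prime modulus for the share field

def share(secret: int, n_parties: int) -> list:
    """Split a secret into n shares that sum to it modulo P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares: list) -> int:
    return sum(shares) % P

a_shares = share(42, 3)
b_shares = share(100, 3)
# Each party adds its local shares; no single party learns a or b.
sum_shares = [(x + y) % P for x, y in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))  # 142
```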
Command injection vulnerabilities are a significant security threat in dynamic languages like Python, particularly in widely used open-source projects where security issues can have extensive impact. With the proven effectiveness of Large Language Models (LLMs) in code-related tasks such as testing, researchers have explored their potential for vulnerability analysis. This study evaluates the potential of LLMs, such as GPT-4, as an alternative approach to automated testing for vulnerability detection. In particular, LLMs have demonstrated advanced contextual understanding and adaptability, making them promising candidates for identifying nuanced security vulnerabilities in code. To evaluate this potential, we applied LLM-based analysis to six high-profile GitHub projects (Django, Flask, TensorFlow, Scikit-learn, PyTorch, and Langchain), each with over 50,000 stars and extensive adoption across software development and academic research. Our analysis assesses both the strengths and limitations of LLMs in detecting command injection vulnerabilities, evaluating factors such as detection accuracy, efficiency, and practical integration into development workflows. In addition, we provide a comparative analysis of different LLM tools to identify those most suitable for security applications. Our findings offer guidance for developers and security researchers on leveraging LLMs as innovative and automated approaches to enhance software security.
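The vulnerability class under study, in miniature: shell interpolation of untrusted input versus an argument-vector call. Patterns like the first function are what the LLM-based analysis is asked to flag; the example commands are harmless.

```python
# Command injection (CWE-78): vulnerable pattern vs. the common fix.
import subprocess

def ping_vulnerable(host: str) -> None:
    # BAD: the shell parses ";" in attacker-controlled input.
    subprocess.run(f"ping -c 1 {host}", shell=True)

def ping_safe(host: str) -> None:
    # GOOD: the argument vector is passed without shell interpretation.
    subprocess.run(["ping", "-c", "1", host])

payload = "127.0.0.1; echo INJECTED"
ping_vulnerable(payload)  # the injected echo runs as a second command
ping_safe(payload)        # the whole string is one (invalid) hostname
```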
Outdated software remains a potent and underappreciated menace in 2025's cybersecurity environment, exposing systems to a broad array of threats, including ransomware, data breaches, and operational outages that can have devastating and far-reaching impacts. This essay explores these often-unseen threats with supporting statistics, including the finding, from a 2025 TechTarget survey, that 32% of cyberattacks exploit unpatched software vulnerabilities. It also discusses real case studies, including the MOVEit breach in 2023 and the Log4Shell vulnerability disclosed in 2021, both of which illustrate the catastrophic consequences of failing to apply software updates. The article offers a detailed analysis of the nature of software vulnerabilities, the underlying reasons for user resistance to patches, and the organizational barriers that compound the issue. It suggests actionable solutions, including automation and awareness campaigns, to address these shortcomings. The paper also discusses trends such as AI-driven vulnerability patching and the legal consequences of non-compliance under laws like HIPAA, providing a forward-looking view of how such advancements may define future defenses. Supplemented by a table detailing vulnerability trends and a graph illustrating technology adoption, this report highlights the pressing need for proactive update strategies to safeguard digital ecosystems against the constantly evolving threats that characterize the modern cyber landscape. It is a useful reference for practitioners, policymakers, and researchers.
Embedded devices are increasingly ubiquitous and vital, often supporting safety-critical functions. However, due to strict cost and energy constraints, they are typically implemented with Micro-Controller Units (MCUs) that lack advanced architectural security features. Within this space, recent efforts have created low-cost architectures capable of generating Proofs of Execution (PoX) of software on potentially compromised MCUs. This capability can ensure the integrity of sensor data from the outset, by binding sensed results to an unforgeable cryptographic proof of execution on edge sensor MCUs. However, the security of existing PoX requires the proven execution to occur atomically. This requirement precludes the application of PoX to (1) time-shared systems, and (2) applications with real-time constraints, creating a direct conflict between execution integrity and the real-time availability needs of several embedded system uses. In this paper, we formulate a new security goal called Real-Time Proof of Execution (RT-PoX) that retains the integrity guarantees of classic PoX while enabling its application to existing real-time systems. This is achieved by relaxing the atomicity requirement of PoX while dispatching interference attempts from other potentially malicious tasks (or compromised operating systems) executing on the same device. To realize the RT-PoX goal, we develop Provable Execution Architecture for Real-Time Systems (PEARTS). To the best of our knowledge, PEARTS is the first PoX system that can be directly deployed alongside a commodity embedded real-time operating system (FreeRTOS). This enables both real-time scheduling and execution integrity guarantees on commodity MCUs. To showcase this capability, we develop a PEARTS open-source prototype atop FreeRTOS on a single-core ARM Cortex-M33 processor. We evaluate and report on PEARTS security and (modest) overheads.
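At a very high level, the proof in PoX-style schemes binds what ran to what it produced under a device key. The non-embedded sketch below conveys only that idea with an HMAC and a hypothetical shared key; it does not model PEARTS's architecture, its FreeRTOS integration, or its real-time interference handling.

```python
# Highly simplified, non-embedded sketch of a proof-of-execution check:
# a trusted component MACs (code digest, output digest) under a device key,
# so a verifier holding the key can check the binding. The key is hypothetical.
import hashlib
import hmac

DEVICE_KEY = b"provisioned-device-key"  # hypothetical shared secret

def prove_execution(code: bytes, output: bytes) -> bytes:
    record = hashlib.sha256(code).digest() + hashlib.sha256(output).digest()
    return hmac.new(DEVICE_KEY, record, hashlib.sha256).digest()

def verify(code: bytes, output: bytes, proof: bytes) -> bool:
    return hmac.compare_digest(prove_execution(code, output), proof)

code, output = b"sensor_read_task_binary", b"23.5 C"
proof = prove_execution(code, output)
print(verify(code, output, proof))               # True
print(verify(code, b"tampered reading", proof))  # False
```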
Securing software supply chains is a growing challenge due to the inadequacy of existing datasets in capturing the complexity of next-gen attacks, such as multiphase malware execution, remote access activation, and dynamic payload generation. Existing datasets, which rely on metadata inspection and static code analysis, are inadequate for detecting such attacks. This creates a critical gap because these datasets do not capture what happens during and after a package is installed. To address this gap, we present QUT-DV25, a dynamic analysis dataset specifically designed to support and advance research on detecting and mitigating supply chain attacks within the Python Package Index (PyPI) ecosystem. The dataset captures install-time and post-install-time traces from 14,271 Python packages, of which 7,127 are malicious. The packages are executed in an isolated sandbox environment using extended Berkeley Packet Filter (eBPF) kernel and user-level probes. It captures 36 real-time features, including system calls, network traffic, resource usage, directory access patterns, dependency logs, and installation behaviors, enabling the study of next-gen attack vectors. ML analysis using QUT-DV25 identified four malicious PyPI packages previously labeled as benign, each with thousands of downloads. These packages deployed covert remote access and multi-phase payloads; they were reported to PyPI maintainers and subsequently removed. This highlights the practical value of QUT-DV25: it outperforms reactive, metadata-based, and static datasets, offering a robust foundation for developing and benchmarking advanced threat detection within the evolving software supply chain ecosystem.
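A sketch of the kind of feature extraction such a dataset enables: turning a dynamic trace into model-ready features. The event trace and the handful of features below are hypothetical stand-ins, not the dataset's actual 36-feature schema.

```python
# Illustrative feature extraction from a dynamic install-time trace of the
# sort an eBPF probe might emit. All events and features are hypothetical.
from collections import Counter

trace = [  # (event_type, detail) pairs recorded during package install
    ("syscall", "execve"), ("syscall", "connect"), ("net", "198.51.100.7:4444"),
    ("file", "/home/user/.ssh/id_rsa"), ("syscall", "connect"),
]

def extract_features(events: list) -> dict:
    syscalls = Counter(d for t, d in events if t == "syscall")
    return {
        "n_execve": syscalls["execve"],
        "n_connect": syscalls["connect"],
        "touches_ssh_keys": any(t == "file" and ".ssh" in d for t, d in events),
        "n_remote_endpoints": sum(1 for t, _ in events if t == "net"),
    }

print(extract_features(trace))
# {'n_execve': 1, 'n_connect': 2, 'touches_ssh_keys': True, 'n_remote_endpoints': 1}
```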