
CyberSec Research

Browse, search and filter the latest cybersecurity research papers from arXiv

Filters

Cryptography and Security (1245)
Computers and Society (654)
Networking and Internet Architecture (876)
Distributed Computing (432)
Software Engineering (789)
Artificial Intelligence (1532)
Machine Learning (921)
Hardware Security (342)
Software Security (578)
Network Security (456)
AI Security (324)
ML Security (428)
Cloud Security (219)
IoT Security (187)
Malware Analysis (296)
Cryptography (413)
Privacy (329)
Authentication (247)
Vulnerability Analysis (385)

Publication Year

Results (3899)

May 12, 2025
Louis Faul, Linard Hoessly, Panqiu Xia

Biochemical reaction networks are widely applied across scientific disciplines to model complex dynamic systems. We investigate the diffusion approximation of reaction networks with mass-action kinetics, focusing on the identifiability of the generator of the associated stochastic differential equations. We derive conditions under which the law of the diffusion approximation is identifiable and provide theorems for verifying identifiability in practice. Notably, our results show that some reaction networks have non-identifiable reaction rates, even when the law of the corresponding stochastic process is completely known. Moreover, we show that reaction networks with distinct graphical structures can generate the same diffusion law under specific choices of reaction rates. Finally, we compare our framework with identifiability results in the deterministic ODE setting and the discrete continuous-time Markov chain models for reaction networks.
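To make the last two claims concrete, here is a minimal one-species sketch, assuming the usual diffusion-approximation coefficients: drift b(x) = Σ_k λ_k(x) ζ_k and diffusion a(x) = Σ_k λ_k(x) ζ_k², with mass-action propensity λ_k(x) = κ_k x^(reactant count). The function `drift_diffusion` and both toy networks are hypothetical illustrations, not taken from the paper. Network A (X → 0, X → 2X) and network B (X → 0, X → 3X) have different structures, yet with the chosen rates both give drift x and diffusion 3x.

```python
from fractions import Fraction

# Each reaction is (reactant_count, product_count, rate) for a single species X.
# Assumption: propensity = rate * x**reactant_count (mass action, no combinatorial
# factors), and the reaction vector is zeta = product_count - reactant_count.

def drift_diffusion(reactions, x):
    """Return (drift, diffusion) of the diffusion approximation at state x."""
    drift = Fraction(0)
    diffusion = Fraction(0)
    for reactants, products, rate in reactions:
        propensity = Fraction(rate) * Fraction(x) ** reactants
        zeta = products - reactants
        drift += propensity * zeta            # b(x) = sum_k lambda_k(x) * zeta_k
        diffusion += propensity * zeta ** 2   # a(x) = sum_k lambda_k(x) * zeta_k^2
    return drift, diffusion

# Network A: X -> 0 (rate 1), X -> 2X (rate 2)
network_a = [(1, 0, 1), (1, 2, 2)]
# Network B: X -> 0 (rate 1/3), X -> 3X (rate 2/3)
network_b = [(1, 0, Fraction(1, 3)), (1, 3, Fraction(2, 3))]

for x in range(5):
    print(x, drift_diffusion(network_a, x), drift_diffusion(network_b, x))
```

Exact `Fraction` arithmetic is used so that the equality of the two generators is checked without floating-point noise.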

May 11, 2025
Zitao Yang, Rebecca J. Rousseau, Sara D....

Genes are connected in complex networks of interactions where often the product of one gene is a transcription factor that alters the expression of another. Many of these networks are based on a few fundamental motifs leading to switches and oscillators of various kinds. And yet, there is more to the story than which transcription factors control these various circuits. These transcription factors are often themselves under the control of effector molecules that bind them and alter their level of activity. Traditionally, much beautiful work has shown how to think about the stability of the different states achieved by these fundamental regulatory architectures by examining how parameters such as transcription rates, degradation rates and dissociation constants tune the circuit, giving rise to behavior such as bistability. However, such studies explore dynamics without asking how these quantities are altered in real time in living cells as opposed to at the fingertips of the synthetic biologist's pipette or on the computational biologist's computer screen. In this paper, we make a departure from the conventional dynamical systems view of these regulatory motifs by using statistical mechanical models to focus on endogenous signaling knobs such as effector concentrations rather than on the convenient but more experimentally remote knobs such as dissociation constants, transcription rates and degradation rates that are often considered. We also contrast the traditional use of Hill functions to describe transcription factor binding with more detailed thermodynamic models. This approach provides insights into how biological parameters are tuned to control the stability of regulatory motifs in living cells, sometimes revealing quite a different picture than is found by using Hill functions and tuning circuit parameters by hand.
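As an illustration of the contrast between tuning a dissociation constant and tuning an effector concentration, the sketch below compares a phenomenological Hill function with a standard thermodynamic (MWC-style) model of simple repression, in which an effector at concentration c shifts an allosteric repressor between active and inactive states. This is a generic textbook form, not necessarily the exact model used in the paper; every parameter value below (repressor copy number, binding energies in kT, K_A, K_I, number of non-specific sites) is an illustrative assumption.

```python
import math

def hill_repression(repressor, k_d, n):
    """Phenomenological Hill-function fold change: 1 / (1 + (R / K_d)^n)."""
    return 1.0 / (1.0 + (repressor / k_d) ** n)

def p_active(c, k_a, k_i, delta_eps_ai, n_sites=2):
    """MWC probability that the repressor is active at effector concentration c.

    delta_eps_ai: energy of the inactive state minus the active state, in kT.
    """
    active = (1.0 + c / k_a) ** n_sites
    inactive = math.exp(-delta_eps_ai) * (1.0 + c / k_i) ** n_sites
    return active / (active + inactive)

def thermodynamic_fold_change(c, repressors, delta_eps_ra, k_a, k_i, delta_eps_ai,
                              n_ns=4.6e6):
    """Fold change in expression for simple repression by an allosteric repressor.

    delta_eps_ra: repressor-operator binding energy (kT); n_ns: non-specific sites.
    """
    bound_weight = (p_active(c, k_a, k_i, delta_eps_ai) * (repressors / n_ns)
                    * math.exp(-delta_eps_ra))
    return 1.0 / (1.0 + bound_weight)

# Illustrative (assumed) parameters; concentrations c, K_A and K_I in the same units (uM)
params = dict(repressors=260, delta_eps_ra=-13.9, k_a=139.0, k_i=0.53, delta_eps_ai=4.5)
for c in [0, 1, 10, 100, 1000]:
    print(c, round(thermodynamic_fold_change(c, **params), 4))
```

Here the effector concentration c is the knob: the repressor copy number and binding energies stay fixed, and expression rises as c pulls the repressor into its inactive state.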

Feedback in cellular processes is typically inferred through cellular responses to experimental perturbations. Modular response analysis provides a theoretical framework for translating specific perturbations into feedback sensitivities between cellular modules. However, in large-scale drug perturbation studies the effect of any given drug may not be known and may affect more than one module at a time. Here, we analyze the response of gene expression models to random perturbations that affect multiple modules simultaneously. In the deterministic regime we analytically show how cellular responses to infinitesimal random perturbations can be used to infer the nature of feedback regulation in gene expression, as long as the effects of perturbations are statistically independent between modules. We numerically extend this deterministic analysis to the response of average abundances of stochastic gene expression models to finite perturbations. Across a large sample of stochastic models, the response of average abundances generally obeyed predicted bounds from the deterministic analysis, but dramatic deviations occurred in systems with bimodal or fat-tailed stationary state distributions. These discrepancies demonstrate how deterministic analyses can fail to capture the effect of perturbations on averages of stochastic cellular feedback systems, even in the linear response regime.
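For reference, the classic one-module-at-a-time version of modular response analysis recovers local feedback coefficients from global responses via r = -[dg(R⁻¹)]⁻¹ R⁻¹, where R_ij = d ln x_i / d ln p_j and each p_j acts on module j only. The sketch below applies it to a hypothetical two-module model (mRNA x repressed by its protein y); the model, the constant `K` and the parameter values are assumptions, and the paper's random, multi-module perturbation scheme is not implemented here.

```python
import numpy as np

K = 2.0  # hypothetical repression constant of the protein y on its own mRNA x

def steady_state(p1, p2):
    """Steady state of dx/dt = p1 / (1 + y/K) - x and dy/dt = p2 * x - y."""
    a = p2 / K
    # x solves a*x^2 + x - p1 = 0 (positive root); then y = p2 * x
    x = (-1.0 + np.sqrt(1.0 + 4.0 * a * p1)) / (2.0 * a)
    return np.array([x, p2 * x])

def global_response(p, h=1e-5):
    """R[i, j] = d ln x_i / d ln p_j, by central differences in log-parameter space."""
    R = np.zeros((2, 2))
    for j in range(2):
        up, down = p.copy(), p.copy()
        up[j] *= np.exp(h)
        down[j] *= np.exp(-h)
        R[:, j] = (np.log(steady_state(*up)) - np.log(steady_state(*down))) / (2 * h)
    return R

p = np.array([3.0, 1.5])
R = global_response(p)
R_inv = np.linalg.inv(R)
r = -np.linalg.inv(np.diag(np.diag(R_inv))) @ R_inv  # local response coefficients
x, y = steady_state(*p)
print(np.round(r, 6))
print("analytic r[0, 1]:", -(y / K) / (1 + y / K), "analytic r[1, 0]:", 1.0)
```

The negative off-diagonal entry r[0, 1] is the signature of negative feedback from protein to mRNA; the diagonal entries are -1 by construction of the formula.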

May 7, 2025
Nicolas Champagnat, Rodolphe Loubaton, L...

Cellular response to environmental and internal signals can be modeled by dynamical gene regulatory networks (GRNs). In the literature, three main classes of gene network models can be distinguished: (i) non-quantitative (or data-based) models which do not describe the probability distribution of gene expressions; (ii) quantitative models which fully describe the probability distribution of the coexpression of all genes; and (iii) mechanistic models which allow for a causal interpretation of gene interactions. We propose two rigorous frameworks to model gene alteration in a dynamical GRN, depending on whether the network model is quantitative or mechanistic. We explain how these models can be used for experimental design, or, if additional alteration data are available, for validation purposes or to improve the parameter estimation of the original model. We apply these methods to the Gaussian graphical model, which is quantitative but non-mechanistic, and to mechanistic models of Bayesian networks and penalized linear regression.
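In a Gaussian graphical model, genes i and j are conditionally independent given all other genes exactly when entry (i, j) of the precision matrix Θ = Σ⁻¹ is zero, and the partial correlations are ρ_ij = -Θ_ij / √(Θ_ii Θ_jj). The sketch below, using a hypothetical three-gene chain, shows why such a model is quantitative but not mechanistic: it describes the joint distribution, including the marginal covariance between genes 0 and 2, without saying which gene causes which. It does not implement the paper's alteration framework.

```python
import numpy as np

def partial_correlations(precision):
    """rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj), with ones on the diagonal."""
    d = np.sqrt(np.diag(precision))
    rho = -precision / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

# Hypothetical 3-gene chain: gene0 - gene1 - gene2 (no direct gene0-gene2 edge)
theta = np.array([[2.0, -0.8, 0.0],
                  [-0.8, 2.0, -0.8],
                  [0.0, -0.8, 2.0]])
sigma = np.linalg.inv(theta)  # covariance of gene expression

print(np.round(partial_correlations(theta), 3))  # zero partial correlation for (0, 2)
print(np.round(sigma, 3))  # yet gene0 and gene2 still covary marginally
```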

May 5, 2025
Noah DeTal, Christian N. K. Anderson, Ma...

Quantitative Systems Pharmacology (QSP) promises to accelerate drug development, enable personalized medicine, and improve the predictability of clinical outcomes. Realizing its full potential depends on effectively managing the complexity of the underlying mathematical models and biological systems. Here, we present and validate a novel QSP workflow grounded in the principles of sloppy modeling, offering a practical and principled strategy for building and deploying models in a QSP pipeline. Our approach begins with a literature-derived model, constructed to be as comprehensive and unbiased as possible by drawing from the collective knowledge of prior research. At the core of the workflow is the Manifold Boundary Approximation Method (MBAM), which simplifies models while preserving their predictive capacity and mechanistic interpretability. Applying MBAM as a context-specific model reduction strategy, we link the simplified representation directly to the downstream predictions of interest. The resulting reduced models are computationally efficient and well-suited to key QSP tasks, including virtual population generation, experimental design, and target discovery. We demonstrate the utility of this workflow through case studies involving the coagulation cascade and SHIV infection. Our analysis suggests several promising next steps for improving the efficacy of bNAb therapies in HIV-infected patients within the context of a general-purpose QSP modeling workflow.
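"Sloppiness" refers to the eigenvalues of the Fisher information matrix (FIM) spreading over many orders of magnitude, so that only a few parameter combinations are well constrained by data. The sketch below shows this for a hypothetical sum-of-exponentials toy model with unit Gaussian noise, using sensitivities with respect to log-parameters. It only illustrates the spectrum that motivates model reduction; it does not implement MBAM, which follows geodesics on the model manifold to a boundary.

```python
import numpy as np

# Hypothetical toy model: y(t; theta) = sum_i exp(-theta_i * t)
theta = np.array([1.0, 1.5, 2.0, 2.5])
t = np.linspace(0.0, 5.0, 50)

# Sensitivities w.r.t. log-parameters: dy/dlog(theta_i) = -theta_i * t * exp(-theta_i * t)
J = -(theta[None, :] * t[:, None]) * np.exp(-theta[None, :] * t[:, None])
fim = J.T @ J  # Fisher information matrix for unit-variance Gaussian noise
eigvals = np.linalg.eigvalsh(fim)  # ascending order, symmetric matrix

print(eigvals)
print("stiff/sloppy eigenvalue ratio:", eigvals.max() / eigvals.min())
```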

Cellular reprogramming, the artificial transformation of one cell type into another, has been attracting increasing research attention due to its therapeutic potential for complex diseases. However, discovering reprogramming strategies through classical wet-lab experiments is hindered by lengthy time commitments and high costs. In this study, we explore the use of deep reinforcement learning (DRL) to control Boolean network models of complex biological systems, such as gene regulatory networks and signalling pathway networks. We formulate a novel control problem for Boolean network models under the asynchronous update mode in the context of cellular reprogramming. To facilitate scalability, we consider our previously introduced concept of a pseudo-attractor and we improve our procedure for effective identification of pseudo-attractor states. Finally, we devise a computational framework to solve the control problem. To leverage the structure of biological systems, we incorporate graph neural networks with graph convolutions into the artificial neural network approximator for the action-value function learned by the DRL agent. Experiments on a number of large real-world biological networks from literature demonstrate the scalability and effectiveness of our approach.
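To make the asynchronous update mode concrete: in each step exactly one node whose update rule disagrees with its current value is flipped, and attractors are the terminal strongly connected components of the resulting state graph. The sketch below finds them by exhaustive enumeration, which is only feasible for tiny networks (2^n states) and is one reason approximations such as pseudo-attractors are needed at scale. The toggle-switch rules are a hypothetical example; the paper's pseudo-attractor procedure and DRL agent are not shown.

```python
from collections import deque
from itertools import product

def async_successors(state, rules):
    """Asynchronous update: each successor changes exactly one node whose rule disagrees."""
    succ = []
    for i, rule in enumerate(rules):
        new_value = rule(state)
        if new_value != state[i]:
            succ.append(state[:i] + (new_value,) + state[i + 1:])
    return succ or [state]  # no node changes: the state is a fixed point (self-loop)

def reachable(start, rules):
    """All states reachable from start (including start) by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in async_successors(s, rules):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return frozenset(seen)

def attractors(rules):
    """Terminal strongly connected components of the asynchronous state graph."""
    reach = {s: reachable(s, rules) for s in product((False, True), repeat=len(rules))}
    found = set()
    for s, r in reach.items():
        if all(s in reach[t] for t in r):  # s is reachable again from everything it reaches
            found.add(r)
    return found

# Hypothetical toggle switch: A = not B, B = not A (two fixed-point attractors)
toggle = [lambda s: not s[1], lambda s: not s[0]]
# Hypothetical negative loop: A = not B, B = A (one cyclic attractor)
neg_loop = [lambda s: not s[1], lambda s: s[0]]
print(attractors(toggle))
print(attractors(neg_loop))
```

In a reprogramming setting, a control action would flip selected nodes to move the system from the basin of one attractor into another.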

DNA microarray technology enables the simultaneous measurement of expression levels of thousands of genes, thereby facilitating the understanding of the molecular mechanisms underlying complex diseases such as brain tumors and the identification of diagnostic genetic signatures. To derive meaningful biological insights from the high-dimensional and complex gene features obtained through this technology and to analyze gene properties in detail, classical AI-based approaches such as machine learning and deep learning are widely employed. However, these methods face various limitations in managing high-dimensional vector spaces and modeling the intricate relationships among genes. In particular, challenges such as hyperparameter tuning, computational costs, and high processing power requirements can hinder their efficiency. To overcome these limitations, quantum computing and quantum AI approaches are gaining increasing attention. Leveraging quantum properties such as superposition and entanglement, quantum methods enable more efficient parallel processing of high-dimensional data and offer faster and more effective solutions to problems that are computationally demanding for classical methods. In this study, a novel model called "Deep VQC" is proposed, based on the Variational Quantum Classifier approach. Developed using microarray data containing 54,676 gene features, the model successfully classified four different types of brain tumors (ependymoma, glioblastoma, medulloblastoma, and pilocytic astrocytoma) alongside healthy samples with high accuracy. Furthermore, compared to classical ML algorithms, our model demonstrated either superior or comparable classification performance. These results highlight the potential of quantum AI methods as an effective and promising approach for the analysis and classification of complex structures such as brain tumors based on gene expression features.

We extend the traditional framework of steady state energy transduction -- typically characterized by a single input and output -- to multi-resource transduction in open chemical reaction networks (CRNs). Transduction occurs when stoichiometrically balanced processes are driven against their spontaneous directions by coupling them with thermodynamically favorable ones. However, when multiple processes (resources) interact through a shared CRN, identifying the relevant set of processes for analyzing transduction becomes a critical and complex challenge. To address this, we introduce a systematic procedure based on elementary processes, which cannot be further decomposed into subprocesses. Our theory generalizes the methodology used to define transduction efficiency in thermal engines operating between multiple heat baths. By selecting a reference equilibrium environment, it explicitly reveals the inherently relative nature of transduction efficiency and ties its definition to exergy. This framework also allows one to exclude unusable outputs from efficiency calculations. We further extend the concept of chemical gears to multi-process transduction, demonstrating their versatility as an analytical tool in complex settings. Finally, we apply our framework to central metabolic pathways, uncovering deep insights into their operation and highlighting the crucial difference between thermodynamic efficiencies and stoichiometric yields.

May 2, 2025
John F. Malloy, Camerian Millsaps, Kames...

Molecular chirality is critical to biochemical function, but it is unknown when chiral selectivity first became important in the evolutionary transition from geochemistry to biochemistry during the emergence of life. Here, we identify key transitions in the selection of chiral molecules in metabolic evolution, showing how achiral molecules (lacking chiral centers) may have given rise to specific and abundant chiral molecules in the elaboration of metabolic networks from geochemically available precursor molecules. Simulated expansions of biosphere-scale metabolism suggest new hypotheses about the evolution of chiral molecules within biochemistry, including a prominent role for both achiral and chiral compounds as nucleation sites of early metabolic network growth, an increasing enrichment of molecules with more chiral centers as these networks expand, and conservation of broken chiral symmetries along reaction pathways as a general organizing principle. We also find an unexpected enrichment in large, non-polymeric achiral molecules. Leveraging metabolic data of 40,023 genomes and metagenomes, we analyzed the statistics of chiral and achiral molecules in the large-scale organization of metabolism, revealing a chiral-enriched phase of network organization evidenced by system-size dependent chiral scaling laws that differ for individuals and ecosystems. By uncovering how metabolic networks could lead to chiral selection, our findings open new avenues for bridging metabolism and genetics-first approaches to the origin of chirality, providing tools for better timing of major transitions in molecular organization during the emergence of life, understanding the role of chirality in extant and synthetic metabolisms, and informing targets for chirality-based biosignatures.
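Simulated metabolic expansions of this kind are commonly built on a network-expansion loop: starting from a seed set of compounds, every reaction whose substrates are all available fires, its products join the pool, and the process repeats until nothing new appears. The sketch below is a minimal version under that assumption (irreversible reactions, no cofactor handling); the paper's exact expansion procedure may differ, and the compound names are hypothetical. Recording the generation at which each compound first appears is what allows properties such as chirality to be tracked as the network grows.

```python
def network_expansion(seeds, reactions):
    """Iteratively fire every reaction whose substrates are all available.

    reactions: list of (substrates, products) pairs of sets of compound names.
    Returns the final scope and the generation at which each compound first appeared.
    """
    available = set(seeds)
    generation = {c: 0 for c in available}
    step = 0
    while True:
        step += 1
        new = set()
        for substrates, products in reactions:
            if substrates <= available:  # all substrates present: reaction can fire
                new |= products - available
        if not new:  # fixed point reached: the scope of the seed set
            return available, generation
        for c in new:
            generation[c] = step
        available |= new

# Hypothetical toy reaction set
example_reactions = [
    ({"A", "B"}, {"C"}),
    ({"C"}, {"D"}),
    ({"D", "E"}, {"F"}),  # never fires: E cannot be produced from the seeds
]
scope, generation = network_expansion({"A", "B"}, example_reactions)
print(sorted(scope), generation)
```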

The inference of gene regulatory networks (GRNs) is a foundational step towards deciphering the fundamentals of complex biological systems. Inferring a possible regulatory link between two genes can be formulated as a link prediction problem. Inference of GRNs from gene coexpression profiling data may not always reflect true biological interactions, owing to its susceptibility to noise and its tendency to misrepresent true regulatory relationships. Most GRN inference methods face several challenges in the network reconstruction phase. Therefore, it is important to encode gene expression values and to leverage both the prior knowledge gained from available inferred network structures and the positional information of the input network nodes, in order to obtain a better and more confident GRN reconstruction. In this paper, we explore the integration of multiple inferred networks to enhance GRN inference. First, we employ autoencoder embeddings to capture gene expression patterns directly from raw data, preserving intricate biological signals. Then, we embed the prior knowledge from GRN structures by transforming them into a text-like representation using random walks, which are then encoded with a masked language model, BERT, to generate global embeddings for each gene across all networks. Additionally, we embed the positional encodings of the input gene networks to better identify the position of each unique gene within the graph. These embeddings are integrated into a graph transformer-based model, termed GT-GRN, for GRN inference. The GT-GRN model effectively utilizes the topological structure of the ground truth network while incorporating the enriched encoded information. Experimental results demonstrate that GT-GRN significantly outperforms existing GRN inference methods, achieving superior accuracy and highlighting the robustness of our approach.
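The step that turns a network into "text" can be sketched with uniform random walks in the style of DeepWalk: each walk over regulator-to-target edges becomes a sentence of gene tokens, and the resulting corpus is what a masked language model would then be trained on. The function name, walk parameters and the toy network below are hypothetical; the paper's exact walk strategy and the BERT encoding stage are not shown.

```python
import random

def random_walk_corpus(adjacency, walks_per_node, walk_length, seed=0):
    """Generate 'sentences' of gene names from uniform random walks on a directed GRN."""
    rng = random.Random(seed)  # fixed seed for a reproducible corpus
    corpus = []
    for _ in range(walks_per_node):
        for start in sorted(adjacency):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # dead end: a gene with no outgoing regulatory edges
                    break
                walk.append(rng.choice(sorted(neighbors)))
            corpus.append(" ".join(walk))
    return corpus

# Hypothetical toy GRN: regulator -> set of targets
grn = {"TF1": {"G1", "G2"}, "G1": {"G3"}, "G2": {"G3", "TF1"}, "G3": set()}
corpus = random_walk_corpus(grn, walks_per_node=2, walk_length=4)
for sentence in corpus:
    print(sentence)
```

Sorting the neighbor sets before sampling keeps the output deterministic for a given seed, since Python set iteration order is not guaranteed to be stable across runs.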

There is a plethora of highly stochastic non-linear dynamical systems in fields such as molecular biology, chemistry, epidemiology, and ecology. Yet, none of the currently available stochastic models are both accurate and computationally efficient for long-term predictions of large systems. The Linear Noise Approximation (LNA) model for biochemical reaction networks is analytically tractable, which makes it computationally efficient for simulation, analysis, and inference. However, it is only accurate for linear systems and short-time transitions. Other methods can achieve greater accuracy across a wider range of systems, including non-linear ones, but lack analytical tractability. This paper seeks to challenge the prevailing view by demonstrating that the Linear Noise Approximation can indeed capture non-linear dynamics after certain modifications. We introduce a new framework that utilises centre manifold theory allowing us to identify simple interventions to the LNA that do not significantly compromise its computational efficiency. We develop specific algorithms for systems that exhibit oscillations or bi-stability and demonstrate their accuracy and computational efficiency across multiple examples.
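For orientation, the standard (unmodified) LNA couples the macroscopic rate equation for the mean φ with a linear ODE for the variance, dΣ/dt = 2AΣ + B, where A is the Jacobian of the drift and B sums propensities times squared jump sizes. The sketch below integrates this for a hypothetical birth-death process (∅ → X at rate k, X → ∅ at rate γx) with forward Euler. Because this system is linear, the LNA is exact here and both mean and variance approach k/γ; the paper's centre-manifold corrections for non-linear systems are not shown.

```python
def lna_birth_death(k, gamma, phi0=0.0, var0=0.0, t_end=50.0, dt=1e-3):
    """Forward-Euler integration of the LNA mean and variance for a birth-death process."""
    phi, var = phi0, var0
    for _ in range(int(round(t_end / dt))):
        drift_jac = -gamma        # A = d(k - gamma * phi) / d phi
        noise = k + gamma * phi   # B = sum of propensities (each jump has size 1)
        phi, var = (phi + dt * (k - gamma * phi),
                    var + dt * (2.0 * drift_jac * var + noise))
    return phi, var

mean, variance = lna_birth_death(k=10.0, gamma=0.5)
print(mean, variance)  # both approach k / gamma, the Poisson stationary mean and variance
```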

The evolution of chemical reaction networks is often analyzed through kinetic models and energy landscapes, but these approaches fail to capture the deeper structural constraints governing complexity growth. In chemical reaction networks, emergent constraints dictate the organization of reaction pathways, limiting combinatorial expansion and determining stability conditions. This paper introduces a novel approach to modeling chemical reaction networks by incorporating differential geometry into the classical framework of reaction kinetics. By utilizing the Riemannian metric, Christoffel symbols, and a system-specific entropy-like term, we provide a new method for understanding the evolution of complex reaction systems. The approach captures the interdependence between species, the curvature of the reaction network's configuration space, and the tendency of the system to evolve toward more probable states. The interaction topology constrains the accessible reaction trajectories and the introduced differential geometrical approach allows analysis of curvature constraints which help us to understand pathway saturation and transition dynamics. Rather than treating reaction space as an unconstrained combinatorial landscape, we frame it as a structured manifold with higher order curvature describing a geodesic for system evolution under intrinsic constraints. This geometrical perspective offers a unique insight into pathway saturation, self-interruption, and emergent behavior in reaction networks, and provides a scalable framework for modeling large biochemical or catalytic systems.
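For readers less familiar with the geometric vocabulary, the standard definitions behind "Christoffel symbols" and "geodesic" are given below, with x^i the coordinates of the configuration space (for example, species concentrations) and g_ij an assumed Riemannian metric on it. The paper's specific metric and its entropy-like term are not reproduced here.

```latex
\Gamma^{k}_{ij} = \tfrac{1}{2}\, g^{kl}\left(\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij}\right),
\qquad
\ddot{x}^{k} + \Gamma^{k}_{ij}\,\dot{x}^{i}\,\dot{x}^{j} = 0
```

Repeated indices are summed, and g^{kl} denotes the inverse metric. A curved metric makes the Christoffel symbols non-zero, which bends geodesic trajectories away from straight lines in concentration space.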

A continuing frustration for origin-of-life scientists is that abiotic and, by extension, pre-biotic attempts to develop self-sustaining, evolving molecular systems tend to produce more dead-end substances than macromolecular products with the necessary potential for biostructure and function, the so-called 'tar problem'. Nevertheless, primordial life somehow emerged despite that presumed handicap. A resolution of this problem is important in emergence-of-life science because it would provide valuable guidance in choosing subsequent paths of investigation, such as identifying pre-biotic patterns on Mars. To study the problem, we set up a simple non-equilibrium flow dynamical model for the coupled temperature and mass dynamics of the decomposition of a polymeric carbohydrate adsorbed on a mineral surface, with incident stochastic thermal fluctuations. Results show that the model system behaves as a reciprocating thermochemical oscillator. The output fluctuation distribution is bimodal, with a right-weighted component that guarantees a bias towards detachment and desorption of monomeric species such as ribose, even while tar is formed concomitantly. This fluctuating thermochemical reciprocator may ensure that non-performing polymers can be fractionated into a refractory carbon reservoir and active monomers which may be reincorporated into better-performing polymers with less vulnerability towards adsorptive tarring.

Apr 17, 2025
Pengtao Dang, Tingbo Guo, Melissa Fishel...

A physics-informed neural network (PINN) models the dynamics of a system by integrating the governing physical laws into the architecture of a neural network. By enforcing physical laws as constraints, PINN overcomes challenges of data scarcity and potentially high dimensionality. Existing PINN frameworks rely on fully observed time-course data, the acquisition of which could be prohibitive for many systems. In this study, we developed a new PINN learning paradigm, namely Constrained Learning, that enables the approximation of first-order derivatives or motions using non-time-course or partially observed data. Computational principles and a general mathematical formulation of Constrained Learning were developed. We further introduced MPOCtrL (Message Passing Optimization-based Constrained Learning), an optimization approach tailored for the Constrained Learning framework that strives to balance the fitting of physical models and observed data. Its code is available at https://github.com/ptdang1001/MPOCtrL. Experiments on synthetic and real-world data demonstrated that MPOCtrL can effectively detect the nonlinear dependency between observed data and the underlying physical properties of the system. In particular, on the task of metabolic flux analysis, MPOCtrL outperforms all existing data-driven flux estimators.
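In metabolic flux analysis the "physical law" is typically the steady-state mass balance S v = 0, with S the stoichiometric matrix and v the flux vector. The simplest way to reconcile noisy flux estimates with this constraint is an orthogonal least-squares projection onto the null space of S, v = (I - S⁺S) v_obs. The sketch below shows only that baseline on a hypothetical toy network with made-up observations; it is not MPOCtrL's message-passing optimization.

```python
import numpy as np

# Hypothetical toy network: uptake -> A (v0), A -> B (v1), A -> C (v2),
# B -> out (v3), C -> out (v4). Rows: internal metabolites A, B, C.
S = np.array([[1.0, -1.0, -1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, -1.0, 0.0],
              [0.0, 0.0, 1.0, 0.0, -1.0]])

def project_to_steady_state(S, v_obs):
    """Closest flux vector (least squares) satisfying S @ v = 0: v = (I - S^+ S) v_obs."""
    P = np.eye(S.shape[1]) - np.linalg.pinv(S) @ S  # orthogonal projector onto null(S)
    return P @ v_obs

v_obs = np.array([10.0, 6.5, 4.2, 6.0, 3.9])  # hypothetical noisy flux estimates
v = project_to_steady_state(S, v_obs)
print(np.round(v, 3), "max mass-balance residual:", np.abs(S @ v).max())
```

A learned approach such as the one described above replaces this fixed projection with an optimization that also fits the observed data, but the balance constraint being enforced is the same kind of object.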

Apr 17, 2025
Akshata Hegde, Tom Nguyen, Jianlin Cheng

Gene Regulatory Networks (GRNs) are intricate biological systems that control gene expression and regulation in response to environmental and developmental cues. Advances in computational biology, coupled with high-throughput sequencing technologies, have significantly improved the accuracy of GRN inference and modeling. Modern approaches increasingly leverage artificial intelligence (AI), particularly machine learning techniques including supervised, unsupervised, semi-supervised, and contrastive learning, to analyze large-scale omics data and uncover regulatory gene interactions. To support both the application of GRN inference in studying gene regulation and the development of novel machine learning methods, we present a comprehensive review of machine-learning-based GRN inference methodologies, along with the datasets and evaluation metrics commonly used. Special emphasis is placed on the emerging role of cutting-edge deep learning techniques in enhancing inference performance. Potential future directions for improving GRN inference are also discussed.

Apr 16, 2025
Aliza Ehrman, Thomas Kriecherbauer, Lars...

The ribosome flow model (RFM) is a phenomenological model for the unidirectional flow of particles along a 1D chain of $n$ sites. The RFM has been extensively used to study the dynamics of ribosome flow along a single mRNA molecule during translation. In this case, the particles model ribosomes and each site corresponds to a consecutive group of codons. Networks of interconnected RFMs have been used to model and analyze large-scale translation in the cell and, in particular, the effects of competition for shared resources. Here, we analyze the RFM with a negative feedback connection from the protein production rate to the initiation rate. This models, for example, the production of proteins that inhibit the translation of their own mRNA. Using tools from the theory of 2-cooperative dynamical systems, we provide a simple condition guaranteeing that the closed-loop system admits at least one non-trivial periodic solution. When this condition holds, we also explicitly characterize a large set of initial conditions such that any solution emanating from this set converges to a non-trivial periodic solution. Such a solution corresponds to a periodic pattern of ribosome densities along the mRNA, and to a periodic pattern of protein production.
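The RFM dynamics can be written as dx_i/dt = λ_{i-1} x_{i-1}(1 - x_i) - λ_i x_i(1 - x_{i+1}), with initiation flow λ_0(1 - x_1) into the first site and production rate R = λ_n x_n out of the last. The sketch below integrates these equations with forward Euler and lets λ_0 be an arbitrary function of R. The abstract does not specify the feedback law, so `lambda0_fn` and the decreasing function used in the example are hypothetical, and this static feedback is not claimed to satisfy the paper's condition for periodic solutions. In the open-loop case the sketch does reproduce the known RFM property that, at equilibrium, all flows along the chain coincide.

```python
import numpy as np

def simulate_rfm(rates, lambda0_fn, t_end=200.0, dt=1e-2):
    """Forward-Euler simulation of an n-site RFM.

    rates: [lambda_1, ..., lambda_n] elongation and exit rates.
    lambda0_fn: maps the current production rate R = lambda_n * x_n to the initiation
    rate lambda_0 (a constant function gives the open-loop RFM).
    """
    rates = np.asarray(rates, dtype=float)
    n = len(rates)
    x = np.zeros(n)  # start with an empty mRNA
    for _ in range(int(round(t_end / dt))):
        production = rates[-1] * x[-1]                    # protein production rate R
        lam0 = lambda0_fn(production)                     # initiation, possibly fed back
        inflow = np.empty(n)
        inflow[0] = lam0 * (1.0 - x[0])                   # initiation into site 1
        inflow[1:] = rates[:-1] * x[:-1] * (1.0 - x[1:])  # hop from site i to site i+1
        outflow = np.append(inflow[1:], production)       # what enters i+1 leaves i
        x = x + dt * (inflow - outflow)
    return x

def flows(x, rates, lam0):
    """Initiation flow, the n-1 inter-site flows, and the exit (production) flow."""
    return ([lam0 * (1.0 - x[0])]
            + [rates[i] * x[i] * (1.0 - x[i + 1]) for i in range(len(x) - 1)]
            + [rates[-1] * x[-1]])

rates = [1.0, 0.8, 1.2, 1.0]
x_open = simulate_rfm(rates, lambda R: 1.0)  # open loop with lambda_0 = 1
open_flows = flows(x_open, rates, 1.0)
# Hypothetical negative feedback: higher production lowers the initiation rate
x_closed = simulate_rfm(rates, lambda R: 1.0 / (1.0 + 5.0 * R))
print(np.round(x_open, 4), np.round(open_flows, 4))
print(np.round(x_closed, 4))
```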

Substrate modification networks are ubiquitous in living, biochemical systems. A higher-level hypergraph "skeleton" captures key information about which substrates are transformed in the presence of modification-specific enzymes. Many different detailed models can be associated with the same skeleton; however, uncertainty related to model fitting increases with the level of detail. We show that essential dynamical properties such as existence of positive steady states and concentration robustness can be extracted directly from the skeleton independent of the detailed model. The novel formalism of directed hypergraphs is used to prove that bifunctional enzyme action plays a key role in generating robustness. Moreover, we use another novel concept of "current" on a directed hypergraph to establish a link between potentially remote network components. Current is an essential notion required for existence of positive steady states, and furthermore, current-balance combined with bifunctionality generates concentration robustness.

There is growing awareness that the success of pharmacologic interventions on living organisms is significantly impacted by context and timing of exposure. In turn, this complexity has led to an increased focus on regulatory network dynamics in biology and our ability to represent them in a high-fidelity way, in silico. Logic network models show great promise here and their parameter estimation can be formulated as a constraint satisfaction problem (CSP) that is well-suited to the often sparse, incomplete data in biology. Unfortunately, even in the case of Boolean logic, the combinatorial complexity of these problems grows rapidly, challenging the creation of models at physiologically relevant scales. That said, quantum computing, while still nascent, facilitates novel information-processing paradigms with the potential for transformative impact in problems such as this one. In this work, we take a first step toward actualizing this potential by identifying the structure and Boolean decisional logic of a well-studied network linking 5 proteins involved in the neural development of the mammalian cortical area of the brain. We identify the protein-protein connectivity and binary decisional logic governing this network by formulating it as a Boolean Satisfiability (B-SAT) problem. We employ Grover's algorithm, which offers a quadratic speedup over the exhaustive search used by deterministic classical algorithms, to solve this NP-hard problem. Using approaches deployed on both quantum simulators and actual noisy intermediate scale quantum (NISQ) hardware, we accurately recover several high-likelihood models from very sparse protein expression data. The results highlight the differential roles of data types in supporting accurate models; the impact of quantum algorithm design as it pertains to the mutability of quantum hardware; and the opportunities for accelerated discovery enabled by this approach.
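To illustrate how Grover's algorithm applies to a B-SAT instance, the sketch below runs a classical statevector simulation: an oracle flips the phase of every assignment that satisfies the CNF formula, and the diffusion step inverts all amplitudes about their mean. After about (π/4)√(N/M) iterations (N assignments, M solutions) the satisfying assignments dominate the measurement distribution. This is a quadratic speedup over checking all N assignments, not a polynomial-time algorithm. The three-variable formula is hypothetical, and M is counted classically here only to choose the iteration count; the paper's encoding of the protein network is not reproduced.

```python
import math
from itertools import product

import numpy as np

# Hypothetical CNF over x0, x1, x2; a literal is (variable index, required truth value).
# (x0) AND (NOT x1) AND (x1 OR x2): the unique solution is x0=T, x1=F, x2=T.
clauses = [[(0, True)], [(1, False)], [(1, True), (2, True)]]

def satisfies(bits, clauses):
    """True if every clause contains at least one literal matching the assignment."""
    return all(any(bits[v] == val for v, val in clause) for clause in clauses)

n = 3
assignments = list(product((False, True), repeat=n))
marked = np.array([satisfies(a, clauses) for a in assignments])
N, M = len(assignments), int(marked.sum())

iterations = math.floor(math.pi / 4 * math.sqrt(N / M))
amp = np.full(N, 1 / math.sqrt(N))  # uniform superposition over all assignments
for _ in range(iterations):
    amp[marked] *= -1               # oracle: phase-flip satisfying assignments
    amp = 2 * amp.mean() - amp      # diffusion: inversion about the mean
probs = amp ** 2
best = assignments[int(np.argmax(probs))]
print(iterations, best, probs[marked].sum())
```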