Single-cell data reveal biological stochasticity between cells of identical genome and environment, in particular highlighting the transcriptional bursting phenomenon. To account for this property, gene expression may be modeled as a continuous-time Markov chain in which biochemical species are described discretely, leading to Gillespie's stochastic simulation algorithm (SSA), which turns out to be computationally expensive for realistic mRNA and protein copy numbers. Alternatively, hybrid models based on piecewise-deterministic Markov processes (PDMPs) offer an effective compromise for capturing cell-to-cell variability, but their simulation remains limited to specialized mathematical communities. With a view to making them more accessible, we present here a simple simulation method that is reminiscent of the SSA while allowing for much lower computational cost. We detail the algorithm for a bursty PDMP describing an arbitrary number of interacting genes, and prove that it simulates exact trajectories of the model. As an illustration, we use the algorithm to simulate a two-gene toggle switch: this example highlights the fact that bimodal distributions as observed in real data are not explained by transcriptional bursting per se, but rather by distinct burst frequencies that may emerge from interactions between genes.
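To make the PDMP picture concrete, here is a minimal Python sketch of a single gene with a constant burst frequency: the protein level decays deterministically between bursts and jumps by an exponentially distributed amount when a burst arrives. Parameter values are placeholders, and the constant-rate case is only a toy; the algorithm described above covers an arbitrary number of interacting genes, where burst frequencies depend on the current state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal single-gene bursty PDMP (illustrative parameters, not from the paper):
# the protein level decays deterministically between bursts and jumps by an
# exponentially distributed amount when a burst occurs.
burst_freq = 2.0       # bursts per unit time (constant here; interactions would make it state-dependent)
mean_burst_size = 5.0  # mean jump size
decay_rate = 1.0       # protein degradation rate

def simulate(t_end, x0=0.0):
    t, x = 0.0, x0
    times, levels = [t], [x]
    while t < t_end:
        tau = rng.exponential(1.0 / burst_freq)      # waiting time to the next burst
        t_next = min(t + tau, t_end)
        x = x * np.exp(-decay_rate * (t_next - t))   # deterministic decay between bursts
        if t + tau <= t_end:
            x += rng.exponential(mean_burst_size)    # burst: instantaneous jump
        t = t_next
        times.append(t)
        levels.append(x)
    return np.array(times), np.array(levels)

times, levels = simulate(t_end=50.0)
print(f"final protein level: {levels[-1]:.2f}")
```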
Graph-structured data is ubiquitous in scientific domains, where models often face imbalanced learning settings. In imbalanced regression, domain preferences focus on specific target-value ranges that represent the most scientifically valuable cases, yet this setting remains largely unexplored in the literature. In this paper, we present Spectral Manifold Harmonization (SMH), a novel approach to this imbalanced regression challenge on graph-structured data: it generates synthetic graph samples that preserve topological properties while focusing on underrepresented regions of the target distribution. Conventional methods fail in this context because they either ignore graph topology when generating samples or do not target specific domain ranges, resulting in models biased toward average target values. Experimental results demonstrate the potential of SMH on chemistry and drug discovery benchmark datasets, showing consistent improvements in predictive performance for the target domain ranges of interest.
HIV epidemiological data is increasingly complex, requiring advanced computation for accurate cluster detection and forecasting. We employed quantum-accelerated machine learning to analyze HIV prevalence at the ZIP-code level using AIDSVu and synthetic social determinants of health (SDoH) data for 2022. Our approach compared classical clustering (DBSCAN, HDBSCAN) with a quantum approximate optimization algorithm (QAOA), developed a hybrid quantum-classical neural network for HIV prevalence forecasting, and used quantum Bayesian networks to explore causal links between SDoH factors and HIV incidence. The QAOA-based method achieved 92% accuracy in cluster detection within 1.6 seconds, outperforming classical algorithms. Meanwhile, the hybrid quantum-classical neural network predicted HIV prevalence with 94% accuracy, surpassing a purely classical counterpart. Quantum Bayesian analysis identified housing instability as a key driver of HIV cluster emergence and expansion, with stigma exerting a geographically variable influence. These quantum-enhanced methods deliver greater precision and efficiency in HIV surveillance while illuminating critical causal pathways. This work can guide targeted interventions, optimize resource allocation for pre-exposure prophylaxis (PrEP), and address structural inequities fueling HIV transmission.
Chemical Reaction Networks (CRNs) provide a powerful framework for modeling complex systems due to their compositionality, which makes them well-suited for analyzing interactions of subsystems within larger aggregate systems. This work presents a thermodynamic formalism for ranking CRN pathways under fixed throughput currents (fixed velocities of species flowing in and out of the system), where pathways represent subnetworks capable of performing the stipulated chemical conversion. We define a thermodynamic cost function for pathways derived from the large-deviation theory of stochastic CRNs, which decomposes into two components: an ongoing maintenance cost to sustain a non-equilibrium steady state (NESS), and a restriction cost, quantifying the ongoing improbability of neutralizing reactions outside the specified pathway. Applying this formalism to detailed-balanced CRNs in the linear response regime, we prove that the resistance of a CRN decreases as reactions are added that support the throughput current, and that the maintenance cost, the restriction cost, and the thermodynamic cost of nested pathways are bounded below by those of their hosting network. Extending the analysis far from equilibrium, we find that while cost is non-decreasing for progressively more restricted nested pathways near equilibrium, multimolecular CRN examples can be found that assign lower costs to more restricted pathways at far-from-equilibrium NESSs. The possibility of reducing the resistance of a network at fixed throughput, while also simplifying the network, may have implications for enzyme family evolution, in which novel reaction mechanisms may first lead to a proliferation of pathways through non-specific catalysis, while later selection for specificity may benefit both from species retention and from more efficient use of autocatalysts to improve throughput.
The rapid expansion of biomolecular datasets presents significant challenges for computational biology. Quantum computing emerges as a promising solution to address these complexities. This study introduces a novel quantum framework for analyzing TART-T and TART-C gene data by integrating genomic and structural information. Leveraging a Quantum Neural Network (QNN), we classify hotspot mutations, utilizing quantum superposition to uncover intricate relationships within the data. Additionally, a Variational Quantum Eigensolver (VQE) is employed to estimate molecular ground-state energies through a hybrid classical-quantum approach, overcoming the limitations of traditional computational methods. Implemented using IBM Qiskit, our framework demonstrates high accuracy in both mutation classification and energy estimation on current Noisy Intermediate-Scale Quantum (NISQ) devices. These results underscore the potential of quantum computing to advance the understanding of gene function and protein structure. Furthermore, this research serves as a foundational blueprint for extending quantum computational methods to other genes and biological systems, highlighting their synergy with classical approaches and paving the way for breakthroughs in drug discovery and personalized medicine.
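Since the abstract does not give implementation details, the following classical toy illustrates only the variational principle that a VQE relies on: minimize the expectation value of a parameterized trial state over a Hamiltonian, here an arbitrary 2x2 Hermitian matrix. It is not the paper's Qiskit implementation and uses no quantum circuits.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy Hamiltonian (illustrative 2x2 Hermitian matrix, not a molecular Hamiltonian).
H = np.array([[1.0, 0.5],
              [0.5, -1.0]])

def energy(theta):
    # Single-parameter ansatz |psi(theta)> = cos(theta/2)|0> + sin(theta/2)|1>
    psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    return psi @ H @ psi  # expectation value <psi|H|psi>

res = minimize_scalar(energy, bounds=(0, 2 * np.pi), method="bounded")
exact = np.linalg.eigvalsh(H)[0]
print(f"variational minimum: {res.fun:.4f}, exact ground-state energy: {exact:.4f}")
```

For this real symmetric toy Hamiltonian the one-parameter ansatz already reaches the exact ground-state energy, which is the behavior a VQE aims to approximate with circuit ansatze on larger problems.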
Recent benchmarks reveal that models for single-cell perturbation response are often outperformed by simply predicting the dataset mean. We trace this anomaly to a metric artifact: control-referenced deltas and unweighted error metrics reward mode collapse whenever the control is biased or the biological signal is sparse. Large-scale in silico simulations and analysis of two real-world perturbation datasets confirm that shared reference shifts, not genuine biological change, drive high performance in these evaluations. We introduce differentially expressed gene (DEG)-aware metrics, namely weighted mean-squared error (WMSE) and weighted delta $R^{2}$ ($R^{2}_{w}(\Delta)$) computed with respect to all perturbations, which measure error in niche signals with high sensitivity. We further introduce negative and positive performance baselines to calibrate these metrics. With these improvements, the mean baseline sinks to null performance while genuine predictors are correctly rewarded. Finally, we show that using WMSE as a loss function reduces mode collapse and improves model performance.
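One plausible reading of the proposed metrics is sketched below, assuming a per-gene weight vector that up-weights DEGs; the paper's exact weighting scheme and its aggregation across perturbations may differ.

```python
import numpy as np

def wmse(y_true, y_pred, weights):
    """Weighted mean-squared error over genes; `weights` up-weights DEGs."""
    w = weights / weights.sum()
    return float(np.sum(w * (y_true - y_pred) ** 2))

def weighted_delta_r2(delta_true, delta_pred, weights):
    """Weighted R^2 on perturbation deltas (predicted vs. observed change from control)."""
    w = weights / weights.sum()
    ss_res = np.sum(w * (delta_true - delta_pred) ** 2)
    ss_tot = np.sum(w * (delta_true - np.sum(w * delta_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy example: a mode-collapsed "mean predictor" is penalized once the few DEGs
# (here, the first 5 genes) carry most of the weight.
rng = np.random.default_rng(1)
n_genes = 100
control = rng.normal(0, 1, n_genes)
true_delta = np.zeros(n_genes); true_delta[:5] = 3.0       # sparse biological signal
weights = np.ones(n_genes); weights[:5] = 20.0              # DEG-aware weighting (illustrative)
mean_pred_delta = np.zeros(n_genes)                         # mode-collapsed baseline
print("WMSE of mean baseline:", wmse(control + true_delta, control + mean_pred_delta, weights))
print("R2_w(delta) of mean baseline:", weighted_delta_r2(true_delta, mean_pred_delta, weights))
```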
Populations of cells regulate gene expression in response to external signals, but their ability to make reliable collective decisions is limited by both intrinsic noise in molecular signaling and variability between individual cells. In this work, we use optogenetic control of the canonical Wnt pathway as an example to study how reliably information about an external signal is transmitted to a population of cells, and determine an optimal encoding strategy to maximize information transmission from Wnt signals to gene expression. We find that it is possible to reach an information capacity beyond 1 bit only through an appropriate, discrete encoding of signals. By averaging over an increasing number of outputs, we systematically vary the effective noise in the pathway. As the effective noise decreases, the optimal encoding comprises more discrete input signals. These signals do not need to be fine-tuned. The optimal code transitions into a continuous code in the small-noise limit, which can be shown to be consistent with the Jeffreys prior. We visualize the performance of signal encodings using decoding maps. Our results suggest that optogenetic Wnt signaling allows for regulatory control beyond a simple binary switch, and they provide a framework for applying ideas from information processing to single-cell in vitro experiments.
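The capacity optimization described here can be illustrated with the classical Blahut-Arimoto algorithm on a discretized channel; the noise level, number of input levels, and output binning below are assumptions chosen for illustration, not the experimental values from the study.

```python
import numpy as np

def blahut_arimoto(P_y_given_x, tol=1e-9, max_iter=2000):
    """Capacity (in bits) of a discrete channel with transition matrix P(y|x); rows index inputs."""
    n_x = P_y_given_x.shape[0]
    p_x = np.full(n_x, 1.0 / n_x)
    for _ in range(max_iter):
        q_y = p_x @ P_y_given_x                                        # current output distribution
        d = np.sum(P_y_given_x * np.log2(P_y_given_x / q_y + 1e-300), axis=1)  # D(P(.|x) || q)
        new_p = p_x * np.exp2(d)
        new_p /= new_p.sum()
        if np.max(np.abs(new_p - p_x)) < tol:
            p_x = new_p
            break
        p_x = new_p
    q_y = p_x @ P_y_given_x
    d = np.sum(P_y_given_x * np.log2(P_y_given_x / q_y + 1e-300), axis=1)
    return float(np.sum(p_x * d)), p_x

# Toy signaling channel: 8 discrete input levels, Gaussian-like output noise over 50 bins.
levels = np.linspace(0, 1, 8)
bins = np.linspace(-0.3, 1.3, 50)
sigma = 0.15                                   # effective pathway noise (assumed)
P = np.exp(-0.5 * ((bins[None, :] - levels[:, None]) / sigma) ** 2)
P /= P.sum(axis=1, keepdims=True)

capacity, p_opt = blahut_arimoto(P)
print(f"capacity ~ {capacity:.2f} bits; optimal input weights: {np.round(p_opt, 2)}")
```

Shrinking `sigma` (the effective noise) in this toy channel lets the optimal input distribution spread over more levels, mirroring the trend described in the abstract.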
Estimating single-cell responses across various perturbations facilitates the identification of key genes and enhances drug screening, significantly boosting experimental efficiency. However, single-cell sequencing is a destructive process, making it impossible to capture the same cell's phenotype before and after perturbation. Consequently, data collected under perturbed and unperturbed conditions are inherently unpaired. Existing methods either attempt to forcibly pair unpaired data through random sampling or neglect the inherent relationship between unperturbed and perturbed cells during modeling. In this work, we propose Unlasting, a framework of dual conditional diffusion models based on Dual Diffusion Implicit Bridges (DDIB) that learns the mapping between the two data distributions, effectively addressing the challenge of unpaired data; the framework can also be interpreted as a form of data augmentation. We integrate gene regulatory network (GRN) information to propagate perturbation signals in a biologically meaningful way, and incorporate a dedicated mask model that predicts silent genes to improve the quality of generated profiles. Moreover, gene expression under the same perturbation often varies significantly across cells, frequently exhibiting a bimodal distribution that reflects intrinsic heterogeneity; to capture this, we introduce a biologically grounded evaluation metric that better reflects the inherent heterogeneity in single-cell responses.
Understanding protein function at the molecular level requires connecting residue-level annotations with physical and structural properties. This can be cumbersome and error-prone when functional annotation, computation of physico-chemical properties, and structure visualization are separated. To address this, we introduce ProCaliper, an open-source Python library for computing and visualizing physico-chemical properties of proteins. It can retrieve annotation and structure data from UniProt and AlphaFold databases, compute residue-level properties such as charge, solvent accessibility, and protonation state, and interactively visualize the results of these computations along with user-supplied residue-level data. Additionally, ProCaliper incorporates functional and structural information to construct and optionally sparsify networks that encode the distance between residues and/or annotated functional sites or regions. The package ProCaliper and its source code, along with the code used to generate the figures in this manuscript, are freely available at https://github.com/PNNL-Predictive-Phenomics/ProCaliper.
Designing reaction pathways that maximize the production of a target compound in a given metabolic network is a fundamental problem in systems biology. In this study, we systematically explore the non-oxidative glycolysis metabolic network, guided by the principle that reactions with negative Gibbs free energy differences are thermodynamically favored. We enumerate alternative pathways that implement the net non-oxidative glycolysis reaction, categorized by their length. Our analysis reveals several alternative thermodynamically favorable pathways beyond those reported in experiments. In addition, we identify molecules within the network, such as 3-hydroxypropionic acid, that may have significant potential for further investigation.
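The enumeration strategy can be sketched as a subset search that keeps only reaction combinations realizing the target net conversion with an overall negative Gibbs free energy change. The reactions, stoichiometries, and delta-G values below are hypothetical placeholders, and each reaction is used at most once with unit flux, which is simpler than the enumeration performed in the study.

```python
from itertools import combinations

# Toy network with hypothetical reactions and placeholder delta_G values (kJ/mol);
# stoichiometries are given as {species: net coefficient}.
reactions = [
    ("r1: A -> B", -10.0, {"A": -1, "B": +1}),
    ("r2: B -> C",  -5.0, {"B": -1, "C": +1}),
    ("r3: A -> C",  +2.0, {"A": -1, "C": +1}),
]
target = {"A": -1, "C": +1}   # desired net conversion

def net(combo):
    total = {}
    for _, _, stoich in combo:
        for sp, c in stoich.items():
            total[sp] = total.get(sp, 0) + c
    return {sp: c for sp, c in total.items() if c != 0}

# Enumerate subsets (each reaction used at most once, unit flux) and keep those that
# realize the target conversion with an overall negative Gibbs free energy change.
for k in range(1, len(reactions) + 1):
    for combo in combinations(reactions, k):
        dG = sum(g for _, g, _ in combo)
        if net(combo) == target and dG < 0:
            print([name for name, _, _ in combo], f"total dG = {dG:.1f} kJ/mol")
```

In this toy instance the direct route A -> C is thermodynamically unfavorable, while the longer route via B passes the filter, illustrating why alternative pathways can outrank the obvious one.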
This paper introduces a tamper-resistant framework for large language models (LLMs) in medical applications, utilizing quantum gradient descent (QGD) to detect malicious parameter modifications in real time. Integrated into a LLaMA-based model, QGD monitors weight amplitude distributions, identifying adversarial fine-tuning anomalies. Tests on the MIMIC and eICU datasets show minimal performance impact (accuracy drops only from 89.1 to 88.3 on MIMIC) while robustly detecting tampering. PubMedQA evaluations confirm preserved biomedical question-answering capabilities. Compared to baselines like selective unlearning and cryptographic fingerprinting, QGD offers superior sensitivity to subtle weight changes. This quantum-inspired approach ensures secure, reliable medical AI, extensible to other high-stakes domains.
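As a purely classical stand-in for the monitoring idea (not quantum gradient descent), one can compare the amplitude distribution of a layer's current weights against a trusted snapshot and flag large shifts; the tampering pattern and alert threshold below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

# Classical sketch: compare absolute-weight distributions of a current checkpoint
# against a trusted reference snapshot and flag distributional shifts.
rng = np.random.default_rng(42)

reference_weights = rng.normal(0, 0.02, size=10_000)           # trusted snapshot
benign_update     = reference_weights + rng.normal(0, 0.001, size=10_000)
tampered_update   = reference_weights.copy()
tampered_update[:500] += 0.3                                    # adversarial edit to a few weights

def tamper_score(reference, current):
    """Kolmogorov-Smirnov distance between weight-amplitude distributions."""
    stat, _ = ks_2samp(np.abs(reference), np.abs(current))
    return stat

threshold = 0.02   # illustrative; in practice calibrated on benign fine-tuning runs
for name, w in [("benign", benign_update), ("tampered", tampered_update)]:
    score = tamper_score(reference_weights, w)
    print(f"{name}: KS = {score:.4f}  ->  {'ALERT' if score > threshold else 'ok'}")
```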
The propagation of noise through parallel regulatory pathways is a prominent feature of feed-forward loops in genetic networks. Although the contributions of the direct and indirect regulatory pathways of feed-forward loops to output variability have been well characterized, the impact of their joint action arising from their shared input and output remains poorly understood. Here, we identify an additional component of noise that emerges specifically from this convergent nature of the pathways. Using inter-gene correlations, we reveal the regulatory basis of the additional noise and interpret it as synergy or redundancy in noise propagation, depending on whether the combined pathways amplify or suppress fluctuations. This framework not only accounts for previously observed differences in noise behavior across coherent and incoherent feed-forward loops but also provides a generalizable strategy to connect network structure with stochastic gene regulation.
Over the last decade, proteomic analysis of single cells by mass spectrometry transitioned from an uncertain possibility to a set of robust and rapidly advancing technologies supporting the accurate quantification of thousands of proteins. We review the major drivers of this progress, from establishing feasibility to powerful and increasingly scalable methods. We focus on the tradeoffs and synergies of different technological solutions within a coherent conceptual framework, which projects considerable room both for throughput scaling and for extending the analysis scope to functional protein measurements. We highlight the potential of these technologies to support the development of mechanistic biophysical models and help uncover new principles.
This paper proposes an extension of the traditional Central Dogma of molecular biology to a more dynamic model termed the Central Dogma Cycle (CDC) and a broader network called the Central Dogma Cyclic Network (CDCN). While the Central Dogma is necessary for genetic information flow, it is not sufficient to fully explain cellular memory and information management. The CDC incorporates additional well-established steps, including protein folding and protein networking, highlighting the cyclical nature of information flow in cells. This cyclic architecture is proposed as a key mechanism for cellular memory, drawing analogies to memory functions in computers, such as input, read, write, execute, and erase. The interconnected cycles within the CDCN, including metabolic cycles and signaling pathways, are suggested to function akin to latches in computer memory, contributing to the storage and processing of cellular information beyond nucleic acid sequences. Understanding cellular memory through this cyclic network model offers a new perspective on heredity, cell processes, and the potential disruptions in disease pathology.
Understanding the properties of biological systems offers an exciting avenue for applying advanced computational approaches. A specific class of problems that arises in addressing biological challenges is optimization. In this work, we present the results of a proof-of-concept study that applies a quantum-inspired optimization algorithm to simulate a viral response. We formulate an Ising-type model to describe the patterns of gene activity in host responses. Reducing the problem to the Ising form allows the use of available quantum and quantum-inspired optimization tools, and we demonstrate the application of a quantum-inspired optimization algorithm to this problem. Our study paves the way for exploring the full potential of quantum and quantum-inspired optimization tools in biological applications.
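A classical sketch of this workflow: encode gene activity patterns as Ising spins and minimize the resulting energy. Simulated annealing is used below as a stand-in for the quantum-inspired optimizer, and the couplings and fields are random placeholders rather than values fitted to host-response data.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy Ising instance: spins s_i = +/-1 could encode gene on/off states; couplings J and
# fields h are random placeholders, not fitted to any host-response data.
n = 20
J = rng.normal(0, 1, (n, n)); J = np.triu(J, 1); J = J + J.T   # symmetric, zero diagonal
h = rng.normal(0, 1, n)

def energy(s):
    return -0.5 * s @ J @ s - h @ s

def simulated_annealing(steps=20_000, T0=5.0, T1=0.01):
    s = rng.choice([-1, 1], size=n)
    best_s, best_e = s.copy(), energy(s)
    for t in range(steps):
        T = T0 * (T1 / T0) ** (t / steps)          # geometric cooling schedule
        i = rng.integers(n)
        dE = 2 * s[i] * (J[i] @ s + h[i])          # energy change from flipping spin i
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            s[i] = -s[i]
            e = energy(s)
            if e < best_e:
                best_s, best_e = s.copy(), e
    return best_s, best_e

s_opt, e_opt = simulated_annealing()
print("lowest energy found:", round(e_opt, 3))
```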
In recent decades, it has been emphasized that the evolving structure of networks may be shaped by interaction principles that yield sparse graphs with vertex degree distributions exhibiting an algebraic tail, along with other structural traits not featured in traditional random graphs. In this respect, through a mean-field approach, this review addresses the statistical physics of graph models based on the interaction principle of duplication-divergence. Extensions and refinements of the duplication-divergence model are also reviewed, as are generalizations of other known models. Possible research gaps and related prior results are then discussed.
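For concreteness, one common variant of the duplication-divergence growth rule can be simulated in a few lines: duplicate a randomly chosen node and retain each parental edge independently with some probability. The retention probability and network size below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def duplication_divergence(n_final, p_retain=0.4):
    """Grow a graph by repeatedly duplicating a random node and retaining each of its
    edges independently with probability p_retain (one common variant of the model)."""
    adj = {0: {1}, 1: {0}}                         # seed graph: a single edge
    while len(adj) < n_final:
        new = len(adj)
        parent = rng.integers(new)                 # pick a node to duplicate
        adj[new] = set()
        for nbr in adj[parent]:
            if rng.random() < p_retain:            # divergence: lose some parental links
                adj[new].add(nbr)
                adj[nbr].add(new)
    return adj

g = duplication_divergence(2000)
degrees = np.array([len(nbrs) for nbrs in g.values()])
print("mean degree:", degrees.mean().round(2), "| max degree:", degrees.max())
```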
We study the impact of noise on attractor dynamics in Boolean networks, focusing on their stability and transition behaviors. By constructing attractor matrices based on single-node perturbations, we propose a framework to quantify attractor stability and identify dominant attractors. We find that attractors are more stable than predicted by basin sizes, showing the importance of dynamical structure in noisy environments. In addition, under global perturbations, basin sizes dictate long-term behavior; under local noise, however, attractor dominance is determined by noise-induced transition patterns rather than basin sizes. Our results show that transition dynamics induced by stochastic perturbations provide an efficient and quantitative description of attractor stability and dynamics in Boolean networks under noise.
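The attractor-matrix construction can be illustrated on a toy synchronous Boolean network: enumerate attractors and their basins, then count which attractor is reached after flipping each single node of each attractor state. The update rules below are made up for illustration.

```python
from itertools import product

# Toy 3-node Boolean network with synchronous updates (rules are illustrative only).
N = 3
def update(state):
    x, y, z = state
    return (y and not z,      # node 0
            x or z,           # node 1
            not x)            # node 2

def find_attractor(state):
    """Iterate the synchronous dynamics until a cycle is found; return it as a frozenset."""
    seen = {}
    while state not in seen:
        seen[state] = len(seen)
        state = update(state)
    start = seen[state]
    return frozenset(s for s, i in seen.items() if i >= start)

# Basins of attraction over the full state space.
attractors = {}
for s in product([False, True], repeat=N):
    attractors.setdefault(find_attractor(s), []).append(s)

# Attractor transition counts under single-node flips (a simple local-noise model).
labels = list(attractors)
for a in labels:
    counts = {b: 0 for b in labels}
    for state in a:
        for i in range(N):
            flipped = tuple(not v if j == i else v for j, v in enumerate(state))
            counts[find_attractor(flipped)] += 1
    row = ", ".join(f"A{labels.index(b)}:{c}" for b, c in counts.items())
    print(f"A{labels.index(a)} (basin size {len(attractors[a])}): {row}")
```

Even in this tiny example, the attractor reached most often after single-node flips need not be the one with the largest basin, which is the kind of discrepancy the abstract highlights.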
We present a novel framework for chemical learning based on Competitive Dimerization Networks (CDNs): systems in which multiple molecular species, such as proteins or DNA/RNA oligomers, reversibly bind to form dimers. We show that these networks can be trained in vitro through directed evolution, enabling the implementation of complex learning tasks such as multiclass classification without digital hardware or explicit parameter tuning. Each molecular species functions analogously to a neuron, with binding affinities acting as tunable synaptic weights. A training protocol involving mutation, selection, and amplification of DNA-based components allows CDNs to robustly discriminate among noisy input patterns. The resulting classifiers exhibit strong output contrast and high mutual information between input and output, especially when guided by a contrast-enhancing loss function. Comparative analysis with in silico gradient descent training reveals closely correlated performance. These results establish CDNs as a promising platform for analog physical computation, bridging synthetic biology and machine learning, and advancing the development of adaptive, energy-efficient molecular computing systems.
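In a weak-binding limit, a CDN classifier can be caricatured as a linear readout whose weights are binding affinities, trained by mutation and selection. The sketch below makes that caricature explicit; it is not the paper's CDN model or training protocol, and all sizes, noise levels, and the selection rule are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Cartoon of a competitive dimerization classifier: the readout of output species j is
# approximated as sum_i K[i, j] * x[i], so binding affinities K act as synaptic weights.
# Directed evolution = mutate K, keep mutations that improve accuracy on noisy patterns.
n_inputs, n_classes, n_samples = 12, 3, 300

prototypes = rng.uniform(0.2, 1.0, (n_classes, n_inputs))            # class-specific input patterns
labels = rng.integers(n_classes, size=n_samples)
X = prototypes[labels] + rng.normal(0, 0.15, (n_samples, n_inputs))  # noisy presentations
X = np.clip(X, 0, None)                                              # concentrations are non-negative

def accuracy(K):
    readout = X @ K                        # output-species signals
    return np.mean(readout.argmax(axis=1) == labels)

K = rng.uniform(0, 1, (n_inputs, n_classes))    # initial (random) affinities
best = accuracy(K)
for generation in range(500):
    mutant = np.clip(K + rng.normal(0, 0.05, K.shape), 0, None)  # mutate affinities, keep them >= 0
    acc = accuracy(mutant)
    if acc >= best:                                              # selection: keep improvements
        K, best = mutant, acc
print(f"classification accuracy after directed evolution: {best:.2f}")
```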