Browse, search and filter the latest cybersecurity research papers from arXiv
Simplified stochastic models are widely used in the study of frequency-resolved noise propagation in biochemical reaction networks, a common measure being the coherence between random fluctuations in molecule number trajectories. Such models have also found widespread application in the quantification of how information is transmitted in reaction networks via the mutual information (MI) rate. A common assumption is that, under timescale separation, estimates for the coherence and MI rate obtained from simplified (reduced) models closely approximate those in the underlying full models. Here, we challenge that assumption by showing that, while reduced models can faithfully reproduce low-order statistics of molecular counts, they frequently incur substantial discrepancies in the coherence spectrum, especially at intermediate and high frequencies. These errors, in turn, lead to significant inaccuracies in the resulting estimates for the MI rates. We show that the observed discrepancies are due to the interplay between the structure of the underlying reaction networks, the specific model reduction method that is applied, and the asymptotic limits relating the full and the reduced models. We illustrate our results in canonical models of enzyme catalysis and gene expression, highlighting practical implications for quantifying information flow in cells.
This article serves to concisely review the link between gradient flow systems on hypergraphs and information geometry which has been established within the last five years. Gradient flow systems describe a wealth of physical phenomena and provide powerful analytical techniques which are based on the variational energy-dissipation principle. Modern nonequilibrium physics has complemented this classical principle with thermodynamic uncertainty relations, speed limits, entropy production rate decompositions, and many more. In this article, we formulate these modern principles within the framework of perturbed gradient flow systems on hypergraphs. In particular, we discuss the geometry induced by the Bregman divergence, the physical implications of dual foliations, as well as the corresponding infinitesimal Riemannian geometry for gradient flow systems. Through the geometrical perspective, we are naturally led to new concepts such as moduli spaces for perturbed gradient flow systems and thermodynamical area, which is crucial for understanding speed limits. We hope to encourage readers working in either of the two fields to further expand on and foster the interaction between them.
Canalization is a key organizing principle in complex systems, particularly in gene regulatory networks. It describes how certain input variables exert dominant control over a function's output, thereby imposing hierarchical structure and conferring robustness to perturbations. Degeneracy, in contrast, captures redundancy among input variables and reflects the complete dominance of some variables by others. Both properties influence the stability and dynamics of discrete dynamical systems, yet their combinatorial underpinnings remain incompletely understood. Here, we derive recursive formulas for counting Boolean functions with prescribed numbers of essential variables and given canalizing properties. In particular, we determine the number of non-degenerate canalizing Boolean functions -- that is, functions for which all variables are essential and at least one variable is canalizing. Our approach extends earlier enumeration results on canalizing and nested canalizing functions. It provides a rigorous foundation for quantifying how frequently canalization occurs among random Boolean functions and for assessing its pronounced over-representation in biological network models, where it contributes to both robustness and to the emergence of distinct regulatory roles.
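To make the counted objects concrete, here is a minimal brute-force sketch in Python that enumerates non-degenerate canalizing Boolean functions for very small numbers of variables. It is not the recursive formula derived in the paper; it only checks the definitions stated above (all variables essential, at least one variable canalizing) against every truth table. The function names are illustrative choices, not taken from the paper.

```python
from itertools import product


def is_essential(f, inputs, i):
    """Variable i is essential if flipping it changes the output for some input."""
    for x in inputs:
        flipped = x[:i] + (1 - x[i],) + x[i + 1:]
        if f[x] != f[flipped]:
            return True
    return False


def is_canalizing(f, inputs, i):
    """Variable i is canalizing if fixing it to some value forces a constant output."""
    for a in (0, 1):
        outputs = {f[x] for x in inputs if x[i] == a}
        if len(outputs) == 1:
            return True
    return False


def count_nondegenerate_canalizing(n):
    """Count n-variable Boolean functions with all variables essential
    and at least one canalizing variable, by exhaustive enumeration."""
    inputs = list(product((0, 1), repeat=n))
    count = 0
    for outputs in product((0, 1), repeat=2 ** n):
        f = dict(zip(inputs, outputs))
        all_essential = all(is_essential(f, inputs, i) for i in range(n))
        some_canalizing = any(is_canalizing(f, inputs, i) for i in range(n))
        if all_essential and some_canalizing:
            count += 1
    return count


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, count_nondegenerate_canalizing(n))
```

The enumeration visits all 2^(2^n) truth tables, so it is only practical for n up to about 4; this is the scaling problem that closed-form or recursive counting avoids.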
Cryo-electron tomography (cryo-ET) enables structural characterization of biomolecules under near-native conditions. Existing approaches for interpreting the resulting three-dimensional volumes are computationally expensive and have difficulty interpreting density associated with small proteins/complexes. To explore alternate approaches for identifying proteins in cryo-ET data, we pursued a Graph Network and topologically invariant approach. Here, we report on a fast algorithm that distinguishes volumes containing protein density from noise by searching for nuances of evolutionarily conserved motifs and the geometrical characteristics of protein structure. GRIP-Tomo 2.0 is a machine-learning pipeline that extracts interpretable topological features of protein structures within noisy experimental backgrounds. Compared to version 1.0, the new pipeline includes three upgrades that significantly improve performance: synthetic tomogram generation simulating realistic noise, graph-based persistent feature extraction as protein fingerprints, and high-performance computing acceleration. GRIP-Tomo 2.0 achieves over 90% accuracy in distinguishing proteins from noise for synthetic datasets and over 80% accuracy for real datasets, which represents a foundational step toward advancing cryo-ET workflows and empowering automated detection of both small and large proteins for visual proteomics.
Computationally predicting protein-protein interactions (PPIs) is challenging due to the lack of integrated, multimodal protein representations. DPEB is a curated collection of 22,043 human proteins that integrates four embedding types: structural (AlphaFold2), transformer-based sequence (BioEmbeddings), contextual amino acid patterns (ESM-2: Evolutionary Scale Modeling), and sequence-based n-gram statistics (ProtVec). AlphaFold2 protein structures are available through public databases (e.g., AlphaFold2 Protein Structure Database), but the internal neural network embeddings are not. DPEB addresses this gap by providing AlphaFold2-derived embeddings for computational modeling. Our benchmark evaluations show GraphSAGE with BioEmbedding achieved the highest PPI prediction performance (87.37% AUROC, 79.16% accuracy). The framework also achieved 77.42% accuracy for enzyme classification and 86.04% accuracy for protein family classification. DPEB supports multiple graph neural network methods for PPI prediction, enabling applications in systems biology, drug target identification, pathway analysis, and disease mechanism studies.
Multimodal molecular representation learning, which jointly models molecular graphs and their textual descriptions, enhances predictive accuracy and interpretability by enabling more robust and reliable predictions of drug toxicity, bioactivity, and physicochemical properties through the integration of structural and semantic information. However, existing multimodal methods suffer from two key limitations: (1) they typically perform cross-modal interaction only at the final encoder layer, thus overlooking hierarchical semantic dependencies; (2) they lack a unified prototype space for robust alignment between modalities. To address these limitations, we propose ProtoMol, a prototype-guided multimodal framework that enables fine-grained integration and consistent semantic alignment between molecular graphs and textual descriptions. ProtoMol incorporates dual-branch hierarchical encoders, utilizing Graph Neural Networks to process structured molecular graphs and Transformers to encode unstructured texts, resulting in comprehensive layer-wise representations. Then, ProtoMol introduces a layer-wise bidirectional cross-modal attention mechanism that progressively aligns semantic features across layers. Furthermore, a shared prototype space with learnable, class-specific anchors is constructed to guide both modalities toward coherent and discriminative representations. Extensive experiments on multiple benchmark datasets demonstrate that ProtoMol consistently outperforms state-of-the-art baselines across a variety of molecular property prediction tasks.
Extracellular matrix (ECM) remodeling is central to a wide variety of healthy and diseased tissue processes. Unfortunately, predicting ECM remodeling under various chemical and mechanical conditions has proven to be excessively challenging, due in part to its complex regulation by intracellular and extracellular molecular reaction networks that are spatially and temporally dynamic. We introduce ECMSim, a highly interactive, real-time web application designed to simulate heterogeneous matrix remodeling. The current model simulates cardiac scar tissue with configurable input conditions using a large-scale model of the cardiac fibroblast signaling network. Cardiac fibrosis is a major component of many forms of heart failure. ECMSim simulates over 1.3 million equations simultaneously in real time, covering more than 125 species and more than 200 edges in each cell of a 100 × 100 spatial array (10,000 cells) and accounting for inputs, receptors, intracellular signaling cascades, ECM production, and feedback loops, as well as molecular diffusion. The algorithm is represented by a set of ordinary differential equations (ODEs) that are coupled with ECM molecular diffusion. The equations are solved on demand using compiled C++ and the WebAssembly standard. The platform includes brush-style cell selection to target a subset of cells with adjustable input molecule concentrations, parameter sliders to adjust parameters on demand, and multiple coupled real-time visualizations of network dynamics at multiple scales. Implementing ECMSim in standard web technologies enables a fully functional application that combines real-time simulation, visual interaction, and model editing. The software enables the investigation of pathological or experimental conditions, hypothetical scenarios, and matrix remodeling, as well as testing the effects of experimental drugs on a target receptor.
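As a rough illustration of what "ODEs coupled with molecular diffusion" on a 100 × 100 cell array can look like, the following Python sketch advances a single diffusing species with one explicit Euler step per call. It is not ECMSim's solver (which the abstract says is compiled C++ running via WebAssembly): the single-species production/decay kinetics, the periodic boundaries, the five-point Laplacian, and all parameter values are assumptions made for the example.

```python
import numpy as np


def step(c, production, decay, diffusion, dt):
    """One explicit-Euler step of dc/dt = production - decay*c + diffusion*Laplacian(c).

    Assumptions (not from the source): unit grid spacing, periodic boundaries,
    and a five-point finite-difference Laplacian.
    """
    laplacian = (
        np.roll(c, 1, axis=0) + np.roll(c, -1, axis=0)
        + np.roll(c, 1, axis=1) + np.roll(c, -1, axis=1)
        - 4.0 * c
    )
    return c + dt * (production - decay * c + diffusion * laplacian)


# 100 x 100 array of cells, matching the grid size quoted in the abstract.
grid = np.zeros((100, 100))
grid[50, 50] = 1.0  # hypothetical point source of the diffusing species

# Pure diffusion (no production or decay) for a few steps.
# dt * diffusion = 0.1 stays below the 0.25 stability limit of this scheme.
for _ in range(10):
    grid = step(grid, production=0.0, decay=0.0, diffusion=0.1, dt=1.0)
```

With production and decay set to zero and periodic boundaries, the total amount of the species is conserved, which gives a quick sanity check on the stencil. A full model would carry one such field per diffusing species and a per-cell ODE system for the intracellular network.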
Dynamical systems in the life sciences are often composed of complex mixtures of overlapping behavioral regimes. Cellular subpopulations may shift from cycling to equilibrium dynamics or branch towards different developmental fates. The transitions between these regimes can appear noisy and irregular, posing a serious challenge to traditional, flow-based modeling techniques which assume locally smooth dynamics. To address this challenge, we propose MODE (Mixture Of Dynamical Experts), a graphical modeling framework whose neural gating mechanism decomposes complex dynamics into sparse, interpretable components, enabling both the unsupervised discovery of behavioral regimes and accurate long-term forecasting across regime transitions. Crucially, because agents in our framework can jump to different governing laws, MODE is especially tailored to the aforementioned noisy transitions. We evaluate our method on a battery of synthetic and real datasets from computational biology. First, we systematically benchmark MODE on an unsupervised classification task using synthetic dynamical snapshot data, including in noisy, few-sample settings. Next, we show how MODE succeeds on challenging forecasting tasks which simulate key cycling and branching processes in cell biology. Finally, we deploy our method on human, single-cell RNA sequencing data and show that it can not only distinguish proliferation from differentiation dynamics but also predict when cells will commit to their ultimate fate, a key outstanding challenge in computational biology.
DNA strand displacement (SD) reactions are central to the operation of many synthetic nucleic acid systems, including molecular circuits, sensors, and machines. Over the years, a broad set of design frameworks has emerged to accommodate various functional goals, initial configurations, and environmental conditions. Nevertheless, key challenges persist, particularly in reliably predicting reaction kinetics. This review examines recent approaches to SD reaction design, with emphasis on the properties of single reactions, including kinetics, structural factors, and limitations in current modelling practices. We identify promising innovations while analysing the factors that continue to hinder predictive accuracy. We conclude by outlining future directions for achieving more robust and programmable behaviour in DNA-based systems.
State transitions are fundamental in biological systems but challenging to observe directly. Here, we present the first single-cell observation of state transitions in a synthetic bacterial genetic circuit. Using a mother machine, we tracked over 1007 cells for 27 hours. First-passage analysis and dynamical reconstruction reveal that transitions occur outside the small-noise regime, challenging the applicability of classical Kramers' theory. The process lacks a single characteristic rate, questioning the paradigm of transitions between discrete cell states. We observe significant multiplicative noise that distorts the effective potential landscape yet increases transition times. These findings necessitate theoretical frameworks for biological state transitions beyond the small-noise assumption.
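For context on the small-noise regime mentioned above, the classical overdamped Kramers escape rate can be written as follows. This is the standard textbook form, not a formula taken from the paper; the noise convention $\mathrm{d}x = -U'(x)\,\mathrm{d}t + \sqrt{2D}\,\mathrm{d}W_t$ with unit friction, and the symbols $U$, $D$, $x_{\min}$, $x_{\max}$, are assumptions made for this sketch.

```latex
% Overdamped Kramers rate for escape from the well at x_min over the barrier at x_max,
% assuming dx = -U'(x) dt + sqrt(2D) dW_t (unit friction):
k \;\approx\; \frac{\sqrt{U''(x_{\min})\,\lvert U''(x_{\max})\rvert}}{2\pi}\,
\exp\!\left(-\frac{\Delta U}{D}\right),
\qquad \Delta U = U(x_{\max}) - U(x_{\min}),
\qquad \text{valid for } \Delta U \gg D .
```

The approximation assumes the barrier height is large compared with the noise strength, and it predicts a single characteristic rate with exponentially distributed waiting times. Transitions outside that regime, as reported here, fall outside its range of validity.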
Wuchereria bancrofti, the parasitic roundworm responsible for lymphatic filariasis, permanently disables over 36 million people and places 657 million at risk across 39 countries. A major bottleneck for drug discovery is the lack of functional annotation for more than 90 percent of the W. bancrofti dark proteome, leaving many potential targets unidentified. In this work, we present a novel computational pipeline that converts W. bancrofti's unannotated amino acid sequence data into precise four-level Enzyme Commission (EC) numbers and drug candidates. We utilized a DEtection TRansformer to estimate the probability of enzymatic function, fine-tuned a hierarchical nearest neighbor EC predictor on 4,476 labeled parasite proteins, and applied rejection sampling to retain only four-level EC classifications at 100 percent confidence. This pipeline assigned precise EC numbers to 14,772 previously uncharacterized proteins and discovered 543 EC classes not previously known in W. bancrofti. A qualitative triage emphasizing parasite-specific targets, chemical tractability, biochemical importance, and biological plausibility prioritized six enzymes across five separate strategies: anti-Wolbachia cell-wall inhibition, proteolysis blockade, transmission disruption, purinergic immune interference, and cGMP-signaling destabilization. We curated a 43-compound library from ChEMBL and BindingDB and co-folded across multiple protein conformers with Boltz-2. All six targets exhibited at least moderately strong predicted binding affinities below 1 micromolar, with moenomycin analogs against peptidoglycan glycosyltransferase and NTPase inhibitors showing promising nanomolar hits and well-defined binding pockets. While experimental validation remains essential, our results provide the first large-scale functional map of the W. bancrofti dark proteome and accelerate early-stage drug development for the species.
We study a stochastic model of a copolymerization process that has been extensively investigated in the physics literature. The main questions of interest include: (i) what are the criteria for transience, null recurrence, and positive recurrence in terms of the system parameters; (ii) in the transient regime, what are the limiting fractions of the different monomer types; and (iii) in the transient regime, what is the speed of growth of the polymer? Previous studies in the physics literature have addressed these questions using heuristic methods. Here, we utilize rigorous mathematical arguments to derive the results from the physics literature. Moreover, the techniques developed allow us to generalize to the copolymerization process with finitely many monomer types. We expect that the mathematical methods used and developed in this work will also enable the study of even more complex models in the future.
Colorectal cancer (CRC) is highly heterogeneous, with five-year survival rates dropping from $\sim$90% in localized disease to $\sim$15% with distant metastases. Disease progression is shaped not only by tumor-intrinsic alterations but also by the reorganization of the tumor microenvironment (TME). Metabolic, compositional, and spatial changes contribute to this progression, but considered individually they lack context and often fail as therapeutic targets. Understanding their coordination could reveal processes to alter the disease course. Here, we combined multiplexed ion beam imaging (MIBI) with machine learning to profile metabolic, functional and spatial states of 522 colorectal lesions with single-cell resolution. We observed recurrent stage-specific remodeling marked by a lymphoid-to-myeloid shift, stromal-cancer cooperation, and malignant metabolic shifts. Spatial organization of epithelial, stromal, and immune compartments provided stronger stratification of disease stage than tumor-intrinsic changes or bulk immune infiltration alone. To systematically model these coordinated changes, we condensed multimodal features into 10 latent factors of TME organization. These factors tracked disease progression, were conserved across cohorts, and revealed frequent multicellular metabolic niches and distinct, non-exclusive TME trajectories. Our framework MuVIcell exposes the elements that together drive CRC progression by grouping co-occurring changes across cell types and feature classes into coordinated multicellular programs. This creates a rational basis to therapeutically target TME reorganization. Importantly, the framework is scalable and flexible, offering a resource for studying multicellular organization in other solid tumors.
Due to the ever-rising global incidence rate of inflammatory bowel disease (IBD) and the lack of effective clinical treatment drugs, elucidating the detailed pathogenesis, seeking novel targets, and developing promising drugs are the top priorities for IBD treatment. Here, we demonstrate that the levels of microRNA (miR)-103a were significantly downregulated in the inflamed mucosa of ulcerative colitis (UC) patients, along with elevated inflammatory cytokines (IL-1beta/TNF-alpha) and reduced tight junction protein (Occludin/ZO-1) levels, as compared with healthy control subjects. Consistently, miR-103a-deficient Caco-2 intestinal epithelial cells showed serious inflammatory responses and increased permeability, and DSS induced more severe colitis in miR-103a-/- mice than in wild-type mice. Mechanistic studies revealed that c-FOS suppresses miR-103a transcription by binding to its promoter, and that the resulting loss of miR-103a-mediated targeting of TAB2 and TAK1 permits NF-kappaB activation, which contributes to inflammatory responses and barrier disruption. Notably, the traditional Chinese medicine Cornus officinalis (CO) and its core active ingredient loganin potently mitigated inflammation and barrier disruption in UC by specifically blocking the EGFR/RAS/ERK/c-FOS signaling axis. These effects were mainly attributable to modulation of miR-103a levels, as their therapeutic activity was almost completely abolished in miR-103a KO mice. Taken together, this work reveals that loganin relieves EGFR/c-FOS axis-suppressed epithelial miR-103a expression, thereby inhibiting NF-kappaB pathway activation, suppressing inflammatory responses, and preserving tight junction integrity in UC. Thus, our data provide new mechanistic insights and promising targets for UC treatment.
Dynamical systems with polynomial right-hand sides are very important in various applications, e.g., in biochemistry and population dynamics. The mathematical study of these dynamical systems is challenging due to the possibility of multistability, oscillations, and chaotic dynamics. One important tool for this study is the concept of reaction systems, which are dynamical systems generated by reaction networks for some choices of parameter values. Among these, disguised toric systems are remarkably stable: they have a unique attracting fixed point, and cannot give rise to oscillations or chaotic dynamics. The computation of the set of parameter values for which a network gives rise to disguised toric systems (i.e., the disguised toric locus of the network) is an important but difficult task. We introduce new ideas based on network fluxes for studying the disguised toric locus. We prove that the disguised toric locus of any network $G$ is a contractible manifold with boundary, and introduce an associated graph $G^{\max}$ that characterizes its interior. These theoretical tools allow us, for the first time, to compute the full disguised toric locus for many networks of interest.
The transcriptional response to genetic perturbation reveals fundamental insights into complex cellular systems. While current approaches have made progress in predicting genetic perturbation responses, they provide limited biological understanding and cannot systematically refine existing knowledge. Overcoming these limitations requires an end-to-end integration of data-driven learning and existing knowledge. However, this integration is challenging due to inconsistencies between data and knowledge bases, such as noise, misannotation, and incompleteness. To address this challenge, we propose ALIGNED (Adaptive aLignment for Inconsistent Genetic kNowledgE and Data), a neuro-symbolic framework based on the Abductive Learning (ABL) paradigm. This end-to-end framework aligns neural and symbolic components and performs systematic knowledge refinement. We introduce a balanced consistency metric to evaluate the predictions' consistency against both data and knowledge. Our results show that ALIGNED outperforms state-of-the-art methods by achieving the highest balanced consistency, while also re-discovering biologically meaningful knowledge. Our work advances beyond existing methods to enable both the transparency and the evolution of mechanistic biological understanding.
Single-cell and single-nucleus RNA sequencing (scRNA-seq/snRNA-seq) are widely used to reveal heterogeneity in cells, showing a growing potential for precision and personalized medicine. Nonetheless, sustainable drug discovery must be based on a population-level understanding of molecular mechanisms, which calls for the population-scale analysis of scRNA-seq/snRNA-seq data. This work introduces a sequential target-drug selection model for drug repurposing against Alzheimer's Disease (AD) targets inferred from population-level snRNA-seq studies of AD progression in microglia cells as well as different cell types taken from an AD-affected brain vascular tissue atlas, involving hundreds of thousands of nuclei from multi-patient and multi-regional studies. We utilize Persistent Sheaf Laplacians (PSL) to facilitate a Protein-Protein Interaction (PPI) analysis of AD targets inferred from differential gene expression (DEG), and then use machine learning models to predict repurposable DrugBank compounds for molecular targeting. We screen the efficacy of different DrugBank small compounds and further examine their central nervous system (CNS)-relevant ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) properties, resulting in a list of lead candidates for AD treatment. The list of significant genes establishes a target domain for effective machine-learning-based AD drug repurposing analysis of DrugBank small compounds to treat AD-related molecular targets.
Control of transcription presides over a vast array of biological processes including through gene regulatory circuits that exhibit multistability. Two- and three-gene network motifs are often found to be critical parts of the repertoire of metabolic and developmental pathways. Theoretical models of these circuits, however, typically vary parameters such as dissociation constants, transcription rates, and degradation rates without specifying precisely how these parameters are controlled biologically. In this paper, we examine the role of effector molecules, which can alter the concentrations of the active transcription factors that control regulation, and are ubiquitous to regulatory processes across biological settings. We specifically consider allosteric regulation in the context of extending the standard bistable switch to three-gene networks, and explore the rich multistable dynamics exhibited in these architectures as a function of effector concentrations. We then study how the conditions required for tristability and more complex dynamics, and the bifurcations in dynamic phase space upon tuning effector concentrations, evolve under various interpretations of regulatory circuit mechanics, the underlying activity of inducers, and perturbations thereof. Notably, the biological mechanism by which we model effector control over dual-function proteins transforms not only the phenotypic trend of dynamic tuning but also the set of available dynamic regimes. In this way, we determine key parameters and regulatory features that drive phenotypic decisions, and offer an experimentally tunable structure for encoding inducible multistable behavior arising from both single and dual-function allosteric transcription factors.