Ferrohydrodynamic microfluidics relies on magnetic field gradients to manipulate diamagnetic particles in ferrofluid-filled microenvironments. It has emerged as a promising tool for the label-free manipulation of bioparticles, including their separation and phenotyping. This perspective reviews recent progress in the development and applications of ferrofluid-based microfluidic platforms for multiscale bioparticle separation, ranging from micron-scale cells to submicron extracellular vesicles. We highlight the fundamental physical principles of ferrohydrodynamic manipulation, including the dominant magnetic buoyancy force that arises from the interaction between ferrofluids and particles. We then describe how these principles enable high-resolution size-based bioparticle separation, subcellular bioparticle enrichment, and phenotypic screening based on physical traits. We also discuss key challenges in ferrohydrodynamic microfluidics in terms of ferrofluid biocompatibility, system throughput, and nanoparticle depletion. Finally, we outline future research directions involving machine learning, 3D printing, and multiplexed detection. These insights chart a path for advancing ferrofluid-based technologies in precision biomedicine, diagnostics, and cellular engineering.
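For a nonmagnetic particle in a ferrofluid, the magnetic buoyancy force is often written as F = -μ0·V_p·(M_f·∇)H. The Python sketch below illustrates why this enables size-based separation, under simplifying assumptions: a spherical particle, negligible particle magnetization, a 1D field gradient with M_f parallel to H, and Stokes drag. The force scales with particle volume (d³) while drag scales with d, so the migration speed scales with d², and doubling the diameter quadruples the speed. Function names and example values are hypothetical and are not taken from the reviewed platforms.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability (T·m/A)


def magnetic_buoyancy_force(diameter_m, m_fluid, grad_h):
    """Force (N) on a nonmagnetic sphere in a ferrofluid, 1D simplified form.

    Assumes F = -mu0 * V_p * M_f * dH/dx: the particle's own magnetization is
    neglected and the ferrofluid magnetization m_fluid (A/m) is parallel to H.
    grad_h is the field gradient dH/dx (A/m^2). The negative sign means the
    particle is pushed toward lower field strength.
    """
    volume = math.pi * diameter_m ** 3 / 6.0
    return -MU0 * volume * m_fluid * grad_h


def migration_velocity(diameter_m, m_fluid, grad_h, viscosity=1e-3):
    """Steady velocity (m/s) from balancing the force against Stokes drag 3*pi*eta*d*v."""
    force = magnetic_buoyancy_force(diameter_m, m_fluid, grad_h)
    return force / (3.0 * math.pi * viscosity * diameter_m)


# Hypothetical example: 5 um vs 10 um particles in the same ferrofluid and gradient.
v_small = migration_velocity(5e-6, m_fluid=1.0e3, grad_h=1.0e6)
v_large = migration_velocity(10e-6, m_fluid=1.0e3, grad_h=1.0e6)
print(f"{v_small:.3e} m/s, {v_large:.3e} m/s, ratio {v_large / v_small:.2f}")
```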
Background: The rapid evolution of personalized neoantigen vaccines has been accelerated by artificial intelligence (AI)-based prediction models. Yet a consistent framework for evaluating the translational fidelity between computational predictions and clinical outcomes is still lacking. Methods: This systematic synthesis analyzed six melanoma vaccine trials conducted between 2017 and 2025 across mRNA, peptide, and dendritic cell platforms. We introduced the Algorithm-to-Outcome Concordance (AOC) metric, a quantitative measure linking model performance (AUC) with clinical efficacy (HR/ORR), and integrated mechanistic, economic, and regulatory perspectives. Results: Simulated AOC values ranged from 0.42 to 0.79 across studies, suggesting heterogeneous concordance between algorithmic predictions and observed outcomes. High tumor mutational burden and clonal neoantigen dominance correlated with improved translational fidelity. Economic modeling suggested that achieving AOC > 0.7 could reduce the incremental cost-effectiveness ratio (ICER) below $100,000/QALY. Conclusions: This framework quantitatively links AI-driven neoantigen prediction with clinical translation, offering a reproducible metric for future personalized vaccine validation and regulatory standardization. AOC is presented as a hypothesis-generating tool; all computations are based on simulated or aggregated trial data and serve demonstration purposes only.
In control problems and basic scientific modeling, it is important to compare observations with dynamical simulations. For example, comparing two neural systems can shed light on the nature of emergent computations in the brain and in deep neural networks. Recently, Ostrow et al. (2023) introduced Dynamical Similarity Analysis (DSA), a method that measures the similarity of two systems based on their recurrent dynamics rather than their geometry or topology. However, DSA does not account for how inputs affect the dynamics, so two systems with similar intrinsic dynamics may be classified as different if they are driven differently. Because real-world dynamical systems are rarely autonomous, it is important to account for the effects of input drive. To this end, we introduce InputDSA (iDSA), a novel metric for comparing both intrinsic (recurrent) and input-driven dynamics. InputDSA extends the DSA framework by estimating and comparing both input and intrinsic dynamic operators using a variant of Dynamic Mode Decomposition with control (DMDc) based on subspace identification. We demonstrate that InputDSA can successfully compare partially observed, input-driven systems from noisy data. We show that when the true inputs are unknown, surrogate inputs can be substituted without major deterioration in similarity estimates. We apply InputDSA to recurrent neural networks (RNNs) trained with deep reinforcement learning and find that high-performing networks are dynamically similar to one another, whereas low-performing networks are more diverse. Lastly, we apply InputDSA to neural data recorded from rats performing a cognitive task and show that it identifies a transition from input-driven evidence accumulation to intrinsically driven decision-making. Our work demonstrates that InputDSA is a robust and efficient method for comparing intrinsic dynamics and the effect of external input on dynamical systems.
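DMDc, which InputDSA builds on, fits a linear state operator A and input operator B from snapshot data so that x_{t+1} ≈ A·x_t + B·u_t. The sketch below shows plain least-squares DMDc with NumPy. It is a swapped-in simplification: it does not implement the subspace-identification variant described above, nor the delay embedding or the similarity comparison that InputDSA performs afterwards. The toy system is hypothetical. With noise-free data and sufficiently rich inputs, it recovers the true operators.

```python
import numpy as np


def dmdc(X, U):
    """Least-squares DMD with control: fit x_{t+1} ≈ A x_t + B u_t.

    X: states, shape (n, T); U: inputs, shape (m, T - 1), where U[:, t]
    drives the transition from X[:, t] to X[:, t + 1].
    Returns (A, B) with shapes (n, n) and (n, m).
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    n = X.shape[0]
    X_now, X_next = X[:, :-1], X[:, 1:]
    omega = np.vstack([X_now, U])  # stacked state/input snapshots
    G = X_next @ np.linalg.pinv(omega)  # [A B] = X' * pinv([X; U])
    return G[:, :n], G[:, n:]


# Hypothetical 2-state, 1-input system driven by random inputs.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[1.0], [0.5]])
T = 50
U = rng.standard_normal((1, T - 1))
X = np.zeros((2, T))
X[:, 0] = [1.0, 0.0]
for t in range(T - 1):
    X[:, t + 1] = A_true @ X[:, t] + B_true @ U[:, t]

A_hat, B_hat = dmdc(X, U)
print(np.round(A_hat, 3), np.round(B_hat, 3))
```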
Collective behaviors in cellular systems are regulated not only by biochemical signalling pathways but also by intercellular mechanical forces, which remain poorly quantified in contractile monolayers. Here, by integrating traction force microscopy with numerical simulations, we reconstruct the stress distribution in C2C12 myoblast monolayers to reveal how local mechanical forces determine collective cellular structures. We find that contractile monolayers exhibit a positive maximum and a negative minimum principal stress, reflecting the intrinsic anisotropy of active tension. Distinct stress patterns emerge around topological defects, coinciding with singularities in cell alignment, density, and morphology, which indicates a strong coupling between mechanical forces and structural organization. Moreover, tensile stresses are preferentially transmitted along the cell elongation axis and compressive stresses transversely, demonstrating that local stress guides cell arrangement. This mechanical guidance appears to be universal among contractile systems, as it is also observed in bone marrow-derived mesenchymal stem cells. Together, our work establishes a quantitative framework for characterizing mechanical anisotropy in active cellular monolayers and reveals a general principle of force-structure coupling, providing a physical basis for understanding how mechanics governs myogenesis, morphogenesis, and collective organization in contractile cellular systems.
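The maximum and minimum principal stresses are the eigenvalues of the local in-plane stress tensor, and their directions can be compared with cell orientation. The NumPy sketch below computes them for 2D monolayer stress fields. It assumes that the components σxx, σyy, and σxy have already been reconstructed (for example, from traction force microscopy), and the function name is hypothetical. Comparing θ with the local cell elongation axis is one way to test whether tension is transmitted along that axis.

```python
import numpy as np


def principal_stresses(sxx, syy, sxy):
    """Principal stresses of a 2D (plane) stress tensor [[sxx, sxy], [sxy, syy]].

    Works elementwise on scalars or arrays (e.g., stress fields on a grid).
    Returns (sigma_max, sigma_min, theta), where theta (radians) is the angle of
    the sigma_max direction measured from the x-axis.
    """
    sxx, syy, sxy = (np.asarray(a, dtype=float) for a in (sxx, syy, sxy))
    mean = 0.5 * (sxx + syy)
    radius = np.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)  # Mohr's circle radius
    theta = 0.5 * np.arctan2(2.0 * sxy, sxx - syy)
    return mean + radius, mean - radius, theta


# Hypothetical pure-shear state: one tensile and one compressive principal stress.
s_max, s_min, theta = principal_stresses(0.0, 0.0, 1.0)
print(s_max, s_min, np.degrees(theta))
```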
Traditional non-biological storage media, such as hard drives, are increasingly strained by the rapid growth of data in the big data era because of their limited storage density and lifespan. Mirror-image peptides composed of D-amino acids have emerged as a promising biological storage medium owing to their high storage density, structural stability, and long lifespan. Sequencing mirror-image peptides relies on de novo sequencing technology. However, its accuracy is limited by the scarcity of tandem mass spectrometry datasets and by the difficulties current algorithms encounter when processing these peptides directly. This study is the first to propose improving sequencing accuracy indirectly, by optimizing the design of mirror-image peptide sequences. In this work, we introduce DBond, a deep neural network model that integrates sequence features, precursor ion properties, and mass spectrometry environmental factors to predict mirror-image peptide bond cleavage. In this process, sequences with a high peptide bond cleavage ratio, which are easier to sequence, are selected. The main contributions of this study are as follows. First, we constructed MiPD513, a tandem mass spectrometry dataset containing 513 mirror-image peptides. Second, we developed the peptide bond cleavage labeling algorithm (PBCLA), which generated approximately 12.5 million labeled data points from MiPD513. Third, we proposed a dual prediction strategy that combines multi-label and single-label classification. On an independent test set, the single-label classification strategy outperformed other methods in both single and multiple peptide bond cleavage prediction tasks, offering a strong foundation for sequence optimization.
Advances in single-cell sequencing have enabled high-resolution profiling of diverse molecular modalities, but integrating unpaired single-cell multi-omics data remains challenging. Existing approaches either rely on pairing information or prior correspondences, or require computing a global pairwise coupling matrix, which limits their scalability and flexibility. In this paper, we introduce a scalable and flexible generative framework called single-cell Multi-omics Regularized Disentangled Representations (scMRDR) for unpaired multi-omics integration. Specifically, we disentangle each cell's latent representation into modality-shared and modality-specific components using a β-VAE architecture, augmented with an isometric regularization to preserve intra-omics biological heterogeneity, an adversarial objective to encourage cross-modal alignment, and a masked reconstruction loss to handle features that are missing across modalities. Our method achieves excellent performance on benchmark datasets in terms of batch correction, modality alignment, and biological signal preservation. Crucially, it scales effectively to large datasets and supports the integration of more than two omics, offering a powerful and flexible solution for large-scale multi-omics data integration and downstream biological discovery.
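At its core, a β-VAE objective trades reconstruction accuracy against a β-weighted KL penalty. Masking additionally lets each cell's reconstruction loss ignore features that its modality does not measure. The NumPy sketch below computes such a loss. It assumes a Gaussian decoder (so reconstruction error is MSE), a diagonal Gaussian posterior, and β = 4 as an arbitrary default. It omits scMRDR's isometric and adversarial terms as well as the shared/specific latent split, and it is not the authors' implementation.

```python
import numpy as np


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(mu ** 2 + np.exp(logvar) - logvar - 1.0, axis=-1)


def masked_mse(x, x_hat, mask):
    """Mean squared error over observed features only (mask == 1)."""
    sq = (x - x_hat) ** 2 * mask
    return sq.sum(axis=-1) / np.maximum(mask.sum(axis=-1), 1.0)


def beta_vae_loss(x, x_hat, mask, mu, logvar, beta=4.0):
    """Batch-averaged masked reconstruction error plus beta-weighted KL."""
    return float(np.mean(masked_mse(x, x_hat, mask) + beta * gaussian_kl(mu, logvar)))


# Hypothetical cell: the second feature is unmeasured, so its error is ignored.
x = np.array([[1.0, 2.0]])
x_hat = np.array([[1.0, 0.0]])
mask = np.array([[1.0, 0.0]])
print(beta_vae_loss(x, x_hat, mask, np.zeros((1, 3)), np.zeros((1, 3))))
```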
Mapping habitat quality, based on factors such as host availability and environmental suitability, is a common approach to determining which locations are important for the spread of a species. Mapping habitat connectivity takes geographic analysis a step further by evaluating the potential roles of locations in biological invasions, pandemics, or species conservation. Locations with high habitat quality may play only a minor role in species spread if they are geographically isolated. Conversely, a location with lower habitat quality may play a major role in a species' spread if it acts as a bridge between regions that would otherwise be physically fragmented. Here we introduce the geohabnet R package, which evaluates the potential importance of locations for the spread of species through habitat landscapes. geohabnet incorporates key factors such as dispersal probabilities and habitat availability in a network framework to better characterize habitat connectivity for host-dependent species such as pathogens, arthropod pests, or pollinators. geohabnet uses publicly available or user-provided datasets, six network centrality metrics, and a user-selected geographic scale. We provide examples using geohabnet for surveillance prioritization of emerging plant pests in Africa and the Americas. These examples illustrate how users can apply geohabnet to their species of interest and generate maps of the estimated importance of geographic locations for species spread. geohabnet provides a quick, open-source, and reproducible baseline for quantifying a species' habitat connectivity across a wide range of geographic scales and for evaluating potential scenarios for the expansion of a species through habitat landscapes. geohabnet supports biosecurity programs, invasion science, and conservation biology in prioritizing management efforts for transboundary pathogens, pests, or endangered species.
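The network idea can be sketched in a few lines: locations become nodes, and edge weights combine habitat availability with a dispersal kernel. geohabnet itself is an R package, and its dispersal kernels and six centrality metrics are not detailed here. This Python sketch therefore assumes an inverse power-law kernel weighted by habitat availability at both ends of each edge, and uses node strength, a single simple centrality measure, for illustration. All names and parameter values are hypothetical.

```python
import numpy as np


def habitat_network(coords, habitat, beta=1.0, min_weight=1e-3):
    """Weighted adjacency matrix for a habitat connectivity network.

    Edge weight w_ij = h_i * h_j * d_ij**(-beta): an inverse power-law
    dispersal kernel scaled by habitat availability at both locations.
    Weights below min_weight are pruned. Assumes distinct coordinates, beta > 0.
    """
    coords = np.asarray(coords, dtype=float)
    habitat = np.asarray(habitat, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # inf ** (-beta) == 0, so no self-loops
    weights = np.outer(habitat, habitat) * dist ** (-beta)
    weights[weights < min_weight] = 0.0
    return weights


def node_strength(weights):
    """Node strength: sum of edge weights incident to each location."""
    return weights.sum(axis=1)


# Hypothetical example: three cells on a line with equal habitat availability.
w = habitat_network([(0, 0), (1, 0), (2, 0)], [1.0, 1.0, 1.0])
print(node_strength(w))  # the middle cell, which links the two ends, scores highest
```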
Diffusion MRI has revealed important insights into white matter microstructure, but its application to gray matter remains comparatively unexplored. Here, we investigate whether global patterns of gray-matter microstructure can be captured through neurite orientation dispersion and density imaging (NODDI) and whether such patterns are predictive of cognitive performance. Using diffusion MRI and behavioral data from the Human Connectome Project Young Adult study, we derived region-averaged NODDI parameters and applied principal component analysis (PCA) to construct general gray-matter microstructure factors. The factor derived from the isotropic volume fraction explained substantial inter-individual variability and was significantly correlated with specific cognitive scores from the NIH Toolbox; in particular, it was linked to reading and vocabulary performance and to fluid cognition. These findings demonstrate that PCA-based global indicators of gray-matter microstructure provide complementary markers of structure-function relationships that extend beyond region-specific analyses. Our results suggest that general microstructure factors may serve as robust, interpretable biomarkers for studying cognition and cortical organization at the population level.
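Constructing a general factor by PCA amounts to taking the first principal component of a subjects × regions matrix of one NODDI parameter and correlating its scores with behavior. The NumPy sketch below does this on simulated data. The z-scoring, the sign convention, and all variable names are assumptions rather than the study's pipeline, and the study's covariate handling and significance testing are not shown.

```python
import numpy as np


def general_factor(X):
    """First principal component of a subjects x regions matrix.

    Columns are z-scored, the first PC is taken from the SVD, and its sign is
    fixed so that loadings sum to a positive value.
    Returns (scores, loadings, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[0]
    if loadings.sum() < 0:
        loadings = -loadings
    scores = Z @ loadings
    explained = s[0] ** 2 / np.sum(s ** 2)
    return scores, loadings, float(explained)


# Hypothetical data: 200 subjects x 10 regions sharing one latent factor.
rng = np.random.default_rng(1)
latent = rng.standard_normal(200)
X = latent[:, None] + 0.3 * rng.standard_normal((200, 10))
scores, loadings, ev = general_factor(X)
cognition = latent + rng.standard_normal(200)  # hypothetical cognitive score
r = np.corrcoef(scores, cognition)[0, 1]
print(f"explained variance: {ev:.2f}, r with cognition: {r:.2f}")
```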
Accurately predicting the three-dimensional structures of protein-ligand complexes remains a fundamental challenge in computational drug discovery, one that limits the pace and success of therapeutic design. Deep learning methods have recently shown strong potential as structure prediction tools, achieving promising accuracy across diverse biomolecular systems. However, their performance and utility are constrained by scarce experimental data, inefficient architectures, physically invalid poses, and a limited ability to exploit auxiliary information available at inference. To address these issues, we introduce Pearl (Placing Every Atom in the Right Location), a foundation model for protein-ligand cofolding at scale. Pearl addresses these challenges with three key innovations: (1) training recipes that include large-scale synthetic data to overcome data scarcity; (2) architectures that incorporate an SO(3)-equivariant diffusion module to inherently respect 3D rotational symmetries, improving generalization and sample efficiency; and (3) controllable inference, including a generalized multi-chain templating system that supports both protein and non-polymeric components, as well as dual unconditional/conditional modes. Pearl establishes new state-of-the-art performance in protein-ligand cofolding. On the key metric of generating accurate (RMSD < 2 Å) and physically valid poses, Pearl surpasses AlphaFold 3 and other open-source baselines on the public Runs N' Poses and PoseBusters benchmarks, delivering improvements of 14.5% and 14.2%, respectively, over the next-best model. In the pocket-conditional cofolding regime, Pearl delivers a 3.6× improvement on a proprietary set of challenging, real-world drug targets at the more rigorous RMSD < 1 Å threshold. Finally, we demonstrate that model performance correlates directly with the size of the synthetic dataset used in training.
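The headline metric, the fraction of poses that are within RMSD < 2 Å and also physically valid, is straightforward to compute once predicted and reference poses share a common frame. The NumPy sketch below computes it. The validity flags are assumed to come from an external checker such as PoseBusters, symmetry-equivalent atom mappings are ignored, and the function names are hypothetical.

```python
import numpy as np


def ligand_rmsd(pred, ref):
    """RMSD (Å) between predicted and reference ligand coordinates.

    Assumes both arrays are (n_atoms, 3), in the same atom order, and already in
    a common frame (e.g., after superposing the protein or pocket). Symmetry-
    equivalent atom mappings are not considered.
    """
    pred, ref = np.asarray(pred, dtype=float), np.asarray(ref, dtype=float)
    return float(np.sqrt(np.mean(np.sum((pred - ref) ** 2, axis=1))))


def success_rate(rmsds, threshold=2.0, valid=None):
    """Fraction of poses with RMSD below threshold (and physically valid, if flags are given)."""
    ok = np.asarray(rmsds, dtype=float) < threshold
    if valid is not None:
        ok &= np.asarray(valid, dtype=bool)
    return float(ok.mean())


# Hypothetical poses: a rigid 1 Å shift of a 4-atom ligand, then three scored poses.
ref = np.zeros((4, 3))
print(ligand_rmsd(ref + np.array([1.0, 0.0, 0.0]), ref))
print(success_rate([0.5, 1.5, 2.5], threshold=2.0, valid=[True, False, True]))
```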
Artificial intelligence in medicine is built to serve the average patient. By minimizing error across large datasets, most systems deliver strong aggregate accuracy yet falter at the margins: patients with rare variants, multimorbidity, or underrepresented demographics. This average patient fallacy erodes both equity and trust. We propose a different design: a multi-agent ecosystem for N-of-1 decision support. In this environment, agents clustered by organ systems, patient populations, and analytic modalities draw on a shared library of models and evidence synthesis tools. Their results converge in a coordination layer that weighs reliability, uncertainty, and data density before presenting the clinician with a decision-support packet: risk estimates bounded by confidence ranges, outlier flags, and linked evidence. Validation shifts from population averages to individual reliability, measured by error in low-density regions, calibration in the small, and risk-coverage trade-offs. Anticipated challenges include computational demands, automation bias, and regulatory fit, addressed through caching strategies, consensus checks, and adaptive trial frameworks. By moving from monolithic models to orchestrated intelligence, this approach seeks to align medical AI with the first principle of medicine: care that is transparent, equitable, and centered on the individual.
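A risk-coverage curve, one of the individual-reliability measures named above, orders cases by model confidence and reports the error rate among the cases the system is allowed to answer at each coverage level. The NumPy sketch below assumes per-case confidence scores and binary error indicators, and the function names are hypothetical. The area under the curve (AURC), approximated here as the mean risk over all cut-offs, summarizes the curve in one number; lower is better.

```python
import numpy as np


def risk_coverage(confidence, errors):
    """Coverage and selective risk when accepting only the most confident cases.

    confidence: higher means more confident; errors: 1 if the prediction was wrong.
    Returns (coverage, risk) arrays, one entry per possible acceptance cut-off.
    """
    order = np.argsort(-np.asarray(confidence, dtype=float), kind="stable")
    errs = np.asarray(errors, dtype=float)[order]
    k = np.arange(1, len(errs) + 1)
    return k / len(errs), np.cumsum(errs) / k


def aurc(confidence, errors):
    """Area under the risk-coverage curve, taken as the mean risk over cut-offs."""
    _, risk = risk_coverage(confidence, errors)
    return float(risk.mean())


# Hypothetical cases: the two least confident predictions are the wrong ones.
coverage, risk = risk_coverage([0.9, 0.8, 0.3, 0.1], [0, 0, 1, 1])
print(coverage, risk)
```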
Proteins are traditionally optimized through the costly construction and measurement of many mutants. Active Learning-assisted Directed Evolution (ALDE) alleviates that cost by predicting the best improvements and iteratively testing mutants to inform predictions. However, existing ALDE methods face a critical limitation: selecting the highest-predicted mutants in each round yields homogeneous training data insufficient for accurate prediction models in subsequent rounds. Here we present FolDE, an ALDE method designed to maximize end-of-campaign success. In simulations across 20 protein targets, FolDE discovers 23% more top 10% mutants than the best baseline ALDE method (p=0.005) and is 55% more likely to find top 1% mutants. FolDE achieves this primarily through naturalness-based warm-starting, which augments limited activity measurements with protein language model outputs to improve activity prediction. We also introduce a constant-liar batch selector, which improves batch diversity; this is important in multi-mutation campaigns but had limited effect in our benchmarks. The complete workflow is freely available as open-source software, making efficient protein optimization accessible to any laboratory.
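The constant-liar heuristic builds a batch greedily. After each pick, the candidate is added to the training set with a fixed fake ("lie") label, so the model's uncertainty at that point collapses and the next pick moves elsewhere. The self-contained NumPy sketch below implements this. It assumes a Gaussian-process surrogate with an RBF kernel, an upper-confidence-bound acquisition, and the minimum observed activity as the lie. FolDE's actual surrogate, features, and lie value are not specified here, so these choices and all names are hypothetical.

```python
import numpy as np


def rbf(A, B, length_scale=1.0):
    """Squared-exponential kernel between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)


def gp_posterior(X_train, y_train, X_test, length_scale=1.0, noise=1e-4):
    """GP posterior mean and std (unit prior variance, constant mean = mean(y))."""
    y_mean = y_train.mean()
    K = rbf(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_s = rbf(X_train, X_test, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - (v ** 2).sum(axis=0), 1e-12, None)
    return y_mean + K_s.T @ alpha, np.sqrt(var)


def constant_liar_batch(X_obs, y_obs, X_cand, batch_size, kappa=2.0, length_scale=1.0):
    """Greedy UCB batch selection with a constant-liar fantasy after each pick."""
    X_fant = np.array(X_obs, dtype=float)
    y_fant = np.array(y_obs, dtype=float)
    X_cand = np.asarray(X_cand, dtype=float)
    lie = float(y_fant.min())  # pessimistic lie: the worst activity seen so far
    chosen = []
    for _ in range(batch_size):
        mu, sd = gp_posterior(X_fant, y_fant, X_cand, length_scale)
        score = mu + kappa * sd
        if chosen:
            score[chosen] = -np.inf  # never pick the same candidate twice
        idx = int(np.argmax(score))
        chosen.append(idx)
        X_fant = np.vstack([X_fant, X_cand[idx]])
        y_fant = np.append(y_fant, lie)
    return chosen


# Hypothetical 1-D candidates: two near-duplicates at ~3 and one distinct point at -3.
batch = constant_liar_batch([[0.0]], [0.0], [[3.0], [3.01], [-3.0]], batch_size=2)
print(batch)  # one of the near-duplicates plus the distinct candidate
```

Without the lie, a pure top-k UCB ranking could pick both near-duplicates; with it, the second pick moves to the distant candidate.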
Accurate quantification in positron emission tomography (PET) is essential for reliable diagnosis and effective treatment monitoring. A major issue in PET imaging is attenuation: the loss of detected photons as they traverse biological tissue before reaching the detectors. When attenuation correction is absent or inadequate, this signal degradation can cause inaccurate quantification, making it difficult to differentiate benign from malignant conditions and potentially leading to misdiagnosis. Typically, this correction relies on co-acquired computed tomography (CT) images, which provide the structural data needed to calculate photon attenuation across the body. However, this approach exposes patients to additional ionizing radiation, is susceptible to spatial misregistration between the PET and CT acquisitions, and requires costly equipment. Advances in neural network architectures offer an alternative: synthesizing pseudo-CT images. Our investigation shows that conditional denoising diffusion probabilistic models (DDPMs) can generate high-quality CT images from non-attenuation-corrected PET images for use in attenuation correction. By using all three orthogonal views of the non-attenuation-corrected PET images, the DDPM approach combined with ensemble voting generates higher-quality pseudo-CT images with fewer artifacts and improved slice-to-slice consistency. Results from a study of 159 head scans acquired with a Siemens Biograph Vision PET/CT scanner demonstrate both qualitative and quantitative improvements in pseudo-CT generation. The method achieved a mean absolute error of 32 ± 10.4 HU on the CT images and an average error of 1.48 ± 0.68% across all regions of interest when comparing PET images reconstructed with the attenuation map from the generated pseudo-CT against those reconstructed with the true CT.
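One simple way to combine the three per-view DDPM outputs is voxelwise voting, here a median, which suppresses an artifact that appears in only one view. The abstract does not specify the voting rule, so the median, the function names, and the toy volumes below are assumptions. The same sketch computes the mean absolute error in Hounsfield units (HU) that is used to report pseudo-CT accuracy.

```python
import numpy as np


def fuse_views(axial, coronal, sagittal):
    """Voxelwise median of three pseudo-CT volumes resampled to the same grid (HU)."""
    return np.median(np.stack([axial, coronal, sagittal]), axis=0)


def mae_hu(pred, ref, mask=None):
    """Mean absolute error in HU, optionally restricted to a boolean mask (e.g., the head)."""
    diff = np.abs(np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float))
    if mask is not None:
        diff = diff[np.asarray(mask, dtype=bool)]
    return float(diff.mean())


# Hypothetical 4x4x4 volumes: the sagittal output has a bright artifact in one slab.
ref = np.zeros((4, 4, 4))
axial, coronal, sagittal = ref + 10.0, ref + 20.0, ref.copy()
sagittal[0] = 1000.0
fused = fuse_views(axial, coronal, sagittal)
print(mae_hu(sagittal, ref), mae_hu(fused, ref))
```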
Understanding the relationship between antibody sequence, structure, and function is essential for the design of antibody-based therapeutics and research tools. Recently, machine learning (ML) models, mostly large language models applied to sequence information, have been developed to predict antibody properties. Yet incorporating structural information remains an open direction, not only to enhance prediction but also to offer insights into the underlying molecular mechanisms. This chapter provides an overview of these approaches and describes two ML frameworks that integrate structural data (via graph representations) with neural networks to predict antibody properties: ANTIPASTI predicts binding affinity (a global property), whereas INFUSSE predicts residue flexibility (a local property). We survey the principles underpinning these models, the ways in which they encode structural knowledge, and the strategies that can be used to extract biologically relevant statistical signals that help discover and disentangle the molecular determinants of the properties of interest.
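A common, simple way to turn an antibody structure into a graph is a residue contact map. Nodes are residues, and edges connect residues whose Cα atoms lie within a distance cut-off. The NumPy sketch below builds such a map. It assumes an 8 Å cut-off, which is a frequent convention but not necessarily what ANTIPASTI or INFUSSE use, since both may construct their graphs differently. The function name is hypothetical. Per-residue features, such as amino-acid identity, would then be attached to the nodes before the graph is passed to a neural network.

```python
import numpy as np


def contact_graph(ca_coords, cutoff=8.0):
    """Residue contact graph from C-alpha coordinates (Å).

    Returns a symmetric 0/1 adjacency matrix with an edge between residues whose
    C-alpha atoms lie within `cutoff` of each other (self-contacts excluded).
    """
    xyz = np.asarray(ca_coords, dtype=float)
    dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    adj = (dist < cutoff) & ~np.eye(len(xyz), dtype=bool)
    return adj.astype(int)


# Hypothetical three-residue example: residues 0 and 1 are 5 Å apart, residue 2 is far away.
adj = contact_graph([[0, 0, 0], [5, 0, 0], [20, 0, 0]])
print(adj)
```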
Non-invasive colorectal cancer (CRC) screening represents a key opportunity to improve colonoscopy participation rates and reduce CRC mortality. This study explores the gut-liver axis as the basis for a novel opportunistic screening approach, predicting colorectal neoplasia from liver-derived radiomic features extracted from routine CT images. In this retrospective study, we analyzed data from 1,997 patients who underwent colonoscopy and abdominal CT. Patients had either no colorectal neoplasia (n=1,189) or colorectal neoplasia (n=808; adenomas n=423, CRC n=385). Radiomic features were extracted from 3D liver segmentations using the Radiomics Processing ToolKit (RPTK), which performed feature extraction, filtering, and classification. The dataset was split into training (n=1,397) and test (n=600) cohorts. Five machine learning models were trained with 5-fold cross-validation on the 20 most informative features, and the best model ensemble was selected based on validation AUROC. The best radiomics-based XGBoost model achieved a test AUROC of 0.810, clearly outperforming the best clinical-only model (test AUROC 0.457). Discrimination between colorectal cancer and adenoma was less accurate (test AUROC 0.674). Our findings establish proof of concept that liver-derived radiomics from routine abdominal CT can predict colorectal neoplasia. Beyond offering a pragmatic, widely accessible adjunct to CRC screening, this approach highlights the gut-liver axis as a novel biomarker source for opportunistic screening and raises new mechanistic hypotheses for future translational research.
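Both model selection and testing above rely on AUROC, which equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The NumPy sketch below computes it directly from that rank (Mann-Whitney) interpretation. In practice, a library routine such as scikit-learn's `roc_auc_score` gives the same value. The function name is hypothetical, and the RPTK pipeline itself is not reproduced.

```python
import numpy as np


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as one half. O(n_pos * n_neg) memory, fine for cohorts of a few thousand.
    """
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


# Hypothetical scores: 3 of the 4 positive/negative pairs are ranked correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```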
Accurate protein function prediction requires integrating heterogeneous intrinsic signals (e.g., sequence and structure) with noisy extrinsic contexts (e.g., protein-protein interactions and GO term annotations). However, two key challenges hinder effective fusion: (i) cross-modal distributional mismatch among embeddings produced by pre-trained intrinsic encoders, and (ii) noisy relational graphs of extrinsic data that degrade GNN-based information aggregation. We propose Diffused and Aligned Multi-modal Protein Embedding (DAMPE), a unified framework that addresses these challenges through two core mechanisms. First, we propose an Optimal Transport (OT)-based representation alignment that establishes correspondence between the intrinsic embedding spaces of different modalities, effectively mitigating cross-modal heterogeneity. Second, we develop a Conditional Graph Generation (CGG)-based information fusion method, in which a condition encoder fuses the aligned intrinsic embeddings to provide informative cues for graph reconstruction. Our theoretical analysis further implies that the CGG objective drives this condition encoder to absorb graph-aware knowledge into the protein representations it produces. Empirically, DAMPE outperforms or matches state-of-the-art methods such as DPFunc on standard GO benchmarks, with AUPR gains of 0.002 to 0.013 pp and Fmax gains of 0.004 to 0.007 pp. Ablation studies further show that OT-based alignment contributes 0.043 to 0.064 pp in AUPR, while CGG-based fusion adds 0.005 to 0.111 pp in Fmax. Overall, DAMPE offers a scalable and theoretically grounded approach to robust multi-modal protein representation learning that enhances protein function prediction.
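OT-based alignment couples points in one embedding space with points in another by minimizing a total transport cost. The NumPy sketch below solves entropic OT with Sinkhorn iterations and then applies a barycentric projection that maps source embeddings onto the target space. DAMPE's exact OT formulation, cost, and regularization are not given here, so this generic variant and all names and values are assumptions. Real embeddings would need a cost scaled so that exp(-C/ε) does not underflow, or a log-domain implementation.

```python
import numpy as np


def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropic optimal transport plan between histograms a and b.

    C is the (len(a), len(b)) cost matrix; eps is the entropic regularization.
    Returns a plan P whose row sums approach a and column sums approach b.
    """
    K = np.exp(-C / eps)
    u = np.ones_like(a, dtype=float)
    v = np.ones_like(b, dtype=float)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


def barycentric_map(P, Y):
    """Map each source point to the P-weighted average of target points."""
    return (P @ Y) / P.sum(axis=1, keepdims=True)


# Hypothetical 1-D "embeddings" from two modalities for the same three proteins.
X = np.array([[0.0], [1.0], [2.0]])
Y = np.array([[0.02], [1.01], [2.03]])
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean cost
a = b = np.full(3, 1.0 / 3.0)
P = sinkhorn(a, b, C)
print(np.round(barycentric_map(P, Y), 3))
```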
Prenatal maternal stress (PS) is a risk factor for adverse offspring neurodevelopment. Heart rate variability (HRV) complexity provides a non-invasive marker of maternal autonomic regulation and may be influenced by mind-body interventions such as Yoga. In this quasi-randomized controlled trial, 28 chronically stressed pregnant women were followed from the second trimester until birth: 14 participated in weekly Hatha Yoga with electrocardiogram (ECG) recordings, and 14 received standard obstetric care with monthly ECGs. Group allocation was based on availability, and participants were unaware of their assignment at enrollment. HRV complexity was first assessed with Sample Entropy and Entropy Rate, and the analysis was then expanded to 94 HRV metrics spanning the temporal, frequency, nonlinear, and information-theoretical domains. All metrics were covariate-adjusted (maternal age, BMI, gestational age), standardized, and analyzed using timepoint-specific principal component analysis (PCA), from which a unified HRV index was derived. The analyses revealed that relationships among HRV metrics changed dynamically across pregnancy, with PCA loadings shifting from frequency toward complexity measures in late gestation. The mixed-effects model identified a significant time × group interaction (p = 0.041). These findings suggest a Yoga-attributable restructuring of HRV signal-analytical domains as pregnancy advances, and they highlight the utility of advanced HRV analysis frameworks for future, larger trials.
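Sample Entropy, the first complexity measure named above, quantifies how often patterns of length m that match within tolerance r still match when extended to length m + 1. The NumPy sketch below follows the standard Richman-Moorman definition and assumes it would be applied to a series of RR intervals. The values m = 2 and r = 0.2·SD are common defaults, not necessarily the study's settings, and the function name is hypothetical. As expected, a regular signal yields a lower value than white noise.

```python
import numpy as np


def sample_entropy(x, m=2, r=None):
    """Sample entropy SampEn(m, r) with Chebyshev distance.

    r defaults to 0.2 * SD(x). Counts template pairs (self-matches excluded) of
    length m (B) and m + 1 (A), both built from the first N - m start points,
    and returns -ln(A / B); returns inf if either count is zero.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)

    def count_pairs(length):
        t = np.array([x[i:i + length] for i in range(n - m)])
        d = np.max(np.abs(t[:, None, :] - t[None, :, :]), axis=2)
        return (np.sum(d <= r) - len(t)) / 2  # drop the diagonal, count i < j once

    B = count_pairs(m)
    A = count_pairs(m + 1)
    if A == 0 or B == 0:
        return float("inf")
    return float(-np.log(A / B))


# Hypothetical signals: a slow sinusoid versus Gaussian white noise.
regular = np.sin(np.linspace(0, 20 * np.pi, 500))
noise = np.random.default_rng(0).standard_normal(500)
print(sample_entropy(regular), sample_entropy(noise))
```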
Understanding how creativity is represented in the brain's intrinsic functional architecture remains a central challenge in cognitive neuroscience. While resting-state fMRI studies have revealed large-scale network correlates of creative potential, electroencephalography (EEG) offers a temporally precise and scalable way to capture the fast oscillatory dynamics that underlie spontaneous neural organization. In this study, we used a data-driven network approach to examine whether resting-state EEG connectivity patterns differentiate individuals according to their creative abilities. Creativity was assessed in 30 healthy young adults using the Inventory of Creative Activities and Achievements (ICAA), the Divergent Association Task (DAT), the Matchstick Arithmetic Puzzles Task (MAPT), and self-ratings (SR) of creative ability. Graph-theoretical analyses were applied to functional connectivity matrices, and participants were clustered based on graph similarity. Two distinct participant clusters emerged that differed systematically across multiple dimensions of creativity. Cluster 1, characterized by consistently higher performance across multiple creativity variables (ICAA, DAT, MAPT, and SR), showed broad alpha-band hypoconnectivity, relatively preserved left frontal connectivity, and greater network modularity. Cluster 0, associated with lower creativity scores, exhibited stronger overall connectivity, reduced modularity, and higher local clustering. These findings suggest that resting-state EEG connectivity patterns can index stable cognitive traits such as creativity. More broadly, they point to an intrinsic neural signature of adaptive brain function, marked by efficient yet flexible network organization, that may support creative and adaptive cognition.
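Network modularity, one of the graph measures that distinguished the two clusters, compares the connection weight within communities with what a degree-preserving random graph would give. The NumPy sketch below computes Newman's modularity Q for a given partition. It assumes a symmetric, non-negative connectivity matrix (for example, alpha-band EEG connectivity between channels). The community-detection step that produces the partition and the study's exact connectivity measure are not shown, and the function name is hypothetical.

```python
import numpy as np


def modularity(A, labels):
    """Newman modularity Q of a (weighted, undirected) graph for a given partition.

    Q = (1 / 2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j),
    where k_i is node strength and 2m is the total edge weight counted twice.
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)
    two_m = A.sum()
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)


# Hypothetical graph: two disconnected triangles (e.g., two channel groups).
tri = np.ones((3, 3)) - np.eye(3)
A = np.block([[tri, np.zeros((3, 3))], [np.zeros((3, 3)), tri]])
print(modularity(A, [0, 0, 0, 1, 1, 1]))
```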
Background: Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) affects ~33% of U.S. adults and is the most common chronic liver disease. Although often asymptomatic, it can progress to cirrhosis. Early detection is important because lifestyle interventions can prevent disease progression. We developed a fair, rigorous, and reproducible MASLD prediction model and compared it with prior methods using a large electronic health record (EHR) database. Methods: We evaluated LASSO logistic regression, random forest, XGBoost, and a neural network for MASLD prediction using clinical feature subsets, including the top 10 SHAP-ranked features. To reduce disparities in true positive rates across racial and ethnic subgroups, we applied an equal opportunity postprocessing method. Results: The study included 59,492 patients in the training set, 24,198 in the validation set, and 25,188 in the test set. The LASSO logistic regression model with the top 10 features was selected for its interpretability and comparable performance. Before fairness adjustment, the model achieved an AUROC of 0.84, accuracy of 78%, sensitivity of 72%, specificity of 79%, and F1-score of 0.617. After equal opportunity postprocessing, accuracy increased modestly to 81% and specificity to 94%, while sensitivity decreased to 41% and the F1-score to 0.515, reflecting the fairness trade-off. Conclusions: We developed MASER (MASLD Static EHR Risk Prediction), a LASSO logistic regression model that achieved competitive performance for MASLD prediction (AUROC 0.836, accuracy 77.6%), comparable to previously reported ensemble and tree-based models. Overall, this approach demonstrates that interpretable models can balance predictive performance and fairness in diverse patient populations.
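Equal opportunity postprocessing adjusts decisions after training so that true positive rates match across subgroups. A minimal deterministic version picks a separate score threshold for each group that attains a common target TPR. The NumPy sketch below illustrates that assumption. The study's exact postprocessing method, which may for example use randomized thresholds, and its target TPR are not specified here, and the function names are hypothetical. Raising thresholds in this way typically trades sensitivity for specificity.

```python
import numpy as np


def equal_opportunity_thresholds(scores, y_true, groups, target_tpr):
    """Per-group score thresholds giving each group a TPR of at least target_tpr.

    For each group, the threshold is the k-th highest score among its positives,
    with k = ceil(target_tpr * n_pos), so at least k positives are flagged.
    """
    scores, y_true, groups = map(np.asarray, (scores, y_true, groups))
    thresholds = {}
    for g in np.unique(groups):
        pos = np.sort(scores[(groups == g) & (y_true == 1)])[::-1]
        if len(pos) == 0:
            raise ValueError(f"group {g!r} has no positive cases")
        k = max(1, int(np.ceil(target_tpr * len(pos))))
        thresholds[g] = pos[k - 1]
    return thresholds


def predict_with_thresholds(scores, groups, thresholds):
    """Binary predictions using each case's group-specific threshold."""
    return np.array([s >= thresholds[g] for s, g in zip(scores, groups)])


# Hypothetical risk scores for two subgroups whose score distributions differ.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05])
y = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
groups = np.array(["A"] * 4 + ["B"] * 4 + ["A", "B"])
th = equal_opportunity_thresholds(scores, y, groups, target_tpr=0.5)
pred = predict_with_thresholds(scores, groups, th)
print(th, pred)
```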