We employed Multifractal Detrended Fluctuation Analysis (MF-DFA) and Refined Composite Multiscale Sample Entropy (RCMSE) to investigate the complexity of Bitcoin, GBP/USD, gold, and natural gas price log-return time series. This study provides a comparative analysis of these markets and offers insights into their predictability and associated risks. Each tool provides a distinct way of quantifying time series complexity. Both RCMSE and MF-DFA indicate a higher complexity for the Bitcoin time series than for the other series. We discuss how the increased complexity of Bitcoin may be attributable to stronger nonlinear correlations within its log-return time series.
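A minimal sketch of the MF-DFA computation, using numpy only; the scales, q values, detrending order, and the random placeholder series below are illustrative choices, not the settings or data used in the study (the refined composite variant of sample entropy is likewise not reproduced here).

import numpy as np

def mfdfa(x, scales, qs, order=1):
    """Generalized Hurst exponents h(q) of a series x (forward segments only)."""
    profile = np.cumsum(x - np.mean(x))              # integrated (profile) series
    hq = []
    for q in qs:
        log_F = []
        for s in scales:
            n_seg = len(profile) // s
            rms = []
            for v in range(n_seg):
                seg = profile[v * s:(v + 1) * s]
                t = np.arange(s)
                trend = np.polyval(np.polyfit(t, seg, order), t)
                rms.append(np.sqrt(np.mean((seg - trend) ** 2)))
            rms = np.asarray(rms)
            if np.isclose(q, 0.0):
                F = np.exp(0.5 * np.mean(np.log(rms ** 2)))
            else:
                F = np.mean(rms ** q) ** (1.0 / q)
            log_F.append(np.log(F))
        hq.append(np.polyfit(np.log(scales), log_F, 1)[0])  # slope of log F_q(s) vs log s
    return np.array(hq)

rng = np.random.default_rng(0)
log_returns = rng.standard_normal(4096)              # placeholder for real log-returns
scales = np.array([16, 32, 64, 128, 256])
qs = np.array([-5.0, -3.0, -1.0, 0.0, 1.0, 3.0, 5.0])
h = mfdfa(log_returns, scales, qs)
print("h(q):", np.round(h, 3), "  multifractal width:", round(h[0] - h[-1], 3))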
Quantum-enhanced machine learning, encompassing both quantum algorithms and quantum-inspired classical methods such as tensor networks, offers promising tools for extracting structure from complex, high-dimensional data. In this work, we study the training dynamics of Matrix Product State (MPS) classifiers applied to three-class problems, using both Fashion-MNIST and hyperspectral satellite imagery as representative datasets. We investigate the phenomenon of grokking, where generalization emerges suddenly after memorization, by tracking entanglement entropy, local magnetization, and model performance across training sweeps. Additionally, we employ information theory tools to gain deeper insights: transfer entropy is used to reveal causal dependencies between label-specific quantum masks, while O-information captures the shift from synergistic to redundant correlations among class outputs. Our results show that grokking in the Fashion-MNIST task coincides with a sharp entanglement transition and a peak in redundant information, whereas the overfitted hyperspectral model retains synergistic, disordered behavior. These findings highlight the relevance of high-order information dynamics in quantum-inspired learning and emphasize the distinct learning behaviors that emerge in multi-class classification, offering a principled framework to interpret generalization in quantum machine learning architectures.
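A minimal sketch of the bond entanglement entropy tracked across training sweeps, computed from the singular values of a bipartition; the random pure state below is a stand-in for the trained MPS classifier, and the qubit-like local dimension of 2 is an assumption made only for illustration.

import numpy as np

def bond_entropy(state, n_sites, cut):
    """Von Neumann entropy across the bond between site `cut` and site `cut + 1`."""
    psi = state.reshape(2 ** cut, 2 ** (n_sites - cut))   # bipartition of the state
    s = np.linalg.svd(psi, compute_uv=False)
    p = s ** 2
    p = p[p > 1e-12]                                      # drop numerical zeros
    return float(-np.sum(p * np.log(p)))

n = 8
rng = np.random.default_rng(1)
psi = rng.standard_normal(2 ** n) + 1j * rng.standard_normal(2 ** n)
psi /= np.linalg.norm(psi)
print([round(bond_entropy(psi, n, c), 3) for c in range(1, n)])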
Precise modeling of the detector energy response is crucial for next-generation neutrino experiments, which pose computational challenges due to the lack of analytical likelihoods. We propose a solution using neural likelihood estimation within the simulation-based inference framework. We develop two complementary neural density estimators that model the likelihood of calibration data: conditional normalizing flows and a transformer-based regressor. We adopt JUNO - a large neutrino experiment - as a case study. The energy response of JUNO depends on several parameters, all of which must be tuned given their non-linear behavior and strong correlations in the calibration data. To this end, we integrate the modeled likelihoods with Bayesian nested sampling for parameter inference, achieving uncertainties limited only by statistics with near-zero systematic biases. The normalizing-flow model enables an unbinned likelihood analysis, while the transformer provides an efficient binned alternative. By providing both options, our framework offers the flexibility to choose the most appropriate method for specific needs. Finally, our approach establishes a template for similar applications across experimental neutrino physics and broader particle physics.
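A minimal sketch of the unbinned likelihood step, assuming a Gaussian stand-in for the trained conditional density q(E | theta); the parameter value and the sqrt-scaling resolution are illustrative assumptions, not JUNO calibration settings, and a grid scan replaces the nested sampler for brevity.

import numpy as np
from scipy.stats import norm

def log_density(energy, theta):
    # Stand-in for the learned conditional normalizing flow q(E | theta):
    # mean proportional to theta, resolution scaling like sqrt(E). Illustrative only.
    mu = theta
    sigma = 0.03 * np.sqrt(mu)
    return norm.logpdf(energy, loc=mu, scale=sigma)

def negative_log_likelihood(theta, events):
    # Unbinned NLL: sum of per-event log-densities
    return -np.sum(log_density(events, theta))

rng = np.random.default_rng(2)
true_theta = 2.22                                   # hypothetical parameter value
events = rng.normal(true_theta, 0.03 * np.sqrt(true_theta), size=5000)
grid = np.linspace(2.15, 2.30, 301)
nll = np.array([negative_log_likelihood(t, events) for t in grid])
print("best-fit theta ~", round(float(grid[np.argmin(nll)]), 4))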
Wall-pressure fluctuations beneath turbulent boundary layers drive noise and structural fatigue through interactions between fluid and structural modes. Conventional predictive models for the spectrum--such as the widely accepted Goody model--fail to capture the energetic growth in the subconvective regime that occurs at high Reynolds number, while at the same time over-predicting the variance. To address these shortcomings, two semi-empirical models are proposed for the wall-pressure spectrum in canonical turbulent boundary layers, pipes and channels for friction Reynolds numbers $\delta^+$ ranging from 180 to 47 000. The models are based on consideration of two eddy populations that broadly represent the contributions to the wall pressure fluctuations from inner-scale motions and outer-scale motions. The first model expresses the premultiplied spectrum as the sum of two overlapping log-normal populations: an inner-scaled term that is $\delta^+$-invariant and an outer-scaled term whose amplitude broadens smoothly with $\delta^+$. Calibrated against large-eddy simulations, direct numerical simulations, and recent high-$\delta^+$ pipe data, it reproduces the convective ridge and the emergence of a subconvective ridge at large $\delta^+$. The second model, developed around newly available pipe data, uses theoretical arguments to prescribe the spectral shapes of the inner and outer populations. By embedding the $\delta^+$ dependence in smooth asymptotic functions, it yields a formulation that varies continuously with $\delta^+$. Both models capture the full spectrum and the logarithmic growth of its variance, laying the groundwork for more accurate engineering predictions of wall-pressure fluctuations.
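A minimal sketch of the structure of the first model, assuming placeholder amplitudes, peak locations, and widths rather than the calibrated values: the premultiplied spectrum is written as an inner-scaled log-normal population that is $\delta^+$-invariant plus an outer-scaled log-normal population whose amplitude grows smoothly with $\delta^+$.

import numpy as np

def lognormal_bump(x, amp, x_peak, width):
    return amp * np.exp(-0.5 * (np.log(x / x_peak) / width) ** 2)

def premultiplied_spectrum(T_plus, delta_plus):
    inner = lognormal_bump(T_plus, amp=2.0, x_peak=100.0, width=1.0)    # delta^+-invariant term
    outer_amp = 0.5 * np.log10(delta_plus)                              # assumed smooth growth with delta^+
    outer = lognormal_bump(T_plus / delta_plus, amp=outer_amp, x_peak=1.0, width=1.2)
    return inner + outer

T_plus = np.logspace(0, 6, 400)                     # inner-scaled time-scale axis
for dp in (1_000, 10_000, 47_000):
    spec = premultiplied_spectrum(T_plus, dp)
    var = np.trapz(spec, np.log(T_plus))            # variance follows from the integral in log T
    print(f"delta+ = {dp:>6}: variance proxy = {var:.2f}")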
Extreme events in complex physical systems, such as anomalous wind gusts, often cause significant material and human damage. Their modeling is crucial for risk assessment and for understanding the underlying dynamics. In this work, we introduce a local influence analysis to assess the stability of a class of extreme-value Birnbaum-Saunders regression models, which are particularly suited for analyzing such data. The proposed approach uses the conformal normal curvature (CNC) of the log-likelihood function to diagnose the influence of individual observations on the postulated model. By examining the eigenvalues and eigenvectors associated with the CNC, we identify influential data points, that is, physical events that disproportionately affect the model's parameters. We illustrate the methodology through a simulation study and apply it to a time series of wind gust data from Itajai, Brazil, where a severe event caused extensive damage and casualties. Our approach successfully pinpoints this specific event as a highly influential observation and quantifies its impact on the fitted model. This work provides a valuable diagnostic tool for physicists and data scientists working with extreme-value models of complex natural phenomena.
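A minimal sketch of the general conformal normal curvature diagnostic, assuming a random placeholder perturbation matrix and information matrix; the Birnbaum-Saunders-specific derivatives from the paper are not reproduced here.

import numpy as np

def conformal_normal_curvature(delta, obs_info):
    """Per-observation CNC values and the leading curvature direction."""
    F = 2.0 * delta.T @ np.linalg.solve(obs_info, delta)   # curvature matrix (n x n)
    B = F / np.trace(F)                                    # conformal normalization
    eigval, eigvec = np.linalg.eigh(B)
    leading = np.abs(eigvec[:, -1])                        # eigenvector of the largest eigenvalue
    return np.diag(B), leading

rng = np.random.default_rng(3)
n_obs, n_par = 200, 3
delta = rng.standard_normal((n_par, n_obs))                # placeholder perturbation matrix
delta[:, 42] *= 8.0                                        # mimic one highly influential event
info = np.eye(n_par) * n_obs                               # placeholder observed information
cnc, lead = conformal_normal_curvature(delta, info)
idx = int(np.argmax(lead))
print("most influential observation:", idx, " diagonal CNC:", round(float(cnc[idx]), 4))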
Human mobility forms the backbone of contact patterns through which infectious diseases propagate, fundamentally shaping the spatio-temporal dynamics of epidemics and pandemics. Traditional models are often based on the assumption that every individual has the same probability of infecting every other individual in the population, so-called random homogeneous mixing, and therefore struggle to capture the complex and heterogeneous nature of real-world human interactions. Recent advancements in data-driven methodologies and computational capabilities have unlocked the potential of integrating high-resolution human mobility data into epidemic modeling, significantly improving the accuracy, timeliness, and applicability of epidemic risk assessment, contact tracing, and intervention strategies. This review provides a comprehensive synthesis of the current landscape in human mobility-informed epidemic modeling. We explore diverse sources and representations of human mobility data, and then examine the behavioral and structural roles of mobility and contact in shaping disease transmission dynamics. The review then covers a wide range of epidemic modeling approaches, from classical compartmental models to network-based, agent-based, and machine learning models, and discusses how mobility integration enhances risk management and response strategies during epidemics. By synthesizing these insights, the review serves as a foundational resource for researchers and practitioners, bridging the gap between epidemiological theory and the dynamic complexities of human interaction while charting clear directions for future research.
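A minimal sketch, not taken from any specific model in the review, of how a mobility matrix relaxes the homogeneous-mixing assumption: a two-patch metapopulation SIR in which the force of infection mixes residents and visitors according to where they spend their time. Parameter values and the coupling form are illustrative assumptions.

import numpy as np

beta, gamma = 0.3, 0.1
M = np.array([[0.9, 0.1],        # M[i, j]: fraction of patch-i residents present in patch j
              [0.2, 0.8]])
N = np.array([1e6, 5e5])         # resident populations

S = N.copy(); I = np.array([10.0, 0.0]); R = np.zeros(2)
for day in range(300):
    N_eff = M.T @ N              # people present in each patch
    I_eff = M.T @ I              # infectious people present in each patch
    lam = beta * M @ (I_eff / N_eff)   # force of infection felt by residents of each patch
    new_inf = lam * S
    new_rec = gamma * I
    S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
print("final attack rates by patch:", np.round(R / N, 3))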
Many data-science applications involve detecting a shared signal between two high-dimensional variables. Using random matrix theory methods, we determine when such a signal can be detected and reconstructed from sample correlations, despite the background correlations induced by sampling noise. We consider three different covariance matrices constructed from two high-dimensional variables: their individual self covariances, their cross covariance, and the self covariance of the concatenated (joint) variable, which incorporates the self- and cross-correlation blocks. We observe the expected Baik, Ben Arous, and P\'ech\'e detectability phase transition in all these covariance matrices, and we show that the joint and cross covariance matrices always reconstruct the shared signal earlier than the self covariances. Whether the joint or the cross approach is better depends on the mismatch of dimensionalities between the variables. We discuss what these observations mean for choosing the right method for detecting linear correlations in data and how these findings may generalize to nonlinear statistical dependencies.
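A minimal numerical sketch of the comparison described above, with an arbitrary planted signal strength and dimensions: a shared latent signal is embedded in two high-dimensional variables, and the leading eigen/singular values of the self, cross, and joint sample covariances are compared with the Marchenko-Pastur bulk edge.

import numpy as np

rng = np.random.default_rng(4)
T, dx, dy, strength = 2000, 200, 100, 1.5
z = rng.standard_normal(T)                        # shared latent signal
u = rng.standard_normal(dx); u /= np.linalg.norm(u)
v = rng.standard_normal(dy); v /= np.linalg.norm(v)
X = strength * np.outer(z, u) + rng.standard_normal((T, dx))
Y = strength * np.outer(z, v) + rng.standard_normal((T, dy))

C_xx = X.T @ X / T                                # self covariance of X
C_xy = X.T @ Y / T                                # cross covariance block
J = np.hstack([X, Y]); C_jj = J.T @ J / T         # joint (concatenated) covariance

print("top eig  C_xx:", round(float(np.linalg.eigvalsh(C_xx)[-1]), 3),
      " bulk edge:", round((1 + np.sqrt(dx / T)) ** 2, 3))
print("top sval C_xy:", round(float(np.linalg.svd(C_xy, compute_uv=False)[0]), 3))
print("top eig  C_jj:", round(float(np.linalg.eigvalsh(C_jj)[-1]), 3),
      " bulk edge:", round((1 + np.sqrt((dx + dy) / T)) ** 2, 3))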
Quantum generative adversarial networks (QGANs) have been investigated as a method for generating synthetic data with the goal of augmenting training data sets for neural networks. This is especially relevant for financial time series, since we only ever observe one realization of the process, namely the historical evolution of the market, which is further limited by data availability and the age of the market. However, for classical generative adversarial networks it has been shown that generated data may often fail to exhibit desired properties (also called stylized facts), such as matching a certain distribution or showing specific temporal correlations. Here, we investigate whether quantum correlations in quantum-inspired models of QGANs can help in the generation of financial time series. We train QGANs, composed of a quantum generator and a classical discriminator, and investigate two approaches for simulating the quantum generator: a full simulation of the quantum circuits, and an approximate simulation using tensor network methods. We test how the choice of hyperparameters, such as the circuit depth and bond dimension, influences the quality of the generated time series. The QGANs we trained generate synthetic financial time series that not only match the target distribution but also exhibit the desired temporal correlations, with the quality of each property depending on the hyperparameters and simulation method.
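A minimal sketch of two of the stylized-fact diagnostics mentioned above: near-zero autocorrelation of returns versus slowly decaying autocorrelation of absolute returns (volatility clustering). The random series below is only a placeholder for QGAN output.

import numpy as np

def autocorr(x, lag):
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(5)
generated_returns = rng.standard_normal(5000) * np.exp(0.3 * rng.standard_normal(5000))
for lag in (1, 5, 20):
    print(f"lag {lag:>2}: ACF(r) = {autocorr(generated_returns, lag):+.3f}, "
          f"ACF(|r|) = {autocorr(np.abs(generated_returns), lag):+.3f}")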
We introduce a benchmark framework developed by and for the scientific community to evaluate, monitor and steer large language model development in fundamental physics. Building on philosophical concepts of scientific understanding and creativity, we develop a scoring system in which each question is scored by an expert for its correctness, difficulty, and surprise. The questions are of three forms: (i) multiple-choice questions for conceptual understanding, (ii) analytical problems requiring mathematical derivation, and (iii) open-ended tasks requiring complex problem solving. Our current dataset contains a diverse set of examples, including a machine learning challenge to classify high-energy physics events, such as the four top quark signal. To ensure continued relevance, we propose a living benchmark, where physicists contribute questions, for instance alongside new publications. We invite contributions via: http://www.physicsbenchmarks.org/. We hope that this benchmark will enable targeted AI development that can make a meaningful contribution to fundamental physics research.
Within the continuous endeavour of improving the efficiency and resilience of air transport, the trend of using concepts and metrics from statistical physics has recently gained momentum. This scientific discipline, which integrates elements from physics and statistics, aims at extracting knowledge about the microscale rules governing a (potentially complex) system when only its macroscale behaviour is observable. Translated to air transport, this entails extracting information about how individual operations are managed by studying only coarse-grained information, e.g. average delays. We here review some fundamental concepts of statistical physics and explore how they have been applied to the analysis of time series representing different aspects of the air transport system. In order to overcome the abstractness and complexity of some of these concepts, intuitive definitions and explanations are provided whenever possible. We conclude by discussing the main obstacles to a more widespread adoption of statistical physics in air transport, and sketch topics that we believe may be relevant in the future.
Accurately measuring street dimensions is essential to evaluating how their design influences both travel behavior and safety. However, gathering street-level information at city scale with precision is difficult given the quantity and complexity of urban intersections. To address this challenge in the context of pedestrian crossings - a crucial component of walkability - we introduce a scalable and accurate method for automatically measuring crossing distance at both marked and unmarked crosswalks, applied to America's 100 largest cities. First, OpenStreetMap coordinates were used to retrieve satellite imagery of intersections throughout each city, totaling roughly three million images. Second, Meta's Segment Anything Model was trained on a manually labelled subset of these images to differentiate drivable from non-drivable surfaces (i.e., roads vs. sidewalks). Third, all available crossing edges from OpenStreetMap were extracted. Finally, crossing edges were overlaid on the segmented intersection images, and a grow-cut algorithm was applied to connect each edge to its adjacent non-drivable surface (e.g., sidewalk, private property, etc.), enabling the calculation of crossing distance. This approach achieved 93 percent accuracy in measuring crossing distance, with a median absolute error of 2 feet 3 inches (0.69 meters), when compared to manually verified data for an entire city. Across the 100 largest US cities, median crossing distance ranges from 32 feet to 78 feet (9.8 to 23.8 m), with detectable regional patterns. Median crossing distance also displays a positive relationship with cities' year of incorporation, illustrating in a novel way how American cities increasingly emphasize wider (and more car-centric) streets.
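A minimal sketch of the final measurement step described above, assuming a toy binary drivable-surface mask, a single OpenStreetMap crossing edge given as two pixel endpoints, and a 0.3 m/pixel ground resolution; marching outward along the edge direction is a simplified stand-in for the grow-cut connection to the adjacent non-drivable surface.

import numpy as np

def crossing_distance(drivable, p0, p1, metres_per_px=0.3, max_steps=500):
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    d = (p1 - p0) / np.linalg.norm(p1 - p0)          # unit vector along the crossing edge

    def march(start, direction):
        pt = start.copy()
        for _ in range(max_steps):
            nxt = pt + direction
            r, c = int(round(nxt[0])), int(round(nxt[1]))
            if not (0 <= r < drivable.shape[0] and 0 <= c < drivable.shape[1]):
                break
            if not drivable[r, c]:                   # reached sidewalk / non-drivable surface
                break
            pt = nxt
        return pt

    a, b = march(p0, -d), march(p1, d)
    return float(np.linalg.norm(a - b)) * metres_per_px

mask = np.zeros((100, 100), dtype=bool)              # toy scene: a 40-pixel road flanked by sidewalks
mask[:, 30:70] = True
print(round(crossing_distance(mask, (50, 45), (50, 55)), 1), "metres")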
In this work, we present LensingFlow, an implementation of an automated workflow to search for evidence of gravitational lensing in a large series of gravitational wave events. The workflow conducts searches for evidence of lensing in all generally considered lensing regimes. The implementation is built atop the Asimov automation framework and the CBCFlow metadata management software, and the resulting product therefore encompasses both the automated running and status checking of jobs in the workflow and the automated production and storage of relevant metadata from these jobs to allow for later reproduction. The workflow encompasses a number of existing lensing pipelines and has been designed to accommodate additional future pipelines, providing both a current and future basis on which to conduct large-scale lensing analyses of gravitational wave signal catalogues. The workflow also implements a prioritisation management system for jobs submitted to the schedulers in common use on computing clusters, ensuring both the completion of the workflow across the entire catalogue of events and the priority completion of the most significant candidates. As a first proof-of-concept demonstration, we deploy LensingFlow on a mock data challenge comprising 10 signals in which signatures of each lensing regime are represented. LensingFlow successfully ran and identified the candidates in this data through its automated checks of results from constituent analyses.
Convolution operations are foundational to classical image processing and modern deep learning architectures, yet their extension into the quantum domain has remained algorithmically and physically costly due to inefficient data encoding and prohibitive circuit complexity. In this work, we present a resource-efficient quantum algorithm that reformulates the convolution product as a structured matrix multiplication via a novel sparse reshaping formalism. Leveraging the observation that localized convolutions can be encoded as doubly block-Toeplitz matrix multiplications, we construct a quantum framework wherein sparse input patches are prepared using optimized key-value QRAM state encoding, while convolutional filters are represented as quantum states in superposition. The convolution outputs are computed through inner product estimation using a low-depth SWAP test circuit, which yields probabilistic amplitude information with reduced sampling overhead. Our architecture supports batched convolution across multiple filters using a generalized SWAP circuit. Compared to prior quantum convolutional approaches, our method eliminates redundant preparation costs, scales logarithmically with input size under sparsity, and enables direct integration into hybrid quantum-classical machine learning pipelines. This work provides a scalable and physically realizable pathway toward quantum-enhanced feature extraction, opening up new possibilities for quantum convolutional neural networks and data-driven quantum inference.
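A minimal classical sketch of the identity the quantum circuit exploits: each convolution output is an inner product between a localized image patch and the flattened filter (equivalently, a structured matrix multiplication), which is the quantity the SWAP test estimates up to normalization. The quantum state preparation and QRAM encoding are not reproduced here.

import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(6)
image = rng.standard_normal((6, 6))
kernel = rng.standard_normal((3, 3))
kh, kw = kernel.shape
oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1

# im2col: every (kh x kw) patch becomes one row of a structured (Toeplitz-like) matrix
patches = np.array([image[i:i + kh, j:j + kw].ravel()
                    for i in range(oh) for j in range(ow)])
out_matmul = (patches @ kernel.ravel()).reshape(oh, ow)   # one inner product per output pixel

out_direct = correlate2d(image, kernel, mode="valid")
print("max difference:", float(np.abs(out_matmul - out_direct).max()))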
Dimensional analysis provides a universal framework for reducing physical complexity and revealing inherent laws. However, its application to high-dimensional systems still generates redundant dimensionless parameters, making it challenging to establish physically meaningful descriptions. Here, we introduce Hierarchical Dimensionless Learning (Hi-{\pi}), a physics-data hybrid-driven method that combines dimensional analysis and symbolic regression to automatically discover key dimensionless parameter combination(s). We applied this method to classic examples from several areas of fluid mechanics. For Rayleigh-B\'enard convection, the method accurately extracts the two intrinsic dimensionless parameters, the Rayleigh number and the Prandtl number, validating its unified representation advantage across multiscale data. For viscous flow in a circular pipe, it automatically discovers two optimal dimensionless parameters, the Reynolds number and the relative roughness, achieving a balance between accuracy and complexity. For the compressibility correction in subsonic flow, it effectively recovers the classic compressibility correction formulation, while demonstrating its capability to discover hierarchical structural expressions through optimal parameter transformations.
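A minimal sketch of the dimensional-analysis layer for the pipe-flow example: the dimension matrix of the candidate variables has a two-dimensional null space, and the exponent vectors of the Reynolds number and the relative roughness both lie in it. The symbolic-regression stage of Hi-{\pi} that selects and combines such groups is not reproduced here.

import numpy as np

#                  rho   U    D   mu  eps
A = np.array([[     1,   0,   0,   1,   0],    # mass
              [    -3,   1,   1,  -1,   1],    # length
              [     0,  -1,   0,  -1,   0]])   # time

n_groups = A.shape[1] - np.linalg.matrix_rank(A)   # Buckingham Pi theorem: 5 - 3 = 2
print("independent dimensionless groups:", n_groups)

reynolds = np.array([1, 1, 1, -1, 0])     # exponents of rho U D / mu
roughness = np.array([0, 0, -1, 0, 1])    # exponents of eps / D
print("A @ Re exponents   :", A @ reynolds)    # all zeros => dimensionless
print("A @ eps/D exponents:", A @ roughness)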
We analyze the influence of Liouvillian exceptional points (LEPs) in the three-level quantum absorption refrigerator, with emphasis on the non-equilibrium process before convergence to the steady state. We search for second-order and third-order LEPs in the system for two types of couplings. Focusing on the third-order LEPs, we analyze the long-time damping of the system state both analytically and numerically. In addition, we analyze the damping of the heat currents and the influence of the non-equilibrium process on the heat extraction from the cold bath. Critical damping of both the system state and the heat currents is achieved at the LEPs, implying the fastest convergence to the steady state. During the non-equilibrium process, we find that, at the third-order LEP, more heat is transferred from the cold bath to the hot bath at a lower energy cost from the work bath, leading to better performance of the refrigerator.
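A minimal sketch of the numerical LEP search, assuming for brevity a driven, decaying two-level system rather than the paper's three-level absorption refrigerator: build the Lindblad superoperator, scan a coupling, and locate where two Liouvillian eigenvalues approach coalescence (checking eigenvector coalescence, which distinguishes an EP from an ordinary degeneracy, works the same way via np.linalg.eig).

import numpy as np

def liouvillian(H, jumps):
    """Lindblad superoperator in the row-stacking vectorization."""
    d = H.shape[0]
    I = np.eye(d)
    L = -1j * (np.kron(H, I) - np.kron(I, H.T))
    for c in jumps:
        cd_c = c.conj().T @ c
        L += np.kron(c, c.conj()) - 0.5 * (np.kron(cd_c, I) + np.kron(I, cd_c.T))
    return L

gamma = 1.0
sm = np.array([[0, 1], [0, 0]], dtype=complex)       # lowering operator
sx = np.array([[0, 1], [1, 0]], dtype=complex)
omegas = np.linspace(0.15, 0.35, 401)                # drive-strength scan
gaps = []
for omega in omegas:
    H = 0.5 * omega * sx
    ev = np.linalg.eigvals(liouvillian(H, [np.sqrt(gamma) * sm]))
    ev = np.sort_complex(ev)[:-1]                    # drop the steady-state zero mode
    dist = np.abs(ev[:, None] - ev[None, :]) + 1e9 * np.eye(len(ev))
    gaps.append(dist.min())                          # smallest eigenvalue separation
print("eigenvalues closest to coalescing at omega =",
      round(float(omegas[int(np.argmin(gaps))]), 3))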
Particle physics experiments rely on the (generalised) likelihood ratio test (LRT) for searches and measurements, which are composite hypothesis tests. However, this test is not guaranteed to be optimal, as the Neyman-Pearson lemma pertains only to simple hypothesis tests. Any choice of test statistic thus implicitly determines how statistical power varies across the parameter space. An improvement in the core statistical testing methodology for general settings with composite tests would have widespread ramifications across experiments. We discuss an alternative test statistic that gives the analyst the ability to focus the power of the test on physics-motivated regions of the parameter space. We demonstrate the improvement of this technique over the LRT on a Higgs $\rightarrow\tau\tau$ dataset simulated by the ATLAS experiment and a dark matter dataset inspired by the LZ experiment. We also employ machine learning to efficiently perform the Neyman construction, which is essential to ensure statistically valid confidence intervals.
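A minimal sketch of a toy Neyman construction with the standard LRT statistic for a Poisson counting experiment with known background; the power-focused alternative statistic proposed in the paper and the ML-accelerated construction are not reproduced here, this only illustrates the baseline procedure being improved upon.

import numpy as np
from scipy.stats import poisson

b = 10.0                                             # known expected background

def q_stat(n, mu):
    """-2 ln L(mu) / L(mu_hat) for observed count n and signal strength mu."""
    n = np.asarray(n, dtype=float)
    mu_hat = np.maximum(n - b, 0.0)                  # MLE of the signal, bounded at zero
    return -2.0 * (poisson.logpmf(n, mu + b) - poisson.logpmf(n, mu_hat + b))

rng = np.random.default_rng(7)
n_obs = 18
interval = []
for mu in np.linspace(0, 20, 81):
    toys = rng.poisson(mu + b, size=20000)           # toy experiments under this hypothesis
    crit = np.quantile(q_stat(toys, mu), 0.90)       # 90% CL critical value of the statistic
    if q_stat(n_obs, mu) <= crit:                    # mu is accepted -> inside the interval
        interval.append(mu)
print(f"90% CL interval for the signal strength: [{min(interval):.2f}, {max(interval):.2f}]")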
This Brief Communication introduces a graph-neural-network architecture built on geometric vector perceptrons to predict the committor function directly from atomic coordinates, bypassing the need for hand-crafted collective variables (CVs). The method offers atom-level interpretability, pinpointing the key atomic players in complex transitions without relying on prior assumptions. Applied across diverse molecular systems, the method accurately infers the committor function and highlights the importance of each heavy atom in the transition mechanism. It also yields precise estimates of the rate constants for the underlying processes. The proposed approach opens new avenues for understanding and modeling complex dynamics, by enabling CV-free learning and automated identification of physically meaningful reaction coordinates of complex molecular processes.
We theoretically propose a symmetric encryption scheme based on Restricted Boltzmann Machines that functions as a probabilistic Enigma device, encoding information in the marginal distributions of visible states while using bias permutations as cryptographic keys. Theoretical analysis reveals significant advantages, including factorial key-space growth through permutation matrices, excellent diffusion properties, and computational complexity rooted in #P-complete problems that resist quantum attacks. Compatible with emerging probabilistic computing hardware, the scheme establishes an asymmetric computational barrier where legitimate users decrypt efficiently while adversaries face exponential costs. This framework unlocks the potential of probabilistic computers for cryptographic systems, offering an emerging encryption paradigm between classical and quantum regimes for post-quantum security.