Linear Programming (LP) is widely applied in industry and is a key component of various other mathematical problem-solving techniques. Recent work introduced an LP compiler that translates polynomial-time, polynomial-space algorithms into polynomial-size LPs using intuitive high-level programming languages, offering a promising alternative to manually specifying each set of constraints through Algebraic Modeling Languages (AMLs). However, the resulting LPs, while polynomial in size, are often extremely large, posing challenges for existing LP solvers. In this paper, we propose a novel approach for generating substantially smaller LPs from algorithms. Our goal is to establish minimum-size compact LP formulations for problems in P having natural formulations with exponential extension complexity. Our broader vision is to enable the systematic generation of Compact Integer Programming (CIP) formulations for problems with exponential-size IPs having polynomial-time separation oracles. To this end, we introduce a hierarchical linear pipelining technique that decomposes nested program structures into synchronized regions with well-defined execution transitions -- functions of compile-time parameters. This decomposition allows us to localize LP constraints and variables within each region, significantly reducing LP size without loss of generality: the resulting LP remains valid for all inputs of size $n$. We demonstrate the effectiveness of our method on two benchmark problems -- the makespan problem, which has exponential extension complexity, and the weighted minimum spanning tree problem -- both of which have exponential-size natural LPs. Our results show up to a $25$-fold reduction in LP size and substantial improvements in solver performance across both commercial and non-commercial LP solvers.
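As a point of reference for why natural LPs can be exponentially large, one classical example (not necessarily the encoding used in the paper) is the subtour-elimination description of the spanning tree polytope for the weighted minimum spanning tree problem:

$$\min \sum_{e\in E} w_e x_e \quad\text{s.t.}\quad \sum_{e\in E} x_e = |V|-1,\qquad \sum_{e\in E(S)} x_e \le |S|-1 \;\;\forall\, S\subsetneq V,\ |S|\ge 2,\qquad x_e \ge 0,$$

where $E(S)$ denotes the edges with both endpoints in $S$. The second family has one constraint per vertex subset and is therefore exponential in $|V|$; a compact (extended) formulation with auxiliary variables avoids exactly this blow-up.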
Solving Bayesian inverse problems typically involves deriving a posterior distribution using Bayes' rule, followed by sampling from this posterior for analysis. Sampling methods, such as general-purpose Markov chain Monte Carlo (MCMC), are commonly used, but they require the prior and likelihood densities to be explicitly provided. In cases where expressing the prior explicitly is challenging, implicit priors offer an alternative, encoding prior information indirectly. These priors have attracted increasing interest in recent years, with methods like Plug-and-Play (PnP) priors and Regularized Linear Randomize-then-Optimize (RLRTO) providing computationally efficient alternatives to standard MCMC algorithms. However, the abstract concept of implicit priors for Bayesian inverse problems has yet to be systematically explored, and little effort has been made to unify the different kinds of implicit priors. This paper presents a computational framework for implicit priors and clarifies how they differ from explicit priors. We also introduce an implementation of various implicit priors within the CUQIpy Python package for Computational Uncertainty Quantification in Inverse Problems. Using this implementation, we showcase several implicit prior techniques by applying them to a variety of inverse problems, from image processing to parameter estimation in partial differential equations.
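The Plug-and-Play idea mentioned above can be illustrated with a minimal sketch: in an ADMM splitting for a linear inverse problem $y = Ax + \text{noise}$, the proximal step associated with the prior is replaced by an off-the-shelf denoiser, so the prior is only ever specified implicitly through that denoiser. The NumPy/SciPy sketch below uses a toy 1D deblurring problem and a Gaussian-smoothing denoiser as a stand-in for a learned one; it is a generic illustration, not the CUQIpy API.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)

# Toy 1D deblurring problem: y = A x + noise, with A a local averaging operator.
n = 200
x_true = np.zeros(n); x_true[60:90] = 1.0; x_true[120:160] = 0.5
A = np.zeros((n, n))
for i in range(n):
    A[i, max(0, i - 3):i + 4] = 1.0 / 7.0
y = A @ x_true + 0.01 * rng.standard_normal(n)

def denoise(v, strength=2.0):
    """Implicit prior: any denoiser can be plugged in here."""
    return gaussian_filter1d(v, sigma=strength)

# Plug-and-Play ADMM: the data-fit prox is a linear solve, the prior prox is the denoiser.
rho = 1.0
x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
AtA, Aty = A.T @ A, A.T @ y
M = AtA + rho * np.eye(n)
for _ in range(50):
    x = np.linalg.solve(M, Aty + rho * (z - u))   # prox of the data-fidelity term
    z = denoise(x + u)                            # denoiser replaces the prior's prox
    u = u + x - z                                 # dual update

print("relative error:", np.linalg.norm(z - x_true) / np.linalg.norm(x_true))
```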
Signature-based methods have recently gained significant traction in machine learning for sequential data. In particular, signature kernels have emerged as powerful discriminators and training losses for generative models on time-series, notably in quantitative finance. However, existing implementations do not scale to the dataset sizes and sequence lengths encountered in practice. We present pySigLib, a high-performance Python library offering optimised implementations of signatures and signature kernels on CPU and GPU, fully compatible with PyTorch's automatic differentiation. Beyond an efficient software stack for large-scale signature-based computation, we introduce a novel differentiation scheme for signature kernels that delivers accurate gradients at a fraction of the runtime of existing libraries.
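For readers new to signatures, the quantity the library computes and differentiates at scale can be shown in a few lines: the depth-2 signature of a piecewise-linear path follows directly from its increments via Chen's relation. The plain NumPy sketch below is an illustration of the object itself, not of pySigLib's interface.

```python
import numpy as np

def signature_level2(path):
    """Depth-2 signature of a piecewise-linear path of shape (length, dim),
    accumulated increment by increment with Chen's relation."""
    increments = np.diff(path, axis=0)
    dim = path.shape[1]
    s1 = np.zeros(dim)              # level 1: total increment
    s2 = np.zeros((dim, dim))       # level 2: iterated integrals
    for delta in increments:
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return s1, s2

# Example: a closed 2D path; the antisymmetric part of s2 is the Levy area,
# which for a closed loop equals the (signed) enclosed area.
t = np.linspace(0.0, 1.0, 100)
path = np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)], axis=1)
s1, s2 = signature_level2(path)
print("level 1:", s1)
print("Levy area:", 0.5 * (s2[0, 1] - s2[1, 0]))
```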
This study presents novel strategies for improving the node-level performance of matrix-free evaluation of continuous and discontinuous Galerkin spatial discretizations on unstructured tetrahedral grids. In our approach, the underlying integrals of a generic finite-element operator are computed cell by cell through numerical quadrature using tabulated dense local matrices of shape functions, achieving high throughput for low to moderate polynomial degrees. By employing dense matrix-matrix products instead of matrix-vector products for the cell-wise interpolation, the method reaches over $60\%$ of peak performance. The optimization strategies exploit explicit data parallelism to enhance computational efficiency, complemented by a hierarchical mesh reordering algorithm that improves data locality. The matrix-free implementation achieves up to a $6\times$ speedup compared to a global sparse matrix-based approach at a polynomial degree of three. The effectiveness of the method is demonstrated through numerical experiments on the Poisson and Navier--Stokes equations. The Poisson operator is preconditioned by a hybrid multigrid scheme that combines auxiliary continuous finite-element spaces with polynomial and geometric coarsening where possible, and employs algebraic multigrid on the coarse mesh. Within the preconditioner, the implementation transitions between the matrix-free and matrix-based strategies for optimal efficiency. Finally, we analyze the strong scaling behavior of the Poisson and Helmholtz operators, demonstrating the method's potential to solve large real-world problems.
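The core kernel described above, interpolating cell-local solution values to quadrature points with tabulated shape-function matrices and integrating back, can be sketched for a mass operator as a batched dense product. The NumPy snippet below is a hypothetical setup for illustration only (random data, arbitrary sizes), not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cells, n_dofs, n_quad = 1000, 10, 14     # e.g. quadratic tets: 10 local DoFs
N  = rng.random((n_quad, n_dofs))          # tabulated shape functions at quadrature points
wJ = rng.random((n_cells, n_quad))         # quadrature weights times Jacobian determinants
u  = rng.random((n_cells, n_dofs))         # cell-local input vector (already gathered)

# Matrix-free mass-operator application expressed as batched dense products:
# v_cell = N^T diag(wJ_cell) N u_cell for every cell at once.
u_quad = u @ N.T                           # interpolate DoFs to quadrature points
v = (wJ * u_quad) @ N                      # multiply by weights and integrate back

# Slow reference using explicit local matrices, for verification.
v_ref = np.stack([N.T @ np.diag(wJ[c]) @ N @ u[c] for c in range(n_cells)])
print(np.allclose(v, v_ref))
```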
This article introduces HYLU, a hybrid parallel LU factorization-based general-purpose solver designed for efficiently solving sparse linear systems (Ax=b) on multi-core shared-memory architectures. The key technical feature of HYLU is its integration of hybrid numerical kernels, allowing it to adapt to the varying sparsity patterns of coefficient matrices. Tests on 34 sparse matrices from the SuiteSparse Matrix Collection reveal that HYLU outperforms Intel MKL PARDISO in the numerical factorization phase by geometric-mean speedups of 1.74X (one-time solving) and 2.26X (repeated solving). HYLU can be downloaded from https://github.com/chenxm1986/hylu.
The database community lacks a unified relational query language for subset selection and optimisation queries, limiting both user expression and query-optimiser reasoning about such problems. Decades of research (latterly under the rubric of prescriptive analytics) have produced powerful evaluation algorithms with incompatible, ad-hoc SQL extensions that specify and filter through distinct mechanisms. We present the first unified algebraic foundation for these queries, introducing relational exponentiation to complete the fundamental algebraic operations alongside union (addition) and cross product (multiplication). First, we extend relational algebra to complete domain relations, i.e. relations defined by characteristic functions rather than explicit extensions, achieving the expressiveness of NP-complete/hard problems while simultaneously providing query safety for finite inputs. Second, we introduce solution sets, a higher-order relational algebra over sets of relations that naturally expresses search spaces as functions f: Base to Decision, yielding |Decision|^|Base| candidate relations. Third, we provide structure-preserving translation semantics from solution sets to standard relational algebra, enabling mechanical translation to existing evaluation algorithms. This framework achieves the expressiveness of the most powerful prior approaches while providing the theoretical clarity and compositional properties absent in previous work. We demonstrate the capabilities these algebras open up through a polymorphic SQL in which standard clauses seamlessly express data management, subset selection, and optimisation queries within a single paradigm.
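The solution-set idea, a search space consisting of all functions f: Base to Decision, can be made concrete with a tiny enumeration: a base relation of 3 rows and the binary decision domain {out, in} yields 2^3 = 8 candidate sub-relations, over which a constraint filters and an objective optimises. The Python sketch below is an illustration of the semantics, not the proposed SQL surface syntax.

```python
from itertools import product

# Base relation: candidate items with weight and value attributes.
base = [("a", 3, 10), ("b", 4, 12), ("c", 2, 7)]
decisions = (0, 1)                       # Decision domain: out / in

# Solution set: every function f : Base -> Decision, i.e. |Decision|^|Base| candidates.
candidates = [
    [row for row, keep in zip(base, assignment) if keep]
    for assignment in product(decisions, repeat=len(base))
]
print(len(candidates))                   # 2**3 = 8 candidate relations

# A subset-selection query: maximise total value subject to total weight <= 6.
feasible = [r for r in candidates if sum(w for _, w, _ in r) <= 6]
best = max(feasible, key=lambda r: sum(v for _, _, v in r))
print(best)                              # [("b", 4, 12), ("c", 2, 7)]
```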
Basic computer arithmetic operations, such as $+$, $\times$, or $\div$, are correctly rounded, whilst mathematical functions such as $e^x$, $\ln(x)$, or $\sin(x)$ in general are not, meaning that different implementations may return different results for exactly the same input, and that their accuracy may differ. We present a methodology and a software tool suited for exhaustive and non-exhaustive testing of the mathematical functions of Julia in various floating-point formats. The software tool is useful to users of Julia, to quantify the accuracy of the mathematical functions and to interpret possible effects of errors on scientific computing codes that depend on these functions. It is also useful to the developers and maintainers of the functions in Julia Base, to test modifications to existing functions and to test the accuracy of new functions. The software (a test bench) is designed to be easy to set up for running the accuracy tests in automatic regression testing. Our focus is to provide software that is user-friendly and avoids the need for specialised knowledge of floating-point arithmetic or the inner workings of mathematical functions; users only need to supply a list of formats, choose the rounding modes, and specify the input-space search strategies based on how long they can afford the testing to run. We have used the test bench to determine the errors of a subset of mathematical functions in the latest version of Julia, for the binary16, binary32, and binary64 IEEE 754 floating-point formats, and found errors of $0.49$ to $0.51$ ULPs in binary16, and $0.5$ to $2.4$ ULPs in binary32 and binary64. The functions that may be correctly rounded (error of $0.5$ ULP) in all three formats are sqrt and cbrt. The following functions may be correctly rounded only for binary16: sinh, asin, cospi, sinpi, atanh, log2, tanh.
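The core measurement in such a test bench, comparing a function value in a given binary format against a high-precision reference and reporting the difference in ULPs of that format, can be sketched generically. The Python snippet below uses mpmath as the reference and NumPy's float32 exp as the function under test; it only illustrates the methodology and is not the Julia tool described above.

```python
import numpy as np
from mpmath import mp, mpf, exp as mp_exp

mp.prec = 200                    # high-precision reference

def ulp_error_binary32(impl, ref, x):
    """Error of impl(x) in ULPs of binary32, measured against a
    high-precision reference evaluated at the exact binary32 input."""
    x32 = np.float32(x)
    y_impl = np.float32(impl(x32))
    y_ref = ref(mpf(float(x32)))                     # exact input, high-precision output
    ulp = np.spacing(np.float32(abs(float(y_ref))))  # size of one ULP near the reference
    return abs(float(mpf(float(y_impl)) - y_ref)) / float(ulp)

# Example: sample random inputs for float32 exp and report the worst observed error.
rng = np.random.default_rng(0)
xs = rng.uniform(-10.0, 10.0, size=10_000)
worst = max(ulp_error_binary32(np.exp, mp_exp, x) for x in xs)
print(f"max observed error: {worst:.3f} ULP")
```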
We present a GPU implementation for the factorization and solution of block-tridiagonal symmetric positive definite linear systems, which commonly arise in time-dependent estimation and optimal control problems. Our method employs a recursive algorithm based on Schur complement reduction, transforming the system into a hierarchy of smaller, independent blocks that can be efficiently solved in parallel using batched BLAS/LAPACK routines. While batched routines have been used in sparse solvers, our approach applies these kernels in a tailored way by exploiting the block-tridiagonal structure known in advance. Performance benchmarks based on our open-source, cross-platform implementation, TBD-GPU, demonstrate the advantages of this tailored utilization: achieving substantial speed-ups compared to state-of-the-art CPU direct solvers, including CHOLMOD and HSL MA57, while remaining competitive with NVIDIA cuDSS. However, the current implementation still issues batched-routine calls sequentially at each recursion level, and the block size must be sufficiently large to adequately amortize kernel launch overhead.
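One level of the Schur-complement reduction at the heart of such a recursion can be illustrated on a small block-tridiagonal system: the odd-indexed blocks do not couple to each other, so eliminating them requires only independent per-block solves (the part that maps naturally to batched kernels) and leaves a smaller block-tridiagonal system in the even-indexed blocks. The NumPy sketch below verifies this single reduction step against a dense solve; it is an illustration of the principle, not the TBD-GPU code.

```python
import numpy as np

rng = np.random.default_rng(0)
nb, bs = 8, 4                                   # number of blocks, block size
n = nb * bs

# Symmetric block-tridiagonal matrix (strongly diagonally dominant, hence SPD here).
A = np.zeros((n, n))
for i in range(nb):
    s = slice(i * bs, (i + 1) * bs)
    A[s, s] = 20.0 * np.eye(bs)
    if i + 1 < nb:
        t = slice((i + 1) * bs, (i + 2) * bs)
        B = rng.standard_normal((bs, bs))
        A[t, s], A[s, t] = B, B.T
b = rng.standard_normal(n)

# One level of reduction: eliminate odd-indexed blocks, keep even-indexed blocks.
odd  = np.concatenate([np.arange(i * bs, (i + 1) * bs) for i in range(1, nb, 2)])
even = np.concatenate([np.arange(i * bs, (i + 1) * bs) for i in range(0, nb, 2)])
Aoo, Aoe = A[np.ix_(odd, odd)], A[np.ix_(odd, even)]
Aeo, Aee = A[np.ix_(even, odd)], A[np.ix_(even, even)]
# Aoo is block diagonal, so these solves split into independent per-block solves.
S = Aee - Aeo @ np.linalg.solve(Aoo, Aoe)       # reduced, again block-tridiagonal, system
x = np.zeros(n)
x[even] = np.linalg.solve(S, b[even] - Aeo @ np.linalg.solve(Aoo, b[odd]))
x[odd]  = np.linalg.solve(Aoo, b[odd] - Aoe @ x[even])
print(np.allclose(x, np.linalg.solve(A, b)))
```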
For a datastream, the change over a short interval is often of low rank. For high-throughput information arranged in matrix format, recomputing an optimal SVD approximation after each step is typically prohibitive. Instead, incremental and truncated updating strategies are used, which may not scale for large truncation ranks. We therefore propose a set of efficient new algorithms that update a bidiagonal factorization and are comparable in accuracy to the SVD methods. In particular, we develop a compact Householder-type algorithm that decouples a sparse part from a low-rank update and has about half the memory requirements of standard bidiagonalization methods. A second algorithm based on Givens rotations requires only about 10 flops per rotation and scales quadratically with the problem size, compared to a typical cubic scaling. The algorithm is therefore effective for processing high-throughput updates, as we demonstrate in tracking large subspaces of recommendation systems and networks, and in comparisons with well-known software such as LAPACK and the incremental SVD.
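A flavour of Givens-based updating can be given with the closely related row-update of a QR factorisation: a newly arrived data row is folded into an existing triangular factor with one rotation per column, each rotation touching only two rows. In the bidiagonal setting of the paper each rotation additionally touches only a constant number of entries, which is where the roughly 10 flops per rotation come from. The NumPy sketch below is a simplified illustration of the updating mechanism, not the proposed algorithm.

```python
import numpy as np

def append_row(R, new_row):
    """Update the upper-triangular factor R of some data matrix X after a new
    row is appended, using one Givens rotation per column."""
    m = R.shape[1]
    W = np.vstack([R, new_row.astype(float)])
    for j in range(m):
        a, b = W[j, j], W[-1, j]
        r = np.hypot(a, b)
        if r == 0.0:
            continue
        c, s = a / r, b / r                              # rotation that zeroes W[-1, j]
        W[[j, -1], j:] = np.array([[c, s], [-s, c]]) @ W[[j, -1], j:]
    return W[:m]                                         # last row is now (numerically) zero

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
R = np.linalg.qr(X, mode="r")
x_new = rng.standard_normal(4)
R_new = append_row(R, x_new)
X_new = np.vstack([X, x_new])
# Triangular factors agree up to row signs, so compare the Gram matrices.
print(np.allclose(R_new.T @ R_new, X_new.T @ X_new))
```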
DEViaN-LM is an R package for detecting values that are poorly explained by a Gaussian linear model. The procedure is based on the maximum of the absolute values of the studentized residuals, a statistic whose distribution is free of the model parameters. This approach makes it possible to generalize several procedures used to detect abnormal values during longitudinal monitoring of certain biological markers. In this article, we describe the method and show how to apply it to several real datasets.
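The underlying check, flagging the observation with the largest absolute (externally) studentized residual in a Gaussian linear model, can be reproduced generically with statsmodels. This is a plain Python illustration of the statistic, not the DEViaN-LM package itself, and the data below are synthetic.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, n)
y[17] += 6.0                                    # plant one poorly explained value

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
student = fit.get_influence().resid_studentized_external
worst = int(np.argmax(np.abs(student)))
print(worst, student[worst])                    # index 17 should stand out

# Bonferroni-adjusted p-values for the same test are available directly:
print(fit.outlier_test().iloc[worst])
```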
One might argue that solving a trajectory optimization problem over a million grid points is preposterous. How about solving such a problem in remarkably little computation time? On a small form-factor processor? This paper presents the algorithmic details that make this trifecta of breakthroughs possible. The computational mathematics that deliver these advancements are: (i) a Birkhoff-theoretic discretization of optimal control problems, (ii) matrix-free linear algebra leveraging Krylov-subspace methods, and (iii) a near-perfect Birkhoff preconditioner that helps achieve $\mathcal{O}(1)$ iteration speed with respect to the grid size $N$. A key enabler of this high performance is the computation of Birkhoff matrix-vector products in $\mathcal{O}(N\log(N))$ time using fast Fourier transform techniques that eliminate traditional computational bottlenecks. A numerical demonstration of this unprecedented scale and speed is illustrated for a practical astrodynamics problem.
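The combination of matrix-free Krylov iterations with FFT-accelerated matrix-vector products can be illustrated on a structured stand-in: a circulant operator applied in $\mathcal{O}(N\log N)$ via the FFT and handed to a Krylov solver as a LinearOperator, so the matrix is never formed. The Birkhoff matrices of the paper are different objects, so this SciPy sketch only demonstrates the general mechanism.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 1 << 16                                     # 65,536 unknowns, matrix never formed
rng = np.random.default_rng(0)

# A symmetric positive definite circulant operator, defined by its first column.
first_col = np.zeros(n)
first_col[0], first_col[1], first_col[-1] = 2.5, -1.0, -1.0
eigs = np.fft.fft(first_col)                    # circulants are diagonalised by the FFT

def matvec(x):
    # O(N log N) product: transform, scale by the eigenvalues, transform back.
    return np.real(np.fft.ifft(eigs * np.fft.fft(x)))

A = LinearOperator((n, n), matvec=matvec, dtype=float)
b = rng.standard_normal(n)
x, info = cg(A, b)                              # matrix-free conjugate gradients
print(info, np.linalg.norm(matvec(x) - b) / np.linalg.norm(b))
```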
The Lean mathematical library Mathlib is one of the fastest-growing libraries of formalised mathematics. We describe various strategies to manage this growth, while allowing for change and avoiding maintainer overload. This includes dealing with breaking changes via a deprecation system, using code quality analysis tools (linters) to provide direct user feedback about common pitfalls, speeding up compilation times through conscious library (re-)design, dealing with technical debt as well as writing custom tooling to help with the review and triage of new contributions.
Accurate prediction of urban wind flow is essential for urban planning, pedestrian safety, and environmental management. Yet, it remains challenging due to uncertain boundary conditions and the high cost of conventional CFD simulations. This paper presents the use of the modular and efficient uncertainty quantification (UQ) framework OpenLB-UQ for urban wind flow simulations. We specifically use the lattice Boltzmann method (LBM) coupled with a stochastic collocation (SC) approach based on generalized polynomial chaos (gPC). The framework introduces a relative-error noise model for inflow wind speeds based on real measurements. The model is propagated through a non-intrusive SC LBM pipeline using sparse-grid quadrature. Key quantities of interest, including mean flow fields, standard deviations, and vertical profiles with confidence intervals, are efficiently computed without altering the underlying deterministic solver. We demonstrate this on a real urban scenario, highlighting how uncertainty localizes in complex flow regions such as wakes and shear layers. The results show that the SC LBM approach provides accurate, uncertainty-aware predictions with significant computational efficiency, making OpenLB-UQ a practical tool for real-time urban wind analysis.
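The non-intrusive propagation of the relative-error inflow noise can be sketched independently of the flow solver: the uncertain inflow speed is written as $u_\infty(\xi) = \bar{u}\,(1 + \sigma\xi)$ with a standard normal $\xi$, the deterministic model is evaluated at Gauss-Hermite collocation nodes, and means and standard deviations are recovered from the quadrature weights. The toy quantity of interest below merely stands in for a full lattice Boltzmann run; this is a generic illustration, not the OpenLB-UQ API.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

u_mean, sigma = 8.0, 0.10          # measured inflow speed [m/s], 10 % relative error

def model(u_inflow):
    """Stand-in for one deterministic solver run; returns a scalar quantity
    of interest (here a fictitious pedestrian-level speed)."""
    return 0.35 * u_inflow + 0.02 * u_inflow**2

# Stochastic collocation: Gauss-Hermite nodes/weights for a standard normal xi.
nodes, weights = hermegauss(7)
weights = weights / np.sqrt(2.0 * np.pi)          # normalise to a probability measure

samples = np.array([model(u_mean * (1.0 + sigma * xi)) for xi in nodes])
qoi_mean = weights @ samples
qoi_std = np.sqrt(weights @ (samples - qoi_mean) ** 2)
print(f"QoI mean = {qoi_mean:.3f}, std = {qoi_std:.3f}")
```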
High-level programming languages and GPU accelerators are powerful enablers for a wide range of applications. Achieving scalable vertical (within a compute node), horizontal (across compute nodes), and temporal (over different generations of hardware) performance while retaining productivity requires effective abstractions. Distributed arrays are one such abstraction that enables high-level programming to achieve highly scalable performance. Distributed arrays achieve this performance by deriving parallelism from data locality, which naturally leads to high memory-bandwidth efficiency. This paper explores distributed array performance using the STREAM memory bandwidth benchmark on a variety of hardware. Scalable performance is demonstrated within and across CPU cores, CPU nodes, and GPU nodes. Horizontal scaling across multiple nodes was linear. The hardware used spans decades and allows a direct comparison of hardware improvements in memory bandwidth over this time range, showing a 10x increase in CPU core bandwidth over 20 years, a 100x increase in CPU node bandwidth over 20 years, and a 5x increase in GPU node bandwidth over 5 years. Running on hundreds of MIT SuperCloud nodes simultaneously achieved a sustained bandwidth $>$1 PB/s.
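The STREAM triad kernel behind these numbers is simple enough to state in a few lines. The single-process NumPy version below only illustrates what is being measured (bytes moved per kernel invocation, best of several repetitions, following the usual STREAM convention); it says nothing about the distributed-array machinery itself.

```python
import numpy as np
import time

n = 50_000_000                        # ~0.4 GB per array in float64
a = np.zeros(n)
b = np.full(n, 1.0)
c = np.full(n, 2.0)
alpha = 3.0

best = float("inf")
for _ in range(5):                    # STREAM convention: report the best repetition
    t0 = time.perf_counter()
    np.add(b, alpha * c, out=a)       # triad: a = b + alpha * c
    best = min(best, time.perf_counter() - t0)

# STREAM counts traffic for the three named arrays (read b, read c, write a);
# the NumPy temporary for alpha * c adds extra traffic, so this is a lower bound.
bytes_moved = 3 * 8 * n
print(f"triad bandwidth: {bytes_moved / best / 1e9:.1f} GB/s")
```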
We present a Julia-based interface to the precompiled HALLaR and cuHALLaR binaries for large-scale semidefinite programs (SDPs). Both solvers are established as fast and numerically stable, and accept problem data both in SDPA-compatible formats and in a new enhanced data format that takes advantage of Hybrid Sparse Low-Rank (HSLR) structure. The interface allows users to load custom data files, configure solver options, and execute experiments directly from Julia. A collection of example problems is included, among them the SDP relaxations of the Matrix Completion and Maximum Stable Set problems.
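For orientation, one of the bundled examples, the SDP relaxation of the Maximum Stable Set problem (the Lovász theta function), has the standard form

$$\vartheta(G) \;=\; \max_{X \succeq 0} \ \langle J, X\rangle \quad\text{s.t.}\quad \operatorname{tr}(X) = 1,\qquad X_{ij} = 0 \ \ \forall\, \{i,j\} \in E,$$

where $J$ is the all-ones matrix. The objective matrix is rank one and each edge constraint is sparse, which is the kind of hybrid sparse plus low-rank problem data that the HSLR format is presumably intended to capture; the exact encoding used by the interface may differ.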
Uncertainty quantification (UQ) is crucial in computational fluid dynamics to assess the reliability and robustness of simulations, given the uncertainties in input parameters. OpenLB is an open-source lattice Boltzmann method library designed for efficient and extensible simulations of complex fluid dynamics on high-performance computers. In this work, we leverage the efficiency of OpenLB for large-scale flow sampling with a dedicated and integrated UQ module. To this end, we focus on non-intrusive stochastic collocation methods based on generalized polynomial chaos and Monte Carlo sampling. The OpenLB-UQ framework is extensively validated in convergence tests with respect to statistical metrics and sample efficiency using selected benchmark cases, including two-dimensional Taylor--Green vortex flows with up to four-dimensional uncertainty and a flow past a cylinder. Our results confirm the expected convergence rates and show promising scalability, demonstrating robust statistical accuracy as well as computational efficiency. OpenLB-UQ enhances the capability of the OpenLB library, offering researchers a scalable framework for UQ in incompressible fluid flow simulations and beyond.
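Alongside collocation, the Monte Carlo sampling supported by such a module is simple to state: draw input samples, run the deterministic solver once per sample, and average, with a statistical error on the mean decaying like $1/\sqrt{M}$. The toy Python stand-in below illustrates only this estimator, not OpenLB code.

```python
import numpy as np

rng = np.random.default_rng(42)

def model(nu):
    """Stand-in for one deterministic solver run with uncertain viscosity nu."""
    return 1.0 / (1.0 + 10.0 * nu)

M = 10_000
nu_samples = rng.uniform(0.01, 0.02, M)          # uncertain input parameter
qoi = np.array([model(nu) for nu in nu_samples])

mean = qoi.mean()
std = qoi.std(ddof=1)
ci_half = 1.96 * std / np.sqrt(M)                # 95 % confidence interval on the mean
print(f"mean = {mean:.5f} +/- {ci_half:.5f}, std = {std:.5f}")
```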
Model Predictive Control (MPC) is a powerful technique for controlling nonlinear, multi-input multi-output systems subject to input and state constraints. It is now a standard tool for trajectory tracking control of automated vehicles and, as such, has been used in many research and development projects. However, MPC faces several challenges to its integration into industrial production vehicles, the most important being its high computational demands and the complexity of implementation. The software package AutoMPC aims to address both of these challenges. It builds on a robustified version of an active-set algorithm for nonlinear MPC. The algorithm is embedded into a framework for vehicle trajectory tracking, which makes it easy to use, yet highly customizable. Automatic code generation transforms the user's selections into a standalone, computationally efficient C-code file with static memory allocation. As such, it can be readily deployed on a wide range of embedded platforms, e.g., based on Matlab/Simulink or the Robot Operating System (ROS). Compared to a previous version of the code, the vehicle model and the numerical integration method can now be specified manually, in addition to the basic algorithm parameters. All of this information and all specifications are baked directly into the generated C code. The algorithm is suitable for driving scenarios at low or high speeds, even drifting, and supports direction changes. Multiple simulation scenarios show the versatility and effectiveness of the AutoMPC code, with the guarantee of a feasible solution, a high degree of robustness, and computational efficiency.
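To make the MPC ingredients concrete, the sketch below solves a small linear, discrete-time tracking problem with input and state constraints using cvxpy and a double-integrator model. AutoMPC itself solves a nonlinear problem with a robustified active-set method and generates C code, so this is only a didactic stand-in with invented parameter values.

```python
import numpy as np
import cvxpy as cp

# Double-integrator vehicle model (position, velocity), discretised at dt = 0.1 s.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])

N = 30                                   # prediction horizon
x0 = np.array([0.0, 0.0])
x_ref = np.array([5.0, 0.0])             # track a position 5 m ahead, at rest

x = cp.Variable((2, N + 1))
u = cp.Variable((1, N))
cost = 0
constraints = [x[:, 0] == x0]
for k in range(N):
    cost += cp.quad_form(x[:, k + 1] - x_ref, np.diag([10.0, 1.0])) + 0.1 * cp.square(u[0, k])
    constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                    cp.abs(u[0, k]) <= 2.0,          # acceleration limit
                    cp.abs(x[1, k + 1]) <= 3.0]      # speed limit
cp.Problem(cp.Minimize(cost), constraints).solve()
print("first control move:", u.value[0, 0])
```

In receding-horizon operation only this first control move is applied before the problem is re-solved at the next sampling instant.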
Quantum circuit simulations play a critical role in bridging the gap between theoretical quantum algorithms and their practical realization on physical quantum hardware, yet they face computational challenges due to the exponential growth of quantum state spaces with the number of qubits. This work presents PennyLane-Lightning MPI, an MPI-based extension of the PennyLane-Lightning suite, developed to enable scalable quantum circuit simulations through parallelization of quantum state vectors and gate operations across distributed-memory systems. The core of this implementation is an index-dependent, gate-specific parallelization strategy, which fully exploits the characteristics of individual gates as well as the locality of computation associated with qubit indices in partitioned state vectors. Benchmarking tests with single gates and well-designed quantum circuits show that the present method offers performance advantages over general methods based on unitary matrix operations and exhibits excellent scalability, supporting simulations of up to 41 qubits with hundreds of thousands of parallel processes. Equipped with a Python plug-in for seamless integration into the PennyLane framework, this work extends the PennyLane ecosystem by enabling high-performance quantum simulations on standard multi-core CPU clusters with no library-specific requirements, providing a back-end resource for the cloud-based quantum computing service framework under development in the Republic of Korea.
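The core state-vector kernel being parallelised, applying a single-qubit gate by pairing amplitudes whose indices differ only in the target-qubit bit, can be written compactly. The NumPy sketch below also hints at why gates on low qubit indices of a partitioned state vector stay within one partition while high-index gates require exchanging amplitudes between partitions; it is an illustration, not the PennyLane-Lightning MPI kernel.

```python
import numpy as np

def apply_single_qubit_gate(state, gate, target, num_qubits):
    """Apply a 2x2 gate to `target` of an n-qubit state vector (qubit 0 is the
    least significant bit). Reshape so the target axis is explicit, then contract."""
    psi = state.reshape([2] * num_qubits)     # axis a corresponds to qubit num_qubits-1-a
    axis = num_qubits - 1 - target
    psi = np.tensordot(gate, psi, axes=([1], [axis]))   # contract gate with that axis
    psi = np.moveaxis(psi, 0, axis)                     # restore the original axis order
    return psi.reshape(-1)

# Example: Hadamard on qubit 0 of a 3-qubit register initialised to |000>.
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
state = np.zeros(8, dtype=complex); state[0] = 1.0
out = apply_single_qubit_gate(state, H, target=0, num_qubits=3)
print(out)      # amplitude 1/sqrt(2) on |000> and |001>

# If the 8 amplitudes were split across two processes (indices 0-3 and 4-7), this
# qubit-0 gate pairs indices (2i, 2i+1) entirely within a partition, whereas a gate
# on qubit 2 pairs (i, i+4) across the boundary and would require communication.
```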