We develop a framework for the operationalization of models and parameters by combining de Finetti's representation theorem with a conditional form of Sanov's theorem. This synthesis, the tilted de Finetti theorem, shows that conditioning exchangeable sequences on empirical moment constraints yields predictive laws in exponential families via the I-projection of a baseline measure. Parameters emerge as limits of empirical functionals, providing a probabilistic foundation for maximum entropy (MaxEnt) principles. This explains why exponential tilting governs likelihood methods and Bayesian updating, connecting naturally to finite-sample concentration rates that anticipate PAC-Bayes bounds. Examples include Gaussian scale mixtures, where symmetry uniquely selects location-scale families, and Jaynes' Brandeis dice problem, where partial information tilts the uniform law. Broadly, the theorem unifies exchangeability, large deviations, and entropy concentration, clarifying the ubiquity of exponential families and MaxEnt's role as the inevitable predictive limit under partial information.
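As a concrete illustration of the tilting mechanism, the sketch below solves Jaynes' Brandeis dice problem numerically: among all distributions on $\{1,\dots,6\}$ with mean $4.5$, the maximum-entropy law is an exponential tilting of the uniform baseline, $p_i \propto e^{\lambda i}$, with $\lambda$ chosen to meet the moment constraint. This is a minimal Python sketch, not code accompanying the paper.

```python
import numpy as np
from scipy.optimize import brentq

# Jaynes' Brandeis dice problem: among all laws on {1,...,6} with mean 4.5,
# the maximum-entropy one is an exponential tilt of the uniform baseline,
# p_i proportional to exp(lam * i).
faces = np.arange(1, 7)

def tilted_mean(lam):
    w = np.exp(lam * faces)          # unnormalized tilted weights
    return (faces * w).sum() / w.sum()

# Solve the moment constraint for the tilting parameter.
lam = brentq(lambda l: tilted_mean(l) - 4.5, -5.0, 5.0)
p = np.exp(lam * faces)
p /= p.sum()
print(lam)   # about 0.371
print(p)     # approx. (0.054, 0.079, 0.114, 0.165, 0.240, 0.347)
```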
We propose a semiparametric framework for causal inference with right-censored survival outcomes and many weak invalid instruments, motivated by Mendelian randomization in biobank studies where classical methods may fail. We adopt an accelerated failure time model and construct a moment condition based on augmented inverse probability of censoring weighting, incorporating both uncensored and censored observations. Under a heteroscedasticity-based condition on the treatment model, we establish point identification of the causal effect despite censoring and invalid instruments. We propose GEL-NOW (Generalized Empirical Likelihood with Non-Orthogonal and Weak moments) for valid inference under these conditions. A divergent number of Neyman orthogonal nuisance functions is estimated using deep neural networks. A key challenge is that the conditional censoring distribution is a non-Neyman orthogonal nuisance, contributing to the first-order asymptotics of the estimator for the target causal effect parameter. We derive the asymptotic distribution and explicitly incorporate this additional uncertainty into the asymptotic variance formula. We also introduce a censoring-adjusted over-identification test that accounts for this variance component. Simulation studies and UK Biobank applications demonstrate the method's robustness and practical utility.
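For orientation, a schematic (and simplified) version of a standard AIPCW moment condition for an AFT-type model reads, with $\tilde T = \min(T, C)$, $\Delta = \mathbf{1}\{T \le C\}$, $G(t \mid X)$ the conditional censoring survival function, $M_C$ the censoring martingale, $g(Z)$ instrument-based weights, and $\psi_\beta$ an AFT residual function (all symbols here are illustrative),
\[
\mathbb{E}\!\left[ g(Z)\left\{ \frac{\Delta\, \psi_\beta(\tilde T, D, X)}{G(\tilde T \mid X)} + \int \frac{\mathbb{E}\big[\psi_\beta \mid T \ge t, X\big]}{G(t \mid X)}\, dM_C(t \mid X) \right\} \right] = 0 ,
\]
where the first term reweights uncensored observations and the augmentation term lets censored observations contribute. The paper's actual moment construction, identification conditions, and handling of many weak invalid instruments are more involved; this display is only a generic template.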
We make three contributions to conformal prediction. First, we propose fuzzy conformal confidence sets that offer a degree of exclusion, generalizing beyond the binary inclusion/exclusion offered by classical confidence sets. We connect fuzzy confidence sets to e-values to show this degree of exclusion is equivalent to an exclusion at different confidence levels, capturing precisely what e-values bring to conformal prediction. We show that a fuzzy confidence set is a predictive distribution with a more appropriate error guarantee. Second, we derive optimal conformal confidence sets by interpreting the minimization of the expected measure of the confidence set as an optimal testing problem against a particular alternative. We use this to characterize exactly in what sense traditional conformal prediction is optimal. Third, we generalize the inheritance of guarantees by subsequent minimax decisions from confidence sets to fuzzy confidence sets. All our results generalize beyond the exchangeable conformal setting to prediction sets for arbitrary models. In particular, we find that any valid test (e-value) for a hypothesis automatically defines a (fuzzy) prediction confidence set.
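For context, the sketch below implements the classical split conformal interval that the fuzzy construction generalizes: each candidate value is either included or excluded, with marginal coverage at least $1-\alpha$ under exchangeability. The toy linear model and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data, split into a proper training part and a calibration part.
x = rng.uniform(0, 1, 200)
y = 2 * x + rng.normal(0, 0.1, 200)
x_fit, y_fit = x[:100], y[:100]
x_cal, y_cal = x[100:], y[100:]

a, b = np.polyfit(x_fit, y_fit, 1)            # simple working model
scores = np.abs(y_cal - (a * x_cal + b))      # nonconformity scores

alpha = 0.1
n = len(scores)
k = int(np.ceil((n + 1) * (1 - alpha)))       # conformal quantile index
q = np.sort(scores)[k - 1]

x_new = 0.5                                   # binary-inclusion prediction set
interval = (a * x_new + b - q, a * x_new + b + q)
print(interval)                               # covers y_new with prob >= 1 - alpha
```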
In this article, we introduce a novel non-parametric predictor, based on conditional expectation, for the unknown diffusion coefficient function $\sigma$ in the stochastic partial differential equation $Lu = \sigma(u)\dot{W}$, where $L$ is a second-order parabolic differential operator and $\dot{W}$ is a suitable Gaussian noise. We prove consistency and derive an upper bound for the error in the $L^p$ norm, in terms of the discretization and smoothing parameters $h$ and $\varepsilon$. We illustrate the applicability of the approach and the role of the parameters with several numerical examples.
In this article, we propose a least squares method for the estimation of the transition density in bifurcating Markov models. Unlike kernel estimation, this method does not rely on a quotient, which can be a source of errors. In order to study the rate of convergence of the least squares estimators, we develop exponential inequalities for empirical processes of bifurcating Markov chains under a bracketing assumption. Unlike in the classical setting, we observe that for bifurcating Markov chains the complexity parameter depends on the ergodicity rate; as a consequence, the convergence rate of our estimator is a function of the ergodicity rate. We conclude with a numerical study to validate our theoretical results.
This paper addresses the statistical estimation of Gaussian Mixture Models (GMMs) with unknown diagonal covariances from independent and identically distributed samples. We employ the Beurling-LASSO (BLASSO), a convex optimization framework that promotes sparsity in the space of measures, to simultaneously estimate the number of components and their parameters. Our main contribution extends the BLASSO methodology to multivariate GMMs with component-specific unknown diagonal covariance matrices, a significantly more flexible setting than previous approaches requiring known and identical covariances. We establish non-asymptotic recovery guarantees with nearly parametric convergence rates for component means, diagonal covariances, and weights, as well as for density prediction. A key theoretical contribution is the identification of an explicit separation condition on mixture components that enables the construction of non-degenerate dual certificates, essential tools for establishing statistical guarantees for the BLASSO. Our analysis leverages the Fisher-Rao geometry of the statistical model and introduces a novel semi-distance adapted to our framework, providing new insights into the interplay between component separation, parameter space geometry, and achievable statistical recovery.
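Schematically, the BLASSO estimates the mixing measure by optimizing over nonnegative measures on the parameter space $\Theta$ (here, means and diagonal covariances): with $\Phi$ a linear operator mapping a measure to the corresponding mixture density (or a sketch of it), $y$ the observed data summary, and $\lambda > 0$ a regularization parameter, a generic form of the program is
\[
\hat\mu \in \operatorname*{arg\,min}_{\mu \in \mathcal{M}_+(\Theta)} \; \tfrac{1}{2}\, \| y - \Phi \mu \|^2 \;+\; \lambda\, \mu(\Theta),
\]
where the total-mass penalty $\mu(\Theta)$ plays the role of an $\ell_1$ norm over the continuum, promoting estimates supported on few atoms (mixture components). The precise data-fidelity term used in the paper may differ; this display is for orientation only.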
In the Admixture Model, the probability that an individual carries a certain allele at a specific marker depends on the allele frequencies in $K$ ancestral populations and the proportion of the individual's genome originating from these populations. The markers are assumed to be independent. The Linkage Model is a Hidden Markov Model (HMM) that extends the Admixture Model by incorporating linkage between neighboring loci. This study investigates the consistency and central limit behavior of maximum likelihood estimators (MLEs) for individual ancestry in the Linkage Model, complementing earlier results by \citet{pfaff2004information, pfaffelhuber2022central, heinzel2025consistency} for the Admixture Model. These results are then used to establish the theoretical properties of a statistical test that allows for model selection between the Admixture Model and the Linkage Model. Finally, we demonstrate the practical relevance of our results by applying the test to real-world data from \cite{10002015global}.
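For reference, a schematic statement of the Admixture Model that the Linkage Model extends: with ancestry proportions $q = (q_1,\dots,q_K)$ and ancestral allele frequencies $p_{k,m}$ at marker $m$, a sampled allele copy is the reference allele with probability
\[
\pi_m(q) = \sum_{k=1}^{K} q_k\, p_{k,m},
\]
and, for a diploid individual with independent markers, the genotype at marker $m$ is modeled as $\mathrm{Binomial}(2, \pi_m(q))$, so the MLE of $q$ maximizes the product of these genotype probabilities across markers. The Linkage Model replaces the independence assumption by a hidden Markov chain for the ancestral origin of neighboring loci.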
This paper introduces a novel framework for estimation theory based on a second-order diagnostic for estimator design. While classical analysis focuses on the bias-variance trade-off, we present a more foundational constraint. This result is model-agnostic and domain-agnostic, and it is valid for both parametric and non-parametric problems and for Bayesian and frequentist frameworks. We propose to classify estimators into three primary power regimes. We theoretically establish that any estimator operating in the `power-dominant regime' incurs an unavoidable mean-squared error penalty, making it structurally prone to sub-optimal performance. We propose a `safe-zone law' and make this diagnostic intuitive through two safe-zone maps. One map is a geometric visualization analogous to a receiver operating characteristic curve for estimators; the other shows that the safe zone corresponds to a bounded optimization problem, while the forbidden `power-dominant zone' represents an unbounded optimization landscape. This framework reframes estimator design as a path optimization problem, providing new theoretical underpinnings for regularization and inspiring novel design philosophies.
We develop sharp bounds on the statistical distance between high-dimensional permutation mixtures and their i.i.d. counterparts. Our approach establishes a new geometric link between the spectrum of a complex channel overlap matrix and the information geometry of the channel, yielding tight dimension-independent bounds that close gaps left by previous work. Within this geometric framework, we also derive dimension-dependent bounds that uncover phase transitions in dimensionality for Gaussian and Poisson families. Applied to compound decision problems, this refined control of permutation mixtures enables sharper mean-field analyses of permutation-invariant decision rules, yielding strong non-asymptotic equivalence results between two notions of compound regret in Gaussian and Poisson models.
In this paper, we provide a new property of value at risk (VaR), a standard risk measure widely used in quantitative financial risk management. We show that the subadditivity of VaR for given loss random variables holds at every confidence level if and only if those losses are comonotonic. This result also gives a new equivalent condition for the comonotonicity of random vectors.
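A minimal numerical illustration of both directions of this equivalence (the two-point loss distribution is purely illustrative): for a comonotonic pair VaR is additive, so subadditivity holds with equality, while for an independent pair with the same marginals subadditivity can fail.

```python
import numpy as np

# Lower-quantile value at risk: VaR_a(X) = inf{x : P(X <= x) >= a}.
def var_discrete(values, probs, a):
    order = np.argsort(values)
    v, p = np.asarray(values)[order], np.asarray(probs)[order]
    cdf = np.cumsum(p)
    return v[np.searchsorted(cdf, a)]

a = 0.95
x_vals, x_probs = [0.0, 100.0], [0.96, 0.04]    # a single loss X
var_x = var_discrete(x_vals, x_probs, a)

# Comonotonic pair (Y = X): the sum is 2X and VaR adds up exactly.
var_como = var_discrete([0.0, 200.0], [0.96, 0.04], a)
print(var_x, var_como)                          # 0.0 and 0.0 = VaR(X) + VaR(Y)

# Independent copy with the same marginal: subadditivity fails at this level.
s_vals = [0.0, 100.0, 200.0]
s_probs = [0.96**2, 2 * 0.96 * 0.04, 0.04**2]
print(var_discrete(s_vals, s_probs, a))         # 100.0 > VaR(X) + VaR(Y) = 0.0
```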
In a coherent reliability system composed of multiple components configured according to a specific structure function, the distribution of the system time to failure, or system lifetime, is often of primary interest. Accurate estimation of system reliability is critical in a wide range of engineering and industrial applications, informing decisions in system design, maintenance planning, and risk assessment. The system lifetime distribution can be estimated directly from the observed system failure times. However, when component-level lifetime data are available, they can yield improved estimates of system reliability. In this work, we demonstrate that under nonparametric assumptions about the component time-to-failure distributions, traditional estimators such as the Product-Limit Estimator (PLE) can be further improved under specific loss functions. We propose a novel methodology that enhances nonparametric system reliability estimation through a shrinkage transformation applied to component-level estimators. This shrinkage approach leads to improved efficiency in estimating system reliability.
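As a baseline for the component-level approach that the proposed shrinkage improves upon, the sketch below plugs component Product-Limit (Kaplan-Meier) estimates into a series structure function, assuming independent components; the shrinkage transformation itself is not shown, and the exponential lifetimes are purely illustrative.

```python
import numpy as np

def product_limit(times, events, grid):
    """Kaplan-Meier (Product-Limit) estimate of a survival function on a grid."""
    order = np.argsort(times)
    t, d = np.asarray(times)[order], np.asarray(events)[order]
    surv = np.ones_like(grid, dtype=float)
    s = 1.0
    for i, (ti, di) in enumerate(zip(t, d)):
        if di:                              # observed failure at time ti
            s *= 1.0 - 1.0 / (len(t) - i)   # risk set shrinks as time advances
        surv[grid >= ti] = s
    return surv

# Series system of three independent components: R_sys(t) = prod_i R_i(t).
rng = np.random.default_rng(3)
grid = np.linspace(0.0, 3.0, 61)
R_sys = np.ones_like(grid)
for rate in (1.0, 0.7, 0.5):
    lifetimes = rng.exponential(1.0 / rate, 50)
    censoring = rng.exponential(2.0, 50)
    observed = np.minimum(lifetimes, censoring)
    event = lifetimes <= censoring
    R_sys *= product_limit(observed, event, grid)
print(R_sys[::10])
```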
We prove ratio-consistency of the jackknife variance estimator, and certain variants, for a broad class of generalized U-statistics whose variance is asymptotically dominated by their H\'ajek projection, with the classical fixed-order case recovered as a special instance. This H\'ajek projection dominance condition unifies and generalizes several criteria in the existing literature, placing the simple nonparametric jackknife on the same footing as the infinitesimal jackknife in the generalized setting. As an illustration, we apply our result to the two-scale distributional nearest-neighbor regression estimator, obtaining consistent variance estimates under substantially weaker conditions than previously required.
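A minimal sketch of the delete-one jackknife variance estimator whose ratio-consistency is at issue, applied to a simple second-order U-statistic (Gini's mean difference); the generalized U-statistics and nearest-neighbor estimators treated in the paper are not reproduced here.

```python
import numpy as np
from itertools import combinations

def gini_mean_difference(x):
    # Second-order U-statistic with kernel h(a, b) = |a - b|.
    return np.mean([abs(a - b) for a, b in combinations(x, 2)])

def jackknife_variance(x, stat):
    # Delete-one jackknife: v = (n-1)/n * sum_i (theta_(-i) - mean)^2.
    n = len(x)
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])
    return (n - 1) / n * np.sum((loo - loo.mean()) ** 2)

rng = np.random.default_rng(1)
x = rng.normal(size=100)
print(jackknife_variance(x, gini_mean_difference))
```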
Model selection in non-linear models often prioritizes performance metrics over statistical tests, limiting the ability to account for sampling variability. We propose the use of a statistical test to assess the equality of variances of forecasting errors. The test builds upon the classic Morgan-Pitman approach, incorporating enhancements to ensure robustness against heavy-tailed data and high-variance outliers, together with a strategy to make residuals from machine learning models statistically independent. Through a series of simulations and real-world data applications, we demonstrate the test's effectiveness and practical utility, offering a reliable tool for model evaluation and selection in diverse contexts.
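For reference, the classical Morgan-Pitman device on which the test builds: for paired forecast errors, equality of variances is equivalent to zero correlation between their sum and their difference. The sketch shows only this classical version; the robustness enhancements and the residual-independence strategy proposed in the paper are not included.

```python
import numpy as np
from scipy import stats

def morgan_pitman(e1, e2):
    """Classical Morgan-Pitman test of Var(e1) = Var(e2) for paired errors."""
    u, v = e1 + e2, e1 - e2
    return stats.pearsonr(u, v)          # zero correlation <=> equal variances

rng = np.random.default_rng(4)
e1 = rng.normal(0, 1.0, 300)             # forecast errors of model 1
e2 = rng.normal(0, 1.3, 300)             # model 2 has larger error variance
print(morgan_pitman(e1, e2))             # small p-value flags unequal variances
```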
Tempering is a popular tool in Bayesian computation, being used to transform a posterior distribution $p_1$ into a reference distribution $p_0$ that is more easily approximated. Several algorithms exist that start by approximating $p_0$ and proceed through a sequence of intermediate distributions $p_t$ until an approximation to $p_1$ is obtained. Our contribution reveals that high-quality approximation of terms up to $p_1$ is not essential, as knowledge of the intermediate distributions enables posterior quantities of interest to be extrapolated. Specifically, we establish conditions under which posterior expectations are determined by their associated tempered expectations on any non-empty $t$ interval. Harnessing this result, we propose novel methodology for approximating posterior expectations based on extrapolation and smoothing of tempered expectations, which we implement as a post-processing variance-reduction tool for sequential Monte Carlo.
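For concreteness, a common choice of tempering path (assumed here for illustration) is the geometric one,
\[
p_t(x) \;\propto\; p_0(x)^{1-t}\, p_1(x)^{t}, \qquad t \in [0, 1],
\]
with tempered expectation $\mathbb{E}_t[f] = \int f(x)\, p_t(x)\, dx$ for a quantity of interest $f$. The result above says that, under suitable conditions, the map $t \mapsto \mathbb{E}_t[f]$ on any non-empty interval already determines the posterior expectation $\mathbb{E}_1[f]$, so the latter can be recovered by extrapolating and smoothing estimates of tempered expectations rather than by simulating accurately at $t = 1$.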
It is often of interest to test a global null hypothesis using multiple, possibly dependent, $p$-values by combining their strengths while controlling the Type I error. Recently, several heavy-tailed combination tests, such as the harmonic mean test and the Cauchy combination test, have been proposed: they map $p$-values into heavy-tailed random variables before combining them in some fashion into a single test statistic. The resulting tests, which are calibrated under the assumption of independence of the $p$-values, have been shown to be rather robust to dependence. A complete understanding of the calibration properties of such combination tests of dependent and possibly tail-dependent $p$-values has remained an important open problem in the area. In this work, we show that the powerful framework of multivariate regular variation (MRV) offers a nearly complete solution to this problem. We first show that the precise asymptotic calibration properties of a large class of homogeneous combination tests can be expressed in terms of the angular measure, a characteristic of the asymptotic tail-dependence under MRV. Consequently, we show that under MRV, the Pareto-type linear combination tests, which are equivalent to the harmonic mean test, are universally calibrated regardless of the tail-dependence structure of the underlying $p$-values. In contrast, the popular Cauchy combination test is shown to be universally honest but often conservative; the Tippett combination test, while being honest, is calibrated if and only if the underlying $p$-values are tail-independent. One of our major findings is that the Pareto-type linear combination tests are the only universally calibrated ones among the large family of possibly non-linear homogeneous heavy-tailed combination tests.
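For reference, the two most common heavy-tailed combination statistics discussed above, in their classical equal-weight form; the calibration analysis under multivariate regular variation is not reproduced here.

```python
import numpy as np
from scipy import stats

def cauchy_combination(pvals):
    """Cauchy combination test: average of Cauchy-transformed p-values."""
    t = np.mean(np.tan((0.5 - np.asarray(pvals)) * np.pi))
    return stats.cauchy.sf(t)             # approximate combined p-value

def harmonic_mean_p(pvals):
    """Harmonic mean of p-values (the Pareto-type linear combination);
    turning it into a calibrated p-value requires separate thresholds."""
    p = np.asarray(pvals)
    return len(p) / np.sum(1.0 / p)

pvals = [0.001, 0.2, 0.5, 0.8]
print(cauchy_combination(pvals), harmonic_mean_p(pvals))
```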
Most statistical models for networks focus on pairwise interactions between nodes. However, many real-world networks involve higher-order interactions among multiple nodes, such as co-authors collaborating on a paper. Hypergraphs provide a natural representation for these networks, with each hyperedge representing a set of nodes. The majority of existing hypergraph models assume uniform hyperedges (i.e., edges of the same size) or rely on diversity among nodes. In this work, we propose a new hypergraph model based on non-symmetric determinantal point processes. The proposed model naturally accommodates non-uniform hyperedges, has tractable probability mass functions, and accounts for both node similarity and diversity in hyperedges. For model estimation, we maximize the likelihood function under constraints using a computationally efficient projected adaptive gradient descent algorithm. We establish the consistency and asymptotic normality of the estimator. Simulation studies confirm the efficacy of the proposed model, and its utility is further demonstrated through edge predictions on several real-world datasets.
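A minimal numerical sketch of the L-ensemble mechanics behind the model: a hyperedge $S$ has probability proportional to the principal minor $\det(L_S)$, normalized by $\det(I + L)$, and a nonsymmetric kernel (a symmetric similarity part plus a skew-symmetric part) keeps all principal minors nonnegative while allowing both similarity and diversity effects. The low-rank construction below is illustrative and is not the paper's estimator.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, d = 5, 2

# Nonsymmetric L-ensemble kernel: symmetric part V V^T plus a skew-symmetric
# part B C B^T (C skew-symmetric), so every principal minor is nonnegative.
V = rng.normal(size=(n, d))
B = rng.normal(size=(n, d))
C = np.array([[0.0, 1.0], [-1.0, 0.0]])
L = V @ V.T + B @ C @ B.T

def hyperedge_prob(L, S):
    """P(hyperedge = S) = det(L_S) / det(I + L) for an L-ensemble."""
    return np.linalg.det(L[np.ix_(S, S)]) / np.linalg.det(np.eye(len(L)) + L)

# Probabilities over all non-empty node subsets plus the empty set's mass sum to one.
total = sum(hyperedge_prob(L, list(S))
            for k in range(1, n + 1) for S in combinations(range(n), k))
print(total + 1.0 / np.linalg.det(np.eye(n) + L))   # approximately 1.0
```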
This study develops and evaluates a novel hybrid Wavelet-SARIMA-Transformer (WST) framework to forecast monthly rainfall across five meteorological subdivisions of Northeast India over the 1971 to 2023 period. The approach employs the Maximal Overlap Discrete Wavelet Transform (MODWT) with four wavelet families (Haar, Daubechies, Symlet, and Coiflet) to achieve a shift-invariant, multiresolution decomposition of the rainfall series. Linear and seasonal components are modeled using seasonal ARIMA (SARIMA), while nonlinear components are modeled by a Transformer network, and forecasts are reconstructed via the inverse MODWT. Comprehensive validation using an 80:20 train-test split and multiple performance indices (RMSE, MAE, SMAPE, Willmott's d, Skill Score, Percent Bias, Explained Variance, and Legates-McCabe's E1) demonstrates the superiority of the Haar-based hybrid model (WHST). Across all subdivisions, WHST consistently achieved lower forecast errors, stronger agreement with observed rainfall, and more nearly unbiased predictions than stand-alone SARIMA, the stand-alone Transformer, and two-stage wavelet hybrids. Residual adequacy was confirmed through the Ljung-Box test, while Taylor diagrams provided an integrated assessment of correlation, variance fidelity, and RMSE, further reinforcing the robustness of the proposed approach. The results highlight the effectiveness of integrating multiresolution signal decomposition with complementary linear and deep learning models for hydroclimatic forecasting. Beyond rainfall, the proposed WST framework offers a scalable methodology for forecasting complex environmental time series, with direct implications for flood risk management, water resources planning, and climate adaptation strategies in data-sparse and climate-sensitive regions.
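As a toy sketch of the decompose-then-model idea (not the paper's pipeline), the code below applies a one-level stationary Haar wavelet transform via PyWavelets, used here as a stand-in for the MODWT, and fits a SARIMA model to the smooth component of a synthetic monthly series; the Transformer stage for the detail component and the inverse-transform reconstruction are omitted.

```python
import numpy as np
import pywt
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic monthly "rainfall" with an annual cycle (illustrative only).
rng = np.random.default_rng(5)
months = 624                                     # 52 years of monthly data
t = np.arange(months)
rain = 100 + 50 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 20, months)

# One-level undecimated Haar transform: smooth (cA) and detail (cD) components.
(cA, cD), = pywt.swt(rain, "haar", level=1)

# SARIMA on the smooth component; hold out the last two years for evaluation.
train, horizon = cA[:-24], 24
sarima = SARIMAX(train, order=(1, 0, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
print(sarima.forecast(steps=horizon)[:6])
```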
We study the spectral properties of a stochastic process obtained by multiplicative inversion of a non-zero-mean Gaussian process. We show that its autocorrelation and power spectrum exist for most regular processes, and we find a convergent series expansion of the autocorrelation function in powers of the ratio between mean and standard deviation of the underlying Gaussian process. We apply the results to two sample processes, and we validate the theoretical results with simulations based on standard signal processing techniques.