We consider the problem of performing inference on the number of common stochastic trends when data is generated by a cointegrated CKSVAR (a two-regime, piecewise-linear SVAR; Mavroeidis, 2021), using a modified version of the Breitung (2002) multivariate variance ratio test that is robust to the presence of nonlinear cointegration (of a known form). To derive the asymptotics of our test statistic, we prove a fundamental LLN-type result for a class of stable but nonstationary autoregressive processes, using a novel dual linear process approximation. We show that our modified test yields correct inferences regarding the number of common trends in such a system, whereas the unmodified test tends to infer a higher number of common trends than are actually present, when cointegrating relations are nonlinear.
In randomized experiments, the assumption of potential outcomes is usually accompanied by the \emph{joint exogeneity} assumption. Although joint exogeneity has faced criticism as a counterfactual assumption since its proposal, no evidence has yet demonstrated its violation in randomized experiments. In this paper, we reveal such a violation in a quantum experiment, thereby falsifying this assumption, at least in regimes where classical physics cannot provide a complete description. We further discuss its implications for potential outcome modelling, from both practical and philosophical perspectives.
Many causal and structural parameters in economics can be identified and estimated by computing the value of an optimization program over all distributions consistent with the model and the data. Existing tools apply when the data is discrete, or when only disjoint marginals of the distribution are identified, which is restrictive in many applications. We develop a general framework that yields sharp bounds on a linear functional of the unknown true distribution under i) an arbitrary collection of identified joint subdistributions and ii) structural conditions, such as (conditional) independence. We encode the identification restrictions as a continuous collection of moments of characteristic kernels, and use duality and approximation theory to rewrite the infinite-dimensional program over Borel measures as a finite-dimensional program that is simple to compute. Our approach yields a consistent estimator that is $\sqrt{n}$-uniformly valid for the sharp bounds. In the special case of empirical optimal transport with Lipschitz cost, where the minimax rate is $n^{-2/d}$, our method yields a uniformly consistent estimator with an asymmetric rate, converging at $\sqrt{n}$ uniformly from one side.
Continuous monitoring is becoming more popular due to its significant benefits, including reduced sample sizes and earlier conclusions. In general, it involves monitoring nuisance parameters (e.g., the variance of outcomes) until a specific condition is satisfied. The blinded method, which does not require revealing group assignments, has been recommended because it maintains the integrity of the experiment and mitigates potential bias. Although Friede and Miller (2012) investigated the characteristics of blinded continuous monitoring through simulation studies, its theoretical properties have not been fully explored. In this paper, we aim to fill this gap by presenting the asymptotic and finite-sample properties of blinded continuous monitoring for continuous outcomes. Furthermore, we examine the impact of using blinded versus unblinded variance estimators in the context of continuous monitoring. Simulation results are also provided to evaluate finite-sample performance and to support the theoretical findings.
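A minimal sketch of the distinction the abstract draws, under illustrative assumptions (two equal arms, normal outcomes): the blinded estimator pools all outcomes without using group labels, while the unblinded estimator uses within-group residuals. The names and the toy simulation are not from the paper.

```python
import numpy as np

def blinded_variance(y):
    """One-sample variance of the pooled outcomes, ignoring group labels."""
    return np.var(y, ddof=1)

def unblinded_variance(y, group):
    """Pooled within-group variance, which requires revealing assignments."""
    resid = np.concatenate([y[group == g] - y[group == g].mean()
                            for g in np.unique(group)])
    return resid @ resid / (len(y) - len(np.unique(group)))

rng = np.random.default_rng(0)
n, delta = 200, 0.5                      # per-arm size and treatment effect
y = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(delta, 1.0, n)])
group = np.repeat([0, 1], n)

# The blinded estimator is inflated by roughly delta^2 / 4 relative to the
# within-group variance; this is the bias that blinded monitoring rules must handle.
print(blinded_variance(y), unblinded_variance(y, group))
```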
Modern data analysis increasingly requires identifying shared latent structure across multiple high-dimensional datasets. A commonly used model assumes that the data matrices are noisy observations of low-rank matrices with a shared singular subspace. In this case, two primary methods have emerged for estimating this shared structure, which vary in how they integrate information across datasets. The first approach, termed Stack-SVD, concatenates all the datasets, and then performs a singular value decomposition (SVD). The second approach, termed SVD-Stack, first performs an SVD separately for each dataset, then aggregates the top singular vectors across these datasets, and finally computes a consensus amongst them. While these methods are widely used, they have not been rigorously studied in the proportional asymptotic regime, which is of great practical relevance in today's world of increasing data size and dimensionality. This lack of theoretical understanding has led to uncertainty about which method to choose and limited the ability to fully exploit their potential. To address these challenges, we derive exact expressions for the asymptotic performance and phase transitions of these two methods and develop optimal weighting schemes to further improve both methods. Our analysis reveals that while neither method uniformly dominates the other in the unweighted case, optimally weighted Stack-SVD dominates optimally weighted SVD-Stack. We extend our analysis to accommodate multiple shared components, and provide practical algorithms for estimating optimal weights from data, offering theoretical guidance for method selection in practical data integration problems. Extensive numerical simulations and semi-synthetic experiments on genomic data corroborate our theoretical findings.
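The two integration strategies compared above are simple to state in code. Below is a hedged sketch on synthetic data sharing a left singular subspace; the consensus step in SVD-Stack is taken here to be an SVD of the collected per-dataset singular vectors, which is one common choice and only an illustration, and no optimal weighting is applied.

```python
import numpy as np

def stack_svd(datasets, r):
    """Stack-SVD: concatenate the datasets along columns, then take one SVD."""
    X = np.concatenate(datasets, axis=1)
    return np.linalg.svd(X, full_matrices=False)[0][:, :r]

def svd_stack(datasets, r):
    """SVD-Stack: SVD each dataset, collect its top-r left singular vectors,
    then form a consensus by an SVD of the collected vectors."""
    tops = [np.linalg.svd(X, full_matrices=False)[0][:, :r] for X in datasets]
    return np.linalg.svd(np.concatenate(tops, axis=1), full_matrices=False)[0][:, :r]

rng = np.random.default_rng(1)
p, r = 100, 2
u_shared = np.linalg.qr(rng.normal(size=(p, r)))[0]      # shared left singular subspace
datasets = [u_shared @ rng.normal(size=(r, n)) + 0.6 * rng.normal(size=(p, n))
            for n in (150, 300, 600)]

for estimator in (stack_svd, svd_stack):
    u_hat = estimator(datasets, r)
    # smallest cosine of the principal angles between estimate and truth (1 = perfect)
    print(estimator.__name__, np.linalg.svd(u_hat.T @ u_shared, compute_uv=False).min())
```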
Adequacy for estimation between an inferential method and a model can be defined through two main requirements: firstly, the inferential tool should define a well-posed problem when applied to the model; secondly, the resulting statistical procedure should produce consistent estimators. Conditions which entail these analytical and statistical issues are considered in the context where divergence-based inference is applied to smooth semiparametric models under moment restrictions. A discussion is also held on the choice of the divergence, extending classical parametric inference to the estimation of both parameters of interest and nuisance parameters. Arguments in favor of the omnibus $L_2$ and Kullback-Leibler choices as presented in [16] are discussed, and motivation for the class of power divergences defined in [5] is presented in the context of the present semiparametric smooth models. A short simulation study illustrates the method.
There is a growing interest in procedures for Bayesian inference that bypass the need to specify a model and prior but simply rely on a predictive rule that describes how we learn about future observations given the available ones. At the heart of the idea is a bootstrap-type scheme that allows us to move from the realm of prediction to that of inference. Which conditions the predictive rule needs to satisfy in order to produce valid inference is a key question. In this work, we substantially relax previous assumptions by building on a generalization of martingales, opening up the possibility of employing a much wider range of predictive rules that were previously ruled out. These include ``old'' ideas in Statistics and Learning Theory, such as kernel estimators, and more novel ones, such as the parametric Bayesian bootstrap or copula-based algorithms. Our aim is not to advocate in favor of one predictive rule over the others, but rather to showcase the benefits of working with this larger class of predictive rules.
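A toy sketch of the bootstrap-type scheme in the predictive setting, not any specific rule from the paper: a kernel-type predictive is updated as imputed future observations are drawn, and one approximate posterior draw for a functional (here the mean) is read off the imputed population. The bandwidth, horizon, and function names are illustrative assumptions.

```python
import numpy as np

def predictive_resample_mean(x, n_future=2000, bandwidth=0.3, rng=None):
    """One draw from a predictive-resampling posterior for the mean:
    forward-sample x_{n+1}, x_{n+2}, ... from a kernel (KDE-type) predictive rule
    (pick a point from the current pool at random, perturb it with Gaussian noise),
    then return the mean of the imputed population."""
    rng = np.random.default_rng() if rng is None else rng
    pool = list(x)
    for _ in range(n_future):
        centre = pool[rng.integers(len(pool))]    # sample from the current KDE predictive
        pool.append(centre + bandwidth * rng.normal())
    return np.mean(pool)

rng = np.random.default_rng(2)
data = rng.normal(1.0, 1.0, size=50)

draws = [predictive_resample_mean(data, rng=rng) for _ in range(200)]
print(np.mean(draws), np.std(draws))   # approximate posterior centre and spread for the mean
```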
We investigate convergence of alternating Bregman projections between non-convex sets and prove convergence to a point in the intersection, or to points realizing a gap between the two sets. The speed of convergence is generally sub-linear, but may be linear under transversality. We apply our analysis to prove convergence of versions of the expectation maximization algorithm for non-convex parameter sets.
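For intuition, the sketch below specialises the Bregman divergence to the squared Euclidean distance, so it is plain alternating projections between two sets (a circle, which is non-convex, and a horizontal line) with user-supplied projection maps; it illustrates convergence either to a point in the intersection or to a pair realising the gap. The example sets are illustrative, not from the paper.

```python
import numpy as np

def alternating_projections(proj_a, proj_b, x0, n_iter=500):
    """Alternate a = P_A(b), b = P_B(a); return the final pair (a, b)."""
    b = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        a = proj_a(b)
        b = proj_b(a)
    return a, b

# Set A: unit circle centred at the origin (non-convex); set B: horizontal line y = c.
proj_circle = lambda x: x / np.linalg.norm(x)
def make_proj_line(c):
    return lambda x: np.array([x[0], c])

for c in (0.5, 2.0):             # c = 0.5: the sets intersect; c = 2.0: gap of size 1
    a, b = alternating_projections(proj_circle, make_proj_line(c), [3.0, 1.0])
    print(c, a, b, np.linalg.norm(a - b))
```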
We study the problem of efficiency under $\alpha$-local differential privacy ($\alpha$-LDP) in both discrete and continuous settings. Building on a factorization lemma, which shows that any privacy mechanism can be decomposed into an extremal mechanism followed by additional randomization, we reduce the Fisher information maximization problem to a search over extremal mechanisms. The representation of extremal mechanisms requires working in infinite-dimensional spaces and invokes advanced tools from convex and functional analysis, such as Choquet's theorem. Our analysis establishes matching upper and lower bounds on the Fisher information in the high privacy regime ($\alpha \to 0$), and proves that the maximization problem always admits a solution for any $\alpha$. As a concrete application, we consider the problem of estimating the parameter of a uniform distribution on $[0, \theta]$ under $\alpha$-LDP. Guided by our theoretical findings, we design an extremal mechanism that yields a consistent and asymptotically efficient estimator in the high privacy regime. Numerical experiments confirm our theoretical results.
In this study, we derive the exact distribution and moments of the noncentral complex Roy's largest root statistic, expressed as a product of complex zonal polynomials. We show that the linearization coefficients arising from the product of complex zonal polynomials in the distribution of Roy's test under a specific alternative hypothesis can be explicitly computed using Pieri's formula, a well-known result in combinatorics. These results are then applied to compute the power of tests in the complex multivariate analysis of variance (MANOVA).
In the setting of multiple testing, compound p-values generalize p-values by asking for superuniformity to hold only \emph{on average} across all true nulls. We study the properties of the Benjamini--Hochberg procedure applied to compound p-values. Under independence, we show that the false discovery rate (FDR) is at most $1.93\alpha$, where $\alpha$ is the nominal level, and exhibit a distribution for which the FDR is $\frac{7}{6}\alpha$. If additionally all nulls are true, then the upper bound can be improved to $\alpha + 2\alpha^2$, with a corresponding worst-case lower bound of $\alpha + \alpha^2/4$. Under positive dependence, on the other hand, we demonstrate that FDR can be inflated by a factor of $O(\log m)$, where~$m$ is the number of hypotheses. We provide numerous examples of settings where compound p-values arise in practice, either because we lack sufficient information to compute non-trivial p-values, or to facilitate a more powerful analysis.
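A minimal sketch of the Benjamini-Hochberg step-up procedure applied to a vector of (possibly compound) p-values; the toy compound construction below (half the nulls anti-conservative, half conservative, superuniform only on average) is purely illustrative and is not one of the paper's worst-case examples.

```python
import numpy as np

def benjamini_hochberg(p, alpha=0.05):
    """Return the indices rejected by the BH step-up procedure at level alpha."""
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = np.nonzero(np.sort(p) <= thresholds)[0]
    if below.size == 0:
        return np.array([], dtype=int)
    k = below.max() + 1                  # largest i with p_(i) <= alpha * i / m
    return order[:k]

rng = np.random.default_rng(3)
# Compound p-values: superuniform only on average across the true nulls, e.g.
# p = 0.8 U on half of the nulls and p = 2 U (truncated at 1) on the other half.
m = 1000
u = rng.uniform(size=m)
p_null = np.where(np.arange(m) % 2 == 0, 0.8 * u, 2.0 * u).clip(0, 1)
print(len(benjamini_hochberg(p_null, alpha=0.1)))   # number of false discoveries
```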
In information geometry, statistical models are considered as differentiable manifolds, where each probability distribution represents a unique point on the manifold. A Riemannian metric can be systematically obtained from a divergence function using Eguchi's theory (1992); the well-known Fisher-Rao metric is obtained from the Kullback-Leibler (KL) divergence. The geometric derivation of the classical Cram\'er-Rao Lower Bound (CRLB) by Amari and Nagaoka (2000) is based on this metric. In this paper, we study a Riemannian metric obtained by applying Eguchi's theory to the Basu-Harris-Hjort-Jones (BHHJ) divergence (1998) and derive a generalized Cram\'er-Rao bound using Amari-Nagaoka's approach. There are potential applications for this bound in robust estimation.
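For reference, and as a hedged recollection rather than the paper's notation, the BHHJ (density power) divergence between a true density $g$ and a model density $f$, with tuning parameter $\alpha > 0$, is commonly written as follows; the KL divergence is recovered in the limit $\alpha \to 0$.

\[
d_\alpha(g, f) \;=\; \int \Big\{ f^{1+\alpha}(x) \;-\; \Big(1 + \tfrac{1}{\alpha}\Big)\, g(x)\, f^{\alpha}(x) \;+\; \tfrac{1}{\alpha}\, g^{1+\alpha}(x) \Big\}\, dx .
\]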
We introduce a dimension-free Bernstein-type tail inequality for self-normalised martingales normalised by their predictable quadratic variation. As applications of our result, we propose solutions to the recent open problems posed by Mussi et al. (2024), providing computationally efficient confidence sequences for logistic regression with adaptively chosen RKHS-valued covariates, and establishing instance-adaptive regret bounds in the corresponding kernelised bandit setting.
We consider the problem of testing independence in mixed-type data that combine count variables with positive, absolutely continuous variables. We first introduce two distinct classes of test statistics in the bivariate setting, designed to test independence between the components of a bivariate mixed-type vector. These statistics are then extended to the multivariate context to accommodate: (i) testing independence between vectors of different types and possibly different dimensions, and (ii) testing total independence among all components of vectors with different types. The construction is based on the recently introduced Baringhaus-Gaigall transformation, which characterizes the joint distribution of such data. We establish the asymptotic properties of the resulting tests and, through an extensive power study, demonstrate that the proposed approach is both competitive and flexible.
Since Pearson [Philosophical Transactions of the Royal Society of London. A, 185 (1894), pp. 71-110] first applied the method of moments (MM) for modeling data as a mixture of one-dimensional Gaussians, moment-based estimation methods have proliferated. Among these methods, the generalized method of moments (GMM) improves the statistical efficiency of MM by weighting the moments appropriately. However, the computational complexity and storage complexity of MM and GMM grow exponentially with the dimension, making these methods impractical for high-dimensional data or when higher-order moments are required. Such computational bottlenecks are more severe in GMM since it additionally requires estimating a large weighting matrix. To overcome these bottlenecks, we propose the diagonally-weighted GMM (DGMM), which achieves a balance among statistical efficiency, computational complexity, and numerical stability. We apply DGMM to study the parameter estimation problem for weakly separated heteroscedastic low-rank Gaussian mixtures and design a computationally efficient and numerically stable algorithm that obtains the DGMM estimator without explicitly computing or storing the moment tensors. We implement the proposed algorithm and empirically validate the advantages of DGMM: in numerical studies, DGMM attains smaller estimation errors while requiring substantially shorter runtime than MM and GMM. The code and data will be available upon publication at https://github.com/liu-lzhang/dgmm.
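A toy sketch of the diagonal-weighting idea only, not the paper's tensor-free algorithm: match a few raw moments of a one-dimensional equal-weight two-component Gaussian mixture, weighting each moment by the inverse of the estimated variance of its sample counterpart, i.e. using only the diagonal of the GMM weighting matrix. Moment orders, starting values, and the optimizer are illustrative choices.

```python
import numpy as np
from math import comb
from scipy.optimize import minimize

def mixture_moments(theta, orders):
    """Raw moments E[X^k] of an equal-weight two-component Gaussian mixture with
    unit component variances and means theta = (mu1, mu2)."""
    ez = [1, 0, 1, 0, 3]                       # E[Z^j] for Z ~ N(0,1), j = 0..4
    def raw(mu, k):
        return sum(comb(k, j) * mu ** (k - j) * ez[j] for j in range(k + 1))
    return np.array([0.5 * raw(theta[0], k) + 0.5 * raw(theta[1], k) for k in orders])

def dgmm_estimate(x, orders=(1, 2, 3, 4)):
    emp = np.array([np.mean(x ** k) for k in orders])
    # diagonal weights only: inverse of the estimated variance of each sample moment
    w = 1.0 / np.array([np.var(x ** k) / len(x) for k in orders])
    objective = lambda theta: np.sum(w * (mixture_moments(theta, orders) - emp) ** 2)
    return minimize(objective, x0=[-0.5, 0.5], method="Nelder-Mead").x

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-1.0, 1.0, 5000), rng.normal(1.5, 1.0, 5000)])
print(dgmm_estimate(x))     # should land near (-1, 1.5), up to label swap
```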
We establish a general, non-asymptotic error analysis framework for understanding the effects of incremental approximations made by practical approaches for Bayesian sequential learning (BSL) on their long-term inference performance. Our setting covers inverse problems, state estimation, and parameter-state estimation. In these settings, we bound the difference, termed the learning error, between the unknown true posterior and the approximate posterior computed by these approaches, using three widely used distribution metrics: total variation, Hellinger, and Wasserstein distances. This framework builds on our establishment of the global Lipschitz stability of the posterior with respect to the prior across these settings. To the best of our knowledge, this is the first work to establish such global Lipschitz stability under the Hellinger and Wasserstein distances and the first general error analysis framework for approximate BSL methods. Our framework offers two sets of upper bounds on the learning error. The first set demonstrates the stability of general approximate BSL methods with respect to the incremental approximation process, while the second set is estimable in many practical scenarios. Furthermore, as an initial step toward understanding the phenomenon of learning error decay, which is sometimes observed, we identify sufficient conditions under which data assimilation leads to learning error reduction.
Beta regression is commonly employed when the outcome variable is a proportion. Since its conception, the approach has been widely used in applications spanning various scientific fields. A series of extensions have been proposed over time, several of which address variable selection and penalized estimation, e.g., with an $\ell_1$-penalty (LASSO). However, a theoretical analysis of this popular approach in the context of Beta regression with high-dimensional predictors is lacking. In this paper, we aim to close this gap. A particular challenge arises from the non-convexity of the associated negative log-likelihood, which we address by resorting to a framework for analyzing stationary points in a neighborhood of the target parameter. Leveraging this framework, we derive a non-asymptotic bound on the $\ell_1$-error of such stationary points. In addition, we propose a debiasing approach to construct confidence intervals for the regression parameters. A proximal gradient algorithm is devised for optimizing the resulting penalized negative log-likelihood function. Our theoretical analysis is corroborated via simulation studies, and a real data example concerning the prediction of county-level proportions of incarceration is presented to showcase the practical utility of our methodology.
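A minimal sketch of a proximal gradient (ISTA-style) update for an $\ell_1$-penalized beta regression with a logit mean link, under the simplifying assumption of a fixed precision parameter $\varphi$; the step size, penalty level, and fixed-$\varphi$ choice are illustrative and not the paper's exact algorithm.

```python
import numpy as np
from scipy.special import expit, digamma

def beta_nll_grad(beta, X, y, phi):
    """Gradient of the negative beta log-likelihood with logit mean link,
    mu = expit(X @ beta), fixed precision phi, y_i ~ Beta(mu_i*phi, (1-mu_i)*phi)."""
    mu = expit(X @ beta)
    score = phi * (np.log(y / (1 - y)) - digamma(mu * phi) + digamma((1 - mu) * phi))
    return -X.T @ (score * mu * (1 - mu))

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_beta_regression(X, y, lam, phi=20.0, step=1e-4, n_iter=20000):
    """ISTA: gradient step on the negative log-likelihood, then soft-thresholding."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = soft_threshold(beta - step * beta_nll_grad(beta, X, y, phi), step * lam)
    return beta

rng = np.random.default_rng(5)
n, p = 300, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -1.0, 0.5]
mu_true = expit(X @ beta_true)
y = rng.beta(mu_true * 20.0, (1 - mu_true) * 20.0)
print(lasso_beta_regression(X, y, lam=5.0)[:5])   # first three coefficients should dominate
```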
Though the underlying fields associated with vector-valued environmental data are continuous, observations themselves are discrete. For example, climate models typically output grid-based representations of wind fields or ocean currents, and these are often downscaled to a discrete set of points. By treating the area of interest as a two-dimensional manifold that can be represented as a triangular mesh and embedded in Euclidean space, this work shows that discrete intrinsic Gaussian processes for vector-valued data can be developed from discrete differential operators defined with respect to a mesh. These Gaussian processes account for the geometry and curvature of the manifold whilst also providing a flexible and practical formulation that can be readily applied to any two-dimensional mesh. We show that these models can capture harmonic flows, incorporate boundary conditions, and model non-stationary data. Finally, we apply these models to downscaling stationary and non-stationary gridded wind data on the globe, and to inference of ocean currents from sparse observations in bounded domains.