We explore the information geometry of L\'evy processes. As a starting point, we derive the $\alpha$-divergence between two L\'evy processes. Subsequently, the Fisher information matrix and the $\alpha$-connection associated with the geometry of L\'evy processes are computed from the $\alpha$-divergence. In addition, we discuss statistical applications of this information geometry. As illustrative examples, we investigate the differential-geometric structures of various L\'evy processes relevant to financial modeling, including tempered stable processes, the CGMY model, and variance gamma processes.
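For reference, the standard $\alpha$-divergence between two probability densities $p$ and $q$ (in Amari's convention), of which the quantity derived here for L\'evy processes is the analogue, reads
$$ D_\alpha(p\,\|\,q) \;=\; \frac{4}{1-\alpha^2}\Bigl(1 - \int p(x)^{\frac{1-\alpha}{2}}\, q(x)^{\frac{1+\alpha}{2}}\,dx\Bigr), \qquad \alpha \neq \pm 1, $$
with the Kullback-Leibler divergence recovered in the limits $\alpha \to \pm 1$; the exact expression for L\'evy processes (in terms of their characteristic triplets) is given in the paper itself.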
Practitioners making decisions based on causal effects typically ignore structural uncertainty. We analyze when this uncertainty is consequential enough to warrant methodological solutions (Bayesian model averaging over competing causal structures). Focusing on bivariate relationships ($X \rightarrow Y$ vs. $X \leftarrow Y$), we establish that model averaging is beneficial when: (1) structural uncertainty is moderate to high, (2) causal effects differ substantially between structures, and (3) loss functions are sufficiently sensitive to the size of the causal effect. We prove optimality results for our suggested methodological solution under regularity conditions and demonstrate through simulations that modern causal discovery methods can provide, within limits, the necessary quantification. Our framework complements existing robust causal inference approaches by addressing a distinct source of uncertainty typically overlooked in practice.
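A minimal Python sketch of the bivariate model averaging described above, with hypothetical posterior structure probabilities and per-structure effect estimates (all names and numbers below are illustrative assumptions, not taken from the paper):

```python
# Hypothetical inputs: posterior probabilities of the two candidate structures
# (e.g., supplied by a causal discovery method) and the causal effect of X on Y
# estimated under each structure.
p_xy = 0.6            # P(X -> Y | data), assumed for illustration
p_yx = 1.0 - p_xy     # P(X <- Y | data)
effect_xy = 0.8       # effect of X on Y if X -> Y holds
effect_yx = 0.0       # effect of X on Y if X <- Y holds (then X has no effect on Y)

# Bayesian model averaging: weight each structure's effect by its posterior probability.
averaged_effect = p_xy * effect_xy + p_yx * effect_yx
print(f"model-averaged effect: {averaged_effect:.3f}")
```

Whether this averaging pays off, relative to committing to the more probable structure, is exactly what conditions (1)-(3) above characterize.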
Characteristic-function-based goodness-of-fit tests are suggested for multivariate observations. The test statistics, which are straightforward to compute, are defined as two-sample criteria measuring the discrepancy between multivariate ranks of the original observations and the corresponding ranks obtained from an artificial sample generated from the reference distribution under test. Multivariate ranks are constructed using the theory of optimal measure transport, thus rendering the tests of a simple null hypothesis distribution-free, while bootstrap approximations are still necessary for testing composite null hypotheses. Asymptotic theory is developed, and a simulation study, concentrating on comparisons with previously proposed tests of multivariate normality, demonstrates that the method performs well in finite samples.
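A rough Python sketch of the rank construction via empirical optimal transport (optimal assignment to a fixed reference grid) and a crude characteristic-function discrepancy between the ranks of the two samples; the reference grid, weighting, and exact test statistic used in the paper may differ, so this is only an illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def ot_ranks(sample, grid):
    """Assign each observation to a reference grid point by solving the optimal
    assignment problem with squared Euclidean cost; the matched grid points
    play the role of multivariate ranks."""
    cost = cdist(sample, grid, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)
    ranks = np.empty_like(grid)
    ranks[rows] = grid[cols]
    return ranks

rng = np.random.default_rng(0)
n, d = 200, 2
x = rng.standard_normal((n, d))            # observed sample
y = rng.standard_normal((n, d))            # artificial sample from the reference law
grid = rng.uniform(-1, 1, size=(n, d))     # illustrative reference grid

rx, ry = ot_ranks(x, grid), ot_ranks(y, grid)
t = rng.standard_normal((50, d))           # frequencies at which to compare empirical CFs
ecf = lambda r: np.exp(1j * r @ t.T).mean(axis=0)
stat = np.mean(np.abs(ecf(rx) - ecf(ry)) ** 2)
print(f"two-sample CF discrepancy: {stat:.4f}")
```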
We study central limit theorems for linear statistics in high-dimensional Bayesian linear regression with product priors. Unlike the existing literature where the focus is on posterior contraction, we work under a non-contracting regime where neither the likelihood nor the prior dominates the other. This is motivated by modern high-dimensional datasets characterized by a bounded signal-to-noise ratio. This work takes a first step towards understanding limit distributions for one-dimensional projections of the posterior, as well as the posterior mean, in such regimes. Analogous to contractive settings, the resulting limiting distributions are Gaussian, but they heavily depend on the chosen prior and center around the Mean-Field approximation of the posterior. We study two concrete models of interest to illustrate this phenomenon -- the white noise design, and the (misspecified) Bayesian model. As an application, we construct credible intervals and compute their coverage probability under any misspecified prior. Our proofs rely on a combination of recent developments in Berry-Esseen type bounds for Random Field Ising models and both first and second order Poincar\'{e} inequalities. Notably, our results do not require any sparsity assumptions on the prior.
We consider the classical problem of estimating the mixing distribution of binomial mixtures, but under trial heterogeneity and smoothness. This problem has been studied extensively when the trial parameter is homogeneous, but only within a low-smoothness regime where the resulting rates are slow, and not under the more general scenario of heterogeneous trials. Under the assumption that the density is $s$-smooth, we derive fast error rates for the kernel density estimator under trial heterogeneity that depend on the harmonic mean of the trials. Importantly, even when reduced to the homogeneous case, our result improves on the state-of-the-art rate of Ye and Bickel (2021). We also study nonparametric estimation of the difference between two densities, which can be smoother than the individual densities, in both i.i.d. and binomial-mixture settings. Our work is motivated by an application in criminal justice: comparing conviction rates of indigent representation in Pennsylvania. We find that the estimated conviction rates for appointed counsel (court-appointed private attorneys) are generally higher than those for public defenders, potentially due to a confounding factor: appointed counsel are more likely to take on severe cases.
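The paper's estimator and rates are not reproduced here; as a point of reference, the following Python sketch sets up heterogeneous binomial trials and applies a plain Gaussian KDE to the empirical success proportions $x_i/m_i$ (an illustrative baseline only, not the proposed method):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
n = 500
m = rng.integers(5, 50, size=n)        # heterogeneous trial counts
p = rng.beta(2, 5, size=n)             # latent success probabilities from a smooth mixing density
x = rng.binomial(m, p)                 # observed counts

# Naive baseline: kernel density estimate of the observed proportions x_i / m_i.
kde = gaussian_kde(x / m)
grid = np.linspace(0, 1, 201)
print(kde(grid)[:5])                   # estimated density values near 0
```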
The displayed tree phylogenetic network model is shown to sit as a natural submodel of the graphical model associated to a directed acyclic graph (DAG). This representation allows us to derive a number of results about the displayed tree model. In particular, the concept of a local modification to a DAG model is developed and applied to the displayed tree model. As an application, some nonidentifiability issues of the displayed tree model are highlighted, as they relate to reticulation edges and stacked reticulations in the networks. We also derive rank conditions on flattenings of probability tensors for the displayed tree model, generalizing classic results for phylogenetic tree models.
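A small Python sketch of what a flattening-rank computation looks like for a probability tensor on four leaves, along the split $\{1,2\}\,|\,\{3,4\}$ (the specific rank bounds proved in the paper are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)
k = 4                          # states per leaf (e.g., DNA alphabet A, C, G, T)
P = rng.random((k, k, k, k))
P /= P.sum()                   # joint distribution over the four leaves

# Flattening along the split {1,2} | {3,4}: reshape the tensor into a k^2 x k^2
# matrix; rank conditions on such flattenings constrain tree and network models.
flat = P.reshape(k * k, k * k)
print(np.linalg.matrix_rank(flat, tol=1e-10))
```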
We consider the problem of performing inference on the number of common stochastic trends when data is generated by a cointegrated CKSVAR (a two-regime, piecewise-linear SVAR; Mavroeidis, 2021), using a modified version of the Breitung (2002) multivariate variance ratio test that is robust to the presence of nonlinear cointegration (of a known form). To derive the asymptotics of our test statistic, we prove a fundamental LLN-type result for a class of stable but nonstationary autoregressive processes, using a novel dual linear process approximation. We show that our modified test yields correct inferences regarding the number of common trends in such a system, whereas the unmodified test tends to infer a higher number of common trends than are actually present, when cointegrating relations are nonlinear.
In randomized experiments, the assumption of potential outcomes is usually accompanied by the \emph{joint exogeneity} assumption. Although joint exogeneity has faced criticism as a counterfactual assumption since its proposal, no evidence has yet demonstrated its violation in randomized experiments. In this paper, we reveal such a violation in a quantum experiment, thereby falsifying this assumption, at least in regimes where classical physics cannot provide a complete description. We further discuss its implications for potential outcome modelling, from both practical and philosophical perspectives.
Many causal and structural parameters in economics can be identified and estimated by computing the value of an optimization program over all distributions consistent with the model and the data. Existing tools apply when the data is discrete, or when only disjoint marginals of the distribution are identified, which is restrictive in many applications. We develop a general framework that yields sharp bounds on a linear functional of the unknown true distribution under i) an arbitrary collection of identified joint subdistributions and ii) structural conditions, such as (conditional) independence. We encode the identification restrictions as a continuous collection of moments of characteristic kernels, and use duality and approximation theory to rewrite the infinite-dimensional program over Borel measures as a finite-dimensional program that is simple to compute. Our approach yields a consistent estimator that is $\sqrt{n}$-uniformly valid for the sharp bounds. In the special case of empirical optimal transport with Lipschitz cost, where the minimax rate is $n^{2/d}$, our method yields a uniformly consistent estimator with an asymmetric rate, converging at $\sqrt{n}$ uniformly from one side.
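In the discrete case that existing tools already cover, the sharp-bound computation reduces to a linear program over joint distributions consistent with the identified marginals; a toy Python sketch (a hypothetical two-variable example with identified marginals only, not the paper's kernel-based formulation):

```python
import numpy as np
from scipy.optimize import linprog

# Toy example: X and Y each take values in {0, 1, 2}; only the marginals of X and Y
# are identified, and we bound E[XY] over all joint laws consistent with them.
px = np.array([0.2, 0.5, 0.3])
py = np.array([0.4, 0.4, 0.2])
vals = np.array([0.0, 1.0, 2.0])
c = np.outer(vals, vals).ravel()          # E[XY] as a linear functional of the joint pmf

A_eq, b_eq = [], []
for i in range(3):                        # row sums must match the X marginal
    a = np.zeros((3, 3)); a[i, :] = 1.0
    A_eq.append(a.ravel()); b_eq.append(px[i])
for j in range(3):                        # column sums must match the Y marginal
    a = np.zeros((3, 3)); a[:, j] = 1.0
    A_eq.append(a.ravel()); b_eq.append(py[j])

lower = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
upper = -linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
print(f"sharp bounds on E[XY]: [{lower:.3f}, {upper:.3f}]")
```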
Continuous monitoring is becoming more popular due to its significant benefits, including reducing sample sizes and reaching earlier conclusions. In general, it involves monitoring nuisance parameters (e.g., the variance of outcomes) until a specific condition is satisfied. The blinded method, which does not require revealing group assignments, has been recommended because it maintains the integrity of the experiment and mitigates potential bias. Although Friede and Miller (2012) investigated the characteristics of blinded continuous monitoring through simulation studies, its theoretical properties have not been fully explored. In this paper, we aim to fill this gap by presenting the asymptotic and finite-sample properties of blinded continuous monitoring for continuous outcomes. Furthermore, we examine the impact of using blinded versus unblinded variance estimators in the context of continuous monitoring. Simulation results are also provided to evaluate finite-sample performance and to support the theoretical findings.
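A rough Python sketch of the blinded-monitoring idea: accrue outcomes, estimate the variance without using group labels (here the plain one-sample "lumped" variance; refinements such as subtracting an assumed treatment effect are omitted), and stop once the variance-driven sample-size condition is met. All design constants below are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
delta, alpha, power = 0.5, 0.05, 0.8                  # assumed effect size and design constants
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)

outcomes = []
while True:
    arm = rng.integers(2)                             # assignment stays hidden from the monitor
    outcomes.append(rng.normal(loc=arm * delta, scale=1.0))
    n = len(outcomes)
    if n < 20:
        continue
    blinded_var = np.var(outcomes, ddof=1)            # lumped variance, labels not revealed
    n_per_arm = 2 * z ** 2 * blinded_var / delta ** 2 # standard two-sample requirement
    if n >= 2 * n_per_arm:                            # total accrual meets the target
        break
print(f"stopped at n = {n}, blinded variance = {blinded_var:.3f}")
```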
Modern data analysis increasingly requires identifying shared latent structure across multiple high-dimensional datasets. A commonly used model assumes that the data matrices are noisy observations of low-rank matrices with a shared singular subspace. In this case, two primary methods have emerged for estimating this shared structure, which vary in how they integrate information across datasets. The first approach, termed Stack-SVD, concatenates all the datasets, and then performs a singular value decomposition (SVD). The second approach, termed SVD-Stack, first performs an SVD separately for each dataset, then aggregates the top singular vectors across these datasets, and finally computes a consensus amongst them. While these methods are widely used, they have not been rigorously studied in the proportional asymptotic regime, which is of great practical relevance in today's world of increasing data size and dimensionality. This lack of theoretical understanding has led to uncertainty about which method to choose and limited the ability to fully exploit their potential. To address these challenges, we derive exact expressions for the asymptotic performance and phase transitions of these two methods and develop optimal weighting schemes to further improve both methods. Our analysis reveals that while neither method uniformly dominates the other in the unweighted case, optimally weighted Stack-SVD dominates optimally weighted SVD-Stack. We extend our analysis to accommodate multiple shared components, and provide practical algorithms for estimating optimal weights from data, offering theoretical guidance for method selection in practical data integration problems. Extensive numerical simulations and semi-synthetic experiments on genomic data corroborate our theoretical findings.
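A minimal Python sketch of the two (unweighted) estimators being compared; the optimal weighting schemes derived in the paper are not reproduced here:

```python
import numpy as np

def stack_svd(datasets, r):
    """Stack-SVD: concatenate the datasets along rows, then take the top-r
    right singular vectors of the stacked matrix."""
    _, _, vt = np.linalg.svd(np.vstack(datasets), full_matrices=False)
    return vt[:r].T

def svd_stack(datasets, r):
    """SVD-Stack: take the top-r right singular vectors of each dataset separately,
    stack them, and extract a consensus subspace by a second SVD."""
    tops = [np.linalg.svd(X, full_matrices=False)[2][:r] for X in datasets]
    _, _, vt = np.linalg.svd(np.vstack(tops), full_matrices=False)
    return vt[:r].T

rng = np.random.default_rng(4)
p, r = 100, 2
v_true = np.linalg.qr(rng.standard_normal((p, r)))[0]        # shared right singular subspace
datasets = [rng.standard_normal((n, r)) @ v_true.T * 3.0 + rng.standard_normal((n, p))
            for n in (150, 200, 250)]

for name, est in (("Stack-SVD", stack_svd), ("SVD-Stack", svd_stack)):
    V = est(datasets, r)
    alignment = np.linalg.norm(V.T @ v_true, "fro") ** 2 / r  # subspace overlap in [0, 1]
    print(name, round(alignment, 3))
```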
Adequacy for estimation between an inferential method and a model can be defined through two main requirements: firstly, the inferential tool should define a well-posed problem when applied to the model; secondly, the resulting statistical procedure should produce consistent estimators. Conditions which entail these analytical and statistical issues are considered in the context where divergence-based inference is applied to smooth semiparametric models under moment restrictions. A discussion is also held on the choice of the divergence, extending classical parametric inference to the estimation of both the parameters of interest and the nuisance parameters. Arguments in favor of the omnibus $L_2$ and Kullback-Leibler choices as presented in [16] are discussed, and motivation for the class of power divergences defined in [5] is presented in the context of the present semiparametric smooth models. A short simulation study illustrates the method.
There is a growing interest in procedures for Bayesian inference that bypass the need to specify a model and prior, relying instead on a predictive rule that describes how we learn about future observations given the available ones. At the heart of the idea is a bootstrap-type scheme that allows us to move from the realm of prediction to that of inference. Which conditions the predictive rule needs to satisfy to produce valid inference is a key question. In this work, we substantially relax previous assumptions by building on a generalization of martingales, opening up the possibility of employing a much wider range of predictive rules that were previously ruled out. These include ``old" ideas in Statistics and Learning Theory, such as kernel estimators, and more novel ones, such as the parametric Bayesian bootstrap or copula-based algorithms. Our aim is not to advocate for one predictive rule over the others, but rather to showcase the benefits of working with this larger class of predictive rules.
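A small Python sketch of the predictive-resampling idea underlying such schemes: starting from the observed data, repeatedly draw a "future" observation from the current predictive rule, here the simplest possible choice of the empirical predictive, and read off a functional of the completed sample as one posterior draw. The choice of rule, horizon, and functional below are illustrative, not the paper's:

```python
import numpy as np

def predictive_resample(data, horizon, rng):
    """One forward pass: extend the sample by drawing each 'future' point from the
    empirical predictive of everything seen so far."""
    sample = list(data)
    for _ in range(horizon):
        sample.append(sample[rng.integers(len(sample))])
    return np.array(sample)

rng = np.random.default_rng(5)
data = rng.normal(1.0, 2.0, size=50)
# One posterior draw for the mean per forward pass.
draws = [predictive_resample(data, horizon=2000, rng=rng).mean() for _ in range(200)]
print(f"posterior mean ~ {np.mean(draws):.3f}, posterior sd ~ {np.std(draws):.3f}")
```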
We investigate convergence of alternating Bregman projections between non-convex sets and prove convergence to a point in the intersection, or to points realizing a gap between the two sets. The speed of convergence is generally sub-linear, but may be linear under transversality. We apply our analysis to prove convergence of versions of the expectation maximization algorithm for non-convex parameter sets.
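A compact Python sketch of alternating projections between a non-convex set (a sphere) and a line, using the squared Euclidean distance, the simplest Bregman divergence; the general Bregman setting and the EM application treated in the paper are not reproduced:

```python
import numpy as np

def proj_sphere(x, radius=1.0):
    """Closest point on the sphere of given radius (a non-convex set)."""
    return radius * x / np.linalg.norm(x)

def proj_line(x, direction=np.array([1.0, 2.0])):
    """Orthogonal projection onto the line through the origin with given direction."""
    d = direction / np.linalg.norm(direction)
    return (x @ d) * d

x = np.array([3.0, -1.0])
for _ in range(100):
    x = proj_line(proj_sphere(x))      # alternate projections between the two sets
print(x, np.linalg.norm(x))            # converges to a point in the intersection
```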
We study the problem of efficiency under $\alpha$-local differential privacy ($\alpha$-LDP) in both discrete and continuous settings. Building on a factorization lemma, which shows that any privacy mechanism can be decomposed into an extremal mechanism followed by additional randomization, we reduce the Fisher information maximization problem to a search over extremal mechanisms. The representation of extremal mechanisms requires working in infinite-dimensional spaces and invokes advanced tools from convex and functional analysis, such as Choquet's theorem. Our analysis establishes matching upper and lower bounds on the Fisher information in the high-privacy regime ($\alpha \to 0$), and proves that the maximization problem always admits a solution for any $\alpha$. As a concrete application, we consider the problem of estimating the parameter of a uniform distribution on $[0, \theta]$ under $\alpha$-LDP. Guided by our theoretical findings, we design an extremal mechanism that yields a consistent and asymptotically efficient estimator in the high-privacy regime. Numerical experiments confirm our theoretical results.
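For orientation, a Python sketch of the classical binary randomized-response mechanism, the textbook example of an extremal $\alpha$-LDP mechanism, together with the standard debiased frequency estimator; the mechanism designed in the paper for the uniform-distribution problem is different and is not reproduced here:

```python
import numpy as np

def randomized_response(bits, alpha, rng):
    """Report each bit truthfully with probability e^alpha / (1 + e^alpha) and flip it
    otherwise; the likelihood ratio of any output is at most e^alpha, i.e. alpha-LDP."""
    p_true = np.exp(alpha) / (1.0 + np.exp(alpha))
    flip = rng.random(len(bits)) > p_true
    return np.where(flip, 1 - bits, bits)

rng = np.random.default_rng(6)
alpha = 0.5                                        # high-privacy regime corresponds to alpha -> 0
bits = (rng.random(100_000) < 0.3).astype(int)     # true proportion is 0.3
reported = randomized_response(bits, alpha, rng)

p_true = np.exp(alpha) / (1.0 + np.exp(alpha))
estimate = (reported.mean() - (1 - p_true)) / (2 * p_true - 1)   # unbiased debiasing
print(f"debiased estimate of the proportion: {estimate:.3f}")
```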
In this study, we derive the exact distribution and moments of the noncentral complex Roy's largest root statistic, expressed as a product of complex zonal polynomials. We show that the linearization coefficients arising from the product of complex zonal polynomials in the distribution of Roy's test under a specific alternative hypothesis can be explicitly computed using Pieri's formula, a well-known result in combinatorics. These results are then applied to compute the power of tests in the complex multivariate analysis of variance (MANOVA).
In the setting of multiple testing, compound p-values generalize p-values by asking for superuniformity to hold only \emph{on average} across all true nulls. We study the properties of the Benjamini--Hochberg procedure applied to compound p-values. Under independence, we show that the false discovery rate (FDR) is at most $1.93\alpha$, where $\alpha$ is the nominal level, and exhibit a distribution for which the FDR is $\frac{7}{6}\alpha$. If additionally all nulls are true, then the upper bound can be improved to $\alpha + 2\alpha^2$, with a corresponding worst-case lower bound of $\alpha + \alpha^2/4$. Under positive dependence, on the other hand, we demonstrate that FDR can be inflated by a factor of $O(\log m)$, where~$m$ is the number of hypotheses. We provide numerous examples of settings where compound p-values arise in practice, either because we lack sufficient information to compute non-trivial p-values, or to facilitate a more powerful analysis.
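A short Python sketch of the Benjamini--Hochberg step-up procedure that these results concern; the procedure itself is unchanged when fed compound p-values, only its FDR guarantee changes as described above:

```python
import numpy as np
from scipy.stats import norm

def benjamini_hochberg(pvals, alpha=0.05):
    """Standard BH step-up: reject the k smallest p-values, where k is the largest
    index with p_(k) <= alpha * k / m."""
    m = len(pvals)
    order = np.argsort(pvals)
    sorted_p = np.asarray(pvals)[order]
    below = np.nonzero(sorted_p <= alpha * np.arange(1, m + 1) / m)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        reject[order[: below[-1] + 1]] = True
    return reject

rng = np.random.default_rng(7)
# 900 true nulls and 100 signals (one-sided z-tests), purely for illustration.
z = np.concatenate([rng.standard_normal(900), rng.normal(3.0, 1.0, 100)])
pvals = norm.sf(z)
print(benjamini_hochberg(pvals, alpha=0.1).sum(), "rejections")
```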
In information geometry, statistical models are considered as differentiable manifolds, where each probability distribution represents a unique point on the manifold. A Riemannian metric can be systematically obtained from a divergence function using Eguchi's theory (1992); the well-known Fisher-Rao metric is obtained from the Kullback-Leibler (KL) divergence. The geometric derivation of the classical Cram\'er-Rao Lower Bound (CRLB) by Amari and Nagaoka (2000) is based on this metric. In this paper, we study a Riemannian metric obtained by applying Eguchi's theory to the Basu-Harris-Hjort-Jones (BHHJ) divergence (1998) and derive a generalized Cram\'er-Rao bound using Amari-Nagaoka's approach. There are potential applications for this bound in robust estimation.
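For reference, the BHHJ (density power) divergence of Basu, Harris, Hjort and Jones (1998) with tuning parameter $\beta > 0$, from which the Riemannian metric studied here is obtained via Eguchi's construction, is
$$ d_\beta(g, f) \;=\; \int \Bigl\{ f(x)^{1+\beta} - \Bigl(1 + \tfrac{1}{\beta}\Bigr) g(x)\, f(x)^{\beta} + \tfrac{1}{\beta}\, g(x)^{1+\beta} \Bigr\}\, dx , $$
which recovers the Kullback-Leibler divergence in the limit $\beta \to 0$.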