Identifying areas where the signal is prominent is an important task in image analysis, with particular applications in brain mapping. In this work, we develop confidence regions for spatial excursion sets above and below a given level. We achieve this by treating the confidence procedure as a testing problem at the given level, allowing control of the False Discovery Rate (FDR). Methods are developed to control the FDR, separately for positive and negative excursions, as well as jointly over both. Furthermore, power is increased by incorporating a two-stage adaptive procedure. Simulation results with various signals show that our confidence regions successfully control the FDR at or below the nominal level. We showcase our methods with an application to functional magnetic resonance imaging (fMRI) data from the Human Connectome Project, illustrating the improvement in statistical power over existing approaches.
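As a rough illustration of the FDR-controlled excursion idea (not the paper's joint or adaptive procedure), the sketch below tests $H_0\colon \mu(v) \le c$ at each voxel of a noisy map and applies the Benjamini-Hochberg step-up rule; the independent Gaussian noise model, known variance, and the helper name excursion_set_bh are illustrative assumptions.

import numpy as np
from scipy import stats

def excursion_set_bh(z_map, c, sigma=1.0, alpha=0.05):
    """Minimal sketch: estimate the excursion set {mu > c} from a noisy map
    by testing H0: mu <= c at every voxel and applying Benjamini-Hochberg.
    Assumes independent Gaussian noise with known standard deviation sigma."""
    z = (z_map - c) / sigma                      # standardized exceedance
    pvals = stats.norm.sf(z).ravel()             # one-sided p-values
    m = pvals.size
    order = np.argsort(pvals)
    thresh = alpha * np.arange(1, m + 1) / m     # BH step-up thresholds
    below = pvals[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True                     # voxels declared above level c
    return reject.reshape(z_map.shape)

# toy example: a bump of height 2 in a noisy 32x32 image, level c = 1
rng = np.random.default_rng(0)
signal = np.zeros((32, 32)); signal[10:20, 10:20] = 2.0
noisy = signal + rng.normal(size=signal.shape)
print(excursion_set_bh(noisy, c=1.0).sum(), "voxels declared above the level")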
Quantum channel discrimination has been studied from an information-theoretic perspective, wherein one is interested in the optimal decay rate of error probabilities as a function of the number of unknown channel accesses. In this paper, we study the query complexity of quantum channel discrimination, wherein the goal is to determine the minimum number of channel uses needed to reach a desired error probability. To this end, we show that the query complexity of binary channel discrimination depends logarithmically on the inverse error probability and inversely on the negative logarithm of the (geometric and Holevo) channel fidelity. As a special case of these findings, we precisely characterize the query complexity of discriminating between two classical channels. We also provide lower and upper bounds on the query complexity of binary asymmetric channel discrimination and multiple quantum channel discrimination. For the former, the query complexity depends on the geometric R\'enyi and Petz R\'enyi channel divergences, while for the latter, it depends on the negative logarithm of (geometric and Uhlmann) channel fidelity. For multiple channel discrimination, the upper bound scales as the logarithm of the number of channels.
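Schematically (notation ours, up to the constants and the precise fidelity quantity in the paper), the stated scaling for the binary symmetric case reads
\[
  n^{*}(\varepsilon) \;\asymp\; \frac{\log(1/\varepsilon)}{-\log F(\mathcal{N}_{1},\mathcal{N}_{2})},
\]
where $n^{*}(\varepsilon)$ is the number of channel uses needed to reach error probability $\varepsilon$ and $F$ denotes a (geometric or Holevo) channel fidelity between the two channels $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$.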
ROCFTP is a perfect sampling algorithm that employs various random operations, and it requires a specific Markov chain construction for each target. To remove this requirement, the Metropolis algorithm is incorporated as a random operation within ROCFTP. Although the Metropolis sampler functions as a random operation, it is not a coupler; however, by employing a normal multishift coupler as a symmetric proposal for Metropolis, we obtain ROCFTP with Metropolis-multishift. ROCFTP was initially designed for bounded state spaces; its applicability is extended to targets with unbounded state spaces through the introduction of the Most Interest Range (MIR) for practical use. We demonstrate that selecting the MIR reduces the probability that ROCFTP hits $MIR^C$ by a factor of $(1 - \epsilon)$, which is beneficial for practical implementation. The algorithm exhibits a convergence rate characterized by exponential decay. Its performance is rigorously evaluated across various targets, and goodness-of-fit tests confirm the quality of the generated samples. Lastly, an R package is provided for generating exact samples using ROCFTP Metropolis-multishift.
We consider a new statistical model called the circulant correlation structure model, which is a multivariate Gaussian model with unknown covariance matrix and has a scale-invariance property. We construct shrinkage priors for the circulant correlation structure models and show that Bayesian predictive densities based on those priors asymptotically dominate Bayesian predictive densities based on Jeffreys priors under the Kullback-Leibler (KL) risk function. While shrinkage of eigenvalues of covariance matrices of Gaussian models has been successful, the proposed priors shrink a non-eigenvalue part of covariance matrices.
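For illustration (our notation, not necessarily the paper's exact parametrization), a symmetric circulant correlation matrix in dimension four has each row equal to a cyclic shift of the previous one,
\[
  R(\rho_{1},\rho_{2}) \;=\;
  \begin{pmatrix}
    1 & \rho_{1} & \rho_{2} & \rho_{1}\\
    \rho_{1} & 1 & \rho_{1} & \rho_{2}\\
    \rho_{2} & \rho_{1} & 1 & \rho_{1}\\
    \rho_{1} & \rho_{2} & \rho_{1} & 1
  \end{pmatrix},
\]
so the covariance structure is determined by a small number of circular lag correlations rather than by its eigenvalues alone.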
How do we interpret the differential privacy (DP) guarantee for network data? We take a deep dive into a popular form of network DP ($\varepsilon$--edge DP) to find that many of its common interpretations are flawed. Drawing on prior work for privacy with correlated data, we interpret DP through the lens of adversarial hypothesis testing and demonstrate a gap between the pairs of hypotheses actually protected under DP (tests of complete networks) and the sorts of hypotheses implied to be protected by common claims (tests of individual edges). We demonstrate some conditions under which this gap can be bridged, while leaving some questions open. While some discussion is specific to edge DP, we offer selected results in terms of abstract DP definitions and provide discussion of the implications for other forms of network DP.
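For reference, the standard form of $\varepsilon$-edge DP discussed above (notation ours): for any two networks $G$, $G'$ differing in a single edge and any measurable output set $S$,
\[
  \Pr[\mathcal{A}(G) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{A}(G') \in S],
\]
where $\mathcal{A}$ is the randomized release mechanism. The gap discussed above is between this guarantee, phrased over complete networks, and informal claims phrased as protection of individual edges.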
Markov chain sampling methods form the backbone of modern computational statistics. However, many popular methods are prone to random walk behavior, i.e., diffusion-like exploration of the sample space, leading to slow mixing that requires intricate tuning to alleviate. Non-reversible samplers can resolve some of these issues. We introduce a device that turns a jump process satisfying a skew-detailed balance condition for a reference measure into a process that samples a target measure that is absolutely continuous with respect to the reference measure. The resulting sampler is rejection-free, non-reversible, and continuous-time. As an example, we apply the device to Hamiltonian dynamics discretized by the leapfrog integrator, resulting in a rejection-free, non-reversible, continuous-time version of Hamiltonian Monte Carlo (HMC). We prove the geometric ergodicity of the resulting sampler under certain convexity conditions, and demonstrate its qualitatively different behavior from HMC through numerical examples.
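For concreteness, the following sketch shows the standard leapfrog discretization of Hamiltonian dynamics referenced above; it is the usual HMC ingredient, not the paper's non-reversible rejection-free device, and the function names are ours.

import numpy as np

def leapfrog(q, p, grad_log_target, step_size, n_steps):
    """Minimal sketch of the leapfrog integrator for Hamiltonian dynamics.
    grad_log_target returns the gradient of the log target density at q."""
    q, p = np.copy(q), np.copy(p)
    p += 0.5 * step_size * grad_log_target(q)        # initial half step in momentum
    for _ in range(n_steps - 1):
        q += step_size * p                           # full step in position
        p += step_size * grad_log_target(q)          # full step in momentum
    q += step_size * p
    p += 0.5 * step_size * grad_log_target(q)        # final half step in momentum
    return q, p

# toy usage: standard normal target, whose grad log density is -q
q_new, p_new = leapfrog(np.array([1.0]), np.array([0.5]),
                        lambda q: -q, step_size=0.1, n_steps=20)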
In this paper, the author introduces new methods to construct Archimedean copulas. The generator of each copula fulfills the sufficient conditions regarding the boundary behavior and being continuous, decreasing, and convex. Each inverse generator also fulfills the necessary conditions regarding the boundary conditions, marginal uniformity, and the 2-increasing property. Although these copulas satisfy these conditions, they have some limitations: they do not cover the entire dependency spectrum, ranging from perfect negative dependence through independence to perfect positive dependence.
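To make the construction concrete, the sketch below applies the generic Archimedean recipe $C(u,v)=\psi^{-1}(\psi(u)+\psi(v))$ with the classical Clayton generator as a stand-in; the paper's new generators are not reproduced here, and all function names are illustrative.

import numpy as np

def clayton_generator(t, theta):
    """Clayton generator psi(t) = (t**(-theta) - 1) / theta, theta > 0."""
    return (t ** (-theta) - 1.0) / theta

def clayton_generator_inv(s, theta):
    """Inverse generator psi^{-1}(s) = (1 + theta * s)**(-1/theta)."""
    return (1.0 + theta * s) ** (-1.0 / theta)

def archimedean_copula(u, v, theta):
    """Generic Archimedean construction C(u, v) = psi^{-1}(psi(u) + psi(v)),
    shown here with the Clayton generator as an example."""
    return clayton_generator_inv(
        clayton_generator(u, theta) + clayton_generator(v, theta), theta)

# example: positive dependence strengthens as theta grows
print(archimedean_copula(0.3, 0.6, theta=0.5))
print(archimedean_copula(0.3, 0.6, theta=5.0))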
Compositional graphoids are fundamental discrete structures which appear in probabilistic reasoning, particularly in the area of graphical models. They are semigraphoids which satisfy the Intersection and Composition properties. These important properties, however, are not enjoyed by general probability distributions. We survey what is known in terms of sufficient conditions for Intersection and Composition and derive a set of new sufficient conditions in the context of discrete random variables based on conditional information inequalities for Shannon entropies.
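In standard conditional-independence notation, for disjoint variable sets $X$, $Y$, $Z$, $W$, the two properties read
\begin{align*}
  \text{Intersection: } & X \perp\!\!\!\perp Y \mid (Z,W) \ \wedge\ X \perp\!\!\!\perp W \mid (Z,Y)
      \;\Longrightarrow\; X \perp\!\!\!\perp (Y,W) \mid Z, \\
  \text{Composition: }  & X \perp\!\!\!\perp Y \mid Z \ \wedge\ X \perp\!\!\!\perp W \mid Z
      \;\Longrightarrow\; X \perp\!\!\!\perp (Y,W) \mid Z.
\end{align*}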
The population intervention indirect effect (PIIE) is a novel mediation effect representing the indirect component of the population intervention effect. Unlike traditional mediation measures, such as the natural indirect effect, the PIIE holds particular relevance in observational studies involving unethical exposures, when hypothetical interventions that impose harmful exposures are inappropriate. Although prior research has identified PIIE under unmeasured confounders between exposure and outcome, it has not fully addressed the confounding that affects the mediator. This study extends the PIIE identification to settings where unmeasured confounders influence exposure-outcome, exposure-mediator, and mediator-outcome relationships. Specifically, we leverage observed covariates as proxy variables for unmeasured confounders, constructing three proximal identification frameworks. Additionally, we characterize the semiparametric efficiency bound and develop multiply robust and locally efficient estimators. To handle high-dimensional nuisance parameters, we propose a debiased machine learning approach that achieves $\sqrt{n}$-consistency and asymptotic normality to estimate the true PIIE values, even when the machine learning estimators for the nuisance functions do not converge at $\sqrt{n}$-rate. In simulations, our estimators demonstrate higher confidence interval coverage rates than conventional methods across various model misspecifications. In a real data application, our approaches reveal an indirect effect of alcohol consumption on depression risk mediated by depersonalization symptoms.
Many statistical problems can be reduced to a linear inverse problem in which only a noisy version of the operator is available. Particular examples include random design regression, the deconvolution problem, instrumental variable regression, functional data analysis, errors-in-variables regression, drift estimation in stochastic diffusion, and many others. The pragmatic plug-in approach can be well justified in the classical asymptotic setup with a growing sample size. However, recent developments in high-dimensional inference reveal some new features of this problem. In high-dimensional linear regression with a random design, the plug-in approach is questionable, but the use of a simple ridge penalization yields a benign overfitting phenomenon; see \cite{baLoLu2020}, \cite{ChMo2022}, \cite{NoPuSp2024}. This paper revisits the general Error-in-Operator problem for finite samples and high-dimensional source and image spaces. A particular focus is on the choice of a proper regularization. We show that a simple ridge penalty (Tikhonov regularization) works properly in the case when the operator is more regular than the signal. In the opposite case, a model reduction technique such as spectral truncation should be applied.
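A minimal numerical sketch of the plug-in ridge (Tikhonov) estimator discussed above, assuming a simple entrywise-noise model for the observed operator; the dimensions, noise levels, and the helper name ridge_plug_in are illustrative assumptions, not the paper's setup.

import numpy as np

def ridge_plug_in(A_hat, y, lam):
    """Plug-in Tikhonov (ridge) estimator: the noisy operator A_hat replaces
    the true operator A, and x_hat = (A_hat^T A_hat + lam I)^{-1} A_hat^T y."""
    p = A_hat.shape[1]
    return np.linalg.solve(A_hat.T @ A_hat + lam * np.eye(p), A_hat.T @ y)

# toy example: true operator observed with entrywise noise
rng = np.random.default_rng(1)
n, p = 200, 50
A = rng.normal(size=(n, p))
x_true = rng.normal(size=p) / np.sqrt(p)
y = A @ x_true + 0.1 * rng.normal(size=n)
A_hat = A + 0.1 * rng.normal(size=(n, p))        # error in the operator
x_hat = ridge_plug_in(A_hat, y, lam=1.0)
print(np.linalg.norm(x_hat - x_true))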
False discovery rate (FDR) has been a key metric for error control in multiple hypothesis testing, and many methods have been developed for FDR control across a diverse cross-section of settings and applications. We develop a closure principle for all FDR controlling procedures, i.e., we provide a characterization based on e-values for all admissible FDR controlling procedures. We leverage this idea to formulate the closed eBH procedure, a (usually strict) improvement over the eBH procedure for FDR control when provided with e-values. We demonstrate the practical performance of closed eBH in simulations.
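For reference, a minimal sketch of the plain eBH benchmark that the closed eBH procedure improves upon; the example e-values and the helper name ebh are illustrative.

import numpy as np

def ebh(e_values, alpha):
    """Base eBH procedure: with e-values e_1,...,e_n, reject the hypotheses
    with the k largest e-values, where k is the largest integer such that the
    k-th largest e-value is at least n / (alpha * k)."""
    e = np.asarray(e_values, dtype=float)
    n = e.size
    order = np.argsort(e)[::-1]                  # indices sorted by decreasing e-value
    ks = np.arange(1, n + 1)
    ok = e[order] >= n / (alpha * ks)
    k = np.max(np.nonzero(ok)[0]) + 1 if ok.any() else 0
    reject = np.zeros(n, dtype=bool)
    reject[order[:k]] = True
    return reject

# example: three large e-values among ten hypotheses, level alpha = 0.1
print(ebh([50, 1, 0.5, 120, 2, 0.3, 80, 1, 0.7, 0.2], alpha=0.1))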
We investigate Bayesian posterior consistency in the context of parametric models with proper priors. While typical state-of-the-art approaches rely on regularity conditions that are difficult to verify and often misaligned with the actual mechanisms driving posterior consistency, we propose an alternative framework centered on a simple yet general condition we call "sequential identifiability". This concept strengthens the usual identifiability assumption by requiring that any sequence of parameters whose induced distributions converge to the true data-generating distribution must itself converge to the true parameter value. We demonstrate that sequential identifiability, combined with a standard Kullback--Leibler prior support condition, is sufficient to ensure posterior consistency. Moreover, we show that failure of this condition necessarily entails a specific and pathological form of oscillations of the model around the true density, which cannot exist without intentional design. This leads to the important insight that posterior inconsistency may be safely ruled out, except in the unrealistic scenario where the modeler possesses precise knowledge of the data-generating distribution and deliberately incorporates oscillatory pathologies into the model targeting the corresponding density. Taken together, these results provide a unified perspective on both consistency and inconsistency in parametric settings, significantly expanding the class of models for which posterior consistency can be rigorously established. To illustrate the strength and versatility of our framework, we construct a one-dimensional model that violates standard regularity conditions and fails to admit a consistent maximum likelihood estimator, yet supports a complete posterior consistency analysis via our approach.
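Written out in symbols (notation ours, with the mode of distributional convergence left unspecified here), sequential identifiability of a parametric family $\{P_\theta\}$ at the true value $\theta_0$ requires
\[
  P_{\theta_n} \longrightarrow P_{\theta_0}
  \quad\Longrightarrow\quad
  \theta_n \longrightarrow \theta_0 ,
\]
i.e. convergence of the induced distributions to the true data-generating distribution forces convergence of the parameters themselves.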
In this paper, we consider asymptotics of the optimal value and the optimal solutions of parametric minimax estimation problems. Specifically, we consider estimators of the optimal value and the optimal solutions in a sample minimax problem that approximates the true population problem and study the limiting distributions of these estimators as the sample size tends to infinity. The main technical tool we employ in our analysis is the theory of sensitivity analysis of parameterized mathematical optimization problems. Our results go well beyond the existing literature and show that these limiting distributions are highly non-Gaussian in general and normal in simple specific cases. These results open up the way for the development of statistical inference methods in parametric minimax problems.
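Schematically (notation ours), the population problem and its sample counterpart can be written as
\[
  \min_{x \in X}\ \max_{y \in Y}\ \mathbb{E}\,f(x, y, \xi)
  \qquad\text{vs.}\qquad
  \min_{x \in X}\ \max_{y \in Y}\ \frac{1}{n}\sum_{i=1}^{n} f(x, y, \xi_i),
\]
and the paper studies the limiting distributions of the sample optimal value and optimal solutions as $n \to \infty$.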
This paper addresses the problem of model selection in the sequence model $Y=\theta+\varepsilon\xi$, when $\xi$ is sub-Gaussian, for non-Euclidean loss functions. In this model, the Penalized Comparison to Overfitting procedure is studied for the weighted $\ell_p$-loss, $p\geq 1.$ Several oracle inequalities are derived from concentration inequalities for sub-Weibull variables. Using judicious collections of models and penalty terms, minimax rates of convergence are stated for Besov bodies $\mathcal{B}_{r,\infty}^s$. These results are applied to the functional model of nonparametric regression.
The notion of relative universality with respect to a $\sigma$-field was introduced to establish the unbiasedness and Fisher consistency of an estimator in nonlinear sufficient dimension reduction. However, there is a gap in the proof of this result in the existing literature. The existing definition of relative universality seems to be too strong for the proof to be valid. In this note we modify the definition of relative universality using the concept of \k{o}-measurability, and rigorously establish the mentioned unbiasedness and Fisher consistency. The significance of this result is beyond its original context of sufficient dimension reduction, because relative universality allows us to use the regression operator to fully characterize conditional independence, a crucially important statistical relation that sits at the core of many areas and methodologies in statistics and machine learning, such as dimension reduction, graphical models, probability embedding, causal inference, and Bayesian estimation.
We study estimation and inference for the mean of real-valued random functions defined on a hypercube. The independent random functions are observed on a discrete, random subset of design points, possibly with heteroscedastic noise. We propose a novel optimal-rate estimator based on Fourier series expansions and establish a sharp non-asymptotic error bound in $L^2$-norm. Additionally, we derive a non-asymptotic Gaussian approximation bound for our estimated Fourier coefficients. Pointwise and uniform confidence sets are constructed. Our approach is made adaptive by a plug-in estimator for the H\"older regularity of the mean function, for which we derive non-asymptotic concentration bounds.
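As a simple one-dimensional illustration (not the paper's exact estimator), the sketch below pools all observations, estimates the Fourier coefficients of the mean function by empirical averages under a uniform-design assumption, and reconstructs the mean by a truncated Fourier series; the truncation level K and the function names are illustrative.

import numpy as np

def fourier_mean_estimate(X, Y, K):
    """Estimate the mean function on [0,1] from noisy curves observed at random
    design points by projecting the pooled observations onto a Fourier basis."""
    X, Y = np.ravel(X), np.ravel(Y)
    coefs = {0: np.mean(Y)}                                 # constant basis function
    for k in range(1, K + 1):
        coefs[(k, "cos")] = np.mean(Y * np.sqrt(2) * np.cos(2 * np.pi * k * X))
        coefs[(k, "sin")] = np.mean(Y * np.sqrt(2) * np.sin(2 * np.pi * k * X))
    def m_hat(t):
        out = coefs[0] * np.ones_like(t)
        for k in range(1, K + 1):
            out += coefs[(k, "cos")] * np.sqrt(2) * np.cos(2 * np.pi * k * t)
            out += coefs[(k, "sin")] * np.sqrt(2) * np.sin(2 * np.pi * k * t)
        return out
    return m_hat

# toy data: 100 curves, 20 design points each, mean function sin(2*pi*x)
rng = np.random.default_rng(2)
X = rng.uniform(size=(100, 20))
Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.5, size=X.shape)
print(fourier_mean_estimate(X, Y, K=3)(np.array([0.25])))   # roughly 1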
Motivated by small bandwidth asymptotics for kernel-based semiparametric estimators in econometrics, this paper establishes Gaussian approximation results for high-dimensional fixed-order $U$-statistics whose kernels depend on the sample size. Our results allow for a situation where the dominant component of the Hoeffding decomposition is absent or unknown, including cases with known degrees of degeneracy as special forms. The obtained error bounds for Gaussian approximations are sharp enough to almost recover the weakest bandwidth condition of small bandwidth asymptotics in the fixed-dimensional setting when applied to a canonical semiparametric estimation problem. We also present an application to adaptive goodness-of-fit testing, along with discussions about several potential applications.
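For reference (notation ours), a fixed-order $U$-statistic of order $r$ with a sample-size-dependent kernel $h_n$ has the form
\[
  U_n \;=\; \binom{n}{r}^{-1} \sum_{1 \le i_1 < \cdots < i_r \le n}
            h_n\!\left(X_{i_1}, \ldots, X_{i_r}\right),
\]
and the Hoeffding decomposition splits $U_n$ into mutually orthogonal projections of increasing degree, the first of which may be absent (degeneracy) or of unknown order of magnitude.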
In this paper, we study the problem of testing the equality of two multivariate distributions. One class of tests used for this purpose utilizes geometric graphs constructed using inter-point distances. The asymptotic theory of these tests so far applies only to graphs which fall under the stabilizing graphs framework of Penrose and Yukich. We study the case of the $K$-nearest neighbors graph where $K=k_N$ increases with the sample size, which does not fall under the stabilizing graphs framework. Our main result gives detection thresholds for this test in parametrized families when $k_N = o(N^{1/4})$, thus extending the family of graphs where the theoretical behavior is known. We propose a 2-sided version of the test which removes an exponent gap that plagues the 1-sided test. Our result also shows that using a greater number of nearest neighbors boosts the power of the test. This provides theoretical justification for using denser graphs in testing equality of two distributions.
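As a concrete illustration of the graph-based idea (a Schilling/Henze-type count, not necessarily the paper's exact statistic), the sketch below pools the two samples, builds the $k$-nearest-neighbor graph from Euclidean distances, and reports the fraction of neighbor pairs joining points from the same sample; the function name and toy data are illustrative.

import numpy as np

def knn_same_sample_fraction(X, Y, k):
    """Pool the samples, find each point's k nearest neighbors, and return the
    fraction of neighbor pairs that join points from the same sample. Values
    well above the chance level suggest the two distributions differ."""
    Z = np.vstack([X, Y])
    labels = np.r_[np.zeros(len(X)), np.ones(len(Y))]
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                   # a point is not its own neighbor
    nn = np.argsort(D, axis=1)[:, :k]             # indices of the k nearest neighbors
    same = labels[nn] == labels[:, None]
    return same.mean()

# example: same distribution vs. a shifted distribution
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
print(knn_same_sample_fraction(X, rng.normal(size=(100, 5)), k=5))           # near 0.5
print(knn_same_sample_fraction(X, rng.normal(loc=1.5, size=(100, 5)), k=5))  # near 1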