This note introduces a unified theory for causal inference that integrates Riesz regression, covariate balancing, density-ratio estimation (DRE), targeted maximum likelihood estimation (TMLE), and the matching estimator in average treatment effect (ATE) estimation. In ATE estimation, the balancing weights and the regression functions of the outcome play important roles, where the balancing weights are referred to as the Riesz representer, bias-correction term, or clever covariates, depending on the context. Riesz regression, covariate balancing, DRE, and the matching estimator are methods for estimating the balancing weights, where Riesz regression is essentially equivalent to DRE in the ATE context, the matching estimator is a special case of DRE, and DRE is in a dual relationship with covariate balancing. TMLE is a method for constructing regression function estimators such that the leading bias term becomes zero. In particular, nearest neighbor matching is equivalent to least-squares density-ratio estimation and, hence, to Riesz regression.
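A minimal sketch of the shared structure described above, under illustrative assumptions not taken from the note (a linear feature map, a synthetic data-generating process, and a small ridge penalty): the balancing weights (Riesz representer) are estimated by Riesz regression, i.e. by minimizing the empirical loss $\mathbb{E}[\alpha(D,X)^2 - 2\{\alpha(1,X) - \alpha(0,X)\}]$, and then combined with an outcome regression in a debiased (AIPW-type) ATE estimate.

```python
# Illustrative sketch only: feature map, data-generating process, and ridge penalty
# are assumptions, not the note's construction.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
e = 1 / (1 + np.exp(-X[:, 0]))            # true propensity score
D = rng.binomial(1, e)
Y = X @ np.array([1.0, -1.0, 0.5]) + 2.0 * D + rng.normal(size=n)

def phi(d, x):
    """Linear feature map in (d, x, d*x); purely illustrative."""
    d = np.full(x.shape[0], d) if np.isscalar(d) else d
    return np.column_stack([np.ones_like(d, dtype=float), d, x, d[:, None] * x])

# Riesz regression: alpha(d, x) = phi(d, x) @ beta minimizes
#   mean(alpha(D,X)^2) - 2 * mean(alpha(1,X) - alpha(0,X)) + ridge,
# which has the closed form beta = (Phi'Phi/n + lam I)^{-1} mean(phi(1,X) - phi(0,X)).
Phi = phi(D, X)
target = (phi(1, X) - phi(0, X)).mean(axis=0)
lam = 1e-3
beta = np.linalg.solve(Phi.T @ Phi / n + lam * np.eye(Phi.shape[1]), target)
alpha_hat = Phi @ beta

# Outcome regression g(d, x) by ordinary least squares on the same features.
gamma = np.linalg.lstsq(Phi, Y, rcond=None)[0]
g1, g0 = phi(1, X) @ gamma, phi(0, X) @ gamma

# Debiased (AIPW-type) ATE estimate: plug-in term plus bias-correction term.
ate = np.mean(g1 - g0 + alpha_hat * (Y - Phi @ gamma))
print(f"debiased ATE estimate: {ate:.3f} (true effect is 2.0)")
```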
The goal of policy learning is to train a policy function that recommends a treatment given covariates to maximize population welfare. There are two major approaches in policy learning: the empirical welfare maximization (EWM) approach and the plug-in approach. The EWM approach is analogous to a classification problem, where one first builds an estimator of the population welfare, which is a functional of policy functions, and then trains a policy by maximizing the estimated welfare. In contrast, the plug-in approach is based on regression, where one first estimates the conditional average treatment effect (CATE) and then recommends the treatment with the highest estimated outcome. This study bridges the gap between the two approaches by showing that both are based on essentially the same optimization problem. In particular, we prove an exact equivalence between EWM and least squares over a reparameterization of the policy class. As a consequence, the two approaches are interchangeable in several respects and share the same theoretical guarantees under common conditions. Leveraging this equivalence, we propose a novel regularization method for policy learning. Our findings yield a convex and computationally efficient training procedure that avoids the NP-hard combinatorial step typically required in EWM.
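The contrast between the two approaches can be illustrated with a stylized sketch over a one-dimensional threshold policy class; the doubly robust score, the known randomization probability, and the data-generating process below are illustrative assumptions, and the paper's exact reparameterization and equivalence result are not reproduced.

```python
# Illustrative contrast of plug-in vs EWM policy learning on the policy class
# pi_c(x) = 1{x >= c}; not the paper's construction.
import numpy as np

rng = np.random.default_rng(1)
n = 4000
x = rng.uniform(-1, 1, size=n)
e = 0.5                                   # known randomization probability
d = rng.binomial(1, e, size=n)
tau = x                                   # true CATE: treat only if x >= 0
y = 1.0 + tau * d + rng.normal(scale=0.5, size=n)

# Doubly robust (AIPW) score for each unit, using simple arm-specific regressions.
m1 = np.polyval(np.polyfit(x[d == 1], y[d == 1], 1), x)
m0 = np.polyval(np.polyfit(x[d == 0], y[d == 0], 1), x)
psi = m1 - m0 + d * (y - m1) / e - (1 - d) * (y - m0) / (1 - e)

# Plug-in approach: estimate the CATE and treat whenever it is nonnegative.
tau_hat = m1 - m0
policy_plugin = tau_hat >= 0

# EWM approach: maximize the empirical welfare over threshold policies.
grid = np.linspace(-1, 1, 201)
welfare = [np.mean(psi * (x >= c)) for c in grid]
c_star = grid[int(np.argmax(welfare))]
policy_ewm = x >= c_star

print(f"EWM threshold: {c_star:.2f} (oracle threshold is 0.00)")
print(f"agreement between the learned policies: {np.mean(policy_plugin == policy_ewm):.2%}")
```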
Consider the setting in which a researcher is interested in the causal effect of a treatment $Z$ on a duration time $T$, which is subject to right censoring. We assume that $T=\varphi(X,Z,U)$, where $X$ is a vector of baseline covariates, $\varphi(X,Z,U)$ is strictly increasing in the error term $U$ for each $(X,Z)$ and $U\sim \mathcal{U}[0,1]$. Therefore, the model is nonparametric and nonseparable. We propose nonparametric tests for the hypothesis that $Z$ is exogenous, meaning that $Z$ is independent of $U$ given $X$. The test statistics rely on an instrumental variable $W$ that is independent of $U$ given $X$. We assume that $X,W$ and $Z$ are all categorical. Test statistics are constructed for the hypothesis that the conditional rank $V_T= F_{T \mid X,Z}(T \mid X,Z)$ is independent of $(X,W)$ jointly. Under an identifiability condition on $\varphi$, this hypothesis is equivalent to $Z$ being exogenous. However, note that $V_T$ is censored by $V_C =F_{T \mid X,Z}(C \mid X,Z)$, which complicates the construction of the test statistics significantly. We derive the limiting distributions of the proposed tests and prove that our estimator of the distribution of $V_T$ converges to the uniform distribution at a rate faster than the usual parametric $n^{-1/2}$-rate. We demonstrate that the test statistics and bootstrap approximations for the critical values have a good finite sample performance in various Monte Carlo settings. Finally, we illustrate the tests with an empirical application to the National Job Training Partnership Act (JTPA) Study.
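A stylized sketch of the rank construction behind these tests, ignoring censoring (which the proposed statistics explicitly account for) and substituting a generic off-the-shelf independence check for the paper's test statistics; the discrete data-generating process and the Kruskal-Wallis check are assumptions.

```python
# Illustrative only: censoring is ignored and a generic independence check is used
# in place of the paper's test statistics.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n = 5000
x = rng.integers(0, 2, size=n)                  # binary covariate
w = rng.integers(0, 2, size=n)                  # binary instrument
z = rng.binomial(1, 0.3 + 0.4 * w)              # treatment driven by the instrument
u = rng.uniform(size=n)                         # exogenous case: U independent of Z
t = np.exp(x + 0.5 * z + u)                     # duration, strictly increasing in U

# Conditional ranks V_T = F_{T|X,Z}(T|X,Z), estimated cell by cell.
v = np.empty(n)
for xv in (0, 1):
    for zv in (0, 1):
        cell = (x == xv) & (z == zv)
        v[cell] = stats.rankdata(t[cell]) / cell.sum()

# Under exogeneity, V_T should be approximately uniform and independent of (X, W).
groups = [v[(x == xv) & (w == wv)] for xv in (0, 1) for wv in (0, 1)]
print("Kruskal-Wallis p-value across (X, W) cells:", round(stats.kruskal(*groups).pvalue, 3))
```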
The difference-in-differences (DID) research design is a key identification strategy which allows researchers to estimate causal effects under the parallel trends assumption. While the parallel trends assumption is counterfactual and cannot be tested directly, researchers often examine pre-treatment periods to check whether the time trends are parallel before treatment is administered. Recently, researchers have been cautioned against using preliminary tests which aim to detect violations of parallel trends in the pre-treatment period. In this paper, we argue that preliminary testing can -- and should -- play an important role within the DID research design. We propose a new and more substantively appropriate conditional extrapolation assumption, which requires an analyst to conduct a preliminary test to determine whether the severity of pre-treatment parallel trend violations falls below an acceptable level before extrapolation to the post-treatment period is justified. This stands in contrast to prior work which can be interpreted as either setting the acceptable level to be exactly zero (in which case preliminary tests lack power) or assuming that extrapolation is always justified (in which case preliminary tests are not required). Under mild assumptions on how close the actual violation is to the acceptable level, we provide a consistent preliminary test as well as confidence intervals which are valid when conditioned on the result of the test. The conditional coverage of these intervals overcomes a common critique made against the use of preliminary testing within the DID research design. We use real data as well as numerical simulations to illustrate the performance of the proposed methods.
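The role of an acceptable violation level can be sketched as follows; the single placebo contrast, the normal approximation, and the value of the threshold are illustrative assumptions rather than the paper's procedure.

```python
# Stylized sketch of a preliminary test against an acceptable violation level delta;
# the panel layout, placebo contrast, and delta are assumptions, not the paper's test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_treat, n_ctrl = 200, 200
# Outcomes in two pre-treatment periods (t = -2, -1) for each group.
y_treat = rng.normal(loc=[0.0, 0.05], scale=1.0, size=(n_treat, 2))
y_ctrl = rng.normal(loc=[0.0, 0.00], scale=1.0, size=(n_ctrl, 2))

# Placebo DID contrast: difference in pre-period trends between groups.
trend_treat = y_treat[:, 1] - y_treat[:, 0]
trend_ctrl = y_ctrl[:, 1] - y_ctrl[:, 0]
viol_hat = trend_treat.mean() - trend_ctrl.mean()
se = np.sqrt(trend_treat.var(ddof=1) / n_treat + trend_ctrl.var(ddof=1) / n_ctrl)

# Acceptable level of violation and a conservative one-sided check:
# extrapolate only if the upper confidence bound on |violation| is below delta.
delta = 0.25
upper_bound = abs(viol_hat) + stats.norm.ppf(0.95) * se
extrapolation_ok = upper_bound < delta
print(f"estimated pre-trend violation: {viol_hat:.3f} (se {se:.3f})")
print(f"95% upper bound {upper_bound:.3f} vs acceptable level {delta}: "
      f"{'extrapolate' if extrapolation_ok else 'do not extrapolate'}")
```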
This paper examines methods of causal inference based on groupwise matching when we observe multiple large groups of individuals over several periods. We formulate causal inference validity through a generalized matching condition, generalizing the parallel trend assumption in difference-in-differences designs. We show that difference-in-differences, synthetic control, and synthetic difference-in-differences designs are distinguished by the specific matching conditions that they invoke. Through regret analysis, we demonstrate that difference-in-differences and synthetic control with differencing are complementary; the former dominates the latter if and only if the latter's extrapolation error exceeds the former's matching error up to a term vanishing at the parametric rate. The analysis also reveals that synthetic control with differencing is equivalent to difference-in-differences when the parallel trend assumption holds for both the pre-treatment and post-treatment periods. We develop a statistical inference procedure based on synthetic control with differencing and present an empirical application demonstrating its usefulness.
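One stylized numerical reading of this comparison, under simplifying assumptions (a single treated group, simplex-constrained weights, matching on pre-treatment outcome changes) that may differ from the paper's exact constructions: difference-in-differences averages the control groups with equal weights, while synthetic control with differencing reweights them to match the treated group's pre-treatment changes.

```python
# Illustrative comparison only; the weighting scheme and panel are assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(10)
T_pre, T_post, n_ctrl = 8, 4, 10
time = np.linspace(0, 1, T_pre + T_post)
slopes = rng.uniform(0.5, 2.5, size=n_ctrl)                 # heterogeneous control trends
y_ctrl = slopes[:, None] * time + rng.normal(scale=0.05, size=(n_ctrl, T_pre + T_post))
y_treat = 2.0 * time + rng.normal(scale=0.05, size=T_pre + T_post)
y_treat[T_pre:] += 1.0                                       # true treatment effect = 1.0

# Difference-in-differences: equal weights on all control groups.
did = (y_treat[T_pre:].mean() - y_treat[:T_pre].mean()) \
    - (y_ctrl[:, T_pre:].mean() - y_ctrl[:, :T_pre].mean())

# Synthetic control with differencing: simplex weights matching pre-treatment changes.
d_treat, d_ctrl = np.diff(y_treat), np.diff(y_ctrl, axis=1)
loss = lambda w: np.sum((d_treat[:T_pre - 1] - w @ d_ctrl[:, :T_pre - 1]) ** 2)
res = minimize(loss, np.full(n_ctrl, 1 / n_ctrl), method="SLSQP",
               bounds=[(0, 1)] * n_ctrl,
               constraints=({"type": "eq", "fun": lambda w: w.sum() - 1},))
w = res.x
scd = (y_treat[T_pre:].mean() - y_treat[:T_pre].mean()) \
    - (w @ (y_ctrl[:, T_pre:].mean(axis=1) - y_ctrl[:, :T_pre].mean(axis=1)))

print(f"difference-in-differences          : {did:.2f}")
print(f"synthetic control with differencing: {scd:.2f}   (true effect is 1.0)")
```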
We study the statistical properties of nonparametric distance-based (isotropic) local polynomial regression estimators of the boundary average treatment effect curve, a key causal functional parameter capturing heterogeneous treatment effects in boundary discontinuity designs. We present necessary and/or sufficient conditions for identification, estimation, and inference in large samples, both pointwise and uniformly along the boundary. Our theoretical results highlight the crucial role played by the ``regularity'' of the boundary (a one-dimensional manifold) over which identification, estimation, and inference are conducted. Our methods are illustrated with simulated data. Companion general-purpose software is provided.
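A minimal sketch of the pointwise building block, a distance-based local linear fit on each side of the boundary at a chosen boundary point; the boundary, bandwidth, triangular kernel, and data-generating process are illustrative assumptions, and the uniform-along-the-boundary theory is not reproduced.

```python
# Illustrative pointwise estimator only; the papers average such fits along the boundary.
import numpy as np

rng = np.random.default_rng(3)
n = 5000
X = rng.uniform(-1, 1, size=(n, 2))
treated = X[:, 0] + X[:, 1] >= 0            # assignment boundary: x1 + x2 = 0
y = 1.0 + X[:, 0] - 0.5 * X[:, 1] + 0.8 * treated + rng.normal(scale=0.3, size=n)

def local_linear_at_boundary(b, h=0.3):
    """Difference of local-linear intercepts in distance to b, by treatment side."""
    dist = np.linalg.norm(X - b, axis=1)
    w = np.clip(1 - dist / h, 0, None)       # triangular kernel weights
    fits = {}
    for side, mask in {"treated": treated, "control": ~treated}.items():
        m = mask & (w > 0)
        Z = np.column_stack([np.ones(m.sum()), dist[m]])
        W = w[m]
        beta = np.linalg.solve(Z.T @ (Z * W[:, None]), Z.T @ (W * y[m]))
        fits[side] = beta[0]                  # intercept = limit of E[Y | x -> b, side]
    return fits["treated"] - fits["control"]

b = np.array([0.3, -0.3])                     # a point on the boundary x1 + x2 = 0
print(f"estimated effect at boundary point {b}: {local_linear_at_boundary(b):.3f} "
      f"(true effect is 0.8)")
```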
We introduce Agentic Economic Modeling (AEM), a framework that aligns synthetic LLM choices with small-sample human evidence for reliable econometric inference. AEM first generates task-conditioned synthetic choices via LLMs, then learns a bias-correction mapping from task features and raw LLM choices to human-aligned choices, upon which standard econometric estimators perform inference to recover demand elasticities and treatment effects. We validate AEM in two experiments. In a large-scale conjoint study with millions of observations, using only 10% of the original data to fit the correction model lowers the error of the demand-parameter estimates, while uncorrected LLM choices even increase the errors. In a regional field experiment, a mixture model calibrated on 10% of geographic regions estimates an out-of-domain treatment effect of $-65\pm10$ bps, closely matching the full human experiment ($-60\pm8$ bps). Under time-wise extrapolation, training with only day-one human data yields $-24$ bps (95% CI: $[-26, -22]$, $p<10^{-5}$), improving over the human-only day-one baseline ($-17$ bps, 95% CI: $[-43, +9]$, $p=0.2049$). These results demonstrate AEM's potential to improve RCT efficiency and establish a foundation method for LLM-based counterfactual generation.
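A schematic sketch of the bias-correction step, not the AEM implementation: a correction model fit on a small human-labeled subsample maps task features and raw LLM choices to human-aligned choices, which then feed a standard downstream estimator. The single price feature, the logistic models, the biased-simulator stand-in for the LLM, and the 10% calibration split are assumptions.

```python
# Schematic only: features, models, and the simulated "LLM" are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 20000
price = rng.uniform(1, 5, size=n)                        # task feature (offered price)
p_human = 1 / (1 + np.exp(-(2.0 - 0.8 * price)))          # true human choice probability
human_choice = rng.binomial(1, p_human)
# Stand-in for raw LLM choices: systematically less price-sensitive than humans.
p_llm = 1 / (1 + np.exp(-(1.0 - 0.3 * price)))
llm_choice = rng.binomial(1, p_llm)

# Fit the bias-correction mapping on only 10% of the human data.
calib = rng.random(n) < 0.10
features = np.column_stack([price, llm_choice])
corrector = LogisticRegression().fit(features[calib], human_choice[calib])
aligned_prob = corrector.predict_proba(features)[:, 1]
aligned_choice = rng.binomial(1, aligned_prob)

# Downstream econometric step: recover the price coefficient by logit on each dataset.
def price_coef(choices):
    return LogisticRegression().fit(price.reshape(-1, 1), choices).coef_[0, 0]

print(f"price coefficient, human data     : {price_coef(human_choice):+.2f}")
print(f"price coefficient, raw LLM        : {price_coef(llm_choice):+.2f}")
print(f"price coefficient, bias-corrected : {price_coef(aligned_choice):+.2f}")
```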
We provide theoretical results for the estimation and inference of a class of welfare and value functionals of the nonparametric conditional average treatment effect (CATE) function under optimal treatment assignment, i.e., treatment is assigned to an observed type if and only if its CATE is nonnegative. For the optimal welfare functional defined as the average value of the CATE on the subpopulation with nonnegative CATE, we establish the $\sqrt{n}$ asymptotic normality of the semiparametric plug-in estimators and provide an analytical asymptotic variance formula. For more general value functionals, we show that the plug-in estimators are typically asymptotically normal at the 1-dimensional nonparametric estimation rate, and we provide a consistent variance estimator based on the sieve Riesz representer, as well as a proposed computational procedure for numerical integration on submanifolds. The key reason for the different convergence rates of the welfare functional versus general value functionals is that, on the boundary subpopulation for which the CATE is zero, the integrand vanishes for the welfare functional but not for general value functionals. We demonstrate in Monte Carlo simulations the good finite-sample performance of our estimation and inference procedures, and conduct an empirical application of our methods on the effectiveness of job training programs on earnings using the JTPA data set.
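A minimal sketch of the plug-in construction, taking the optimal welfare functional to be $\mathbb{E}[\tau(X)\,\mathbf{1}\{\tau(X)\ge 0\}]$ (a common formalization; the paper's variance formula and inference procedure are not reproduced), with arm-specific polynomial regressions standing in for the nonparametric CATE estimator.

```python
# Illustrative plug-in estimate only; the CATE estimator and DGP are assumptions.
import numpy as np

rng = np.random.default_rng(5)
n = 10000
x = rng.uniform(-1, 1, size=n)
d = rng.binomial(1, 0.5, size=n)
tau = x                                    # true CATE; nonnegative only for x >= 0
y = x ** 2 + tau * d + rng.normal(scale=0.5, size=n)

# Nonparametric-flavoured CATE estimate: separate polynomial fits by treatment arm.
m1 = np.polyval(np.polyfit(x[d == 1], y[d == 1], 3), x)
m0 = np.polyval(np.polyfit(x[d == 0], y[d == 0], 3), x)
tau_hat = m1 - m0

# Plug-in welfare under optimal assignment; dividing by P(tau_hat >= 0) would give the
# average CATE on the treated-by-policy subpopulation instead.
welfare_hat = np.mean(tau_hat * (tau_hat >= 0))
welfare_true = np.mean(tau * (tau >= 0))    # oracle value for comparison
print(f"plug-in optimal welfare: {welfare_hat:.3f} (oracle: {welfare_true:.3f})")
```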
We revisit the classical problem of comparing regression functions, a fundamental question in statistical inference with broad relevance to modern applications such as data integration, transfer learning, and causal inference. Existing approaches typically rely on smoothing techniques and are thus hindered by the curse of dimensionality. We propose a generalized notion of kernel-based conditional mean dependence that provides a new characterization of the null hypothesis of equal regression functions. Building on this reformulation, we develop two novel tests that leverage modern machine learning methods for flexible estimation. We establish the asymptotic properties of the test statistics, which hold under both fixed- and high-dimensional regimes. Unlike existing methods that often require restrictive distributional assumptions, our framework only imposes mild moment conditions. The efficacy of the proposed tests is demonstrated through extensive numerical studies.
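A stylized numerical illustration of the null characterization, not the proposed tests or their calibration: if the two regression functions coincide, residuals from a pooled fit have zero conditional mean given the covariate and the sample label, so a kernel-weighted quadratic form in those residuals is close to zero, and otherwise it is not. The Gaussian kernel, pooled polynomial fit, and data-generating processes are assumptions.

```python
# Illustrative statistic only; no critical values or asymptotics from the paper are used.
import numpy as np

rng = np.random.default_rng(6)

def cmd_statistic(shift):
    """Kernel conditional-mean-dependence style statistic for H0: m1 = m2."""
    n = 500
    x1, x2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
    y1 = np.sin(np.pi * x1) + rng.normal(scale=0.3, size=n)
    y2 = np.sin(np.pi * x2) + shift * x2 ** 2 + rng.normal(scale=0.3, size=n)
    x, y = np.concatenate([x1, x2]), np.concatenate([y1, y2])
    g = np.concatenate([np.zeros(n), np.ones(n)])
    resid = y - np.polyval(np.polyfit(x, y, 5), x)       # pooled regression residuals
    z = np.column_stack([x, g])
    sq = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    K = np.exp(-sq / np.median(sq[sq > 0]))              # Gaussian kernel, median heuristic
    return float(resid @ K @ resid) / (2 * n) ** 2

print(f"statistic when m1 = m2 : {cmd_statistic(shift=0.0):.4f}")
print(f"statistic when m1 != m2: {cmd_statistic(shift=1.0):.4f}")
```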
We develop a structural framework for modeling and inferring unobserved heterogeneity in dynamic panel-data models. Unlike methods treating clustering as a descriptive device, we model heterogeneity as arising from a latent clustering mechanism, where the number of clusters is unknown and estimated. Building on the mixture of finite mixtures (MFM) approach, our method avoids the clustering inconsistency issues of Dirichlet process mixtures and provides an interpretable representation of the population clustering structure. We extend the Telescoping Sampler of Fruhwirth-Schnatter et al. (2021) to dynamic panels with covariates, yielding an efficient MCMC algorithm that delivers full Bayesian inference and credible sets. We show that asymptotically the posterior distribution of the mixing measure contracts around the truth at parametric rates in Wasserstein distance, ensuring recovery of clustering and structural parameters. Simulations demonstrate strong finite-sample performance. Finally, an application to the income-democracy relationship reveals latent heterogeneity only when controlling for additional covariates.
This study proves that Nearest Neighbor (NN) matching can be interpreted as an instance of Riesz regression for automatic debiased machine learning. Lin et al. (2023) shows that NN matching is an instance of density-ratio estimation with their new density-ratio estimator. Chernozhukov et al. (2024) develops Riesz regression for automatic debiased machine learning, which directly estimates the Riesz representer (or equivalently, the bias-correction term) by minimizing the mean squared error. In this study, we first prove that the density-ratio estimation method proposed in Lin et al. (2023) is essentially equivalent to Least-Squares Importance Fitting (LSIF) proposed in Kanamori et al. (2009) for direct density-ratio estimation. Furthermore, we derive Riesz regression using the LSIF framework. Based on these results, we derive NN matching from Riesz regression. This study is based on our work Kato (2025a) and Kato (2025b).
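A small numerical sketch of the weighting view that underlies these equivalences: the 1-NN matching estimator of the ATT can be rewritten as a weighted average of control outcomes, where each control's weight counts how often it is selected as a match and plays the role of a density-ratio (Riesz representer) estimate. The data-generating process and the 1-NN choice are illustrative; the formal derivation is in the cited papers.

```python
# Illustrative identity between the matching and weighting forms of 1-NN matching.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(7)
n1, n0 = 300, 600
x1 = rng.normal(loc=0.5, size=(n1, 2))      # treated covariates
x0 = rng.normal(loc=0.0, size=(n0, 2))      # control covariates
y1 = 1.0 + x1.sum(axis=1) + rng.normal(size=n1)
y0 = x0.sum(axis=1) + rng.normal(size=n0)

# (a) Matching form: for each treated unit, subtract its nearest control's outcome.
idx = cKDTree(x0).query(x1, k=1)[1]
att_matching = np.mean(y1 - y0[idx])

# (b) Weighting form: weight each control by how many treated units it matches.
counts = np.bincount(idx, minlength=n0)
weights = counts / n1                        # density-ratio-like balancing weights
att_weighting = y1.mean() - np.sum(weights * y0)

print(f"matching form : {att_matching:.6f}")
print(f"weighting form: {att_weighting:.6f}   (identical by construction)")
```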
This paper addresses the challenges of giving a causal interpretation to vector autoregressions (VARs). I show that under independence assumptions VARs can identify average treatment effects, average causal responses, or a mix of the two, depending on the distribution of the policy. But what about situations in which the economist cannot rely on independence assumptions? I propose an alternative method, defined as control-VAR, which uses control variables to estimate causal effects. Control-VAR can estimate average treatment effects on the treated for dummy policies or average causal responses over time for continuous policies. The advantages of control-based approaches are demonstrated by examining the impact of natural disasters on the US economy, using Germany as a control. Contrary to previous literature, the results indicate that natural disasters have a negative economic impact without any cyclical positive effect. These findings suggest that control-VARs provide a viable alternative to strict independence assumptions, offering more credible causal estimates and significant implications for policy design in response to natural disasters.
This paper discusses the different contemporaneous causal interpretations of Panel Vector Autoregressions (PVAR). I show that the interpretation of PVARs depends on the distribution of the causing variable, and can range from average treatment effects, to average causal responses, to a combination of the two. If the researcher is willing to postulate a no residual autocorrelation assumption, and some units can be thought of as controls, PVAR can identify average treatment effects on the treated. This method complements the toolkits already present in the literature, such as staggered-DiD or LP-DiD, as it formulates assumptions on the residuals rather than on the outcome variables. Such a method features a notable advantage: it allows units to be ``sparsely'' treated, capturing the impact of interventions on the innovation component of the outcome variables. I provide an example related to the evaluation of the effects of natural disasters on economic activity at the weekly frequency in the US. I conclude by discussing solutions to potential violations of the SUTVA assumption arising from interference.
We develop a direct debiased machine learning framework comprising Neyman targeted estimation and generalized Riesz regression. Our framework unifies Riesz regression for automatic debiased machine learning, covariate balancing, targeted maximum likelihood estimation (TMLE), and density-ratio estimation. In many problems involving causal effects or structural models, the parameters of interest depend on regression functions. Plugging regression functions estimated by machine learning methods into the identifying equations can yield poor performance because of first-stage bias. To reduce such bias, debiased machine learning employs Neyman orthogonal estimating equations. Debiased machine learning typically requires estimation of the Riesz representer and the regression function. For this problem, we develop a direct debiased machine learning framework with an end-to-end algorithm. We formulate estimation of the nuisance parameters, the regression function and the Riesz representer, as minimizing the discrepancy between Neyman orthogonal scores computed with known and unknown nuisance parameters, which we refer to as Neyman targeted estimation. Neyman targeted estimation includes Riesz representer estimation, and we measure discrepancies using the Bregman divergence. The Bregman divergence encompasses various loss functions as special cases, where the squared loss yields Riesz regression and the Kullback-Leibler divergence yields entropy balancing. We refer to this Riesz representer estimation as generalized Riesz regression. Neyman targeted estimation also yields TMLE as a special case for regression function estimation. Furthermore, for specific pairs of models and Riesz representer estimation methods, we can automatically obtain the covariate balancing property without explicitly solving the covariate balancing objective.
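A compact sketch of one special case mentioned above: with the Kullback-Leibler divergence, the Riesz-representer estimation step reduces to entropy balancing, and the fitted weights balance covariates by construction. The dual objective below is the standard entropy-balancing dual for ATT-type weights; the data and feature set are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative entropy balancing (KL special case); data and features are assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
n1, n0 = 400, 800
x_treat = rng.normal(loc=0.4, scale=1.0, size=(n1, 3))
x_ctrl = rng.normal(loc=0.0, scale=1.2, size=(n0, 3))
target = x_treat.mean(axis=0)                 # treated covariate means to be matched

def dual(theta):
    # Log-sum-exp dual of: min_w KL(w || uniform) s.t. sum_i w_i x_i = target, sum_i w_i = 1.
    s = x_ctrl @ theta
    return np.log(np.mean(np.exp(s))) - theta @ target

theta_hat = minimize(dual, np.zeros(3), method="BFGS").x
w = np.exp(x_ctrl @ theta_hat)
w /= w.sum()                                  # entropy balancing weights on controls

print("treated means          :", np.round(target, 3))
print("weighted control means :", np.round(w @ x_ctrl, 3))   # balanced by construction
```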
Experiments deliver credible but often localized effects, tied to specific sites, populations, or mechanisms. When such estimates are insufficient to extrapolate effects for broader policy questions, such as external validity and general-equilibrium (GE) effects, researchers combine trials with external evidence from reduced-form or structural observational estimates, or prior experiments. We develop a unified framework for designing experiments in this setting: the researcher selects which parameters to identify experimentally from a feasible set (which treatment arms and/or individuals to include in the experiment), allocates sample size, and specifies how to weight experimental and observational estimators. Because observational inputs may be biased in ways unknown ex ante, we develop a minimax proportional regret objective that evaluates any candidate design relative to an oracle that knows the bias and jointly chooses the design and estimator. This yields a transparent bias-variance trade-off that requires no prespecified bias bound and depends only on information about the precision of the estimators and the estimand's sensitivity to the underlying parameters. We illustrate the framework by (i) designing small-scale cash transfer experiments aimed at estimating GE effects and (ii) optimizing site selection for microfinance interventions.
Accurate macroeconomic forecasting has become harder amid geopolitical disruptions, policy reversals, and volatile financial markets. Conventional vector autoregressions (VARs) overfit in high-dimensional settings, while threshold VARs struggle with time-varying interdependencies and complex parameter structures. We address these limitations by extending the Sims-Zha Bayesian VAR with exogenous variables (SZBVARx) to incorporate domain-informed shrinkage and four newspaper-based uncertainty shocks: economic policy uncertainty, geopolitical risk, US equity market volatility, and US monetary policy uncertainty. The framework improves structural interpretability, mitigates dimensionality, and imposes empirically guided regularization. Using G7 data, we study spillovers from uncertainty shocks to five core variables (unemployment, real broad effective exchange rates, short-term rates, oil prices, and CPI inflation), combining wavelet coherence (time-frequency dynamics) with nonlinear local projections (state-dependent impulse responses). Out-of-sample results at 12- and 24-month horizons show that SZBVARx outperforms 14 benchmarks, including classical VARs and leading machine learning models, as confirmed by Murphy difference diagrams, multivariate Diebold-Mariano tests, and Giacomini-White predictability tests. Credible Bayesian prediction intervals deliver robust uncertainty quantification for scenario analysis and risk management. The proposed SZBVARx offers G7 policymakers a transparent, well-calibrated tool for modern macroeconomic forecasting under pervasive uncertainty.
To increase statistical efficiency in a randomized experiment, researchers often use stratification (i.e., blocking) in the design stage. However, conventional practices of stratification fail to exploit valuable information about the predictive relationship between covariates and potential outcomes. In this paper, I introduce an adaptive stratification procedure for increasing statistical efficiency when some information is available about the relationship between covariates and potential outcomes. I show that, in a paired design, researchers can rematch observations across different batches. For inference, I propose a stratified estimator that allows for nonparametric covariate adjustment. I then discuss the conditions under which researchers should expect gains in efficiency from stratification. I show that stratification complements rather than substitutes for regression adjustment, insuring against adjustment error even when researchers plan to use covariate adjustment. To evaluate the performance of the method relative to common alternatives, I conduct simulations using both synthetic data and more realistic data derived from a political science experiment. Results demonstrate that the gains in precision and efficiency can be substantial.
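A small simulation sketch of the underlying intuition, not the paper's procedure: pairing units on a covariate-based prognostic score before randomizing within pairs reduces the variance of the simple difference estimator relative to complete randomization. The prognostic score, data-generating process, and number of replications are assumptions.

```python
# Illustrative variance comparison only; not the paper's adaptive stratification method.
import numpy as np

rng = np.random.default_rng(9)

def one_replication(n=200, tau=1.0):
    x = rng.normal(size=n)
    y0 = 2.0 * x + rng.normal(scale=0.5, size=n)    # potential outcome under control
    y1 = y0 + tau
    prognostic = 2.0 * x                             # assumed predictor of the outcome

    # Complete randomization: simple difference in means.
    d = rng.permutation(np.repeat([0, 1], n // 2))
    est_cr = y1[d == 1].mean() - y0[d == 0].mean()

    # Paired design: sort by prognostic score, pair adjacent units, randomize within pairs.
    order = np.argsort(prognostic)
    pairs = order.reshape(-1, 2)
    flip = rng.binomial(1, 0.5, size=pairs.shape[0]).astype(bool)
    treated = np.where(flip, pairs[:, 0], pairs[:, 1])
    control = np.where(flip, pairs[:, 1], pairs[:, 0])
    est_pair = np.mean(y1[treated] - y0[control])
    return est_cr, est_pair

draws = np.array([one_replication() for _ in range(2000)])
print("sd of estimator, complete randomization:", draws[:, 0].std().round(3))
print("sd of estimator, paired on prognosis   :", draws[:, 1].std().round(3))
```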
This paper studies a class of models for two-sided interactions, where outcomes depend on latent characteristics of two distinct agent types. Models in this class have two core elements: the matching network, which records which agent pairs interact, and the interaction function, which maps latent characteristics of these agents to outcomes and determines the role of complementarities. I introduce the Tukey model, which captures complementarities with a single interaction parameter, along with two extensions that allow richer complementarity patterns. First, I establish an identification trade-off between the flexibility of the interaction function and the density of the matching network: the Tukey model is identified under mild conditions, whereas the more flexible extensions require dense networks that are rarely observed in applications. Second, I propose a cycle-based estimator for the Tukey interaction parameter and show that it is consistent and asymptotically normal even when the network is sparse. Third, I use its asymptotic distribution to construct a formal test of no complementarities. Finally, an empirical illustration shows that the Tukey model recovers economically meaningful complementarities.