Power properties of the two sample test based on nearest neighbors graphs
Abstract
In this paper, we study the problem of testing the equality of two multivariate distributions. One class of tests used for this purpose utilizes geometric graphs constructed using inter-point distances. The asymptotic theory of these tests so far applies only to graphs which fall under the stabilizing graphs framework of Penrose and Yukich. We study the case of the $K$-nearest neighbors graph where $K=k_N$ increases with the sample size, which does not fall under the stabilizing graphs framework. Our main result gives detection thresholds for this test in parametrized families when $k_N = o(N^{1/4})$, thus extending the family of graphs where the theoretical behavior is known. We propose a 2-sided version of the test which removes an exponent gap that plagues the 1-sided test. Our result also shows that using a greater number of nearest neighbors boosts the power of the test. This provides theoretical justification for using denser graphs in testing equality of two distributions.