Exact variance estimation for model-assisted survey estimators using U- and V-statistics

Abstract: Model-assisted estimation combines sample survey data with auxiliary information to increase precision when estimating finite population quantities. Accurately estimating the variance of model-assisted estimators is challenging: the classical approach ignores the uncertainty that comes from estimating the working model for the functional relationship between the survey and auxiliary variables. This approach may be asymptotically valid but can underestimate the variance in practical settings with limited sample sizes. In this work, we develop a connection between model-assisted estimation and the theory of U- and V-statistics. We demonstrate that when predictions from the working model for the variable of interest can be represented as a U- or V-statistic, the resulting model-assisted estimator also admits a U- or V-statistic representation. We exploit this connection to derive an improved estimator of the exact variance of such model-assisted estimators. The class of working models for which this strategy can be used is broad, ranging from linear models to modern ensemble methods. We apply our approach to the model-assisted estimator constructed with a linear regression working model, commonly referred to as the generalized regression estimator, show that it can be rewritten as a U-statistic, and propose an estimator of its exact variance. We illustrate our proposal and compare it against the classical asymptotic variance estimator using household survey data from the American Community Survey.
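
For readers unfamiliar with the generalized regression estimator mentioned above, the following is a minimal sketch of its textbook point estimate of a population total with a single auxiliary variable. It is not the variance estimator proposed in the paper, and every name in it (`greg_total`, `x_pop`, `pi_s`, the simulated population) is a hypothetical choice made for illustration. It assumes the auxiliary variable is known for every population unit and that the inclusion probabilities of the sampled units are known.

```python
import random


def greg_total(y_s, x_s, pi_s, x_pop):
    """Textbook GREG estimate of the population total of y (illustrative only).

    y_s, x_s : survey and auxiliary values for the sampled units
    pi_s     : inclusion probabilities of the sampled units
    x_pop    : auxiliary values for every unit in the population
    """
    w = [1.0 / p for p in pi_s]  # design weights
    sw = sum(w)
    # Survey-weighted simple linear regression of y on x (the working model).
    xbar = sum(wi * xi for wi, xi in zip(w, x_s)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y_s)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x_s, y_s))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x_s))
    slope = sxy / sxx
    intercept = ybar - slope * xbar

    def predict(x):
        return intercept + slope * x

    # Sum of predictions over the population, plus a design-weighted
    # correction built from the sample residuals.
    pred_total = sum(predict(x) for x in x_pop)
    resid_total = sum(wi * (yi - predict(xi)) for wi, xi, yi in zip(w, x_s, y_s))
    return pred_total + resid_total


# Hypothetical population in which y is exactly linear in x.
random.seed(0)
N, n = 1000, 50
x_pop = [random.uniform(0, 10) for _ in range(N)]
y_pop = [2.0 + 3.0 * x for x in x_pop]

# Simple random sample without replacement: equal inclusion probabilities n/N.
sample = random.sample(range(N), n)
pi_s = [n / N] * n
estimate = greg_total(
    [y_pop[i] for i in sample], [x_pop[i] for i in sample], pi_s, x_pop
)
true_total = sum(y_pop)
```

Because the simulated relationship has no noise, the fitted line reproduces every value and `estimate` matches `true_total` up to floating-point error. With noisy data the fitted coefficients vary from sample to sample, which is the source of uncertainty the abstract says the classical variance estimator ignores.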

The preprint can be downloaded here.