Publications

Journal publications

1. Near-Optimal Confidence Sequences for Bounded Random Variables

Arun Kumar Kuchibhotla, Qinqing Zheng (2021)
Accepted at International Conference on Machine Learning (ICML 2021). arXiv:2006.05022
[ arXiv ]

Many inference problems, such as sequential decision problems like A/B testing and adaptive sampling schemes like bandit selection, are often online in nature. The fundamental problem for online inference is to provide a sequence of confidence intervals that are valid uniformly over the growing-into-infinity sample sizes. To address this question, we provide a near-optimal confidence sequence for bounded random variables by utilizing Bentkus' concentration results. We show that it improves on the existing approaches that use the Cramér-Chernoff technique such as the Hoeffding, Bernstein, and Bennett inequalities. The resulting confidence sequence is confirmed to be favorable in both synthetic coverage problems and an application to adaptive stopping algorithms.

The paper provides confidence sequences, i.e., a sequence of confidence intervals that are valid simultaneously over all the sample sizes. This allows sequential/adaptive sampling schemes. The confidence intervals are finite sample valid under boundedness assumptions and are based on near-optimal concentration inequalities of Bentkus (2002, 2004).
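
For illustration only (this is not the paper's Bentkus-based construction, and the function name is made up), a simple baseline confidence sequence for the mean of [0, 1]-valued observations can be built by spending the error budget alpha over time and applying Hoeffding's inequality at each sample size; a union bound then gives validity simultaneously over all t.

# Baseline Hoeffding confidence sequence for bounded ([0, 1]) observations.
conf_seq_hoeffding <- function(x, alpha = 0.05) {
  t <- seq_along(x)
  alpha_t <- 6 * alpha / (pi^2 * t^2)         # sum_t alpha_t = alpha
  means <- cumsum(x) / t
  radius <- sqrt(log(2 / alpha_t) / (2 * t))  # Hoeffding radius at level alpha_t
  data.frame(t = t, lower = pmax(means - radius, 0), upper = pmin(means + radius, 1))
}
set.seed(1)
tail(conf_seq_hoeffding(rbeta(1000, 2, 5)), 3)  # valid uniformly over t with prob >= 1 - alpha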

2. Models as Approximations---Rejoinder

Andreas Buja, Arun Kumar Kuchibhotla, Richard Berk, Edward I. George, Eric J. Tchetgen Tchetgen, Linda Zhao (2019)
Statistical Science.
[ arXiv ]

Rejoinder to the discussants.

3. Models as Approximations---Part II: A General Theory of Model-Robust Regression

Andreas Buja, Lawrence D. Brown, Arun Kumar Kuchibhotla, Richard Berk, Edward I. George, Linda Zhao (2019)
Statistical Science.
[ arXiv ]

We develop a model-free theory of general types of parametric regression for iid observations. The theory replaces the parameters of parametric models with statistical functionals, to be called regression functionals, defined on large non-parametric classes of joint $(X, Y)$ distributions, without assuming a correct model. Parametric models are reduced to heuristics to suggest plausible objective functions. An example of a regression functional is the vector of slopes of linear equations fitted by OLS to largely arbitrary $(X, Y)$ distributions, without assuming a linear model (see Part~I). More generally, regression functionals can be defined by minimizing objective functions or solving estimating equations at joint $(X, Y)$ distributions. In this framework it is possible to achieve the following: (1)~define a notion of well-specification for regression functionals that replaces the notion of correct specification of models, (2)~propose a well-specification diagnostic for regression functionals based on reweighting distributions and data, (3)~decompose sampling variability of regression functionals into two sources, one due to the conditional response distribution and another due to the regressor distribution interacting with misspecification, both of order $N^{-1/2}$, (4)~exhibit plug-in/sandwich estimators of standard error as limit cases of $(X, Y)$ bootstrap estimators, and (5)~provide theoretical heuristics to indicate that $(X, Y)$ bootstrap standard errors may generally be more stable than sandwich estimators.

This provides the theory and interpretation of M-estimators under misspecification.

4. On Least Squares Estimation under Heteroscedastic and Heavy-Tailed Errors

Arun Kumar Kuchibhotla, Rohit K. Patra (2019)
arXiv:1909.02088. Accepted at Annals of Statistics.
[ arXiv ]

We consider least squares estimation in a general nonparametric regression model. The rate of convergence of the least squares estimator (LSE) for the unknown regression function is well studied when the errors are sub-Gaussian. We find upper bounds on the rates of convergence of the LSE when the errors have uniformly bounded conditional variance and have only finitely many moments. We show that the interplay between the moment assumptions on the error, the metric entropy of the class of functions involved, and the "local" structure of the function class around the truth drives the rate of convergence of the LSE. We find sufficient conditions on the errors under which the rate of the LSE matches the rate of the LSE under sub-Gaussian error. Our results are finite sample and allow for heteroscedastic and heavy-tailed errors.

This paper considers the asymptotics of the nonparametric least squares estimator when the errors have only finitely many moments and may depend on the covariates.

5. Efficient Estimation in Single Index Models through Smoothing Splines

Arun Kumar Kuchibhotla, Rohit K. Patra (2019)
Bernoulli.
[ arXiv ]

We consider estimation and inference in a single index regression model with an unknown but smooth link function. In contrast to the standard approach of using kernels or regression splines, we use smoothing splines to estimate the smooth link function. We develop a method to compute the penalized least squares estimators (PLSEs) of the parametric and the nonparametric components given independent and identically distributed (i.i.d.) data. We prove the consistency and find the rates of convergence of the estimators. We establish asymptotic normality under mild assumptions and prove asymptotic efficiency of the parametric component under homoscedastic errors. A finite sample simulation corroborates our asymptotic theory. We also analyze a car mileage data set and an ozone concentration data set. The identifiability and existence of the PLSEs are also investigated.

This paper considers estimation and inference in a single index model using a smoothing spline estimator of the link function.

6. Nested conformal prediction and quantile out-of-bag ensemble methods

Chirag Gupta, Arun Kumar Kuchibhotla, Aaditya Ramdas (2019)
arXiv:1910.10562. Accepted at Pattern Recognition
[ arXiv ]

We provide an alternate unified framework for conformal prediction, which is a framework to provide assumption-free prediction intervals. Instead of beginning by choosing a conformity score, our framework starts with a sequence of nested sets $\{\mathcal{F}_t(x)\}_{t\in\mathcal{T}}$ for some ordered set $\mathcal{T}$ that specifies all potential prediction sets. We show that most proposed conformity scores in the literature, including several based on quantiles, straightforwardly result in nested families. Then, we argue that what conformal prediction does is find a mapping $\alpha \mapsto t(\alpha)$, meaning that it calibrates or rescales $\mathcal{T}$ to $[0,1]$. Nestedness is a natural and intuitive requirement because the optimal prediction sets (eg: level sets of conditional densities) are also nested, but we also formally prove that nested sets are universal, meaning that any conformal prediction method can be represented in our framework. Finally, to demonstrate its utility, we show how to develop the full conformal, split conformal, cross-conformal and the recent jackknife+ methods within our nested framework, thus immediately generalizing the latter two classes of methods to new settings. Specifically, we prove validity of the leave-one-out, $K$-fold, subsampling and bootstrap variants of the latter two methods for any nested family.

This paper considers conformal prediction based on a nested sequence of sets rather than a function estimate. An element of the nested sequence is chosen based on the data so as to attain the required coverage properties. This framework includes several of the existing conformal procedures and naturally extends the recently introduced jackknife+.
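
As an illustration of the nested viewpoint (a sketch under simplifying assumptions, not code from the paper), split conformal prediction with the nested family F_t(x) = [fhat(x) - t, fhat(x) + t] amounts to calibrating t on held-out residuals; the function name below is made up.

# Split conformal prediction with nested intervals [fhat(x) - t, fhat(x) + t].
split_conformal <- function(x, y, x_new, alpha = 0.1) {
  dat <- data.frame(x = x, y = y)
  idx <- sample(nrow(dat), floor(nrow(dat) / 2))        # training / calibration split
  fit <- lm(y ~ x, data = dat[idx, ])
  scores <- abs(dat$y[-idx] - predict(fit, newdata = dat[-idx, ]))  # conformity scores
  k <- ceiling((1 - alpha) * (length(scores) + 1))      # finite-sample quantile index
  t_alpha <- sort(scores)[min(k, length(scores))]       # (the exact method returns an infinite
                                                        #  interval if k exceeds the calibration size)
  pred <- predict(fit, newdata = data.frame(x = x_new))
  cbind(lower = pred - t_alpha, upper = pred + t_alpha)
}
set.seed(1)
x <- runif(200); y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
split_conformal(x, y, x_new = c(0.25, 0.75))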

7. Valid Post-selection Inference in Model-free Linear Regression

Arun Kumar Kuchibhotla, Lawrence D. Brown, Andreas Buja, Junhui Cai, Edward I. George, Linda Zhao (2019)
Annals of Statistics. Accepted
[ arXiv ]

Modern data-driven approaches to modeling make extensive use of covariate/model selection. Such selection incurs a cost: it invalidates classical statistical inference. A conservative remedy to the problem was proposed by Berk et al. (2013) and further extended by Bachoc et al. (2016). These proposals, labeled ``PoSI methods'', provide valid inference after arbitrary model selection. They are computationally NP-hard and have certain limitations in their theoretical justifications. We therefore propose computationally efficient PoSI confidence regions and prove large-$p$ asymptotics for them. We do this for linear OLS regression allowing misspecification of the normal linear model, for both fixed and random covariates, and for independent as well as some types of dependent data. We start by proving a general equivalence result for the post-selection inference problem and a simultaneous inference problem in a setting that strips inessential features still present in a related result of Berk et al. (2013). We then construct valid PoSI confidence regions that are the first to have vastly improved computational efficiency in that the required computation times grow only quadratically rather than exponentially with the total number $p$ of covariates. These are also the first PoSI confidence regions with guaranteed asymptotic validity when the total number of covariates~$p$ diverges (almost exponentially) with the sample size~$n$. Under standard tail assumptions, we only require $(\log p)^7 = o(n)$ and $k = o(\sqrt{n/\log p})$ where $k (\le p)$ is the largest number of covariates (model size) considered for selection. We study various properties of these confidence regions, including their Lebesgue measures, and compare them (theoretically) with those proposed previously.

This paper provides the first computationally efficient simultaneous inference for all sub-models in linear regression, without assuming linearity or independence of observations.

8. First order expansion of convex regularized estimators

Pierre C. Bellec, Arun Kumar Kuchibhotla (2019)
Advances in Neural Information Processing Systems (NeurIPS'19). Accepted
[ arXiv ] [ Published version ]

We consider first order expansions of convex penalized estimators in high-dimensional regression problems with random designs. Our setting includes linear regression and logistic regression as special cases. For a given penalty function $h$ and the corresponding penalized estimator $\hat{\beta}$, we construct a quantity $\eta$, the first order expansion of $\hat{\beta}$, such that the distance between $\hat{\beta}$ and $\eta$ is an order of magnitude smaller than the estimation error $\|\hat{\beta} - \beta^*\|$. In this sense, the first order expansion $\eta$ can be thought of as a generalization of influence functions from the mathematical statistics literature to regularized estimators in high-dimensions. Such first order expansion implies that the risk of $\hat{\beta}$ is asymptotically the same as the risk of $\eta$ which leads to a precise characterization of the MSE of $\hat{\beta}$; this characterization takes a particularly simple form for isotropic design. Such first order expansion also leads to inference results based on $\hat{\beta}$. We provide sufficient conditions for the existence of such first order expansion for three regularizers: the Lasso in its constrained form, the Lasso in its penalized form, and the Group-Lasso. The results apply to general loss functions under some conditions and those conditions are satisfied for the squared loss in linear regression and for the logistic loss in the logistic model.

This paper provides a first order influence function expansion for penalized estimators. The classical influence function expansion shows that a properly centered M-estimator behaves like a centered average; in this paper, we show that the penalized estimator behaves like a shrunken version of the true parameter shifted by a centered average.
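
For reference, the classical expansion alluded to above (stated here as a standard display, not copied from the paper) reads
$$\hat{\beta} - \beta^* = \frac{1}{n}\sum_{i=1}^{n}\psi(Z_i) + o_p(n^{-1/2}), \qquad \mathbb{E}[\psi(Z_i)] = 0,$$
where $\psi$ is the influence function; the paper's expansion for penalized estimators replaces the centering $\beta^*$ by a shrunken/regularized version of it while retaining a centered-average leading term.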

9. Assumption lean regression

Richard Berk, Andreas Buja, Lawrence D. Brown, Edward I. George, Arun Kumar Kuchibhotla, Weijie Su, Linda Zhao (2019)
The American Statistician
[ arXiv ] [ Published version ]

It is well known that models used in conventional regression analysis are commonly misspecified. A standard response is little more than a shrug. Data analysts invoke Box’s maxim that all models are wrong and then proceed as if the results are useful nevertheless. In this paper, we provide an alternative. Regression models are treated explicitly as approximations of a true response surface that can have a number of desirable statistical properties, including estimates that are asymptotically unbiased. Valid statistical inference follows. We generalize the formulation to include regression functionals, which broadens substantially the range of potential applications. An empirical application is provided to illustrate the paper’s key concepts.

This paper is a non-technical version of the papers Models as Approximations, Parts I and II.

10. Statistical inference based on bridge divergences

Arun Kumar Kuchibhotla, Somabha Mukherjee, Ayanendranath Basu (2019)
Annals of the Institute of Statistical Mathematics
[ arXiv ] [ Published version ]

M-estimators offer simple robust alternatives to the maximum likelihood estimator. The density power divergence (DPD) and the logarithmic density power divergence (LDPD) measures provide two classes of robust M-estimators which contain the MLE as a special case. In each of these families, the robustness of the estimator is achieved through a density power down-weighting of outlying observations. Even though the families have proved to be useful in robust inference, the relation and hierarchy between these two families are yet to be fully established. In this paper, we present a generalized family of divergences that provides a smooth bridge between DPD and LDPD measures. This family helps to clarify and settle several longstanding issues in the relation between the important families of DPD and LDPD, apart from being an important tool in different areas of statistical inference in its own right.

There are many robust estimators of location and scale. For general parametric families, there are two prominent alternatives based on density power divergences and logarithmic density power divergences. These two classes have very different properties, and in this paper we combine them to obtain a better family of estimators.

11. High-dimensional CLT: Improvements, Non-uniform Extensions and Large Deviations

Arun Kumar Kuchibhotla, Somabha Mukherjee, Debapratim Banerjee (2018)
arXiv:1806.06153. Accepted at Bernoulli
[ arXiv ]

Central limit theorems (CLTs) for high-dimensional random vectors with dimension possibly growing with the sample size have received a lot of attention in the recent times. Chernozhukov et al. (2017) proved a Berry--Esseen type result for high-dimensional averages for the class of hyperrectangles and they proved that the rate of convergence can be upper bounded by $n^{-1/6}$ upto a polynomial factor of $\log(p)$ (where $n$ represents the sample size and $p$ denotes the dimension). Convergence to zero of the bound requires $\log^7p=o(n)$. We improve upon their result which only requires $\log^4p=o(n)$ (in the best case). This improvement is made possible by a sharper dimension-free anti-concentration inequality for Gaussian process on a compact metric space. In addition, we prove two non-uniform variants of the high-dimensional CLT based on the large deviation and non-uniform CLT results for random variables in a Banach space by Bentkus, Rackauskas, and Paulauskas. We apply our results in the context of post-selection inference in linear regression and of empirical processes.

This paper considers the implications of Banach space CLT results for the high-dimensional case and, by comparing the proof techniques, improves on the results of Chernozhukov et al. (2017). We further provide non-uniform and large deviation versions of the high-dimensional CLT. Applications to empirical processes and post-selection inference are also considered.

12. Moving Beyond Sub-Gaussianity in High-dimensional Statistics: Applications in Covariance Estimation and Linear Regression

Arun Kumar Kuchibhotla, Abhishek Chakrabortty (2018)
arXiv:1804.02605. Accepted at Information and Inference: A Journal of the IMA
[ arXiv ]

Concentration inequalities form an essential toolkit in the study of high dimensional statistical methods. Most of the relevant statistics literature in this regard is, however, based on the assumptions of sub-Gaussian/sub-exponential random vectors. In this paper, we first bring together, via a unified exposition, various probability inequalities for sums of independent random variables under much weaker exponential type (sub-Weibull) tail assumptions. These results extract a part sub-Gaussian tail behavior of the sum in finite samples, matching the asymptotics governed by the central limit theorem, and are compactly represented in terms of a new Orlicz quasi-norm -- the Generalized Bernstein-Orlicz norm -- that typifies such kind of tail behaviors. We illustrate the usefulness of these inequalities through the analysis of four fundamental problems in high dimensional statistics. In the first two problems, we study the rate of convergence of the sample covariance matrix in terms of the maximum elementwise norm and the maximum $k$-sub-matrix operator norm which are key quantities of interest in bootstrap procedures and high dimensional structured covariance matrix estimation. The third example concerns the restricted eigenvalue condition, required in high dimensional linear regression, which we verify for all sub-Weibull random vectors under only marginal (not joint) tail assumptions on the covariates. To our knowledge, this is the first unified result obtained in such generality. In the final example, we consider the Lasso estimator for linear regression and establish its rate of convergence to be generally $\sqrt{k\log p/n}$, for $k$-sparse signals, under much weaker tail assumptions (on the errors as well as the covariates) than those in the existing literature. The common feature in all our results is that the convergence rates under most exponential tails match the usual ones obtained under sub-Gaussian assumptions. Finally, we also establish a high dimensional central limit theorem with a concrete rate bound for sub-Weibulls, as well as tail bounds for suprema of empirical processes. All our results are finite sample.

Concentration inequalities are well known under sub-Gaussian and sub-exponential conditions, but in some examples these conditions do not hold. For example, in the case of the sandwich variance in linear regression we encounter $XX'(Y - X'\beta)^2$, which is a quartic function of random variables; even if we assume sub-Gaussianity of $X$ and $Y$, this quartic function is only sub-Weibull(1/2). In this paper, we consider tail bounds for such random variables. After an exposition of these concentration inequalities, we consider four statistical applications.
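
For reference, the standard Orlicz-type norm behind the sub-Weibull condition (the paper's Generalized Bernstein-Orlicz norm builds on this idea) is
$$\|X\|_{\psi_\alpha} := \inf\{\eta > 0 : \mathbb{E}\exp(|X|^\alpha/\eta^\alpha) \le 2\},$$
and $X$ is called sub-Weibull of order $\alpha$ when this norm is finite; $\alpha = 2$ and $\alpha = 1$ recover the sub-Gaussian and sub-exponential cases, while a product of $k$ sub-Gaussian variables is sub-Weibull of order $2/k$ (so the quartic above has $\alpha = 1/2$).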

13. A Model Free Perspective for Linear Regression: Uniform-in-model Bounds for Post Selection Inference

Arun Kumar Kuchibhotla, Lawrence D. Brown, Andreas Buja, Edward I. George, Linda Zhao (2018)
arXiv:1802.05801. Accepted at Econometric Theory.
[ arXiv ]

For the last two decades, high-dimensional data and methods have proliferated throughout the literature. The classical technique of linear regression, however, has not lost its touch in applications. Most high-dimensional estimation techniques can be seen as variable selection tools which lead to a smaller set of variables where classical linear regression technique applies. In this paper, we prove estimation error and linear representation bounds for the linear regression estimator uniformly over (many) subsets of variables. Based on deterministic inequalities, our results provide ``good'' rates when applied to both independent and dependent data. These results are useful in correctly interpreting the linear regression estimator obtained after exploring the data and also in post model-selection inference. All the results are derived under no model assumptions and are non-asymptotic in nature.

This paper proves finite sample results for the convergence of OLS estimators uniformly over all subsets of covariates. The results apply to both independent and dependent observations; the dependence setting considered is the functional dependence introduced by Wu (2005).

14. On the asymptotics of minimum disparity estimation

Arun Kumar Kuchibhotla, Ayanendranath Basu (2017)
TEST: An Official Journal of the Spanish Society of Statistics and Operations Research
[ arXiv ] [ Published version ]

Inference procedures based on the minimization of divergences are popular statistical tools. Beran (Ann stat 5(3):445–463, 1977) proved consistency and asymptotic normality of the minimum Hellinger distance (MHD) estimator. This method was later extended to the large class of disparities in discrete models by Lindsay (Ann stat 22(2):1081–1114, 1994) who proved existence of a sequence of roots of the estimating equation which is consistent and asymptotically normal. However, the current literature does not provide a general asymptotic result about the minimizer of a generic disparity. In this paper, we prove, under very general conditions, an asymptotic representation of the minimum disparity estimator itself (and not just for a root of the estimating equation), thus generalizing the results of Beran (Ann stat 5(3):445–463, 1977) and Lindsay (Ann stat 22(2):1081–1114, 1994). This leads to a general framework for minimum disparity estimation encompassing both discrete and continuous models.

This paper provides a unified framework for the asymptotics of minimum disparity estimators. It does not assume any model or dependence structure on the data, and hence is similar in spirit to my paper Deterministic Inequalities for Smooth M-estimators.

15. Semiparametric Efficiency in Convexity Constrained Single Index Model

Arun Kumar Kuchibhotla, Rohit K. Patra, Bodhisattva Sen (2017)
arXiv:1708.00145. Accepted at Journal of the American Statistical Association.
[ arXiv ]

We consider estimation and inference in a single index regression model with an unknown convex link function. We propose a Lipschitz constrained least squares estimator (LLSE) for both the parametric and the nonparametric components given independent and identically distributed observations. We prove the consistency and find the rates of convergence of the LLSE when the errors are assumed to have only $q\ge2$ moments and are allowed to depend on the covariates. In fact, we prove a general theorem which can be used to find the rates of convergence of LSEs in a variety of nonparametric/semiparametric regression problems under the same assumptions on the errors. Moreover when $q\ge5$, we establish $n^{-1/2}$-rate of convergence and asymptotic normality of the estimator of the parametric component. Moreover the LLSE is proved to be semiparametrically efficient if the errors happen to be homoscedastic. Furthermore, we develop the R package simest to compute the proposed estimator.

This paper considers estimation and inference in a single index model when the link function is constrained to be convex and Lipschitz. It proves a general result that helps find the rate of convergence of nonparametric least squares estimators when the errors have only a finite number of moments.

16. simest: Single Index Model Estimation with Constraints on Link Function

Arun Kumar Kuchibhotla, Rohit K. Patra (2016)
CRAN. R package version 0.6,
[ arXiv ]


This package implements the estimators from the papers on single index models with Rohit Patra and Bodhisattva Sen.

17. A general set up for minimum disparity estimation

Arun Kumar Kuchibhotla, Ayanendranath Basu (2015)
Statistics & Probability Letters
[ arXiv ] [ Published version ]

Lindsay (1994) provided a general set up in discrete models for minimum disparity estimation. Such a set up eludes us in continuous models. We provide such a general result and hence fill up a major gap in the literature.

Preprints

18. Nested Conformal Prediction Sets for Classification with Applications to Probation Data

Arun Kumar Kuchibhotla, Richard Berk (2021)
arXiv:2104.09358
[ arXiv ]

Risk assessments to help inform criminal justice decisions have been used in the United States since the 1920s. Over the past several years, statistical learning risk algorithms have been introduced amid much controversy about fairness, transparency and accuracy. In this paper, we focus on accuracy for a large department of probation and parole that is considering a major revision of its current, statistical learning risk methods. Because the content of each offender's supervision is substantially shaped by a forecast of subsequent conduct, forecasts have real consequences. Here we consider the probability that risk forecasts are correct. We augment standard statistical learning estimates of forecasting uncertainty (i.e., confusion tables) with uncertainty estimates from nested conformal prediction sets. In a demonstration of concept using data from the department of probation and parole, we show that the standard uncertainty measures and uncertainty measures from nested conformal prediction sets can differ dramatically in concept and output. We also provide a modification of nested conformal called the localized conformal method to match confusion tables more closely when possible. A strong case can be made favoring the nested and localized conformal approach. As best we can tell, our formulation of such comparisons and consequent recommendations is novel.

The paper explains nested conformal prediction for classification with an application to probation data. The discussion is mostly non-technical and didactic in nature.

19. Median bias of M-estimators

Arun Kumar Kuchibhotla (2021)
arXiv:2106.00164
[ arXiv ]

In this note, we derive bounds on the median bias of univariate M-estimators under mild regularity conditions. These requirements are not sufficient to imply convergence in distribution of the M-estimators. We also discuss median bias of some multivariate M-estimators.

This paper provides simple bounds on the median bias of M-estimators.

20. The HulC: Confidence Regions from Convex Hulls

Arun Kumar Kuchibhotla, Sivaraman Balakrishnan, Larry Wasserman (2021)
arXiv:2105.14577
[ arXiv ]

We develop and analyze the HulC, an intuitive and general method for constructing confidence sets using the convex hull of estimates constructed from subsets of the data. Unlike classical methods which are based on estimating the (limiting) distribution of an estimator, the HulC is often simpler to use and effectively bypasses this step. In comparison to the bootstrap, the HulC requires fewer regularity conditions and succeeds in many examples where the bootstrap provably fails. Unlike subsampling, the HulC does not require knowledge of the rate of convergence of the estimators on which it is based. The validity of the HulC requires knowledge of the (asymptotic) median-bias of the estimators. We further analyze a variant of our basic method, called the Adaptive HulC, which is fully data-driven and estimates the median-bias using subsampling. We show that the Adaptive HulC retains the aforementioned strengths of the HulC. In certain cases where the underlying estimators are pathologically asymmetric the HulC and Adaptive HulC can fail to provide useful confidence sets. We propose a final variant, the Unimodal HulC, which can salvage the situation in cases where the distribution of the underlying estimator is (asymptotically) unimodal. We discuss these methods in the context of several challenging inferential problems which arise in parametric, semi-parametric, and non-parametric inference. Although our focus is on validity under weak regularity conditions, we also provide some general results on the width of the HulC confidence sets, showing that in many cases the HulC confidence sets have near-optimal width.

This paper develops a very simple procedure for constructing confidence intervals for functionals/parameters. Despite its simplicity, it is as generally applicable as the bootstrap and subsampling, if not more so.
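
A minimal sketch of the basic idea for a univariate functional (the sample mean or median, say), assuming the estimator is (approximately) median-unbiased on each split: with B splits, the chance that every split estimate lands on the same side of the true value is at most 2 * (1/2)^B, so B with 2^(1-B) <= alpha suffices. The deterministic splitting and the function name below are illustrative simplifications, not the paper's exact algorithm.

# Basic HulC-style interval: convex hull of estimates computed on B disjoint subsets.
hulc_interval <- function(x, estimator = mean, alpha = 0.05) {
  B <- ceiling(log2(2 / alpha))                        # ensures 2^(1 - B) <= alpha
  groups <- split(x, rep_len(seq_len(B), length(x)))   # B disjoint subsets
  est <- vapply(groups, estimator, numeric(1))
  range(est)                                           # convex hull in one dimension
}
set.seed(1)
hulc_interval(rexp(500), estimator = median, alpha = 0.05)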

21. maars: Tidy Inference under the 'Models as Approximations' Framework in R

Riccardo Fogliato, Shamindra Shrotriya, Arun Kumar Kuchibhotla (2021)
arXiv:2106.11188
[ arXiv ]

Linear regression using ordinary least squares (OLS) is a critical part of every statistician's toolkit. In R, this is elegantly implemented via lm() and its related functions. However, the statistical inference output from this suite of functions is based on the assumption that the model is well specified. This assumption is often unrealistic and at best satisfied approximately. In the statistics and econometrics literature, this has long been recognized and a large body of work provides inference for OLS under more practical assumptions. This can be seen as model-free inference. In this paper, we introduce our package maars ("models as approximations") that aims at bringing research on model-free inference to R via a comprehensive workflow. The maars package differs from other packages that also implement variance estimation, such as sandwich, in three key ways. First, all functions in maars follow a consistent grammar and return output in tidy format, with minimal deviation from the typical lm() workflow. Second, maars contains several tools for inference including empirical, multiplier, residual bootstrap, and subsampling, for easy comparison. Third, maars is developed with pedagogy in mind. For this, most of its functions explicitly return the assumptions under which the output is valid. This key innovation makes maars useful in teaching inference under misspecification and also a powerful tool for applied researchers. We hope our default feature of explicitly presenting assumptions will become a de facto standard for most statistical modeling in R.

This paper introduces the maars package in R for inference for OLS under model misspecification.
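
The package's own function names are not reproduced here; instead, here is a base-R sketch of the quantity this line of work centers on, the sandwich (model-robust) variance estimate for OLS, compared with the model-based standard errors from lm().

# Sandwich (HC0) variance for OLS, valid without assuming the linear model is correct.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 1 + 2 * x + (1 + abs(x)) * rnorm(n)   # heteroscedastic errors: lm()'s SEs are off
fit <- lm(y ~ x)
X <- model.matrix(fit)
e <- residuals(fit)
bread <- solve(crossprod(X))               # (X'X)^{-1}
meat  <- crossprod(X * e)                  # sum_i e_i^2 x_i x_i'
V_sandwich <- bread %*% meat %*% bread
cbind(lm_se = sqrt(diag(vcov(fit))), sandwich_se = sqrt(diag(V_sandwich)))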

22. Finite-sample Efficient Conformal Prediction

Yachong Yang, Arun Kumar Kuchibhotla (2021)
arXiv:2104.13871
[ arXiv ]

Conformal prediction is a generic methodology for finite-sample valid distribution-free prediction. This technique has garnered a lot of attention in the literature partly because it can be applied with any machine learning algorithm that provides point predictions to yield valid prediction regions. Of course, the efficiency (width/volume) of the resulting prediction region depends on the performance of the machine learning algorithm. In this paper, we consider the problem of obtaining the smallest conformal prediction region given a family of machine learning algorithms. We provide two general-purpose selection algorithms and consider coverage as well as width properties of the final prediction region. The first selection method yields the smallest width prediction region among the family of conformal prediction regions for all sample sizes, but only has an approximate coverage guarantee. The second selection method has a finite sample coverage guarantee but only attains close to the smallest width. The approximate optimal width property of the second method is quantified via an oracle inequality. Asymptotic oracle inequalities are also considered when the family of algorithms is given by ridge regression with different penalty parameters.

This paper develops two algorithms for selecting tuning parameters of machine learning methods so as to obtain the smallest-width prediction sets from (split) conformal prediction. The first algorithm is as widely applicable as the split conformal method and yields the smallest-width prediction set, but with approximate validity. The second algorithm attains finite sample validity but only approximately the smallest width.

23. Exchangeability, Conformal Prediction and Rank Tests

Arun Kumar Kuchibhotla (2020)
arXiv:2005.06095
[ arXiv ]

Conformal prediction has been a very popular method of distribution-free predictive inference in recent years in machine learning and statistics. The main reason for its popularity comes from the fact that it works as a wrapper around any prediction algorithm such as neural networks or random forests. Exchangeability is at the core of the validity of conformal prediction. The concept of exchangeability is also at the core of rank tests widely known in nonparametric statistics. In this paper, we review the concept of exchangeability and discuss its implications for rank tests and conformal prediction. Although written as an exposition, the main message of the paper is to show that similar to conformal prediction, rank tests can also be used as a wrapper around any dimension reduction algorithm.

The paper explores the concept of exchangeability and shows how some of its basic implications fuel the topics of conformal prediction as well as non-parametric rank tests.

24. High-dimensional CLT for Sums of Non-degenerate Random Vectors: $n^{-1/2}$-rate

Arun Kumar Kuchibhotla, Alessandro Rinaldo (2020)
arXiv:2009.13673
[ arXiv ]

In this note, we provide a Berry--Esseen bound for rectangles in high dimensions when the random vectors have non-singular covariance matrices. Under this assumption of non-singularity, we prove an $n^{-1/2}$ scaling for the Berry--Esseen bound for sums of mean independent random vectors with a finite third moment. The proof is essentially the method-of-compositions proof of the multivariate Berry--Esseen bound from Senatov (2011). Similar to other existing works (Kuchibhotla et al. 2018, Fang and Koike 2020a), this note considers the applicability and effectiveness of classical CLT proof techniques for the high-dimensional case.

This paper provides a central limit theorem in the high-dimensional context with a root-n rate of convergence, assuming only a finite third moment and non-degeneracy of the random vectors.

25. Berry-Esseen Bounds for Projection Parameters and Partial Correlations with Increasing Dimension

Arun Kumar Kuchibhotla, Alessandro Rinaldo, Larry Wasserman (2020)
arXiv:2007.09751
[ arXiv ]

The linear regression model can be used even when the true regression function is not linear. The resulting estimated linear function is the best linear approximation to the regression function, and the vector β of coefficients of this linear approximation is the projection parameter. We provide finite sample bounds on the Normal approximation to the law of the least squares estimator of the projection parameters normalized by the sandwich-based standard error. Our results hold in the increasing dimension setting and under minimal assumptions on the distribution of the response variable. Furthermore, we construct confidence sets for β in the form of hyper-rectangles and establish rates on their coverage accuracy. We provide analogous results for partial correlations among the entries of sub-Gaussian vectors.

The paper provides sharp requirements on the dimension relative to the sample size for valid confidence intervals for the ordinary least squares estimator and partial correlations. We derive these from Berry-Esseen bounds and finite-sample bounds for the multiplier bootstrap.

26. All of Linear Regression

Arun Kumar Kuchibhotla, Lawrence D. Brown, Andreas Buja, Junhui Cai (2019)
arXiv:1910.06386
[ arXiv ]

Least squares linear regression is one of the oldest and widely used data analysis tools. Although the theoretical analysis of ordinary least squares (OLS) estimator is as old, several fundamental questions are yet to be answered. Suppose regression observations $(X_1,Y_1),...,(X_n,Y_n)$ (not necessarily independent) are available. Some of the questions we deal with are as follows: under what conditions, does the OLS estimator converge and what is the limit? What happens if the dimension is allowed to grow with $n$? What happens if the observations are dependent with dependence possibly strengthening with $n$? How to do statistical inference under these kinds of misspecification? What happens to OLS estimator under variable selection? How to do inference under misspecification and variable selection? We answer all the questions raised above with one simple deterministic inequality which holds for any set of observations and any sample size. This implies that all our results are finite sample (non-asymptotic) in nature. At the end, one only needs to bound certain random quantities under specific settings of interest to get concrete rates and we derive these bounds for the case of independent observations. In particular the problem of inference after variable selection is studied, for the first time, when $d$, the number of covariates increases (almost exponentially) with sample size $n$. We provide comments on the ``right'' statistic to consider for inference under variable selection and efficient computation of quantiles.

This paper provides a complete study of the ordinary least squares linear regression estimator, starting from the misspecified case and leading up to inference after variable selection. All the results are valid for any set of observations, possibly dependent; exact rates are derived for independent observations.

27. Model-free Study of Ordinary Least Squares Linear Regression

Arun Kumar Kuchibhotla, Lawrence D. Brown, Andreas Buja (2018)
arXiv:1809.10538
[ arXiv ]

Ordinary least squares (OLS) linear regression is one of the most basic statistical techniques for data analysis. In the main stream literature and the statistical education, the study of linear regression is typically restricted to the case where the covariates are fixed, errors are mean zero Gaussians with variance independent of the (fixed) covariates. Even though OLS has been studied under misspecification from as early as the 1960's, the implications have not yet caught up with the main stream literature and applied sciences. The present article is an attempt at a unified viewpoint that makes the various implications of misspecification stand out.

This paper provides a thorough study of the OLS estimator under misspecification. All the results are finite sample in nature backed by Berry-Esseen bounds. Validity of different bootstrap methods is also discussed.

28. Tail Bounds for Canonical U-Statistics and U-Processes with Unbounded Kernels

Abhishek Chakrabortty, Arun Kumar Kuchibhotla (2018)
[ arXiv ]

In this paper, we prove exponential tail bounds for canonical (or degenerate) U-statistics and U-processes under exponential-type tail assumption on the kernels. Most of the existing results in the relevant literature often assume bounded kernels or obtain sub-optimal tail behavior under unbounded kernels. We obtain sharp rates and optimal tail behavior under sub-Weibull kernel functions. Some examples from non-parametric regression literature are considered.

This paper is an extension of the Moving beyond sub-Gaussianity paper. Degenerate U-statistics represent a second order extension of mean zero averages. We prove bounds when the U-statistics kernels are unbounded.

29. A minimum distance weighted likelihood method of estimation

Arun Kumar Kuchibhotla, Ayanendranath Basu (2018)
Technical report, Interdisciplinary Statistical Research Unit (ISRU), Indian Statistical Institute
[ arXiv ]

Over the last several decades, minimum distance (or minimum divergence, minimum disparity, minimum discrepancy) estimation methods have been studied in different statistical settings as an alternative to the method of maximum likelihood. The initial motivation was probably to exhibit that there exists other estimators apart from the maximum likelihood estimator (MLE) which has full asymptotic efficiency at the model. As the scope of and interest in the area of robust inference grew, many of these estimators were found to be particularly useful in that respect and performed better than the MLE under contamination. Later, a weighted likelihood variant of the method was developed in the same spirit, which was substantially simpler to implement. In the statistics literature the method of minimum disparity estimation and the corresponding weighted likelihood estimation methods have distinct identities. Despite their similarities, they have some basic differences. In this paper we propose a method of estimation which is simultaneously a minimum disparity method and a weighted likelihood method, and may be viewed as a method that combines the positive aspects of both. We refer to the estimator as the minimum distance weighted likelihood (MDWL) estimator, investigate its properties, and illustrate the same through real data examples and simulations. We briefly explore the applicability of the method in robust tests of hypothesis.

This paper provides a version of weighted likelihood estimator that corresponds to a minimum disparity estimator. This estimator is both robust to outliers and asymptotically efficient under the true model.

30. Deterministic Inequalities for Smooth M-estimators

Arun Kumar Kuchibhotla (2018)
arXiv:1809.05172
[ arXiv ]

Ever since the proof of asymptotic normality of maximum likelihood estimator by Cramer (1946), it has been understood that a basic technique of the Taylor series expansion suffices for asymptotics of M-estimators with smooth/differentiable loss function. Although the Taylor series expansion is a purely deterministic tool, the realization that the asymptotic normality results can also be made deterministic (and so finite sample) received far less attention. With the advent of big data and high-dimensional statistics, the need for finite sample results has increased. In this paper, we use the (well-known) Banach fixed point theorem to derive various deterministic inequalities that lead to the classical results when studied under randomness. In addition, we provide applications of these deterministic inequalities for crossvalidation/subsampling, marginal screening and uniform-in-submodel results that are very useful for post-selection inference and in the study of post-regularization estimators. Our results apply to many classical estimators, in particular, generalized linear models, non-linear regression and cox proportional hazards model. Extensions to non-smooth and constrained problems are also discussed.

This paper proves deterministic inequalities for estimator defined by smooth estimating equations without assuming any model or dependence structure on the data. The examples include ordinary least squares, generalized linear models, Cox proportional hazards model, non-linear least squares and equality constrained estimators.

31. On single index models with convex link

Arun Kumar Kuchibhotla, Rohit K. Patra, Bodhisattva Sen (2015)
Previous version of Least Squares Estimation in a Single Index Model with Convex Lipschitz link
[ arXiv ]

We consider estimation and inference in a single index regression model with an unknown convex link function. We propose a Lipschitz constrained least squares estimator (LLSE) for both the parametric and the nonparametric components given independent and identically distributed observations. We prove the consistency and find the rates of convergence of the LLSE when the errors are assumed to have only $q\ge2$ moments and are allowed to depend on the covariates. In fact, we prove a general theorem which can be used to find the rates of convergence of LSEs in a variety of nonparametric/semiparametric regression problems under the same assumptions on the errors. Moreover when $q\ge5$, we establish $n^{-1/2}$-rate of convergence and asymptotic normality of the estimator of the parametric component. Moreover the LLSE is proved to be semiparametrically efficient if the errors happen to be homoscedastic. Furthermore, we develop the R package simest to compute the proposed estimator.

This paper considers estimation and inference in a single index model when the link function is constrained to be convex and smooth. It studies two different estimators: one based on convexity constrained smoothing splines and one based on convex Lipschitz least squares.

32. Testing in Additive and Projection Pursuit Models

Arun Kumar Kuchibhotla (2013)
[ arXiv ]

Additive models and projection pursuit models are very useful popular nonparametric methods for fitting multivariate data. The flexibility of these models makes them very useful. Yet, this very property can sometimes lead to overfitting. Inference procedures like testing of hypothesis in these cases are not very well developed in the literature. This might be due to the complexity involved in estimation. In the present paper we introduce a bootstrap based technique which allows one to test the hypothesis of the adequacy of multiple linear regression model versus the nonparametric additive model and beyond. These tests are highly useful for practitioners since the simpler models are more interpretable. We will also introduce a new model which incorporates both the additive model and the multiple index model.

This paper was awarded second prize in ISI Jan Tinbergen Awards 2015. The paper considers the problem of testing between projection pursuit models and ACE models.

Working papers

33. Conformal Prediction for Validity of Resampling Inference

Arun Kumar Kuchibhotla (2020)
[ arXiv ]

This note describes a deficiency of traditional proofs of consistency of resampling techniques for statistical inference and provides a simple solution based on conformal prediction.

Traditional bootstrap consistency results have a gap and do not complete the proof of consistency of such inference, because the practical bootstrap technique is limited by finite computational power. In this paper, we discuss this issue in detail and provide an alternative solution that does not require the empirical bootstrap distribution to converge to the true bootstrap distribution conditional on the data.

34. A Weighted Likelihood Approach Based on Statistical Data Depths

Claudio Agostinelli, Ayanendranath Basu, Arun Kumar Kuchibhotla (2019)

We propose a general approach to construct weighted likelihood estimating equations with the aim of obtaining robust estimates. The weight, attached to each score contribution, is evaluated by comparing the statistical data depth at the model with that of the sample at a given point. Observations are considered regular when the ratio of these two depths is close to one, whereas, when the ratio is large, the corresponding score contribution may be downweighted. Details and examples are provided for the robust estimation of the parameters in the multivariate normal model. Because of the form of the weights, we expect that there will be no downweighting under the true model, leading to highly efficient estimators. Robustness is illustrated using two real data sets.

This paper extends the minimum distance weighted likelihood estimator to the multivariate case using data depth.

35. Model-agnostic Semiparametric Inference

Arun Kumar Kuchibhotla, Eric J. Tchetgen Tchetgen (2019)


This paper considers an application of deterministic inequalities for linear and logistic regression to causal estimands.