The residuals of a least squares regression model are defined as the observations minus the modeled values. For least squares regression to produce valid CIs and valid probabilities of type I and type II errors, the residuals must be uncorrelated, normally distributed, and of constant variance.

One wants to conduct a linear regression, a model in which a dependent variable (assumed, to limit scope, continuous and uncensored) is predicted from one or more independent variables. Alternately, one might consider a 1- or 2-sample t test or an ANOVA, which can be viewed as special cases of linear regression.

One gathers observations of both the dependent variable and the independent variable(s). One uses software that implements regression. Suppose the slope is significantly different from zero. Does that mean that one is done? Actually, additional steps are needed to make sure the conclusion is valid. We will discuss some (but not all) of them.

Linear regression (the example below assumes one independent variable, but there can be any number) involves a model of the following form:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where, for the $i$th observation, $Y_i$ is the dependent variable, $X_i$ is the independent variable, $\beta_0$ and $\beta_1$ are fixed but unknown constants, and $\varepsilon_i$ is a random variable accounting for measurement error and lack of fit of the model. However, for least squares, the most common form of regression, to work, certain assumptions concerning $\{\varepsilon_i\}$ must be satisfied:

$\{\varepsilon_i\}$ must be uncorrelated.

$\{\varepsilon_i\}$ must be normally distributed.

$\mathrm{Var}(\varepsilon_i)$ must be constant.

For a more complete explanation, see Berry.

To be clear, these are assumptions about $\{\varepsilon_i\}$ and do not apply to the dependent or independent variables. In particular, neither the dependent nor independent variables need to be normally distributed.

Violation of these assumptions can cause coverage of CIs (the probability that the CI contains the true parameter value) to be very different from the nominal value (the value that was calculated under the assumptions listed above). Similarly, the actual and nominal values of the probabilities of type I error and type II error can be far apart. The more severe the violation, the more severe the impact. In practical terms, although a minor violation might or might not have much practical consequence, a severe violation often leads to poor statistical performance.

Of course, $\{\varepsilon_i\}$ cannot be directly observed. We must, therefore, work with the estimated residuals, $\{e_i\}$, defined as $e_i = Y_i - \hat{Y}_i$, the observed $i$th value of the dependent variable minus the modeled $i$th value. How can we decide, based on the estimated residuals, whether the assumptions about the unobserved $\{\varepsilon_i\}$ are satisfied? There are both formal tests and less formal graphical methods, and each has advantages. Tests are objective and can, if necessary, be automated. However, with large sample sizes, tests can flag trivial deviations from the assumptions. Tests only reject the null hypothesis if the evidence is strong; with small sample sizes, tests might fail to detect violations that, although not statistically significant, might still be problematic if real. Graphical methods are subjective. However, graphical methods often allow one to judge the severity of the departures from the assumptions. The degree of severity determines how badly remedial measures are needed and, indeed, whether they are likely to make much difference. Admittedly, this might be difficult for those without extensive statistical experience, so readers should assess their own ability to interpret graphics before relying on them. Here, we discuss both tests and graphical methods for assessing assumptions and what to do if the assumptions are violated.
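For concreteness, here is a minimal sketch, assuming Python with the numpy and statsmodels packages (the simulated data and variable names are illustrative, not from any study), of fitting a least squares regression and extracting the estimated residuals:

```python
# Minimal sketch: fit a least squares regression and extract estimated residuals.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(size=100)   # simulated data: Y = 2 + 0.5X + error

fit = sm.OLS(y, sm.add_constant(x)).fit()  # ordinary least squares with intercept
resid = fit.resid                          # e_i: observed Y_i minus modeled Y_i
print(resid.sum())                         # ~0: residuals sum to 0 with an intercept
```

The last line anticipates a point made below: the estimated residuals always sum to 0 when an intercept is included, so some correlation among them is unavoidable.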

Knowledge of how the data were gathered often makes independence (which implies no correlation) plausible. However, correlation of population residuals can be assessed. Note that one must distinguish $\{\varepsilon_i\}$, the population residuals, from the estimated residuals, $\{e_i\}$; some correlation is always present in the $\{e_i\}$, because it is a property of linear regression (with an intercept) that $\sum_i e_i$ must always be 0.

There are infinitely many forms of correlation. For example, observations from similar individuals might be correlated. Clustering can also induce correlation (e.g., if the data were collected at a few locations, observations from the same location might resemble one another). We limit our discussion to what is perhaps the most common form of correlation, serial correlation: if observations are gathered sequentially, residuals that occur near one another might be correlated (autocorrelation).

The most common test for assessing serial dependence is based on the Durbin-Watson statistic.

The Durbin-Watson statistic can be interpreted by noting that it approximately equals 2(1 − r), where r is the sample correlation between the estimated residuals and the lag-one estimated residuals (hereafter, simply "residuals"). Thus, values near 2 suggest little lag-one autocorrelation; Figure 1 shows examples of uncorrelated (A) and correlated (B) residuals.
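As an illustration, statsmodels provides the Durbin-Watson statistic directly; the sketch below (simulated data, not from the original report) also verifies the 2(1 − r) approximation:

```python
# Sketch: the Durbin-Watson statistic and its relation to the lag-one correlation r.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(size=200)

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
print(durbin_watson(resid))                   # near 2 when residuals are uncorrelated

r = np.corrcoef(resid[:-1], resid[1:])[0, 1]  # lag-one sample correlation
print(2 * (1 - r))                            # approximately the same value
```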

The Gauss-Markov theorem shows that least squares regression provides the best linear unbiased estimator (the linear unbiased estimator with the smallest possible variance), even if residuals are not normally distributed. This is sometimes misinterpreted to mean that normality is not important. The Gauss-Markov theorem concerns only point estimates, not tests or CIs. Regression estimates can be especially sensitive to heavy-tailed distributions.

Many tests for normality of residuals have been proposed. The D'Agostino test is based on the sample skewness ($\sqrt{b_1}$) and kurtosis ($b_2$) of the residuals.

The Shapiro-Wilk test is another widely used test of normality; it compares the ordered residuals with the values expected under normality.

In all of these tests, the null hypothesis is normality. The null is rejected only if evidence of nonnormality is strong. Therefore, one should be cautious in implementing these tests with small sample sizes. In such cases, departures from normality that can have substantial consequences are sometimes not detected.
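Both tests are implemented in common software; the following sketch, assuming Python with scipy (the simulated residuals are stand-ins), applies the D'Agostino-Pearson and Shapiro-Wilk tests:

```python
# Sketch: formal tests of normality applied to estimated residuals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
resid = rng.normal(size=200)        # stand-in for estimated residuals

k2, p_k2 = stats.normaltest(resid)  # D'Agostino-Pearson test (skewness + kurtosis)
w, p_sw = stats.shapiro(resid)      # Shapiro-Wilk test
print(p_k2, p_sw)                   # small p-values are evidence against normality
```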

The simplest way to graphically evaluate normality of residuals is to plot a histogram and examine it for departures from normality. This can often reveal skewed residuals. It can be problematic for heavy-tailed residuals, because some heavy-tailed symmetric distributions can look quite normal. A more sophisticated method, and one that often reveals deviations from normality that are difficult to see in a histogram, is to plot the empirical quantiles of the residuals against the theoretical quantiles of a normal distribution (a quantile-quantile plot). A straight line suggests normality, whereas a curved pattern suggests a departure from normality (Figure 2).
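A quantile-quantile plot takes one line with scipy; in this sketch (simulated, heavy-tailed stand-in residuals) the points curve away from the reference line in the tails:

```python
# Sketch: quantile-quantile plot of residuals against normal quantiles.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(3)
resid = rng.standard_t(df=3, size=200)        # heavy-tailed stand-in for residuals

stats.probplot(resid, dist="norm", plot=plt)  # tail curvature suggests nonnormality
plt.show()
```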

In a departure from the assumptions that underlie linear regression, the variance of the residuals can change as the independent variables change. It is particularly common for the variance of the residuals to increase with the value of an independent variable. Even with nonconstant variance, linear regression provides unbiased point estimators (Gauss-Markov theorem). However, with nonconstant residual variance, the nominal and actual probabilities of type I and type II errors can be very different. Similarly, coverage of CIs can be far from the nominal values.

This is of particular concern with ANOVA. If a subpopulation has both a larger variance and a larger sample size, the resulting test becomes conservative (and of low power). Conversely, if a subpopulation has both a smaller sample size and a larger variance, the resulting test can be anticonservative (the actual probability of type I error exceeds the nominal value).

The Breusch-Pagan test is a $\chi^2$ test based on regressing the squared residuals on the independent variables. The Breusch-Pagan test can be sensitive to violations of normality. The White test, a generalization that also includes the squares and cross-products of the independent variables in that regression, does not rely on the normality assumption.
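Both tests are available in statsmodels; a sketch, using simulated data in which the residual SD grows with the independent variable:

```python
# Sketch: Breusch-Pagan and White tests for nonconstant residual variance.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 200)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5 + 0.3 * x)  # residual SD increases with x

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

lm_stat, lm_p, f_stat, f_p = het_breuschpagan(resid, X)
w_stat, w_p, wf_stat, wf_p = het_white(resid, X)
print(lm_p, w_p)                  # small p-values suggest nonconstant variance
```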

The simplest form of graphical evaluation is to plot residuals vs. each of the independent variables (or, alternately, vs. the modeled values of the dependent variable). If the assumption is satisfied, one should see a patternless blob. A pattern suggests there might be an issue. The "v-shaped" pattern (or, if one prefers, "fan-shaped" or "pie-wedge-shaped"), in which the absolute values of the residuals tend to increase as an independent variable increases, is reasonably common in real data. Therefore, one should be very wary of it (Figure 3).
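The fan shape is easy to see in a residual plot; a sketch with simulated fan-shaped data:

```python
# Sketch: plot residuals vs. fitted values and look for a fan-shaped pattern.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 200)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3 * x)  # residual SD grows with x

fit = sm.OLS(y, sm.add_constant(x)).fit()
plt.scatter(fit.fittedvalues, fit.resid, s=10)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```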

The discussions of this section are, of necessity, somewhat sketchy. However, we provide references where the details of each method can be found by someone who needs to implement the method.

How one handles correlated residuals depends on how much one knows about the correlation structure of the residuals. For example, if one knows that residuals are likely to be autocorrelated, this can be accounted for in modeling. Feasible generalized least squares is a method with broad applicability; an explanation appears in Baltagi.
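For the special case of AR(1) autocorrelation, statsmodels implements a feasible GLS procedure (GLSAR). The sketch below simulates AR(1) errors (the value rho = 0.6 is an assumption of the example) and fits the model:

```python
# Sketch: feasible GLS for AR(1)-correlated residuals using statsmodels GLSAR.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 200
x = rng.uniform(0, 10, n)
e = np.zeros(n)
for t in range(1, n):                           # AR(1) errors with rho = 0.6
    e[t] = 0.6 * e[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + e

model = sm.GLSAR(y, sm.add_constant(x), rho=1)  # rho=1: estimate one AR coefficient
fit = model.iterative_fit(maxiter=10)           # alternates OLS and rho estimation
print(fit.params, model.rho)
```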

If a parametric family can be identified, then one can often achieve the greatest power by explicit modeling. This can be done through generalized linear models. Alternatively, one can sometimes transform the dependent variable toward normality, e.g., via the Box-Cox transformation or Tukey's ladder of transformations.
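As one illustration of explicit modeling, the sketch below fits a generalized linear model with Gamma-distributed responses and a log link (the family and link are assumptions of the example, not a general recommendation):

```python
# Sketch: a generalized linear model (Gamma family, log link) in statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.uniform(0, 2, 200)
mu = np.exp(0.3 + 0.8 * x)                # true mean increases with x
y = rng.gamma(shape=2.0, scale=mu / 2.0)  # Gamma responses with mean mu

glm = sm.GLM(y, sm.add_constant(x),
             family=sm.families.Gamma(link=sm.families.links.Log()))
print(glm.fit().params)                   # estimates near (0.3, 0.8)
```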

Robust regression is a variant of regression for which outliers in the residuals (but not necessarily in the independent variables) have little impact on the estimates. There are too many forms of robust regression to discuss here, although most involve down-weighting, in some manner, "extreme" residuals. For example, least trimmed squares fits the model by minimizing the sum of only the smallest squared residuals, so the most extreme residuals do not influence the fit.
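Statsmodels does not implement least trimmed squares, but its RLM class illustrates the down-weighting idea via Huber M-estimation (a different robust method, named here plainly):

```python
# Sketch: robust regression by Huber M-estimation; extreme residuals are down-weighted.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(size=100)
y[:5] += 15.0                             # a few gross outliers in the dependent variable

rlm = sm.RLM(y, sm.add_constant(x), M=sm.robust.norms.HuberT())
print(rlm.fit().params)                   # close to (2, 0.5) despite the outliers
```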

In the case of residuals of totally unknown parametric form, one can use resampling methods, in which one samples from the sample, to obtain estimates of standard errors that do not depend on parametric assumptions (although other assumptions, not specified here, must be made). For example, one can bootstrap residuals (a classic resampling method) or use jackknifing (another resampling method) on the entire set of observations. At one time, objections to resampling due to computational intensiveness were common. In today's world of cheap and easy computing, these objections are no longer valid. Indeed, many common statistical software packages (e.g., SAS, Stata) implement bootstrapping and/or the jackknife with a single command. An overview of resampling, as applied to regression, appears in Wu.
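The residual bootstrap mentioned above is short enough to write by hand; a sketch (simulated data) that produces a bootstrap standard error for the slope:

```python
# Sketch: the classic residual bootstrap for the standard error of the slope.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(size=100)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

slopes = []
for _ in range(2000):
    # refit on fitted values plus resampled residuals
    y_star = fit.fittedvalues + rng.choice(fit.resid, size=len(x), replace=True)
    slopes.append(sm.OLS(y_star, X).fit().params[1])
print(np.std(slopes, ddof=1))  # bootstrap standard error of the slope
```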

Finally, some authors [e.g., Valdar et al.] have suggested transforming the dependent variable toward normality (e.g., by a rank-based inverse normal transformation) before analysis.

Nonconstant variance can be extremely problematic, because data points with greater variance can have a disproportionate impact on the estimates. Fortunately, methods already mentioned can often help. For example, the Box-Cox transformation or Tukey's ladder of transformations can sometimes make variances approximately constant. Robust regression methods can be less sensitive to nonconstant variance than traditional methods. Weighted least squares [for an explanation, see Strutz] explicitly accounts for nonconstant variance by giving less weight to observations with greater variance.
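A sketch of weighted least squares, assuming (for this example only) that the residual SD is proportional to x, so that weights of 1/x² are proportional to the inverse variance:

```python
# Sketch: weighted least squares with inverse-variance weights.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
x = rng.uniform(1, 10, 200)
y = 1.0 + 0.5 * x + rng.normal(scale=0.4 * x)  # residual SD proportional to x

wls = sm.WLS(y, sm.add_constant(x), weights=1.0 / x**2).fit()
print(wls.params)                              # estimates near (1, 0.5)
```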

In conclusion, if the residuals of a least squares regression model do not satisfy the assumptions, the resulting tests and CIs can be misleading. Fortunately, violations can be detected with the tests and graphical methods described here, and remedial measures are often available.

Historically, some authors check assumptions and some do not. For example, Hirose et al. reported checking the assumptions underlying their regression analyses, whereas many published analyses report no such checks.

Finally, this report has not covered all important issues. We have not addressed outliers in either the independent variables (high-leverage data points) or the dependent variables (all but a small number of residuals are approximately normally distributed, but those few suggest very heavy tails). We have not addressed collinearity (some linear combination of independent variables is approximately constant). We have not discussed linearity of the relation between the dependent and independent variable(s). These issues are important but beyond the scope of this report.

The authors thank Pamela Sedgwick-Barker and Srila Sen for their editorial contributions.

The authors reported no funding received for this study.

The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the CDC.

The authors' responsibilities were as follows—LEB and KMS: responsible for design, writing, and final content. Neither author declared a conflict of interest related to this study.

FIGURE 1 Uncorrelated residuals, Pearson correlation = −0.06 (A). Correlated residuals, Pearson correlation = 0.45 (B).

FIGURE 2 Residuals normally distributed, quantile-quantile plot (A). Residuals not normally distributed, quantile-quantile plot (B).

FIGURE 3 Constant variance of residuals, Pearson correlation = 0.00 (A). Increasing variance of residuals, Pearson correlation = 0.00 (B).