Epidemiologic Methods (Epidemiol Method), ISSN 2194-9263 / 2161-962X, DOI: 10.1515/em-2013-0008

Extended Matrix and Inverse Matrix Methods Utilizing Internal Validation Data When Both Disease and Exposure Status Are Misclassified

Li Tang, Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA (li.tang@stjude.org)
Robert H. Lyles, Department of Biostatistics and Bioinformatics, Rollins School of Public Health of Emory University, Atlanta, GA 30322, USA (rlyles@emory.edu)
Ye Ye, Intelligent Systems Program and RODS Laboratory, University of Pittsburgh, Pittsburgh, PA 15206, USA (yey5@pitt.edu)
Yungtai Lo, Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA (yungtai.lo@einstein.yu.edu)
Caroline C. King, Division of Reproductive Health, Centers for Disease Control and Prevention, Atlanta, GA 30341, USA (zpg2@cdc.gov)

The problem of misclassification is common in epidemiological and clinical research. In some cases, misclassification may be incurred when measuring both exposure and outcome variables. It is well known that validity of analytic results (e.g. point and confidence interval estimates for odds ratios of interest) can be forfeited when no correction effort is made. Therefore, valid and accessible methods with which to deal with these issues remain in high demand. Here, we elucidate extensions of well-studied methods in order to facilitate misclassification adjustment when a binary outcome and binary exposure variable are both subject to misclassification. By formulating generalizations of assumptions underlying well-studied “matrix” and “inverse matrix” methods into the framework of maximum likelihood, our approach allows the flexible modeling of a richer set of misclassification mechanisms when adequate internal validation data are available. The value of our extensions and a strong case for the internal validation design are demonstrated by means of simulations and analysis of bacterial vaginosis and trichomoniasis data from the HIV Epidemiology Research Study.

Keywords: inverse matrix method; likelihood; matrix method; misclassification
1 Introduction

In many epidemiologic and clinical studies, one aims to quantify the association between binary disease and exposure status, for instance, via odds ratios (ORs) based on 2 × 2 tables. A common practical problem is that misclassification may exist in one or both variables. The threats to the validity of analytic results that stem from misclassification have received considerable attention. For example, the “matrix method” discussed in epidemiological textbooks (Kleinbaum et al., 1982; Rothman and Greenland, 1998) provides variations on an intuitive correction identity due to Barron (1977) that is parameterized in terms of familiar sensitivity and specificity properties of surrogate measurements on disease and exposure status. Greenland (1988) discussed point estimation and derived variance estimators under differential and nondifferential exposure misclassification using the matrix method, under various validation sampling schemes. By instead parameterizing in terms of positive and negative predictive values, Marshall (1990) developed an alternative correction identity later designated as the “inverse matrix method” (Morrissey and Spiegelman, 1999). The original inverse matrix method is restricted to the situation when there is differential misclassification of one variable (disease or exposure status), in which case it has been shown that Marshall’s closed-form internal validation data-based corrected OR estimator is in fact a maximum-likelihood estimator (MLE) (Lyles, 2002; Greenland, 2008). Efficiency studies comparing the matrix and inverse matrix methods when exposure is misclassified also appear in the literature (Morrissey and Spiegelman, 1999).

We recognize the practical need for developing intuitive methods for estimating ORs in 2 × 2 tables with a more general view of misclassification. In particular, Barron’s (1977) matrix method is an identity that assumes nondifferential and independent misclassification of both variables and is directly applicable only as a sensitivity analysis tool. Greenland and Kleinbaum (1983) extended this identity to permit differential but independent misclassification of both Y and X, but did not delve into efficient analysis based on validation data. Greenland (1988), Marshall (1990), Morrissey and Spiegelman (1999), and Lyles (2002) facilitated efficient estimation of the crude OR via validation data, but all considered misclassification of only one variable (e.g. exposure). Holcroft et al. (1997) tackled a similar problem using a three-stage validation design, proposing a class of semiparametric estimators.

Here, we seek to further extend the focus within the 2 × 2 table setting in a way that allows full generalization of the assumed misclassification process and, as a result, subsumes the preceding treatments as special cases. This extension is driven by the practicalities of study design and analysis: we focus on flexible modeling to account for complex misclassification via a rich internal validation sample when both binary variables are subject to measurement error. Rather than being solely a theoretical exercise, the extension is directly motivated by real data for which we demonstrate that only the most general misclassification model is adequate.

In Section 2, we provide a maximum-likelihood (ML) framework that can be viewed as a practical facilitation of generalized versions of the matrix and inverse matrix methods. To our knowledge, it constitutes the first generalization of the matrix method identity to account for both dependent and differential misclassification and the first generalization of the inverse matrix identity to account for misclassification of both X and Y. We draw comparisons across methods and make suggestions for analyzing data in practice, heavily emphasizing the advantages of internal validation subsampling. This strategy, when feasible, facilitates efficient estimation of corrected ORs while avoiding serious biases that can occur when the assumed misclassification model is too simplistic. In addition, we suggest a model selection procedure that is readily implemented in standard statistical software. While our primary focus is on the point estimation of ORs in cross-sectional studies, we also briefly address the applicability of the methods to case–control studies. In Section 3, we introduce our motivating example, based on assessments of bacterial vaginosis (BV) and trichomoniasis (TRICH) in the HIV Epidemiology Research Study (HERS). This example clearly illustrates how serious misinterpretation of the data can result when overly simplified misclassification models are assumed and highlights the benefits of the proposed approach. In Section 4, we present simulation studies to demonstrate the overall performance of the ML methodology in the context of cross-sectional studies.

2 Methods

2.1 Notation and terminology

2.1.1 Differential and dependent misclassification

Consider a 2 × 2 table in which one measures an error-prone surrogate X* in place of a true exposure X and an error-prone Y* in place of a true response Y. We assume that X, X*, Y, and Y* are all binary variables. Now define $\pi_{xy} = \Pr(X = x, Y = y)$ and $\pi^*_{x^*y^*} = \Pr(X^* = x^*, Y^* = y^*)$ for $x, y, x^*, y^* \in \{0, 1\}$. The true OR of primary interest is $\pi_{11}\pi_{00}/(\pi_{10}\pi_{01})$, while with misclassification in both variables, the naïve OR is $\pi^*_{11}\pi^*_{00}/(\pi^*_{10}\pi^*_{01})$.

The observed data likelihood contribution for an observation with (X* = x*, Y* = y*) can be expressed without loss of generality as

$$\pi^*_{x^*y^*} = \sum_{x=0}^{1}\sum_{y=0}^{1} \Pr(Y^* = y^* \mid Y = y, X = x, X^* = x^*)\,\Pr(X^* = x^* \mid X = x, Y = y)\,\pi_{xy}. \tag{1}$$

The first two factors in the summand of eq. [1] give the most general form of the likelihood, expressed through generalized versions of the familiar misclassification parameters known as sensitivity (SE) and specificity (SP). Without additional constraints, we define $\mathrm{SE}_{Yxx^*} = \Pr(Y^* = 1 \mid Y = 1, X = x, X^* = x^*)$ and $\mathrm{SP}_{Yxx^*} = \Pr(Y^* = 0 \mid Y = 0, X = x, X^* = x^*)$. Note that the misclassification parameters for Y depend on both the true exposure X and its surrogate X*: the misclassification of Y is differential, and it also depends on how X, itself subject to misclassification, was classified. This is potentially important, since it is far more common to assume independence of the two misclassification processes (see Section 2.1.2). Similarly, denote $\mathrm{SE}_{Xy} = \Pr(X^* = 1 \mid X = 1, Y = y)$ and $\mathrm{SP}_{Xy} = \Pr(X^* = 0 \mid X = 0, Y = y)$, which take the typical form associated with differential misclassification (Thomas et al., 1993). In terms of terminology, we view the general expression in eq. [1] as reflecting “differential and dependent misclassification”.
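To make eq. [1] concrete, the following minimal Python sketch evaluates it by summing over the four true cells. The function name `surrogate_cells`, the dictionary layout, and the SE/SP values are hypothetical illustrations rather than part of the original method; the true cell probabilities are those used in Study I (Section 4.1).

```python
from itertools import product

def surrogate_cells(pi, se_y, sp_y, se_x, sp_x):
    """Evaluate eq. [1]: map true cells pi[(x, y)] to surrogate cells pi*[(x*, y*)].

    se_y[(x, xs)] = Pr(Y*=1 | Y=1, X=x, X*=xs); sp_y[(x, xs)] = Pr(Y*=0 | Y=0, X=x, X*=xs)
    se_x[y]       = Pr(X*=1 | X=1, Y=y);        sp_x[y]       = Pr(X*=0 | X=0, Y=y)
    """
    def p_xstar(xs, x, y):
        # Pr(X* = xs | X = x, Y = y): differential in Y
        p1 = se_x[y] if x == 1 else 1 - sp_x[y]
        return p1 if xs == 1 else 1 - p1

    def p_ystar(ys, y, x, xs):
        # Pr(Y* = ys | Y = y, X = x, X* = xs): differential and dependent
        p1 = se_y[(x, xs)] if y == 1 else 1 - sp_y[(x, xs)]
        return p1 if ys == 1 else 1 - p1

    return {
        (xs, ys): sum(p_ystar(ys, y, x, xs) * p_xstar(xs, x, y) * pi[(x, y)]
                      for x, y in product((0, 1), repeat=2))
        for xs, ys in product((0, 1), repeat=2)
    }

# True cells from Study I (keys are (x, y)); misclassification parameters are hypothetical.
pi = {(1, 1): 0.1146, (1, 0): 0.2871, (0, 1): 0.0677, (0, 0): 0.5306}
se_y = {(1, 1): 0.90, (1, 0): 0.70, (0, 1): 0.80, (0, 0): 0.60}
sp_y = {(1, 1): 0.95, (1, 0): 0.97, (0, 1): 0.93, (0, 0): 0.99}
se_x = {1: 0.55, 0: 0.50}
sp_x = {1: 0.92, 0: 0.95}
pi_star = surrogate_cells(pi, se_y, sp_y, se_x, sp_x)
naive_or = pi_star[(1, 1)] * pi_star[(0, 0)] / (pi_star[(1, 0)] * pi_star[(0, 1)])
```

Setting every SE and SP to 1 returns the true cells unchanged, which is a quick sanity check on the indexing.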

Alternatively, one may parameterize in terms of positive and negative predictive values, which express the true cell probabilities through the surrogate ones, that is,

$$\pi_{xy} = \sum_{x^*=0}^{1}\sum_{y^*=0}^{1} \Pr(Y = y \mid Y^* = y^*, X = x, X^* = x^*)\,\Pr(X = x \mid X^* = x^*, Y^* = y^*)\,\pi^*_{x^*y^*}, \tag{2}$$

where the first and second factors relate to predictive values of Y and X, respectively, defined as $\mathrm{PPV}_{Yxx^*} = \Pr(Y = 1 \mid Y^* = 1, X = x, X^* = x^*)$, $\mathrm{NPV}_{Yxx^*} = \Pr(Y = 0 \mid Y^* = 0, X = x, X^* = x^*)$, $\mathrm{PPV}_{Xy^*} = \Pr(X = 1 \mid X^* = 1, Y^* = y^*)$, and $\mathrm{NPV}_{Xy^*} = \Pr(X = 0 \mid X^* = 0, Y^* = y^*)$. In contrast to the parameterization using SE and SP, note that the predictive values of X depend on the potentially mismeasured response Y*. Again, the predictive values of Y depend on both X and X*, implying that the misclassification of Y depends on the other misclassified variable. When only X is subject to misclassification, eq. [2] can be rewritten as

$$\pi_{xy} = \sum_{x^*=0}^{1} \Pr(X = x \mid X^* = x^*, Y = y)\,\Pr(X^* = x^*, Y = y).$$

This reflects Marshall’s (1990) original proposal, which we refer to as the “inverse matrix method”.
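For the single-variable case just described, the inverse matrix correction is two lines per level of Y. The sketch below assumes only X is misclassified; the function names and all numerical inputs are hypothetical, chosen only to show the arithmetic.

```python
def marshall_correct(q, ppv, npv):
    """Inverse matrix correction when only X is misclassified (Marshall-type identity).

    q[(xs, y)] = Pr(X* = xs, Y = y); ppv[y] = Pr(X=1 | X*=1, Y=y); npv[y] = Pr(X=0 | X*=0, Y=y).
    Returns pi[(x, y)] = sum over xs of Pr(X = x | X* = xs, Y = y) * q[(xs, y)].
    """
    pi = {}
    for y in (0, 1):
        pi[(1, y)] = ppv[y] * q[(1, y)] + (1 - npv[y]) * q[(0, y)]
        pi[(0, y)] = (1 - ppv[y]) * q[(1, y)] + npv[y] * q[(0, y)]
    return pi

def odds_ratio(c):
    """OR = c11 c00 / (c10 c01), with keys (x, y)."""
    return c[(1, 1)] * c[(0, 0)] / (c[(1, 0)] * c[(0, 1)])

# Hypothetical surrogate-cell probabilities and predictive values.
q = {(1, 1): 0.10, (1, 0): 0.20, (0, 1): 0.15, (0, 0): 0.55}
ppv = {1: 0.85, 0: 0.75}
npv = {1: 0.90, 0: 0.95}
pi_hat = marshall_correct(q, ppv, npv)
corrected_or = odds_ratio(pi_hat)
```

Because Y is measured without error here, the correction only redistributes mass between X = 1 and X = 0 within each level of Y, so the Y margins are preserved.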

2.1.2 Differential and independent misclassification

Assuming independent misclassification implies that $\Pr(Y^* = y^*, X^* = x^* \mid Y = y, X = x) = \Pr(Y^* = y^* \mid Y = y, X = x)\,\Pr(X^* = x^* \mid X = x, Y = y)$. In other words, X* and Y* are conditionally independent given (X, Y). However, it should be noted that the reverse may not be true. This corresponds to reducing eq. [1] to the following form:

$$\pi^*_{x^*y^*} = \sum_{x=0}^{1}\sum_{y=0}^{1} \Pr(Y^* = y^* \mid Y = y, X = x)\,\Pr(X^* = x^* \mid X = x, Y = y)\,\pi_{xy}, \tag{3}$$

where misclassification of Y depends only on the true exposure X and is characterized by the parameters $\mathrm{SE}_{Yx} = \Pr(Y^* = 1 \mid Y = 1, X = x)$ and $\mathrm{SP}_{Yx} = \Pr(Y^* = 0 \mid Y = 0, X = x)$. The misclassification model for X stays the same as in Section 2.1.1.

2.1.3 Nondifferential and independent misclassification

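When assuming nondifferential and independent misclassification, we define $\mathrm{SE}_X = \Pr(X^* = 1 \mid X = 1)$, $\mathrm{SP}_X = \Pr(X^* = 0 \mid X = 0)$, $\mathrm{SE}_Y = \Pr(Y^* = 1 \mid Y = 1)$, and $\mathrm{SP}_Y = \Pr(Y^* = 0 \mid Y = 0)$. We can then rewrite the observed data likelihood contribution as

$$\pi^*_{x^*y^*} = \sum_{x=0}^{1}\sum_{y=0}^{1} \Pr(Y^* = y^* \mid Y = y)\,\Pr(X^* = x^* \mid X = x)\,\pi_{xy}. \tag{4}$$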

This corresponds to the setting originally studied by Barron (1977).
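Under eq. [4] the joint misclassification probability factors into a Y part and an X part, so the 4 × 4 matrix linking true and surrogate cells (the matrix A of Section 2.3, under this model) can be written as a Kronecker product of two 2 × 2 matrices. This factorization is our own observation from eq. [4], sketched below in plain Python with hypothetical SE/SP values; `kron` and `barron_matrix` are illustrative names.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def barron_matrix(se_y, sp_y, se_x, sp_x):
    """Misclassification matrix under eq. [4] (nondifferential, independent).

    Rows and columns are ordered 11, 01, 10, 00 (subscripts x, y), so Y is the outer
    index and X the inner one; each 2x2 factor is ordered (1, 0) and A = A_Y kron A_X.
    """
    a_y = [[se_y, 1 - sp_y], [1 - se_y, sp_y]]  # rows y* = 1, 0; columns y = 1, 0
    a_x = [[se_x, 1 - sp_x], [1 - se_x, sp_x]]  # rows x* = 1, 0; columns x = 1, 0
    return kron(a_y, a_x)

A = barron_matrix(se_y=0.80, sp_y=0.95, se_x=0.70, sp_x=0.90)  # hypothetical values
pi = [0.1146, 0.0677, 0.2871, 0.5306]  # (pi_11, pi_01, pi_10, pi_00) from Study I
pi_star = [sum(A[r][c] * pi[c] for c in range(4)) for r in range(4)]
```

The first row of `A` reproduces the coefficients of $\pi^*_{11}$ in Barron's setting, and every column sums to 1 because each true cell must land in some surrogate cell.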

2.1.4 Other combinations

Sections 2.1.1–2.1.3 outline three misclassification mechanisms. However, other possibilities exist; for example, Y could be differentially but X nondifferentially misclassified. While we confine our main attention to the three situations described above, the proposed methodology accommodates such variations without difficulty, given adequate internal validation sampling.

2.2 ML approach

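In general, the main study likelihood based on the observed data pairs $(y^*_m, x^*_m)$ $(m = 1, \ldots, M)$ can be expressed as

$$L_{\text{main}} = \prod_{m=1}^{M} (\pi^*_{11})^{y^*_m x^*_m}\,(\pi^*_{01})^{(1-x^*_m)y^*_m}\,(\pi^*_{10})^{x^*_m(1-y^*_m)}\,(\pi^*_{00})^{(1-x^*_m)(1-y^*_m)}, \tag{5}$$

where the π*s take the forms corresponding to the assumptions on the misclassification process described in Section 2.1, and m indexes subjects in the main study sample. For instance, if parameterizing in terms of SE/SP and allowing differential and dependent misclassification, we have

$$\pi^*_{11} = \mathrm{SE}_{Y11}\,\pi_{11}\,\mathrm{SE}_{X1} + \mathrm{SE}_{Y01}\,\pi_{01}(1-\mathrm{SP}_{X1}) + (1-\mathrm{SP}_{Y11})\,\pi_{10}\,\mathrm{SE}_{X0} + (1-\mathrm{SP}_{Y01})\,\pi_{00}(1-\mathrm{SP}_{X0}).$$

In contrast, if independence is assumed while preserving differentiality in both variables,

$$\pi^*_{11} = \mathrm{SE}_{Y1}\,\pi_{11}\,\mathrm{SE}_{X1} + \mathrm{SE}_{Y0}\,\pi_{01}(1-\mathrm{SP}_{X1}) + (1-\mathrm{SP}_{Y1})\,\pi_{10}\,\mathrm{SE}_{X0} + (1-\mathrm{SP}_{Y0})\,\pi_{00}(1-\mathrm{SP}_{X0}).$$

Under the most simplified setting (e.g. Barron, 1977), the simultaneous assumptions of independent and nondifferential misclassification imply that

$$\pi^*_{11} = \mathrm{SE}_{Y}\,\pi_{11}\,\mathrm{SE}_{X} + \mathrm{SE}_{Y}\,\pi_{01}(1-\mathrm{SP}_{X}) + (1-\mathrm{SP}_{Y})\,\pi_{10}\,\mathrm{SE}_{X} + (1-\mathrm{SP}_{Y})\,\pi_{00}(1-\mathrm{SP}_{X}).$$

The other π*s are derived similarly under each scenario (Tang, 2012). Note that the “main study only” likelihood in eq. [5] is directly applicable solely for sensitivity analysis, since the (X*, Y*) data alone cannot identify the misclassification parameters. We emphasize extensions to accommodate a main/internal validation design in Section 2.5.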
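Because each main-study record contributes exactly one of the four π*s in eq. [5], the log-likelihood reduces to a sum over cells of count times log-probability. A minimal sketch, with hypothetical records and π* values:

```python
import math
from collections import Counter

def main_loglik(records, pi_star):
    """Log of eq. [5]: each main-study record (x*, y*) contributes log pi*[(x*, y*)].

    Grouping identical records gives sum over cells of n*_{x*y*} * log pi*_{x*y*}.
    """
    counts = Counter(records)
    return sum(n * math.log(pi_star[cell]) for cell, n in counts.items())

# Hypothetical main-study records (x*, y*) and surrogate-cell probabilities.
records = [(1, 1)] * 30 + [(1, 0)] * 70 + [(0, 1)] * 40 + [(0, 0)] * 260
pi_star = {(1, 1): 0.08, (1, 0): 0.17, (0, 1): 0.10, (0, 0): 0.65}
ll = main_loglik(records, pi_star)
```

In practice `pi_star` would be replaced by one of the expressions above as a function of the π's and SE/SP parameters; on its own this likelihood has only three degrees of freedom, which is why it supports sensitivity analysis but not estimation of the misclassification parameters.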

2.3 Generalized matrix method

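We generalize the concept of the matrix method and its extensions (Kleinbaum et al., 1982; Greenland and Kleinbaum, 1983) by flexibly incorporating the full range of possible misclassification models. In general, one is able to relate surrogate and true cell probabilities via the equality $\Pi^* = A\Pi$, where $\Pi^* = (\pi^*_{11}, \pi^*_{01}, \pi^*_{10}, \pi^*_{00})'$, $\Pi = (\pi_{11}, \pi_{01}, \pi_{10}, \pi_{00})'$, and the definition of A varies according to the assumptions made. For differential and dependent misclassification, we derive A in its most general form as follows:

$$A = \begin{bmatrix}
\mathrm{SE}_{Y11}\mathrm{SE}_{X1} & \mathrm{SE}_{Y01}(1-\mathrm{SP}_{X1}) & (1-\mathrm{SP}_{Y11})\mathrm{SE}_{X0} & (1-\mathrm{SP}_{Y01})(1-\mathrm{SP}_{X0}) \\
\mathrm{SE}_{Y10}(1-\mathrm{SE}_{X1}) & \mathrm{SE}_{Y00}\mathrm{SP}_{X1} & (1-\mathrm{SP}_{Y10})(1-\mathrm{SE}_{X0}) & (1-\mathrm{SP}_{Y00})\mathrm{SP}_{X0} \\
(1-\mathrm{SE}_{Y11})\mathrm{SE}_{X1} & (1-\mathrm{SE}_{Y01})(1-\mathrm{SP}_{X1}) & \mathrm{SP}_{Y11}\mathrm{SE}_{X0} & \mathrm{SP}_{Y01}(1-\mathrm{SP}_{X0}) \\
(1-\mathrm{SE}_{Y10})(1-\mathrm{SE}_{X1}) & (1-\mathrm{SE}_{Y00})\mathrm{SP}_{X1} & \mathrm{SP}_{Y10}(1-\mathrm{SE}_{X0}) & \mathrm{SP}_{Y00}\mathrm{SP}_{X0}
\end{bmatrix}$$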

Under other assumptions, the matrix A can be derived as in Appendix 1. The matrix method identity relies upon inversion of the matrix A in order to obtain the vector Π = A−1Π*.
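The inversion step can be sketched in plain Python: build the general A above, push a known Π forward to Π*, and solve AΠ = Π* to recover it. The names `matrix_a` and `solve` and all SE/SP values are hypothetical; `solve` is an ordinary Gauss–Jordan elimination with partial pivoting, used here only to avoid a third-party dependency.

```python
def matrix_a(se_y, sp_y, se_x, sp_x):
    """General misclassification matrix A (dependent, differential misclassification).

    Rows are pi*_(x*y*) and columns pi_(xy), both ordered 11, 01, 10, 00.
    se_y/sp_y are keyed by (x, x*); se_x/sp_x are keyed by y.
    """
    order = [(1, 1), (0, 1), (1, 0), (0, 0)]

    def entry(xs, ys, x, y):
        px1 = se_x[y] if x == 1 else 1 - sp_x[y]             # Pr(X*=1 | X=x, Y=y)
        py1 = se_y[(x, xs)] if y == 1 else 1 - sp_y[(x, xs)]  # Pr(Y*=1 | Y=y, X=x, X*=xs)
        return (px1 if xs == 1 else 1 - px1) * (py1 if ys == 1 else 1 - py1)

    return [[entry(xs, ys, x, y) for (x, y) in order] for (xs, ys) in order]

def solve(a, b):
    """Solve a v = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("A is numerically singular")
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Hypothetical SE/SP values; Pi from Study I, ordered (pi_11, pi_01, pi_10, pi_00).
se_y = {(1, 1): 0.90, (1, 0): 0.70, (0, 1): 0.80, (0, 0): 0.60}
sp_y = {(1, 1): 0.95, (1, 0): 0.97, (0, 1): 0.93, (0, 0): 0.99}
se_x = {1: 0.55, 0: 0.50}
sp_x = {1: 0.92, 0: 0.95}
A = matrix_a(se_y, sp_y, se_x, sp_x)
pi_true = [0.1146, 0.0677, 0.2871, 0.5306]
pi_star = [sum(A[r][c] * pi_true[c] for c in range(4)) for r in range(4)]
pi_back = solve(A, pi_star)  # Pi = A^{-1} Pi*
```

With exact SE/SP values the round trip recovers Π; with noisy, non-ML estimates the same solve can return negative "probabilities", which is one reason the text treats the identity mainly as a sensitivity-analysis tool.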

2.4 Generalized inverse matrix method

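The inverse matrix identity directly expresses true cell probabilities as sums of products of surrogate cell probabilities and predictive values. Here, we extend the proposal of Marshall (1990) to a general context with both variables misclassified in a 2 × 2 table. For example, under dependent and differential misclassification, the law of total probability dictates that

$$\pi_{11} = \mathrm{PPV}_{Y11}\,\pi^*_{11}\,\mathrm{PPV}_{X1} + (1-\mathrm{NPV}_{Y11})\,\pi^*_{10}\,\mathrm{PPV}_{X0} + \mathrm{PPV}_{Y10}\,\pi^*_{01}(1-\mathrm{NPV}_{X1}) + (1-\mathrm{NPV}_{Y10})\,\pi^*_{00}(1-\mathrm{NPV}_{X0}).$$

Packaging these linear equations into matrices, the generalized inverse matrix method takes the same form as Marshall’s original proposal, $\Pi = B\Pi^*$. However, in our approach the matrix B takes a more complicated form to accommodate a general misclassification mechanism for both the X and the Y variables:

$$B = \begin{bmatrix}
\mathrm{PPV}_{Y11}\mathrm{PPV}_{X1} & \mathrm{PPV}_{Y10}(1-\mathrm{NPV}_{X1}) & (1-\mathrm{NPV}_{Y11})\mathrm{PPV}_{X0} & (1-\mathrm{NPV}_{Y10})(1-\mathrm{NPV}_{X0}) \\
\mathrm{PPV}_{Y01}(1-\mathrm{PPV}_{X1}) & \mathrm{PPV}_{Y00}\mathrm{NPV}_{X1} & (1-\mathrm{NPV}_{Y01})(1-\mathrm{PPV}_{X0}) & (1-\mathrm{NPV}_{Y00})\mathrm{NPV}_{X0} \\
(1-\mathrm{PPV}_{Y11})\mathrm{PPV}_{X1} & (1-\mathrm{PPV}_{Y10})(1-\mathrm{NPV}_{X1}) & \mathrm{NPV}_{Y11}\mathrm{PPV}_{X0} & \mathrm{NPV}_{Y10}(1-\mathrm{NPV}_{X0}) \\
(1-\mathrm{PPV}_{Y01})(1-\mathrm{PPV}_{X1}) & (1-\mathrm{PPV}_{Y00})\mathrm{NPV}_{X1} & \mathrm{NPV}_{Y01}(1-\mathrm{PPV}_{X0}) & \mathrm{NPV}_{Y00}\mathrm{NPV}_{X0}
\end{bmatrix}$$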

In contrast to the generalized matrix method, there is no matrix inversion involved in computing the corrected OR through the generalized inverse matrix method. In principle, this could confer a numerical advantage in practice, although again direct use of the identity is generally restricted to the setting of sensitivity analysis.
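A minimal sketch of the generalized inverse matrix identity: build B from predictive values and multiply, with no inversion. The function name `matrix_b` and every numerical input are hypothetical illustrations.

```python
def matrix_b(ppv_y, npv_y, ppv_x, npv_x):
    """General inverse matrix B (dependent, differential misclassification).

    Rows are pi_(xy) and columns pi*_(x*y*), both ordered 11, 01, 10, 00.
    ppv_y/npv_y are keyed by (x, x*); ppv_x/npv_x are keyed by y*.
    """
    order = [(1, 1), (0, 1), (1, 0), (0, 0)]

    def entry(x, y, xs, ys):
        px1 = ppv_x[ys] if xs == 1 else 1 - npv_x[ys]            # Pr(X=1 | X*=xs, Y*=ys)
        py1 = ppv_y[(x, xs)] if ys == 1 else 1 - npv_y[(x, xs)]  # Pr(Y=1 | Y*=ys, X=x, X*=xs)
        return (px1 if x == 1 else 1 - px1) * (py1 if y == 1 else 1 - py1)

    return [[entry(x, y, xs, ys) for (xs, ys) in order] for (x, y) in order]

# Hypothetical predictive values and surrogate cells (pi*_11, pi*_01, pi*_10, pi*_00).
ppv_y = {(1, 1): 0.85, (1, 0): 0.60, (0, 1): 0.70, (0, 0): 0.50}
npv_y = {(1, 1): 0.80, (1, 0): 0.75, (0, 1): 0.90, (0, 0): 0.95}
ppv_x = {1: 0.88, 0: 0.80}
npv_x = {1: 0.70, 0: 0.85}
pi_star = [0.075, 0.095, 0.170, 0.660]
B = matrix_b(ppv_y, npv_y, ppv_x, npv_x)
pi = [sum(B[r][c] * pi_star[c] for c in range(4)) for r in range(4)]  # Pi = B Pi*
corrected_or = pi[0] * pi[3] / (pi[2] * pi[1])  # pi_11 pi_00 / (pi_10 pi_01)
```

Each column of B is a probability distribution over the true cells given one surrogate cell, so the entries of `pi` are nonnegative whenever the predictive values lie in [0, 1].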

2.5 Estimation via internal validation sampling

The estimate of the corrected OR is $\widehat{\mathrm{OR}} = \hat\pi_{11}\hat\pi_{00}/(\hat\pi_{10}\hat\pi_{01})$. For all of the approaches presented above, estimation of misclassification probabilities is crucial in practice. When possible, we recommend the use of an internal validation subsample randomly selected from one’s current study, for which both true binary variables are measured via gold-standard methods along with the error-prone methods used in the main study. The primary appeal of adopting internal (as opposed to external) validation sampling is the avoidance of the necessity to assume “transportability” of misclassification probabilities (Begg, 1987; Carroll et al., 2006) and the accommodation of more general misclassification mechanisms.

When allowing full generality, that is, dependent and differential misclassification, it can be shown that the full likelihood based on the proposed main/internal validation design is the same regardless of whether it is parameterized in terms of predictive values or SE/SP probabilities (Tang, 2012). When X, Y, X*, and Y* are all recorded for each subject in the subsample, there are 16 types of validation records in total. Table 1 shows the likelihood contributions for each validation record type under both parameterizations. In contrast, the main study likelihood based on (X*, Y*) records is given explicitly in eq. [5], that is,

$$L_{\text{main}} = \prod_{m=1}^{M} (\pi^*_{11})^{y^*_m x^*_m}\,(\pi^*_{01})^{(1-x^*_m)y^*_m}\,(\pi^*_{10})^{x^*_m(1-y^*_m)}\,(\pi^*_{00})^{(1-x^*_m)(1-y^*_m)}.$$

If parameterizing in terms of SE and SP values, all the π*s are further expanded (see Section 2.2).

The internal validation subsample likelihood is given by

$$L_{\text{val}} = \prod_{p=1}^{16} L_{vp}^{\,n_{vp}},$$

where $L_{vp}$ is the likelihood term corresponding to observation type p in Table 1 and $n_{vp}$ is the total number of observations of the pth type (p = 1, 2, …, 16). Note that the total validation sample size is $n_v = \sum_{p=1}^{16} n_{vp}$. The overall likelihood to be maximized is based on a total of $M + n_v$ subjects and is proportional to the product of the main and validation study components, i.e. $L_{\text{main}} \times L_{\text{val}}$.
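Table 1 is not reproduced here, but under the SE/SP parameterization each of the 16 record types (X, Y, X*, Y*) has, by our reading of the factors in eq. [1], the joint probability $\pi_{xy}\Pr(X^* \mid X, Y)\Pr(Y^* \mid Y, X, X^*)$. The sketch below assumes that form; `record_prob`, `val_loglik`, and all numbers are hypothetical.

```python
import math
from itertools import product

def record_prob(x, y, xs, ys, pi, se_y, sp_y, se_x, sp_x):
    """Joint probability of one validated record (X, Y, X*, Y*), SE/SP parameterization:
    pi_xy * Pr(X* = xs | X, Y) * Pr(Y* = ys | Y, X, X*)."""
    px1 = se_x[y] if x == 1 else 1 - sp_x[y]
    py1 = se_y[(x, xs)] if y == 1 else 1 - sp_y[(x, xs)]
    return pi[(x, y)] * (px1 if xs == 1 else 1 - px1) * (py1 if ys == 1 else 1 - py1)

def val_loglik(counts, pi, se_y, sp_y, se_x, sp_x):
    """log L_val = sum over record types p of n_vp * log L_vp; counts keyed by (x, y, x*, y*)."""
    return sum(n * math.log(record_prob(*rec, pi, se_y, sp_y, se_x, sp_x))
               for rec, n in counts.items() if n > 0)

# Hypothetical parameters and validation counts.
pi = {(1, 1): 0.1146, (1, 0): 0.2871, (0, 1): 0.0677, (0, 0): 0.5306}
se_y = {(1, 1): 0.90, (1, 0): 0.70, (0, 1): 0.80, (0, 0): 0.60}
sp_y = {(1, 1): 0.95, (1, 0): 0.97, (0, 1): 0.93, (0, 0): 0.99}
se_x = {1: 0.55, 0: 0.50}
sp_x = {1: 0.92, 0: 0.95}
counts = {(1, 1, 1, 1): 12, (1, 1, 0, 1): 4, (0, 0, 0, 0): 150,
          (1, 0, 1, 0): 40, (0, 1, 0, 1): 9, (0, 0, 1, 0): 4}
ll_val = val_loglik(counts, pi, se_y, sp_y, se_x, sp_x)
```

The full log-likelihood is then this term plus the main-study term of eq. [5], maximized jointly over the π's and the misclassification parameters.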

There are no closed-form solutions for the MLEs based on the overall likelihood written in terms of SE and SP. Interestingly, however, closed forms exist for the predictive value parameterization in the most general case. For example, one can readily verify that

$$\hat\pi^*_{11} = \frac{\sum_{i=1}^{M+n_v} I(X^*_i = 1, Y^*_i = 1)}{M + n_v} \quad\text{and}\quad \widehat{\mathrm{PPV}}_{Y11} = \frac{\sum_{i=1}^{n_v} I(\text{val}_i = 1, y_i = 1, y^*_i = 1, x_i = 1, x^*_i = 1)}{\sum_{i=1}^{n_v} I(\text{val}_i = 1, y^*_i = 1, x_i = 1, x^*_i = 1)},$$

where I(·) is an indicator that the conditions in its argument are met (Tang, 2012). The MLEs of the πs can then be obtained from the $\hat\pi^*$s, $\widehat{\mathrm{PPV}}$s, and $\widehat{\mathrm{NPV}}$s by direct use of the generalized inverse matrix identity of Section 2.4. Because the two parameterizations are equivalent under dependent and differential misclassification, we may also obtain closed-form MLEs of the SE and SP parameters as functions of the $\widehat{\mathrm{PPV}}$s and $\widehat{\mathrm{NPV}}$s in that setting. For example,

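$$\widehat{\mathrm{SE}}_{Y11} = \frac{\widehat{\mathrm{PPV}}_{Y11}\,\widehat{\mathrm{PPV}}_{X1}\,\hat\pi^*_{11}}{\widehat{\mathrm{PPV}}_{Y11}\,\widehat{\mathrm{PPV}}_{X1}\,\hat\pi^*_{11} + (1-\widehat{\mathrm{NPV}}_{Y11})\,\widehat{\mathrm{PPV}}_{X0}\,\hat\pi^*_{10}}.$$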

The remaining closed-form MLEs are displayed in Appendix 2.
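The closed forms above involve only counts, so they are easy to compute directly. The sketch below computes $\widehat{\mathrm{SE}}_{Y11}$ from main-study surrogate pairs plus a validation subsample; the record layout, function names, and the small dataset are hypothetical. One consequence that can be checked by hand: with no main-study records, the formula collapses to the crude validation-sample proportion $\Pr(Y^* = 1 \mid Y = 1, X = 1, X^* = 1)$.

```python
from collections import namedtuple

Rec = namedtuple("Rec", "x y xs ys")  # true X, true Y, surrogate X*, surrogate Y*

def prop(records, event, given):
    """Empirical Pr(event | given); raises ZeroDivisionError if no record satisfies `given`."""
    pool = [r for r in records if given(r)]
    return sum(1 for r in pool if event(r)) / len(pool)

def closed_form_se_y11(main, val):
    """Closed-form MLE of SE_Y11 via predictive values (dependent, differential model).

    main: list of (x*, y*) pairs from the main study; val: list of Rec (validation subsample).
    """
    surrogates = list(main) + [(r.xs, r.ys) for r in val]  # all M + n_v subjects
    n = len(surrogates)
    pi_star_11 = sum(1 for s in surrogates if s == (1, 1)) / n
    pi_star_10 = sum(1 for s in surrogates if s == (1, 0)) / n
    ppv_y11 = prop(val, lambda r: r.y == 1, lambda r: r.ys == 1 and r.x == 1 and r.xs == 1)
    npv_y11 = prop(val, lambda r: r.y == 0, lambda r: r.ys == 0 and r.x == 1 and r.xs == 1)
    ppv_x1 = prop(val, lambda r: r.x == 1, lambda r: r.xs == 1 and r.ys == 1)
    ppv_x0 = prop(val, lambda r: r.x == 1, lambda r: r.xs == 1 and r.ys == 0)
    num = ppv_y11 * ppv_x1 * pi_star_11
    return num / (num + (1 - npv_y11) * ppv_x0 * pi_star_10)

# Hypothetical validation subsample (26 records) and main-study surrogate pairs.
val = ([Rec(1, 1, 1, 1)] * 6 + [Rec(1, 1, 1, 0)] * 2 + [Rec(1, 0, 1, 0)] * 3
       + [Rec(1, 0, 1, 1)] + [Rec(0, 0, 0, 0)] * 8 + [Rec(0, 1, 0, 1)] * 3
       + [Rec(0, 1, 0, 0)] + [Rec(1, 1, 0, 1)] + [Rec(0, 0, 1, 1)])
main = [(1, 1)] * 20 + [(1, 0)] * 15 + [(0, 1)] * 10 + [(0, 0)] * 55
se_y11_hat = closed_form_se_y11(main, val)
```

With the main study included, the surrogate-cell proportions borrow strength from all M + n_v subjects, which is where the efficiency gain over the crude estimate comes from.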

When the misclassification process is not fully general (e.g. assuming independent misclassification and/or nondifferential misclassification of either variable), the equivalence between the likelihoods based on the SE/SP and predictive value parameterizations no longer holds. In such cases, there appear to be no simple closed forms for the likelihood-based $\widehat{\mathrm{SE}}$s, $\widehat{\mathrm{SP}}$s, and $\hat\pi$s. If one supplies the generalized matrix method with data-driven SE and SP estimates that are not MLEs, the corrected $\widehat{\mathrm{OR}}$ will not be fully efficient. These conclusions are consistent with previous findings in a simpler context, with misclassification of only one variable (Lyles, 2002).

In general, we recommend the use of the ML approach for optimal efficiency and the ease of numerically computing standard errors. Optimizing the full main/internal validation likelihood under either parameterization is readily achieved by taking advantage of numerical procedures in standard statistical software. As such, we view the matrix and inverse matrix constructs more as instructive identities than as practical analysis tools, unless they are to be used solely for sensitivity analyses. Straightforward multivariate delta-method calculations allow computing the approximate standard error of the corrected $\log(\widehat{\mathrm{OR}})$ based on ML, after obtaining the $\hat\pi$s and the corresponding numerically derived Hessian. SAS NLMIXED (SAS Institute, Inc., 2008) programs for accomplishing these tasks are readily available from the first author.
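The delta-method step can be sketched as follows: with gradient $g = (1/\pi_{11}, -1/\pi_{01}, -1/\pi_{10}, 1/\pi_{00})$ of $\log\mathrm{OR}$, the variance is $g^\top \Sigma g$. Here the covariance matrix Σ of the four $\hat\pi$s is simply an input (in the paper's approach it would come from the inverse of the numerically derived Hessian, mapped to the π scale). As a check that the gradient is right, a multinomial covariance, which we assume purely for testing, should reproduce Woolf's familiar $\sqrt{\sum 1/n_{xy}}$.

```python
import math

def se_log_or(pi, cov):
    """Delta-method SE of log(OR) = log(pi11) + log(pi00) - log(pi10) - log(pi01).

    pi: estimated cells ordered (pi11, pi01, pi10, pi00);
    cov: 4x4 covariance matrix of those estimates.
    """
    g = [1 / pi[0], -1 / pi[1], -1 / pi[2], 1 / pi[3]]  # gradient of log(OR)
    var = sum(g[i] * cov[i][j] * g[j] for i in range(4) for j in range(4))
    return math.sqrt(var)

# Check with a multinomial covariance (diag(p) - p p') / n: should equal Woolf's formula.
pi = [0.1146, 0.0677, 0.2871, 0.5306]
n = 906
cov = [[((pi[i] if i == j else 0.0) - pi[i] * pi[j]) / n for j in range(4)] for i in range(4)]
se = se_log_or(pi, cov)
woolf = math.sqrt(sum(1 / (n * p) for p in pi))
```

The rank-one term $-pp'/n$ drops out because the gradient is orthogonal to p ($\sum_i g_i \pi_i = 1 - 1 - 1 + 1 = 0$), leaving $\sum 1/(n\pi_{xy})$.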

A natural question is whether measuring (X*, Y*) on every subject in addition to (X, Y) yields a different or improved estimate of the true OR characterizing the (X, Y) association. In fact, if (X, Y, X*, Y*) is available on all participants, the available information for estimating the OR is equivalent to that contained in the (X, Y) data alone. The overall likelihood then reduces to $L_{\text{val}}$. In Appendix 3, we show that maximizing this reduced likelihood ($L_{\text{val}}$ only) under the most general misclassification model leads to exactly the same MLEs of the πs as those obtained from analyses ignoring (X*, Y*). A similar argument can be readily derived under other types of misclassification models. This finding, unsurprisingly, suggests that knowing the surrogates when gold-standard measures are available on the whole sample offers no additional value for estimating the primary effect of interest (e.g. the OR), which further implies that if gold-standard measures are affordable relative to surrogates, it is more efficient to evaluate via gold standards only.

2.6 Notes on case–control studies

While our focus has been on cross-sectional sampling, the case–control sampling scheme is also worthy of discussion. Here, we consider “case–control” studies as those where case oversampling is conducted based on the error-prone responses. In other words, observations with Y* = 1 (“cases”) are sampled with a greater probability than those with Y* = 0 (“controls”). Prior work (Greenland and Kleinbaum, 1983) has noted that supplying the population misclassification probabilities to the correction methods will yield invalid estimates; however, with nondifferential misclassification, the validity of the analytic results could be restored by introducing the sampling fraction of cases and controls into the correction. It was also noted in Lyles et al. (2011) that the main/internal validation design is favorable for handling such oversampling under nondifferential misclassification, because it automatically yields estimates of the “operating” misclassification probabilities. Similar findings are observed in the current setting. With oversampling of “cases” (Y* = 1), the method described in the previous sections yields valid estimation of the OR, as long as misclassification of Y is nondifferential. When the nondifferential misclassification assumption is not met, however, the validity of the estimated OR based on the main/internal validation design does not hold under “case” oversampling. More details can be found in Tang (2012).

2.7 Model selection

When correcting the estimate of the OR, we would ideally choose the misclassification mechanism that generated the observed data. Here, we provide a straightforward model selection procedure to guide practitioners. For ease of discussion, denote the dependent and differential misclassification model as “Model 1”, followed by “Model 2” (the independent and differential misclassification model in Section 2.1.2) and “Model 3” (the completely nondifferential model in Section 2.1.3). Model 1 reflects a fully general misclassification mechanism, while Model 2 can be regarded as a generalization of Marshall’s (1990) framework to the situation when both X and Y are misclassified and Model 3 is a representation of Barron’s (1977) setting.

Define AICq = the value of the Akaike Information Criterion (AIC) (Akaike, 1974) upon fitting Model q (q = 1, 2, 3). In practice, we recommend selecting the model that yields the smallest value of AIC, as that criterion is well known to balance between the number of necessary parameters included and the quality of model fit. One may then simply report the results corresponding to the selected model. Although a more accurate standard error for the resulting estimated log(OR) might presumably be obtained via resampling, our empirical studies suggest that it is suitably reliable and computationally efficient to report the standard error from the selected model (see Section 4). We apply this AIC-based approach to real data in the following section, and a program utilizing the SAS NLMIXED procedure to implement the model selection method is available from the first author. For additional comments regarding selection of the misclassification model, see Section 5.
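The AIC comparison itself is simple once each model has been fitted. In the sketch below, the parameter counts are our own tally from the definitions in Section 2.1 (three free cell probabilities plus 12, 8, or 4 SE/SP parameters for Models 1, 2, and 3), and should be treated as an assumption; the maximized log-likelihood values are hypothetical.

```python
def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2 log L."""
    return 2 * k - 2 * loglik

# Assumed parameter counts: 3 free cell probabilities plus SE/SP parameters.
N_PARAMS = {1: 3 + 8 + 4, 2: 3 + 4 + 4, 3: 3 + 2 + 2}

def select_model(logliks):
    """Return the model q with the smallest AIC_q, together with all AIC values."""
    scores = {q: aic(ll, N_PARAMS[q]) for q, ll in logliks.items()}
    return min(scores, key=scores.get), scores

# Hypothetical maximized log-likelihoods, log(L_main x L_val), for Models 1-3.
best, scores = select_model({1: -1000.0, 2: -1010.0, 3: -1030.0})
```

Here Model 1 is selected (AIC values 2030, 2042, and 2074); if Model 2 fit nearly as well, say with log-likelihood −1001, its smaller parameter count would tip the choice to Model 2.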

3 Example

Our motivating example comes from the HERS. This is a multi-center prospective cohort study with a total of 1,310 women enrolled in four U.S. cities from 1993 to 1995 (Smith et al., 1997). Among them, 871 women were HIV-infected, and 439 were not infected but at risk. During each semi-annual visit, a wealth of subject-specific information was collected. The question of interest is to assess the association between two binary variables: BV status and TRICH status. BV was measured by two different clinical methods: the clinically based (CLIN) and the laboratory-based (LAB) methods. CLIN is a less accurate method that diagnoses BV by evaluating multiple clinical criteria based on modified Amsel’s criteria (Amsel et al., 1983), while LAB relies on a more sophisticated Gram-staining technique (Nugent et al., 1991). The LAB method is more expensive and serves here as an arguable gold standard, while the CLIN method is more cost-efficient and accessible. The presence of TRICH was evaluated by a clinical wet mount technique characterized by low sensitivity (Thomason et al., 1988), along with a gold-standard culture method. For both BV and TRICH measurements, gold-standard and error-prone diagnoses are widely available for HERS participants at Visit 4 and beyond. This feature of the HERS makes for an excellent illustrative example of internal validation data-based methodology.

We consider 916 patients with complete observations on both error-prone and gold-standard diagnoses of BV and TRICH at the fourth HERS visit. We selected Visit 4 because a previous examination uncovered a complex misclassification process underlying the assessment of BV status at that visit (Lyles et al., 2011). The prevalence of BV via the LAB technique in the sample was 18.2%, whereas, owing to misclassification, the naïve CLIN prevalence was only 7.5%. Compared to the LAB BV diagnosis, estimates suggest that CLIN BV conferred a crude SE of around 37% and a crude SP of about 99%. The prevalence of TRICH in our sample was 40.2% when assessed by culture testing. In contrast, when evaluated by wet mount, the prevalence was only 24.5%, with an estimated crude SE of 51.9% and SP of 94.0%.
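As a quick arithmetic check, the reported surrogate prevalences follow from the gold-standard prevalences and the crude SE/SP via $\Pr(\text{surrogate}+) = \mathrm{SE}\cdot p + (1-\mathrm{SP})(1-p)$. The sketch uses only the figures quoted above; the function name is illustrative.

```python
def apparent_prevalence(prev, se, sp):
    """Pr(surrogate positive) = SE * prevalence + (1 - SP) * (1 - prevalence)."""
    return se * prev + (1 - sp) * (1 - prev)

bv = apparent_prevalence(prev=0.182, se=0.37, sp=0.99)       # CLIN vs. LAB BV
trich = apparent_prevalence(prev=0.402, se=0.519, sp=0.940)  # wet mount vs. culture
```

This gives about 0.0755 for BV and 0.2445 for TRICH, matching the reported 7.5% and 24.5% to within the rounding of the quoted SE and SP values.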

Table 2 summarizes the results based on using gold-standard measurements only, error-prone diagnoses only, and fitting correction models via the proposed main/internal validation design under various misclassification mechanisms. Note that the naïve result characterizing the association between CLIN BV and wet mount-based TRICH inflated the estimated OR by nearly 50% relative to the LAB and culture-based analyses. For the main/internal validation analyses based on Models 1–3, we used a random subsample comprising one quarter of the total sample as the internal validation set. A summary of the data comprising the resulting main and internal validation samples is presented in Table 7 (Appendix 4). The corrected $\widehat{\mathrm{OR}}$ is close to the gold-standard (LAB and culture-based) result, though with the expected efficiency loss, when dependent and differential misclassification is allowed (Model 1). If differential but independent misclassification (Model 2) is assumed, the corrected $\widehat{\mathrm{OR}}$ appears slightly biased away from the null. When a nondifferential misclassification model is adopted, the corrected $\widehat{\mathrm{OR}}$ is similar to the naïve result.

With the proposed model selection approach (Section 2.7), Model 1 is chosen with the smallest AIC value among the three candidate models. Therefore, we retain the fully general Model 1 as the final model, suggesting that the HERS data require one to account for dependent misclassification that is differential with respect to both X and Y. The results indicate that TRICH is positively associated with BV among the HERS population at Visit 4, and our corrected analysis based on Model 1 agrees with the gold-standard analysis extremely well.

As discussed in Section 2.5, when utilizing both the gold-standard and surrogate measures of BV and TRICH for all 916 subjects in order to specify the corresponding full likelihood Lval, we obtained the identical log(OR) estimate and standard error as when performing the “gold-standard” analysis in Table 2. Therefore, this result is omitted from the table.

4 Simulation studies

4.1 Study I: mimicking real-data example

Our first simulation experiment evaluates the performance of the proposed methods under conditions mimicking the HERS example (Section 3). Cell counts were simulated from a multinomial distribution with cell probabilities of (π11 = 0.1146, π10 = 0.2871, π01 = 0.0677, π00 = 0.5306), and main and internal validation sample sizes (nm = 687, nv = 219) similar to those observed in the HERS example. Error-prone response Y* and exposure X* were generated with misclassification probabilities estimated from the HERS sample based on the fit of Model 1 (data available in Table 7), where the misclassification process was assumed dependent and differential. For each of 500 simulated datasets, we conducted naïve analysis associating Y* with X*, true analysis with Y and X, and main/internal validation analyses via Models 1–3.
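The first step of this design, drawing true (X, Y) cells from the stated multinomial, can be sketched with the Python standard library. This covers only that step: generating X* and Y* would additionally require the Model 1 misclassification probabilities estimated from the HERS data (Table 7), which are not reproduced here. The seed and function names are arbitrary choices.

```python
import math
import random
from collections import Counter

PI = {(1, 1): 0.1146, (1, 0): 0.2871, (0, 1): 0.0677, (0, 0): 0.5306}  # keys (x, y)

def odds_ratio(c):
    """OR = c11 c00 / (c10 c01) for probabilities or counts keyed by (x, y)."""
    return c[(1, 1)] * c[(0, 0)] / (c[(1, 0)] * c[(0, 1)])

def draw_cells(n, pi, rng):
    """One multinomial draw of n (X, Y) pairs, returned as cell counts."""
    cells = list(pi)
    counts = Counter(rng.choices(cells, weights=[pi[c] for c in cells], k=n))
    return {c: counts[c] for c in cells}

rng = random.Random(2013)  # arbitrary seed
n_total = 687 + 219        # n_m + n_v from Study I
true_or = odds_ratio(PI)   # about 3.13
log_ors = [math.log(odds_ratio(draw_cells(n_total, PI, rng))) for _ in range(500)]
mean_log_or = sum(log_ors) / len(log_ors)
```

With roughly 900 subjects per dataset, the gold-standard log(OR) has a standard error of about 0.18, which gives a sense of the benchmark against which the efficiency of Models 1–3 is judged.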

Table 3 summarizes the results. The naïve analysis yields a result biased away from the null. Model 1 produces the corrected OR estimate closest to the gold-standard OR, with tolerable sacrifice in efficiency. The 95% CI coverage under Model 1 is also excellent. When reducing Model 1 to other simpler versions by assuming independent or nondifferential misclassification, the results are biased, reflecting the fact that the reduced models are not consistent with the data generation process. Note that with the simplest model assuming nondifferential misclassification of both variables (Model 3), the corrected result is similar to the naïve result (in fact, arguably worse). This strongly highlights the importance of internal validation data to permit flexibility in the selected misclassification model.

The corrected results obtained via the generalized matrix method (Section 2.3) under the misclassification model of Section 2.1.1 agree well with the MLEs when ML estimates of the misclassification probabilities are supplied. However, when simpler crude estimates obtained from the validation subsample are inserted into the generalized matrix method, the results are unsatisfactory, in some cases even producing negative estimates of probabilities (Tang, 2012; results not shown). Thus, in practice, we favor the proposed main/internal validation study-based full ML approach in the interest of obtaining both valid and efficient results.

4.2 Study II: performance of model selection

The results in Section 4.1 suggest the importance of misclassification model selection to ensure that the model is specified correctly (or, at least, generally enough). Extensive simulations were performed to evaluate the performance of the proposed AIC-based model selection strategy (Tables 4–6), when the underlying association was negative (Table 4), moderately positive (Table 5), or strongly positive (Table 6). Under various settings, the correct model was chosen most of the time. For example, under setting 4, the true underlying model from which data were generated was Model 3. Unsurprisingly, the more general Models 1 and 2 also yield valid results. However, with the proposed model selection strategy, Model 3 is correctly picked 88.0% of the time, yielding a slight improvement in efficiency relative to Model 1. In contrast, under setting 6, Model 1 is the underlying model; thus, estimates from Models 2 and 3 are not valid. By correctly selecting Model 1 94.8% of the time, however, the model selection strategy maintained overall validity and achieved satisfactory 95% CI coverage.

The simulation results in Tables 4–6 suggest that AIC is a highly effective criterion for selecting among the alternative misclassification models. The key concern, however, is maintenance of validity in the OR estimate. Since the true misclassification model is unknown, only Model 1 ensures such validity in theory. Thus, whenever the internal validation subsample is of adequate size to support its fit, Model 1 must be viewed as the safest choice. Another argument in favor of Model 1 is the fact that, at least under the simulation conditions examined here, it produced a log(OR) estimate with very similar mean and variance properties to those characterizing the MLEs under simpler true underlying misclassification models.

5 Discussion

We have considered the classic problem of analyzing 2 × 2 tables when both binary variables are subject to misclassification. Our main contributions are twofold. First, we have extended the well-studied matrix (Barron, 1977) and inverse matrix (Marshall, 1990) identities to a more general context than has previously been considered. Specifically, the results given in Sections 2.3 and 2.4 extend both identities to a fully general scenario with dependent and differential misclassification of two binary variables, and could serve to update epidemiological methodology texts on this topic. Second, we place heavy emphasis on specifying likelihood functions corresponding to main/internal validation designs under potentially complex misclassification mechanisms involving two binary variables. To our knowledge, this effort provides the first fully articulated framework for a joint main/internal validation study-based ML analysis allowing for dependent and differential misclassification of both variables. By parameterizing in terms of positive and negative predictive values, we have derived closed-form MLEs for the true cell probabilities under this fully general misclassification model. Under the more restrictive nested misclassification models, the ML analysis requires numerical optimization, but easily implemented programs to fit Models 1–3 (Section 2.7) using SAS NLMIXED are available from the first author upon request.

In the context studied here, the ability to apply a sufficiently general misclassification model can be critical if one hopes to obtain a valid estimate of association. Our motivating example involving BV and TRICH assessments from the HERS illustrates this point clearly: we find evidence of bias in all estimates of the OR except the one based on the fully general dependent and differential misclassification model introduced in this article. When misclassification of either variable is differential, the naïve log(OR) estimator can be biased in either direction. Moreover, the HERS example demonstrates that a corrected estimate based on an incorrect nondifferential error assumption for either variable can be worse than the naïve estimate. For this reason, we urge practitioners not to simply assume nondifferential misclassification of either variable unless that assumption is supported by the data or no alternative is available.
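To see that differential misclassification can even reverse the sign of the association, one can push the true cells of Setting 5 (Table 5) through the Appendix 1 matrix for independent, differential misclassification and compute the large-sample limit of the naïve log(OR). This is a minimal NumPy sketch under our reading of the subscripts (Y rates indexed by the true X, X rates indexed by the true Y, cells ordered (x, y) = (1,1), (0,1), (1,0), (0,0) as the entries of A imply); the function names are hypothetical.

```python
import numpy as np


def misclassification_matrix(se_y1, sp_y1, se_y0, sp_y0,
                             se_x1, sp_x1, se_x0, sp_x0):
    """Appendix 1 matrix A (independent, differential misclassification).

    Rows are observed cells, columns true cells, both ordered
    (x, y) = (1,1), (0,1), (1,0), (0,0). Hypothetical helper.
    """
    return np.array([
        [se_y1 * se_x1, se_y0 * (1 - sp_x1), (1 - sp_y1) * se_x0, (1 - sp_y0) * (1 - sp_x0)],
        [se_y1 * (1 - se_x1), se_y0 * sp_x1, (1 - sp_y1) * (1 - se_x0), (1 - sp_y0) * sp_x0],
        [(1 - se_y1) * se_x1, (1 - se_y0) * (1 - sp_x1), sp_y1 * se_x0, sp_y0 * (1 - sp_x0)],
        [(1 - se_y1) * (1 - se_x1), (1 - se_y0) * sp_x1, sp_y1 * (1 - se_x0), sp_y0 * sp_x0],
    ])


def log_or(p):
    """log OR for cell probabilities ordered (1,1), (0,1), (1,0), (0,0)."""
    p11, p01, p10, p00 = p
    return np.log(p11 * p00 / (p01 * p10))


# Setting 5: pi11 = 0.30, pi01 = 0.20, pi10 = 0.20, pi00 = 0.30 (true log OR = 0.81).
pi_true = np.array([0.30, 0.20, 0.20, 0.30])
A = misclassification_matrix(se_y1=0.40, sp_y1=0.98, se_y0=0.70, sp_y0=0.80,
                             se_x1=0.60, sp_x1=0.60, se_x0=0.90, sp_x0=0.90)
pi_obs = A @ pi_true  # limiting surrogate cell probabilities, Pi* = A Pi

print(round(log_or(pi_true), 2), round(log_or(pi_obs), 2))
```

The limiting naïve log(OR) is about −0.27 against a true value of 0.81, close to the simulated naïve mean of −0.28 reported for Setting 5 (the simulation mean also reflects finite-sample variation).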

It should be noted that the familiar matrix and inverse matrix methods, as applied in practice, are equivalent to special cases of the proposed likelihood-based approach only when MLEs of the misclassification rates are supplied to the generalized matrix identities. Otherwise, estimators based on the matrix and inverse matrix methods are not fully efficient. For this reason, we favor the approach advocated here, in which the full main/internal validation study likelihood is used. If a confidence interval for the OR is also of interest, numerical optimization of the likelihood function greatly reduces the complexity of the delta-method calculations needed to obtain a standard error for the adjusted log(OR) estimate (Tang, 2012; details and program available from the first author).
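The delta-method calculation is simplest for the naïve estimate, where it reduces to Woolf's formula. The sketch below (hypothetical helper name) combines the gradient of log(OR) with the multinomial covariance of the estimated cell proportions. The counts are the surrogate CLIN BV by wet mount TRICH table for all 916 women, obtained by adding the main-study and validation counts in Appendix 4. The adjusted estimate requires the full likelihood and is not attempted here.

```python
import numpy as np


def delta_method_log_or_se(counts):
    """Delta-method SE of log(OR) from multinomial counts (n11, n10, n01, n00).

    The gradient of log p11 - log p10 - log p01 + log p00 is combined with
    the covariance (diag(p) - p p^T) / n of the estimated proportions.
    """
    n = np.asarray(counts, dtype=float)
    total = n.sum()
    p = n / total
    grad = np.array([1 / p[0], -1 / p[1], -1 / p[2], 1 / p[3]])
    cov = (np.diag(p) - np.outer(p, p)) / total
    return float(np.sqrt(grad @ cov @ grad))


# Naive HERS table, CLIN BV by wet mount TRICH: (+,+), (+,-), (-,+), (-,-).
naive = (39, 185, 30, 662)
log_or = np.log(naive[0] * naive[3] / (naive[1] * naive[2]))
se = delta_method_log_or_se(naive)
ci = np.exp([log_or - 1.96 * se, log_or + 1.96 * se])  # Wald interval on the log scale
print(round(log_or, 2), round(se, 2), np.round(ci, 2))
```

This reproduces the naïve row of the HERS results in Appendix 4: log(OR) 1.54 (SE 0.26), OR 4.65 (2.81, 7.69).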

We have proposed a straightforward model selection procedure for practitioners who not only seek to obtain a valid analytic result but also pursue a more precise result that may be achievable via a correct reduced misclassification model. It has been demonstrated that the proposed model selection procedure works stably and permits the choice of simpler models when the deviation of the estimated OR is acceptable relative to the general model. However, since the saturated model allowing dependent and differential misclassification is always valid and appeared to sacrifice little efficiency in our simulations given an adequate validation sample, it may often be prudent to avoid model selection and simply settle upon the saturated misclassification model.

Our findings suggest that when designing large-scale epidemiologic studies in which the standard outcome (Y) and exposure (X) assessments are error-prone, it is valuable to invest in an internal validation subsample with gold-standard measurements of both Y and X. This allows one to evaluate, and adjust for, differential and/or dependent misclassification where either may be a concern. When gold standards are not available, however, one should consider sensitivity analyses to explore the potential effects of misclassification (Lash and Fink, 2003; Fox et al., 2005; Lyles and Lin, 2010). In our context, a series of pre-specified misclassification rates could be supplied to matrices A and B of the generalized matrix and inverse matrix methods in Sections 2.3 and 2.4, respectively, to assess their impact on the estimated OR. We caution, however, that such sensitivity analyses may generally be invalid under case oversampling (e.g. Greenland and Kleinbaum, 1983).
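A minimal sketch of such a sensitivity analysis using only the generalized matrix method, Π = A⁻¹Π*. For simplicity it assumes nondifferential, independent misclassification with the same pre-specified SE and SP for both variables, and it treats CLIN BV as x* and wet mount TRICH as y* purely for illustration, using the surrogate counts for all 916 HERS women from Appendix 4. The function names are hypothetical. A combination of rates is flagged as infeasible when it implies a negative cell probability.

```python
import numpy as np


def nondifferential_A(se_x, sp_x, se_y, sp_y):
    """Matrix A for nondifferential, independent misclassification.

    Kronecker product of the 2x2 matrices for Y and X; rows and columns are
    ordered (x, y) = (1,1), (0,1), (1,0), (0,0). Hypothetical helper.
    """
    a_y = np.array([[se_y, 1 - sp_y], [1 - se_y, sp_y]])
    a_x = np.array([[se_x, 1 - sp_x], [1 - se_x, sp_x]])
    return np.kron(a_y, a_x)


def corrected_or(pi_obs, se, sp):
    """Corrected OR from Pi = A^{-1} Pi*, or None if any cell is non-positive."""
    pi = np.linalg.solve(nondifferential_A(se, sp, se, sp), pi_obs)
    if np.any(pi <= 0):
        return None  # the assumed rates are incompatible with the observed table
    p11, p01, p10, p00 = pi
    return p11 * p00 / (p01 * p10)


# Surrogate proportions, ordered (x*, y*) = (1,1), (0,1), (1,0), (0,0).
counts = np.array([39.0, 30.0, 185.0, 662.0])
pi_obs = counts / counts.sum()

for se in (1.0, 0.9, 0.8):
    for sp in (1.0, 0.95, 0.9):
        result = corrected_or(pi_obs, se, sp)
        print(se, sp, "infeasible" if result is None else round(result, 2))
```

With SE = SP = 1 the naïve OR of 4.65 is returned unchanged, while SP = 0.9 is infeasible because it implies more false-positive TRICH results than the observed number of positives.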

We are currently investigating natural extensions of the current work to multivariable regression and longitudinal settings, with internal validation subsampling to facilitate misclassification adjustments. Future work could consider cost-efficient internal validation designs when both X and Y are misclassified, since in practice the costs of validating X and Y may differ. Extending prior work, it could be of interest to consider how best to allocate validated observations among different types of validation, so as to control cost while still maintaining analytic validity. In some cases, formal consideration of this question may reveal the most cost-efficient approach to be one in which the gold standard is applied to all experimental units (Spiegelman and Gray, 1991; Lyles et al., 2005). A sample simulation program that evaluates analytic validity for various validation sample sizes and pre-specified parameters is available from the author upon request and offers a practical guide for study planning. Investigators may also sometimes be more interested in validating a particular subpopulation, for example, those with the disease rather than those without, leading to nonrandom validation sampling. There could also be interest in extending the methods studied here to settings in which one or both gold-standard methods are imperfect, or “alloyed” (Wacholder et al., 1993; Brenner, 1996).
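The authors' simulation program is not reproduced here. As an independent, minimal sketch (all names hypothetical), one can generate main and validation data under Setting 6 (Model 1 misclassification) and track the bias and SD of a closed-form saturated-model estimator of log(OR) as the validation size varies. The sketch assumes simple random validation sampling and a fixed main study of 687 subjects. It estimates π*(x*, y*) from all subjects and P(X, Y | X*, Y*) from the validation subsample, which is our reading of the predictive-value parameterization rather than the authors' code.

```python
import numpy as np

# Setting 6 (Table 5): dependent and differential misclassification.
PI = {(1, 1): 0.30, (1, 0): 0.20, (0, 1): 0.20, (0, 0): 0.30}   # P(X=x, Y=y)
SE_X = {1: 0.60, 0: 0.48}   # P(X*=1 | X=1, Y=y), keyed by y
SP_X = {1: 0.91, 0: 0.94}   # P(X*=0 | X=0, Y=y), keyed by y
SE_Y = {(1, 1): 0.50, (1, 0): 0.21, (0, 1): 0.63, (0, 0): 0.31}  # keyed by (x, x*)
SP_Y = {(1, 1): 0.98, (1, 0): 0.99, (0, 1): 0.97, (0, 0): 0.99}  # keyed by (x, x*)
TRUE_LOG_OR = np.log(PI[(1, 1)] * PI[(0, 0)] / (PI[(1, 0)] * PI[(0, 1)]))


def joint_probs():
    """P(X*=xs, Y*=ys, X=x, Y=y) as an array indexed [xs, ys, x, y]."""
    p = np.zeros((2, 2, 2, 2))
    for (x, y), pxy in PI.items():
        px1 = SE_X[y] if x == 1 else 1 - SP_X[y]
        for xs in (0, 1):
            py1 = SE_Y[(x, xs)] if y == 1 else 1 - SP_Y[(x, xs)]
            for ys in (0, 1):
                p[xs, ys, x, y] = (pxy * (px1 if xs == 1 else 1 - px1)
                                   * (py1 if ys == 1 else 1 - py1))
    return p / p.sum()


def saturated_log_or(val, main):
    """Closed-form saturated-model log(OR) from validation counts [xs, ys, x, y]
    and main-study counts [xs, ys]."""
    n_strata = val.sum(axis=(2, 3))
    if np.any(n_strata == 0):
        return np.nan  # an empty (X*, Y*) stratum leaves P(X, Y | X*, Y*) undefined
    star = main + n_strata
    pi = np.einsum("ab,abxy->xy", star / star.sum(), val / n_strata[:, :, None, None])
    return np.log(pi[1, 1] * pi[0, 0] / (pi[1, 0] * pi[0, 1]))


def simulate(n_val, n_main=687, reps=200, seed=2013):
    """Empirical bias and SD of the saturated estimator for one design."""
    rng = np.random.default_rng(seed)
    p = joint_probs()
    p_star = p.sum(axis=(2, 3))
    est = np.empty(reps)
    for r in range(reps):
        val = rng.multinomial(n_val, p.ravel()).reshape(2, 2, 2, 2)
        main = rng.multinomial(n_main, p_star.ravel()).reshape(2, 2)
        est[r] = saturated_log_or(val, main)
    return np.nanmean(est) - TRUE_LOG_OR, np.nanstd(est)


for n_val in (100, 229, 400):
    bias, sd = simulate(n_val)
    print(f"n_val={n_val}: bias={bias:+.3f}, SD={sd:.3f}")
```

Costs could be attached to each design by weighting n_val and n_main, which is where a formal cost-efficiency analysis would start.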

References

Akaike H (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control 19:716–723.
Amsel R, Totten PA, Spiegel CA, Chen KC, Eschenbach D, Holmes KK (1983). Nonspecific vaginitis: diagnostic criteria and microbial and epidemiologic associations. American Journal of Medicine 74:14–22.
Barron BA (1977). The effects of misclassification on the estimation of relative risk. Biometrics 33:414–418.
Begg CB (1987). Biases in the assessment of diagnostic tests. Statistics in Medicine 6:411–423.
Brenner H (1996). Correcting for exposure misclassification using an alloyed gold standard. Epidemiology 7:406–410.
Carroll RJ, Ruppert D, Stefanski LA (2006). Measurement Error in Nonlinear Models, 2nd ed. London: Chapman and Hall.
Fox MP, Lash TL, Greenland S (2005). A method to automate probabilistic sensitivity analyses of misclassified binary variables. International Journal of Epidemiology 34:1370–1376.
Greenland S (1988). Variance estimation for epidemiologic effect estimates under misclassification. Statistics in Medicine 7:745–757.
Greenland S (2008). Maximum-likelihood and closed-form estimators of epidemiologic measures under misclassification. Journal of Statistical Planning and Inference 138:528–538.
Greenland S, Kleinbaum D (1983). Correcting for misclassification in two-way tables and matched-pair studies. International Journal of Epidemiology 12:93–97.
Holcroft CA, Rotnitzky A, Robins JM (1997). Efficient estimation of regression parameters from multistage studies with validation of outcomes and covariates. Journal of Statistical Planning and Inference 65:349–374.
SAS Institute Inc (2008). SAS/STAT® 9.2 User’s Guide. Cary, NC: SAS Institute Inc.
Kleinbaum D, Kupper L, Morgenstern H (1982). Epidemiologic Research: Principles and Quantitative Methods. Belmont, CA: Lifetime Learning.
Lash TL, Fink AK (2003). Semi-automated sensitivity analysis to assess systematic errors in observational data. Epidemiology 14:451–458.
Lyles RH (2002). A note on estimating crude odds ratios in case–control studies with differentially misclassified exposure. Biometrics 58:1034–1037.
Lyles RH, Lin J (2010). Sensitivity analysis for misclassification in logistic regression via likelihood methods and predictive value weighting. Statistics in Medicine 29:2297–2309.
Lyles RH, Tang L, Superak HM, King CC, Celantano D, Lo Y, Sobel J (2011). Validation data-based adjustments for outcome misclassification in logistic regression: an illustration. Epidemiology 22:589–597.
Lyles RH, Williamson JM, Lin HM, Heilig CM (2005). Extending McNemar’s test: estimation and inference when paired binary outcome data are misclassified. Biometrics 61:281–294.
Marshall RJ (1990). Validation study methods for estimating proportions and odds ratios with misclassified data. Journal of Clinical Epidemiology 43:941–947.
Morrissey MJ, Spiegelman D (1999). Matrix methods for estimating odds ratios with misclassified exposure data: extensions and comparisons. Biometrics 55:338–344.
Nugent RP, Krohn MA, Hillier SL (1991). Reliability of diagnosing bacterial vaginosis is improved by a standardized method of gram stain interpretation. Journal of Clinical Microbiology 29:297–301.
Rothman KJ, Greenland S (1998). Modern Epidemiology. Philadelphia, PA: Lippincott-Raven.
Smith DK, Warren DL, Vlahov D, Schuman P, Stein MD, Greenberg BL (1997). Design and baseline participant characteristics of the Human Immunodeficiency Virus Epidemiology Research (HER) Study: a prospective cohort study of human immunodeficiency virus infection in U.S. women. American Journal of Epidemiology 146:459–469.
Spiegelman D, Gray R (1991). Cost-efficient study designs for binary response data with generalized Gaussian measurement error in the covariate. Biometrics 47:851–870.
Tang L (2012). Analysis of Data with Complex Misclassification in Response or Predictor Variables by Incorporating Validation Subsampling. PhD Dissertation. Atlanta, GA: Emory University.
Thomas D, Stram D, Dwyer J (1993). Exposure measurement error: influence on exposure-disease relationships and methods of correction. Annual Review of Public Health 14:69–93.
Thomason JL, Gelbart SM, Sobun JF, Schulien MB, Hamilton PR (1988). Comparison of four methods to detect Trichomonas vaginalis. Journal of Clinical Microbiology 26:1869–1870.
Wacholder S, Armstrong B, Hartge P (1993). Validation studies using an alloyed gold standard. American Journal of Epidemiology 137:1251–1258.

Appendix 1: matrix A for generalized matrix identity under various situations

Assuming differential misclassification with independence,

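$$
A=\begin{bmatrix}
SE_{Y1}SE_{X1} & SE_{Y0}(1-SP_{X1}) & (1-SP_{Y1})SE_{X0} & (1-SP_{Y0})(1-SP_{X0})\\
SE_{Y1}(1-SE_{X1}) & SE_{Y0}SP_{X1} & (1-SP_{Y1})(1-SE_{X0}) & (1-SP_{Y0})SP_{X0}\\
(1-SE_{Y1})SE_{X1} & (1-SE_{Y0})(1-SP_{X1}) & SP_{Y1}SE_{X0} & SP_{Y0}(1-SP_{X0})\\
(1-SE_{Y1})(1-SE_{X1}) & (1-SE_{Y0})SP_{X1} & SP_{Y1}(1-SE_{X0}) & SP_{Y0}SP_{X0}
\end{bmatrix}
$$

which has the same form as defined by Greenland and Kleinbaum (1983). Under the circumstance of nondifferential and independent misclassification,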

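$$
A=\begin{bmatrix}
SE_Y SE_X & SE_Y(1-SP_X) & (1-SP_Y)SE_X & (1-SP_Y)(1-SP_X)\\
SE_Y(1-SE_X) & SE_Y SP_X & (1-SP_Y)(1-SE_X) & (1-SP_Y)SP_X\\
(1-SE_Y)SE_X & (1-SE_Y)(1-SP_X) & SP_Y SE_X & SP_Y(1-SP_X)\\
(1-SE_Y)(1-SE_X) & (1-SE_Y)SP_X & SP_Y(1-SE_X) & SP_Y SP_X
\end{bmatrix}
$$

With some algebraic work, one can show that this equation is equivalent to that underlying Barron’s original matrix method (Barron, 1977). Moreover, this A is the Kronecker product of the 2 × 2 misclassification matrices for Y and X, so that $\det(A) = (SE_Y + SP_Y - 1)^2 (SE_X + SP_X - 1)^2$; hence A is invertible if and only if $SE_X + SP_X - 1 \neq 0$ and $SE_Y + SP_Y - 1 \neq 0$. In particular, A is invertible whenever $SE_X + SP_X - 1 > 0$ and $SE_Y + SP_Y - 1 > 0$, and under usual circumstances with reasonable error-prone assessments, one can expect these two inequalities to hold. The generalized matrix method then follows immediately as $\Pi = A^{-1}\Pi^*$.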
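The nondifferential, independent A above factors as a Kronecker product of one 2 × 2 misclassification matrix for Y and one for X. A minimal NumPy sketch, assuming cells are ordered (x, y) = (1,1), (0,1), (1,0), (0,0) as the entries of A imply; the helper name is hypothetical, and the Setting 4 rates and Table 5 cell probabilities serve only as example inputs.

```python
import numpy as np


def nondifferential_A(se_x, sp_x, se_y, sp_y):
    """Matrix A under nondifferential, independent misclassification.

    np.kron(a_y, a_x) orders rows and columns as
    (x, y) = (1,1), (0,1), (1,0), (0,0). Hypothetical helper.
    """
    a_y = np.array([[se_y, 1 - sp_y], [1 - se_y, sp_y]])
    a_x = np.array([[se_x, 1 - sp_x], [1 - se_x, sp_x]])
    return np.kron(a_y, a_x)


se_x, sp_x, se_y, sp_y = 0.60, 0.90, 0.70, 0.80
A = nondifferential_A(se_x, sp_x, se_y, sp_y)

# Spot-check two entries against the displayed matrix.
assert np.isclose(A[0, 1], se_y * (1 - sp_x))
assert np.isclose(A[2, 0], (1 - se_y) * se_x)

# det(A) = (SE_Y + SP_Y - 1)^2 (SE_X + SP_X - 1)^2: A is singular only when
# one of the two Youden indices is exactly zero.
det_formula = (se_y + sp_y - 1) ** 2 * (se_x + sp_x - 1) ** 2
print(np.linalg.det(A), det_formula)

pi = np.array([0.30, 0.20, 0.20, 0.30])   # true cells, ordered as above
pi_star = A @ pi                           # matrix identity: Pi* = A Pi
pi_back = np.linalg.solve(A, pi_star)      # generalized matrix method: Pi = A^{-1} Pi*
print(np.allclose(pi_back, pi))
```

Solving the linear system with `np.linalg.solve` avoids forming A⁻¹ explicitly, which is numerically preferable when SE + SP − 1 is small.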

Appendix 2: closed-form ML estimators for SE and SP parametersSE^Y11=PPV^Y11PPV^X1π11^PPV^Y11PPV^X1π11^+(1-NPV^Y11)PPV^X0π10^SE^Y10=PPV^Y10(1-NPV^X1)π11^PPV^Y10(1-NPV^X1)π01^+(1-NPV^Y10)(1-NPV^X0)π00^SE^Y01=PPV^Y01(1-PPV^X1)π11^PPV^Y01(1-PPV^X1)π11^+(1-NPV^Y01)(1-PPV^X0)π10^SE^Y00=PPV^Y00NPV^X1π01^PPV^Y00NPV^X1π01^+(1-NPV^Y00)NPV^X0π00^SP^Y11=NPV^Y11PPV^X0π10^NPV^Y11PPV^X0π10^+(1-PPV^Y11)PPV^X1π11^SP^Y10=NPV^Y10(1-NPV^X0)π00^NPV^Y10(1-NPV^X0)π00^+(1-PPV^Y10)(1-NPV^X1)π01^SP^Y01=NPV^Y01(1-PPV^X0)π10^NPV^Y01(1-PPV^X0)π10^+(1-PPV^Y01)(1-PPV^X1)π11^SP^Y00=NPV^Y00NPV^X0π00^NPV^Y00NPV^X0π00^+(1-PPV^Y00)NPV^X1π01^SE^X1=PPV^Y11PPV^X1π11^+(1-NPV^Y11)PPV^X0π10^PPV^Y11PPV^X1π11^+(1-NPV^Y11)PPV^X0π10^+PPV^Y10(1-NPV^X1)π01^+(1-NPV^Y10)(1-NPV^X0)π00^SE^X0=NPV^Y11PPV^X0π10^+(1-PPV^Y11)PPV^X1π11^NPV^Y11PPV^X0π10^+(1-PPV^Y11)PPV^X1π11^+NPV^Y10(1-NPV^X0)π00^+(1-PPV^Y10)(1-NPV^X1)π01^SP^X1=PPV^Y00NPV^X1π01^+(1-NPV^Y00)NPV^X0π00^PPV^Y00NPV^X1π01^+(1-NPV^Y00)NPV^X0π00^+PPV^Y01(1-PPV^X1)π11^+(1-NPV^Y01)(1-PPV^X0)π10^SP^X0=NPV^Y00NPV^X0π00^+(1-PPV^Y00)NPV^X1π01^NPV^Y00NPV^X0π00^+(1-PPV^Y00)NPV^X1π01^+NPV^Y01(1-PPV^X0)π10^+(1-PPV^Y01)(1-PPV^X1)π11^Appendix 3: closed-form ML estimators for <italic>π</italic>s with (<italic>X</italic>, <italic>Y</italic>, <italic>X</italic><sup>*</sup>, <italic>Y</italic><sup>*</sup>) available on all subjects

In general, $L_{\text{full}} = L_{\text{main}} \times L_{\text{val}}$. When (X, Y, X*, Y*) is measured on the whole sample, every subject can be regarded as a validation observation, so that there are no main study observations (i.e. M = 0 in Section 2.5) in this special case. Thus, $L_{\text{full}} = L_{\text{val}}$.

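Under the most general misclassification model (Model 1 in Section 2.7), we may write the likelihood as follows:

$$
\begin{aligned}
L_{\text{full}} = L_{\text{val}} = \prod_{i=1}^{n_v} \Big\{
&(SE_{Y11}SE_{X1}\pi_{11})^{x_i y_i x_i^* y_i^*}
\,((1-SP_{Y11})SE_{X0}\pi_{10})^{x_i(1-y_i)x_i^* y_i^*}\\
&\times (SE_{Y01}(1-SP_{X1})\pi_{01})^{(1-x_i)y_i x_i^* y_i^*}
\,((1-SP_{Y01})(1-SP_{X0})\pi_{00})^{(1-x_i)(1-y_i)x_i^* y_i^*}\\
&\times ((1-SE_{Y11})SE_{X1}\pi_{11})^{x_i y_i x_i^*(1-y_i^*)}
\,(SP_{Y11}SE_{X0}\pi_{10})^{x_i(1-y_i)x_i^*(1-y_i^*)}\\
&\times ((1-SE_{Y01})(1-SP_{X1})\pi_{01})^{(1-x_i)y_i x_i^*(1-y_i^*)}
\,(SP_{Y01}(1-SP_{X0})\pi_{00})^{(1-x_i)(1-y_i)x_i^*(1-y_i^*)}\\
&\times (SE_{Y10}(1-SE_{X1})\pi_{11})^{x_i y_i(1-x_i^*)y_i^*}
\,((1-SP_{Y10})(1-SE_{X0})\pi_{10})^{x_i(1-y_i)(1-x_i^*)y_i^*}\\
&\times (SE_{Y00}SP_{X1}\pi_{01})^{(1-x_i)y_i(1-x_i^*)y_i^*}
\,((1-SP_{Y00})SP_{X0}\pi_{00})^{(1-x_i)(1-y_i)(1-x_i^*)y_i^*}\\
&\times ((1-SE_{Y10})(1-SE_{X1})\pi_{11})^{x_i y_i(1-x_i^*)(1-y_i^*)}
\,(SP_{Y10}(1-SE_{X0})\pi_{10})^{x_i(1-y_i)(1-x_i^*)(1-y_i^*)}\\
&\times ((1-SE_{Y00})SP_{X1}\pi_{01})^{(1-x_i)y_i(1-x_i^*)(1-y_i^*)}
\,(SP_{Y00}SP_{X0}\pi_{00})^{(1-x_i)(1-y_i)(1-x_i^*)(1-y_i^*)}\Big\}
\end{aligned}
$$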

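The above term can be rewritten as:

$$
\begin{aligned}
L_{\text{full}} = L_{\text{val}} = {}
&(SE_{Y11}SE_{X1}\pi_{11})^{\sum_{i=1}^{n_v} x_i y_i x_i^* y_i^*}
\,((1-SP_{Y11})SE_{X0}\pi_{10})^{\sum_{i=1}^{n_v} x_i(1-y_i)x_i^* y_i^*}\\
&\times (SE_{Y01}(1-SP_{X1})\pi_{01})^{\sum_{i=1}^{n_v}(1-x_i)y_i x_i^* y_i^*}
\,((1-SP_{Y01})(1-SP_{X0})\pi_{00})^{\sum_{i=1}^{n_v}(1-x_i)(1-y_i)x_i^* y_i^*}\\
&\times ((1-SE_{Y11})SE_{X1}\pi_{11})^{\sum_{i=1}^{n_v} x_i y_i x_i^*(1-y_i^*)}
\,(SP_{Y11}SE_{X0}\pi_{10})^{\sum_{i=1}^{n_v} x_i(1-y_i)x_i^*(1-y_i^*)}\\
&\times ((1-SE_{Y01})(1-SP_{X1})\pi_{01})^{\sum_{i=1}^{n_v}(1-x_i)y_i x_i^*(1-y_i^*)}
\,(SP_{Y01}(1-SP_{X0})\pi_{00})^{\sum_{i=1}^{n_v}(1-x_i)(1-y_i)x_i^*(1-y_i^*)}\\
&\times (SE_{Y10}(1-SE_{X1})\pi_{11})^{\sum_{i=1}^{n_v} x_i y_i(1-x_i^*)y_i^*}
\,((1-SP_{Y10})(1-SE_{X0})\pi_{10})^{\sum_{i=1}^{n_v} x_i(1-y_i)(1-x_i^*)y_i^*}\\
&\times (SE_{Y00}SP_{X1}\pi_{01})^{\sum_{i=1}^{n_v}(1-x_i)y_i(1-x_i^*)y_i^*}
\,((1-SP_{Y00})SP_{X0}\pi_{00})^{\sum_{i=1}^{n_v}(1-x_i)(1-y_i)(1-x_i^*)y_i^*}\\
&\times ((1-SE_{Y10})(1-SE_{X1})\pi_{11})^{\sum_{i=1}^{n_v} x_i y_i(1-x_i^*)(1-y_i^*)}
\,(SP_{Y10}(1-SE_{X0})\pi_{10})^{\sum_{i=1}^{n_v} x_i(1-y_i)(1-x_i^*)(1-y_i^*)}\\
&\times ((1-SE_{Y00})SP_{X1}\pi_{01})^{\sum_{i=1}^{n_v}(1-x_i)y_i(1-x_i^*)(1-y_i^*)}
\,(SP_{Y00}SP_{X0}\pi_{00})^{\sum_{i=1}^{n_v}(1-x_i)(1-y_i)(1-x_i^*)(1-y_i^*)}
\end{aligned}
$$

which is $\pi_{11}^{\sum_{i=1}^{n_v} x_i y_i}\,\pi_{10}^{\sum_{i=1}^{n_v} x_i(1-y_i)}\,\pi_{01}^{\sum_{i=1}^{n_v}(1-x_i)y_i}\,\pi_{00}^{\sum_{i=1}^{n_v}(1-x_i)(1-y_i)}$ multiplied by a piece involving only misclassification probabilities (denoted by P). As a result,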

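$$
\log(L_{\text{full}}) = \log(P) + \sum_{i=1}^{n_v} x_i y_i \log(\pi_{11}) + \sum_{i=1}^{n_v} x_i(1-y_i)\log(\pi_{10}) + \sum_{i=1}^{n_v}(1-x_i)y_i\log(\pi_{01}) + \sum_{i=1}^{n_v}(1-x_i)(1-y_i)\log(\pi_{00})
$$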

Since the term P does not involve the primary parameters, the above log-likelihood is easily maximized with respect to the πs, giving the closed-form solutions $\hat{\pi}_{ij} = I_{x=i,y=j}/n_v$, where I is defined as in Section 2.5. Standard errors can be derived from the second derivatives of eq. [6] with respect to the πs. Note that if interest lies only in the primary parameters, the log-likelihood in eq. [6] has exactly the same form as the one obtained when (X*, Y*) are ignored. This confirms that, when all participants in the study receive gold-standard evaluations, inference on the πs is the same whether or not the surrogate information is taken into account. The same conclusion holds under the less general misclassification models by a similar argument.
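The standard-error step can be illustrated numerically. Treating (π11, π10, π01) as free parameters with π00 = 1 − π11 − π10 − π01, the negative Hessian of Σ n_k log π_k at the MLE, once inverted, should equal the familiar multinomial covariance (diag(π) − ππᵀ)/n_v. A minimal sketch with a hypothetical helper name; the counts are the gold-standard (LAB BV, CULTURE TRICH) cells of the 229 validation women in Appendix 4, with LAB BV treated as X purely for illustration.

```python
import numpy as np


def observed_information(counts):
    """Negative Hessian of sum_k n_k log(pi_k) at the MLE pi_k = n_k / n_v.

    Free parameters are (pi11, pi10, pi01); pi00 = 1 - pi11 - pi10 - pi01.
    counts: (n11, n10, n01, n00).
    """
    n = np.asarray(counts, dtype=float)
    pi = n / n.sum()
    # Second derivatives of the log-likelihood: -n_j / pi_j^2 on the diagonal,
    # plus -n00 / pi00^2 in every entry through the constrained pi00.
    info = np.diag(n[:3] / pi[:3] ** 2) + n[3] / pi[3] ** 2
    return pi, info


counts = (31, 62, 18, 118)          # (n11, n10, n01, n00), n_v = 229
pi_hat, info = observed_information(counts)
cov = np.linalg.inv(info)
se = np.sqrt(np.diag(cov))          # standard errors of pi11, pi10, pi01

n_v = sum(counts)
multinomial_cov = (np.diag(pi_hat[:3]) - np.outer(pi_hat[:3], pi_hat[:3])) / n_v
print(np.round(pi_hat, 3), np.round(se, 4), np.allclose(cov, multinomial_cov))
```

The agreement follows from the Sherman–Morrison formula applied to the diagonal-plus-rank-one information matrix.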

Appendix 4: a summary of the fourth HERS visit data for models in Section 3

BV and TRICH data of 916 participants at the fourth HERS visit

Main study (n = 687)
CLIN BV \ Wet mount TRICH | − | + | Total
− | 497 | 23 | 520
+ | 138 | 29 | 167
Total | 635 | 52 | 687

Internal validation sample (n = 229)
CLIN BV | WET TRICH | LAB BV | CULTURE TRICH | Count
1 | 1 | 1 | 1 | 7
1 | 1 | 1 | 0 | 0
1 | 1 | 0 | 1 | 3
1 | 1 | 0 | 0 | 0
1 | 0 | 1 | 1 | 11
1 | 0 | 1 | 0 | 28
1 | 0 | 0 | 1 | 0
1 | 0 | 0 | 0 | 8
0 | 1 | 1 | 1 | 2
0 | 1 | 1 | 0 | 0
0 | 1 | 0 | 1 | 4
0 | 1 | 0 | 0 | 1
0 | 0 | 1 | 1 | 11
0 | 0 | 1 | 0 | 34
0 | 0 | 0 | 1 | 11
0 | 0 | 0 | 0 | 109
Total | | | | 229
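Because Model 1 is saturated, its estimates can be sketched in closed form from these counts: estimate π*(x*, y*) from all 916 women (main plus validation), estimate P(X = x, Y = y | X* = x*, Y* = y*) within the 229 validation women, and combine the two. This is our reading of the predictive-value parameterization, assuming the validation subsample is a random subsample; it is not the authors' SAS program. Labeling CLIN BV/LAB BV as X is purely illustrative, since the OR is symmetric in X and Y. With these counts the sketch gives a naïve log(OR) of 1.54 and a corrected log(OR) of 1.18 (OR 3.24), matching the naïve and Model 1 rows of the HERS results reported for the 916 women.

```python
import numpy as np

# x* = CLIN BV, y* = wet mount TRICH, x = LAB BV, y = CULTURE TRICH (1 = positive).
main = np.zeros((2, 2))                              # main study, indexed [x*, y*]
main[0, 0], main[0, 1], main[1, 0], main[1, 1] = 497, 23, 138, 29

val = np.zeros((2, 2, 2, 2))                         # validation, indexed [x*, y*, x, y]
rows = [  # (x*, y*, x, y, count)
    (1, 1, 1, 1, 7), (1, 1, 1, 0, 0), (1, 1, 0, 1, 3), (1, 1, 0, 0, 0),
    (1, 0, 1, 1, 11), (1, 0, 1, 0, 28), (1, 0, 0, 1, 0), (1, 0, 0, 0, 8),
    (0, 1, 1, 1, 2), (0, 1, 1, 0, 0), (0, 1, 0, 1, 4), (0, 1, 0, 0, 1),
    (0, 0, 1, 1, 11), (0, 0, 1, 0, 34), (0, 0, 0, 1, 11), (0, 0, 0, 0, 109),
]
for xs, ys, x, y, n in rows:
    val[xs, ys, x, y] = n


def log_or(p):
    """log OR of a 2x2 array of probabilities or counts indexed [x, y]."""
    return np.log(p[1, 1] * p[0, 0] / (p[1, 0] * p[0, 1]))


# Naive analysis: surrogate (x*, y*) table for all 916 women.
star = main + val.sum(axis=(2, 3))
print(round(log_or(star), 2))

# Saturated closed form: pi*(x*, y*) from all women, combined with
# P(X=x, Y=y | X*=x*, Y*=y*) estimated within the validation subsample.
pi_star = star / star.sum()
cond = val / val.sum(axis=(2, 3), keepdims=True)
pi_hat = np.einsum("ab,abxy->xy", pi_star, cond)
print(round(log_or(pi_hat), 2), round(np.exp(log_or(pi_hat)), 2))
```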

Description and likelihood contributions for 16 possible types of observations under the internal validation sampling

Obs. type | Description | Likelihood contribution in terms of SE and SP | Likelihood contribution in terms of predictive values
1 | X* = 1, Y* = 1, X = 1, Y = 1 | SEY11 SEX1 π11 | PPVY11 PPVX1 π11*
2 | X* = 1, Y* = 1, X = 1, Y = 0 | (1−SPY11) SEX0 π10 | (1−PPVY11) PPVX1 π11*
3 | X* = 1, Y* = 1, X = 0, Y = 1 | SEY01 (1−SPX1) π01 | PPVY01 (1−PPVX1) π11*
4 | X* = 1, Y* = 1, X = 0, Y = 0 | (1−SPY01) (1−SPX0) π00 | (1−PPVY01) (1−PPVX1) π11*
5 | X* = 1, Y* = 0, X = 1, Y = 1 | (1−SEY11) SEX1 π11 | (1−NPVY11) PPVX0 π10*
6 | X* = 1, Y* = 0, X = 1, Y = 0 | SPY11 SEX0 π10 | NPVY11 PPVX0 π10*
7 | X* = 1, Y* = 0, X = 0, Y = 1 | (1−SEY01) (1−SPX1) π01 | (1−NPVY01) (1−PPVX0) π10*
8 | X* = 1, Y* = 0, X = 0, Y = 0 | SPY01 (1−SPX0) π00 | NPVY01 (1−PPVX0) π10*
9 | X* = 0, Y* = 1, X = 1, Y = 1 | SEY10 (1−SEX1) π11 | PPVY10 (1−NPVX1) π01*
10 | X* = 0, Y* = 1, X = 1, Y = 0 | (1−SPY10) (1−SEX0) π10 | (1−PPVY10) (1−NPVX1) π01*
11 | X* = 0, Y* = 1, X = 0, Y = 1 | SEY00 SPX1 π01 | PPVY00 NPVX1 π01*
12 | X* = 0, Y* = 1, X = 0, Y = 0 | (1−SPY00) SPX0 π00 | (1−PPVY00) NPVX1 π01*
13 | X* = 0, Y* = 0, X = 1, Y = 1 | (1−SEY10) (1−SEX1) π11 | (1−NPVY10) (1−NPVX0) π00*
14 | X* = 0, Y* = 0, X = 1, Y = 0 | SPY10 (1−SEX0) π10 | NPVY10 (1−NPVX0) π00*
15 | X* = 0, Y* = 0, X = 0, Y = 1 | (1−SEY00) SPX1 π01 | (1−NPVY00) NPVX0 π00*
16 | X* = 0, Y* = 0, X = 0, Y = 0 | SPY00 SPX0 π00 | NPVY00 NPVX0 π00*

Note: See Section 2.1 for the definitions of the terms.
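Each closed-form expression in Appendix 2 is Bayes' rule applied to these 16 cell probabilities: an SE or SP is a ratio of sums of predictive-value contributions from this table. The sketch below builds the joint probabilities from the Model 1 rates used in Settings 3, 6 and 9 and the Table 5 cell probabilities, converts them to predictive values, and applies three of the Appendix 2 expressions (for SE_Y11, SE_Y10 and SP_X0) to check that the input rates are recovered. The dictionaries and helper functions are hypothetical, and subscripts follow our reading of the table (SE_Y and SP_Y indexed by (X, X*), SE_X and SP_X by Y, PPV_X and NPV_X by Y*).

```python
import itertools

# Model 1 rates (Settings 3, 6, 9) and Table 5 cell probabilities, as known inputs.
PI = {(1, 1): 0.30, (1, 0): 0.20, (0, 1): 0.20, (0, 0): 0.30}    # P(X=x, Y=y)
SE_X = {1: 0.60, 0: 0.48}                                          # keyed by y
SP_X = {1: 0.91, 0: 0.94}                                          # keyed by y
SE_Y = {(1, 1): 0.50, (1, 0): 0.21, (0, 1): 0.63, (0, 0): 0.31}  # keyed by (x, x*)
SP_Y = {(1, 1): 0.98, (1, 0): 0.99, (0, 1): 0.97, (0, 0): 0.99}  # keyed by (x, x*)


def joint(xs, ys, x, y):
    """P(X*=xs, Y*=ys, X=x, Y=y): the SE/SP column of the table above."""
    px1 = SE_X[y] if x == 1 else 1 - SP_X[y]              # P(X*=1 | X=x, Y=y)
    py1 = SE_Y[(x, xs)] if y == 1 else 1 - SP_Y[(x, xs)]  # P(Y*=1 | Y=y, X=x, X*=xs)
    return PI[(x, y)] * (px1 if xs == 1 else 1 - px1) * (py1 if ys == 1 else 1 - py1)


p = {c: joint(*c) for c in itertools.product((0, 1), repeat=4)}  # keys (xs, ys, x, y)

# Predictive-value parameterization (right-hand column of the table).
pi_star = {(xs, ys): sum(p[(xs, ys, x, y)] for x in (0, 1) for y in (0, 1))
           for xs in (0, 1) for ys in (0, 1)}
PPV_X = {ys: (p[(1, ys, 1, 0)] + p[(1, ys, 1, 1)]) / pi_star[(1, ys)] for ys in (0, 1)}
NPV_X = {ys: (p[(0, ys, 0, 0)] + p[(0, ys, 0, 1)]) / pi_star[(0, ys)] for ys in (0, 1)}


def ppv_y(x, xs):
    """PPV_Y with subscript (x, x*): P(Y=1 | Y*=1, X=x, X*=xs)."""
    return p[(xs, 1, x, 1)] / (p[(xs, 1, x, 1)] + p[(xs, 1, x, 0)])


def npv_y(x, xs):
    """NPV_Y with subscript (x, x*): P(Y=0 | Y*=0, X=x, X*=xs)."""
    return p[(xs, 0, x, 0)] / (p[(xs, 0, x, 0)] + p[(xs, 0, x, 1)])


# Three of the Appendix 2 expressions, written term by term.
a = ppv_y(1, 1) * PPV_X[1] * pi_star[(1, 1)]
se_y11 = a / (a + (1 - npv_y(1, 1)) * PPV_X[0] * pi_star[(1, 0)])

a = ppv_y(1, 0) * (1 - NPV_X[1]) * pi_star[(0, 1)]
se_y10 = a / (a + (1 - npv_y(1, 0)) * (1 - NPV_X[0]) * pi_star[(0, 0)])

a = npv_y(0, 0) * NPV_X[0] * pi_star[(0, 0)] + (1 - ppv_y(0, 0)) * NPV_X[1] * pi_star[(0, 1)]
sp_x0 = a / (a + npv_y(0, 1) * (1 - PPV_X[0]) * pi_star[(1, 0)]
             + (1 - ppv_y(0, 1)) * (1 - PPV_X[1]) * pi_star[(1, 1)])

print(round(se_y11, 6), round(se_y10, 6), round(sp_x0, 6))  # recovers 0.5 0.21 0.94
```

Replacing the true joint probabilities with their ML estimates turns the same arithmetic into the plug-in estimators of Appendix 2.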

Results of analysis of 916 women at Visit 4 in the HERS: effects of correction models on OR estimates under various misclassification assumptions

Model | Estimated log(OR) (SE) | Estimated OR (95% CI) | AIC
Naïve (a) | 1.54 (0.26) | 4.65 (2.81, 7.69) |
Gold standard (b) | 1.14 (0.18) | 3.13 (2.21, 4.43) |
Main/internal validation: Model 1 (c) | 1.18 (0.33) | 3.24 (1.14, 5.35) | 1,935.0
Main/internal validation: Model 2 (d) | 1.25 (0.32) | 3.48 (1.25, 5.71) | 1,946.0
Main/internal validation: Model 3 (e) | 1.58 (0.31) | 4.84 (1.90, 7.78) | 1,942.9

Notes:

(a) CLIN BV vs wet mount TRICH for all 916 subjects.

(b) LAB BV vs culture TRICH for all 916 subjects.

(c) 229 internal validation and 687 main study observations. Model 1 assumes dependent and differential misclassification.

(d) Model 2 assumes independent and differential misclassification.

(e) Model 3 assumes completely nondifferential misclassification.

Results of simulations addressing main/internal validation study-based analysis mimicking HERS data

Model | Estimated log(OR) (SD) | 95% CI coverage
Naïve (a) | 1.42 (0.23) | 67.4%
Gold standard (b) | 1.15 (0.18) | 93.6%
Model 1 (c) | 1.16 (0.34) | 95.7%
Model 2 (d) | 1.28 (0.34) | 93.3%
Model 3 (e) | 1.58 (0.31) | 72.4%

Notes: 500 simulations; 229 internal validation and 687 main study observations per simulation. True log(OR) = 1.14.

(a) OR estimated using (Y*, X*) data.

(b) OR estimated using (Y, X) data. SEX1 = 0.55, SPX1 = 0.82, SEX0 = 0.51, SPX0 = 0.95, SEY11 = 0.47, SPY11 = 0.98, SEY01 = 0.82, SPY01 = 0.99, SEY10 = 0.21, SPY10 = 0.98, SEY00 = 0.31, and SPY00 = 0.99.

(c) Model assuming dependent and differential misclassification.

(d) Model assuming independent and differential misclassification.

(e) Model assuming completely nondifferential misclassification.

Table 4: Performance of model selection with main/internal validation study-based analysis under a negative association

Model | Estimated log(OR) (SD) | Mean SE | 95% CI coverage
Setting 1: SEX = 0.60, SPX = 0.90, SEY = 0.70, SPY = 0.80
Naïve | −0.32 (0.15) | 0.15 | 0%
Gold standard | −1.10 (0.14) | 0.15 | 95.4%
Model 1 | −1.10 (0.28) | 0.29 | 95.2%
Model 2 | −1.10 (0.28) | 0.28 | 94.8%
Model 3 (underlying model) | −1.10 (0.27) | 0.27 | 95.4%
Model selection (a) | −1.10 (0.27) | 0.27 | 95.4%
Setting 2: SEX1 = 0.60, SPX1 = 0.60, SEX0 = 0.90, SPX0 = 0.90, SEY1 = 0.40, SPY1 = 0.98, SEY0 = 0.70, SPY0 = 0.80
Naïve | −0.61 (0.15) | 0.15 | 9.4%
Gold standard | −1.10 (0.16) | 0.15 | 93.2%
Model 1 | −1.10 (0.30) | 0.28 | 94.4%
Model 2 (underlying model) | −1.10 (0.29) | 0.28 | 94.6%
Model 3 | −1.28 (0.26) | 0.26 | 90.0%
Model selection (b) | −1.10 (0.29) | 0.28 | 94.2%
Setting 3: SEX1 = 0.60, SPX1 = 0.91, SEX0 = 0.48, SPX0 = 0.94, SEY11 = 0.50, SPY11 = 0.98, SEY10 = 0.21, SPY10 = 0.99, SEY01 = 0.63, SPY01 = 0.97, SEY00 = 0.31, SPY00 = 0.99
Naïve | 0.82 (0.27) | 0.20 | 0%
Gold standard | −1.11 (0.15) | 0.15 | 94.6%
Model 1 (underlying model) | −1.12 (0.28) | 0.27 | 94.1%
Model 2 | −1.00 (0.27) | 0.27 | 85.2%
Model 3 | −0.62 (0.27) | 0.27 | 58.3%
Model selection (c) | −1.11 (0.28) | 0.27 | 93.2%

Notes: 500 simulation studies; 229 internal validation observations and 687 main study observations per simulation. Data were generated from a multinomial distribution with cell probabilities of (π11 = 0.10, π10 = 0.30, π01 = 0.30, π00 = 0.30). True log(OR) = −1.10. Naïve model uses (Y*, X*) data. Gold-standard model uses (Y, X) data. Model 1 assumes dependent and differential misclassification. Model 2 assumes independent and differential misclassification. Model 3 assumes completely nondifferential misclassification. Model selection based on the strategy described in Section 2.7.

(a) Model 3 selected 88.8% of the time.

(b) Model 2 selected 92.4% of the time.

(c) Model 1 selected 85.0% of the time.

Table 5: Performance of model selection with main/internal validation study-based analysis under a moderate positive association

Model | Estimated log(OR) (SD) | Mean SE | 95% CI coverage
Setting 4: SEX = 0.60, SPX = 0.90, SEY = 0.70, SPY = 0.80
Naïve | 0.22 (0.13) | 0.14 | 1.2%
Gold standard | 0.81 (0.14) | 0.14 | 94.6%
Model 1 | 0.82 (0.27) | 0.26 | 94.8%
Model 2 | 0.82 (0.26) | 0.26 | 95.0%
Model 3 (underlying model) | 0.82 (0.25) | 0.25 | 95.8%
Model selection (a) | 0.82 (0.25) | 0.25 | 95.8%
Setting 5: SEX1 = 0.60, SPX1 = 0.60, SEX0 = 0.90, SPX0 = 0.90, SEY1 = 0.40, SPY1 = 0.98, SEY0 = 0.70, SPY0 = 0.80
Naïve | −0.28 (0.14) | 0.14 | 0%
Gold standard | 0.81 (0.14) | 0.14 | 94.6%
Model 1 | 0.81 (0.26) | 0.26 | 95.6%
Model 2 (underlying model) | 0.81 (0.25) | 0.25 | 94.8%
Model 3 | 0.60 (0.25) | 0.25 | 84.6%
Model selection (b) | 0.81 (0.25) | 0.25 | 95.0%
Setting 6: SEX1 = 0.60, SPX1 = 0.91, SEX0 = 0.48, SPX0 = 0.94, SEY11 = 0.50, SPY11 = 0.98, SEY10 = 0.21, SPY10 = 0.99, SEY01 = 0.63, SPY01 = 0.97, SEY00 = 0.31, SPY00 = 0.99
Naïve | 1.64 (0.17) | 0.17 | 7.2%
Gold standard | 0.82 (0.14) | 0.14 | 94.6%
Model 1 (underlying model) | 0.81 (0.25) | 0.26 | 95.0%
Model 2 | 0.93 (0.24) | 0.25 | 86.6%
Model 3 | 1.60 (0.24) | 0.24 | 17.0%
Model selection (c) | 0.81 (0.25) | 0.26 | 94.4%

Notes: 500 simulation studies; 229 internal validation observations and 687 main study observations per simulation. Data were generated from a multinomial distribution with cell probabilities of (π11 = 0.30, π10 = 0.20, π01 = 0.20, π00 = 0.30). True log(OR) = 0.81. Naïve model uses (Y*, X*) data. Gold-standard model uses (Y, X) data. Model 1 assumes dependent and differential misclassification. Model 2 assumes independent and differential misclassification. Model 3 assumes completely nondifferential misclassification. Model selection based on the strategy described in Section 2.7.

(a) Model 3 selected 88.0% of the time.

(b) Model 2 selected 94.0% of the time.

(c) Model 1 selected 94.8% of the time.

Table 6: Performance of model selection with main/internal validation study-based analysis under a strong positive association

Model | Estimated log(OR) (SD) | Mean SE | 95% CI coverage
Setting 7: SEX = 0.60, SPX = 0.90, SEY = 0.70, SPY = 0.80
Naïve | 0.46 (0.14) | 0.14 | 0%
Gold standard | 1.80 (0.14) | 0.15 | 96.4%
Model 1 | 1.82 (0.28) | 0.30 | 96.8%
Model 2 | 1.82 (0.28) | 0.29 | 96.8%
Model 3 (underlying model) | 1.82 (0.27) | 0.28 | 96.4%
Model selection (a) | 1.82 (0.28) | 0.28 | 96.4%
Setting 8: SEX1 = 0.60, SPX1 = 0.60, SEX0 = 0.90, SPX0 = 0.90, SEY1 = 0.40, SPY1 = 0.98, SEY0 = 0.70, SPY0 = 0.80
Naïve | −0.20 (0.15) | 0.15 | 0%
Gold standard | 1.80 (0.16) | 0.15 | 93.8%
Model 1 | 1.81 (0.31) | 0.29 | 93.6%
Model 2 (underlying model) | 1.81 (0.31) | 0.29 | 93.4%
Model 3 | 1.59 (0.30) | 0.29 | 85.8%
Model selection (b) | 1.81 (0.31) | 0.29 | 93.2%
Setting 9: SEX1 = 0.60, SPX1 = 0.91, SEX0 = 0.48, SPX0 = 0.94, SEY11 = 0.50, SPY11 = 0.98, SEY10 = 0.21, SPY10 = 0.99, SEY01 = 0.63, SPY01 = 0.97, SEY00 = 0.31, SPY00 = 0.99
Naïve | 1.98 (0.18) | 0.18 | 68.2%
Gold standard | 1.79 (0.14) | 0.15 | 96.8%
Model 1 (underlying model) | 1.80 (0.28) | 0.29 | 97.0%
Model 2 | 1.95 (0.28) | 0.28 | 92.8%
Model 3 | 2.57 (0.27) | 0.27 | 19.8%
Model selection (c) | 1.80 (0.28) | 0.29 | 97.0%

Notes: 500 simulation studies; 229 internal validation observations and 687 main study observations per simulation. Data were generated from a multinomial distribution with cell probabilities of (π11 = 0.30, π10 = 0.10, π01 = 0.20, π00 = 0.40). True log(OR) = 1.79. Naïve model uses (Y*, X*) data. Gold-standard model uses (Y, X) data. Model 1 assumes dependent and differential misclassification. Model 2 assumes independent and differential misclassification. Model 3 assumes completely nondifferential misclassification. Model selection based on the strategy described in Section 2.7.

(a) Model 3 selected 87.2% of the time.

(b) Model 2 selected 90.6% of the time.

(c) Model 1 selected 95.8% of the time.