The proportions of new cancer cases and deaths that are caused by exposure to risk factors and that could be prevented are key statistics for public health policy and planning. This paper summarizes the methodologies for estimating, challenges in the analysis of, and utility of, population attributable and preventable fractions for cancers caused by major risk factors such as tobacco smoking, dietary factors, high body fat, physical inactivity, alcohol consumption, infectious agents, occupational exposure, air pollution, sun exposure, and insufficient breastfeeding. For population attributable and preventable fractions, evidence of a causal relationship between a risk factor and cancer, outcome (such as incidence and mortality), exposure distribution, relative risk, theoretical-minimum-risk, and counterfactual scenarios need to be clearly defined and congruent. Despite limitations of the methodology and the data used for estimations, the population attributable and preventable fractions are a useful tool for public health policy and planning.

The impacts of behavioral and environmental risk factors on disease have long been studied, and quantifying such impacts has been a major public health objective in order to guide prevention and policy [

Numerous national, regional, and international studies have used population attributable fractions (PAFs) and population preventable fractions (PPFs) to estimate the attributable and/or preventable cancers due to either a specific risk factor or multiple risk factors. Risk factors are selected based on the level of evidence for a causal relationship, the relevance of the risk factors for population health, the availability and quality of population-representative data, and whether the risk factors are avoidable [

The PAFs and PPFs are estimated by comparing the risk of cancer for populations under past and/or current conditions with the risk under a counterfactual scenario [

The PAFs and PPFs can be estimated in three ways: first, using cohort or case-control data from population-representative studies; second, using exposure data among cases (where the cases are representative of the population) combined with relative risk (RR) data; or third, using separate data sources on the prevalence of exposure to a risk factor and the corresponding RR (the most common method). This review focuses on the third method. In addition, other risk factor-specific methods (see the sections on smoking, UVR, and infections) are briefly summarized.
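The third method can be illustrated with a minimal Python sketch. It assumes a categorical exposure whose reference category has an RR of 1 and uses the standard prevalence-and-RR (Levin-type) formulation; the function name `paf_categorical` and all input values are hypothetical and are not taken from any of the studies discussed here.

```python
from typing import Sequence


def paf_categorical(prevalence: Sequence[float], rr: Sequence[float]) -> float:
    """PAF for a categorical exposure whose reference category has RR = 1.

    prevalence[i] is the population share in exposure category i (shares sum to 1);
    rr[i] is the relative risk of category i versus the reference category.
    """
    if len(prevalence) != len(rr):
        raise ValueError("prevalence and rr must have the same length")
    if abs(sum(prevalence) - 1.0) > 1e-9:
        raise ValueError("prevalence shares must sum to 1")
    # Average relative risk in the population under the observed distribution.
    mean_rr = sum(p * r for p, r in zip(prevalence, rr))
    # Share of the observed risk that exceeds the reference (RR = 1) level.
    return (mean_rr - 1.0) / mean_rr


# Hypothetical inputs: 70% in the reference category, 20% moderate, 10% high exposure.
example_paf = paf_categorical([0.7, 0.2, 0.1], [1.0, 1.5, 3.0])
print(f"PAF = {example_paf:.3f}")
```

With a single exposed category, the same expression reduces to the familiar form p(RR - 1) / (1 + p(RR - 1)), where p is the prevalence of exposure.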

For the PAFs, the counterfactual scenario is based on everyone having a theoretical-minimum-risk exposure. The theoretical-minimum-risk exposure is usually defined as the exposure distribution leading to the lowest population risk of morbidity and/or mortality [

For the PPFs, the counterfactual scenario is determined based on an attainable exposure distribution (see [
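The contrast between the current and an attainable exposure distribution can be sketched in Python as follows. This is an illustration under stated assumptions rather than the method of any particular study: it uses the generic potential impact formulation with categorical exposure, and the function name `ppf_categorical`, the category shares, and the RRs are all hypothetical.

```python
from typing import Sequence


def ppf_categorical(
    current: Sequence[float],
    counterfactual: Sequence[float],
    rr: Sequence[float],
) -> float:
    """Proportional reduction in risk when moving from the current to the
    counterfactual exposure distribution (same categories, same RRs)."""
    if not (len(current) == len(counterfactual) == len(rr)):
        raise ValueError("all inputs must have the same length")
    # Population-average relative risk under each exposure distribution.
    risk_current = sum(p * r for p, r in zip(current, rr))
    risk_counterfactual = sum(p * r for p, r in zip(counterfactual, rr))
    return (risk_current - risk_counterfactual) / risk_current


# Hypothetical categories: reference, moderate exposure, high exposure.
rr = [1.0, 1.5, 3.0]
current = [0.70, 0.20, 0.10]
attainable = [0.80, 0.15, 0.05]   # assumed attainable target distribution
example_ppf = ppf_categorical(current, attainable, rr)

# Moving everyone to the reference category recovers the PAF.
example_paf = ppf_categorical(current, [1.0, 0.0, 0.0], rr)
print(f"PPF = {example_ppf:.3f}, PAF = {example_paf:.3f}")
```

Because the attainable distribution still leaves some people exposed, the PPF in this sketch is smaller than the corresponding PAF.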

Incidence, mortality, years of life lost due to premature mortality, and years lived with disability are the most common outcomes modelled using PAFs and PPFs [

When estimating the PAFs and PPFs, the time between exposure and outcome, i.e., the biological latency period (time lag), should be considered. For example, in the case of tobacco smoking, there is a time lag of approximately 30 years between exposure and the diagnosis of cancer [

The population exposure can be modelled using a categorical or continuous distribution. Categorical exposure estimates are sometimes used when population surveys report exposures within categories, or when RR estimates are reported for categorical measures of exposure. The precision of PAFs and PPFs when using categorical estimates is dependent on the number of categories used. For BMI and cancer, a previous study showed small differences in estimations of cancers attributable to a high BMI when using a categorical distribution as compared to a continuous distribution [
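For a continuous exposure, the sum over categories becomes an integral of the RR function over the exposure distribution. The sketch below approximates that integral numerically. It assumes a normally distributed exposure (described by a mean and SD), a log-linear RR above a reference level, and an RR of 1 at or below that level; the function names and every numeric input are hypothetical choices made for illustration only.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def log_linear_rr(x: float, reference: float, rr_per_unit: float) -> float:
    """RR grows by a factor rr_per_unit per exposure unit above the reference;
    at or below the reference the RR is 1 (an assumption of this sketch)."""
    return rr_per_unit ** max(x - reference, 0.0)


def paf_continuous(mean: float, sd: float, reference: float,
                   rr_per_unit: float, steps: int = 4000) -> float:
    """PAF versus a counterfactual in which everyone has RR = 1."""
    lower, upper = mean - 6.0 * sd, mean + 6.0 * sd
    width = (upper - lower) / steps
    mass = 0.0      # probability mass covered by the integration grid
    mean_rr = 0.0   # integral of RR(x) * density(x)
    for i in range(steps):
        x = lower + (i + 0.5) * width          # midpoint rule
        weight = normal_pdf(x, mean, sd) * width
        mass += weight
        mean_rr += weight * log_linear_rr(x, reference, rr_per_unit)
    mean_rr /= mass                             # renormalise the truncated tails
    return (mean_rr - 1.0) / mean_rr


# Hypothetical BMI-like inputs: mean 27, SD 4, reference 22, RR 1.02 per unit.
example_paf = paf_continuous(mean=27.0, sd=4.0, reference=22.0, rr_per_unit=1.02)
print(f"PAF = {example_paf:.3f}")
```

Replacing the density with category shares gives the categorical calculation, so the two approaches can be compared on the same inputs.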

When estimating PAFs based on different data sources, the risk data should, where possible, come from meta-analyses (or large cohort studies) and be consistent with the available exposure and outcome data in terms of the categories of exposure or the units of measurement used (for the exposure and RR data) and the outcome measured (for the RR and the outcome). Such consistency is a particular problem for environmental exposure PAF and PPF estimations, where RRs for high exposures are commonly applied to people with low exposures [

In the PAF estimations, the reference category for the RR measures is the theoretical-minimum-risk, whereas in the PPF estimations, the RR reference category can be either the theoretical-minimum-risk or a risk category with the preventative target exposure status. Similar to the population exposure, RRs can be modelled using categorical or continuous exposure distributions. If only RR point estimates are provided, the continuous functions can be modelled using either a linear RR, a log-linear RR, or a log-logit RR (see

Differences in the risks of cancer among various sub-populations and across cancer subtypes need to be accounted for, if possible. For example, the RR for an exposure can vary by the histological subtype of cancer at a particular organ site (e.g., for esophageal cancer, obesity is only a cause of adenocarcinomas [

The PAF for smoking can be modelled using the Lopez and Peto methodology [

Preston and colleagues present an alternative method of estimating the PAF for tobacco smoking and total mortality, based on lung cancer mortality rates relative to total mortality rates [

The PAFs and PPFs for infectious agents are based on an estimation of the proportion of cancer cases that would not have occurred if all or some of the infections had been avoided or successfully treated before oncogenesis [

The PAFs and PPFs for skin melanoma due to sun UVR are usually estimated using a direct method based on differences in skin melanoma rates between populations [

Current PAF and PPF formulas assume the use of an RR; however, studies often report odds ratios. An odds ratio can be used as an approximation of the risk ratio, but it may overestimate the effect size of the RR (i.e., the distance by which the RR deviates from the null value of 1) when the incidence of the outcome of interest is not rare within the exposure group [

Commonly, risk factors are not found to be independent of one another [

The PAFs and PPFs for multiple risk factors cannot be estimated through the simple addition of the PAFs and PPFs for the individual risk factors [
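One commonly used way to combine fractions is the multiplicative formula T = 1 - ∏(1 - n), where T is the total PAF or PPF and n is each risk factor-specific PAF or PPF. The Python sketch below implements it as an illustration; it assumes the risk factors act independently, and the function name `combined_fraction` and the input values are hypothetical.

```python
from typing import Iterable


def combined_fraction(fractions: Iterable[float]) -> float:
    """Combine risk factor-specific PAFs (or PPFs) as 1 - prod(1 - fraction).

    Assumes the risk factors are independent; correlated or interacting
    risk factors would need a different approach.
    """
    remaining = 1.0  # share of cases not yet attributed to any risk factor
    for fraction in fractions:
        if fraction > 1.0:
            raise ValueError("a fraction cannot exceed 1")
        remaining *= 1.0 - fraction
    return 1.0 - remaining


# Hypothetical risk factor-specific PAFs of 20% and 10%.
total = combined_fraction([0.20, 0.10])
print(f"combined PAF = {total:.2f}")   # below the simple sum of 0.30
```

The combined value stays below the simple sum because cases attributable to more than one risk factor are counted only once.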

Due to the use of data from multiple sources to estimate the PAFs and PPFs, the uncertainty of these estimates is often determined using a Monte Carlo approach, where a set of the lowest level parameters used in these estimations is generated from their respective uncertainty distributions (taking into account variation between the parameters). These lowest level parameters are then used to estimate the uncertainty distribution of the PAFs and PPFs [
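A Monte Carlo approach of this kind can be sketched in Python as below. The sketch makes several simplifying assumptions that are not drawn from the studies discussed: a single binary exposure, a normally distributed prevalence (clipped to the range 0 to 1), a log-normally distributed RR whose standard error is derived from a 95% confidence interval, and independent draws for the two parameters. All names and numbers are hypothetical.

```python
import math
import random


def levin_paf(prevalence: float, rr: float) -> float:
    """PAF for a binary exposure from its prevalence and RR."""
    excess = prevalence * (rr - 1.0)
    return excess / (1.0 + excess)


def simulate_paf(prev: float, prev_se: float, rr: float, rr_low: float,
                 rr_high: float, draws: int = 10000, seed: int = 1):
    """Return (point estimate, 2.5th percentile, 97.5th percentile) of the PAF."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Standard error of ln(RR) recovered from the 95% confidence interval.
    log_rr_se = (math.log(rr_high) - math.log(rr_low)) / (2.0 * 1.96)
    samples = []
    for _ in range(draws):
        # Draw the lowest-level parameters from their uncertainty distributions.
        p = min(max(rng.gauss(prev, prev_se), 0.0), 1.0)
        r = math.exp(rng.gauss(math.log(rr), log_rr_se))
        samples.append(levin_paf(p, r))
    samples.sort()
    lower = samples[int(0.025 * draws)]
    upper = samples[int(0.975 * draws) - 1]
    return levin_paf(prev, rr), lower, upper


# Hypothetical inputs: prevalence 25% (SE 2%), RR 2.0 (95% CI 1.6 to 2.5).
point, low, high = simulate_paf(0.25, 0.02, 2.0, 1.6, 2.5)
print(f"PAF = {point:.3f} (95% uncertainty interval {low:.3f} to {high:.3f})")
```

Where parameters are correlated, for example RRs for adjacent exposure categories from the same study, the draws would need to reflect that correlation rather than being sampled independently as here.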

As previously mentioned, the time lag between exposure and outcome in some cases is as long as 50 years. Besides the challenge of retrieving high quality historical data that are consistently measured across time (which is particularly problematic for disease classification, especially malignant lymphomas), the cancer risk factors may affect the risk of competing causes of death [

PAF and PPF estimates are restricted by time and population and depend on the quality and representativeness of the exposure and risk data. Data for risk factor exposure usually are obtained from population surveys [

The estimation of the proportion of cancer incidence and mortality attributable to various risk factors, as well as the proportion that could be prevented, provides useful information for health planning and setting health priorities by creating a hierarchy of cancer risk factors and interventions [

The accuracy of PAFs and PPFs greatly depends on the quality of the underlying exposure and RR data. Therefore, there is a need to support initiatives at the national and international level in order to improve both risk factor and cancer surveillance systems, as these data are required for planning, implementing, and evaluating cancer prevention and control efforts [

Many of the risk factors for cancer are also risk factors for other diseases, conditions and injuries [

When performing PAF and PPF studies, it is important to clearly describe the methods used, including data sources and assumptions made, to ensure replicability and transparency, and to highlight the limitations of these estimates in their applicability to health policy. Studies estimating attributable and preventable cancer burdens also provide an opportunity for cross-disciplinary collaboration in order to ensure this translational research is reflected in public health policies.

Kevin D. Shield, D. Maxwell Parkin, David C. Whiteman, Jürgen Rehm, Vivian Viallon, Claire Marant Micallef, Paolo Vineis, Lesley Rushton, Freddie Bray, Isabelle Soerjomataram declare that they have no conflict of interest.

An example of the exposure distribution of Body Mass Index (based on the mean and standard deviation (SD)) and the corresponding continuous generic Relative Risks (RRs) (Figure from [

Example of the exposure distribution of alcohol consumption among current drinkers (in grams of pure alcohol per day) before and after adjustment for undercoverage (for French men 35 to 44 years of age in 2005 as obtained from the Baromètre [

Population attributable fractions compared to population preventable fractions as applied to cancer risk factor surveillance and to cancer policy projection

Input/output | Population attributable fraction | Population preventable fraction |
---|---|---|
Reference exposure scenario | The current or historical exposure | The current or historical exposure |
Reference exposure group (reference | Theoretical-minimum-risk | Theoretical-minimum-risk |
Counterfactual scenario | Everyone at the theoretical- | Counterfactual scenario of an |
Outcomes | Deaths, years of life lost, years lived | Deaths, years of life lost, years lived |
Main aim | Estimate the proportion of an | Assess the impact of implementing |
Also known as | Attributable proportion | Avoidable fraction |

These terms also appear with the word “population” preceding the term to denote that the fraction/proportion is estimated for a given population, whereas without the word “population” the term can refer to the cancer cases within a cohort or case series attributable to a given risk factor.

Etiological fraction has been previously used as the proportion of cases that would have occurred by a certain time even in the absence of exposure, but, with exposure, occurred earlier than they otherwise would have. Although distinct conceptually from the attributable fraction, based on this definition, all attributable cases are etiologic cases, but not vice versa [

This term also appears with the word “theoretical” preceding the term to denote that the fraction/proportion is based on a theoretical scenario.

cases_{obs} represents the cases under the factual scenario and cases_{expected} represents the expected number of cases.

x_{c} or x_{cf}, which represent the current exposure distribution and the counterfactual exposure distribution, respectively (either categorically or continuously). For the combination formula, T represents the total PAF or PPF, and n represents a risk factor-specific PAF or PPF. See [