Multivariate disease mapping enriches traditional disease mapping studies by analysing several diseases jointly. This yields improved estimates of the geographical distribution of risk from the diseases by enabling borrowing of information across diseases. Beyond multivariate smoothing for several diseases, several other variables, such as sex, age group, race, time period, and so on, could also be jointly considered to derive multivariate estimates. The resulting multivariate structures should induce an appropriate covariance model for the data. In this paper, we introduce a formal framework for the analysis of multivariate data arising from the combination of more than two variables (geographical units and at least two more variables), what we have called Multidimensional Disease Mapping. We develop a theoretical framework containing both separable and nonseparable dependence structures and illustrate its performance on the study of real mortality data in Comunitat Valenciana (Spain).
Areally referenced spatial data arise frequently in epidemiological studies seeking to describe the geographical distribution of diseases over a region of study. Disease maps describe the geographic variation of disease and generate etiological hypotheses about the possible causes for apparent differences in disease risk. They can also be used to detect spatial clusters attributable to common environmental, demographical, or cultural effects shared by neighbouring regions. However, mapping crude rates can be misleading when the population sizes of some geographical units are small: the estimated rates then vary excessively, which makes the traditional epidemiological risk estimates unreliable. Statistical models built specifically for analysing datasets over small areas are required to reveal clearer patterns in the geographical distribution of the diseases. They allow us to borrow strength across regions by using not only the data from a given region, but also the data from neighbouring regions, thereby increasing the amount of information used for estimating the risks in each unit. Univariate models account for information on a single disease, while multivariate models enable us to reliably estimate the geographical distribution of the risks corresponding to several diseases over a region of study; see, e.g.
Recently,
The multivariate disease mapping literature has presented models with just two factors—disease types and geographical units. Hereafter, we will use the classical terminology
This paper is organized as follows: Section 2 introduces some basic tensor algebra that will later be used to build the models in the remaining sections. Section 3 shows how to generalize the separable multivariate modelling proposal to the multidimensional case. Section 4 introduces nonseparability into the multidimensional context and describes the high number of models that arise when separability is no longer assumed. Section 5 shows two examples illustrating multidimensional modelling in a real setting. First, we show with a trivariate example how separability can be a restrictive assumption in some cases and how the theory introduced in the preceding sections can be used to overcome it. Second, we undertake a four-dimensional study considering two unstructured factors (Disease and Sex) and two structured ones (Geographical unit and Period). Finally, Section 6 contains some conclusions about the results and models developed in the previous sections.
Let 𝒳 an
More generally, let
The
The
We also generalize the
Let us elucidate further with the example of a trivariate setting, which presents all the challenges in the multidimensional approach. Therefore, for easier exposition, we restrict our attention to the trivariate setting and, when required, point out any specific complexities of models with more than three factors. Here, we are interested in modelling the spatial distribution of risks for several combinations of two factors. The first factor in this setting will always be the geographical unit, while one of the other two factors will usually be the disease (from a set of diseases) and the third factor may either be unstructured, such as Sex or Race, or structured in some way such as Time period or Age group. The spatial term may also be considered as a special case of a structured factor. Let
Let
Consider the expression in (
Alternatively, the associative property of matrix products further yields
This extends easily to produce the following for a general
The expression
Expression (
In this manner, we can easily build a fully separable dependence structure for all the factors considered, by means of successive ∘
Let us now turn to the definition of the
Any other decomposition of the form
Spatial dependence is introduced in a similar manner with
Unfortunately, covariance structures arising from tensor products may not be identifiable. For example, in the trivariate setting,
Matters are somewhat more lenient with unstructured factors and one can restrict the unstructured
Imposing one of these restrictions on every factor will remove the identifiability issues during the inference. This comment also applies to the nonseparable settings we describe below.
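The separable construction and the identifiability issue discussed above can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the array sizes, the random "root" matrices `A1`, `A2`, `A3` and the helper `fix_scale` are hypothetical and chosen for illustration, and the restriction shown (forcing the first diagonal entry of a factor's covariance matrix to one) is just one possible choice, not necessarily one of the restrictions proposed in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 geographical units, 2 diseases, 3 levels of a third factor.
I, J, K = 4, 2, 3

# One square matrix per factor (illustrative random values).
A1 = rng.normal(size=(I, I))
A2 = rng.normal(size=(J, J))
A3 = rng.normal(size=(K, K))

# Array of independent standard Gaussian cells.
X = rng.normal(size=(I, J, K))

# Induce dependence factor by factor: one matrix product along each dimension.
Y = np.einsum("ia,jb,kc,abc->ijk", A1, A2, A3, X)

# The same transformation written on the vectorised (row-major) array.
A_kron = np.kron(A1, np.kron(A2, A3))
assert np.allclose(Y.ravel(), A_kron @ X.ravel())

# Implied covariance of vec(Y): Kronecker product of the per-factor covariances.
S1, S2, S3 = A1 @ A1.T, A2 @ A2.T, A3 @ A3.T
cov_Y = np.kron(S1, np.kron(S2, S3))
assert np.allclose(cov_Y, A_kron @ A_kron.T)

# Scale non-identifiability: rescaling two factors in opposite directions
# leaves the joint covariance unchanged.
c = 5.0
assert np.allclose(np.kron(c * S1, np.kron(S2 / c, S3)), cov_Y)


def fix_scale(S):
    """Hypothetical restriction: rescale S so that its first diagonal entry is 1."""
    return S / S[0, 0]


# After the restriction, both rescaled versions of S2 map to the same matrix.
assert np.allclose(fix_scale(c * S2), fix_scale(S2))
```

Because the per-factor matrices act on different dimensions of the array, the order in which they are applied does not matter in this separable case; only the overall scale shared among the factors needs to be pinned down.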
The separable model discussed in the previous section is a straightforward extension of the separable model in
Under separability, dependence is separately introduced for every factor by means of an
In the separable case, dependence of these two factors was induced using (
We introduce nonseparable structures by
The complexity in nonseparable dependence structures depends upon the number of variables defining each of them and will vary as long as the numbers of levels in factors 2 and 3 are different. Unlike in separable models, where the order of the factors is irrelevant because the matrix products in (
Nesting factors, as above, is not the only way to generate nonseparable models. Consider the expressions in (
An obvious way to generalize these expressions is to consider
Hitherto, we have only considered interactions between two factors. Interactions between three or more factors are treated analogously. Factorial nonseparability for higher orders is fairly straightforward to achieve by considering the
To summarize, multidimensional disease mapping models can be treated as a series of operations on an array 𝒳
We now illustrate, in greater detail, the three-dimensional setting. First, we introduce the following nomenclature to name the different models that can be built with the tools described above. We use
Multidimensional modelling can be seen as a combination of mathematical operations on an unfolded Gaussian array. These elemental operations for the trivariate case are shown in
Each row of
The fifth column in
We also remark that certain combinations of elemental operations, while mathematically legitimate, may lack statistical interpretability. For example, the combination of the 12· and 23· operations is difficult to interpret because they assume that factors one and two on one side and two and three on the other side are combined with as much flexibility as possible for every one of these pairs. In that case, it would seem much more natural to consider instead a 12(3)·, a 23(1)· or a 123· relationship.
However, models incorporating any factor(s) nested within the spatial factor do not seem reasonable either. These models would allow some covariance matrix (for any of the factor(s) in the model) to vary by every spatial unit. This would surely yield overparameterized models since the number of geographical units is typically much higher than the number of levels in the rest of the factors in the model. Hence, although the combination of operations in
We conclude this section with some remarks on the practical implementation of the proposed models. Although
We have carried out two separate multidimensional studies with Comunitat Valenciana’s mortality data. The dataset corresponds to the Spatiotemporal Mortality Atlas of Comunitat Valenciana (
All the models we describe below have been implemented in WinBUGS 1.4.3 (
The different chains used for every model were run in parallel in order to speed up computations. That is, instead of sending all three chains in a single call to WinBUGS, we made three different calls (one per chain) by means of an R (
An additional simulation study has been carried out in order to assess the performance of DIC for model selection in our context and the ability of some of the entertained models to retrieve the original variance-covariance matrix between geographical patterns. For lack of space, the results of this study are included as
We next consider two trivariate scenarios with factors: Geographical Unit (540 levels), Disease (2 levels) and Sex (2 levels). We will refer to them as factors 1 to 3, respectively. We embark upon two separate studies. First, we consider the joint study of Colon and Rectum Cancer for both sexes and, second, the study of Lung Cancer and Diabetes also for both sexes. For these two studies we have run all those models arising from the combination of the elemental operations in
For the Colon/Rectum study, the model with the lowest DIC is Model 3. This model allows the spatial dependence parameters of the CAR models to vary across sexes. We point out that none of the models accounting for nonseparability between Disease and Sex (Models 5–9) shows a notable improvement with respect to the fully separable model. Model 10 was not run for this study because it, too, considers nonseparability between Disease and Sex and was not expected to yield any improvement.
In contrast, nonseparability between Disease and Sex seems to improve the fit for the Lung/Diabetes study. One such model, with the factorial structure, delivers the lowest DIC. Nesting of the geographical component within the other factors may also yield some improvement on some occasions, mainly the nesting of the geographical component within diseases. Therefore, we have run the model incorporating a factorial interaction between Disease and Sex and nesting the geographical structure within diseases. However, this model does not perform better than the one incorporating only the factorial relationship between Disease and Sex. The fifth column of
Besides model selection with DIC, we have also assessed the fit of the models implemented for both datasets. To this end, we have used Posterior Predictive
Few differences were found among models in terms of the mentioned
We now present a four-dimensional version of the Lung/Diabetes study from the previous section. We consider the same dataset, dividing the whole period of study (1987–2006) into five different four-year periods. Hence, we have a new factor, the Time period, to include in the multidimensional study. This factor, unlike Disease and Sex, has a specific structure reflecting temporal dependence that should, ideally, be accounted for. We assume a first-order autoregressive structure to model this factor and specify the resulting dependence structure using the matrix in (
As in our earlier experiments, we have again fitted several models and compared their performances using the DIC. Results are shown in
Results in
Regarding computing times for the models run in this study, the fully separable model took 780 minutes to run, about 40 times longer than the corresponding trivariate model. We have also run the four-dimensional model without considering any particular temporal structure for Time period, and the computing time decreased to 351 minutes. Therefore, the temporal structure seems to slow down the MCMC sampling considerably. For the remaining models, the increase from the three- to the four-dimensional case is similar. The best-performing model, the factorial 1 · 23 · 4· model, took 2,223 minutes to run. All models showed excellent convergence and could surely have been run with fewer iterations than those simulated in our study.
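For reference, the kind of temporal structure discussed above can be sketched in a few lines. This is an illustrative sketch only: it assumes the standard first-order autoregressive correlation, with entry rho^|i-j| between periods i and j, which may differ in parameterization from the matrix used in this paper; the function names and the value rho = 0.7 are hypothetical.

```python
import numpy as np


def ar1_correlation(n_periods, rho):
    """Standard AR(1) correlation matrix with entries rho ** |i - j|."""
    t = np.arange(n_periods)
    lags = np.abs(np.subtract.outer(t, t))
    return rho ** lags


def ar1_root(n_periods, rho):
    """Lower-triangular L such that L @ L.T is the AR(1) correlation matrix."""
    return np.linalg.cholesky(ar1_correlation(n_periods, rho))


# Five time periods, as in the four-dimensional study; rho is illustrative.
R = ar1_correlation(5, 0.7)
L = ar1_root(5, 0.7)

# The precision matrix of an AR(1) process is tridiagonal: each period is
# conditionally dependent only on its immediate neighbours in time.
P = np.linalg.inv(R)
assert np.allclose(np.triu(P, k=2), 0.0, atol=1e-10)
```

A matrix such as `L` could play the role of the structured factor matrix for Time period, in the same way that unstructured matrices are used for Disease and Sex.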
Models 1 · 3(2) · 2 · 4· and 1 · 23 · 4· have been selected as the most appropriate models based on
Finally, we have also included, as
This paper has tried to set forth some theoretical bases for the development of multivariate disease mapping analyses involving more than one factor besides the geographical factor, which we have called multidimensional disease mapping studies. Very clear links can be drawn between the multidimensional disease mapping problem and tensor algebra/calculus; the latter therefore offers a clear contextual framework in which multidimensional methods can be developed, formalized and studied. In our opinion, the establishment of new links between these two areas of research may yield new tools and very valuable ideas for the development of multidimensional models.
Most of the models compared in the examples produce quite similar risk estimates with hardly any practical difference, at least in terms of their posterior means. As pointed out by a reviewer, a model selection exercise as extensive as the one in our examples may not make much sense in practice. Nevertheless, we considered it convenient to implement and compare such a large number of models in order to illustrate the variety of models introduced throughout the paper. In practical terms, we advise users to fit fewer models than those considered in our examples. For example, from an epidemiological point of view, we do not see any relevant difference between the 1·2·3·4(2) and the 1·3·4(2)·2 models in Example 5.2. Since both models produce similar estimates, we would advise users to fit just one of them; that is, we advise comparing only models whose interpretations differ in an epidemiologically relevant way. This keeps the analysis simpler, and neither its interpretation nor its conclusions should change much.
Some models have already been formulated which may be competitive alternatives to the framework proposed in this paper. Thus SANOVA (
The models developed within this framework, despite their high complexity due to the difficulty of incorporating several factors within a unique dependence structure, are reasonably affordable from an applied point of view. All of them can be run within
The authors thank the associate editor and two reviewers for their constructive comments. We also thank Professor Ying Macnab for her helpful comments and interesting conversations on this work.
Dr. Botella-Rocamora acknowledges the financial support of Ministerio de Educación, Cultura y Deporte, via the Programa Nacional de Movilidad de Recursos Humanos del Plan Nacional de I+D+i 2008–2011, prorogued by agreement of the Consejo de Ministros of 2011/10/7. Dr. Martinez-Beneito acknowledges the financial support of the research grants MTM201342323P from the Spanish Ministry of Economy and Competitiveness and ACOMP/2015/202 from the Generalitat Valenciana. Dr. Banerjee’s work was supported in part by grants NIH/NIGMS 1RC1GM09240001, NIH/NCI 1R03CA17955501A1 and NSF/DMS1513654.
Posterior mean of the Relative Risk for every municipality. Results in the first row correspond to the Colon/Rectum study and those in the second row correspond to the Lung/Diabetes study.
Elemental operations for building up three-dimensional models.
Operation  Matrices involved  # parameters  M

1·  (1)
2·  (2)
3·  (3)
1(2)·  (1,2)
1(3)·  (1,3)
1(23)·  (1,2,3)
2(1)·  (1,2)
2(3)·  (2,3)
2(13)·  (1,2,3)
3(1)·  (1,2,3)
3(2)·  (2,3)
3(12)·  (1,2,3)
12·  (1,2)
23·  (2,3)
13·  (1,3)
123·  (1,2,3)
12(3)·  (1,2,3)
23(1)·  (1,2,3)
13(2)·  (1,2,3)
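The three kinds of elemental operations listed in the table above can be sketched on a small array. This is a hedged illustration and not the implementation used in this paper. It reads "a·" as a single matrix applied along factor a, "a(b)·" as a different matrix for factor a at every level of factor b, and "ab·" as a single matrix acting on the combined levels of factors a and b. All function names, sizes and random matrices are hypothetical.

```python
import numpy as np


def apply_mode(X, A, mode):
    """'a·': multiply the array by a single matrix A along axis `mode`."""
    Y = np.tensordot(A, X, axes=(1, mode))  # the contracted axis comes out first
    return np.moveaxis(Y, 0, mode)


def apply_nested(X, A_by_level, mode, within):
    """'a(b)·': use a different matrix along `mode` for each level of `within`."""
    Y = np.empty_like(X)
    for level, A in enumerate(A_by_level):
        idx = [slice(None)] * X.ndim
        idx[within] = slice(level, level + 1)  # keep the axis so `mode` stays valid
        Y[tuple(idx)] = apply_mode(X[tuple(idx)], A, mode)
    return Y


def apply_factorial(X, A_joint, modes):
    """'ab·': one matrix acting on the combined levels of the two axes in `modes`."""
    Xm = np.moveaxis(X, modes, (0, 1))
    flat = Xm.reshape(Xm.shape[0] * Xm.shape[1], -1)
    out = (A_joint @ flat).reshape(Xm.shape)
    return np.moveaxis(out, (0, 1), modes)


def n_cov_entries(n_levels):
    """Free entries of an unrestricted symmetric n x n covariance matrix."""
    return n_levels * (n_levels + 1) // 2


rng = np.random.default_rng(1)
I, J, K = 5, 2, 2                      # units, diseases, sexes (illustrative sizes)
X = rng.normal(size=(I, J, K))
A2 = rng.normal(size=(J, J))
A3 = rng.normal(size=(K, K))
A2_by_sex = [rng.normal(size=(J, J)) for _ in range(K)]

Y_sep = apply_mode(apply_mode(X, A2, 1), A3, 2)           # 2· followed by 3·
Y_nested = apply_nested(X, A2_by_sex, mode=1, within=2)   # 2(3)·
Y_fact = apply_factorial(X, np.kron(A2, A3), (1, 2))      # 23· with a separable matrix

# A factorial operation with a Kronecker-structured matrix reduces to the
# separable case; a general (J*K) x (J*K) matrix is strictly more flexible.
assert np.allclose(Y_fact, Y_sep)
```

Under this reading, nesting a two-level factor within 540 geographical units would require `540 * n_cov_entries(2)`, that is 1620, unrestricted covariance entries (before any identifiability restriction) instead of 3, which gives a sense of how quickly such models become overparameterized.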

DIC (and pD, within brackets) for the Colon cancer/Rectum cancer and the Lung cancer/Diabetes studies. The fifth column shows the computing times (in minutes) for every model implemented for the Lung cancer/Diabetes study.
| Model | Dependence structure | DIC (pD) (Colon/Rectum) | DIC (pD) (Lung/Diabetes) | Computing time (min) |
|---|---|---|---|---|
| 1 | 1 · 2 · 3· | 7546.9 (171.5) | 9674.9 (489.6) | 18.5 |
| 2 | 1(2) · 2 · 3· | 7553.4 (165.5) | 9669.6 (499.0) | 20.8 |
| 3 | 1(3) · 2 · 3· |  | 9677.4 (493.2) | 21.5 |
| 4 | 1(23) · 2 · 3· | 7545.3 (173.6) | 9670.4 (506.2) | 18.0 |
| 5 | 1 · 2(3) · 3· | 7547.2 (170.1) | 9672.5 (474.6) | 20.4 |
| 6 | 1 · 3 · 2(3)· | 7556.0 (173.4) | 9689.6 (465.9) | 16.5 |
| 7 | 1 · 2 · 3(2)· | 7551.0 (170.6) | 9671.0 (473.0) | 17.4 |
| 8 | 1 · 3(2) · 2· | 7549.7 (175.0) | 9670.9 (475.7) | 14.5 |
| 9 | 1 · 23· | 7552.0 (193.0) |  | 67.8 |
| 10 | 1(2) · 23· | – | 9670.5 (476.3) | 66.9 |
Estimated correlation matrix (posterior means and 80% credible intervals) between the different maps for the 1 · 23· model in both studies. Upper/lower row of every cell corresponds respectively to the Colon/Rectum and Lung/Diabetes studies.

|  | Disease 1 | Disease 2 | Disease 1 | Disease 2 |
|---|---|---|---|---|
| Disease 1 | 1 | 0.73 [0.53 | 0.84 [0.70 | 0.63 [0.39 |
| Disease 2 |  | 1 | 0.79 [0.60 | 0.77 [0.57 |
| Disease 1 |  |  | 1 | 0.63 [0.33 |
| Disease 2 |  |  |  | 1 |
DIC (and pD) for the models run. Rows two and three consider Time period as a nonseparable factor while rows 4 to 11 consider it as a separable factor. Models on the right-hand side of the table correspond to the left-hand side models, changing the order in which dependence is induced into the factors of the models.
| Model | DIC (pD) | Model | DIC (pD) |
|---|---|---|---|
| 1 · 2 · 3 · 4· | 29008.7 (737.5) | – | – |
| 1 · 2 · 3 · 4(2)· | 29014.9 (718.7) | 1 · 3 · 4(2) · 2· | 29021.3 (748.9) |
| 1 · 2 · 3 · 4(3)· | 29015.7 (719.4) | 1 · 2 · 4(3) · 3· | 29015.4 (743.1) |
| 1(2) · 2 · 3 · 4· | 29013.5 (747.4) | – | – |
| 1(3) · 2 · 3 · 4· | 29008.3 (754.6) | – | – |
| 1(4) · 2 · 3 · 4· | 29020.3 (750.2) | – | – |
| 1 · 2(3) · 3 · 4· | 29003.6 (732.8) | 1 · 3 · 2(3) · 4· | 29007.9 (715.1) |
| 1 · 2 · 3(2) · 4· | 29019.9 (702.9) | 1 · 3(2) · 2 · 4· | 29002.5 (735.5) |
| 1 · 23 · 4· | 29002.5 (704.4) | – | – |
| 1 · 2(4) · 3 · 4· | 29018.6 (708.4) | 1 · 4 · 2(4) · 3· | 29018.8 (719.7) |
| 1 · 2 · 3(4) · 4· | 29028.9 (730.7) | 1 · 2 · 4 · 3(4)· | 29019.3 (723.9) |
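The comparison in the table above can be summarized by ranking the models and expressing each DIC as a difference from the smallest one. The snippet below is a small sketch that only re-uses the DIC and pD values reported in the table; it assumes the usual decomposition of DIC as posterior mean deviance plus pD, and the tie-breaking rule (smaller pD first) is an arbitrary choice made here for illustration.

```python
# (DIC, pD) pairs copied from the table of four-dimensional models.
models = {
    "1·2·3·4·": (29008.7, 737.5),
    "1·2·3·4(2)·": (29014.9, 718.7),
    "1·3·4(2)·2·": (29021.3, 748.9),
    "1·2·3·4(3)·": (29015.7, 719.4),
    "1·2·4(3)·3·": (29015.4, 743.1),
    "1(2)·2·3·4·": (29013.5, 747.4),
    "1(3)·2·3·4·": (29008.3, 754.6),
    "1(4)·2·3·4·": (29020.3, 750.2),
    "1·2(3)·3·4·": (29003.6, 732.8),
    "1·3·2(3)·4·": (29007.9, 715.1),
    "1·2·3(2)·4·": (29019.9, 702.9),
    "1·3(2)·2·4·": (29002.5, 735.5),
    "1·23·4·": (29002.5, 704.4),
    "1·2(4)·3·4·": (29018.6, 708.4),
    "1·4·2(4)·3·": (29018.8, 719.7),
    "1·2·3(4)·4·": (29028.9, 730.7),
    "1·2·4·3(4)·": (29019.3, 723.9),
}

# Rank by DIC; ties are broken by the smaller pD (an illustrative choice).
ranking = sorted(models.items(), key=lambda item: item[1])
best_dic = ranking[0][1][0]

# Difference from the best DIC and implied mean deviance (DIC - pD).
delta_dic = {name: round(dic - best_dic, 1) for name, (dic, _) in models.items()}
mean_deviance = {name: round(dic - p_d, 1) for name, (dic, p_d) in models.items()}

for name, (dic, p_d) in ranking[:5]:
    print(f"{name:14s} DIC={dic:8.1f}  pD={p_d:6.1f}  delta={delta_dic[name]:4.1f}")
```

With these values the two models with the lowest DIC are tied, and the fully separable model lies 6.2 DIC units above them.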