
Performed the phylogenetic analysis: MP. Provided the visualization support: RGB. Conceived and designed the experiments: NZ PMAS. Performed the experiments: NZ. Analyzed the data: MP MC ADL NZ RGB PMAS. Contributed reagents/materials/analysis tools: NZ MP RGB. Wrote the paper: NZ MP RGB MC ADL PMAS.

Inferring disease transmission networks is important in epidemiology in order to understand and prevent the spread of infectious diseases. Reconstruction of infection transmission networks requires insight into viral genome data as well as social interactions. For the HIV-1 epidemic, current research either uses genetic information from patients' viruses to infer past infection events or uses statistics of sexual interactions to model the network structure of viral spread. Methods for a reliable reconstruction of HIV-1 transmission dynamics that take into account both molecular and societal data are still lacking. The aim of this study is to combine information from both genetic and epidemiological scales to characterize and analyse a transmission network of the HIV-1 epidemic in central Italy.

We introduce a novel filter-reduction method to build a network of HIV infected patients based on their social and treatment information. The network is then combined with a genetic network, to infer a hypothetical infection transmission network. We apply this method to a cohort study of HIV-1 infected patients in central Italy and find that patients who are highly connected in the network have longer untreated infection periods. We also find that the network structures for homosexual males and heterosexual populations are heterogeneous, consisting of a majority of ‘peripheral nodes’ that have only a few sexual interactions and a minority of ‘hub nodes’ that have many sexual interactions. Inferring HIV-1 transmission networks using this novel combined approach reveals remarkable correlations between high out-degree individuals and longer untreated infection periods. These findings signify the importance of early treatment and support the potential benefit of wide population screening, management of early diagnoses and anticipated antiretroviral treatment to prevent viral transmission and spread. The approach presented here for reconstructing HIV-1 transmission networks can have important repercussions in the design of intervention strategies for disease control.

Understanding the dynamics of infectious disease spreading demands a holistic approach

At the epidemiological level, scientists have studied the spread of infectious diseases using social or sexual contact networks, modelling the population as a complex network (where nodes are individuals and links are relationships) and running models of disease spread on top of it. In the case of HIV-1 infection, these models have been used to understand the complexity of HIV-1 transmission and the spread of viral drug resistance.

Phylogenetic analysis has been employed to study the evolution of HIV-1 both at the population and intra-host level during different stages of the disease, using molecular sequences

This work proposes a new approach that combines information at both the genetic and the epidemiological level in order to obtain a more comprehensive picture of HIV-1 transmission. A filter-reduction method is applied to infer a meta-network of HIV-1 sequences based on the corresponding patients' demographic and medical information. We call this meta-network the contact network, as it contains all socially and sexually possible contacts between infected individuals in the population. In contrast to standard network methods, no assumptions are made about the network structure. The intersection of this contact network with a genetic distance network is then computed, from which a hypothetical transmission network is inferred. The method is applied to identify HIV-1 subtype B transmission networks in central Italy. The structure of the inferred networks for the men who have sex with men (MSM) and heterosexual risk groups agrees with the recognized network structures for social and sexual contacts in the HIV-1 infected population.

Considering population-level data alongside genomic data is essential for understanding the true nature of infectious disease transmission networks, as was alluded to by DeGruttola et al.

A dataset of 895 HIV-1 infected patients from a regional study cohort in Rome, Italy (see … log10 HIV RNA copies/ml (3.5–4.7). The percentage of therapy-naive patients was 19.3%, whilst 80.7% were antiretroviral therapy-experienced. The median (IQR) time from the estimated seroconversion date to the first viral sequence date was 8 (4–11) years. In the subset of therapy-experienced patients, the median (IQR) time from the estimated seroconversion date to the first therapy date was 3 (1.25–5) years, and the median (IQR) time from the first therapy date to the viral sequencing date was 4 (1–8) years.

We proposed a new filter-reduction method to infer networks of HIV-infected patients, taking into account patients' attributes and parameters from the literature. The filter-reduction method was defined as follows. Consider a social-sexual network as a graph/network composed of

Starting from an undirected, fully connected network of all HIV sequences in the data, a set of social/sexual filters is applied to obtain an undirected filtered network. To convert it into a directed network, a seroconversion function is applied, yielding the contact network.

Filter | Rule applied to patients 1 and 2

Filter 1 | if (… abs(age_1 − age_2) …) connection = 0

Filter 2 | if (r_1 ≠ r_2) connection = 0; if (r_1 = r_2 = “Heterosexual” & g_1 = g_2) connection = 0; …

Filter 3 | if (… s_1 is older than s_2) connection = 0; if (… s_2 is older than s_1) connection = 0

Rules for social/sexual filters. Notation: gender (g), risk group (r), …
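The pipeline described above (complete graph, social/sexual filters, then orientation by seroconversion) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy patients, the field names, the 10-year age gap used for Filter 1 and the rule that edges point from the earlier to the later seroconverter are all assumptions, and Filter 3 is left out because its conditions are only partially recoverable.

```python
from itertools import combinations

# Hypothetical toy records; field names, values and thresholds are illustrative only.
patients = {
    "P1": {"age": 30, "risk": "MSM", "gender": "M", "seroconv": 1990},
    "P2": {"age": 35, "risk": "MSM", "gender": "M", "seroconv": 1995},
    "P3": {"age": 60, "risk": "MSM", "gender": "M", "seroconv": 1998},
    "P4": {"age": 40, "risk": "Heterosexual", "gender": "F", "seroconv": 1992},
    "P5": {"age": 42, "risk": "Heterosexual", "gender": "M", "seroconv": 1996},
    "P6": {"age": 41, "risk": "Heterosexual", "gender": "F", "seroconv": 1999},
}

MAX_AGE_GAP = 10  # hypothetical threshold for Filter 1 (not given in the text)


def passes_filters(a, b):
    """Return True if a patient pair survives the sketched social/sexual filters."""
    if abs(a["age"] - b["age"]) > MAX_AGE_GAP:  # Filter 1: age difference
        return False
    if a["risk"] != b["risk"]:  # Filter 2: different risk groups are not linked
        return False
    if a["risk"] == "Heterosexual" and a["gender"] == b["gender"]:
        return False  # Filter 2: heterosexual contacts need different genders
    return True


def filter_reduction(patients):
    """Complete graph -> filtered undirected graph -> directed contact network."""
    complete = list(combinations(sorted(patients), 2))
    kept = [(u, v) for u, v in complete if passes_filters(patients[u], patients[v])]
    directed = []
    for u, v in kept:
        su, sv = patients[u]["seroconv"], patients[v]["seroconv"]
        # Assumed orientation rule: earlier seroconverter -> later seroconverter;
        # pairs with equal dates are dropped because no direction can be assigned.
        if su < sv:
            directed.append((u, v))
        elif sv < su:
            directed.append((v, u))
    return complete, kept, directed


complete, kept, directed = filter_reduction(patients)
print(len(complete), kept, directed)
```

In this toy example 3 of the 15 possible pairs survive, so 80% of the edges of the complete graph are removed.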

To analyse the inferred networks, we first visualized them and plotted their degree distributions.

Visualization of the contact network consisting of three sub-networks corresponding to the major HIV-1 transmission risk groups: MSM (yellow), Heterosexual (red), and IDU (green).

The cumulative total- (black), in- (blue), and out-degree (pink) distributions for the entire network (all risk groups), MSM, Heterosexual, and IDU risk groups plotted in log-log scale.
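The cumulative distributions in this figure can be computed from a directed edge list as in the sketch below. It assumes that "cumulative" means the complementary cumulative distribution P(K ≥ k), the usual choice for log-log degree plots; the function names are hypothetical. Nodes with degree 0 have to be dropped before plotting on a log scale.

```python
from collections import Counter


def degree_maps(nodes, edges):
    """Total-, in- and out-degree of every node in a directed edge list."""
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    total = {n: in_deg[n] + out_deg[n] for n in nodes}
    return total, {n: in_deg[n] for n in nodes}, {n: out_deg[n] for n in nodes}


def ccdf(degree_map):
    """Pairs (k, P(K >= k)) for every observed degree k."""
    values = sorted(degree_map.values())
    n = len(values)
    return [(k, sum(d >= k for d in values) / n) for k in sorted(set(values))]


nodes = ["a", "b", "c"]
edges = [("a", "b"), ("a", "c"), ("b", "c")]
total, indeg, outdeg = degree_maps(nodes, edges)
print(ccdf(outdeg))
```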

The degree distributions presented in

Property | MSM | Heterosexual | IDU | All risk groups

fraction of removed edges | 80.4% | 91.2% | 45.7% | 91.3%

average degree | 34.7 | 22.30 | 117.1 | 56.7

average path length | 2.16 | 2.83 | 1.48 | 2.20

clustering coefficient (global) | 0.59 | 0.00 | 0.76 | 0.71

clustering coefficient (local) | 0.70 | 0.00 | 0.82 | 0.47

assortativity (degree) | 0.04 | −0.20 | −0.11 | 0.45

The percentage of edges removed from the MSM and heterosexual networks is almost twice the percentage removed from the IDU network. This implies that the MSM and heterosexual contact networks are sparser than the IDU network: although the same filters were applied to all risk groups, the nodes in the IDU contact network remain more connected and the network structure is more compact. These observations, together with the discrepancies in the degree distributions (
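The first two rows of the network-property tables can be related by simple arithmetic. Assuming the fraction of removed edges is measured against the n(n−1)/2 edges of the fully connected starting network and the reported average degree is the mean total degree 2m/n (in + out), the hypothetical helper below reproduces, for example, the MSM transmission-network values (n = 176, average degree 3.32, 98.1% removed). The contact-network rows agree only approximately under this reading, so it should be treated as an assumption.

```python
def network_summary(n_nodes, n_edges):
    """Fraction of removed edges and average total degree of a directed network
    obtained by pruning a complete undirected graph on n_nodes nodes."""
    possible = n_nodes * (n_nodes - 1) / 2  # edges in the complete starting graph
    removed = 1 - n_edges / possible
    avg_degree = 2 * n_edges / n_nodes  # each edge adds one in- and one out-degree
    return removed, avg_degree


# MSM transmission network: 3.32 * 176 / 2 is about 292 edges
removed, avg_degree = network_summary(176, 292)
print(f"{removed:.1%} removed, average degree {avg_degree:.2f}")
```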

We used a community detection method based on the leading eigenvector of the modularity matrix to identify community structures in the network.
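For reference, the first step of Newman's leading-eigenvector method (the method python-igraph exposes as `Graph.community_leading_eigenvector`) can be written in a few lines of NumPy: nodes are split by the sign of the leading eigenvector of the modularity matrix B = A − kkᵀ/2m, and the split is scored by the modularity Q = sᵀBs/4m. This sketch performs only a single bisection of an undirected graph; the full method keeps subdividing groups while modularity increases, and which implementation or graph variant the study used is not stated here.

```python
import numpy as np


def leading_eigenvector_split(adj):
    """One bisection of an undirected graph by the sign of the leading
    eigenvector of its modularity matrix; returns (labels, modularity)."""
    A = np.asarray(adj, dtype=float)
    k = A.sum(axis=1)                      # node degrees
    two_m = k.sum()                        # 2m, twice the number of edges
    B = A - np.outer(k, k) / two_m         # modularity matrix
    vals, vecs = np.linalg.eigh(B)         # B is symmetric
    leading = vecs[:, np.argmax(vals)]
    s = np.where(leading >= 0, 1, -1)      # group labels +1 / -1
    Q = float(s @ B @ s) / (2 * two_m)     # Q = s^T B s / (4m)
    return s, Q


# Two triangles (0-1-2 and 3-4-5) joined by the edge 2-3
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
labels, Q = leading_eigenvector_split(A)
print(labels, round(Q, 3))
```

On this toy graph the split recovers the two triangles, with Q = 5/14.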

Colouring follows the patients' estimated seroconversion year, ranging from 1982 (blue) to 2008 (red).

Next we studied the relationship between the untreated infection period and the connectivity of the patients in the network. For that we defined an untreated infection period (

To construct a hypothetical transmission network, we coupled information from the genetic and epidemiological scales. To this end, we computed the intersection of the contact network with a genetic network obtained from a genetic distance matrix …th percentile of all pairwise comparisons (see …st percentile) to 0.05 (35th percentile). We observed that, as the threshold value increases, the percentage of removed edges decreases gradually for the MSM network. However, for the heterosexual, IDU and all-risk-group networks, the percentages drop below 50% at a threshold value of 0.05 (
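The intersection step can be sketched as follows: pairs whose genetic distance is at or below a threshold form an undirected genetic network, and a directed contact edge is kept only if its two endpoints are also genetically linked. The toy distances, the nearest-rank convention used to turn a percentile into a threshold, and the function names are assumptions for illustration.

```python
import math


def percentile_threshold(distances, q):
    """Nearest-rank q-th percentile of a collection of pairwise distances."""
    xs = sorted(distances)
    rank = max(1, math.ceil(q / 100 * len(xs)))
    return xs[rank - 1]


def genetic_network(dist, threshold):
    """Undirected genetic links: unordered pairs with distance <= threshold."""
    return {frozenset(pair) for pair, d in dist.items() if d <= threshold}


def transmission_network(contact_edges, genetic_links):
    """Keep directed contact edges whose endpoints are genetically linked."""
    return [(u, v) for u, v in contact_edges if frozenset((u, v)) in genetic_links]


# Hypothetical pairwise distances (substitutions per site)
dist = {("P1", "P2"): 0.02, ("P4", "P5"): 0.08, ("P5", "P6"): 0.01, ("P1", "P4"): 0.03}
contact = [("P1", "P2"), ("P4", "P5"), ("P5", "P6")]
links = genetic_network(dist, threshold=0.05)
print(transmission_network(contact, links))
print(percentile_threshold(dist.values(), 50))
```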

The hypothetical transmission network of the entire population obtained from computing the intersection of the contact and the genetic network. Patients are colored based on their risk groups: MSM (yellow), Heterosexual (red), IDU (green) and blood products (cyan).

Cumulative total- (black), in- (blue), and out- (pink) degree distributions of the hypothetical transmission network of the MSM, heterosexual, IDU and all risk groups plotted in log-log scale.

Property | MSM | Heterosexual | IDU | All risk groups

fraction of removed edges | 98.1% | 98.0% | 74.4% | 96.7%

average degree | 3.32 | 4.86 | 55.30 | 21.10

average path length | 2.86 | 3.27 | 1.78 | 2.22

clustering coefficient (global) | 0.36 | 0.00 | 0.60 | 0.59

clustering coefficient (local) | 0.50 | 0.00 | 0.74 | 0.45

assortativity (degree) | −0.07 | −0.17 | −0.22 | 0.11

In

Quantity | n | Degree | Mean | σ | Max | Power law α (σ) | Power law x_min (σ) | Goodness-of-fit p-value

MSM | 176 | Total in out | 3.32 1.66 1.66 | 5.62 3.05 3.58 | 27 18 27 | 1.82 (0.54) 2.09 (0.38) 2.65 (0.43) | 11 (1.80) 2 (1.26) 5 (1.61) | 0.0040 0.0590 |

Heterosexual | 255 | Total In out | 4.86 2.43 2.43 | 7.68 3.56 5.67 | 49 17 39 | 3.50 (0.61) 2.50 (0.48) 1.88 (0.31) | 18 (4.78) 4 (1.71) 2 (1.87) | |

IDU | 217 | Total in out | 55.30 27.65 27.65 | 43.43 23.90 33.54 | 175 90 146 | 3.50 (0.12) 3.50 (0.31) 1.96 (0.47) | 69 (5.23) 42 (6.02) 15 (12.37) | 0.0170 0.0000 0.0000 |

All risk groups | 655 | Total in out | 21.10 10.55 10.55 | 35.15 18.47 23.08 | 175 90 146 | 3.5 (0.82) 1.6 (0.51) 2.0 (0.29) | 69 (27.84) 5 (3.91) 14 (7.24) | 0.0160 0.0000 0.0000 |

Basic parameters of the data (total-, in- and out-degree distributions of the MSM, heterosexual, IDU and all risk groups), along with their power-law fits and the corresponding p-values. Goodness-of-fit tests compare the observed data to the hypothesized power-law distribution. If the resulting p-value is greater than 0.1, a power law is a plausible model for the data (these values are denoted in bold).
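The power-law fits in this table follow the usual maximum-likelihood recipe of Clauset, Shalizi and Newman (2009): for each candidate x_min, estimate α from the tail x ≥ x_min, keep the x_min whose fitted CDF is closest to the data in Kolmogorov–Smirnov distance, and report σ = (α − 1)/√n as the standard error of α. The sketch uses the continuous estimator α = 1 + n / Σ ln(x_i/x_min); for integer degrees the discrete estimator is more appropriate, and the bootstrap behind the goodness-of-fit p-values is omitted. It illustrates the technique and is not the authors' code.

```python
import math


def ks_distance(tail, xmin, alpha):
    """Max distance between the empirical CDF of the tail and the fitted CDF."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        model = 1 - (x / xmin) ** (1 - alpha)   # continuous power-law CDF
        d = max(d, abs((i + 1) / n - model), abs(i / n - model))
    return d


def fit_power_law(data):
    """Return (ks, xmin, alpha, sigma_alpha) for the best-fitting xmin."""
    best = None
    for xmin in sorted({x for x in data if x > 0}):
        tail = [x for x in data if x >= xmin]
        log_sum = sum(math.log(x / xmin) for x in tail)
        if len(tail) < 2 or log_sum == 0:
            continue                             # too little tail to fit
        alpha = 1 + len(tail) / log_sum          # continuous MLE
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[0]:
            best = (d, xmin, alpha, (alpha - 1) / math.sqrt(len(tail)))
    return best


# Synthetic check: evenly spaced quantiles of a power law with alpha = 2.5, xmin = 1
sample = [(1 - (i + 0.5) / 400) ** (-1 / 1.5) for i in range(400)]
print(fit_power_law(sample))
```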

We then performed likelihood ratio tests to compare the power law against alternative (exponential and Poisson) distributions for the data. For each alternative distribution, we computed a likelihood ratio shown in

Power law (p-value) | Poisson | Exponential | Support for power law | |||

LR | p-value | LR | p-value | |||

MSM (out-degree) | 2.31 | 0.35 | 0.72 | good | ||

Heterosexual (total-degree) | 4.08 | −2.67 | moderate | |||

Heterosexual (in-degree) | 3.28 | 1.85 | good |

For each degree distribution we give a p-value for the fit to the power-law model and likelihood ratios (
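The likelihood-ratio comparison can be sketched with Vuong's test as used by Clauset et al. (2009): R is the summed difference of per-observation log-likelihoods of the two fitted models on the tail x ≥ x_min, its sign says which model is favoured (positive favours the power law), and p = erfc(|R| / √(2nσ²)) indicates whether that sign is statistically meaningful. This sketch compares a continuous power law with a shifted exponential only; the Poisson alternative and the discrete likelihoods appropriate for degree data are omitted, and the function name is hypothetical.

```python
import math


def lr_power_law_vs_exponential(tail, xmin, alpha):
    """Vuong likelihood-ratio test: continuous power law vs exponential on x >= xmin.
    Returns (R, p); R > 0 favours the power law."""
    n = len(tail)
    lam = 1.0 / (sum(tail) / n - xmin)          # MLE rate of the shifted exponential
    diffs = []
    for x in tail:
        ll_pl = math.log((alpha - 1) / xmin) - alpha * math.log(x / xmin)
        ll_exp = math.log(lam) - lam * (x - xmin)
        diffs.append(ll_pl - ll_exp)
    R = sum(diffs)
    mean = R / n
    var = sum((d - mean) ** 2 for d in diffs) / n
    p = math.erfc(abs(R) / math.sqrt(2 * n * var))
    return R, p


# Synthetic heavy-tailed sample (alpha = 2.5, xmin = 1) and its power-law MLE
tail = [(1 - (i + 0.5) / 500) ** (-1 / 1.5) for i in range(500)]
alpha_hat = 1 + len(tail) / sum(math.log(x) for x in tail)
print(lr_power_law_vs_exponential(tail, 1.0, alpha_hat))
```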

We compared the inferred transmission network with a set of genetic clusters obtained through phylogenetic analysis of the corresponding viral sequences (see

High out-degree nodes in the network have a higher probability of spreading the virus to more contacts. In a population, these nodes can play the role of super-spreaders with many connections.

Factor | MSM Coef | MSM Std | MSM P value | Heterosexual Coef | Heterosexual Std | Heterosexual P value | IDU Coef | IDU Std | IDU P value | All risk groups Coef | All risk groups Std | All risk groups P value

Age | 0.02 | 0.01 | 0.0078 | 0.01 | 0.01 | 0.7788 | 0.03 | 0.00 | <0.0001 | 0.02 | 0.00 | <0.0001

Viral load | 0.02 | 0.06 | 0.7450 | −0.03 | 0.04 | 0.5242 | −0.06 | 0.01 | 0.0001 | −0.08 | 0.01 | <0.0001

UIP | 0.22 | 0.01 | <0.0001 | 0.24 | 0.01 | <0.0001 | 0.13 | 0.00 | <0.0001 | 0.17 | 0.00 | <0.0001

Gender | - | - | - | 0.39 | 0.10 | 0.0001 | 0.09 | 0.03 | 0.0044 | 0.31 | 0.03 | <0.0001

In-degree | 0.12 | 0.01 | <0.0001 | 0.15 | 0.01 | <0.0001 | 0.01 | 0.00 | <0.0001 | 0.03 | 0.00 | <0.0001

Results of a multi-variable regression analysis showing the factors associated with high out-degree nodes. The out-degree is the dependent variable in the analysis, and age, viral load, UIP, gender, and In-degree are independent variables.
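The regression model is not specified in the caption. Purely as an illustration, the sketch below fits an ordinary least-squares model of out-degree on covariates and returns coefficients with their standard errors; a count model (for example Poisson or negative binomial regression) may be equally or more appropriate for degree data, and p-values would additionally require the t distribution with n − p degrees of freedom (for instance from SciPy). The covariate values in the usage example are synthetic.

```python
import numpy as np


def ols_with_se(X, y):
    """Least-squares fit y ~ 1 + X; returns (coefficients, standard errors),
    intercept first."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]                      # residual degrees of freedom
    sigma2 = resid @ resid / dof                   # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)          # covariance of the estimates
    return beta, np.sqrt(np.diag(cov))


# Synthetic example: out-degree driven by UIP (years) and in-degree, plus noise
rng = np.random.default_rng(0)
uip = rng.uniform(0, 10, 200)
in_degree = rng.poisson(3, 200)
out_degree = 0.5 + 0.2 * uip + 0.1 * in_degree + rng.normal(0, 0.5, 200)
beta, se = ols_with_se(np.column_stack([uip, in_degree]), out_degree)
print(np.round(beta, 2), np.round(se, 3))
```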

To compare the hypothetical transmission networks with random graphs, we generated random networks of the same size (nodes and edges) as the inferred transmission network of each population (MSM, heterosexual, IDU and all risk groups). For this, we used the fraction of remaining edges in each network as the probability of generating an edge in the random network.
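A minimal sketch of this baseline, assuming the Erdős–Rényi G(n, p) model with p equal to the fraction of remaining edges (so the number of edges matches the inferred network only in expectation) and the global clustering coefficient defined as transitivity, i.e. three times the number of triangles divided by the number of connected triples. Function names are hypothetical.

```python
import random
from itertools import combinations


def gnp_edges(n, p, rng):
    """Erdős–Rényi G(n, p): each of the n(n-1)/2 pairs is linked with probability p."""
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


def transitivity(n, edges):
    """Global clustering: closed triples / connected triples."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    closed = triples = 0
    for v in range(n):
        k = len(adj[v])
        triples += k * (k - 1) // 2                # pairs of neighbours of v
        closed += sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return closed / triples if triples else 0.0


def random_baseline(n, p, runs=5, seed=0):
    """Average degree and global clustering averaged over `runs` random graphs."""
    rng = random.Random(seed)
    deg = clus = 0.0
    for _ in range(runs):
        edges = gnp_edges(n, p, rng)
        deg += 2 * len(edges) / n
        clus += transitivity(n, edges)
    return deg / runs, clus / runs


# MSM network: 176 nodes, 1.9% of the possible edges remain (98.1% removed)
print(random_baseline(176, 0.019))
```

For the MSM case this gives an expected average degree of p(n − 1) ≈ 3.3 and a clustering coefficient close to p, in line with the randomized MSM values reported in the comparison table.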

Property | MSM (inferred) | MSM (randomized) | Heterosexual (inferred) | Heterosexual (randomized) | IDU (inferred) | IDU (randomized) | All risk groups (inferred) | All risk groups (randomized)

average degree | 3.32 | 3.24 | 4.86 | 5.02 | 55.30 | 55.35 | 21.10 | 21.50

average path length | 2.86 | 4.38 | 3.27 | 3.59 | 1.78 | 1.74 | 2.22 | 2.44

clustering coefficient (global) | 0.36 | 0.02 | 0.00 | 0.02 | 0.60 | 0.25 | 0.59 | 0.032

clustering coefficient (local) | 0.50 | 0.02 | 0.00 | 0.01 | 0.74 | 0.25 | 0.45 | 0.032

assortativity (degree) | −0.07 | −0.02 | −0.17 | 0.03 | −0.22 | −0.02 | 0.11 | <−0.01

Inferred and randomized networks have the same size in terms of number of nodes and edges. The properties of the randomized networks are averages over 5 random networks.

A new method for inferring hypothetical HIV-1 transmission networks using information from both genetic and epidemiological scales is introduced. To the best of our knowledge, this study is the first attempt to combine social and genetic data to characterise transmission networks for HIV-1. We propose a new filter-reduction method for network construction and use it to build a network of HIV-1 sequences based on the associated social and demographic information. To characterise the hypothetical transmission networks, we compute the intersection of the social network with the genetic network obtained from the genetic distance matrix of Italian patients. Standard network approaches assume a predefined network structure with certain parameter values, such as a scale-free structure with an exponent in the range 1.5 to 2.0 for the MSM population in HIV transmission.

Interestingly, we uncover a positive correlation between the duration of untreated infection periods and the out-degree of nodes in the network. This finding may be explained by the fact that untreated individuals have higher viral loads and are therefore more infectious; moreover, not being on therapy is generally associated with a higher probability of being undiagnosed or of not complying with the treatment and prevention messages conveyed by health care providers. This finding underscores the importance of case finding, early diagnosis and anticipated antiretroviral treatment as tools to prevent HIV-1 transmission and spread.

The delay between the median estimated seroconversion date and the start of genotyping may have made the older half of infections a biased sample, since in the pre-HAART (highly active antiretroviral therapy) era only slow progressors survived to be genotyped later. To investigate this effect, we performed the analysis on a subset of recent infections, considering only patients whose first positive test was after calendar year 1998. There were 202 patients with a recent infection in the data, of whom 79 were MSM, 99 heterosexual and 24 IDU. The correlation between the untreated infection period and the out-degree of nodes in the contact network still holds (

Super-spreaders are highly infectious individuals with a high viral load and a high rate of partner change

The transmission of HIV drug resistance is another important clinical and epidemiological concern, as it can lead to treatment failure. Approximately 10% of newly diagnosed patients with HIV-1 infection in Europe are infected with a drug-resistant virus.

In this study we limited ourselves to transmission within the three main risk groups, omitting transmission between risk groups, which is also observed in the phylogenetic analysis.

We believe that the new approach presented here for inferring transmission networks can have important repercussions in the design of interventions for disease control, not only for HIV but potentially for a wide range of viruses and emerging pathogens.

In this study, we combined information from both genetic (derived from HIV-1 RNA sequences) and epidemiological scales to characterize a transmission network of the HIV-1 epidemic in central Italy. The study population included HIV-1 infected patients with viral genotyping between 1997 and 2009, enrolled and followed up at the Clinic of Infectious Diseases of the Catholic University of the Sacred Heart in Rome, Italy. The inclusion criterion was at least one viral genotype sequence per patient, with multiple observations allowed for patients with more than one viral genotype available. We applied a novel filter-reduction method to infer a network of HIV-1 sequences based on the corresponding patients' epidemiological information, obtaining a potential contact network. The method is based on real patient data, and no prior assumptions are made about the network structure. To characterize the transmission network of HIV-1, the intersection of the contact network with a genetic network based on a genetic distance matrix was computed.

HIV-1 RNA sequences from a region-wide cohort study of HIV-1-infected people in Rome and Lazio region, Italy, were used [The database is a part of the three national HIV data cohort in Italy: ARCA (

Data statistics | Number of unknown/missing data entries | ||||

Risk group | 22.8% MSM (n = 176) | 33.0% heterosexual (n = 255) | 28.0% IDU (n = 217) | 0.9% blood products (n = 7) | - |

Gender | 65% male (n = 426) | 35.0% female (n = 229) | - | ||

country of origin | 84.0% Italian (n = 553) | 10.4% non-Italian (n = 68) | 5.2% unknown (n = 34) | ||

Antiretroviral therapy | 19.3% therapy-naïve (n = 127) | 80.7% therapy-experienced (n = 528) | - | ||

median (IQR) | |||||

Age | 48 (43–53) years | - | |||

Estimated seroconversion date | 1996 (1993–2000) calendar year | 79.0% unknown (n = 517) | |||

Last negative test date | 1995 (1991–1999) calendar year | 78% unknown (n = 515) | |||

First available positive test date | 1995 (1991–2000) calendar year | - | |||

viral genotyping date | 2004 (2001–2007) calendar year | 0.4% unknown (n = 3) | |||

First available therapy date | 1998 (1995–2003) calendar year | - | |||

plasma viral load (At the time of viral genotyping) | 4.1 log10 HIV RNA copies/ml (3.5–4.7) | 0.4% unknown (n = 3) | |||

time from estimated seroconversion date to the first therapy date | 3 (1.25–5) years | 79.0% unknown (n = 517) | |||

time from estimated seroconversion date to the first viral sequence date | 8 (4–11) years | 79.0% unknown (n = 517) |

HIV-1 sequences matching the inclusion criteria were aligned using MUSCLE software
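The genetic distance model is not specified in this part of the text. As a stand-in, the sketch below computes the uncorrected p-distance (the proportion of differing nucleotide sites) between aligned sequences, ignoring sites with gaps or ambiguity codes in either sequence; model-corrected distances such as TN93 (Tamura–Nei 1993) are common alternatives in HIV work. Sequence IDs and function names are hypothetical.

```python
from itertools import combinations

NUCLEOTIDES = set("ACGT")


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping
    positions where either sequence has a gap or an ambiguity code."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in NUCLEOTIDES and y in NUCLEOTIDES]
    if not sites:
        return float("nan")
    return sum(x != y for x, y in sites) / len(sites)


def distance_matrix(seqs):
    """Pairwise p-distances for a dict {sequence_id: aligned_sequence}."""
    return {(i, j): p_distance(seqs[i], seqs[j])
            for i, j in combinations(sorted(seqs), 2)}


aligned = {"S1": "ACGTACGT", "S2": "ACGTACGA", "S3": "AC-TTCGA"}
print(distance_matrix(aligned))
```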

The filter-reduction method was used to build a contact network from the dataset. Each node in the network represents an HIV-1 viral sequence isolate obtained from a patient. Starting from an undirected, fully connected network of all patients, a set of social/sexual filters was applied. These filters considered patients' demographic and treatment information. The direct connection between any two nodes that did not satisfy the epidemiological criteria was removed from the network (the percentage of edges removed by each filter is presented in

The network visualizations in this article were produced using an in-house developed interactive visualization tool, called “Twilight”, which is based on the igraph software package for complex network research
