Sequence data will increase understanding of virus evolution, adaptability, and pathogenicity.

Despite safe and efficacious vaccines against peste des petits ruminants virus (PPRV), this virus has emerged as the cause of a highly contagious disease with serious economic consequences for small ruminant agriculture across Asia, the Middle East, and Africa. We used complete and partial genome sequences of all 4 lineages of the virus to investigate evolutionary and epidemiologic dynamics of PPRV. A Bayesian phylogenetic analysis of all PPRV lineages mapped the time to most recent common ancestor and initial divergence of PPRV to a lineage III isolate at the beginning of 20th century. A phylogeographic approach estimated the probability for root location of an ancestral PPRV and individual lineages as being Nigeria for PPRV, Senegal for lineage I, Nigeria/Ghana for lineage II, Sudan for lineage III, and India for lineage IV. Substitution rates are critical parameters for understanding virus evolution because restrictions in genetic variation can lead to lower adaptability and pathogenicity.

Peste des petits ruminants is a highly contagious and devastating viral disease of small ruminants that is endemic to much of Africa, the Middle East, and Asia (

PPRV has caused numerous serious epidemics in small ruminant populations across sub-Saharan Africa, the Middle East, and major parts of the Indian subcontinent where PPRV is considered endemic (

The molecular epidemiology of PPRV, which is based on sequence comparison of a small region of the fusion (F) gene (322 nt) or the nucleoprotein (N) gene (255 nt), has identified 4 distinct lineages (I–IV) of PPRV (

Advanced sequencing technologies have enabled molecular epidemiologic studies of viruses in which whole gene and complete genome data are used to enhance and clarify the evolutionary dynamics of viral infectious disease (

Complete genome sequencing of 7 PPRV isolates (4 from lineage III and 3 from lineage IV) was performed according to the methods described by Muniraju et al. (

Virus isolates | GenBank accession no. | Lineage | Source (reference) |
---|---|---|---|

Ivory Coast/1989 | EU267273 | I | Goat ( |

Nigeria/1976 | EU267274 | II | Sheep ( |

Nigeria/1975/1 | X74443 | II | Goat (14), vaccine strain |

Uganda/2012* | KJ867543 | III | Goat |

UAE/1986*† | KJ867545 | III | Dorcas gazelle ( |

Oman/1983* | KJ867544 | III | Goat ( |

Ethiopia/1994* | KJ867540 | III | Goat ( |

Ethiopia/2010* | KJ867541 | IV | Goat |

India/Sungri/1996* | KJ867542 | IV | Goat (provided by Intervet International B.V, Boxmeer, the Netherlands), vaccine strain |

Morocco/2008* | KC594074 | IV | Goat ( |

China/Tibet Bharal/2008 | JX217850 | IV | Bharal, |

China/Tibet33/2007 | JF939201 | IV | Goat ( |

China/TibetGeg30/2007 | FJ905304 | IV | Goat ( |

Turkey/2000 | NC006383 | IV | Sheep ( |

*Whole genome sequencing was conducted. †UAE, United Arab Emirates.

In addition to the 7 complete genomes sequences of PPRV generated in this study, another 7 complete genome sequences were obtained from GenBank (

Partial N gene sequences of PPRV (nucleotide positions 1253–1507) that have a detailed history of collection date and place were obtained from GenBank (available up to August 2013). These partial sequences were aligned by using the ClustalW algorithm in BioEdit software v7.2.0. (

The nucleotide and amino acid sequence differences between the PPRV lineages for 12 complete genome sequences were estimated by using BioEdit software v7.2.0. Analyses of selection pressure in individual PPRV genes was performed by obtaining mean ratios of nonsynonymous (dN) to synonymous (dS) substitutions per site. The dN/dS was calculated by using codon-based maximum likelihood approaches with the single-likelihood ancestor method implemented in hypothesis testing using the phylogenies package (

Molecular evolutionary rate and divergence times were co-estimated. A Bayesian maximum clade credibility (MCC) phylogenetic tree was constructed by using Bayesian Markov chain Monte Carlo (MCMC) analysis and Bayesian evolutionary analysis sampling trees (BEAST) software package v1.8.0 (

For each analysis, 2 independent MCMC chains were run to get a final output of 10,000 trees (ESS >200 for all the parameters estimated) and were assessed for their proper mixing, convergence, and consistency by Tracer v1.5 with 10% burn in. The 2 individual runs were combined by using LogCombiner v1.8.0 in the BEAST software package. The nucleotide substitution rate (substitutions/site/year) and the TMRCA (year) values were obtained from Tracer v1.5. The posterior tree distributions were summarized by using TreeAnnotator (

Bayesian phylogeographic analysis was performed by using complete PPRV genome sequence and partial N gene sequence datasets, and isolates were annotated according to their location (longitude and latitude). Partial N gene data were chosen instead of F gene data because of increased divergence reported for the N gene (

All 7 PPRV complete genomes are 15,948 nt and conform to the rule of 6 as described for all other morbillivirus genomes (

Lineage | Lineage | |||
---|---|---|---|---|

I | II | III | IV | |

I | 5.1 | 6.1–7.0 | 5.7–6.1 | |

II | 5.7–6.3 | 4.0–4.2 | ||

III | 0.2–3.0, | 6.1–7.2 | ||

IV | 0.1–2.0, |

*Values are percentage nucleotide (bold) and amino acid sequences differences.

The dN/dS for coding regions of the various genes of PPRV (n = 12) for all 4 lineages ranged from 0.06 to 0.45 (

Gene† | Total amino acids | Codon position | Mean dN/dS | ||
---|---|---|---|---|---|

CP1.mu | CP2.mu | CP3.mu | |||

N | 526 | 0.44 | 0.33 | 2.23 | 0.13 |

P | 510 | 0.81 | 0.69 | 1.49 | 0.45 |

M | 336 | 0.48 | 0.15 | 2.36 | 0.06 |

F | 547 | 0.46 | 0.26 | 2.29 | 0.10 |

H | 610 | 0.57 | 0.37 | 2.06 | 0.19 |

L | 2184 | 0.42 | 0.18 | 2.40 | 0.08 |

*BEAST, Bayesian evolutionary analysis sampling trees; dN/dS, nonsynonymous/synonymous substitutions per site; SLAC, single-likelihood ancestor counting; CP, codon position. †N = nucleoprotein; P, phosphoprotein; M, matrix; F, fusion; H, hemagglutinin; L, large polymerase.

Mean ratios of nonsynonymous (dN) to synonymous (dS) substitutions per site of concatenated coding regions of peste des petits ruminants virus genome. Proportion of dS substitutions per potential dS site and proportion of dN substitutions per potential dN site were calculated by using the method of Nei and Gojobori (

Complete genome sequences of 12 PPRV and partial N gene dataset (n = 159) were analyzed by using the coalescent-based Bayesian MCMC approach. The general time-reversible nucleotide substitution model with a gamma distribution for rate variation was selected on the basis of Akaike information criterion scores. Bayes factor test with marginal likelihood comparisons showed that the relaxed UCED clock model best fitted the PPRV complete genome and partial N gene datasets (

Sequence dataset (no.)† | Models, substitution/ clock/demographic | Mean nucleotide substitution rate, substitutions/site/y (95% HPD) | TMRCA, y (95% HPD) | Bayes factor, –log likelihood |
---|---|---|---|---|

PPRV complete genome (12) | GTR + G/strict/BSP | 3.2 x 10^{–4} (2.02 x 10^{–4}–4.31 x 10^{–4}) | 1763 (1653–1832) | –46,972.98 |

GTR + G/strict/CS | 3.21 x 10^{–4 }(2.12 x 10^{–4}–4.38 x 10^{–4}) | 1763 (1659–1834) | –46,973.06 | |

GTR + G/strict/EG | 3.24 x 10^{–4 }(2.12 x 10^{–4}–4.33 x 10^{–4}) | 1765 (1668–1836) | –46,973.06 | |

GTR + G/UCLD/BSP | 2.89 x 10^{-3} (3.21 x 10^{–8}–6.92 x 10^{–4}) | 1691 (123 | –46,935.66 | |

GTR + G/UCLD/CS | 3.03 x 10^{–4 }(8.99 x 10^{–9}–7.07 x 10^{–4}) | 1705 (123–1961) | –46,935.86 | |

GTR + G/UCLD/EG | 3.72 x 10^{–4 }(3.01 x 10^{–5}–7.93 x 10^{–4}) | 1767 (1222–1948) | –46,935.89 | |

GTR + G/UCED/BSP | 7.91 x 10^{–4 }(7.46 x 10^{–5}–1.53 x 10^{–3}) | 1889 (1586–1968) | –46,933.82 | |

GTR + G/UCED/CS | 7.98 x 10^{–4 }(8.03 x 10^{–5}–1.54 x 10^{–3}) | 1887 (1569–1968) | –46,933.98 | |

^{–4 }(2.13 x 10^{–4}–1.64 x 10^{–3}) | ||||

GTR + G/random/BSP | 7.01 x 10^{–4 }(5.55 x 10^{–4}–8.50 x 10^{–4}) | 1888 (1862–1908) | –46,934.75 | |

GTR + G/random/CS | 6.97 x 10^{–4 }(5.38 x 10^{–4}–8.41 x 10^{–4}) | 1887 (1860–1908) | –46,934.64 | |

GTR + G/random/EG | 7.04 x 10^{–4 }(5.57 x 10^{–4}–8.57 x 10^{–4}) | 1888 (1861–1908) | –46,934.89 | |

N partial (159) | GTR + G/strict/BSP | 1.22 x 10^{–3} (9.39 x 10^{–4}–1.51 x 10^{–3}) | 1890 (1857–1917) | –2,884.524 |

GTR + G/strict/CS | 1.23 x 10^{–3} (9.49 x 10^{–4}–1.52 x 10^{–3}) | 1886 (1853–1913) | –2,887.723 | |

GTR + G/strict/EG | 1.24 x 10^{–3} (9.71 x 10^{–4}–1.56 x 10^{–3}) | 1893 (1863–1919) | –2,885.44 | |

GTR + G/UCLD/BSP | 1.45 x 10^{–3} (1.06 x 10^{–3}–1.87 x 10^{–3}) | 1896 (1815–1943) | –2,806.535 | |

GTR + G/UCLD/CS | 1.41 x 10^{–3} (1.05 x 10^{–3}–1.80 x 10^{–3}) | 1882 (1793–1935) | –2,805.535 | |

GTR + G/UCLD/EG | 1.49 x 10^{–3} (1.10 x 10^{–3}–1.89 x 10^{–3}) | 1904 (1838–1943) | –2,805.921 | |

GTR + G/UCED/BSP | 1.52 x 10^{–3} (1.11 x 10^{–3}–1.98 x 10^{–3}) | 1904 (1817–1949) | –2,799.572 | |

GTR + G/UCED/CS | 1.46 x 10^{–3} (1.05 x 10^{–3}–1.88 x 10^{–3}) | 1886 (1785–1940) | –2,799.512 | |

^{-3} (1.16 x 10^{–3}–1.99 x 10^{–3}) | ||||

GTR + G/random/BSP | 1.26 x 10^{–3} (9.44 x 10^{–4}–1.58 x 10^{–3}) | 1881 (1837–1915) | –2,865.846 | |

GTR + G/random/CS | 1.24 x 10^{–3} (9.38 x 10^{–4}–1.57 x 10^{–3}) | 1875 (1831–1910) | –2,866.111 | |

GTR + G/random/EG | 1.27 x 10^{–3} (9.62 x 10^{–4}–1.60 x 10^{–3}) | 1880 (1841–1914) | –2,866.929 | |

N CDS (12) | GTR + G/UCED/EG | 1.01 x 10^{–3} (2.79 x 10^{–4}–1.83 x 10^{–3}) | 1924 (1799–1970) | NA |

N complete gene (12) | GTR + G/UCED/EG | 1.08 x 10^{–3} (3.19 x 10^{–4}–1.93 x 10^{–3}) | 1923 (1804–1970) | NA |

P CDS (12) | GTR + I/UCED/EG | 1.11 x 10^{–3} (3.46 x 10^{–4}–1.29 x 10^{–3}) | 1931 (1833–1972) | NA |

P complete gene (12) | GTR + I/UCED/EG | 1.19 x 10^{–3} (3.46 x 10^{–4}–2.03 x 10^{–3}) | 1930 (1828–1971) | NA |

M CDS (12) | GTR + G/UCED/EG | 6.52 x 10^{–4 }(1.20 x 10^{–4}–1.20 x 10^{–3}) | 1897 (1695–1964) | NA |

M complete gene (12) | GTR + I/UCED/EG | 2.49 x 10^{–3} (9.96 x 10^{–4}–4.14 x 10^{–3}) | 1944 (1879–1973) | NA |

FCDS (12) | GTR + I/CED/EG | 8.95 x 10^{–4 }(2.43 x 10^{–4}–1.58 x 10^{–3}) | 1914 (1766–1968) | NA |

F complete gene (12) | GTR + G/UCED/EG | 1.33 x 10^{–3} (3.26 x 10^{–4}–2.36 x 10^{–3}) | 1912 (1754–1967) | NA |

H CDS (12) | GTR + G/UCED/EG | 1.21 x 10^{–3} (3.96 x 10^{–4}–2.04 x 10^{–3}) | 1926 (1826–1969) | NA |

H complete gene (12) | GTR + G/UCED/EG | 1.25 x 10^{–3} (4.34 x 10^{–4}–2.14 x 10^{–3}) | 1925 (1821–1968) | NA |

L CDS (12) | GTR + I/UCED/EG | 9.82 x 10^{–4 }(3.76 x 10^{–4}–1.67 x 10^{–3}) | 1929 (1834–1969) | NA |

L complete gene (12) | GTR + I/UCED/EG | 9.69 x 10^{–4 }(3.36 x 10^{–4}–1.64 x 10^{–3}) | 1927 (1820–1969) | NA |

PPRV/RPV/MV (16) | GTR+ G + I/UCED/EG | 1.89 x 10^{–3} (5.55 x 10^{–4}–3.31 x 10^{–3}) | 1616 (1072–1859) | NA |

*Bold indicates best-fit models. HPD, highest posterior density; TMRCA, time to most recent common ancestor; GTR + G, general time-reversible with gamma distribution rates; BSP, Bayesian skyline plot; CS, constant size; EG, exponential growth; UCLD, uncorrelated lognormal distribution; UCED, uncorrelated exponential distribution; NA, not applicable; GTR + I, general time-reversible with invariant sites; GTR+ G + I, general time-reversible with gamma distribution rates and invariant sites. †PPRV, peste des petits ruminants virus; N, nucleoprotein; CDS, coding sequence; P, phosphoprotein; M, matrix; F, fusion; H, hemagglutinin; L, large polymerase; RPV, rinderpest virus; MV, measles virus.

Accordingly, the UCED and exponential growth model have been directly used for the individual PPRV gene dataset and the PPRV/RPV/MV complete genome dataset to estimate the TMRCA and substitution rate per site per year. When we used the UCED and exponential growth models, we found that the mean evolutionary substitution rate of the PPRV complete genome was estimated to be 9.09 × 10^{−4} (95% HPD 2.13 × 10^{−4}–1.64 ×10^{−3}). When 2 complete genome sequences of vaccine strains were added into this analysis, the same models (general time-reversible nucleotide substitution model with a gamma distribution, UCED, and the exponential growth demographic models) were best fitted, and the mean substitution rate/site/year was reduced to 7.86 × 10^{−4} (95% HPD 2.17 × 10^{−4}–1.4 × 10^{−3}). Furthermore, the evolutionary nucleotide substitution rate for combined PPRV/RPV/MV complete genomes was 1.89 × 10^{−3} (95% HPD 5.55 × 10^{−4}–3.31 × 10^{−3}). Analysis of individual genes of the PPRV coding region dataset, coding and noncoding region datasets, and partial N gene dataset are shown in

A Bayesian time-scaled MCC tree based on complete PPRV genomes was constructed (

Time-scaled Bayesian maximum clade credibility phylogeny tree based on peste des petits ruminants virus complete genome sequences. The tree was constructed by using the uncorrelated exponential distribution model and exponential tree prior. Branch tips correspond to date of collection and branch lengths reflect elapsed time. Tree nodes were annotated with posterior probability values and estimated median dates of time to most recent common ancestor (TMRCA). Corresponding 95% highest posterior density (HPD) interval values of TMRCA are indicated as gray bars. Horizontal axis indicates time in years. UAE, United Arab Emirates.

Results of TMRCA analysis using complete coding and coding and noncoding regions of individual PPRV genes are shown in

Time-scaled Bayesian MCC phylogeny tree based on peste des petits ruminants virus (PPRV), rinderpest virus (RPV), and measles virus (MV) complete genome sequences. The tree was constructed by using the uncorrelated exponential distribution model and exponential tree prior. Branch tips correspond to date of collection and branch lengths reflect elapsed time. Tree nodes were annotated with posterior probability values, estimated median dates of time to most recent common ancestor (TMRCA). Corresponding 95% highest posterior density (HPD) values of TMRCA are indicated as gray bars. Horizontal axis indicates time in years. UAE, United Arab Emirates.

The demographic history of PPRV was investigated by using the partial N gene sequence dataset according to the BSP method implemented in BEAST. The BSP with an assumed piecewise-constant model has facilitated estimation of effective population size through time. The BSP showed that the population did not show much genetic diversity (effective number of infections) until the mid-1990s when diversity started to increase. Toward the first decade of the 21st century, the population size appeared to reach a peak and then showed a small decrease until the most recent sampling in 2012 (

Bayesian skyline plot showing demographic history of global peste des petits ruminants viruses sampled during 1968–2012. Genetic diversity was estimated by using a partial nucleoprotein gene dataset (n = 159). The thick black line represents median genetic diversity and the blue shaded areas show 95% highest posterior density estimate.

To estimate the geographic origin of PPRV, we summarized the results of Bayesian phylogeographic analyses by visualizing the annotated MCC tree (

Maximum clade credibility tree constructed for the geospatial analysis of peste des petits ruminants viruses by using complete genome data. Nodes are colored according to the most probable location of their ascendent locations. Posterior probability values are shown along tree nodes. Posterior probability distribution (PPD) values of root location states of the ancestral node are shown along the x-axis at the top left. UAE, United Arab Emirates.

Because the geographic origin of PPRV could not be localized to a single country by using 14 complete genome sequences, further phylogeographic analysis was performed by using 159 partial N gene sequences collected from 30 locations during 1968–2012. The root state posterior probabilities of PPRV ranged from 0.11% to 17.20%, and Nigeria (17.20%), Ghana (14.28%), and Sierra Leone (11.68%) showed the highest marginal support (

Probability of root locations of the most recent common ancestral peste des petits ruminants (PPRV). MCC trees were obtained by using the continuous time Markov chain and Bayesian stochastic search variable selection procedures. Root location probabilities of the most recent common ancestor using global PPRV isolates (panel A ) are shown graphically alongside lineages I–IV (panels B–E) and were estimated by using a complete dataset of PPRV partial nucleoprotein gene data and individual lineages separately. Probabilities of root locations are shown as percentages along the x-axes. UAE, United Arab Emirates; CAR, Central African Republic.

We sequenced complete genomes of 4 lineage III and 3 lineage IV isolates of PPRV. We used these genomes and other available genomes to assess the evolutionary substitution rate, TMRCA, and divergence of PPRV lineages and the geographic origin of PPRV.

The measure of selective pressures acting across the PPRV genome showed only purifying (stabilizing) selection occurring across the genome and no evidence of positive selection. The conservation of amino acid residues was further confirmed by the fact that the relative substitution rates at the third codon position of all the genes were higher than those for the first and second codon positions. The observed upper limit of 11.9% nt divergence (7.2% aa divergence) among PPRVs is consistent with the low level of antigenic divergence observed because despite lineage differentiation, only a single serotype exists for PPRV. Homologous recombination events are generally rare or absent in negative-sense RNA viruses (

From a genetic perspective, substitution rates are critical parameters for understanding virus evolution, given that restrictions in genetic variation within a population of viruses can lead to lower adaptability and pathogenicity (^{−3}–2.13 × 10^{−4} substitutions/site/year, which is similar to that predicted for other paramyxoviruses (10^{−3}–10^{−4} substitutions/site/year) (

That the predicted TMRCA for PPRV was during the early 20th century is reasonable because the first recorded description of PPRV was made in 1942 (

Biased estimates in substitution rate and TMRCA were observed by using datasets that included tissue culture–passaged, attenuated vaccine strain complete genome sequences, in which slower evolutionary substitution rates and earlier TMRCA were predicted. Similar observations were reported for PPRV/RPV/MV N gene sequence analyses, in which a slower and biased nucleotide substitution rate was observed when vaccine strain sequences (

Spatial and temporal dynamics of RNA viruses are often reflected by their phylogenetic structure (

The demographic analysis of PPRV with the BSP indicated historically constant genetic variability of PPRV over time. This finding could be a reflection of the use of RPV vaccine in small ruminants to protect animals against PPRV through the 1990s, which might have affected the evolution and spread of PPRV. In the early 21st century, genetic diversity of PPRV has gradually increased, which reflects frequent outbreak reports. The increased genetic diversity may be a driver for selection pressures within individual lineages and might result in extinction events, as suggested by an absence of lineage I virus. In recent years, as efforts have increased to actively control and eradicate PPRV, a decrease in genetic diversity has been observed.

Phylogeographic reconstruction with spatial and temporal information of virus isolates has enabled an understanding of the historic emergence and dispersal patterns involved in virus evolution (

Preliminary results were presented at the 15th International Negative Strand Virus Meeting, June 16–21, 2013, Granada, Spain.

We thank Vincent Michaud for his critical reading of and comments on the manuscript.

This study was supported by grants EU-BBSRC Anihwa BB/L013657/1, BBSRC-DFID CIDLID BB/H009485/1, and DBT-BBSRC FADH BB/L004801/1.

Mr Muniraju is a final year doctoral student at The Pirbright Institute, Pirbright, UK. His primary research interests are epidemiologic studies of PPRV and developing marker vaccines for peste des petits ruminants by using reverse genetics techniques.