Conceived and designed the experiments: MM CD MAB. Performed the experiments: MM MLS GC CD. Analyzed the data: MM MLS GC CD. Contributed reagents/materials/analysis tools: AM SG GT. Wrote the paper: MM CD MAB.
Multi-Locus Sequence Typing (MLST) of
Disease-causing pneumococci represent a phenotypically and genotypically diverse population of strains that cause bacteraemia, meningitis, pneumonia, sinusitis, and acute otitis media in children
Within recent years, the accessibility of sequence analysis tools has increased the diffusion of
The use of specifically designed algorithms, such as eBURST, shows that MLST-related strains can be further grouped into Clonal Complexes (CCs), ideally including only genotypes descending from a common predicted founder
Whether CC156 still represents the evolutionary descent from a predicted ancestor, or is instead an example of artificial grouping due to the reduced discriminatory power of the eBURST algorithm, is a matter of scientific interest. The application of typing methods alternative to MLST would be of paramount importance to resolve this point. Whole genome sequence analysis has been successfully applied to
In order to assess whether distinct genetic lineages were present in CC156, we applied 96-MLST to a panel of strains representative of 41 different STs belonging to CC156. The application of the 96-MLST schema allowed for the distinction of ten lineages within CC156, each homogeneous in terms of capsular type and of the presence of clonally inherited genetic traits (such as PI-1). Notably, strains belonging to ST4945, whose recent discovery had been responsible for the merging of distinct clones into CC156, were unequivocally assigned to one of the lineages. Moreover, two out of the three ST4945 SLVs analyzed were assigned to lineages different from that of ST4945, suggesting that strains in close proximity in the eBURST graphic representation (and differing in only one of the seven MLST alleles) can be significantly different when considering additional loci interspersed in the whole genomic backbone.
The collection of
| Strain name | ST | Serotype/serogroup | Country | MLST alleles in common with ST4945 | MLST alleles in common with ST156 | PI-1 | Data source | Strain source | Lineage |
| | 90 | 6B | Spain | 2/7 | 1/7 | yes | GenBank:CP002176 | | f |
| | 94 | 6B | Italy | 2/7 | 1/7 | yes | This Study | Istituto Superiore di Sanità, Italy | f |
| | 124 | 14 | Canada | 4/7 | 1/7 | no | GenBank:ABZC00000000 | | d |
| | 124 | 14 | Canada | 4/7 | 1/7 | no | GenBank:ABZT00000000 | | d |
| | 124 | 14 | USA | 4/7 | 1/7 | no | GenBank:ABAD00000000 | | d |
| | 138 | 6B | USA | 3/7 | 1/7 | yes | This Study | Centers for Disease Control and Prevention, USA | b |
| | 143 | 14 | Italy | 3/7 | 5/7 | yes | This Study | Istituto Superiore di Sanità, Italy | i |
| | 145 | 6B | Iceland | 5/7 | 3/7 | yes | This Study | Landspitali, National University Hospital of Iceland, Iceland | e |
| | 146 | 6B | New Zealand | 4/7 | 2/7 | yes | This Study | Centers for Disease Control and Prevention, USA | e |
| | 156 | 14 | Israel | 3/7 | 7/7 | yes | This Study | Ben-Gurion University of the Negev, Israel | i |
| | 156 | 14 | Israel | 3/7 | 7/7 | yes | This Study | Ben-Gurion University of the Negev, Israel | i |
| | 156 | 11A | Israel | 3/7 | 7/7 | yes | This Study | Ben-Gurion University of the Negev, Israel | i |
| | 156 | 9V | Thailand | 3/7 | 7/7 | yes | This Study | Shoklo Malaria Research Unit, Thailand | i |
| | 156 | 9V | Thailand | 3/7 | 7/7 | yes | This Study | Shoklo Malaria Research Unit, Thailand | i |
| | 156 | 14 | Brazil | 3/7 | 7/7 | yes | This Study | Oswaldo Cruz Foundation Salvador, Brazil | i |
| | 156 | 14 | Brazil | 3/7 | 7/7 | yes | This Study | Oswaldo Cruz Foundation Salvador, Brazil | i |
| | 156 | 9V | Italy | 3/7 | 7/7 | yes | This Study | Istituto Superiore di Sanità, Italy | i |
| | 156 | 14 | Italy | 3/7 | 7/7 | yes | This Study | Istituto Superiore di Sanità, Italy | i |
| | 156 | 9V | Italy | 3/7 | 7/7 | yes | This Study | Istituto Superiore di Sanità, Italy | i |
| | 156 | 14 | Italy | 3/7 | 7/7 | yes | This Study | Istituto Superiore di Sanità, Italy | i |
| | 156 | 9V | Sweden | 3/7 | 7/7 | yes | This Study | Karolinska Institutet, Sweden | i |
| | 156 | 9V | Sweden | 3/7 | 7/7 | yes | This Study | Karolinska Institutet, Sweden | i |
| | 156 | 9V | Worldwide | 3/7 | 7/7 | yes | GenBank:ABGE00000000 | Genome Biol 11:R107 | i |
| | 162 | 9V | Brazil | 4/7 | 6/7 | yes | This Study | Oswaldo Cruz Foundation Salvador, Brazil | i |
| | 162 | 9V | Brazil | 4/7 | 6/7 | yes | This Study | Oswaldo Cruz Foundation Salvador, Brazil | i |
| | 162 | 24F | Italy | 4/7 | 6/7 | yes | This Study | Istituto Superiore di Sanità, Italy | i |
| | 162 | 24F | Italy | 4/7 | 6/7 | yes | This Study | Istituto Superiore di Sanità, Italy | i |
| | 166 | 9V | USA | 4/7 | 6/7 | yes | This Study | Centers for Disease Control and Prevention, USA | i |
| | 171 | 6B | n.d. | 3/7 | 2/7 | no | This Study | University of Alabama, USA | b |
| | 172 | 23F | Israel | 1/7 | 1/7 | no | This Study | Ben-Gurion University of the Negev, Israel | a |
| | 172 | 19A | Israel | 1/7 | 1/7 | no | This Study | Ben-Gurion University of the Negev, Israel | a |
| | 172 | 23F | Israel | 1/7 | 1/7 | yes | This Study | Ben-Gurion University of the Negev, Israel | a |
| | 172 | 23F | Thailand | 1/7 | 1/7 | no | This Study | Shoklo Malaria Research Unit, Thailand | a |
| | 172 | 23F | Thailand | 1/7 | 1/7 | yes | This Study | Shoklo Malaria Research Unit, Thailand | a |
| | 173 | 23F | Poland | 2/7 | 2/7 | yes | This Study | Centers for Disease Control and Prevention, USA | c |
| | 176 | 6B | Italy | 2/7 | 1/7 | yes | This Study | Istituto Superiore di Sanità, Italy | b |
| | 239 | 9V | Poland | 1/7 | 1/7 | no | This Study | National Medicine Institute, Poland | g |
| | 268 | 19A | Hungary | 1/7 | 1/7 | yes | GenBank:CP000936 | Genome Biol 11:R107 | c |
| | 273 | 6B | Greece | 4/7 | 1/7 | yes | This Study | Centers for Disease Control and Prevention, USA | f |
| | 280 | 9V | Thailand | 2/7 | 1/7 | no | This Study | Shoklo Malaria Research Unit, Thailand | g |
| | 338 | 23F | Colombia | 1/7 | 1/7 | no | This Study | Centers for Disease Control and Prevention, USA | a |
| | 361 | 6A | Ghana | 2/7 | 2/7 | no | This Study | Swiss Tropical Institute, Switzerland | a |
| | 385 | 6B | USA | 5/7 | 2/7 | yes | This Study | University of Alabama, USA | e |
| | 392 | 17F | USA | 6/7 | 3/7 | no | This Study | Centers for Disease Control and Prevention, USA | h |
| | 440 | 23F | Italy | 5/7 | 2/7 | no | This Study | Ospedale le Scotte, Siena, Italy | h |
| | 559 | 6B | Italy | 2/7 | 1/7 | yes | This Study | Istituto Superiore di Sanità, Italy | b |
| | 602 | 23F | Poland | 4/7 | 1/7 | no | This Study | National Medicine Institute, Poland | h |
| | 642 | 9V | USA | 4/7 | 4/7 | yes | This Study | Centers for Disease Control and Prevention, USA | i |
| | 671 | 14 | USA | 2/7 | 4/7 | yes | This Study | Centers for Disease Control and Prevention, USA | i |
| | 789 | 14 | Uruguay | 6/7 | 2/7 | no | This Study | The Rockefeller University, New York, USA | d |
| | 847 | 19A | Kenya | 4/7 | 4/7 | yes | This Study | Kenyan Medical Research Center, Kenya | j |
| | 847 | 19A | Kenya | 4/7 | 4/7 | yes | This Study | Centers for Disease Control and Prevention, USA | j |
| | 1269 | 9 | USA | 4/7 | 5/7 | yes | GenBank:ABAB00000000 | | i |
| | 1349 | 23B | Turkey | 0/7 | 0/7 | no | This Study | Centers for Disease Control and Prevention, USA | a |
| | 2218 | 23F | Thailand | 2/7 | 1/7 | no | This Study | Shoklo Malaria Research Unit, Thailand | a |
| | 4404 | 6B | Thailand | 6/7 | 3/7 | no | This Study | Shoklo Malaria Research Unit, Thailand | e |
| | 4405 | 6B | Thailand | 5/7 | 2/7 | no | This Study | Shoklo Malaria Research Unit, Thailand | e |
| | 4945 | 17F | Sweden | 7/7 | 3/7 | no | This Study | Centers for Disease Control and Prevention, USA | h |
| | 4945 | 17F | Egypt | 7/7 | 3/7 | no | This Study | Centers for Disease Control and Prevention, USA | h |
| | 4948 | 14 | Egypt | 4/7 | 4/7 | yes | This Study | Centers for Disease Control and Prevention, USA | i |
| | 4966 | 6B | Thailand | 4/7 | 3/7 | no | This Study | Centers for Disease Control and Prevention, USA | b |
| | 4966 | 6C | Thailand | 4/7 | 3/7 | no | This Study | Centers for Disease Control and Prevention, USA | b |
| | 4968 | 23A | Mozambique | 1/7 | 1/7 | no | This Study | Centers for Disease Control and Prevention, USA | a |
| | 5420 | 6B | Thailand | 4/7 | 3/7 | no | This Study | Centers for Disease Control and Prevention, USA | b |
| | 5613 | 6A | Nepal | 4/7 | 2/7 | no | This Study | Centers for Disease Control and Prevention, USA | b |
| | 6214 | 6 | USA | 3/7 | 2/7 | yes | GenBank:ABAE00000000 | | e |
For each strain name, ST, serotype/serogroup, country of isolation, number of MLST alleles in common with ST156 and ST4945, data source, strain source and lineage (as identified by 96-MLST hierarchical clustering,
Bacteria were grown overnight at 37°C in 5% CO2 on Tryptic Soy Agar (TSA) plates (Becton Dickinson) supplemented with 10 mg/l colistin, 5 mg/l oxolinic acid and 5% defibrinated sheep blood. Genomic DNA was extracted from the 58 isolates using the Wizard Genomic DNA purification kit (Promega) following the manufacturer’s instructions. PCR amplifications for 96-MLST were performed on genomic DNA as briefly described below, while the presence of PI-1 and the PI-1 clade were determined as reported elsewhere
The 96-MLST loci and the amplification primers for the 96-MLST schema reported in
The 96-MLST loci of the 58
Sequences were converted into allelic profiles (
Hierarchical clustering was performed using the package Cluster v1.13.1
Minimum Spanning Tree analysis was performed using PHYLOVIZ
For each of the 96 loci the sequences were aligned using MUSCLE
ClonalFrame V1.1
CC156 (predicted founder ST156) was identified as the largest clonal complex in the
A) In the absence of ST4945, CC156 is partitioned into three different CCs by eBURST analysis. B) 32 out of the 41 CC156 STs analyzed differ at four or more alleles from the founder ST, ST156. The MLST database was accessed on 15th January 2012 and CC156 was visualized using eBURST (the eBURST algorithm was executed on a dataset comprising all the STs in the database, each represented once). A) Shadowed shapes indicate the partitioning of CC156 into distinct CCs (CC162 blue, CC124 red, CC176 green) when eBURST was executed with the same ST dataset but excluding ST4945. ST156 and ST4945 are highlighted in red, while all the other STs analysed in this study are in black. B) The STs analysed in this study are highlighted and colour coded based on the number of MLST alleles in common with the predicted founder, ST156 (colour coding is indicated in the Figure).
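The merging effect described in panel A can be illustrated with a minimal sketch. It assumes that a clonal complex is simply a connected component of the graph linking single-locus variants (STs sharing 6 of 7 alleles), which is a simplification of what eBURST does. The ST labels and allelic profiles below are hypothetical toy data, not profiles from the MLST database.

```python
from itertools import combinations

def shared_alleles(p, q):
    """Number of loci at which two allelic profiles carry the same allele."""
    return sum(a == b for a, b in zip(p, q))

def clonal_complexes(profiles):
    """Connected components of the SLV graph (simplified, eBURST-like grouping)."""
    parent = {st: st for st in profiles}

    def find(st):
        # Union-find lookup with path halving.
        while parent[st] != st:
            parent[st] = parent[parent[st]]
            st = parent[st]
        return st

    for s, t in combinations(profiles, 2):
        # Link two STs when they differ at exactly one locus (SLVs).
        if shared_alleles(profiles[s], profiles[t]) == len(profiles[s]) - 1:
            parent[find(s)] = find(t)

    groups = {}
    for st in profiles:
        groups.setdefault(find(st), set()).add(st)
    return sorted(groups.values(), key=sorted)

# Hypothetical 7-locus profiles: three small groups plus one bridging ST.
toy = {
    "A1": (1, 1, 5, 5, 5, 5, 5), "A2": (1, 5, 5, 5, 5, 5, 5),
    "B1": (5, 5, 2, 2, 5, 5, 5), "B2": (5, 5, 2, 5, 5, 5, 5),
    "C1": (5, 5, 5, 5, 5, 3, 3), "C2": (5, 5, 5, 5, 5, 3, 5),
    "BRIDGE": (5, 5, 5, 5, 5, 5, 5),  # SLV of A2, B2 and C2
}

with_bridge = clonal_complexes(toy)
without_bridge = clonal_complexes(
    {st: p for st, p in toy.items() if st != "BRIDGE"}
)
print(len(with_bridge), len(without_bridge))
```

In this toy dataset a single ST that is an SLV of one member of each group is enough to chain three otherwise separate groups into one complex; removing it restores the three groups.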
To investigate whether the merger of multiple CCs into a single CC was due to the identification of one or multiple new STs, the eBURST algorithm was iteratively executed by progressively excluding the most recently identified STs from the analysis. This showed that ST4945 (a new allelic combination) occupied a central position within the eBURST CC156 graphic representation (
To further investigate whether the strains comprising the newly formed CC156 had a common evolutionary ancestor (ST156) or whether ST4945 strains contained a combination of alleles from different lineages (as suggested by their MLST profile), we analyzed a panel of 66 representative strains of different STs belonging to CC156 (see Materials and Methods section,
The 66 strains were typed by using the 96-MLST schema
Sequences were converted into allelic profiles by assigning a unique ID number to each allele. Hierarchical clustering was performed using the package Cluster v1.13.1. Distances between strains were computed using the function “Daisy” with Gower’s distance, counting the number of differences between allelic profiles. An agglomerative hierarchical clustering of the data was performed using the function “Agnes” with the “average” method (unweighted pair-group average method, UPGMA). The ten lineages identified (a-j) are indicated by coloured boxes, and numbers represent the bootstrap support. The STs of all the strains are indicated in the coloured bar.
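The procedure in this legend can be sketched in a few lines. The sketch below is a plain-Python illustration, not the Cluster package used in the study. It assumes that the distance is the fraction of mismatching loci, that an absent locus (ID 0) counts as an ordinary allele state, and it omits bootstrap support. The strain names and sequences are hypothetical.

```python
from itertools import combinations

def to_allelic_profiles(sequences):
    """Assign a progressive integer ID to each distinct sequence per locus.

    sequences: {strain: [seq_locus1, seq_locus2, ...]}; None marks an absent
    locus, which receives the ID 0.
    """
    n_loci = len(next(iter(sequences.values())))
    ids = [dict() for _ in range(n_loci)]
    profiles = {}
    for strain, seqs in sequences.items():
        profile = []
        for locus, seq in enumerate(seqs):
            if seq is None:
                profile.append(0)
            else:
                profile.append(ids[locus].setdefault(seq, len(ids[locus]) + 1))
        profiles[strain] = tuple(profile)
    return profiles

def mismatch_distance(p, q):
    """Fraction of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(p, q)) / len(p)

def upgma(profiles):
    """Average-linkage clustering; returns [(cluster_a, cluster_b, height), ...]."""
    clusters = [(name,) for name in sorted(profiles)]
    merges = []

    def avg_dist(a, b):
        # Mean of all pairwise distances between members of the two clusters.
        total = sum(mismatch_distance(profiles[x], profiles[y])
                    for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        merges.append((a, b, avg_dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges

# Hypothetical toy input: four strains, four loci.
toy_sequences = {
    "s1": ["AAT", "GGC", "TTA", "CCG"],
    "s2": ["AAT", "GGC", "TTA", "CCA"],
    "s3": ["ACT", "GAC", "TGA", "CCG"],
    "s4": ["ACT", "GAC", "TGA", None],
}
toy_profiles = to_allelic_profiles(toy_sequences)
toy_merges = upgma(toy_profiles)
for a, b, height in toy_merges:
    print(a, b, round(height, 4))
```

Cutting the resulting merge history at a chosen height yields groups of strains, analogous to the lineages drawn as coloured boxes in the figure.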
As expected, by computing a hierarchical clustering analysis with the MLST allelic profiles for the same set of 41 STs, SLV STs were grouped in the same lineage (
Interestingly, the most remarkable differences in the hierarchical clustering between 96-MLST and 7-MLST were that lineages “d” and “h” (the latter comprising ST4945), “f” and “e”, “a”, “b” and “c”, and “i” and “j” clustered together in the 7-MLST analysis, consistent with the proximity of their STs within the CC156 eBURST representation.
The partitioning of CC156 into ten distinct lineages by hierarchical clustering was further evaluated by performing a Minimum Spanning Tree (MST) analysis of the 96-MLST alleles (
The Minimum Spanning Tree analysis was performed by using PHYLOVIZ on the 96-MLST alleles of the 66 strains considered in this study. The lineages identified by applying the threshold of 75/96 different loci are highlighted with shadowed shapes and named according to the lineage identification of
The distribution of the 7-MLST and the 96-MLST alleles was analysed by assigning identical colours to identical alleles across the strains (white = unique alleles). Red arrows indicate ST4945 strains, while black and orange arrows indicate single and double 7-MLST locus variants of ST4945, respectively. The 96-MLST loci are listed according to their order in the genome.
In order to visualize whether the allelic differences within and among lineages were concentrated in specific regions of the chromosome, and thus likely attributable to single recombination events, the strains were partitioned based on the lineages identified by hierarchical clustering analysis (
With the exception of lineages “a” and “b”, all of the strains belonging to the same lineage were also closely related in the eBURST graphic visualization of CC156 (
STs are indicated with different colours depending on PI-1 presence/absence as indicated in the Figure legend.
In addition, as shown in
The implementation of increased regional surveillance combined with molecular typing methods has allowed for the identification of several successful
The genetic characterization of
An example of this event is clonal complex CC156, which includes a large and heterogeneous group of strains that in many cases differ in all MLST loci, but nevertheless are connected by a continuous path of SLVs. In this report we provide evidence that the identification of a new ST (ST4945) was sufficient to induce the merger of formerly distinct CCs (here at least three) into one single clonal complex. Interestingly, as reviewed by Feil
In order to discriminate pneumococcal strains within this newly formed CC156, we used a recently developed typing schema based on the sequencing of 96 variable loci belonging to the core genome of
To further support the existence of distinct lineages within CC156, we provide evidence that the diversification of the identified lineages is not due to single recombination events occurring in specific genomic regions, but rather to general sequence variability dispersed along the bacterial chromosome. ST4945 strains were unambiguously assigned to one of the identified lineages (which also contains some SLVs and DLVs of ST4945), suggesting that ST4945 could represent an example of multiple recombination events occurring at the level of MLST loci.
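The distinction drawn here, between differences confined to one genomic region and differences dispersed along the chromosome, was assessed visually in this study. As a purely illustrative sketch, one hypothetical way to express the same idea is to list the loci (in genome order) at which two allelic profiles differ and measure the longest run of adjacent differing loci. The function names and the 12-locus profiles below are assumptions made for the example, not data or methods from the paper.

```python
def differing_loci(p, q):
    """Indices (in genome order) at which two allelic profiles differ."""
    return [i for i, (a, b) in enumerate(zip(p, q)) if a != b]

def longest_run(indices):
    """Length of the longest stretch of consecutive indices."""
    best = run = 0
    prev = None
    for i in indices:
        run = run + 1 if prev is not None and i == prev + 1 else 1
        best = max(best, run)
        prev = i
    return best

# Hypothetical profiles over 12 loci listed in genome order.
reference = (1,) * 12
clustered = (1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1)  # differences in one block
dispersed = (2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1)  # differences spread out

for name, profile in (("clustered", clustered), ("dispersed", dispersed)):
    diff = differing_loci(reference, profile)
    print(name, len(diff), longest_run(diff))
```

Both toy profiles differ from the reference at four loci, but only the first pattern is what a single recombination event spanning adjacent loci would be expected to produce.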
In conclusion, exhaustive MLST typing of large collections of pneumococcal strains has led to the identification of new STs and to the reduction of the discriminatory power of the classical eBURST approach. The analysis of additional loci (such as those included in the 96-MLST schema or of the complete genome) will allow for the reconstruction of the clonal structure and increase the ability to infer evolutionary relationships within the pneumococcal population.
Graphic representation of CC156 by eBURST. A) CC156 is heterogeneous for the presence of PI-1. B) 20 out of the 41 CC156 STs analyzed have three or fewer alleles in common with ST4945. The MLST database was accessed on 15th January 2012 and CC156 was visualized using eBURST (the eBURST algorithm was run on a dataset comprising all the STs in the database, each represented once). A) PI-1 presence and PI-1 clade were assessed for all the STs analysed. The STs analysed in this study are highlighted and colour coded based on PI-1 presence as indicated in the Figure. B) The STs analysed in this study are highlighted and colour coded based on the number of 7-MLST alleles in common with ST4945 (colour coding is indicated in the Figure).
(TIF)
96-MLST data analysis (66 strains) by hierarchical clustering and ClonalFrame. A) Hierarchical clustering performed on the 96-MLST alleles. Numbers are the bootstrap support of each node. B) Consensus network obtained using ClonalFrame on the aligned sequences. The thicker branches have a higher level of statistical support. Lineages are named and highlighted with the same colours as in Figure 2.
(TIF)
The NJ phylogenetic tree constructed by aligning the 96-MLST concatenated sequences of the 66 CC156 strains analyzed in this study identifies the same 10 lineages (a-j) as the hierarchical clustering (
(TIF)
Hierarchical clustering performed on the 7-MLST alleles of the 41 CC156 STs analyzed. Hierarchical clustering was performed using the package Cluster v1.13.1. Distances between strains were computed using the function “Daisy” with Gower’s distance, counting the number of differences between allelic profiles. An agglomerative hierarchical clustering of the data was performed using the function “Agnes” with the “average” method (unweighted pair-group average method, UPGMA).
(TIF)
NJ phylogenetic tree of the 41 CC156 STs analyzed in this study based on the concatenated sequences of the seven MLST loci.
(TIF)
Description of the 96-MLST loci set. ID locus name, short description, locus length, coordinates of start and stop on the TIGR4 genome, and number of alleles identified in this study are reported for each locus.
(XLSX)
Amplification primers set. For each locus the forward and reverse primers and the PCR amplicon size are indicated.
(XLSX)
List of the 96 alleles assigned for each of the 66 strains tested by 96-MLST. Sequences were converted into allelic profiles assigning a progressive unique ID number to each allele. The absent loci were assigned the ID number “0”.
(XLSX)
Nucleotide sequences of the 96 loci of the 66 strains analyzed in this study.
(TGZ)
We would like to acknowledge those people who kindly provided the