Emerging Infectious Diseases, Vol. 8, No. 11, November 2002, pages 1347–1349. Dispatch. DOI: 10.3201/eid0811.020125

Global Distribution of Mycobacterium tuberculosis Spoligotypes

Ingrid Filliol, Jeffrey R. Driscoll, Dick van Soolingen, Barry N. Kreiswirth, Kristin Kremer, Georges Valétudie, Dang Duc Anh, Rachael Barlow, Dilip Banerjee, Pablo J. Bifani, Karin Brudey, Angel Cataldi, Robert C. Cooksey, Debby V. Cousins, Jeremy W. Dale, Odir A. Dellagostin, Francis Drobniewski, Guido Engelmann, Séverine Ferdinand, Deborah Gascoyne-Binzi, Max Gordon, M. Cristina Gutierrez, Walter H. Haas, Herre Heersma, Gunilla Källenius, Eric Kassa-Kelembho, Tuija Koivula, Ho Minh Ly, Athanasios Makristathis, Caterina Mammina, Gerald Martin, Peter Moström, Igor Mokrousov, Valérie Narbonne, Olga Narvskaya, Antonino Nastasi, Sara Ngo Niobe-Eyangoh, Jean W. Pape, Voahangy Rasolofo-Razanamparany, Malin Ridell, M. Lucia Rossetti, Fritz Stauffer, Philip N. Suffys, Howard Takiff, Jeanne Texier-Maugein, Véronique Vincent, Jacobus H. de Waard, Christophe Sola, Nalin Rastogi

Affiliations: Institut Pasteur, Pointe-à-Pitre, Guadeloupe, French West Indies; Wadsworth Center, Albany, New York, USA; National Institute of Public Health and the Environment, Bilthoven, the Netherlands; Public Health Research Institute, New York, New York, USA; National Institute of Hygiene and Epidemiology, Hanoi, Vietnam; General Infirmary, Leeds, U.K.; St. George's Hospital Medical School, London, U.K.; Instituto de Biotecnologia, Castelar, Argentina; Centers for Disease Control and Prevention, Atlanta, Georgia, USA; Australian Reference Laboratory for Bovine Tuberculosis, Department of Agriculture, South Perth, Australia; University of Surrey, Guildford, Surrey, U.K.; Universidade Federal, Pelotas, Brazil; Public Health Laboratory Service, Dulwich Hospital, London, U.K.; University Children's Hospital, Heidelberg, Germany; Institut Pasteur, Paris, France; Robert Koch Institute, Berlin, Germany; Swedish Institute for Infectious Disease Control, Solna, Sweden; Institut Pasteur, Bangui, Central African Republic; Hygiene-Institut der Universität, Wien, Austria; University of Palermo, Palermo, Italy; Bundesinstitut für gesundheitlichen Verbraucherschutz und Veterinärmedizin, Jena, Germany; Pasteur Institute of Saint Petersburg, Saint Petersburg, Russia; Centre Hospitalier Universitaire, Brest, France; University of Firenze, Firenze, Italy; Les Centres Gheskio, Institut National de Laboratoire et de Recherche, Port-au-Prince, Haïti; Cornell University, Ithaca, New York, USA; Institut Pasteur, Tananarive, Madagascar; Göteborg University, Göteborg, Sweden; Universidade Federal do Rio Grande do Sul, Brazil; Bundesstaatliche bakteriologisch-serologische Untersuchungsanstalt Wien, Austria; Oswaldo Cruz Institute, Rio de Janeiro, Brazil; Caracas, Venezuela; Centre Hospitalier Universitaire, Bordeaux, France; Instituto de Investigaciones Cientificas, Caracas, Venezuela

Address for correspondence: Christophe Sola and Nalin Rastogi, Unité de la Tuberculose et des Mycobactéries, Institut Pasteur de Guadeloupe, BP 484, 97165 Pointe-à-Pitre Cedex, Guadeloupe, French West Indies; fax: 590(0)590-893880; e-mails: csola@pasteur.gp; rastogi@pasteur.gp

We present a short summary of recent observations on the global distribution of the major clades of the Mycobacterium tuberculosis complex, the causative agent of tuberculosis. This global distribution was defined by data mining of an international spoligotyping database, SpolDB3, which contains 13,008 patterns from as many clinical isolates originating from more than 90 countries. Of these, 11,708 spoligotypes were clustered into 813 shared types, and 1,300 orphan patterns (clinical isolates showing a unique spoligotype) were detected.

Keywords: Mycobacterium tuberculosis, spoligotyping

Since the publication of the second version of our spoligotype database on Mycobacterium tuberculosis (1), the causative agent of tuberculosis (TB), the proportion of clustered isolates (i.e., isolates belonging to shared types [STs]) has increased from 84% (2,779/3,319) to 90% (11,708/13,008). Fifty percent of the clustered isolates belong to only 20 STs. Three of these STs (ST 481, 482, and 683) correspond to M. bovis, including M. bovis BCG. Adding the next 30 most frequent STs raises the proportion of clustered isolates covered from 50% to 65%.
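The clustering statistics above rest on a simple distinction: a spoligotype seen in two or more isolates forms a shared type, and a pattern seen only once is an orphan. The following Python sketch is a minimal illustration of that bookkeeping, not the procedure actually used to build SpolDB3; it counts shared types and orphans in a list of spoligotype codes and computes the share of clustered isolates covered by the k most frequent STs. The function names and the toy input are hypothetical.

```python
from collections import Counter


def summarize_clusters(patterns):
    """Split isolates into shared types (pattern in >= 2 isolates) and orphans (pattern in 1)."""
    counts = Counter(patterns)
    shared = {pattern: n for pattern, n in counts.items() if n >= 2}
    clustered = sum(shared.values())
    return {
        "isolates": len(patterns),
        "shared_types": len(shared),
        "clustered_isolates": clustered,
        "orphans": sum(1 for n in counts.values() if n == 1),
        "clustered_fraction": clustered / len(patterns) if patterns else 0.0,
    }


def top_k_coverage(patterns, k):
    """Fraction of clustered isolates that fall in the k most frequent shared types."""
    sizes = sorted((n for n in Counter(patterns).values() if n >= 2), reverse=True)
    total = sum(sizes)
    return sum(sizes[:k]) / total if total else 0.0


# Hypothetical toy data: octal spoligotype codes, one entry per isolate.
patterns = (
    ["000000000003771"] * 4      # Beijing prototype
    + ["777777777760771"] * 3    # T1 (ST 53)
    + ["777777777720771"] * 2    # Haarlem3 (ST 50)
    + ["774077777777071", "074000030000600"]  # two singletons, i.e. orphans
)

summary = summarize_clusters(patterns)
print(summary["shared_types"], summary["orphans"], round(summary["clustered_fraction"], 2))  # 3 2 0.82
print(round(top_k_coverage(patterns, 1), 2))  # 0.44
```

Applied to the full database, the same ratio of clustered isolates to all isolates gives 11,708/13,008, or about 90%.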

A total of 36 potential subfamilies or subclades of the M. tuberculosis complex have been tentatively identified, leading to the definition of major and minor visual recognition rules (Table). The ancestral East African-Indian (EAI) family comprises at least five main subclades, whereas at least three major spoligotyping patterns are found within the Haarlem family (2). Two families found in central Asia and the Middle East (CAS1 and CAS2) are newly defined. The X family (3) is also currently split into at least three well-defined subclades. However, the subdivision of family T (T1–T4, likely to represent relatively old genotypes), whose subclades are variants of the classic ST 53 pattern (all spacers present except 33–36; ST 53 itself is T1), remains poorly defined. Similarly, the Latino-American and Mediterranean (LAM) family is tentatively split into subclades LAM1–LAM10 (4). Spoligotyping alone is not well suited to studying the phylogeny of these two clades (T and LAM); such studies will require results from other genotyping methods, such as IS6110 restriction fragment length polymorphism (5) or mycobacterial interspersed repetitive unit–variable number tandem repeat typing (6). Among the well-characterized major clades of tubercle bacilli, four families represent 35% of the 11,708 clustered isolates (Beijing 11%, LAM 9.3%, Haarlem 7.5%, and X 7%).
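The visual recognition rules listed in the Table footnote (rules A–F) are signature sets of spacers that must be absent (and, for rule A, one spacer that must be present). A minimal Python sketch of checking them is shown below, assuming a spoligotype written as 43 characters of 1 (spacer present) and 0 (spacer absent); the helper names are hypothetical. Satisfying a rule only narrows the candidate families (every pattern matching rule B also matches rule F, for instance), so this is not a complete family classifier.

```python
N_SPACERS = 43


def spacers(first, last):
    """Inclusive, 1-based set of spacer numbers, e.g. spacers(33, 36) -> {33, 34, 35, 36}."""
    return set(range(first, last + 1))


# Rules transcribed from the Table footnote:
# (spacers that must be absent, spacers that must be present).
RULES = {
    "A": (spacers(29, 32) | {34}, {33}),
    "B": (spacers(21, 24) | spacers(33, 36), set()),
    "C": ({18} | spacers(33, 36), set()),
    "D": (spacers(39, 43), set()),
    "E": ({31} | spacers(33, 36), set()),
    "F": (spacers(33, 36), set()),
}


def pattern_without(*ranges):
    """Build a 0/1 spoligotype with every spacer present except the given inclusive ranges."""
    absent = set()
    for first, last in ranges:
        absent |= spacers(first, last)
    return "".join("0" if s in absent else "1" for s in range(1, N_SPACERS + 1))


def matching_rules(binary):
    """Return the names of all rules satisfied by a 43-character 0/1 spoligotype."""
    if len(binary) != N_SPACERS or set(binary) - {"0", "1"}:
        raise ValueError("expected 43 characters of '0'/'1'")
    present = {i + 1 for i, c in enumerate(binary) if c == "1"}
    return [
        name
        for name, (must_be_absent, must_be_present) in RULES.items()
        if not (must_be_absent & present) and must_be_present <= present
    ]


print(matching_rules(pattern_without((21, 24), (33, 36))))  # LAM9 prototype: ['B', 'F']
print(matching_rules(pattern_without((18, 18), (33, 36))))  # X1 prototype: ['C', 'F']
```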

Table. Excerpt from the SpolDB3 database showing prototype spoligotypes, visual recognition rules, and binary and octal descriptionsᵃ

| Rk | ST | Classᵇ | Total (n)ᶜ | Rulesᵈ | Binary description | Octal |
|---|---|---|---|---|---|---|
| 1 | 1 | Beijing | 1282 | ∆1–34 | □□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□■■■■■■■■■ | 000000000003771 |
| 2 | 53 | T1 | 864 | F | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■□□□□■■■■■■■ | 777777777760771 |
| 11 | 52 | T2 | 163 | ∆40 and F | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■□□□□■■■□■■■ | 777777777760731 |
| 30 | 37 | T3 | 71 | ∆13 and F | ■■■■■■■■■■■■□■■■■■■■■■■■■■■■■■■■□□□□■■■■■■■ | 777737777760771 |
| 64 | 40 | T4 | 26 | ∆19 and F | ■■■■■■■■■■■■■■■■■■□■■■■■■■■■■■■■□□□□■■■■■■■ | 777777377760771 |
| 7 | 47 | Haarlem1 | 246 | ∆26–30 and E | ■■■■■■■■■■■■■■■■■■■■■■■■■□□□□□□■□□□□■■■■■■■ | 777777774020771 |
| 20 | 2 | Haarlem2 | 104 | ∆1–24, ∆26–30 and E | □□□□□□□□□□□□□□□□□□□□□□□□■□□□□□□■□□□□■■■■■■■ | 000000004020771 |
| 3 | 50 | Haarlem3 | 519 | E | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■□■□□□□■■■■■■■ | 777777777720771 |
| 6 | 119 | X1 | 310 | C | ■■■■■■■■■■■■■■■■■□■■■■■■■■■■■■■■□□□□■■■■■■■ | 777776777760771 |
| 4 | 137 | X2 | 427 | C and ∆39–42 | ■■■■■■■■■■■■■■■■■□■■■■■■■■■■■■■■□□□□■■□□□□■ | 777776777760601 |
| 31 | 92 | X3 | 70 | ∆4–12 and C | ■■■□□□□□□□□□■■■■■□■■■■■■■■■■■■■■□□□□■■■■■■■ | 700076777760771 |
| 15 | 48 | EAI1 | 118 | A and ∆40 | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■□□□□■□■■■■■□■■■ | 777777777413731 |
| 13 | 19 | EAI2 | 130 | ∆3, ∆20–21 and A | ■■□■■■■■■■■■■■■■■■■□□■■■■■■■□□□□■□■■■■■■■■■ | 677777477413771 |
| 16 | 11 | EAI3 | 121 | ∆2–3, A and ∆37–39 | ■□□■■■■■■■■■■■■■■■■■■■■■■■■■□□□□■□■■□□□■■■■ | 477777777413071 |
| 8 | 139 | EAI4 | 234 | ∆26–27 and A | ■■■■■■■■■■■■■■■■■■■■■■■■■□□■□□□□■□■■■■■■■■■ | 777777774413771 |
| 46 | 236 | EAI5 | 41 | A | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■□□□□■□■■■■■■■■■ | 777777777413771 |
| 24 | 181 | Afri1 | 91 | ∆7–9 and ∆39 | ■■■■■■□□□■■■■■■■■■■■■■■■■■■■■■■■■■■■□■■■■ | 770777777777671 |
| ND | 331 | Afri2 | 9 | ∆8–12, ∆21–24 and ∆37–39 | ■■■■■■■□□□□□■■■■■■■■□□□□■■■■■■■■■■■■□□□■■■■ | 774077607777071 |
| ND | 438 | Afri3 | 3 | ∆8–12 and ∆37–39 | ■■■■■■■□□□□□■■■■■■■■■■■■■■■■■■■■■■■■□□□■■■■ | 774077777777071 |
| 17 | 482 | M. bovis-BCG | 26 | ∆3, ∆9, ∆16 and D | ■■□■■■■■□■■■■■■□■■■■■■■■■■■■■■■■■■■■□□□□□ | 676773777777600 |
| ND | 641 | M. microti | 8 | 4–7, 23–24, 37–38 | □□□■■■■□□□□□□□□□□□□□□□■■□□□□□□□□□□□□■■□□□□□ | 074000030000600 |
| ND | 592 | M. canettii | 6 | 30 and 36 | □□□□□□□□□□□□□□□□□□□□□□□□□□□□□■□□□□□■□□□□□□□ | 000000000101000 |
| 21 | 26 | CAS1 | 102 | ∆4–7, ∆23–34 | ■■■□□□□■■■■■■■■■■■■■■■□□□□□□□□□□□□■■■■■■■■■ | 703777740003771 |
| ND | 288 | CAS2 | 6 | ∆4–10, ∆23–34 | ■■■□□□□□□□■■■■■■■■■■■■□□□□□□□□□□□□■■■■■■■■■ | 700377740003771 |
| 12 | 20 | LAM1 | 152 | ∆3 and B | ■■□■■■■■■■■■■■■■■■■■□□□□■■■■■■■■□□□□■■■■■■■ | 677777607760771 |
| 22 | 17 | LAM2 | 92 | ∆3, ∆13 and B | ■■□■■■■■■■■■□■■■■■■■□□□□■■■■■■■■□□□□■■■■■■■ | 677737607760771 |
| 19 | 33 | LAM3 | 108 | ∆9–11 and B | ■■■■■■■■□□□■■■■■■■■■□□□□■■■■■■■■□□□□■■■■■■■ | 776177607760771 |
| 49 | 60 | LAM4 | 37 | ∆40 and B | ■■■■■■■■■■■■■■■■■■■■□□□□■■■■■■■■□□□□■■■□■■■ | 777777607760731 |
| 42 | 93 | LAM5 | 44 | ∆13 and B | ■■■■■■■■■■■■□■■■■■■■□□□□■■■■■■■■□□□□■■■■■■■ | 777737607760771 |
| 37 | 64 | LAM6 | 47 | ∆29 and B | ■■■■■■■■■■■■■■■■■■■■□□□□■■■■□■■■□□□□■■■■■■■ | 777777607560771 |
| 36 | 41 | LAM7 | 48 | ∆20, ∆26–27 and B | ■■■■■■■■■■■■■■■■■■■□□□□□■□□■■■■■□□□□■■■■■■■ | 777777404760771 |
| NA | 290 | LAM8 | 9 | ∆27 and B | ■■■■■■■■■■■■■■■■■■■■□□□□■■□■■■■■□□□□■■■■■■■ | 777777606760771 |
| 5 | 42 | LAM9ᵉ | 344 | B | ■■■■■■■■■■■■■■■■■■■■□□□□■■■■■■■■□□□□■■■■■■■ | 777777607760771 |
| 9 | 61 | LAM10 | 202 | ∆23–25 and F | ■■■■■■■■■■■■■■■■■■■■■■□□□■■■■■■■□□□□■■■■■■■ | 777777743760771 |
| 26 | 34 | Sᶠ | 82 | ∆9–10 and F | ■■■■■■■■□□■■■■■■■■■■■■■■■■■■■■■■□□□□■■■■■■■ | 776377777760771 |
| 28 | 451 | H37Rv | 78 | ∆20–21 and F | ■■■■■■■■■■■■■■■■■■■□□■■■■■■■■■■■□□□□■■■■■■■ | 777777477760771 |

ᵃRk, ranking no.; ND, not done; ST, shared type (arbitrary designation); M., Mycobacterium.
ᵇClass: family definition. See text for the definitions of the family acronyms.
ᶜTotal (n), number of isolates in the class. Binary and octal: spoligotype descriptions; in the binary description, ■ denotes a spacer that is present and □ a spacer that is absent.
ᵈRule A, absence of spacers 29–32, presence of spacer 33, and absence of spacer 34; rule B, absence of spacers 21–24 and 33–36; rule C, absence of spacer 18 and spacers 33–36; rule D, absence of spacers 39–43; rule E, absence of spacer 31 and spacers 33–36; rule F, absence of spacers 33–36. Clades defined with a small sample size, such as Afri2, Afri3, CAS2, and LAM8, are subject to change.
ᵉFormerly LAM1.
ᶠFormerly LAM2.
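The octal description in the Table is a compact re-encoding of the binary one. Reading the 43 spacers in order, spacers 1–42 form 14 groups of three, each written as one octal digit, and spacer 43 is appended as a final 0 or 1, which gives the 15-digit codes shown (the Beijing prototype, with spacers 1–34 absent, becomes 000000000003771). The Python sketch below implements that conversion in both directions, assuming the binary pattern is written as a string of 1 (spacer present) and 0 (spacer absent); the function names are hypothetical.

```python
N_SPACERS = 43


def binary_to_octal(binary: str) -> str:
    """Encode a 43-spacer 0/1 spoligotype as its 15-digit octal code."""
    if len(binary) != N_SPACERS or set(binary) - {"0", "1"}:
        raise ValueError("expected 43 characters of '0'/'1'")
    # Spacers 1-42: 14 groups of three bits, each read as one octal digit.
    digits = [str(int(binary[i:i + 3], 2)) for i in range(0, 42, 3)]
    # Spacer 43 is carried over unchanged as the 15th digit.
    return "".join(digits) + binary[42]


def octal_to_binary(octal: str) -> str:
    """Decode a 15-digit octal spoligotype code back to its 43-character 0/1 form."""
    if len(octal) != 15 or set(octal[:14]) - set("01234567") or octal[14] not in "01":
        raise ValueError("expected 14 octal digits followed by 0 or 1")
    bits = "".join(format(int(d, 8), "03b") for d in octal[:14])
    return bits + octal[14]


beijing = "0" * 34 + "1" * 9  # spacers 1-34 absent, 35-43 present
print(binary_to_octal(beijing))  # 000000000003771
print(binary_to_octal(octal_to_binary("074000030000600")))  # 074000030000600 (round trip)
```

Because each octal digit maps to exactly three spacers, the two forms carry the same information, and the octal form is easier to compare and sort.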

The global distribution of the most frequently observed spoligotypes by continent in SpolDB3 is as follows. Among the patterns originating in North America (n = 4,276, 32% of the total number of isolates in the database), 16% of the strains are of the Beijing type, 14% belong to ST 137 or ST 119 (X family), and 8% are unique (results not shown). In Central America (n = 587, 4.5%), 8% of the strains belong to the ubiquitous ST 53, 7% are ST 50, and 6% are ST 2; the last two STs are part of the Haarlem family. In South America (n = 861, 6.6%), ST 53 and ST 50 account for 10% and 9% of the spoligotypes, respectively, whereas ST 42 (LAM9) accounts for as much as 9% of the total isolates. The origin of ST 42 remains to be established. In Africa (n = 1,432, 11%), ST 59 and ST 53 account for 9% of all isolates studied thus far; however, the values obtained for ST 59 are biased because strains from Zimbabwe are overrepresented. M. africanum ST 181 also accounts for as much as 6% of all spoligotypes from Africa in our sample.

In Europe (n = 4,360, 33.5%), ST 53 represents as much as 9% of the spoligotypes, ST 50 and ST 47 (Haarlem family) represent 8% of the cases, and the Beijing family accounts for 4% of the spoligotypes. In the Middle East and central Asia, where sampling is still very limited (n = 351, 2.7%), a high diversity of strains within the EAI and CAS families has been observed, and no single pattern currently exceeds 5%. Further studies of isolates from these regions are needed, e.g., in India, where our sampling is still anecdotal (n = 44 isolates). Despite the scarcity of data, the observed diversity suggests that this region may be of great interest for studying the genetic variation of tubercle bacilli. In contrast, the Far East Asian region (n = 801, 6.1%) is dominated by a single genotype, the Beijing family, which has been linked to emerging multidrug resistance (7); half of the strains in the Far East are of the Beijing type. In Oceania (n = 340, 2.6%), ST 19 and Beijing account for 15% and 13% of clustered isolates, respectively. This preliminary analysis of the spoligotype distribution in SpolDB3 thus shows major differences in the population structure of tubercle bacilli among the eight geographic regions studied (Africa, Europe, North America, Central America, South America, the Middle East and central Asia, Far East Asia, and Oceania).
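The regional sample sizes quoted above add up to the 13,008 isolates in SpolDB3, and each regional percentage is that region's share of this total. A short Python check of the arithmetic, using only the counts given in the text (the dictionary name is illustrative):

```python
# Isolates per region, as reported in the text.
REGION_COUNTS = {
    "North America": 4276,
    "Central America": 587,
    "South America": 861,
    "Africa": 1432,
    "Europe": 4360,
    "Middle East and central Asia": 351,
    "Far East Asia": 801,
    "Oceania": 340,
}

total = sum(REGION_COUNTS.values())
shares = {region: 100 * n / total for region, n in REGION_COUNTS.items()}

print(f"total isolates: {total}")
# List regions from largest to smallest share.
for region, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{region:<30} n = {REGION_COUNTS[region]:>5}  {pct:5.1f}%")
```

The computed shares agree with the figures in the text to within one percentage point (for example, Europe 33.5% and Central America 4.5%).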

At present, SpolDB3 is an experimental tool that has yet to prove its usefulness in tracking epidemics. Nevertheless, the ease with which matching spoligotypes can be detected suggests that it may be a good screening tool for population-based studies of recent TB transmission. Indeed, finding that an isolate belongs to an ST that is rare in SpolDB3 may prompt researchers to check whether the isolates sharing it are clonal and to study their epidemiologic relatedness.
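The screening idea above amounts to an exact-match lookup: count how many database isolates share the query's spoligotype and flag patterns that are unmatched, orphan, or rare. The Python sketch below illustrates this under stated assumptions; the function name, the category labels, and the cutoff of five isolates for a "rare" shared type are hypothetical choices rather than SpolDB3 conventions, and a match only prompts, rather than establishes, a clonal or epidemiologic link.

```python
from collections import Counter


def screen_isolate(query, counts, rare_cutoff=5):
    """Classify a query spoligotype by how many database isolates share it exactly.

    rare_cutoff is an illustrative threshold, not a published SpolDB3 value.
    """
    n = counts.get(query, 0)
    if n == 0:
        label = "no match"
    elif n == 1:
        label = "matches an orphan"   # query plus the existing isolate would form a new shared type
    elif n <= rare_cutoff:
        label = "rare shared type"    # candidate for clonality and epidemiologic follow-up
    else:
        label = "common shared type"
    return n, label


# Hypothetical toy database of octal codes, one entry per isolate.
database = ["000000000003771"] * 8 + ["777777607760771"] * 3 + ["774077777777071"]
counts = Counter(database)

for code in ["000000000003771", "777777607760771", "774077777777071", "074000030000600"]:
    print(code, *screen_isolate(code, counts))
```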

Protocols for exchanging data over computer networks will also be implemented in the near future. Working groups such as the European Network for Exchange of Molecular Typing Information (available from: URL: www.rivm.nl/enemti) are coordinating such initiatives. The expanded use of the BioNumerics software (third upgrade; Applied Maths, St. Martens-Latem, Belgium) may also foster this research field. SpolDB3 will also help clarify the driving forces that shape the evolution of tubercle bacilli. Further research should now emphasize data-mining methods, combined with expert knowledge, to tackle the complex dynamics of the population genetics of tubercle bacilli and of TB transmission (3). Our sample is a compilation of many national studies and, as such, should be considered an ongoing population-based project aimed at studying the global genetic diversity of TB. Nevertheless, a more precise and representative snapshot of the genetic variability of the M. tuberculosis complex will require larger sampling. Although only partially representative of worldwide spoligotypes of the M. tuberculosis complex, SpolDB3 is a reservoir of genetic information that has already proved useful for defining phylogenetic links among TB genomes and for constructing theoretical models of genome evolution. Much remains to be done to evaluate the potential of global genetic databases to better characterize casual contacts (which could lead to the identification of sporadic cases) in TB epidemiology. An improved version of our database, which will focus on areas with a high prevalence of TB, is currently in development; as of August 26, 2002, it contained 20,000 isolates and 3,000 alleles. Ongoing population-based genotyping projects will likely help shed light on the contemporary and ancient evolutionary history of tubercle bacilli.

Suggested citation for this article: Filliol I, Driscoll JR, van Soolingen D, Kreiswirth BN, Kremer K, Valétudie G, et al. Global distribution of Mycobacterium tuberculosis spoligotypes. Emerg Infect Dis [serial online] 2002 Nov [date cited]. Available from http://www.cdc.gov/ncidod/EID/vol8no11/02-0125.htm

This paper was written as part of the EU Concerted Action project QLK2-CT-2000-00630 and partly supported by the Réseau International des Instituts Pasteur et Instituts Associés, Institut Pasteur, and Fondation Française Raoul Follereau, France. An electronic, simplified version of SpolDB3 is available from the corresponding authors upon request.

Dr. Filliol performed this work as part of her doctoral thesis. She has been working at the Institut Pasteur de Guadeloupe for the last 4 years. Her research focuses on molecular epidemiology and phylogeny of tubercle bacilli.

References

1. Sola C, Filliol I, Gutierrez MC, Mokrousov I, Vincent V, Rastogi N. Spoligotype database of Mycobacterium tuberculosis: biogeographical distribution of shared types and epidemiologic and phylogenetic perspectives. Emerg Infect Dis. 2001;7:390–6. PMID: 11384514
2. Kremer K, van Soolingen D, Frothingham R, Haas WH, Hermans PWM, Martin C, et al. Comparison of methods based on different molecular epidemiological markers for typing of Mycobacterium tuberculosis strains: interlaboratory study of discriminatory power and reproducibility. J Clin Microbiol. 1999;37:2607–18. PMID: 10405410
3. Sebban M, Mokrousov I, Rastogi N, Sola C. A data-mining approach to spacer oligonucleotide typing of Mycobacterium tuberculosis. Bioinformatics. 2002;18:235–43. doi: 10.1093/bioinformatics/18.2.235. PMID: 11847071
4. Sola C, Filliol I, Legrand E, Mokrousov I, Rastogi N. Mycobacterium tuberculosis phylogeny reconstruction based on combined numerical analysis with IS1081, IS6110, VNTR, and DR-based spoligotyping suggests the existence of two new phylogeographical clades. J Mol Evol. 2001;53:680–9. doi: 10.1007/s002390010255. PMID: 11677628
5. van Embden JDA, Cave MD, Crawford JT, Dale JW, Eisenach KD, Gicquel B, et al. Strain identification of Mycobacterium tuberculosis by DNA fingerprinting: recommendations for a standardized methodology. J Clin Microbiol. 1993;31:406–9. PMID: 8381814
6. Supply P, Lesjean S, Savine E, Kremer K, van Soolingen D, Locht C. Automated high-throughput genotyping for the study of global epidemiology of Mycobacterium tuberculosis based on mycobacterial interspersed repetitive units. J Clin Microbiol. 2001;39:3563–71. doi: 10.1128/JCM.39.10.3563-3571.2001. PMID: 11574573
7. Glynn JR, Whiteley J, Bifani PJ, Kremer K, van Soolingen D. Worldwide occurrence of Beijing/W strains of Mycobacterium tuberculosis: a systematic review. Emerg Infect Dis. 2002;8:843–9. PMID: 12141971