mBioMBiombiombiomBiomBio2150-7511American Society of Microbiology1752 N St., N.W., Washington, DC250284294161256mBio01464-1410.1128/mBio.01464-14Research ArticleCryptococcus gattii in North American Pacific Northwest: Whole-Population Genome Analysis Provides Insights into Species Evolution and DispersalTracing C. gattii Evolution and Dispersal to PNWEngelthalerDavid M.aHicksNathan D.aGilleceJohn D.aRoeChandler C.aSchuppJames M.aDriebeElizabeth M.aGilgadoFelixbCarricondeFabianb,cTrillesLucianab,dFiracativeCarolinab,eNgamskulrungrojPopchaib,fCastañedaElizabetheLazeraMarcia dos SantosdMelhemMarcia S. C.gPérez-BercoffÅsab,hHuttleyGavinhSorrellTania C.bVoelzKerstini,jMayRobin C.i,jFisherMatthew C.kThompsonGeorge R.IIIlLockhartShawn R.mKeimPaula,nMeyerWielandbTranslational Genomics Research Institute, Flagstaff, Arizona, USAMolecular Mycology Research Laboratory, Center for Infectious Diseases and Microbiology, Sydney Medical School, Westmead Hospital, Marie Bashir Institute for Infectious Diseases and Biosecurity, The University of Sydney, Westmead Millennium Institute, Sydney, AustraliaInstitut Agronomique Néo-Calédonien (IAC), Noumea, New CaledoniaLaboratório de Micologia, Instituto de Pesquisa Clínica Evandro Chagas, FIOCRUZ, Rio de Janeiro, BrazilMicrobiology Group, Instituto Nacional de Salud, Bogotá, ColombiaDepartment of Microbiology, Faculty of Medicine, Siriraj Hospital, Mahidol University, Bangkok, ThailandPublic Health Reference Laboratory, Institute Adolfo Lutz, São Paulo, SP, BrazilJohn Curtin School of Medical Research, Australian National University, Canberra, AustraliaInstitute of Microbiology and Infection and School of Biosciences, University of Birmingham, Birmingham, United KingdomNational Institute for Health Research Surgical Reconstruction and Microbiology Research Centre, Queen Elizabeth Hospital, Birmingham, United KingdomDepartment of Infectious Disease Epidemiology, Imperial College London, London, United KingdomUniversity of California, Davis, Davis, California, USACenters for Disease Control and Prevention, Atlanta, Georgia, USAMicrobial Genetics and Genomics Center, Northern Arizona University, Flagstaff, Arizona, USAAddress correspondence to Paul Keim (population genetics and genomics questions), paul.keim@nau.edu, or Wieland Meyer (mycology and molecular epidemiology questions), wieland.meyer@sydney.edu.au.

D.M.E. and N.D.H. contributed equally to this article.

Editor Arturo Casadevall, Albert Einstein College of Medicine

This article is a direct contribution from a member of the American Academy of Microbiology.

1572014Jul-Aug201454e01464-1415620141662014Copyright © 2014 Engelthaler et al.2014Engelthaler et al.This is an open-access article distributed under the terms of the Creative Commons Attribution-Noncommercial-ShareAlike 3.0 Unported license, which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original author and source are credited.ABSTRACT

The emergence of distinct populations of Cryptococcus gattii in the temperate North American Pacific Northwest (PNW) was surprising, as this species was previously thought to be confined to tropical and semitropical regions. Beyond a new habitat niche, the dominant emergent population displayed increased virulence and caused primary pulmonary disease, as opposed to the predominantly neurologic disease seen previously elsewhere. Whole-genome sequencing was performed on 118 C. gattii isolates, including the PNW subtypes and the global diversity of molecular type VGII, to better ascertain the natural source and genomic adaptations leading to the emergence of infection in the PNW. Overall, the VGII population was highly diverse, demonstrating large numbers of mutational and recombinational events; however, the three dominant subtypes from the PNW were of low diversity and were completely clonal. Although strains of VGII were found on at least five continents, all genetic subpopulations were represented or were most closely related to strains from South America. The phylogenetic data are consistent with multiple dispersal events from South America to North America and elsewhere. Numerous gene content differences were identified between the emergent clones and other VGII lineages, including genes potentially related to habitat adaptation, virulence, and pathology. Evidence was also found for possible gene introgression from Cryptococcus neoformans var. grubii that is rarely seen in global C. gattii but that was present in all PNW populations. These findings provide greater understanding of C. gattii evolution in North America and support extensive evolution in, and dispersal from, South America.

IMPORTANCE

Cryptococcus gattii emerged in the temperate North American Pacific Northwest (PNW) in the late 1990s. Beyond a new environmental niche, these emergent populations displayed increased virulence and resulted in a different pattern of clinical disease. In particular, severe pulmonary infections predominated in contrast to presentation with neurologic disease as seen previously elsewhere. We employed population-level whole-genome sequencing and analysis to explore the genetic relationships and gene content of the PNW C. gattii populations. We provide evidence that the PNW strains originated from South America and identified numerous genes potentially related to habitat adaptation, virulence expression, and clinical presentation. Characterization of these genetic features may lead to improved diagnostics and therapies for such fungal infections. The data indicate that there were multiple recent introductions of C. gattii into the PNW. Public health vigilance is warranted for emergence in regions where C. gattii is not thought to be endemic.

cover-dateJuly/August 2014
INTRODUCTION

Cryptococcus gattii, an environmental basidiomycetous yeast and the primary cause of cryptococcosis in immunocompetent patients, has been previously primarily associated with central nervous system disease in tropical and subtropical regions of the world (1). This distribution and disease phenotype are in contrast with those of another causative agent of cryptococcosis, Cryptococcus neoformans, which is primarily an opportunistic pathogen of immunosuppressed patients and is distributed worldwide. In addition, C. neoformans is classically found in bird excreta and decaying wood from a wide variety of trees (24), whereas C. gattii is primarily found growing in tree bark, tree detritus, and decaying wood. C. gattii has been closely associated with a number of trees, notably Eucalyptus species, which were thought to be the dominant environmental source allowing for transportation of C. gattii out of Australia where it is endemic to locales, such as California, Greece, and elsewhere (5, 6). It has since been shown to be endemic in South America where it is found in close association with a large number of tropical trees (2, 79). C. gattii is currently classified into four distinct populations, referred to previously as “major molecular types”: VGI (AFLP4), VGII (AFLP6), VGIII (AFLP5), and VGIV (AFLP7) (10), based on multilocus sequence typing (MLST), amplified fragment length polymorphism (AFLP), and other subgenomic genotyping methodologies. Each of these populations or molecular types can be divided further into distinct lineages or subtypes, as defined by AFLP profiling or MLST sequence typing, which identify individual clonal lineages within each molecular type.

In the late 1990s, infections caused by C. gattii molecular type VGII were reported on Vancouver Island in British Columbia, Canada (1113); until this time, reports of C. gattii infection were uncommon in North America (1). Two clonal subtypes within the VGII molecular type known as VGIIa (AFLP6A) and VGIIb (AFLP6B) were identified as the cause of the outbreak, which subsequently spread to the Canadian mainland (14) and other regions of the Pacific Northwest (PNW) (15, 16). Within a decade, an additional novel VGII subtype, VGIIc, appeared in the PNW, primarily in the states of Oregon and Washington (16). Given the genomic similarity of isolates within each subtype and finding only the α mating type locus, all three subtypes were thought to be distinct clones (17). While all three PNW subtypes of C. gattii display a shared novel biome preference (presence in a temperate versus tropical climate), only VGIIb maintains the classical clinical features of neurologic dominance (18). The VGIIa and VGIIc subtypes, on the other hand, share distinct novel phenotypes (dominantly pulmonary disease and increased virulence, both in vitro and in vivo) (18).

Given their geographic proximity, time frame of emergence, and genotypic and phenotypic similarity, it was proposed that VGIIa and VGIIc originated relatively recently from a common ancestor (15, 19). Several studies were conducted to better understand the genotypic and phylogenetic nature of the C. gattii strains associated with these outbreaks (11, 13, 15, 17). VGIIa and VGIIb strains have now been found in other non-North American locales, and genotyping studies have provided evidence that these strains emerged independently from a South American source (9). The VGIIc subtype has not been identified elsewhere thus far.

A number of studies have described in vitro and in vivo clinical phenotypic variances (i.e., virulence in animal model, clinical presentation, etc.) between the emergent North American strains (15, 18, 20). In animal models of infection, VGIIa and VGIIc strains have been shown to be more virulent than VGIIb and other C. gattii molecular types (9, 21). It is also clear that the virulence profile within subtype VGIIa does not remain consistent globally with isolates of lesser virulence being described (13, 21; C. Firacative and W. Meyer, unpublished data).

Niche adaptation and phenotypic changes in pathogenic fungi are known to happen through a series of evolutionary mechanisms, including gene duplication, sexual recombination, base mutation, and to some degree, horizontal gene transfer (22). The potential genetic causes for the phenotypic and niche differences in the PNW C. gattii strains are not well understood, although potential contributing factors have been deduced from targeted expression analyses (20, 23). Whole-genome analyses provide a more comprehensive view of phylogenetic relationships and genomic evolution. Here we apply population-level whole-genome approach with the following aims: (i) to more thoroughly understand the population genomics of the C. gattii VGII molecular type; (ii) to explore the emergence of its subtypes in North America; and (iii) to identify possible genetic causes for its changing phenotype.

RESULTSWhole-genome sequencing and SNP analysis.

Whole genomes from 118 Cryptococcus gattii isolates, representing both the Pacific Northwest (PNW) subtypes and the known global VGII molecular genotype diversity, along with isolates of each of the other major C. gattii molecular types (VGI, VGIII, and VGIV), were sequenced and assembled de novo, using data from multiple next-generation sequencers (all Illumina technology). We obtained at least 10× coverage across an average of 98.27% of the genome (mode, 98.64%; range, 90.63% to 99.43%) compared to the assembled R265 reference (VGIIa), with an average depth of 111× (range, 24× to 532×). The average assembly size was ~17.55 Mb (range, 17.36 to 17.83 Mb).

We determined the presence of single nucleotide polymorphisms (SNPs) across and between subtypes. A total of 1,282,876 SNPs were identified from all genomes, including the VGI, VGII, VGIII, and VGIV molecular types; 544,881 SNPs were parsimony informative, that is they distinguished between more than one strain and were not autapomorphic (Fig. 1). Within the 115 VGII genomes, 310,969 SNPs were found, with 221,248 being parsimony informative (Fig. 2). Consistent with recently emerged and low-diversity clones, only 717, 246, and 528 SNPs were identified within the VGIIa, VGIIc, and VGIIb populations, respectively (Fig. 3, Fig. 4, and Fig. 5, respectively). A combined analysis of only the three PNW clones identified little to no homoplasy (data not shown), as previously described (17).

Phylogenetic analysis of C. gattii isolates representing the major molecular types VGI to VGIV. Maximum-parsimony phylogenetic analysis was performed on WGST SNP data from 35 C. gattii genomes (one each of VGI, VGIII, and VGIV molecular types and 32 genomes representing the diversity of the VGII subtypes). The analysis found 1,282,876 total SNPs with 544,881 of them being parsimonious SNPs, with a consistency index (CI) of 0.729. The tree shown is not rooted. Branch lengths represent numbers of SNPs between taxa, with the unit bar in the figure. Unless otherwise shown in red, all branches had bootstrap values of 100 with 500 generations. The taxon nomenclature includes a unique strain identifier and a two-letter country or state abbreviation for the geographic source of the original isolate (AR, Argentina; AU, Australia; AW, Aruba; BR, Brazil; CO, Colombia; DK, Denmark; GR, Greece; JP, Japan; MY, Malaysia; TH, Thailand; UY, Uruguay; VE, Venezuela; BC, British Columbia, Canada; ZA, South Africa; and for the United States, CA, California; FL, Florida; HI, Hawai’i; ID, Idaho; OR, Oregon; WA, Washington).

Phylogenetic network of C. gattii VGII. A neighbor net representation of the relationships among all VGII subtypes based on WGST SNP data, using the uncorrected p-distance transformation. Each band of parallel edges indicates a split. For example, the two splits that group ST33 with two distinct groups of other VGII subtypes are labeled with red A and B letters. The split joining VGIIc with three additional STs is labeled with a red C letter.

SNP phylogeny from whole genomes from VGIIa isolates. Maximum-parsimony phylogenetic analysis was performed on all isolate genomes within VGIIa. The VGIIa tree includes 717 SNPs (97 parsimony informative) from 17 isolates (CI = 1.0). For country and state abbreviations, see the legend to Fig. 1. Numbers in black on top of branches are SNP length, and numbers in red next to branches are the bootstrap values generated from 500 generations.

SNP phylogeny from whole genomes from VGIIc isolates. Maximum-parsimony phylogenetic analysis was performed on all isolate genomes within VGIIc. The VGIIc tree includes 246 SNPs (55 parsimony informative) from 25 isolates (CI = 1.0). For state abbreviations, see the legend to Fig. 1. Numbers in black on top of branches are SNP length, and numbers in red next to branches are the bootstrap values generated from 500 generations.

SNP phylogeny from whole genomes from VGIIb isolates. Maximum-parsimony phylogenetic analysis was performed on all isolate genomes within VGIIb. The VGIIb tree includes 528 SNPs (147 parsimony informative) from 17 isolates (CI = 1.0). For country or state abbreviations, see the legend to Fig. 1. Numbers in black on top of branches are SNP length, and numbers in red next to branches are the bootstrap values generated from 500 generations.

Genome sequence data analysis.

Population-level genome analyses were initiated with an SNP-based assessment of phylogenetics and population structure, followed by gene presence/absence comparisons to understand the genetic relationships between and among the VGII subtypes in the PNW and other geographically diverse subtypes. A detailed description of the data analysis flowchart is shown in Fig. S1 in the supplemental material.

Phylogenetics.

Isolates chosen for phylogenetic analysis were representative of the global diversity of C. gattii VGII based on their multilocus sequence type (MLST), including a number of isolates from each of the PNW lineages (VGIIa, VGIIb, and VGIIc). Additionally, isolates of the other three C. gattii populations were analyzed. Based on whole-genome SNP typing (WGST), the VGII strains were clearly separated from other molecular types (i.e., VGI, VGIII, and VGIV), with greater than 500,000 SNPs between VGII and any other major molecular type (Fig. 1).

Using maximum-parsimony analysis for WGST, the VGII isolates separated mostly along the lines of their MLST (Table 1; see Fig. S2 and S3 in the supplemental material). A number of isolates with distinct sequence types (STs) were clearly more related to each other and likely represent members of the same clone with single or limited nucleotide variation within the MLST markers (e.g., VGIIc includes ST6 and ST49). Multiple clonal lineages are seen in the whole VGII phylogeny, including the PNW lineages (Fig. 2 and Fig. S2), and the long terminal branches indicate substantial genetic distance between most lineages. The maximum-parsimony tree (Fig. S2) had a relatively low consistency index of 0.4, indicating either a high degree of homoplasy from multiple mutations at the same sites or that VGII is not evolving in a strictly tree-like manner. To better visualize the relationships between lineages, we developed a neighbor-joining phylogenetic split tree network, or “neighbor net” (Fig. 2). The network contained many phylogenetic splits, which could not be condensed into a single tree phylogeny. This finding may indicate that sexual recombination events between distinct subtypes are contributing to the genetic diversity of VGII, as these events will cause additional splits due to different regions of the genome being best described by different trees.

Cryptococcus gattii and C. neoformans isolates used in this studya

GenotypeStrainSTCountrySourceYr of isolationFigure inclusion
VGIWM 17951AustraliaClinical19931, 6, 7
VGIIaB686420USAClinical20042, 3, S2, S3
B739520USAVeterinary20082, 3, S2, S3
B742220USAVeterinary20092, 3, S2, S3
B743620USAVeterinary20092, 3, S2, S3
B746720USAVeterinary20092, 3, S2, S3
B855520USAClinical20062, 3, S2, S3
B857720CanadaEnvironmental20092, 3, S2, S3
B879320USAVeterinary20102, 3, S2, S3
B884920USAEnvironmental20102, 3, S2, S3
B945720USAClinical20112, 3, S2, S3
B945820USAClinical20112, 3, S2, S3
B975720USAEnvironmental20022, 3, S2, S3
B975920USAVeterinary20032, 3, S2, S3
CA-101420USAClinical?2, 3, S2, S3
CBS 775020USAEnvironmental19902, 3, S2, S3
R26520CanadaClinical20011, 2, 3, 6, 7, S2, S3
HL_A120CanadaVeterinary20022, 3, S2, S3
HL_A320CanadaEnvironmental20022, 3, S2, S3
HL_A1120CanadaClinical20012, 3, S2, S3
HL_B520CanadaClinical20022, 3, S2, S3
HL_B620DenmarkClinical20062, 3, S2, S3
NIH 44420USAClinical19752, 3, S2, S3
WM 03.69720CanadaVeterinary20012, 3, S2, S3
WM 05.43220Japan-BrazilClinical20002, 3, S2, S3
WM 05.55420BrazilClinical20022, 3, S2, S3
WM 06.1020ArgentinaClinical20002, 3, S2, S3
ICB-107252BrazilClinical19812, 3, S2, S3
VGIIbB73947USAVeterinary20082, 5, S2, S3, S4
B77357USAClinical20092, 5, S2, S3, S4
B85547USAVeterinary20082, 5, S2, S3, S4
B88287USAVeterinary20102, 5, S2, S3, S4
B91577USAVeterinary20112, 5, S2, S3, S4
B95527USAVeterinary20112, 5, S2, S3, S4
B95637USAVeterinary20111, 2, 5, 6, 7, S2, S3, S4
B95887USAClinical20122, 5, S2, S3, S4
B97587USAEnvironmental20022, 5, S2, S3, S4
CDC R2727CanadaClinical20012, 5, S2, S3, S4
WM 25527MalaysiaClinical19972, 5, S2, S3, S4
WM 03.277AustraliaEnvironmental19922, 5, S2, S3, S4
WM 04.717AustraliaVeterinary19912, 5, S2, S3, S4
WM 05.4657BrazilClinical19972, 5, S2, S3, S4
WM 06.6347ThailandClinical19942, 5, S2, S3, S4
WM 06.6367ThailandClinical19952, 5, S2, S3, S4
WM 04.7530ThailandClinical19932, 5, S2, S3, S4
VGIIcB68636USAClinical20052, 4, S2, S3
B74326USAClinical20092, 4, S2, S3
B74346USAClinical20082, 4, S2, S3
B74666USAVeterinary20082, 4, S2, S3
B74916USAClinical20092, 4, S2, S3
B74936USAVeterinary20092, 4, S2, S3
B76416USAVeterinary20082, 4, S2, S3
B77376USAClinical20092, 4, S2, S3
B77656USAVeterinary20092, 4, S2, S3
B82106USAClinical20082, 4, S2, S3
B82146USAClinical20092, 4, S2, S3
B85106USAClinical20092, 4, S2, S3
B85716USAClinical20092, 4, S2, S3
B87886USAClinical20102, 4, S2, S3
B87986USAClinical20052, 4, S2, S3
B88256USAClinical20101, 2, 4, S2, S3
B88336USAClinical20102, 4, S2, S3
B88386USAClinical20102, 4, S2, S3
B88436USAClinical20102, 4, S2, S3
B98166USAVeterinary20122, 4, S2, S3
B99336USAClinical20122, 4, S2, S3
GT-11-76506USAVeterinary?2, 4, S2, S3
HL_B36USAClinical20052, 4, S2, S3
HL_B46USAClinical20082, 4, S2, S3
B739049USAClinical20082, 4, S2, S3
VGIIB897350USAClinical20101, 2, 6, 7, S2, S3
B976421USAVeterinary20121, 2, 6, 7, S2, S3
IAL-3225264BrazilVeterinary19941, 2, 6, 7, S2, S3
IAL-3234137BrazilVeterinary19981, 2, 6, 7, S2, S3
IAL-3243283BrazilVeterinary20001, 2, 6, 7, S2, S3
HL_A228BrazilClinicalPre-20082, S2, S3
HL_A43UruguayEnvironmental19961, 2, 6, 7, S2, S3
HL_A5124BrazilEnvironmentalPre-20081, 2, 6, 7, S2, S3
HL_A625ArubaVeterinary19532, S2, S3
HL_A8185BrazilEnvironmentalPre-20082, S2, S3
HL_B2248BrazilVeterinary20001, 2, 6, 7, S2, S3
HL_B1135GreeceClinical19961, 2, 6, 7, S2, S3
HL_B1218GreeceClinical19982, S2, S3
WM 17821AustraliaClinical19912, S2, S3
WM 185025VenezuelaClinical19992, S2, S3
WM 185145VenezuelaClinical19991, 2, 6, 7, S2, S3
WM 303233AustraliaClinical19831, 2, 6, 7, S2, S3
WM 04.7831ColombiaClinical19981, 2, 6, 7, S2, S3
WM 04.8434BrazilClinical19861, 2, 6, 7, S2, S3
WM 05.27429ColombiaClinical20021, 2, 6, 7, S2, S3
WM 05.27525ColombiaClinical20011, 2, 6, 7, S2, S3
WM 05.33943ColombiaEnvironmental20052, S2, S3
WM 05.34225ColombiaEnvironmental20052, S2, S3
WM 05.41939BrazilClinical19881, 2, 6, 7, S2, S3
WM 05.45216BrazilClinical19951, 2, 6, 7, S2, S3
WM 05.45617BrazilEnvironmental19941, 2, 6, 7, S2, S3
WM 05.45719BrazilClinical19952, S2, S3
WM 05.46114BrazilClinical19972, S2, S3
WM 05.46215BrazilClinical19971, 2, 6, 7, S2, S3
WM 05.52541BrazilClinical19971, 2, 6, 7, S2, S3
WM 05.52728BrazilClinical19971, 2, 6, 7, S2, S3
WM 05.52840BrazilClinical20011, 2, 6, 7, S2, S3
WM 05.52927BrazilClinical19971, 2, 6, 7, S2, S3
WM 05.53013BrazilClinical19991, 2, 6, 7, S2, S3
WM 05.53311BrazilClinical19971, 2, 6, 7, S2, S3
WM 05.5369BrazilClinical19972, S2, S3
WM 05.54524BrazilClinical20012, S2, S3
WM 05.54623BrazilClinical20012, S2, S3
WM 05.54722BrazilClinical20011, 2, 6, 7, S2, S3
WM 06.1237VenezuelaClinical19962, S2, S3
WM 08.30948AustraliaVeterinary19971, 2, 6, 7, S2, S3
WM 08.3115AustraliaVeterinary19961, 2, 6, 7, S2, S3
WM 09.9438AustraliaVeterinary20011, 2, 6, 7, S2, S3
WM 09.15248AustraliaEnvironmental20092, S2, S3
WM 11.6533AustraliaVeterinary20112, S2, S3
VGIIIWM 11.139143USAVeterinary20111, 7
VGIVWM 77970South AfricaVeterinary19941, 7
Cryptococcus neoformans
VNIH992USAClinical19787
VNIVJEC21126USAClinical19977

The strain designation, major molecular type, MLST sequence type, and figures that the strains are shown on are indicated.

When the phi test for recombination was performed using the SNP data, the test indicated that recombination was present with a very high level of significance (P = 0.0). We then examined the phi value across the genome to see whether recombination could be localized to a particular subset of the genome. Using nonoverlapping windows of 500 concatenated SNP positions, we tested for recombination separately in 673 “windows” across the genome (encompassing variable genome window sizes due to the nonuniform density of SNPs in the genomes). Recombination was detected at a significant level in 631 of these windows when the P value cutoff was adjusted for making 673 comparisons (P < 7.2−5), indicating that the recombination signal was common across most of the genome and not localized to a single region.

Due to the evidence for recombination among isolates, a model-based Bayesian approach (24) for identifying population structure and individual assignment to populations was added to our phylogenetic analyses. fineStructure analysis placed the 32 distinct VGII subtypes, as identified by WGS and MLST (see Fig. S2 and S3 in the supplemental material), into 11 “populations” or groups related by shared genomic regions, with some of these containing only a single representative (Fig. 6). For several of these groups, members shared many genome regions with members from other groups, indicating incomplete or recent separation between groups. This finding is also supported by the presence of isolates, such as B9764 and WM 05.525, which apparently share genetic material with several populations and may represent admixed individuals. Importantly, all three of the PNW subtypes were placed into separate fineStructure “populations,” all of which include members from South America. The neighbor net network (Fig. 2), maximum parsimony-based WGST (Fig. S2), and MLST (Fig. S3) analyses all demonstrate broad dispersal of subtypes from South America throughout the VGII population.

fineStructure analysis of dominant VGII subtypes. To identify the population structure and possible recombination events within VGII, fineStructure analysis was performed using the SNP matrix developed for Fig. 2. Using one representative from each of the 32 VGII subtypes, whole-genome SNP data were reduced to a pairwise similarity matrix. The population structure is identified using this similarity matrix, where the underlying model assumes individuals within populations will share more regions of their genome with each other and have a similar amount of admixture with individuals from different populations. Identified populations are merged one at a time, at each step using the most likely population merge, to relate populations to each other through a tree. The x-axis analysis represents the strain as a “recipient,” and the y axis represents the strain as a “donor” of genomic regions. The scale bar represents the number of shared genome regions with blue being the greatest amount of sharing and yellow being the least. The PNW-associated subtype representatives are outlined in green.

The members of the PNW subtypes (i.e., VGIIa, VGIIc, and VGIIb) showed strict clonality within their respective subtype, with consistency indices (CI) of 1.0 for each subtype tree (Fig. 3 to 5), and significant separation from each other, as described previously (17). Bioinformatic attempts to determine a date for the most recent common ancestor of each subtype did not yield useful estimates (analysis not shown), which may be due to the small amount of SNP data available or potentially the nonclocklike evolution of C. gattii.

The VGIIa subtype contained two distinct clades, a North American-dominated group and a South American group (Fig. 3). The North American VGIIa clade showed little topological structure with a 15-branch polytomy consistent with a radiation emanating from a recent common ancestor. One subclade, joined by 13 shared SNPs, contained a set of three isolates previously described as having lower virulence (13, 20), one of which was isolated in Brazil in 1981 and was genomically separated from the others by 261 additional SNPs. The other two members of this lower-virulence subclade were isolated in Washington State in 1972 and California in 1990 and had SNP branch lengths of 0 and 52, respectively. The VGIIc population was also strictly clonal (CI = 1.0) with a total of 55 parsimony informative SNPs (Fig. 4). This subtype phylogeny contained a 5-branched polytomy but with several distinct subclades. The branch lengths were short (<16 SNPs); notably, all of these isolates were collected between 2008 and 2012.

Unlike VGIIa and VGIIc, the VGIIb subtype is dispersed beyond the Western Hemisphere (6, 13, 15, 25). This subtype contained 528 SNPs (147 parsimony informative) and multiple distinct clades (Fig. 5). North American isolates were found in multiple locations across the tree, including in one large derived clade where they shared membership with isolates that were collected in Asia and Australia. The Asian-Australian/North American subclade shows polytomy, which is consistent with the presence of a genetic bottleneck and subsequent radiation, or may be a reflection of limited sampling. A single isolate from Florida shared membership in a separate basal clade with an isolate from Australia.

Gene content.

Ab initio gene prediction using the pretrained Augustus software identified a median of 6,276 genes per genome (range, 6,213 to 6,359). The gene content analysis initially focused on differences and similarities between the PNW subtypes. The use of BLAT score ratio (BSR) for this analysis allowed us to make gene predictions from the genome sequences, where exon-coding DNA sequences were concatenated into single contiguous sequences to create the in silico coding DNA (is-cDNA), the presence or absence of which could then be scored for the PNW genomes. The gene sequences from the three PNW subtypes clustered into 6,302 is-cDNAs. The gene sequences from the remaining 29 non-PNW subtypes clustered into 6,541 is-cDNAs. The individual and shared gene differences in the PNW subtypes were scored against representative genomes from the other known global VGII subtypes. Additionally, we surveyed for the presence or absence of specific genes from any identified PNW clone (e.g., PNW-VGIIa) within the representative genomes from the other C. gattii major molecular types (VGI, VGIII, and VGIV), as well as within the C. neoformans var. grubii and C. neoformans var. neoformans reference genomes (Table 2). A heat map was developed (Fig. 7) to visualize the gene presence, absence, partial presence, and/or presence of genes with low sequence similarity. We identified several gene content similarities and differences in all of the comparisons (Table 2). Many of the identified putative genes, or is-cDNAs, were located in contiguous blocks of assembled genome sequence, labeled clusters 1 to 7 in Fig. 7. When the PNW clones were compared with each other, VGIIa had the largest number of PNW clone-specific gene content differences. Additionally, the PNW VGIIa-specific genes were typically absent in the other VGII subtypes and non-VGII genomes. Conversely, the PNW VGIIb-specific and VGIIc-specific genes (i.e., those genes found only in their respective subtype in the PNW) were commonly found in the other VGII subtypes and non-VGII C. gattii genomes. A large number of genes that were present in both VGIIa and VGIIc were absent from VGIIb. These genes, most of which were located adjacent to each other in clusters 1 and 2, were found in nearly all other C. gattii genomes, indicating a likely gene loss event in the PNW VGIIb population. An analysis of genes common to all three PNW populations but absent in other VGII subtypes identified a three-gene cluster (“cluster 6”) that was found to be absent in most C. gattii genomes but was present, with a high degree of similarity, in the C. neoformans var. grubii genome (Fig. 7).

BLAT score ratio gene content comparison

Gene content similarity and is-cDNAFamily hit(s)PfamE value(s)
Present in PNW VGIIa/c
        is-cDNA 1Transmembrane amino acid transporter protein familyPF01490.131.90E−33
        is-cDNA 2Endoribonuclease L-PSPPF01042.168.30E−39
        is-cDNA 3Aminotransferase class I and II domainPF00155.165.20E−47
        is-cDNA 4No significant matches
        is-cDNA 5Zinc finger double domain, fungus-specific transcription factor domainPF13465.1, PF04082.134.4E−7, 2.3E−16
        is-cDNA 6Major facilitator superfamilyPF07690.115.80E−11
        is-cDNA 7Peptidase family M20/M25/M40, peptidase dimerization domainPF01546.23, PF07687.93E−6, 4.2E−9
        is-cDNA 8Major facilitator superfamily, fungus-specific transcription factor domainPF07690.11, PF04082.135.5E−29, 9.6E−12
        is-cDNA 9No significant matches
        is-cDNA 10Arylsulfotransferase (ASST) familyPF14269.19.90E−14
        is-cDNA 11Major facilitator superfamilyPF07690.117.20E−23
        is-cDNA 12Phosphatidylethanolamine-binding protein domainPF01161.155.70E−18
        is-cDNA 13Fungus-specific transcription factor domainPF04082.132.90E−05
        is-cDNA 14Glycosyl hydrolase family 32 N-terminal domain, glycosyl hydrolase family 32 C-terminal domainPF00251.15, PF08244.71.8E−51 4.8E−10
        is-cDNA 15No significant matches
        is-cDNA 16Alcohol dehydrogenase GroES-like domain, zinc-binding dehydrogenasePF08240.7, PF13602.18.5E−6, 2.8E−19
        is-cDNA 17No significant matches
Present in PNW VGIIb
        is-cDNA 18E1-E2 ATPase, haloacid dehalogenase-like hydrolase3.6E−47, 3.1E−21
        is-cDNA 19No significant matches
        is-cDNA 20No significant matches
Present in PNW VGIIa
        is-cDNA 21Sugar (and other) transporterPF00083.191.20E−94
        is-cDNA 22Alcohol dehydrogenase GroES-like domain, zinc-binding dehydrogenasePF08240.7, PF00107.211.6E−28, 1.6E−16
        is-cDNA 23Major facilitator superfamilyPF07690.111.20E−17
        is-cDNA 24No significant matches
        is-cDNA 25Aldo/keto reductase familyPF00248.161.30E−52
        is-cDNA 26Protein of unknown function, putative collagen-binding domain of a collagenasePF13204.1, PF12904.22.1E−54, 7.4E−19
        is-cDNA 27Sugar (and other) transporterPF00083.193.70E−93
        is-cDNA 28Alcohol dehydrogenase GroES-like domain, zinc-binding dehydrogenasePF08240.7, PF00107.215.3E−28, 1.2E−15
        is-cDNA 29Fungus-specific transcription factor domain, fungal Zn(2)-Cys(6) binuclear cluster domainPF04082.13, PF00172.132E−16, 1.8E−4
        is-cDNA 30No significant matches
        is-cDNA 31No significant matches
        is-cDNA 32Aspartyl proteasePF09668.53.30E−05
        is-cDNA 33Helicase conserved C-terminal domain, DEAD/DEAH box helicasePF00271.26, PF00270.244.6E−7, 1E−4
Present in PNW VGIIc
        is-cDNA 34Fungus-specific transcription factor domainPF04082.132.50E−11
        is-cDNA 35Lyase, adenylosuccinate lyase C terminusPF00206.15, PF10397.41.5E−47, 2.6E−18
        is-cDNA 36Sugar (and other) transporterPF00083.195.40E−80
        is-cDNA 37No significant matches
        is-cDNA 38No significant matches
        is-cDNA 39Protein of unknown functionPF11175.31.40E−80
        is-cDNA 40Sugar (and other) transporterPF00083.198.30E−63
        is-cDNA 41Protein of unknown functionPF11175.36.70E−83
        is-cDNA 42No significant matches
        is-cDNA 43No significant matches
Present in PNW VGIIa/b/c and missing in half of VGII STs
        is-cDNA 44Glycosyl hydrolase family 3 N-terminal domain, fibronectin type III-like domainPF00933, PF143101.9E−70,7.5E−20
        is-cDNA 45Alpha-l-rhamnosidase N-terminal domain, bacterial alpha-l-rhamnosidasePF08531, PF0559204.7E−36, 9.5E−166
        is-cDNA 46Sugar (and other) transporterPF000832.00E−57
        is-cDNA 47No significant matches
Missing in PNW and present in half of VGII STs
        is-cDNA 48Fungus-specific transcription factor domainPF04082.131.90E−23
        is-cDNA 49Proline dehydrogenasePF01619.134.60E−31
        is-cDNA 50Transmembrane amino acid transporter proteinPF01490.132.8E−17, 1E−9

Population-level gene content analysis. To identify shared and differing gene content between and among the PNW subtypes compared to other VGII subtypes, other C. gattii molecular types, and other Cryptococcus species, BLAT score ratio analysis was performed, which is described in detail in the Materials and Methods. The genomes scored are the same as in Fig. 6 plus one representative each from VGI, VGIII, VGIV, C. neoformans var. neoformans, and C. neoformans var. grubii. VGII isolates are presented in the populations identified by fineStructure analysis where the tree on top is reproduced from Fig. 6. The rows on the left are divided by comparison group (VGIIa only, VGIIa and VGIIc only, etc.); these rows are further grouped by identified contiguous gene clusters (clusters 1 to 7). Genes of high homology between groups are yellow (complete homology = 1.0 on the heat map bar), and no homology (complete absence of the scored gene = 0.0) is represented by blue, with intermediate colors representing a gradient of similarity. The PNW population representatives and their scores are outlined in green.

A comparison of phylogenies based on linked SNPs and gene content showed dissimilar relationships among VGII subtypes (data not shown). This suggests that differential gene content has not become fixed within the VGII lineages. As such, placement of an isolate in a phylogeny or within a subtype using SNP data is not necessarily a good indicator of its gene content, and conversely, gene content is not a good indicator of phylogenetic placement.

DISCUSSION

The emergence of C. gattii in temperate British Columbia in Canada and the U.S. PNW was unexpected in that C. gattii was previously thought to be prevalent only in tropical and semitropical regions. There has since been much speculation on the origins of the outbreak strains (9, 12, 15, 16). In this study, whole-genome sequencing was performed on 118 C. gattii isolates, primarily of the molecular type VGII, to better ascertain the source and genomic adaptations behind its emergence in the PNW. Our primary strategy was to analyze SNP mutations and gene content and to compare the PNW subtype genomes with a genomically diverse global collection of isolates. This study shows the following. (i) There were most likely multiple distinct North American introductions of at least two of the three PNW VGII subtypes. (ii) Gene content varies between the VGII subtypes. (iii) C. gattii has developed a number of evolutionary strategies that allow for continual niche adaptation. This study also explores the relationship of the North American VGII subtypes to a global C. gattii collection and the potential role of unique genes in virulence and niche adaption.

Out of South America? (phylogenetic analyses).

Analysis of the population genome structure of the PNW VGII clones provides additional significant evidence of multiple and continued emergence of apparently regionally fit lineages from within VGII originating in South America, as has been proposed previously (9, 26). As the apparent level of recombination across VGII negates the use of a single SNP-based tree to understand the true phylogenetic relationships among all VGII members, we conducted a neighbor net analysis, providing a better understanding of relationships within the phylogenetic network. For example, the Australian ST33 subtype, which appears most genetically distant on the WGST SNP tree (see Fig. S2 in the supplemental material), still appears most genetically distant in the network; however, there are several distinct splits uniting this subtype with other subtypes (see splits A and B on Fig. 2). The ST33 lineage, therefore, may have continued to exchange genetic material with these taxa after its derivation from the bulk of currently sampled VGII taxa.

From within the genomic diversity of VGII, we can see the emergence of the three low-diversity and highly fit clones (i.e., VGIIa, VGIIb, and VGIIc) that have become prevalent in North America. The emergence of these three clonal subtypes clearly represents at least three distinct events. Our phylogenetic analyses suggest that the phenotypically differentiated VGIIa and VGIIc lineages evolved in South America and spread to the PNW. Additionally, the widely disseminated VGIIb lineage and the VGIIa subtype were likely exported, possibly out of South America, on multiple occasions, and the VGIIb subtype has emerged in multiple geographic locations.

(i) VGIIa.

The likely most recent translocation of the VGIIa lineage, as inferred from the very short branch lengths within its respective clade (Fig. 3), resulted in its distinct emergence on Vancouver Island, spreading to the Canadian mainland and into Northwest United States. There is a distinct separation of the PNW dominant VGIIa clade and a clade of South American origin (i.e., Brazil, Argentina, and Japan [in a Brazilian emigrant]) strains. The two VGIIa clades likely have a recent common ancestor, and the lack of its presence elsewhere suggests that this subtype originated in South America. Additionally, WGST allowed us to identify genetic separation between the strains identified during the recent emergence on Vancouver Island and the subclade composed of the older, but less virulent, North American VGIIa strains: NIH-444 (1972; Washington) and CBS7750 (1990; California) (16). This lower-virulence subclade also contained a 1981 Brazilian isolate (ICB-107). It is possible that the latter isolate represents the originating lineage in Brazil and that the two North American isolates represent two distinct translocation events of that same lineage to North America prior to the eventual PNW emergence. Strain ICB-107 was isolated from a patient in Recife, a major port city in Brazil, representing a possible point of export. A lack of epidemiological information associated with that particular isolate prevents further understanding at this time.

(ii) VGIIc.

The VGIIc subtype has been found only in the PNW. Its most recent splits (i.e., closest genetic relatives) in the phylogenetic network were found exclusively in South America (see split A in Fig. 2). The very short branch lengths and lack of distinct separate clades within this subtype (Fig. 4) are indicative of an individual emergence of this subtype, likely from South America, where all of the most closely related subtypes were identified.

(iii) VGIIb.

VGIIb is the most diverse of the VGII lineages found in the PNW and likely has been translocated to North America and other parts of the world on multiple occasions. The origin of PNW VGIIb strains is not clear. A VGIIb subclade defined by the putative loss of a 12-gene 10-kb contiguous genomic region (see Fig. S4 in the supplemental material; cluster 1 in Fig. 7) contains most, but not all, PNW VGIIb strains along with strains collected in the previous decade in Australia and Asia (6, 25). These strains are the most commonly found VGIIb strains in the PNW and they reportedly are less virulent than the other VGIIb strains found in the PNW (16). This 10-kb contig is absent from the derived clade yet occurs in all other, more-basal VGIIb strains, including two additional isolates found in PNW, which strongly suggests a single gene loss event in a parental strain. This event, again, provides evidence for multiple independent introductions of VGIIb to North America, which are discernible by both SNP states and genetic content. Further studies are necessary to assess the phenotypic impact of the loss of this genomic component.

The maximum possible consistency indices (CI = 1.0) for the VGIIa, VGIIb, and VGIIc phylogenetic trees demonstrate a lack of observable recombination occurring within or among the North America VGII isolates. These data, taken with the fact that all PNW strains contain only the mating type α locus, indicate that there is limited to no sexual reproduction occurring in these strains. Same-sex mating was previously hypothesized as an evolutionary driver of these subtypes (13). As no phylogenetic inconsistencies were identified among strains within each PNW subtype, there is no obvious evidence of same-sex mating in the region as well, although the progeny of such events may not be readily identified due to the genomic similarities of the parental strains. As sexual reproduction is typically necessary for the development of infectious propagules, it is possible that individual isolates undergo unisexual reproduction (selfing) or haploid fruiting, which have been previously described in C. neoformans (27, 28), to induce a yeast-to-hyphal transition.

Searching for North American adaptations (comparative genomics).

In order to identify genetic content associated with niche adaptation to the PNW, we explored the population genome content for genes associated with each of the North American VGII subtypes compared to each other and to other C. gattii molecular types. We were able to identify a subtype-specific gene present (compared to other PNW subtypes) in each of the three subtypes, as well as shared genes in VGIIa and VGIIc compared to the VGIIb subtype. These genomic differences could be the basis for specific niche adaptations and possible differences in virulence and tissue tropism (18). None of the PNW VGIIa, VGIIb, and/or VGIIc clone-defining genes or gene regions was completely unique to North American populations; all were identified in at least once in other global VGII genotypes analyzed.

(i) Notable gene differences.

The gene regions specific to VGIIa in the PNW were rarely seen in other global subtypes and may represent greater likelihood of novel genes. The majority of these VGIIa-specific regions were in a contiguous gene cluster (gene cluster 3, which contains is-cDNA 21 to 29, in Fig. 7) that was seen only in three other global subtypes, originating from Brazil, Colombia, and Australia. This gene cluster contains several genes associated with sugar transport and alcohol dehydrogenase and is possibly important for regional plant sugar metabolism (Table 2). This cluster also includes a collagen-binding domain, likely from a collagenase enzyme, which has been identified in other pathogenic fungi (29). Collagen is a component of all major lung cells (30), and collagen binding in the lung has been associated with colonization of Aspergillus in the asthmatic lung (31) and increased virulence in other pulmonary infections (32). The presence of a collagen-binding domain, therefore, may be relevant to binding of cryptococci to host lung cells and contribute to increased virulence or lung tissue tropism or both in the PNW VGIIa infections (18).

Interestingly, the VGIIa cluster 3 gene cluster is also absent from the other major C. gattii types (VGI, VGIII, and VGIV), except for variants of two of the sugar transport/metabolic domains that were present in the VGI genome, possibly an indication of a historical gene transfer event. Additionally, this gene cluster was absent from the C. neoformans var. grubii and C. neoformans var. neoformans reference genomes (Fig. 7).

An additional gene complex found in the VGIIa group, a DEAD box RNA helicase (is-cDNA 33), was rarely seen in VGII subtypes outside the PNW. Semihomologous genes or gene fragments were identified in VGIIc and four additional VGII subtypes and in the VGIII genome, as well as in the C. neoformans var. grubii genome. The DEAD box RNA helicase has been found previously in multiple eukaryotic and prokaryotic genomes and was recently established as an important virulence factor in C. neoformans, with reported impacts on environmental sensing, pathogenesis, and RNA metabolism (33). Additionally, RNA helicases have been associated with fungal cold response (34). As such, its presence/partial presence in VGIIa and VGIIc and relatively few other VGII lineages may suggest a linkage to differential virulence and/or temperate climate phenotype. Its absence from most other C. gattii genomes suggests one or more intertype or interspecies gene transfer events.

An analysis of the genome content of the three North American VGII subtypes against all other VGII genomes in the current study revealed an ~10-kb region, containing three genes, that was consistently found in all VGIIa, VGIIb, and VGIIc genomes. This cluster (cluster 6, which contains is-cDNA 44 to 46, in Fig. 7) was present in only a limited number of other VGII subtypes and was absent from the representative VGI, VGIII, and VGIV strains. BSR analysis revealed that the region had high similarity with a homologous region of the same size in the C. neoformans var. grubii reference genome but was missing from C. neoformans var. neoformans. The three genes in this cluster were identified as encoding rhamnosidase, avenacinase, and an unspecified sugar transporter. Rhamnosidase has been previously identified in Cryptococcus albidus (35), Aspergillus spp. (36), and other basidiomycetes and ascomycetes (37). The presence of rhamnosidase may enable more efficient use of rhamnose as a carbohydrate source. Of particular note, rhamnose is a free sugar produced by Rhamnus (buckthorn), a plant with a large distribution in the PNW. Rhamnose is also found in members of the Uncaria family, a genus found throughout the tropical region. It is a component of many plant and bacterial polysaccharides, as well as plant glycosides. Additionally, rhamnosidase in soft-rot fungi has been shown to have lignocellulolytic activity, which is involved in the degradation of hardwoods (38). Its presence in the PNW population may also provide an adaptive advantage to old-growth forests of the region, as C. gattii been recovered from rotting and decaying wood in other geographic areas (39, 40).

The genes found in VGIIc, but not VGIIa or VGIIb (is-cDNA 34 to 41), relate primarily to sugar transport, metabolism, and gene expression and are commonly seen in other VGII subtypes and non-VGII cryptococcal genomes, suggesting that the lack of homologous genes in VGIIa and VGIIb represents independent gene loss events.

Among the three gene regions found only in the North American VGIIb strains is a haloacid dehalogenase-like hydrolase (is-cDNA 18). This gene was rather infrequent in other VGII subtypes, although a highly homologous region was seen in VGI, VGIII, and VGIV. A haloacid dehalogenase was found to be a virulence factor in Paracoccidioides brasiliensis, necessary for lung epithelial cell adhesion (41). No function for this gene region has been established in Cryptococcus.

The majority of all other between-PNW subtype genetic differences include genes or domains that are seen relatively frequently in other VGII subtypes. The presence or absence of these genes will need to be further explored to understand any functional role associated with the PNW populations.

The 12-gene cluster that was absent in the derived clade of VGIIb (cluster 1, which contains is-cDNA 1 to 12, in Fig. 7; see Fig. S4 in the supplemental material) was present in all other VGIIb isolates, including the separate clade found in North America, and was found, wholly or partially, in all other Cryptococcus isolates. This gene complement is composed of structural, regulatory, transport, and repair genes. The significance of this deletion remains unknown, but it is a marker of the derived lower-virulence clade of VGIIb. Phenotypic studies are necessary to determine whether there are other differences between the derived clade and the rest of the VGIIb subtype.

(ii) Evidence of recombination.

Beyond the clear differences of gene content between and among VGII subtypes, multiple lines of evidence demonstrate the likelihood of “genome sharing” throughout the VGII population. In addition to the “all VGII” tree consistency index of ~0.4 (meaning a high degree of homoplasy) (see Fig. S2 in the supplemental material), and a significant phi test for recombination (P = 0.0), it is evident that gene content does not dictate phylogeny. We assume that recombination throughout the VGII population accounts for the high degree of homoplasy, which may be accomplished through larger genome content sharing events (i.e., bisexual or same-sex mating [19]). While the neighbor net phylogenetic network recovers a mostly star-like shape, there are apparent relationships between lineages (Fig. 2). The fineStructure analysis conducted here assigned lineages to specific “populations” and assessed levels of relatedness between genomes using adjacent linked SNPs to identify the nearest neighbor relationships for portions of the genome bounded by recombination events. Evidence of recombination was identified in South American isolates. For example, isolates WM 05.275 (originating in Colombia) and WM 05.529 (originating in Brazil) share many of their linked loci, although they are distinct subtypes and are separated by WGST, and they have distinct gene content by BSR. The same pattern is seen with isolates WM 05.419 and WM 05.547, both Brazilian clinical isolates. The VGIIa (R265 [British Columbia, Canada]) and VGIIb (B9563 [Washington]) representatives showed clear evidence of limited genome sharing with most isolates in the analysis. Conversely, the VGIIc genome showed almost no sharing with the vast majority of other VGII subtypes, except its nearest phylogenetic relatives from Brazil (samples WM 05.462 and IAL-3234). Taken together, these results indicate a relatively weak population structure within VGII or a recent subdivision of populations that have not yet allowed gene absence and presence to become fixed within recombining groups. Again, the high consistency indices for whole-genome SNP phylogenies of each PNW subtype (Fig. 3 to 5) indicate a lack of observable recombination occurring within these clones.

The presence of the three-gene cluster with a high degree of homology to C. neoformans var. grubii (cluster 6) provides evidence of a possible previous hybridization event between C. gattii and C. neoformans var. grubii, resulting in the possible introgression of this cluster in either a limited number of C. gattii subtypes or in the C. neoformans var. grubii genome. More whole-genome sequencing of the latter will be needed to ascertain the direction of introgression. Natural C. gattii and C. neoformans var. grubii hybrids have been recently identified from clinical samples in Colombia and Brazil (42). Additionally, genetic remnants of previous hybridization events, also referred to as “identity islands,” have been reported in the case of a nonreciprocal exchange of a multigene cluster from C. neoformans var. grubii to C. neoformans var. neoformans (43). The existence of hybrids and subsequent introgression are an additional mechanism of genome evolution (beyond recombination, SNP mutation, and gene gain, loss, and duplication) and may allow continued successful emergence of novel phenotypes.

(iii) Conclusions.

No clear progenitor strains have been identified for VGIIa, VGIIb, or VGIIc, although the first two have been found in South America. The phylogenetic evidence shown here is consistent with multiple translocations of VGIIb to the PNW and elsewhere, as well as possible multiple introductions of VGIIa into North America. The VGIIc subtype lacks representation outside the PNW; however, its consistent SNP and gene content relationship with subtypes from Brazil provides compelling evidence of a likely South American origin for VGIIc as well. The results of the analysis also confirm that South America is a major source of VGII diversity and, with numerous subtypes and extensive recombination, it appears to represent an evolutionary accelerator for the VGII subtype and clones emerging from it.

Our extensive whole-population genome analysis enabled an in silico search for genetic factors associated with differing phenotypes in the PNW populations; similar in silico genome-wide association studies (GWAS) have been useful for studying adaptations in other widely dispersed fungi (34). The initial SNP and gene analyses based on this population-level study provide an important perspective on C. gattii diversity and will serve as a baseline for future studies. The identification of variant sugar transport and metabolism genes, as well as previously unexplored virulence genes (e.g., collagen adhesion gene in VGIIa), provides excellent candidates to pursue in functional studies that will help explain the PNW outbreaks and other C. gattii emergence events. The comprehensive SNP analyses will also be useful for future mutation-based functional analyses. Given the obvious gene content differences seen within VGIIb, even demonstrably clonal subtypes are not uniform and may require intrasubtype comparative analyses.

We recognize that limitations of this study include sampling bias and hence possible overrepresentation of isolates from some geographic regions and underrepresentation from others. There could be further diversity of VGII in other regions that were not sampled or that were sparsely sampled. Additionally, given the finding of C. gattii VGII strains in undisturbed Amazon rainforest and the amount of diversity already seen in South America (7), there is doubtless much more unsampled diversity on that continent. Additionally, this study was designed to identify important genomic features from whole-genome sequence data and was not designed to identify important mutations impacting gene regulation or expression. Nevertheless, our findings provide a significant advance in our understanding of the emergence of C. gattii in the PNW though the importance of the gene differences is unknown and requires further functional genomic analyses. As has been seen in recent years, C. gattii continues to spread to new locales, and a comprehensive understanding of its population genomics may help researchers, public health officials, and clinicians to better respond to the impacts of its continued evolution and emergence.

MATERIALS AND METHODS<italic>Cryptococcus gattii</italic> isolates and Illumina sequencing.

One hundred eighteen C. gattii isolates were included in the analyses (Table 1). Information for clinical strains was collected with the approval and oversight of the institutional ethic committee. Additionally, the two near neighboring taxa, C. neoformans var. neoformans strain JEC21 and C. neoformans var. grubii strain H99, were obtained as whole-genome assemblies from NCBI (http://www.ncbi.nlm.nih.gov/Genbank/index.html) and the Broad Institute of Harvard and MIT (http://www.broadinstitute.org/annotation/genome/cryptococcus_neoformans_b), respectively. The 118 C. gattii isolates were selected based on previously established MLST sequence types (Table 1) to ensure global genomic diversity. The previously published VGIIa reference genome (R265) (see the Broad Institute link above) was resequenced for this analysis, due to the relatively high number of likely errors in the existing Sanger-based reference genome (17). The genomes of all 118 C. gattii isolates were sequenced using Illumina HiSeq, MiSeq, or GAIIx sequencing technology as previously described (17, 44). High-molecular-weight DNA was extracted using the ZR Fungal/Bacterial DNA MiniPrep kit (catalog no. D6005; Zymo Research, Irvine, CA, USA). DNA samples were prepared for paired-end sequencing using the Kapa Biosystems library preparation kit (catalog no. KK8201; Woburn, MA, USA) protocol with an 8-bp index modification. Briefly, 2 µg of double-stranded DNA (dsDNA) sheared to an average size of 600 bp was used for the Kapa Illumina paired-end library preparation as described by the manufacturer. Modified oligonucleotides (Integrated DNA Technologies, Coralville, IA, USA) with 8-bp indexing capability (45), were substituted at the appropriate step. Prior to sequencing, the libraries were quantified with quantitative PCR (qPCR) on the ABI 7900HT (Life Technologies Corporation, Carlsbad, CA, USA) using the Kapa library quantification kit (catalog no. KK4835; Woburn, MA, USA). Libraries were sequenced to a read length of 100 bp or 150 bp on the Illumina HiSeq system, to 75 bp or 100 bp on the GAIIx system, or to 250 bp on the MiSeq system. All WGS data files have been deposited in the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/bioproject; project PRJNA244927).

SNP detection and phylogenetic comparison. (i) Genome assembly.

All sequenced genomes were de novo assembled using the SPAdes assembler v2.5.0 (46). Raw contigs were filtered by length and coverage, as SPAdes tends to produce many short and low-coverage contigs. For isolates assembled from 100-bp or 150-bp paired-end data, contigs shorter than 200 bp were filtered and removed, while contigs shorter than 500 bp were filtered/removed in genomes assembled from 250-bp paired-end MiSeq data. Additionally, for every genome, contigs with coverage below 20% of the average coverage of the 20 largest contigs were filtered out.

(ii) MLST typing.

The seven MLST loci, CAP59, GPD1, LAC1, SOD1, URA5, PLB1, and intergene sequence (IGS), were aligned to the contigs according to the consensus MLST scheme for the C. neoformans/C. gattii species complex (10). Allele types (ATs) and sequence types (STs) were assigned according to the MLST database at http://mlst.mycologylab.org, and new ATs and STs were added to this database. A maximum likelihood tree was constructed based on ST locus scores using the program MEGA v. 5.05 (47).

(iii) SNP variant detection.

Illumina read data were aligned against the herein newly obtained R265 (high virulent VGIIa Vancouver Island outbreak strain) assembly using Novoalign 3.00.03 (Novocraft Technologies, Selangor, Malaysia), and then SNP variants were identified using the GATK Unified Genotyper v2.4 (48). SNP calls were then filtered (http://tgennorth.github.io/NASP/) to remove positions with less than 10× coverage, with less than 80% variant allele calls, or identified by Nucmer as being within duplicated regions in the reference. In the cases of the VGIIb and VGIIc analyses, the read data were aligned by the same methods using the genome assemblies for B9563 and B8825, respectively. In total, five SNP matrices with different taxa were produced: (i) C. gattii molecular types VGI to VGIV; (ii) C. gattii molecular type VGII only with all previously established MLST sequence types; (iii) C. gattii VGIIa; (iv) C. gattii VGIIb; and (v) C. gattii VGIIc.

(iv) Phylogenetic analysis.

To understand relationships between isolates, we conducted whole-genome SNP typing (WGST) as previously described (17, 39, 49). Maximum-parsimony SNP trees, based on each of the above five SNP matrices, were constructed, using PAUP* v.4.0b10 (50) and visualized in FigTree v.1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/). The C. gattii VGII tree was rooted using the large C. gattii tree, while the VGIIa, VGIIb, and VGIIc trees were each rooted using the other VGII subtypes as outgroups. Roots are shown simply as the midpoint on the branch determined by outgroup rooting, because the exact number of SNPs assigned to each branch varies slightly with the inclusion of different taxa, due to the SNP filtering strategies employed. The phi test for recombination was conducted on the whole VGII SNP data set, as well as the same data set broken into 500-bp nonoverlapping windows, using the PhiPack software (51). Due to recombination signal strongly present in the VGII SNP data, a neighbor-joining split tree network or “neighbor net” (52) was drawn to visualize the relationship between isolates in the VGII tree. The neighbor net was drawn using SplitsTree4 (53) with the uncorrected P-distance transformation. The GTR+G model of nucleotide evolution, selected by jModelTest2 (54), was also used to calculate distances for input to the neighbor net algorithm, which produced a network showing the same divisions among samples (data not shown).

(v) ChromoPainter and fineStructure population analysis.

In addition to whole-genome SNP phylogenies, fineStructure analysis (25) was performed using SNP matrix 2 (see “SNP variant detection” above) to help identify the population structure within VGII. Using one representative from each of the 32 distinct VGII subtypes (i.e., MLST sequence types) (Table 1), visualized by placement in the SNP phylogeny, whole-genome SNP data were reduced to a pairwise similarity matrix using ChromoPainter, which estimates the number of regions in two genomes that coalesce prior to coalescing with another genome. ChromoPainter uses the linkage disequilibrium present between closely positioned SNP loci to help assign a given region of a genome to its best near neighbor, increasing the sensitivity of population detection over unlinked methods. ChromoPainter was run using the linkage model, assuming uniform rates of recombination per base pair of sequence. Next, the population structure is identified by fineStructure using this similarity matrix, where the underlying model assumes individuals within populations will share more regions of their genome with each other, and have a similar amount of admixture with individuals from different populations. After identifying populations, fineStructure then merges populations one at a time, at each step using the most likely population merge, to relate populations to each other through a tree.

Gene content comparison. (i) BSR analysis of gene content.

Potential gene content comparison of taxa was performed using BLAT score ratio (BSR) analysis, as previously described for prokaryotic genomes (55), with modifications made to accommodate eukaryotic genes containing introns (described below). Gene sequences were predicted for each assembled genome using the ab initio eukaryotic gene prediction software Augustus v2.6.1 (56). For each gene prediction, the coding DNA region sequences were concatenated into a single contig, removing introns to create the putative in silico coding DNA (is-cDNA). is-cDNA sequences from all isolates were then clustered and dereplicated using USEARCH v5.0.144 (57) with a 90% identity cutoff, creating a nonredundant set of consensus is-cDNA predictions encompassing the entire diversity of samples. Putative is-cDNAs were then aligned back to each genome assembly using BLAT v35x1 (58), which can align the cDNA sequence against a reference allowing for intervening intron sequences, using an 80% identity cutoff. BLAT alignments were converted to scores replicating the calculation used by the web-based BLAT server (https://genome.ucsc.edu/). For each putative is-cDNA, the raw score for an isolate was converted into a ratio by dividing the score of a particular isolate by the top score achieved by any isolate, as previously described (55). These BSRs were then imported into the MultiExperiment Viewer (MEV) to generate a heat map which could be manually inspected to identify genes present and absent across different sets of taxa (Fig. 7). Thus, a value of 1.0 (yellow on the heat map) represents the best hit, and a value of 0 (blue) represents no alignment by BLAT.

Using the BSR method described above, the potential gene content between VGIIa, VGIIb, and VGIIc sequence types was compared using representative isolates from each subtype present in the PNW (Table 2). Representative strains were selected based on assembly quality and being part of the PNW outbreak subtypes (i.e., isolates typed as VGIIa and VGIIb but not from the PNW were not included in these sets; all VGIIc isolates were from the PNW). This set included 9 VGIIa isolates, 6 VGIIb isolates, and 12 VGIIc isolates (Table 1). Putative consensus is-cDNAs showing differential content across the three PNW sequence types were identified using a cutoff BSR score of above 0.9 in the taxa of interest and below 0.9 in the excluded taxa. Multiple assemblies from each subtype were included to help reduce the number of genes that may appear to be absent due to variability of draft genome quality with multiple contigs per chromosome. Genes with a low BSR value in a single member of a subtype almost invariably were due to contig breaks in assembly. Unique putative genes were identified for each of the VGIIa, VGIIb, and VGIIc groups as well as for genes contained in both VGIIa and VGIIc but missing in VGIIb, as this comparison may help explain the previously described phenotype differences seen in the PNW.

Expanding the analysis beyond differences within the PNW sequence types, the putative is-cDNA sequences that define the PNW subtypes were scored in each of the additional 29 global VGII subtypes (Table 1 and Fig. 7), a representative sequence from each of the VGI, VGIII, and VGIV major molecular types, as well as C. neoformans var. grubii (H99) and C. neoformans var. neoformans (JEC21). This allowed for the gene differences found in the PNW subtypes to be examined across the diversity of the Cryptococcus neoformans/C. gattii species complex, as well as for the identification of genes present in all of the PNW isolates but not universally present in other VGII subtypes. In addition, the full complement of is-cDNAs was scored in all 29 global VGII subtypes.

To characterize genes absent in the PNW populations but present in many of the other global VGII subtypes, a second set of putative genes (is-cDNAs) was generated using the gene predictions in non-PNW VGII subtypes. While this would presumably contain many of the same genes found in the PNW gene set, it would also contain any genes that may be absent in all of the PNW subtypes. Using the same cutoff scheme as described above, putative genes were identified that were missing or truncated in all of the PNW subtypes but present in at least half of the other VGII subtypes. is-cDNA sequences identified as having a differential content pattern of interest were then located in the genome assemblies of strains R265 (VGIIa), B9563 (VGIIb), B8825 (VGIIc), depending on which taxa contained the is-cDNAs, to look for blocks of adjacent genes showing similar patterns of presence and absence.

(ii) BSR confirmation with sequence read data alignment.

Because a low BSR score may indicate gene loss or truncation as well as problems caused by assembly issues in draft genomes, putative gene sequences identified as differential across taxa of interest were subsequently assessed by raw read data alignment. This provides evidence for confirmation of the differential gene content and, in the case of scores between 0.0 and 0.9, helps to identify the type of gene structure differences present between isolates with different scores. To avoid potential problems in aligning genomic read data against putative is-cDNA sequences, which do not contain introns, full-length gene predictions containing introns were dereplicated with USEARCH to serve as read alignment references. Putative is-cDNAs of interest were aligned against the full-length gene sequences using BLAT to identify the corresponding full-length gene. Read data from each isolate in the PNW analysis was then aligned to the full-length gene. These alignments were inspected for each gene to ensure read coverage across the entire gene sequence for isolates with high BSR scores and to ensure a lack of read coverage for isolates with BSR scores indicating gene absence. Moderate BSR scores occurred in a few cases when limited read data successfully aligned to a putatively missing gene (e.g., is-cDNA 9 in VGIIb), or when read data aligned to truncated genes. In an effort to be conservative in identifying differential gene content using short-read data, gene differences that could not be confirmed with read data are not reported. This most often occurred in the case of putative genes that contained highly repetitive sequences, which cause assembly and definitive alignment issues in short-read data.

(iii) Putative gene characterization.

is-cDNAs, which were confirmed as variable with read data (see above), were translated into amino acid sequences and then searched against the Pfam database (http://pfam.sanger.ac.uk/) to help identify potential protein functions. Protein families and domains achieving a significant score are reported in Table 2. Additional characterization for selected is-cDNAs was performed using a blastp search of the NCBI nonredundant protein database (http://www.ncbi.nlm.nih.gov/RefSeq/).

SUPPLEMENTAL MATERIAL

Genome sequence analysis. Flowchart depicting analysis workflow for population genetic analyses and comparative gene content analysis. Download

Figure S1, EPS file, 0.9 MB

SNP phylogeny of C. gattii VGII molecular type with geographic annotation. Maximum-parsimony phylogenetic analysis was performed on SNPs from 115 C. gattii VGII genomes including PNW and global diversity. Major branches are annotated by the continent(s) of origin for member genomes. A total of 310,969 SNPs were identified (221,248 parsimony informative). CI = 0.401. For country or state abbreviations, see the legend to Fig. 1. Numbers in red next to branches are bootstrap values generated from 500 generations. Branches without bootstrap numbers shown all had values of 100. Download

Figure S2, PDF file, 1.7 MB

MLST phylogeny of global C. gattii VGII isolates. Multilocus sequence type (ST) loci were either sequenced directly or extracted from whole-genome sequence data. STs were assigned for each locus for each strain according to the MLST database at http://mlst.mycologylab.org. A maximum likelihood tree was constructed based on ST locus scores. Numbers in red next to branches are bootstrap values; branches without bootstrap numbers shown had values of 100. Download

Figure S3, EPS file, 0.3 MB

Gene loss in the VGIIb subclade. The tree is reproduced from Fig. 5 showing the SNP-based phylogeny of the VGIIb subtype isolates, and the heat map is reproduced from the VGIIb-only portion of the BLAT score ratio heat map in Fig. 7. The arrow points to the likely location of the loss of the 12-gene cluster in the VGIIb population. Download

Figure S4, EPS file, 1 MB

Citation Engelthaler DM, Hicks ND, Gillece JD, Roe CC, Schupp JM, Driebe EM, Gilgado F, Carriconde F, Trilles L, Firacative C, Ngamskulrungroj P, Castañeda E, Lazera MDS, Melhem MSC, Pérez-Bercoff Å, Huttley G, Sorrell TC, Voelz K, May RC, Fisher MC, Thompson GR, III, Lockhart SR, Keim P, Meyer W. 2014. Cryptococcus gattii in North American Pacific Northwest: whole-population genome analysis provides insights into species evolution and dispersal. mBio 5(4):e01464-14. doi:10.1128/mBio.01464-14.

ACKNOWLEDGMENTS

We thank all our colleagues from around the world, who have submitted isolates for this study.

This study was supported by a grant from the National Institutes of Health (R21AI098059 [P.K., D.M.E., G.R.T., S.R.L., and W.M.]) and an NHMRC grant (APP1031943 [W.M., T.C.S. and G.H.]).

The findings and conclusions of this article are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention.

REFERENCES Kwon-ChungKJBennettJE 1984 High prevalence of Cryptococcus neoformans var. gattii in tropical and subtropical regions. Zentralbl. Bakteriol. Mikrobiol. Hyg. A 257:2132186207684 LazeraMSSalmito CavalcantiMALonderoATTrillesLNishikawaMMWankeB 2000 Possible primary ecological niche of Cryptococcus neoformans. Med. Mycol. 38:379383. 10.1080/mmy.38.5.379.38311092385 EscandónPSánchezAMartínezMMeyerWCastañedaE 2006 Molecular epidemiology of clinical and environmental isolates of the Cryptococcus neoformans species complex reveals a high genetic diversity and the presence of the molecular type VGII mating type a in Colombia. FEMS Yeast Res 6:625635. 10.1111/j.1567-1364.2006.00055.x16696659 FiracativeCTorresGRodríguezMCEscandónP 2011 First environmental isolation of Cryptococcus gattii serotype B, from Cúcuta, Colombia. Biomedica 31:118123. 10.1590/S0120-4157201100010001422159490 EllisDHPfeifferTJ 1990 Natural habitat of Cryptococcus neoformans var. gattii. J. Clin. Microbiol 28:164216442199524 CarricondeFGilgadoFArthurIEllisDMalikRvan de WieleNRobertVCurrieBJMeyerW 2011 Clonality and alpha-a recombination in the Australian Cryptococcus gattii VGII population—an emerging outbreak in Australia. PLoS One 6:e16936. 10.1371/journal.pone.0016936 21383989 FortesSTLazéraMSNishikawaMMMacedoRCWankeB 2001 First isolation of Cryptococcus neoformans var. gattii from a native jungle tree in the Brazilian Amazon rainforest. Mycoses 44:137140. 10.1046/j.1439-0507.2001.00651.x11486449 BaltazarLDMRibeiroMA 2008 First isolation of Cryptococcus gattii from the environment in the State of Espírito Santo. Rev. Soc. Bras. Med. Trop. 41:449453 (In Portuguese.) 10.1590/S0037-8682200800050000319009184 HagenFCeresiniPCPolacheckIMaHvan NieuwerburghFGabaldónTKaganSPursallERHoogveldHLvan IerselLJKlauGWKelkSMStougieLBartlettKHVoelzKPryszczLPCastañedaELazeraMMeyerWDeforceDMeisJFMayRCKlaassenCHBoekhoutT 2013 Ancient dispersal of the human fungal pathogen Cryptococcus gattii from the Amazon rainforest. PLoS One 8:e71148. 10.1371/journal.pone.007114823940707 MeyerWAanensenDMBoekhoutTCogliatiMDiazMREspostoMCFisherMGilgadoFHagenFKaocharoenSLitvintsevaAPMitchellTGSimwamiSPTrillesLVivianiMAKwon-ChungJ 2009 Consensus multi-locus sequence typing scheme for Cryptococcus neoformans and Cryptococcus gattii. Med. Mycol. 47:561570. 10.1080/1369378090295388619462334 KiddSEHagenFTscharkeRLHuynhMBartlettKHFyfeMMacdougallLBoekhoutTKwon-ChungKJMeyerW 2004 A rare genotype of Cryptococcus gattii caused the cryptococcosis outbreak on Vancouver Island (British Columbia, Canada). Proc. Natl. Acad. Sci. U. S. A. 101:1725817263. 10.1073/pnas.040298110115572442 FraserJASubaranRLNicholsCBHeitmanJ 2003 Recapitulation of the sexual cycle of the primary fungal pathogen Cryptococcus neoformans var. gattii: implications for an outbreak on Vancouver Island, Canada. Eukaryot. Cell 2:10361045. 10.1128/EC.2.5.1036-1045.200314555486 FraserJAGilesSSWeninkECGeunes-BoyerSGWrightJRDiezmannSAllenAStajichJEDietrichFSPerfectJRHeitmanJ 2005 Same-sex mating and the origin of the Vancouver Island Cryptococcus gattii outbreak. Nature 437:13601364. 10.1038/nature0422016222245 MacDougallLFyfeM 2006 Emergence of Cryptococcus gattii in a novel environment provides clues to its incubation period. J. Clin. Microbiol. 44:18511852. 10.1128/JCM.44.5.1851-1852.200616672420 ByrnesEJIIIBildfellRJFrankSAMitchellTGMarrKAHeitmanJ 2009 Molecular evidence that the range of the Vancouver Island outbreak of Cryptococcus gattii infection has expanded into the Pacific Northwest in the United States. J. Infect. Dis. 199:10811086. 10.1086/59730619220140 ByrnesEJIIILiWLewitYMaHVoelzKRenPCarterDAChaturvediVBildfellRJMayRCHeitmanJ 2010 Emergence and pathogenicity of highly virulent Cryptococcus gattii genotypes in the northwest United States. PLoS Pathog. 6:e1000850. 10.1371/journal.ppat.100085020421942 GilleceJDSchuppJMBalajeeSAHarrisJPearsonTYanYKeimPDeBessEMarsden-HaugNWohrleREngelthalerDMLockhartSR 2011 Whole genome sequence analysis of Cryptococcus gattii from the Pacific Northwest reveals unexpected diversity. PLoS One 6:e28550. 10.1371/journal.pone.002855022163313 HarrisJRLockhartSRDebessEMarsden-HaugNGoldoftMWohrleRLeeSSmelserCParkBChillerT 2011 Cryptococcus gattii in the United States: clinical aspects of infection with an emerging pathogen. Clin. Infect. Dis. 53:11881195. 10.1093/cid/cir72322016503 SpringerDJPhadkeSBillmyreBHeitmanJ 2012 Cryptococcus gattii, no longer an accidental pathogen? Curr. Fungal Infect. Rep. 6:245256. 10.1007/s12281-012-0111-023243480 MaHMayRC 2010 Mitochondria and the regulation of hypervirulence in the fatal fungal outbreak on Vancouver Island. Virulence 1:197201. 10.4161/viru.1.3.1105321178442 NgamskulrungrojPSerenaCGilgadoFMalikRMeyerW 2011 Global VGIIa isolates are of comparable virulence to the major fatal Cryptococcus gattii Vancouver Island outbreak genotype. Clin. Microbiol. Infect. 17:251258. 10.1111/j.1469-0691.2010.03222.x20331682 MoranGPColemanDCSullivanDJ 2011 Comparative genomics and the evolution of pathogenicity in human pathogenic fungi. Eukaryot. Cell 10:3442. 10.1128/EC.00242-1021076011 NgamskulrungrojPPriceJSorrellTPerfectJRMeyerW 2011 Cryptococcus gattii virulence composite: candidate genes revealed by microarray analysis of high and less virulent Vancouver Island outbreak strains. PLoS One 6:e16076. 10.1371/journal.pone.001607621249145 KaocharoenSNgamskulrungrojPFiracativeCTrillesLPiyabongkarnDBanlunaraWPoonwanNChaiprasertAMeyerWChindampornA 2013 Molecular epidemiology reveals genetic diversity amongst isolates of the Cryptococcus neoformans/C. gattii species complex in Thailand. PLoS Negl. Trop. Dis. 7:e2297. 10.1371/journal.pntd.000229723861989 LawsonDJHellenthalGMyersSFalushD 2012 Inference of population structure using dense haplotype data. PLoS Genet. 8:e1002453. 10.1371/journal.pgen.100245322291602 NgamskulrungrojPGilgadoFFaganelloJLitvintsevaAPLealALTsuiKMMitchellTGVainsteinMHMeyerW 2009 Genetic diversity of the Cryptococcus species complex suggests that Cryptococcus gattii deserves to have varieties. PLoS One 4:e5862. 10.1371/journal.pone.000586219517012 PhadkeSSFeretzakiMHeitmanJ 2013 Unisexual reproduction enhances fungal competitiveness by promoting habitat exploration via hyphal growth and sporulation. Eukaryot. Cell 12:11551159. 10.1128/EC.00147-1323794511 TscharkeRLLazeraMChangYCWickesBLKwon-ChungKJ 2003 Haploid fruiting in Cryptococcus neoformans is not mating type alpha-specific. Fungal Genet. Biol. 39:230237. 10.1016/S1087-1845(03)00046-X12892636 NishikakuASRibeiroLCMolinaRFAlbeBPCunhaCDSBurgerE 2009 Matrix metalloproteinases with gelatinolytic activity induced by Paracoccidioides brasiliensis infection. Int. J. Exp. Pathol. 90:527537. 10.1111/j.1365-2613.2009.00663.x19765107 LaurentGJ 1986 Lung collagen: more than scaffolding. Thorax 41:418428. 10.1136/thx.41.6.4183024347 BromleyIMDonaldsonK 1996 Binding of Aspergillus fumigatus spores to lung epithelial cells and basement membrane proteins: relevance to the asthmatic lung. Thorax 51:12031209. 10.1136/thx.51.12.12038994516 WagnerCKhanASKamphausenTSchmausserBUnalCLorenzUFischerGHackerJSteinertM 2007 Collagen binding protein Mip enables Legionella pneumophila to transmigrate through a barrier of NCI-H292 lung epithelial cells and extracellular matrix. Cell. Microbiol. 9:450462. 10.1111/j.1462-5822.2006.00802.x16953800 PanepintoJLiuLRamosJZhuXValyi-NagyTEksiSFuJJaffeHAWickesBWilliamsonPR 2005 The DEAD-box RNA helicase Vad1 regulates multiple virulence-associated genes in Cryptococcus neoformans. J. Clin. Invest. 115:632641. 10.1172/JCI2304815765146 EllisonCEHallCKowbelDWelchJBremRBGlassNLTaylorJW 2011 Population genomics and local adaptation in wild isolates of a model microbial eukaryote. Proc. Natl. Acad. Sci. U. S. A. 108:28312836. 10.1073/pnas.101497110821282627 RzaievaOMVarbanets’LDHudzenkoOV 2011 Optimization of cultivation conditions of Cryptococcus albidus—producers of alpha-l-rhamnosidase. Mikrobiol. Z. 73:1016 (In Ukrainian.) PubMed. PubMed.21442947 LiuTYuHZhangCLuMPiaoYOhbaMTangMYuanXWeiSWangKMaAFengXQinSMukaiCTsujiAJinF 2012 Aspergillus niger DLFCC-90 rhamnoside hydrolase, a new type of flavonoid glycoside hydrolase. Appl. Environ. Microbiol. 78:47524754. 10.1128/AEM.00054-1222544243 YadavVYadavPKYadavSYadavKDS 2010 α-l-Rhamnosidase: a review. Proc. Biochem. 45:12261235. 10.1016/j.procbio.2010.05.025 NghiDHBittnerBKellnerHJehmlichNUllrichRPecynaMJNousiainenPSipiläJHoungLMHofrichterMLiersC 2012 The wood rot ascomycete Xylaria polymorpha produces a novel GH78 glycoside hydrolase that exhibits α-l-rhamnosidase and feruloyl esterase activities and releases hydroxycinnamic acids from lignocelluloses. Appl. Environ. Microbiol. 78:48934901. 10.1128/AEM.07588-1122544251 LazéraMSCavalcantiMATrillesLNishikawaMMWankeB 1998 Cryptococcus neoformans var. gattii—evidence for a natural habitat related to decaying wood in a pottery tree hollow. Med. Mycol. 36:119122. 10.1046/j.1365-280X.1998.00120.x9776823 GroverNNawangeSRNaiduJSinghSMSharmaA 2007 Ecological niche of Cryptococcus neoformans var. grubii and Cryptococcus gattii in decaying wood of trunk hollows of living trees in Jabalpur City of Central India. Mycopathologia 164:159170. 10.1007/s11046-007-9039-217661160 HernándezOAlmeidaAJGonzalezAGarciaAMTamayoDCanoLERestrepoAMcEwenJG 2010 A 32-kilodalton hydrolase plays an important role in Paracoccidioides brasiliensis adherence to host cells and influences pathogenicity. Infect. Immun. 78:52805286. 10.1128/IAI.00692-1020876288 AminnejadMDiazMArabatzisMCastañedaELazeraMVelegrakiAMarriottDSorrellTCMeyerW 2012 Identification of novel hybrids between Cryptococcus neoformans var. grubii VNI and Cryptococcus gattii VGII. Mycopathologia 173:337346. 10.1007/s11046-011-9491-x22081254 KavanaughLAFraserJADietrichFS 2006 Recent evolution of the human pathogen Cryptococcus neoformans by intervarietal transfer of a 14-gene fragment. Mol. Biol. Evol. 23:18791890. 10.1093/molbev/msl07016870684 EtienneKAGilleceJHilsabeckRSchuppJMColmanRLockhartSRGadeLThompsonEHSuttonDANeblett-FanfairRParkBJTurabelidzeGKeimPBrandtMEDeakEEngelthalerDM 2012 Whole genome sequence typing to investigate the Apophysomyces outbreak following a tornado in Joplin, Missouri, 2011. PLoS One 7:e49989. 10.1371/journal.pone.004998923209631 KozarewaITurnerDJ 2011 96-plex molecular bar coding for the Illumina Genome Analyzer. Methods Mol. Biol. 733:279298. 10.1007/978-1-61779-089-8_2021431778 BankevichANurkSAntipovDGurevichAADvorkinMKulikovASLesinVMNikolenkoSIPhamSPrjibelskiADPyshkinAVSirotkinAVVyahhiNTeslerGAlekseyevMAPevznerPA 2012 SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19:455477. 10.1089/cmb.2012.002122506599 TamuraKPetersonDPetersonNStecherGNeiMKumarS 2011 MEGA5: Molecular Evolutionary Genetics Analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28:27312739. 10.1093/molbev/msr12121546353 DePristoMABanksEPoplinRGarimellaKVMaguireJRHartlCPhilippakisAAdel AngelGRivasMAHannaMMcKennaAFennellTJKernytskyAMSivachenkoAYCibulskisKGabrielSBAltshulerDDalyMJ 2011 A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43:491498. 10.1038/ng.80621478889 EngelthalerDMKelleyEDriebeEMBowersJEberhardCFTrujilloJDecruyenaereFSchuppJMMossongJKeimPEvenJ 2013 Rapid and robust phylotyping of spa t003, a dominant MRSA clone in Luxembourg and other European countries. BMC Infect. Dis. 13:339. 10.1186/1471-2334-13-33923879266 SwoffordDL 2003 PAUP*: Phylogenetic Analysis Using Parsimony (and other methods), version 4. Sinauer Associates, Sunderland, MA BruenTCPhilippeHBryantD 2006 A simple and robust statistical test for detecting the presence of recombination. Genetics 172:26652681. 10.1534/genetics.105.04897516489234 BryantDMoultonV 2004 Neighbor-Net: an agglomerative method for the construction of phylogenetic networks. Mol. Biol. Evol. 21:255265. 10.1093/molbev/msh01814660700 HusonDHBryantD 2006 Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol. 23:25426716221896 DarribaDTaboadaGLDoalloRPosadaD 2012 jModelTest 2: more models, new heuristics and parallel computing. Nat. Methods 9:772. 10.1038/nmeth.211122847109 SahlJWGilleceJDSchuppJMWaddellVDriebeEMEngelthalerDMKeimP 2013 Evolution of a pathogen: a comparative genomics analysis identifies a genetic pathway to pathogenesis in Acinetobacter. PLoS One 8:e54287. 10.1371/journal.pone.005428723365658 StankeMDiekhansMBaertschRHausslerD 2008 Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24:637644. 10.1093/bioinformatics/btn01318218656 EdgarRC 2010 Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:24602461. 10.1093/bioinformatics/btq46120709691 KentWJ 2002 BLAT—the BLAST-like alignment tool. Genome Res. 12:656664. 10.1101/gr.22920211932250