Genome Announcements (Genome Announc.), ISSN 2169-8287
American Society for Microbiology, 1752 N St., N.W., Washington, DC
PMID 25103754; PMCID PMC4125765; article genomeA00718-14; doi:10.1128/genomeA.00718-14
Section: Prokaryotes

Genome Sequences of 228 Shiga Toxin-Producing Escherichia coli Isolates and 12 Isolates Representing Other Diarrheagenic E. coli Pathotypes

Eija Trees (a), Nancy Strockbine (a), Shankar Changayil (b), Satishkumar Ranganathan (b), Kun Zhao (b), Ryan Weil (b), Duncan MacCannell (b), Ashley Sabol (a), Amber Schmidtke (a), Haley Martin (a), Devon Stripling (a), Efrain M. Ribot (a), Peter Gerner-Smidt (a)

(a) Division of Foodborne, Waterborne, and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
(b) Office of Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA

Address correspondence to Eija Trees, eih9@cdc.gov.

Received 24 June 2014. Accepted 16 July 2014. Published 7 August 2014. Volume 2, issue 4 (July/August 2014), e00718-14.

Copyright © 2014 Trees et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported license.

Shiga toxin-producing Escherichia coli (STEC) are a common cause of food-borne diarrheal illness, both in outbreaks and in sporadic cases. Here, we report the availability of the draft genome sequences of 228 STEC strains representing 32 serotypes with known pulsed-field gel electrophoresis (PFGE) types and epidemiological relationships, as well as 12 strains representing other diarrheagenic E. coli pathotypes.

GENOME ANNOUNCEMENT

The rapidly decreasing cost of next-generation sequencing (NGS) will facilitate its application for real-time surveillance in the near future. PulseNet, the molecular subtyping network for food-borne disease surveillance, currently relies on pulsed-field gel electrophoresis (PFGE) to define clusters of illness (1). In order to use NGS as a primary method for cluster detection, a thorough understanding of the genetic diversity in the target population is needed. Shiga toxin-producing Escherichia coli (STEC) are among the pathogens tracked by PulseNet. In this report, we announce the availability of the draft sequences of a carefully selected set of STEC strains that should enable us to gain insights into the sequence diversity within an outbreak or a carrier state and among epidemiologically unrelated isolates within a serotype and between serotypes.

We sequenced 228 STEC strains representing 32 serotypes with known PFGE types and epidemiological relationships. The strain set included a total of 50 isolates from five outbreaks, 11 isolates from a long-term carrier, and epidemiologically unrelated strains. Twelve strains of other diarrheagenic E. coli pathotypes were included as outliers. Genomic DNA from each strain was isolated using the ArchivePure DNA cell/tissue kit (5Prime, Hamburg, Germany). All 240 strains were sequenced to a minimum depth of 100× with the HiSeq 2000 or GAIIx (Illumina, San Diego, CA, USA) using the TruSeq DNA LT sample prep kit (Illumina) for DNA library preparation and 100-bp paired-end read chemistry. Additionally, 82 strains were sequenced with the PacBio RS system (Pacific Biosciences, Menlo Park, CA, USA) using C2 chemistry and four single-molecule real-time (SMRT) cells per genome.
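To give a sense of scale for the 100× minimum depth with 100-bp paired-end reads, the arithmetic can be sketched as follows. This is an illustration only: the function names are hypothetical, and depth is assumed to mean total sequenced bases divided by genome size, with no allowance for trimming or unmapped reads.

```python
def read_pairs_for_depth(genome_size_bp, depth, read_length_bp=100):
    """Smallest number of read pairs whose total bases reach the target depth.

    Assumes depth = total sequenced bases / genome size (an idealization).
    """
    bases_needed = genome_size_bp * depth
    bases_per_pair = 2 * read_length_bp
    # Integer ceiling division avoids floating-point rounding.
    return -(-bases_needed // bases_per_pair)


def depth_from_pairs(genome_size_bp, n_pairs, read_length_bp=100):
    """Nominal depth obtained from a given number of read pairs."""
    return n_pairs * 2 * read_length_bp / genome_size_bp


if __name__ == "__main__":
    # Average genome size reported in this announcement: 5,282,291 bp.
    pairs = read_pairs_for_depth(5_282_291, 100)
    print(f"read pairs needed for 100x: {pairs:,}")
```

Under these assumptions, a genome of average size in this set needs on the order of 2.6 million read pairs to reach 100×.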

Raw read quality checks were performed on the 240 samples using FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc) and in-house Perl scripts/Java programs. Primary analysis for the Illumina data was performed using CLC Genomics Workbench 5.5.1 (Aarhus, Denmark). The raw read files for each sample were trimmed with length (minimum, 50 bp) and quality score (0.02) filters. The trimmed reads were assembled into contigs with specific parameter settings (length fraction, 0.8; similarity fraction, 0.8; minimum contig length, 450 bp), and assembly statistics were parsed out in a table format using in-house scripts. The PacBio data analysis was performed using the whole-genome sequencing (WGS) assembler toolkit (2). Error correction of the filtered subreads was performed with the paired-end Illumina data (~60× data was used) using the WGS toolkit PacBioToCA script, followed by de novo assembly using the runCA script. The best assembly for each of these 82 samples was chosen based on the number of contigs, N50 value, and genome length.
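The read trimming step above uses a quality limit of 0.02 and a 50-bp minimum length. The sketch below illustrates one way such a limit can act on a read, assuming Phred-scaled base qualities and a modified-Mott style running sum of (limit minus error probability). It is not the CLC Genomics Workbench implementation, and the function names are hypothetical.

```python
def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def mott_trim(quals, limit=0.02):
    """Return (start, end) half-open indices of the region to keep.

    For each base, add (limit - p_error) to a running sum that is reset
    to zero whenever it drops to zero or below. The kept region ends at
    the maximum of the running sum and starts just after the last reset
    that preceded that maximum. Returns (0, 0) if nothing is kept.
    """
    running = 0.0
    best = 0.0
    best_start, best_end = 0, 0
    segment_start = 0  # index just after the most recent reset
    for i, q in enumerate(quals):
        running += limit - phred_to_error(q)
        if running <= 0:
            running = 0.0
            segment_start = i + 1
        elif running > best:
            best = running
            best_start, best_end = segment_start, i + 1
    return best_start, best_end


def passes_filters(quals, limit=0.02, min_length=50):
    """True if the trimmed read is at least min_length bases long."""
    start, end = mott_trim(quals, limit)
    return (end - start) >= min_length


if __name__ == "__main__":
    # Hypothetical read: poor ends (Q2) around a good core (Q30).
    quals = [2] * 5 + [30] * 60 + [2] * 5
    print(mott_trim(quals), passes_filters(quals))
```

With a limit of 0.02, bases below roughly Q17 (error probability above 0.02) pull the running sum down, so low-quality ends are clipped while an isolated weak base inside a long high-quality stretch may be tolerated.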

The average genome size for the sequenced strains was 5,282,291 bp (range, 4,527,885 to 5,712,627). For the 240 Illumina assemblies, the average number of contigs was 211 (range, 68 to 465), and the average N50 was 128,850 (range, 26,435 to 230,877). For the 82 PacBio hybrid assemblies, the average number of contigs was 207 (range, 31 to 207), and the average N50 was 172,854 (range, 31,094 to 1,414,730).
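The contig count, N50, and genome length reported here are also the criteria named for choosing the best PacBio hybrid assembly. The following sketch shows how these statistics can be computed from a list of contig lengths. The ranking in `pick_best` (higher N50 first, then fewer contigs) is an assumption made for illustration, since the text does not state how the criteria were combined. All function names are hypothetical.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def assembly_stats(contig_lengths, min_contig_length=450):
    """Summarize an assembly after dropping contigs shorter than the cutoff."""
    kept = [n for n in contig_lengths if n >= min_contig_length]
    return {
        "contigs": len(kept),
        "n50": n50(kept),
        "genome_length": sum(kept),
    }


def pick_best(candidates):
    """Pick a candidate name from a {name: stats} mapping.

    Assumed ranking (not from the source): highest N50, ties broken by
    fewest contigs.
    """
    return max(
        candidates,
        key=lambda name: (candidates[name]["n50"], -candidates[name]["contigs"]),
    )


if __name__ == "__main__":
    # Made-up contig lengths for two hypothetical candidate assemblies.
    runs = {
        "run_a": assembly_stats([900_000, 400_000, 200_000, 300]),
        "run_b": assembly_stats([600_000, 500_000, 400_000]),
    }
    print(runs, pick_best(runs))
```

The 450-bp default mirrors the minimum contig length used for the Illumina assemblies; a real comparison would also check that the genome length falls in a plausible range for E. coli.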

Nucleotide sequence accession numbers.

The draft genome sequences for these 240 diarrheagenic E. coli strains have been deposited in DDBJ/ENA/GenBank under the accession numbers listed in Table 1.

TABLE 1 NCBI accession numbers for 240 E. coli draft genomes

Strain ID     Serotype  NCBI accession no.
00-3279       O78:H12   JFBE00000000
01-3076       O111:NM   JFGU00000000
01-3147       O45:H2    JHOA00000000
02-3012       O81:NM    JHNZ00000000
02-3404       O28ac:NM  JHNY00000000
03-3227       O121:H19  JHNX00000000
03-3269       O174:H21  JHNW00000000
03-3458       O119:H4   JHNV00000000
03-3484       O111:NM   JHNU00000000
03-3500       O26:H11   JHNT00000000
04-3023       O103:H11  JHOD00000000
04-3038       O174:H8   JHOC00000000
04-3211       O111:NM   JHNS00000000
05-3646       O26:H11   JHOE00000000
06-3003       O121:H19  JHNR00000000
06-3256       O118:H16  JHNQ00000000
06-3325       O69:H11   JHNP00000000
06-3464       O26:H11   JHNO00000000
06-3484       O145:NM   JHNN00000000
06-3501       O79:H7    JHNM00000000
06-3555       O55:H7    JHNL00000000
06-3612       O118:H16  JHNK00000000
06-3691       O91:H14   JHNJ00000000
06-3745       O157:H7   JHNI00000000
06-3822       O121:H19  JHNH00000000
06-4039       O157:H7   JHNG00000000
07-3091       O157:H7   JHNF00000000
07-3391       O157:H7   JHNE00000000
07-4224       O113:H21  JHOB00000000
07-4281       O69:H11   JHLA00000000
08-3037       O157:H7   JHKZ00000000
08-3527       O157:H7   JHKY00000000
08-3651       O118:H16  JHKX00000000
08-4169       O157:H7   JHKW00000000
08-4270       O145:NM   JHKV00000000
08-4487       O111:NM   JHKU00000000
08-4529       O157:H7   JHHI00000000
08-4540       O157:NM   JHHH00000000
08-4661       O69:H11   JHHG00000000
2009C-3227    O91:H14   JHHF00000000
2009C-3279    O103:H2   JHHE00000000
2009C-3292    O145:H28  JHHD00000000
2009C-3299    O121:H7   JHHC00000000
2009C-3307    O123:H11  JHHB00000000
2009C-3601    O69:H11   JHHA00000000
2009C-3612    O26:H11   JHGZ00000000
2009C-3686    O45:H2    JHGY00000000
2009C-3689    O26:H11   JHGX00000000
2009C-3745    O91:NM    JHGW00000000
2009C-3996    O26:H11   JHGV00000000
2009C-4006    O111:NM   JHGU00000000
2009C-4050    O121:H19  JHGT00000000
2009C-4052    O111:NM   JHGS00000000
2009C-4126    O111:H8   JHGR00000000
2009C-4258    O157:H7   JHGQ00000000
2009C-4446    O118:H16  JHGP00000000
2009C-4646    O91:H21   JHGO00000000
2009C-4659    O121:H19  JHGN00000000
2009C-4747    O26:H11   JHGM00000000
2009C-4750    O121:H19  JHGL00000000
2009C-4760    O26:H11   JHGK00000000
2009C-4780    O45:H2    JHGJ00000000
2009C-4826    O26:H11   JHGI00000000
2009EL1302    O121:H19  JHGH00000000
2009EL1412    O121:H19  JHGG00000000
2009EL1449    O157:H7   JHGF00000000
2009EL1705    O157:H7   JHGE00000000
2009EL1913    O157:H7   JHGD00000000
2009EL2109    O157:H7   JHGC00000000
2009EL-2169   O111:H8   JHGB00000000
2010C-3051    O26:H11   JHGA00000000
2010C-3053    O111:NM   JHFZ00000000
2010C-3214    O103:H11  JHFY00000000
2010C-3472    O26:H11   JHFX00000000
2010C-3507    O145:NM   JHFW00000000
2010C-3508    O145:NM   JHFV00000000
2010C-3509    O145:NM   JHFU00000000
2010C-3510    O145:NM   JHFT00000000
2010C-3511    O145:NM   JHFS00000000
2010C-3516    O145:NM   JHFR00000000
2010C-3517    O145:NM   JHFQ00000000
2010C-3518    O145:NM   JHFP00000000
2010C-3521    O145:NM   JHFO00000000
2010C-3526    O145:NM   JHFN00000000
2010C-3609    O121:H19  JHFM00000000
2010C-3794    O121:H19  JHFL00000000
2010C-3840    O121:H19  JHFK00000000
2010C-3871    O26:H11   JHFJ00000000
2010C-3876    O45:H2    JHFI00000000
2010C-3902    O26:H11   JHFH00000000
2010C-3977    O111:NM   JHFG00000000
2010C-4086    O111:NM   JHFF00000000
2010C-4221    O111:NM   JHFE00000000
2010C-4244    O26:H11   JHFD00000000
2010C-4254    O121:H19  JHFC00000000
2010C-4347    O26:NM    JHFB00000000
2010C-4430    O26:H11   JHND00000000
2010C-4433    O103:H2   JHNC00000000
2010C-4529    O103:H25  JHNB00000000
2010C-4557C2  O145:NM   JHNA00000000
2010C-4558    O177:NM   JHMZ00000000
2010C-4592    O111:NM   JHMY00000000
2010C-4622    O111:NM   JHMX00000000
2010C-4715    O111:NM   JHMW00000000
2010C-4732    O121:H19  JHMV00000000
2010C-4735    O111:NM   JHMU00000000
2010C-4746    O111:NM   JHMT00000000
2010C-4788    O26:NM    JHMS00000000
2010C-4799    O111:NM   JHMR00000000
2010C-4818    O111:NM   JHMQ00000000
2010C-4819    O26:H11   JHMP00000000
2010C-4824    O121:H19  JHMO00000000
2010C-4834    O26:H11   JHMN00000000
2010C-4874    O165:H25  JHMM00000000
2010C-4966    O121:H19  JHML00000000
2010C-4979C1  O157:H7   JHMK00000000
2010C-4989    O121:H19  JHMJ00000000
2010C-5028    O26:H11   JHMI00000000
2010C-5034    O153:H2   JHMH00000000
2010EL1058    O121:H19  JHMG00000000
2010EL-1699   O26:H11   JHMF00000000
2010EL-2044   O157:H7   JHME00000000
2010EL-2045   O157:H7   JHMD00000000
2011C-3072    O121:H19  JHMC00000000
2011C-3108    O121:H19  JHMB00000000
2011C-3170    O111:NM   JHMA00000000
2011C-3216    O121:H19  JHLZ00000000
2011C-3270    O26:H11   JHLY00000000
2011C-3282    O26:H11   JHLX00000000
2011C-3362    O111:NM   JHLW00000000
2011C-3387    O26:H11   JHLV00000000
2011C-3453    O111:H8   JHLU00000000
2011C-3500    O121:H19  JHLT00000000
2011C-3506    O26:H11   JHLS00000000
2011C-3537    O121:H19  JHLR00000000
2011C-3573    O111:NM   JHLQ00000000
2011C-3602    O156:H25  JHLP00000000
2011C-3632    O111:NM   JHLO00000000
2011C-3655    O26:H11   JHLN00000000
2011C-3679    O111:NM   JHLM00000000
2011C-3750    O103:H2   JHLL00000000
2011EL-1107   O157:H7   JHLK00000000
2011EL-1675A  O104:H4   JHLJ00000000
2011EL-2090   O157:H7   JHLI00000000
2011EL-2091   O157:H7   JHLH00000000
2011EL-2092   O157:H7   JHLG00000000
2011EL-2093   O157:H7   JHLF00000000
2011EL-2094   O157:H7   JHLE00000000
2011EL-2096   O157:H7   JHLD00000000
2011EL-2097   O157:H7   JHLC00000000
2011EL-2098   O157:H7   JHLB00000000
2011EL-2099   O157:H7   JHKT00000000
2011EL-2101   O157:H7   JHKS00000000
2011EL-2103   O157:H7   JHKR00000000
2011EL-2104   O157:H7   JHKQ00000000
2011EL-2105   O157:H7   JHKP00000000
2011EL-2106   O157:H7   JHKO00000000
2011EL-2107   O157:H7   JHKN00000000
2011EL-2108   O157:H7   JHKM00000000
2011EL-2109   O157:H7   JHKL00000000
2011EL-2111   O157:H7   JHKK00000000
2011EL-2112   O157:H7   JHKJ00000000
2011EL-2113   O157:H7   JHKI00000000
2011EL-2114   O157:H7   JHKH00000000
2011EL-2286   O157:H7   JHKG00000000
2011EL-2287   O157:H7   JHKF00000000
2011EL-2288   O157:H7   JHKE00000000
2011EL-2289   O157:H7   JHKD00000000
2011EL-2290   O157:H7   JHKC00000000
2011EL-2312   O157:H7   JHKB00000000
2011EL-2313   O157:H7   JHKA00000000
94-3025       O104:H21  JHJZ00000000
98-3133       O157:H16  JHJY00000000
99-3124       O86:H34   JHJX00000000
99-3165       O6:H16    JHJW00000000
E2539C1       O25:NM    JHJV00000000
F5656C1       O6:H16    JHJU00000000
F6142         O157:H7   JHJT00000000
F6627         O111:H8   JHJS00000000
F6714         O121:H19  JHJR00000000
F6749         O157:H7   JHJQ00000000
F6750         O157:H7   JHJP00000000
F6751         O157:H7   JHJO00000000
F7350         O157:H7   JHJN00000000
F7377         O157:H7   JHJM00000000
F7384         O157:H7   JHJL00000000
F7410         O157:H7   JHJK00000000
F9792         O169:H41  JHJJ00000000
G5303         O157:H7   JHJI00000000
H2495         O157:H7   JHJH00000000
H2498         O157:H7   JHJG00000000
K1420         O157:H7   JHJF00000000
K1516         O15:H18   JHJE00000000
K1792         O157:H7   JHJD00000000
K1793         O157:H7   JHJC00000000
K1795         O157:H7   JHJB00000000
K1796         O157:H7   JHJA00000000
K1845         O157:H7   JHIZ00000000
K1921         O157:H7   JHIY00000000
K1927         O157:H7   JHIX00000000
K2188         O157:H7   JHIW00000000
K2191         O157:H7   JHIV00000000
K2192         O157:H7   JHIU00000000
K2324         O157:H7   JHIT00000000
K2581         O157:H7   JHIS00000000
K2622         O157:H7   JHIR00000000
K2845         O157:H7   JHIQ00000000
K2854         O157:H7   JHIP00000000
K4396         O157:H7   JHIO00000000
K4405         O157:H7   JHIN00000000
K4406         O157:H7   JHIM00000000
K4527         O157:H7   JHIL00000000
K5198         O121:H19  JHIK00000000
K5269         O121:H19  JHIJ00000000
K5418         O157:H7   JHII00000000
K5448         O157:H7   JHIH00000000
K5449         O157:H7   JHIG00000000
K5453         O157:H7   JHIF00000000
K5460         O157:H7   JHIE00000000
K5467         O157:H7   JHID00000000
K5602         O157:H7   JHIC00000000
K5607         O157:H7   JHIB00000000
K5609         O157:H7   JHIA00000000
K5806         O157:H7   JHHZ00000000
K5852         O157:H7   JHHY00000000
K6590         O157:H7   JHHX00000000
K6676         O157:H7   JHHW00000000
K6687         O157:H7   JHHV00000000
K6722         O111:NM   JHHU00000000
K6723         O111:NM   JHHT00000000
K6728         O111:NM   JHHS00000000
K6890         O111:NM   JHHR00000000
K6895         O111:NM   JHHQ00000000
K6897         O111:NM   JHHP00000000
K6898         O111:NM   JHHO00000000
K6904         O111:NM   JHHN00000000
K6908         O111:NM   JHHM00000000
K6915         O111:NM   JHHL00000000
K7140         O157:H7   JHHK00000000
F8704-2       O39:NM    JHHJ00000000

Citation Trees E, Strockbine N, Changayil S, Ranganathan S, Zhao K, Weil R, MacCannell D, Sabol A, Schmidtke A, Martin H, Stripling D, Ribot EM, Gerner-Smidt P. 2014. Genome sequences of 228 Shiga toxin-producing Escherichia coli isolates and 12 isolates representing other diarrheagenic E. coli pathotypes. Genome Announc. 2(4):e00718-14. doi:10.1128/genomeA.00718-14.

ACKNOWLEDGMENT

No external funding was received for this project.

REFERENCES

1. Gerner-Smidt P, Hise K, Kincaid J, Hunter S, Rolando S, Hyytia-Trees E, Ribot EM, Swaminathan B. 2006. PulseNet USA: a five-year update. Foodborne Pathog. Dis. 3:9-19. doi:10.1089/fpd.2006.3.9. PMID 16602975.

2. Koren S, Schatz MC, Walenz BP, Martin J, Howard JT, Ganapathy G, Wang Z, Rasko DA, McCombie WR, Jarvis ED, Phillippy AM. 2012. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat. Biotechnol. 30:693-700. doi:10.1038/nbt.2280. PMID 22750884.