<!DOCTYPE article
PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD with MathML3 v1.3 20210610//EN" "JATS-archivearticle1-3-mathml3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="1.3" xml:lang="en" article-type="research-article"><?properties open_access?><?properties manuscript?><processing-meta base-tagset="archiving" mathml-version="3.0" table-model="xhtml" tagset-family="jats"><restricted-by>pmc</restricted-by></processing-meta><front><journal-meta><journal-id journal-id-type="nlm-journal-id">101522599</journal-id><journal-id journal-id-type="pubmed-jr-id">37026</journal-id><journal-id journal-id-type="nlm-ta">Ticks Tick Borne Dis</journal-id><journal-id journal-id-type="iso-abbrev">Ticks Tick Borne Dis</journal-id><journal-title-group><journal-title>Ticks and tick-borne diseases</journal-title></journal-title-group><issn pub-type="ppub">1877-959X</issn><issn pub-type="epub">1877-9603</issn></journal-meta><article-meta><article-id pub-id-type="pmid">37247570</article-id><article-id pub-id-type="pmc">10878300</article-id><article-id pub-id-type="doi">10.1016/j.ttbdis.2023.102207</article-id><article-id pub-id-type="manuscript">HHSPA1967886</article-id><article-categories><subj-group subj-group-type="heading"><subject>Article</subject></subj-group></article-categories><title-group><article-title>A bioinformatics pipeline for a tick pathogen surveillance multiplex amplicon sequencing assay</article-title></title-group><contrib-group><contrib contrib-type="author"><name><surname>Osikowicz</surname><given-names>Lynn M.</given-names></name><xref rid="A1" ref-type="aff">a</xref><xref rid="CR1" ref-type="corresp">*</xref></contrib><contrib contrib-type="author"><name><surname>Hojgaard</surname><given-names>Andrias</given-names></name><xref rid="A1" ref-type="aff">a</xref></contrib><contrib contrib-type="author"><name><surname>Maes</surname><given-names>Sarah</given-names></name><xref rid="A1" ref-type="aff">a</xref></contrib><contrib contrib-type="author"><name><surname>Eisen</surname><given-names>Rebecca J.</given-names></name><xref 
rid="A1" ref-type="aff">a</xref></contrib><contrib contrib-type="author"><name><surname>Stenglein</surname><given-names>Mark D.</given-names></name><xref rid="A2" ref-type="aff">b</xref></contrib></contrib-group><aff id="A1"><label>a</label>Division of Vector-Borne Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Fort Collins, CO, United States</aff><aff id="A2"><label>b</label>Center for Vector-Borne Infectious Disease, Department of Microbiology, Immunology, and Pathology, College of Veterinary Medicine and Biomedical Sciences, Colorado State University, Fort Collins, CO, United States</aff><author-notes><corresp id="CR1"><label>*</label>Corresponding author: <email>vir5@cdc.gov</email> (L.M. Osikowicz).</corresp></author-notes><pub-date pub-type="nihms-submitted"><day>15</day><month>2</month><year>2024</year></pub-date><pub-date pub-type="ppub"><month>9</month><year>2023</year></pub-date><pub-date pub-type="epub"><day>27</day><month>5</month><year>2023</year></pub-date><pub-date pub-type="pmc-release"><day>20</day><month>2</month><year>2024</year></pub-date><volume>14</volume><issue>5</issue><fpage>102207</fpage><lpage>102207</lpage><permissions><license><ali:license_ref xmlns:ali="http://www.niso.org/schemas/ali/1.0/" specific-use="textmining" content-type="ccbylicense">https://creativecommons.org/licenses/by/4.0/</ali:license_ref><license-p>This is an open access article under the CC BY license (<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>).</license-p></license></permissions><abstract id="ABS1"><p id="P1">The Centers for Disease Control and Prevention&#x02019;s national tick and tick-borne pathogen surveillance program collects information to better understand the regional distribution, prevalence, and exposure risk of host-seeking medically important ticks in the United States. 
A recently developed next generation sequencing (NGS) targeted multiplex PCR amplicon sequencing (MPAS) assay has enhanced the detection capabilities for <italic toggle="yes">Ixodes</italic>-associated human pathogens found in <italic toggle="yes">Ixodes scapularis</italic> and <italic toggle="yes">Ixodes pacificus</italic> ticks compared to the routinely used real-time PCR assay. To operationalize the MPAS assay for the large number of tick surveillance submissions processed each year, a reproducible high throughput bioinformatics pipeline is needed. We describe the development and validation of the MPAS pipeline, a bioinformatics pipeline that identifies and summarizes amplicon sequences produced by the MPAS assay. This pipeline is portable and reproducible across different computing environments, and it is flexible, allowing modifications to input parameters, assay primers, and reference sequences. The automation of the summary report, BLAST report, and phylogenetic analysis reduces the amount of time needed for downstream analysis. To validate this pipeline, we analyzed an MPAS assay dataset consisting of 175 <italic toggle="yes">I. scapularis</italic> nymphs with the MPAS pipeline and compared the results with previously published results obtained with a CLC Genomics Workbench workflow. The MPAS pipeline identified the same number of positive ticks for <italic toggle="yes">Anaplasma phagocytophilum</italic> and <italic toggle="yes">Babesia</italic> species as the original analysis, but the MPAS pipeline provided enhanced sequencing resolution of <italic toggle="yes">Borrelia burgdorferi</italic> sensu lato co-infected samples. 
The reproducibility, flexibility, analysis automation, and improved sequence resolution of the MPAS pipeline make it well suited for a high throughput tick pathogen surveillance program.</p></abstract><kwd-group><kwd>Tick-borne diseases</kwd><kwd>Tick surveillance</kwd><kwd>Next generation sequencing</kwd><kwd>Bioinformatics</kwd></kwd-group></article-meta></front><body><sec id="S1"><label>1.</label><title>Introduction</title><p id="P2">Tick-borne diseases account for the majority of all reported vector-borne disease cases in the United States (<xref rid="R25" ref-type="bibr">Rosenberg et al., 2018</xref>). The incidence of tick-borne diseases continues to increase, cases are reported over an expanding region, and novel tickborne disease agents continue to be identified (<xref rid="R9" ref-type="bibr">Eisen et al., 2017</xref>). To track changes in environmental risk factors for tick-borne diseases, the Centers for Disease Control and Prevention (CDC) established a national tick and tick-borne pathogen surveillance program in 2018. The program collects data to better understand the regional distribution, prevalence, and exposure risk of host-seeking medically important ticks (<xref rid="R6" ref-type="bibr">CDC, 2018</xref>; <xref rid="R9" ref-type="bibr">Eisen et al., 2017</xref>). The CDC provides pathogen testing support services to public health partners for <italic toggle="yes">Ixodes scapularis</italic> and <italic toggle="yes">Ixodes pacificus</italic> ticks. In the United States, these two tick species alone are responsible for transmitting up to seven known human pathogens including <italic toggle="yes">Borrelia burgdorferi</italic> sensu stricto (s.s.), the primary causative agent of Lyme disease, for which an estimated 476,000 Americans are treated each year (<xref rid="R10" ref-type="bibr">Eisen and Paddock, 2021</xref>; <xref rid="R17" ref-type="bibr">Kugeler et al., 2021</xref>). The other known human pathogens transmitted by <italic toggle="yes">I. 
scapularis</italic> include <italic toggle="yes">Borrelia mayonii, Borrelia miyamotoi, Anaplasma phagocytophilum, Ehrlichia muris eauclarensis, Babesia microti</italic>, and Powassan virus (<xref rid="R10" ref-type="bibr">Eisen and Paddock, 2021</xref>). In addition to <italic toggle="yes">B. burgdorferi</italic> s.s., <italic toggle="yes">I. pacificus</italic> can transmit <italic toggle="yes">B. miyamotoi</italic> and <italic toggle="yes">A. phagocytophilum</italic> (<xref rid="R10" ref-type="bibr">Eisen and Paddock, 2021</xref>). There are eight other characterized <italic toggle="yes">Borrelia</italic> species found in <italic toggle="yes">Ixodes</italic> ticks in the U.S. that belong to the <italic toggle="yes">B. burgdorferi</italic> sensu lato (s.l.) complex (<italic toggle="yes">B. americana, B. andersonii, B. bissettiae, B. californiensis, B. carolinensis, B. kurtenbachii, B. lanei</italic>, and <italic toggle="yes">B. maritima</italic>), but their pathogenic potential is not well described (<xref rid="R26" ref-type="bibr">Rudenko et al., 2011</xref>; <xref rid="R29" ref-type="bibr">Wolcott et al., 2021</xref>).</p><p id="P3">The existing tick surveillance testing algorithm consists of five multiplex TaqMan-based real-time polymerase chain reaction (PCR) assays, which provide species-level detection for the <italic toggle="yes">Ixodes</italic> spp. pathogens (<xref rid="R13" ref-type="bibr">Graham et al., 2018</xref>). A recently developed next generation sequencing (NGS) targeted multiplex PCR amplicon sequencing (MPAS) assay has shown sensitivity comparable to, and specificity greater than, that of the TaqMan algorithm, while also requiring less input nucleic acid (<xref rid="R15" ref-type="bibr">Hojgaard et al., 2020</xref>). 
The MPAS assay amplifies sequences from four microbial genera that include pathogens transmitted by <italic toggle="yes">Ixodes</italic> species ticks (<italic toggle="yes">Borrelia</italic> spp., <italic toggle="yes">Babesia</italic> spp., <italic toggle="yes">Ehrlichia</italic> spp., and <italic toggle="yes">Anaplasma</italic> spp.), as well as the tick actin gene, which acts as an internal control (<xref rid="R15" ref-type="bibr">Hojgaard et al., 2020</xref>). The final products are 150&#x02013;350 bp amplicon sequences, which extend detection to a wider range of microbial species and allow co-infected ticks to be identified (<xref rid="R15" ref-type="bibr">Hojgaard et al., 2020</xref>). Compared with the existing TaqMan testing algorithm, the MPAS assay expands the microbial information that can be collected from tick-borne pathogen surveillance testing. However, to operationalize the assay for testing of more than 7000 tick submissions per year, a reproducible high throughput bioinformatics pipeline is needed to efficiently analyze the resulting NGS data.</p><p id="P4">Here we describe the development and validation of the MPAS pipeline, a bioinformatics pipeline that identifies and summarizes amplicon sequences produced by the MPAS assay. A number of goals guided the design and implementation of the pipeline. The pipeline should be at least as accurate as existing methods. It should be portable and reproducible: it should run in different computing environments, and results should not vary across platforms. It should be able to efficiently analyze large numbers of datasets. It should be modifiable to add or remove targeted pathogens. It should output and log results for surveillance efforts and provide a starting point for future research on tick-borne pathogens. 
This paper describes how the pipeline was developed to have these properties and how it was validated in comparison to existing testing protocols.</p></sec><sec id="S2"><label>2.</label><title>Materials and methods</title><sec id="S3"><label>2.1.</label><title>MPAS pipeline development</title><p id="P5">The MPAS pipeline is implemented in the Nextflow workflow management language, version 20.10.0 (<xref rid="R7" ref-type="bibr">Di Tommaso et al., 2017</xref>). Nextflow pipelines offer a number of advantages, including scripting flexibility and portability, as a pipeline can be run on a local computer, cluster system, or cloud environment (<xref rid="R7" ref-type="bibr">Di Tommaso et al., 2017</xref>). To ensure portability to other computing environments, software dependencies are handled using Singularity containers or Conda environments (<xref rid="R2" ref-type="bibr">Anaconda Software Distribution, 2020</xref>; <xref rid="R7" ref-type="bibr">Di Tommaso et al., 2017</xref>; <xref rid="R18" ref-type="bibr">Kurtzer et al., 2017</xref>). These environments and containers lock software versions to ensure reproducibility. The MPAS pipeline is available in the GitHub repository: CDCgov/tick_surveillance (<ext-link xlink:href="http://github.com" ext-link-type="uri">github.com</ext-link>).</p><sec id="S4"><label>2.1.1.</label><title>MPAS pipeline user input</title><p id="P6">The pipeline is run from the command line using Nextflow. The user provides the MPAS pipeline with several input files (<xref rid="T1" ref-type="table">Table 1</xref>). These files are expected in default locations relative to the run directory, but those locations can be overridden as described in the pipeline documentation (CDCgov/tick_surveillance (<ext-link xlink:href="http://github.com" ext-link-type="uri">github.com</ext-link>)).</p><p id="P7">The metadata file must contain the required columns listed in <xref rid="T1" ref-type="table">Table 1</xref>. 
An additional column titled &#x02018;<italic toggle="yes">Batch</italic>&#x02019; can be included to group samples together for analysis. Any other relevant metadata can be included in the metadata file at the discretion of the user. These metadata will be output as part of the pipeline&#x02019;s results.</p><p id="P8">The <italic toggle="yes">primers.tsv, targets.tsv</italic>, and <italic toggle="yes">surveillance_columns.txt</italic> files can be customized for the specific MPAS assay and include assay primer information, reference sequences, and the reported output information desired from the analysis (<xref rid="T1" ref-type="table">Table 1</xref>). The <italic toggle="yes">targets</italic> file contains the reference sequences and alignment parameters used to map sample reads to reference sequences. The user can indicate in the <italic toggle="yes">reporting_columns</italic> field whether the read count or the species name should be used as the reported parameter in the surveillance report file. The user can indicate &#x02018;NA&#x02019; in the <italic toggle="yes">reporting_columns</italic> field if the reference sequence should not be listed in the summarized surveillance report. The <italic toggle="yes">surveillance_columns.txt</italic> file indicates the information that will be included in the output surveillance report. This report displays a summarized view of the target microorganism calls for each MPAS pipeline run. To include a reference sequence in the summarized surveillance report, the following should be indicated on a single line in the <italic toggle="yes">surveillance_columns.txt</italic> file: the <italic toggle="yes">reporting_column</italic> entry from the <italic toggle="yes">targets.tsv</italic> file, followed by a tab and &#x02018;Negative&#x02019;, which indicates that this entry will be assessed for sufficient species read numbers. 
The entry will receive a &#x02018;Positive&#x02019; call in the sequencing report file if the species read number is greater than the minimum number of target reads, and a &#x02018;Negative&#x02019; call otherwise.</p></sec><sec id="S5"><label>2.1.2.</label><title>MPAS pipeline workflow</title><p id="P9">The sequencing analysis of the input FASTQ files occurs in nine primary workflow steps (<xref rid="F1" ref-type="fig">Fig. 1</xref>) using open-source software tools (<xref rid="T2" ref-type="table">Table 2</xref>). First, the reference sequences from the <italic toggle="yes">targets.tsv</italic> file are converted to individual FASTA files and then merged into one FASTA file. Next, the reference sequences are indexed for use in downstream sequence alignment. The quality of input FASTQ files is then analyzed with FastQC version 0.11, and quality reports are merged with MultiQC version 1.1. Cutadapt version 3.5 is used to remove reads shorter than the set minimum read length (100 bp by default) and to trim primers specified in the <italic toggle="yes">primers.tsv</italic> file and Illumina Nextera adapters (Illumina, San Diego, CA, USA) from each read. Cutadapt keeps only reads that contain the correct primer pairs in the correct orientation, thereby discarding artifactual or off-target amplicons. The quality scores of the trimmed sequences are then reassessed and combined into one report.</p><p id="P10">Next, the reads are processed by DADA2 version 1.18. This tool performs error correction, merges paired reads, groups identical sequences (amplicon sequence variants, ASVs), and tabulates the number of read pairs for each ASV. The observed ASVs are aligned to reference sequences with configurable sequence similarity cutoffs using BLASTn version 2.10. The numbers of internal control (tick actin) reads and pathogen reads are then analyzed for each sample. 
A sample is considered to have sufficient internal control reads if the log of its internal control read count is within three standard deviations of the mean log internal control read count for its batch. If batch groups are not set, then the internal control read cutoff is calculated from all the samples in the analysis. The default minimum number of read pairs aligned to reference sequences required for a positive call is 50. Next, a phylogenetic analysis of the aligned sequences amplified from each assay primer is performed. The FASTA files created contain the appropriate reference sequence from the <italic toggle="yes">targets.tsv</italic> file and a representative observed sequence from identified sequences grouped by state, tick species, and life stage. A multiple sequence alignment is performed for each FASTA file using MAFFT version 7.508, a maximum likelihood tree is created with IQ-TREE version 2.2.0.3, and PDF files for each tree are generated with ToyTree version 2.0.5. The phylogenetic analysis is intended as a supplemental visual that helps the user quickly identify divergent sequences; it is not used to classify species. The user should determine whether this visual is pertinent to their specific assay, as not all genes have the same phylogenetic power.</p><p id="P11">Finally, any sequences that did not align to the input reference sequences are searched against the NCBI nucleotide database using BLASTn. This search can take advantage of a local installation of this database or can run remotely. 
This step is meant to characterize divergent sequences that are outside of the sequence similarity thresholds set for each reference sequence and could be useful for identifying divergent or novel pathogens whose sequences are amplified by the assay primers.</p></sec><sec id="S6"><label>2.1.3.</label><title>MPAS pipeline results</title><p id="P12">Pipeline output is placed by default in a results directory and consists of run reports, trimmed FASTQ files, FastQC and MultiQC reports, a sequencing report file, an NCBI BLAST report, phylogenetic trees, and other intermediate output files from the DADA2 and reference sequence alignment analysis steps. The sequencing report file, in Excel format, contains multiple tabs, which include a summarized surveillance report, sequence information, metadata, and run parameter information. The <italic toggle="yes">Testing Results</italic> tab reports whether samples had acceptable internal control reads and whether samples pass the minimum read cutoff values for any of the specified surveillance species. The <italic toggle="yes">surveillance_counts</italic> tab reports the total number of read pairs observed for each species call in the <italic toggle="yes">surveillance</italic> tab. The <italic toggle="yes">data_by_species</italic> tab displays the reads identified as particular species based on the alignment and read abundance parameters for the MPAS pipeline run. The <italic toggle="yes">all_data</italic> tab contains all the unique sequences identified from each sample and the related alignment information. 
Finally, the <italic toggle="yes">metadata</italic> and <italic toggle="yes">targets</italic> tabs contain the input metadata and target information for the pipeline run.</p></sec></sec><sec id="S7"><label>2.2.</label><title>Pipeline validation</title><sec id="S8"><label>2.2.1.</label><title>Procedure</title><p id="P13">To validate the MPAS pipeline, we analyzed the MPAS assay FASTQ files used in the original description of the MPAS assay by <xref rid="R15" ref-type="bibr">Hojgaard et al. (2020)</xref> and compared the MPAS pipeline results to the original analysis performed using the CLC Genomics Workbench (Qiagen, Germantown, MD, USA) (<xref rid="R15" ref-type="bibr">Hojgaard et al., 2020</xref>). These datasets are derived from 175 host-seeking <italic toggle="yes">I. scapularis</italic> nymphs collected in Connecticut, USA. Briefly, the primary PCR contained primers targeting the four genera associated with <italic toggle="yes">Ixodes</italic>-transmitted human pathogens, <italic toggle="yes">Borrelia</italic> spp. (<italic toggle="yes">flaB</italic>, 335 bp), <italic toggle="yes">Babesia</italic> spp. (<italic toggle="yes">18S</italic>, 247 bp), <italic toggle="yes">Anaplasma</italic> spp. (<italic toggle="yes">groEL</italic>, 315 bp), and <italic toggle="yes">Ehrlichia</italic> spp. (<italic toggle="yes">groEL</italic>, 315 bp), as well as tick actin (156 bp) (<xref rid="R15" ref-type="bibr">Hojgaard et al., 2020</xref>). After the primary PCR, each reaction was purified following the protocol described by <xref rid="R15" ref-type="bibr">Hojgaard et al. (2020)</xref> and indexed using the Nextera XT index kit (Illumina). 
Once the indexes were added, each reaction was purified, pooled, and sequenced following the manufacturer&#x02019;s instructions using the MiSeq Reagent Kit v3 (600-cycle) (Illumina) on the MiSeq instrument (Illumina).</p><p id="P14">The CLC Workbench analysis was run with the customized workflow and reference sequences described by <xref rid="R15" ref-type="bibr">Hojgaard et al. (2020)</xref>. The primary CLC workflow steps include generating QC reports, merging overlapping read pairs, trimming adapters, mapping observed reads to reference sequences, and de novo assembly of unmapped reads (<xref rid="R15" ref-type="bibr">Hojgaard et al., 2020</xref>). The BLAST analysis of the de novo assembly and the target microorganism calls were performed manually (<xref rid="R15" ref-type="bibr">Hojgaard et al., 2020</xref>). Samples were considered positive for a pathogen if the number of mapped reads was 10-fold higher than in the negative controls (<xref rid="R15" ref-type="bibr">Hojgaard et al., 2020</xref>). This dataset was analyzed with the MPAS pipeline using the default minimum read length and read cutoff values described above. The same reference sequences used in the CLC analysis were also used in the MPAS pipeline analysis, with the inclusion of additional tick actin reference sequences (<xref rid="SD3" ref-type="supplementary-material">Supplement A Table 1</xref>). The minimum percent identity and minimum percent aligned cutoffs for the internal tick control were set to 85% and 95%, respectively. The cutoffs for the other reference sequences were set to 95% for minimum percent identity and 99% for minimum percent aligned. The maximum percent gaps was set to 5% for all reference sequences. NCBI BLAST was used to further categorize the reported ASVs found in the samples. The MPAS pipeline was run twice on this dataset to confirm that the pipeline was reproducible in different computing environments. 
The first analysis was initiated using the Conda environment setting on a local Linux workstation, and the second analysis was initiated using the Singularity setting on a Linux terminal in a high-performance computing environment. <xref rid="SD1" ref-type="supplementary-material">Supplement B</xref> contains the <italic toggle="yes">primers.tsv, targets.tsv</italic>, and <italic toggle="yes">surveillance_columns.txt</italic> files used for the MPAS pipeline analysis. All raw FASTQ files are available at NCBI BioProject ID PRJNA937278 (BioSamples: SAMN33395139-SAMN33395313).</p></sec></sec></sec><sec id="S9"><label>3.</label><title>Results</title><p id="P15">In total, 175 samples were analyzed with the MPAS pipeline and CLC Workbench. The MPAS pipeline runs in the two computing environments (Conda, Singularity) produced identical pathogen calls for each sample in the dataset. The median number of normalized reads per target for each computing environment is reported in <xref rid="SD3" ref-type="supplementary-material">Supplement A Table 2</xref>. Because both environments produced identical results, the Singularity environment results are used for the pipeline results described below.</p><p id="P16">After primer trimming, the MPAS pipeline reported an average per-sequence Phred score above 28 for each R1 and R2 file. An average of 89% of reads (standard deviation = 13.4%) in the dataset were assigned to the reference sequences. We have observed that overloading of the Illumina run can produce reads with regions of low quality, particularly at the beginning of reads. The resulting N basecalls cause the MPAS pipeline to fail to recognize expected primer sequences, which leads to negative results. We suggest optimizing the loading concentration for any newly developed MPAS assay. 
The sequencing libraries for this study were optimized to 10 pM.</p><p id="P17">The MPAS pipeline analysis of the samples yielded results similar to those of the CLC analysis (<xref rid="T3" ref-type="table">Tables 3</xref>, <xref rid="T4" ref-type="table">4</xref>). The notable exception was detection of multiple <italic toggle="yes">B. burgdorferi</italic> s.l. strain types of <italic toggle="yes">B. andersonii</italic> in three samples by the MPAS pipeline, whereas the CLC analysis did not separate <italic toggle="yes">B. burgdorferi</italic> s.l. strain types and identified only a single <italic toggle="yes">B. andersonii</italic> sequence in each of the three <italic toggle="yes">B. andersonii</italic>-positive samples. The MPAS pipeline reported four unique sequences in a single sample that were 99&#x02013;100% similar to two <italic toggle="yes">B. burgdorferi</italic> s.l. strain types, <italic toggle="yes">B. burgdorferi</italic> SI-10 (GenBank Accession <ext-link xlink:href="AF264883" ext-link-type="DDBJ/EMBL/GenBank">AF264883</ext-link>) (814 total reads) and <italic toggle="yes">B. burgdorferi</italic> BC-1 (GenBank Accession <ext-link xlink:href="AF264898" ext-link-type="DDBJ/EMBL/GenBank">AF264898</ext-link>) (108 total reads), both of which are considered strain types of <italic toggle="yes">B. andersonii</italic> (<xref rid="R19" ref-type="bibr">Lin et al., 2004</xref>). A second sample contained four unique sequences that were 99&#x02013;100% similar to <italic toggle="yes">B. burgdorferi</italic> SI-10 (1159 total reads), <italic toggle="yes">B. burgdorferi</italic> BC-1 (594 total reads), and <italic toggle="yes">B. andersonii</italic> 21038 (786 total reads), and the third sample contained six unique sequences that were 99&#x02013;100% similar to <italic toggle="yes">B. burgdorferi</italic> strain SI-10 (2529 total reads) and <italic toggle="yes">B. 
burgdorferi</italic> BC-1 (1362 total reads).</p><p id="P18">One sample did not pass the MPAS actin read cutoff value for acceptable DNA, a check that was not a requirement of the CLC workflow. The MPAS pipeline analysis identified a median of 13.0 (range: 0&#x02013;23) tick actin reads and a median of 0 (range: 0) target microorganism reads in the negative controls. The CLC analysis identified a median of 0 (range: 0&#x02013;6) target microorganism reads in the negative controls, and the tick actin reads were not evaluated (<xref rid="R15" ref-type="bibr">Hojgaard et al., 2020</xref>).</p></sec><sec id="S10"><label>4.</label><title>Discussion</title><p id="P19">The new bioinformatics workflow improved sensitivity through enhanced detection of <italic toggle="yes">B. burgdorferi</italic> s.l. co-infections. The pipeline also offers enhanced efficiency through automation and improved reproducibility through use of workflow management software and software containers. The MPAS pipeline creates two primary output report files for a dataset: one containing a summarized report of target microorganism calls, and the other containing the BLAST report for unaligned sequences. These improvements in automation enable the MPAS assay to be used in a high-throughput surveillance program. By comparison, the CLC analysis contains automated primary workflow steps but requires manual target microorganism calls, BLAST search of the unmapped reads, and the creation of summary reports. The MPAS pipeline eliminates the user time required for these manual steps, which improves analysis efficiency and increases reproducibility of the analysis across datasets. An additional benefit of this pipeline is customization. 
The user can customize assay primer information, reference sequences, alignment parameters, and the summary report layout, making this a flexible workflow that can be adapted to other amplicon sequencing assays.</p><p id="P20">All the microorganisms that were identified in the CLC Workbench analysis were also identified in the MPAS pipeline analysis. However, the MPAS pipeline was able to further distinguish <italic toggle="yes">B. burgdorferi</italic> s.l. co-infections, which were not identified by the original CLC analysis. This result is likely attributable to the difference in sequence assembly between the MPAS pipeline and the CLC analysis. The CLC analysis collapses all sequences that fall within the set minimum sequence similarity parameter into one contig, potentially limiting taxonomic resolution, depending on how broadly the parameter is set. In contrast, the MPAS pipeline identifies ASVs, which can differ by as little as one base pair, and then aligns the observed ASVs to the reference sequences and reports the percent similarity to the reference sequence. This difference enables the MPAS pipeline to report all observed ASVs within the minimum percent similarity parameter, improving the taxonomic resolution. The primary objective of the CDC national tick surveillance program is to identify human pathogens transmitted by ticks. The ability to provide enhanced resolution of <italic toggle="yes">B. burgdorferi</italic> s.l. complex co-infections is crucial for tick-borne pathogen surveillance because not all <italic toggle="yes">B. burgdorferi</italic> s.l. species are known to cause disease in humans. Distinguishing between pathogenic and non-pathogenic <italic toggle="yes">B. burgdorferi</italic> s.l. 
species provides a more accurate estimation of exposure risk and pathogen prevalence from tick surveillance data, thus improving public health messaging and intervention strategies.</p><p id="P21">The benefits of the MPAS pipeline, along with the portability and reproducibility of Nextflow pipelines, make this pipeline an ideal analysis tool for high throughput tick and tick-borne pathogen surveillance programs. Through improvements in specificity, the MPAS assay and its associated bioinformatics pipeline improve assessments of acarological risk and aid in tickborne pathogen discovery.</p></sec><sec sec-type="supplementary-material" id="SM1"><title>Supplementary Material</title><supplementary-material id="SD1" position="float" content-type="local-data"><label>primers</label><media xlink:href="NIHMS1967886-supplement-primers.tsv" id="d64e573" position="anchor"/></supplementary-material><supplementary-material id="SD2" position="float" content-type="local-data"><label>sur_col</label><media xlink:href="NIHMS1967886-supplement-sur_col.txt" id="d64e576" position="anchor"/></supplementary-material><supplementary-material id="SD3" position="float" content-type="local-data"><label>supA</label><media xlink:href="NIHMS1967886-supplement-supA.docx" id="d64e579" position="anchor"/></supplementary-material><supplementary-material id="SD4" position="float" content-type="local-data"><label>targets</label><media xlink:href="NIHMS1967886-supplement-targets.tsv" id="d64e582" position="anchor"/></supplementary-material></sec></body><back><ack id="S12"><title>Funding source</title><p id="P23">This work was supported by CDC intramural funding.</p></ack><fn-group><fn id="FN1"><p id="P24">Disclaimer</p><p id="P25">The findings and conclusions in this presentation are those of the author(s) and do not necessarily represent the views of the Centers for Disease Control and Prevention.</p></fn><fn id="FN2"><p id="P26">CRediT authorship contribution statement</p><p id="P27"><bold>Lynn M. 
Osikowicz:</bold> Conceptualization, Investigation, Methodology, Visualization, Software, Writing &#x02013; original draft, Writing &#x02013; review &#x00026; editing. <bold>Andrias Hojgaard:</bold> Conceptualization, Investigation, Methodology, Writing &#x02013; review &#x00026; editing. <bold>Sarah Maes:</bold> Conceptualization, Investigation, Methodology, Writing &#x02013; review &#x00026; editing. <bold>Rebecca J. Eisen:</bold> Conceptualization, Investigation, Methodology, Writing &#x02013; review &#x00026; editing. <bold>Mark D. Stenglein:</bold> Conceptualization, Investigation, Methodology, Visualization, Software, Writing &#x02013; original draft, Writing &#x02013; review &#x00026; editing.</p></fn><fn id="FN3"><p id="P28">Declaration of Competing Interest</p><p id="P29">None.</p></fn><fn id="FN4"><p id="P30">Supplementary materials</p><p id="P31">Supplementary material associated with this article can be found, in the online version, at doi:<ext-link xlink:href="10.1016/j.ttbdis.2023.102207" ext-link-type="doi">10.1016/j.ttbdis.2023.102207</ext-link>.</p></fn></fn-group><sec sec-type="data-availability" id="S11"><title>Data availability</title><p id="P22">The GitHub repository link has been included in the manuscript.</p></sec><ref-list><title>References</title><ref id="R1"><mixed-citation publication-type="journal"><name><surname>Altschul</surname><given-names>SF</given-names></name>, <name><surname>Gish</surname><given-names>W</given-names></name>, <name><surname>Miller</surname><given-names>W</given-names></name>, <name><surname>Myers</surname><given-names>EW</given-names></name>, <name><surname>Lipman</surname><given-names>DJ</given-names></name>, <year>1990</year>. <article-title>Basic local alignment search tool</article-title>. <source>J. Mol. Biol</source>
<pub-id pub-id-type="doi">10.1016/S0022-2836(05)80360-2</pub-id>.</mixed-citation></ref><ref id="R2"><mixed-citation publication-type="book"><collab>Anaconda Software Distribution</collab>, <year>2020</year>. <source>Anaconda Documentation</source>
<publisher-name>Anaconda Inc</publisher-name>. <comment>Retrieved from.</comment>
<comment><ext-link xlink:href="https://docs.anaconda.com/" ext-link-type="uri">https://docs.anaconda.com/</ext-link>.</comment>
<comment>Accessed</comment>
<date-in-citation>9.19.22</date-in-citation>.</mixed-citation></ref><ref id="R3"><mixed-citation publication-type="journal"><name><surname>Andrews</surname><given-names>S</given-names></name>, <year>2010</year>. <article-title>FastQC - A quality control tool for high throughput sequence data</article-title>. <source>Babraham Bioinformatics</source>.</mixed-citation></ref><ref id="R4"><mixed-citation publication-type="journal"><name><surname>Callahan</surname><given-names>BJ</given-names></name>, <name><surname>McMurdie</surname><given-names>PJ</given-names></name>, <name><surname>Rosen</surname><given-names>MJ</given-names></name>, <name><surname>Han</surname><given-names>AW</given-names></name>, <name><surname>Johnson</surname><given-names>AJA</given-names></name>, <name><surname>Holmes</surname><given-names>SP</given-names></name>, <year>2016</year>. <article-title>DADA2: high-resolution sample inference from Illumina amplicon data</article-title>. <source>Nat. Methods</source>
<volume>13</volume>. <pub-id pub-id-type="doi">10.1038/nmeth.3869</pub-id>.</mixed-citation></ref><ref id="R5"><mixed-citation publication-type="journal"><name><surname>Camacho</surname><given-names>C</given-names></name>, <name><surname>Coulouris</surname><given-names>G</given-names></name>, <name><surname>Avagyan</surname><given-names>V</given-names></name>, <name><surname>Ma</surname><given-names>N</given-names></name>, <name><surname>Papadopoulos</surname><given-names>J</given-names></name>, <name><surname>Bealer</surname><given-names>K</given-names></name>, <name><surname>Madden</surname><given-names>TL</given-names></name>, <year>2009</year>. <article-title>BLAST+: architecture and applications</article-title>. <source>BMC Bioinf</source>
<volume>10</volume>
<pub-id pub-id-type="doi">10.1186/1471-2105-10-421</pub-id>.</mixed-citation></ref><ref id="R6"><mixed-citation publication-type="book"><collab>CDC</collab>, <year>2018</year>. <source>Surveillance For Ixodes scapularis and Pathogens Found in This Tick Species in the United States</source>
<publisher-name>CDC</publisher-name>. <comment>URL.</comment>
<comment><ext-link xlink:href="https://www.cdc.gov/ticks/resources/TickSurveillance_Iscapularis-P.pdf" ext-link-type="uri">https://www.cdc.gov/ticks/resources/TickSurveillance_Iscapularis-P.pdf</ext-link>.</comment>
<comment>Accessed</comment>
<date-in-citation>9.19.22</date-in-citation>.</mixed-citation></ref><ref id="R7"><mixed-citation publication-type="journal"><name><surname>Di Tommaso</surname><given-names>P</given-names></name>, <name><surname>Chatzou</surname><given-names>M</given-names></name>, <name><surname>Floden</surname><given-names>EW</given-names></name>, <name><surname>Barja</surname><given-names>PP</given-names></name>, <name><surname>Palumbo</surname><given-names>E</given-names></name>, <name><surname>Notredame</surname><given-names>C</given-names></name>, <year>2017</year>. <article-title>Nextflow enables reproducible computational workflows</article-title>. <source>Nat. Biotechnol</source>
<pub-id pub-id-type="doi">10.1038/nbt.3820</pub-id>.</mixed-citation></ref><ref id="R8"><mixed-citation publication-type="journal"><name><surname>Eaton</surname><given-names>DAR</given-names></name>, <year>2020</year>. <article-title>Toytree: a minimalist tree visualization and manipulation library for Python</article-title>. <source>Methods Ecol. Evol</source>
<volume>11</volume>
<pub-id pub-id-type="doi">10.1111/2041-210X.13313</pub-id>.</mixed-citation></ref><ref id="R9"><mixed-citation publication-type="journal"><name><surname>Eisen</surname><given-names>RJ</given-names></name>, <name><surname>Kugeler</surname><given-names>KJ</given-names></name>, <name><surname>Eisen</surname><given-names>L</given-names></name>, <name><surname>Beard</surname><given-names>CB</given-names></name>, <name><surname>Paddock</surname><given-names>CD</given-names></name>, <year>2017</year>. <article-title>Tick-borne zoonoses in the United States: persistent and emerging threats to human health</article-title>. <source>ILAR J</source>
<volume>58</volume>
<pub-id pub-id-type="doi">10.1093/ilar/ilx005</pub-id>.</mixed-citation></ref><ref id="R10"><mixed-citation publication-type="journal"><name><surname>Eisen</surname><given-names>RJ</given-names></name>, <name><surname>Paddock</surname><given-names>CD</given-names></name>, <year>2021</year>. <article-title>Tick and tickborne pathogen surveillance as a public health tool in the United States</article-title>. <source>J. Med. Entomol</source>
<volume>58</volume>
<pub-id pub-id-type="doi">10.1093/jme/tjaa087</pub-id>.</mixed-citation></ref><ref id="R11"><mixed-citation publication-type="journal"><name><surname>Ewels</surname><given-names>P</given-names></name>, <name><surname>Magnusson</surname><given-names>M</given-names></name>, <name><surname>Lundin</surname><given-names>S</given-names></name>, <name><surname>K&#x000e4;ller</surname><given-names>M</given-names></name>, <year>2016</year>. <article-title>MultiQC: summarize analysis results for multiple tools and samples in a single report</article-title>. <source>Bioinformatics</source>
<volume>32</volume>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btw354</pub-id>.</mixed-citation></ref><ref id="R12"><mixed-citation publication-type="other"><name><surname>Gazoni</surname><given-names>E</given-names></name>, <name><surname>Clark</surname><given-names>C</given-names></name>, <year>2022</year>. <source>openpyxl &#x02013; A Python library to read/write Excel 2010 xlsx/xlsm files. Openpyxl 3.0.10 documentation</source>.</mixed-citation></ref><ref id="R13"><mixed-citation publication-type="journal"><name><surname>Graham</surname><given-names>CB</given-names></name>, <name><surname>Maes</surname><given-names>SE</given-names></name>, <name><surname>Hojgaard</surname><given-names>A</given-names></name>, <name><surname>Fleshman</surname><given-names>AC</given-names></name>, <name><surname>Sheldon</surname><given-names>SW</given-names></name>, <name><surname>Eisen</surname><given-names>RJ</given-names></name>, <year>2018</year>. <article-title>A molecular algorithm to detect and differentiate human pathogens infecting Ixodes scapularis and Ixodes pacificus (Acari: Ixodidae)</article-title>. <source>Ticks Tick Borne Dis</source>
<volume>9</volume>
<pub-id pub-id-type="doi">10.1016/j.ttbdis.2017.12.005</pub-id>.</mixed-citation></ref><ref id="R14"><mixed-citation publication-type="journal"><name><surname>Harris</surname><given-names>CR</given-names></name>, <name><surname>Millman</surname><given-names>KJ</given-names></name>, <name><surname>van der Walt</surname><given-names>SJ</given-names></name>, <name><surname>Gommers</surname><given-names>R</given-names></name>, <name><surname>Virtanen</surname><given-names>P</given-names></name>, <name><surname>Cournapeau</surname><given-names>D</given-names></name>, <name><surname>Wieser</surname><given-names>E</given-names></name>, <name><surname>Taylor</surname><given-names>J</given-names></name>, <name><surname>Berg</surname><given-names>S</given-names></name>, <name><surname>Smith</surname><given-names>NJ</given-names></name>, <name><surname>Kern</surname><given-names>R</given-names></name>, <name><surname>Picus</surname><given-names>M</given-names></name>, <name><surname>Hoyer</surname><given-names>S</given-names></name>, <name><surname>van Kerkwijk</surname><given-names>MH</given-names></name>, <name><surname>Brett</surname><given-names>M</given-names></name>, <name><surname>Haldane</surname><given-names>A</given-names></name>, <name><surname>Del R&#x000ed;o</surname><given-names>JF</given-names></name>, <name><surname>Wiebe</surname><given-names>M</given-names></name>, <name><surname>Peterson</surname><given-names>P</given-names></name>, <name><surname>G&#x000e9;rard-Marchant</surname><given-names>P</given-names></name>, <name><surname>Sheppard</surname><given-names>K</given-names></name>, <name><surname>Reddy</surname><given-names>T</given-names></name>, <name><surname>Weckesser</surname><given-names>W</given-names></name>, <name><surname>Abbasi</surname><given-names>H</given-names></name>, <name><surname>Gohlke</surname><given-names>C</given-names></name>, <name><surname>Oliphant</surname><given-names>TE</given-names></name>, <year>2020</year>. 
<article-title>Array programming with NumPy</article-title>. <source>Nature</source>
<pub-id pub-id-type="doi">10.1038/s41586-020-2649-2</pub-id>.</mixed-citation></ref><ref id="R15"><mixed-citation publication-type="journal"><name><surname>Hojgaard</surname><given-names>A</given-names></name>, <name><surname>Osikowicz</surname><given-names>LM</given-names></name>, <name><surname>Eisen</surname><given-names>L</given-names></name>, <name><surname>Eisen</surname><given-names>RJ</given-names></name>, <year>2020</year>. <article-title>Evaluation of a novel multiplex PCR amplicon sequencing assay for detection of human pathogens in Ixodes ticks</article-title>. <source>Ticks Tick Borne Dis</source>
<volume>11</volume>
<pub-id pub-id-type="doi">10.1016/j.ttbdis.2020.101504</pub-id>.</mixed-citation></ref><ref id="R16"><mixed-citation publication-type="journal"><name><surname>Katoh</surname><given-names>K</given-names></name>, <name><surname>Standley</surname><given-names>DM</given-names></name>, <year>2013</year>. <article-title>MAFFT multiple sequence alignment software version 7: improvements in performance and usability</article-title>. <source>Mol. Biol. Evol</source>
<volume>30</volume>
<pub-id pub-id-type="doi">10.1093/molbev/mst010</pub-id>.</mixed-citation></ref><ref id="R17"><mixed-citation publication-type="journal"><name><surname>Kugeler</surname><given-names>KJ</given-names></name>, <name><surname>Schwartz</surname><given-names>AM</given-names></name>, <name><surname>Delorey</surname><given-names>MJ</given-names></name>, <name><surname>Mead</surname><given-names>PS</given-names></name>, <name><surname>Hinckley</surname><given-names>AF</given-names></name>, <year>2021</year>. <article-title>Estimating the frequency of Lyme disease diagnoses, United States, 2010&#x02013;2018</article-title>. <source>Emerg. Infect. Dis</source>
<pub-id pub-id-type="doi">10.3201/eid2702.202731</pub-id>.</mixed-citation></ref><ref id="R18"><mixed-citation publication-type="journal"><name><surname>Kurtzer</surname><given-names>GM</given-names></name>, <name><surname>Sochat</surname><given-names>V</given-names></name>, <name><surname>Bauer</surname><given-names>MW</given-names></name>, <year>2017</year>. <article-title>Singularity: scientific containers for mobility of compute</article-title>. <source>PLoS One</source>
<volume>12</volume>. <pub-id pub-id-type="doi">10.1371/journal.pone.0177459</pub-id>.</mixed-citation></ref><ref id="R19"><mixed-citation publication-type="journal"><name><surname>Lin</surname><given-names>T</given-names></name>, <name><surname>Oliver</surname><given-names>JH</given-names><suffix>Jr.</suffix></name>, <name><surname>Gao</surname><given-names>L</given-names></name>, <year>2004</year>. <article-title>Molecular characterization of Borrelia isolates from ticks and mammals from the southern United States</article-title>. <source>J. Parasitol</source>
<volume>90</volume> (<issue>6</issue>), <fpage>1298</fpage>&#x02013;<lpage>1307</lpage>. <pub-id pub-id-type="doi">10.1645/GE-195R1</pub-id>.<pub-id pub-id-type="pmid">15715220</pub-id>
</mixed-citation></ref><ref id="R20"><mixed-citation publication-type="journal"><name><surname>Martin</surname><given-names>M</given-names></name>, <year>2011</year>. <article-title>Cutadapt removes adapter sequences from high-throughput sequencing reads</article-title>. <source>EMBnet. J</source>
<volume>17</volume>
<pub-id pub-id-type="doi">10.14806/ej.17.1.200</pub-id>.</mixed-citation></ref><ref id="R21"><mixed-citation publication-type="journal"><name><surname>McKinney</surname><given-names>W</given-names></name>, <year>2010</year>. <article-title>Data structures for statistical computing in Python</article-title>. In: <source>Proceedings of the 9th Python in Science Conference</source>
<pub-id pub-id-type="doi">10.25080/majora-92bf1922-00a</pub-id>.</mixed-citation></ref><ref id="R22"><mixed-citation publication-type="journal"><name><surname>Minh</surname><given-names>BQ</given-names></name>, <name><surname>Schmidt</surname><given-names>HA</given-names></name>, <name><surname>Chernomor</surname><given-names>O</given-names></name>, <name><surname>Schrempf</surname><given-names>D</given-names></name>, <name><surname>Woodhams</surname><given-names>MD</given-names></name>, <name><surname>Haeseler</surname><given-names>AV</given-names></name>, <name><surname>Lanfear</surname><given-names>R</given-names></name>, <year>2020</year>. <article-title>IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era</article-title>. <source>Mol. Biol. Evol</source>
<volume>37</volume> (<issue>5</issue>), <fpage>1530</fpage>&#x02013;<lpage>1534</lpage>. <pub-id pub-id-type="doi">10.1093/molbev/msaa015</pub-id>. <month>May</month>.<pub-id pub-id-type="pmid">32011700</pub-id>
</mixed-citation></ref><ref id="R23"><mixed-citation publication-type="journal"><name><surname>Pratas</surname><given-names>D</given-names></name>, <name><surname>Pinho</surname><given-names>AJ</given-names></name>, <name><surname>Rodrigues</surname><given-names>JMOS</given-names></name>, <year>2014</year>. <article-title>XS: a FASTQ read simulator</article-title>. <source>BMC Res. Notes</source>
<volume>7</volume>. <pub-id pub-id-type="doi">10.1186/1756-0500-7-40</pub-id>.</mixed-citation></ref><ref id="R24"><mixed-citation publication-type="book"><collab>R Core Team</collab>, <year>2021</year>. <source>R: A Language and Environment for Statistical Computing v. 3.6.1</source>
<publisher-name>R Foundation for Statistical Computing</publisher-name>, <publisher-loc>Vienna, Austria</publisher-loc>, <volume>2019</volume>
<comment>URL. <ext-link xlink:href="http://www.R-project.org/" ext-link-type="uri">http://www.R-project.org/</ext-link>.</comment></mixed-citation></ref><ref id="R25"><mixed-citation publication-type="journal"><name><surname>Rosenberg</surname><given-names>R</given-names></name>, <name><surname>Lindsey</surname><given-names>NP</given-names></name>, <name><surname>Fischer</surname><given-names>M</given-names></name>, <name><surname>Gregory</surname><given-names>CJ</given-names></name>, <name><surname>Hinckley</surname><given-names>AF</given-names></name>, <name><surname>Mead</surname><given-names>PS</given-names></name>, <name><surname>Paz-Bailey</surname><given-names>G</given-names></name>, <name><surname>Waterman</surname><given-names>SH</given-names></name>, <name><surname>Drexler</surname><given-names>NA</given-names></name>, <name><surname>Kersh</surname><given-names>GJ</given-names></name>, <name><surname>Hooks</surname><given-names>H</given-names></name>, <name><surname>Partridge</surname><given-names>SK</given-names></name>, <name><surname>Visser</surname><given-names>SN</given-names></name>, <name><surname>Beard</surname><given-names>CB</given-names></name>, <name><surname>Petersen</surname><given-names>LR</given-names></name>, <year>2018</year>. <article-title>Vital signs: trends in reported vectorborne disease cases - United States and territories, 2004&#x02013;2016</article-title>. <source>MMWR Morb. Mortal. Wkly. Rep</source>
<volume>67</volume> (<issue>17</issue>), <fpage>496</fpage>&#x02013;<lpage>501</lpage>. <pub-id pub-id-type="doi">10.15585/mmwr.mm6717e1</pub-id>. <month>May</month>
<day>4</day>.<pub-id pub-id-type="pmid">29723166</pub-id>
</mixed-citation></ref><ref id="R26"><mixed-citation publication-type="journal"><name><surname>Rudenko</surname><given-names>N</given-names></name>, <name><surname>Golovchenko</surname><given-names>M</given-names></name>, <name><surname>Grubhoffer</surname><given-names>L</given-names></name>, <name><surname>Oliver</surname><given-names>JH</given-names><suffix>Jr.</suffix></name>, <year>2011</year>. <article-title>Updates on Borrelia burgdorferi sensu lato complex with respect to public health</article-title>. <source>Ticks Tick Borne Dis</source>
<volume>2</volume> (<issue>3</issue>), <fpage>123</fpage>&#x02013;<lpage>128</lpage>. <pub-id pub-id-type="doi">10.1016/j.ttbdis.2011.04.002</pub-id>.<pub-id pub-id-type="pmid">21890064</pub-id>
</mixed-citation></ref><ref id="R27"><mixed-citation publication-type="webpage"><name><surname>Schauberger</surname><given-names>P</given-names></name>, <name><surname>Walker</surname><given-names>A</given-names></name>, <year>2022</year>. <source>Read, Write and Edit xlsx Files</source>
<comment><ext-link xlink:href="https://ycphs.github.io/openxlsx/index.html" ext-link-type="uri">https://ycphs.github.io/openxlsx/index.html</ext-link></comment>, <comment><ext-link xlink:href="https://github.com/ycphs/openxlsx" ext-link-type="uri">https://github.com/ycphs/openxlsx</ext-link>.</comment></mixed-citation></ref><ref id="R28"><mixed-citation publication-type="journal"><name><surname>Wickham</surname><given-names>H</given-names></name>, <name><surname>Averick</surname><given-names>M</given-names></name>, <name><surname>Bryan</surname><given-names>J</given-names></name>, <name><surname>Chang</surname><given-names>W</given-names></name>, <name><surname>McGowan</surname><given-names>L</given-names></name>, <name><surname>Fran&#x000e7;ois</surname><given-names>R</given-names></name>, <name><surname>Grolemund</surname><given-names>G</given-names></name>, <name><surname>Hayes</surname><given-names>A</given-names></name>, <name><surname>Henry</surname><given-names>L</given-names></name>, <name><surname>Hester</surname><given-names>J</given-names></name>, <name><surname>Kuhn</surname><given-names>M</given-names></name>, <name><surname>Pedersen</surname><given-names>T</given-names></name>, <name><surname>Miller</surname><given-names>E</given-names></name>, <name><surname>Bache</surname><given-names>S</given-names></name>, <name><surname>M&#x000fc;ller</surname><given-names>K</given-names></name>, <name><surname>Ooms</surname><given-names>J</given-names></name>, <name><surname>Robinson</surname><given-names>D</given-names></name>, <name><surname>Seidel</surname><given-names>D</given-names></name>, <name><surname>Spinu</surname><given-names>V</given-names></name>, <name><surname>Takahashi</surname><given-names>K</given-names></name>, <name><surname>Vaughan</surname><given-names>D</given-names></name>, <name><surname>Wilke</surname><given-names>C</given-names></name>, <name><surname>Woo</surname><given-names>K</given-names></name>, 
<name><surname>Yutani</surname><given-names>H</given-names></name>, <year>2019a</year>. <article-title>Welcome to the Tidyverse</article-title>. <source>J. Open Source Softw</source>
<volume>4</volume>
<pub-id pub-id-type="doi">10.21105/joss.01686</pub-id>.</mixed-citation></ref><ref id="R29"><mixed-citation publication-type="journal"><name><surname>Wolcott</surname><given-names>KA</given-names></name>, <name><surname>Margos</surname><given-names>G</given-names></name>, <name><surname>Fingerle</surname><given-names>V</given-names></name>, <name><surname>Becker</surname><given-names>NS</given-names></name>, <year>2021</year>. <article-title>Host association of Borrelia burgdorferi sensu lato: a review</article-title>. <source>Ticks Tick Borne Dis</source>
<volume>12</volume> (<issue>5</issue>), <fpage>101766</fpage>
<pub-id pub-id-type="doi">10.1016/j.ttbdis.2021.101766</pub-id>.<pub-id pub-id-type="pmid">34161868</pub-id>
</mixed-citation></ref><ref id="R30"><mixed-citation publication-type="webpage"><name><surname>Wickham</surname><given-names>H</given-names></name>, <name><surname>Bryan</surname><given-names>J</given-names></name>, <collab>RStudio</collab>, <name><surname>Kalicinski</surname><given-names>M</given-names></name>, <name><surname>Valery</surname><given-names>K</given-names></name>, <name><surname>Leitienne</surname><given-names>C</given-names></name>, <name><surname>Colbert</surname><given-names>B</given-names></name>, <name><surname>Hoerl</surname><given-names>D</given-names></name>, <name><surname>Miller</surname><given-names>E</given-names></name>, <year>2019b</year>. <source>readxl: read Excel files. R package version 1.3.1</source>
<comment><ext-link xlink:href="https://CRAN.R-project.org/package=readxl" ext-link-type="uri">https://CRAN.R-project.org/package=readxl</ext-link>.</comment></mixed-citation></ref><ref id="R31"><mixed-citation publication-type="journal"><name><surname>Xie</surname><given-names>Y</given-names></name>, <name><surname>Cheng</surname><given-names>J</given-names></name>, <name><surname>Tan</surname><given-names>X</given-names></name>, <year>2022</year>. <article-title>DT: a Wrapper of the JavaScript Library &#x0201c;DataTables&#x0201d;</article-title>. <source>R package version 0.22</source> (<comment><ext-link xlink:href="http://r-project.org" ext-link-type="uri">r-project.org</ext-link></comment>).</mixed-citation></ref></ref-list></back><floats-group><fig position="float" id="F1"><label>Fig. 1.</label><caption><p id="P32">The MPAS Pipeline workflow. Solid lines indicate data flow and dashed ovals indicate workflow processes. 
Analysis tools are in bold.</p></caption><graphic xlink:href="nihms-1967886-f0001" position="float"/></fig><table-wrap position="float" id="T1"><label>Table 1</label><caption><p id="P33">The required input files for the MPAS pipeline.</p></caption><table frame="hsides" rules="groups"><colgroup span="1"><col align="left" valign="middle" span="1"/><col align="left" valign="middle" span="1"/><col align="left" valign="middle" span="1"/></colgroup><thead><tr><th align="left" valign="top" rowspan="1" colspan="1">Required input files</th><th align="left" valign="top" rowspan="1" colspan="1">Format</th><th align="left" valign="top" rowspan="1" colspan="1">Required information</th></tr></thead><tbody><tr><td align="left" valign="top" rowspan="1" colspan="1">metadata</td><td align="left" valign="top" rowspan="1" colspan="1">tsv</td><td align="left" valign="top" rowspan="1" colspan="1">Column names: Index, Pathogen_Testing_ID, CSID, State, Morphological_Ectoparasite_Genus_Species, Lifestage</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">primers</td><td align="left" valign="top" rowspan="1" colspan="1">tsv</td><td align="left" valign="top" rowspan="1" colspan="1">Forward primer name, forward primer sequence, reverse primer name, reverse primer sequence</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">targets</td><td align="left" valign="top" rowspan="1" colspan="1">tsv</td><td align="left" valign="top" rowspan="1" colspan="1">Reference sequence name, species, primer name, reporting columns, minimum percent identity, minimum percent aligned, max percent gaps, internal control designation, sequence</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">surveillance_columns</td><td align="left" valign="top" rowspan="1" colspan="1">txt</td><td align="left" valign="top" rowspan="1" colspan="1">Column name entry must match the reporting column from targets.tsv file</td></tr><tr><td align="left" valign="top" rowspan="1" 
colspan="1">Sequence files</td><td align="left" valign="top" rowspan="1" colspan="1">FASTQ</td><td align="left" valign="top" rowspan="1" colspan="1">Demultiplexed paired-end FASTQ files from one or more paired-end Illumina 500&#x02013;600 cycle sequencing runs</td></tr></tbody></table></table-wrap><table-wrap position="float" id="T2" orientation="landscape"><label>Table 2</label><caption><p id="P34">Analysis software used in the MPAS pipeline.</p></caption><table frame="hsides" rules="groups"><colgroup span="1"><col align="left" valign="middle" span="1"/><col align="left" valign="middle" span="1"/><col align="left" valign="middle" span="1"/><col align="left" valign="middle" span="1"/></colgroup><thead><tr><th align="left" valign="top" rowspan="1" colspan="1">Software</th><th align="left" valign="top" rowspan="1" colspan="1">Version</th><th align="left" valign="top" rowspan="1" colspan="1">Conda Channel</th><th align="left" valign="top" rowspan="1" colspan="1">Refs.</th></tr></thead><tbody><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>BLAST</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">2.10.<xref rid="TFN1" ref-type="table-fn">*</xref></td><td align="left" valign="top" rowspan="1" colspan="1">Bioconda</td><td align="left" valign="top" rowspan="1" colspan="1"><xref rid="R1" ref-type="bibr">Altschul et al., 1990</xref>; <xref rid="R5" ref-type="bibr">Camacho et al., 2009</xref></td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>Cutadapt</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">3</td><td align="left" valign="top" rowspan="1" colspan="1">Bioconda</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R20" ref-type="bibr">Martin, 2011</xref>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>DADA2</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">1.18.<xref rid="TFN1" ref-type="table-fn">*</xref></td><td align="left" valign="top" rowspan="1" colspan="1">Bioconda</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R4" ref-type="bibr">Callahan et al., 2016</xref>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>FastQC</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">0.11.<xref rid="TFN1" ref-type="table-fn">*</xref></td><td align="left" valign="top" rowspan="1" colspan="1">Bioconda</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R3" ref-type="bibr">Andrews, 2010</xref>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>IQ-TREE</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">2.2.0.3</td><td align="left" valign="top" rowspan="1" colspan="1">Bioconda</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R22" ref-type="bibr">Minh et al., 2020</xref>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>MAFFT</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">7.508</td><td align="left" valign="top" rowspan="1" colspan="1">Bioconda</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R16" ref-type="bibr">Katoh and Standley, 2013</xref>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>MultiQC</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">1.1</td><td align="left" valign="top" rowspan="1" colspan="1">Bioconda</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R11" ref-type="bibr">Ewels et al., 2016</xref>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>Nextflow</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">20.10.<xref rid="TFN1" ref-type="table-fn">*</xref></td><td align="left" valign="top" rowspan="1" colspan="1">Bioconda</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R7" ref-type="bibr">Di Tommaso et al., 2017</xref>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>Numpy</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">1.22.<xref rid="TFN1" ref-type="table-fn">*</xref></td><td align="left" valign="top" rowspan="1" colspan="1">Anaconda</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R14" ref-type="bibr">Harris et al., 2020</xref>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>Openpyxl</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">3.0.10</td><td align="left" valign="top" rowspan="1" colspan="1">Anaconda</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R12" ref-type="bibr">Eric Gazoni and Charlie Clark, 2022</xref>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>Pandas</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">1.4.4</td><td align="left" valign="top" rowspan="1" colspan="1">Anaconda</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R21" ref-type="bibr">McKinney, 2010</xref>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>R-BASE</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">4.0.<xref rid="TFN1" ref-type="table-fn">*</xref></td><td align="left" valign="top" rowspan="1" colspan="1">Conda-forge</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R24" ref-type="bibr">R Core Team, 2021</xref>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>R-dt</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">0.17</td><td align="left" valign="top" rowspan="1" colspan="1">Conda-forge</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R31" ref-type="bibr">Yihui Xie et al., 2022</xref>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>R-openxlsx</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">4.2.<xref rid="TFN1" ref-type="table-fn">*</xref></td><td align="left" valign="top" rowspan="1" colspan="1">Conda-forge</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R27" ref-type="bibr">Schauberger and Walker, 2022</xref>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>R-readxl</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">1.3.<xref rid="TFN1" ref-type="table-fn">*</xref></td><td align="left" valign="top" rowspan="1" colspan="1">Conda-forge</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R30" ref-type="bibr">Wickham et al., 2019b</xref>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>R-Tidyverse</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">1.3.<xref rid="TFN1" ref-type="table-fn">*</xref></td><td align="left" valign="top" rowspan="1" colspan="1">Conda-forge</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R28" ref-type="bibr">Wickham et al., 2019a</xref>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>ToyTree</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">2.0.5</td><td align="left" valign="top" rowspan="1" colspan="1">Conda-forge</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R8" ref-type="bibr">Eaton, 2020</xref>
</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<bold>XS</bold>
</td><td align="left" valign="top" rowspan="1" colspan="1">1.0.<xref rid="TFN1" ref-type="table-fn">*</xref></td><td align="left" valign="top" rowspan="1" colspan="1">Conda-forge</td><td align="left" valign="top" rowspan="1" colspan="1">
<xref rid="R23" ref-type="bibr">Pratas et al., 2014</xref>
</td></tr></tbody></table><table-wrap-foot><fn id="TFN1"><label>*</label><p id="P35">Indicates the latest update within the version number.</p></fn></table-wrap-foot></table-wrap><table-wrap position="float" id="T3"><label>Table 3</label><caption><p id="P36">Comparison of the targets detected in 175 <italic toggle="yes">I. scapularis</italic> nymphs analyzed with the CLC Genomics Workbench (Qiagen) and the MPAS Pipeline. The CLC analysis data was originally reported in <xref rid="R15" ref-type="bibr">Hojgaard et al. (2020)</xref>.</p></caption><table frame="hsides" rules="groups"><colgroup span="1"><col align="left" valign="middle" span="1"/><col align="left" valign="middle" span="1"/><col align="left" valign="middle" span="1"/></colgroup><thead><tr><th align="left" valign="top" rowspan="1" colspan="1">Identified Microorganism</th><th colspan="2" align="left" valign="top" rowspan="1">Number of positive <italic toggle="yes">I. scapularis</italic><xref rid="TFN3" ref-type="table-fn">*</xref> nymphs (%) of 175 tested</th></tr><tr><th align="left" valign="top" rowspan="1" colspan="1"/><th align="left" valign="top" rowspan="1" colspan="1">CLC Analysis</th><th align="left" valign="top" rowspan="1" colspan="1">MPAS Pipeline Analysis</th></tr></thead><tbody><tr><td align="left" valign="top" rowspan="1" colspan="1">
<italic toggle="yes">A. phagocytophilum</italic>
</td><td align="left" valign="top" rowspan="1" colspan="1">5 (2.9)</td><td align="left" valign="top" rowspan="1" colspan="1">5 (2.9)</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<italic toggle="yes">Ba. microti</italic>
</td><td align="left" valign="top" rowspan="1" colspan="1">16 (9.1)</td><td align="left" valign="top" rowspan="1" colspan="1">16 (9.1)</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<italic toggle="yes">Ba. odocoilei</italic>
</td><td align="left" valign="top" rowspan="1" colspan="1">21 (12.0)</td><td align="left" valign="top" rowspan="1" colspan="1">21 (12.0)</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1"><italic toggle="yes">B. burgdorferi s.s</italic>.</td><td align="left" valign="top" rowspan="1" colspan="1">29 (16.6)</td><td align="left" valign="top" rowspan="1" colspan="1">29 (16.6)</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<italic toggle="yes">B. miyamotoi</italic>
</td><td align="left" valign="top" rowspan="1" colspan="1">4 (2.3)</td><td align="left" valign="top" rowspan="1" colspan="1">4 (2.3)</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">
<italic toggle="yes">B. andersonii</italic>
</td><td align="left" valign="top" rowspan="1" colspan="1">4 (2.3)</td><td align="left" valign="top" rowspan="1" colspan="1">4 (2.3)</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1">Acceptable DNA<sup><xref rid="TFN2" ref-type="table-fn">a</xref></sup></td><td align="left" valign="top" rowspan="1" colspan="1">NA</td><td align="left" valign="top" rowspan="1" colspan="1">174 (99.4)</td></tr></tbody></table><table-wrap-foot><fn id="TFN2"><label>a</label><p id="P37">Acceptable tick actin reads were not evaluated in the CLC analysis.</p></fn><fn id="TFN3"><label>*</label><p id="P38">Tick identification was based on morphology.</p></fn></table-wrap-foot></table-wrap><table-wrap position="float" id="T4"><label>Table 4</label><caption><p id="P39">Comparison of the co-infections detected in the 175 <italic toggle="yes">I. scapularis</italic> nymphs analyzed with the CLC Genomics Workbench and the MPAS Pipeline. The CLC analysis data was originally reported in <xref rid="R15" ref-type="bibr">Hojgaard et al. (2020)</xref>.</p></caption><table frame="hsides" rules="groups"><colgroup span="1"><col align="left" valign="middle" span="1"/><col align="left" valign="middle" span="1"/><col align="left" valign="middle" span="1"/></colgroup><thead><tr><th align="left" valign="top" rowspan="1" colspan="1">Identified Co-infections</th><th colspan="2" align="left" valign="bottom" rowspan="1">Number of positive <italic toggle="yes">I. scapularis</italic><xref rid="TFN4" ref-type="table-fn">*</xref> nymphs (%) of 175 tested</th></tr><tr><th align="left" valign="top" rowspan="1" colspan="1"/><th align="left" valign="top" rowspan="1" colspan="1">CLC Analysis</th><th align="left" valign="top" rowspan="1" colspan="1">MPAS Pipeline Analysis</th></tr></thead><tbody><tr><td align="left" valign="bottom" rowspan="1" colspan="1"><italic toggle="yes">B. burgdorferi s.s</italic>. + <italic toggle="yes">B. 
miyamotoi</italic></td><td align="left" valign="bottom" rowspan="1" colspan="1">1 (0.6)</td><td align="left" valign="bottom" rowspan="1" colspan="1">1 (0.6)</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1"><italic toggle="yes">B. burgdorferi s.s</italic>. + <italic toggle="yes">Ba. microti</italic></td><td align="left" valign="top" rowspan="1" colspan="1">10 (5.7)</td><td align="left" valign="top" rowspan="1" colspan="1">10 (5.7)</td></tr><tr><td align="left" valign="middle" rowspan="1" colspan="1"><italic toggle="yes">B. burgdorferi s.s</italic>. + <italic toggle="yes">Ba. odocoilei</italic></td><td align="left" valign="middle" rowspan="1" colspan="1">2 (1.1)</td><td align="left" valign="middle" rowspan="1" colspan="1">2 (1.1)</td></tr><tr><td align="left" valign="middle" rowspan="1" colspan="1"><italic toggle="yes">B. burgdorferi s.s</italic>. + <italic toggle="yes">A. phagocytophilum</italic></td><td align="left" valign="middle" rowspan="1" colspan="1">1 (0.6)</td><td align="left" valign="middle" rowspan="1" colspan="1">1 (0.6)</td></tr><tr><td align="left" valign="middle" rowspan="1" colspan="1"><italic toggle="yes">Ba. microti</italic> + <italic toggle="yes">A. phagocytophilum</italic></td><td align="left" valign="middle" rowspan="1" colspan="1">1 (0.6)</td><td align="left" valign="middle" rowspan="1" colspan="1">1 (0.6)</td></tr><tr><td align="left" valign="top" rowspan="1" colspan="1"><italic toggle="yes">B. burgdorferi s.s</italic>. + <italic toggle="yes">A. phagocytophilum</italic> + <italic toggle="yes">Ba. microti</italic> + <italic toggle="yes">Ba. odocoilei</italic></td><td align="left" valign="top" rowspan="1" colspan="1">1 (0.6)</td><td align="left" valign="top" rowspan="1" colspan="1">1 (0.6)</td></tr></tbody></table><table-wrap-foot><fn id="TFN4"><label>*</label><p id="P40">Tick identification was based on morphology.</p></fn></table-wrap-foot></table-wrap></floats-group></article>