Identification and Characterization of the Echinocandin B Biosynthetic Gene Cluster from Emericella rugulosa NRRL 11440

Ralph A. Cacho (1), Wei Jiang (3), Yit-Heng Chooi (1), Christopher T. Walsh (3)*, Yi Tang (1,2)*

(1) Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, 420 Westwood Plaza, Los Angeles, CA 90095
(2) Department of Chemistry and Biochemistry, University of California, Los Angeles, 607 Charles E. Young Drive East, Los Angeles, CA 90095
(3) Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, 200 Longwood Ave, Boston, MA 02115

*Corresponding Authors: yitang@ucla.edu, christopher_walsh@hms.harvard.edu

Journal of the American Chemical Society (J. Am. Chem. Soc.) 2012, 134 (40), 16781–16790. ISSN 0002-7863, 1520-5126. DOI: 10.1021/ja307220z. PMID: 22998630. PMCID: PMC3482383. NIHMSID: NIHMS410235.

Echinocandins are a family of fungal lipidated cyclic hexapeptide natural products. Due to their effectiveness as antifungal agents, three semisynthetic derivatives have been developed and approved for treatment of human invasive candidiasis. All six of the amino acid residues are hydroxylated, including 4R,5R-dihydroxy-L-ornithine, 4R-hydroxyl-L-proline, 3S,4S-dihydroxy-L-homotyrosine, and 3S-hydroxyl,4S-methyl-L-proline. We report here the biosynthetic gene cluster of echinocandin B 1 from Emericella rugulosa NRRL 11440, containing genes encoding a six-module nonribosomal peptide synthetase EcdA, an acyl-AMP ligase EcdI and oxygenases EcdG, EcdH and EcdK. We showed that EcdI activates linoleate as linoleoyl-AMP and installs it on the first thiolation domain of EcdA. We have also established through ATP:PPi exchange assay that EcdA loads L-ornithine in the first module. A separate hty gene cluster, found elsewhere in the sequenced genome, encodes four enzymes for de novo generation of L-homotyrosine from acetyl-CoA and 4-hydroxyphenylpyruvate. Deletions in the ecdA and htyA genes validate their essential roles in echinocandin B production. Five predicted iron-centered oxygenase genes, ecdG, ecdH, ecdK, htyE and htyF, in the two separate ecd and hty clusters are likely to be the tailoring oxygenases for maturation of the nascent NRPS lipohexapeptidolactam product.

Keywords: antifungal; nonribosomal peptide; lipopeptide; unnatural amino acids; genome mining

Funding: National Institute of General Medical Sciences (NIGMS), R01 GM092217

INTRODUCTION

Invasive candidiasis, caused by opportunistic pathogenic strains of the genus Candida, accounts for 17% of ICU-related infections, the third highest after Staphylococcus aureus and Pseudomonas spp.-related infections.1 Moreover, there has been a steady increase in the incidence of invasive candidiasis correlating with the increased use of immunosuppressants, broad-spectrum antibiotics, intravenous catheters, prosthetics and invasive clinical procedures.2,3 Echinocandins, a family of lipohexapeptides that prevent fungal cell wall synthesis through noncompetitive inhibition of 1,3-β-glucan synthase, rapidly rose to the top ranks of antifungal agents due to their activity against a wide range of Candida spp., in particular azole-resistant strains, and their significantly lower toxicity compared to amphotericin B.4,5

Echinocandin B 1, the first echinocandin discovered, was isolated from Aspergillus nidulans var. echinulatus6 and A. nidulans var. roseus NRRL 11440.7 While 1 and family members subsequently isolated, such as pneumocandin A0 4,8 aculeacin,9 cryptocandin,10 and mulundocandin,11 show anti-Candida activity, the hemolytic properties of natural echinocandins prevented their use as therapeutics. Derivatization of 1 and 4, especially in the fatty acid moiety, led to the development of cilofungin 2,12,13 anidulafungin 3 (Eraxis, Pfizer),14 caspofungin 5 (Cancidas, Merck and Co),15 and micafungin 6 (Mycamine, Astellas Pharma)16 (Scheme 1), which are less hemolytic while retaining the bioactivity of their parent compounds. For example, 3, a semisynthetic derivative of 1 containing a substituted terphenyl acyl chain, was approved by the FDA in 2006.14,17

In addition to their antifungal activity, echinocandins display interesting biosynthetic features in their structures. Aside from the long chain fatty acyl amide, the presence of the nonproteinogenic amino acids 4R,5R-dihydroxyl-L-ornithine, 3S-hydroxyl,4S-methyl-L-proline, 4R-hydroxyl-L-proline and 3S,4S-dihydroxyl-L-homotyrosine suggests that echinocandins are synthesized by nonribosomal peptide synthetases (NRPSs). An intriguing feature of echinocandins is the presence of multiple alcohol and diol groups within the scaffold as a result of the incorporation of these unnatural residues. The most distinctive of these are the hydroxyl group at Cδ of L-ornithine, which creates a hydrolytically labile hemiaminal in the macrocycle,18,19 and the vicinal diol found in 3S,4S-dihydroxy-L-homotyrosine. Enzymatic Cδ-hydroxylation of L-ornithine is a particularly novel and challenging reaction due to the presence of the neighboring amine. Because of this unstable linkage, the total synthesis of 1 has not been demonstrated so far. Only simpler versions without the hydroxyl groups in the L-ornithine position, such as echinocandin D, have been synthesized and used in SAR studies.20–22

Despite the medical importance and intriguing structural motifs in echinocandin, the genetic and molecular basis for the biosynthesis of this family has remained unknown to date. The enzymology (sequence, specificity, type of oxygenase) behind this multitude of hydroxylation steps is also unresolved. A. nidulans var. roseus NRRL 11440 (ATCC 58397) is an industrially important strain that produces 1, which is a precursor for the semisynthetic 3 (Scheme 1). A polyphasic characterization showed that this strain should belong to the Emericella rugulosa species and henceforth, we use that nomenclature here.23 In this study, we report the discovery of the gene cluster of 1 in this strain through gene deletion of a multimodular NRPS and biochemical characterization of the enzymes involved in the lipo-initiation process. Through gene deletion and chemical complementation, we have also uncovered the separate gene cluster responsible for the biosynthesis of L-homotyrosine in the peptide scaffold of 1.

RESULTS

Whole Genome Sequencing and Analysis of NRPS Gene Clusters of Emericella rugulosa NRRL 11440

Whole genome shotgun sequencing of E. rugulosa NRRL 11440 was performed using Illumina HiSeq2000 to generate ~17 Gbp of sequence. Assembly of the sequence reads generated 433 contigs with N50 length of 235,313 bases (See Table S1). The total length of the 433 contigs amounts to 32,224,016 bases, which is slightly larger than the genome size of the previously sequenced A. nidulans A4 strain of ~31 Mb.24
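
The assembly statistics above can be sanity-checked with a few lines of Python. The sketch below estimates the average sequencing depth from the reported ~17 Gbp of reads and the 32,224,016-base assembly, and shows how an N50 value is computed from a list of contig lengths. The contig lengths in the example are hypothetical toy values, not the actual E. rugulosa contigs, and the function names are illustrative only.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


def mean_depth(total_read_bases, assembly_length):
    """Approximate average coverage as total read bases / assembly length."""
    return total_read_bases / assembly_length


# Values reported in the text (the read total is approximate).
TOTAL_READ_BASES = 17e9
ASSEMBLY_LENGTH = 32_224_016

depth = mean_depth(TOTAL_READ_BASES, ASSEMBLY_LENGTH)
print(f"approximate mean depth: {depth:.0f}x")

# Hypothetical toy contigs, used only to illustrate the N50 definition.
toy_contigs = [100, 200, 300, 400, 500]
print("toy N50:", n50(toy_contigs))
```

With the reported figures, the estimated depth is on the order of several hundred-fold, which is a rough estimate because the read total is only given approximately.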

Using the first adenylation (A) domain of the enzyme TqaA (accession number: ADY16697) from the tryptoquialanine pathway25 as BLAST query, we were able to find 26 putative NRPS genes in E. rugulosa (Table S2). Four out of the twenty-six NRPSs, which we denoted as ErNRPSs, are NRPSs with five or more modules. Two of these four ErNRPSs, annotated as ErNRPS99 and ErNRPS57, are also found in the model fungi A. nidulans A4 that does not produce 1. ErNRPS99 is likely a homolog of EasA found in the biosynthesis of emericellamide26 due to the high shared sequence identity between the two. ErNRPS57, on the other hand, contains a terminal reductive domain (R) similar to what is found in peptaibol synthetases.27 Thus, this leaves ErNRPS284 and ErNRPS123 as echinocandin synthetase candidates. ErNRPS284 (Table S2) contains five modules, which are insufficient for catalyzing the formation of the hexalipopeptide scaffold of 1 based on the colinearity hypothesis.28 On the other hand, the six-module ErNRPS123 (799 kDa), which is annotated as ecdA, has the correct number of modules necessary for the assembly of 1. However, performing A domain selectivity prediction using NRPS predictor29 offered little confirmation that this is the correct echinocandin NRPS, with only the third A domain prediction (L-proline) matching the corresponding amino acid in 1 (L-proline or 4R-hydroxyl-L-proline) (Table S3). Further bioinformatics analysis of the domain architecture of EcdA, as well as the genes within the ecd cluster, gave indications that this is most likely the correct gene cluster that is consistent with some of the expected biosynthetic transformations required for the assembly of 1. First, EcdA has a terminal condensation domain (CT) that has been shown to catalyze the cyclization of NRPS products in fungi30, in agreement with the anticipated macrocyclization of the hexapeptide. 
Furthermore, in proximity to ecdA is ecdI (Figure 1A and Table 1), an acyl-AMP ligase homolog gene, giving a plausible route for lipo-initiation of the NRPS. In addition, other genes adjacent to ecdA encode putative non-heme iron, α-ketoglutarate-dependent oxygenases (ecdG and ecdK) and a cytochrome P450 heme-iron-dependent oxygenase (ecdH) (Table 1), indicating that the nascent NRPS product may undergo several oxygenation steps, as anticipated for the biosynthesis of 1. Other genes flanking the putative ecd cluster include a fungal transcription factor gene (ecdB), three transporter protein genes (ecdC, ecdD and ecdL), a glycosyl hydrolase gene (ecdE), a glycosidase gene (ecdF) and a gene encoding a protein with no predicted conserved domain (ecdJ). Interestingly, genes for the biosynthesis of unnatural amino acids such as L-homotyrosine are not present in the vicinity of the ecd gene cluster, but are found to reside elsewhere in the genome (see below).

Verification of the Role of ecdA in the Biosynthesis of 1

To confirm the production of 1, E. rugulosa was grown in Medium 2 and the metabolites were extracted from the fermentation broth in the same manner as previously described.7 The high-resolution LCMS trace of the extracted metabolites shows a peak at 17.6 min with m/z of 1042.5700 [M+H-H2O]+, corresponding to the theoretical mass-to-charge ratio of 1 (m/z = 1042.5707 [C52H81N7O16+H-H2O]+) (Figure 1B). Purification and subsequent characterization by NMR (Table S3 and Figure S13–S16) and comparison with an authentic standard (Figure S1) verified that the compound is indeed 1.
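
The theoretical m/z quoted above can be reproduced from the molecular formula. The following is a minimal sketch, assuming standard monoisotopic masses for C, H, N and O and the proton mass; the variable and function names are illustrative and not taken from the original work.

```python
# Monoisotopic masses (u) of the most abundant isotopes; standard values.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}
PROTON = 1.00727646688  # mass of a proton (u)


def monoisotopic_mass(formula):
    """Sum monoisotopic masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


echinocandin_b = {"C": 52, "H": 81, "N": 7, "O": 16}  # formula from the text
water = {"H": 2, "O": 1}

# [M + H - H2O]+ : add one proton, lose one water molecule.
theoretical_mz = monoisotopic_mass(echinocandin_b) + PROTON - monoisotopic_mass(water)

observed_mz = 1042.5700  # value reported in the text
ppm_error = (observed_mz - theoretical_mz) / theoretical_mz * 1e6

print(f"theoretical m/z: {theoretical_mz:.4f}")
print(f"mass error: {ppm_error:.2f} ppm")
```

Under these assumptions the calculated value agrees with the 1042.5707 stated in the text, and the observed peak lies within 1 ppm of it.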

To verify whether the ecd cluster is responsible for the production of 1, we developed a gene deletion method for E. rugulosa based on previous methods for A. nidulans A4,31 using the glufosinate resistance gene bar as selection marker (See Materials and Methods).32 A gene deletion cassette containing portions of ecdA internally disrupted by the bar gene driven by the trpC promoter was introduced into E. rugulosa protoplasts via PEG-mediated transformation and the resulting glufosinate-resistant strains were selected. A bioassay-guided knockout screening was developed in which individual fungal colonies are spotted on plates pre-inoculated with C. albicans. Approximately 100 mutants were screened by both loss of anti-Candida activity, and PCR-based screening using a bar gene primer and a primer found outside the knockout cassette. One mutant (ΔecdA I-16) was isolated which lost the ability to inhibit the growth of C. albicans (Figure S2) and also showed the correct PCR-amplified product (Figure S3). LCMS analysis of the extracted metabolites after 7 days of growth showed no production of 1 (Figure 1B), suggesting that EcdA is the NRPS responsible for the biosynthesis of 1.

Characterization of the Adenylation domain (A) of the First Module of EcdA

Guided by the results of the gene knockout of ecdA, we proceeded to determine the amino acid specificity of the first A domain of EcdA by cloning the initiation module of EcdA, which is fused to a thiolation domain (T0) that is likely the site of lipid attachment (EcdA-M1, T0CAT1). The 130 kDa protein was expressed in Escherichia coli BL21(DE3) cells to a final titer of ~20 mg/L and purity of >95% (Figure S4). An amino acid-dependent ATP-[32P]-PPi exchange assay with 2 μM of EcdA-M1 at ambient temperature and an incubation time of 30 minutes showed that A1 could activate L-ornithine (L-Orn) and (4R/S)-4-hydroxyl-L-ornithine (4-OH-L-Orn). D-ornithine and other amino acids with basic side chains, such as L-lysine and L-arginine, were not activated (Figure 2A). Full kinetic analysis shows that L-Orn (kcat = 43.5 ± 1.9 min−1, KM = 45.4 ± 8.8 μM) is about a 500-fold better substrate than 4-OH-L-Orn (kcat = 5.6 ± 0.2 min−1, KM = 2.7 ± 0.3 mM) as judged by kcat/KM ratios (Figure 2B and 2C), suggesting that oxidation at Cγ of L-ornithine most likely occurs after loading of L-Orn to T1. The activation of L-Orn by the first module of EcdA further supports the link between the NRPS and the biosynthesis of 1.
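
The substrate preference can be checked directly from the reported kinetic constants. The sketch below computes kcat/KM for both substrates using the central values only (the reported uncertainties are ignored), after converting the KM of 4-OH-L-Orn from mM to μM. The function name is illustrative.

```python
def catalytic_efficiency(kcat_per_min, km_micromolar):
    """Return kcat/KM in units of min^-1 uM^-1."""
    return kcat_per_min / km_micromolar


# Central values reported in the text.
l_orn = catalytic_efficiency(43.5, 45.4)             # KM given in uM
oh_l_orn = catalytic_efficiency(5.6, 2.7 * 1000.0)   # KM given in mM, converted to uM

fold_preference = l_orn / oh_l_orn

print(f"L-Orn      kcat/KM = {l_orn:.3f} min^-1 uM^-1")
print(f"4-OH-L-Orn kcat/KM = {oh_l_orn:.5f} min^-1 uM^-1")
print(f"preference for L-Orn: ~{fold_preference:.0f}-fold")
```

Using the central values alone, the ratio comes out at roughly 460-fold, the same order of magnitude as the "about 500-fold" stated above.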

EcdI-Catalyzed Lipo-initiation of EcdA

The addition of the lipid chain in the biosynthesis of lipopeptides, also known as lipo-initiation,33,34 is a critical step not only because it connects the fatty acid pool and NRPS biosynthesis, but also because of the importance of the lipid to the antimicrobial activities.14 Several mechanisms exist for activating and transferring the lipid chain to the NRPS.34 These include fusion of a fatty acid synthetase (FAS)-like module to the N-terminus of an NRPS (mycosubtilin);35 and transfer of the lipid chain to a dissociated T domain (daptomycin)36,37 or coenzyme A (surfactin)33 by a fatty acyl ligase, followed by condensation with an aminoacyl adenylate catalyzed by the first C domain. Since EcdA only contains six A domains, one for each of the residues in the hexapeptide scaffold, we reasoned that EcdI, which contains an AMP-binding domain, might be responsible for the formation of the activated form of linoleic acid and its transfer to the T0 domain of EcdA. To test this hypothesis, we cloned EcdI into an E. coli expression vector and expressed the N-terminal His-tagged protein in BL21(DE3) with a titer of ~30 mg/L and purity of ~95% (Figure S5). Using the purified phosphopantetheinyl transferase Sfp from Bacillus subtilis,38 we converted apo EcdA-M1 into its holo form in vitro. After co-incubation of holo EcdA-M1 with EcdI, ATP and [14C]-linoleic acid, the assay mixture was analyzed using SDS-PAGE and autoradiography. The autoradiogram shows strong radiolabeling of the 130 kDa band, indicating that the [14C]-linoleic acid is covalently bound to EcdA-M1 (Figure 3A). In contrast, in the absence of EcdI, ATP or Sfp, almost no labeling of EcdA-M1 can be detected, confirming the proposed mechanism (Figure 3C). Quantification of the radiolabel suggests that ~30% of EcdA-M1 is loaded with the labeled substrate (Figure 3B), consistent with fractional stoichiometries seen in other labeling studies.39,40

To pinpoint which of the two thiolation domains in the EcdA initiation module (T0CAT1) is acylated by the linoleoyl starter unit in the presence of EcdI, single mutations of the active site serines in the two thiolation domains were made. The S47A mutant (T0*CAT1) prevents phosphopantetheinylation of the initiation T domain, but leaves the T1 domain available for conversion into the holo form, which should be capable of undergoing covalent loading with [14C]-L-ornithine. Conversely, the corresponding S1127A mutant (T0CAT1*) will not be phosphopantetheinylated at T1 but should be able to load [14C]-linoleate. Both the S47A and the S1127A mutants were expressed in E. coli BAP1,41 which coexpresses Sfp for thiolation domain phosphopantetheinylation (thus no Sfp preincubation is required) (Figure S6). As shown in Figure S7, [14C]-linoleate is loaded onto the T0CAT1* but not the T0*CAT1 variant of the EcdA initiation module, as assessed by both autoradiography and radioactive counting of protein precipitated via addition of trichloroacetic acid. The complementary result is seen for [14C]-L-ornithine covalent loading, in which T0*CAT1 is labeled but T0CAT1* is not (Figure S8). The relatively low amount of L-Orn radioactivity may reflect lability of the thioester due to attack by the Nδ of the ornithine side chain on the activated carbonyl carbon, followed by release of the cyclic δ-lactam, a known propensity of ornithine thioesters.42

In previously characterized lipopeptide NRPSs, the incorporation of fatty acids into the assembly line requires the formation of either acyl-CoA33 or acyl-AMP36 prior to loading onto the NRPS. To explore the mechanism of linoleic acid activation by EcdI, we removed excess CoA after conversion of apo EcdA-M1 to its holo form, followed by addition of 0, 0.5 or 5.0 mM CoA to the assay. The reaction rates for the acyl loading step, as determined by the fraction of EcdA-M1 loaded with labeled substrate at different time points, are essentially the same (Figure S9), ruling out the CoA-dependent mechanism and indicating that the acyl-AMP is directly transferred to EcdA-T0 by EcdI (Figure 3C).

In order to probe the substrate specificity of EcdI, we coincubated holo-EcdA-M1 with alternative acyl substrates. Palmitic acid (C-16) showed a similar degree of loading onto EcdA-M1 compared to linoleic acid (27% vs. 30% for linoleic acid). Decreasing the length of the acyl chain to C-10 ([14C]-decanoic acid), on the other hand, dramatically reduced the loading of EcdA-M1 to ~8% at time points where the loading of linoleic acid reaches saturation. Nevertheless, this range of fatty acyl chain lengths that can be transferred to EcdA-M1 indicates there is room for incorporation of alternative lipid starter molecules into the structure of 1. Incubation of EcdI and EcdA-M1 with a non-fatty-acid substrate such as benzoic acid, an aryl carboxylate, did not show covalent loading onto T0 (Figure S10).

Discovery of the L-homotyrosine biosynthetic gene cluster

One of the characteristic features of the echinocandin family is the presence of nonproteinogenic 3S,4S-dihydroxyl-L-homotyrosine in the fourth amino acid position of the cyclic hexapeptide. This residue is derived from the dihydroxylation of L-homotyrosine either prior to, or after, incorporation into the peptide scaffold. It was previously shown that L-homotyrosine in the scaffold of 4 is derived from the condensation of 4-hydroxyphenylpyruvate and acetate to form 2-(4-hydroxybenzyl)-malate.43 Likewise, the biosynthesis of L-homophenylalanine in watercress plants is proposed to follow a pathway analogous to leucine biosynthesis,44 beginning with the condensation of acetyl-CoA and phenylpyruvate to form benzylmalic acid. In order to find a putative 2-(4-hydroxybenzyl)-malate synthase (HBMS) that may be involved in L-homotyrosine biogenesis, we searched the E. rugulosa genome for genes encoding the functionally analogous isopropyl-malate synthase (IPMS), the enzyme that catalyzes the condensation of α-ketoisovalerate with acetyl-CoA in leucine biosynthesis. We used the IPMS gene from Mycobacterium tuberculosis (accession number: MT3813) as the BLAST query. Two significant hits were found, ErIPMS48 and ErIPMS66, both of which lie outside the contig containing the ecd gene cluster. ErIPMS48 shares 99% protein sequence identity with the predicted A. nidulans A4 housekeeping IPMS (AN0804) as well as >85% identity with predicted IPMSs in other ascomycetes, suggesting that this gene encodes the actual IPMS involved in leucine biosynthesis in E. rugulosa. On the other hand, the ErIPMS66 protein sequence has lower similarity to AN0804 (43% protein identity), and such a second IPMS homolog is not present in A. nidulans A4.
Downstream of ErIPMS66 (designated as htyA) are genes putatively encoding a transaminase (htyB), a 3-isopropyl-malate dehydrogenase homolog (htyC), and an isopropyl-malate isomerase homolog (htyD), all possibly involved in biosynthesis of L-homotyrosine (Table 2 and Figure 4A). Moreover, the presence immediately upstream of genes encoding a predicted non-heme iron, α-ketoglutarate-dependent oxygenase (htyE) and a cytochrome P450 oxygenase (htyF) is consistent with the requirement for C3 and C4 hydroxylation of L-homotyrosine in 1. Thus, we reasoned that the gene cluster, here designated as hty, may be responsible for the biosynthesis of L-homotyrosine in E. rugulosa.

To test this hypothesis, we genetically disrupted the htyA gene in the same manner as for construction of ΔecdA. Screening of ~100 colonies yielded 3 PCR-positive mutants (Figure S11). All three ΔhtyA mutants lost the ability to inhibit the growth of C. albicans under screening conditions (Figure S2), accompanied by the loss of production of 1 (Figure 4B). To chemically complement the ΔhtyA mutant, the growth medium was supplemented with 0.1 mg/mL of L-homotyrosine. As expected, adding free homotyrosine restored the ability of the mutant to inhibit Candida (Figure S2), as well as the production of 1 (Figure 4B), to wild-type levels. As a negative control, feeding of L-homotyrosine to the ΔecdA I-16 mutant did not restore the production of 1 (Figure S1). Therefore, based on whole genome sequencing, we were able to identify a separately located gene cluster that is responsible for the biosynthesis of an unnatural amino acid building block for the ecd pathway.

Based on the putative functions of HtyA-D (Table 2), the biosynthesis of L-homotyrosine is predicted to be as follows (Figure 4C): 4-hydroxyphenylpyruvate undergoes an aldol-type condensation, catalyzed by HtyA, with the C-2 of acetyl-CoA, followed by the release of CoA to form 2-(4-hydroxybenzyl)-malate. This is followed by isomerization of 2-(4-hydroxybenzyl)-malate to 3-(4-hydroxybenzyl)-malate by HtyD. Thereafter, 3-(4-hydroxybenzyl)-malate undergoes decarboxylation and oxidation, catalyzed by HtyC and coupled to reduction of NAD+ to NADH, to form 2-oxo-4-(4-hydroxybenzyl)-butanoic acid. The product then undergoes transamination catalyzed by HtyB to form L-homotyrosine. The closest homologs of HtyA-D are found in a four-gene cassette from Alternaria alternata that is predicted to catalyze the formation of 2-amino-4-phenyl-valeric acid (APVA) in AM-toxin.45 Interestingly, this suggests that the HtyA-D homologs in the AM-toxin gene cluster must perform two cycles of the α-ketoacid elongation to afford APVA.
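
The predicted four-step route can be written down as a small data structure, which makes the enzyme order and the hand-off of intermediates explicit. This is only an illustrative sketch of the pathway as described in the text; the `Step` type and the `is_linear` helper are hypothetical names introduced here, and the compound names are copied from the paragraph above.

```python
from typing import NamedTuple


class Step(NamedTuple):
    enzyme: str
    reaction: str
    substrate: str
    product: str


# Predicted L-homotyrosine pathway, in the order given in the text.
HTY_PATHWAY = [
    Step("HtyA", "aldol-type condensation with acetyl-CoA",
         "4-hydroxyphenylpyruvate", "2-(4-hydroxybenzyl)-malate"),
    Step("HtyD", "isomerization",
         "2-(4-hydroxybenzyl)-malate", "3-(4-hydroxybenzyl)-malate"),
    Step("HtyC", "oxidative decarboxylation (NAD+ to NADH)",
         "3-(4-hydroxybenzyl)-malate", "2-oxo-4-(4-hydroxybenzyl)-butanoic acid"),
    Step("HtyB", "transamination",
         "2-oxo-4-(4-hydroxybenzyl)-butanoic acid", "L-homotyrosine"),
]


def is_linear(pathway):
    """True if each step's product is the substrate of the next step."""
    return all(a.product == b.substrate for a, b in zip(pathway, pathway[1:]))


for step in HTY_PATHWAY:
    print(f"{step.enzyme}: {step.substrate} -> {step.product} ({step.reaction})")
print("pathway is linear:", is_linear(HTY_PATHWAY))
```

Note that the gene order in the cluster (htyA, htyB, htyC, htyD) differs from the predicted order of action (HtyA, HtyD, HtyC, HtyB).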

DISCUSSION

Echinocandins are a family of antifungal cyclic lipopeptides from ascomycetes. Through the sequencing of the genome of E. rugulosa NRRL 11440 and subsequent bioinformatics analysis of NRPS genes, we have identified ecdA, encoding a 799 kDa six-module NRPS, which is confirmed by gene deletion to be required for production of 1. The ecd gene cluster is flanked by two microsyntenic blocks belonging to two different chromosomes in A. nidulans A4. Upstream of the ecd cluster is a group of genes that are syntenic to genes found in Chromosome VII of A. nidulans A4, while the genes found downstream of the cluster are syntenic to genes found in A. nidulans A4 Chromosome V (Figure S12). This suggests that a chromosomal translocation occurred when E. rugulosa NRRL 11440 and A. nidulans A4 diverged from their common ancestor.

Another interesting result revealed by the study is the separation of the ecd gene cluster and the L-homotyrosine biosynthetic genes. By adding the distances to the nearest ends of their respective contigs, the minimum distance between the two clusters is estimated to be ~42.5 kb, assuming that both clusters are located on the same chromosome (Figure S12). Moreover, similar to the phenomenon seen in the ecd gene cluster, the hty gene cluster is flanked by microsyntenic blocks from two different A. nidulans A4 chromosomes: upstream of the hty cluster are genes that are syntenic to genes from Chromosome VII of A. nidulans A4, while downstream are genes that are syntenic to genes found in A. nidulans A4 Chromosome VI (Figure S12). This chromosomal translocation in the E. rugulosa genome, in comparison to the A. nidulans A4 genome, also prevents us from mapping the hty and ecd gene clusters in relation to the A. nidulans A4 chromosomes. While separation of the biosynthetic genes for 1 is unusual, it is not unprecedented. Examples of the separation of secondary metabolite biosynthetic genes in fungi are found in the dothistromin pathway in Dothistroma septosporum,46 the convergence of the orsellinic acid and anthrone biosynthetic pathways in A. nidulans to form spiroanthrones,47 the austinol and dehydroaustinol pathways in A. nidulans,48 and the putative tryptoquivaline pathway in A. clavatus, a feature identified after comparison to the closely related tryptoquialanine pathway in Penicillium aethiopicum.25

Due to the size of the full-length EcdA, which poses a formidable challenge for in vitro evaluation, we dissected out the EcdA initiation module to establish the predicted specificity for L-ornithine activation by A1. The ~130 kDa, four-domain (T0CAT1) EcdA initiation module (EcdA-M1) was expressed in E. coli. Determination of the selectivity of A1 by amino acid-dependent ATP-[32P]-PPi exchange assay established L-ornithine as the most likely substrate. Heterologous expression of EcdA-M1 and EcdI, an acyl-AMP ligase homolog, allowed us to investigate the mechanism of lipo-initiation of EcdA. Co-incubation of EcdA-M1 with EcdI, ATP and [14C]-linoleic acid indicated loading of linoleic acid onto the initiation T0 domain of EcdA. While much is known regarding the lipo-initiation strategies of bacterial NRPSs,33–37 this study presents the first in vitro characterization of the lipo-initiation of a fungal NRPS. In comparison to the bacterial NRPS systems, EcdI acts analogously to DptE in daptomycin biosynthesis:36 each adenylates the fatty acid substrate, which is subsequently transferred to the initiation T0 domain. The difference between the two systems is that the acceptor thiolation domain is a standalone protein separate from the NRPS DptA in daptomycin biosynthesis, but is fused to the N-terminus of EcdA for the biosynthesis of 1. Hence it is expected that the smaller EcdI requires interaction with a thiolation domain that is part of a multimodular megasynthetase. This appears to be a commonly employed strategy among fungal PKSs and NRPSs, in which smaller enzyme partners are recruited to interact with the thiolation and acyl carrier protein domains.49,50 Currently, semisynthesis of echinocandins in medical use, such as 3 and 6, requires the biological deacylation of their parent compound via feeding to separate cultures of Actinoplanes spp., followed by chemical reacylation using protective chemistry.12,14 Thus, deciphering the lipo-initiation strategy may also enable the engineered biosynthesis of approved derivatives of 1 containing alternative lipid groups.

The presence of acyl-AMP ligase homolog gene ecdI within the ecd gene cluster led us to investigate whether other organisms, in particular filamentous fungi, also have clustering of genes encoding an acyl-AMP ligase and an NRPS with an N-terminal T0 domain. A search in the NCBI database revealed five genes for fatty-acyl-AMP ligase that share identity ≥39% with EcdI and are clustered with genes of NRPS with an initiation T0 domain (Table S6). Moreover, four of the five EcdI homolog genes are also clustered with highly-reducing PKS genes in addition to NRPS genes, further hinting that these gene clusters may encode for lipopeptide biosynthetic enzymes. Furthermore, easD from A. nidulans, the only characterized out of the five fatty acyl-CoA ligase genes, is shown to be involved in the biosynthesis of emericellamide. The emericellamide synthetase EasA, however, requires an additional acyltransferase EasC for lipo-initiation.26 Such an acyl-transferase is notably missing in the ecd gene cluster and is now proven not to be required for the linoleic acid loading in our in vitro assays.

Based on the results of the lipo-initiation by EcdI and determination of the selectivity of A1, we can propose one possible pathway for the biosynthesis of 1 (Scheme 2): linoleoyl-AMP, produced by EcdI, is transferred to the initiation T0 of EcdA. The linoleoyl-S-phosphopantetheinyl-T0 is sequentially extended with L-ornithine, L-threonine, L-proline, L-homotyrosine, L-threonine and 4S-methyl-L-proline to form the linear hexapeptide. Thereafter, the terminal condensation domain (CT) performs macrocyclization of the NRPS product30 and the cyclic scaffold 7 is released from EcdA. In this pathway, in which all the hydroxylation reactions are proposed to occur following completion of the cyclic peptide, the unhydroxylated precursor 7 will undergo six rounds of hydroxylation. In line with the modifications of the residues found in 1, five hydroxylase genes (ecdG, ecdH, ecdK, htyE and htyF) are embedded within the ecd and hty clusters. At this point, it is not possible to assign the hydroxylases based on sequence alone, as all are proposed to act on sp3-hybridized carbon atoms. It was previously shown that L-proline hydroxylation to 4R-hydroxyl-L-proline in protein scaffolds is catalyzed by a non-heme iron, α-ketoglutarate-dependent oxygenase.51 Thus it is likely that the hydroxylation of L-proline in 1 is catalyzed by one of EcdG, EcdK or HtyE. However, the possibility that a P450 oxidase such as EcdH or HtyF catalyzes the reaction cannot be excluded. On the other hand, the formation of the vicinal diols in the 4R,5R-dihydroxyl-L-ornithine and 3S,4S-dihydroxyl-L-homotyrosine residues is more novel than the modification of proline, and may in each case require two hydroxylases to separately install the two hydroxyl groups. Due to the lability of the resulting hemiaminal,18 we anticipate that the Cδ hydroxyl group in L-ornithine must be installed after peptide macrocycle formation, and is likely the last step in the oxidative tailoring cascade.
An equally likely pathway to 1 is one in which some of the amino acids are hydroxylated prior to incorporation into the hexapeptide. The most plausible candidate for this scenario is 4R-hydroxyl-L-proline, which is a commonly observed unnatural amino acid in different organisms.52–54 The exact timing and substrates of this plethora of hydroxylation enzymes will be determined in subsequent efforts through a combination of A domain activation assays, genetic knockouts of the candidate oxygenases and in vitro biochemical investigation.
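
The bookkeeping behind the "six rounds of hydroxylation" in the post-assembly scenario can be laid out explicitly. The sketch below lists the six residues in the proposed order of incorporation together with the number of hydroxyl groups each must gain to reach the residue found in 1, and compares the total with the five candidate oxygenases. It assumes the post-assembly pathway described above; the variable names are illustrative.

```python
# (module, residue loaded by EcdA in the proposed pathway, hydroxyl groups added later)
RESIDUES = [
    (1, "L-ornithine", 2),          # -> 4R,5R-dihydroxy-L-ornithine
    (2, "L-threonine", 0),          # already carries its hydroxyl group
    (3, "L-proline", 1),            # -> 4R-hydroxy-L-proline
    (4, "L-homotyrosine", 2),       # -> 3S,4S-dihydroxy-L-homotyrosine
    (5, "L-threonine", 0),
    (6, "4S-methyl-L-proline", 1),  # -> 3S-hydroxy-4S-methyl-L-proline
]

# Candidate tailoring oxygenases named in the text.
CANDIDATE_OXYGENASES = ["EcdG", "EcdH", "EcdK", "HtyE", "HtyF"]

total_hydroxylations = sum(count for _, _, count in RESIDUES)

print("hydroxylations required on precursor 7:", total_hydroxylations)
print("candidate oxygenases in the ecd and hty clusters:", len(CANDIDATE_OXYGENASES))
```

If only these five enzymes were involved, six hydroxylations would imply that at least one enzyme acts at more than one position; the alternative scenario above, in which some residues are hydroxylated before incorporation, would change this count.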

In addition to L-homotyrosine and L-ornithine, 1 contains the nonproteinogenic amino acid 3S-hydroxyl-4S-methyl-L-proline, which is presumably derived from 4S-methyl-L-proline. Previous studies of 4,43 nostopeptolide,55 and nostocyclopeptide56 biosyntheses showed that 4-methyl-L-proline originates from Cδ oxidation and subsequent cyclization of L-leucine. Oxidation of L-leucine to 5-hydroxyl-L-leucine was recently shown to be catalyzed by a non-heme iron, α-ketoglutarate-dependent oxygenase in Nostoc punctiforme.57 Thus, it is probable that one of the non-heme iron, α-ketoglutarate-dependent oxygenases, such as EcdG, EcdK or HtyE, can perform this reaction. However, neither the ecd nor the hty cluster harbors genes encoding enzymes involved in the reactions downstream of the oxidation of L-leucine. A genome-wide search for homologs of pyrroline-5-carboxylate (P5C) reductase, proposed to catalyze the final step of 4-methyl-L-proline biosynthesis,55 revealed the presence of four candidate genes in E. rugulosa. However, an ortholog of each of the four candidate genes is also present in A. nidulans A4 (Table S5), so at present it is not yet clear which, if any, is involved in synthesis of the 4S-methyl-L-proline for 1.

In conclusion, we report the discovery of the biosynthetic gene cluster of 1, the first such cluster for a member of the medically relevant fungal lipopeptide family of compounds. This study has also uncovered the genetic basis for the biosynthesis of the nonproteinogenic L-homotyrosine. The mechanism and timing of the hydroxylation of the residues L-homotyrosine, L-proline, 4S-methyl-L-proline and L-ornithine are intriguing features of the biosynthesis of 1 and are currently under investigation in our groups.

MATERIALS AND METHODS

General Methods and Material

E. rugulosa (A. nidulans var. roseus) strain NRRL 11440 was obtained from the Agricultural Research Service Culture Collection (Peoria, IL). The authentic standard for 1 was purchased from Santa Cruz Biotechnology, Inc. (Santa Cruz, CA). Primers used in this study were ordered from Integrated DNA Technologies and are listed in Table S7. Sequencing of heterologous expression constructs and knockout cassettes was performed by Laragen, Inc. (Culver City, CA). RNA for cDNA amplification was isolated using the RiboPure-Yeast Kit from Ambion. First-strand cDNA synthesis was performed using SuperScript III First-Strand Synthesis SuperMix (Invitrogen Corp.).

Illumina HiSeq 2000 Sequencing and Bioinformatic Analysis

The genomic DNA for sequencing was isolated from stationary liquid cultures as described elsewhere.58 Shotgun sequencing was performed at Ambry Genetics (Aliso Viejo, CA) using an Illumina HiSeq 2000 with a read size of 157 bases, yielding a total of ~17 Gbp of reads. The reads were assembled hierarchically using the assembly programs SOAPdenovo59 and Geneious on the UCLA Hoffman cluster. First, the ~17 Gbp of total reads were assembled using SOAPdenovo with a k-mer size of 87 bp. The first-tier contigs were then assembled using Geneious to generate 433 second-tier contigs with an N50 of 235,313 bp and a total length of ~32 Mb.
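To make the assembly statistics concrete, the following minimal Python sketch shows how an N50 value and a nominal read depth can be computed. The contig lengths in the example are invented for illustration and are not the E. rugulosa contigs. The depth estimate simply divides the ~17 Gbp of reads by the ~32 Mb assembly length quoted above, which is an assumption about how one might summarize coverage, not a figure reported in this study.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(contig_lengths) / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def nominal_depth(total_read_bases, assembly_length):
    """Average fold coverage = total sequenced bases / assembly size."""
    return total_read_bases / assembly_length


# Hypothetical contig lengths (bp), for illustration only.
example_contigs = [500_000, 300_000, 200_000, 100_000, 50_000]
print(n50(example_contigs))               # 300000
print(round(nominal_depth(17e9, 32e6)))   # 531
```

By this rough estimate, the read set corresponds to roughly 530-fold nominal coverage of the assembled genome.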

The second-tier contigs were converted to BLAST database format for local BLAST searches using stand-alone BLAST software (v. 2.2.18). Gene predictions were performed using the FGENESH program (Softberry) and manually checked by comparison with homologous genes/proteins in the GenBank database. Functional domains in the translated protein sequences were predicted using Conserved Domain Search (NCBI) or InterProScan (EBI). The nucleotide sequences of the ecd and hty gene clusters have been deposited in the GenBank database under accession numbers JX421685 and JX421684, respectively.

Fungal Transformation and Gene Disruption

Polyethylene glycol-mediated transformation of E. rugulosa NRRL 11440 was performed as described previously31 with the following modifications: spores from two plates were grown in 250 mL GMM with 10 mM ammonium tartrate as the sole nitrogen source for 16 h at 250 rpm and 28°C, and 1 g of germlings from the culture was digested using 3 g of Vinotaste Pro enzyme mixture (Novozymes). The glufosinate resistance gene bar with an upstream trpC promoter was amplified from the pBARGPE1 plasmid obtained from the Fungal Genetics Stock Center (Kansas City, MO). Glufosinate used for the selection of transformants was prepared by extracting the commercial herbicide Finale (Bayer), which contains 11.33% (w/v) glufosinate-ammonium, twice with an equal volume of 1-butanol.60 Construction of the knockout cassette was performed by fusion PCR as described elsewhere.25 Fungal genomic DNA from the transformants was isolated using the ZR Fungal/Bacterial DNA Miniprep Kit (Zymo). Primers used for PCR screening are listed in Table S7.

Extraction and Characterization of 1

The echinocandin extraction method is based on US Patent #4,288,549.7 Briefly, the strain was grown in Medium 2 [2.5% (w/v) glucose (Sigma), 1% peptone (BD Biosciences), 1% (w/v) starch (Sigma), 1% (w/v) molasses, 0.4% (w/v) N-Z Amine A (Sheffield Biosciences), 0.2% (w/v) calcium carbonate (Sigma)] at 28°C for 7 days, in 10 mL cultures for screening of transformants and in 2 L cultures for large-scale compound extraction. An equal volume of methanol was added to the whole fermentation broth and the mixture was shaken at 16°C for 1 hour. The mixture was then filtered and the pH adjusted to 4.0. The filtrate was extracted twice with equal volumes of chloroform. The concentrated extract was purified on Sephadex LH-20 resin using a 1:1 methanol:chloroform solvent system. The fractions containing 1, as determined by LC-MS, were further purified by HPLC with a C-18 column and a 40–95%, 20-minute water:acetonitrile gradient. Purified 1 was further characterized on an Agilent 6520 high-resolution QTOF LC-MS, and its 1D 1H NMR, 2D COSY, HSQC, and HMBC spectra were recorded using a Varian 600 MHz NMR spectrometer.
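As a worked example of scaling the Medium 2 recipe between the 10 mL screening cultures and the 2 L extraction cultures, the sketch below converts the % (w/v) values listed above into grams per batch. It assumes that the 1% peptone, which is given without a basis in the text, is also w/v. The dictionary and function names are illustrative only.

```python
# Medium 2 composition from the text, in % (w/v), i.e. grams per 100 mL.
# Assumption: peptone (listed as "1%" without a basis) is treated as w/v.
MEDIUM_2_PERCENT_WV = {
    "glucose": 2.5,
    "peptone": 1.0,
    "starch": 1.0,
    "molasses": 1.0,
    "N-Z Amine A": 0.4,
    "calcium carbonate": 0.2,
}


def grams_needed(percent_wv, volume_ml):
    """Convert % (w/v) values to grams for a culture of volume_ml."""
    return {name: pct * volume_ml / 100 for name, pct in percent_wv.items()}


# Masses for the 2 L large-scale culture and the 10 mL screening culture.
for volume_ml in (2000, 10):
    print(f"--- {volume_ml} mL ---")
    for name, grams in grams_needed(MEDIUM_2_PERCENT_WV, volume_ml).items():
        print(f"{name}: {grams:.2f} g")
```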

Anti-Candida Assay

ΔecdA and ΔhtyA mutants were grown in 10 mL of liquid Medium 2 under static conditions for 7 days at 28°C. Discs 10 mm in diameter were cut from the fungal mat and transferred to yeast extract-peptone-dextrose (YPD) agar plates that had previously been inoculated with Candida albicans ATCC 90234. The co-cultures were grown at 28°C overnight. For chemical complementation of ΔhtyA (Figure S1), L-homotyrosine (0.1 mg/mL) was fed to the static liquid cultures on day 4, before the mycelial discs from individual clones were transferred on day 7 to YPD plates pre-inoculated with C. albicans.

Cloning of EcdA-M1

The C-terminal boundary of EcdA-T1 was predicted through alignment of gramicidin synthetase PCP (accession: 1DNY) and fungal NRPSs with EcdA by ClustalW. EcdA-M1 was amplified from an E. rugulosa cDNA sample using the primer pair NdeI-EcdA-T0 and EcoRI-EcdA-T1 (see Table S7). The amplicon was gel-purified, digested with NdeI and EcoRI, and ligated into the pET28a expression vector to create pET28a-EcdA-M1. The QuikChange I Site-Directed Mutagenesis Kit (Agilent Technologies) was used to generate the EcdA-M1 variants with pET28a-EcdA-M1 as template: the primer pair ecdA-S47A-F and ecdA-S47A-R was used to create the S47A variant, and the primer pair ecdA-S1127A-F and ecdA-S1127A-R was used to create the S1127A variant.

Cloning of EcdI

EcdI cDNA was amplified from an E. rugulosa cDNA sample using primers EcdI-NdeI-F and EcdI-EcoRI-R. The amplicon was ligated into the pCR-Blunt vector and transformed into TOP10 cells. The plasmid was sequenced to confirm correct splicing of the transcript. The plasmid bearing the correct sequence was digested with NdeI and EcoRI, and the insert was cloned into the pET28a vector to create pET28a-EcdI.

Heterologous expression of EcdA-M1 and EcdI

pET28a-EcdA-M1 or pET28a-EcdI was transformed into BL21 (DE3), and the cells were grown in 500 mL LB at 37°C and 250 rpm. When the OD600 reached 0.4, the cultures were cooled to 16°C and protein expression was induced by addition of 60 μM IPTG. After 16 h of shaking at 16°C, the cells were pelleted and resuspended in Buffer A (50 mM Tris-HCl, pH 7.9, 5 mM NaCl, 1 mM DTT) with 20 mM imidazole. The cells were lysed by sonication and centrifuged at 4°C and 15,000 rpm. Nickel-NTA resin was added to the supernatant, and the mixture was gently stirred at 4°C for 2 hours. The protein/resin mixture was loaded into a gravity-flow column, and the His-tagged proteins were purified with increasing concentrations of imidazole in Buffer A. EcdI was further purified by anion exchange chromatography on a MonoQ 10/100L anion exchange column (GE Healthcare Life Sciences) using an 80-minute, 0–100% gradient from Buffer A to Buffer B (50 mM Tris-HCl, 1 M NaCl, 2 mM DTT). The fractions containing the desired protein were concentrated using an ultrafiltration column (100 kDa cutoff for EcdA-M1 and 30 kDa cutoff for EcdI). Protein concentration was determined by UV absorbance at λ = 280 nm.
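Determining protein concentration from absorbance at 280 nm rests on the Beer-Lambert law, A = ε·c·l. The sketch below shows that calculation. The absorbance reading and extinction coefficient are placeholders for illustration: the extinction coefficients of EcdA-M1 and EcdI are not given in the text and would have to be calculated from the protein sequences.

```python
def molar_concentration(a280, epsilon_m_cm, path_cm=1.0):
    """Beer-Lambert law: A = epsilon * c * l, so c = A / (epsilon * l).

    a280          absorbance at 280 nm (dimensionless)
    epsilon_m_cm  molar extinction coefficient in M^-1 cm^-1
    path_cm       optical path length in cm
    Returns the concentration in mol/L.
    """
    if epsilon_m_cm <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return a280 / (epsilon_m_cm * path_cm)


# Hypothetical reading and extinction coefficient, for illustration only.
a280 = 0.50
epsilon = 50_000.0  # M^-1 cm^-1 (assumed, not a value from this study)
conc_uM = molar_concentration(a280, epsilon) * 1e6
print(f"{conc_uM:.1f} uM")  # 10.0 uM
```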

ATP-[32P]PPi Exchange Assays

A typical reaction mixture (500 μL) contained 1.0 μM EcdA, 2 mM amino acid substrate (unless specified), 5 mM ATP, 10 mM MgCl2, 5 mM Na[32P]-pyrophosphate (PPi) (~1.8 × 10^6 cpm/mL), and 50 mM Tris-HCl (pH 8). Mixtures were incubated at ambient temperature for regular time intervals (e.g., 5 min), and 150 μL aliquots were removed and quenched with 500 μL of a charcoal suspension (100 mM NaPPi, 350 mM HClO4, and 16 g/L charcoal). The mixtures were vortexed and centrifuged at 13,000 rpm for 3 min. Pellets were washed twice with 500 μL of wash solution (100 mM NaPPi and 350 mM HClO4). Each pellet was resuspended in 500 μL of wash solution and added to 10 mL of Ultima Gold scintillation fluid. Charcoal-bound radioactivity was measured using a Beckman LS 6500 scintillation counter.
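The concentrations above fix the specific activity of the labeled pyrophosphate, which is what allows charcoal-bound counts to be converted into an amount of exchanged PPi. The sketch below works through that arithmetic for one aliquot. The bound and background counts are invented for illustration, and the calculation is a simplification that ignores counting efficiency and dilution of the label as exchange proceeds. It is not presented as the data treatment used in this study.

```python
def specific_activity_cpm_per_nmol(cpm_per_ml, ppi_mM):
    """cpm per nmol of PPi; 1 mM equals 1000 nmol/mL."""
    return cpm_per_ml / (ppi_mM * 1000.0)


def exchange_rate_per_min(bound_cpm, background_cpm, sa_cpm_per_nmol,
                          enzyme_uM, aliquot_uL, minutes):
    """nmol PPi exchanged into ATP per nmol enzyme per minute."""
    nmol_exchanged = (bound_cpm - background_cpm) / sa_cpm_per_nmol
    enzyme_nmol = enzyme_uM * aliquot_uL / 1000.0  # uM * uL = pmol; /1000 gives nmol
    return nmol_exchanged / enzyme_nmol / minutes


# Values from the text: 5 mM PPi at ~1.8e6 cpm/mL, 1.0 uM enzyme, 150 uL aliquot.
sa = specific_activity_cpm_per_nmol(1.8e6, 5.0)
print(f"{sa:.0f} cpm/nmol")  # 360 cpm/nmol

# Hypothetical counts for a 5 min time point, for illustration only.
rate = exchange_rate_per_min(bound_cpm=5500, background_cpm=100,
                             sa_cpm_per_nmol=sa, enzyme_uM=1.0,
                             aliquot_uL=150, minutes=5)
print(f"{rate:.1f} per min")  # 20.0 per min
```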

Loading of [14C]-substrate onto NRPS

The assay was carried out in two steps. First, a 50 μL reaction containing 10 μM EcdA-M1, 20 μM Sfp, 1 mM CoA, 10 mM MgCl2, 1 mM TCEP, and 50 mM HEPES (pH 7.0) was incubated at ambient temperature for 30 min to convert apo EcdA-M1 to its holo form. Afterwards, 8 μM EcdI, 5 mM ATP, and ~40 μM [14C]-substrate (~4.4 × 10^6 cpm/mL) were added and the mixture was incubated for 30 min. The reaction was quenched with 600 μL of acetonitrile for the assays with [14C]-acyl substrate, or with 600 μL of 10% trichloroacetic acid for [14C]-L-ornithine, with addition of 100 μL of 1 mg/mL BSA. The mixture was vortexed and centrifuged at 13,000 rpm for 3 min. The pellet was then washed twice with 600 μL of acetonitrile, dissolved in 250 μL of formic acid, added to 10 mL of Ultima Gold scintillation fluid, and counted in a Beckman LS 6500 scintillation counter.
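The fraction of EcdA-M1 carrying radiolabel can be estimated from the pellet counts, because the substrate concentration and its radioactivity together define how many counts correspond to one nanomole. The sketch below is a minimal illustration of that estimate. The pellet count is hypothetical, and the calculation assumes that the volume added in the second step is negligible, that background has already been subtracted, and that one substrate molecule loads per protein molecule.

```python
def percent_loaded(pellet_cpm, substrate_cpm_per_ml, substrate_uM,
                   protein_uM, reaction_uL):
    """Percentage of protein molecules carrying one labeled substrate."""
    # uM equals nmol/mL, so (cpm/mL) / uM gives cpm per nmol of substrate.
    cpm_per_nmol = substrate_cpm_per_ml / substrate_uM
    nmol_bound = pellet_cpm / cpm_per_nmol
    protein_nmol = protein_uM * reaction_uL / 1000.0  # uM * uL = pmol; /1000 gives nmol
    return 100.0 * nmol_bound / protein_nmol


# Text values: ~40 uM substrate at ~4.4e6 cpm/mL; 10 uM EcdA-M1 in 50 uL.
# The pellet count of 11,000 cpm is invented for illustration.
print(f"{percent_loaded(11_000, 4.4e6, 40.0, 10.0, 50.0):.1f}%")  # 20.0%
```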

Supplementary Material

This work is supported by NIH Grants 1R01GM092217 and 1DP1GM106413 to Y.T.; 1R01GM49338 to C.T.W.; and NRSA GM08496 to R.A.C. Genome assembly of E. rugulosa NRRL 11440 was performed on the UCLA Academic Technological Services (ATS) Hoffman2 cluster. We would like to thank Qiyang Hu for his assistance in the use of the UCLA Hoffman cluster, Dr. Jaclyn M. Winter for assistance in RNA extraction, and Prof. Scott G. Filler from the UCLA Harbor Medical Center for providing the C. albicans strain.

Additional bioinformatics data, in vitro assay results and spectra are found in the supplemental information.

This material is available free of charge via the Internet at http://pubs.acs.org.

1. Vincent JL, Rello J, Marshall J, Silva E, Anzueto A, Martin CD, Moreno R, Lipman J, Gomersall C, Sakr Y, Reinhart K. JAMA 2009, 302, 2323. PMID: 19952319
2. Arendrup MC. Curr Opin Crit Care 2010, 16, 445. PMID: 20711075
3. Mayr A, Aigner M, Lass-Florl C. Mycoses 2012, 55, 27. PMID: 21668518
4. Andes DR, Safdar N, Baddley JW, Playford G, Reboli AC, Rex JH, Sobel JD, Pappas PG, Kullberg BJ. Clin Infect Dis 2012, 54, 1110. PMID: 22412055
5. Kett DH, Shorr AF, Reboli AC, Reisman AL, Biswas P, Schlamm HT. Crit Care 2011, 15, R253. PMID: 22026929
6. Benz F, Knusel F, Nuesch J, Treichler H, Voser W, Nyfeler R, Keller-Schierlein W. Helv Chim Acta 1974, 57, 2459. PMID: 4613708
7. Boeck L, Kastner R. U.S. Patent. Eli Lilly and Company: United States, 1981.
8. Schwartz RE, Sesin DF, Joshua H, Wilson KE, Kempf AJ, Goklen KA, Kuehner D, Gailliot P, Gleason C, White R, Inamine E, Bills G, Salmon P, Zitano L. J Antibiot (Tokyo) 1992, 45, 1853. PMID: 1490876
9. Mizuno K, Yagi A, Satoi S, Takada M, Hayashi M. J Antibiot (Tokyo) 1977, 30, 297. PMID: 324959
10. Strobel GA, Miller RV, Martinez-Miller C, Condron MM, Teplow DB, Hess WM. Microbiology 1999, 145 (Pt 8), 1919. PMID: 10463158
11. Roy K, Mukhopadhyay T, Reddy GC, Desikan KR, Ganguli BN. J Antibiot (Tokyo) 1987, 40, 275. PMID: 3570979
12. Boeck LD, Fukuda DS, Abbott BJ, Debono M. J Antibiot (Tokyo) 1989, 42, 382. PMID: 2708131
13. Debono M, Abbott BJ, Fukuda DS, Barnhart M, Willard KE, Molloy RM, Michel KH, Turner JR, Butler TF, Hunt AH. J Antibiot (Tokyo) 1989, 42, 389. PMID: 2708132
14. Debono M, Turner WW, LaGrandeur L, Burkhardt FJ, Nissen JS, Nichols KK, Rodriguez MJ, Zweifel MJ, Zeckner DJ, Gordee RS, Tang J, Parr TR. J Med Chem 1995, 38, 3271. PMID: 7650681
15. Bouffard FA, Zambias RA, Dropinski JF, Balkovec JM, Hammond ML, Abruzzo GK, Bartizal KF, Marrinan JA, Kurtz MB, McFadden DC, Nollstadt KH, Powles MA, Schmatz DM. J Med Chem 1994, 37, 222. PMID: 8295208
16. Tomishima M, Ohki H, Yamada A, Takasugi H, Maki K, Tawara S, Tanaka H. J Antibiot (Tokyo) 1999, 52, 674. PMID: 10513849
17. Joseph JM, Kim R, Reboli AC. Expert Opin Pharmacother 2008, 9, 2339. PMID: 18710358
18. Hensens OD, Liesch JM, Zink DL, Smith JL, Wichmann CF, Schwartz RE. J Antibiot (Tokyo) 1992, 45, 1875. PMID: 1490878
19. Leonard WR Jr, Belyk KM, Conlon DA, Bender DR, DiMichele LM, Liu J, Hughes DL. J Org Chem 2007, 72, 2335. PMID: 17343416
20. Kurokawa N, Ohfune Y. J Am Chem Soc 1986, 108, 6043. PMID: 22175373
21. Evans DA, Weber AE. J Am Chem Soc 1987, 109, 7151.
22. Kurokawa N, Ohfune Y. Tetrahedron 1993, 49, 6195.
23. Toth V, Nagy CT, Miskei M, Pocsi I, Emri T. Folia Microbiol (Praha) 2011, 56, 381. PMID: 21858538
24. Galagan JE, Calvo SE, Cuomo C, Ma LJ, Wortman JR, Batzoglou S, Lee SI, Basturkmen M, Spevak CC, Clutterbuck J, Kapitonov V, Jurka J, Scazzocchio C, Farman M, Butler J, Purcell S, Harris S, Braus GH, Draht O, Busch S, D’Enfert C, Bouchier C, Goldman GH, Bell-Pedersen D, Griffiths-Jones S, Doonan JH, Yu J, Vienken K, Pain A, Freitag M, Selker EU, Archer DB, Penalva MA, Oakley BR, Momany M, Tanaka T, Kumagai T, Asai K, Machida M, Nierman WC, Denning DW, Caddick M, Hynes M, Paoletti M, Fischer R, Miller B, Dyer P, Sachs MS, Osmani SA, Birren BW. Nature 2005, 438, 1105. PMID: 16372000
25. Gao X, Chooi YH, Ames BD, Wang P, Walsh CT, Tang Y. J Am Chem Soc 2011, 133, 2729. PMID: 21299212
26. Chiang YM, Szewczyk E, Nayak T, Davidson AD, Sanchez JF, Lo HC, Ho WY, Simityan H, Kuo E, Praseuth A, Watanabe K, Oakley BR, Wang CC. Chem Biol 2008, 15, 527. PMID: 18559263
27. Wiest A, Grzegorski D, Xu BW, Goulard C, Rebuffat S, Ebbole DJ, Bodo B, Kenerley C. J Biol Chem 2002, 277, 20862. PMID: 11909873
28. Marahiel MA, Stachelhaus T, Mootz HD. Chem Rev 1997, 97, 2651. PMID: 11851476
29. Rottig M, Medema MH, Blin K, Weber T, Rausch C, Kohlbacher O. Nucleic Acids Res 2011, 39, W362. PMID: 21558170
30. Gao X, Haynes SW, Ames BD, Wang P, Vien L, Walsh CT, Tang Y. Nat Chem Biol 2012, doi: 10.1038/nchembio.1047
31. Szewczyk E, Nayak T, Oakley CE, Edgerton H, Xiong Y, Taheri-Talesh N, Osmani SA, Oakley BR. Nat Protoc 2006, 1, 3111. PMID: 17406574
32. Nayak T, Szewczyk E, Oakley CE, Osmani A, Ukil L, Murray SL, Hynes MJ, Osmani SA, Oakley BR. Genetics 2006, 172, 1557. PMID: 16387870
33. Kraas FI, Helmetag V, Wittmann M, Strieker M, Marahiel MA. Chem Biol 2010, 17, 872. PMID: 20797616
34. Chooi YH, Tang Y. Chem Biol 2010, 17, 791. PMID: 20797606
35. Hansen DB, Bumpus SB, Aron ZD, Kelleher NL, Walsh CT. J Am Chem Soc 2007, 129, 6366. PMID: 17472382
36. Wittmann M, Linne U, Pohlmann V, Marahiel MA. FEBS J 2008, 275, 5343. PMID: 18959760
37. Baltz RH, Miao V, Wrigley SK. Nat Prod Rep 2005, 22, 717. PMID: 16311632
38. Mootz HD, Finking R, Marahiel MA. J Biol Chem 2001, 276, 37289. PMID: 11489886
39. Stachelhaus T, Huser A, Marahiel MA. Chem Biol 1996, 3, 913. PMID: 8939706
40. Stachelhaus T, Mootz HD, Bergendahl V, Marahiel MA. J Biol Chem 1998, 273, 22773. PMID: 9712910
41. Pfeifer BA, Admiraal SJ, Gramajo H, Cane DE, Khosla C. Science 2001, 291, 1790. PMID: 11230695
42. Gadow A, Vater J, Schlumbohm W, Palacz Z, Salnikow J, Kleinkauf H. Eur J Biochem 1983, 132, 229. PMID: 6188612
43. Adefarati AA, Giacobbe RA, Hensens OD, Tkacz JS. J Am Chem Soc 1991, 113, 3542.
44. Underhill EW. Can J Biochem 1968, 46, 401. PMID: 5658141
45. Harimoto Y, Hatta R, Kodama M, Yamamoto M, Otani H, Tsuge T. Mol Plant Microbe Interact 2007, 20, 1463. PMID: 17990954
46. Zhang S, Schwelm A, Jin H, Collins LJ, Bradshaw RE. Fungal Genet Biol 2007, 44, 1342. PMID: 17683963
47. Scherlach K, Sarkar A, Schroeckh V, Dahse HM, Roth M, Brakhage AA, Horn U, Hertweck C. ChemBioChem 2011, 12, 1836. PMID: 21698737
48. Lo HC, Entwistle R, Guo CJ, Ahuja M, Szewczyk E, Hung JH, Chiang YM, Oakley BR, Wang CCC. J Am Chem Soc 2012, 134, 4709. PMID: 22329759
49. Xie X, Meehan MJ, Xu W, Dorrestein PC, Tang Y. J Am Chem Soc 2009, 131, 8388. PMID: 19530726
50. Ames BD, Nguyen C, Bruegger J, Smith P, Xu W, Ma S, Wong E, Wong S, Xie X, Li JWH, Vederas JC, Tang Y, Tsai SC. Proc Natl Acad Sci U S A 2012, 109, 11144. PMID: 22733743
51. Hutton JJ Jr, Kaplan A, Udenfriend S. Arch Biochem Biophys 1967, 121, 384. PMID: 6057106
52. Petersen L, Olewinski R, Salmon P, Connors N. Appl Microbiol Biotechnol 2003, 62, 263. PMID: 12883873
53. Lawrence CC, Sobey WJ, Field RA, Baldwin JE, Schofield CJ. Biochem J 1996, 313 (Pt 1), 185. PMID: 8546682
54. Hausinger RP. Crit Rev Biochem Mol Biol 2004, 39, 21. PMID: 15121720
55. Luesch H, Hoffmann D, Hevel JM, Becker JE, Golakoti T, Moore RE. J Org Chem 2003, 68, 83. PMID: 12515465
56. Becker JE, Moore RE, Moore BS. Gene 2004, 325, 35. PMID: 14697508
57. Hibi M, Kawashima T, Sokolov P, Smirnov S, Kodera T, Sugiyama M, Shimizu S, Yokozeki K, Ogawa J. Appl Microbiol Biotechnol 2012 (in press), 10.1007/s00253
58. Carlson JE, Tulsieram LK, Glaubitz JC, Luk VWK, Kauffeldt C, Rutledge R. Theor Appl Genet 1991, 83, 194.
59. Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G, Kristiansen K, Yang H, Wang J. Genome Res 2010, 20, 265. PMID: 20019144
60. Hays S, Selker E. Fungal Genet Newsl 2000, 47, 107.

Identification of the ecd gene cluster. A) Organization of the ecd gene cluster. ecdA encodes the six-module nonribosomal peptide synthetase. ecdI encodes the linoleyl-AMP ligase responsible for lipo-initiation of EcdA. Also found in the cluster are ecdG, ecdH, and ecdK, which encode proposed hydroxylases. B) Metabolic profile of the ΔecdA mutant shows the loss of production of 1.

ATP-[32P]-PPi exchange assay of EcdA-M1. A) The adenylation (A) domain of EcdA-M1 shows preference towards L-ornithine. B) Michaelis-Menten plot for the A domain of EcdA-M1 with L-ornithine as substrate. C) Michaelis-Menten plot for the A domain of EcdA-M1 with 4(R/S)-OH-L-ornithine as substrate.

Loading assay of EcdA-M1. A) EcdA-M1, converted to the holo form in vitro by Sfp, is loaded with 14C-linoleic acid only in the presence of the fatty-acyl-AMP ligase EcdI and ATP, as shown in the autoradiogram (lane 5). EcdA-M1 variants were loaded with 14C-linoleic acid under the same conditions as sample 5. B) Quantification of the percentage of EcdA-M1 loaded with radioactive linoleic acid. Samples 1–7 were carried out under conditions identical to those in A. C) Linoleic acid is activated by EcdI to form linoleyl-AMP, which is subsequently transferred to EcdA-M1.

Biosynthesis of L-homotyrosine. A) Organization of the hty gene cluster. B) Disruption of htyA led to the loss of production of 1, which was restored upon addition of L-homotyrosine to the culture. C) Putative biosynthesis of L-homotyrosine.

Echinocandin B 1 and its semisynthetic derivatives cilofungin 2 and anidulafungin 3. Also shown are the related natural compound Pneumocandin A0 4 and its semisynthetic derivatives caspofungin 5 and micafungin 6.

Putative biosynthetic pathway of Echinocandin B 1

The Echinocandin B Biosynthetic Gene Cluster.

Gene | Length (aa) | Conserved Domain/Function | Nearest BLAST hit (Identity, Similarity)
ecdB | 545 | fungal transcription factor | A. fumigatus AFUA_048990 (72%, 80%)
ecdC | 556 | transporter (MFS) | N. fischeri NFIA_042010 (70%, 79%)
ecdD | 541 | transporter (MFS) | N. fischeri NFIA_042010 (87%, 93%)
ecdE | 703 | glycosyl hydrolase | N. fischeri NFIA_042050 (78%, 88%)
ecdF | 508 | glycosidase | P. purpurogenum BAA12320 (71%, 84%)
ecdA | 7260 | NRPS (T-C-A-T-C-A-T-C-A-T-C-A-T-C-A-T-C-A-T-C) | T. reesei EGR45389 (33%, 52%)
ecdG | 338 | non-heme iron, α-ketoglutarate dependent dioxygenase | C. militaris CCM_03049 (31%, 50%)
ecdH | 503 | cytochrome P450 heme-iron-dependent oxygenase | N. haematococca 100691 (28%, 48%)
ecdI | 559 | fatty-acyl-AMP ligase | A. nidulans A4, AN3490 (52%, 67%)
ecdJ | 668 | hypothetical protein | A. capsulatus 05345 (34%, 47%)
ecdK | 332 | non-heme iron, α-ketoglutarate dependent dioxygenase | T. reesei TRIREDRAFT_58580 (43%, 63%)
ecdL | 1479 | multidrug transporter (ABC) | M. anisopliae MAA_01638 (40%, 59%)

Genes found within the hty cluster

Gene | Length (aa) | Conserved Domain/Function | Nearest BLAST hit (Identity, Similarity)
htyF | 683 | heme-dependent P450 oxygenase | T. stipitatus TSTA_09270 (45%, 64%)
htyE | 329 | non-heme iron, α-ketoglutarate dependent dioxygenase | A. terreus ATEG09098 (50%, 66%)
htyA | 584 | isopropyl malate synthase | A. alternata BAI44742 (51%, 68%)
htyB | 379 | transaminase | A. alternata BAI44740 (55%, 71%)
htyC | 366 | isopropyl malate dehydrogenase | A. alternata BAI44741 (63%, 73%)
htyD | 877 | aconitase | A. alternata BAI44743 (57%, 68%)