PLoS ONE (ISSN 1932-6203). Public Library of Science, San Francisco, USA
PMID: 24303050; PMCID: PMC3841265; Manuscript: PONE-D-13-10842; doi: 10.1371/journal.pone.0081469

Research Article

Synonymous Codon Usage in TTSuV2: Analysis and Comparison with TTSuV1

Short title: Synonymous Codon Usage Bias in TTSuV2

Zhicheng Zhang (1,*), Wei Dai (2), Dingzhen Dai (1)

1 Department of Animal Science and Technology, Jinling Institute of Technology, Nanjing, China
2 Key Laboratory of Zoonoses of Anhui Province, Anhui Agricultural University, Hefei, China

Editor: Yury E Khudyakov, Centers for Disease Control and Prevention, United States of America

* E-mail: andzzc@126.com

Competing Interests: The authors have declared that no competing interests exist.

Conceived and designed the experiments: ZZ WD. Performed the experiments: ZZ WD. Analyzed the data: ZZ WD. Contributed reagents/materials/analysis tools: ZZ WD DD. Wrote the manuscript: ZZ WD.

Received 12 March 2013; Accepted 12 October 2013; Published 26 November 2013. PLoS ONE 8(11): e81469.

Copyright: 2013 Zhang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Two species of the DNA virus Torque teno sus virus (TTSuV), TTSuV1 and TTSuV2, have become widely distributed in pig-farming countries in recent years. In this study, we performed a comprehensive analysis of synonymous codon usage bias in 41 available TTSuV2 coding sequences (CDS), and compared the codon usage patterns of TTSuV2 and TTSuV1. TTSuV codon usage patterns were found to be phylogenetically conserved. Values for the effective number of codons (ENC) indicated that the overall extent of codon usage bias in both TTSuV2 and TTSuV1 was not significant; the most frequently occurring codons had an A or C at the third codon position. Correspondence analysis (COA) was performed, and TTSuV2 and TTSuV1 sequences were located in different quadrants of the plane defined by the first two major axes. A plot of the ENC revealed that compositional constraint was the major factor determining the codon usage bias for TTSuV2. In addition, hierarchical cluster analysis of 41 TTSuV2 isolates based on relative synonymous codon usage (RSCU) values suggested that there was no association between geographic distribution and codon bias of TTSuV2 sequences. Finally, the comparison of RSCU for TTSuV2, TTSuV1 and the corresponding host sequence indicated that the codon usage pattern of TTSuV2 was similar to that of TTSuV1. However, the similarity between each virus and its host was low. These conclusions provide important insight into the synonymous codon usage pattern of TTSuV2, as well as a better understanding of the molecular evolution of TTSuV2 genomes.

This study was supported by the Natural Science Foundation of Jiangsu Province (BK2012083), Major project of Nanjing Science and Technology Committee (201201026), Cooperative innovation fund of production-college-research of Jiangsu Province (BY2011114), and Research Foundation of Jinling Institute of Technology (40610047). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Introduction

It is well known that the 64 codons of the genetic code encode the 20 standard amino acids as well as three translation termination signals (UAA, UAG, UGA). Each amino acid is encoded by at least one codon (e.g., Met and Trp); however, due to the degeneracy of the genetic code, some amino acids are encoded by up to six codons (e.g., Leu, Ser and Arg). Codons encoding the same amino acid are referred to as synonymous codons. Studies have indicated that synonymous codon usage is non-random and species-specific [1]. Some synonymous codons are more frequent than others both within and between genes, and this phenomenon is termed synonymous codon usage bias [2]. In general, genome dynamics, primarily mutation pressure, facilitate the evolution of novel viruses and strains and contribute to adaptation to environment and host [3]. Hence, codon usage variation is considered to be an indicator of the type of force that influences genome evolution. Investigation of codon bias and the forces that influence it provides insights into the fundamental mechanisms of viral evolution. Thus, understanding codon bias is essential to understanding the interplay between a virus and its host.

Mutational pressure and natural selection [4,5] are well established as the two major factors accounting for codon usage variation in mammalian, protozoan and endosymbiotic bacterial genes [6]. In their investigation of codon usage variation, Shackelton et al. (2006) found that codon usage bias was strongly correlated with overall genomic GC content, indicating that compositional constraint under mutation pressure, rather than natural selection, was the main factor shaping the usage of specific codons [7]. Naya et al. (2001) examined the Chlamydomonas reinhardtii genome, which has a high GC content, and found no evidence that base constraint under mutation pressure was responsible for determining the codon usage pattern [8]. Recently, it was also reported that codon usage variation is related to gene function and length [9,10], DNA replication and selective transcription [11], protein secondary structure [12,13] and environmental factors [14].

Torque teno virus (TTV) is a small, single-stranded, negative-sense, non-enveloped, circular DNA virus [15], which has been classified as a member of the recently discovered Anelloviridae family [16]. It was first identified in 1997 in a Japanese patient with post-transfusion hepatitis of unknown aetiology [17]. Subsequently, TTV has been detected in humans, chimpanzees, poultry, swine, cattle, sheep, cats and dogs [18,19]. TTV was first detected in swine in 1999, and two genetically distinct species, Torque teno sus virus 1 (TTSuV1) and 2 (TTSuV2), have been identified based on the low sequence identity between the two variants [20].

Recently, Torque teno sus virus (TTSuV) infection of pigs has become widespread in many countries, including the USA, Canada, Spain, Germany, China, Japan, Korea and Brazil [21]. Although TTV infection in humans is not yet directly associated with any disease [22], TTSuVs have been shown to be involved in co-infection with other diseases, including the experimental induction of porcine dermatitis and nephropathy syndrome in combination with porcine reproductive and respiratory syndrome virus infection [23] and post-weaning multisystemic wasting syndrome (PMWS) in combination with porcine circovirus type 2 (PCV2) infection in a gnotobiotic pig model [24]. Moreover, Kekarainen et al. (2006) found that TTSuV2 was detected at a significantly higher rate in PMWS pigs than in healthy pigs [25]. Other research confirmed that the replication of TTSuV2, but not of TTSuV1, was up-regulated in pigs with PMWS [26,27]. This result was supported by Taira et al. (2009), who examined animals suspected of infection with PMWS and porcine respiratory disease complex [28]. However, due to the limited number of animal species examined and the lack of information about viral cell and tissue tropism, the characteristics and evolution of TTSuV are not fully understood.

We previously investigated synonymous codon usage in TTSuV1 [29] and began to suspect that this method might be important for elucidating the molecular mechanism and evolutionary process of TTSuV. In this study, synonymous codon usage bias was analyzed in the coding sequences (CDS) from the 41 available TTSuV2 genomes, and the codon usage patterns of TTSuV2 and TTSuV1 were compared.

Materials and Methods

Sequence data

Complete genome sequences from 41 TTSuV2 isolates were downloaded from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/Genbank/). Each TTSuV2 CDS was analyzed using DNAStar version 7.1 (DNAStar, Madison, WI). Table 1 summarizes relevant details about these viral sequences.

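Table 1. 41 complete TTSuV2 genes used in this study.

| No. | Accession no. | Name | Isolation | Year | Length (bp) |
|---|---|---|---|---|---|
| 1 | AY823991 | 2p | Brazil | 2005 | 1875 |
| 2 | GU188046 | 472142 | Germany | 2008 | 1878 |
| 3 | GU456385 | PTTV2b-VA | USA | 2008 | 1878 |
| 4 | GU456386 | PTTV2c-VA | USA | 2008 | 1878 |
| 5 | GU570197 | TTV2_GE9 | Spain | 2011 | 1884 |
| 6 | GU570203 | TTV2_1907 | Spain | 2011 | 1884 |
| 7 | GU570204 | TTV2_G31 | Spain | 2011 | 1884 |
| 8 | GU570205 | TTV2_G33 | Spain | 2011 | 1884 |
| 9 | GU570206 | TTV2_G43 | Spain | 2011 | 1875 |
| 10 | GU570207 | TTV2_G61 | Spain | 2011 | 1875 |
| 11 | GU570208 | TTV2_G64 | Spain | 2011 | 1884 |
| 12 | GU570209 | TTV2_GE1 | Spain | 2010 | 1884 |
| 13 | HM633214 | TTV2Bj7-2 | China | 2009 | 1863 |
| 14 | HM633215 | TTV2Bj2-3 | China | 2009 | 1884 |
| 15 | HM633216 | TTV2Bj4-3 | China | 2009 | 1884 |
| 16 | HM633217 | TTV2Bj6-2 | China | 2009 | 1875 |
| 17 | HM633218 | TTV2Bj6-3 | China | 2009 | 1875 |
| 18 | HM633219 | TTV2Bj7-3 | China | 2009 | 1884 |
| 19 | HM633220 | TTV2Fj2 | China | 2009 | 1884 |
| 20 | HM633221 | TTV2Jl1 | China | 2009 | 1884 |
| 21 | HM633222 | TTV2Jl2 | China | 2009 | 1884 |
| 22 | HM633223 | TTV2Jl27 | China | 2009 | 1863 |
| 23 | HM633224 | TTV2Bj1-2 | China | 2009 | 1878 |
| 24 | HM633225 | TTV2Hb1 | China | 2009 | 1884 |
| 25 | HM633226 | TTV2Bj8 | China | 2009 | 1863 |
| 26 | HM633227 | TTV2Bj11 | China | 2009 | 1878 |
| 27 | HM633228 | TTV2Bj12 | China | 2009 | 1875 |
| 28 | HM633229 | TTV2Gx1 | China | 2010 | 1872 |
| 29 | HM633230 | TTV2Gx2 | China | 2009 | 1878 |
| 30 | HM633231 | TTV2Gx3-2 | China | 2009 | 1884 |
| 31 | HM633232 | TTV2Gx4 | China | 2009 | 1872 |
| 32 | HM633233 | TTV2Jx1 | China | 2009 | 1884 |
| 33 | HM633234 | TTV2Jx2 | China | 2009 | 1875 |
| 34 | HM633235 | TTV2Ln13 | China | 2009 | 1863 |
| 35 | HM633236 | TTV2Ln14 | China | 2009 | 1872 |
| 36 | HM633237 | TTV2Ln21 | China | 2009 | 1863 |
| 37 | HM633238 | TTV2Ln22 | China | 2009 | 1863 |
| 38 | HM633239 | TTV2Ln23-2 | China | 2009 | 1875 |
| 39 | HM633240 | lung1 | China | 2009 | 1884 |
| 40 | HM633241 | lung3 | China | 2009 | 1878 |
| 41 | HQ204188 | SC | China | 2010 | 1878 |
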
Recombination analysis

The Recombination Analysis Tool (RAT, http://cbr.jic.ac.uk/dicks/software/RAT/) was used to detect recombination events in TTSuV2 and TTSuV1 sequences. Recombination is a prevailing force that shapes genome evolution, and it is believed to influence the efficacy of natural selection on codon usage [30]. RAT uses a distance-based algorithm to perform pairwise comparisons within multiple sequence alignments (DNA or protein). The RAT graph represents the genetic distance of each sequence in the alignment to a reference sequence (Y-axis) for each position in the sequence (X-axis). A putative recombination event is detected when the lines representing two sequences intersect in the graph [31].

Compositional properties measures

General nucleotide composition (A%, C%, T% and G%) and nucleotide composition at the third position of each codon (A3S%, C3S%, T3S% and G3S%) were analyzed for TTSuV2 CDSs using Molecular Evolutionary Genetics Analysis (MEGA) software version 5.0 [32]. The GC and GC3S indices were used to calculate, respectively, the overall G + C content of the gene sequence and the G + C content at the third position of synonymous codons (excluding Met, Trp and termination codons).
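As a rough illustration of the two indices, the sketch below computes GC% and GC3S% for a single coding sequence. The function names and the toy sequence are hypothetical, and the study itself used MEGA and CodonW, whose exact conventions (for example, handling of ambiguous bases) may differ from this simplified version.

```python
# Hypothetical helpers; only the definitions of GC and GC3S come from the text.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}  # Met, Trp and termination codons


def codons(cds):
    """Split a coding sequence into in-frame triplets (trailing bases dropped)."""
    seq = cds.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def gc_content(cds):
    """Overall G + C percentage of the sequence."""
    seq = cds.upper()
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def gc3s(cds):
    """G + C percentage at the third position of synonymous codons."""
    third = [c[2] for c in codons(cds) if c not in EXCLUDED]
    return 100.0 * sum(base in "GC" for base in third) / len(third)


# Toy sequence (not a TTSuV2 CDS): ATG GCA GCC AAA TGG TTA TAA
toy = "ATGGCAGCCAAATGGTTATAA"
print(round(gc_content(toy), 2), gc3s(toy))
```

In the toy sequence only GCA, GCC, AAA and TTA contribute to GC3S, because the Met, Trp and stop codons are excluded.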

Measure of synonymous codon usage

Relative synonymous codon usage (RSCU) values and effective number of codons (ENC) values were calculated using CodonW software version 1.4 (http://codonw.sourceforge.net). The RSCU is defined as the ratio between the usage frequency of one codon in the gene and its expected frequency in the synonymous codon family (i.e., the observed frequency of a codon adjusted for amino acid composition). RSCU value is calculated according to the following published equation [33]:

$$\mathrm{RSCU}_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}}$$

Here $X_{ij}$ denotes the number of occurrences of the $j$-th codon for the $i$-th amino acid in the CDS, and $n_i$ denotes the total number of synonymous codons encoding the $i$-th amino acid. Codons with RSCU values greater than 1.0 exhibit positive codon usage bias, while those with RSCU values less than 1.0 have negative codon usage bias. An RSCU value of 1.0 indicates that the codon is used at the frequency expected if all synonymous codons for that amino acid were used equally.
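The RSCU definition can be made concrete with a short sketch. It takes the numerator to be the observed count of a codon and the denominator to be the mean count over that amino acid's synonymous codons, which is the usual reading of the published equation; the function name, the two-family codon table and the toy sequence are illustrative assumptions, not part of the CodonW workflow used in the study.

```python
from collections import Counter

# Two synonymous families only, to keep the sketch short (standard genetic code).
FAMILIES = {
    "Lys": ["AAA", "AAG"],
    "Ala": ["GCU", "GCC", "GCA", "GCG"],
}


def rscu(cds, families=FAMILIES):
    """Return {codon: RSCU} for every family that occurs in the sequence."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values = {}
    for codon_family in families.values():
        total = sum(counts[c] for c in codon_family)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU undefined
        expected = total / len(codon_family)  # count under equal usage
        for codon in codon_family:
            values[codon] = counts[codon] / expected
    return values


# Toy sequence (not a TTSuV2 CDS): GCA GCA GCA GCC AAA AAA AAG
toy = "GCAGCAGCAGCCAAAAAAAAG"
for codon, value in sorted(rscu(toy).items()):
    print(codon, round(value, 2))
```

Within each family the RSCU values sum to the number of synonymous codons, so a value above 1.0 for one codon is always offset by values below 1.0 for others.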

The ENC is the most useful estimator of absolute synonymous codon usage bias [34] and can indicate the degree of synonymous codon bias in a codon family. ENC values range from 20 (only one synonymous codon occurs in the CDS) to 61 (all synonymous codons occur with equal frequency). A gene with an ENC value lower than 35 is generally considered to have significant codon usage bias.
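For reference, the ENC is commonly computed with Wright's (1990) formulation, which is assumed here to be the one behind the values reported in this study (the text does not spell it out). For an amino acid observed $n$ times, with $p_i$ the relative frequency of its $i$-th synonymous codon among $k$ alternatives, the homozygosity $F$ and the resulting ENC are:

```latex
F = \frac{n \sum_{i=1}^{k} p_i^{2} - 1}{n - 1},
\qquad
\mathrm{ENC} = 2 + \frac{9}{\bar{F}_{2}} + \frac{1}{\bar{F}_{3}} + \frac{5}{\bar{F}_{4}} + \frac{3}{\bar{F}_{6}}
```

Here $\bar{F}_{d}$ is the mean of $F$ over the amino acids with $d$ synonymous codons. The two limits quoted above follow directly: if every amino acid uses a single codon, all $F = 1$ and ENC $= 2 + 9 + 1 + 5 + 3 = 20$; with perfectly equal usage, $\bar{F}_{d} = 1/d$ and ENC $= 2 + 18 + 3 + 20 + 18 = 61$.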

Correspondence analysis

Correspondence analysis (COA), a multivariate ordination method closely related to principal component analysis, was performed with CodonW software version 1.4. COA is the most commonly used multivariate statistical analysis method [35]. In this analysis, COA was used to study the major trends in sequence variation and to distribute genes along continuous axes according to these trends. Each gene was represented as a 59-dimensional vector, each dimension corresponding to the RSCU value for one sense codon (excluding Met, Trp and termination codons). Major variation trends within this dataset can be determined from the relative inertia: genes were positioned along the axes of major inertia to determine the major factors affecting codon usage bias in the gene.

Statistical analysis

Correlation analysis was performed to assess the relationship between nucleotide composition and synonymous codon usage pattern using Spearman’s rank correlation analysis method. A phylogenetic tree was constructed by the neighbor-joining method with a bootstrap of 1000 replicates, based on the Clustal W alignment produced with MEGA software version 5. Cluster analysis was performed using the hierarchical cluster method, and the distances between selected sequences were calculated by the Euclidean distance method. All statistical results were analyzed using Student’s t-test in SPSS software version 11.6 for Windows (p > 0.05, no difference; 0.01 < p < 0.05, non-significant difference; p < 0.01, significant difference).
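Spearman's rank correlation amounts to a Pearson correlation computed on ranks. The standard-library sketch below illustrates the idea; the function names are hypothetical, tied values receive average ranks (a common convention, assumed here rather than taken from the SPSS analysis), and the sample input is the A% and A3S% columns of the first five isolates in Table 2, used only for illustration.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# A% and A3S% for isolates 1-5 of Table 2 (sample input only).
a_pct = [36.33, 35.60, 34.88, 34.88, 35.44]
a3s_pct = [41.96, 43.24, 39.46, 39.46, 40.17]
print(round(spearman(a_pct, a3s_pct), 3))
```

A coefficient from five isolates says nothing about the full dataset; the reported values in this study are based on all 41 sequences.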

Results

Recombination analysis

Recombination is believed to influence the efficacy of natural selection on codon usage [30]. A single recombinant sequence present in an alignment can seriously influence the branch order and branch length of the trees generated using standard phylogenetic methods [36]. Therefore, it was necessary to exclude any TTSuV2 and TTSuV1 sequences found to be recombinant from further analysis. Recombination analysis of a nucleotide sequence alignment including all 41 TTSuV2 sequences and 29 TTSuV1 sequences was performed using RAT software (Figure 1). The resulting graph provided no evidence for recombination within or between TTSuV2 and TTSuV1 sequences. However, the graph indicated that the sequences diverged at nucleotide position 2282 into branches corresponding to TTSuV2 and TTSuV1.

Figure 1. Recombination analysis of TTSuV2 and TTSuV1 sequences using the RAT.

The colour of the line on the graph is the same as the colour of its sequence name on the left.

The 41 TTSuV2 sequences were further analyzed for codon usage bias and the synonymous codon usage pattern between TTSuV2 and TTSuV1 (previously analyzed) [29] was compared, as described in the following sections.

Compositional properties

The nucleotide content of the TTSuV2 genomes is provided in Table 2. In the CDSs from the 41 genomes, A and G occurred more frequently than C and T. A occurred most frequently at the third codon position (average A3S% = 41.77%) and T occurred the least frequently (average T3S% = 27.67%). The overall nucleotide composition and the composition at the third codon position in TTSuV2 genomes suggest that compositional constraint might be influencing the codon usage pattern of this genome. The GC% of TTSuV2 genomes (42.9% to 46.7%, average 45.1%) is lower than for other vertebrate DNA viruses. The GC3S% ranged from 43.2% to 48.2% with a mean value of 46.2%. Due to this compositional constraint, it was expected that A would occur most frequently at the third codon position in TTSuV2 genomes.

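Table 2. Nucleotide content of 41 TTSuV2 genomes (%).

| No. | T(U) | T(U)3S | C | C3S | A | A3S | G | G3S | GC | GC3S | ENC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 20.54 | 27.81 | 21.41 | 33.11 | 36.33 | 41.96 | 21.72 | 28.10 | 45.0 | 46.0 | 54.55 |
| 2 | 20.22 | 26.48 | 22.54 | 32.61 | 35.60 | 43.24 | 21.64 | 26.19 | 45.9 | 45.3 | 55.20 |
| 3 | 20.40 | 27.06 | 22.41 | 33.55 | 34.88 | 39.46 | 22.31 | 29.01 | 46.7 | 48.0 | 57.31 |
| 4 | 20.40 | 27.06 | 22.41 | 33.55 | 34.88 | 39.46 | 22.31 | 29.01 | 46.7 | 48.0 | 57.31 |
| 5 | 20.75 | 27.89 | 21.67 | 33.33 | 35.44 | 40.17 | 22.14 | 27.76 | 45.6 | 46.5 | 57.86 |
| 6 | 20.44 | 27.91 | 21.83 | 33.41 | 35.90 | 40.40 | 21.83 | 28.33 | 45.5 | 46.6 | 58.18 |
| 7 | 20.50 | 28.07 | 21.57 | 32.89 | 35.74 | 40.52 | 22.19 | 28.17 | 45.6 | 46.2 | 58.16 |
| 8 | 20.39 | 27.91 | 21.88 | 33.41 | 35.90 | 40.40 | 21.83 | 28.33 | 45.6 | 46.6 | 58.18 |
| 9 | 20.34 | 27.31 | 21.37 | 33.26 | 36.47 | 41.72 | 21.83 | 28.47 | 45.1 | 46.5 | 53.76 |
| 10 | 20.03 | 27.63 | 21.67 | 32.89 | 36.47 | 41.69 | 21.83 | 27.25 | 45.6 | 45.8 | 56.04 |
| 11 | 20.03 | 27.03 | 21.52 | 33.19 | 36.11 | 41.33 | 22.34 | 27.88 | 45.8 | 46.4 | 55.66 |
| 12 | 20.96 | 28.54 | 21.47 | 32.68 | 35.44 | 39.91 | 22.14 | 28.13 | 45.4 | 46.2 | 58.18 |
| 13 | 21.50 | 27.29 | 21.34 | 31.99 | 37.02 | 44.04 | 20.15 | 28.54 | 43.4 | 45.1 | 54.69 |
| 14 | 19.93 | 28.05 | 21.52 | 33.71 | 36.67 | 41.83 | 21.88 | 27.99 | 45.4 | 46.1 | 55.37 |
| 15 | 20.44 | 28.28 | 21.06 | 34.39 | 37.03 | 41.31 | 21.47 | 27.62 | 44.6 | 46.5 | 55.04 |
| 16 | 19.81 | 28.28 | 21.83 | 33.71 | 36.58 | 40.40 | 21.78 | 27.92 | 45.8 | 46.5 | 57.14 |
| 17 | 20.09 | 26.23 | 21.79 | 34.75 | 36.17 | 41.24 | 21.95 | 29.17 | 46.1 | 48.1 | 57.56 |
| 18 | 19.88 | 28.41 | 21.73 | 33.33 | 36.72 | 41.74 | 21.67 | 26.97 | 45.5 | 45.5 | 56.51 |
| 19 | 20.24 | 28.07 | 21.62 | 32.68 | 35.95 | 40.35 | 22.19 | 29.02 | 45.8 | 46.6 | 57.03 |
| 20 | 19.93 | 28.05 | 21.62 | 33.71 | 36.52 | 41.65 | 21.93 | 27.86 | 45.5 | 46.1 | 55.55 |
| 21 | 20.29 | 27.95 | 21.57 | 32.53 | 35.90 | 40.35 | 22.24 | 28.78 | 45.9 | 46.5 | 56.86 |
| 22 | 20.40 | 27.33 | 20.87 | 31.89 | 37.90 | 43.16 | 20.82 | 30.00 | 44.2 | 45.9 | 57.78 |
| 23 | 20.66 | 26.29 | 21.23 | 33.71 | 36.12 | 40.80 | 22.00 | 30.26 | 45.2 | 48.0 | 56.84 |
| 24 | 20.34 | 28.48 | 21.06 | 33.86 | 36.88 | 41.31 | 21.73 | 28.06 | 44.8 | 46.4 | 54.86 |
| 25 | 21.81 | 28.51 | 20.92 | 31.63 | 36.86 | 45.07 | 20.40 | 26.82 | 42.9 | 43.8 | 53.98 |
| 26 | 19.99 | 26.29 | 22.00 | 35.06 | 36.01 | 40.92 | 22.00 | 29.10 | 46.3 | 48.2 | 57.24 |
| 27 | 20.59 | 28.16 | 21.57 | 33.92 | 36.07 | 40.77 | 21.78 | 28.06 | 45.3 | 46.7 | 54.41 |
| 28 | 21.09 | 27.25 | 21.34 | 31.21 | 37.42 | 44.59 | 20.16 | 28.12 | 43.4 | 44.4 | 55.97 |
| 29 | 20.56 | 26.15 | 21.84 | 33.63 | 35.55 | 41.16 | 22.05 | 28.95 | 46.1 | 47.5 | 56.81 |
| 30 | 20.34 | 28.28 | 21.11 | 34.39 | 36.93 | 40.99 | 21.62 | 27.79 | 44.9 | 46.7 | 54.93 |
| 31 | 21.45 | 26.81 | 21.14 | 31.21 | 36.90 | 43.72 | 20.52 | 28.50 | 43.7 | 44.9 | 56.01 |
| 32 | 20.08 | 26.67 | 21.57 | 33.11 | 36.31 | 42.54 | 22.03 | 28.07 | 45.3 | 46.0 | 56.14 |
| 33 | 20.38 | 29.12 | 21.57 | 32.28 | 37.31 | 43.51 | 20.74 | 26.85 | 44.7 | 44.1 | 56.39 |
| 34 | 20.77 | 27.27 | 21.13 | 31.82 | 37.64 | 42.89 | 20.46 | 30.56 | 44.1 | 46.2 | 56.61 |
| 35 | 21.19 | 29.36 | 21.19 | 30.91 | 37.16 | 43.94 | 20.47 | 26.42 | 43.7 | 43.2 | 56.11 |
| 36 | 21.18 | 28.47 | 20.40 | 31.48 | 37.85 | 42.66 | 20.56 | 31.39 | 43.5 | 45.9 | 57.41 |
| 37 | 21.08 | 28.37 | 20.46 | 31.63 | 38.21 | 43.22 | 20.25 | 31.22 | 43.2 | 45.8 | 56.27 |
| 38 | 20.43 | 27.33 | 21.47 | 33.78 | 36.38 | 40.67 | 21.72 | 29.15 | 45.2 | 47.3 | 54.02 |
| 39 | 19.88 | 26.93 | 21.93 | 33.55 | 36.67 | 42.83 | 21.52 | 27.23 | 45.4 | 45.8 | 54.49 |
| 40 | 20.40 | 28.92 | 21.64 | 32.96 | 37.35 | 43.12 | 20.61 | 26.59 | 44.4 | 44.4 | 56.01 |
| 41 | 20.14 | 27.11 | 21.79 | 33.78 | 36.27 | 41.63 | 21.79 | 28.85 | 45.7 | 47.1 | 56.25 |
| mean | 20.48 | 27.67 | 21.52 | 33.04 | 36.46 | 41.77 | 21.53 | 28.35 | 45.09 | 46.18 | 56.21 |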

The ENC values of these TTSuV2 genomes were much higher than those of other DNA viruses, varying from 53.76 to 58.18 with a mean value of 56.21 (Table 2). This result indicates that codon usage bias is not pronounced in TTSuV2 genomes and is apparently maintained at a stable level.

Codon usage in TTSuV2

The overall RSCU values for the 59 codons in all 41 TTSuV2 genomes indicated that A and C occurred most frequently at the third codon position (e.g., GUA for Val, GCA for Ala, CAA for Gln and AAC for Asn), as shown in Table 3. In addition, the CCU, ACU and UAU codons, encoding Pro, Thr and Tyr, respectively, occurred more frequently than the other synonymous codons for these amino acids. Two codons encoding Arg, AGA and AGG, also occurred more frequently than their synonymous codons. These results support the hypothesis that compositional constraint is a major contributing factor in the codon usage pattern of TTSuV2 genomes.

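Table 3. RSCU values of codons in TTSuV2, TTSuV1 and swine. (a)

| AA (b) | Codon | TTSuV2 | TTSuV1 | SUS (c) |
|---|---|---|---|---|
| Phe | UUU | 1.00 | 0.76 | 1.11 |
| | UUC | 1.00 | 1.24 | 0.89 |
| Leu | UUA | 1.56 | 1.38 | 0.65 |
| | UUG | 0.85 | 1.21 | 0.85 |
| | CUU | 0.90 | 0.49 | 1.20 |
| | CUC | 1.03 | 0.71 | 1.12 |
| | CUA | 1.23 | 0.99 | 0.56 |
| | CUG | 0.44 | 1.22 | 1.62 |
| Ile | AUU | 0.53 | 0.57 | 1.06 |
| | AUC | 0.95 | 1.02 | 1.11 |
| | AUA | 1.51 | 1.41 | 0.83 |
| Val | GUU | 0.71 | 0.62 | 1.11 |
| | GUC | 0.56 | 0.40 | 0.96 |
| | GUA | 1.74 | 1.28 | 0.64 |
| | GUG | 0.99 | 1.71 | 1.29 |
| Ser | UCU | 0.82 | 0.75 | 1.34 |
| | UCC | 1.35 | 0.72 | 1.20 |
| | UCA | 1.65 | 1.49 | 1.00 |
| | UCG | 0.53 | 0.56 | 0.31 |
| | AGU | 0.55 | 1.39 | 0.93 |
| | AGC | 1.10 | 1.10 | 1.22 |
| Pro | CCU | 1.58 | 1.21 | 1.26 |
| | CCC | 0.25 | 0.72 | 1.08 |
| | CCA | 1.48 | 1.11 | 1.23 |
| | CCG | 0.69 | 0.96 | 0.43 |
| Thr | ACU | 1.31 | 1.01 | 1.19 |
| | ACC | 1.01 | 1.23 | 1.23 |
| | ACA | 1.06 | 1.21 | 1.24 |
| | ACG | 0.63 | 0.55 | 0.34 |
| Tyr | UAU | 1.10 | 1.10 | 1.12 |
| | UAC | 0.90 | 0.90 | 0.82 |
| Ala | GCU | 0.81 | 1.20 | 1.36 |
| | GCC | 0.94 | 1.15 | 1.22 |
| | GCA | 1.51 | 1.15 | 1.05 |
| | GCG | 0.74 | 0.50 | 0.37 |
| His | CAU | 0.91 | 0.60 | 0.97 |
| | CAC | 1.09 | 1.40 | 1.03 |
| Gln | CAA | 1.01 | 0.90 | 0.85 |
| | CAG | 0.99 | 1.10 | 1.15 |
| Asn | AAU | 0.97 | 1.01 | 1.02 |
| | AAC | 1.03 | 0.99 | 0.98 |
| Lys | AAA | 1.26 | 1.22 | 1.21 |
| | AAG | 0.74 | 0.78 | 0.79 |
| Asp | GAU | 0.67 | 0.74 | 0.95 |
| | GAC | 1.33 | 1.26 | 1.05 |
| Glu | GAA | 1.26 | 1.08 | 1.09 |
| | GAG | 0.74 | 0.92 | 0.91 |
| Cys | UGU | 0.67 | 0.55 | 1.06 |
| | UGC | 1.33 | 1.45 | 0.94 |
| Arg | CGU | 0.58 | 0.51 | 0.55 |
| | CGC | 0.39 | 0.93 | 0.65 |
| | CGA | 0.50 | 0.63 | 0.54 |
| | CGG | 0.64 | 0.55 | 0.74 |
| | AGA | 2.16 | 1.34 | 1.86 |
| | AGG | 1.73 | 2.05 | 1.67 |
| Gly | GGU | 0.49 | 0.74 | 0.81 |
| | GGC | 0.66 | 0.99 | 1.08 |
| | GGA | 2.07 | 1.31 | 1.18 |
| | GGG | 0.78 | 0.95 | 0.94 |

a. The preferred codons for each amino acid are displayed in bold.

b. AA is the abbreviation of amino acid.

c. SUS is swine.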

For TTSuV2 sequences, ENC was plotted against both the GC content at the third synonymous codon position (GC3S%) and the expected ENC values, as determined by CodonW analysis (Figure 2). All actual codon usage indices were lower than expected, although differences were small. In addition, a positive correlation (r = 0.316, 0.01 < p < 0.05) between GC3S and ENC values was found. These results taken together support the conclusion that factors other than compositional constraint under mutation pressure (the major factor accounting for codon usage bias) have influenced TTSuV2 evolution.

Figure 2. Distribution of the ENC values and GC content at the third synonymous codon position (GC3S).

The curve indicates the expected codon usage if compositional constraint alone accounts for codon usage bias.
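An expected curve of this kind is usually drawn with Wright's approximation, ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3S expressed as a fraction. The sketch below assumes that this standard formula underlies the curve in Figure 2 (the text does not state it explicitly); the function name is hypothetical, and the mean GC3S and ENC values are those reported in Table 2.

```python
def expected_enc(gc3s):
    """Expected ENC when GC3S alone shapes codon usage; gc3s is a fraction in [0, 1]."""
    s = gc3s
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


# Mean values from Table 2: GC3S = 46.18 %, ENC = 56.21.
mean_gc3s, mean_enc = 0.4618, 56.21
gap = expected_enc(mean_gc3s) - mean_enc
print(round(expected_enc(mean_gc3s), 2), round(gap, 2))
```

A positive gap means the observed ENC lies below the curve, which is the pattern the text describes for the TTSuV2 sequences.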

COA of codon usage

To investigate RSCU variation, COA was performed using the 41 TTSuV2 genomes as a single dataset. As described in the "Materials and methods" section, the distribution of genes on the COA axis was used to identify the source of the variation among a set of multivariate data points. A major trend in the first axis (f1’) accounted for 16.91% of total synonymous codon usage variation, and the second major trend in the second axis (f2’) accounted for 13.72% of the total variation (data not shown).

COA was performed for TTSuV1 and TTSuV2 genomes separately and the first two axes of the plots are shown in Figure 3. Although TTSuV1 and TTSuV2 genes occupied all four quadrants of the rectangular coordinate system, the points were generally separated from each other. This result reveals that variation in codon usage might be one of the factors driving the observed aspect of TTSuV evolution.

Figure 3. Correspondence analysis of codon usage patterns of TTSuV2 and TTSuV1.

Effect of mutational bias on codon usage variation

To explore whether the evolution of codon usage bias in TTSuV2 CDS had been driven by mutation pressure alone or whether translational selection from its host had also contributed, we first examined the correlation between general nucleotide composition (A%, T%, G%, C%, GC%) and nucleotide composition at the third codon position (A3S%, T3S%, G3S%, C3S%, GC3S%) using Spearman’s rank correlation analysis method (Table 4). A significant positive correlation was observed between A% and A3S% (r = 0.761, p < 0.01), C% and C3S% (r = 0.392, 0.01 < p < 0.05), and GC% and GC3S% (r = 0.645, p < 0.01), and a significant negative correlation was observed for most of the heterogeneous nucleotide comparisons. Taken alone, these results suggest that compositional constraints under mutation pressure determine the codon usage pattern for TTSuV2. However, the significant positive correlations between G% and C3S% (r = 0.434, p < 0.01) and between GC% and T3S% (r = 0.434, p < 0.01), together with the absence of correlation between T% and T3S% (r = 0.175, p > 0.05) and between G% and G3S% (r = 0.171, p > 0.05), suggest that natural selection from its host might have played an appreciable role in determining the codon usage pattern of this virus.

Table 4. Correlation analysis between A, T, G, C, GC contents and A3S, T3S, G3S, C3S, GC3S contents in TTSuV2 CDS. (a)

| | A3S% | T(U)3S% | G3S% | C3S% | GC3S% |
|---|---|---|---|---|---|
| A% | 0.761** | 0.378* | -0.086NS | -0.364* | -0.616** |
| T(U)% | 0.238NS | 0.175NS | 0.234NS | -0.52** | -0.237NS |
| G% | -0.805** | -0.391* | 0.171NS | 0.434** | 0.664** |
| C% | -0.458** | -0.393* | -0.139NS | 0.392* | 0.378* |
| GC% | -0.710** | 0.434** | 0.078NS | 0.505** | 0.645** |

a. Values in this table are the correlation coefficients (r) from the correlation analysis.

NS: non-significant (p > 0.05).

*: 0.01 < p < 0.05.

**: p < 0.01.

Furthermore, G + C content at the first and second codon positions (GC1% and GC2%) was compared with the G + C content at the third codon position (GC3%). A highly significant correlation was observed between GC1% and GC2% (r = 0.551, p < 0.01), between GC1% and GC3% (r = 0.699, p < 0.01), and between GC2% and GC3% (r = 0.490, p < 0.01). Since the effects were present at all codon positions, the results further support the hypothesis that nucleotide constraint under mutation pressure was a main determinant of the synonymous codon usage pattern in TTSuV2.

Correlation analysis was also performed between the first two principal axes of the COA (f1’ and f2’) and A%, T%, G%, C%, GC%, A3S%, T3S%, G3S%, C3S% and GC3S% (Table 5). The first principal axis (f1’) exhibited a significant positive correlation with G%, C%, GC%, C3S% and GC3S% and a negative correlation with A% and A3S%. It was interesting to note that, except for G3S% (r = –0.357, 0.01 < p < 0.05), the second principal axis (f2’) had no correlation with any nucleotide content. These results further support the conclusion that compositional constraint under mutational bias is an important factor determining the synonymous codon usage pattern in TTSuV2, but that other factors, such as natural selection, also contributed.

Table 5. Correlation analysis between the first two axes and nucleotide contents in TTSuV2 CDS. (a)

| | A% | T(U)% | G% | C% | GC% | A3S% | T(U)3S% | G3S% | C3S% | GC3S% |
|---|---|---|---|---|---|---|---|---|---|---|
| f1 | -0.631** | -0.367* | 0.614** | 0.493** | 0.552** | -0.619** | -0.260NS | 0.054NS | 0.664** | 0.608** |
| f2 | 0.071NS | -0.014NS | -0.023NS | -0.233NS | -0.236NS | 0.017NS | 0.270NS | -0.357* | -0.093NS | -0.286NS |

a. Values in this table are the correlation coefficients (r) from the correlation analysis.

NS: non-significant (p > 0.05).

*: 0.01 < p < 0.05.

**: p < 0.01.

Relationship between TTSuV and host codon usage patterns

In the ENC plot (Figure 2), most points were near to and under the expected curve, which suggested that other factors contributed to codon usage bias in addition to mutation pressure. To examine this further, a comparative analysis of RSCU values was performed for TTSuV2, TTSuV1 and swine, the natural host for this virus. We found that the codon usage pattern of TTSuV2 was mostly coincident with that of TTSuV1 and that the similarity between the viruses and the host was low. In particular, except for CCU encoding Pro and UAU encoding Tyr, all the preferentially used codons in TTSuV2 and TTSuV1 had an A or C in the third codon position: UUA for Leu, AUA for Ile, UCA for Ser, CAC for His, GAC for Asp and UGC for Cys (Table 3). In contrast, the most frequent codons in swine had a T or A at the third codon position. Although some codons frequent in swine, such as CAC for His, AAA for Lys, GAC for Asp and GAA for Glu, were also frequent in TTSuV2 and TTSuV1, the high-frequency codons in swine (CUG for Leu, UCU for Ser, UGU for Cys) were generally low-frequency codons in TTSuV2 and TTSuV1. It is worth noting that the similarity to swine was higher for TTSuV1 than for TTSuV2. The RSCU values of several synonymous codons in TTSuV1 and swine, including GUG for Val, GCU for Ala, CAG for Gln and AAU for Asn, were clearly different from the TTSuV2 values. This suggests that TTSuV1 might have adapted to its host under natural selection to some degree for improved translation efficiency, and that selection pressure from the host had less effect on the codon usage pattern of TTSuV2.

Phylogenetic and cluster analysis

A cluster tree was generated with the RSCU values from all 41 TTSuV2 genomes using a hierarchical cluster method. As shown in Figure 4, the TTSuV2 CDS were divided into three main lineages (I–III). Lineage I comprised two strains isolated from the USA, one from Germany and five from China. Twenty-two strains isolated from Brazil, Spain and China were grouped into Lineage II. Lineage III comprised strains isolated from China only. Some genes from different isolates were classified into the same lineage, while other genes from the same isolate were classified into different lineages; thus lineage did not correspond well with geographical distribution.

Figure 4. Cluster tree result of 41 TTSuV2 genes based on hierarchical cluster method.
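Hierarchical clustering of this kind starts from pairwise Euclidean distances between RSCU vectors and repeatedly merges the closest profiles. The sketch below shows only that first distance step. It is illustrative rather than a reproduction of the SPSS analysis: the function name is hypothetical, and instead of the 41 per-isolate 59-dimensional vectors (not given in the text) it uses the six Leu codon values from Table 3 for TTSuV2, TTSuV1 and swine.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available from Python 3.8

# RSCU values for the Leu codons UUA, UUG, CUU, CUC, CUA, CUG (Table 3).
rscu_leu = {
    "TTSuV2": [1.56, 0.85, 0.90, 1.03, 1.23, 0.44],
    "TTSuV1": [1.38, 1.21, 0.49, 0.71, 0.99, 1.22],
    "swine": [0.65, 0.85, 1.20, 1.12, 0.56, 1.62],
}


def closest_pair(profiles):
    """Return the pair of profile names with the smallest Euclidean distance."""
    pairs = combinations(sorted(profiles), 2)
    return min(pairs, key=lambda p: dist(profiles[p[0]], profiles[p[1]]))


for a, b in combinations(sorted(rscu_leu), 2):
    print(a, b, round(dist(rscu_leu[a], rscu_leu[b]), 3))
print("first merge:", closest_pair(rscu_leu))
```

An agglomerative method would merge the closest pair first and then recompute distances to the merged group; the linkage rule used for that update is a further choice that the text does not specify.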

The phylogenetic analysis of all 41 TTSuV2 (black dots) and 29 TTSuV1 sequences (white dots) was performed to determine the conservation and variation of codon usage pattern within TTSuV lineages (Figure 5). The two major branches of the resulting phylogenetic tree corresponded to TTSuV2 and TTSuV1, and each branch had several minor branches. Thus, phylogenetic analysis of the two viruses did not reveal correlations between sequence differences and geographical distribution.

Figure 5. Phylogenetic tree of 41 TTSuV2 sequences and 29 TTSuV1 sequences.

● represents TTSuV2 and ○ represents TTSuV1.

Discussion

TTSuV is an emerging small DNA virus, widely distributed in pig-farming countries. Although reports implicate TTSuV in co-infection with other diseases, in-depth studies on its molecular characteristics and pathogenic mechanism are lacking [37,38]. Synonymous codon usage analysis is a well-established technique for analyzing genetic information from viral genomes. Most codon usage studies have focused on higher organisms or microorganisms with large genomes and on viruses that pose a great threat to human health, such as human immunodeficiency virus, human bocavirus [39], hepatitis virus [40] and influenza A virus [41]. Results from analyzing codon usage bias in TTSuV genomes are expected to contribute to the knowledge of the characteristics and molecular evolution of this virus. This report furthers our investigation of synonymous codon usage variation in TTSuV1 and provides the first analysis of TTSuV2.

Recombination is an important event in viral evolution and epidemiology [42]. It is interesting to note that recombinant viruses appear to be highly pathogenic, suggesting that recombination events either preserve or increase the pathogenicity of the original strains. Various studies have demonstrated that natural inter- and intra-genotypic recombination occurs frequently in viruses, as shown for highly pathogenic porcine reproductive and respiratory syndrome viruses [43], PCV2 [44], human enterovirus 71 [45], and rabbit haemorrhagic disease virus [46]. Thus, before analyzing codon usage bias for TTSuV2, we first conducted recombination analysis of 41 TTSuV2 sequences and 29 TTSuV1 sequences, and found no evidence for recombination between the two viruses (Figure 1).

In this study, we analyzed synonymous codon usage bias in TTSuV2 CDS, as well as the relationship between codon usage patterns of TTSuV2 and TTSuV1. Most frequent codons in both TTSuV2 and TTSuV1 had A or C at the third codon position. Mean ENC values for H5N1 influenza A virus [47], severe acute respiratory syndrome [48] and human bocavirus [39], reported as 50.91, 48.99 and 44.45, respectively, are lower than the ENC values for TTSuV2 and TTSuV1 (56.21 and 56.46, respectively), indicating a relatively low codon usage bias for these two viruses. Codon usage patterns for TTSuV2 and TTSuV1 were remarkably similar. In addition, no significant relationship was found between the codon usage pattern of TTSuV2 and its host; although TTSuV1 codon usage was comparatively more similar to swine than that of TTSuV2 (Table 3). This observation might be the result of genome composition evolution and dynamic processes of mutation and selection that enabled the TTSuV1 virus to escape the antiviral cell responses and adapt its codon usage to its host environment [49].

In this study, nucleotide frequency at the third codon position of synonymous codons correlated with general composition for some nucleotides but not for others (Table 4). The GC content was similar at all codon positions in TTSuV2 genomes, presumably as a result of mutational pressure. In addition, the general correlation between codon usage bias and compositional constraint suggests that mutational pressure was an important factor determining codon usage in TTSuV2, as seen in the highly significant correlation between GC1%, GC2% and GC3% (p < 0.01), and the marked correlation of f1’ values with A%, G%, C%, GC%, A3S%, C3S% and GC3S% (p < 0.01) (Table 5). Furthermore, in the ENC plot, values for TTSuV2 genomes were below the expected curve (Figure 2). Taken together, the above evidence indicates that compositional constraint under mutational pressure significantly contributed to the variation of synonymous codon usage in TTSuV2 genomes.

Natural selection has been shown to influence the synonymous codon usage pattern in viruses [50], and this conclusion is supported by this study. First, although the GC3S% for the TTSuV2 genome is lower than average (46.20%), the most frequent codons had A or C at the third codon position (Table 3). Second, a significant positive correlation existed between G% and C3S%, and between GC% and T3S% (p < 0.01), whereas no correlation was detected between T% and T3S% or between G% and G3S% (p > 0.05) (Table 4). Except for G3S%, no correlation was found between f2’ values and A%, T%, G%, C%, GC%, A3S%, T3S%, C3S% or GC3S% (p > 0.05) in this study (Table 5). Third, most points in the ENC plot were close to the expected curve, although all were below it (Figure 2). The above evidence suggests that, in addition to mutation pressure, natural selection also played an important role in determining codon usage bias for TTSuV2 genomes. Thus, codon bias in the TTSuV2 genome is multifactorial. We believe that these characteristics of TTSuV2 genomes might have conferred an adaptive advantage, resulting in highly efficient dissemination of this virus through different modes of transmission.

The analysis of TTSuV genome sequences identified two genetically distinct species, TTSuV1 and TTSuV2. COA was performed to detect possible codon usage variation between these two viruses. Unexpectedly, the two genetically distinct species were located far apart in the plane defined by the first two axes of the analysis (Figure 3). A cluster tree analysis based on the RSCU values of TTSuV2 genomes revealed that geographic origin did not correspond to the codon usage pattern of this virus (Figure 4). Further, the phylogenetic tree had two major branches corresponding to the two different species, and no specific geographical correlation was detected in this analysis (Figure 5). It seems likely that, given extensive international communication and various modes of transmission for this virus, geographical distance is a weak factor in the distribution of TTSuV2 in different countries.
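A cluster tree of this kind starts from pairwise distances between the RSCU profiles of the isolates. The sketch below shows only that first step, using Euclidean distance as an assumed metric; the clustering method and software actually used for Figure 4 are not restated here, and the isolate names, RSCU values and function names are hypothetical toy data.

```python
from math import sqrt


def rscu_distance(a, b):
    """Euclidean distance between two RSCU profiles given as {codon: RSCU}.

    A codon missing from one profile is treated as 0.0 (a simplifying assumption).
    """
    codons = set(a) | set(b)
    return sqrt(sum((a.get(c, 0.0) - b.get(c, 0.0)) ** 2 for c in codons))


def closest_pair(profiles):
    """Return (distance, name1, name2) for the two most similar profiles.

    This is the pair an agglomerative clustering would merge first.
    """
    names = sorted(profiles)
    return min(
        (rscu_distance(profiles[x], profiles[y]), x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
    )


# Hypothetical alanine-codon profiles for three made-up isolates.
toy_profiles = {
    "isolate_A": {"GCT": 2.0, "GCC": 1.0, "GCA": 1.0, "GCG": 0.0},
    "isolate_B": {"GCT": 1.8, "GCC": 1.2, "GCA": 1.0, "GCG": 0.0},
    "isolate_C": {"GCT": 0.4, "GCC": 2.4, "GCA": 0.4, "GCG": 0.8},
}
distance, first, second = closest_pair(toy_profiles)
```

If geography shaped codon usage, isolates from the same country would tend to be merged early in such a procedure; the absence of that pattern is what the cluster analysis above reports.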

In summary, our investigation of the synonymous codon usage pattern in TTSuV2 CDS revealed that codon usage bias is not remarkable, possibly reflecting the interaction between compositional constraint under mutation pressure and natural selection. However, both TTSuV1 and TTSuV2 genomes exhibited significant synonymous codon usage bias favoring A or C at the third codon position, presumably determined by compositional constraint under mutation pressure. Although the analysis of synonymous codon usage does not perfectly reflect the genetic variation of TTSuV2, nor does it distinguish between TTSuV1 and TTSuV2, our results provide insight into the codon usage variation in TTSuV2 genes that may also facilitate understanding of TTSuV evolution.

References

1. Gupta SK, Bhattacharyya TK, Ghosh TC (2004) Synonymous codon usage in Lactococcus lactis: mutational bias versus translational selection. J Biomol Struct Dyn 21: 527-536. doi:10.1080/07391102.2004.10506946. PubMed: 14692797.
2. Lloyd AT, Sharp PM (1992) Evolution of codon usage patterns: the extent and nature of divergence between Candida albicans and Saccharomyces cerevisiae. Nucleic Acids Res 20: 5289-5295. doi:10.1093/nar/20.20.5289. PubMed: 1437548.
3. Chen R, Holmes EC (2006) Avian influenza virus exhibits rapid evolutionary dynamics. Mol Biol Evol 23: 2336-2341. doi:10.1093/molbev/msl102. PubMed: 16945980.
4. Zhong J, Li Y, Zhao S, Liu S, Zhang Z (2007) Mutation pressure shapes codon usage in the GC-Rich genome of foot-and-mouth disease virus. Virus Genes 35: 767-776. doi:10.1007/s11262-007-0159-z. PubMed: 17768673.
5. Sau K, Sau S, Mandal SC, Ghosh TC (2005) Factors influencing the synonymous codon and amino acid usage bias in AT-rich Pseudomonas aeruginosa phage PhiKZ. Acta Biochim Biophys Sin (Shanghai) 37: 625-633. doi:10.1111/j.1745-7270.2005.00089.x. PubMed: 16143818.
6. Sharp PM, Li WH (1986) Codon usage in regulatory genes in Escherichia coli does not reflect selection for 'rare' codons. Nucleic Acids Res 14: 7737-7749. doi:10.1093/nar/14.19.7737. PubMed: 3534792.
7. Shackelton LA, Parrish CR, Holmes EC (2006) Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses. J Mol Evol 62: 551-563. doi:10.1007/s00239-005-0221-1. PubMed: 16557338.
8. Naya H, Romero H, Carels N, Zavala A, Musto H (2001) Translational selection shapes codon usage in the GC-rich genome of Chlamydomonas reinhardtii. FEBS Lett 501: 127-130. doi:10.1016/S0014-5793(01)02644-8. PubMed: 11470270.
9. Chiapello H, Lisacek F, Caboche M, Henaut A (1998) Codon usage and gene function are related in sequences of Arabidopsis thaliana. Gene 209: GC1-GC38.
10. Ma ES, Chow EY, Chan AY, Chu CM, Lin SY et al. (2002) Low affinity and unstable hemoglobin variant caused by AAC--ATC (Asn--Ile) mutation at codon 108 of the beta-globin gene. Haematologica 87: 553-554. PubMed: 12010673.
11. McInerney JO (1998) Replicational and transcriptional selection on codon usage in Borrelia burgdorferi. Proc Natl Acad Sci U S A 95: 10698-10703. doi:10.1073/pnas.95.18.10698. PubMed: 9724767.
12. Chiusano ML, Alvarez-Valin F, Di Giulio M, D'Onofrio G, Ammirato G et al. (2000) Second codon positions of genes and the secondary structures of proteins. Relationships and implications for the origin of the genetic code. Gene 261: 63-69. PubMed: 11164038.
13. Gupta SK, Majumdar S, Bhattacharya TK, Ghosh TC (2000) Studies on the relationships between the synonymous codon usage and protein secondary structural units. Biochem Biophys Res Commun 269: 692-696. doi:10.1006/bbrc.2000.2351. PubMed: 10720478.
14. Levin DB, Whittome B (2000) Codon usage in nucleopolyhedroviruses. J Gen Virol 81: 2313-2325. PubMed: 10950991.
15. Mushahwar IK, Erker JC, Muerhoff AS, Leary TP, Simons JN et al. (1999) Molecular and biophysical characterization of TT virus: evidence for a new virus family infecting humans. Proc Natl Acad Sci U S A 96: 3177-3182. doi:10.1073/pnas.96.6.3177. PubMed: 10077657.
16. Biagini P (2009) Classification of TTV and related viruses (anelloviruses). Curr Top Microbiol Immunol 331: 21-33. doi:10.1007/978-3-540-70972-5_2. PubMed: 19230555.
17. Nishizawa T, Okamoto H, Konishi K, Yoshizawa H, Miyakawa Y et al. (1997) A novel DNA virus (TTV) associated with elevated transaminase levels in posttransfusion hepatitis of unknown etiology. Biochem Biophys Res Commun 241: 92-97. doi:10.1006/bbrc.1997.7765. PubMed: 9405239.
18. Okamoto H, Nishizawa T, Takahashi M, Tawara A, Peng Y et al. (2001) Genomic and evolutionary characterization of TT virus (TTV) in tupaias and comparison with species-specific TTVs in humans and non-human primates. J Gen Virol 82: 2041-2050. PubMed: 11514713.
19. Okamoto H, Takahashi M, Nishizawa T, Tawara A, Fukai K et al. (2002) Genomic characterization of TT viruses (TTVs) in pigs, cats and dogs and their relatedness with species-specific TTVs in primates and tupaias. J Gen Virol 83: 1291-1297. PubMed: 12029143.
20. Niel C, Diniz-Mendes L, Devalle S (2005) Rolling-circle amplification of Torque teno virus (TTV) complete genomes from human and swine sera and identification of a novel swine TTV genogroup. J Gen Virol 86: 1343-1347. doi:10.1099/vir.0.80794-0. PubMed: 15831945.
21. McKeown NE, Fenaux M, Halbur PG, Meng XJ (2004) Molecular characterization of porcine TT virus, an orphan virus, in pigs from six different countries. Vet Microbiol 104: 113-117. doi:10.1016/j.vetmic.2004.08.013. PubMed: 15530745.
22. Jelcic I, Hotz-Wagenblatt A, Hunziker A, Zur Hausen H, de Villiers EM (2004) Isolation of multiple TT virus genotypes from spleen biopsy tissue from a Hodgkin's disease patient: genome reorganization and diversity in the hypervariable region. J Virol 78: 7498-7507. doi:10.1128/JVI.78.14.7498-7507.2004. PubMed: 15220423.
23. Kakkola L, Bondén H, Hedman L, Kivi N, Moisala S et al. (2008) Expression of all six human Torque teno virus (TTV) proteins in bacteria and in insect cells, and analysis of their IgG responses. Virology 382: 182-189. doi:10.1016/j.virol.2008.09.012. PubMed: 18947848.
24. Ellis JA, Allan G, Krakowka S (2008) Effect of coinfection with genogroup 1 porcine torque teno virus on porcine circovirus type 2-associated postweaning multisystemic wasting syndrome in gnotobiotic pigs. Am J Vet Res 69: 1608-1614. doi:10.2460/ajvr.69.12.1608. PubMed: 19046008.
25. Kekarainen T, Sibila M, Segalés J (2006) Prevalence of swine Torque teno virus in post-weaning multisystemic wasting syndrome (PMWS)-affected and non-PMWS-affected pigs in Spain. J Gen Virol 87: 833-837. doi:10.1099/vir.0.81586-0. PubMed: 16528032.
26. Aramouni M, Segalés J, Sibila M, Martin-Valls GE, Nieto D et al. (2011) Torque teno sus virus 1 and 2 viral loads in postweaning multisystemic wasting syndrome (PMWS) and porcine dermatitis and nephropathy syndrome (PDNS) affected pigs. Vet Microbiol 153: 377-381. doi:10.1016/j.vetmic.2011.05.046. PubMed: 21719215.
27. Nieto D, Aramouni M, Grau-Roma L, Segalés J, Kekarainen T (2011) Dynamics of Torque teno sus virus 1 (TTSuV1) and 2 (TTSuV2) DNA loads in serum of healthy and postweaning multisystemic wasting syndrome (PMWS) affected pigs. Vet Microbiol 152: 284-290. doi:10.1016/j.vetmic.2011.05.020. PubMed: 21680113.
28. Taira O, Ogawa H, Nagao A, Tuchiya K, Nunoya T et al. (2009) Prevalence of swine Torque teno virus genogroups 1 and 2 in Japanese swine with suspected post-weaning multisystemic wasting syndrome and porcine respiratory disease complex. Vet Microbiol 139: 347-350. doi:10.1016/j.vetmic.2009.06.010. PubMed: 19570625.
29. Zhang Z, Dai W, Wang Y, Lu C, Fan H (2013) Analysis of synonymous codon usage patterns in torque teno sus virus 1 (TTSuV1). Arch Virol 158: 14554. PubMed: 23011310.
30. Marais G, Mouchiroud D, Duret L (2001) Does recombination improve selection on codon usage? Lessons from nematode and fly complete genomes. Proc Natl Acad Sci U S A 98: 5688-5692. doi:10.1073/pnas.091427698. PubMed: 11320215.
31. Etherington GJ, Dicks J, Roberts IN (2005) Recombination Analysis Tool (RAT): a program for the high-throughput detection of recombination. Bioinformatics 21: 278-281. doi:10.1093/bioinformatics/bth500. PubMed: 15333462.
32. Tamura K, Peterson D, Peterson N, Stecher G, Nei M et al. (2011) MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28: 2731-2739. doi:10.1093/molbev/msr121. PubMed: 21546353.
33. Sharp PM, Li WH (1986) An evolutionary perspective on synonymous codon usage in unicellular organisms. J Mol Evol 24: 28-38. doi:10.1007/BF02099948. PubMed: 3104616.
34. Comeron JM, Aguadé M (1998) An evaluation of measures of synonymous codon usage bias. J Mol Evol 47: 268-274. doi:10.1007/PL00006384. PubMed: 9732453.
35. Zhou JH, Zhang J, Chen HT, Ma LN, Liu YS (2010) Analysis of synonymous codon usage in foot-and-mouth disease virus. Vet Res Commun 34: 393-404. doi:10.1007/s11259-010-9359-4. PubMed: 20425142.
36. Posada D, Crandall KA (2002) The effect of recombination on the accuracy of phylogeny estimation. J Mol Evol 54: 396-402. PubMed: 11847565.
37. Krakowka JE, MacIntosh K, Ringler SS, Rings DM, Hartunian C, Zhang Yan, Allan G (2008) Porcine genogroup 1 Torque Teno Virus (G1-TTV) Potentiates PCV2 & PRRSV infections in gnotobiotic swine. Proceedings of the International Pig Veterinary Society (Durban) 1: 99.
38. Krakowka S, Ellis JA (2008) Evaluation of the effects of porcine genogroup 1 torque teno virus in gnotobiotic swine. Am J Vet Res 69: 1623-1629. doi:10.2460/ajvr.69.12.1623. PubMed: 19046010.
39. Zhao S, Zhang Q, Liu X, Wang X, Zhang H et al. (2008) Analysis of synonymous codon usage in 11 human bocavirus isolates. Biosystems 92: 207-214. doi:10.1016/j.biosystems.2008.01.006. PubMed: 18378386.
40. Wang M, Zhang J, Zhou JH, Chen HT, Ma LN et al. (2011) Analysis of codon usage in type 1 and the new genotypes of duck hepatitis virus. Biosystems 106: 45-50. doi:10.1016/j.biosystems.2011.06.005. PubMed: 21708221.
41. Wong EH, Smith DK, Rabadan R, Peiris M, Poon LL (2010) Codon usage bias and the evolution of influenza A viruses. Codon Usage Biases of Influenza Virus. BMC Evol Biol 10: 253. doi:10.1186/1471-2148-10-253. PubMed: 20723216.
42. Posada D, Crandall KA, Holmes EC (2002) Recombination in evolutionary genomics. Annu Rev Genet 36: 75-97. doi:10.1146/annurev.genet.36.040202.111115. PubMed: 12429687.
43. Shi M, Holmes EC, Brar MS, Leung FC (2013) Recombination is Associated with an Outbreak of Novel Highly Pathogenic Porcine Reproductive and Respiratory Syndrome Viruses in China. J Virol 87: 109047. PubMed: 23885071.
44. Ramos N, Mirazo S, Castro G, Arbiza J (2013) Molecular analysis of Porcine Circovirus Type 2 strains from Uruguay: Evidence for natural occurring recombination. Infect Genet Evol 19C: 23-31. PubMed: 23806516.
45. Li J, Huo X, Dai Y, Yang Z, Lei Y et al. (2012) Evidences for intertypic and intratypic recombinant events in EV71 of hand, foot and mouth disease during an epidemic in Hubei Province, China, 2011. Virus Res 169: 195-202. doi:10.1016/j.virusres.2012.07.028. PubMed: 22922556.
46. Abrantes J, Esteves PJ, van der Loo W (2008) Evidence for recombination in the major capsid gene VP60 of the rabbit haemorrhagic disease virus (RHDV). Arch Virol 153: 329-335. doi:10.1007/s00705-007-1084-0. PubMed: 18193156.
47. Zhou T, Gu W, Ma J, Sun X, Lu Z (2005) Analysis of synonymous codon usage in H5N1 virus and other influenza A viruses. Biosystems 81: 77-86. doi:10.1016/j.biosystems.2005.03.002. PubMed: 15917130.
48. Gu W, Zhou T, Ma J, Sun X, Lu Z (2004) Analysis of synonymous codon usage in SARS Coronavirus and other viruses in the Nidovirales. Virus Res 101: 155-161. doi:10.1016/j.virusres.2004.01.006. PubMed: 15041183.
49. Zhang Z, Wang Y, Fan H, Lu C (2012) Natural infection with torque teno sus virus 1 (TTSuV1) suppresses the immune response to porcine reproductive and respiratory syndrome virus (PRRSV) vaccination. Arch Virol 157: 927-933. doi:10.1007/s00705-012-1249-3. PubMed: 22327391.
50. Namouchi A, Didelot X, Schöck U, Gicquel B, Rocha EP (2012) After the bottleneck: Genome-wide diversification of the Mycobacterium tuberculosis complex by mutation, recombination, and natural selection. Genome Res 22: 721-734. doi:10.1101/gr.129544.111. PubMed: 22377718.