Dietary intakes of red meat and fat are established risk factors for both colorectal cancer (CRC) and cardiovascular diseases (CVDs). Recent studies have shown a mechanistic link between TMAO, an intestinal microbial metabolite of red meat and fat, and risk of CVDs. Data linking TMAO directly to CRC are, however, lacking. Here, we present an unbiased data-driven network-based systems approach to uncover a potential genetic relationship between TMAO and CRC.
We constructed two different epigenetic interaction networks (EINs) using chemical-gene, disease-gene and protein-protein interaction data from multiple large-scale data resources. We developed a network-based ranking algorithm to ascertain TMAO-related diseases from EINs. We systematically analyzed disease categories among TMAO-related diseases at different ranking cutoffs. We then determined which genetic pathways were associated with both TMAO and CRC.
We show that CVDs and their major risk factors were ranked highly among TMAO-related diseases, confirming the newly discovered mechanistic link between CVDs and TMAO, and thus validating our algorithms. CRC was ranked highly among TMAO-related diseases retrieved from both EINs (top 0.02%, #1 out of 4,732 diseases retrieved based on Mendelian genetics and top 10.9% among 882 diseases based on genome-wide association genetics), providing strong supporting evidence for our hypothesis that TMAO is genetically related to CRC. We have also identified putative genetic pathways that may link TMAO to CRC, which warrant further investigation. Through systematic disease enrichment analysis, we also demonstrated that TMAO is related to metabolic syndromes and cancers in general.
Our genome-wide analysis demonstrates that systems approaches to studying the epigenetic interactions among diet, microbiome metabolism, and disease genetics hold promise for understanding disease pathogenesis. Our results show that TMAO is genetically associated with CRC. This study suggests that TMAO may be an important intermediate marker linking dietary meat and fat and gut microbiota metabolism to risk of CRC, underscoring opportunities for the development of new gut microbiome-dependent diagnostic tests and therapeutics for CRC.
Colorectal cancer (CRC) is the second most common cancer in women (9.2%) and the third most common in men (10.0%). Diet clearly plays an important role in colon carcinogenesis. The Western diet, characterized by high fat and meat consumption, has been associated with increased risk of colorectal cancer in a large number of epidemiological studies [
The complex gut microbiota harbored by individuals have long been proposed to play an important role in colon carcinogenesis [
Recent studies have discovered that trimethylamine N-oxide (TMAO) generated by gut microbiota metabolism of dietary L-carnitine, a trimethylamine abundant in red meat, and dietary phosphatidylcholine is mechanistically linked to risk of cardiovascular diseases (CVDs) [
Whether TMAO plays a similar role in colon carcinogenesis has not been explored. Given the striking similarity of colorectal cancer and cardiovascular diseases in their risk association with dietary red meat and fat intake, we hypothesize that TMAO is an intermediate marker linking dietary red meat and fat and gut microbial metabolism to colorectal cancer. Here, we present a genome-wide systems approach to the discovery of the genetic links between CRC and TMAO by reasoning over vast amounts of disease-gene association, protein-protein interaction and chemical-gene association data from multiple databases using advanced network-based ranking algorithms.
The experimental framework consists of the following steps: (1) we constructed two different genetic disease networks (GDNs) using disease-gene and protein-protein interaction data from multiple large-scale data resources; (2) we modeled the epigenetic interactions between TMAO and diseases by transforming GDNs into epigenetic interaction networks (EINs); (3) we developed a network-based ranking algorithm to find TMAO-related diseases, that is, diseases that share a high degree of genetic similarity with TMAO, from EINs; (4) we validated recent findings that TMAO is associated with cardiovascular diseases; (5) we tested our hypothesis that TMAO might be genetically linked to CRC; (6) we systematically analyzed disease categories among TMAO-related diseases at different ranking cutoffs; and (7) we determined which genetic pathways were associated with both TMAO and CRC.
We constructed two separate GDNs using disease-gene association data from two complementary data resources. The first one is the Online Mendelian Inheritance in Man (OMIM), a comprehensive database of human genes and genetic phenotypes mainly for rare Mendelian genetic disorders [
The second source of disease genetics we utilized in constructing GDNs was the Catalog of Published Genome-Wide Association Studies from the US National Human Genome Research Institute (NHGRI), an exhaustive source containing the description of disease- and trait-associated single nucleotide polymorphisms (SNPs) from published GWAS data [
We modeled the epigenetic interactions between TMAO and diseases on both GDN_OMIM and GDN_GWAS by inserting TMAO into these two disease networks. We obtained human genes associated with TMAO from STITCH, a publicly available database of known and predicted interactions of chemicals and proteins [
Ten TMAO-associated human genes.
| Gene Symbol | Gene name |
|---|---|
| MBD2 | Methyl-CpG binding domain protein 2 |
| FMO3 | Flavin containing monooxygenase 3 |
| RORC | RAR-related orphan receptor C |
| SGCG | Sarcoglycan, gamma (35kDa dystrophin-associated glycoprotein) |
| PNKD | Paroxysmal nonkinesigenic dyskinesia |
| RNASE1 | Ribonuclease, RNase A family, 1 (pancreatic) |
| NKRF | NFKB repressing factor |
| PFKM | Phosphofructokinase, muscle |
| MOCS1 | Molybdenum cofactor synthesis 1 |
| TRA2A | Transformer 2 alpha homolog |
We first inserted a pseudo-node representing TMAO into GDNs. This node was then connected to disease nodes on GDNs if TMAO-associated genes interact with disease-associated genes. The edge weights were determined by the numbers of interacting genes between the newly inserted node (
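The pseudo-node insertion described above can be illustrated with a minimal sketch. It assumes that the weight of an edge between the pseudo-node and a disease is the number of (TMAO-associated gene, disease-associated gene) pairs found in the protein-protein interaction data; the exact weighting used in this study may differ. The function name `pseudo_node_edges`, the gene identifiers, and the interactions are hypothetical toy values.

```python
from collections import defaultdict


def pseudo_node_edges(chemical_genes, disease_genes, ppi_pairs):
    """Return {disease: weight} for edges leaving a chemical pseudo-node.

    Assumption: weight = number of (chemical gene, disease gene) pairs
    present in the undirected protein-protein interaction list.
    """
    # Store each interaction in both directions (undirected network).
    neighbors = defaultdict(set)
    for a, b in ppi_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    weights = {}
    for disease, genes in disease_genes.items():
        count = sum(len(neighbors[g] & genes) for g in chemical_genes)
        if count > 0:  # connect only diseases with at least one interaction
            weights[disease] = count
    return weights


# Toy input: all identifiers and interactions are made up for illustration.
chemical_genes = {"T1", "T2"}
disease_genes = {
    "disease_A": {"G1", "G2"},
    "disease_B": {"G3"},
    "disease_C": {"G4"},
}
ppi_pairs = [("T1", "G1"), ("T2", "G2"), ("G3", "T2"), ("G4", "G5")]

edges = pseudo_node_edges(chemical_genes, disease_genes, ppi_pairs)
print(edges)
```

In this toy network, `disease_C` receives no edge because none of its genes interacts with a gene of the pseudo-node.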
We then developed a network-based ranking algorithm to prioritize diseases on EINs based on their genetic commonalities with TMAO. We retargeted the Topic-Sensitive PageRank (TSPR) algorithm to rank similar diseases for a given input (TMAO in our study). TSPR is a context-sensitive ranking algorithm for web searches developed by Taher Haveliwala [
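To make the ranking step concrete, the following is a minimal sketch of a topic-sensitive (personalized) PageRank computed by power iteration on a weighted undirected network, with all teleportation mass placed on the seed node. The damping factor of 0.85, the convergence tolerance, the function name `personalized_pagerank`, and the toy network are assumptions for illustration and are not taken from this study.

```python
def personalized_pagerank(edges, seed, damping=0.85, tol=1e-10, max_iter=1000):
    """Score nodes of a weighted undirected graph relative to `seed`.

    edges: dict mapping (u, v) -> positive edge weight.
    Returns {node: score}; the scores sum to 1.
    """
    # Build symmetric weighted adjacency lists.
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    if seed not in adj:
        raise ValueError("seed node has no edges")
    total_weight = {n: sum(nbrs.values()) for n, nbrs in adj.items()}

    # Start with all probability mass on the seed.
    scores = {n: (1.0 if n == seed else 0.0) for n in adj}
    for _ in range(max_iter):
        new = {n: 0.0 for n in adj}
        # Random-walk step: spread each score along edges by relative weight.
        for u, nbrs in adj.items():
            for v, w in nbrs.items():
                new[v] += damping * scores[u] * w / total_weight[u]
        # Teleport step: the remaining mass always returns to the seed.
        new[seed] += 1.0 - damping
        delta = sum(abs(new[n] - scores[n]) for n in adj)
        scores = new
        if delta < tol:
            break
    return scores


# Toy network: a seed pseudo-node linked to hypothetical diseases.
toy_edges = {
    ("TMAO", "disease_A"): 3,
    ("TMAO", "disease_B"): 1,
    ("disease_A", "disease_C"): 1,
}
scores = personalized_pagerank(toy_edges, seed="TMAO")
ranked = sorted((n for n in scores if n != "TMAO"),
                key=lambda n: scores[n], reverse=True)
print(ranked)
```

Diseases are ranked by descending score after excluding the seed itself; a disease connected to the seed by a heavier edge, or through well-connected neighbors, rises in the ranking.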
Recent studies indicate that high levels of TMAO in the blood are associated with an increased risk of cardiovascular diseases [
In order to provide evidence supporting our hypothesis that TMAO may be involved in CRC pathogenesis, we tested whether CRC would rank highly among TMAO-related diseases retrieved from both EINs. High rankings of CRC would imply that TMAO and CRC share a high degree of genetic commonality and that TMAO might be associated with CRC carcinogenesis.
To better understand TMAO-related diseases, we determined the kinds of diseases that were enriched among top-ranked diseases retrieved from EINs. We classified diseases into different categories using the 10th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD10), a disease classification scheme designated by the World Health Organization (WHO) [
Sixteen disease chapters (classes) and numbers of diseases in each chapter.
| Disease class | Diseases (n) | Disease class | Diseases (n) |
|---|---|---|---|
| Certain infectious and parasitic diseases | 11,598 | Diseases of the circulatory system | 5,544 |
| Neoplasms | 14,158 | Diseases of the respiratory system | 3,156 |
| Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism | 3,264 | Diseases of the digestive system | 5,960 |
| Endocrine, nutritional and metabolic diseases | 5,438 | Diseases of the skin and subcutaneous tissue | 4,390 |
| Mental and behavioural disorders | 6,162 | Diseases of the musculoskeletal system and connective tissue | 11,520 |
| Diseases of the nervous system | 5,258 | Diseases of the genitourinary system | 5,247 |
| Diseases of the eye and adnexa | 3,735 | Congenital malformations, deformations and chromosomal abnormalities | 9,064 |
| Diseases of the ear and mastoid process | 1,815 | Certain conditions originating in the perinatal period | 3,454 |
Since EIN_OMIM contains 4,848 disease nodes whereas EIN_GWAS contains only 882 nodes, we performed disease class enrichment analysis on TMAO-related diseases retrieved from EIN_OMIM only. For diseases ranked at 10 different ranking cutoffs (top 10%, 20%, ..., 100%), we calculated the percentages of the sixteen ICD10 disease classes among them.
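The cutoff-based calculation described above can be sketched as follows. This is a minimal illustration that assumes each ranked disease carries exactly one class label (in practice a disease may map to several ICD10 chapters); the function name `class_percentages_at_cutoffs` and the toy ranking are hypothetical.

```python
from collections import Counter


def class_percentages_at_cutoffs(ranked_classes, n_cutoffs=10):
    """Percentage of each disease class among the top-ranked diseases.

    ranked_classes: class labels ordered from best-ranked to worst-ranked.
    Returns {cutoff_percent: {class_label: percentage}}.
    """
    total = len(ranked_classes)
    result = {}
    for i in range(1, n_cutoffs + 1):
        cutoff = 100 * i // n_cutoffs
        # Number of diseases in the top `cutoff` percent (rounded up, min 1).
        k = max(1, (total * i + n_cutoffs - 1) // n_cutoffs)
        counts = Counter(ranked_classes[:k])
        result[cutoff] = {c: 100.0 * n / k for c, n in counts.items()}
    return result


# Toy ranking of ten diseases, best rank first (labels are illustrative).
ranked_classes = ["Neoplasms"] * 3 + ["Circulatory"] * 2 + ["Other"] * 5
percentages = class_percentages_at_cutoffs(ranked_classes)
for cutoff in (10, 50, 100):
    print(cutoff, percentages[cutoff])
```

A class is enriched among top-ranked diseases when its percentage at the strict cutoffs exceeds its percentage at the 100% cutoff, which is its share of all retrieved diseases.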
In order to gain insights into common mechanistic relationships shared between TMAO and CRC, we identified and ranked genetic pathways linking them (Figure
Recent studies indicate that high levels of TMAO in the blood are associated with an increased risk of CVDs. Our results demonstrated that CVDs as well as their major risk factors, including high blood cholesterol and triglyceride, high blood pressure, diabetes, and obesity, were ranked highly among TMAO-related diseases retrieved from both EIN_OMIM and EIN_GWAS. We retrieved a total of 878 diseases/traits from EIN_GWAS, among which
Top 10 ranked cardiovascular diseases and their related risk factors.
| Diseases/traits based on GWAS genetics (n = 878) | Ranking | Diseases based on OMIM genetics (n = 4,732) | Ranking |
|---|---|---|---|
| Obesity-related traits | 0.11% | Myocardial infarction, susceptibility to | 0.23% |
| Coronary heart disease | 0.80% | Ventricular tachycardia | 0.25% |
| HDL cholesterol | 1.13% | Diabetes mellitus, noninsulin-dependent | 0.32% |
| Type 2 diabetes | 1.48% | Coronary artery disease, susceptibility to | 0.51% |
| LDL cholesterol | 1.82% | LDL cholesterol level qt | 0.66% |
| Total cholesterol | 1.94% | Hypercholesterolemia, familial | 0.68% |
| Triglycerides | 3.30% | Microvascular complications of diabetes | 0.69% |
| Lipid metabolism phenotypes | 3.53% | Atherosclerosis, susceptibility to | 0.78% |
| Metabolic syndrome | 3.64% | Obesity, susceptibility to | 1.88% |
| Cardiovascular disease risk factors | 4.55% | Diabetes mellitus, type 2, susceptibility | 3.14% |
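The percentage rankings in the table above appear consistent with dividing a disease's rank by the number of diseases retrieved. The short sketch below illustrates that arithmetic under this assumption; `rank_percent` is a hypothetical helper, and the two examples use the ranks of obesity-related traits (rank 1) and coronary heart disease (rank 7) among the 878 diseases/traits retrieved from EIN_GWAS.

```python
def rank_percent(rank, n_retrieved):
    """Express a 1-based rank as a percentage of all retrieved diseases."""
    if not 1 <= rank <= n_retrieved:
        raise ValueError("rank must lie between 1 and n_retrieved")
    return 100.0 * rank / n_retrieved


obesity = rank_percent(1, 878)   # obesity-related traits, rank 1 of 878
coronary = rank_percent(7, 878)  # coronary heart disease, rank 7 of 878
print(f"{obesity:.2f}%")   # prints 0.11%
print(f"{coronary:.2f}%")  # prints 0.80%
```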
We retrieved a total of 4,732 diseases from EIN_OMIM using TMAO as input. Similar to results based on EIN_GWAS, CVDs and their major risk factors, including
Table
Top ten TMAO-related diseases/traits retrieved from EIN_OMIM and from EIN_GWAS.
| Rank | Diseases/traits from EIN_GWAS | Diseases from EIN_OMIM |
|---|---|---|
| 1 | Obesity-related traits | Colorectal cancer |
| 2 | Height | Breast cancer, somatic |
| 3 | IgG glycosylation | Gastric cancer, somatic |
| 4 | Metabolite levels | Ovarian cancer, somatic |
| 5 | Inflammatory bowel disease | Schizophrenia, susceptibility to |
| 6 | Multiple sclerosis | Asthma, susceptibility to |
| 7 | Coronary heart disease | Leukemia, acute myeloid |
| 8 | Crohn's disease | Bladder cancer, somatic |
| 9 | Metabolic traits | Malaria, cerebral, susceptibility to |
| 10 | HDL cholesterol | Thyroid carcinoma, follicular, somatic |
Strikingly, among the top ten TMAO-related diseases retrieved from EIN_OMIM, seven are cancers, including CRC, breast cancer, gastric cancer and leukemia. Because the disease-gene associations in the large OMIM database are strong (causal), the observed strong relationship between TMAO and cancers implies that TMAO might be genetically involved not only in CRC but also in cancers in general, which we examine further in the next section.
We examined the distributions of sixteen disease classes among 4,732 TMAO-related diseases retrieved from EIN_OMIM at 10 different ranking cutoffs (top 10%, 20%, ..., 100%). Among the sixteen disease classes, only two disease classes were enriched among top-ranked TMAO-related diseases:
In the preceding sections, we demonstrated that CRC was highly related to TMAO. We next investigated common genetic pathways that are involved in both TMAO and CRC. The 54 TMAO-associated human genes are involved in a total of 170 pathways. The 53 CRC genes based on OMIM genetics are involved in 503 pathways and the 65 CRC genes based on GWAS studies are associated with 182 pathways. Although no specific genes are shared between TMAO and CRC, many common genetic pathways are associated with both: 52 common pathways between TMAO and CRC based on OMIM genes and 39 common pathways based on GWAS genetics (Table
Numbers of shared genes and pathways between TMAO and CRC.
| | Genes (n) | Pathways (n) |
|---|---|---|
| TMAO | 54 | 170 |
| CRC (OMIM) | 53 | 503 |
| CRC (GWAS) | 65 | 182 |
| CRC (OMIM) ∩ CRC (GWAS) | 0 | 118 |
| TMAO ∩ CRC (OMIM) | 0 | 52 |
| TMAO ∩ CRC (GWAS) | 0 | 39 |
| TMAO ∩ CRC (OMIM) ∩ CRC (GWAS) | 0 | 20 |
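The overlap counts in the table above are set intersections over genes and over the pathways those genes participate in. The following minimal sketch shows how two gene sets can share pathways without sharing any genes; the gene identifiers and the gene-to-pathway mapping are hypothetical toy values, and `pathways_of` is an illustrative helper rather than part of this study's pipeline.

```python
def pathways_of(genes, gene_to_pathways):
    """Union of the pathways that any gene in `genes` participates in."""
    found = set()
    for gene in genes:
        found |= gene_to_pathways.get(gene, set())
    return found


# Toy annotation: identifiers and memberships are made up for illustration.
gene_to_pathways = {
    "T1": {"Immune system", "Metabolism"},
    "C1": {"Pathways in cancer", "Immune system"},
    "C2": {"Immune system"},
}
tmao_genes = {"T1"}
crc_omim_genes = {"C1"}
crc_gwas_genes = {"C2"}

shared_genes = tmao_genes & crc_omim_genes
tmao_pathways = pathways_of(tmao_genes, gene_to_pathways)
omim_pathways = pathways_of(crc_omim_genes, gene_to_pathways)
gwas_pathways = pathways_of(crc_gwas_genes, gene_to_pathways)

common_omim = tmao_pathways & omim_pathways
common_all = tmao_pathways & omim_pathways & gwas_pathways
print(len(shared_genes), sorted(common_omim), sorted(common_all))
```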
Even though there is no overlap between the 53 CRC-associated genes identified from OMIM and the 65 CRC-associated genes identified from the GWAS catalog, these genes shared 118 pathways, which we used to identify genetic pathways linking CRC and TMAO. We found that TMAO shared 20 of these 118 CRC-related pathways with CRC. The top 10 ranked common pathways between TMAO and CRC (OMIM), TMAO and CRC (GWAS), and TMAO and CRC genes from both OMIM and GWAS are shown in Table
Top ten ranked genetic pathways shared between TMAO and CRC.
| TMAO ∩ CRC (OMIM) | TMAO ∩ CRC (GWAS) | TMAO ∩ CRC (OMIM) ∩ CRC (GWAS) |
|---|---|---|
| Pathways in cancer | Immune system | Immune system |
Recent studies have shown a mechanistic link between TMAO, gut microbial metabolism of dietary meat and fat, and risk of cardiovascular diseases (CVDs), and established an obligatory role of gut microbiota in the generation of the proatherosclerotic TMAO from dietary L-carnitine and phosphatidylcholine, abundant in red meat and dietary fat respectively [
High red meat and animal fat intakes have been well established as risk factors for both CVDs and colorectal cancer. The discovery of the TMAO-CVDs connection mediated by gut microbial metabolism provides evidence for a novel mechanism by which human gut microbiota may influence health and disease. Gut microbiota has long been postulated to modulate risk of CRC. Although increasing evidence shows gut microbial community differences in patients with and without colorectal neoplasia [
In this study, we present an unbiased data-driven network-based approach to uncover genetic links between TMAO and CRC by integrating and reasoning over vast amounts of disease genetics, protein interactions, and interactions of chemicals and proteins. Our approach is generic and can be readily retargeted to discover novel genetic links among any diseases and chemicals. Our genome-wide analysis demonstrates that systems approaches hold promise for discovering the genetic basis of disease. Our results show that TMAO is genetically associated with CRC. This study suggests that TMAO may be an important intermediate marker linking dietary meat and fat and gut microbiota metabolism to risk of CRC, underscoring opportunities for the development of new gut microbiome-dependent diagnostic tests and therapeutics for CRC.
The authors declare that they have no competing interests.
LL: initiated the hypothesis. RX and QW: jointly designed and implemented algorithms, and performed the experiments. RX, QW, and LL: wrote the paper.
We would like to thank the funding resources that have made this work possible. RX is funded by Case Western Reserve University/Cleveland Clinic CTSA Grant (UL1 RR024989), the Eunice Kennedy Shriver National Institute Of Child Health & Human Development of the National Institutes of Health under Award Number DP2HD084068, the Training grant in Computational Genomic Epidemiology of Cancer (CoGE) (R25 CA094186-06), and Grant #IRG-91-022-18 to the Case Comprehensive Cancer Center from the American Cancer Society. QW is partly funded by ThinTek LLC. LL is funded by National Cancer Institute U01CA181770 and R01CA136726.
Publication charges for this article have been funded by the Training grant in Computational Genomic Epidemiology of Cancer (CoGE) (R25 CA094186-06).
This article has been published as part of