
Abstract

This study addresses the challenge of using human gut microbiome data for the early detection of colorectal cancer (CRC). It emphasizes the potential of machine learning techniques to analyse complex microbiome datasets, offering a non-invasive approach to identifying CRC-related microbial markers.

The primary hypothesis is that a robust machine learning-based analysis of 16S rRNA microbiome data can identify specific microbial features that serve as effective biomarkers for CRC detection, overcoming the limitations of classical statistical models in high-dimensional settings.

The primary objective of this study is to explore and validate the potential of the human microbiome, specifically in the colon, as a source of biomarkers for CRC detection and progression. The focus is on developing a classifier that distinguishes CRC from normal samples, based on the analysis of three previously published faecal 16S rRNA sequencing datasets.

To achieve this aim, several machine learning techniques are employed: random forest (RF), recursive feature elimination (RFE) and a robust correlation-based technique known as fuzzy forest (FF). These methods are applied to the three datasets and compared on their performance in classifying CRC and normal samples. The emphasis is on identifying the microbial features (taxa) most relevant to CRC development and interpreting them with partial dependence plots, an explainability tool that visualizes how a feature influences the predicted outcome.
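
The RF, RFE and partial dependence steps named above can be illustrated with a minimal Python sketch. It assumes NumPy and scikit-learn and runs on a synthetic, hypothetical relative-abundance table in which only taxon 0 carries the class signal; it is not the authors' pipeline, and the number of trees, the RFE step size and the `partial_dependence` helper are illustrative choices. Partial dependence is computed by hand: each value on a grid is imposed on every sample for one taxon, and the predicted probability of the CRC class is averaged.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Hypothetical stand-in for a samples x taxa relative-abundance table (synthetic, not study data)
rng = np.random.default_rng(0)
n_samples, n_taxa = 200, 30
X = rng.dirichlet(np.ones(n_taxa), size=n_samples)
# Hypothetical label: "CRC" (1) when taxon 0 is above its median abundance, plus small noise
y = (X[:, 0] + 0.005 * rng.standard_normal(n_samples) > np.median(X[:, 0])).astype(int)

# RFE: repeatedly fit an RF and drop the 20% least important taxa (impurity importance)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
selector = RFE(rf, n_features_to_select=5, step=0.2).fit(X, y)
selected = np.flatnonzero(selector.support_)

# Refit an RF on the retained taxa only
final_rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:, selected], y)


def partial_dependence(model, X_sub, col, grid):
    """Mean predicted P(class 1) when column `col` is set to each grid value for all samples."""
    curve = []
    for value in grid:
        X_mod = X_sub.copy()
        X_mod[:, col] = value
        curve.append(model.predict_proba(X_mod)[:, 1].mean())
    return np.array(curve)


# Partial dependence of the prediction on taxon 0, if RFE retained it
if 0 in selected:
    col = int(np.flatnonzero(selected == 0)[0])
    grid = np.quantile(X[:, 0], np.linspace(0.05, 0.95, 10))
    pd_curve = partial_dependence(final_rf, X[:, selected], col, grid)
    print(np.round(pd_curve, 2))
```

Plotting `pd_curve` against `grid` gives the partial dependence plot for that taxon; scikit-learn also provides `sklearn.inspection.PartialDependenceDisplay` for the same purpose.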

Across the three faecal 16S rRNA sequencing datasets, FF consistently achieves better predictive performance than RF and RFE. Notably, FF effectively addresses the problem of correlated features when assessing the importance of microbial taxa in explaining CRC development. The results highlight the potential of the human microbiome as a non-invasive means of detecting CRC and underscore the value of employing FF for improved predictive accuracy.
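
The way FF handles correlated features, as described by Conn et al. (2019), is to split features into modules of correlated variables, run RF-based RFE within each module (screening step) so that correlated taxa compete only with one another, and then run RFE again on the pooled survivors (selection step). The Python sketch below approximates this under stated assumptions: the reference fuzzy forest implementation is an R package, whereas here hierarchical clustering on 1 - |Pearson correlation| is swapped in for its module-detection step, scikit-learn's impurity-based importances are used for brevity, and the helper names (`make_modules`, `rfe_rf`, `fuzzy_forest_sketch`), parameter values and synthetic data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE


def make_modules(X, n_modules):
    """Hypothetical module step: average-linkage clustering on 1 - |correlation|."""
    dist = np.clip(1.0 - np.abs(np.corrcoef(X, rowvar=False)), 0.0, None)
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_modules, criterion="maxclust")


def rfe_rf(X, y, features, keep, seed=0):
    """RF-based recursive feature elimination on a feature subset; returns kept indices."""
    features = np.asarray(features)
    if len(features) <= keep:
        return features
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    sel = RFE(rf, n_features_to_select=keep, step=1).fit(X[:, features], y)
    return features[sel.support_]


def fuzzy_forest_sketch(X, y, n_modules=4, keep_per_module=2, n_final=3):
    modules = make_modules(X, n_modules)
    # Screening step: RFE within each module, so correlated features compete only with each other
    survivors = np.concatenate([
        rfe_rf(X, y, np.flatnonzero(modules == m), keep_per_module)
        for m in np.unique(modules)
    ])
    # Selection step: RFE on the pooled survivors from all modules
    return np.sort(rfe_rf(X, y, survivors, n_final))


# Synthetic data: 4 blocks of 5 strongly correlated features; only block 0 drives the label
rng = np.random.default_rng(1)
n, n_blocks, block = 300, 4, 5
latent = rng.standard_normal((n, n_blocks))
X = np.repeat(latent, block, axis=1) + 0.3 * rng.standard_normal((n, n_blocks * block))
y = (latent[:, 0] > 0).astype(int)
selected = fuzzy_forest_sketch(X, y)
print(selected)
```

Because screening happens inside modules, a block of near-duplicate taxa cannot dilute the importance of every one of its members at once, which is the correlation problem that FF is meant to mitigate.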

In conclusion, this study highlights the limitations of classical statistical techniques in handling high-dimensional data such as human microbiome profiles. It demonstrates the potential of the human microbiome, specifically in the colon, as a valuable source of biomarkers for CRC detection. Machine learning techniques, particularly FF, offer a promising approach for building a classifier that distinguishes CRC from normal samples. The findings support the use of FF to overcome the challenges posed by correlated features when identifying key microbial features linked to CRC development.

Funding
This study was supported by:
  • Spanish National Plan for Scientific and Technical Research and Innovation
    • Principal Award Recipient: Laura Judith Marcos Zambrano
  • Fundação para a Ciência e a Tecnologia, I.P., with references UIDB/00297/2020 and UIDP/00297/2020 (NOVA Math), UIDB/00667/2020 and UIDP/00667/2020 (UNIDEMI), and CEECINST/00042/2021.
    • Principal Award Recipient: Marta B. Lopes

DOI: 10.1099/jmm.0.001903
2024-10-08
2024-11-10


Supplements

Supplementary material 1 (PDF)