
Abstract

Carbohydrate-active enzymes (CAZymes) are pivotal in biological processes including energy metabolism, cell structure maintenance, signalling, and pathogen recognition. Bioinformatic prediction and mining of CAZymes improve our understanding of these activities and enable discovery of candidates of interest for industrial biotechnology, particularly the processing of organic waste for biofuel production. CAZy (www.cazy.org) is a high-quality, manually curated, and authoritative database of CAZymes that is often the starting point for these analyses. Automated querying and integration of CAZy data with other public datasets would constitute a powerful resource for mining and exploring CAZyme diversity. However, CAZy does not itself provide methods to automate queries or to integrate annotation data from other sources (except by following hyperlinks) to support further analysis. To overcome these limitations we developed cazy_webscraper, a command-line tool that retrieves data from CAZy and other online resources to build a local, shareable and reproducible database that augments and extends the authoritative CAZy database. cazy_webscraper’s integration of curated CAZyme annotations with their corresponding protein sequences, up-to-date taxonomy assignments, and protein structure data facilitates automated large-scale and targeted bioinformatic CAZyme family analysis and candidate screening. This tool has found widespread uptake in the community, with over 35 000 downloads (from April 2021 to June 2023). We demonstrate the use and application of cazy_webscraper to: (i) augment, update and correct CAZy database accessions; (ii) explore the taxonomic distribution of CAZymes recorded in CAZy, identifying under-represented taxa and unusual CAZy class distributions; and (iii) investigate three CAZymes having potential biotechnological application for degradation of biomass, but lacking a representative structure in the PDB database.
We describe in general how cazy_webscraper facilitates functional, structural and evolutionary studies to aid identification of candidate enzymes for further characterization, and specifically note that CAZy provides supporting evidence for recent expansion of the Auxiliary Activities (AA) CAZy class in eukaryotes, consistent with functions potentially specific to eukaryotic lifestyles.

Funding
This study was supported by the:
  • Biotechnology and Biological Sciences Research Council (Award EASTBIO Doctoral Training Partnership)
    • Principal Award Recipient: Emma Elizabeth Mary Hobbs

This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.
/content/journal/mgen/10.1099/mgen.0.001086
2023-08-14
2024-07-17
