
Abstract

Pathogen genomics is a critical tool for public health surveillance, infection control, outbreak investigations and research. To be useful, pathogen genomics data must be interpreted using contextual data (metadata). Contextual data include sample metadata, laboratory methods, patient demographics, clinical outcomes and epidemiological information. However, variability in how contextual information is captured by different authorities and encoded in different databases poses challenges for data interpretation, integration and use/re-use. The DataHarmonizer is a template-driven spreadsheet application for harmonizing, validating and transforming genomics contextual data into submission-ready formats for public or private repositories. The tool’s web browser-based JavaScript environment enables validation, and its offline functionality and local installation increase data security. The DataHarmonizer was developed to address the data sharing needs that arose during the COVID-19 pandemic, and was used by members of the Canadian COVID Genomics Network (CanCOGeN) to harmonize SARS-CoV-2 contextual data for national surveillance and for public repository submission. To support coordination of international surveillance efforts, we have partnered with the Public Health Alliance for Genomic Epidemiology to also provide a template conforming to its SARS-CoV-2 contextual data specification for use worldwide. Templates are also being developed for One Health and foodborne pathogens. Overall, the DataHarmonizer tool improves the effectiveness and fidelity of contextual data capture as well as its subsequent usability. Harmonization of contextual information across authorities, platforms and systems globally improves interoperability and reusability of data for concerted public health and research initiatives to fight the current pandemic and future public health emergencies.
Although the DataHarmonizer was initially developed for the COVID-19 pandemic, its expansion to other data management applications and pathogens is already underway.

Funding
This study was supported by:
  • Michael G. DeGroote Institute for Infectious Disease Research, McMaster University
    • Principal Award Recipient: Emily M Panousis
  • Michael Smith Foundation for Health Research (Award 18275)
    • Principal Award Recipient: William W.L. Hsiao
  • Canadian Institutes of Health Research (Award PJT-159456)
    • Principal Award Recipient: William W.L. Hsiao
  • Genome British Columbia (Award 286GET)
    • Principal Award Recipient: William W.L. Hsiao
  • Genome Canada (Award E09CMA)
    • Principal Award Recipient: William W.L. Hsiao
This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.
/content/journal/mgen/10.1099/mgen.0.000908
2023-01-23
2024-07-16