
Abstract

In the management of infectious disease outbreaks, grouping cases into clusters and understanding their underlying epidemiology are fundamental tasks. In genomic epidemiology, clusters are typically identified either using pathogen sequences alone or with sequences in combination with epidemiological data such as location and time of collection. However, it may not be feasible to culture and sequence all pathogen isolates, so sequence data may not be available for all cases. This presents challenges for identifying clusters and understanding epidemiology, because these cases may be important for transmission. Demographic, clinical and location data are likely to be available for unsequenced cases, and comprise partial information about their clustering. Here, we use statistical modelling to assign unsequenced cases to clusters already identified by genomic methods, assuming that a more direct method of linking individuals, such as contact tracing, is not available. We build our model on pairwise similarity between cases to predict whether cases cluster together, in contrast to using individual case data to predict the cases’ clusters. We then develop methods that allow us to determine whether a pair of unsequenced cases are likely to cluster together, to group them into their most probable clusters, to identify which are most likely to be members of a specific (known) cluster, and to estimate the true size of a known cluster given a set of unsequenced cases. We apply our method to tuberculosis data from Valencia, Spain. Among other applications, we find that clustering can be predicted successfully using spatial distance between cases and whether nationality is the same. We can identify the correct cluster for an unsequenced case, among 38 possible clusters, with an accuracy of approximately 35 %, higher than both direct multinomial regression (17 %) and random selection (< 5 %).
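The pairwise idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy cases, the planar coordinates in km, the plain gradient-descent logistic regression, and the rule of scoring each known cluster by the mean predicted co-clustering probability are all assumptions made for illustration. Only the two covariates (spatial distance between cases and whether nationality is the same) are taken from the abstract.

```python
import itertools
import numpy as np

# Hypothetical toy data, NOT from the Valencia study:
# (x_km, y_km, nationality, genomic cluster) for sequenced cases.
sequenced = [
    (0.0, 0.0, "ES", "A"), (1.0, 0.0, "ES", "A"),
    (0.0, 1.0, "ES", "A"), (1.0, 1.0, "ES", "A"),
    (10.0, 10.0, "RO", "B"), (11.0, 10.0, "RO", "B"),
    (10.0, 11.0, "RO", "B"), (11.0, 11.0, "RO", "B"),
    (0.0, 10.0, "ES", "C"), (1.0, 10.0, "ES", "C"), (0.0, 11.0, "ES", "C"),
]

def pair_features(a, b):
    """Pairwise covariates: intercept, distance (tens of km), same-nationality flag."""
    dist_km = np.hypot(a[0] - b[0], a[1] - b[1])
    return np.array([1.0, dist_km / 10.0, float(a[2] == b[2])])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression (assumed fitting method)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

# Training set: every pair of sequenced cases, labelled 1 if they share a cluster.
pairs = list(itertools.combinations(sequenced, 2))
X = np.array([pair_features(a, b) for a, b in pairs])
y = np.array([float(a[3] == b[3]) for a, b in pairs])
w = fit_logistic(X, y)

def assign_cluster(case, w):
    """Score each known cluster by the mean predicted co-clustering probability
    between the unsequenced case and that cluster's members (assumed rule)."""
    probs = {}
    for member in sequenced:
        p = sigmoid(pair_features(case, member) @ w)
        probs.setdefault(member[3], []).append(p)
    means = {c: float(np.mean(ps)) for c, ps in probs.items()}
    return max(means, key=means.get), means

# An unsequenced case: location and nationality known, cluster unknown.
best, means = assign_cluster((0.5, 0.5, "ES"), w)
print(best, means)
```

The model is trained on pairs rather than on individual cases, so it does not need a separate parameter set per cluster; this is the contrast with direct multinomial regression drawn in the abstract. For scale, guessing uniformly at random among 38 clusters would be correct about 1/38 ≈ 2.6 % of the time, which is consistent with the "< 5 %" quoted for random selection, although the abstract does not say how that baseline was computed.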

Keyword(s): genomic clustering and TB cases
Funding
This study was supported by the:
  • European Commission – NextGenerationEU
    • Principal Award Recipient: Kurnia Susvitasari
  • Ministerio de Ciencia (Spanish Government)
    • Principal Award Recipient: Iñaki Comas
  • European Research Council
    • Principal Award Recipient: Iñaki Comas

This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.
/content/journal/mgen/10.1099/mgen.0.000929
2023-03-03
2024-05-03

Supplements

Supplementary material 1
