
Shiga toxin-producing Escherichia coli O157:H7 (STEC) is a globally dispersed zoonotic pathogen that causes severe gastroenteritis when transmitted from ruminants to humans through direct or indirect contact with animals, their environment or contaminated food. Symptoms vary in severity from mild diarrhoea to bloody diarrhoea, with more serious sequelae including hemolytic uremic syndrome (HUS), which can be fatal. Although there is compelling evidence that the Shiga toxin subtype is a key predictor of disease severity, differences in the virulence potential of strains with the same Shiga toxin profile are often observed. In this study, we employ machine learning algorithms to explore the relationship between the STEC genome and clinical outcome.
K-mer counts of variable length (9-100 base pairs) from 1148 isolates of STEC O157:H7, representing two years of routine surveillance in England, were matched to their respective clinical outcome data. A Random Forest classifier was developed and validated with the objective of inferring the clinical symptoms associated with a given STEC genome. Clinical outcomes were categorised as asymptomatic, diarrhoea, bloody diarrhoea and HUS. The model correctly classified 160 of 190 cases of bloody diarrhoea, 81 of 128 cases of diarrhoea and 7 of 12 cases of HUS, with an average ROC AUC score of 90%. K-mers deemed important for distinct classification were characterised, and matches related to Shiga toxin 2a phage integration and excision genes and to adhesion and transporter proteins were identified. This is consistent with virulence factors reported in the literature, supporting this approach to de novo pathogen characterisation.
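To make the feature representation and the reported counts more concrete, the following is a minimal sketch and not the study's pipeline. It assumes that k-mers are plain overlapping substrings of a genome sequence. The function names `count_kmers` and `kmer_matrix` and the toy sequences are hypothetical, introduced only for illustration. The second part derives per-class recall directly from the correct/total counts stated above.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in a DNA sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_matrix(sequences, k):
    """Build one row of k-mer counts per genome over a shared, sorted vocabulary."""
    counts = [count_kmers(s, k) for s in sequences]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[kmer] for kmer in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical toy sequences; real genomes are millions of bases long.
toy_genomes = ["ACGTACGTAC", "ACGTTTTTAC"]
vocabulary, rows = kmer_matrix(toy_genomes, k=9)

# Per-class recall from the counts reported in the abstract: (correct, total).
reported = {
    "bloody diarrhoea": (160, 190),
    "diarrhoea": (81, 128),
    "HUS": (7, 12),
}
recall = {label: correct / total for label, (correct, total) in reported.items()}

for label, value in recall.items():
    print(f"{label}: {value:.1%}")
```

On the reported counts this gives a recall of about 84.2% for bloody diarrhoea, 63.3% for diarrhoea and 58.3% for HUS. A count matrix of this shape is the kind of input a Random Forest classifier could be trained on, with one row per isolate and one column per k-mer.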