Shiga toxin-producing Escherichia coli (STEC) O157:H7 is a globally dispersed zoonotic pathogen. It causes severe gastroenteritis when transmitted from ruminants to humans through direct or indirect contact with animals, their environment or contaminated food. Symptoms vary in severity, from mild to bloody diarrhoea, and more serious sequelae include hemolytic uremic syndrome (HUS), which can be fatal. Although there is compelling evidence that the Shiga toxin subtype is a key predictor of disease severity, strains with the same Shiga toxin profile often differ in virulence potential. In this study, we use machine learning algorithms to explore the relationship between the STEC genome and clinical outcome.

K-mer counts of variable length (9 to 100 bp) from 1148 isolates of STEC O157:H7, representing two years of routine surveillance in England, were matched to the clinical outcome data for each isolate. A Random Forest classifier was developed and validated to infer the clinical symptoms associated with a given STEC genome. Clinical outcomes were categorised as asymptomatic, diarrhoea, bloody diarrhoea and HUS. The model correctly classified 160 of 190 cases of bloody diarrhoea, 81 of 128 cases of diarrhoea and 7 of 12 cases of HUS, with an average area under the receiver operating characteristic curve (AUC ROC) of 90%. K-mers deemed important for distinguishing between classes were characterised, and matches were identified to Shiga toxin 2a phage integration and excision genes and to adhesion and transporter proteins. These findings are consistent with virulence factors reported in the literature and support this approach to de novo pathogen characterisation.
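The pipeline summarised above starts from per-isolate k-mer counts. The following is a minimal sketch, in plain Python, of how an isolate-by-k-mer count matrix of the kind a Random Forest could consume might be built. It assumes a single fixed k, forward-strand counting only and whole sequences held in memory. These are simplifications: the study used variable lengths from 9 to 100 bp, and its actual tooling is not described here. The helper names `count_kmers`, `kmer_matrix` and `recall` are hypothetical. The last helper recomputes the per-class recall (sensitivity) implied by the reported counts, for example 160/190 for bloody diarrhoea.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer of length k (forward strand only, a simplifying assumption)."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_matrix(genomes: Dict[str, str], k: int) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build an isolate x k-mer count matrix (rows: sorted isolate names, columns: sorted k-mers)."""
    counts = {name: count_kmers(seq, k) for name, seq in genomes.items()}
    vocabulary = sorted(set().union(*counts.values()))  # every k-mer seen in any isolate
    isolates = sorted(counts)
    # A Counter returns 0 for k-mers absent from an isolate, so rows are dense.
    rows = [[counts[name][kmer] for kmer in vocabulary] for name in isolates]
    return isolates, vocabulary, rows


def recall(correct: int, total: int) -> float:
    """Fraction of true cases of a class that the classifier assigned to that class."""
    return correct / total


if __name__ == "__main__":
    # Toy sequences for illustration only, not real STEC genomes.
    isolates, kmers, X = kmer_matrix({"isolate_A": "ATGCGATG", "isolate_B": "ATGCATGC"}, k=3)
    print(kmers)  # ['ATG', 'CAT', 'CGA', 'GAT', 'GCA', 'GCG', 'TGC']
    print(X)      # [[2, 0, 1, 1, 0, 1, 1], [2, 1, 0, 0, 1, 0, 2]]

    # Counts taken from the reported results.
    reported = {"bloody diarrhoea": (160, 190), "diarrhoea": (81, 128), "HUS": (7, 12)}
    for label, (correct, total) in reported.items():
        print(f"{label}: {recall(correct, total):.3f}")  # 0.842, 0.633, 0.583
```

With realistic genome sizes and long k-mers the vocabulary grows very large, so a dedicated k-mer counter and sparse storage would be needed in practice. The sketch only illustrates the shape of the feature matrix.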

  • This is an open-access article distributed under the terms of the Creative Commons Attribution License.
