Cost effective, experimentally robust differential-expression analysis for human/mammalian, pathogen and dual-species transcriptomics Open Access

Abstract

As sequencing read length has increased, researchers have quickly adopted longer reads for their experiments. Here, we examine 14 pathogen or host–pathogen differential gene expression data sets to assess whether using longer reads is warranted. A variety of data sets was used to assess what genomic attributes might affect the outcome of differential gene expression analysis including: gene density, operons, gene length, number of introns/exons and intron length. No genome attribute was found to influence the data in principal components analysis, hierarchical clustering with bootstrap support, or regression analyses of pairwise comparisons that were undertaken on the same reads, looking at all combinations of paired and unpaired reads trimmed to 36, 54, 72 and 101 bp. Read pairing had the greatest effect when there was little variation in the samples from different conditions or in their replicates (e.g. little differential gene expression). But overall, 54 and 72 bp reads were typically most similar. Given differences in costs and mapping percentages, we recommend 54 bp reads for organisms with no or few introns and 72 bp reads for all others. In a third of the data sets, read pairing had absolutely no effect, despite paired reads having twice as much data. Therefore, single-end reads seem robust for differential-expression analyses, but in eukaryotes paired-end reads are likely desired to analyse splice variants and should be preferred for data sets that are acquired with the intent to be community resources that might be used in secondary data analyses.
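The comparison design described in the abstract (the same reads trimmed to four lengths, each analysed as paired or unpaired) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and trimming by keeping the first N bases of each read is an assumption, since the abstract says only that reads were "trimmed".

```python
from __future__ import annotations

from itertools import combinations, product

READ_LENGTHS = (36, 54, 72, 101)  # trim lengths (bp) named in the abstract
LAYOUTS = ("single", "paired")    # unpaired vs paired reads


def trim_read(seq: str, qual: str, length: int) -> tuple[str, str]:
    """Keep the first `length` bases of a read and the matching qualities.

    Assumption: trimming removes bases from the 3' end of the read.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must be the same length")
    return seq[:length], qual[:length]


def configurations() -> list[tuple[int, str]]:
    """All read-length/layout combinations derived from the same reads."""
    return list(product(READ_LENGTHS, LAYOUTS))


def pairwise_comparisons() -> list[tuple[tuple[int, str], tuple[int, str]]]:
    """Every unordered pair of configurations that could be compared."""
    return list(combinations(configurations(), 2))


if __name__ == "__main__":
    # Hypothetical 101-base read with a uniform quality string.
    seq = ("ACGT" * 26)[:101]
    qual = "I" * 101
    for length, layout in configurations():
        trimmed_seq, _ = trim_read(seq, qual, length)
        print(f"{layout:>6} {length:>3} bp -> {len(trimmed_seq)} bases kept")
    print(f"{len(pairwise_comparisons())} pairwise comparisons")
```

Deriving every configuration from the same reads means that differences between configurations come from read length and pairing rather than from library or run-to-run variation.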

Funding
This study was supported by the:
  • National Cancer Institute (Award R01CA206188)
    • Principal Award Recipient: Julie C Dunning Hotopp
  • National Institutes of Health (Award R01DE022600)
    • Principal Award Recipient: Scott G Filler
  • National Institute of Allergy and Infectious Diseases (Award R01AI124566)
    • Principal Award Recipient: Scott G Filler
  • National Institute of Allergy and Infectious Diseases (Award U19AI110820)
    • Principal Award Recipient: David A Rasko

/content/journal/mgen/10.1099/mgen.0.000320
2019-12-18
2024-03-28

Supplements

Supplementary material 1 (PDF)

Supplementary material 2 (Excel)
