A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing

  • Hyeonwoo Kim (Department of Bioinformatics, Soongsil University)
  • Jiwon Kim (Department of Bioinformatics, Soongsil University)
  • Ji Won Cho (Department of Biological Sciences, Sungkyunkwan University)
  • Kwang-Sung Ahn (Functional Genome Institute, PDXen Biosystems, Co.)
  • Dong-Il Park (Division of Gastroenterology, Department of Internal Medicine and Inflammatory Bowel Disease Center, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine)
  • Sangsoo Kim (Department of Bioinformatics, Soongsil University)
  • Received : 2023.06.08
  • Accepted : 2023.06.25
  • Published : 2023.09.30

Abstract

Microbial community profiling using 16S rRNA amplicon sequencing allows for taxonomic characterization of diverse microorganisms. While amplicon sequence variant (ASV) methods are increasingly favored for their fine-grained resolution of sequence variants, they often discard substantial portions of sequencing reads during quality control, particularly in datasets with large numbers of samples. We present a streamlined pipeline that integrates fastp for read trimming, HmmUFOtu for operational taxonomic unit (OTU) clustering, VSEARCH for chimera checking, and Kraken2 for taxonomic assignment. To assess the pipeline's performance, we reprocessed two published stool datasets of normal Korean populations: one with 890 and the other with 1,462 independent samples. In the first dataset, HmmUFOtu retained 93.2% of over 104 million read pairs after quality trimming and removal of chimeric or unclassifiable reads, while DADA2, a commonly used ASV method, retained only 44.6% of the reads. Nonetheless, both methods yielded qualitatively similar β-diversity plots. For the second dataset, HmmUFOtu retained 89.2% of read pairs, while DADA2 retained only 18.4%. Because HmmUFOtu is a closed-reference clustering method, it facilitates merging separately processed datasets: the OTUs shared between the two datasets showed a correlation coefficient of 0.92 in total abundance (log scale). While the first two dimensions of the β-diversity plot showed a cohesive mixture of the two datasets, the third dimension revealed a batch effect. Our comparative evaluation of ASV and OTU methods within this streamlined pipeline provides insights into their performance when processing large-scale microbial 16S rRNA amplicon sequencing data, and highlights the strengths of HmmUFOtu and its potential for dataset merging.
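The two summary statistics quoted in the abstract, the percentage of read pairs retained and the log-scale correlation of total abundance for OTUs shared between datasets, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy OTU tables, the use of Pearson correlation, and the choice of log10 are all assumptions made for illustration, since the abstract states only "correlation coefficient" and "log scale".

```python
import math


def retention_percent(reads_in, reads_kept):
    """Percentage of input read pairs that survive processing."""
    return 100.0 * reads_kept / reads_in


def log_abundance_correlation(totals_a, totals_b):
    """Pearson correlation of log10 total abundance over shared OTUs.

    totals_a, totals_b: dicts mapping OTU id -> total read count in a dataset.
    Only OTUs present with a positive count in both datasets are compared
    (an assumption; the abstract does not say how zeros were handled).
    """
    shared = sorted(
        otu for otu in set(totals_a) & set(totals_b)
        if totals_a[otu] > 0 and totals_b[otu] > 0
    )
    if len(shared) < 2:
        raise ValueError("need at least two shared OTUs")
    xs = [math.log10(totals_a[otu]) for otu in shared]
    ys = [math.log10(totals_b[otu]) for otu in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("abundances are constant; correlation undefined")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy tables, not data from the study.
dataset_a = {"otu1": 10, "otu2": 100, "otu3": 1000, "otu4": 5}
dataset_b = {"otu1": 100, "otu2": 1000, "otu3": 10000, "otu5": 7}

r = log_abundance_correlation(dataset_a, dataset_b)
kept = retention_percent(200, 50)
```

Because a closed-reference method assigns reads to a fixed set of reference OTUs, the OTU identifiers are directly comparable across separately processed datasets, which is what makes the set intersection in this sketch meaningful.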

Acknowledgement

This research was supported in part by the Soongsil University Research Fund. The computational resources were kindly provided by the Korea Institute of Science and Technology Information (GSDC & KREONET). This research was also supported in part by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI23C0661), and by a National Research Foundation (NRF) grant funded by the Korea government (NRF-2020R1A2B5B02002259).
