Five Factors to Consider When Designing a Microbiome Study

Microbiome studies are complex and require up-front planning to ensure you get the data you need to answer your research questions. Poor study design can cause you to miss real differences between cohorts or, worse, to falsely identify signals that are unrelated to your research questions. Here, we highlight five factors to consider when designing a microbiome study so that you get the most out of it.

Decide on a method

Amplicon sequencing of the 16S or 18S rRNA gene has long been the gold-standard approach for microbiome analysis. It typically resolves microbial communities to the genus level and sometimes to the species level. Shotgun metagenomics, which uses whole genome shotgun sequencing (WGS) to characterize all of the DNA isolated from a particular environment, is beginning to eclipse amplicon sequencing as the standard tool for microbiome analysis. Shotgun metagenomics enables species- and strain-level resolution, and it also enables analysis of the functional capabilities of these organisms, greatly expanding the information available compared to amplicon sequencing. However, amplicon sequencing remains the tool of choice for certain kinds of studies, such as those with very high levels of host DNA or very low biomass.
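To see why high host DNA content favors amplicon sequencing, it helps to run the numbers. Shotgun sequencing reads all DNA in the library, so the host share of the library directly reduces the reads left for the microbiome, whereas amplicon primers target microbial marker genes. The Python sketch below assumes reads are drawn in proportion to each source's share of the DNA (a simplification), and the 5 million microbial-read target is a hypothetical figure chosen for illustration, not a recommendation.

```python
def microbial_reads(total_reads: float, host_fraction: float) -> float:
    """Expected number of non-host reads, assuming reads are sampled
    in proportion to each source's share of the DNA (simplification)."""
    return total_reads * (1 - host_fraction)


def total_reads_needed(target_microbial_reads: float, host_fraction: float) -> float:
    """Total sequencing depth needed so the non-host share reaches the target."""
    if not 0 <= host_fraction < 1:
        raise ValueError("host_fraction must be in [0, 1)")
    return target_microbial_reads / (1 - host_fraction)


# Hypothetical target of 5 million microbial reads per sample.
for host in (0.10, 0.90, 0.99):
    print(f"{host:.0%} host DNA: {total_reads_needed(5e6, host):,.0f} total reads")
```

Under these assumptions, a sample that is 90% host DNA needs about 50 million total reads to yield 5 million microbial reads, and a sample that is 99% host DNA needs about 500 million, which is often impractical.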

As the microbiome field continues to grow and evolve, new analytical methods are still emerging. One of these is metatranscriptomics, which characterizes the RNA in a sample and so provides a more direct measurement of which genes the organisms in that sample are expressing. Another is metabolomics, which looks at the small-molecule metabolites present in a sample, including those produced by the microbiome. Tools for sample collection and data analysis still lag behind for these newer technologies, but they are already producing new information about the impact of microbiomes on hosts and the environment. The rest of this article focuses on shotgun metagenomics, but these considerations often apply to other microbiome approaches as well.

Use a large enough sample size to accomplish your research goals

Before starting any microbiome study, an important decision is how many samples to include. Many factors play a role, including the number of groups and time points, the variability in the population, the expected effect size, and cost, among others. Because metagenomics studies often look for small effects in highly variable populations, choosing an appropriate sample size is critical. Once sample collection begins, it is equally critical to record relevant metadata known to affect the microbiota so that you can identify factors that may explain observed differences. For human or animal studies, this should include factors like age, sex, and diet; for environmental studies, geographic location (latitude and longitude), season, and weather should be recorded. Studies with too few samples risk failing to reach statistical significance even when true differences between groups exist, and studies without appropriate metadata risk reporting differences that are actually due to a confounding factor.
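A power calculation is a common starting point for choosing a sample size. The Python sketch below uses the standard normal-approximation formula for comparing the means of two groups, n per group = 2 (z_{1-alpha/2} + z_{power})^2 / d^2, where d is the standardized effect size (Cohen's d). It assumes a single, roughly normally distributed summary metric such as an alpha-diversity index. Microbiome data are often sparse, compositional, and non-normal, so simulation-based power analysis or methods designed for community-level tests may be more appropriate for a real study.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size for a two-sided, two-sample comparison of means
    (normal approximation): n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Large, medium, and small standardized effects
for d in (0.8, 0.5, 0.2):
    print(d, samples_per_group(d))  # 25, 63, and 393 samples per group
```

Because n scales with 1/d^2, halving the effect size roughly quadruples the number of samples needed per group, which is why studies looking for small effects in variable populations need many samples.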

Include controls throughout the process

Like any well-designed experiment, microbiome studies should include both positive and negative controls at multiple stages of sample collection and analysis. Mock negative-control samples that are handled as similarly to real samples as possible are invaluable. Surfaces, plastics, skin, pipette tips, and even collection devices like swabs or fecal collection tubes can carry microbial contamination, so mock samples are the ideal way to detect this kind of contamination and guide any changes needed to reduce it. Positive controls, such as previously analyzed samples or commercially available standards, are critical for determining whether a particular batch or assay is behaving as expected. ATCC has developed a variety of whole-cell and purified-DNA standards of known composition for both human microbiome and environmental microbiome research. Because the identity and ratio of the organisms in these standards are known, important sources of bias, such as incomplete lysis or GC bias, can be studied and reduced. Other kinds of controls, including assay-specific no-template controls (NTCs) and exogenous spike-ins, can help identify sample swaps and sample-to-sample contamination.
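Both kinds of controls come down to simple comparisons once the data are profiled. The Python sketch below compares an observed profile of a positive-control standard to its known composition (Bray-Curtis dissimilarity plus a per-taxon log2 ratio) and lists taxa in a real sample that also appear in negative controls. The taxon names and abundances are hypothetical, and detection in a negative control is only a first-pass flag, not proof of contamination: abundant sample taxa can themselves bleed into controls, and dedicated tools such as the decontam R package use statistical models for this.

```python
import math


def bray_curtis(observed: dict[str, float], expected: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    taxa = observed.keys() | expected.keys()
    diff = sum(abs(observed.get(t, 0.0) - expected.get(t, 0.0)) for t in taxa)
    total = sum(observed.values()) + sum(expected.values())
    return diff / total if total else 0.0


def log2_bias(observed: dict[str, float], expected: dict[str, float]) -> dict[str, float]:
    """Per-taxon log2(observed / expected) for a standard of known composition.
    Positive = over-represented, negative = under-represented, -inf = not detected."""
    return {
        t: math.log2(observed[t] / e) if observed.get(t, 0.0) > 0 else float("-inf")
        for t, e in expected.items()
    }


def contaminant_candidates(sample: dict[str, float],
                           negative_controls: list[dict[str, float]]) -> set[str]:
    """Taxa detected in a real sample that were also detected in any negative control."""
    in_controls = {t for c in negative_controls for t, v in c.items() if v > 0}
    return {t for t, v in sample.items() if v > 0} & in_controls


# Hypothetical even four-member standard and a skewed observed profile
expected = {"taxon_A": 0.25, "taxon_B": 0.25, "taxon_C": 0.25, "taxon_D": 0.25}
observed = {"taxon_A": 0.40, "taxon_B": 0.30, "taxon_C": 0.20, "taxon_D": 0.10}
print(round(bray_curtis(observed, expected), 3))  # 0.2
print(log2_bias(observed, expected))
print(contaminant_candidates({"taxon_A": 0.6, "taxon_X": 0.05}, [{"taxon_X": 0.9}]))
```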

Integrate the best sample collection and preparation methods

From sample collection to bioinformatics, metagenomics workflows have many possible sources of bias, and carefully choosing the right reagents and parameters can reduce bias significantly. First, samples should be collected and stored in a stabilization buffer or flash frozen immediately. A variety of commercially available stabilization buffers are designed for different sample types; a good stabilization buffer should prevent changes in the microbiome profile by penetrating the sample, preventing microbial growth, and stabilizing nucleic acids. DNA extraction is generally the most biased step in metagenomics workflows. Some organisms are difficult to lyse and often require both chemical and physical lysis. Even with physical lysis such as bead beating, beads that are too large or too small can add bias, while an overly aggressive extraction method can shear nucleic acids. DNA library preparation is another source of bias, because fragmentation, ligation, and amplification can each introduce GC-content and fragment-length bias.

Choose an analysis method

There are many pipelines and approaches for analyzing shotgun metagenomics data. Among a myriad of differences, two factors stand out: assembly and database choice. Some analysis pipelines start by assembling sequencing reads into contigs and grouping (binning) them into metagenome-assembled genomes (MAGs). The quality of these assemblies can vary greatly depending on sequencing depth, characteristics of the genomes themselves, read length, and read quality, among other factors. These assemblies are then assigned to taxa. Alternatively, assembly-free approaches skip the assembly step and assign reads directly to taxa. Assembly-based methods can help identify gene clusters and bins of contigs that likely originated from the same organism. However, it can be difficult to generate MAGs for low-abundance organisms, which can reduce the sensitivity of assembly-based methods for important but low-abundance members of the microbiome. The other important choice is the reference database, where there is generally a tradeoff between size and quality. For example, RefSeq contains high-quality genomes but has far fewer genomes than GenBank. On the other hand, GenBank genomes undergo minimal quality control, so they are more likely to be poorly assembled or contaminated, among other issues. Databases that are too small can cause poor classification because of gaps in their coverage, while databases that are too large can also cause misclassification because of errors or skew in their contents. Because no publicly available database was well suited to metagenomics, we developed the One Codex Database, a curated set of more than 115,000 genomes designed to balance size, quality, and species representation.
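To make the assembly-free idea concrete, the Python sketch below shows a toy k-mer classifier: it indexes every k-mer in a set of reference genomes, then assigns each read to the taxon with the most matching k-mers that are unique to that taxon. It is a simplified illustration, not the One Codex classification algorithm; real classifiers use much longer k-mers, far larger databases, and a taxonomy to resolve k-mers shared between taxa, whereas this sketch simply ignores shared k-mers and breaks ties arbitrarily. The genome names and sequences are hypothetical. It also shows how database gaps play out: a read whose organism is missing from the index stays unclassified.

```python
from collections import Counter
from typing import Optional


def build_kmer_index(genomes: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every k-mer to the set of taxa whose reference genome contains it."""
    index: dict[str, set[str]] = {}
    for taxon, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(taxon)
    return index


def classify_read(read: str, index: dict[str, set[str]], k: int) -> Optional[str]:
    """Assign a read to the taxon with the most taxon-unique k-mer hits.
    Returns None when no unique k-mer matches (unclassified)."""
    votes: Counter[str] = Counter()
    for i in range(len(read) - k + 1):
        taxa = index.get(read[i:i + k])
        if taxa and len(taxa) == 1:  # skip k-mers shared by several taxa
            votes[next(iter(taxa))] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy reference "genomes"
genomes = {"taxon_A": "AAAACCCCGGGG", "taxon_B": "TTTTGGGGCCCC"}
index = build_kmer_index(genomes, k=4)
for read in ("AAAACCCC", "TTTTGGGG", "ATATATAT"):
    print(read, classify_read(read, index, k=4))  # taxon_A, taxon_B, None
```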

We will explore each of these factors in more detail over the coming weeks, so check back or subscribe for updates. You can also find out more about designing microbiome experiments in these publications:

Further reading

  • Kim D, Hofstaedter CE, Zhao C, et al. Optimizing methods and dodging pitfalls in microbiome research. Microbiome. 2017;5(1):52.
  • Allaband C, McDonald D, Vázquez-Baeza Y, et al. Microbiome 101: Studying, Analyzing, and Interpreting Gut Microbiome Data for Clinicians. Clin Gastroenterol Hepatol. 2019;17(2):218-230.
  • Johnson AJ, Zheng JJ, Kang JW, Saboe A, Knights D, Zivkovic AM. A Guide to Diet-Microbiome Study Design. Front Nutr. 2020;7:79.
  • Carney SM, Clemente JC, Cox MJ, et al. Methods in Lung Microbiome Research. Am J Respir Cell Mol Biol. 2020;62(3):283-299.
  • Kong HH, Andersson B, Clavel T, et al. Performing Skin Microbiome Research: A Method to the Madness. J Invest Dermatol. 2017;137(3):561-568.