# Sample Sizes in Microbiome Research

When you’re planning a new study, there are a lot of factors to consider. These can include:

- What population should I target?
- How should I collect and store the samples?
- What analyte do I want to study, and how should I extract it from the samples?
- What kind of technology should I use: PCR, arrays, or sequencing?
- What sequencing approach should I use: amplicon or whole-metagenome shotgun sequencing?
- How deeply should I sequence?

These are some common questions researchers ask, and your research goals will drive the decisions you make when setting up your study. But there is another, commonly underappreciated question that we at One Codex are often asked: how do I know if I have enough samples?

Many researchers work within tight budgets, and one way they keep costs down is by using small sample sizes. Maybe they have seen other studies with low sample numbers, or perhaps they get partway through a study and their budget runs low. By cutting your sample size, however, you risk being unable to detect the differences you want to study. Sample size calculations let you adjust the parameters of your study design to ensure you have enough samples to detect an effect of the size you care about.

Sample size calculations can be challenging, particularly in microbiome studies. Like the questions above, they depend on a number of factors. Outlined below are some of those considerations, to help you design studies that can detect the differences of interest.

## Sample Size Considerations

**Study design:** The number of groups you compare and the number of time points you sample both affect the sample size you need.

**Population variance:** The more variable the population, the larger the sample size you need to identify significant differences. Variance is usually much higher in studies of biological differences across a population of human microbiome samples than in, say, studies testing differences in experimental procedures across many replicates of the same sample. This also ties in with your study design, since you may have to consider confounding factors such as age, sex, diet, or medications, all of which are known to affect the microbiome and each of which can increase the variance observed in the population. Unfortunately, high variability is often overlooked at both the planning and analysis phases. Keep these factors in mind when setting selection criteria, and plan to collect as much metadata as possible so you can minimize the impact of confounding factors.

**Effect size:** This is the expected magnitude of the difference between groups, often expressed relative to the variability in the population. It is usually estimated from a pilot study in your population. If a pilot study is not possible, some researchers base their effect size estimates on published studies of similar conditions and measures.
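A common way to express the effect size for a single metric is Cohen's d: the difference in group means divided by the pooled standard deviation. The sketch below computes it from pilot measurements. `cohens_d` is a hypothetical helper name, and Cohen's d is only one choice of standardized effect size; because microbiome metrics are often skewed, you may prefer an effect size that matches the test you plan to run.

```python
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation.

    group_a, group_b: per-sample values of one metric (for example, a
    diversity index) from a pilot study. Hypothetical helper for illustration.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two samples")
    # statistics.variance is the sample variance (n - 1 denominator).
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Pool the two variances, weighting each by its degrees of freedom.
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd
```

For example, `cohens_d([3, 4, 5], [1, 2, 3])` returns 2.0 (group means of 4 and 2, pooled standard deviation of 1). Pilot studies are small, so treat the estimate as a rough guide rather than a precise value.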

**Metrics of interest:** In microbiome studies, researchers tend to examine alpha diversity, beta diversity, or differences in the abundances of specific microbes, among other metrics. Each of these metrics can have a different effect size, and each is measured or tested with different statistical tests or models. Your sample size calculation should account for, or directly use, the test or model you plan to apply in your analysis. If you're measuring differences in species abundances, you may need a larger sample size to detect differences in lower-abundance microbes.
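One way to base a calculation on the test you will actually use is simulation: generate data under an assumed effect, run the test many times, and count how often it rejects the null hypothesis. The sketch below does this for a two-sided Wilcoxon rank-sum (Mann-Whitney U) test, which is often used to compare alpha diversity between two groups. It assumes, purely for illustration, that the metric is normally distributed with equal variance in both groups; `simulated_power` is a hypothetical helper, and a realistic simulation would draw from data resembling your own pilot samples.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def simulated_power(n_per_group, effect_size, alpha=0.05, n_sims=2000, seed=0):
    """Estimate the power of a two-sided Mann-Whitney U test by simulation.

    Illustrative assumption: the metric (e.g. an alpha diversity index) is
    normally distributed with SD 1 in both groups, and the group means differ
    by `effect_size` standard deviations.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(effect_size, 1.0, n_per_group)
        p_value = mannwhitneyu(control, treated, alternative="two-sided").pvalue
        if p_value < alpha:
            rejections += 1
    # Fraction of simulated studies in which the test detected the effect.
    return rejections / n_sims
```

You can evaluate it over candidate sizes, for example `[simulated_power(n, 0.5) for n in (20, 40, 60, 80)]`, and choose the smallest n that reaches your target power. The same pattern extends to other metrics, such as PERMANOVA on beta diversity distances, by swapping in a different data simulation and test.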

**Alpha or type-I error:** This is the probability of identifying a difference that is not really there (a false positive), and it is the significance threshold against which p-values are compared. It is commonly set at 5%, but you may choose to increase or decrease it, depending on how concerned you are about false positives.

**Power:** This is the probability of detecting a true effect when it is present (1 - type-II error, i.e. 1 minus the probability of a false negative). It is commonly set at 80%, but again, you may choose to be more strict or more lenient. A power of 80% means there is a 20% chance that you would miss a true effect of the size you assumed.
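Alpha, power, and effect size combine directly in the classic normal-approximation formula for comparing two group means with equal group sizes. The sketch below assumes a two-sided, two-sample comparison of a roughly normal metric; `n_per_group` is a hypothetical name, and the formula does not cover compositional, zero-inflated abundance data or distance-based tests common in microbiome work.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for comparing two group means.

    Normal approximation for a two-sided, two-sample test with equal groups:
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    where d is the standardized effect size (difference in means / SD).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the type-I error rate
    z_power = z.inv_cdf(power)          # quantile for 1 - type-II error
    # Round up: a fractional sample is not possible.
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)
```

With d = 0.5, alpha = 0.05, and 80% power, `n_per_group(0.5)` returns 63 per group (an exact t-test calculation gives 64). Raising power to 90% gives 85, and halving the effect size to 0.25 quadruples the requirement to 252. Because d is the difference in means divided by the standard deviation, a more variable population lowers d and raises the required sample size in the same way.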

There are a number of freely available tools to help researchers estimate the sample sizes they need for their research questions. Some are more comprehensive than others, but the best tools let you account for each of the points above, including options for the different statistical tests you may need to use. We hope this gives you some helpful guidance for setting up your studies. What other challenges do you face in your microbiome research? One Codex would love to hear from you!

## Further reading

- Mattiello F, Verbist B, Faust K, Raes J, Shannon WD, Bijnens L, Thas O. A web application for sample size and power calculation in case-control microbiome studies. Bioinformatics. 2016 Jul 1;32(13):2038-40.
- Kelly BJ, Gross R, Bittinger K, Sherrill-Mix S, Lewis JD, Collman RG, Bushman FD, Li H. Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA. Bioinformatics. 2015 Aug 1;31(15):2461-8.
- Casals-Pascual C, González A, Vázquez-Baeza Y, Song SJ, Jiang L, Knight R. Microbial Diversity in Clinical Microbiome Studies: Sample Size and Statistical Power Considerations. Gastroenterology. 2020 May;158(6):1524-1528.