The 2023 One Codex Database Update

Genomes from Bacteria, Archaea, Viruses, Fungi, and other Eukaryotes are continuously submitted to public repositories, and the taxonomic placement of these organisms is constantly being refined. Without quality curation and regular updates, metagenomic databases fall behind: taxonomic names go out of date, and the gains in resolution that better taxonomies make possible are missed. Keeping pace with these changes while managing databases, genome curation, and taxonomies over time is time-consuming and computationally intensive.

At One Codex, we are dedicated to regularly updating, curating, and benchmarking the database behind our taxonomic classifications. Today, we are thrilled to announce our largest database update to date: it now encompasses over 148,000 high-quality reference genomes spanning the entire taxonomic tree (Fig. 1). The benefits of this broader taxonomic coverage are multiplied when the classifier is combined with analytical workflows for microbiome discovery, clinical metagenomics, and genomic epidemiology.

Figure 1. Distinct taxa in this year's and last year's editions of the One Codex reference database.

With the risk of infectious disease outbreaks increasing due to global and environmental changes, pathogen surveillance remains of utmost importance [1]. In combination with clinical diagnostic testing, untargeted metagenomic sequencing and analysis serve as valuable tools for detecting emerging pathogens and tracking pathogen evolution [2]. Hence, in this release, we have prioritized enhancements that support public health and clinical applications by expanding the coverage of human pathogens in our reference database. Specifically, we have increased the number of viruses in our database by over 220%. This new database is already being used to improve the output of the Twist Comprehensive Viral Panel analysis as well as our workflows for Genomic Epidemiology and Clinical Metagenomics.

