Larger, improved reference libraries

We’re happy to announce that we’ve just released larger, improved microbial reference libraries on One Codex.

More data

The RefSeq Database now includes over 7,000 reference and representative genomes from NCBI, while the One Codex Database holds nearly 34,000 different bacterial, viral, fungal, and archaeal genomes. These are increases of more than 45% and 20%, respectively, over the last releases of the RefSeq and One Codex databases. As with all of our data releases, you should continue to see improvements in the specificity of your analyses by using our more comprehensive reference libraries.

Detailed information about the new reference data is available here.

Better data

Beyond the larger volume of reference data, we’ve also made a number of improvements in other areas with this release:

  • Better handling of adapter sequences, especially those used with Ion Torrent and PacBio platforms
  • Improved adapter masking for samples that have not been properly trimmed
  • Continued improvements to and cleanup of reference data (e.g., enhanced contaminant screening)

Smarter systems

Finally, we’ve automatically re-run all prior analyses against our updated reference libraries. We think there’s a lot of value in having always-up-to-date analyses that draw on the latest, highest-quality reference data, and we hope you find this feature useful too.

Of course, all prior results remain available via the dropdown in the top-right of our application.

Please drop us a note or reach out to us on Twitter if you have any questions or if we can ever help with your microbial genomics work. Thanks!
