  • Correspondence

Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy

Fig. 1: VGP–Galaxy assembly pipeline (version 2.1) consists of 10 workflows that can be combined into 8 analysis trajectories depending on the combination of input data.
Fig. 2: Phylogenetic tree and assembly statistics of genomes assembled using the VGP–Galaxy assembly pipeline.

Data availability

The workflows, their descriptions and instructions on how to use them can be found at https://galaxyproject.org/projects/vgp/workflows/. The requisite tools are installed on usegalaxy.org and usegalaxy.eu, and are being installed on usegalaxy.org.au. These genomes were supported by collaborators of the VGP and ERGA, and the QC analyses reported here to test the VGP–Galaxy pipeline do not release those genomes that are under specific embargo policies for genome-wide analyses (for example, https://genome10k.ucsc.edu/data-use-policies/). New genome assemblies are available in the GenomeArk repository: https://www.genomeark.org/. After manual curation, the assemblies are submitted to the US National Center for Biotechnology Information (NCBI) under the BioProject Vertebrate Genome Project: https://www.ncbi.nlm.nih.gov/bioproject/489243 (ref. 17).

References

  1. Hotaling, S., Kelley, J. L. & Frandsen, P. B. Proc. Natl Acad. Sci. USA 118, e2109019118 (2021).

  2. Formenti, G. et al. Trends Ecol. Evol. 37, 197–202 (2022).

  3. Theissinger, K. et al. Trends Genet. 39, 545–559 (2023).

  4. Lewin, H. A. et al. Proc. Natl Acad. Sci. USA 119, e2115635118 (2022).

  5. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Genome Biol. 21, 245 (2020).

  6. Galaxy Community. Nucleic Acids Res. 50, W345–W351 (2022).

  7. Lander, E. S. & Waterman, M. S. Genomics 2, 231–239 (1988).

  8. Bray, S. & Maier, W. Automating Galaxy workflows using the command line. Galaxy Training Network (2023).

  9. Galaxy Community. Galaxy Server administration. Galaxy Training Network https://github.com/galaxyproject/training-material (2019).

  10. Formenti, G. et al. Genome Biol. 22, 120 (2021).

  11. Uliano-Silva, M. et al. BMC Bioinform. 24, 288 (2023).

  12. Wenger, A. M. et al. Nat. Biotechnol. 37, 1155–1162 (2019).

  13. Batut, B. et al. Cell Syst. 6, 752–758.e1 (2018).

  14. Lariviere, D., Ostrovsky, A., Gallardo, C., Pickett, B. & Abueg, L. VGP assembly pipeline - short version. Galaxy Training Network (2023); https://gxy.io/GTN:T00040

  15. Rautiainen, M. et al. Nat. Biotechnol. 41, 1474–1482 (2023).

  16. Cheng, H., Asri, M., Lucas, J., Koren, S. & Li, H. Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph. Preprint at arXiv https://doi.org/10.48550/arXiv.2306.03399 (2023).

  17. BioProject Vertebrate Genome Project. NCBI BioProject PRJNA489243 (accessed 18 January 2024); https://www.ncbi.nlm.nih.gov/bioproject/489243

Acknowledgements

We thank Yagoub Adam, Tyler Alioto, Jun Aruga, Diego De Panis, Sagane Dind, Diego Fuentes, Shilpa Garg and Jèssica Gómez for contributing to the initial implementation during ELIXIR Biohackathon 2021. We also thank Nate Jue for help testing and developing the pipeline tutorials and Andrea Guarracino for their useful comments on the manuscript. This work was supported in part by the Intramural Research Program of the US National Human Genome Research Institute (NHGRI), the US National Institutes of Health (NIH) and the Howard Hughes Medical Institute (HHMI). The authors are grateful to the broader Galaxy community for their support and software development efforts. This work is funded by NIH grants U41 HG006620, U24 HG010263, U24 CA231877 and U01 CA253481, along with US National Science Foundation grants 1661497, 1758800 and 2216612. The work was also supported in part by the Human Frontier Science Program (HFSP) RGP0025/2021, the Swiss National Science Foundation (SNSF) grants 202669 and 198691, the Swiss State Secretariat for Education, Research and Innovation (SERI) grant 22.00173 and Horizon Europe under the Biodiversity, Circular Economy and Environment program (REA.B.3, BGE 101059492). Usegalaxy.eu is supported by the German Federal Ministry of Education and Research grants 031L0101C and de.NBI-epi to B.G. Computational resources are provided by the Advanced Cyberinfrastructure Coordination Ecosystem (ACCESS-CI), the Texas Advanced Computing Center and the JetStream2 scientific cloud.

Author information

Contributions

D.L. built the assembly pipeline with support from G.F., L.A., C.G.-A., B.G., A.O., H.C., M.U.-S., B.D.P., A.R., M.v.d.B. and the VGP assembly working group. L.A., A.D., G.R.G., A.M.G., G.M.G., N.J., C.J., B.O., S.S., M.S. and T.T. generated one or several assemblies used in the analyses. B.J.K., K.R. and M.J.P.C. validated the zebra finch assemblies. J.C. performed the manual curation on the zebra finch assembly. L.A. assembled and evaluated the mitochondrial genomes. N.B. established the decontamination pipeline and performed the contamination analyses. N.B. and M.P.-F. compared the scaffolding strategies. A.N. performed the analyses on XBP1. C.G.-A. and B.D.P. developed the training material with support from the user community. K.H. and M.C. sourced and arranged for sample procurement for species in this study. J.R.B., N.J., T.T., B.O’T., O.F., C.L., H.K., T.M.-B. and R.M.W. generated the PacBio and Hi-C data. G.F., M.C.S., A.N., A.M.P. and E.D.J. conceived the study and drafted the manuscript. All authors, including A.A. and R.W.W., contributed to writing and editing the manuscript and approved it.

Corresponding authors

Correspondence to Erich D. Jarvis, Michael C. Schatz, Anton Nekrutenko or Giulio Formenti.

Ethics declarations

Competing interests

The authors declare no competing interests.

Supplementary information

Supplementary Information

Supplementary Notes and Supplementary Figs. 1–14

Supplementary Table

Supplementary Tables 1–10

About this article

Cite this article

Larivière, D., Abueg, L., Brajuka, N. et al. Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy. Nat Biotechnol 42, 367–370 (2024). https://doi.org/10.1038/s41587-023-02100-3

  • DOI: https://doi.org/10.1038/s41587-023-02100-3
