Genomic data portals collect, annotate, and make data files available to researchers and, increasingly, AI algorithms. They are run by, among others, broad data archive repositories or consortium-specific Data Coordination Centers. Their design may seem a niche topic, but these portals realize the open data principles by making millions of data files findable, accessible, interoperable, and reusable (FAIR). Almost every researcher uses them, yet we are unaware of published guidance on how web data portals should be funded, built, and run. We present lessons we have learned from creating genomics-focused data portals. We highlight the importance of funders in defining rules, human data wranglers as liaisons, a flexible and simple metadata schema, and a user-centered engineering process. We also present concrete suggestions on accessions, metrics, testing, controlled access, and licenses. Finally, we discuss the unsolved problems of interoperability, portal reuse, and long-term stability. We hope these guidelines can help funders and creators of new data portals develop a better understanding of the unique challenges they may face and possible solutions.
- Matthew L. Speir
- Wei Kheng Teh
- Maximilian Haeussler