Introduction

Microbiome research is a rapidly growing field that spans diverse domains ranging from human health to agriculture to aquatic system functioning1,2,3,4,5. Increasingly, researchers use multi-omics approaches to generate large, complex datasets in an effort to understand the genomic composition and functional potential of microbial communities6,7,8. Much of this data is currently generated and documented in non-standardized ways across researchers and organizations, creating challenges for data reuse and ultimately limiting the return on investment for microbiome studies9,10. Several groups have worked to establish standards and guidelines for microbiome research best practices, such as the Genomic Standards Consortium (GSC) reporting standards, the Strengthening The Organization and Reporting of Microbiome Studies (STORMS) guidelines for human microbiome research, and the controls developed by the National Institute of Standards and Technology (NIST)10,11,12,13,14. While these efforts have established a solid foundation for microbiome data stewardship best practices, limited awareness and adoption of these standards across the microbiome research community remains a significant hindrance15.

To address this gap in awareness and training, we launched the National Microbiome Data Collaborative (NMDC) Ambassador Program in 2021 as an annual program initially focused on hosting training workshops for metadata standards. Using a cohort-based learning approach, the pilot cohort of Ambassadors was trained on the benefits of findable, accessible, interoperable, and reusable (FAIR) data principles16, metadata standards, and how to use the GSC’s Minimum Information about any (x) Sequence (MIxS) standard14. This pilot cohort of Ambassadors then hosted 23 events over 9 months, reaching over 800 researchers17. While the events ranged from panel discussions to town halls, most events were hands-on workshops focused on providing practical experiences with the MIxS templates to workshop participants. Similar to other successful workshops focused on bioinformatics tools and data science resources18,19,20, we found this training approach to be highly effective17. We also observed that the ‘train-the-trainer’ approach had additional value as a means for emerging researchers in the field to serve as leaders within their domain-specific networks and be part of a coordinated national program. This echoes the successes seen with other implementations of this community engagement format21.

Leveraging the lessons learned during the pilot phase, we expanded the NMDC Ambassador Program beyond metadata standards to other aspects of data stewardship within microbiome research and to include content about the three NMDC products: the NMDC Submission Portal, NMDC EDGE22, and the NMDC Data Portal23. This expansion also included the option for Ambassadors to choose one or a combination of three outlined content paths to focus their events on: 1) Microbiome data stewardship, data management, and the NMDC Data Portal; 2) Microbiome metadata standards, metadata templates, and the NMDC Submission Portal; or 3) Multi-omics data processing, standardized bioinformatics workflows, and NMDC EDGE. To better quantify the training activities and impact, we designed a post-event survey to be completed by event participants to evaluate if the educational approach, training materials, and content were effective. The survey contained retrospective questions to capture participant assessments of their own knowledge about the topics covered prior to the event and after the event, while minimizing response shift biases and incomplete datasets that can occur when participants are given separate pre- and post-event surveys24,25,26. Overall, the 2023 Ambassador cohort of 13 Ambassadors hosted 21 events in three countries (USA, Canada, and Japan) and in three languages (English, Spanish, and French), reaching over 550 participants over the course of the year27,28. Herein, we report on the anonymized survey results from 122 participants (22% of the total reported event participants) who responded to the post-event survey. The survey results were analyzed to assess the quantitative impact of this learning model for microbiome researchers, and these insights and lessons learned can serve as a guide for implementing similar training programs in other fields of research.

Methods

Survey design

The post-event survey was designed by the NMDC team and reviewed and approved by the Human Subjects Committee at Lawrence Berkeley National Laboratory as an exempt IRB protocol under #394NR002. Informed consent was obtained from all participants. All methods were carried out in accordance with relevant guidelines and regulations for human subjects research. The full set of post-event survey questions is available at https://doi.org/10.6084/m9.figshare.25045667.v1. The survey questions were grouped into five sections: i) Background; ii) Event experience; iii) Standardization, FAIR data, and Data stewardship; iv) NMDC products; and v) Next steps.

For the first section, Background, we sought to capture information about participant career stage, institution type, and microbiome research domain to better understand who was attending these workshops and if the content was applicable across diverse research backgrounds and expertise levels. This section included three questions, two of which were checkboxes (institution type and microbiome research domain) and one of which was multiple choice (career stage). The second section, Event experience, included six questions focused on assessing participant experiences and overall opinions about the event. This section utilized Likert scales to balance positive and negative options and included one multiple choice question and one long answer question to capture any additional comments about participant event experiences29. The third section, Standardization, FAIR data, and Data Stewardship, included four Likert scale questions based on a retrospective survey design structure. These questions were displayed as multiple choice matrices where participants were asked to rank their familiarity with a topic and their perceived importance of a topic before and after the event to determine gained knowledge throughout the course of the event24,25,26. These questions provided background information on the participants’ knowledge of the topics presented as well as the perceived knowledge gained from the event.

The NMDC Products section included nine subsections, each focused on one of the NMDC products and any hands-on activities included in the event about the products. Participants were only directed to these specific questions if their event included detailed information about any of these products. The last section, Next steps, included three questions to assess how the microbiome community intends to incorporate data stewardship principles into their own work, how they intend to stay connected with the NMDC, and to capture any feedback that may not have been otherwise addressed by the survey. This section was used to evaluate how actionable the event content was and determine potential methods for continued engagement with event participants.

Survey distribution

The approved event survey questions were added to Google Forms for distribution, and survey links for each Ambassador event were created. Ambassadors were provided with the links and QR codes for the survey to distribute to participants following their events. Ambassadors sent the survey link directly to any virtual participants or displayed the link or QR code for participants to access the Form. Many participants used the QR code option and answered the survey on their mobile devices. Based on preliminary testing by the survey designers, the survey was estimated to take around five minutes to complete. Participants were given ample time at the end of most of the events to complete the survey and were encouraged to complete it while still at the workshop, but participants with the survey link were able to provide their answers at any time following the event. Participants were provided with the IRB information at the start of the survey and were not required to complete the survey. No survey questions were required to be answered, and all responses were anonymous. Respondents were not compensated for completing the survey, but the benefits of the survey were explained in the context of how the results would help to improve the Ambassador Program and the NMDC products, and that the results would be summarized in a publication. For the six events without survey results, the Ambassadors ran out of time, the event type was not conducive to this type of assessment (e.g., they participated in a town hall rather than hosting a workshop), or participants were unable to take the survey due to technical challenges (e.g., participants did not have a phone or computer, could not access Google Forms, or had internet problems). Therefore, the number of respondents does not match the total number of event attendees for this cohort.

Survey results & statistical analyses

Each Ambassador event had a unique survey link and event identifier to enable event-specific response analyses if desired. However, after the completion of the program, all survey data was collated to better assess the impact of the entire Ambassador cohort rather than individual Ambassadors or events. NMDC team members approved to work on this IRB protocol removed or anonymized any potentially identifiable information from the combined results, including the date, time, and other information included in the long answer responses (e.g., participants mentioning their Ambassador event host’s name in a comment).

As not all questions were required to be answered, unanswered questions did not disqualify an entire participant’s survey response from being included in the larger combined dataset, but any unanswered question responses were removed prior to statistical analyses. For the retrospective questions within the Standardization, FAIR data, and Data Stewardship section, participant responses were only included if they answered for both the pre- and post-event options.
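The pairing rule described above can be illustrated with a short sketch. It is written in Python purely for illustration (the analysis reported here used Microsoft Excel and R), and the record layout, the field names `before` and `after`, and the function name `complete_pairs` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: one dict per respondent for a single
# retrospective question, with ratings from 1 to 5 and None for a skipped item.
Response = Dict[str, Optional[int]]


def complete_pairs(responses: List[Response]) -> List[Tuple[int, int]]:
    """Keep only respondents who rated both the before and the after option."""
    pairs = []
    for r in responses:
        before, after = r.get("before"), r.get("after")
        if before is not None and after is not None:
            pairs.append((before, after))
    return pairs


# Made-up example data, not survey results.
example = [
    {"before": 1, "after": 4},
    {"before": 2, "after": None},  # dropped: no post-event rating
    {"before": None, "after": 5},  # dropped: no pre-event rating
    {"before": 3, "after": 3},
]
pairs = complete_pairs(example)
```

A respondent dropped from one retrospective question under this rule can still contribute to other questions, which is one reason the response counts differ between questions.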

Microsoft Excel was used for basic data organization and calculations. For the retrospective survey questions, the results were analyzed using two-tailed paired sample t-tests with a significance cut-off of p < 0.05. Results were transformed into figures using R (Version 4.3.1), RStudio (Version 2023.06.1 + 524)30, and the ggplot2 (Version 3.4.3)31 and UpSetR (Version 1.4.0)32 R packages.
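For readers less familiar with the paired sample t-test, the following minimal sketch shows how the test statistic is formed from the per-respondent differences. It is an illustration in Python rather than the Excel workflow used in this study; the function name `paired_t_statistic` and the example ratings are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(before: Sequence[float], after: Sequence[float]) -> Tuple[float, int]:
    """Return the paired t statistic and its degrees of freedom (n - 1)."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two matched before/after ratings")
    # The test operates on each respondent's own change in rating.
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up ratings for five respondents, not survey results.
t_stat, dof = paired_t_statistic([1, 2, 1, 3, 2], [4, 4, 3, 5, 3])
```

The two-tailed p-value is then obtained from the t distribution with n - 1 degrees of freedom. In Python, `scipy.stats.ttest_rel` performs the full test if SciPy is available.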

Results

The 2023 cohort of 13 NMDC Ambassadors hosted 21 events throughout their term (May 2023 to December 2023). Of the 13 Ambassadors, 3 were graduate students, 8 were postdoctoral fellows, and 2 were research scientists. To foster collaborative and networking opportunities, Ambassadors were encouraged to co-host events with other Ambassadors, and seven of the 21 events were co-hosted by at least two Ambassadors. One Ambassador extended the ‘train-the-trainers’ model into their own events, holding a smaller hands-on workshop with graduate students to give them the tools needed to then co-host an event for undergraduates. Two Ambassadors co-hosted a session within the National Summer Undergraduate Research Project (NSURP), which aims to provide rewarding microbiology research opportunities for Black, Indigenous, and People of Color (BIPOC) and Latinx students33. Two Ambassadors modified their workshop content to accommodate the primary language of their event attendees, with one Ambassador translating their workshop slides into Spanish for a workshop with undergraduates at the Universidad de Puerto Rico28 and another Ambassador presenting in French for an event at the Université du Québec à Chicoutimi.

Of the 21 total Ambassador events, 15 included the distribution of a post-event survey to capture participant responses about their event experiences, their assessments of the presented materials, and what they gained from the event. Collectively, these 15 events captured responses from 122 participants. None of the questions were required to be answered; therefore, some questions had more participant responses than others.

The Ambassador events reached researchers primarily from academic institutions (76.2%, n = 93), although several events also included participants from government (15.6%, n = 19), industry (2.5%, n = 3), and a combination of the academia and government sectors (5.7%, n = 7) (Fig. 1A). Within these sectors, graduate students formed the largest group of event participants (39.3%, n = 48), with the remainder spanning all career stage options in the survey, from Undergraduate student (15.6%, n = 19) to Established scientist (13.1%, n = 16) (Fig. 1B). Respondents also reported working with diverse microbiome environments, ranging from human skin to freshwater, with the majority focused on animal-associated and soil microbiomes, and more than half (51.7%, n = 62) reporting that they work with at least two of the listed microbiome environment types (Fig. 1C).

Fig. 1

Self-reported responses from event participants regarding their demographic information and research background. (A) Sector of participant primary institution, as reported from a checkbox question with the prompt “Which sector best describes your primary institution? Check all that apply” (total responses n = 122). (B) Participant career stage, as reported from a multiple choice question with the prompt “What career stage do you identify with?” (total responses n = 122). (C) Participant microbiome environment research domain as defined using the GSC’s MIxS environment extensions (total responses n = 120).

Four questions about overall event experience were asked using a Likert scale format from 1 (Strongly disagree) to 5 (Strongly agree) to assess participants’ overall thoughts, feelings, and takeaways from the events. For the question, “The content of the event was useful and appropriate to my existing level of knowledge about the subject”, 95% (n = 116) of respondents reported a 3 (Agree), 4, or 5 (Strongly agree) (Fig. 2A). Participants were asked if they learned something new from the event, and 98% (n = 119) responded with a 3 (Agree), 4, or 5 (Strongly agree) (Fig. 2B). This spanned all career stages, including established scientists, where 94% of respondents reporting this career stage (n = 15) selected a 3 (Agree), 4, or 5 (Strongly agree), indicating that these topics and learning methods were relevant even to those more senior in their field. Participants also provided high ratings for “The materials for this event were effective for learning the content” (97%, n = 118 reported a 3, 4, or 5) and “I felt that my contributions and questions were welcome during the event” (98%, n = 119 reported a 3, 4, or 5), indicating that the training materials were effective and that the Ambassadors fostered an inclusive and collaborative environment for discussion (Fig. 2C,D).

Fig. 2

Histograms representing the number of participants that reported each rating in response to Likert scale questions about overall event experience. Rating scale ranged from 1 (Strongly disagree) to 5 (Strongly agree) (n = 122 for total responses for all four questions). The entire Event experience section of the survey included six questions. The two questions that are not reported here were a yes/no/other multiple choice for a prompt about if participants could access the technology and materials (e.g., Zoom, the NMDC websites, the activity materials), and a free-form long answer text question prompting for any additional comments about overall event experience. (A) Prompt: The content of the event was useful and appropriate to my existing level of knowledge about the subject. (B) Prompt: I learned something new from this event. (C) Prompt: The materials for this event were effective for learning the content. (D) Prompt: I felt that my contributions and questions were welcome during the event.

Event attendees were provided with a series of retrospective questions where they were asked to rate their familiarity with or their perceived importance of concepts before and after the event to assess knowledge gained over the course of the event from the Ambassadors and the training content. When asked about their familiarity with the FAIR data principles, 86% (n = 101) of respondents reported increased familiarity after the event (Fig. 3A). Nine of the 16 respondents who did not report an increase started off with a 5 (Very familiar) rating and ended at a 5. The mode of this dataset shifted from 1 prior to the event to 4 after the event, and the two-tailed, paired sample t-test results indicated a significant shift in the mean from 1.93 to 3.76 (p = 1.39E-35). Participants were then asked to rate their familiarity with existing metadata standards and standard templates, and even though only nine of the 16 Ambassador events covered this topic in great detail, 82% (n = 98) of attendees reported an increase in familiarity (Fig. 3B). The mode for the responses to this question also shifted from 1 to 4, and the mean from 2.18 to 3.59 (p = 1.31E-30). To gauge the microbiome community’s recognition of the importance of standardization in multi-omics data processing, participants were asked to rate the importance of standardization in data processing to enable data reusability (Fig. 3C). This question received the lowest percentage (58%, n = 69) of rating increase, as many of the event attendees already had a high level of awareness and recognition for this concept before the events (84%, n = 100 started at a 3, 4, or 5). However, the improvements in ratings were still significant, with a mean of 3.60 before the event and 4.50 after the event (p = 1.02E-17).
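The summary measures reported for each retrospective question (share of respondents with an increased rating, respondents who stayed at the top rating, and the shift in mode and mean) can be sketched as follows. This is a Python illustration under assumed names (`summarize_shift`, `toy_pairs`), not the Excel calculations used in the study, and the example ratings are invented.

```python
from statistics import mean, mode
from typing import Dict, List, Tuple


def summarize_shift(pairs: List[Tuple[int, int]]) -> Dict[str, float]:
    """Summarize matched (before, after) ratings on a 1-5 scale."""
    before = [b for b, _ in pairs]
    after = [a for _, a in pairs]
    increased = sum(1 for b, a in pairs if a > b)
    # Respondents already at the top rating cannot report an increase.
    stayed_at_5 = sum(1 for b, a in pairs if b == 5 and a == 5)
    return {
        "n": len(pairs),
        "pct_increased": 100 * increased / len(pairs),
        "stayed_at_5": stayed_at_5,
        "mode_before": mode(before),
        "mode_after": mode(after),
        "mean_before": mean(before),
        "mean_after": mean(after),
    }


# Made-up ratings for five respondents, not survey results.
toy_pairs = [(1, 4), (1, 3), (2, 4), (5, 5), (3, 4)]
summary = summarize_shift(toy_pairs)
```

Counting respondents who stayed at 5 separately helps distinguish a ceiling effect from a genuine absence of learning among those who did not report an increase.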

Fig. 3

Grouped histograms representing the retrospective survey data where participants were asked to rank their familiarity with a topic or their perceived importance of a topic before and after the Ambassador-led event. (A) Prompt: Please rate your familiarity with the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles; Rating scale ranged from 1 (Not familiar) to 5 (Very familiar); Response n = 117. (B) Prompt: Please rate your familiarity with existing metadata standards and standard templates; Rating scale ranged from 1 (Not familiar) to 5 (Very familiar); Response n = 119. (C) Prompt: How would you rate the importance of standardization in data processing to enable data reusability?; Rating scale ranged from 1 (Not important) to 5 (Very important); Response n = 119. (D) Prompt: Please rate your familiarity with the NMDC, its mission, and its products; Rating scale ranged from 1 (Not familiar) to 5 (Very familiar); Response n = 120.

To assess how the Ambassador events impacted recognition of the NMDC and associated activities, participants were asked to rate their familiarity with the NMDC, its mission, and its products both before and after the event (Fig. 3D). This question had the highest percentage (93%; n = 112) that reported an increase in their familiarity ratings. The mean response rating significantly increased from 1.58 to 3.70 (p = 1.39E-47). No respondent gave a 1 (Not familiar) rating after the event.

To assess the potential lasting impact of these workshops and the perception of the utility of the presented concepts, participants were asked to respond with a Likert scale rating from 1 (Strongly disagree) to 5 (Strongly agree) to the prompt, “I plan to incorporate the concepts of FAIR microbiome data, data reuse, proper data management, data stewardship, and/or data standards into my work” (Fig. 4A). Of the total responses, 99% (n = 116) indicated a 3, 4, or 5. Attendees were also asked about their potential continued involvement with the NMDC and its products. Seventy-one respondents indicated that they plan to continue their engagement with the NMDC by using one of the products, and twenty-one attendees expressed interest in applying for a future cohort of the Ambassador Program or the NMDC Champions Program.

Fig. 4

Insights into participant usage of -omics data and data stewardship best practices following the event. (A) Histogram of Likert scale [1 (Strongly disagree) to 5 (Strongly agree)] participant responses to the prompt, “I plan to incorporate the concepts of FAIR microbiome data, data reuse, proper data management, data stewardship, and/or data standards into my work” (total responses: n = 117); (B) Number of participants that selected each response to the prompt, “Which bioinformatics workflow(s) are you most likely to use moving forward? Check all that apply” (total responses: n = 68); (C) Upset plot depicting the -omics types and combinations of -omics types selected by participants in response to the prompt “What -omics data types would you be interested in searching for/reusing from the data portal? Check all that apply” (total responses: n = 108).

To better understand the interests and needs of the microbiome research community regarding resources and training materials, participants were surveyed about the bioinformatics workflows and -omics data types they would most likely use moving forward. Participants indicated the most interest in using the NMDC standardized metagenome bioinformatics workflows in the future (Fig. 4B). The next most popular choice was the viruses & plasmids workflow, which takes in assemblies and provides reports about any viruses or plasmids detected in the sample, including taxonomic, quality, and antimicrobial resistance gene information34. Participants were also asked, “What -omics data types would you be interested in searching for/reusing from the data portal?” (Fig. 4C). Similarly, most respondents reported an interest in reusing metagenomics data (90%, n = 97), and 56 of the 108 participants who responded to this question selected multiple -omics types, suggesting an interest in reusing multi-omics datasets.
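The combination counts that underlie an UpSet plot of a "check all that apply" question can be sketched as below. The figures in this study were produced with the UpSetR package in R; this Python version is only an illustration, and the function names and the option labels other than metagenomics are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List


def combination_counts(selections: Iterable[List[str]]) -> Counter:
    """Count how many respondents chose each exact combination of options."""
    # Blank responses (no option checked) are excluded, as in the analysis.
    return Counter(frozenset(s) for s in selections if s)


def multi_select_count(selections: Iterable[List[str]]) -> int:
    """Count respondents who checked more than one option."""
    return sum(1 for s in selections if len(set(s)) > 1)


# Made-up checkbox responses, not survey results.
toy = [
    ["metagenomics"],
    ["metagenomics", "metatranscriptomics"],
    ["metagenomics", "metatranscriptomics"],
    ["metaproteomics"],
    [],
]
counts = combination_counts(toy)
n_multi = multi_select_count(toy)
```

Each key in `counts` corresponds to one column of an UpSet plot, and `n_multi` is the kind of tally behind the statement that 56 of 108 respondents selected multiple -omics types.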

While the aim of this study was to quantify the impact of the Ambassador Program, the survey also captured non-quantitative attendee feedback in the free-form text prompts. Highlights of responses include “Very well done, it was easy to follow along”; “Really nice presentation by [Ambassador]! Enjoyed and learned a lot”; “Great presentation in our language”; “Great job answering all the questions!”; “Excellent information shared, and I hope to connect with the presenters!”; “Loved the [Data Portal] scavenger hunt [activity], [it] was a very useful way to see if I actually understood”; “The organization and initiatives were great. Hope it can expand to various levels of academic research! Great resources!”. The majority of free-form text responses were positive, with a handful of neutral or more critical responses that include “I struggled to understand the material since this is my first experience with research”; “Please present the best research, not only the tools”; “Data standardization is generally a pretty dry (albeit important) topic. To compensate, I think the presentation should be more engaging. Maybe go through some processes available”. All combined survey responses are available at https://doi.org/10.6084/m9.figshare.25045667.v1.

Discussion

Existing educational resources for microbiome research are typically focused on technical aspects of microbiome data analysis and rarely emphasize data standards and stewardship principles35,36. To expand the awareness and adoption of best practices in data stewardship, management, and standards, there is a need for resources and coordinated programs focused on these core themes that encompass the interdisciplinary nature of microbiome research. To have the greatest impact on microbiome research practices for future projects, these programs should ideally target early-career researchers and diverse audiences, and be intentionally structured to maximize the learning of these concepts throughout the community. The NMDC Ambassador Program previously demonstrated its utility for expanding the reach of these training materials17, and the quantitative results described herein demonstrate the value of community learning models towards increasing awareness across the field. All of the post-event survey retrospective question responses showed an overwhelmingly positive trend towards an increased understanding of metadata standards, bioinformatics workflows, standardization, the FAIR principles, and the NMDC program following the Ambassador events. This included researchers from various career stages, institutions, and microbiome environments, demonstrating the Program’s efficacy across experience levels and backgrounds.

Several challenges were encountered, and the resulting lessons documented, during the 2023 Ambassador workshops. While hybrid formats can make events more equitable and accessible, they presented challenges for the Ambassadors similar to what others in the field have encountered37. Virtual attendees reported issues hearing questions and answers and experiencing a lack of virtual engagement. To mitigate some of these challenges with virtual and hybrid events, Ambassadors were able to request support from the NMDC team, and team members attended many of these events to assist with monitoring questions and providing links. Not all events allotted adequate time for the survey, and some participants had issues accessing the survey form (e.g., could not connect to the internet, could not access Google Forms due to a firewall). Although the survey responses indicate that the content was broadly applicable across career stages, there was some constructive feedback, for example: “I struggled to understand the material since this is my first experience with research”, that led to discussions on how to best modify event content depending on experience levels. Check-in calls occurred with the Ambassadors throughout their term to facilitate discussion, leading to valuable insights that allowed subsequent workshops to be adapted throughout the year. This prompted exchanges between Ambassadors ranging from “Make sure to tell participants to bring their laptops for hands-on activities” to “I underestimated how much coffee to order”. Overall, this study was limited by low participation in the post-event surveys and its reliance on self-reporting. Further, we acknowledge that our analysis pooled the survey results from diverse events and workshop types with a range of participants (e.g., 4 to 45 participants). Future efforts will focus on gathering survey information from more participants to improve the representation of feedback. Additionally, the retrospective survey methodology does have known limitations, but it was chosen to minimize the number of incomplete datasets from participants only answering either a pre- or post-event survey, and to minimize biases with what participants think they know about a topic prior to the educational event24,25,26.

Beyond the broader concepts, the results also indicate that this ‘train-the-trainers’ model is effective for communicating the NMDC mission and products to diverse audiences, and expands upon the number and reach of events that would have been possible by NMDC team members alone. The development of the NMDC products relies on user input to ensure we are addressing community needs, and the Ambassador events and survey provided valuable feedback to the team from diverse groups and research domains outside of our direct network. The data about bioinformatics workflows and usage of multi-omics data is indispensable for understanding current and future user needs. The fact that over twenty respondents indicated interest in applying to the NMDC Ambassador or Champions Programs also highlighted the success of these events for promoting interest in continued engagement across microbiome research communities.

Conclusions

Educational efforts such as the NMDC Ambassador Program will continue to be invaluable to microbiome science, as they expand the distribution of microbiome research best practices to diverse audiences and institutions. Here, we quantitatively measured how the 2023 Ambassador events increased awareness of standardization efforts and data stewardship practices. Implementing and adhering to microbiome data stewardship best practices, FAIR data principles, and standardization efforts is expected to improve the generation and utilization of microbiome data, thus increasing scientific outputs and innovation both in the short term and in years to come. Hands-on workshops like those presented here will continue to be critical in microbiome workforce development and will contribute to training the next generation of microbiome scientists. Insights from the NMDC Ambassador Program and its post-event survey can serve as a useful model for other training programs.