Introduction

The One Health approach emphasizes the interconnectedness of human, animal, and environmental health, and has gained relevance in recent years due to the emergence of zoonotic diseases and increasing antibiotic resistance1. The importance of One Health in epidemiological surveillance lies in its capacity to monitor and control infectious diseases across different domains, enabling a comprehensive understanding of disease dynamics and risk factors2. By integrating human, animal, and environmental health data, One Health surveillance systems facilitate early detection and response to health threats, thereby enhancing global health security and reducing the burden of disease3. Furthermore, this integrated approach fosters multidisciplinary collaborations among stakeholders to develop and implement coordinated strategies aimed at disease prevention and control4. Recently, the conventional process of serotyping through serology has been undergoing a gradual transformation, with molecular typing methods such as multi-locus sequence-based typing (MLST)5 increasingly complementing or replacing traditional methods. However, these techniques do not possess the necessary discriminatory power to differentiate between closely related strains6, which limits its application in many epidemiological studies. In recent years, High-throughput sequencing (HTS) technologies have revolutionized the field, enabling rapid and cost-effective analyses of complete genomes7. Whole-genome sequences (WGS) offer an unparalleled level of discrimination among genetically related isolates, allowing for the exploration of compelling questions such as accurate phylogenetic and phylogenomic analyses8, as well as the examination of serotype- or subtype-determining genes9. Thus, the use of sequence-derived typing is gaining acceptance and is being employed for source attribution and epidemiological surveillance10. Accordingly, different analytical strategies have been developed11,12,13, and numerous studies have demonstrated the potential of WGS in epidemiological investigations14,15,16,17,18. This advancement in genomic sequencing is revolutionizing the way in which the epidemiology of various pathogens is approached and assessed and, actually, the European Centre for Disease Prevention and Control (ECDC)19 and the World Health Organization20 have recommended the use of WGS as the gold standard methodology for surveillance of bacterial pathogens. One interesting aspect of WGS is its ability to provide not only precise typing but also the opportunity to conduct additional analyses beyond routine surveillance, such as assessing the existence of genetic factors related to antimicrobial resistance and virulence, which holds great significance in the context of the One Health approach21. This approach is rapidly emerging as the primary framework for monitoring and managing antimicrobial resistance, establishing a compelling connection between genomics and comprehensive control strategies.

Andalusia, with a population of 8.5 million inhabitants, is the third largest region in Europe and has the size of a medium-sized European country, like Switzerland or Austria.

On May 15, 2020, the Andalusian Local Ministry of Health entrusted the Progress and Health Foundation to initiate a program aimed at monitoring pathogens through whole genome sequencing and bioinformatic analysis of a specific set of bacteria with significant implications in public health22. This marked the inception of the SIEGA initiative.

Subsequently, the Regional Ministry of Health and Consumer Affairs designed a genomic sequencing circuit aimed at facilitating this integration, which was included in Instruction 130/201923 on the treatment and sequencing of biological agents isolates in Andalusia with a focus on health protection, inspired in The Transformation of Reference Microbiology Methods and Surveillance for Salmonella With the Use of Whole Genome Sequencing in England and Wales24. Figure 1 sketches the general operating layout of the circuit.

Figure 1
figure 1

The SIEGA circuit. The different provinces that collect samples send them for extraction of DNA to the different reference laboratories, which is subsequently sent to the sequencing facilities and finally the resulting genomic data is uploaded in the central SIEGA data management system. The map of Andalusia was generated using mapSpain software from: https://ropenspain.github.io/mapSpain/articles/x02_mapasesp.html. The figure was generated with PowerPoint.

Any laboratory that isolates any strain of the pathogens of interest can send it, under the conditions set out in that instruction and accompanied by basic metadata, to the Public Health Laboratory of Málaga, under the Regional Ministry of Health and Consumer Affairs, where DNA extraction is carried out. Once the DNA is extracted, it is preserved in freezing conditions and sent to the sequencing laboratory at the Andalusian Molecular Biology and Regenerative Medicine Center (CABIMER).

Additionally, other collaborating centers have been incorporated, which have either provided DNA extracts (University Hospital Puerta del Mar in Cádiz) or directly provided sequences (University Hospital Virgen del Rocío or University Hospital San Cecilio).

Non-clinical origin samples primarily stem from official veterinary control activities in primary production, such as the implementation of the Annual National Program for the Control of Certain Serotypes of Salmonella in meat chickens of the species Gallus gallus. They also originate from Official Control Services in stages subsequent to primary production, which are included in the National Official Control Plan for the Food Chain. This was agreed upon in coordination meetings between the competent authorities. Additionally, other samples from investigations conducted within the framework of the management of foodborne outbreaks are included. All these samples are analyzed in designated official laboratories in accordance with the provisions of Article 37 of Regulation 2017/625, which, in the case of Andalusia, follow the aforementioned Instruction 130/2019.

Clinical origin samples are obtained in the exercise of the healthcare function of the referring centers on a voluntary basis, also applying the aforementioned Instruction 130/2019.

Here we present an application, SIEGA (acronym for “Integrated system of genomic epidemiology in Andalusia” in Spanish), that supports and facilitates the region-wide genomic surveillance system. As of July 2023, SIEGA contains a total of 1906 bacterial genomes corresponding to 670 Salmonella enterica, 688 Listeria monocytogenes, 276 Campylobacter jejuni, 191 Escherichia coli, 23 Yersinia enterocolitica and 58 Legionella pneumophila, collected at the different provinces of Andalusia in food products, factories, farms, water systems and human clinical samples (although these numbers change rapidly due to the continuous increase of samples). SIEGA contains information on the origin of the samples, their typing and resistance and virulence genes. It consist of two modules, one of them is public and contains descriptive information on the samples in the conventional NEXSTRAIN representation25, and the other one, the SIEGA management data system, is a private Laboratory Information Management System (LIMS) for the use of the personnel of public health and the clinicians involved in the surveillance program. The LIMS allows users to build customized reports on the isolates, provides tools for detecting transmission chains and implements an automated alert system that promptly signals whenever a newly detected bacterium exhibits a predetermined genetic similarity to any existing database entry, enhancing vigilance and response capabilities. Some statistics on the isolates as well as a more detailed study on the relationships between them and the distribution of resistances across samples is provided.

Results

The public SIEGA webpage

SIEGA has a public website25 where a detailed description of the circuit and updated information on the species under surveillance is available for the general public. Having a public webpage for a project of genomic surveillance of pathogens is vital for promoting transparency, disseminating knowledge, fostering collaboration, and bridging the gap between the scientific community and the general public. By providing open access to information on the surveillance achievements, SIEGA encourages participation from diverse stakeholders, and creates a more knowledgeable and engaged society in the ongoing fight against infectious diseases. SIEGA offers access to Nextstrain Auspice26 viewers for the different species under surveillance: Salmonella enterica, Listeria monocytogenes, Campylobacter jejuni, Escherichia coli, Legionella pneumophila and Yersinia enterocolitica (see Fig. 2). In the viewers, the public can explore the relationships among the samples sequenced in the region and other international samples of reference. It is also possible to locate in a map the geographical origin of the different isolates, including the animated options available in the interface, which emulate in a very visual way the transmission of the samples over time.

Figure 2
figure 2

Phylogenies from Nextstrain viewers for: (A) Salmonella enterica, (B) Listeria monocytogenes, (C) Campylobacter jejuni, (D) Escherichia coli, (E) Legionella pneumophila and (F) Yersinia enterocolitica.

The SIEGA data management system

The SIEGA data management system serves as a private LIMS designed for utilization by personnel in public health and participating clinicians of the surveillance program. SIEGA facilitates the seamless uploading of raw sequencing data and orchestrates automated processing, including quality control assessments, considering the Guidelines for reporting Whole Genome Sequencing-based typing data through the EFSA One Health WGS System of the European Food Safety Authority (EFSA)27. Through this platform, users can generate tailored reports concerning the isolates, explore potential transmission chains, and deploy an automated alert mechanism that promptly signals any genetic similarity between newly identified bacteria and entries within the existing database. This system bolsters vigilance and response capabilities. Furthermore, SIEGA furnishes statistical insights into the isolates, along with a comprehensive exploration of their interrelationships and the distribution of resistances across samples.

Within the SIEGA interface, each organism has five distinct subsections to facilitate comprehensive analysis that include sample status, metadata, control results and Flexible Table, a wizard to combine metadata for complex representations (see Table 1).

Table 1 SIEGA features and description.

The entire SIEGA data management system has been designed to free users of the need to have in-house expertise in genomic data management and resources to store such data. Simultaneously, the system offers a centralized database of all genomes sampled, allowing optimal exploitation of the results and the correct implementation of one health surveillance. It provides a convenient user permission structure to allows data sharing and collaborative work (if desired), detailed quality control for the standard pipelines used for data processing and accurate data traceability. Beyond facilitating collaborative work, user permission also helps with data privacy in samples of human origin. In any case, metadata does not include any field with personal identification and, ultimately, it is user responsibility not including any information that would reveal the origin of the sample. In general, data is treated in an open data philosophy where possible, according to FAIR principles (findable, accessible, interoperable and reusable)28.

Detailed reports on each sample, that include MLST, cgMLST, serotyping outcomes, antimicrobial resistance and virulence genes, plasmids and a phylogenetic tree with the most related samples are provided. Additionally, customized phylogenetic analysis can be carried out in an easy and intuitive way. Finally, one of the most interesting features is the automatic alert system. SIEGA can be configured to automatically send a warning when a new sample is introduced that meets some criteria defined by the user, based on genetic distance, serotype, presence of antimicrobial or virulence genes, etc. (see Table 1 and Supplementary results for details).

Advantages of SIEGA

The overarching goal of the SIEGA initiative, and particularly its SIEGA data management system, is to streamline the implementation of epidemiological surveillance, especially at the regional level or within large communities engaged in the One Health approach. It offers a continuously upgraded environment for managing genomic data, eliminating the need for end-users to establish their own bioinformatics teams for data processing and interpretation, as well as to invest in computer infrastructure for data storage and analysis. Furthermore, because data undergo uniform processing through a regularly updated pipelines adhering to international analysis standards, the results are consistent and can be readily compared. This democratizes genomic surveillance by involving all necessary stakeholders, as the SIEGA platform provides the essential resources for both data processing and interpretation.

Current users

Microbiology laboratories across different hospitals in Andalusia are actively utilizing SIEGA. In addition, the “Sistema de Vigilancia Epidemiológica de Andalucía” (SVEA) from the “Junta de Andalucía” and professionals in the field of Health Protection are also users of SIEGA. In adherence to the One Health principle, individuals involved in both animal health management and laboratory aspects have been incorporated as users of SIEGA.

Prospective new users must contact SIEGA through the contact address in the public SIEGA web page.

Genomic surveillance of isolates

Salmonella enterica

The SIEGA encompasses a dataset comprising 670 whole genome sequences of Salmonella enterica, which were sequenced from June 2020 to July 2023 using samples collected between 2013 and 2023. Within this dataset, 42.54% (285) of the samples were sourced from clinical origins, 34.63% from the food-related sector, and 21.34% from livestock sources. A total of 448 distinct Sequence Types (STs) were identified and categorized into clonal complexes (CCs). The prevailing ST, ST 309694 (corresponding to clonal complex ST-71), was encountered on 26 occasions. Additionally, 83 strains exhibited concurrence with more than one ST. 6 STs (ST-67337, ST-138467, ST-197094, ST-207307, ST-247937, and ST-320298) were found cross-wide clinical, food-related, and livestock-origin samples. Similarly, 18 STs were identified in both clinical and food samples, and 7 STs were identified in clinical and livestock-origin samples.

Listeria monocytogenes

The SIEGA includes a dataset comprising 678 whole genome sequences of Listeria monocytogenes, which were sequenced from June 2019 to July 2023. Within this dataset, 69.61% (472) of the samples were sourced from clinical origins, including all the samples from Andalusia sequenced by the Neisseria, Listeria and Bordetella Unit of the National Centre for Microbiology in Spain while investigating the Listeriosis outbreak caused by contaminated stuffed pork in Spain in 201929, and 30.38% (206) from food origin. A total of 248 distinct STs were identified and categorized into CCs. The prevailing ST, ST 29,514 (corresponding to clonal complex ST-388), was encountered on 210 occasions.

Campylobacter spp

The SIEGA contains 276 whole genome sequences of Campylobacter, received between December 2020 and June 2023, corresponding to both C. jejuni and C. coli. Most of the sequences have been obtained from human clinical strains from two reference hospitals in Cádiz and Seville. There are some STs, grouped into CCs, which have been detected with greater frequency in clinical samples. From ST-16294 (corresponding to the clonal complex ST-206) 12 sequences have been obtained, with the interest of being detected from 2020 to 2023 and in a scattered way, with 8 isolates in Cádiz and 4 in Seville. Their identical virulome and resistome profiles have been recovered from these sequences, using the tools described, particularly ABRicate30 on VFDB31 and CARD32, databases. Another frequent STs have been ST-12550 (ST-573CC) and ST-18855 (ST-52CC).

Escherichia coli

The SIEGA includes 121 whole genome sequences of Escherichia coli, 44 of them downloaded from the EnteroBase website33 for reference and 77 of them collected between November 2021 and May 2023. To date, most of the sequences (72) have been obtained from food samples taken at retail level on behalf of the monitoring programme of anti-microbial resistance (AMR) according to the provisions of the Commission Implementing Decision (EU) 2020/172934 implemented in Andalusia. The human clinical strains come from two reference hospitals in Cádiz and Seville. There are STs, grouped into CCs, which have been detected with greater frequency in food samples. The prevailing ST, ST 169652 (corresponding to clonal complex ST-10) was encountered in 6 occasion all from food samples. Additionally, other 4 strains grouped in ST142026 (CC 155) and 3 strains exhibited concurrence with ST 191979 (CC 162) or 60064 (CC 93). To the date, no shared CCs have been detected in the food and human clinical origin samples.

Yersinia enterocolitica

The SIEGA encompasses 23 whole genome sequences of Yersinia enterocolitica, received in the year 2022, sampled between February and November. To date 21 (91.3%) of the sequences have been obtained from clinical samples from one of the reference hospitals, the Hospital Virgen del Rocio in Seville, and 2 were obtained from food samples. There are some STs, grouped into CCs, which have been detected with greater frequency in these clinical samples. The prevailing ST, ST-1574 (corresponding to clonal complex ST-135), was encountered on 4 occasions. Additionally, 3 strains exhibited concurrence with ST-52 and 2 strains grouped into ST-1716 (corresponding also to clonal complex ST-135). Figure 3 depicts the genetic relationships between all the Yersinia enterocolitica samples.

Figure 3
figure 3

Y. enterocolitica GrapeTree representation, generated within the SIEGA application. Node labels represent ST (in some cases an ambiguous ST assignation occurred and more than one number is displayed) and node color correspond to the sampling month (a warm gradient has been used to better display the time scale). Numbers in the branches correspond to the allelic distances among nodes. The GrapeTree representation provides an intuitive visualization of the temporal scale of sampling and the genetic similarities among the samples. Using different labels from the metadata and the results tables, it is possible to obtain visual representations of many aspects of the epidemiology of the selected samples.

Legionella pneumophila

Legionella data stored in SIEGA, includes the comparative analyses of 58 Legionella pneumophila isolates during 2021–2023. Of these, 12 isolates corresponded to clinical isolates and 46 to environmental isolates. Clonal relation between the Isolates was determined by cgMLST. This scheme classified the isolates into 18 ST (sequence type). The most abundant being ST 293 (20 isolates, 34.5%) and 180F (11 isolates, 18.9%). Four ST (293, 427, 489 and 180F) were present in both clinical and environmental isolates. In addition, we identified 3 STs (95, 98, and 524F) in clinical isolates that are not associated with environmental origin, suggesting that they derived from unrecognized sources.

Analysis of antimicrobial resistance

In the EU, the new legislation related to the harmonized monitoring and reporting of AMR from 202135 authorized whole genome sequencing as an alternative method to supplementary phenotypic testing of Salmonella and E. coli in certain conditions. The SIEGA allows the monitoring of the presence of resistance genes in the different microorganisms facilitating the tracking of the dissemination or emergence of AMR throughout the food chain under a One Health approach. For example, the presence of AMR genes in the population of Salmonella included in the SIEGA can be analyzed, categorized by antimicrobial classes (Fig. 4). This analysis reveals that 52.7% (347) exhibit resistance genes to only 1 group of antimicrobials, while 15% (99) carry resistance genes to two distinct classes of antimicrobials. In contrast, 32.2% (212) demonstrate resistance genes to 3 or more classes of antimicrobials. In a similar manner, this could be carried out with the other microorganisms hosted in the database, or further analysis could be conducted by delving into the multiple variables, for instance, this could involve monitoring the emergence of Salmonella strains harboring colistin resistance genes, the occurrence of Salmonella strains harboring resistance genes to fluoroquinolones and third-generation cephalosporins or monitoring the presence of resistance genes to Critically Important Antibiotics (CIAs) in the database. The Fig. 4 represents a summary of the observed frequency of potential multi-resistances, represented as the number of different AMR genes corresponding to different antimicrobial classes harbored by each individual sample.

Figure 4
figure 4

Observed frequency of potential multi-resistance cases found among the Salmonella samples sequenced. Number of samples in which from only one to up to 9 different AMR genes have been found.

Another perspective for monitoring antimicrobial resistance is through the surveillance of plasmids harboring these resistance genes. SIEGA flexible tables allow the integration of data to both antimicrobial resistance and plasmid tracking, facilitating a comprehensive analysis. This data can be graphically represented on a phylogenetic tree, which illustrates the STs that have acquired resistance-bearing plasmids. Moreover, this representation can highlight whether such acquisitions have occurred within the same time, geographical region, livestock farm, food processing plant, grocery store or healthcare facility, thus providing critical insights into the patterns and pathways of resistance spread. Figure 5 illustrates a case is the plasmid NZ_AJ437107, which harbors a beta-lactam resistance gene. This plasmid has been acquired in both livestock and food samples with the same sequence type, similar date and from the same geographical location, suggesting a potential selection pressure in certain environments favoring the acquisition of beta-lactam resistance genes.

Figure 5
figure 5

Phylogenetic tree, based on allelic differences (log scale), of Salmonella enterica isolated from the same locality, grouped by allelic profile (the circle size correlates with the number of samples). Blue dots represent strains harboring the plasmid NZ_AJ437107, and the number inside each dot indicates the corresponding MLST for each sample.

Some successful SIEGA use cases

Despite its incipient use, SIEGA has already proven its usefulness in several cases. Among them it is worth mentioning the investigations carried out in connection with an outbreak of Salmonella Agona, as declared by Norwegian authorities36, entailed, irrespective of field investigations, a comprehensive review of the SIEGA database in pursuit of genomic congruences. This study resulted in the absence of any coincident strains. Another case was the investigation regarding the food alert notification issued under the identifier 2020.5961 within the European Commission Rapid Alert System for Food and Feed (RASFF)37, involving actions that extended beyond on-site measures. These actions included obtaining genomic sequences from food samples supplied by the Finnish food safety authorities. These sequences were then integrated into the SIEGA database. However, no matches were identified both at the time of integration and among subsequent samples added to the system. Further investigations undertaken in relation to the “Joint ECDC-EFSA rapid outbreak assessment” released on July 27, 202338, in which Spain was identified as one of the conceivable sources of the suspected transmission vehicle, involved, regardless of field actions, a reassessment of the SIEGA database in search of genomic coincidences, resulting in the non-existence of any matching strain.

These cases clearly illustrate how SIEGA is valuable in ruling out the existence of coincident strains in our database with strains from evaluations of other outbreaks detected at the national or European level, thus facilitating decision-making. On the other hand, alerts are generated that point out possible connections between the stored strains and the ongoing outbreaks, accelerating the possible identification of the source of origin in an outbreak, an aspect that is very difficult to identify with previous methods.

Discussion

In order to effectively apply the One Health approach to the surveillance and control of antimicrobial resistance, it is crucial to adopt methodologies that ensure consistent results across diverse settings and various sample sources, facilitating comprehensive and reliable data analysis39. Traditionally, surveillance efforts have predominantly centered around specific organisms, prioritizing genotypic and/or phenotypic characteristics that are deemed significant for individual pathogens. Consequently, a multitude of distinct methodologies emerged, giving rise to challenges of consistency, applicability, standardization, and scalability across laboratories and pathogens. Addressing these issues necessitated the implementation of rigorous protocols, regular quality and harmonization controls, as well as the adoption of shared platforms and common controls for result dissemination, thereby mitigating some of the aforementioned concerns40. Nonetheless, the advent of high-throughput sequencing and the accessibility of obtaining comprehensive or nearly comprehensive genome sequences from bacterial isolates within a reasonable timeframe and cost have brought about a profound transformation in the pursuit of surveillance objectives, addressing a significant portion of the previously mentioned challenges. While complete realization remains a work in progress, the trajectory appears unambiguous, with the majority of agencies transitioning toward genome-centric surveillance. This approach presents numerous advantages, notably the capacity to establish connections among surveillance outcomes across diverse levels, locations, and species, which holds particular relevance for the One Health framework41,42.

Currently, the genomic surveillance in Andalusia comprises 6 main pathogens, although new pathogens will be included in brief, like Vibrio or others, pending further decisions. Specifically, Campylobacter is one of the major foodborne pathogens of concern in its growing trend of antimicrobial resistance. C. jejuni and E. coli are the major causative agents, with C. jejuni contributing to most of the cases in approximately 90% in the world. Infection is transmitted to humans due to consumption of contaminated food and water43. It is very necessary to establish the chains of transmission between reservoir animals, food and humans. There are few published studies on the genomic characterization of human clinical strains44,45 and their relationship with outbreaks46. It would be very useful to establish sentinel surveillance, such as the one implemented in Ireland recently47. The difficulty in maintaining the viability of Campylobacter in environmental samples is a challenge to increase the data that provide comparative information with clinical strains.

Although SIEGA represents a regional solution, Andalusia due to its large, country-like size, can be considered an illustrative case of implementation for a medium-sized region in Europe, serving as an exemplary guide for other authorities wishing to adopt a similar model. Actually, within an international context, SIEGA has yielded results in ruling out the involvement of products originating from Andalusia in outbreaks occurring in other European countries. Moreover, upon reaching a national-level management agreement, it will facilitate the inclusion of non-human origin strains in the European Food Safety Authority (EFSA) sequencing database, as the quality controls and bioinformatics analysis tools described in [reference] have been implemented. This will enable both EFSA and the ECDC at the European level to interact with genomic information originating from Andalusia. It is also important to remark that SIEGA has been designed with the capability to incorporate new domains, such as strains originating from other territories, countries or entities with its multi-level user management system and flexible permissions. Obviously, expanding the scope and the user’s base comes with the challenge of scalability and sustainability. While sustainability is assured by the support of the Andalusian Local Ministry of Health and technical limitations due to scalability to a large number of samples are not expected in a near future, a limitation could arise from the need of implementing many diverse pathogens (viruses, bacteria, and maybe fungi) in the near future, to implement new sequencing technologies and other types of samples, like metagenomic surveys, wastewater surveillance, etc.

There are some publicly available tools that can be used for genomic surveillance. Pathogenwatch48 is perhaps the most popular one, that share some functionalities with SIEGA. It can upload and process about 20 of pathogen species and provide some functionalities like AMR prediction and Core SNP-based trees, although data sharing facilities are quite limited and it does not implement anything like an alert system. The EFSA One Health WGS system27 is another tool with restricted access, only for official data providers, designed by country officers, also with limited functionalities and managing only three pathogen species. Another recent initiative, the Swiss Pathogen Surveillance Platform, developed a tool for genomic surveillance, but mainly oriented to facilitate visualization of the results from an epidemiologic point of view, like maps, time evolution of samples, etc.49. Other specific tool are Microreact50 or Nexstrain26, with functionalities limited to data visualization and sharing for genomic epidemiology.

Summarizing, this work presents SIEGA, an advanced integrated One Health system that allows precise surveillance of environmental pathogens, permitting a comprehensive characterization of new isolates with data quality control and data traceability included. SIEGA facilitates automated generation of customized reports that include similarities with other samples, represented as a dendrogram, potential AMR or virulence genes detected, among other details. The alerting functionality of SIEGA, that alerts as immediately as one sample is introduced that meets a predefined similarity conditions, is a revolutionary tool for detecting transmission chains at an early stage. In addition, the possibility of crossing metadata and representing them over sample dendrograms allows performing detailed retrospective studies of different sample features (e.g., emergence or transmission of AMR, etc.) Moreover, since SIEGA connects environmental with clinical samples it allows tracing clinical occurrences back to their environmental origins, permitting rapid interventions targeted to the source of the outbreak. In addition, the success in discarding relationships in the use cases underscore the intricate challenges in elucidating the etiology and dynamics of microorganism-related incidents. The pursuit of genomic concordance within the SIEGA database, although yielding no direct matches, serves to illuminate the genetic diversification that these pathogens can exhibit. These episodes emphasize the need for continued vigilance, inter-agency cooperation, and cutting-edge molecular methodologies to fortify our comprehension and management of such microbial phenomena. It is worth noting that the modular structure of SIEGA allows the incorporation of new pathogens as the surveillance policies consider them relevant.

Thus, by leveraging high-throughput sequencing technologies, advanced bioinformatics tools, and robust data sharing platform, SIEGA offers a comprehensive view of the genetic landscape of pathogens across different geographical regions and host populations.

Conclusions

Centralized circuits of genomic surveillance based on whole genome sequencing (WGS) as SIEGA provide a convenient and efficient way to monitor infectious diseases and detect outbreaks in real-time51. Such facilities enable the characterization and relatedness determination of bacterial isolates, aiding in tracking transmission patterns and implementing effective infection control measures4. Moreover, WGS has been incorporated into public health surveillance systems, offering significant contributions to outbreak investigations, infection prevention, and control51. The potential integration of WGS into epidemiological investigations has been highlighted, emphasizing the need to establish optimal models for data integration and evaluate public health impacts resulting from genomic surveillance52. By harnessing the power of whole-genome sequencing, centralized circuits of genomic surveillance offer immense potential for improving disease surveillance and response, ultimately contributing to the overarching goals of the One Health framework1,4,39.

Methods

DNA extraction protocol

For the majority of the samples, DNA extraction was conducted using the PureLink Genomic DNA Mini Kit (Invitrogen). Subsequently, quantification of the extracted DNA was performed using the QUBIT FLEX fluorometer, and the quality of the eluate was assessed through electrophoresis using the Egel Power Snap Electrophoresis Device.

Sequencing

The majority of the samples have been sequenced at the CABIMER Genomic Unit, with a significant contribution from Listeria sequences obtained from the National Center of Microbiology (469 samples). These sequences were incorporated under a mutual agreement for sequence exchange established between the Andalusian Public Foundation Progress and Health-FPS and the National Center of Microbiology in 2020.

Whole genome sequencing of the isolates has been carried out at the CABIMER genomic facility. Library construction and sequencing was performed at the Genomics Core Facility of CABIMER. DNA libraries were prepared using Nextera DNA Flex Library prep kit (Illumina) following the manufacturer’s instructions. Currently, high-throughput sequencing was performed on the NextSeq 500 Sequencing System (Illumina), although other sequencing technologies, such as nanopore will be included in a near future.

Sequencing data processing

The raw reads are filtered using the fastp53 application (v0.23.4), followed by a search for potential contamination from other organisms using the kraken254 and bracken55 applications (v2.1.2). Coverage quality control is carried out using qualimap256 (v2.2.2). Subsequently, with quality-controlled reads, a de novo assembly is performed using SPAdes57 (v3.15.4) and the quality of the assembly is assessed using QUAST58 (v5.0.2).

Sample typing

MLST (Multi-Locus Sequence Typing) is acquired using the MentaLiST59 application (v1.0.0) for those organisms whose databases are updated within this tool. CgMLST (core genome Multi-Locus Sequence Typing) profiles are generated using Chewbbaca60 (v3.0.0) with the schema available in the Chewie Nomenclature Server (chewie-NS)61 for Listeria monocytogenes. Salmonella enterica and Escherichia coli schemas are filtered based on the EFSA gene list62 CgMLST profiles. Moreover cgMLST profiles based in other databases are obtained using a custom script that utilizes the BLAST + 63 tool (v2.12.0 +) and allelic profiles from the following public databases as references: Pasteur Institute64 for Listeria monocytogenes, EnteroBase33 for Yersinia enterocolitica, Salmonella enterica and Escherichia coli and PubMLST65 for Campylobacter jejuni/coli. Additionally, the applications LisSero66,67 (v0.4.1), SeqSero268 (v1.2.1), and SerotypeFinder69 (v2.0.1) are employed to determine the serotype of Listeria monocytogenes, Salmonella enterica, and Escherichia coli, respectively.

New species can be easily incorporated either by using its own typing schema, if the corresponding application is available, or with a customized script, as mentioned above.

AMR genes

To identify antimicrobial resistances, two approaches were employed. Firstly, the ResFinder70 application (v4.5.0) was utilized, along with its dedicated database. Secondly, the ABRicate30 tool (v1.0.1) was employed in conjunction with various databases, namely CARD32, MegaRES71, and ARG-ANNOT72.

Virulence genes

To identify virulence genes, two applications were employed: VirulenceFinder73 (v2.0.4), with its dedicated database and ABRicate using the Virulence Factor Database (VFDB)31 as a reference.

Plasmids

To identify plasmids present in the samples two approaches are employed: the PlasmidFinder74 application (v2.1.6) and the Mash Screen75 application (v2.3) combined with the PlasDB database76.

Core genome determination and phylogenetic analysis

Using the parSNP77 application (v1.7.4), the core genome is analyzed, SNP selection is performed, and subsequent phylogenetic trees are constructed based on SNP differences for each of the organisms present in SIEGA. On the other hand, allelic differences are obtained using a custom script that compares the cgMLST of each sample for each organism, generating a matrix for each organism. This matrix is then utilized by GrapeTree78 (v2.1) to construct the phylogenetic tree based on allelic distances, which could be visualized with GrapeTree or Taxonium79. The ETE380 application (v3.1.2) is used for the selection and generation of sub-trees included in the reports. The reference sequences (NCBI database identifiers) used for the different species were for Listeria monocytogenes NC_003210.1, Salmonella enterica NZ_SRHS01000001.1, Escherichia coli NC_000913.3, Campylobacter jejuni / coli NZ_CYQA01000001.1, Yersinia enterocolitica NZ_CQAE01000001.1 and Legionella pneumophila NZ_CP015941.1.