Introduction

Maize (Zea mays L.), together with wheat (Triticum aestivum L.) and rice (Oryza sativa L.), is a staple food crop and plays a vital role in global food security. Its uses extend to human consumption, animal feed, industrial processing, and bioenergy production1,2. Maize cultivation and productivity strongly influence both global food security and economic development. However, weeds can impede maize pollination by competing for pollinator services, hindering pollen dissemination, and altering the microenvironment within fields. Likewise, pests disrupt maize pollination by damaging floral structures, deterring pollinator activity, and compromising pollen viability and dispersal. These combined effects on photosynthesis and pollination contribute to significant yield losses3,4. Although pesticides and herbicides effectively mitigate these biotic stresses, their widespread use imposes substantial environmental burdens5,6.

GM crops are developed using molecular biology techniques to introduce traits such as insect resistance, herbicide tolerance, or nutritional enhancement7,8,9. GM technology has revolutionized crop breeding, with the International Service for the Acquisition of Applied Biotechnology (ISAAA) reporting 230 million hectares of global GM cropland in 2024, including 80 million hectares dedicated to GM maize10. The widespread adoption of insect-resistant and herbicide-tolerant GM seeds reduced pesticide applications by 748.6 million kilograms of active ingredients between 1996 and 2020, while also lowering agricultural production costs and environmental impacts. To date, 290 transgenic maize events have been documented globally, most of which confer traits related to pest resistance, herbicide tolerance, drought resilience, and yield improvement11.

The commercialization of GM crops has intensified public debates on food safety. Regulatory frameworks in many countries mandate labeling thresholds for GM content, which vary by region: 0.9% in Europe, 1% in Brazil, 3% in China and South Korea, and 5% in the United States12. In China, recent regulations have shifted from qualitative to quantitative labeling with a 3% threshold. Because nucleic acids remain relatively stable during food processing, they serve as primary targets for GM detection13. Standardized testing typically focuses on specific transformation events including foreign genes, promoters (e.g., 35 S), terminators (e.g., T-Nos), and target genomic sequences14,15,16,17,18. PCR-based methods including conventional PCR, quantitative PCR (qPCR), and digital PCR (dPCR), provide reliable qualitative and quantitative analyses19,20,21,22. Recent applications include the detection of Bt11 events in Jordanian maize hybrids using conventional PCR23the development of novel qPCR assays for CpTI-KDEL/T-nos junctions in transgenic rice24and the implementation of duplex dPCR for soybean event DAS-68416-4 quantification25.

The advancements in GM technology have created an urgent need for parallel detection methodologies. Conventional nucleic acid- or protein-based approaches face limitations when analyzing complex GM genomes. NGS enables genome-wide detection of GM components without prior knowledge of the transformants, making it a promising solution for multi-target identification in intricate transgenic plants26. Recent bioinformatics workflows now allow precise characterization of GM plants through NGS27,28. Previous studies demonstrate the equivalence of NGS to traditional methods (PCR, Sanger sequencing, or dPCR) in molecular characterization of transgenic rice event G6H129, as well as its effectiveness in mapping T-DNA insertion sites (Chr19:50543767–50543792 and Chr17:7980527–7980541) in glyphosate-resistant soybean lines GE-J16 and ZH10-630. However, critical parameters including library preparation protocols, optimal sequencing depth, and standardized analytical workflows remain undefined for NGS-based GM crops detection. Establishing robust and low-depth NGS methodologies for comprehensive identification of undocumented maize transformants is therefore an urgent research priority.

This study developed a low-depth NGS-based multi-target detection system for characterizing nucleic acid signatures in GM maize. A reference database containing 56 event-specific nucleotide sequences was constructed to allow precise alignment of low-depth sequencing data. We optimized key methodological parameters, including DNA extraction protocols, Library preparation workflows, sequencing depth requirements, and bioinformatic pipelines. The resulting integrated workflow enables high-throughput detection of 56 target sequences through optimized low-depth sequencing. Compared with traditional PCR-based detection methods, this approach substantiallyenhances detection efficiency. Furthermore, the established low-depth sequencing method facilitates regulatory compliance by simultaneously verifying multiple GM markers and meeting quantitative labeling requirements for maize products.

Results

Identification of positive sequences in transgenic maize using low-depth NGS

The nucleic acid sequences of maize transformants comprise native maize genomic regions flanking the 5’– and 3’– insertion sites (left boundary [LB] and right boundary [RB], respectively) and integrated exogenous genes (Fig. 1A). To establish a detection framework for transgenic maize using low-depth sequencing, we irst defined criteria for identifying positive reads of transformant-specific sequences. We analyzed the ND207 maize transformant, a pest- and herbicide-resistant GM variety developed by China Agricultural University and certified as safe in China. Sequencing was performed at 5× depth, and reads were aligned to ND207-specific LB/RB sequences under varying alignment stringencies (5–50 bp). Positive reads were quantified and validated against empirical detection data. Reliable detection was achievedwhen reads matched both the transformant-specific maize genome and exogenous gene sequences at 20 bp (Fig. 1B). This threshold is consistent with typical PCR primer lengths, ensuring compatibility with conventional amplification workflows. Visualization in Integrative Genomics Viewer (IGV) corroborated read-counting results and confirmed precise localization of LB/RB junctions (Fig. 1C–F). In this framework, a read is classified as positive if it shows ≥ 20 bp complementarity to both the transformant-specific genome and the exogenous insertions.

Fig. 1
Fig. 1The alternative text for this image may have been generated using AI.
Full size image

Detection of transgenic nucleic acid sequences in maize using low-depth sequencing. A Schematic of transgenic maize nucleic acid sequences, highlighting the LB and RB flanking the exogenous gene insertion site. B The quantity of detected reads across varying read length thresholds (5–50 bp, 5 bp increments). The 20 bp threshold matched empirical detection results (ND207_total reads_reliability). C, D IGV validation of ND207-specific reads at 20 bp alignment: C LB junction (4 reads); D RB junction (6 reads). E, F Empirical confirmation of detected reads: (E) LB (4 reads); (F) RB (6 reads). Data represent mean ± SD of ten biological replicates. Lowercase letters denote statistically significant differences between groups (one-way ANOVA, P < 0.05).

The impact of different tissues/organs on test results

To evaluate the effect of tissue type on transgenic detection, we analyzed ND207 maize transformants using roots, stems, leaves, and seeds. High-throughput sequencing at 5× depth was performed on each tissue type to assess ND207-specific sequence detection. Quantification of reads aligned to the LB and RB transgenic sequences revealed no significant differences in positive read counts among root, stem, leaf, or seed samples (Fig. 2A). IGV visualization confirmed consistent detection of ND207-specific reads in all tissues (Fig. 2B–I). These results suggested that all tested plant tissues can serve as a reliable source for the identification of ND207 transgenic sequences using low-depth sequencing.

Fig. 2
Fig. 2The alternative text for this image may have been generated using AI.
Full size image

Tissue-independent detection of transgenic nucleic acid sequences in maize using low-depth sequencing. A Quantification of ND207-specific reads aligned to LB and RB sequences across root, stem, leaf, and seed tissues. No significant differences were observed between tissue types. BI IGV validation of ND207-specific reads in each tissue: Root: (B) LB (4 reads), (C) RB (7 reads) Stem: (D) LB (5 reads), (E) RB (7 reads) Leaf: (F) LB (6 reads), (G) RB (7 reads) Seed: (H) LB (5 reads), (I) RB (7 reads). Data represent mean ± SD of ten biological replicates. Lowercase letters indicate statistically significant differences between groups (one-way ANOVA, P < 0.05).

Optimization of DNA preparation methods

Low-depth sequencing generates limited data, making high-quality DNA inputs essential for accurate analysis. To evaluate the impact of DNA extraction methods on detection reliability, genomic DNA were extracted from ND207 transgenic maize leaves using four protocols: CTAB protocol, DNeasy Plant Maxi Kit, DNeasy Plant Mini Kit (QIAGEN Co., Ltd., Shanghai, China), and Plant Genomic DNA Kit (Tiangen Biotech Co., Ltd., China). All samples were sequenced at 5× depth, and reads were aligned to ND207-specific transgenic sequences. The DNeasy Plant Maxi Kit yielded the highest total positive reads for both LB and RB sequences (Fig. 3A). IGV visualization confirmed superior detection efficiency with the Maxi Kit across all replicates (Fig. 3B–I). High-quality genomic DNA (e.g., Maxi Kit outputs with minimal degradation) correlated with enhanced sensitivity in low-depth sequencing workflows. Consequently, for reliable detection of transgenic sequences in maize, DNA extraction methods that preserve genomic integrity such as the DNeasy Plant Maxi Kit should be prioritized.

Fig. 3
Fig. 3The alternative text for this image may have been generated using AI.
Full size image

Impact of DNA extraction methods on transgenic sequence detection in maize using NGS. A Quantification of ND207-specific reads aligned to LB and RB sequences using four DNA extraction methods: The Maxi Kit yielded the highest read counts. BI IGV validation of ND207-specific reads. CTAB: (B) LB (2 reads), (C) RB (4 reads). DNeasy Maxi Kit: (D) LB (7 reads), (E) RB (7 reads). DNeasy Mini Kit: (F) LB (5 reads), (G) RB (6 reads). Plant Genomic DNA Kit: (H) LB (3 reads), (I) RB (2 reads). Data represent mean ± SD of ten biological replicates. Lowercase letters denote statistically significant differences between groups (one-way ANOVA, P < 0.05).

Optimization of NGS library construction methods

To determine the optimal NGS library preparation workflow for transgenic detection, we compared four fragmentation methods: PCR-Free amplification, Tn5 transposase digestion, mechanical shearing, and standard enzymatic digestion. ND207 maize transformants were used as experimental subjects, and analyzed across all four library construction methods. As a result, PCR-Free libraries generated the highest total read counts (Fig. 4A), yielding 1.27–1.56× more ND207-specific positive reads than the other methods (Fig. 4B). Additionally, PCR-Free libraries exhibited superior alignment precision to LB/RB junctions, with minimal off-target noise (Fig. 4C–J). These results indicate that PCR-Free library construction maximizes sensitivity and specificity for low-depth sequencing of transgenic maize while avoiding amplification biases inherent in enzymatic or mechanical methods.

Fig. 4
Fig. 4The alternative text for this image may have been generated using AI.
Full size image

The impact of different library construction methods on the detection of ND207 transformants. A Total reads generated by PCR-Free, Tn5 digestion, mechanical shearing, and enzymatic digestion methods. B Detection efficiency of ND207-specific sequences across library methods. CJ IGV validation of ND207-specific reads. PCR-Free: (C) LB (8 reads), (D) RB (6 reads). Tn5 digestion: (E) LB (5 reads), (F) RB (4 reads). Mechanical shearing: (G) LB (5 reads), (H) RB (6 reads). Enzymatic digestion: (I) LB (5 reads), (J) RB (5 reads). Data represent mean ± SD of ten biological replicates. Lowercase letters denote statistically significant differences between groups (one-way ANOVA, P < 0.05).

Determining the depth of genome sequencing

To establish a cost-effective yet reliable sequencing depth for transgenic maize detection, we sequenced the homozygous ND207 transgenic Lines at depths of 1×, 2×, 3×, 4×, and 5×, and statistically analyzed the detection of transgenic-specific sequences. The results showed that all 10 trials successfully detected reads for transgenic-specific sequences at 3× depth (Fig. 5A, Supplementary Table S2-S6). DBN9936, a hybrid maize transgenic variety certified as safe in China and developed by Dabeinong Biotechnology Co., Ltd., was used as testing material. The results indicated that a 5× depth to was required to consistently detect transgenic sequences at both LB and RB regions in all replicates (Fig. 5B). IGV visualization confirmed concordance between read counts and alignment accuracy across sequencing depths (Fig. 5C–F). Considering variability in sample zygosity (homozygous vs. heterozygous) in real-world scenarios, we recommend a sequencing depth of 5× for robust detection of transgenic sequences.

Fig. 5
Fig. 5The alternative text for this image may have been generated using AI.
Full size image

Optimization of the sequencing depth for the detection of transgenic sequences in maize trans-formants using NGS. A Analysis of the detection of nucleic acid specific sequences in ND207 maize transformants with 1×–5× sequencing depth of homozygous materials. (A) Detection efficiency of ND207-specific sequences (homozygous transgenic maize) across sequencing depths (1×–5×). All replicates achieved reliable detection at 3× depth. B Detection efficiency of DBN9936-specific sequences (heterozygous transgenic maize) across depths. Consistent detection (reads > 0 in all trials) required 5× depth. CF IGV validation of transformants. ND207 (homozygous): (C) LB at 5× depth (5 reads), (D) RB at 3× depth (7 reads). DBN9936 (heterozygous): (E) LB at 3× depth (3 reads), (F) RB at 5× depth (2 reads). Data represent mean ± SD of ten biological replicates. Lowercase letters denote statistically significant differences between groups (one-way ANOVA, P < 0.05).

Data analysis process optimization

To improve the sensitivity of transgenic sequence detection in low-depth sequencing data, we refined the bioinformatics workflow by incorporating paired-end read dynamics. In paired-end sequencing, Read 1 and Read 2 correspond to complementary strands sequenced from opposite ends of a DNA fragment (Fig. S1A). When a pair of reads sharing the same identifier can each align with the genomic sequence of the transgenic component and the exogenous gene sequence, they are categorized as ‘Coordinate reads’, which are also recognized as positive reads for the target transgenic component (Fig. S1B). We further investigated the ND207 transgenic component to determine whether including ‘Coordinate reads’ could improve detection of positive reads at 5× sequencing depth. The results showed that, compared with considering only ‘Split reads’, incorporating ‘Coordinate reads’ significantly increased the detection of positive reads for the ND207 transgenic component (Fig. 6A, Supplementary Table S7-S11). The read statistics results are consistent with the data analysis results (Fig. 6B–E). Thus, in this approach, positive reads are defined as the sum of ‘Split reads’ and ‘Coordinate reads’.

Fig. 6
Fig. 6The alternative text for this image may have been generated using AI.
Full size image

Analysis of the detection of specific nucleic acid sequences in ND207 transformants using different reads statistical methods. A Statistical analysis of the detection of ND207 transformants: Split_reads and Coordinate_reads. B Analysis of IGV for a specific nucleic acid sequence in the LB of the ND207 maize transformant of the Split_reads (reads = 5). C Analysis of IGV for a specific nucleic acid sequence in the RB of the ND207 maize transformant of the Split_reads (reads = 5). D Analysis of IGV for a specific nucleic acid sequence in the LB of the ND207 maize transformant of the Split_reads and Coordinate_reads (reads = 8). E Analysis of IGV for a specific nucleic acid sequence in the RB of the ND207 maize transformant of the Split_reads and Coordinate_reads. (reads = 7) (reads = 4) Data represent mean ± SD of ten biological replicates. Lowercase letters denote statistically significant differences between groups (one-way ANOVA, P < 0.05).

Discussion

In recent years, advances in biotechnology have led to numerous GM plants entering the market. To regulate GM crops, governments have established labeling thresholds. For example, the European Union requires labeling of products containing more than 0.9% GM food, whereas Brazil and Australia use thresholds around 1%. In 2023, the Ministry of Agriculture and Rural Affairs of China solicited public comments on the “Decision of the Ministry of Agriculture and Rural Affairs on Amending the Administrative Measures for the Labeling of Genetically Modified Organisms in Agriculture (Draft for Comments).” Among the proposals, a new regulation stating that “when the content of a single genetically modified crop component exceeds 3% of the product, it must be labeled” drew particular attention. If adopted, this draft regulation would shift GM food labeling in China from a qualitative system to a quantitative system with a 3% threshold. As GM crop diversity increases, detection standards become more complex, creating new challenges for screening GM components. Low-depth whole-genome sequencing provides a cost-effective approach for surveying entire genomes across populations of genetically modified seeds and plants. Increasing sequencing depth may further enhance detection sensitivity. This approach therefore holds promise as a potential method for quantitative screening of processed products in the future.

This study addressed these challenges by constructing a reference sequence alignment database for authorized transgenic maize events in China. This resource supports and validates detection of transgenic maize using NGS. We optimized each step of the workflow including selection of sample tissues and organs, DNA extraction methods, high-throughput sequencing library preparation, and data analysis pipelines to improve detection sensitivity. Key refinements in our workflow including three points as follows: (1) Sample preparation: Selection of appropriate tissues and organs and optimization of DNA extraction protocols to ensure high-quality DNA. (2) Sequencing protocols: Tailoring of library preparation methods for high-throughput sequencing at low depth. (3) Data analysis: Development of a robust pipeline for aligning reads to transgenic and maize genomes. These optimizations enabled reliable identification of maize transgenic sequences at a sequencing depth of only 5×. Our approach leverages high-throughput sequencing to simultaneously screen for multiple specific transgenic DNA sequences in maize plants, thereby substantially reducing the effort required for comprehensive transgene screening. Furthermore, the sequence alignment database can be easily updated through incorporating additional transgene sequences (foreign genes and regulatory elements), the analysis to target hundreds of loci in parallel.

Importantly, this NGS-based screening strategy can be extended to other major crops such as rice and soybean. Both rice and soybean have smaller genomes than maize, which means that the low-depth sequencing can cover a larger fraction of the genome with the same sequencing effort. In practice, this should allow more accurate detection of transgenic elements at lower costs in these species. Thus, similar or improved performance is expected when applying our method to crops with smaller genomes.

Despite these advantages, challenges remain. Detecting transgenic sequences present at very low concentrations is difficult. At present, sequencing read counts can provide a rough estimate of transgene content by comparing reads mapped to transgenic sequences versus endogenous maize genes. However, the precise quantification of GM percentage in a sample is not yet feasible with our current approach. Future work should focus on improving quantitative accuracy for low-abundance transgenes and establishing robust standards for quantification.

In summary, our results demonstrate the feasibility of using low-depth whole-genome sequencing for efficient screening of GM maize, offering significant advantages in multiplex detection and scalability. The updateable sequence database and optimized protocols provide valuable tools for GM crop monitoring. Addressing the remaining limitations in quantitative accuracy will further enhance the utility of NGS for both regulatory and research applications.

Methods

Plant materials and DNA extraction

A total of 3 commercial maize varieties were used in this experiment, including ND207 (Beijing Liangyuan Biotechnology Co., Ltd.), MIR162 (Syngenta Co., Ltd.) and DBN9936 (Beijing Dabeinong Biotechnology Co., Ltd.). All seeds used in this study were officially obtained as research materials from the Development Center of Science and Technology of the Ministry of Agriculture and Rural Affairs of the People’s Republic of China. According to China’s standards for testing GM organisms, the requirements for DNA extraction and purification of GM plants and their products include the use of the CTAB method31as well as DNA extraction kits.

Construction of a sequence alignment database for maize transformants

By searching for national standards regarding the detection of GM plants and their products, as well as patents and Literature related to GM maize, we obtained nucleic acid specific sequences for 34 maize transformation events, including DBN9501, MON87419, MON87411, MON863, MIR098, ND207, CM8101, MON810, MIR604, Bt11, Bt176, MON87427, 3272, 5307, T25, GA21, NK603, DAS40278-9, DBN9936, DBN9858, BFL4-2, GAB-3, RF125, TC1507, ZZM030, MON88017, MON89034, MON87460, MIR162, BT506, 59,122, DP4114, CBH351, and IE09S034, respectively. A database of nucleic acid-specific sequences for transgenic maize was constructed based on the specific nucleic acid sequences obtained from various transgenic maize events (Supplementary Table S1). This database is utilized for aligning sequencing data.

Determination and optimization of data analysis process

Fastp software was used for data quality assessment and preprocessing of the raw sequencing data32. Subsequently, the pre-processed sequencing data were aligned to the reference genome (B73 V5) using the Burrows–Wheeler aligner (BWA) to generate bam files33. SAMtools and BCFtools were utilized for sorting, indexing, filtering and statistical analysis of these files34. In the comparative analysis of sequencing data, the reads generated from sequencing were aligned with the genomic sequences associated with the maize transformation event and the sequences of the exogenous genes at intervals of 5, 10, 15, 20, 25, and 30 base pairs. The occurrences of positive detections and false positives were systematically analyzed to establish the optimal criteria for identifying positive results. Consequently, in accordance with the principles of paired-end sequencing, the detection of complementary pairs of reads was enhanced. Specifically, if one read from a paired set al.igns with the genomic sequence of the maize transformation event while the other aligns with the exogenous gene sequence of the same event, that pair of reads is classified as positive reads indicative of the maize transformation event.

The influence of maize roots, stems, leaves, and grains on low-depth sequencing methodologies

The roots, stems, leaves, and grains of the ND207 transgenic maize transformation event were selected for analysis. Subsequently, the DNeasy Plant Maxi Kit will be utilized to extract DNA from these plant tissues. Following DNA extraction, high-throughput sequencing and data analysis will be conducted on the extracted DNA.

Identifying the most appropriate method for DNA extraction

Utilizing the ND207 transformation event as the experimental subject, DNA was extracted employing the CTAB method as well as three distinct DNA extraction kits (DNeasy plant Maxi Kit (QIAGEN, Germany), DNeasy plant Mini Kit (QIAGEN, Germany) and Plant Genomic DNA Kit (TianGen, China). The extracted DNA was subsequently analyzed through high-throughput sequencing and data analysis.

Analysis of NGS library construction methods

This study employs four prevalent library construction methods in high-throughput sequencing: PCR-free amplification, Tn5 enzyme digestion, mechanical fragmentation, and conventional enzyme digestion. These methods are utilized to construct low-depth sequencing libraries, which are subsequently subjected to high-throughput sequencing and data analysis.

Low-depth sequencing depth analysis

We conducted deep sequencing at a depth of 5× using pure homozygous ND207 maize transformants with varying contents of 10%, 20%, 30%, 50%, and 100%. This approach allowed us to detect the event components of the ND207 maize transformants at the specified depth. Furthermore, we performed sequencing at depths of 1×, 2×, 3×, 4×, and 5× on the unknown homozygous transformants DBN9936 and MIR162. This analysis aimed to determine the optimal sequencing depth for the detection of nucleic acid-specific sequences associated with maize transformant events utilizing low-depth sequencing techniques.