Abstract
Rice cultivation in Europe is declining as consumers increasingly prefer imported exotic varieties, such as aromatic and basmati rice, which are prone to fraudulent varietal claims due to their higher market value. To address this issue, we sequenced 20 high-value rice cultivars circulating in the Mediterranean market, analyzing their phylogeny and whole-genome polymorphisms. Our results revealed that two basmati varieties are genetically closer to two Mediterranean varieties than to the others, and that no direct link exists between genome-wide single nucleotide polymorphism (SNP) patterns and the rice commercial category. We further discuss genes located in previously described quantitative trait loci (QTLs) related to eating quality and seed properties. A variant in the WX1 gene, associated with higher amylose content, was found in both the Basmati group and Mediterranean varieties. Additionally, a SNP that could disrupt the drought tolerance gene OsbHLH148 was identified in five European varieties, while a variant affecting splicing of OsPol lambda, related to drought response, was present in four of those. These data could assist certification offices in reducing fraud in the rice market and provide valuable insights for researchers and breeders, particularly regarding the production, consumption, and adaptation of these cultivars to the Mediterranean region.
Introduction
As the primary staple food for over half of the world’s population, rice (Oryza sativa) accounts for about 20% of the calories consumed worldwide1. Originating in Asia, Oryza sativa is highly adaptable to different latitudes and longitudes and is currently grown worldwide in a vast range of ecosystems. Genetic structure studies have revealed two primary subgroups within the O. sativa species, indica and japonica, arising from independent domestication events, but other types with less clear origins, such as sadri-basmati and aus-boro, have also been considered2,3.
Rice cultivation in Europe has a relatively short history, and a slight decline in rice production has been reported in recent years across this continent4. While the European pedigree mainly consists of temperate japonica varieties, European consumer preferences tend to lean towards characteristics related to grain size, colour, and cooking qualities, with a growing demand for aromatic and indica varieties4,5,6. This, coupled with increasing prices for japonica rice, has led to a rise in imports of “exotic” varieties, and the EU has become a net importer of rice, with about one-third of its consumption sourced from countries like Pakistan, India, and Thailand6. Consequently, efforts are underway to boost EU japonica rice production and sales within the European market, targeting varieties specifically produced and grown in the European region. Furthermore, the influx of new/exotic accessions is fostering fraudulent varietal claims for higher quality and more expensive rice (basmati rice being one example), through replacement by, or mixture with, lower quality and cheaper varieties7,8.
The challenges imposed by climate change and the increasing human population also drive the necessity for better, more resilient, and more nutritious rice varieties. According to “The Second Report on the State of the World’s Plant Genetic Resources for Food and Agriculture”, there are over 700,000 certified varieties distributed in genebanks worldwide9. The 3K Rice Genome Project10 was launched to better understand the genetic diversity of this broad germplasm, by obtaining the whole-genome sequence of more than 3000 target accessions. The data produced in this project serves as an unparalleled resource for uncovering rice genetic variation on a large scale1. From this initiative, 29 million single nucleotide polymorphisms (SNPs), 2.4 million small insertions and deletions (InDels), and over 90,000 structural variations (SVs) were identified1. The public availability of such genomic data allows for increased knowledge of rice populations, varieties, and even other species from the Oryza genus. Second11 discussed the application of molecular markers to perform phylogenetic analysis in rice and, in 1999, Ge et al.12 performed a phylogenetic analysis from well-described genes, using only 2 nuclear genes (Adh1 and Adh2) and 1 from the chloroplast (matK). Since then, the evolution of technology helped to detect and target the smallest genomic variations and use them to either analyse population structure or possibly implement fraud detection methods7,13,14. Moreover, molecular markers have facilitated breeding programs targeting not only agronomic but also eating and cooking quality traits, and have been widely applied for genomic studies. One example is the multiple polymorphisms associated with the waxy gene (acting in amylose synthesis), which have been applied to assess the authenticity of carnaroli rice and estimate amylose levels in different rice varieties15,16,17.
Our study aimed to characterize the genetic background of 22 rice varieties currently circulating in the Mediterranean market, and generate knowledge to further tackle fraudulent varietal claims and contribute to genotype conservation. The data generated highlighted the genetic relatedness of these varieties and genetic variability within genes of interest regarding grain-related traits. These data may also be used for the design of reliable and cost-effective DNA-based adulteration detection methods.
Materials and methods
Plant material and DNA extraction
This study targeted a total of 22 Mediterranean rice varieties (Table 1), considered as high-value by European rice producers, industry and breeders based on agronomic and industrial behavior, and/or cooking qualities. Except for two Spanish varieties, Bomba and Puntal, none of them had a previously sequenced genome. These varieties were classified in 6 distinct groups according to the rice commercial type and based on grain biometric parameters according to the Council Regulation No 1785/2003: Long A, Long B, Medium grain, Round grain, Basmati and European aromatic (see Table 1) [18(Supplementary Material),19]. Out of the 22 varieties, two were previously studied by Reig-Valiente et al.14 and their genome sequences were retrieved from ENA (PRJEB13328): Bomba, under the accession replicates SAMEA3927584, SAMEA3927585, and SAMEA3927584; and Puntal, with accession replicates SAMEA3927614, SAMEA3927615, and SAMEA3927616. For each of the additional 20 varieties, rice seeds were germinated in hydroponics for a period of 10–16 days. The shoots of about 25 seedlings (two weeks old) were harvested and immediately frozen in liquid nitrogen for storage at −80 °C. Frozen shoots were ground in liquid nitrogen and used for DNA extraction using an optimized CTAB-based method20 with an increase of the final RNase concentration to 20 µg/mL. DNA from the Basmati Type III variety was exceptionally extracted from seed flour following CTAB extraction as described in21, after the seeds were dehusked and ground into a fine powder using a disinfected coffee grinder.
Sequencing, preprocessing, and mapping
DNA libraries were prepared with the TruSeq DNA PCR-free protocol and whole-genome sequencing (WGS) was performed using the Illumina NovaSeq 6000 platform (Macrogen, South Korea). Raw paired-end read quality was assessed using FastQC (v0.11.9)22. Due to the high quality of the reads, alongside the absence of adapters, no read trimming was applied for most accessions. Exceptionally, for Bomba and Puntal (reads obtained from14), low-quality nucleotide regions were removed using Trimmomatic (v0.39)23, retaining bases with QS ≥ 20.
High-quality reads were then mapped to the reference genome Nipponbare 1.0 (IRGSP-1.0, release 52) using bwa-mem (v0.7.17) with default parameters24. The resulting SAM files were converted into BAM files (option ‘samtobam’) and coverage/depth statistics were obtained using option ‘depth’ from SAMtools (v1.7)25. Additionally, BAM files were sorted (option ‘sort’) and indexed (option ‘index’), and duplicate reads were marked (option ‘dedup’) using SAMtools.
Variant calling and filtering
Short variant calling was performed using Genome Analysis Toolkit 4 (GATK v4.2.6.1), following the GATK “Guide of good practices for the discovery of germline short variants”26. In detail, quality scores for each base pair were recalibrated using known sites of O. sativa variants (https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/release-60/variation/vcf/oryza_sativa/oryza_sativa.vcf.gz). Then, SNPs and InDels were called for each variety using GATK HaplotypeCaller and stored in genomic variant calling format (gVCF) files. These 22 files were combined with GATK CombineGVCFs and jointly genotyped with GenotypeGVCFs into a single cohort VCF file.
SNP and InDel records were stored in 2 separate files using GATK SelectVariants so that they could be filtered independently, as recommended in the GATK best practices and as previously reported by Ji et al.27. Low-quality SNPs were removed based on: quality by depth (QD) < 5, strand bias estimated by Symmetric Odds Ratio test (SOR) > 3, Fisher exact test (FS) > 50, root mean square of the mapping quality of reads across all samples (MQ) < 50, rank sum test for mapping qualities (MQRankSum) < −2.5, and the relative positioning of reference versus alternative alleles within reads (ReadPosRankSum) < −1.0 or > 3.5. Low-quality InDels were removed based on QD < 2.0, FS > 200.0, and ReadPosRankSum < −20.0.
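The hard-filtering rules above can be expressed as a small predicate. The following is a minimal Python sketch, not the GATK command actually run in this study: the function name `fails_filter` and the dictionary layout are hypothetical, and the choice to let a missing annotation pass is an assumption made for illustration.

```python
from typing import Callable, Dict, Mapping, Optional

Rule = Callable[[float], bool]

# Removal rules copied from the thresholds in the text.
# A variant is removed as soon as ANY rule evaluates to True.
SNP_RULES: Dict[str, Rule] = {
    "QD": lambda v: v < 5.0,
    "SOR": lambda v: v > 3.0,
    "FS": lambda v: v > 50.0,
    "MQ": lambda v: v < 50.0,
    "MQRankSum": lambda v: v < -2.5,
    "ReadPosRankSum": lambda v: v < -1.0 or v > 3.5,
}

INDEL_RULES: Dict[str, Rule] = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 200.0,
    "ReadPosRankSum": lambda v: v < -20.0,
}


def fails_filter(annotations: Mapping[str, Optional[float]],
                 rules: Mapping[str, Rule]) -> bool:
    """Return True if the variant should be removed under the given rules."""
    for key, is_bad in rules.items():
        value = annotations.get(key)
        # Assumption: an absent annotation does not trigger removal.
        if value is not None and is_bad(value):
            return True
    return False


# Usage with made-up annotation values (illustrative only).
good_snp = {"QD": 20.0, "SOR": 1.0, "FS": 2.0, "MQ": 60.0,
            "MQRankSum": 0.0, "ReadPosRankSum": 0.5}
print(fails_filter(good_snp, SNP_RULES))        # False: kept
print(fails_filter({"QD": 4.0}, SNP_RULES))     # True: removed as a SNP
print(fails_filter({"QD": 4.0}, INDEL_RULES))   # False: kept as an InDel
```

Keeping the SNP and InDel thresholds in two separate rule tables mirrors the independent filtering of the two variant types described above.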
Estimation of variant effects
A field annotating the position of each variant relative to annotated genes, together with its putative effect (HIGH, LOW, MODERATE, MODIFIER) on gene function, was added to the files containing each type of variant using SnpEff (v5.1)28 with the built-in structural annotation library for Oryza sativa (MSU7). A functional enrichment analysis of the identified genes annotated with HIGH impact SNPs was performed using the gprofiler2 R package (‘gost’ and ‘gostplot’ functions)29,30.
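For readers unfamiliar with over-representation testing, the statistic underlying this kind of enrichment analysis can be illustrated with a cumulative hypergeometric tail probability. The Python sketch below is only an illustration of that test under stated assumptions: it does not reproduce gprofiler2, it omits the multiple-testing correction that g:Profiler applies, and the function name and toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background: int, annotated: int,
                           query: int, overlap: int) -> float:
    """P(X >= overlap) when `query` genes are drawn without replacement
    from `background` genes, of which `annotated` carry the term."""
    upper = min(query, annotated)
    total = comb(background, query)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, query - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers (not from this study): 10 background genes, 3 annotated
# with a term, a query list of 4 genes, 2 of which carry the term.
print(hypergeom_enrichment_p(10, 3, 4, 2))
```

A small p-value indicates that the term occurs in the query list more often than expected by chance given the background.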
Phylogeny analysis
The phylogenetic tree was generated with the total filtered SNPs in VCF2PopTree31 using default parameters (-output “Newick tree”). Then, tree labels were colored according to the rice commercial category and exported in .svg format using the Interactive Tree of Life (v6) webtool32.
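To make the idea of SNP-based relatedness concrete, the sketch below computes a simple pairwise distance between varieties from their genotype calls. This is a simplified stand-in and not VCF2PopTree's algorithm: the distance definition (mean difference in alternate-allele dosage, scaled to 0–1), the function names, and the toy genotypes are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def alt_dosage(gt: str) -> Optional[int]:
    """Count non-reference alleles in a diploid GT string; None if missing."""
    if "." in gt:
        return None
    alleles = gt.replace("|", "/").split("/")
    return sum(1 for allele in alleles if allele != "0")


def pairwise_distance(
    genotypes: Dict[str, List[str]]
) -> Dict[Tuple[str, str], float]:
    """Mean per-site dosage difference (0..1) for each pair of varieties.
    All genotype lists must follow the same SNP order."""
    distances = {}
    for a, b in combinations(sorted(genotypes), 2):
        diff, compared = 0.0, 0
        for gt_a, gt_b in zip(genotypes[a], genotypes[b]):
            d_a, d_b = alt_dosage(gt_a), alt_dosage(gt_b)
            if d_a is None or d_b is None:
                continue  # skip sites with a missing call in either variety
            diff += abs(d_a - d_b) / 2.0
            compared += 1
        distances[(a, b)] = diff / compared if compared else float("nan")
    return distances


# Toy genotypes for three hypothetical varieties at three SNPs.
toy = {
    "A": ["0/0", "1/1", "0/1"],
    "B": ["0/0", "0/0", "0/1"],
    "C": ["1/1", "1/1", "./."],
}
for pair, dist in pairwise_distance(toy).items():
    print(pair, round(dist, 3))
```

A distance matrix of this kind is the usual input for distance-based tree construction; the tree reported here was produced by VCF2PopTree itself.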
QTLs and gene enrichment analysis
Start and end positions of quantitative trait loci (QTLs) related to rice eating quality and seed properties were retrieved from the Rice SNP-Seek database (last accessed on January 31st, 2023). Using a custom R script (snp_in_qtl.R), genes annotated with HIGH impact SNPs were screened to check whether their variants occur within the collected QTLs of interest. Then, the resulting table was used for a gene enrichment analysis using the gprofiler2 R package (custom R script variant_enrichment.R).
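The screening step amounts to an interval lookup: a variant is retained when its position falls between the start and end of a QTL on the same chromosome. The Python sketch below illustrates that logic only; it is not the snp_in_qtl.R script, and the `Qtl` class, the function name, and the QTL names and coordinates are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Qtl:
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def qtls_containing(chrom: str, pos: int, qtls: Iterable[Qtl]) -> List[str]:
    """Names of all QTLs on `chrom` whose interval contains `pos`."""
    return [q.name for q in qtls
            if q.chrom == chrom and q.start <= pos <= q.end]


# Hypothetical QTL intervals (coordinates invented for the example).
demo_qtls = [
    Qtl("demo_eating_quality", "chr6", 1_000_000, 2_000_000),
    Qtl("demo_seed_trait", "chr6", 1_700_000, 5_000_000),
    Qtl("demo_other", "chr3", 16_000_000, 17_000_000),
]

# A SNP at chr6:1,765,761 falls inside both chr6 demo intervals.
print(qtls_containing("chr6", 1_765_761, demo_qtls))
```

For genome-wide tables, an interval tree or a sorted sweep would scale better than this linear scan, which is adequate for a few hundred QTLs.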
In silico discrimination of varieties using unique combinations of SNPs
The Conditional Random Selection method described by Yuan et al.33 was applied to the filtered SNPs file, after simplifying it to contain only SNPs of the 12 chromosomes, converting it to .txt format, and filtering out rows that lacked any information regarding the SNPs’ haplotypes of all varieties (haplotype “./.”). The final input file contained 24 columns: CHROM, POS, and the 22 variety names; and each row represented the haplotypes per SNP (e.g. “0/0” for homozygous equal to reference, “0/1” for heterozygous, and “1/1” for homozygous with alternative allele). The default instructions were followed as publicly available by the authors33.
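The goal of such a selection is a set of SNPs whose combined genotype profile is unique for every variety. The Python sketch below shows one simple way to search for such a panel on a table shaped like the input file described above. It is a simplified stand-in, not the implementation by Yuan et al.: the acceptance rule (keep a randomly drawn SNP only if it increases the number of distinct profiles), the function names, and the toy table are assumptions.

```python
import random
from typing import Dict, List, Optional, Tuple

Snp = Tuple[str, int]               # (CHROM, POS)
Table = Dict[Snp, List[str]]        # genotypes in a fixed variety order


def profiles(table: Table, snps: List[Snp], n_varieties: int) -> List[tuple]:
    """Combined genotype profile of each variety over the chosen SNPs."""
    return [tuple(table[s][i] for s in snps) for i in range(n_varieties)]


def discriminates_all(table: Table, snps: List[Snp], n_varieties: int) -> bool:
    """True if every variety has a unique profile over the chosen SNPs."""
    return len(set(profiles(table, snps, n_varieties))) == n_varieties


def random_panel(table: Table, n_varieties: int,
                 seed: int = 0) -> Optional[List[Snp]]:
    """Visit SNPs in random order; keep one only if it splits more varieties."""
    rng = random.Random(seed)
    candidates = list(table)
    rng.shuffle(candidates)
    panel: List[Snp] = []
    best = 1  # with no SNPs chosen, all varieties share one empty profile
    for snp in candidates:
        trial = panel + [snp]
        distinct = len(set(profiles(table, trial, n_varieties)))
        if distinct > best:
            panel, best = trial, distinct
        if best == n_varieties:
            return panel
    return None  # the table cannot separate all varieties


# Toy table: four hypothetical varieties, three SNPs (one uninformative).
toy = {
    ("1", 100): ["0/0", "0/0", "1/1", "1/1"],
    ("1", 200): ["0/0", "1/1", "0/0", "1/1"],
    ("2", 300): ["0/0", "0/0", "0/0", "0/0"],
}
panel = random_panel(toy, n_varieties=4)
print(sorted(panel), discriminates_all(toy, panel, 4))
```

Running the search with different seeds yields panels of different sizes, and the smallest panel found can then be reported.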
Results
Genome-wide profiling of commercially valuable varieties from the Mediterranean market
To obtain the genetic polymorphisms present in the genome of the commercially valuable varieties selected in this study, high-quality paired-end reads obtained for each genotype were aligned to the rice reference genome (Nipponbare). For all 20 varieties sequenced within this study, the average percentage of high-quality reads mapped and properly paired in unique positions was 97%. As a result, on average, over 127 M reads were unique alignments (Fig. 1a), with a final mapping depth ranging from 46.0x (Maçarico) to 62.6x (Manobi) and a mean coverage of 53.0x. For the Bomba and Puntal varieties, about 98.2% of the total reads were mapped and paired and 87% were unique alignments, giving a final coverage of 30.5x for Bomba and 25.8x for Puntal (Table 2). Given the good quality of the paired-end read files and the high coverage after alignment to the reference genome, the final BAM files were prepared for short variant extraction. These files included both those generated in this study and previously published sequencing data. The full array of SNPs and InDels obtained using GATK was filtered, leading to the identification of over 4.8 M high-confidence variants, of which over 3.6 M were SNPs and 1 M were InDels (Table 2). Out of the 12 rice chromosomes, chromosome 11 had the highest number of variants, contrasting with chromosome 9 (the smallest one). Chromosome size was taken into consideration when calculating variant rate and density, which revealed chromosome 5 as the one with the fewest variants per 1 kbp window (density) and the highest rate (Table 2), where rate describes the mean number of base pairs in which one variant occurs27. Additionally, an analysis of the obtained substitutions, in the case of SNPs, and of length, in the case of InDels, was performed (Fig. 1b, c). Most SNPs corresponded to a substitution of Cytosine to Thymine (C > T) or Guanine to Adenine (G > A) (Fig. 1c).
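Variant rate and variant density are two views of the same ratio, which is why the chromosome with the lowest density also has the highest rate. The short Python sketch below makes the relationship explicit; the function names and the example numbers are hypothetical and are not taken from Table 2.

```python
def variant_rate(chrom_length_bp: int, n_variants: int) -> float:
    """Mean number of base pairs per variant (higher = sparser variants)."""
    return chrom_length_bp / n_variants


def variant_density_per_kbp(chrom_length_bp: int, n_variants: int) -> float:
    """Mean number of variants per 1 kbp window (higher = denser variants)."""
    return n_variants / (chrom_length_bp / 1000)


# Illustrative values only: a 30 Mbp chromosome carrying 300,000 variants.
length_bp, n_variants = 30_000_000, 300_000
rate = variant_rate(length_bp, n_variants)
density = variant_density_per_kbp(length_bp, n_variants)
print(rate, density)  # one variant every 100 bp, i.e. 10 variants per kbp
```

Because density equals 1000 divided by rate, ranking chromosomes by either measure gives the same order, reversed.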
Regarding InDels, the comparison with the reference genome showed a higher predominance of shorter InDels (< 5 bp), which was an expected result.
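As a small illustration of how InDel length can be read from a variant record, the Python sketch below derives a signed length from the REF and ALT alleles, using the convention of negative values for deletions and positive values for insertions. The function names are hypothetical, and the sketch assumes biallelic records.

```python
def indel_signed_length(ref: str, alt: str) -> int:
    """Signed InDel length: negative for deletions, positive for insertions."""
    return len(alt) - len(ref)


def is_short_indel(ref: str, alt: str, max_len: int = 5) -> bool:
    """True for InDels shorter than `max_len` bp (the '< 5 bp' class)."""
    return 0 < abs(indel_signed_length(ref, alt)) < max_len


print(indel_signed_length("ATG", "A"))   # -2: two bases deleted
print(indel_signed_length("A", "ATTT"))  # 3: three bases inserted
print(is_short_indel("A", "ATTTTTT"))    # False: a 6 bp insertion
```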
Whole-genome sequencing of 20 varieties and the characteristics of the generated polymorphisms, with the addition of two Spanish genotypes. (a) Number of reads obtained from the whole-genome sequencing of 20 rice varieties. The violin plot represents the distribution of total reads, mapped reads, mapped and paired, and unique alignments obtained from the alignment with the rice reference genome. (b) Length of the InDels identified in 22 rice varieties, in comparison with Nipponbare. Negative numbers correspond to deletions and positive to insertions of a number of nucleotides. The number of InDels with each size is on a logarithmic scale. (c) Substitutions detected on SNPs data. A- Adenine, C- Cytosine, G- Guanine, T- Thymine. The number of SNPs is on a logarithmic scale.
Variant information
A comprehensive analysis of the SNPs along the twelve rice chromosomes highlighted a somewhat uneven distribution (Fig. 2). It should be noted that no correlation between SNP occurrence and sequencing coverage was found, indicating a reduced bias from underrepresented regions (Fig. 2b, c). Although some peaks of coverage occur in parts of the genome with lower SNP detection, this is not observed in the majority of peaks or chromosomes, for instance when comparing chromosome 4 to chromosome 10 (Fig. 2b, c). Gene density also appears to be uncorrelated with SNP distribution. Although some of the less gene-dense regions of each chromosome correspond to the centromere, those regions show fewer SNPs in most chromosomes. Tajima’s D was calculated to further reveal regions of the genome where the observed variation is conserved. A few chromosome regions (end of chromosome 1 and chromosome 10) stand out with high positive Tajima’s D peaks (Fig. 2d), suggesting balancing selection.
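For reference, Tajima’s D compares the mean number of pairwise differences (π) with the number of segregating sites (S) scaled by sample size; an excess of intermediate-frequency variants raises π relative to S and gives positive values. The Python sketch below implements the standard formula as a worked illustration. It is not the software used to produce Fig. 2d; the function names are hypothetical, and the example uses Tajima’s published textbook case (n = 10, S = 16, π = 3.888) rather than data from this study.

```python
from math import sqrt
from typing import Iterable


def pi_from_alt_counts(n: int, alt_counts: Iterable[int]) -> float:
    """Mean pairwise differences from per-site alternate-allele counts
    observed among n sampled chromosomes."""
    return sum(2.0 * k * (n - k) / (n * (n - 1)) for k in alt_counts)


def tajimas_d(n: int, seg_sites: int, pi: float) -> float:
    """Tajima's D for n chromosomes, S segregating sites and diversity pi."""
    if n < 4 or seg_sites == 0:
        raise ValueError("need n >= 4 and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Textbook example: n = 10, S = 16, pi = 3.888 gives D of about -1.45.
print(round(tajimas_d(10, 16, 3.888), 3))

# Sixteen sites each at frequency 5/10 give a positive D instead.
balanced_pi = pi_from_alt_counts(10, [5] * 16)
print(tajimas_d(10, 16, balanced_pi) > 0)
```

In a windowed analysis such as the 100 Kb windows of Fig. 2d, S and π would be computed from the SNPs falling inside each window.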
SNPs analysis at whole-genome level. Circos plot represents (from the outer to the inner ring): (a) Chromosome size, centromere position (red band), and coding genes density (by intensity of each color); (b) coverage of reads per chromosome; (c) SNPs density (number of SNPs in 100 Kb windows); (d) Tajima’s D mean in each 100 Kb window, with positive values in green and negative values in red.
SNP density across each chromosome reveals conserved polymorphisms between varieties
The whole-genome SNP-based phylogenetic tree highlights two main groups (Fig. 3). The first cluster groups together both Basmati varieties (Super-Basmati, Basmati Type III) alongside Giza181 and Maçarico. The second cluster groups all other varieties, with subsequent divisions highlighting Long B varieties (CL-28, Puntal) grouping, as well as the medium grain varieties (in green), JSendra, and Manobi. Based on SNP data alone, and lacking the pedigree information for these varieties, we suggest that Giza 181 and Maçarico are more closely related to Basmati varieties than the others assessed in the study, despite belonging to separate categories defined from phenotype observation.
When observing SNP density along each chromosome for all varieties, we found that Maçarico, Giza 181, Basmati Type III, and Super Basmati clustered together with a tendency to accumulate a higher number of SNPs (Fig. 3). This trend was conserved in most chromosomes (Fig. 4), with the exception of Chr9. This particular group of 4 varieties is more evident in 9 of the total chromosomes (Chr1 - Chr5, Chr8, and Chr10 - Chr12), with 3 of them (Maçarico, Giza 181, and Basmati Type III) always grouping within the same cluster in all chromosomes. The clustering of the remaining varieties based on SNP density per chromosome was less conserved, with the overall relatedness being chromosome-dependent.
Global and variety-specific genes in QTLs of interest
The list of collected SNPs was annotated according to their putative impact on the genomic sequence. To assess the functional impact of the detected diversity, we further identified high-impact variants, predicted to disrupt a start codon or introduce a stop codon within the coding sequence of genes in all varieties targeted in this study. A total of 1003 high-impact SNPs was identified (Online Resource 1) associated with 912 genes. This list of genes was subsequently filtered for genes within previously identified QTL regions related to eating quality traits and/or seed properties, resulting in 911 genes. Table 3 highlights a group of genes with particular interest due to their previously described roles in regulating seed traits, and specific variants within their coding sequences which may actively impact these traits.
Enrichment analysis
The set of genes with one or more HIGH-impact SNPs within their coding sequence was enriched in molecular functions related to ADP and nucleotide binding, and biological processes related to defense response, namely against other organisms. Some examples include the Os8Tak2 gene, related to disease resistance, the DNA damage repair OsPol lambda gene, and the OsbHLH148 gene, which is involved in abiotic stress tolerance (Fig. 5).
SNPs as a source of varietal discrimination potential
The Conditional Random Selection method identified groups of a minimum number of SNPs with enough discriminating potential to distinguish all 22 varieties. Those groups contained different numbers of SNPs, with the lowest being 5 SNPs which, together with the respective haplotype, is represented in Table 4. We employed this method as proof of concept of the use of the generated dataset in targeted techniques for varietal identification.
Discussion
In this study, we sequenced 20 rice varieties and obtained the global polymorphisms for 22 varieties produced and/or circulating in the Mediterranean region. These varieties were selected based on factors such as market value and potential for breeding applications.
The SNP distribution at the chromosome level highlights that the 22 targeted varieties form clusters that are mainly consistent across the genome. Phylogenetic inference from SNP density and variability enabled the prediction of genetic relatedness among these varieties. This analysis divided the group into two main clusters, one with both Basmati varieties, plus Maçarico (Long B) and Giza181 (Long A). This shows that the high-quality polymorphisms dataset generated in this study has no direct link with their respective commercial category, which is mainly based on grain-related phenotypical traits. We hypothesize that this direct correlation may not be reached without the full context of varietal pedigree of the varieties (undisclosed by seed providers), and putative epigenetic impact.
Interestingly, out of the 912 genes annotated with one or more HIGH impact SNPs, 911 are located within QTLs of interest related to eating quality and seed-related properties. These polymorphisms were identified in genes such as Waxy, OsPol lambda, and PAO1, and are present in at least one of the varieties in this study. The well-established Waxy (Wx) “granule-bound starch synthase 1” gene is responsible for amylose synthesis, a key determinant of rice cooking and processing qualities16. This gene is located on chromosome 6 and is genetically linked to Eating Quality QTLs (QTAROqtl-157, QTAROqtl-987, QTAROqtl-989, qST3-2, qAC-6, hr6, alk6-1, qAC-6, qGC6, ac6, qAC9, qPC9) and seed related QTLs (yd1a, gy12). In the endosperm, the expression of this gene differs strongly between alleles of the Wx locus, which leads to differential amylose content. These allelic forms differ by a transversion (T→G) located at position 1,765,761 bp that impacts the splicing of the Wx transcript, leading to changes in amylose content. The allele version AGTTATA correlates with lower amylose content, while the AGGTATA allele version is linked to intermediate and higher amylose content16. Interestingly, we confirmed the presence of this SNP, from T to G, in both Basmati varieties targeted in this study, in addition to Maçarico, Carnaroli, CL28 (heterozygous) and the Spanish varieties Bomba and Puntal (homozygous for the alternate allele). This matches the phenotypic trait of Basmati varieties, which have lower starch but higher amylose content within their grain34. Pereira et al.18 [Supplementary Material] had previously published data with the physicochemical properties of the rice grains of the same varieties sequenced in this study, which included grain amylose content.
These authors observed that the varieties we identified as having the alternate AGGTATA allele, namely Bomba, Carnaroli, CL-28, Maçarico and Puntal, show intermediate to high amylose content (24.77–31.91%)16. The two Basmati varieties, however, showed intermediate amylose content values of 17.97–21.89%. This evidence agrees with previous data from our team16, suggesting that these two alleles alone may not be sufficient to explain the variations in amylose content, as other (epi-)genetic factors may also play a role.
It was previously shown that, when over-expressed, the OsbHLH148 gene confers drought tolerance in rice35. We have identified a high-impact SNP within the coding region of this gene (pos-chr3: 30,406,910 bp) in the Albatros, Arelate, Ariete, Lusitano, and Ronaldo varieties. This SNP is annotated as disrupting the single transcript start codon, which may lead to the gene’s defective transcription and compromise its role in drought response. Another gene of interest was the OsPol lambda (Os06t0237200-01) “DNA polymerase lambda” gene, which belongs to the only X family of DNA polymerase in rice involved in DNA damage repair. It is described as up-regulated in response to abiotic stress (drought, salt), and correlating with the stress intensity36. In four of the varieties under study (Albatros, Arelate, Ariete, and Ronaldo), we found a homozygous SNP corresponding to a transition A→G, which was predicted to interfere with the transcript splicing process. This subset of varieties also showed the SNP variant described above for OsbHLH148. Collectively, and since drought tolerance represents a trait of interest in the context of climate change, particularly in the Mediterranean region, we suggest that both genes can be further studied in the high-value varieties produced in this region to better understand specific stress responses during production.
The Os8Tak2 “receptor-like kinase 20” gene has been previously shown to negatively regulate rice resistance to bacterial blight37. For all varieties except Albatros, Arelate, Caravela, Gageron, Giza181, JSendra, Maçarico, and Manobi, we identified a high-impact SNP annotated as introducing an early stop codon in this gene coding region. The functional enrichment analysis revealed that the group of 911 genes is significantly enriched in processes like stress response and response to other organisms. We suggest that the Os8Tak2 gene, along with the OsbHLH148 and OsPol lambda genes described above, serve as key examples within this group, potentially playing important roles in the stress response and adaptation of these varieties to the biotic and abiotic factors commonly found in the Mediterranean region.
The GS3 gene is an evolutionarily important gene controlling grain size in rice38. All varieties except Bomba, Gageron, Giza 177 and Lusitano were identified as having a variant (G→T) within the coding region of this gene (chr3 position: 16,733,441), annotated as prematurely introducing a stop codon. Interestingly, 3 out of the 4 varieties lacking this SNP are of the round grain type (Bomba, Gageron and Giza 177). Therefore, we hypothesize that the presence of this variant is partially responsible for higher rice seed lengths in the varieties targeted in this study.
OsYUCCA12 is one of three Indole-3-Acetic Acid (IAA) biosynthesis genes previously reported as expressed during early rice grain development, mainly responsible for the increase in IAA content in the grain39. A high-impact variant in the chr2 position: 9,878,862 was identified in both Basmati varieties, in addition to Albatros, Arelate, Giza 181 and Maçarico. This SNP introduces an early stop codon, eventually compromising the accumulation of IAA in the seed.
Rice contains seven genes that encode polyamine oxidases (PAOs), named OsPAO1 to OsPAO7 according to chromosome and gene ID number. PAO1, located on chromosome 1, has a known function in the back-conversion of spermine and thermospermine into spermidine and is described as responding to cytokinin levels in rice40,41. In Arabidopsis, the same role is described for its ortholog AtPAO5, in addition to promoting seed germination42,43. Within the coding sequence of the PAO1 gene, we identified a high-impact SNP (chr1:29,513,400) for both Basmati varieties (heterozygous alleles) and for Maçarico, Giza181, Bomba, and Puntal (homozygous for the alternate allele). This SNP is annotated as introducing a stop codon within the gene coding sequence, likely compromising the PAO1 transcriptional process. In the targeted varieties, PAO1 occurs in a region with a positive Tajima’s D value (> 2); we therefore hypothesize that it may be undergoing a process of balancing selection and acting as a main discriminator between the two main observed phylogenetic clusters.
Although we have not studied the putative impact of these mutations on the phenotype of the rice varieties carrying them, we believe this information deserves to be highlighted so that other researchers and breeders may further investigate genetic contexts and eventually validate correlations.
Whole-genome data can not only help explain phenotypic differences between closely or distantly related varieties but also be employed in diagnostic analysis7. Our analysis was performed based on a random selection of SNPs that may be employed in DNA-based methods for varietal identification. Remarkably, a minimal panel of just five SNPs was enough to distinguish the 22 varieties included in our dataset. The detection of such small differences may be further achieved through the use of target fluorophore labels (e.g. KASP markers44 coupled with techniques such as capillary electrophoresis45). The application of such methods to a reduced SNP set may offer a cost-effective alternative to conventional genotyping strategies relying on larger marker arrays. Furthermore, this targeted strategy serves as a scalable model for varietal identification in broader germplasm collections.
In conclusion, we believe that the open availability of the data generated in this study will be useful for researchers and breeders to deepen their knowledge regarding these high-value cultivars, predicted to increase in relevance given the production and consumption trends observed in the Mediterranean region. It may also support further studies focusing on their adaptation potential to the changing Mediterranean climate.
Data availability
Whole-genome sequencing data generated and analyzed during the current study are available in the European Nucleotide Archive repository under the PRJEB64146 accession code. The raw variants data generated and analyzed during the current study are available in the European Variation Archive repository under the PRJEB83571 accession code. Custom scripts used for data analysis during the current study are available on GitHub (https://github.com/hmrodrigues99/TRACE-RICE). Supplementary data and MIAPPE information for sequenced varieties can be found in Dataverse (https://dmportal.biodata.pt/dataverse/gvtritqb).
References
Wang, W. et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature 557, 43–49 (2018).
Garris, A. J., Tai, T. H., Coburn, J., Kresovich, S. & McCouch, S. Genetic structure and diversity in Oryza sativa L. Genetics 169, 1631–1638 (2005).
Zhao, K. et al. Genomic diversity and introgression in O. sativa reveal the impact of domestication and breeding on the rice genome. PLOS ONE. 5, e10780 (2010).
Courtois, B. et al. Genetic diversity and population structure in a European collection of rice. Crop Sci. 52, 1663–1675 (2012).
Castanho, A. et al. Adaptation of the food choice questionnaire using a design thinking approach and application to rice consumption by the major European consumers. Food Qual. Prefer. 110, 104951 (2023).
Directorate-General for Agriculture and Rural Development (European Commission). EU Agricultural Outlook for Markets, Income and Environment 2020–2030 (Publications Office of the European Union, 2020).
Vieira, M. B., Faustino, M. V., Lourenço, T. F. & Oliveira, M. M. DNA-Based tools to certify authenticity of rice varieties: An overview. Foods 11, 258 (2022).
Fridez, F. Basmati rice fraud under the magnifying glass of DNA analysis. CHIMIA 70, 354–354 (2016).
FAO. The Second Report on the State of the World’s Plant Genetic Resources for Food and Agriculture (2010). https://www.fao.org/4/i1500e/i1500e00.htm
The 3,000 rice genomes project. The 3000 rice genomes project. GigaScience 3, 7 (2014).
Second, G. Molecular markers in rice systematics and the evaluation of genetic resources. In Rice (ed. Bajaj, Y. P. S.) 468–494 (Springer, 1991). https://doi.org/10.1007/978-3-642-83986-3_31.
Ge, S., Sang, T., Lu, B. R. & Hong, D. Y. Phylogeny of rice genomes with emphasis on origins of allotetraploid species. Proc. Natl. Acad. Sci. 96, 14400–14405 (1999).
Tam, N. T. et al. Profiling SNP and nucleotide diversity to characterize Mekong delta rice landraces in Southeast Asian populations. Plant. Genome. 12, 190042 (2019).
Reig-Valiente, J. L. et al. Genetic diversity and population structure of rice varieties cultivated in temperate regions. Rice 9, 58 (2016).
Grazina, L. et al. Authentication of Carnaroli rice by HRM analysis targeting nucleotide polymorphisms in the Alk and Waxy genes. Food Control. 135, 108829 (2022).
Jayamani, P., Negrão, S., Brites, C. & Oliveira, M. M. Potential of Waxy gene microsatellite and single-nucleotide polymorphisms to develop Japonica varieties with desired amylose levels in rice (Oryza sativa L). J. Cereal Sci. 46, 178–186 (2007).
Cheng, A., Ismail, I., Osman, M. & Hashim, H. Simple and rapid molecular techniques for identification of amylose levels in rice varieties. Int. J. Mol. Sci. 13, 6156–6166 (2012).
Pereira, C. L. et al. Relationship between physicochemical and cooking quality parameters with estimated glycaemic index of rice varieties. Foods 13, 135 (2023).
Official Journal of the European Union. Council Regulation (EC) No 1785/2003 of 29 September 2003 on the common organization of the market in rice (2003).
Doyle, J. J. & Doyle, J. Isolation of plant DNA from fresh tissue. Focus (Madison). 12, 13–15 (1990).
Datukishvili, N., Gabriadze, I., Kutateladze, T., Karseladze, M. & Vishnepolsky, B. Comparative evaluation of DNA extraction methods for food crops. Int. J. Food Sci. Technol. 45, 1316–1320 (2010).
Andrews, S. FastQC: A Quality Control Tool for High Throughput Sequence Data (2010).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://doi.org/10.48550/arXiv.1303.3997 (2013).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021).
Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at https://doi.org/10.1101/201178 (2018).
Ji, H. et al. Genomic variation in Korean Japonica rice varieties. Genes 12, 1749 (2021).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (2012).
Kolberg, L., Raudvere, U., Kuzmin, I., Vilo, J. & Peterson, H. gprofiler2 -- an R package for gene list functional enrichment analysis and namespace conversion toolset g:Profiler. Preprint at https://doi.org/10.12688/f1000research.24956.2 (2020).
Kolberg, L. et al. g:Profiler—interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Res. 51, W207–W212 (2023).
Subramanian, S., Ramasamy, U. & Chen, D. VCF2PopTree: a client-side software to construct population phylogeny from genome-wide SNPs. PeerJ 7, e8213 (2019).
Letunic, I. & Bork, P. Interactive tree of life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res. 52, W78–W82 (2024).
Yuan, X. et al. Effective identification of varieties by nucleotide polymorphisms and its application for essentially derived variety identification in rice. BMC Bioinform. 23, 30 (2022).
Yadav, R., Khurana, S. & Kumar, S. Comparative study of properties of basmati and non-basmati rice cultivars. Ukr. Food J. 12, 51–64 (2023).
Seo, J. S. et al. OsbHLH148, a basic helix-loop-helix protein, interacts with OsJAZ proteins in a jasmonate signaling pathway leading to drought tolerance in rice. Plant J. 65, 907–921 (2011).
Sihi, S., Bakshi, S., Maiti, S., Nayak, A. & Sengupta, D. N. Analysis of DNA polymerase λ activity and gene expression in response to salt and drought stress in Oryza sativa indica rice cultivars. J. Plant Growth Regul. 41, 1499–1515 (2022).
Mei, Q., Fu, Y. W., Li, T. M. & Xuan, Y. H. Ac/Ds-induced receptor-like kinase genes deletion provides broad-spectrum resistance to bacterial blight in rice. Int. J. Mol. Sci. 23, 4561 (2022).
Takano-Kai, N. et al. Evolutionary history of GS3, a gene conferring grain length in rice. Genetics 182, 1323–1334 (2009).
French, S. R., Abu-Zaitoon, Y., Uddin, M. M., Bennett, K. & Nonhebel, H. M. Auxin and cell wall invertase related signaling during rice grain development. Plants 3, 95–112 (2014).
Sagor, G. H. M., Inoue, M., Kusano, T. & Berberich, T. Expression profile of seven polyamine oxidase genes in rice (Oryza sativa) in response to abiotic stresses, phytohormones and polyamines. Physiol. Mol. Biol. Plants 27, 1353–1359 (2021).
Liu, T., Kim, D. W., Niitsu, M., Berberich, T. & Kusano, T. Oryza sativa polyamine oxidase 1 back-converts tetraamines, spermine and thermospermine, to spermidine. Plant Cell Rep. 33, 143–151 (2014).
Fincato, P. et al. The members of Arabidopsis thaliana PAO gene family exhibit distinct tissue- and organ-specific expression pattern during seedling growth and flower development. Amino Acids 42, 831–841 (2012).
Liu, T., Wook Kim, D., Niitsu, M., Berberich, T. & Kusano, T. Polyamine oxidase 1 from rice (Oryza sativa) is a functional ortholog of Arabidopsis polyamine oxidase 5. Plant Signal. Behav. 9, e29773 (2014).
Steele, K., Tulloch, M. Q., Burns, M. & Nader, W. Developing KASP markers for identification of basmati rice varieties. Food Anal. Methods 14, 663–673 (2021).
Vemireddy, L. R., Archak, S. & Nagaraju, J. Capillary electrophoresis is essential for microsatellite marker based detection and quantification of adulteration of basmati rice (Oryza sativa). J. Agric. Food Chem. 55, 8112–8117 (2007).
Acknowledgements
The authors thank the TRACE-RICE project for supplying the biological material, and Biodata.pt for guidance on data management.
Funding
This work was supported by TRACE-RICE, a PRIMA Programme project with Grant no. 1934 supported under Horizon 2020, the European Union’s Framework Programme for Research and Innovation; and Fundação para a Ciência e a Tecnologia, I.P., through GREEN-IT Bioresources for Sustainability R&D Unit base (DOI: https://doi.org/10.54499/UIDB/04551/2020) and programmatic (DOI: https://doi.org/10.54499/UIDP/04551/2020) funding, LS4FUTURE Associated Laboratory (DOI: https://doi.org/10.54499/LA/P/0087/2020), and the Post-Doc contract awarded to PMB (DOI: https://doi.org/10.54499/DL57/2016/CP1369/CT0029).
Author information
Contributions
All authors contributed to the study’s conception and design. Material preparation and data collection were performed by MBV. Data analysis was performed by MBV and HMR. The first draft of the manuscript and its subsequent revisions were written by all authors. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Rodrigues, H.M., Vieira, M.B., Barros, P.M. et al. Whole-genome polymorphisms and relatedness of rice varieties circulating in the Mediterranean market. Sci Rep 15, 40300 (2025). https://doi.org/10.1038/s41598-025-23999-5
DOI: https://doi.org/10.1038/s41598-025-23999-5