Introduction

Rice (Oryza sativa L.) is an important cereal crop, consumed both as a staple and in processed forms, alongside wheat, maize, and potatoes. High rice consumption in more than 175 countries has led to many international programs aimed at boosting human nutrition through the development of high-quality rice1. Consumption of a single type of grain may result in deficiency disorders due to low concentrations of vital amino acids, minerals, and vitamins2. Pigmented rice is recognized as a source of natural antioxidant compounds, such as flavonoids, γ-oryzanol, tocopherol, and anthocyanin3,4. Epidemiological research has suggested that the low prevalence of some chronic diseases in rice-consuming regions may be due to the antioxidant content of rice5. Phenolic compounds in rice, such as flavonoids, protect cells against oxidative stress6. γ-oryzanol in rice bran is widely acknowledged as a beneficial dietary supplement and medicinal material that maintains and promotes human health7. Tocopherol, also a primary bioactive compound, is a source of vitamin E present in high amounts in rice bran, whole rice, and milled rice8.

Genome-wide association studies (GWASs) are widely used to evaluate hundreds of thousands of variations across numerous genomes and to identify those that are statistically associated with complex traits, such as plant height, flowering time, grain production, and other important agronomic features9,10,11,12. GWAS mapping has also identified significant loci linked to rice grain quality13,14,15 and stress resistance traits16,17,18,19.

In a study by Xu et al.20, a GWAS used single nucleotide polymorphisms (SNPs) from the whole-genome sequences of 120 rice accessions and identified at least four loci significantly associated with antioxidant capacity (2,2′-azino-bis-3-ethylbenzthiazoline-6-sulphonic acid [ABTS] and 2,2-diphenyl-1-picrylhydrazyl [DPPH]), total phenolic content, and total flavonoid content. These loci include the Rd gene (LOC_Os01g44260), which encodes dihydroflavonol-4-reductase, and the Rc gene (LOC_Os07g11020), which encodes a bHLH transcription factor involved in seed proanthocyanidin biosynthesis.

In this research, 233 rice cultivars were evaluated for their antioxidant traits using the ABTS21, DPPH22, ferric reducing antioxidant power (FRAP)23, total flavonoid content (TFC)24, and total phenolic content (TPC)25 assays, and these traits were associated with SNPs to determine the genomic regions important for antioxidant traits. Appropriate primers were then designed and validated in a subset of the GWAS population to demonstrate their potential for the selection of antioxidant traits in rice breeding programs.

Results

Phenotypic description of five antioxidant properties

The phenotypic variations in ABTS, DPPH, FRAP, TFC, and TPC are shown in Fig. 1A and Supplementary Table S1. The nonpigmented rice cultivars (white) were distinct from the pigmented cultivars (red and black) in all antioxidant phenotypes. The principal component analysis (PCA) plot displayed tight clustering of all white-pericarp cultivars indicating little variation. Colored rice cultivars (red and black), on the other hand, were scattered (Fig. 1B), although black and red types were fairly distinct along the PC2 axis.

Fig. 1

Frequency distribution (A), principal component analysis (PCA) plot (B), and heatmap hierarchical clustering plot (C) of antioxidant properties in different pericarp colors. Three clusters are shown. The color of the top bar represents the pericarp color, and the number at the bottom indicates the accession number in this population (list is provided in Supplementary Table S1). Green, white rice cultivar; purple, black rice cultivar; and red, red rice cultivar. Numbers above the box plots indicate the t-test p-values for each pairwise difference in antioxidant values between pigmentation groups.

The ABTS value of white rice ranged from 2.01 to 22.91 mg Trolox equivalent antioxidant capacity (TEAC)/100 g, which was substantially lower than the ranges of red and black rice at 38.22–109.63 and 31.08–107.02 mg TEAC/100 g (p = 1.4 × 10^−14 and 4.8 × 10^−11), respectively. The DPPH values of black, red, and white rice ranged from 19.19–70.23, 9.84–64.06, and 0.76–14.13 mg TEAC/100 g, respectively. The black and red rice varieties ‘Khao Niaw Dam Maw’ and ‘Khao Niaw Daeng Grahm Raed’ had the highest DPPH values (70.23 and 64.06 mg TEAC/100 g, respectively). The FRAP values of black, red, and white rice ranged from 10,187.5–33,931.2, 8,722.7–28,352, and 3,070.3–5,957.3 mg TEAC/100 g, respectively. The mean TFC values of black and red rice were comparable (1184.03 and 1111.03 mg gallic acid equivalent [GAE]/100 g, respectively; p = 0.31) and were significantly higher than that of white rice (322.43 mg GAE/100 g; p = 2.2 × 10^−16 for black rice and p = 1.13 × 10^−9 for red rice). Moreover, the TFC value of the red rice cultivar ‘Hawm Mali Daeng’ (69.96 mg GAE/100 g) was lower than that of all white rice cultivars. The highest TPC values were found in red rice cultivars (476.89–1160.80 mg GAE/100 g), followed by black (356.44–782.22 mg GAE/100 g) and white (60.89–165.33 mg GAE/100 g) rice cultivars. The red rice cultivar ‘Hawm Mali Daeng’ had the highest TPC value among all cultivars (Fig. 1A).

Heatmap and hierarchical clustering analyses were performed based on the antioxidant phenotypes of all cultivars in this study. The cultivars were grouped into three clusters. The rice cultivars with colored seeds were separated into two clusters, A and B, whereas all white rice cultivars were grouped in cluster C (Fig. 1C). Cluster A typically comprised red and black cultivars with high DPPH and FRAP values, whereas cluster B comprised several red and black cultivars with low antioxidant values, mostly in DPPH and FRAP, with a few also having low ABTS, TFC, and TPC values. A list of the pigmented rice in clusters A and B is presented in Table 1. Cluster C was composed of all the white rice cultivars and had antioxidant values that were typically lower than those of clusters A and B. However, a small group within this cluster comprised six white rice cultivars, namely ‘Chai Nat’, ‘RD43’, ‘CH4’, ‘CH1’, ‘Aew Mod Daeng’, and ‘Re Hawm’, with slightly higher DPPH values than the other white rice cultivars.

Table 1 Pigmented rice cultivars in clusters A and B.

Genotypic description of SNP markers

In total, 119,541 high-quality SNP markers from 233 rice accessions were used in the GWAS. These markers were located in the promoters and first five exons of all genes, and excluded variants with a minor allele frequency below 5% and sites with missing data. The distribution and density of the SNPs across the 12 chromosomes are shown in Fig. 2. The total length of the rice genome was ~380 Mbp, with one SNP per 3.18 Kbp on average. Chromosome 11 had the highest number of SNPs (15,809) and the highest SNP density (one SNP per 1.8 Kbp), whereas chromosome 9 had the lowest number of SNPs (5,318) and the lowest SNP density (one SNP per 4.3 Kbp). Details of the distribution and density of the SNPs on all chromosomes are provided in Supplementary Table S2.
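The average spacing quoted above follows directly from the genome length and the SNP count. The short sketch below reproduces that arithmetic; the function name is illustrative, and the implied chromosome lengths are back-calculated from the rounded per-chromosome figures in the text rather than taken from the assembly.

```python
# Average SNP spacing = sequence length / number of SNPs.
# All input numbers are the rounded values quoted in the text.

GENOME_LENGTH_KBP = 380_000  # ~380 Mbp
TOTAL_SNPS = 119_541         # SNPs used in the GWAS


def kbp_per_snp(length_kbp: float, n_snps: int) -> float:
    """Return the average distance between SNPs in Kbp."""
    return length_kbp / n_snps


genome_spacing = kbp_per_snp(GENOME_LENGTH_KBP, TOTAL_SNPS)

# Chromosome lengths implied by the reported counts and rates
# (approximate, because the rates are rounded to one decimal).
chr11_implied_mbp = 15_809 * 1.8 / 1000
chr9_implied_mbp = 5_318 * 4.3 / 1000

print(f"Genome-wide: one SNP per {genome_spacing:.2f} Kbp")
print(f"Chr 11 implied length: ~{chr11_implied_mbp:.1f} Mbp")
print(f"Chr 9 implied length:  ~{chr9_implied_mbp:.1f} Mbp")
```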

Fig. 2

SNP density plot of 119,541 variants in 233 rice accessions within 1 Mb window size. The x-axis represents the length of the chromosome in megabases (Mb), whereas the various colors indicate the densities.

The PCA plot of the SNP variants from red, black, and white pericarp cultivars showed that the genotypes of the different pericarp colors were intermixed (Fig. 3A), indicating recent gene flow between the different phenotypes, although a set of white varieties formed a homogeneous cluster. This finding differs from the PCA plot of the antioxidant phenotypes (Fig. 1B), in which the white rice cultivars were distinctly separated from the colored rice cultivars. The proportion of variance explained was also lower, with only 34.39% in PC1 and 19.36% in PC2. Interestingly, the genotypes of ‘Khao Niaw Dam Maw’, ‘Khao Niaw Dam Chaw Mai Pai’, ‘Saeng Lisaw’ (black rice cultivars), and ‘Ble La’ (red rice cultivar) were separated from the others in the PCA plot. This finding is echoed in the phylogenetic tree based on the unweighted pair group method with arithmetic mean (UPGMA) (Fig. 3B), where these four cultivars form a distinct clade, suggesting divergence from the other cultivars in this study.

Fig. 3

Principal component analysis (PCA) plot of 119,541 SNP markers (A) and phylogenetic tree based on UPGMA (B). The square marks three black rice cultivars and one red rice cultivar that separate from the other rice cultivars in this study.

Detecting genes associated with antioxidant properties using GWAS

A total of 336 quantitative trait nucleotides (QTNs) were found to be associated with the five phenotypic traits linked to antioxidant properties. These QTNs were annotated at 93 loci across the 12 chromosomes. The Manhattan plots indicated that many SNPs were associated with the phenotypes. Several signals passed the threshold under the mixed linear model (MLM) method, which was p < 0.05/119,541 ≈ 4 × 10^−7 (Fig. 4; Supplementary Table S3).
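The threshold above is a Bonferroni-style correction: the nominal significance level divided by the number of SNPs tested. A minimal sketch of that calculation follows; the helper names are illustrative and are not part of any GWAS package.

```python
import math

ALPHA = 0.05       # nominal significance level
N_TESTS = 119_541  # number of SNPs tested


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold after Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value: float, alpha: float = ALPHA,
                   n_tests: int = N_TESTS) -> bool:
    """True if a single-SNP p-value passes the corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(ALPHA, N_TESTS)
neg_log10 = -math.log10(threshold)  # height of the line on a Manhattan plot

print(f"threshold = {threshold:.2e}, -log10(p) = {neg_log10:.2f}")
```

On the −log10(p) scale used by the Manhattan plots, the exact value is about 6.38, close to the rounded figure obtained from 4 × 10^−7.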

Fig. 4

Manhattan plots (left) for the five antioxidant traits: (A) ABTS, (B) DPPH, (C) FRAP, (D) TFC, and (E) TPC. The red dots represent QTNs that pass the threshold of p < 4 × 10^−7 (−log10(p) ≈ 6.39). The bar below represents the density of QTNs on each chromosome. The charts (right) present the quantile–quantile (Q–Q) plots comparing the observed and expected p-values from the association analyses.

As shown in Fig. 4A, most QTNs associated with ABTS were located on chromosome 7. The peak QTN, snp72189, overlapped with LOC_Os07g11020 (the Rc gene), which encodes a bHLH transcription factor regulating proanthocyanidin production in seeds. A few other QTNs were found on chromosomes 2, 3, 6, 10, and 11.

More QTNs were associated with DPPH than with ABTS (Fig. 4B). Several QTNs were annotated as loci on chromosome 3, including LOC_Os03g38330 (disease resistance RPP13-like protein 1) and LOC_Os03g38390 (GDSL-like lipase/acyl hydrolase). Similar to its association with ABTS, LOC_Os07g11020 was the most significant locus. LOC_Os04g47080 (anthocyanin regulatory Lc protein, putative, expressed) had the second highest significance value.

The associations with FRAP showed a trend similar to that of DPPH (Fig. 4C). QTNs were spread across all chromosomes except chromosomes 6, 8, 9, and 10. The most significant SNPs overlapped with LOC_Os07g11050 (IPP transferase), LOC_Os07g11060.1 (expressed protein), and LOC_Os07g11070 (DUF630/DUF632 domain-containing protein), all with the same p-value (p = 4.73 × 10^−20). The SNPs in these loci were also the most significantly associated with TFC (Fig. 4D; p = 1.34 × 10^−19). This indicates that the same genetic architecture may underlie the antioxidant traits detected using FRAP and TFC. The SNPs in LOC_Os07g11020 formed the peak of the TPC association (Fig. 4E), with the lowest p-value across all associations (p = 1.15 × 10^−41). To identify the loci that overlapped with the SNPs consistently found in multiple associations, a Venn diagram was generated (Fig. 5).

Fig. 5

The Venn diagram of co-associated loci in all five phenotypic traits.

Among the 93 loci associated with antioxidant properties, 10 were associated with all phenotypic traits (Table 2, Fig. 5). All 10 co-associated loci are located on chromosome 7. Nine of these were clustered together, whereas LOC_Os07g11650 was located 0.3 Mb away from the others. The lowest p-value (1.16 × 10^−41) was observed for LOC_Os07g11020 in the GWAS of the TPC trait.

Table 2 Ten loci that co-associated with all phenotypic traits based on GWAS.

These 10 loci were considered as common loci involved in antioxidant properties, as they were found in all different traits. Some other loci were considered specific markers for particular antioxidant traits, such as LOC_Os10g30750 encoding Appr-1-p processing enzyme family protein that was only associated with TFC (Fig. 5, Supplementary Table S4). TPC had the most unique loci (26 loci), followed by DPPH (15 loci), and ABTS and FRAP (8 loci in each trait).

Of the 93 genes, 77 (83%) were mapped to previously reported quantitative trait loci (QTLs) for antioxidant traits, covering not only direct antioxidant capacity but also compounds contributing to antioxidant capacity, such as phenolic and flavonoid contents28, phenylpropanoid glycosides30, vitamin E29, steryl ferulates51, anthocyanin, proanthocyanidin32,27, folate50, and tocotrienol49. However, some SNPs on chromosomes 3, 5, and 8 were not consistent with previously reported QTLs. Thirty-two SNPs are located between 21.1 and 21.4 Mb on chromosome 3, covering 15 loci that have not been reported to be involved in antioxidant traits. The loci containing the highest numbers of SNPs are LOC_Os03g38330, encoding disease resistance RPP13-like protein 1; LOC_Os03g38350, encoding an expressed protein; and LOC_Os03g38390, encoding a GDSL-like lipase. None of them has been reported to be associated with antioxidant traits in rice seeds. Three SNPs in two loci at the far end of the long arm of chromosome 5 (position 28 Mb) were detected via association with DPPH and FRAP, whereas a single SNP on the short arm of chromosome 8 (position 3.2 Mb) has likewise not been reported to be related to antioxidant traits. In the two latter cases, the signals are less convincing than the one on chromosome 3 and could be artefacts. In future studies, the loci between 21.1 and 21.4 Mb on the long arm of chromosome 3 should be examined for their roles and functions (Fig. 6).

Fig. 6

Previously reported QTLs are shown as colored bars beside the chromosomes, and the loci detected by GWAS in this study are shown as horizontal black bars within the chromosomes. The red arrow indicates the potential antioxidant-related QTNs that have not previously been reported as QTLs.

Linkage disequilibrium and haplotype analysis

LD analysis was conducted to evaluate the degree of linkage in the associated regions. During meiosis, QTNs in the same block tend to be inherited together, so GWAS mapping may identify false-positive QTNs close to a real QTN. The average LD for SNPs 1000 kb apart was 0.76 (Supplementary Figure S1). In this study, we analyzed LD on chromosomes 4 and 7, particularly around snp40579 and snp72189 (Figs. 7 and 8), since these two QTNs are located within the significant loci LOC_Os04g47080 (Ra gene) and LOC_Os07g11020 (Rc gene), respectively.
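Pairwise LD between two SNPs is commonly summarized as r², the squared correlation between their alleles. The sketch below shows that calculation under two assumptions not stated in the text: that r² is the LD measure reported, and that each inbred accession is homozygous so its alleles can be coded 0 or 1. The function name and the toy data are illustrative only.

```python
from typing import Sequence


def ld_r2(snp_a: Sequence[int], snp_b: Sequence[int]) -> float:
    """r^2 between two biallelic SNPs coded 0/1 per accession (assumed coding)."""
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("SNP vectors must be non-empty and of equal length")
    p_a = sum(snp_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(snp_b) / n  # frequency of allele 1 at SNP B
    # Frequency of accessions carrying allele 1 at both SNPs
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denom


# Toy data: two SNPs always inherited together, and two independent SNPs
linked = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])
unlinked = ld_r2([0, 0, 1, 1], [0, 1, 0, 1])
print(linked, unlinked)
```

An r² near 1 means the two SNPs carry almost the same information, which is why a significant QTN can pull neighboring SNPs in the same block above the threshold.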

Fig. 7

Linkage disequilibrium near snp40579 and distribution of haplotypes for each phenotype. (A) Local Manhattan plot (top) and LD heatmap (bottom). The significant SNPs within the block are shown in red font. (B) Boxplots of the distribution of each haplotype in Block 2 for all antioxidant phenotypes. (C) Sequences of the eight haplotypes in Block 2 with frequencies > 0.01. Numbers on the right-hand side represent the frequency of each haplotype.

Fig. 8

Linkage disequilibrium near snp72189 and distribution of haplotypes for each phenotype. (A) Local Manhattan plot (top) and LD heatmap (bottom). The significant SNPs within the block are shown in red font. (B) Boxplots of the distribution of each haplotype in Block 2 and Block 3 for all antioxidant phenotypes. (C) Sequences of the haplotypes in Block 2 and Block 3 with frequencies > 0.01. Numbers on the right-hand side represent the frequency of each haplotype.

LD analysis on chromosome 4 showed that QTNs surrounding snp40579 in the region between 27.8 and 28.0 Mb (Fig. 7A, Block 2) are inherited together more frequently than would be expected by chance. Moreover, as shown in Fig. 7B and C, there are eight haplotypes with frequencies higher than 0.01 within Block 2. Black rice cultivars were typically found in haplotype 8. Red rice cultivars were found across haplotypes 1, 3, 5, and 6. White rice cultivars were found across all haplotypes except haplotype 8. Several of the remaining haplotypes, with frequencies below 0.01, showed a high correlation with antioxidant levels.

In contrast, LD analysis on chromosome 7 focused on the region between 6.0 and 6.1 Mb surrounding snp72189 (Fig. 8A). This region contained two blocks, Block 2 and Block 3. Block 2 consisted of three haplotypes (two homozygous and one heterozygous) with 12 SNPs each (Fig. 8B and C). Haplotype 1 included cultivars from all pericarp color groups, whereas haplotype 2 was dominated by red and white cultivars. One black and one red rice cultivar had heterozygous genotypes (Fig. 8B). The black rice cultivar ‘Khao Niaw Dam Maw’ was the member of haplotype 1 with the highest antioxidant levels, whereas the red rice cultivar ‘Khao Niaw Daeng Grahm Raed’ had the highest antioxidant levels in haplotype 2. Block 3 contained five haplotypes with frequencies greater than 0.01 (Fig. 8B and C). Black rice cultivars were found in haplotypes 1 and 2, and red rice cultivars in haplotypes 3, 4, and 5, whereas white rice cultivars were spread across haplotypes 1, 2, and 3 (Fig. 8B).

Antioxidant markers were designed based on OsRa and OsRc

The QTN snp40579 is located in LOC_Os04g47080 and overlaps with OsRa34. In four associations (DPPH, FRAP, TFC, and TPC), this QTN had a significantly low p-value (p = 2.31 × 10^−7 to 3.01 × 10^−16). Therefore, this gene was selected for the development of genetic markers. The QTN snp72189 is located in LOC_Os07g11020 and overlaps with OsRc35. This QTN had the most consistent association with all phenotypic traits, with a significantly low p-value (p = 2.26 × 10^−11 to 1.16 × 10^−41). This gene was selected as the second region for the development of genetic markers that can discriminate between rice cultivars with low and high antioxidant levels.

The genetic markers targeted snp40579 (chr4) and snp72189 (chr7). For each SNP, a region of approximately 2.5 Kb on either side was examined to identify insertion-deletions (INDELs) showing an association with antioxidant traits. Marker efficiency was tested on 96 rice cultivars: 44 colored rice and 52 white rice cultivars. The cultivars are listed in Supplementary Table S5.

For snp40579 (chr4), the marker was designed in the region of LOC_Os04g47059, which was previously annotated as OsTT8 (Transparent Testa 8)36, because insufficient INDELs were found in LOC_Os04g47080 (OsRa). OsTT8 is located between OsRa and nearby QTNs in LOC_Os04g47030. The expected sizes of the PCR products were 510 bp for allele 1 (Hom1), in cultivars with low antioxidant levels, and 464 bp for allele 2 (Hom2), in cultivars with high antioxidant levels. An example of the PCR banding patterns is shown in Supplementary Figure S2.

Point-biserial correlation was used to determine the association between the PCR product patterns amplified with the marker in the OsTT8 gene and the values of the five antioxidant properties (ABTS, DPPH, FRAP, TFC, and TPC; Fig. 9; Supplementary Table S5). All correlations showed a positive trend, with R-values ranging from 0.49 to 0.76. The lowest R-value was observed for ABTS, whereas the highest was detected for DPPH. All black rice cultivars possessed the Hom2 allele, whereas all white rice cultivars possessed the Hom1 allele. Nevertheless, several colored rice cultivars with high antioxidant levels behaved the same as white rice cultivars (possessing the Hom1 allele). We found that 15 red rice cultivars had the Hom1 allele (Supplementary Table S5). We assumed that the genetic background of Thai rice cultivars is unique and may not be discriminated using this marker alone. As indicated in the PCA and phylogenetic tree analyses (Fig. 3), the genetic makeup of the colored rice cultivars was mixed with that of the white rice cultivars.
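The point-biserial correlation is the Pearson correlation between a two-level variable (here, the marker allele) and a continuous variable (an antioxidant value). The sketch below illustrates the formula with made-up numbers; the 0/1 coding of Hom1 and Hom2, the function name, and the toy values are assumptions for illustration and are not data from this study.

```python
import math
from typing import Sequence


def point_biserial(alleles: Sequence[int], values: Sequence[float]) -> float:
    """Point-biserial r between a 0/1 allele code and a continuous trait."""
    if len(alleles) != len(values):
        raise ValueError("alleles and values must have the same length")
    group0 = [v for a, v in zip(alleles, values) if a == 0]
    group1 = [v for a, v in zip(alleles, values) if a == 1]
    n0, n1, n = len(group0), len(group1), len(values)
    if n0 == 0 or n1 == 0:
        raise ValueError("both alleles must be present")
    mean_all = sum(values) / n
    # Population standard deviation of the trait over all cultivars
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("trait has zero variance")
    mean0 = sum(group0) / n0
    mean1 = sum(group1) / n1
    return (mean1 - mean0) / sd * math.sqrt(n0 * n1 / n ** 2)


# Hypothetical coding: 0 = Hom1 (510 bp band), 1 = Hom2 (464 bp band)
toy_alleles = [0, 0, 1, 1]
toy_trait = [1.0, 2.0, 3.0, 4.0]  # illustrative values only
r_toy = point_biserial(toy_alleles, toy_trait)
print(f"r = {r_toy:.3f}")
```

A positive value means the cultivars carrying the allele coded 1 have the higher mean trait value, which is the direction reported for both markers.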

Fig. 9

Point-biserial correlation and linear regression of the first marker in the OsTT8 gene with five antioxidant phenotypes: (A) ABTS, (B) DPPH, (C) FRAP, (D) TFC, and (E) TPC. Green, white rice cultivars; purple, black rice cultivars; red, red rice cultivars.

The second marker targeted a position near snp72189 (OsRc). A primer pair was designed to cover the promoter region of OsRc. The expected fragment size amplified from the genomic DNA of white rice was 241 bp and was designated as the Hom3 allele, whereas that from the genomic DNA of colored rice was 216 bp and was designated as the Hom4 allele. The PCR product patterns showing the Hom3 and Hom4 alleles are shown in Supplementary Figure S3.

The point-biserial correlations between alleles and antioxidant traits in the 96 cultivars showed a positive trend, with R-values ranging from 0.35 to 0.70 (Fig. 10A–E). DPPH was the least affected by this second marker, as indicated by an R-value of 0.35. This finding contrasts with that of the OsTT8 marker (R-value = 0.76) (Fig. 9B). The correlation of OsRc with ABTS had the highest R-value (0.70) (Fig. 10A). Surprisingly, the correlation between ABTS and the OsTT8 marker was the lowest among all correlations with antioxidant properties (Fig. 9A). This finding suggests that the antioxidant properties detected by the different methods reflect the different genes involved in antioxidant regulation.

Fig. 10

Point-biserial correlation and linear regression of the second marker in the Rc gene promoter with five antioxidant phenotypes: (A) ABTS, (B) DPPH, (C) FRAP, (D) TFC, and (E) TPC. Green, white rice cultivars; purple, black rice cultivars; red, red rice cultivars.

All the red rice cultivars possessed the Hom4 allele, whereas all the white rice cultivars possessed the Hom3 allele. Similar to the OsTT8 marker, 22 black rice genotypes contained the Hom3 allele. Furthermore, six red rice cultivars possessed the Hom2 (OsTT8 marker) and Hom4 (OsRc marker) alleles (Supplementary Table S5).

Discussion

Thailand is a center of rice diversity with more than 5,000 types of rice37 and, since the green revolution, has been a leading global exporter of rice38. Thailand has four rice production regions based on ecological resource conditions and topographic factors: (1) the northern area, which produces 23% of all paddy rice; (2) the northeastern area, which produces 44% of fragrant rice; (3) the central plain area, which produces 26% of deep-water rice; and (4) the southern area, which is the origin of multiple types of white, red, and black rice39. The pigmented rice cultivars studied here were distributed in the northern, northeastern, and southern parts of Thailand, whereas the white rice cultivars were mostly distributed in the central and eastern regions (Fig. 11). Specific rice genotypes in the highlands of northern Thailand, for instance, have higher monomeric anthocyanin and pigment content than those in the lowlands40. Local high-quality rice varieties have gained market share owing to growing interest both locally and internationally. Cultivars with pigmented pericarps are notable for their potential health benefits and commercialization41.

Fig. 11

Geographic distribution of Thai rice cultivars used in the current study. The cultivar numbers are listed in Supplementary Table S1. The colors of the circles represent the pericarp color: green, white rice; red, red rice; and purple, black rice. N/A indicates that there is no information available regarding the location of the cultivar.

Pigmented rice is widely known to have antioxidant levels superior to those of white rice. However, color variations may affect the antioxidant levels. In this study, the red rice cultivars had significantly higher ABTS and TPC values than the black rice cultivars. In contrast, DPPH values were higher in the black rice cultivars. Meanwhile, FRAP and TFC values did not significantly differ between the red and black rice cultivars. According to Thaworn, et al.42, red Thai rice cultivars have much higher ABTS and TPC values, but the same TFC values as purple rice cultivars. ‘Lai Mak’, ‘Chaw Pli Khao’ (red rice cultivars), and ‘Gam Feuang’ (purple rice cultivar) exhibited the highest antioxidant properties. Higher levels of TPC in red rice than in purple rice were attributed to protocatechuic acid (11.08 and 5.77 mg/100 g) and syringic acid (0.42 and 0.07 mg/100 g)5. Proanthocyanidins may have potent antioxidant activity superior to that of anthocyanins due to the high content of ortho-dihydroxyl groups43. Nevertheless, another study reported that black rice cultivars tended to have the highest values based on DPPH, FRAP, TFC, and TPC, followed by red, purple, and brown rice cultivars5. Consistent with previous studies that reported variation in phenolic acid44, polyphenol45, and phenolic compound46 concentrations, variations in antioxidant properties within the same pericarp type were found in this study.

Variation in metabolites depends on two possible causes: changes in regulatory factors and changes in biosynthetic enzymes. In the Thai rice varieties, LOC_Os07g11020 was the locus most consistently associated with all antioxidant traits. It is also known as the OsRc gene35,47,48, which encodes a basic helix-loop-helix (bHLH) domain regulatory protein and is involved in proanthocyanidin synthesis in the rice pericarp. Previous studies have demonstrated the association of the OsRc gene with pericarp color. Yang, et al.49 reported that a GWAS using the presence-absence and degree of pericarp color methods showed a high density of signals on chromosome 7, with the signal peak on the OsRc gene. Nitika, et al.50 also reported similar GWAS results, in which OsRc was the main gene detected for pericarp color using 1,349,269 SNPs and 325 diverse rice accessions. Moreover, five of the 10 loci that co-associated with all phenotypic traits (LOC_Os07g10970, LOC_Os07g11000, LOC_Os07g11050, LOC_Os07g11060, LOC_Os07g11070) were located in the same LD block as LOC_Os07g11020 / OsRc (Fig. 8A). This finding indicates that nearby loci tend to be inherited along with LOC_Os07g11020 instead of having their own effect.

As shown in Fig. 6, approximately 83% of the QTNs detected by GWAS were mapped to known QTLs reported in several publications, including Sookwong, et al.33, Dong Wei, et al.32, Mahender, et al.51, Kato, et al.7, Xu, et al.30, Nakano, et al.29, Nakano, et al.27, Zhang, et al.28, Tong, et al.31, and Li, et al.26. The findings of our GWAS analysis, therefore, indicate that several of the same QTLs identified in previous studies contributed to antioxidant properties in Thai rice varieties.

The Ra gene has been reported to regulate antioxidant traits in rice. Ra encodes the anthocyanin regulator Lc (LOC_Os04g47080.1), a bHLH transcription factor. GWAS detected this locus in association with all antioxidant traits except ABTS. QTL mapping has indicated that this locus regulates steryl ferulate content29, phenylpropanoid glycosides27, and vitamin E28. Therefore, both OsRa and OsRc detected in this GWAS were consistent with previous reports.

Another gene associated with antioxidants in the present study is OsTT836 (LOC_Os04g47059.1), which is closely linked to OsRa, approximately 40 kb away, and located in the same LD block (Fig. 7A). OsTT8 is also known as OsKala452 and OsB234. This locus is essential for the formation of black-purple pigmentation in the rice pericarp. OsTT8 encodes a bHLH transcription factor that has been well characterized in multiple species as an anthocyanin regulator. Mutations that inactivate OsTT8 are recessive and affect anthocyanin production, resulting in transparent or chalky white grains, hence the name "Transparent Testa 8" (TT8)53,54,55.

Based on the point-biserial correlation, the OsTT8 marker was strongly associated with black rice cultivars because black rice varieties always display the Hom2 allele, while red rice cultivars possess either the Hom1 or Hom2 allele. Compared with the other phenotypic traits, DPPH was most positively correlated with this region (R-value = 0.763). Ghasemzadeh, et al.56 reported that the DPPH radical scavenging activities of the free and bound fractions in black rice were higher than those in red and brown rice bran. Moreover, Laokuldilok, et al.57 reported that a higher amount of anthocyanin was found in black bran (1135–2562 mg/g), followed by red bran (188 mg/g), and that it was absent in normal bran. The black rice cultivars also contain higher γ-oryzanol (2483–4057 mg/g) and α-tocopherol (24–27 μg/g) levels compared with red rice cultivars (1859 mg/g and 16 μg/g, respectively).

In contrast, the marker on the Rc promoter was significantly associated with red rice cultivars because the red rice genotypes consistently possessed the Hom4 allele, as opposed to black rice cultivars (possessing the Hom3 or Hom4 allele). OsRc encodes a bHLH transcription factor regulating proanthocyanidin production in seeds. The phenotypic traits ABTS and TPC significantly correlated with this region (R-value = 0.704 and 0.685, respectively). This was confirmed by the phenotypic distribution shown in Fig. 1, which demonstrated that red rice cultivars had significantly higher ABTS and TPC values than black rice. Our results are similar to those of Thaworn, et al.42, who reported that the TPC value in local Thai rice was higher in red rice than in black rice cultivars. However, contrasting findings reported by Wisetkomolmat, et al.37 showed that the ABTS, DPPH, FRAP, TFC, and TPC levels of black rice cultivars (‘Bien Koo 5 CMU’, ‘K 4 CMU’, ‘Kum Akha 1 CMU’, ‘Kum Doi Saket’, ‘Kum Chao Morchor 107’, ‘Sang 5 CMU’, and ‘Pieisu 1 CMU’) were higher than those of a red rice cultivar (‘Yamuechaebia 3 CMU’). Finally, FRAP and TFC appeared to positively correlate with both markers in the black and white cultivars.

In conclusion, our investigation of 233 Thai rice cultivars revealed that those with pigmented seeds display higher antioxidant content. Furthermore, much of the variation in antioxidant content depends on a few regulatory and biosynthetic loci. We did, however, identify a novel QTL on chromosome 3 that deserves further characterization. We also identified a small and distinct clade of four pigmented varieties that suggests selection for specific properties, likely including anthocyanin pigments.

Materials and methods

Plant materials

A collection of 233 accessions was obtained from the Center of Excellence in Environment and Plant Physiology (Department of Botany, Chulalongkorn University, Thailand), as previously described14. The accessions included 191 white, 20 red, and 22 black pericarp cultivars (Supplementary Table S1). All rice varieties were planted in the field simultaneously in 2017, and the seeds were collected to determine antioxidant levels using ABTS, DPPH, FRAP, TFC, and TPC assays.

One rice population consisting of 52 white rice cultivars and 44 pigmented rice cultivars was chosen for PCR validation. DNA samples were obtained from the shoot tissue of seedlings at the 2-week-old stage. DNA samples were used to validate the designed genetic markers via PCR.

Determination of total antioxidant capacity, TFC, and TPC

The antioxidant capacities of rice extracts were determined using the ABTS, DPPH, and FRAP assays21,22,23. Results are expressed as the TEAC per 100 g of dry weight whole-grain rice. TFC was calculated according to Shen et al.46, and TPC was calculated according to Zhang et al.25. The results are expressed as mg GAE per 100 g of dry weight whole-grain rice. The measured values for these antioxidant assays are listed in Supplementary Table S1.

Box plots were used to illustrate the distribution of antioxidant phenotypes among white, red, and black rice cultivars. Student’s t-test was used to assess the significance of differences in antioxidant levels among the rice pigmentation groups. The first two principal components of the PCA were calculated using the prcomp() function in R and visualized using the ggplot2 package58 to represent the phenotypic variation. A heatmap with a hierarchical clustering plot was constructed using the R package “pheatmap”59 based on Euclidean distances to identify clustering patterns in the rice populations based on their antioxidant properties.
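For readers unfamiliar with the pairwise comparison, the sketch below computes the classical Student's t statistic with a pooled variance for two groups. It is an illustration only: the text does not state whether equal variances were assumed (R's t.test() applies the Welch correction by default), and the function name and toy data are not from this study.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def student_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    # Pooled sample variance (equal-variance assumption)
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    if pooled == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df


# Toy example with illustrative values only
t_stat, dof = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t = {t_stat:.3f}, df = {dof}")
```

The p-value is then obtained from the t distribution with the returned degrees of freedom, for example with a statistics library.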

Thai rice genome sequencing

Genotype data consisting of 345,448 SNPs were obtained from the whole-genome sequences of 233 local Thai rice accessions. The genomes of these 233 accessions were sequenced using the Illumina NovaSeq system (Illumina, Inc., San Diego, CA, USA). The whole-genome sequencing data were of high quality, with a mean coverage of 92.5%, an average sequencing depth of 37.46x, a mean base quality score of 36.35, and a mean mapping quality score of 41.61, supporting robust and accurate variant calling across the genome.

Variant calling and preparation for GWAS

Burrows-Wheeler Aligner (BWA) version 0.5.7–160 was used to align raw reads against the genome assembly IRGSP-1.0. Variants were called following the Genome Analysis Toolkit (GATK) best-practices pipeline using GATK version 3.3–061. Variants with minor allele frequencies below 5% were excluded, and all sites with missing data were filtered out, resulting in a missing site proportion of 0. Variants were filtered based on the following criteria: for a heterozygous call, the minimum coverage was set to 5, with each of the two major alleles comprising at least 20% of the reads, and a minimum total coverage of 10 was required. For a homozygous call, a minimum coverage of 6 was used, or a minimum of 3 if at least 10 accessions had positions with coverage of 6. Multiallelic variants were discarded. Variants from the first five exons of each gene were selected for GWAS analysis to enhance the detection of causative associations within genes and to increase the likelihood of detecting significant associations, thereby optimizing the power of the study given the sample size. The density of SNPs on each chromosome was then visualized using the R package ‘CMplot’62. A PCA was also performed to illustrate population stratification within the rice population using the R function prcomp() and the package ‘ggplot2’58. A phylogenetic tree was calculated based on UPGMA using VCF2PopTree63 to show the genotypic relationships among cultivars. Some cultivars were highlighted as potential parental lines for further rice breeding programs based on their antioxidant levels and genetic backgrounds.
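
The minor-allele-frequency and missing-data filters described above can be sketched as follows. This is an illustration only and not the pipeline used in the study: the genotype encoding (alternate-allele counts 0, 1, or 2 per diploid accession, with `None` for a missing call) and the function names are assumptions made for the example.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency at a biallelic site.

    genotypes: alternate-allele counts (0, 1, or 2), one per diploid
    accession, with no missing calls (assumed encoding).
    """
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def keep_site(genotypes, maf_min=0.05):
    """Keep a site only if it has no missing calls and its MAF is not below maf_min."""
    if any(g is None for g in genotypes):
        return False  # any missing call removes the site
    return minor_allele_frequency(genotypes) >= maf_min


# Hypothetical sites: common variant, rare variant, site with a missing call
common = [0, 0, 2, 2, 1, 0]
rare = [0] * 39 + [1]
incomplete = [0, None, 2, 2]
kept = [keep_site(site) for site in (common, rare, incomplete)]
```

Here only the first site is retained: the second has a minor allele frequency of 1/80, and the third contains a missing call.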

Identification of genes associated with antioxidant properties using GWAS

GWAS was performed to determine the association between SNP markers and the five antioxidant traits. GWAS was conducted in GEMMA v0.96 using an MLM64, which treats the SNP markers as fixed effects and phenotypic traits, kinship matrix (K), and PCA for population structure as random effects. Bonferroni correction (0.05/m; m = number of SNP markers) was used for the single-locus method; the threshold for association was corrected to 4.18 × 10^−7 for the GWAS with 119,541 SNPs. Manhattan plots were generated using the R package ‘CMplot’62. SNPs with a p-value below the cutoff threshold were selected as significant SNPs and used for annotation analysis against the Rice Genome Annotation Project database (http://rice.uga.edu)65. The candidate loci were then obtained from the QTN positions that either fell within a gene or were located within 2.5 kb of a gene, a window chosen based on the LD decay.
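
The Bonferroni threshold quoted above follows directly from 0.05/m with m = 119,541. The short sketch below reproduces that arithmetic and applies the cutoff to a list of p-values; the function names, SNP identifiers, and p-values are hypothetical and serve only to illustrate the calculation.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def significant_snps(p_values, threshold):
    """Return the IDs of SNPs whose p-value is below the threshold."""
    return [snp for snp, p in p_values.items() if p < threshold]


threshold = bonferroni_threshold(0.05, 119_541)
print(f"{threshold:.2e}")                  # 4.18e-07
print(round(-math.log10(threshold), 2))    # 6.38, the cutoff on a Manhattan plot's -log10(p) axis

# Hypothetical association results (SNP ID -> p-value)
example = {"snp_a": 1.0e-9, "snp_b": 3.0e-6, "snp_c": 4.0e-7}
hits = significant_snps(example, threshold)
```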

The potential genes responsible for antioxidant traits were determined via functional annotation and comparison with previously reported QTLs underlying antioxidant traits. The Conserved Domain Database tool (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) of the National Center for Biotechnology Information was used to predict the conserved domains of each candidate gene66. The reported QTLs associated with antioxidant properties were collected from publications over the last 10 years, and their physical locations were retrieved from the Gramene database (https://archive.gramene.org/markers/). They were subsequently mapped together with the candidate genes using the Chromosome Map Tool (http://viewer.shigen.info/oryzavw/maptool/MapTool.do)67,68. A Venn diagram was generated using the Draw Venn Diagram tool (https://bioinformatics.psb.ugent.be/webtools/Venn/) to identify the loci that were co-associated with all the traits.

Linkage disequilibrium analysis

The linkage disequilibrium (LD) decay was plotted to visualize how the association between alleles at different loci decreases with increasing distance between those loci. Pairwise LD values between markers were calculated as the squared correlation coefficient (r²) using PLINK 1.969. LD decay was then estimated by dividing the marker pairs within a 1000-kb region into 125-kb bins and visualized using ggplot258.
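
The two steps described above, computing pairwise r² and averaging it within distance bins, can be sketched as follows. This is a simplified stand-in for the PLINK and ggplot2 workflow, not a reimplementation of it: it assumes genotypes encoded as allele counts (0, 1, 2), polymorphic sites only, and hypothetical function names and input values.

```python
from collections import defaultdict
from statistics import mean


def r_squared(geno_x, geno_y):
    """Squared Pearson correlation between allele counts at two sites.

    Both sites are assumed to be polymorphic (non-zero variance).
    """
    mx, my = mean(geno_x), mean(geno_y)
    cov = sum((a - mx) * (b - my) for a, b in zip(geno_x, geno_y))
    var_x = sum((a - mx) ** 2 for a in geno_x)
    var_y = sum((b - my) ** 2 for b in geno_y)
    return cov * cov / (var_x * var_y)


def ld_decay(pairs, bin_size=125_000, max_dist=1_000_000):
    """Average r2 per distance bin.

    pairs: iterable of (distance in bp, r2) for marker pairs.
    Returns {bin start in bp: mean r2}; pairs at or beyond max_dist are ignored.
    """
    bins = defaultdict(list)
    for dist, r2 in pairs:
        if dist < max_dist:
            bins[(dist // bin_size) * bin_size].append(r2)
    return {start: mean(values) for start, values in sorted(bins.items())}


# Hypothetical marker pairs: (distance in bp, r2)
decay = ld_decay([(1_000, 0.8), (2_000, 0.6), (130_000, 0.2), (1_500_000, 0.1)])
```

With a 1000-kb window and 125-kb bins, this yields at most eight points for the decay curve, one per bin.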

LD-block analysis was performed for the regions with high association signals in the Manhattan plot. The SNPs with the lowest p-values were selected from the corresponding chromosome, and the region was extended upstream and downstream until it reached another LD block. The LD blocks were then visualized using Haploview70. The phenotype distribution of each haplotype from the selected block was then plotted as a box plot.

Development of genetic markers in local Thai rice cultivars

Genetic markers were developed to highlight the presence of INDELs in the candidate genes of rice cultivars with different antioxidant levels and pigmentation. Primer pairs were designed using our in-house bash script run in a Linux environment. Each primer pair had to amplify a fragment of a candidate gene containing all INDELs, produce a size difference of at least 50 bp, overlap QTNs from any association, and contain no heterozygous alleles. These genetic markers were validated using 52 white and 44 pigmented rice cultivars (23 black and 21 red cultivars), a subset of the GWAS population that spans high and low antioxidant levels (Supplementary Table S5). The varying PCR amplicon sizes represent the presence of INDELs in each marker. Finally, the point-biserial correlation was used to evaluate the correlation between antioxidant levels in rice and genotypic differences in the markers. The primers used were as follows: for the first marker (OsTT8), Fwd, CGATCATCACTAGGCGATTG and Rev, CATGGCATCGTCCCTTATG; and for the second marker (promoter region of OsRc), Fwd, AAGAATAAAAACCGTGCTGACG and Rev, GGATGATCGAGATCGATAAGTAAAG.
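
The point-biserial correlation mentioned above relates a binary variable (here, the marker genotype) to a continuous one (the antioxidant level). A minimal sketch is given below; the 0/1 coding of the two amplicon sizes, the function name, and the example values are assumptions for illustration and are not taken from Supplementary Table S5.

```python
import math
from statistics import mean, pstdev


def point_biserial(marker, values):
    """Point-biserial correlation between a binary marker and a trait.

    marker: 0/1 codes, one per cultivar (assumed coding of the two alleles).
    values: continuous trait values in the same order.
    """
    group1 = [v for m, v in zip(marker, values) if m == 1]
    group0 = [v for m, v in zip(marker, values) if m == 0]
    n1, n0, n = len(group1), len(group0), len(values)
    # Difference of group means, scaled by the population standard deviation
    # of all values and by the proportion of cultivars in each group
    return (mean(group1) - mean(group0)) / pstdev(values) * math.sqrt(n1 * n0 / n**2)


# Hypothetical data: marker allele per cultivar and its antioxidant level
marker = [0, 0, 1, 1]
antioxidant = [1.0, 2.0, 3.0, 4.0]
r_pb = point_biserial(marker, antioxidant)
```

The result is numerically identical to the Pearson correlation computed with the marker coded as 0 and 1, so a value near 1 or -1 indicates that the amplicon size separates high and low antioxidant cultivars well.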