Application of machine learning and genomics for orphan crop improvement

MacNish, Tessa R.; Danilevicz, Monica F.; Bayer, Philipp E.; Bestry, Mitchell S.; Edwards, David

doi:10.1038/s41467-025-56330-x

Review Article
Open access
Published: 24 January 2025

Nature Communications volume 16, Article number: 982 (2025)

Abstract

Orphan crops are important sources of nutrition in developing regions, and many are tolerant to biotic and abiotic stressors; however, modern crop improvement technologies have not been widely applied to orphan crops due to the lack of available resources. There are orphan crop representatives across major crop types, and the conservation of genes between these related species can be used in crop improvement. Machine learning (ML) has emerged as a promising tool for crop improvement. Transferring knowledge from major crops to orphan crops, combined with ML to improve accuracy and efficiency, can help improve orphan crops.

Introduction

Orphan crops, also known as “minor”, “neglected”, “underutilised”, or “understudied” crops, are frequently grown in developing countries and are an important source of nutrition for local communities (Table 1)^1,2,3,4,5,6. Many of these crops have not benefited from the Green Revolution that improved the productivity of major crops such as wheat and rice. Modern crop improvement techniques, such as marker-assisted breeding (MAB) and genome editing, have not been widely applied to orphan crops due to a lack of resources. However, genomic technologies offer significant potential for orphan crop improvement. Major crops such as wheat, rice, maize, and soybean are widely distributed and produced on an industrial scale, whereas orphan crops vary considerably in their production, from the reasonably wide distribution of sorghum to crops such as Ensete that are produced only in specific regions. Many orphan crops are tolerant of abiotic and biotic stressors and can be produced in marginal and harsh environments, and they possess traits that, if understood, may be transferable to major crops.

Table 1 Production of representative orphan crops

There are currently relatively few genomic resources for orphan crops, though initiatives such as the African Orphan Crops Consortium (AOCC)⁷ and Crops for the Future (CFF)⁸ are working towards their improvement. However, much work is still needed to translate these resources into improved crops. There are orphan crop representatives across major crop types, and the relatedness between orphan and major crops can be used to improve breeding efforts in both. Major crops have a wide range of resources available, and these can be transferred to related orphan crop species. Orthologs of agronomically important genes from major crops, for traits such as stress tolerance, have been found in orphan crops⁹, and the conservation of gene function between major and orphan crops can support their improvement. Similarly, knowledge of novel beneficial genes in orphan crops could be used to enhance traits in major crops. A major challenge in crop improvement is analysing the continually growing volume of data. Machine learning-based methods are starting to be applied to crop improvement (Fig. 1), and these will have direct applications for orphan crop improvement as well as for translating knowledge from major crops to orphan crops.

**Fig. 1: The proportion of articles available on Europe PMC for the search terms ‘machine learning crops’ and ‘orphan crops’ between 2008 and 2024.**

In the past 100 years, large gains in crop yield have been made possible by the introduction of statistical methods into plant breeding. R. A. Fisher pioneered statistical methods such as analysis of variance (ANOVA) and randomised controlled trials in plant breeding^10,11. Since then, statistical approaches have been at the core of plant breeding, leading to unprecedented increases in crop yields. These methods include ridge regression (RR), best linear unbiased prediction (BLUP), and BLUP variants such as genomic BLUP (GBLUP), all of which fall under the broader category of genomic selection. However, yields are not keeping pace with population growth and the threat of climate change. To ensure sufficient food production in a warmer world, modern approaches such as CRISPR genome editing and machine learning are needed. Machine learning (ML) is a set of methods that use large amounts of data to approximate mathematical functions. Deep learning (DL), a subset of ML, uses artificial neural networks with many layers to “learn” mathematical functions from training data. ML’s ability to identify complex patterns within large and diverse datasets, from images and genomic data to tabular data, makes it a powerful tool for improving trait prediction accuracy and crop breeding efficiency^12,13,14,15 (Fig. 2).
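Because RR-BLUP underpins several of the benchmarks discussed below, a minimal sketch may help. It treats genomic prediction as ridge regression of phenotypes on a marker matrix: every SNP receives a small additive effect, and the penalty `lam` shrinks all effects towards zero. The sketch assumes 0/1/2 genotype coding, centres the markers, and fixes `lam` by hand; a full RR-BLUP analysis would instead estimate it as the ratio of residual to marker variance in a mixed model (for example with the rrBLUP R package). The function names and toy data are illustrative.

```python
"""Ridge regression on SNP markers: a sketch of the idea behind RR-BLUP.

Assumptions (illustrative only): genotypes are coded 0/1/2, and the shrinkage
parameter `lam` is fixed by hand instead of being estimated from variance
components as in a full mixed-model analysis.
"""


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(genotypes, phenotypes, lam=1.0):
    """Return (mean phenotype, marker means, marker effects).

    Minimises ||y - mu - Xc b||^2 + lam * ||b||^2, where Xc is the centred
    marker matrix; centring leaves the intercept mu unpenalised.
    """
    n, p = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    xc = [[row[j] - means[j] for j in range(p)] for row in genotypes]
    mu = sum(phenotypes) / n
    yc = [y - mu for y in phenotypes]
    # Normal equations: (Xc'Xc + lam * I) b = Xc'yc
    lhs = [[sum(xc[i][j] * xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    rhs = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return mu, means, solve(lhs, rhs)


def predict(model, genotypes):
    """Predicted phenotypes: mean phenotype plus the sum of marker effects."""
    mu, means, effects = model
    return [mu + sum((g - m) * b for g, m, b in zip(row, means, effects))
            for row in genotypes]


if __name__ == "__main__":
    # Toy data: 5 individuals x 3 SNPs; phenotypes built from effects 0.5, -0.3, 0.8.
    train_x = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2], [1, 0, 1]]
    train_y = [4.3, 3.2, 4.8, 4.0, 4.3]
    model = fit_ridge(train_x, train_y, lam=0.1)
    print([round(b, 3) for b in model[2]])
    print(predict(model, [[2, 2, 0]]))
```

Increasing `lam` shrinks the estimated marker effects more strongly, which is what makes ridge-type models stable when there are many more SNPs than phenotyped individuals.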

**Fig. 2: Machine learning models can analyse genome sequencing and related datasets to generate various predictions.**

The success of ML has been facilitated by an explosion of available data, driven by the ever-decreasing cost of genome sequencing ($200 per human genome, counting sequencing costs only). Other drivers are the increased availability of compute power in the form of accessible, ML-specialised GPUs, high-performance computing centres, and cloud computing. Together, these have established ML as a group of tools in genomics and crop improvement.

One area where ML is having an impact in crops is marker-assisted breeding (MAB)^16,17, where ML can link phenotypes of agronomic interest with molecular genetic markers that can then be used to accelerate breeding. Another is yield prediction, where several studies have evaluated the accuracy of different ML architectures across datasets for predicting crop yield¹⁸. When combined with CRISPR genome editing¹⁹, ML can be used both to identify potentially favourable modifications and to design accurate single guide RNAs (sgRNAs) with few off-target effects²⁰.

The knowledge gained by applying these methods to major crops will also assist in the improvement of orphan crops, and vice versa. In this Review, we will discuss the potential of using ML for orphan crop improvement. We will highlight how ML can improve the knowledge available for orphan crops, find similarities between major and orphan crops, and transfer knowledge from major crops to orphan crops.

Machine learning applications for crop improvement

Machine learning has been extensively applied in crop improvement, with hundreds of publications ranging from identifying markers for MAB to using image recognition for accurate phenotyping and disease recognition (Table 2)^19,20,21,22,23,24,25,26,27,28,29,30,31,32. Predicting phenotypes has been one of the main applications of machine learning. One of the earliest examples of yield prediction using ML is from 2008, when bread wheat field measurements were used in a simple artificial neural network to predict yield across seasons³³. Later studies used different variables to predict yield with ML, such as transplanting parameters in rice³⁴, irrigation and evaporation parameters in sugarcane³⁵, or soil and irrigation data in wheat, barley, and canola across years and locations³⁶. All these examples use environmental data but do not include information about the genetic composition of the crops.

Table 2 Different applications of machine learning in crops

Some studies have used genetic data alone to predict yield directly. One of the earliest examples is DeepGS, a convolutional neural network (CNN) that predicts phenotypes from genotype data, complementing the widely used RR-BLUP²⁶. Other DL architectures have been used successfully to predict mixed phenotypes (binary, ordinal, continuous) from genotypes in bread wheat²⁷, and to predict phenotypes while incorporating data from multiple environments³⁷. However, benchmarks reveal that DL on its own usually performs similarly to traditional genomic selection approaches, with ensembles combining several models showing the highest prediction accuracy³⁸. A similar benchmark in soybean revealed that tree-based ML approaches such as XGBoost and Random Forests outperformed DL approaches for 13 of 14 phenotypes²⁵, indicating that DL may not be the best ML approach for plant phenotype prediction.
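The ensemble idea can be illustrated with a short sketch: predictions from several models are averaged for each individual, and accuracy is measured as the Pearson correlation between predicted and observed values, a common metric in genomic prediction. The model outputs below are invented toy numbers chosen so that the two models' errors partly cancel; with real data the benefit depends on how correlated the models' errors are, and the weights would normally be tuned on held-out data.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ensemble_predict(predictions_by_model, weights=None):
    """Weighted average of per-individual predictions (equal weights by default)."""
    k = len(predictions_by_model)
    weights = weights or [1.0 / k] * k
    n = len(predictions_by_model[0])
    return [sum(w * preds[i] for w, preds in zip(weights, predictions_by_model))
            for i in range(n)]


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0, 5.0]
    model_a = [1.5, 1.8, 3.4, 3.6, 5.2]  # e.g. a tree-based model (hypothetical output)
    model_b = [0.6, 2.3, 2.7, 4.5, 4.7]  # e.g. a neural network (hypothetical output)
    combined = ensemble_predict([model_a, model_b])
    for name, preds in [("A", model_a), ("B", model_b), ("ensemble", combined)]:
        print(name, round(pearson(preds, observed), 3))
```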

Genomic data have been successfully combined with environmental data to improve prediction accuracy. Kick et al.³⁹ used genetic data, environmental measurements, and recorded management interventions to predict maize yields, finding that DL models performed similarly to, but more consistently than, BLUP models. Måløy et al.⁴⁰ evaluated the then-novel Performer DL architecture using SNPs and environmental data to predict barley yield across locations and years; it outperformed other DL architectures and Bayesian approaches. Li et al.⁴¹ assessed transfer learning by pre-training DL models on genomic and non-yield phenotypic data in maize, rice, and wheat. The pre-trained layers were then fine-tuned for yield prediction, outperforming established DL and RR-BLUP approaches. Image-based phenotyping data, including drone imagery, are commonly used in conjunction with genetic data to predict yield. In maize, Danilevicz et al.⁴² combined multispectral imagery with genotyping data to identify high-performing varieties in the field. More recent research focuses on multimodal models, as integrating multiple data types has generally outperformed single-modality models⁴³.
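The simplest way to combine data types, as in the multimodal studies above, is early fusion: standardise each modality so that features on different scales are comparable, then concatenate them into one feature vector per plant before fitting a single model. The sketch below assumes SNP dosages plus a few environmental covariates (the covariates are hypothetical); published multimodal models often use more elaborate intermediate or late fusion instead.

```python
def standardise(rows):
    """Z-score each column; constant columns become zeros instead of dividing by zero."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [(sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5 for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0 for j in range(p)]
            for r in rows]


def early_fusion(*modalities):
    """Concatenate standardised feature blocks describing the same plants in the same order."""
    n = len(modalities[0])
    if any(len(m) != n for m in modalities):
        raise ValueError("every modality needs one row per plant")
    blocks = [standardise(m) for m in modalities]
    return [sum((block[i] for block in blocks), []) for i in range(n)]


if __name__ == "__main__":
    snps = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]             # allele dosages per plant
    env = [[512.0, 24.1], [430.5, 26.3], [601.2, 22.8]]  # e.g. rainfall (mm), mean temp (C)
    for row in early_fusion(snps, env):
        print([round(v, 2) for v in row])
```

The fused rows can then be passed to any single model, such as the ridge regression or an MLP.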

Once phenotype prediction accuracy has been established, ML can be employed to identify quantitative trait loci (QTLs) or genes underlying traits of interest. An early example in rice used markers associated with yield in genome-wide association studies (GWAS), together with approaches ranging from RR-BLUP to Random Forest, to predict yield, and showed that these methods outperformed established pedigree-based approaches²⁹. In soybean, predicting yield from genotypic data using XGBoost led to the identification of SNPs linked with prediction accuracy, and these SNPs overlapped with markers previously associated with yield²⁵. Liu et al.²⁸ trained a convolutional neural network to predict yield from soybean SNPs and then drew saliency maps to identify the genomic regions with the strongest impact on phenotype prediction; all identified regions overlapped with GWAS-identified SNPs. Another approach, PlantMine, identified SNPs associated with prediction accuracy using XGBoost and then used these ‘core’ SNPs to reduce noise in genomic prediction algorithms⁴⁴. In one interesting approach, nitrogen-use efficiency genes were identified using RNA-Seq and then ranked with an XGBoost model trained on expression levels to identify candidate genes and transcription factors. These genes were functionally validated and are now available for breeding improved nitrogen-use efficiency in maize⁴⁵. Machine learning is now at the core of crop breeding in companies, leading to improved breeding pipelines and reduced costs; one example is an AI assistant that helps breeders select the best breeding candidates⁴⁶.
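The studies above ranked SNPs using XGBoost feature importance or saliency maps. A model-agnostic alternative, shown here as a swapped-in technique rather than the cited authors' method, is permutation importance: shuffle one SNP column at a time and record how much prediction accuracy (here the Pearson correlation) drops. SNPs whose shuffling hurts accuracy most are candidates for follow-up. The stand-in predictor and toy genotypes are assumptions for illustration.

```python
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_importance(predict, genotypes, phenotypes, n_repeats=20, seed=0):
    """Mean drop in Pearson r when each SNP column is shuffled across individuals."""
    rng = random.Random(seed)
    baseline = pearson([predict(row) for row in genotypes], phenotypes)
    importances = []
    for j in range(len(genotypes[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in genotypes]
            rng.shuffle(column)  # break the link between SNP j and the phenotype
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(genotypes, column)]
            drops.append(baseline - pearson([predict(r) for r in permuted], phenotypes))
        importances.append(sum(drops) / n_repeats)
    return importances


def fitted_model(row):
    """Stand-in for a trained model in which only SNP 0 affects the trait."""
    return 2.0 * row[0]


if __name__ == "__main__":
    genotypes = [[0, 1], [1, 0], [2, 2], [0, 2], [1, 1], [2, 0]]
    phenotypes = [2.0 * g[0] for g in genotypes]
    print(permutation_importance(fitted_model, genotypes, phenotypes))
```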

For ML to have a significant practical impact on plant breeding, training programs are essential. ML practitioners in plant breeding operate at the intersection of bioinformatics, plant biology, and breeding. They require a unique combination of skills and experience, including computational abilities, domain expertise, and proficiency in experimental design. Similar recommendations have been made previously in the field of plant breeding⁴⁶. However, training opportunities for this specific skill set are currently limited.

Machine learning applications for orphan crops

There are two main approaches for crop trait prediction using ML: image-based models and genomics-based models, with some studies combining these in an ensemble approach^12,14. While genomic and image data are increasingly abundant for the major crops, they are rarely available for orphan crops. Publicly available orphan crop data sources include the Online Resource for Community Annotation of Eukaryotes (ORCAE)⁴⁷, a metabolomics database for roots, tubers, and bananas⁴⁸, and a collection of 26 transcriptomes for orphan crops and their wild relatives⁴⁹. ORCAE hosts the genomes and annotations of the orphan crops assembled by the AOCC⁴⁷, and these genomes could be used to construct genomics-based ML models for orphan crop trait prediction where suitable phenotype data are available. These models could be complemented by intermediate phenotypes, for example transcriptomic or metabolomic data^48,49.

There are currently no public databases hosting orphan crop images that could be used for image-based ML models. However, two studies have applied ML to orphan crops for trait prediction using the limited data available^50,51. Nazari and colleagues⁵² developed a DL model to predict the quality traits protein, tannin, and total phenolic content (TPC) in sorghum. Determining chemical content through conventional laboratory tests is expensive and time consuming, so Nazari et al.⁵² developed an efficient and cost-effective method to predict chemical composition from images using DL. The grains of ten sorghum lines were harvested at maturity, and the protein, tannin, and TPC content of 100 g of each line was measured using conventional laboratory tests. The remaining grains were photographed on a black background under consistent lighting, and the colours within each photograph were analysed to derive texture variables. The texture variables and the laboratory-measured protein, tannin, and TPC contents for all ten lines were then used to train a multilayer perceptron (MLP) model for trait prediction. An MLP is a type of artificial neural network made up of an input layer, an output layer, and, in the simplest form described here, a single hidden layer between them. The hidden layer is where the model identifies patterns within the data, and these patterns are then used to predict the output. The model learns by adjusting the weighted connections between nodes in adjacent layers, which are loosely inspired by neurons in the human brain. Nazari and colleagues⁵² found significant differences in protein, tannin, and TPC content between the sorghum lines, and the contents measured in the laboratory and predicted by the DL model had a correlation greater than 0.9 for each trait. Another study used near-infrared reflectance spectroscopy (NIRS) and DL to predict quality traits in the orphan crop Perilla⁵³.
The DL models had high prediction accuracy, with R² values of 0.83, 0.92, 0.78, and 0.82 for the biochemical traits ash, protein, total soluble sugar, and phenol content, respectively. By combining NIRS and ML, the authors developed a cost-efficient and accurate method for predicting nutritional content across Perilla germplasm. As knowledge of orphan crops grows, more studies could use ML to predict such traits efficiently (Fig. 3). However, the limited quantity of public data highlights the need to establish and support databases of image and genomic data for orphan crops that could be used for ML-based trait prediction.
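To make the MLP description concrete, here is a minimal sketch of a one-hidden-layer network trained by stochastic gradient descent on squared error. It is not Nazari et al.'s model: the layer sizes, learning rate, tanh activation, and the toy "texture feature" inputs are all illustrative assumptions. It shows the input, hidden, and output layers described above and how connection weights are adjusted from prediction errors.

```python
import math
import random


class TinyMLP:
    """Input layer -> one tanh hidden layer -> single linear output (regression)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return (hidden activations, prediction) for one input vector."""
        hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w1, self.b1)]
        return hidden, sum(w * h for w, h in zip(self.w2, hidden)) + self.b2

    def train_step(self, x, y, lr=0.05):
        """One stochastic gradient step on 0.5 * (prediction - y) ** 2."""
        hidden, pred = self.forward(x)
        err = pred - y
        for k, h in enumerate(hidden):
            # Gradient at hidden unit k's pre-activation, using w2[k] before it is updated.
            grad_pre = err * self.w2[k] * (1.0 - h * h)
            self.w2[k] -= lr * err * h
            self.b1[k] -= lr * grad_pre
            self.w1[k] = [w - lr * grad_pre * xi for w, xi in zip(self.w1[k], x)]
        self.b2 -= lr * err


def mse(model, xs, ys):
    return sum((model.forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(model, xs, ys, epochs=500, lr=0.05):
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            model.train_step(x, y, lr)
    return model


if __name__ == "__main__":
    # Hypothetical texture features (scaled to 0-1) and one trait value per sample.
    xs = [[i / 10, (i % 3) / 2] for i in range(10)]
    ys = [0.2 + 0.8 * a - 0.5 * b for a, b in xs]
    net = TinyMLP(n_inputs=2, n_hidden=4, seed=1)
    print("MSE before:", round(mse(net, xs, ys), 4))
    fit(net, xs, ys)
    print("MSE after:", round(mse(net, xs, ys), 4))
```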

**Fig. 3: A workflow of machine learning applications in orphan crops improvement.**

Large language model applications for crop improvement

Large language models (LLMs) are a class of machine learning models designed to “understand” language and identify patterns in text⁵⁴. Recently, LLMs have been increasingly applied to analyse biological sequence data, such as gene expression profiles, genomic DNA sequences, and protein sequences. In this context, biological language models treat DNA or amino acid sequences as text strings, splitting them into words (tokens) and learning the relationships between them⁵⁵. Applying language models to plant biological datasets is not a new concept^56,57,58,59, but recent technological advances have enabled more powerful LLM architectures to emerge^60,61. LLMs can enrich the limited genomic resources of orphan crops, leading to a better understanding of the diversity in orphan crop genomes.
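The "words" mentioned above are usually k-mers. A short sketch of this tokenisation step, assuming k = 6: a stride of 1 produces overlapping k-mers, as used by the original DNABERT, whereas a stride equal to k produces non-overlapping k-mers, closer to the scheme used by the Nucleotide Transformer family. Real tokenisers also handle ambiguous bases (N) and add special tokens, which this sketch omits.

```python
def kmer_tokens(sequence, k=6, stride=1):
    """Split a DNA sequence into k-mers; trailing bases that do not fill a k-mer are dropped."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(tokens):
    """Map each distinct k-mer to an integer id in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Convert k-mer tokens into the integer ids a language model consumes."""
    return [vocab[t] for t in tokens]


if __name__ == "__main__":
    seq = "atgcgtacgtta"
    overlapping = kmer_tokens(seq, k=6, stride=1)
    non_overlapping = kmer_tokens(seq, k=6, stride=6)
    vocab = build_vocab(overlapping)
    print(overlapping)
    print(non_overlapping)
    print(encode(overlapping, vocab))
```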

The capacity of LLMs to transfer knowledge into new domains is particularly valuable for orphan crops, as they can leverage insights from well-studied species to predict gene functions, identify regulatory elements, and uncover genetic patterns in orphan crops. Nucleotide Transformer is a prime example: a collection of foundation LLMs for predicting the phenotype and function of gene sequences that can be used for transfer learning. The Nucleotide Transformer models were trained on an extensive genomic sequence database comprising 3202 human genomes and 850 genomes from diverse phyla, which allowed the models to learn context-specific nucleotide sequences and gain a robust understanding of genomic indicators that could support the annotation of orphan plant genomes⁶². For example, the chia (Salvia hispanica) genome annotation used transcriptome data and orthologous gene models from multiple other species, achieving ~94% completeness in a BUSCO analysis⁶³. Integrating LLMs into the annotation process could further refine the functional annotation of orphan crop genomes by drawing on the genomic patterns and gene context learned during LLM training. DNABERT is another foundation LLM, trained with 135 human genomes, for predicting gene function, promoter sites, splice sites, and transcription factor binding sites from DNA sequence⁵⁸. DNABERT demonstrated a strong capacity for transfer learning to other species, effectively detecting transcription factor binding sites in genomes with under 50% non-coding similarity to the human genome⁵⁸. Because transcription factors regulate gene expression, and their binding sites are often found in non-coding regions at varying distances from target genes, DNABERT’s success in identifying these sites suggests that it accurately captures conserved semantic relationships within DNA sequences.
Several studies have leveraged the DNABERT model to advance plant research. A recent study further trained DNABERT to identify long non-coding RNAs (lncRNAs) in six major plant species⁶⁴. Such RNAs play an important role in regulating gene expression through interactions with DNA, RNA, and proteins, making them valuable targets for crop improvement⁶⁵. The LLM identified lncRNA sequences from genomic DNA sequences with up to 83% accuracy in the target species and a high average accuracy in previously unseen crop species⁶⁴. Because DNA methylation sites are important regulators of gene expression, multiple models leveraging these foundation LLMs have been proposed to predict them in plants. These LLMs were trained on major plant species and tested on previously unseen plant datasets, showing that they capture species-specific indicators of methylation sites and can generalise across species, highlighting their effectiveness in identifying critical regulatory elements in less-studied plant genomes^66,67,68.

More recently, a foundation LLM focused on crop genome sequences was released. AgroNT uses a similar structure to DNABERT, but it was trained on the genomes of 48 crop species, including the orphan crops pigeonpea (Cajanus cajan), cassava (Manihot esculenta), and quinoa (Chenopodium quinoa). AgroNT has demonstrated high accuracy in predicting regulatory annotations, promoter/terminator strength, lncRNAs, and tissue-specific gene expression across species, indicating its versatility and potential for identifying sites that control gene expression in orphan crops⁶⁹. Being trained exclusively on plant data may give AgroNT an advantage, as it avoids biases towards genomic structures that are exclusive to other organisms. The foundation LLMs described above offer a powerful tool for transferring knowledge from major to orphan crops, as biological annotations and experimental validation from well-curated plant species can be leveraged to detect gene regulation mechanisms in orphan species.

A major limitation for the genomics-based improvement of orphan crops is the scarcity of reference genomes and annotated genomic resources for these species, which has hindered the identification of causal genes associated with valuable crop phenotypes. Pre-trained LLMs could be used to predict gene function from DNA or RNA sequencing datasets^58,69. The predicted gene functions could also be used to prioritise functional variants identified through GWAS, RNA sequencing, and other genomic analyses⁶⁹. In addition, pre-trained LLMs could be fine-tuned for orphan crop-specific prediction tasks using a smaller training dataset, leveraging what the model has learned about molecular relationships while focusing on species-specific features. Ultimately, integrating pre-trained LLMs with genomic data and focused fine-tuning could help bridge the gap in understanding and harnessing the unique traits of orphan crops, unlocking their full potential for sustainable agriculture.

Transfer of knowledge between major and orphan crops

The limited knowledge and resources available for orphan crops have slowed their development^50,51. However, many orphan crops are closely related to major crops. For example, solanaceous fruit crops include tomato, a major crop, and ground cherry, an orphan crop^70,71. In such cases, the evolutionary relationship can be used to learn about and improve orphan crops through gene homology. Conservation of orthologs and their functions has been found between orphan crops and related major crops⁹. These conserved genes allow researchers to use the genes and knowledge available in major crops to identify candidate genes, edit genomes, and predict traits in orphan crops.

Gene homology with major crops or model species can be used to identify genes associated with a trait of interest. Gene homology with Arabidopsis thaliana, a model species, was used to identify 108 candidate genes for seed mucilage production in chia⁷². Candidate genes for domestication were identified using gene homology with A. thaliana and rice⁷³. While these studies used model species, the same methods could be applied using major crops. One ML approach for identifying candidate genes from sequences associated with a trait of interest is QTG-Finder2⁷⁴, a fast and efficient way to identify candidate genes within quantitative trait loci (QTL). The QTG-Finder2 model was trained on orthologs of causal genes from major crops and model plant species. Lin et al.⁷⁴ hypothesised that, because orthologs are conserved between species, QTG-Finder2 could be applied to species with few or no known causal genes. To test this hypothesis, they applied QTG-Finder2 to sorghum, an orphan crop, to predict causal genes for plant height, and it correctly identified true plant height causal genes 70% of the time⁷⁴. QTG-Finder2 thus improves the efficiency of identifying candidate genes and can be applied to species with few, if any, known causal genes. Machine learning and gene homology can also be used to predict essential genes in species with little available knowledge. Essential genes are required for the reproductive success of a species and are highly conserved⁷⁵. If ML can identify essential genes using gene homology, it could also be applied to predict other conserved genes.
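A common way to call putative orthologs between a major crop and an orphan crop, before any ML is applied, is the reciprocal best hit (RBH) criterion on all-versus-all similarity searches (for example, BLAST or DIAMOND bit scores): gene a in species A and gene b in species B are paired only if each is the other's top hit. The sketch below assumes the scores have already been parsed into dictionaries; the gene identifiers are hypothetical, ties are broken arbitrarily, and dedicated tools such as OrthoFinder handle gene families and paralogs more carefully.

```python
def best_hits(scores):
    """scores maps (query, subject) -> bit score; return each query's top-scoring subject."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    forward, reverse = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    major_vs_orphan = {("maj_g1", "orp_g1"): 200.0, ("maj_g1", "orp_g2"): 50.0,
                       ("maj_g2", "orp_g2"): 180.0, ("maj_g3", "orp_g1"): 120.0}
    orphan_vs_major = {("orp_g1", "maj_g1"): 198.0, ("orp_g1", "maj_g3"): 119.0,
                       ("orp_g2", "maj_g2"): 175.0, ("orp_g2", "maj_g1"): 49.0}
    print(reciprocal_best_hits(major_vs_orphan, orphan_vs_major))
```

Here `maj_g3` is dropped: its best hit is `orp_g1`, but `orp_g1` matches `maj_g1` more strongly, so the pair is not reciprocal.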

Genome editing using CRISPR can make changes to DNA to improve a trait, but this requires information on the target gene sequence. The conservation of orthologs between major and orphan crops can be used to identify targets for genome editing. Mutating orthologs of tomato genes through genome editing has improved the fruit size and production of ground cherry^70,71. Lodging resistance in tef, an orphan crop, has been improved by editing the ortholog of a rice semi-dwarfism gene⁷⁶. Gene conservation between orphan and major crops can be used to identify candidate genes and design genome editing targets when no data are available for the gene of interest in the orphan crop. Machine learning can also be used to improve the efficiency and specificity of genome editing.
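Once an ortholog has been chosen as an editing target, the first computational step is to enumerate candidate guide sites. The sketch below does this for SpCas9, which requires an NGG protospacer-adjacent motif (PAM) immediately 3′ of a 20-nt protospacer, scanning both strands. The GC-content window used as a filter is an illustrative assumption; ML tools of the kind mentioned above would go further and score each candidate for on-target efficiency and off-target risk.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_spcas9_sites(seq, protospacer_len=20, gc_range=(0.4, 0.6)):
    """Return (strand, start, protospacer, pam) for protospacers followed by an NGG PAM.

    For the '-' strand, `start` is a position in the reverse-complement sequence.
    The GC window is an assumed filter, not a fixed rule.
    """
    seq = seq.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - protospacer_len - 2):
            protospacer = s[i:i + protospacer_len]
            pam = s[i + protospacer_len:i + protospacer_len + 3]
            if pam[1:] == "GG" and gc_range[0] <= gc_fraction(protospacer) <= gc_range[1]:
                sites.append((strand, i, protospacer, pam))
    return sites


if __name__ == "__main__":
    target = "ACGTACGTACGTACGTACGTTGGAAAA"  # hypothetical stretch of an orphan-crop ortholog
    for site in find_spcas9_sites(target):
        print(site)
```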

One way to find gene orthologs for orphan crop studies is to source them from the literature^72,76; however, this information is spread across papers and journals, making it challenging to know the extent of gene homology between major and orphan crops and where to find these data. Databases such as NCBI provide protein and nucleotide sequences and gene homology information for many species⁷⁷; however, they do not hold information specific to orphan crops. Consolidating all major crop orthologs and their presence in orphan crops into a comprehensive database would aid studies identifying candidate genes and improving traits through genome editing in orphan crops.
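The proposed database could start as a table of curated ortholog pairs annotated with the trait of the major-crop gene, indexed by trait so that a breeder can ask which genes in a given orphan crop are candidates for that trait. A minimal in-memory sketch of that data structure follows; the record fields, species pairings, and gene identifiers are hypothetical, and a real resource would also store supporting evidence (for example similarity scores and literature references) in a proper database backend.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrthologRecord:
    major_species: str
    major_gene: str
    orphan_species: str
    orphan_gene: str
    trait: str  # trait linked to the major-crop gene


class OrthologIndex:
    """Trait-indexed lookup of major-crop genes and their orphan-crop orthologs."""

    def __init__(self):
        self._by_trait = defaultdict(list)

    def add(self, record):
        self._by_trait[record.trait].append(record)

    def candidates(self, trait, orphan_species):
        """Orphan-crop genes whose major-crop ortholog is linked to `trait`."""
        return sorted({r.orphan_gene for r in self._by_trait.get(trait, [])
                       if r.orphan_species == orphan_species})


if __name__ == "__main__":
    index = OrthologIndex()
    index.add(OrthologRecord("rice", "rice_gene_1", "tef", "tef_gene_7", "semi-dwarfism"))
    index.add(OrthologRecord("tomato", "tomato_gene_3", "ground cherry", "gc_gene_2", "fruit size"))
    index.add(OrthologRecord("wheat", "wheat_gene_5", "tef", "tef_gene_9", "semi-dwarfism"))
    print(index.candidates("semi-dwarfism", "tef"))
```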

Transfer learning is an ML method that uses pre-trained models and new datasets to fine-tune ML models for a new purpose⁷⁸. Transfer learning can be used to make predictions in a species with little available knowledge by first training the model on a species with abundant data (Fig. 4). Pre-trained models can be transferred from major crops to related orphan crops because of the conservation of genes and gene functions⁴⁵. Tomato is a major crop with poor-quality annotations, and transfer learning was used to improve the prediction accuracy of generalised and specialised metabolism genes in tomato⁷⁹. A model trained on A. thaliana was applied to tomato annotation data, and the transfer learning model predicted generalised metabolism genes more accurately than a model trained on the tomato annotation data. Prediction accuracy did not improve for specialised metabolism genes. The transfer learning model performed better for generalised metabolism genes because these genes are conserved between species, whereas specialised metabolism genes are lineage specific⁷⁹. Although this study focused on a major crop with poor annotation, the same method could be applied to orphan crops. Transfer learning can link knowledge from resource-rich major crops to related orphan crops for conserved traits. To aid trait prediction in orphan crops, a database of trait prediction models trained on major crops should be collated; these pre-trained models could then be applied to related orphan crops using transfer learning.
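A minimal sketch of the transfer-learning workflow (Fig. 4): a classifier is first trained on a data-rich source species, and its learned parameters are then used as the starting point for fine-tuning on a small target-species dataset. For brevity this uses logistic regression with a warm start, the simplest form of parameter transfer; deep-learning workflows typically reuse, and often freeze, the lower layers of a network and retrain the upper ones. The features, labels, and hyperparameters are toy assumptions, not those of the tomato study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, xs, ys):
    """Mean binary cross-entropy of a logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def train(xs, ys, w=None, b=0.0, lr=0.5, epochs=300):
    """Full-batch gradient descent; pass w and b to warm-start from pre-trained parameters."""
    w = list(w) if w is not None else [0.0] * len(xs[0])  # copy so the source model is untouched
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    # Source species: plenty of labelled genes (hypothetical features scaled to 0-1).
    src_x = [[i / 10, (i * 7 % 10) / 10] for i in range(10)]
    src_y = [1 if x[0] > 0.5 else 0 for x in src_x]
    w_src, b_src = train(src_x, src_y)

    # Target species: only four labelled genes; fine-tune briefly from the source model.
    tgt_x = [[0.3, 0.9], [0.45, 0.1], [0.6, 0.5], [0.2, 0.2]]
    tgt_y = [0, 1, 1, 0]
    w_ft, b_ft = train(tgt_x, tgt_y, w=w_src, b=b_src, epochs=50)
    print("target loss, source model:", round(log_loss(w_src, b_src, tgt_x, tgt_y), 3))
    print("target loss, fine-tuned:  ", round(log_loss(w_ft, b_ft, tgt_x, tgt_y), 3))
```

As in the tomato example, this only helps when the signal learned in the source species (here the first feature) is shared by the target species.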

**Fig. 4: A basic workflow for the use of transfer learning in orphan crops.**

The limitation of transferring knowledge from major to orphan crops, whether through gene prediction, genome editing, or transfer learning, is that all these methods rely on conserved genes. Orphan genes are lineage-specific genes that have no homologues in other species and make up 10–20% of the genes in a genome⁸⁰. Orphan genes have been associated with agronomically important traits such as disease resistance and abiotic stress tolerance^81,82,83. These orphan or novel genes cannot be identified or improved without species-specific genomic resources; so, while transferring knowledge from major to orphan crops can improve some traits, orphan crop-specific resources are still needed to reach the maximum potential for crop improvement. Orphan crop genomic resources can reveal orphan genes that may aid improvement in both orphan and major crops.
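Conceptually, a first-pass screen for orphan genes inverts the ortholog search: keep the genes that have no sufficiently strong hit in any other species. A minimal sketch, assuming homology hits have already been parsed into (gene, other species, bit score) tuples; the threshold is a hypothetical placeholder, and published orphan-gene pipelines typically add e-value cut-offs, phylostratigraphic analysis, and checks against unannotated genomic sequence so that genes are not called "orphan" simply because a homolog was missed.

```python
def flag_orphan_genes(all_genes, homology_hits, min_bitscore=50.0):
    """Return genes with no homology hit at or above `min_bitscore` in any other species."""
    supported = {gene for gene, _species, score in homology_hits if score >= min_bitscore}
    return sorted(set(all_genes) - supported)


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]
    hits = [("g1", "rice", 310.0), ("g2", "maize", 42.0), ("g3", "sorghum", 95.5)]
    print(flag_orphan_genes(genes, hits))  # g2 only has a weak hit; g4 has none
```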

There are some examples of knowledge transfer from orphan to major crops. For example, abiotic stress resistance genes not present in the bread wheat genome have been identified in the orphan cereal tef⁸⁴. The salinity-resistant orphan crop groundnut has been identified as a potential source of salinity resistance for soybean⁸⁵. Other examples involve the transfer of knowledge from wild relatives to major crops. One example is the Cicer super-pangenome, which included several wild relatives of chickpea and led to the discovery of novel disease resistance genes and genes involved in salt resistance, along with novel mutations in vernalisation genes^86,87. A similar super-pangenome in tomato identified a cytochrome P450 allele, found only in wild accessions, that was linked with increased yield⁸⁸. Sequencing of Aegilops accessions has led to the cloning of four novel disease resistance genes not present in bread wheat⁸⁹. In bread wheat, the Watkins collection of landraces from the 1930s has been a rich source of knowledge, including novel resistance genes for tan spot, Fusarium head blight⁹⁰, and eyespot^91,92. Sequencing the entire Watkins collection identified 127 QTL alleles that were subsequently introgressed from landraces into bread wheat, leading to yield increases of up to 0.91 t per hectare⁹³. These examples show that, by focusing on wild or landrace relatives, plant breeders can achieve significant yield gains through introgression and crossbreeding.

Implementation of machine learning-based improvement of orphan crops

Challenges for orphan crop improvement include the lack of genomic resources, limited uptake of modern crop breeding methods, and the lack of local scientists working on these issues. Collaboration between scientists, local communities, smallholder farmers, and international partners can help bridge the gap between major and orphan crops. The International Maize and Wheat Improvement Centre (CIMMYT) is a not-for-profit organisation that aims to address the challenges faced by smallholder farmers in marginal environments⁹⁴. CIMMYT develops high-yielding, nutritious, and abiotic stress-resistant wheat and maize varieties. It works with smallholder farmers in developing countries by providing training, exchanging knowledge, and exploring market opportunities. With the aid of public and private collaborations, CIMMYT has improved the food security of millions of smallholder farmers in Africa, Asia, and Latin America. Similar initiatives aim to improve orphan crops. The Feed the Future Innovation Lab for Crop Improvement focuses on accelerating the breeding of local varieties of roots, tubers, bananas, millets, legumes, and sorghum through collaboration between scientists, global stakeholders, and local communities⁹⁵. The AOCC uses a network of public and private collaborators from international, non-governmental, and academic institutes to collect germplasm, sequence genomes, and gather local input⁹⁶. The AOCC aims to sequence a total of 101 orphan crop species; it has completed 6 of these genomes and is in the process of completing an additional 26. Orphan crop germplasm is held in over 150 gene banks globally, which can be used for sequencing and genotyping by initiatives such as the AOCC⁹⁷. Important to each of these initiatives is the input of local communities, to ensure that the crop varieties are suited to each local environment, that farmers are willing to adopt the technology, and that there is demand for the product in the local marketplace.
Another way to increase local involvement, and the workforce behind orphan crop improvement, is to recruit local farmers as citizen scientists. Triadic comparisons of technologies (TRICOT) is a citizen science method in which volunteer farmers are each sent a small set of crop varieties or agronomic technologies, typically three, to trial and rank⁹⁸. TRICOT is cost-effective and requires no specialised training or skills, making it accessible to farmers in marginal communities. TRICOT has been used successfully to trial the climatic responses of crops in marginal environments and to determine consumer preferences for orphan crop varieties^99,100. Given how important the input of local farmers and communities is for orphan crop improvement, these communities must benefit from the studies in which they take part. All studies of orphan crops in marginal or regional environments should have the consent of the local community, and the results should be accessible to the smallholder farmers who participate. Citizen science studies and regional and international collaborations should be supported by policy to ensure funding of initiatives to improve orphan crops. The United Nations' recommendations for supporting orphan crops include funding and training for farmers willing to adopt new technologies, funding to help smallholder farmers access markets, and policies encouraging collaboration between local knowledge holders and science and technology^101,102. Policy frameworks should be developed to train and fund local farmers to implement ML and modern breeding techniques in orphan crops, and to encourage further collaboration with local communities when developing new orphan crop varieties.
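To illustrate how TRICOT-style rankings can be turned into variety scores, the sketch below assumes that each farmer reports a complete best-to-worst order for the three options received. It expands each triad into its three implied pairwise preferences and fits a Bradley–Terry model with the standard minorisation–maximisation (MM) update. TRICOT data are commonly analysed with the related Plackett–Luce model; the pairwise simplification, the function names, the variety labels and the rankings here are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def triads_to_pairs(triads):
    """Expand each (best, middle, worst) ranking into its three implied (winner, loser) pairs."""
    pairs = []
    for ranking in triads:
        # combinations() keeps input order, so the first element of each pair was ranked higher
        pairs.extend(combinations(ranking, 2))
    return pairs


def bradley_terry(pairs, n_iter=10_000, tol=1e-12):
    """Estimate Bradley-Terry strengths with the MM update s_i <- W_i / sum_j n_ij / (s_i + s_j)."""
    items = sorted({v for pair in pairs for v in pair})
    wins = defaultdict(int)    # W_i: total wins per variety
    n_pair = defaultdict(int)  # n_ij: number of comparisons per unordered pair
    for winner, loser in pairs:
        wins[winner] += 1
        n_pair[tuple(sorted((winner, loser)))] += 1

    strength = {i: 1.0 for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = sum(count / (strength[a] + strength[b])
                        for (a, b), count in n_pair.items() if i in (a, b))
            updated[i] = wins[i] / denom
        # Rescale so strengths average 1 (the model is invariant to a common scale factor)
        scale = len(items) / sum(updated.values())
        updated = {i: s * scale for i, s in updated.items()}
        change = max(abs(updated[i] - strength[i]) for i in items)
        strength = updated
        if change < tol:
            break
    return strength


# Hypothetical on-farm rankings: each tuple lists one farmer's three varieties from best to worst
triads = [
    ("A", "B", "C"), ("A", "C", "D"), ("B", "A", "D"), ("A", "B", "D"),
    ("C", "B", "D"), ("D", "C", "A"), ("B", "C", "D"),
]
strengths = bradley_terry(triads_to_pairs(triads))
for variety, s in sorted(strengths.items(), key=lambda kv: -kv[1]):
    print(f"{variety}: {s:.2f}")
```

Under this model, variety i is preferred over variety j with probability s_i / (s_i + s_j). The estimates only exist when the comparison network is strongly connected (every variety can be linked to every other through a chain of wins), which the toy rankings satisfy; real trials with many varieties and few farmers may need a regularised or Plackett–Luce fit instead.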

One of the greatest challenges for orphan crop improvement, and for the associated gains in food security in nations that rely on these crops, is the lack of funding. While most orphan crops will remain orphans because of their niche habitats or limited potential, many have the potential, with appropriate investment, to become major crops either regionally or even globally. The rising tide of genomic technologies should lift the performance of all crops, as knowledge can be transferred between closely related species. However, investment should be focused on the crops with the greatest potential for improvement, taking into account how machine learning can be used to optimise results. Machine learning models can leverage major crop datasets for training, reducing the amount of orphan crop data required for trait prediction and the identification of genomic features. Nonetheless, strategic data generation in orphan crops is still required to ensure strong alignment with the training datasets. As additional orphan crop datasets are generated, the models can be fine-tuned to improve their accuracy and specificity over time. Understanding the genomic basis of traits in orphan crops could also benefit major crops through gene introgression and editing, and there is an argument for international seed companies to support orphan crop improvement, either directly or indirectly through technology exchange, as many already do. Increased support from breeding and seed companies could substantially accelerate the use of machine learning in orphan crops, as these enterprises hold a wealth of genome sequencing and field trial data. Providing machine learning models with diverse datasets would allow them to account for intra-species genomic variability and other factors that affect trait prediction.
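The idea of training on major crop data and then fine-tuning on a smaller orphan crop dataset can be sketched with a simple linear form of transfer learning. A ridge regression is first fitted to a large major-crop dataset, and its weights are then used as the prior mean when fitting the small orphan-crop dataset, so the orphan model is shrunk towards the major-crop solution rather than towards zero. This is a minimal illustration under strong assumptions, not the method of any study cited here: the data are simulated, both crops are assumed to share the same 50 orthologous markers, and the orphan crop's marker effects are assumed to differ only slightly from the major crop's. Deep-learning approaches typically fine-tune a pretrained network instead; this version only shows the principle.

```python
import numpy as np


def ridge(X, y, lam, prior=None):
    """Minimise ||y - Xw||^2 + lam * ||w - prior||^2 in closed form.

    With prior=None the weights are shrunk towards zero (ordinary ridge regression);
    passing pretrained weights as `prior` shrinks them towards those weights instead.
    """
    n_features = X.shape[1]
    if prior is None:
        prior = np.zeros(n_features)
    # Setting the gradient to zero gives (X^T X + lam I) w = X^T y + lam * prior
    A = X.T @ X + lam * np.eye(n_features)
    b = X.T @ y + lam * prior
    return np.linalg.solve(A, b)


rng = np.random.default_rng(42)
n_markers = 50  # hypothetical number of shared orthologous markers

# Simulated marker effects: the orphan crop is assumed to resemble the major crop closely
w_major = rng.normal(size=n_markers)
w_orphan = w_major + rng.normal(scale=0.1, size=n_markers)


def simulate(weights, n_plants):
    """Draw simulated standardised marker scores and noisy phenotypes for given effects."""
    X = rng.normal(size=(n_plants, weights.size))
    y = X @ weights + rng.normal(size=n_plants)
    return X, y


X_major, y_major = simulate(w_major, 2000)   # large major-crop training set
X_orphan, y_orphan = simulate(w_orphan, 30)  # small orphan-crop training set
X_test, y_test = simulate(w_orphan, 500)     # held-out orphan-crop plants

w_pretrained = ridge(X_major, y_major, lam=1.0)
w_scratch = ridge(X_orphan, y_orphan, lam=1.0)                         # orphan data only
w_finetuned = ridge(X_orphan, y_orphan, lam=10.0, prior=w_pretrained)  # shrink towards major crop


def held_out_mse(weights):
    """Mean squared prediction error on the held-out orphan-crop plants."""
    return float(np.mean((y_test - X_test @ weights) ** 2))


print(f"orphan-only model MSE: {held_out_mse(w_scratch):.2f}")
print(f"transfer model MSE:    {held_out_mse(w_finetuned):.2f}")
```

Here the orphan-only model has fewer plants (30) than markers (50), so it cannot recover all marker effects, whereas the transferred model starts from the major-crop estimates. A larger `lam` in the fine-tuning step trusts the major-crop weights more; as orphan crop datasets grow, the data term dominates and the estimate moves away from the prior, which is the sense in which such a model is fine-tuned over time.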

Investing in a single data-generation technology is unlikely to deliver sufficient results or support across fields; diversified investment, from genomics-based breeding through to agronomy, processing and marketing, is required to boost orphan crop performance. Moreover, while investing in the countries where orphan crops are predominantly grown leverages local expertise and has significant social benefits, restricting investment to a geographic boundary may not always be the most efficient pathway to accelerate crop improvement. A strategy that integrates the best available technologies and expertise at local and global scales can enhance the effectiveness and impact of such efforts. This is particularly important for developing ML models, which often require substantial computational resources and specialised skills that can be accessed more cost-effectively at a global scale. Given the limited financial resources available for orphan crop improvement, a balanced approach is important, as the most effective improvements per dollar invested may come from low-cost traditional breeding, education and marketing strategies. Machine learning models can then build on the knowledge generated by these low-cost approaches.

Outlook

Machine learning has emerged as a promising tool for advancing research and breeding efforts in orphan crops. These underutilised plant species, often vital for food security in developing regions, have historically received less scientific attention than major staple crops. Here we have outlined how ML techniques are being applied in major crops to analyse genomic data, predict crop traits, optimise breeding strategies and enhance disease resistance, and how this knowledge can be transferred to orphan crops. By leveraging large datasets and complex algorithms, ML approaches can accelerate the identification of beneficial genes and help develop improved varieties. This technology has the potential to address challenges specific to orphan crops, such as limited genetic resources and adaptation to local environments, helping to ensure food for a growing population in a warming climate.

References

Food and Agriculture Organization of the United Nations. Statistics. https://www.fao.org/statistics/en (2024).
Borrell, J. S. et al. Enset‐based agricultural systems in Ethiopia: a systematic review of production trends, agronomy, processing and the wider food security applications of a neglected banana relative. Plants People Planet 2, 212–228 (2020).
Tadele, Z. Orphan crops: their importance and the urgency of improvement. Planta 250, 677–694 (2019).
Sosa, A. Chia crop (Salvia hispanica L.): Its history and importance as a source of polyunsaturated fatty acids omega-3 around the world: A review. JCRF 1, 1–4 (2016).
Zamora-Tavares, P., Vargas-Ponce, O., Sánchez-Martínez, J. & Cabrera-Toledo, D. Diversity and genetic structure of the husk tomato (Physalis philadelphica Lam.) in Western Mexico. Genet. Resour. Crop Evol. 62, 141–153 (2015).
Wasihun, G. & Desu, A. Trend of cereal crops production area and productivity, in Ethiopia. J. of Cereals Oilseeds 12, 9–17 (2021).
African Orphan Crops Consortium. Healthy Africa through nutritious, diverse and local food crops. https://africanorphancrops.org/ (2024).
Crops for the Future. Facilitating the wider use of underutilised crops. https://cropsforthefutureuk.org/ (2024).
Kumar, B., Singh, A. K., Bahuguna, R. N., Pareek, A. & Singla‐Pareek, S. L. Orphan crops: a genetic treasure trove for hunting stress tolerance genes. Food Energy Secur 12, e436 (2023).
Fisher, R. A. Statistical methods for research workers. (1934).
Fisher, R. A. The design of experiments. (1935).
Araújo, S. O., Peres, R. S., Ramalho, J. C., Lidon, F. & Barata, J. Machine learning applications in agriculture: current trends, challenges, and future perspectives. Agronomy 13, 2976 (2023).
Kang, M., Ko, E. & Mersha, T. B. A roadmap for multi-omics data integration using deep learning. Brief Bioinform. 23, (2022).
Liakos, K. G., Busato, P., Moshou, D., Pearson, S. & Bochtis, D. Machine learning in agriculture: a review. Sensors 18, 2674 (2018).
Yoosefzadeh Najafabadi, M., Hesami, M. & Eskandari, M. Machine learning-assisted approaches in modernized plant breeding programs. Genes 14, 777 (2023).
Dudley, J. Molecular markers in plant improvement: manipulation of genes affecting quantitative traits. Crop Sci 33, 660–668 (1993).
Tong, H. & Nikoloski, Z. Machine learning approaches for crop improvement: leveraging phenotypic and genotypic big data. J. Plant Physiol. 257, 153354 (2021).
Van Klompenburg, T., Kassahun, A. & Catal, C. Crop yield prediction using machine learning: a systematic literature review. Comput. Electron. Agric. 177, 105709 (2020).
Scheben, A., Wolter, F., Batley, J., Puchta, H. & Edwards, D. Towards CRISPR/Cas crops–bringing together genomics and genome editing. New Phytol 216, 682–698 (2017).
Chen, K., Wang, Y., Zhang, R., Zhang, H. & Gao, C. CRISPR/Cas genome editing and precision plant breeding in agriculture. Annu. Rev. Plant Biol. 70, 667–697 (2019).
Li, B. et al. Genomic prediction of breeding values using a subset of SNPs identified by three machine learning methods. Front. Genet. 9, 237 (2018).
Li, Y., Raidan, F., Vitezica, Z. & Reverter, A. Using Random Forests as a prescreening tool for genomic prediction: Impact of subsets of SNPs on prediction accuracy of total genetic values. In Proceedings of the World Congress on Genetics Applied to Livestock Production (WCGALP) 11, (2018).
Herr, A. et al. Unoccupied aerial systems imagery for phenotyping in cotton, maize, soybean, and wheat breeding. Crop Sci 63, 1722–1749 (2023).
Singh, A., Ganapathysubramanian, B., Singh, A. K. & Sarkar, S. Machine learning for high-throughput stress phenotyping in plants. Trends Plant Sci 21, 110–124 (2016).
Gill, M. et al. Machine learning models outperform deep learning models, provide interpretation and facilitate feature selection for soybean trait prediction. BMC Plant Biol 22, 180 (2022).
Ma, W. et al. A deep convolutional neural network approach for predicting phenotypes from genotypes. Planta 248, 1307–1318 (2018).
Montesinos-López, O. A. et al. New deep learning genomic-based prediction model for multiple traits with binary, ordinal, and continuous phenotypes. G3: Genes, Genomes, Genet 9, 1545–1556 (2019).
Liu, Y. et al. Phenotype prediction and genome-wide association study using deep convolutional neural network of soybean. Front. Genet. 10, 1091 (2019).
Spindel, J. et al. Genomic selection and association mapping in rice (Oryza sativa): effect of trait genetic architecture, training population composition, marker number and statistical model on accuracy of rice genomic selection in elite, tropical rice breeding lines. PLoS Genet 11, e1004982 (2015).
Xu, Y., Laurie, J. D. & Wang, X. CropGBM: An ultra-efficient machine learning toolbox for genomic selection-assisted breeding in crops. Accelerated Breeding of Cereal Crops 133-150 (2022).
Gabur, I., Simioniuc, D. P., Snowdon, R. J. & Cristea, D. Machine learning applied to the search for nonlinear features in breeding populations. Front. Artif. Intell. 5, 876578 (2022).
Parmley, K. A., Higgins, R. H., Ganapathysubramanian, B., Sarkar, S. & Singh, A. K. Machine learning approach for prescriptive plant breeding. Sci. Rep. 9, 17132 (2019).
Ruß, G., Kruse, R., Schneider, M. & Wagner, P. Data mining with neural networks for wheat yield prediction. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 47–56 (2008).
Basir, M. S., Chowdhury, M., Islam, M. N. & Ashik-E-Rabbani, M. Artificial neural network model in predicting yield of mechanically transplanted rice from transplanting parameters in Bangladesh. J. Agric. Food Res. 5, 100186 (2021).
Taherei Ghazvinei, P. et al. Sugarcane growth prediction based on meteorological parameters using extreme learning machine and artificial neural network. Eng. Appl. Comput. Fluid Mech. 12, 738–749 (2018).
Filippi, P. et al. An approach to forecast grain crop yield using multi-layered, multi-farm data sets and machine learning. Precision Agriculture 20, 1015–1029 (2019).
Montesinos-López, A., Montesinos-López, O. A., Gianola, D., Crossa, J. & Hernández-Suárez, C. M. Multi-environment genomic prediction of plant traits using deep learners with dense architecture. G3: Genes, Genomes Genet. 8, 3813–3828 (2018).
Azodi, C. B. et al. Benchmarking parametric and machine learning models for genomic prediction of complex traits. G3: Genes, Genomes Genet. 9, 3691–3702 (2019).
Kick, D. R. et al. Yield prediction through integration of genetic, environment, and management data through deep learning. G3: Genes, Genomes, Genet. 13, jkad006 (2023).
Måløy, H., Windju, S., Bergersen, S., Alsheikh, M. & Downing, L. Multimodal performers for genomic selection and crop yield prediction. Smart Agric. Technol. 1, 100017 (2021).
Li, J. et al. TrG2P: A transfer learning-based tool integrating multi-trait data for accurate prediction of crop yield. Plant Commun, (2024).
Danilevicz, M. F., Bayer, P. E., Boussaid, F., Bennamoun, M. & Edwards, D. Maize yield prediction at an early developmental stage using multispectral images and genotype data for preliminary hybrid selection. Remote Sens 13, 3976 (2021).
Togninalli, M. et al. Multi-modal deep learning improves grain yield prediction in wheat breeding by fusing genomics and phenomics. Bioinformatics 39, btad336 (2023).
Tong, K. et al. PlantMine: A machine-learning framework to detect core SNPs in rice genomics. Genes 15, 603 (2024).
Cheng, C.-Y. et al. Evolutionarily informed machine learning enhances the power of predictive gene-to-phenotype relationships. Nature Commun 12, 5627 (2021).
Xu, Y. et al. Enhancing genetic gain in the era of molecular breeding. J. Exp. Bot. 68, 2641–2666 (2017).
Sterck, L., Billiau, K., Abeel, T., Rouzé, P. & Van De Peer, Y. ORCAE: Online resource for community annotation of eukaryotes. Nat Methods 9, 1041–1041 (2012).
Price, E. J. et al. Metabolite database for root, tuber, and banana crops to facilitate modern breeding in understudied crops. Plant J 101, 1258–1268 (2020).
Sarah, G. et al. A large set of 26 new reference transcriptomes dedicated to comparative population genomics in crops and wild relatives. Mol Ecol Resour 17, 565–580 (2017).
Kumar, B. & Bhalothia, P. Orphan crops for future food security. J. Biosci. 45, 131 (2020).
Mabhaudhi, T. et al. Prospects of orphan crops in climate change. Planta 250, 695–708 (2019).
Nazari, L., Khazaei, A. & Ropelewska, E. Prediction of tannin, protein, and total phenolic content of grain sorghum using image analysis and machine learning. Cereal Chem 99, 843–849 (2022). Image-based ML models accurately and efficiently predicted the protein, tannin, and total phenolic content in grain sorghum, demonstrating the usefulness of these ML models in sorghum improvement.
Kaur, S. et al. NIRS-based prediction modeling for nutritional traits in Perilla germplasm from NEH Region of India: Comparative chemometric analysis using mPLS and deep learning. J. Food Meas. Charact. 18, 9019–9035 (2024). The most accurate ML model for predicting biochemicals using Near-Infrared Reflectance Spectroscopy in Perilla depended on the trait of interest, highlighting the importance of model selection prior to germplasm screening.
Thirunavukarasu, A. J. et al. Large language models in medicine. Nature Medicine 29, 1930–1940 (2023).
Lam, H. Y. I., Ong, X. E. & Mutwil, M. Large language models in plant biology. Trends Plant Sci 29, 1145–1155 (2024).
Meng, J., Chang, Z., Zhang, P., Shi, W. & Luan, Y. lncRNA-LSTM: Prediction of Plant Long Non-coding RNAs Using Long Short-Term Memory Based on p-nts Encoding. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 347–357 (2019).
Lemay, M., de Ronne, M., Bélanger, R. & Belzile, F. k‐mer‐based GWAS enhances the discovery of causal variants and candidate genes in soybean. Plant Genome 16, e20374 (2023).
Ji, Y., Zhou, Z., Liu, H. & Davuluri, R. V. DNABERT: Pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics 37, 2112–2120 (2021).
Nguyen, V.-N., Ho, T.-T., Doan, T.-D. & Le, N. K. Using a hybrid neural network architecture for DNA sequence representation: A study on N4-methylcytosine sites. Comput. Biol. Med. 178, 108664 (2024).
Brown, T. B. et al. Language models are few-shot learners. https://arxiv.org/abs/2005.14165 (2020).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, (2017).
Dalla-Torre, H. et al. The Nucleotide Transformer: Building and evaluating robust foundation models for human genomics. https://www.biorxiv.org/content/10.1101/2023.01.11.523679v1 (2023).
Gupta, P. et al. Reference genome of the nutrition-rich orphan crop chia (Salvia hispanica) and its implications for future breeding. Front. Plant Sci. 14, 1272966 (2023).
Danilevicz, M. F. et al. DNABERT-based explainable lncRNA identification in plant genome assemblies. Comput. Struct. Biotechnol. J. 21, 5676–5685 (2023).
Urquiaga, M. C. O., Thiebaut, F., Hemerly, A. S. & Ferreira, P. C. G. From trash to luxury: The potential role of plant lncRNA in DNA methylation during abiotic stress. Front. Plant Sci. 11, 603246 (2021).
Shi, H., Li, S. & Su, X. Plant6mA: A predictor for predicting N6-methyladenine sites with lightweight structure in plant genomes. Methods 204, 126–131 (2022).
Yu, Y. et al. iDNA-ABT: advanced deep learning model for detecting DNA methylation with adaptive features and transductive information maximization. Bioinformatics 37, 4603–4610 (2021).
Zeng, W., Gautam, A. & Huson, D. H. MuLan-Methyl—multiple transformer-based language models for accurate DNA methylation prediction. GigaScience 12, giad054 (2022).
Mendoza-Revilla, J. et al. A foundational large language model for edible plant genomes. Commun Biol 7, 835 (2024). The pretrained large language model AgroNT predicted enhancer regions and the effect of promoter-proximal regions in cassava with high and moderate accuracies respectively, demonstrating the ability of AgroNT to predict regulatory features in orphan crops.
Kwon, C. T. et al. Rapid customization of Solanaceae fruit crops for urban agriculture. Nat. Biotechnol. 38, 182–188 (2020).
Lemmon, Z. H. et al. Rapid improvement of domestication traits in an orphan crop by genome editing. Nat. Plants 4, 766–770 (2018). CRISPR-Cas9 was successfully used to mutate tomato orthologues and improve productivity traits in the orphan crop groundcherry.
Alejo-Jacuinde, G. et al. Multi-omic analyses reveal the unique properties of chia (Salvia hispanica) seed metabolism. Commun. Biol. 6, 820–820 (2023).
Li, X. et al. Multi-omics analyses of 398 foxtail millet accessions reveal genomic regions associated with domestication, metabolite traits, and anti-inflammatory effects. Mol. Plant 15, 1367–1383 (2022).
Lin, F., Lazarus, E. Z. & Rhee, S. Y. QTG-Finder2: A generalized machine-learning algorithm for prioritizing QTL causal genes in plants. G3: Genes, Genomes, Genet. 10, 2411–2421 (2020). The QTG-Finder2 ML model correctly identified true plant height causal genes in sorghum and can improve the efficiency of candidate gene identification in orphan crops.
Beder, T. et al. Identifying essential genes across eukaryotes by machine learning. NAR Genom. Bioinform. 3, lqab110–lqab110 (2021).
Beyene, G. et al. CRISPR/Cas9‐mediated tetra‐allelic mutation of the ‘Green Revolution’ SEMIDWARF‐1 (SD‐1) gene confers lodging resistance in tef (Eragrostis tef). Plant Biotechnol. J. 20, 1716–1729 (2022).
Sayers, E. W. et al. Database resources of the national center for biotechnology information. Nucleic Acids Research 50, D20–D26 (2022).
Yan, J. & Wang, X. Unsupervised and semi‐supervised learning: The next frontier in machine learning for plant systems biology. Plant J 111, 1527–1538 (2022).
Moore, B. M. et al. Within- and cross-species predictions of plant specialized metabolism genes using transfer learning. In Silico Plants 2, diaa005 (2020). An ML model trained on A. thaliana increased the trait prediction accuracy of generalised metabolism genes in tomato but did not improve the prediction accuracy of specialised metabolism genes, demonstrating the ability of transfer learning to improve trait prediction for conserved genes.
Khalturin, K., Hemmrich, G., Fraune, S., Augustin, R. & Bosch, T. C. G. More than just orphans: Are taxonomically-restricted genes important in evolution? Trends Genet 25, 404–413 (2009).
Perochon, A. et al. A wheat NAC interacts with an orphan protein and enhances resistance to Fusarium head blight disease. Plant Biotechnol. J. 17, 1892–1904 (2019).
Li, G. et al. Orphan genes are involved in drought adaptations and ecoclimatic-oriented selections in domesticated cowpea. J. Exp. Bot. 70, 3101–3110 (2019).
Ma, D. et al. Identification, characterization and function of orphan genes among the current Cucurbitaceae genomes. Front. Plant Sci. 13, 872137–872137 (2022).
Cannarozzi, G. et al. Genome and transcriptome sequencing identifies breeding targets in the orphan crop tef (Eragrostis tef). BMC genomics 15, 581 (2014).
Mayes, S. et al. Bambara groundnut: an exemplar underutilised legume for resilience under climate change. Planta 250, 803–820 (2019).
Khan, A. W. et al. Super-pangenome by integrating the wild side of a species for accelerated crop improvement. Trends Plant Sci 25, 148–158 (2020).
Khan, A. W. et al. Cicer super-pangenome provides insights into species evolution and agronomic trait loci for crop improvement in chickpea. Nat. Genet. 56, 1225–1234 (2024).
Li, N. et al. Super-pangenome analyses highlight genomic diversity and structural variation across wild and cultivated tomato species. Nat. Genet. 55, 852–860 (2023).
Arora, S. et al. Resistance gene cloning from a wild crop relative by sequence capture and association genetics. Nat. Biotechnol. 37, 139–143 (2019).
Halder, J. et al. Mining and genomic characterization of resistance to tan spot, Stagonospora nodorum blotch (SNB), and Fusarium head blight in Watkins core collection of wheat landraces. BMC Plant Biol 19, 480 (2019).
Burt, C. et al. Mining the Watkins collection of wheat landraces for novel sources of eyespot resistance. Plant Pathol 63, 1241–1250 (2014).
Winfield, M. O. et al. High‐density genotyping of the AE Watkins Collection of hexaploid landraces identifies a large molecular diversity compared to elite bread wheat. Plant Biotechnol. J. 16, 165–175 (2018).
Cheng, S. et al. Harnessing landrace diversity empowers wheat breeding. Nature 632, 823–831 (2024).
International Maize and Wheat Improvement Center. CIMMYT. https://www.cimmyt.org/ (2024).
Cornell University. Feed the Future Innovation Lab for Crop Improvement. https://ilci.cornell.edu/ (2024).
Hendre, P. S. et al. African Orphan Crops Consortium (AOCC): status of developing genomic resources for African orphan crops. Planta 250, 989–1003 (2019).
Genesys. The Global Gateway to Genetic Resources. https://www.genesys-pgr.org (2024).
Van Etten, J. et al. First experiences with a novel farmer citizen science approach: crowdsourcing participatory variety selection through on-farm triadic comparisons of technologies (tricot). Exp. Agric. 55, 275–296 (2019).
van Etten, J. et al. Crop variety management for climate adaptation supported by citizen science. Proc. Nat. Acad. Sci. 116, 4194–4199 (2019).
Moyo, M. et al. Consumer preference testing of boiled sweetpotato using crowdsourced citizen science in Ghana and Uganda. Front. Sustain. Food Syst. 5, 620363 (2021).
United Nations. Agriculture technology for sustainable development: leaving no one behind. https://documents.un.org/doc/undoc/gen/n23/218/53/pdf/n2321853.pdf (2024).
United Nations. Agriculture technology for sustainable development: leaving no one behind the future of food and agriculture: drivers and triggers for achieving sustainable agrifood systems. https://documents.un.org/doc/undoc/gen/n23/216/98/pdf/n2321698.pdf (2024).


Acknowledgements

This research was carried out while the author was in receipt of an Australian Government Research Training Program Stipend at The University of Western Australia. This work was supported by resources provided by the Pawsey Supercomputing Centre with funding from the Australian Government and the Government of Western Australia.

Author information

These authors contributed equally: Tessa R. MacNish, Monica F. Danilevicz.

Authors and Affiliations

School of Biological Sciences, The University of Western Australia, Perth, Australia
Tessa R. MacNish, Monica F. Danilevicz, Mitchell S. Bestry & David Edwards
Centre for Applied Bioinformatics, The University of Western Australia, Perth, Australia
Tessa R. MacNish, Monica F. Danilevicz, Philipp E. Bayer, Mitchell S. Bestry & David Edwards
Australian Herbicide Resistance Initiative, The University of Western Australia, Perth, Australia
Monica F. Danilevicz
The UWA Oceans Institute, The University of Western Australia, Perth, Australia
Philipp E. Bayer
Minderoo Foundation, Perth, Australia
Philipp E. Bayer

Authors

Tessa R. MacNish
Monica F. Danilevicz
Philipp E. Bayer
Mitchell S. Bestry
David Edwards

Contributions

T.R.M: Conceptualization, Visualization, Writing - Original Draft, Writing - Review & Editing. M.F.D: Conceptualization, Visualization, Writing - Original Draft, Writing - Review & Editing. P.E.B: Conceptualization, Visualization, Writing - Original Draft, Writing - Review & Editing. M.S.B: Writing - Review & Editing. D.E: Conceptualization, Supervision, Funding acquisition, Writing - Original Draft, Writing - Review & Editing.

Corresponding author

Correspondence to David Edwards.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Carlos Hernández-Suárez and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

MacNish, T.R., Danilevicz, M.F., Bayer, P.E. et al. Application of machine learning and genomics for orphan crop improvement. Nat Commun 16, 982 (2025). https://doi.org/10.1038/s41467-025-56330-x


Received: 28 September 2024
Accepted: 15 January 2025
Published: 24 January 2025
Version of record: 24 January 2025
DOI: https://doi.org/10.1038/s41467-025-56330-x
