Figure 1: Overview of data processing.

Raw reads (1) were assembled using the Trinity assembler (2) at two k-mer values: 25 and 32. Assembly quality was assessed using BUSCO and TransRate (3), utilising external sequence and protein data along with the initial raw read sequences. A final assembly was then chosen for each accession (4). For the MP accession, reads were also subsampled to the same read depth using seqtk (5) and assembled at both read depths. The predicted protein sequences were obtained using TransDecoder (6). BLAST searches of the protein and transcript sequences were carried out against the UniProt and UniRef databases (7). The results were then combined into an annotation using Trinotate (8). Protein sequences were also clustered into orthogroups, together with protein sequences from other plant species, using OrthoFinder (9). A multiple alignment was produced from each orthogroup using MUSCLE (10). Key: yellow, input data; blue, processing steps; orange, intermediate data/files produced during the process; green, data from public databases; red, final output data.
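
The subsampling in step 5 can be illustrated with a minimal Python sketch. This is not seqtk's implementation and not the authors' script: the function names `parse_fastq` and `subsample`, the default seed, and the in-memory handling of records are all assumptions made for illustration. It shows one property that matters if the reads are paired (the caption does not say whether they are): drawing with the same seed from two files of equal length selects the same record positions, so mates stay in sync.

```python
import random


def parse_fastq(lines):
    """Group FASTQ lines into 4-line records: header, sequence, '+', quality.

    Hypothetical helper for illustration; an incomplete trailing record
    (for example a stray blank line) is ignored.
    """
    lines = [ln.rstrip("\n") for ln in lines]
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines) - 3, 4)]


def subsample(records, n, seed=100):
    """Return n records chosen at random, keeping their original order.

    The seed value is an arbitrary assumption. Using the same seed on two
    record lists of equal length picks the same positions in both.
    """
    if n >= len(records):
        return list(records)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(records)), n))
    return [records[i] for i in keep]
```

Calling `subsample` on the forward and reverse records with the same `n` and `seed` would yield two read sets of equal depth whose entries correspond pairwise; choosing `n` as the read count of the shallower library brings both to the same depth.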