Fig. 4 | Scientific Data

Fig. 4

From: Chromosome-level reference genome assembly for the protected resource plant, Zenia insignis

Flowchart of Zenia insignis gene prediction. The hybrid approach incorporates de novo prediction, homology-based prediction, and RNA-Seq-assisted prediction. Three types of evidence (repeat, protein, and transcript) were generated first. RepeatMasker was employed to generate the repeat evidence. Three protein datasets, from Arabidopsis thaliana, Medicago truncatula, and Swiss-Prot, were used to construct homologous gene models with genBlastG as the protein evidence. The transcript evidence was divided into two parts: NGS (short-read) transcript data and full-length transcript data. HISAT2 was used to map paired-end short transcriptome reads to the Zenia insignis genome; the mapped reads were then retrieved and assembled using Trinity. The full-length transcriptome data were processed using IsoSeq 3. Minimap2 was used to align both the short-read and long-read data, from which the transcript evidence was derived. The integrated repeat, protein, and transcript evidence was then used for gene prediction with MAKER2. Three rounds of MAKER2 analysis were executed. In the second and third rounds, the output of de novo gene prediction with AUGUSTUS was also included. Following the third round of MAKER2, the results were input into the PASA pipeline, which was used to optimise and refine the MAKER2 outputs. BUSCO was employed to evaluate the gene prediction results after each round of MAKER2 and after the PASA pipeline.
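The order of stages described in the caption can be sketched as a small dependency graph. This is an illustrative sketch only, not the authors' code: the stage identifiers are hypothetical labels, no real tool is invoked, and two dependencies are assumptions not stated in the caption (that each MAKER2 round builds on the previous one, and that AUGUSTUS has no upstream stage).

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each key is a stage (hypothetical label); its value is the set of
# stages it depends on, as read from the caption.
PIPELINE = {
    "repeatmasker": set(),                       # repeat evidence
    "genblastg": set(),                          # protein evidence
    "hisat2": set(),                             # map short reads to genome
    "trinity": {"hisat2"},                       # assemble mapped reads
    "isoseq3": set(),                            # full-length transcripts
    "minimap2": {"trinity", "isoseq3"},          # transcript evidence
    "maker2_round1": {"repeatmasker", "genblastg", "minimap2"},
    "augustus": set(),                           # assumption: inputs not given
    "maker2_round2": {"maker2_round1", "augustus"},  # assumption: iterative
    "maker2_round3": {"maker2_round2", "augustus"},
    "pasa": {"maker2_round3"},                   # refinement of MAKER2 output
}


def stage_order(pipeline):
    """Return one valid execution order that respects all dependencies."""
    return list(TopologicalSorter(pipeline).static_order())


def busco_checkpoints(pipeline):
    """Stages after which BUSCO evaluation is run, per the caption."""
    return [s for s in pipeline if s.startswith(("maker2", "pasa"))]


order = stage_order(PIPELINE)
checkpoints = busco_checkpoints(PIPELINE)
print(" -> ".join(order))
print("BUSCO after:", ", ".join(checkpoints))
```

The printed order is only one of several valid orderings, since the three evidence branches are independent of each other until MAKER2.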
