Table 1 Simulated data information. Gene sequences (taken from the corresponding reference gene in Ensembl) and the PacBio CCS error rates used to simulate them. We simulated multigene families using these gene sequences at the root. We refer to each resulting gene family by the name of its reference gene, sometimes dropping the last modifier (e.g. TSPY instead of TSPY13P).

From: Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon

| Reference gene | Sum of exon lengths (nt) | Number of exons | Overall simulation error rate (%) | Median errors per simulated transcript | Median no. of passes per simulated transcript |
| --- | --- | --- | --- | --- | --- |
| TSPY13P | 914 | 6 | 0.5 | 2 | 18 |
| HSFY2 | 2668 | 6 | 2.6 | 41 | 6 |
| DAZ2 | 5904 | 28 | 6.1 | 276 | 2 |