Extended Data Table 3 Summary of gene resolution in HG002 fully phased assemblies relative to the HG002 Q100 reference genome

From: Efficient near-telomere-to-telomere assembly of nanopore simplex reads

  1. Each assembly consists of two sets of sequences representing the paternal and maternal haplotypes. The two numbers in each cell correspond to the metrics for the two haplotypes, respectively. For each assembly or genome, genes were identified by aligning cDNA sequences with a sequence identity of at least 99%. ‘HG002 ref’ refers to the T2T Q100 reference genome for HG002, whose aligned genes were used as the ground truth. Gene completeness was evaluated by comparing the genes identified in each assembly with those in the HG002 reference. ‘Single-copy resolved’ denotes the number of single-copy genes in the HG002 reference that remain single copy in the assembly. ‘Multicopy resolved’ denotes the number of multicopy genes in the HG002 reference that remain multicopy in the assembly. ‘False duplicated’ refers to single-copy genes in the HG002 reference that appear as multicopy in the assembly. ‘Partially resolved’ refers to genes that are fully present but fragmented across multiple pieces in the assembly. ‘Unresolved’ (>50%, 10–50%, ≤10%) indicates genes that cannot be completely identified in the assembly, in which the aligned length covers more than 50%, between 10% and 50%, or at most 10% of the full gene length, respectively.
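The category definitions above can be read as a small decision procedure. The Python sketch below is only an illustration of that reading, not the authors' evaluation code: the function name `classify_gene`, its parameters, and the category labels are hypothetical, and it assumes that a gene counts as completely identified when its aligned length covers the full gene (`covered_fraction >= 1.0`). The caption does not name the case of a reference multicopy gene that appears as single copy in the assembly, so the sketch reports it as `other`.

```python
from collections import Counter


def classify_gene(ref_copies, asm_copies, covered_fraction, n_pieces=1):
    """Assign one gene to a caption category (hypothetical helper).

    ref_copies       -- copy number in the HG002 reference (ground truth)
    asm_copies       -- copy number found in the assembly haplotype
    covered_fraction -- best aligned length / full gene length, in [0, 1]
    n_pieces         -- number of assembly pieces the gene is split across
    """
    # Assumption: "completely identified" means full-length coverage.
    if covered_fraction < 1.0:
        if covered_fraction > 0.5:
            return "unresolved_gt50"
        if covered_fraction > 0.1:
            return "unresolved_10_50"
        return "unresolved_le10"  # exactly 10% lands here, per the '≤10%' header
    # Fully present, but fragmented across multiple pieces.
    if n_pieces > 1:
        return "partially_resolved"
    if ref_copies == 1 and asm_copies == 1:
        return "single_copy_resolved"
    if ref_copies == 1 and asm_copies > 1:
        return "false_duplicated"
    if ref_copies > 1 and asm_copies > 1:
        return "multicopy_resolved"
    # Not named in the caption (for example, multicopy collapsed to one copy).
    return "other"


# Made-up example genes: (ref_copies, asm_copies, covered_fraction, n_pieces)
genes = [
    (1, 1, 1.0, 1),
    (1, 2, 1.0, 1),
    (3, 2, 1.0, 1),
    (1, 1, 1.0, 2),
    (1, 1, 0.3, 1),
]
summary = Counter(classify_gene(*g) for g in genes)
```

Running the classifier once per haplotype would give the two numbers reported in each table cell, one for the paternal and one for the maternal sequence set.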