Table 2 The quality and performance of long-read assembly with NECAT.

From: Efficient assembly of nanopore reads via highly accurate and intact error correction

Genome	Pipeline	Assembly size (Mb)	Contigs	NG50 (Kb)	NGA50 (Kb)	MA/local MA	QV (pre-/post-polish)	BUSCO	Correct/contig/total time
E. coli	Ref.	4.6	1	4642	–	–	–/–	–	–/–/–
	Canu	4.6	1	4601	3335	2/18	18.0/22.1	18.4%	26.1/698.1/724.2
	Canu + S	4.6	1	4630	3287	3/2	18.6/22.2	19.8%	26.1/8.0/34.1
	Flye	4.6	1	4622	3071	2/2	20.2/22.6	20.2%	–/–/630.4
	NECAT	4.6	1	4595	3984	2/3	18.5/22.3	19.8%	1.6/1.2/2.8
S. cerevisiae	S228C	12.2	17	924	–	–	–/–	–	–/–/–
	Canu	12.7	26	814	703	38/33	22.3/28.5	98.5%	493.3/1029.9/1523.2
	Canu + S	12.4	19	815	705	34/29	22.7/28.9	98.2%	493.3/38.4/531.7
	Flye	12.3	26	943	706	21/26	21.8/29.0	98.5%	–/–/197.8
	NECAT	12.3	19	937	708	26/35	23.1/29.0	98.3%	4.4/4.9/9.3
A. thaliana	TAIR10	119.7	7	23,460	–	–	–/–	–	–/–/–
	Canu	113.4	288	6523	445	478/1152	15.6/19.5	98.5%	193.1/1229.9/1423.0
	Canu + S	115.6	44	11,071	527	576/1170	15.9/19.6	98.8%	193.1/125.9/319.0
	Flye	126.6	154	12,043	627	1085/1962	16.8/18.5	98.7%	–/–/59.4
	NECAT	122.9	136	11,157	582	886/1304	16.0/18.9	98.8%	19.8/28.0/47.9
D. melanogaster	dm6	143.7	1870	25,287	–	–	–/–	–	–/–/–
	Canu	146.8	499	3509	3240	1307/678	20.2/22.2	91.3%	289.6/1259.2/1548.8
	Canu + S	135.8	162	14,456	6473	587/333	20.8/23.2	91.6%	289.6/294.4/584.0
	Flye	139.9	593	11,925	5129	558/749	21.4/22.5	89.9%	–/–/127.9
	NECAT	143.0	277	18,072	6323	1117/1333	20.2/22.3	92.0%	37.7/32.7/70.4
C. reinhardtii	Ref. v5.5	111.1	53	7784	–	–	–/–	–	–/–/–
	Canu	116.4	93	4564	739	853/2269	19.3/22.2	97.9%	950.4/17,369.6/18,320.0
	Canu + S	109.7	46	4498	713	655/1629	20.1/23.0	97.7%	950.4/816.0/1766.4
	Flye	112.9	65	6573	831	764/2029	21.6/23.6	98.4%	–/–/185.8
	NECAT	113.4	54	6169	732	831/2273	19.8/22.4	98.0%	54.8/47.0/101.8
O. sativa	Ref.v4.0	382.8	15	30,829	–	–	–/–	–	–/–/–
	Canu	383.9	385	5041	2253	474/8334	15.9/15.9	58.6%	2768.0/16,800.0/19,568.0
	Canu + S	366.4	229	3586	1832	394/5116	16.3/16.3	59.2%	2768.0/1926.3/4694.3
	Flye	380.7	249	3552	2213	573/1742	16.4/16.3	59.2%	–/–/817.6
	NECAT	373.1	120	9650	3311	479/4873	16.0/16.3	58.4%	186.9/330.3/517.2
S. pennellii	Ref	915.6	899	2522	–	–	–/–	–	–/–/–
	Canu	961.8	2010	1664	797	5614/15,301	–/20.3	97.1%	5733.1/15,398.4/21,131.5
	Canu + S	915.6	899	2522	–	–	–/–	97.2%	5733.1/2510.2/8243.3
	Flye	1026.0	3180	1971	651	8504/10,726	16.0/18.5	96.7%	–/–/3590.8
	NECAT	991.8	1344	4802	992	5813/12,592	15.2/17.3	95.5%	799.6/2434.1/3233.7
NA12878 (rel3,4)	Ref38	3272.1	639	145,139	–	–	–/–	–	–/–/–
	Canu	2759.0	2337	5691	3368	1977/25,179	15.4/24.5	86.3%	–/–/60,000.0
	NECAT	2798.4	1494	14,066	9538	964/4591	16.6/24.6	74.9%	2217.6/5904.0/8121.6
NA12878 (rel6)	Flye	2867.0	3309	28,407	16,640	4054/7258	22.9/24.2	74.6%	–/–/2500.0
NA12878 (rel6)	NECAT	2846.9	1047	20,913	13,441	948/1467	23.1/24.4	74.5%	2518.4/6900.4/9418.8

“Assembly size” is the total number of base pairs in all contigs generated by the assembler. “NG50” is the contig length n such that contigs of length ≥ n together contain 50% of the reference genome size. “NGA50” is the NG50 of the aligned blocks obtained by breaking contigs at misassembly breakpoints. “MA/local MA” are the numbers of misassemblies and local misassemblies evaluated by QUAST. “QV” is defined as \(10 \times \log_{10}\left(\frac{100\ \mathrm{kbp}}{\#\ \mathrm{mismatches\ per\ 100\ kbp} + \#\ \mathrm{indels\ per\ 100\ kbp}}\right)\), where “# mismatches per 100 kbp” and “# indels per 100 kbp” are evaluated by QUAST. “BUSCO” is gene completeness evaluated by BUSCO. All pipelines were tested on the same computer with a 2.0 GHz CPU and 3 TB of RAM. For the first five data sets, we ran all pipelines with 32 threads; for O. sativa, S. pennellii, and the human data sets, we ran all pipelines with 64 threads. In both cases the correction and contig computational times were recorded. The S. pennellii assemblies of Canu and Canu + Smartdenovo (Canu + S) were acquired from https://www.plabipd.de/portal/solanum-pennellii; their NG50 values were longer than those of the assemblies we generated. The S. pennellii assembly of Canu + Smartdenovo was used as the reference genome, and therefore its NGA50, MA, and QV metrics were not evaluated. The NA12878 (rel3,4) assembly and running time of Canu were taken from a published paper. The NA12878 (rel6) assembly and running time of Flye were acquired from https://github.com/fenderglass/Flye.
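
The NG50 and QV definitions in the table notes can be illustrated with a minimal Python sketch. It is not the code used by QUAST or by the authors; the function names `ng50` and `qv` and the toy inputs are hypothetical and chosen only for illustration.

```python
import math


def ng50(contig_lengths, reference_size):
    """Return the contig length n at which contigs of length >= n
    first cover at least 50% of the reference genome size.

    Returns None if the assembly covers less than half of the reference.
    """
    covered = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= reference_size:
            return length
    return None


def qv(mismatches_per_100kbp, indels_per_100kbp):
    """Quality value as defined in the table notes:
    10 * log10(100 kbp / (mismatches + indels per 100 kbp)).
    """
    errors = mismatches_per_100kbp + indels_per_100kbp
    if errors <= 0:
        # No observed errors: the quality value is unbounded.
        return math.inf
    return 10 * math.log10(100_000 / errors)
```

For example, with hypothetical contig lengths `[50, 30, 20, 10]` and a reference size of 120, `ng50` returns 30, because the two longest contigs together cover 80 of the 120 bases. An assembly with 500 mismatches and 500 indels per 100 kbp has a 1% error rate, so `qv(500, 500)` returns 20.0.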
