Fig. 1: Flowchart of GoldRush and GoldPath. | Nature Communications

From: Linear time complexity de novo long read genome assembly with GoldRush

a Raw long reads are first processed by GoldPath to generate the golden path, a ~1X representation of the genome. The golden path is then polished by GoldPolish and corrected for structural errors with Tigmint-long. Finally, GoldChain scaffolds the polished and corrected golden path to produce the final genome assembly.

b GoldPath uses the input long sequencing reads or silver path sequences to initialize a miBf data structure. It then iterates over the sequences and queries each one against the miBf. If a sequence is found in the miBf, GoldPath skips it and moves on to the next sequence; otherwise, the sequence is inserted into the miBf and added to the silver or golden path being built. While constructing a silver path, GoldPath keeps recruiting bases from the input reads until the path reaches a threshold number of bases. Once the threshold is reached, GoldPath checks whether more silver paths are needed: if so, it creates them with the same algorithm and parameters; if not, it terminates. By default, five silver paths, each representing ~0.9X coverage of the target genome, are combined into a low-coverage subsample that GoldPath takes as input to build the golden path. When creating the golden path, GoldPath keeps iterating over the silver path sequences until all of them are exhausted.
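
The query-skip-insert loop in panel b can be sketched in a few lines of Python. This is a minimal, hypothetical illustration under stated assumptions, not GoldPath's implementation: a plain Python set of k-mers stands in for the miBf, a sequence counts as "found" when at least half of its k-mers are already present, and successive silver paths are assumed to consume the read stream in turn, each with a fresh filter. The names `build_path`, `goldpath`, `is_found`, `K`, and `MATCH_FRACTION` are illustrative only.

```python
from typing import Iterable, List, Optional, Set

# Hypothetical parameters: a set of k-mers stands in for the miBf, and a
# sequence is "found" when at least MATCH_FRACTION of its k-mers are present.
K = 5
MATCH_FRACTION = 0.5


def kmers(seq: str, k: int = K) -> Set[str]:
    """Return the set of k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def is_found(seq: str, seen: Set[str]) -> bool:
    """Query step: is this sequence already represented in the filter?"""
    ks = kmers(seq)
    if not ks:
        return True  # assumption: too short to contribute, so treat as found
    hits = sum(1 for km in ks if km in seen)
    return hits / len(ks) >= MATCH_FRACTION


def build_path(reads: Iterable[str], max_bases: Optional[float] = None) -> List[str]:
    """One pass: skip found sequences, insert and keep the rest.

    Stops once max_bases is reached (silver path) or when the input is
    exhausted (golden path, max_bases=None).
    """
    seen: Set[str] = set()  # fresh stand-in filter for every path
    path: List[str] = []
    total = 0
    for seq in reads:
        if is_found(seq, seen):
            continue  # already represented: skip to the next sequence
        seen.update(kmers(seq))  # insert the sequence's k-mers
        path.append(seq)
        total += len(seq)
        if max_bases is not None and total >= max_bases:
            break  # silver path has reached its base threshold
    return path


def goldpath(reads: Iterable[str], genome_size: int,
             n_silver: int = 5, silver_cov: float = 0.9) -> List[str]:
    """Build n_silver silver paths, then one golden path from their union."""
    # Assumption: silver paths draw consecutive reads from one shared stream.
    stream = iter(reads)
    silver_paths = [build_path(stream, silver_cov * genome_size)
                    for _ in range(n_silver)]
    pooled = (seq for sp in silver_paths for seq in sp)
    return build_path(pooled)  # golden path: run until input is exhausted
```

An exact set keeps every k-mer and grows with the data; a Bloom-filter-based structure such as the miBf instead uses fixed memory at the cost of a small false-positive rate on queries.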

Back to article page