Supplementary Figure 2: Illustration of clustering based on novo k-mers and associated read pairs.
From: novoBreak: local assembly for breakpoint detection in cancer genomes

Each breakpoint is covered by k-1 distinct k-mers. If a read fully covers a breakpoint, it must contain several of these k-mers (fewer than k-1 if there are sequencing errors). Conversely, given sufficient coverage, several read pairs should share identical k-mers. Based on this relationship, a union-find algorithm is applied to cluster the novo k-mers and their associated read pairs.
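
The clustering idea can be illustrated with a minimal Python sketch. It is not the novoBreak implementation: the names `UnionFind`, `kmers_of`, and `cluster_read_pairs` are hypothetical, read pairs are assumed to be plain `(read1, read2)` string tuples, the set of novo k-mers is assumed to be given, and reverse complements are ignored for brevity. Read pairs that share at least one novo k-mer are merged into the same cluster.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # Walk to the root, shortening the path along the way.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def kmers_of(seq, k):
    """Return the set of all k-mers in seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_read_pairs(read_pairs, novo_kmers, k):
    """Group read-pair indices that share at least one novo k-mer.

    Assumption: novo_kmers is a precomputed set of k-mer strings.
    Read pairs containing no novo k-mer end up as singleton clusters.
    """
    uf = UnionFind(len(read_pairs))
    first_seen = {}  # novo k-mer -> index of the first read pair containing it
    for idx, (read1, read2) in enumerate(read_pairs):
        shared = (kmers_of(read1, k) | kmers_of(read2, k)) & novo_kmers
        for kmer in shared:
            if kmer in first_seen:
                uf.union(first_seen[kmer], idx)
            else:
                first_seen[kmer] = idx

    clusters = defaultdict(list)
    for idx in range(len(read_pairs)):
        clusters[uf.find(idx)].append(idx)
    return sorted(clusters.values())
```

Linking each read pair only to the first pair seen with the same k-mer keeps the number of union operations linear in the number of k-mer occurrences, rather than comparing all pairs against each other.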