Figure 2

Overall system architecture and execution flow of minimap2-fpga. (a) Architecture of the CPU-FPGA heterogeneous system running minimap2-fpga. The multi-core CPU (Core 1, ..., Y) in the host handles software execution, while the FPGA in the device hosts multiple hardware kernels (Kernel 1, ..., N) designed for hardware-accelerated chaining. Host and device communicate over PCI Express. (b) Overall execution flow of the hardware-software integrated minimap2-fpga. The CPU launches multiple threads (up to Y) when minimap2-fpga starts. When a thread (denoted t) reaches the chaining stage, the chaining task is processed either on the FPGA as hardware or on the same CPU thread as software, based on predicted execution and wait times. The thread first predicts the time needed to complete the task in hardware (\(T_{hardware}\)) and in software (\(T_{software}\)). If \(T_{software} \le T_{hardware}\), the chaining task is executed in software on the same thread t. If \(T_{hardware} < T_{software}\), minimap2-fpga attempts to schedule the task on one of the N hardware kernels, taking into account the wait time (\(T_{wait}\)) for access to each kernel. It searches for a kernel whose total hardware processing time (\(T_{total} = T_{wait} + T_{hardware}\)) is less than \(T_{software}\). If such a kernel is found, the task is placed in that kernel's hardware access queue (queue i) and processed on the corresponding kernel (kernel i) once it is ready. Otherwise, the task runs in software on the same thread t. This prediction-based dynamic scheduling lets a larger number of software threads (up to Y) share a smaller number of hardware kernels (N) while maintaining efficient multi-threaded performance.
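
The decision rule in panel (b) can be sketched in a few lines of Python. This is only an illustrative sketch, not the minimap2-fpga implementation: the `Kernel` class, the `choose_target` function, and the choice to take the first qualifying kernel (rather than, say, the one with the shortest wait) are assumptions made for the example, and the predicted times are passed in as plain numbers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Kernel:
    """Hypothetical model of one hardware kernel and its access queue."""
    wait_time: float  # predicted T_wait before this kernel becomes available


def choose_target(t_software: float, t_hardware: float,
                  kernels: List[Kernel]) -> Optional[int]:
    """Return the index of the kernel to queue on, or None to run in software.

    Assumption: the first kernel that satisfies the condition is chosen.
    """
    # T_software <= T_hardware: stay on the calling CPU thread.
    if t_software <= t_hardware:
        return None
    # T_hardware < T_software: look for a kernel whose
    # T_total = T_wait + T_hardware is still below T_software.
    for i, kernel in enumerate(kernels):
        t_total = kernel.wait_time + t_hardware
        if t_total < t_software:
            return i
    # No kernel is fast enough once waiting is included: fall back to software.
    return None
```

For example, with \(T_{software} = 10\), \(T_{hardware} = 4\), and two kernels with predicted waits of 7 and 2, the first kernel gives \(T_{total} = 11\) and is rejected, while the second gives \(T_{total} = 6\) and is selected.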