SuperUROP: An FPGA-Based Spatial Accelerator for Sparse Matrix Operations
Abstract
Solving sparse systems of linear equations is a fundamental problem in numerical methods, with applications ranging from circuit design to urban planning. These problems can involve millions of constraints, for example when laying out transistors on a circuit or optimizing traffic light timings, which makes fast sparse solvers extremely important. However, the state-of-the-art software approaches to solving sparse linear systems, known as iterative solvers, are extremely inefficient on current hardware. This inefficiency has two main causes: (1) poor short-term data reuse, which leads to frequent, irregular memory accesses, and (2) complex data dependencies, which limit parallelism. In this paper, we present an FPGA implementation of Azul, an existing SRAM-only hardware accelerator that achieves both high memory bandwidth utilization and high arithmetic intensity. Azul features a grid of tiles, each composed of a processing element (PE) and a small independent SRAM memory, all connected by a network on chip (NoC). We implement Azul on an FPGA using simple RISC-V CPU cores connected to a memory hierarchy built from different FPGA memory modules. We use custom RISC-V ISA extensions to implement a task-based programming model for the PEs, allowing them to communicate over the NoC. Finally, we design simple distributed test cases to functionally verify the FPGA implementation and confirm that it performs equivalently to an architectural simulation of the Azul framework.
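To make the first source of inefficiency concrete, the following is a minimal sketch of an iterative solver over a matrix in compressed sparse row (CSR) form. It is an illustration only: the abstract does not say which iterative solver or storage format is used, so the choice of Jacobi iteration and CSR, along with the names `csr_matvec` and `jacobi` and the small 3x3 test system, are assumptions made for this example. The point to notice is the indirect load `x[col_idx[k]]`, whose address depends on the matrix data and therefore produces the irregular memory accesses described above.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Indirect access: which element of x is read depends on col_idx,
            # so the access pattern is irregular and reuse of x is poor.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def jacobi(values, col_idx, row_ptr, b, iterations=50):
    """Approximately solve A x = b with Jacobi iteration (illustrative choice)."""
    n = len(b)
    # Extract the diagonal of A from the CSR arrays.
    diag = [0.0] * n
    for row in range(n):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            if col_idx[k] == row:
                diag[row] = values[k]
    x = [0.0] * n
    for _ in range(iterations):
        ax = csr_matvec(values, col_idx, row_ptr, x)
        # Jacobi update: x <- x + D^{-1} (b - A x)
        x = [x[i] + (b[i] - ax[i]) / diag[i] for i in range(n)]
    return x


# Hypothetical test system: A = [[4, 1, 0], [1, 4, 1], [0, 1, 4]] in CSR form.
values = [4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
x_true = [1.0, 2.0, 3.0]
b = csr_matvec(values, col_idx, row_ptr, x_true)
x_approx = jacobi(values, col_idx, row_ptr, b)
```

The test matrix is diagonally dominant, so Jacobi iteration converges on it and `x_approx` approaches `x_true`. Every iteration streams through the whole matrix once and performs only one multiply-add per stored nonzero, which is why the memory behavior of the inner loop dominates the cost. This sketch illustrates only the first cause named in the abstract; it does not model the data dependencies behind the second.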