High-Performance Pipelined NTT Accelerators with Homogeneous Digit-Serial Modulo Arithmetic
Abstract
The Number Theoretic Transform (NTT) is a fundamental operation in privacy-preserving technologies, particularly within fully homomorphic encryption (FHE). The efficiency of NTT computation directly impacts the overall performance of FHE, making hardware acceleration a critical technology that will enable realistic FHE applications. Custom accelerators, in FPGAs or ASICs, offer significant performance advantages due to their ability to exploit massive parallelism and specialized optimizations. However, the operation of NTT over large moduli requires large word-length modulo arithmetic that limits achievable clock frequencies in hardware and increases hardware area costs. To overcome such deficits, digit-serial arithmetic has been explored for modular multiplication and addition independently. The goal of this work is to leverage digit-serial modulo arithmetic combined with appropriate redundant data representation to design modular pipelined NTT accelerators that operate uniformly on arbitrary small digits, without the need for intermediate (de)serialization. The proposed architecture enables high clock frequencies through regular pipelining while maintaining parallelism. Experimental results demonstrate that the proposed approach outperforms state-of-the-art implementations and reduces hardware complexity under equal performance and input-output bandwidth constraints.