HERP: Hardware for Energy Efficient and Realtime DB Search and Cluster Expansion in Proteomics
Abstract
Database (DB) search and clustering are fundamental in proteomics but conventional full clustering and search approaches demand high resources and incur long latency. We propose a lightweight incremental clustering and highly parallelizable DB search platform tailored for resource-constrained environments, delivering low energy and latency without compromising performance. By leveraging mass-spectrometry insights, we employ bucket-wise parallelization and query scheduling to reduce latency. A one-time hardware initialization with pre-clustered proteomics data enables continuous DB search and local re-clustering, offering a more practical and efficient alternative to clustering from scratch. Heuristics from pre-clustered data guide incremental clustering, accelerating the process by 20x with only a 0.3% increase in clustering error. DB search results overlap by 96% with state-of-the-art tools, validating search quality. The hardware leverages a 3T 2M T J SOT-CAM at the 7nm node with a compute-in-memory design. For the human genome draft dataset (131GB), setup requires 1.19mJ for 2M spectra, while a 1000 query search consumes 1.1{\mu}J. Bucket-wise parallelization further achieves 100x speedup.