DLA Demo

The hybrid fixed-point CNN accelerator (called H-NPU) is developed to further improve the energy efficiency of CNN model acceleration compared to floating-point GPU acceleration and even INT-8 GPU acceleration. By applying hybrid fixed-point arithmetic (i.e. 8-bit/4-bit/2-bit/1-bit) to different layers of the CNN, the proposed H-NPU further reduces model complexity and data bandwidth, achieving a 64% improvement over INT-8 GPU acceleration in the MobileNet + SSD example. The H-NPU has been verified on a Xilinx ZCU-102 FPGA with a performance of 690 GOPS (8-bit), 1.38 TOPS (4-bit), 2.76 TOPS (2-bit), and 5.52 TOPS (1-bit) when operated at 150 MHz. H-NPU 1.0 supports only one-stage CNN model inference; the next release, H-NPU 2.0, will support one-stage CNN model training as well as continual learning.
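As a rough illustration of the per-layer hybrid fixed-point idea, the sketch below quantizes a tensor to a configurable bit width. Both the `quantize` function and the per-layer bit assignment are hypothetical, assumed here for illustration only; the actual H-NPU quantization scheme is not described in this note.

```python
import numpy as np

def quantize(x, n_bits):
    """Hypothetical per-layer quantizer (illustration only):
    symmetric fixed-point for n_bits >= 2, sign (binary)
    quantization for n_bits == 1."""
    if n_bits == 1:
        # Binary: keep only the sign, scaled by the mean magnitude.
        scale = np.mean(np.abs(x))
        return np.sign(x) * scale
    # Symmetric range, e.g. [-127, 127] for 8 bits.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

# Illustrative per-layer bit assignment: full precision at the
# boundary layers, lower precision in the middle (an assumption,
# not the H-NPU's actual policy).
layer_bits = {"conv1": 8, "conv2": 4, "conv3": 2, "conv4": 1, "fc": 8}
```

Mixing bit widths this way is what lets the accelerator trade accuracy for bandwidth on a layer-by-layer basis, rather than forcing a single precision for the whole network.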

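The reported FPGA figures follow a simple pattern: peak throughput doubles each time the operand bit width is halved. The snippet below reproduces that scaling from the 8-bit baseline given in the text (the `peak_gops` helper is a hypothetical name, not part of any H-NPU API).

```python
# Figures from the text: 690 GOPS at 8-bit, measured on a
# Xilinx ZCU-102 FPGA at 150 MHz.
PEAK_GOPS_8BIT = 690

def peak_gops(n_bits):
    # Hypothetical helper: halving the precision doubles the
    # operations per cycle, hence the 8 // n_bits factor.
    assert n_bits in (1, 2, 4, 8)
    return PEAK_GOPS_8BIT * (8 // n_bits)

for bits in (8, 4, 2, 1):
    print(f"{bits}-bit: {peak_gops(bits) / 1000:.2f} TOPS")
# → 0.69, 1.38, 2.76, and 5.52 TOPS, matching the reported numbers.
```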