DLA Demo

The hybrid fixed-point CNN accelerator (called H-NPU) is developed to further improve the energy efficiency of CNN model acceleration compared to floating-point GPU acceleration and even INT-8 GPU acceleration. By applying hybrid fixed-point arithmetic (i.e. 8-bit/4-bit/2-bit/1-bit) to different layers of the CNN, the proposed H-NPU further reduces model complexity and data bandwidth, achieving a 64% improvement over INT-8 GPU acceleration in the MobileNet + SSD example. The H-NPU has been verified on a Xilinx ZCU-102 FPGA with a performance of 690 GOPS (8-bit), 1.38 TOPS (4-bit), 2.76 TOPS (2-bit), and 5.52 TOPS (1-bit) when operated at 150 MHz. H-NPU 1.0 supports only one-stage CNN model inference; the next release, H-NPU 2.0, will support one-stage CNN model training as well as continual learning.
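As a rough illustration of the per-layer hybrid fixed-point idea, the sketch below quantizes a tensor to a configurable bit width. Both the `quantize` function and the per-layer bit assignment are hypothetical, assumed here for illustration only; the actual H-NPU quantization scheme is not described in this note.

```python
import numpy as np

def quantize(x, n_bits):
    """Hypothetical per-layer quantizer (illustration only):
    symmetric fixed-point for n_bits >= 2, sign (binary)
    quantization for n_bits == 1."""
    if n_bits == 1:
        # Binary: keep only the sign, scaled by the mean magnitude.
        scale = np.mean(np.abs(x))
        return np.sign(x) * scale
    # Symmetric range, e.g. [-127, 127] for 8 bits.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

# Illustrative per-layer bit assignment: full precision at the
# boundary layers, lower precision in the middle (an assumption,
# not the H-NPU's actual policy).
layer_bits = {"conv1": 8, "conv2": 4, "conv3": 2, "conv4": 1, "fc": 8}
```

Mixing bit widths this way is what lets the accelerator trade accuracy for bandwidth on a layer-by-layer basis, rather than forcing a single precision for the whole network.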

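The reported FPGA figures follow a simple pattern: peak throughput doubles each time the operand bit width is halved. The snippet below reproduces that scaling from the 8-bit baseline given in the text (the `peak_gops` helper is a hypothetical name, not part of any H-NPU API).

```python
# Figures from the text: 690 GOPS at 8-bit, measured on a
# Xilinx ZCU-102 FPGA at 150 MHz.
PEAK_GOPS_8BIT = 690

def peak_gops(n_bits):
    # Hypothetical helper: halving the precision doubles the
    # operations per cycle, hence the 8 // n_bits factor.
    assert n_bits in (1, 2, 4, 8)
    return PEAK_GOPS_8BIT * (8 // n_bits)

for bits in (8, 4, 2, 1):
    print(f"{bits}-bit: {peak_gops(bits) / 1000:.2f} TOPS")
# → 0.69, 1.38, 2.76, and 5.52 TOPS, matching the reported numbers.
```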