Resource and delay efficient matrix multiplication using newer FPGA devices

Matrix multiplication is a fundamental building block for many applications, including image processing, coding, and digital signal processing. This paper presents a delay- and resource-efficient methodology for implementing integer and floating-point matrix multiplication on FPGAs. We present a scalable architecture based on new FPGA features that significantly reduces total computation time and resource utilization compared with previous solutions. The implementation of our algorithm for various matrix dimensions on Xilinx FPGAs is also described. When compared to the best previously reported method, our approach achieves a 60% improvement in parallelization for 64-bit floating-point computations. Copyright 2006 ACM.
