Matrix decomposition and computation constitute an important part of various signal processing, image processing, and communication systems. A better solution in terms of power, performance, and area can lead to improved performance of the whole system. Designing and testing a new idea is a big challenge due to time limitations, so a better implementation flow using High-Level Synthesis is discussed. This flow is used to implement QR decomposition algorithms. Three different QR factorization techniques, Gram-Schmidt, Givens Rotation, and Householder Transformation, are discussed. These algorithms are compared in terms of area, performance, and precision. Each algorithm is implemented in two variations that differ in the data type used: a 32-bit floating-point implementation and a 16-bit fixed-point implementation. Results for different designs with various optimization techniques, such as loop unrolling and pipelining, are presented. A scalable architecture is implemented for all the algorithms, which are compared for a 10 × 10 matrix architecture. Results for a scaled-up 100 × 100 matrix architecture are also discussed for the Gram-Schmidt algorithm. Gram-Schmidt had the best performance overall. Using different optimizations, the performance of the Gram-Schmidt algorithm was improved by a factor of 3 for the 10 × 10 matrix size and by a factor of up to 10 for the 100 × 100 matrix size. Givens Rotation was close in terms of performance, but the Householder Transformation was four times slower than the other two algorithms because of its higher complexity. All floating-point implementations had nearly 100% precision, whereas the fixed-point implementations showed an average error of 3% to 5% for the 10 × 10 implementation.
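To make the Gram-Schmidt factorization concrete, the following is a minimal C++ sketch of the modified Gram-Schmidt variant in 32-bit floating point. It is an illustration under stated assumptions, not the code from this work: the names `N`, `Matrix`, `qr_gram_schmidt`, and `max_reconstruction_error` are hypothetical, the matrix size is fixed at 3 × 3 for brevity, the input is assumed to have full column rank, and no HLS directives or fixed-point types are shown.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical names for illustration only; not taken from the thesis.
constexpr std::size_t N = 3;
using Matrix = std::array<std::array<float, N>, N>;

// Modified Gram-Schmidt: factors A into Q (orthonormal columns) and
// upper-triangular R so that A is approximately Q * R.
// Assumption: A has full column rank.
void qr_gram_schmidt(const Matrix& A, Matrix& Q, Matrix& R) {
    Matrix V = A;  // working copy whose columns are orthogonalized in place
    for (auto& row : R) row.fill(0.0f);

    for (std::size_t j = 0; j < N; ++j) {
        // Norm of the current column becomes the diagonal entry of R.
        float norm_sq = 0.0f;
        for (std::size_t i = 0; i < N; ++i) norm_sq += V[i][j] * V[i][j];
        const float norm = std::sqrt(norm_sq);
        assert(norm > 0.0f && "matrix must have full column rank");
        R[j][j] = norm;

        // Normalize the column to obtain the j-th column of Q.
        for (std::size_t i = 0; i < N; ++i) Q[i][j] = V[i][j] / norm;

        // Remove the component along Q's j-th column from later columns.
        for (std::size_t k = j + 1; k < N; ++k) {
            float dot = 0.0f;
            for (std::size_t i = 0; i < N; ++i) dot += Q[i][j] * V[i][k];
            R[j][k] = dot;
            for (std::size_t i = 0; i < N; ++i) V[i][k] -= dot * Q[i][j];
        }
    }
}

// Largest absolute difference between A and the product Q * R.
float max_reconstruction_error(const Matrix& A, const Matrix& Q, const Matrix& R) {
    float worst = 0.0f;
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < N; ++k) sum += Q[i][k] * R[k][j];
            worst = std::max(worst, std::fabs(sum - A[i][j]));
        }
    }
    return worst;
}
```

Calling `qr_gram_schmidt(A, Q, R)` and then `max_reconstruction_error(A, Q, R)` gives a simple way to compare the precision of different data types. The fixed loop bounds are the kind of structure to which loop unrolling and pipelining can be applied in an HLS flow.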
All the algorithms were coded in C++ and synthesized with the Xilinx Vivado HLS 2016.4 High-Level Synthesis tool. This generated an IP core, which was imported into Xilinx Vivado 2016.4 for implementation. The design was targeted for the ZedBoard, a Zynq-7020 Extensible Processing Platform (EPP) development kit, which has a Xilinx 7 series FPGA architecture and a dual-core ARM Cortex-A9 processor. M.S. in Computer Engineering, May 2017