This document discusses methods for reducing the computational complexity of mathematical functions on Field Programmable Gate Arrays (FPGAs) through parallel processing. It applies optimization directives, namely loop unrolling and pipelining, to fixed-point and floating-point addition and to matrix multiplication, and reports significant improvements in execution time and hardware efficiency. Experimental results show reduced delay and increased hardware usage compared with conventional sequential processing.
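As a rough illustration of how such directives are typically attached to a matrix multiplication, here is a minimal sketch in C. It assumes a Vitis HLS style pragma syntax, which the text above does not confirm. The function name `matmul_fixed`, the dimension `N`, and the use of 32-bit integers to stand in for fixed-point values are hypothetical choices made for this example only.

```c
#include <stdint.h>

#define N 4  /* hypothetical matrix dimension, chosen for illustration */

/* Integer (fixed-point style) N x N matrix multiply: c = a * b.
 * The pragmas use Vitis HLS directive syntax, which is an assumption here.
 * An ordinary C compiler ignores them and runs the loops sequentially,
 * so the numerical result is the same with or without the directives. */
void matmul_fixed(int32_t a[N][N], int32_t b[N][N], int32_t c[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
/* Pipelining: ask the tool to start a new (i, j) iteration every cycle. */
#pragma HLS PIPELINE II=1
            int32_t acc = 0;
            for (int k = 0; k < N; k++) {
/* Unrolling: ask the tool to replicate the multiply-add N times in hardware. */
#pragma HLS UNROLL
                acc += a[i][k] * b[k][j];
            }
            c[i][j] = acc;
        }
    }
}
```

Unrolling the inner loop replicates the multiplier and adder hardware, and pipelining overlaps successive outer iterations. This matches the trade-off stated above: delay goes down while hardware usage goes up. The delay and resource figures actually achieved depend on the tool and target device, so none are implied by this sketch.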