This document describes an implementation of cyclic convolution based on the Fermat Number Transform (FNT). It proposes a Code Convolution method Without Addition (CCWA) and a Butterfly Operation method Without Addition (BOWA), which perform the FNT and the inverse FNT without carry-propagation additions except in the final stages. Avoiding these additions reduces delay compared with other cyclic convolution architectures. A parallel architecture is presented that combines these techniques with Modulo 2^n+1 Partial Product Multipliers, which perform the point-wise multiplications with lower hardware complexity. Synthesis results show that this architecture achieves higher throughput and lower hardware complexity than previously reported solutions.
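The overall data flow (forward FNT of both inputs, point-wise multiplication, inverse FNT) can be illustrated in software. The following is a minimal sketch that assumes the Fermat prime F_4 = 2^16+1 and transform lengths N dividing 32, so that every twiddle factor is a power of two. It uses ordinary integer arithmetic with a direct O(N^2) transform. It does not model CCWA, BOWA, or the partial product multipliers, and all function names are hypothetical.

```python
# Fermat number F_4 = 2**16 + 1 = 65537, which is prime.
F4 = 2**16 + 1
# 2 has multiplicative order 32 modulo F4 (2**16 = -1, so 2**32 = 1).
ORDER_OF_2 = 32


def fnt(x, alpha, mod=F4):
    """Direct O(N^2) number-theoretic transform of x with root alpha modulo mod."""
    n = len(x)
    return [
        sum(x[i] * pow(alpha, i * k, mod) for i in range(n)) % mod
        for k in range(n)
    ]


def inverse_fnt(X, alpha, mod=F4):
    """Inverse transform: forward transform with alpha^-1, then scale by N^-1."""
    n = len(X)
    alpha_inv = pow(alpha, -1, mod)  # modular inverse (Python 3.8+)
    n_inv = pow(n, -1, mod)
    return [(n_inv * v) % mod for v in fnt(X, alpha_inv, mod)]


def fnt_cyclic_convolution(x, h):
    """Cyclic convolution modulo F4: FNT -> point-wise product -> inverse FNT."""
    n = len(x)
    if len(h) != n or ORDER_OF_2 % n != 0:
        raise ValueError("x and h must have equal length N dividing 32")
    # alpha = 2**(32/N) is a primitive N-th root of unity modulo F4,
    # so every twiddle factor alpha**(i*k) is a power of two.
    alpha = pow(2, ORDER_OF_2 // n, F4)
    X = fnt(x, alpha)
    H = fnt(h, alpha)
    Y = [(a * b) % F4 for a, b in zip(X, H)]  # point-wise multiplications
    return inverse_fnt(Y, alpha)


def direct_cyclic_convolution(x, h, mod=F4):
    """Reference: y[k] = sum_i x[i] * h[(k - i) mod N], reduced modulo mod."""
    n = len(x)
    return [sum(x[i] * h[(k - i) % n] for i in range(n)) % mod for k in range(n)]


print(fnt_cyclic_convolution([1, 2, 3, 4], [5, 6, 7, 8]))  # [66, 68, 66, 60]
```

Because the twiddle factors are powers of two, a hardware FNT can replace multiplications by shifts in the transform stages. The result equals the exact integer cyclic convolution only while every output value stays below 2^16+1.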