The document presents an FPGA-based acceleration methodology and an accompanying performance model for iterative stencil algorithms, highlighting techniques for intra-iteration parallelization and data management that improve performance and efficiency. It reviews previous work, describes the experimental setup, and reports benchmarking results that demonstrate significant gains in performance and energy efficiency. The authors also outline future work and potential scaling to multi-FPGA systems.