The document describes optimizing a 9-point image blurring algorithm on Intel Xeon and Xeon Phi processors. When the algorithm was initially run serially, Xeon was over 11 times faster than Xeon Phi. Adding OpenMP pragmas to enable vectorization improved performance and narrowed the gap, leaving Xeon over 3 times faster than Xeon Phi. Further optimizations discussed include adding thread parallelism and improving data access patterns.
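To make the vectorization step concrete, here is a minimal sketch of a 9-point blur whose inner loop is marked with an OpenMP SIMD pragma. The summary does not show the original code, so everything here is an assumption for illustration: the function name `blur9`, the row-major `float` grayscale layout, the unweighted 3x3 average, and the choice to copy border pixels unchanged. `#pragma omp simd` is one standard way to request vectorization and may not be the exact pragma the document used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the original document): replaces every
 * interior pixel of a row-major grayscale image with the unweighted mean
 * of itself and its 8 neighbors, 9 points in total. Border pixels are
 * copied unchanged to keep the sketch short. */
void blur9(const float *restrict in, float *restrict out,
           size_t width, size_t height)
{
    assert(width >= 3 && height >= 3);

    /* Copy the top and bottom rows. */
    for (size_t x = 0; x < width; ++x) {
        out[x] = in[x];
        out[(height - 1) * width + x] = in[(height - 1) * width + x];
    }
    /* Copy the left and right columns. */
    for (size_t y = 0; y < height; ++y) {
        out[y * width] = in[y * width];
        out[y * width + width - 1] = in[y * width + width - 1];
    }

    for (size_t y = 1; y < height - 1; ++y) {
        const float *up   = in + (y - 1) * width;
        const float *mid  = in + y * width;
        const float *down = in + (y + 1) * width;
        float *dst = out + y * width;

        /* Ask the compiler to vectorize the inner loop. This is safe
         * because in and out do not overlap and iterations are independent. */
        #pragma omp simd
        for (size_t x = 1; x < width - 1; ++x) {
            dst[x] = (up[x - 1]   + up[x]   + up[x + 1] +
                      mid[x - 1]  + mid[x]  + mid[x + 1] +
                      down[x - 1] + down[x] + down[x + 1]) / 9.0f;
        }
    }
}
```

The pragma only takes effect when OpenMP SIMD support is enabled at compile time, for example with `-fopenmp-simd` on GCC or Clang, or `-qopenmp-simd` on the Intel compiler; otherwise it is ignored and the loop is left to the compiler's own auto-vectorizer. Keeping the inner loop over `x` means consecutive iterations touch consecutive memory, which is the access pattern vector units handle best.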