This weekly report describes a new CUDA method for computing integral images that improves performance. The method processes one pixel per thread and uses shared memory to hold the intermediate values from the previous row: each thread calculates its value for the current row, then stores the result back to shared memory so that it is available when the next row is processed. The report also describes using Nsight, an IDE for CUDA applications, to debug problems that arise with the new compute-by-row method.
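The row-by-row scheme can be sketched as follows. This is a minimal illustration, not the report's actual implementation. The names `integralByRow` and `integralImageByRow` are hypothetical, and so are the single-block launch, the limit of 1024 pixels per row, and the use of a Hillis-Steele scan for the within-row prefix sum. Each thread owns one column. The running integral of the previous row stays in shared memory and is overwritten with the current row's result.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// Hypothetical kernel: one thread per pixel of the current row.
// Shared memory layout: [prev row integral | scan buffer], blockDim.x floats each.
__global__ void integralByRow(const float* in, float* out, int width, int height)
{
    extern __shared__ float smem[];
    float* prev = smem;               // integral values of the previous row
    float* scan = smem + blockDim.x;  // scratch buffer for the row prefix sum
    const int x = threadIdx.x;

    prev[x] = 0.0f;                   // no row above the first one

    for (int y = 0; y < height; ++y) {
        // Load this thread's pixel; padding threads contribute zero.
        scan[x] = (x < width) ? in[y * width + x] : 0.0f;
        __syncthreads();

        // Inclusive Hillis-Steele scan along the row.
        for (int offset = 1; offset < blockDim.x; offset <<= 1) {
            float left = (x >= offset) ? scan[x - offset] : 0.0f;
            __syncthreads();
            scan[x] += left;
            __syncthreads();
        }

        // Current row = row prefix sum + integral of the row above.
        float result = scan[x] + prev[x];
        prev[x] = result;             // kept in shared memory for the next row
        if (x < width) out[y * width + x] = result;
    }
}

static void check(cudaError_t err)
{
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Hypothetical host wrapper; assumes the whole row fits in one thread block.
std::vector<float> integralImageByRow(const std::vector<float>& img, int width, int height)
{
    if (width <= 0 || height <= 0 || width > 1024 ||
        img.size() != static_cast<size_t>(width) * height)
        throw std::invalid_argument("unsupported image size for this sketch");

    const int threads = ((width + 31) / 32) * 32;  // round up to a full warp
    const size_t bytes = img.size() * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;
    check(cudaMalloc(&dIn, bytes));
    check(cudaMalloc(&dOut, bytes));
    check(cudaMemcpy(dIn, img.data(), bytes, cudaMemcpyHostToDevice));

    integralByRow<<<1, threads, 2 * threads * sizeof(float)>>>(dIn, dOut, width, height);
    check(cudaGetLastError());

    std::vector<float> out(img.size());
    check(cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost));
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

Calling `integralImageByRow(image, width, height)` on a row-major image returns the integral image in the same layout. In this sketch each `prev` slot is touched only by its own thread, so a register would work equally well. Shared memory is used here only to mirror the description above. Wider images would need a multi-block scan, which this sketch does not attempt.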