The document describes how to optimize a lighting calculation for the SPU: analyze the memory requirements, partition the data, and rearrange it to fit a streaming model. It then works through an example lighting function, ending with a hand-vectorized version that processes 4 vertices simultaneously. Compiler hints reduced the cost from 231.6 to 208.5 cycles per vertex per light (roughly a 10% reduction), and manual vectorization is estimated to improve performance further.
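
To make the "4 vertices simultaneously" idea concrete, here is a minimal sketch in portable C rather than SPU intrinsics. It is not the document's actual function: the `Vec3x4` type and the `diffuse4` helper are hypothetical names, and the clamped N·L diffuse term is only an assumed stand-in for the real lighting math. The point it illustrates is the data rearrangement: storing each component of four vertices contiguously, so that every array of four floats could occupy one 128-bit SPU register.

```c
#include <stddef.h>

/* Hypothetical structure-of-arrays batch: four vertices, with each
 * component stored contiguously. On the SPU, each float[4] here would
 * correspond to one 128-bit vector register. */
typedef struct {
    float x[4];
    float y[4];
    float z[4];
} Vec3x4;

/* Assumed stand-in for the lighting math: the diffuse term max(N.L, 0),
 * computed for four vertices in one pass. Every lane executes the same
 * operations, so the loop body maps directly onto 4-wide SIMD
 * multiplies and adds. */
static void diffuse4(const Vec3x4 *normal, const Vec3x4 *to_light,
                     float out[4])
{
    for (size_t i = 0; i < 4; ++i) {
        float d = normal->x[i] * to_light->x[i]
                + normal->y[i] * to_light->y[i]
                + normal->z[i] * to_light->z[i];
        out[i] = d > 0.0f ? d : 0.0f;  /* clamp back-facing lanes to zero */
    }
}
```

With this layout no lane ever needs a value from another lane, which is why a batch of four is the natural unit for hand vectorization. An array-of-structures layout (x, y, z per vertex) would instead require shuffling components between registers before the dot products could be computed.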