This document describes research on correcting text extracted from PDF files for the LDS Church. It outlines the problems found in the extracted text, among them combined (run-together) words, empty lines, and incorrect casing. It reviews previous work on related tasks and presents the methods used in this research: splitting combined words probabilistically, standardizing whitespace, and removing unwanted characters. In the reported results, the word-splitting method corrected over 80% of combined words while making few incorrect splits, which indicates that the implemented solutions are effective.
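
To make the three methods named above concrete, here is a minimal sketch in Python. It is not the implementation used in the research, whose details this summary does not give. It assumes a unigram word model with a dynamic-programming search for the most probable split, and a simple regex-based cleanup. The names `TOY_COUNTS`, `word_logprob`, `split_combined`, and `clean_text`, the toy word counts, and the unknown-word penalty are all hypothetical and chosen only for illustration.

```python
import math
import re

# Hypothetical toy unigram counts; a real system would estimate these
# from a large corpus. None of these numbers come from the research.
TOY_COUNTS = {"the": 50, "church": 10, "history": 8, "of": 40,
              "in": 30, "a": 25, "is": 20}
TOTAL = sum(TOY_COUNTS.values())


def word_logprob(word):
    """Log-probability of a lowercase word under the toy unigram model."""
    count = TOY_COUNTS.get(word)
    if count is None:
        # Assumed penalty for unknown words: it grows with length, and
        # splitting an unknown string into pieces always costs more than
        # leaving it whole, so unknown tokens are not shredded.
        return math.log(1.0 / TOTAL) - 2.0 * len(word)
    return math.log(count / TOTAL)


def split_combined(token, max_len=20):
    """Return the most probable segmentation of a token as a list of pieces."""
    lowered = token.lower()
    n = len(lowered)
    # best[i] = (best score for the first i characters, start of last piece)
    best = [(0.0, 0)] + [(-math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start][0] + word_logprob(lowered[start:end])
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk the back-pointers, slicing the original token to keep its casing.
    pieces = []
    end = n
    while end > 0:
        start = best[end][1]
        pieces.append(token[start:end])
        end = start
    return pieces[::-1]


def clean_text(text):
    """Remove control characters, standardize whitespace, drop empty lines."""
    # Strip control characters except tab, newline, and carriage return.
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim each line and discard the ones that end up empty.
    lines = [line.strip() for line in text.split("\n")]
    return "\n".join(line for line in lines if line)
```

With the toy counts above, `split_combined("historyofthechurch")` returns `['history', 'of', 'the', 'church']`, while an out-of-vocabulary token such as `"xyz"` is returned unsplit. How well such a splitter works in practice depends on the word counts and on the unknown-word penalty, both of which are placeholders here.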