The document describes a machine learning pipeline for photo OCR made up of several stages. Sliding window detection scans windows of different sizes across an image and classifies each patch, for example to detect pedestrians. Sliding window text detection applies the same idea to text, producing a map in which white and gray regions indicate the estimated probability that text is present. Artificial data synthesis enlarges the training set by generating synthetic examples that render text in varied fonts on different backgrounds.
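The sliding window step can be sketched as follows. This is a minimal illustration rather than the pipeline's actual implementation: the names `sliding_windows`, `detect`, and `mean_brightness` are hypothetical. The image is assumed to be a list of rows of grayscale values in [0, 1], and a toy brightness score stands in for a trained classifier.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Image = List[List[float]]          # rows of grayscale values in [0, 1] (assumed format)
Window = Tuple[int, int, int]      # (top, left, size) of a square window


def sliding_windows(height: int, width: int,
                    sizes: Sequence[int], stride: int) -> Iterator[Window]:
    """Yield every square window of each size that fits inside the image."""
    for size in sizes:
        for top in range(0, height - size + 1, stride):
            for left in range(0, width - size + 1, stride):
                yield top, left, size


def detect(image: Image, classify: Callable[[Image], float],
           sizes: Sequence[int], stride: int,
           threshold: float = 0.5) -> List[Tuple[int, int, int, float]]:
    """Score each window's patch and keep those at or above the threshold."""
    height, width = len(image), len(image[0])
    hits = []
    for top, left, size in sliding_windows(height, width, sizes, stride):
        patch = [row[left:left + size] for row in image[top:top + size]]
        score = classify(patch)
        if score >= threshold:
            hits.append((top, left, size, score))
    return hits


def mean_brightness(patch: Image) -> float:
    """Toy stand-in for a trained classifier: average pixel value of the patch."""
    total = sum(sum(row) for row in patch)
    return total / (len(patch) * len(patch[0]))
```

Calling `detect(image, mean_brightness, sizes=[2], stride=1, threshold=0.75)` on a 4x4 image whose only bright pixels form the bottom-right 2x2 block returns a single hit at `(2, 2)`. In a real system the classifier would be a trained model, and patches from larger windows would typically be resized to the classifier's input size before scoring.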