Tesseract is an open-source optical character recognition (OCR) engine developed between 1984 and 1994. It uses a multi-step pipeline for processing images, including connecting component analysis, organizing blobs into text lines and words, and a two-pass recognition process. Tesseract can handle both black-on-white and white-on-black text through analyzing nesting of outlines.