The document outlines a method for separating handwritten from machine-printed text in images using the bag-of-visual-words model, which describes image content through clusters of local features such as SIFT descriptors. It details codebook creation, visual word descriptor formation, and classification with support vector machines. Evaluation of the proposed method shows high F-measure scores across several datasets, although challenges such as binarization failures and overlapping text remain.
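
The codebook and descriptor steps summarized above can be illustrated with a minimal sketch. It is not the document's implementation: it assumes plain k-means for clustering (the summary only says "clusters"), uses toy 2-D vectors in place of 128-D SIFT descriptors, and the function names `nearest`, `build_codebook`, and `bovw_histogram` are hypothetical. The support vector machine step is left out to keep the sketch dependency-free; the normalized histogram is the kind of feature vector such a classifier would be trained on.

```python
import math
import random


def nearest(vec, centers):
    """Return the index of the center closest to vec (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(vec, centers[i]))


def build_codebook(descriptors, k, iters=20, seed=0):
    """Cluster local descriptors with plain k-means (an assumption);
    the resulting cluster centers play the role of visual words."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(d, centers)].append(d)
        for i, g in enumerate(groups):
            if g:  # keep the previous center if a cluster received no points
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def bovw_histogram(descriptors, codebook):
    """Encode one region as a normalized histogram of visual-word counts."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy 2-D descriptors stand in for SIFT vectors (illustrative data only).
toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
       [9.8, 10.0], [10.1, 9.9], [10.0, 10.2]]
codebook = build_codebook(toy, k=2)
print(bovw_histogram([[0.1, 0.1], [9.9, 10.0], [10.0, 10.0]], codebook))
```

In a fuller pipeline, one such histogram would be computed per text region and passed, together with its handwritten or machine-printed label, to an SVM trainer.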