This document proposes a system that annotates faces in videos with names by matching faces found in video frames against a database of labeled face images. The key steps are:
1) Extract frames from the input video.
2) Detect faces in each frame and match them against labeled faces in the database to retrieve name annotations.
3) Associate the retrieved names with the corresponding faces in each frame.
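Steps 2 and 3 can be illustrated with a minimal sketch. It assumes each detected face has already been reduced to a numeric feature vector and that matching is done by cosine similarity with a fixed threshold; the source does not specify the feature representation or the matching rule, so `cosine_similarity`, `retrieve_name`, `annotate_frames`, the threshold value, and the toy vectors are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_name(face_vector, labeled_faces, threshold=0.8):
    """Return the name of the most similar labeled face, or None if no
    labeled face reaches the (assumed) similarity threshold."""
    best_name, best_score = None, threshold
    for name, reference in labeled_faces:
        score = cosine_similarity(face_vector, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

def annotate_frames(frames, labeled_faces, threshold=0.8):
    """frames: one list of face feature vectors per video frame.
    Returns one list of names (or None) per frame, in the same order."""
    return [
        [retrieve_name(face, labeled_faces, threshold) for face in faces]
        for faces in frames
    ]

# Toy database and two frames; the vectors are made up for illustration.
labeled_faces = [("Alice", [1.0, 0.0, 0.0]), ("Bob", [0.0, 1.0, 0.0])]
frames = [
    [[0.9, 0.1, 0.0], [0.1, 0.95, 0.0]],  # frame 1: two faces
    [[0.0, 0.0, 1.0]],                    # frame 2: one unknown face
]
annotations = annotate_frames(frames, labeled_faces)
print(annotations)
```

In this sketch a face with no sufficiently similar database entry is left unlabeled (`None`) instead of being forced onto the nearest name.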
Conditional random fields (CRFs) model the relationships between detected faces and names, and the optimal face labeling is the one that maximizes the joint probability. A within-video face labeling algorithm is presented that applies belief propagation to a constructed graph. Preliminary implementation results cover the first step: taking a video as input and extracting its frames.
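The idea of choosing the labeling that maximizes the joint probability can be sketched on a small example. The source does not describe the graph structure or the potentials, so this sketch makes strong assumptions: the faces form a simple chain (for instance, one face tracked across consecutive frames), unary scores are log-scores from database matching, and a pairwise term penalizes neighboring faces that receive different names. On a chain, max-product belief propagation is exact and coincides with the Viterbi algorithm; a general graph with loops would require loopy belief propagation, which is only approximate. All names and numbers below are hypothetical.

```python
def map_labeling(unary, pairwise):
    """Max-product message passing on a chain, in the log domain.

    unary:    list of dicts, one per face, mapping name -> log-score
    pairwise: function (name_a, name_b) -> log-score for adjacent faces
    Returns the name sequence with the highest total score.
    """
    names = list(unary[0])
    best = {n: unary[0][n] for n in names}  # best score of a path ending in n
    back = []                               # back-pointers, one dict per step
    for node in unary[1:]:
        new_best, pointers = {}, {}
        for cur in names:
            prev = max(names, key=lambda p: best[p] + pairwise(p, cur))
            new_best[cur] = best[prev] + pairwise(prev, cur) + node[cur]
            pointers[cur] = prev
        back.append(pointers)
        best = new_best
    # Trace the best path backwards from the best final name.
    labels = [max(names, key=lambda n: best[n])]
    for pointers in reversed(back):
        labels.append(pointers[labels[-1]])
    labels.reverse()
    return labels

def same_name_bonus(a, b):
    """Assumed pairwise log-potential: neighbors prefer the same name."""
    return 0.0 if a == b else -1.0

# Three appearances of one tracked face; the middle one is ambiguous.
unary = [
    {"Alice": -0.2, "Bob": -1.6},
    {"Alice": -0.8, "Bob": -0.6},
    {"Alice": -0.3, "Bob": -1.2},
]
independent = [max(node, key=node.get) for node in unary]
joint = map_labeling(unary, same_name_bonus)
print(independent)
print(joint)
```

Labeling each face independently flips the ambiguous middle face to a different name, while the joint labeling keeps the track consistent. That consistency is the motivation for modeling relationships between faces instead of treating each detection in isolation.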