This document describes how to build a video conferencing application using VoiceXML and off-the-shelf components. It covers the system architecture, including the major components (the video conference application and the video-enabled media server) and the SIP and RTP protocols they rely on. It also describes how conference participants are controlled through features such as muting, priority speaker selection, and manual or automated video source control.
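The participant controls listed above can be pictured as a small piece of conference state. The following is only a minimal sketch, not the application's actual design: the `Conference` and `Participant` classes, their method names, and the selection rule (a manual choice wins, then an unmuted priority speaker, then the unmuted active talker) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Participant:
    name: str
    muted: bool = False


@dataclass
class Conference:
    """Hypothetical conference state; names and rules are illustrative."""
    participants: Dict[str, Participant] = field(default_factory=dict)
    priority_speaker: Optional[str] = None
    manual_video_source: Optional[str] = None  # None means automated mode

    def join(self, name: str) -> None:
        self.participants[name] = Participant(name)

    def mute(self, name: str, muted: bool = True) -> None:
        self.participants[name].muted = muted

    def set_priority_speaker(self, name: Optional[str]) -> None:
        if name is not None and name not in self.participants:
            raise KeyError(name)
        self.priority_speaker = name

    def select_video(self, name: Optional[str]) -> None:
        """Manual control: pin a participant, or pass None for automated."""
        if name is not None and name not in self.participants:
            raise KeyError(name)
        self.manual_video_source = name

    def video_source(self, active_talker: Optional[str]) -> Optional[str]:
        """Return whose video to show, given the current active talker."""
        # Assumed rule 1: a manual selection overrides everything else.
        if self.manual_video_source is not None:
            return self.manual_video_source
        # Assumed rule 2: otherwise prefer the priority speaker, then the
        # active talker, skipping anyone who is muted or has left.
        for candidate in (self.priority_speaker, active_talker):
            p = self.participants.get(candidate) if candidate else None
            if p is not None and not p.muted:
                return candidate
        return None
```

In a real deployment these decisions would be carried out by the media server under the application's direction; the sketch only shows how the three controls could interact.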