This work studies multimodal video genre categorization, evaluating several techniques for fusing audio, visual, and text features to classify videos by genre. Among the individual modalities, audio features performed best at 42.33% with Extremely Random Forests, while MPEG-7 visual features reached 26.17% with the same classifier. Fusing the modalities improved classification accuracy over the individual modalities.
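
To make the idea of fusion concrete, here is a minimal sketch of one common scheme, late fusion by weighted score averaging. It is only an illustration: the summary above does not say which fusion techniques were evaluated, and the genre names, per-modality scores, and weights below are all hypothetical values rather than results from the study.

```python
from typing import Dict, List


def late_fusion(scores_by_modality: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Combine per-modality class scores with a normalized weighted average."""
    total_weight = sum(weights[m] for m in scores_by_modality)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    n_classes = len(next(iter(scores_by_modality.values())))
    fused = [0.0] * n_classes
    for modality, scores in scores_by_modality.items():
        if len(scores) != n_classes:
            raise ValueError(f"{modality} has a different number of classes")
        w = weights[modality] / total_weight  # normalize so weights sum to 1
        for i, score in enumerate(scores):
            fused[i] += w * score
    return fused


def predict_genre(fused: List[float], genres: List[str]) -> str:
    """Return the genre with the highest fused score."""
    best = max(range(len(fused)), key=fused.__getitem__)
    return genres[best]


# Hypothetical example: three genres, one score vector per modality.
genres = ["news", "music", "sports"]
scores = {
    "audio": [0.5, 0.3, 0.2],
    "visual": [0.2, 0.3, 0.5],
    "text": [0.6, 0.3, 0.1],
}
# Equal weights are an assumption; in practice they would be tuned
# on held-out data.
weights = {"audio": 1.0, "visual": 1.0, "text": 1.0}

fused = late_fusion(scores, weights)
predicted = predict_genre(fused, genres)
```

In this toy case the visual modality alone would favor "sports", but the audio and text scores outweigh it once the three are averaged. This is the kind of complementary evidence that fusion is meant to exploit.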