The document presents a multi-stage CNN framework for temporal action localization in untrimmed videos, detailing the network architecture, training strategies, and segment generation methods. Experiments were conducted on datasets such as thumos 2014, demonstrating the effectiveness of the proposed approach in identifying action categories and improving detection accuracy. Results indicate the benefit of integrating dense trajectory features with SVM for enhanced performance.
Related topics: