The document examines human action recognition based on motion representation techniques, comparing recognition on videos in which the human actors are visible with videos in which they are not. Experiments on the UCF101 dataset show that a two-stream CNN still reaches substantial accuracy when the humans are absent from the videos, although their presence does improve recognition. The results indicate that current methods depend heavily on background information, and the document argues for motion representations that capture the motion itself rather than the surrounding scene.