The document proposes a Hierarchical Feature Fusion Network (HFFN) for multimodal affective computing. The HFFN takes a divide-and-conquer approach: it divides the aligned input sequences into local windows and fuses the modalities within each window, then combines the locally fused representations with an attention-based LSTM to model global dependencies. By capturing interactions at both the local and global levels, the HFFN achieves state-of-the-art performance on multimodal sentiment analysis benchmarks such as CMU-MOSI.
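The divide/fuse/combine pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: the window size, feature dimensions, and parameter names (`W_loc`, `w_att`, etc.) are made up here, the local fusion step is simplified to a projection of the concatenated window features, and the global stage is a plain unidirectional LSTM with dot-product attention pooling rather than the paper's exact attentive recurrent module.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# --- Synthetic aligned unimodal sequences (dimensions are illustrative) ---
T, WIN = 8, 2                       # sequence length, local window size
d_text, d_audio, d_video = 6, 4, 5
text  = rng.standard_normal((T, d_text))
audio = rng.standard_normal((T, d_audio))
video = rng.standard_normal((T, d_video))

# --- Divide: slice the aligned streams into local windows ---
joint = np.concatenate([text, audio, video], axis=1)   # (T, d_total)
windows = joint.reshape(T // WIN, WIN, -1)             # (n_win, WIN, d_total)

# --- Conquer: local fusion inside each window (simplified to a single
#     tanh projection of the flattened window; the paper's fusion differs) ---
d_fused = 8
W_loc = rng.standard_normal((WIN * joint.shape[1], d_fused)) * 0.1
b_loc = np.zeros(d_fused)
fused = np.tanh(windows.reshape(len(windows), -1) @ W_loc + b_loc)

# --- Combine: LSTM over the window-level fused representations ---
H = 8
Wx = rng.standard_normal((d_fused, 4 * H)) * 0.1
Wh = rng.standard_normal((H, 4 * H)) * 0.1
b  = np.zeros(4 * H)

def lstm_forward(X):
    """Run a basic LSTM cell over a sequence, returning all hidden states."""
    h, c, outs = np.zeros(H), np.zeros(H), []
    for x in X:
        z = x @ Wx + h @ Wh + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
        g = np.tanh(z[3*H:])
        c = f * c + i * g
        h = o * np.tanh(c)
        outs.append(h)
    return np.stack(outs)

H_states = lstm_forward(fused)                  # (n_win, H)

# --- Attention pooling over window-level hidden states ---
w_att = rng.standard_normal(H)
scores = H_states @ w_att
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                            # attention weights, sum to 1
context = alpha @ H_states                      # (H,) global utterance vector

# --- Sentiment score head ---
w_out = rng.standard_normal(H) * 0.1
score = float(context @ w_out)
```

The key design point the sketch preserves is the two-level hierarchy: cross-modal interactions are modeled cheaply inside small windows first, and only the fused window vectors are passed to the sequential model, which lets the attention-based LSTM capture long-range (global) dependencies over far fewer steps.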