This document summarizes a keynote address on recognizing emotions in spontaneous speech. It describes two ways to recognize emotions from audio: a two-step process, in which speech is first transcribed to text and the text is then analyzed for emotion, and a one-step process that recognizes emotion directly from the audio. Direct recognition from audio is preferable because it does not depend on the accuracy of speech recognition and can handle speech that mixes languages. However, building corpora of spontaneous speech with emotion labels is challenging, because natural emotional speech is difficult to elicit and emotion annotations are difficult to make consistent. The talk examines these issues and emphasizes the importance of developing methods and resources for recognizing emotions in spontaneous speech.
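The summary does not say how annotation consistency is measured. As an illustrative sketch only, assuming two annotators who each assign one categorical emotion label per utterance, consistency is commonly quantified with Cohen's kappa, which compares observed agreement with the agreement expected by chance. The `cohens_kappa` helper and the example labels below are hypothetical and are not taken from the talk.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same utterances (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same non-empty set of utterances")
    n = len(labels_a)
    # Observed agreement: fraction of utterances where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, computed from each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every utterance; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical labels for six utterances from two annotators.
    annotator_1 = ["angry", "neutral", "happy", "neutral", "sad", "neutral"]
    annotator_2 = ["angry", "neutral", "neutral", "neutral", "sad", "happy"]
    print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

Kappa is 1 for perfect agreement and near 0 when agreement is no better than chance, so a low value flags the kind of inconsistent labeling the summary describes. For more than two annotators, Fleiss' kappa or Krippendorff's alpha are common alternatives.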