This document discusses neural image caption generation using attention mechanisms. Earlier caption generators either compressed the image into a high-level representation, discarding information useful for richer descriptions, or worked from low-level representations that required more powerful generation mechanisms. It then describes an encoder-decoder model in which a CNN encodes the image into a set of feature vectors and an RNN decodes them into a caption word by word, and explores two attention mechanisms, stochastic "hard" attention and deterministic "soft" attention, that let the decoder focus on the most relevant image regions at each step and so generate better captions while preserving important visual information.
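To make the soft-attention step concrete, the sketch below shows one way the decoder could weight CNN feature vectors at each time step. It is a minimal illustration assuming PyTorch; the module name `SoftAttention`, the dimensions (`annot_dim`, `hidden_dim`, `attn_dim`), and the specific scoring function are illustrative assumptions, not the paper's exact implementation.

```python
# A minimal sketch of deterministic "soft" attention over CNN feature
# (annotation) vectors, assuming PyTorch. Names and shapes are
# illustrative, not taken from any reference implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttention(nn.Module):
    def __init__(self, annot_dim: int, hidden_dim: int, attn_dim: int):
        super().__init__()
        self.annot_proj = nn.Linear(annot_dim, attn_dim)   # projects each region feature
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim) # projects the previous RNN state
        self.score = nn.Linear(attn_dim, 1)                # scalar relevance score per region

    def forward(self, annotations: torch.Tensor, hidden: torch.Tensor):
        # annotations: (batch, L, annot_dim) -- one feature vector per image region
        # hidden:      (batch, hidden_dim)   -- previous RNN decoder state
        e = self.score(torch.tanh(
            self.annot_proj(annotations) + self.hidden_proj(hidden).unsqueeze(1)
        )).squeeze(-1)                        # (batch, L) unnormalized scores
        alpha = F.softmax(e, dim=-1)          # attention weights, sum to 1 over regions
        # Soft attention: context is the expectation of the region features under alpha
        context = (alpha.unsqueeze(-1) * annotations).sum(dim=1)  # (batch, annot_dim)
        return context, alpha
```

In this sketch the context vector would be fed into the RNN together with the previously generated word at each decoding step. Stochastic "hard" attention differs in that, instead of taking the weighted average, it samples a single region index from the distribution `alpha` and uses that region's feature vector directly.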