This document proposes a probabilistic model for learning hierarchical visual representations recursively. The model, based on Latent Dirichlet Allocation, learns image features at multiple layers of abstraction jointly rather than in a strictly feedforward manner: at the lowest layer it represents local image patches as distributions over visual words, and each higher layer learns distributions over the representations produced by the layer below. Evaluated on a standard recognition dataset, the model outperforms existing hierarchical models and performs on par with state-of-the-art single-feature models, demonstrating the benefits of joint learning and inference in hierarchical visual processing.
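To make the layered structure concrete, the following is a minimal sketch of a generic two-layer LDA-style generative chain of the kind the abstract describes; the notation (\(\theta_d\), \(y_{dn}\), \(z_{dn}\), \(w_{dn}\), \(\phi^{(1)}\), \(\phi^{(2)}\), \(\alpha\)) is introduced here for illustration and is not taken from the paper's own formulation:

\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha) && \text{per-image proportions over higher-layer topics}\\
y_{dn} \mid \theta_d &\sim \mathrm{Mult}(\theta_d) && \text{higher-layer topic for patch } n \text{ of image } d\\
z_{dn} \mid y_{dn} &\sim \mathrm{Mult}\!\left(\phi^{(2)}_{y_{dn}}\right) && \text{lower-layer topic drawn from the higher topic's distribution}\\
w_{dn} \mid z_{dn} &\sim \mathrm{Mult}\!\left(\phi^{(1)}_{z_{dn}}\right) && \text{visual word drawn from the lower topic's distribution}
\end{align*}

Here each higher-layer topic \(y\) is a distribution \(\phi^{(2)}_y\) over lower-layer topics \(z\), and each lower-layer topic is a distribution \(\phi^{(1)}_z\) over visual words, mirroring "higher layers learn distributions over the representations of lower layers." Joint learning, in this reading, means inferring the parameters and assignments of both layers together, so evidence flows both bottom-up and top-down, rather than fixing the lower layer before fitting the one above it.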