This document discusses the mechanisms of bottom-up and top-down visual processing. It notes that rapid recognition in humans can rely on feedforward processing alone: observers extract the gist of a scene from images presented at 7 per second (roughly 140 ms per image), without eye movements or prior expectations. Beyond this first pass, top-down feedback and attention are needed to solve the "clutter problem" that arises in complex scenes. The document also describes the hierarchical architecture of object recognition in the ventral visual stream, which runs from primary visual cortex to anterior inferior temporal cortex, with the complexity of the preferred stimuli and the degree of invariance both increasing along the hierarchy.
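
The idea of a hierarchy that gains both complexity and invariance can be illustrated with a toy example. The sketch below is not taken from the document; it assumes one common modeling convention in which a "simple" stage performs template matching (selectivity) and a "complex" stage takes a maximum over nearby positions (invariance). The function names, the 1-D input, and the edge template are all hypothetical choices made for illustration.

```python
# Toy 1-D sketch of one selectivity/invariance stage pair.
# All names and values here are illustrative assumptions, not the document's model.

def simple_layer(signal, template):
    """Template matching at every position: responds selectively to the template."""
    k = len(template)
    return [
        sum(signal[i + j] * template[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def complex_layer(responses, pool):
    """Max over non-overlapping position windows: tolerates small shifts."""
    return [max(responses[i:i + pool]) for i in range(0, len(responses), pool)]


template = [1, -1]                      # hypothetical edge-like feature
image_a = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # feature at position 2
image_b = [0, 0, 0, 1, 0, 0, 0, 0, 0]   # same feature shifted by one position

simple_a = simple_layer(image_a, template)
simple_b = simple_layer(image_b, template)
complex_a = complex_layer(simple_a, pool=4)
complex_b = complex_layer(simple_b, pool=4)

print(simple_a)   # position-specific responses
print(simple_b)   # differ from simple_a because the feature moved
print(complex_a)  # pooled responses
print(complex_b)  # match complex_a: the shift falls inside one pooling window
```

In this toy case the simple-stage outputs differ between the two inputs while the pooled outputs are identical, which is the sense in which invariance can grow from one stage to the next. Stacking such pairs, with later templates built from earlier pooled outputs, would be one way to also increase feature complexity.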