The document presents Auto-Encoding Transformations (AET), a method for unsupervised representation learning. Rather than encoding and reconstructing image pixels as a traditional autoencoder does, AET applies a transformation (for example, a rotation or an affine warp) to an image and trains the model to reconstruct that transformation from the pair of original and transformed images. Because pixels are never reconstructed directly, trivial identity solutions are avoided, and the learned representation must capture the structural and semantic content of images rather than surface statistics. The method outperforms previous self-supervised and generative approaches on downstream classification benchmarks.
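A minimal sketch of the training signal this sets up, using NumPy with a random linear projection standing in for the convolutional encoder and a linear head standing in for the transformation decoder (all names and sizes here are illustrative assumptions, not the paper's architecture). The point is the data flow: a transformation is sampled, applied, and then predicted from the representations of the (original, transformed) pair, so the transformation itself supplies the supervision.

```python
import numpy as np

rng = np.random.default_rng(0)
IMG = 8  # toy image side length (assumption for illustration)

# Stand-in encoder: a fixed random projection of the flattened image.
# In the actual method this is a trained convolutional network.
W_enc = rng.normal(size=(16, IMG * IMG))

def encode(img):
    return W_enc @ img.reshape(-1)

def make_pair(img):
    """Sample a transformation and return (original, transformed, label).

    Discrete 90-degree rotations are used here as the simplest example
    of a transformation family; AET also supports richer families.
    """
    k = int(rng.integers(4))          # rotate by k * 90 degrees
    return img, np.rot90(img, k), k

# Stand-in transformation decoder: a linear head over the concatenated
# representations of the original and transformed images.
W_dec = rng.normal(size=(4, 32))

def predict_rotation(z_orig, z_trans):
    logits = W_dec @ np.concatenate([z_orig, z_trans])
    return int(np.argmax(logits))

img = rng.normal(size=(IMG, IMG))
orig, trans, label = make_pair(img)
pred = predict_rotation(encode(orig), encode(trans))

# Training would minimize the mismatch between pred and label; doing so
# forces the encoder to expose transformation-relevant image structure.
loss = float(pred != label)
```

With untrained random weights the prediction is of course at chance; the sketch only shows how the self-supervised label is manufactured from the transformation, with no human annotation involved.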