This document summarizes a tutorial on object detection beyond RetinaNet and Mask R-CNN. It covers open challenges in object detection: the backbone network, the detection head, pretraining, scale variation, training with large batch sizes, detecting objects in crowds, and neural architecture search. It also introduces recent work that aims to address each of these challenges: DetNet for the backbone, Light Head R-CNN for the detection head, Objects365 for pretraining, SFace for scale variation, MegDet for large batch sizes, the CrowdHuman benchmark for crowds, and NAS approaches for architecture search. The document concludes that further improving object detection requires attention to details, and that continued progress will significantly benefit computer vision applications.