This document describes a system for object detection and translation for blind people using deep learning. The system uses a camera inside a box to capture images of objects placed inside. Mask R-CNN is used to detect objects in the images by generating proposals for object regions, predicting the class of the object, and defining a bounding box. The class name is then converted to speech using pyttsx3 text-to-speech, allowing blind users to identify objects audibly. The system aims to help blind people recognize everyday objects through computer vision and speech translation techniques with a low-cost portable design.