This document presents a system that helps visually impaired people understand their surroundings by combining object detection and text detection with deep learning. Object detection is performed by a convolutional neural network trained on datasets such as MS-COCO and PASCAL VOC, and text detection by a fully convolutional neural network. The detected objects and text are then passed to a text-to-speech synthesizer and read aloud to the user. The system performs object detection in real time and can detect multiple objects and text in a single image with reasonable accuracy, which varies with lighting and other conditions.
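
The final stage of the pipeline, turning detector output into something a synthesizer can speak, can be sketched as follows. This is a minimal illustration and not the system's actual implementation: the `(label, confidence)` detection format, the `min_conf` threshold of 0.5, the sentence wording, and the `speak` callback (a stand-in for whatever text-to-speech engine is used) are all assumptions made for the example.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Assumed detection format: (class label, confidence score in [0, 1]).
Detection = Tuple[str, float]


def build_utterance(detections: List[Detection],
                    texts: List[str],
                    min_conf: float = 0.5) -> str:
    """Combine object labels and detected text into one spoken message."""
    # Count each label once per confident detection; weak ones are dropped.
    counts = Counter(label for label, conf in detections if conf >= min_conf)
    # Pluralization is deliberately omitted to keep the sketch simple.
    parts = [f"{n} {label}" for label, n in counts.items()]

    sentences = []
    if parts:
        sentences.append("Detected: " + ", ".join(parts) + ".")

    # Ignore empty or whitespace-only text regions.
    cleaned = [t.strip() for t in texts if t.strip()]
    if cleaned:
        sentences.append("Text reads: " + "; ".join(cleaned) + ".")

    return " ".join(sentences) if sentences else "Nothing detected."


def describe_scene(detections: List[Detection],
                   texts: List[str],
                   speak: Callable[[str], None]) -> str:
    """Build the message and hand it to a (hypothetical) TTS callback."""
    utterance = build_utterance(detections, texts)
    speak(utterance)
    return utterance


if __name__ == "__main__":
    # print stands in for a real synthesizer here.
    describe_scene(
        [("person", 0.9), ("person", 0.8), ("dog", 0.3), ("chair", 0.7)],
        ["EXIT"],
        print,
    )
```

Keeping the synthesizer behind a plain callback means the message-building logic can be checked without audio hardware, and a real engine can be swapped in later without touching the rest of the code.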