This document presents a literature survey on visual question answering (VQA), a task that combines image understanding with natural-language processing. Its goal is to develop a custom model that outperforms state-of-the-art solutions, informed by an exploration of existing models. The survey reviews several existing models for visual question answering and for image classification with convolutional neural networks. It also discusses building a new visual question answering dataset by automatically generating questions from image descriptions.