This document presents a framework for Bangla text-to-speech synthesis. It discusses factors that matter for Bangla speech synthesis, such as diphones and position-dependent pronunciation variations. The proposed framework comprises text normalization rules, a rule-based syllable parser that breaks text into syllables, and the selection and normalization of audio files for syllables and diphones. An experiment on test articles evaluated the accuracy of the synthesized speech; it exposed some limitations but showed the framework to be effective overall. The framework aims to enable Bangla text-to-speech with a minimal set of audio files.