This document describes a karaoke-style read-aloud system built on text-to-speech (TTS) and speech alignment. A TTS API generates an audio file from the input text. The audio is then aligned with that text using the Hidden Markov Model Toolkit (HTK) to produce a timed text file. These timings let the text be highlighted as it is read aloud, much as lyrics are highlighted in a karaoke system. This is useful for language learning because it supports shadowing, in which learners repeat the speech as they hear it while following the highlighted text. The process consists of text preprocessing, audio generation and processing, phonetic transcription, forced alignment with HTK, and output of the timed text file.