This study investigates how dynamic time warping (DTW) can improve the alignment of speech phrases for speech synthesis and recognition. An analysis of recordings from six speakers shows that phoneme-based alignment outperforms phrase-based alignment, reducing Mahalanobis distances by more than 20%. The results highlight the importance of phoneme-level segmentation for enhancing speaker recognition and transformation efficiency.
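To make the alignment step concrete, the following is a minimal sketch of classic DTW between two feature sequences, not the study's own implementation. It assumes frame-level feature vectors (for example, cepstral coefficients), the standard three-way step pattern, and a Mahalanobis local cost with a given inverse covariance matrix; the function name `dtw_align` and its parameters are hypothetical, and the abstract does not state which local cost or step constraints were actually used.

```python
import numpy as np


def dtw_align(x, y, inv_cov):
    """Align sequences x (n, d) and y (m, d) with classic DTW.

    The local cost is the Mahalanobis distance under inv_cov (d, d),
    which is an assumption made for this sketch.
    Returns (total_cost, path), where path is a list of (i, j) frame pairs.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)

    # Pairwise Mahalanobis distances between every frame of x and y.
    diff = x[:, None, :] - y[None, :, :]                      # shape (n, m, d)
    sq = np.einsum("ijk,kl,ijl->ij", diff, inv_cov, diff)
    local = np.sqrt(np.maximum(sq, 0.0))                      # guard rounding

    # Accumulated cost with a padded border of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # match
                acc[i - 1, j],      # advance x only
                acc[i, j - 1],      # advance y only
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return acc[n, m], path


# Toy usage: y repeats the middle frame of x, so DTW stretches that frame.
x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[0.0], [1.0], [1.0], [2.0]])
cost, path = dtw_align(x, y, np.eye(1))
```

Under these assumptions, phrase-based alignment would correspond to one call over the whole utterance, while phoneme-based alignment would call the same routine separately on each pair of matching phoneme segments, which keeps the warping path from drifting across phoneme boundaries.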