The research investigates the effect of dynamic time warping (DTW) combined with mel-frequency cepstral coefficients (MFCC) and harmonic plus noise model (HNM) features for aligning speech signals segmented at the phrase, word, and phoneme levels. Results indicate that using HNM yields better alignment and lower error in matching speech signals than MFCC alone, particularly at the phrase level. The study also highlights that alignment error varies considerably across speaker combinations and is larger when female speech is aligned to male speech.
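As a rough illustration of the alignment step, the sketch below applies textbook DTW to two sequences of per-frame feature vectors (for example MFCC frames). It is a minimal sketch under stated assumptions, not the study's implementation: the function name `dtw_align`, the Euclidean frame distance, and the unconstrained three-way step pattern are all choices made here for illustration.

```python
import math


def dtw_align(ref, test):
    """Align two sequences of feature vectors with basic DTW.

    Returns (total_cost, path), where path is a list of (ref_index, test_index)
    pairs. Euclidean frame distance and the symmetric step pattern are
    assumptions of this sketch.
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning ref[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end of both sequences to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal: advance both
            (cost[i - 1][j], i - 1, j),          # advance ref only
            (cost[i][j - 1], i, j - 1),          # advance test only
        )
    path.reverse()
    return cost[n][m], path


# Toy one-dimensional "features": the second sequence holds the middle frame longer.
total, path = dtw_align([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]])
print(total, path)
```

In this toy case the repeated middle frame of the second sequence is mapped onto a single frame of the first, which is the kind of timing difference DTW absorbs. Comparing the recovered path against known segment boundaries is one way an alignment error could be measured, though the error measure used in the study is not stated here.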