Voice_Intonation Transformation using Linear Mapping

Voice Intonation Transformation Using Segmental Linear
Mapping of Pitch Contour
Amit Banerjee, Sakshi Pandey, K.M. Khushboo

1.Introduction
• A sound signal is a continuous variation of air pressure over time.
• A sound can be treated as a one-dimensional continuous
signal with time as the free variable.
• Speech, like any other sound, is a continuous air pressure
variation.
• The prosodic information in the speech signal aids human
speech perception. It gives the listener an impression of
learned characteristics such as dialect, tone and pitch.

• Voice transformation converts the voice of a source speaker so that it
sounds like an intended target speaker, such that a listener mistakes it
for the target speaker.
• Consider x(t) and y(t) as two different sound signals; then a
mapping function f can be defined as:
y(t) = f(x(t)).
• The idea is to transform the source sound signal x(t) into the desired
target sound y(t) using the mapping function f.

2.Problem Definition
• The paper uses functional mapping for pitch transformation in
voice: the pitch contour of a source voice is mapped to a target voice
using a segmental linear mapping function.
• It extracts invariant features from the source and target vocal signals
to perform the mapping for voice transformation.
• The invariant features correspond to the linguistic parameter set,
which is used to segment the pitch contour and, finally, to
perform the voice intonation transformation.
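
The idea of a segmental linear mapping can be illustrated with a minimal sketch. The slides do not give the exact formulation, so the following is only one plausible reading: the contour is cut into segments at hypothetical frame indices (`boundaries`), and within each segment the pitch values are mapped linearly so that the source parameter values at the segment ends land on the corresponding target values. The function names and the example numbers are assumptions made for illustration.

```python
def map_segment(segment, src_lo, src_hi, tgt_lo, tgt_hi):
    """Linearly map pitch values so that src_lo -> tgt_lo and src_hi -> tgt_hi."""
    if src_hi == src_lo:
        # Flat source range: no slope can be computed, so apply a constant shift.
        return [tgt_lo + (v - src_lo) for v in segment]
    slope = (tgt_hi - tgt_lo) / (src_hi - src_lo)
    return [tgt_lo + slope * (v - src_lo) for v in segment]


def segmental_linear_map(contour, boundaries, src_params, tgt_params):
    """Apply a separate linear map to each segment of a pitch contour.

    contour:    source pitch values in Hz, one per frame.
    boundaries: frame indices [b0, ..., bK] delimiting K segments (assumed input).
    src_params: K+1 source pitch values (Hz) associated with the boundaries.
    tgt_params: K+1 target pitch values (Hz) associated with the boundaries.
    """
    mapped = []
    for k in range(len(boundaries) - 1):
        start, end = boundaries[k], boundaries[k + 1]
        mapped.extend(
            map_segment(
                contour[start:end],
                src_params[k], src_params[k + 1],
                tgt_params[k], tgt_params[k + 1],
            )
        )
    return mapped


# Made-up example: two segments of three frames each.
source = [200.0, 180.0, 160.0, 170.0, 150.0, 120.0]
result = segmental_linear_map(
    source,
    boundaries=[0, 3, 6],
    src_params=[200.0, 160.0, 120.0],
    tgt_params=[300.0, 220.0, 180.0],
)
print(result)
```

Each segment gets its own slope and offset, which is what distinguishes this from a single global linear scaling of the pitch range.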

3.Methodology
Figure 1: Pitch Transformation in Human Voice

• The process of mapping the pitch contour for voice transformation is
divided into four steps:
(1) Pitch Contour Extraction and Smoothing
(2) Extraction of the Linguistic Parameter Set
(3) Segmental Linear Mapping
(4) Re-synthesis of the Audio Signal.

3.1 Pitch Contour Extraction and Smoothing
• Pitch is extracted from the audio signals using “Yet Another
Algorithm for Pitch Tracking” (YAAPT).
• Smoothing is then performed on the extracted pitch contour.
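
The slides do not state which smoothing method is applied, so the following is only a sketch of one common choice, a median filter over voiced frames. It assumes the pitch tracker returns one value per frame in Hz, with 0 marking unvoiced frames; the function name and window size are hypothetical.

```python
from statistics import median


def smooth_pitch(f0, window=5):
    """Median-smooth the voiced frames of a pitch contour.

    f0:     per-frame pitch in Hz, with 0 marking an unvoiced frame (assumed convention).
    window: odd number of frames considered around each frame.
    """
    half = window // 2
    smoothed = []
    for i, value in enumerate(f0):
        if value <= 0:
            # Keep unvoiced frames unvoiced.
            smoothed.append(0.0)
            continue
        # Only voiced neighbours contribute to the median.
        neighbours = [v for v in f0[max(0, i - half): i + half + 1] if v > 0]
        smoothed.append(float(median(neighbours)))
    return smoothed


# Made-up contour with an octave-jump outlier (400 Hz) in the middle.
raw = [0, 0, 200, 202, 400, 204, 206, 0]
print(smooth_pitch(raw, window=3))
```

A median filter is used here because it removes isolated tracking errors such as octave jumps without blurring the surrounding contour as much as a moving average would.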

Figure 2. (a) Pitch Contour (b) Smoothed Pitch Contour

3.2 Extraction of the Linguistic Parameter Set
• In the voiced segment, we extract the linguistically motivated
parameter set to capture the intonation of a speaker:
  • Sentence-Initial High (S)
  • Non-Initial Accent Peak (H)
  • Post-Accent Valleys (L)
  • Sentence-Final Low (F)
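
The slides name the four parameters but not the rules used to detect them. The sketch below is therefore an assumption based only on the parameter names: S is taken as the first local peak, H as the later local peaks, L as the local valleys after S, and F as the final pitch value. It expects a smoothed, voiced-only contour, and the function name is hypothetical.

```python
def extract_parameters(contour):
    """Pick S, H, L and F values (Hz) from a smoothed, voiced-only pitch contour.

    The detection rules are illustrative assumptions, not the published method.
    """
    if len(contour) < 3:
        raise ValueError("need at least three frames to find peaks and valleys")

    peaks, valleys = [], []
    for i in range(1, len(contour) - 1):
        prev, cur, nxt = contour[i - 1], contour[i], contour[i + 1]
        if cur > prev and cur >= nxt:
            peaks.append((i, cur))      # local maximum
        elif cur < prev and cur <= nxt:
            valleys.append((i, cur))    # local minimum

    if peaks:
        s_index, s_value = peaks[0]
        later_peaks = peaks[1:]
    else:
        # No interior peak: fall back to the first frame as the initial high.
        s_index, s_value = 0, contour[0]
        later_peaks = []

    return {
        "S": s_value,
        "H": [v for _, v in later_peaks],
        "L": [v for i, v in valleys if i > s_index],
        "F": contour[-1],
    }


# Made-up declining contour with three accent peaks.
example = [180, 220, 200, 170, 190, 210, 185, 160, 175, 150, 130]
print(extract_parameters(example))
```

Running the extraction on the smoothed contour rather than the raw one keeps small frame-to-frame jitter from being reported as spurious peaks and valleys.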

Figure 3. Linguistically motivated parameter set

3.4 Re-synthesis of the Audio Signal
• Pitch marks are generated on the transformed pitch contour.
• Finally, the speech signal is re-synthesized from the modified pitch
contour using pitch-synchronous overlap-add (PSOLA) to generate the
transformed speech signal.
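
Pitch-mark generation can be sketched as stepping through the contour one pitch period at a time. This is a simplified illustration, not the procedure used in the paper: it assumes a per-frame contour in Hz with 0 for unvoiced frames, a fixed frame hop `hop_s` in seconds, and a hypothetical `default_f0` used only to advance through unvoiced regions. The overlap-add re-synthesis itself is not shown.

```python
def generate_pitch_marks(f0, hop_s, default_f0=100.0):
    """Return pitch-mark times (seconds) spaced one pitch period apart.

    f0:         per-frame pitch in Hz, with 0 marking an unvoiced frame (assumed).
    hop_s:      time between consecutive frames, in seconds.
    default_f0: hypothetical rate used to step across unvoiced frames.
    """
    duration = len(f0) * hop_s
    marks = []
    t = 0.0
    while t < duration:
        # Look up the frame that contains the current time.
        frame = min(int(t / hop_s), len(f0) - 1)
        pitch = f0[frame]
        if pitch > 0:
            marks.append(t)
            t += 1.0 / pitch        # advance by one pitch period
        else:
            t += 1.0 / default_f0   # skip ahead without placing a mark
    return marks


# Made-up contour: 10 frames at 100 Hz with a 10 ms hop.
marks = generate_pitch_marks([100.0] * 10, hop_s=0.01)
print(len(marks), marks[:3])
```

Because the spacing follows the transformed contour, a higher target pitch yields more closely spaced marks, which is what lets a PSOLA-style synthesizer raise or lower the perceived pitch.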

4.Results
Figure 4. Pitch in target speech and transformed speech

Table I. Linguistic parameter set

Figure 5. Pitch contour of target and transformed speech signal

5.Conclusions
• This paper investigates voice intonation transformation by segmental
mapping of the pitch contour, based on the linguistic parameter set
extracted from the source and target voices.
• The results for the modified pitch contour show that the approach
captures the intonation of the target speaker, but the voice quality
remains that of the source speaker.
• The results are better when the target and source signals are
similar. However, further investigation is needed to capture the
voice quality of the target speaker.

