Voice Intonation Transformation Using Segmental Linear
Mapping of Pitch Contour
Amit Banerjee, Sakshi Pandey, K.M. Khushboo
1. Introduction
• A sound signal is a continuous variation of air pressure over time.
• A sound can therefore be treated as a one-dimensional continuous signal with time as the independent variable.
• Speech, like any other sound, is a continuous variation of air pressure.
• The prosodic information in the speech signal aids human speech perception: it gives the listener an impression of learned characteristics such as dialect, tone and pitch.
• Voice transformation converts the voice of a source speaker into that of an intended target speaker, so that the listener perceives the transformed speech as spoken by the target speaker.
• Let x(t) and y(t) be two different sound signals; a mapping function f can then be defined such that
y(t) = f(x(t)).
• The idea is to transform the source sound signal into the desired target sound using the mapping function f.
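The slides leave f abstract. For comparison only, the simplest pitch mapping used in the voice-conversion literature is a single global affine map that matches the source F0 mean and standard deviation to the target's (it is often applied to log F0 instead). This is not the segmental method of this paper. The sketch below assumes one F0 value per frame, with 0 marking unvoiced frames, and only makes y(t) = f(x(t)) concrete for a pitch contour.

```python
import statistics

def voiced_stats(f0):
    """Mean and population standard deviation of the voiced (F0 > 0) frames."""
    voiced = [v for v in f0 if v > 0]
    return statistics.fmean(voiced), statistics.pstdev(voiced)

def global_linear_map(f0, src_mean, src_std, tgt_mean, tgt_std):
    """y = (x - mu_src) / sigma_src * sigma_tgt + mu_tgt for voiced frames; 0 stays 0."""
    return [(x - src_mean) / src_std * tgt_std + tgt_mean if x > 0 else 0.0 for x in f0]

source = [100.0, 0.0, 200.0]
print(global_linear_map(source, *voiced_stats(source), 200.0, 20.0))  # [180.0, 0.0, 220.0]
```

Because one map covers the whole utterance, it shifts and scales the contour but cannot move individual peaks and valleys independently; a segment-wise mapping is presumably meant to address exactly that.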
2. Problem Definition
• The paper uses functional mapping for pitch transformation in voice: the pitch contour of the source voice is mapped to that of a target voice using a segmental linear mapping function.
• Invariant features are extracted from the source and target vocal signals to perform the mapping for voice transformation.
• These invariant features correspond to a linguistic parameter set, which is used to segment the pitch contour and finally to perform the voice intonation transformation.
3. Methodology
Figure 1: Pitch Transformation in Human Voice
• The process of mapping the pitch contour for voice transformation is divided into four steps:
(1) Pitch Contour Extraction and Smoothing
(2) Extraction of the Linguistic Parameter Set
(3) Segmental Linear Mapping
(4) Re-synthesis of the Audio Signal
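Because Figure 1 is not reproduced here, the data flow between the four steps is easy to lose. A minimal sketch, assuming each step is supplied as a function (all names below are hypothetical): steps (1) and (2) run on both speakers, step (3) transforms only the source contour, and step (4) re-synthesizes the source audio. That last point is consistent with the conclusion that the output keeps the source speaker's voice quality.

```python
from typing import Callable, List, Sequence, Tuple

Contour = List[float]                  # one F0 value (Hz) per frame, 0 = unvoiced
Landmarks = List[Tuple[int, float]]    # (frame index, F0) of the S/H/L/F points

def transform_intonation(
    source_audio: Sequence[float],
    target_audio: Sequence[float],
    extract_and_smooth: Callable[[Sequence[float]], Contour],
    find_landmarks: Callable[[Contour], Landmarks],
    segmental_map: Callable[[Contour, Landmarks, Landmarks], Contour],
    resynthesize: Callable[[Sequence[float], Contour, Contour], List[float]],
) -> List[float]:
    # (1) pitch contour extraction and smoothing, for both speakers
    src_f0 = extract_and_smooth(source_audio)
    tgt_f0 = extract_and_smooth(target_audio)
    # (2) linguistic parameter set, for both speakers
    src_marks = find_landmarks(src_f0)
    tgt_marks = find_landmarks(tgt_f0)
    # (3) segmental linear mapping: only the source contour is transformed
    new_f0 = segmental_map(src_f0, src_marks, tgt_marks)
    # (4) re-synthesis of the *source* audio on the new contour (e.g. PSOLA)
    return resynthesize(source_audio, src_f0, new_f0)

# Toy stand-ins, only to show the wiring
out = transform_intonation(
    [0.0] * 4, [1.0] * 4,
    extract_and_smooth=lambda audio: [100.0 + a for a in audio],
    find_landmarks=lambda f0: [(0, f0[0]), (len(f0) - 1, f0[-1])],
    segmental_map=lambda f0, s, t: [v + (t[0][1] - s[0][1]) for v in f0],
    resynthesize=lambda audio, old_f0, new_f0: new_f0,
)
print(out)  # [101.0, 101.0, 101.0, 101.0]
```

Passing the steps in as functions keeps the wiring independent of any particular pitch tracker or synthesizer.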
3.1 Pitch Contour Extraction and Smoothing
• The pitch is extracted from the audio signals using the “Yet Another Algorithm for Pitch Tracking” (YAAPT).
• Smoothing is then applied to the extracted pitch contour.
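YAAPT itself (Zahorian and Hu, ref. 10) combines spectral and temporal cues and is considerably more involved. The sketch below is a much simpler stand-in, a plain normalized-autocorrelation estimator and not YAAPT. It only shows what a pitch contour looks like as data: one F0 value per frame, with 0 marking unvoiced frames. The frame length, hop, search range (60 to 400 Hz) and voicing threshold are illustrative assumptions.

```python
import math

def autocorr_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0, voicing_threshold=0.3):
    """Return the F0 (Hz) of one frame, or 0.0 if the frame looks unvoiced.

    The lag with the highest normalized autocorrelation inside the allowed
    pitch range is taken as the pitch period.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]            # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0                           # silent frame
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r < voicing_threshold:
        return 0.0                           # no clear periodicity
    return sample_rate / best_lag

def pitch_contour(signal, sample_rate, frame_len=640, hop=160):
    """Frame-level F0 contour (40 ms frames with a 10 ms hop at 16 kHz)."""
    return [autocorr_f0(signal[i:i + frame_len], sample_rate)
            for i in range(0, len(signal) - frame_len + 1, hop)]

fs = 16000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(1600)]
print(pitch_contour(tone, fs))  # one F0 value per frame, close to 200 Hz
```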
Figure 2. (a) Pitch Contour (b) Smoothed Pitch Contour
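The slides do not say which smoother was used (ref. 11 describes one approach). A running median is a common generic choice, because it removes isolated octave jumps while preserving step-like changes better than a moving average. The sketch below shows that technique, not necessarily the authors'. The window width of 5 frames and the rule that only voiced neighbours enter each median are assumptions.

```python
import statistics

def smooth_contour(f0, width=5):
    """Median-filter the voiced frames of an F0 contour; unvoiced frames (0) stay 0.

    Only voiced frames inside the window contribute to each median.
    """
    half = width // 2
    out = []
    for i, v in enumerate(f0):
        if v <= 0:
            out.append(0.0)
            continue
        window = [u for u in f0[max(0, i - half): i + half + 1] if u > 0]
        out.append(float(statistics.median(window)))
    return out

# The octave jump (204 Hz) in the middle is removed:
print(smooth_contour([100.0, 102.0, 204.0, 104.0, 106.0]))  # [102.0, 103.0, 104.0, 105.0, 106.0]
```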
3.2 Extraction of the Linguistic Parameter Set
• In each voiced segment, we extract a linguistically motivated parameter set that captures the intonation of the speaker:
• Sentence-Initial High (S)
• Non-Initial Accent Peaks (H)
• Post-Accent Valleys (L)
• Sentence-Final Low (F)
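The slides list the four parameter types but not how they are located. The following is a minimal heuristic sketch, assuming one smoothed voiced stretch with at least one rise and fall. S is the first local F0 maximum, H is every later local maximum, L is the lowest point between consecutive peaks, and F is the lowest point after the last peak. These rules are illustrative assumptions, not the definitions of ref. 12.

```python
def linguistic_landmarks(f0):
    """Return sorted (frame, F0, label) landmarks for one voiced stretch of F0 values."""
    n = len(f0)
    # local maxima (a plateau counts once, at its first frame)
    peaks = [i for i in range(1, n - 1) if f0[i - 1] < f0[i] >= f0[i + 1]]
    if not peaks:                                   # no interior peak: use the global maximum
        peaks = [max(range(n), key=f0.__getitem__)]
    marks = [(peaks[0], f0[peaks[0]], "S")]         # sentence-initial high
    marks += [(p, f0[p], "H") for p in peaks[1:]]   # non-initial accent peaks
    for a, b in zip(peaks, peaks[1:]):              # post-accent valleys
        v = min(range(a, b + 1), key=f0.__getitem__)
        marks.append((v, f0[v], "L"))
    last = peaks[-1]
    if last < n - 1:                                # sentence-final low
        f = min(range(last + 1, n), key=f0.__getitem__)
        marks.append((f, f0[f], "F"))
    return sorted(marks)

contour = [120.0, 150.0, 180.0, 160.0, 130.0, 140.0, 170.0, 150.0, 110.0, 100.0]
print(linguistic_landmarks(contour))
# [(2, 180.0, 'S'), (4, 130.0, 'L'), (6, 170.0, 'H'), (9, 100.0, 'F')]
```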
Figure 3. Linguistically motivated parameter set
3.3 Segmental Linear Mapping
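The slides give no text for this step. One plausible reading of "segmental linear mapping", sketched here as an assumption, works as follows. The landmarks cut the source contour into segments, and each segment gets its own affine map that carries the source F0 at its two end landmarks onto the target speaker's F0 at the corresponding landmarks. The mapped contour therefore passes through the target's landmark values while keeping the source timing. The sketch assumes the source and target landmark sequences pair up one to one, which is most likely when both speakers utter the same sentence. If the target moves in the opposite direction from the source within a segment, the slope becomes negative and that segment's shape is mirrored.

```python
def segmental_linear_map(f0, src_marks, tgt_values):
    """Map a source F0 contour onto target landmark values, one segment at a time.

    src_marks  : (frame_index, source_F0) pairs sorted by frame, at least two
    tgt_values : the target speaker's F0 at the matching landmarks
    Unvoiced frames (F0 == 0) are passed through as 0.
    """
    if len(src_marks) < 2 or len(src_marks) != len(tgt_values):
        raise ValueError("need at least two landmarks, paired one to one")
    out, seg = [], 0
    for i, x in enumerate(f0):
        # move to the segment [mark seg, mark seg+1] containing frame i;
        # frames before the first / after the last mark reuse the end segments
        while seg < len(src_marks) - 2 and i > src_marks[seg + 1][0]:
            seg += 1
        if x <= 0:
            out.append(0.0)
            continue
        a0, a1 = src_marks[seg][1], src_marks[seg + 1][1]
        b0, b1 = tgt_values[seg], tgt_values[seg + 1]
        if a1 == a0:
            out.append(x + (b0 - a0))                             # flat segment: constant shift
        else:
            out.append(b0 + (x - a0) * (b1 - b0) / (a1 - a0))     # sends a0 -> b0, a1 -> b1
    return out

contour = [100.0, 120.0, 140.0, 120.0, 100.0]
landmarks = [(0, 100.0), (2, 140.0), (4, 100.0)]
print(segmental_linear_map(contour, landmarks, [150.0, 210.0, 140.0]))
# [150.0, 180.0, 210.0, 175.0, 140.0]
```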
3.4 Re-synthesis of the Audio Signal
• Pitch marks are generated on the transformed pitch contour.
• Finally, the speech is re-synthesized with the modified pitch contour using Pitch-Synchronous Overlap-Add (PSOLA) to generate the transformed speech signal.
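The slides give no implementation details for this step. The sketch below shows one textbook form of time-domain PSOLA (TD-PSOLA) under simplifying assumptions. Pitch marks are placed one local period apart on a frame-level F0 contour, with a fixed step through unvoiced frames. Each synthesis mark copies a Hann-windowed, two-period grain from the nearest analysis mark, and duration is left unchanged. Practical systems place analysis marks at glottal closure instants rather than at regular period spacing; that refinement is omitted here.

```python
import math

def pitch_marks(f0, hop, sample_rate, unvoiced_step=None):
    """Sample positions of pitch marks placed one local pitch period apart.

    f0 holds one value (Hz) per frame of `hop` samples; 0 marks an unvoiced
    frame, through which marks advance by a fixed step (default: one hop).
    """
    step_uv = unvoiced_step or hop
    marks, t, end = [], 0.0, len(f0) * hop
    while t < end:
        marks.append(round(t))
        v = f0[int(t) // hop]
        t += sample_rate / v if v > 0 else step_uv
    return marks

def psola(signal, analysis_marks, synthesis_marks):
    """TD-PSOLA without time-scaling: overlap-add Hann-windowed two-period grains."""
    out = [0.0] * len(signal)
    for s in synthesis_marks:
        # nearest analysis mark (a linear search keeps the sketch short)
        k = min(range(len(analysis_marks)), key=lambda j: abs(analysis_marks[j] - s))
        a = analysis_marks[k]
        neighbour = analysis_marks[k + 1] if k + 1 < len(analysis_marks) else analysis_marks[k - 1]
        period = max(1, abs(neighbour - a))
        for n in range(-period, period + 1):
            src, dst = a + n, s + n
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                w = 0.5 * (1.0 + math.cos(math.pi * n / period))  # Hann window, 1 at n = 0
                out[dst] += w * signal[src]
    return out

fs, hop = 16000, 160
source = [math.sin(2 * math.pi * 200 * n / fs) for n in range(640)]
src_f0 = [200.0] * 4      # contour of the source speech
new_f0 = [250.0] * 4      # transformed contour
shifted = psola(source, pitch_marks(src_f0, hop, fs), pitch_marks(new_f0, hop, fs))
```

With identical analysis and synthesis marks, adjacent Hann windows of half-width one period sum to 1, so the signal is reproduced between marks. Closer synthesis marks raise the pitch.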
4. Results
Figure 4. Pitch in target speech and transformed speech
Table I. Linguistic parameter set
Figure 5. Pitch contour of target and transformed speech signals
5. Conclusions
• This paper investigates voice intonation transformation by segmental linear mapping of the pitch contour, based on the linguistic parameter set extracted from the source and target voices.
• The modified pitch contours show that the approach captures the intonation of the target speaker, but the voice quality remains that of the source speaker.
• The results are better when the source and target signals are similar. Further investigation is needed to capture the voice quality of the target speaker.
References
1. J. W. Shin, J.-H. Chang, and N. S. Kim, “Voice activity detection based on statistical models and machine learning approaches,” Computer Speech & Language, vol. 24, no. 3, pp. 515–530, 2010.
2. Y. Stylianou, “Voice transformation: a survey,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009), pp. 3585–3588, 2009.
3. E. E. Helander and J. Nurminen, “A novel method for prosody prediction in voice conversion,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), vol. 4, pp. IV-509, 2007.
4. L. M. Arslan and D. Talkin, “Voice conversion by codebook mapping of line spectral frequencies and excitation spectrum,” in Proc. EUROSPEECH, pp. 1347–1350, 1997.
5. B. Gillett and S. King, “Transforming F0 contours,” 2003.
6. J. P. Campbell, “Speaker recognition: a tutorial,” Proceedings of the IEEE, vol. 85, no. 9, pp. 1437–1462, 1997.
7. Q. Liu, M. Yao, H. Xu, and F. Wang, “Research on different feature parameters in speaker recognition,” Journal of Signal and Information Processing, vol. 4, no. 2, p. 106, 2013.
8. A. G. Adami, R. Mihaescu, D. A. Reynolds, and J. J. Godfrey, “Modeling prosodic dynamics for speaker recognition,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’03), vol. 4, pp. IV-788, 2003.
9. E. Shriberg, D. R. Ladd, J. Terken, and A. Stolcke, “Modeling pitch range variation within and across speakers: predicting F0 targets when speaking up,” in Proc. 4th International Conference on Spoken Language Processing, pp. 1–4, 1996.
10. S. A. Zahorian and H. Hu, “A spectral/temporal method for robust fundamental frequency tracking,” The Journal of the Acoustical Society of America, vol. 123, no. 6, pp. 4559–4571, 2008.
11. X. Zhao, D. O’Shaughnessy, and N. Minh-Quang, “A processing method for pitch smoothing based on autocorrelation and cepstral F0 detection approaches,” in Proc. International Symposium on Signals, Systems and Electronics (ISSSE ’07), pp. 59–62, 2007.
12. D. J. Patterson, Linguistic Approach to Pitch Range Modelling, PhD thesis, University of Edinburgh, 2000.
13. A. Mousa, “Voice conversion using pitch shifting algorithm by time stretching with PSOLA and re-sampling,” Journal of Electrical Engineering, vol. 61, no. 1, pp. 57–61, 2010.
THANK YOU
