Language And Speech Processing 1st Edition
Joseph Mariani
Spoken Language Processing
Edited by
Joseph Mariani
First published in France in 2002 by Hermes Science/Lavoisier entitled Traitement automatique du
langage parlé 1 et 2 © LAVOISIER, 2002
First published in Great Britain and the United States in 2009 by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as
permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced,
stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers,
or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA.
Enquiries concerning reproduction outside these terms should be sent to the publishers at the
undermentioned address:
ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com
© ISTE Ltd, 2009
The rights of Joseph Mariani to be identified as the author of this work have been asserted by him in
accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Cataloging-in-Publication Data
Traitement automatique du langage parlé 1 et 2. English
Spoken language processing / edited by Joseph Mariani.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-84821-031-8
1. Automatic speech recognition. 2. Speech processing systems. I. Mariani, Joseph. II. Title.
TK7895.S65T7213 2008
006.4'54--dc22
2008036758
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN: 978-1-84821-031-8
Printed and bound in Great Britain by CPI Antony Rowe Ltd, Chippenham, Wiltshire.
Table of Contents
Preface
Chapter 1. Speech Analysis
Christophe D’ALESSANDRO
1.1. Introduction
1.1.1. Source-filter model
1.1.2. Speech sounds
1.1.3. Sources
1.1.4. Vocal tract
1.1.5. Lip-radiation
1.2. Linear prediction
1.2.1. Source-filter model and linear prediction
1.2.2. Autocorrelation method: algorithm
1.2.3. Lattice filter
1.2.4. Models of the excitation
1.3. Short-term Fourier transform
1.3.1. Spectrogram
1.3.2. Interpretation in terms of filter bank
1.3.3. Block-wise interpretation
1.3.4. Modification and reconstruction
1.4. A few other representations
1.4.1. Bilinear time-frequency representations
1.4.2. Wavelets
1.4.3. Cepstrum
1.4.4. Sinusoidal and harmonic representations
1.5. Conclusion
1.6. References
Chapter 2. Principles of Speech Coding
Gang FENG and Laurent GIRIN
2.1. Introduction
2.1.1. Main characteristics of a speech coder
2.1.2. Key components of a speech coder
2.2. Telephone-bandwidth speech coders
2.2.1. From predictive coding to CELP
2.2.2. Improved CELP coders
2.2.3. Other coders for telephone speech
2.3. Wideband speech coding
2.3.1. Transform coding
2.3.2. Predictive transform coding
2.4. Audiovisual speech coding
2.4.1. A transmission channel for audiovisual speech
2.4.2. Joint coding of audio and video parameters
2.4.3. Prospects
2.5. References
Chapter 3. Speech Synthesis
Olivier BOËFFARD and Christophe D’ALESSANDRO
3.1. Introduction
3.2. Key goal: speaking for communicating
3.2.1. What acoustic content?
3.2.2. What melody?
3.2.3. Beyond the strict minimum
3.3. Synoptic presentation of the elementary modules in speech synthesis systems
3.3.1. Linguistic processing
3.3.2. Acoustic processing
3.3.3. Training models automatically
3.3.4. Operational constraints
3.4. Description of linguistic processing
3.4.1. Text pre-processing
3.4.2. Grapheme-to-phoneme conversion
3.4.3. Syntactic-prosodic analysis
3.4.4. Prosodic analysis
3.5. Acoustic processing methodology
3.5.1. Rule-based synthesis
3.5.2. Unit-based concatenative synthesis
3.6. Speech signal modeling
3.6.1. The source-filter assumption
3.6.2. Articulatory model
3.6.3. Formant-based modeling
3.6.4. Auto-regressive modeling
3.6.5. Harmonic plus noise model
3.7. Control of prosodic parameters: the PSOLA technique
3.7.1. Methodology background
3.7.2. The ancestors of the method
3.7.3. Descendants of the method
3.7.4. Evaluation
3.8. Towards variable-size acoustic units
3.8.1. Constitution of the acoustic database
3.8.2. Selection of sequences of units
3.9. Applications and standardization
3.10. Evaluation of speech synthesis
3.10.1. Introduction
3.10.2. Global evaluation
3.10.3. Analytical evaluation
3.10.4. Summary for speech synthesis evaluation
3.11. Conclusions
3.12. References
Chapter 4. Facial Animation for Visual Speech
Thierry GUIARD-MARIGNY
4.1. Introduction
4.2. Applications of facial animation for visual speech
4.2.1. Animation movies
4.2.2. Telecommunications
4.2.3. Human-machine interfaces
4.2.4. A tool for speech research
4.3. Speech as a bimodal process
4.3.1. The intelligibility of visible speech
4.3.2. Visemes for facial animation
4.3.3. Synchronization issues
4.3.4. Source consistency
4.3.5. Key constraints for the synthesis of visual speech
4.4. Synthesis of visual speech
4.4.1. The structure of an artificial talking head
4.4.2. Generating expressions
4.5. Animation
4.5.1. Analysis of the image of a face
4.5.2. The puppeteer
4.5.3. Automatic analysis of the speech signal
4.5.4. From the text to the phonetic string
4.6. Conclusion
4.7. References
Chapter 5. Computational Auditory Scene Analysis
Alain DE CHEVEIGNÉ
5.1. Introduction
5.2. Principles of auditory scene analysis
5.2.1. Fusion versus segregation: choosing a representation
5.2.2. Features for simultaneous fusion
5.2.3. Features for sequential fusion
5.2.4. Schemes
5.2.5. Illusion of continuity, phonemic restoration
5.3. CASA principles
5.3.1. Design of a representation
5.4. Critique of the CASA approach
5.4.1. Limitations of ASA
5.4.2. The conceptual limits of “separable representation”
5.4.3. Neither a model, nor a method?
5.5. Perspectives
5.5.1. Missing feature theory
5.5.2. The cancellation principle
5.5.3. Multimodal integration
5.5.4. Auditory scene synthesis: transparency measure
5.6. References
Chapter 6. Principles of Speech Recognition
Renato DE MORI and Brigitte BIGI
6.1. Problem definition and approaches to the solution
6.2. Hidden Markov models for acoustic modeling
6.2.1. Definition
6.2.2. Observation probability and model parameters
6.2.3. HMM as probabilistic automata
6.2.4. Forward and backward coefficients
6.3. Observation probabilities
6.4. Composition of speech unit models
6.5. The Viterbi algorithm
6.6. Language models
6.6.1. Perplexity as an evaluation measure for language models
6.6.2. Probability estimation in the language model
6.6.3. Maximum likelihood estimation
6.6.4. Bayesian estimation
6.7. Conclusion
6.8. References
Chapter 7. Speech Recognition Systems
Jean-Luc GAUVAIN and Lori LAMEL
7.1. Introduction
7.2. Linguistic model
7.3. Lexical representation
7.4. Acoustic modeling
7.4.1. Feature extraction
7.4.2. Acoustic-phonetic models
7.4.3. Adaptation techniques
7.5. Decoder
7.6. Applicative aspects
7.6.1. Efficiency: speed and memory
7.6.2. Portability: languages and applications
7.6.3. Confidence measures
7.6.4. Beyond words
7.7. Systems
7.7.1. Text dictation
7.7.2. Audio document indexing
7.7.3. Dialog systems
7.8. Perspectives
7.9. References
Chapter 8. Language Identification
Martine ADDA-DECKER
8.1. Introduction
8.2. Language characteristics
8.3. Language identification by humans
8.4. Language identification by machines
8.4.1. LId tasks
8.4.2. Performance measures
8.4.3. Evaluation
8.5. LId resources
8.6. LId formulation
8.7. LId modeling
8.7.1. Acoustic front-end
8.7.2. Acoustic language-specific modeling
8.7.3. Parallel phone recognition
8.7.4. Phonotactic modeling
8.7.5. Back-end optimization
8.8. Discussion
8.9. References
Chapter 9. Automatic Speaker Recognition
Frédéric BIMBOT
9.1. Introduction
9.1.1. Voice variability and characterization
9.1.2. Speaker recognition
9.2. Typology and operation of speaker recognition systems
9.2.1. Speaker recognition tasks
9.2.2. Operation
9.2.3. Text-dependence
9.2.4. Types of errors
9.2.5. Influencing factors
9.3. Fundamentals
9.3.1. General structure of speaker recognition systems
9.3.2. Acoustic analysis
9.3.3. Probabilistic modeling
9.3.4. Identification and verification scores
9.3.5. Score compensation and decision
9.3.6. From theory to practice
9.4. Performance evaluation
9.4.1. Error rate
9.4.2. DET curve and EER
9.4.3. Cost function, weighted error rate and HTER
9.4.4. Distribution of errors
9.4.5. Orders of magnitude
9.5. Applications
9.5.1. Physical access control
9.5.2. Securing remote transactions
9.5.3. Audio information indexing
9.5.4. Education and entertainment
9.5.5. Forensic applications
9.5.6. Perspectives
9.6. Conclusions
9.7. Further reading
Chapter 10. Robust Recognition Methods
Jean-Paul HATON
10.1. Introduction
10.2. Signal pre-processing methods
10.2.1. Spectral subtraction
10.2.2. Adaptive noise cancellation
10.2.3. Space transformation
10.2.4. Channel equalization
10.2.5. Stochastic models
10.3. Robust parameters and distance measures
10.3.1. Spectral representations
10.3.2. Auditory models
10.3.3. Distance measure
10.4. Adaptation methods
10.4.1. Model composition
10.4.2. Statistical adaptation
10.5. Compensation of the Lombard effect
10.6. Missing data scheme
10.7. Conclusion
10.8. References
Chapter 11. Multimodal Speech: Two or Three senses are Better than One
Jean-Luc SCHWARTZ, Pierre ESCUDIER and Pascal TEISSIER
11.1. Introduction
11.2. Speech is a multimodal process
11.2.1. Seeing without hearing
11.2.2. Seeing for hearing better in noise
11.2.3. Seeing for better hearing… even in the absence of noise
11.2.4. Bimodal integration imposes itself to perception
11.2.5. Lip reading as taking part to the ontogenesis of speech
11.2.6. ...and to its phylogenesis?
11.3. Architectures for audio-visual fusion in speech perception
11.3.1. Three paths for sensory interactions in cognitive psychology
11.3.2. Three paths for sensor fusion in information processing
11.3.3. The four basic architectures for audiovisual fusion
11.3.4. Three questions for a taxonomy
11.3.5. Control of the fusion process
11.4. Audio-visual speech recognition systems
11.4.1. Architectural alternatives
11.4.2. Taking into account contextual information
11.4.3. Pre-processing
11.5. Conclusions
11.6. References
Chapter 12. Speech and Human-Computer Communication
Wolfgang MINKER and Françoise NÉEL
12.1. Introduction
12.2. Context
12.2.1. The development of micro-electronics
12.2.2. The expansion of information and communication technologies and increasing interconnection of computer systems
12.2.3. The coordination of research efforts and the improvement of automatic speech processing systems
12.3. Specificities of speech
12.3.1. Advantages of speech as a communication mode
12.3.2. Limitations of speech as a communication mode
12.3.3. Multidimensional analysis of commercial speech recognition products
12.4. Application domains with voice-only interaction
12.4.1. Inspection, control and data acquisition
12.4.2. Home automation: electronic home assistant
12.4.3. Office automation: dictation and speech-to-text systems
12.4.4. Training
12.4.5. Automatic translation
12.5. Application domains with multimodal interaction
12.5.1. Interactive terminals
12.5.2. Computer-aided graphic design
12.5.3. On-board applications
12.5.4. Human-human communication facilitation
12.5.5. Automatic indexing of audio-visual documents
12.6. Conclusions
12.7. References
Chapter 13. Voice Services in the Telecom Sector
Laurent COURTOIS, Patrick BRISARD and Christian GAGNOULET
13.1. Introduction
13.2. Automatic speech processing and telecommunications
13.3. Speech coding in the telecommunication sector
13.4. Voice command in telecom services
13.4.1. Advantages and limitations of voice command
13.4.2. Major trends
13.4.3. Major voice command services
13.4.4. Call center automation (operator assistance)
13.4.5. Personal voice phonebook
13.4.6. Voice personal telephone assistants
13.4.7. Other services based on voice command
13.5. Speaker verification in telecom services
13.6. Text-to-speech synthesis in telecommunication systems
13.7. Conclusions
13.8. References
List of Authors
Index
Preface
This book, entitled Spoken Language Processing, addresses all aspects of the
automatic processing of spoken language: how to automate its production and
perception, and how to synthesize and understand it. It draws on existing
know-how in the fields of signal processing, pattern recognition, stochastic modeling,
computational linguistics and human factors, but also relies on knowledge specific to
spoken language.
The automatic processing of spoken language covers activities related to the
analysis of speech, including variable rate coding to store or transmit it, to its
synthesis, especially from text, and to its recognition and understanding, whether for
transcription, possibly followed by automatic indexing, or for human-machine
dialog or human-human machine-assisted interaction. It also includes speaker and
spoken language recognition. These tasks may take place in a noisy environment,
which makes the problem even more difficult.
The activities in the field of automatic spoken language processing started after
the Second World War with the works on the Vocoder and Voder at Bell Labs by
Dudley and colleagues, and were made possible by the availability of electronic
devices. Initial research work on basic recognition systems was carried out with very
limited computing resources in the 1950s. The computer facilities that became
available to researchers in the 1970s made it possible to achieve initial progress
within laboratories, and microprocessors then led to the early commercialization of
the first voice recognition and speech synthesis systems at an affordable price. The
steady progress in the speed of computers and in the storage capacity accompanied
the scientific advances in the field.
Research investigations in the 1970s, including those carried out in the large
DARPA “Speech Understanding Systems” (SUS) program in the USA, suffered
from a lack of availability of speech data and of means and methods for evaluating
the performance of different approaches and systems. The establishment by
DARPA, as part of its following program launched in 1984, of a national language
resources center, the Linguistic Data Consortium (LDC), and of a system assessment
center, within the National Institute of Standards and Technology (NIST, formerly
NBS), brought this area of research into maturity. The evaluation campaigns in the
area of speech recognition, launched in 1987, made it possible to compare the
different approaches that had coexisted up to then, based on “Artificial Intelligence”
methods or on stochastic modeling methods using large amounts of data for training,
with a clear advantage to the latter. This led progressively to a quasi-generalization
of stochastic approaches in most laboratories in the world. The progress made by
researchers has constantly accompanied the increasing difficulty of the tasks which
were handled, starting from the recognition of sentences read aloud, with a limited
vocabulary of 1,000 words, either speaker-dependent or speaker-independent, to the
dictation of newspaper articles for vocabularies of 5,000, 20,000 and 64,000 words,
and then to the transcription of radio or television broadcast news, with unlimited
size vocabularies. These evaluations were opened to the international community in
1992. They first focused on the American English language, but early initiatives
were also carried out on the French, German or British English languages in a
French or European context. Other campaigns were subsequently held on speaker
recognition, language identification or speech synthesis in various contexts,
allowing for a better understanding of the pros and cons of an approach, and for
measuring the status of technology and the progress achieved or still to be achieved.
They led to the conclusion that a sufficient level of maturity had been reached for
putting the technology on the market, in the field of voice dictation systems for
example. However, they also identified the difficulty of other more challenging
problems, such as those related to the recognition of conversational speech,
justifying the need to keep on supporting fundamental research in this area.
This book consists of two parts: a first part discusses the analysis and synthesis
of speech and a second part speech recognition and understanding. The first part
starts with a brief introduction of the principles of speech production, followed by a
broad overview of the methods for analyzing speech: linear prediction, short-term
Fourier transform, time-representations, wavelets, cepstrum, etc. The main methods
for speech coding are then developed for the telephone bandwidth, such as the CELP
coder, or, for broadband communication, such as “transform coding” and
quantization methods. The audio-visual coding of speech is also introduced. The
various operations to be carried out in a text-to-speech synthesis system are then
presented regarding the linguistic processes (grapheme-to-phoneme transcription,
syntactic and prosodic analysis) and the acoustic processes, using rule-based
approaches or approaches based on the concatenation of variable length acoustic
units. The different types of speech signal modeling – articulatory, formant-based,
auto-regressive, harmonic-noise or PSOLA-like – are then described. The evaluation
of speech synthesis systems is a topic of specific attention in this chapter. The
extension of speech synthesis to talking faces animation is the subject of the next
chapter, with a presentation of the application fields, of the interest of a bimodal
approach and of models used to synthesize and animate the face. Finally,
computational auditory scene analysis opens prospects in the signal processing of
speech, especially in noisy environments.
The second part of the book focuses on speech recognition. The principles of
speech recognition are first presented. Hidden Markov models are introduced, as
well as their use for the acoustic modeling of speech. The Viterbi algorithm is
depicted, before introducing language modeling and the way to estimate
probabilities. It is followed by a presentation of recognition systems, based on those
principles and on the integration of those methodologies, and of lexical and
acoustic-phonetic knowledge. The applicative aspects are highlighted, such as
efficiency, portability and confidence measures, before describing three types of
recognition systems: for text dictation, for audio documents indexing and for oral
dialog. Research in language identification aims at recognizing which language is
spoken, using acoustic, phonetic, phonotactic or prosodic information. The
characteristics of languages are introduced and the way humans or machines can
achieve that task is depicted, with a large presentation of the present performances
of such systems. Speaker recognition addresses the recognition and verification of
the identity of a person based on his voice. After an introduction on what
characterizes a voice, the different types and designs of systems are presented, as
well as their theoretical background. The way to evaluate the performances of
speaker recognition systems and the applications of this technology are a specific
topic of interest. The use of speech or speaker recognition systems in noisy
environments raises especially difficult problems to solve, but they must be taken
into account in any operational use of such systems. Various methods are available,
either by pre-processing the signal, during the parameterization phase, by using
specific distances or by adaptation methods. The Lombard effect, which causes a
change in the production of the voice signal itself due to the noisy environment
surrounding the speaker, receives special attention. Along with recognition
based solely on the acoustic signal, bi-modal recognition combines two acquisition
channels: auditory and visual. The value added by bimodal processing in a noisy
environment is emphasized and architectures for the audiovisual merging of audio
and visual speech recognition are presented. Finally, applications of automatic
spoken language processing systems, generally for human-machine communication
and particularly in telecommunications, are described. Many applications of speech
coding, recognition or synthesis exist in many fields, and the market is growing
rapidly. However, there are still technological and psychological barriers that require
more work on modeling human factors and ergonomics, in order to make those
systems widely accepted.
The reader, undergraduate or graduate student, engineer or researcher will find in
this book many contributions of leading French experts of international renown who
share the same enthusiasm for this exciting field: the processing by machines of a
capacity which used to be specific to humans: language.
Finally, as editor, I would like to warmly thank Anna and Frédéric Bimbot for
the excellent work they achieved in translating the book Traitement automatique du
langage parlé, on which this book is based.
Joseph Mariani
November 2008
Chapter 1
Speech Analysis
1.1. Introduction
1.1.1. Source-filter model
Speech, the acoustic manifestation of language, is probably the main means of
communication between human beings. The invention of telecommunications and
the development of digital information processing have therefore entailed vast
amounts of research aimed at understanding the mechanisms of speech
communication.
Speech can be approached from different angles. In this chapter, we will
consider speech as a signal, a one-dimensional function, which depends on the time
variable (as in [BOI 87, OPP 89, PAR 86, RAB 75, RAB 77]). The acoustic speech
signal is obtained at a given point in space by a sensor (microphone) and converted
into electrical values. These values are denoted s(t) and they represent a real-valued
function of real variable t, analogous to the variation of the acoustic pressure. Even
if the acoustic form of the speech signal is the most widespread (it is the only signal
transmitted over the telephone), other types of analysis also exist, based on
alternative physiological signals (for instance, the electroglottographic signal, the
palatographic signal, the airflow), or related to other modalities (for example, the
image of the face or the gestures of the articulators). The field of speech analysis
covers the set of methods aiming at the extraction of information on and from this
signal, in various applications, such as:
Chapter written by Christophe D’ALESSANDRO.
– speech coding: the compression of information carried by the acoustic signal,
in order to save data storage or to reduce transmission rate;
– speech recognition and understanding, speaker and spoken language
recognition;
– speech synthesis or automatic speech generation, from an arbitrary text;
– speech signal processing, which covers many applications, such as auditory
aid, denoising, speech encrypting, echo cancellation, post-processing for audiovisual
applications;
– phonetic and linguistic analysis, speech therapy, voice monitoring in
professional situations (for instance, singers, speakers, teachers, managers, etc.).
Two ways of approaching signal analysis can be distinguished: the model-based
approach and the representation-based approach. When a voice signal model (or a
voice production model or a voice perception model) is assumed, the goal of the
analysis step is to identify the parameters of that model. Thus, many analysis
methods, referred to as parametric methods, are based on the source-filter model of
speech production; for example, the linear prediction method. On the other hand,
when no particular hypothesis is made on the signal, mathematical representations
equivalent to its time representation can be defined, so that new information can be
drawn from the coefficients of the representation. An example of a non-parametric
method is the short-term Fourier transform (STFT). Finally, there are some hybrid
methods (sometimes referred to as semi-parametric). These consist of estimating
some parameters from non-parametric representations. The sinusoidal and cepstral
representations are examples of semi-parametric representation.
This chapter is centered on the linear acoustic source-filter speech production
model. It presents the most common speech signal analysis techniques, together with
a few illustrations. The reader is assumed to be familiar with the fundamentals of
digital signal processing, such as discrete-time signals, Fourier transform, Laplace
transform, Z-transforms and digital filters.
1.1.2. Speech sounds
The human speech apparatus can be broken down into three functional parts
[HAR 76]: 1) the lungs and trachea, 2) the larynx and 3) the vocal tract. The
abdomen and thorax muscles are the engine of the breathing process. Compressed
by the muscular system, the lungs act as bellows and supply some air under pressure
which travels through the trachea (subglottic pressure). The airflow thus expired is
then modulated by the movements of the larynx and those of the vocal tract.
The larynx is composed of the set of muscles, articulated cartilage, ligaments and
mucous membranes located between the trachea on one side, and the pharyngeal
cavity on the other side. The cartilage, ligaments and muscles in the larynx can set
the vocal cords in motion, the opening of which is called the glottis. When the vocal
cords lie apart from each other, the air can circulate freely through the glottis and no
sound is produced. When both membranes are close to each other, they can join and
modulate the subglottic airflow and pressure, thus generating isolated pulses or
vibrations. The fundamental frequency of these vibrations governs the pitch of the
voice signal (F0).
The vocal tract can be subdivided into three cavities: the pharynx (from the
larynx to the velum and the back of the tongue), the oral tract (from the pharynx to
the lips) and the nasal cavity. When it is open, the velum is able to divert some air
from the pharynx to the nasal cavity. The geometrical configuration of the vocal
tract depends on the organs responsible for the articulation: jaws, lips, tongue.
Each language uses a certain subset of sounds, among those that the speech
apparatus can produce [MAL 74]. The smallest distinctive sound units used in a
given language are called phonemes. The phoneme is the smallest spoken unit
which, when substituted with another one, changes the linguistic content of an
utterance. For instance, changing the initial /p/ sound of “pig” (/pIg/) into /b/ yields
a different word: “big” (/bIg/). Therefore, the phonemes /p/ and /b/ can be
distinguished from each other.
A set of phonemes, which can be used for the description of various languages
[WEL 97], is given in Table 1.1 (described both by the International Phonetic
Alphabet, IPA, and the computer readable Speech Assessment Methodologies
Phonetic Alphabet, SAMPA). The first subdivision that is observed relates to the
excitation mode and to the vocal tract stability: the distinction between vowels and
consonants. Vowels correspond to a periodic vibration of the vocal cords and to a
stable configuration of the vocal tract. Depending on whether the nasal branch is
open or not (as a result of the lowering of the velum), vowels have either a nasal or
an oral character. Semivowels are produced when the periodic glottal excitation
occurs simultaneously with a fast movement of the vocal tract, between two vocalic
positions.
Consonants correspond to fast constriction movements of the articulatory organs,
i.e. generally to rather unstable sounds, which evolve over time. For fricatives, a
strong constriction of the vocal tract causes a friction noise. If the vocal cords
vibrate at the same time, the fricative consonant is then voiced. Otherwise, if the
vocal folds let the air pass through without producing any sound, the fricative is
unvoiced. Plosives are obtained by a complete obstruction of the vocal tract,
followed by a release phase. If produced together with the vibration of the vocal
cords, the plosive is voiced, otherwise it is unvoiced. If the nasal branch is opened
during the mouth closure, the produced sound is a nasal consonant. Semivowels are
considered voiced consonants, resulting from a fast movement which briefly passes
through the articulatory position of a vowel. Finally, liquid consonants are produced
as the combination of a voiced excitation and fast articulatory movements, mainly
from the tongue.
SAMPA   ASCII  IPA  Symbol name        Unicode  Unicode  Label and exemplification
symbol                                 hex      dec.

Vowels
A       65     ɑ    script a           0251     593      open back unrounded, Cardinal 5, Eng. start
{       123    æ    ae ligature        00E6     230      near-open front unrounded, Eng. trap
6       54     ɐ    turned a           0250     592      open schwa, Ger. besser
Q       81     ɒ    turned script a    0252     594      open back rounded, Eng. lot
E       69     ɛ    epsilon            025B     603      open-mid front unrounded, Fr. même
@       64     ə    turned e           0259     601      schwa, Eng. banana
3       51     ɜ    rev. epsilon       025C     604      long mid central, Eng. nurse
I       73     ɪ    small cap I        026A     618      lax close front unrounded, Eng. kit
O       79     ɔ    turned c           0254     596      open-mid back rounded, Eng. thought
2       50     ø    o-slash            00F8     248      close-mid front rounded, Fr. deux
9       57     œ    oe ligature        0153     339      open-mid front rounded, Fr. neuf
&       38     ɶ    s.c. OE ligature   0276     630      open front rounded, Swedish skörd
U       85     ʊ    upsilon            028A     650      lax close back rounded, Eng. foot
}       125    ʉ    barred u           0289     649      close central rounded, Swedish sju
V       86     ʌ    turned v           028C     652      open-mid back unrounded, Eng. strut
Y       89     ʏ    small cap Y        028F     655      lax [y], Ger. hübsch

Consonants
B       66     β    beta               03B2     946      voiced bilabial fricative, Sp. cabo
C       67     ç    c-cedilla          00E7     231      voiceless palatal fricative, Ger. ich
D       68     ð    eth                00F0     240      voiced dental fricative, Eng. then
G       71     ɣ    gamma              0263     611      voiced velar fricative, Sp. fuego
L       76     ʎ    turned y           028E     654      palatal lateral, It. famiglia
J       74     ɲ    left-tail n        0272     626      palatal nasal, Sp. año
N       78     ŋ    eng                014B     331      velar nasal, Eng. thing
R       82     ʁ    inv. s.c. R        0281     641      voiced uvular fricative or trill, Fr. roi
S       83     ʃ    esh                0283     643      voiceless palatoalveolar fricative, Eng. ship
T       84     θ    theta              03B8     952      voiceless dental fricative, Eng. thin
H       72     ɥ    turned h           0265     613      labial-palatal semivowel, Fr. huit
Z       90     ʒ    ezh (yogh)         0292     658      voiced palatoalveolar fricative, Eng. measure
?       63     ʔ    dotless ?          0294     660      glottal stop, Ger. Verein, also Danish stød

Table 1.1. Computer-readable Speech Assessment Methodologies Phonetic Alphabet,
SAMPA, and its correspondence in the International Phonetic Alphabet,
IPA, with examples in 6 different languages [WEL 97]
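The correspondence in Table 1.1 lends itself to a simple lookup. The following is a minimal sketch, not part of the original text: the names `SAMPA_TO_IPA` and `sampa_to_ipa` are hypothetical, the code points are the Unicode hex values listed in the table, and symbols absent from the table (such as plain "p" or "g") are assumed to map to themselves. Only one-character SAMPA symbols are handled.

```python
# SAMPA-to-IPA lookup built from the Unicode hex codes listed in Table 1.1.
SAMPA_TO_IPA = {
    # Vowels
    "A": 0x0251, "{": 0x00E6, "6": 0x0250, "Q": 0x0252,
    "E": 0x025B, "@": 0x0259, "3": 0x025C, "I": 0x026A,
    "O": 0x0254, "2": 0x00F8, "9": 0x0153, "&": 0x0276,
    "U": 0x028A, "}": 0x0289, "V": 0x028C, "Y": 0x028F,
    # Consonants
    "B": 0x03B2, "C": 0x00E7, "D": 0x00F0, "G": 0x0263,
    "L": 0x028E, "J": 0x0272, "N": 0x014B, "R": 0x0281,
    "S": 0x0283, "T": 0x03B8, "H": 0x0265, "Z": 0x0292,
    "?": 0x0294,
}


def sampa_to_ipa(transcription: str) -> str:
    """Convert a SAMPA string to IPA, one character at a time.

    Characters not listed in Table 1.1 are copied unchanged (assumption).
    """
    return "".join(
        chr(SAMPA_TO_IPA[ch]) if ch in SAMPA_TO_IPA else ch
        for ch in transcription
    )


# The minimal pair used in the text: "pig" /pIg/ versus "big" /bIg/.
pig_ipa = sampa_to_ipa("pIg")
big_ipa = sampa_to_ipa("bIg")
```

Storing code points rather than literal glyphs keeps the sketch independent of the source file encoding.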
In speech production, sound sources appear to be relatively localized; they excite
the acoustic cavities in which the resulting air disturbances propagate and then
radiate to the outer acoustic field. This relative independence of the sources from the
transformations that they undergo is the basis for the acoustic theory of speech
production [FAN 60, FLA 72, STE 99]. This theory considers source terms, on the
one hand, which are generally assumed to be non-linear, and a linear filter on the
other hand, which acts upon and transforms the source signal. This source-filter
decomposition reflects the terminology commonly used in phonetics, which
describes the speech sounds in terms of “phonation” (source) and “articulation”
(filter). The source and filter acoustic contributions can be studied separately, as
they can be considered to be decoupled from each other, in a first approximation.
From the point of view of physics, this model is an approximation, the main
advantage of which is its simplicity. It can be considered as valid at frequencies
below 4 or 5 kHz, i.e. those frequencies for which the propagation in the vocal tract
consists of one-dimensional plane waves. For signal processing purposes, the
acoustic model can be described as a linear system, by neglecting the source-filter
interaction:
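$$s(t) = e(t) * v(t) * l(t) = [p(t) + r(t)] * v(t) * l(t) \qquad [1.1]$$

$$\phantom{s(t)} = \left[ \sum_{i=-\infty}^{+\infty} \delta(t - iT_0) * u_g(t) + r(t) \right] * v(t) * l(t) \qquad [1.2]$$

$$S(\omega) = E(\omega) \times V(\omega) \times L(\omega) = [P(\omega) + R(\omega)] \times V(\omega) \times L(\omega) \qquad [1.3]$$

$$\phantom{S(\omega)} = \left[ \left( \sum_{i=-\infty}^{+\infty} \delta(\omega - iF_0) \right) |U_g(\omega)|\, e^{j\theta_{u_g}(\omega)} + |R(\omega)|\, e^{j\theta_r(\omega)} \right] \times |V(\omega)|\, e^{j\theta_v(\omega)} \times |L(\omega)|\, e^{j\theta_l(\omega)} \qquad [1.4]$$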
where s(t) is the speech signal, v(t) the impulse response of the vocal tract, e(t) the
vocal excitation source, l(t) the impulse response of the lip radiation component, p(t)
the periodic part of the excitation, r(t) the non-periodic (noise) part of the excitation,
ug(t) the glottal airflow wave, T0 the fundamental period, δ the Dirac distribution,
and where S(ω), V(ω), E(ω), L(ω), P(ω), R(ω), Ug(ω) denote the Fourier transforms
of s(t), v(t), e(t), l(t), p(t), r(t), ug(t) respectively. F0 = 1/T0 is the voicing
fundamental frequency. The various terms of the source-filter model are now going
to be studied in more detail.
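The time-domain form of the source-filter model, s(t) = [p(t) + r(t)] * v(t) * l(t), can be illustrated with a few lines of code. The sketch below is only an illustration under stated assumptions: the sampling rate, F0, pulse shape, noise level, the single-resonance vocal tract and the first-difference lip radiation are all placeholder choices that do not come from the text, and `convolve` is a hypothetical helper.

```python
import math
import random


def convolve(x, h):
    """Full linear convolution of two finite sequences (lists of floats)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


fs = 8000                 # sampling rate in Hz (assumed)
f0 = 100                  # fundamental frequency F0 in Hz (assumed)
period = fs // f0         # fundamental period T0, in samples
n = 4 * period            # synthesize four periods

# Periodic part p(t): Dirac comb convolved with one glottal pulse u_g(t).
comb = [1.0 if k % period == 0 else 0.0 for k in range(n)]
open_len = (6 * period) // 10          # open phase, 60% of T0 (assumed)
ug = [(k / open_len) ** 2 - (k / open_len) ** 3 for k in range(open_len)]
p = convolve(comb, ug)[:n]

# Noise part r(t): weak white Gaussian noise (assumed level).
rng = random.Random(0)
r = [0.01 * rng.gauss(0.0, 1.0) for _ in range(n)]

# Excitation e(t) = p(t) + r(t).
e = [pk + rk for pk, rk in zip(p, r)]

# Vocal tract v(t): a single damped resonance near 500 Hz (toy filter).
v = [math.exp(-math.pi * 80 * k / fs) * math.sin(2 * math.pi * 500 * k / fs)
     for k in range(200)]

# Lip radiation l(t): first difference, a common approximation (assumed).
l = [1.0, -1.0]

# Speech signal s(t) = e(t) * v(t) * l(t).
s = convolve(convolve(e, v), l)
```

Because the model is linear, the order of the convolutions does not matter: filtering by l(t) before v(t) gives the same signal up to rounding.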
1.1.3. Sources
The source component e(t), E(ω) is a signal composed of a periodic part
(vibrations of the vocal cords, characterized by F0 and the glottal airflow waveform)
and a noise part. The various phonemes use both types of source excitation either
separately or simultaneously.
1.1.3.1. Glottal airflow wave
The study of glottal activity (phonation) is particularly important in speech
science. Physical models of the glottis functioning, in terms of mass-spring systems
have been investigated [FLA 72]. Several types of physiological signals can be used
to conduct studies on the glottal activity (for example, electroglottography, fast
photography, see [TIT 94]). From the acoustic point of view, the glottal airflow
wave, which represents the airflow traveling through the glottis as a function of
time, is preferred to the pressure wave. It is indeed easier to measure the glottal
airflow rather than the glottal pressure, from physiological data. Moreover, the
pseudo-periodic voicing source p(t) can be broken down into two parts: a pulse
train, which represents the periodic part of the excitation and a low-pass filter, with
an impulse response ug, which corresponds to the (frequency-domain and time-
domain) shape of the glottal airflow wave.
The time-domain shape of the glottal airflow wave (or, more precisely, of its
derivative) generally governs the behavior of the time-domain signal for vowels and
voiced signals [ROS 71]. Time-domain models of the glottal airflow have several
properties in common: they are periodical, always non-negative (no incoming
airflow), they are continuous functions of the time variable, derivable everywhere
except, in some cases, at the closing instant. An example of such a time-domain
model is the Klatt model [KLA 90], which calls for 4 parameters (the fundamental
frequency F0, the voicing amplitude AV, the opening ratio Oq and the frequency TL
of a spectral attenuation filter). When there is no attenuation, the KGLOTT88 model
is written:

$$U_g(t) = \begin{cases} a t^2 - b t^3 & \text{for } 0 \le t \le O_q T_0 \\ 0 & \text{for } O_q T_0 \le t \le T_0 \end{cases} \quad \text{with} \quad a = \frac{27\,AV}{4\,O_q^2\,T_0}, \quad b = \frac{27\,AV}{4\,O_q^3\,T_0^2} \qquad [1.5]$$

When TL ≠ 0, Ug(t) is filtered by an additional low-pass filter, with an attenuation at
3,000 Hz equal to TL dB.
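As an illustration, equation [1.5] can be evaluated directly. The sketch below assumes TL = 0 (no additional attenuation filter); the function name `kglott88` and the numerical values of T0, AV and Oq are arbitrary choices made for the example.

```python
def kglott88(t, T0, AV, Oq):
    """Glottal airflow U_g(t) of equation [1.5] for 0 <= t <= T0, with TL = 0."""
    a = 27.0 * AV / (4.0 * Oq ** 2 * T0)
    b = 27.0 * AV / (4.0 * Oq ** 3 * T0 ** 2)
    if 0.0 <= t <= Oq * T0:
        return a * t ** 2 - b * t ** 3   # open phase
    return 0.0                           # closed phase


T0 = 0.01   # fundamental period in seconds, i.e. F0 = 100 Hz (assumed)
AV = 1.0    # voicing amplitude (assumed)
Oq = 0.6    # opening ratio (assumed)

# One period sampled on 1,001 points.
one_period = [kglott88(k * T0 / 1000, T0, AV, Oq) for k in range(1001)]
```

With these coefficients the flow returns to zero at t = Oq T0 (since a/b = Oq T0) and peaks at two thirds of the open phase, t = 2 Oq T0 / 3, where it equals AV T0.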
The LF model [FAN 85] represents the derivative of the glottal airflow with 5
parameters (fundamental period T0, amplitude at the minimum of the derivative or at
the maximum of the wave Ee, instant of maximum excitation Te, instant of
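maximum airflow wave Tp, time constant for the return phase Ta):

$$U'_g(t) = \begin{cases} -E_e\, e^{a(t - T_e)}\, \dfrac{\sin(\pi t / T_p)}{\sin(\pi T_e / T_p)} & \text{for } 0 \le t \le T_e \\[2ex] -\dfrac{E_e}{\varepsilon T_a} \left( e^{-\varepsilon (t - T_e)} - e^{-\varepsilon (T_0 - T_e)} \right) & \text{for } T_e \le t \le T_0 \end{cases} \qquad [1.6]$$

In this equation, parameter ε is defined by an implicit equation:

$$\varepsilon T_a = 1 - e^{-\varepsilon (T_0 - T_e)} \qquad [1.7]$$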
All time-domain models (see Figure 1.1) have at least three main parameters: the
voicing amplitude, which governs the time-domain amplitude of the wave, the
voicing period, and the opening duration, i.e. the fraction of the period during which
the wave is non-zero. In fact, the glottal wave represents the airflow traveling
through the glottis. This flow is zero when the vocal cords are closed. It is positive
when they are open. A fourth parameter is introduced in some models to account for
the speed at which the glottis closes. This closing speed is related to the high
frequency part of the speech spectrum.
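For the LF model, the return-phase parameter ε has no closed form: it is the positive root of the implicit relation ε Ta = 1 − exp(−ε (T0 − Te)). A minimal sketch of a numerical solution by bisection follows; the function name `lf_epsilon`, the tolerance and the example parameter values are assumptions made for illustration, and other root-finding methods would do equally well.

```python
import math


def lf_epsilon(T0, Te, Ta, tol=1e-12):
    """Positive root of eps*Ta = 1 - exp(-eps*(T0 - Te)), found by bisection."""
    D = T0 - Te
    if not 0.0 < Ta < D:
        raise ValueError("a positive root requires 0 < Ta < T0 - Te")

    def g(eps):
        return 1.0 - math.exp(-eps * D) - eps * Ta

    lo = (D - Ta) / D ** 2    # g(lo) > 0 at this point
    hi = 1.0 / Ta             # g(hi) = -exp(-D/Ta) < 0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example values (assumed): T0 = 10 ms, Te = 6 ms, Ta = 0.3 ms.
eps = lf_epsilon(T0=0.01, Te=0.006, Ta=0.0003)
```

When Ta is much shorter than T0 − Te, the exponential term is negligible and ε is close to 1/Ta, which gives a quick plausibility check on the result.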
Figure 1.1. Models of the glottal airflow waveform in the time domain: triangular model,
Rosenberg model, KGLOTT88, LF and the corresponding spectra
The general shape of the glottal airflow spectrum is one of a low-pass filter. Fant
[FAN 60] uses four poles on the negative real axis:

$$U_g(s) = \frac{U_{g0}}{\prod_{r=1}^{4} \left( 1 + \dfrac{s}{s_r} \right)} \qquad [1.8]$$
with sr1 ≈ sr2 = 2π × 100 Hz, and sr3 = 2π × 2,000 Hz, sr4 = 2π × 4,000 Hz. This is a
spectral model with six parameters (F0, Ug0 and four poles), among which two are
fixed (sr3 and sr4). This simple form is used in [MAR 76] in the digital domain, as a
second-order low-pass filter, with a double real pole in K:

$$U_g(z) = \frac{U_{g0}}{(1 - K z^{-1})^2} \qquad [1.9]$$
Two poles are sufficient in this case, as the numerical model is only valid up to
approximately 4,000 Hz. Such a filter depends on three parameters: gain Ug0, which
corresponds to the voicing amplitude, fundamental frequency F0 and a frequency
parameter K, which replaces both sr1 and sr2. The spectrum shows an asymptotic
slope of –12 dB/octave when the frequency increases. Parameter K controls the
filter’s cut-off frequency. When the frequency tends towards zero, |Ug(0)| ∼ Ug0.
Therefore, the spectral slope is zero in the neighborhood of zero, and –12 dB/octave,
for frequencies above a given bound (determined by K). When the focus is put on
the derivative of the glottal airflow, the two asymptotes have slopes of +6 dB/octave
and –6 dB/octave respectively. This explains the existence of a maximum in the
speech spectrum at low frequencies, stemming from the glottal source.
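The asymptotic behavior of the second-order digital filter Ug(z) = Ug0 / (1 − K z⁻¹)² can be checked by evaluating it on the unit circle. The sketch below is illustrative only: the sampling rate and the value of K are assumed, and the function name `glottal_filter_gain_db` is hypothetical.

```python
import cmath
import math


def glottal_filter_gain_db(f, K, fs, Ug0=1.0):
    """Gain in dB of U_g(z) = Ug0 / (1 - K z^-1)^2 at frequency f (in Hz)."""
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    return 20.0 * math.log10(abs(Ug0 / (1.0 - K * z_inv) ** 2))


fs = 16000.0   # sampling rate in Hz (assumed)
K = 0.95       # double real pole (assumed value)

# Gain change over one octave well above the cut-off frequency.
octave_slope_db = (glottal_filter_gain_db(2000.0, K, fs)
                   - glottal_filter_gain_db(1000.0, K, fs))

# Gain change over one octave close to zero frequency.
low_slope_db = (glottal_filter_gain_db(2.0, K, fs)
                - glottal_filter_gain_db(1.0, K, fs))
```

With these values the octave between 1 and 2 kHz loses roughly 12 dB, while the gain is almost flat near zero frequency. The slope of a digital filter only approaches the analog asymptote, so the match is approximate and degrades towards the Nyquist frequency.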
Another way to calculate the glottal airflow spectrum is to start with time-
domain models. For the Klatt model, for example, the following expression is
obtained for the Laplace transform L, when there is no additional spectral
attenuation:

$$\mathcal{L}(n'_g)(s) = \frac{27}{4}\, \frac{1}{s} \left( e^{-s} + \frac{2\,(1 + 2e^{-s})}{s} - \frac{6\,(1 - e^{-s})}{s^2} \right) \qquad [1.10]$$
Figure 1.2. Schematic spectral representation of the glottal airflow waveform. Solid line:
abrupt closure of the vocal cords (minimum spectral slope). Dashed line: dampened closure.
The cut-off frequency owed to this dampening is equal to 4 times the spectral maximum Fg
It can be shown that this is a low-pass spectrum. The derivative of the glottal
airflow shows a spectral maximum located at:

$$f_g = \frac{\sqrt{3}}{\pi}\, \frac{1}{O_q T_0} \qquad [1.11]$$
This sheds light on the links between time-domain and frequency-domain
parameters: the opening ratio (i.e. the ratio between the opening duration of the
glottis and the overall glottal period) governs the spectral peak frequency. The time-
domain amplitude rules the frequency-domain amplitude. The closing speed of the
vocal cords relates directly to the spectral attenuation in the high frequencies, which
shows a minimum slope of –12 dB/octave.
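The two results above can be sanity-checked numerically. The sketch below rests on an assumption about notation: n_g(t) is taken to be the Klatt pulse with time normalized by the open-phase duration, n_g(t) = (27/4)(t² − t³) for 0 ≤ t ≤ 1, so that its derivative is (27/4)(2t − 3t²). Under that reading, the closed form [1.10] is compared with a direct numerical integration, and [1.11] is evaluated for example values. All function names are hypothetical.

```python
import cmath
import math


def klatt_derivative_laplace(s):
    """Closed form [1.10] for the Laplace transform of n_g'(t)."""
    e = cmath.exp(-s)
    return 27.0 / (4.0 * s) * (
        e + 2.0 * (1.0 + 2.0 * e) / s - 6.0 * (1.0 - e) / s ** 2
    )


def klatt_derivative_laplace_numeric(s, steps=20000):
    """Midpoint-rule integral of (27/4)(2t - 3t^2) exp(-s t) over [0, 1]."""
    h = 1.0 / steps
    total = 0j
    for k in range(steps):
        t = (k + 0.5) * h
        total += 27.0 / 4.0 * (2.0 * t - 3.0 * t * t) * cmath.exp(-s * t)
    return total * h


def glottal_formant_hz(f0_hz, Oq):
    """Spectral maximum f_g of equation [1.11], in Hz."""
    T0 = 1.0 / f0_hz
    return math.sqrt(3.0) / (math.pi * Oq * T0)


# Example (assumed values): F0 = 100 Hz, opening ratio Oq = 0.6.
fg = glottal_formant_hz(100.0, 0.6)
```

Since f_g is inversely proportional to Oq T0, halving the opening ratio at constant F0 doubles the frequency of the spectral maximum, which is the link between time-domain and frequency-domain parameters described in the text.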
1.1.3.2. Noise sources
The periodic vibration of the vocal cords is not the only sound source in speech.
Noise sources are involved in the production of several phonemes. Two types of
noise can be observed: transient noise and continuous noise. When a plosive is
produced, the holding phase (total obstruction of the vocal tract) is followed by a
release phase. A transient noise is then produced by the pressure and airflow
impulse generated by the opening of the obstruction. The source is located in the
vocal tract, at the point where the obstruction and release take place. The impulse is
a wide-band noise which slightly varies with the plosive.
For continuous noise (fricatives), the sound originates from turbulence in the
fast airflow at the level of the constriction. Shadle [SHA 90] distinguishes noise
caused by the lining and noise caused by obstacles, depending on the incidence
angle of the air stream on the constriction. In both cases, the turbulence produces a
source of random acoustic pressure downstream of the constriction. The power
spectrum of this signal is approximately flat in the range of 0 – 4,000 Hz, and then
decreases with frequency.
When the constriction is located at the glottis, the resulting noise (aspiration
noise) shows a wide-band spectral maximum around 2,000 Hz. When the
constriction is in the vocal tract, the resulting noise (frication noise) also shows a
roughly flat spectrum, either slowly decreasing or with a wide maximum somewhere
between 4 kHz and 9 kHz. The position of this maximum depends on the fricative.
The excitation source for continuous noise can thus be considered as a white
Gaussian noise filtered by a low-pass filter or by a wide band-pass filter (several
kHz wide).
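The description of a continuous noise source as filtered white Gaussian noise can be sketched as follows. This is an illustration under assumptions, not a model taken from the text: the band-pass filter is a simple two-pole resonator, and the sampling rate, center frequency and bandwidth are arbitrary example values. The function names are hypothetical.

```python
import cmath
import math
import random


def resonator_coeffs(fc, bw, fs):
    """Denominator coefficients (a1, a2) of a two-pole resonator."""
    r = math.exp(-math.pi * bw / fs)                  # pole radius
    a1 = -2.0 * r * math.cos(2.0 * math.pi * fc / fs)
    a2 = r * r
    return a1, a2


def frication_noise(n, fc, bw, fs, seed=0):
    """White Gaussian noise shaped by the resonator (wide band-pass)."""
    rng = random.Random(seed)
    a1, a2 = resonator_coeffs(fc, bw, fs)
    y1 = y2 = 0.0
    out = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)          # white Gaussian excitation
        y = x - a1 * y1 - a2 * y2        # y[n] = x[n] - a1 y[n-1] - a2 y[n-2]
        out.append(y)
        y2, y1 = y1, y
    return out


def resonator_gain(f, fc, bw, fs):
    """Magnitude response of the resonator at frequency f (in Hz)."""
    a1, a2 = resonator_coeffs(fc, bw, fs)
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    return abs(1.0 / (1.0 + a1 * z_inv + a2 * z_inv * z_inv))


fs = 22050.0                                   # sampling rate in Hz (assumed)
noise = frication_noise(2000, fc=5000.0, bw=3000.0, fs=fs)
```

Moving the center frequency within the 4 kHz to 9 kHz range changes where the wide spectral maximum falls, which is how such a sketch would distinguish one fricative from another.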
In continuous speech, it is interesting to separate the periodic and non-periodic
contributions of the excitation. For this purpose, either the sinusoidal representation
[SER 90] or the short-term Fourier spectrum [DAL 98, YEG 98] can be used. The
principle is to subtract from the source signal its harmonic component, in order to
obtain the non-periodic component. Such a separation process is illustrated in Figure
1.3.
Figure 1.3. Spectrum of the excitation source for a vowel. (A) the complete spectrum; (B) the
non-periodic part; (C) the periodic part
1.1.4. Vocal tract
The vocal tract is an acoustic cavity. In the source-filter model, it plays the role
of a filter, i.e. a passive system which is independent from the source. Its function
consists of transforming the source signal, by means of resonances and anti-
resonances. The maxima of the vocal tract’s spectral gain are called spectral
formants, or more simply formants. Formants can generally be identified with the
spectral maxima which can be observed on the speech spectrum, as the source
spectrum is globally monotonic for voiced speech. However, depending on the
source spectrum, formants and resonances may turn out to be shifted. Furthermore,
in some cases, a source formant can be present. Formants are also observed in
unvoiced speech segments, at least those that correspond to cavities located in front
of the constriction, and thus excited by the noise source.
1.1.4.1. Multi-tube model
The vocal tract is an acoustic duct with a complex shape. At a first level of
approximation, its acoustic behavior may be understood to be one of an acoustic
tube. Hypotheses must be made to calculate the propagation of an acoustic wave
through this tube:
– the tube is cylindrical, with a constant area section A;
– the tube walls are rigid (i.e. no vibration terms at the walls);
– the propagation mode is (mono-dimensional) plane waves. This assumption is
satisfied if the transverse dimension of the tube is small, compared to the considered
wavelengths, which correspond in practice to frequencies below 4,000 Hz for a
typical vocal tract (i.e. a length of 17.6 cm and a section of 8 cm² for the neutral
vowel);
– the process is adiabatic (i.e. no loss by thermal conduction);
– the hypothesis of small movements is made (i.e. second-order terms can be
neglected).
Let A denote the (constant) section of the tube, x the abscissa along the tube, t the
time, p(x, t) the pressure, u(x, t) the speed of the air particles, U(x, t) the volume
velocity, ρ the density, L the tube length and C the speed of sound in the air
(approximately 340 m/s). The equations governing the propagation of a plane wave
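in a tube (Webster equations) are:

$$\frac{\partial^2 p}{\partial x^2} = \frac{1}{C^2} \frac{\partial^2 p}{\partial t^2} \quad \text{and} \quad \frac{\partial^2 u}{\partial x^2} = \frac{1}{C^2} \frac{\partial^2 u}{\partial t^2} \qquad [1.12]$$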
This result is obtained by studying an infinitesimal variation of the pressure, the
air particle speed and the density: p(x, t) = p0 + ∂p(x, t), u(x, t) = u0 + ∂u(x, t),
ρ(x, t) = ρ0 + ∂ρ(x, t), in conjunction with two fundamental laws of physics:
1) the conservation of mass entering a slice of the tube comprised between x and
x+dx: A∂x∂ρ = –ρA∂u∂t. By neglecting the second-order term (∂ρ∂u∂t), by using the
ideal gas law and the fact that the process is adiabatic (∂p/∂ρ = C²), this equation can
be rewritten ∂p/(C²∂t) = –ρ0∂u/∂x;
14 Spoken Language Processing
2) Newton’s second law applied to the air in the slice of tube yields: A˜p =
ȡA˜x(˜u/˜t), thus ˜p/˜x = ȡ0˜u/˜t.
The solutions of these equations are formed by any linear combination of
functions f(t) and g(t) of a single variable, twice continuously differentiable, written as a
forward wave and a backward wave which propagate at the speed of sound:
$$f(x,t) = f\left(t - \frac{x}{C}\right) \quad\text{and}\quad g(x,t) = g\left(t + \frac{x}{C}\right)$$ [1.13]
and thus the pressure in the tube can be written:
$$p(x,t) = f\left(t - \frac{x}{C}\right) + g\left(t + \frac{x}{C}\right)$$ [1.14]
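It is easy to verify that function p satisfies equation [1.12]. Moreover, functions f
and g satisfy:
$$\frac{\partial f(t - x/C)}{\partial t} = -C\,\frac{\partial f(t - x/C)}{\partial x} \quad\text{and}\quad \frac{\partial g(t + x/C)}{\partial t} = C\,\frac{\partial g(t + x/C)}{\partial x}$$ [1.15]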
which, when combined for example with Newton’s second law, yields the following
expression for the volume velocity (the tube having a constant section A):
$$U(x,t) = \frac{A}{\rho C}\left[f\left(t - \frac{x}{C}\right) - g\left(t + \frac{x}{C}\right)\right]$$ [1.16]
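The claim that p in [1.14] satisfies [1.12] can be checked numerically. The sketch below is only an illustration: the two waveforms f and g are arbitrary smooth functions chosen for the demonstration (they are not taken from the text), and the derivatives are approximated by central second differences.

```python
import math

C = 340.0  # speed of sound in m/s, as in the text

def f(tau):
    # arbitrary smooth forward waveform (an assumption made for this demo)
    return math.exp(-(tau * 2000.0) ** 2)

def g(tau):
    # arbitrary smooth backward waveform (also an assumption)
    return 0.5 * math.sin(3000.0 * tau)

def pressure(x, t):
    # equation [1.14]: forward wave plus backward wave
    return f(t - x / C) + g(t + x / C)

def wave_residual(x, t, hx=1e-3, ht=1e-3 / C):
    # central second differences approximate the two sides of [1.12]
    d2x = (pressure(x + hx, t) - 2 * pressure(x, t) + pressure(x - hx, t)) / hx ** 2
    d2t = (pressure(x, t + ht) - 2 * pressure(x, t) + pressure(x, t - ht)) / ht ** 2
    return d2x - d2t / C ** 2

residual = wave_residual(0.05, 1e-4)  # should be close to zero
```

Choosing the time step as hx/C makes the two difference quotients sample the waveforms at the same arguments, so the residual is limited only by rounding.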
It must be noted that if the pressure is the sum of a forward function and a
backward function, the volume velocity is the difference between these two
functions. The expression Zc = ρC/A is the ratio between the pressure and the volume
velocity, which is called the characteristic acoustic impedance of the tube. In
general, the acoustic impedance is defined in the frequency domain. Here, the term
“impedance” is used in the time domain, as the ratio between the forward (or
backward) parts of the pressure and of the volume velocity. The following
electroacoustical analogies are often used: “acoustic pressure” for “voltage”;
“acoustic volume velocity” for “current”.
The vocal tract can be considered as the concatenation of cylindrical tubes, each
of them having a constant area section, and all tubes being of the same length. Let
Δ denote the length of each tube. The vocal tract is considered as being composed of
p sections, numbered from 1 to p, starting from the lips and going towards the
glottis. For each section n, the forward and backward waves (respectively from the
glottis to the lips and from the lips to the glottis) are denoted $f_n$ and $b_n$. These waves
are defined at the section input, from n+1 to n (on the left of the section, if the glottis
is on the left). Let $R_n = \rho C/A_n$ denote the acoustic impedance of the section, which
depends only on its area section.
Each section can then be considered as a quadripole with two inputs $f_{n+1}$ and
$b_{n+1}$, two outputs $f_n$ and $b_n$ and a transfer matrix $T_{n+1}$:
$$\begin{bmatrix} f_n \\ b_n \end{bmatrix} = T_{n+1}\begin{bmatrix} f_{n+1} \\ b_{n+1} \end{bmatrix}$$ [1.17]
For a given section, the transfer matrix can be broken down into two terms. Both
the interface with the previous section (1) and the behavior of the waves within the
section (2) must be taken into account:
1) At the level of the discontinuity between sections n and n+1, the following
relations hold, on the left and on the right, for the pressure and the volume velocity:
$$\begin{cases} p_{n+1} = R_{n+1}\,(f_{n+1} + b_{n+1}) \\ U_{n+1} = f_{n+1} - b_{n+1} \end{cases} \quad\text{and}\quad \begin{cases} p_n = R_n\,(f_n + b_n) \\ U_n = f_n - b_n \end{cases}$$ [1.18]
as the pressure and the volume velocity are both continuous at the junction, we have
$R_{n+1}(f_{n+1}+b_{n+1}) = R_n(f_n+b_n)$ and $f_{n+1}-b_{n+1} = f_n-b_n$, which enables the transfer matrix at
the interface to be calculated as:
$$\begin{bmatrix} f_n \\ b_n \end{bmatrix} = \frac{1}{2R_n}\begin{bmatrix} R_{n+1}+R_n & R_{n+1}-R_n \\ R_{n+1}-R_n & R_{n+1}+R_n \end{bmatrix}\begin{bmatrix} f_{n+1} \\ b_{n+1} \end{bmatrix}$$ [1.19]
After defining acoustic reflection coefficient k, the transfer matrix $T^{(1)}_{n+1}$ at the
interface is:
$$T^{(1)}_{n+1} = \frac{1}{1+k}\begin{bmatrix} 1 & -k \\ -k & 1 \end{bmatrix} \quad\text{with}\quad k = \frac{R_n - R_{n+1}}{R_n + R_{n+1}} = \frac{A_{n+1} - A_n}{A_{n+1} + A_n}$$ [1.20]
2) Within the tube of section n+1, the waves are simply subject to
propagation delays, thus:
$$f_n(t) = f_{n+1}\left(t - \frac{\Delta}{C}\right) \quad\text{and}\quad b_n(t) = b_{n+1}\left(t + \frac{\Delta}{C}\right)$$ [1.21]
The phase delays and advances of the wave are all dependent on the same
quantity Δ/C. The signal can thus be sampled with a sampling frequency Fs =
C/(2Δ), whose period 2Δ/C corresponds to a wave traveling back and forth in a section. Therefore,
the z-transform of equations [1.21] can be considered as a delay (respectively an
advance) of Δ/C corresponding to a factor $z^{-1/2}$ (respectively $z^{1/2}$):
$$F_n(z) = F_{n+1}(z)\,z^{-1/2} \quad\text{and}\quad B_n(z) = B_{n+1}(z)\,z^{1/2}$$ [1.22]
from which the transfer matrix $T^{(2)}_{n+1}$ corresponding to the propagation in section
n + 1 can be deduced.
In the z-transform domain, the total transfer matrix $T_{n+1}$ for section n+1 is the
product of $T^{(1)}_{n+1}$ and $T^{(2)}_{n+1}$:
$$T_{n+1} = \frac{1}{1+k}\begin{bmatrix} 1 & -k \\ -k & 1 \end{bmatrix}\begin{bmatrix} z^{-1/2} & 0 \\ 0 & z^{1/2} \end{bmatrix} = \frac{z^{-1/2}}{1+k}\begin{bmatrix} 1 & -kz \\ -k & z \end{bmatrix}$$ [1.23]
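The junction relations can be illustrated with a few lines of code. This is a minimal sketch, assuming the sign convention k = (A_{n+1} − A_n)/(A_{n+1} + A_n) for the reflection coefficient; the air density, the two areas and the wave amplitudes are hypothetical values picked for the example.

```python
RHO = 1.2    # air density in kg/m^3 (assumed value)
C = 340.0    # speed of sound in m/s

def reflection_coefficient(area_n, area_next):
    # k = (A_{n+1} - A_n) / (A_{n+1} + A_n): assumed sign convention
    return (area_next - area_n) / (area_next + area_n)

def junction(f_next, b_next, area_n, area_next):
    # interface matrix 1/(1+k) * [[1, -k], [-k, 1]] applied to the waves of section n+1
    k = reflection_coefficient(area_n, area_next)
    f_n = (f_next - k * b_next) / (1.0 + k)
    b_n = (-k * f_next + b_next) / (1.0 + k)
    return f_n, b_n

area_n, area_next = 2.0e-4, 6.0e-4   # hypothetical areas in m^2
f_next, b_next = 1.0, 0.3            # hypothetical volume-velocity waves
f_n, b_n = junction(f_next, b_next, area_n, area_next)

# pressure and volume velocity on both sides of the junction
p_next = RHO * C / area_next * (f_next + b_next)
p_n = RHO * C / area_n * (f_n + b_n)
u_next = f_next - b_next
u_n = f_n - b_n
```

With these values k = 0.5, and both the pressure and the volume velocity come out equal on the two sides, which is the continuity condition used to derive the interface matrix.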
The overall volume velocity transfer matrix for the p tubes (from the glottis to
the lips) is finally obtained as the product of the matrices for each tube:
$$\begin{bmatrix} f_0 \\ b_0 \end{bmatrix} = T\begin{bmatrix} f_p \\ b_p \end{bmatrix} \quad\text{with}\quad T = \prod_{i=1}^{p} T_i$$ [1.24]
The properties of the volume velocity transfer function for the tube (from the
glottis to the lips) can be derived from this result, defined as $A_u = (f_0 - b_0)/(f_p - b_p)$.
For this purpose, the lip termination has to be calculated, i.e. the interface between
the last tube and the outside of the mouth. Let $(f_l, b_l)$ denote the volume velocity
waves at the level of the outer interface and $(f_0, b_0)$ the waves at the inner interface.
Outside of the mouth, the backward wave $b_l$ is zero. Therefore, $b_0$ and $f_0$ are linearly
dependent and a reflection coefficient at the lips can be defined as $k_l = b_0/f_0$. Then,
transfer function $A_u$ can be calculated by inverting T, according to the coefficients of
matrix T and the reflection coefficient at lips $k_l$:
$$A_u = \frac{\det(T)\,(1 - k_l)}{T_{21} + T_{22} - k_l\,(T_{11} + T_{12})}$$ [1.25]
It can be verified that the determinant of T does not depend on z, since the
determinant of each elementary tube matrix does not depend on z either. As the coefficients of the
transfer matrix are the products of a polynomial expression of z and a constant
multiplied by $z^{-1/2}$ for each section, the transfer function of the vocal tract is
therefore an all-pole function with a zero for z=0 (which accounts for the
propagation delay in the vocal tract).
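As an illustration of [1.23] to [1.25], the sketch below evaluates the gain of a uniform tube (all internal reflection coefficients equal to zero). Several choices are assumptions made for the example: the number of sections, the lip reflection coefficient of −0.9 (−1 would be an ideal open end), and the ordering of the matrix product (junction matrix times propagation matrix).

```python
import cmath
import math

C = 340.0        # speed of sound (m/s)
LENGTH = 0.176   # vocal tract length quoted in the text (m)
P = 8            # number of sections (arbitrary choice)
K_LIPS = -0.9    # assumed lip reflection coefficient

def matmul(a, b):
    # product of two 2x2 matrices stored as nested lists
    return [[a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
            [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]]]

def section_matrix(k, omega):
    # junction matrix times propagation matrix; omega = 2*pi*f/Fs
    zh = cmath.exp(1j * omega / 2)   # z^(1/2) on the unit circle
    junction = [[1 / (1 + k), -k / (1 + k)], [-k / (1 + k), 1 / (1 + k)]]
    delay = [[1 / zh, 0], [0, zh]]
    return matmul(junction, delay)

def gain(freq, ks):
    fs = C * len(ks) / (2 * LENGTH)  # Fs = C / (2 * delta), delta = LENGTH / p
    omega = 2 * math.pi * freq / fs
    t = [[1, 0], [0, 1]]
    for k in ks:
        t = matmul(t, section_matrix(k, omega))
    det = t[0][0] * t[1][1] - t[0][1] * t[1][0]
    # equation [1.25]
    return abs(det * (1 - K_LIPS) / (t[1][0] + t[1][1] - K_LIPS * (t[0][0] + t[0][1])))

ks = [0.0] * P   # uniform tube: no internal reflection
first_peak = max(range(100, 1000), key=lambda fr: gain(fr, ks))
```

Under these assumptions the first gain maximum falls near C/(4 × LENGTH), about 483 Hz, which is the quarter-wavelength resonance expected for a uniform tube closed at one end and open at the other.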
1.1.4.2. All-pole filter model
During the production of oral vowels, the vocal tract can be viewed as an
acoustic tube of a complex shape. Its transfer function is composed of poles only,
thus behaving as an acoustic filter with resonances only. These resonances
correspond to the formants of the spectrum, which, for a sampled signal with limited
bandwidth, are of a finite number N. On average, for a uniform tube, the formants are
spaced one per kHz; as a consequence, a signal sampled at F=1/T kHz (i.e. with a
bandwidth of F/2 kHz) will contain approximately F/2 formants and N=F poles will
compose the transfer function of the vocal tract from which the signal originates:
$$V(z) = \frac{U_l(z)}{U_g(z)} = \frac{K_1\,z^{-N/2}}{\prod_{i=1}^{N}\left(1 - \hat{z}_i z^{-1}\right)\left(1 - \hat{z}_i^{*} z^{-1}\right)}$$ [1.26]
Developing the expression for the conjugate complex poles
$\hat{z}_i, \hat{z}_i^{*} = \exp[-\pi B_i T \pm 2i\pi f_i T]$ yields:
$$V(z) = \frac{K_1\,z^{-N/2}}{\prod_{i=1}^{N}\left[1 - 2\exp(-\pi B_i T)\cos(2\pi f_i T)\,z^{-1} + \exp(-2\pi B_i T)\,z^{-2}\right]}$$ [1.27]
where $B_i$ denotes the formant’s bandwidth at −6 dB on each side of its maximum and
$f_i$ its center frequency.
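One factor of [1.27] can be turned into filter coefficients as follows. This is a sketch only: the sampling period, the formant frequency and the bandwidth are hypothetical values, and the function names are introduced for the illustration.

```python
import cmath
import math

def formant_section(freq_hz, bandwidth_hz, period_s):
    # denominator 1 + c1*z^-1 + c2*z^-2 of one factor of [1.27]
    c1 = -2.0 * math.exp(-math.pi * bandwidth_hz * period_s) * math.cos(2 * math.pi * freq_hz * period_s)
    c2 = math.exp(-2.0 * math.pi * bandwidth_hz * period_s)
    return c1, c2

def section_gain(freq_hz, c1, c2, period_s):
    # magnitude of 1 / (1 + c1*z^-1 + c2*z^-2) on the unit circle
    z_inv = cmath.exp(-2j * math.pi * freq_hz * period_s)
    return 1.0 / abs(1.0 + c1 * z_inv + c2 * z_inv ** 2)

T = 1.0 / 8000.0                           # hypothetical sampling period
c1, c2 = formant_section(500.0, 60.0, T)   # hypothetical formant: 500 Hz, 60 Hz bandwidth
peak = max(range(1, 4000), key=lambda fr: section_gain(fr, c1, c2, T))
```

For a narrow bandwidth the gain maximum lies very close to the nominal center frequency, and c2 is the squared modulus of the pole.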
To take into account the coupling with the nasal cavities (for nasal vowels and
consonants) or with the cavities at the back of the excitation source (the subglottic
cavity during the open glottis part of the vocalic cycle or the cavities upstream of the
constriction for plosives and fricatives), it is necessary to incorporate in the transfer
function a finite number of zeros $z_j, z_j^{*}$ (for a band-limited signal).
$$V(z) = \frac{U_l(z)}{U_g(z)} = K_2\,\frac{\prod_{j=1}^{M}\left(1 - z_j z^{-1}\right)\left(1 - z_j^{*} z^{-1}\right)}{\prod_{i=1}^{N}\left(1 - \hat{z}_i z^{-1}\right)\left(1 - \hat{z}_i^{*} z^{-1}\right)}$$ [1.28]
Any zero in the transfer function can be approximated by a set of poles,
as $1 - a z^{-1} = 1 \big/ \sum_{n=0}^{\infty} a^n z^{-n}$. Therefore, an all-pole model with a sufficiently large
number of poles is often preferred in practice to a full pole-zero model.
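The quality of this approximation can be observed by truncating the series. The sketch below is illustrative only: it assumes |a| < 1 so that the geometric series converges on the unit circle, and the values of a and of the evaluation point are arbitrary.

```python
import cmath

def zero_response(a, z):
    # exact response of the single zero 1 - a*z^-1
    return 1.0 - a / z

def all_pole_response(a, z, order):
    # 1 / sum_{n=0}^{order} a^n z^-n : the truncated geometric series
    series = sum((a / z) ** n for n in range(order + 1))
    return 1.0 / series

a = 0.5                           # arbitrary zero location, |a| < 1
z = cmath.exp(1j * 0.7)           # arbitrary point on the unit circle
errors = [abs(zero_response(a, z) - all_pole_response(a, z, m)) for m in (2, 5, 10, 20)]
```

The error shrinks roughly as |a| raised to the truncation order, so zeros close to the unit circle need many more poles than zeros near the origin.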
1.1.5. Lip-radiation
The last term in the linear model corresponds to the conversion of the airflow
wave at the lips into a pressure wave radiated at a given distance from the head. At a
first level of approximation, the radiation effect can be approximated by a
differentiation: at the lips, the radiated pressure is the derivative of the airflow. The
pressure recorded with the microphone is analogous to the one radiated at the lips,
except for an attenuation factor, depending on its distance to the lips. The
time-domain differentiation corresponds to a spectral emphasis, i.e. a first-order high-pass
filtering. The fact that the production model is linear can be exploited to condense
the radiation term at the very level of the source. For this purpose, the derivative of
the source is considered rather than the source itself. In the spectral domain, the
consequence is to increase the slope of the spectrum by approximately +6
dB/octave, which corresponds to a time-domain differentiation and, in the sampled
domain, to the following transfer function:
$$L(z) = \frac{P(z)}{U_l(z)} \approx 1 - K_d\,z^{-1}$$ [1.29]
with $K_d \approx 1$.
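The first-order model L(z) = 1 − K_d z^{-1} is easy to try out. The following sketch takes K_d = 1 and hypothetical input values; it applies the filter to a ramp and measures the low-frequency slope of the gain.

```python
import cmath
import math

def lip_radiation(flow, kd=1.0):
    # y[n] = x[n] - kd * x[n-1], i.e. L(z) = 1 - kd * z^-1
    out = []
    previous = 0.0
    for sample in flow:
        out.append(sample - kd * previous)
        previous = sample
    return out

def gain_db(omega, kd=1.0):
    # gain of the filter at normalized angular frequency omega (radians per sample)
    return 20.0 * math.log10(abs(1.0 - kd * cmath.exp(-1j * omega)))

slope_db_per_octave = gain_db(0.02) - gain_db(0.01)   # one octave apart, low frequencies
ramp = lip_radiation([0.0, 1.0, 2.0, 3.0, 4.0])       # a ramp becomes a constant
```

At low frequencies the gain rises by about 6 dB per octave, and a linearly increasing flow is mapped to a constant pressure, as expected from a differentiator.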
1.2. Linear prediction
Linear prediction (or LPC for Linear Predictive Coding) is a parametric model
of the speech signal [ATA 71, MAR 76]. Based on the source-filter model, an
analysis scheme can be defined, relying on a small number of parameters and
techniques for estimating these parameters.
1.2.1. Source-filter model and linear prediction
The source-filter model of equation [1.4] can be further simplified by grouping
in a single filter the contributions of the glottis, the vocal tract and the lip-radiation
term, while keeping a flat-spectrum term for the excitation. For voiced speech, P(z)
is a periodic train of pulses and for unvoiced speech, N(z) is a white noise.
$$S(z) = P(z)\,U_g(z)\,V(z)\,L(z) = P(z)\,H(z)$$ voiced speech [1.30]
$$S(z) = R(z)\,V(z)\,L(z) = N(z)\,H(z)$$ unvoiced speech [1.31]
Considering the lip-radiation spectral model in equation [1.29] and the glottal
airflow model in equation [1.9], both terms can be grouped into the flat spectrum
source E, with unit gain (the gain factor G is introduced to take into account the
amplitude of the signal). Filter H is referred to as the synthesis filter. An additional
simplification consists of considering the filter H as an all-pole filter. The acoustic
theory indicates that the filter V, associated with the vocal tract, is an all-pole filter
only for non-nasal sounds, whereas it contains both poles and zeros for nasal sounds.
However, it is possible to approximate a pole/zero transfer function with an all-pole
filter, by increasing the number of poles, which means that, in practice, an all-pole
approximation of the transfer function is acceptable. The inverse filter of the
synthesis filter is an all-zero filter, referred to as the analysis filter and denoted A.
This filter has a transfer function that is written as an Mth-order polynomial, where
M is the number of poles in the transfer function of the synthesis filter H:
$$S(z) = G\,E(z)\,H(z)$$ H(z): synthesis filter [1.32]
$$S(z) = \frac{G\,E(z)}{A(z)} \quad\text{with}\quad A(z) = \sum_{i=0}^{M} a_i z^{-i}$$ A(z): analysis filter [1.33]
Linear prediction is based on the correlation between successive samples in the
speech signal. The knowledge of p samples until the instant n–1 allows some
prediction of the upcoming sample, denoted $\hat{s}_n$, with the help of a prediction filter,
the transfer function of which is denoted F(z):
$$s_n \approx \hat{s}_n = \alpha_1 s_{n-1} + \alpha_2 s_{n-2} + \cdots + \alpha_p s_{n-p} = \sum_{i=1}^{p}\alpha_i s_{n-i}$$ [1.34]
$$\hat{S}(z) = S(z)\left(\alpha_1 z^{-1} + \alpha_2 z^{-2} + \cdots + \alpha_p z^{-p}\right) = S(z)\left(\sum_{i=1}^{p}\alpha_i z^{-i}\right)$$ [1.35]
$$\hat{S}(z) = S(z)\,F(z)$$ [1.36]
The prediction error $\varepsilon_n$ between the predicted and actual signals is thus written:
$$\varepsilon_n = s_n - \hat{s}_n = s_n - \left(\sum_{i=1}^{p}\alpha_i s_{n-i}\right)$$ [1.37]
$$\mathcal{E}(z) = S(z) - \hat{S}(z) = S(z)\left(1 - \sum_{i=1}^{p}\alpha_i z^{-i}\right)$$ [1.38]
Linear prediction of speech is thus closely related to the linear acoustic
production model: the source-filter production model and the linear prediction
model can be identified with each other. The residual error $\varepsilon_n$ can then be interpreted
as the source of excitation e and the inverse filter A is associated with the prediction
filter (by setting M = p).
$$s_n = \varepsilon_n + \sum_{i=1}^{p}\alpha_i s_{n-i} = G\,e(n) - \sum_{i=1}^{p} a_i s_{n-i}$$ [1.39]
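Equations [1.37] and [1.39] describe two inverse operations: computing the residual from the signal, and rebuilding the signal from the residual. A minimal sketch, with hypothetical predictor coefficients and samples, and assuming that samples before the start of the frame are zero:

```python
def prediction_residual(signal, alpha):
    # epsilon_n = s_n - sum_i alpha_i * s_{n-i}  (equation [1.37])
    residual = []
    for n, s_n in enumerate(signal):
        predicted = sum(a * signal[n - i] for i, a in enumerate(alpha, start=1) if n - i >= 0)
        residual.append(s_n - predicted)
    return residual

def synthesize(residual, alpha):
    # s_n = epsilon_n + sum_i alpha_i * s_{n-i}  (equation [1.39])
    signal = []
    for n, e_n in enumerate(residual):
        predicted = sum(a * signal[n - i] for i, a in enumerate(alpha, start=1) if n - i >= 0)
        signal.append(e_n + predicted)
    return signal

alpha = [1.5, -0.7]                          # hypothetical predictor coefficients
speech = [1.0, 0.4, -0.3, 0.8, 0.1, -0.6]    # hypothetical samples
eps = prediction_residual(speech, alpha)
rebuilt = synthesize(eps, alpha)
```

Feeding the residual back through the recursive (synthesis) form returns the original samples, which is why the residual can be read as the excitation of the model.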
The identification of filter A assumes a flat spectrum residual, which corresponds
to a white noise or a single pulse excitation. The modeling of the excitation source
in the framework of linear prediction can therefore be achieved by a pulse generator
and a white noise generator, controlled by a voiced/unvoiced decision. The estimation
of the prediction coefficients is obtained by minimizing the prediction error. Let $\varepsilon_n^2$
denote the square prediction error and E the total square error over a given time
interval, between $n_0$ and $n_1$:
$$\varepsilon_n^2 = \left[s_n - \sum_{i=1}^{p}\alpha_i s_{n-i}\right]^2 \quad\text{and}\quad E = \sum_{n=n_0}^{n_1}\varepsilon_n^2$$ [1.40]
The expression of coefficients $\alpha_k$ that minimizes the prediction error E over a
frame is obtained by zeroing the partial derivatives of E with respect to
the $\alpha_k$ coefficients, i.e., for k = 1, 2, …, p:
$$\frac{\partial E}{\partial \alpha_k} = 0 \quad\text{i.e.}\quad -2\sum_{n=n_0}^{n_1} s_{n-k}\left[s_n - \sum_{i=1}^{p}\alpha_i s_{n-i}\right] = 0$$ [1.41]
Finally, this leads to the following system of equations:
$$\sum_{n=n_0}^{n_1} s_{n-k}\,s_n = \sum_{i=1}^{p}\alpha_i\sum_{n=n_0}^{n_1} s_{n-k}\,s_{n-i}, \qquad 1 \le k \le p$$ [1.42]
and, if new coefficients $c_{ki}$ are defined, the system becomes:
$$c_{k0} = \sum_{i=1}^{p}\alpha_i c_{ki}, \quad 1 \le k \le p, \quad\text{with}\quad c_{ki} = \sum_{n=n_0}^{n_1} s_{n-i}\,s_{n-k}$$ [1.43]
Several fast methods for computing the prediction coefficients have been
proposed. The two main approaches are the autocorrelation method and the
covariance method. The two methods differ in the choice of the interval [n0, n1] on which
total square error E is calculated. In the case of the covariance method, it is assumed
that the signal is known only for a given interval of N samples exactly. No
hypothesis is made concerning the behavior of the signal outside this interval. On
the other hand, the autocorrelation method considers the whole range −∞, +∞ for
calculating the total error. The coefficients are thus written:
$$c_{ki} = \sum_{n=p}^{N-1} s_{n-i}\,s_{n-k}$$ covariance [1.44]
$$c_{ki} = \sum_{n=-\infty}^{+\infty} s_{n-i}\,s_{n-k}$$ autocorrelation [1.45]
The covariance method is generally employed for the analysis of rather short
signals (for instance, one voicing period, or one closed glottis phase). In the case of
the covariance method, matrix [cki] is symmetric. The prediction coefficients are
calculated with a fast algorithm [MAR 76], which will not be detailed here.
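For small orders, the covariance normal equations [1.43] and [1.44] can be solved directly. The sketch below does not reproduce the fast algorithm of [MAR 76]; it swaps in plain Gaussian elimination, and the test signal (the impulse response of a known second-order recursion) is an assumption chosen so that the expected answer is known.

```python
def solve_linear_system(matrix, rhs):
    # Gaussian elimination with partial pivoting, adequate for small systems
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        acc = aug[row][n] - sum(aug[row][j] * solution[j] for j in range(row + 1, n))
        solution[row] = acc / aug[row][row]
    return solution

def covariance_lpc(signal, order):
    # c_ki = sum_{n=p}^{N-1} s_{n-i} s_{n-k}; solve sum_i alpha_i c_ki = c_k0
    n_samples = len(signal)

    def c(k, i):
        return sum(signal[n - i] * signal[n - k] for n in range(order, n_samples))

    matrix = [[c(k, i) for i in range(1, order + 1)] for k in range(1, order + 1)]
    rhs = [c(k, 0) for k in range(1, order + 1)]
    return solve_linear_system(matrix, rhs)

# synthetic signal: impulse response of s_n = 1.5 s_{n-1} - 0.7 s_{n-2}
signal = [1.0, 1.5]
for _ in range(38):
    signal.append(1.5 * signal[-1] - 0.7 * signal[-2])
alpha = covariance_lpc(signal, 2)
```

Because the summation starts at n = p, after the excitation pulse, the recursion holds exactly on the analysis interval and the two coefficients are recovered up to rounding. This illustrates why the covariance method suits short, excitation-free segments.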
1.2.2. Autocorrelation method: algorithm
For this method, signal s is considered as stationary. The limits for calculating
the total error are −∞, +∞. However, only a finite number of samples are taken into
account in practice, by zeroing the signal outside an interval [0, N−1], i.e. by
applying a time window to the signal. Total quadratic error E and coefficients
$c_{ki}$ become:
$$E = \sum_{n=-\infty}^{+\infty}\varepsilon_n^2 \quad\text{and}\quad c_{ki} = \sum_{n=-\infty}^{+\infty} s_{n-i}\,s_{n-k} = \sum_{n=-\infty}^{+\infty} s_n\,s_{n+|k-i|}$$ [1.46]
Those are the autocorrelation coefficients of the signal, hence the name of the
method. The roles of k and i are symmetric and the correlation coefficients only
depend on the difference between k and i.
The samples of the signal $s_n$ (resp. $s_{n+|k-i|}$) are non-zero only for n ∈ [0, N–1]
(n+|k−i| ∈ [0, N–1] respectively). Therefore, by rearranging the terms in the sum, it
can be written for k = 0, …, p:
$$c_{ki} = \sum_{n=0}^{N-1-|k-i|} s_n\,s_{n+|k-i|} = r(|k-i|) \quad\text{with}\quad r(k) = \sum_{n=0}^{N-1-k} s_n\,s_{n+k}$$ [1.47]
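The autocorrelation coefficients r(k) of a windowed frame can be computed as in the sketch below. The Hamming window and the test frame (a 200 Hz sinusoid at an 8 kHz sampling rate) are assumptions chosen for the illustration.

```python
import math

def hamming(n_samples):
    # standard Hamming window of length n_samples
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1)) for n in range(n_samples)]

def autocorrelation(frame, max_lag):
    # r(k) = sum_{n=0}^{N-1-k} s_n * s_{n+k}
    n_samples = len(frame)
    return [sum(frame[n] * frame[n + k] for n in range(n_samples - k)) for k in range(max_lag + 1)]

# hypothetical frame: a 200 Hz sinusoid sampled at 8 kHz
raw = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(240)]
window = hamming(len(raw))
frame = [w * x for w, x in zip(window, raw)]
r = autocorrelation(frame, 10)
```

r(0) is the energy of the windowed frame and bounds the magnitude of all other lags, which is what makes the resulting Toeplitz system well behaved.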
The p equation system to be solved is thus (see [1.43]):
$$\sum_{i=0}^{p} a_i\,r(|k-i|) = 0, \qquad 1 \le k \le p$$ [1.48]
Moreover, one equation follows from the definition of the error E:
$$E = \sum_{n=-\infty}^{+\infty}\left(\sum_{i=0}^{p} a_i s_{n-i}\right)^2 = \sum_{n=-\infty}^{+\infty}\sum_{i=0}^{p}\sum_{j=0}^{p} a_i s_{n-i}\,a_j s_{n-j} = \sum_{i=0}^{p} a_i r_i$$ [1.49]
as a consequence of the above set of equations [1.48]. An efficient method to solve
this system is the recursive method used in the Levinson algorithm.
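Under its matrix form, this system is written:
$$\begin{bmatrix} r_0 & r_1 & r_2 & r_3 & \cdots & r_p \\ r_1 & r_0 & r_1 & r_2 & \cdots & r_{p-1} \\ r_2 & r_1 & r_0 & r_1 & \cdots & r_{p-2} \\ r_3 & r_2 & r_1 & r_0 & \cdots & r_{p-3} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ r_p & r_{p-1} & r_{p-2} & r_{p-3} & \cdots & r_0 \end{bmatrix}\begin{bmatrix} 1 \\ a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} E \\ 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$ [1.50]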
The matrix is symmetric and it is a Toeplitz matrix. In order to solve this system,
a recursive solution on prediction order n is searched for. At each step n, a set of
n+1 prediction coefficients is calculated: $a_0^n, a_1^n, a_2^n, \ldots, a_n^n$. The process is repeated
up to the desired prediction order p, at which stage: $a_0^p = a_0$, $a_1^p = a_1$,
$a_2^p = a_2$, …, $a_p^p = a_p$. If we assume that the system has been solved at step n–1,
the coefficients and the error at step n of the recursion are obtained as:
$$\begin{bmatrix} a_0^n \\ a_1^n \\ a_2^n \\ \vdots \\ a_{n-1}^n \\ a_n^n \end{bmatrix} = \begin{bmatrix} a_0^{n-1} \\ a_1^{n-1} \\ a_2^{n-1} \\ \vdots \\ a_{n-1}^{n-1} \\ 0 \end{bmatrix} + k_n\begin{bmatrix} 0 \\ a_{n-1}^{n-1} \\ a_{n-2}^{n-1} \\ \vdots \\ a_1^{n-1} \\ a_0^{n-1} \end{bmatrix}$$ [1.51]
$$\begin{bmatrix} E_n \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} E_{n-1} \\ 0 \\ \vdots \\ 0 \\ q \end{bmatrix} + k_n\begin{bmatrix} q \\ 0 \\ \vdots \\ 0 \\ E_{n-1} \end{bmatrix}$$ [1.52]
i.e. $a_i^n = a_i^{n-1} + k_n a_{n-i}^{n-1}$, where it can be easily shown from equations [1.50], [1.51]
and [1.52] that:
$$k_n = -\frac{1}{E_{n-1}}\sum_{i=0}^{n-1} a_i^{n-1}\,r_{n-i} \quad\text{and}\quad E_n = E_{n-1}\,(1 - k_n^2)$$ [1.53]
As a whole, the algorithm for calculating the prediction coefficients is
(coefficients $k_i$ are called reflection coefficients):
1) $E_0 = r_0$
2) step n: $k_n = -\frac{1}{E_{n-1}}\sum_{i=0}^{n-1} a_i^{n-1}\,r_{n-i}$
3) $a_n^n = k_n$ and $a_0^n = 1$
4) $a_i^n = a_i^{n-1} + k_n a_{n-i}^{n-1}$ for 1 ≤ i ≤ n−1
5) $E_n = (1 - k_n^2)\,E_{n-1}$
These equations are solved recursively, until the solution for order p is reached.
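The five steps above can be sketched in a few lines. The function name and the autocorrelation values are hypothetical; the code follows the sign convention in which $a_0 = 1$ and $k_n = -\frac{1}{E_{n-1}}\sum_i a_i^{n-1} r_{n-i}$.

```python
def levinson(r, order):
    # r: autocorrelation values r[0..order]
    # returns (a_0..a_p, reflection coefficients k_1..k_p, final error E_p)
    a = [1.0]
    error = r[0]                                   # step 1: E_0 = r_0
    reflections = []
    for n in range(1, order + 1):
        # step 2: k_n = -(1/E_{n-1}) * sum_{i=0}^{n-1} a_i^{n-1} r_{n-i}
        k = -sum(a[i] * r[n - i] for i in range(n)) / error
        # steps 3 and 4: a_i^n = a_i^{n-1} + k_n * a_{n-i}^{n-1}, taking a_n^{n-1} = 0
        padded = a + [0.0]
        a = [padded[i] + k * padded[n - i] for i in range(n + 1)]
        # step 5: E_n = (1 - k_n^2) * E_{n-1}
        error *= (1.0 - k * k)
        reflections.append(k)
    return a, reflections, error

r = [1.0, 0.8, 0.5, 0.2]          # hypothetical autocorrelation values
a, ks, err = levinson(r, 3)
```

A quick way to gain confidence in such an implementation is to substitute the result back into the normal equations: each of the p sums over $a_i\,r(|k-i|)$ should vanish, and the sum over $a_i r_i$ should equal the returned error.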
In many applications, one of the goals is to identify the filter associated with the
vocal tract, for instance to extract the formants [MCC 74]. Let us consider vowel
signals, the spectra of which are shown in Figures 1.4, 1.5 and 1.6 (these spectra
were calculated with a short-term Fourier transform (STFT) and are represented on a
logarithmic scale). The linear prediction analysis of these vowels yields filters which
correspond to the prediction model which could have produced them. Therefore, the
magnitude of the transfer function of these filters can be viewed as the spectral
envelope of the corresponding vowels.
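The spectral envelope mentioned here is the magnitude of G/A(z) on the unit circle. A minimal sketch, in which the second-order analysis filter, the unit gain and the 8 kHz sampling rate are hypothetical values (the figures in the text use 16 coefficients at 16 kHz):

```python
import cmath
import math

def lpc_envelope_db(a, gain, n_points, sample_rate):
    # |G / A(e^{j omega})| in dB on n_points frequencies from 0 to sample_rate / 2
    freqs, levels = [], []
    for m in range(n_points):
        freq = m * sample_rate / (2.0 * (n_points - 1))
        omega = 2.0 * math.pi * freq / sample_rate
        a_value = sum(a_i * cmath.exp(-1j * omega * i) for i, a_i in enumerate(a))
        freqs.append(freq)
        levels.append(20.0 * math.log10(gain / abs(a_value)))
    return freqs, levels

# hypothetical second-order analysis filter with a resonance near 1 kHz
radius, angle = 0.95, 2.0 * math.pi * 1000.0 / 8000.0
a = [1.0, -2.0 * radius * math.cos(angle), radius ** 2]
freqs, levels = lpc_envelope_db(a, 1.0, 257, 8000.0)
peak_freq = freqs[levels.index(max(levels))]
```

The maxima of this envelope are the formant candidates; with a realistic order, one peak is expected per pair of complex conjugate poles close to the unit circle.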
Linear prediction thus estimates the filter part of the source-filter model. To
estimate the source, the speech signal can be filtered by the analysis filter, i.e. the
inverse of the synthesis filter. The residual signal subsequently obtained represents the derivative of the
source signal, as the lip-radiation term is included in the filter (according to equation
[1.30]). The residual signal must thus be integrated in order to obtain an estimation
of the actual source, which is represented in Figure 1.7, both in the frequency and
time domains.
Figure 1.4. Vowel /a/. Hamming windowed signal (Fe = 16 kHz). Magnitude spectrum on a
logarithmic scale and gain of the LPC model transfer function (autocorrelation method).
Complex poles of the LPC model (16 coefficients)
Figure 1.5. Vowel /u/. Hamming windowed signal (Fe = 16 kHz). Magnitude spectrum on a
logarithmic scale and gain of the LPC model transfer function (autocorrelation method).
Complex poles of the LPC model (16 coefficients)
Figure 1.6. Vowel /i/. Hamming windowed signal (Fe = 16 kHz). Magnitude spectrum on a
logarithmic scale and gain of the LPC model transfer function (autocorrelation method).
Complex poles of the LPC model (16 coefficients)
1.2.3. Lattice filter
We are now going to show that reflection coefficients $k_i$ obtained by the
autocorrelation method correspond to the reflection coefficients of a multi-tube
acoustic model of the vocal tract. For this purpose, new coefficients $b_i^n$ must be
introduced, which are defined at each step of the recursion as:
$$b_i^n = a_{n-i}^n, \qquad i = 0, 1, \ldots, n$$ [1.54]
The {$b_i^p$} coefficients, where p is the prediction order, can be used to postdict
the signal, i.e. to predict the preceding sample of the signal. Let’s form the estimate
$\hat{s}_{n-p}$:
$$\hat{s}_{n-p} = -b_0 s_n - b_1 s_{n-1} - \cdots - b_{p-1} s_{n-p+1} = -\sum_{i=0}^{p-1} b_i s_{n-i} = \sum_{i=0}^{p-1}\alpha_{p-i}\,s_{n-i}$$ [1.55]
A postdiction, or backward error, $\varepsilon_n^{-}$ can be defined as:
$$\varepsilon_n^{-} = s_{n-p} - \hat{s}_{n-p} = \sum_{i=0}^{p} b_i s_{n-i} \quad\text{with}\quad b_p = 1$$ [1.56]
The total forward prediction error E (of equation [1.40]) is denoted $E^{+}$, while the
total backward prediction error is denoted $E^{-}$. In the same manner as in the previous
development, it can be shown that, for the autocorrelation method, we have $E^{-} = E^{+}$.
Subsequently, the backward prediction coefficients $b_i$ obtained via the minimization
of the total backward error are identical to the $a_i$ coefficients, and the Levinson
algorithm can be rewritten as:
$$a_i^n = a_i^{n-1} + k_n\,b_{i-1}^{n-1} \quad\text{and}\quad b_i^n = b_{i-1}^{n-1} + k_n\,a_i^{n-1} \quad\text{with}\quad \begin{cases} a_n^{n-1} = 0 \\ b_{-1}^{n-1} = 0 \end{cases}$$ [1.57]
If we consider the forward and backward prediction errors for the same instant (at
order n):
$$\varepsilon_i^{n+} = \sum_{j=0}^{n} a_j^n\,s_{i-j} \quad\text{and}\quad \varepsilon_i^{n-} = \sum_{j=0}^{n} b_j^n\,s_{i-j}$$ [1.58]
and then equations [1.57] yield:
$$\varepsilon_i^{n+} = \varepsilon_i^{(n-1)+} + k_n\,\varepsilon_{i-1}^{(n-1)-} \quad\text{and}\quad \varepsilon_i^{n-} = \varepsilon_{i-1}^{(n-1)-} + k_n\,\varepsilon_i^{(n-1)+}$$ [1.59]
The z-transforms of these equations provide:
$$\begin{bmatrix} E^{n+}(z) \\ E^{n-}(z) \end{bmatrix} = \begin{bmatrix} 1 & k_n z^{-1} \\ k_n & z^{-1} \end{bmatrix}\begin{bmatrix} E^{(n-1)+}(z) \\ E^{(n-1)-}(z) \end{bmatrix}$$ [1.60]
with, for n = 0: $\varepsilon_i^{0+} = \varepsilon_i^{0-} = s_i$.
To complete the analogy between linear prediction and multi-tube acoustic
model, a slightly different definition of the backward prediction coefficients must be
resorted to: $b_i^n = a_{n-i+1}^n$ for i = 1, 2, ..., n+1. The total backward error has the same
expression and the Levinson algorithm is written:
$$a_i^n = a_i^{n-1} + k_n\,b_i^{n-1} \quad\text{and}\quad b_i^n = b_{i-1}^{n-1} + k_n\,a_{i-1}^{n-1} \quad\text{with}\quad \begin{cases} a_n^{n-1} = 0 \\ b_0^{n-1} = 0 \end{cases}$$ [1.61]
from which the error recursion matrix can be deduced:
$$\begin{bmatrix} E^{n+}(z) \\ E^{n-}(z) \end{bmatrix} = \begin{bmatrix} 1 & k_n \\ k_n z^{-1} & z^{-1} \end{bmatrix}\begin{bmatrix} E^{(n-1)+}(z) \\ E^{(n-1)-}(z) \end{bmatrix}$$ [1.62]
for n = 0, $\varepsilon_i^{0+} = s_i$ and $\varepsilon_i^{0-} = s_{i-1}$, i.e. $E^{0+}(z) = S(z)$ and $E^{0-}(z) = z^{-1}S(z)$. The
inverse matrix from equation [1.62] is:
$$\frac{1}{1 - k_n^2}\begin{bmatrix} 1 & -k_n z \\ -k_n & z \end{bmatrix}$$ [1.63]
Except for a multiplicative factor, this is the matrix of equation [1.23], obtained
for a section of the multi-tube vocal tract model. This justifies the naming of the kn
coefficients as reflection coefficients. This is the inverse matrix, as the linear
prediction algorithm provides the analysis filter. On the contrary, the matrix for an
elementary section of the multi-tube acoustic model corresponds to the synthesis
filter, i.e. the inverse of the analysis filter. Note that this definition of backward
prediction coefficients introduces a shift of one sample between the forward error
and the backward error, which in fact corresponds to the physical situation of the
multi-tube model, in which the backward wave comes back only after a delay due to
the propagation time in the tube section. On the contrary, if the definition of [1.54]
is used, there is no shift between forward and backward errors.
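The error recursion with the one-sample shift can be run directly as a filter driven only by the reflection coefficients. The sketch below is an illustration under stated assumptions: the reflection coefficients and the samples are hypothetical, samples before the frame are taken as zero, and the helper `step_up` (a name introduced here) rebuilds the prediction coefficients only to compare the lattice output with the direct-form output.

```python
def step_up(reflections):
    # rebuild a_0..a_p from the reflection coefficients: a_i^n = a_i^{n-1} + k_n a_{n-i}^{n-1}
    a = [1.0]
    for n, k in enumerate(reflections, start=1):
        padded = a + [0.0]
        a = [padded[i] + k * padded[n - i] for i in range(n + 1)]
    return a

def direct_analysis(signal, a):
    # forward error sum_j a_j s_{i-j}, samples before the frame taken as zero
    return [sum(a[j] * signal[i - j] for j in range(len(a)) if i - j >= 0)
            for i in range(len(signal))]

def lattice_analysis(signal, reflections):
    forward = list(signal)                    # order 0: s_i
    backward = [0.0] + list(signal[:-1])      # order 0, delayed by one sample: s_{i-1}
    for k in reflections:
        new_forward = [f + k * b for f, b in zip(forward, backward)]
        undelayed = [k * f + b for f, b in zip(forward, backward)]
        backward = [0.0] + undelayed[:-1]     # one-sample delay of the backward branch
        forward = new_forward
    return forward

ks = [-0.8, 0.4, 0.15]                         # hypothetical reflection coefficients
signal = [1.0, 0.5, -0.2, 0.7, 0.3, -0.4, 0.1, 0.6]
lattice_out = lattice_analysis(signal, ks)
direct_out = direct_analysis(signal, step_up(ks))
```

Both structures produce the same forward error, yet the lattice version never forms the prediction coefficients explicitly.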
Equation [1.62] allows for the analysis and synthesis of speech by linear
prediction, with a lattice filter structure. In fact, for each step in the recursion,
crossed terms are used that result from the previous step. A remarkable property of
lattice filters is that the prediction coefficients are not directly used in the filtering
algorithm. Only the signal and the reflection coefficients are involved. Moreover, it can
be shown [MAR 76, PAR 86] that the reflection coefficients resulting from the
autocorrelation method can be directly calculated using the following formula:
$$k_n = -\frac{\sum_{i=0}^{N-1}\varepsilon_i^{(n-1)+}\,\varepsilon_i^{(n-1)-}}{\sqrt{\sum_{i=0}^{N-1}\left(\varepsilon_i^{(n-1)+}\right)^2\,\sum_{i=0}^{N-1}\left(\varepsilon_i^{(n-1)-}\right)^2}}$$ [1.64]
These coefficients are sometimes called PARCOR coefficients (for PARtial error
CORrelation). The use of equation [1.64] is thus an alternate way to calculate the
analysis and synthesis filters, which is equivalent to the autocorrelation method, but
without explicitly calculating the prediction coefficients. Other lattice filter structures
have been proposed. In the Burg method, the calculation of the reflection
coefficients is based on the minimization (in the least squares sense) of the sum of
the forward and backward errors. The error term to minimize is:
$$E^{n} = \sum_{i=0}^{N-1}\left[\left(\varepsilon_i^{n+}\right)^2 + \left(\varepsilon_i^{n-}\right)^2\right]$$ [1.65]
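By writing that $\partial E^{n}/\partial k_n = 0$, in order to find the optimal $k_n$ coefficients, we
obtain:
$$k_n = -\frac{2\sum_{i=0}^{N-1}\varepsilon_i^{(n-1)+}\,\varepsilon_i^{(n-1)-}}{\sum_{i=0}^{N-1}\left(\varepsilon_i^{(n-1)+}\right)^2 + \sum_{i=0}^{N-1}\left(\varepsilon_i^{(n-1)-}\right)^2}$$ [1.66]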
These coefficients no longer correspond to the autocorrelation method, but they
possess good stability properties, as it can be shown that −1 ≤ $k_n$ ≤ 1. Adaptive
versions of the Burg algorithm also exist [MAK 75, MAK 81].
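A compact sketch of the Burg recursion is given below. It follows a common formulation in which the sums only run over samples for which both errors are available (an assumption; the text writes the sums from 0 to N−1), and the test signal is a hypothetical first-order autoregressive sequence driven by seeded pseudo-random noise.

```python
import random

def burg_reflections(signal, order):
    # forward error and one-sample delayed backward error, restricted to valid samples
    forward = list(signal[1:])
    backward = list(signal[:-1])
    reflections = []
    for _ in range(order):
        num = sum(f * b for f, b in zip(forward, backward))
        den = sum(f * f for f in forward) + sum(b * b for b in backward)
        k = -2.0 * num / den                      # Burg estimate of the reflection coefficient
        reflections.append(k)
        new_forward = [f + k * b for f, b in zip(forward, backward)]
        new_backward = [b + k * f for f, b in zip(forward, backward)]
        # delay the backward error and drop the samples without a valid partner
        forward = new_forward[1:]
        backward = new_backward[:-1]
    return reflections

rng = random.Random(1)
signal = [0.0]
for _ in range(199):
    signal.append(0.9 * signal[-1] + rng.gauss(0.0, 1.0))   # hypothetical AR(1) test signal
ks = burg_reflections(signal, 4)
```

The bound on the magnitude of each coefficient follows from the inequality 2|Σfb| ≤ Σf² + Σb², which holds for any pair of sequences.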
1.2.4. Models of the excitation
In addition to the filter part of the linear prediction model, the source part has to
be estimated. One of the terms concerning the source is the synthesis gain G. There
is no unique solution to this problem and additional hypotheses must be made. A
commonly accepted hypothesis is to set the total signal energy equal to that of the
impulse response of the synthesis filter. Let us denote as h(n) the impulse response
and $r_h(k)$ the corresponding autocorrelation coefficients. Thus:
$$h(n) = G\,\delta(n) + \sum_{i=1}^{p}\alpha_i\,h(n-i) \quad\text{and}\quad r_h(k) = \sum_{i=1}^{p}\alpha_i\,r_h(k-i)$$ [1.67]
Indeed, for k > 0, the autocorrelation coefficients are infinite sums of terms such
as:
$$h(n-k)\,h(n) = G\,\delta(n)\,h(n-k) + \sum_{i=1}^{p}\alpha_i\,h(n-k)\,h(n-i)$$ [1.68]
and the terms δ(n)h(n−k) are always zero, for k > 0. Equaling the total energies is
equivalent to equaling the 0th-order autocorrelations. Thanks to recurrence equation
[1.67], the autocorrelation coefficients of the signal and of the impulse response can
be identified with each other: $r_h(i) = r(i)$, for i = 0, 1, …, p. For n = 0, h(0) = G;
therefore, reusing equation [1.67] yields:
$$r_h(0) = G\,h(0) + \sum_{i=1}^{p}\alpha_i\,r_h(i) \quad\text{therefore:}\quad G^2 = r(0) - \sum_{i=1}^{p}\alpha_i\,r(i)$$ [1.69]
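The gain formula can be checked on a first-order case where everything is known in closed form. The values below are hypothetical: with r(1)/r(0) = 0.5, the predictor is α1 = 0.5.

```python
def synthesis_gain(r, alpha):
    # G^2 = r(0) - sum_{i=1}^{p} alpha_i r(i)
    g_squared = r[0] - sum(a * r[i] for i, a in enumerate(alpha, start=1))
    return g_squared ** 0.5

r = [1.0, 0.5]                 # hypothetical autocorrelation values
alpha = [0.5]                  # corresponding first-order predictor
gain = synthesis_gain(r, alpha)

# impulse response of the synthesis filter: h(n) = G * alpha_1^n
impulse_response = [gain * alpha[0] ** n for n in range(200)]
energy = sum(h * h for h in impulse_response)
```

The energy of the impulse response comes out equal to r(0), which is the hypothesis used to fix the gain.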
Figure 1.7. Vowel /i/. Residual signal and its magnitude spectrum on a logarithmic scale
In the conventional linear prediction model, the excitation is either voiced or
unvoiced, for each analysis frame. In the case of a voiced signal, the excitation is a
periodic pulse train at the fundamental period (see Figure 1.9), and for an unvoiced
signal, the excitation is a Gaussian white noise (see Figure 1.8). The mixture of
these two sources is not allowed, which is a definite drawback for voiced sounds for
which a noise component is also present in the excitation.
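The two-state excitation can be sketched as follows. The function name, the pitch period of 80 samples (100 Hz at a hypothetical 8 kHz sampling rate) and the seed are assumptions made for the example.

```python
import random

def excitation(n_samples, voiced, pitch_period=80, seed=0):
    # voiced: unit pulses every pitch_period samples; unvoiced: Gaussian white noise
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

voiced_frame = excitation(240, voiced=True)
unvoiced_frame = excitation(240, voiced=False)
```

The binary switch makes the limitation mentioned above visible: a frame receives either pulses or noise, never a weighted mixture of both.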
creatures have danced all night, and eaten up a good portion of the
rations, in spite of the fact that they knew a hard tramp lay before
them to-day. How they will get through, or what we will do if they
give out on the way, is the next thing for us to think of. They don't
care. Someone has always thought for them and will have to think
for them for some time to come.
The quartermaster and Reynolds started off in good season but I
was kept back for instructions until they were out of sight, and I did
not overtake them until they had reached Vermillion Bayou. A drove
of men, women and children, the families of the men we were taking
away, had followed them until now. We had to wait for a wagon train
to get off the bridge and this gave time for them to get through with
the good-byes, and most of them turned back. A half dozen or more
of the younger women kept on and went all the way through. The
day was warm, and the road was dusty, but we went through
without accident or adventure, other than might be expected when
all things are considered. For several days the men had been in a
state of great excitement over their new prospects. They had wound
up by dancing all night, and eating up the provisions intended for us
on this hard tramp. As the day wore on the excitement wore off and
they found themselves very tired and very hungry. Such few things
as they had beside those on their backs were in a cart drawn by a
mule, and driven by three wenches. When a man gave out we
turned out a wench and put the man in her place. Finally all three
wenches were on foot, and their places in the cart taken by as many
men. Before long others gave out and the cart was loaded until that
broke down. Then we held a council. We were outside the picket
lines and night was coming on, and staying there in the road was
not to be thought of. Three revolvers were the only weapons of
defense we could muster in case of attack by a guerrilla squad.
Capture meant death. We explained the situation to such as could
understand us, and they made it so plain to the others that they
were all ready to hustle. We patched up the cart so the extras could
be dragged along and away we went. The quartermaster rode on to
find a place to stay at, and something to eat. I let one who was
worst off ride my horse, and with Reynolds at the front to coax, and
I at the rear to drive, we got up such a gait I had to do my best to
keep up. The road had been graded for a railroad, and was wide and
level as a floor. At dusk I saw the steeple of a church, and knew we
were near our journey's end. Now that the end was in sight, the
weariness all seemed to disappear. We passed the picket line and
were soon in the town.
The quartermaster had got a schoolhouse for a stay over and had
rations from the commissary. We made short work of these and
expected to settle right down for the night. The men and women
filled the schoolhouse full, and after being in there a few minutes,
we three made up our minds the air was better outside, so we each
took a board shutter from the windows and were soon settled down
as comfortable as the circumstances would allow. Before we were
asleep we heard a fiddle tuning up and in a little while a dance was
started and was in full blast when I fell asleep. How long it lasted I
don't know, but when I awoke about sunrise the inmates of the
schoolhouse were sleeping like the dead.
October 20, 1863.
Tuesday. I was nearly blind when I awoke. Something like an
inflammation in my eyes had troubled me for some days, and the
dusty tramp of the day before had made it worse. However, I soaked
them open, and found that it had not affected my appetite in the
least. While at breakfast Lieutenant Bell came and joined us. He was
on his way to join the colonel and his party at the front. The colonel
had given us an order to stop any boat going towards Brashear City,
and with it I proceeded to the landing, leaving Reynolds and the
quartermaster to pick up and bring on our party. At the landing I
met a party on their way to the front, and gave my horse to one of
them who was in just such a fix as I was the morning I became a
horse thief. In reply to his very profuse thanks I told him I would
have to turn her loose if I didn't give her away, for I could take her
no farther. I had long forgiven her the kick she gave me and
sincerely wished her well. At Nelson's Landing I found a boat which
was being held in readiness for General Banks and his staff, so that
was of no use to us. Soon after the A. G. Brown came up and said
she would be back that night, and take us. We went into camp near
the sugar mill and very soon our small army was arranging for a
sham battle. They talked French, so I could only judge what they
were up to from what I saw. They divided into two squads and
proceeded to fortify their positions by rolling the empty sugar
hogsheads up in two parallel rows, behind which they stationed
themselves, while the generals in command jawed at each other
across the field. The men each had a hogshead stave for a weapon.
For flags they used bandanna handkerchiefs, and for drums a piece
of board upon which one man pounded while another held it up.
One of the generals made a speech which made the other side
fighting mad, and they all jumped over the breastworks and met in
the space between, batting each other over the head with their
weapons, and yelling with all the power of their lungs. We thought
sure they would kill each other, for the blows they struck broke some
of the staves into splinters. Just as we were going to try and
interfere, one side surrendered and were marched off, prisoners.
There had been some blood shed, and the wonder is that no heads
were broken. But the best part came after the fight was over, and
when the final settlement was being made. Through an interpreter
we learned that the general who should win the fight was to kiss one
of the young ladies that had marched with us all the way from
Mouton's Plantation, and he now demanded his pay. She was led out
upon the battlefield, and when the victorious officer came up to
claim his reward she slapped his face, and then turned her back to
him. He then gave some orders, when his men grabbed the dusky
maiden and turned her about. I could not tell whether she blushed
or not, but suppose of course she did. The general got down on one
knee and then on both and jabbered French at her until she finally
relented and stuck out her hand, which she allowed him to kiss. This
soon led to a full surrender, and the battle was over, and peace
declared.
We gave out the rations and began to get ready for a start as soon
as the boat came along. We even filled a barrel with sugar, thinking
it might come handy when we got to Brashear City. But night came
and the A. G. Brown failed to appear. There were many here who
like ourselves were waiting to get out of the country. Among them
was a young mulatto woman, whom the others called Margaret, and
who seemed of a higher order than those about her. She was willing
to talk, and from her I have a story that has fully reconciled me to
the wisdom of the President's Emancipation Proclamation. She has
started for the North. Our coming among them has given her the
chance she had long looked for. She has run away from her mistress,
and her master is in the Rebel army. She has a picture of her
husband, and a fine-looking man he was. He was as white as I am.
He was the son of his master, and her father she says is Judge ——,
now in the Rebel service. Her husband picked up enough education
to be head man on his father's plantation. He knew too much for a
nigger, and when the Rebel army came through last spring he was
taken out and hanged to a tree right before her eyes. After they had
gone the slaves cut the body down and buried it. Margaret is in
hopes to reach New York, and I wished I could land her there that
minute. If she was dressed as well, and if she was educated, she
would pass muster with any I have seen that go by the name of
ladies.
No boat coming to take us away, we posted guards, giving each a
stick of wood for a weapon. I remained up until midnight, and in
going the rounds to see if the guards were awake, came near
getting a club over my head as I turned the corner of the sugar mill.
At midnight I called Reynolds, and rolled myself in my blanket and
was soon asleep. The mosquitoes were about as thick and as savage
as any we had met with. The horses and cattle had no peace for
them. I rolled myself up head and heels in my blanket, and yet when
I awoke found one foot had got out of bed, and the varmints had
put a belt around my ankle between my stocking and trousers that
looked like raw beef. I don't suppose there was an atom of space
that had not been punctured by a bill. But I slept right through, and
as usual dreamed of home and home folks.
October 21, 1863.
Wednesday. Nelly, one of the women who came with our crowd, has
volunteered to be our cook, and besides being a good cook has
proved herself to be a good forager. When I woke up she had fresh
pork and chicken cooked and we asked no questions about what
price she paid for them. Quartermaster Schemerhorn rode up to
Newtown for rations, and I went back to bed to finish up my nap.
The mosquitoes had not quite finished their job on me, and some
actually bit me through a thick woollen blanket. My leg was very
sore where they feasted on it this morning. One of the men mixed
up some mud for a poultice, which helped it wonderfully. I found out
we could learn many things from these poor creatures, not the least
being how to live on the fat of the land we are in.
Noon. The quartermaster came back and said the A. G. Brown would
be along to-day some time. That it will make a landing one-half mile
above here. Accordingly we pack up and move up to Mr. Nelson's so
as to be sure of not missing it. Mr. Nelson, the owner of everything
in this region, is here. He has been a merchant in New Orleans, but
since Banks' order driving all Rebel sympathizers from the city, has
been here at his plantation home. It is said he owns 20,000 acres of
land, and all the necessary stock and tools to work so large a tract.
After a supper of hard-tack and bacon, Lieutenant Reynolds and I
went and called on the gentleman. He received us very politely, and
offered us the best his house afforded. The boat not coming we
prolonged our visit, sitting on the broad piazza and smoking his
cigars. He said he was a widower, with two children, a son in the
army, and a daughter at school in Georgia. He told us of the
outrageous wrongs he had suffered at the hands of the invading
armies, how they had laid waste his land, torn down his buildings
and fences, taking away his mules and horses, cattle and sheep,
until he had nothing but the bare land to live upon, and no slaves
left him to work even that. It was holding up the other side of the
picture to our view, and in spite of ourselves we were sorry for him.
He evidently did not expect sympathy from us, for after reciting his
wrongs he changed the subject of conversation around to topics we
could all agree upon, and after a sociable chat he invited us to spend
the night with him, agreeing to have us called in case the boat came
during the night. He urged us to stay and we did. He gave us rooms,
elegantly furnished, with beds so white and clean we were some
time making up our minds whether after all we ought not to sleep
on the floor, and leave the beds as they were. But the whole
mosquito bars and a few nips from our ever-present enemies
decided us. We undressed and were soon asleep, too sound even to
dream of home. The boat did not come and the next thing we were
aware of it was morning.
October 22, 1863.
Thursday. We slept late, and when we came out, our host was
waiting for us, to say that breakfast was ready, and would not listen
to our going away until we had partaken of it with him. We sat down
to a beefsteak breakfast, with all the extras. I did not think I was so
hungry, but the smell of the victuals made us both ravenous. Our
host seemed to enjoy seeing us eat and thanked us heartily for
making him the visit, going so far as to say that in case the boat did
not come that day he would be glad to entertain us again. In books
and in other ways I had heard of southern hospitality and I now
know it was all true. I wonder if it was ever put to a severer test.
We went down to the landing and found a guard of soldiers from an
Illinois regiment, keeping watch over a quantity of sugar and
molasses which the government has confiscated, and which the boat
was expected to take away when it came. They invited us to make
one of their party until the boat came, and we gladly accepted the
invitation. They thought we had risked our lives in going to stay with
Mr. Nelson, and eating food in his house, but we did not believe it,
and did all we could to make them think better of him than they had
so far done. The guards shot a hog, which made fodder for our folks
for the day, together with the government rations we already had.
The day passed and another night came on and still no boat. We
crawled in wherever we could get and slept as best we could for the
mosquitoes, which seem determined to eat us alive.
October 23, 1863.
Friday. A cold rain storm that has been threatened for a day or two came
upon us early this morning. A small flock of sheep came up the road
driven by a man on horseback. The negroes from everywhere have
gathered here and the rations we give our men they give away to
their friends and are always hungry in consequence. When the
sheep came along they surrounded them and killed at least a dozen
before we could stop them. The man hustled along with what was
left and those killed were soon skinned and being cooked in various
ways. We had mutton for dinner and for supper, and had enough left
for breakfast. The day finally passed and we began looking for better
sleeping quarters. Reynolds and I with a part of the guard finally
climbed a ladder and got into a loft full of cornstalks with the corn
on just as it had been cut and stored away. The place was alive with
rats and mice, which ran over and through the stalks, making a
terrible racket, varied once in a while by a fight among themselves.
We got used to the racket and finally were asleep. Just as we were
enjoying ourselves, along came the boat we had waited so long for.
We hustled to sort out the nigs that belonged to us and get them on
board. In a little while we were off. The boat was crammed full of
people—black and white, old and young, men and women all spread
out on the cabin floor, or the tables. I never saw such a mass of
people in so small a space. We poked around and after a while
found room to lie down, after which getting asleep was quick work.
October 24, 1863.
Saturday. Another raw day. Now that the people are standing on end
there is more room to get about. We made out to eat such as we
had; while we wished for more, we had to content ourselves with
what we had grabbed hold of the night before in the dark. At noon
we passed Franklin, and about 3 p. m. reached Centerville, where
there was a lot of sugar to load on the lower deck. The captain said
if we would turn in our men to roll on the sugar he would undertake
to fill them up.
I took advantage of the stop to see what the place looked like. On
one of the streets I saw oranges on a tree and went in to see if I
could beg or buy a few. As I went into the yard a young lady came
out and, in a tone and with a look that almost froze me, asked what
I was doing in her yard. To save me I couldn't think what to say, but
I did after a while come to enough to say I would like an orange.
She turned to a negro and motioned towards the trees, when he
went and picked his hands full and gave me. Then the madam
pointed her finger towards the street and said, Now that you have
what you came after will you please go—and I went. I don't know
yet what I ought to have said or done, but the only thing I did was
to get back to the boat as fast as I could. I kept the adventure to
myself, and gave the oranges away, for I think they would have
choked me. That is a sort of southern hospitality I never read of in a
book, or heard of in any other way. I never saw so much scorn on a
face before. Why I stood there like a chicken thief caught in the act,
and then carried off the oranges, I don't now know. If the Rebels
were all like her I would resign and go home at once, for she did
actually scare my wits all away from me. The sugar was on board
and true to his promise the captain ordered a supper for our army,
which must have made his stock of provisions look small. Rube
asked me what I found the town like, and I told him it was different
from any I had yet seen. We soon got settled down for the night.
October 25, 1863.
Sunday. When we awoke we were in sight of Brashear City. We
landed, formed in line as well as we could, and marched to our
headquarters, where I found my old crony, Sol Drake. We found
quarters for the men in an unused building, and in a little while their
woolly heads were sticking out from every window.
The quartermaster drew clothes for them, and they were soon fitted
out with suits of blue, just like the rest of the Linkum Sogers. The
trouble was to fit them with shoes. I doubt if many had ever had a
shoe on their feet. Their feet are wide at the toes and taper straight
back to the heel. No. 12 was the smallest size we found use for, the
most of them taking 14 or larger. They insisted on squeezing a No.
14 foot into a No. 10 or 12 shoe, but we, knowing what that would
result in, got them properly shod after a long time. Then how proud
they were! We then gave them their rations for the day, telling them
through interpreters that if they wasted it or gave it away, they could
have no more until to-morrow. We moved all our belongings from
the boat and filled out the day visiting and talking over old times,
and at early bedtime settled down for the night in a four-room house
which has been taken for our headquarters while here.
October 26, 1863.
Brashear City, La. Monday. On going out this morning who should
appear to me but George Story of Company B, who was captured
with General Dow at Port Hudson last summer. He says he was well
treated by his captors, and has no fault to find with them. They took
him and the general to Richmond, and put them in Libby Prison.
After a while he was paroled, and sent to Annapolis, Md. There he
was kept until exchanged, and then sent south in charge of the
provost marshal to be turned over to the 128th New York. Through a
mistake at headquarters he was sent here, as the 128th was
supposed to be at the front in the Teche country. If he had not met
us as he did, he would have gone up the Teche on the next boat. As
it is he will go back to New Orleans to-morrow, and look for his
regiment up the river, probably at Baton Rouge, where we left them.
We commenced teaching our recruits the rudiments of soldiering.
They are awkward, but very anxious to learn, and as that is the main
thing, we look for little trouble in drilling them. By shoving them
together, lock-step fashion, they soon got the idea of marching in
time, and on the whole did as well or better than we did at Hudson,
when we took our first lesson. The quartermaster has gone to the
city for equipments, tents, etc., and when he returns we will soon be
at the Manual of Arms. We expect Major Palon here to-day to take
charge, and by the time Colonel B. and the rest get back, hope to
have our recruits fit for turning over to any regiment that needs
them.
October 27, 1863.
Tuesday. It rained hard all day, consequently no drill or other work
was attempted. Major Palon and the quartermaster came from the
city, the latter with rubber blankets and shelter tents for the recruits.
He also brought some letters, one for me telling about the draft at
home. Those that are drafted can get off by hiring a substitute or by
paying $300, in which case a substitute is furnished them. I am glad
I enlisted. There have been times when I could hardly say it, but I
can say it now with all sincerity.
More women and children have come, wives and children of the men
we have. Poor things! I suppose they have nowhere else to go or to
stay, so they have followed on after their husbands and fathers. I
have heard that the government has provided camps for them,
where rations are served to them just as to the soldiers. It is a very
proper thing to do, and I hope it may be true that these helpless
ones are thus provided for. This arming of the negroes is not such a
simple affair as it seemed. This is a side I had not thought of, but I
don't see how it can be dodged.
October 28, 1863.
Wednesday. The rain has stopped, and the mud is now having its
turn. It makes us just as helpless as the rain did. We have put in the
time making plans for the time when the mud hardens. It does not
dry up, as it does in the north, but the water seems to settle and
leave the ground hard even if there be no sun or wind.
October 29, 1863.
Thursday. After a council on matters and things in general, we have
made some changes, looking to a more orderly arrangement of our
camp life in these quarters. The hangers on about camp have been
driven away. The quartermaster's stores and those of the
commissary department have been separated and placed in tents
outside, where they can be found and got at. The most intelligent
among the recruits have been appointed corporals and sergeants,
and the screws of discipline turned on just a little more. Guards are
placed, more for their instruction than for our safety, and things are
putting on more the appearance of a military camp than a mere
lounging place, as it has heretofore been. Just as we had got
everything to our notion, a boat came, and on it were Captains
Merritt and Enoch with 120 more recruits. Tents and blankets were
given them and quarters assigned them, which altogether has made
a busy day for us. Discipline, what little there had been, went to the
winds when the men all got together. They all seemed to be
acquainted, and such jabbering French as they had. I suppose they
had lots of news to tell each other. Some can talk English, but all of
them can and do talk French when talking to each other. They came
from Colonel B.'s headquarters at Opelousas, and were in charge of
Colonel Parker, who got left behind at Newtown, and will be along on
the next boat. At night Dr. Warren, our surgeon to be, came from
New Orleans, and to-morrow will examine the recruits. Sol Drake has
been sent for to join Colonel B. at Opelousas and expects to leave
on the next boat. Opelousas is beyond where I have been. I have
posted Sol in getting as far as Mouton's, where we were, and
beyond that he must find out for himself.
October 30, 1863.
Friday. It has been a rainy day, but we have paid little attention to it.
Dr. Warren finished up his examination and nearly every man passed
muster. He was not as particular about it as Dr. Cole was at Hudson.
As fast as examined and passed we gave them their new clothes,
and a prouder set of people I never saw. Lieutenant Colonel Parker
came at night with later word from Colonel B. and Drake does not
have to go. For this he and the rest of us are glad. Colonel Parker
brought eight men with him and about as many women. We have
quite a respectable squad, and they are learning very fast—faster I
think than we did when we first began. Those that were rejected by
the surgeon as unsound are here yet, and what to do with them is a
puzzle to us. We have each of us taken one, to do anything for us
we can think of, and they seem perfectly happy. Mine is named Tony,
and is a great big good-natured soul, ready to do anything for me, if
I will only let him stay. He came to me at first asking if I would write
a letter to his wife, and when I asked him what I should write, told
me anything I was a mind to. I wrote the letter, telling her where he
was, and how he was, and put in a word for some of the others for
Tony's wife to tell their folks. This pleased him so much that he hung
around trying to do me a favor in return, and when he was rejected
by the doctor he said I must keep him, for he would be killed if he
went back home, because he had enlisted. The government allows
us transportation and a daily ration for a servant, so I am nothing
out, for he asks no other pay than his board and the privilege of
staying.
October 31, 1863.
Saturday. Lieutenant Colonel Parker and Dr. Warren left us to look
for a healthier place, as many of the men are getting chills and fever.
The ground is low and wet and I suppose is a regular breeding place
for fever and ague. We are glad of a prospect of a change, but this
country is all swampy and wet. The Teche country comes the
nearest to dry ground of anything I have seen. We are getting into
full swing. Companies A, B, and C are organized and assigned to
Captain Merritt, Captain Hoyt, and Captain Enoch. There are thirty
men left and these are turned over to Lieutenant Reynolds for drill.
At night, a telegram from Colonel Parker says we must stay at
Brashear City until our regiment is full. I have been out of sorts
to-day and have laid up for repairs.
November 1, 1863.
Sunday. Was detailed for officer of the guard, but not feeling well
Lieutenant Reynolds volunteered to act for me, for which I am very
much obliged. I put in another day trying to be sick, but toward
night gave it up as a failure. However, I put in the day by staying
indoors, writing letters for the men, some to their wives and some to
their sweethearts. The more love I can put in the letters, and the
bigger words I can use, the better they suit the sender. What effect
they have on those that receive them I happily do not know.
November 2, 1863.
Monday. I lay down last night thinking if only mother was here to fix
me up a dose, as she has so many times done, I should be well right
off. I soon dropped off, and the same thought kept right on going
through my brain until I awoke this morning and found myself in the
same position, lying crosswise of my bed just as I lay down last
night. But my dream of home had cured me, and I was myself
again, ready for whatever might come.
I found myself again on the detail for guard. After the new guard
was posted I had but little to do, except to see to it that the reliefs
were changed at the proper time. There was no enemy in sight,
though the guards were just as watchful as if the enemy had been in
the next yard. The worst was to remember the names of the
sergeants, and that I got round by writing them down. Even then I
had to guess at some. At night Colonel Parker came back from the
city, on his way to join Colonel B., who is at the front with the rest of
the gang. He brought me two letters, one saying father is sick and
the other saying he is well again. I am glad the good news came
with the bad, though I had much rather no news of that kind would
come. I also had a list of names of those drafted from the town of
North East. John and Perry Loucks and Amon Briggs were among
them. Whether they will go or get substitutes the letter did not say.
Also that another proclamation from the President calls for 300,000
more men. I wonder if he knows what an army we are raising for
him here. Report says an accident between here and Algiers last
night killed twelve soldiers and wounded over sixty more. One train
broke down and another ran into it, both loaded with soldiers. These
roads are so straight and level it would seem that accidents of that
kind might be avoided.
November 3, 1863.
Tuesday. I made a raise of a postage stamp to-day and sent a letter
home. The day has passed like all do nowadays, with little to do. But
it has been pleasant, and that is an exception I am happy to make a
note of. The quartermaster came in to-night with more tents, and
more supplies.
November 4, 1863.
Wednesday. The steamer Red Chief came down the Teche this
morning with more recruits, in charge of Lieutenants Gorton, Smith,
Heath and Ames. This will make more work and I am glad of it.
Lieutenant Colonel Parker has been on the point of starting up the
country again for several days, but has not gone yet. To-day he has
decided to move our quarters to higher ground. This is a wise thing
to do according to Dr. Warren, for a great many of the men are sick
with chills and fever. The site chosen is about a mile away. I am
detailed to see that the stuff gets off, and the others are to be on
the new site and receive it, and see to its proper distribution. I am
temporarily assigned to Company D. By noon I had everything on
the way, and after reaching camp helped to get Company D in as
good shape as the others. A regular camp is laid out and company
streets made. It made me think of the laying out of Camp Millington.
Grading the company streets and other necessary work will give us
something to do for days to come. I put in so much time helping the
others get fixed that I forgot my own tent, and as Captain Enoch
invited me to sleep with him, I accepted, and after fighting
mosquitoes until nearly midnight, I fell asleep and remained so until
late the next morning.
November 5, 1863.
Thursday. Tony was waiting for me when I woke up, and was feeling
badly because I had to go to the neighbors to sleep. After our
hard-tack and coffee were safely stowed away, I got my tent out and we
soon had it up. Then Tony began skirmishing for furnishings. He had
seen what the others had and set out to beat them all. He got hold
of a board wide enough and long enough for me to sleep on, and
soon had legs driven in the ground to hold it up. My modest
belongings were put under it, and the deed was done. Colonel
Parker gave a few parting orders and then took boat for New Iberia
to join Colonel B., leaving Captain Merritt in command. Captain Laird
not yet having joined the command, I am curious to know what sort
of a man I am to serve under. Company D is as yet made up of raw
recruits, not yet having passed through the medical mill, so I have
only to keep them within bounds until they are examined and sworn
in as soldiers, when their education will begin.
At night Dr. Warren and Lieutenant John Mathers came from New
Orleans. A cold drizzling rain began about that time and we were
driven into our tents, where the hungry mosquitoes awaited us and
war was at once declared. If I had a brigade of men as determined
as these Brashear City mosquitoes, I believe I could sweep the
Rebellion off its feet in a month's time. They make no threats as our
home mosquitoes do, but pounce right on and the first notice you
get is a stab that brings the blood. I have had at least one bite for
every word I have written about them, and all in the same time I
have been writing it. The only escape from them is in the hot sun, or
under a blanket so thick they cannot reach through it.
November 6, 1863.
Friday. This morning Lieutenants Reynolds, Smith, Ames and myself
formed a club of four for mutual protection against starvation. We
have a rejected recruit for a cook, and have made a draft on the
commissary for salt horse, hard-tack and coffee. If he can't get up a
meal on that, then he's no cook for us. My company was examined
and almost every one proved to be sound enough for soldiers. A
dozen at a time were taken into a tent, where they stripped and
were put through the usual gymnastic performance, after which they
were measured for shoes and a suit, and then another dozen called
in. Some of them were scarred from head to foot where they had
been whipped. One man's back was nearly all one scar, as if the skin
had been chopped up and left to heal in ridges. Another had scars
on the back of his neck, and from that all the way to his heels every
little ways; but that was not such a sight as the one with the great
solid mass of ridges, from his shoulders to his hips. That beat all the
anti-slavery sermons ever yet preached. But this is over with now,
and I don't wonder their prayers are mostly of thanks to Massa
Linkum. They are very religious, holding prayer meetings every
night, after which the fiddle begins and dancing goes on all night, if
not stopped on account of the noise they make. I don't know how
they get along with so little sleep, or rest. After the examination we
got blankets and clothes from the quartermaster and they were
fitted as well as it is possible to fit from a ready-made stock.
Our cook, George, proved to be a jewel. He made salt beef taste so
much like a chicken we didn't notice the difference. Major Palon
came from the city at night, and brought some letters. One was for
me and contained three dollars from my old crony, Walt Loucks. This
will keep us in extras for a little while. We were some time deciding
how to use it, but a majority thought a part of it should go for flour,
so George could try his hand at pancakes.
November 7, 1863.
Saturday. I have never described our camp, and may never have a
better time than now. We are out of town, to the north, on high,
hard ground, for this country—so high that there is quite a slope
towards the water of Berwick Bay. Company streets are laid out and
the camp kept clean by a detail made each day for that purpose.
There are many large trees in and about our camp, and taken
altogether we have never had a stopping-place quite equal to it. The
sick list has shrunk already, though the hospital tent is pretty well
filled yet. We have company-drill every day and there is quite a strife
among us to see which can learn his troop the fastest. The men are
as eager to learn as we are to have them, which makes it much
easier for both parties. Berwick, which is directly opposite, is quite a
place from the looks, larger than Brashear. It is the shipping port for
the great Teche country that lies beyond.
Just after dinner Colonel Tarbell's orderly rode into camp and
inquired for me, handing me an order which read, Lieutenant
Lawrence Van Alstyne, commanding Company D, 90th U. S. C. I., at
Brashear City, La. Captain Vallance, quartermaster, will furnish the
bearer with a boat, in which he will proceed to Berwick and procure
a sufficient supply of lumber to floor the hospital tent in said
regiment. Signed, Tarbell, commander. I took five men and such
tools as we could find and called on Captain Vallance, who gave us a
boat in which we rowed across the bay, which was still as a mill
pond. We landed near a shanty which easily came apart, and which
had good wide boards, enough to floor several hospital tents. We
made these into a raft which we towed back, reaching camp without
having seen a person, except a guard—who considered my order
good enough authority for letting the boards go. We had boards
enough for the hospital tent and all the other tents, which as soon
as they are dry will be used for the comfort of all hands. At night
Lieutenant Gorton arrived from the city to take the next boat for
Newtown to join Colonel B.
Lieutenant Smith made me a present of a handsome pair of shoulder
straps. The groundwork is dark velvet and the border of gold cord
twisted and woven together. Altogether they are as handsome a pair
as I have ever seen on anybody's shoulders. I shall lay them away
until I get a coat fit to put them on, and that won't be until after pay
day. Thank you, Matt, I'll try and not disgrace them. I presume he
paid money for them that he needed for fodder; but that's just like
Matt Smith. Major Palon also returned to-night, and made some
changes. Lieutenant Ames, my partner in Company D, goes in the
medical department as clerk, and Lieutenant Reynolds takes his
place with me.
November 8, 1863.
Sunday. On duty to-day as officer of the guard. Generally that is a
light duty, but with these men it is not so much so. None of the men
can read or write, and so the sergeant and corporal of each relief
has to have the names of his relief repeated to him until he
remembers them. Even then there are many mix-ups that have to be
straightened out. The names are strange to me, and after writing
them as they sound, I find it difficult to pronounce them.
I went the rounds during every relief, and never failed to find
something out of joint. One at the Major's tent, whom I had taken
extra pains to educate, I found taking his gun apart to see how it
was made. Another had his shoes and stockings off and was walking
his beat with bare feet. Another had taken off his accoutrements and
piled them up at the end of his beat and was strutting back and
forth with folded arms. The only thing to do is to call up a man who
speaks both French and English and through him straighten the
matter out.
November 9, 1863.
Monday. To-day an order came to move to New Orleans. That is, all
the companies that are full. That leaves Company D here until more
men come. There is a regular jollification over the order, as none of
us are in love with this place. I suppose it would be a proper thing
for me to introduce the officers of the Ninetieth to whom the readers
of this diary may be, and as there is nothing to prevent I will do it
now. If I ever get a chance to read it myself it will call them up
before me as I now know them.
Colonel Edward Bostwick comes first, and any one who will be apt to
read this knows him as well as I. But as I want the list complete I
will begin with him and work down the line. He is about five feet ten
inches, light complexion, gray eyes, with brown hair and beard. He is
rather particular about his own appearance, and also that of the
men under him. He is always on the lookout for a higher limb to
roost on, and after getting there himself, is very good about helping
his friends up to him. He seldom drinks, never to excess, and on the
whole is a good soldier. He came out as captain of Company B,
128th New York. Was promoted to major of the First Louisiana
Engineers, May 2, 1863. He served at Port Hudson with them and
had the name of doing well whatever he was ordered to do. In
August 1863, was promoted to the rank of colonel, with permission
to raise a regiment from the freed slaves in this department, and this
he is now trying to do.
Lieutenant Colonel George Parker is from Poughkeepsie. Came out
as captain of Company D, 128th New York. On Colonel Bostwick's
recommendation he was promoted to his present rank. He is about
five feet seven inches, light complexion, sandy hair and beard. Is
well up in military tactics, and is afraid of nothing. Rushes right into
anything, regardless of getting out again. Is kind to his men, but a
strict disciplinarian. When his orders are obeyed he is all right, but
when he gets angry he acts without judgment or feeling for any one
or anything.
Major Rufus J. Palon is from Hudson. Came out as second lieutenant
in Company G, 128th New York. He has the army regulations and
military tactics at his tongue's end. Is pretty strict on discipline, but
never loses his head. Money has no value to him. He would give his
last cent to any one in need, even though he might be just as needy
himself.
Surgeon Charles E. Warren is tall, dark complexion, with dark sandy
hair and beard. So far as I know he is a good surgeon. He is free
with his money, and with the hospital whiskey. A real good fellow,
though not in all things the sort one can pattern after with safety.
Quartermaster Peter J. Schemerhorn left home as orderly sergeant
of Company G, 128th New York. Acted as second lieutenant of his
company at Port Hudson, and was afterwards detailed as clerk at
headquarters, where he remained until the formation of this
regiment, when he was made first lieutenant and acting
quartermaster. He makes a good quartermaster, seeing that his stock
is kept up and ready for distribution.
Adjutant T. Augustus Phillips is one of the boys. He served in the
Second Fire Zouaves in the three months' service and afterwards
came out as orderly sergeant in the 165th New York. Was detailed
as clerk at headquarters and in some way got a recommendation for
adjutant in Colonel Bostwick's regiment. He is a New York tough.
Gets drunk as a lord, and looks down upon any one else who does
not do as he does. He is not as popular in the regiment as he might
be.
Captain Thomas E. Merritt was formerly sergeant in Company I,
128th New York. Was raised to acting second lieutenant of same
company, and finally promoted to captain in this regiment. He has
traveled a great deal and remembers what he has seen. He seems
well fitted for the position he now holds and stands well with all
hands.
Captain Charles Hoyt is as good an all-round man as is often found.
He is fine-looking, a fine singer, has a way of being everyone's
friend, and making everyone a friend to himself. He is cut out more
for society than for the army. He takes now and then a drink, but
never gets beyond himself. Will share his last dollar or his last hard-
tack with any one. Altogether, he acts as a sort of balance wheel to
the rest of the machine, keeping some from going too fast, and
helping others to go faster. He would be missed if taken away, more
than any half dozen of us.
Captain Richard Enoch came out as first sergeant of Company I,
128th New York. He was wounded at Port Hudson, and did not again
join his company, being recommended for promotion as first
lieutenant in the Corps de Afrique, from which he came to us with a
captain's commission. He has a jovial disposition, but has a very
quiet way of showing it. He sometimes takes a little too much, and
then is reckless of his money and of the good name he has gained.
Every one likes him, because they cannot help it. As a military man I
doubt if he is ever heard much about. He had rather have a good
time, and no matter what is going on he generally manages to have
it.
There are several other officers who have not yet reported and of
them I know nothing. One of them is Captain Laird, who will be
captain of Company D, when he comes.
First Lieutenant Robert H. Clark was promoted from sergeant in the
116th New York. He is an excellent penman and would make a much
better clerk in some department office than he ever will a soldier. He
is rather hasty tempered, and has already had several jars with his
brother officers, particularly with Adjutant Phillips, whose assistant
he at present is. If Adjutant Phillips kicks clear out from the traces
Lieutenant Clark will probably succeed him.
First Lieutenant Martin Smith was formerly an engineer on the
Harlem R. R. He went out with a three months' regiment and
afterwards as sergeant in Company G, 128th New York. He is open-
hearted and outspoken. One can always tell where he is, for he is
not deceitful. He is well liked by his brother officers. Just now he lies
on his back on my bed making fun of a stove I have manufactured
out of a camp kettle. He has no idea I am writing his biography.
First Lieutenant Reuben Reynolds is from Hudson, N. Y. He came out
as a private in Company A, 128th New York. Was promoted to
corporal, then to sergeant and then to first lieutenant in this
regiment. He looks as if he had just been taken from a bandbox. No
matter what clothes he has on he always looks neat and well
dressed. He was on a three years' whaling voyage before the war,
and tells some very interesting stories of his life on shipboard.
Before he came to us he was detailed as clerk in the Y. M. C. A. at
New Orleans. He is a professor of religion, and I think tries to make
his profession and his army life jibe. We all respect him, though
none of us feel as if we fairly knew him.
First Lieutenant John Mathers is from Fishkill, N. Y. He came out as a
private in Company F, 128th New York. Was promoted to second
lieutenant in the Third Engineers, and from that to our regiment as
first lieutenant. For some unknown reason he and I took a dislike to
each other while in the 128th, and used to pass each other by as
one surly dog does another. Since we have been thrown together we
have talked the matter over, and neither of us can give any reason
for our mutual dislike. We are the firmest of friends now, together
much of the time we can call our own. We are not a bit alike. He is a
regular dandy in appearance but the commonest sort of a fellow
when you get at him.
First Lieutenant Charles Heath was a sergeant in Company I, 128th
New York. Was given a commission in the Third Louisiana Engineers,
and afterwards given the same position in this regiment. In my
opinion his head is not right. He acts strange at times. Sometimes he
is as quiet and docile as can be, and in a little while as profane and
foul-mouthed a man as I ever met. Is not ambitious, but seems to
take what comes as a matter of course. He has no intimates,
keeping mostly to himself. What influence ever brought him up from
the ranks I cannot imagine.
First Lieutenant Garret F. Dillon was promoted from sergeant in
Company H, 128th New York. He is a very small man, has a lisp, and
a mincing walk. He looks and acts as if he was cut out for a dandy,
but lacked the material for making one, and was thrown out in the
shape he now is.
First Lieutenant Charles M. Bell was first sergeant of Company G,
128th New York. At the battle of Port Hudson he happened to be
nearest Colonel Cowles when he fell. He received the colonel's dying
message to his mother and was sent home with the body. He is one
of the most capable of the whole lot of us. There is no position he
could not fill, were it not for his liking for strong drink. This he does
not seem able to control. I believe he tries to but lacks the strength
to resist the temptations that are constantly placed in his way. Poor
Bell, I pity him more than any other man here. With the right
influences about him, what a different man he might be. He has
more good traits than any of us can boast, but his one besetting
weakness is strong enough to overcome them all.
First Lieutenant George H. Gorton enlisted in the 128th New York, as
wagoner. Was promoted to commissary sergeant in the Third
Louisiana Engineers, and from there he came as first lieutenant to
this regiment. He is of a strange make-up. Is well liked by all, but
not greatly respected by any. Is a good horseman and would
probably make out better handling horses than he does men. Put
him anywhere, and he manages to make money, and manages to
spend it as fast as he gets it. Is free-hearted and obliging and I
never knew of his having an enemy. Neither does he make any
lasting friendships. He worked as teamster for Colonel Bostwick
before going into the army, and it was through Colonel Bostwick that
he got the position he now occupies.
First Lieutenant Henry C. Lay was a corporal in Company A, 128th
New York. I knew him while in that regiment, but he has not yet
reported for duty with us. He is on some special service and I
suppose will sometime turn up among us. From what little I know of
him I should say he will average well with the rest of us.
First Lieutenant George S. Drake was also with Colonel Bostwick
before he entered the army. He was commissary sergeant in the
128th New York, and always in close touch with Colonel B. He and I
have long been fast friends, so it will not do to say anything against
him. But I couldn't if I would. There is nothing but good to say of



Language And Speech Processing 1st Edition Joseph Mariani

  • 10. First published in France in 2002 by Hermes Science/Lavoisier entitled Traitement automatique du langage parlé 1 et 2 © LAVOISIER, 2002
    First published in Great Britain and the United States in 2009 by ISTE Ltd and John Wiley & Sons, Inc.
    Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
    ISTE Ltd, 27-37 St George’s Road, London SW19 4EU, UK, www.iste.co.uk
    John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA, www.wiley.com
    © ISTE Ltd, 2009
    The rights of Joseph Mariani to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.
    Library of Congress Cataloging-in-Publication Data
    Traitement automatique du langage parlé 1 et 2. English
    Spoken language processing / edited by Joseph Mariani.
    p. cm.
    Includes bibliographical references and index.
    ISBN 978-1-84821-031-8
    1. Automatic speech recognition. 2. Speech processing systems. I. Mariani, Joseph. II. Title.
    TK7895.S65T7213 2008
    006.4'54--dc22
    2008036758
    British Library Cataloguing-in-Publication Data
    A CIP record for this book is available from the British Library
    ISBN: 978-1-84821-031-8
    Printed and bound in Great Britain by CPI Antony Rowe Ltd, Chippenham, Wiltshire.
  • 11. Table of Contents
    Preface
    Chapter 1. Speech Analysis
    Christophe D’ALESSANDRO
      1.1. Introduction
        1.1.1. Source-filter model
        1.1.2. Speech sounds
        1.1.3. Sources
        1.1.4. Vocal tract
        1.1.5. Lip-radiation
      1.2. Linear prediction
        1.2.1. Source-filter model and linear prediction
        1.2.2. Autocorrelation method: algorithm
        1.2.3. Lattice filter
        1.2.4. Models of the excitation
      1.3. Short-term Fourier transform
        1.3.1. Spectrogram
        1.3.2. Interpretation in terms of filter bank
        1.3.3. Block-wise interpretation
        1.3.4. Modification and reconstruction
      1.4. A few other representations
        1.4.1. Bilinear time-frequency representations
        1.4.2. Wavelets
        1.4.3. Cepstrum
        1.4.4. Sinusoidal and harmonic representations
      1.5. Conclusion
      1.6. References
  • 12. Chapter 2. Principles of Speech Coding
    Gang FENG and Laurent GIRIN
      2.1. Introduction
        2.1.1. Main characteristics of a speech coder
        2.1.2. Key components of a speech coder
      2.2. Telephone-bandwidth speech coders
        2.2.1. From predictive coding to CELP
        2.2.2. Improved CELP coders
        2.2.3. Other coders for telephone speech
      2.3. Wideband speech coding
        2.3.1. Transform coding
        2.3.2. Predictive transform coding
      2.4. Audiovisual speech coding
        2.4.1. A transmission channel for audiovisual speech
        2.4.2. Joint coding of audio and video parameters
        2.4.3. Prospects
      2.5. References
    Chapter 3. Speech Synthesis
    Olivier BOËFFARD and Christophe D’ALESSANDRO
      3.1. Introduction
      3.2. Key goal: speaking for communicating
        3.2.1. What acoustic content?
        3.2.2. What melody?
        3.2.3. Beyond the strict minimum
      3.3. Synoptic presentation of the elementary modules in speech synthesis systems
        3.3.1. Linguistic processing
        3.3.2. Acoustic processing
        3.3.3. Training models automatically
        3.3.4. Operational constraints
      3.4. Description of linguistic processing
        3.4.1. Text pre-processing
        3.4.2. Grapheme-to-phoneme conversion
        3.4.3. Syntactic-prosodic analysis
        3.4.4. Prosodic analysis
      3.5. Acoustic processing methodology
        3.5.1. Rule-based synthesis
        3.5.2. Unit-based concatenative synthesis
      3.6. Speech signal modeling
        3.6.1. The source-filter assumption
        3.6.2. Articulatory model
        3.6.3. Formant-based modeling
  • 13. Table of Contents vii 3.6.4. Auto-regressive modeling . . . . . . . . . . . . . . . . . . . . . . . . . 120 3.6.5. Harmonic plus noise model . . . . . . . . . . . . . . . . . . . . . . . . 120 3.7. Control of prosodic parameters: the PSOLA technique . . . . . . . . . . 122 3.7.1. Methodology background . . . . . . . . . . . . . . . . . . . . . . . . . 124 3.7.2. The ancestors of the method . . . . . . . . . . . . . . . . . . . . . . . 125 3.7.3. Descendants of the method . . . . . . . . . . . . . . . . . . . . . . . . 128 3.7.4. Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 3.8. Towards variable-size acoustic units . . . . . . . . . . . . . . . . . . . . . 131 3.8.1. Constitution of the acoustic database . . . . . . . . . . . . . . . . . . 134 3.8.2. Selection of sequences of units . . . . . . . . . . . . . . . . . . . . . . 138 3.9. Applications and standardization . . . . . . . . . . . . . . . . . . . . . . . 142 3.10. Evaluation of speech synthesis. . . . . . . . . . . . . . . . . . . . . . . . 144 3.10.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 3.10.2. Global evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 3.10.3. Analytical evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 3.10.4. Summary for speech synthesis evaluation. . . . . . . . . . . . . . . 153 3.11. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 3.12. References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 Chapter 4. Facial Animation for Visual Speech . . . . . . . . . . . . . . . . . 169 Thierry GUIARD-MARIGNY 4.1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 4.2. Applications of facial animation for visual speech. . . . . . . . . . . . . 170 4.2.1. Animation movies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 4.2.2. Telecommunications . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . 170 4.2.3. Human-machine interfaces . . . . . . . . . . . . . . . . . . . . . . . . 170 4.2.4. A tool for speech research. . . . . . . . . . . . . . . . . . . . . . . . . 171 4.3. Speech as a bimodal process. . . . . . . . . . . . . . . . . . . . . . . . . . 171 4.3.1. The intelligibility of visible speech . . . . . . . . . . . . . . . . . . . 172 4.3.2. Visemes for facial animation . . . . . . . . . . . . . . . . . . . . . . . 174 4.3.3. Synchronization issues. . . . . . . . . . . . . . . . . . . . . . . . . . . 175 4.3.4. Source consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176 4.3.5. Key constraints for the synthesis of visual speech. . . . . . . . . . . 177 4.4. Synthesis of visual speech . . . . . . . . . . . . . . . . . . . . . . . . . . . 178 4.4.1. The structure of an artificial talking head. . . . . . . . . . . . . . . . 178 4.4.2. Generating expressions . . . . . . . . . . . . . . . . . . . . . . . . . . 178 4.5. Animation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 4.5.1. Analysis of the image of a face. . . . . . . . . . . . . . . . . . . . . . 180 4.5.2. The puppeteer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 4.5.3. Automatic analysis of the speech signal . . . . . . . . . . . . . . . . 181 4.5.4. From the text to the phonetic string . . . . . . . . . . . . . . . . . . . 181 4.6. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 4.7. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
  • 14. viii Spoken Language Processing Chapter 5. Computational Auditory Scene Analysis . . . . . . . . . . . . . . 189 Alain DE CHEVEIGNÉ 5.1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 5.2. Principles of auditory scene analysis . . . . . . . . . . . . . . . . . . . . . 191 5.2.1. Fusion versus segregation: choosing a representation . . . . . . . . 191 5.2.2. Features for simultaneous fusion. . . . . . . . . . . . . . . . . . . . . 191 5.2.3. Features for sequential fusion. . . . . . . . . . . . . . . . . . . . . . . 192 5.2.4. Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193 5.2.5. Illusion of continuity, phonemic restoration . . . . . . . . . . . . . . 193 5.3. CASA principles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193 5.3.1. Design of a representation. . . . . . . . . . . . . . . . . . . . . . . . . 193 5.4. Critique of the CASA approach . . . . . . . . . . . . . . . . . . . . . . . . 200 5.4.1. Limitations of ASA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201 5.4.2. The conceptual limits of “separable representation” . . . . . . . . . 202 5.4.3. Neither a model, nor a method? . . . . . . . . . . . . . . . . . . . . . 203 5.5. Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 5.5.1. Missing feature theory . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 5.5.2. The cancellation principle. . . . . . . . . . . . . . . . . . . . . . . . . 204 5.5.3. Multimodal integration . . . . . . . . . . . . . . . . . . . . . . . . . . 205 5.5.4. Auditory scene synthesis: transparency measure . . . . . . . . . . . 205 5.6. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 Chapter 6. Principles of Speech Recognition . . . . . . . . . . . . . . . . . . . 213 Renato DE MORI and Brigitte BIGI 6.1. Problem definition and approaches to the solution. . . . . . . . . . . . . 213 6.2. 
Hidden Markov models for acoustic modeling . . . . . . . . . . . . . . . 216 6.2.1. Definition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216 6.2.2. Observation probability and model parameters . . . . . . . . . . . . 217 6.2.3. HMM as probabilistic automata . . . . . . . . . . . . . . . . . . . . . 218 6.2.4. Forward and backward coefficients . . . . . . . . . . . . . . . . . . . 219 6.3. Observation probabilities. . . . . . . . . . . . . . . . . . . . . . . . . . . . 222 6.4. Composition of speech unit models . . . . . . . . . . . . . . . . . . . . . 223 6.5. The Viterbi algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226 6.6. Language models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228 6.6.1. Perplexity as an evaluation measure for language models . . . . . . 230 6.6.2. Probability estimation in the language model . . . . . . . . . . . . . 232 6.6.3. Maximum likelihood estimation . . . . . . . . . . . . . . . . . . . . . 234 6.6.4. Bayesian estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235 6.7. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236 6.8. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
  • 15. Table of Contents ix Chapter 7. Speech Recognition Systems . . . . . . . . . . . . . . . . . . . . . . 239 Jean-Luc GAUVAIN and Lori LAMEL 7.1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239 7.2. Linguistic model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241 7.3. Lexical representation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244 7.4. Acoustic modeling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247 7.4.1. Feature extraction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247 7.4.2. Acoustic-phonetic models. . . . . . . . . . . . . . . . . . . . . . . . . 249 7.4.3. Adaptation techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . 253 7.5. Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256 7.6. Applicative aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257 7.6.1. Efficiency: speed and memory . . . . . . . . . . . . . . . . . . . . . . 257 7.6.2. Portability: languages and applications . . . . . . . . . . . . . . . . . 259 7.6.3. Confidence measures. . . . . . . . . . . . . . . . . . . . . . . . . . . . 260 7.6.4. Beyond words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261 7.7. Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261 7.7.1. Text dictation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262 7.7.2. Audio document indexing. . . . . . . . . . . . . . . . . . . . . . . . . 263 7.7.3. Dialog systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265 7.8. Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268 7.9. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270 Chapter 8. Language Identification . . . . . . . . . . . . . . . . . . . . . . . . . 279 Martine ADDA-DECKER 8.1. Introduction. . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . 279 8.2. Language characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281 8.3. Language identification by humans. . . . . . . . . . . . . . . . . . . . . . 286 8.4. Language identification by machines. . . . . . . . . . . . . . . . . . . . . 287 8.4.1. LId tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288 8.4.2. Performance measures . . . . . . . . . . . . . . . . . . . . . . . . . . . 288 8.4.3. Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289 8.5. LId resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290 8.6. LId formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295 8.7. Lid modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298 8.7.1. Acoustic front-end . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299 8.7.2. Acoustic language-specific modeling . . . . . . . . . . . . . . . . . . 300 8.7.3. Parallel phone recognition. . . . . . . . . . . . . . . . . . . . . . . . . 302 8.7.4. Phonotactic modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . 304 8.7.5. Back-end optimization. . . . . . . . . . . . . . . . . . . . . . . . . . . 309 8.8. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309 8.9. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
  • 16. x Spoken Language Processing Chapter 9. Automatic Speaker Recognition . . . . . . . . . . . . . . . . . . . . 321 Frédéric BIMBOT. 9.1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 9.1.1. Voice variability and characterization. . . . . . . . . . . . . . . . . . 321 9.1.2. Speaker recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 9.2. Typology and operation of speaker recognition systems . . . . . . . . . 324 9.2.1. Speaker recognition tasks . . . . . . . . . . . . . . . . . . . . . . . . . 324 9.2.2. Operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325 9.2.3. Text-dependence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326 9.2.4. Types of errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 9.2.5. Influencing factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328 9.3. Fundamentals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329 9.3.1. General structure of speaker recognition systems . . . . . . . . . . . 329 9.3.2. Acoustic analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330 9.3.3. Probabilistic modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 9.3.4. Identification and verification scores . . . . . . . . . . . . . . . . . . 335 9.3.5. Score compensation and decision . . . . . . . . . . . . . . . . . . . . 337 9.3.6. From theory to practice . . . . . . . . . . . . . . . . . . . . . . . . . . 342 9.4. Performance evaluation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343 9.4.1. Error rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343 9.4.2. DET curve and EER . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344 9.4.3. Cost function, weighted error rate and HTER . . . . . . . . . . . . . 346 9.4.4. Distribution of errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . 346 9.4.5. Orders of magnitude . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . 347 9.5. Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348 9.5.1. Physical access control. . . . . . . . . . . . . . . . . . . . . . . . . . . 348 9.5.2. Securing remote transactions . . . . . . . . . . . . . . . . . . . . . . . 349 9.5.3. Audio information indexing. . . . . . . . . . . . . . . . . . . . . . . . 350 9.5.4. Education and entertainment . . . . . . . . . . . . . . . . . . . . . . . 350 9.5.5. Forensic applications. . . . . . . . . . . . . . . . . . . . . . . . . . . . 351 9.5.6. Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352 9.6. Conclusions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352 9.7. Further reading. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353 Chapter 10. Robust Recognition Methods . . . . . . . . . . . . . . . . . . . . . 355 Jean-Paul HATON 10.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355 10.2. Signal pre-processing methods. . . . . . . . . . . . . . . . . . . . . . . . 357 10.2.1. Spectral subtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357 10.2.2. Adaptive noise cancellation . . . . . . . . . . . . . . . . . . . . . . . 358 10.2.3. Space transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . 359 10.2.4. Channel equalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 359 10.2.5. Stochastic models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360 10.3. Robust parameters and distance measures . . . . . . . . . . . . . . . . . 360
  • 17. Table of Contents xi 10.3.1. Spectral representations . . . . . . . . . . . . . . . . . . . . . . . . . 361 10.3.2. Auditory models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364 10.3.3 Distance measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365 10.4. Adaptation methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366 10.4.1 Model composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366 10.4.2. Statistical adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . 367 10.5. Compensation of the Lombard effect . . . . . . . . . . . . . . . . . . . . 368 10.6. Missing data scheme. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369 10.7. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369 10.8. References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370 Chapter 11. Multimodal Speech: Two or Three senses are Better than One . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377 Jean-Luc SCHWARTZ, Pierre ESCUDIER and Pascal TEISSIER 11.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377 11.2. Speech is a multimodal process . . . . . . . . . . . . . . . . . . . . . . . 379 11.2.1. Seeing without hearing . . . . . . . . . . . . . . . . . . . . . . . . . . 379 11.2.2. Seeing for hearing better in noise. . . . . . . . . . . . . . . . . . . . 380 11.2.3. Seeing for better hearing… even in the absence of noise. . . . . . 382 11.2.4. Bimodal integration imposes itself to perception . . . . . . . . . . 383 11.2.5. Lip reading as taking part to the ontogenesis of speech. . . . . . . 385 11.2.6. ...and to its phylogenesis ? . . . . . . . . . . . . . . . . . . . . . . . . 386 11.3. Architectures for audio-visual fusion in speech perception . . . . . . . 388 11.3.1.Three paths for sensory interactions in cognitive psychology . . . 389 11.3.2. 
Three paths for sensor fusion in information processing . . . . . . 390 11.3.3. The four basic architectures for audiovisual fusion . . . . . . . . . 391 11.3.4. Three questions for a taxonomy . . . . . . . . . . . . . . . . . . . . 392 11.3.5. Control of the fusion process . . . . . . . . . . . . . . . . . . . . . . 394 11.4. Audio-visual speech recognition systems . . . . . . . . . . . . . . . . . 396 11.4.1. Architectural alternatives . . . . . . . . . . . . . . . . . . . . . . . . 397 11.4.2. Taking into account contextual information . . . . . . . . . . . . . 401 11.4.3. Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403 11.5. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405 11.6. References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406 Chapter 12. Speech and Human-Computer Communication . . . . . . . . . 417 Wolfgang MINKER & Françoise NÉEL 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417 12.2. Context. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418 12.2.1. The development of micro-electronics. . . . . . . . . . . . . . . . . 419 12.2.2. The expansion of information and communication technologies and increasing interconnection of computer systems . . . . . . . . . . . . . . . 420
  • 18. xii Spoken Language Processing 12.2.3. The coordination of research efforts and the improvement of automatic speech processing systems . . . . . . . . . . . . . . . . . . . . . . 421 12.3. Specificities of speech. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424 12.3.1. Advantages of speech as a communication mode . . . . . . . . . . 424 12.3.2. Limitations of speech as a communication mode . . . . . . . . . . 425 12.3.3. Multidimensional analysis of commercial speech recognition products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427 12.4. Application domains with voice-only interaction. . . . . . . . . . . . . 430 12.4.1. Inspection, control and data acquisition . . . . . . . . . . . . . . . . 431 12.4.2. Home automation: electronic home assistant . . . . . . . . . . . . . 432 12.4.3. Office automation: dictation and speech-to-text systems . . . . . . 432 12.4.4. Training. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435 12.4.5. Automatic translation . . . . . . . . . . . . . . . . . . . . . . . . . . 438 12.5. Application domains with multimodal interaction . . . . . . . . . . . . 439 12.5.1. Interactive terminals . . . . . . . . . . . . . . . . . . . . . . . . . . . 440 12.5.2. Computer-aided graphic design. . . . . . . . . . . . . . . . . . . . . 441 12.5.3. On-board applications . . . . . . . . . . . . . . . . . . . . . . . . . . 442 12.5.4. Human-human communication facilitation . . . . . . . . . . . . . . 444 12.5.5. Automatic indexing of audio-visual documents . . . . . . . . . . . 446 12.6. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446 12.7. References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447 Chapter 13. Voice Services in the Telecom Sector . . . . . . . . . . . . . . . . 455 Laurent COURTOIS, Patrick BRISARD and Christian GAGNOULET 13.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . 455 13.2. Automatic speech processing and telecommunications . . . . . . . . . 456 13.3. Speech coding in the telecommunication sector . . . . . . . . . . . . . 456 13.4. Voice command in telecom services . . . . . . . . . . . . . . . . . . . . 457 13.4.1. Advantages and limitations of voice command . . . . . . . . . . . 457 13.4.2. Major trends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459 13.4.3. Major voice command services . . . . . . . . . . . . . . . . . . . . . 460 13.4.4. Call center automation (operator assistance) . . . . . . . . . . . . . 460 13.4.5. Personal voice phonebook . . . . . . . . . . . . . . . . . . . . . . . . 462 13.4.6. Voice personal telephone assistants . . . . . . . . . . . . . . . . . . 463 13.4.7. Other services based on voice command . . . . . . . . . . . . . . . 463 13.5. Speaker verification in telecom services . . . . . . . . . . . . . . . . . . 464 13.6. Text-to-speech synthesis in telecommunication systems . . . . . . . . 464 13.7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465 13.8. References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466 List of Authors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
Preface

This book, entitled Spoken Language Processing, addresses all aspects of the automatic processing of spoken language: how to automate its production and perception, and how to synthesize and understand it. It draws on existing know-how in signal processing, pattern recognition, stochastic modeling, computational linguistics and human factors, but also relies on knowledge specific to spoken language.

The automatic processing of spoken language covers the analysis of speech, including variable-rate coding for storage or transmission; its synthesis, especially from text; and its recognition and understanding, whether for transcription, possibly followed by automatic indexing, or for human-machine dialog or machine-assisted human-human interaction. It also includes speaker and spoken language recognition. These tasks may take place in a noisy environment, which makes the problem even more difficult.

Activity in the field of automatic spoken language processing started after the Second World War with the work on the Vocoder and Voder at Bell Labs by Dudley and colleagues, and was made possible by the availability of electronic devices. Initial research work on basic recognition systems was carried out with very limited computing resources in the 1950s. The computer facilities that became available to researchers in the 1970s made it possible to achieve initial progress within laboratories, and microprocessors then led to the early commercialization of the first voice recognition and speech synthesis systems at an affordable price. Steady progress in computer speed and storage capacity accompanied the scientific advances in the field.

Research investigations in the 1970s, including those carried out in the large DARPA “Speech Understanding Systems” (SUS) program in the USA, suffered from a lack of available speech data and of means and methods for evaluating
the performance of different approaches and systems. The establishment by DARPA, as part of its following program launched in 1984, of a national language resources center, the Linguistic Data Consortium (LDC), and of a system assessment center, within the National Institute of Standards and Technology (NIST, formerly NBS), brought this area of research to maturity. The evaluation campaigns in the area of speech recognition, launched in 1987, made it possible to compare the different approaches that had coexisted up to then, based either on “Artificial Intelligence” methods or on stochastic modeling methods using large amounts of training data, with a clear advantage for the latter. This led progressively to the near-universal adoption of stochastic approaches in laboratories around the world.

Researchers' progress has kept pace with the increasing difficulty of the tasks addressed, from the recognition of sentences read aloud with a limited vocabulary of 1,000 words, either speaker-dependent or speaker-independent, to the dictation of newspaper articles with vocabularies of 5,000, 20,000 and 64,000 words, and then to the transcription of radio or television broadcast news with unlimited vocabularies. These evaluations were opened to the international community in 1992. They first focused on American English, but early initiatives were also carried out on French, German and British English in a French or European context.

Other campaigns were subsequently held on speaker recognition, language identification and speech synthesis in various contexts, allowing for a better understanding of the pros and cons of each approach, and for measuring the state of the technology and the progress achieved or still to be achieved.
They led to the conclusion that a sufficient level of maturity had been reached to bring the technology to market, in the field of voice dictation systems for example. However, they also revealed the difficulty of other, more challenging problems, such as the recognition of conversational speech, justifying the need to keep supporting fundamental research in this area.

This book consists of two parts: the first discusses the analysis and synthesis of speech, and the second speech recognition and understanding. The first part starts with a brief introduction to the principles of speech production, followed by a broad overview of the methods for analyzing speech: linear prediction, short-term Fourier transform, time-representations, wavelets, cepstrum, etc. The main methods for speech coding are then developed for the telephone bandwidth, such as the CELP coder, or for broadband communication, such as “transform coding” and quantization methods. The audio-visual coding of speech is also introduced. The various operations to be carried out in a text-to-speech synthesis system are then presented, covering the linguistic processes (grapheme-to-phoneme transcription, syntactic and prosodic analysis) and the acoustic processes, using rule-based approaches or approaches based on the concatenation of variable-length acoustic units. The different types of speech signal modeling (articulatory, formant-based, auto-regressive, harmonic-noise or PSOLA-like) are then described. The evaluation of speech synthesis systems receives specific attention in this chapter. The
extension of speech synthesis to talking-face animation is the subject of the next chapter, which presents the application fields, the value of a bimodal approach and the models used to synthesize and animate the face. Finally, computational auditory scene analysis opens prospects in the signal processing of speech, especially in noisy environments.

The second part of the book focuses on speech recognition. The principles of speech recognition are presented first. Hidden Markov models are introduced, as well as their use for the acoustic modeling of speech. The Viterbi algorithm is described, before language modeling and probability estimation are introduced. This is followed by a presentation of recognition systems based on those principles and on the integration of those methodologies with lexical and acoustic-phonetic knowledge. The applicative aspects are highlighted, such as efficiency, portability and confidence measures, before three types of recognition systems are described: for text dictation, for audio document indexing and for oral dialog.

Research in language identification aims at recognizing which language is spoken, using acoustic, phonetic, phonotactic or prosodic information. The characteristics of languages are introduced and the way humans or machines can achieve that task is described, with an extensive presentation of the current performance of such systems.

Speaker recognition addresses the recognition and verification of the identity of a person based on his or her voice. After an introduction to what characterizes a voice, the different types and designs of systems are presented, as well as their theoretical background. The evaluation of speaker recognition systems and the applications of this technology receive specific attention.
The use of speech or speaker recognition systems in noisy environments raises especially difficult problems, which must nevertheless be taken into account in any operational use of such systems. Various methods are available, which operate by pre-processing the signal, during the parameterization phase, through specific distance measures or through adaptation methods. The Lombard effect, which causes a change in the production of the voice signal itself due to the noisy environment surrounding the speaker, receives special attention.

Alongside recognition based solely on the acoustic signal, bimodal recognition combines two acquisition channels: auditory and visual. The value added by bimodal processing in a noisy environment is emphasized, and architectures for the fusion of audio and visual speech recognition are presented.

Finally, applications of automatic spoken language processing systems, for human-machine communication in general and in telecommunications in particular, are described. Applications of speech coding, recognition and synthesis exist in many fields, and the market is growing rapidly. However, there are still technological and psychological barriers that require more work on modeling human factors and ergonomics, in order to make those systems widely accepted.
The reader, whether undergraduate or graduate student, engineer or researcher, will find in this book many contributions from leading French experts of international renown who share the same enthusiasm for this exciting field: the processing by machines of a capacity that used to be specific to humans, namely language.

Finally, as editor, I would like to warmly thank Anna and Frédéric Bimbot for their excellent work in translating the book Traitement automatique du langage parlé, on which this book is based.

Joseph Mariani
November 2008
Chapter 1

Speech Analysis

Chapter written by Christophe D’ALESSANDRO.

1.1. Introduction

1.1.1. Source-filter model

Speech, the acoustic manifestation of language, is probably the main means of communication between human beings. The invention of telecommunications and the development of digital information processing have therefore entailed vast amounts of research aimed at understanding the mechanisms of speech communication.

Speech can be approached from different angles. In this chapter, we will consider speech as a signal, a one-dimensional function of the time variable (as in [BOI 87, OPP 89, PAR 86, RAB 75, RAB 77]). The acoustic speech signal is obtained at a given point in space by a sensor (microphone) and converted into electrical values. These values are denoted s(t) and they represent a real-valued function of the real variable t, analogous to the variation of the acoustic pressure. Even if the acoustic form of the speech signal is the most widespread (it is the only signal transmitted over the telephone), other types of analysis also exist, based on alternative physiological signals (for instance, the electroglottographic signal, the palatographic signal, the airflow), or related to other modalities (for example, the image of the face or the gestures of the articulators).

The field of speech analysis covers the set of methods aiming at the extraction of information on and from this signal, in various applications, such as:
  • 24.
– speech coding: the compression of information carried by the acoustic signal, in order to save data storage or to reduce transmission rate;
– speech recognition and understanding, speaker and spoken language recognition;
– speech synthesis or automatic speech generation, from an arbitrary text;
– speech signal processing, which covers many applications, such as auditory aid, denoising, speech encryption, echo cancellation, post-processing for audiovisual applications;
– phonetic and linguistic analysis, speech therapy, voice monitoring in professional situations (for instance, singers, speakers, teachers, managers, etc.).
Two ways of approaching signal analysis can be distinguished: the model-based approach and the representation-based approach. When a voice signal model (or a voice production model or a voice perception model) is assumed, the goal of the analysis step is to identify the parameters of that model. Thus, many analysis methods, referred to as parametric methods, are based on the source-filter model of speech production; for example, the linear prediction method. On the other hand, when no particular hypothesis is made on the signal, mathematical representations equivalent to its time representation can be defined, so that new information can be drawn from the coefficients of the representation. An example of a non-parametric method is the short-term Fourier transform (STFT). Finally, there are some hybrid methods (sometimes referred to as semi-parametric). These consist of estimating some parameters from non-parametric representations. The sinusoidal and cepstral representations are examples of semi-parametric representations.
This chapter is centered on the linear acoustic source-filter speech production model. It presents the most common speech signal analysis techniques, together with a few illustrations.
The reader is assumed to be familiar with the fundamentals of digital signal processing, such as discrete-time signals, Fourier transform, Laplace transform, Z-transforms and digital filters. 1.1.2. Speech sounds The human speech apparatus can be broken down into three functional parts [HAR 76]: 1) the lungs and trachea, 2) the larynx and 3) the vocal tract. The abdomen and thorax muscles are the engine of the breathing process. Compressed by the muscular system, the lungs act as bellows and supply some air under pressure which travels through the trachea (subglottic pressure). The airflow thus expired is then modulated by the movements of the larynx and those of the vocal tract.
  • 25. The larynx is composed of the set of muscles, articulated cartilage, ligaments and mucous membranes located between the trachea on one side, and the pharyngeal cavity on the other side. The cartilage, ligaments and muscles in the larynx can set the vocal cords in motion, the opening of which is called the glottis. When the vocal cords lie apart from each other, the air can circulate freely through the glottis and no sound is produced. When both membranes are close to each other, they can join and modulate the subglottic airflow and pressure, thus generating isolated pulses or vibrations. The fundamental frequency of these vibrations (F0) governs the pitch of the voice signal. The vocal tract can be subdivided into three cavities: the pharynx (from the larynx to the velum and the back of the tongue), the oral tract (from the pharynx to the lips) and the nasal cavity. When it is open, the velum is able to divert some air from the pharynx to the nasal cavity. The geometrical configuration of the vocal tract depends on the organs responsible for the articulation: jaws, lips, tongue. Each language uses a certain subset of sounds, among those that the speech apparatus can produce [MAL 74]. The smallest distinctive sound units used in a given language are called phonemes. The phoneme is the smallest spoken unit which, when substituted with another one, changes the linguistic content of an utterance. For instance, changing the initial /p/ sound of “pig” (/pIg/) into /b/ yields a different word: “big” (/bIg/). Therefore, the phonemes /p/ and /b/ can be distinguished from each other. A set of phonemes, which can be used for the description of various languages [WEL 97], is given in Table 1.1 (described both by the International Phonetic Alphabet, IPA, and the computer-readable Speech Assessment Methodologies Phonetic Alphabet, SAMPA). 
The first subdivision that is observed relates to the excitation mode and to the vocal tract stability: the distinction between vowels and consonants. Vowels correspond to a periodic vibration of the vocal cords and to a stable configuration of the vocal tract. Depending on whether the nasal branch is open or not (as a result of the lowering of the velum), vowels have either a nasal or an oral character. Semivowels are produced when the periodic glottal excitation occurs simultaneously with a fast movement of the vocal tract, between two vocalic positions. Consonants correspond to fast constriction movements of the articulatory organs, i.e. generally to rather unstable sounds, which evolve over time. For fricatives, a strong constriction of the vocal tract causes a friction noise. If the vocal cords vibrate at the same time, the fricative consonant is then voiced. Otherwise, if the vocal folds let the air pass through without producing any sound, the fricative is unvoiced. Plosives are obtained by a complete obstruction of the vocal tract, followed by a release phase. If produced together with the vibration of the vocal
  • 26. cords, the plosive is voiced, otherwise it is unvoiced. If the nasal branch is opened during the mouth closure, the produced sound is a nasal consonant. Semivowels are considered voiced consonants, resulting from a fast movement which briefly passes through the articulatory position of a vowel. Finally, liquid consonants are produced as the combination of a voiced excitation and fast articulatory movements, mainly from the tongue.

SAMPA symbol | ASCII (dec.) | IPA | Unicode label | Unicode hex | Unicode dec. | Description and exemplification
Vowels
A | 65 | ɑ | script a | 0251 | 593 | open back unrounded, Cardinal 5, Eng. start
{ | 123 | æ | ae ligature | 00E6 | 230 | near-open front unrounded, Eng. trap
6 | 54 | ɐ | turned a | 0250 | 592 | open schwa, Ger. besser
Q | 81 | ɒ | turned script a | 0252 | 594 | open back rounded, Eng. lot
E | 69 | ɛ | epsilon | 025B | 603 | open-mid front unrounded, Fr. même
@ | 64 | ə | turned e | 0259 | 601 | schwa, Eng. banana
3 | 51 | ɜ | rev. epsilon | 025C | 604 | long mid central, Eng. nurse
I | 73 | ɪ | small cap I | 026A | 618 | lax close front unrounded, Eng. kit
O | 79 | ɔ | turned c | 0254 | 596 | open-mid back rounded, Eng. thought
2 | 50 | ø | o-slash | 00F8 | 248 | close-mid front rounded, Fr. deux
9 | 57 | œ | oe ligature | 0153 | 339 | open-mid front rounded, Fr. neuf
& | 38 | ɶ | s.c. OE ligature | 0276 | 630 | open front rounded, Swedish skörd
U | 85 | ʊ | upsilon | 028A | 650 | lax close back rounded, Eng. foot
} | 125 | ʉ | barred u | 0289 | 649 | close central rounded, Swedish sju
V | 86 | ʌ | turned v | 028C | 652 | open-mid back unrounded, Eng. strut
Y | 89 | ʏ | small cap Y | 028F | 655 | lax [y], Ger. hübsch
  • 27. Consonants
B | 66 | β | beta | 03B2 | 946 | voiced bilabial fricative, Sp. cabo
C | 67 | ç | c-cedilla | 00E7 | 231 | voiceless palatal fricative, Ger. ich
D | 68 | ð | eth | 00F0 | 240 | voiced dental fricative, Eng. then
G | 71 | ɣ | gamma | 0263 | 611 | voiced velar fricative, Sp. fuego
L | 76 | ʎ | turned y | 028E | 654 | palatal lateral, It. famiglia
J | 74 | ɲ | left-tail n | 0272 | 626 | palatal nasal, Sp. año
N | 78 | ŋ | eng | 014B | 331 | velar nasal, Eng. thing
R | 82 | ʁ | inv. s.c. R | 0281 | 641 | voiced uvular fricative or trill, Fr. roi
S | 83 | ʃ | esh | 0283 | 643 | voiceless palatoalveolar fricative, Eng. ship
T | 84 | θ | theta | 03B8 | 952 | voiceless dental fricative, Eng. thin
H | 72 | ɥ | turned h | 0265 | 613 | labial-palatal semivowel, Fr. huit
Z | 90 | ʒ | ezh (yogh) | 0292 | 658 | voiced palatoalveolar fricative, Eng. measure
? | 63 | ʔ | dotless ? | 0294 | 660 | glottal stop, Ger. Verein, also Danish stød

Table 1.1. Computer-readable Speech Assessment Methodologies Phonetic Alphabet, SAMPA, and its correspondence in the International Phonetic Alphabet, IPA, with examples in 6 different languages [WEL 97]

In speech production, sound sources appear to be relatively localized; they excite the acoustic cavities in which the resulting air disturbances propagate and then radiate to the outer acoustic field. This relative independence of the sources from the transformations that they undergo is the basis for the acoustic theory of speech production [FAN 60, FLA 72, STE 99]. This theory considers source terms, on the one hand, which are generally assumed to be non-linear, and a linear filter on the other hand, which acts upon and transforms the source signal. This source-filter decomposition reflects the terminology commonly used in phonetics, which describes the speech sounds in terms of “phonation” (source) and “articulation” (filter). The source and filter acoustic contributions can be studied separately, as they can be considered to be decoupled from each other, in a first approximation. 
From the point of view of physics, this model is an approximation, the main advantage of which is its simplicity. It can be considered as valid at frequencies below 4 or 5 kHz, i.e. those frequencies for which the propagation in the vocal tract consists of one-dimensional plane waves. For signal processing purposes, the
  • 28. acoustic model can be described as a linear system, by neglecting the source-filter interaction:
$$s(t) = e(t) * v(t) * l(t) = [p(t) + r(t)] * v(t) * l(t)$$ [1.1]
$$s(t) = \left[ \left( \sum_{i=-\infty}^{+\infty} \delta(t - iT_0) \right) * u_g(t) + r(t) \right] * v(t) * l(t)$$ [1.2]
$$S(\omega) = E(\omega) \times V(\omega) \times L(\omega) = [P(\omega) + R(\omega)] \times V(\omega) \times L(\omega)$$ [1.3]
$$S(\omega) = \left[ \left( \sum_{i=-\infty}^{+\infty} \delta(\omega - iF_0) \right) |U_g(\omega)|\, e^{j\theta_{u_g}(\omega)} + |R(\omega)|\, e^{j\theta_r(\omega)} \right] \times |V(\omega)|\, e^{j\theta_v(\omega)} \times |L(\omega)|\, e^{j\theta_l(\omega)}$$ [1.4]
where $s(t)$ is the speech signal, $v(t)$ the impulse response of the vocal tract, $e(t)$ the vocal excitation source, $l(t)$ the impulse response of the lip radiation component, $p(t)$ the periodic part of the excitation, $r(t)$ the non-periodic (noise) part of the excitation, $u_g(t)$ the glottal airflow wave, $T_0$ the fundamental period, $\delta$ the Dirac distribution, $*$ the convolution product, and where $S(\omega)$, $V(\omega)$, $E(\omega)$, $L(\omega)$, $P(\omega)$, $R(\omega)$, $U_g(\omega)$ denote the Fourier transforms of $s(t)$, $v(t)$, $e(t)$, $l(t)$, $p(t)$, $r(t)$, $u_g(t)$ respectively. $F_0 = 1/T_0$ is the voicing fundamental frequency. The various terms of the source-filter model are now going to be studied in more detail.
1.1.3. Sources
The source component $e(t)$, $E(\omega)$ is a signal composed of a periodic part (vibrations of the vocal cords, characterized by $F_0$ and the glottal airflow waveform) and a noise part. The various phonemes use both types of source excitation either separately or simultaneously.
1.1.3.1. Glottal airflow wave
The study of glottal activity (phonation) is particularly important in speech science. Physical models of the functioning of the glottis, in terms of mass-spring systems, have been investigated [FLA 72]. Several types of physiological signals can be used to conduct studies on the glottal activity (for example, electroglottography, fast photography, see [TIT 94]). 
From the acoustic point of view, the glottal airflow wave, which represents the airflow traveling through the glottis as a function of time, is preferred to the pressure wave. It is indeed easier to measure the glottal
  • 29. airflow rather than the glottal pressure, from physiological data. Moreover, the pseudo-periodic voicing source $p(t)$ can be broken down into two parts: a pulse train, which represents the periodic part of the excitation, and a low-pass filter, with an impulse response $u_g$, which corresponds to the (frequency-domain and time-domain) shape of the glottal airflow wave. The time-domain shape of the glottal airflow wave (or, more precisely, of its derivative) generally governs the behavior of the time-domain signal for vowels and voiced signals [ROS 71]. Time-domain models of the glottal airflow have several properties in common: they are periodic, always non-negative (no incoming airflow), and they are continuous functions of the time variable, differentiable everywhere except, in some cases, at the closing instant. An example of such a time-domain model is the Klatt model [KLA 90], which calls for 4 parameters (the fundamental frequency $F_0$, the voicing amplitude $AV$, the opening ratio $O_q$ and the frequency $TL$ of a spectral attenuation filter). When there is no attenuation, the KGLOTT88 model is written:
$$U_g(t) = \begin{cases} a t^2 - b t^3 & \text{for } 0 \le t \le O_q T_0 \\ 0 & \text{for } O_q T_0 \le t \le T_0 \end{cases} \quad \text{with} \quad a = \frac{27\, AV}{4\, O_q^2\, T_0}, \quad b = \frac{27\, AV}{4\, O_q^3\, T_0^2}$$ [1.5]
When $TL \neq 0$, $U_g(t)$ is filtered by an additional low-pass filter, with an attenuation at 3,000 Hz equal to $TL$ dB. The LF model [FAN 85] represents the derivative of the glottal airflow with 5 parameters (fundamental period $T_0$, amplitude at the minimum of the derivative or at the maximum of the wave $E_e$, instant of maximum excitation $T_e$, instant of maximum airflow wave $T_p$, time constant for the return phase $T_a$):
$$U_g'(t) = \begin{cases} -E_e\, e^{a(t - T_e)}\, \dfrac{\sin(\pi t / T_p)}{\sin(\pi T_e / T_p)} & \text{for } 0 \le t \le T_e \\[2ex] -\dfrac{E_e}{\varepsilon T_a} \left( e^{-\varepsilon (t - T_e)} - e^{-\varepsilon (T_0 - T_e)} \right) & \text{for } T_e \le t \le T_0 \end{cases}$$ [1.6]
In this equation, parameter $\varepsilon$ is defined by an implicit equation:
$$\varepsilon T_a = 1 - e^{-\varepsilon (T_0 - T_e)}$$ [1.7]
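The polynomial model of equation [1.5] is simple enough to be evaluated directly. The following is a minimal sketch, not taken from the original text: the function name `kglott88` and the parameter values (F0 = 100 Hz, AV = 1, Oq = 0.6) are illustrative assumptions, and the additional TL low-pass filter is left out.

```python
import numpy as np

def kglott88(t, f0, av, oq):
    """Glottal airflow of equation [1.5] for times t (in seconds) within one period.

    f0: fundamental frequency (Hz), av: voicing amplitude, oq: opening ratio.
    The TL attenuation filter is not modeled here (assumption: TL = 0).
    """
    t0 = 1.0 / f0
    a = 27.0 * av / (4.0 * oq**2 * t0)
    b = 27.0 * av / (4.0 * oq**3 * t0**2)
    t = np.asarray(t, dtype=float)
    open_phase = (t >= 0.0) & (t <= oq * t0)
    # Polynomial pulse during the open phase, zero during the closed phase
    return np.where(open_phase, a * t**2 - b * t**3, 0.0)

# Illustrative values (assumptions, not from the text)
f0, av, oq = 100.0, 1.0, 0.6
t = np.linspace(0.0, 1.0 / f0, 1001)
ug = kglott88(t, f0, av, oq)
```

With these definitions of a and b, the pulse returns to zero at t = Oq·T0 and reaches its maximum, equal to AV·T0, at two thirds of the open phase, which gives a quick sanity check on any implementation.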
  • 30. All time-domain models (see Figure 1.1) have at least three main parameters: the voicing amplitude, which governs the time-domain amplitude of the wave, the voicing period, and the opening duration, i.e. the fraction of the period during which the wave is non-zero. In fact, the glottal wave represents the airflow traveling through the glottis. This flow is zero when the vocal cords are closed. It is positive when they are open. A fourth parameter is introduced in some models to account for the speed at which the glottis closes. This closing speed is related to the high frequency part of the speech spectrum.
Figure 1.1. Models of the glottal airflow waveform in the time domain: triangular model, Rosenberg model, KGLOTT88, LF and the corresponding spectra
  • 31. The general shape of the glottal airflow spectrum is one of a low-pass filter. Fant [FAN 60] uses four poles on the negative real axis:
$$U_g(s) = \frac{U_{g0}}{\prod_{r=1}^{4} \left( 1 - \dfrac{s}{s_r} \right)}$$ [1.8]
with $s_{r1} \approx s_{r2} = 2\pi \times 100$ Hz, and $s_{r3} = 2\pi \times 2{,}000$ Hz, $s_{r4} = 2\pi \times 4{,}000$ Hz. This is a spectral model with six parameters ($F_0$, $U_{g0}$ and four poles), among which two are fixed ($s_{r3}$ and $s_{r4}$). This simple form is used in [MAR 76] in the digital domain, as a second-order low-pass filter, with a double real pole in $K$:
$$U_g(z) = \frac{U_{g0}}{(1 - K z^{-1})^2}$$ [1.9]
Two poles are sufficient in this case, as the numerical model is only valid up to approximately 4,000 Hz. Such a filter depends on three parameters: gain $U_{g0}$, which corresponds to the voicing amplitude, fundamental frequency $F_0$ and a frequency parameter $K$, which replaces both $s_{r1}$ and $s_{r2}$. The spectrum shows an asymptotic slope of –12 dB/octave when the frequency increases. Parameter $K$ controls the filter’s cut-off frequency. When the frequency tends towards zero, $|U_g(0)| \sim U_{g0}$. Therefore, the spectral slope is zero in the neighborhood of zero, and –12 dB/octave for frequencies above a given bound (determined by $K$). When the focus is put on the derivative of the glottal airflow, the two asymptotes have slopes of +6 dB/octave and –6 dB/octave respectively. This explains the existence of a maximum in the speech spectrum at low frequencies, stemming from the glottal source.
Another way to calculate the glottal airflow spectrum is to start with time-domain models. For the Klatt model, for example, the following expression is obtained for the Laplace transform $\mathcal{L}$, when there is no additional spectral attenuation ($n_g'$ being understood here as the derivative of the glottal waveform normalized in time and amplitude):
$$\mathcal{L}(n_g')(s) = \frac{27}{4}\, \frac{1}{s} \left( e^{-s} + \frac{2\,(1 + 2 e^{-s})}{s} - \frac{6\,(1 - e^{-s})}{s^2} \right)$$ [1.10]
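Equation [1.10] can be checked numerically. The sketch below assumes that the normalized waveform is n_g(t) = (27/4)(t² − t³) on the interval [0, 1], i.e. equation [1.5] with Oq·T0 = 1 and unit peak amplitude; this normalization is an assumption made for the purpose of the check, and the function names are illustrative. The closed form is compared with a direct midpoint-rule integration of n_g'(t)·exp(−st).

```python
import numpy as np

def closed_form(s):
    """Right-hand side of equation [1.10] for a complex value s."""
    e = np.exp(-s)
    return 27.0 / 4.0 / s * (e + 2.0 * (1.0 + 2.0 * e) / s - 6.0 * (1.0 - e) / s**2)

def numerical(s, n=200_000):
    """Midpoint-rule integral of n_g'(t) * exp(-s t) over the open phase [0, 1]."""
    t = (np.arange(n) + 0.5) / n
    # Derivative of the assumed normalized pulse (27/4)(t^2 - t^3)
    dng = 27.0 / 4.0 * (2.0 * t - 3.0 * t**2)
    return np.mean(dng * np.exp(-s * t))

s = 2j * np.pi * 1.5   # arbitrary test point on the imaginary axis
difference = abs(closed_form(s) - numerical(s))
```

The two values agree to within the accuracy of the numerical integration. The closed form involves differences of nearly equal terms when |s| is small, so a series expansion would be preferable close to s = 0.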
  • 32. Figure 1.2. Schematic spectral representation of the glottal airflow waveform. Solid line: abrupt closure of the vocal cords (minimum spectral slope). Dashed line: damped closure. The cut-off frequency due to this damping is equal to 4 times the spectral maximum $F_g$
It can be shown that this is a low-pass spectrum. The derivative of the glottal airflow shows a spectral maximum located at:
$$f_g = \frac{\sqrt{3}}{\pi}\, \frac{1}{O_q T_0}$$ [1.11]
This sheds light on the links between time-domain and frequency-domain parameters: the opening ratio (i.e. the ratio between the opening duration of the glottis and the overall glottal period) governs the spectral peak frequency. The time-domain amplitude rules the frequency-domain amplitude. The closing speed of the vocal cords relates directly to the spectral attenuation in the high frequencies, which shows a minimum slope of –12 dB/octave.
1.1.3.2. Noise sources
The periodic vibration of the vocal cords is not the only sound source in speech. Noise sources are involved in the production of several phonemes. Two types of noise can be observed: transient noise and continuous noise. When a plosive is produced, the holding phase (total obstruction of the vocal tract) is followed by a release phase. A transient noise is then produced by the pressure and airflow
  • 33. impulse generated by the opening of the obstruction. The source is located in the vocal tract, at the point where the obstruction and release take place. The impulse is a wide-band noise which varies slightly with the plosive. For continuous noise (fricatives), the sound originates from turbulence in the fast airflow at the level of the constriction. Shadle [SHA 90] distinguishes noise caused by the lining and noise caused by obstacles, depending on the incidence angle of the air stream on the constriction. In both cases, the turbulence produces a source of random acoustic pressure downstream of the constriction. The power spectrum of this signal is approximately flat in the range of 0 to 4,000 Hz, and then decreases with frequency. When the constriction is located at the glottis, the resulting noise (aspiration noise) shows a wide-band spectral maximum around 2,000 Hz. When the constriction is in the vocal tract, the resulting noise (frication noise) also shows a roughly flat spectrum, either slowly decreasing or with a wide maximum somewhere between 4 kHz and 9 kHz. The position of this maximum depends on the fricative. The excitation source for continuous noise can thus be considered as a white Gaussian noise filtered by a low-pass filter or by a wide band-pass filter (several kHz wide). In continuous speech, it is interesting to separate the periodic and non-periodic contributions of the excitation. For this purpose, either the sinusoidal representation [SER 90] or the short-term Fourier spectrum [DAL 98, YEG 98] can be used. The principle is to subtract from the source signal its harmonic component, in order to obtain the non-periodic component. Such a separation process is illustrated in Figure 1.3.
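The subtraction principle can be illustrated on a synthetic signal. The sketch below is a deliberately simplified stand-in, not the method of [SER 90], [DAL 98] or [YEG 98]: it assumes that F0 is known and constant over the frame, fits a fixed number of harmonics by least squares, and takes the residual as the non-periodic part. The function name and all numerical values are illustrative assumptions.

```python
import numpy as np

def split_periodic(x, fs, f0, n_harmonics):
    """Least-squares fit of harmonics of a known f0; returns (periodic, residual)."""
    t = np.arange(len(x)) / fs
    k = np.arange(1, n_harmonics + 1)
    phase = 2.0 * np.pi * f0 * np.outer(t, k)
    # One cosine and one sine column per harmonic
    basis = np.hstack([np.cos(phase), np.sin(phase)])
    coeffs, *_ = np.linalg.lstsq(basis, x, rcond=None)
    periodic = basis @ coeffs
    return periodic, x - periodic

# Synthetic test frame: two harmonics plus weak white Gaussian noise (assumed values)
fs, f0 = 16000.0, 100.0
t = np.arange(1600) / fs
rng = np.random.default_rng(0)
harmonic = np.cos(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
noise = 0.01 * rng.standard_normal(t.size)
periodic, residual = split_periodic(harmonic + noise, fs, f0, n_harmonics=5)
```

On real speech, F0 varies within the frame and has to be estimated first, which is where the cited methods differ from this sketch.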
  • 34. Figure 1.3. Spectrum of the excitation source for a vowel. (A) the complete spectrum; (B) the non-periodic part; (C) the periodic part
1.1.4. Vocal tract
The vocal tract is an acoustic cavity. In the source-filter model, it plays the role of a filter, i.e. a passive system which is independent from the source. Its function consists of transforming the source signal, by means of resonances and anti-resonances. The maxima of the vocal tract’s spectral gain are called spectral formants, or more simply formants. Formants can generally be assimilated to the spectral maxima which can be observed on the speech spectrum, as the source spectrum is globally monotonic for voiced speech. However, depending on the
  • 35. source spectrum, formants and resonances may turn out to be shifted. Furthermore, in some cases, a source formant can be present. Formants are also observed in unvoiced speech segments, at least those that correspond to cavities located in front of the constriction, and thus excited by the noise source.
1.1.4.1. Multi-tube model
The vocal tract is an acoustic duct with a complex shape. At a first level of approximation, its acoustic behavior may be understood to be one of an acoustic tube. Hypotheses must be made to calculate the propagation of an acoustic wave through this tube:
– the tube is cylindrical, with a constant area section $A$;
– the tube walls are rigid (i.e. no vibration terms at the walls);
– the propagation mode is one-dimensional plane waves. This assumption is satisfied if the transverse dimension of the tube is small, compared to the considered wavelengths, which correspond in practice to frequencies below 4,000 Hz for a typical vocal tract (i.e. a length of 17.6 cm and a section of 8 cm² for the neutral vowel);
– the process is adiabatic (i.e. no loss by thermal conduction);
– the hypothesis of small movements is made (i.e. second-order terms can be neglected).
Let $A$ denote the (constant) section of the tube, $x$ the abscissa along the tube, $t$ the time, $p(x, t)$ the pressure, $u(x, t)$ the speed of the air particles, $U(x, t)$ the volume velocity, $\rho$ the density, $L$ the tube length and $C$ the speed of sound in the air (approximately 340 m/s). 
The equations governing the propagation of a plane wave in a tube (Webster equations) are:
$$\frac{\partial^2 p}{\partial x^2} = \frac{1}{C^2} \frac{\partial^2 p}{\partial t^2} \quad \text{and} \quad \frac{\partial^2 u}{\partial x^2} = \frac{1}{C^2} \frac{\partial^2 u}{\partial t^2}$$ [1.12]
This result is obtained by studying an infinitesimal variation of the pressure, the air particle speed and the density: $p(x, t) = p_0 + \partial p(x, t)$, $u(x, t) = u_0 + \partial u(x, t)$, $\rho(x, t) = \rho_0 + \partial\rho(x, t)$, in conjunction with two fundamental laws of physics:
1) the conservation of mass entering a slice of the tube comprised between $x$ and $x + dx$: $A\, \partial x\, \partial\rho = -\rho A\, \partial u\, \partial t$. By neglecting the second-order term ($\partial\rho\, \partial u\, \partial t$), by using the ideal gas law and the fact that the process is adiabatic ($p/\rho = C^2$), this equation can be rewritten $\partial p / (C^2 \partial t) = -\rho_0\, \partial u / \partial x$;
  • 36. 2) Newton’s second law applied to the air in the slice of tube yields: $A\, \partial p = -\rho A\, \partial x\, (\partial u / \partial t)$, thus $\partial p / \partial x = -\rho_0\, \partial u / \partial t$.
The solutions of these equations are formed by any linear combination of functions $f(t)$ and $g(t)$ of a single variable, twice continuously differentiable, written as a forward wave and a backward wave which propagate at the speed of sound:
$$f^{+}(x, t) = f\left(t - \frac{x}{C}\right) \quad \text{and} \quad f^{-}(x, t) = g\left(t + \frac{x}{C}\right)$$ [1.13]
and thus the pressure in the tube can be written:
$$p(x, t) = f\left(t - \frac{x}{C}\right) + g\left(t + \frac{x}{C}\right)$$ [1.14]
It is easy to verify that function $p$ satisfies equation [1.12]. Moreover, functions $f$ and $g$ satisfy:
$$\frac{\partial f(t - x/C)}{\partial t} = -C\, \frac{\partial f(t - x/C)}{\partial x} \quad \text{and} \quad \frac{\partial g(t + x/C)}{\partial t} = C\, \frac{\partial g(t + x/C)}{\partial x}$$ [1.15]
which, when combined for example with Newton’s second law, yields the following expression for the volume velocity (the tube having a constant section $A$):
$$U(x, t) = \frac{A}{\rho C} \left[ f\left(t - \frac{x}{C}\right) - g\left(t + \frac{x}{C}\right) \right]$$ [1.16]
It must be noted that if the pressure is the sum of a forward function and a backward function, the volume velocity is the difference between these two functions. The expression $Z_c = \rho C / A$ is the ratio between the pressure and the volume velocity, which is called the characteristic acoustic impedance of the tube. In general, the acoustic impedance is defined in the frequency domain. Here, the term “impedance” is used in the time domain, as the ratio between the forward and backward parts of the pressure and the volume velocity. The following electroacoustical analogies are often used: “acoustic pressure” for “voltage”; “acoustic volume velocity” for “current”.
The vocal tract can be considered as the concatenation of cylindrical tubes, each of them having a constant area section $A$, and all tubes being of the same length. Let $\Delta$ denote the length of each tube. The vocal tract is considered as being composed of $p$ sections, numbered from 1 to $p$, starting from the lips and going towards the glottis. 
For each section n, the forward and backward waves (respectively from the
  • 37. glottis to the lips and from the lips to the glottis) are denoted $f_n$ and $b_n$. These waves are defined at the section input, from $n+1$ to $n$ (on the left of the section, if the glottis is on the left). Let $R_n = \rho C / A_n$ denote the acoustic impedance of the section, which depends only on its area section. Each section can then be considered as a two-port network (quadripole) with two inputs $f_{n+1}$ and $b_{n+1}$, two outputs $f_n$ and $b_n$ and a transfer matrix $T_{n+1}$:
$$\begin{bmatrix} f_n \\ b_n \end{bmatrix} = T_{n+1} \begin{bmatrix} f_{n+1} \\ b_{n+1} \end{bmatrix}$$ [1.17]
For a given section, the transfer matrix can be broken down into two terms. Both the interface with the previous section (1) and the behavior of the waves within the section (2) must be taken into account:
1) At the level of the discontinuity between sections $n$ and $n+1$, the following relations hold, on the left and on the right, for the pressure and the volume velocity:
$$\begin{cases} p_n = R_n (f_n + b_n) \\ U_n = f_n - b_n \end{cases} \quad \text{and} \quad \begin{cases} p_{n+1} = R_{n+1} (f_{n+1} + b_{n+1}) \\ U_{n+1} = f_{n+1} - b_{n+1} \end{cases}$$ [1.18]
As the pressure and the volume velocity are both continuous at the junction, we have $R_{n+1}(f_{n+1} + b_{n+1}) = R_n (f_n + b_n)$ and $f_{n+1} - b_{n+1} = f_n - b_n$, which enables the transfer matrix at the interface to be calculated as:
$$\begin{bmatrix} f_n \\ b_n \end{bmatrix} = \frac{1}{2 R_n} \begin{bmatrix} R_{n+1} + R_n & R_{n+1} - R_n \\ R_{n+1} - R_n & R_{n+1} + R_n \end{bmatrix} \begin{bmatrix} f_{n+1} \\ b_{n+1} \end{bmatrix}$$ [1.19]
After defining the acoustic reflection coefficient $k$, the transfer matrix $T^{(1)}_{n+1}$ at the interface is:
$$T^{(1)}_{n+1} = \frac{1}{1 + k} \begin{bmatrix} 1 & -k \\ -k & 1 \end{bmatrix} \quad \text{with} \quad k = \frac{R_n - R_{n+1}}{R_n + R_{n+1}} = \frac{A_{n+1} - A_n}{A_{n+1} + A_n}$$ [1.20]
2) Within the tube of section $n+1$, the waves are simply subject to propagation delays, thus:
$$f_n(t) = f_{n+1}\left(t - \frac{\Delta}{C}\right) \quad \text{and} \quad b_n(t) = b_{n+1}\left(t + \frac{\Delta}{C}\right)$$ [1.21]
  • 38. The phase delays and advances of the wave are all dependent on the same quantity $\Delta / C$. The signal can thus be sampled at a sampling frequency $F_s = C / (2\Delta)$, i.e. with a sampling period $2\Delta / C$, which corresponds to a wave traveling back and forth in a section. Therefore, the z-transform of equations [1.21] can be considered as a delay (respectively an advance) of $\Delta / C$ corresponding to a factor $z^{-1/2}$ (respectively $z^{1/2}$):
$$F_n(z) = F_{n+1}(z)\, z^{-1/2} \quad \text{and} \quad B_n(z) = B_{n+1}(z)\, z^{1/2}$$ [1.22]
from which the transfer matrix $T^{(2)}_{n+1}$ corresponding to the propagation in section $n+1$ can be deduced. In the z-transform domain, the total transfer matrix $T_{n+1}$ for section $n+1$ is the product of $T^{(1)}_{n+1}$ and $T^{(2)}_{n+1}$:
$$T_{n+1} = \frac{1}{1 + k} \begin{bmatrix} 1 & -k \\ -k & 1 \end{bmatrix} \begin{bmatrix} z^{-1/2} & 0 \\ 0 & z^{1/2} \end{bmatrix} = \frac{z^{-1/2}}{1 + k} \begin{bmatrix} 1 & -kz \\ -k & z \end{bmatrix}$$ [1.23]
The overall volume velocity transfer matrix for the $p$ tubes (from the glottis to the lips) is finally obtained as the product of the matrices for each tube:
$$\begin{bmatrix} f_0 \\ b_0 \end{bmatrix} = T \begin{bmatrix} f_p \\ b_p \end{bmatrix} \quad \text{with} \quad T = \prod_{i=1}^{p} T_i$$ [1.24]
The properties of the volume velocity transfer function for the tube (from the glottis to the lips), defined as $A_u = (f_0 - b_0)/(f_p - b_p)$, can be derived from this result. For this purpose, the lip termination has to be calculated, i.e. the interface between the last tube and the outside of the mouth. Let $(f_l, b_l)$ denote the volume velocity waves at the level of the outer interface and $(f_0, b_0)$ the waves at the inner interface. Outside of the mouth, the backward wave $b_l$ is zero. Therefore, $b_0$ and $f_0$ are linearly dependent and a reflection coefficient at the lips can be defined as $k_l = b_0 / f_0$. Then, transfer function $A_u$ can be calculated by inverting $T$, according to the coefficients of matrix $T$ and the reflection coefficient at the lips $k_l$:
$$A_u = \frac{\det(T)\,(1 - k_l)}{T_{21} + T_{22} - k_l\,(T_{11} + T_{12})}$$ [1.25]
It can be verified that the determinant of $T$ does not depend on $z$, as the determinant of each elementary tube does not depend on $z$ either. 
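The chain of transfer matrices can be evaluated numerically at a given frequency. The sketch below is one possible reading of equations [1.20] to [1.25], under stated assumptions: the areas are listed from the lips towards the glottis, no area step is placed at the inner lip interface, and the lips are treated as an ideal open end (zero pressure, hence k_l = −1 by default). The function name and signature are illustrative.

```python
import numpy as np

def volume_velocity_gain(areas, freq_hz, length_m, k_lips=-1.0, c=340.0):
    """A_u = (f0 - b0)/(fp - bp) for equal-length sections listed from lips to glottis."""
    p = len(areas)
    delta = length_m / p
    # z^(1/2) on the unit circle, for a sampling frequency Fs = C / (2 * delta)
    zh = np.exp(2j * np.pi * freq_hz * delta / c)
    delay = np.array([[1.0 / zh, 0.0], [0.0, zh]])        # T^(2), equation [1.22]
    T = np.eye(2, dtype=complex)
    for n in range(p):
        if n == 0:
            junction = np.eye(2)   # assumption: no area step at the inner lip interface
        else:
            k = (areas[n] - areas[n - 1]) / (areas[n] + areas[n - 1])
            junction = np.array([[1.0, -k], [-k, 1.0]]) / (1.0 + k)   # T^(1), [1.20]
        T = T @ (junction @ delay)                         # product of equation [1.24]
    det = T[0, 0] * T[1, 1] - T[0, 1] * T[1, 0]
    # Equation [1.25]
    return det * (1.0 - k_lips) / (T[1, 0] + T[1, 1] - k_lips * (T[0, 0] + T[0, 1]))

# Uniform tube of 17.6 cm split into 8 sections, evaluated at C / (8 L)
L = 0.176
gain_uniform = volume_velocity_gain([8.0] * 8, 340.0 / (8 * L), L)
# Two sections: wide at the lips (4), narrow at the glottis (1), evaluated at C / (4 L)
gain_two_tube = volume_velocity_gain([4.0, 1.0], 340.0 / (4 * L), L)
```

For the uniform tube with an ideal open end, the expression reduces to 1/cos(2πfL/C), so the gain becomes very large near odd multiples of C/(4L), about 483 Hz for L = 17.6 cm. A more realistic lip termination would use a frequency-dependent k_l.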
As the coefficients of the transfer matrix are the products of a polynomial expression of z and a constant
  • 39. multiplied by $z^{-1/2}$ for each section, the transfer function of the vocal tract is therefore an all-pole function with a zero for $z = 0$ (which accounts for the propagation delay in the vocal tract).
1.1.4.2. All-pole filter model
During the production of oral vowels, the vocal tract can be viewed as an acoustic tube of a complex shape. Its transfer function is composed of poles only, thus behaving as an acoustic filter with resonances only. These resonances correspond to the formants of the spectrum, which, for a sampled signal with limited bandwidth, are of a finite number $N$. On average, for a uniform tube, the formants are spaced about 1 kHz apart; as a consequence, a signal sampled at $F = 1/T$ kHz (i.e. with a bandwidth of $F/2$ kHz) will contain approximately $F/2$ formants, and $N = F$ poles will compose the transfer function of the vocal tract from which the signal originates:
$$V(z) = \frac{U_l}{U_g}(z) = \frac{K_1\, z^{-N/2}}{\prod_{i=1}^{N} (1 - \hat{z}_i z^{-1})(1 - \hat{z}_i^{*} z^{-1})}$$ [1.26]
Developing the expression for the conjugate complex poles $\hat{z}_i, \hat{z}_i^{*} = \exp[-\pi B_i T \pm 2 i \pi f_i T]$ yields:
$$V(z) = \frac{K_1\, z^{-N/2}}{\prod_{i=1}^{N} \left[ 1 - 2 \exp(-\pi B_i T) \cos(2\pi f_i T)\, z^{-1} + \exp(-2\pi B_i T)\, z^{-2} \right]}$$ [1.27]
where $B_i$ denotes the formant’s bandwidth at −6 dB on each side of its maximum and $f_i$ its center frequency. To take into account the coupling with the nasal cavities (for nasal vowels and consonants) or with the cavities at the back of the excitation source (the subglottic cavity during the open glottis part of the vocalic cycle or the cavities upstream of the constriction for plosives and fricatives), it is necessary to incorporate in the transfer function a finite number of zeros $z_j, z_j^{*}$ (for a band-limited signal):
$$V(z) = \frac{U_l}{U_g}(z) = K_2\, \frac{\prod_{j=1}^{M} (1 - z_j z^{-1})(1 - z_j^{*} z^{-1})}{\prod_{i=1}^{N} (1 - \hat{z}_i z^{-1})(1 - \hat{z}_i^{*} z^{-1})}$$ [1.28]
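The second-order sections of equation [1.27] translate directly into filter coefficients. The following sketch builds the denominator polynomial from a list of formant frequencies and bandwidths, then recovers the formant frequencies from the pole angles as a consistency check. The function name, the sampling frequency and the formant values are illustrative assumptions; the gain K1 and the delay term are left out.

```python
import numpy as np

def formant_denominator(freqs_hz, bandwidths_hz, fs):
    """Denominator of equation [1.27] as coefficients of z^0, z^-1, z^-2, ..."""
    T = 1.0 / fs
    den = np.array([1.0])
    for f, b in zip(freqs_hz, bandwidths_hz):
        r = np.exp(-np.pi * b * T)   # pole radius set by the bandwidth
        section = np.array([1.0, -2.0 * r * np.cos(2.0 * np.pi * f * T), r * r])
        den = np.convolve(den, section)   # multiply the second-order sections
    return den

# Illustrative values (assumptions): four formants for a signal sampled at 8 kHz
fs = 8000.0
freqs = [500.0, 1500.0, 2500.0, 3500.0]
bws = [60.0, 90.0, 120.0, 150.0]
den = formant_denominator(freqs, bws, fs)

# Poles are the roots of the denominator; their angles give back the formants
poles = np.roots(den)
recovered = np.sort(np.angle(poles[poles.imag > 0]) * fs / (2.0 * np.pi))
```

The coefficient array can be passed as the denominator of any standard recursive filtering routine to synthesize a vowel-like spectrum from a source signal.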
  • 40. Any zero in the transfer function can be approximated by a set of poles, as $1 - a z^{-1} = 1 \Big/ \sum_{n=0}^{\infty} a^n z^{-n}$. Therefore, an all-pole model with a sufficiently large number of poles is often preferred in practice to a full pole-zero model.
1.1.5. Lip-radiation
The last term in the linear model corresponds to the conversion of the airflow wave at the lips into a pressure wave radiated at a given distance from the head. At a first level of approximation, the radiation effect can be assimilated to a differentiation: at the lips, the radiated pressure is the derivative of the airflow. The pressure recorded with the microphone is analogous to the one radiated at the lips, except for an attenuation factor, depending on its distance to the lips. The time-domain derivation corresponds to a spectral emphasis, i.e. a first-order high-pass filtering. The fact that the production model is linear can be exploited to condense the radiation term at the very level of the source. For this purpose, the derivative of the source is considered rather than the source itself. In the spectral domain, the consequence is to increase the slope of the spectrum by approximately +6 dB/octave, which corresponds to a time-domain derivation and, in the sampled domain, to the following transfer function:
$$L(z) = \frac{P}{U_l}(z) \approx 1 - K_d z^{-1}$$ [1.29]
with $K_d \approx 1$.
1.2. Linear prediction
Linear prediction (or LPC for Linear Predictive Coding) is a parametric model of the speech signal [ATA 71, MAR 76]. Based on the source-filter model, an analysis scheme can be defined, relying on a small number of parameters and techniques for estimating these parameters.
1.2.1. Source-filter model and linear prediction
The source-filter model of equation [1.4] can be further simplified by grouping in a single filter the contributions of the glottis, the vocal tract and the lip-radiation term, while keeping a flat-spectrum term for the excitation. 
For voiced speech, P(z) is a periodic train of pulses and for unvoiced speech, N(z) is a white noise.
  • 41.
$$S(z) = P(z)\, U_g(z)\, V(z)\, L(z) = P(z)\, H(z) \quad \text{(voiced speech)}$$ [1.30]
$$S(z) = R(z)\, V(z)\, L(z) = N(z)\, H(z) \quad \text{(unvoiced speech)}$$ [1.31]
Considering the lip-radiation spectral model in equation [1.29] and the glottal airflow model in equation [1.9], both terms can be grouped into the flat spectrum source $E$, with unit gain (the gain factor $G$ is introduced to take into account the amplitude of the signal). Filter $H$ is referred to as the synthesis filter. An additional simplification consists of considering the filter $H$ as an all-pole filter. The acoustic theory indicates that the filter $V$, associated with the vocal tract, is an all-pole filter only for non-nasal sounds, whereas it contains both poles and zeros for nasal sounds. However, it is possible to approximate a pole/zero transfer function with an all-pole filter, by increasing the number of poles, which means that, in practice, an all-pole approximation of the transfer function is acceptable. The inverse filter of the synthesis filter is an all-zero filter, referred to as the analysis filter and denoted $A$. This filter has a transfer function that is written as an $M$th-order polynomial, where $M$ is the number of poles in the transfer function of the synthesis filter $H$:
$$S(z) = G\, E(z)\, H(z) \quad H(z)\text{: synthesis filter}$$ [1.32]
$$S(z) = \frac{G\, E(z)}{A(z)} \quad \text{with} \quad A(z) = \sum_{i=0}^{M} a_i z^{-i}\text{: analysis filter}$$ [1.33]
Linear prediction is based on the correlation between successive samples in the speech signal. The knowledge of $p$ samples up to the instant $n-1$ allows some prediction of the upcoming sample, denoted $\hat{s}_n$, with the help of a prediction filter, the transfer function of which is denoted $F(z)$:
$$s_n \approx \hat{s}_n = \alpha_1 s_{n-1} + \alpha_2 s_{n-2} + \cdots + \alpha_p s_{n-p} = \sum_{i=1}^{p} \alpha_i s_{n-i}$$ [1.34]
$$\hat{S}(z) = S(z) \left( \alpha_1 z^{-1} + \alpha_2 z^{-2} + \cdots + \alpha_p z^{-p} \right) = S(z) \left( \sum_{i=1}^{p} \alpha_i z^{-i} \right)$$ [1.35]
$$\hat{S}(z) = S(z)\, F(z)$$ [1.36]
The prediction error $\varepsilon_n$ between the predicted and actual signals is thus written:
  • 42. $$\varepsilon_n = s_n - \hat{s}_n = s_n - \left(\sum_{i=1}^{p} \alpha_i s_{n-i}\right)$$ [1.37]
$$\mathcal{E}(z) = S(z) - \hat{S}(z) = S(z)\left(1 - \sum_{i=1}^{p} \alpha_i z^{-i}\right)$$ [1.38]
Linear prediction of speech is thus closely related to the linear acoustic production model: the source-filter production model and the linear prediction model can be identified with each other. The residual error $\varepsilon_n$ can then be interpreted as the excitation source e, and the inverse filter A is associated with the prediction filter (by setting M = p, with $a_0 = 1$ and $a_i = -\alpha_i$):
$$\varepsilon_n = s_n - \sum_{i=1}^{p} \alpha_i s_{n-i} = G\,e(n) = s_n + \sum_{i=1}^{p} a_i s_{n-i}$$ [1.39]
The identification of filter A assumes a flat spectrum residual, which corresponds to a white noise or a single pulse excitation. The modeling of the excitation source in the framework of linear prediction can therefore be achieved by a pulse generator and a white noise generator, controlled by a voiced/unvoiced decision.
The estimation of the prediction coefficients is obtained by minimizing the prediction error. Let $\varepsilon_n^2$ denote the squared prediction error and E the total squared error over a given time interval, between $n_0$ and $n_1$:
$$\varepsilon_n^2 = \left[s_n - \sum_{i=1}^{p} \alpha_i s_{n-i}\right]^2 \quad \text{and} \quad E = \sum_{n=n_0}^{n_1} \varepsilon_n^2$$ [1.40]
The coefficients $\alpha_k$ that minimize the prediction error E over a frame are obtained by zeroing the partial derivatives of E with respect to the $\alpha_k$ coefficients, i.e., for k = 1, 2, …, p:
$$\frac{\partial E}{\partial \alpha_k} = 0 \quad \text{i.e.} \quad -2 \sum_{n=n_0}^{n_1} s_{n-k}\left[s_n - \sum_{i=1}^{p} \alpha_i s_{n-i}\right] = 0$$ [1.41]
Finally, this leads to the following system of equations:
$$\sum_{n=n_0}^{n_1} s_{n-k}\, s_n = \sum_{i=1}^{p} \alpha_i \sum_{n=n_0}^{n_1} s_{n-k}\, s_{n-i}, \quad 1 \le k \le p$$ [1.42]
  • 43. and, if new coefficients $c_{ki}$ are defined, the system becomes:
$$c_{k0} = \sum_{i=1}^{p} \alpha_i c_{ki}, \quad 1 \le k \le p, \quad \text{with} \quad c_{ki} = \sum_{n=n_0}^{n_1} s_{n-i}\, s_{n-k}$$ [1.43]
Several fast methods for computing the prediction coefficients have been proposed. The two main approaches are the autocorrelation method and the covariance method. The two methods differ in the choice of the interval $[n_0, n_1]$ over which the total squared error E is calculated. In the covariance method, it is assumed that the signal is known only over a given interval of exactly N samples. No hypothesis is made concerning the behavior of the signal outside this interval. The autocorrelation method, on the other hand, considers the whole range $(-\infty, +\infty)$ for calculating the total error. The coefficients are thus written:
$$c_{ki} = \sum_{n=p}^{N-1} s_{n-i}\, s_{n-k}$$ covariance [1.44]
$$c_{ki} = \sum_{n=-\infty}^{+\infty} s_{n-i}\, s_{n-k}$$ autocorrelation [1.45]
The covariance method is generally employed for the analysis of rather short signals (for instance, one voicing period, or one closed glottis phase). In the case of the covariance method, matrix $[c_{ki}]$ is symmetric. The prediction coefficients are calculated with a fast algorithm [MAR 76], which will not be detailed here.
1.2.2. Autocorrelation method: algorithm
For this method, signal s is considered as stationary. The limits for calculating the total error are $-\infty$ and $+\infty$. However, only a finite number of samples are taken into account in practice, by zeroing the signal outside an interval [0, N−1], i.e. by applying a time window to the signal. The total quadratic error E and the coefficients $c_{ki}$ become:
$$E = \sum_{n=-\infty}^{+\infty} \varepsilon_n^2 \quad \text{and} \quad c_{ki} = \sum_{n=-\infty}^{+\infty} s_{n-i}\, s_{n-k} = \sum_{n=-\infty}^{+\infty} s_n\, s_{n+|k-i|}$$ [1.46]
Those are the autocorrelation coefficients of the signal, hence the name of the method. The roles of k and i are symmetric and the correlation coefficients only depend on the difference between k and i.
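The difference between the two sets of coefficients can be made concrete with a minimal sketch, assuming a frame stored as a plain Python list. The function names `covariance_coeffs` and `autocorrelation_coeffs` are hypothetical and chosen only for illustration; the first follows equation [1.44], the second the windowed sum of equation [1.46] with the signal taken as zero outside [0, N−1].

```python
def covariance_coeffs(s, p):
    """c[k][i] = sum_{n=p}^{N-1} s[n-i] * s[n-k] for 0 <= k, i <= p (eq. [1.44])."""
    N = len(s)
    return [[sum(s[n - i] * s[n - k] for n in range(p, N))
             for i in range(p + 1)]
            for k in range(p + 1)]


def autocorrelation_coeffs(s, p):
    """r[d] = sum_n s[n] * s[n+d], the signal being zero outside [0, N-1].

    With this windowing, c_ki of eq. [1.46] equals r[|k - i|].
    """
    N = len(s)
    return [sum(s[n] * s[n + d] for n in range(N - d)) for d in range(p + 1)]


# Toy frame (illustrative values only).
frame = [1.0, 2.0, 0.0, -1.0, 3.0]
c = covariance_coeffs(frame, 2)
r = autocorrelation_coeffs(frame, 2)
```

In this toy example the covariance matrix is symmetric but its diagonal is not constant, whereas the autocorrelation coefficients depend only on the lag, which is the property the autocorrelation method relies on.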
  • 44. The samples of the signal $s_n$ (resp. $s_{n+|k-i|}$) are non-zero only for $n \in [0, N-1]$ (resp. $n+|k-i| \in [0, N-1]$). Therefore, by rearranging the terms in the sum, it can be written for k = 0, …, p:
$$c_{ki} = \sum_{n=0}^{N-1-|k-i|} s_n\, s_{n+|k-i|} = r(|k-i|) \quad \text{with} \quad r(k) = \sum_{n=0}^{N-1-k} s_n\, s_{n+k}$$ [1.47]
The system of p equations to be solved is thus (see [1.43]):
$$\sum_{i=0}^{p} a_i\, r(|k-i|) = 0, \quad 1 \le k \le p$$ [1.48]
Moreover, one equation follows from the definition of the error E:
$$E = \sum_{n=-\infty}^{+\infty} \left(\sum_{i=0}^{p} a_i s_{n-i}\right)\left(\sum_{j=0}^{p} a_j s_{n-j}\right) = \sum_{n=-\infty}^{+\infty} s_n \sum_{i=0}^{p} a_i s_{n-i} = \sum_{i=0}^{p} a_i r_i$$ [1.49]
as a consequence of the above set of equations [1.48]. An efficient method to solve this system is the recursive method used in the Levinson algorithm. In matrix form, this system is written:
$$\begin{bmatrix} r_0 & r_1 & r_2 & r_3 & \cdots & r_p \\ r_1 & r_0 & r_1 & r_2 & \cdots & r_{p-1} \\ r_2 & r_1 & r_0 & r_1 & \cdots & r_{p-2} \\ r_3 & r_2 & r_1 & r_0 & \cdots & r_{p-3} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ r_p & r_{p-1} & r_{p-2} & r_{p-3} & \cdots & r_0 \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} E \\ 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$ [1.50]
The matrix is symmetric and it is a Toeplitz matrix. In order to solve this system, a recursive solution on prediction order n is searched for. At each step n, a set of
  • 45. n+1 prediction coefficients is calculated: $a_0^n, a_1^n, a_2^n, \ldots, a_n^n$. The process is repeated up to the desired prediction order p, at which stage: $a_0^p = a_0$, $a_1^p = a_1$, $a_2^p = a_2$, …, $a_p^p = a_p$. If we assume that the system has been solved at step n–1, the coefficients and the error at step n of the recursion are obtained as:
$$\begin{bmatrix} a_0^n \\ a_1^n \\ a_2^n \\ \vdots \\ a_{n-1}^n \\ a_n^n \end{bmatrix} = \begin{bmatrix} a_0^{n-1} \\ a_1^{n-1} \\ a_2^{n-1} \\ \vdots \\ a_{n-1}^{n-1} \\ 0 \end{bmatrix} + k_n \begin{bmatrix} 0 \\ a_{n-1}^{n-1} \\ a_{n-2}^{n-1} \\ \vdots \\ a_1^{n-1} \\ a_0^{n-1} \end{bmatrix}$$ [1.51]
$$\begin{bmatrix} E_n \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} E_{n-1} \\ 0 \\ 0 \\ \vdots \\ 0 \\ q \end{bmatrix} + k_n \begin{bmatrix} q \\ 0 \\ 0 \\ \vdots \\ 0 \\ E_{n-1} \end{bmatrix}$$ [1.52]
i.e. $a_i^n = a_i^{n-1} + k_n a_{n-i}^{n-1}$, where it can be easily shown from equations [1.50], [1.51] and [1.52] that:
$$k_n = -\frac{1}{E_{n-1}} \sum_{i=0}^{n-1} a_i^{n-1} r_{n-i} \quad \text{and} \quad E_n = E_{n-1}\left(1 - k_n^2\right)$$ [1.53]
As a whole, the algorithm for calculating the prediction coefficients is (coefficients $k_i$ are called reflection coefficients):
  • 46. 1) $E_0 = r_0$
2) step n: $k_n = -\dfrac{1}{E_{n-1}} \sum_{i=0}^{n-1} a_i^{n-1} r_{n-i}$
3) $a_n^n = k_n$ and $a_0^n = 1$
4) $a_i^n = a_i^{n-1} + k_n a_{n-i}^{n-1}$ for $1 \le i \le n-1$
5) $E_n = (1 - k_n^2)\, E_{n-1}$
These equations are solved recursively, until the solution for order p is reached.
In many applications, one of the goals is to identify the filter associated with the vocal tract, for instance to extract the formants [MCC 74]. Let us consider vowel signals, the spectra of which are shown in Figures 1.4, 1.5 and 1.6 (these spectra were calculated with a short-term Fourier transform (STFT) and are represented on a logarithmic scale). The linear prediction analysis of these vowels yields filters which correspond to the prediction model that could have produced them. Therefore, the magnitude of the transfer function of these filters can be viewed as the spectral envelope of the corresponding vowels.
Linear prediction thus estimates the filter part of the source-filter model. To estimate the source, the speech signal can be filtered by the analysis filter, i.e. the inverse of the synthesis filter. The residual signal thus obtained represents the derivative of the source signal, as the lip-radiation term is included in the filter (according to equation [1.30]). The residual signal must therefore be integrated in order to obtain an estimate of the actual source, which is represented in Figure 1.7, both in the frequency and time domains.
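Steps 1) to 5) of the recursion translate almost line by line into code. The following is a minimal sketch, assuming the autocorrelation coefficients r(0), …, r(p) are already available as a list; the function name `levinson` and its return convention are illustrative choices, not part of the original text. It uses the convention a_0 = 1 of equation [1.50], so the prediction coefficients of equation [1.34] are α_i = −a_i.

```python
def levinson(r, p):
    """Solve sum_{i=0}^{p} a_i r(|k-i|) = 0 (1 <= k <= p) with a_0 = 1.

    Returns (a, k, E): coefficients a[0..p], reflection coefficients
    k[1..p] (k[0] is unused and left at 0), and the final error E_p.
    Assumes r[0] > 0 and that no intermediate error E_n vanishes.
    """
    a = [1.0] + [0.0] * p
    k = [0.0] * (p + 1)
    E = r[0]                                    # step 1: E_0 = r_0
    for n in range(1, p + 1):
        # step 2: k_n = -(1 / E_{n-1}) * sum_{i=0}^{n-1} a_i^{n-1} r_{n-i}
        q = sum(a[i] * r[n - i] for i in range(n))
        k[n] = -q / E
        # step 4: a_i^n = a_i^{n-1} + k_n a_{n-i}^{n-1} for 1 <= i <= n-1
        prev = a[:]
        for i in range(1, n):
            a[i] = prev[i] + k[n] * prev[n - i]
        a[n] = k[n]                             # step 3: a_n^n = k_n (a_0^n stays 1)
        E *= 1.0 - k[n] ** 2                    # step 5: E_n = (1 - k_n^2) E_{n-1}
    return a, k, E


# Autocorrelation of a first-order process with r(d) = 0.5 ** d (toy values).
a, k, E = levinson([1.0, 0.5, 0.25], 2)
```

For this toy input the second reflection coefficient vanishes, which reflects the fact that a first-order predictor already accounts for the whole correlation structure.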
  • 47. Figure 1.4. Vowel /a/. Hamming windowed signal (Fe = 16 kHz). Magnitude spectrum on a logarithmic scale and gain of the LPC model transfer function (autocorrelation method). Complex poles of the LPC model (16 coefficients)
  • 48. Figure 1.5. Vowel /u/. Hamming windowed signal (Fe = 16 kHz). Magnitude spectrum on a logarithmic scale and gain of the LPC model transfer function (autocorrelation method). Complex poles of the LPC model (16 coefficients)
  • 49. Figure 1.6. Vowel /i/. Hamming windowed signal (Fe = 16 kHz). Magnitude spectrum on a logarithmic scale and gain of the LPC model transfer function (autocorrelation method). Complex poles of the LPC model (16 coefficients)
  • 50. 1.2.3. Lattice filter
We are now going to show that the reflection coefficients $k_i$ obtained by the autocorrelation method correspond to the reflection coefficients of a multi-tube acoustic model of the vocal tract. For this purpose, new coefficients $b_i^n$ must be introduced, which are defined at each step of the recursion as:
$$b_i^n = a_{n-i}^n, \quad i = 0, 1, \ldots, n$$ [1.54]
The $\{b_i^p\}$ coefficients, where p is the prediction order, can be used to postdict the signal, i.e. to predict the preceding sample of the signal. Let's form the estimate $\hat{s}_{n-p}$:
$$\hat{s}_{n-p} = -b_0 s_n - b_1 s_{n-1} - \cdots - b_{p-1} s_{n-p+1} = -\sum_{i=0}^{p-1} b_i s_{n-i} = \sum_{i=0}^{p-1} \alpha_{p-i}\, s_{n-i}$$ [1.55]
A postdiction, or backward, error $\varepsilon_n^-$ can be defined as:
$$\varepsilon_n^- = s_{n-p} - \hat{s}_{n-p} = \sum_{i=0}^{p} b_i s_{n-i} \quad \text{with} \quad b_p = 1$$ [1.56]
The total forward prediction error E (of equation [1.40]) is denoted $E^+$, while the total backward prediction error is denoted $E^-$. In the same manner as in the previous derivation, it can be shown that, for the autocorrelation method, $E^- = E^+$. Consequently, the backward prediction coefficients $b_i$ obtained via the minimization of the total backward error are identical to the $a_i$ coefficients (taken in reverse order, as in [1.54]), and the Levinson algorithm can be rewritten as:
$$a_i^n = a_i^{n-1} + k_n b_{i-1}^{n-1} \quad \text{and} \quad b_i^n = b_{i-1}^{n-1} + k_n a_i^{n-1} \quad \text{with} \quad \begin{cases} a_n^{n-1} = 0 \\ b_{-1}^{n-1} = 0 \end{cases}$$ [1.57]
If we consider the forward and backward prediction errors at the same instant (at order n):
$$\varepsilon_i^{n+} = \sum_{j=0}^{n} a_j^n s_{i-j} \quad \text{and} \quad \varepsilon_i^{n-} = \sum_{j=0}^{n} b_j^n s_{i-j}$$ [1.58]
  • 51. and then equations [1.57] yield:
$$\varepsilon_i^{n+} = \varepsilon_i^{(n-1)+} + k_n\, \varepsilon_{i-1}^{(n-1)-} \quad \text{and} \quad \varepsilon_i^{n-} = \varepsilon_{i-1}^{(n-1)-} + k_n\, \varepsilon_i^{(n-1)+}$$ [1.59]
The z-transforms of these equations provide:
$$\begin{bmatrix} E^{n+}(z) \\ E^{n-}(z) \end{bmatrix} = \begin{bmatrix} 1 & k_n z^{-1} \\ k_n & z^{-1} \end{bmatrix} \begin{bmatrix} E^{(n-1)+}(z) \\ E^{(n-1)-}(z) \end{bmatrix}$$ [1.60]
with, for n = 0: $\varepsilon_i^{0+} = \varepsilon_i^{0-} = s_i$.
To complete the analogy between linear prediction and the multi-tube acoustic model, a slightly different definition of the backward prediction coefficients must be resorted to: $b_i^n = a_{n+1-i}^n$ for i = 1, 2, ..., n+1. The total backward error has the same expression and the Levinson algorithm is written:
$$a_i^n = a_i^{n-1} + k_n b_i^{n-1} \quad \text{and} \quad b_i^n = b_{i-1}^{n-1} + k_n a_{i-1}^{n-1} \quad \text{with} \quad \begin{cases} a_n^{n-1} = 0 \\ b_0^{n-1} = 0 \end{cases}$$ [1.61]
from which the error recursion matrix can be deduced:
$$\begin{bmatrix} E^{n+}(z) \\ E^{n-}(z) \end{bmatrix} = \begin{bmatrix} 1 & k_n \\ k_n z^{-1} & z^{-1} \end{bmatrix} \begin{bmatrix} E^{(n-1)+}(z) \\ E^{(n-1)-}(z) \end{bmatrix}$$ [1.62]
for n = 0, $\varepsilon_i^{0+} = s_i$ and $\varepsilon_i^{0-} = s_{i-1}$, i.e. $E^{0+}(z) = S(z)$ and $E^{0-}(z) = z^{-1} S(z)$. The inverse matrix from equation [1.62] is:
$$\frac{1}{1 - k_n^2} \begin{bmatrix} 1 & -k_n z \\ -k_n & z \end{bmatrix}$$ [1.63]
Except for a multiplicative factor, this is the matrix of equation [1.23], obtained for a section of the multi-tube vocal tract model. This justifies the naming of the $k_n$ coefficients as reflection coefficients. This is the inverse matrix, as the linear prediction algorithm provides the analysis filter. On the contrary, the matrix for an elementary section of the multi-tube acoustic model corresponds to the synthesis filter, i.e. the inverse of the analysis filter. Note that this definition of backward prediction coefficients introduces a shift of one sample between the forward error and the backward error, which in fact corresponds to the physical situation of the multi-tube model, in which the backward wave comes back only after a delay due to
  • 52. the propagation time in the tube section. On the contrary, if the definition of [1.54] is used, there is no shift between forward and backward errors.
Equation [1.62] allows for the analysis and synthesis of speech by linear prediction, with a lattice filter structure. In fact, for each step in the recursion, crossed terms are used that result from the previous step. A remarkable property of lattice filters is that the prediction coefficients are not directly used in the filtering algorithm. Only the signal and the reflection coefficients intervene. Moreover, it can be shown [MAR 76, PAR 86] that the reflection coefficients resulting from the autocorrelation method can be directly calculated using the following formula:
$$k_n = -\frac{\sum_{i=0}^{N-1} \varepsilon_i^{(n-1)+}\, \varepsilon_i^{(n-1)-}}{\sqrt{\sum_{i=0}^{N-1} \left(\varepsilon_i^{(n-1)+}\right)^2 \sum_{i=0}^{N-1} \left(\varepsilon_i^{(n-1)-}\right)^2}}$$ [1.64]
These coefficients are sometimes called PARCOR coefficients (for PARtial error CORrelation). The use of equation [1.64] is thus an alternative way to calculate the analysis and synthesis filters, which is equivalent to the autocorrelation method, but without explicitly calculating the prediction coefficients.
Other lattice filter structures have been proposed. In the Burg method, the calculation of the reflection coefficients is based on the minimization (in the least squares sense) of the sum of the forward and backward errors. The error term to minimize is:
$$E_n = \sum_{i=0}^{N-1} \left[\left(\varepsilon_i^{n+}\right)^2 + \left(\varepsilon_i^{n-}\right)^2\right]$$ [1.65]
By writing that $\partial E_n / \partial k_n = 0$, in order to find the optimal $k_n$ coefficients, we obtain:
$$k_n = -\frac{2 \sum_{i=0}^{N-1} \varepsilon_i^{(n-1)+}\, \varepsilon_i^{(n-1)-}}{\sum_{i=0}^{N-1} \left(\varepsilon_i^{(n-1)+}\right)^2 + \sum_{i=0}^{N-1} \left(\varepsilon_i^{(n-1)-}\right)^2}$$ [1.66]
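As an illustration of the lattice structure, the sketch below propagates the forward and backward errors with the time-domain form of equation [1.62] and estimates each reflection coefficient with the Burg formula [1.66]. It is a simplified sketch rather than a reference implementation: the function name `burg_lattice` is hypothetical, and samples before the start of the frame are assumed to be zero, so that all sums run over i = 0, …, N−1 as in the text.

```python
def burg_lattice(s, p):
    """Return (ks, residual) for a frame s and prediction order p.

    ks[n-1] is the reflection coefficient k_n of eq. [1.66]; residual is
    the forward error at order p, i.e. the output of the lattice analysis
    filter. Errors are initialised as e0+[i] = s[i] and e0-[i] = s[i-1].
    """
    N = len(s)
    fwd = list(s)                      # forward error at order 0
    bwd = [0.0] + list(s[:-1])         # backward error at order 0 (one-sample delay)
    ks = []
    for _ in range(p):
        num = -2.0 * sum(f * b for f, b in zip(fwd, bwd))
        den = sum(f * f for f in fwd) + sum(b * b for b in bwd)
        k = num / den if den > 0.0 else 0.0
        ks.append(k)
        # Time-domain form of eq. [1.62]:
        #   e+_i(n) = e+_i(n-1) + k_n e-_i(n-1)
        #   e-_i(n) = e-_{i-1}(n-1) + k_n e+_{i-1}(n-1)
        new_fwd = [fwd[i] + k * bwd[i] for i in range(N)]
        new_bwd = [0.0] + [bwd[i - 1] + k * fwd[i - 1] for i in range(1, N)]
        fwd, bwd = new_fwd, new_bwd
    return ks, fwd


# Decaying exponential frame (toy values): the first coefficient is close to -0.5.
ks, residual = burg_lattice([0.5 ** n for n in range(20)], 3)
```

Note that only the signal and the reflection coefficients appear in the loop; the prediction coefficients are never formed explicitly.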
  • 53. These coefficients no longer correspond to the autocorrelation method, but they possess good stability properties, as it can be shown that $-1 \le k_n \le 1$. Adaptive versions of the Burg algorithm also exist [MAK 75, MAK 81].
1.2.4. Models of the excitation
In addition to the filter part of the linear prediction model, the source part has to be estimated. One of the terms concerning the source is the synthesis gain G. There is no unique solution to this problem and additional hypotheses must be made. A commonly accepted hypothesis is to set the total signal energy equal to that of the impulse response of the synthesis filter. Let us denote as h(n) the impulse response and $r_h(k)$ the corresponding autocorrelation coefficients. Thus:
$$h(n) = G\,\delta(n) + \sum_{i=1}^{p} \alpha_i\, h(n-i) \quad \text{and} \quad r_h(k) = \sum_{i=1}^{p} \alpha_i\, r_h(k-i)$$ [1.67]
Indeed, for $k \ne 0$, the autocorrelation coefficients are infinite sums of terms such as:
$$h(n-k)\,h(n) = G\,\delta(n)\,h(n-k) + \sum_{i=1}^{p} \alpha_i\, h(n-k)\,h(n-i)$$ [1.68]
and, h being causal, the terms $\delta(n)\,h(n-k)$ are always zero for k > 0 (negative lags follow by symmetry of the autocorrelation). Equating the total energies is equivalent to equating the 0th-order autocorrelations. Thanks to recurrence equation [1.67], the autocorrelation coefficients of the signal and of the impulse response can be identified with each other: $r_h(i) = r(i)$, for i = 0, 1, …, p. For n = 0, h(0) = G; therefore, reusing equation [1.67] yields:
$$r_h(0) = G\,h(0) + \sum_{i=1}^{p} \alpha_i\, r_h(i) \quad \text{therefore:} \quad G^2 = r(0) - \sum_{i=1}^{p} \alpha_i\, r(i)$$ [1.69]
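Equation [1.69] can be checked numerically on a small case. The sketch below is an illustration under stated assumptions: the helper names `synthesis_gain` and `impulse_response` are hypothetical, the prediction coefficients are passed as the list [α_1, …, α_p], and the impulse response is truncated to a finite length.

```python
import math


def synthesis_gain(r, alpha):
    """G from eq. [1.69]: G^2 = r(0) - sum_{i=1}^{p} alpha_i r(i)."""
    g2 = r[0] - sum(al * r[i] for i, al in enumerate(alpha, start=1))
    return math.sqrt(max(g2, 0.0))     # guard against small negative rounding errors


def impulse_response(G, alpha, length):
    """h(n) = G delta(n) + sum_i alpha_i h(n-i) (eq. [1.67]), truncated to `length`."""
    h = []
    for n in range(length):
        acc = G if n == 0 else 0.0
        for i, al in enumerate(alpha, start=1):
            if n - i >= 0:
                acc += al * h[n - i]
        h.append(acc)
    return h


# Toy first-order case: r(d) = 0.5 ** d, alpha_1 = 0.5, alpha_2 = 0.
r = [1.0, 0.5, 0.25]
alpha = [0.5, 0.0]
G = synthesis_gain(r, alpha)
h = impulse_response(G, alpha, 60)
energy = sum(x * x for x in h)         # should be close to r[0]
```

In this example the energy of the truncated impulse response matches r(0) to within the truncation error, which is the hypothesis used to fix the gain.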
  • 54. Figure 1.7. Vowel /i/. Residual signal and its magnitude spectrum on a logarithmic scale
In the conventional linear prediction model, the excitation is either voiced or unvoiced, for each analysis frame. In the case of a voiced signal, the excitation is a periodic pulse train at the fundamental period (see Figure 1.9), and for an unvoiced signal, the excitation is a Gaussian white noise (see Figure 1.8). The mixture of these two sources is not allowed, which is a definite drawback for voiced sounds for which a noise component is also present in the excitation.
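The two-state excitation described above can be sketched in a few lines. This is only an illustration: the function names `excitation` and `synthesize`, the default period of 80 samples and the fixed random seed are assumptions made for the example, and the all-pole filter is the direct-form recursion of equation [1.39] rewritten as s_n = G e(n) + Σ α_i s_{n−i}.

```python
import random


def excitation(n_samples, voiced, period=80, seed=0):
    """Unit pulse train (voiced) or Gaussian white noise (unvoiced)."""
    if voiced:
        return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)          # seeded so that the example is reproducible
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def synthesize(e, alpha, G):
    """All-pole synthesis: s[n] = G * e[n] + sum_i alpha[i-1] * s[n-i]."""
    s = []
    for n, en in enumerate(e):
        acc = G * en
        for i, al in enumerate(alpha, start=1):
            if n - i >= 0:
                acc += al * s[n - i]
        s.append(acc)
    return s


# One voiced and one unvoiced frame driven through the same toy filter.
voiced_frame = synthesize(excitation(200, voiced=True), [0.5], 1.0)
unvoiced_frame = synthesize(excitation(200, voiced=False), [0.5], 1.0)
```

Because the switch is binary per frame, this sketch shares the limitation noted above: it cannot represent a voiced sound whose excitation also contains a noise component.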
  • 55. Other documents randomly have different content
  • 56. among them, and many more that are not so good. Those that saw the thing out say they finally got to singing, Glory to God, and Abe Linkum, and wound up with a prayer meeting, in which Massa Linkum and the Linkum Sogers were the names most often heard. October 17, 1863. Saturday. To-day Lieutenants Heath, Reynolds, the quartermaster and myself took a long ride about the country spreading the news of our headquarters for recruits. The white people we met were civil, but their hatred of us could not be entirely covered up. I could not find it in my heart to blame them, and I much regretted that one of our party saw fit to trade horses with one of them and entirely against his will. But the blacks are wild with joy, and eager to become Linkum Sogers. In the afternoon a detail was sent out with the quartermaster's wagon for mutton or beef, for our family is getting so large they will soon eat up the government rations at hand. They came back soon with a choice lot of dressed mutton. The guides apparently knew just where to go. Later in the day Reynolds, Gorton and myself made another tour of the country towards the Mississippi River. We came to a house over towards the Great Cypress Swamp, as the folks here call it, and which is a belt of big timber lying between the Teche prairie and the Mississippi River, in which outlaws and wild beasts are said to abound, and in which bands of guerrillas have their hiding places. We have heard much of the Great Cypress Swamp and its terrors, and felt quite brave as we looked at it from a half mile distance. No one appeared to be at home, so we investigated. The weeds were as high as our heads, but a path led back to a stable in which was the most perfect picture of a horse I ever looked at. He appeared to be scared out of his head at the sight of us, and plunged and snorted as if a bear was after him. The path continued and soon we came to a mulatto and his wife busy digging peanuts. 
We introduced the subject of enlistment and found
  • 57. he was ready and willing to go at once if he could take his horse with him. They could both talk English, and a jargon we supposed was French. When speaking to us they used English, but to each other they talked French. After a short confab he agreed to go with us, and his wife made no objection. He got his horse from the stable, and his saddle from the house and we set out for camp. I thought it strange that either of them showed so little concern at parting for what might be forever, and wondered the wife did not ask to go also, as so many of the others had done. We reached camp just at night, where both the horse and man attracted the attention of all hands. Colonel Parker at once wanted to buy the horse, and a bargain was soon struck, the horse to be paid for on the next pay day, which was agreeable to the mulatto. He was so frank and open in all his talk, that when he asked if he might ride the horse home and remain till morning the colonel readily consented, telling him to be in camp by noon the next day. October 18, 1863. Sunday. We lay about camp until noon and the horse and his rider did not appear. The colonel was mad clear through. He had been told the nigger would not come back, but he believed he would, and as the time went on little was heard but comments on the slick trick the rogue had played on Colonel Parker. After dinner he told Gorton and me to saddle up and show him the way and he would see whether he could find him. We went to the house but found no one at home. We then rode on towards the swamp. We saw a man running across a cleared spot and soon overhauled him. It was the fellow himself. He said his horse had got away and he was trying to find him, had been looking for him all the morning. The colonel drew his revolver and told him to march ahead of him to a big tree a short distance away, at the same time telling me to get my picket rope ready, for he was going to find that horse, or else find a dead nigger. 
The nig was scared and began to beg, declaring the horse had
  • 58. gotten out of the stable in the night, and he and his wife both had been looking for him all day long. After he had got through, the colonel told me to throw the line over a limb, for he was going to keep his word. Whether he did really intend to hang him or not I don't know, but I thought he would stop short of the actual deed, so I proceeded to get the rope in position for a real hanging. Just then the rascal owned up. The horse was in the swamp where he had hidden him, and if the colonel would spare his life he would take us to him. We then went on and soon came to a beaten path that led directly to the dense forest before us. At the first turn in the path after we entered the woods the colonel dropped me off. At the next turn he left Gorton, and he himself with revolver in hand followed the fellow on and out of sight. He was gone perhaps fifteen minutes when out they came, horse and all, and we made tracks for camp, which we reached about sundown. The next morning the man's wife came into camp, and they both acted as if nothing out of the ordinary had happened. Where I waited in the woods the undergrowth was so dense I could not see a rod in any direction except along the path. Squirrels, both black and gray, came out of the bushes and looked at me. I counted five black squirrels in sight at one time. They are not quite so large as the grays, and are a dark brown rather than black. I wondered if they were as plenty all through the woods as where I sat. Gorton says he saw as many as I did. If all the stories I have heard about the Great Cypress Swamp are true, I don't care for any closer acquaintance than I now have. There are wild animals of all kinds common to this part of the country—bears, wildcats, opossum, deer and snakes as big as any in Barnum's menagerie. I can believe the snake part, for I have seen so many that I believe all the snake stories I hear. This same Great Cypress Swamp is said to be the home of outlaws, both white and black. 
That they have homes there where they live undisturbed by the laws made to govern other people. That runaway slaves find homes there, where they live and raise families which recruit the ranks of the lawless set living there, as fast as they are killed off by the fights they have among themselves and with the officers of the law that attempt to capture or subdue them.
  • 59. Night. The work for to-morrow has been mapped out. Quartermaster Schemerhorn, Lieutenant Reynolds and myself are to start for Brashear City, taking with us the men we have enlisted. Two days' rations have been given out, and the darkies are having a farewell dance. This has been a busy Sunday, one I will long remember. October 19, 1863. Monday. We were up early and found the dance still going on. These creatures have danced all night, and eaten up a good portion of the rations, in spite of the fact that they knew a hard tramp lay before them to-day. How they will get through, or what we will do if they give out on the way, is the next thing for us to think of. They don't care. Someone has always thought for them and will have to think for them for some time to come. The quartermaster and Reynolds started off in good season but I was kept back for instructions until they were out of sight, and I did not overtake them until they had reached Vermillion Bayou. A drove of men, women and children, the families of the men we were taking away, had followed them until now. We had to wait for a wagon train to get off the bridge and this gave time for them to get through with the good-byes, and most of them turned back. A half dozen or more of the younger women kept on and went all the way through. The day was warm, and the road was dusty, but we went through without accident or adventure, other than might be expected when all things are considered. For several days the men had been in a state of great excitement over their new prospects. They had wound up by dancing all night, and eating up the provisions intended for us on this hard tramp. As the day wore on the excitement wore off and they found themselves very tired and very hungry. Such few things as they had beside those on their backs was in a cart drawn by a mule, and driven by three wenches. When a man gave out we turned out a wench and put the man in her place. 
Finally all three wenches were on foot, and their places in the cart taken by as many
  • 60. men. Before long others gave out and the cart was loaded until that broke down. Then we held a council. We were outside the picket lines and night was coming on, and staying there in the road was not to be thought of. Three revolvers were the only weapons of defense we could muster in case of attack by a guerrilla squad. Capture meant death. We explained the situation to such as could understand us, and they made it so plain to the others that they were all ready to hustle. We patched up the cart so the extras could be dragged along and away we went. The quartermaster rode on to find a place to stay at, and something to eat. I let one who was worst off ride my horse, and with Reynolds at the front to coax, and I at the rear to drive, we got up such a gait I had to do my best to keep up. The road had been graded for a railroad, and was wide and level as a floor. At dusk I saw the steeple of a church, and knew we were near our journey's end. Now that the end was in sight, the weariness all seemed to disappear. We passed the picket line and were soon in the town. The quartermaster had got a schoolhouse for a stay over and had rations from the commissary. We made short work of these and expected to settle right down for the night. The men and women filled the schoolhouse full, and after being in there a few minutes, we three made up our minds the air was better outside, so we each took a board shutter from the windows and were soon settled down as comfortable as the circumstances would allow. Before we were asleep we heard a fiddle tuning up and in a little while a dance was started and was in full blast when I fell asleep. How long it lasted I don't know, but when I awoke about sunrise the inmates of the schoolhouse were sleeping like the dead. October 20, 1863. Tuesday. I was nearly blind when I awoke. Something like an inflammation in my eyes had troubled me for some days, and the dusty tramp of the day before had made it worse. However, I soaked
  • 61. them open, and found that it had not affected my appetite in the least. While at breakfast Lieutenant Bell came and joined us. He was on his way to join the colonel and his party at the front. The colonel had given us an order to stop any boat going towards Brashear City, and with it I proceeded to the landing, leaving Reynolds and the quartermaster to pick up and bring on our party. At the landing I met a party on their way to the front, and gave my horse to one of them who was in just such a fix as I was the morning I became a horse thief. In reply to his very profuse thanks I told him I would have to turn her loose if I didn't give her away, for I could take her no farther. I had long forgiven her the kick she gave me and sincerely wished her well. At Nelson's Landing I found a boat which was being held in readiness for General Banks and his staff, so that was of no use to us. Soon after the A. G. Brown came up and said she would be back that night, and take us. We went into camp near the sugar mill and very soon our small army was arranging for a sham battle. They talked French, so I could only judge what they were up to from what I saw. They divided into two squads and proceeded to fortify their positions by rolling the empty sugar hogsheads up in two parallel rows, behind which they stationed themselves, while the generals in command jawed at each other across the field. The men each had a hogshead stave for a weapon. For flags they used bandanna handkerchiefs, and for drums a piece of board upon which one man pounded while another held it up. One of the generals made a speech which made the other side fighting mad, and they all jumped over the breastworks and met in the space between, batting each other over the head with their weapons, and yelling with all the power of their lungs. We thought sure they would kill each other, for the blows they struck broke some of the staves into splinters. 
Just as we were going to try and interfere, one side surrendered and were marched off, prisoners. There had been some blood shed, and the wonder is that no heads were broken. But the best part came after the fight was over, and when the final settlement was being made. Through an interpreter we learned that the general who should win the fight was to kiss one of the young ladies that had marched with us all the way from
  • 62. Mouton's Plantation, and he now demanded his pay. She was led out upon the battlefield, and when the victorious officer came up to claim his reward she slapped his face, and then turned her back to him. He then gave some orders, when his men grabbed the dusky maiden and turned her about. I could not tell whether she blushed or not, but suppose of course she did. The general got down on one knee and then on both and jabbered French at her until she finally relented and stuck out her hand, which she allowed him to kiss. This soon led to a full surrender, and the battle was over, and peace declared. We gave out the rations and began to get ready for a start as soon as the boat came along. We even filled a barrel with sugar, thinking it might come handy when we got to Brashear City. But night came and the A. G. Brown failed to appear. There were many here who like ourselves were waiting to get out of the country. Among them was a young mulatto woman, whom the others called Margaret, and who seemed of a higher order than those about her. She was willing to talk, and from her I have a story that has fully reconciled me to the wisdom of the President's Emancipation Proclamation. She has started for the North. Our coming among them has given her the chance she had long looked for. She has run away from her mistress, and her master is in the Rebel army. She has a picture of her husband, and a fine-looking man he was. He was as white as I am. He was the son of his master, and her father she says is Judge ——, now in the Rebel service. Her husband picked up enough education to be head man on his father's plantation. He knew too much for a nigger, and when the Rebel army came through last spring he was taken out and hanged to a tree right before her eyes. After they had gone the slaves cut the body down and buried it. Margaret is in hopes to reach New York, and I wished I could land her there that minute. 
If she was dressed as well, and if she was educated, she would pass muster with any I have seen that go by the name of ladies.
  • 63. No boat coming to take us away, we posted guards, giving each a stick of wood for a weapon. I remained up until midnight, and in going the rounds to see if the guards were awake, came near getting a club over my head as I turned the corner of the sugar mill. At midnight I called Reynolds, and rolled myself in my blanket and was soon asleep. The mosquitoes were about as thick and as savage as any we had met with. The horses and cattle had no peace for them. I rolled myself up head and heels in my blanket, and yet when I awoke found one foot had got out of bed, and the varmints had put a belt around my ankle between my stocking and trousers that looked like raw beef. I don't suppose there was an atom of space that had not been punctured by a bill. But I slept right through, and as usual dreamed of home and home folks. October 21, 1863. Wednesday. Nelly, one of the women who came with our crowd, has volunteered to be our cook, and besides being a good cook has proved herself to be a good forager. When I woke up she had fresh pork and chicken cooked and we asked no questions about what price she paid for them. Quartermaster Schemerhorn rode up to Newtown for rations, and I went back to bed to finish up my nap. The mosquitoes had not quite finished their job on me, and some actually bit me through a thick woollen blanket. My leg was very sore where they feasted on it this morning. One of the men mixed up some mud for a poultice, which helped it wonderfully. I found out we could learn many things from these poor creatures, not the least being how to live on the fat of the land we are in. Noon. The quartermaster came back and said the A. G. Brown would be along to-day some time. That it will make a landing one-half mile above here. Accordingly we pack up and move up to Mr. Nelson's so as to be sure of not missing it. Mr. Nelson, the owner of everything in this region, is here. 
He has been a merchant in New Orleans, but since Banks' order driving all Rebel sympathizers from the city, has
  • 64. been here at his plantation home. It is said he owns 20,000 acres of land, and all the necessary stock and tools to work so large a tract. After a supper of hard-tack and bacon, Lieutenant Reynolds and I went and called on the gentleman. He received us very politely, and offered us the best his house afforded. The boat not coming we prolonged our visit, sitting on the broad piazza and smoking his cigars. He said he was a widower, with two children, a son in the army, and a daughter at school in Georgia. He told us of the outrageous wrongs he had suffered at the hands of the invading armies, how they had laid waste his land, torn down his buildings and fences, taking away his mules and horses, cattle and sheep, until he had nothing but the bare land to live upon, and no slaves left him to work even that. It was holding up the other side of the picture to our view, and in spite of ourselves we were sorry for him. He evidently did not expect sympathy from us, for after reciting his wrongs he changed the subject of conversation around to topics we could all agree upon, and after a sociable chat he invited us to spend the night with him, agreeing to have us called in case the boat came during the night. He urged us to stay and we did. He gave us rooms, elegantly furnished, with beds so white and clean we were some time making up our minds whether after all we ought not to sleep on the floor, and leave the beds as they were. But the whole mosquito bars and a few nips from our ever-present enemies decided us. We undressed and were soon asleep, too sound even to dream of home. The boat did not come and the next thing we were aware of it was morning. October 22, 1863. Thursday. We slept late, and when we came out, our host was waiting for us, to say that breakfast was ready, and would not listen to our going away until we had partaken of it with him. We sat down to a beefsteak breakfast, with all the extras. 
I did not think I was so hungry, but the smell of the victuals made us both ravenous. Our
host seemed to enjoy seeing us eat and thanked us heartily for making him the visit, going so far as to say that in case the boat did not come that day he would be glad to entertain us again. In books and in other ways I had heard of southern hospitality and I now know it was all true. I wonder if it was ever put to a severer test. We went down to the landing and found a guard of soldiers from an Illinois regiment, keeping watch over a quantity of sugar and molasses which the government has confiscated, and which the boat was expected to take away when it came. They invited us to make one of their party until the boat came, and we gladly accepted the invitation. They thought we had risked our lives in going to stay with Mr. Nelson, and eating food in his house, but we did not believe it, and did all we could to make them think better of him than they had so far done. The guards shot a hog, which made fodder for our folks for the day, together with the government rations we already had. The day passed and another night came on and still no boat. We crawled in wherever we could get and slept as best we could for the mosquitoes, which seem determined to eat us alive.

October 23, 1863. A cold rain storm that has been threatened for a day or two came upon us early this morning. A small flock of sheep came up the road driven by a man on horseback. The negroes from everywhere have gathered here and the rations we give our men they give away to their friends and are always hungry in consequence. When the sheep came along they surrounded them and killed at least a dozen before we could stop them. The man hustled along with what was left and those killed were soon skinned and being cooked in various ways. We had mutton for dinner and for supper, and had enough left for breakfast. The day finally passed and we began looking for better sleeping quarters.
Reynolds and I with a part of the guard finally climbed a ladder and got into a loft full of cornstalks with the corn on just as it had been cut and stored away. The place was alive with
rats and mice, which ran over and through the stalks, making a terrible racket, varied once in a while by a fight among themselves. We got used to the racket and finally were asleep. Just as we were enjoying ourselves, along came the boat we had waited so long for. We hustled to sort out the nigs that belonged to us and get them on board. In a little while we were off. The boat was crammed full of people—black and white, old and young, men and women all spread out on the cabin floor, or the tables. I never saw such a mass of people in so small a space. We poked around and after a while found room to lie down, after which getting asleep was quick work.

October 24, 1863. Saturday. Another raw day. Now that the people are standing on end there is more room to get about. We made out to eat such as we had; while we wished for more, we had to content ourselves with what we had grabbed hold of the night before in the dark. At noon we passed Franklin, and about 3 p. m. reached Centerville, where there was a lot of sugar to load on the lower deck. The captain said if we would turn in our men to roll on the sugar he would undertake to fill them up. I took advantage of the stop to see what the place looked like. On one of the streets I saw oranges on a tree and went in to see if I could beg or buy a few. As I went into the yard a young lady came out and, in a tone and with a look that almost froze me, asked what I was doing in her yard. To save me I couldn't think what to say, but I did after a while come to enough to say I would like an orange. She turned to a negro and motioned towards the trees, when he went and picked his hands full and gave me. Then the madam pointed her finger towards the street and said, "Now that you have what you came after will you please go"—and I went. I don't know yet what I ought to have said or done, but the only thing I did was to get back to the boat as fast as I could.
I kept the adventure to myself, and gave the oranges away, for I think they would have
choked me. That is a sort of southern hospitality I never read of in a book, or heard of in any other way. I never saw so much scorn on a face before. Why I stood there like a chicken thief caught in the act, and then carried off the oranges, I don't now know. If the Rebels were all like her I would resign and go home at once, for she did actually scare my wits all away from me. The sugar was on board and true to his promise the captain ordered a supper for our army, which must have made his stock of provisions look small. Rube asked me what I found the town like, and I told him it was different from any I had yet seen. We soon got settled down for the night.

October 25, 1863. Sunday. When we awoke we were in sight of Brashear City. We landed, formed in line as well as we could, and marched to our headquarters, where I found my old crony, Sol Drake. We found quarters for the men in an unused building, and in a little while their woolly heads were sticking out from every window. The quartermaster drew clothes for them, and they were soon fitted out with suits of blue, just like the rest of the Linkum Sogers. The trouble was to fit them with shoes. I doubt if many had ever had a shoe on their feet. Their feet are wide at the toes and taper straight back to the heel. No. 12 was the smallest size we found use for, the most of them taking 14 or larger. They insisted on squeezing a No. 14 foot into a No. 10 or 12 shoe, but we, knowing what that would result in, got them properly shod after a long time. Then how proud they were! We then gave them their rations for the day, telling them through interpreters that if they wasted it or gave it away, they could have no more until to-morrow. We moved all our belongings from the boat and filled out the day visiting and talking over old times, and at early bedtime settled down for the night in a four-room house which has been taken for our headquarters while here.
October 26, 1863. Brashear City, La. Monday. On going out this morning who should appear to me but George Story of Company B, who was captured with General Dow at Port Hudson last summer. He says he was well treated by his captors, and has no fault to find with them. They took him and the general to Richmond, and put them in Libby Prison. After a while he was paroled, and sent to Annapolis, Md. There he was kept until exchanged, and then sent south in charge of the provost marshal to be turned over to the 128th New York. Through a mistake at headquarters he was sent here, as the 128th was supposed to be at the front in the Teche country. If he had not met us as he did, he would have gone up the Teche on the next boat. As it is he will go back to New Orleans to-morrow, and look for his regiment up the river, probably at Baton Rouge, where we left them. We commenced teaching our recruits the rudiments of soldiering. They are awkward, but very anxious to learn, and as that is the main thing, we look for little trouble in drilling them. By shoving them together, lock-step fashion, they soon got the idea of marching in time, and on the whole did as well or better than we did at Hudson, when we took our first lesson. The quartermaster has gone to the city for equipments, tents, etc., and when he returns we will soon be at the Manual of Arms. We expect Major Palon here to-day to take charge, and by the time Colonel B. and the rest get back, hope to have our recruits fit for turning over to any regiment that needs them.

October 27, 1863. Tuesday. It rained hard all day, consequently no drill or other work was attempted. Major Palon and the quartermaster came from the city, the latter with rubber blankets and shelter tents for the recruits. He also brought some letters, one for me telling about the draft at home. Those that are drafted can get off by hiring a substitute or by
paying $300, in which case a substitute is furnished them. I am glad I enlisted. There have been times when I could hardly say it, but I can say it now with all sincerity. More women and children have come, wives and children of the men we have. Poor things! I suppose they have nowhere else to go or to stay, so they have followed on after their husbands and fathers. I have heard that the government has provided camps for them, where rations are served to them just as to the soldiers. It is a very proper thing to do, and I hope it may be true that these helpless ones are thus provided for. This arming of the negroes is not such a simple affair as it seemed. This is a side I had not thought of, but I don't see how it can be dodged.

October 28, 1863. Wednesday. The rain has stopped, and the mud is now having its turn. It makes us just as helpless as the rain did. We have put in the time making plans for the time when the mud hardens. It does not dry up, as it does in the north, but the water seems to settle and leave the ground hard even if there be no sun or wind.

October 29, 1863. Thursday. After a council on matters and things in general, we have made some changes, looking to a more orderly arrangement of our camp life in these quarters. The hangers on about camp have been driven away. The quartermaster's stores and those of the commissary department have been separated and placed in tents outside, where they can be found and got at. The most intelligent among the recruits have been appointed corporals and sergeants, and the screws of discipline turned on just a little more. Guards are placed, more for their instruction than for our safety, and things are putting on more the appearance of a military camp than a mere
lounging place, as it has heretofore been. Just as we had got everything to our notion, a boat came, and on it were Captains Merritt and Enoch with 120 more recruits. Tents and blankets were given them and quarters assigned them, which altogether has made a busy day for us. Discipline, what little there had been, went to the winds when the men all got together. They all seemed to be acquainted, and such jabbering French as they had. I suppose they had lots of news to tell each other. Some can talk English, but all of them can and do talk French when talking to each other. They came from Colonel B.'s headquarters at Opelousas, and were in charge of Colonel Parker, who got left behind at Newtown, and will be along on the next boat. At night Dr. Warren, our surgeon to be, came from New Orleans, and to-morrow will examine the recruits. Sol Drake has been sent for to join Colonel B. at Opelousas and expects to leave on the next boat. Opelousas is beyond where I have been. I have posted Sol in getting as far as Mouton's, where we were, and beyond that he must find out for himself.

October 30, 1863. Friday. It has been a rainy day, but we have paid little attention to it. Dr. Warren finished up his examination and nearly every man passed muster. He was not as particular about it as Dr. Cole was at Hudson. As fast as examined and passed we gave them their new clothes, and a prouder set of people I never saw. Lieutenant Colonel Parker came at night with later word from Colonel B. and Drake does not have to go. For this he and the rest of us are glad. Colonel Parker brought eight men with him and about as many women. We have quite a respectable squad, and they are learning very fast—faster I think than we did when we first began. Those that were rejected by the surgeon as unsound are here yet, and what to do with them is a puzzle to us. We have each of us taken one, to do anything for us we can think of, and they seem perfectly happy.
Mine is named Tony, and is a great big good-natured soul, ready to do anything for me, if
I will only let him stay. He came to me at first asking if I would write a letter to his wife, and when I asked him what I should write, told me anything I was a mind to. I wrote the letter, telling her where he was, and how he was, and put in a word for some of the others for Tony's wife to tell their folks. This pleased him so much that he hung around trying to do me a favor in return, and when he was rejected by the doctor he said I must keep him, for he would be killed if he went back home, because he had enlisted. The government allows us transportation and a daily ration for a servant, so I am nothing out, for he asks no other pay than his board and the privilege of staying.

October 31, 1863. Saturday. Lieutenant Colonel Parker and Dr. Warren left us to look for a healthier place, as many of the men are getting chills and fever. The ground is low and wet and I suppose is a regular breeding place for fever and ague. We are glad of a prospect of a change, but this country is all swampy and wet. The Teche country comes the nearest to dry ground of anything I have seen. We are getting into full swing. Companies A, B, and C are organized and assigned to Captain Merritt, Captain Hoyt, and Captain Enoch. There are thirty men left and these are turned over to Lieutenant Reynolds for drill. At night, a telegram from Colonel Parker says we must stay at Brashear City until our regiment is full. I have been out of sorts to-day and have laid up for repairs.

November 1, 1863. Sunday. Was detailed for officer of the guard, but not feeling well Lieutenant Reynolds volunteered to act for me, for which I am very much obliged. I put in another day trying to be sick, but toward night gave it up as a failure. However, I put in the day by staying
indoors, writing letters for the men, some to their wives and some to their sweethearts. The more love I can put in the letters, and the bigger words I can use, the better they suit the sender. What effect they have on those that receive them I happily do not know.

November 2, 1863. Monday. I lay down last night thinking if only mother was here to fix me up a dose, as she has so many times done, I should be well right off. I soon dropped off, and the same thought kept right on going through my brain until I awoke this morning and found myself in the same position, lying crosswise of my bed just as I lay down last night. But my dream of home had cured me, and I was myself again, ready for whatever might come. I found myself again on the detail for guard. After the new guard was posted I had but little to do, except to see to it that the reliefs were changed at the proper time. There was no enemy in sight, though the guards were just as watchful as if the enemy had been in the next yard. The worst was to remember the names of the sergeants, and that I got round by writing them down. Even then I had to guess at some. At night Colonel Parker came back from the city, on his way to join Colonel B., who is at the front with the rest of the gang. He brought me two letters, one saying father is sick and the other saying he is well again. I am glad the good news came with the bad, though I had much rather no news of that kind would come. I also had a list of names of those drafted from the town of North East. John and Perry Loucks and Amon Briggs were among them. Whether they will go or get substitutes the letter did not say. Also that another proclamation from the President calls for 300,000 more men. I wonder if he knows what an army we are raising for him here. Report says an accident between here and Algiers last night killed twelve soldiers and wounded over sixty more. One train broke down and another ran into it, both loaded with soldiers. These
roads are so straight and level it would seem that accidents of that kind might be avoided.

November 3, 1863. Tuesday. I made a raise of a postage stamp to-day and sent a letter home. The day has passed like all do nowadays, with little to do. But it has been pleasant, and that is an exception I am happy to make a note of. The quartermaster came in to-night with more tents, and more supplies.

November 4, 1863. Wednesday. The steamer Red Chief came down the Teche this morning with more recruits, in charge of Lieutenants Gorton, Smith, Heath and Ames. This will make more work and I am glad of it. Lieutenant Colonel Parker has been on the point of starting up the country again for several days, but has not gone yet. To-day he has decided to move our quarters to higher ground. This is a wise thing to do according to Dr. Warren, for a great many of the men are sick with chills and fever. The site chosen is about a mile away. I am detailed to see that the stuff gets off, and the others are to be on the new site and receive it, and see to its proper distribution. I am temporarily assigned to Company D. By noon I had everything on the way, and after reaching camp helped to get Company D in as good shape as the others. A regular camp is laid out and company streets made. It made me think of the laying out of Camp Millington. Grading the company streets and other necessary work will give us something to do for days to come. I put in so much time helping the others get fixed that I forgot my own tent, and as Captain Enoch invited me to sleep with him, I accepted, and after fighting mosquitoes until nearly midnight, I fell asleep and remained so until late the next morning.
November 5, 1863. Thursday. Tony was waiting for me when I woke up, and was feeling badly because I had to go to the neighbors to sleep. After our hard-tack and coffee were safely stowed away, I got my tent out and we soon had it up. Then Tony began skirmishing for furnishings. He had seen what the others had and set out to beat them all. He got hold of a board wide enough and long enough for me to sleep on, and soon had legs driven in the ground to hold it up. My modest belongings were put under it, and the deed was done. Colonel Parker gave a few parting orders and then took boat for New Iberia to join Colonel B., leaving Captain Merritt in command. Captain Laird not yet having joined the command, I am curious to know what sort of a man I am to serve under. Company D is as yet made up of raw recruits, not yet having passed through the medical mill, so I have only to keep them within bounds until they are examined and sworn in as soldiers, when their education will begin. At night Dr. Warren and Lieutenant John Mathers came from New Orleans. A cold drizzling rain began about that time and we were driven into our tents, where the hungry mosquitoes awaited us and war was at once declared. If I had a brigade of men as determined as these Brashear City mosquitoes, I believe I could sweep the Rebellion off its feet in a month's time. They make no threats as our home mosquitoes do, but pounce right on and the first notice you get is a stab that brings the blood. I have had at least one bite for every word I have written about them, and all in the same time I have been writing it. The only escape from them is in the hot sun, or under a blanket so thick they cannot reach through it.

November 6, 1863. Friday. This morning Lieutenants Reynolds, Smith, Ames and myself formed a club of four for mutual protection against starvation. We have a rejected recruit for a cook, and have made a draft on the
commissary for salt horse, hard-tack and coffee. If he can't get up a meal on that, then he's no cook for us. My company was examined and almost every one proved to be sound enough for soldiers. A dozen at a time were taken into a tent, where they stripped and were put through the usual gymnastic performance, after which they were measured for shoes and a suit, and then another dozen called in. Some of them were scarred from head to foot where they had been whipped. One man's back was nearly all one scar, as if the skin had been chopped up and left to heal in ridges. Another had scars on the back of his neck, and from that all the way to his heels every little ways; but that was not such a sight as the one with the great solid mass of ridges, from his shoulders to his hips. That beat all the anti-slavery sermons ever yet preached. But this is over with now, and I don't wonder their prayers are mostly of thanks to Massa Linkum. They are very religious, holding prayer meetings every night, after which the fiddle begins and dancing goes on all night, if not stopped on account of the noise they make. I don't know how they get along with so little sleep, or rest. After the examination we got blankets and clothes from the quartermaster and they were fitted as well as it is possible to fit from a ready-made stock. Our cook, George, proved to be a jewel. He made salt beef taste so much like a chicken we didn't notice the difference. Major Palon came from the city at night, and brought some letters. One was for me and contained three dollars from my old crony, Walt Loucks. This will keep us in extras for a little while. We were some time deciding how to use it, but a majority thought a part of it should go for flour, so George could try his hand at pancakes.

November 7, 1863. Saturday. I have never described our camp, and may never have a better time than now.
We are out of town, to the north, on high, hard ground, for this country—so high that there is quite a slope towards the water of Berwick Bay. Company streets are laid out and
the camp kept clean by a detail made each day for that purpose. There are many large trees in and about our camp, and taken altogether we have never had a stopping-place quite equal to it. The sick list has shrunk already, though the hospital tent is pretty well filled yet. We have company-drill every day and there is quite a strife among us to see which can learn his troop the fastest. The men are as eager to learn as we are to have them, which makes it much easier for both parties. Berwick, which is directly opposite, is quite a place from the looks, larger than Brashear. It is the shipping port for the great Teche country that lies beyond. Just after dinner Colonel Tarbell's orderly rode into camp and inquired for me, handing me an order which read, "Lieutenant Lawrence Van Alstyne, commanding Company D, 90th U. S. C. I., at Brashear City, La. Captain Vallance, quartermaster, will furnish the bearer with a boat, in which he will proceed to Berwick and procure a sufficient supply of lumber to floor the hospital tent in said regiment. Signed, Tarbell, commander." I took five men and such tools as we could find and called on Captain Vallance, who gave us a boat in which we rowed across the bay, which was still as a mill pond. We landed near a shanty which easily came apart, and which had good wide boards, enough to floor several hospital tents. We made these into a raft which we towed back, reaching camp without having seen a person, except a guard—who considered my order good enough authority for letting the boards go. We had boards enough for the hospital tent and all the other tents, which as soon as they are dry will be used for the comfort of all hands. At night Lieutenant Gorton arrived from the city to take the next boat for Newtown to join Colonel B. Lieutenant Smith made me a present of a handsome pair of shoulder straps. The groundwork is dark velvet and the border of gold cord twisted and woven together.
Altogether they are as handsome a pair as I have ever seen on anybody's shoulders. I shall lay them away until I get a coat fit to put them on, and that won't be until after pay day. Thank you, Matt, I'll try and not disgrace them. I presume he
paid money for them that he needed for fodder; but that's just like Matt Smith. Major Palon also returned to-night, and made some changes. Lieutenant Ames, my partner in Company D, goes in the medical department as clerk, and Lieutenant Reynolds takes his place with me.

November 8, 1863. Sunday. On duty to-day as officer of the guard. Generally that is a light duty, but with these men it is not so much so. None of the men can read or write, and so the sergeant and corporal of each relief has to have the names of his relief repeated to him until he remembers them. Even then there are many mix-ups that have to be straightened out. The names are strange to me, and after writing them as they sound, I find it difficult to pronounce them. I went the rounds during every relief, and never failed to find something out of joint. One at the Major's tent, whom I had taken extra pains to educate, I found taking his gun apart to see how it was made. Another had his shoes and stockings off and was walking his beat with bare feet. Another had taken off his accoutrements and piled them up at the end of his beat and was strutting back and forth with folded arms. The only thing to do is to call up a man who speaks both French and English and through him straighten the matter out.

November 9, 1863. Monday. To-day an order came to move to New Orleans. That is, all the companies that are full. That leaves Company D here until more men come. There is a regular jollification over the order, as none of us are in love with this place. I suppose it would be a proper thing for me to introduce the officers of the Ninetieth to whoever the readers of this diary may be, and as there is nothing to prevent I will do it
now. If I ever get a chance to read it myself it will call them up before me as I now know them.

Colonel Edward Bostwick comes first, and any one who will be apt to read this knows him as well as I. But as I want the list complete I will begin with him and work down the line. He is about five feet ten inches, light complexion, gray eyes, with brown hair and beard. He is rather particular about his own appearance, and also that of the men under him. He is always on the lookout for a higher limb to roost on, and after getting there himself, is very good about helping his friends up to him. He seldom drinks, never to excess, and on the whole is a good soldier. He came out as captain of Company B, 128th New York. Was promoted to major of the First Louisiana Engineers, May 2, 1863. He served at Port Hudson with them and had the name of doing well whatever he was ordered to do. In August 1863, was promoted to the rank of colonel, with permission to raise a regiment from the freed slaves in this department, and this he is now trying to do.

Lieutenant Colonel George Parker is from Poughkeepsie. Came out as captain of Company D, 128th New York. On Colonel Bostwick's recommendation he was promoted to his present rank. He is about five feet seven inches, light complexion, sandy hair and beard. Is well up in military tactics, and is afraid of nothing. Rushes right into anything, regardless of getting out again. Is kind to his men, but a strict disciplinarian. When his orders are obeyed he is all right, but when he gets angry he acts without judgment or feeling for any one or anything.

Major Rufus J. Palon is from Hudson. Came out as second lieutenant in Company G, 128th New York. He has the army regulations and military tactics at his tongue's end. Is pretty strict on discipline, but never loses his head. Money has no value to him. He would give his last cent to any one in need, even though he might be just as needy himself.
Surgeon Charles E. Warren is tall, dark complexion, with dark sandy hair and beard. So far as I know he is a good surgeon. He is free with his money, and with the hospital whiskey. A real good fellow, though not in all things the sort one can pattern after with safety.

Quartermaster Peter J. Schemerhorn left home as orderly sergeant of Company G, 128th New York. Acted as second lieutenant of his company at Port Hudson, and was afterwards detailed as clerk at headquarters, where he remained until the formation of this regiment, when he was made first lieutenant and acting quartermaster. He makes a good quartermaster, seeing that his stock is kept up and ready for distribution.

Adjutant T. Augustus Phillips is one of the boys. He served in the Second Fire Zouaves in the three months' service and afterwards came out as orderly sergeant in the 165th New York. Was detailed as clerk at headquarters and in some way got a recommendation for adjutant in Colonel Bostwick's regiment. He is a New York tough. Gets drunk as a lord, and looks down upon any one else who does not do as he does. He is not as popular in the regiment as he might be.

Captain Thomas E. Merritt was formerly sergeant in Company I, 128th New York. Was raised to acting second lieutenant of same company, and finally promoted to captain in this regiment. He has traveled a great deal and remembers what he has seen. He seems well fitted for the position he now holds and stands well with all hands.

Captain Charles Hoyt is as good an all-round man as is often found. He is fine-looking, a fine singer, has a way of being everyone's friend, and making everyone a friend to himself. He is cut out more for society than for the army. He takes now and then a drink, but never gets beyond himself. Will share his last dollar or his last hard-tack with any one. Altogether, he acts as a sort of balance wheel to the rest of the machine, keeping some from going too fast, and
helping others to go faster. He would be missed if taken away, more than any half dozen of us.

Captain Richard Enoch came out as first sergeant of Company I, 128th New York. He was wounded at Port Hudson, and did not again join his company, being recommended for promotion as first lieutenant in the Corps de Afrique, from which he came to us with a captain's commission. He has a jovial disposition, but has a very quiet way of showing it. He sometimes takes a little too much, and then is reckless of his money and of the good name he has gained. Every one likes him, because they cannot help it. As a military man I doubt if he is ever heard much about. He had rather have a good time, and no matter what is going on he generally manages to have it.

There are several other officers who have not yet reported and of them I know nothing. One of them is Captain Laird, who will be captain of Company D, when he comes.

First Lieutenant Robert H. Clark was promoted from sergeant in the 116th New York. He is an excellent penman and would make a much better clerk in some department office than he ever will a soldier. He is rather hasty tempered, and has already had several jars with his brother officers, particularly with Adjutant Phillips, whose assistant he at present is. If Adjutant Phillips kicks clear out from the traces Lieutenant Clark will probably succeed him.

First Lieutenant Martin Smith was formerly an engineer on the Harlem R. R. He went out with a three months' regiment and afterwards as sergeant in Company G, 128th New York. He is open-hearted and outspoken. One can always tell where he is, for he is not deceitful. He is well liked by his brother officers. Just now he lies on his back on my bed making fun of a stove I have manufactured out of a camp kettle. He has no idea I am writing his biography.

First Lieutenant Reuben Reynolds is from Hudson, N. Y. He came out as a private in Company A, 128th New York. Was promoted to
corporal, then to sergeant and then to first lieutenant in this regiment. He looks as if he had just been taken from a bandbox. No matter what clothes he has on he always looks neat and well dressed. He was on a three years' whaling voyage before the war, and tells some very interesting stories of his life on shipboard. Before he came to us he was detailed as clerk in the Y. M. C. A. at New Orleans. He is a professor of religion, and I think tries to make his profession and his army life jibe. We all respect him, though none of us feel as if we fairly knew him.

First Lieutenant John Mathers is from Fishkill, N. Y. He came out as a private in Company F, 128th New York. Was promoted to second lieutenant in the Third Engineers, and from that to our regiment as first lieutenant. For some unknown reason he and I took a dislike to each other while in the 128th, and used to pass each other by as one surly dog does another. Since we have been thrown together we have talked the matter over, and neither of us can give any reason for our mutual dislike. We are the firmest of friends now, together much of the time we can call our own. We are not a bit alike. He is a regular dandy in appearance but the commonest sort of a fellow when you get at him.

First Lieutenant Charles Heath was a sergeant in Company I, 128th New York. Was given a commission in the Third Louisiana Engineers, and afterwards given the same position in this regiment. In my opinion his head is not right. He acts strange at times. Sometimes he is as quiet and docile as can be, and in a little while as profane and foul-mouthed a man as I ever met. Is not ambitious, but seems to take what comes as a matter of course. He has no intimates, keeping mostly to himself. What influence ever brought him up from the ranks I cannot imagine.

First Lieutenant Garret F. Dillon was promoted from sergeant in Company H, 128th New York. He is a very small man, has a lisp, and a mincing walk.
He looks and acts as if he was cut out for a dandy, but lacked the material for making one, and was thrown out in the shape he now is.
First Lieutenant Charles M. Bell was first sergeant of Company G, 128th New York. At the battle of Port Hudson he happened to be nearest Colonel Cowles when he fell. He received the colonel's dying message to his mother and was sent home with the body. He is one of the most capable of the whole lot of us. There is no position he could not fill, were it not for his liking for strong drink. This he does not seem able to control. I believe he tries to but lacks the strength to resist the temptations that are constantly placed in his way. Poor Bell, I pity him more than any other man here. With the right influences about him, what a different man he might be. He has more good traits than any of us can boast, but his one besetting weakness is strong enough to overcome them all.

First Lieutenant George H. Gorton enlisted in the 128th New York, as wagoner. Was promoted to commissary sergeant in the Third Louisiana Engineers, and from there he came as first lieutenant to this regiment. He is of a strange make-up. Is well liked by all, but not greatly respected by any. Is a good horseman and would probably make out better handling horses than he does men. Put him anywhere, and he manages to make money, and manages to spend it as fast as he gets it. Is free-hearted and obliging and I never knew of his having an enemy. Neither does he make any lasting friendships. He worked as teamster for Colonel Bostwick before going into the army, and it was through Colonel Bostwick that he got the position he now occupies.

First Lieutenant Henry C. Lay was a corporal in Company A, 128th New York. I knew him while in that regiment, but he has not yet reported for duty with us. He is on some special service and I suppose will sometime turn up among us. From what little I know of him I should say he will average well with the rest of us.

First Lieutenant George S. Drake was also with Colonel Bostwick before he entered the army. He was commissary sergeant in the 128th New York, and always in close touch with Colonel B. He and I have long been fast friends, so it will not do to say anything against him. But I couldn't if I would. There is nothing but good to say of