

Spoken Language Processing
Edited by
Joseph Mariani

First published in France in 2002 by Hermes Science/Lavoisier entitled Traitement automatique du
langage parlé 1 et 2 © LAVOISIER, 2002
First published in Great Britain and the United States in 2009 by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as
permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced,
stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers,
or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA.
Enquiries concerning reproduction outside these terms should be sent to the publishers at the
undermentioned address:
ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com
© ISTE Ltd, 2009
The rights of Joseph Mariani to be identified as the author of this work have been asserted by him in
accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Cataloging-in-Publication Data
Traitement automatique du langage parlé 1 et 2. English
Spoken language processing / edited by Joseph Mariani.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-84821-031-8
1. Automatic speech recognition. 2. Speech processing systems. I. Mariani, Joseph. II. Title.
TK7895.S65T7213 2008
006.4'54--dc22
2008036758
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN: 978-1-84821-031-8
Printed and bound in Great Britain by CPI Antony Rowe Ltd, Chippenham, Wiltshire.

Table of Contents
Preface
Chapter 1. Speech Analysis
Christophe D’ALESSANDRO
1.1. Introduction
1.1.1. Source-filter model
1.1.2. Speech sounds
1.1.3. Sources
1.1.4. Vocal tract
1.1.5. Lip-radiation
1.2. Linear prediction
1.2.1. Source-filter model and linear prediction
1.2.2. Autocorrelation method: algorithm
1.2.3. Lattice filter
1.2.4. Models of the excitation
1.3. Short-term Fourier transform
1.3.1. Spectrogram
1.3.2. Interpretation in terms of filter bank
1.3.3. Block-wise interpretation
1.3.4. Modification and reconstruction
1.4. A few other representations
1.4.1. Bilinear time-frequency representations
1.4.2. Wavelets
1.4.3. Cepstrum
1.4.4. Sinusoidal and harmonic representations
1.5. Conclusion
1.6. References
Chapter 2. Principles of Speech Coding
Gang FENG and Laurent GIRIN
2.1. Introduction
2.1.1. Main characteristics of a speech coder
2.1.2. Key components of a speech coder
2.2. Telephone-bandwidth speech coders
2.2.1. From predictive coding to CELP
2.2.2. Improved CELP coders
2.2.3. Other coders for telephone speech
2.3. Wideband speech coding
2.3.1. Transform coding
2.3.2. Predictive transform coding
2.4. Audiovisual speech coding
2.4.1. A transmission channel for audiovisual speech
2.4.2. Joint coding of audio and video parameters
2.4.3. Prospects
2.5. References
Chapter 3. Speech Synthesis
Olivier BOËFFARD and Christophe D’ALESSANDRO
3.1. Introduction
3.2. Key goal: speaking for communicating
3.2.1. What acoustic content?
3.2.2. What melody?
3.2.3. Beyond the strict minimum
3.3. Synoptic presentation of the elementary modules in speech synthesis systems
3.3.1. Linguistic processing
3.3.2. Acoustic processing
3.3.3. Training models automatically
3.3.4. Operational constraints
3.4. Description of linguistic processing
3.4.1. Text pre-processing
3.4.2. Grapheme-to-phoneme conversion
3.4.3. Syntactic-prosodic analysis
3.4.4. Prosodic analysis
3.5. Acoustic processing methodology
3.5.1. Rule-based synthesis
3.5.2. Unit-based concatenative synthesis
3.6. Speech signal modeling
3.6.1. The source-filter assumption
3.6.2. Articulatory model
3.6.3. Formant-based modeling
3.6.4. Auto-regressive modeling
3.6.5. Harmonic plus noise model
3.7. Control of prosodic parameters: the PSOLA technique
3.7.1. Methodology background
3.7.2. The ancestors of the method
3.7.3. Descendants of the method
3.7.4. Evaluation
3.8. Towards variable-size acoustic units
3.8.1. Constitution of the acoustic database
3.8.2. Selection of sequences of units
3.9. Applications and standardization
3.10. Evaluation of speech synthesis
3.10.1. Introduction
3.10.2. Global evaluation
3.10.3. Analytical evaluation
3.10.4. Summary for speech synthesis evaluation
3.11. Conclusions
3.12. References
Chapter 4. Facial Animation for Visual Speech
Thierry GUIARD-MARIGNY
4.1. Introduction
4.2. Applications of facial animation for visual speech
4.2.1. Animation movies
4.2.2. Telecommunications
4.2.3. Human-machine interfaces
4.2.4. A tool for speech research
4.3. Speech as a bimodal process
4.3.1. The intelligibility of visible speech
4.3.2. Visemes for facial animation
4.3.3. Synchronization issues
4.3.4. Source consistency
4.3.5. Key constraints for the synthesis of visual speech
4.4. Synthesis of visual speech
4.4.1. The structure of an artificial talking head
4.4.2. Generating expressions
4.5. Animation
4.5.1. Analysis of the image of a face
4.5.2. The puppeteer
4.5.3. Automatic analysis of the speech signal
4.5.4. From the text to the phonetic string
4.6. Conclusion
4.7. References
Chapter 5. Computational Auditory Scene Analysis
Alain DE CHEVEIGNÉ
5.1. Introduction
5.2. Principles of auditory scene analysis
5.2.1. Fusion versus segregation: choosing a representation
5.2.2. Features for simultaneous fusion
5.2.3. Features for sequential fusion
5.2.4. Schemes
5.2.5. Illusion of continuity, phonemic restoration
5.3. CASA principles
5.3.1. Design of a representation
5.4. Critique of the CASA approach
5.4.1. Limitations of ASA
5.4.2. The conceptual limits of “separable representation”
5.4.3. Neither a model, nor a method?
5.5. Perspectives
5.5.1. Missing feature theory
5.5.2. The cancellation principle
5.5.3. Multimodal integration
5.5.4. Auditory scene synthesis: transparency measure
5.6. References
Chapter 6. Principles of Speech Recognition
Renato DE MORI and Brigitte BIGI
6.1. Problem definition and approaches to the solution
6.2. Hidden Markov models for acoustic modeling
6.2.1. Definition
6.2.2. Observation probability and model parameters
6.2.3. HMM as probabilistic automata
6.2.4. Forward and backward coefficients
6.3. Observation probabilities
6.4. Composition of speech unit models
6.5. The Viterbi algorithm
6.6. Language models
6.6.1. Perplexity as an evaluation measure for language models
6.6.2. Probability estimation in the language model
6.6.3. Maximum likelihood estimation
6.6.4. Bayesian estimation
6.7. Conclusion
6.8. References
Chapter 7. Speech Recognition Systems
Jean-Luc GAUVAIN and Lori LAMEL
7.1. Introduction
7.2. Linguistic model
7.3. Lexical representation
7.4. Acoustic modeling
7.4.1. Feature extraction
7.4.2. Acoustic-phonetic models
7.4.3. Adaptation techniques
7.5. Decoder
7.6. Applicative aspects
7.6.1. Efficiency: speed and memory
7.6.2. Portability: languages and applications
7.6.3. Confidence measures
7.6.4. Beyond words
7.7. Systems
7.7.1. Text dictation
7.7.2. Audio document indexing
7.7.3. Dialog systems
7.8. Perspectives
7.9. References
Chapter 8. Language Identification
Martine ADDA-DECKER
8.1. Introduction
8.2. Language characteristics
8.3. Language identification by humans
8.4. Language identification by machines
8.4.1. LId tasks
8.4.2. Performance measures
8.4.3. Evaluation
8.5. LId resources
8.6. LId formulation
8.7. LId modeling
8.7.1. Acoustic front-end
8.7.2. Acoustic language-specific modeling
8.7.3. Parallel phone recognition
8.7.4. Phonotactic modeling
8.7.5. Back-end optimization
8.8. Discussion
8.9. References
Chapter 9. Automatic Speaker Recognition
Frédéric BIMBOT
9.1. Introduction
9.1.1. Voice variability and characterization
9.1.2. Speaker recognition
9.2. Typology and operation of speaker recognition systems
9.2.1. Speaker recognition tasks
9.2.2. Operation
9.2.3. Text-dependence
9.2.4. Types of errors
9.2.5. Influencing factors
9.3. Fundamentals
9.3.1. General structure of speaker recognition systems
9.3.2. Acoustic analysis
9.3.3. Probabilistic modeling
9.3.4. Identification and verification scores
9.3.5. Score compensation and decision
9.3.6. From theory to practice
9.4. Performance evaluation
9.4.1. Error rate
9.4.2. DET curve and EER
9.4.3. Cost function, weighted error rate and HTER
9.4.4. Distribution of errors
9.4.5. Orders of magnitude
9.5. Applications
9.5.1. Physical access control
9.5.2. Securing remote transactions
9.5.3. Audio information indexing
9.5.4. Education and entertainment
9.5.5. Forensic applications
9.5.6. Perspectives
9.6. Conclusions
9.7. Further reading
Chapter 10. Robust Recognition Methods
Jean-Paul HATON
10.1. Introduction
10.2. Signal pre-processing methods
10.2.1. Spectral subtraction
10.2.2. Adaptive noise cancellation
10.2.3. Space transformation
10.2.4. Channel equalization
10.2.5. Stochastic models
10.3. Robust parameters and distance measures
10.3.1. Spectral representations
10.3.2. Auditory models
10.3.3. Distance measure
10.4. Adaptation methods
10.4.1. Model composition
10.4.2. Statistical adaptation
10.5. Compensation of the Lombard effect
10.6. Missing data scheme
10.7. Conclusion
10.8. References
Chapter 11. Multimodal Speech: Two or Three Senses are Better than One
Jean-Luc SCHWARTZ, Pierre ESCUDIER and Pascal TEISSIER
11.1. Introduction
11.2. Speech is a multimodal process
11.2.1. Seeing without hearing
11.2.2. Seeing for hearing better in noise
11.2.3. Seeing for better hearing… even in the absence of noise
11.2.4. Bimodal integration imposes itself to perception
11.2.5. Lip reading as taking part to the ontogenesis of speech
11.2.6. ...and to its phylogenesis?
11.3. Architectures for audio-visual fusion in speech perception
11.3.1. Three paths for sensory interactions in cognitive psychology
11.3.2. Three paths for sensor fusion in information processing
11.3.3. The four basic architectures for audiovisual fusion
11.3.4. Three questions for a taxonomy
11.3.5. Control of the fusion process
11.4. Audio-visual speech recognition systems
11.4.1. Architectural alternatives
11.4.2. Taking into account contextual information
11.4.3. Pre-processing
11.5. Conclusions
11.6. References
Chapter 12. Speech and Human-Computer Communication
Wolfgang MINKER and Françoise NÉEL
12.1. Introduction
12.2. Context
12.2.1. The development of micro-electronics
12.2.2. The expansion of information and communication technologies and increasing interconnection of computer systems
12.2.3. The coordination of research efforts and the improvement of automatic speech processing systems
12.3. Specificities of speech
12.3.1. Advantages of speech as a communication mode
12.3.2. Limitations of speech as a communication mode
12.3.3. Multidimensional analysis of commercial speech recognition products
12.4. Application domains with voice-only interaction
12.4.1. Inspection, control and data acquisition
12.4.2. Home automation: electronic home assistant
12.4.3. Office automation: dictation and speech-to-text systems
12.4.4. Training
12.4.5. Automatic translation
12.5. Application domains with multimodal interaction
12.5.1. Interactive terminals
12.5.2. Computer-aided graphic design
12.5.3. On-board applications
12.5.4. Human-human communication facilitation
12.5.5. Automatic indexing of audio-visual documents
12.6. Conclusions
12.7. References
Chapter 13. Voice Services in the Telecom Sector
Laurent COURTOIS, Patrick BRISARD and Christian GAGNOULET
13.1. Introduction
13.2. Automatic speech processing and telecommunications
13.3. Speech coding in the telecommunication sector
13.4. Voice command in telecom services
13.4.1. Advantages and limitations of voice command
13.4.2. Major trends
13.4.3. Major voice command services
13.4.4. Call center automation (operator assistance)
13.4.5. Personal voice phonebook
13.4.6. Voice personal telephone assistants
13.4.7. Other services based on voice command
13.5. Speaker verification in telecom services
13.6. Text-to-speech synthesis in telecommunication systems
13.7. Conclusions
13.8. References
List of Authors
Index

Preface
This book, entitled Spoken Language Processing, addresses all aspects of the
automatic processing of spoken language: how to automate its production and
perception, how to synthesize and understand it. It draws on existing know-how
in the fields of signal processing, pattern recognition, stochastic modeling,
computational linguistics and human factors, but also relies on knowledge
specific to spoken language.
The automatic processing of spoken language covers activities related to the
analysis of speech, including variable-rate coding to store or transmit it; to its
synthesis, especially from text; and to its recognition and understanding, whether
for transcription, possibly followed by automatic indexing, or for human-machine
dialog or machine-assisted human-human interaction. It also includes speaker and
spoken language recognition. These tasks may take place in a noisy environment,
which makes the problem even more difficult.
The activities in the field of automatic spoken language processing started after
the Second World War with the works on the Vocoder and Voder at Bell Labs by
Dudley and colleagues, and were made possible by the availability of electronic
devices. Initial research work on basic recognition systems was carried out with very
limited computing resources in the 1950s. The computer facilities that became
available to researchers in the 1970s made it possible to achieve initial progress
within laboratories, and microprocessors then led to the early commercialization of
the first voice recognition and speech synthesis systems at an affordable price. The
steady progress in the speed of computers and in the storage capacity accompanied
the scientific advances in the field.
Research investigations in the 1970s, including those carried out in the large
DARPA “Speech Understanding Systems” (SUS) program in the USA, suffered
from a lack of availability of speech data and of means and methods for evaluating
the performance of different approaches and systems. The establishment by
DARPA, as part of its following program launched in 1984, of a national language
resources center, the Linguistic Data Consortium (LDC), and of a system assessment
center, within the National Institute of Standards and Technology (NIST, formerly
NBS), brought this area of research into maturity. The evaluation campaigns in the
area of speech recognition, launched in 1987, made it possible to compare the
different approaches that had coexisted up to then, based on “Artificial Intelligence”
methods or on stochastic modeling methods using large amounts of data for training,
with a clear advantage to the latter. This led progressively to a quasi-generalization
of stochastic approaches in most laboratories in the world. The progress made by
researchers has constantly accompanied the increasing difficulty of the tasks which
were handled, starting from the recognition of sentences read aloud, with a limited
vocabulary of 1,000 words, either speaker-dependent or speaker-independent, to the
dictation of newspaper articles for vocabularies of 5,000, 20,000 and 64,000 words,
and then to the transcription of radio or television broadcast news, with unlimited
size vocabularies. These evaluations were opened to the international community in
1992. They first focused on the American English language, but early initiatives
were also carried out on the French, German or British English languages in a
French or European context. Other campaigns were subsequently held on speaker
recognition, language identification or speech synthesis in various contexts,
allowing for a better understanding of the pros and cons of an approach, and for
measuring the status of technology and the progress achieved or still to be achieved.
They led to the conclusion that a sufficient level of maturation had been reached for
putting the technology on the market, in the field of voice dictation systems for
example. However, they also identified the difficulty of other more challenging
problems, such as those related to the recognition of conversational speech,
justifying the need to keep on supporting fundamental research in this area.
This book consists of two parts: a first part discusses the analysis and synthesis
of speech and a second part speech recognition and understanding. The first part
starts with a brief introduction of the principles of speech production, followed by a
broad overview of the methods for analyzing speech: linear prediction, short-term
Fourier transform, time-representations, wavelets, cepstrum, etc. The main methods
for speech coding are then developed for the telephone bandwidth, such as the CELP
coder, or, for broadband communication, such as “transform coding” and
quantization methods. The audio-visual coding of speech is also introduced. The
various operations to be carried out in a text-to-speech synthesis system are then
presented regarding the linguistic processes (grapheme-to-phoneme transcription,
syntactic and prosodic analysis) and the acoustic processes, using rule-based
approaches or approaches based on the concatenation of variable length acoustic
units. The different types of speech signal modeling – articulatory, formant-based,
auto-regressive, harmonic-noise or PSOLA-like – are then described. The evaluation
of speech synthesis systems is a topic of specific attention in this chapter. The
extension of speech synthesis to talking faces animation is the subject of the next
chapter, with a presentation of the application fields, of the interest of a bimodal
approach and of models used to synthesize and animate the face. Finally,
computational auditory scene analysis opens prospects in the signal processing of
speech, especially in noisy environments.
The second part of the book focuses on speech recognition. The principles of
speech recognition are first presented. Hidden Markov models are introduced, as
well as their use for the acoustic modeling of speech. The Viterbi algorithm is
depicted, before introducing language modeling and the way to estimate
probabilities. It is followed by a presentation of recognition systems, based on those
principles and on the integration of those methodologies, and of lexical and
acoustic-phonetic knowledge. The applicative aspects are highlighted, such as
efficiency, portability and confidence measures, before describing three types of
recognition systems: for text dictation, for audio documents indexing and for oral
dialog. Research in language identification aims at recognizing which language is
spoken, using acoustic, phonetic, phonotactic or prosodic information. The
characteristics of languages are introduced and the way humans or machines can
achieve that task is depicted, with a large presentation of the present performances
of such systems. Speaker recognition addresses the recognition and verification of
the identity of a person based on his voice. After an introduction on what
characterizes a voice, the different types and designs of systems are presented, as
well as their theoretical background. The way to evaluate the performances of
speaker recognition systems and the applications of this technology are a specific
topic of interest. The use of speech or speaker recognition systems in noisy
environments raises especially difficult problems to solve, but they must be taken
into account in any operational use of such systems. Various methods are available,
either by pre-processing the signal, during the parameterization phase, by using
specific distances or by adaptation methods. The Lombard effect, which causes a
change in the production of the voice signal itself due to the noisy environment
surrounding the speaker, receives special attention. Along with recognition
based solely on the acoustic signal, bi-modal recognition combines two acquisition
channels: auditory and visual. The value added by bimodal processing in a noisy
environment is emphasized and architectures for the audiovisual merging of audio
and visual speech recognition are presented. Finally, applications of automatic
spoken language processing systems, generally for human-machine communication
and particularly in telecommunications, are described. Many applications of speech
coding, recognition or synthesis exist in many fields, and the market is growing
rapidly. However, there are still technological and psychological barriers that require
more work on modeling human factors and ergonomics, in order to make those
systems widely accepted.
The reader, undergraduate or graduate student, engineer or researcher will find in
this book many contributions of leading French experts of international renown who
share the same enthusiasm for this exciting field: the processing by machines of a
capacity which used to be specific to humans: language.
Finally, as editor, I would like to warmly thank Anna and Frédéric Bimbot for
the excellent work they achieved in translating the book Traitement automatique du
langage parlé, on which this book is based.
Joseph Mariani
November 2008

Chapter 1
Speech Analysis
1.1. Introduction
1.1.1. Source-filter model
Speech, the acoustic manifestation of language, is probably the main means of
communication between human beings. The invention of telecommunications and
the development of digital information processing have therefore entailed vast
amounts of research aimed at understanding the mechanisms of speech
communication.
Speech can be approached from different angles. In this chapter, we will
consider speech as a signal, a one-dimensional function, which depends on the time
variable (as in [BOI 87, OPP 89, PAR 86, RAB 75, RAB 77]). The acoustic speech
signal is obtained at a given point in space by a sensor (microphone) and converted
into electrical values. These values are denoted s(t) and they represent a real-valued
function of real variable t, analogous to the variation of the acoustic pressure. Even
if the acoustic form of the speech signal is the most widespread (it is the only signal
transmitted over the telephone), other types of analysis also exist, based on
alternative physiological signals (for instance, the electroglottographic signal, the
palatographic signal, the airflow), or related to other modalities (for example, the
image of the face or the gestures of the articulators). The field of speech analysis
covers the set of methods aiming at the extraction of information on and from this
signal, in various applications, such as:
Chapter written by Christophe D’ALESSANDRO.
– speech coding: the compression of information carried by the acoustic signal,
in order to save data storage or to reduce transmission rate;
– speech recognition and understanding, speaker and spoken language
recognition;
– speech synthesis or automatic speech generation, from an arbitrary text;
– speech signal processing, which covers many applications, such as auditory
aid, denoising, speech encrypting, echo cancellation, post-processing for audiovisual
applications;
– phonetic and linguistic analysis, speech therapy, voice monitoring in
professional situations (for instance, singers, speakers, teachers, managers, etc.).
Two ways of approaching signal analysis can be distinguished: the model-based
approach and the representation-based approach. When a voice signal model (or a
voice production model or a voice perception model) is assumed, the goal of the
analysis step is to identify the parameters of that model. Thus, many analysis
methods, referred to as parametric methods, are based on the source-filter model of
speech production; for example, the linear prediction method. On the other hand,
when no particular hypothesis is made on the signal, mathematical representations
equivalent to its time representation can be defined, so that new information can be
drawn from the coefficients of the representation. An example of a non-parametric
method is the short-term Fourier transform (STFT). Finally, there are some hybrid
methods (sometimes referred to as semi-parametric). These consist of estimating
some parameters from non-parametric representations. The sinusoidal and cepstral
representations are examples of semi-parametric representation.
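As a minimal illustration of a non-parametric representation, the following sketch computes a short-term Fourier transform with a naive DFT in pure Python. The frame length, hop size, Hann window, sampling rate and test tone are illustrative assumptions and are not prescribed by this chapter.

```python
import cmath
import math

def stft(signal, frame_len=64, hop=32):
    """Naive short-term Fourier transform: one DFT per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Keep only the non-negative frequency bins 0 .. frame_len / 2.
        spectrum = [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
                    for k in range(frame_len // 2 + 1)]
        frames.append(spectrum)
    return frames

# Hypothetical test signal: a 1 kHz sinusoid sampled at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]
spectra = stft(tone)

# Frequency of the strongest bin in the first frame.
peak_bin = max(range(len(spectra[0])), key=lambda k: abs(spectra[0][k]))
peak_hz = peak_bin * fs / 64
```

No model of the signal is assumed here: each frame is simply re-expressed on a basis of complex exponentials, and information such as the dominant frequency is then read from the coefficients.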
This chapter is centered on the linear acoustic source-filter speech production
model. It presents the most common speech signal analysis techniques, together with
a few illustrations. The reader is assumed to be familiar with the fundamentals of
digital signal processing, such as discrete-time signals, Fourier transform, Laplace
transform, Z-transforms and digital filters.
1.1.2. Speech sounds
The human speech apparatus can be broken down into three functional parts
[HAR 76]: 1) the lungs and trachea, 2) the larynx and 3) the vocal tract. The
abdomen and thorax muscles are the engine of the breathing process. Compressed
by the muscular system, the lungs act as bellows and supply some air under pressure
which travels through the trachea (subglottic pressure). The airflow thus expired is
then modulated by the movements of the larynx and those of the vocal tract.
The larynx is composed of the set of muscles, articulated cartilage, ligaments and
mucous membranes located between the trachea on one side, and the pharyngeal
cavity on the other side. The cartilage, ligaments and muscles in the larynx can set
the vocal cords in motion, the opening of which is called the glottis. When the vocal
cords lie apart from each other, the air can circulate freely through the glottis and no
sound is produced. When both membranes are close to each other, they can join and
modulate the subglottic airflow and pressure, thus generating isolated pulses or
vibrations. The fundamental frequency of these vibrations governs the pitch of the
voice signal (F0).
The vocal tract can be subdivided into three cavities: the pharynx (from the
larynx to the velum and the back of the tongue), the oral tract (from the pharynx to
the lips) and the nasal cavity. When it is open, the velum is able to divert some air
from the pharynx to the nasal cavity. The geometrical configuration of the vocal
tract depends on the organs responsible for the articulation: jaws, lips, tongue.
Each language uses a certain subset of sounds, among those that the speech
apparatus can produce [MAL 74]. The smallest distinctive sound units used in a
given language are called phonemes. The phoneme is the smallest spoken unit
which, when substituted with another one, changes the linguistic content of an
utterance. For instance, changing the initial /p/ sound of “pig” (/pIg/) into /b/ yields
a different word: “big” (/bIg/). Therefore, the phonemes /p/ and /b/ can be
distinguished from each other.
A set of phonemes, which can be used for the description of various languages
[WEL 97], is given in Table 1.1 (described both by the International Phonetic
Alphabet, IPA, and the computer readable Speech Assessment Methodologies
Phonetic Alphabet, SAMPA). The first subdivision that is observed relates to the
excitation mode and to the vocal tract stability: the distinction between vowels and
consonants. Vowels correspond to a periodic vibration of the vocal cords and to a
stable configuration of the vocal tract. Depending on whether the nasal branch is
open or not (as a result of the lowering of the velum), vowels have either a nasal or
an oral character. Semivowels are produced when the periodic glottal excitation
occurs simultaneously with a fast movement of the vocal tract, between two vocalic
positions.
Consonants correspond to fast constriction movements of the articulatory organs,
i.e. generally to rather unstable sounds, which evolve over time. For fricatives, a
strong constriction of the vocal tract causes a friction noise. If the vocal cords
vibrate at the same time, the fricative consonant is then voiced. Otherwise, if the
vocal folds let the air pass through without producing any sound, the fricative is
unvoiced. Plosives are obtained by a complete obstruction of the vocal tract,
followed by a release phase. If produced together with the vibration of the vocal
cords, the plosive is voiced, otherwise it is unvoiced. If the nasal branch is opened
during the mouth closure, the produced sound is a nasal consonant. Semivowels are
considered voiced consonants, resulting from a fast movement which briefly passes
through the articulatory position of a vowel. Finally, liquid consonants are produced
as the combination of a voiced excitation and fast articulatory movements, mainly
from the tongue.
SAMPA   ASCII  IPA  Unicode label      hex   dec.  Description and exemplification
Vowels
A       65     ɑ    script a           0251  593   open back unrounded, Cardinal 5, Eng. start
{       123    æ    ae ligature        00E6  230   near-open front unrounded, Eng. trap
6       54     ɐ    turned a           0250  592   open schwa, Ger. besser
Q       81     ɒ    turned script a    0252  594   open back rounded, Eng. lot
E       69     ɛ    epsilon            025B  603   open-mid front unrounded, Fr. même
@       64     ə    turned e           0259  601   schwa, Eng. banana
3       51     ɜ    rev. epsilon       025C  604   long mid central, Eng. nurse
I       73     ɪ    small cap I        026A  618   lax close front unrounded, Eng. kit
O       79     ɔ    turned c           0254  596   open-mid back rounded, Eng. thought
2       50     ø    o-slash            00F8  248   close-mid front rounded, Fr. deux
9       57     œ    oe ligature        0153  339   open-mid front rounded, Fr. neuf
&       38     ɶ    s.c. OE ligature   0276  630   open front rounded, Swedish skörd
U       85     ʊ    upsilon            028A  650   lax close back rounded, Eng. foot
}       125    ʉ    barred u           0289  649   close central rounded, Swedish sju
V       86     ʌ    turned v           028C  652   open-mid back unrounded, Eng. strut
Y       89     ʏ    small cap Y        028F  655   lax [y], Ger. hübsch
Consonants
B       66     β    beta               03B2  946   voiced bilabial fricative, Sp. cabo
C       67     ç    c-cedilla          00E7  231   voiceless palatal fricative, Ger. ich
D       68     ð    eth                00F0  240   voiced dental fricative, Eng. then
G       71     ɣ    gamma              0263  611   voiced velar fricative, Sp. fuego
L       76     ʎ    turned y           028E  654   palatal lateral, It. famiglia
J       74     ɲ    left-tail n        0272  626   palatal nasal, Sp. año
N       78     ŋ    eng                014B  331   velar nasal, Eng. thing
R       82     ʁ    inv. s.c. R        0281  641   voiced uvular fricative or trill, Fr. roi
S       83     ʃ    esh                0283  643   voiceless palatoalveolar fricative, Eng. ship
T       84     θ    theta              03B8  952   voiceless dental fricative, Eng. thin
H       72     ɥ    turned h           0265  613   labial-palatal semivowel, Fr. huit
Z       90     ʒ    ezh (yogh)         0292  658   voiced palatoalveolar fricative, Eng. measure
?       63     ʔ    dotless ?          0294  660   glottal stop, Ger. Verein, also Danish stød
Table 1.1. Computer-readable Speech Assessment Methodologies Phonetic Alphabet,
SAMPA, and its correspondence in the International Phonetic Alphabet,
IPA, with examples in 6 different languages [WEL 97]
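The correspondence of Table 1.1 lends itself to a simple lookup. The sketch below copies the Unicode code points from the "hex" column and converts a SAMPA string to IPA character by character. The function name is hypothetical, the key "&" is inferred from the ASCII code 38 listed in the table, and passing unlisted characters through unchanged is an assumption made for illustration.

```python
# SAMPA symbol -> Unicode code point, copied from the "hex" column of Table 1.1.
SAMPA_TO_CODEPOINT = {
    # Vowels
    "A": 0x0251, "{": 0x00E6, "6": 0x0250, "Q": 0x0252, "E": 0x025B,
    "@": 0x0259, "3": 0x025C, "I": 0x026A, "O": 0x0254, "2": 0x00F8,
    "9": 0x0153, "&": 0x0276, "U": 0x028A, "}": 0x0289, "V": 0x028C,
    "Y": 0x028F,
    # Consonants
    "B": 0x03B2, "C": 0x00E7, "D": 0x00F0, "G": 0x0263, "L": 0x028E,
    "J": 0x0272, "N": 0x014B, "R": 0x0281, "S": 0x0283, "T": 0x03B8,
    "H": 0x0265, "Z": 0x0292, "?": 0x0294,
}

def sampa_to_ipa(transcription):
    """Convert a SAMPA string to IPA, one character at a time.

    Characters absent from the table are passed through unchanged, on the
    assumption that they are written identically in both alphabets.
    """
    return "".join(chr(SAMPA_TO_CODEPOINT[ch]) if ch in SAMPA_TO_CODEPOINT else ch
                   for ch in transcription)

# The minimal pair used earlier in this section.
pig = sampa_to_ipa("pIg")
big = sampa_to_ipa("bIg")
```

Only the symbols listed in Table 1.1 are covered; a complete converter would also need the multi-character SAMPA symbols and diacritics, which are outside the scope of this sketch.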
In speech production, sound sources appear to be relatively localized; they excite
the acoustic cavities in which the resulting air disturbances propagate and then
radiate to the outer acoustic field. This relative independence of the sources with the
transformations that they undergo is the basis for the acoustic theory of speech
production [FAN 60, FLA 72, STE 99]. This theory considers source terms, on the
one hand, which are generally assumed to be non-linear, and a linear filter on the
other hand, which acts upon and transforms the source signal. This source-filter
decomposition reflects the terminology commonly used in phonetics, which
describes the speech sounds in terms of “phonation” (source) and “articulation”
(filter). The source and filter acoustic contributions can be studied separately, as
they can be considered to be decoupled from each other, in a first approximation.
From the point of view of physics, this model is an approximation, the main
advantage of which is its simplicity. It can be considered as valid at frequencies
below 4 or 5 kHz, i.e. those frequencies for which the propagation in the vocal tract
consists of one-dimensional plane waves. For signal processing purposes, the
acoustic model can be described as a linear system, by neglecting the source-filter
interaction:
$$ s(t) = e(t) * v(t) * l(t) = [p(t) + r(t)] * v(t) * l(t) $$ [1.1]
$$ s(t) = \left[ \sum_{i=-\infty}^{+\infty} \delta(t - iT_0) * u_g(t) + r(t) \right] * v(t) * l(t) $$ [1.2]
$$ S(\omega) = E(\omega) \times V(\omega) \times L(\omega) = [P(\omega) + R(\omega)] \times V(\omega) \times L(\omega) $$ [1.3]
$$ S(\omega) = \left[ \left( \sum_{i=-\infty}^{+\infty} \delta(\omega - iF_0) \right) |U_g(\omega)| e^{j\theta_{u_g}(\omega)} + |R(\omega)| e^{j\theta_r(\omega)} \right] \times |V(\omega)| e^{j\theta_v(\omega)} \times |L(\omega)| e^{j\theta_l(\omega)} $$ [1.4]
where s(t) is the speech signal, v(t) the impulse response of the vocal tract, e(t) the
vocal excitation source, l(t) the impulse response of the lip radiation component, p(t)
the periodic part of the excitation, r(t) the non-periodic (noise) part of the excitation,
ug(t) the glottal airflow wave, T0 the fundamental period, δ the Dirac distribution,
and where S(ω), V(ω), E(ω), L(ω), P(ω), R(ω), Ug(ω) denote the Fourier transforms
of s(t), v(t), e(t), l(t), p(t), r(t), ug(t) respectively. F0 = 1/T0 is the voicing
fundamental frequency. The various terms of the source-filter model are now going
to be studied in more detail.
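The time-domain form of the source-filter model can be sketched in discrete time as a chain of convolutions, s = (p + r) * v * l. All numerical choices below are assumptions made for illustration: the sampling rate, the polynomial glottal pulse, the single damped resonance standing in for the vocal tract, and the first-order difference standing in for lip radiation.

```python
import math
import random

def convolve(x, h):
    """Direct-form linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

fs = 8000                    # sampling rate in Hz (assumed)
f0 = 100                     # fundamental frequency F0 in Hz (assumed)
t0 = fs // f0                # fundamental period T0 in samples
n_samples = 800

# Periodic part p: Dirac comb of period T0 convolved with a glottal pulse u_g.
comb = [1.0 if n % t0 == 0 else 0.0 for n in range(n_samples)]
open_len = int(0.6 * t0)     # open phase lasting 60% of the period (assumed)
u_g = [(n / open_len) ** 2 - (n / open_len) ** 3 for n in range(open_len)]
p = convolve(comb, u_g)[:n_samples]

# Noise part r: low-level white Gaussian noise.
rng = random.Random(0)
r = [0.01 * rng.gauss(0.0, 1.0) for _ in range(n_samples)]

# Excitation e = p + r.
e = [pn + rn for pn, rn in zip(p, r)]

# Vocal tract v: impulse response of one damped resonance at 500 Hz (toy filter).
v = [math.exp(-200.0 * n / fs) * math.sin(2 * math.pi * 500 * n / fs)
     for n in range(200)]

# Lip radiation l: first-order difference, a crude stand-in for a derivative.
l = [1.0, -1.0]

# Speech signal s = e * v * l, truncated to the excitation length.
s = convolve(convolve(e, v), l)[:n_samples]
```

Because the system is linear and the source is assumed independent of the filter, the order of the convolutions does not matter, which is what allows source and filter to be studied separately.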
1.1.3. Sources
The source component e(t), E(ω) is a signal composed of a periodic part
(vibrations of the vocal cords, characterized by F0 and the glottal airflow waveform)
and a noise part. The various phonemes use both types of source excitation either
separately or simultaneously.
1.1.3.1. Glottal airflow wave
The study of glottal activity (phonation) is particularly important in speech
science. Physical models of the glottis functioning, in terms of mass-spring systems
have been investigated [FLA 72]. Several types of physiological signals can be used
to conduct studies on the glottal activity (for example, electroglottography, fast
photography, see [TIT 94]). From the acoustic point of view, the glottal airflow
wave, which represents the airflow traveling through the glottis as a function of
time, is preferred to the pressure wave. It is indeed easier to measure the glottal
airflow rather than the glottal pressure, from physiological data. Moreover, the
pseudo-periodic voicing source p(t) can be broken down into two parts: a pulse
train, which represents the periodic part of the excitation and a low-pass filter, with
an impulse response ug, which corresponds to the (frequency-domain and time-
domain) shape of the glottal airflow wave.
The time-domain shape of the glottal airflow wave (or, more precisely, of its
derivative) generally governs the behavior of the time-domain signal for vowels and
voiced signals [ROS 71]. Time-domain models of the glottal airflow have several
properties in common: they are periodic, always non-negative (no incoming
airflow), and continuous functions of the time variable, differentiable everywhere
except, in some cases, at the closing instant. An example of such a time-domain
model is the Klatt model [KLA 90], which calls for 4 parameters (the fundamental
frequency F0, the voicing amplitude AV, the opening ratio Oq and the attenuation TL
of a spectral attenuation filter). When there is no attenuation, the KLGLOTT88 model
is written:
$$ U_g(t) = \begin{cases} a t^2 - b t^3 & \text{for } 0 \le t \le O_q T_0 \\ 0 & \text{for } O_q T_0 \le t \le T_0 \end{cases} \quad \text{with} \quad a = \frac{27\, AV}{4\, O_q^2\, T_0}, \quad b = \frac{27\, AV}{4\, O_q^3\, T_0^2} $$ [1.5]
When TL ≠ 0, Ug(t) is filtered by an additional low-pass filter, with an attenuation at
3,000 Hz equal to TL dB.
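The polynomial glottal pulse described above can be evaluated directly. The sketch below implements U_g(t) = a t^2 - b t^3 over the open phase, with a = 27 AV / (4 Oq^2 T0) and b = 27 AV / (4 Oq^3 T0^2), and no additional attenuation filter. The function name, the parameter values and the sampling rate are assumptions chosen for illustration.

```python
def glottal_flow_klatt(t, f0, av, oq):
    """Glottal flow over one period (0 <= t <= T0), without spectral attenuation."""
    t0 = 1.0 / f0
    a = 27.0 * av / (4.0 * oq ** 2 * t0)
    b = 27.0 * av / (4.0 * oq ** 3 * t0 ** 2)
    if 0.0 <= t <= oq * t0:
        return a * t ** 2 - b * t ** 3   # open phase
    return 0.0                           # closed phase

# Illustrative parameter values (assumed): F0 = 100 Hz, AV = 1, Oq = 0.6.
f0, av, oq = 100.0, 1.0, 0.6
fs = 16000.0                             # sampling rate in Hz (assumed)
samples = [glottal_flow_klatt(n / fs, f0, av, oq) for n in range(int(fs / f0))]

# The flow peaks at two thirds of the open phase and returns to zero at Oq * T0.
t_peak = 2.0 * oq / (3.0 * f0)
peak = glottal_flow_klatt(t_peak, f0, av, oq)
```

With these coefficients the maximum of the pulse equals AV × T0, so the voicing amplitude scales the waveform while Oq only stretches or compresses the open phase.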
The LF model [FAN 85] represents the derivative of the glottal airflow with 5
parameters (fundamental period T0, amplitude at the minimum of the derivative or at
the maximum of the wave Ee, instant of maximum excitation Te, instant of
maximum airflow wave Tp, time constant for the return phase Ta):
$$ U'_g(t) = \begin{cases} -E_e\, e^{a(t - T_e)}\, \dfrac{\sin(\pi t / T_p)}{\sin(\pi T_e / T_p)} & \text{for } 0 \le t \le T_e \\ -\dfrac{E_e}{\varepsilon T_a} \left( e^{-\varepsilon (t - T_e)} - e^{-\varepsilon (T_0 - T_e)} \right) & \text{for } T_e \le t \le T_0 \end{cases} $$ [1.6]
In this equation, parameter ε is defined by an implicit equation:
$$ \varepsilon T_a = 1 - e^{-\varepsilon (T_0 - T_e)} $$ [1.7]
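Since ε is only defined implicitly, it has to be found numerically. The sketch below solves ε Ta = 1 - exp(-ε (T0 - Te)) for its non-zero root by bisection and uses the result in the return-phase branch of the LF model. The function names and the parameter values are assumptions for illustration, and the sign convention of the return phase (a negative derivative that rises back to zero at T0) is the usual one rather than something confirmed by this text.

```python
import math

def lf_epsilon(t0, te, ta, iterations=200):
    """Non-zero root of eps * Ta = 1 - exp(-eps * (T0 - Te)), found by bisection."""
    tr = t0 - te                      # duration of the return phase
    if not 0.0 < ta < tr:
        raise ValueError("expected 0 < Ta < T0 - Te")

    def residual(eps):
        # eps * Ta - (1 - exp(-eps * tr)), written with expm1 for accuracy.
        return eps * ta + math.expm1(-eps * tr)

    # The residual is negative just above zero and positive at 1 / Ta.
    lo, hi = 1e-9 / ta, 1.0 / ta
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lf_return_phase(t, ee, t0, te, ta):
    """Derivative of the glottal flow for Te <= t <= T0 (return phase)."""
    eps = lf_epsilon(t0, te, ta)
    return -ee / (eps * ta) * (math.exp(-eps * (t - te)) - math.exp(-eps * (t0 - te)))

# Illustrative values (assumed): T0 = 10 ms, Te = 6 ms, Ta = 0.3 ms, Ee = 1.
t0, te, ta, ee = 0.010, 0.006, 0.0003, 1.0
eps = lf_epsilon(t0, te, ta)
at_te = lf_return_phase(te, ee, t0, te, ta)   # expected to be close to -Ee
at_t0 = lf_return_phase(t0, ee, t0, te, ta)   # expected to be close to 0
```

The implicit equation is exactly the condition that makes the return phase start at -Ee at the instant Te, which is why the two branches of the model join continuously.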

All time-domain models (see Figure 1.1) have at least three main parameters: the
voicing amplitude, which governs the time-domain amplitude of the wave, the
voicing period, and the opening duration, i.e. the fraction of the period during which
the wave is non-zero. In fact, the glottal wave represents the airflow traveling
through the glottis. This flow is zero when the vocal cords are closed. It is positive
when they are open. A fourth parameter is introduced in some models to account for
the speed at which the glottis closes. This closing speed is related to the high
frequency part of the speech spectrum.
Figure 1.1. Models of the glottal airflow waveform in the time domain: triangular model,
Rosenberg model, KLGLOTT88, LF and the corresponding spectra
The general shape of the glottal airflow spectrum is one of a low-pass filter. Fant
[FAN 60] uses four poles on the negative real axis:
$$ U_g(s) = \frac{U_{g0}}{\prod_{r=1}^{4} \left( 1 - \dfrac{s}{s_r} \right)} $$ [1.8]
with sr1 ≈ sr2 = 2π × 100 Hz, and sr3 = 2π × 2,000 Hz, sr4 = 2π × 4,000 Hz. This is a
spectral model with six parameters (F0, Ug0 and four poles), among which two are
fixed (sr3 and sr4). This simple form is used in [MAR 76] in the digital domain, as a
second-order low-pass filter, with a double real pole at z = K:
$$ U_g(z) = \frac{U_{g0}}{(1 - K z^{-1})^2} $$ [1.9]
Two poles are sufficient in this case, as the numerical model is only valid up to
approximately 4,000 Hz. Such a filter depends on three parameters: gain Ug0, which
corresponds to the voicing amplitude, fundamental frequency F0 and a frequency
parameter K, which replaces both sr1 and sr2. The spectrum shows an asymptotic
slope of –12 dB/octave when the frequency increases. Parameter K controls the
filter’s cut-off frequency. When the frequency tends towards zero, |Ug(0)| ∼ Ug0.
Therefore, the spectral slope is zero in the neighborhood of zero, and –12 dB/octave,
for frequencies above a given bound (determined by K). When the focus is put on
the derivative of the glottal airflow, the two asymptotes have slopes of +6 dB/octave
and –6 dB/octave respectively. This explains the existence of a maximum in the
speech spectrum at low frequencies, stemming from the glottal source.
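The asymptotic slopes quoted above can be checked numerically on the second-order digital filter U_g(z) = U_g0 / (1 - K z^-1)^2. In the sketch below the sampling rate, the value of K and the use of a first-order difference (1 - z^-1) as a stand-in for the derivative are assumptions for illustration; the slopes are measured as level differences over one octave.

```python
import cmath
import math

def glottal_lowpass_db(f, k, fs, ug0=1.0):
    """Magnitude in dB of U_g(z) = U_g0 / (1 - K z^-1)^2 at frequency f (Hz)."""
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    return 20.0 * math.log10(abs(ug0 / (1.0 - k * z_inv) ** 2))

def glottal_derivative_db(f, k, fs, ug0=1.0):
    """Same filter followed by a first-order difference (discrete derivative)."""
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    return 20.0 * math.log10(abs(ug0 * (1.0 - z_inv) / (1.0 - k * z_inv) ** 2))

fs = 8000.0   # sampling rate in Hz (assumed, so that the model covers 0 to 4,000 Hz)
k = 0.95      # position of the double real pole (assumed value)

# Slopes in dB per octave, well below and well above the cut-off frequency.
flow_low = glottal_lowpass_db(2.0, k, fs) - glottal_lowpass_db(1.0, k, fs)
flow_high = glottal_lowpass_db(1000.0, k, fs) - glottal_lowpass_db(500.0, k, fs)
deriv_low = glottal_derivative_db(2.0, k, fs) - glottal_derivative_db(1.0, k, fs)
deriv_high = glottal_derivative_db(1000.0, k, fs) - glottal_derivative_db(500.0, k, fs)
```

For these values the flow spectrum is nearly flat at low frequencies and falls by roughly 12 dB per octave above the cut-off, while the derivative rises and then falls by roughly 6 dB per octave; the discrete filter only approaches the asymptotic figures, so small deviations are expected.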
Another way to calculate the glottal airflow spectrum is to start with time-
domain models. For the Klatt model, for example, the following expression is
obtained for the Laplace transform L, when there is no additional spectral
attenuation:
$$ \mathcal{L}(n'_g)(s) = \frac{27}{4}\, \frac{1}{s} \left( e^{-s} + \frac{2\,(1 + 2 e^{-s})}{s} - \frac{6\,(1 - e^{-s})}{s^2} \right) $$ [1.10]

Figure 1.2. Schematic spectral representation of the glottal airflow waveform. Solid line:
abrupt closure of the vocal cords (minimum spectral slope). Dashed line: dampened closure.
The cut-off frequency owed to this dampening is equal to 4 times the spectral maximum Fg
It can be shown that this is a low-pass spectrum. The derivative of the glottal
airflow shows a spectral maximum located at:
$$ f_g = \frac{\sqrt{3}}{\pi}\, \frac{1}{O_q T_0} $$ [1.11]
This sheds light on the links between time-domain and frequency-domain
parameters: the opening ratio (i.e. the ratio between the opening duration of the
glottis and the overall glottal period) governs the spectral peak frequency. The time-
domain amplitude rules the frequency-domain amplitude. The closing speed of the
vocal cords relates directly to the spectral attenuation in the high frequencies, which
shows a minimum slope of –12 dB/octave.
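A short numerical sketch may help to connect these time-domain and frequency-domain quantities. It evaluates the spectral maximum f_g = sqrt(3) / (π Oq T0) for one set of parameters and compares a closed-form Laplace transform of the Klatt flow derivative with a direct numerical integration. The normalization used here (an open phase of unit duration and a flow equal to 27/4 (t^2 - t^3)) and the parameter values are assumptions made for the purpose of the check.

```python
import cmath
import math

def glottal_formant_hz(f0, oq):
    """Spectral maximum of the flow derivative, f_g = sqrt(3) / (pi * Oq * T0)."""
    t0 = 1.0 / f0
    return math.sqrt(3.0) / (math.pi * oq * t0)

def klatt_derivative_laplace(s):
    """Closed-form transform of d/dt [27/4 (t^2 - t^3)] over 0 <= t <= 1."""
    e = cmath.exp(-s)
    return 27.0 / (4.0 * s) * (e + 2.0 * (1.0 + 2.0 * e) / s - 6.0 * (1.0 - e) / s ** 2)

def klatt_derivative_numeric(s, steps=2000):
    """Trapezoidal estimate of the same transform, used as an independent check."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * 27.0 / 4.0 * (2.0 * t - 3.0 * t * t) * cmath.exp(-s * t)
    return total * h

# Illustrative values (assumed): F0 = 100 Hz and Oq = 0.6.
fg = glottal_formant_hz(100.0, 0.6)

# Compare both transforms at s = j * 2 * pi * 0.5 (normalized frequency 0.5).
s_test = 2j * math.pi * 0.5
gap = abs(klatt_derivative_laplace(s_test) - klatt_derivative_numeric(s_test))
```

With F0 = 100 Hz and Oq = 0.6 the spectral maximum falls slightly below F0; reducing Oq (a shorter open phase) moves it upwards in proportion.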
1.1.3.2. Noise sources
The periodic vibration of the vocal cords is not the only sound source in speech.
Noise sources are involved in the production of several phonemes. Two types of
noise can be observed: transient noise and continuous noise. When a plosive is
produced, the holding phase (total obstruction of the vocal tract) is followed by a
release phase. A transient noise is then produced by the pressure and airflow
impulse generated by the opening of the obstruction. The source is located in the
vocal tract, at the point where the obstruction and release take place. The impulse is
a wide-band noise which slightly varies with the plosive.
For continuous noise (fricatives), the sound originates from turbulence in the
fast airflow at the level of the constriction. Shadle [SHA 90] distinguishes noise
caused by the lining and noise caused by obstacles, depending on the incidence
angle of the air stream on the constriction. In both cases, the turbulence produces a
source of random acoustic pressure downstream of the constriction. The power
spectrum of this signal is approximately flat in the range of 0 – 4,000 Hz, and then
decreases with frequency.
When the constriction is located at the glottis, the resulting noise (aspiration
noise) shows a wide-band spectral maximum around 2,000 Hz. When the
constriction is in the vocal tract, the resulting noise (frication noise) also shows a
roughly flat spectrum, either slowly decreasing or with a wide maximum somewhere
between 4 kHz and 9 kHz. The position of this maximum depends on the fricative.
The excitation source for continuous noise can thus be considered as a white
Gaussian noise filtered by a low-pass filter or by a wide band-pass filter (several
kHz wide).
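The description of the continuous noise source as filtered white Gaussian noise can be sketched as follows. The one-pole low-pass filter, its coefficient and the crude high-frequency measure are assumptions chosen for brevity; they are not the filters used in any particular synthesizer.

```python
import random

def lowpass_noise(n_samples, alpha=0.9, seed=0):
    """White Gaussian noise and its one-pole low-pass filtered version."""
    rng = random.Random(seed)
    white = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    shaped, previous = [], 0.0
    for x in white:
        # y[n] = (1 - alpha) * x[n] + alpha * y[n - 1]
        previous = (1.0 - alpha) * x + alpha * previous
        shaped.append(previous)
    return white, shaped

def difference_energy_ratio(x):
    """Energy of the first difference relative to the signal energy.

    The ratio is close to 2 for white noise and drops when high
    frequencies are attenuated.
    """
    diff_energy = sum((x[n] - x[n - 1]) ** 2 for n in range(1, len(x)))
    return diff_energy / sum(v * v for v in x)

white, shaped = lowpass_noise(4000)
ratio_white = difference_energy_ratio(white)
ratio_shaped = difference_energy_ratio(shaped)
```

Replacing the low-pass filter by a wide band-pass filter would move the spectral maximum upwards, in the manner described for frication noise.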
In continuous speech, it is interesting to separate the periodic and non-periodic
contributions of the excitation. For this purpose, either the sinusoidal representation
[SER 90] or the short-term Fourier spectrum [DAL 98, YEG 98] can be used. The
principle is to subtract from the source signal its harmonic component, in order to
obtain the non-periodic component. Such a separation process is illustrated in Figure
1.3.

Figure 1.3. Spectrum of the excitation source for a vowel. (A) the complete spectrum; (B) the
non-periodic part; (C) the periodic part
1.1.4. Vocal tract
The vocal tract is an acoustic cavity. In the source-filter model, it plays the role
of a filter, i.e. a passive system which is independent from the source. Its function
consists of transforming the source signal, by means of resonances and anti-
resonances. The maxima of the vocal tract’s spectral gain are called spectral
formants, or more simply formants. Formants can generally be identified with the
spectral maxima which can be observed on the speech spectrum, as the source
spectrum is globally monotonic for voiced speech. However, depending on the
source spectrum, formants and resonances may turn out to be shifted. Furthermore,
in some cases, a source formant can be present. Formants are also observed in
unvoiced speech segments, at least those that correspond to cavities located in front
of the constriction, and thus excited by the noise source.
1.1.4.1. Multi-tube model
The vocal tract is an acoustic duct with a complex shape. At a first level of
approximation, its acoustic behavior may be understood to be one of an acoustic
tube. Hypotheses must be made to calculate the propagation of an acoustic wave
through this tube:
– the tube is cylindrical, with a constant area section A;
– the tube walls are rigid (i.e. no vibration terms at the walls);
– the propagation mode is (mono-dimensional) plane waves. This assumption is
satisfied if the transverse dimension of the tube is small, compared to the considered
wavelengths, which correspond in practice to frequencies below 4,000 Hz for a
typical vocal tract (i.e. a length of 17.6 cm and a section of 8 cm² for the neutral
vowel);
– the process is adiabatic (i.e. no loss by thermal conduction);
– the hypothesis of small movements is made (i.e. second-order terms can be
neglected).
Let A denote the (constant) section of the tube, x the abscissa along the tube, t the
time, p(x, t) the pressure, u(x, t) the speed of the air particles, U(x, t) the volume
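velocity, ρ the density, L the tube length and C the speed of sound in the air
(approximately 340 m/s). The equations governing the propagation of a plane wave
in a tube (Webster equations) are:
$$ \frac{\partial^2 p}{\partial x^2} = \frac{1}{C^2} \frac{\partial^2 p}{\partial t^2} \quad \text{and} \quad \frac{\partial^2 u}{\partial x^2} = \frac{1}{C^2} \frac{\partial^2 u}{\partial t^2} $$ [1.12]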
This result is obtained by studying an infinitesimal variation of the pressure, the
air particle speed and the density: p(x, t) = p0 + ∂p(x, t), u(x, t) = u0 + ∂u(x, t), ρ(x, t) =
ρ0 + ∂ρ(x, t), in conjunction with two fundamental laws of physics:
1) the conservation of mass entering a slice of the tube comprised between x and
x+dx: A∂x∂ρ = –ρA∂u∂t. By neglecting the second-order term (∂ρ∂u∂t), by using the
ideal gas law and the fact that the process is adiabatic (∂p/∂ρ = C²), this equation can
be rewritten ∂p/(C²∂t) = –ρ0∂u/∂x;
2) Newton’s second law applied to the air in the slice of tube yields: A∂p =
–ρA∂x(∂u/∂t), thus ∂p/∂x = –ρ0∂u/∂t.
The solutions of these equations are formed by any linear combination of
functions f(t) and g(t) of a single variable, twice continuously differentiable, written as a
forward wave and a backward wave which propagate at the speed of sound:
$$ f^{+}(x, t) = f\left( t - \frac{x}{C} \right) \quad \text{and} \quad f^{-}(x, t) = g\left( t + \frac{x}{C} \right) $$ [1.13]
and thus the pressure in the tube can be written:
$$ p(x, t) = f\left( t - \frac{x}{C} \right) + g\left( t + \frac{x}{C} \right) $$ [1.14]
It is easy to verify that function p satisfies equation [1.12]. Moreover, functions f
and g satisfy:
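$$\frac{\partial f(t - x/C)}{\partial x} = -\frac{1}{C}\,\frac{\partial f(t - x/C)}{\partial t} \quad\text{and}\quad \frac{\partial g(t + x/C)}{\partial x} = \frac{1}{C}\,\frac{\partial g(t + x/C)}{\partial t}$$ [1.15]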
which, when combined for example with Newton’s second law, yields the following
expression for the volume velocity (the tube having a constant section A):
$$U(x,t) = \frac{A}{\rho C}\left[f\left(t - \frac{x}{C}\right) - g\left(t + \frac{x}{C}\right)\right]$$ [1.16]
It must be noted that if the pressure is the sum of a forward function and a
backward function, the volume velocity is the difference between these two
functions. The expression Zc = ρC/A is the ratio between the pressure and the volume
velocity, which is called the characteristic acoustic impedance of the tube. In
general, the acoustic impedance is defined in the frequency domain. Here, the term
“impedance” is used in the time domain, as the ratio between the forward and
backward parts of the pressure and the volume velocity. The following
electroacoustical analogies are often used: “acoustic pressure” for “voltage”;
“acoustic volume velocity” for “current”.
The vocal tract can be considered as the concatenation of cylindrical tubes, each
of them having a constant area section A, and all tubes being of the same length. Let
Δ denote the length of each tube. The vocal tract is considered as being composed of
p sections, numbered from 1 to p, starting from the lips and going towards the
glottis. For each section n, the forward and backward waves (respectively from the

Speech Analysis 15
glottis to the lips and from the lips to the glottis) are denoted fn and bn. These waves
are defined at the section input, from n+1 to n (on the left of the section, if the glottis
is on the left). Let Rn = ρC/An denote the acoustic impedance of the section, which
depends only on its area section.
Each section can then be considered as a quadripole with two inputs fn+1 and
bn+1, two outputs fn and bn and a transfer matrix Tn+1:
$$\begin{bmatrix} f_n \\ b_n \end{bmatrix} = T_{n+1}\begin{bmatrix} f_{n+1} \\ b_{n+1} \end{bmatrix}$$ [1.17]
For a given section, the transfer matrix can be broken down into two terms. Both
the interface with the previous section (1) and the behavior of the waves within the
section (2) must be taken into account:
1) At the level of the discontinuity between sections n and n+1, the following
relations hold, on the left and on the right, for the pressure and the volume velocity:
$$\begin{cases} p_n = R_n\,(f_n + b_n) \\ U_n = f_n - b_n \end{cases} \quad\text{and}\quad \begin{cases} p_{n+1} = R_{n+1}\,(f_{n+1} + b_{n+1}) \\ U_{n+1} = f_{n+1} - b_{n+1} \end{cases}$$ [1.18]
as the pressure and the volume velocity are both continuous at the junction, we have
$R_{n+1}(f_{n+1}+b_{n+1}) = R_n(f_n+b_n)$ and $f_{n+1}-b_{n+1} = f_n-b_n$, which enables the
transfer matrix at the interface to be calculated as:
$$\begin{bmatrix} f_n \\ b_n \end{bmatrix} = \frac{1}{2R_n}\begin{bmatrix} R_{n+1}+R_n & R_{n+1}-R_n \\ R_{n+1}-R_n & R_{n+1}+R_n \end{bmatrix}\begin{bmatrix} f_{n+1} \\ b_{n+1} \end{bmatrix}$$ [1.19]
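After defining the acoustic reflection coefficient k, the transfer matrix $T^{(1)}_{n+1}$ at the
interface is:
$$T^{(1)}_{n+1} = \frac{1}{k+1}\begin{bmatrix} 1 & -k \\ -k & 1 \end{bmatrix} \quad\text{with}\quad k = \frac{R_n - R_{n+1}}{R_n + R_{n+1}} = \frac{A_{n+1} - A_n}{A_{n+1} + A_n}$$ [1.20]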
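The junction relations can be checked numerically. The sketch below assumes the convention k = (A_{n+1} − A_n)/(A_{n+1} + A_n) with the interface matrix (1/(1+k))[[1, −k], [−k, 1]], which follows from the two continuity relations stated above; the air density value, the example areas and all function names are illustrative assumptions, not taken from the book.

```python
RHO = 1.2    # air density in kg/m^3 (assumed value)
C = 340.0    # speed of sound in m/s, as quoted in the text


def reflection_coefficient(area_n, area_next):
    """k = (A_{n+1} - A_n) / (A_{n+1} + A_n) for the junction between n and n+1."""
    return (area_next - area_n) / (area_next + area_n)


def junction_matrix(k):
    """Interface matrix mapping (f_{n+1}, b_{n+1}) to (f_n, b_n)."""
    scale = 1.0 / (1.0 + k)
    return [[scale, -k * scale], [-k * scale, scale]]


def cross_junction(area_n, area_next, f_next, b_next):
    """Return (f_n, b_n) from the waves on the glottis side of the junction."""
    m = junction_matrix(reflection_coefficient(area_n, area_next))
    f_n = m[0][0] * f_next + m[0][1] * b_next
    b_n = m[1][0] * f_next + m[1][1] * b_next
    return f_n, b_n


# Arbitrary example: a 2 cm^2 section followed (towards the glottis) by a 6 cm^2 one.
area_n, area_next = 2.0e-4, 6.0e-4
f_next, b_next = 1.0, 0.25
f_n, b_n = cross_junction(area_n, area_next, f_next, b_next)

# Pressure and volume velocity on both sides of the junction.
pressure_n = RHO * C / area_n * (f_n + b_n)
pressure_next = RHO * C / area_next * (f_next + b_next)
flow_n = f_n - b_n
flow_next = f_next - b_next
print(pressure_n, pressure_next, flow_n, flow_next)
```

Both printed pressures and both printed flows should agree up to rounding, which is the continuity condition the matrix is built from.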
2) Within the tube of section n+1, the waves are simply submitted to
propagation delays, thus:
$$f_n(t) = f_{n+1}\left(t - \frac{\Delta}{C}\right) \quad\text{and}\quad b_n(t) = b_{n+1}\left(t + \frac{\Delta}{C}\right)$$ [1.21]

The phase delays and advances of the wave all depend on the same
quantity Δ/C. The signal can thus be sampled with a sampling frequency equal to Fs =
C/(2Δ), whose period corresponds to a wave traveling back and forth in a section.
Therefore, the z-transform of equations [1.21] can be considered as a delay
(respectively an advance) of Δ/C corresponding to a factor $z^{-1/2}$ (respectively $z^{1/2}$):
$$F_n(z) = F_{n+1}(z)\,z^{-1/2} \quad\text{and}\quad B_n(z) = B_{n+1}(z)\,z^{1/2}$$ [1.22]
from which the transfer matrix $T^{(2)}_{n+1}$ corresponding to the propagation in section
n + 1 can be deduced.
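In the z-transform domain, the total transfer matrix $T_{n+1}$ for section n+1 is the
product of $T^{(1)}_{n+1}$ and $T^{(2)}_{n+1}$:
$$T_{n+1} = \frac{1}{k+1}\begin{bmatrix} 1 & -k \\ -k & 1 \end{bmatrix}\begin{bmatrix} z^{-1/2} & 0 \\ 0 & z^{1/2} \end{bmatrix} = \frac{z^{-1/2}}{k+1}\begin{bmatrix} 1 & -kz \\ -k & z \end{bmatrix}$$ [1.23]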
The overall volume velocity transfer matrix for the p tubes (from the glottis to
the lips) is finally obtained as the product of the matrices for each tube:
$$\begin{bmatrix} f_0 \\ b_0 \end{bmatrix} = T\begin{bmatrix} f_p \\ b_p \end{bmatrix} \quad\text{with}\quad T = \prod_{i=1}^{p} T_i$$ [1.24]
The properties of the volume velocity transfer function for the tube (from the
glottis to the lips), defined as $A_u = (f_0 - b_0)/(f_p - b_p)$, can be derived from this
result. For this purpose, the lip termination has to be calculated, i.e. the interface
between the last tube and the outside of the mouth. Let $(f_l, b_l)$ denote the volume
velocity waves at the level of the outer interface and $(f_0, b_0)$ the waves at the inner
interface. Outside of the mouth, the backward wave $b_l$ is zero. Therefore, $b_0$ and $f_0$
are linearly dependent and a reflection coefficient at the lips can be defined as
$k_l = b_0/f_0$. Then, transfer function $A_u$ can be calculated by inverting T, according to
the coefficients of matrix T and the reflection coefficient at the lips $k_l$:
$$A_u = \frac{\det(T)\,(1 - k_l)}{(T_{21} + T_{22}) - k_l\,(T_{11} + T_{12})}$$ [1.25]
It can be verified that the determinant of T does not depend on z, since the
determinant of each elementary tube matrix does not depend on z either. As the
coefficients of the transfer matrix are the products of a polynomial expression of z
and a constant multiplied by $z^{-1/2}$ for each section, the transfer function of the
vocal tract is therefore an all-pole function with a zero for z=0 (which accounts for
the propagation delay in the vocal tract).
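The claim about the determinant can be illustrated numerically. The sketch below assumes the section matrix is the product of the interface matrix (1/(1+k))[[1, −k], [−k, 1]] and the delay matrix diag(z^{-1/2}, z^{1/2}), as derived from the continuity and delay relations above; the reflection coefficient values and helper names are arbitrary choices for illustration.

```python
import cmath


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def section_matrix(k, z):
    """Interface matrix times the propagation matrix diag(z^-1/2, z^1/2)."""
    root = cmath.sqrt(z)
    scale = 1.0 / (1.0 + k)
    junction = [[scale, -k * scale], [-k * scale, scale]]
    delay = [[1.0 / root, 0.0], [0.0, root]]
    return matmul(junction, delay)


def tract_matrix(coefficients, z):
    """Overall matrix T as the product of the section matrices."""
    total = [[1.0, 0.0], [0.0, 1.0]]
    for k in coefficients:
        total = matmul(total, section_matrix(k, z))
    return total


def determinant(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


coefficients = [0.5, -0.2, 0.3]          # arbitrary reflection coefficients
z_a = cmath.exp(0.3j)                    # a point on the unit circle
z_b = 1.7 * cmath.exp(2.1j)              # a point off the unit circle
det_a = determinant(tract_matrix(coefficients, z_a))
det_b = determinant(tract_matrix(coefficients, z_b))

# Each section contributes (1 - k) / (1 + k), independently of z.
expected = 1.0
for k in coefficients:
    expected *= (1.0 - k) / (1.0 + k)
print(det_a, det_b, expected)
```

The two determinants coincide because the factors z^{-1/2} and z^{1/2} of the delay matrix cancel in every section.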
1.1.4.2. All-pole filter model
During the production of oral vowels, the vocal tract can be viewed as an
acoustic tube of a complex shape. Its transfer function is composed of poles only,
thus behaving as an acoustic filter with resonances only. These resonances
correspond to the formants of the spectrum, which, for a sampled signal with limited
bandwidth, are of a finite number N. On average, for a uniform tube, the formants are
spaced 1 kHz apart; as a consequence, a signal sampled at F=1/T kHz (i.e. with a
bandwidth of F/2 kHz) will contain approximately F/2 formants, and N=F poles will
compose the transfer function of the vocal tract from which the signal originates:
$$V(z) = \frac{U_l(z)}{U_g(z)} = \frac{K_1\,z^{-N/2}}{\prod_{i=1}^{N}\left(1 - \hat{z}_i z^{-1}\right)\left(1 - \hat{z}_i^{*} z^{-1}\right)}$$ [1.26]
Developing the expression for the complex conjugate poles
$\hat{z}_i, \hat{z}_i^{*} = \exp[-\pi B_i T \pm 2i\pi f_i T]$ yields:
$$V(z) = \frac{K_1\,z^{-N/2}}{\prod_{i=1}^{N}\left[1 - 2\exp(-\pi B_i T)\cos(2\pi f_i T)\,z^{-1} + \exp(-2\pi B_i T)\,z^{-2}\right]}$$ [1.27]
where $B_i$ denotes the formant's bandwidth at −6 dB on each side of its maximum and
$f_i$ its center frequency.
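To make the link between formants and second-order sections concrete, the following sketch evaluates the magnitude of the all-pole function of equation [1.27] for a list of (center frequency, bandwidth) pairs. The formant values, the 8 kHz sampling rate and the function names are assumptions chosen for illustration; the constant gain K_1 is set to 1 and the pure delay term is ignored, since neither changes the shape of the magnitude response.

```python
import cmath
import math


def resonator_coefficients(center_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients (c1, c2) of 1 + c1 z^-1 + c2 z^-2 for one formant."""
    period = 1.0 / sample_rate_hz
    radius = math.exp(-math.pi * bandwidth_hz * period)
    c1 = -2.0 * radius * math.cos(2.0 * math.pi * center_hz * period)
    c2 = radius * radius
    return c1, c2


def gain_db(formants, sample_rate_hz, frequency_hz):
    """Magnitude in dB of the product of resonators at one frequency."""
    z_inv = cmath.exp(-2j * math.pi * frequency_hz / sample_rate_hz)
    denominator = 1.0
    for center_hz, bandwidth_hz in formants:
        c1, c2 = resonator_coefficients(center_hz, bandwidth_hz, sample_rate_hz)
        denominator *= 1.0 + c1 * z_inv + c2 * z_inv * z_inv
    return -20.0 * math.log10(abs(denominator))


# Hypothetical neutral-vowel-like formants (Hz) with plausible bandwidths.
formants = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]
sample_rate_hz = 8000.0
for frequency_hz in (500.0, 1000.0, 1500.0):
    print(frequency_hz, round(gain_db(formants, sample_rate_hz, frequency_hz), 1))
```

The gain is larger at the formant frequencies than in the valley between them, which is the resonance behavior described above.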
To take into account the coupling with the nasal cavities (for nasal vowels and
consonants) or with the cavities at the back of the excitation source (the subglottic
cavity during the open glottis part of the vocalic cycle or the cavities upstream the
constriction for plosives and fricatives), it is necessary to incorporate in the transfer
function a finite number of zeros $z_j, z_j^{*}$ (for a band-limited signal):
$$V(z) = \frac{U_l(z)}{U_g(z)} = K_2\,\frac{\prod_{j=1}^{M}\left(1 - z_j z^{-1}\right)\left(1 - z_j^{*} z^{-1}\right)}{\prod_{i=1}^{N}\left(1 - \hat{z}_i z^{-1}\right)\left(1 - \hat{z}_i^{*} z^{-1}\right)}$$ [1.28]

Any zero in the transfer function can be approximated by a set of poles,
as $1 - a z^{-1} = 1 \big/ \sum_{n=0}^{\infty} a^n z^{-n}$. Therefore, an all-pole model with a
sufficiently large number of poles is often preferred in practice to a full pole-zero model.
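The quality of this approximation can be checked by truncating the series. The sketch below compares the response of a single zero with the reciprocal of the truncated geometric series on the unit circle; the value a = 0.5 and the truncation lengths are arbitrary, and the function names are illustrative.

```python
import cmath
import math


def zero_response(a, omega):
    """Frequency response of the single zero 1 - a z^-1."""
    return 1.0 - a * cmath.exp(-1j * omega)


def all_pole_response(a, omega, n_terms):
    """Reciprocal of the series sum_{n < n_terms} a^n z^-n (poles only)."""
    z_inv = cmath.exp(-1j * omega)
    series = sum((a * z_inv) ** n for n in range(n_terms))
    return 1.0 / series


def max_error(a, n_terms, n_points=64):
    """Largest deviation between both responses over [0, pi]."""
    frequencies = (math.pi * i / (n_points - 1) for i in range(n_points))
    return max(abs(zero_response(a, w) - all_pole_response(a, w, n_terms))
               for w in frequencies)


for n_terms in (5, 10, 20):
    print(n_terms, max_error(0.5, n_terms))
```

The error shrinks roughly like a to the power of the number of terms, so a zero close to the unit circle needs many more poles than one near the origin.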
1.1.5. Lip-radiation
The last term in the linear model corresponds to the conversion of the airflow
wave at the lips into a pressure wave radiated at a given distance from the head. At a
first level of approximation, the radiation effect can be assimilated to a
differentiation: at the lips, the radiated pressure is the derivative of the airflow. The
pressure recorded with the microphone is analogous to the one radiated at the lips,
except for an attenuation factor, depending on its distance to the lips. The
time-domain differentiation corresponds to a spectral emphasis, i.e. a first-order
high-pass filtering. The fact that the production model is linear can be exploited to
condense the radiation term at the very level of the source. For this purpose, the
derivative of the source is considered rather than the source itself. In the spectral
domain, the consequence is to increase the slope of the spectrum by approximately +6
dB/octave, which corresponds to a time-domain differentiation and, in the sampled
domain, to the following transfer function:
$$L(z) = \frac{P(z)}{U_l(z)} \approx 1 - K_d\,z^{-1}$$ [1.29]
with $K_d \approx 1$.
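As a minimal sketch of this radiation term, the first-order difference y[n] = x[n] − K_d x[n−1] can be applied to a sampled airflow, and its gain can be evaluated at two frequencies one octave apart. The default value K_d = 0.98, the 16 kHz sampling rate and the function names are assumptions for illustration only.

```python
import cmath
import math


def radiate(flow, k_d=0.98):
    """Apply y[n] = x[n] - k_d * x[n-1], with x[-1] taken as zero."""
    previous = 0.0
    pressure = []
    for sample in flow:
        pressure.append(sample - k_d * previous)
        previous = sample
    return pressure


def radiation_gain_db(frequency_hz, sample_rate_hz, k_d=0.98):
    """Gain in dB of 1 - k_d z^-1 at the given frequency."""
    z_inv = cmath.exp(-2j * math.pi * frequency_hz / sample_rate_hz)
    return 20.0 * math.log10(abs(1.0 - k_d * z_inv))


# With k_d = 1 the filter is an exact first difference: a ramp becomes a constant.
print(radiate([0.0, 1.0, 2.0, 3.0, 4.0], k_d=1.0))

# Slope between 100 Hz and 200 Hz (one octave) for k_d = 1.
slope_db = (radiation_gain_db(200.0, 16000.0, 1.0)
            - radiation_gain_db(100.0, 16000.0, 1.0))
print(slope_db)
```

With K_d slightly below 1 the slope at very low frequencies is smaller than 6 dB/octave, which is why the text speaks of an approximate emphasis.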
1.2. Linear prediction
Linear prediction (or LPC for Linear Predictive Coding) is a parametric model
of the speech signal [ATA 71, MAR 76]. Based on the source-filter model, an
analysis scheme can be defined, relying on a small number of parameters and
techniques for estimating these parameters.
1.2.1. Source-filter model and linear prediction
The source-filter model of equation [1.4] can be further simplified by grouping
in a single filter the contributions of the glottis, the vocal tract and the lip-radiation
term, while keeping a flat-spectrum term for the excitation. For voiced speech, P(z)
is a periodic train of pulses and for unvoiced speech, N(z) is a white noise.

$$S(z) = P(z)\,U_g(z)\,V(z)\,L(z) = P(z)\,H(z)$$ voiced speech [1.30]
$$S(z) = R(z)\,V(z)\,L(z) = N(z)\,H(z)$$ unvoiced speech [1.31]
Considering the lip-radiation spectral model in equation [1.29] and the glottal
airflow model in equation [1.9], both terms can be grouped into the flat spectrum
source E, with unit gain (the gain factor G is introduced to take into account the
amplitude of the signal). Filter H is referred to as the synthesis filter. An additional
simplification consists of considering the filter H as an all-pole filter. The acoustic
theory indicates that the filter V, associated with the vocal tract, is an all-pole filter
only for non-nasal sounds whereas it contains both poles and zeros for nasal sounds.
However, it is possible to approximate a pole/zero transfer function with an all-pole
filter, by increasing the number of poles, which means that, in practice, an all-pole
approximation of the transfer function is acceptable. The inverse filter of the
synthesis filter is an all-zero filter, referred to as the analysis filter and denoted A.
This filter has a transfer function that is written as an Mth-order polynomial, where
M is the number of poles in the transfer function of the synthesis filter H:
$$S(z) = G\,E(z)\,H(z)$$ H(z): synthesis filter [1.32]
$$S(z) = G\,\frac{E(z)}{A(z)} \quad\text{with}\quad A(z) = \sum_{i=0}^{M} a_i z^{-i}$$ A(z): analysis filter [1.33]
Linear prediction is based on the correlation between successive samples in the
speech signal. The knowledge of p samples up to instant n−1 allows some
prediction of the upcoming sample, denoted $\hat{s}_n$, with the help of a prediction filter,
the transfer function of which is denoted F(z):
$$s_n \approx \hat{s}_n = \alpha_1 s_{n-1} + \alpha_2 s_{n-2} + \cdots + \alpha_p s_{n-p} = \sum_{i=1}^{p} \alpha_i s_{n-i}$$ [1.34]
$$\hat{S}(z) = S(z)\left(\alpha_1 z^{-1} + \alpha_2 z^{-2} + \cdots + \alpha_p z^{-p}\right) = S(z)\left(\sum_{i=1}^{p} \alpha_i z^{-i}\right)$$ [1.35]
$$\hat{S}(z) = S(z)\,F(z)$$ [1.36]
The prediction error $\varepsilon_n$ between the predicted and actual signals is thus written:
$$\varepsilon_n = s_n - \hat{s}_n = s_n - \left(\sum_{i=1}^{p} \alpha_i s_{n-i}\right)$$ [1.37]
$$\mathrm{E}(z) = S(z) - \hat{S}(z) = S(z)\left(1 - \sum_{i=1}^{p} \alpha_i z^{-i}\right)$$ [1.38]
Linear prediction of speech is thus closely related to the linear acoustic
production model: the source-filter production model and the linear prediction
model can be identified with each other. The residual error $\varepsilon_n$ can then be
interpreted as the source of excitation e and the inverse filter A is associated with
the prediction filter (by setting M = p).
$$s_n = \varepsilon_n + \sum_{i=1}^{p} \alpha_i s_{n-i} = G\,e(n) - \sum_{i=1}^{p} a_i s_{n-i}$$ [1.39]
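The relation between the signal, the prediction coefficients and the residual can be sketched as a pair of filters: one that computes the prediction error (analysis) and one that rebuilds the signal from it (synthesis). The coefficient values and the function names below are arbitrary assumptions used only to show that the two operations are inverses of each other.

```python
def prediction_error(signal, alpha):
    """Residual eps_n = s_n - sum_i alpha_i s_{n-i}, with s_n = 0 for n < 0."""
    residual = []
    for n, sample in enumerate(signal):
        predicted = sum(alpha[i - 1] * signal[n - i]
                        for i in range(1, len(alpha) + 1) if n - i >= 0)
        residual.append(sample - predicted)
    return residual


def synthesize(residual, alpha):
    """Rebuild s_n = eps_n + sum_i alpha_i s_{n-i} from the residual."""
    signal = []
    for n, excitation in enumerate(residual):
        predicted = sum(alpha[i - 1] * signal[n - i]
                        for i in range(1, len(alpha) + 1) if n - i >= 0)
        signal.append(excitation + predicted)
    return signal


alpha = [1.2, -0.5]                       # arbitrary stable predictor, p = 2
pulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]    # single-pulse excitation
signal = synthesize(pulse, alpha)         # impulse response of the synthesis filter
recovered = prediction_error(signal, alpha)
print(signal)
print(recovered)
```

Filtering the synthetic signal with the prediction error filter gives back the single pulse, which is the flat-spectrum excitation assumed by the model.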
The identification of filter A assumes a flat spectrum residual, which corresponds
to a white noise or a single pulse excitation. The modeling of the excitation source
in the framework of linear prediction can therefore be achieved by a pulse generator
and a white noise generator, controlled by a voiced/unvoiced decision. The estimation
of the prediction coefficients is obtained by minimizing the prediction error. Let
$\varepsilon_n^2$ denote the squared prediction error and E the total squared error over a given
time interval, between $n_0$ and $n_1$:
$$\varepsilon_n^2 = \left[s_n - \sum_{i=1}^{p} \alpha_i s_{n-i}\right]^2 \quad\text{and}\quad E = \sum_{n=n_0}^{n_1} \varepsilon_n^2$$ [1.40]
The expression of the coefficients $\alpha_k$ that minimize the prediction error E over a
frame is obtained by zeroing the partial derivatives of E with respect to
the $\alpha_k$ coefficients, i.e., for k = 1, 2, …, p:
$$\frac{\partial E}{\partial \alpha_k} = 0 \quad\text{i.e.}\quad -2\sum_{n=n_0}^{n_1} s_{n-k}\left[s_n - \sum_{i=1}^{p} \alpha_i s_{n-i}\right] = 0$$ [1.41]
Finally, this leads to the following system of equations:
$$\sum_{n=n_0}^{n_1} s_{n-k}\,s_n = \sum_{i=1}^{p} \alpha_i \sum_{n=n_0}^{n_1} s_{n-k}\,s_{n-i}, \qquad 1 \le k \le p$$ [1.42]

and, if new coefficients $c_{ki}$ are defined, the system becomes:
$$c_{k0} = \sum_{i=1}^{p} \alpha_i\,c_{ki}, \quad 1 \le k \le p, \quad\text{with}\quad c_{ki} = \sum_{n=n_0}^{n_1} s_{n-i}\,s_{n-k}$$ [1.43]
Several fast methods for computing the prediction coefficients have been
proposed. The two main approaches are the autocorrelation method and the
covariance method. The two methods differ in the choice of the interval $[n_0, n_1]$ over
which the total square error E is calculated. In the case of the covariance method, it
is assumed that the signal is known only for a given interval of N samples exactly. No
hypothesis is made concerning the behavior of the signal outside this interval. On
the other hand, the autocorrelation method considers the whole range (−∞, +∞) for
calculating the total error. The coefficients are thus written:
$$c_{ki} = \sum_{n=p}^{N-1} s_{n-i}\,s_{n-k}$$ covariance [1.44]
$$c_{ki} = \sum_{n=-\infty}^{+\infty} s_{n-i}\,s_{n-k}$$ autocorrelation [1.45]
The covariance method is generally employed for the analysis of rather short
signals (for instance, one voicing period, or one closed glottis phase). In the case of
the covariance method, matrix $[c_{ki}]$ is symmetric. The prediction coefficients are
calculated with a fast algorithm [MAR 76], which will not be detailed here.
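The difference between the two definitions of the coefficients can be seen on a short example. The sketch below computes c_ki with the covariance limits (n from p to N−1) and with the autocorrelation definition applied to a signal that is zero outside the frame; the sample values and function names are arbitrary illustrative assumptions.

```python
def covariance_coefficient(signal, k, i, order):
    """c_ki = sum_{n=order}^{N-1} s_{n-i} s_{n-k} (covariance method)."""
    return sum(signal[n - i] * signal[n - k] for n in range(order, len(signal)))


def autocorrelation_coefficient(signal, k, i):
    """c_ki = r(|k-i|) for a signal assumed zero outside the frame."""
    lag = abs(k - i)
    return sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))


signal = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0]
order = 2

covariance = [[covariance_coefficient(signal, k, i, order)
               for i in range(1, order + 1)] for k in range(1, order + 1)]
autocorrelation = [[autocorrelation_coefficient(signal, k, i)
                    for i in range(1, order + 1)] for k in range(1, order + 1)]
print(covariance)        # symmetric, diagonal entries differ
print(autocorrelation)   # symmetric and Toeplitz
```

Both matrices are symmetric, but only the autocorrelation matrix has a constant diagonal, which is the Toeplitz structure that a Levinson-type recursion relies on.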
1.2.2. Autocorrelation method: algorithm
For this method, signal s is considered as stationary. The limits for calculating
the total error are −∞ and +∞. However, only a finite number of samples are taken
into account in practice, by zeroing the signal outside an interval [0, N−1], i.e. by
applying a time window to the signal. Total quadratic error E and coefficients
$c_{ki}$ become:
$$E = \sum_{n=-\infty}^{+\infty} \varepsilon_n^2 \quad\text{and}\quad c_{ki} = \sum_{n=-\infty}^{+\infty} s_{n-i}\,s_{n-k} = \sum_{n=-\infty}^{+\infty} s_n\,s_{n+|k-i|}$$ [1.46]
These are the autocorrelation coefficients of the signal, hence the name of the
method. The roles of k and i are symmetric and the correlation coefficients only
depend on the difference between k and i.

The samples of the signal $s_n$ (resp. $s_{n+|k-i|}$) are non-zero only for $n \in [0, N-1]$
($n+|k-i| \in [0, N-1]$ respectively). Therefore, by rearranging the terms in the sum, it
can be written for k = 0, …, p:
$$c_{ki} = \sum_{n=0}^{N-1-|k-i|} s_n\,s_{n+|k-i|} = r(|k-i|) \quad\text{with}\quad r(k) = \sum_{n=0}^{N-1-k} s_n\,s_{n+k}$$ [1.47]
The p equation system to be solved is thus (see [1.43]):
$$\sum_{i=0}^{p} a_i\,r(|k-i|) = 0, \qquad 1 \le k \le p, \qquad a_0 = 1$$ [1.48]
Moreover, one equation follows from the definition of the error E:
$$E = \sum_{n=-\infty}^{+\infty}\left(\sum_{i=0}^{p} a_i s_{n-i}\right)^2 = \sum_{i=0}^{p} a_i \sum_{j=0}^{p} a_j \sum_{n=-\infty}^{+\infty} s_{n-i}\,s_{n-j} = \sum_{i=0}^{p} a_i\,r_i$$ [1.49]
as a consequence of the above set of equations [1.48]. An efficient method to solve
this system is the recursive method used in the Levinson algorithm.
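In matrix form, this system is written:
$$\begin{bmatrix} r_0 & r_1 & r_2 & r_3 & \cdots & r_p \\ r_1 & r_0 & r_1 & r_2 & \cdots & r_{p-1} \\ r_2 & r_1 & r_0 & r_1 & \cdots & r_{p-2} \\ r_3 & r_2 & r_1 & r_0 & \cdots & r_{p-3} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ r_p & r_{p-1} & r_{p-2} & r_{p-3} & \cdots & r_0 \end{bmatrix}\begin{bmatrix} 1 \\ a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} E \\ 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$ [1.50]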
The matrix is symmetric and it is a Toeplitz matrix. In order to solve this system,
a recursive solution on prediction order n is searched for. At each step n, a set of
n+1 prediction coefficients is calculated: $a_0^n, a_1^n, a_2^n, \ldots, a_n^n$. The process is
repeated up to the desired prediction order p, at which stage $a_0^p = a_0$, $a_1^p = a_1$,
$a_2^p = a_2$, …, $a_p^p = a_p$. If we assume that the system has been solved at step n−1,
the coefficients and the error at step n of the recursion are obtained as:
$$\begin{bmatrix} a_0^n \\ a_1^n \\ a_2^n \\ \vdots \\ a_{n-1}^n \\ a_n^n \end{bmatrix} = \begin{bmatrix} a_0^{n-1} \\ a_1^{n-1} \\ a_2^{n-1} \\ \vdots \\ a_{n-1}^{n-1} \\ 0 \end{bmatrix} + k_n \begin{bmatrix} 0 \\ a_{n-1}^{n-1} \\ a_{n-2}^{n-1} \\ \vdots \\ a_1^{n-1} \\ a_0^{n-1} \end{bmatrix}$$ [1.51]
$$\begin{bmatrix} E_n \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} E_{n-1} \\ 0 \\ 0 \\ \vdots \\ 0 \\ q \end{bmatrix} + k_n \begin{bmatrix} q \\ 0 \\ 0 \\ \vdots \\ 0 \\ E_{n-1} \end{bmatrix}$$ [1.52]
i.e. $a_i^n = a_i^{n-1} + k_n\,a_{n-i}^{n-1}$, where it can be easily shown from equations [1.50],
[1.51] and [1.52] that:
$$k_n = -\frac{1}{E_{n-1}} \sum_{i=0}^{n-1} a_i^{n-1}\,r_{n-i} \quad\text{and}\quad E_n = E_{n-1}\,(1 - k_n^2)$$ [1.53]
As a whole, the algorithm for calculating the prediction coefficients is
(coefficients ki are called reflection coefficients):

1) $E_0 = r_0$
2) step n: $k_n = -\frac{1}{E_{n-1}} \sum_{i=0}^{n-1} a_i^{n-1}\,r_{n-i}$
3) $a_n^n = k_n$ and $a_0^n = 1$
4) $a_i^n = a_i^{n-1} + k_n\,a_{n-i}^{n-1}$ for $1 \le i \le n-1$
5) $E_n = (1 - k_n^2)\,E_{n-1}$
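The five steps above can be sketched as a short routine. The version below assumes the sign convention in which the analysis filter is A(z) = 1 + a_1 z^{-1} + … + a_p z^{-p} (so that a_0 = 1 and the normal equations have a zero right-hand side), and the function names and example autocorrelation values are illustrative assumptions.

```python
def autocorrelation(frame, order):
    """r(k) = sum_n s_n s_{n+k} for k = 0..order on a windowed frame."""
    n_samples = len(frame)
    return [sum(frame[n] * frame[n + k] for n in range(n_samples - k))
            for k in range(order + 1)]


def levinson(r, order):
    """Return (a, reflection, error) for autocorrelation values r[0..order]."""
    a = [1.0]                      # order-0 solution: a_0 = 1
    error = r[0]                   # step 1: E_0 = r_0
    reflection = []
    for n in range(1, order + 1):
        # Step 2: k_n = -(1 / E_{n-1}) * sum_i a_i^{n-1} r_{n-i}
        q = sum(a[i] * r[n - i] for i in range(n))
        k = -q / error
        # Steps 3 and 4: a_i^n = a_i^{n-1} + k_n a_{n-i}^{n-1}, padding with a zero
        previous = a + [0.0]
        a = [previous[i] + k * previous[n - i] for i in range(n + 1)]
        # Step 5: E_n = (1 - k_n^2) E_{n-1}
        error *= 1.0 - k * k
        reflection.append(k)
    return a, reflection, error


r = [2.0, 1.0, 0.2]                # arbitrary valid autocorrelation values
a, reflection, error = levinson(r, 2)
print(a, reflection, error)

# Residual of the normal equations sum_i a_i r(|k - i|) for k = 1..p.
residuals = [sum(a[i] * r[abs(k - i)] for i in range(len(a)))
             for k in range(1, len(a))]
print(residuals)
```

For real speech frames, `autocorrelation` would first be applied to a windowed frame (a Hamming window is used in the figures of this section); the recursion itself costs on the order of p² operations instead of the p³ of a general linear solver.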
These equations are solved recursively, until the solution for order p is reached.
In many applications, one of the goals is to identify the filter associated with the
vocal tract, for instance to extract the formants [MCC 74]. Let us consider vowel
signals, the spectra of which are shown in Figures 1.4, 1.5 and 1.6 (these spectra
were calculated with a short-term Fourier transform (STFT) and are represented on a
logarithmic scale). The linear prediction analysis of these vowels yields filters which
correspond to the prediction model which could have produced them. Therefore, the
magnitude of the transfer function of these filters can be viewed as the spectral
envelope of the corresponding vowels.
Linear prediction thus estimates the filter part of the source-filter model. To
estimate the source, the speech signal can be filtered by the analysis filter, i.e. the
inverse of the synthesis filter. The residual signal subsequently obtained represents
the derivative of the source signal, as the lip-radiation term is included in the filter
(according to equation [1.30]). The residual signal must thus be integrated in order
to obtain an estimation of the actual source, which is represented in Figure 1.7, both
in the frequency and time domains.

Figure 1.4. Vowel /a/. Hamming windowed signal (Fe = 16 kHz). Magnitude spectrum on a
logarithmic scale and gain of the LPC model transfer function (autocorrelation method).
Complex poles of the LPC model (16 coefficients)

Figure 1.5. Vowel /u/. Hamming windowed signal (Fe = 16 kHz). Magnitude spectrum on a

Figure 1.6. Vowel /i/. Hamming windowed signal (Fe = 16 kHz). Magnitude spectrum on a

1.2.3. Lattice filter
We are now going to show that the reflection coefficients $k_i$ obtained by the
autocorrelation method correspond to the reflection coefficients of a multi-tube
acoustic model of the vocal tract. For this purpose, new coefficients $b_i^n$ must be
introduced, which are defined at each step of the recursion as:
$$b_i^n = a_{n-i}^n, \qquad i = 0, 1, \ldots, n$$ [1.54]
The $\{b_i^p\}$ coefficients, where p is the prediction order, can be used to postdict
the signal, i.e. to predict the preceding sample of the signal. Let us form the estimate
$\hat{s}_{n-p}$:
$$\hat{s}_{n-p} = -b_0 s_n - b_1 s_{n-1} - \cdots - b_{p-1} s_{n-p+1} = -\sum_{i=0}^{p-1} b_i\,s_{n-i} = \sum_{i=0}^{p-1} \alpha_{p-i}\,s_{n-i}$$ [1.55]
A postdiction, or backward error, $\varepsilon_n^{-}$ can be defined as:
$$\varepsilon_n^{-} = s_{n-p} - \hat{s}_{n-p} = \sum_{i=0}^{p} b_i\,s_{n-i} \quad\text{with}\quad b_p = 1$$ [1.56]
The total forward prediction error E (of equation [1.40]) is denoted $E^{+}$, while the
total backward prediction error is denoted $E^{-}$. In the same manner as in the previous
development, it can be shown that, for the autocorrelation method, we have $E^{-} = E^{+}$.
Consequently, the backward prediction coefficients $b_i$ obtained via the minimization
of the total backward error are identical to the $a_i$ coefficients, and the Levinson
algorithm can be rewritten as:
$$a_i^n = a_i^{n-1} + k_n\,b_{i-1}^{n-1} \quad\text{and}\quad b_i^n = b_{i-1}^{n-1} + k_n\,a_i^{n-1} \quad\text{with}\quad \begin{cases} a_n^{n-1} = 0 \\ b_n^{n-1} = 0 \end{cases}$$ [1.57]
If we consider the forward and backward prediction errors for the same instant (at
order n):
$$\varepsilon_i^{n+} = \sum_{j=0}^{n} a_j^n\,s_{i-j} \quad\text{and}\quad \varepsilon_i^{n-} = \sum_{j=0}^{n} b_j^n\,s_{i-j}$$ [1.58]

and then equations [1.57] yield:
$$\varepsilon_i^{n+} = \varepsilon_i^{(n-1)+} + k_n\,\varepsilon_{i-1}^{(n-1)-} \quad\text{and}\quad \varepsilon_i^{n-} = \varepsilon_{i-1}^{(n-1)-} + k_n\,\varepsilon_i^{(n-1)+}$$ [1.59]
The z-transforms of these equations provide:
$$\begin{bmatrix} E^{n+}(z) \\ E^{n-}(z) \end{bmatrix} = \begin{bmatrix} 1 & k_n z^{-1} \\ k_n & z^{-1} \end{bmatrix}\begin{bmatrix} E^{(n-1)+}(z) \\ E^{(n-1)-}(z) \end{bmatrix}$$ [1.60]
with, for n = 0: $\varepsilon_i^{0+} = \varepsilon_i^{0-} = s_i$.
To complete the analogy between linear prediction and the multi-tube acoustic
model, a slightly different definition of the backward prediction coefficients must be
resorted to: $b_i^n = a_{n+1-i}^n$ for i = 1, 2, ..., n+1. The total backward error has the
same expression and the Levinson algorithm is written:
$$a_i^n = a_i^{n-1} + k_n\,b_i^{n-1} \quad\text{and}\quad b_i^n = b_{i-1}^{n-1} + k_n\,a_{i-1}^{n-1} \quad\text{with}\quad \begin{cases} a_n^{n-1} = 0 \\ b_0^{n-1} = 0 \end{cases}$$ [1.61]
from which the error recursion matrix can be deduced:
$$\begin{bmatrix} E^{n+}(z) \\ E^{n-}(z) \end{bmatrix} = \begin{bmatrix} 1 & k_n \\ k_n z^{-1} & z^{-1} \end{bmatrix}\begin{bmatrix} E^{(n-1)+}(z) \\ E^{(n-1)-}(z) \end{bmatrix}$$ [1.62]
for n = 0, $\varepsilon_i^{0+} = s_i$ and $\varepsilon_i^{0-} = s_{i-1}$, i.e. $E^{0+}(z) = S(z)$ and
$E^{0-}(z) = z^{-1}S(z)$. The inverse matrix from equation [1.62] is:
$$\frac{1}{1 - k_n^2}\begin{bmatrix} 1 & -k_n z \\ -k_n & z \end{bmatrix}$$ [1.63]
Except for a multiplicative factor, this is the matrix of equation [1.23], obtained
for a section of the multi-tube vocal tract model. This justifies the naming of the $k_n$
coefficients as reflection coefficients. This is the inverse matrix, as the linear
prediction algorithm provides the analysis filter. By contrast, the matrix for an
elementary section of the multi-tube acoustic model corresponds to the synthesis
filter, i.e. the inverse of the analysis filter. Note that this definition of backward
prediction coefficients introduces a shift of one sample between the forward error
and the backward error, which in fact corresponds to the physical situation of the
multi-tube model, in which the backward wave comes back only after a delay due to
the propagation time in the tube section. If the definition of [1.54] is used instead,
there is no shift between forward and backward errors.
Equation [1.62] allows for the analysis and synthesis of speech by linear
prediction, with a lattice filter structure. In fact, for each step in the recursion,
crossed terms are used that result from the previous step. A remarkable property of
lattice filters is that the prediction coefficients are not directly used in the filtering
algorithm. Only the signal and the reflection coefficients intervene. Moreover, it can
be shown [MAR 76, PAR 86] that the reflection coefficients resulting from the
autocorrelation method can be directly calculated using the following formula:
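$$k_n = -\frac{\sum_{i=0}^{N-1} \varepsilon_i^{(n-1)+}\,\varepsilon_i^{(n-1)-}}{\sqrt{\sum_{i=0}^{N-1}\left(\varepsilon_i^{(n-1)+}\right)^2\,\sum_{i=0}^{N-1}\left(\varepsilon_i^{(n-1)-}\right)^2}}$$ [1.64]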
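The lattice structure implied by equation [1.62] can be sketched in the time domain: each stage combines the current forward error with the (already delayed) backward error, using only the signal and the reflection coefficients. The code below assumes the convention in which the backward error at order 0 is the signal delayed by one sample, and it compares the lattice output with a direct-form filter whose coefficients come from the step-up recursion a_i^n = a_i^{n-1} + k_n a_{n-i}^{n-1}. Function names and test values are illustrative assumptions.

```python
def lattice_analysis(signal, reflection):
    """Forward prediction error computed stage by stage from reflection coefficients."""
    forward = list(signal)                    # eps^{0+}_i = s_i
    backward = [0.0] + list(signal[:-1])      # eps^{0-}_i = s_{i-1}
    for k in reflection:
        new_forward = [f + k * b for f, b in zip(forward, backward)]
        # eps^{n-}_i = eps^{(n-1)-}_{i-1} + k * eps^{(n-1)+}_{i-1}: combine, then delay.
        combined = [b + k * f for f, b in zip(forward, backward)]
        backward = [0.0] + combined[:-1]
        forward = new_forward
    return forward


def step_up(reflection):
    """Prediction coefficients a_0..a_p obtained from reflection coefficients."""
    a = [1.0]
    for n, k in enumerate(reflection, start=1):
        previous = a + [0.0]
        a = [previous[i] + k * previous[n - i] for i in range(n + 1)]
    return a


def direct_form(signal, a):
    """Direct-form analysis filter: e_i = sum_j a_j s_{i-j}."""
    return [sum(a[j] * signal[i - j] for j in range(len(a)) if i - j >= 0)
            for i in range(len(signal))]


signal = [1.0, 0.5, -0.3, 0.8, 0.1, -0.6]
reflection = [-0.5, 0.2]
lattice_out = lattice_analysis(signal, reflection)
direct_out = direct_form(signal, step_up(reflection))
print(lattice_out)
print(direct_out)
```

Both outputs coincide, which illustrates that the prediction coefficients never need to be formed explicitly when the lattice structure is used.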
These coefficients are sometimes called PARCOR coefficients (for PARtial error
CORrelation). The use of equation [1.64] is thus an alternate way to calculate the
analysis and synthesis filters, which is equivalent to the autocorrelation method, but
without explicitly calculating the prediction coefficients. Other lattice filter
structures have been proposed. In the Burg method, the calculation of the reflection
coefficients is based on the minimization (in the least squares sense) of the sum of
the forward and backward errors. The error term to minimize is:
$$E_N = \sum_{i=0}^{N-1}\left[\left(\varepsilon_i^{n+}\right)^2 + \left(\varepsilon_i^{n-}\right)^2\right]$$ [1.65]
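By writing that $\partial E_N/\partial k_n = 0$, in order to find the optimal $k_n$ coefficients, we
obtain:
$$k_n = -\frac{2\sum_{i=0}^{N-1} \varepsilon_i^{(n-1)+}\,\varepsilon_i^{(n-1)-}}{\sum_{i=0}^{N-1}\left(\varepsilon_i^{(n-1)+}\right)^2 + \sum_{i=0}^{N-1}\left(\varepsilon_i^{(n-1)-}\right)^2}$$ [1.66]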

These coefficients no longer correspond to the autocorrelation method, but they
possess good stability properties, as it can be shown that $-1 \le k_n \le 1$. Adaptive
versions of the Burg algorithm also exist [MAK 75, MAK 81].
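A minimal sketch of the Burg recursion is given below. It uses the formula of equation [1.66] at each stage, but, as is common in practice, it restricts the sums to the samples for which both the forward error and the delayed backward error are defined inside the frame; this summation range, the function name and the test signal are assumptions, not details taken from the text.

```python
def burg_reflection_coefficients(signal, order):
    """Reflection coefficients k_1..k_order estimated with Burg's criterion."""
    # forward[i] holds eps+ at time n, backward[i] holds eps- delayed by one sample.
    forward = list(signal[1:])
    backward = list(signal[:-1])
    coefficients = []
    for _ in range(order):
        numerator = sum(f * b for f, b in zip(forward, backward))
        denominator = (sum(f * f for f in forward)
                       + sum(b * b for b in backward))
        k = -2.0 * numerator / denominator if denominator > 0.0 else 0.0
        coefficients.append(k)
        new_forward = [f + k * b for f, b in zip(forward, backward)]
        new_backward = [b + k * f for f, b in zip(forward, backward)]
        # Keep only the samples where both errors exist at the next order.
        forward = new_forward[1:]
        backward = new_backward[:-1]
    return coefficients


signal = [0.9 ** n for n in range(50)]       # decaying exponential test signal
coefficients = burg_reflection_coefficients(signal, 3)
print(coefficients)
```

Because the numerator is bounded by the denominator (twice a product never exceeds the sum of the squares), every coefficient stays within [−1, 1], which is the stability property mentioned above.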
1.2.4. Models of the excitation
In addition to the filter part of the linear prediction model, the source part has to
be estimated. One of the terms concerning the source is the synthesis gain G. There
is no unique solution to this problem and additional hypotheses must be made. A
commonly accepted hypothesis is to set the total signal energy equal to that of the
impulse response of the synthesis filter. Let us denote as h(n) the impulse response
and rh(k) the corresponding autocorrelation coefficients. Thus:
$$h(n) = G\,\delta(n) + \sum_{i=1}^{p} \alpha_i\,h(n-i) \quad\text{and}\quad r_h(k) = \sum_{i=1}^{p} \alpha_i\,r_h(k-i)$$ [1.67]
Indeed, for k > 0, the autocorrelation coefficients are infinite sums of terms such
as:
$$h(n-k)\,h(n) = G\,\delta(n)\,h(n-k) + \sum_{i=1}^{p} \alpha_i\,h(n-k)\,h(n-i)$$ [1.68]
and the terms δ(n)h(n−k) are always zero for k > 0. Equating the total energies is
equivalent to equating the 0th-order autocorrelations. Thanks to recurrence equation
[1.67], the autocorrelation coefficients of the signal and of the impulse response can
be identified with each other: $r_h(i) = r(i)$, for i = 0, 1, …, p. For n = 0, h(0) = G;
therefore, reusing equation [1.67] yields:
$$r_h(0) = G\,h(0) + \sum_{i=1}^{p} \alpha_i\,r_h(i) \quad\text{therefore:}\quad G^2 = r(0) - \sum_{i=1}^{p} \alpha_i\,r(i)$$ [1.69]
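The gain formula can be checked on a small example: the gain is computed from the autocorrelation values and the prediction coefficients, and the energy of the resulting impulse response is compared with r(0). The autocorrelation values (those of a first-order process with coefficient 0.5), the truncation length and the function names are illustrative assumptions.

```python
import math


def synthesis_gain(r, alpha):
    """G = sqrt(r(0) - sum_i alpha_i r(i)) for alpha = [alpha_1, ..., alpha_p]."""
    return math.sqrt(r[0] - sum(a * r[i + 1] for i, a in enumerate(alpha)))


def impulse_response(gain, alpha, length):
    """h(n) = G delta(n) + sum_i alpha_i h(n-i), truncated to `length` samples."""
    h = []
    for n in range(length):
        value = gain if n == 0 else 0.0
        value += sum(alpha[i - 1] * h[n - i]
                     for i in range(1, len(alpha) + 1) if n - i >= 0)
        h.append(value)
    return h


r = [1.0, 0.5, 0.25]          # autocorrelation of the analyzed signal
alpha = [0.5, 0.0]            # prediction coefficients solving the normal equations
gain = synthesis_gain(r, alpha)
h = impulse_response(gain, alpha, 200)
energy = sum(value * value for value in h)
print(gain, energy)
```

The energy of the impulse response matches r(0), which is the hypothesis used to fix the gain.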

Figure 1.7. Vowel /i/. Residual signal and its magnitude spectrum on a logarithmic scale
In the conventional linear prediction model, the excitation is either voiced or
unvoiced, for each analysis frame. In the case of a voiced signal, the excitation is a
periodic pulse train at the fundamental period (see Figure 1.9), and for an unvoiced
signal, the excitation is a Gaussian white noise (see Figure 1.8). The mixture of
these two sources is not allowed, which is a definite drawback for voiced sounds for
which a noise component is also present in the excitation.

among them, and many more that are not so good. Those that saw
the thing out say they finally got to singing, Glory to God, and
Abe Linkum, and wound up with a prayer meeting, in which Massa
Linkum and the Linkum Sogers were the names most often heard.
October 17, 1863.
Saturday. To-day Lieutenants Heath, Reynolds, the quartermaster
and myself took a long ride about the country spreading the news of
our headquarters for recruits. The white people we met were civil,
but their hatred of us could not be entirely covered up. I could not
find it in my heart to blame them, and I much regretted that one of
our party saw fit to trade horses with one of them and entirely
against his will. But the blacks are wild with joy, and eager to
become Linkum Sogers.
In the afternoon a detail was sent out with the quartermaster's
wagon for mutton or beef, for our family is getting so large they will
soon eat up the government rations at hand. They came back soon
with a choice lot of dressed mutton. The guides apparently knew
just where to go. Later in the day Reynolds, Gorton and myself
made another tour of the country towards the Mississippi River. We
came to a house over towards the Great Cypress Swamp, as the
folks here call it, and which is a belt of big timber lying between the
Teche prairie and the Mississippi River, in which outlaws and wild
beasts are said to abound, and in which bands of guerrillas have
their hiding places. We have heard much of the Great Cypress
Swamp and its terrors, and felt quite brave as we looked at it from a
half mile distance. No one appeared to be at home, so we
investigated. The weeds were as high as our heads, but a path led
back to a stable in which was the most perfect picture of a horse I
ever looked at. He appeared to be scared out of his head at the
sight of us, and plunged and snorted as if a bear was after him. The
path continued and soon we came to a mulatto and his wife busy
digging peanuts. We introduced the subject of enlistment and found

he was ready and willing to go at once if he could take his horse
with him. They could both talk English, and a jargon we supposed
was French. When speaking to us they used English, but to each
other they talked French. After a short confab he agreed to go with
us, and his wife made no objection. He got his horse from the
stable, and his saddle from the house and we set out for camp.
I thought it strange that either of them showed so little concern at
parting for what might be forever, and wondered the wife did not
ask to go also, as so many of the others had done. We reached
camp just at night, where both the horse and man attracted the
attention of all hands. Colonel Parker at once wanted to buy the
horse, and a bargain was soon struck, the horse to be paid for on
the next pay day, which was agreeable to the mulatto. He was so
frank and open in all his talk, that when he asked if he might ride
the horse home and remain till morning the colonel readily
consented, telling him to be in camp by noon the next day.
October 18, 1863.
Sunday. We lay about camp until noon and the horse and his rider
did not appear. The colonel was mad clear through. He had been
told the nigger would not come back, but he believed he would, and
as the time went on little was heard but comments on the slick trick
the rogue had played on Colonel Parker. After dinner he told Gorton
and me to saddle up and show him the way and he would see
whether he could find him. We went to the house but found no one
at home. We then rode on towards the swamp. We saw a man
running across a cleared spot and soon overhauled him. It was the
fellow himself. He said his horse had got away and he was trying to
find him, had been looking for him all the morning. The colonel drew
his revolver and told him to march ahead of him to a big tree a short
distance away, at the same time telling me to get my picket rope
ready, for he was going to find that horse, or else find a dead nigger.
The nig was scared and began to beg, declaring the horse had

gotten out of the stable in the night, and he and his wife both had
been looking for him all day long. After he had got through, the
colonel told me to throw the line over a limb, for he was going to
keep his word. Whether he did really intend to hang him or not I
don't know, but I thought he would stop short of the actual deed, so
I proceeded to get the rope in position for a real hanging. Just then
the rascal owned up. The horse was in the swamp where he had
hidden him, and if the colonel would spare his life he would take us
to him. We then went on and soon came to a beaten path that led
directly to the dense forest before us. At the first turn in the path
after we entered the woods the colonel dropped me off. At the next
turn he left Gorton, and he himself with revolver in hand followed
the fellow on and out of sight. He was gone perhaps fifteen minutes
when out they came, horse and all, and we made tracks for camp,
which we reached about sundown. The next morning the man's wife
came into camp, and they both acted as if nothing out of the
ordinary had happened. Where I waited in the woods the
undergrowth was so dense I could not see a rod in any direction
except along the path. Squirrels, both black and gray, came out of
the bushes and looked at me. I counted five black squirrels in sight
at one time. They are not quite so large as the grays, and are a dark
brown rather than black. I wondered if they were as plenty all
through the woods as where I sat. Gorton says he saw as many as I
did. If all the stories I have heard about the Great Cypress Swamp
are true, I don't care for any closer acquaintance than I now have.
There are wild animals of all kinds common to this part of the
country—bears, wildcats, opossum, deer and snakes as big as any in
Barnum's menagerie. I can believe the snake part, for I have seen
so many that I believe all the snake stories I hear. This same Great
Cypress Swamp is said to be the home of outlaws, both white and
black. That they have homes there where they live undisturbed by
the laws made to govern other people. That runaway slaves find
homes there, where they live and raise families which recruit the
ranks of the lawless set living there, as fast as they are killed off by
the fights they have among themselves and with the officers of the
law that attempt to capture or subdue them.

Night. The work for to-morrow has been mapped out. Quartermaster
Schemerhorn, Lieutenant Reynolds and myself are to start for
Brashear City, taking with us the men we have enlisted. Two days'
rations have been given out, and the darkies are having a farewell
dance. This has been a busy Sunday, one I will long remember.
October 19, 1863.
Monday. We were up early and found the dance still going on. These
creatures have danced all night, and eaten up a good portion of the
rations, in spite of the fact that they knew a hard tramp lay before
them to-day. How they will get through, or what we will do if they
give out on the way, is the next thing for us to think of. They don't
care. Someone has always thought for them and will have to think
for them for some time to come.
The quartermaster and Reynolds started off in good season but I
was kept back for instructions until they were out of sight, and I did
not overtake them until they had reached Vermillion Bayou. A drove
of men, women and children, the families of the men we were taking
away, had followed them until now. We had to wait for a wagon train
to get off the bridge and this gave time for them to get through with
the good-byes, and most of them turned back. A half dozen or more
of the younger women kept on and went all the way through. The
day was warm, and the road was dusty, but we went through
without accident or adventure, other than might be expected when
all things are considered. For several days the men had been in a
state of great excitement over their new prospects. They had wound
up by dancing all night, and eating up the provisions intended for us
on this hard tramp. As the day wore on the excitement wore off and
they found themselves very tired and very hungry. Such few things
as they had beside those on their backs were in a cart drawn by a
mule, and driven by three wenches. When a man gave out we
turned out a wench and put the man in her place. Finally all three
wenches were on foot, and their places in the cart taken by as many
men. Before long others gave out and the cart was loaded until that
broke down. Then we held a council. We were outside the picket
lines and night was coming on, and staying there in the road was
not to be thought of. Three revolvers were the only weapons of
defense we could muster in case of attack by a guerrilla squad.
Capture meant death. We explained the situation to such as could
understand us, and they made it so plain to the others that they
were all ready to hustle. We patched up the cart so the extras could
be dragged along and away we went. The quartermaster rode on to
find a place to stay at, and something to eat. I let one who was
worst off ride my horse, and with Reynolds at the front to coax, and
I at the rear to drive, we got up such a gait I had to do my best to
keep up. The road had been graded for a railroad, and was wide and
level as a floor. At dusk I saw the steeple of a church, and knew we
were near our journey's end. Now that the end was in sight, the
weariness all seemed to disappear. We passed the picket line and
were soon in the town.
The quartermaster had got a schoolhouse for a stay over and had
rations from the commissary. We made short work of these and
expected to settle right down for the night. The men and women
filled the schoolhouse full, and after being in there a few minutes,
we three made up our minds the air was better outside, so we each
took a board shutter from the windows and were soon settled down
as comfortable as the circumstances would allow. Before we were
asleep we heard a fiddle tuning up and in a little while a dance was
started and was in full blast when I fell asleep. How long it lasted I
don't know, but when I awoke about sunrise the inmates of the
schoolhouse were sleeping like the dead.
October 20, 1863.
Tuesday. I was nearly blind when I awoke. Something like an
inflammation in my eyes had troubled me for some days, and the
dusty tramp of the day before had made it worse. However, I soaked
them open, and found that it had not affected my appetite in the
least. While at breakfast Lieutenant Bell came and joined us. He was
on his way to join the colonel and his party at the front. The colonel
had given us an order to stop any boat going towards Brashear City,
and with it I proceeded to the landing, leaving Reynolds and the
quartermaster to pick up and bring on our party. At the landing I
met a party on their way to the front, and gave my horse to one of
them who was in just such a fix as I was the morning I became a
horse thief. In reply to his very profuse thanks I told him I would
have to turn her loose if I didn't give her away, for I could take her
no farther. I had long forgiven her the kick she gave me and
sincerely wished her well. At Nelson's Landing I found a boat which
was being held in readiness for General Banks and his staff, so that
was of no use to us. Soon after the A. G. Brown came up and said
she would be back that night, and take us. We went into camp near
the sugar mill and very soon our small army was arranging for a
sham battle. They talked French, so I could only judge what they
were up to from what I saw. They divided into two squads and
proceeded to fortify their positions by rolling the empty sugar
hogsheads up in two parallel rows, behind which they stationed
themselves, while the generals in command jawed at each other
across the field. The men each had a hogshead stave for a weapon.
For flags they used bandanna handkerchiefs, and for drums a piece
of board upon which one man pounded while another held it up.
One of the generals made a speech which made the other side
fighting mad, and they all jumped over the breastworks and met in
the space between, batting each other over the head with their
weapons, and yelling with all the power of their lungs. We thought
sure they would kill each other, for the blows they struck broke some
of the staves into splinters. Just as we were going to try and
interfere, one side surrendered and were marched off, prisoners.
There had been some blood shed, and the wonder is that no heads
were broken. But the best part came after the fight was over, and
when the final settlement was being made. Through an interpreter
we learned that the general who should win the fight was to kiss one
of the young ladies that had marched with us all the way from
Mouton's Plantation, and he now demanded his pay. She was led out
upon the battlefield, and when the victorious officer came up to
claim his reward she slapped his face, and then turned her back to
him. He then gave some orders, when his men grabbed the dusky
maiden and turned her about. I could not tell whether she blushed
or not, but suppose of course she did. The general got down on one
knee and then on both and jabbered French at her until she finally
relented and stuck out her hand, which she allowed him to kiss. This
soon led to a full surrender, and the battle was over, and peace
declared.
We gave out the rations and began to get ready for a start as soon
as the boat came along. We even filled a barrel with sugar, thinking
it might come handy when we got to Brashear City. But night came
and the A. G. Brown failed to appear. There were many here who
like ourselves were waiting to get out of the country. Among them
was a young mulatto woman, whom the others called Margaret, and
who seemed of a higher order than those about her. She was willing
to talk, and from her I have a story that has fully reconciled me to
the wisdom of the President's Emancipation Proclamation. She has
started for the North. Our coming among them has given her the
chance she had long looked for. She has run away from her mistress,
and her master is in the Rebel army. She has a picture of her
husband, and a fine-looking man he was. He was as white as I am.
He was the son of his master, and her father she says is Judge ——,
now in the Rebel service. Her husband picked up enough education
to be head man on his father's plantation. He knew too much for a
nigger, and when the Rebel army came through last spring he was
taken out and hanged to a tree right before her eyes. After they had
gone the slaves cut the body down and buried it. Margaret is in
hopes to reach New York, and I wished I could land her there that
minute. If she was dressed as well, and if she was educated, she
would pass muster with any I have seen that go by the name of
ladies.

No boat coming to take us away, we posted guards, giving each a
stick of wood for a weapon. I remained up until midnight, and in
going the rounds to see if the guards were awake, came near
getting a club over my head as I turned the corner of the sugar mill.
At midnight I called Reynolds, and rolled myself in my blanket and
was soon asleep. The mosquitoes were about as thick and as savage
as any we had met with. The horses and cattle had no peace for
them. I rolled myself up head and heels in my blanket, and yet when
I awoke found one foot had got out of bed, and the varmints had
put a belt around my ankle between my stocking and trousers that
looked like raw beef. I don't suppose there was an atom of space
that had not been punctured by a bill. But I slept right through, and
as usual dreamed of home and home folks.
October 21, 1863.
Wednesday. Nelly, one of the women who came with our crowd, has
volunteered to be our cook, and besides being a good cook has
proved herself to be a good forager. When I woke up she had fresh
pork and chicken cooked and we asked no questions about what
price she paid for them. Quartermaster Schemerhorn rode up to
Newtown for rations, and I went back to bed to finish up my nap.
The mosquitoes had not quite finished their job on me, and some
actually bit me through a thick woollen blanket. My leg was very
sore where they feasted on it this morning. One of the men mixed
up some mud for a poultice, which helped it wonderfully. I found out
we could learn many things from these poor creatures, not the least
being how to live on the fat of the land we are in.
Noon. The quartermaster came back and said the A. G. Brown would
be along to-day some time. That it will make a landing one-half mile
above here. Accordingly we pack up and move up to Mr. Nelson's so
as to be sure of not missing it. Mr. Nelson, the owner of everything
in this region, is here. He has been a merchant in New Orleans, but
since Banks' order driving all Rebel sympathizers from the city, has
been here at his plantation home. It is said he owns 20,000 acres of
land, and all the necessary stock and tools to work so large a tract.
After a supper of hard-tack and bacon, Lieutenant Reynolds and I
went and called on the gentleman. He received us very politely, and
offered us the best his house afforded. The boat not coming we
prolonged our visit, sitting on the broad piazza and smoking his
cigars. He said he was a widower, with two children, a son in the
army, and a daughter at school in Georgia. He told us of the
outrageous wrongs he had suffered at the hands of the invading
armies, how they had laid waste his land, torn down his buildings
and fences, taking away his mules and horses, cattle and sheep,
until he had nothing but the bare land to live upon, and no slaves
left him to work even that. It was holding up the other side of the
picture to our view, and in spite of ourselves we were sorry for him.
He evidently did not expect sympathy from us, for after reciting his
wrongs he changed the subject of conversation around to topics we
could all agree upon, and after a sociable chat he invited us to spend
the night with him, agreeing to have us called in case the boat came
during the night. He urged us to stay and we did. He gave us rooms,
elegantly furnished, with beds so white and clean we were some
time making up our minds whether after all we ought not to sleep
on the floor, and leave the beds as they were. But the whole
mosquito bars and a few nips from our ever-present enemies
decided us. We undressed and were soon asleep, too sound even to
dream of home. The boat did not come and the next thing we were
aware of it was morning.
October 22, 1863.
Thursday. We slept late, and when we came out, our host was
waiting for us, to say that breakfast was ready, and would not listen
to our going away until we had partaken of it with him. We sat down
to a beefsteak breakfast, with all the extras. I did not think I was so
hungry, but the smell of the victuals made us both ravenous. Our
host seemed to enjoy seeing us eat and thanked us heartily for
making him the visit, going so far as to say that in case the boat did
not come that day he would be glad to entertain us again. In books
and in other ways I had heard of southern hospitality and I now
know it was all true. I wonder if it was ever put to a severer test.
We went down to the landing and found a guard of soldiers from an
Illinois regiment, keeping watch over a quantity of sugar and
molasses which the government has confiscated, and which the boat
was expected to take away when it came. They invited us to make
one of their party until the boat came, and we gladly accepted the
invitation. They thought we had risked our lives in going to stay with
Mr. Nelson, and eating food in his house, but we did not believe it,
and did all we could to make them think better of him than they had
so far done. The guards shot a hog, which made fodder for our folks
for the day, together with the government rations we already had.
The day passed and another night came on and still no boat. We
crawled in wherever we could get and slept as best we could for the
mosquitoes, which seem determined to eat us alive.
October 23, 1863.
A cold rain storm that has been threatened for a day or two came
upon us early this morning. A small flock of sheep came up the road
driven by a man on horseback. The negroes from everywhere have
gathered here and the rations we give our men they give away to
their friends and are always hungry in consequence. When the
sheep came along they surrounded them and killed at least a dozen
before we could stop them. The man hustled along with what was
left and those killed were soon skinned and being cooked in various
ways. We had mutton for dinner and for supper, and had enough left
for breakfast. The day finally passed and we began looking for better
sleeping quarters. Reynolds and I with a part of the guard finally
climbed a ladder and got into a loft full of cornstalks with the corn
on just as it had been cut and stored away. The place was alive with
rats and mice, which ran over and through the stalks, making a
terrible racket, varied once in a while by a fight among themselves.
We got used to the racket and finally were asleep. Just as we were
enjoying ourselves, along came the boat we had waited so long for.
We hustled to sort out the nigs that belonged to us and get them on
board. In a little while we were off. The boat was crammed full of
people—black and white, old and young, men and women all spread
out on the cabin floor, or the tables. I never saw such a mass of
people in so small a space. We poked around and after a while
found room to lie down, after which getting asleep was quick work.
October 24, 1863.
Saturday. Another raw day. Now that the people are standing on end
there is more room to get about. We made out to eat such as we
had; while we wished for more, we had to content ourselves with
what we had grabbed hold of the night before in the dark. At noon
we passed Franklin, and about 3 p. m. reached Centerville, where
there was a lot of sugar to load on the lower deck. The captain said
if we would turn in our men to roll on the sugar he would undertake
to fill them up.
I took advantage of the stop to see what the place looked like. On
one of the streets I saw oranges on a tree and went in to see if I
could beg or buy a few. As I went into the yard a young lady came
out and, in a tone and with a look that almost froze me, asked what
I was doing in her yard. To save me I couldn't think what to say, but
I did after a while come to enough to say I would like an orange.
She turned to a negro and motioned towards the trees, when he
went and picked his hands full and gave me. Then the madam
pointed her finger towards the street and said, "Now that you have
what you came after will you please go?" And I went. I don't know
yet what I ought to have said or done, but the only thing I did was
to get back to the boat as fast as I could. I kept the adventure to
myself, and gave the oranges away, for I think they would have
choked me. That is a sort of southern hospitality I never read of in a
book, or heard of in any other way. I never saw so much scorn on a
face before. Why I stood there like a chicken thief caught in the act,
and then carried off the oranges, I don't now know. If the Rebels
were all like her I would resign and go home at once, for she did
actually scare my wits all away from me. The sugar was on board
and true to his promise the captain ordered a supper for our army,
which must have made his stock of provisions look small. Rube
asked me what I found the town like, and I told him it was different
from any I had yet seen. We soon got settled down for the night.
October 25, 1863.
Sunday. When we awoke we were in sight of Brashear City. We
landed, formed in line as well as we could, and marched to our
headquarters, where I found my old crony, Sol Drake. We found
quarters for the men in an unused building, and in a little while their
woolly heads were sticking out from every window.
The quartermaster drew clothes for them, and they were soon fitted
out with suits of blue, just like the rest of the Linkum Sogers. The
trouble was to fit them with shoes. I doubt if many had ever had a
shoe on their feet. Their feet are wide at the toes and taper straight
back to the heel. No. 12 was the smallest size we found use for, the
most of them taking 14 or larger. They insisted on squeezing a No.
14 foot into a No. 10 or 12 shoe, but we, knowing what that would
result in, got them properly shod after a long time. Then how proud
they were! We then gave them their rations for the day, telling them
through interpreters that if they wasted it or gave it away, they could
have no more until to-morrow. We moved all our belongings from
the boat and filled out the day visiting and talking over old times,
and at early bedtime settled down for the night in a four-room house
which has been taken for our headquarters while here.

October 26, 1863.
Brashear City, La. Monday. On going out this morning who should
appear to me but George Story of Company B, who was captured
with General Dow at Port Hudson last summer. He says he was well
treated by his captors, and has no fault to find with them. They took
him and the general to Richmond, and put them in Libby Prison.
After a while he was paroled, and sent to Annapolis, Md. There he
was kept until exchanged, and then sent south in charge of the
provost marshal to be turned over to the 128th New York. Through a
mistake at headquarters he was sent here, as the 128th was
supposed to be at the front in the Teche country. If he had not met
us as he did, he would have gone up the Teche on the next boat. As
it is he will go back to New Orleans to-morrow, and look for his
regiment up the river, probably at Baton Rouge, where we left them.
We commenced teaching our recruits the rudiments of soldiering.
They are awkward, but very anxious to learn, and as that is the main
thing, we look for little trouble in drilling them. By shoving them
together, lock-step fashion, they soon got the idea of marching in
time, and on the whole did as well or better than we did at Hudson,
when we took our first lesson. The quartermaster has gone to the
city for equipments, tents, etc., and when he returns we will soon be
at the Manual of Arms. We expect Major Palon here to-day to take
charge, and by the time Colonel B. and the rest get back, hope to
have our recruits fit for turning over to any regiment that needs
them.
October 27, 1863.
Tuesday. It rained hard all day, consequently no drill or other work
was attempted. Major Palon and the quartermaster came from the
city, the latter with rubber blankets and shelter tents for the recruits.
He also brought some letters, one for me telling about the draft at
home. Those that are drafted can get off by hiring a substitute or by
paying $300, in which case a substitute is furnished them. I am glad
I enlisted. There have been times when I could hardly say it, but I
can say it now with all sincerity.
More women and children have come, wives and children of the men
we have. Poor things! I suppose they have nowhere else to go or to
stay, so they have followed on after their husbands and fathers. I
have heard that the government has provided camps for them,
where rations are served to them just as to the soldiers. It is a very
proper thing to do, and I hope it may be true that these helpless
ones are thus provided for. This arming of the negroes is not such a
simple affair as it seemed. This is a side I had not thought of, but I
don't see how it can be dodged.
October 28, 1863.
Wednesday. The rain has stopped, and the mud is now having its
turn. It makes us just as helpless as the rain did. We have put in the
time making plans for the time when the mud hardens. It does not
dry up, as it does in the north, but the water seems to settle and
leave the ground hard even if there be no sun or wind.
October 29, 1863.
Thursday. After a council on matters and things in general, we have
made some changes, looking to a more orderly arrangement of our
camp life in these quarters. The hangers on about camp have been
driven away. The quartermaster's stores and those of the
commissary department have been separated and placed in tents
outside, where they can be found and got at. The most intelligent
among the recruits have been appointed corporals and sergeants,
and the screws of discipline turned on just a little more. Guards are
placed, more for their instruction than for our safety, and things are
putting on more the appearance of a military camp than a mere
lounging place, as it has heretofore been. Just as we had got
everything to our notion, a boat came, and on it were Captains
Merritt and Enoch with 120 more recruits. Tents and blankets were
given them and quarters assigned them, which altogether has made
a busy day for us. Discipline, what little there had been, went to the
winds when the men all got together. They all seemed to be
acquainted, and such jabbering French as they had. I suppose they
had lots of news to tell each other. Some can talk English, but all of
them can and do talk French when talking to each other. They came
from Colonel B.'s headquarters at Opelousas, and were in charge of
Colonel Parker, who got left behind at Newtown, and will be along on
the next boat. At night Dr. Warren, our surgeon to be, came from
New Orleans, and to-morrow will examine the recruits. Sol Drake has
been sent for to join Colonel B. at Opelousas and expects to leave
on the next boat. Opelousas is beyond where I have been. I have
posted Sol in getting as far as Mouton's, where we were, and
beyond that he must find out for himself.
October 30, 1863.
Friday. It has been a rainy day, but we have paid little attention to it.
Dr. Warren finished up his examination and nearly every man passed
muster. He was not as particular about it as Dr. Cole was at Hudson.
As fast as examined and passed we gave them their new clothes,
and a prouder set of people I never saw. Lieutenant Colonel Parker
came at night with later word from Colonel B. and Drake does not
have to go. For this he and the rest of us are glad. Colonel Parker
brought eight men with him and about as many women. We have
quite a respectable squad, and they are learning very fast—faster I
think than we did when we first began. Those that were rejected by
the surgeon as unsound are here yet, and what to do with them is a
puzzle to us. We have each of us taken one, to do anything for us
we can think of, and they seem perfectly happy. Mine is named Tony,
and is a great big good-natured soul, ready to do anything for me, if
I will only let him stay. He came to me at first asking if I would write
a letter to his wife, and when I asked him what I should write, told
me anything I was a mind to. I wrote the letter, telling her where he
was, and how he was, and put in a word for some of the others for
Tony's wife to tell their folks. This pleased him so much that he hung
around trying to do me a favor in return, and when he was rejected
by the doctor he said I must keep him, for he would be killed if he
went back home, because he had enlisted. The government allows
us transportation and a daily ration for a servant, so I am nothing
out, for he asks no other pay than his board and the privilege of
staying.
October 31, 1863.
Saturday. Lieutenant Colonel Parker and Dr. Warren left us to look
for a healthier place, as many of the men are getting chills and fever.
The ground is low and wet and I suppose is a regular breeding place
for fever and ague. We are glad of a prospect of a change, but this
country is all swampy and wet. The Teche country comes the
nearest to dry ground of anything I have seen. We are getting into
full swing. Companies A, B, and C are organized and assigned to
Captain Merritt, Captain Hoyt, and Captain Enoch. There are thirty
men left and these are turned over to Lieutenant Reynolds for drill.
At night, a telegram from Colonel Parker says we must stay at
Brashear City until our regiment is full. I have been out of sorts to-
day and have laid up for repairs.
November 1, 1863.
Sunday. Was detailed for officer of the guard, but not feeling well
Lieutenant Reynolds volunteered to act for me, for which I am very
much obliged. I put in another day trying to be sick, but toward
night gave it up as a failure. However, I put in the day by staying
indoors, writing letters for the men, some to their wives and some to
their sweethearts. The more love I can put in the letters, and the
bigger words I can use, the better they suit the sender. What effect
they have on those that receive them I happily do not know.
November 2, 1863.
Monday. I lay down last night thinking if only mother was here to fix
me up a dose, as she has so many times done, I should be well right
off. I soon dropped off, and the same thought kept right on going
through my brain until I awoke this morning and found myself in the
same position, lying crosswise of my bed just as I lay down last
night. But my dream of home had cured me, and I was myself
again, ready for whatever might come.
I found myself again on the detail for guard. After the new guard
was posted I had but little to do, except to see to it that the reliefs
were changed at the proper time. There was no enemy in sight,
though the guards were just as watchful as if the enemy had been in
the next yard. The worst was to remember the names of the
sergeants, and that I got round by writing them down. Even then I
had to guess at some. At night Colonel Parker came back from the
city, on his way to join Colonel B., who is at the front with the rest of
the gang. He brought me two letters, one saying father is sick and
the other saying he is well again. I am glad the good news came
with the bad, though I had much rather no news of that kind would
come. I also had a list of names of those drafted from the town of
North East. John and Perry Loucks and Amon Briggs were among
them. Whether they will go or get substitutes the letter did not say.
Also that another proclamation from the President calls for 300,000
more men. I wonder if he knows what an army we are raising for
him here. Report says an accident between here and Algiers last
night killed twelve soldiers and wounded over sixty more. One train
broke down and another ran into it, both loaded with soldiers. These
roads are so straight and level it would seem that accidents of that
kind might be avoided.
November 3, 1863.
Tuesday. I made a raise of a postage stamp to-day and sent a letter
home. The day has passed like all do nowadays, with little to do. But
it has been pleasant, and that is an exception I am happy to make a
note of. The quartermaster came in to-night with more tents, and
more supplies.
November 4, 1863.
Wednesday. The steamer Red Chief came down the Teche this
morning with more recruits, in charge of Lieutenants Gorton, Smith,
Heath and Ames. This will make more work and I am glad of it.
Lieutenant Colonel Parker has been on the point of starting up the
country again for several days, but has not gone yet. To-day he has
decided to move our quarters to higher ground. This is a wise thing
to do according to Dr. Warren, for a great many of the men are sick
with chills and fever. The site chosen is about a mile away. I am
detailed to see that the stuff gets off, and the others are to be on
the new site and receive it, and see to its proper distribution. I am
temporarily assigned to Company D. By noon I had everything on
the way, and after reaching camp helped to get Company D in as
good shape as the others. A regular camp is laid out and company
streets made. It made me think of the laying out of Camp Millington.
Grading the company streets and other necessary work will give us
something to do for days to come. I put in so much time helping the
others get fixed that I forgot my own tent, and as Captain Enoch
invited me to sleep with him, I accepted, and after fighting
mosquitoes until nearly midnight, I fell asleep and remained so until
late the next morning.

November 5, 1863.
Thursday. Tony was waiting for me when I woke up, and was feeling
badly because I had to go to the neighbors to sleep. After our hard-
tack and coffee were safely stowed away, I got my tent out and we
soon had it up. Then Tony began skirmishing for furnishings. He had
seen what the others had and set out to beat them all. He got hold
of a board wide enough and long enough for me to sleep on, and
soon had legs driven in the ground to hold it up. My modest
belongings were put under it, and the deed was done. Colonel
Parker gave a few parting orders and then took boat for New Iberia
to join Colonel B., leaving Captain Merritt in command. Captain Laird
not yet having joined the command, I am curious to know what sort
of a man I am to serve under. Company D is as yet made up of raw
recruits, not yet having passed through the medical mill, so I have
only to keep them within bounds until they are examined and sworn
in as soldiers, when their education will begin.
At night Dr. Warren and Lieutenant John Mathers came from New
Orleans. A cold drizzling rain began about that time and we were
driven into our tents, where the hungry mosquitoes awaited us and
war was at once declared. If I had a brigade of men as determined
as these Brashear City mosquitoes, I believe I could sweep the
Rebellion off its feet in a month's time. They make no threats as our
home mosquitoes do, but pounce right on and the first notice you
get is a stab that brings the blood. I have had at least one bite for
every word I have written about them, and all in the same time I
have been writing it. The only escape from them is in the hot sun, or
under a blanket so thick they cannot reach through it.
November 6, 1863.
Friday. This morning Lieutenants Reynolds, Smith, Ames and myself
formed a club of four for mutual protection against starvation. We
have a rejected recruit for a cook, and have made a draft on the
commissary for salt horse, hard-tack and coffee. If he can't get up a
meal on that, then he's no cook for us. My company was examined
and almost every one proved to be sound enough for soldiers. A
dozen at a time were taken into a tent, where they stripped and
were put through the usual gymnastic performance, after which they
were measured for shoes and a suit, and then another dozen called
in. Some of them were scarred from head to foot where they had
been whipped. One man's back was nearly all one scar, as if the skin
had been chopped up and left to heal in ridges. Another had scars
on the back of his neck, and from that all the way to his heels every
little ways; but that was not such a sight as the one with the great
solid mass of ridges, from his shoulders to his hips. That beat all the
anti-slavery sermons ever yet preached. But this is over with now,
and I don't wonder their prayers are mostly of thanks to Massa
Linkum. They are very religious, holding prayer meetings every
night, after which the fiddle begins and dancing goes on all night, if
not stopped on account of the noise they make. I don't know how
they get along with so little sleep, or rest. After the examination we
got blankets and clothes from the quartermaster and they were
fitted as well as it is possible to fit from a ready-made stock.
Our cook, George, proved to be a jewel. He made salt beef taste so
much like a chicken we didn't notice the difference. Major Palon
came from the city at night, and brought some letters. One was for
me and contained three dollars from my old crony, Walt Loucks. This
will keep us in extras for a little while. We were some time deciding
how to use it, but a majority thought a part of it should go for flour,
so George could try his hand at pancakes.
November 7, 1863.
Saturday. I have never described our camp, and may never have a
better time than now. We are out of town, to the north, on high,
hard ground, for this country—so high that there is quite a slope
towards the water of Berwick Bay. Company streets are laid out and
the camp kept clean by a detail made each day for that purpose.
There are many large trees in and about our camp, and taken
altogether we have never had a stopping-place quite equal to it. The
sick list has shrunk already, though the hospital tent is pretty well
filled yet. We have company-drill every day and there is quite a strife
among us to see which can learn his troop the fastest. The men are
as eager to learn as we are to have them, which makes it much
easier for both parties. Berwick, which is directly opposite, is quite a
place from the looks, larger than Brashear. It is the shipping port for
the great Teche country that lies beyond.
Just after dinner Colonel Tarbell's orderly rode into camp and
inquired for me, handing me an order which read, "Lieutenant
Lawrence Van Alstyne, commanding Company D, 90th U. S. C. I., at
Brashear City, La. Captain Vallance, quartermaster, will furnish the
bearer with a boat, in which he will proceed to Berwick and procure
a sufficient supply of lumber to floor the hospital tent in said
regiment. Signed, Tarbell, commander." I took five men and such
tools as we could find and called on Captain Vallance, who gave us a
boat in which we rowed across the bay, which was still as a mill
pond. We landed near a shanty which easily came apart, and which
had good wide boards, enough to floor several hospital tents. We
made these into a raft which we towed back, reaching camp without
having seen a person, except a guard—who considered my order
good enough authority for letting the boards go. We had boards
enough for the hospital tent and all the other tents, which as soon
as they are dry will be used for the comfort of all hands. At night
Lieutenant Gorton arrived from the city to take the next boat for
Newtown to join Colonel B.
Lieutenant Smith made me a present of a handsome pair of shoulder
straps. The groundwork is dark velvet and the border of gold cord
twisted and woven together. Altogether they are as handsome a pair
as I have ever seen on anybody's shoulders. I shall lay them away
until I get a coat fit to put them on, and that won't be until after pay
day. Thank you, Matt, I'll try and not disgrace them. I presume he
paid money for them that he needed for fodder; but that's just like
Matt Smith. Major Palon also returned to-night, and made some
changes. Lieutenant Ames, my partner in Company D, goes in the
medical department as clerk, and Lieutenant Reynolds takes his
place with me.
November 8, 1863.
Sunday. On duty to-day as officer of the guard. Generally that is a
light duty, but with these men it is not so much so. None of the men
can read or write, and so the sergeant and corporal of each relief
has to have the names of his relief repeated to him until he
remembers them. Even then there are many mix-ups that have to be
straightened out. The names are strange to me, and after writing
them as they sound, I find it difficult to pronounce them.
I went the rounds during every relief, and never failed to find
something out of joint. One at the Major's tent, whom I had taken
extra pains to educate, I found taking his gun apart to see how it
was made. Another had his shoes and stockings off and was walking
his beat with bare feet. Another had taken off his accoutrements and
piled them up at the end of his beat and was strutting back and
forth with folded arms. The only thing to do is to call up a man who
speaks both French and English and through him straighten the
matter out.
November 9, 1863.
Monday. To-day an order came to move to New Orleans. That is, all
the companies that are full. That leaves Company D here until more
men come. There is a regular jollification over the order, as none of
us are in love with this place. I suppose it would be a proper thing
for me to introduce the officers of the Ninetieth to whoever the readers
of this diary may be, and as there is nothing to prevent I will do it
now. If I ever get a chance to read it myself it will call them up
before me as I now know them.
Colonel Edward Bostwick comes first, and any one who will be apt to
read this knows him as well as I. But as I want the list complete I
will begin with him and work down the line. He is about five feet ten
inches, light complexion, gray eyes, with brown hair and beard. He is
rather particular about his own appearance, and also that of the
men under him. He is always on the lookout for a higher limb to
roost on, and after getting there himself, is very good about helping
his friends up to him. He seldom drinks, never to excess, and on the
whole is a good soldier. He came out as captain of Company B,
128th New York. Was promoted to major of the First Louisiana
Engineers, May 2, 1863. He served at Port Hudson with them and
had the name of doing well whatever he was ordered to do. In
August 1863, was promoted to the rank of colonel, with permission
to raise a regiment from the freed slaves in this department, and this
he is now trying to do.
Lieutenant Colonel George Parker is from Poughkeepsie. Came out
as captain of Company D, 128th New York. On Colonel Bostwick's
recommendation he was promoted to his present rank. He is about
five feet seven inches, light complexion, sandy hair and beard. Is
well up in military tactics, and is afraid of nothing. Rushes right into
anything, regardless of getting out again. Is kind to his men, but a
strict disciplinarian. When his orders are obeyed he is all right, but
when he gets angry he acts without judgment or feeling for any one
or anything.
Major Rufus J. Palon is from Hudson. Came out as second lieutenant
in Company G, 128th New York. He has the army regulations and
military tactics at his tongue's end. Is pretty strict on discipline, but
never loses his head. Money has no value to him. He would give his
last cent to any one in need, even though he might be just as needy
himself.

Surgeon Charles E. Warren is tall, dark complexion, with dark sandy
hair and beard. So far as I know he is a good surgeon. He is free
with his money, and with the hospital whiskey. A real good fellow,
though not in all things the sort one can pattern after with safety.
Quartermaster Peter J. Schemerhorn left home as orderly sergeant
of Company G, 128th New York. Acted as second lieutenant of his
company at Port Hudson, and was afterwards detailed as clerk at
headquarters, where he remained until the formation of this
regiment, when he was made first lieutenant and acting
quartermaster. He makes a good quartermaster, seeing that his stock
is kept up and ready for distribution.
Adjutant T. Augustus Phillips is one of the boys. He served in the
Second Fire Zouaves in the three months' service and afterwards
came out as orderly sergeant in the 165th New York. Was detailed
as clerk at headquarters and in some way got a recommendation for
adjutant in Colonel Bostwick's regiment. He is a New York tough.
Gets drunk as a lord, and looks down upon any one else who does
not do as he does. He is not as popular in the regiment as he might
be.
Captain Thomas E. Merritt was formerly sergeant in Company I,
128th New York. Was raised to acting second lieutenant of same
company, and finally promoted to captain in this regiment. He has
traveled a great deal and remembers what he has seen. He seems
well fitted for the position he now holds and stands well with all
hands.
Captain Charles Hoyt is as good an all-round man as is often found.
He is fine-looking, a fine singer, has a way of being everyone's
friend, and making everyone a friend to himself. He is cut out more
for society than for the army. He takes now and then a drink, but
never gets beyond himself. Will share his last dollar or his last hard-
tack with any one. Altogether, he acts as a sort of balance wheel to
the rest of the machine, keeping some from going too fast, and
helping others to go faster. He would be missed if taken away, more
than any half dozen of us.
Captain Richard Enoch came out as first sergeant of Company I,
128th New York. He was wounded at Port Hudson, and did not again
join his company, being recommended for promotion as first
lieutenant in the Corps d'Afrique, from which he came to us with a
captain's commission. He has a jovial disposition, but has a very
quiet way of showing it. He sometimes takes a little too much, and
then is reckless of his money and of the good name he has gained.
Every one likes him, because they cannot help it. As a military man I
doubt if he is ever heard much about. He had rather have a good
time, and no matter what is going on he generally manages to have
it.
There are several other officers who have not yet reported and of
them I know nothing. One of them is Captain Laird, who will be
captain of Company D, when he comes.
First Lieutenant Robert H. Clark was promoted from sergeant in the
116th New York. He is an excellent penman and would make a much
better clerk in some department office than he ever will a soldier. He
is rather hasty tempered, and has already had several jars with his
brother officers, particularly with Adjutant Phillips, whose assistant
he at present is. If Adjutant Phillips kicks clear out from the traces
Lieutenant Clark will probably succeed him.
First Lieutenant Martin Smith was formerly an engineer on the
Harlem R. R. He went out with a three months' regiment and
afterwards as sergeant in Company G, 128th New York. He is open-
hearted and outspoken. One can always tell where he is, for he is
not deceitful. He is well liked by his brother officers. Just now he lies
on his back on my bed making fun of a stove I have manufactured
out of a camp kettle. He has no idea I am writing his biography.
First Lieutenant Reuben Reynolds is from Hudson, N. Y. He came out
as a private in Company A, 128th New York. Was promoted to
corporal, then to sergeant and then to first lieutenant in this
regiment. He looks as if he had just been taken from a bandbox. No
matter what clothes he has on he always looks neat and well
dressed. He was on a three years' whaling voyage before the war,
and tells some very interesting stories of his life on shipboard.
Before he came to us he was detailed as clerk in the Y. M. C. A. at
New Orleans. He is a professor of religion, and I think tries to make
his profession and his army life jibe. We all respect him, though
none of us feel as if we fairly knew him.
First Lieutenant John Mathers is from Fishkill, N. Y. He came out as a
private in Company F, 128th New York. Was promoted to second
lieutenant in the Third Engineers, and from that to our regiment as
first lieutenant. For some unknown reason he and I took a dislike to
each other while in the 128th, and used to pass each other by as
one surly dog does another. Since we have been thrown together we
have talked the matter over, and neither of us can give any reason
for our mutual dislike. We are the firmest of friends now, together
much of the time we can call our own. We are not a bit alike. He is a
regular dandy in appearance but the commonest sort of a fellow
when you get at him.
First Lieutenant Charles Heath was a sergeant in Company I, 128th
New York. Was given a commission in the Third Louisiana Engineers,
and afterwards given the same position in this regiment. In my
opinion his head is not right. He acts strange at times. Sometimes he
is as quiet and docile as can be, and in a little while as profane and
foul-mouthed a man as I ever met. Is not ambitious, but seems to
take what comes as a matter of course. He has no intimates,
keeping mostly to himself. What influence ever brought him up from
the ranks I cannot imagine.
First Lieutenant Garret F. Dillon was promoted from sergeant in
Company H, 128th New York. He is a very small man, has a lisp, and
a mincing walk. He looks and acts as if he was cut out for a dandy,
but lacked the material for making one, and was thrown out in the
shape he now is.

First Lieutenant Charles M. Bell was first sergeant of Company G,
128th New York. At the battle of Port Hudson he happened to be
nearest Colonel Cowles when he fell. He received the colonel's dying
message to his mother and was sent home with the body. He is one
of the most capable of the whole lot of us. There is no position he
could not fill, were it not for his liking for strong drink. This he does
not seem able to control. I believe he tries to but lacks the strength
to resist the temptations that are constantly placed in his way. Poor
Bell, I pity him more than any other man here. With the right
influences about him, what a different man he might be. He has
more good traits than any of us can boast, but his one besetting
weakness is strong enough to overcome them all.
First Lieutenant George H. Gorton enlisted in the 128th New York, as
wagoner. Was promoted to commissary sergeant in the Third
Louisiana Engineers, and from there he came as first lieutenant to
this regiment. He is of a strange make-up. Is well liked by all, but
not greatly respected by any. Is a good horseman and would
probably make out better handling horses than he does men. Put
him anywhere, and he manages to make money, and manages to
spend it as fast as he gets it. Is free-hearted and obliging and I
never knew of his having an enemy. Neither does he make any
lasting friendships. He worked as teamster for Colonel Bostwick
before going into the army, and it was through Colonel Bostwick that
he got the position he now occupies.
First Lieutenant Henry C. Lay was a corporal in Company A, 128th
New York. I knew him while in that regiment, but he has not yet
reported for duty with us. He is on some special service and I
suppose will sometime turn up among us. From what little I know of
him I should say he will average well with the rest of us.
First Lieutenant George S. Drake was also with Colonel Bostwick
before he entered the army. He was commissary sergeant in the
128th New York, and always in close touch with Colonel B. He and I
have long been fast friends, so it will not do to say anything against
him. But I couldn't if I would. There is nothing but good to say of
