ADVANCED SIGNAL PROCESSING
FOR COMMUNICATION SYSTEMS

THE KLUWER INTERNATIONAL SERIES
IN ENGINEERING AND COMPUTER SCIENCE

edited by
Tadeusz A. Wysocki
University of Wollongong, Australia
Michael Darnell
The University of Leeds, United Kingdom
Bahram Honary
Lancaster University, United Kingdom
KLUWER ACADEMIC PUBLISHERS
NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW

eBook ISBN: 0-306-47791-2
Print ISBN: 1-4020-7202-3
©2002 Kluwer Academic Publishers
New York, Boston, Dordrecht, London, Moscow
Print ©2002 Kluwer Academic Publishers
All rights reserved
No part of this eBook may be reproduced or transmitted in any form or by any means, electronic,
mechanical, recording, or otherwise, without written consent from the Publisher
Created in the United States of America
Visit Kluwer Online at: http://kluweronline.com
and Kluwer's eBookstore at: http://ebooks.kluweronline.com

CONTENTS
PREFACE
1. Application of Streaming Media in Educational Environments
P.Doulai
2. Wideband Speech and Audio Coding in the Perceptual Domain
L.Lin, E.Ambikairajah and W.H.Holmes
3. Recognition of Environmental Sounds Using Speech Recognition
Techniques
M.Cowling and R.Sitte
4. A Novel Dual Adaptive Approach to Speech Processing
M.C.Orr, B.J.Lithgow, R.Mahony, and D.S.Pham
5. On the Design of Wideband CDMA User Equipment (UE) Modem
K.H.Chang, M.C.Song, H.S.Park, Y.S.Song, K.-Y.Sohn, Y.-H.Kim,
C.I.Yeh, C.W.Yu, and D.H.Kim
6. MMSE Performance of Adaptive Oversampling Linear Multiuser
Receivers in CDMA Systems
P.Iamsa-ard and P.B.Rapajic
7. Peak-to-Average Power Ratio of IEEE 802.11a PHY Layer
Signals
A.D.S.Jayalath and C.Tellambura
8. A Proposed Hangup Free and Self-Noise Reduction Method for
Digital Symbol Synchronizer in MFSK Systems
C.D.Lee and M.Darnell
9. A Channel Sounder for Low-Frequency Sub-Surface Radio Paths
D.Gibson and M.Darnell
10. Computational Complexity of Iterative Channel Estimation and
Decoding Algorithms for GSM Receivers
H.Cui and P.B.Rapajic
11. Modelling and Prediction of Wireless Channel Quality
S.Ci and H.Sharif
12. Packet Error Rates of Terminated and Tailbiting Convolutional
Codes
J.Lassing, T.Ottosson and E.Ström
13. The Feng-Rao Designed Minimum Distance of Binary Linear
Codes and Cyclic Codes
J.Zheng, T.Kaida and K.Imamura
14. On a Use of Golay Sequences for Asynchronous DS CDMA
Applications
J.R.Seberry, B.J.Wysocki and T.A.Wysocki
15. PUM-Based Turbo Codes
L.Fagoonee, B.Honary and C.Williams
16. A Code for Sequential Traitor Tracing
R.Safavi-Naini and Y.Wang
17. Software-Defined Analyzer of Radio Signals
J.Lopatka
18. Interleaved PC-OFDM to Reduce Peak-to-Average Power Ratio
A.D.S.Jayalath and C.Tellambura
19. Reducing PAR and PICR of an OFDM Signal
K.Sathananthan and C.Tellambura
20. Iterative Joint Equalization and Decoding Based on Soft Cholesky
Equalization For General Complex Valued Modulation Symbols
J.Egle and J.Lindner
INDEX


PREFACE
In the second year of the twenty-first century, we are witnessing
unprecedented growth in both quality and quantity of services offered by
communication systems. Most of the recent advancements in communication
systems performance have only been made possible by the application of digital
signal processing in all areas of communication systems development and
implementation. Advanced digital signal processing allows the new
generation of communication systems to approach theoretical
predictions, and to put into practice ideas that were not considered
feasible to implement not so long ago. This book consists of 20 selected and
revised papers from the 6th International Symposium on Digital Signal
Processing for Communication Systems, held in January 2002 at the Pacific
Parkroyal Hotel in Manly, Sydney, Australia.
The first group of papers deals with audio and video processing for
communications applications, and includes topics ranging from multimedia
content delivery over the Internet, through speech processing and
recognition, to recognition of non-speech sounds that can be attributed to the
surrounding environment.
Another theme which receives significant attention in this book is orthogonal
frequency division multiplexing (OFDM) in its various forms, e.g.
HIPERLAN and IEEE 802.11a. Aspects of OFDM technology covered here
include novel forms of modulation and coding, methods of
reducing in-band and out-of-band spurious signal generation, and means of
reducing the peak-to-average power ratio of an OFDM waveform. In these
contributions, a key objective is to retain the inherent implementational
simplicity of the OFDM technique whilst enhancing its performance relative
to single carrier systems.
Digital signal processing for second and third generation systems is
represented in the book as well. The topics covered here include both
theoretical issues, such as spreading sequence design, and implementation issues
of a 3G user equipment modem and MMSE receivers for CDMA systems. A
useful comparison of the complexity of channel estimation, equalization and
decoding for GSM receivers is presented, too.
The book also includes useful papers on applications of error control
coding and information theory. These start with mathematical structure and
decoding techniques and continue with channel capacity approaching codes
and their applications to various communication systems.
The last group of papers included in the book considers several important
issues of digital signal processing for communication systems, such as
modulation, software defined radio, and channel estimation.
The Symposium was made possible by the generous support of the New
South Wales Section of IEEE, the Smart Internet Technology Cooperative
Research Center, the Telecommunications and Information Technology
Research Institute, the Australian Telecommunications Cooperative
Research Center, and the School of Electrical, Computer, and
Telecommunications Engineering at the University of Wollongong. The
Organizing Committee is most grateful for their support. The editors wish to
thank the authors for their dedication and considerable effort in preparing their
contributions and revising and submitting their chapters, as well as everyone else
who participated in the preparation of this book.
Tadeusz Wysocki
Mike Darnell
Bahram Honary

APPLICATION OF STREAMING MEDIA IN
EDUCATIONAL ENVIRONMENTS
Parviz Doulai
Educational Delivery Technology Laboratory (EDTLab), University of Wollongong, Wollongong
NSW 2500, Australia
Abstract: This paper discusses the growing application of Web-based instruction
and examines real time streaming technology in educational settings.
The steps required in the process of applying streaming technology in
education are outlined, and available tools and the nature of the
delivery platforms are identified. The prospects and challenges in
introducing virtual learning environments to tertiary institutions are
illustrated using two case studies. The challenges confronting
technology-based education in traditional teaching institutions are
expected to be overcome before long.
Key words: Educational Technology, Streaming, Multimedia, Virtual Learning
Environment, Virtual Classroom
Chapter 1
1. INTRODUCTION AND BACKGROUND
Educational institutions have long been a testing ground for the latest
technological breakthroughs that change the way professional educators
work and live. Examples include the growing application of information
communications technologies and the use of network delivered multimedia
educational modules through the application of interactive and dynamic Web
environments. The growing global information technology revolution has
already changed the face and culture of teaching and learning in Australia
and other parts of the world, creating new opportunities and challenges for
professional educators. The new and emerging educational technologies
have enabled academic institutions to provide a flexible and more open
learning environment for students. It has been shown that in a well-designed
web-based support system, students take more responsibility for their own
learning, and instructors function more like coaches and mentors for a new
generation of professionals [1].
The outcome of research and development work in utilizing new and
emerging educational technologies in traditional educational institutions has
also found its way into serving distance students. The convergence of new
information technologies such as telecommunications, computers, satellites,
and fiber optic technologies is making it easier for teaching institutions to
implement distance education [2,3]. National and transnational virtual
universities as well as traditional educational establishments are offering
online degree programs, continuing education and corporate training
courses. In many cases Web-based instruction and course management tools
are used to deliver courseware containing interactive multimedia-based
educational modules.
An integrated environment containing Web-based course delivery and
management along with multimedia modules is commonly referred to as a
virtual learning environment or a virtual classroom. Virtual learning
environments are used to support real classroom environments in traditional
academic institutions [4,5]. Virtual classrooms have also proved very
attractive in virtual campuses and virtual universities around the globe
[6]. Key technologies involved in the development of virtual learning
environments include multimedia and streaming media.
The reasons teaching institutions develop and use virtual classrooms
vary: some endeavor to keep up with the ever-changing frontiers
of educational technologies, whilst others see it as an approach that gives
students more control over their learning. The use of new and emerging
educational technologies offers students a dynamic learning environment
through which class communication and collaboration can be achieved with
minimum time and budget requirements. In fact, the great benefit of online
learning in general and virtual classrooms in particular is that it provides
educators with an opportunity to get students to collaborate and to
communicate very easily [1]. Two key issues in online learning are retention
and the development of interactive and collaborative activities and
environments. Creating a motivational and interactive virtual learning
environment can enhance student retention, completion, and overall
enthusiasm for this new type of learning arena [1].
In applications related to online learning, multimedia refers to the ability
to include sound and video in Web pages. Due to the availability of many
public domain and commercial computer programs, it has become
increasingly easy to incorporate audio and video clips into any digital
document or multimedia Web publishing materials. Streaming media came
about in response to the problem of bandwidth-greedy multimedia files,
opening up the possibility of delivering many multimedia applications via the
Internet. Streaming refers to the process of delivering audio clips, video
clips, and other media in real time to online users [7].
Streamed audio and video files can be found in a number of World Wide
Web locations serving a wide variety of purposes, such as a vocal
introduction to a homepage, a movie trailer, or an interactive educational
presentation. One of the major attractions of streaming media is "live"
broadcasting, which has less applicability to educational environments. In a
simple educational setup, streaming media is used to deliver
synchronized text, images and other media files over the public TCP/IP
network. In a more complex setup, streaming is used for network delivery of
interactive multimedia modules [8].
This paper illustrates two case studies: a simple virtual classroom
offering standard PowerPoint slides synchronized with streamed voice
narration, and a stream video presentation in which the video is indexed to
a table of contents. These case studies are explained in terms of the
module structure and the method of delivery. Both modules are delivered to
students over a low bandwidth modem connection.
It would be useful to utilize desktop videos for course material
presentation and distribution. However, until recent times network delivery
of multimedia clips was limited to a corporate environment or on-campus
environment where students have direct access to high-speed lines. The
delivery of media files over the Web has always been limited by the
bandwidth of communication lines or channels. Development in this field is
happening in two directions: faster connections and communication
technologies [9] that are altering the capacity of the communication channels
and new multimedia technologies for the Web, such as streaming audio and
video, flash animation, and others that are allowing for better delivery of
media on the Web [4].
When video first came to the World Wide Web, it was necessary to
download the entire video file before it could be played. This was seen to be
one major disadvantage of traditional multimedia clips and modules.
Downloading video files, typically megabytes in size, resulted in substantial
delays before the audience could actually hear or view the clip. This was
even worse when large clips were downloaded over a slow modem
connection.
2. STREAMING: MULTIMEDIA FILES FOR
NETWORK DELIVERY
Streaming media is a method of providing audio, video and other media
files in real-time without download waits over the Internet or corporate
Intranet. Instead of downloading the file in its entirety before playing it,
streaming technology takes a different approach; it downloads the beginning
of the file, forms a buffer of packets, and when an appropriate buffer level is
reached, the client player plays back the packets in a seamless stream. While
the viewer is watching, the player downloads the next portion, and so on, until
the entire file is played. The buffer provides a way for the player to protect itself in
case of network congestion, lost packets, or other interference.
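The buffer-then-play behaviour described above can be illustrated with a small simulation. The following Python sketch is not taken from any player implementation: the function name, the one-second time step, the fixed link rate and the example bit rates are all assumptions chosen for illustration.

```python
def simulate_streaming(clip_seconds, clip_bps, link_bps, prebuffer_seconds):
    """Simulate buffer-then-play delivery in one-second steps.

    clip_bps is the encoded bit rate of the clip and link_bps the (assumed
    constant) throughput of the connection. Returns a tuple
    (startup_delay_seconds, number_of_stalls).
    """
    total_bits = clip_seconds * clip_bps
    received_bits = 0
    played_seconds = 0
    clock = 0
    startup_delay = None
    playing = False
    stalls = 0
    while played_seconds < clip_seconds:
        clock += 1
        # The download continues in the background during playback.
        received_bits = min(total_bits, received_bits + link_bps)
        buffered_seconds = received_bits // clip_bps - played_seconds
        if playing:
            if buffered_seconds >= 1:
                played_seconds += 1      # play one second from the buffer
            else:
                playing = False          # buffer ran dry: pause and rebuffer
                stalls += 1
        elif buffered_seconds >= prebuffer_seconds or received_bits == total_bits:
            playing = True               # enough data buffered to start
            if startup_delay is None:
                startup_delay = clock
    return startup_delay, stalls


# A 60 s clip encoded at 20 kbps over a 28.8 kbps link with a 5 s buffer.
print(simulate_streaming(60, 20_000, 28_800, 5))
```

Under these assumptions, a clip encoded below the link rate starts after a few seconds and plays without interruption, whereas a clip encoded above the link rate repeatedly drains its buffer and stalls.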
2.1 History: Streaming Audio and Video
Progressive Networks [10] led the way in the development of streaming
audio and video, launching “RealAudio 1.0” in 1995. “RealAudio 2.0”
followed, upgrading sound to “FM mono” quality and making live
Webcasting possible for the first time. RealAudio 2.0 introduced important
features such as server bandwidth negotiation, support for firewalls and an open
Application Programming Interface (API) for third party developers.
Compatibility of RealAudio 2.0 with the Netscape Navigator plug-in
architecture made it possible to play RealAudio content as an
integrated part of a Web page. In February 1997, Progressive Networks
released RealVideo 1.0, which made delivery of video over 28.8 kbps a reality.
The system also offered full-motion-quality video using V.56 (56 kbps) and
near TV broadcast quality video at Local Area Network (LAN) rates or
broadband speeds (100 kbps and above).
In October 1997, Progressive Networks officially changed its name to
Real Networks prior to the release of what it called “RealSystem 5.0”. The
system included RealPlayer 5.0, RealEncoder 5.0, RealServer 5.0 and a
software package called RealPublisher. Until the release of RealSystem 6.0 in 1999,
the delivery of multimedia files was conducted using Real Networks'
proprietary PNM (Progressive Networks Metafile) format. RealSystem 6.0
used the Real Time Streaming Protocol (RTSP), which was then a new standard
for improved server-client communication. RealSystem 6.0 could also
stream and play not just Real Networks' own format, but also standard data
types such as MIDI, AVI or QuickTime. The case studies illustrated in this paper
were based on RealSystem 6.0.
Real Time Streaming Protocol is designed to work with time-based
media, such as streaming audio and video, as well as any application where
application-controlled, time-based delivery is essential. In addition, RTSP is
designed to control multicast delivery of streams, and is ideally suited to full
multicast solutions [7]. Currently, RealSystem supports a variety of new data
types. These include audio and video as well as text, images and animation.

In fact, streaming now is seen to be a platform for delivering information,
rather than just as a system for delivering video. One can tie other kinds of
Web content to the timeline of a video or an audio presentation. This allows
the creation of complex and personalized experiences for the end user. An
example that contains a variety of media files with precise timing structure is
available in [11].
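To make the role of RTSP more concrete, the sketch below formats a request in the text-based RTSP/1.0 syntax defined in RFC 2326, where a PLAY request with a Range header asks the server to start delivery from a given position on the media timeline. It only builds the request text and does not open a network connection. The host name, file name and session identifier are hypothetical, and the sketch says nothing about how RealSystem issues such requests internally.

```python
def rtsp_request(method, url, cseq, headers=None):
    """Format an RTSP/1.0 request as text (request line, CSeq, extra headers)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Header lines end with CRLF; an empty line terminates the request.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical example: resume a lecture 90 seconds into the clip.
request = rtsp_request(
    "PLAY",
    "rtsp://media.example.com/lecture.rm",   # hypothetical URL
    cseq=3,
    headers={"Session": "12345678", "Range": "npt=90-"},
)
print(request)
```

The Range header uses normal play time (npt), which is what allows a client to jump to an arbitrary point in a clip without first receiving everything that precedes it.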
2.2 Why Streaming?
There are several reasons why downloading an entire media file prior
to its playback is unsuitable for the delivery of information over the public
TCP/IP network. For instance, if a user on a low bandwidth connection (or
even a high bandwidth one) wants to move forward in the video, they have to wait
until the whole file is downloaded. Also, if a user views only a small portion
of the stream on a high bandwidth connection, they are likely to
have downloaded the whole file after only a few seconds. This costs the
user extra bandwidth, because Web servers typically send data as fast as they
can. Moreover, Web servers do not provide intellectual property control, so
a publisher is not able to prevent users from downloading the media file
for reuse. Web servers are also not capable of delivering presentations of
unlimited or undetermined length, or live broadcasts of media files.
There are other reasons as well, which demonstrate the superiority of dedicated
streaming servers over standard Web servers in the delivery of
multimedia files.
Streaming multimedia has been optimized for use on the Internet in two
ways:
• Clips are highly compressed, so that download time is drastically
reduced. The goal is to download the clip faster than it takes to play the
clip, even when using dial-up modem connections.
• The players and plug-ins can play the clip as it is being downloaded.
They start playing immediately, thus reducing wait time for the user.
These optimizations allow users to do things that are impractical for
traditional multimedia, including broadcasting of live audio and video events
and broadcasting of extremely large multimedia files, such as audio books
that can take many hours to play. Delivery of multimedia files through
a dedicated streaming server is often combined with fast-forward and rewind
capabilities.
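The arithmetic behind these points can be sketched in a few lines of Python. The function names and the example figures (a 2 MB file, a 28.8 kbps modem) are illustrative assumptions, and the nominal modem rate is used as if it were the achievable throughput, which in practice it is not quite.

```python
def full_download_wait_seconds(file_bytes, link_bps):
    """Waiting time before playback if the whole file must be downloaded first."""
    return file_bytes * 8 / link_bps


def can_stream_in_real_time(clip_bps, link_bps):
    """A clip can be played while downloading if it arrives at least as fast
    as it is consumed, i.e. its encoded bit rate does not exceed the link rate."""
    return clip_bps <= link_bps


# A 2 MB video clip over a 28.8 kbps modem (figures assumed for illustration).
wait = full_download_wait_seconds(2_000_000, 28_800)
print(f"Download-then-play wait: {wait / 60:.1f} minutes")

# The same modem can sustain a 20 kbps stream but not a 100 kbps one.
print(can_stream_in_real_time(20_000, 28_800))
print(can_stream_in_real_time(100_000, 28_800))
```

The first function shows why heavy compression matters for download-then-play delivery, and the second states the condition under which playback can keep pace with the download.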

3. STREAMING MEDIA: SERVERS, PLAYERS AND
ENCODERS
RealSystem, Microsoft Windows Media Technologies [12] and Apple’s
QuickTime [13] offer tools for streaming multimedia content across
corporate Intranets and the Internet. They allow the use of scripting
languages to control the player or, more importantly, integration with the
browser so that one can embed the player and control it using JavaScript.
Exposure to Java is useful, as it ensures that developers can use the wealth of
Java in virtual classrooms.
Producing pre-recorded streaming multimedia requires the following
steps:
1. Recording the content, which requires proper recording equipment such as
video cameras, microphones, etc.
2. Digitization or conversion of the resulting clip into a multimedia format,
such as .wav, .avi, .mov, .rm, etc. It is possible to do this at the same time
as step one by recording directly to the multimedia format.
3. Post-processing in the multimedia format, such as adjusting sound
quality, editing the content, etc.
4. Conversion of the resulting multimedia format into a preferred format
(e.g. RealSystem format) using the relevant encoder (e.g. RealProducer).
If no editing enhancements are needed, one can record directly to the
preferred format.
5. Uploading the resulting file to a Web server, or a dedicated streaming
server such as RealServer, so that people on the Web can access it as
streaming multimedia.
Examples shown in this paper use RealSystem, which is a collection of
components by Real Networks for producing and distributing streaming
multimedia. The three components of RealSystem are:
• Producer Module (encoder), which converts existing multimedia files into
RealSystem format. The encoder program can also record to RealSystem
format directly from audio and video sources.
• Player Module, which plays, amongst other things, the RealSystem media
file formats. The free version of RealPlayer includes both an external
version and a Web browser plug-in version. The professional version of
RealPlayer adds the ability to record broadcasts and other advanced
features.
• Server Module, which offers live broadcast and advanced features like
automatic adjustment of transmission speed to match the user’s
connection, or the ability to fast forward and rewind.
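One simple way to picture the matching of transmission speed to a user's connection is a server that holds several encodings of the same clip and picks the highest one the connection can sustain. The Python sketch below illustrates that idea only. It is not RealServer's actual algorithm, and the function name, the 80% headroom factor and the list of encoded rates are assumptions.

```python
def pick_stream_rate(available_bps, encoded_rates_bps, headroom=0.8):
    """Choose the highest encoded rate that fits within a fraction of the
    measured connection speed; fall back to the lowest rate if none fits."""
    usable_bps = available_bps * headroom   # leave room for protocol overhead
    candidates = [rate for rate in encoded_rates_bps if rate <= usable_bps]
    return max(candidates) if candidates else min(encoded_rates_bps)


# Hypothetical encodings of one clip: 20, 34 and 80 kbps.
rates = [20_000, 34_000, 80_000]
for link in (28_800, 56_000, 128_000):
    print(link, "->", pick_stream_rate(link, rates))
```

Keeping some headroom below the nominal link rate is a design choice that trades a little quality for fewer rebuffering pauses.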

4. VIRTUAL LEARNING ENVIRONMENTS
Web-based instruction can be supplemented by audio and video files to
closely simulate a real classroom environment. Streaming technology is the
key technology used in delivery of educational multimedia modules over the
network. A virtual learning environment in its relatively complete form
contains a small video clip that shows the class activity as well as a
series of text pages and images representing the content of the blackboard
and the overhead projector screen.
From the perspective of a developer of educational resources, the interesting
idea behind streaming files is the synchronization of the playback of
arbitrary files such as text and images. For instance, one can synchronize a
Flash animation file with an audio, text, image, or any other data file. In a
virtual classroom environment, one can synchronize the playback of a class
video with images taken from the blackboard or the overhead screen as the
lecture progresses.
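The synchronization idea can be reduced to a lookup on a shared timeline: each image is given a start time, and the player shows whichever image is current for the playback position. The following Python sketch assumes a hypothetical schedule of slide images and does not represent the file format of any particular streaming system.

```python
from bisect import bisect_right


def slide_at(position_seconds, schedule):
    """Return the slide that should be visible at a playback position.

    schedule is a list of (start_seconds, slide_name) pairs sorted by start
    time. Returns None before the first slide starts.
    """
    starts = [start for start, _ in schedule]
    index = bisect_right(starts, position_seconds) - 1
    return schedule[index][1] if index >= 0 else None


# Hypothetical schedule: image file names and times are illustrative only.
schedule = [(0, "slide01.gif"), (45, "slide02.gif"), (130, "slide03.gif")]
for position in (10, 45, 200):
    print(position, "->", slide_at(position, schedule))
```

Because the lookup depends only on the current position, the displayed image stays correct whether the position changes through normal playback or through a jump by the user.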
4.1 Case Study 1: Stream Video Integration into Virtual
Classrooms
Due to the recent availability of video compressor/decompressor (codec)
technologies with compression designed for web delivery, it is now possible
to use video as an effective resource in a web-based instruction environment.
Different client programs are now available to make movies with different
data rates, and different streaming server programs are now available to
negotiate with the client machines to deliver stream video at relatively high
quality even over narrow-bandwidth modem connections.
A stream video presentation was included in the learning environment of a
combined final year and Master's subject (ELEC476/912) to provide background
materials for students' group projects. This module was offered in two
formats to suit low- and high-end Mac, PC and UNIX platforms as well as
slow and moderately fast network connections. In both formats an audio and
a video file synchronized with text and images were used to create a simple
virtual tutorial classroom.
An interesting feature of most streaming server programs is that they
allow client machines to negotiate directly with the server to access the part
of the media file they want. Normally, after a short pause the user can jump to
anywhere in an audio or video clip. The video can be indexed to a table of
contents and can also automatically "flip" pages in an adjacent frame
according to markers embedded in the video. As shown in Figure 1, the
video file in this presentation was indexed to a table of contents, which was
done through markers embedded in the video file during the encoding
process. These enabled students to click on items listed in the table of
contents (left window) in order to view the associated video along with its
synchronized text and images in allocated areas within the presentation
window.
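The indexing of a video to a table of contents can be sketched as a mapping from marker titles to time ranges, so that selecting an entry yields the position the player should seek to. In the Python sketch below, the marker times and titles are hypothetical and the function name is an assumption. It illustrates the principle rather than the marker format used by the encoder in this case study.

```python
def build_toc(markers, clip_seconds):
    """Turn (start_seconds, title) markers into a table of contents that maps
    each title to the (start, end) range of its video segment."""
    ordered = sorted(markers)
    toc = {}
    for i, (start, title) in enumerate(ordered):
        # A segment ends where the next marker begins, or at the end of the clip.
        end = ordered[i + 1][0] if i + 1 < len(ordered) else clip_seconds
        toc[title] = (start, end)
    return toc


# Hypothetical markers for an 8-minute presentation.
markers = [(0, "Introduction"), (95, "Project brief"), (310, "Assessment")]
toc = build_toc(markers, clip_seconds=480)
seek_to, segment_end = toc["Project brief"]
print(seek_to, segment_end)
```

Selecting "Project brief" in such a table would translate into a seek to the start of its segment, after which the embedded markers continue to drive the synchronized text and images.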
An online questionnaire was administered to obtain information
regarding student access to the subject homepage and its stream video
integration in ELEC476/912 virtual learning environment. Survey results
showed that students realized the benefits of technology-enhanced resources
that were incorporated into their on-campus course delivery. Students’
comments and feedback on the course content, the method of delivery and
available tools and resources for this subject were archived in [14].
4.2 Case Study 2: ELEC101 Virtual Classroom
The Web Edition of “Fundamentals of Electrical Engineering
(ELEC101)” is a simple virtual classroom environment that uses real-time
streaming technology to deliver synchronized PowerPoint slides
(images) and audio files (the lecturer's voice) over the Internet.
To ensure that students using computers of any power and
connections of any speed could retrieve the content of the ELEC101 virtual
classroom, four options were provided: plain, synchronized, controlled
synchronized and PowerPoint slide/script. Figure 2 shows a screen capture
of the cover page of the ELEC101 World Wide Web Edition.
Rather than replacing the conventional lecturing of ELEC101, the Web
edition was designed and implemented to help students who need to review
the key points of major topics. Students need the freely available
RealPlayer and perhaps a headphone set so that they can hear the lecture and
view the overheads in computer laboratories or at home using a standard 56
kbps dial-up connection on a PC, Mac or UNIX platform.
In the plain format students first receive a page containing thumbnails of
available overheads. The RealPlayer will start working as soon as students
click on a thumbnail to view the actual overhead. Then, they step to the slide
they are interested in, and hear the audio clip associated with each slide.
Students may control the RealPlayer operation, and they also have
standard navigation tools. The RealPlayer may be used as a plug-in program
or as a Netscape or Internet Explorer helper application. The latter means that
when a student clicks on the RealAudio icon, the browser launches the player, and from
there students control the player operation: recording, playback, rewind and
so forth. They may also use standard previous and next buttons to move
around. A screen capture of the plain format is shown in Figure 3.
In the synchronized format, students receive PowerPoint slides and their
associated sound. The audio file automatically updates the slides displayed as
the lecture progresses. Multiple RealPlayer controls are provided in this
option. These include play, pause, volume control and position slider. Users
can use the latter to move forward and backward through the presentation.

The controlled synchronized option of ELEC101 displays projected
slides on the screen and plays the corresponding sound. In this mode of
operation, students step to the slide they are interested in and start the player.
While the audio is playing, it will automatically update the slide as the
lecture progresses. Alternatively, students can jump to a new slide by
clicking on thumbnails listed in the left frame, and the audio will jump to
follow. To start listening to the audio from a particular slide, students may
type the slide number in the space provided in the control section and press the
enter key. Figure 5 shows a screen capture of ELEC101 in the controlled
synchronized mode of operation.
Provision was also made for students using a computer without a sound
card. In this case they view a slide in one window and read its
corresponding text in another browser window.
Implementation of the plain format is very simple provided the developer
knows the technology and has access to tools that are available at little or no cost. The
“controlled synchronized” version of ELEC101 presents some challenges.
This version uses JavaScript, Frames, and the RealAudio Plug-in.
Nowadays, the RealPlayer itself supports Java-driven events. This basically
means that the development of synchronized audio and video files for network
delivery is much easier, and can be done by almost anyone.
The ELEC101 virtual classroom environment was tested by a group of
second-year students using moderately high-speed connections (computer
laboratories on campus) and low-speed dial-up connections (28.8 kbps and
higher modems). The setup performed with no interruptions or delay in
delivering the subject content (sound and images). Students found the entire
concept of the virtual classroom and the application of streamed and
synchronized audio files very exciting and motivating. The setup is now
available on the Internet for public use [14].
5. CONCLUSION
The combination of powerful compression algorithms, the extensive features
associated with streaming servers, and integration with the Web makes it
possible to use virtual learning environments effectively over narrow
bandwidth networks. This paper explored the integration of multimedia
modules into a virtual learning environment. Real time streaming technology
in an educational setting was examined and the process of applying
streaming technology in education was briefly highlighted. Two examples of
virtual learning environments using stream-synchronized audio/video and
image files were illustrated. It is envisaged that the use of technology
enabled methods in face-to-face university instruction results in a model that
works equally well for distance students and learners in virtual campuses.
REFERENCES
[1] P. Doulai, “Preserving the quality of on-campus education using
resource-based approaches,” Proc. International WebCT Conference on Learning
Technologies, University of British Columbia, Vancouver, Canada, 1999,
pp. 97-101.
[2] B. Hart-Davidson and R. Grice, “Extending the dimensions of education:
Designing, developing, and delivering effective distance-educ.,” Proc. of the
IEEE Professional Communication Conference, 2001, pp. 221-230.
[3] E. R. Ladd, J. R. Holt and H. A. Rumsey, “Washington State University's
engineering management program distance education industry partnership,”
Proc. of Portland International Conference on Management of Engineering and
Technology, 2001, pp. 302-306.
[4] P. Doulai, “Smart and Flexible Campus: Technology Enabled University
Education,” Proc. of The World Internet and Electronic Cities Conference,
2001, Iran, pp. 94-101.
[5] V. Trajkovic, D. Davcev et al., “Web-based virtual classroom,” Proc. of
IEEE Conference on Technology of Object-Oriented Languages and Systems, 2000,
pp. 137-146.
[6] W. Beuschel, “Virtual campus: scenarios, obstacles and experiences,”
Proc. of IEEE Conference on System Sciences, 1998, pp. 284-293.
[7] A. Zhang, Y. Song and M. Mielke, “NetMedia: Streaming multimedia
presentations in distributed environments,” IEEE Multimedia, Vol. 9, 2002,
pp. 56-73.
[8] P. Doulai, “Recent developments in Web-based educational technologies: A
practical overview using in-house implementation,” Proc. of the International
Power Engineering Conference, 1999, Singapore, pp. 845-850.
[9] D. Fernandez, A. B. Garcia, D. Larrabeiti, A. Azcorra, P. Pacyna, and
Z. Papir, “Multimedia services for distant work and education in an IP/ATM
environment,” IEEE Multimedia, Vol. 8, 2001, pp. 68-77.
[10] RealNetworks (Progressive Networks), http://www.real.com/
[11] Design and Management 1, “Introduction to Group Projects (ELEC195)
Homepage,” http://edt.uow.edu.au/elec195/welcome.ram
[12] S. Huang and H. Hu, “Integrating windows streaming media technologies
into a virtual classroom environment,” Proc. of International Symposium on
Multimedia Software Engineering, 2000, pp. 411-418.
[13] Apple QuickTime, http://www.apple.come/quicktime/
[14] The Educational Delivery Technology Laboratory (EDTLab), University of
Wollongong, http://edt.uow.edu.au/edtlab/portfolio.html/

Chapter 2
WIDEBAND SPEECH AND AUDIO CODING IN
THE PERCEPTUAL DOMAIN
L. Lin, E. Ambikairajah and W.H. Holmes
School of Electrical Engineering and Telecommunications, The University of New South
Wales, UNSW Sydney 2052, Australia.
Abstract: A new critical band auditory filterbank with superior auditory masking
properties is proposed and is applied to wideband speech and audio coding.
The analysis and synthesis are performed in the perceptual domain using this
filterbank. The outputs of the analysis filters are processed to obtain a series
of pulse trains that represent neural firing. Simultaneous and temporal
masking models are applied to reduce the number of pulses in order to achieve
a compact time-frequency parameterization. The pulse amplitudes and
positions are then coded using a run-length coding algorithm. The new speech
and audio coder produces high quality coded speech and audio, with both
temporal and spectral fidelity.
Key words: auditory filterbank, speech coding, simultaneous and temporal masking
1. INTRODUCTION
Current applications of speech and audio coding algorithms include
cellular and personal communications, teleconferencing, secure
communications etc. Historically, coding algorithms using incompatible
compression techniques have been optimized for particular signal classes
such as narrowband speech, wideband speech, high quality audio and high
fidelity audio (CD quality). It is evident that a universal speech and audio
coding paradigm is required to meet the diverse needs of the above
applications. Low bit rate speech coders provide impressive performance
above 4kbps for speech signals, but do not perform well on music signals.
Similarly, transform coders perform well for music signals, but not for
speech signals at lower bit rates.
Speech and general audio coders are usually quite different: for speech
one of the main tools is a model of the speech production process, whereas
for audio more attention is paid to modeling the human auditory system,
since a source model is usually not feasible. The new MPEG-4 standard for
multimedia communication includes a scalable audio codec supporting
transmission at bit rates from 2 to 64kbps. However, in order to achieve the
highest audio quality over the full range of bit rates, MPEG-4 actually
employs three types of codec. For lower bit rates, a parametric codec
(Harmonic Vector Excitation Coding) is used which encodes at 2-4kbps for
speech with an 8kHz sampling frequency, and at 4-16kbps for speech and
audio with 8 or 16kHz sampling frequency. A Code Excited Linear
Predictive (CELP) codec is used for the medium rates, i.e. 6-24kbps at 8 or
16kHz sampling frequency. Time-frequency (TF) codecs, including the
MPEG-2 AAC and Twin VQ codecs, are used for the higher bit rates,
requiring 16-64kbps at a sampling frequency of 8kHz.
There is therefore a need for high quality coders that can work equally
well with either speech or general audio signals. In this work we propose a
scheme for a universal coder that can handle both wideband speech and
audio signals. This coder is based on a new auditory filterbank model, and is
a further development of the speech and audio coding scheme initially
proposed by Ambikairajah et al. [3], in which the analysis and synthesis of
the speech and audio signals take place in the perceptual domain.
1.1 Coding using Auditory Filterbanks
In recent years parallel auditory filterbanks such as the Gammatone
filterbank [5,13] have outperformed the conventional transmission line
auditory model [1,12] in terms of computational simplicity. They have
applications in various types of signal processing required to model human
auditory filtering. Gammatone auditory filters were first proposed by
Flanagan [5] to model basilar membrane motion, and were subsequently
used by Patterson et al. [13] as a reasonably accurate alternative for auditory
filtering. They have since become very popular. Robert and Eriksson [15]
applied them to produce a nonlinear active model of the auditory periphery,
and Kubin and Kleijn [7] applied them to speech coding.
In the wideband speech and audio coder proposed by Ambikairajah et al.
[3], the analysis is performed in the auditory domain by using Gammatone
filters to obtain an auditory-based time-frequency parameterization of the
input signal in the form of critical band pulse trains. This parameterization
approximates the patterns of neural firing generated by the auditory nerves,
and preserves the temporal information present in speech and music. An
advantage of this parameterization is its ability to scale easily between
different sampling rates, bit rates and signal types.
Adequate modeling of the principal behavior of the peripheral auditory
system is still a difficult problem. An important shortcoming of Gammatone
filters is that they do not provide an accurate frequency domain description
of the tuning curves because of their flat upper-frequency slopes. In this
work we propose a new parallel auditory filterbank based on the critical
band scale. The filterbank models psychoacoustic tuning curves obtained
from the well-known masking curves [16,17]. The new auditory filters,
which have a steeper upper-frequency slope, achieve high frequency domain
accuracy and are computationally efficient. The new filterbank is then
applied to wideband speech and audio coding under the same paradigm as in
[3]. Auditory masking is applied to eliminate redundant information in the
critical band pulse trains. A technique to code the pulse positions and
amplitudes based on a run-length coding algorithm is also proposed.
This chapter is organized as follows: Section 2 presents the design
techniques for the new critical band auditory filterbank. Section 3 describes
the auditory-filterbank-based speech and audio coding scheme, including the
reduction of redundancy in the pulse trains and the quantization and coding
techniques for the pulse amplitudes and positions.
2. DESIGN OF A CRITICAL BAND AUDITORY
FILTERBANK
A filterbank that models the characteristics of the human hearing system
will have many desirable features and can have wide applications in speech
and audio processing. It is very difficult and costly to experimentally
observe the motion of the basilar membrane in a fully functional cochlea.
We present here an inexpensive method for generating psychoacoustic
tuning curves from the well-known auditory masking curves [16,17]. Two
approaches to obtaining a critical band filterbank that models these tuning
curves are then introduced. The first approach is based on the log-magnitude
modeling technique for filter design, which gives very accurate results. The
second approach uses a unified transfer function to represent each filter in
the critical band filterbank.
2.1 Generation of Psychoacoustic Tuning Curves from
Masking Curves
Masking is usually described as the sound-pressure level of a test sound
necessary to be barely audible in the presence of a masker. Using narrow-
band noise of a given center frequency and bandwidth as maskers and a pure
tone as the test sound, masking patterns have been obtained by Zwicker and
Fastl [16,17]. The effect of masking produced by narrow-band maskers is
level dependent. The five curves plotted as solid lines in Fig. 1 are the
masking patterns centered at 1 kHz at the five different levels 20, 40, 60, 80
and 100 dB [17].
It is known that the shapes of the masking patterns for different center
frequencies and different levels are very similar when plotted using the
critical band rate scale. Hence masking curves at different center
frequencies can be obtained by simply shifting the available masking curves
at 1 kHz. Masking curves at levels other than 20, 40, 60, 80 and 100 dB can be
generated through interpolation. The masking curves obtained through
interpolation and shifting are shown in Fig. 1 by the dashed lines.
The tuning curves can be obtained from the masking curves as follows.
The first step is to fix a test tone at a particular frequency and level. Then
the masking curves with different center frequencies that are just able to
mask the test tone are found and the corresponding levels are noted.
Plotting the levels as a function of the center frequencies provides the tuning
curve at that test tone frequency (Fig. 2).
The magnitude response of the basilar membrane (or auditory filters) can
be obtained by vertically reversing and scaling the tuning curves in Fig. 2.
This is shown in later subsections in Figs. 3 and 4 by the dashed lines. More
details can be found in [11]. The tuning curves are consistent with the
measurement of nerve tuning curves [8] and the basilar membrane response
[14]. Two auditory filter design techniques that model the magnitude
response accurately are introduced in the following subsections.
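The procedure for deriving a tuning curve from masking curves can be illustrated with a small sketch. It does not use the measured masking patterns of [16,17]; instead it assumes a hypothetical triangular masking pattern on the Bark scale whose upper slope flattens with masker level. The slope values and the names `masked_threshold` and `tuning_curve` are assumptions chosen only to show the search for the masker level that just masks a fixed test tone.

```python
def masked_threshold(masker_db, dz):
    """Hypothetical masking pattern: level (dB) a test tone needs to be heard
    when it lies dz Bark above (dz > 0) or below (dz < 0) the masker centre."""
    if dz < 0:
        return masker_db - 27.0 * (-dz)             # fixed lower slope (assumed)
    upper_slope = max(24.0 - 0.2 * masker_db, 2.0)  # flattens with level (assumed)
    return masker_db - upper_slope * dz

def tuning_curve(test_bark, test_db, masker_barks):
    """For each masker centre, find the masker level that just masks the test tone."""
    levels = []
    for zc in masker_barks:
        dz = test_bark - zc
        lo, hi = 0.0, 200.0
        for _ in range(60):                         # bisection on masker level
            mid = 0.5 * (lo + hi)
            if masked_threshold(mid, dz) < test_db:
                lo = mid
            else:
                hi = mid
        levels.append(hi)
    return levels
```

With a 10 dB test tone at 8.5 Bark, maskers one Bark below and one Bark above the tone need noticeably different levels under these assumed slopes, which gives the asymmetric shape typical of a tuning curve.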

2.2 Filterbank Design by the Log-Magnitude Modeling
Technique
It is well known that the human auditory system gives rise to a perception
of loudness that closely follows a logarithmic scale. Log-magnitude
modeling is a technique for IIR digital filter design [6]. This technique has
also been applied in [10] to the modeling of auditory tuning curves. The
result is a very accurate model that matches the magnitudes of the tuning
curves. The criterion for auditory filter design is based on the minimization
of the difference between the log-magnitude of the desired basilar membrane
frequency response and that of a pole-zero filter. The transfer function of one
filter in a critical band rate filterbank can be written as

$$H(z) = \frac{\sum_{k=0}^{Q} b_k z^{-k}}{1 + \sum_{k=1}^{P} a_k z^{-k}} \qquad (1)$$

where $a_k$ and $b_k$ are the filter parameters, P is the number of poles, and
Q is the number of zeros. The filter design technique minimizes the sum of
squared differences, on a logarithmic scale, between a given set of spectral
amplitudes $D(\omega_k)$ and the magnitude response of $H(z)$ sampled at the
same frequencies:

$$J = \sum_{k=1}^{N} \left[ \ln D(\omega_k) - \ln \left| H(e^{j\omega_k}) \right| \right]^2 \qquad (2)$$

where $\{\omega_k\}$, $k = 1, \ldots, N$, is a set of uniformly spaced frequencies
and $D(\omega_k)$ is the desired basilar membrane frequency response (positive
magnitude values) at a certain center frequency.
The minimization of J with respect to the parameters $a_k$ and $b_k$ is a
nonlinear problem. To avoid gradient-based optimization, an iterative
procedure originally proposed in [6] is used. The minimization index at the
$m$th step can be written as
The filter at step m is computed from
where
The solution of (4) is used to update the weight function in (3) and the
process is then repeated. The complete algorithm converges to a sufficiently
small error within 2 to 3 iterations. The details of this procedure can be
found in [6,10]. A critical band filterbank of 17 filters covering the
frequency range of 50 Hz to 4000 Hz was obtained by this design technique.
The frequency responses of the 17 filters are shown in Fig. 3 by the solid
lines, together with the vertically flipped tuning curves (dashed lines). These
filters are minimum-phase IIR filters with 8 poles and 7 zeros. The
magnitude responses of the digital filters are almost indistinguishable from
the true tuning curves.
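The design criterion of this subsection, a sum of squared log-magnitude differences between a desired response and a pole-zero filter, can be evaluated with a few lines of code. The sketch below only computes the error index for given coefficients; it does not implement the iterative weighted least-squares procedure of [6]. The function names and the coefficient layout (`a[0]` fixed to 1) are assumptions made for the illustration.

```python
import cmath
import math

def freq_response(b, a, omega):
    """H(e^{j omega}) for H(z) = sum(b[k] z^-k) / sum(a[k] z^-k), with a[0] == 1."""
    z_inv = cmath.exp(-1j * omega)
    num = sum(bk * z_inv ** k for k, bk in enumerate(b))
    den = sum(ak * z_inv ** k for k, ak in enumerate(a))
    return num / den

def log_magnitude_error(b, a, omegas, desired):
    """Sum of squared differences between ln(desired) and ln|H| at each frequency."""
    return sum((math.log(d) - math.log(abs(freq_response(b, a, w)))) ** 2
               for w, d in zip(omegas, desired))
```

Because the error is measured on a logarithmic scale, a constant gain mismatch contributes the same amount at every frequency, regardless of how small the desired magnitude is there. This is what lets the fit follow the steep skirts of a tuning curve as closely as its peak.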

2.3 Filterbank Design by Direct Modeling Approach
A unified digital filter model is proposed in [11] to represent the
frequency characteristics of all the tuning curves. The transfer function of
one auditory filter in a critical band filterbank is expressed in the z-domain
by
The parameters in (5) are given by
where is the sampling frequency. The critical bandwidth and the central
frequency in (6) are calculated from the following equations [16, 17]:
where is the critical band rate in Bark corresponding to The spacing of
is linear on a critical band scale.
The parameter is chosen as The term
produces a notch filter with a sharp dip at a

point to the right of the center frequency so that the upper-frequency slope
of the overall filter is steep enough. The parameter is chosen as
To ensure the notch happens at a frequency location about 60 dB lower than
the center frequency the empirical formula that we obtained can be used
to choose
where is in Hz.
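The critical band rate and critical bandwidth referred to in this subsection are commonly computed with the analytical approximations published by Zwicker and Terhardt. The sketch below assumes those widely quoted expressions; it is not confirmed that they are exactly the equations used for this filterbank, and the function names are illustrative.

```python
import math

def hz_to_bark(f_hz):
    """Critical band rate z (Bark) for frequency f (Hz), Zwicker-Terhardt form."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def critical_bandwidth_hz(f_hz):
    """Critical bandwidth (Hz) around centre frequency f (Hz)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

def bark_to_hz(z, lo=0.0, hi=20000.0):
    """Invert hz_to_bark by bisection (the forward map is monotonic)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if hz_to_bark(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Centre frequencies spaced linearly on the Bark scale, e.g. one per Bark.
centres_hz = [bark_to_hz(z) for z in range(1, 18)]
```

Spacing the centre frequencies uniformly in Bark, as in the last line, places the filters densely at low frequencies and sparsely at high frequencies.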
The frequency responses of five filters at critical bands 4, 7, 10, 13 and
16 are plotted in Fig. 4, together with the corresponding tuning curves. The
modeling accuracy of the direct modeling approach is acceptable, and the
approach is more straightforward than the log-magnitude modeling approach.
Our filters are also compared with the well-known Gammatone auditory
filters [5,13]. Our filters have steeper upper-frequency slopes, which is
desirable for both accurate modeling of the masking effect and noise
suppression. Critical band filters designed using this method can achieve
both high frequency domain accuracy and computational efficiency. Next we
will apply the critical band auditory filterbank to speech and audio
processing.

3. PERCEPTUAL DOMAIN BASED SPEECH AND
AUDIO CODING
3.1 Speech/audio Coding Using an Auditory Filterbank
The speech and audio coding system implemented in this work is an
IIR/FIR analysis/synthesis scheme as described in [9] and also shown in
Figs. 5 and 6. Other possible analysis/synthesis filterbank implementations
can also be found in [9].
Each IIR analysis filter has 8 poles and 3 zeros. The analysis filterbank
can also be implemented in FIR form [3,7], but at least 100 coefficients are
required for each FIR filter to approximate the impulse response of the IIR
filter with reasonable accuracy. The auditory filterbank is also approximately
power-complementary. That is,

$$\sum_{i=1}^{M} \left| H_i(e^{j\omega}) \right|^2 \approx 1$$

where $H_i(e^{j\omega})$ is the frequency response of the analysis filter at the
ith channel and M is the total number of channels. If we choose the synthesis
filters as

$$G_i(z) = H_i(z^{-1}), \quad i = 1, \ldots, M,$$

then the synthesis filterbank is implemented using FIR filters obtained by
time-reversal of the impulse responses of the corresponding analysis filters.
The reconstruction is nearly perfect, i.e.

$$\sum_{i=1}^{M} H_i(z)\, G_i(z) \approx 1.$$

Each FIR synthesis filter has 128 coefficients, so that an 8 ms delay is
required to make the filter causal if $f_s = 16$ kHz.
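The time-reversal construction of the synthesis filters can be illustrated as follows. The sketch assumes a truncated analysis impulse response of 128 samples and a 16 kHz sampling rate, which is the rate consistent with the stated 8 ms delay. The decaying test response and all names are hypothetical.

```python
def convolve(x, h):
    """Direct-form linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def synthesis_filter(analysis_ir):
    """FIR synthesis filter: the truncated analysis impulse response, time-reversed."""
    return list(reversed(analysis_ir))

FS_HZ = 16000                        # assumed sampling rate
TAPS = 128                           # synthesis filter length stated in the text
delay_ms = 1000.0 * TAPS / FS_HZ     # delay needed to make the reversed filter causal

# Hypothetical analysis response; the cascade with its reversal is symmetric,
# i.e. one channel contributes |H_i|^2 with linear phase.
h = [0.9 ** n for n in range(TAPS)]
cascade = convolve(h, synthesis_filter(h))
```

The symmetric cascade shows why each channel contributes only its squared magnitude response to the output. If the squared responses sum to roughly a constant across channels, the overall system reduces to a pure delay.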

The output of each filter is half-wave rectified, and the positive peaks of
the critical band signals are located. Physically, the half-wave rectification
process corresponds to the action of the inner hair cells, which respond to
movement of the basilar membrane in one direction only. Peaks correspond
to higher rates of neural firing at larger displacements of the inner hair cell
from its position at rest. This process results in a series of critical band pulse
trains, where the pulses retain the amplitudes of the critical band signals
from which they were derived.
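The rectification and peak-picking step can be sketched for a single critical band signal as follows. The function name and the tie-breaking rule for flat peaks are assumptions; the text only specifies that positive peaks are located and that the pulses keep the amplitudes of the band signal.

```python
def pulse_train(band_signal):
    """Half-wave rectify one critical band signal and keep its positive peaks.

    Returns a list of the same length that is zero everywhere except at local
    maxima of the rectified signal, where it holds the signal amplitude.
    """
    rect = [max(v, 0.0) for v in band_signal]
    pulses = [0.0] * len(rect)
    for n in range(1, len(rect) - 1):
        # Strict rise on the left, non-increase on the right (assumed rule).
        if rect[n] > 0.0 and rect[n - 1] < rect[n] >= rect[n + 1]:
            pulses[n] = rect[n]
    return pulses
```

For a sinusoid this yields one pulse per period, so low-frequency bands produce sparse pulse trains and high-frequency bands dense ones.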
3.2 Auditory Masking
In recognition of the fact that lower power components of the critical
band signals are rendered inaudible by the presence of larger power
components in neighboring critical bands, a simultaneous masking model is
employed. Weak signal components also become inaudible in the presence of
stronger signal components in the same critical band that precede or follow
them in time; this is called temporal masking. When the signal precedes
the masker in time, it is called pre-masking; when the signal follows the
masker in time, it is called post-masking. A strong signal can
mask a weaker signal that occurs after it and a weaker signal that occurs
before it [2, 16, 17]. Both temporal pre-masking and temporal post-masking
are employed in this work to reduce the number of pulses.
3.2.1 Simultaneous Masking
In the implementation described, a simultaneous masking model similar
to that used in MPEG [4] was employed to calculate the masking threshold
for the ith critical band; however, the optimum simultaneous masking
model for this scheme has yet to be determined. The simultaneous masked
pulse train for the ith critical band was obtained from the unmasked pulse
train as follows: pulses whose amplitudes were below the masking
threshold calculated for that critical band were considered inaudible and
were set to zero.
Note that for each 32 ms frame, the gain of each critical band is
calculated based only on the non-zero pulse amplitudes. The purpose of
applying simultaneous masking is to produce a more efficient and
perceptually accurate parameterization of the firing pulses occurring in each
band. Experiments revealed that simultaneous masking removed an average
of around 10% of the pulses without altering the quality of the reconstructed
speech in any way.
3.2.2 Temporal Post-masking
The masking threshold for temporal post-masking decays
approximately exponentially following each pulse, or neural firing. A simple
approximation to this masking threshold, introduced in [3], is
where is the ith of M= 21 simultaneous masked critical band pulse
train signals, and is the discrete time sample index. The

time constants were determined empirically by listening to the
quality of the reconstructed speech, and values between and
were chosen. All pulses with amplitudes less than the masking
threshold were discarded. The thresholds are shown in Fig. 7 by the
dashed line, where the filled spikes are the pulses to be kept after applying
post-masking.
3.2.3 Temporal Pre-masking
Pre-masking is also allowed for in this work. The masking threshold
for this temporal pre-masking is chosen as
where is the ith critical band pulse train after post-masking, and is
chosen as to simulate the fast exponential decay of pre-
masking. All pulses with amplitude less than the masking threshold
were discarded. This is shown in Fig. 8, where the filled spikes are the pulses
to be kept after applying pre-masking. A reduction rate of 10% can be
achieved by pre-masking on the pulses obtained after post-masking.
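The temporal masking steps can be sketched as a running threshold that decays exponentially after each retained pulse. This is an assumed form: the exact threshold expressions and time constants of the coder are not reproduced here, and the names `post_mask`, `pre_mask` and `tau_samples` are hypothetical. Pre-masking is modelled by applying the same rule backwards in time with a shorter time constant.

```python
import math

def post_mask(pulses, tau_samples):
    """Discard pulses that fall below an exponentially decaying threshold
    set by the most recent retained pulse (assumed threshold form)."""
    kept = [0.0] * len(pulses)
    threshold = 0.0
    decay = math.exp(-1.0 / tau_samples)
    for n, p in enumerate(pulses):
        threshold *= decay               # threshold decays by one sample
        if p > threshold:
            kept[n] = p                  # audible: keep it and reset the threshold
            threshold = p
    return kept

def pre_mask(pulses, tau_samples):
    """Same rule run backwards in time, so a pulse masks weaker ones before it."""
    return post_mask(pulses[::-1], tau_samples)[::-1]

def temporal_mask(pulses, tau_post, tau_pre):
    """Post-masking followed by pre-masking, in the order used in the text."""
    return pre_mask(post_mask(pulses, tau_post), tau_pre)
```

A larger `tau_post` than `tau_pre` reflects the faster decay of pre-masking described above.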
The purpose of applying masking is to produce a more efficient and
perceptually accurate parameterization of the firing pulses occurring in each
band. Experiments show that the application of temporal masking reduces
the overall pulse number to about 0.70N (where N is the frame size) while
maintaining transparent quality of the coded speech and audio. This is a
significant improvement over the pulse number of 1.26N in the previous
application [3], which used Gammatone filters in the front end. The
improvement is mainly due to the spectral shape of the new auditory filters
used in this work.

3.2.4 Thresholding
The pulses in the silent frames obtained after auditory filtering and peak
picking are most likely due to background and quantization noise. These
pulses are at random positions and their magnitudes are very small, so that
the sound synthesized from these pulses is inaudible. By thresholding,
these pulses can be eliminated without affecting the quality of the
synthesized signal. A simple approach is to choose the threshold based on
the silent frames at the beginning of the coding process.
3.3 Quantization and Coding
The pulse train in each critical band after redundancy reduction was
finally normalized by the mean of its non-zero pulse amplitudes across the
frame. Thus, the parameterization consists of the critical band gains
(incorporating the normalization factors) and a series of critical band pulse
trains with normalized amplitudes. For each frame, the signal parameters
required for coding are the gains of the critical bands and the amplitudes
and positions of the pulses.
3.3.1 Pulse Amplitudes
Each critical band gain is quantized to 6 bits and the amplitude of each
pulse is quantized to 1 bit, which does not result in any perceivable
deterioration in the quality of the reconstructed speech or audio signal.
Alternatively, vector quantization can be adapted to reduce the bits required
for coding the amplitude [3].
3.3.2 Pulse Positions
The pulse positions are coded using a new run-length coding technique.
After temporal masking and thresholding, most locations on the time-
frequency map have zero pulses. This suggests that we can just code the
relative positions of neighboring pulses or the numbers of zeros between
them. Specifically, the data in all channels within one frame is concatenated
into one large vector and is scanned for pulses. Then the number of zeros
preceding each pulse is coded using 7 bits. An example is shown below
If the number of zeros is over 128, a code word of 0000000 is generated and
the counting of zeros restarts after the 128 zeros. If, during the decoding
process, seven consecutive zeros are encountered, then no pulse will be
generated and the decoding carries on to the next code word. This coding
strategy is a form of run-length coding and is lossless.
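A run-length coder of this kind can be sketched as follows. The exact codeword mapping of the coder is not fully specified in the text, so this sketch makes its own assumptions: codeword 0 is reserved as the escape for a run of 127 zeros with no pulse, and codewords 1 to 127 mean "that many positions ahead, ending in a pulse". The run length of 127 rather than 128 is chosen here only so that every case fits in 7 bits without ambiguity; all names are hypothetical.

```python
ESCAPE = 0       # reserved 7-bit codeword: a full run of zeros, no pulse
MAX_RUN = 127    # zero-run length signalled by one escape (assumed value)

def encode_positions(mask):
    """mask: sequence of 0/1 flags (1 = pulse). Returns a list of 7-bit codewords."""
    codes, zeros = [], 0
    for flag in mask:
        if flag:
            codes.append(zeros + 1)      # 1..127: `zeros` zeros, then a pulse
            zeros = 0
        else:
            zeros += 1
            if zeros == MAX_RUN:         # run too long for one codeword
                codes.append(ESCAPE)
                zeros = 0
    return codes                         # trailing zeros are implied by the length

def decode_positions(codes, length):
    """Rebuild the 0/1 pulse mask of the given length from the codewords."""
    mask = []
    for c in codes:
        if c == ESCAPE:
            mask.extend([0] * MAX_RUN)
        else:
            mask.extend([0] * (c - 1))
            mask.append(1)
    mask.extend([0] * (length - len(mask)))
    return mask
```

The position cost is 7 bits per codeword, so `7 * len(encode_positions(mask))` gives the number of bits spent on positions for one concatenated frame.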
The overall average bit rate resulting from this coding scheme is 58 kbps.
This is an improvement upon the 69.7 kbps in the previous work [3]. By
exploiting the statistical correlations and redundancy among the pulses,
Huffman or arithmetic coding can be applied to further reduce the bit rate.
The synthesis process starts with decoding to obtain the pulse train for
each channel, and then filtering the pulse train by the corresponding FIR
synthesis filter. Summing the outputs from all filters results in the
reconstructed speech or audio signal, which is perceptually the same as the
original. The results at different stages are shown in Figs. 9-12, where Fig. 9
is the original speech signal, Fig. 10 shows the pulses obtained from peak-
picking, Fig. 11 shows the pulses retained after applying auditory masking,
and Fig. 12 is the reconstructed speech.

4. CONCLUSIONS
Design techniques for a new critical band auditory filterbank that models
the psychoacoustic tuning curves have been proposed. The auditory
filterbank has been applied to speech and audio coding. The filterbank is
implemented as an IIR/FIR analysis/synthesis scheme to reduce
computation. Auditory masking is applied to reduce the number of pulses.
A simple run-length coding algorithm is used to code the positions of the
pulses. The reconstructed speech or audio signals are perceptually
transparent. The overall average bit rate resulting from this coding scheme is
58kbps. The filterbank has superior masking properties and the
auditory-system-based coding paradigm produces high quality coded speech or
audio, is highly scalable, and is of moderate complexity. Current research
involves investigating the use of Huffman coding or arithmetic coding
techniques to further reduce the bit rate by exploiting the statistical
correlation and redundancy among the pulses.
REFERENCES
[1] Ambikairajah, E., Black, N.D. and Linggard, R., “Digital filter simulation
of the basilar membrane”, Computer Speech and Language, vol. 3, 1989,
pp. 105-118.
[2] Ambikairajah, E., Davis, A.G., and Wong, W.T.K., “Auditory masking and
MPEG-1 audio compression”, Electr. & Commun. Eng. Journal, vol. 9, no. 4,
August 1997, pp. 165-197.
[3] Ambikairajah, E., Epps, J. and Lin, L., “Wideband speech and audio coding
using Gammatone filter banks”, Proc. ICASSP, 2001, pp. 773-776.
[4] Black, M. and Zeytinoglu, M., “Computationally efficient wavelet packet
coding of wide-band stereo audio signals”, Proc. ICASSP, 1995, pp. 3075-3078.
[5] Flanagan, J.L., “Models for approximating basilar membrane displacement”,
Bell Sys. Tech. J., vol. 39, 1960, pp. 1163-1191.
[6] Kobayashi, T. and Imai, A., “Design of IIR digital filter with arbitrary
log magnitude function by WLS techniques”, IEEE Trans. ASSP, vol. ASSP-38,
1990, pp. 247-252.
[7] Kubin, G. and Kleijn, W.B., “On speech coding in a perceptual domain”,
Proc. ICASSP, 1999, pp. 205-208.
[8] Liberman, M.C., “Auditory-nerve response from cats raised in a low-noise
chamber”, J. Acoust. Soc. Am., vol. 63, 1978, pp. 442-455.
[9] Lin, L., Holmes, W.H. and Ambikairajah, E., “Auditory filter bank
inversion”, Proc. ISCAS 2001, vol. 2, 2001, pp. 537-540.
[10] Lin, L., Ambikairajah, E. and Holmes, W.H., “Log-magnitude modelling of
auditory tuning curves”, Proc. ICASSP, 2001, pp. 3293-3296.
[11] Lin, L., Ambikairajah, E. and Holmes, W.H., “Auditory filterbank design
using masking curves”, Proc. EUROSPEECH 2001, pp. 411-414.
[12] Lyon, R.F., “A computational model of filtering detection and compression
in the cochlea”, Proc. ICASSP, 1982, pp. 1282-1285.
[13] Patterson, R.D., Allerhand, M., and Giguere, C., “Time-domain modelling of
peripheral auditory processing: a modular architecture and a software
platform”, J. Acoust. Soc. Am., vol. 98, 1995, pp. 1890-1894.
[14] Rhode, W.S., “Observation of the vibration of the basilar membrane of the
squirrel monkey using the Mossbauer technique”, J. Acoust. Soc. Am., vol. 49,
1971, pp. 1218-1231.
[15] Robert, A. and Eriksson, J., “A composite model of the auditory periphery
for simulating responses to complex sounds”, J. Acoust. Soc. Am., vol. 106,
1999, pp. 1852-1864.
[16] Zwicker, E. and Zwicker, U.T., “Audio engineering and psychoacoustics:
matching signals to the final receiver, the human auditory system”, J. Audio
Eng. Soc., vol. 39, no. 3, 1991, pp. 115-125.
[17] Zwicker, E. and Fastl, H., Psychoacoustics: Facts and Models,
Springer-Verlag, 1999.

Chapter 3
RECOGNITION OF ENVIRONMENTAL SOUNDS
USING SPEECH RECOGNITION TECHNIQUES
Michael Cowling and Renate Sitte
Griffith University, Gold Coast, Qld 9726, Australia
Abstract: This paper discusses the use of speech recognition techniques in non-speech
sound recognition. It analyses the different techniques used for speech
recognition and identifies those that can be used for non-speech sound
recognition. It then performs benchmarks on these techniques and determines
which technique is better suited for non-speech sound recognition. As a
comparison, it also gives results for the use of learning vector quantization
(LVQ) and artificial neural network (ANN) techniques in speech recognition.
Key words: non-speech sound recognition, environmental sound recognition, artificial
neural networks, learning vector quantization, dynamic time warping, long-
term statistics, mel-frequency cepstral coefficients, homomorphic cepstral
coefficients
1. INTRODUCTION
It has long been a goal of researchers around the world to build a
computer that displays features and characteristics similar to those of human
beings. The research of Brooks [1] is an example of developing human-like
movement in robots. However, another subset of this research is to develop
machines that have the same sensory perception as human beings. This work
finds its practical application in the wearable computer domain (e.g. certain
cases of deafness where a bionic ear (cochlear implant) cannot be used).
Humans use a variety of different senses in order to gather information
about the world around them. If we were to list the classic five human senses
in order of importance, it is generally accepted that we would come up with
the sequence: vision, hearing, touch, smell, taste.

Vision is undoubtedly the most important sense, with hearing being the
next most important, and so on. However, despite the fact that hearing is a
human being's second most important sense, it is all but ignored when trying
to build a computer that has human-like senses. The research that has been
done into computer hearing revolves around the recognition of speech, with
little research done into the recognition of non-speech environmental sounds.
This chapter expands upon the research done by the authors [2, 3]. In
these papers, a prototype system is described that recognizes 12 different
environmental sounds (as well as performing direction detection in 7
directions over a 180° arc). This system was implemented using Learning
Vector Quantization (LVQ), because LVQ is able to produce and modify its
classification vectors so that multiple sounds of a similar nature are still
considered as separate classes. However, no comparative testing was done to
ensure that LVQ was the best method for the implementation of a non-
speech sound classification system.
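The way LVQ modifies its classification vectors can be illustrated with the basic LVQ1 update rule in its textbook form. This is a generic sketch, not the authors' prototype: the codebook layout, the learning rate and the function names are assumptions.

```python
def nearest(codebook, x):
    """Index of the codebook vector closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i][0], x)))

def lvq1_step(codebook, x, label, rate=0.1):
    """One LVQ1 update. codebook is a list of [vector, class_label] entries.

    The winning vector moves toward x if its class matches `label`,
    and away from x otherwise. Returns the index of the winner.
    """
    i = nearest(codebook, x)
    vec, cls = codebook[i]
    sign = 1.0 if cls == label else -1.0
    codebook[i][0] = [c + sign * rate * (v - c) for c, v in zip(vec, x)]
    return i
```

Because several codebook vectors can carry the same class label and each one is pulled only by nearby training samples, acoustically similar sounds can keep separate reference vectors.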
Therefore, this chapter will review the various techniques that can be
used for non-speech recognition and perform benchmark tests to determine
the technique most suited for non-speech sound recognition. Due to the lack
of research into non-speech classification systems, this chapter will focus on
applying speech and speaker recognition techniques to the domain of
environmental non-speech sounds.
The remainder of this chapter will be split into four sections. The first
section will discuss techniques that have been previously used for speech
recognition and identify those techniques that could also be applied to non-
speech recognition. The second section will show the results of benchmarks
on these techniques and also compare their performance with results for
speech recognition. The third section of this chapter will discuss these
results. Finally, the fourth section will conclude and suggest areas for future
research.
2. SELECTION OF TECHNIQUES
This research began by reviewing the literature to find techniques that
had previously been used for speech/speaker recognition. Techniques for
both feature extraction and system learning were analyzed and those
techniques that could be used for non-speech sound recognition were
identified. These techniques were then benchmarked and the results are
presented in the Results section.

In addition, it was found that emerging research in speech recognition
suggests the use of time-frequency techniques such as wavelets. Due to the
emerging nature of this research, these techniques will not be included in this
comparison. However, for an insight into how wavelets can be used for
speaker recognition, please refer to the chapter in this volume by Michael
Orr et al, "A Novel Dual Adaptive Approach to Speech Processing".
A specific investigation was then performed for each of these eight
techniques. This investigation revealed that techniques based on LPC
Cepstral Coefficients were based on the idea of a vocoder, which is a
simulation of the human vocal tract. Since the human vocal tract does not
produce environmental sounds, these techniques are not appropriate for
recognition ofnon-speech sounds.
In addition, Lilly [4] mentions that the results of the Mel Frequency
Based Filter and the Bark Frequency filter are similar, mainly due to the
similar nature of these filters. Gold [5] also mentions that PLP and Mel
Frequency are similar techniques. Based on these previous findings, only the
more popular Mel Frequency technique was selected for benchmarking.
3. Recognition of Environmental Sounds 33
2.1 Feature Extraction
For feature extraction, the literature review showed that speech
recognition relies on only a few different types of feature extraction
techniques (each with several different variations). Eight techniques were
selected as possible candidates for feature extraction of non-speech sounds.
These were:
Frequency Extraction
LPC Cepstral Coefficients
Homomorphic Cepstral Coefficients
Mel Frequency Cepstral Coefficients
Mel Frequency LPC Cepstral Coefficients
Bark Frequency Cepstral Coefficients
Bark Frequency LPC Cepstral Coefficients
Perceptual Linear Prediction Features
This leaves three feature extraction techniques to be tested:
Frequency Extraction
Homomorphic Cepstral Coefficients
Mel Frequency Cepstral Coefficients

To aid in selection of techniques, comparison tables were built (using [5,
6, 7, 8]) to compare the different feature extraction and classification
methods used by each of these techniques.
The comparison tables showed that some of these techniques, by their
very nature, could not be used for non-speech sound recognition. Any of the
techniques that use subword features are not suitable for non-speech sound
identification. This is because environmental sounds lack the phonetic
structure that speech does. There is no set “alphabet” that certain slices of
non-speech sound can be split into, and therefore subword features (and the
related techniques) cannot be used.
Due to the lack of an environmental sound alphabet, the Hidden Markov
Model (HMM) based techniques shown above will be difficult to implement.
However, this technique may be revisited in the future if other techniques
produce lower than expected results.
In addition, it was decided that the SOM and LVQ techniques
compliment each other. Kohonen developed both techniques, with specific
applications intended for each technique. For classification, Kohonen
suggests the use of the LVQ technique over the SOM technique [9].
Therefore, LVQ will be the technique benchmarked.
34
2.2 System Learning
Chapter 3
Based on this information, the four techniques left to be tested are:
Dynamic Time Warping
Long-Term Statistics
Vector Quantization / Learning Vector Quantization
Artificial Neural Networks
The following system learning techniques are commonly used for
speech/speaker recognition or have, in the past, been used for this
application domain. They are:
Dynamic Time Warping (DTW)
Hidden Markov Models (HMM)
Vector Quantization (VQ) / Learning Vector Quantization (LVQ)
Self-Organizing Maps (SOM)
Ergodic-HMM's
Artificial Neural Networks (ANN)
Long-Term Statistics

3. ANALYTICAL ANALYSIS OF SPEECH RECOGNITION TECHNIQUES
This section will detail how each of the techniques listed above was
implemented in this system. It will also discuss the details of the experiment
(such as the number of sounds).
The techniques will be tested using a jackknife method, identical to the
method used by Goldhor [10]. A jackknife testing procedure involves
training the network with all of the data except the sound that will be tested.
This sound is then tested against the network and the classification is
recorded. In cases where the setting of initial weights may affect the
classification result (as is the case with LVQ and ANN techniques),
classification is repeated 5 times, with different initializations each time. A
correct classification is only recorded if more than three of the training runs
are correct. This jackknife procedure will be repeated with all six of the
samples from each of the eight sounds.
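The jackknife procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `train` and `predict` callables are hypothetical placeholders for whichever feature extraction and system learning pair is being benchmarked, and the `n_runs` and `min_correct` parameters are names introduced here to express the "repeat 5 times, accept if more than three runs are correct" rule.

```python
def jackknife_accuracy(samples, labels, train, predict, n_runs=1, min_correct=1):
    """Leave-one-out test: train on everything except sample i, then classify i.

    samples, labels : Python lists of equal length
    train(xs, ys, seed) -> model      (hypothetical callable)
    predict(model, x)  -> label       (hypothetical callable)
    n_runs      : repetitions with different initialisations (5 for LVQ/ANN)
    min_correct : runs that must be correct to count a hit (4 = "more than three")
    """
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]   # hold out sample i
        train_y = labels[:i] + labels[i + 1:]
        correct_runs = 0
        for seed in range(n_runs):
            model = train(train_x, train_y, seed)
            if predict(model, samples[i]) == labels[i]:
                correct_runs += 1
        if correct_runs >= min_correct:
            hits += 1
    return hits / len(samples)


# Toy usage: a 1-nearest-neighbour "learner" on scalar features.
def nn_train(xs, ys, seed):
    return list(zip(xs, ys))          # the model is just the stored examples

def nn_predict(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]

toy_x = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
toy_y = ["a", "a", "a", "b", "b", "b"]
toy_rate = jackknife_accuracy(toy_x, toy_y, nn_train, nn_predict)
```

For the initialisation-sensitive learners, the same function would be called with `n_runs=5, min_correct=4`.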
3.1 Experiment Setup
As an initial test, eight sounds were used, each with six different samples.
Data set size was kept as small as possible due to the time it takes to train
larger data sets. The sounds used for this test are detailed below and are
some typical sounds that would be classified in a sound surveillance system.
3.2 Benchmarking Method
The feature extraction and system learning techniques shown in the
comparison will be tested for their ability to classify non-speech sounds in
two ways. First, benchmarking will be performed, using these techniques, on
non-speech sounds and data on the parameters, the resulting time taken and
the final correct classification rate will be recorded. Then, these results will
be compared with statistics and benchmark results reported in the literature
for the performance of these techniques on speech. This will demonstrate
how these techniques perform against each other on speech and provide a
comparison to the results for non-speech.
In addition, since feature extraction and system learning are both required
to recognize a sound, each system learning technique should also be tested
against each feature extraction technique to determine the best combination
of these two techniques. The exception to this is the Long-Term Statistics
technique, which generates its own features and therefore requires no feature
extraction techniques. Therefore, ten combinations of techniques must be
benchmarked:
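The count of ten follows from pairing each of the three feature extraction techniques with each of the three system learning techniques that need features, plus Long-Term Statistics on its own. The short Python sketch below simply enumerates those pairings from the lists given earlier in the chapter; the ordering is an assumption made for illustration.

```python
from itertools import product

FEATURE_EXTRACTION = [
    "Frequency Extraction",
    "Homomorphic Cepstral Coefficients",
    "Mel Frequency Cepstral Coefficients",
]
# System learning techniques that require a separate feature extraction step.
SYSTEM_LEARNING = [
    "Dynamic Time Warping",
    "Learning Vector Quantization",
    "Artificial Neural Networks",
]

# 3 x 3 pairings, plus Long-Term Statistics, which generates its own features.
combinations = list(product(FEATURE_EXTRACTION, SYSTEM_LEARNING))
combinations.append((None, "Long-Term Statistics"))

for features, learner in combinations:
    print(f"{learner} with {features or 'its own features'}")
```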
3.3 Methodology
Each of the techniques used was implemented in MATLAB. Both feature
extraction and system learning techniques were implemented and then
combined together in the way shown above in order to perform a
comprehensive comparison. In this section, the implementation of both the
feature extraction and system learning techniques will be discussed.
3.3.1 Feature Extraction Techniques
Three feature extraction techniques will be tested in this comparison. The
implementation of each of these techniques will be discussed in this section.
3.3.1.1 Frequency Extraction
Frequency Extraction was performed using the Fast Fourier Transform
(FFT) routine in MATLAB, which uses the following equation for FFT:

$$X(k) = \sum_{j=1}^{N} x(j)\, \omega_N^{(j-1)(k-1)}$$

with

$$\omega_N = e^{-2\pi i / N}$$

where k is the frequency we wish to check for, j counts all the samples in
the signal and N is the length of the signal being tested. Since non-speech
sound covers a wider frequency range than speech (anywhere from 0 Hz to
20,050 Hz, the approximate limit of human hearing), a 44,100 point FFT (N
= 44100) was performed and the results (22,050 unique features) were used
to train the system learning network.
3.3.1.2 Mel-Frequency Cepstral Coefficients
The MFCC algorithm was taken from the Auditory Toolbox by Malcolm
Slaney of Interval Research Corporation [11]. This toolbox is in wide use in
the research community. This toolbox applies three steps to produce the
MFCC. First, it applies a Hamming window using the standard Hamming
window equation:

$$w(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N - 1}\right)$$

where n indexes the samples in the subset of the signal which is being
windowed and N is the number of samples in that subset. A Melody
Frequency Filterbank is then applied to each windowed segment.
The melody frequency filter bank m is a logarithmic calculation using the
following relation:

where f represents the range of frequencies in the signal. Each filter is then
multiplied by the spectrum (or portion of the spectrum if it has been split
using Hamming windows) to produce a series of magnitude values (one for
each filter). Finally, a Cepstral Coefficient formula (shown in the next
section) is applied to produce MFCC and these features are then modified
into a vector that is more appropriate for training a network. Special
attention was paid to removing the first scalar within the vector, which
represents the total signal power [5] and is therefore too sensitive to the
amplitude of the signal [4].
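The three MFCC steps (Hamming window, mel filter bank, cepstral transform) can be illustrated with a minimal NumPy sketch. It is not the Auditory Toolbox implementation: the mel mapping 2595·log10(1 + f/700), the triangular filter shape, the DCT used as the cepstral transform, and the parameter defaults are all common textbook choices assumed here for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # Assumed mel mapping (a widely used form, not taken from the chapter).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = mel_to_hz(np.linspace(0.0, top, n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCC of one frame, with the first (power-related) coefficient removed."""
    n = frame.size
    windowed = frame * np.hamming(n)                   # step 1: Hamming window
    spectrum = np.abs(np.fft.rfft(windowed))
    energies = mel_filterbank(n_filters, n, sample_rate) @ spectrum  # step 2
    log_energies = np.log(energies + 1e-10)            # small offset avoids log(0)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * m + 1) / (2 * n_filters))  # step 3: DCT-II
    return (dct @ log_energies)[1:]                    # drop coefficient 0

# Toy usage: one 256-sample frame of a 1 kHz tone sampled at 8 kHz.
t = np.arange(256) / 8000.0
example_mfcc = mfcc_frame(np.sin(2 * np.pi * 1000.0 * t), 8000)
```

Dropping coefficient 0 mirrors the removal of the first scalar described above, since that coefficient tracks overall signal level.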
3.3.1.3 Homomorphic Cepstral Coefficients
The MFCC algorithm from the Auditory Toolbox by Malcolm Slaney of
Interval Research Corporation [11] was then used as a basis to implement a
Homomorphic Cepstral Coefficient (HCC) algorithm. This algorithm was
written from scratch but based on information from the source code in the
MFCC algorithm.
The HCC algorithm applies the cepstral coefficient formula directly to
the signal after it has been split using Hamming windows. To calculate
cepstral coefficients we use the following relation:
where
and n is the length of the windowed segment being manipulated. These
features were then modified into a vector that was more appropriate for
training a network. As with the MFCC, special attention was paid to
removing the first scalar within the vector, which represents the total signal
power [5] and is therefore too sensitive to the amplitude of the signal [4].
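A homomorphic cepstrum of this kind is commonly computed as the inverse Fourier transform of the log magnitude spectrum. The NumPy sketch below uses that common definition as an assumption; it is not the authors' MATLAB code, and the function name and `n_coeffs` default are illustrative. It also shows why the first coefficient is dropped: scaling the signal changes only that coefficient.

```python
import numpy as np

def real_cepstrum(frame, n_coeffs=13):
    """Real cepstrum of one frame, without coefficient 0 (overall signal level)."""
    windowed = frame * np.hamming(frame.size)
    log_magnitude = np.log(np.abs(np.fft.fft(windowed)) + 1e-10)  # avoid log(0)
    cepstrum = np.real(np.fft.ifft(log_magnitude))
    return cepstrum[1:n_coeffs]

# Toy usage: the same noise frame at two amplitudes gives the same features.
rng = np.random.default_rng(0)
noise = rng.standard_normal(256)
quiet_features = real_cepstrum(noise)
loud_features = real_cepstrum(10.0 * noise)
```

Multiplying the signal by a constant adds the same constant to every log magnitude value, which the inverse transform places entirely in coefficient 0.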
3.3.2 System Learning Techniques
Four system-learning techniques will be tested in this comparison. The
implementation of each of these techniques will be discussed in this section.

3.3.2.1 Learning Vector Quantization
Learning vector quantization (LVQ) was implemented using the inbuilt
LVQ routines in MATLAB’s neural network toolbox. The network was
initialized with 20 competitive neurons and a learning rate of 0.05. This
combination was found to give an acceptable classification rate.
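To make the LVQ configuration concrete, here is a minimal NumPy sketch of the basic LVQ1 update rule with 20 prototype vectors and a learning rate of 0.05. It is an assumption-laden stand-in, not the MATLAB toolbox routine: the class-balanced initialisation, the epoch count and the function names are chosen here for illustration.

```python
import numpy as np

def train_lvq1(x, y, n_prototypes=20, learning_rate=0.05, epochs=50, seed=0):
    """Basic LVQ1: pull the winning prototype towards a sample of its own
    class and push it away from a sample of a different class."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    protos, proto_labels = [], []
    for j in range(n_prototypes):
        # Assumed initialisation: cycle through classes, copy a random member.
        c = classes[j % len(classes)]
        members = x[y == c]
        protos.append(members[rng.integers(len(members))])
        proto_labels.append(c)
    protos = np.array(protos, dtype=float)
    proto_labels = np.array(proto_labels)
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            winner = np.argmin(np.linalg.norm(protos - x[i], axis=1))
            sign = 1.0 if proto_labels[winner] == y[i] else -1.0
            protos[winner] += sign * learning_rate * (x[i] - protos[winner])
    return protos, proto_labels

def predict_lvq(model, sample):
    """Label of the prototype closest to the sample."""
    protos, proto_labels = model
    distances = np.linalg.norm(protos - np.asarray(sample, dtype=float), axis=1)
    return proto_labels[np.argmin(distances)]

# Toy usage: two well-separated clusters of 2-D feature vectors.
toy_x = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_y = [0, 0, 0, 1, 1, 1]
lvq_model = train_lvq1(toy_x, toy_y)
```

Different `seed` values give the different initialisations that the jackknife procedure repeats over.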
3.3.2.2 Artificial Neural Networks
Artificial neural network (ANN) was implemented using the fast back
propagation algorithm (BPA) in the MATLAB neural network toolbox
(trainbpx). The network was initialized with 20 hidden neurons and a
learning rate of 0.05. In addition, sum-squared error was set to 0.1 and the
momentum constant was set to 0.95.
3.3.2.3 Dynamic Time Warping
Dynamic time warping (DTW) was implemented using the algorithm in
the Auditory Toolbox developed by Malcolm Slaney [11]. The test signal
was warped against each of the reference signals and the error was recorded.
The smallest error was taken to represent the closest class of sound.
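The warp-and-compare step can be sketched with the classic dynamic programming recurrence. This is a plain textbook DTW on one-dimensional sequences, written here as an assumption; the Auditory Toolbox routine may differ in its local distance and path constraints, and the function names are illustrative.

```python
import numpy as np

def dtw_distance(a, b):
    """Accumulated cost of the best alignment between sequences a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i, j] = local + min(cost[i - 1, j],
                                     cost[i, j - 1],
                                     cost[i - 1, j - 1])
    return cost[n, m]

def classify_dtw(test, references):
    """references is a list of (label, sequence); the smallest error wins."""
    return min(references, key=lambda ref: dtw_distance(test, ref[1]))[0]

# Toy usage: a stretched rising ramp still matches the rising reference.
toy_refs = [("rise", [0, 1, 2, 3]), ("fall", [3, 2, 1, 0])]
toy_label = classify_dtw([0, 0, 1, 2, 3, 3], toy_refs)
```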
3.3.2.4 Long-Term Statistics
Long-Term Statistics (LTS) was implemented using the mean and
covariance functions available in the standard MATLAB distribution, where
N is the length of the signal x. Mean and covariance were calculated for each
of the reference signals and stored in a matrix. The mean and covariance of
the test signal were then compared to this matrix. The closest match was
selected as the correct class. If the closest mean and covariance occurred in
different classes, the test was deemed inconclusive.
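The matching rule for Long-Term Statistics can be sketched as follows. The sketch assumes one-dimensional signals, for which the covariance reduces to the variance, and uses absolute difference as the closeness measure; both are assumptions for illustration, as are the function names.

```python
import numpy as np

def signal_stats(x):
    """Mean and variance of a 1-D signal (ddof=1, as MATLAB's cov does for a vector)."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x)), float(np.var(x, ddof=1))

def classify_lts(test, references):
    """references is a list of (label, signal).

    Returns the label whose mean and variance are both closest to the test
    signal, or None when the two statistics point to different classes."""
    t_mean, t_var = signal_stats(test)
    stats = [(label,) + signal_stats(sig) for label, sig in references]
    by_mean = min(stats, key=lambda s: abs(s[1] - t_mean))[0]
    by_var = min(stats, key=lambda s: abs(s[2] - t_var))[0]
    return by_mean if by_mean == by_var else None   # None = inconclusive

# Toy usage with two reference classes.
toy_refs = [("quiet", [0.0, 0.1, -0.1, 0.0]), ("loud", [5.0, 7.0, 3.0, 5.0])]
agree = classify_lts([0.05, -0.05, 0.0, 0.1], toy_refs)
disagree = classify_lts([5.0, 5.01, 4.99, 5.0], toy_refs)
```

In the second call the mean is closest to "loud" while the variance is closest to "quiet", so the result is inconclusive.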
4. RESULTS & DISCUSSION
This section will cover the results of this research. Results are shown for
the comparative study of existing speech recognition techniques when these
techniques are applied to non-speech. In addition, a discussion is given on
these results.

4.1 Results
4.1.1 Non-Speech Sound Recognition
Results for non-speech sound recognition are presented below.

4.1.2 Speech Recognition
For comparison, results were found for LVQ and ANN in speech
recognition systems. These results are presented here. Due to the current
popularity of HMM methods in speech recognition, results for DTW are
difficult to find; therefore no DTW results are presented.
For ANNs, a selection of results from Castro and Perez [12] is shown
below. Their results were taken on an isolated word recognition set with
typically high classification error, the Spanish EE-set. The Multi-Layer
Perceptron (MLP) tested used the back propagation algorithm, contained 20
hidden neurons and was trained over 2000 epochs with various numbers of
inputs. The figures given are the MLP’s estimated error rate with a 95%
confidence interval.
For LVQ, results from Van de Wouver et al. [13] are shown below for
both female and male voices. These results present statistics for both a
standard LVQ implementation for speech recognition and an implementation
of LVQ that then has fuzzy logic performed on it (FILVQ). As can be seen
from the results, the use of LVQ for speech recognition produces rather low
recognition results.

Other documents randomly have
different content

THE SPIRIT LEVEL
This is necessary on outdoor structures which are to be placed on
foundations, in securing level or horizontal timbers, and in plumbing
the uprights. The human eye is not equal to the task. Masons and
builders make use of wooden plumb rods, but as the level is
necessary to secure the horizontals, it will be at hand for the
uprights, the two glass tubes being at right angles. (Fig. 131.)

Fig. 131. The spirit level
RULE
A two-foot, four-fold, boxwood rule, graduated to eighths outside
and sixteenths inside, will answer all ordinary requirements. (Fig.
132.)
THE STEEL SQUARE
Fig. 132. Steel square and rule
This simple but valuable tool, about which volumes have been
written, is necessary for building construction, but is not needed in
the making of furniture or cabinet work.

Fig. 133. The nail box
Fig. 134. Socket chisels
XXIII
MAKING NAIL BOXES
The boys now became very busy completing their shop equipment,
and the first project was a box for holding different sizes of nails.
This was to be kept on the bench where it could be reached
conveniently, and it is shown in Fig. 133.
After studying the sketch, Harry made out the bill of material:
2 pcs. pine 15 × 1 3⁄4 × 1⁄2
2 pcs. pine 3 × 1 3⁄4 × 1⁄2
2 pcs. pine 3 1⁄2 × 1 3⁄4 × 3⁄8
These six pieces were squared up, and the joints for the two partitions
laid out by placing them edge to edge in the vise. Pencil lines were
drawn across the faces at random, a. Ralph explained that by fitting
these pencil lines they could at any time bring the two pieces together
in the original position.
The four knife lines representing the edges of the grooves were next
drawn, and squared half-way down on each edge, using the face with the
pencil lines as a working face. The bottom of the groove was laid off
with the marking gauge set at 1⁄4 inch. The wood inside the lines was
removed by making a saw cut just inside the knife lines, and cutting out
with a 3⁄8-inch chisel.
This led to a talk on chisels. Ralph explained that for fine work a
"firmer" chisel was used, having a comparatively thin body.
There are two kinds of handles, known as "socket" and "tang." The
chisels having "tangs" should never be hammered, as the tang acts
as a wedge and splits the handle. Where blows are to be struck with
the mallet, a socket handle should be used. (Fig. 134.) For heavy
work, where hard blows are to be struck, as in house-framing, and
out-of-door work generally, the heavy framing tool should be used.
The handle of this chisel has a heavy iron ring near the top to keep
it from going to pieces.
Our boys' equipment at this time consisted of one half-inch and a
one-inch firmer chisel with tang handles, a 1⁄8-inch and 3⁄8-inch
socket firmer, and one 1⁄2-inch framing chisel. Later on they added a
1⁄4-inch firmer with tang handle.
The grooves for the nail box were cut with the 3⁄8-inch chisel
without the aid of the mallet.
Ralph showed how, by inclining the tool at a slight angle, a paring
action could be obtained, and by working from both ends of the
groove no corners were destroyed.
When the four grooves were finished, the box was ready for
assembling. This called for hammer and nails.
Wire nails are so cheap now that the old-fashioned cut nails have
been largely driven from the market.
The nails used on the box were one-inch brads.
The holding power of flat-head nails is of course much greater than
bung head, but in this case the box was to be squared up after
nailing, exactly as if it were a solid block of wood. This meant
planing the sides and ends, and as the nails would ruin the plane
iron, they were all sunk below the surface with a nail set or punch.
(Fig. 135). This is a useful tool, but not absolutely necessary, as for
light work a wire nail, with the point ground flat on the grindstone,
will answer the same purpose. A carpenter frequently uses the edge
of a flat-head nail instead of the punch.
Photograph by Arthur G. Eldredge
The Correct Way to Hold the Chisel.

Fig. 135. Wire nails and nail sets
Fig. 135a. Wire nails and nail sets
The box was assembled by nailing together the sides and ends. The
bottom was next put on, holding the try square along one side and
end to make sure everything was square, and last of all the two
partitions were pushed down into their grooves, and tied in place by
one brad from each side. Next, all nails were set, and the outside
tested with the try square and trued up with the plane.
The cabinet of drawers shown in Fig. 136 was next designed to keep the
assortment of screws and nails, which the boys knew would soon
accumulate. As far as possible, they were kept in their original paper
boxes, on which the sizes were plainly printed.
The twelve drawers were simply boxes without covers or partitions, and
Ralph suggested that it was not necessary to make them all at once, but
that they could often fill in spare time that way, and gradually
complete the dozen.

Fig. 136. Cabinet for nails and screws
After making the nail box with partitions, this was a simple job, it
being only important that they all be of the same size.
The construction of the cabinet, however, brought new problems.
The shelves, being short, did not require any vertical support except
at the ends, where they were gained into the sides, and to give
Harry practice the top and bottom were to be "rabbeted" into the
sides. The sides then were the most important parts. All six pieces
were first squared up to the dimensions called for in the drawing.
The list of material was as follows:
4 pcs. 24 5⁄8 × 12 × 1⁄2 shelves
2 pcs. 14 × 12 × 1⁄2 ends
1 pc. 25 1⁄8 × 14 × 1⁄4 back

"The grain must run the long way," said Ralph, "so the grooves will
be across the grain."
The four grooves were laid out with knife and try square, and the
lines scored as deeply with the knife as possible.
Then another cut was made with the knife inside of the first, and
with the knife held at about 45 degrees, cutting out a V-shaped
groove, as shown at a.
In each of these grooves a cut with the back saw was made down to
the line, and the wood removed with the 3⁄8-inch chisel. There are
special planes, called rabbet planes, and plows for doing this kind of
work, but it is good practice for beginners to use the chisel.
The grooves finished, the cabinet was put together with 1 1⁄2-inch
brads, except the back. This being of thin material, and having no
special strain on it, was nailed on with 1-inch brads. The total width
of the drawers in each tier was 1⁄8 inch less than the space. This
gave clearance, so that they could be moved in or out easily.
Later, when all twelve drawers were finished, the boys bought a
dozen simple drawer pulls, and screwed one in the centre of each
box.
The centre was found by drawing the diagonals in light pencil lines.
The front and ends were sand-papered, and given two coats of dark-
green stain, and the cabinet was placed on a shelf against the wall.

XXIV
BIRD HOUSES
The boys felt that they were ready for business, and Ralph
suggested that they had provided enough weather vanes and
windmills, but had made no provisions for the birds.
The cat, that arch enemy of the native birds, had driven the robins,
martins, and wrens all away. Each year some of these brave little
birds started homes in the trees near the house only to have their
families devoured as soon as they were hatched.
A bird house to be attractive need not be very pretentious, but it
must absolutely be cat-proof, or the birds will inspect it carefully
from all points of view and leave it severely alone. A nest well
hidden in the tree foliage or shrubbery is not nearly so conspicuous
as a brightly painted house fastened to the limbs of a tree. The side
of a barn or outhouse, far enough down from the roof so that the
cat cannot reach it, or a tall pole covered on the upper part with tin,
so that the feline bird hunter cannot gain a foothold, are about the
only safe places for a house which the birds will actually adopt. The
first house our woodworkers manufactured is shown in Fig. 137.
This was a single or one-family house, and its construction was very
simple.
The list of material follows:
One pc. 1⁄2-inch pine or white wood 10 × 6 1⁄2 ins.
Two pcs. 1⁄2-inch pine or white wood 7 1⁄2 × 3 ins.
One pc. 1⁄2-inch pine or white wood 9 1⁄2 × 5 ins.
One pc. 1⁄2-inch pine or white wood 9 1⁄2 × 4 1⁄2 ins.
Two pcs. 1⁄2-inch pine or white wood 5 1⁄4 × 4 1⁄2 ins.

The first piece, 10 × 6 1⁄2 inches, was simply squared up for the
bottom. The two pieces for the sides, 7 1⁄2 × 3 inches, were squared
up, and one edge of each planed to a 45-degree bevel, to engage
with the roof boards.
The latter were squared up, and nailed together at right angles with
1 1⁄4-inch brads.
The two ends, 5 1⁄2 × 4 1⁄2 inches, were carefully laid out as shown
in the drawing, sawed, and planed to the lines with square edges.
In the end which was to contain the circular door a hole 1 3⁄4 inches
in diameter was bored with its centre two inches from the bottom
line. This required the services of the extension bit, and, to avoid
splitting the wood, as soon as the spur of the bit showed on the
further side, the wood was turned about, and the hole finished from
the other side.
The house was next turned upside down, and fastened in the bench
vise. Holes were drilled along the sides of the bottom piece 3⁄4 inch
in from the edge—three on each side—countersunk, and the piece
fastened to the sides with 1-inch No. 8 screws. The top pieces
already nailed together were now nailed in position on the sides and
ends with 1-inch brads.

Fig. 137. One family bird house, and house for high-hole
The pole they used was 13 feet long and about 3 inches in diameter
at the small end. It was rounded at this end by using a draw knife.
(Fig. 138). A block of 7⁄8-inch pine was bored out, and fitted snugly
over the end of the pole. This block was then removed, and four
holes bored through it for screws.

Fig. 138. The draw knife
Before replacing the block on the top of the pole a cut was made
across the end of the pole about two inches deep, by means of the
rip saw.
The block was replaced, and wooden wedges driven into the saw
cut. This fastened the block securely on the end of the pole, and
after making sure that it was level, the bird house was fastened to
the block by four 1 1⁄4-inch screws from the under side.
A piece of sheet tin was wound around just under the house to
discourage pussy, and the pole set into the ground about three feet,
bringing the under side of the house ten feet above the ground.
A double or two-family house of similar proportions was built next,
as shown in Fig. 139. The list of material called for:
One pc. 1⁄2-inch wood 18 1⁄2 × 6 1⁄2 (bottom)
One pc. 1⁄2-inch wood 18 1⁄2 × 5 1⁄2 (roof)
One pc. 1⁄2-inch wood 18 1⁄2 × 4 1⁄2 (roof)
Two pcs. 1⁄2-inch wood 15 1⁄2 × 3 (sides)
Three pcs. 1⁄2-inch wood 5 1⁄4 × 4 1⁄2 (ends and partition)
The construction was the same as before, each end having a door,
and the partition of course being solid. The block for supporting the
house on the pole was larger, being 8 × 5 × 1 1⁄4 inches, and called
for six 1 1⁄2-inch No. 10 screws, to secure it to the under side of the
floor. Harry wanted to make it more complete by adding a small
wind vane, but Ralph said it might frighten the birds, so it was
omitted.

Of course larger and more ornamental houses may be built, but
where there are too many families in such close proximity there is
apt to be trouble, while houses that are too conspicuous do not
appeal to the beautiful American wild birds that we want to attract.
With the English sparrow it does not matter so much. For these
birds, a tenement house against the side of a barn may be built
easily, in the form shown in Fig. 139.
This may be made any length, each door leading to a compartment
separated from the others by partitions. Make as many pieces plus
one as there are to be compartments, apartments, or flats; have the
bottom project as shown in side view for a perch and walk, and have
the roof also project to shed rain.
If not fastened from the inside of the barn by stout screws, this
house must be secured to a shelf, or by brackets.
Fig. 139. Two family house and tenement
The side view shows a simple shelf made of a back piece secured to
the side of the barn by screws or nails, a plain shelf nailed to this
back piece, and two wooden brackets. If iron brackets are used,
both the shelf and back piece may be omitted, the brackets being
fastened to the under side of the bird house and to the siding of the
barn by screws.

Fig. 140. The bird bath
For birds like the high-hole, or flicker, a piece of hollow log, or an
elongated box fastened securely to the side of a pole, made cat
proof, is very acceptable. This should not be painted, but should be
provided with a door on the side and a perch. (Fig. 137.) The
opening should be about three inches for these large birds, and the
location should be as secluded as possible. Any number of devices
will suggest themselves, but always remember the cat, and study
the location from the bird point of view. The martins and swallows
are especially to be encouraged, as they are wonderful destroyers of
insects.
One device, especially grateful to these feathered friends in hot
weather, is a pan of water, in a place where they can drink and bathe
without being eternally on the watch for that crouching enemy, who is
always stalking them—Tabby.
A pedestal with a platform about four feet above the ground will do
nicely, and it can be placed so close to the house that you can watch
them, and enjoy their ablutions almost as much as they do. (Fig. 140.)
The construction is too simple to require an explanation.

XXV
SIMPLE ARTICLES FOR HOUSEHOLD USE
The boys thought it was about time to pay some attention to the
wants of the family, who had been clamouring for weeks to have this
article or that for the kitchen, dining room, and in fact for every part
of the house.
Ralph was a wise teacher, however. He knew that ninety out of every
hundred failures were due to the young mechanic's trying some problem
too far advanced.
It seems strange that people cannot learn this lesson. We have seen
hundreds of boys led along, say in carving, from one simple lesson
to another, until at the end of five or six carefully graded exercises,
these boys could carve beautifully any design given them.
On the other hand, we have seen boys start in on their own hook,
without any direction from older people, and ruin everything they
tried, simply because they wanted to do the most difficult thing first,
before they had developed any skill.
Ralph was determined that his boy should be an expert and
successful user of tools, so he paid no attention to the clamours of
the family, and allowed Harry to make only those things which were
within his power to do well. Each time a piece of work was finished,
and inspected by the family, the universal chorus was something like
this:
"Well, if he can make such a fine bird house, I don't see why he
can't make half a dozen picture frames for these water colors," or,
"If he can make such a fine pen tray, I don't see why he can't make
a new stool for the piano!"

In vain Ralph explained that these things could be made in due time,
that a picture frame required much more skill than a bird house, etc.
Their household articles commenced with a bread board for the
kitchen. (Fig. 141). This gave Harry his first experience in planing a
broad surface. He used jack and smoothing planes for the working
face, and squared the rest of the board as he had smaller pieces.
This required some time. The wood about the semi-circular top was
removed with saw and chisel, the board held for the chiselling flat on
the bench hook. After getting this curve as true as possible with the
chisel, it was finished with a sand-paper block. A 1⁄2-inch hole was
bored at the centre of the semi-circle to hang it up by, and the two
lower corners were rounded with chisel and sand-paper. No sand-
paper was used on the flat surface, as Ralph explained this was a
board for cutting bread, and the grit from the sand-paper would
become more or less embedded in the wood, and it would spoil the
bread knife. Sand-paper is made of ground quartz, and it soon dulls
the edge of a cutting tool.
Fig. 141. The bread board

The knife and fork box (Fig. 142) brought new problems. The list of
material was:
1 pc. 11 1⁄2 × 3 1⁄4 × 1⁄2
2 pcs. 14 × 1 1⁄2 × 1⁄2
2 pcs. 7 × 1 1⁄2 × 1⁄2
1 pc. 12 × 6 1⁄2 × 1⁄4
It was made of white wood, and, after being assembled, was stained
a rich brown by receiving two coats of bichromate of potash. This is
a chemical, which may be bought at a paint or drug store in the
form of crystals. These are dissolved in water, until the solution looks
like pink lemonade. It can be applied with a brush, but each coat
must be allowed to dry completely before the whole is sand-papered
smooth with No. 0 sand-paper. A deeper brown can be obtained by
adding one or two extra coats of stain.
The middle partition containing the handle was made first. The
drawing was laid out on the wood after it had been squared up, and
two holes 1 inch in diameter were bored out at a a. The wood
between was taken out with a key-hole saw, and finished to the line
with chisel and knife. A turning saw can be used to advantage on
this handle, but it is not absolutely necessary. Spaces b b were
removed in the same way, but a knife was used in the concave part
of the curve. If it is handy, a small spokeshave can be employed on
the whole upper line of this handle.
Anything in the nature of a handle should be rounded to fit the
hand. Edges c c were therefore rounded with the knife, and finished
with coarse, followed by fine, sand-paper.
The two sides were laid out together as in the nail box, and the
groove cut with back saw and 1⁄8-inch chisel.
The end pieces were made in a similar manner, and the bottom
piece squared to 1⁄16-inch of finished size. The assembling consisted
of first gluing together the sides and ends. Two hand screws were
used to hold them. This was Harry's first attempt at using hand
screws, and Ralph showed him the importance of keeping the jaws
parallel.

Fig. 142. Method of using hand screws in the construction of a knife box
The box remained in the hand screws over night, and the next day it was
found to be securely fastened. The most convenient kind of glue for
boys is the liquid sold in cans. It is always ready for use, and very
handy where only a moderate quantity is needed. Dry glue in the form of
flakes, or granulated, must be soaked over night, and then heated in a
pot having a double bottom with water in the lower part.
It should be put on hot with a brush or a small flat stick. The best
glue is none too good, yet a good quality has wonderful holding power
and should last indefinitely.
After removing the hand screws, the unfinished box was placed in the
vise, tested with the edge of the plane, and made perfectly true, top
and bottom.
The 1⁄4-inch bottom piece was now put on with one-inch brads, the sides
and ends made square, the handle partition slipped into the grooves,
and fastened with two brads at each end.
This knife box was so satisfactory that our young carpenters
resolved to have a large one for tools. Whenever they had a job to
do in the house, they were constantly running out to the shop for
something, so that a tool box became a necessity.
The construction was similar to the knife box; but this was larger
and heavier, and the dado joints at the ends were replaced by a butt
joint fastened with flat-head screws (Fig. 143). The bottom and
partition were also put on with screws, on account of the weight to
be carried.

Fig. 143. Tool box
Fig. 144. Another tool box
These tool boxes are frequently made in the shape shown in Fig. 144,
with sloping sides and ends meeting in what is called the hopper
joint; but aside
from the tool practice it affords, it is doubtful if the shape has
advantage enough over the other form to warrant the extra time it
takes. Man is an imitative creature, however, and what one
carpenter has, the others copy.
The principal features about this useful article should be size and
strength, especially in the handle, which should be of about 5⁄8 or
3⁄4 inch stock.

XXVI
THE MITRE BOX AND PICTURE FRAMES
It seemed to Harry that the shop was fairly well equipped, but Ralph
insisted that they must have a mitre box before making anything
else for the house.
The mitre box is, or should be, an instrument of precision, and
although simple in construction, must be perfectly accurate, or it is
useless. (Fig. 145.)
The illustration shows the common form, but elaborate affairs of iron
and wood can be bought ready made. Every boy should make his
own, for the practice, if for nothing else. The sides should be made
of oak 7⁄8 inch thick, 18 inches long, and 3 1⁄2 inches high, the
bottom of 7⁄8-inch pine or other soft wood, the same size.
When squared up, the two sides must be tested by standing them
side by side; then reverse one end for end, to see if they are alike. If
not, find where the trouble is, and correct it.
It is especially important that the edges of the bottom piece be
square and the sides perfectly parallel. This test can be made with
the marking gauge. Sides are fastened on by boring and
countersinking for three screws on each. After assembling, the
whole thing must be tested as if it were a solid block. Top edges
must be true and parallel.

Fig. 145. The 45° mitre box and test pieces
Near one end—about two inches in—lay out across the top with try
square a line 90 degrees with the sides. Carry the line down each
side, square with the top edges. For 45-degree angles, lay out a
square by drawing two pencil lines across the top, as far apart as the
finished mitre box is wide. Draw the two diagonals and square lines
from their ends down both sides, taking care that their position is
not over the screw in the bottom; because as the saw cuts deeper it
may reach this screw and ruin its teeth.
Make the three saw cuts directly on the lines laid out with a cross
cut or back saw, with the utmost care. If this is not done accurately,
all the labour of preparation is wasted. The blank end of the mitre
box may have an additional 90-degree cut, or be left for new cuts in
the future, as a mitre box of this description wears out and becomes
inaccurate.
Other angles may be used, as 60 degrees or 30 degrees, but it is
better to have these on another box as they are used less, and for
special purposes. (Fig. 146.)
The mitre box is not ready to use until it has been thoroughly tested.
Prepare a strip of soft wood (pine or white wood) 1 1⁄2 inches wide
and 1⁄2 inch thick. Cut four pieces from it on the mitre box, using the
back saw as shown at a, with only one of the slits. Place these four
triangular pieces together to form a square. All the four mitre joints
of this square must fit perfectly. If they do not, mark the slit "N. G.,"
and test the other slit in the same way. If all right, mark "O. K." It
often happens that one may be perfect and the other inaccurate. If
they are both O. K., the box is ready for use. If one slit is useless,
lay out and cut another on the blank end of the mitre box in the
same direction, and test again.
In testing a 30-degree cut three pieces of the strip should be sawed
out, and when placed together they should form a perfect equilateral
triangle, while from a 60-degree cut, six pieces are needed to form a
hexagon.

Fig. 146. 30-60-90 mitre box
These angles are valuable in inlaid
work, and for getting out geometrical
designs.
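The rule behind these test pieces can be put into a few lines of
arithmetic. The sketch below is not part of the original text. It
assumes the test figure is a regular polygon and that the angle is
measured between the saw cut and the length of the strip, as with the
45-degree cut for a square. The name mitre_angle is only illustrative.

```python
def mitre_angle(sides: int) -> float:
    """Return the angle, in degrees, between the saw cut and the length
    of the strip for a regular figure with the given number of sides.

    Assumption: the two pieces meeting at a corner share that corner's
    interior angle equally, so each is cut to half of it.
    """
    if sides < 3:
        raise ValueError("a closed figure needs at least three sides")
    interior = (sides - 2) * 180.0 / sides  # interior angle of the polygon
    return interior / 2.0


if __name__ == "__main__":
    # Triangle, square, and hexagon, as in the tests described above.
    for sides in (3, 4, 6):
        print(f"{sides} pieces: {mitre_angle(sides):.0f}-degree cut")
```

Run as written, this gives a 30-degree cut for three pieces, 45 for
four, and 60 for six, which agrees with the test figures described
above.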
The 45-degree cut is indispensable in
making the mitred corners of picture
frames and in cabinet work.
In making picture frames of simple
cross section, it is first necessary to
cut the rabbet (Fig. 147) with a rabbet
plane. If this moulding is made by hand, the size of the picture
should be measured, the length of all four sides added, and a liberal
allowance made for waste.
Fig. 147. Making picture frames
In the figure, the triangles a a are waste, the rabbet being indicated
by the dotted line. After the four pieces have been sawed out on the
mitre box, they should be placed together on a flat surface, such as
the bench top or floor, to see if the mitres fit perfectly. If they do
not, one of them can be block planed to make a perfect fit, and the
other three laid close together, as shown in the illustration.
The assembling is the hardest part of the operation, and many
devices have been tried and some patented to hold the parts
together while the glue is drying.
Perhaps the surest way is to drill a hole in one piece of each joint
large enough for the passage of a wire bung-head nail.
The undrilled piece is placed vertically in the vise. The drilled piece,
after receiving a thin coat of glue, is brought into position
horizontally, and the nail driven home.
Theoretically, the nail should catch at the first blow, but the
horizontal piece will sometimes slip, even with the best of care. It is
wiser to place this piece about 1⁄16 inch above its final position, to
allow for this slip.
A method sometimes used is to glue near the ends of each piece a
triangular block of wood, as shown at d. These must be left over
night to harden.
The next day the whole four pieces can be glued and held together
by four hand screws, as shown, until the glue is thoroughly hard.
This method, of course, can only be used with plain moulding or that
which is square on the outside.
Our boys tried another way that is commonly practised. They nailed
oblong blocks to an old drawing board, as shown at e e, and then
placed the picture frame in the centre, after gluing the joints, and
driving wedges in between the blocks and the frame. Paper placed
under each joint prevented the frame from being stuck to the
drawing board by the glue forced out by the pressure.
This paper plan was learned by experience, as the first frame the
boys tried had to be pried up from the board, and in so doing they
broke it at two of the joints, so that it had to be made again.

It is well to remember in gluing mitre joints that end grain absorbs
more glue than a flat surface. A priming coat should be applied first,
and allowed to remain a few moments to fill up the pores. The
second coat should hold fast and make a strong joint, but an excess
of glue should always be avoided, as it must be removed after
hardening, and glue soon takes the edge from the best of tools.
Very fancy frames should be avoided. A bevel on the outside or
inside, or both, is about all the young woodworker should attempt in
the way of ornamentation. Depend on the natural beauty of the
wood, as a fancy frame draws the attention from the picture, which
after all is the main thing. We should admire the man, not his
clothes, the picture not its frame, although the latter should be neat
and well made.
The finishing and polishing of frames is taken up in Chapter XLIX.

XXVII
MAKING TOILET BOXES
To make a wooden box sounds like a simple proposition; but in
making the drawing, the questions of size, proportion, joints, hinges,
etc., immediately come up.
The size of course depends on the purpose of the box. If it is for
ladies' gloves, it should be long and narrow; if for collars or
handkerchiefs, square or nearly so. The height is nearly always
made too great. In fact, the whole question of proportion is one
which can hardly be taught; it must be felt, and different people
have different ideas as to what constitutes good proportion.
Some hints, however, may be given: A box perfectly square does not
look well. Again, dimensions that are multiples do not look well. A
box 4 × 8 × 12 inches would not be nearly so pleasing as one 3 ×
5 1⁄2 × 12 inches.
The proportions are also affected by the constructive details. Is the
box to be flat on the sides and ends or is the top to project? etc.
Our boys argued and sketched and finally drew the design shown at
Fig. 148. This was to hold ties. The top was to project and have a
bevel, or chamfer, also the bottom. No hinges were to be used, but
the cover was to have cleats fastened on the under side to keep it in
place, and to prevent warping.
The next question was the manner of fastening the sides and ends.
On unimportant work, a butt joint with glue and brads can be used,
but for a toilet article, the holes made by the brads, even if they are
filled with putty, are not satisfactory.

Fig. 148. Dado joint used in box design
So it was decided to use the dado joint
as shown at a. This meant more fine
work, but, as Ralph suggested, it was
to last a lifetime, and should be made
right.
Sides and ends were squared up, and
the grooves on the side pieces laid out
as in the nail box. The rabbets on the
end pieces were cut out with the back
saw and chisel. After the joints had been carefully fitted, the four
pieces were glued together and placed in hand screws over night.
While the glue was hardening, the two pieces for the top and bottom
were squared up and bevelled with the smoothing plane on the long
sides, the block plane on the ends.
The cleats for the top were next made, drilled and countersunk for
the screws as at b.
A careful full-sized drawing of half of the top was made, and a chip
carving design drawn for it. The cleats were not put on until the
carving was finished and short screws had to be used so they would
not come through and spoil the surface.
The next day the body of the box was removed from the hand
screws and squared with a smoothing plane. The top and bottom
were put on with 1-inch brads. These were "set" with a nail punch to
prevent any possible scratching and the whole box was rubbed down
with wax dissolved in turpentine.
For fine cabinet work, the dovetail joint makes the most satisfactory
method of fastening, but Harry was not yet skilled enough to do the
fine work it demanded.
The second box was for handkerchiefs, dimensions 8 × 7 × 3 inches
outside, and no overhang at either top or bottom. The construction
brought in several new features. Sides and ends were dadoed
together as in the first box.

The top and bottom, after being squared, were rabbeted on all four
sides until they fitted snugly into the opening top and bottom. They
were glued in these positions and placed in hand screws over night.
(Fig. 149.)
"How are you going to get into that box?" asked Harry. "You've
closed it up solid and glued the top on."
"Wait and see," was all the satisfaction he got.
Fig. 149. The handkerchief box
The next day the hand screws were removed and the box squared
up exactly as if it had been a solid piece of wood. Ralph then made
two gauge lines around the four sides, 3⁄4 inch from the top and 1⁄8
inch apart. Then he cut the box in two between these two lines with
a rip saw, after slightly rounding all corners except the bottom ones
with a plane and sand-paper.
By this method, the box and cover must be exactly alike in outline,
and by planing to the gauge lines, they will fit perfectly.
It only remained to hinge the two parts together, but this operation
proved to be no slight task.
