ADVANCED SIGNAL PROCESSING
FOR COMMUNICATION SYSTEMS
THE KLUWER INTERNATIONAL SERIES
IN ENGINEERING AND COMPUTER SCIENCE
edited by
Tadeusz A. Wysocki
University of Wollongong, Australia
Michael Darnell
The University of Leeds, United Kingdom
Bahram Honary
Lancaster University, United Kingdom
KLUWER ACADEMIC PUBLISHERS
NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: 0-306-47791-2
Print ISBN: 1-4020-7202-3
©2002 Kluwer Academic Publishers
New York, Boston, Dordrecht, London, Moscow
Print ©2002 Kluwer Academic Publishers
Dordrecht
All rights reserved
No part of this eBook may be reproduced or transmitted in any form or by any means, electronic,
mechanical, recording, or otherwise, without written consent from the Publisher
Created in the United States of America
Visit Kluwer Online at: http://kluweronline.com
and Kluwer's eBookstore at: http://ebooks.kluweronline.com
CONTENTS
PREFACE
1. Application of Streaming Media in Educational Environments
   P. Doulai
2. Wideband Speech and Audio Coding in the Perceptual Domain
   L. Lin, E. Ambikairajah and W.H. Holmes
3. Recognition of Environmental Sounds Using Speech Recognition
   Techniques
   M. Cowling and R. Sitte
4. A Novel Dual Adaptive Approach to Speech Processing
   M.C. Orr, B.J. Lithgow, R. Mahony, and D.S. Pham
5. On the Design of Wideband CDMA User Equipment (UE) Modem
   K.H. Chang, M.C. Song, H.S. Park, Y.S. Song, K.-Y. Sohn, Y.-H. Kim,
   C.I. Yeh, C.W. Yu, and D.H. Kim
6. MMSE Performance of Adaptive Oversampling Linear Multiuser
   Receivers in CDMA Systems
   P. Iamsa-ard and P.B. Rapajic
7. Peak-to-Average Power Ratio of IEEE 802.11a PHY Layer
   Signals
   A.D.S. Jayalath and C. Tellambura
8. A Proposed Hangup Free and Self-Noise Reduction Method for
   Digital Symbol Synchronizer in MFSK Systems
   C.D. Lee and M. Darnell
9. A Channel Sounder for Low-Frequency Sub-Surface Radio Paths
   D. Gibson and M. Darnell
10. Computational Complexity of Iterative Channel Estimation and
    Decoding Algorithms for GSM Receivers
    H. Cui and P.B. Rapajic
11. Modelling and Prediction of Wireless Channel Quality
    S. Ci and H. Sharif
12. Packet Error Rates of Terminated and Tailbiting Convolutional
    Codes
    J. Lassing, T. Ottosson and E. Ström
13. The Feng-Rao Designed Minimum Distance of Binary Linear
    Codes and Cyclic Codes
    J. Zheng, T. Kaida and K. Imamura
14. On a Use of Golay Sequences for Asynchronous DS CDMA
    Applications
    J.R. Seberry, B.J. Wysocki and T.A. Wysocki
15. PUM-Based Turbo Codes
    L. Fagoonee, B. Honary and C. Williams
16. A Code for Sequential Traitor Tracing
    R. Safavi-Naini and Y. Wang
17. Software-Defined Analyzer of Radio Signals
    J. Lopatka
18. Interleaved PC-OFDM to Reduce Peak-to-Average Power Ratio
    A.D.S. Jayalath and C. Tellambura
19. Reducing PAR and PICR of an OFDM Signal
    K. Sathananthan and C. Tellambura
20. Iterative Joint Equalization and Decoding Based on Soft Cholesky
    Equalization For General Complex Valued Modulation Symbols
    J. Egle and J. Lindner
INDEX
PREFACE
In the second year of the twenty-first century, we are witnessing
unprecedented growth in both the quality and quantity of services offered by
communication systems. Most of the recent advancements in communication
systems performance have been made possible only by the application of digital
signal processing in all areas of communication systems development and
implementation. Advanced digital signal processing allows the new
generation of communication systems to approach the theoretical
predictions, and to make practical use of ideas that were not considered
feasible to implement until recently. This book consists of 20 selected and
revised papers from the 6th International Symposium on Digital Signal
Processing for Communication Systems, held in January 2002 at the Pacific
Parkroyal Hotel in Manly, Sydney, Australia.
The first group of papers deals with audio and video processing for
communications applications, and includes topics ranging from multimedia
content delivery over the Internet, through speech processing and
recognition, to recognition of non-speech sounds that can be attributed to the
surrounding environment.
Another theme which receives significant attention in this book is orthogonal
frequency division multiplexing (OFDM) in its various forms, e.g.
HIPERLAN and IEEE 802.11a. Aspects of OFDM technology which are
covered here include novel forms of modulation and coding, methods of
reducing in-band and out-of-band spurious signal generation, and means of
reducing the peak-to-average power ratio of an OFDM waveform. In these
contributions, a key objective is to retain the inherent implementational
simplicity of the OFDM technique whilst enhancing its performance relative
to single carrier systems.
Digital signal processing for second and third generation systems is
represented in the book as well. The topics covered here include both
theoretical issues, such as spreading sequence design, and implementation
issues of a 3G user equipment modem and MMSE receivers for CDMA systems. A
useful comparison of the complexity of channel estimation, equalization and
decoding for GSM receivers is also presented.
The book also includes useful papers on applications of error control
coding and information theory. These start with mathematical structure and
decoding techniques and continue with channel capacity approaching codes
and their applications to various communication systems.
The last group of papers included in the book considers several important
issues of digital signal processing for communication systems, such as
modulation, software defined radio, and channel estimation.
The Symposium was made possible by the generous support of the New
South Wales Section of IEEE, the Smart Internet Technology Cooperative
Research Center, the Telecommunications and Information Technology
Research Institute, the Australian Telecommunications Cooperative
Research Center, and the School of Electrical, Computer, and
Telecommunications Engineering at the University of Wollongong. The
Organizing Committee is most grateful for their support. The editors wish to
thank the authors for their dedication and considerable effort in preparing,
revising and submitting their chapters, as well as everyone else
who participated in the preparation of this book.
Tadeusz Wysocki
Mike Darnell
Bahram Honary
Chapter 1
APPLICATION OF STREAMING MEDIA IN
EDUCATIONAL ENVIRONMENTS
Parviz Doulai
Educational Delivery Technology Laboratory (EDTLab), University of Wollongong, Wollongong
NSW 2500, Australia
Abstract: This paper discusses the growing application of Web-based instruction
and examines real-time streaming technology in educational settings.
The steps required in the process of applying streaming technology in
education are outlined, and available tools and the nature of the
delivery platforms are identified. The prospects and challenges in
introducing virtual learning environments to tertiary institutions are
illustrated using two case studies. It will not be long before the
challenges confronting technology-based education in traditional
teaching institutions are overcome.
Key words: Educational Technology, Streaming, Multimedia, Virtual Learning
Environment, Virtual Classroom
1. INTRODUCTION AND BACKGROUND
Educational institutions have long been a testing ground for the latest
technological breakthroughs that change the way professional educators
work and live. Examples include the growing application of information
communications technologies and the use of network delivered multimedia
educational modules through the application of interactive and dynamic Web
environments. The growing global information technology revolution has
already changed the face and culture of teaching and learning in Australia
and other parts of the world, creating new opportunities and challenges for
professional educators. The new and emerging educational technologies
have enabled academic institutions to provide a flexible and more open
learning environment for students. It has been shown that in a well-designed
web-based support system, students take more responsibility for their own
learning, and instructors function more like coaches and mentors for a new
generation of professionals [1].
The outcome of research and development work in utilizing new and
emerging educational technologies in traditional educational institutions has
also found its way into serving distance students. The convergence of new
information technologies such as telecommunications, computers, satellites,
and fiber optic technologies is making it easier for teaching institutions to
implement distance education [2,3]. National and transnational virtual
universities as well as traditional educational establishments are offering
online degree programs, continuing education and corporate training
courses. In many cases Web-based instruction and course management tools
are used to deliver courseware containing interactive multimedia-based
educational modules.
An integrated environment containing Web-based course delivery and
management along with multimedia modules is commonly referred to as a
virtual learning environment or a virtual classroom. Virtual learning
environments are used to support real classroom environments in traditional
academic institutions [4,5]. Virtual classrooms have also been found to be very
attractive in virtual campuses and virtual universities all around the globe
[6]. Key technologies involved in the development of virtual learning
environments include multimedia and streaming media.
Reasons for developing and utilizing virtual classrooms by teaching
institutions vary: some endeavor to keep up with the ever-changing frontiers
of educational technologies, whilst others see it as an approach that gives
students more control over their learning. The use of new and emerging
educational technologies offers students a dynamic learning environment
through which class communication and collaboration can be achieved with
minimum time and budget requirements. In fact, the great benefit of online
learning in general and virtual classrooms in particular is that it provides
educators with an opportunity to get students to collaborate and to
communicate very easily [1]. Two key issues in online learning are retention
and the development of interactive and collaborative activities and
environments. Creating a motivational and interactive virtual learning
environment can enhance student retention, completion, and overall
enthusiasm for this new type of learning arena [1].
In applications related to online learning, multimedia is the ability to
include sound and video in Web pages. Due to the availability of many
public domain and commercial computer programs it has become
increasingly easy to incorporate audio and video clips into any digital
document or multimedia Web publishing materials. Streaming media came
about in response to the problem of bandwidth-greedy multimedia files,
opening the possibilities of delivering many multimedia applications via the
Internet. Streaming refers to the process of delivering audio clips, video
clips, and other media in real-time to online users [7].
Streamed audio and video files can be found in a number of World Wide
Web locations serving a wide-variety of purposes, such as a vocal
introduction to a homepage, a movie trailer, or an interactive educational
presentation. One of the major attractions of streaming media is "live"
broadcasting, which has less applicability to educational environments. In a
simple educational setup, the streaming media is used to deliver
synchronized text, images and other media files over the public TCP/IP
network. In a more complex setup, streaming is used for network delivery of
interactive multimedia modules [8].
This paper illustrates two case studies: a simple virtual classroom
offering standard PowerPoint slides synchronized with streamed voice
narration, and a stream video presentation in which the video is indexed to
a table of contents. These case studies are explained in terms of the
module structure and the method of delivery. Both modules are delivered to
students over a low bandwidth modem connection.
It would be useful to utilize desktop videos for course material
presentation and distribution. However, until recent times network delivery
of multimedia clips was limited to a corporate environment or on-campus
environment where students have direct access to high-speed lines. The
delivery of media files over the Web has always been limited by the
bandwidth of communication lines or channels. Development in this field is
happening in two directions: faster connections and communication
technologies [9] that are altering the capacity of the communication channels
and new multimedia technologies for the Web, such as streaming audio and
video, flash animation, and others that are allowing for better delivery of
media on the Web [4].
When video first came to the World Wide Web, it was necessary to
download the entire video file before it could be played. This was seen to be
one major disadvantage of traditional multimedia clips and modules.
Downloading video files, typically megabytes in size, resulted in substantial
delays before the audience could actually hear or view the clip. This was
even worse when large clips were downloaded over a slow modem
connection.
2. STREAMING: MULTIMEDIA FILES FOR
NETWORK DELIVERY
Streaming media is a method of providing audio, video and other media
files in real-time over the Internet or a corporate Intranet, without download
waits. Instead of downloading the file in its entirety before playing it,
streaming technology takes a different approach: it downloads the beginning
of the file, forms a buffer of packets, and when the buffer reaches an
appropriate size, the client player plays back the packets in a seamless
stream. While the viewer is watching, the player downloads the next portion,
and so on, until the entire file is played. The buffer provides a way for the
player to protect itself in case of network congestion, lost packets, or other
interference.
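The buffering behaviour described above can be illustrated with a minimal Python sketch. It is a simplified model and not the logic of any particular player: time advances in one-second ticks, the link delivers a constant number of bits per tick, and playback starts once a pre-buffer is filled. The function name, its parameters, and the sample bit rates are assumptions chosen for illustration.

```python
def simulate_playback(clip_seconds, encoded_bps, link_bps, prebuffer_seconds):
    """Simulate buffered streaming in 1-second ticks (link_bps must be > 0).

    Returns (startup_delay, stalls, total_ticks):
      startup_delay  ticks spent filling the pre-buffer before playback starts
      stalls         ticks during playback in which the buffer ran dry
      total_ticks    ticks until the whole clip has been played
    """
    total_bits = clip_seconds * encoded_bps
    # Never ask for a pre-buffer larger than the clip itself.
    prebuffer_bits = min(prebuffer_seconds * encoded_bps, total_bits)
    downloaded = 0      # bits received so far
    played = 0          # bits consumed by the player so far
    startup_delay = None
    stalls = 0
    tick = 0
    while played < total_bits:
        tick += 1
        downloaded = min(total_bits, downloaded + link_bps)
        if startup_delay is None:
            # Still filling the initial buffer; nothing is played yet.
            if downloaded >= prebuffer_bits:
                startup_delay = tick
            continue
        needed = min(encoded_bps, total_bits - played)
        if downloaded - played >= needed:
            played += needed    # one second of media is available: play it
        else:
            stalls += 1         # buffer underrun: wait for more data
    return startup_delay, stalls, tick


if __name__ == "__main__":
    # Clip encoded below the link rate: short start-up delay, no stalls.
    print(simulate_playback(60, 20_000, 28_800, 5))
    # Clip encoded above the link rate: the buffer eventually runs dry.
    print(simulate_playback(10, 40_000, 28_800, 2))
```

In this model a clip encoded below the link rate plays without interruption after a short start-up delay, whereas a clip encoded above the link rate drains the buffer and stalls, which is why the encoding rate has to be matched to the slowest expected connection.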
2.1 History: Streaming Audio and Video
Progressive Networks [10] led the way in the development of streaming
audio and video, launching “RealAudio 1.0” in 1995. “RealAudio 2.0” was
then announced; it upgraded sound to “FM mono” quality and made live
Webcasting possible for the first time. RealAudio 2.0 introduced important
features such as server bandwidth negotiation, support for firewalls and an open
Application Programming Interface (API) for third party developers.
Compatibility of RealAudio 2.0 with the Netscape Navigator plug-in
architecture made it possible to play RealAudio content as an
integrated part of a Web page. In February 1997, Progressive Networks
released RealVideo 1.0, which made delivery of video over 28.8 kbps a reality.
The system also offered full-motion-quality video using V.56 (56 kbps) and
near TV broadcast quality video at Local Area Network (LAN) rates or
broadband speeds (100 kbps and above).
In October 1997, Progressive Networks officially changed its name to
Real Networks prior to the release of what it called “RealSystem 5.0”. The
system included RealPlayer 5.0, RealEncoder 5.0, RealServer 5.0 and a
software package called RealPublisher. Until the release of RealSystem 6.0 in
1999, the delivery of multimedia files was conducted using Real Networks'
proprietary PNM (Progressive Networks Metafile) format. RealSystem 6.0
used the Real Time Streaming Protocol (RTSP), which was then a new standard
for improved server-client communication. RealSystem 6.0 could also
stream and play not just Real Networks' own format, but also standard data
types such as MIDI, AVI or QuickTime. The case studies illustrated in this
paper were based on RealSystem 6.0.
Real Time Streaming Protocol is designed to work with time-based
media, such as streaming audio and video, as well as any application where
application-controlled, time-based delivery is essential. In addition, RTSP is
designed to control multicast delivery of streams, and is ideally suited to full
multicast solutions [7]. Currently, RealSystem supports a variety of new data
types. These include audio and video as well as text, images and animation.
In fact, streaming is now seen as a platform for delivering information,
rather than just as a system for delivering video. One can tie other kinds of
Web content to the timeline of a video or an audio presentation. This allows
the creation of complex and personalized experiences for the end user. An
example that contains a variety of media files with precise timing structure is
available in [11].
2.2 Why Streaming?
There are several reasons why downloading of an entire media file prior
to its play back is unsuitable in the delivery of information over the public
TCP/IP network. For instance, if a user on a low bandwidth connection (and
even high bandwidth) wants to move forward in the video they have to wait
until the whole file is downloaded. Also, if a user only views a small portion
of the stream and they are on a high bandwidth connection they are likely to
have downloaded the whole file after only a few seconds. This will cost the
user extra bandwidth because Web servers typically deliver files as fast as they
can. Moreover, Web servers do not have Intellectual Property control, and so
a publisher will not be able to prevent users from downloading the media file
for re-use. Also, Web servers are not capable of delivering presentations of
unlimited or undetermined length, or live broadcasts of media files.
There are other reasons as well, which demonstrate the superiority of dedicated
streaming servers over standard Web servers in the delivery of
multimedia files.
Streaming multimedia has been optimized for use on the Internet in two
ways:
1. Clips are highly compressed, so that download time is drastically
reduced. The goal is to download the clip faster than it takes to play the
clip, even when using dial up modem connections.
2. The players and plug-ins can play the clip as it is being downloaded.
They start playing immediately, thus reducing wait time for the user.
These optimizations allow users to do things that are impractical for
traditional multimedia, including broadcasting of live audio and video events
and broadcasting of extremely large multimedia files, such as audio books
that can take many hours to play. Often delivery of multimedia files through
a dedicated streaming server is combined with fast-forward and rewind
capabilities.
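The goal of downloading a clip faster than it takes to play can be checked with simple arithmetic. The following Python sketch is illustrative only: the function names, the 5 MB file size, and the bit rates are assumed values, not figures taken from the case studies.

```python
def full_download_wait_seconds(file_bytes, link_bps):
    """Seconds a user waits if the whole file must arrive before playback."""
    return file_bytes * 8 / link_bps


def streams_in_real_time(encoded_bps, link_bps):
    """True if data arrives at least as fast as the player consumes it."""
    return encoded_bps <= link_bps


def required_compression_ratio(source_bps, target_bps):
    """How much the source bit rate must shrink to meet the target rate."""
    return source_bps / target_bps


if __name__ == "__main__":
    modem_bps = 28_800                      # assumed dial-up connection
    wait = full_download_wait_seconds(5_000_000, modem_bps)
    print(f"Full download of a 5 MB clip: about {wait / 60:.0f} minutes")
    print("20 kbps clip streams in real time:",
          streams_in_real_time(20_000, modem_bps))
    print("80 kbps clip streams in real time:",
          streams_in_real_time(80_000, modem_bps))
    # Assumed source: 8 kHz, 8-bit mono audio (64 kbps) squeezed to 20 kbps.
    print("Compression ratio needed:",
          required_compression_ratio(64_000, 20_000))
```

With these assumed figures, a full download over a 28.8 kbps modem takes roughly 23 minutes, while a clip encoded at 20 kbps can begin playing almost immediately because its data arrives faster than it is consumed.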
RealSystem, Microsoft Windows Media Technologies [12] and Apple’s
QuickTime [13] offer tools for streaming multimedia content across
corporate Intranets and the Internet. They allow the use of scripting
languages to control the player or, more importantly, integration with the
browser so that one can embed the player and control it using JavaScript.
Exposure to Java is useful as it ensures that developers can use the wealth of
Java in virtual classrooms.
3. STREAMING MEDIA: SERVERS, PLAYERS AND
ENCODERS
Producing pre-recorded streaming multimedia requires the following
steps:
1. Recording the content, which requires proper recording equipment such as
video cameras, microphones, etc.
2. Digitization or conversion of the resulting clip into a multimedia format,
such as .wav, .avi, .mov, .rm, etc. It is possible to do this at the same time
as step one by recording directly to the multimedia format.
3. Post-processing in the multimedia format, such as adjusting sound
quality, editing the content, etc.
4. Conversion of the resulting multimedia format into a preferred format
(e.g. RealSystem format) using the relevant encoder (e.g. RealProducer).
If there are no editing enhancements, one can record directly to the
preferred format.
5. Uploading the resulting file to a Web server, or a dedicated streaming
server such as RealServer, so people on the Web can access it as
streaming multimedia.
Examples shown in this paper use RealSystem, which is a collection of
components by Real Networks for producing and distributing streaming
multimedia. The three components of RealSystem are:
1. Producer Module (encoder), which converts existing multimedia files into
RealSystem format. The encoder program can also record to RealSystem
format directly from audio and video sources.
2. Player Module, which plays, amongst other things, the RealSystem media
file formats. The free version of RealPlayer includes both an external
version and a Web browser plug-in version. The professional version of
RealPlayer adds the ability to record broadcasts and other advanced
features.
3. Server Module, which offers live broadcast and advanced features like
automatic adjustment of transmission speeds to match the user’s
connection, or the ability to fast forward and rewind.
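One way to picture the adjustment of transmission speed to the user's connection is a server that holds several encodings of the same clip and picks the best one that fits. The Python sketch below is a hypothetical illustration of that idea and does not reproduce RealServer's actual logic; the function name, the headroom factor, and the list of bit rates are assumptions.

```python
def pick_stream(available_bps, measured_link_bps, headroom=0.8):
    """Choose the highest encoded bit rate that fits the measured link.

    headroom reserves part of the link for protocol overhead and
    fluctuations (an assumed safety margin, not a documented value).
    If no encoding fits, fall back to the lowest available bit rate.
    """
    usable_bps = measured_link_bps * headroom
    fitting = [rate for rate in sorted(available_bps) if rate <= usable_bps]
    return fitting[-1] if fitting else min(available_bps)


if __name__ == "__main__":
    encodings = [20_000, 34_000, 80_000]    # hypothetical encoded versions
    for link in (14_400, 28_800, 56_000, 128_000):
        print(link, "->", pick_stream(encodings, link))
```

The fallback to the lowest bit rate is a design choice in this sketch: the user on a very slow line still receives something, at the cost of possible rebuffering.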
4. VIRTUAL LEARNING ENVIRONMENTS
Web-based instruction can be supplemented by audio and video files to
closely simulate a real classroom environment. Streaming technology is the
key technology used in the delivery of educational multimedia modules over the
network. A virtual learning environment in its relatively complete form
contains a small video clip that shows the class activity as well as a
series of text pages and images representing the content of the blackboard
and the overhead projector screen.
From the perspective of a developer of educational resources, the interesting
idea behind streaming files is the synchronization of the playback of
arbitrary files such as text, images etc. For instance, one can synchronize a
flash animation file with audio, text, image, or any other data files. In a
virtual classroom environment, one can synchronize the playback of a class
video with images taken from the blackboard or the overhead screen as the
lecture progresses.
4.1 Case Study 1: Stream Video Integration into Virtual
Classrooms
Due to the recent availability of video compressor/decompressor (codec)
technologies with compressions designed for web delivery, it is now possible
to use video as an effective resource in a web-based instruction environment.
Different client programs are now available to make movies with different
data rates, and different streaming server programs are now available to
negotiate with the client machines to deliver stream video at relatively high
quality even over narrow-bandwidth modem connections.
A stream video presentation was included in the learning environment of a
combined final year and Master subject (ELEC476/912) to provide background
materials for students' group projects. This module was offered in two
formats to suit low- and high-end Mac, PC and UNIX platforms as well as
slow and moderately fast network connections. In both formats an audio and
a video file synchronized with text and images were used to create a simple
virtual tutorial classroom.
An interesting feature of most streaming server programs is that they
allow client machines to negotiate directly with the server to access the part
of the media file they want. Normally, after a short pause the user can jump to
anywhere in an audio or video clip. The video can be indexed to a table of
contents and can also automatically "flip" pages in an adjacent frame
according to markers embedded in the video. As shown in Figure 1, the
video file in this presentation was indexed to a table of contents, and that was
done through markers embedded in the video file during the encoding
process. These markers enabled students to click on items listed in the table of
contents (left window) in order to view the associated video along with its
synchronized text and images in allocated areas within the presentation
window.
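As a rough illustration of how a click on a table-of-contents entry can translate into a seek, the Python sketch below builds an RTSP PLAY request whose Range header gives the start position in normal play time (npt), following the request syntax of RFC 2326. The chapter does not document the actual exchange between RealPlayer and RealServer, so the URL, session identifier, and table-of-contents entries here are hypothetical.

```python
# Hypothetical table of contents: entry title -> marker time in seconds.
TABLE_OF_CONTENTS = {
    "Introduction": 0.0,
    "Project background": 95.0,
    "Design requirements": 240.5,
}


def rtsp_play_request(url, cseq, session_id, start_seconds):
    """Build an RTSP/1.0 PLAY request that starts playback at start_seconds."""
    lines = [
        f"PLAY {url} RTSP/1.0",
        f"CSeq: {cseq}",
        f"Session: {session_id}",
        f"Range: npt={start_seconds:.3f}-",   # open-ended range: play to the end
    ]
    # RTSP, like HTTP, ends each header line with CRLF and the header
    # section with an empty line.
    return "\r\n".join(lines) + "\r\n\r\n"


def seek_to_entry(title, url, cseq, session_id):
    """Look up a table-of-contents entry and build the matching PLAY request."""
    return rtsp_play_request(url, cseq, session_id, TABLE_OF_CONTENTS[title])


if __name__ == "__main__":
    print(seek_to_entry("Project background",
                        "rtsp://example.com/lecture.rm", 3, "12345678"))
```

Because only the requested portion is sent, the student does not have to wait for the preceding part of the file, which is the behaviour the paragraph above attributes to streaming servers.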
An online questionnaire was administered to obtain information
regarding student access to the subject homepage and its stream video
integration in the ELEC476/912 virtual learning environment. Survey results
showed that students realized the benefits of technology-enhanced resources
that were incorporated into their on-campus course delivery. Students’
comments and feedback on the course content, the method of delivery and
available tools and resources for this subject were archived in [14].
4.2 Case Study 2: ELEC101 Virtual Classroom
The Web Edition of “Fundamentals of Electrical Engineering
(ELEC101)” is a simple virtual classroom environment that uses
real-time streaming technology to deliver synchronized PowerPoint slides
(images) and audio files (the lecturer's voice) over the Internet.
To ensure that students using computers of any power and
connections of any speed could retrieve the content of the ELEC101 virtual
classroom, four options were provided, namely plain, synchronized, controlled
synchronized and PowerPoint slide/script. Figure 2 shows a screen capture
of the cover page of the ELEC101 World Wide Web Edition.
Rather than replacing the conventional lecturing of ELEC101, the Web
edition was designed and implemented to help students who need to review
important pointers of major topics. Students need the freely available
RealPlayer and perhaps a headphone set so that they can hear the lecture and
view the overheads in computer laboratories or at home using a standard 56
kbps dial up connection on a PC, Mac or UNIX platform.
In the plain format students first receive a page containing thumbnails of
available overheads. The RealPlayer will start working as soon as students
click on a thumbnail to view the actual overhead. Then, they step to the slide
they are interested in, and hear the associated audio clip with each slide.
Students may control the RealPlayer operation, and they also have
standard navigation tools. The RealPlayer may be used as a plug-in program
or as a Netscape or Internet Explorer helper application. The latter means that
when a student clicks on the RealAudio icon, the browser launches the player,
and from there students control the player operation: recording, playback,
rewind and so forth. They may also use standard previous and next buttons to
move around. A screen capture of the plain format is shown in Figure 3.
In the synchronized format students receive PowerPoint slides and their
associated sound. The audio file automatically updates the slide displayed as
the lecture progresses. Multiple RealPlayer controls were provided in this
option. These include play, pause, volume-control and position-slider. Users
can use the latter to move forward and backward through the presentation.
The controlled synchronized option of ELEC101 displays projected
slides on the screen and plays the corresponding sound. In this mode of
operation, students step to the slide they are interested in and start the player.
While the audio is playing, it will automatically update the slide as the
lecture progresses. Alternatively, students can jump to a new slide by
clicking on thumbnails listed in the left frame, and the audio will jump to
follow. To start listening to the audio from a particular slide, students may
type the slide number in the space provided in the control section and press the
enter key. Figure 5 shows a screen capture of ELEC101 in the controlled
synchronized mode of operation.
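The two-way link between audio position and slide number in the controlled synchronized mode can be sketched as a lookup over slide start times. The original module was built with JavaScript, frames and the RealAudio plug-in; the Python below is only a language-neutral illustration of the mapping, and the start times and function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical start time (in seconds) of each slide within the audio file.
SLIDE_STARTS = [0.0, 42.5, 118.0, 201.3]


def slide_at(position_seconds, starts=SLIDE_STARTS):
    """Return the 1-based number of the slide shown at an audio position."""
    # bisect_right counts how many slides have started at or before the position.
    return max(1, bisect_right(starts, position_seconds))


def audio_position_for(slide_number, starts=SLIDE_STARTS):
    """Return the audio position to seek to when a slide is selected."""
    if not 1 <= slide_number <= len(starts):
        raise ValueError(f"no slide number {slide_number}")
    return starts[slide_number - 1]


if __name__ == "__main__":
    print(slide_at(100.0))           # audio drives the slide display
    print(audio_position_for(3))     # a thumbnail click drives the audio
```

The first function corresponds to the audio updating the slide as the lecture progresses, and the second to the audio jumping to follow a thumbnail click or a typed slide number.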
Provisions were also made for students using a computer without a sound
card. In this case they view a slide in one window and read its
corresponding text in another browser window.
Implementation of the plain format is very simple provided the developer
knows the technology and has access to some almost freely available tools. The
“controlled synchronized” version of ELEC101 presents some challenges.
This version uses JavaScript, Frames, and the RealAudio Plug-in.
Nowadays, the RealPlayer itself supports Java driven events. This basically
means that the development of synchronized audio and video files for network
delivery is much easier, and can be done by almost anyone.
The ELEC101 virtual classroom environment was tested by a group of
second year students using moderately high-speed connections (computer
laboratories on campus) and low speed dial up connections (28.8 kbps and
higher modems). The setup performed with no interruptions or delay in
delivering the subject content (sound and images). Students found the entire
concept of the virtual classroom and the application of streamed and
synchronized audio files very exciting and motivating. The setup is now
available on the Internet for public use [14].
5. CONCLUSION
The combination of powerful compression algorithms, extensive features
that are associated with streaming servers and integration with the Web
make it possible to use virtual learning environments effectively over narrow
bandwidth networks. This paper explored the integration of the multimedia
modules into a virtual learning environment. Real time streaming technology
in an educational setting was examined and the process of applying
streaming technology in education was briefly highlighted. Two examples of
virtual learning environments using stream synchronized audio/video and
image files were illustrated. It is envisaged that the use of technology-
enabled methods in face-to-face university instruction will result in a model that
works equally well for distance students and learners in virtual campuses.
REFERENCES
[1] P. Doulai, “Preserving the quality of on-Campus education using resource-based
approaches,” Proc. International WebCT Conference on Learning Technologies,
University of British Columbia, Vancouver, Canada, 1999, pp. 97-101.
[2] B. Hart-Davidson and R. Grice, “Extending the dimensions of education: Designing,
developing, and delivering effective distance-educ.,” Proc. of the IEEE Professional
Communication Conference, 2001, pp. 221-230.
[3] E. R. Ladd, J. R. Holt and H. A. Rumsey, “Washington state university's engineering
management program distance education industry partnership,” Proc. of Portland
International Conference on Management of Engineering and Technology, 2001, pp.
302-306.
[4] P. Doulai, “Smart and Flexible Campus: Technology Enabled University Education,”
Proc. of The World Internet and Electronic Cities Conference, 2001, Iran, pp. 94-101.
[5] V. Trajkovic, D. Davcev et al., “Web-based virtual classroom,” Proc. of IEEE
Conference on Technology of Object-Oriented Languages and Systems, 2000, pp.
137-146.
[6] W. Beuschel, “Virtual campus: scenarios, obstacles and experiences,” Proc. of IEEE
Conference on System Sciences, 1998, pp. 284-293.
[7] A. Zhang, Y. Song and M. Mieike, “NetMedia: Streaming multimedia presentations
in distributed environments,” IEEE Multimedia, Vol. 9, 2002, pp. 56-73.
[8] P. Doulai, “Recent developments in Web-based educational technologies: A practical
overview using in-house implementation,” Proc. of the International Power
Engineering Conference, 1999, Singapore, pp. 845-850.
[9] D. Fernandez, A. B. Garcia, D. Larrabeiti, A. Azcorra, P. Pacyna, and Z. Papir,
“Multimedia services for distant work and education in an IP/ATM environment,”
IEEE Multimedia, Vol. 8, 2001, pp. 68-77.
[10] RealNetworks (Progressive Networks), http://www.real.com/
[11] Design and Management 1, “Introduction to Group Projects (ELEC195) Homepage,”
http://edt.uow.edu.au/elec195/welcome.ram
[12] S. Huang and H. Hu, “Integrating windows streaming media technologies into a
virtual classroom environment,” Proc. of International Symposium on Multimedia
Software Engineering, 2000, pp. 411-418.
[13] Apple QuickTime, http://www.apple.come/quicktime/
[14] The Educational Delivery Technology Laboratory (EDTLab), University of
Wollongong, http://edt.uow.edu.au/edtlab/portfolio.html/
Chapter 2
WIDEBAND SPEECH AND AUDIO CODING IN
THE PERCEPTUAL DOMAIN
L. Lin, E. Ambikairajah and W.H. Holmes
School of Electrical Engineering and Telecommunications, The University of New South
Wales, UNSW Sydney 2052, Australia.
Abstract: A new critical band auditory filterbank with superior auditory masking
properties is proposed and is applied to wideband speech and audio coding.
The analysis and synthesis are performed in the perceptual domain using this
filterbank. The outputs of the analysis filters are processed to obtain a series
of pulse trains that represent neural firing. Simultaneous and temporal
masking models are applied to reduce the number of pulses in order to achieve
a compact time-frequency parameterization. The pulse amplitudes and
positions are then coded using a run-length coding algorithm. The new speech
and audio coder produces high quality coded speech and audio, with both
temporal and spectral fidelity.
Key words: auditory filterbank, speech coding, simultaneous and temporal masking
1. INTRODUCTION
Current applications of speech and audio coding algorithms include
cellular and personal communications, teleconferencing, and secure
communications. Historically, coding algorithms using incompatible
compression techniques have been optimized for particular signal classes
such as narrowband speech, wideband speech, high quality audio and high
fidelity audio (CD quality). It is evident that a universal speech and audio
coding paradigm is required to meet the diverse needs of the above
applications. Low bit rate speech coders provide impressive performance
above 4kbps for speech signals, but do not perform well on music signals.
Similarly, transform coders perform well for music signals, but not for
speech signals at lower bit rates.
Speech and general audio coders are usually quite different – for speech
one of the main tools is a model of the speech production process, whereas
for audio more attention is paid to modeling the human auditory system,
since a source model is usually not feasible. The new MPEG-4 standard for
multimedia communication includes a scalable audio codec supporting
transmission at bit rates from 2 to 64kbps. However, in order to achieve the
highest audio quality with the full range of bit rates, MPEG-4 actually
employs three types of codec. For lower bit rates, a parametric codec
(Harmonic Vector Excitation Coding) is used which encodes at 2-4kbps for
speech with an 8kHz sampling frequency, and at 4-16kbps for speech and
audio with 8 or 16kHz sampling frequency. A Code Excited Linear
Predictive (CELP) codec is used for the medium rate – i.e. 6-24kbps at 8 or
16kHz sampling frequency. Time-frequency (TF) codecs, including the
MPEG-2 AAC and Twin VQ codecs are used for the higher bit rates,
requiring 16-64kbps at a sampling frequency of 8kHz.
There is therefore a need for high quality coders that can work equally
well with either speech or general audio signals. In this work we propose a
scheme for a universal coder that can handle both wideband speech and
audio signals. This coder is based on a new auditory filterbank model, and is
a further development of the speech and audio coding scheme initially
proposed by Ambikairajah et al. [3], in which the analysis and synthesis of
the speech and audio signals take place in the perceptual domain.
1.1 Coding using Auditory Filterbanks
In recent years parallel auditory filterbanks such as the Gammatone
filterbank [5,13] have outperformed the conventional transmission line
auditory model [1,12] in terms of computational simplicity. They have
applications in various types of signal processing required to model human
auditory filtering. Gammatone auditory filters were first proposed by
Flanagan [5] to model basilar membrane motion, and were subsequently
used by Patterson et al. [13] as a reasonably accurate alternative for auditory
filtering. They have since become very popular. Robert and Eriksson [15]
applied them to produce a nonlinear active model of the auditory periphery,
and Kubin and Kleijn [7] applied them to speech coding.
In the wideband speech and audio coder proposed by Ambikairajah et al.
[3], the analysis is performed in the auditory domain by using Gammatone
filters to obtain an auditory-based time-frequency parameterization of the
input signal in the form of critical band pulse trains. This parameterization
approximates the patterns of neural firing generated by the auditory nerves,
and preserves the temporal information present in speech and music. An
advantage of this parameterization is its ability to scale easily between
different sampling rates, bit rates and signal types.
Adequate modeling of the principal behavior of the peripheral auditory
systems is still a difficult problem. An important shortcoming of Gammatone
filters is that they do not provide an accurate frequency domain description
of the tuning curves because of their flat upper-frequency slopes. In this
work we propose a new parallel auditory filterbank based on the critical
band scale. The filterbank models psychoacoustic tuning curves obtained
from the well-known masking curves [16,17]. The new auditory filters,
which have a steeper upper-frequency slope, achieve high frequency domain
accuracy and are computationally efficient. The new filterbank is then
applied to wideband speech and audio coding under the same paradigm as in
[3]. Auditory masking is applied to eliminate redundant information in the
critical band pulse trains. A technique to code the pulse positions and
amplitudes based on a run-length coding algorithm is also proposed.
This chapter is organized as follows: Section 2 presents the design
techniques for the new critical band auditory filterbank. Section 3 describes
the auditory-filterbank-based speech and audio coding scheme, including the
reduction of redundancy in the pulse trains and the quantization and coding
techniques for the pulse amplitudes and positions.
2. DESIGN OF A CRITICAL BAND AUDITORY
FILTERBANK
A filterbank that models the characteristics of the human hearing system
will have many desirable features and can have wide applications in speech
and audio processing. It is very difficult and costly to experimentally
observe the motion of the basilar membrane in a fully functional cochlea.
We present here an inexpensive method for generating psychoacoustic
tuning curves from the well-known auditory masking curves [16,17]. Then
two approaches to obtaining a critical band filterbank that models these tuning
curves are introduced. The first approach is based on the Log-Modeling
technique for filter design, which gives very accurate results. The second
approach uses a unified transfer function to represent each filter in the
critical band filterbank.
2.1 Generation of Psychoacoustic Tuning Curves from
Masking Curves
Masking is usually described as the sound-pressure level of a test sound
necessary to be barely audible in the presence of a masker. Using narrow-
band noise of a given center frequency and bandwidth as maskers and a pure
tone as the test sound, masking patterns have been obtained by Zwicker and
Fastl [16,17]. The effect of masking produced by narrow-band maskers is
level dependent. The five curves plotted as solid lines in Fig. 1 are the
masking patterns centered at 1 kHz at five different masker levels, the
highest being 100 dB [17].
It is known that the shapes of the masking patterns for different center
frequencies and different levels are very similar when plotted using the
critical band rate scale. Hence masking curves at different center
frequencies can be obtained by simply shifting the available masking curves
at 1 kHz. Masking curves at levels other than the five available ones
can be generated through interpolation. The masking curves
obtained through interpolation and shifting are shown in Fig. 1 by the dashed
lines.
The tuning curves can be obtained from the masking curves as follows.
The first step is to fix a test tone at a particular frequency and level. Then
the masking curves with different center frequencies that are just able to
mask the test tone are found and the corresponding levels are noted.
Plotting the levels as a function of the center frequencies provides the tuning
curve at that test tone frequency (Fig. 2).
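The procedure can be sketched in a few lines of Python. The sketch does not use the measured masking patterns of [16,17]. It substitutes a crude triangular masking pattern on the Bark scale, with an assumed lower slope of 27 dB/Bark and an assumed level-dependent upper slope. That pattern and all names and constants below are illustrative assumptions, so the numbers only demonstrate the search, not the published curves.

```python
LOWER_SLOPE_DB_PER_BARK = 27.0  # assumed slope below the masker center

def upper_slope(level_db):
    """Assumed level-dependent slope (dB/Bark) above the masker center."""
    return max(5.0, 24.0 - 0.2 * level_db)

def masked_threshold(z, zc, level_db):
    """Threshold (dB) at critical band rate z for a masker centered at zc."""
    if z <= zc:
        return level_db - LOWER_SLOPE_DB_PER_BARK * (zc - z)
    return level_db - upper_slope(level_db) * (z - zc)

def masker_level_needed(z_test, test_level_db, zc, lo=-50.0, hi=200.0):
    """Bisection for the masker level that just masks the test tone."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if masked_threshold(z_test, zc, mid) >= test_level_db:
            hi = mid  # masker is strong enough, try a lower level
        else:
            lo = mid
    return hi

def tuning_curve(z_test, test_level_db, centers):
    """Masker level versus masker center: the tuning curve of the test tone."""
    return [(zc, masker_level_needed(z_test, test_level_db, zc)) for zc in centers]
```

For a 10 dB test tone at 8.5 Bark, this toy model needs a 37 dB masker one Bark above the tone but only about 28.3 dB one Bark below it, so the resulting tuning curve is steeper on its high-frequency side.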
The magnitude response of the basilar membrane (or auditory filters) can
be obtained by vertically reversing and scaling the tuning curves in Fig. 2.
This is shown in later subsections in Figs. 3 and 4 by the dashed lines. More
details can be found in [11]. The tuning curves are consistent with the
measurement of nerve tuning curves [8] and the basilar membrane response
[14]. Two auditory filter design techniques that model the magnitude
response accurately are introduced in the next two subsections.
2.2 Filterbank Design by the Log-Magnitude Modeling
Technique
It is well known that the human auditory system gives rise to a perception
of loudness that closely follows a logarithmic scale. Log-magnitude
modeling is a technique for IIR digital filter design [6]. This technique has
also been applied in [10] to the modeling of auditory tuning curves. The
result is a very accurate model that matches the magnitudes of the tuning
curves. The criterion for auditory filter design is based on the minimization
of the difference between the log-magnitude of the desired basilar membrane
frequency response and that of a pole-zero filter. The transfer function of each
filter in a critical band rate filterbank is a rational pole-zero function whose
numerator and denominator coefficients are the filter parameters, where P is
the number of poles and Q is the number of zeros. The filter design technique
minimizes the sum of squared differences, on a logarithmic scale, between a
given set of spectral amplitudes and the magnitude response of the filter
sampled at the same frequencies. The resulting error measure J is evaluated
over a set of uniformly spaced frequencies, and the spectral amplitudes are the
desired basilar membrane frequency response (positive magnitude values) at a
certain center frequency.
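As a sketch of the quantities described in this paragraph, a pole-zero transfer function and a log-magnitude error of the kind described can be written as follows. The symbol names $a_k$, $b_k$, $\omega_i$, $N$ and $D(\omega_i)$ are assumptions introduced here for illustration, and the exact normalization used by the authors may differ.

```latex
H(z) = \frac{\sum_{k=0}^{Q} b_k z^{-k}}{1 + \sum_{k=1}^{P} a_k z^{-k}},
\qquad
J = \sum_{i=1}^{N} \left[ \ln D(\omega_i) - \ln \left| H\!\left(e^{j\omega_i}\right) \right| \right]^2
```

Here $D(\omega_i)$ stands for the desired (positive) magnitude at the uniformly spaced frequencies $\omega_i$. Because $J$ depends on the coefficients through a logarithm of a ratio, it is not quadratic in $a_k$ and $b_k$, which is why an iterative procedure is needed.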
The minimization of J with respect to the filter parameters is a
nonlinear problem. To avoid gradient-based optimization, an iterative
procedure originally proposed in [6] is used. At step m, a weighted
minimization index (3) is formed, and the filter at step m is computed by
solving (4).
The solution of (4) is then used to update the weight function in (3), and the
process is repeated. The complete algorithm converges to a sufficiently
small error within 2 to 3 iterations. The details of this procedure can be
found in [6,10]. A critical band filterbank of 17 filters covering the
frequency range of 50 Hz to 4000 Hz was obtained by this design technique.
The frequency response of the 17 filters is shown in Fig. 3 by the solid lines,
together with the vertically flipped tuning curves by the dashed lines. These
filters are minimum-phase IIR filters with 8 poles and 7 zeros. The
magnitude responses of the digital filters are almost indistinguishable from
the true tuning curves.
2.3 Filterbank Design by Direct Modeling Approach
A unified digital filter model is proposed in [11] to represent the
frequency characteristics of all the tuning curves. The transfer function of
one auditory filter in a critical band filterbank is expressed in the z-domain
by (5). The parameters in (5) are given by (6) and depend on the sampling
frequency. The critical bandwidth and the center frequency in (6) are
calculated from the equations given in [16, 17] as functions of the critical
band rate in Bark. The spacing of the filters is linear on a critical band scale.
One term in (5) produces a notch filter with a sharp dip at a
point to the right of the center frequency, so that the upper-frequency slope
of the overall filter is steep enough. The remaining parameters of (5) are
set by the designer. To ensure that the notch occurs at a frequency location
where the response is about 60 dB lower than at the center frequency, an
empirical formula that we obtained, in which frequency is expressed in Hz,
can be used to choose the notch parameter.
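As a hedged sketch of how the critical band rate and the critical bandwidth can be computed, the following Python code uses the widely quoted analytic approximations of Zwicker and Terhardt. Whether these are exactly the equations intended here is an assumption, as are the function names and the choice of one filter per Bark from 1 to 17 Bark.

```python
import math

def hz_to_bark(f_hz):
    """Critical band rate z (Bark) for a frequency in Hz."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def critical_bandwidth_hz(f_hz):
    """Critical bandwidth (Hz) around a center frequency in Hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

def bark_to_hz(z_bark, lo=0.0, hi=24000.0):
    """Invert hz_to_bark by bisection (the mapping is monotonic)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if hz_to_bark(mid) < z_bark:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative layout: center frequencies spaced linearly on the Bark scale.
centers_hz = [bark_to_hz(z) for z in range(1, 18)]
bandwidths_hz = [critical_bandwidth_hz(fc) for fc in centers_hz]
```

With these approximations, 1 kHz maps to about 8.5 Bark with a critical bandwidth of roughly 162 Hz, and the 17 illustrative centers run from about 100 Hz to just under 4 kHz.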
The frequency responses of five filters at critical bands 4, 7, 10, 13 and
16 are plotted in Fig. 4, together with the corresponding tuning curves. The
modeling accuracy of the direct modeling approach is acceptable, and the
approach is more straightforward than the log-magnitude modeling approach.
Our filters are also compared with the well-known Gammatone auditory
filters [5,13]. Our filters have steeper upper-frequency slopes, which is
desirable for both accurate modeling of the masking effect and noise
suppression. Critical band filters designed using this method can achieve
both high frequency domain accuracy and computational efficiency. Next we
will apply the critical band auditory filterbank to speech and audio
processing.
3. PERCEPTUAL DOMAIN BASED SPEECH AND
AUDIO CODING
3.1 Speech/audio Coding Using an Auditory Filterbank
The speech and audio coding system implemented in this work is an
IIR/FIR analysis/synthesis scheme as described in [9] and also shown in
Figs. 5 and 6. Other possible analysis/synthesis filterbank implementations
can also be found in [9].
Each IIR analysis filter has 8 poles and 3 zeros. The analysis filterbank
can also be implemented in FIR form [3,7], but at least 100 coefficients are
required for each FIR filter to approximate the impulse response of the IIR
filter with reasonable accuracy. The auditory filterbank is also approximately
power-complementary. That is, the squared magnitudes of the frequency
responses of the analysis filters, summed over all M channels, are
approximately constant across frequency. If the synthesis filters are chosen
as time-reversed versions of the analysis filters,
then the synthesis filterbank is implemented using FIR filters obtained by
time-reversal of the impulse responses of the corresponding analysis filters,
and the reconstruction is nearly perfect.
Each FIR synthesis filter has 128 coefficients, so that an 8 ms delay is
required to make the filter causal if the sampling frequency is 16 kHz.
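The effect of time-reversed synthesis can be checked numerically on a single channel. The sketch below uses a toy two-pole resonator as a stand-in for an auditory filter (an assumption, not the filters of this chapter) and truncates its impulse response to 128 taps. The cascade of a filter with its time-reversed copy is the autocorrelation of the impulse response, so it is symmetric about the delay needed for causality.

```python
import math

FS_HZ = 16000   # sampling rate consistent with 128 taps and an 8 ms delay
N_TAPS = 128

def resonator_impulse_response(fc_hz, r=0.95, n=N_TAPS, fs=FS_HZ):
    """Truncated impulse response of a toy two-pole resonator (illustrative only)."""
    theta = 2.0 * math.pi * fc_hz / fs
    a1, a2 = -2.0 * r * math.cos(theta), r * r
    h, y1, y2 = [], 0.0, 0.0
    for k in range(n):
        x = 1.0 if k == 0 else 0.0
        y = x - a1 * y1 - a2 * y2
        h.append(y)
        y1, y2 = y, y1
    return h

def convolve(a, b):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

h = resonator_impulse_response(1000.0)   # analysis filter
g = h[::-1]                              # synthesis filter: time-reversed, made causal
overall = convolve(h, g)                 # analysis followed by synthesis
peak_index = max(range(len(overall)), key=lambda k: abs(overall[k]))
delay_ms = 1000 * N_TAPS / FS_HZ         # nominal delay quoted for 128 taps
```

The overall response peaks at index 127 and is symmetric about it, so each channel contributes a pure delay with no phase distortion. Summing such channels of a power-complementary bank then gives a nearly flat overall response.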
The output of each filter is half-wave rectified, and the positive peaks of
the critical band signals are located. Physically, the half-wave rectification
process corresponds to the action of the inner hair cells, which respond to
movement of the basilar membrane in one direction only. Peaks correspond
to higher rates of neural firing at larger displacements of the inner hair cell
from its position at rest. This process results in a series of critical band pulse
trains, where the pulses retain the amplitudes of the critical band signals
from which they were derived.
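The rectification and peak-picking step can be sketched as follows. The function name and the tie-breaking rule for flat peaks (the first sample of a plateau is taken) are assumptions made for the illustration.

```python
def critical_band_pulses(band_signal):
    """Half-wave rectify one critical band signal and keep only its positive peaks.

    Returns a list of the same length in which each local maximum keeps its
    amplitude and every other sample is zero.
    """
    rectified = [max(0.0, x) for x in band_signal]
    pulses = [0.0] * len(rectified)
    for n in range(1, len(rectified) - 1):
        is_peak = rectified[n - 1] < rectified[n] >= rectified[n + 1]
        if rectified[n] > 0.0 and is_peak:
            pulses[n] = rectified[n]
    return pulses
```

For example, the input `[0, 1, 3, 2, -1, -4, -2, 0.5, 2.5, 2.5, 1]` yields pulses of amplitude 3 at index 2 and 2.5 at index 8, with zeros elsewhere.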
3.2 Auditory Masking
In recognition of the fact that lower power components of the critical
band signals are rendered inaudible by the presence of larger power
components in neighboring critical bands, a simultaneous masking model is
employed. Weak signal components also become inaudible in the presence of
stronger signal components in the same critical band that precede or follow
them in time, and this is called temporal masking. When the signal precedes
the masker in time, it is called pre-masking; when the signal follows the
masker in time, the condition is called post-masking. A strong signal can
mask a weaker signal that occurs after it and a weaker signal that occurs
before it [2, 16, 17]. Both temporal pre-masking and temporal post-masking
are employed in this work to reduce the number of pulses.
3.2.1 Simultaneous Masking
In the implementation described, a simultaneous masking model similar
to that used in MPEG [4] was employed to calculate the masking threshold
for the ith critical band; however, the optimum simultaneous masking
model for this scheme has yet to be determined. The simultaneously masked
pulse train for the ith critical band was obtained from the unmasked pulse
train as follows: pulses whose amplitudes were below the masking
threshold calculated for that critical band were considered inaudible and
were set to zero.
Note that for each 32 ms frame, the gain of each critical band is
calculated based only on the non-zero pulse amplitudes. The purpose of
applying simultaneous masking is to produce a more efficient and
perceptually accurate parameterization of the firing pulses occurring in each
band. Experiments revealed that simultaneous masking removed an average
of around 10% of the pulses without altering the quality of the reconstructed
speech in any way.
3.2.2 Temporal Post-masking
The masking threshold for temporal post-masking decays
approximately exponentially following each pulse, or neural firing. A simple
approximation to this masking threshold, introduced in [3], is computed for
each of the M = 21 simultaneously masked critical band pulse
train signals as a function of the discrete time sample index. The
time constants were determined empirically by listening to the
quality of the reconstructed speech, and a range of values was chosen.
All pulses with amplitudes less than the masking
threshold were discarded. The thresholds are shown in Fig. 7 by the
dashed line, where the filled spikes are the pulses to be kept after applying
post-masking.
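A minimal sketch of post-masking with an exponentially decaying threshold is given below. It is not the exact expression of [3]: the recursive form, the rule that only retained pulses reset the threshold, and the time constant in the example are all assumptions made for illustration.

```python
import math

def post_mask(pulses, tau_samples):
    """Discard pulses that fall below a threshold decaying after each kept pulse.

    pulses: one critical band pulse train (zeros where there is no pulse).
    tau_samples: assumed time constant of the exponential decay, in samples.
    """
    decay = math.exp(-1.0 / tau_samples)
    threshold = 0.0
    kept = []
    for amp in pulses:
        threshold *= decay              # threshold decays by one sample
        if amp > 0.0 and amp >= threshold:
            kept.append(amp)            # audible pulse: keep it
            threshold = amp             # and restart the decay from its amplitude
        else:
            kept.append(0.0)            # no pulse, or pulse masked
    return kept
```

With `tau_samples = 4`, the train `[1.0, 0, 0.5, 0, 0, 0, 0, 0, 0.5]` loses the pulse at index 2, which lies under the decaying threshold of the first pulse, but keeps the equally sized pulse at index 8.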
3.2.3 Temporal Pre-masking
Pre-masking is also allowed for in this work. The masking threshold
for temporal pre-masking is computed from the ith critical band pulse train
after post-masking, with a decay parameter
chosen to simulate the fast exponential decay of pre-
masking. All pulses with amplitude less than the masking threshold
were discarded. This is shown in Fig. 8, where the filled spikes are the pulses
to be kept after applying pre-masking. A further reduction of 10% in the
number of pulses can be achieved by applying pre-masking to the pulses
obtained after post-masking.
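Pre-masking can be sketched in the same spirit by scanning the pulse train backwards in time, so that each retained pulse casts a rapidly decaying threshold over the samples that precede it. The backward recursion and the short time constant in the example are assumptions for illustration, not the authors' formula.

```python
import math

def pre_mask(pulses, tau_samples):
    """Discard pulses masked by a stronger pulse that follows them shortly after.

    pulses: one critical band pulse train, typically after post-masking.
    tau_samples: assumed (short) time constant of the backward decay, in samples.
    """
    decay = math.exp(-1.0 / tau_samples)
    kept = list(pulses)
    threshold = 0.0
    for n in range(len(kept) - 1, -1, -1):   # scan from the last sample to the first
        threshold *= decay
        if kept[n] > 0.0 and kept[n] >= threshold:
            threshold = kept[n]              # this pulse masks what precedes it
        else:
            kept[n] = 0.0
    return kept
```

With `tau_samples = 1`, the train `[0.2, 0.2, 1.0, 0, 0, 0, 0, 0.3]` loses the pulse immediately before the large one but keeps the pulse two samples before it, which reflects the short reach of pre-masking.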
The purpose of applying masking is to produce a more efficient and
perceptually accurate parameterization of the firing pulses occurring in each
band. Experiments show that the application of temporal masking reduces
the overall pulse number to about 0.70N (where N is the frame size) while
maintaining transparent quality of the coded speech and audio. This is a
significant improvement over the pulse number of 1.26N in the previous
application [3], which used Gammatone filters in the front end. The
improvement is mainly due to the spectral shape of the new auditory filters
used in this work.
3.2.4 Thresholding
The pulses in the silent frames obtained after auditory filtering and peak
picking are most likely due to background and quantization noise. These
pulses are at random positions and their magnitudes are very small, so that
the sound synthesized from these pulses is inaudible. By thresholding,
these pulses can be eliminated without affecting the quality of the
synthesized signal. A simple approach is to choose the threshold based on
the silent frames at the beginning of the coding process.
3.3 Quantization and Coding
The pulse train in each critical band after redundancy reduction was
finally normalized by the mean of its non-zero pulse amplitudes across the
frame. Thus, the parameterization consists of the critical band gains
(incorporating the normalization factors) and a series of critical band pulse
trains with normalized amplitudes. For each frame, the signal parameters
required for coding are the gains of the critical bands and the amplitudes
and positions of the pulses.
3.3.1 Pulse Amplitudes
Each critical band gain is quantized to 6 bits and the amplitude of each
pulse is quantized to 1 bit, which does not result in any perceivable
deterioration in the quality of the reconstructed speech or audio signal.
Alternatively, vector quantization can be adopted to reduce the bits required
for coding the amplitudes [3].
3.3.2 Pulse Positions
The pulse positions are coded using a new run-length coding technique.
After temporal masking and thresholding, most locations on the time-
frequency map have zero pulses. This suggests that we can just code the
relative positions of neighboring pulses or the numbers of zeros between
them. Specifically, the data in all channels within one frame are concatenated
into one large vector, which is scanned for pulses. Then the number of zeros
preceding each pulse is coded using 7 bits.
If the number of zeros is over 128, a code word of 0000000 is generated and
the counting of zeros restarts after the 128 zeros. If, during the decoding
process, the code word 0000000 (seven consecutive zero bits) is encountered,
then no pulse is generated and the decoding carries on to the next code word.
This coding strategy is a form of run-length coding and is lossless.
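A decodable variant of this run-length scheme can be sketched as follows. The text reserves the code word 0000000 and mentions a run of 128 zeros, but it does not spell out how gap lengths map onto the remaining 127 code words. The mapping below is therefore an assumption chosen so that every gap is representable: a code word k from 1 to 127 means k-1 zeros followed by a pulse, and the escape word 0 stands for 127 zeros with no pulse (127 rather than 128, which is a deviation from the text).

```python
ESCAPE = 0        # code word 0000000: a run of zeros with no pulse
ESCAPE_RUN = 127  # zeros consumed by one escape word (assumed, see lead-in)

def encode_positions(mask):
    """Encode a 0/1 pulse mask as a list of 7-bit code words (integers 0..127)."""
    codes, zeros = [], 0
    for flag in mask:
        if flag:
            codes.append(zeros + 1)   # 1..127: zeros before this pulse, plus one
            zeros = 0
        else:
            zeros += 1
            if zeros == ESCAPE_RUN:
                codes.append(ESCAPE)  # long gap: emit escape and restart counting
                zeros = 0
    return codes                      # trailing zeros are implied by the frame length

def decode_positions(codes, length):
    """Rebuild the 0/1 pulse mask of a frame of known length."""
    mask = []
    for c in codes:
        if c == ESCAPE:
            mask.extend([0] * ESCAPE_RUN)
        else:
            mask.extend([0] * (c - 1) + [1])
    mask.extend([0] * (length - len(mask)))
    return mask
```

For a mask with pulses at positions 2, 3 and 304 in a vector of 310 samples, the encoder emits the five code words `[3, 1, 0, 0, 47]`, that is 35 bits, and the decoder recovers the mask exactly.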
The overall average bit rate resulting from this coding scheme is 58 kbps.
This is an improvement upon the 69.7 kbps in the previous work [3]. By
exploring the statistical correlations and redundancy among the pulses,
Huffman or arithmetic coding can be applied to further reduce the bit rate.
The synthesis process starts with decoding to obtain the pulse train for
each channel, and then filtering the pulse train by the corresponding FIR
synthesis filter. Summing the outputs from all filters results in the
reconstructed speech or audio signal, which is perceptually the same as the
original. The results at different stages are shown in Figs. 9-12, where Fig. 9
is the original speech signal, Fig. 10 shows the pulses obtained from peak-
picking, Fig. 11 shows the pulses retained after applying auditory masking,
and Fig. 12 is the reconstructed speech.
4. CONCLUSIONS
Design techniques for a new critical band auditory filterbank that models
the psychoacoustic tuning curves have been proposed. The auditory
filterbank has been applied to speech and audio coding. The filterbank is
implemented as an IIR/FIR analysis/synthesis scheme to reduce
computation. Auditory masking is applied to reduce the number of pulses.
A simple run-length coding algorithm is used to code the positions of the
pulses. The reconstructed speech or audio signals are perceptually
transparent. The overall average bit rate resulting from this coding scheme is
58kbps. The filterbank has superior masking properties and the auditory-
system-based coding paradigm produces high quality coded speech or audio,
is highly scalable, and is of moderate complexity. Current research involves
investigation into the use of Huffman coding or arithmetic coding
techniques to further reduce the bit rate by examining the statistical
correlation and redundancy among the pulses.
REFERENCES
[1] Ambikairajah, E., Black, N.D. and Linggard, R., “Digital filter simulation of the
basilar membrane”, Computer Speech and Language, 1989, vol. 3, pp. 105-118.
[2] Ambikairajah, E., Davis, A.G., and Wong, W.T.K., “Auditory masking and MPEG-1
audio compression”, Electr. & Commun. Eng. Journal, vol. 9, no. 4, August 1997,
pp. 165-197.
[3] Ambikairajah, E., Epps, J. and Lin, L., “Wideband speech and audio coding using
Gammatone filter banks”, Proc. ICASSP, 2001, pp. 773-776.
[4] Black, M. and Zeytinoglu, M., “Computationally efficient wavelet packet coding of
wide-band stereo audio signals”, Proc. ICASSP, 1995, pp. 3075-3078.
[5] Flanagan, J.L., “Models for approximating basilar membrane displacement”, Bell
Sys. Tech. J., 1960, vol. 39, pp. 1163-1191.
[6] Kobayashi, T. and Imai, A., “Design of IIR digital filter with arbitrary log magnitude
function by WLS techniques”, IEEE Trans. ASSP, vol. ASSP-38, 1990, pp. 247-252.
[7] Kubin, G. and Kleijn, W.B., “On speech coding in a perceptual domain”, Proc.
ICASSP, 1999, pp. 205-208.
[8] Liberman, M.C., “Auditory-nerve response from cats raised in a low-noise chamber”,
J. Acoust. Soc. Am., vol. 63, 1978, pp. 442-455.
[9] Lin, L., Holmes, W.H. and Ambikairajah, E., “Auditory filter bank inversion”, Proc.
ISCAS 2001, vol. 2, pp. 537-540.
[10] Lin, L., Ambikairajah, E. and Holmes, W.H., “Log-magnitude modelling of auditory
tuning curves”, Proc. ICASSP, 2001, pp. 3293-3296.
[11] Lin, L., Ambikairajah, E. and Holmes, W.H., “Auditory filterbank design using
masking curves”, Proc. EUROSPEECH 2001, pp. 411-414.
[12] Lyon, R.F., “A computational model of filtering detection and compression in the
cochlea”, Proc. ICASSP, 1982, pp. 1282-1285.
[13] Patterson, R.D., Allerhand, M., and Giguere, C., “Time-domain modelling of
peripheral auditory processing: a modular architecture and a software platform”, J.
Acoust. Soc. Am., vol. 98, 1995, pp. 1890-1894.
[14] Rhode, W.S., “Observation of the vibration of the basilar membrane of the squirrel
monkey using the Mossbauer technique”, J. Acoust. Soc. Am., vol. 49, 1971, pp.
1218-1231.
[15] Robert, A. and Eriksson, J., “A composite model of the auditory periphery for
simulating responses to complex sounds”, J. Acoust. Soc. Am., vol. 106, 1999, pp.
1852-1864.
[16] Zwicker, E. and Zwicker, U.T., “Audio engineering and psychoacoustics: matching
signals to the final receiver, the human auditory system”, J. Audio Eng. Soc., vol. 39,
no. 3, 1991, pp. 115-125.
[17] Zwicker, E. and Fastl, H., Psychoacoustics: Facts and models. Springer-Verlag,
1999.
Chapter 3
RECOGNITION OF ENVIRONMENTAL SOUNDS
USING SPEECH RECOGNITION TECHNIQUES
Michael Cowling and Renate Sitte
Griffith University, Gold Coast, Qld 9726, Australia
Abstract: This paper discusses the use of speech recognition techniques in non-speech
sound recognition. It analyses the different techniques used for speech
recognition and identifies those that can be used for non-speech sound
recognition. It then performs benchmarks on these techniques and determines
which technique is better suited for non-speech sound recognition. As a
comparison, it also gives results for the use of learning vector quantization
(LVQ) and artificial neural network (ANN) techniques in speech recognition.
Key words: non-speech sound recognition, environmental sound recognition, artificial
neural networks, learning vector quantization, dynamic time warping, long-
term statistics, mel-frequency cepstral coefficients, homomorphic cepstral
coefficients
1. INTRODUCTION
It has long been a goal of researchers around the world to build a
computer that displays features and characteristics similar to those of human
beings. The research of Brooks [1] is an example of developing human-like
movement in robots. However, another subset of this research is to develop
machines that have the same sensory perception as human beings. This work
finds its practical application in the wearable computer domain (e.g. in certain
cases of deafness where a bionic ear (cochlear implant) cannot be used).
Humans use a variety of different senses in order to gather information
about the world around them. If we were to list the classic five human senses
in order of importance, it is generally accepted that we would come up with
the sequence: vision, hearing, touch, smell, taste.
Vision is undoubtedly the most important sense, with hearing being the
next most important, and so on. However, despite the fact that hearing is a human
being's second most important sense, it is all but ignored when trying to build
a computer that has human-like senses. The research that has been done into
computer hearing revolves around the recognition of speech, with little
research done into the recognition of non-speech environmental sounds.
This chapter expands upon the research done by the authors [2, 3]. In
these papers, a prototype system is described that recognizes 12 different
environmental sounds (as well as performing direction detection in 7
directions in a 180° radius). This system was implemented using Learning
Vector Quantization (LVQ), because LVQ is able to produce and modify its
classification vectors so that multiple sounds of a similar nature are still
considered as separate classes. However, no comparative testing was done to
ensure that LVQ was the best method for the implementation of a non-
speech sound classification system.
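The way LVQ modifies its classification vectors can be illustrated with the basic LVQ1 update rule. This is a generic sketch rather than the prototype system of [2, 3]; the feature vectors, class names and learning rate are hypothetical. Several codebook vectors may carry the same or closely related labels, which is how similar sounds can still be kept as separate classes.

```python
def nearest(codebook, x):
    """Index of the codebook vector closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(codebook[i][0], x)))

def lvq1_step(codebook, x, label, lr=0.1):
    """One LVQ1 update: pull the winner towards x if its class matches, else push it away.

    codebook: list of (vector, class_label) pairs, updated in place.
    Returns the index of the winning codebook vector.
    """
    i = nearest(codebook, x)
    vec, cls = codebook[i]
    sign = 1.0 if cls == label else -1.0
    codebook[i] = ([w + sign * lr * (v - w) for w, v in zip(vec, x)], cls)
    return i

# Hypothetical two-class codebook with two-dimensional feature vectors.
codebook = [([0.0, 0.0], "door"), ([1.0, 1.0], "glass")]
lvq1_step(codebook, [0.2, 0.0], "door", lr=0.5)    # correct winner moves closer
lvq1_step(codebook, [0.9, 0.9], "door", lr=0.5)    # wrong winner moves away
```

After these two steps the "door" vector has moved to `[0.1, 0.0]` and the "glass" vector has been pushed away from the misclassified sample to approximately `[1.05, 1.05]`.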
Therefore, this chapter will review the various techniques that can be
used for non-speech recognition and perform benchmark tests to determine
the technique most suited for non-speech sound recognition. Due to lack of
research into non-speech classification systems, this chapter will focus on
using speech and speaker recognition techniques applied to the domain of
environmental non-speech sounds.
The remainder of this chapter will be split into four sections. The first
section will discuss techniques that have been previously used for speech
recognition and identify those techniques that could also be applied to non-
speech recognition. The second section will show the results of benchmarks
on these techniques and also compare their performance with results for
speech recognition. The third section of this chapter will discuss these
results. Finally, the fourth section will conclude and suggest areas for future
research.
Research into speech recognition began by reviewing the literature and
finding techniques that had previously been used for speech/speaker
recognition. Techniques for both feature extraction and system learning were
analyzed and those techniques that could be used for non-speech sound
recognition were identified. These techniques were then benchmarked and
results will be presented in the Results section.
2. SELECTION OF TECHNIQUES
In addition, it was found that emerging research in speech recognition
suggests the use of time-frequency techniques such as wavelets. Due to the
emerging nature of this research, these techniques will not be included in this
comparison. However, for an insight into how wavelets can be used for
speaker recognition, please refer to the chapter in this volume by Michael
Orr et al., "A Novel Dual Adaptive Approach to Speech Processing".
A specific investigation was then performed for each of these eight
techniques. This investigation revealed that techniques based on LPC
Cepstral Coefficients were based on the idea of a vocoder, which is a
simulation of the human vocal tract. Since the human vocal tract does not
produce environmental sounds, these techniques are not appropriate for
recognition of non-speech sounds.
In addition, Lilly [4] mentions that the results of the Mel Frequency
Based Filter and the Bark Frequency filter are similar, mainly due to the
similar nature of these filters. Gold [5] also mentions that PLP and Mel
Frequency are similar techniques. Based on these previous findings, only the
more popular Mel Frequency technique was selected for benchmarking.
3. Recognition of Environmental Sounds 33
2.1 Feature Extraction
For feature extraction, the literature review showed that speech
recognition relies on only a few different types of feature extraction
techniques (each with several different variations). Eight techniques were
selected as possible candidates for feature extraction of non-speech sounds.
These were:
Frequency Extraction
LPC Cepstral Coefficients
Homomorphic Cepstral Coefficients
Mel Frequency Cepstral Coefficients
Mel Frequency LPC Cepstral Coefficients
Bark Frequency Cepstral Coefficients
Bark Frequency LPC Cepstral Coefficients
Perceptual Linear Prediction Features
This leaves three feature extraction techniques to be tested:
Frequency Extraction
Homomorphic Cepstral Coefficients
Mel Frequency Cepstral Coefficients
To aid in selection of techniques, comparison tables were built (using [5,
6, 7, 8]) to compare the different feature extraction and classification
methods used by each of these techniques.
The comparison tables showed that some of these techniques, by their
very nature, could not be used for non-speech sound recognition. Any of the
techniques that use subword features are not suitable for non-speech sound
identification. This is because environmental sounds lack the phonetic
structure that speech does. There is no set “alphabet” that certain slices of
non-speech sound can be split into, and therefore subword features (and the
related techniques) cannot be used.
Due to the lack of an environmental sound alphabet, the Hidden Markov
Model (HMM) based techniques shown above will be difficult to implement.
However, this technique may be revisited in the future if other techniques
produce lower than expected results.
In addition, it was decided that the SOM and LVQ techniques
complement each other. Kohonen developed both techniques, with specific
applications intended for each technique. For classification, Kohonen
suggests the use of the LVQ technique over the SOM technique [9].
Therefore, LVQ will be the technique benchmarked.
2.2 System Learning
Based on this information, the four techniques left to be tested are:
Dynamic Time Warping
Long-Term Statistics
Vector Quantization / Learning Vector Quantization
Artificial Neural Networks
The following system learning techniques are commonly used for
speech/speaker recognition or have, in the past, been used for this
application domain. They are:
Dynamic Time Warping (DTW)
Hidden Markov Models (HMM)
Vector Quantization (VQ) / Learning Vector Quantization (LVQ)
Self-Organizing Maps (SOM)
Ergodic HMMs
Artificial Neural Networks (ANN)
Long-Term Statistics
This section details how each of the techniques listed above was
implemented in this system. It also discusses the details of the experiment
(such as the number of sounds).
The techniques will be tested using a jackknife method, identical to the
method used by Goldhor [10]. A jackknife testing procedure involves
training the network with all of the data except the sound that will be tested.
This sound is then tested against the network and the classification is
recorded. In cases where the setting of initial weights may affect the
classification result (as is the case with LVQ and ANN techniques),
classification is repeated 5 times, with different initializations each time. A
correct classification is only recorded if more than three of the training runs
are correct. This jackknife procedure will be repeated with all six of the
samples from each of the eight sounds.
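The jackknife procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' MATLAB code: the function names, the toy data and the nearest-neighbour stand-in classifier are all assumptions. The only details taken from the text are the leave-one-out loop, the five differently initialized runs, and the rule that more than three of those runs must be correct.

```python
def jackknife_rate(samples, labels, train_and_classify, runs=5, threshold=3):
    """Leave-one-out test: hold out each sample, train on the rest, classify it.

    train_and_classify(train_x, train_y, test_x, seed) must return a label.
    A held-out sample counts as correct only if more than `threshold` of the
    `runs` differently seeded runs classify it correctly.
    """
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits = sum(
            train_and_classify(train_x, train_y, samples[i], seed) == labels[i]
            for seed in range(runs)
        )
        if hits > threshold:
            correct += 1
    return correct / len(samples)


def nearest_neighbour(train_x, train_y, test_x, seed):
    # Hypothetical stand-in classifier; it has no random initialisation,
    # so the seed is ignored.
    best = min(range(len(train_x)), key=lambda k: abs(train_x[k] - test_x))
    return train_y[best]


samples = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]   # toy one-dimensional features
labels = ["a", "a", "a", "b", "b", "b"]
rate = jackknife_rate(samples, labels, nearest_neighbour)
print(rate)
```

For techniques without random initialization (such as DTW), a single run with `runs=1, threshold=0` would give the plain leave-one-out rate.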
3. ANALYTICAL ANALYSIS OF SPEECH
RECOGNITION TECHNIQUES
3.1 Experiment Setup
As an initial test, eight sounds were used, each with six different samples.
Data set size was kept as small as possible due to the time it takes to train
larger data sets. The sounds used for this test are detailed below and are
some typical sounds that would be classified in a sound surveillance system.
3.2 Benchmarking Method
The feature extraction and system learning techniques shown in the
comparison will be tested for their ability to classify non-speech sounds in
two ways. First, benchmarking will be performed, using these techniques, on
non-speech sounds and data on the parameters, the resulting time taken and
the final correct classification rate will be recorded. Then, these results will
be compared with statistics and benchmark results reported in the literature
for the performance of these techniques on speech. This will demonstrate
how these techniques perform against each other on speech and provide a
comparison to the results for non-speech.
In addition, since feature extraction and system learning are both required
to recognize a sound, each system learning technique should also be tested
against each feature extraction technique to determine the best combination
of these two techniques. The exception to this is the Long-Term Statistics
technique, which generates its own features and therefore requires no feature
extraction techniques. Therefore, ten combinations of techniques must be
benchmarked:
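The count of ten follows from the selections made earlier in the chapter: three feature extraction techniques paired with the three system learning techniques that need features, plus Long-Term Statistics on its own. A short enumeration, offered only as a check of that arithmetic (the variable names are illustrative):

```python
from itertools import product

features = [
    "Frequency Extraction",
    "Homomorphic Cepstral Coefficients",
    "Mel Frequency Cepstral Coefficients",
]
learners = [
    "Dynamic Time Warping",
    "Learning Vector Quantization",
    "Artificial Neural Networks",
]

# Every learner that needs external features is paired with every feature set.
combinations = list(product(features, learners))
# Long-Term Statistics generates its own features, so it is benchmarked alone.
combinations.append((None, "Long-Term Statistics"))

for feature, learner in combinations:
    print(feature or "(own features)", "+", learner)
print(len(combinations))
```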
3.3 Methodology
Each of the techniques used was implemented in MATLAB. Both feature
extraction and system learning techniques were implemented and then
combined together in the way shown above in order to perform a
comprehensive comparison. In this section, the implementation of both the
feature extraction and system learning techniques will be discussed.
3.3.1 Feature Extraction Techniques
Three feature extraction techniques will be tested in this comparison. The
implementation of each of these techniques will be discussed in this section.
3.3.1.1 Frequency Extraction
Frequency Extraction was performed using the Fast Fourier Transform
(FFT) routine in MATLAB, which uses the following equation for FFT:
$$X(k) = \sum_{j=1}^{N} x(j)\,\omega_N^{(j-1)(k-1)}$$
with
$$\omega_N = e^{-2\pi i/N}$$
where k is the frequency we wish to check for, j counts all the samples in
the signal and N is the length of the signal being tested. Since non-speech
sound covers a wider frequency range than speech (anywhere from 0 Hz to
20,050 Hz, the approximate limit of human hearing), a 44,100 point FFT (N
= 44100) was performed and the results (22,050 unique features) were used
to train the system learning network.
3.3.1.2 Mel-Frequency Cepstral Coefficients
The MFCC algorithm was taken from the Auditory Toolbox by Malcolm
Slaney of Interval Research Corporation [11]. This toolbox is in wide use in
the research community. This toolbox applies three steps to produce the
MFCC. First, it applies a Hamming window using the standard Hamming
window equation:
where n represents the subset of the signal which is being windowed. A
Mel Frequency filter bank is then applied to each windowed segment.
The mel frequency filter bank m is a logarithmic calculation using the
following relation:
where f represents the range of frequencies in the signal. Each filter is then
multiplied by the spectrum (or portion of the spectrum if it has been split
using Hamming windows) to produce a series of magnitude values (one for
each filter). Finally, a Cepstral Coefficient formula (shown in the next
section) is applied to produce the MFCC, and these features are then arranged
into a vector that is more appropriate for training a network. Special
attention was paid to removing the first scalar within the vector, which
represents the total signal power [5] and is therefore too sensitive to the
amplitude of the signal [4].
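The three MFCC steps can be sketched in Python with NumPy. This is not the Auditory Toolbox implementation: the mel relation m = 2595 log10(1 + f/700), the triangular filter shape, the DCT-II used as the cepstral step, and the parameter values (20 filters, 13 coefficients) are common textbook choices assumed here for illustration. The Hamming window is NumPy's `np.hamming`, i.e. w(n) = 0.54 - 0.46 cos(2πn/(N-1)).

```python
import numpy as np


def hz_to_mel(f):
    # Assumed mel relation (a widely used form, not confirmed by the source).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        lo, centre, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - lo) / (centre - lo)
        falling = (hi - bin_freqs) / (hi - centre)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    windowed = frame * np.hamming(frame.size)            # step 1: Hamming window
    spectrum = np.abs(np.fft.rfft(windowed))             # magnitude spectrum
    energies = mel_filterbank(n_filters, frame.size, sample_rate) @ spectrum  # step 2
    log_energies = np.log(energies + 1e-10)              # small offset avoids log(0)
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    coeffs = dct_basis @ log_energies                    # step 3: cepstral coefficients
    return coeffs[1:]                                    # drop the overall-level term


t = np.arange(512) / 16000.0
features = mfcc_frame(np.sin(2 * np.pi * 1000.0 * t), 16000)
print(features.shape)
```

Dropping the first coefficient mirrors the text's removal of the first scalar, which tracks overall signal level rather than spectral shape.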
3.3.1.3 Homomorphic Cepstral Coefficients
The MFCC algorithm from the Auditory Toolbox by Malcolm Slaney of
Interval Research Corporation [11] was then used as a basis to implement a
Homomorphic Cepstral Coefficient (HCC) algorithm. This algorithm was
written from scratch but based on information from the source code in the
MFCC algorithm.
The HCC algorithm applies the cepstral coefficient formula directly to
the signal after it had been split using hamming windows. To calculate
cepstral coefficients we use the following relation:
where
and n is the length of the windowed segment being manipulated. These
features were then modified into a vector that was more appropriate for
training a network. As with the MFCC, special attention was paid to
removing the first scalar within the vector, which represents the total signal
power [5] and is therefore too sensitive to the amplitude of the signal [4].
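A minimal sketch of a homomorphic (real) cepstrum for one windowed segment is given below, assuming the usual definition: the inverse FFT of the log magnitude spectrum. The function name, the number of coefficients kept and the test signal are illustrative assumptions. The example also shows why the first coefficient is dropped: scaling the signal's amplitude changes only that term.

```python
import numpy as np


def homomorphic_cepstrum(frame, n_coeffs=13):
    """Real cepstrum of one Hamming-windowed frame, without the first coefficient."""
    windowed = frame * np.hamming(frame.size)
    log_magnitude = np.log(np.abs(np.fft.fft(windowed)) + 1e-10)
    cepstrum = np.fft.ifft(log_magnitude).real
    # Index 0 carries the overall level of the frame, so it is discarded.
    return cepstrum[1:n_coeffs]


rng = np.random.default_rng(0)
frame = rng.standard_normal(256)           # stand-in for a segment of sound
quiet = homomorphic_cepstrum(frame)
loud = homomorphic_cepstrum(10.0 * frame)  # same sound, ten times the amplitude
print(np.allclose(quiet, loud, atol=1e-6))
```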
3.3.2 System Learning Techniques
Four system-learning techniques will be tested in this comparison. The
implementation of each of these techniques will be discussed in this section.
3.3.2.1 Learning Vector Quantization
Learning vector quantization (LVQ) was implemented using the inbuilt
LVQ routines in MATLAB’s neural network toolbox. The network was
initialized with 20 competitive neurons and a learning rate of 0.05. This
combination was found to give an acceptable classification rate.
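For readers without the MATLAB toolbox, the basic LVQ1 update rule can be sketched as below. This is an assumption-laden illustration, not the toolbox routine: it uses two prototypes per class on toy data rather than 20 competitive neurons, and only the learning rate of 0.05 is taken from the text. The winning prototype is pulled towards a training sample of its own class and pushed away from a sample of another class.

```python
import numpy as np


def train_lvq1(x, y, per_class=2, learning_rate=0.05, epochs=30, seed=0):
    rng = np.random.default_rng(seed)
    protos, proto_labels = [], []
    for c in np.unique(y):
        members = x[y == c]
        # Initialise each class's prototypes from randomly chosen class members.
        picks = rng.choice(len(members), size=per_class, replace=True)
        protos.append(members[picks])
        proto_labels += [c] * per_class
    protos = np.vstack(protos).astype(float)
    proto_labels = np.array(proto_labels)
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            winner = np.argmin(np.linalg.norm(protos - x[i], axis=1))
            sign = 1.0 if proto_labels[winner] == y[i] else -1.0
            protos[winner] += sign * learning_rate * (x[i] - protos[winner])
    return protos, proto_labels


def classify_lvq(protos, proto_labels, sample):
    # The class of the nearest prototype is the predicted class.
    return proto_labels[np.argmin(np.linalg.norm(protos - sample, axis=1))]


rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(10.0, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
protos, proto_labels = train_lvq1(x, y)
print(classify_lvq(protos, proto_labels, np.array([0.2, -0.1])))
print(classify_lvq(protos, proto_labels, np.array([9.8, 10.1])))
```

Because the prototypes depend on the random initialization, different seeds can give different classifiers, which is why the benchmark repeats classification with several initializations.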
3.3.2.2 Artificial Neural Networks
Artificial neural network (ANN) was implemented using the fast back
propagation algorithm (BPA) in the MATLAB neural network toolbox
(trainbpx). The network was initialized with 20 hidden neurons and a
learning rate of 0.05. In addition, sum-squared error was set to 0.1 and the
momentum constant was set to 0.95.
3.3.2.3 Dynamic Time Warping
Dynamic time warping (DTW) was implemented using the algorithm in
the Auditory Toolbox developed by Malcolm Slaney [11]. The test signal
was warped against each of the reference signals and the error was recorded.
The smallest error was taken to represent the closest class of sound.
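The matching step can be illustrated with a textbook dynamic time warping recurrence. This is a sketch under stated assumptions, not the routine used in the chapter: it works on one-dimensional sequences with an absolute-difference local cost, and the function names and reference data are hypothetical.

```python
import numpy as np


def dtw_distance(a, b):
    """Accumulated cost of the best warping path between two 1-D sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def classify_dtw(test, references):
    # references: list of (label, sequence); the smallest warping error wins.
    return min(references, key=lambda ref: dtw_distance(test, ref[1]))[0]


references = [("ramp", [0, 1, 2, 3]), ("flat", [2, 2, 2, 2])]
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))
print(classify_dtw([0, 0, 1, 2, 3, 3], references))
```

The first call shows the point of warping: a time-stretched copy of a sequence has zero error against the original.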
3.3.2.4 Long-Term Statistics
Long-Term Statistics (LTS) was implemented using the mean and
covariance functions available in the standard MATLAB distribution, where
N is the length of the signal x. Mean and covariance were calculated for each
of the reference signals and stored in a matrix. The mean and covariance of
the test signal were then compared to this matrix. The closest match was
selected as the correct class. If the closest mean and covariance occurred in
different classes, the test was deemed inconclusive.
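The decision rule above can be sketched as follows. The sketch assumes one-dimensional signals, for which the covariance reduces to the sample variance, and absolute difference as the measure of closeness; neither detail is stated in the text, and the names and data are illustrative.

```python
import numpy as np


def long_term_stats(signal):
    signal = np.asarray(signal, dtype=float)
    # For a single 1-D signal the covariance is just the sample variance.
    return signal.mean(), signal.var(ddof=1)


def classify_lts(test_signal, references):
    """references: list of (label, signal). Returns a label, or None if inconclusive."""
    test_mean, test_var = long_term_stats(test_signal)
    stats = [(label, *long_term_stats(sig)) for label, sig in references]
    by_mean = min(stats, key=lambda s: abs(s[1] - test_mean))[0]
    by_var = min(stats, key=lambda s: abs(s[2] - test_var))[0]
    # The two statistics must agree on the class; otherwise no decision is made.
    return by_mean if by_mean == by_var else None


references = [("quiet", [0.0, 0.1, -0.1, 0.0]), ("loud", [5.0, 9.0, 1.0, 5.0])]
print(classify_lts([0.05, -0.05, 0.1, 0.0], references))
print(classify_lts([5.0, 5.1, 4.9, 5.0], references))
```

The second test signal has the mean of the "loud" reference but the variance of the "quiet" one, so the result is inconclusive.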
4. RESULTS & DISCUSSION
This section will cover the results of this research. Results are shown for
the comparative study of existing speech recognition techniques when these
techniques are applied to non-speech. In addition, a discussion is given on
these results.
4.1 Results
4.1.1 Non-Speech Sound Recognition
Results for non-speech sound recognition are presented below.
4.1.2 Speech Recognition
For comparison, results were found for LVQ and ANN in speech
recognition systems. These results are presented here. Because HMM
methods currently dominate speech recognition, results for DTW are
difficult to find; therefore, no DTW results are presented.
For ANNs, a selection of results from Castro and Perez [12] is shown
below. Their results were taken on an isolated word recognition set with
typically high classification error, the Spanish EE-set. The Multi-Layer
Perceptron (MLP) tested used the back propagation algorithm, contained 20
hidden neurons and was trained over 2000 epochs with various numbers of
inputs. The figures given are the MLP’s estimated error rate with a 95%
confidence interval.
For LVQ, results from Van de Wouver et al. [13] are shown below for
both female and male voices. These results present statistics for both a
standard LVQ implementation for speech recognition and an implementation
of LVQ that then has fuzzy logic performed on it (FILVQ). As can be seen
from the results, the use of LVQ for speech recognition produces rather low
recognition results.
THE SPIRIT LEVEL
This is necessary on outdoor structures which are to be placed on
foundations, in securing level or horizontal timbers, and in plumbing
the uprights. The human eye is not equal to the task. Masons and
builders make use of wooden plumb rods, but as the level is
necessary to secure the horizontals, it will be at hand for the
uprights, the two glass tubes being at right angles. (Fig. 131.)
Fig. 131. The spirit level
RULE
A two-foot, four-fold, boxwood rule, graduated to eighths outside
and sixteenths inside, will answer all ordinary requirements. (Fig.
132.)
THE STEEL SQUARE
Fig. 132. Steel square and rule
This simple but valuable tool, about which volumes have been
written, is necessary for building construction, but is not needed in
the making of furniture or cabinet work.
Fig. 133. The nail box
Fig. 134. Socket chisels
XXIII
MAKING NAIL BOXES
The boys now became very busy completing their shop equipment,
and the first project was a box for holding different sizes of nails.
This was to be kept on the bench where it could be reached
conveniently, and it is shown in Fig. 133.
After studying the sketch, Harry made out the bill of material:
2 pcs. pine 15 × 1 3⁄4 × 1⁄2
2 pcs. pine 3 × 1 3⁄4 × 1⁄2
2 pcs. pine 3 1⁄2 × 1 3⁄4 × 3⁄8
These six pieces were squared up, and
the joints for the two partitions laid
out by placing them edge to edge in
the vise. Pencil lines were drawn
across the faces at random, a. Ralph
explained that by fitting these pencil
lines they could at any time bring the
two pieces together in the original
position.
The four knife lines representing the
edges of the grooves were next drawn,
and squared half-way down on each
edge, using the face with the pencil
lines as a working face. The bottom of
the groove was laid off with the
marking gauge set at 1⁄4 inch. The
wood inside the lines was removed by
making a saw cut just inside the knife lines, and cutting out with a
3⁄8-inch chisel.
This led to a talk on chisels. Ralph explained that for fine work a
"firmer" chisel was used, having a comparatively thin body.
There are two kinds of handles, known as "socket" and "tang." The
chisels having "tangs" should never be hammered, as the tang acts
as a wedge and splits the handle. Where blows are to be struck with
the mallet, a socket handle should be used. (Fig. 134.) For heavy
work, where hard blows are to be struck, as in house-framing, and
out-of-door work generally, the heavy framing tool should be used.
The handle of this chisel has a heavy iron ring near the top to keep
it from going to pieces.
Our boys' equipment at this time consisted of one half-inch and a
one-inch firmer chisel with tang handles, a 1⁄8-inch and 3⁄8-inch
socket firmer, and one 1⁄2-inch framing chisel. Later on they added a
1⁄4-inch firmer with tang handle.
The grooves for the nail box were cut with the 3⁄8-inch chisel
without the aid of the mallet.
Ralph showed how, by inclining the tool at a slight angle, a paring
action could be obtained, and by working from both ends of the
groove no corners were destroyed.
When the four grooves were finished, the box was ready for
assembling. This called for hammer and nails.
Wire nails are so cheap now that the old-fashioned cut nails have
been largely driven from the market.
The nails used on the box were one-inch brads.
The holding power of flat-head nails is of course much greater than
bung head, but in this case the box was to be squared up after
nailing, exactly as if it were a solid block of wood. This meant
planing the sides and ends, and as the nails would ruin the plane
iron, they were all sunk below the surface with a nail set or punch.
(Fig. 135). This is a useful tool, but not absolutely necessary, as for
light work a wire nail, with the point ground flat on the grindstone,
will answer the same purpose. A carpenter frequently uses the edge
of a flat-head nail instead of the punch.
Photograph by Arthur G. Eldredge
The Correct Way to Hold the Chisel.
Fig. 135. Wire nails and nail sets
The box was assembled by nailing together the sides and ends. The
bottom was next put on, holding the try square along one side and
end to make sure everything was square, and last of all the two
partitions were pushed down into their grooves, and tied in place by
one brad from each side. Next, all nails were set, and the outside
tested with the try square and trued up with the plane.
The cabinet of drawers
shown in Fig. 136 was
next designed to keep
the assortment of
screws and nails, which
the boys knew would
soon accumulate. As far
as possible, they were
kept in their original
paper boxes, on which
the sizes were plainly
printed.
The twelve drawers
were simply boxes
without covers or
partitions, and Ralph
suggested that it was
not necessary to make
them all at once, but that they could often
fill in spare time that way, and gradually
complete the dozen.
Fig. 136. Cabinet for nails and screws
After making the nail box with partitions, this was a simple job, it
being only important that they all be of the same size.
The construction of the cabinet, however, brought new problems.
The shelves, being short, did not require any vertical support except
at the ends, where they were gained into the sides, and to give
Harry practice the top and bottom were to be "rabbeted" into the
sides. The sides then were the most important parts. All six pieces
were first squared up to the dimensions called for in the drawing.
The list of material was as follows:
4 pcs. 24 5⁄8 × 12 × 1⁄2 shelves
2 pcs. 14 × 12 × 1⁄2 ends
1 pc. 25 1⁄8 × 14 × 1⁄4 back
"The grain must run the long way," said Ralph, "so the grooves will
be across the grain."
The four grooves were laid out with knife and try square, and the
lines scored as deeply with the knife as possible.
Then another cut was made with the knife inside of the first, and
with the knife held at about 45 degrees, cutting out a V-shaped
groove, as shown at a.
In each of these grooves a cut with the back saw was made down to
the line, and the wood removed with the 3⁄8-inch chisel. There are
special planes, called rabbet planes, and plows for doing this kind of
work, but it is good practice for beginners to use the chisel.
The grooves finished, the cabinet was put together with 1 1⁄2-inch
brads, except the back. This being of thin material, and having no
special strain on it, was nailed on with 1-inch brads. The total width
of the drawers in each tier was 1⁄8 inch less than the space. This
gave clearance, so that they could be moved in or out easily.
Later, when all twelve drawers were finished, the boys bought a
dozen simple drawer pulls, and screwed one in the centre of each
box.
The centre was found by drawing the diagonals in light pencil lines.
The front and ends were sand-papered, and given two coats of dark-
green stain, and the cabinet was placed on a shelf against the wall.
XXIV
BIRD HOUSES
The boys felt that they were ready for business, and Ralph
suggested that they had provided enough weather vanes and
windmills, but had made no provisions for the birds.
The cat, that arch enemy of the native birds, had driven the robins,
martins, and wrens all away. Each year some of these brave little
birds started homes in the trees near the house only to have their
families devoured as soon as they were hatched.
A bird house to be attractive need not be very pretentious, but it
must absolutely be cat-proof, or the birds will inspect it carefully
from all points of view and leave it severely alone. A nest well
hidden in the tree foliage or shrubbery is not nearly so conspicuous
as a brightly painted house fastened to the limbs of a tree. The side
of a barn or outhouse, far enough down from the roof so that the
cat cannot reach it, or a tall pole covered on the upper part with tin,
so that the feline bird hunter cannot gain a foothold, are about the
only safe places for a house which the birds will actually adopt. The
first house our woodworkers manufactured is shown in Fig. 137.
This was a single or one-family house, and its construction was very
simple.
The list of material follows:
One pc. 1⁄2-inch pine or white wood 10 × 6 1⁄2 ins.
Two pcs. 1⁄2-inch pine or white wood 7 1⁄2 × 3 ins.
One pc. 1⁄2-inch pine or white wood 9 1⁄2 × 5 ins.
One pc. 1⁄2-inch pine or white wood 9 1⁄2 × 4 1⁄2 ins.
Two pcs. 1⁄2-inch pine or white wood 5 1⁄4 × 4 1⁄2 ins.
The first piece, 10 × 6 1⁄2 inches, was simply squared up for the
bottom. The two pieces for the sides, 7 1⁄2 × 3 inches, were squared
up, and one edge of each planed to a 45-degree bevel, to engage
with the roof boards.
The latter were squared up, and nailed together at right angles with
1 1⁄4-inch brads.
The two ends, 5 1⁄2 × 4 1⁄2 inches, were carefully laid out as shown
in the drawing, sawed, and planed to the lines with square edges.
In the end which was to contain the circular door a hole 1 3⁄4 inches
in diameter was bored with its centre two inches from the bottom
line. This required the services of the extension bit, and, to avoid
splitting the wood, as soon as the spur of the bit showed on the
further side, the wood was turned about, and the hole finished from
the other side.
The house was next turned upside down, and fastened in the bench
vise. Holes were drilled along the sides of the bottom piece 3⁄4 inch
in from the edge—three on each side—countersunk, and the piece
fastened to the sides with 1-inch No. 8 screws. The top pieces
already nailed together were now nailed in position on the sides and
ends with 1-inch brads.
Fig. 137. One family bird house, and house for high-hole
The pole they used was 13 feet long and about 3 inches in diameter
at the small end. It was rounded at this end by using a draw knife.
(Fig. 138). A block of 7⁄8-inch pine was bored out, and fitted snugly
over the end of the pole. This block was then removed, and four
holes bored through it for screws.
Fig. 138. The draw knife
Before replacing the block on the top of the pole a cut was made
across the end of the pole about two inches deep, by means of the
rip saw.
The block was replaced, and wooden wedges driven into the saw
cut. This fastened the block securely on the end of the pole, and
after making sure that it was level, the bird house was fastened to
the block by four 1 1⁄4-inch screws from the under side.
A piece of sheet tin was wound around just under the house to
discourage pussy, and the pole set into the ground about three feet,
bringing the under side of the house ten feet above the ground.
A double or two-family house of similar proportions was built next,
as shown in Fig. 139. The list of material called for:
One pc. 1⁄2-inch wood 18 1⁄2 × 6 1⁄2 (bottom)
One pc. 1⁄2-inch wood 18 1⁄2 × 5 1⁄2 (roof)
One pc. 1⁄2-inch wood 18 1⁄2 × 4 1⁄2 (roof)
Two pcs. 1⁄2-inch wood 15 1⁄2 × 3 (sides)
Three pcs. 1⁄2-inch wood 5 1⁄4 × 4 1⁄2 (ends and partition)
The construction was the same as before, each end having a door,
and the partition of course being solid. The block for supporting the
house on the pole was larger, being 8 × 5 × 1 1⁄4 inches, and called
for six 1 1⁄2-inch No. 10 screws, to secure it to the under side of the
floor. Harry wanted to make it more complete by adding a small
wind vane, but Ralph said it might frighten the birds, so it was
omitted.
Of course larger and more ornamental houses may be built, but
where there are too many families in such close proximity there is
apt to be trouble, while houses that are too conspicuous do not
appeal to the beautiful American wild birds that we want to attract.
With the English sparrow it does not matter so much. For these
birds, a tenement house against the side of a barn may be built
easily, in the form shown in Fig. 139.
This may be made any length, each door leading to a compartment
separated from the others by partitions. Make as many pieces plus
one as there are to be compartments, apartments, or flats; have the
bottom project as shown in side view for a perch and walk, and have
the roof also project to shed rain.
If not fastened from the inside of the barn by stout screws, this
house must be secured to a shelf, or by brackets.
Fig. 139. Two family house and tenement
The side view shows a simple shelf made of a back piece secured to
the side of the barn by screws or nails, a plain shelf nailed to this
back piece, and two wooden brackets. If iron brackets are used,
both the shelf and back piece may be omitted, the brackets being
fastened to the under side of the bird house and to the siding of the
barn by screws.
Fig. 140. The bird bath
For birds like the high-hole, or flicker, a piece of hollow log, or an
elongated box fastened securely to the side of a pole, made cat
proof, is very acceptable. This should not be painted, but should be
provided with a door on the side and a perch. (Fig. 137.) The
opening should be about three inches for these large birds, and the
location should be as secluded as possible. Any number of devices
will suggest themselves, but always remember the cat, and study
the location from the bird point of view. The martins and swallows
are especially to be encouraged, as they are wonderful destroyers of
insects.
One device, especially grateful to these
feathered friends in hot weather, is a pan of
water, in a place where they can drink and
bathe without being eternally on the watch
for that crouching enemy, who is always
stalking them—Tabby.
A pedestal with a platform about four feet
above the ground will do nicely, and it can
be placed so close to the house that you
can watch them, and enjoy their ablutions
almost as much as they do. (Fig. 140.)
The construction is too simple to require an
explanation.
XXV
SIMPLE ARTICLES FOR HOUSEHOLD USE
The boys thought it was about time to pay some attention to the
wants of the family, who had been clamouring for weeks to have this
article or that for the kitchen, dining room, and in fact for every part
of the house.
Ralph was a wise teacher, however. He knew that the cause of
ninety out of every hundred failures was due to the young
mechanic's trying some problem too far advanced.
It seems strange that people cannot learn this lesson. We have seen
hundreds of boys led along, say in carving, from one simple lesson
to another, until at the end of five or six carefully graded exercises,
these boys could carve beautifully any design given them.
On the other hand, we have seen boys start in on their own hook,
without any direction from older people, and ruining everything they
tried, simply because they wanted to do the most difficult thing first,
before they had developed any skill.
Ralph was determined that his boy should be an expert and
successful user of tools, so he paid no attention to the clamours of
the family, and allowed Harry to make only those things which were
within his power to do well. Each time a piece of work was finished,
and inspected by the family, the universal chorus was something like
this:
"Well, if he can make such a fine bird house, I don't see why he
can't make half a dozen picture frames for these water colors," or,
"If he can make such a fine pen tray, I don't see why he can't make
a new stool for the piano!"
In vain Ralph explained that these things could be made in due time,
that a picture frame required much more skill than a bird house, etc.
Their household articles commenced with a bread board for the
kitchen. (Fig. 141). This gave Harry his first experience in planing a
broad surface. He used jack and smoothing planes for the working
face, and squared the rest of the board as he had smaller pieces.
This required some time. The wood about the semi-circular top was
removed with saw and chisel, the board held for the chiselling flat on
the bench hook. After getting this curve as true as possible with the
chisel, it was finished with a sand-paper block. A 1⁄2-inch hole was
bored at the centre of the semi-circle to hang it up by, and the two
lower corners were rounded with chisel and sand-paper. No sand-
paper was used on the flat surface, as Ralph explained this was a
board for cutting bread, and the grit from the sand-paper would
become more or less embedded in the wood, and it would spoil the
bread knife. Sand-paper is made of ground quartz, and it soon dulls
the edge of a cutting tool.
Fig. 141. The bread board
The knife and fork box (Fig. 142) brought new problems. The list of
material was:
1 pc. 11 1⁄2 × 3 1⁄4 × 1⁄2
2 pcs. 14 × 1 1⁄2 × 1⁄2
2 pcs. 7 × 1 1⁄2 × 1⁄2
1 pc. 12 × 6 1⁄2 × 1⁄4
It was made of white wood, and, after being assembled, was stained
a rich brown by receiving two coats of bichromate of potash. This is
a chemical, which may be bought at a paint or drug store in the
form of crystals. These are dissolved in water, until the solution looks
like pink lemonade. It can be applied with a brush, but each coat
must be allowed to dry completely before the whole is sand-papered
smooth with No. 0 sand-paper. A deeper brown can be obtained by
adding one or two extra coats of stain.
The middle partition containing the handle was made first. The
drawing was laid out on the wood after it had been squared up, and
two holes 1 inch in diameter were bored out at a a. The wood
between was taken out with a key-hole saw, and finished to the line
with chisel and knife. A turning saw can be used to advantage on
this handle, but it is not absolutely necessary. Spaces b b were
removed in the same way, but a knife was used in the concave part
of the curve. If it is handy, a small spokeshave can be employed on
the whole upper line of this handle.
Anything in the nature of a handle should be rounded to fit the
hand. Edges c c were therefore rounded with the knife, and finished
with coarse, followed by fine, sand-paper.
The two sides were laid out together as in the nail box, and the
groove cut with back saw and 1⁄8-inch chisel.
The end pieces were made in a similar manner, and the bottom
piece squared to 1⁄16-inch of finished size. The assembling consisted
of first gluing together the sides and ends. Two hand screws were
used to hold them. This was Harry's first attempt at using hand
screws, and Ralph showed him the importance of keeping the jaws
parallel.
Fig. 142. Method of using hand screws in the construction of a knife box
The box remained in the hand screws
over night, and the next day it was
found to be securely fastened. The
most convenient kind of glue for boys
is the liquid sold in cans. It is always
ready for use, and very handy where
only a moderate quantity is needed.
Dry glue in the form of flakes, or
granulated, must be soaked over
night, and then heated in a pot having
a double bottom with water in the
lower part.
It should be put on hot with a brush or
a small flat stick. The best glue is none
too good, yet a good quality has
wonderful holding power and should
last indefinitely.
After removing the hand screws, the
unfinished box was placed in the vise,
tested with the edge of the plane, and
made perfectly true, top and bottom.
The 1⁄4-inch bottom piece was now
put on with one-inch brads, the sides and ends made square, the
handle partition slipped into the grooves, and fastened with two
brads at each end.
This knife box was so satisfactory that our young carpenters
resolved to have a large one for tools. Whenever they had a job to
do in the house, they were constantly running out to the shop for
something, so that a tool box became a necessity.
The construction was similar to the knife box; but this was larger
and heavier, and the dado joints at the ends were replaced by a butt
joint fastened with flat-head screws (Fig. 143). The bottom and
partition were also put on with screws, on account of the weight to
be carried.
Fig. 143. Tool box
Fig. 144. Another tool box
These tool boxes are frequently made in the shape shown in Fig.
144, with sloping sides and ends called the hopper joint; but aside
from the tool practice it affords, it is doubtful if the shape has
advantage enough over the other form to warrant the extra time it
takes. Man is an imitative creature, however, and what one
carpenter has, the others copy.
The principal features about this useful article should be size and
strength, especially in the handle, which should be of about 5⁄8 or
3⁄4 inch stock.
XXVI
THE MITRE BOX AND PICTURE FRAMES
It seemed to Harry that the shop was fairly well equipped, but Ralph
insisted that they must have a mitre box before making anything
else for the house.
The mitre box is, or should be, an instrument of precision, and
although simple in construction, must be perfectly accurate, or it is
useless. (Fig. 145.)
The illustration shows the common form, but elaborate affairs of iron
and wood can be bought ready made. Every boy should make his
own, for the practice, if for nothing else. The sides should be made
of oak 7⁄8 inch thick, 18 inches long, and 3 1⁄2 inches high, the
bottom of 7⁄8-inch pine or other soft wood, the same size.
When squared up, the two sides must be tested by standing them
side by side; then reverse one end for end, to see if they are alike. If
not, find where the trouble is, and correct it.
It is especially important that the edges of the bottom piece be
square and the sides perfectly parallel. This test can be made with
the marking gauge. Sides are fastened on by boring and
countersinking for three screws on each. After assembling, the
whole thing must be tested as if it were a solid block. Top edges
must be true and parallel.
Fig. 145. The 45° mitre box and test pieces
Near one end—about two inches in—lay out across the top with try
square a line 90 degrees with the sides. Carry the line down each
side, square with the top edges. For 45-degree angles, lay out a
square by drawing two pencil lines across the top, as far apart as the
finished mitre box is wide. Draw the two diagonals and square lines
from their ends down both sides, taking care that their position is
not over the screw in the bottom; because as the saw cuts deeper it
may reach this screw and ruin its teeth.
Make the three saw cuts directly on the lines laid out with a cross
cut or back saw, with the utmost care. If this is not done accurately,
all the labour of preparation is wasted. The blank end of the mitre
box may have an additional 90-degree cut, or be left for new cuts in
the future, as a mitre box of this description wears out and becomes
inaccurate.
Other angles may be used, as 60 degrees or 30 degrees, but it is
better to have these on another box as they are used less, and for
special purposes. (Fig. 146.)
The mitre box is not ready to use until it has been thoroughly tested.
Prepare a strip of soft wood (pine or white wood) 1 1⁄2 inches wide
and 1⁄2 inch thick. Cut four pieces from it on the mitre box, using the
back saw as shown at a, with only one of the slits. Place these four
triangular pieces together to form a square. All the four mitre joints
of this square must fit perfectly. If they do not, mark the slit "N. G.,"
and test the other slit in the same way. If all right, mark "O. K." It
often happens that one may be perfect and the other inaccurate. If
they are both O. K., the box is ready for use. If one slit is useless,
lay out and cut another on the blank end of the mitre box in the
same direction, and test again.
In testing a 30-degree cut three pieces of the strip should be sawed
out, and when placed together they should form a perfect equilateral
triangle, while from a 60-degree cut, six pieces are needed to form a
hexagon.
Fig. 146. 30-60-90 mitre box
These angles are valuable in inlaid
work, and for getting out geometrical
designs.
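The relation between the number of pieces and the angle of the slit can be checked with a little arithmetic. The sketch below is an illustration only, not part of the original text: it assumes the pieces are equal strips forming a regular figure, and that the angle is measured between the saw cut and the length of the strip. The function name `mitre_cut_angle` is made up for the example.

```python
def mitre_cut_angle(pieces: int) -> float:
    """Angle in degrees between the saw cut and the length of the strip
    for a closed regular figure built from `pieces` equal mitred strips."""
    if pieces < 3:
        raise ValueError("a closed figure needs at least three pieces")
    # The interior angle of a regular polygon is 180 - 360/n degrees,
    # and each of the two strips meeting at a corner takes half of it.
    return 90.0 - 180.0 / pieces


# Triangle, square and hexagon, the three figures used in the tests above.
for n in (3, 4, 6):
    print(n, "pieces:", mitre_cut_angle(n), "degree cut")
```

Under these assumptions three pieces call for the 30-degree slit, four for the 45-degree slit, and six for the 60-degree slit, which agrees with the test pieces described in the text.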
The 45-degree cut is indispensable in
making the mitred corners of picture
frames and in cabinet work.
In making picture frames of simple
cross section, it is first necessary to
cut the rabbet (Fig. 147) with a rabbet
plane. If this moulding is made by hand, the size of the picture
should be measured, the length of all four sides added, and a liberal
allowance made for waste.
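As a rough guide to how much moulding to get out, the measurement can be put into a short calculation. This is a sketch under stated assumptions rather than the book's own rule: it supposes the picture just fills the rabbet, that all four pieces are sawed one after another from a single strip with the long (outer) edge on the same side, and that the allowance for saw cuts and waste is chosen by the worker. The names `moulding_length`, `moulding_w`, `rabbet_w` and `allowance` are hypothetical.

```python
def moulding_length(picture_w, picture_h, moulding_w, rabbet_w, allowance=0.0):
    """Estimate the strip length needed for a mitred frame (same unit throughout).

    picture_w, picture_h: size of the picture sitting in the rabbet
    moulding_w: full width of the moulding
    rabbet_w: width of the rabbet ledge that overlaps the picture
    allowance: extra length for saw cuts and waste (an assumption, set by the user)
    """
    if rabbet_w > moulding_w:
        raise ValueError("rabbet cannot be wider than the moulding")
    # Each side of the frame extends past the picture by the part of the
    # moulding that lies outside the rabbet, once at each end.
    outer_extra = 2 * (moulding_w - rabbet_w)
    long_edges = 2 * (picture_w + outer_extra) + 2 * (picture_h + outer_extra)
    return long_edges + allowance


# Example: a 10 x 8 inch picture, 1 1/2 inch moulding, 1/4 inch rabbet,
# and 3 inches allowed for waste.
print(moulding_length(10, 8, 1.5, 0.25, allowance=3))
```

The point of the calculation is that the long edges of a mitred frame are longer than the sides of the picture, so adding the four sides of the picture alone would leave the strip short unless the allowance is liberal.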
Fig. 147. Making picture frames
In the figure, the triangles a a are waste, the rabbet being indicated
by the dotted line. After the four pieces have been sawed out on the
mitre box, they should be placed together on a flat surface, such as
the bench top or floor, to see if the mitres fit perfectly. If they do
not, one of them can be block planed to make a perfect fit, and the
other three laid close together, as shown in the illustration.
The assembling is the hardest part of the operation, and many
devices have been tried and some patented to hold the parts
together while the glue is drying.
Perhaps the surest way is to drill a hole in one piece of each joint
large enough for the passage of a wire bung-head nail.
The undrilled piece is placed vertically in the vise. The drilled piece,
after receiving a thin coat of glue, is brought into position
horizontally, and the nail driven home.
Theoretically, the nail should catch at the first blow, but the
horizontal piece will sometimes slip, even with the best of care. It is
wiser to place this piece about 1⁄16 inch above its final position, to
allow for this slip.
A method sometimes used is to glue near the ends of each piece a
triangular block of wood, as shown at d. These must be left over
night to harden.
The next day the whole four pieces can be glued and held together
by four hand screws, as shown, until the glue is thoroughly hard.
This method, of course, can only be used with plain moulding or that
which is square on the outside.
Our boys tried another way that is commonly practised. They nailed
oblong blocks to an old drawing board, as shown at e e, and then
placed the picture frame in the centre, after gluing the joints, and
driving wedges in between the blocks and the frame. Paper placed
under each joint prevented the frame from being stuck to the
drawing board by the glue forced out by the pressure.
This paper plan was learned by experience, as the first frame the
boys tried had to be pried up from the board, and in so doing they
broke it at two of the joints, so that it had to be made again.
It is well to remember in gluing mitre joints that end grain absorbs
more glue than a flat surface. A priming coat should be applied first,
and allowed to remain a few moments to fill up the pores. The
second coat should hold fast and make a strong joint, but an excess
of glue should always be avoided, as it must be removed after
hardening, and glue soon takes the edge from the best of tools.
Very fancy frames should be avoided. A bevel on the outside or
inside, or both, is about all the young woodworker should attempt in
the way of ornamentation. Depend on the natural beauty of the
wood, as a fancy frame draws the attention from the picture, which
after all is the main thing. We should admire the man, not his
clothes, the picture not its frame, although the latter should be neat
and well made.
The finishing and polishing of frames is taken up in Chapter XLIX.
XXVII
MAKING TOILET BOXES
To make a wooden box sounds like a simple proposition; but in
making the drawing, the questions of size, proportion, joints, hinges,
etc., immediately come up.
The size of course depends on the purpose of the box. If it is for
ladies' gloves, it should be long and narrow; if for collars or
handkerchiefs, square or nearly so. The height is nearly always
made too great. In fact, the whole question of proportion is one
which can hardly be taught; it must be felt, and different people
have different ideas as to what constitutes good proportion.
Some hints, however, may be given: A box perfectly square does not
look well. Again, dimensions that are multiples do not look well. A
box 4 × 8 × 12 inches would not be nearly so pleasing as one 3 ×
5 1⁄2 × 12 inches.
The proportions are also affected by the constructive details. Is the
box to be flat on the sides and ends or is the top to project? etc.
Our boys argued and sketched and finally drew the design shown at
Fig. 148. This was to hold ties. The top was to project and have a
bevel, or chamfer, also the bottom. No hinges were to be used, but
the cover was to have cleats fastened on the under side to keep it in
place, and to prevent warping.
The next question was the manner of fastening the sides and ends.
On unimportant work, a butt joint with glue and brads can be used,
but for a toilet article, the holes made by the brads, even if they are
filled with putty, are not satisfactory.
Fig. 148. Dado joint used in box design
So it was decided to use the dado joint
as shown at a. This meant more fine
work, but, as Ralph suggested, it was
to last a lifetime, and should be made
right.
Sides and ends were squared up, and
the grooves on the side pieces laid out
as in the nail box. The rabbets on the
end pieces were cut out with the back
saw and chisel. After the joints had been carefully fitted, the four
pieces were glued together and placed in hand screws over night.
While the glue was hardening, the two pieces for the top and bottom
were squared up and bevelled with the smoothing plane on the long
sides, the block plane on the ends.
The cleats for the top were next made, drilled and countersunk for
the screws as at b.
A careful full-sized drawing of half of the top was made, and a chip
carving design drawn for it. The cleats were not put on until the
carving was finished and short screws had to be used so they would
not come through and spoil the surface.
The next day the body of the box was removed from the hand
screws and squared with a smoothing plane. The top and bottom
were put on with 1-inch brads. These were "set" with a nail punch to
prevent any possible scratching and the whole box was rubbed down
with wax dissolved in turpentine.
For fine cabinet work, the dovetail joint makes the most satisfactory
method of fastening, but Harry was not yet skilled enough to do the
fine work it demanded.
The second box was for handkerchiefs, dimensions 8 × 7 × 3 inches
outside, and no overhang at either top or bottom. The construction
brought in several new features. Sides and ends were dadoed
together as in the first box.
The top and bottom, after being squared, were rabbeted on all four
sides until they fitted snugly into the opening top and bottom. They
were glued in these positions and placed in hand screws over night.
(Fig. 149.)
"How are you going to get into that box?" asked Harry. "You've
closed it up solid and glued the top on."
"Wait and see," was all the satisfaction he got.
Fig. 149. The handkerchief box
The next day the hand screws were removed and the box squared
up exactly as if it had been a solid piece of wood. Ralph then made
two gauge lines around the four sides, 3⁄4 inch from the top and 1⁄8
inch apart. Then he cut the box in two between these two lines with
a rip saw, after slightly rounding all corners except the bottom ones
with a plane and sand-paper.
By this method, the box and cover must be exactly alike in outline,
and by planing to the gauge lines, they will fit perfectly.
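The heights of cover and box that result from the two gauge lines follow from simple subtraction. The sketch below is only an illustration, assuming the 3-inch outside height given for the handkerchief box and that the whole strip between the gauge lines is lost to the saw and plane. The function name `split_box` is hypothetical.

```python
from fractions import Fraction


def split_box(total_height, first_line, gap):
    """Return (cover_height, body_height) after sawing between two gauge lines.

    total_height: outside height of the closed box
    first_line: distance of the first gauge line from the top
    gap: distance between the two gauge lines, removed by saw and plane
    """
    cover = first_line
    body = total_height - first_line - gap
    return cover, body


# Dimensions from the text: 3 inches high, lines 3/4 inch down and 1/8 inch apart.
cover, body = split_box(Fraction(3), Fraction(3, 4), Fraction(1, 8))
print(cover, body)  # 3/4 17/8
```

On these figures the cover comes out 3⁄4 inch high and the body 2 1⁄8 inches, the remaining 1⁄8 inch being the waste between the lines.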
It only remained to hinge the two parts together, but this operation
proved to be no slight task.

Advanced Signal Processing For Communication Systems The Springer International Series In Engineering And Computer Science 1st Edition Tadeusz Wysocki

  • 5. ADVANCED SIGNAL PROCESSING FOR COMMUNICATION SYSTEMS
  • 6. THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE
  • 7. ADVANCED SIGNAL PROCESSING FOR COMMUNICATION SYSTEMS
edited by
Tadeusz A. Wysocki, University of Wollongong, Australia
Michael Darnell, The University of Leeds, United Kingdom
Bahram Honary, Lancaster University, United Kingdom
KLUWER ACADEMIC PUBLISHERS
NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
  • 8. eBook ISBN: 0-306-47791-2 Print ISBN: 1-4020-7202-3 ©2002 Kluwer Academic Publishers New York, Boston, Dordrecht, London, Moscow Print ©2002 Kluwer Academic Publishers All rights reserved No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher Created in the United States of America Visit Kluwer Online at: http://kluweronline.com and Kluwer's eBookstore at: http://ebooks.kluweronline.com Dordrecht
  • 9. CONTENTS
PREFACE
1. Application of Streaming Media in Educational Environments
   P. Doulai
2. Wideband Speech and Audio Coding in the Perceptual Domain
   L. Lin, E. Ambikairajah and W.H. Holmes
3. Recognition of Environmental Sounds Using Speech Recognition Techniques
   M. Cowling and R. Sitte
4. A Novel Dual Adaptive Approach to Speech Processing
   M.C. Orr, B.J. Lithgow, R. Mahony, and D.S. Pham
5. On the Design of Wideband CDMA User Equipment (UE) Modem
   K.H. Chang, M.C. Song, H.S. Park, Y.S. Song, K.-Y. Sohn, Y.-H. Kim, C.I. Yeh, C.W. Yu, and D.H. Kim
6. MMSE Performance of Adaptive Oversampling Linear Multiuser Receivers in CDMA Systems
   P. Iamsa-ard and P.B. Rapajic
7. Peak-to-Average Power Ratio of IEEE 802.11a PHY Layer Signals
   A.D.S. Jayalath and C. Tellambura
8. A Proposed Hangup Free and Self-Noise Reduction Method for Digital Symbol Synchronizer in MFSK Systems
   C.D. Lee and M. Darnell
9. A Channel Sounder for Low-Frequency Sub-Surface Radio Paths
   D. Gibson and M. Darnell
10. Computational Complexity of Iterative Channel Estimation and Decoding Algorithms for GSM Receivers
   H. Cui and P.B. Rapajic
11. Modelling and Prediction of Wireless Channel Quality
   S. Ci and H. Sharif
12. Packet Error Rates of Terminated and Tailbiting Convolutional Codes
   J. Lassing, T. Ottosson and E. Ström
13. The Feng-Rao Designed Minimum Distance of Binary Linear Codes and Cyclic Codes
   J. Zheng, T. Kaida and K. Imamura
14. On a Use of Golay Sequences for Asynchronous DS CDMA Applications
   J.R. Seberry, B.J. Wysocki and T.A. Wysocki
15. PUM-Based Turbo Codes
   L. Fagoonee, B. Honary and C. Williams
16. A Code for Sequential Traitor Tracing
   R. Safavi-Naini and Y. Wang
17. Software-Defined Analyzer of Radio Signals
   J. Lopatka
18. Interleaved PC-OFDM to Reduce Peak-to-Average Power Ratio
   A.D.S. Jayalath and C. Tellambura
19. Reducing PAR and PICR of an OFDM Signal
   K. Sathananthan and C. Tellambura
20. Iterative Joint Equalization and Decoding Based on Soft Cholesky Equalization For General Complex Valued Modulation Symbols
   J. Egle and J. Lindner
INDEX
  • 12. PREFACE

In the second year of the twenty-first century, we are witnessing unprecedented growth in both the quality and quantity of services offered by communication systems. Most of the recent advancements in communication systems performance have only been made possible by the application of digital signal processing in all areas of communication systems development and implementation. Advanced digital signal processing allows the new generation of communication systems to approach the theoretical predictions, and to make practical use of ideas that were not considered feasible to implement not so long ago.

This book consists of 20 selected and revised papers from the 6th International Symposium on Digital Signal Processing for Communication Systems, held in January 2002 at the Pacific Parkroyal Hotel in Manly, Sydney, Australia.

The first group of papers deals with audio and video processing for communications applications, and includes topics ranging from multimedia content delivery over the Internet, through speech processing and recognition, to recognition of non-speech sounds that can be attributed to the surrounding environment.

Another theme which receives significant attention in this book is orthogonal frequency division multiplexing (OFDM) in its various forms, e.g. HIPERLAN, IEEE 802.11a. Aspects of OFDM technology which are covered here include novel forms of modulation and coding, methods of reducing in-band and out-of-band spurious signal generation, and means of reducing the peak-to-average power ratio of an OFDM waveform. In these contributions, a key objective is to retain the inherent implementational simplicity of the OFDM technique whilst enhancing its performance relative to single carrier systems.

Digital signal processing for second and third generation systems is represented in the book as well. The topics covered here include both theoretical issues, like spreading sequence design, and implementation issues of a 3G user equipment modem and of MMSE receivers for CDMA systems. A useful comparison of the complexity of channel estimation, equalization and decoding for GSM receivers is discussed, too.

The book also includes useful papers on applications of error control coding and information theory. These start with mathematical structure and decoding techniques and continue with channel capacity approaching codes and their applications to various communication systems.

The last group of papers included in the book considers several important issues of digital signal processing for communication systems, like modulation, software defined radio, and channel estimation.

The Symposium was made possible by the generous support of the New South Wales Section of IEEE, the Smart Internet Technology Cooperative Research Center, the Telecommunications and Information Technology Research Institute, the Australian Telecommunications Cooperative Research Center, and the School of Electrical, Computer, and Telecommunications Engineering at the University of Wollongong. The Organizing Committee is most grateful for their support.

The editors wish to thank the authors for their dedication and considerable effort in preparing their contributions and in revising and submitting their chapters, as well as everyone else who participated in the preparation of this book.

Tadeusz Wysocki
Mike Darnell
Bahram Honary
  • 14. APPLICATION OF STREAMING MEDIA IN EDUCATIONAL ENVIRONMENTS Parviz Doulai Educational Delivery Technology Laboratory (EDTLab), University of Wollongong, Wollongong NSW 2500, Australia Abstract: This paper discusses the growing application of Web-based instruction and examines real time streaming technology in educational settings. The steps required in the process of applying streaming technology in education are outlined, and available tools and the nature of the delivery platforms are identified. The prospects and challenges in introducing virtual learning environments to tertiary institutions are illustrated using two case studies. It will not be long to overcome the challenges confronting technology-based education in traditional teaching institutions. Key words: Educational Technology, Streaming, Multimedia, Virtual Learning Environment, Virtual Classroom Chapter 1 1. INTRODUCTION AND BACKGROUND Educational institutions have long been a testing ground for the latest technological breakthroughs that change the way professional educators work and live. Examples include the growing application of information communications technologies and the use of network delivered multimedia educational modules through the application of interactive and dynamic Web environments. The growing global information technology revolution has already changed the face and culture of teaching and learning in Australia and other parts of the world, creating new opportunities and challenges for professional educators. The new and emerging educational technologies have enabled academic institutions to provide a flexible and more open
  • 15. learning environment for students. It is shown that in a well-designed web- based support system, students take more responsibility for their own learning, and instructors function more like coaches and mentors for a new generation of professionals [1]. The outcome of research and development work in utilizing new and emerging educational technologies in traditional educational institutions has also found its way in serving distance students. The convergence of new information technologies such as telecommunications, computers, satellites, and fiber optic technologies is making it easier for teaching institutions to implement distance education [2,3]. National and transnational virtual universities as well as traditional educational establishments are offering online degree programs, continuing education and corporate training courses. In many cases Web-based instruction and course management tools are used to deliver courseware containing interactive multimedia-based educational modules. An integrated environment containing Web-based course delivery and management along with multimedia modules is commonly referred to as a virtual learning environment or a virtual classroom. Virtual learning environments are used to support real classroom environments in traditional academic institutions [4,5]. Virtual classrooms also were found to be very attractive in virtual campuses and virtual universities all around the globe [6]. Key technologies involved in the development of virtual learning environments include multimedia and streaming media. Reasons for developing and utilizing virtual classrooms by teaching institutions vary, some endeavor to keep up with the ever changing frontiers of educational technologies, whilst others see it as an approach that gives students more control over their learning. 
The use of new and emerging educational technologies offers students a dynamic learning environment through which class communication and collaboration can be achieved with minimum time and budget requirements. In fact, the great benefit of online learning in general, and virtual classrooms in particular, is that it gives educators an opportunity to get students to collaborate and communicate very easily [1]. Two key issues in online learning are retention and the development of interactive and collaborative activities and environments. Creating a motivational and interactive virtual learning environment can enhance student retention, completion, and overall enthusiasm for this new type of learning arena [1]. In applications related to online learning, multimedia refers to the ability to include sound and video in Web pages. Due to the availability of many public domain and commercial computer programs, it has become increasingly easy to incorporate audio and video clips into any digital document or multimedia Web publishing materials. Streaming media came
  • 16. about in response to the problem of bandwidth-greedy multimedia files, opening the possibilities of delivering many multimedia applications via the Internet. Streaming refers to the process of delivering audio clips, video clips, and other media in real-time to online users [7]. Streamed audio and video files can be found in a number of World Wide Web locations serving a wide-variety of purposes, such as a vocal introduction to a homepage, a movie trailer, or an interactive educational presentation. One of the major attractions to streaming media is "live" broadcasting that has less applicability to educational environment. In a simple educational setup, the streaming media is used to deliver synchronized text, images and other media files over the public TCP/IP network. In a more complex setup, streaming is used for network delivery of interactive multimedia modules [8]. This paper illustrates two case studies; a simple virtual classroom offering standard Power Point slides synchronized with streamed voice narration and a stream video presentation in which the video is indexed to the table of the content. These case studies are explained in terms of the module structure and the method of delivery. Both modules are delivered to students over a low bandwidth modem connection. It would be useful to utilize desktop videos for course material presentation and distribution. However, until recent times network delivery of multimedia clips was limited to a corporate environment or on-campus environment where students have direct access to high-speed lines. The delivery of media files over the Web has always been limited by the bandwidth of communication lines or channels. 
Development in this field is happening in two directions: faster connections and communication technologies [9] that are increasing the capacity of the communication channels, and new multimedia technologies for the Web, such as streaming audio and video, flash animation, and others, that allow for better delivery of media on the Web [4]. When video first came to the World Wide Web, it was necessary to download the entire video file before it could be played. This was seen as one major disadvantage of traditional multimedia clips and modules. Downloading video files, typically several megabytes in size, resulted in substantial delays before the audience could actually hear or view the clip. This was even worse when large clips were downloaded over a slow modem connection.
2. STREAMING: MULTIMEDIA FILES FOR NETWORK DELIVERY
  • 17. Streaming media is a method of providing audio, video and other media files in real-time, without download waits, over the Internet or a corporate Intranet. Instead of downloading the file in its entirety before playing it, streaming technology takes a different approach: it downloads the beginning of the file and forms a buffer of packets, and when the buffer is sufficiently full, the client player plays back the packets in a seamless stream. While the viewer is watching, the player downloads the next portion, and so on, until the entire file is played. The buffer provides a way for the player to protect itself in case of network congestion, lost packets, or other interference.
2.1 History: Streaming Audio and Video
Progressive Networks [10] led the way in the development of streaming audio and video, launching "RealAudio 1.0" in 1995. "RealAudio 2.0" followed, upgrading sound to "FM mono" quality and making live Webcasting possible for the first time. RealAudio 2.0 introduced important features such as server bandwidth negotiation, support for firewalls and an open Application Programming Interface (API) for third party developers. Compatibility of RealAudio 2.0 with the Netscape Navigator plug-in architecture made it possible to play RealAudio content as an integrated part of a Web page. In February 1997, Progressive Networks released RealVideo 1.0, which made delivery of video over 28.8 kbps a reality. The system also offered full-motion-quality video using V.56 (56 kbps) and near TV broadcast quality video at Local Area Network (LAN) rates or broadband speeds (100 kbps and above). In October 1997, Progressive Networks officially changed its name to Real Networks prior to the release of what it called "RealSystem 5.0". The system included RealPlayer 5.0, RealEncoder 5.0, RealServer 5.0 and a software package called RealPublisher.
Until the release of RealSystem 6.0 in 1999, the delivery of multimedia files was conducted using Real Networks' proprietary PNM (Progressive Networks Metafile) format. RealSystem 6.0 used the Real Time Streaming Protocol (RTSP), which was then a new standard for improved server-client communication. RealSystem 6.0 could also stream and play not just Real Networks' own format, but also standard data types such as MIDI, AVI or QuickTime. The case studies illustrated in this paper were based on RealSystem 6.0. The Real Time Streaming Protocol is designed to work with time-based media, such as streaming audio and video, as well as any application where application-controlled, time-based delivery is essential. In addition, RTSP is designed to control multicast delivery of streams, and is ideally suited to full multicast solutions [7]. Currently, RealSystem supports a variety of new data types. These include audio and video as well as text, images and animation.
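To make the idea of application-controlled, time-based delivery concrete, the following minimal sketch builds the text of an RTSP/1.0 PLAY request that asks a server to start playback from a given position. It follows the general message layout of the RTSP specification (RFC 2326); the URL, session identifier and function name are hypothetical, and a real client would first have to issue a SETUP request to obtain a session. Nothing here is taken from the RealSystem implementation itself.

```python
def rtsp_play_request(url, cseq, session_id, start_seconds):
    """Build an RTSP/1.0 PLAY request starting at start_seconds.

    The Range header uses normal play time (npt); an open-ended range
    such as "npt=90.000-" means "play from 90 s to the end of the clip".
    """
    lines = [
        f"PLAY {url} RTSP/1.0",
        f"CSeq: {cseq}",                      # request sequence number
        f"Session: {session_id}",             # issued by an earlier SETUP
        f"Range: npt={start_seconds:.3f}-",   # seek position in seconds
        "",                                   # blank line ends the header
        "",
    ]
    return "\r\n".join(lines)


# Hypothetical URL and session id, for illustration only.
request = rtsp_play_request("rtsp://example.com/lecture.rm", 3, "12345678", 90)
print(request)
```

Because the seek position travels in the request itself, the server can begin sending data from that point instead of transferring the whole file.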
  • 18. In fact, streaming is now seen as a platform for delivering information, rather than just a system for delivering video. One can tie other kinds of Web content to the timeline of a video or an audio presentation. This allows the creation of complex and personalized experiences for the end user. An example that contains a variety of media files with a precise timing structure is available in [11].
2.2 Why Streaming?
There are several reasons why downloading an entire media file prior to its playback is unsuitable for the delivery of information over the public TCP/IP network. For instance, if a user on a low bandwidth connection (or even a high bandwidth one) wants to move forward in the video, they have to wait until the whole file is downloaded. Also, if a user on a high bandwidth connection views only a small portion of the stream, they are likely to have downloaded the whole file after only a few seconds. This costs the user extra bandwidth, because Web servers typically deliver files as fast as they can. Moreover, Web servers do not provide intellectual property control, so a publisher cannot prevent users from downloading the media file for re-use. Web servers are also not capable of delivering presentations of unlimited or undetermined length, or live broadcasts of media files. There are other reasons as well, which demonstrate the superiority of dedicated streaming servers over standard Web servers in the delivery of multimedia files. Streaming multimedia has been optimized for use on the Internet in two ways:
- Clips are highly compressed, so that download time is drastically reduced. The goal is to download the clip faster than it takes to play the clip, even when using dial up modem connections.
- The players and plug-ins can play the clip as it is being downloaded. They start playing immediately, thus reducing wait time for the user.
These optimizations allow users to do things that are impractical for traditional multimedia including broadcasting of live audio and video events and broadcasting of extremely large multimedia files, such as audio books that can take many hours to play. Often delivery of multimedia files through a dedicated stream server is combined with fast-forward and rewind capabilities.
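The difference between downloading before playing and streaming can be illustrated with a little arithmetic. The sketch below is not from the original paper; the function names, the 20 kbps encoding rate and the 5 second buffer are illustrative assumptions. It compares the waiting time before playback starts in the two cases and checks the condition that the clip downloads at least as fast as it plays.

```python
def download_then_play_delay(clip_seconds, encoded_kbps, link_kbps):
    """Seconds of waiting if the whole file must arrive before playback."""
    return clip_seconds * encoded_kbps / link_kbps


def streaming_start_delay(buffer_seconds, encoded_kbps, link_kbps):
    """Seconds needed to pre-fill a playback buffer of buffer_seconds."""
    return buffer_seconds * encoded_kbps / link_kbps


def can_stream_without_stalling(encoded_kbps, link_kbps):
    """True when data arrives at least as fast as the player consumes it."""
    return link_kbps >= encoded_kbps


# Assumed example: a 10-minute clip encoded at 20 kbps, 28.8 kbps modem.
clip, rate, modem = 600, 20.0, 28.8
print(round(download_then_play_delay(clip, rate, modem), 1))  # whole file first
print(round(streaming_start_delay(5, rate, modem), 1))        # 5 s buffer only
print(can_stream_without_stalling(rate, modem))
```

Under these assumed numbers the full download takes roughly seven minutes, whereas the buffered stream starts after a few seconds, which is the practical point of the two optimizations listed above.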
  • 19. 3. STREAMING MEDIA: SERVERS, PLAYERS AND ENCODERS
RealSystem, Microsoft Windows Media Technologies [12] and Apple's QuickTime [13] offer tools for streaming multimedia content across corporate Intranets and the Internet. They allow the use of scripting languages to control the player or, more importantly, integration with the browser, so that one can embed the player and control it using JavaScript. Exposure to Java is useful as it ensures that developers can use the wealth of Java in virtual classrooms. Producing pre-recorded streaming multimedia requires the following steps:
1. Recording the content, which requires proper recording equipment such as video cameras, microphones, etc.
2. Digitization or conversion of the resulting clip into a multimedia format, such as .wav, .avi, .mov, .rm, etc. It is possible to do this at the same time as step one by recording directly to the multimedia format.
3. Post-processing in the multimedia format, such as adjusting sound quality, editing the content, etc.
4. Conversion of the resulting multimedia format into a preferred format (e.g. RealSystem format) using the relevant encoder (e.g. RealProducer). If there are no editing enhancements, one can record directly to the preferred format.
5. Uploading the resulting file to a Web server, or a dedicated streaming server such as RealServer, so that people on the Web can access it as streaming multimedia.
Examples shown in this paper use RealSystem, which is a collection of components by Real Networks for producing and distributing streaming multimedia. The three components of RealSystem are:
- Producer Module (encoder), which converts existing multimedia files into RealSystem format. The encoder program can also record to RealSystem format directly from audio and video sources.
- Player Module, which plays, amongst other things, the RealSystem media file formats. The free version of RealPlayer includes both a stand-alone version and a Web browser plug-in version. The professional version of RealPlayer adds the ability to record broadcasts and other advanced features.
- Server Module, which offers live broadcast and advanced features such as automatic adjustment of transmission speed to match the user's connection, and the ability to fast forward and rewind.
  • 20. 4. VIRTUAL LEARNING ENVIRONMENTS
Web-based instruction can be supplemented by audio and video files to closely simulate a real classroom environment. Streaming technology is the key technology used in the delivery of educational multimedia modules over the network. A virtual learning environment in its relatively complete form contains a small video clip that shows the class activity, as well as a series of text pages and images representing the content of the blackboard and the overhead projector screen. From the perspective of a developer of educational resources, the interesting idea behind streaming files is the synchronization of the playback of arbitrary files such as text, images etc. For instance, one can synchronize a flash animation file with audio, text, image, or any other data files. In a virtual classroom environment, one can synchronize the playback of a class video with images taken from the blackboard or the overhead screen as the lecture progresses.
4.1 Case Study 1: Stream Video Integration into Virtual Classrooms
Due to the recent availability of video compressor/decompressor (codec) technologies with compression designed for Web delivery, it is now possible to use video as an effective resource in a Web-based instruction environment. Different client programs are now available to make movies with different data rates, and different streaming server programs are now available to negotiate with the client machines to deliver stream video at relatively high quality, even over the narrow bandwidth of modem connections. A stream video presentation was included in the learning environment of a combined final year and Master subject (ELEC476/912) to provide background materials for student group projects. This module was offered in two formats to suit low- and high-end Mac, PC and UNIX platforms as well as slow and moderately fast network connections.
In both formats an audio and a video file synchronized with text and images were used to create a simple virtual tutorial classroom. An interesting feature of most streaming server programs is that they allow client machines to negotiate directly with the server to access the part of the media file they want. Normally, after a short pause, the user can jump to anywhere in an audio or video clip. The video can be indexed to a table of contents and can also automatically "flip" pages in an adjacent frame according to markers embedded in the video. As shown in Figure 1, the video file in this presentation was indexed to a table of contents through markers embedded in the video file during the encoding
  • 21. process. These enabled students to click on items listed in the table of contents (left window) in order to view the associated video along with its synchronized text and images in allocated areas within the presentation window. An online questionnaire was administered to obtain information regarding student access to the subject homepage and its stream video integration in the ELEC476/912 virtual learning environment. Survey results showed that students realized the benefits of technology-enhanced resources that were incorporated into their on-campus course delivery. Students' comments and feedback on the course content, the method of delivery and the available tools and resources for this subject were archived in [14].
4.2 Case Study 2: ELEC101 Virtual Classroom
The Web Edition of "Fundamentals of Electrical Engineering (ELEC101)" is a simple virtual classroom environment that uses real-time streaming technology to deliver synchronized PowerPoint slides (images) and audio files (the lecturer's voice) over the Internet. To ensure that students using computers of any power and connections of any speed could retrieve the content of the ELEC101 virtual classroom, four options, namely plain, synchronized, controlled synchronized
  • 22. and power-point slide/script were provided. Figure 2 shows a screen capture of the cover page of the ELEC101 World Wide Web Edition. Rather than replacing the conventional lecturing of ELEC101, the Web edition was designed and implemented to help students who need to review the key points of major topics. Students need the freely available RealPlayer and perhaps a headphone set so that they can hear the lecture and view the overheads in computer laboratories or at home using a standard 56 kbps dial up connection on a PC, Mac or UNIX platform. In the plain format, students first receive a page containing thumbnails of the available overheads. The RealPlayer starts working as soon as students click on a thumbnail to view the actual overhead. They then step to the slide they are interested in and hear the audio clip associated with that slide. Students may control the RealPlayer operation, and they also have standard navigation tools. The RealPlayer may be used as a plug-in program or as a Netscape or Internet Explorer helper application. In the latter case, clicking on the RealAudio icon causes the browser to launch the player, from which students control its operation: recording, playback, rewind and so forth. They may also use the standard previous and next buttons to move around. A screen capture of the plain format is shown in Figure 3.
  • 23. In the synchronized format, students receive PowerPoint slides and their associated sound. The audio file automatically updates the slide displayed as the lecture progresses. Multiple RealPlayer controls are provided in this option. These include play, pause, volume control and a position slider. Users can use the latter to move forward and backward through the presentation.
  • 24. The controlled synchronized option of ELEC101 displays projected slides on the screen and plays the corresponding sound. In this mode of operation, students step to the slide they are interested in and start the player. While the audio is playing, it automatically updates the slide as the lecture progresses. Alternatively, students can jump to a new slide by clicking on the thumbnails listed in the left frame, and the audio will jump to follow. To start listening to the audio from a particular slide, students may type the slide number in the space provided in the control section and press the enter key. Figure 5 shows a screen capture of ELEC101 in the controlled synchronized mode of operation. Provisions were also made for students using a computer without a sound card. In this case they view a slide in one window and read its corresponding text in another browser window. Implementation of the plain format is very simple, provided the developer knows the technology and has access to some almost freely available tools. The "controlled synchronized" version of ELEC101 presents some challenges. This version uses JavaScript, Frames, and the RealAudio Plug-in.
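The two-way coupling described above, in which the audio position selects the slide and a chosen slide number sets the audio position, amounts to a lookup in a table of slide start times. The sketch below shows that logic only; it is written in Python for illustration, whereas the original used JavaScript with the RealAudio plug-in, and the start times and function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical start times (seconds into the narration) for slides 1 to 5.
SLIDE_START_TIMES = [0.0, 42.5, 95.0, 160.0, 231.0]


def slide_at(time_seconds, start_times=SLIDE_START_TIMES):
    """Return the 1-based number of the slide shown at an audio position."""
    # bisect_right counts how many slides have already started.
    return max(1, bisect_right(start_times, time_seconds))


def seek_time_for(slide_number, start_times=SLIDE_START_TIMES):
    """Return the audio position to seek to when a slide is selected."""
    if not 1 <= slide_number <= len(start_times):
        raise ValueError("no such slide")
    return start_times[slide_number - 1]


print(slide_at(100.0))    # audio position drives the slide display
print(seek_time_for(4))   # typed slide number drives the audio position
```

A player callback that reports the current position would call `slide_at` periodically, while a click on a thumbnail or a typed slide number would call `seek_time_for` and then seek the audio.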
  • 25. Nowadays, the RealPlayer itself supports Java driven events. This means that the development of synchronized audio and video files for network delivery is much easier, and can be done by almost anyone. The ELEC101 virtual classroom environment was tested by a group of second year students using moderately high-speed connections (computer laboratories on campus) and low speed dial up connections (28.8 kbps and higher modems). The setup performed with no interruptions or delay in delivering the subject content (sound and images). Students found the entire concept of the virtual classroom and the application of streamed and synchronized audio files very exciting and motivating. The setup is now available on the Internet for public use [14].
5. CONCLUSION
The combination of powerful compression algorithms, the extensive features associated with streaming servers, and integration with the Web makes it possible to use virtual learning environments effectively over narrow bandwidth networks. This paper explored the integration of multimedia modules into a virtual learning environment. Real time streaming technology in an educational setting was examined, and the process of applying streaming technology in education was briefly highlighted. Two examples of virtual learning environments using streamed, synchronized audio/video and image files were illustrated. It is envisaged that the use of technology enabled methods in face-to-face university instruction results in a model that works equally well for distance students and learners in virtual campuses.
REFERENCES
[1] P. Doulai, "Preserving the quality of on-campus education using resource-based approaches," Proc. International WebCT Conference on Learning Technologies, University of British Columbia, Vancouver, Canada, 1999, pp. 97-101.
[2] B. Hart-Davidson and R. Grice, "Extending the dimensions of education: Designing, developing, and delivering effective distance-educ.," Proc. of the IEEE Professional Communication Conference, 2001, pp. 221-230.
[3] E. R. Ladd, J. R. Holt and H. A. Rumsey, "Washington State University's engineering management program distance education industry partnership," Proc. of Portland International Conference on Management of Engineering and Technology, 2001, pp. 302-306.
[4] P. Doulai, "Smart and flexible campus: Technology enabled university education," Proc. of The World Internet and Electronic Cities Conference, 2001, Iran, pp. 94-101.
[5] V. Trajkovic, D. Davcev et al., "Web-based virtual classroom," Proc. of IEEE Conference on Technology of Object-Oriented Languages and Systems, 2000, pp. 137-146.
  • 26. [6] W. Beuschel, "Virtual campus: scenarios, obstacles and experiences," Proc. of IEEE Conference on System Sciences, 1998, pp. 284-293.
[7] A. Zhang, Y. Song and M. Mielke, "NetMedia: Streaming multimedia presentations in distributed environments," IEEE Multimedia, Vol. 9, 2002, pp. 56-73.
[8] P. Doulai, "Recent developments in Web-based educational technologies: A practical overview using in-house implementation," Proc. of the International Power Engineering Conference, 1999, Singapore, pp. 845-850.
[9] D. Fernandez, A. B. Garcia, D. Larrabeiti, A. Azcorra, P. Pacyna, and Z. Papir, "Multimedia services for distant work and education in an IP/ATM environment," IEEE Multimedia, Vol. 8, 2001, pp. 68-77.
[10] RealNetworks (Progressive Networks), http://www.real.com/
[11] Design and Management 1, "Introduction to Group Projects (ELEC195) Homepage," http://edt.uow.edu.au/elec195/welcome.ram
[12] S. Huang and H. Hu, "Integrating windows streaming media technologies into a virtual classroom environment," Proc. of International Symposium on Multimedia Software Engineering, 2000, pp. 411-418.
[13] Apple QuickTime, http://www.apple.come/quicktime/
[14] The Educational Delivery Technology Laboratory (EDTLab), University of Wollongong, http://edt.uow.edu.au/edtlab/portfolio.html/
  • 27. Chapter 2
WIDEBAND SPEECH AND AUDIO CODING IN THE PERCEPTUAL DOMAIN
L. Lin, E. Ambikairajah and W.H. Holmes
School of Electrical Engineering and Telecommunications, The University of New South Wales, UNSW Sydney 2052, Australia.
Abstract: A new critical band auditory filterbank with superior auditory masking properties is proposed and is applied to wideband speech and audio coding. The analysis and synthesis are performed in the perceptual domain using this filterbank. The outputs of the analysis filters are processed to obtain a series of pulse trains that represent neural firing. Simultaneous and temporal masking models are applied to reduce the number of pulses in order to achieve a compact time-frequency parameterization. The pulse amplitudes and positions are then coded using a run-length coding algorithm. The new speech and audio coder produces high quality coded speech and audio, with both temporal and spectral fidelity.
Key words: auditory filterbank, speech coding, simultaneous and temporal masking
1. INTRODUCTION
Current applications of speech and audio coding algorithms include cellular and personal communications, teleconferencing, secure communications etc. Historically, coding algorithms using incompatible compression techniques have been optimized for particular signal classes such as narrowband speech, wideband speech, high quality audio and high fidelity audio (CD quality). It is evident that a universal speech and audio coding paradigm is required to meet the diverse needs of the above applications. Low bit rate speech coders provide impressive performance above 4kbps for speech signals, but do not perform well on music signals. Similarly, transform coders perform well for music signals, but not for speech signals at lower bit rates. Speech and general audio coders are usually quite different: for speech one of the main tools is a model of the speech production process, whereas
  • 28. for audio more attention is paid to modeling the human auditory system, since a source model is usually not feasible. The new MPEG-4 standard for multimedia communication includes a scalable audio codec supporting transmission at bit rates from 2 to 64kbps. However, in order to achieve the highest audio quality with the full range of bit rates, MPEG-4 actually employs three types of codec. For lower bit rates, a parametric codec (Harmonic Vector Excitation Coding) is used which encodes at 2-4kbps for speech with an 8kHz sampling frequency, and at 4-16kbps for speech and audio with 8 or 16kHz sampling frequency. A Code Excited Linear Predictive (CELP) codec is used for the medium rate, i.e. 6-24kbps at 8 or 16kHz sampling frequency. Time-frequency (TF) codecs, including the MPEG-2 AAC and Twin VQ codecs, are used for the higher bit rates, requiring 16-64kbps at a sampling frequency of 8kHz. There is therefore a need for high quality coders that can work equally well with either speech or general audio signals. In this work we propose a scheme for a universal coder that can handle both wideband speech and audio signals. This coder is based on a new auditory filterbank model, and is a further development of the speech and audio coding scheme initially proposed by Ambikairajah et al. [3], in which the analysis and synthesis of the speech and audio signals take place in the perceptual domain.
1.1 Coding using Auditory Filterbanks
In recent years parallel auditory filterbanks such as the Gammatone filterbank [5,13] have outperformed the conventional transmission line auditory model [1,12] in terms of computational simplicity. They have applications in various types of signal processing required to model human auditory filtering. Gammatone auditory filters were first proposed by Flanagan [5] to model basilar membrane motion, and were subsequently used by Patterson et al. [13] as a reasonably accurate alternative for auditory filtering.
They have since become very popular. Robert and Eriksson [15] applied them to produce a nonlinear active model of the auditory periphery, and Kubin and Kleijn [7] applied them to speech coding. In the wideband speech and audio coder proposed by Ambikairajah et al. [3], the analysis is performed in the auditory domain by using Gammatone filters to obtain an auditory-based time-frequency parameterization of the input signal in the form of critical band pulse trains. This parameterization approximates the patterns of neural firing generated by the auditory nerves, and preserves the temporal information present in speech and music. An advantage of this parameterization is its ability to scale easily between different sampling rates, bit rates and signal types. Adequate modeling of the principal behavior of the peripheral auditory systems is still a difficult problem. An important shortcoming of Gammatone
  • 29. filters is that they do not provide an accurate frequency domain description of the tuning curves because of their flat upper-frequency slopes. In this work we propose a new parallel auditory filterbank based on the critical band scale. The filterbank models psychoacoustic tuning curves obtained from the well-known masking curves [16,17]. The new auditory filters, which have a steeper upper-frequency slope, achieve high frequency domain accuracy and are computationally efficient. The new filterbank is then applied to wideband speech and audio coding under the same paradigm as in [3]. Auditory masking is applied to eliminate redundant information in the critical band pulse trains. A technique to code the pulse positions and amplitudes based on a run-length coding algorithm is also proposed. This chapter is organized as follows: Section 2 presents the design techniques for the new critical band auditory filterbank. Section 3 describes the auditory-filterbank-based speech and audio coding scheme, including the reduction of redundancy in the pulse trains and the quantization and coding techniques for the pulse amplitudes and positions.
2. DESIGN OF A CRITICAL BAND AUDITORY FILTERBANK
A filterbank that models the characteristics of the human hearing system will have many desirable features and can have wide applications in speech and audio processing. It is very difficult and costly to experimentally observe the motion of the basilar membrane in a fully functional cochlea. We present here an inexpensive method for generating psychoacoustic tuning curves from the well-known auditory masking curves [16,17]. Two approaches to obtaining a critical band filterbank that models these tuning curves are then introduced. The first approach is based on the Log-Modeling technique for filter design, which gives very accurate results. The second approach uses a unified transfer function to represent each filter in the critical band filterbank.
2.1 Generation of Psychoacoustic Tuning Curves from Masking Curves
Masking is usually described as the sound-pressure level of a test sound necessary to be barely audible in the presence of a masker. Using narrow-band noise of a given center frequency and bandwidth as maskers and a pure tone as the test sound, masking patterns have been obtained by Zwicker and Fastl [16,17]. The effect of masking produced by narrow-band maskers is level dependent. The five curves plotted as solid lines in Fig. 1 are the
  • 30. masking patterns centered at 1 kHz at five different levels, the highest being 100 dB [17]. It is known that the shapes of the masking patterns for different center frequencies and different levels are very similar when plotted using the critical band rate scale. Hence masking curves at different center frequencies can be obtained by simply shifting the available masking curves along the critical band rate scale. Masking curves at levels other than the five measured ones can be generated through interpolation. The masking curves obtained through interpolation and shifting are shown in Fig. 1 by the dashed lines. The tuning curves can be obtained from the masking curves as follows. The first step is to fix a test tone at a particular frequency and level. Then the masking curves with different center frequencies that are just able to mask the test tone are found and the corresponding levels are noted. Plotting the levels as a function of the center frequencies provides the tuning curve at that test tone frequency (Fig. 2). The magnitude response of the basilar membrane (or auditory filters) can be obtained by vertically reversing and scaling the tuning curves in Fig. 2. This is shown in later subsections in Fig. 3 and 4 by the dashed lines. More details can be found in [11]. The tuning curves are consistent with the measurement of nerve tuning curves [8] and the basilar membrane response [14]. Two auditory filter design techniques that model the magnitude response accurately are introduced in the next subsection.
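The procedure for turning masking curves into a tuning curve can be sketched in a few lines. The example below is a toy illustration, not the authors' data or method: the triangular masking pattern, its slopes and its level dependence are assumed stand-ins for the measured curves of Zwicker and Fastl, and the Hz-to-Bark conversion uses the commonly quoted Zwicker approximation. For a fixed test tone, it searches at each masker center frequency for the masker level whose masking pattern just reaches the tone level.

```python
import math


def hz_to_bark(f_hz):
    """Critical band rate (Bark), Zwicker's commonly used approximation."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def masked_threshold(masker_bark, masker_level_db, tone_bark):
    """Toy masking pattern: a triangle on the Bark scale (assumed shape)."""
    dz = tone_bark - masker_bark
    if dz < 0:
        slope = 27.0  # assumed steep flank below the masker, dB per Bark
    else:
        # Assumed upper flank that flattens as the masker level rises.
        slope = max(5.0, 24.0 - 0.2 * masker_level_db)
    return masker_level_db - slope * abs(dz)


def tuning_curve(tone_bark, tone_level_db, masker_barks):
    """Masker level that just masks the tone, for each masker position."""
    curve = []
    for zb in masker_barks:
        lo, hi = 0.0, 200.0
        for _ in range(60):  # bisection on the masker level
            mid = 0.5 * (lo + hi)
            if masked_threshold(zb, mid, tone_bark) >= tone_level_db:
                hi = mid
            else:
                lo = mid
        curve.append(hi)
    return curve


tone = hz_to_bark(1000.0)
maskers = [tone - 2, tone - 1, tone, tone + 1, tone + 2]
for zb, level in zip(maskers, tuning_curve(tone, 20.0, maskers)):
    print(f"masker at {zb:5.2f} Bark needs {level:5.1f} dB")
```

With these assumed slopes the curve has its minimum at the tone frequency and rises more steeply for maskers above the tone than for maskers below it. Flipping such a curve vertically gives the kind of filter magnitude response with a steep upper-frequency slope that the text describes.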
2.2 Filterbank Design by the Log-Magnitude Modeling Technique

It is well known that the human auditory system gives rise to a perception of loudness that closely follows a logarithmic scale. Log-magnitude modeling is a technique for IIR digital filter design [6]. This technique has also been applied in [10] to the modeling of auditory tuning curves. The result is a very accurate model that matches the magnitudes of the tuning curves.

The criterion for auditory filter design is based on the minimization of the difference between the log-magnitude of the desired basilar membrane frequency response and that of a pole-zero filter. The transfer function (1) of one filter in a critical band rate filterbank is a pole-zero function whose coefficients are the filter parameters, where P is the number of poles and Q is the number of zeros. The filter design technique minimizes J (2), the sum of squared differences, on a logarithmic scale, between a given set of spectral amplitudes and the magnitude response of the filter sampled at the same frequencies. The frequencies are uniformly spaced, and the spectral amplitudes are the desired basilar membrane frequency response (positive magnitude values) at a certain center frequency.

The minimization of J with respect to the filter parameters is a nonlinear problem. To avoid gradient-based optimization, an iterative procedure originally proposed in [6] is used. At step m the minimization index is written as a weighted criterion (3), and the filter at step m is computed from (4). The solution of (4) is used to update the weight function in (3) and the process is then repeated. The complete algorithm converges to a sufficiently small error within 2 to 3 iterations. The details of this procedure can be found in [6,10].

A critical band filterbank of 17 filters covering the frequency range of 50 Hz to 4000 Hz was obtained by this design technique. The frequency response of the 17 filters is shown in Fig. 3 by the solid lines, together with the vertically flipped tuning curves by the dashed lines. These filters are minimum-phase IIR filters with 8 poles and 7 zeros. The magnitude responses of the digital filters are almost indistinguishable from the true tuning curves.
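The log-magnitude criterion of this subsection can be made concrete with a small numerical sketch. The chapter's own equations did not survive extraction, so the form used here is an assumption: a pole-zero filter H(z) = (sum of b[k] z^-k) / (sum of a[k] z^-k) with a[0] = 1, and an error equal to the sum over sampled frequencies of (ln D - ln |H|)^2, where D is the desired magnitude. The names `freq_response` and `log_magnitude_error` are illustrative only, and the iterative weighted solution of [6] is not reproduced.

```python
import numpy as np


def freq_response(b, a, omega):
    """H(e^{jw}) for H(z) = sum_k b[k] z^-k / sum_k a[k] z^-k (a[0] = 1)."""
    zinv = np.exp(-1j * np.asarray(omega))
    num = np.polyval(b[::-1], zinv)  # polyval wants the highest power first
    den = np.polyval(a[::-1], zinv)
    return num / den


def log_magnitude_error(b, a, omega, desired):
    """Sum of squared differences between log magnitudes (assumed form of J)."""
    h = np.abs(freq_response(b, a, omega))
    return float(np.sum((np.log(desired) - np.log(h)) ** 2))


# Uniformly spaced frequencies and a toy "desired" response (assumption).
omega = np.linspace(0.05, np.pi - 0.05, 64)
b = [1.0, -0.5]
a = [1.0, -0.9, 0.5]
target = np.abs(freq_response(b, a, omega))

j_exact = log_magnitude_error(b, a, omega, target)            # perfect match
j_gain = log_magnitude_error([2.0, -1.0], a, omega, target)   # 2x gain error
```

Because the error is measured on a logarithmic scale, a uniform gain error of a factor of two contributes the same penalty, (ln 2)^2, at every frequency, regardless of whether the response is near its peak or deep in the skirts. That property is what makes the criterion suitable for matching tuning curves over a wide dynamic range.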
2.3 Filterbank Design by Direct Modeling Approach

A unified digital filter model is proposed in [11] to represent the frequency characteristics of all the tuning curves. The transfer function of one auditory filter in a critical band filterbank is expressed in the z-domain by (5). The parameters in (5) are given by (6) in terms of the sampling frequency, the critical bandwidth and the center frequency. The critical bandwidth and the center frequency in (6) are calculated from equations given in [16, 17] as functions of the critical band rate in Bark. The spacing of the filters is linear on a critical band scale.

One term of (5) produces a notch filter with a sharp dip at a point to the right of the center frequency, so that the upper-frequency slope of the overall filter is steep enough. To ensure that the notch occurs at a point about 60 dB below the response at the center frequency, an empirical formula that we obtained can be used to choose the notch parameter, with the frequency expressed in Hz. The remaining parameters of (5) are set to fixed values.

The frequency responses of five filters at critical bands 4, 7, 10, 13 and 16 are plotted in Fig. 4, together with the corresponding tuning curves. The modeling accuracy of this direct modeling approach is acceptable, and the approach is more straightforward than the log-magnitude modeling approach. Our filters are also compared with the well-known Gammatone auditory filters [5,13]. Our filters have steeper upper-frequency slopes, which is desirable for both accurate modeling of the masking effect and noise suppression.

Critical band filters designed using this method can achieve both high frequency-domain accuracy and computational efficiency. Next we will apply the critical band auditory filterbank to speech and audio processing.
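As an illustration of how center frequencies and critical bandwidths can be derived from the critical band rate, the sketch below uses the widely quoted Zwicker and Terhardt approximations. Whether the chapter's lost equations have exactly this form is an assumption, and placing the 17 filters at integer Bark values from 1 to 17 is a hypothetical choice made only for the example.

```python
import math


def hz_to_bark(f_hz):
    """Critical band rate in Bark (Zwicker and Terhardt approximation)."""
    return (13.0 * math.atan(0.76 * f_hz / 1000.0)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))


def critical_bandwidth_hz(f_hz):
    """Critical bandwidth in Hz at center frequency f_hz (same source)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


def bark_to_hz(z, lo=0.0, hi=20000.0):
    """Invert hz_to_bark by bisection; the mapping is monotonic in f."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if hz_to_bark(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical layout: one filter per Bark, linear on the critical band scale.
centers_hz = [bark_to_hz(z) for z in range(1, 18)]
bandwidths_hz = [critical_bandwidth_hz(f) for f in centers_hz]
```

Spacing the filters uniformly in Bark makes them densely packed and narrow at low frequencies and progressively wider at high frequencies, which mirrors the frequency resolution of the ear.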
3. PERCEPTUAL DOMAIN BASED SPEECH AND AUDIO CODING

3.1 Speech/audio Coding Using an Auditory Filterbank

The speech and audio coding system implemented in this work is an IIR/FIR analysis/synthesis scheme as described in [9] and also shown in Figs. 5 and 6. Other possible analysis/synthesis filterbank implementations can also be found in [9]. Each IIR analysis filter has 8 poles and 3 zeros. The analysis filterbank can also be implemented in FIR form [3,7], but at least 100 coefficients are required for each FIR filter to approximate the impulse response of the IIR filter with reasonable accuracy.

The auditory filterbank is also approximately power-complementary. That is, $\sum_{i=1}^{M} |H_i(e^{j\omega})|^2 \approx 1$, where $H_i(e^{j\omega})$ is the frequency response of the analysis filter at the ith channel and M is the total number of channels. If we choose the synthesis filters as $G_i(z) = H_i(z^{-1})$, then the synthesis filterbank is implemented using FIR filters obtained by time-reversal of the impulse responses of the corresponding analysis filters. The reconstruction is nearly perfect, i.e. $\sum_{i=1}^{M} H_i(z) G_i(z) \approx 1$. Each FIR synthesis filter has 128 coefficients, so that an 8 ms delay is required to make the filter causal if the sampling frequency is 16 kHz.
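The link between power complementarity and time-reversed synthesis filters can be checked numerically. The sketch below is not the chapter's filterbank: the analysis impulse responses are random placeholders (an assumption), so the bank is not power-complementary. It only demonstrates that filtering with an analysis filter and then with its time-reversed copy, summed over channels, yields an overall response equal to the sum of the squared magnitude responses apart from a pure delay.

```python
import numpy as np

rng = np.random.default_rng(0)
M, L = 4, 32  # hypothetical number of channels and filter length
# Placeholder impulse responses standing in for truncated analysis filters.
h = rng.standard_normal((M, L))


def analyze_synthesize(x, h):
    """Pass x through each analysis filter and its time-reversed synthesis filter."""
    y = np.zeros(len(x) + 2 * (h.shape[1] - 1))
    for h_i in h:
        sub = np.convolve(x, h_i)          # analysis filter
        y += np.convolve(sub, h_i[::-1])   # time-reversed synthesis filter
    return y


# Overall impulse response of the analysis/synthesis cascade.
overall = analyze_synthesize(np.array([1.0]), h)

n_fft = 128
overall_mag = np.abs(np.fft.fft(overall, n_fft))
power_sum = np.sum(np.abs(np.fft.fft(h, n_fft, axis=1)) ** 2, axis=0)
```

The overall impulse response is symmetric about sample L - 1, so the cascade has linear phase and a delay of L - 1 samples. When the sum of squared magnitudes is close to one at all frequencies, the cascade therefore reduces to a delayed copy of the input, which is the near-perfect reconstruction described in the text.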
The output of each filter is half-wave rectified, and the positive peaks of the critical band signals are located. Physically, the half-wave rectification process corresponds to the action of the inner hair cells, which respond to movement of the basilar membrane in one direction only. Peaks correspond to higher rates of neural firing at larger displacements of the inner hair cell from its position at rest. This process results in a series of critical band pulse trains, where the pulses retain the amplitudes of the critical band signals from which they were derived.

3.2 Auditory Masking

In recognition of the fact that lower power components of the critical band signals are rendered inaudible by the presence of larger power components in neighboring critical bands, a simultaneous masking model is employed. Weak signal components also become inaudible in the presence of stronger signal components in the same critical band that precede or follow them in time, and this is called temporal masking. When the signal precedes the masker in time, it is called pre-masking; when the signal follows the masker in time, the condition is called post-masking. A strong signal can mask a weaker signal that occurs after it and a weaker signal that occurs before it [2, 16, 17]. Both temporal pre-masking and temporal post-masking are employed in this work to reduce the number of pulses.

3.2.1 Simultaneous Masking

In the implementation described, a simultaneous masking model similar to that used in MPEG [4] was employed to calculate the masking threshold for the ith critical band; however, the optimum simultaneous masking model for this scheme has yet to be determined. The simultaneously masked pulse train for the ith critical band was obtained as follows: pulses in the unmasked pulse train whose amplitudes were below the masking threshold calculated for that critical band were considered inaudible and were set to zero. Note that for each 32 ms frame, the gain of each critical band is calculated based only on the non-zero pulse amplitudes.

The purpose of applying simultaneous masking is to produce a more efficient and perceptually accurate parameterization of the firing pulses occurring in each band. Experiments revealed that simultaneous masking removed an average of around 10% of the pulses without altering the quality of the reconstructed speech in any way.

3.2.2 Temporal Post-masking

The masking threshold for temporal post-masking decays approximately exponentially following each pulse, or neural firing. A simple approximation to this masking threshold, introduced in [3], is an exponential decay applied along the discrete time index to each of the M = 21 simultaneously masked critical band pulse train signals. The time constants were determined empirically by listening to the quality of the reconstructed speech. All pulses with amplitudes less than the masking threshold were discarded. The thresholds are shown in Fig. 7 by the dashed line, where the filled spikes are the pulses to be kept after applying post-masking.

3.2.3 Temporal Pre-masking

Pre-masking is also allowed for in this work. The masking threshold for temporal pre-masking is computed from the ith critical band pulse train after post-masking, with a parameter chosen to simulate the fast exponential decay of pre-masking. All pulses with amplitude less than the masking threshold were discarded. This is shown in Fig. 8, where the filled spikes are the pulses to be kept after applying pre-masking. A reduction rate of 10% can be achieved by pre-masking on the pulses obtained after post-masking.

The purpose of applying masking is to produce a more efficient and perceptually accurate parameterization of the firing pulses occurring in each band. Experiments show that the application of temporal masking reduces the overall pulse number to about 0.70N (where N is the frame size) while maintaining transparent quality of the coded speech and audio. This is a significant improvement over the pulse number of 1.26N in the previous application [3], which used Gammatone filters in the front end. The improvement is mainly due to the spectral shape of the new auditory filters used in this work.
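The peak picking and temporal masking steps can be sketched as follows. The chapter's threshold formulas and time constants were lost in extraction, so the recursion used here is an assumption: the threshold is reset to the amplitude of each retained pulse and is multiplied by a per-sample factor `decay` afterwards. Pre-masking is modelled as the same rule run backwards in time. Function names and decay values are hypothetical.

```python
import numpy as np


def pick_peaks(x):
    """Half-wave rectify x and keep only its positive local maxima."""
    r = np.maximum(np.asarray(x, dtype=float), 0.0)
    pulses = np.zeros_like(r)
    inner = r[1:-1]
    is_peak = (inner > r[:-2]) & (inner >= r[2:]) & (inner > 0)
    pulses[1:-1][is_peak] = inner[is_peak]
    return pulses


def post_mask(pulses, decay):
    """Discard pulses below a threshold that decays exponentially after each kept pulse."""
    pulses = np.asarray(pulses, dtype=float)
    kept = np.zeros_like(pulses)
    threshold = 0.0
    for n, p in enumerate(pulses):
        threshold *= decay              # exponential decay, one step per sample
        if p > 0 and p >= threshold:
            kept[n] = p
            threshold = p               # a kept pulse resets the threshold
    return kept


def pre_mask(pulses, decay):
    """Apply the same rule backwards in time, with a faster decay in practice."""
    return post_mask(np.asarray(pulses, dtype=float)[::-1], decay)[::-1]


train = pick_peaks([0, 1, 0, -2, 0, 3, 1])
after_post = post_mask([1.0, 0, 0.5, 0, 0, 0, 0, 0, 0.3], decay=0.8)
after_pre = pre_mask([0.1, 0, 1.0], decay=0.5)
```

In this toy run the pulse of amplitude 0.5 that follows shortly after a pulse of amplitude 1.0 is discarded, while the later pulse of amplitude 0.3 survives because the threshold has decayed below it by then. One design choice made here is that a discarded pulse does not raise the threshold; the chapter does not say which convention it uses.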
3.2.4 Thresholding

The pulses in the silent frames obtained after auditory filtering and peak picking are most likely due to background and quantization noise. These pulses are at random positions and their magnitudes are very small, so that the sound synthesized from these pulses is inaudible. By thresholding, these pulses can be eliminated without affecting the quality of the synthesized signal. A simple approach is to choose the threshold based on the silent frames at the beginning of the coding process.

3.3 Quantization and Coding

The pulse train in each critical band after redundancy reduction was finally normalized by the mean of its non-zero pulse amplitudes across the frame. Thus, the parameterization consists of the critical band gains (incorporating the normalization factors) and a series of critical band pulse trains with normalized amplitudes. For each frame, the signal parameters required for coding are the gains of the critical bands and the amplitudes and positions of the pulses.

3.3.1 Pulse Amplitudes

Each critical band gain is quantized to 6 bits and the amplitude of each pulse is quantized to 1 bit, which does not result in any perceivable deterioration in the quality of the reconstructed speech or audio signal. Alternatively, vector quantization can be adapted to reduce the bits required for coding the amplitude [3].

3.3.2 Pulse Positions

The pulse positions are coded using a new run-length coding technique. After temporal masking and thresholding, most locations on the time-frequency map have zero pulses. This suggests that we can just code the relative positions of neighboring pulses or the numbers of zeros between them. Specifically, the data in all channels within one frame is concatenated into one large vector and is scanned for pulses. Then the number of zeros preceding each pulse is coded using 7 bits. If the number of zeros is over 128, a code word of 0000000 is generated and the counting of zeros restarts after the 128 zeros. If, during the decoding process, seven consecutive zeros are encountered, then no pulse will be generated and the decoding carries on to the next code word. This coding strategy is a form of run-length coding and is lossless.

The overall average bit rate resulting from this coding scheme is 58 kbps. This is an improvement upon the 69.7 kbps in the previous work [3]. By exploring the statistical correlations and redundancy among the pulses, Huffman or arithmetic coding can be applied to further reduce the bit rate.

The synthesis process starts with decoding to obtain the pulse train for each channel, and then filtering the pulse train by the corresponding FIR synthesis filter. Summing the outputs from all filters results in the reconstructed speech or audio signal, which is perceptually the same as the original.

The results at different stages are shown in Figs. 9-12, where Fig. 9 is the original speech signal, Fig. 10 shows the pulses obtained from peak-picking, Fig. 11 shows the pulses retained after applying auditory masking, and Fig. 12 is the reconstructed speech.
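A run-length position coder of the kind described in Section 3.3.2 can be sketched as below. The exact code word mapping of the chapter is not recoverable from the text, so this sketch makes its own assumptions: code word 0 is the escape meaning "a full run of zeros and no pulse", code word k (1 to 127) means "k - 1 zeros followed by a pulse", and the escape run length is 2^7 - 1 = 127 rather than the 128 quoted in the text, which keeps the toy codec exactly invertible. Pulse amplitudes are ignored; only positions are coded.

```python
def rle_encode(pulses, bits=7):
    """Encode pulse positions as zero-run lengths, one code word per pulse or escape."""
    escape_run = (1 << bits) - 1       # 127 zeros trigger the escape word
    codes, run = [], 0
    for p in pulses:
        if p == 0:
            run += 1
            if run == escape_run:
                codes.append(0)        # escape: full run of zeros, no pulse
                run = 0
        else:
            codes.append(run + 1)      # run zeros, then one pulse
            run = 0
    return codes                       # trailing zeros are implied by the length


def rle_decode(codes, length, bits=7):
    """Rebuild the 0/1 pulse position vector of the given length."""
    escape_run = (1 << bits) - 1
    out = []
    for c in codes:
        if c == 0:
            out.extend([0] * escape_run)
        else:
            out.extend([0] * (c - 1))
            out.append(1)
    out.extend([0] * (length - len(out)))
    return out


frame = [0] * 300
for position in (5, 6, 290):
    frame[position] = 1
codes = rle_encode(frame)
restored = rle_decode(codes, len(frame))
```

In this example three pulses in 300 positions cost five 7-bit code words, 35 bits, instead of 300 bits for a raw position bitmap. The saving grows as masking and thresholding make the pulse map sparser.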
4. CONCLUSIONS

Design techniques for a new critical band auditory filterbank that models the psychoacoustic tuning curves have been proposed. The auditory filterbank has been applied to speech and audio coding. The filterbank is implemented as an IIR/FIR analysis/synthesis scheme to reduce computation. Auditory masking is applied to reduce the number of pulses. A simple run-length coding algorithm is used to code the positions of the pulses. The reconstructed speech or audio signals are perceptually transparent. The overall average bit rate resulting from this coding scheme is 58 kbps. The filterbank has superior masking properties and the auditory-system-based coding paradigm produces high quality coded speech or audio, is highly scalable, and is of moderate complexity. Current research involves investigating the use of Huffman coding or arithmetic coding techniques to further reduce the bit rate by examining the statistical correlation and redundancy among the pulses.

REFERENCES

[1] Ambikairajah, E., Black, N.D. and Linggard, R., “Digital filter simulation of the basilar membrane”, Computer Speech and Language, vol. 3, 1989, pp. 105-118.
[2] Ambikairajah, E., Davis, A.G. and Wong, W.T.K., “Auditory masking and MPEG-1 audio compression”, Electr. & Commun. Eng. Journal, vol. 9, no. 4, August 1997, pp. 165-197.
[3] Ambikairajah, E., Epps, J. and Lin, L., “Wideband speech and audio coding using Gammatone filter banks”, Proc. ICASSP, 2001, pp. 773-776.
[4] Black, M. and Zeytinoglu, M., “Computationally efficient wavelet packet coding of wide-band stereo audio signals”, Proc. ICASSP, 1995, pp. 3075-3078.
[5] Flanagan, J.L., “Models for approximating basilar membrane displacement”, Bell Sys. Tech. J., vol. 39, 1960, pp. 1163-1191.
[6] Kobayashi, T. and Imai, A., “Design of IIR digital filter with arbitrary log magnitude function by WLS techniques”, IEEE Trans. ASSP, vol. ASSP-38, 1990, pp. 247-252.
[7] Kubin, G. and Kleijn, W.B., “On speech coding in a perceptual domain”, Proc. ICASSP, 1999, pp. 205-208.
[8] Liberman, M.C., “Auditory-nerve response from cats raised in a low-noise chamber”, J. Acoust. Soc. Am., vol. 63, 1978, pp. 442-455.
[9] Lin, L., Holmes, W.H. and Ambikairajah, E., “Auditory filter bank inversion”, Proc. ISCAS, 2001, vol. 2, pp. 537-540.
[10] Lin, L., Ambikairajah, E. and Holmes, W.H., “Log-magnitude modelling of auditory tuning curves”, Proc. ICASSP, 2001, pp. 3293-3296.
[11] Lin, L., Ambikairajah, E. and Holmes, W.H., “Auditory filterbank design using masking curves”, Proc. EUROSPEECH, 2001, pp. 411-414.
[12] Lyon, R.F., “A computational model of filtering detection and compression in the cochlea”, Proc. ICASSP, 1982, pp. 1282-1285.
[13] Patterson, R.D., Allerhand, M. and Giguere, C., “Time-domain modelling of peripheral auditory processing: a modular architecture and a software platform”, J. Acoust. Soc. Am., vol. 98, 1995, pp. 1890-1894.
[14] Rhode, W.S., “Observation of the vibration of the basilar membrane of the squirrel monkey using the Mossbauer technique”, J. Acoust. Soc. Am., vol. 49, 1971, pp. 1218-1231.
[15] Robert, A. and Eriksson, J., “A composite model of the auditory periphery for simulating responses to complex sounds”, J. Acoust. Soc. Am., vol. 106, 1999, pp. 1852-1864.
[16] Zwicker, E. and Zwicker, U.T., “Audio engineering and psychoacoustics: matching signals to the final receiver, the human auditory system”, J. Audio Eng. Soc., vol. 39, no. 3, 1991, pp. 115-125.
[17] Zwicker, E. and Fastl, H., Psychoacoustics: Facts and Models. Springer-Verlag, 1999.
Chapter 3

RECOGNITION OF ENVIRONMENTAL SOUNDS USING SPEECH RECOGNITION TECHNIQUES

Michael Cowling and Renate Sitte
Griffith University, Gold Coast, Qld 9726, Australia

Abstract: This paper discusses the use of speech recognition techniques in non-speech sound recognition. It analyses the different techniques used for speech recognition and identifies those that can be used for non-speech sound recognition. It then performs benchmarks on these techniques and determines which technique is better suited for non-speech sound recognition. As a comparison, it also gives results for the use of learning vector quantization (LVQ) and artificial neural network (ANN) techniques in speech recognition.

Key words: non-speech sound recognition, environmental sound recognition, artificial neural networks, learning vector quantization, dynamic time warping, long-term statistics, mel-frequency cepstral coefficients, homomorphic cepstral coefficients

1. INTRODUCTION

It has long been a goal of researchers around the world to build a computer that displays features and characteristics similar to those of human beings. The research of Brooks [1] is an example of developing human-like movement in robots. However, another subset of this research is to develop machines that have the same sensory perception as human beings. This work finds its practical application in the wearable computer domain (e.g. certain cases of deafness where a bionic ear (cochlear implant) cannot be used).

Humans use a variety of different senses in order to gather information about the world around them. If we were to list the classic five human senses in order of importance, it is generally accepted that we would come up with the sequence: vision, hearing, touch, smell, taste. Vision is undoubtedly the most important sense, with hearing being the next most important, and so on. However, despite the fact that hearing is a human being's second most important sense, it is all but ignored when trying to build a computer that has human-like senses. The research that has been done into computer hearing revolves around the recognition of speech, with little research done into the recognition of non-speech environmental sounds.

This chapter expands upon the research done by the authors [2, 3]. In these papers, a prototype system is described that recognizes 12 different environmental sounds (as well as performing direction detection in 7 directions in a 180° radius). This system was implemented using Learning Vector Quantization (LVQ), because LVQ is able to produce and modify its classification vectors so that multiple sounds of a similar nature are still considered as separate classes. However, no comparative testing was done to ensure that LVQ was the best method for the implementation of a non-speech sound classification system.

Therefore, this chapter will review the various techniques that can be used for non-speech recognition and perform benchmark tests to determine the technique most suited for non-speech sound recognition. Due to lack of research into non-speech classification systems, this chapter will focus on using speech and speaker recognition techniques applied to the domain of environmental non-speech sounds.

The remainder of this chapter will be split into four sections. The first section will discuss techniques that have been previously used for speech recognition and identify those techniques that could also be applied to non-speech recognition. The second section will show the results of benchmarks on these techniques and also compare their performance with results for speech recognition. The third section of this chapter will discuss these results. Finally, the fourth section will conclude and suggest areas for future research.

2. SELECTION OF TECHNIQUES

Research into speech recognition began by reviewing the literature and finding techniques that had previously been used for speech/speaker recognition. Techniques for both feature extraction and system learning were analyzed and those techniques that could be used for non-speech sound recognition were identified. These techniques were then benchmarked and results will be presented in the Results section.
2.1 Feature Extraction

For feature extraction, the literature review showed that speech recognition relies on only a few different types of feature extraction techniques (each with several different variations). Eight techniques were selected as possible candidates for feature extraction of non-speech sounds. These were:
- Frequency Extraction
- LPC Cepstral Coefficients
- Homomorphic Cepstral Coefficients
- Mel Frequency Cepstral Coefficients
- Mel Frequency LPC Cepstral Coefficients
- Bark Frequency Cepstral Coefficients
- Bark Frequency LPC Cepstral Coefficients
- Perceptual Linear Prediction Features

A specific investigation was then performed for each of these eight techniques. This investigation revealed that techniques based on LPC Cepstral Coefficients rely on the idea of a vocoder, which is a simulation of the human vocal tract. Since the human vocal tract does not produce environmental sounds, these techniques are not appropriate for recognition of non-speech sounds.

In addition, Lilly [4] mentions that the results of the Mel Frequency Based Filter and the Bark Frequency filter are similar, mainly due to the similar nature of these filters. Gold [5] also mentions that PLP and Mel Frequency are similar techniques. Based on these previous findings, only the more popular Mel Frequency technique was selected for benchmarking.

This leaves three feature extraction techniques to be tested:
- Frequency Extraction
- Homomorphic Cepstral Coefficients
- Mel Frequency Cepstral Coefficients

In addition, it was found that emerging research in speech recognition suggests the use of time-frequency techniques such as wavelets. Due to the emerging nature of this research, these techniques will not be included in this comparison. However, for an insight into how wavelets can be used for speaker recognition, please refer to the chapter in this volume by Michael Orr et al., "A Novel Dual Adaptive Approach to Speech Processing".

2.2 System Learning

The following system learning techniques are commonly used for speech/speaker recognition or have, in the past, been used for this application domain. They are:
- Dynamic Time Warping (DTW)
- Hidden Markov Models (HMM)
- Vector Quantization (VQ) / Learning Vector Quantization (LVQ)
- Self-Organizing Maps (SOM)
- Ergodic HMMs
- Artificial Neural Networks (ANN)
- Long-Term Statistics

To aid in selection of techniques, comparison tables were built (using [5, 6, 7, 8]) to compare the different feature extraction and classification methods used by each of these techniques. The comparison tables showed that some of these techniques, by their very nature, could not be used for non-speech sound recognition. Any of the techniques that use subword features are not suitable for non-speech sound identification. This is because environmental sounds lack the phonetic structure that speech has. There is no set “alphabet” that certain slices of non-speech sound can be split into, and therefore subword features (and the related techniques) cannot be used.

Due to the lack of an environmental sound alphabet, the Hidden Markov Model (HMM) based techniques shown above will be difficult to implement. However, this technique may be revisited in the future if other techniques produce lower than expected results.

In addition, it was decided that the SOM and LVQ techniques complement each other. Kohonen developed both techniques, with specific applications intended for each technique. For classification, Kohonen suggests the use of the LVQ technique over the SOM technique [9]. Therefore, LVQ will be the technique benchmarked.

Based on this information, the four techniques left to be tested are:
- Dynamic Time Warping
- Long-Term Statistics
- Vector Quantization / Learning Vector Quantization
- Artificial Neural Networks
3. ANALYTICAL ANALYSIS OF SPEECH RECOGNITION TECHNIQUES

This section will detail how each of the techniques listed above was implemented in this system. It will also discuss the details of the experiment (such as the number of sounds).

3.1 Experiment Setup

As an initial test, eight sounds were used, each with six different samples. Data set size was kept as small as possible due to the time it takes to train larger data sets. The sounds used for this test are detailed below and are some typical sounds that would be classified in a sound surveillance system.

The techniques will be tested using a jackknife method, identical to the method used by Goldhor [10]. A jackknife testing procedure involves training the network with all of the data except the sound that will be tested. This sound is then tested against the network and the classification is recorded. In cases where the setting of initial weights may affect the classification result (as is the case with LVQ and ANN techniques), classification is repeated 5 times, with different initializations each time. A correct classification is only recorded if more than three of the training runs are correct. This jackknife procedure will be repeated with all six of the samples from each of the eight sounds.

3.2 Benchmarking Method

The feature extraction and system learning techniques shown in the comparison will be tested for their ability to classify non-speech sounds in two ways. First, benchmarking will be performed, using these techniques, on non-speech sounds, and data on the parameters, the resulting time taken and the final correct classification rate will be recorded. Then, these results will be compared with statistics and benchmark results reported in the literature for the performance of these techniques on speech. This will demonstrate how these techniques perform against each other on speech and provide a comparison to the results for non-speech.

In addition, since feature extraction and system learning are both required to recognize a sound, each system learning technique should also be tested against each feature extraction technique to determine the best combination of these two techniques. The exception to this is the Long-Term Statistics technique, which generates its own features and therefore requires no feature extraction techniques. Therefore, ten combinations of techniques must be benchmarked: each of the three feature extraction techniques paired with each of DTW, LVQ and ANN, plus Long-Term Statistics on its own.

3.3 Methodology

Each of the techniques used was implemented in MATLAB. Both feature extraction and system learning techniques were implemented and then combined together in the way shown above in order to perform a comprehensive comparison. In this section, the implementation of both the feature extraction and system learning techniques will be discussed.

3.3.1 Feature Extraction Techniques

Three feature extraction techniques will be tested in this comparison. The implementation of each of these techniques will be discussed in this section.

3.3.1.1 Frequency Extraction

Frequency Extraction was performed using the Fast Fourier Transform (FFT) routine in MATLAB, which uses the following equation for FFT:
$X(k) = \sum_{j=1}^{N} x(j)\,\omega_N^{(j-1)(k-1)}$ with $\omega_N = e^{-2\pi i/N}$, where k is the frequency we wish to check for, j counts all the samples in the signal and N is the length of the signal being tested. Since non-speech sound covers a wider frequency range than speech (anywhere from 0 Hz to 20,050 Hz, the approximate limit of human hearing), a 44,100 point FFT (N = 44100) was performed and the results (22,050 unique features) were used to train the system learning network.

3.3.1.2 Mel-Frequency Cepstral Coefficients

The MFCC algorithm was taken from the Auditory Toolbox by Malcolm Slaney of Interval Research Corporation [11]. This toolbox is in wide use in the research community. The toolbox applies three steps to produce the MFCC. First, it applies a Hamming window, using the standard Hamming window equation, to the subset of the signal being windowed. A mel (melody) frequency filterbank is then applied to each windowed segment. The mel frequency scale is a logarithmic function of the frequencies in the signal. Each filter is then multiplied by the spectrum (or portion of the spectrum if it has been split using Hamming windows) to produce a series of magnitude values (one for each filter). Finally, a cepstral coefficient formula (shown in the next section) is applied to produce the MFCC, and these features are then modified into a vector that is more appropriate for training a network. Special attention was paid to removing the first scalar within the vector, which represents the total signal power [5] and is therefore too sensitive to the amplitude of the signal [4].

3.3.1.3 Homomorphic Cepstral Coefficients

The MFCC algorithm from the Auditory Toolbox by Malcolm Slaney of Interval Research Corporation [11] was then used as a basis to implement a Homomorphic Cepstral Coefficient (HCC) algorithm. This algorithm was written from scratch but based on information from the source code of the MFCC algorithm. The HCC algorithm applies the cepstral coefficient formula directly to the signal after it has been split using Hamming windows. The cepstral coefficients are calculated over each windowed segment, where n is the length of the windowed segment being manipulated. These features were then modified into a vector that was more appropriate for training a network. As with the MFCC, special attention was paid to removing the first scalar within the vector, which represents the total signal power [5] and is therefore too sensitive to the amplitude of the signal [4].

3.3.2 System Learning Techniques

Four system learning techniques will be tested in this comparison. The implementation of each of these techniques will be discussed in this section.
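The cepstral feature extraction described above can be illustrated with a short sketch. It is not the Auditory Toolbox code and not the authors' MATLAB implementation. It assumes the common definition of the real cepstrum (inverse FFT of the log magnitude spectrum), uses NumPy's `np.hamming` for the window, and uses the frequently quoted mel approximation m = 2595 log10(1 + f/700), which may differ from the relation in the chapter. The function names `hz_to_mel` and `hcc` are hypothetical.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Common logarithmic mel-scale approximation (assumed, not from the chapter)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def hcc(frame, n_coeffs=12, eps=1e-12):
    """Homomorphic (real) cepstral coefficients of one frame, without c[0]."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))       # Hamming window
    log_mag = np.log(np.abs(np.fft.fft(windowed)) + eps)
    cepstrum = np.real(np.fft.ifft(log_mag))
    return cepstrum[1:n_coeffs + 1]                 # drop c[0] (overall level)


rng = np.random.default_rng(0)
frame = rng.standard_normal(256)   # stand-in for one windowed sound segment
features = hcc(frame)
features_louder = hcc(3.0 * frame)
```

Scaling the frame by a constant adds the same offset to every bin of the log spectrum, and that offset ends up entirely in the first cepstral coefficient. Dropping that coefficient therefore makes the remaining features insensitive to the recording level, which is the motivation the text gives for removing the first scalar.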
3.3.2.1 Learning Vector Quantization

Learning vector quantization (LVQ) was implemented using the inbuilt LVQ routines in MATLAB’s neural network toolbox. The network was initialized with 20 competitive neurons and a learning rate of 0.05. This combination was found to give an acceptable classification rate.

3.3.2.2 Artificial Neural Networks

The artificial neural network (ANN) was implemented using the fast back propagation algorithm (BPA) in the MATLAB neural network toolbox (trainbpx). The network was initialized with 20 hidden neurons and a learning rate of 0.05. In addition, the sum-squared error goal was set to 0.1 and the momentum constant was set to 0.95.

3.3.2.3 Dynamic Time Warping

Dynamic time warping (DTW) was implemented using the algorithm in the Auditory Toolbox developed by Malcolm Slaney [11]. The test signal was warped against each of the reference signals and the error was recorded. The smallest error was taken to represent the closest class of sound.

3.3.2.4 Long-Term Statistics

Long-Term Statistics (LTS) was implemented using the mean and covariance functions available in the standard MATLAB distribution, where N is the length of the signal x. Mean and covariance were calculated for each of the reference signals and stored in a matrix. The mean and covariance of the test signal were then compared to this matrix. The closest match was selected as the correct class. If the closest mean and the closest covariance occurred in different classes, the result was deemed inconclusive.

4. RESULTS & DISCUSSION

This section will cover the results of this research. Results are shown for the comparative study of existing speech recognition techniques when these techniques are applied to non-speech. In addition, a discussion is given on these results.
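For readers unfamiliar with the dynamic time warping classifier of Section 3.3.2.3 and the jackknife test of Section 3.1, the two can be combined in a compact sketch. This is a textbook DTW on one-dimensional sequences with an absolute-difference local cost, not the Auditory Toolbox routine, and the toy sequences and labels are invented for the example.

```python
import numpy as np


def dtw_distance(a, b):
    """Accumulated cost of the best warping path between sequences a and b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(cost[i - 1, j],      # stretch b
                                     cost[i, j - 1],      # stretch a
                                     cost[i - 1, j - 1])  # advance both
    return float(cost[n, m])


def jackknife_accuracy(samples):
    """Leave each (label, sequence) out and classify it by its nearest neighbour."""
    correct = 0
    for k, (label, seq) in enumerate(samples):
        rest = samples[:k] + samples[k + 1:]
        nearest = min(rest, key=lambda s: dtw_distance(seq, s[1]))
        correct += int(nearest[0] == label)
    return correct / len(samples)


# Hypothetical toy data: two classes with two samples each.
samples = [
    ("click", [0, 1, 0]),
    ("click", [0, 0, 1, 0]),
    ("hum", [5, 5, 5]),
    ("hum", [5, 5, 5, 5]),
]
accuracy = jackknife_accuracy(samples)
```

Because the warping path may repeat samples, two sequences that differ only in timing, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, have zero distance. That tolerance to differences in duration is the reason DTW is a candidate for sounds whose length varies between recordings.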
  • 52. 4.1 Results 4.1.1 Non-Speech Sound Recognition Results for non-speech sound recognition are presented below.
  • 54. 4.1.2 Speech Recognition For comparison, results were found for LVQ and ANN in speech recognition systems. These results are presented here. Because HMM methods currently dominate speech recognition, results for DTW are difficult to find; therefore no DTW results are presented. For ANNs, a selection of results from Castro and Perez [12] is shown below. Their results were taken on an isolated word recognition set with typically high classification error, the Spanish EE-set. The Multi-Layer Perceptron (MLP) tested used the back propagation algorithm, contained 20 hidden neurons and was trained over 2000 epochs with various numbers of inputs. The figures given are the MLP’s estimated error rate with a 95% confidence interval. For LVQ, results from Van de Wouver et al. [13] are shown below for both female and male voices. These results present statistics for both a standard LVQ implementation for speech recognition and an implementation of LVQ that then has fuzzy logic performed on it (FILVQ). As can be seen from the results, the use of LVQ for speech recognition produces rather low recognition rates.
  • 56. THE SPIRIT LEVEL This is necessary on outdoor structures which are to be placed on foundations, in securing level or horizontal timbers, and in plumbing the uprights. The human eye is not equal to the task. Masons and builders make use of wooden plumb rods, but as the level is necessary to secure the horizontals, it will be at hand for the uprights, the two glass tubes being at right angles. (Fig. 131.)
  • 57. Fig. 131. The spirit level RULE A two-foot, four-fold, boxwood rule, graduated to eighths outside and sixteenths inside, will answer all ordinary requirements. (Fig. 132.) THE STEEL SQUARE Fig. 132. Steel square and rule This simple but valuable tool, about which volumes have been written, is necessary for building construction, but is not needed in the making of furniture or cabinet work.
  • 59. Fig. 133. The nail box Fig. 134. Socket chisels XXIII MAKING NAIL BOXES The boys now became very busy completing their shop equipment, and the first project was a box for holding different sizes of nails. This was to be kept on the bench where it could be reached conveniently, and it is shown in Fig. 133. After studying the sketch, Harry made out the bill of material: 2 pcs. pine 15 × 1 3⁄4 × 1⁄2; 2 pcs. pine 3 × 1 3⁄4 × 1⁄2; 2 pcs. pine 3 1⁄2 × 1 3⁄4 × 3⁄8. These six pieces were squared up, and the joints for the two partitions laid out by placing them edge to edge in the vise. Pencil lines were drawn across the faces at random, a. Ralph explained that by fitting these pencil lines they could at any time bring the two pieces together in the original position. The four knife lines representing the edges of the grooves were next drawn, and squared half-way down on each edge, using the face with the pencil lines as a working face. The bottom of the groove was laid off with the marking gauge set at 1⁄4 inch. The wood inside the lines was removed by
  • 60. making a saw cut just inside the knife lines, and cutting out with a 3⁄8-inch chisel. This led to a talk on chisels. Ralph explained that for fine work a "firmer" chisel was used, having a comparatively thin body. There are two kinds of handles, known as "socket" and "tang." The chisels having "tangs" should never be hammered, as the tang acts as a wedge and splits the handle. Where blows are to be struck with the mallet, a socket handle should be used. (Fig. 134.) For heavy work, where hard blows are to be struck, as in house-framing, and out-of-door work generally, the heavy framing tool should be used. The handle of this chisel has a heavy iron ring near the top to keep it from going to pieces. Our boys' equipment at this time consisted of one half-inch and a one-inch firmer chisel with tang handles, a 1⁄8-inch and 3⁄8-inch socket firmer, and one 1⁄2-inch framing chisel. Later on they added a 1⁄4-inch firmer with tang handle. The grooves for the nail box were cut with the 3⁄8-inch chisel without the aid of the mallet. Ralph showed how, by inclining the tool at a slight angle, a paring action could be obtained, and by working from both ends of the groove no corners were destroyed. When the four grooves were finished, the box was ready for assembling. This called for hammer and nails. Wire nails are so cheap now that the old-fashioned cut nails have been largely driven from the market. The nails used on the box were one-inch brads. The holding power of flat-head nails is of course much greater than bung head, but in this case the box was to be squared up after nailing, exactly as if it were a solid block of wood. This meant planing the sides and ends, and as the nails would ruin the plane iron, they were all sunk below the surface with a nail set or punch. (Fig. 135). This is a useful tool, but not absolutely necessary, as for
  • 61. light work a wire nail, with the point ground flat on the grindstone, will answer the same purpose. A carpenter frequently uses the edge of a flat-head nail instead of the punch. Photograph by Arthur G. Eldredge The Correct Way to Hold the Chisel.
  • 62. Fig. 135. Wire nails and nail sets Fig. 135a Wire nails and nail sets The box was assembled by nailing together the sides and ends. The bottom was next put on, holding the try square along one side and end to make sure everything was square, and last of all the two partitions were pushed down into their grooves, and tied in place by one brad from each side. Next, all nails were set, and the outside tested with the try square and trued up with the plane. The cabinet of drawers shown in Fig. 136 was next designed to keep the assortment of screws and nails, which the boys knew would soon accumulate. As far as possible, they were kept in their original paper boxes, on which the sizes were plainly printed. The twelve drawers were simply boxes without covers or partitions, and Ralph suggested that it was not necessary to make them all at once, but that they could often fill in spare time that way, and gradually complete the dozen.
  • 63. Fig. 136. Cabinet for nails and screws After making the nail box with partitions, this was a simple job, it being only important that they all be of the same size. The construction of the cabinet, however, brought new problems. The shelves, being short, did not require any vertical support except at the ends, where they were gained into the sides, and to give Harry practice the top and bottom were to be "rabbeted" into the sides. The sides then were the most important parts. All six pieces were first squared up to the dimensions called for in the drawing. The list of material was as follows: 4 pcs. 24 5⁄8 × 12 × 1⁄2 shelves; 2 pcs. 14 × 12 × 1⁄2 ends; 1 pc. 25 1⁄8 × 14 × 1⁄4 back.
  • 64. "The grain must run the long way," said Ralph, "so the grooves will be across the grain." The four grooves were laid out with knife and try square, and the lines scored as deeply with the knife as possible. Then another cut was made with the knife inside of the first, and with the knife held at about 45 degrees, cutting out a V-shaped groove, as shown at a. In each of these grooves a cut with the back saw was made down to the line, and the wood removed with the 3⁄8-inch chisel. There are special planes, called rabbet planes, and plows for doing this kind of work, but it is good practice for beginners to use the chisel. The grooves finished, the cabinet was put together with 1 1⁄2-inch brads, except the back. This being of thin material, and having no special strain on it, was nailed on with 1-inch brads. The total width of the drawers in each tier was 1⁄8 inch less than the space. This gave clearance, so that they could be moved in or out easily. Later, when all twelve drawers were finished, the boys bought a dozen simple drawer pulls, and screwed one in the centre of each box. The centre was found by drawing the diagonals in light pencil lines. The front and ends were sand-papered, and given two coats of dark-green stain, and the cabinet was placed on a shelf against the wall.
  • 66. XXIV BIRD HOUSES The boys felt that they were ready for business, and Ralph suggested that they had provided enough weather vanes and windmills, but had made no provisions for the birds. The cat, that arch enemy of the native birds, had driven the robins, martins, and wrens all away. Each year some of these brave little birds started homes in the trees near the house only to have their families devoured as soon as they were hatched. A bird house to be attractive need not be very pretentious, but it must absolutely be cat-proof, or the birds will inspect it carefully from all points of view and leave it severely alone. A nest well hidden in the tree foliage or shrubbery is not nearly so conspicuous as a brightly painted house fastened to the limbs of a tree. The side of a barn or outhouse, far enough down from the roof so that the cat cannot reach it, or a tall pole covered on the upper part with tin, so that the feline bird hunter cannot gain a foothold, are about the only safe places for a house which the birds will actually adopt. The first house our woodworkers manufactured is shown in Fig. 137. This was a single or one-family house, and its construction was very simple. The list of material follows: One pc. 1⁄2-inch pine or white wood 10 × 6 1⁄2 ins.; two pcs. 1⁄2-inch pine or white wood 7 1⁄2 × 3 ins.; one pc. 1⁄2-inch pine or white wood 9 1⁄2 × 5 ins.; one pc. 1⁄2-inch pine or white wood 9 1⁄2 × 4 1⁄2 ins.; two pcs. 1⁄2-inch pine or white wood 5 1⁄4 × 4 1⁄2 ins.
  • 67. The first piece, 10 × 6 1⁄2 inches, was simply squared up for the bottom. The two pieces for the sides, 7 1⁄2 × 3 inches, were squared up, and one edge of each planed to a 45-degree bevel, to engage with the roof boards. The latter were squared up, and nailed together at right angles with 1 1⁄4-inch brads. The two ends, 5 1⁄2 × 4 1⁄2 inches, were carefully laid out as shown in the drawing, sawed, and planed to the lines with square edges. In the end which was to contain the circular door a hole 1 3⁄4 inches in diameter was bored with its centre two inches from the bottom line. This required the services of the extension bit, and, to avoid splitting the wood, as soon as the spur of the bit showed on the further side, the wood was turned about, and the hole finished from the other side. The house was next turned upside down, and fastened in the bench vise. Holes were drilled along the sides of the bottom piece 3⁄4 inch in from the edge—three on each side—countersunk, and the piece fastened to the sides with 1-inch No. 8 screws. The top pieces already nailed together were now nailed in position on the sides and ends with 1-inch brads.
  • 68. Fig. 137. One family bird house, and house for high-hole The pole they used was 13 feet long and about 3 inches in diameter at the small end. It was rounded at this end by using a draw knife. (Fig. 138). A block of 7⁄8-inch pine was bored out, and fitted snugly over the end of the pole. This block was then removed, and four holes bored through it for screws.
  • 69. Fig. 138. The draw knife Before replacing the block on the top of the pole a cut was made across the end of the pole about two inches deep, by means of the rip saw. The block was replaced, and wooden wedges driven into the saw cut. This fastened the block securely on the end of the pole, and after making sure that it was level, the bird house was fastened to the block by four 1 1⁄4-inch screws from the under side. A piece of sheet tin was wound around just under the house to discourage pussy, and the pole set into the ground about three feet, bringing the under side of the house ten feet above the ground. A double or two-family house of similar proportions was built next, as shown in Fig. 139. The list of material called for: One pc. 1⁄2-inch wood 18 1⁄2 × 6 1⁄2 (bottom); one pc. 1⁄2-inch wood 18 1⁄2 × 5 1⁄2 (roof); one pc. 1⁄2-inch wood 18 1⁄2 × 4 1⁄2 (roof); two pcs. 1⁄2-inch wood 15 1⁄2 × 3 (sides); three pcs. 1⁄2-inch wood 5 1⁄4 × 4 1⁄2 (ends and partition). The construction was the same as before, each end having a door, and the partition of course being solid. The block for supporting the house on the pole was larger, being 8 × 5 × 1 1⁄4 inches, and called for six 1 1⁄2-inch No. 10 screws, to secure it to the under side of the floor. Harry wanted to make it more complete by adding a small wind vane, but Ralph said it might frighten the birds, so it was omitted.
  • 70. Of course larger and more ornamental houses may be built, but where there are too many families in such close proximity there is apt to be trouble, while houses that are too conspicuous do not appeal to the beautiful American wild birds that we want to attract. With the English sparrow it does not matter so much. For these birds, a tenement house against the side of a barn may be built easily, in the form shown in Fig. 139. This may be made any length, each door leading to a compartment separated from the others by partitions. Make as many pieces plus one as there are to be compartments, apartments, or flats; have the bottom project as shown in side view for a perch and walk, and have the roof also project to shed rain. If not fastened from the inside of the barn by stout screws, this house must be secured to a shelf, or by brackets. Fig. 139. Two family house and tenement The side view shows a simple shelf made of a back piece secured to the side of the barn by screws or nails, a plain shelf nailed to this
  • 71. back piece, and two wooden brackets. If iron brackets are used, both the shelf and back piece may be omitted, the brackets being fastened to the under side of the bird house and to the siding of the barn by screws. For birds like the high-hole, or flicker, a piece of hollow log, or an elongated box fastened securely to the side of a pole, made cat proof, is very acceptable. This should not be painted, but should be provided with a door on the side and a perch. (Fig. 137.) The opening should be about three inches for these large birds, and the location should be as secluded as possible. Any number of devices will suggest themselves, but always remember the cat, and study the location from the bird point of view. The martins and swallows are especially to be encouraged, as they are wonderful destroyers of insects. One device, especially grateful to these feathered friends in hot weather, is a pan of water, in a place where they can drink and bathe without being eternally on the watch for that crouching enemy, who is always stalking them—Tabby. A pedestal with a platform about four feet above the ground will do nicely, and it can be placed so close to the house that you can watch them, and enjoy their ablutions almost as much as they do. (Fig. 140.) The construction is too simple to require an explanation. Fig. 140. The bird bath
  • 73. XXV SIMPLE ARTICLES FOR HOUSEHOLD USE The boys thought it was about time to pay some attention to the wants of the family, who had been clamouring for weeks to have this article or that for the kitchen, dining room, and in fact for every part of the house. Ralph was a wise teacher, however. He knew that the cause of ninety out of every hundred failures was due to the young mechanic's trying some problem too far advanced. It seems strange that people cannot learn this lesson. We have seen hundreds of boys led along, say in carving, from one simple lesson to another, until at the end of five or six carefully graded exercises, these boys could carve beautifully any design given them. On the other hand, we have seen boys start in on their own hook, without any direction from older people, and ruining everything they tried, simply because they wanted to do the most difficult thing first, before they had developed any skill. Ralph was determined that his boy should be an expert and successful user of tools, so he paid no attention to the clamours of the family, and allowed Harry to make only those things which were within his power to do well. Each time a piece of work was finished, and inspected by the family, the universal chorus was something like this: "Well, if he can make such a fine bird house, I don't see why he can't make half a dozen picture frames for these water colors," or, "If he can make such a fine pen tray, I don't see why he can't make a new stool for the piano!"
  • 74. In vain Ralph explained that these things could be made in due time, that a picture frame required much more skill than a bird house, etc. Their household articles commenced with a bread board for the kitchen. (Fig. 141). This gave Harry his first experience in planing a broad surface. He used jack and smoothing planes for the working face, and squared the rest of the board as he had smaller pieces. This required some time. The wood about the semi-circular top was removed with saw and chisel, the board held for the chiselling flat on the bench hook. After getting this curve as true as possible with the chisel, it was finished with a sand-paper block. A 1⁄2-inch hole was bored at the centre of the semi-circle to hang it up by, and the two lower corners were rounded with chisel and sand-paper. No sand-paper was used on the flat surface, as Ralph explained this was a board for cutting bread, and the grit from the sand-paper would become more or less embedded in the wood, and it would spoil the bread knife. Sand-paper is made of ground quartz, and it soon dulls the edge of a cutting tool. Fig. 141. The bread board
  • 75. The knife and fork box (Fig. 142) brought new problems. The list of material was: 1 pc. 11 1⁄2 × 3 1⁄4 × 1⁄2; 2 pcs. 7 × 1 1⁄2 × 1⁄2; 2 pcs. 14 × 1 1⁄2 × 1⁄2; 1 pc. 12 × 6 1⁄2 × 1⁄4. It was made of white wood, and, after being assembled, was stained a rich brown by receiving two coats of bichromate of potash. This is a chemical, which may be bought at a paint or drug store in the form of crystals. These are dissolved in water, until the solution looks like pink lemonade. It can be applied with a brush, but each coat must be allowed to dry completely before the whole is sand-papered smooth with No. 0 sand-paper. A deeper brown can be obtained by adding one or two extra coats of stain. The middle partition containing the handle was made first. The drawing was laid out on the wood after it had been squared up, and two holes 1 inch in diameter were bored out at a a. The wood between was taken out with a key-hole saw, and finished to the line with chisel and knife. A turning saw can be used to advantage on this handle, but it is not absolutely necessary. Spaces b b were removed in the same way, but a knife was used in the concave part of the curve. If it is handy, a small spokeshave can be employed on the whole upper line of this handle. Anything in the nature of a handle should be rounded to fit the hand. Edges c c were therefore rounded with the knife, and finished with coarse, followed by fine, sand-paper. The two sides were laid out together as in the nail box, and the groove cut with back saw and 1⁄8-inch chisel. The end pieces were made in a similar manner, and the bottom piece squared to 1⁄16-inch of finished size. The assembling consisted of first gluing together the sides and ends. Two hand screws were used to hold them. This was Harry's first attempt at using hand screws, and Ralph showed him the importance of keeping the jaws parallel.
  • 76. Fig. 142. Method of using hand screws in the construction of a knife box The box remained in the hand screws over night, and the next day it was found to be securely fastened. The most convenient kind of glue for boys is the liquid sold in cans. It is always ready for use, and very handy where only a moderate quantity is needed. Dry glue in the form of flakes, or granulated, must be soaked over night, and then heated in a pot having a double bottom with water in the lower part. It should be put on hot with a brush or a small flat stick. The best glue is none too good, yet a good quality has wonderful holding power and should last indefinitely. After removing the hand screws, the unfinished box was placed in the vise, tested with the edge of the plane, and made perfectly true, top and bottom. The 1⁄4-inch bottom piece was now put on with one-inch brads, the sides and ends made square, the handle partition slipped into the grooves, and fastened with two brads at each end. This knife box was so satisfactory that our young carpenters resolved to have a large one for tools. Whenever they had a job to do in the house, they were constantly running out to the shop for something, so that a tool box became a necessity. The construction was similar to the knife box; but this was larger and heavier, and the dado joints at the ends were replaced by a butt joint fastened with flat-head screws. (Fig. 143). The bottom and
  • 77. partition were also put on with screws, on account of the weight to be carried. Fig. 143. Tool box Fig. 144. Another tool box These tool boxes are frequently made in the shape shown in Fig. 144, with sloping sides and ends called the hopper joint; but aside from the tool practice it affords, it is doubtful if the shape has advantage enough over the other form to warrant the extra time it takes. Man is an imitative creature, however, and what one carpenter has, the others copy. The principal features about this useful article should be size and strength, especially in the handle, which should be of about 5⁄8 or 3⁄4 inch stock.
  • 79. XXVI THE MITRE BOX AND PICTURE FRAMES It seemed to Harry that the shop was fairly well equipped, but Ralph insisted that they must have a mitre box before making anything else for the house. The mitre box is, or should be, an instrument of precision, and although simple in construction, must be perfectly accurate, or it is useless. (Fig. 145.) The illustration shows the common form, but elaborate affairs of iron and wood can be bought ready made. Every boy should make his own, for the practice, if for nothing else. The sides should be made of oak 7⁄8 inch thick, 18 inches long, and 3 1⁄2 inches high, the bottom of 7⁄8-inch pine or other soft wood, the same size. When squared up, the two sides must be tested by standing them side by side; then reverse one end for end, to see if they are alike. If not, find where the trouble is, and correct it. It is especially important that the edges of the bottom piece be square and the sides perfectly parallel. This test can be made with the marking gauge. Sides are fastened on by boring and countersinking for three screws on each. After assembling, the whole thing must be tested as if it were a solid block. Top edges must be true and parallel.
  • 81. Fig. 145. The 45° mitre box and test pieces Near one end—about two inches in—lay out across the top with try square a line 90 degrees with the sides. Carry the line down each side, square with the top edges. For 45-degree angles, lay out a square by drawing two pencil lines across the top, as far apart as the finished mitre box is wide. Draw the two diagonals and square lines from their ends down both sides, taking care that their position is not over the screw in the bottom; because as the saw cuts deeper it may reach this screw and ruin its teeth. Make the three saw cuts directly on the lines laid out with a cross cut or back saw, with the utmost care. If this is not done accurately, all the labour of preparation is wasted. The blank end of the mitre box may have an additional 90-degree cut, or be left for new cuts in the future, as a mitre box of this description wears out and becomes inaccurate. Other angles may be used, as 60 degrees or 30 degrees, but it is better to have these on another box as they are used less, and for special purposes. (Fig. 146.) The mitre box is not ready to use until it has been thoroughly tested. Prepare a strip of soft wood—pine or white wood—1 1⁄2 inches wide and 1⁄2 inch thick. Cut four pieces from it on the mitre box, using the back saw as shown at a, with only one of the slits. Place these four triangular pieces together to form a square. All the four mitre joints of this square must fit perfectly. If they do not, mark the slit "N. G.," and test the other slit in the same way. If all right, mark "O. K." It often happens that one may be perfect and the other inaccurate. If they are both O. K., the box is ready for use. If one slit is useless, lay out and cut another on the blank end of the mitre box in the same direction, and test again. In testing a 30-degree cut three pieces of the strip should be sawed out, and when placed together they should form a perfect equilateral
  • 82. triangle, while from a 60-degree cut, six pieces are needed to form a hexagon. Fig. 146. 30-60-90 mitre box These angles are valuable in inlaid work, and for getting out geometrical designs. The 45-degree cut is indispensable in making the mitred corners of picture frames and in cabinet work. In making picture frames of simple cross section, it is first necessary to cut the rabbet (Fig. 147) with a rabbet plane. If this moulding is made by hand, the size of the picture should be measured, the length of all four sides added, and a liberal allowance made for waste. Fig. 147. Making picture frames In the figure, the triangles a a are waste, the rabbet being indicated by the dotted line. After the four pieces have been sawed out on the mitre box, they should be placed together on a flat surface, such as
  • 83. the bench top or floor, to see if the mitres fit perfectly. If they do not, one of them can be block planed to make a perfect fit, and the other three laid close together, as shown in the illustration. The assembling is the hardest part of the operation, and many devices have been tried and some patented to hold the parts together while the glue is drying. Perhaps the surest way is to drill a hole in one piece of each joint large enough for the passage of a wire bung-head nail. The undrilled piece is placed vertically in the vise. The drilled piece, after receiving a thin coat of glue, is brought into position horizontally, and the nail driven home. Theoretically, the nail should catch at the first blow, but the horizontal piece will sometimes slip, even with the best of care. It is wiser to place this piece about 1⁄16 inch above its final position, to allow for this slip. A method sometimes used is to glue near the ends of each piece a triangular block of wood, as shown at d. These must be left over night to harden. The next day the whole four pieces can be glued and held together by four hand screws, as shown, until the glue is thoroughly hard. This method, of course, can only be used with plain moulding or that which is square on the outside. Our boys tried another way that is commonly practised. They nailed oblong blocks to an old drawing board, as shown at e e, and then placed the picture frame in the centre, after gluing the joints, and driving wedges in between the blocks and the frame. Paper placed under each joint prevented the frame from being stuck to the drawing board by the glue forced out by the pressure. This paper plan was learned by experience, as the first frame the boys tried had to be pried up from the board, and in so doing they broke it at two of the joints, so that it had to be made again.
  • 84. It is well to remember in gluing mitre joints that end grain absorbs more glue than a flat surface. A priming coat should be applied first, and allowed to remain a few moments to fill up the pores. The second coat should hold fast and make a strong joint, but an excess of glue should always be avoided, as it must be removed after hardening, and glue soon takes the edge from the best of tools. Very fancy frames should be avoided. A bevel on the outside or inside, or both, is about all the young woodworker should attempt in the way of ornamentation. Depend on the natural beauty of the wood, as a fancy frame draws the attention from the picture, which after all is the main thing. We should admire the man, not his clothes, the picture not its frame, although the latter should be neat and well made. The finishing and polishing of frames is taken up in Chapter XLIX.
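The mitre-box test described earlier in this chapter (four pieces from a 45-degree slit forming a square, three from a 30-degree slit forming a triangle, six from a 60-degree slit forming a hexagon) follows from simple angle arithmetic, sketched below in Python. The sketch assumes each test piece is an isosceles triangle whose two base angles both equal the slit angle, so that its apex is 180 minus twice that angle, and that any error in the slit is repeated in every cut. The function names are made up for illustration.

```python
def apex_angle(cut_angle_deg):
    """Apex angle of a triangular test piece whose base angles equal the slit angle."""
    apex = 180.0 - 2.0 * cut_angle_deg
    if apex <= 0:
        raise ValueError("cut angle must be less than 90 degrees")
    return apex

def pieces_to_close(cut_angle_deg):
    """Number of identical test pieces whose apexes fill one full turn."""
    n = 360.0 / apex_angle(cut_angle_deg)
    if abs(n - round(n)) > 1e-9:
        raise ValueError("pieces at this angle do not close into a regular figure")
    return int(round(n))

def closing_gap(cut_angle_deg, error_deg):
    """Total angular gap in degrees left at the centre when every cut is off
    by error_deg. A positive value means the last joint stands open."""
    n = pieces_to_close(cut_angle_deg)
    return 360.0 - n * apex_angle(cut_angle_deg + error_deg)

for angle in (30, 45, 60):
    print(angle, "degree slit:", pieces_to_close(angle), "pieces")
print("gap for a 45-degree slit cut half a degree off:", closing_gap(45, 0.5))
```

Under these assumptions the gap equals twice the number of pieces times the error in the slit, which is why laying the pieces together shows up a fault too small to see in a single cut.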
  • 86. XXVII MAKING TOILET BOXES To make a wooden box sounds like a simple proposition; but in making the drawing, the questions of size, proportion, joints, hinges, etc., immediately come up. The size of course depends on the purpose of the box. If it is for ladies' gloves, it should be long and narrow; if for collars or handkerchiefs, square or nearly so. The height is nearly always made too great. In fact, the whole question of proportion is one which can hardly be taught; it must be felt, and different people have different ideas as to what constitutes good proportion. Some hints, however, may be given: A box perfectly square does not look well. Again, dimensions that are multiples do not look well. A box 4 × 8 × 12 inches would not be nearly so pleasing as one 3 × 5 1⁄2 × 12 inches. The proportions are also affected by the constructive details. Is the box to be flat on the sides and ends or is the top to project? etc. Our boys argued and sketched and finally drew the design shown at Fig. 148. This was to hold ties. The top was to project and have a bevel, or chamfer, also the bottom. No hinges were to be used, but the cover was to have cleats fastened on the under side to keep it in place, and to prevent warping. The next question was the manner of fastening the sides and ends. On unimportant work, a butt joint with glue and brads can be used, but for a toilet article, the holes made by the brads, even if they are filled with putty, are not satisfactory.
  • 87. Fig. 148. Dado joint used in box design So it was decided to use the dado joint as shown at a. This meant more fine work, but, as Ralph suggested, it was to last a lifetime, and should be made right. Sides and ends were squared up, and the grooves on the side pieces laid out as in the nail box. The rabbets on the end pieces were cut out with the back saw and chisel. After the joints had been carefully fitted, the four pieces were glued together and placed in hand screws over night. While the glue was hardening, the two pieces for the top and bottom were squared up and bevelled with the smoothing plane on the long sides, the block plane on the ends. The cleats for the top were next made, drilled and countersunk for the screws as at b. A careful full-sized drawing of half of the top was made, and a chip carving design drawn for it. The cleats were not put on until the carving was finished and short screws had to be used so they would not come through and spoil the surface. The next day the body of the box was removed from the hand screws and squared with a smoothing plane. The top and bottom were put on with 1-inch brads. These were "set" with a nail punch to prevent any possible scratching and the whole box was rubbed down with wax dissolved in turpentine. For fine cabinet work, the dovetail joint makes the most satisfactory method of fastening, but Harry was not yet skilled enough to do the fine work it demanded. The second box was for handkerchiefs, dimensions 8 × 7 × 3 inches outside, and no overhang at either top or bottom. The construction brought in several new features. Sides and ends were dadoed together as in the first box.
  • 88. The top and bottom, after being squared, were rabbeted on all four sides until they fitted snugly into the opening top and bottom. They were glued in these positions and placed in hand screws over night. (Fig. 149.) "How are you going to get into that box?" asked Harry. "You've closed it up solid and glued the top on." "Wait and see," was all the satisfaction he got. Fig. 149. The handkerchief box The next day the hand screws were removed and the box squared up exactly as if it had been a solid piece of wood. Ralph then made two gauge lines around the four sides, 3⁄4 inch from the top and 1⁄8 inch apart. Then he cut the box in two between these two lines with a rip saw, after slightly rounding all corners except the bottom ones with a plane and sand-paper. By this method, the box and cover must be exactly alike in outline, and by planing to the gauge lines, they will fit perfectly. It only remained to hinge the two parts together, but this operation proved to be no slight task.