Robust Sound Field Reproduction against Listener’s Movement Utilizing Image Sensor

Toshihide Aketo, Hiroshi Saruwatari, Satoshi Nakamura
(Nara Institute of Science and Technology, Japan)

Outline

Research background
Conventional method
Spectral Division Method
Local sound field synthesis

Proposed method
Equiangular filter
Sound field reproduction system utilizing image sensor

Simulation experiment
Subjective assessment
on directional perception
on sound quality

Research background (1/3)
Objective of a sound field reproduction (SFR) system
To reproduce the primary sound field in another space over a wide range
and with high accuracy.
However, such a system is difficult to realize because it tends to become
large and its configuration complex.
Therefore, recent research focuses on reproducing the sound field over a wide
range and with high accuracy using a small and simple system.
[Figure: SFR systems arranged from complex to simple. Surrounded arrays (large and complex): boundary surface control (BoSC). Circular or spherical arrays (a little complex): Ambisonics. Linear or planar arrays (simple): wave field synthesis (WFS), marked as the focus. Stereo or surround systems are also shown.]

Spectral Division Method (SDM) [J. Ahrens, S. Spors, 2008]
An SFR method that reproduces the sound field by synthesizing a
number of wavefronts.
It can be realized with a simple system such as a linear loudspeaker
array.

However, SDM has two problems.
Problem 1: A sound pressure error occurs when the listener is off the
reference listening line.
Problem 2: The wavefront is disturbed by spatial aliasing.

SDM at present: reproduction accuracy is low, reproduction region is wide.
Goal: high reproduction accuracy.

We aim to reproduce the sound field with high
accuracy by solving these problems in SDM.

To cope with these problems, we propose a novel SFR system with a
linear loudspeaker array that combines listener position
estimation by Kinect (an image sensor) with SDM and local sound field synthesis.

SDM alone: reproduction accuracy is low, reproduction region is wide.
With the image sensor (Kinect) and local sound field synthesis: reproduction accuracy is high, reproduction region is localized around the listener.

[Figure: geometry of SDM with the primary source, the nth secondary source, and the reference listening line. The driving function in the spatial domain and the driving function in the wavenumber domain are linked by the Fourier transform and its inverse (IDFT).]

Symbols used in the driving functions:
$\omega$: angular frequency
$k_x$: wavenumber in the $x$-direction
$c$: speed of sound
$j$: imaginary unit
$y_{\mathrm{ref}}$: reference listening distance
$K_0$: zeroth-order modified Bessel function of the second kind
$H_0^{(2)}$: zeroth-order Hankel function of the second kind
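For orientation, the general SDM relations from the cited work by Ahrens and Spors can be sketched as follows. The exact expressions on the original slide are not preserved, so the notation here ($\tilde{D}$ for the driving function spectrum, $\tilde{P}$ for the desired field, $\tilde{G}_0$ for the secondary-source response, $y_{\mathrm{ref}}$ for the reference listening distance, $j$ for the imaginary unit) and the sign convention of the exponent are assumptions.

```latex
% Driving function in the wavenumber domain: desired field on the
% reference line divided by the secondary-source response on that line.
\tilde{D}(k_x, \omega) =
  \frac{\tilde{P}(k_x, y_{\mathrm{ref}}, \omega)}
       {\tilde{G}_0(k_x, y_{\mathrm{ref}}, \omega)}

% Response of a monopole secondary source (propagating and evanescent parts).
\tilde{G}_0(k_x, y, \omega) =
\begin{cases}
  -\dfrac{j}{4}\, H_0^{(2)}\!\left(\sqrt{(\omega/c)^2 - k_x^2}\; y\right),
    & |k_x| < |\omega/c| \\[2ex]
  \dfrac{1}{2\pi}\, K_0\!\left(\sqrt{k_x^2 - (\omega/c)^2}\; y\right),
    & |k_x| > |\omega/c|
\end{cases}

% Driving function in the spatial domain via the inverse Fourier transform.
D(x, \omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty}
  \tilde{D}(k_x, \omega)\, e^{-j k_x x}\, \mathrm{d}k_x
```

This form would explain why both the Hankel function and the modified Bessel function appear in the symbol list: the first covers the propagating region and the second the evanescent region.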


Problems in SDM
A sound pressure error occurs when the listener is off the reference listening line.
The wavefront is disturbed by spatial aliasing.

Problem 1 : sound pressure error
Under the 2.5-dimensional synthesis condition, the sound pressure is
correctly reproduced only on the reference listening line.

[Figure: primary sound field and reproduced sound field. The sound pressure is correctly reproduced on the reference listening line; a sound pressure error occurs outside it.]

Therefore, to correctly reproduce the sound field at the listener's position,
we must set the reference listening distance equal to the listener's distance.

Problem 2: spatial aliasing (1/2)

[Figure: magnitude [dB] of the driving-function spectrum before (left) and after (right) discretization of the secondary source. After discretization, spectral overlap occurs.]

In SDM, discretizing the secondary source causes spectral overlap in the
driving function, and the filter power at high frequencies becomes larger,
as in the right figure.
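The link between loudspeaker spacing and spectral overlap can be sketched numerically. Sampling the secondary source at spacing `dx` repeats the driving-function spectrum every 2π/dx along the wavenumber axis, and a common rule of thumb places the onset of overlap for the propagating region at f = c / (2·dx). The function names, the spacing, and the speed of sound below are illustrative assumptions, not values taken from the slides.

```python
import math


def spectral_repetition_period(dx: float) -> float:
    """Period (rad/m) of the spectral replicas along k_x for spacing dx (m)."""
    return 2.0 * math.pi / dx


def aliasing_frequency(dx: float, c: float = 343.0) -> float:
    """Rule-of-thumb aliasing frequency (Hz): c / (2 * dx).

    This is the frequency at which the propagating region |k_x| <= omega/c
    reaches the Nyquist wavenumber pi/dx of the array.
    """
    return c / (2.0 * dx)


if __name__ == "__main__":
    dx = 0.085  # hypothetical loudspeaker spacing in metres (assumption)
    print("replica period [rad/m]:", spectral_repetition_period(dx))
    print("aliasing frequency [Hz]:", aliasing_frequency(dx))
```

Under this rule, a smaller spacing raises the aliasing frequency, which is why a fixed physical array limits the frequency range of plain SDM.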

Problem 2: spatial aliasing (2/2)
The effect of spectral overlap in the wavenumber domain appears as
spatial aliasing in the spatial domain.

[Figure: synthesized wavefront for a continuous array and for a discrete array, with amplitude shown on a color scale. Discretization of the secondary source disturbs the wavefront.]

Local sound field synthesis (1/2) [J. Ahrens, S. Spors, 2011]
Local sound field synthesis suppresses spatial aliasing by limiting the
spatial bandwidth in the wavenumber domain.

[Figure: magnitude [dB] of the driving-function spectrum before (left) and after (right) applying a rectangular window. Spectral overlap is suppressed.]

Rectangular window for the spectrum of the driving function
By applying a rectangular window to the spectrum in the left figure, we
can suppress the spectral overlap, as in the right figure.
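A minimal sketch of band-limiting in the wavenumber domain with a rectangular window is given below. It assumes the spectrum is sampled on a list of wavenumbers; the names `kx_center` and `bandwidth` are hypothetical parameters chosen for illustration, and the window placement used on the slides is not known.

```python
def rectangular_window(kx_values, kx_center, bandwidth):
    """Return 1.0 where |kx - kx_center| <= bandwidth / 2, else 0.0."""
    half = bandwidth / 2.0
    return [1.0 if abs(kx - kx_center) <= half else 0.0 for kx in kx_values]


def apply_window(spectrum, window):
    """Multiply a sampled spectrum by a window, element by element."""
    return [s * w for s, w in zip(spectrum, window)]


if __name__ == "__main__":
    kx = [-2.0, -1.0, 0.0, 1.0, 2.0]           # sampled wavenumbers (rad/m)
    spectrum = [1 + 1j, 2 + 0j, 3 + 0j, 2 + 0j, 1 - 1j]
    window = rectangular_window(kx, kx_center=0.0, bandwidth=2.0)
    print(apply_window(spectrum, window))
```

Components outside the window are set to zero, so their replicas can no longer overlap the retained band; the price is that only part of the spatial information is kept, which localizes the reproduction area.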

Local sound field synthesis (2/2) [J. Ahrens, S. Spors, 2011]
By applying a rectangular window, we can suppress the disturbance of the
wavefront and increase the maximum frequency at which the sound field
is correctly reproduced.

[Figure: synthesized wavefront, unfiltered and filtered, with amplitude shown on a color scale. Unfiltered: spatial aliasing occurs. Filtered: the disturbance of the wavefront is suppressed, but the reproduction area is localized.]

Therefore, to take advantage of this method, it is necessary to design a
filter that precisely controls the reproduced direction.

Equiangular filter
To design a filter that accurately controls the reproduced direction,
we derive the relation between the reproduced direction $\theta$, the
wavenumber in the $x$-direction $k_x$, and the frequency $f$:

$$k_x = \frac{2\pi f}{c} \sin\theta$$

where $c$ is the speed of sound and $\theta$ is taken here as measured from the normal to the array.

If the reproduced direction $\theta$ is constant, $\sin\theta / c$ is constant and
$k_x$ is proportional to $f$, so we design a new filter, the equiangular filter.
It is defined in terms of the angular frequency $\omega$, the wavenumber, and an
angular width around the reproduced direction.

[Equation: definition of the equiangular filter.]
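One way to picture such a filter is as a wedge-shaped pass region in the wavenumber-frequency plane. The sketch below assumes the relation k_x = (2πf/c)·sin θ with θ measured from the array normal, and passes a component when its wavenumber lies between the values for θ − width/2 and θ + width/2. This pass rule, the function name, and the optional `f_max` cutoff are assumptions for illustration; the filter definition on the original slide is not preserved.

```python
import math


def equiangular_filter(kx, f, theta, width, c=343.0, f_max=None):
    """Return 1.0 if (kx, f) lies inside the angular sector, else 0.0.

    kx     wavenumber along the array (rad/m)
    f      frequency (Hz), assumed non-negative
    theta  reproduced direction (rad), measured from the array normal
    width  angular width of the sector (rad)
    f_max  optional low-pass cutoff (Hz); frequencies above it are rejected

    Assumes theta +/- width/2 stays within [-pi/2, pi/2] so that the
    sine is monotonic over the sector.
    """
    if f_max is not None and f > f_max:
        return 0.0
    k = 2.0 * math.pi * f / c
    lower = k * math.sin(theta - width / 2.0)
    upper = k * math.sin(theta + width / 2.0)
    return 1.0 if lower <= kx <= upper else 0.0


if __name__ == "__main__":
    theta = math.radians(30.0)   # hypothetical reproduced direction
    width = math.radians(10.0)   # hypothetical angular width
    f = 1000.0
    kx_on_axis = 2.0 * math.pi * f / 343.0 * math.sin(theta)
    print(equiangular_filter(kx_on_axis, f, theta, width))  # inside the sector
    print(equiangular_filter(0.0, f, theta, width))         # outside the sector
```

Because the pass band scales with frequency, the same range of directions is kept at every frequency, in contrast to a rectangular window of fixed width in k_x.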

Result of applying the equiangular filter (1/2)
An example of applying the designed filter to a spectrum, for one
reproduced direction and one angular width.

[Figure: magnitude [dB] of the driving-function spectrum before and after applying the equiangular filter. Spectral overlap is suppressed.]

Equiangular filter for the spectrum of the driving function
The equiangular filter used in this presentation is combined with a low-pass
filter: components above the maximum frequency are cut, and the sound field
is not reproduced at those frequencies.

Result of applying the equiangular filter (2/2)
By applying the equiangular filter, we can suppress the disturbance
of the wavefront and reproduce the sound field in a specific
direction.

[Figure: synthesized wavefront, unfiltered and filtered, with amplitude shown on a color scale. Unfiltered: spatial aliasing occurs. Filtered: the disturbance of the wavefront is suppressed.]

However, the sweet spot cannot be matched to the listener's position
if the listener's direction is not known in advance.

Summary of problems
Problems in SDM
A sound pressure error occurs when the reference
listening distance does not match the listener's distance.
Spatial aliasing is caused by discretization of the secondary sources.
The second problem can be solved by applying an equiangular filter.

Problem in the equiangular filter
The sweet spot cannot be matched to the listener's position if
the listener's direction is not known in advance.

These problems can be solved if the listener's position is known.
Therefore, introducing the image sensor makes it possible to solve
them.

Condition of simulation experiment
[Figure: simulation setup with the primary source (monopole source), the 34 ch linear secondary source array (monopole sources), and the reference listening line.]

Parameter name: parameter value
measurement plane: W4.0 D4.0
aliasing frequency: approximately 2019 Hz
angular width: (value missing in source)
reproduced direction: (value missing in source)
synthesis frequency: 3, 5 kHz

Evaluation score
[Equation: evaluation score computed from the radiation characteristic of the primary sound field and the radiation characteristic of the secondary sound field.]

Assuming that the listener's position is obtained by the image sensor, we calculate
the reproduced direction from the sound source position and the listener's position.
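The direction calculation can be sketched as follows, assuming a 2-D coordinate system in which the loudspeaker array lies along the x-axis and the y-axis is the array normal. The function name and the coordinate convention are assumptions for illustration; the slides do not state the convention used.

```python
import math


def reproduced_direction(source_xy, listener_xy):
    """Angle (rad) of the source-to-listener line, measured from the y-axis.

    source_xy    (x, y) of the primary source, e.g. behind the array (y < 0)
    listener_xy  (x, y) of the listener, e.g. from the image sensor (y > 0)
    Positive angles point toward positive x.
    """
    dx = listener_xy[0] - source_xy[0]
    dy = listener_xy[1] - source_xy[1]
    return math.atan2(dx, dy)


if __name__ == "__main__":
    source = (0.0, -1.0)      # hypothetical primary source position (m)
    listener = (1.0, 2.0)     # hypothetical listener position (m)
    print(math.degrees(reproduced_direction(source, listener)))
```

Recomputing this angle whenever the sensor reports a new listener position is what allows the sweet spot to follow the listener.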

Results of simulation experiment
[Figure: synthesized wavefronts (amplitude on a color scale) and evaluated values at the synthesis frequencies; the surviving panel titles are "Synthesized wavefront (5 kHz)" and "Evaluated value (3 kHz)". Markers indicate the listener and the primary source.]

The sound field is correctly reproduced
in the listener's direction regardless of the frequency.

Condition of subjective assessment on directional perception
[Figure: experimental setup showing the 34 ch linear loudspeaker array, the acoustically transparent curtain, the primary source, the answer number cards, the loudspeaker distance, the reference listening line, and listener positions Pos 1, Pos 2, and Pos 3.]

Parameter name: parameter value
sampling frequency: 48 kHz
quantization bit rate: 16 bit
test sound: white Gaussian noise, 3 seconds
aliasing frequency: (value missing in source)
angular width: (value missing in source)
sound source direction: (value missing in source)
number of evaluators: 7
type of sound source:
・sound source without bandwidth limitation (Conventional1)
・sound source band-limited to frequencies under 2 kHz (Conventional2)
・sound source to which we applied the equiangular filter (Proposed)

Evaluation score
[Equation: evaluation score computed from the number of evaluators, the answered direction, and the true source direction.]

As the evaluation procedure, we asked evaluators to answer at which card
position they perceived the sound source.

Results of subjective assessment on directional perception
[Figure: evaluation score (from Good to Bad) at (a) Pos1, (b) Pos2, and (c) Pos3 for Conventional1 (without bandwidth limitation), Conventional2 (band-limited to frequencies under 2 kHz), and Proposed (with the equiangular filter applied).]

Proposed is superior to Conventional1 and Conventional2 in Pos1 and Pos2.
However, Proposed is almost the same as Conventional2 in Pos3.
This is because, in the equiangular filter, the maximum frequency becomes lower
as the angle of the reproduced direction becomes larger.

As the listener moves to the right (from Pos1 to Pos3), the directional perception error of
Conventional1 becomes larger owing to the effect of spatial aliasing.
These results show the superiority of the proposed method in directional perception.
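The trend that the maximum frequency falls as the reproduced direction moves away from the array normal can be illustrated with a simple criterion. The sketch assumes that the upper edge of the pass band, (2πf/c)·sin(|θ| + width/2), must stay below the Nyquist wavenumber π/dx of the array. This criterion, the function name, and all numeric values are assumptions for illustration and are not taken from the slides.

```python
import math


def max_frequency(theta, width, dx, c=343.0):
    """Frequency (Hz) at which the pass-band edge reaches pi / dx.

    theta  reproduced direction (rad), measured from the array normal
    width  angular width of the pass sector (rad)
    dx     loudspeaker spacing (m)
    """
    edge_angle = min(abs(theta) + width / 2.0, math.pi / 2.0)
    edge = math.sin(edge_angle)
    if edge <= 0.0:
        return math.inf  # a zero-width sector on the normal has no limit
    return c / (2.0 * dx * edge)


if __name__ == "__main__":
    dx = 0.085                    # hypothetical spacing (m)
    width = math.radians(10.0)    # hypothetical angular width
    for deg in (0.0, 20.0, 40.0):
        print(deg, max_frequency(math.radians(deg), width, dx))
```

Under this assumption the limit approaches the ordinary aliasing frequency c / (2·dx) as the sector edge approaches 90 degrees, which is consistent with Proposed behaving like Conventional2 at the outermost listener position.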

Condition of subjective assessment on sound quality

[Figure: experimental setup showing the 34 ch linear loudspeaker array, the acoustically transparent curtain, the primary source, the reference loudspeaker, the loudspeaker distance, the reference listening line, and listener positions Pos 1, Pos 2, and Pos 3.]

Parameter name: parameter value
sampling frequency: 48 kHz
quantization bit rate: 16 bit
test sound: white Gaussian noise, 3 seconds
aliasing frequency: (value missing in source)
angular width: (value missing in source)
sound source direction: (value missing in source)
number of evaluators: 7
type of sound source:
・sound source without bandwidth limitation (Conventional1)
・sound source band-limited to frequencies under 2 kHz (Conventional2)
・sound source to which we applied the equiangular filter (Proposed)

As the evaluation procedure, we played two synthesized sounds after a reference
sound radiated by the reference loudspeaker, and asked evaluators to answer which
synthesized sound they perceived as closer to the reference sound.

Results of subjective assessment on sound quality
[Figure: evaluation results (from Good to Bad) at (a) Pos1, (b) Pos2, and (c) Pos3 for Conventional1 (without bandwidth limitation), Conventional2 (band-limited to frequencies under 2 kHz), and Proposed (with the equiangular filter applied).]

In all results, evaluators chose Conventional1 or Proposed and did not
choose Conventional2.
At all listener positions, more evaluators chose Conventional1 than
Proposed.
This suggests that, for sound quality, the effect of cutting the high-frequency
region of the sound is larger than the effect of spatial aliasing.

Conclusion
The objective of an SFR system is to reproduce the primary sound field in
another space over as wide a range and with as high accuracy as possible.
Since a complex system makes this difficult to realize, an SFR method
that uses a simple system has been desired.

SDM can be realized with a simple system such as a linear loudspeaker array.
However, it is impossible to reproduce the sound field with high accuracy
using this method.

We proposed an SFR system that reproduces the sound field with high
accuracy at the listener's position by estimating the listener's direction.

The subjective assessment showed the superiority of the proposed
method in directional perception.
However, since its superiority was not shown in sound quality, it is
necessary to improve the equiangular filter so that the low-pass filter is not needed.

Thank you for your attention!
