Selective Sound Mixing Technology

Product category
technical sciences » mechanics
ISBN
978-83-7464-633-8
Publication type
monograph
Format
B5
Binding
softcover
Number of pages
194
Publication date
2013
Edition
1
Description

The underlying idea of this research is that the ear has difficulty segregating sounds when it is loaded with too much information.
The work is concerned with the following concept: given multiple acoustic sources, excess information can be removed from those time-frequency regions where contributions from several sources overlap, by discarding some contributions so that the remaining ones are segregated more effectively. To do this, the individual time signals representing the sound sources are converted into the time-frequency domain, and the energies of all signals are compared in every time-frequency cell. In the most extreme form of this processing, all contributions in each cell are discarded except the strongest one, which completely removes the spectro-temporal overlap between the individual acoustic sources. Experiments demonstrated that the removal of large parts of the signals of musical instruments or speech in the time-frequency domain may not be perceived in their mixture at all. Other experiments in this area demonstrated that the ear can use information contained in very small areas of the time-frequency plane. No minimum size of a time-frequency region that contributes to the perception of sounds was found.
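The most extreme form of the processing described above can be illustrated with a minimal sketch. It assumes that each source has already been converted into a grid of time-frequency coefficients by some transform (the choice of transform is the subject of chapter 4 and is not fixed here). The function names, the toy values, and the tie-breaking rule are illustrative assumptions, not the author's implementation.

```python
def strongest_source_masks(grids):
    """Build one binary mask per source.

    grids[s][t][f] is the coefficient of source s in time-frequency
    cell (t, f). The mask of a source is 1 in the cells where that
    source has the highest energy and 0 elsewhere. Ties go to the
    lowest source index (an arbitrary choice made for this sketch).
    """
    n_sources = len(grids)
    n_frames = len(grids[0])
    n_bins = len(grids[0][0])
    masks = [[[0] * n_bins for _ in range(n_frames)] for _ in range(n_sources)]
    for t in range(n_frames):
        for f in range(n_bins):
            energies = [abs(grids[s][t][f]) ** 2 for s in range(n_sources)]
            winner = energies.index(max(energies))
            masks[winner][t][f] = 1
    return masks


def selective_mix(grids, masks):
    """Sum the masked sources cell by cell to obtain the mixture grid."""
    n_frames = len(grids[0])
    n_bins = len(grids[0][0])
    return [
        [sum(g[t][f] * m[t][f] for g, m in zip(grids, masks)) for f in range(n_bins)]
        for t in range(n_frames)
    ]


# Toy data (hypothetical): two sources, two time frames, three frequency bins.
source_a = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
source_b = [[0.3, 0.5, 0.2], [0.6, 0.1, 0.4]]
masks = strongest_source_masks([source_a, source_b])
mix = selective_mix([source_a, source_b], masks)
```

Because exactly one source survives in every cell, the masked sources no longer overlap anywhere in the time-frequency plane. In a real system the mixture grid would then be passed through the inverse transform to obtain a time signal.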
Smoothing the individual spectrograms before the comparisons in spectro-temporal cells makes the perceptual effect of this operation more pronounced. The smoothing parameters that give the best results were investigated.
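One way to picture the role of smoothing is the sketch below. It assumes a simple box average over neighbouring cells, which is only a stand-in: the smoothing kernel and parameters actually investigated in the book are not given in this description. All names and values are hypothetical.

```python
def smooth_energy(energy, radius_t=1, radius_f=1):
    """Box-average an energy grid energy[t][f].

    The window spans (2 * radius_t + 1) frames and (2 * radius_f + 1)
    bins and is truncated at the edges of the grid. The box kernel is
    an assumption made for this sketch.
    """
    n_frames, n_bins = len(energy), len(energy[0])
    out = [[0.0] * n_bins for _ in range(n_frames)]
    for t in range(n_frames):
        for f in range(n_bins):
            cells = [
                energy[tt][ff]
                for tt in range(max(0, t - radius_t), min(n_frames, t + radius_t + 1))
                for ff in range(max(0, f - radius_f), min(n_bins, f + radius_f + 1))
            ]
            out[t][f] = sum(cells) / len(cells)
    return out


def winners(energies):
    """Return, for every cell, the index of the source with the highest energy."""
    n_frames, n_bins = len(energies[0]), len(energies[0][0])
    return [
        [max(range(len(energies)), key=lambda s: energies[s][t][f]) for f in range(n_bins)]
        for t in range(n_frames)
    ]


# Toy data (hypothetical): one frame, five bins. Source 1 has a single
# isolated peak inside a region otherwise dominated by source 0.
energy_0 = [[4.0, 4.0, 4.0, 4.0, 4.0]]
energy_1 = [[0.0, 0.0, 5.0, 0.0, 0.0]]

raw = winners([energy_0, energy_1])
smoothed = winners([
    smooth_energy(energy_0, radius_t=0, radius_f=1),
    smooth_energy(energy_1, radius_t=0, radius_f=1),
])
```

Without smoothing, the isolated peak hands a single cell to source 1. With smoothing, the decision follows the local neighbourhood, so the region is assigned to source 0 as a whole and the map of winners has fewer small fragments.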
The use of time-frequency distributions of different resolutions was analysed.
The effect of removing spectro-temporal overlap on the perception of mixed speech signals was investigated, and it was demonstrated that the removal does not reduce the amount of semantic information received by the ear.
A more useful version of the investigated processing removes a number of sound sources from a given time-frequency region but leaves several others, not just one, namely those that contribute most to perception. In this scheme, the selection is controlled by an adjustable threshold. Possible approaches to determining the threshold were investigated.
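A threshold-controlled selection of this kind could look like the following sketch. The rule used here, keeping every source whose energy lies within a given number of decibels of the strongest source in the cell, is one plausible assumption for illustration. The description does not state which threshold rule the book finally adopts, and the name `threshold_masks` and the parameter `threshold_db` are hypothetical.

```python
def threshold_masks(energies, threshold_db):
    """Keep, in every cell, the sources close in level to the strongest one.

    energies[s][t][f] is the energy of source s in cell (t, f).
    A source is kept (mask 1) if its energy is no more than
    threshold_db decibels below the strongest energy in that cell.
    This rule is an assumption made for this sketch.
    """
    ratio = 10 ** (-threshold_db / 10)  # energy ratio matching the dB threshold
    n_sources = len(energies)
    n_frames, n_bins = len(energies[0]), len(energies[0][0])
    masks = [[[0] * n_bins for _ in range(n_frames)] for _ in range(n_sources)]
    for t in range(n_frames):
        for f in range(n_bins):
            strongest = max(energies[s][t][f] for s in range(n_sources))
            for s in range(n_sources):
                if energies[s][t][f] >= strongest * ratio:
                    masks[s][t][f] = 1
    return masks


# Toy data (hypothetical): three sources sharing a single cell.
cell_energies = [[[1.0]], [[0.5]], [[0.01]]]

only_strongest = threshold_masks(cell_energies, threshold_db=0)
two_layers = threshold_masks(cell_energies, threshold_db=6)
all_sources = threshold_masks(cell_energies, threshold_db=30)
```

With a threshold of 0 dB the sketch reduces to keeping only the strongest source, and raising the threshold gradually lets more sources through, up to ordinary mixing in which nothing is removed.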
The final version of the developed technology was thoroughly tested on five musical excerpts. For all of them, the majority of listeners chose the version processed with the algorithm developed in this work as providing more detail. The listeners rated three of the excerpts as sounding better overall.
The work includes guidance on adjusting the parameters to obtain satisfactory results.

Contents

Abstract  7
Important symbols and acronyms  9
1. Introduction  11
2. Properties of hearing relevant to selective mixing  17
2.1. General functions of the hearing system  17
2.2. The auditory filter and the critical band  20
2.3. Time-frequency resolution of the ear  25
2.4. Masking  27
2.4.1. Simultaneous masking  28
2.4.2. Summing of simultaneous maskers  30
2.4.3. Nonsimultaneous masking  31
2.4.4. Informational masking  31
2.5. Illusion of continuity  33
3. The concept of selective mixing of sounds  35
3.1. Motivation for selective mixing from literature  35
3.2. Removal of spectro-temporal overlap  38
3.2.1. The basic operation  38
3.2.2. Sizes and shapes of time-frequency regions to be compared  41
3.3. The concept of layers  43
4. The choice of a time-frequency analysis method for selective mixing  46
4.1. Introduction to signal analysis methods relevant to selective mixing  46
4.1.1. Linear algebra roots of signal analysis  46
4.1.2. The family of Fourier signal decompositions  49
4.1.3. Other signal analysis methods  51
4.2. Introduction to time-frequency analysis  52
4.3. Short-Time Fourier Transform  54
4.4. Wavelet Transform  55
4.5. Local trigonometric bases  57
4.6. Discrete Cosine Transform  57
4.7. Modified Discrete Cosine Transform  59
4.8. Modified Discrete Cosine-Sine Transform  61
4.9. Discussion  62
4.10. Conclusion  65
4.11. A note on implementation  66
5. An overview of psychophysical methods  68
5.1. Introduction  68
5.2. Classification of psychophysical methods  70
5.3. The choice of methods for investigation on selective mixing  73
5.3.1. The choice based on literature review  73
5.3.2. Comparison of ratio scale with absolute magnitude estimation in evaluation of audio signals  75
5.3.2.1. Procedure  75
5.3.2.2. Results  77
5.3.2.3. Conclusions  79
5.4. The panel of listeners  80
5.5. The experimental software  81
6. Estimation of size and shape of time-frequency regions  84
6.1. General assumptions  84
6.2. The problem of size  85
6.2.1. Perceptual weight of small elements  87
6.2.2. Effects occurring in boundaries between areas occupied by different sound sources  88
6.2.3. Nonlinear distortion in transition between different sound sources  91
6.2.4. Methods for grouping of cells  95
6.2.4.1. Smoothing individual spectrograms  95
6.2.4.2. Smoothing maps of occupancy  99
6.3. Estimation of appropriate size of time-frequency regions  104
6.3.1. Simple evaluation of perceptual differences  108
6.3.2. Evaluation of perceptual differences with the use of signal detection theory  120
6.3.2.1. Mixtures of musical instruments  120
6.3.2.2. Mixtures of everyday sounds  125
6.3.3. Numerical estimation of smoothness of shapes  127
6.3.4. Final conclusions on size  131
6.4. Perceptual effect of proportion in rectangular shape  132
6.4.1. The comparison of performances of the 256 transform versus the 1024 transform  133
6.4.2. Performance of transforms in limited frequency ranges  136
6.5. Does energetic masking contribute to perception of removal of spectral overlap?  138
7. An attempt to introduce context rules  141
8. Application of removal of spectro-temporal overlap to speech  145
8.1. Speech as a material for quantitative assessment  145
8.2. Method  146
8.3. Results  150
8.4. Conclusions  152
9. Multi-layer selective mixing  153
9.1. Possible approaches  153
9.2. An improved method of smoothing  156
9.3. The comparison of basic approaches  161
9.4. An objective measure of the degree of the effect based on energy  164
9.5. The choice of an option for threshold-based selection  167
9.6. The effect of threshold on absolute rating in evaluation categories  170
9.7. Evaluation of perceptual sensitivity to the degree of selection in low, middle and high frequency bands  172
9.8. Evaluation of the degree of selection appropriate in low, middle and high frequency bands  175
9.9. Evaluation of multi-band selective mixing  177
9.10. Perception of selective mixing by hearing impaired listeners  179
9.11. Final evaluation  181
10. Summary and recommendations  184
Bibliography  187

Price
0.00
In order to arrange international shipping details and cost please contact wydawnictwa@agh.edu.pl