The underlying idea of this research is that the ear finds it difficult to segregate sounds when it is loaded with too much information.
The work is concerned with the following concept: given multiple acoustic sources, excess information can be removed from those time-frequency regions where contributions from many sources overlap, by discarding some of the contributions so that the information carried by the remaining ones is segregated more effectively. This is done by converting the individual time signals representing the sound sources into the time-frequency domain and then comparing the energy of all signals in every time-frequency cell. In the most extreme form of this processing, all contributions in each cell are discarded except the strongest one. This type of processing completely removes the spectro-temporal overlap between individual acoustic sources. Experiments demonstrated that the removal of large parts of the signals of musical instruments or speech in the time-frequency domain may not be perceived in their mixture at all. Other experiments in this area demonstrated that the ear can use information contained in very small areas of the time-frequency plane. No minimum size of a time-frequency region that contributes to the perception of sounds was found.
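The most extreme form of the processing described above can be illustrated with a minimal sketch. It assumes the sources have already been converted to time-frequency coefficients by some transform (the choice of transform is left open here), and the function name `keep_strongest` and the array layout are illustrative assumptions, not the implementation used in this work.

```python
import numpy as np

def keep_strongest(coeffs):
    """Keep, in every time-frequency cell, only the source with the highest energy.

    coeffs: array of shape (n_sources, n_freq, n_frames), real or complex.
    Returns the masked coefficients and the boolean mask (same shape).
    Hypothetical helper; ties are resolved in favour of the first source.
    """
    coeffs = np.asarray(coeffs)
    energy = np.abs(coeffs) ** 2
    winner = np.argmax(energy, axis=0)                 # shape (n_freq, n_frames)
    source_index = np.arange(coeffs.shape[0])[:, None, None]
    mask = source_index == winner                      # exactly one True per cell
    return coeffs * mask, mask

# Toy data: 2 sources, 2 frequency bands, 3 time frames (made-up values).
a = np.array([[3.0, 0.1, 1.0], [0.2, 2.0, 0.5]])
b = np.array([[1.0, 0.5, 4.0], [0.3, 1.0, 0.1]])
selected, mask = keep_strongest(np.stack([a, b]))
mixture = selected.sum(axis=0)                         # overlap-free mixture
```

In this sketch the mixture is formed in the transform domain; an inverse transform would still be needed to obtain a time signal.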
Smoothing the individual spectrograms before performing comparisons in spectro-temporal cells makes the perceptual effect of this operation more pronounced. The smoothing parameters that give the best results were investigated.
The use of time-frequency distributions of different resolutions was analysed.
The effect of removal of spectro-temporal overlap on perception of mixed speech signals was investigated, and it was demonstrated that it does not reduce the amount of semantic information received by the ear.
A more useful version of the investigated processing removes a number of sound sources from a given time-frequency region but retains several others, not just one: those that contribute most to perception. In this scheme, the actual selection is controlled by an adjustable threshold. Possible approaches to determining the threshold were investigated.
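One possible reading of threshold-controlled selection is sketched below. The abstract does not define the threshold, so the rule used here (keep every contribution whose energy lies within `threshold_db` decibels of the strongest one in the same cell) is an assumption, as are the function and parameter names.

```python
import numpy as np

def select_by_threshold(coeffs, threshold_db=10.0):
    """Keep every source within threshold_db of the strongest one in each cell.

    coeffs: array of shape (n_sources, n_freq, n_frames), real or complex.
    threshold_db = 0 keeps only the strongest source (and exact ties);
    a large value keeps everything. Hypothetical rule, for illustration only.
    """
    coeffs = np.asarray(coeffs)
    energy = np.abs(coeffs) ** 2
    strongest = energy.max(axis=0, keepdims=True)
    ratio = 10.0 ** (-threshold_db / 10.0)             # dB to energy ratio
    mask = energy >= strongest * ratio
    return coeffs * mask, mask

# Toy data: three sources in a single time-frequency cell (made-up amplitudes).
cell = np.array([1.0, 0.5, 0.05]).reshape(3, 1, 1)
_, mask_10 = select_by_threshold(cell, threshold_db=10.0)
_, mask_0 = select_by_threshold(cell, threshold_db=0.0)
_, mask_30 = select_by_threshold(cell, threshold_db=30.0)
```

In this toy cell a 10 dB threshold keeps the two strongest sources, 0 dB keeps only the strongest, and 30 dB keeps all three.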
The final version of the developed technology was thoroughly tested on five musical excerpts. For all of them, the majority of listeners chose the version processed with the algorithm developed in this work as providing more detail. Listeners rated three of the excerpts as better sounding overall.
The work includes guidance on adjusting the parameters to obtain satisfactory results.
Contents
Abstract 7
Important symbols and acronyms 9
1. Introduction 11
2. Properties of hearing relevant to selective mixing 17
2.1. General functions of the hearing system 17
2.2. The auditory filter and the critical band 20
2.3. Time-frequency resolution of the ear 25
2.4. Masking 27
2.4.1. Simultaneous masking 28
2.4.2. Summing of simultaneous maskers 30
2.4.3. Nonsimultaneous masking 31
2.4.4. Informational masking 31
2.5. Illusion of continuity 33
3. The concept of selective mixing of sounds 35
3.1. Motivation for selective mixing from literature 35
3.2. Removal of spectro-temporal overlap 38
3.2.1. The basic operation 38
3.2.2. Sizes and shapes of time-frequency regions to be compared 41
3.3. The concept of layers 43
4. The choice of a time-frequency analysis method for selective mixing 46
4.1. Introduction to signal analysis methods relevant to selective mixing 46
4.1.1. Linear algebra roots of signal analysis 46
4.1.2. The family of Fourier signal decompositions 49
4.1.3. Other signal analysis methods 51
4.2. Introduction to time-frequency analysis 52
4.3. Short-Time Fourier Transform 54
4.4. Wavelet Transform 55
4.5. Local trigonometric bases 57
4.6. Discrete Cosine Transform 57
4.7. Modified Discrete Cosine Transform 59
4.8. Modified Discrete Cosine-Sine Transform 61
4.9. Discussion 62
4.10. Conclusion 65
4.11. A note on implementation 66
5. An overview of psychophysical methods 68
5.1. Introduction 68
5.2. Classification of psychophysical methods 70
5.3. The choice of methods for investigation on selective mixing 73
5.3.1. The choice based on literature review 73
5.3.2. Comparison of ratio scale with absolute magnitude estimation in evaluation of audio signals 75
5.3.2.1. Procedure 75
5.3.2.2. Results 77
5.3.2.3. Conclusions 79
5.4. The panel of listeners 80
5.5. The experimental software 81
6. Estimation of size and shape of time-frequency regions 84
6.1. General assumptions 84
6.2. The problem of size 85
6.2.1. Perceptual weight of small elements 87
6.2.2. Effects occurring in boundaries between areas occupied by different sound sources 88
6.2.3. Nonlinear distortion in transition between different sound sources 91
6.2.4. Methods for grouping of cells 95
6.2.4.1. Smoothing individual spectrograms 95
6.2.4.2. Smoothing maps of occupancy 99
6.3. Estimation of appropriate size of time-frequency regions 104
6.3.1. Simple evaluation of perceptual differences 108
6.3.2. Evaluation of perceptual differences with the use of signal detection theory 120
6.3.2.1. Mixtures of musical instruments 120
6.3.2.2. Mixtures of everyday sounds 125
6.3.3. Numerical estimation of smoothness of shapes 127
6.3.4. Final conclusions on size 131
6.4. Perceptual effect of proportion in rectangular shape 132
6.4.1. The comparison of performances of the 256 transform versus the 1024 transform 133
6.4.2. Performance of transforms in limited frequency ranges 136
6.5. Does energetic masking contribute to perception of removal of spectral overlap? 138
7. An attempt to introduce context rules 141
8. Application of removal of spectro-temporal overlap to speech 145
8.1. Speech as a material for quantitative assessment 145
8.2. Method 146
8.3. Results 150
8.4. Conclusions 152
9. Multi-layer selective mixing 153
9.1. Possible approaches 153
9.2. An improved method of smoothing 156
9.3. The comparison of basic approaches 161
9.4. An objective measure of the degree of the effect based on energy 164
9.5. The choice of an option for threshold-based selection 167
9.6. The effect of threshold on absolute rating in evaluation categories 170
9.7. Evaluation of perceptual sensitivity to the degree of selection in low, middle and high frequency bands 172
9.8. Evaluation of the degree of selection appropriate in low, middle and high frequency bands 175
9.9. Evaluation of multi-band selective mixing 177
9.10. Perception of selective mixing by hearing impaired listeners 179
9.11. Final evaluation 181
10. Summary and recommendations 184
Bibliography 187