Com Sci 29500: Digital Sound Modeling (Spring 2003)

Parameters of Auditory Perception

Human sound perception varies from person to person, probably even more than human vision does. Basic sound modeling techniques need to work well for a large proportion of people with "normal" hearing. Here are typical parameters of human hearing, based on my interpretation of information in Audition, by Pierre Buser and Michel Imbert, English translation by R. H. Kay, MIT Press, Cambridge MA, 1992, and in the notes to the CD Audio Demonstrations, by A. J. M. Houtsma, T. D. Rossing, W. M. Wagenaars, Philips 1126-061.

Frequency range:

20 Hz to 20,000 Hz. Musically, that is a spread of about 10 octaves (the piano has about 7 octaves). Some people hear signals with frequencies well above 20,000 Hz. In a changing sound, frequency components far above 20,000 Hz may have perceptible effects, even though they are not noticeable as components of the sound.

Frequency discrimination:

Between about 1,000 Hz and 8,000 Hz, we notice changes between frequencies whose ratios are about 1.002 or 1.003, which is roughly 200 to 350 steps per octave, or something between 1/30 and 1/15 of a musical half step. Outside of this range, discrimination is poorer, but for most of the range of audible frequencies we notice changes in ratios smaller than 1.01, which gives more than 60 steps per octave, or something smaller than 1/5 of a half step. Discrimination of frequencies played in sequence is a bit less, typically about 90 steps per octave or about 1/8 of a half step.
Musicians interested in nonstandard pitches have usually used the cent, which is 1/100 of a half step, or the savart, which is about 1/25 of a half step. In complex sounds, frequency distinctions may be important even though they are smaller than those perceptible as changes in a simple helical signal. There is also a unit called the mel, which is like a pitch measurement but is scaled to people's judgments, in psychological experiments, that certain pitches are "twice as high" as others. Although it is defined in terms of perceptual parameters, the mel probably does not correspond as well to perception as the various musical measures.
Taking frequency resolution between 90 and 360 steps per octave, over a range of 10 octaves, we get 900 to 3,600 distinguishable frequencies. But, it seems that we cannot exploit those as independent bits, and the practical information capacity of a single sound is much less.
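
The conversions between frequency ratios, steps per octave, and fractions of a half step can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are mine, and it assumes an equal-tempered scale with 12 half steps per octave.

```python
import math

def steps_per_octave(ratio):
    """How many steps of the given frequency ratio fit in one octave (ratio 2)."""
    return math.log(2) / math.log(ratio)

def cents(ratio):
    """Size of a frequency ratio in cents (1/100 of an equal-tempered half step)."""
    return 1200 * math.log2(ratio)

# Ratios quoted in the text above.
for ratio in (1.002, 1.003, 1.01):
    steps = steps_per_octave(ratio)
    print(f"ratio {ratio}: {steps:.0f} steps per octave, "
          f"{cents(ratio):.1f} cents, "
          f"about 1/{steps / 12:.0f} of a half step")
```

Multiplying a steps-per-octave figure by the 10 octaves of hearing gives the count of distinguishable frequencies estimated above.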

Critical bands:

Our hearing is affected by "critical bands" of frequencies. The width of these bands is about 1/3 octave, but it varies according to the center frequency. The bands are not discrete; rather, there is a critical band at each center frequency. Frequency discrimination for signals lasting only one wavelength is approximately the width of a critical band. A sound is perceived as louder if its energy is spread across many critical bands, rather than concentrated in a few. I think that critical bands represent the basic frequency resolution of the filters in the cochlea. Greater frequency discrimination presumably comes from further filtering in the nervous system. At about 3 bands per octave, roughly 30 disjoint critical bands cover the 10 octaves of human frequency perception. But it is not at all clear to me that a single mix of frequencies can present even 30 bits to our brains in a usable way.
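
As a rough check on the count of bands, here is a small sketch that assumes every critical band is exactly 1/3 octave wide. That is a simplification, since the width really varies with center frequency, and the function name and default limits are my own choices.

```python
import math

def count_bands(f_low=20.0, f_high=20000.0, bands_per_octave=3):
    """Number of disjoint bands tiling [f_low, f_high] at a fixed width per octave."""
    octaves = math.log2(f_high / f_low)  # just under 10 octaves for 20 Hz to 20 kHz
    return octaves * bands_per_octave

print(f"{count_bands():.1f} bands of 1/3 octave between 20 Hz and 20,000 Hz")
```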

Beat frequencies:

When two helical signals are played simultaneously with frequencies differing by 2-3 Hz, we hear a single intermediate frequency, getting louder and softer. This phenomenon is called "beats." The rate of the beats is the difference between the helical frequencies. Beats may be heard with frequency differences as high as 35 Hz, but the boundary is extremely fuzzy.
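
The reason for the intermediate frequency and the beat rate can be seen from a standard trigonometric identity: the sum of two equal-amplitude sine waves equals a sine wave at the mean frequency multiplied by a slowly varying cosine envelope. The sketch below checks this numerically. It assumes real sine waves stand in for the helical signals, and the function names and the 440/442 Hz example are mine.

```python
import math

def two_tones(f1, f2, t):
    """Sum of two equal-amplitude sine waves at frequencies f1 and f2 (Hz), at time t (s)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def carrier_times_envelope(f1, f2, t):
    """The same signal written as a tone at the mean frequency times a slow envelope."""
    mean = (f1 + f2) / 2
    half_diff = (f1 - f2) / 2
    envelope = 2 * math.cos(2 * math.pi * half_diff * t)
    return envelope * math.sin(2 * math.pi * mean * t)

def beat_rate(f1, f2):
    """The magnitude of the envelope peaks |f1 - f2| times per second."""
    return abs(f1 - f2)

# The two forms agree at arbitrary sample times.
for t in (0.0, 0.01, 0.123, 0.5, 1.7):
    assert abs(two_tones(440, 442, t) - carrier_times_envelope(440, 442, t)) < 1e-9
print(beat_rate(440, 442), "beats per second")
```

The envelope itself oscillates at half the frequency difference, but loudness follows its magnitude, which peaks twice per envelope cycle, so the audible beat rate is the full difference.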

Event resolution:

I haven't found data on this point yet. I am pretty confident that I can distinguish clicks separated by 1/30 second, and I believe that I can go close to 1/100 second. Event resolution depends crucially on the frequency components of the events. The start of a helical signal at frequency F cannot be perceived more precisely than about 1/F.

Transient scale:

Again, no data yet. I think that transients occur on a scale from thousandths to tenths of a second.

Measuring loudness:

I found the complications of different ways of measuring loudness quite confusing, and haven't succeeded in reducing them to a brief description. Loudness can be related either to power level, typically measured in Watts per square meter (W/m^2), or to change in pressure, typically measured in bars, where 1 bar is the normal pressure of the atmosphere. In either case logarithmic units called decibels (dB) are used. A difference of 10 dB represents multiplying the power by 10, and a difference of 20 dB represents multiplying the pressure by 10, since power is proportional to the square of pressure. You will find different choices for the 0 of the decibel scale. A typical choice is that 0 dB is about 2/10^10 bars, or 1/10^12 Watts per square meter. On this scale, typical loudness measures include

10 dB rustling leaves
20 dB noise in a recording studio
30 dB noise in a quiet room
30-70 dB conversational speech
40 dB noise on a quiet street
50 dB quiet music
60 dB cocktail party conversation
70-80 dB noisy street
90 dB symphony orchestra, playing loud
100 dB jack-hammer at 2 meters
120 dB thunder, or jet engine at 10 meters
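
To move between decibels and physical units on this scale, the following sketch may help. It uses the reference levels given above (2/10^10 bar and 1/10^12 W/m^2 at 0 dB); the constant and function names are mine.

```python
import math

P0_BAR = 2e-10    # pressure change at 0 dB, in bars (reference from the text)
I0_W_M2 = 1e-12   # power level at 0 dB, in watts per square meter

def pressure_bar(db):
    """Pressure change in bars: every 20 dB multiplies pressure by 10."""
    return P0_BAR * 10 ** (db / 20)

def intensity_w_m2(db):
    """Power level in W/m^2: every 10 dB multiplies power by 10."""
    return I0_W_M2 * 10 ** (db / 10)

def db_from_intensity(intensity):
    """Inverse of intensity_w_m2."""
    return 10 * math.log10(intensity / I0_W_M2)

for db in (10, 60, 120):
    print(f"{db:3d} dB: {pressure_bar(db):.1e} bar, {intensity_w_m2(db):.1e} W/m^2")
```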

There is a special unit of loudness, called the sone, that is scaled to our auditory sensitivity at different frequencies. In principle, this is a good idea, but the extra complication is probably not worth it for most of our purposes.

Loudness range:

From about 500 Hz to 2000 Hz we detect sounds as quiet as 5 dB, which is about 4/10^10 bar pressure change, or 3/10^12 Watts per square meter. At lower frequencies, sensitivity reduces, and we need about 75 dB to hear a sound at 20 Hz. At higher frequencies the curve is more complicated, improving to about -4 dB at 4000 Hz, then varying up to about 25 dB at 12000 Hz. There is no fixed upper limit to detectable sound. Around 100 dB (2/10^5 bar, 1/10^2 Watts per square meter) sound gets to be uncomfortably loud. Around 140 dB (2/10^3 bar, 10^2 Watts per square meter) it becomes physically painful. Eventually, I suppose it becomes lethal. The power ratio between the softest detectable sound and the loudest usable sound is something like 10^4 to 10^10, a range of 40-100 dB.

Loudness discrimination:

Minimum noticeable changes in loudness vary from about 0.15 dB to about 10 dB, depending on the type of signal. 3/4 dB to 1 dB is probably a practical increment. Loudness is a tricky parameter for carrying information, since our perception of it is very sensitive to context, and we have poor memory for loudness levels. 3/4 to 1 dB discrimination, over a 60 dB range, suggests 60-80 discriminable loudness levels. Since a given sound has only one loudness, this suggests that loudness can only carry log 60 to log 80 (base 2), that is about 6 bits of information. That is probably more than can be used practically. It's not at all clear how well relative loudness of different components of a sound can be distinguished. In strictly monaural sound, we probably shouldn't expect to distinguish more than one loudness value per critical band, or 30 in all, with a total capacity of about 180 bits. But the threshold of pain is probably determined more by the total loudness than by the maximum loudness per critical band, and other perceptual complications probably restrict the total information capacity of the loudness channel to something much smaller.
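
Here is a minimal sketch of the arithmetic behind a capacity estimate of this kind. It assumes evenly spaced loudness steps, so that the number of levels is the range divided by the step size, and it takes the bit count to be the base-2 logarithm of the number of levels. The function names are mine, and the result is an upper bound on paper, not a measured capacity.

```python
import math

def loudness_levels(range_db, step_db):
    """Number of distinguishable levels if steps of step_db tile a range of range_db."""
    return range_db / step_db

def bits(levels):
    """Information carried by one choice among that many equally likely levels."""
    return math.log2(levels)

BANDS = 30  # one loudness value per critical band, as suggested in the text

for step_db in (1.0, 0.75):
    levels = loudness_levels(60, step_db)
    print(f"step {step_db} dB: {levels:.0f} levels, "
          f"{bits(levels):.1f} bits per band, "
          f"{BANDS * bits(levels):.0f} bits over {BANDS} bands")
```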

Copyright Michael J. O'Donnell <michael_odonnell@acm.org>. Licensed for free use.

Last modified: Wed Apr 3 20:12:37 CST 2002