Human sound perception varies, probably even more than human
vision. Basic sound modeling techniques need to work well for a large
proportion of people with "normal" hearing. Here are typical
parameters of human hearing, based on my interpretation of information
in Audition, by Pierre Buser and Michel Imbert, English
translation by R. H. Kay, MIT Press, Cambridge MA, 1992, and in the
notes to the CD Audio Demonstrations, by A. J. M. Houtsma,
T. D. Rossing, W. M. Wagenaars, Philips 1126-061.
- Frequency range:
- 20 Hz to 20,000 Hz. Musically, that is a spread of about 10
octaves (the piano has about 7 octaves). Some people hear signals
with frequencies well above 20,000 Hz. In a changing sound,
frequency components far above 20,000 Hz may have perceptible
effects, even though they are not noticeable as components of
the sound.
- Frequency discrimination:
- Between about 1,000 Hz and 8,000 Hz, we notice changes between
frequencies whose ratios are about 1.002 or 1.003, which is
roughly 200 to 350 steps per octave, or something between 1/30
and 1/15 of a musical half step. Outside of this range,
discrimination is poorer, but for most of the range of audible
frequencies we notice changes in ratios smaller than 1.01, which
gives more than 60 steps per octave, or something smaller than
1/5 of a half step. Discrimination of frequencies played in
sequence is a bit less, typically about 90 steps per octave or
about 1/8 of a half step.
Musicians interested in nonstandard pitches
have usually used the cent, which is 1/100 of a half
step, or the savart, which is 1/25 of a half step. In
complex sounds, frequency distinctions may be important even
though they are less than those perceptible as changes in a
simple helical signal. There is also a unit, called the
mel, that is like a pitch measurement, but scaled to
people's judgments in psychological experiments that certain
pitches are "twice as high" as others. Although it is
defined in terms of perceptual parameters, the mel
probably does not correspond as well to perception as the
various musical measures.
Taking frequency resolution between 90
and 360 steps per octave, over a range of 10 octaves, we get 900
to 3,600 distinguishable frequencies. But, it seems that we
cannot exploit those as independent bits, and the practical
information capacity of a single sound is much less.
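The conversions above between frequency ratios, steps per octave, and fractions of a half step can be checked with a short Python sketch. It assumes equal-tempered half steps (12 per octave, 100 cents each); the function names are illustrative and do not come from the sources cited above.

```python
import math

SEMITONES_PER_OCTAVE = 12  # assumption: equal-tempered half steps


def steps_per_octave(ratio: float) -> float:
    """How many steps of the given frequency ratio fit in one octave (ratio 2)."""
    return math.log(2.0) / math.log(ratio)


def step_in_cents(ratio: float) -> float:
    """Size of one step in cents, where 100 cents make one half step."""
    return 1200.0 * math.log2(ratio)


# Ratios quoted in the text for just-noticeable frequency changes.
for ratio in (1.002, 1.003, 1.01):
    n = steps_per_octave(ratio)
    print(f"ratio {ratio}: {n:.0f} steps per octave, "
          f"{n / SEMITONES_PER_OCTAVE:.1f} steps per half step, "
          f"{step_in_cents(ratio):.1f} cents per step")
```

The three ratios come out at roughly 347, 231, and 70 steps per octave, in line with the rounded figures in the text.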
- Critical bands:
- Our hearing is affected by "critical bands" of frequencies. The
width of these bands is about 1/3 octave, but it varies
according to the center frequency. The bands are not discrete;
rather, there is a critical band at each center
frequency. Frequency discrimination for signals of only one
wavelength is approximately the width of a critical band. A sound
is perceived louder if its energy is spread across many critical
bands, rather than concentrated in a few. I think that critical
bands represent the basic frequency resolution of the filters in
the cochlea. Greater frequency discrimination presumably comes
from further filtering in the nervous system. At about 3 bands
per octave, roughly 30 disjoint critical bands cover the 10
octaves of human frequency perception. But it is not at all
clear to me that a single mix of frequencies can present even 30
bits to our brains in a usable way.
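The count of about 30 bands can be reproduced by tiling the audible range with bands exactly 1/3 octave wide. This is only a sketch under a simplifying assumption: real critical bands vary in width with center frequency and are not discrete, as noted above. The function name and the 20 Hz starting point are illustrative choices.

```python
def third_octave_edges(f_low=20.0, octaves=10, bands_per_octave=3):
    """Edges of adjacent bands, each 1/bands_per_octave of an octave wide.

    Assumption: constant relative bandwidth, which only approximates
    the varying width of real critical bands.
    """
    count = octaves * bands_per_octave
    return [f_low * 2.0 ** (i / bands_per_octave) for i in range(count + 1)]


edges = third_octave_edges()
bands = list(zip(edges[:-1], edges[1:]))  # (lower edge, upper edge) pairs

print(len(bands))  # number of disjoint bands
for low, high in bands[:3]:
    print(f"{low:.1f} Hz to {high:.1f} Hz")
```

Starting at 20 Hz, ten octaves of such bands end near 20,480 Hz, which matches the 20 Hz to 20,000 Hz range quoted earlier.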
- Beat frequencies:
- When two helical signals are played simultaneously with
frequencies differing by 2-3 Hz, we hear a single intermediate
frequency, getting louder and softer. This phenomenon is called
"beats." The rate of the beats is the difference between the
helical frequencies. Beats may be heard with frequency differences
as high as 35 Hz, but the boundary is extremely fuzzy.
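The beat rate follows from the identity sin A + sin B = 2 sin((A+B)/2) cos((A-B)/2): the sum of two tones equals a tone at the average frequency multiplied by a slowly varying envelope. The sketch below checks this numerically. It uses real sinusoids as a stand-in for the helical signals, and the frequencies 440 Hz and 443 Hz are arbitrary illustrative values.

```python
import math


def two_tone(f1, f2, t):
    """Sum of two unit-amplitude sinusoids at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)


def beat_form(f1, f2, t):
    """The same signal written as (slow envelope) * (tone at the mean frequency)."""
    envelope = 2 * math.cos(math.pi * (f1 - f2) * t)
    carrier = math.sin(math.pi * (f1 + f2) * t)
    return envelope * carrier


f1, f2 = 440.0, 443.0  # illustrative frequencies, 3 Hz apart
times = [n / 8000 for n in range(8000)]  # one second, 8000 samples
max_err = max(abs(two_tone(f1, f2, t) - beat_form(f1, f2, t)) for t in times)

print(f"largest difference between the two forms: {max_err:.2e}")
print(f"intermediate frequency: {(f1 + f2) / 2} Hz")
print(f"beat rate: {abs(f1 - f2)} per second")
```

The loudness follows the magnitude of the envelope, which peaks twice per cycle of the cosine, so the beat rate is the full frequency difference rather than half of it.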
- Event resolution:
- I haven't found data on this point yet. I am pretty confident
that I can distinguish clicks separated by 1/30 second, and I
believe that I can go close to 1/100 second. Event resolution
depends crucially on the frequency components of the
events. The start of a helical signal at frequency F
cannot be perceived more precisely than about 1/F.
- Transient scale:
- Again, no data yet. I think that transients occur on a scale of
thousandths to tenths of a second.
- Measuring loudness:
- I found the complications of different ways of measuring
loudness quite confusing, and haven't succeeded in reducing
them to a brief description. Loudness can be related either to
power level, typically measured in Watts per square
meter (W/m^2), or to change in pressure, typically measured
in bars, where 1 bar is the normal pressure of the
atmosphere. In either case logarithmic units called
decibels (dB) are used: a difference of 10 dB represents
multiplying the power by 10, and a difference of 20 dB
represents multiplying the pressure by 10
(power is proportional to the square of pressure). You
will find different choices for the 0 of the decibel scale. A
typical choice is that 0 dB is about 2/10^10 bars, or 1/10^12
Watts per square meter. On this scale, typical loudness measures
include
- 10 dB rustling leaves
- 20 dB noise in a recording studio
- 30 dB noise in a quiet room
- 30-70 dB conversational speech
- 40 dB noise on a quiet street
- 50 dB quiet music
- 60 dB cocktail party conversation
- 70-80 dB noisy street
- 90 dB symphony orchestra, playing loud
- 100 dB jack-hammer at 2 meters
- 120 dB thunder, or jet engine at 10 meters
There is a special unit of loudness, called the sone,
that is scaled to our auditory sensitivity at different
frequencies. In principle, this is a good idea, but the extra
complication is probably not worth it for most of our
purposes.
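The decibel relations above can be written out as a short Python sketch. The reference values are the ones quoted in the text (0 dB at about 2/10^10 bar and 1/10^12 Watts per square meter); the function and constant names are illustrative.

```python
import math

P0_BAR = 2e-10   # pressure change at 0 dB, in bars (value from the text)
I0_W_M2 = 1e-12  # power level at 0 dB, in W/m^2 (value from the text)


def db_to_pressure_bar(db: float) -> float:
    """Pressure change in bars: every 20 dB multiplies pressure by 10."""
    return P0_BAR * 10 ** (db / 20)


def db_to_power_w_m2(db: float) -> float:
    """Power level in W/m^2: every 10 dB multiplies power by 10."""
    return I0_W_M2 * 10 ** (db / 10)


def power_ratio_to_db(ratio: float) -> float:
    """Decibel difference corresponding to a ratio of two power levels."""
    return 10 * math.log10(ratio)


for db in (0, 30, 60, 90, 120):
    print(f"{db:3d} dB: {db_to_pressure_bar(db):.1e} bar, "
          f"{db_to_power_w_m2(db):.1e} W/m^2")
```

Because power is proportional to the square of pressure, the two scales always agree: multiplying pressure by 10 multiplies power by 100, which is 20 dB on either scale.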
- Loudness range:
- From about 500 Hz to 2000 Hz we detect sounds as quiet as 5 dB,
which is about 4/10^10 bar pressure change, or 3/10^12 Watts per
square meter. At lower frequencies, sensitivity reduces, and we
need about 75 dB to hear a sound at 20 Hz. At higher frequencies
the curve is more complicated, improving to about -4 dB at 4000
Hz, then varying up to about 25 dB at 12000 Hz. There is no
fixed upper limit to detectable sound. Around 100 dB (2/10^5
bar, 1/10^2 Watts per square meter) sound gets
to be uncomfortably loud. Around 140 dB (2/10^3 bar, 10^2 Watts
per square meter) it becomes physically
painful. Eventually, I suppose it becomes lethal. The power ratio
between the softest detectable sound and the loudest usable
sound is something like 10^4 to 10^10, a range of 40-100
dB.
- Loudness discrimination:
- Minimum noticeable changes in loudness vary from about 0.15 dB
to about 10 dB, depending on the type of signal. 3/4 dB to 1 dB is
probably a practical increment. Loudness is a tricky parameter for
carrying information, since our perception of it is very sensitive
to context, and we have poor memory for loudness levels. 3/4 to 1 dB
discrimination, over a 60 dB range, suggests 60-80 discriminable
loudness levels (the range divided by the step size). Since a given
sound has only one loudness, this
suggests that loudness can only carry log 60 to log 80
(base 2), that is about 6 bits of information. That is probably more
than can be used practically. It's not at all clear how well
relative loudness of different components of a sound can be
distinguished. In strictly monaural sound, we probably shouldn't
expect to distinguish more than one loudness value per critical
band, or 30 in all, with a total capacity of about 180 bits. But, the
threshold of pain is probably determined more by the total loudness
than by the maximum loudness per critical band, and other perceptual
complications probably restrict the total information capacity of
the loudness channel to something much smaller.
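The capacity estimate above is a simple counting argument: divide the usable range by the smallest practical step to get the number of levels, then take the base 2 logarithm to get bits. A minimal Python sketch, using a 1 dB step over a 60 dB range and 30 critical bands as in the text (the function names are illustrative):

```python
import math


def loudness_levels(range_db: float, step_db: float) -> float:
    """Number of distinguishable levels: usable range divided by step size."""
    return range_db / step_db


def bits(levels: float) -> float:
    """Information carried by one choice among `levels` alternatives."""
    return math.log2(levels)


levels = loudness_levels(60.0, 1.0)  # 1 dB steps over a 60 dB range
per_sound = bits(levels)             # one loudness value for the whole sound
per_band_total = 30 * per_sound      # one loudness value per critical band

print(f"{levels:.0f} levels, {per_sound:.1f} bits per loudness value")
print(f"{per_band_total:.0f} bits across 30 critical bands")
```

These are upper bounds under idealized assumptions; as the text notes, context sensitivity and other perceptual effects probably make the usable capacity much smaller.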
Last modified: Wed Apr 3 20:12:37 CST 2002