William Brock Witherspoon, CS 29500, Digital Sound Modeling, Spring 2003

Note: comments in red are additions made after the course ended.
Table of Contents
Phase: Audible or not?
  Random Phases in a 120-Hz tone with 20 harmonics at amplitudes of 1/n
  Random Phases in a 200-Hz tone with 3 harmonics at amplitudes of 1/n
  Stereophonic Phase Inversion with a Pseudo-Triangle Wave
Playing with Dither

Phase: Audible or not?

"The frequency-domain representation of periodic sounds was studied by the scientists Ohm, Helmholtz, and Hermann in particular. Ohm stated that changes in the phase spectrum, although they altered the waveshape, did not affect its aural effect. Helmholtz developed a method of harmonic analysis with acoustic resonators. According to these studies, the ear is "phase-deaf", and timbre is determined exclusively by the spectrum. ... Such conclusions are still considered essentially valid -- for periodic sounds only because Fourier series analysis-synthesis works only for those."
--Risset, 1991. p. 10

"It was Ohm who first postulated in the early nineteenth century that the ear was, in general, phase deaf, Risset and Mathews (1969). Helmholtz validated Ohm's claim in psycho-acoustic experiments and noted that, in general, the phase of partials within a complex tone (of three or so sinusoidals) had little or no effect upon the perceived result. Many criticisms of this view ensued based on faulting the mechanical acoustic equipment used, but Ohm's and Helmholtz' observations have been corroborated by many later psycho-acoustic studies.

The implications of Ohm's acoustical law for sound analysis are that the representation of Fourier spectral components as complex-valued elements possessing both a magnitude and phase component in polar form is largely unnecessary and that most of the relevant features in a sound are represented in the magnitude spectrum. This view is, of course, a gross simplification. There are many instances in which phase plays an extremely important role in the perception of sound stimuli. In fact, it was Helmholtz who noted that Ohm's law didn't hold for simple combinations of pure tones. However, for non-simple tones Ohm's law seems to be well supported by psycho-acoustic literature, Cassirer (1944)."
--Casey, 1998. p. 82.

"Fourier claimed in his writing that any periodic continuous signal could be represented by the sum of an infinite number of sine and cosine waves. This elegant description of periodic signals was later exploited by the 19th century physicist Herman Helmholtz (Helmholtz 1877). His view of the ear was that of a "frequency analyzer" based primarily on Fourier's mathematical theorem, Ohm's physical definition of a simple tone and the existence of a resonator in the cochlea, capable of accomplishing sound analysis. According to Helmholtz's theory, the cochlea behaved like a spectral analyzer analogous to the Fourier transform. He believed that the cochlea resonated at specific locations along the basilar membrane (Carterette and Friedman 1978), each tuned to specific frequencies. Helmholtz also claimed that the spectral magnitude components, and not the phase components, were the sole factors contributing to the perception of musical tones. ... [T]he importance of phase in perceiving musical sounds was demonstrated by Clark (Clark, Luce, Abrams, Schlossberg and Rome 1963), who clearly showed that in the absence of phase information, acoustic waveforms sounded unrealistic. This may be partly attributed to the fact that the highly transient onset part of a signal stores a great deal of phase information. Helmholtz's theory works well in ideal situations when a signal is periodic. However, real-life sounds are only quasi-periodic and vary considerably."
-- Park, 2000. Chapter 1

There is obviously much confusion as to whether phase is audible.

Risset says that the ear is phase-deaf for periodic sounds only. Casey says nearly the opposite: phase matters for "simple combinations of pure tones," while Ohm's law holds for more complex tones. Park says that phase-deafness holds for periodic signals but not for quasi-periodic sounds. So, as I said, there's some confusion. I aimed to make a few observations of my own. I had always heard (from stereo magazines, mostly) that phase wasn't important *except* in the 'attack' portion of complex sounds such as those of brass instruments.

(The concern of stereo magazines is this: does it matter if you hook up your speakers with reversed polarity? It would seem that the answer ideally should be no, but due to the capacitors and resistors involved in the speaker itself, some of which are "one-way" or have differing output depending on which direction the current takes through the component in question, differences are indeed often perceived in real-world equipment. The "correct polarity" sounds realistic, while the incorrect polarity introduces distortion into the signal.

Somewhat ironically (at least to my tastes), this means that the answer to our question of whether phase is audible becomes indeterminate because of something like the observer effect (often conflated with the Heisenberg Uncertainty Principle). We can have fun manipulating sound files on our computers all day, but what do these sound files sound like? The answer is that they sound like nothing. You can't hear a thing until the signal has been transformed into a physical vibration in the air that your ears can perceive. The very act of observing (or a necessary step in the observation process) alters what you're trying to observe.

Unfortunately, there seems to be no way around this. No natural instrument can reverse phase polarity on demand, so any manipulated signal must be played through loudspeakers, which are inherently limited and will introduce distortion of the signal. Something to keep in mind. I was misled by an apparent peculiarity of my playback system in preparing this presentation (see below).)

Random Phases in a 120-Hz tone with 20 harmonics at amplitudes of 1/n

triangle.wav	triangle-random1.wav	triangle-random2.wav
triangle-random3.wav	triangle-random4.wav	triangle-random5.wav

Phases of Harmonics (in radians)
harmonic	triangle	triangle-random1	triangle-random2	triangle-random3	triangle-random4	triangle-random5
1	0	1.83711	3.46062	6.02675	3.67181	2.42479
2	0	0.544753	3.09466	3.49525	2.06105	2.63514
3	0	6.25577	2.88992	3.53942	6.05514	4.67926
4	0	3.68252	1.35757	4.63752	3.1193	5.74835
5	0	4.10138	6.18269	4.64644	4.73306	6.06733
6	0	0.184508	3.01704	5.9295	4.86513	0.221915
7	0	3.55238	5.66389	2.54218	5.0638	1.72635
8	0	4.54634	3.55486	4.20707	5.56446	4.3007
9	0	3.95338	3.75598	4.78792	1.91365	1.32755
10	0	4.20758	1.94593	3.91196	2.04048	5.04607
11	0	1.0495	0.229017	4.82828	2.27664	2.77417
12	0	2.54135	2.78365	0.815888	2.78672	0.914523
13	0	3.53978	4.35945	4.38054	1.93448	2.34998
14	0	5.89769	0.217916	4.04154	4.06316	2.76509
15	0	1.12835	3.45085	5.31362	1.78134	4.33458
16	0	2.51862	5.62405	5.36705	1.87703	0.972144
17	0	1.89305	0.247233	6.11223	0.746158	4.92601
18	0	4.82145	5.01048	3.32837	0.161827	2.55013
19	0	4.67219	2.72143	3.5555	5.51875	1.39039
20	0	3.65953	4.77311	2.5804	2.73035	4.68128

Observations

It would appear as though the phase differences are indeed audible. Subtle distinctions can be heard among these 6 examples. When I focus on specific parts of the sound, I cannot tell a difference -- however, my initial reaction to each example is that it's different somehow. A slight shift in overall pitch or perhaps some concept of liveliness -- I can't put my finger on what seems different, but

It appears as though what I was hearing earlier was a result of my particular audio equipment. Upon further listening with Mr O'Donnell's equipment, I could no longer hear any differences.

You may want to normalize these files to bring them up to readily audible levels on your system.

See next section for a simplified version of the formula I used to generate these sound files. (Not that you couldn't figure it out.) I used a simple spreadsheet to generate the random numbers and splice together the formulas, although a C program could easily do the same.
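The spreadsheet itself isn't reproduced here, so the following is only a minimal sketch of the same idea in Python rather than C: draw one random phase per harmonic and splice the terms into a formula string of the kind shown in the next section. The uniform distribution on [0, 2*pi) is an assumption (it is merely consistent with the tabulated phases), and the function names and the seed are hypothetical.

```python
import math
import random

def random_phases(n_harmonics, rng):
    """Draw one phase per harmonic, uniform on [0, 2*pi) radians (assumed)."""
    return [rng.random() * 2.0 * math.pi for _ in range(n_harmonics)]

def formula_text(phases):
    """Splice one sine term per harmonic into a single expression string."""
    terms = []
    for n, phase in enumerate(phases, start=1):
        amplitude = 1.0 / n  # harmonic n has amplitude 1/n
        terms.append(f"{amplitude:.6g}*x*sin(2*pi*t*{n}*f+{phase:.6g})")
    return "+".join(terms)

rng = random.Random(2003)        # fixed seed so the sketch is repeatable
phases = random_phases(3, rng)   # three harmonics, as in the 200-Hz example
print(formula_text(phases))      # a random-phase variant
print(formula_text([0, 0, 0]))   # the zero-phase, pseudo-triangle form
```

Passing a list of 20 phases instead of 3 produces the 20-summand version.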

Random Phases in a 200-Hz tone with 3 harmonics at amplitudes of 1/n

Example of function to create these sound files:

1*x*sin(2*pi*t*1*f+0)+0.5*x*sin(2*pi*t*2*f+0)+0.333333*x*sin(2*pi*t*3*f+0)

where
x=.3 (I found that this setting kept the maximum amplitude below the point of clipping under all the circumstances I encountered)
f=120 or 200 (frequency in Hz)
the 0's indicate that this is the initial, pseudo-triangle-wave form. The random phase information would take the place of the 0's for the random phase forms.

For the earlier 20-harmonic waveforms, there would be 20 summands rather than 3.
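As a rough illustration of the formula above, here is a minimal Python sketch that evaluates it sample by sample and can write the result as a 16-bit mono WAV file. The 44100 Hz sample rate and 1-second duration are assumptions, and the function names are hypothetical; the phases for the random variant are taken from the 3sines-random1 column of the table below. The sketch also compares RMS levels, since changing the phases changes the waveshape but not the power of the tone.

```python
import math
import struct
import wave

def synthesize(f, phases, x=0.3, sample_rate=44100, seconds=1.0):
    """Sum harmonics n = 1..len(phases) at amplitude x/n with the given phases."""
    samples = []
    for i in range(int(sample_rate * seconds)):
        t = i / sample_rate
        value = 0.0
        for n, phase in enumerate(phases, start=1):
            value += (x / n) * math.sin(2.0 * math.pi * t * n * f + phase)
        samples.append(value)
    return samples

def write_wav(path, samples, sample_rate=44100):
    """Write floats in [-1, 1] as 16-bit mono PCM, clamping anything outside."""
    ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(sample_rate)
        out.writeframes(struct.pack("<%dh" % len(ints), *ints))

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

three_sines = synthesize(200, [0.0, 0.0, 0.0])
three_random1 = synthesize(200, [5.04924, 5.31725, 0.231593])
peak = max(abs(s) for s in three_sines)
print("zero-phase peak:", peak)
print("RMS zero-phase:", rms(three_sines), "RMS random:", rms(three_random1))
```

Calling `write_wav("3sines-sketch.wav", three_sines)` (a hypothetical file name) would save the tone for listening.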

3sines.wav	3sines-random1.wav
3sines-random2.wav	3sines-random3.wav

Phases of Harmonics (in radians)
harmonic	3sines	3sines-random1	3sines-random2	3sines-random3
1	0	5.04924	3.18175	3.29374
2	0	5.31725	3.47728	3.83017
3	0	0.231593	2.47091	4.32364

Stereophonic Phase Inversion with a Pseudo-Triangle Wave

Just messing around to see if cross-ear phase inversion would be audible. I can't hear any difference. The first sample is the normal (in-phase) version; the second sample has the right channel's phase inverted. This was done with the 20-harmonic and 3-harmonic triangle tones from the earlier part of this presentation.

Be sure to use headphones! Alternatively, place your speakers directly facing one another (as close together as possible) and observe the startling effects of constructive and destructive interference. Then place your speakers normally, walk around the room, and see if you can hear differences. (You might want to copy and paste the file several times over into a longer file so that you're not just listening to a 1 second clip.) At the fundamental frequencies I've chosen, wavelengths are roughly 1.7 meters (200 Hz) to 2.9 meters (120 Hz), taking the speed of sound as about 343 m/s.

Waveform	Wave files
	trianglereg.wav triangle-rightinverted.wav
	3sinesreg.wav 3sines-rightinverted.wav
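A minimal sketch of how such a pair of stereo signals could be built, assuming that "phase inverted" means multiplying the right channel by -1. The function names are hypothetical, and adding the two channels together is only an idealized stand-in for what two closely facing speakers do in the air.

```python
import math

def pseudo_triangle(f, harmonics, x=0.3, sample_rate=44100, seconds=1.0):
    """Zero-phase sum of harmonics 1..harmonics at amplitudes x/n."""
    samples = []
    for i in range(int(sample_rate * seconds)):
        t = i / sample_rate
        samples.append(sum((x / n) * math.sin(2.0 * math.pi * t * n * f)
                           for n in range(1, harmonics + 1)))
    return samples

def stereo_frames(mono, invert_right=False):
    """Pair each sample as (left, right), optionally negating the right channel."""
    sign = -1.0 if invert_right else 1.0
    return [(s, sign * s) for s in mono]

mono = pseudo_triangle(120, 20, seconds=0.1)
regular = stereo_frames(mono)
inverted = stereo_frames(mono, invert_right=True)

# Idealized mixing of the two channels: in phase they reinforce,
# with the right channel inverted they cancel completely.
sum_regular = max(abs(left + right) for left, right in regular)
sum_inverted = max(abs(left + right) for left, right in inverted)
print("peak of L+R, in phase:", sum_regular)
print("peak of L+R, right inverted:", sum_inverted)
```

Over headphones the channels never mix this way, which is why the headphone test isolates whether the ears themselves detect the inversion.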

Playing with Dither

I started out with a 2-second, 16-bit, 44.1kHz sample of a pure sine wave at 240Hz. This tone was recorded at -30dB -- pretty quiet but still very practical. I then used Sound Forge's "Bit-Depth Converter" to convert to 8 bits with a variety of dither options. First I tried it with no dither; the quantization error was clearly audible. The next option was a dither with a 1-bit (peak-to-peak) rectangular density function; then a rectangular 2-bit (peak-to-peak) dither. Finally I tried a 2-bit (peak-to-peak) triangular dither (cited (and apparently proved) as optimal by Lipshitz, Wannamaker, and Vanderkooy, 1992) and a 2-bit (RMS-to-RMS) Gaussian dither. After applying the dither, I then subtracted the original tone from each of the files, leaving just the residual of the dithering process. I then used Sound Forge's Statistics function to find the RMS power level of the noise and the highest single sample in the residual noise. You can hear all of the files here:

	16-bit	8-bit
sample	Original	no dither	rectangular 1-bit	rectangular 2-bit	triangular 2-bit	Gaussian 2-bit
sample with original tone subtracted	X	no dither	rectangular 1-bit	rectangular 2-bit	triangular 2-bit	Gaussian 2-bit
RMS power of residual	X	-53.87dB	-50.42dB	-45.76dB	-41.17dB	-51.07dB
Maximum residual sample value	X	-48.23dB	-42.18dB	-38.65dB	-34.23dB	-39.86dB
Average residual loudness (Visually estimated with VU meter)	X	-48dB	-42dB	-39dB	-35dB	-40dB

Conclusion:
It is clear that dither is very helpful in removing the harmonic distortion that arises from quantization error. It is also clear that 1-bit rectangular dither is not good enough to get rid of the quantization error. When it comes to the other three dithering options, however, things are not so clear. The triangular dither leaves a loud, high-pitched residual. The Gaussian dither is very quiet and low in pitch, but I believe that I can hear some of the quantization error remaining at a very low level. The 2-bit rectangular dither is in between the others in terms of pitch and sound level and offers the best sound overall, to my ears.

On Mr O'Donnell's system, I could still hear the odd unexpected residual for the Gaussian dither. Mr O'Donnell confirmed that he could hear it as well. To get the right pitch to listen for, first listen to the residual with no dither (pure quantization error). Note that the volume must be raised very high for this to work! The residuals themselves are at around -40dB, so the quantization error in the Gaussian dither file must be at -60dB or less. This can be quite difficult to hear! At the time of this update, I must turn my receiver all the way up (+78dB) in order to barely be able to hear the quantization error on my headphones (I couldn't hear it on my speakers... last year when I took the class, I had a different setup and could hear the quantization error on my speakers). It might be more pronounced on other systems which resonate at different stages. I'm using a digital connection from my PC to my receiver, so the distortion is minimized.

A possible source of error in this presentation was the way in which I performed the subtraction. I simply copied and pasted an inverted copy of the original 16-bit tone into the requantized 8-bit tone, using additive mixing. This procedure seems to have behaved as expected: for example, with the undithered 8-bit tone we get exactly the expected quantization error. This would seem to indicate that Sound Forge (the program I used for these manipulations) did not alter the 8-bit tone's sample values when adding it to the 16-bit tone. However, the strange result of quantization error with the Gaussian dither was not expected, and could conceivably be the result of something strange going on with the addition procedure. Replication of this experiment using more conventional (i.e., non-GUI) programs (in which you can tell exactly what's going on) might bring different results, although coming up with the various dither functions might be a pain.
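One way such a replication might look is sketched below in Python. It is not a reconstruction of what Sound Forge does: the dither widths are my reading of the option names in units of one 8-bit step, the Gaussian standard deviation (`gaussian_sigma`) is a pure guess, -30dB is assumed to describe peak amplitude, and all names are hypothetical. The numbers it prints should therefore not be expected to match the table above exactly.

```python
import math
import random

SAMPLE_RATE = 44100
LSB8 = 2.0 / 256     # 8-bit step size when full scale is -1.0..1.0
LSB16 = 2.0 / 65536  # 16-bit step size on the same scale

def quantize(value, lsb, dither=0.0):
    """Round to the nearest multiple of lsb after adding dither (given in LSBs)."""
    return math.floor(value / lsb + dither + 0.5) * lsb

def make_dithers(rng, gaussian_sigma=0.5):
    """Dither generators in units of one 8-bit LSB; gaussian_sigma is a guess."""
    return {
        "no dither": lambda: 0.0,
        "rectangular 1-bit": lambda: rng.uniform(-0.5, 0.5),
        "rectangular 2-bit": lambda: rng.uniform(-1.0, 1.0),
        # Sum of two 1-LSB rectangular values: triangular, 2 LSB peak-to-peak.
        "triangular 2-bit": lambda: rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5),
        "Gaussian": lambda: rng.gauss(0.0, gaussian_sigma),
    }

def db(value):
    """Level in dB relative to full scale (1.0)."""
    return 20.0 * math.log10(value) if value > 0 else float("-inf")

def residual_stats(original, dither):
    """Requantize to 8 bits, subtract the original, return (RMS dB, peak dB)."""
    residual = [quantize(s, LSB8, dither()) - s for s in original]
    rms = math.sqrt(sum(r * r for r in residual) / len(residual))
    peak = max(abs(r) for r in residual)
    return db(rms), db(peak)

# 2 seconds of a 240 Hz sine at -30 dB peak, stored at 16-bit resolution.
amplitude = 10.0 ** (-30.0 / 20.0)
original = [quantize(amplitude * math.sin(2.0 * math.pi * 240.0 * i / SAMPLE_RATE), LSB16)
            for i in range(2 * SAMPLE_RATE)]

rng = random.Random(1992)  # fixed seed so the sketch is repeatable
results = {name: residual_stats(original, d) for name, d in make_dithers(rng).items()}
for name, (rms_db, peak_db) in results.items():
    print(f"{name:20s} RMS {rms_db:7.2f} dB   peak {peak_db:7.2f} dB")
```

Because the subtraction here is done on the sample values directly, there is no ambiguity about how the two bit depths are mixed, which is the concern raised above about the copy-and-paste procedure.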

References

Casey, Michael Anthony. 1998. "Auditory Group Theory with Applications to Statistical Basis Methods for Structured Audio." Ph.D. Thesis, MIT. http://xenia.media.mit.edu/~mkc/thesis/.
Lipshitz, S.P., R.A. Wannamaker, and J. Vanderkooy. 1992. "A Theoretical Survey of Quantization and Dither." J. Audio Eng. Soc., vol. 40, May 1992, pp. 355-375.
Park, Tae Hong. 2000. "Salient Feature Extraction of Musical Instrument Signals." M.A. Thesis, Dartmouth College. http://www.music.princeton.edu/~park/thesis/dartmouth/html/.
Risset, Jean-Claude. 1991. "Timbre Analysis by Synthesis: Representations, Imitations, and Variants for Musical Composition." In De Poli, Picciali, and Roads, eds. Representations of Musical Signals. Cambridge, MA: MIT Press, pp. 7-43.