Creating a virtual auditory space (VAS) consists of taking sounds and filtering them to simulate distance and direction. A number of decisions must be made in order to implement our observations of auditory perception.
One large obstacle to implementing a VAS is simulating the effects of the pinna. If the ear could be modeled from its geometry, one could derive an equation for frequency response as a function of azimuth and elevation angle; unfortunately, this has proven very difficult to accomplish. This means that, in order to recreate the filtering caused by the pinna, a separate filter must be built from experimental data for each combination of elevation and azimuth angle. One such set of data has been published for free use by MIT (Gardner). Such data is usually recorded in the time domain as an impulse response; its frequency-domain representation is referred to as the head-related transfer function (HRTF). This impulse response is then simply convolved with any signal to reproduce the effects of the head and pinna. A decision then has to be made concerning what data to include in or remove from the HRTF. Through equalization, one could remove the interaural pressure differences, so that the HRTF contains no evidence of the shadowing of the head. The interaural time differences could also be removed, eliminating the effect of sound being diffracted around the head. This would be done if there were a more effective or accurate way to simulate these effects, or if one were interested in studying the pinna alone. Keeping the effects of the head in the data assumes that the listener of the final product is of the same size and shape as the acoustical dummy used in obtaining the data; with sufficiently powerful DSP hardware, one could build real-time systems that account for different head sizes. Neither of these factors was removed from the MIT data.
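As a concrete illustration, the convolution step itself is straightforward once a measured impulse response pair is in hand. The sketch below assumes the left- and right-ear impulse responses for one azimuth/elevation combination have already been loaded into arrays (for example, from the MIT KEMAR measurements); the function and variable names are illustrative, not taken from any particular library.

```python
import numpy as np
from scipy.signal import fftconvolve

def apply_hrtf(mono, hrir_left, hrir_right):
    """Spatialize a mono signal by convolving it with a measured
    head-related impulse response (HRIR) pair for one direction.

    mono       -- 1-D array of samples
    hrir_left  -- left-ear impulse response (assumed loaded
                  beforehand, e.g. from the MIT KEMAR data set)
    hrir_right -- right-ear impulse response for the same direction
    Returns an (N, 2) stereo array suitable for headphone playback.
    """
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right], axis=1)
```

Removing interaural differences from such data, as discussed above, would amount to equalizing and time-aligning the two impulse responses before this convolution.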
In a space where multiple sounds are occurring, phase becomes very important. In this case, the phase being referred to is not the interaural phase difference between the two ears, but rather the delay between when the sound is emitted and when it reaches the ear. This delay can be very significant over large distances, and it becomes increasingly important when sound is simulated in a virtual room. Sounds reflecting off the many surfaces of the room arrive at the listener's ear at different times, creating a much different effect than the direct signal alone. Blauert, and others, have been very effective at modeling the listening characteristics of concert halls (Blauert 380). As mentioned above, reverberation is important to distance perception; without some reverberation, the brain often localizes sounds heard over headphones as being somewhere inside the head. Some commercially available PC sound cards have a generic reverberation built into their 3D sound chips, while others use additional hardware to calculate reflections off of barriers.
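To make the delay bookkeeping concrete, here is a minimal sketch of how a direct path and a single wall reflection might be combined, assuming the straight-line path lengths have already been computed (an image-source-style simplification; the names, the 1/r attenuation, and the wall-loss factor are illustrative assumptions, not a complete room model).

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def delayed_path(signal, path_length_m, fs):
    """Delay and attenuate a signal according to the distance it
    travels along one propagation path."""
    delay = int(round(path_length_m / SPEED_OF_SOUND * fs))
    gain = 1.0 / max(path_length_m, 1.0)  # simple 1/r spreading loss
    out = np.zeros(len(signal) + delay)
    out[delay:] = gain * signal
    return out

def direct_plus_reflection(signal, direct_m, reflected_m, fs, wall_loss=0.7):
    """Mix the direct path with one reflected path; the reflected
    copy arrives later, quieter, and scaled by an assumed wall
    absorption factor (wall_loss)."""
    direct = delayed_path(signal, direct_m, fs)
    echo = wall_loss * delayed_path(signal, reflected_m, fs)
    mix = np.zeros(max(len(direct), len(echo)))
    mix[:len(direct)] += direct
    mix[:len(echo)] += echo
    return mix
```

A full room model would sum many such reflected paths, each with its own delay, attenuation, and arrival direction.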
If objects are to be moving in a virtual auditory space, then one must implement Doppler shift. This can, however, be very resource intensive for two reasons. First, all distance calculations, including reflections, must be made before any HRTF can be applied, which increases the latency of the system. Second, a different HRTF must be applied to each sample, which usually requires that the HRTFs be stored in memory or ROM for fast access. It is for this reason that real-time interactive systems must be implemented with DSP hardware.
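One way to see why Doppler simulation is costly is to note that it amounts to a time-varying resampling of the source signal. The sketch below handles only the simplest case, a source moving at constant speed directly toward or away from a stationary listener; the names are illustrative assumptions, and a real-time system would recompute the ratio continuously as the geometry changes.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def doppler_shift(signal, source_speed):
    """Resample a signal to simulate Doppler shift for a source
    moving radially at constant speed (positive = approaching).

    Uses the stationary-listener relation
        f_observed / f_source = c / (c - v),
    so an approaching source is pitched up (and plays out shorter).
    """
    ratio = SPEED_OF_SOUND / (SPEED_OF_SOUND - source_speed)
    n_out = int(len(signal) / ratio)
    # Read the input at a stretched/compressed rate via linear
    # interpolation -- a per-sample operation, hence the DSP cost.
    read_positions = np.arange(n_out) * ratio
    return np.interp(read_positions, np.arange(len(signal)), signal)
```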
Headphones are widely used in virtual auditory space experiments and demonstrations because they deliver sound to the eardrum with little or no distortion caused by the outer ear. Sound heard from speakers, by contrast, is filtered by the listener's own outer ear and can also reverberate within the room. However, an additional (inverse) transfer function can be applied to the speaker output to counteract the effects of the ear. Multiple-speaker systems have proven effective as an alternative, but require specific positioning and additional hardware.
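The compensating filter mentioned above can be sketched as a regularized spectral inversion of a measured response. This is a minimal illustration of the idea, with assumed function names and regularization constant, not a full loudspeaker crosstalk-cancellation design, which must additionally invert the 2x2 matrix of speaker-to-ear paths.

```python
import numpy as np

def inverse_filter(measured_ir, n_fft=4096, eps=1e-3):
    """Design an approximate inverse of a measured impulse response,
    e.g. to pre-filter loudspeaker output so the listener's own
    outer-ear filtering is (roughly) cancelled.

    The eps term regularizes the division so the inverse does not
    blow up at frequencies where the measured response is near zero.
    """
    H = np.fft.rfft(measured_ir, n_fft)
    H_inv = np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.fft.irfft(H_inv, n_fft)
```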