Can a space perception conflict be solved using three sense modalities?



Felice L. Bedford

University of Arizona

(in press, Perception; 2006)





University of Arizona

Department of Psychology

Tucson, AZ 85721


A cross-modal conflict over location was resolved in an unexpected way. When vision and proprioception provide conflicting information, which modality should dominate is ambiguous. A visual-proprioceptive conflict was created with a prism and, to logically disambiguate the problem, auditory information was added that either agreed with vision (Group 1), agreed with proprioception (Group 2), or was absent (Group 3). Although little research addresses the interaction of three modalities, we predicted that error should be attributed to the modality in the minority. Instead, the opposite was found: adaptation consisted of a large change in arm proprioception and a small change affecting vision in Group 2, and the reverse pattern in Group 1; Group 1 did not differ from Group 3. The findings suggest adaptation to separate two-way conflicts, possibly influenced by the direction of attention, rather than a direct solution to a three-way modality problem.









Keywords: prism adaptation, cross-modal perception, visual-motor pointing, visual dominance, attention

We report an unexpected finding concerning the resolution of a spatial cross-modal conflict. Natural perception involves information gathered from multiple sensory modalities (e.g. Stein & Meredith, 1993). However, if the modalities provide conflicting information, which modality should be trusted as veridical?

The view that vision dominates other modalities in the apprehension of spatial location, size, and other geometric properties has dominated the literature since Irvin Rock's classic work (Rock & Victor, 1964; see also Harris, 1965; Rock & Harris, 1967), and continues to influence modern theorizing (e.g. Kubovy & Van Valkenburg, 2001). However, the visual modality can be submissive as well as dominant. Staring at one's legs for ten minutes while looking through a wedge prism creates a conflict between the seen and felt position of the legs, which is resolved with a change that affects vision (Wallach, Kravitz, & Landauer, 1963). This contrasts with the more widely advertised findings in which the felt position of limbs changes to realign with visually provided information. Thus, even the early paradigms within which the visual dominance view emerged can produce the opposite result, sometimes with subtle changes in how observers are exposed to the conflict. More recently, elegant studies have shown systematic shifts from reliance on vision to reliance on haptics/proprioception as the amount of added visual noise increases (Ernst & Banks, 2002) and as visual feedback is increasingly delayed (Redding & Wallace, 1990). Moreover, "visual dominance" cannot be a general resolution to cross-modal conflicts, for evolutionary reasons.

Consider perception from the perspective of a growing infant. Physical growth causes changes in each sensory apparatus through which perception takes place. For instance, growth causes the distance between the ears to increase (Clifton, Gwiazda, Bauer, Clarkson & Held, 1988), a quantity essential for accurate localization of sound. Perception will be adversely affected by growth unless plasticity can accommodate the new body shell (Banks, 1988; Bedford, forthcoming; Clifton et al., 1988; Held, 1965; Pettigrew, 1978). Growth in different modalities at different rates and times will produce cross-modal conflicts. How does the infant "know" which modality is veridical and which should be fixed? A visual dominance hypothesis would mandate that vision be relied upon and the other modality changed. Yet this conclusion would be wrong whenever it was the visual apparatus that grew. Such a conclusion would lead to plasticity within the wrong sensory system.

Visual dominance may have emerged as a general rule in part because the experiments were inherently ambiguous with respect to which modality should change. The experiments were conducted with adults, in whom artificial conflicts between modalities were created with a wedge prism that shifts the visual image. When vision and proprioception are thus made to provide discrepant information about the location of an object, usable information about which modality is correct is lacking. Not only are observers often unaware that their vision is altered, but even conscious knowledge that one is looking through a prism may not provide information that penetrates the automatic perceptual module. I have argued elsewhere (e.g. Bedford, 1995, 1999) that perception should not change unless an error is detected in a sensory system. While rearrangement experiments clearly provide information that an internal error is present (see Bedford, 1999), they do not provide information about the source of that error.

Under such ambiguous conditions, a conclusion to "trust vision" is understandable. Rock and Harris (1967) argued that vision may be the dominant modality in cross-modal conflicts because it is the more accurate modality. Similarly, according to the refinement suggested by the "modality precision hypothesis" (Welch, 1978; Welch & Warren, 1986), the modality that dominates will be the one that is more precise for the task being performed (measured, for example, by the observations with the lowest variance; Welch, 1978, p. 45). For instance, vision is usually the most precise modality for spatial location and spatial extent (size), but audition is more precise than vision for temporal judgements. In the absence of better information, using a priori or on-line knowledge about the characteristics of the modalities to formulate an educated guess arguably leads to an intelligent, adaptive solution. Nonetheless, in real-world conflicts, this solution would still cause a less precise, but essentially correct, modality to get "fixed". In a developing child, such an outcome could be catastrophic.

We designed a simple procedure to provide richer information about the source of the error. The information is also potentially available in real-world conflicts. We added information from a third modality - audition - to logically disambiguate a conflict between vision and proprioception. Exposure was arranged so that the identical visual-proprioceptive conflict over location received auditory information that either agreed with vision, agreed with proprioception, or was unavailable. For instance, an object localized straight ahead of the nose (0°) through vision but 10° to the left through proprioception would then make a sound either at 0° (Group 1), at 10° to the left (Group 2), or not at all (Group 3). The visual, proprioceptive, and auditory (v, p, a) triplets used for each group are shown in Table 1. Adding audition provides information relevant to which modalities are correct: the dissenting modality is more likely to be wrong than the two modalities in agreement. For instance, support theory (Tversky & Koehler, 1994) would predict a solution that favors the majority. From the perspective of error correction (Bedford, 1989, 1993, 1995, 1999), the source of the internal error can now be more rationally identified, and the components of adaptation should reflect this assessment of blame. Models of cross-modal conflict resolution based on weighted averages of available cues (Ernst & Banks, 2002; van Beers et al., 1999) should likewise predict that the addition of (non-zero weighted) auditory information will push the solution appropriately towards vision or proprioception.
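To make the majority prediction of such cue-combination models concrete, here is a minimal sketch (not taken from any of the cited studies; the precision values are invented purely for illustration) of reliability-weighted averaging in the style of Ernst and Banks:

```python
# Minimal sketch of maximum-likelihood cue combination: each modality's
# location estimate is weighted inversely to its variance. All sigmas
# below are hypothetical, chosen only to illustrate the prediction.

def combine(estimates, sigmas):
    """Reliability-weighted average of location estimates (degrees azimuth)."""
    weights = [1.0 / s**2 for s in sigmas]
    return sum(w * x for w, x in zip(weights, estimates)) / sum(weights)

# Assumed precisions: vision sharpest, audition least precise for azimuth.
sigma_v, sigma_p, sigma_a = 1.0, 2.0, 4.0

# Vision says 0 deg, proprioception says -10 deg (the prism conflict).
vp_only = combine([0, -10], [sigma_v, sigma_p])

# Adding audition at the proprioceptive location (as in Group 2) should
# pull the combined estimate toward proprioception -- the prediction
# that the present data contradict.
with_audition = combine([0, -10, -10], [sigma_v, sigma_p, sigma_a])

print(vp_only, with_audition)
```

On these assumed precisions the two-cue estimate sits near vision, and the added auditory cue shifts it further toward proprioception, which is exactly the direction of resolution the experiment failed to find.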

The visual-proprioceptive conflicts were created with a prism, a classic manipulation (Helmholtz, 1909) appearing in its latest guise as a clinical treatment for unilateral neglect (Rossetti, Jacquin-Courtois & Rode, 2004) nearly one hundred years later. Can the information from a third modality be exploited by the perceptual systems in logical problem solving-like fashion to disambiguate a conflict between two modalities?


Method

Participants. The participants were 23 male right-handed undergraduate students from the University of Arizona with self-reported normal or contact-corrected vision. They received credit towards a course requirement for their participation.

Procedure. Participants were in one of three groups in which audition agreed with vision (Group 1, 8 subjects), agreed with proprioception (Group 2, 7 subjects) or was absent (Group 3, 8 subjects). This differential training was provided during an "exposure" phase. In addition, all groups were tested before (pretest) and after (posttest) exposure to assess both the overall amount of adaptation resulting from training in each group, as well as any differential visual and proprioceptive changes.

Testing: Testing occurred in a completely dark room without visual feedback or distortion. First, participants pointed with the right hand to small visual LED targets located 112 cm away (Task 1). Targets were located from 15° left of center to 15° right of center, every 5°, for a total of 7 trials presented in random order. Participants then pointed at the same targets in random order, except with the left hand, for an additional 7 trials (Task 2). Finally, participants were required to point with the right hand to the location of the left hand (Task 3). The experimenter placed a participant's left hand on a top shelf in one of 3 asymmetric positions: -20°, -5°, and 5°. Participants pointed from a lower shelf such that the right index finger felt aligned with the left index finger. Each position was repeated twice in random order for a total of 6 trials. The posttest was identical to the pretest except that a fourth task was added: participants heard a sound (broad-band white noise) at each of 5 positions from -20° to +20°, located every 10° and presented in random order, and were required to point with the right hand. The sounds came from hidden speakers mounted directly behind the distant LED targets.

The difference between pretest and posttest on Task 1 (right-hand pointing) measures the amount of overall adaptation, because changes involving both vision and hand/arm proprioception will affect pointing with the exposed (right) hand. This is the classic measure of overall adaptation. The difference on Task 2 (left-hand pointing) measures the amount of change affecting vision only (cf. Harris, 1965), because proprioceptive changes in the exposed arm/hand do not transfer to the unexposed arm/hand. The difference on Task 3 (right hand to left hand) measures the amount of change affecting only right-arm proprioception, because of the lack of intermanual transfer and because vision is not involved in the task (cf. Harris, 1965). The purpose of the fourth task (right hand to sounds) was to verify directly that the sounds used in the present apparatus were processed adequately, given the importance of their role in training (see Training). The minimum audible angle for adults is approximately 1°, and it was expected that the 10° auditory offsets needed in training would be easily resolved. In addition, pointing to sounds may provide further information on how the conflict is resolved in the three groups. Since auditory pointing was not critical for assessing the main components of adaptation, it was not included in the pretest, both to avoid interfering in any way with the interpretation of sounds presented in training and to minimize the required pointing in a tiring experiment.
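The decomposition logic of the three tasks can be summarized in a short sketch (all numbers are hypothetical, chosen only to show how the measures relate; they are not the experiment's data):

```python
# Sketch of how the three test tasks decompose prism adaptation.
# Values are hypothetical group means in degrees; negative = leftward,
# the adaptive direction for a rightward (10 deg) visual shift.

def adaptation(pretest_mean, posttest_mean):
    """Adaptive change is posttest minus pretest pointing (degrees)."""
    return posttest_mean - pretest_mean

total = adaptation(0.0, -6.0)            # Task 1: exposed right hand (v + p)
visual = adaptation(0.5, -3.5)           # Task 2: unexposed left hand (v only)
proprioceptive = adaptation(-0.5, -2.5)  # Task 3: right hand to left hand (p only)

# Classic additivity check: the two components should roughly
# sum to the total adaptation measured on the exposed hand.
print(total, visual + proprioceptive)
```

The additivity check at the end reflects the standard assumption behind the Harris (1965) component measures: overall adaptation on the exposed hand is approximately the sum of the visual and proprioceptive changes.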

Training: Participants were assigned to groups following Task 1 performance to balance any directional biases. All three groups received the identical visual-proprioceptive conflict, accomplished by shifting vision 10° to the right at all positions with computer adjustment of a pair of variable prisms (Bedford, 1989). Exposure to the conflict was obtained by requiring participants to point to the distant LED targets while looking through the 10° shift; the experiment room remained dark and all visual feedback was provided through a "finger LED". That is, participants wore a cuff on the index finger that contained a single LED; on each trial the finger LED was initially unilluminated, and it illuminated only when the participant accurately pointed at the specified target. Each participant's task was to point at a target with his right hand and to keep the finger LED illuminated for the duration of a 5-sec trial. For instance, a target actually at 0° will appear at 10°, and any initial aiming at 10° will not illuminate the finger LED. Participants were told that any difficulty they encountered was because it was disorienting being in the dark, a small deception that is readily accepted. When a participant succeeded at pointing at the actual target, the finger LED illuminated; it too was seen through the prism, and thus the light on the finger felt to be in one place (e.g. P = 0°) but appeared at another (e.g. V = 10°), providing a visual-proprioceptive conflict. At the distance of the target LEDs, the target likewise appeared visually at one place (V = 10°) but was localized differently through pointing, via the extension of the arm and hand, at another location (e.g. P = 0°). This exposure method was chosen for tight control over the exposed conflict location pairs (Bedford, 1989). Each of 9 target positions (see Table 1) was repeated 10 times in random order for a total of 90 exposure trials.
For Groups 1 and 2, a sound (broad band white noise) located at the same effective distance as the lights came on at the same time as the finger LED with the same azimuth angle as vision (Group 1) or as proprioception (Group 2). The timing of the sound and the finger LED were yoked; that is, simultaneous onsets and offsets of both were controlled by accurate pointing to the target.
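The exposure triplets of Table 1 follow mechanically from this design; the sketch below (an illustration, not the authors' software) regenerates them from the 10° prism shift:

```python
# Sketch regenerating the Table 1 exposure stimuli (degrees azimuth;
# negative = left of straight ahead). Proprioceptive/true target
# positions run from -25 to +15 in 5-deg steps; vision is shifted
# 10 deg rightward; audition matches vision (Group 1), matches
# proprioception (Group 2), or is absent (Group 3).

SHIFT = 10  # rightward prism displacement, degrees

def triplets(group):
    rows = []
    for p in range(-25, 20, 5):      # nine true/proprioceptive positions
        v = p + SHIFT                # apparent visual position
        a = {1: v, 2: p, 3: None}[group]
        rows.append((v, p, a))
    return rows

print(triplets(1)[0], triplets(2)[0], triplets(3)[0])
```

For example, the first Group 1 row is (v, p, a) = (-15, -25, -15): a target actually at -25° appears at -15°, and the sound agrees with vision at -15°.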

Each participant's head was kept in the same position throughout the experiment with a chin rest, a forehead rest, and "centering lights" as each part commenced. All pointing occurred with arm and index finger fully extended. Each test trial was followed by the participant swinging his arm back and forth laterally to minimize dependence among trials. For Tasks 1 to 3, the adaptive change (posttest minus pretest) is a shift of pointing to the left.


Results

The addition of auditory information influenced the direction of resolution of the visual-proprioceptive conflict. Figure 1 shows the differences between pretest and posttest for the 3 groups on the 3 tasks. The two groups with opposite auditory information showed opposite components of adaptation: Group 1 had a large visual change and a small hand/arm proprioceptive change, while Group 2 showed the reverse pattern. Note that this finding is in the direction opposite to the prediction.

An ANOVA was performed on the differences between pretest and posttest with group and task as factors. The group by task interaction was significant (F(4,445) = 5.4, p < .001), reflecting the different components of adaptation in the different groups. Group 3 without sound and Group 1 with sound in agreement with vision behaved similarly, while Group 2 with sound in agreement with proprioception reversed the components of adaptation. The task main effect was also significant (F(2,445) = 8.8, p < .001), reflecting that, overall, Task 1 (total adaptation) showed the most change and Task 3 (hand proprioception) the least. The group main effect was not significant (F = 1.0, p > .1), reflecting equal overall adaptation in the 3 groups when collapsed across task.

Analysis of the 4th task, the posttest-only pointing to sounds, showed a linear relation between the position of the target and where participants pointed (correlation coefficient .88). This verified that sound positions differing by 10° were readily distinguishable in the present apparatus, as expected. Pointing averaged across target positions also revealed that the Group 2 sound data differed from the other 2 groups, with a mean of -4.7° compared to -9.3° in Group 1 and -9.4° in Group 3. The difference was confirmed by the group main effect, F(2,100) = 4.8, p < .01, in an ANOVA. While it has become standard to assess perceptual adaptation only with pretest subtracted from posttest, in many learning experiments (such as in associative learning) posttest-only comparison among differentially treated groups is the preferred measure. For the present task, less leftward (more rightward) pointing to sounds would occur in Group 2 than in the other groups if perceived auditory location had shifted in the direction of the visual location as a result of the visual-auditory conflict in that group.


Discussion

The direction of resolution of an ambiguous cross-modal spatial conflict can be influenced by the addition of a third modality. However, the direction of influence was opposite to that expected from general problem solving, cross-modal integration models, and error-correction accounts. The data are also not explained by a general visual dominance principle. When vision and proprioception disagreed about the location of an object, the addition of auditory information that agreed with proprioception did not lead to a greater reliance on proprioception. Instead, it led, counter-intuitively, to an increase in proprioceptive change and a decrease in visual change.

The "directed attention" hypothesis (Canon, 1970) is an early view of cross-modal conflict resolution that may be relevant to the present finding. In that view, when two modalities are in conflict, the one that changes is the one that is not attended. The theory has been expanded and updated by Redding and Wallace (e.g. 1990, 1997; "guided modality"), who suggest that vision usually dominates because vision is usually allowed to guide an entire pointing movement. They argue that if a different task does not permit vision to guide aiming for the target, such as bringing one's arm into view to read a wristwatch, then attention will instead be on arm proprioception, reversing the usual dominance. Redding and Wallace (1990) confirmed their prediction in an elegant study in which systematically delaying visual feedback during a pointing task systematically decreased changes in arm proprioception while simultaneously increasing changes in the visual modality.

The present findings may be consistent with the predictions of this approach. The LED exposure paradigm provides visual feedback only after pointing is complete, which, as their view predicts, led to a change in vision when only vision and proprioception were in conflict. In addition, there is reason to believe that adding audition in agreement with proprioception shifted attention back onto the available visual information on subsequent trials. This is because adding audition also introduced a visual-auditory conflict, and it has been widely reported that visual-auditory spatial conflicts resolve largely in favor of vision. While we did not directly measure which modality was attended, in the directed attention view the prevalence of auditory change in visual-auditory conflicts implies greater attention to the visual modality. Also consistent is the present apparent shift in perceived auditory location toward vision when vision and audition were in conflict. Thus, the visual-auditory conflict may have redirected some attention back to vision and away from proprioception during the pointing task, which led to a greater proprioceptive change. If the directed attention/guided modality view proves to be important for cross-modal conflict resolution, as Redding and Wallace maintain, then it remains to be explained how the correct resolution is reached during growth, because change would seem to be governed haphazardly by whichever modality one happens to be attending to.

Most research on cross-modal conflicts investigates the interaction between two modalities. One important consideration for conflicts involving three modalities is whether a single three-way conflict is processed as such. For instance, if directed attention is the correct explanation, then the third modality of audition had an influence only in an indirect way. The addition of a visual-auditory conflict changes attention to vision, which in turn influences the vision-proprioceptive conflict. That is, the conflicts can be processed two at a time, rather than leading to what seemed to be an intelligent solution based on the totality of a three-way conflict. The modality interactions are complex, consisting of visual, proprioceptive, and auditory information at each of the near distance of the finger, the far distance of the targets, and the relation between them, such as whether the target, finger, and eyes are in line. In the present experiment, the auditory information was at the same distance as the distant target, and was temporally yoked with the near finger. It is conceivable that different configurations of spatial and temporal auditory congruence will lead to different resolutions (see Bedford, 2001 on object/numerical identity across different modalities).

A final consideration is that even when three modalities are considered together, what initially seems to be an appropriate solution for an ideal decision maker may not be. Following is one situation in which the two modalities in the majority are nonetheless wrong. (The direction and size of the distortions are derived from Bedford, under review, and Bedford and Harvey, under review, who recently modeled the plasticity required in development to maintain accurate pointing and sound localization; see also Clifton et al., 1988.) An observer is asked to point, without seeing her right hand, to the sound of an object located 45° to the right of straight ahead. An auditory distortion (e.g. due to an increase in the distance between the ears, as with growth) causes the object to be localized incorrectly at 54°. The observer therefore attempts to point to 54°, but because of proprioceptive distortions (an increase of shoulder width and arm length with growth), her arm feels to be at 54° when it is actually only at 52°. If she sees the position of the object, it is at 45°, provided vision is undistorted. Thus, she feels (incorrectly) that she is pointing at 54°, the object sounds (incorrectly) as though it is at 54°, but looks (correctly) as though it is at 45°, for a v, p, a conflict of 45, 54, 54. Yet the majority value is wrong, and it would be incorrect to change the visual modality, the only correct modality in the situation. Bayesian models of cue combination in perception (Ernst & Banks, 2002; Landy & Kojima, 2001) emphasize that different cues will not be weighted equally. Besides modality precision, the different uses, abilities, and limitations of each sensory modality may change the weighting to produce an appropriate solution. Perceptual intelligence may come in different flavors.
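The arithmetic of this scenario can be made explicit in a short sketch (the distortion values are taken directly from the worked example; the lookup functions are illustrative stand-ins for the growth-induced distortions, not a model from the cited work):

```python
# Sketch of the growth scenario in which the two agreeing modalities
# are both wrong. All values (degrees right of straight ahead) come
# from the worked example in the text.

TRUE_TARGET = 45  # actual object azimuth

def auditory_estimate(true_pos):
    # Growth of inter-ear distance inflates the perceived angle: 45 -> 54.
    return 54 if true_pos == 45 else true_pos

def felt_arm_position(actual_arm):
    # Proprioceptive distortion with growth: an arm at 52 feels like 54.
    return 54 if actual_arm == 52 else actual_arm

v = TRUE_TARGET               # vision, undistorted and correct
p = felt_arm_position(52)     # arm aimed at the (wrong) auditory position
a = auditory_estimate(TRUE_TARGET)

# A simple majority vote over the three modalities picks 54 --
# yet only vision (45) is correct.
majority = max(set([v, p, a]), key=[v, p, a].count)
print((v, p, a), majority)
```

The majority vote selects the distorted value shared by proprioception and audition, illustrating why "fix the dissenting modality" is not a safe general rule.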


References

Banks, M.S. (1988). Visual recalibration and the development of contrast and optical flow perception. In A. Yonas (Ed.), Perceptual development in infancy (pp. 145-196). Hillsdale, NJ: Lawrence Erlbaum Associates.

Bedford, F. L. (1989) Constraints on learning new mappings between perceptual dimensions. Journal of Experimental Psychology: Human Perception and Performance, 15, 232-248.

Bedford, F. L. (1995) Constraints on perceptual learning: Objects and dimensions. Cognition, 54, 253-297.

Bedford, F. L. (1999). Keeping perception accurate. Trends in Cognitive Sciences, 3, 4-12.

Bedford, F.L. (2001). Towards a general law of numerical/object identity, Cahiers de Psychologies Cognitive/Current Psychology of Cognition, 20, 113-175

Bedford, F. L. (under review). A simple mathematical model for assessing developmental recalibration of accurate visual-motor pointing. Manuscript submitted for publication.

Bedford, F. L. & Harvey, E.M. (under review). What plasticity is required in development for manual pointing in space? Manuscript submitted for publication.

van Beers, R.J., Sittig, A.C. & Denier van der Gon, J.J. (1999). Integration of proprioceptive and visual position information: an experimentally supported model. Journal of Neurophysiology, 81, 1355-1364.

Canon, L. K. (1970). Intermodality inconsistency of input and directed attention as determinants of the nature of adaptation. Journal of Experimental Psychology, 84, 141-147.

Clifton, R.K., Gwiazda, J., Bauer, J.A., Clarkson, Marsha G., Held, R. (1988). Growth in head size during infancy: Implications for sound localization. Developmental Psychology, 24, 477-483.

Ernst, M.O. & Banks, M.S. (2002). Humans integrate visual and haptic information in statistically optimal fashion, Nature, 415, 429-433.

Harris, C.S. (1965). Perceptual adaptation to inverted, reversed, and displaced vision. Psychological Review, 72, 419-444.

von Helmholtz, H.E.F. (1962). Treatise on Physiological Optics (J.P.C. Southall, Ed./Trans.). New York: Dover. (Original work published 1909.)

Kubovy, M. & Van Valkenburg, D. (2001). Auditory and visual objects. Cognition, 80, 97-126.

Landy, M. S. & Kojima, H. (2001). Ideal cue combination for localizing texture-defined edges. Journal of the Optical Society of America A, 18, 2307-2320.

Pettigrew, J.D. (1978). The paradox of the critical period for striate cortex. In C.W. Cotman (Ed.), Neuronal Plasticity (pp, 311-330). New York, NY: Raven Press.

Redding, G.M. & Wallace, B. (1990). Effects on prism adaptation of duration and timing of visual feedback. Journal of Motor Behavior, 22, 209-224.

Redding, G.M. & Wallace, B. (1997). Adaptive Spatial Alignment. New Jersey: Erlbaum.

Rock, I & Harris, C.S. (1967). Vision and touch. Scientific American, 216, 96-104.

Rock, I. & Victor, J. (1964). Vision and touch: an experimentally created conflict between the two senses. Science, 143, 594-596.

Rossetti, Y., Jacquin-Courtois, S. & Rode, G. (2004). Does action make the link between number and space representation? Visuo-manual adaptation improves number bisection in unilateral neglect. Psychological Science, 15, 426-430.

Stein, B.E. & Meredith, M.A. (1993). The Merging of the Senses. Cambridge, MA: MIT Press.

Tversky, A. & Koehler, D.J. (1994). Support theory: A nonextensional representation of subjective probability. Psychological Review, 101, 547-568.

Wallach, H., Kravitz, J.H. & Landauer, J. (1963). A passive condition for rapid adaptation to displaced visual direction. American Journal of Psychology, 76, 568-578.

Welch, R. (1978). Perceptual Modification. New York: Academic Press.

Welch, R.B. & Warren, D.H. (1986). Intersensory interactions. In K.R. Boff et al. (Eds.), Handbook of Perception and Human Performance (pp. 25.1-25.36). New York: Wiley.

Table 1

Group 1             Group 2             Group 3
V = P + 10, A = V   V = P + 10, A = P   V = P + 10, no A
(v, p, a)           (v, p, a)           (v, p, -)

-15, -25, -15       -15, -25, -25       -15, -25
-10, -20, -10       -10, -20, -20       -10, -20
 -5, -15,  -5        -5, -15, -15        -5, -15
  0, -10,   0         0, -10, -10         0, -10
  5,  -5,   5         5,  -5,  -5         5,  -5
 10,   0,  10        10,   0,   0        10,   0
 15,   5,  15        15,   5,   5        15,   5
 20,  10,  20        20,  10,  10        20,  10
 25,  15,  25        25,  15,  15        25,  15


Table 1. Stimuli used for training/exposure in the three groups. "V" refers to the apparent visual location, "P" to the proprioceptive location, and "A" to the auditory location. "(v, p, a)" refers to the individual visual, proprioceptive, auditory triplets used. All values are azimuth angles in degrees, with 0° located straight ahead of a participant's nose (head positioned straight on the shoulders) and negative numbers indicating locations to the left of straight ahead. In Group 1, for instance, a target actually 15° to the left would appear 5° to the left. When a participant succeeds at pointing to the true location, -15°, the finger LED lights up at -15° (p), appears at -5° (v), and the sound comes on at -5° (a).

Figure Caption

Auditory modulation of visual-proprioceptive conflict. Change in pointing as a result of training/exposure to the conflict, measured in degrees. Negative values refer to pointing further to the left (the adaptive direction). Total adaptation is indicated by Task 1 (first bar), the visual component, v, by Task 2 (second bar), and the arm/hand proprioceptive component, p, by Task 3 (third bar). Error bars show ±1 standard error. "A = V" indicates audition and vision in agreement during the conflict, "A = P" audition and proprioception in agreement, and "No A" absent audition. Note the reversal of v and p in Groups 1 and 2.