Cahiers de Psychologie Cognitive/Current Psychology of Cognition

2001, 20, 5, 381-

Author's response

"A law of numerical/object identity"

The role of object identity and Klein's geometry in cross-modal and other discrepancies


Felice L. Bedford


University of Arizona, Tucson, USA

A single object in the world exists continuously in time and travels along continuous paths in space. When observers sample from the world at different times, from different spatial vantage points, and through different sense modalities, the samples that result are often discontinuous. Determining whether or not these discontinuous samples arise from the exact same object is an important problem in perception and cognition. Bedford (2001a) recently put forth a theory ("Object Identity Theory"; see Bedford, 2001b, in press) in which the object identity problem was shown to be the same for samples from space, time, modality, and different eyes, and in which a solution based on mathematician Felix Klein's (1893/1957) five levels of geometric transformations was offered. In the present article,1 challenges to the necessity of object identity for cross-modal conflicts, the difficulties a lone pair of samples presents, and suggested counterexamples to the geometric solution are discussed. Also included is a brief reiteration that basic spatial-temporal laws are simple limiting cases of the more general geometry.

Correspondence should be sent to Felice L. Bedford, University of Arizona, 312 Psychology Building, Tucson, AZ 85721, USA


Cross-modal interactions

According to Object Identity Theory, when samples arise from different modalities they must be linked to the same object to initiate familiar cross-modal phenomena such as ventriloquism and prism adaptation. An example of the ventriloquism effect involves hearing the sound of a movie as coming from the central movie screen rather than from the nearest speaker located off to the side. Close your eyes, and the sound reverts back to its true location.

Radeau and Colin (this issue) raise four objections: first and second, object identity is not required for either ventriloquism or the aftereffects it can cause. They state: "We will show hereafter that this claim is mainly based on a confusion between the results of ventriloquism studies in which immediate effects and aftereffects are investigated". Third, object identity is also not required for the McGurk Effect, a cross-modal interaction involving identification of speech sounds which was mentioned but not developed in Bedford (2001a), and fourth, object identity only results from cross-modal integration and not the other way around. In Object Identity Theory, all four claims are incorrect.

The crux of Radeau and Colin's arguments is based on studies which claim that while cognitive factors such as instructions, strategies and realism of displays can have an effect on ventriloquism, they do not influence aftereffects of ventriloquism. A digression is necessary to make certain the distinction between ventriloquism and aftereffects is clear.

When there is a discrepancy between modalities, there are at least two psychological processes that can be observed. Immediate effects, such as ventriloquism, refer precisely to an immediately observable effect on perception. If a (stationary but noisy) car on the movie screen is detected visually in one location, but the same car is detected auditorily in a different location, then the car is perceived as coming from one location despite the conflicting location values fed by the two modalities. Presumably, the preferred unconscious perceptual inference, based on the totality of the data, is that the world contains a single car in a single location. Despite the disagreement between the modalities, this solution is unconsciously judged a better fit than other hypotheses, such as that one car is really in two places in the world at the same time (or that there are two different cars, one heard, and one seen). If there is a disagreement about location between vision and proprioception rather than vision and audition, an immediate effect also occurs. For instance, if you look at your own hand through a prism, the hand immediately feels to be where it looks through the prism; only one hand in one location is perceived despite the conflict. This effect is often known as "visual capture", a term which also can encompass ventriloquism. Immediate effects are short-term, and refer only to the on-line inference. If the observer closes her eyes, the sound "snaps back" to its true location, and the hand feels to be where it really is. Removal of one modality removes its influence on the other.

1. Due to time constraints, I have been unable to analyze and respond to the excellent article by Brian Scholl at this time. Hopefully a dialogue can be continued in this or some other forum.

This is in contrast to the second effect that occurs following cross-modal discrepancies, a long-term solution that is observed when a conflict is repeated. As with the immediate effects, the solution is adaptive: the sensory processes "blamed" for the error are adjusted and recalibrated accordingly (see Bedford, 1999). The long-term effects persist even when one of the conflicting modalities is removed; the hand, for example, continues to feel displaced even when the observer is no longer looking at it. Or the sound continues to be localized where the noisy object had been seen, even though it is no longer visible. When vision and proprioception are in conflict, the long-term solution is known as "prism adaptation" or "adaptation", and when vision and audition are in conflict, the long-term solution is sometimes known as aftereffects of ventriloquism, or "adaptation" as well.

Radeau and Colin's claim is that the object identity decision is not necessary for the aftereffects of ventriloquism because cognitive factors do not seem to influence the aftereffects of ventriloquism. However, this misses the point that the object identity decision need not be conscious. Bedford (2001a) was explicit that the critical decision as to whether two samples arise from the same object need not be conscious. The solution put forth by Object Identity Theory - nested geometries based on Klein's transformation approach to geometry - is, in fact, a solution that favors automatically elicited non-conscious processing. A well-educated adult may not even remember the Pythagorean theorem from Euclidean geometry consciously, let alone master the counterintuitive rules of topology or the other geometries involved. Similarly, they claim that for the immediate effect of ventriloquism, cognitive effects can occur, but still need not: "Immediate pointing biases were found even on those trials where the subject did not experience that the signals come from a same location" (italics added), and hence object identity is not required for ventriloquism either. Issues of experience and conscious awareness of object identity are independent of issues of the requirement of object identity.2 Object Identity Theory stands unwaveringly by its claim that a same object determination is simply a logical necessity for both immediate effects and long-term effects that result from discrepancies on any parameter between any modalities. There is no confusion between immediate effects and aftereffects.

For immediate effects, you would not hear the car as coming from the screen unless at some level the visual and auditory signals were judged to refer to the same car. To do otherwise would be a nonsensical exploit for perceptual systems. As discussed in Bedford (2001a), as you read this article, the television may be on in the background. It would be unadaptive to hear the sound of the television as coming from the location of the article, since these sounds and sights are independent of one another. In a normal environment, there are multiple simultaneous sights and sounds; only those that are determined to arise from the same source should and will influence each other in this way. For long-term effects, the logic of the necessity of the object identity decision was explained for prism adaptation (Bedford, 1999, 2001a) and applies to aftereffects of ventriloquism as well. To recap briefly, the reason a mismatch between vision and proprioception (or vision and audition) leads to adaptation is also based on an assumption that one object cannot be in two places at the same time. The discrepant cross-modal signals for a single object indicate that there must be an underlying error in one or both modalities. If there are two objects, there is no reason why they cannot be in two places, and there is no reason for adaptation to occur. Indeed, adaptation under these circumstances would fix something that isn't broken. Radeau and Colin argue that instead synchrony and proximity are required for the aftereffects of ventriloquism. How identity occurs, and that it must occur, are different issues. But note that it was also shown in Object Identity Theory how what are known as synchronization and proximity are derived from the geometric solution - synchronization results from geometry applied to time, and proximity from geometry for space.

2. However, it is worth noting that a function of the immediate effects should be to maintain experience of a coherent world, and the dissociation Radeau and Colin report may be of special interest for issues of how consciousness and perception interact in adaptation (see Reiner & Willingham, 2001). It is also reminiscent of the finding of Goodale, Milner, Jakobson, and Carey (1991), in which dissociations in responding occur depending on whether patients use a motor response or report the answer.

The McGurk effect (McGurk & MacDonald, 1976) is an immediate effect that results from a cross-modal discrepancy for speech sounds rather than location. For instance, if observers listen to /ba/ but watch a face uttering /ga/, the visual information is influential and observers report hearing the intermediate speech sound /da/. For the McGurk effect, our positions should now be predictable. Radeau and Colin argue that object identity is not required for the McGurk effect because it may remain uninfluenced by explicit conscious knowledge (e.g., a female visual speaker and a male auditory speaker do not always diminish the effect; Green, Kuhl, Meltzoff, & Stevens, 1991). I argue that object identity is a logical prerequisite for cross-modal integration and that conscious vs. unconscious processing is a red herring. It would be illogical if vision were permitted to influence the results of audition about a speech sound from a different speaker. Only when it is determined that it is the exact same speaker will it be advantageous to consult vision as a source of information as to what that speech sound might have been, even if that decision is not consciously arrived at or its results are consciously counterintuitive.3 If Lila Gleitman and Henry Gleitman are both speaking to you at the same time (and anyone who is fortunate enough to dine with them knows that they do this, and are both worth listening to), then the sight of Henry's lips moving will only help you process what Henry is saying, not Lila, and vice versa, because they talk of different things. Logically, this would be true even if Lila's voice came from a speaker closer to Henry's location, and vice versa, even if perceptually one were misled by such a switch.

3. Conscious reasoning may suggest a male voice and a female face are not the same speaker, but the automatic geometry at the core of identity decision may nonetheless equate them.

Radeau and Colin's final issue refers to neurons in the superior colliculus that are responsive to input from different modalities simultaneously. I agree multisensory neurons may play a role in the substrate of cross-modal interactions, but I do not see how the existence of such cells shows that the object identity decision must occur after integration rather than before as they suggest. I invite Radeau and Colin to develop the argument. Note that a potential paradox involving the order of the object identity decision and cross-modal integration, which I labeled "Held's paradox", was offered in a commentary I wrote on an article of Radeau's several years ago (Bedford, 1994) and may be of interest here.

One final note is that Radeau (and Colin) use the notion of "pairing" for ventriloquism here and elsewhere, but it is puzzling that originally Radeau (and long-time collaborator Bertelson) traced what was behind the concept as relating to a "single source, event, or object" (Radeau & Bertelson, 1974, 1976). It is unclear if her present work implies Radeau has had a change in theoretical position, or whether it is just the different (interesting, but independent) issue of conscious vs. unconscious processing which has gotten in the way.


One pair of samples in the real world

Object Identity Theory was developed for situations when there is a choice between samples. For instance, will a circle map onto a square or to a circle with a hole in it in an apparent motion paradigm? In the geometric solution of the theory, a sample from the lowest level geometric transformation available will always be chosen - isometric preferred to similarity, but similarity preferred to affine, and so on.
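The selection rule just described can be sketched in a few lines of code. This is only an illustration under stated assumptions: the level names follow the standard Klein ordering (the "projective" level is assumed here, since the text above names only isometric, similarity, and affine explicitly), and the candidate labels are hypothetical.

```python
# Transformation levels ordered from least radical to most radical.
# ASSUMPTION: "projective" is filled in from the standard Klein hierarchy.
HIERARCHY = ["isometric", "similarity", "affine", "projective", "topological"]


def choose_sample(candidates):
    """Pick the candidate related to the first sample by the lowest level.

    `candidates` maps a (hypothetical) sample label to the name of the
    transformation level that relates it to the first sample.
    """
    return min(candidates, key=lambda label: HIERARCHY.index(candidates[label]))


# Hypothetical choice: a square followed by either a smaller square
# (similarity) or a rectangle (affine).
candidates = {"rectangle": "affine", "smaller_square": "similarity"}
print(choose_sample(candidates))
```

The sketch encodes only the ordinal rule (lower level wins when a choice exists); it says nothing about how the level relating two samples is computed in the first place.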

Sharon (this issue) raises the issue of object identity when there are no choices. While she notes that Bedford (2001a) discusses that real-world situations present an abundance of samples from which to choose, she correctly argues that there are nonetheless situations in which there may be only one pair of samples. For instance, she suggests that if a man goes behind a house but a woman emerges, there appears to be only one pair of samples, and yet, we make an identity decision ("no"). As she puts it: "However, when more than one possible pairing is not immediately present, processing doesn't grind to a halt - people still make identity decisions, and in a non-random way". She further points out that since the man/woman case is the only possible match, it seems like it should be accepted because, in Object Identity Theory, identity is resolved from the lowest level of the hierarchy available, whatever it is.

These are important issues that need to be addressed. One useful distinction is between the theory itself and its development or application. The geometric solution was indeed developed to make predictions for situations with choices between higher and lower levels, but note from the original article: "To summarize, in the present theory, the object identity decision is probabilistic such that the more radical a transformation, the greater the properties of the original form are altered, and the less likely the pre-transformed and post-transformed samples will be judged to refer to the same object" (p. 142) and "In practice, this general probabilistic information can be used for an actual decision - Yes or No - by choosing ..." (p. 142). Thus, development of the theory applied the probabilities to choice paradigms where it is arguably clearest about how probabilities can be used to make concrete decisions. However, it is important to note that the theory itself involves a general ordering of probabilities.

Probabilistic ordering also applies to a single pair of samples. All else being equal, a single pair of samples will be more likely judged as coming from one object, the less radical the geometric transformation that relates them. As Sharon astutely observes, the prism adaptation situation discussed was, in fact, based on an identity decision with only a single pair of samples, the hand in one place through vision, and another place through proprioception. As discussed, and reiterated by Sharon, the greater the amount of transformation (as defined by the Klein hierarchy) between the two modalities, the less learning there was in an adaptation-like paradigm, even though there was only one pair of samples on each trial. For instance, a mapping in which a visual square was turned into a proprioceptive rectangle led to less adaptation than a mapping in which the visual square became a smaller square in proprioceptive space. How do graded probabilities get translated into graded amounts of learning? When the probability that two samples come from the same object is high, the probability that they will be judged on any one trial to refer to the same object is high. Since adaptation requires they be judged to be one object, there will effectively be more training trials when the probability of object identity is high. But of course, all else is not always equal. And a difficult issue remains - how is a specific probability (and what is the value of the probability) translated into an actual decision for two specific samples?
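The step from graded probabilities to graded learning can be made concrete with a small sketch. It assumes, purely for illustration, that each trial independently yields a "same object" judgment with a fixed probability, so that the expected number of effective training trials is the number of trials times that probability. The probability values below are hypothetical placeholders, not estimates from any experiment.

```python
def expected_effective_trials(n_trials, p_same_object):
    """Expected number of trials on which the pair is judged one object.

    ASSUMPTION: trials are independent and share one fixed probability.
    """
    if not 0.0 <= p_same_object <= 1.0:
        raise ValueError("p_same_object must lie between 0 and 1")
    return n_trials * p_same_object


# HYPOTHETICAL probabilities: a similarity mapping (square -> smaller square)
# is assumed more likely to be judged "same object" than an affine mapping
# (square -> rectangle).
P_SIMILARITY = 0.8
P_AFFINE = 0.5

similarity_trials = expected_effective_trials(100, P_SIMILARITY)
affine_trials = expected_effective_trials(100, P_AFFINE)
print(similarity_trials > affine_trials)
```

Under these assumptions, the lower-level mapping simply receives more effective training for the same number of presented trials, which is the ordinal pattern described above.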

Sharon offers a suggestion that perhaps there is a probability cutoff, after which the decision for two samples will always be "no". While that would be a heuristic that would take care of at least the extreme decisions, I do not think such an inflexible rule, however simple it would make our task, is ever used. As discussed, even non-topological transformations are possible, such as when one sample is mapped to two samples in apparent motion "splits" or Panum's limiting case in stereopsis. Note also that Object Identity Theory does not predict that a single pair of samples must be accepted as coming from one object when any transformation, even a radical one, relates them. Rather, it is that even such a radical transformation can be judged as one object.4 Thus, Sharon's concern that people can never arrive at a "different object" decision in the theory is, at least, not a problem.

If there is just a single sample pair, one generality may be that the more radical the transformation, the more evidence required for a same-object conclusion. For instance, Huber and Aust (2001) show a number of leopard poses and note that they can all be viewed as coming from the same leopard. However, if one considers just two pictures at a time, some pairs are easy and some are hard. If we see a leopard from an angle, and then head on after a few contortions, the transformation that results is likely topological or even non-topological; under these circumstances, we may need more evidence before we are perceptually certain it is still the leopard - perhaps a growl, or a few more glimpses. Perhaps there is also greater room for domain-specific knowledge when there is a single pair of samples. There have been a number of comments on how specialized knowledge can be used to determine object identity (e.g., Bloom, 2001; Wilcox, 2001). While Object Identity Theory argues geometry is still the basic core of the decision (see also Bedford, 2001b), such domain-specific knowledge may play an especially large role for single-sample pairs. The man-woman transformation Sharon mentions, for example, can be judged as two people and not one with help from such specialized knowledge that, except in rare medical facilities, men do not turn into women and surely cannot even cross-dress so quickly. But such knowledge is painfully specific. This is a long way from Sharon's hope that Object Identity Theory can also provide principled criteria when only one pair of samples is available, as it does for choices between samples. I suggest that if one accumulates all possible single sample pair cases, they will order themselves, as if by magic, such that the higher the level of transformation, the lower the probability the two samples were judged the same object. But the route to that observed regularity may be squishy.

4. There might be a counter-intuitive prediction here that one pair of samples, x and y, need not be judged the same object, but adding an alternative, z, such that there is now a choice between x and y and a more radical x and z, makes an x-y link more likely than if x and y were the only stimuli present. If the man goes behind the screen and both a woman and a car emerge, are the man and the woman more likely to be linked as the same object than if just the woman emerged without the car? Alternatively, Object Identity Theory can be interpreted that when there is a choice, the sample from the lower level transformation will be picked if any sample is picked at all.


What is a sample?

Sharon also suggests that the theory develop further how to characterize and pick out samples both to determine how many there are, and as a question in its own right (see also Gauker, 2001).

A sample is part of a proximal stimulus. Identification of samples can begin with discontinuities in the proximal stimulus that occur in space and time (look at first and second order derivatives) - samples from different modalities and different eyes have a head-start, as they begin at least more separable from one another. If we survey the samples used to build the case that identity is required in a number of independent domains in Bedford (2001a), we obtain the following:

Apparent motion      points (dots) separated in time and space, or small geometric forms
Prism adaptation     a point on the finger, localized visually and proprioceptively
Ventriloquism        a point of light, and a simple sound, separated in space
Gestalt Proximity    as few points as possible, spatially separated
Stereopsis           as few points as possible on the left retina and on the right retina
Priming              two small words, separated in time

It is not a coincidence that the samples were chosen to be as small and contentless as possible - a dot or psychological point when that would suffice. This was done so as to facilitate the abstraction away from content and to show the "sameness" that lurks beneath the different phenomena. Indeed, initially pictures of the formal structure were included for each phenomenon, but judged unnecessary by reviewers - presumably because they all looked like the same picture. Other phenomena for which the theory argues identity occurs, but with larger richer samples, were typically relegated to a table and not developed:

Pavlovian conditioning    stimuli like tones and shock, separated in time
McCollough effect         a vertical red/black grating, and a horizontal green/black grating
McGurk effect             a visual speech sound and an auditory speech sound

To identify larger, richer samples from a big confusing proximal stimulus, a useful idea is provided by Lachter (2001). He suggests that Object Identity Theory be applied at many different scales simultaneously, the way Marr suggests in general that early vision proceed. Note, however, that while Sharon observes that real life rarely presents samples in "tidy" fashion, this is a problem shared by many research enterprises. Pavlovian conditioning, for instance, may present a single tone stimulus, with care to ensure that no other sound reaches the subject's ears. Even perception research, whose raison d'être is perception, assumes as a given the extraction of the perceptual elements; subjects are placed in front of a computer monitor, and fed very specific tidily presented stimuli. The important question of what is a sample, both for space ("objects") and for time ("events"), may be a shared problem with all of perception.
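The earlier suggestion that identification of samples can begin with discontinuities in the proximal stimulus (first and second order derivatives) can be illustrated with a toy one-dimensional case. This is a minimal sketch, assuming a discretely sampled intensity profile, simple finite differences as stand-ins for derivatives, and an arbitrary threshold; none of these choices is part of the theory itself.

```python
def differences(values):
    """Finite differences between neighboring entries (a discrete derivative)."""
    return [b - a for a, b in zip(values, values[1:])]


def find_discontinuities(signal, threshold):
    """Return indices where first or second order differences exceed threshold.

    ASSUMPTION: `signal` is a hypothetical 1-D intensity profile sampled at
    equal steps in space or time; `threshold` is an arbitrary cutoff.
    """
    first = differences(signal)
    second = differences(first)
    return {
        "first_order": [i for i, d in enumerate(first) if abs(d) > threshold],
        "second_order": [i for i, d in enumerate(second) if abs(d) > threshold],
    }


# A flat region followed by an abrupt step to a second flat region.
profile = [0, 0, 0, 5, 5, 5]
boundaries = find_discontinuities(profile, threshold=1)
print(boundaries)
```

Here the abrupt step is flagged as a candidate boundary between two samples; applying the same procedure at several scales would be one way to follow Lachter's suggestion, though that extension is not sketched here.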


Engineering vs. psychology

Dobbins (this issue) has concerns with stereopsis and apparent motion examples of object identity. But there is a deeper issue here about different approaches to the same problems in perception. Starting with the specific issues, for stereopsis, Dobbins comments on Panum's limiting case. The phenomenon was raised in Bedford (2001a) as an example where even many-to-one matches, which are beyond the official end of the Kleinian geometries, can occur to achieve object identity. It illustrated the extreme flexibility that object identity decisions can take and how Object Identity Theory is able to incorporate such flexibility. In Panum's limiting case, two dots (or lines) on one retina can match a single dot (or line) on the other retina, when they are the only dots (or lines) available. Observers see two dots (lines) separated in depth. Dobbins raises a fascinating example in which the single line is replaced with two very closely spaced lines - and fusion no longer occurs. He suggests that this is a counterexample to the geometric solution of Object Identity Theory because a non-topological match was accepted in the first case, but the topological match of the second case, a lower level in the hierarchy, was not accepted. However, this is not a genuine reversal of the hierarchy because the system did not choose a higher-level match over a lower-level one in the same situation. Had there been a choice between mapping one line to one other line, or to two lines, then in the theory, the topology-preserving single line would be selected. As discussed in Bedford (2001a), the same two samples can in one situation refer to one object but in another situation refer to two objects, and more relevant here, two samples related by a high-level transformation can refer to one object (e.g., two bird samples separated in time, related topologically), while two samples from a low level in a different situation can refer to two objects (e.g., two tennis balls separated in space, related by an isometric transformation).

The example Dobbins raises is nonetheless fascinating because the two situations are so closely related. Exploring such cases near the limits may prove interesting. Note that limits occur in the other domains as well. In prism adaptation, for example, there are limits to how large the discrepancy can be to attain a "same object" determination, even when a low-level transformation (isometric shift) separates the two modalities. Large separations of 40 deg. will not produce either prism adaptation (long-term effect) or visual capture over proprioception (immediate effect). For stereopsis, as Dobbins explains, the examples of Panum's limiting case and the permutation he raises reflect in part limits in the amount of binocular disparity that will lead to fusion for one pair of points, and limits to the disparity gradient for pairs of points (i.e., the difference between the disparities). Domain-specific limitations, or the existence of limits generally, do not detract from a general law (see Bedford, 2001b).

For apparent motion,5 Dobbins states that Object Identity Theory cannot explain spatial and temporal regularities such as Korte's laws. This is incorrect. The appendix (which I suspect was missed by many) shows precisely how Korte's third law (which states that the amount of time needed for optimal apparent motion increases with increasing spatial separation) is an example of the geometric solution. To use the hierarchy to derive spatio-temporal laws, the samples were constrained to no more than one-dimensional lines rather than two-dimensional forms and the now free spatial dimension was replaced with the temporal dimension. This allowed space and time to be present in the hierarchy at the same time rather than separately applied as it was in previous examples. Basic spatio-temporal laws can be understood as limiting cases of the more general geometry where the stimuli are point sources, or behave like them. Combining two dimensions of space (forms rather than lines or points) along with time would require a three-dimensional version of the hierarchy, which would be a profitable direction to pursue.

Perhaps most importantly, Dobbins concludes: "I have tried to show how constraints on binocular fusion depend on a detailed consideration of the geometry of local disparity and on the physiological mechanisms of binocular matching rather than on very general abstract considerations concerning a hierarchy of geometries." It is here that there is a fundamental difference in viewpoint. I have tried to show just the opposite - that local disparities and their many equivalents cannot be sufficient to understand perception, especially for relating diverse phenomena, each with its own set of specialized local rules. Also, why do the disparity gradients have the limits that they do? Do the limits to simple binocular fusion reflect situations where points on the left and right eyes could never have plausibly arisen from a single object in the natural world? Engineering models simply do not ask these kinds of questions.

5. Dobbins also briefly raises three other specific points about apparent motion, none of which is problematic for Object Identity Theory. He suggests that other superficial factors might explain apparent motion choices rather than the geometric hierarchy. However, different experiments would require different explanatory principles while only the geometric hierarchy can explain all the findings. He suggests that subjects may not actually see the motion in apparent motion choice paradigms. We have developed a "clock" paradigm (Bedford & Mansson, under review) to address these shortcomings in apparent motion methodology and have found the same geometric choices. Lastly, he suggests that since nearly any two stimuli can be accepted for a match in apparent motion, it is difficult to see how the geometric hierarchy is explanatory. As discussed in Bedford (2001a), the theory excels at just such flexibility while at the same time remaining rule-governed.




This work was supported by a grant from the Vice Presidents Office of Research at the University of Arizona funded by the University of Arizona Foundation.



References

Bedford, F. L. (1994). A pair of paradoxes and the perceptual pairing process. Cahiers de Psychologie Cognitive/Current Psychology of Cognition, 13, 60-68.

Bedford, F. L. (1999). Keeping perception accurate. Trends in Cognitive Sciences, 3, 4-12.

Bedford, F. L. (2001a). Towards a general law of numerical/object identity. Cahiers de Psychologie Cognitive/Current Psychology of Cognition, 20, 113-175.

Bedford, F. L. (2001b). Object Identity Theory and the nature of general laws. Cahiers de Psychologie Cognitive/Current Psychology of Cognition, 20, 277-293.

Bedford, F. L. (in press). Generality, mathematical elegance, and evolution of numerical/object identity. Brain and Behavioral Sciences.

Bloom, P. (2001). Identity crisis. Cahiers de Psychologie Cognitive/Current Psychology of Cognition, 20, 183-192.

Gauker, C. (2001). Object identity: What is the question? Cahiers de Psychologie Cognitive/Current Psychology of Cognition, 20, 215-220.

Goodale, M. A., Milner, A. D., Jakobson, L. S., & Carey, D. P. (1991). A neurological dissociation between perceiving objects and grasping them. Nature, 349, 154-156.

Green, K. P., Kuhl, P. K., Meltzoff, A. N., & Stevens, E. B. (1991). Integrating speech information across talkers, gender and sensory modality: female faces and male voices in the McGurk effect. Perception and Psychophysics, 50, 534-536.

Huber, L., & Aust, U. (2001). The relevance of evolution, species comparison, color and categorization for the object identity problem. Cahiers de Psychologie Cognitive/Current Psychology of Cognition, 20, 221-229.

Klein, F. (1957). Vorlesungen über höhere Geometrie (Lectures on higher geometry) (3rd ed.). New York: Chelsea. (Original work published 1893)

Korte, A. (1915). Kinematoskopische Untersuchungen. Zeitschrift für Psychologie, 72, 193-206.

Lachter, J. (2001). Object similarity, not identity: Getting at Bedford's core arguments. Cahiers de Psychologie Cognitive/Current Psychology of Cognition, 20, 231-236.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.

Radeau, M., & Bertelson, P. (1974). The aftereffects of ventriloquism. Quarterly Journal of Experimental Psychology, 26, 63-71.

Radeau, M., & Bertelson, P. (1976). The effect of a textured visual field on modality dominance in a ventriloquism situation. Perception and Psychophysics, 20, 227-235.

Reiner, C., & Willingham, D. B. (2001). Beliefs about object identity and beliefs about object properties in perception. Cahiers de Psychologie Cognitive/Current Psychology of Cognition, 20, 255-260.

Wilcox, T. G. (2001). Object identity: A developmental perspective. Cahiers de Psychologie Cognitive/Current Psychology of Cognition, 20, 269-276.