
2006, Volume 1, pp. 12-35

 
Comparative Cognition of Object Recognition
 

Marcia L. Spetch and Alinda Friedman
University of Alberta


Object recognition is fundamental in the lives of most animals. The authors review research comparing object recognition in pigeons and humans. One series of studies investigated recognition of previously learned objects seen in novel depth rotations, including the influence of a single distinctive object part and whether the novel view was close to two or only one of the training views. Another series of studies investigated whether recognition of directly viewed objects differs from recognition of objects viewed in pictures. The final series of studies investigated the role of motion in object recognition. The authors review similarities and differences in object recognition between humans and pigeons. They also discuss future directions for comparative investigations of object recognition.


Introduction

For most creatures, successful interactions with the world require the ability to rapidly detect, recognize, and respond to numerous objects, both animate and inanimate. For example, rapid and accurate recognition of a mate, a predator, an edible object, or a navigational landmark would all be beneficial for survival. Indeed, the very fact that almost all complex organisms have evolved sophisticated sensory systems to determine “what’s out there” underscores the importance of detecting external stimuli and recognizing what they are. Object recognition thus goes beyond simple detection and requires cognitive processes to recognize, interpret, and appropriately respond to objects. Because survival can depend on the ability to perform these operations quickly and accurately, it follows that the evolution of cognitive processes necessary for object recognition should be widespread throughout the animal kingdom. It does not necessarily follow, however, that the nature of these processes is the same across all species. Just as sensory systems have evolved with both common and specialized features across species, we might expect to see both general and specialized cognitive processes for interpreting the information detected by sensory systems. Thus, object recognition is fruitful ground for investigations of comparative cognition.

It takes only a few moments of thought to realize that the ability to recognize objects is not only widespread across species, but it is also a fundamental part of almost every waking moment of our lives. In fact, only when we experience problems in recognizing objects do we even notice that we are continually processing stimuli to determine what is out there. But just because object recognition is routine and fundamental does not mean that it is simple. In fact, visual object recognition is an extremely complex ability, because we must construct a three-dimensional (3-D) world from two-dimensional (2-D) input. That is, the 3-D objects in the world are first registered on the retinas at the back of each eye, which, for practical purposes, are 2-D. The 2-D information from each eye is combined to form the 3-D representation of the world. Moreover, we must parse complex patterns of light into separate objects, and often we must infer the existence of whole objects from incomplete information. Thus, as illustrated in Demonstration 1, what we ultimately “see” goes beyond the pattern of light that reaches our eyes.

Another complexity faced by any object recognition system is that the information reaching our eyes changes across viewpoints. In fact, the 2-D outlines of the same object seen from different views are sometimes less similar than the outlines of different objects (e.g., consider the 2-D silhouettes formed by the image of a horse from the front vs. the side view, compared to the 2-D silhouettes formed from the side views of a horse vs. a donkey). Thus, it is no simple task to be able to both discriminate between similar-looking objects and to recognize different-looking views of the same object. How visual cognitive systems accomplish these feats, and with the speed needed to interact successfully with the world, has been the focus of considerable research and theory in human cognition (e.g., Biederman, 1987; Bülthoff & Edelman, 1992; Tarr & Bülthoff, 1998), and more recently in animal cognition (e.g., Friedman, Spetch, & Ferrey, 2005; Kirkpatrick, 2001; Logothetis, Pauls, Bülthoff, & Poggio, 1994; Peissig, Young, Wasserman, & Biederman, 2000; Spetch, Friedman, & Reid, 2001).

Our review will focus on a series of recent studies conducted in our laboratories that were aimed at comparing the cognitive processes that underlie recognition of 3-D objects in humans with the processes used by pigeons. These species provide an interesting comparison because pigeons, like humans, are highly visual creatures, but they differ substantially from humans both in their visual experiences and in the neuroanatomy of their visual system (see Zeigler & Bischof, 1993; Husband & Shimizu, 2001). For example, birds, through flight, may require a different set of processes for rapid comprehension of the 3-D world than do humans. Pigeons also have two specialized retinal areas, each similar to the human fovea: one appears to be specialized for near frontal vision and presumably facilitates detection and selection of grain, and the other appears to serve more distant monocular lateral vision and may allow constant monitoring of predators (see Blough, 2001). Our studies, and other research we will review, suggest that there are both interesting similarities and important differences between the cognitive processes underlying the bird’s eye view of the world and those that underlie our own.

Because the direction of theoretical influence in the object recognition literature has been from human work to comparative studies, our review will start with a brief overview of theories of object recognition in humans. We will then review comparative studies of object recognition, with a focus on studies of pigeons and humans that concern three select aspects of object recognition: viewpoint dependence, transfer between objects and pictures, and the role of dynamic cues. We will suggest that processes of object recognition in pigeons share several similarities with those that underlie human object recognition, but that there also appear to be some interesting differences between species. We end by noting that comparative studies of object recognition are at an early stage of both empirical and theoretical development and suggest that an ecological approach may provide an excellent complement to current research programs.

Theories of Human Object Recognition

One of the most intriguing aspects of object recognition is our ability to identify objects across changes in viewpoint. In particular, depth rotations of an object can drastically change the 2-D information that reaches our eyes. Explaining how we recognize an object despite these changes has been a major theoretical challenge. Two general classes of theories have been proposed to explain how humans recognize objects when seen from novel views. The classes differ principally in terms of how the shapes, features, and structure of objects are represented, and what processes are involved in object recognition.

In one class of theories, shape representations are object-based, insofar as they consist of “structural descriptions” of the 3-D properties of the objects. Because an object’s structure is represented, there is no one view of it that is “privileged” (e.g., more accessible or recognizable) in any sense. For example, the recognition-by-components theory (Biederman, 1987) and its more recent version, the geon structural description (GSD) theory (Biederman & Gerhardstein, 1993), assume that representations of objects consist of simple, volumetric parts (called “geons”) and the spatial relations among them. Both the geons themselves and their spatial arrangement are thought to be important for recognition. According to these theories, recognition of an object should be invariant across changes in view as long as certain conditions are met, namely that: 1) the object can be decomposed into geons, 2) the arrangement of the geons forms a distinct structural description that differs from other arrangements, and 3) changes in the view of the object do not change the structural description (as would be the case, for example, if a particular part was visible only from certain views). Thus, object-based theories predict that all views of an object that meet these three criteria should be recognized with approximately the same speed and accuracy.

The second class of theories assumes that objects are encoded in memory in the poses in which they are seen by viewers, and are thus called “view-based” theories (e.g., Bülthoff & Edelman, 1992; Edelman, 1999; Tarr, 1995; Tarr & Pinker, 1989). In the earliest version of this class (Tarr, 1995; Tarr & Pinker, 1989), it was assumed that because objects can be seen from many views, the representation of an object emerges as a collection of stored views. Each stored view reflects the specific metric properties (appearance) of the object as it looks from that view. Thus, the ability to recognize the object when seen from a novel view requires a mechanism that matches the current percept to one of the stored views, except in cases where recognition can be based on the presence of a distinctive, diagnostic feature. For example, if one can see the trunk of an elephant, it is probably not necessary to see much else in order to identify the elephant as such. In the absence of such a diagnostic feature, however, the view-based theory predicts that speed or accuracy in recognizing an object will decrease as a function of the rotational distance between a given novel view and the nearest stored view.
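
To make this account concrete, the following minimal sketch (our illustration, not a published model; the object names, feature labels, and view angles are invented) implements the two routes just described: a fast, view-invariant route via a diagnostic feature, and a fallback route that matches the percept to the nearest stored view at a cost that grows with rotational distance.

```python
def shortest_rotation(a, b):
    """Shortest rotational distance (in degrees) between two views."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def recognize(percept_view, percept_features, memory):
    """Two-route view-based sketch: a diagnostic feature yields immediate,
    view-invariant recognition; otherwise the percept is matched against
    stored views, at a cost that grows with rotational distance."""
    for name, entry in memory.items():
        feature = entry["diagnostic"]
        if feature is not None and feature in percept_features:
            return name, 0.0  # fast route: e.g., a trunk identifies an elephant
    best_name, best_cost = None, float("inf")
    for name, entry in memory.items():
        for stored_view in entry["views"]:
            cost = shortest_rotation(percept_view, stored_view)
            if cost < best_cost:
                best_name, best_cost = name, cost
    return best_name, best_cost  # cost maps onto predicted RT / error increase

# Invented memory contents for illustration:
memory = {
    "elephant": {"diagnostic": "trunk", "views": [0.0, 90.0]},
    "horse":    {"diagnostic": None,    "views": [50.0]},
}
print(recognize(60.0, {"trunk"}, memory))  # ('elephant', 0.0): diagnostic route
print(recognize(60.0, set(), memory))      # ('horse', 10.0): nearest-view route
```

In the second call no diagnostic feature is present in the percept, so recognition falls back on nearest-view matching, and the returned cost is the quantity that view-based theories map onto response time and error rate.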

In many situations, object- and view-based approaches to understanding object recognition make the same predictions. Specifically, when an object contains a single distinctive geon that can serve as a diagnostic feature, both view-based and object-based approaches predict that the speed and accuracy of recognition will be viewpoint invariant. Conversely, whenever one or more of the three conditions for viewpoint invariance stipulated by object-based theories is not met, then both classes of theories predict that performance will exhibit viewpoint dependence. Because the predictions that are common to both classes of theories have been supported in studies of human object recognition, the challenge for researchers has been to identify situations that test differential predictions of the two classes of theories.

Figure 1. Paperclip-like object, and objects containing one, three or five added geons. From Tarr et al., 1997.

Tarr, Bülthoff, Zabinski, and Blanz (1997) identified and tested one such situation. Specifically, they noted that if objects consist of unique arrangements of multiple distinguishable geons that can be seen from all views, then they meet the conditions for viewpoint invariance according to the object-based approach. However, if the objects also do not contain a single unique part that can serve as a diagnostic feature, view-based theories predict that object recognition should be viewpoint dependent. To test these differing predictions, Tarr et al. (1997) examined people’s recognition of depth-rotated objects in four conditions. In one condition, the stimuli consisted of paperclip-like objects with no added geons (Figure 1, left object). Both classes of theories predicted viewpoint dependence for these objects. In the second condition, one distinctive geon was added to each object, and the added geon differed across objects making it a diagnostic feature (Figure 1, second object from left). Thus, both classes of theories predicted that recognition should be viewpoint invariant in this condition. In the remaining conditions, either three or five distinctive parts were added to the objects (Figure 1, rightmost two objects). These new parts were arranged in unique ways for each object, thus fulfilling the three conditions of the object-based theories, but the specific geons used to create the objects were not unique to each object. Thus, for the multi-part objects, each object had a unique structural description, but no particular geon could serve as a diagnostic feature for any of the objects. In this case, object-based theories predict viewpoint invariance whereas view-based theories predict viewpoint dependence.

Consistent with the predictions of view-based theories, Tarr et al. (1997) found that people showed strong viewpoint dependence in the multi-part conditions, both in a naming task, in which people learned one-syllable names for each of several novel objects and then identified them in either the learned view or new views, and in a same-different task in which people simply had to detect whether two views represented the same object.

Although there is now quite a bit of evidence favoring view-based object recognition for humans, the view-based approach has recently undergone a challenge to both its representational and process assumptions. In its original instantiation (Tarr, 1995; Tarr & Pinker, 1989; see also Shepard & Metzler, 1971), a 2-D representation of a novel view was “normalized” (i.e., transformed) until it matched one of a set of 2-D “snapshot-like” stored representations. We will thus refer to this as the normalization approach. The increase in time and decrease in accuracy that was typically observed as a function of rotational distance between the novel and stored views was hypothesized to arise from the length of the transformation process itself. As originally conceived (Shepard & Metzler, 1971; Metzler & Shepard, 1974; Tarr & Pinker, 1989), this process involved a kind of “mental rotation” that transformed the novel percept until a match was achieved (or not) to a stored view. Thus, the larger the rotational distance between the two, the longer (and less accurate) would be the predicted response.

In recent years, a new view-based approach to object recognition has been developed that has as its basis a more sophisticated representational scheme as well as a computational recognition mechanism that appears to explain a broader range of phenomena than the normalization approach (Broomhead & Lowe, 1988; Bülthoff & Edelman, 1992; Edelman, 1999; Edelman & Bülthoff, 1992; Edelman, Bülthoff, & Bülthoff, 1999). In this view-combination approach, some novel views of familiar shapes can benefit during recognition from their similarity to more than one learned view. Thus, this approach is a type of view-based approach, but it is more robust because it can accommodate situations in which performance appears to be view-dependent and others in which it appears to be view-invariant.

The two view-based approaches can be distinguished by their predictions for a situation in which participants are trained with more than one view of a given object. Consider, for example, participants who are trained with two views of an object that differ from each other by a 30° rotation in depth (we will arbitrarily call these the 0° and 30° views). They may be tested with novel views that are either interpolated within the shortest distance between the trained views (e.g., 15°) or are extrapolated outside of this training range (e.g., 45°). On such tests, participants show more accurate recognition of the interpolated novel views than they do for the extrapolated novel views (Bülthoff & Edelman, 1992).

This finding is troublesome for normalization accounts of object recognition, because these accounts predict that recognition performance for both interpolated and extrapolated novel views should be equal and inferior to performance on the learned views (Tarr & Pinker, 1989; Tarr, 1995). Again, this is because it is assumed that objects are represented as a number of “exemplar” views (e.g., the training views), and that novel views are recognized by transforming them until they are aligned with the nearest stored exemplar (e.g., Ullman, 1989). Thus, recognition performance is predicted to be a declining function of distance between a novel view and its nearest stored exemplar, and the interpolated and extrapolated conditions are usually equated on this factor. Indeed, even when the interpolated view is equidistant to two different training views it is still assumed to be normalized to only one of them on each trial.

As noted above, in contrast to the normalization approach, view combination approaches permit some novel views to benefit from their similarity to more than one learned view. For example, Edelman (1999; see also Broomhead & Lowe, 1988; Poggio & Edelman, 1990) assumes that objects are represented as points in a multidimensional shape space spanned by similarities to a small number of reference objects, which act as prototypes. Recognizing known objects from novel viewpoints occurs by mathematically interpolating between two or more prototypes (Broomhead & Lowe, 1988) to compute what a novel view should look like in that region of the shape space. The predicted view is then matched to the novel view. Thus, when a novel view is relatively close in the shape space to two stored views, as in the interpolated case, the novel view should be similar to the predicted view and therefore relatively easy to recognize. In contrast, in the extrapolated case the more distant learned view will tend to reduce the similarity between the predicted and novel views. This is because similarity (and ease of recognition) is assumed to be an inverse function of the distance between the novel view and each of the stored views that are combined to make the predicted view (Shepard, 1968; Edelman, 1999, p. 128). Together, these assumptions predict that performance should be better on interpolated views than on extrapolated views. Further, the view combination mechanism is more robust than normalization because it can predict recognition performance for views of entirely novel objects (Edelman, 1995).
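
The contrast between normalization and view combination is easy to simulate. In the sketch below (ours, for illustration; the training views of 0° and 30° follow the example above, and the Gaussian similarity function with an arbitrary tuning width stands in for radial basis functions), the nearest-view rule scores a novel view by its single closest stored view, whereas the view-combination rule pools similarity across all stored views.

```python
import math

def shortest_rotation(a, b):
    """Shortest rotational distance (in degrees) between two views."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def similarity(test, stored, sigma=40.0):
    """Gaussian similarity that decays with rotational distance
    (a simple stand-in for a radial basis function unit)."""
    return math.exp(-shortest_rotation(test, stored) ** 2 / (2 * sigma ** 2))

def normalization_score(test, views):
    """Normalization rule: only the nearest stored view matters."""
    return max(similarity(test, v) for v in views)

def view_combination_score(test, views):
    """View-combination rule: similarity to all stored views is pooled."""
    return sum(similarity(test, v) for v in views)

training = [0.0, 30.0]  # the two trained views from the example above
for label, test in [("interpolated 15°", 15.0), ("extrapolated 45°", 45.0)]:
    print(f"{label}: normalization={normalization_score(test, training):.3f}, "
          f"combination={view_combination_score(test, training):.3f}")
```

With these (arbitrary) parameters, the nearest-view rule cannot distinguish the 15° interpolated from the 45° extrapolated view, since both lie 15° from their nearest trained view, whereas the pooled score favors the interpolated view, the pattern Bülthoff and Edelman (1992) observed.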

The view combination mechanism is similar in many respects to the notion of generalization in the animal learning literature (e.g., Honig and Urcuioli, 1981). In particular, generalization is commonly believed to occur via the combination of excitatory and inhibitory activation gradients that have formed around representations of positive (S+) and negative (S-) stimulus values, respectively (Spence, 1937). When more than one stimulus is positive, as is the case when there are two training views, then a positive gradient would be expected to form around the representation of each S+ view. Equally, a negative gradient would be expected to form around the representation of each S- view. Thus, when a novel stimulus view is presented, it causes excitation and inhibition according to how similar it is to the S+ and S- views that are represented, respectively; the sum of that excitation and inhibition determines the response. If the representations at the two training orientations overlap, then novel interpolated views should receive generalization from both training views, whereas novel extrapolated views might receive generalization from only one training view. Thus, the commonly held conceptualization of stimulus generalization makes the same predictions as the view combination approach for the difference in ease of recognition for interpolated versus extrapolated views, so that view combination is, in principle, a kind of generalization. Mathematically interpolating between radial basis functions is one method of implementing view combination (Edelman, 1999; Bülthoff & Edelman, 1992). A radial basis function is a kind of neural network in which the responses of neural units are Gaussians; thus, there is a gradual decay in the response of the units as the dissimilarity between the stimulus image and the learned views increases.
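
Cast in Spence's (1937) terms, the same prediction falls out of gradient summation. The sketch below is our illustration, with arbitrary gradient widths: each trained S+ view contributes excitation and each trained S- view contributes inhibition, and the net response strength to any test view is their sum.

```python
import math

def gradient(test, trained, sigma=40.0):
    """Gaussian generalization gradient centered on one trained view."""
    d = abs(test - trained) % 360.0
    d = min(d, 360.0 - d)
    return math.exp(-d ** 2 / (2 * sigma ** 2))

def response_strength(test, s_plus_views, s_minus_views):
    """Net response tendency: summed excitation from S+ views minus
    summed inhibition from S- views (after Spence, 1937)."""
    return (sum(gradient(test, v) for v in s_plus_views)
            - sum(gradient(test, v) for v in s_minus_views))

# Two trained S+ views 90° apart and no S- views of this object: a 45°
# interpolated test view draws generalization from both trained views,
# while a view 45° outside the training range draws mostly from one.
print(response_strength(45.0, [0.0, 90.0], []))   # ~1.06
print(response_strength(-45.0, [0.0, 90.0], []))  # ~0.53
```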

In general, we believe the important similarities between the view combination approach and stimulus generalization outweigh any implementation differences. For example, the notion that similarity is a function of the inverse distance between a novel view and a set of stored prototypes (or S+ representations) is equally functional in both schemes. Similarly, differently tuned generalization gradients should act in most circumstances like the differently tuned radial basis functions (Edelman, 1999) that underlie viewpoint interpolation. Importantly, both view combination and generalization mechanisms provide a meaningful contrast to recognition by normalization to a nearest neighbor.

It should be emphasized that both normalization and view combination accounts of object recognition take a “view-based” approach to recognition. The primary differences between them involve the way that shapes are represented and the particular mechanism used to compare the learned views with novel views. The growing evidence for human object recognition favors the view combination approach. As we will describe below, the evidence for pigeons appears to be following the same trajectory, but with some interesting and subtle differences.

Studies on Viewpoint Dependence

Like humans, many animal species are constantly faced with the problem of recognizing objects from varying perspectives because our view of an object can change due to movement of ourselves or the object. For ground-dwelling animals, including humans, the most frequent type of change in view is produced when an object is rotated in depth, or equivalently, if we move in a circle around the object. Such depth rotation can result in drastic changes to the 2-D shape of the object and to the object features that are visible. Intuitively, some views should be easier to recognize than others (e.g., a side view vs. a front view). For a flying animal, such as a bird, multiple changes in viewpoint would be common and would include changes from top to side views. How do birds recognize objects seen from different views, and are the processes they use similar to our own?

In the past few decades, researchers have begun to investigate pigeons’ ability to recognize objects across changes in view. Most studies of pigeons’ recognition of 3-D objects rotated in depth have found that pigeons’ recognition is view-dependent. For example, some studies show that when pigeons are trained with a particular view of an object, they may not be able to recognize (transfer) to novel views of the same object (e.g., Cerella, 1977, 1990). Other studies have found that pigeons could transfer to novel views of familiar objects, but that they did not show rotational invariance for novel objects (Watanabe, 1997). Still other studies have found systematic decreases in discriminative performance as a function of rotation from the training orientation (e.g., Jitsumori & Makino, 2004; Lumsden, 1970; Wasserman et al., 1996). Wasserman et al. (1996) found that pigeons’ generalization to novel depth rotations of line drawings of 3-D objects increased substantially if they were trained with three views of the objects rather than one. Overall, most research to date suggests that pigeons’ recognition performance is best described in terms of view-based processes. However, systematic investigation of factors that influence object recognition in pigeons and studies aimed at elucidating object recognition processes have only just begun.

A. Effect of Distinctive Parts

Figure 2. Discriminative objects used for pigeons in the same-parts and different-parts groups. From Spetch, Kelly, and Reid (1999).

Figure 3. Discrimination accuracy as a function of rotation from the training views for pigeons in the Same-Parts and Different-Parts groups. The dashed line indicates chance level. From Spetch, Kelly, and Reid (1999).

Figure 4. Objects used in each object-part condition. From Spetch et al., 2001.

Spetch, Kelly, and Reid (1999) conducted a preliminary investigation of whether the presence of diagnostic features would allow pigeons to generate viewpoint invariant object representations. They trained pigeons to discriminate between objects composed of red Lego pieces seen as digitized images on a computer screen (Figure 2). For birds in a Same-Parts group, the positive object (S+) differed from three negative objects (S-) only in the arrangement of their parts. For birds in a Different-Parts group, the positive object was made of differently-shaped parts than the negative objects and thus contained parts that could serve as diagnostic features.

The pigeons saw the objects at six orientations in training and then were tested at six novel orientations. Birds in the Different-Parts group performed more accurately on both training and test trials than birds in the Same-Parts group, but importantly, the reduction in accuracy for the novel orientations did not differ between the two groups (Figure 3). Thus, the presence of uniquely-shaped parts that could serve as distinctive features appeared to enhance pigeons’ ability to learn the discrimination between the objects but, in contrast to results typically found with humans, it did not reduce their viewpoint dependence.

To provide a more systematic investigation of the role of distinctive parts in object recognition, and to provide a direct comparison between pigeons and humans, Spetch, Friedman, and Reid (2001) conducted a series of experiments modeled after those conducted by Tarr et al. (1997). They used a simultaneous discrimination task, which can be given to both humans and pigeons, and they tested both species under the same four conditions used by Tarr et al., namely objects composed of zero, one, three, or five distinctive geons (Figure 4).

In each condition, pigeons and humans viewed an S+ and an S- object, which were shown side by side on a computer monitor (Figure 5), with position of the S+ varied randomly across trials.

During training, pigeons were reinforced with food for pecking on the S+ side and humans were reinforced with points for selecting the S+ side with the corresponding arrow key. The objects were shown in two views that were a rotational distance of 90° apart from each other during training, and they were shown in four novel orientations during subsequent unreinforced test trials. Consistent with the results found by Tarr et al., humans showed much weaker viewpoint dependence in the 1-part condition than in either the 0-part or multi-part conditions. Pigeons, however, showed strong viewpoint dependence in all four conditions and thus did not appear to benefit from the presence of a diagnostic feature in the 1-part condition (Figure 6).

Together, these results suggested that the presence of a distinctive geon that can serve as a diagnostic feature does not produce a representation in pigeons that is viewpoint invariant. This appears to be one difference between object recognition processes in pigeons and humans. It is important to note, however, that this does not mean that pigeons are insensitive to the presence of diagnostic features, because in the Spetch et al. (1999) study, diagnostic features facilitated pigeons’ discrimination between objects even though they did not facilitate the pigeons’ ability to recognize new views (i.e., to generalize across views). One possible explanation for these results rests on the fact that our diagnostic features were based on three-dimensional shape cues alone. Although the distinctive part facilitated their discrimination between the objects at the training views, pigeons might not benefit from the distinctive part if their representation of the part itself is viewpoint dependent. The finding by Peissig et al. (2000) – that pigeons show viewpoint dependence even for single geons – is consistent with this possibility.

Figure 5. Human and pigeon viewing objects on the computer screen. From Spetch et al., 2001.

Figure 6. Accuracy of pigeons and humans on tests at the training orientation or at novel orientations for each object-part condition. The dashed line indicates chance level. From Spetch et al., 2001.

B. Effect of Degree of Rotation from Nearest Training View

Although the effect of distinctive parts on object recognition highlighted a difference between pigeons and humans, the effect of degree of rotation from the nearest trained view highlighted a similarity between the species. Many studies of human object recognition have found that speed and/or accuracy of recognition decreases systematically as a function of how far the object is rotated from the nearest stored view. As noted earlier, this result is directly predicted by view-based theories, and can be accommodated by the other theories as well under certain conditions. This effect of degree of rotation has also been observed in pigeons, both with real 3-D objects (Lumsden, 1970) and with line drawings of 3-D objects (Wasserman et al., 1996). In both of these studies, the objects contained distinctive parts. Our studies found a similar result, both for objects with distinctive parts and for objects without distinctive parts. Specifically, Spetch et al. (1999) found a systematic decrease in accuracy for pigeons as a function of rotation from the nearest training view in both the Same-Parts group and the Different-Parts group (see Figure 3 above). Decreases in accuracy and increases in latency as a function of rotation from the nearest training view also occurred for both species in Spetch et al. (2001). These effects can be seen in Figure 7, for which the data were collapsed across stimulus conditions (0-part, 1-part, 3-part, and 5-part).

Thus, at least under certain conditions, degree of rotation from the nearest training view is an important determinant of object recognition for both species.

C. Recognizing Interpolated versus Extrapolated Novel Views

Although humans in Spetch et al. (2001) showed an overall decrease in recognition as objects were rotated further from the nearest trained view, their recognition depended not just on degree of rotation but also on how the object was rotated relative to the two training views. Because the training views were 90° apart within a 360° rotation circle, a novel view could fall within the shortest distance between the training views, in which case it is considered an “interpolated” novel view, or it could fall outside of this shortest distance, in which case it is considered an “extrapolated” novel view (see Figure 8).
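
This geometry is simple to operationalize. The helper below is our illustration: given two training angles on the rotation circle, it classifies a novel view as interpolated if it lies on the shorter arc between the training views and as extrapolated otherwise, and it reports the view's rotational distance from each training view. The 0° and 90° training views in the usage example follow the design described above.

```python
import math

def shortest_rotation(a, b):
    """Shortest rotational distance (in degrees) between two views."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def classify_view(test, train_a, train_b):
    """Label a novel view 'interpolated' if it lies on the shorter arc
    between the two training views, 'extrapolated' otherwise, and report
    its rotational distance from each training view."""
    arc = shortest_rotation(train_a, train_b)
    da = shortest_rotation(test, train_a)
    db = shortest_rotation(test, train_b)
    kind = "interpolated" if math.isclose(da + db, arc) else "extrapolated"
    return kind, da, db

# Training views 90° apart, as in the design described above:
print(classify_view(45.0, 0.0, 90.0))   # ('interpolated', 45.0, 45.0)
print(classify_view(120.0, 0.0, 90.0))  # ('extrapolated', 120.0, 30.0)
```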

Spetch et al. (2001) found that, although humans showed decreases in recognition for novel extrapolated views, they showed complete rotational invariance in their recognition of novel interpolated views. That is, they recognized novel interpolated views as quickly and accurately as the trained views (Figure 9, top). This finding was consistent with previous findings for human object recognition by Bülthoff and Edelman (1992).

Figure 7. Accuracy and reaction time for pigeons and humans as a function of degree of rotation of objects from the nearest training orientation. The dashed line indicates chance level. From Spetch et al., 2001.

Figure 8. Diagram showing an example of the relationship between two training views and an interpolated, extrapolated and far novel view. From Friedman et al., 2005.

Figure 9. Accuracy and reaction time with training (Trn), interpolated (In) and extrapolated (Ex) views for humans and pigeons. The dashed line indicates chance level. From Spetch et al., 2001.

Interestingly, as seen in the bottom panel of Figure 9, pigeons, unlike humans, did not show rotational invariance in their recognition of interpolated novel views (Spetch et al., 2001). However, in this initial study (and unlike the situation depicted in Figure 8), the total amount of rotation differed for interpolated and extrapolated novel views: In fact, the interpolated novel views were closer in overall rotational distance to the trained views than were the extrapolated novel views. Consequently, we could not determine unambiguously whether there was any advantage for the interpolated views that was an effect of being interpolated per se. Accordingly, we conducted another experiment to compare interpolated versus extrapolated views with degree of rotation equated (Spetch & Friedman, 2003). Both pigeons and humans were trained with two views of 3-part objects. The training views were 0° and 90° for one group, and 90° and 180° for a second group. Novel test views at 30° and 45° rotations from each training view resulted in extrapolated and interpolated novel views that were equated for degree of rotation, and were counterbalanced for specific orientation across groups (see Figure 10).

 A striking species difference emerged in recognition accuracy. Humans showed a substantial decrease in accuracy and increase in response time for the novel extrapolated views compared to the trained views, but did not show a similar reduction in performance for the novel interpolated views (Figure 11, top). By contrast, pigeons showed similar decreases in accuracy for both interpolated and extrapolated views (Figure 11, bottom). Thus, for pigeons, it appeared to make little difference whether the objects were rotated between the training views or outside of the shortest distance between the training views.

The difference found with humans between interpolated and extrapolated views has been taken as important evidence for the appropriateness of the view combination approach as a description of human object recognition (Bülthoff & Edelman, 1992; Edelman, 1999). As noted earlier, view combination approaches permit some novel views to benefit from their similarity to more than one learned view. The key factor influencing recognition performance appears to be the total rotational distance between the novel views and the various training views; it is this distance that determines the similarity between the novel test view and the stored views. For example, in Spetch and Friedman (2003), the 30° interpolated novel views were 30° from one training stimulus and 60° from the other, but the 30° extrapolated novel views were 30° from one training stimulus and 120° from the other (see Figure 10 above). Thus, neither interpolated nor extrapolated test views were equidistant to the training views, but the interpolated views were closer on average to the training views than were the extrapolated views. In this experimental situation, the participants were 23% more accurate and 133 ms faster to respond to the 30° interpolated views than to the 30° extrapolated views. Bülthoff and Edelman (1992) report similar data over multiple interpolated and extrapolated views that were both equidistant and non-equidistant to the training views.

Notably, the original view-based approach predicts that there should be similar decrements in performance to both the interpolated and extrapolated views, and the view-invariant approach predicts no decrements to either novel view. Consequently, being able to demonstrate performance differences between interpolated and extrapolated views is strong evidence in support of the view combination mechanism.

Figure 10. Stimuli and design of training and test views used in Spetch & Friedman, 2003.

Figure 11. Accuracy and reaction time on tests with training (Trn), interpolated (In), extrapolated (Ex) and far views of the objects. The dashed line indicates chance level. From Spetch & Friedman, 2003.

Figure 12. Back view of apparatus used to display and rotate actual 3-D objects. See Friedman et al., 2003 for details.

Figure 13. Front view of the object rotation apparatus as used for pigeons and humans. From Friedman et al., 2003.

Figure 14. Percent correct for humans and pigeons as a function of object type (1-geon or 3-geon), viewing condition (pictures or directly viewed objects) and pose. Trn = training, Int = interpolated, Ext = extrapolated. Chance level accuracy is 50%. From Friedman et al., 2005.

Figure 15. Reaction times for humans and pigeons as a function of object type (1-geon or 3-geon), viewing condition (pictures or directly viewed objects) and pose. Trn = training, Int = interpolated, Ext = extrapolated. From Friedman et al., 2005.

D. Seeing the Real Thing: Pictures versus Direct Views of Objects

For practical reasons, almost all studies of object recognition, both with humans and animals, present the objects as images in slides or on a computer screen. This presentation mode allows for rapid, automated presentation of objects in various views. However, such a mode of presentation requires not only object recognition processes, but also processes for interpreting the images as representations of 3-D objects. Moreover, although images of objects may provide pictorial cues to depth, they do not provide other cues that may be important for detecting depth information in the real world, such as binocular disparity and stereopsis. Therefore, it is important to determine whether the processes identified in studies of object recognition using pictures reflect the processes used when recognizing objects viewed directly. This consideration is particularly important for comparative studies of object recognition. In particular, whereas adult humans have extensive experience at interpreting pictures as representations of actual 3-D objects, the same is not true for pigeons. Thus, any differences that emerge between species from studies using pictures could be due to differences in picture interpretation processes instead of, or in addition to, differences in object recognition processes. Therefore, it was important to compare object recognition in pigeons and humans using actual objects to determine whether the differences we found in our previous studies were dependent on the use of pictorial stimuli.

The first step in our endeavor to use actual objects was to devise an apparatus that would allow automated presentations of objects from multiple viewpoints. With the help of our technician, Isaac Lank, we created an apparatus that allowed us to present objects side by side, and to rapidly and automatically rotate the objects between trials to present them in any of 100 orientations (Figure 12).

The object tray contained three compartments for objects, but only two were visible to the subject (Figure 13). By placing two identical objects in the outside compartments, and by sliding the tray randomly back and forth between trials, we could thereby randomize the left-right location of the positive object across trials. The apparatus was designed for use with both pigeons and humans.

To create objects similar to those used in our previous studies (Spetch et al., 2001; Spetch & Friedman, 2003), we used a 3-D printer to make physical instantiations of the 1-part and 3-part objects (see Friedman et al., 2003 for details). For direct-viewing conditions, we displayed these objects in the object-rotation device. For picture-viewing conditions, we took digital photos of the objects at each orientation as they appeared from the viewing area of the object rotation device, and we presented these on the computer screen.

The 3-D presentation apparatus and stimuli allowed us to conduct a series of experiments to determine whether the species differences we had identified previously would appear both when the objects were seen in photographs and when they were viewed directly (Friedman, Spetch, & Ferrey, 2005). First, a comparison of recognition accuracy for 1-part and 3-part objects replicated the previous finding that the presence of a single distinctive part substantially decreased viewpoint dependence for humans but not for pigeons, both when viewing actual objects and their photographs (Figure 14). Thus, this difference between species in the influence of a distinctive part did not appear to reflect picture interpretation processes. Second, however, a comparison of performance with interpolated and extrapolated views of objects yielded an interesting but more complicated set of species differences. In Experiment 1 of Friedman et al. (2005), the two training views were only 60° apart, which was a smaller distance than we had used in the previous studies. The results for the humans were the same for pictures and direct viewing of 3-D objects: They recognized novel interpolated views faster than novel extrapolated views (Figure 15).

The surprising finding in this first experiment was that the pigeons also showed better recognition of novel interpolated views than of novel extrapolated views, both when the objects were viewed directly and when they were presented in photographs. However, and importantly, pigeons’ performance overall was faster and more accurate when they were directly viewing the objects than when they were viewing pictures of them. Thus, we tentatively concluded that pigeons process actual objects and their photographs differently.

Because the difference for pigeons between interpolated and extrapolated views with photographs was inconsistent with our previous results, we conducted a second experiment in which the training views were 90° apart, which was the rotational distance we had used previously. In this case, pigeons showed an advantage for interpolated views only when viewing the objects directly (Figure 16). To summarize, when viewing photographs with a 90° rotational difference between the two training views, we replicated our previous finding with pigeons viewing photographs: They showed no difference in accuracy between interpolated and extrapolated views. However, when pigeons viewed the objects directly, or when they viewed photographs with training views that were only 60° apart, pigeons, like humans, were better able to recognize novel interpolated than novel extrapolated views. Notably, when there was a significant interpolated-extrapolated difference, it was larger for responses to the actual objects than to their images for both species. Together, the two experiments suggest that for pigeons, viewing objects directly invokes different processes, and may involve different representations, than viewing their photographs.

One interesting difference between the species was that humans’ correct reaction times were longer with objects than with pictures, whereas pigeons’ correct reaction times were shorter with objects than with pictures. The data are consistent with the interpretation that real objects are easier for pigeons to discriminate than pictures, whereas humans, who have extensive experience with pictures, are as good or better at quickly recognizing objects in pictures.

So, how do we make sense of this pattern of results? The key finding across Experiments 1 and 2 was that the pigeons’ recognition performance with novel interpolated views of actual objects differed from their performance with the photographs of those views, but only when the training views were relatively far apart. In a view combination approach, recognition of a novel view that is equivalent or even better than recognition of a learned view should only occur if the two (or more) representations that were used in the recognition process were sufficiently similar to each other and to the novel view that they would provide activation above a “recognition threshold.” Thus, because pigeons’ successful generalization (or view combination) occurred in the 90° training condition for actual objects but not for their images, it implies that the representations that resulted from seeing actual objects were more broadly tuned than were the representations that resulted from seeing photographs. A related possibility is that, in the 90° training condition, the pigeons represented the two training views as distinctly different objects when they were presented as pictures but as the same object when they were real. This could be because the pictures required interpretative processes that pigeons do not have. If two views are determined to be views of the same object, that may contribute to the breadth of their representations, or to an increase in the connections between representations, and hence, to the ability to generalize to representations “in between” the views that were trained (see Figure 17).
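
This account can be checked numerically. In the sketch below (ours; the tuning widths, training separations, and threshold are illustrative assumptions rather than fitted values), pooled Gaussian activation at the midpoint between two trained views is compared against a fixed recognition threshold. Activation clears the threshold when the training views are 60° apart under either tuning width, but with 90° separation it does so only under broad tuning, which parallels the pigeons' interpolation advantage appearing with pictures at 60° and, at 90°, only with directly viewed objects.

```python
import math

def pooled_activation(test, views, sigma):
    """Summed Gaussian activation from all trained views (view combination)."""
    def dist(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)
    return sum(math.exp(-dist(test, v) ** 2 / (2 * sigma ** 2)) for v in views)

THRESHOLD = 1.0  # arbitrary "recognition threshold"
for separation in (60.0, 90.0):
    midpoint = separation / 2.0  # the interpolated test view
    for label, sigma in (("narrow tuning (pictures?)", 27.0),
                         ("broad tuning (objects?)", 45.0)):
        activation = pooled_activation(midpoint, [0.0, separation], sigma)
        print(f"views {separation:.0f}° apart, {label}: "
              f"activation={activation:.2f}, recognized={activation >= THRESHOLD}")
```

Only the 90°-apart, narrow-tuning case falls below threshold in this toy setup, mirroring the pigeons' failure to show an interpolation advantage with pictures when the training views were 90° apart.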

At this stage we can only conjecture about the kinds of cues that actual objects afford but which are absent in their photographs. For example, the depth cues derived from binocular disparity and stereopsis (see Zeigler & Bischof, 1993) may allow a broader representation of the 3-D properties of the objects. Alternatively, movement of the bird could afford slight variation in the experienced views of the objects during training. A study with humans failed to show that head movement improved generalization to novel extrapolated views of real objects, but we cannot rule out the possibility that movement could have contributed to the broader representation for pigeons. Nevertheless, the data underline the importance of using actual objects as stimuli in assessing and modeling human and animal object recognition.

Figure 16. Reaction times for pigeons as a function of viewing condition (pictures or directly viewed objects) and pose. Trn = training, Int = interpolated, Ext = extrapolated. The dashed line indicates chance level. From Friedman et al., 2005.

Figure 17. Diagram showing why generalization would facilitate recognition of interpolated views when the training views are close together (bottom figure) but not when they are far apart (top figure). With close training views, the recognition functions for the two views overlap and hence an interpolated view would benefit from combined generalization from both views.

Figure 18. Stimuli used to assess transfer between directly viewed objects and their pictures. Objects and views a and b were used for some pigeons and objects and views c, d, and e were used for other pigeons. From Spetch & Friedman, submitted.

Studies of Transfer Between Pictures and Objects

The extensive use of pictorial displays in comparative studies of visual cognition not only raises questions about their external validity with respect to real world objects, but also raises the interesting question of whether non-humans recognize the correspondence between the pictures and the objects or scenes they represent. This question has been addressed in numerous studies, some of which are nicely summarized in a recent book (Fagot, 2000). Briefly, there have been two main approaches to the correspondence question. One approach has been to look for evidence that animals respond to pictures with the same behaviors that are elicited by real stimuli (e.g., aggression or courtship). In birds, this approach has sometimes, but not always, yielded positive results (see review by Fagot, Martin-Malivel, & Depy, 2000). However, natural behaviors are often elicited by specific features of a whole stimulus (i.e., a “sign” stimulus such as a patch of color) and can therefore be elicited by highly artificial renditions of the real object (e.g., Tinbergen, 1951). Thus, observing an appropriate reaction to a picture of an object does not necessarily imply that the organism sees the picture as representing the real object. In addition, natural behaviors that occur in response to a single stimulus feature may be genetically “hard-wired,” so they may have little to offer our understanding of whether animals understand the correspondence between objects and their pictorial representations more generally.

The second main approach has been to look for transfer of learned behavior from pictures to the real objects or scenes they represent and vice versa. For birds, the results of such studies have again been mixed. For example, most demonstrations that transfer has occurred have shown positive transfer in one direction only (e.g., Cabe, 1976; Cole & Honig, 1994). Moreover, interpretation of such results is difficult in cases where the learned discrimination could be based on differences between 2-D cues, such as color (see Watanabe, 1997). In such cases, transfer of the discrimination might be based on the presence of these simple 2-D features and may not require any recognition that the pictures correspond to the real stimuli.

Our object rotation device (Friedman et al., 2003) provided an ideal opportunity to assess whether pigeons could show transfer of a learned discrimination based on the 3-D properties of objects. Accordingly, we trained pigeons to discriminate between identically-colored 3-part objects that were learned from two or three views (Spetch & Friedman, submitted; see Figure 18).

Half of the pigeons were trained first with the objects shown as images on the computer screen, and the remaining pigeons were trained with the directly-viewed objects. Following training, the birds were transferred to the other medium. For some birds the contingencies for the two objects remained the same (i.e., the S+ object remained positive and the S- object remained negative), but for other birds, the contingencies were reversed. We found that birds transferred with the same contingency showed higher accuracy on the first 250 trials, and met an accuracy criterion significantly faster than the birds transferred with the reversed contingency (Figure 19). Importantly, positive transfer was seen in both directions: from objects to their pictures and vice versa. This is the first time such equivalent transfer between the two types of media has been demonstrated so clearly. At the same time, the birds appeared to clearly notice the difference between the objects displayed in pictures and directly-viewed objects because accuracy on the transfer test in the same contingency group started well below the 80% criterion that was reached prior to transfer.

Figure 19. Discrimination accuracy during the first 250 trials following transfer from pictures to directly viewed objects (top) or from directly viewed objects to pictures (bottom) for pigeons transferred with the same or reversed reinforcement contingencies. The dashed line indicates chance level. From Spetch & Friedman, submitted.

It should be emphasized that our use of identically-colored objects that were learned and had to be recognized from more than one viewpoint made it very unlikely that the birds used a simple 2-D cue, or a genetically-driven sign stimulus to discriminate between the positive and negative objects. Hence, in contrast to some other demonstrations of transfer, recognition of the objects in the new presentation format could not be based on a simple cue like a distinguishing color. We think these results provide the strongest evidence yet that pigeons do see some correspondence between objects viewed directly and in pictures. Again, however, the circumstances in which this bidirectional transfer occurred were specific: Transfer occurred following training with more than one view of an object. We conjecture that this kind of training is more conducive to apprehending the object’s 3-D structure. If this result is upheld, then it has strong implications for avian models of the human visual system, at least for object recognition.

Studies of the Role of Dynamic Cues in Object Recognition

Sometimes objects can be recognized not just by their static properties, such as shape or color, but also by their characteristic motion. Consider a grasshopper and a snake. These creatures have a characteristic biological motion and we can easily discriminate between them simply by the way they move. Vuong and Tarr (2004) identified several ways that dynamic information might play a role in object recognition, including the possibility that a) motion may enhance the detection of an object’s structure, and hence, the recovery of shape information; b) motion provides multiple views of the object’s shape, and thus affords the opportunity for broadly tuned representations; c) motion may permit meaningful edges to be found more readily, and enhance the segmentation of a scene into discrete objects, or into foreground and background, which is a likely precursor to object recognition; d) motion may provide information about how 2-D image features change over time; and e) motion may allow observers to anticipate future views of objects. Thus, in the real world, object recognition may often make use of dynamic information. We therefore need to consider the role of motion cues to develop a more complete understanding of the processes underlying object recognition.

Several recent studies have demonstrated that dynamic information contributes to object recognition in humans (e.g., Liu & Cooper, 2003; Stone, 1998; Vuong & Tarr, 2004). For example, Vuong and Tarr (2005) investigated people’s ability to learn and recognize dynamically-presented artificial objects within a 4-object identification task. Some of the objects were decomposable (i.e., objects made of several geons that can be easily decomposed into parts), whereas other objects were amoeba-like objects that were hard to decompose (see Video 1 and Video 2). Each object was shown in a particular direction of motion. After learning the discriminations, participants were required to name each of the objects as soon as they could once the movie started. Reversing the direction of motion for each object on subsequent tests was found to impair object identification; however, this impairment occurred only for objects that were difficult to discriminate, either because they did not contain decomposable parts or because they were degraded by “dynamic fog” (see Video 3 and Video 4).

The role of dynamic information in pigeons’ object recognition has been investigated in a few recent studies. Cook and Katz (1999) trained pigeons to discriminate between a cube and a pyramid that were sometimes presented as static views taken at random orientations along a particular axis, and at other times they were presented as dynamically rotating along the axis. Transfer tests in which the object color was changed or the axis of rotation was altered revealed better performance with dynamic presentations than with static presentations. However, changes to the direction of motion along the trained axis did not alter discrimination performance. Thus, dynamic information appeared to contribute to, but was not essential for, the object discrimination.

In a somewhat different approach to investigating pigeons’ sensitivity to dynamic cues, Cook, Shaw and Blaisdell (2001) showed that pigeons could discriminate motion paths in terms of their interaction with objects. In particular, the pigeons learned to respond differentially according to whether the motion path of the camera’s perspective moved through an object or moved around an object (Figure 20). The birds showed significant transfer to new objects, and a disruption in performance when the coherence of the motion path was eliminated by randomly re-arranging the video frames, indicating that the discrimination was based on motion cues.

Figure 20. Objects used to create motion pathways (top) and diagram of two movement patterns (bottom). From “Dynamic object perception by pigeons: discrimination of action in video presentations” by Cook, Shaw and Blaisdell. Copyright Springer-Verlag, 2001; reprinted with permission.

Evidence that pigeons are sensitive to biological motion and can use natural motion cues to categorize stimuli was provided by Dittrich, Lea, Barrett and Gurr (1998). Using a discriminative autoshaping procedure, they found that some pigeons could discriminate between full video scenes depicting pigeons pecking and scenes depicting pigeons walking. On transfer sessions in which the birds were tested with point light displays of the movements alone, discrimination was substantially lower than for full video displays, but was significantly above chance level, suggesting that some discrimination was based on motion cues alone. Dittrich et al. also gave some pigeons discriminative autoshaping between pecking and walking movements depicted with point-light displays alone. Four of 12 birds learned the discrimination, indicating that they could discriminate between these behaviors on the basis of motion alone. However, none of these birds showed significant transfer to full video displays. Thus, their results suggest that pigeons are capable of discriminating between biologically relevant motion cues, but there was considerable variability between birds in sensitivity to these cues.

In a recent study, we compared pigeons and humans in their discrimination of dynamically presented objects that were each rotating in a particular trajectory (Spetch, Friedman, & Vuong, 2005). Both species were trained to respond to one dynamic object (“Go” trials) and to withhold responding to another object (“No Go” trials) that differed in 3-D shape and rotated in the opposite direction along the same trajectory. As in the Vuong and Tarr (2005) study, we used objects that humans find easy to discriminate (objects that can be easily decomposed into parts; see Video 5 and Video 6), as well as objects that humans find hard to discriminate (amoeba-like objects that are hard to decompose; see Figure 21, Video 7, and Video 8).

Figure 21. Decomposable and non-decomposable objects used in Spetch et al. (2005). Objects a and b were the training Go and No Go objects (counterbalanced across participants), and objects c were the novel objects used in testing.

In subsequent tests, we (a) presented the learned objects in a reversed direction of motion (see Videos 9, 10, 11, and 12), which placed shape and motion cues in conflict; (b) presented the learned objects in an entirely new trajectory (see Videos 13, 14, 15, and 16), which tested for the effectiveness of shape cues alone; and (c) presented a new object in the learned motions (see Videos 17, 18, 19, and 20), which tested for the effectiveness of motion cues alone.
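The logic of these tests is easiest to see as bookkeeping over which cue still distinguishes the Go from the No Go stimulus. The short sketch below is our summary of the design as just described, not code from the study:

# For each test, which cue (shape or motion) remains diagnostic.
tests = {
    "same (baseline)": ("trained shape", "trained motion", "both cues diagnostic"),
    "reversed motion": ("trained shape", "swapped motion", "shape and motion in conflict"),
    "new motion":      ("trained shape", "novel motion",   "shape cues alone"),
    "new object":      ("novel shape",   "trained motion", "motion cues alone"),
}
for name, (shape, motion, informative) in tests.items():
    print(f"{name:16s} {shape:14s} {motion:15s} -> {informative}")

Note that reversing the motion produces a conflict because the Go and No Go objects were trained rotating in opposite directions along the same trajectory, so after reversal each object carries the other’s trained motion.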

With decomposable objects, both pigeons and humans showed a decrease in accuracy when the motion of the object was reversed, or when a new motion was presented, but for both species, the discrimination based on shape nevertheless remained above chance level (Figure 22).

Figure 22. Proportion of Go responses to the Go and No Go stimulus by humans and pigeons in various test conditions. Same = trained objects in their characteristic motions, Reversed = trained objects in the opposite motion, New Motion = trained objects in an entirely new motion, New Objects = an entirely new object in the Go and No Go motions. The dashed line indicates chance level. From Spetch et al., 2005.

Further, humans showed this same pattern of results for both decomposable and non-decomposable objects. In contrast, pigeons, but not humans, showed significant discrimination of the new object based on learned motion cues alone. That is, for both decomposable and non-decomposable objects, pigeons responded positively to the new objects when they appeared in the learned (S+) motion and refrained from responding when they appeared in the S- motion. In addition, the pigeons, unlike the humans, showed no discrimination between the learned non-decomposable objects when they were presented in a new trajectory, and they responded primarily on the basis of the motion rather than the shape of the non-decomposable objects on the conflict test. It is not clear whether this result occurred because the motion overshadowed (i.e., interfered with learning about) the shape cues of the objects, or whether the pigeons were unable to recognize the invariance of the non-decomposable objects when they moved in new ways. In either case, the pigeons discriminated between these objects primarily on the basis of motion cues. Further research to determine whether motion cues facilitate or interfere with recognition of static views (e.g., Jitsumori & Makino, 2004) would be interesting to try with our stimuli. Clearly, however, dynamic information can contribute to object recognition in pigeons, and it does so differently than in humans.

Summary

Our studies of object recognition in humans and pigeons have revealed both similarities and differences between species that have broadened our understanding of the processes involved in ways that studying each species by itself could not. The similarities between pigeons and humans include the following. First, provided that objects do not contain diagnostic features, both species typically show a systematic decrease in recognition accuracy and/or speed as objects are rotated in depth away from the nearest single trained view (Spetch et al., 2001; Spetch & Friedman, 2003). This result suggests that, in general, object representations are viewpoint dependent and that recognition depends on some sort of view combination or generalization process. Second, object recognition by both species is influenced by the nature of the objects -- specifically whether they are decomposable into parts and whether the parts are unique to each object (Friedman et al., 2005; Spetch et al., 2001). Third, under some circumstances, both species show better recognition of novel views of objects that are interpolated between trained views than of novel extrapolated views (Friedman et al., 2005; Spetch & Friedman, 2003). Another way of describing this effect is to say that both species can better recognize an object from a novel view when the novel view is relatively close to two trained views than when it is the same distance away from only one trained view, or when the distance between the novel view and the trained views is relatively large. Finally, both species are sensitive to characteristic motion cues associated with a dynamically viewed object, and show reductions in recognition accuracy if the motion is altered (Spetch et al., 2005). The extent of these similarities is impressive, given that humans and pigeons differ in so many ways, including evolutionary history and ecology, mode of locomotion, and morphology of the visual system.
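The view-combination idea can be made concrete with a toy calculation. The sketch below is purely illustrative, in the spirit of view-interpolation accounts (e.g., Bülthoff & Edelman, 1992; Poggio & Edelman, 1990); the Gaussian form and the tuning width are our assumptions, not parameters estimated from any of the experiments reviewed here:

import math

def recognition_strength(test_view, trained_views, sigma=30.0):
    """Summed Gaussian generalization (toy model) from each trained view,
    with viewpoint measured in degrees of depth rotation."""
    return sum(math.exp(-(test_view - v) ** 2 / (2 * sigma ** 2))
               for v in trained_views)

trained = [0.0, 60.0]                        # two training views, 60 deg apart
print(recognition_strength(30.0, trained))   # interpolated midpoint: ~1.21
print(recognition_strength(90.0, trained))   # equally distant extrapolation: ~0.62

The interpolated view inherits support from both trained views, whereas the equally distant extrapolated view draws almost entirely on the nearer one, which is one simple way of capturing the interpolation advantage.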

Within each of these areas of similarity, we also observed some interesting differences in object recognition between pigeons and humans. First, for humans, the systematic decrease in object recognition as a function of depth rotation is sometimes weak or absent if the objects contain a single distinctive part that can serve as a diagnostic feature. We did not see this for pigeons: They showed similar viewpoint dependence whether the objects were composed of the same or different parts (Spetch et al., 1999), and whether paper-clip type objects had zero, one, or multiple added parts (Spetch et al., 2001). Peissig, Young, Wasserman, and Biederman (2000) also found that pigeons’ recognition of single geons generally decreased as the geons were rotated away from the training view, a result that differs from some studies with humans (Biederman & Gerhardstein, 1993) but not others (Tarr, Williams, Hayward, & Gauthier, 1998). Thus, pigeons show substantial viewpoint dependence even with objects for which humans sometimes show viewpoint invariance. This difference is not because pigeons are insensitive to the nature of the objects: They show better discrimination of objects that are composed of different parts than of objects that are composed of the same parts (Spetch et al., 1999), and they show greater control by the shape of decomposable dynamic objects than by the shape of non-decomposable dynamic objects (Spetch et al., 2005). Thus, the evidence to date suggests that the structure of the objects affects pigeons’ discrimination between them, but not the viewpoint dependence of their object representations.

A second difference between species is that, although pigeons sometimes show better recognition of interpolated than extrapolated novel views, they do so under a more restricted set of conditions than we found with humans (Friedman et al., 2005). Specifically, humans showed better recognition of interpolated views under all conditions we tested, with both pictures and real objects and with training views separated by either 60° or 90°. Pigeons showed the effect with real objects when the two training views were separated by either 60° or 90°, but with pictures they showed the effect only when there was a 60° separation between the two training views. One interpretation of this result is that pigeons form a more narrowly tuned representation of objects seen in pictures than of objects seen directly, and in addition, that their representations of objects seen in pictures are more narrowly tuned than those of humans. As a result, when objects are seen in pictures, the representations that pigeons form if the training views are 90° apart may have insufficient overlap for there to be a substantial benefit of interpolation. A related possibility is that pigeons may encode such pictorial representations as different objects.
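The same toy Gaussian-gradient assumption makes the narrow-tuning interpretation concrete. In the sketch below, the sigma values are hypothetical stand-ins for broad versus narrow tuning; only the qualitative pattern, not the particular numbers, is the point:

import math

def midpoint_strength(separation, sigma):
    """Summed generalization at the midpoint between two trained views
    'separation' degrees apart, under Gaussian tuning of width sigma."""
    half = separation / 2.0
    return 2 * math.exp(-half ** 2 / (2 * sigma ** 2))

for sigma in (40.0, 15.0):            # hypothetical broad vs. narrow tuning
    for separation in (60.0, 90.0):   # the two training-view separations tested
        print(sigma, separation, round(midpoint_strength(separation, sigma), 3))

With broad tuning, the midpoint retains substantial support at both separations (1.51 and 1.06); with narrow tuning, the support is modest at 60° (0.271) and essentially gone at 90° (0.022), which parallels the pattern pigeons showed with pictures.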

A third difference we observed was that pigeons appeared to be more sensitive than humans to the characteristic motion of a dynamically presented object (Spetch et al., 2005). In particular, pigeons, but not humans, continued to respond discriminatively when the trained motion directions were carried by new objects. Thus, even in the case of decomposable objects, for which pigeons showed greater control by shape than by motion on conflict tests, they nevertheless were able to recognize the characteristic motion independently of the learned object shape.

Our results, which suggest that processes of object recognition in pigeons may differ in interesting ways from those in humans, complement other reports of differences in visual cognition between pigeons and humans. For example, a number of studies, using various tests, have found that pigeons fail to complete images of partially occluded 2-D objects (e.g., Fujita & Ushitani, 2005; Sekuler, Lee, & Shettleworth, 1996). Specifically, when the contour of one object is occluded by another object, pigeons appear to see the occluded object as a fragment rather than as a whole object behind another object. This finding stands in contrast to results from humans and several other non-human species (see Fujita, 2004). Kelly and Cook (2003) found that discrimination of line orientation by pigeons and humans also was affected in opposite ways by the addition of contextual information. Specifically, humans’ discrimination between oblique lines was enhanced by the addition of spatially contiguous horizontal and vertical lines (a configural superiority effect), whereas pigeons showed reduced discrimination under these conditions (see also Donis & Heinemann, 1993). Finally, Cavoto and Cook (2001) found that pigeons responded more readily to the local information in a hierarchical display. For example, when small letter O’s (local information) made up a larger letter T (global information) and vice versa, pigeons showed a local advantage in processing the stimuli, which contrasts with the global advantage more typically found with humans (Navon, 1977). Given the numerous differences in morphology, ecology, and evolution, it is not surprising that some differences in visual cognition would emerge between humans and an avian species. Perhaps more surprising are the findings of substantial similarities between these species, both in object recognition as reviewed here and in studies of other aspects of visual cognition such as pattern recognition (e.g., Blough, 1985) and texture discrimination (e.g., Cook, Cavoto, & Cavoto, 1996).
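For readers unfamiliar with hierarchical displays, the following sketch (our illustration; the particular letter pairing is arbitrary, as in the example above) prints a global T composed of local O’s, the kind of stimulus for which humans typically show a global advantage and pigeons a local one:

# Build a Navon-style hierarchical stimulus: a global "T" made of local "O"s.
rows, cols = 9, 9
grid = [[" "] * cols for _ in range(rows)]
for c in range(cols):            # top bar of the global T
    grid[0][c] = "O"
for r in range(1, rows):         # stem of the global T
    grid[r][cols // 2] = "O"
print("\n".join("".join(row) for row in grid))

Reporting the local letter (O) versus the global letter (T) is what the competing “local advantage” and “global advantage” refer to.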

It is important to note that, as with any differences observed in comparative research, further research is needed to determine the generality and causes of the differences we observed. The possibility that procedural and/or stimulus factors can influence performance, and hence any differences observed, is highlighted by our finding that pigeons responded more like humans to interpolated views of objects when they viewed the objects directly rather than in pictures. It will also be important to assess the role of motion in object recognition using directly viewed objects. It is possible, for example, that the shape of the object may become a more salient dimension for pigeons when they view objects directly. If so, an interesting question is whether shape would then overshadow attention to the object’s characteristic motion and make the performance of pigeons more like that seen for humans.

Directions for Future Research

Unlike investigations of other processes such as spatial cognition (e.g., see Spetch & Kelly, in press; Cheng & Newcombe, 2005; Balda & Kamil, 2002) and memory processes (e.g., see Grant & Kelly, 2001; Wright, Santiago, Sands, Kendrick, & Cook, 1985), comparative research on the processes underlying object recognition is in its infancy. The research we have summarized provides a starting point, but only a starting point, for understanding how object recognition processes compare across species. This work has revealed some interesting similarities and differences between object recognition processes in pigeons and humans. But much more work with additional species, using different approaches, is needed to make the study of object recognition a truly comparative science.

Shettleworth (1993) nicely outlined two alternative research programs within the field of comparative cognition, namely an anthropocentric program and an ecological program. She suggested that the anthropocentric program has three essential features. First, as the name implies, it is human-centered and asks whether non-humans can do what humans can do. Second, it is concerned with demonstrating whether animals can perform a particular task, rather than with understanding the conditions that encourage different cognitive processes. Third, it assumes that evolution is a “ladder of improvement” (p. 179). Shettleworth argued, and we believe rightly so, that such an approach can miss some of the richness of animal cognition and behavior. The alternative, ecological program that Shettleworth advocates focuses on the cognitive processes that animals use to solve ecologically important problems, as well as why, and how, these processes evolved. In this approach, selection of species for comparison is thus based on evolutionary and ecological considerations.

Although we would argue that the research we have summarized does not follow the latter two features of the anthropocentric program, it is certainly human-centered in that the comparisons were driven by what is known about object recognition processes in humans and were concerned with whether these processes are similar or different in the pigeon. Both the experimental manipulations and the theoretical ideas were derived primarily from research in human visual cognition. However, we assumed that recognizing objects is a generally important cognitive problem in most ecological niches, much like “dealing with space, time and event correlation” (Shettleworth, 1993, p. 179), and hence some generality in the processes involved in recognizing objects is likely to be seen. What is clearly needed to complement this work, however, is an ecological program of research on object recognition. For example, it may be that specialized object recognition processes have evolved that differ among related species depending on their ecology (e.g., granivorous versus predatory birds; nocturnal versus diurnal birds), primary mode of locomotion, cues used for individual recognition and mate selection, and so forth. Differences in the morphology of sensory systems and in visual capabilities have certainly evolved (Zeigler & Bischof, 1993), but comparisons between related species that differ in ecological factors are needed to determine whether the underlying cognitive processes are similarly specialized. For example, one might predict that motion cues would be even more important for predatory birds than for granivorous birds, because predators may use characteristic motion both to recognize prey and to anticipate the location of movement during capture (Kelly, 2002). However, many predatory birds, such as falcons, also have extraordinary visual acuity. Hence, it would be interesting to determine whether predatory birds would respond more to shape or to motion on a conflict test, and whether high sensitivity to an object’s characteristic motion would facilitate or overshadow their encoding of object shape.

A related interesting question for future research is whether object recognition processes differ within pigeons, depending on the specific visual system that is used. As mentioned previously, pigeons have a frontal fovea for selecting grain and a lateral fovea for more distant vision and, probably, predator vigilance. The frontal visual field is binocular, but the lateral visual field is monocular.

Because of the proximity of the objects in the operant chambers we used for all the research we reviewed, the pigeons presumably solved our tasks, both with pictures and with directly viewed objects, using the frontal visual system. It is therefore of interest to assess object recognition in pigeons for laterally viewed distant objects. It is possible, for example, that the presence of a distinctive object part may facilitate viewpoint invariant recognition when pigeons view distant objects from the lateral field. One reason for making this prediction is that we suspect that viewpoint invariant recognition might be particularly important when pigeons view distant objects during flight, because of the numerous views in which the object can be seen. A second reason is that recognition of objects from the lateral visual field, which is monocular, would not benefit from binocular disparity, and consequently might be more sensitive to diagnostic features of objects. These ideas are pure speculation, however, and need to be tested experimentally. It is possible that the different properties of the frontal and lateral visual fields affect only low-level vision, and that higher level cognitive processes do not differ. For example, some principles of spatial cognition in pigeons hold in both computer touch-screen tasks and open-field tasks that require distant lateral vision (e.g., Spetch et al., 1996, 1997; see Cheng & Spetch, 1998 for a review).

It may also be advantageous to consider the nature of the stimuli and the visual systems employed when examining other interesting findings related to object recognition in pigeons, such as the failure to show perceptual completion and the advantage for local stimuli in hierarchical displays. For example, Fujita and Ushitani (2005) make the interesting suggestion that pigeons’ lack of completion may be related to their ecology: specifically, that their diet consists mainly of small grains that would not need to be completed to be recognized. In fact, they showed that under certain circumstances, not completing occluded objects can facilitate object detection, and they suggest that the lack of object completion may sometimes be advantageous to pigeons. To our knowledge, all tests of completion in pigeons have used pictorial displays that would primarily activate the near frontal visual system. Thus, it would be interesting to test whether pigeons would show completion when viewing objects from a distance, or when viewing real 3-D objects.

The last few decades have seen exciting developments in the field of comparative cognition (for examples, see Cook, 2001, and abstracts from the Conference on Comparative Cognition: http://www.comparativecognition.org). Our understanding of several important processes, such as memory and spatial cognition, has advanced remarkably, due to research programs that are motivated by comparisons to human cognition (e.g., Wright et al., 1985) as well as research programs that are ecologically oriented (e.g., Balda & Kamil, 2002; Bond, Kamil, & Balda, 2003). We believe that these alternative programs are complementary and can sometimes be merged in very interesting ways. A prime example of such a merger is the investigation of “episodic-like memory” in non-humans. The question itself comes from an anthropocentric approach, but ecological considerations of situations in which a non-human might need to remember what, when, and where have led to very interesting studies of this type of memory process in a food-storing bird (e.g., Clayton, Bussey, & Dickinson, 2003). Future research on object recognition may also benefit from these complementary approaches.

References

Balda, R. P., & Kamil, A. C. (2002). Spatial and social cognition in corvids: An evolutionary approach. In M. Bekoff, C. Allen, et al. (Eds.), The cognitive animal: Empirical and theoretical perspectives on animal cognition (pp. 129-133). Cambridge, MA: MIT Press.

Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115-147.

Biederman, I., & Gerhardstein, P. C. (1993). Recognizing depth-rotated objects: Evidence and conditions for three-dimensional viewpoint invariance. Journal of Experimental Psychology: Human Perception & Performance, 19, 1506-1514.

Blough, D. S. (1985). Discrimination of letters and random dot patterns by pigeons and humans. Journal of Experimental Psychology: Animal Behavior Processes, 11, 261-280.

Blough, P. M. (2001). Cognitive strategies and foraging in pigeons. In R. G. Cook (Ed.), Avian visual cognition [Online]. Available: www.pigeon.psy.tufts.edu/avc/pblough/

Bond, A. B., Kamil, A. C., & Balda, R. P. (2003). Social complexity and transitive inference in corvids. Animal Behaviour, 65, 479-487.

Broomhead, D. S., & Lowe, D. (1988). Multivariable functional interpolation and adaptive networks. Complex Systems, 2, 321-355.

Bülthoff, H. H., & Edelman, S. (1992). Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proceedings of the National Academy of Sciences, 89, 60-64.

Cabe, P. A. (1976). Transfer of discrimination from solid objects to pictures by pigeons: A test of theoretical models of pictorial perception. Perception and Psychophysics, 19, 545-550.

Cavoto, K. K., & Cook, R. G. (2001). Cognitive precedence for local information in hierarchical stimulus processing by pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 27, 3-16.

Cerella, J. (1977). Absence of perspective processing in the pigeon. Pattern Recognition, 9, 65-68.

Cerella, J. (1990). Pigeon pattern perception: Limits on perspective invariance. Perception, 19, 141-159.

Cheng, K., & Newcombe, N. S. (2005). Is there a geometric module for spatial orientation? Squaring theory and evidence. Psychonomic Bulletin & Review, 12, 1-23.

Cheng, K., & Spetch, M. L. (1998). Mechanisms of landmark use in mammals and birds. In S. Healy (Ed). Spatial representation in animals (pp. 1-17). Oxford: Oxford University Press.

Clayton, N. S., Bussey, T. J., & Dickinson, A. (2003). Can animals recall the past and plan for the future? Nature Reviews Neuroscience, 4, 685-691.

Cole, P. D., & Honig, W. K. (1994). Transfer of a discrimination by pigeons (Columba livia) between pictured locations and the represented environment. Journal of Comparative Psychology, 108, 189-198.

Cook, R. G. (2001). Avian visual cognition. [On-line]. Available: www.pigeon.psy.tufts.edu/avc/

Cook, R. G., Cavoto, K. K., & Cavoto, B. R. (1996). Mechanisms of multidimensional grouping, fusion, and search in avian texture discrimination. Animal Learning & Behavior, 24, 150-167.

Cook, R. G., & Katz, J. S. (1999). Dynamic object perception by pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 25, 194-210.

Cook, R. G., Shaw, R., & Blaisdell, A. P. (2001). Dynamic object perception by pigeons: Discrimination of action in video presentations. Animal Cognition, 4, 137-146.

Dittrich, W. H., Lea, S. E. G., Barrett, J., & Gurr, P. R. (1998). Categorization of natural movements by pigeons: Visual concept discrimination and biological motion. Journal of the Experimental Analysis of Behavior, 70, 281-299.

Donis, F. J., & Heinemann, E. G. (1993). The object-line inferiority effect in pigeons. Perception & Psychophysics, 53, 117-122.

Edelman, S. (1999). Representation and recognition in vision. Cambridge, MA: MIT Press.

Edelman, S. (1995). Class similarity and viewpoint invariance in the recognition of 3D objects. Biological Cybernetics, 72, 207-220.

Edelman, S., & Bülthoff, H. H. (1992). Orientation dependence in the recognition of familiar and novel views of three-dimensional objects. Vision Research, 32, 2385-2400.

Edelman, S., Bülthoff, H. H., & Bülthoff, I. (1999). Effects of parametric manipulation of inter-stimulus similarity on 3D object categorization. Spatial Vision, 12, 107-123.

Fagot, J. (Ed). (2000). Picture perception in animals. Philadelphia, PA, US: Psychology Press/Taylor & Francis.

Fagot, J., Martin-Malivel, J., & Depy, D. (2000). What is the evidence for an equivalence between objects and pictures in birds and nonhuman primates? In J. Fagot (Ed.), Picture perception in animals (pp. 295-320). Philadelphia, PA: Psychology Press/Taylor & Francis.

Friedman, A., Spetch, M. L., & Ferrey, A. (2005). Recognition by humans and pigeons of novel views of 3-D objects and their photographs. Journal of Experimental Psychology: General, 134, 149-162.

Friedman, A., Spetch, M. L., & Lank, I. (2003). An automated apparatus for presenting depth-rotated three-dimensional objects for use in human and animal object recognition research. Behavior Research Methods, Instruments, and Computers, 35, 343-349.

Fujita, K. (2004). How do nonhuman animals perceptually integrate figural fragments? Japanese Psychological Research, 46, 154-169.

Fujita, K., & Ushitani, T. (2005). Better living by not completing: A wonderful peculiarity of pigeon vision? Behavioural Processes, 69, 59-66.

Grant, D. S., & Kelly, R. (2001). Anticipation and short-term retention in pigeons. In R. G. Cook (Ed.), Avian visual cognition [On-line]. Available: www.pigeon.psy.tufts.edu/avc/grant/

Honig, W. K., & Urcuioli, P. J. (1981). The legacy of Guttman and Kalish (1956): 25 years of research on stimulus generalization. Journal of the Experimental Analysis of Behavior, 36, 405-445.

Husband, S., & Shimizu, T. (2001). Evolution of the avian visual system. In R. G. Cook (Ed.), Avian visual cognition [On-line]. Available: www.pigeon.psy.tufts.edu/avc/husband/

Jitsumori, M., & Makino, H. (2004). Recognition of static and dynamic images of depth-rotated human faces by pigeons. Learning & Behavior, 32, 145-156.

Kelly, D. M. (2002). Avian pattern and object perception. Dissertation Abstracts International: Section B: The Sciences and Engineering, 63(6-B), 3050.

Kelly, D. M., & Cook, R. G. (2003). Differential effects of visual context on pattern discrimination by pigeons (Columba livia) and humans (Homo sapiens). Journal of Comparative Psychology, 117, 200-208.

Kirkpatrick, K. (2001). Object recognition. In R. G. Cook (Ed.), Avian visual cognition [On-line]. Available: www.pigeon.psy.tufts.edu/avc/kirkpatrick/

Logothetis, N. K., Pauls, J., Bülthoff, H. H., & Poggio, T. (1994). View-dependent object recognition by monkeys. Current Biology, 4, 401-414.

Liu, T., & Cooper L.A. (2003). Explicit and implicit memory for rotating objects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 554-562.

Lumsden, E. A. (1977). Generalization of an operant response to photographs and drawings/silhouettes of a three dimensional object at various orientations. Bulletin of the Psychonomic Society, 10, 405-407.

Metzler, J., & Shepard, R. N. (1974). Transformational studies of the internal representation of three-dimensional objects. In R.S. Solso (Ed.), Theories of cognitive psychology: The Loyola Symposium. NY: Wiley.

Navon, D. (1977). Forest before trees: The precedence of global features in visual perception. Cognitive Psychology, 9, 353-383.

Peissig, J. J., Young, M. E., Wasserman, E. A., & Biederman, I. (2000). The pigeon’s perception of depth-rotated shapes. In J. Fagot (Ed) Picture perception in animals (pp. 37-70). East Sussex: Psychology Press.

Poggio, T., & Edelman, S. (1990). A network that learns to recognize three-dimensional objects. Nature, 343, 263-267.

Sekuler, A. B., Lee, J. A. J., & Shettleworth, S. J. (1996). Pigeons do not complete partially occluded objects. Perception, 25, 1109-1120.

Shepard, D. (1968). A two-dimensional interpolation function for irregularly spaced data. Proceedings of the 23rd National Conference of the ACM, 517-524.

Shepard, R. N., & Metzler, J. (1971). Mental rotation of three-dimensional objects. Science, 171, 701-703.

Shettleworth, S. J. (1993). Where is the comparative in comparative cognition? Alternative research programs. Psychological Science, 4, 179-184.

Spence, K. W. (1937). The differential response of animals to stimuli differing within a single dimension. Psychological Review, 44, 430-444.

Spetch, M. L., & Friedman, A. (2003). Recognizing rotated views of objects: Interpolation versus generalization by humans and pigeons. Psychonomic Bulletin and Review, 10, 135-140.

Spetch, M. L., & Friedman, A. (2005). Pigeons see correspondence between objects and their pictures. Submitted for publication.

Spetch, M. L., Friedman, A., & Reid, S. L. (2001). The effect of distinctive parts on recognition of depth-rotated objects by pigeons (Columba livia) and humans. Journal of Experimental Psychology: General, 130, 238-255.

Spetch, M. L., Friedman, A., & Vuong, Q. C. (2005). Dynamic information affects object recognition in pigeons and humans. Manuscript submitted for publication.

Spetch, M. L., & Kelly, D. M. (in press). Comparative spatial cognition: Processes in landmark and surface-based place finding. In E. Wasserman and T. Zentall (Eds.), Comparative Cognition: Experimental Explorations of Animal Intelligence. Oxford: Oxford University Press.

Spetch, M. L., Kelly, D. M., & Reid, S. (1999). Recognition of objects and spatial relations in pictures across changes in viewpoint. Cahiers de Psychologie Cognitive/ Current Psychology of Cognition, 18, 729-764.

Stone, J. V. (1998). Object recognition using spatiotemporal signatures. Vision Research, 38, 947-951.

Tarr, M. J. (1995). Rotating objects to recognize them: A case study on the role of viewpoint dependency in the recognition of three-dimensional objects. Psychonomic Bulletin and Review, 2, 55-82.

Tarr, M. J., & Bülthoff, H. H. (1998). Image-based object recognition in man, monkey, and machine. In M.J. Tarr & H.H. Bülthoff (Eds.), Object recognition in man, monkey, and machine (pp. 1-20). Cambridge, MA: MIT Press.

Tarr, M. J., Bülthoff, H. H., Zabinski, M., & Blanz, V. (1997). To what extent do unique parts influence recognition across changes in viewpoint? Psychological Science, 8, 282-289.

Tarr, M. J., & Gauthier, I. (1998). Do viewpoint-dependent mechanisms generalize across members of a class? Cognition, 67, 73-110.

Tarr, M. J., & Pinker, S. (1989). Mental rotation and orientation dependence in shape recognition. Cognitive Psychology, 21, 233-282.

Tarr, M. J., Williams, P., Hayward, W. G., & Gauthier, I. (1998). Three-dimensional object recognition is viewpoint dependent. Nature Neuroscience, 1, 275-277.

Tinbergen, N. (1951). The study of instinct. Oxford: Oxford University Press.

Ullman, S. (1989). Aligning pictorial descriptions: An approach to object recognition. Cognition, 32, 193-254.

Vuong, Q. C., & Tarr, M. J. (2004). Rotation direction affects object recognition. Vision Research, 44, 1717-1730.

Vuong, Q. C., & Tarr, M.J. (2005). Structural similarity and spatiotemporal noise effects on learning dynamic novel objects. Perception, in press.

Wasserman, E. A., Gagliardi, J. L., Cook, B. R., Kirkpatrick-Steger, K., Astley, S. L., & Biederman, I. (1996). The pigeon’s recognition of drawings of depth-rotated stimuli. Journal of Experimental Psychology: Animal Behavior Processes, 22, 205-221.

Watanabe, S. (1997).Visual discrimination of real objects and pictures in pigeons. Animal Learning & Behavior, 25, 185-192.

Wright, A. A., Santiago, H. C., Sands, S. F., Kendrick, D. F., & Cook, R. G. (1985). Memory processing of serial lists by pigeons, monkeys and people. Science, 229, 287-289.

Zeigler, H. P., & Bischof, H. J. (1993). Vision, Brain, and Behavior in Birds. Cambridge, MA: MIT Press.

 

This review was supported by Discovery grants from the Natural Sciences and Engineering Research Council of Canada to each author. We thank Quoc Vuong for assistance with animations. Images for Figure 1 were provided courtesy of Michael J. Tarr (Brown University, Providence, RI). The four videos labeled VT are from Vuong & Tarr, 2005.
Correspondence regarding this article should be addressed to M. Spetch, Department of Psychology, University of Alberta, Edmonton, Alberta, Canada, T6G 2E9, mspetch@ualberta.ca