Volume 10: pp. 25–43

Environmental Influences on Spatial Memory and the Hippocampus in Food-Caching Chickadees

Vladimir V. Pravosudov
Department of Biology, University of Nevada, Reno, USA

Timothy C. Roth II
Department of Psychology, Franklin and Marshall College, USA

Lara D. LaDage
Department of Biology, Penn State, Altoona, USA

Cody A. Freas
Department of Biological Sciences, Macquarie University, Australia


Abstract

Cognitive abilities have been widely considered as a buffer against environmental harshness and instability, with better cognitive abilities being especially crucial for fitness in harsh and unpredictable environments. Although the brain is considered to be highly plastic and responsive to changes in the environment, the extent of such environment-induced plasticity and the relative contributions of natural selection to the frequently large variation in cognitive abilities and brain morphology both within and between species remain poorly understood. Food-caching chickadees present a good model to tackle these questions because they: (a) occur over a large gradient of environmental harshness largely determined by winter climate severity, (b) depend on food caches to survive winter and their ability to retrieve food caches is, at least in part, reliant on hippocampus-dependent spatial memory, and (c) regularly experience a distinct seasonal cycle of food caching and cache retrieval. Here we review a body of work, both comparative and experimental, on two species of food-caching chickadees and discuss how these data relate to our understanding of how environment-induced plasticity and natural selection generate environment-related variation in spatial memory and the hippocampus, both across populations as well as across seasons within the same population. We argue that available evidence suggests a relatively limited role of environment-induced structural hippocampal plasticity underlying population variation. At the same time, evidence is consistent with the history of natural selection due to differences in winter climate severity and associated with heritable individual variation in spatial memory and the hippocampus. There appears to be no clear direct association between seasonal variation in hippocampus morphology and seasonal variation in demands of food caching. 
Finally, we suggest that experimental studies of hippocampal plasticity with captive birds should be viewed with some caution because captivity is associated with large reductions in many hippocampal traits, including volume and in some cases neurogenesis rates, but not neuron number. Comparative studies using captive birds, on the other hand, appear to provide more reliable results, as captivity does not appear to override population differences, especially in the number of hippocampal neurons.

Acknowledgements: Vladimir Pravosudov was supported by NSF awards IOS1351295 and IOS0918268 and Lara LaDage was supported by NSF award IOS0918268. We would like to thank Chris Sturdy and three anonymous reviewers for their constructive criticisms that significantly improved the manuscript.

Keywords: spatial memory; hippocampus; neurogenesis; neurons; plasticity; natural selection; food caching; environment; winter; ambient temperature; seasonality; chickadee


A key evolutionary question for understanding how environmental heterogeneity is associated with cognitive abilities concerns the relative contribution of environment-induced effects (e.g., plasticity) and natural selection acting on heritable cognitive traits as a means of generating environment-related variation in cognition and neural traits (e.g., Pravosudov & Roth, 2013). At least in humans, there is sufficient evidence that both general cognition and specific cognitive traits are highly heritable and that individual variation in these traits is, at least in part, determined by genetics (e.g., Ando, Ono, & Wright, 2001; Haworth et al., 2010; McGee, 1979; Pedersen, Plomin, Nesselroade, & McClearn, 1992; Plomin, Pedersen, Lichtenstein, & McClearn, 1994; Plomin & Spinath, 2002). Assuming that heritability of cognitive traits is not a unique human phenomenon but is common in other animals, it should provide ample opportunities for natural selection to generate variation in cognitive traits given different selection pressures. Many species occur over a large range of environmental conditions and experience major seasonal changes in their environment. Both geographic and seasonal variation in environmental conditions are likely to impart different demands on cognitive abilities, which may be especially important for fitness in harsher environments (e.g., a longer winter period, lower temperatures, more snow covering foraging substrates, and more frequent snowfalls) with higher energetic demands (due to lower temperatures) and a shortage of naturally available food (e.g., Pravosudov & Clayton, 2002; Pravosudov & Roth, 2013). Note that the magnitude of seasonal variation itself usually varies geographically, with a larger range of seasonal variation in harsher environments (e.g., more northern environments are associated with stronger seasonal differences).

Food-caching chickadees present a good case to understand the relationship between the environment, cognition, and the brain because (a) they occur over a large gradient of environmental harshness with different demands on caching and cache retrieval, (b) caching and cache retrieval depend, at least in part, on hippocampus-dependent spatial memory, and (c) they exhibit highly seasonal food caching behavior.

Population Variation in Spatial Memory and Hippocampus Morphology Is Associated with Differences in Winter Climate Harshness

Food-caching chickadees occur over a large range of environmental conditions, with some populations experiencing relatively milder winters and others experiencing relatively harsher winters. Chickadees are non-migratory birds that spend the non-breeding season in social groups characterized by a linear social dominance hierarchy (e.g., Ekman, 1989; Hogstad, 1989) and appear to rely on food caches to survive winters (e.g., Pravosudov & Smulders, 2010). Most food-caching chickadee species live in temperate climates where the highest rates of mortality likely occur during the winter, probably because birds are unable to meet their energetic requirements. During the winter, naturally available food is both in short supply and unpredictable in availability. Thus, food caching has been widely hypothesized to have evolved to provide a more reliable food supply during that time (Krebs, Sherry, Healy, Perry, & Vaccarino, 1989; Pravosudov & Clayton, 2002; Pravosudov & Roth, 2013; Sherry, Vaccarino, Buckenham, & Herz, 1989; Vander Wall, 1990). At the same time, the large variation in winter harshness associated with climate severity (colder temperatures, more snowfall, longer winter period) across species ranges might be expected to influence the reliance on food caches (Pravosudov & Clayton, 2002; Pravosudov & Roth, 2013). Longer winter periods mean longer periods without the abundant and predictable food supply associated with the phenology of the main natural food sources (e.g., invertebrates). Colder temperature is likely associated with higher food intake requirements, yet during the winter naturally available food is limited and unpredictable, and more snow (covering both the ground and, frequently, tree branches) likely reduces access to already limited food. In food-caching birds, food caches appear to represent the main reliable food source during the winter, and harsher winter conditions can be expected to increase reliance on food caches for overwinter survival.

It is well established that spatial memory plays a role in successful cache retrieval and, potentially, even in generating the optimal density of caches during caching (e.g., Male & Smulders, 2007), so variation in winter climate harshness could be expected to produce differential demands on spatial memory ability (Pravosudov & Roth, 2013). Birds living in harsher winter environments should benefit from a superior spatial memory that allows them to be more successful in retrieving previously made caches compared to birds wintering in milder climates (Pravosudov & Clayton, 2002). As spatial memory is dependent, at least in part, on the hippocampus, differences in spatial memory among populations that are due to differential dependence on food caches for survival should also be associated with differences in the hippocampus (Pravosudov & Roth, 2013). Such expected differences in spatial memory and the hippocampus might come about via environment-induced plastic phenotypic responses associated with the differential use of memory (Clayton, 1996, 2001; Clayton & Krebs, 1994; Woollett & Maguire, 2011) and/or could be based on genetic differences produced by natural selection if differences in memory and hippocampus morphology are based on heritable mechanisms (Krebs et al., 1989; Pravosudov & Roth, 2013; Sherry et al., 1989). Before discussing the origin of potential population differences in spatial memory and the hippocampus, we shall first consider the data demonstrating such population differences.

Our studies focused on two species of food-caching chickadees: the black-capped chickadee (Poecile atricapillus) and the mountain chickadee (P. gambeli). Black-capped chickadees occur over a large range on the North American continent that spans large variation in winter conditions both longitudinally and latitudinally (Figure 1; Pravosudov & Clayton, 2002). Along the latitudinal gradient of winter climate harshness, the black-capped chickadee range extends from a milder climate in Kansas to a much harsher winter climate in Alaska, whereas along the longitudinal gradient, chickadees range from a milder climate in Washington state to a much harsher winter climate in Maine (Figure 1). The first study compared chickadees from the two populations with the most different winter environments, Alaska (Anchorage) and Colorado, and reported that chickadees from Alaska (harsh winters) had a stronger propensity to cache food, significantly better spatial, but not nonspatial, memory ability, larger relative and absolute hippocampus volume, and a significantly larger total number of hippocampal neurons (Pravosudov & Clayton, 2002). The follow-up studies (Roth, LaDage, & Pravosudov, 2011; Roth & Pravosudov, 2009) compared 10 populations of black-capped chickadees along the winter climate gradient, including the two populations previously compared in Pravosudov and Clayton (2002). These studies showed that, independent of latitudinal differences in day length (shorter in northern populations), harsher winter climatic conditions were associated with larger hippocampus volume, higher total number and larger soma size of hippocampal neurons, larger total number of hippocampal glial cells, and higher neurogenesis rates (Figure 2; Chancellor, Roth, LaDage, & Pravosudov, 2011; Freas, Bingman, LaDage, & Pravosudov, 2013; Roth et al., 2011; Roth & Pravosudov, 2009).

Figure 1. Sampling locations across winter climate severity gradients in black-capped chickadees. AKF — Alaska, Fairbanks; AKA — Alaska, Anchorage; BC — British Columbia; WA — Washington State; MT — Montana; MN — Minnesota; ME — Maine; CO — Colorado; KS — Kansas; IA — Iowa. L — large hippocampus, S — small hippocampus, S-I — small-intermediate hippocampus. Based on Pravosudov et al. (2012).

Figure 2. Hippocampus volume (A, B, D), total number of hippocampal neurons (A, B, D), and adult hippocampal neurogenesis rates (C, D) in black-capped chickadees sampled directly from the wild without experiencing any captive environment across latitudinal (A, C) and longitudinal (B) gradient of winter climate harshness and in captive chickadees hand-reared from 10 days of age and maintained in controlled laboratory conditions throughout their entire life (D). From Roth & Pravosudov (2009), Roth et al. (2011), and Roth et al. (2012).

Figure 2A. Black-capped chickadees: hippocampus volume and the number of neurons in wild-caught birds.

Figure 2B. Black-capped chickadees: hippocampus volume and the number of neurons in wild-caught birds.

Figure 2C. Black-capped chickadees: neurogenesis in wild-caught birds.

Figure 2D. Black-capped chickadees: hippocampus volume, the number of neurons, and neurogenesis in hand-reared vs. wild-caught birds.

Mountain chickadees experience different winter conditions on a much smaller spatial scale along an elevation gradient of winter climate severity in the mountains, with birds at higher elevations experiencing longer and colder winters (Freas, LaDage, Roth, & Pravosudov, 2012). Higher elevations are associated with significantly lower winter temperatures (likely requiring more food intake to meet higher energetic demands), a longer winter period with limited natural (i.e., not cached) food supply (likely increasing reliance on food caches for overwinter survival), and significantly more snow cover (both on the ground and on trees) that likely limits access to some potential foraging substrates. Similarly to black-capped chickadees from different winter conditions, mountain chickadees from higher elevations in the Sierra Nevada had a stronger propensity to cache food, better spatial memory ability, larger hippocampus volume, higher total number and larger soma size of hippocampal neurons, and higher hippocampal neurogenesis rates (Figure 3; Freas et al., 2012; Freas, Bingman, et al., 2013; Freas, Roth, LaDage, & Pravosudov, 2013).

Figure 3. Hippocampus volume (A, D), total number of hippocampal neurons (B, E), adult hippocampal neurogenesis rates (C), and telencephalon (minus the hippocampus) volume (F) in mountain chickadees sampled at different elevations directly from the wild (without experiencing captive conditions; A, B, C) and in chickadees captured as juveniles and maintained in the same controlled laboratory conditions for several months (D, E, F — filled circles; open circles represent birds sampled directly from the wild for comparison). From Freas et al. (2012) and Freas, Bingman, et al. (2013).

Figure 3A. Mountain chickadees: hippocampus volume in wild-caught birds.

Figure 3B. Mountain chickadees: the number of neurons in wild birds.

Figure 3C. Mountain chickadees: neurogenesis in wild-caught birds.

Figure 3D. Mountain chickadees: hippocampus volume in captive vs. wild-caught birds.

Figure 3E. Mountain chickadees: the number of neurons in captive vs. wild-caught birds.

Figure 3F. Mountain chickadees: telencephalon volume in captive vs. wild-caught birds.

Overall, these combined data on 10 populations of black-capped chickadees (with the data on two of these populations collected twice during different years) and on mountain chickadees from three different elevations are highly consistent in showing significant differences in food caching propensity, spatial memory, and hippocampus morphology related to winter climate. This pattern is, in turn, consistent with the hypothesis that population variation associated with differences in winter climate might be produced by natural selection acting on food caching–related spatial memory (Pravosudov & Roth, 2013).

Harsher environments are likely associated with increased reliance on food caches for overwinter survival and therefore should favor more intense food caching and better spatial memory ability needed to recover food caches. Differential winter mortality based on individual variation in food caching propensity, spatial memory, and hippocampus morphology supporting spatial memory might be expected to result in evolutionary changes in both memory and its neural mechanisms (Pravosudov & Roth, 2013). It is also possible that both memory and hippocampus morphology flexibly adjust to local conditions (e.g., Clayton & Krebs, 1994; Woollett & Maguire, 2011), and that climate-dependent population variation is a product of such environment-induced phenotypic plasticity.

Potential Causes of Climate-Related Variation in Spatial Memory and the Hippocampus

Understanding the causes of climate-dependent population variation in spatial memory and the hippocampus is important for our understanding of both the evolution of cognition and how animals might respond to changing environments and to changes in climate. Most data available so far point toward natural selection acting on heritable mechanisms underlying individual differences in spatial memory and the hippocampus as the main driver for the observed climate-related variation in food-caching chickadees in the following ways:

  1. Population differences in both species have been detected in juvenile birds prior to experiencing their first winter conditions even though climatic conditions during late summer and early autumn do not appear to be energetically challenging, food is usually superabundant, and chickadees mostly cache, but do not retrieve their long-term food caches (e.g., Pravosudov, 1983).
  2. In both species, laboratory conditions did not eliminate population differences in food caching rates, spatial memory performance, and some hippocampal properties (most notably the total number of neurons; Freas et al., 2012; Freas, Bingman, et al., 2013; Pravosudov & Clayton, 2002).
  3. In black-capped chickadees, birds from the two extreme populations (Alaska and Kansas) were hand-reared from the nestling age when the eyes were still closed (10 days of age) and maintained in controlled laboratory conditions during their entire life. Yet hand-reared chickadees from Alaska showed higher food caching rates, displayed better spatial memory performance, were better at novel problem solving, and had significantly larger total number and soma size of hippocampal neurons, higher total number of glial cells, and higher hippocampal neurogenesis rates (Freas, Roth, et al., 2013; Roth, LaDage, Freas, & Pravosudov, 2012). At the same time, the total number of hippocampal neurons and hippocampal neurogenesis rates were statistically similar between wild-caught and “common garden” chickadees from their respective populations. Even though the reason remains unknown, stable number of total neurons and higher neurogenesis rates in Alaska chickadees suggest higher cell death compared to more southern birds.
  4. Significant differences in hippocampal gene expression were detected between “common garden” black-capped chickadees hand-reared from the two extremely different environments in genes known to be involved in neurogenesis and other hippocampal processes even though these birds spent their entire life (from day 10 of age) in the same controlled laboratory conditions (Pravosudov et al., 2013).

All of these data suggest, albeit indirectly, that population differences are unlikely to be a direct plastic response to variation in environmental conditions associated with differential demands for food caches. It remains possible, however, that population differences arise following some triggers during early life or during development. If so, it appears unlikely that such triggers involve differences in food caching–related experiences. It has been shown that memory-based caching experiences are critical for hippocampus development, yet it appears that just a few caching and cache-retrieval experiences are sufficient for full hippocampus development (Clayton, 1996, 2001; Clayton & Krebs, 1994). Considering that both black-capped and mountain chickadees cache thousands of food items starting in late summer (Brodin, 2005), it is clear that chickadees in all populations exceed the minimum threshold shown to be critical for hippocampus development (Pravosudov & Roth, 2013). Yet, even when food caching was severely limited in laboratory conditions both in chickadees collected as juveniles after having some food caching experiences and in chickadees hand-reared as nestlings prior to any caching experiences, significant differences in spatial memory performance and in most hippocampal properties remained (Freas et al., 2012; Freas, Roth, et al., 2013; Roth et al., 2012). Nevertheless, the possibility that climate-related differences in memory and the hippocampus are associated with epigenetic (e.g., developmental) or maternal (e.g., yolk hormones) effects remains viable and, as of yet, untested.

What Is Plastic in the Hippocampus: Experimental Studies

Although all studies so far have been unable to eliminate population differences in memory and the hippocampus by manipulating environmental conditions, these studies provided important information about the plasticity of the hippocampus and suggested that some hippocampal properties are very plastic (e.g., hippocampus volume, neuron soma size, total number of glial cells), but others are not (total number of hippocampal neurons).

Hippocampus Volume

Many studies testing the hypothesis that interspecific variation in hippocampus size represents adaptive specialization related to memory-dependent food caching behavior (e.g., Krebs et al., 1989; Sherry et al., 1989) used hippocampus volume as a dependent measure. Population comparisons of both black-capped and mountain chickadees also used hippocampus volume among many other hippocampal properties and reported significant climate-related differences (Freas et al., 2012; Freas, Roth, et al., 2013; Pravosudov & Clayton, 2002; Roth et al., 2011; Roth & Pravosudov, 2009). Yet, hippocampus volume is undoubtedly one of the most plastic of all hippocampal properties. Multiple studies documented that when chickadees and other passerine birds are brought into laboratory conditions, their hippocampus volume shrinks by about 30% (LaDage, Roth, Fox, & Pravosudov, 2009; Smulders, Shiflett, Sperling, & DeVoogd, 2000; Tarr, Rabinowitz, Imtiaz, & DeVoogd, 2009). Hippocampus volume in black-capped chickadees that have been hand-reared and maintained in controlled laboratory conditions was also significantly smaller than that in chickadees sampled directly from the wild and without any period of captivity (Roth et al., 2012).

The effect of memory-based experiences on the development of the hippocampus has been well documented for young, inexperienced-in-food-caching parids (Clayton, 1996, 2001; Clayton & Krebs, 1994). If inexperienced young birds are deprived of food caching and cache retrieval experiences, their hippocampus volume remains smaller than that of adults or young birds provided such experiences. Most important, only a few caching experiences are needed for the hippocampus to reach its full volume, and further experiences do not result in any additional increases in volume (Clayton, 2001; Clayton & Krebs, 1994). At the same time, restriction of memory-based experiences in “experienced” birds has been suggested to result in hippocampus volume reductions (Clayton & Krebs, 1994). This latter finding, however, was not supported by another study using wild-caught birds in a controlled laboratory environment, which showed no differences in hippocampus volume between experienced mountain chickadees deprived of food caching and cache retrieval experiences for several months and chickadees regularly engaged in these activities (LaDage et al., 2009).

It is unclear which specific mechanisms result in captivity-related changes in hippocampus volume. For birds caught as juveniles/adults and brought into captive laboratory conditions, captivity-related stress is a likely cause (Roth et al., 2012). At the same time, experimental manipulations of memory use and food caching and retrieval in captive conditions failed to produce significant differences in hippocampus volume (LaDage et al., 2009), which suggests that memory use alone might not have a strong effect on hippocampus volume in experienced birds.

It is also possible that memory use does not show any effects on hippocampus volume specifically in captive birds, which already have a much reduced hippocampus volume due to the captive environment. Yet manipulations of memory use in captivity do have an effect on other hippocampal processes such as adult neurogenesis rates (LaDage, Roth, Fox, & Pravosudov, 2010). In contrast to avian studies, human learning experiences are correlated with posterior hippocampus volume (Woollett & Maguire, 2011), but there were no structural changes in individuals who trained but failed to learn spatial information. It remains unclear, however, what exactly changed in the human hippocampus to produce the increased volume.

Seasonal changes in food caching are associated with changes in day length, yet photoperiod manipulations in captive chickadees aimed to simulate seasonal changes in day length also failed to generate significant differences in hippocampus volume, even though such manipulations affected food caching rates (Hoshooley, Phillmore, & MacDougall-Shackleton, 2005; Krebs, Clayton, Hampton, & Shettleworth, 1995; MacDougall-Shackleton, Sherry, Clark, Pinkus, & Hernandez, 2003).

All in all, hippocampus volume exhibits a large degree of plasticity, but it remains unclear whether such plasticity is memory dependent in fully developed, experienced food-caching chickadees.

Hippocampal Neuron Soma Size

In both black-capped and mountain chickadees, hippocampal neuron soma size was significantly associated with winter climate severity, with birds in harsher environments having larger hippocampal neuron soma (Figure 4; Freas, Bingman, et al., 2013). Similar to the hippocampus volume, hippocampal neuron soma size appears highly plastic, and captivity resulted in significant soma size reduction in both black-capped and mountain chickadees (Figure 4; Freas, Bingman, et al., 2013; Freas, Roth, et al., 2013). Furthermore, it appears that captivity specifically affected neuron soma size in the hippocampus but not in the areas adjacent to the hippocampus (Freas, Bingman, et al., 2013). Despite significant reduction in hippocampal neuron soma size due to captive conditions, population differences remained significant in the hand-reared black-capped chickadees from the two extremely different environments (Freas, Bingman, et al., 2013). The fact that chickadees from the harsher environment still had significantly larger hippocampal neuron soma even though they spent their entire life (from day 10 of age) in the same controlled laboratory environment as chickadees from the milder environment suggests that these differences are regulated, at least in part, by some heritable mechanisms.

Figure 4. Mean hippocampal neuron soma size in wild black-capped chickadees (A) along environmental gradients and in wild-caught mountain chickadees (B) from different elevations. Mean hippocampal neuron soma size (C) as well as neuron soma size in brain area HA (G) and M — mesopallium (D) in mountain chickadees from a single elevation (mid) sampled directly from the wild and captured as juveniles. These birds were maintained in laboratory conditions under two treatments: deprived (no food caching and cache retrieval experiences) and experienced (regular food caching and cache retrieval experiences). Mean hippocampal neuron soma size in black-capped chickadees (E) from two environments at the extremes of the winter harshness range sampled directly from the wild (filled circles) and hand-reared and maintained in controlled laboratory environment (open circles). Mean hippocampal neuron soma size in mountain chickadees (F) from two elevations, both sampled directly in the wild (open circles) and captured as juveniles, but maintained in a controlled laboratory environment (filled circles). From Freas, Bingman, et al. (2013).

Figure 4A. Black-capped chickadees: neuron soma size in wild-caught birds.

Figure 4B. Mountain chickadees: neuron soma size in wild-caught birds.

Figure 4C. Mountain chickadees: hippocampal neuron soma size in wild-caught and captive birds with differences in memory use.

Figure 4D. Mountain chickadees: M neuron soma size in wild-caught and captive birds with differences in memory use.

Figure 4E. Black-capped chickadees: neuron soma size in hand-reared vs. wild-caught birds.

Figure 4F. Mountain chickadees: neuron soma size in captive vs. wild-caught birds.

Figure 4G. Mountain chickadees: HA neuron soma size in wild-caught and captive birds with differences in memory use.

Similar to hippocampus volume, it remains unclear what exactly causes the reduction in hippocampal neuron soma size associated with a captive environment. Experimental manipulation of memory-based food caching and cache recovery did not produce any detectable effects on hippocampal neuron soma size, yet this manipulation did have a significant effect on hippocampal neurogenesis rates (Freas, Bingman, et al., 2013). So it is possible that neuron soma size reduction might be due to stress associated with captivity in birds captured as juveniles or adults (as in LaDage et al., 2009; LaDage et al., 2010). On the other hand, neuron soma were also significantly smaller in the “common garden” black-capped chickadees, which spent their entire life in controlled laboratory conditions and it is unlikely that these birds experienced captivity-associated stress similar to wild-caught birds (Freas, Bingman, et al., 2013). For example, hippocampal neurogenesis rates in these “common garden” birds were statistically indistinguishable from those in wild-caught birds that experienced natural, and unquestionably much richer, environments (Roth et al., 2012).

Overall, experimental results suggest that environment-related changes in hippocampus volume could be at least partially due to changes in hippocampal neuron soma size. Interestingly, captivity had no effect on telencephalon volume in chickadees (Freas, Roth, et al., 2013; LaDage et al., 2009) and also no effect on neuron soma size in telencephalic areas adjacent to the hippocampus (Freas, Bingman, et al., 2013). While it is extremely likely that captivity-associated stress is one of the drivers for such changes, it remains unclear how memory-related experiences might affect hippocampal neuron soma size. At least in captive birds collected as juveniles from the wild, manipulating the number of memory experiences failed to produce a detectable effect on hippocampal neuron soma size (Freas, Bingman, et al., 2013).

Hippocampal Glia Numbers

The total number of hippocampal glial cells was significantly different between the two populations of black-capped chickadees from extremely different environments, with birds from the harsher environment having more glia (Figure 5; Roth, LaDage, Chavalier, & Pravosudov, 2013). At the same time, the number of glia also showed environment-induced plasticity, as chickadees that were hand-reared and maintained in the same controlled laboratory environment had significantly fewer hippocampal glial cells compared to juvenile wild-caught birds (Roth et al., 2013). Both population- and captivity-related differences in the number of hippocampal glia closely followed differences in hippocampus volume and in hippocampal neuron soma size, which suggests that plasticity in hippocampus volume is likely due, at least in part, to changes in the number of glia. At the same time, population differences in glia still remained significant even in birds that were hand-reared and maintained in the same controlled laboratory environment, a result that suggests involvement of some heritable mechanisms underlying population differences (Roth et al., 2013). Overall, it appears that the number of hippocampal glial cells is both plastic and, to a degree, controlled by some heritable mechanisms, which might respond to selection pressure associated with environmental differences.

Figure 5. Mean total number of hippocampal glial cells in black-capped chickadees from two populations from the extremes of the environmental harshness range sampled both directly from the wild (filled circles) and hand-reared and maintained in the same controlled laboratory environment (open circles). From Roth et al. (2013).

Hippocampal Neuron Numbers

In both black-capped and mountain chickadees, significant population differences in the total number of hippocampal neurons were associated with winter climate harshness (Freas et al., 2012; Pravosudov & Clayton, 2002; Roth et al., 2011; Roth & Pravosudov, 2009). Chickadees from harsher environments had significantly more hippocampal neurons. In contrast to all other previously discussed hippocampal properties, the total number of neurons does not appear to be plastic. A captive environment resulted in significant reductions in hippocampus volume, neuron soma size, and glial numbers, but not in the total number of neurons (Freas, Bingman, et al., 2013; Freas, Roth, et al., 2013; LaDage et al., 2009). In mountain chickadees, two independent studies confirmed that a period of several months in captivity produced no significant effects on the total number of hippocampal neurons in birds collected as experienced juveniles (Figures 4, 6; Freas, Roth, et al., 2013; LaDage et al., 2009). In black-capped chickadees, birds that were hand-reared and maintained in the same controlled laboratory environment had a statistically similar total number of hippocampal neurons to chickadees sampled as experienced juveniles in their natural environment (Figure 2; Roth et al., 2012). Furthermore, in both species, there were significant differences related to variation in winter climate in the number of hippocampal neurons both in wild-caught and captivity-maintained individuals (Freas, Roth, et al., 2013; Roth et al., 2012). Therefore, whereas population differences in hippocampus volume were associated with differences in the total number of hippocampal neurons, within-population changes in hippocampus volume were independent of the total number of neurons. These results suggest that the total number of hippocampal neurons is most likely controlled by some heritable mechanisms, which could be acted upon by natural selection.
While at least some population variation in hippocampus volume might be due to potential differences in experiences, population variation in the total number of hippocampal neurons does not appear to be influenced directly by the environment. Even when hippocampus volume was reduced by as much as 30% in captivity, the number of neurons appeared to remain unchanged. So, the number of neurons might serve as a more rigid hippocampus structure, while the neuron soma size (and likely associated arborization/connectivity) and the number of glial cells are prone to changes due to immediate environmental conditions, which could produce changes in hippocampus volume independent of the number of neurons.

Hippocampal Neurogenesis

Adult hippocampal neurogenesis, a process of production, survival, and recruitment of new neurons in the hippocampus, has been generally linked to spatial learning (e.g., Barnea & Pravosudov, 2011). As food-caching birds appear to rely on spatial memory to recover their food caches, hippocampal neurogenesis is likely an important process that might potentially be under selection. A two-species comparison indeed showed that a food-caching species had significantly higher hippocampus neurogenesis rates (Hoshooley & Sherry, 2007). In both black-capped and mountain chickadees, adult hippocampus neurogenesis rates (estimated as the number of new immature neurons) were significantly associated with winter climate harshness, with birds from harsher climates having higher neurogenesis rates (Figures 2, 3; Chancellor et al., 2011; Freas et al., 2012). These population differences were in general agreement with the data on all other hippocampal properties: harsh winter climate was associated with larger hippocampus volume, larger total number and soma size of hippocampal neurons, larger total number of hippocampal glia cells and higher adult hippocampal neurogenesis rates. The question is whether these climate-related population differences in neurogenesis rates reflect plastic adjustments to local conditions and experiences or whether these differences might be, at least in part, controlled by some heritable mechanisms.

Results of experimental studies in food-caching birds suggest that adult hippocampal neurogenesis is significantly reduced in captive chickadees captured as experienced juveniles or adults (Figure 6; Barnea & Nottebohm, 1994; LaDage et al., 2010), and that spatial memory experiences additionally affect hippocampal neurogenesis rates in wild-caught captive chickadees (LaDage et al., 2010). Mountain chickadees maintained in captive laboratory conditions, but allowed to engage in memory-based food caching and cache retrieval, had significantly higher neurogenesis rates compared to captive chickadees denied such experiences. At the same time, even experienced chickadees had significantly, and substantially, lower hippocampal neurogenesis rates than birds sampled directly from the wild (i.e., trapped and sacrificed without experiencing captivity; LaDage et al., 2010).

Figure 6. Effect of captivity and food caching related memory use on telencephalon (minus the hippocampus) volume (A), hippocampus volume (B), total number of hippocampal neurons (C) and adult hippocampal neurogenesis rates (D, E) in mountain chickadees. From LaDage et al. (2009, 2010).

Figure 6A. Mountain chickadees; telencephalon volume in wild-caught and captive birds with differences in memory use.

Figure 6B. Mountain chickadees; hippocampus volume in wild-caught and captive birds with differences in memory use.

Figure 6C. Mountain chickadees; hippocampal neuron numbers in wild-caught and captive birds with differences in memory use.

Figure 6D. Mountain chickadees; proportion of new hippocampal neurons in wild-caught and captive birds with differences in memory use.

Figure 6E. Mountain chickadees; hippocampal neurogenesis in wild-caught and captive birds with differences in memory use.

Tarr et al. (2009) was so far the only study that reported no significant effect of captivity on new hippocampal neuron survival in black-capped chickadees, a result that is strikingly different from those reported in at least two other studies (Barnea & Nottebohm, 1994; LaDage et al., 2010). It is unclear why there was such a discrepancy among the studies, but Tarr et al. (2009) used methods that differ from those in all other studies. For example, Tarr et al. (2009) used multiple covariates, such as body mass, brain mass, and telencephalon volume (including the hippocampus), in their analyses of the effect of captivity on the number of new cells. Use of these continuous variables as covariates can significantly affect the results concerning the effect of captivity on neuron survival rate, yet the effect of captivity on these variables has not been reported. Using the hippocampus volume as part of the overall telencephalon volume might confound the results, as the hippocampus volume is known to be affected by captivity. The question is whether new neuron survival is affected independently of any changes in the hippocampus volume. Unfortunately, Tarr et al. (2009) did not report analyses based on raw numbers of new surviving neurons, so it remains unclear whether there was an effect of captivity on the total number of new neurons. Barnea and Nottebohm (1994) reported a significant reduction in new neuron survival in captive black-capped chickadees. In mountain chickadees, captivity resulted in a more than 30% reduction in the number of new immature neurons, although because of the methods used to label new neurons, this number represents a combination of new immature neurons of different ages and therefore combines new neuron production and neuron survival (LaDage et al., 2010).

To add more confusion, there were no significant differences in adult neurogenesis rates (combined new neuron production and survival) between black-capped chickadees sampled directly from the wild and birds hand-reared and maintained in controlled laboratory conditions for many months (Figure 2; Roth et al., 2012). The only difference between the “common garden” study and all other chickadee studies mentioned above was that captive birds in the “common garden” study had never experienced “the wild,” while in the other studies wild-caught experienced birds were brought into the lab. Such results suggest that captivity-related differences in neurogenesis rates might be directly affected by stress of captivity in wild-caught birds, whereas hand-reared birds might not be affected by such stress (Roth et al., 2012). It is also likely that most laboratory rodent and avian studies showing environmental effects on hippocampal neurogenesis (e.g., review in Barnea & Pravosudov, 2011) also detect neurogenesis rates much below the normal “base” levels, which could indeed be improved by even slight environmental changes in extremely impoverished lab conditions. For example, Hall et al. (2014) reported significant effects of flight exercise on adult neurogenesis using doublecortin staining to quantify neurogenesis in adult starlings (Sturnus vulgaris) captured and maintained in a laboratory. The number of new neurons reported in Hall et al. (2014) is much smaller than that reported for wild chickadees using the same method (LaDage et al., 2010; Roth et al., 2012). Even though starlings are not a food-caching species and so likely have lower levels of hippocampal neurogenesis (Hoshooley & Sherry, 2007), it is also very likely that these numbers are much reduced due to captivity, and so additional exercise might simply reduce the effect of captivity-related stress on neurogenesis rather than have an additive effect on the naturally present baseline.
Interestingly, photoperiod manipulations designed to imitate seasonal day length changes associated with seasonal variation in food caching activity also failed to produce any significant differences in hippocampal neurogenesis rates in captive birds (Hoshooley et al., 2005), even though such manipulations are known to affect food caching rates (MacDougall-Shackleton et al., 2003).

The major question is whether there is a threshold after which additional enhancements do not have any effects on neurogenesis. Our “common garden” experiment results certainly point in that direction as unstressed, hand-reared birds maintained in a relatively enriched captive environment (large cages, unrestricted food caching experiences) have similar hippocampal neurogenesis rates to the wild birds that experience an immensely richer natural environment. At the same time, memory experiences in likely stressed captive birds captured as juveniles or adults appear to ameliorate the negative effect of stress on neurogenesis (LaDage et al., 2010). Interestingly, food caching–related learning experiences have also been reported to increase hippocampal neurogenesis rates in juvenile “experience-naïve” hand-reared marsh tits (P. palustris; Patel, Clayton, & Krebs, 1997), but such an increase appears to be related to the initial memory experiences responsible for hippocampus growth and development rather than to experience-based adult neurogenesis in experienced birds.

The question remains, however, whether any additional experiences would also lead to increased neurogenesis rates given a hypothetical threshold. Results with “common garden” chickadees are certainly consistent with the threshold hypothesis as it would be difficult to explain otherwise why birds that spent their entire life in laboratory conditions had statistically indistinguishable neurogenesis rates from their conspecifics in natural conditions in the wild. These results also suggest that mechanisms regulating adult hippocampal neurogenesis rates might be heritable and therefore a potential target for natural selection acting on spatial memory.

It is also possible that food-caching species might be different from other non-caching species in maintaining hippocampal neurogenesis at high levels at all times. For example, hippocampal neurogenesis rates were almost three times as high in food-caching black-capped chickadees as in non-caching house sparrows (Passer domesticus) even after spending six weeks in captivity (Hoshooley & Sherry, 2007).

Conclusions of Experimental Studies

Experimental studies manipulating environment/experiences in food-caching chickadees suggest that most hippocampal properties, with the exception of neuron number, are likely both plastic and at the same time controlled by some heritable mechanisms. Environment-induced plasticity in hippocampus volume appears to be related to plasticity in hippocampal neuron soma size and the number of glial cells, but not in the total number of neurons. The total number of hippocampal neurons, on the other hand, appears to be fairly constant regardless of environmental manipulations, suggesting that it is regulated by some heritable mechanism(s).

Plastic changes due to experimental manipulations in hippocampus volume, neuron soma size, and the number of glia cells also do not override population differences associated with winter climate harshness, which further suggests that such differences are likely due, at least in part, to natural selection acting on food caching–related spatial memory. It appears that the main differences among populations are based on the differences in the total number of hippocampal neurons while neuron morphology (soma) and the number of glia cells exhibit additional experience-based variation. It remains unclear, however, how much of such variation is due to differences in memory-based experiences versus stress and whether any “positive” effects in laboratory studies are still well below the baseline natural levels.

Correlational Studies: Seasonal Variation

Food-caching birds present a good case to better understand plasticity of the brain because of the highly distinct seasonality in food caching behavior (Pravosudov, 2006). Food-caching parids such as chickadees cache tens of thousands of food items during late summer–early fall (e.g., long-term caching; Brodin, 2005) and might also cache again in spring (Pravosudov, 2006), while caching much less (e.g., short-term caching) during the winter and potentially not caching at all during summer.

The three studies that brought a large amount of interest to brain plasticity associated with food caching seasonality in black-capped chickadees showed that hippocampal neuron incorporation rates were higher during late autumn (Barnea & Nottebohm, 1994) and hippocampus volume and the total number of neurons were also highest during autumn (Smulders, Sasson, & DeVoogd, 1995; Smulders et al., 2000). Smulders et al. (2000) used birds from the Smulders et al. (1995) study and estimated the total number of hippocampal neurons based on the hippocampus volume. These latter two studies received especially visible attention from public media, which frequently stated that food-caching chickadees can enlarge their hippocampi by 30% every year. Unfortunately, all available evidence combined (see below) does not support these initial claims.

First, even the initial studies provided conflicting information about seasonal changes in the number of neurons. Smulders et al. (2000) reported significant seasonal variation in the total number of hippocampal neurons, but Barnea and Nottebohm (1994) failed to detect such seasonal variation in the same species while reporting variation in hippocampal neuron incorporation rates only. At least two additional studies also failed to replicate results reported in Smulders et al. (1995) and Smulders et al. (2000) by showing no significant seasonal variation in either hippocampus volume or the total number of hippocampal neurons in black-capped chickadees (Hoshooley & Sherry, 2004; Hoshooley, Phillmore, Sherry, & MacDougall-Shackleton, 2007). These two latter studies also reported somewhat conflicting results on seasonal variation in adult hippocampal neurogenesis; Hoshooley and Sherry (2004) failed to detect significant seasonal variation in new neuron survival over 1–2 weeks, but Hoshooley et al. (2007) reported significantly higher new neuron survival rates over a 1-week period in January. In addition, Hoshooley and Sherry (2007) reported that chickadees sampled in autumn (October–November) had significantly smaller hippocampus volume and a smaller number of hippocampal neurons compared to chickadees sampled in spring (March–April), a result that goes directly against the initial reports of a larger hippocampus in autumn (Smulders et al., 1995). At the same time, Hoshooley and Sherry (2007) detected no significant differences in hippocampal neurogenesis rates (new neuron survival over 6 weeks) between chickadees sampled in autumn and in spring. Finally, experimental manipulations of photoperiod in laboratory-maintained chickadees failed to produce any significant differences in hippocampus volume or hippocampal neurogenesis rates despite significantly affecting food caching rates (Hoshooley et al., 2005; Krebs et al., 1995; MacDougall-Shackleton et al., 2003).
Overall, these results do not seem to provide convincing support that any of the hippocampal properties vary consistently and specifically in relation to seasonal cycle of memory-based food caching and cache retrieval. So why are there such discrepancies among the studies?

Hippocampus Volume

Using the same species in generally similar environmental conditions (Ithaca, New York and London, Ontario), one study reported significant seasonal variation in hippocampus volume (Smulders et al., 1995), the other two detected no seasonal variation (Hoshooley et al., 2007; Hoshooley & Sherry, 2004), and the fourth actually reported that chickadees sampled in autumn had significantly smaller hippocampus volume compared to chickadees sampled in spring (Hoshooley & Sherry, 2007). There are a couple of potential explanations for these differences.

  1. Birds have been sampled in different years and in different locations, so it is possible that seasonal variation was present only in some years or only at a particular location. If that were the case, it would suggest that seasonal variation in hippocampus volume is likely not a regular phenomenon, but it might sometimes occur. Considering that winter climate conditions might be expected to be somewhat similar at both locations, this explanation does not seem likely.
  2. The two labs used different methods to generate hippocampus volume estimates. Smulders et al. (1995) adjusted hippocampus volumes for the overall brain shrinkage (measured as the change in brain mass after the post-perfusion fixation process), which showed significant seasonal variation. Hoshooley and Sherry (2004, 2007) and Hoshooley et al. (2007) did not use such an adjustment. It is unfortunate that Smulders et al. (1995) did not report their data without adjusting for potential brain shrinkage so that it would be possible to evaluate whether these differences between the studies might be due to such an adjustment. At the same time, the purpose of such an adjustment is not entirely clear since hippocampus volume is measured relative to the rest of the telencephalon. In other words, even if the entire brain shrinks more, the ratio of hippocampus to telencephalon should remain the same, assuming that shrinkage is not influenced by region. Adjusting for shrinkage, on the other hand, might potentially generate spurious results specifically in regard to the relative hippocampus volume.
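
The shrinkage argument in the second explanation can be written as a short worked equation. This is only a sketch: the symbols V_H and V_T (hippocampus and telencephalon volumes before fixation) and the shrinkage factors s, s_H, and s_T are notation introduced here for illustration and do not come from the studies discussed.

```latex
% Uniform shrinkage by a common factor s (0 < s <= 1) cancels in the ratio:
\[
\frac{s\,V_H}{s\,V_T} = \frac{V_H}{V_T}
\]
% Region-specific shrinkage (s_H \neq s_T) does not cancel:
\[
\frac{s_H\,V_H}{s_T\,V_T} = \frac{s_H}{s_T}\cdot\frac{V_H}{V_T}
\]
```

In the first case a whole-brain correction is unnecessary for the relative measure; in the second, a single whole-brain factor cannot remove the bias because the bias is region specific.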

Seasonal Variation in the Total Number of Hippocampal Neurons

Again, seasonal variation in the total number of hippocampal neurons was reported in a single study (Smulders et al., 2000), while two other studies reported no significant seasonal variation (Barnea & Nottebohm, 1994; Hoshooley & Sherry, 2004) and one study actually reported the opposite pattern by showing that chickadees sampled in autumn had a significantly smaller number of hippocampal neurons than chickadees sampled in spring (Hoshooley & Sherry, 2007). These studies did not use unbiased stereological methods (e.g., optical fractionator; West, Slomianka, & Gundersen, 1991) to estimate the total number of neurons, but instead either counted cells only in some nonrandomly chosen areas (e.g., Smulders et al., 2000) and/or seemed to use neuron densities (number of cells divided by volume). Cell density is directly dependent on hippocampus volume, and any shrinkage/variation in volume due to tissue processing could potentially produce biased results when the hippocampus volume, but not the number of neurons (or vice versa), shows significant variation. The optical fractionator method provides an estimate that is independent of tissue shrinkage or other variation in volume that is not associated with changes in neuron numbers (e.g., West et al., 1991). The optical fractionator method does depend on the volume, as a larger volume would result in more counting frames, which are used to estimate the total number of neurons. However, unlike direct density estimates (e.g., number of cells divided by volume), the optical fractionator would produce the same estimate for the number of cells if different volumes were associated with the same number of neurons.
Considering that at least two studies showed no significant differences in the total number of hippocampal neurons between wild and captive birds using stereological methods when the hippocampus volume differed by almost 30% (Freas, Roth, et al., 2013; LaDage et al., 2009), it does not seem likely that chickadees would exhibit regular significant seasonal variation in the total number of hippocampal neurons. In fact, black-capped chickadees sampled at almost the same time when Smulders et al. (2000) reported a significant peak in the number of neurons (October) had a statistically indistinguishable number of hippocampal neurons from those in chickadees that were hand-reared and maintained in controlled laboratory conditions and were sampled in spring (Roth et al., 2012). If the number of neurons reflected differences in memory-based food caching, it should be expected that wild chickadees at the peak of food caching should experience much higher memory demands than hand-reared birds living in relatively small cages, yet these two groups did not differ significantly in the total number of neurons (Roth et al., 2012). Finally, Hoshooley and Sherry (2007) also reported a higher number of hippocampal neurons in spring compared to autumn—a pattern opposite to the one suggested by Smulders et al. (2000).
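
The contrast between density-based measures and the optical fractionator can be illustrated with a minimal numerical sketch. It assumes the standard fractionator relation, in which the cells counted are multiplied by the inverse of the section, area, and thickness sampling fractions. All numbers below (sampling fractions, neuron total, and volumes) are hypothetical and chosen only for illustration; they are not data from any of the studies cited.

```python
"""Sketch: a density changes with volume, a fractionator total does not.

All values are hypothetical and serve only to illustrate the argument.
"""

# Hypothetical sampling design: the fraction of sections, of section area,
# and of section thickness that is actually examined.
SSF = 1 / 8    # section sampling fraction
ASF = 1 / 16   # area sampling fraction
TSF = 1 / 2    # thickness sampling fraction


def fractionator_total(cells_counted, ssf, asf, tsf):
    """Fractionator estimate: counted cells scaled by the inverse of the
    sampled fraction of the structure. No volume term is involved."""
    return cells_counted / (ssf * asf * tsf)


def neuron_density(cells_counted, volume_mm3, ssf, asf, tsf):
    """Cells per mm^3: counted cells divided by the sampled tissue volume."""
    sampled_volume = volume_mm3 * ssf * asf * tsf
    return cells_counted / sampled_volume


def expected_count(true_neurons, ssf, asf, tsf):
    """Cells expected inside the counting frames for a given true total."""
    return true_neurons * ssf * asf * tsf


TRUE_NEURONS = 1_024_000                       # hypothetical, same in both groups
VOLUMES_MM3 = {"wild": 10.0, "captive": 7.0}   # hypothetical, 30% smaller in captivity

results = {}
for group, volume in VOLUMES_MM3.items():
    counted = expected_count(TRUE_NEURONS, SSF, ASF, TSF)
    results[group] = {
        "total": fractionator_total(counted, SSF, ASF, TSF),
        "density": neuron_density(counted, volume, SSF, ASF, TSF),
    }

for group, values in results.items():
    print(f"{group}: total = {values['total']:.0f}, "
          f"density = {values['density']:.0f} per mm^3")
```

In this toy setup the fractionator total is identical for the two groups, whereas the density is higher in the smaller structure, so a comparison based on density alone would suggest a difference in neurons that does not exist.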

While it is impossible to say why only one of the four studies was able to report seasonal differences in the number of hippocampal neurons, considering all correlational and experimental evidence, it does not appear likely that the number of hippocampal neurons regularly exhibits food caching–related seasonal variation.

Hippocampal Neurogenesis

Data on seasonal variation in hippocampal neurogenesis rates in food-caching chickadees are also quite inconsistent. First, Barnea and Nottebohm (1994) reported that hippocampal new neuron incorporation rates were highest in black-capped chickadees injected with a new neuron marker in October and attributed these high rates to the peak of autumn food caching. Hoshooley and Sherry (2004, 2007) reported no significant seasonal variation in hippocampal neurogenesis rates in the same species, and Hoshooley et al. (2007) reported a peak in new hippocampal neuron survival rates in January (and potentially in April when neurogenesis rates were not statistically different from those sampled in January), much later than reported by Barnea and Nottebohm (1994).

Hippocampal neurogenesis is the only hippocampal attribute (among the ones considered here) that has indeed been experimentally linked to spatial memory use (LaDage et al., 2010). Based on such experimental evidence it might be plausible to expect that seasonal changes in memory use associated with food caching might indeed produce seasonal changes in hippocampal neurogenesis rates. Yet available evidence does not seem to provide unequivocal support for the idea that changes in hippocampal neurogenesis rates track seasonal changes in memory use associated with food caching.

It is likely that chickadees use spatial memory both when they make tens of thousands of food caches during late summer–early fall (e.g., Male & Smulders, 2007) and throughout the winter when they recover these caches (see references in Pravosudov & Smulders, 2010). So it is not clear whether memory use (all aspects, including memory acquisition during caching, memory formation, and memory recall used either during cache retrieval or when making other caches relative to locations of previously made caches) should be higher during the peak of caching or the entire winter. See Barnea and Pravosudov (2011) for more discussion about neurogenesis.

If memory use is heaviest during the peak of caching, it might be expected that the highest neurogenesis rates should be in late August–September and early October at the latest (Pravosudov, 2006). If new neurons are needed for new memories, new neurons should be incorporated into the existing hippocampal circuits during that time and new neuron production could be triggered at the beginning of intense food caching in late August. Yet, Barnea and Nottebohm (1994) detected highest new neuron incorporation rates 6 weeks after injecting birds with a new neuron marker in October. So these new neurons were likely functional only in mid to late November, well after the peak of food caching, and are therefore unlikely to be related to memory needs associated with food caching (e.g., Barnea & Pravosudov, 2011). Results of Hoshooley et al. (2007) showed an even later peak in new neuron survival (January), which is not likely related to the food caching process.

If memory use is the highest during cache retrieval, it might be expected that food-caching chickadees use memory intensely during the entire winter, or at least during a few winter months, likely from November to February. The data from both Barnea and Nottebohm (1994) and Hoshooley et al. (2007) still do not fit such a pattern. Barnea and Nottebohm (1994) reported the highest neuron incorporation rates only in birds injected with a new neuron marker in October (measured 6 weeks later, likely in late November), but not in birds injected in December even though cache retrieval memory use should be as high in January as in November. Hoshooley et al. (2007), on the other hand, reported the highest hippocampal neuron 1-week survival rates in birds sampled in January–February, yet these rates were almost as high as (and statistically indistinguishable from) those in birds sampled in April–May, when cache retrieval should not be critical. At the same time, Hoshooley and Sherry (2004, 2007) did not detect any significant seasonal variation in hippocampal neurogenesis rates.

There are important differences between the Barnea and Nottebohm (1994), the Hoshooley and Sherry (2004), and the Hoshooley et al. (2007) studies concerning the measured period of new neuron survival (Barnea & Pravosudov, 2011). While Barnea and Nottebohm (1994) and Hoshooley and Sherry (2007) estimated 6-week survival, Hoshooley and Sherry (2004) and Hoshooley et al. (2007) measured 1–2 week survival. In the latter two studies and in Hoshooley and Sherry (2007), neuron survival was measured in captive birds, while Barnea and Nottebohm (1994) measured neuronal incorporation rates in free-ranging birds. Despite these differences, the observed patterns do not seem to fit any of the patterns predicted using seasonality of food caching and cache retrieval. One-to-two week survival might be potentially insufficient to detect important differences in neuron survival, as it may take more than 6 weeks for the new neurons to express adult phenotype (Hoshooley & Sherry, 2007), so the data presented in Hoshooley and Sherry (2004) and Hoshooley et al. (2007) might be more indicative of new neuron production rates. Yet, seasonal variation in 6-week survival rates reported in Barnea and Nottebohm (1994) still does not follow a pattern expected from seasonal variation in food caching and cache retrieval.

Finally, there are methodological differences concerning using tritiated thymidine (Barnea & Nottebohm, 1994) and BrdU (Hoshooley & Sherry’s studies) that might also produce potential differences in estimation of neurogenesis rates (Leuner, Glasper, & Gould, 2009).

Overall, the available data do not seem to provide clear evidence for robust food caching–related seasonal variation in adult hippocampal neurogenesis rates. While it is possible that there are some seasonal changes, they might be unrelated to food caching and associated with some other factors such as winter temperature or activity patterns. While chickadees captured as juveniles and maintained in captive conditions did show memory use–based increases in hippocampal neurogenesis, these increases did not compensate for the large captivity-related reduction in neurogenesis rates (LaDage et al., 2010). At the same time, black-capped chickadees hand-reared and maintained in laboratory conditions had statistically similar hippocampal neurogenesis rates (joint estimate of new neuron production and survival) to those in chickadees sampled directly from the wild during the peak of food caching (Roth et al., 2012). There is little doubt that birds in the wild must have more memory-based experiences than birds that spent their entire life in a relatively confined captive environment, yet such differences were not reflected by hippocampal neurogenesis rates. Such data are suggestive of some rather small threshold beyond which more experiences are not likely to produce an additional increase in hippocampal neurogenesis. Such a suggestion, however, remains a speculation at this point, and more data are needed to understand the patterns of association between memory use and neurogenesis.

Overall, there appears to be no clear evidence that the hippocampus undergoes robust and predictable seasonal changes associated specifically with food caching and/or cache retrieval. In fact, many studies reported no significant seasonal variation in any of the traits—hippocampus volume, total neuron numbers, or adult neurogenesis rates.

Overall Conclusions

Population comparisons of two species of food-caching chickadees experiencing different winter climate conditions provided highly consistent evidence of strong, environment-related variation in spatial memory and in hippocampus morphology, including hippocampus volume, total number and soma size of hippocampal neurons, total number of hippocampal glia, and adult hippocampal neurogenesis rates.

Experimental data suggest that some, but not all, of these hippocampal properties might be directly affected by the environment; however, in all cases the largest effects were due to captive environment. Memory-based experiences were only shown to up-regulate hippocampal neurogenesis rates in captive birds with neurogenesis rates already significantly reduced in captive conditions. All other hippocampal properties discussed here were unaffected by manipulations of such experiences. In contrast, birds that were hand-reared from an early age and maintained in a fairly enriched laboratory environment (large cages, ability to cache food in multiple substrates) had adult hippocampal neurogenesis rates statistically indistinguishable from those measured in wild birds in their immensely richer natural environment, which points toward a relatively small threshold in experiences beyond which adult neurogenesis rates do not appear to be affected by additional enriching experiences.

The fact that hippocampus volume might be affected by the environment without significant changes in the total number of neurons suggests that using neuron densities for evaluating cognitive abilities is not only incorrect, but could be misleading. For example, captivity is associated with a significant reduction in hippocampus volume, but not in the number of neurons, which results in higher density of hippocampal neurons in captive birds.
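The density argument can be made explicit with a short worked relation. The symbols below (N for total neuron number, V for hippocampus volume, ρ for neuron density, r for the fractional reduction in volume) are introduced here purely for illustration and are not taken from any of the studies reviewed.

```latex
\rho = \frac{N}{V}, \qquad
V_{\mathrm{captive}} = (1 - r)\,V_{\mathrm{wild}}, \quad
N_{\mathrm{captive}} = N_{\mathrm{wild}}
\;\Longrightarrow\;
\rho_{\mathrm{captive}}
  = \frac{N_{\mathrm{wild}}}{(1 - r)\,V_{\mathrm{wild}}}
  = \frac{\rho_{\mathrm{wild}}}{1 - r}
  > \rho_{\mathrm{wild}}
\quad (0 < r < 1).
```

With a hypothetical r = 0.2, density would rise by a factor of 1/0.8 = 1.25 even though no neurons were gained, so a higher density in this case would reflect a smaller volume rather than more neurons.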

Most evidence is consistent with the hypothesis that climate-related population variation in spatial memory and hippocampus morphology is produced by natural selection associated with individual heritable variation in spatial memory and its neural mechanisms. The fact that the total number of neurons does not change, even in extremely impoverished captive conditions, suggests the involvement of some heritable regulatory mechanisms. While the hippocampus volume, total number of glia, and neuron soma size can and do respond to direct environmental changes, these changes appear to be anchored around the total number of neurons, which seems quite stable. Although it remains untested whether individual variation in spatial memory and hippocampal morphology in birds is heritable and based on genetic variation, there is evidence from human research showing heritability of general cognitive ability, spatial ability, and hippocampus volume, as well as its genetic basis (e.g., Ando et al., 2001; Haworth et al., 2010; McGee, 1979; Pedersen et al., 1992; Plomin et al., 1994; Plomin & Spinath, 2002; Sullivan, Pfefferbaum, Swan, & Carmelli, 2001). Finally, there appears to be no unambiguous evidence showing consistent seasonal variation in hippocampus morphology directly related to the seasonal cycle of food caching and cache retrieval. In fact, experimental data suggest that at least the total number of neurons is not likely to vary seasonally.

Overall, it appears that environment-induced plasticity in hippocampus morphology related to hippocampus volume, size of hippocampal neurons, glia cell numbers, and even hippocampal neurogenesis rates might be anchored around the total number of hippocampal neurons, which appears to be regulated by some heritable mechanisms responsive to natural selection on food caching–related spatial memory. More research on hippocampus plasticity needs to be done on wild birds, as captive conditions generate strong negative effects, and no experience-based experimental manipulation in captive birds, especially those captured as juveniles or adults, can come close to restoring the baseline levels present in wild birds. Such strong captivity effects suggest that any results of experimental studies investigating brain plasticity should be considered cautiously.

References

Ando, J., Ono, Y., & Wright, M. J. (2001). Genetic structure of spatial and verbal working memory. Behavior Genetics, 31, 615–624. doi:10.1023/A:1013353613591

Barnea, A., & Nottebohm, F. (1994). Seasonal recruitment of hippocampal neurons in adult free-ranging black-capped chickadees. Proceedings of the National Academy of Sciences of the United States of America, 91, 11217–11221.

Barnea, A., & Pravosudov, V. V. (2011). Birds as a model to study adult neurogenesis: Bridging evolutionary, comparative and neuroethological approaches. European Journal of Neuroscience, 34, 884–907. doi:10.1111/j.1460-9568.2011.07851.x

Brodin, A. (2005). Hippocampal volume does not correlate with food-hoarding rates in the black-capped chickadee (Poecile atricapillus) and willow tits (Parus montanus). Auk, 122, 819–828. doi:10.1642/0004-8038(2005)122[0819:HVDNCW]2.0.CO;2

Chancellor, L. V., Roth, T. C., II, LaDage, L. D., & Pravosudov, V. V. (2011). The effect of environmental harshness on neurogenesis: A large-scale comparison. Developmental Neurobiology, 71, 246–252. doi:10.1002/dneu.20847

Clayton, N. S. (1996). Development of food-storing and the hippocampus in juvenile marsh tits (Parus palustris). Behavioural Brain Research, 74, 153–159. doi:10.1016/0166-4328(95)00049-6

Clayton, N. S. (2001). Hippocampal growth and maintenance depend on food-caching experience in juvenile mountain chickadees (Poecile gambeli). Behavioral Neuroscience, 115, 614–625. doi:10.1037/0735-7044.115.3.614

Clayton, N. S., & Krebs, J. R. (1994). Hippocampal growth and attrition in birds affected by experience. Proceedings of the National Academy of Sciences of the United States of America, 91, 7410–7414. doi:10.1073/pnas.91.16.7410

Ekman, J. (1989). Ecology of non-breeding social systems of Parus. Wilson Bulletin, 101, 263–288.

Freas, C., LaDage, L. D., Roth, T. C., II, & Pravosudov, V. V. (2012). Elevation-related differences in memory and the hippocampus in food-caching mountain chickadees. Animal Behaviour, 84, 121–127. doi:10.1016/j.anbehav.2012.04.018

Freas, C. A., Bingman, K., LaDage, L. D., & Pravosudov, V. V. (2013). Untangling elevation-related differences in the hippocampus in food-caching mountain chickadees: The effect of a uniform captive environment. Brain, Behavior and Evolution, 82, 199–209. doi:10.1159/000355503

Hall, Z. J., Bauchinger, U., Gerson, A. R., Price, E. R., Langlois, L. A., Boyles, M., et al. (2014). Site-specific regulation of adult neurogenesis by dietary fatty acid content, vitamin E and flight exercise in European starlings. European Journal of Neuroscience, 39, 875–882. doi:10.1111/ejn.12456

Haworth, C. M. A., Wright, M. J., Luciano, M., Martin, N. G., de Geus, E. J. C., van Beijsterveldt, C. E. M., et al. (2010). The heritability of general cognitive ability increases linearly from childhood to young adulthood. Molecular Psychiatry, 15, 1112–1120. doi:10.1038/mp.2009.55

Hogstad, O. (1989). Social organization and dominance behavior in some Parus species. Wilson Bulletin, 101, 254–262.

Hoshooley, J. S., Phillmore, L. S., & MacDougall-Shackleton, S. A. (2005). An examination of avian hippocampal neurogenesis in relationship to photoperiod. Neuroreport, 16, 987–991. doi:10.1097/00001756-200506210-00021

Hoshooley, J. S., Phillmore, L. S., Sherry, D. F., & MacDougall-Shackleton, S. A. (2007). Annual cycle of the black-capped chickadee: Seasonality of food-storing and the hippocampus. Brain Behavior and Evolution, 69, 161–168. doi:10.1159/000096984

Hoshooley, J. S., & Sherry, D. F. (2004). Neuron production, neuron number, and structure size are seasonally stable in the hippocampus of the food-storing black-capped chickadee (Poecile atricapillus). Behavioral Neuroscience, 118, 345–355. doi:10.1037/0735-7044.118.2.345

Hoshooley, J. S., & Sherry, D. F. (2007). Greater hippocampal neuronal recruitment in food-storing than in non-food-storing species. Developmental Neurobiology, 67, 406–414. doi:10.1002/dneu.20316

Krebs, J. R., Clayton, N. S., Hampton, R. R., & Shettleworth, S. J. (1995). Effects of photoperiod on food-storing and the hippocampus in birds. Neuroreport, 6, 1701–1704. doi:10.1097/00001756-199508000-00026

Krebs, J. R., Sherry, D. F., Healy, S. D., Perry, V. H., & Vaccarino, A. L. (1989). Hippocampal specialization of food-storing birds. Proceedings of the National Academy of Sciences of the United States of America, 86, 1388–1392. doi:10.1073/pnas.86.4.1388

LaDage, L. D., Roth, T. C., II, Fox, R. A., & Pravosudov, V. V. (2009). Effects of captivity and memory-based experiences on the hippocampus in mountain chickadees. Behavioral Neuroscience, 123, 284–291. doi:10.1037/a0014817

LaDage, L. D., Roth, T. C., II, Fox, R. A., & Pravosudov, V. V. (2010). Ecologically-relevant spatial memory use modulates hippocampal neurogenesis. Proceedings of the Royal Society B: Biological Sciences, 277, 1071–1079. doi:10.1098/rspb.2009.1769

Leuner, B., Glasper, E. R., & Gould, E. (2009). Thymidine analog methods for studies of adult neurogenesis are not equally sensitive. Journal of Comparative Neurology, 517, 123–133. doi:10.1002/cne.22107

Male, L. H., & Smulders, T. V. (2007). Memory for food caches: Not just for retrieval. Behavioral Ecology, 18, 456–459. doi:10.1093/beheco/arl107

MacDougall-Shackleton, S. A., Sherry, D. F., Clark, A. P., Pinkus, R., & Hernandez, A. M. (2003). Photoperiodic regulation of food storing and hippocampus volume in black-capped chickadees, Poecile atricapillus. Animal Behaviour, 65, 805–812. doi:10.1006/anbe.2003.2113

McGee, M. G. (1979). Human spatial abilities: Psycho-metric studies and environmental, genetic, hormonal, and neurological influences. Psychological Bulletin, 86, 889–918. doi:10.1037/0033-2909.86.5.889

Patel, S. N., Clayton, N. S., & Krebs, J. R. (1997). Spatial learning induces neurogenesis in the avian brain. Behavioural Brain Research, 89, 115–128. doi:10.1016/S0166-4328(97)00051-X

Pedersen, N. L., Plomin, R., Nesselroade, J. R., & McClearn, G. E. (1992). A quantitative genetic analysis of cognitive abilities during the second half of the life span. Psychological Science, 3, 346–353. doi:10.1111/j.1467-9280.1992.tb00045.x

Plomin, R., & Spinath, F. M. (2002). Genetics and general cognitive ability (g). Trends in Cognitive Sciences, 6, 169–176. doi:10.1016/S1364-6613(00)01853-2

Pravosudov, V. V. (1985). Search for and storage of food by Parus cinctus lapponicus and P. montanus borealis (Paridae). Zoologichesky Zhurnal, 64(7), 1036–1043.

Pravosudov, V. V. (2006). On seasonality of food caching behavior in parids: Do we know the whole story? Animal Behaviour, 71, 1455–1460. doi:10.1016/j.anbehav.2006.01.006

Pravosudov, V. V., & Clayton, N. S. (2002). A test of the adaptive specialization hypothesis: Population differences in caching, memory and the hippocampus in black-capped chickadees (Poecile atricapilla). Behavioral Neuroscience, 116, 515–522. doi:10.1037/0735-7044.116.4.515

Pravosudov, V. V., & Roth, T. C., II. (2013). Cognitive ecology of food hoarding: The evolution of spatial memory and the hippocampus. Annual Review of Ecology, Evolution, and Systematics, 44, 18.1–18.21. doi:10.1146/annurev-ecolsys-110512-135904

Pravosudov, V. V., Roth, T. C., II, Forister, M., LaDage, L. D., Kramer, R., Schilkey, F., et al. (2013). Differential hippocampal gene expression associated with climate-related natural variation in memory and the hippocampus in food-caching chickadees. Molecular Ecology, 22, 397–408. doi:10.1111/mec.12146

Pravosudov, V. V., & Smulders, T. V. (2010). Integrating ecology, psychology, and neurobiology within a food-hoarding paradigm. Philosophical Transactions of the Royal Society B: Biological Sciences, 365, 859–867. doi:10.1098/rstb.2009.0216

Roth, T. C., II, LaDage, L. D., Chavalier, D., & Pravosudov, V. V. (2013). Variation in hippocampal glial cell numbers in food-caching birds from different climates. Developmental Neurobiology, 73, 480–485. doi:10.1002/dneu.22074

Roth, T. C., II, LaDage, L. D., & Pravosudov, V. V. (2011). Variation in hippocampal morphology along an environmental gradient: Controlling for the effect of day length. Proceedings of the Royal Society B: Biological Sciences, 278, 2662–2667. doi:10.1098/rspb.2010.2585

Sherry, D. F., Vaccarino, A. L., Buckenham, K., & Herz, R. S. (1989). The hippocampal complex of food-storing birds. Brain Behavior and Evolution, 34, 308–317. doi:10.1159/000116516

Smulders, T. V., Sasson, A. D., & DeVoogd, T. J. (1995). Seasonal variation in hippocampal volume in a food-storing bird, the black-capped chickadee. Journal of Neurobiology, 27, 15–25. doi:10.1002/neu.480270103

Smulders, T. V., Shiflett, M. W., Sperling, A. J., & DeVoogd, T. J. (2000). Seasonal changes in neuron numbers in the hippocampal formation of a food-hoarding bird: The black-capped chickadee. Journal of Neurobiology, 44, 414–422. doi:10.1002/1097-4695(20000915)44:4<414::AID-NEU4>3.0.CO;2-I

Sullivan, E. V., Pfefferbaum, A., Swan, G. E., & Carmelli, D. (2001). Heritability of hippocampal size in elderly men: Equivalent influence from genes and environment. Hippocampus, 11, 754–762. doi:10.1002/hipo.1091

Tarr, B. A., Rabinowitz, J. S., Imtiaz, M. A., & DeVoogd, T. J. (2009). Captivity reduces hippocampal volume but not survival of new cells in a food-storing bird. Developmental Neurobiology, 69, 972–981. doi:10.1002/dneu.20736

Vander Wall, S. B. (1990). Food hoarding in animals. University of Chicago Press.

West, M. J., Slomianka, L., & Gundersen, H. J. (1991). Unbiased stereological estimation of the total number of neurons in the subdivisions of the rat hippocampus using the optical fractionator. Anatomical Record, 231, 482–497. doi:10.1002/ar.1092310411

Woollett, K., & Maguire, E. A. (2011). Acquiring “the Knowledge” of London’s layout drives structural brain changes. Current Biology, 21, 2109–2114. doi:10.1016/j.cub.2011.11.018

Volume 10: pp. 1–23

Developmental Stress and Correlated Cognitive Traits in Songbirds

Tara Farrell
University of Western Ontario

Buddhamas Kriengwatana
Leiden University

Scott A. MacDougall-Shackleton
University of Western Ontario


Abstract

Early-life environments have profound influence on shaping the adult phenotype. Specifically, stressful rearing environments can have long-term consequences on adult physiology, neural functioning, and cognitive ability. While there is extensive biomedical literature regarding developmental stress, recent research in songbirds highlights similar findings in domesticated and non-domesticated species, opening up the field to broader questions with an ecological and evolutionary focus. Here, we review the literature in songbirds that exemplifies how developmental stress can shape birdsong, a sexually selected cognitive trait, and other physiological and cognitive abilities. Furthermore, we review how various traits can be correlated in adulthood as a result of various systems developing in tandem under stressful conditions. In particular, birdsong may be indicative of other cognitive abilities, which we explore in depth with current research regarding spatial cognition. In addition, we discuss how various personality traits can also be influenced by the intensity and timing of developmental stress (prenatal versus postnatal). We conclude by highlighting important considerations for future research, such as how assessing cognitive abilities is often constrained by experimental focus and more weight should be given to outcomes of reproductive success and fitness.

Author Note: Tara Farrell and Scott A. MacDougall-Shackleton, Department of Psychology and Advanced Facility for Avian Research, University of Western Ontario, 1151 Richmond Street, London, Ontario, Canada N6A 5B8; Buddhamas Kriengwatana, Institute of Biology Leiden (IBL), Leiden University, Sylviusweg 72, 2333 BE, Leiden, The Netherlands.
Correspondence concerning this article should be addressed to Tara M. Farrell at tfarrel2@uwo.ca.

Keywords: developmental stress; birdsong; cognition; correlated traits; hippocampus; corticosterone; behavioral syndromes


Organisms exhibit integrated phenotypes, with various traits working in concert to produce a functioning whole. Traits typically co-vary among individuals, and much research has attempted to determine the underlying causes of correlations between different traits. One such cause is the influence of environmental factors on developmental trajectories, which can induce pleiotropic effects on the adult phenotype. That is, environmental conditions do not impact the development of a single trait in isolation, but will affect numerous traits simultaneously, thus potentially leading to developmentally correlated traits (Metcalfe & Monaghan, 2001; Spencer & MacDougall-Shackleton, 2011). In the case of cognition and behavior, any neural developmental processes that are sensitive to environmental factors at the same time points in development may become developmentally correlated, even if they are functionally unrelated (Figure 1).

Figure 1. Developmental stress may induce correlations among traits in adulthood. Horizontal orange arrows indicate the developmental trajectories of neurocognitive systems. If a stressor affects the development of multiple neurocognitive traits early in development (indicated by lightning bolt), this may result in positive correlations between the traits in the adult animal (time point indicated by vertical blue arrow). In this way, traits that are functionally independent in adulthood (e.g., birdsong and spatial memory) may be correlated across individuals. Figure modified from Spencer & MacDougall-Shackleton (2011) with permission.
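
The logic sketched in Figure 1 can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not a model taken from the literature reviewed here: the function names, the uniform stress exposure, the linear effect of stress, and every numeric value are hypothetical. It shows that two traits sharing no variance other than a common early stressor can still end up positively correlated across individuals.

```python
import random


def simulate_traits(n_birds, stress_effect, seed=0):
    """Return two lists of adult trait scores (e.g., song, spatial memory).

    Assumption (hypothetical): each bird receives one early-life stress
    exposure in [0, 1) that lowers BOTH traits by stress_effect * exposure.
    All remaining variation in each trait is independent Gaussian noise.
    """
    rng = random.Random(seed)
    song, spatial = [], []
    for _ in range(n_birds):
        stress = rng.random()
        song.append(rng.gauss(0.0, 1.0) - stress_effect * stress)
        spatial.append(rng.gauss(0.0, 1.0) - stress_effect * stress)
    return song, spatial


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# No shared stressor: the two traits vary independently.
r_unstressed = pearson_r(*simulate_traits(5000, stress_effect=0.0))
# Shared stressor: the same two traits become correlated across individuals.
r_stressed = pearson_r(*simulate_traits(5000, stress_effect=6.0))
print(f"r without shared stressor: {r_unstressed:.2f}")
print(f"r with shared stressor:    {r_stressed:.2f}")
```

Under these assumptions the expected correlation for effect size k is (k²/12) / (1 + k²/12), so it is near zero without a shared stressor and about 0.75 for the hypothetical k = 6. The correlation arises solely from the shared exposure; neither trait influences the other.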


There has been growing interest in how stressful environmental factors can influence development. In one view, such developmental stressors can disrupt development and result in impaired performance. In another view, developmental stressors may program development to produce an adult better suited to a stressful environment (phenotypic programming; see Monaghan, 2008). If stressors impair development, then stressed individuals should have impairments in multiple traits. If stressors induce phenotypic programming, then stressed individuals should have multiple programmed traits. Regardless of these competing views, developmental stressors provide a potent mechanism by which multiple neural and cognitive traits may become developmentally correlated.

The aim of this review is to provide an overview of the literature regarding the effects of developmental stress on various cognitive-behavioral traits in songbirds. Cognition, defined for this review, refers to the processes that underlie the acquisition, processing, and storage of information, which an animal uses to interact within its environment (Shettleworth, 2009, p. 720). We explore how the developmental stress hypothesis—originally formulated to explain how developmental experiences can affect birdsong in particular—can be extended as a framework to help researchers understand how cognitive and behavioral traits in general may become correlated through developmental stressors. Consequently, this review will synthesize findings in songbirds regarding the developmental stress hypothesis, the implications that this hypothesis has for females, the reliability of song as a proxy of other cognitive abilities, and the effects of developmental stress on spatial cognition and personalities/behavioral syndromes—topics addressed by recent reviews of the developmental stress hypothesis that warrant a more thorough discussion (Buchanan, Grindstaff, & Pravosudov, 2013; MacDougall-Shackleton & Spencer, 2012; Spencer & MacDougall-Shackleton, 2011).

There are two primary reasons for focusing this review on songbirds despite the large and growing biomedical literature on the effects of early stressors on neural and cognitive development in domesticated species (e.g., Andersen & Teicher, 2008; Heim & Nemeroff, 2001; Heim, Shugart, Craighead, & Nemeroff, 2010; Welberg & Seckl, 2008). First, it is important to understand the role that stress plays in cognitive development in non-domesticated species because these studies provide valuable insights into the effects of developmental stress on ecologically relevant behaviors and sexual selection (e.g., MacDougall-Shackleton & Spencer, 2012; Buchanan et al., 2013). Second, birdsong is one of the most extensively studied areas of animal behavior and cognition, and depends critically on developmental experience. In addition, songbirds have been studied with respect to hippocampus-dependent spatial cognition, behavioral innovation, and behavioral syndromes, providing a rich and diverse animal model from which we can understand how stress affects a variety of cognitive functions.

Birdsong and the Developmental Stress Hypothesis

There has been a long-standing interest in the mechanisms of birdsong learning and development. Because birdsong is a sexually selected trait, there has also been extensive research on the types of information transmitted by birdsong and how song can act as an indicator signal to accurately provide information to the receiver regarding qualities of the singer. The current prevailing hypothesis to explain how song can be an honest signal of male quality is the developmental stress hypothesis (Nowicki, Hasselquist, Bensch, & Peters, 2000; Nowicki, Peters, & Podos, 1998). Because song learning and development of the neural structures underlying song occur over a protracted period of early life, they may be sensitive to a variety of stressors. Thus, a bird that sings a song of high quality is a bird that suffered relatively little stress during early life or was able to cope with such stress. Consequently, song may also provide predictive information about other traits that are sensitive to developmental stress (Nowicki et al., 1998). We will first review what is known about the effects of developmental stress on male song development and then discuss the effects of developmental stress on song learning and preference in the underrepresented female.

The developmental stress hypothesis has received substantial empirical support, as developmental stressors including dietary manipulations, glucocorticoid administration, brood size manipulation, and immunological challenges have all been found to affect adult song and associated neurological structures (reviewed in Buchanan et al., 2013; MacDougall-Shackleton & Spencer, 2012; Spencer & MacDougall-Shackleton, 2011). However, not all published experimental manipulations have found an effect of developmental stress on song learning. This is particularly evident in the multitude of zebra finch (Taeniopygia guttata) studies, where the effects of a variety of developmental stressors on song learning are inconsistent, with some studies reporting effects (Brumm, Zollinger, & Slater, 2009; de Kogel & Prijs, 1996; Holveck, Vieira de Castro, Lachlan, ten Cate, & Riebel, 2008; Spencer, Buchanan, Goldsmith, & Catchpole, 2003; Tschirren, Rutstein, Postma, Mariette, & Griffith, 2009; Zann & Cash, 2008), and others no effect (Gil, Naguib, Riebel, Rutstein, & Gahr, 2006; Kriengwatana et al., 2014).

The exact reasons for such discrepancies between studies, especially with zebra finches, remain unclear because the mechanisms by which developmental stress affects song have yet to be firmly established. Currently, developmental stressors are hypothesized to largely exert their effects by acting on the hypothalamic-pituitary-adrenal (HPA) axis. When stressors are perceived, the HPA axis mediates the physiological stress response by secreting glucocorticoid hormones (corticosterone [CORT] is the primary avian glucocorticoid). Thus, any type of developmental stressor could affect song through activation of the HPA axis and the subsequent effects of CORT. For example, food restriction can alter baseline and stress-induced levels of CORT in birds (Kitaysky, Kitaiskaia, Wingfield, & Piatt, 2001; Pravosudov & Kitaysky, 2006) and elevated CORT due to food restriction can in turn adversely affect brain development (Welberg & Seckl, 2001). In support of this, some studies have found parallel effects of CORT and food restriction (Buchanan, Spencer, Goldsmith, & Catchpole, 2003; Schmidt, Moore, MacDougall-Shackleton, & MacDougall-Shackleton, 2013; Spencer et al., 2003).

Nevertheless, food restriction can also cause specific song and neural deficits not always seen in birds treated only with CORT. For example, song sparrows (Melospiza melodia) that were treated with CORT and those that were food restricted both experienced a reduction in song complexity, but only food-restricted males were less accurate in copying tutor song and had smaller volumes of the song-related brain nucleus RA (Schmidt, Moore, et al., 2013). This indicates that the influence of developmental stress on song may be mediated not necessarily by CORT per se, but by changes in resource availability or in the resource allocation strategies of young birds. That is, birds that are deprived of nutrients early in life may have fewer substrates to allocate toward brain development (Welberg & Seckl, 2001), or may prioritize the development of certain systems when faced with limited nutrients (Schew & Ricklefs, 1998). These resource allocation strategies may, for instance, manifest as catch-up growth, which is a period of accelerated growth after a period of food restriction (Metcalfe & Monaghan, 2001). Catch-up growth may be beneficial in the short term by improving chances of survival, but lifelong physiological debts may be incurred in paying the costs of this compensation (Metcalfe & Monaghan, 2001). Songbirds that are food restricted in early life often show slower growth rates, but typically these birds catch up to their control counterparts in adulthood once the stressor has been removed (Krause, Honarmand, Wetzel, & Naguib, 2009; Krause & Naguib, 2011; Kriengwatana, Wada, Macmillan, & MacDougall-Shackleton, 2013; Schmidt, MacDougall-Shackleton, & MacDougall-Shackleton, 2012; Spencer et al., 2003). Collectively, these studies suggest that glucocorticoid regulation is an important, but not the sole, mechanism by which developmental stress can have long-lasting impacts on birdsong. Glucocorticoid regulation may in fact be a component of resource allocation strategies that young birds employ when faced with stress during development (Wingfield et al., 1998).

The inconsistent effects of developmental stress on zebra finch song provide important clues about factors that can alter the impact of developmental stress on song. Specifically, the variety of developmental manipulations used across the studies to date (discussed in Kriengwatana et al., 2014), the relative importance of song in courtship displays, and relaxed sexual selection pressures due to domestication and variation among zebra finch colonies may contribute to these divergent findings. Increasing brood size and increasing foraging difficulty are two common methods of increasing developmental stress because of their ecological validity. However, it is difficult to identify the reasons why identical manipulations would yield different results (e.g., Brumm et al., 2009; Kriengwatana et al., 2014; Spencer et al., 2003; Zann & Cash, 2008) when researchers cannot quantify exactly how much stress is being applied to a nest and whether all nestlings within a brood are experiencing the stressor equally. That is, it is logistically difficult to isolate zebra finch nestlings and strictly enforce a stressor if they are being fed by their parents and living in a brood. The fact that zebra finch song is part of a multimodal courtship display (i.e., females can see and hear males during courtship) suggests that their song may not be under as intense sexual selection as other species’ because song is one among many signals that zebra finch females use to assess potential mates (Bennett, Cuthill, Partridge, & Maier, 1996; Collins, Hubbard, & Houtman, 1994; Collins & ten Cate, 1996).
In contrast to studies on zebra finches, data from free-living songbirds in which song clearly is and continues to be under intense sexual selection, such as European starlings (Sturnus vulgaris) and song sparrows (Melospiza melodia), have provided strong support for the developmental stress hypothesis (European starlings: Buchanan et al., 2003; Farrell, Weaver, An, & MacDougall-Shackleton, 2011; Spencer, Buchanan, Goldsmith, & Catchpole, 2004; song sparrows: MacDonald, Kempster, Zanette, & MacDougall-Shackleton, 2006; Schmidt, Moore, et al., 2013). Developmental stressors impair song learning and development of the song-control system in these species, and the size of song nucleus HVC is correlated with sexually selected components of song in free-living birds (starlings: Bernard, Eens, & Ball, 1996; song sparrows: Pfaff, Zanette, MacDougall-Shackleton, & MacDougall-Shackleton, 2007). Therefore, if song is a highly relevant indicator of fitness for a species, then we would expect that the mechanisms regulating song in that species are more susceptible to early environmental manipulations (Buchanan et al., 2013). Importantly, developmental stress may not have the same consequences for fitness across all species or populations (i.e., colonies of domesticated birds) if song is not a primary metric by which females are evaluating potential mates.

Females and the Developmental Stress Hypothesis

As birdsong is the metric by which developmental stress has been assessed in most studies, little effort has been made to understand how stressors during development affect the receiver’s ability to perceive and respond to song. To our knowledge, there are no studies to date of the effects of developmental stress on a male’s ability to perceive and respond to songs during development or in adulthood (but for a manipulation of the early tutoring environment see Sturdy, Phillmore, Sartor, & Weisman, 2001). There are, however, a limited number of studies examining how developmental stress affects females’ responses to songs.

Most work on the effects of stress on females has been conducted with zebra finches. Data collected so far support the notion that females prefer the songs of control males to those of their stressed counterparts, and that female preferences may be altered by developmental stress (Riebel, Naguib, & Gil, 2009; Spencer et al., 2005). Female zebra finches prefer the songs of control males to those of previously stressed males, which suggests that developmental stress alters songs in a biologically relevant fashion (Spencer et al., 2005). In Riebel et al. (2009), female zebra finches raised in large broods (therefore presumed to have experienced more developmental stress) demonstrated equally strong preference for their tutor song as females from smaller broods. But, when given two songs from unfamiliar males singing a ‘short’ or ‘long’ song based on motif duration, females from small broods demonstrated stronger absolute preferences for one song type (i.e., there was no consistent directionality of the preference; Riebel et al., 2009). In similar studies, females were found to prefer males whose developmental background matched their own: in song preference tests and live interactive mate choice trials, females from small broods preferred males from small broods, and vice versa for large brood females (Holveck, Geberzahn, & Riebel, 2011; Holveck & Riebel, 2010). However, food restriction early in life did not affect female zebra finch preferences for song complexity when given song from unfamiliar singers, but did reduce overall activity during mate choice trials (Woodgate et al., 2011; Woodgate, Bennett, Leitner, Catchpole, & Buchanan, 2010). Overall, these studies indicate that developmental stress can have some influence over a female’s response toward song (i.e., less motor activity), but there is no compelling evidence to suggest that developmental stress alters a female’s preference toward song based on measures of song quality alone (i.e., complexity, motif duration).
Be that as it may, studies that have found preference effects also have a potential confound: the similarity of the stimulus songs to songs a female would have been exposed to during the sensorimotor phases of vocal development is often not accounted for. Exposure to song in early life is a determining force in shaping female zebra finches’ song preferences (Lauay, Gerlach, Adkins-Regan, & DeVoogd, 2004; Riebel, 2000). A female raised in a large brood may acquire a preference for songs that share features with the songs heard during her development (i.e., similar to the song of her stressed father and/or developmentally stressed siblings). Therefore, rather than solely assessing whether stress affects preferences for songs that vary in complexity, an additional consideration may be to assess preference for songs whose features sound more or less similar to the songs heard during the sensorimotor learning phase.

Apart from zebra finches, the only other species in which the effects of developmental stress on female song preferences have been studied to date is the song sparrow. Females that experienced food restriction or CORT treatment showed reduced preferences for conspecific song (versus heterospecific song) compared to control females (Schmidt, McCallum, MacDougall-Shackleton, & MacDougall-Shackleton, 2013). In addition, these stressed females showed patterns of neural activity in auditory forebrain areas (as measured by immunoreactivity of the immediate-early gene Zenk) that did not differ between conspecific and heterospecific song, whereas control females showed significantly more immunoreactivity when listening to conspecific than to heterospecific song (Schmidt, McCallum, et al., 2013). This study suggests that female preferences are condition dependent (Cotton, Small, & Pomiankowski, 2006) and may in part be caused by differences in neural activation in auditory forebrain regions in response to song (Schmidt, McCallum, et al., 2013). Unlike in the zebra finch studies, all song sparrows in the Schmidt, McCallum, et al. (2013) study had the same exposure to tutor songs (a combination of live and tape tutors) in development, and all stimulus songs were from males whose repertoires were unfamiliar. Therefore, differences in early-tutoring environments are not likely the cause of the weaker preferences seen in song sparrows.

Even though female preference and mating decisions are swayed by the quality of male song (Gil & Gahr, 2002; Searcy, 1992), what exactly does a ‘good’ song advertise? Does song advertise direct benefits (e.g., increased parental feeding through superior foraging abilities), indirect benefits (e.g., good genes), or both? Moreover, what characteristics of song are most constrained by early developmental stress, and are they the same characteristics that females are using to evaluate prospective mates? Assessing preferences, and how developmental stress may alter them, should be tailored to each species due to ecological and life history characteristics that differ between them. In some species, specific content within a song is important for preference, such as sharing songs, singing a local dialect, or singing specific syllables (e.g., fast trills in canaries; Gil & Gahr, 2002). However, the reasons why females prefer these song characteristics could be established through different mechanisms (Searcy & Andersson, 1986), and therefore developmental stress may affect some species more selectively. For example, zebra finch mate preferences are strongly affected by parental imprinting such that zebra finches raised by Bengalese finches (Lonchura striata domestica) will almost exclusively display sexually in adulthood to Bengalese finches rather than conspecifics (ten Cate & Vos, 1999). Yet, female house finches (Carpodacus mexicanus) tutored by a foreign dialect, or in complete isolation, still preferred song from their local dialect in adulthood, despite no previous experience with it, which suggests some song preferences may be innate (Hernandez & MacDougall-Shackleton, 2004). Species for whom early auditory experiences are instrumental to shaping song preferences may be more susceptible to developmental stress than those whose preferences are less dependent on auditory experience.

Studies in wild bird populations generally support the developmental stress hypothesis that birdsong evolved as an honest signal of the developmental history of the singer. Attention is now turning toward understanding how developmental stress affects the perception of song. Currently, it appears that developmental stress may alter how females respond to songs, and preferences in some instances, but no tests so far have convincingly shown that these results are caused by developmental stress impairing mechanisms of perception. This, along with other questions (e.g., Does developmental stress selectively affect song perception? What are the neural bases for such perceptual impairments? And, if perceptual abilities were compromised, what would be the consequences to an individual’s fitness?), will need to be addressed in order to fully understand how early developmental factors can influence the coevolution of production and perception of increasingly elaborate sexually selected signals.

Developmental Stress and Correlated Cognitive Traits: Are the Best Singers Also the Smartest?

Early-life environments are fundamental to shaping an organism’s phenotype, exerting effects on a multitude of physiological and cognitive-behavioral traits (Metcalfe & Monaghan, 2001). In songbirds, physiological effects of developmental stress include altered growth rates, body size, organ mass, and immune and metabolic functioning (Kriengwatana et al., 2013; Verhulst, Holveck, & Riebel, 2006). Developmental stress can also affect other cognitive and behavioral processes in addition to affecting song, such as associative learning in zebra finches (Fisher, Nager, & Monaghan, 2006; Kriengwatana, Farrell, Aitken, Garcia, & MacDougall-Shackleton, in press) and spatial learning in Western scrub-jays (see the Developmental Stress and Spatial Cognition section for a detailed discussion; Pravosudov, Lavenex, & Omanska, 2005). Importantly, song characteristics appear to be correlated with a number of physiological and cognitive measures. For instance, song is correlated with immune function, body condition, endocrine function, survival, and fitness in song sparrows (Hasselquist, Bensch, & von Schantz, 1996; MacDougall-Shackleton et al., 2009; Pfaff et al., 2007; Schmidt et al., 2012; Schmidt, MacDougall-Shackleton, Soma, & MacDougall-Shackleton, 2014), inhibitory control in song sparrows (Boogert, Anderson, Peters, Searcy, & Nowicki, 2011), problem solving in zebra finches (Boogert, Giraldeau, & Lefebvre, 2008), and spatial learning in European starlings (Farrell et al., 2011). Consequently, these correlations suggest that if song is an honest indicator of early-life conditions (as posited by the developmental stress hypothesis), then song could also be an honest indicator of the quality of other cognitive functions that develop in parallel with song. 
In this section, we investigate the extent to which song is predictive of other cognitive functions, examine how developmental stress may explain the persistence of the relationships between song and other cognitive functions even after the stressor has been removed, and draw attention to the importance of understanding whether the effects of developmental stress on cognition have significant fitness consequences.

It is only recently that researchers began to test the relatively simple idea that the best singers are also the smartest, even though it seems logical to assume that the neural processing required for song learning may be correlated with other cognitive functions (Catchpole, 1996; Nowicki & Searcy, 2011). Numerous neural systems develop in tandem, and correlations can arise if they have overlapping sensitive periods and resource requirements (Buchanan et al., 2013; Spencer & MacDougall-Shackleton, 2011). For instance, the neural systems that regulate song learning and spatial memory (song-control system and hippocampus, respectively) are functionally independent in the developing zebra finch (Bailey, Wade, & Saldanha, 2009). Yet, these systems have developmental schedules that likely overlap (Brainard & Doupe, 2002; Clayton, 1996) and therefore could both be simultaneously affected by stressful environmental factors. If both systems are shaped by the same environmental factors (such as stressful rearing environments), this could result in them being correlated in adulthood despite their functional independence (Nowicki et al., 1998). Farrell et al. (2011) illustrate such an association: starlings that experienced a nutritional stressor in early development were impaired on a spatial memory task and scored lower on a measure of song quality. Moreover, the starlings that were better at the spatial foraging task also went on to sing more complex songs in their first breeding season (Figure 2).
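The logic of this argument, that two functionally independent traits can become correlated when both are shaped by a common stressor, can be illustrated with a minimal simulation. The sketch below is purely illustrative and is not drawn from any of the cited studies. It assumes a hypothetical, normally distributed stress variable that lowers both a song score and a spatial score by the same amount (`stress_effect`), while all remaining variation in each trait is independent. All function names and parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_traits(n_birds=2000, stress_effect=1.0, seed=1):
    """Simulate two traits that share only a common developmental stressor."""
    rng = random.Random(seed)
    song, spatial = [], []
    for _ in range(n_birds):
        stress = rng.gauss(0.0, 1.0)  # hypothetical early-life stress level
        # Each trait gets its own independent noise term; the traits never
        # influence one another directly (functional independence).
        song.append(-stress_effect * stress + rng.gauss(0.0, 1.0))
        spatial.append(-stress_effect * stress + rng.gauss(0.0, 1.0))
    return song, spatial


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Shared stressor present versus absent (assumed effect sizes).
r_shared = pearson_r(*simulate_traits(stress_effect=1.0))
r_none = pearson_r(*simulate_traits(stress_effect=0.0))
print(f"r with shared stressor: {r_shared:.2f}")
print(f"r without shared stressor: {r_none:.2f}")
```

Under these assumptions the correlation with a shared stressor is clearly positive (0.5 in expectation when the stress effect and the independent noise have equal variance), whereas it is near zero when the stressor has no effect. The positive correlation therefore arises without any direct link between the two simulated traits.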

Figure 2. Developmental stress affected both spatial and song abilities in starlings (figures modified with permission from Farrell et al., 2011). (A) An overhead view of the spatial foraging arena used in the spatial memory task. Birds were tested daily for 4 weeks with the same formation of 4 of 16 cups baited with mealworms. Cups were covered with tissue paper so birds had to peck through the paper to obtain the worm. (B) Performance as measured by number of incorrect cups searched across the 4-week testing phase of the spatial memory task. Control birds (blue line) made significantly fewer errors than the food-restricted birds (orange line) across the 4-week testing period. (C) A male starling from the study singing in an aviary. (D) Average song bout length for both the control and food-restricted males from the study. Males raised in control conditions sang significantly longer song bouts in an undirected singing situation than males raised in a food-restricted condition. (E) Males that sang longer song bouts made fewer errors on the spatial memory task during the first 2 weeks of testing. Each point represents a male starling from the study coded by its early developmental condition (blue circle: control male, orange triangle: food-restricted male), but the regression line is based on all birds.

However, song is not always predictive of other cognitive traits: for every positive correlation there has been an almost equal number of null, or even negative, correlations between song quality and other cognitive traits. In the aforementioned study with song sparrows (Boogert, Anderson, et al., 2011), there was no significant relationship between repertoire size and performance on a color association task or a reversal task. Templeton, Laland, and Boogert (2014) tested flocks of zebra finches on the same problem-solving task as Boogert et al. (2008) but did not replicate Boogert et al.’s results showing a positive relationship between song complexity and problem-solving performance in zebra finches tested in isolation. Specifically, Templeton et al. (2014) found no difference in song complexity between solvers and non-solvers, nor any relationship between song complexity and latency to solve the task among solvers. The researchers cite responsiveness to social isolation as the reason underlying the correlation previously reported by Boogert et al. (2008). A separate study in song sparrows found a negative relationship between spatial memory and song repertoire size (Sewall, Soha, Peters, & Nowicki, 2013), which the authors suggest could be the result of a trade-off between the development of song and spatial cognitive systems.

What might explain why song is predictive of other cognitive functions in some studies but not others? One explanation is that song correlates only with performance on particular cognitive tests. However, not all of the tests outlined above have been validated to assess specific cognitive processes (Thornton, Isden, & Madden, 2014). Alternatively, developmental history (i.e., the amount of developmental stress experienced) may contribute to differences in experience, motivation, and other factors that contribute to performance on cognitive tests (reviewed in Thornton & Lukas, 2012). Unfortunately, the developmental history of individuals is not always reported, making comparisons across studies rather difficult. Developmental history must be considered if we are to understand the importance of developmental stressors in organizing correlations among adult cognitive and behavioral traits, in this case among song learning and other types of learning abilities. In light of these confounds, we strongly advise researchers interested in knowing how song is linked to other cognitive processes to use more rigorous testing and established experimental protocols (i.e., tests that have been validated to assess specific cognitive processes), and to always record and report the developmental history of experimental subjects. To make the claim that song quality is predictive of other cognitive abilities, the onus must be placed on experimenters to demonstrate that these correlations are due to condition-dependent effects on the systems in question (Buchanan et al., 2013).

The relationship between song and other cognitive processes is predicted to be positive if the development of neural substrates mediating song and cognition overlaps in time and if developmental stressors similarly affect song and cognition. Null or negative correlations between song and other cognitive traits could be explained if the neural substrates supporting a particular cognitive ability develop at different times than the song-control system, or if the development of neural/physiological systems that maintain more essential functions than song is canalized, or buffered from developmental stressors. A third (and often implicit) assumption made when predicting the relationship between song and other cognitive functions is that they are, at least partly, mediated by common learning processes.

If song learning really does signal general learning abilities, this implies that there are common underlying processes for song and other forms of cognition, that is, a general intelligence factor in songbirds. General intelligence is a construct that captures cognitive performance across a series of tasks and suggests that performance on one task may be reflective of performance on other tasks (Deary, Penke, & Johnson, 2010). To date, there is evidence for general intelligence abilities in several species (reviewed in Boogert, Fawcett, & Lefebvre, 2011; Thornton & Lukas, 2012). Large-scale cognitive testing in birds is in its infancy, but we turn to recent studies on whether the elaborate bower-building displays (i.e., structures made of sticks and objects) of bowerbirds are reflective of a general intelligence factor (Isden, Panayi, Dingle, & Madden, 2013; Keagy, Savard, & Borgia, 2011a; Keagy, Savard, & Borgia, 2011b). Overall, there are few direct correlations between performances on different cognitive tasks in male satin bowerbirds (Ptilonorhynchus violaceus), but there is some evidence for a general intelligence factor, calculated based on shared covariance between the tests using principal components analysis (i.e., SB-g; Keagy et al., 2011b). However, in this species measures of song quality did not correlate with individual problem-solving abilities (Keagy et al., 2011b). In a congeneric species, spotted bowerbirds (Ptilonorhynchus nuchalis), one factor explains 44% of the covariance of test performance across a battery of tests unrelated to bower building (unlike tasks performed by Keagy, Savard, & Borgia, 2009, 2011a, 2011b). However, none of the individual tests or this aggregate score correlated with mating success (but see Isden et al., 2013). To date, the evidence from bowerbirds does not support the hypothesis that song quality is related to a general intelligence factor extracted from performance across various cognitive tests. 
Female bowerbirds appear to be making mating decisions on both measures of song and bower building in these species, which suggests that these traits are likely advertising different information (Candolin, 2003). Therefore, any general intelligence factors that are extracted from these studies likely reflect cognitive processes more or less independent of those necessary for song learning and performance.
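To make concrete what it means to extract a general factor from shared covariance and to say that one factor explains a given percentage of covariance, the following is a minimal sketch of a principal components calculation on simulated data. It is not the analysis used in the bowerbird studies. The battery of six tasks, the loading of 0.6 on a hypothetical common factor, and all function names are assumptions made for illustration, and the first component is found by simple power iteration on the correlation matrix, which is adequate here only because all simulated tasks correlate positively.

```python
import random


def simulate_battery(n_birds=300, n_tasks=6, loading=0.6, seed=7):
    """Simulated task scores that all load on one hypothetical common factor."""
    rng = random.Random(seed)
    noise_sd = (1.0 - loading ** 2) ** 0.5  # keeps each task at unit variance
    rows = []
    for _ in range(n_birds):
        g = rng.gauss(0.0, 1.0)  # assumed latent general factor for this bird
        rows.append([loading * g + rng.gauss(0.0, noise_sd)
                     for _ in range(n_tasks)])
    return rows


def correlation_matrix(scores):
    """Correlation matrix of task scores (rows are birds, columns are tasks)."""
    n, k = len(scores), len(scores[0])
    cols = [[row[j] for row in scores] for j in range(k)]
    means = [sum(c) / n for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / (n - 1)) ** 0.5
           for c, m in zip(cols, means)]
    z = [[(v - m) / s for v in c] for c, m, s in zip(cols, means, sds)]
    return [[sum(a * b for a, b in zip(z[p], z[q])) / (n - 1)
             for q in range(k)] for p in range(k)]


def first_component(corr, iterations=500):
    """Power iteration: returns (loadings, proportion of variance explained)."""
    k = len(corr)
    v = [1.0 / k ** 0.5] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k))
                     for i in range(k))
    return v, eigenvalue / k  # the trace of a correlation matrix equals k


loadings, proportion = first_component(correlation_matrix(simulate_battery()))
print(f"first component explains {proportion:.0%} of the variance")
```

In this simulation all tasks load positively on the first component, which is the pattern usually interpreted as a general factor. Whether a song measure belongs to that factor would be assessed separately, for example by correlating song scores with each bird's score on the component.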

Cognition and Fitness

Although there is good evidence that females of at least some songbird species preferentially choose males based on song (e.g., Hasselquist et al., 1996; Searcy, 1992), there is limited evidence that females choose males directly based on cognitive abilities (Boogert, Fawcett, et al., 2011). For example, female red crossbills (Loxia curvirostra) observed males extracting seeds from pinecones and displayed a preference for the males that were faster at the task (Snowberg & Benkman, 2009), but female zebra finches showed no preference for males’ foraging technique (Boogert, Bui, Howarth, Giraldeau, & Lefebvre, 2010). Future research is warranted to determine whether females use song to assess a male’s cognitive abilities, or if they are also directly assessing other cognitive behaviors in mate choice situations. However, if a female chooses to mate with a “smarter” male it may not ultimately matter whether her decision was based directly on cognition or not, as long as she receives the fitness benefits (direct and/or indirect) from mating with a “smarter” male (Boogert, Fawcett, et al., 2011).

Song is a trait on which females appear to base mate choice decisions and this mate choice is hypothesized to result in fitness gains if that male also has greater cognitive abilities. However, rarely is the link between song and cognitive ability studied within the context of fitness. In the previous sections, we have reviewed the tenuous relationship between song and other cognitive domains. Most of the cognitive tests used are artificial ones designed by experimenters, making it difficult to determine whether performance on such tests is related to fitness in a natural setting. A behavior that is deemed “smart” by an experimenter may not yield an increase in overall fitness if: (a) the task is far removed from a biologically relevant context, (b) there are alternative strategies that are equally or more profitable (e.g., foraging versus scrounging), or (c) displaying such a behavior would not add, or would be counterproductive, to an animal’s reproductive success/fitness. Ultimately, “smart” behaviors evolve because they increase an individual’s survivability and/or reproductive success. And yet, how particular cognitive phenotypes benefit an animal’s fitness is a question that often falls outside the scope of most experiments (Healy, 2012).

Intelligence is assumed to be beneficial, but very few studies have found a direct link between cognition and fitness. In fact, researchers often do not consider the costs that may result from being smart. In one study that did so, the problem-solving abilities of over 400 great tits (Parus major) were related to their reproductive success (Cole, Morand-Ferron, Hinks, & Quinn, 2012). Female tits briefly tested in captivity that were successful at a problem-solving task went on to have larger clutches and fledged more young. These fitness gains did not come at the expense of the mother’s own condition; rather, they appeared to arise because these females had significantly smaller foraging areas. This implies that successful solvers were better able to exploit their habitat when foraging and therefore spent less time away from the nest, which in turn meant that they could engage in more nest-attentive behaviors that would increase the odds that young successfully fledged (Cole et al., 2012). However, successful solvers were also more likely to desert their nest after a nest disruption. Solvers in this population are known to have a heightened startle response and are less competitive in social situations (Cole & Quinn, 2012; Dunn, Cole, & Quinn, 2011); thus problem-solving appears to have both costs and benefits in this species.

Together, the above findings suggest that in this population of great tits there could be two alternative life-history strategies, with each strategy resulting in higher fitness benefits under particular circumstances. Solvers appear to be more reactive to nest disruptions and therefore could fledge fewer young compared to non-solvers in a season where nest disturbance events are frequent. Conversely, if food were less abundant, solvers might fledge more young because they are able to exploit their habitats more efficiently than non-solvers. Therefore, cognitive ability may only be employed in situations where it is a rewarding strategy. If there is a genetic basis to these cognitive and personality phenotypes, then individuals may be under different selection pressures based upon ecological constraints (Svanbäck & Bolnick, 2007).

Developmental Stress and Spatial Cognition

In mammals, the association between developmental stress and deficits in spatial cognition in adulthood is well researched (Lupien, McEwen, Gunnar, & Heim, 2009; Vallée et al., 1999; Yang, Han, Cao, Li, & Xu, 2006). While studies of spatial cognition are prominent in the songbird literature, few have manipulated developmental conditions and observed the effects on spatial cognition and the hippocampus. In this section, we review the few studies that have conducted such manipulations, remark on a potential mechanism by which developmental stress could affect spatial cognition, and highlight future avenues of research.

For food-caching species, spatial cognition is a conspicuous trait that is closely linked to survival (Pravosudov & Lucas, 2001). Western scrub-jays (Aphelocoma californica) are prolific food-cachers: when food is plentiful, they store food in hidden locations and will retrieve these items in times of food scarcity. Pravosudov et al. (2005) hypothesized that stressful conditions in early development could induce cognitive deficits in scrub-jays, which would become evident when they performed tasks as adults that relied on spatial ability. This hypothesis was supported by their findings, as jays that experienced nutritional restriction in early life performed more poorly than control birds on tasks assessing cache recovery and spatial-association learning. Furthermore, when given conflicting spatial/color information to locate a food reward, the nutritionally restricted jays overwhelmingly preferred to solve the task based on the color information, compared to birds that did not experience nutritional restriction. In the same birds, nutritional restriction impeded growth of the hippocampus, as food-restricted jays had smaller hippocampal volume and fewer hippocampal neurons, even though overall telencephalon volume and brain mass were unaffected. As memory for spatial location and memory for color appear to be mediated by separate mechanisms (Hampton & Shettleworth, 1996; Sherry & Vaccarino, 1989), Pravosudov et al.’s (2005) results reveal that the effects of stress in the brain are not uniform, and that some neural structures may be more sensitive to the effects of early nutritional stress than others.

For non-caching species, spatial cognition is involved in a variety of behaviors (e.g., migration, orientation, exploration; Shettleworth, 2009). In three non-food-caching species, manipulating diets early in life also affected performance in spatial memory tasks. Arnold, Ramsay, Donaldson, and Adam (2007) manipulated levels of taurine, an amino acid implicated in brain development and the regulation of the stress response (Engelmann, Landgraf, & Wotjak, 2003; Lapin, 2003), in nestling blue tits (Cyanistes caeruleus). When tested as juveniles, blue tit females that had been supplemented with taurine showed a trend toward committing fewer errors on a spatial reference memory task. Similarly, starlings that were subjected to an unpredictable food supply during the juvenile phase committed more reference and working memory errors on a spatial foraging task compared to starlings raised on an ad libitum diet (Farrell et al., 2011; Figure 2B). Zebra finches subjected to nutritional stress before nutritional independence also performed more poorly in a hippocampus-dependent spatial memory task (Kriengwatana et al., in press). Although spatial performance was affected by early developmental treatments, none of the aforementioned studies assessed the hippocampus or any other neural structures.
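The distinction between reference and working memory errors in a baited-cup task can be made explicit with a short sketch. The definitions used here are the conventional ones and are an assumption, since the scoring rules of the cited studies are not given in this text: a reference memory error is a first visit to a cup that is never baited, and a working memory error is a return to any cup already visited within the same trial. The function name and the example sequence are hypothetical.

```python
def score_trial(visits, baited):
    """Count (reference, working) memory errors for one trial.

    visits: cup indices in the order the bird searched them
    baited: set of cup indices that contain food on every trial
    """
    seen = set()
    reference_errors = 0
    working_errors = 0
    for cup in visits:
        if cup in seen:
            # Returning to a cup already searched in this trial.
            working_errors += 1
        elif cup not in baited:
            # First visit to a cup that never holds food.
            reference_errors += 1
        seen.add(cup)
    return reference_errors, working_errors


# Hypothetical trial with 4 of 16 cups baited, as in the arena of Figure 2A.
baited_cups = {2, 5, 11, 14}
example = score_trial([2, 3, 5, 3, 11, 2, 14], baited_cups)
print(example)  # one reference error (cup 3), two working errors (cups 3 and 2)
```

Separating the two counts matters because they may reflect different processes: reference errors index memory for the stable set of rewarded locations across days, whereas working errors index memory for locations already visited within a trial.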

Developmental stress exists in many forms (e.g., brood enlargement, nutritional deficits, unpredictable environments, parasites, and infectious diseases), yet the resulting effects on the physiology of the song-control system and the hippocampus are similar: smaller volumes and fewer neurons. Therefore, the hippocampal differences in jays reported by Pravosudov et al. (2005) may not be due to a nutritional deficit per se, but rather the result of a physiological mechanism that, when stimulated by environmental stress, inhibits neural development. As discussed previously, a likely mechanism is the HPA axis, which regulates the physiological stress response involving glucocorticoids (CORT). Glucocorticoids may have short-term activational and long-term organizational effects on the hippocampus because this brain region is rich in two receptors (mineralocorticoid receptors, MR, and glucocorticoid receptors, GR) that regulate negative feedback and thus mediate the effects of glucocorticoids (Liu et al., 1997). Stressful early conditions increase circulating glucocorticoids and alter levels of glucocorticoid receptors in the hippocampus, which may subsequently lead to spatial memory deficits (Banerjee, Arterbery, Fergus, & Adkins-Regan, 2012; Hodgson et al., 2007; Pravosudov & Kitaysky, 2006). In zebra finches, offspring that were deprived of maternal care had a more exaggerated stress response to social isolation and fewer MR were observed across the brain, including within the hippocampus (Banerjee et al., 2012). Similarly, a selectively bred line of zebra finches that had high CORT in response to acute stress were found to have worse performance on a spatial memory task and fewer MR within the hippocampus (Hodgson et al., 2007). 
MR are thought to preserve neuronal integrity and excitatory tone within the hippocampus, and therefore a decrease in their number could compromise cognitive processes specific to the hippocampus (Joëls, Karst, DeRijk, & de Kloet, 2008; Joëls, 2008).

Food-caching birds provide a fruitful model for future work, as there are many established experimental protocols for assessing a variety of aspects of spatial memory and its relation to the hippocampus. Clayton and colleagues have studied the various ways scrub-jays (Aphelocoma coerulescens) and western scrub-jays cache their food and convincingly demonstrate that these birds integrate temporal, contextual, and social information in memory (Clayton, Dally, & Emery, 2007; Clayton & Dickinson, 1998). We know that developmental stress impairs hippocampus development and spatial memory in this species, but how might such stress affect temporal, semantic, and social memory? These and other cognitive processes, such as emotional processing and memory formation, may also depend on hippocampus function (Eichenbaum, 1996). In addition, while the volume of the hippocampus may reflect spatial performance, there could be additional aspects aside from volume that contribute to spatial and other forms of cognition. Hippocampal volume is only a proxy for spatial cognition, as variables such as age, captivity, and seasonal effects can alter hippocampal volume (Roth, Brodin, Smulders, LaDage, & Pravosudov, 2010). Examining small-scale changes, such as the integration of newly generated neurons within the hippocampus, may provide a more sensitive measure of variation in spatial memory ability. Future research examining the effects of developmental stress should include variables that capture neuron integration, such as neuron proliferation, neuron type/size, glial counts/size, dendritic branching, and length of branching (Roth, Brodin, et al., 2010). Moreover, examining differences in gene expression for MR and GR may be of importance, as the hippocampus is an extrahypothalamic site for negative feedback of the HPA axis (Welberg & Seckl, 2001).
Differences in these small-scale measures may not be reflected in the larger-scale measure of hippocampal volume and will further our understanding of how developmental stress may affect the brain. Developmental stress affects hippocampal development and spatial cognition, but future efforts are required both to understand the neural mechanisms behind these effects and to clarify how other memory systems may be affected.

Personality/Behavioral Syndromes

Behavioral syndromes can be defined as “a suite of correlated behaviors reflecting between-individual consistency in behavior across multiple (two or more) observations” (Sih & Bell, 2008, p. 231). Correlated behaviors should maintain a consistent and stable relationship, which is to say they should not change in the face of transient factors (e.g., motivation), but can change based on life history stages and social contexts (Groothuis & Carere, 2005; Schuett & Dall, 2009; van Oers, 2005). The definition of behavioral syndromes is inclusive and subsumes other similar, but not synonymous terms, such as coping styles, personality, and temperament (Sih & Bell, 2008; Stamps & Groothuis, 2010a). Consequently, we consider studies using any of the aforementioned terms as studies of behavioral syndromes.

The suite of correlated traits that comprise behavioral syndromes in songbirds is currently not well understood. So far, the most comprehensive studies of avian behavioral syndromes have been undertaken in great tits (Parus major). However, we must be cautious about assuming that these findings are applicable to other birds. In great tits, the terms reactive and proactive have been used to describe birds that exhibit slow exploration of novel environments and increased latency to investigate a novel object, and fast exploration of a novel environment and reduced latency to investigate a novel object, respectively (reviewed in Groothuis & Carere, 2005). Compared to reactive birds, proactive birds are consistently more aggressive, socially dominant, less behaviorally flexible, and secrete less CORT in response to social and restraint stress (Baugh et al., 2012; Carere, Drent, Privitera, Koolhaas, & Groothuis, 2005; Carere, Groothuis, Möstl, Daan, & Koolhaas, 2003; van Oers, Drent, de Goede, & van Noordwijk, 2004; Verbeek, Boon, & Drent, 1996; Verbeek, Drent, & Wiepkema, 1994). These different personality types can be influenced by genetic and nongenetic factors (Carere, Drent, Koolhaas, & Groothuis, 2005; Drent, van Oers, & van Noordwijk, 2003; Stamps & Groothuis, 2010b; van Oers et al., 2004). Studies on birds such as zebra finches and black-capped chickadees (Poecile atricapillus) also find consistent individual differences in exploratory behaviors (An, Kriengwatana, Newman, & MacDougall-Shackleton, 2011; Beauchamp, 2000; David, Auclair, & Cézilly, 2011; Krause & Naguib, 2011; Schuett & Dall, 2009). In zebra finches, selection for physiological responsiveness to stress is also associated with differences in exploration. Specifically, in zebra finch lines artificially selected for low and high responses to acute restraint stress, greater exploratory behavior was linked to higher CORT only in the low CORT line (Martins, Roberts, Giblin, Huxham, & Evans, 2007). 
While the work in great tits has been extremely influential for understanding avian behavioral syndromes, we must be careful when generalizing across species because relationships between traits may not exist or may be different in other species. For example, exploration and boldness were not correlated in zebra finches or in a non-songbird (Japanese quail; Coturnix japonica; Martins et al., 2007; Zimmer, Boogert, & Spencer, 2013), and learning an acoustic discrimination task was positively correlated with exploratory behavior in black-capped chickadees but not in great tits (Groothuis & Carere, 2005; Guillette, Reddon, Hurd, & Sturdy, 2009).

Few studies in birds have investigated the impact of developmental experiences on behavioral syndromes, although their importance to behavioral syndromes is gaining recognition (Groothuis & Trillmich, 2011; Stamps & Groothuis, 2010a, 2010b; Trillmich & Hudson, 2011). As explained in detail earlier, stressors during development can affect diverse physiological and behavioral traits and consequently may influence behavioral syndromes if they can affect the strength and direction of the correlation between these traits (Spencer & MacDougall-Shackleton, 2011). Data from the studies available do not yet clearly establish how developmental stress affects behavioral syndromes. Among others, one important variable that is inconsistent between studies is the timing of developmental stress and the time at which behaviors are assessed. The timing of stress during development is an important source of variation in offspring behavior, with both prenatal and postnatal stress (as well as the time within those stages) having potentially different behavioral outcomes (Boogert, Zimmer, & Spencer, 2013; Henriksen, Rettenbacher, & Groothuis, 2011; Krause et al., 2009; Kriengwatana, 2013; but see Zimmer et al., 2013). Furthermore, the time at which behaviors are assessed is also important because behavioral variation between individuals may change across the lifespan, thus making it difficult to detect covariation (Sih & Bell, 2008). Below we discuss separately the studies that investigate prenatal and postnatal stress.

Prenatal Stress

Prenatal stress is assumed to affect offspring behaviors through altering maternal steroid hormone deposition in eggs, decreasing maternal investment during egg formation, or affecting maternal care behaviors such as incubation. In a comprehensive review of prenatal stress in birds, Henriksen et al. (2011) noted that maternal stress (via CORT injections in the female or unpredictability of feeding) reduced offspring competitiveness, but that this result was not always observed if CORT was injected directly into the eggs. The effects of CORT injections into eggs on fearfulness and anxiety are mixed, and it is not clear whether these measures can be treated as measures of boldness and exploration, or whether they showed individual consistency and inter-individual variation and can thus be deemed components of a behavioral syndrome (for a discussion regarding fearfulness as an aspect of behavioral syndromes see Cockrem, 2007). One study in Japanese quail that injected CORT into eggs and measured both boldness and exploration found that prenatal stress increased exploration but not boldness (Zimmer et al., 2013). Moreover, birds in this study that received both prenatal and postnatal stress treatments tended to be the most explorative and risk taking compared to birds that received pre- or postnatal stress treatments or the control treatment. This suggests a cumulative effect of pre- and postnatal stress on measures of behavioral syndromes. Nevertheless, more studies are needed before generalizations can be made about the effects of prenatal stress on behavioral syndromes. In addition to manipulating CORT, future studies should also consider whether incubation temperature could alter offspring behavior, as previous work shows that it can affect a variety of physiological measures (Henriksen et al., 2011).
In addition, it will be important to explore variation between species that produce altricial versus precocial young, as the physiological systems that develop prenatally in ovo will markedly differ. Recent work from our lab (H. Wada, unpublished) found that manipulations of incubation temperature in zebra finches (that produce altricial young) had very different effects than those reported for wood ducks (that produce precocial young; Durant, Hepp, Moore, Hopkins, & Hopkins, 2010). Thus, the distinction between pre- and postnatal manipulations will vary for species with different developmental schedules.

Postnatal Stress

Sources of postnatal stress include food availability, sibling competition, parental favoritism, and disease/parasitism. Investigations of the effect of early postnatal stress on components of behavioral syndromes are shown in Table 1. The strongest evidence that developmental stress can affect behavioral syndromes comes from a study that found that reducing food intake of great tit nestlings increased exploration and boldness (Carere, Drent, Koolhaas, et al., 2005). Importantly, food rationing increased aggression in great tits artificially selected to be fast explorers, indicating that postnatal developmental stress was able to alter the relationship between behavioral traits that are part of a well-established behavioral syndrome. However, increased boldness and exploration were also observed in two other studies where birds may have experienced less developmental stress. Naguib, Flörcke, and van Oers (2011) found that great tits that experienced less sibling competition were faster to investigate novel environments and objects. Arnold et al. (2007) also found that blue tits (Cyanistes caeruleus) were bolder if they had been supplemented with taurine (an inhibitor of HPA axis activity; Engelmann et al., 2003) as nestlings. The discrepancy between the studies above may be due to the different manipulations used to alter developmental conditions. More research is needed to clarify how different stressors may produce different effects on personality traits.

Table 1. A summary of studies of postnatal stress and the effects on components of behavioral syndromes. We distinguish between exploration and boldness as tests that measured behaviors in a novel environment, and toward a novel object, respectively. Values for the duration of treatment and approximate age of testing represent days post-hatch.

In zebra finches there are methodological, age, and sex-specific effects on the development of behavioral phenotypes. Administration of postnatal CORT has had contradictory effects, with one study reporting decreases in boldness (males only) and competitiveness (both sexes; CORT administered PHD 7–18; Spencer & Verhulst, 2007), and another reporting no change in boldness in either sex (CORT administered PHD 12–28; Donaldson, 2009). However, diet manipulations during a similar developmental time period generated different results. Krause et al. (2009) found that reducing diet quality increased exploration (females only; diet manipulation from PHD 1–17); however, the same manipulation for a longer duration had no effect on exploration in males or females (Krause & Naguib, 2014; PHD 3–35). Similarly, Donaldson (2009) found no effect of reducing dietary protein on boldness in either sex. Instead, Donaldson (2009) reported that inconsistency of treatment (e.g., a switch from high to low protein diets and vice versa) rather than the diet itself decreased boldness in both sexes, although this result was nonsignificant (p = 0.052). This suggests that environmental instability, rather than stressful early environments per se, may mediate the effects of developmental stress on personality. This hypothesis has received mixed support. In support, Krause and Naguib (2011) found that accelerated catch-up growth resulting from alleviation of nutritional stress was negatively correlated with exploration in males and females, but early nutritional stress itself did not affect exploration. In opposition, Kriengwatana et al. (2015) reported no effect of nutritional stress (via food accessibility) or constancy of nutritional conditions between PHD 5–61 on boldness in adult birds.

Despite some conflicting results, the studies above raise three important observations. First, findings that postnatal stress can decrease boldness (Spencer & Verhulst, 2007) and increase exploration (Krause et al., 2009) suggest that boldness and exploration may not be correlated in zebra finches (Martins et al., 2007). Alternatively, postnatal stress may not have affected both boldness and exploration because of the different types of stress experienced (i.e., CORT administration versus nutritional stress). An experiment that manipulates postnatal stress and measures both boldness and exploration in the same individuals is needed to evaluate these possibilities. Second, different types of stress during development interact with sex to differentially affect exploration and boldness in males and females. Males may be very sensitive to increased CORT during the first days post-hatch (Spencer & Verhulst, 2007), whereas both males and females may be similarly sensitive to unavailability of dietary protein before they reach nutritional independence (around PHD 35; Donaldson, 2009). Third, these results highlight that the effects of developmental stress on behavioral phenotypes are contingent on the age or life stage at which behaviors are assessed. Boldness and exploration are consistent in zebra finches if the tests are repeated within the same day, the next day, or the following week (David et al., 2011; Krause & Naguib, 2011; Schuett & Dall, 2009), yet over the long term these traits may change (Donaldson, 2009). The lack of correlations between behaviors at different life stages may reflect less behavioral variability of a species in general at a certain age, or be caused by testing for behaviors in contexts that do not sufficiently reveal underlying inter-individual variability (Sih & Bell, 2008).

In summary, both prenatal and postnatal developmental stress can alter behaviors that constitute behavioral syndromes, but further investigation is necessary to determine whether stress can change the correlations between traits. Further studies are also required to determine whether the influence of developmental stress on behavioral syndromes is limited to early life: the only study that manipulated stress after nutritional independence found that it had no effect on boldness (Kriengwatana et al., 2015). Another aspect that warrants further investigation is how developmental stress differentially affects behavioral syndromes in males and females. Developmental stress may have sex-specific effects because the sexes may respond to stress differently according to their life history strategies, and already there is some evidence of developmental stress producing sex differences in boldness and/or exploration (e.g., Arnold et al., 2007; Donaldson, 2009; Spencer & Verhulst, 2007). Because males seem to be more consistent in exploration and boldness than females (Donaldson, 2009; Schuett & Dall, 2009), females may be more behaviorally plastic than males. Last, because developmental stress can have such diverse effects, it would be beneficial to assess the relationship of boldness, exploration, and aggression with other behaviors that may be affected by stress, such as begging rates, song, and learning ability (Arnold et al., 2007; Brust, Krüger, Naguib, & Krause, 2014; Carere, Drent, Koolhaas, et al., 2005; Garamszegi, Eens, & Török, 2008; Groothuis & Carere, 2005). Studies in this direction would address how developmental experiences, by altering behavioral tendencies, can affect behavioral plasticity.

Conclusion

Here we have reviewed the evidence to date for the developmental stress hypothesis and how we can extend its underlying principles to the study of other cognitive traits and behavioral measures. Cognitive and behavioral traits that are influenced by environmental conditions could be signaled through song quality, conveying information to listeners about how well an individual coped with stressful early-life conditions and/or their heritable developmental stability. Although the developmental stress hypothesis has generated a substantial body of work, there are still areas where current research falls short and where concentrated efforts are needed.

The predictions of the developmental stress hypothesis have been supported in multiple species using a variety of manipulations. However, not all manipulations have yielded the same effects on song (reviewed in Spencer & MacDougall-Shackleton, 2011), and therefore more research is needed to understand the mechanisms by which developmental stress operates. As we alluded to throughout the review, CORT is a likely candidate by which stress affects the brain. Recent research has found that there are receptors for corticosterone in the song-control system (Suzuki, Matsunaga, Kobayashi, & Okanoya, 2011), and therefore corticosterone is a likely vehicle by which stress alters song learning. Still, many questions remain unanswered. For instance, how does CORT affect cellular processing, neuronal migration, or connectivity between neural circuits? And do these CORT-induced neural changes lead to observable behavioral changes in song? Another important factor to consider is sex-specific differences in the response to developmental stress. Although it is the male of the species that typically sings, how stress affects females’ perception, preference, and choice for male song is a crucial component of the evolutionary equation. A better understanding of what benefits are bestowed upon a female when she chooses a male will go a long way to understanding the evolution of this sexually selected cognitive trait.

We emphasize that most work to date regarding the correlation across cognitive traits and birdsong is equivocal. However, the field is still in its infancy and future studies should strive to use validated psychometric tests, adequate sample sizes, and knowledge about the developmental history of their subjects. It is easier to assess cognitive abilities with localized neural structures, such as spatial cognition and the hippocampus. As reviewed above, there are many excellent songbird models for studying hippocampal functioning that are rich sources for future studies linking song with various aspects of memory. Designing future experiments around tasks that assess known cognitive processes and underlying neural structures is a necessary step to further our understanding of the relationship between song and specific forms of cognition. For example, consider the arcopallium, a region homologous to the mammalian amygdala (Abellán, Legaz, Vernier, Rétaux, & Medina, 2009) that regulates fear learning and is sensitive to CORT (Brown, Woolston, & Frol, 2008; Cohen, 1975). Differences in the volume of the arcopallium between two black-capped chickadee populations could potentially explain the differences in problem-solving abilities and neophobia responses also observed between these two populations (Roth, Gallagher, LaDage, & Pravosudov, 2012; Roth, LaDage, & Pravosudov, 2010). Examining how developmental stress may affect the functioning of this area, and subsequent associative fear learning, could be one example of a future study assessing behavior with known neural correlates.

Although we have discussed personality and cognitive traits separately, they should not be thought of as independent of each other. It is apparent that early developmental conditions can shape an organism’s phenotype, but more work is needed to understand how such changes could give rise to persistent behavioral strategies across a variety of contexts. The timing of developmental stress and the temporal overlap between periods when traits are most sensitive to environmental influences are key factors that could explain the relationship between personality and cognitive abilities.

In conclusion, to understand cognition and the relationship among various cognitive traits, it is imperative that we have knowledge of an individual’s developmental history. This is because developmental events, especially stressful ones, can have persistent effects on the function of various cognitive traits that carry over into adulthood. The developmental stress hypothesis provides a powerful framework to synthesize findings across the fields of developmental and cognitive research. While the hypothesis focuses on explaining variation within birdsong, its central tenets can be applied to other aspects of an individual’s condition, and the principle in general can be applied to other animal systems.

References

Abellán, A., Legaz, I., Vernier, B., Rétaux, S., & Medina, L. (2009). Olfactory and amygdalar structures of the chicken ventral pallium based on the combinatorial expression patterns of LIM and other developmental regulatory genes. The Journal of Comparative Neurology, 516(3), 166–186. doi:10.1002/cne.22102

An, Y. S., Kriengwatana, B., Newman, A. E., MacDougall-Shackleton, E. A., & MacDougall-Shackleton, S. A. (2011). Social rank, neophobia and observational learning in black-capped chickadees. Behaviour, 148(1), 55–69. doi:10.1163/000579510X545829

Andersen, S. L., & Teicher, M. H. (2008). Stress, sensitive periods and maturational events in adolescent depression. Trends in Neurosciences, 31(4), 183–191. doi:10.1016/j.tins.2008.01.004

Arnold, K. E., Ramsay, S. L., Donaldson, C., & Adam, A. (2007). Parental prey selection affects risk-taking behaviour and spatial learning in avian offspring. Proceedings of The Royal Society B: Biological Sciences, 274(1625), 2563–2569. doi:10.1098/rspb.2007.0687

Bailey, D. J., Wade, J., & Saldanha, C. J. (2009). Hippocampal lesions impair spatial memory performance, but not song—a developmental study of independent memory systems in the zebra finch. Developmental Neurobiology, 69(8), 491–504. doi:10.1002/dneu.20713

Banerjee, S. B., Arterbery, A. S., Fergus, D. J., & Adkins-Regan, E. (2012). Deprivation of maternal care has long-lasting consequences for the hypothalamic-pituitary-adrenal axis of zebra finches. Proceedings of The Royal Society B: Biological Sciences, 279(1729), 759–766. doi:10.1098/rspb.2011.1265

Baugh, A. T., Schaper, S. V., Hau, M., Cockrem, J. F., de Goede, P., & van Oers, K. (2012). Corticosterone responses differ between lines of great tits (Parus major) selected for divergent personalities. General and Comparative Endocrinology, 175(3), 488–494. doi:10.1016/j.ygcen.2011.12.012

Beauchamp, G. (2000). Individual differences in activity and exploration influence leadership in pairs of foraging zebra finches. Behaviour, 137, 301–314. doi:10.1163/156853900502097

Bennett, A. T. D., Cuthill, I. C., Partridge, J. C., & Maier, E. J. (1996). Ultraviolet vision and mate choice in zebra finches. Nature, 380(4), 433–435. doi:10.1038/380433a0

Bernard, D. J., Eens, M., & Ball, G. F. (1996). Age- and behavior-related variation in volumes of song control nuclei in male European starlings. Journal of Neurobiology, 30(3), 329–339. doi:10.1002/(SICI)1097-4695(199607)30:3<329::AID-NEU2>3.0.CO;2-6

Boogert, N. J., Anderson, R. C., Peters, S., Searcy, W. A., & Nowicki, S. (2011). Song repertoire size in male song sparrows correlates with detour reaching, but not with other cognitive measures. Animal Behaviour, 81(6), 1209–1216. doi:10.1016/j.anbehav.2011.03.004

Boogert, N. J., Bui, C., Howarth, K., Giraldeau, L.-A., & Lefebvre, L. (2010). Does foraging behaviour affect female mate preferences and pair formation in captive zebra finches? PloS One, 5(12), e14340. doi:10.1371/journal.pone.0014340

Boogert, N. J., Fawcett, T. W., & Lefebvre, L. (2011). Mate choice for cognitive traits: A review of the evidence in nonhuman vertebrates. Behavioral Ecology, 22(3), 447–459. doi:10.1093/beheco/arq173

Boogert, N. J., Giraldeau, L.-A., & Lefebvre, L. (2008). Song complexity correlates with learning ability in zebra finch males. Animal Behaviour, 76(5), 1735–1741. doi:10.1016/j.anbehav.2008.08.009

Boogert, N. J., Zimmer, C., & Spencer, K. (2013). Pre- and post-natal stress have opposing effects on social information use. Biology Letters, 9(2), 20121088. doi:10.1098/rsbl.2012.1088

Brainard, M. S., & Doupe, A. J. (2002). What songbirds teach us about learning. Nature, 417(6886), 351–358. doi:10.1038/417351a

Brown, E. S., Woolston, D. J., & Frol, A. B. (2008). Amygdala volume in patients receiving chronic corticosteroid therapy. Biological Psychiatry, 63(7), 705–709. doi:10.1016/j.biopsych.2007.09.014

Brumm, H., Zollinger, S. A., & Slater, P. J. B. (2009). Developmental stress affects song learning but not song complexity and vocal amplitude in zebra finches. Behavioral Ecology and Sociobiology, 63(9), 1387–1395. doi:10.1007/s00265-009-0749-y

Brust, V., Krüger, O., Naguib, M., & Krause, E. T. (2014). Lifelong consequences of early nutritional conditions on learning performance in zebra finches (Taeniopygia guttata). Behavioural Processes, 103. doi:10.1016/j.beproc.2014.01.019

Buchanan, K. L., Grindstaff, J. L., & Pravosudov, V. V. (2013). Condition dependence, developmental plasticity, and cognition: Implications for ecology and evolution. Trends in Ecology & Evolution, 28(5), 290–296. doi:10.1016/j.tree.2013.02.004

Buchanan, K. L., Spencer, K., Goldsmith, A. R., & Catchpole, C. K. (2003). Song as an honest signal of past developmental stress in the European starling (Sturnus vulgaris). Proceedings of The Royal Society B: Biological Sciences, 270(1520), 1149–1156. doi:10.1098/rspb.2003.2330

Candolin, U. (2003). The use of multiple cues in mate choice. Biological Reviews of the Cambridge Philosophical Society, 78(4), 575–595. doi:10.1017/S1464793103006158

Carere, C., Drent, P., Koolhaas, J., & Groothuis, T. G. G. (2005). Epigenetic effects on personality traits: Early food provisioning and sibling competition. Behaviour, 142(9), 1329–1355. doi:10.1163/156853905774539328

Carere, C., Drent, P. J., Privitera, L., Koolhaas, J. M., & Groothuis, T. G. G. (2005). Personalities in great tits, Parus major: Stability and consistency. Animal Behaviour, 70(4), 795–805. doi:10.1016/j.anbehav.2005.01.003

Carere, C., Groothuis, T. G. G., Möstl, E., Daan, S., & Koolhaas, J. (2003). Fecal corticosteroids in a territorial bird selected for different personalities: Daily rhythm and the response to social stress. Hormones and Behavior, 43(5), 540–548. doi:10.1016/S0018-506X(03)00065-5

Catchpole, C. K. (1996). Song and female choice: Good genes and big brains? Trends in Ecology & Evolution, 11(9), 358–360. doi:10.1016/0169-5347(96)30042-6

Clayton, N. S. (1996). Development of food-storing and the hippocampus in juvenile marsh tits (Parus palustris). Behavioural Brain Research, 74(1–2), 153–159. doi:10.1016/0166-4328(95)00049-6

Clayton, N. S., Dally, J. M., & Emery, N. J. (2007). Social cognition by food-caching corvids. The western scrub-jay as a natural psychologist. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 362(1480), 507–522. doi:10.1098/rstb.2006.1992

Clayton, N. S., & Dickinson, A. (1998). Episodic-like memory during cache recovery by scrub jays. Nature, 395(6699), 272–274. doi:10.1038/26216

Cockrem, J. F. (2007). Stress, corticosterone responses and avian personalities. Journal of Ornithology, 148(S2), 169–178. doi:10.1007/s10336-007-0175-8

Cohen, D. H. (1975). Involvement of the avian amygdalar homologue (archistriatum posterior and mediale) in defensively conditioned heart rate change. The Journal of Comparative Neurology, 160(1), 13–35. doi:10.1002/cne.901600103

Cole, E. F., Morand-Ferron, J., Hinks, A. E. E., & Quinn, J. L. L. (2012). Cognitive ability influences reproductive life history variation in the wild. Current Biology, 22(19), 1808–1812. doi:10.1016/j.cub.2012.07.051

Cole, E. F., & Quinn, J. L. (2012). Personality and problem-solving performance explain competitive ability in the wild. Proceedings of The Royal Society B: Biological Sciences, 279(1731), 1168–1175. doi:10.1098/rspb.2011.1539

Collins, S. A., Hubbard, C., & Houtman, A. M. (1994). Female mate choice in the zebra finch—the effect of male beak color and male song. Behavioral Ecology and Sociobiology, 35(1), 21–25. doi:10.1007/BF00167055

Collins, S. A., & ten Cate, C. (1996). Does beak colour affect female preference in zebra finches? Animal Behaviour, 52(1), 105–112. doi:10.1006/anbe.1996.0156

Cotton, S., Small, J., & Pomiankowski, A. (2006). Sexual selection and condition-dependent mate preferences. Current Biology, 16(17), R755–R765. doi:10.1016/j.cub.2006.08.022

David, M., Auclair, Y., & Cézilly, F. (2011). Personality predicts social dominance in female zebra finches, Taeniopygia guttata, in a feeding context. Animal Behaviour, 81(1), 219–224. doi:10.1016/j.anbehav.2010.10.008

de Kogel, C. H., & Prijs, H. J. (1996). Effects of brood size manipulations on sexual attractiveness of offspring in the zebra finch. Animal Behaviour, 51, 699–708. doi:10.1006/anbe.1996.0073

Deary, I. J., Penke, L., & Johnson, W. (2010). The neuroscience of human intelligence differences. Nature Reviews Neuroscience, 11(3), 201–211. doi:10.1038/nrn2793

Donaldson, C. (2009). Post-natal environmental effects on behaviour in the zebra finch (Taeniopygia guttata). (Doctoral dissertation, University of Glasgow, UK, 2009). Id: glathesis:2009-937. http://theses.gla.ac.uk/id/eprint/937

Drent, P. J., van Oers, K., & van Noordwijk, A. J. (2003). Realized heritability of personalities in the great tit (Parus major). Proceedings of The Royal Society B: Biological Sciences, 270(1510), 45–51. doi:10.1098/rspb.2002.2168

Dunn, J. C., Cole, E. F., & Quinn, J. L. (2011). Personality and parasites: Sex-dependent associations between avian malaria infection and multiple behavioural traits. Behavioral Ecology and Sociobiology, 65(7), 1459–1471. doi:10.1007/s00265-011-1156-8

Durant, S. E., Hepp, G. R., Moore, I. T., Hopkins, B. C., & Hopkins, W. A. (2010). Slight differences in incubation temperature affect early growth and stress endocrinology of wood duck (Aix sponsa) ducklings. The Journal of Experimental Biology, 213(1), 45–51. doi:10.1242/jeb.034488

Eichenbaum, H. (1996). Is the rodent hippocampus just for “place”? Current Opinion in Neurobiology, 6(2), 187–195. doi:10.1016/S0959-4388(96)80072-9

Engelmann, M., Landgraf, R., & Wotjak, C. T. (2003). Taurine regulates corticotropin secretion at the level of the supraoptic nucleus during stress in rats. Neuroscience Letters, 348(2), 120–122. doi:10.1016/S0304-3940(03)00741-9

Farrell, T. M., Weaver, K., An, Y.-S., & MacDougall-Shackleton, S. A. (2011). Song bout length is indicative of spatial learning in European starlings. Behavioral Ecology, 23(1), 101–111. doi:10.1093/beheco/arr162

Fisher, M. O., Nager, R. G., & Monaghan, P. (2006). Compensatory growth impairs adult cognitive performance. PLoS Biology, 4(8), e251. doi:10.1371/journal.pbio.0040251

Garamszegi, L. Z., Eens, M., & Török, J. (2008). Birds reveal their personality when singing. PloS One, 3(7), e2647. doi:10.1371/journal.pone.0002647

Gil, D., & Gahr, M. (2002). The honesty of bird song: Multiple constraints for multiple traits. Trends in Ecology & Evolution, 17(3), 133–141. doi:10.1016/S0169-5347(02)02410-2

Gil, D., Naguib, M., Riebel, K., Rutstein, A., & Gahr, M. (2006). Early condition, song learning, and the volume of song brain nuclei in the zebra finch (Taeniopygia guttata). Journal of Neurobiology, 66(14), 1602–1612. doi:10.1002/neu.20312

Groothuis, T. G. G., & Carere, C. (2005). Avian personalities: Characterization and epigenesis. Neuroscience and Biobehavioral Reviews, 29(1), 137–150. doi:10.1016/j.neubiorev.2004.06.010

Groothuis, T. G. G., & Trillmich, F. (2011). Unfolding personalities: The importance of studying ontogeny. Developmental Psychobiology, 53(6), 641–655. doi:10.1002/dev.20574

Guillette, L. M., Reddon, A. R., Hurd, P. L., & Sturdy, C. B. (2009). Exploration of a novel space is associated with individual differences in learning speed in black-capped chickadees, Poecile atricapillus. Behavioural Processes, 82(3), 265–270. doi:10.1016/j.beproc.2009.07.005

Hampton, R. R., & Shettleworth, S. J. (1996). Hippocampal lesions impair memory for location but not color in passerine birds. Behavioral Neuroscience, 110(4), 831–835. doi:10.1037/0735-7044.110.4.831

Hasselquist, D., Bensch, S., & von Schantz, T. (1996). Correlation between male song repertoire, extra-pair paternity and offspring survival in the great reed warbler. Nature, 381(6579), 229–232. doi:10.1038/381229a0

Healy, S. D. (2012). Animal cognition: The trade-off to being smart. Current Biology, 22(19), R840–R841. doi:10.1016/j.cub.2012.08.032

Heim, C., & Nemeroff, C. B. (2001). The role of childhood trauma in the neurobiology of mood and anxiety disorders: Preclinical and clinical studies. Biological Psychiatry, 49(12), 1023–1039. doi:10.1016/S0006-3223(01)01157-X

Heim, C., Shugart, M., Craighead, W. E., & Nemeroff, C. B. (2010). Neurobiological and psychiatric consequences of child abuse and neglect. Developmental Psychobiology, 52(7), 671–690. doi:10.1002/dev.20494

Henriksen, R., Rettenbacher, S., & Groothuis, T. G. G. (2011). Prenatal stress in birds: Pathways, effects, function and perspectives. Neuroscience and Biobehavioral Reviews, 35(7), 1484–1501. doi:10.1016/j.neubiorev.2011.04.010

Hernandez, A. M., & MacDougall-Shackleton, S. A. (2004). Effects of early song experience on song preferences and song control and auditory brain regions in female house finches (Carpodacus mexicanus). Journal of Neurobiology, 59(2), 247–258. doi:10.1002/neu.10312

Hodgson, Z. G., Meddle, S. L., Roberts, M. L., Buchanan, K. L., Evans, M. R., Metzdorf, R., et al. (2007). Spatial ability is impaired and hippocampal mineralocorticoid receptor mRNA expression reduced in zebra finches (Taeniopygia guttata) selected for acute high corticosterone response to stress. Proceedings of The Royal Society B: Biological Sciences, 274(1607), 239–245. doi:10.1098/rspb.2006.3704

Holveck, M.-J., Geberzahn, N., & Riebel, K. (2011). An experimental test of condition-dependent male and female mate choice in zebra finches. PloS One, 6(8), 1–10. doi:10.1371/journal.pone.0023974

Holveck, M.-J., & Riebel, K. (2010). Low-quality females prefer low-quality males when choosing a mate. Proceedings of The Royal Society B: Biological Sciences, 277(1678), 153–160. doi:10.1098/rspb.2009.1222

Holveck, M.-J., Vieira de Castro, A. C., Lachlan, R. F., ten Cate, C., & Riebel, K. (2008). Accuracy of song syntax learning and singing consistency signal early condition in zebra finches. Behavioral Ecology, 19(6), 1267–1281. doi:10.1093/beheco/arn078

Isden, J., Panayi, C., Dingle, C., & Madden, J. (2013). Performance in cognitive and problem-solving tasks in male spotted bowerbirds does not correlate with mating success. Animal Behaviour, 86(4), 829–838. doi:10.1016/j.anbehav.2013.07.024

Joëls, M. (2008). Functional actions of corticosteroids in the hippocampus. European Journal of Pharmacology, 583, 312–321. doi:10.1016/j.ejphar.2007.11.064

Joëls, M., Karst, H., DeRijk, R., & de Kloet, E. R. (2008). The coming out of the brain mineralocorticoid receptor. Trends in Neurosciences, 31(1), 1–7. doi:10.1016/j.tins.2007.10.005

Keagy, J., Savard, J.-F., & Borgia, G. (2009). Male satin bowerbird problem-solving ability predicts mating success. Animal Behaviour, 78(4), 809–817. doi:10.1016/j.anbehav.2009.07.011

Keagy, J., Savard, J.-F., & Borgia, G. (2011a). Cognitive ability and the evolution of multiple behavioral display traits. Behavioral Ecology, 23(2), 448–456. doi:10.1093/beheco/arr211

Keagy, J., Savard, J.-F., & Borgia, G. (2011b). Complex relationship between multiple measures of cognitive ability and male mating success in satin bowerbirds, Ptilonorhynchus violaceus. Animal Behaviour, 81(5), 1063–1070. doi:10.1016/j.anbehav.2011.02.018

Kitaysky, A., Kitaiskaia, E., Wingfield, J., & Piatt, J. (2001). Dietary restriction causes chronic elevation of corticosterone and enhances stress response in red-legged kittiwake chicks. Journal of Comparative Physiology B: Biochemical, Systemic, and Environmental Physiology, 171(8), 701–709. doi:10.1007/s003600100230

Krause, E. T., Honarmand, M., Wetzel, J., & Naguib, M. (2009). Early fasting is long lasting: Differences in early nutritional conditions reappear under stressful conditions in adult female zebra finches. PloS One, 4(3), e5015. doi:10.1371/journal.pone.0005015

Krause, E. T., & Naguib, M. (2011). Compensatory growth affects exploratory behaviour in zebra finches, Taeniopygia guttata. Animal Behaviour, 81(6), 1295–1300. doi:10.1016/j.anbehav.2011.03.021

Krause, E. T., & Naguib, M. (2014). Effects of parental and own early developmental conditions on the phenotype in zebra finches (Taeniopygia guttata). Evolutionary Ecology, 28(2), 263–275. doi:10.1007/s10682-013-9674-7

Kriengwatana, B. (2013). Timing of developmental stress and phenotypic plasticity: Effects of nutritional stress at different developmental periods on physiological and cognitive-behavioral traits in the zebra finch (Taeniopygia guttata). (Doctoral dissertation, University of Western Ontario, Canada, 2013). University of Western Ontario—Electronic Thesis and Dissertation Repository. Paper 1469. http://ir.lib.uwo.ca/etd/1469

Kriengwatana, B., Farrell, T. M., Aitken S. D. T., Garcia, L., & MacDougall-Shackleton, S. A. (2015). Early-life nutritional stress affects associative learning and spatial memory but not performance on a novel object task. Behaviour, 152(2), 195-218. doi:10.1163/1568539X-00003239

Kriengwatana, B., Wada, H., Macmillan, A., & MacDougall-Shackleton, S. A. (2013). Juvenile nutritional stress affects growth rate, adult organ mass, and innate immune function in zebra finches (Taeniopygia guttata). Physiological and Biochemical Zoology, 86(6), 769–781. doi:10.1086/673260

Kriengwatana, B., Wada, H., Schmidt, K. L., Taves, M. D., Soma, K. K., & MacDougall-Shackleton, S. A. (2014). Effects of nutritional stress during different developmental periods on song and the hypothalamic-pituitary-adrenal axis in zebra finches. Hormones and Behavior, 65(3), 285–293. doi:10.1016/j.yhbeh.2013.12.013

Lapin, I. P. (2003). Neurokynurenines (Neky) as common neurochemical links of stress and anxiety. Advances in Experimental Medicine and Biology Volume, 527, 121–125. doi:10.1007/978-1-4615-0135-0_14

Lauay, C., Gerlach, N. M., Adkins-Regan, E., & DeVoogd, T. J. (2004). Female zebra finches require early song exposure to prefer high-quality song as adults. Animal Behaviour, 68(6), 1249–1255. doi:10.1016/j.anbehav.2003.12.025

Liu, D., Diorio, J., Tannenbaum, B., Caldji, C., Francis, D., Freedman, A., et al. (1997). Maternal care, hippocampal glucocorticoid receptors, and hypothalamic-pituitary-adrenal responses to stress. Science, 277(5332), 1659–1662. doi:10.1126/science.277.5332.1659

Lupien, S. J., McEwen, B. S., Gunnar, M. R., & Heim, C. (2009). Effects of stress throughout the lifespan on the brain, behaviour and cognition. Nature Reviews. Neuroscience, 10(6), 434–445. doi:10.1038/nrn2639

MacDonald, I. F., Kempster, B., Zanette, L., & MacDougall-Shackleton, S. A. (2006). Early nutritional stress impairs development of a song-control brain region in both male and female juvenile song sparrows (Melospiza melodia) at the onset of song learning. Proceedings of The Royal Society B: Biological Sciences, 273(1600), 2559–2564. doi:10.1098/rspb.2006.3547

MacDougall-Shackleton, S. A, Dindia, L., Newman, A. E. M., Potvin, D. A., Stewart, K. A., & MacDougall-Shackleton, E. A. (2009). Stress, song and survival in sparrows. Biology Letters, 5(6), 746–748. doi:10.1098/rsbl.2009.0382

MacDougall-Shackleton, S., & Spencer, K. (2012). Developmental stress and birdsong: Current evidence and future directions. Journal of Ornithology, 153(S1), 105–117. doi:10.1007/s10336-011-0807-x

Martins, T. L. F., Roberts, M. L., Giblin, I., Huxham, R., & Evans, M. R. (2007). Speed of exploration and risk-taking behavior are linked to corticosterone titres in zebra finches. Hormones and Behavior, 52(4), 445–453. doi:10.1016/j.yhbeh.2007.06.007

Metcalfe, N. B., & Monaghan, P. (2001). Compensation for a bad start: Grow now, pay later? Trends in Ecology & Evolution, 16(5), 254–260. doi:10.1016/S0169-5347(01)02124-3

Monaghan, P. (2008). Early growth conditions, phenotypic development and environmental change. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 363(1497), 1635–1645. doi:10.1098/rstb.2007.0011

Naguib, M., Flörcke, C., & van Oers, K. (2011). Effects of social conditions during early development on stress response and personality traits in great tits (Parus major). Developmental Psychobiology, 53(6), 592–600. doi:10.1002/dev.20533

Nowicki, S., Hasselquist, D., Bensch, S., & Peters, S. (2000). Nestling growth and song repertoire size in great reed warblers: Evidence for song learning as an indicator mechanism in mate choice. Proceedings of The Royal Society B: Biological Sciences, 267(1460), 2419–2424. doi:10.1098/rspb.2000.1300

Nowicki, S., Peters, S., & Podos, J. (1998). Song learning, early nutrition and sexual selection in songbirds. Integrative and Comparative Biology, 38(1), 179–190. doi:10.1093/icb/38.1.179

Nowicki, S., & Searcy, W. A. (2011). Are better singers smarter? Behavioral Ecology, 22(1), 10–11. doi:10.1093/beheco/arq081

Pfaff, J. A., Zanette, L., MacDougall-Shackleton, S. A., & MacDougall-Shackleton, E. A. (2007). Song repertoire size varies with HVC volume and is indicative of male quality in song sparrows (Melospiza melodia). Proceedings of The Royal Society B: Biological Sciences, 274(1621), 2035–2040. doi:10.1098/rspb.2007.0170

Pravosudov, V. V., & Kitaysky, A. S. (2006). Effects of nutritional restrictions during post-hatching development on adrenocortical function in western scrub-jays (Aphelocoma californica). General and Comparative Endocrinology, 145(1), 25–31. doi:10.1016/j.ygcen.2005.06.011

Pravosudov, V. V., Lavenex, P., & Omanska, A. (2005). Nutritional deficits during early development affect hippocampal structure and spatial memory later in life. Behavioral Neuroscience, 119(5), 1368–1374. doi:10.1037/0735-7044.119.5.1368

Pravosudov, V. V., & Lucas, J. R. (2001). A dynamic model of short-term energy management in small food-caching and non-caching birds. Behavioral Ecology, 12(2), 207–218. doi:10.1093/beheco/12.2.207

Riebel, K. (2000). Early exposure leads to repeatable preferences for male song in female zebra finches. Proceedings of The Royal Society B: Biological Sciences, 267(1461), 2553–2558. doi:10.1098/rspb.2000.1320

Riebel, K., Naguib, M., & Gil, D. (2009). Experimental manipulation of the rearing environment influences adult female zebra finch song preferences. Animal Behaviour, 78(6), 1397–1404. doi:10.1016/j.anbehav.2009.09.011

Roth, T. C., Brodin, A., Smulders, T. V., LaDage, L. D., & Pravosudov, V. V. (2010). Is bigger always better? A critical appraisal of the use of volumetric analysis in the study of the hippocampus. Philosophical Transactions of the Royal Society B: Biological Sciences, 365(1542), 915–931. doi:10.1098/rstb.2009.0208

Roth, T. C., Gallagher, C. M., LaDage, L. D., & Pravosudov, V. V. (2012). Variation in brain regions associated with fear and learning in contrasting climates. Brain, Behavior and Evolution, 79(3), 181–190. doi:10.1159/000335421

Roth, T. C., LaDage, L. D., & Pravosudov, V. V. (2010). Learning capabilities enhanced in harsh environments: A common garden approach. Proceedings of The Royal Society B: Biological Sciences, 277(1697), 3187–3193. doi:10.1098/rspb.2010.0630

Schew, W. A., & Ricklefs, R. E. (1998). Developmental plasticity. In J. M. Starck & R. E. Ricklefs (Eds.), Avian Growth and Development: Evolution Within the Altricial-Percocial Spectrum (pp. 288–304). Oxford: Oxford University Press.

Schmidt, K. L., MacDougall-Shackleton, E. A., & MacDougall-Shackleton, S. A. (2012). Developmental stress has sex-specific effects on nestling growth and adult metabolic rates but no effect on adult body size or body composition in song sparrows. The Journal of Experimental Biology, 215(18), 3207–3217. doi:10.1242/jeb.068965

Schmidt, K. L., MacDougall-Shackleton, E. A., Soma, K. K., & MacDougall-Shackleton, S. A. (2014). Developmental programming of the HPA and HPG axes by early-life stress in male and female song sparrows. General and Comparative Endocrinology, 196, 72–80. doi:10.1016/j.ygcen.2013.11.014

Schmidt, K. L., McCallum, E. S., MacDougall-Shackleton, E. A., & MacDougall-Shackleton, S. A. (2013). Early-life stress affects the behavioural and neural response of female song sparrows to conspecific song. Animal Behaviour, 85(4), 825–837. doi:10.1016/j.anbehav.2013.01.029

Schmidt, K. L., Moore, S. D., MacDougall-Shackleton, E. A., & MacDougall-Shackleton, S. A. (2013). Early-life stress affects song complexity, song learning and volume of the brain nucleus RA in adult male song sparrows. Animal Behaviour, 86(1), 25–35. doi:10.1016/j.anbehav.2013.03.036

Schuett, W., & Dall, S. R. X. (2009). Sex differences, social context and personality in zebra finches, Taeniopygia guttata. Animal Behaviour, 77(5), 1041–1050. doi:10.1016/j.anbehav.2008.12.024

Searcy, W. A. (1992). Song repertoire and mate choice in birds. American Zoologist, 32(1), 71–80. doi:10.1093/icb/32.1.71

Searcy, W. A., & Andersson, M. (1986). Sexual selection and the evolution of song. Annual Review of Ecology, Evolution and Systematics, 17, 507–533. doi:10.1146/annurev.es.17.110186.002451

Sewall, K. B., Soha, J. A., Peters, S., & Nowicki, S. (2013). Potential trade-off between vocal ornamentation and spatial ability in a songbird. Biology Letters, 9. doi:10.1098/rsbl.2013.0344

Sherry, D. F., & Vaccarino, A. L. (1989). Hippocampus and memory for food caches in black-capped chickadees. Behavioral Neuroscience, 103(2), 308–318. doi:10.1037/0735-7044.103.2.308

Shettleworth, S. J. (2009). Cognition, Evolution, and Behavior (2nd ed.). New York: Oxford University Press.

Sih, A., & Bell, A. M. (2008). Insight for behavioral ecology from behavioral syndromes. Advances in the Study of Behavior, 38, 227–281. doi:10.1016/S0065-3454(08)00005-3

Snowberg, L. K., & Benkman, C. W. (2009). Mate choice based on a key ecological performance trait. Journal of Evolutionary Biology, 22(4), 762–769. doi:10.1111/j.1420-9101.2009.01699.x

Spencer, K., Buchanan, K., Goldsmith, A., & Catchpole, C. (2003). Song as an honest signal of developmental stress in the zebra finch (Taeniopygia guttata). Hormones and Behavior, 44(2), 132–139. doi:10.1016/S0018-506X(03)00124-7

Spencer, K., Buchanan, K. L., Goldsmith, A. R., & Catchpole, C. K. (2004). Developmental stress, social rank and song complexity in the European starling (Sturnus vulgaris). Proceedings of The Royal Society B: Biological Sciences, 271 Suppl, S121–S123. doi:10.1098/rsbl.2003.0122

Spencer, K., & MacDougall-Shackleton, S. A. (2011). Indicators of development as sexually selected traits: The developmental stress hypothesis in context. Behavioral Ecology, 22(1), 1–9. doi:10.1093/beheco/arq068

Spencer, K., & Verhulst, S. (2007). Delayed behavioral effects of postnatal exposure to corticosterone in the zebra finch (Taeniopygia guttata). Hormones and Behavior, 51(2), 273–280. doi:10.1016/j.yhbeh.2006.11.001

Spencer, K., Wimpenny, J. H., Buchanan, K. L., Lovell, P. G., Goldsmith, A. R., & Catchpole, C. K. (2005). Developmental stress affects the attractiveness of male song and female choice in the zebra finch (Taeniopygia guttata). Behavioral Ecology and Sociobiology, 58(4), 423–428. doi:10.1007/s00265-005-0927-5

Stamps, J. A., & Groothuis, T. G. G. (2010a). The development of animal personality: Relevance, concepts and perspectives. Biological Reviews of the Cambridge Philosophical Society, 85(2), 301–325. doi:10.1111/j.1469-185X.2009.00103.x

Stamps, J. A., & Groothuis, T. G. G. (2010b). Developmental perspectives on personality: Implications for ecological and evolutionary studies of individual differences. Philosophical Transactions of the Royal Society B: Biological Sciences, 365(1560), 4029–4041. doi:10.1098/rstb.2010.0218

Sturdy, C. B., Phillmore, L. S., Sartor, J. J., & Weisman, R. G. (2001). Reduced social contact causes auditory perceptual deficits in zebra finches, Taeniopygia guttata. Animal Behaviour, 62(6), 1207–1218. doi:10.1006/anbe.2001.1864

Suzuki, K., Matsunaga, E., Kobayashi, T., & Okanoya, K. (2011). Expression patterns of mineralocorticoid and glucocorticoid receptors in Bengalese finch (Lonchura striata var. domestica) brain suggest a relationship between stress hormones and song-system development. Neuroscience, 194, 72–83. doi:10.1016/j.neuroscience.2011.07.073

Svanbäck, R., & Bolnick, D. I. (2007). Intraspecific competition drives increased resource use diversity within a natural population. Proceedings of The Royal Society B: Biological Sciences, 274(1611), 839–844. doi:10.1098/rspb.2006.0198

Templeton, C. N., Laland, K. N., & Boogert, N. J. (2014). Does song complexity correlate with problem-solving performance in flocks of zebra finches? Animal Behaviour, 92, 63–71. doi:10.1016/j.anbehav.2014.03.019

ten Cate, C., & Vos, D. R. (1999). Sexual imprinting and evolutionary processes in birds: A reassessment. Advances in the Study of Behavior, 28, 1–31. doi:10.1016/S0065-3454(08)60214-4

Thornton, A., Isden, J., & Madden, J. R. (2014). Toward wild psychometrics: Linking individual cognitive differences to fitness. Behavioral Ecology, 25(6), 1299-1301, 1–3. doi:10.1093/beheco/aru095

Thornton, A., & Lukas, D. (2012). Individual variation in cognitive performance: Developmental and evolutionary perspectives. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1603), 2773–2783. doi:10.1098/rstb.2012.0214

Trillmich, F., & Hudson, R. (2011). The emergence of personality in animals: The need for a developmental approach. Developmental Psychobiology, 53(6), 505–509. doi:10.1002/dev.20573

Tschirren, B., Rutstein, A. N., Postma, E., Mariette, M., & Griffith, S. C. (2009). Short- and long-term consequences of early developmental conditions: A case study on wild and domesticated zebra finches. Journal of Evolutionary Biology, 22(2), 387–395. doi:10.1111/j.1420-9101.2008.01656.x

Vallée, M., MacCari, S., Dellu, F., Simon, H., Le Moal, M., & Mayo, W. (1999). Long-term effects of prenatal stress and postnatal handling on age-related glucocorticoid secretion and cognitive performance: A longitudinal study in the rat. The European Journal of Neuroscience, 11(8), 2906–2916. doi:10.1046/j.1460-9568.1999.00705.x

van Oers, K. (2005). Context dependence of personalities: Risk-taking behavior in a social and a nonsocial situation. Behavioral Ecology, 16(4), 716–723. doi:10.1093/beheco/ari045

van Oers, K., Drent, P. J., de Goede, P., & van Noordwijk, A. J. (2004). Realized heritability and repeatability of risk-taking behaviour in relation to avian personalities. Proceedings of The Royal Society B: Biological Sciences, 271(1534), 65–73. doi:10.1098/rspb.2003.2518

Verbeek, M., Boon, A., & Drent, P. J. (1996). Exploration, aggressive behaviour and dominance in pair-wise confrontations of juvenile male great tits. Behaviour, 133, 945–963. doi:10.1163/156853996X00314

Verbeek, M. E. M., Drent, P. J., & Wiepkema, P. R. (1994). Consistent individual differences in early exploratory behaviour of male great tits. Animal Behaviour, 48(5), 1113–1121. doi:10.1006/anbe.1994.1344

Verhulst, S., Holveck, M.-J., & Riebel, K. (2006). Long-term effects of manipulated natal brood size on metabolic rate in zebra finches. Biology Letters, 2(3), 478–480. doi:10.1098/rsbl.2006.0496

Welberg, L. A. M., & Seckl, J. R. (2001). Prenatal stress, glucocorticoids and the programming of the brain. Journal of Neuroendocrinology, 13(2), 113–128. doi:10.1111/j.1365-2826.2001.00601.x

Wingfield, J. C., Maney, D. L., Breuner, C. W., Jacobs, J. D., Lynn, S., Ramenofsky, M., & Richardson, R. D. (1998). Ecological bases of hormone–behavior interactions: The “emergency life history stage.” Integrative and Comparative Biology, 38(1), 191–206. doi:10.1093/icb/38.1.191

Woodgate, J. L., Bennett, A. T. D., Leitner, S., Catchpole, C. K., & Buchanan, K. L. (2010). Developmental stress and female mate choice behaviour in the zebra finch. Animal Behaviour, 79(6), 1381–1390. doi:10.1016/j.anbehav.2010.03.018

Woodgate, J. L., Leitner, S., Catchpole, C. K., Berg, M. L., Bennett, A. T. D., & Buchanan, K. L. (2011). Developmental stressors that impair song learning in males do not appear to affect female preferences for song complexity in the zebra finch. Behavioral Ecology, 22(3), 566–573. doi:10.1093/beheco/arr006

Yang, J., Han, H., Cao, J., Li, L., & Xu, L. (2006). Prenatal stress modifies hippocampal synaptic plasticity and spatial learning in young rat offspring. Hippocampus, 436, 431–436. doi:10.1002/hipo.20181

Zann, R., & Cash, E. (2008). Developmental stress impairs song complexity but not learning accuracy in non-domesticated zebra finches (Taeniopygia guttata). Behavioral Ecology and Sociobiology, 62(3), 391–400. doi:10.1007/s00265-007-0467-2

Zimmer, C., Boogert, N. J., & Spencer, K. (2013). Developmental programming: Cumulative effects of increased pre-hatching corticosterone levels and post-hatching unpredictable food availability on physiology and behaviour in adulthood. Hormones and Behavior, 64(3), 494–500. doi:10.1016/j.yhbeh.2013.07.002

Volume 9: pp. 99–126


Where Apes and Songbirds Are Left Behind: A Comparative Assessment of the Requisites for Speech

Erin N. Colbert-White
University of Puget Sound

Michael C. Corballis
University of Auckland

Dorothy M. Fragaszy
University of Georgia


Abstract

A handful of mammalian and avian species can imitate speech (i.e., sounds perceived by humans as those comprising the human communication system of language). Of those species, even fewer are capable of using speech to communicate. While there has been no empirical comparison of nonhuman speech users, parrots are presumed to be the most prolific. In this review, we identify several anatomical, neurological, and sociobiological features shared by parrots and humans that could account for why parrots might emerge as the most advanced nonhuman speech users. Apes and temperate oscine songbirds, due to their phylogenetic similarity to humans and parrots, respectively, are also included in the comparison. We argue that while all four taxa share hemispheric asymmetry of communication areas and basic sociality, humans and parrots share three additional features that are not completely present in apes and songbirds. Specifically, apes, unlike songbirds, parrots, and humans, are not considered vocal learners and do not have sufficient respiratory control to support a speech stream, while parrots, humans, and apes demonstrate complex affiliative social behavior. Along with the above anatomical, neurological, and sociobiological traits, parrots’ affiliative long-term relationships, similar to those of humans, may help explain both groups’ ability to produce and use a wide variety of sounds. Thus, this paper extends parrot–human cognitive comparisons by introducing another similarity—that of complex affiliative relationships—as a possible explanation for why parrots can produce and use speech to communicate.

Keywords: speech; parrots; primates; language; songbirds

Author Note: Erin N. Colbert-White, Department of Psychology, University of Puget Sound, Tacoma, WA 98416; Michael C. Corballis, School of Psychology, University of Auckland, Auckland 1010, New Zealand; Dorothy M. Fragaszy, Department of Psychology, University of Georgia, Athens, GA 30602.

We acknowledge Patrick Murray, Gary Baker, and Marina Popkov for their help in the preparation of this manuscript.

Correspondence concerning this article should be addressed to Erin N. Colbert-White, Department of Psychology, University of Puget Sound, 1500 N. Warner St. #1046, Tacoma, WA 98416. E-mail: ecolbertwhite@pugetsound.edu


Speech is the vocalized form of language, whereby identifiable units of sound (phonemes) are combined to form more complex sounds with referential meaning (morphemes, words), which are in turn combined to form syntactic structures that can serve as descriptions about the world (phrases, sentences). Language can also be represented in other forms, including writing and sign language. Linguists generally agree that fully syntactic language, whether spoken, signed, or written, is unique to humans. Nevertheless, some nonhumans, notably parrots, are capable of producing identifiable renderings of spoken words, and even simple phrases, and using them communicatively. To this limited extent, at least, they may be said to be capable of speech, and in this review we use the term speech to cover referential speech-like vocal communication without any implication of syntactic structure.

Phylogenetically, humans and birds diverged some 300 million years ago (Burish, Kueh, & Wang, 2004). In contrast, hominids and chimpanzees (Pan troglodytes) diverged only 6 million years ago (Zollikofer et al., 2005). Though a common vocalization mechanism cannot be assumed, given these dates, chimpanzees would appear to be the most likely candidate for articulate production and use of speech, not birds. However, there is no indication that apes can articulate any sounds approximating words, precluding them from communicating with speech. Parrots, on the other hand, represent one of the most skilled of all nonhuman speech producers (e.g., Fitch, 2000b; see Pepperberg, 1999 for review).

Some mammalian species can mimic speech or speech-like sounds with varying levels of precision (e.g., harbor seals, Phoca vitulina, Ralls, Fiorelli, & Gish, 1985; one beluga whale, Delphinapterus leucas, Ridgway, Carder, Jeffries, & Todd, 2012; one Indian elephant, Elephas maximus indicus, Stoeger et al., 2012; see Janik & Slater, 1997 for others). Among birds, speech sound mimics include tuis (Prosthemadera novaeseelandiae, Whangarei Native Bird Recovery Centre, n.d.), corvid songbirds (e.g., Pica nuttalli, Noack, 1902), and sturnid songbirds (e.g., Sturnus vulgaris, West, Stroud, & King, 1983). Unlike almost every other nonhuman species, however, parrots can use speech that is identifiable as such to the human ear for communicative purposes. Studies of lab- and home-reared parrots have demonstrated the sophistication with which these birds are able to use speech, including referential comments about object properties and numbers, spontaneous recombination of syllables to produce new, arguably context-appropriate, words (e.g., Pepperberg, 1987, 1999, 2006, 2007), and predictable use of words across varying social contexts (e.g., Colbert-White, Covington, & Fragaszy, 2011). This curious similarity between humans’ and parrots’ speech ability is the subject of this discussion. Since we include the communicative function as part of our definition of speech, we exclude mere mimicry. Figure 1 provides a sample of natural vocal abilities, ranging from complex nonspeech communication systems, to speech mimicry, to the use of speech as a medium for language.

Figure 1. Spectrums illustrating differences in speech use and species-typical repertoires for a variety of animals. Exemplar species included in the figure were selected due to their frequent appearance in the literature, and do not necessarily generalize to all species within a given taxonomic order.

To date, great emphasis has been placed on comparing humans to extant apes to understand the speech faculty. Empirical work comparing human and ape vocal tract anatomy (e.g., Duchin, 1990; Kay, Cartmill, & Balow, 1998; Lieberman, Crelin, & Klatt, 1972) and neurobiology (e.g., Gannon, Holloway, Broadfield, & Braun, 1998; Sherwood, Broadfield, Holloway, Gannon, & Hof, 2003) has raised more questions than it has answered, as some point out (e.g., Lieberman & McCarthy, 2007). Comparisons between humans and songbirds, on the other hand, identify similarities in both neurobiology (e.g., Doupe & Kuhl, 1999; Kuhl, 2003; Teramitsu, Kudo, London, Geschwind, & White, 2004) and vocalization acquisition patterns (e.g., Doupe & Kuhl, 1999; Marler, 1970b), among others. Doupe and Kuhl (1999) and Jarvis (2004) provide extensive reviews of this literature. Yet, though parrots are highly adept at producing and using speech sounds to communicate, neurobiological and anatomical evidence comparing humans with parrots is limited (when considering the many papers comparing humans with songbirds). Furthermore, rather than African Grey parrots (Psittacus erithacus erithacus), which are renowned for speech use, comparisons between humans and parrots are frequently made using budgerigars (Melopsittacus undulatus, e.g., Jarvis & Mello, 2000; Tu & Dooling, 2012).

Figure 2. Diagram visualizing speech requisites shared by humans, apes, songbirds, and parrots, as well as those that are only shared by some of the animal groups. “Hemispheric asymmetry for communication” denotes asymmetrical size or volume of structures related to communication in either the left or right hemisphere. The term “basic sociality” refers to species that have frequent interaction with conspecifics, individual recognition of conspecifics, and extensive parental care; we define “complex sociality” as all features of basic sociality with the addition of the presence of discrete repertoire elements for affiliative nonsexual social interaction with conspecifics, social correlates of intelligence, and hierarchical relationships among group members. The figure reflects patterns for exemplar ape, songbird, and parrot species, but deviations and exceptions do exist. “H” = Feature possessed by humans; “P” = Feature possessed by parrots; “S” = Feature possessed by songbirds; “A” = Feature possessed by apes.

We posit here that there is no one unique characteristic that makes a species capable of speech. We argue that speech, defined here as both production and communicative use, instead requires a constellation of anatomical, neurological, and sociobiological features, many of which are possessed by species that can neither produce nor use speech (see Table 1). This constellation view is shared by others in the field (e.g., Fitch, 2000b; Wind, 1983). We begin by briefly outlining two now-debunked features previously considered to be necessary for the production of speech by humans. Next, we assess four groups—humans, parrots, apes (predominantly chimpanzees), and passerine songbirds—on features relevant to the speech faculty: basic sociality (i.e., frequent interaction with conspecifics, individual recognition of conspecifics, and extensive parental care), hemispheric asymmetry with a bias for communication areas, vocal learning, finely tuned respiratory control, and complex affiliative social behavior among conspecifics (i.e., discrete repertoire elements for affiliative nonsexual social interaction with conspecifics, social correlates of intelligence, and hierarchical relationships among group members).1 As shown in Figure 2, while songbirds and apes do share some of the features above, humans and parrots as a group possess all of the features. We hypothesize that these specific features are crucial to the speech faculty, and their presence may also relate to the many similarities identified between humans’ and parrots’ cognitive abilities (e.g., Pepperberg, 1999). Thus, by further researching parrots’ wild communication systems from a sociobiological perspective, we may uncover a new parallel to human language.

Features Irrelevant to Production of Speech

Before a species can learn to use speech to communicate, it must first be able to articulate the sounds. The speech faculty has been of interest to anatomists, linguists, neuroscientists, anthropologists, and psychologists alike, resulting in a variety of theories of the requisites of speech in humans. For example, Wind (1983) provided a detailed review of over 100 morphological, physiological, and behavioral features associated with the articulation of words. Davidson’s (2003) shorter list of necessary features included a shortened soft palate, a loss of epiglottic–soft palate lock-up, a narrow supralaryngeal vocal tract (SVT), an oropharyngeal tongue, and an anterior foramen magnum. Both Wind (1983) and Davidson (2003) created their respective lists by comparing modern humans with apes and hominid fossils to determine how and at which point modern humans were able to produce the necessary range of sounds for speech. A descended larynx and a 1:1 SVT ratio are two features discussed in great detail in the literature. However, as neither of these features is present in parrots (or devices like voice recorders, for that matter), they cannot be considered necessary for the articulation of speech. These features are nevertheless detailed here both to show how nonhumans fit into the speech faculty debate, and to introduce additional evidence pertaining to characteristics that are strongly associated with, but not necessary for, speech.

Table 1. Relevant Speech Production Characteristics Across Animal Groups

Descended Larynx

At birth, the human larynx is similar in location to that of other mammals (Lieberman, 1984). Beginning around three months of age, the larynx gradually descends and the laryngeal musculature develops until about age 6 (Greene & Mathieson, 1989; Lieberman, McCarthy, Hiiemae, & Palmer, 2001; Sasaki, Levine, Laitman, & Crelin, 1977). Fully articulated speech sounds are not achieved until after the second year, when the larynx and associated structures are fully developed (Laitman, Heimbuch, & Crelin, 1978).

In the late 1960s, researchers reconstructed a Neanderthal vocal tract to investigate its vocal production capabilities. According to Lieberman and Crelin (1971), the Neanderthal larynx was similar in location to that of a human infant or nonhuman primate. Given human infants’ inability to produce speech sounds, Lieberman and Crelin theorized that a descended larynx was one of the uniquely human features required for word production. Since that time, the accuracy of Lieberman and colleagues’ anatomical reconstructions has been criticized (e.g., Boë, Heim, Honda, & Maeda, 2002), calling into question the validity of their conclusions regarding the importance of a descended larynx. It is important to note, however, that such criticisms have been refuted by others (e.g., de Boer & Fitch’s 2010 response to Boë et al., 2002).

Today, a majority of researchers agree that possessing a descended larynx is not necessary for speech production (e.g., Fitch, 2000c)—indeed, speech is even possible following laryngectomy (Luchsinger & Arnold, 1965). Research during the late 19th century with preserved nonhuman animal specimens, along with work such as that of Lieberman and Crelin (1971), concluded that humans were the only animals with a descended larynx. Thirty years after Lieberman and Crelin, Fitch’s (2000a) X-ray studies demonstrated that there are species (e.g., dogs, Canis familiaris; goats, Capra hircus; pigs, Sus scrofa; and cotton-top tamarins, Saguinus oedipus) with larynxes that descend during loud vocalizations, some to a position similar to that of humans. Further, in red deer (Cervus elaphus; Fitch & Reby, 2001) and fallow deer (Dama dama; McElligott, Birrer, & Vannoni, 2006), males’ post-pubescent larynx is permanently descended.

As none of the above nonhuman species is capable of producing speech sounds, a descended larynx must be neither uniquely human nor necessary for speech production (Hauser, Chomsky, & Fitch, 2002). However, while human and chimpanzee neonates have similarly high-positioned larynxes at birth, it is only after the human larynx descends that infants can produce the full repertoire of speech sounds in the vocal code (Nishimura, Mikami, Suzuki, & Matsuzawa, 2003). Thus, though a descended larynx is not necessary or sufficient on its own, Nishimura et al.’s work demonstrates that it is an important pre-adaptation in the evolution of speech in humans. Others support such a conclusion (e.g., Fitch, 2000b; Hauser et al., 2002; Pulleyblank, 2008).

Finally, parrot speech debunks the theory that a descended larynx is necessary for speech articulation. Unlike mammals and reptiles, which use a larynx, birds use a syrinx to vocalize. The two structures are morphologically and functionally distinct. In particular, the position of the syrinx is much lower in the vocal tract, sitting at the fork of the bronchi. This location allows birds to produce two sounds simultaneously (Catchpole & Slater, 2008; Nottebohm, 1971). In addition to having the necessary anatomy to produce sounds that are perceived by humans as speech, African Grey parrots articulate many phonemes by employing the same anatomical structures (e.g., tongue, glottis) as humans (see Pepperberg, 2010 for review). Such proficiency in articulation occurs without a larynx, descended or not.

1:1 SVT Ratio

In a similar vein, some have speculated that speech requires the horizontal component of the vocal tract [SVTH, posterior oropharyngeal wall to the lips] to be equal in length to the vertical component [SVTV, vocal folds to the velum] (i.e., a 1:1 SVT ratio; Lieberman, 1984). As our human ancestors evolved, features such as the loss of large teeth resulted in face shortening. While the impetus for change in these features is unknown, the modifications contributed to the speech faculty, for example, by repositioning and enhancing the mobility of the tongue (Aiello & Dunbar, 1993; Lieberman et al., 1972) and lips (Liska, 1993) in the supralaryngeal pharyngeal cavity.

Like most nonhuman species, chimpanzees are incapable of speech production and possess an SVT ratio that is greater than 1:1 (Lieberman & McCarthy, 2007; Nishimura et al., 2003). Further, human infants are also unable to produce the range of sounds necessary for fully articulated speech until their larynxes descend to achieve a 1:1 ratio (Nishimura et al., 2003). The conclusions drawn regarding a 1:1 SVT ratio and the speech faculty have been supported for decades (e.g., Duchin, 1990; Lieberman & McCarthy, 2007).

While the 1:1 SVT ratio may have facilitated speech production in humans, Corballis (1991) points out that using morphology as a guide to determine a speaker’s vocal abilities can often be misleading and not generalizable. His example of the speech-mimicking mynah bird (Gracula religiosa, f. Sturnidae) provides another instance of birds throwing a proverbial monkey wrench in the list of characteristics said to be necessary for speech. Just by outward appearance, birds that mimic speech do not possess a flattened face, an oropharyngeal tongue, or most other face morphology features relevant to speech in humans. Additionally, because the syrinx sits so low in the trachea, a 1:1 SVT ratio is impossible (Catchpole & Slater, 2008; Nottebohm, 1971). Such findings could implicate a 1:1 SVT ratio as necessary for speech in humans only, but longitudinal MRI research with Japanese macaques (Macaca fuscata) concluded that the ratio and position of vocal tract anatomy in humans was probably not driven by speech requirements (Nishimura, Oishi, Suzuki, Matsuda, & Takahashi, 2008). Instead, Nishimura et al. argue the 1:1 SVT ratio may have arisen secondary to other factors.

Relevant Features Shared by All Four Groups

The four taxa under investigation in this review are distinct, and yet share two important features we argue to be relevant to speech: basic sociality and hemispheric asymmetry biased for vocalizations. While humans, parrots, apes, and songbirds are by no means the only taxa possessing these features, they are highlighted here as the most relevant characteristics linked to speech that are common to all four groups.

Basic Sociality

The first feature common to all four groups, basic sociality, is defined here as frequent interaction with conspecifics (primates, e.g., Dunbar, 1988; parrots, e.g., Seibert, 2006; songbirds, e.g., Robinson, Fernald, & Clayton, 2008); individual recognition of conspecifics (primates, e.g., Tomasello & Call, 1997; parrots, e.g., Farabaugh & Dooling, 1996; songbirds, e.g., Stoddard, 1996); and engagement in extensive parental care of young (primates, e.g., Zeveloff & Boyce, 1982; parrots, e.g., Bucher, 1983; songbirds, e.g., O’Connor, 1984). For humans, speech acquisition requires frequent social interaction, demonstrating the essential connection between the two (Kuhl, 2007). Social interaction is also important for parrots learning to produce and use speech (Pepperberg, 1992), as well as for parrots and songbirds learning species-specific vocalizations (Marler, 1970a; Nottebohm, 1972; Pepperberg, 1999).

Features of sociality have been used frequently as predictors of social species’ vocal repertoire complexity (e.g., Aiello & Dunbar, 1993; Marler, 1977; Marler & Mitani, 1988; Philips & Austad, 1990). One feature of sociality, group size, is considered by some to be a driving factor in why our human ancestors developed such a complex vocal communication system (e.g., Aiello & Dunbar, 1993; Dunbar, 2003). According to Aiello and Dunbar’s (1993) hypothesis, as the size of our human ancestors’ groups increased, maintaining social cohesion became difficult. In this scenario, speech served the function of “social grooming” from a distance when there were not enough hands or time to physically groom everyone.

Critics of Aiello and Dunbar’s (1993) hypothesis argued that social grooming served more of a hygienic function than a social bonding one. An earlier study addressed this criticism. Dunbar (1991) correlated grooming time with body weight and group size in 44 species of free-living primates and found that time spent grooming was more closely related to the size of the group in Pongo, Pan, and Gorilla than it was to body weight. Dunbar interpreted the finding as evidence that allogrooming has a primarily social function among great apes, which strengthens the connections among sociality, group size, and possibly the emergence of a more complex vocal code in our ancestors. Even in modern humans, a language’s vocabulary size increases as a function of complexity and industrialization of a society (e.g., Corballis, 1991; Diamond, 1959; Morton & Page, 1992). That is to say, the more individuals there are in the group, the larger the vocal repertoire, presumably because there is more to talk about with more individuals.

Despite the primate literature supporting the pattern of sociality and group size predicting vocal repertoire complexity, Blumstein and Armitage (1997) presented an exception in their comparison of alarm call repertoire size in multiple species of ground-dwelling squirrels that had a variety of social systems. The authors found that social complexity explained some, but not all, of the complexity of the repertoires. In light of this exception, the authors offered other possible predictors that could influence a species’ repertoire size, including facial and vocal tract morphology, physical or biological constraints by the habitat, and specific needs such as developing different escape patterns for different classes of predators.

Along with group size, the quality of social interaction among members of a group may also influence the size and complexity of the vocal repertoire (McCowan, Doyle, & Hanser, 2002; Morton & Page, 1992). Pinker (2003) proposed that language evolved in humans, not for social “grooming” purposes, but as a means to process increasingly complex social information related to who, what, when, where, and why. Though fundamentally Pinker was referring to language and not speech as we define it in this paper (i.e., speech sounds used for communicative purposes), sociality is certainly a common theme. Similarly, among birds, Salwiczek and Wickler (2004) noted that language-like behavior correlated with sociality. Thus, for parrots and songbirds, which meet our earlier definition of basic sociality, a complex vocal repertoire may be closely related to features of higher social cognition such as the ability to address and communicate with conspecifics—further strengthening the link between sociality and the need for a complex vocal communication system.

The vocalizations of wild African Grey parrots in particular are highly complex and heterospecific, containing elements of other species’ vocalizations as well as their own (Cruickshank, Gautier, & Chappuis, 1993). Cruickshank et al.’s 4-minute recording of two wild African Greys contained more than 10 different mimicries representing nine different bird species and one fruit bat. The authors also commented on the complexity of the vocalizations; specifically, some of the mimicked sounds had been rearranged from the original species’ patterns. Similar to primates’ fission–fusion social system, wild parrots form complex social hierarchies based on age and experience with flock-mates (Del Hoyo, Elliott, & Sargatal, 1992), implicating a similar necessity for bonding and processing social information that both Aiello and Dunbar (1993) and Pinker (2003) described among primates.

Hemispheric Asymmetry for Communication

Brain asymmetry was once believed to be a unique feature of the human brain (e.g., Corballis, 1991). Now, comparative neuroanatomists have described brain asymmetry in every vertebrate class (for review, see Ocklenburg & Güntürkün, 2012). In humans, left hemisphere brain asymmetry (LHA) is associated with cerebral specializations related to visuospatial and symbolic reasoning, speech production, and speech recognition (Falk, 1980, 1983; Holloway & de la Coste-Lareymondie, 1982). Some nonhumans also possess LHA related to species-specific vocalizations (e.g., sea lions, Zalophus californianus, Böye, Güntürkün, & Vauclair, 2005; mice, Mus musculus, Geissler & Ehret, 2004; some songbirds, Moorman et al., 2012; and some monkeys, Petersen, Beecher, Zoloth, Moody, & Stebbins, 1978; for primate review, see Ghazanfar & Hauser, 1999).

A number of asymmetries related to speech and language in the human brain also appear in great apes, suggesting that some asymmetries date back at least 6 million years. For example, in humans, the left Sylvian fissure defines many of the language-related areas in the left hemisphere, and this fissure is longer and straighter on the left side than on the right in both humans and apes (Galaburda, LeMay, Kemper, & Geschwind, 1978). Cantalupo and Hopkins (2001) have also identified a structural LHA in Brodmann’s area 44 (i.e., Broca’s area) in great apes. Brodmann’s area 44 in humans has long been considered critically involved in speech and language, although its exact role is still debated (Vargha-Khadem, Gadian, Copp, & Mishkin, 2005). The primate homologue of Brodmann’s area 44 is part of the system involved in the production and perception of grasping movements, and the lateralization of this area in great apes may signal the emergence of a communicative function. That function, though, may have had more to do with gestural than with vocal communication, and it is noteworthy that Broca’s area in humans is activated both by signers when signing and speakers when speaking (Horwitz et al., 2003).

This need not rule out the possibility that the lateralization of Brodmann’s area 44 in great apes was a precursor to speech. Direct stimulation of this area in the chimpanzee produces movements of the tongue and larynx, but no sound (Bailey, von Bonin, & McCulloch, 1950), and more recently Ghazanfar and Rendall (2008) showed similarly that electrical stimulation of the motor cortex produced lip and facial movements and vowel sound production in humans, but stimulation of the homologous area in apes and monkeys resulted in tongue, facial, and vocal cord movement but no actual sound. One possibility is that the homologue of Broca’s area in our primate and hominin precursors was initially specialized for communication through visible gestures, with vocalization incorporated in the course of hominin evolution (Cantalupo & Hopkins, 2001; Corballis, 2010; Rizzolatti & Arbib, 1998).

Such a scenario receives some support from the anatomy of vocal production. Vocalization in nonhuman primates depends on the supplementary motor area (SMA) and cingulate cortex along with diencephalic structures, a system that is primarily dedicated to emotional and instinctive vocalization with at best limited control (Jürgens, 2002). A recent study shows, for instance, that chimpanzees can direct food calls to specific individuals, such as those with whom the caller is friendly, implying a degree of intentional control (Schel, Machanda, Townsend, Zuberbühler, & Slocombe, 2013), but the calls themselves are species-specific and largely innately structured. The learning of novel vocal patterns depends on a pathway from the face area of the motor cortex to the nucleus ambiguus, which controls muscles of the larynx (Simonyan & Horwitz, 2011). Among mammals, this appears to be unique to humans, or at least much more profuse in humans than in other mammals. This is further discussed below.

As in apes and humans, the communication areas of songbirds’ and parrots’ brains are functionally lateralized and hemispherical asymmetry is present with a bias for communication areas (Bottjer & Arnold, 1985; Nottebohm, 1970, 1977). While some songbirds have LHA for communication areas (e.g., Moorman et al., 2012), others, like zebra finches (Poephila guttata), are right-hemisphere biased (e.g., Williams, Crane, Hale, Esposito, & Nottebohm, 1992). Likewise, of the nine parrot species Rogers (1980) tested, all but one were left-foot dominant, where footedness is a measure of cerebral lateralization. Such differences across species suggest that species-wide laterality itself may be important, regardless of the direction.

The avian cerebrum has a nuclear organization rather than a layered one as in mammals, which makes identifying homologies and analogies between avian and mammalian brains difficult. Currently, seven specific vocal production nuclei are recognized in the avian brain: four in the posterior and three in the anterior areas of the brain (Feenders et al., 2008). Early researchers hypothesized that because birds lacked language, a structure comparable to Brodmann’s area 44 was nonexistent. Further, due to substantial differences in the organization of avian and mammalian brains, early neuroanatomists were unable to distinguish clearly a Brodmann’s area 44 homologue or analogue based on estimations of location alone. Since then, two structures, the magnocellular nucleus of the anterior nidopallium (MAN) and the hyperstriatum ventrale, pars caudale (HVC), have been proposed as analogues to Brodmann’s area 44 in songbirds that learn their vocalizations (Bolhuis & Gahr, 2006). Bottjer, Halsema, and Arnold (1984) lesioned the lateral MAN in juvenile and adult zebra finches and found that adult birds’ songs were unaffected while juveniles’ vocalizations were severely abnormal. Thus, the MAN is theorized to be involved with early song development in songbirds, which must learn their vocalizations. Intra- and extra-cellular recordings of the HVC of canaries (Serinus canaria), white-crowned sparrows (Zonotrichia leucophrys), and zebra finches demonstrated that the HVC’s role in song production is related to auditory feedback, which is necessary for normal song development (McCasland & Konishi, 1981).

Unlike songbirds, parrots develop calls (i.e., brief, simple sounds) rather than songs (i.e., long series of individual notes); and different vocalization control pathways are involved (e.g., Feenders et al., 2008). In at least budgerigar parrots, the oval nucleus of the anterior nidopallium (NAO) is presumed analogous to the songbird MAN. Likewise, the lateral neostriatum (NLC) is considered comparable to the songbird HVC (Feenders et al., 2008). The NLC is involved with the production, but not development, of learned vocalizations including speech sounds (Lavenex, 2000). Lavenex’s studies with budgerigars revealed disturbances in the ability to modulate properly the amplitude of vocalizations when the area was lesioned.

The features of vocalization-related structures in parrots and songbirds are also different. Differences lie in (a) how auditory stimuli are received, (b) the mechanisms by which sounds are produced (Striedter, 1994), (c) the nuclei involved in the vocalization pathway (Jarvis & Mello, 2000), and (d) the overall orientation of nuclei in the vocalization pathway (Matsunaga, Kato, & Okanoya, 2008). Currently, it is unknown how the differing features in the vocal production pathways of songbirds and parrots contribute to learning vocalizations, memorizing complex vocalizations, producing vocalizations, and learning to incorporate speech into the vocal repertoire (in the case of parrots).

Speech-Related Features Not Present in Apes

Despite our genetic closeness to apes, similarities are far greater between avian and human communication systems with respect to features of vocal production and vocalization acquisition (for review, see Fitch & Jarvis, 2013; Petkov & Jarvis, 2012). A review of the literature suggests two additional features—vocal learning and heightened respiratory control—create a dividing line between apes and the human–songbird–parrot triad. While the position of the vocal apparatus may contribute to difficulty in producing some speech sounds (e.g., Nishimura et al., 2003), ultimately, speech production is rendered impossible for apes due to the inability to imitate vocalizations readily and to produce a sufficiently long and controlled airstream.

Vocal Learning

Vocal learning species acquire their vocalizations through experiential mechanisms. As Jarvis (2004) points out, vocal learning requires auditory learning (i.e., the ability to create associations with auditory stimuli), but it is distinct from auditory learning. Most nonhuman animals are auditory learners. With respect to speech, this means that although they can be trained to learn the meanings of spoken words (e.g., dogs, Canis familiaris, Kaminski, Call, & Fischer, 2004; apes, see Savage-Rumbaugh, Shanker, & Taylor, 1998, for review), they do not use auditory learning to develop their own species-specific repertoires. Vocal learning nonhuman taxa include hummingbirds, songbirds, and parrots (Nottebohm, 1972); cetaceans (McCowan & Reiss, 1997); some pinnipeds (e.g., Mirounga leonina, Sanvito, Galiberti, & Miller, 2007); bats (e.g., Phyllostomus hastatus, Boughman, 1998); and elephants (e.g., Poole, Tyack, Stoeger-Horwath, & Watwood, 2005). Vocal learners are able to imitate species-atypical sounds, but most build their species-specific repertoires by imitating sounds of their own species. Some vocal learning species, such as the lyrebird, incorporate a large variety of sounds from the environment into their repertoires (e.g., Dalziell & Magrath, 2012). Zann and Dunstan (2008) reported over 20 different species’ vocalizations in their recordings of 10 male lyrebirds. Further, 16% of the vocalizations could not be attributed to any animal species, illustrating lyrebirds’ tendency to incorporate non-animal sounds into their repertoires.

Among primates, only humans are classified as vocal learners. Nonhuman primates do show some modification of vocal output, but this seems to be based largely on modification of innate calls through altering positioning of the mouth or lips rather than through control of the larynx. For instance, chimpanzees can produce novel sounds to attract attention by puckering and vibrating their lips to create a “raspberry” sound (Hopkins, Taglialatela, & Leavens, 2007), and captive orangutans (Pongo pygmaeus) have spontaneously matched human whistles (Lameira et al., 2013). In contrast, in humans there is precise control of voicing itself, allowing for a far wider repertoire of different learned patterns. A likely reason for this is that in humans there is a direct connection from the face area of the motor cortex to the nucleus ambiguus, which controls muscles of the larynx (Simonyan & Horwitz, 2011). Although this connection is generally regarded as unique to humans, there is evidence for a similar, if sparse, pathway in mice, allowing for a degree of learning in their ultrasound vocalizations (Arriaga & Jarvis, 2013). Petkov and Jarvis (2012) do not rule out the possibility of sparse connections between the nonhuman primate motor cortex and vocal control, but it appears that only humans possess the density of projection for prolific vocal learning. Nevertheless, such evidence for learned vocalizations in nonvocal learning species has led Arriaga and Jarvis (2013) to criticize the vocal learner–nonvocal learner dichotomy and offer a spectrum-based approach to studying vocal learning. According to their Continuum Hypothesis framework of vocal learning, our conclusion that parrots, songbirds, and humans exhibit far more vocal learning than apes would still hold true.

Feenders et al. (2008) noted parallels between the vocal control systems of humans and of birds, such as parrots, songbirds, and hummingbirds, that are vocal learners. In both groups, the systems divide into anterior and posterior components. The posterior component in birds includes the vocal nuclei that produce the call or song; the posterior component in humans includes the region within the face area of the motor cortex that connects with control of the laryngeal muscles, as described earlier. The anterior component in birds controls the sequencing and learning of vocal productions; in humans, it includes Broca’s area, along with the anterior striatum and anterior thalamus, critical to the production of speech. This system is distinct from the systems underlying innate nonhuman song patterns or calls.

These systems in humans and birds are very similar in architecture, and Feenders et al. (2008) propose that they derive from a more general motor system inherited from the common ancestor of birds and mammals. In most mammals and birds, that motor system is dedicated to physical movement of the body, and control over the system is present only in the relatively rare cases of vocal learners. In parrots, songbirds, and hummingbirds, the vocal learning nuclei are adjacent to the nuclei controlling limb and body movements, while in humans, laryngeal control lies within the face area, which in turn is adjacent to the area controlling hand movements. In the evolutionary scenario proposed by Feenders et al., the incorporation of vocal control did not require the emergence of new structures. Following Finlay, Cheung, and Darlington (2005), they suggest that new cortical areas arise from the enlargement of older areas, with part of an enlarged area allocated to a new function. It is further suggested that this might be accomplished through the duplication of a gene, with one copy retained for the original function and the other used for the new function (Ito, Ishikawa, Yoshimoto, & Yamamoto, 2007).

As described earlier, the organization of the vocal control system in parrots is rather different from that in songbirds. As Feenders et al. (2008) put it, the posterior motor pathway, along with the vocal portion, is shifted forward and laterally, although still posterior to the anterior portion. Feenders et al. suggest that if the motor part of the nidopallium moved with the arcopallium forward and laterally, the supralateral nidopallium (SLN) in parrots may be the homologue of dorsolateral nidopallium (DLN) in other birds. They also note that the parrot nidopallium is much larger relative to body size than in songbirds, and suggest that sensory pathways in the posterior nidopallium may also have been expanded, displacing the anterior forward and laterally. While it is only speculation, the answer to why parrots are the most versatile of avian vocal learners, to the point that they can learn to communicate using speech, could be housed within these nuanced anatomical differences.

Fitch (2010) hypothesized that vocal learners have an evolved need to communicate with a more complex repertoire in order to, for example, identify group members, engage in elaborate reproductive rituals or mate attraction, or communicate effectively in highly variable environments. According to these needs, Fitch’s hypothesis should also include apes, further muddying the waters of why they are not vocal learners. Corballis (2010) and Knight (1998) provide two different possible explanations. Corballis (2010) posits that apes could be more accurately described as “gestural learners” given the variety of discrete information apes can communicate with manual gestures (e.g., sharing food/objects, instigating co-locomotion, stopping a social partner’s action, Cartmill & Byrne, 2010). This fits with the scenario, outlined above, in which vocal control emerged from a preexisting system dedicated to movements of the limbs, including the hands. A further consideration, proposed by Knight (1998), is that innately programmed vocalizations, rather than learned ones, prevent the possibility of vocal deception, or “crying wolf”—thereby keeping vocal signals honest among individuals. Perhaps apes faced stronger selection for honest signaling (mediated through species-typical vocalizations) than for a variable, complex repertoire (mediated through vocal learning). Important to note, deception has been documented in language-trained apes (e.g., Savage-Rumbaugh & McDonald, 1988), suggesting “crying wolf” is within the realm of apes’ cognitive abilities.

While some have argued that similarities between humans and nonhumans should be easiest to find by looking to our nonhuman primate relatives (e.g., Whitaker, 1976), many agree that songbird and parrot vocalizations are more akin to human speech and language than are the calls of nonhuman primates (e.g., Hauser et al., 2002; Passingham, 1981; for a counterclaim that birdsong is more signal than symbol, see Zlatev, 2002). The foregoing review of primate vocalization shows that nonhuman primates demonstrate only limited evidence of spontaneous vocal learning (e.g., Lameira et al., 2013). After extensive training, even chimpanzees show very limited evidence of vocal learning or vocal imitation (e.g., Hayes & Hayes, 1951). Unlike vocal learners’ vocalizations that are learned from conspecifics and emitted intentionally, nonhuman primates’ vocalizations are largely innate and elicited by emotion (e.g., Corballis, 2003; Hauser et al., 2002; Jarvis, 2004; Robinson, 1967). Exceptions to this include titi monkeys (Callicebus cupreus, Müller & Anzenberger, 2002) and the lesser apes (siamangs and gibbons; f. Hylobatidae, e.g., Geissmann, 1999, 2002), which are known to modify their vocalizations to converge upon pair-specific duet “songs” among bonded individuals. Additionally, chimpanzees can direct vocalizations to specific individuals, implying some degree of intentionality in communication (Schel et al., 2013). Even in exceptions such as these, intentional use of vocalizations does not extend to the majority of the repertoire as it does with vocal learners. As a group, then, nonhuman primates do not require a vocal repertoire “tutor,” despite their characteristically highly social group-living, which we and others (e.g., Fitch, 2000b) would predict should offer substantial reason and opportunity for vocal learning.

To demonstrate the lack of necessity for a tutor in nonhuman primates, Winter, Handley, Ploog, and Schott (1973) examined vocal development in infant squirrel monkeys (Saimiri sciureus) reared with muted mothers in the absence of species-specific vocalizations. The infants’ vocal repertoires were virtually identical to those of normally reared infants. In addition, the auditory-isolated repertoires were no different from normal adults’ repertoires, further illustrating the innate nature of nonhuman primate vocalizations. These results are similar to those of isolation studies with vocal non-learner birds such as chickens (Gallus gallus domesticus) and doves (Streptopelia risoria, Konishi, 1963; Nottebohm & Nottebohm, 1971). We do acknowledge that prenatal exposure to vocalizations can significantly influence the vocalizations of developing young (e.g., Gottlieb, 1963), so some degree of vocal learning inside the egg or womb must always be considered a possibility.

Documented rare cases of extreme child neglect in humans (e.g., Curtiss, 1979) and auditory isolation studies with songbirds (e.g., Marler, 1970b) and budgerigars (e.g., Heaton & Brauth, 1999) confirm the necessity of a tutor for normal species-specific vocalization development. Without a tutor, disturbances arise in the production of species-specific vocalizations. Under normal developmental conditions, auditory stimulation provided by a tutor is hypothesized to serve as a model after which the learner modifies its output (Keller & Hahnloser, 2009; Prather, Peters, Nowicki, & Mooney, 2008). To do this, the brain connects auditory stimuli with the required motor movements necessary to reproduce what was heard.

Research with mammals and birds has confirmed regions in the cerebrum (avian telencephalon) to be responsible for vocalizations in vocal learners (e.g., Jürgens, 1995), while regions important to vocalizations in non-learners are located in the midbrain’s limbic system and medulla (e.g., Robinson, 1967; Wild, 1997). Robinson (1967) stimulated hundreds of neocortical sites in rhesus macaques (Macaca mulatta), and no vocal production was evoked, further confirming that the limbic system and medulla are sufficient for nonhuman primate vocalizations. In vocal learning birds, unique sub-pathways underlie vocalization production. One is a vocal motor pathway responsible for producing learned vocalizations, and the other is a pallial–basal ganglia–thalamic loop which is responsible for modifying and learning vocalizations (Jarvis, 2007). Further, vocal learning birds possess uniquely similar expression of one gene that is unexpressed in non-learners (Matsunaga et al., 2008). Despite well-documented differences in brain anatomy between humans, parrots, and songbirds (e.g., Jarvis, 2004; Paton, Manogue, & Nottebohm, 1981; Striedter, 1994), similarities in vocalization acquisition and production do exist; the most relevant of these to speech is vocal learning.

Heightened Control Over Respiration

While vocal learning stands out in the literature as a clear divider between apes and the human–parrot–songbird triad, many recognize the significant role that heightened control over respiration plays in normal speech production (e.g., Campbell, 1968; Lieberman, 1984; MacLarnon & Hewitt, 1999). Given the substantial, finely controlled respiratory requirements for the production of the speech stream in humans (Ghazanfar & Rendall, 2008; Lieberman, 1984), we posit that this feature should be included as a speech requisite.

In mammals and birds, the lungs provide the necessary subglottal airstream, modulated by the larynx or syrinx, respectively, to create and modify sound (Fitch & Hauser, 1995). In humans, quiet breathing is disrupted in order to produce speech sounds. Most speech vocalizations occur in a spontaneous cycle of long expirations (words) which are punctuated by rapid, silent inspirations (Ghazanfar & Rendall, 2008; MacLarnon & Hewitt, 1999, 2004). Finely tuned control of the respiratory system in response to cognitive factors is required for a speaker to time inspirations in order not to lose his or her breath while vocalizing (MacLarnon & Hewitt, 1999). Early work by Ladefoged (1968) also highlighted the importance of finely controlled breathing for varying emphasis, pitch, and intonation of words.

The breath control necessary for speech production is estimated to have appeared in humans about 600,000 years ago (MacLarnon & Hewitt, 2004). Support for this comes from studying the size of the thoracic vertebral canal in hominid fossils. This canal expanded over time to allow for enhanced innervation of the intercostal and abdominal muscles as more finely tuned speech breathing developed. Several 600,000-year-old Neanderthal specimens had canals similar in size to modern humans (MacLarnon & Hewitt, 1999). Conversely, Homo ergaster, who lived approximately 1.6 million years ago, as well as earlier hominids, had a small thoracic vertebral canal that was comparable in size to extant nonhuman primates (MacLarnon & Hewitt, 2004).

Compared to nonhuman primate vocalizations, speech is extremely taxing on the respiratory system. The rate at which nonhuman primates produce sequences of vocalizations is limited by their tendency to vocalize using a one-sound-per-breath pattern (Ghazanfar & Rendall, 2008; MacLarnon & Hewitt, 2004). On the other hand, an average human speech stream full of different sounds may last as long as 12 seconds (Winkworth, Davis, Adams, & Ellis, 1995). Among nonhuman primates, average vocalization streams are variable in length; however, longer vocalization streams are associated with species that rely upon elaborate vocal apparatus to increase vocalization length. The indri (Indri indri), a prosimian that uses air sacs to increase vocalization length, has a vocalization stream of 5 seconds (Thalmann, Geissmann, Simone, & Mutschler, 1993). The howler monkey (g. Alouatta), which possesses a large air sac beneath the hyoid bone that acts as a resonating chamber, as well as two lateral air sacs, has been documented vocalizing for more than 50 seconds in one breath (Sekulic & Chivers, 1986). The lesser apes, which also have a large air sac (Boer, 2009), may vocalize up to 30 seconds in one “great call” (Haimoff, 1983).

Unlike howler monkeys and indris, great apes (and humans) do not have specialized vocal apparatus, although great apes (but not humans) also have large air sacs (Boer, 2009). Chimpanzees’ longest documented stream is about 1.6 seconds (Clark & Wrangham, 1993; Marler & Tenaza, 1977), and orangutans’ and gorillas’ just over 2 seconds (Hardus et al., 2009; Salmi, Hammerschmidt, & Doran-Sheehy, 2013). These data may come as a surprise considering humans and apes share similarly sized lungs relative to body size (e.g., Stahl, 1967). Given this, some feature pertaining to the control of respiration rather than anatomical properties of the lungs or vocal tract must differentiate apes’ and humans’ vocal-breathing characteristics. The specific role that nonhuman primates’ air sacs play in vocalization is unclear, but Boer (2009) suggests that an air sac would actually reduce the ability to produce speech.

For birds, the demands of flight have resulted in a highly specialized respiratory system. Pressure differentials created by air passing through the air sacs, bronchi, and lungs contribute to vocalization production (e.g., Elemans, Muller, Larsen, & van Leeuwen, 2009). The tongue, larynx, and other relevant anatomy are reduced in size, and the primary breathing/vocalizing apparatus—the syrinx—sits close to the lungs (Deacon, 1997). Human and songbird (and presumably parrot) vocalizations require controlled coordination of laryngeal and syringeal (respectively), respiratory, and vocal tract muscles (e.g., Suthers, Goller, & Pytte, 1999; Wild, 1997). Despite many differences, songbird respiration during vocalization is similar to that of nonhuman primates in that songbirds respire between almost every song note. This often results in rapid “mini-breaths” between complex trill sounds which can be as quick as 25 notes per second (Calder, 1970; Wild, Goller, & Suthers, 1998). Nevertheless, songbirds such as the winter wren (Troglodytes troglodytes) produce vocal streams as long as 41 seconds (Clark, 1949), far surpassing humans, and approaching the length of nonhuman primate species with specialized vocal apparatus. Clark also noted that the winter wrens’ 41 seconds were comprised of songs, far more difficult to produce than the one-note howler monkey howl. Such a feat is made even more difficult because birds lack a muscular diaphragm, making both inspiration and expiration active processes (e.g., Codd, Boggs, Perry, & Carrier, 2005; for review of avian respiratory morphology, see Codd, 2010). Thus, though songbirds breathe in between notes, the breaths are very small and require a substantial effort on the part of the bird; yet enough air is inspired to sustain lengthy, complex songs. 
An early investigation of budgerigars demonstrated that the air inspired during mini-breaths provides little to no air-intake value; rather, the inspiration goes completely to vocalizing (Tucker, 1968). Long-duration vocalizations require a strong, finely tuned respiratory system that undergoes regular periods of apnea without disrupting or distorting vocal output. Such vocal control is more on par with the physical demands associated with speech breathing than nonhuman primate vocalization breathing.

Finally, vocalization is associated with controlled activation of skeletal muscle system neural pathways in humans, songbirds, and parrots (Deacon, 1997; Paton et al., 1981; Sturdy, Wild, & Mooney, 2003; for review, see Wild, 1997). In contrast, activation in nonhuman primates occurs via visceral muscle system pathways. According to Deacon (1997), recruitment of skeletal rather than visceral muscle systems allows for more finely tuned breathing and therefore a more flexible range of vocalizations that is characteristic of humans and birds, but not apes. Nevertheless, examples of controlled breathing in apes do exist, such as Lameira et al.’s (2013) report of whistling orangutans and Perlman, Patterson, and Cohn’s (2012) description of Koko the gorilla’s fake coughs, nose blowing, and wind instrument playing. Notably, Koko’s “toots” on the instruments were all less than 2 seconds, the length reported for wild gorilla vocalizations. Perlman et al. concluded that the ability to control breathing is not dichotomous, with humans being able and the great apes being unable. Rather, they hypothesized that great apes can demonstrate some degree of controlled breathing provided a motivating and relevant environment (e.g., human models encouraging the behavior).

Complex Sociality: Where Parrots and Songbirds Differ

So far, this review has proposed that speech is associated with four major features: basic sociality, hemispheric asymmetry in communication areas, vocal learning, and heightened respiratory control. The groups possessing all four of these features are humans, songbirds, and parrots. This final section proposes that complex sociality separates humans and parrots from most temperate songbirds (see Figure 2). Similar to defining “basic sociality,” arriving at an appropriate definition of “complex sociality” is difficult. Nevertheless, we define complex sociality in this paper as the presence of discrete repertoire elements for affiliative nonsexual social interaction with conspecifics, social correlates of intelligence, and hierarchical relationships among group members. Others (e.g., Knight, 1998) have highlighted complex forms of sociality as a catalyst for the development of complex communication systems like speech. While there are exceptions to each of the criteria posed, complex sociality as we have defined it is present in humans, parrots, and apes. As discussed earlier, the absence of vocal learning and heightened respiratory control (along with certain anatomical features) makes word production impossible for apes, thus precluding them from the speech faculty as we have defined it here (i.e., production and communicative use). A detailed summary of social organization, anatomical, and repertoire-related features is provided in Table 1 for all four animal groups.

Heightened Sociality and the Vocal Repertoire

Repertoire complexity and repertoire size are distinctly different. Many have linked sociality to the size of a species’ repertoire by hypothesizing that a larger repertoire affords an individual the ability to vocalize with greater detail about more numerous experiences (e.g., Aiello & Dunbar, 1993; Blumstein & Armitage, 1997; Dunbar, 2003; McCowan et al., 2002; Morton & Page, 1992). Linguists estimate that while the Oxford English Dictionary defines over 600,000 separate words, the average native English-speaking university graduate’s repertoire contains around 20,000 word families (i.e., excluding archaic words, proper names, compound words, abbreviations, alternative spellings, and dialect forms; Goulden, Nation, & Read, 1990; Nation & Waring, 1997). By comparison, great apes, parrots, and a representative songbird, black-capped chickadees (Parus atricapillus) are estimated to have repertoire sizes of under 100 distinct vocal types, where a “type” could be a call or song (e.g., bonobos, Bermejo & Omedes, 1999; mountain gorillas, Fossey, 1972; chimpanzees, Goodall, 1986; lowland gorillas, Harcourt, Stewart, & Hauser, 1993; Salmi et al., 2013; parrots, Bradbury, 2003; black-capped chickadee, Ficken, Ficken, & Witkin, 1978). With the exception of more prolific oscine songbirds like the nightingale (Luscinia megarhynchos), with a repertoire containing over 200 elements due to song syllables (Kipper, Mundry, Sommer, Hultsch, & Todt, 2006), typical ape, songbird, and parrot distinct vocalization repertoires are within the same order of magnitude as that of the European badger (Meles meles), a vocal non-learning social mammal (Wong, Stewart, & MacDonald, 1999), and two orders of magnitude smaller than the repertoire size (synonymous with vocabulary) of humans. 
Yet, despite the vastly differently sized repertoires between humans and the other three groups, as well as the fact that one of the groups does not engage in vocal learning, each group is still classified as social; this suggests repertoire size is neither a perfect predictor of sociality nor related to the ability to produce and use speech.

It is important to note that meaningful determination of a vocal repertoire’s complexity or size must make assessments of call morphology together with perceptual determinations of the salience of call features. Relying on sound differences alone results in an incomplete investigation. Such challenges may explain the vast differences in reported repertoire size among nonhumans, and make it difficult to compare repertoire size across taxonomic groups. This is especially true of those avian species for which repertoire complexity is a territory and reproduction arms race among males. In these cases, selection favors males with repertoires consisting of a variety of parsed and novel vocalizations. Attempts to count specific syllables to arrive at species-typical “repertoire size” would be difficult, and possibly uninformative. For example, Tu, Osmanski, and Dooling (2011) reported 116 different elements in a budgerigar’s warble song. The question of whether each distinct element provides discrete information, or if the elements’ organization or repetition provides discrete information, is unknown and beyond current bioacoustic techniques.

Quantifying repertoire size among parrots presents additional concerns as some species have at least three classes of vocalizations: emotion-driven sounds (e.g., agonistic shrieks), intentional sounds (e.g., contact calls), and dialect-based sounds that are unique to individuals and groups for purposes of self- and group-identification (e.g., Berg, Delgado, Cortopassi, Beissinger, & Bradbury, 2012; Salinas-Melgoza & Wright, 2012). Some songbirds also show similar evidence of individual variations and dialects in their vocalizations (e.g., song sparrow, Melospiza melodia, Harris & Lemon, 1972). Decisions regarding repertoire size determination are further complicated by factors such as these, and must be made carefully. To date, we are not aware of any consistently used, appropriate methodology for comparing repertoires across species.

While repertoire complexity and size do not appear to be appropriate for comparison, functionality of the repertoire seems to provide reliable information regarding sociality and may offer a reliable difference between songbirds and parrots and humans. Among temperate songbird species, vocalizations are generally limited to males and to contexts of territory defense and reproduction (Catchpole & Slater, 2008; Kroodsma & Miller, 1996). Male songbirds have a larger syrinx than females, despite similar body size, indicating sexual selection for more robust vocalizations (Riede, Fisher, & Goller, 2010). Further, female zebra finches, for example, have only rudimentary versions of certain song-learning and song-production telencephalon areas and produce only innate vocalization patterns (Nottebohm & Arnold, 1976). This sexual dimorphism is not found in tropical duetting songbirds like chats (f. Muscicapidae) and some species of wrens where females are more vocally active (Brenowitz, Arnold, & Levin, 1985). Taken together, songbird evolution has selected for anatomy and vocal production that facilitates the repertoire’s function—whether it is communicating to conspecifics about resources and mating, or strengthening bonds in mated pairs.

Unlike songbirds in temperate regions, both sexes of parrots use learned vocalizations throughout the year in a variety of contexts that are unrelated to reproduction (Bradbury, 2003). According to Bradbury, many adult parrot calls promote cohesion, affiliation, and information transfer among individuals. These calls include, but are not limited to, a loud contact call for maintaining connection, a soft contact call for coordinating movement in dense vegetation, a pre-flight call to notify group members of an individual’s impending departure, and a paired duet call. In black-capped chickadees, a majority of calls are classified as being involved with reproduction, coordination of group movement, and various agonistic encounters with conspecifics (Ficken et al., 1978). As an exception, the Carolina chickadee’s (Poecile carolinensis) “chick-a-dee” call has been implicated in social cohesion (Freeberg & Harvey, 2008). Likewise, according to Ficken et al. (1978), black-capped chickadees have “broken dee” and “faint fee-bee” calls that attract males to females that are out of sight; however, only the “chick-a-dee call complex” is implicated in pair and flock cohesion, such as recruitment of individuals to mob predators of differing threat levels (Templeton, Greene, & Davis, 2005).

Even the social processes involved with vocalization acquisition differ greatly between songbirds on the one hand and parrots and humans on the other. Similar to human children, wild African Grey parrot juveniles learn vocalizations through affiliative social interaction with parents and flock-mates (e.g., Berg et al., 2012; Nottebohm, 1970). Conversely, many songbirds learn their vocalizations directly and indirectly through hearing aggressive, territorial interactions of their fathers and neighboring conspecifics (e.g., Nuttall’s white-crowned sparrows, Zonotrichia leucophrys nuttalli, Bell, Trail, & Baptista, 1998; European starlings, Bertin, Hausberger, Henry, & Richard-Yris, 2007; zebra finches, Zann, 1990). In zebra finches, the presence of, and interactions with, even male siblings can contribute to features of a male’s song (e.g., Tchernichovski, Lints, Mitra, & Nottebohm, 1999; Tchernichovski & Nottebohm, 1998).

From this and earlier evidence, socializing is much different in songbirds than in humans and parrots with respect to the vocal repertoire. Humans and parrots use their vocalizations to foster strong, positive bonds that last years (i.e., across breeding seasons, in parrots). In most species studied, songbirds typically use their vocalizations to attract and retain mates and to defend territories one breeding season at a time (Catchpole & Slater, 2008; Kroodsma & Miller, 1996). These contrasting overall functions of the vocal repertoire support the argument that speech capabilities may somehow be linked to fundamental differences in the functions of the repertoire and communication (e.g., Brown & Farbaugh, 1997). That is to say, parrots, like humans and great apes, may naturally have “more to say” because of their more diverse social interactions that extend beyond reproduction. This richer level of sociality may make parrots better suited than songbirds to produce human speech sounds and readily adopt the use of them for interspecies communication.

Social Correlates of Intelligence

Intelligence in birds has been linked to many features of sociality, such as interactions not obviously related to survival (Burish et al., 2004), group size in fossil and extant primates (Aiello & Dunbar, 1993; Sawaguchi & Kudo, 1990), and the tendency toward altricial young (which require extensive parental care) in avian species with large adult brains (Portmann, 1946)3. According to the Social Intelligence Hypothesis (see Byrne & Whiten, 1988), human intelligence was enhanced by the numerous roles, interactions, and experiences that came as a result of living socially. Empirical work with nonhumans has supported this theory (e.g., Burish et al., 2004; Reader & Laland, 2002; for counterevidence see Beauchamp & Fernández-Juricic, 2004; for alternative views see Zuberbühler & Janmaat, 2010; Melin, Young, Mosdossy, & Fedigan, in press). Burish et al. (2004) presented a meta-analysis of 154 bird species’ social structures, eating habits, migration habits, flight habits, mating systems, and vocalization qualities. The authors then correlated each of these factors to a telencephalon-to-whole-brain ratio. The results demonstrated that transactional (defined as engaging in at least between-individual social interaction), monogamous, herbivorous species that did not migrate, but did fly, and that were vocal learners had the largest telencephalon ratio. African Grey parrots and many other speech-using psittacids possess all of these features.

In Burish et al.’s (2004) meta-analysis, the 20 largest telencephalon ratios belonged to species of parrots, corvids, woodpeckers, and owls, with parrots never ranking below 33rd on the list of 154. Interestingly, the lowest ranking psittacid was the budgerigar, the species used most frequently in comparative research. The five species with the largest telencephalon ratios were (in order) the blue-and-yellow macaw (Ara ararauna), the red-and-green macaw (Ara chloropterus), the common raven (Corvus corax), the African Grey parrot, and the yellow-crested cockatoo (Cacatua sulphurea). The first true songbird, the Eurasian skylark (Alauda arvensis) ranked 23rd, and the next, the blue tit (Parus caeruleus) was 29th. While these are highly ranked, songbirds were scattered in the list, with the European robin (Erithacus rubecula) appearing 121st out of 154. Zebra finches, commonly used in comparative research, appeared 62nd. Though making generalizations from a meta-analysis is difficult, the data are congruent with Byrne and Whiten’s (1988) Social Intelligence Hypothesis in that parrots are both highly social and have relatively large brains.

While many nonsocial species’ lifespans can exceed 70 years (e.g., European pond turtle, Emys orbicularis, Gibbons, 1987; lake sturgeon, Acipenser fulvescens, Thomas & Haas, 2004), there are clear social and cognitive correlates of long lifespans (e.g., Carey & Judge, 2001). According to Carey and Judge, species with long lifespans have more time for intergenerational transfer of information. In addition, a longer lifespan allows for stronger social bonding due to years of exposure and accumulated experiences with group members. This may explain why humans, parrots, and cetaceans (another long-lived, highly social taxon) use signature vocal “tags” to recognize individuals (e.g., Bruck, 2013; Janik & Sayigh, 2013; Quick & Janik, 2012; Saunders, 1983). This characteristic is not prevalent in shorter-lived species. Chickadees and house finches (Carpodacus mexicanus) represent two short-lived species with vocal tags (Bradbury, 2003). Carey and Judge’s lifespan data suggest a strong relationship between complex sociality and lifespan (e.g., humans, 100+ years; parrots, 70 years; cetaceans, 40–70 years, with George et al., 1999, estimating 100+ years for bowhead whales, Balaena mysticetus; apes, 60 years). For comparison, exemplar songbird species discussed in this review live less than 10 years (e.g., zebra finch, 5 years, Burley, 1985).

The opportunity for division of labor also may somehow relate to social intelligence in highly social species (Carey & Judge, 2001). Division of labor within a vertebrate “society” requires substantial cooperation and interaction among group members—including information transfer and the cognitive capacity to remember other individuals’ identities and roles within the system. Primate species exhibit various degrees of division of labor (for review, see Galdikas & Teleki, 1981). Division of labor among songbirds (excluding parental care) has yet to be documented; however, sentinel behavior, a transient labor role within a society, is seen in some parrots (Levinson, 1980). Whether or not there is a causal relationship between the speech faculty and the presumed intelligence associated with complex sociality as we have defined it is difficult to determine at this point. What is clear is that there are definite similarities between humans and parrots with respect to these features, and that both groups’ heightened sociality distinguishes them from songbirds.

Concluding Thoughts

Given the inability of apes to speak, the common capacity of parrots and humans to produce and communicate with speech sounds must be an example of parallel evolution, arrived at for similar purposes but via different routes. One possibility for humans, hinted at earlier, is that speech arose from manual and facial gestures, perceived visually rather than auditorily. This idea has a long but intermittent history, dating at least from the writings of Rousseau and Condillac in the 18th century. It was revived by Hewes (1973) and has since found support from a variety of considerations, including the efficiency and linguistic sophistication of sign languages (Armstrong & Wilcox, 2007), the role of the mirror system in primates (Rizzolatti & Sinigaglia, 2008), the strong neurophysiological and behavioral links between hand movements and mouth movements (Gentilucci & Corballis, 2006), and the nature of gestural communication in great apes, both in captivity and in the wild (e.g., Tomasello, 2008).

Indeed, there is some suggestion that speech may have superseded a manual sign language within the past 100,000 years (Corballis, 1991, 2010). Gestures by captive chimpanzees and bonobos (Pollick & de Waal, 2007), as well as wild chimpanzees (Hobaiter & Byrne, 2011a, 2011b), appear to be more diverse and flexibly used than the vocal calls used by the species. As Pollick and de Waal (2007) reported, apes’ gestural repertoires were larger than their repertoires of facial/vocal signals. The authors also noted bonobos’ usage of multimodal communication, whereby combinations or serially produced gestures and facial/vocal signals elicited greater responsiveness by the receiver. The development of sign language in deaf infants shows remarkable parallels with that of speech, including manual ‘babbling’ (e.g., Petitto & Marentette, 1991) and similar overall phonological, morphological, and syntactical organization (e.g., Klima & Bellugi, 1979).

There remains the question of why speech would have superseded manual gesture. There are several possible answers. One is that speech frees the hands for other activities, such as carrying objects, and eventually for making and using tools. Speech is also a system of gestures, involving movements of the tongue, lips, velum, and larynx (Studdert-Kennedy, 1998), and moving the gestural system away from the external limbs into the mouth would have been much more efficient in terms of the expenditure of energy. This was perhaps an early example of miniaturization. Speech also holds the advantage at night, or when physical barriers intervene. Even so, people still gesture as they speak, and their gesturing helps convey information (Corballis, 2010).

Nevertheless, not all are convinced by the gestural theory (e.g., Burling, 2005; MacNeilage, 2008), and there may well be alternative explanations as to the parallel routes to speech in parrots and humans. Although speech is employed communicatively in both groups, it serves different functions and has different properties in each. In humans, speech is the dominant medium of language, a complex system involving syntax and the capacity to transmit information about past and planned future events, states of the world, and explanations of how things work, or in Pinker’s words, “who did what to whom, when, where, and why” (Pinker, 2003, p. 27).

From what little is known about wild parrots, their communication appears to have more to do with social bonding than with the exchange of information, although vocalizations are used to coordinate movement and to transmit general information (as in alarm calls). Given our lack of knowledge, we cannot yet say if parrots use calls to transmit information in the ways described above for human speech, though speech-based contact calls have been documented (Colbert-White et al., 2011). In the wild and in captivity, parrots must vocally conform to a group in order to be accepted by that group. Given the importance to parrots of group cohesion and social partners for safety and resource discovery (Bradbury, 2003), parrots have most likely experienced selection for the ability to imitate a vast array of sounds to ensure continuing acceptance in the group (i.e., to be vocal generalists).

Just as human infants are born with the ability to learn thousands of languages, parrots also appear to have the ability to learn a vast array of vocalizations. However, while human infants excel at producing a variety of human vocalizations, they—like most species—produce other species’ vocalizations quite poorly. By contrast, parrots readily produce many other species’ vocalizations. This extreme vocal generalist quality, matched with the use of a species-atypical vocal communication system to interact with social partners, is both remarkable and rare within the animal kingdom.

Stereotyped vocalizations are predominant in avian species for which inclusion in a specific group is not crucial to survival (e.g., pigeons and chickens). Temperate songbirds may therefore hold an intermediate position between taxa exhibiting stereotyped vocalizations (vocal specialist) and taxa exhibiting extensive vocal learning (vocal generalist). In some species of songbirds, while the song template is the same across individuals, and there is no requirement of song for inclusion into groups, males that recombine syllables or incorporate vocalizations of other species are considered the most attractive by females (e.g., Catchpole, 1987; Howard, 1974; see Catchpole & Slater, 2008 for review). Thus, the pressure to be somewhat of a vocal generalist in this taxonomic group is apparent.

Just as humans may have transitioned to speech from gestures as a means of overcoming issues associated with night vision and physical barriers, species that are flighted, arboreal, nocturnal, or aquatic also encounter environmental constraints that would make vocal communication systems as complex as speech more appropriate for information transfer (e.g., Janik & Slater, 1997; Jarvis, 2006; Liska, 1993; McCowan et al., 2002). Cetaceans and microchiropteran bats share environmental constraints similar to humans and parrots, as well as vocal learning and varying degrees of sociality. Among bats, researchers have identified individual- and group-specific signature contact calls (e.g., Arnold & Wilkinson, 2011; Gillam & Chaverri, 2012) similar to vocal identification systems observed in cetaceans (e.g., Janik & Sayigh, 2013), parrots (e.g., Berg et al., 2012), and humans. Many cetaceans such as bottlenose dolphins (Tursiops truncatus) and humpback whales (Megaptera novaeangliae) are highly social to the level of cultural transmission of behavior (e.g., Allen, Weinrich, Hoppitt, & Rendell, 2013; Rendell & Whitehead, 2001). However, neither bats nor cetaceans are able to produce speech. Ridgway et al. (2012) did describe a beluga whale that spontaneously produced sounds that mimicked human speech rhythms and fundamental frequencies, but the authors acknowledged that the sounds were not articulated speech. Like apes, bats and cetaceans present an interesting conundrum in which some, but not all, features described in this review apply.

Along with increasing communication research on cetaceans and bats to learn more about how they fit into the speech faculty debate, future studies could investigate the vocal generalist versus vocal specialist difference described earlier. In addition to investigations of neuroanatomical differences among highly generalist species like African Grey parrots, intermediate generalists like nightingales, and vocal specialists like chickens, the literature lacks intra-family assessments of speech abilities. African Greys have received much attention and training in speech production and use (e.g., Pepperberg’s adaptation of Todt’s 1975 model/rival shaping paradigm; for review, see Pepperberg, 1999). However, intensive, specialized training protocols have not been developed for other promising parrot species like macaws, or even budgerigars, the parrot species that dominates neuroanatomical research on vocal-auditory pathways. Budgerigars may be used because they are inexpensive and easy to maintain in captivity, but they are not the most appropriate model to understand why parrots and not songbirds can produce speech. In order to understand how African Greys are so skilled at using speech, neuroanatomical work must be done with Greys, not budgerigars. Comparisons of multiple parrot species with a range of speech abilities, including budgerigars, may answer more questions about the speech faculty. Macaws, for example, have not been studied at all in this regard, although they have the largest telencephalon ratios among those studied by Burish et al. (2004).

With the exception of some of Pepperberg’s work, comparisons between humans and African Grey parrots’ speech use do not exist in the literature, and few theories have been developed to explain why parrots have been selected to produce the large variety of sounds comprising their heterospecific repertoires—both in the wild and in captivity. Given the similarly large range of sounds human infants can learn to produce in order to communicate, we find parrots to be an interesting opportunity for comparison. Whether the comparison is at the level of social organization, ecological relevance of a complex learned repertoire, some other feature, or a combination of these, the vocal generalist quality of humans and parrots merits further investigation. Because numerous parallels between human speech and language and songbird songs have been made, it is our hope that this synthesis of the literature serves as a call to action for collaboration of linguists, animal behaviorists, neuroanatomists, and psychologists to begin to explore humans’ and parrots’ shared vocal generalist quality.


Footnotes

1 Since tuis (Prosthemadera novaeseelandiae; Whangarei Native Bird Recovery Centre, n.d.), corvids (f. Corvidae, Noack, 1902), and sturnids (f. Sturnidae, West et al., 1983) are also passerines that can mimic speech, we have simplified songbird to denote stereotypical temperate oscine songbirds with songs predominating the wild vocal repertoire (e.g., chickadees, finches).

2 Simultaneously producing two or even three distinguishable pitches using a larynx is extremely rare, but possible. Some traditional singing in Central Asia, Southern Siberia, India, and South Africa involves what is called overtone singing. The extraordinary practice requires years of training and greatly strains the vocal apparatus (Pegg, 1992).

3 Humans, songbirds, and parrots also share the feature of altricial young. The relevance of altriciality to arguments made in this review is unknown, but the similarity in this dimension among the three taxonomic groups should not be overlooked.


References

Aiello, L. C., & Dunbar, R. I. M. (1993). Neocortex size, group size, and the evolution of language. Current Anthropology, 34, 184–193. doi:10.1086/204160

Allen, J., Weinrich, M., Hoppitt, W., & Rendell, L. (2013). Network-based diffusion analysis reveals cultural transmission of lobtail feeding in humpback whales. Science, 340, 485–488. doi:10.1126/science.1231976

Armstrong, D. F., & Wilcox, S. E. (2007). The Gestural Origin of Language. Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780195163483.001.0001

Arnold, B. D., & Wilkinson, G. S. (2011). Individual specific contact calls of pallid bats (Antrozous pallidus) attract conspecifics at roosting sites. Behavioral Ecology and Sociobiology, 65, 1581–1593. doi:10.1007/s00265-011-1168-4

Arriaga, G., & Jarvis, E. D. (2013). Mouse vocal communication system: Are ultrasounds learned or innate? Brain & Language, 124, 96–116. doi:10.1016/j.bandl.2012.10.002

Bailey, P., von Bonin, G., & McCulloch, W. S. (1950). The Isocortex of the Chimpanzee. Urbana-Champaign: University of Illinois Press.

Beauchamp, G., & Fernández-Juricic, E. (2004). Is there a relationship between forebrain size and group size in birds? Evolutionary Ecology Research, 6(6), 833–842.

Bell, D. A., Trail, P. W., & Baptista, L. F. (1998). Song learning and vocal tradition in Nuttall’s white-crowned sparrows. Animal Behaviour, 55, 939–956. doi:10.1006/anbe.1997.0644

Berg, K. S., Delgado, S., Cortopassi, K. A., Beissinger, S. R., & Bradbury, J. W. (2012). Vertical transmission of learned signatures in a wild parrot. Proceedings of the Royal Society B, 279, 585–591. doi:10.1098/rspb.2011.0932

Bermejo, M., & Omedes, A. (1999). Preliminary vocal repertoire and vocal communication of bonobos (Pan paniscus) at Lilunga (Democratic Republic of Congo). Folia Primatologica, 70, 328–357. doi:10.1159/000021717

Bertin, A., Hausberger, M., Henry, L., & Richard-Yris, M.-A. (2007). Adult and peer influences on starling song development. Developmental Psychobiology, 49, 362–374. doi:10.1002/dev.20223

Blumstein, D. T., & Armitage, K. B. (1997). Does sociality drive the evolution of communicative complexity? A comparative test with ground-dwelling sciurid alarm calls. The American Naturalist, 150(2), 179–200. doi:10.1086/286062

Boë, L.-J., Heim, J.-L., Honda, K., & Maeda, S. (2002). The potential Neandertal vowel space was as large as that of modern humans. Journal of Phonetics, 30, 465–484. doi:10.1006/jpho.2002.0170

Boer, B. (2009). Acoustic analysis of air sacs and their effect on vocalization. Journal of the Acoustical Society of America, 126, 3329–3343. doi:10.1121/1.3257544

Bolhuis, J., & Gahr, M. (2006). Neural mechanisms of birdsong memory. Nature Reviews Neuroscience, 7, 347–357. doi:10.1038/nrn1904

Bottjer, S. W., & Arnold, A. P. (1985). Cerebral lateralization in birds. In S. Glick (Ed.), Cerebral Lateralization in Nonhuman Species (pp. 11–38). Orlando, FL: Academic Press.

Bottjer, S. W., Halsema, E. A., & Arnold, A. P. (1984). Forebrain lesions disrupt development but not maintenance of song in passerine birds. Science, 224, 901–903. doi:10.1126/science.6719123

Boughman, J. W. (1998). Vocal learning by greater spear-nosed bats. Proceedings of the Royal Society of London B, 265, 227–233. doi:10.1098/rspb.1998.0286

Böye, M., Güntürkün, O., & Vauclair, J. (2005). Right ear advantage for conspecific calls in adults and subadults, but not infants, California sea lions (Zalophus californianus): hemispheric specialization for communication? European Journal of Neuroscience, 21, 1727–1732. doi:10.1111/j.1460-9568.2005.04005.x

Bradbury, J. W. (2003). Vocal communication of wild parrots. In F. B. M. de Waal & P. L. Tyack (Eds.), Animal Social Complexity: Intelligence, Culture, and Individualized Societies (pp. 293–316). Cambridge, MA: Harvard University Press. doi:10.1121/1.4780035

Brenowitz, E. A., Arnold, A. P., & Levin, R. N. (1985). Neural correlates of female song in tropical duetting birds. Brain Research, 343(1), 104–112. doi:10.1016/0006-8993(85)91163-1

Brown, E. D., & Farabaugh, S. M. (1997). What birds with complex social relationships can tell us about vocal learning: Vocal sharing in avian groups. In P. McGregor (Ed.), Animal Communication Networks (pp. 98–127). Cambridge: Cambridge University Press.

Bruck, J. N. (2013). Decades-long social memory in bottlenose dolphins. Proceedings of the Royal Society B, 280, 1–6. doi:10.1098/rspb.2013.1726

Bucher, T. L. (1983). Parrot eggs, embryos, and nestlings: Patterns and energetics of growth and development. Physiological Zoology, 56(3), 465–483.

Burish, M. J., Kueh, H. Y., & Wang, S. S.-H. (2004). Brain architecture and social complexity in modern and ancient birds. Brain, Behavior and Evolution, 63, 107–124. doi:10.1159/000075674

Burley, N. (1985). Leg-band color and mortality patterns in captive breeding populations of zebra finches. The Auk, 102(3), 647–651.

Burling, R. (2005). The Talking Ape. New York: Oxford University Press.

Byrne, R. W., & Whiten, A. (1988). Machiavellian Intelligence: Social Expertise and the Evolution of Intellect in Monkeys, Apes and Humans. Oxford: Oxford University Press.

Calder, W. A. (1970). Respiration during song in the canary (Serinus canaria). Comparative Biochemistry and Physiology, 32, 251–258. doi:10.1016/0010-406X(70)90938-2

Campbell, E. J. M. (1968). The respiratory muscles. Annals of the New York Academy of Sciences, 155, 135–140. doi:10.1111/j.1749-6632.1968.tb56757.x

Cantalupo, C., & Hopkins, W. D. (2001). Asymmetric Broca’s area in great apes. Nature, 414(6863), 505. doi:10.1038/35107134

Carey, J. R., & Judge, D. S. (2001). Lifespan extension in humans is self-reinforcing: A general theory of longevity. Population and Development Review, 27, 411–436. doi:10.1111/j.1728-4457.2001.00411.x

Cartmill, E. A., & Byrne, R. W. (2010). Semantics of primate gestures: Intentional meanings of orangutan gestures. Animal Cognition, 13, 793–804. doi:10.1007/s10071-010-0328

Catchpole, C. K. (1987). Bird song, sexual selection and female choice. Trends in Ecology and Evolution, 2, 94–97. doi:10.1016/0169-5347(87)90165-0

Catchpole, C. K., & Slater, P. J. B. (2008). Bird Song: Biological Themes and Variations (2nd ed.). Cambridge: Cambridge University Press. doi:10.1017/CBO9780511754791

Clark, A. P., & Wrangham, R. W. (1993). Acoustic analysis of wild chimpanzee pant hoots: Do Kibale forest chimpanzees have an acoustically distinct arrival pant hoot? American Journal of Primatology, 31, 99–109. doi:10.1002/ajp.1350310203

Clark, R. B. (1949). Some statistical information about wren song. British Birds, 42, 337–346.

Codd, J. R. (2010). Uncinate processes in birds: Morphology, physiology and function. Comparative Biochemistry and Physiology, Part A, 156, 303–308. doi:10.1016/j.cbpa.2009.12.005

Codd, J. R., Boggs, D. F., Perry, S. F., & Carrier, D. R. (2005). Activity of three muscles associated with the uncinate processes in the giant Canada goose Branta canadensis maximus. Journal of Experimental Biology, 208, 849–857. doi:10.1242/jeb.01489

Colbert-White, E. N., Covington, M. A., & Fragaszy, D. M. (2011). Social context influences the vocalizations of a home-raised African Grey parrot (Psittacus erithacus erithacus). Journal of Comparative Psychology, 125, 175–184. doi:10.1037/a0022097

Corballis, M. C. (1991). The Lopsided Ape: Evolution of the Generative Mind. New York: Oxford University Press.

Corballis, M. C. (2003). From mouth to hand: Gesture, speech, and the evolution of right-handedness. Behavioral and Brain Sciences, 26, 199–260. doi:10.1017/S0140525X03000062

Corballis, M. C. (2010). The gestural origins of language. WIREs Cognitive Science, 1, 2–7. doi:10.1002/wcs.2

Cruickshank, A. J., Gautier, J.-P., & Chappuis, C. (1993). Vocal mimicry in wild African Grey parrots Psittacus erithacus. Ibis, 135, 293–299. doi:10.1111/j.1474-919X.1993.tb02846.x

Curtiss, S. (1979). Genie: Language and cognition. UCLA Working Papers in Cognitive Linguistics, 1, 15–62.

Dalziell, A. H., & Magrath, R. D. (2012). Fooling the experts: Accurate vocal mimicry in the song of the superb lyrebird, Menura novaehollandiae. Animal Behaviour, 83, 1401–1410. doi:10.1016/j.anbehav.2012.03.009

Davidson, T. M. (2003). The great leap forward: The anatomic basis for the acquisition of speech and obstructive sleep apnea. Sleep Medicine, 4, 185–194. doi:10.1016/S1389-9457(02)00237-X

de Boer, B., & Fitch, W. T. (2010). Computer models of vocal tract evolution: An overview and critique. Adaptive Behavior, 18, 36–47. doi:10.1177/1059712309350972

de Waal, F. B. M. (1988). The communicative repertoire of captive bonobos (Pan paniscus), compared to that of chimpanzees. Behaviour, 106(3/4), 183–251. doi:10.1163/156853988X00269

Deacon, T. W. (1997). The Symbolic Species: The Co-Evolution of Language and the Brain. New York: Norton.

Del Hoyo, J., Elliott, A., & Sargatal, J. (1992). Handbook of the Birds of the World. Barcelona: Lynx Edicions.

Diamond, A. S. (1959). The History and Origin of Language. London: Methuen.

Doupe, A. J., & Kuhl, P. K. (1999). Birdsong and human speech: Common themes and mechanisms. Annual Review of Neuroscience, 22, 567–631. doi:10.1146/annurev.neuro.22.1.567

Duchin, L. E. (1990). The evolution of articulate speech: Comparative anatomy of the oral cavity in Pan and Homo. Journal of Human Evolution, 19, 687–697. doi:10.1016/0047-2484(90)90003-T

Dunbar, R. I. M. (1988). Primate Social Systems. Ithaca, NY: Cornell University Press. doi:10.1007/978-1-4684-6694-2

Dunbar, R. I. M. (1991). Functional significance of social grooming in primates. Folia Primatologica, 57, 121–131. doi:10.1159/000156574

Dunbar, R. I. M. (1992). Neocortex size as a constraint on group size in primates. Journal of Human Evolution, 22, 469–493. doi:10.1016/0047-2484(92)90081-J

Dunbar, R. I. M. (2003). The social brain: Mind, language, and society in evolutionary perspective. Annual Review of Anthropology, 32, 163–181. doi:10.1146/annurev.anthro.32.061002.093158

Elemans, C. P. H., Muller, M., Larsen, O. N., & van Leeuwen, J. L. (2009). Amplitude and frequency modulation control of sound production in a mechanical model of the avian syrinx. Journal of Experimental Biology, 212, 1212–1224. doi:10.1242/jeb.026872

Falk, D. (1980). Language, handedness, and primate brains: Did the Australopithecines sign? American Anthropologist, 82(1), 72–78. doi:10.1525/aa.1980.82.1.02a00040

Falk, D. (1983). Cerebral cortices of East African early hominids. Science, 221(4615), 1072–1074. doi:10.1126/science.221.4615.1072

Farabaugh, S. M., & Dooling, R. J. (1996). Acoustic communication in parrots: Laboratory and field studies of budgerigars, Melopsittacus undulatus. In D. E. Kroodsma & E. H. Miller (Eds.), Ecology and Evolution of Acoustic Communication in Birds (pp. 97–117). Ithaca, NY: Cornell University Press.

Feenders, G., Liedvogel, M., Rivas, M., Zapka, M., Horita, H., Hara, E., . . . Jarvis, E. D. (2008). Molecular mapping of movement-associated areas in the avian brain: A motor theory for vocal learning origin. PLoS ONE, 3, e1768. doi:10.1371/journal.pone.0001768

Ficken, M. S., Ficken, R. W., & Witkin, S. R. (1978). Vocal repertoire of the black-capped chickadee. The Auk, 95(1), 34–48. doi:10.2307/4085493

Finlay, B. L., Cheung, D., & Darlington, R. B. (2005). Developmental constraints on or developmental structure in brain evolution? In Y. Munakata & M. Johnson (Eds.), Attention and Performance XXI: Process of Change in Brain and Cognitive Development (pp. 131–162). Oxford: Oxford University Press.

Fitch, W. T. (2000a). Comparative vocal production and the evolution of speech: Reinterpreting the descent of the larynx. In A. Wray (Ed.), The Transition to Language (pp. 21–45). Oxford: Oxford University Press.

Fitch, W. T. (2000b). The evolution of speech: A comparative review. Trends in Cognitive Sciences, 4, 258–267. doi:10.1016/S1364-6613(00)01494-7

Fitch, W. T. (2000c). The phonetic potential of nonhuman vocal tracts: Comparative cineradiographic observations of vocalizing animals. Phonetica, 57, 205–218. doi:10.1159/000028474

Fitch, W. T. (2010). The Evolution of Language. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511817779

Fitch, W. T., & Hauser, M. D. (1995). Vocal production in nonhuman primates: Acoustics, physiology, and functional constraints on “honest” advertisement. American Journal of Primatology, 37, 191–219. doi:10.1002/ajp.1350370303

Fitch, W. T., & Jarvis, E. D. (2013). Birdsong and other animal models for human speech, song, and vocal learning. In M. A. Arbib (Ed.), Language, Music, and the Brain: A Mysterious Relationship (pp. 499–540). Cambridge, MA: MIT Press.

Fitch, W. T., & Reby, D. (2001). The descended larynx is not uniquely human. Proceedings of the Royal Society of London B, 268, 1669–1675. doi:10.1098/rspb.2001.1704

Fossey, D. (1972). Vocalizations of the mountain gorilla (Gorilla gorilla beringei). Animal Behaviour, 20, 36–53. doi:10.1016/S0003-3472(72)80171-4

Freeberg, T. M., & Harvey, E. M. (2008). Group size and social interactions are associated with calling behavior in Carolina chickadees (Poecile carolinensis). Journal of Comparative Psychology, 122, 312–318. doi:10.1037/0735-7036.122.3.312

Galaburda, A. M., LeMay, M., Kemper, T. L., & Geschwind, N. (1978). Right–left asymmetries in the brain. Science, 199, 852–856. doi:10.1126/science.341314

Galdikas, B. M. F., & Teleki, G. (1981). Variations in subsistence activities of female and male pongids: New perspectives on the origins of hominid labor division. Current Anthropology, 22(3), 241–256. doi:10.1086/202662

Gannon, P. J., Holloway, R. L., Broadfield, D. C., & Braun, A. R. (1998). Asymmetry of chimpanzee planum temporale: Humanlike pattern of Wernicke’s brain language area homolog. Science, 279, 220–222. doi:10.1126/science.279.5348.220

Geissler, D. B., & Ehret, G. (2004). Auditory perception vs. recognition: Representation of complex communication sounds in the mouse auditory cortical fields. European Journal of Neuroscience, 19, 1027–1040. doi:10.1111/j.1460-9568.2004.03205.x

Geissmann, T. (1999). Duet songs of the siamang, Hylobates syndactylus: II. Testing the pair-bonding hypothesis during a partner exchange. Behaviour, 136(8), 1005–1039.

Geissmann, T. (2002). Duet-splitting and the evolution of gibbon songs. Biological Reviews, 77, 57–76. doi:10.1017/S1464793101005826

Gentilucci, M., & Corballis, M. C. (2006). From manual gesture to speech: A gradual transition. Neuroscience and Biobehavioral Reviews, 30, 949–960. doi:10.1016/j.neubiorev.2006.02.004

George, J. C., Bada, J., Zeh, J., Scott, L., Brown, S. E., O’Hara, T., & Suydam, R. (1999). Age and growth estimates of bowhead whales (Balaena mysticetus) via aspartic acid racemization. Canadian Journal of Zoology, 77, 571–580. doi:10.1139/z99-015

Geschwind, N., & Levitsky, W. (1968). Human brain: Left–right asymmetries in temporal speech region. Science, 161(3837), 186–187. doi:10.1126/science.161.3837.186

Ghazanfar, A., & Hauser, M. (1999). The neuroethology of primate vocal communication: Substrates for the evolution of speech. Trends in Cognitive Sciences, 3, 377–384. doi:10.1016/S1364-6613(99)01379-0

Ghazanfar, A. A., & Rendall, D. (2008). Evolution of human vocal production. Current Biology, 18(11), R457–R460. doi:10.1016/j.cub.2008.03.030

Gibbons, J. W. (1987). Why do turtles live so long? BioScience, 37(4), 262–269. doi:10.2307/1310589

Gillam, E. H., & Chaverri, G. (2012). Strong individual signatures and weaker group signatures in contact calls of Spix’s disc-winged bat, Thyroptera tricolor. Animal Behaviour, 83, 269–276. doi:10.1016/j.anbehav.2011.11.002

Goodall, J. (1986). The Chimpanzees of Gombe: Patterns of Behavior. Cambridge, MA: Harvard University Press.

Gottlieb, G. (1963). A naturalistic study of imprinting in wood ducklings (Aix sponsa). Journal of Comparative and Physiological Psychology, 56, 86–91. doi:10.1037/h0046285

Goulden, R., Nation, P., & Read, J. (1990). How large can a receptive vocabulary be? Applied Linguistics, 11, 341–363. doi:10.1093/applin/11.4.341

Greene, M. C. L., & Mathieson, L. (1989). The Voice and Its Disorders (5th ed). London: Whurr Publishers.

Haimoff, E. F. (1983). Occurrence of antiresonance in the song of siamang Hylobates syndactylus. American Journal of Primatology, 5, 249–256. doi:10.1002/ajp.1350050309

Harcourt, A. H., Stewart, K. J., & Hauser, M. (1993). Functions of wild gorilla ‘close’ calls. I. Repertoire, context, and interspecific comparison. Behaviour, 124, 89–122. doi:10.1163/156853993X00524

Hardus, M., Lameira, A., Singleton, I., Morrogh-Bernard, H., Knott, C., Ancrenaz, M., . . . Wich, S. (2009). A description of the orangutan’s vocal and sound repertoire, with a focus on geographic variation. In S. Wich (Ed.), Orangutans: Geographic Variation in Behavioral Ecology and Conservation (pp. 49–64). Oxford: Oxford University Press.

Harris, M. A., & Lemon, R. E. (1972). Songs of song sparrows (Melospiza melodia): Individual variation and dialects. Canadian Journal of Zoology, 50, 301–309. doi:10.1139/z72-041

Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298, 1569–1579. doi:10.1126/science.298.5598.1569

Hayes, K. J., & Hayes, C. (1951). The intellectual development of a home-raised chimpanzee. Proceedings of the American Philosophical Society, 95(2), 105–109.

Heaton, J. T., & Brauth, S. E. (1999). Effects of deafening on the development of nestling and juvenile vocalizations in budgerigars (Melopsittacus undulatus). Journal of Comparative Psychology, 113, 314–320. doi:10.1037/0735-7036.113.3.314

Hewes, G. W. (1973). Primate communication and the gestural origins of language. Current Anthropology, 14, 5–24. doi:10.1086/204019

Hobaiter, C., & Byrne, R. W. (2011a). The gestural repertoire of the wild chimpanzee. Animal Cognition, 14, 745–767. doi:10.1007/s10071-011-0409-2

Hobaiter, C., & Byrne, R. W. (2011b). Serial gesturing by wild chimpanzees: Its nature and function for communication. Animal Cognition, 14, 827–838. doi:10.1007/s10071-011-0416-3

Holloway, R., & de la Coste-Lareymondie, C. (1982). Brain endocast asymmetry in pongids and hominids: Some preliminary findings on the paleontology of cerebral dominance. American Journal of Physical Anthropology, 58, 101–110. doi:10.1002/ajpa.1330580111

Hopkins, W. D., Taglialatela, J., & Leavens, D. A. (2007). Chimpanzees differentially produce novel vocalizations to capture the attention of a human. Animal Behaviour, 73, 281–286. doi:10.1016/j.anbehav.2006.08.004

Horwitz, B., Amunts, K., Bhattacharyya, R., Patkin, D., Jeffries, K., Zilles, K., & Braun, A. R. (2003). Activation of Broca’s area during the production of spoken and signed language: A combined cytoarchitectonic mapping and PET analysis. Neuropsychologia, 41, 1868–1876. doi:10.1016/S0028-3932(03)00125-8

Howard, R. D. (1974). The influence of sexual selection and interspecific competition on mockingbird song (Mimus polyglottos). Evolution, 28(3), 428–438. doi:10.2307/2407164

Ito, H., Ishikawa, Y., Yoshimoto, M., & Yamamoto, N. (2007). Diversity of brain morphology in teleosts: Brain and ecological niche. Brain, Behavior & Evolution, 69, 76–86. doi:10.1159/000095196

Janik, V. M., & Sayigh, L. S. (2013). Communication in bottlenose dolphins: 50 years of signature whistle research. Journal of Comparative Physiology A, 199, 479–489. doi:10.1007/s00359-013-0817-7

Janik, V. M., & Slater, P. J. B. (1997). Vocal learning in mammals. Advances in the Study of Behavior, 26, 59–99. doi:10.1016/S0065-3454(08)60377-0

Jarvis, E. D. (2004). Learned birdsong and the neurobiology of human language. Annals of the New York Academy of Sciences, 1016, 749–777. doi:10.1196/annals.1298.038

Jarvis, E. D. (2006). Selection for and against vocal learning in birds and mammals. Ornithological Science, 5, 5–14. doi:10.2326/osj.5.5

Jarvis, E. D. (2007). Neural systems for vocal learning in birds and humans: A synopsis. Journal of Ornithology, 148, S35–S44.

Jarvis, E. D., & Mello, C. V. (2000). Molecular mapping of brain areas involved in parrot vocal communication. The Journal of Comparative Neurology, 419, 1–31. doi:10.1002/(SICI)1096-9861(20000327)419:13.0.CO;2-M

Jürgens, U. (1995). Neuronal control of vocal production in nonhuman and human primates. In E. Zimmermann, J. D. Newman, & U. Jürgens (Eds.), Current Topics in Primate Vocal Communication (pp. 199–206). New York: Plenum Press.

Jürgens, U. (2002). Neural pathways underlying vocal production. Neuroscience & Biobehavioral Reviews, 26, 235–258. doi:10.1016/S0149-7634(01)00068-9

Kaminski, J., Call, J., & Fischer, J. (2004). Word learning in a domesticated dog: Evidence for “fast mapping.” Science, 304, 1682–1683. doi:10.1126/science.1097859

Kay, R. F., Cartmill, M., & Balow, M. (1998). The hypoglossal canal and the origin of human vocal behavior. Proceedings of the National Academy of Sciences, 95(9), 5417–5419. doi:10.1073/pnas.95.9.5417

Keller, G. B., & Hahnloser, R. H. R. (2009). Neural processing of auditory feedback during vocal practice in a songbird. Nature, 457, 187–190. doi:10.1038/nature07467

Kipper, S., Mundry, R., Sommer, C., Hultsch, H., & Todt, D. (2006). Song repertoire size is correlated with body measures and arrival date in common nightingales, Luscinia megarhynchos. Animal Behaviour, 71, 211–217. doi:10.1016/j.anbehav.2005.04.011

Klima, E. S., & Bellugi, U. (1979). The Signs of Language. Cambridge, MA: Harvard University Press.

Knight, C. (1998). Ritual/speech coevolution: A solution to the problem of deception. In J. R. Hurford, M. Studdert-Kennedy, & C. Knight (Eds.), Approaches to the Evolution of Language (pp. 68–91). Cambridge: Cambridge University Press.

Konishi, M. (1963). The role of auditory feedback in the vocal behavior of the domestic fowl. Zeitschrift für Tierpsychologie, 20, 349–367. doi:10.1111/j.1439-0310.1963.tb01156.x

Kroodsma, D. E., & Miller, E. H. (1996). Ecology and Evolution of Acoustic Communication in Birds. Ithaca, NY: Cornell University Press.

Kuhl, P. K. (2003). Human speech and birdsong: Communication and the social brain. Proceedings of the National Academy of Sciences, 100, 9645–9646. doi:10.1073/pnas.1733998100

Kuhl, P. K. (2007). Is speech learning ‘gated’ by the social brain? Developmental Science, 10, 110–120. doi:10.1111/j.1467-7687.2007.00572.x

Ladefoged, P. (1968). Linguistic aspects of respiratory phenomena. Annals of the New York Academy of Sciences, 155, 141–151. doi:10.1111/j.1749-6632.1968.tb56758.x

Laitman, J. T., Heimbuch, R. C., & Crelin, E. S. (1978). Developmental change in a basicranial line and its relationship to the upper respiratory system in living primates. American Journal of Anatomy, 152, 467–482. doi:10.1002/aja.1001520403

Lameira, A. R., Hardus, M. E., Kowalsky, B., de Vries, H., Spruijt, B., Sterck, E. H. M., . . . Wich, S. A. (2013). Orangutan (Pongo spp.) whistling and implications for the emergence of an open-ended call repertoire: A replication and extension. Journal of the Acoustical Society of America, 134, 2326–2335. doi:10.1121/1.4817929

Lavenex, P. B. (2000). Lesions in the budgerigar vocal control nucleus NLc affect production, but not memory, of English words and natural vocalizations. The Journal of Comparative Neurology, 421, 437–460. doi:10.1002/(SICI)1096-9861(20000612)421:43.0.CO;2-A

Levinson, S. T. (1980). The social behavior of the White-fronted Amazon (Amazona albifrons). In R. F. Pasquier (Ed.), Conservation of New World Parrots: International Council for Bird Preservation, Technical Publication No. 1 (pp. 403–417). Washington, DC: Smithsonian Institution Press.

Lieberman, P. (1984). The Biology and Evolution of Language. Cambridge, MA: Harvard University Press.

Lieberman, P., & Crelin, E. S. (1971). On the speech of Neanderthal man. Linguistic Inquiry, 2(2), 203–222.

Lieberman, P., Crelin, E. S., & Klatt, D. H. (1972). Phonetic ability and related anatomy of the newborn and adult human, Neanderthal man, and the chimpanzee. American Anthropologist, 74, 287–307. doi:10.1525/aa.1972.74.3.02a00020

Lieberman, P., & McCarthy, R. (2007). Tracking the evolution of language and speech: Comparing vocal tracts to identify speech capabilities. Expedition, 49, 15–20.

Lieberman, D. E., McCarthy, R. C., Hiiemae, K. M., & Palmer, J. B. (2001). Ontogeny of postnatal hyoid and larynx descent in humans. Archives of Oral Biology, 46, 117–128. doi:10.1016/S0003-9969(00)00108-4

Liska, J. (1993). Bee dances, bird songs, monkey calls, and cetacean sonar: Is speech unique? Western Journal of Communication, 57, 1–26. doi:10.1080/10570319309374428

Luchsinger, R., & Arnold, G. E. (1965). Voice, Speech, Language. London: Constable.

MacLarnon, A. M., & Hewitt, G. P. (1999). The evolution of human speech: The role of enhanced breathing control. American Journal of Physical Anthropology, 109, 341–363. doi:10.1002/(SICI)1096-8644(199907)109:33.0.CO;2-2

MacLarnon, A., & Hewitt, G. (2004). Increased breathing control: Another factor in the evolution of human language. Evolutionary Anthropology, 13, 181–197. doi:10.1002/evan.20032

MacNeilage, P. F. (2008). The Origin of Speech. Oxford: Oxford University Press.

Marler, P. (1970a). Birdsong and speech development: Could there be parallels? American Scientist, 58, 669–673.

Marler, P. (1970b). A comparative approach to vocal learning: Song development in white-crowned sparrows. Journal of Comparative and Physiological Psychology, 71, 1–25. doi:10.1037/h0029144

Marler, P. (1977). The evolution of communication. In T. A. Sebeok (Ed.), How Animals Communicate (pp. 45–70). Bloomington: Indiana University Press.

Marler, P., & Mitani, J. (1988). Vocal communication in primates and birds: Parallels and contrasts. In D. Todt, P. Goedeking, & D. Symmes (Eds.), Primate Vocal Communication (pp. 3–14). Berlin: Springer.

Marler, P., & Tenaza, R. (1977). Signaling behavior of apes with special reference to vocalizations. In T. A. Sebeok (Ed.), How Animals Communicate (pp. 965–1033). Bloomington: Indiana University Press.

Matsunaga, E., Kato, M., & Okanoya, K. (2008). Comparative analysis of gene expressions among avian brains: A molecular approach to the evolution of vocal learning. Brain Research Bulletin, 75, 474–479. doi:10.1016/j.brainresbull.2007.10.045

McCasland, J. S., & Konishi, M. (1981). Interaction between auditory and motor activities in an avian song control nucleus. Proceedings of the National Academy of Sciences, 78(12), 7815–7819. doi:10.1073/pnas.78.12.7815

McCowan, B., Doyle, L. R., & Hanser, S. F. (2002). Using information theory to assess the diversity, complexity and development of communicative repertoires. Journal of Comparative Psychology, 116, 166–172. doi:10.1037/0735-7036.116.2.166

McCowan, B., & Reiss, D. (1997). Vocal learning in captive bottlenose dolphins: A comparison with humans and nonhuman animals. In C. T. Snowdon & M. Hausberger (Eds.), Social Influences on Vocal Development (pp. 178–207). Cambridge: Cambridge University Press.

McElligott, A. G., Birrer, M., & Vannoni, E. (2006). Retraction of the mobile descended larynx during groaning enables fallow bucks (Dama dama) to lower their formant frequencies. Journal of Zoology, 270, 340–345. doi:10.1111/j.1469-7998.2006.00144.x

Melin, A., Young, H., Mosdossy, K., & Fedigan, L. (in press). Seasonality, extractive foraging and the evolution of primate sensorimotor intelligence. Journal of Human Evolution.

Moorman, S., Gobes, S. M. H., Kuijpers, M., Kerkhofs, A., Zandbergen, M. A., & Bolhuis, J. J. (2012). Human-like brain hemispheric dominance in birdsong learning. Proceedings of the National Academy of Sciences, 109, 12782–12787. doi:10.1073/pnas.1207207109

Morton, E. S., & Page, J. (1992). Animal Talk: Science and the Voices of Nature. New York: Random House.

Müller, A. E., & Anzenberger, G. (2002). Duetting in the titi monkey Callicebus cupreus: Structure, pair specificity and development of duets. Folia Primatologica, 73, 104–115. doi:10.1159/000064788

Nation, I. S. P., & Waring, R. (1997). Vocabulary size, text coverage, and word lists. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, Acquisition and Pedagogy (pp. 6–19). Cambridge: Cambridge University Press.

Nishimura, T., Mikami, A., Suzuki, J., & Matsuzawa, T. (2003). Descent of the larynx in chimpanzee infants. Proceedings of the National Academy of Sciences, 100, 6930–6933. doi:10.1073/pnas.1231107100

Nishimura, T., Oishi, T., Suzuki, J., Matsuda, K., & Takahashi, T. (2008). Development of the supralaryngeal vocal tract in Japanese macaques: Implications for the evolution of the descent of the larynx. American Journal of Physical Anthropology, 135, 182–194. doi:10.1002/ajpa.20719

Noack, H. R. (1902). Vocal powers of the yellow-billed magpie. The Condor, 4(4), 78–79. doi:10.2307/1361063

Nottebohm, F. (1970). Ontogeny of bird song. Science, 167(3920), 950–956. doi:10.1126/science.167.3920.950

Nottebohm, F. (1971). Neural lateralization of vocal control in a passerine bird. I. Song. Journal of Experimental Zoology, 177, 229–262. doi:10.1002/jez.1401770210

Nottebohm, F. (1972). The origins of vocal learning. American Naturalist, 106(947), 116–140. doi:10.1086/282756

Nottebohm, F. (1977). Asymmetries in neural control of vocalization in the canary. In S. Harnad (Ed.), Lateralization in the Nervous System (pp. 23–44). New York: Academic Press.

Nottebohm, F., & Arnold, A. P. (1976). Sexual dimorphism in vocal control areas of the songbird brain. Science, 194, 211–213. doi:10.1126/science.959852

Nottebohm, F., & Nottebohm, M. E. (1971). Vocalizations and breeding behavior of surgically deafened ring doves (Streptopelia risoria). Animal Behaviour, 19, 313–327. doi:10.1016/S0003-3472(71)80012-X

Ocklenburg, S., & Güntürkün, O. (2012). Hemispheric asymmetries: The comparative view. Frontiers in Psychology, 3, 1–9. doi:10.3389/fpsyg.2012.00005

O’Connor, R. J. (1984). The Growth and Development of Birds. New York: Wiley & Sons.

Passingham, R. E. (1981). Broca’s area and the origins of human vocal skill. Philosophical Transactions of the Royal Society of London Series B, 292, 167–175. doi:10.1098/rstb.1981.0025

Paton, J. A., Manogue, K. R., & Nottebohm, F. (1981). Bilateral organization of the vocal control pathway in the budgerigar, Melopsittacus undulatus. Journal of Neuroscience, 1(11), 1279–1288.

Pegg, C. (1992). Mongolian conceptualizations of overtone singing (xöömii). British Journal of Ethnomusicology, 1, 31–54. doi:10.1080/09681229208567199

Pepperberg, I. M. (1987). Interspecies communication: A tool for assessing conceptual abilities in the African Grey parrot. In G. Greenberg & E. Tobach (Eds.), Language, Cognition, Consciousness: Integrative Levels (pp. 31–56). Hillsdale, NJ: Erlbaum Associates.

Pepperberg, I. M. (1992). A review of the effects of social interaction on vocal learning. Netherlands Journal of Zoology, 43(1–2), 104–124. doi:10.1163/156854293X00241

Pepperberg, I. M. (1999). The Alex Studies. Cambridge, MA: Harvard University Press.

Pepperberg, I. M. (2006). Cognitive and communicative abilities of Grey parrots. Applied Animal Behaviour Science, 100, 77–86. doi:10.1016/j.applanim.2006.04.005

Pepperberg, I. M. (2007). Grey parrots do not always ‘parrot’: The roles of imitation and phonological awareness on the creation of new labels from existing vocalizations. Language Sciences, 29, 1–13. doi:10.1016/j.langsci.2005.12.002

Pepperberg, I. M. (2010). Vocal learning in Grey parrots: A brief review of perception, production, and cross-species comparisons. Brain and Language, 115, 81–91. doi:10.1016/j.bandl.2009.11.002

Perlman, M., Patterson, F. G., & Cohn, R. H. (2012). The human-fostered gorilla Koko shows breath control in play with wind instruments. Biolinguistics, 6(3–4), 433–444.

Petersen, M. R., Beecher, M. D., Zoloth, S. R., Moody, D. B., & Stebbins, W. C. (1978). Neural lateralization of species-specific vocalizations by Japanese macaques (Macaca fuscata). Science, 202, 324–327. doi:10.1126/science.99817

Petitto, L. A., & Marentette, P. F. (1991). Babbling in the manual mode: Evidence for the ontogeny of language. Science, 251, 1493–1496. doi:10.1126/science.2006424

Petkov, C. I., & Jarvis, E. D. (2012). Birds, primates, and spoken language origins: Behavioral phenotypes and neurobiological substrates. Frontiers in Evolutionary Neuroscience, 4, 1–24. doi:10.3389/fnevo.2012.00012

Philips, M., & Austad, S. N. (1990). Animal communication and social evolution. In M. Bekoff & D. Jamieson (Eds.), Interpretation and Explanation in the Study of Animal Behavior. Vol. 1 Interpretation, Intentionality and Communication (pp. 254–268). Boulder, CO: Westview.

Pinker, S. (2003). Language as an adaptation to the cognitive niche. In M. H. Christiansen & S. Kirby (Eds.), Language Evolution (pp. 16–37). Oxford: Oxford University Press.

Pollick, A. S., & de Waal, F. B. M. (2007). Ape gestures and language evolution. Proceedings of the National Academy of Sciences of the United States of America, 104, 8184–8189. doi:10.1073/pnas.0702624104

Poole, J. H., Tyack, P. L., Stoeger-Horwath, A. S., & Watwood, S. (2005). Elephants are capable of vocal learning. Nature, 434, 455–456. doi:10.1038/434455a

Portmann, A. (1946). Études sur la cérébralisation chez les oiseaux: I. Alauda, 14, 2–20.

Prather, J. F., Peters, S., Nowicki, S., & Mooney, R. (2008). Precise auditory-vocal mirroring in neurons for learned vocal communication. Nature, 451, 305–310. doi:10.1038/nature06492

Pulleyblank, E. G. (2008). Language as digital: A new theory of the origin and nature of human speech. Proceedings of the 20th North American Conference on Chinese Linguistics, 1, 1–20.

Quick, N. J., & Janik, V. M. (2012). Bottlenose dolphins exchange signature whistles when meeting at sea. Proceedings of the Royal Society B, 279, 2539–2545. doi:10.1098/rspb.2011.2537

Ralls, K., Fiorelli, P., & Gish, S. (1985). Vocalizations and vocal mimicry in captive harbor seals, Phoca vitulina. Canadian Journal of Zoology, 63, 1050–1056. doi:10.1139/z85-157

Reader, S. M., & Laland, K. N. (2002). Social intelligence, innovation, and enhanced brain size in primates. Proceedings of the National Academy of Sciences of the United States of America, 99, 4436–4441. doi:10.1073/pnas.062041299

Rendell, L., & Whitehead, H. (2001). Culture in whales and dolphins. Behavioral and Brain Sciences, 24, 309–382. doi:10.1017/S0140525X01243969

Ridgway, S., Carder, D., Jeffries, M., & Todd, M. (2012). Spontaneous human speech mimicry by a cetacean. Current Biology, 22, R860–R861. doi:10.1016/j.cub.2012.08.044

Riede, T., Fisher, J. H., & Goller, F. (2010). Sexual dimorphism of the zebra finch syrinx indicates adaptation for high fundamental frequencies in males. PLoS ONE, 5, e11368. doi:10.1371/journal.pone.0011368

Rizzolatti, G., & Arbib, M. (1998). Language within our grasp. Trends in Neurosciences, 21, 188–194. doi:10.1016/S0166-2236(98)01260-0

Rizzolatti, G., & Sinigaglia, C. (2008). Mirrors in the Brain. How We Share Our Actions and Emotions. New York: Oxford University Press.

Robinson, B. W. (1967). Vocalization evoked from the forebrain in Macaca mulatta. Physiology and Behavior, 2, 345–354. doi:10.1016/0031-9384(67)90050-9

Robinson, G. E., Fernald, R. D., & Clayton, D. (2008). Genes and social behavior. Science, 322, 896–900. doi:10.1126/science.1159277

Rogers, L. J. (1980). Lateralisation in the avian brain. Bird Behavior, 2(1), 1–12. doi:10.3727/015613880791573835

Salinas-Melgoza, A., & Wright, T. F. (2012). Evidence for vocal learning and limited dispersal as dual mechanisms for dialect maintenance in a parrot. PLoS ONE, 7, e48667. doi:10.1371/journal.pone.0048667

Salmi, R., Hammerschmidt, K., & Doran-Sheehy, D. M. (2013). Western gorilla vocal repertoire and contextual use of vocalizations. Ethology, 119, 831–847. doi:10.1111/eth.12122

Salwiczek, L. H., & Wickler, W. (2004). Birdsong: An evolutionary parallel to human language. Semiotica, 151, 163–182. doi:0037–1998/04/0151–0163

Sanvito, S., Galiberti, F., & Miller, E. H. (2007). Observational evidences of vocal learning in southern elephant seals: A longitudinal study. Ethology, 113, 137–146. doi:10.1111/j.1439-0310.2006.01306.x

Sasaki, C. T., Levine, P. A., Laitman, J. T., & Crelin, E. S. (1977). Postnatal descent of the epiglottis in man. Archives of Otolaryngology, 103, 169–171. doi:10.1001/archotol.1977.00780200095011

Saunders, D. A. (1983). Vocal repertoire and individual vocal recognition in the short-billed white-tailed black cockatoo, Calyptorhynchus funereus latirostris Carnaby. Australian Wildlife Research, 10, 527–536. doi:10.1071/WR9830527

Savage-Rumbaugh, S., & McDonald, K. (1988). Deception and social manipulation in symbol-using apes. In R. W. Byrne & A. Whiten (Eds.), Machiavellian Intelligence: Social Expertise and the Evolution of Intellect in Monkeys, Apes, and Humans (pp. 224–237). New York: Clarendon Press/Oxford University Press.

Savage-Rumbaugh, S., Shanker, S. G., & Taylor, T. J. (1998). Apes, Language, and the Human Mind. New York: Oxford University Press.

Sawaguchi, T., & Kudo, H. (1990). Neocortical development and social structure in primates. Primates, 31, 283–290. doi:10.1007/BF02380949

Schel, A. M., Machanda, Z., Townsend, S. W., Zuberbühler, K., & Slocombe, K. (2013). Chimpanzee food calls are directed at specific individuals. Animal Behaviour, 86, 955–965. doi:10.1016/j.anbehav.2013.08.013

Schwagmeyer, P. L., Bartlett, T. L., & Schwabl, H. G. (2008). Dynamics of house sparrow biparental care: What contexts trigger partial compensation? Ethology, 114, 459–468. doi:10.1111/j.1439-0310.2008.01480.x

Seibert, L. M. (2006). Social behavior of Psittacine birds. In A. U. Luescher (Ed.), Manual of Parrot Behavior (pp. 43–48). Ames, IA: Blackwell Publishing.

Sekulic, R., & Chivers, D. J. (1986). The significance of call duration in howler monkeys. International Journal of Primatology, 7, 183–190. doi:10.1007/BF02692317

Sherwood, C. C., Broadfield, D. C., Holloway, R. L., Gannon, P. J., & Hof, P. R. (2003). Variability of Broca’s area homologue in African great apes: Implications for language evolution. The Anatomical Record Part A, 271A, 276–285. doi:10.1002/ar.a.10046

Simonyan, K., & Horwitz, B. (2011). Laryngeal motor cortex and control of speech in humans. Neuroscientist, 17, 197–208. doi:10.1177/1073858410386727

Stahl, W. R. (1967). Scaling of respiratory variables in mammals. Journal of Applied Physiology, 22, 453–460.

Stoddard, P. K. (1996). Vocal recognition of neighbors by territorial passerines. In D. E. Kroodsma & E. H. Miller (Eds.), Ecology and Evolution of Acoustic Communication in Birds (pp. 356–374). Ithaca, NY: Cornell University Press.

Stoeger, A. S., Mietchen, D., Oh, S., de Silva, S., Herbst, C. T., Kwon, S., & Fitch, T. (2012). An Asian elephant imitates human speech. Current Biology, 22, 2144–2148. doi:10.1016/j.cub.2012.09.022

Striedter, G. F. (1994). The vocal control pathways in budgerigars differ from those in songbirds. Journal of Comparative Neurology, 343, 35–56. doi:10.1002/cne.903430104

Studdert-Kennedy, M. (1998). The particulate origins of language generativity: From syllable to gesture. In J. R. Hurford, M. Studdert-Kennedy, & C. Knight (Eds.), Approaches to the Evolution of Language (pp. 169–176). Cambridge: Cambridge University Press.

Sturdy, C. B., Wild, J. M., & Mooney, R. (2003). Respiratory and telencephalic modulation of vocal motor neurons in the zebra finch. Journal of Neuroscience, 23(3), 1072–1086.

Suthers, R., Goller, F., & Pytte, C. (1999). The neuromuscular control of birdsong. Philosophical Transactions of the Royal Society B, 354, 927–939. doi:10.1098/rstb.1999.0444

Tchernichovski, O., Lints, T., Mitra, P. P., & Nottebohm, F. (1999). Vocal imitation in zebra finches is inversely related to model abundance. Proceedings of the National Academy of Sciences, 96(22), 12901–12904. doi:10.1073/pnas.96.22.12901

Tchernichovski, O., & Nottebohm, F. (1998). Social inhibition of song imitation among sibling male zebra finches. Proceedings of the National Academy of Sciences, 95(15), 8951–8956. doi:10.1073/pnas.95.15.8951

Templeton, C. N., Greene, E., & Davis, K. (2005). Allometry of alarm calls: Black-capped chickadees encode information about predator size. Science, 308, 1934–1937. doi:10.1126/science.1108841

Teramitsu, I., Kudo, L. C., London, S. E., Geschwind, D. H., & White, S. A. (2004). Parallel FoxP1 and FoxP2 expression in songbird and human brain predicts functional interaction. The Journal of Neuroscience, 24, 3152–3163. doi:10.1523/JNEUROSCI.5589-03.2004

Thalmann, U., Geissmann, T., Simone, A., & Mutschler, T. (1993). The indris of Anjanaharibe-Sud, Northeastern Madagascar. International Journal of Primatology, 14, 357–381. doi:10.1007/BF02192772

Thomas, M. V., & Haas, R. C. (2004). Abundance, age structure, and spatial distribution of lake sturgeon (Acipenser fulvescens) in the St. Clair System. Michigan Department of Natural Resources, Lake St. Clair Fisheries Research Station, Harrison Township, MI. Fisheries Research Report, 2076.

Todt, D. (1975). Social learning of vocal patterns and modes of their applications in Grey parrots. Zeitschrift für Tierpsychologie, 39, 178–188. doi:10.1111/j.1439-0310.1975.tb00907.x

Tomasello, M. (2008). The Origins of Human Communication. Cambridge, MA: MIT Press.

Tomasello, M., & Call, J. (1997). Primate Cognition. Oxford: Oxford University Press.

Tu, H.-W., & Dooling, R. J. (2012). Perception of warble song in budgerigars (Melopsittacus undulatus): Evidence for special processing. Animal Cognition, 15, 1151–1159. doi:10.1007/s10071-012-0539-1

Tu, H.-W., Osmanski, M. S., & Dooling, R. J. (2011). Learned vocalizations in budgerigars (Melopsittacus undulatus): The relationship between contact calls and warble song. Journal of the Acoustical Society of America, 129, 2289–2297. doi:10.1121/1.3557035

Tucker, V. A. (1968). Respiratory exchange and evaporative water loss in the flying budgerigar. Journal of Experimental Biology, 48, 67–87.

Vargha-Khadem, F., Gadian, D. G., Copp, A., & Mishkin, M. (2005). FOXP2 and the neuroanatomy of speech and language. Nature Reviews Neuroscience, 6, 131–138. doi:10.1038/nrn1605

West, M. J., Stroud, A. N., & King, A. P. (1983). Mimicry of the human voice by European starlings: The role of social interaction. The Wilson Bulletin, 95, 635–640.

Whangarei Native Bird Recovery Centre (n.d.). Woof Woof the Talking Tui. Retrieved from http://www.nbr.org.nz/node/7

Whitaker, H. A. (1976). Neurobiology of language. In E. C. Carterette & M. P. Friedman (Eds.), Handbook of Perception, vol. VII: Language and Speech (pp. 121–144). New York: Academic Press.

Wich, S. A., Krützen, M., Lameira, A. R., Nater, A., Arora, N., Bastian, M. L., . . . van Schaik, C. P. (2012). Call cultures in orang-utans? PLoS ONE, 7, e36180. doi:10.1371/journal.pone.0036180

Wild, J. M. (1997). Neural pathways for the control of birdsong production. Journal of Neurobiology, 33(5), 653–670. doi:10.1002/(SICI)1097-4695(19971105)33:5<653::AID-NEU11>3.0.CO;2-A

Wild, J. M., Goller, F., & Suthers, R. A. (1998). Inspiratory muscle activity during birdsong. Journal of Neurobiology, 36, 441–453. doi:10.1002/(SICI)1097-4695(19980905)36:33.0.CO;2-E

Williams, H., Crane, L. A., Hale, T. K., Esposito, M. A., & Nottebohm, F. (1992). Right-side dominance for song control in the zebra finch. Journal of Neurobiology, 23, 1006–1020. doi:10.1002/neu.480230807

Wind, J. (1983). Primate evolution and the emergence of speech. In E. de Grolier (Ed.), Glossogenetics: The Origin and Evolution of Language (pp. 15–35). Paris: Harwood Academic Publishers.

Winkworth, A. L., Davis, P. J., Adams, R. D., & Ellis, E. (1995). Breathing patterns during spontaneous speech. Journal of Speech and Hearing Research, 38, 124–144. doi:10.1044/jshr.3801.124

Winter, P., Handley, P., Ploog, D., & Schott, D. (1973). Ontogeny of squirrel monkey calls under normal conditions and under acoustic isolation. Behaviour, 47, 230–239. doi:10.1163/156853973X00085

Wong, J., Stewart, P. D., & MacDonald, D. W. (1999). Vocal repertoire in the European badger (Meles meles): Structure, context, and function. Journal of Mammalogy, 80(2), 570–588. doi:10.2307/1383302

Zann, R. (1990). Song and call learning in wild zebra finches in south-east Australia. Animal Behaviour, 40, 811–828. doi:10.1016/S0003-3472(05)80982-0

Zann, R., & Dunstan, E. (2008). Mimetic song in superb lyrebirds: Species mimicked and mimetic accuracy in different populations and age classes. Animal Behaviour, 76, 1043–1054. doi:10.1016/j.anbehav.2008.05.021

Zeveloff, S. I., & Boyce, M. S. (1982). Why human neonates are so altricial. The American Naturalist, 120(4), 537–542. doi:10.1086/284010

Zlatev, J. (2002). Mimesis: The “missing link” between signals and symbols in phylogeny and ontogeny? In A. Pajunen (Ed.), Mimesis, Sign and Language Evolution (pp. 93–122). Turku, Finland: Turku University Press.

Zollikofer, C. P. E., Ponce de León, M. S., Lieberman, D. E., Guy, F., Pilbeam, D., Likius, A., . . . Brunet, M. (2005). Virtual cranial reconstruction of Sahelanthropus tchadensis. Nature, 434, 755–759. doi:10.1038/nature03397

Zuberbühler, K., & Janmaat, K. R. L. (2010). Foraging cognition in non-human primates. In M. Platt & A. Ghazanfar (Eds.), Primate Neuroethology (pp. 64–83). Oxford: Oxford University Press.

Volume 9: pp. 75-98


Recent Advances in the Genetics of Vocal Learning

Michael C. Condro
Molecular, Cellular and Integrative Physiology Interdepartmental Program, University of California, Los Angeles

Stephanie A. White
Department of Integrative Biology and Physiology, University of California, Los Angeles



Abstract

Language is a complex communicative behavior unique to humans, and its genetic basis is poorly understood. Genes associated with human speech and language disorders provide some insights, originating with the FOXP2 transcription factor, a mutation in which is the source of an inherited form of developmental verbal dyspraxia. Subsequently, targets of FOXP2 regulation have been associated with speech and language disorders, along with other genes. Here, we review these recent findings that implicate genetic factors in human speech. Due to the exclusivity of language to humans, no single animal model is sufficient to study the complete behavioral effects of these genes. Fortunately, some animals possess subcomponents of language. One such subcomponent is vocal learning, which though rare in the animal kingdom, is shared with songbirds. We therefore discuss how songbird studies have contributed to the current understanding of genetic factors that impact human speech, and support the continued use of this animal model for such studies in the future.

Keywords: Autism, Basal ganglia, Cntnap2, FoxP1, FoxP2, KE family, Speech, Vocal learning, Zebra finch

Author Note: Correspondence concerning this article should be addressed to Stephanie A. White, PhD, Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Dr. East, Los Angeles, CA 90095-7239. E-mail: sawhite@ucla.edu.


Introduction

Vocal learning, which includes the ability to imitate sounds with one’s voice, is a rare trait in the animal kingdom. To date, only a few groups of mammals have demonstrated a capacity for vocal learning. These include certain species of echolocating bats, cetaceans, pinnipeds, elephants, and of course, humans (Fitch, 2012; Knörnschild, Nagy, Metz, Mayer, & von Helversen, 2010; Stoeger et al., 2012). Outside of mammals, three groups of birds are capable of learning a portion of their vocalizations, namely hummingbirds, parrots, and songbirds, the last of which make up about half of all bird species (Reiner et al., 2004). The disparate pattern of vocal learning across taxa is characteristic of convergent evolution. A parsimonious explanation is thus that preadaptations for vocal learning emerged from non-learning ancestors of each taxon (Fitch, 2011). These preadaptations are likely genetically encoded, which suggests that despite the distant relationships between vocal learners, there are some common genetic factors. One [...] 2013), even though these disorders are also characterized by language deficits. In contrast, within a sample of dyslexic children and their unaffected relatives, a single nucleotide polymorphism (T vs. C) in an intron of FOXP2, identified as rs7782412, was correlated with nonword repetition (NWR) score (Peter et al., 2011), with the major allele (T, frequency of 0.558) being associated with impairment on this task. Since dyslexia is associated with impairments of written, but not spoken, language (Lyon, Shaywitz, & Shaywitz, 2003), these data suggest that FOXP2 aberrations affect language processing as well as spoken motor ability. Notably, language processing deficits and low verbal IQ are symptomatic in the KE family as well (Vargha-Khadem, Watkins, Alcock, Fletcher, & Passingham, 1995), though it is unclear whether these traits are directly related to the FOXP2 mutation, or are sequelae of DVD.

FOXP2 Function in the Developing Brain

In all animals, the FOX family of transcription factors is involved in regulating biological processes that affect embryogenesis and tissue development, as well as processes underlying adult cancer and aging (Benayoun, Caburet, & Veitia, 2011; Carlsson & Mahlapuu, 2002). FoxP1, 2, and 4 are expressed in embryonic neural tissues (Lu, Li, Yang, & Morrisey, 2002; Shu et al., 2001), and may therefore mediate neurogenesis and/or differentiation. Experimental reduction of Foxp2 function in the cortex of embryonic mice, through either shRNA knockdown or overexpression of the dominant negative KE form of the protein, repressed the transition from radial precursor to intermediate neuronal progenitor, resulting in decreased cortical neurogenesis (Tsui, Vessey, Tomita, Kaplan, & Miller, 2013). Interestingly, overexpression of human FOXP2 increases neurogenesis, whereas overexpression of murine Foxp2 does not. These data indicate that human FOXP2 exerts a greater neurogenic effect, which is perhaps significant for the construction of the brain, including neural circuits involved in language processing. (By convention, Foxp2 with only the first letter capitalized denotes the mouse form of the protein, FOXP2 in all capitals denotes the human form, and the camel-case FoxP2 is used for all other chordates; Kaestner, Knochel, & Martinez, 2000.) Foxp2, in conjunction with Foxp4, appears to promote neurogenesis by regulation of N-cadherin (Rousso et al., 2012). In embryonic chick and mouse spinal cord, overexpression of either FoxP increases the release of neural progenitors from the neuroepithelium, whereas knockdown of both prevents this release. These effects have yet to be tested in the cortex.
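For readers who handle gene symbols programmatically, the capitalization convention described in this section can be expressed as a small helper. The following is a minimal Python sketch, not part of the original review: the function name `foxp_symbol`, its parameters, and the plain-text species labels are hypothetical, and it assumes only the three cases stated in the text (human, mouse, and all other chordates).

```python
def foxp_symbol(species: str, member: int = 2) -> str:
    """Style a FoxP family symbol for a species.

    Hypothetical helper illustrating the convention stated in the text
    (after Kaestner et al., 2000): all capitals for human, only the first
    letter capitalized for mouse, camel case for all other chordates.
    """
    key = species.strip().lower()
    if key == "human":
        return f"FOXP{member}"   # human form: all capitals
    if key == "mouse":
        return f"Foxp{member}"   # mouse form: first letter only
    return f"FoxP{member}"       # any other chordate: camel case


# Example: the same family member written for three species.
for name in ("human", "mouse", "zebra finch"):
    print(name, foxp_symbol(name))
```

A sketch like this only normalizes spelling; it does not check that the named species is a chordate or that the family member exists in that species.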

Another mechanism whereby FoxP2 may promote the development of vocal learning circuitry is through neurite development, especially during embryogenesis. A recent gene ontology study using Foxp2-ChIP and expression arrays found that Foxp2 targets related to neurite development are enriched (Vernes et al., 2011). Using ex vivo neuronal cultures, this study found that expression of wild type Foxp2 accelerates neurite growth, whereas expression of the KE mutant form has the opposite effect. Ectopic expression of Foxp2, achieved by removing the 3’UTR, which includes its regulatory elements, delays neurite outgrowth in vitro, though by seven days neurites form properly (Clovis, Enard, Marinaro, Huttner, & De Pietri Tonelli, 2012).

FOXP2 regulates gene activity by binding to DNA either as a homodimer, or by heterodimerizing with FOXP1 or FOXP4. There are six known isoforms of FOXP2 (Figure 1), two of which are truncated and lack FOX domains (Bruce & Margolis, 2002). The truncated forms, referred to as FOXP2.10+ due to their alternate splicing at exon 10 (Figure 1), do not localize to the nucleus, but may still dimerize with other FOXP2 isoforms (Vernes et al., 2006). Therefore, it is hypothesized that FOXP2.10+ forms act as posttranslational regulators of FOXP2 activity. FOXP2 can also interact with C-terminal binding protein (CtBP) to repress transcription (Li et al., 2004). A new association has been identified between FOXP2 and the gene protection of telomeres 1 (POT1; Tanabe, Fujita, & Momoi, 2011). In cell culture, when POT1 is expressed alone or coexpressed with the KE dominant negative mutation (R553H) of FOXP2, it is not localized in the nucleus. Only when POT1 is coexpressed with wild type FOXP2 is nuclear localization observed. Loss of POT1 can elicit a DNA damage response and cause cell arrest (Hockemeyer, Sfeir, Shay, Wright, & de Lange, 2005). FOXP2, in conjunction with POT1, could therefore affect cell cycling during development. The human phenotype exhibited by the KE mutation may be partly mediated by the inability of the mutant FOXP2 to associate with POT1, thereby disrupting cell cycling during the development of neural tissues subsequently necessary for vocal learning (Tanabe et al., 2011).

Molecular Phylogeny of FoxP2

FoxP2 is highly conserved across species, particularly in the zinc finger and DNA-binding FOX domains (Figure 1). Two amino acid differences between humans and chimpanzees (303N and 325S in the human isoform; Figure 1) are unique to humans among living primates (Enard et al., 2002). Interestingly, these substitutions are shared with extinct hominids such as Neanderthals (Green et al., 2010; Krause et al., 2007; Reich et al., 2010), for whom the capability for language is still uncertain (Benítez-Burraco & Longa, 2012). Between the zebra finch and human isoforms, there are only five additional substitutions, including one in the zinc finger domain, which is conserved in primates and rodents, but differs in the zebra finch ortholog (Teramitsu, Kudo, London, Geschwind, & White, 2004). Importantly, the DNA binding region is conserved between zebra finches and humans, including the arginine residue corresponding to position 553 in humans that is the site of the KE mutation. There is a considerable amount of homology (>80%) in the zinc finger, leucine zipper, and DNA-binding domains between human FOXP2 and the single FoxP ortholog of fruit flies and honeybees, from which it is believed the vertebrate FoxP family expanded (Kiya, Itoh, & Kubo, 2008; Scharff & Petri, 2011). As in vertebrates, invertebrate FoxP is predicted to be involved in procedural learning and communication, consistent with its neural expression and suggesting that it is most distinct from mammalian FoxP3, which is not associated with neural cell types (Scharff & Petri, 2011). FoxP2 is not well-conserved among echolocating bats nor between bats and other mammals, however, which has been postulated to be the result of a selection pressure on FoxP2 in bats for the evolution of echolocation (Li, Wang, Rossiter, Jones, & Zhang, 2007).

Figure 1. Schematic of human FOXP2 isoforms I–VI. FOXP2 is alternatively spliced as two major isoforms: the full-length isoform I and a truncated isoform III. Variations of either major isoform contain inserted or omitted amino acids (II, IV–VI), indicated here as the difference in number of amino acids (gray triangles). Both major isoforms possess a glutamine-rich (Q-rich) area, zinc finger (Zn) and leucine zipper (Leu) domains. Full-length isoforms of FOXP2 also possess a DNA-binding domain and an acid region on the C-terminus. Isoforms III and VI also have an additional 10 amino acids on the C-terminus that are not shared with the full-length isoforms. Arrows indicate amino acid substitutions between human and chimpanzee (303 and 325) or related to human speech disorders (328 and 553).

Songbird Studies of FoxP2

Humans are the only living animals that communicate with language (Berwick, Friederici, Chomsky, & Bolhuis, 2013), leaving no single animal model that sufficiently encapsulates every component of the behavior. However, facets of language are shared with other species. Vocal learning is one such facet that is shared with select groups of mammals, but as yet common laboratory models (e.g. rats, mice, nonhuman primates) fail to demonstrate this ability (Arriaga, Zhou, & Jarvis, 2012; Fitch, 2000; Mahrt, Perkel, Tong, Rubel, & Portfors, 2013). Rather, songbirds have been the principal animal models for vocal imitation in a laboratory setting (Panaitof, 2012). Vocal learning in both humans and songbirds relies on connections between the cortex, basal ganglia, and thalamus (Doupe & Kuhl, 1999). An advantage of the songbird model is that the neural structures responsible for vocal production and learning, called song production nuclei, are interconnected and anatomically distinct from the larger neurological subdivisions in which they reside, but are comprised of similar cell types. The song production nuclei are therefore assumed to function similarly to the circuits underlying other forms of procedural learning, but are dedicated to vocal learning. This feature of the songbird neuroanatomy has been incredibly useful for studies of vocal learning genes, many of which are discussed in this review. Among songbirds, zebra finches have been widely used due to their ease of breeding in captivity, as well as the sexual dimorphism of vocal learning (only males sing; Immelmann, 1969) and the song production system, which is incomplete in females (Konishi & Akutagawa, 1985; Nottebohm & Arnold, 1976).

FoxP2 mRNA expression is robust in the basal ganglia of humans and zebra finches (Teramitsu et al., 2004). In the zebra finch striatopallidal song nucleus, area X, FoxP2 transcript and protein levels correlate negatively with early morning singing. FoxP2 protein decreases in area X over the course of two hours when a male directs his songs at a female or when he practices them alone (Miller et al., 2008; Thompson et al., 2013); the latter is referred to as undirected singing. The transcript decreases during the course of two hours of undirected, but not directed, singing (Hilliard, Miller, Horvath, & White, 2012; Teramitsu & White, 2006; Teramitsu, Poopatanapong, Torrisi, & White, 2010). Downregulation of the mRNA is most potent in young birds engaged in sensorimotor learning (Teramitsu et al., 2010): the more a juvenile practices, the lower his area X FoxP2 levels. This regulation appears largely due to motor activity, rather than auditory input, as levels also decrease in birds that have been deafened. However, there may be an additional auditory component to this phenomenon, as the degree of downregulation is correlated with the amount of singing (Hilliard, Miller, Fraley, Horvath, & White, 2012) only in juveniles that maintained their hearing (Teramitsu et al., 2010). The distinct behavioral regulation of the mRNA and protein suggests that there is post-transcriptional regulation of FoxP2, at least in the case of directed singing. In any case, both phenomena have been replicated at the two-hour time point, namely that the protein levels decline with both directed and undirected singing, whereas the mRNA declines only with undirected song practice. Specifically, new findings show that microRNAs that target FoxP2 are up-regulated during undirected, but not directed, singing and lead to corresponding decreases in FoxP2 mRNA only for the former (Shi et al., 2013).

In their study of FoxP2 protein expression, Thompson et al. (2013) identified two categories of FoxP2-labeled neurons: those with large nuclei intensely labeled by the FoxP2 antibody, and those with smaller nuclei and weaker labeling. One possibility is that these subtypes represent different stages of maturation within a single population of medium spiny neurons (MSNs). Intensely labeled neurons may be younger neurons either in the process of migrating or already having migrated to area X, whereas weakly labeled neurons may be mature and integrated into the basal ganglia microcircuitry. The intensely labeled neurons peak in density within area X around 35 days and decline with age. The density of weakly labeled ‘mature’ neurons does not change with age. However, the density of these neurons in area X depends on behavioral context. Adult males that sing for two hours in the morning exhibit a reduced density of weakly labeled neurons, a finding that replicates the behaviorally modulated levels of FoxP2 described by Fisher et al. (1998) and Miller et al. (2008).

In the zebra finch, experimentally induced reduction of FoxP2 at a developmental stage prior to the onset of vocal motor learning via injection of lentivirus containing an shRNA construct partially impairs the ability to learn the tutor’s song (Haesler et al., 2007). Though shRNA-injected young zebra finches are capable of producing sounds similar to those of their tutors, they consistently fail to accurately imitate the tutor’s song, often omitting or repeating individual syllables. Additionally, they are unable to accurately imitate the spectral characteristics and timing of the tutor’s song. During this period of song learning, new neurons expressing FoxP2, which are hypothesized to affect behavioral plasticity, migrate into area X (Rochefort, He, Scotto-Lomassese, & Scharff, 2007). Surprisingly, though, knockdown of FoxP2 does not prevent the proliferation of new neurons from the ventricular zone. It does, however, reduce the number of dendritic spines on MSNs, suggesting that FoxP2 affects neuronal plasticity without affecting proliferation and migration of new neurons (Schulz, Haesler, Scharff, & Rochefort, 2010). These data provide support for a functional role of FoxP2 in vocal learning subserved by basal ganglia circuits, in addition to mediating the development of the brain regions involved.

Mouse Models of Foxp2

Several mutant mouse strains have been generated to study the effects of Foxp2 on brain morphology as well as vocal and nonvocal behaviors. In one such model, the two amino acid positions at which humans differ from chimpanzees (Enard et al., 2002) were changed to conform to the human sequence (Enard et al., 2009). The resulting mice have altered cortico-basal ganglia circuitry in the form of increased dendrite length in Foxp2-expressing bipolar spiny neurons in layer 6 of the primary motor cortex, MSNs in the striatum, and neurons in the parafascicular nucleus of the thalamus. Long-term depression (LTD) is increased in MSNs of the striatum, and dopamine concentrations are reduced in several brain regions, including the striatum (Reimers-Kipping, Hevers, Pääbo, & Enard, 2011). Although amygdalar neurons and cerebellar Purkinje neurons also express the humanlike Foxp2 protein, their dendrite lengths are unchanged. Purkinje cell LTD is also similar to control levels, which suggests that the humanlike Foxp2 impacts mainly basal ganglia microcircuits (Enard et al., 2009; Reimers-Kipping et al., 2011). In terms of behavior, the mutant mice exhibit decreased exploration, spend more time in groups, and as neonates emit ultrasonic vocalizations with reduced pitch and increased frequency modulation compared to control mice. Interestingly, Foxp2 knockout heterozygotes with a functional wild type allele show the opposite effects on dopamine levels and behavior (Enard et al., 2009).

Several mouse models have been generated to mimic FOXP2 mutations associated with human disorders. These knock-in mice include murine versions of the KE mutation (R552H; Fujita et al., 2008; Groszer et al., 2008), a similar mutation that results in an amino acid substitution at a different site within the DNA binding domain (N549K; Groszer et al., 2008), and a truncation (S321X) that fails to produce a protein, similar to a human mutation associated with speech impairment (Groszer et al., 2008). These loss of function knock-in mutations are lethal in homozygotes, with mice usually dying within the first month of life, though N549K homozygotes can survive for several months. All knock-in mutants have decreased cerebellar volume and Purkinje cell dendritic arbor (Fujita et al., 2008; Groszer et al., 2008), but otherwise no gross anatomical disturbances were observed in the rest of the brain. Homozygous knockout, R552H, and S321X mutant mouse pups make fewer ultrasonic distress calls, though there are mixed reports about the quality of these vocalizations (Fujita et al., 2008; Gaub, Groszer, Fisher, & Ehret, 2010; Groszer et al., 2008; Shu et al., 2005). Recently, Bowers, Perez-Pouchoulen, Edwards, & McCarthy (2013) investigated these calls using wild type rats and found qualitative and quantitative sex differences. Similar to mice, isolation calls are emitted from rat pups separated from their dam and trigger her to retrieve the pup back to the nest. The authors found that male pups call more frequently, at a lower pitch, and more quietly than do female pups. In turn, the dam responds differently to calls made by each sex, preferentially retrieving male before female pups. Male rat pups have more Foxp2 protein than female pups in several brain areas. Experimental reduction of Foxp2 by injection of siRNA into the ventricles during the first two days of life reverses this sex effect in calling behavior. 
Treated male pups call less frequently and at a higher pitch than control males. Notably, treatment of female pups with siRNA causes their vocalizations to become male-like, with higher frequency of calling, lower pitch, and lower amplitude. The authors posit that the reversal caused by Foxp2 siRNA is the result of a decrease in Foxp2 in males and a rebound-effect increase in females, although no evidence is provided for the latter. Interestingly, the dam retrieves siRNA-injected females before siRNA-treated males, providing evidence that the retrieval response of the dam depends on the vocal behavior rather than other sexually dimorphic characteristics. This study also finds that, in postmortem human brain tissue, there is more FOXP2 in the cortices of 4-year-old girls than age-matched boys, which coincides with gender-based language differences in children at this age. The authors posit that sex differences in brain FoxP2 levels correlate with the more ‘communicative’ sex in human and rodent species.

Since separation calls are not learned (Arriaga et al., 2012) and therefore are not analogous to human speech, studies in Foxp2 mutants examined other classical learned behavioral skills. One such skill was measured by Morris water maze place learning, in which mice were given four consecutive training trials each day for four days, after which the platform was moved and training began again (Santucci, 1995). Heterozygous knockout mice perform as well as wild types (Shu et al., 2005), indicating that this hippocampal-based learning task is not affected by loss of Foxp2. However, R552H mutants are impaired on the accelerating rotarod, a procedural learning task in which mice are placed on a rod that rotates around its axis at an increasing rate and the amount of time before the animal falls from the rod is recorded. Performance on the rotarod relies on basal ganglia activity, suggesting that R552H mutant mice have deficits in activity in these brain regions (French et al., 2012; Groszer et al., 2008). R552H heterozygous mutant mice have corresponding neurophysiological abnormalities, including reduced striatal LTD and increased cerebellar paired pulse facilitation (Groszer et al., 2008). In vivo electrophysiological recordings of these mice during the accelerating rotarod learning task show that striatal firing rate decreases in R552H mutants, whereas it increases in wild type mice, and that temporal coordination is altered (French et al., 2012). Interestingly, these mutant mice can perform other striatal-based learning tasks, such as pressing a lever for a reward, as well as controls. These data suggest that Foxp2 activity in the basal ganglia is involved in procedural learning tasks in nonvocal learning species, perhaps in a similar manner to vocal learning in humans and songbirds.

2. FOXP1

FoxP1 is the most similar molecule to FoxP2 and, perhaps not surprisingly, is also linked to human speech. As previously mentioned, FoxP1 and FoxP2 may form heterodimers that regulate transcription in areas where their expression overlaps (Li et al., 2004; Shu et al., 2001; B. Wang, Lin, Li, & Tucker, 2003). Initial support for a role of FOXP1 in vocal learning stems from a study of comparative gene expression in two vocal learners: humans and zebra finches. Unlike FoxP2, for which differential expression in song nuclei depends on behavior, FoxP1 signals constitutively ‘mark’ the song system, with mRNA enrichment in area X (in males), HVC, and RA relative to their surrounding tissues (Teramitsu et al., 2004). In humans, FOXP1 and FOXP2 are found in separate cortical layers: the former is found primarily in layers 2/3 with less expression in deeper layers, whereas the latter is primarily in layer 6 (Ferland, Cherry, Preware, Morrisey, & Walsh, 2003; Teramitsu et al., 2004). Both transcripts are expressed in the human striatum, similar to the expression pattern in the basal ganglia nucleus area X of songbirds. The possible co-regulation of transcription by FoxP members in the songbird song production system, and the comparative gene expression in humans suggested that FOXP1 also plays a role in human language (Teramitsu et al., 2004). Subsequently, Pariani, Spencer, Graham, & Rimoin (2009) reported the first human case of FOXP1 alteration and speech impairment, in which the patient had a large deletion in chromosome 3 including the FOXP1 gene. Speech delay was one of several deficits, which also included anatomical and neurological abnormalities. Shortly after this report, several similar cases were published in which patients with FOXP1 deletions presented cognitive deficits, motor control deficits, speech delay, and autism (Carr et al., 2010; Hamdan et al., 2010; Horn, 2012; Horn et al., 2010; O’Roak et al., 2011; Palumbo et al., 2013; Talkowski et al., 2012). 
In all the reported cases, however, the language impairment described was more consistent with speech delay than DVD. A screen of patients with DVD failed to identify FOXP1 as a risk factor (Vernes, MacDermot, Monaco, & Fisher, 2009). Though many of the phenotypes associated with mutations in FOXP1 and FOXP2 are non-overlapping, language impairment is common to both (Bacon & Rappold, 2012).

3. CNTNAP2

CNTNAP2 in Human Disease

Similar to the discovery of the relationship between FOXP2 and language through the KE family, a rare mutation in the contactin associated protein-like 2 (CNTNAP2) gene was discovered in a genetically related population of Old Order Amish children (Strauss et al., 2006). Some members of this group are afflicted with cortical dysplasia-focal epilepsy (CDFE). The disorder is characterized by the onset of seizures at about 2 years of age, mental retardation, hyperactivity, pervasive developmental delay or autism in the majority of cases, and language regression by the age of 3 in all cases. Patients with CDFE are homozygous for a deletion of a single base pair in CNTNAP2 exon 22, 3709delG. Subsequent to the initial association between CNTNAP2 mutation and CDFE, it was revealed that CNTNAP2 is transcriptionally regulated by FOXP2. In chromatin immunoprecipitation (ChIP) assays, fragments of intron 1 of CNTNAP2 were bound by FOXP2 at the canonical binding sequence CAAATT (Vernes et al., 2008; Vernes et al., 2011). Mutation of these sites to CGGGTT prevented FOXP2 binding. Overexpression of FOXP2 in the human-derived neuroblastoma cell line SY5Y decreased CNTNAP2 transcription. To further investigate the relationship between CNTNAP2 and language ability, variants of the gene were screened in a cohort of families with SLI-afflicted members. Nine intronic SNPs between exons 13 and 15 of CNTNAP2 correlated with NWR scores. The most strongly correlated SNP, rs17236239, was also associated with expressive language score. Quantitative transmission disequilibrium testing (QTDT) confirmed a relationship between measures of language ability and four of these SNPs, but failed to confirm a relationship for rs7794745 in a new sample of families containing members with SLI (Newbury et al., 2011). None of the SNPs associated with language-related QTDT measures in a sample of families with dyslexia, indicating that there are separate factors that affect language ability.

Other common CNTNAP2 polymorphisms have been identified that associate with diagnoses of autism (Arking et al., 2008; Bakkaloglu et al., 2008), for which language impairment is a core deficit, and a language-related measure, age at first word (Alarcón et al., 2008). Interestingly, inherited CNTNAP2 polymorphisms that are associated with disease occur mainly in introns (Alarcón et al., 2008; Arking et al., 2008), suggesting either these SNPs are in linkage disequilibrium with yet unidentified markers in exons, or the SNPs themselves affect transcriptional regulation of the gene. Quantitative transmission disequilibrium testing revealed an association between the SNP rs2710102 and NWR (Peter et al., 2011). Thirteen de novo mutations in CNTNAP2 have been described in ASD patients that result in an amino acid change in the protein, eight of which were predicted to hinder function (Bakkaloglu et al., 2008). The de novo mutations, along with the CDFE mutation identified by Strauss et al. (2006), were investigated further to determine whether they did in fact affect protein function. HEK cells and rat hippocampal neurons were transfected with either wild type human CNTNAP2 or the mutant forms (Falivelli et al., 2012). The mutation associated with CDFE, 3709delG, causes a frameshift that results in the loss of the single transmembrane and intracellular domains of the protein (Figure 2; Strauss et al., 2006). This causes the normally membrane-bound protein to be secreted instead (Falivelli et al., 2012), presumably eliminating its normal functionality, and possibly introducing novel effects. Another mutation, D1129H (Figure 2), also prevents surface expression of CNTNAP2: the protein remains restricted to the endoplasmic reticulum, unable to move to the plasma membrane. The substitution interferes with the LNS4 domain of CNTNAP2 and is presumed to cause misfolding of the protein.
Most other mutations investigated did not show restricted localization to the ER, though a mutation in a highly conserved amino acid, I869T (Figure 2), had less surface staining than the wild type form of the protein. Theoretically, mutations that interfere with intracellular trafficking of CNTNAP2 would also interfere with protein function. However, with the exception of 3709delG, these mutations do not always result in an autistic phenotype, indicating that other genetic, environmental, and developmental factors are involved in the presentation of the disorder.

CNTNAP2 Function in the Brain

Investigation of genes related to the formation of language-related brain areas revealed CNTNAP2 enrichment in the cortical superior temporal gyrus, associated with language processing and production (Abrahams & Geschwind, 2008). Moreover, CNTNAP2 is enriched in embryonic human frontal cortex, but not in rat or mouse at comparable stages of development. Not only do these data suggest a potential role for CNTNAP2 in the development of neural circuitry underlying language, they conform to the idea that this enrichment is relevant to vocal learning in humans, a behavior not shared with rodents.

Figure 2. Schematic of human CNTNAP2. CNTNAP2 consists of a single discoidin domain (DISC), four laminin-G domains (LamG), EGF repeats, a single transmembrane region (TM), and a putative protein 4.1 binding region (4.1m). CDFE indicates the subregion of the protein that is deleted in cases of cortical dysplasia-focal epilepsy in an Old Order Amish population (Li et al., 2004; Strauss et al., 2006). Arrows indicate two other amino acid changes associated with language impairment (869 and 1129).

The brains of healthy and autistic individuals homozygous for risk alleles rs7794745 and rs2710102 exhibit functional differences. Subjects with one or both risk variants exhibit increased activation of the frontal operculum and medial frontal gyrus relative to subjects homozygous for the non-risk allele (Whalley et al., 2011). Event-related brain potentials are altered during a language perception task in individuals carrying the rs7794745 risk allele (Kos et al., 2012). Scott-Van Zeeland, Abrahams, et al. (2010) investigated the correlation of risk allele rs2710102 with connectivity both within the medial prefrontal cortex (mPFC) and between the mPFC and other areas. In this study, subjects participated in a reward-based learning task in which they were presented with abstract images and were asked to assign them to either “Group 1” or “Group 2.” Upon correct classification, subjects were given either a monetary or social reward, or a “neutral” reward in which they were simply told whether or not they were correct. This experimental paradigm activates frontostriatal circuits (Scott-Van Zeeland, Dapretto, Ghahremani, Poldrack, & Bookheimer, 2010). Subjects with the CNTNAP2 risk allele rs2710102 exhibited increased local connectivity in the mPFC relative to subjects without the risk variant. This occurred in a genetically dominant fashion regardless of the autism phenotype of the risk allele carriers. In addition, risk allele carriers had less focused long-range connectivity between the mPFC and several other brain areas, as well as decreased lateralization, a result which is associated with autism-like behaviors. These data suggest that CNTNAP2 variants increase the risk of autism through alteration of frontal lobar connectivity.

Animal Models for Cntnap2

As yet, the most well-characterized function of Cntnap2 is to cluster voltage-gated potassium channels at juxtaparanodes of axons in the peripheral nervous system (Poliak et al., 2003). Recently, another potential function was discovered through an RNA interference (RNAi) survey of autism susceptibility genes (Anderson et al., 2012). Of the 13 genes included in the RNAi screen, Cntnap2 knockdown had the most pronounced effects on network activity in mouse hippocampal cultures. In mouse cortical cultures transfected with short hairpin RNA (shRNA) targeting endogenous Cntnap2, calcium transients from evoked synaptic responses were reduced in amplitude to approximately 70% of controls, though action potential frequency was not affected. Conversely, knockdown of the binding partner of Cntnap2, contactin 2, had the opposite effect, increasing the amplitude of the action potential. Cntnap2 expression level has no effect on neuronal excitability. Instead, the underlying cause of the action potential attenuation is a global decrease in synaptic transmission. Both excitatory and inhibitory evoked currents are reduced by the shRNA, as well as the frequency of miniature postsynaptic potentials, suggesting that the number of synaptic sites on affected neurons is reduced. This is further confirmed by changes to cellular morphology of transfected neurons. Cntnap2 knockdown results in shorter neurites with fewer branches, and dendritic spines with smaller spine heads. These data are evidence that Cntnap2 may affect the development of neurons by increasing the number of active synaptic sites and facilitating network activity.

Given the evidence for a role of CNTNAP2 in human speech, it may also function in birdsong (Panaitof, Abrahams, Dong, Geschwind, & White, 2010). In adult male zebra finches, Cntnap2 transcript is enriched in the robust nucleus of the arcopallium (RA) and the lateral magnocellular nucleus of the anterior nidopallium (LMAN), cortical nuclei in the song production system. Projection neurons from RA are similar to layer 5 pyramidal neurons in mammalian cortex whose axons descend below the telencephalon to synapse onto motor neurons (Jarvis, 2004), and LMAN shares similarities with the mammalian prefrontal cortex (Kojima, Kao, & Doupe, 2013). No such enrichment of Cntnap2 is observed in HVC (acronym used as a proper name), another song nucleus analogous to mammalian cortical layer 2/3 (Jarvis, 2004), and there is reduced expression in area X relative to the striatopallidum. Each song nucleus is composed of cell types similar to those in the surrounding tissues, which suggests that the differential expression of genes within the song nucleus indicates a specific role for those genes in vocal learning and/or production. In contrast to males, adult females have moderate transcript levels in RA and LMAN. Female zebra finches have an underdeveloped area X that is not visible by common staining procedures (Balmer, Carels, Frisch, & Nick, 2009), but Cntnap2 expression is nonetheless uniform across the entire striatopallidum. Interestingly, in young females (<50d) Cntnap2 is enriched in RA to the same degree as in males, and declines to the level of the surrounding arcopallium with age. The reduction in gene expression coincides with the sensorimotor period of song learning in males, a time at which the male begins to practice singing. The percentage of cells expressing the protein in female RA decreases at this time point (Condro & White, 2014). 
This sexually dimorphic expression supports the hypothesis that Cntnap2 expression in RA is important for proper production of learned vocalizations in songbirds. According to this hypothesis, interference with Cntnap2 translation in male RA should disrupt song learning and/or production (Haesler et al., 2007).

As with Foxp2, mouse models of Cntnap2 risk variants may not capture language deficits associated with their respective disorders. However, they can be used to study other aspects of behavior and physiology that may impact future studies focused on vocal learning. Initially, outbred Cntnap2(-/-) mice were reported to have no gross anatomical or neurological abnormalities (Poliak et al., 2003). However, when these mice were crossbred with the C57BL/6J strain, subsequent generations exhibited neurological abnormalities similar to human patients with CDFE (Strauss et al., 2006), including epileptic seizures induced by mild handling starting before 6 months of age (Penagarikano et al., 2011). These knockout mice present neuronal migration abnormalities, with an increase in the incidence of ectopic neurons, a reduced number of inhibitory interneurons in the cortex and the striatum, along with impaired network synchrony in the cortex. Additionally, there is increased spontaneous inhibitory activity in cortical layers 2/3, disrupting the balance between inhibition and excitation (Lazaro, Penagarikano, Dong, Geschwind, & Golshani, 2012). These mice exhibit behavior similar to the human autistic phenotype, including repetitive motions, such as self-grooming and digging, behavioral inflexibility on learned tasks, such as the Morris water maze or T maze, decreased social activity with other mice, reduced nest building, and a decrease in the number of ultrasonic separation calls. Less frequent vocalizations could be symptomatic of impaired communication similar to language regression in autism, or alternatively due to a decreased motivation for maternal interactions, similar to social impairment in autistic children. The two hypotheses are not mutually exclusive, though the former is less likely, since this particular call type in mice is innate (Arriaga et al., 2012) and therefore not subject to regression. 
Interestingly, many of the behavioral deficits in the knockout mice can be partially rescued by treatment with risperidone, a medication used to treat the symptoms of autism (Penagarikano et al., 2011). However, the drug does not improve social interactions for the knockout mice. The effects of risperidone on communicative behavior have not yet been reported. Rescue by the drug of some of the effects of knocking out Cntnap2 further validates the relationship between Cntnap2 and autism. These knockout mice can be used to test other drugs to treat some of the symptoms of autism, though perhaps not language impairment. This model is especially pertinent to CDFE, for which the mutation renders CNTNAP2 nonfunctional. The more common polymorphisms associated with ASD and SLI risk lie in introns, creating a challenge to develop mouse models. A songbird model may offer an advantage in understanding the role of CNTNAP2 in language in that knockdown of Cntnap2 can be targeted to song nuclei, isolating its effects on vocal behavior.

4. Hepatocyte Growth Factor Signaling Pathway Genes

In keeping with the theme of FoxP2 as a molecular entry point into gene networks involved in speech and language, another class of FoxP2 target genes is implicated in language deficits. Three genes in the hepatocyte growth factor (HGF) signaling pathway are each targets of FOXP2 regulation and associated with disorders of human speech and language. The first is the HGF receptor tyrosine kinase MET (Bottaro et al., 1991), which has been linked to ASD (Mukamel et al., 2011). The second, also linked to ASD, is the urokinase plasminogen activator receptor (uPAR, or PLAUR when referring to the human gene; Campbell et al., 2007), which was long thought to indirectly activate HGF through its binding partner urokinase (Mars, Zarnegar, & Michalopoulos, 1993), though more recently this function has been challenged (Eagleson, Campbell, Thompson, Bergman, & Levitt, 2011; Owen et al., 2010). The third is sushi-repeat protein, X-linked 2 (SRPX2), a uPAR ligand (Royer-Zemmour et al., 2008) that also binds HGF (Tanaka et al., 2012), and may account for the HGF-mediated effects of uPAR signaling. SRPX2 is linked to language through association with childhood seizures of the Rolandic fissure, which can cause language disabilities (Roll et al., 2006). FOXP2 binds the promoter regions of all three genes and represses transcription (G. Konopka et al., 2012; Mukamel et al., 2011; Roll et al., 2010). Recent evidence suggests that FOXP2 regulation of SRPX2 affects synaptogenesis and vocalizations in mice (Sia, Clem, & Huganir, 2013). Similar to CNTNAP2, the distribution of MET in human fetal brain is complementary to that of FOXP2. In cultures of normal human neural progenitors and established cell lines, endogenous FOXP2 expression increases with maturity as MET decreases (G. Konopka et al., 2012). Notably, the KE mutant (R553H) fails to repress uPAR or SRPX2 (Roll et al., 2010). These data suggest that HGF signaling is altered in cases of language disorders associated with FOXP2. 
To date HGF itself has not been directly associated with a disorder relating to speech; however, given the association of these other HGF signaling pathway genes with language disorders, it would not be surprising if such an association were discovered.

MET was initially investigated as an autism susceptibility gene due to the similarity of neuroanatomical abnormalities attributed to loss of MET signaling in the cortex and those found in cases of autism (Campbell et al., 2006). A SNP in the promoter region of MET, rs1858830, was identified as a site associated with elevated risk of diagnosis of autism. The “C” variant at this site causes a reduction in transcription of the gene, and alters transcription factor binding relative to the non-risk “G” variant. The “C” variant is overrepresented in cases of ASD, associated with reduced MET protein in the cortex (Campbell et al., 2007; Campbell, Li, Sutcliffe, Persico, & Levitt, 2008) and social and communication impairments in cases of ASD (Campbell, Warren, Sutcliffe, Lee, & Levitt, 2010). In healthy human embryonic brains, MET is enriched in the temporal cortex, an area involved in language processing, and to a lesser degree in the hippocampus and occipital cortex (Mukamel et al., 2011). HGF signaling through MET promotes development of cortical projection neurons (Eagleson et al., 2011). In microarray analysis, MET has been identified as a member of a gene module correlated with differentiation, particularly with axon guidance (G. Konopka et al., 2012). Though protein levels are dynamic during development, a peak of expression coincides with increased development of neurites and synapse formation, suggesting a role for MET in neuronal connectivity (Judson, Bergman, Campbell, Eagleson, & Levitt, 2009). MET is expressed in axon tracts of projection neurons of the neocortex, including those that descend into the striatum, consistent with the hypothesis that MET is a factor in development of neural circuits, which when perturbed, leads to symptoms of ASD and language impairment.

In a screen for other ASD-related genes in the MET signaling pathway, a SNP in the promoter region of PLAUR, rs344781, was identified as a risk factor for autism diagnosis with an interaction effect with MET rs1858830. uPAR knockout mice have been generated, but thus far studies have focused on the effects of knockout on neural migration and seizure activity. Whereas MET seems to promote cortical projection neuron migration and growth, uPAR seems to affect inhibitory neurons in much the same manner, though the mechanism remains unclear (Eagleson et al., 2011). Homozygous knockouts exhibit spontaneous seizures as well as a reduction of parvalbumin-positive interneurons in the anterior cingulate and parietal cortices (Eagleson, Bonnin, & Levitt, 2005; Powell et al., 2003). The loss of inhibitory interneurons may affect the balance of excitation and inhibition, a phenomenon associated with autism (Eagleson et al., 2011). Interestingly, uPAR may be absent in birds (NCBI search, BLAST), suggesting that it is not common to all vocal learning species. There may be a different molecule in songbirds that replaces uPAR function. Though uPAR was originally thought to be involved in the activation of HGF required for binding to MET (Mars et al., 1993), recent evidence suggests that uPAR and its binding partner urokinase contribute very little to the process, and rather other serine proteases are responsible for HGF activation (Owen et al., 2010). Phenotypic differences in uPAR and MET knockout mice support this hypothesis (Eagleson et al., 2011). However, uPAR is involved in several signaling cascades independent of MET (Blasi & Carmeliet, 2002), any of which may be related to autism or language impairment.

SRPX2 is a chondroitin sulfate proteoglycan that binds to both HGF and uPAR (Royer-Zemmour et al., 2008; Tanaka et al., 2012). Mutations in SRPX2 can result in seizures originating in the Rolandic fissure, which can lead to abnormal brain morphology in the form of polymicrogyria, and are associated with oral and speech dyspraxia and cognitive impairment (Roll et al., 2006). One such mutation, resulting in a tyrosine-to-serine substitution at position 72, is related to both Rolandic seizures and orofacial and fine motor impairment. The substitution occurs in a region thought to affect protein–protein interactions. In this same region, a site at position 75 is highly conserved among primates, but has changed in humans since the split from chimpanzees, suggesting an evolutionary mechanism for human speech (Royer et al., 2007), reminiscent of the amino acid substitutions in FOXP2 between the two species (Enard et al., 2002). As mentioned previously, new evidence has emerged for the role of SRPX2 in mouse vocalizations (Sia, Clem, & Huganir, 2013). Other chondroitin sulfate proteoglycans are involved in formation of perineuronal nets, which can affect plasticity of sensory systems (McRae, Rocco, Kelly, Brumberg, & Matthews, 2007). In songbirds, development of perineuronal nets around song nuclei correlates with the development of song, and it is hypothesized that destruction of these nets permits the reopening of the critical period for song learning after crystallization (Balmer et al., 2009). It is possible, therefore, that SRPX2 is involved in similar processes, which could affect learned vocalizations in humans and songbirds alike.

5. Stuttering Genes

Stuttering, or stammering, is a condition in which speech is interrupted by involuntary repetitions of syllables or words, prolongation of syllables, or pauses during speech. Inheritance patterns strongly suggest a multifactorial genetic basis for the disorder, with relatively little environmental influence (Kang et al., 2011; Kraft & Yairi, 2012). However, it was not until recently that any specific gene was identified as a factor in stuttering. Genome-wide linkage revealed a locus of disequilibrium on chromosome 12 for stuttering (Riaz et al., 2005), which was investigated more closely in a large pedigree, identified only as Family PKST72, in which roughly half of the living members stutter (Kang et al., 2010). Genotyping in this pedigree revealed a relationship with a SNP (G3598A), which causes a glutamate-to-lysine amino acid substitution in a gene encoding a subunit of N-acetylglucosamine-1-phosphate transferase (GNPTAB). The ‘A’ variant was more common in stuttering family members, and family members homozygous for the ‘G’ variant were much less likely to stutter. Unlike FOXP2 in the KE family, though, G3598A exhibits some phenotypic plasticity, in that not every family member with an ‘A’ variant stutters, and some family members homozygous for the ‘G’ variant do stutter. Sex has been previously shown to be a factor in recovery from stuttering, with females being four times more likely to recover (Ambrose, Cox, & Yairi, 1997). Such may be the case for the two female non-stuttering family members homozygous for the ‘A’ variant (Kang et al., 2010). Three more amino acid changes in GNPTAB were associated with stuttering in a broader population sample, as well as three others found in GNPTG, another subunit of the phosphotransferase, and three more mutations in N-acetylglucosamine-1-phosphodiester alpha-N-acetylglucosaminidase (NAGPA). 
These mutations account for a small percentage (<10%) of stuttering cases in this study, indicating that still unidentified factors contribute to the disorder. GNPTAB, GNPTG and NAGPA act as enzymes in the lysosomal targeting pathway. Other mutations in GNPTAB and GNPTG are associated with mucolipidoses, disorders associated with deficits in development, mental ability, and speech, though this study is the first to link mutations in these genes to stuttering (Kang et al., 2010; Kang & Drayna, 2012). The mechanisms by which these mutations affect speech are unknown. Other loci have been identified as potential sites for mutations associated with stuttering (Kraft & Yairi, 2012; Raza, Amjad, Riazuddin, & Drayna, 2012), but as yet no other genes have been discovered. One study did find an association between stuttering and a SNP in the DRD2 gene in a Chinese Han population (Lan et al., 2009), but this result was not replicated in a larger sample (Kang et al., 2011). Additionally, a case was reported in which a partial deletion of CNTNAP2 was found in a stuttering patient (Petrin et al., 2010), suggesting that there may be some overlap of genetic factors in stuttering and other language disorders.

6. Other Genes of Interest

Additional genes likely contribute to vocal learning. In a screen of genes within a region on chromosome 16 associated with SLI, two candidates correlated with measures of language ability: c-maf-inducing protein (CMIP) and calcium-transporting ATPase, type 2C, member 2 (ATP2C2; Newbury et al., 2009). A subsequent study found an association of CMIP, but not of ATP2C2, with reading-related measures (Newbury et al., 2011; Scerri et al., 2011). Though both molecules are expressed in the brain, their functions therein are still poorly understood. In other tissues, CMIP is involved in a cell signaling cascade (Grimbert et al., 2003), and ATP2C2 is part of a pathway responsible for shuttling divalent ions to the Golgi apparatus (Faddy et al., 2008; Missiaen, Dode, Vanoevelen, Raeymaekers, & Wuytack, 2007). Other genes potentially involved in language comprehension include doublecortin domain containing protein 2 (DCDC2) and KIAA0319, which have both been associated with dyslexia (Czamara et al., 2011; Newbury et al., 2011; M. L. Rice, Smith, & Gayán, 2009; Scerri et al., 2011). Recently, DCDC2 was found to affect neuronal firing: its loss increases excitability and compromises spike-timing precision (Che, Girgenti, & Loturco, 2013). Given that the other genes implicated in language acquisition and production seem to be involved in either neurogenesis or neurite growth, perhaps CMIP, ATP2C2, and DCDC2 affect either or both of these processes. The function of KIAA0319 in language processing, however, is beginning to be better understood. KIAA0319 is involved in the clathrin-mediated endocytosis pathway (Levecque, Velayos-Baeza, Holloway, & Monaco, 2009). Knockdown of Kiaa0319 expression in rat auditory cortex results in increased neuronal input resistance accompanied by increased excitability in response to auditory stimuli (Centanni et al., 2013).
The authors hypothesize that this change in neuronal excitability, relevant to variants of KIAA0319 in cases of dyslexia, impedes differentiation of speech and non-speech sounds. Another gene of interest in relation to its role in language is FMR1, which encodes the fragile X mental retardation protein (FMRP). Language delay and impairments in both receptive and expressive language are characteristic of children with fragile X syndrome (FXS; Finestack, Richmond, & Abbeduto, 2009). In the zebra finch song system, FMRP is expressed in song nuclei HVC, LMAN, RA, and area X (Winograd, Clayton, & Ceman, 2008). Interestingly, FMRP is enriched in male RA around the onset of the sensorimotor learning phase. These data suggest that FMRP may be a common factor in learned vocalizations in both humans and songbirds.

7. MicroRNA

MicroRNAs (miRs) are short (~22 nucleotide), noncoding RNAs that post-transcriptionally regulate synthesis of specific proteins through either degradation of the mRNA or inhibition of translation (He & Hannon, 2004; Pasquinelli, 2012). These small molecules are thought to “fine-tune” gene expression involved in many biological processes. Research on miR functions in the brain has focused primarily on roles in development and neurogenesis (Liu & Zhao, 2009; Sun, Crabtree, & Yoo, 2013), though studies are starting to emerge on activational effects in the mature brain (Bredy, Lin, Wei, Baker-Andresen, & Mattick, 2011; Fiore, Khudayberdiev, Saba, & Schratt, 2011; Shi et al., 2013). MicroRNAs can affect learning and memory-based tasks, such as fear conditioning, context conditioning, place preference, and Morris water maze performance (Griggs, Young, Rumbaugh, & Miller, 2013; Konopka et al., 2010; Olde Loohuis et al., 2011; Wang & Barres, 2012). Another class of small noncoding RNAs comprises those that interact with regulatory piwi proteins (piRNAs) in spermatogenic cells. Their mechanisms and functions are still poorly understood, though evidence suggests they are involved in epigenetic control of transcription (Kuramochi-Miyagawa et al., 2008). Recently, piRNAs have been identified as factors contributing to associative learning in Aplysia through regulation of CREB2 (Rajasethupathy et al., 2012). However, investigation into the role of small RNAs in vocal learning has only just begun.

As with many of the genes described in this review, FOXP2 may be used as a starting point by identifying miRs that regulate expression of FOXP2, or are targets of FOXP2 regulation (or in some cases, both). In microarray analysis used to identify gene networks influenced by Foxp2 expression, 22 miRs were identified as transcriptional targets of murine Foxp2 (Vernes et al., 2011). Of these, several have documented functions in the brain: miR-9, -29a, -30a, -30d, -34b, -124a, -125b, and -137. Additional sources of potential vocal learning–associated miRs come from studies in songbirds. In zebra finches, miR-137 was included in a microarray study investigating genes regulated by singing in basal ganglia nucleus area X, and was found to belong to the same gene network module as FoxP2 and to be negatively associated with the number of motifs sung (Hilliard, Miller, Fraley, et al., 2012). As mentioned in an earlier section, miR-9 and -140-5p are expressed in zebra finch area X, are upregulated by singing in juveniles and adults, and are associated with reduced levels of FoxP2 mRNA (Shi et al., 2013). Expression of five miRs in cortical auditory regions is affected by exposure to conspecific song: miR-92, -124, and -129-5p decreased, and miR-25 and -192 increased (Gunaratne et al., 2011). Though the birds in this latter study were adults, and therefore past the critical phase of song learning, the miRs involved in auditory processing may very well impact song learning earlier in life. miR-2954, a putatively avian-specific miR, is expressed at greater levels in males than females in all tissues tested, including brain (Luo et al., 2012). miR-2954 may therefore play a role in the sex-based differences in neuroanatomy and song learning in this species.
miRs like miR-2954, which appear to be unique to birds or to zebra finches specifically (Gunaratne et al., 2011; Luo et al., 2012), are not likely a common factor underlying behavior in all vocal learning species, although they may regulate genes in a manner common to all vocal learners. A better understanding of the mRNA targets of these miRs will be required to test this hypothesis.

How might miRs in the brain affect vocal learning? As with other genes implicated in vocal learning, many miRs act early in development to regulate neurogenesis (Sun et al., 2013), which may contribute to the organization of brain structures underlying speech and vocal learning. In chick spinal cord, miR-9 acts through regulation of FoxP1 to direct motor neuron specification (Otaegi, Pollock, Hong, & Sun, 2011). In the ventricular zone of developing mouse and zebrafish brain, miR-9 promotes neural differentiation by suppression of proteins involved in the proliferation of neural stem cells (Coolen, Thieffry, Drivenes, Becker, & Bally-Cuif, 2012; Saunders et al., 2010; Shibata, Nakao, Kiyonari, Abe, & Aizawa, 2011; Tan, Ohtsuka, González, & Kageyama, 2012; Zhao, Sun, Li, & Shi, 2009). Similarly, miR-124 expression in the developing CNS is thought to direct cell differentiation to a neuronal fate by suppressing non-neuronal transcripts (Cheng, Pastrana, Tavazoie, & Doetsch, 2009; Lim et al., 2005; Makeyev, Zhang, Carrasco, & Maniatis, 2007; Sanuki et al., 2011; Visvanathan, Lee, Lee, Lee, & Lee, 2007). miR-137 also regulates maturation of neurons (Smrt et al., 2010).

Additionally, miRs may have activational effects that support vocal learning. Several miRs impact neurite outgrowth and synaptogenesis. miR-9, for example, is expressed in axons of post-mitotic cortical neurons and limits or fine-tunes axon growth (Dajas-Bailador et al., 2012). Brain-derived neurotrophic factor (BDNF) indirectly affects axon growth through regulation of miR-9. Application of BDNF for a short period reduces miR-9 levels and promotes subsequent growth of the axon, but prolonged exposure leads to an increase in miR-9 and a cessation of axon growth. In the songbird, BDNF is thought to be an important factor for neural connectivity between motor song nuclei in development and in adulthood in seasonal learners (Brenowitz, 2013); therefore, miR-9 activity in the songbird brain may be regulated by BDNF exposure. Additionally, predicted binding sites for miR-9 are found in the 3’-untranslated region (3’UTR) of matrix metallopeptidase-9 (MMP9), an enzyme that affects synaptic morphology (Konopka et al., 2010). miR-9 represses both Foxp1 (Otaegi et al., 2011) and Foxp2 (Clovis et al., 2012; Shi et al., 2013), whereas Foxp2 promotes miR-9 expression in neuron-like cells in culture (Vernes et al., 2011). This argues for the existence of a Foxp2/miR-9 feedback loop, in which miR-9 indirectly affects gene expression downstream of FoxP2. miR-29a/b changes dendritic spine morphology in hippocampus (Lippi et al., 2011). In Aplysia, miR-124 restricts serotonin-induced synaptic plasticity through regulation of CREB (Rajasethupathy et al., 2009). In mouse differentiating and adult primary cortical neurons, overexpression of miR-124 increases neurite outgrowth, whereas functional blockade causes a delay (Yu, Chung, Deo, Thompson, & Turner, 2008). miRs may also affect synaptic plasticity by regulating synaptic molecules. miR-137 has potential binding sites in the 3’UTR of GluR1 mRNA, and miR-124 in that of GluR2 (Konopka et al., 2010).
Regulation of these proteins could impact the synaptic plasticity required for vocal learning.

Conclusions

Recent advances have augmented our understanding of the genetic basis for vocal learning by (a) uncovering new genetic factors through studies of human pathology, (b) discovering new vocal learning–related genes through network analysis of neural tissues pertaining to human speech and birdsong, and (c) developing a better understanding of the physiological effects of known speech-related genes, such as FOXP1, FOXP2, and CNTNAP2 using animal models. FOXP2 was the first gene directly correlated with a language disorder, and through its molecular connections other language-related genes are being discovered, including those in the HGF signaling pathway. As small RNA regulatory factors become better cataloged, we are likely to learn even more about the genetic basis of vocal learning. Since convergent evolution has produced vocal learning in humans, other mammals, and songbirds, we might expect that there are overlapping genes between the clades, but equally we expect some differences. This is likely the case with uPAR, which has no direct avian correlate, but is associated with human speech pathology. Continuing investigation into genes that affect language and vocal learning in other species will provide a better understanding of the mechanisms that govern this complex communicative behavior.


References

Abrahams, B. S., & Geschwind, D. H. (2008). Advances in autism genetics: On the threshold of a new neurobiology. Nature Reviews Genetics, 9(5), 341–355. doi:10.1038/nrg2346

Alarcón, M., Abrahams, B. S., Stone, J. L., Duvall, J. A., Perederiy, J. V., Bomar, J. M., . . . Geschwind, D. H. (2008). Linkage, association, and gene-expression analyses identify CNTNAP2 as an autism-susceptibility gene. American Journal of Human Genetics, 82(1), 150–159. doi:10.1016/j.ajhg.2007.09.005

Ambrose, N. G., Cox, N. J., & Yairi, E. (1997). The genetic basis of persistence and recovery in stuttering. Journal of Speech, Language, and Hearing Research, 40(3), 567–580. doi:10.1044/jslhr.4003.567

Anderson, G. R., Galfin, T., Xu, W., Aoto, J., Malenka, R. C., & Südhof, T. C. (2012). Candidate autism gene screen identifies critical role for cell-adhesion molecule CASPR2 in dendritic arborization and spine development. Proceedings of the National Academy of Sciences. doi:10.1073/pnas.1216398109

Arking, D. E., Cutler, D. J., Brune, C. W., Teslovich, T. M., West, K., Ikeda, M., . . . Chakravarti, A. (2008). A common genetic variant in the neurexin superfamily member CNTNAP2 increases familial risk of autism. American Journal of Human Genetics, 82(1), 160–164. doi:10.1016/j.ajhg.2007.09.015

Arriaga, G., Zhou, E. P., & Jarvis, E. D. (2012). Of mice, birds, and men: The mouse ultrasonic song system has some features similar to humans and song-learning birds. PLoS ONE, 7(10), e46610. doi:10.1371/journal.pone.0046610

Bacon, C., & Rappold, G. A. (2012). The distinct and overlapping phenotypic spectra of FOXP1 and FOXP2 in cognitive disorders. Human Genetics, 131(11), 1687–1698. doi:10.1007/s00439-012-1193-z

Bakkaloglu, B., O’Roak, B. J., Louvi, A., Gupta, A. R., Abelson, J. F., Morgan, T. M., . . . State, M. W. (2008). Molecular cytogenetic analysis and resequencing of contactin associated protein-like 2 in autism spectrum disorders. American Journal of Human Genetics, 82(1), 165–173. doi:10.1016/j.ajhg.2007.09.017

Balmer, T. S., Carels, V. M., Frisch, J. L., & Nick, T. A. (2009). Modulation of perineuronal nets and parvalbumin with developmental song learning. The Journal of Neuroscience, 29(41), 12878–12885. doi:10.1523/JNEUROSCI.2974-09.2009

Benayoun, B. A., Caburet, S., & Veitia, R. A. (2011). Forkhead transcription factors: Key players in health and disease. Trends in Genetics, 27(6), 224–232. doi:10.1016/j.tig.2011.03.003

Benítez-Burraco, A., & Longa, V. M. (2012). Right-handedness, lateralization and language in Neanderthals: A comment on Frayer et al. (2010). Journal of Anthropological Sciences, 90, 187–192; discussion 193–197. doi:10.4436/jass.90002

Berwick, R. C., Friederici, A. D., Chomsky, N., & Bolhuis, J. J. (2013). Evolution, brain, and the nature of language. Trends in Cognitive Sciences, 17(2), 89–98. doi:10.1016/j.tics.2012.12.002

Blasi, F., & Carmeliet, P. (2002). uPAR: A versatile signalling orchestrator. Nature Reviews. Molecular Cell Biology, 3(12), 932–943. doi:10.1038/nrm977

Bottaro, D. P., Rubin, J. S., Faletto, D. L., Chan, A. M., Kmiecik, T. E., Vande Woude, G. F., & Aaronson, S. A. (1991). Identification of the hepatocyte growth factor receptor as the c-met proto-oncogene product. Science (New York, NY), 251(4995), 802–804. doi:10.1126/science.1846706

Bowers, J. M., Perez-Pouchoulen, M., Edwards, N. S., & McCarthy, M. M. (2013). Foxp2 mediates sex differences in ultrasonic vocalization by rat pups and directs order of maternal retrieval. The Journal of Neuroscience, 33(8), 3276–3283. doi:10.1523/JNEUROSCI.0425-12.2013

Bredy, T. W., Lin, Q., Wei, W., Baker-Andresen, D., & Mattick, J. S. (2011). MicroRNA regulation of neural plasticity and memory. Neurobiology of Learning and Memory, 96(1), 89–94. doi:10.1016/j.nlm.2011.04.004

Brenowitz, E. A. (2013). Testosterone and brain-derived neurotrophic factor interactions in the avian song control system. Neuroscience, 239, 115–123. doi:10.1016/j.neuroscience.2012.09.023

Bruce, H. A., & Margolis, R. L. (2002). FOXP2: Novel exons, splice variants, and CAG repeat length stability. Human Genetics, 111(2), 136–144. doi:10.1007/s00439-002-0768-5

Campbell, D. B., D’Oronzio, R., Garbett, K., Ebert, P. J., Mirnics, K., Levitt, P., & Persico, A. M. (2007). Disruption of cerebral cortex MET signaling in autism spectrum disorder. Annals of Neurology, 62(3), 243–250. doi:10.1002/ana.21180

Campbell, D. B., Li, C., Sutcliffe, J. S., Persico, A. M., & Levitt, P. (2008). Genetic evidence implicating multiple genes in the MET receptor tyrosine kinase pathway in autism spectrum disorder. Autism Research: Official Journal of the International Society for Autism Research, 1(3), 159–168. doi:10.1002/aur.27

Campbell, D. B., Sutcliffe, J. S., Ebert, P. J., Militerni, R., Bravaccio, C., Trillo, S., . . . Levitt, P. (2006). A genetic variant that disrupts MET transcription is associated with autism. Proceedings of the National Academy of Sciences of the United States of America, 103(45), 16834–16839. doi:10.1073/pnas.0605296103

Campbell, D. B., Warren, D., Sutcliffe, J. S., Lee, E. B., & Levitt, P. (2010). Association of MET with social and communication phenotypes in individuals with autism spectrum disorder. American Journal of Medical Genetics. Part B, Neuropsychiatric Genetics: The Official Publication of the International Society of Psychiatric Genetics, 153B(2), 438–446. doi:10.1002/ajmg.b.30998

Carlsson, P., & Mahlapuu, M. (2002). Forkhead transcription factors: Key players in development and metabolism. Developmental Biology, 250(1), 1–23. doi:10.1006/dbio.2002.0780

Carr, C. W., Moreno-De-Luca, D., Parker, C., Zimmerman, H. H., Ledbetter, N., Martin, C. L., . . . Abdul-Rahman, O. A. (2010). Chiari I malformation, delayed gross motor skills, severe speech delay, and epileptiform discharges in a child with FOXP1 haploinsufficiency. European Journal of Human Genetics, 18(11), 1216–1220. doi:10.1038/ejhg.2010.96

Centanni, T. M., Booker, A. B., Sloan, A. M., Chen, F., Maher, B. J., Carraway, R. S., . . . Kilgard, M. P. (2013). Knockdown of the dyslexia-associated gene Kiaa0319 impairs temporal responses to speech stimuli in rat primary auditory cortex. Cerebral Cortex (New York, NY: 1991). doi:10.1093/cercor/bht028

Che, A., Girgenti, M. J., & Loturco, J. (2013). The dyslexia-associated gene Dcdc2 is required for spike-timing precision in mouse neocortex. Biological Psychiatry. doi:10.1016/j.biopsych.2013.08.018

Cheng, L.-C., Pastrana, E., Tavazoie, M., & Doetsch, F. (2009). miR-124 regulates adult neurogenesis in the subventricular zone stem cell niche. Nature Neuroscience, 12(4), 399–408. doi:10.1038/nn.2294

Clovis, Y. M., Enard, W., Marinaro, F., Huttner, W. B., & De Pietri Tonelli, D. (2012). Convergent repression of Foxp2 3’UTR by miR-9 and miR-132 in embryonic mouse neocortex: Implications for radial migration of neurons. Development (Cambridge, England), 139(18), 3332–3342. doi:10.1242/dev.078063

Condro, M. C., & White, S. A. (2014). Distribution of language-related Cntnap2 protein in neural circuits critical for vocal learning. The Journal of Comparative Neurology, 522, 169–185. doi:10.1002/cne.23394

Coolen, M., Thieffry, D., Drivenes, Ø., Becker, T. S., & Bally-Cuif, L. (2012). miR-9 controls the timing of neurogenesis through the direct inhibition of antagonistic factors. Developmental Cell, 22(5), 1052–1064. doi:10.1016/j.devcel.2012.03.003

Czamara, D., Bruder, J., Becker, J., Bartling, J., Hoffmann, P., Ludwig, K. U., . . . Schulte-Körne, G. (2011). Association of a rare variant with mismatch negativity in a region between KIAA0319 and DCDC2 in dyslexia. Behavior Genetics, 41(1), 110–119. doi:10.1007/s10519-010-9413-6

Dajas-Bailador, F., Bonev, B., Garcez, P., Stanley, P., Guillemot, F., & Papalopulu, N. (2012). microRNA-9 regulates axon extension and branching by targeting Map1b in mouse cortical neurons. Nature Neuroscience, 15, 697–699. doi:10.1038/nn.3082

Doupe, A. J., & Kuhl, P. K. (1999). Birdsong and human speech: Common themes and mechanisms. Annual Review of Neuroscience, 22, 567–631. doi:10.1146/annurev.neuro.22.1.567

Eagleson, K. L., Bonnin, A., & Levitt, P. (2005). Region- and age-specific deficits in γ-aminobutyric acidergic neuron development in the telencephalon of the uPAR-/- mouse. The Journal of Comparative Neurology, 489(4), 449–466. doi:10.1002/cne.20647

Eagleson, K. L., Campbell, D. B., Thompson, B. L., Bergman, M. Y., & Levitt, P. (2011). The autism risk genes MET and PLAUR differentially impact cortical development. Autism Research: Official Journal of the International Society for Autism Research, 4(1), 68–83. doi:10.1002/aur.172

Enard, W., Gehre, S., Hammerschmidt, K., Holter, S. M., Blass, T., Somel, M., . . . Pääbo, S. (2009). A humanized version of Foxp2 affects cortico-basal ganglia circuits in mice. Cell, 137(5), 961–971. doi:10.1016/j.cell.2009.03.041

Enard, W., Przeworski, M., Fisher, S. E., Lai, C. S. L., Wiebe, V., Kitano, T., . . . Pääbo, S. (2002). Molecular evolution of FOXP2, a gene involved in speech and language. Nature, 418(6900), 869–872. doi:10.1038/nature01025

Faddy, H. M., Smart, C. E., Xu, R., Lee, G. Y., Kenny, P. A., Feng, M., . . . Monteith, G. R. (2008). Localization of plasma membrane and secretory calcium pumps in the mammary gland. Biochemical and Biophysical Research Communications, 369(3), 977–981. doi:10.1016/j.bbrc.2008.03.003

Falivelli, G., De Jaco, A., Favaloro, F. L., Kim, H., Wilson, J., Dubi, N., . . . Comoletti, D. (2012). Inherited genetic variants in autism-related CNTNAP2 show perturbed trafficking and ATF6 activation. Human Molecular Genetics. doi:10.1093/hmg/dds320

Ferland, R. D., Cherry, T., Preware, P., Morrisey, E. E., & Walsh, C. (2003). Characterization of Foxp2 and Foxp1 mRNA and protein in the developing and mature brain. The Journal of Comparative Neurology. doi:10.1002/cne.10654

Feuk, L., Kalervo, A., Lipsanen-Nyman, M., Skaug, J., Nakabayashi, K., Finucane, B., . . . Hannula-Jouppi, K. (2006). Absence of a paternally inherited FOXP2 gene in developmental verbal dyspraxia. American Journal of Human Genetics, 79(5), 965–972. doi:10.1086/508902

Finestack, L. H., Richmond, E. K., & Abbeduto, L. (2009). Language development in individuals with fragile X syndrome. Topics in Language Disorders, 29(2), 133–148. doi:10.1097/TLD.0b013e3181a72016

Fiore, R., Khudayberdiev, S., Saba, R., & Schratt, G. (2011). MicroRNA function in the nervous system. Progress in Molecular Biology and Translational Science, 102, 47–100. doi:10.1016/B978-0-12-415795-8.00004-0

Fisher, S. E., Vargha-Khadem, F., Watkins, K. E., Monaco, A. P., & Pembrey, M. E. (1998). Localisation of a gene implicated in a severe speech and language disorder. Nature Genetics, 18(2), 168–170. doi:10.1038/ng0298-168

Fitch, W. (2000). The evolution of speech: A comparative review. Trends in Cognitive Sciences, 4(7), 258–267. doi:10.1016/S1364-6613(00)01494-7

Fitch, W. T. (2011). The evolution of syntax: An exaptationist perspective. Frontiers in Evolutionary Neuroscience, 3, 9. doi:10.3389/fnevo.2011.00009

Fitch, W. T. (2012). Evolutionary developmental biology and human language evolution: Constraints on adaptation. Evolutionary Biology, 39(4), 613–637. doi:10.1007/s11692-012-9162-y

French, C. A., Jin, X., Campbell, T. G., Gerfen, E., Groszer, M., Fisher, S. E., & Costa, R. M. (2012). An aetiological Foxp2 mutation causes aberrant striatal activity and alters plasticity during skill learning. Molecular Psychiatry, 17(11), 1077–1085. doi:10.1038/mp.2011.105

Fujita, E., Tanabe, Y., Shiota, A., Ueda, M., Suwa, K., Momoi, M. Y., & Momoi, T. (2008). Ultrasonic vocalization impairment of Foxp2 (R552H) knockin mice related to speech-language disorder and abnormality of Purkinje cells. Proceedings of the National Academy of Sciences of the United States of America, 105(8), 3117–3122. doi:10.1073/pnas.0712298105

Gaub, S., Groszer, M., Fisher, S. E., & Ehret, G. (2010). The structure of innate vocalizations in Foxp2-deficient mouse pups. Genes, Brain, and Behavior, 9(4), 390–401. doi:10.1111/j.1601-183X.2010.00570.x

Green, R. E., Krause, J., Briggs, A. W., Maricic, T., Stenzel, U., Kircher, M., . . . Pääbo, S. (2010). A draft sequence of the Neandertal genome. Science (New York, NY), 328(5979), 710–722. doi:10.1126/science.1188021

Griggs, E. M., Young, E. J., Rumbaugh, G., & Miller, C. A. (2013). MicroRNA-182 regulates amygdala-dependent memory formation. The Journal of Neuroscience, 33(4), 1734–1740. doi:10.1523/JNEUROSCI.2873-12.2013

Grimbert, P., Valanciute, A., Audard, V., Pawlak, A., Le gouvelo, S., Lang, P., . . . Sahai, D. (2003). Truncation of C-mip (Tc-mip), a new proximal signaling protein, induces c-maf Th2 transcription factor and cytoskeleton reorganization. The Journal of Experimental Medicine, 198(5), 797–807. doi:10.1084/jem.20030566

Groszer, M., Keays, D. A., Deacon, R. M. J., de Bono, J. P., Prasad-Mulcare, S., Gaub, S., . . . Fisher, S. E. (2008). Impaired synaptic plasticity and motor learning in mice with a point mutation implicated in human speech deficits. Current Biology, 18(5), 354–362. doi:10.1016/j.cub.2008.01.060

Gunaratne, P. H., Lin, Y.-C., Benham, A. L., Drnevich, J., Coarfa, C., Tennakoon, J. B., . . . Clayton, D. F. (2011). Song exposure regulates known and novel microRNAs in the zebra finch auditory forebrain. BMC Genomics, 12(1), 277. doi:10.1186/1471-2164-12-277

Haesler, S., Rochefort, C., Georgi, B., Licznerski, P., Osten, P., & Scharff, C. (2007). Incomplete and inaccurate vocal imitation after knockdown of FoxP2 in songbird basal ganglia nucleus area X. PLoS Biology, 5(12), e321. doi:10.1371/journal.pbio.0050321

Hamdan, F. F., Daoud, H., Rochefort, D., Piton, A., Gauthier, J., Langlois, M., . . . Michaud, J. L. (2010). De novo mutations in FOXP1 in cases with intellectual disability, autism, and language impairment. American Journal of Human Genetics, 87(5), 671–678. doi:10.1016/j.ajhg.2010.09.017

He, L., & Hannon, G. J. (2004). MicroRNAs: Small RNAs with a big role in gene regulation. Nature Reviews Genetics, 5(7), 522–531. doi:10.1038/nrg1379

Hilliard, A. T., Miller, J. E., Fraley, E. R., Horvath, S., & White, S. A. (2012). Molecular microcircuitry underlies functional specification in a basal ganglia circuit dedicated to vocal learning. Neuron, 73(3), 537–552. doi:10.1016/j.neuron.2012.01.005

Hilliard, A. T., Miller, J. E., Horvath, S., & White, S. A. (2012). Distinct neurogenomic states in basal ganglia subregions relate differently to singing behavior in songbirds. PLoS Computational Biology, 8(11), e1002773. doi:10.1371/journal.pcbi.1002773

Hockemeyer, D., Sfeir, A. J., Shay, J. W., Wright, W. E., & de Lange, T. (2005). POT1 protects telomeres from a transient DNA damage response and determines how human chromosomes end. The EMBO Journal, 24(14), 2667–2678. doi:10.1038/sj.emboj.7600733

Horn, D. (2012). Mild to moderate intellectual disability and significant speech and language deficits in patients with FOXP1 deletions and mutations. Molecular Syndromology, 2(3–5), 213–216. doi:10.1159/000330916

Horn, D., Kapeller, J., Rivera-Brugués, N., Moog, U., Lorenz-Depiereux, B., Eck, S., . . . Strom, T. M. (2010). Identification of FOXP1 deletions in three unrelated patients with mental retardation and significant speech and language deficits. Human Mutation, 31(11), E1851–60. doi:10.1002/humu.21362

Immelmann, K. (1969). Song development in the zebra finch and other estrildid finches. In R. A. Hinde (Ed.), Bird Vocalizations (pp. 61–74). New York: Cambridge University Press.

Jarvis, E. D. (2004). Learned birdsong and the neurobiology of human language. Annals of the New York Academy of Sciences, 1016(1), 749–777. doi:10.1196/annals.1298.038

Judson, M. C., Bergman, M. Y., Campbell, D. B., Eagleson, K. L., & Levitt, P. (2009). Dynamic gene and protein expression patterns of the autism-associated met receptor tyrosine kinase in the developing mouse forebrain. Journal of Comparative Neurology, 513(5), 511–531. doi:10.1002/cne.21969

Kaestner, K. H., Knochel, W., & Martinez, D. E. (2000). Unified nomenclature for the winged helix/forkhead transcription factors. Genes & Development, 14(2), 142–146. doi:10.1101/gad.14.2.142

Kang, C., Domingues, B. S., Sainz, E., Domingues, C. E. F., Drayna, D., & Moretti-Ferreira, D. (2011). Evaluation of the association between polymorphisms at the DRD2 locus and stuttering. Journal of Human Genetics, 56(6), 472–473. doi:10.1038/jhg.2011.29

Kang, C., & Drayna, D. (2012). A role for inherited metabolic deficits in persistent developmental stuttering. Molecular Genetics and Metabolism, 107(3), 276–280. doi:10.1016/j.ymgme.2012.07.020

Kang, C., Riazuddin, S., Mundorff, J., Krasnewich, D., Friedman, P., Mullikin, J. C., & Drayna, D. (2010). Mutations in the lysosomal enzyme-targeting pathway and persistent stuttering. The New England Journal of Medicine, 362(8), 677–685. doi:10.1056/NEJMoa0902630

Kiya, T., Itoh, Y., & Kubo, T. (2008). Expression analysis of the FoxP homologue in the brain of the honeybee, Apis mellifera. (1), 53–60. doi:10.1111/j.1365-2583.2008.00775.x

Knornschild, M., Nagy, M., Metz, M., Mayer, F., & von Helversen, O. (2010). Complex vocal imitation during ontogeny in a bat. Biology Letters, 6(2), 156–159. doi:10.1098/rsbl.2009.0685

Kojima, S., Kao, M. H., & Doupe, A. J. (2013). Task-related “cortical” bursting depends critically on basal ganglia input and is linked to vocal plasticity. Proceedings of the National Academy of Sciences, 110(12), 4756–4761. doi:10.1073/pnas.1216308110

Konishi, M., & Akutagawa, E. (1985). Neuronal growth, atrophy and death in a sexually dimorphic song nucleus in the zebra finch brain. Nature, 315(6015), 145–147. doi:10.1038/315145a0

Konopka, G., Wexler, E., Rosen, E., Mukamel, Z., Osborn, G. E., Chen, L., . . . Geschwind, D. H. (2012). Modeling the functional genomics of autism using human neurons. Molecular Psychiatry, 17(2), 202–214. doi:10.1038/mp.2011.60

Konopka, W., Kiryk, A., Novak, M., Herwerth, M., Parkitna, J. R., Wawrzyniak, M., . . . Schütz, G. (2010). MicroRNA loss enhances learning and memory in mice. The Journal of Neuroscience, 30(44), 14835–14842. doi:10.1523/JNEUROSCI.3030-10.2010

Kos, M., van den Brink, D., Snijders, T. M., Rijpkema, M., Franke, B., Fernandez, G., & Hagoort, P. (2012). CNTNAP2 and language processing in healthy individuals as measured with ERPs. PLoS ONE, 7(10), e46995. doi:10.1371/journal.pone.0046995

Kraft, S. J., & Yairi, E. (2012). Genetic bases of stuttering: The state of the art, 2011. Folia phoniatrica et logopaedica: Official Organ of the International Association of Logopedics and Phoniatrics (IALP), 64(1), 34–47. doi:10.1159/000331073

Krause, J., Lalueza-Fox, C., Orlando, L., Enard, W., Green, R. E., Burbano, H. A., . . . Pääbo, S. (2007). The derived FOXP2 variant of modern humans was shared with Neandertals. Current Biology, 17(21), 1908–1912. doi:10.1016/j.cub.2007.10.008

Kuramochi-Miyagawa, S., Watanabe, T., Gotoh, K., Totoki, Y., Toyoda, A., Ikawa, M., . . . Nakano, T. (2008). DNA methylation of retrotransposon genes is regulated by Piwi family members MILI and MIWI2 in murine fetal testes. Genes & Development, 22(7), 908–917. doi:10.1101/gad.1640708

Lai, C. S., Fisher, S. E., Hurst, J. A., Vargha-Khadem, F., & Monaco, A. P. (2001). A forkhead-domain gene is mutated in a severe speech and language disorder. Nature, 413(6855), 519–523. doi:10.1038/35097076

Lan, J., Song, M., Pan, C., Zhuang, G., Wang, Y., Ma, W., . . . Wang, W. (2009). Association between dopaminergic genes (SLC6A3 and DRD2) and stuttering among Han Chinese. Journal of Human Genetics, 54(8), 457–460. doi:10.1038/jhg.2009.60

Lazaro, M. T., Penagarikano, O., Dong, H., Geschwind, D. H., & Golshani, P. (2012, October 15). Excitatory/inhibitory imbalance in the mPFC of the Cntnap2 mouse model of autism. Society for Neuroscience. New Orleans, LA.

Levecque, C., Velayos-Baeza, A., Holloway, Z. G., & Monaco, A. P. (2009). The dyslexia-associated protein KIAA0319 interacts with adaptor protein 2 and follows the classical clathrin-mediated endocytosis pathway. American Journal of Physiology. Cell Physiology, 297(1), C160–168. doi:10.1152/ajpcell.00630.2008

Li, G., Wang, J., Rossiter, S. J., Jones, G., & Zhang, S. (2007). Accelerated FoxP2 evolution in echolocating bats. PLoS ONE, 2(9), e900. doi:10.1371/journal.pone.0000900

Li, S., Weidenfeld, J., & Morrisey, E. E. (2004). Transcriptional and DNA binding activity of the Foxp1/2/4 family is modulated by heterotypic and homotypic protein interactions. Molecular and Cellular Biology, 24(2), 809–822. doi:10.1128/MCB.24.2.809-822.2004

Lim, L. P., Lau, N. C., Garrett-Engele, P., Grimson, A., Schelter, J. M., Castle, J., . . . Johnson, J. M. (2005). Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature, 433(7027), 769–773. doi:10.1038/nature03315

Lippi, G., Steinert, J. R., Marczylo, E. L., D’Oro, S., Fiore, R., Forsythe, I. D., . . . Young, K. W. (2011). Targeting of the Arpc3 actin nucleation factor by miR-29a/b regulates dendritic spine morphology. The Journal of Cell Biology, 194(6), 889–904. doi:10.1083/jcb.201103006

Liu, C., & Zhao, X. (2009). MicroRNAs in adult and embryonic neurogenesis. NeuroMolecular Medicine, 11(3), 141–152. doi:10.1007/s12017-009-8077-y

Lu, M. M., Li, S., Yang, H., & Morrisey, E. E. (2002). Foxp4: A novel member of the Foxp subfamily of winged-helix genes co-expressed with Foxp1 and Foxp2 in pulmonary and gut tissues. Mechanisms of Development, 119 Suppl 1, S197–202. doi:10.1016/S0925-4773(03)00116-3

Luo, G.-Z., Hafner, M., Shi, Z., Brown, M., Feng, G.-H., Tuschl, T., . . . Li, X. (2012). Genome-wide annotation and analysis of zebra finch microRNA repertoire reveal sex-biased expression. BMC Genomics, 13(1), 727. doi:10.1186/1471-2164-13-727

Lyon, G. R., Shaywitz, S. E., & Shaywitz, B. A. (2003). A definition of dyslexia. Annals of Dyslexia, 53(1), 1–14. doi:10.1007/s11881-003-0001-9

MacDermot, K. D., Bonora, E., Sykes, N., Coupe, A.-M., Lai, C. S. L., Vernes, S. C., . . . Fisher, S. E. (2005). Identification of FOXP2 truncation as a novel cause of developmental speech and language deficits. American Journal of Human Genetics, 76(6), 1074–1080. doi:10.1086/430841

Mahrt, E. J., Perkel, D. J., Tong, L., Rubel, E. W., & Portfors, C. V. (2013). Engineered deafness reveals that mouse courtship vocalizations do not require auditory experience. The Journal of Neuroscience, 33(13), 5573–5583. doi:10.1523/JNEUROSCI.5054-12.2013

Makeyev, E. V., Zhang, J., Carrasco, M. A., & Maniatis, T. (2007). The microRNA miR-124 promotes neuronal differentiation by triggering brain-specific alternative pre-mRNA splicing. Molecular Cell, 27(3), 435–448. doi:10.1016/j.molcel.2007.07.015

Mars, W. M., Zarnegar, R., & Michalopoulos, G. K. (1993). Activation of hepatocyte growth factor by the plasminogen activators uPA and tPA. The American Journal of Pathology, 143(3), 949–958.

Marui, T., Koishi, S., Funatogawa, I., Yamamoto, K., Matsumoto, H., Hashimoto, O., . . . Sasaki, T. (2005). No association of FOXP2 and PTPRZ1 on 7q31 with autism from the Japanese population. Neuroscience Research, 53(1), 91–94. doi:10.1016/j.neures.2005.05.003

McRae, P. A., Rocco, M. M., Kelly, G., Brumberg, J. C., & Matthews, R. T. (2007). Sensory deprivation alters aggrecan and perineuronal net expression in the mouse barrel cortex. The Journal of Neuroscience, 27(20), 5405–5413. doi:10.1523/JNEUROSCI.5425-06.2007

Miller, J. E., Spiteri, E., Condro, M. C., Dosumu-Johnson, R. T., Geschwind, D. H., & White, S. A. (2008). Birdsong decreases protein levels of FoxP2, a molecule required for human speech. Journal of Neurophysiology, 100(4), 2015–2025. doi:10.1152/jn.90415.2008

Missiaen, L., Dode, L., Vanoevelen, J., Raeymaekers, L., & Wuytack, F. (2007). Calcium in the Golgi apparatus. Cell Calcium, 41(5), 405–416. doi:10.1016/j.ceca.2006.11.001

Mizutani, A., Matsuzaki, A., Momoi, M. Y., Fujita, E., Tanabe, Y., & Momoi, T. (2007). Intracellular distribution of a speech/language disorder associated FOXP2 mutant. Biochemical and Biophysical Research Communications, 353(4), 869–874. doi:10.1016/j.bbrc.2006.12.130

Mukamel, Z., Konopka, G., Wexler, E., Osborn, G. E., Dong, H., Bergman, M. Y., . . . Geschwind, D. H. (2011). Regulation of MET by FOXP2, genes implicated in higher cognitive dysfunction and autism risk. The Journal of Neuroscience, 31(32), 11437–11442. doi:10.1523/JNEUROSCI.0181-11.2011

Newbury, D. F., Bonora, E., Lamb, J. A., Fisher, S. E., Lai, C. S. L., Baird, G., . . . International Molecular Genetic Study of Autism Consortium. (2002). FOXP2 is not a major susceptibility gene for autism or specific language impairment. American Journal of Human Genetics, 70(5), 1318–1327. doi:10.1086/339931

Newbury, D. F., Paracchini, S., Scerri, T. S., Winchester, L., Addis, L., Richardson, A. J., . . . Monaco, A. P. (2011). Investigation of dyslexia and SLI risk variants in reading- and language-impaired subjects. Behavior Genetics, 41(1), 90–104. doi:10.1007/s10519-010-9424-3

Newbury, D. F., Winchester, L., Addis, L., Paracchini, S., Buckingham, L.-L., Clark, A., . . . Monaco, A. P. (2009). CMIP and ATP2C2 modulate phonological short-term memory in language impairment. American Journal of Human Genetics, 85(2), 264–272. doi:10.1016/j.ajhg.2009.07.004

Nottebohm, F., & Arnold, A. P. (1976). Sexual dimorphism in vocal control areas of the songbird brain. Science, 194(4261), 211–213. doi:10.1126/science.959852

Olde Loohuis, N. F. M., Kos, A., Martens, G. J. M., Bokhoven, H., Nadif Kasri, N., & Aschrafi, A. (2011). MicroRNA networks direct neuronal development and plasticity. Cellular and Molecular Life Sciences, 69(1), 89–102. doi:10.1007/s00018-011-0788-1

O’Roak, B. J., Deriziotis, P., Lee, C., Vives, L., Schwartz, J. J., Girirajan, S., . . . Eichler, E. E. (2011). Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nature Genetics, 43(6), 585–589. doi:10.1038/ng.835

Otaegi, G., Pollock, A., Hong, J., & Sun, T. (2011). MicroRNA miR-9 modifies motor neuron columns by a tuning regulation of FoxP1 levels in developing spinal cords. The Journal of Neuroscience, 31(3), 809–818. doi:10.1523/JNEUROSCI.4330-10.2011

Owen, K. A., Qiu, D., Alves, J., Schumacher, A. M., Kilpatrick, L. M., Li, J., . . . Ellis, V. (2010). Pericellular activation of hepatocyte growth factor by the transmembrane serine proteases matriptase and hepsin, but not by the membrane-associated protease uPA. The Biochemical Journal, 426(2), 219–228. doi:10.1042/BJ20091448

Palka, C., Alfonsi, M., Mohn, A., Cerbo, R., Guanciali Franchi, P., Fantasia, D., . . . Palka, G. (2012). Mosaic 7q31 deletion involving FOXP2 gene associated with language impairment. Pediatrics, 129(1), e183–188. doi:10.1542/peds.2010-2094

Palumbo, O., D’Agruma, L., Minenna, A. F., Palumbo, P., Stallone, R., Palladino, T., . . . Carella, M. (2013). 3p14.1 de novo microdeletion involving the FOXP1 gene in an adult patient with autism, severe speech delay and deficit of motor coordination. Gene, 516(1), 107–113. doi:10.1016/j.gene.2012.12.073

Panaitof, S. C. (2012). A songbird animal model for dissecting the genetic bases of autism spectrum disorder. Disease Markers, 33(5), 241–249. doi:10.3233/DMA-2012-0918

Panaitof, S. C., Abrahams, B. S., Dong, H., Geschwind, D. H., & White, S. A. (2010). Language-related Cntnap2 gene is differentially expressed in sexually dimorphic song nuclei essential for vocal learning in songbirds. The Journal of Comparative Neurology, 518(11), 1995–2018. doi:10.1002/cne.22318

Pariani, M. J., Spencer, A., Graham, J. M., & Rimoin, D. L. (2009). A 785kb deletion of 3p14.1p13, including the FOXP1 gene, associated with speech delay, contractures, hypertonia and blepharophimosis. European Journal of Medical Genetics, 52(2–3), 123–127. doi:10.1016/j.ejmg.2009.03.012

Pasquinelli, A. E. (2012). MicroRNAs and their targets: recognition, regulation and an emerging reciprocal relationship. Nature Reviews Genetics, 13(4), 271–282. doi:10.1038/nrg3162

Penagarikano, O., Abrahams, B. S., Herman, E. I., Winden, K. D., Gdalyahu, A., Dong, H., . . . Geschwind, D. H. (2011). Absence of CNTNAP2 leads to epilepsy, neuronal migration abnormalities, and core autism-related deficits. Cell, 147(1), 235–246. doi:10.1016/j.cell.2011.08.040

Peter, B., Raskind, W. H., Matsushita, M., Lisowski, M., Vu, T., Berninger, V. W., . . . Brkanac, Z. (2011). Replication of CNTNAP2 association with nonword repetition and support for FOXP2 association with timed reading and motor activities in a dyslexia family sample. Journal of Neurodevelopmental Disorders, 3(1), 39–49. doi:10.1007/s11689-010-9065-0

Petrin, A. L., Giacheti, C. M., Maximino, L. P., Abramides, D. V. M., Zanchetta, S., Rossi, N. F., . . . Murray, J. C. (2010). Identification of a microdeletion at the 7q33-q35 disrupting the CNTNAP2 gene in a Brazilian stuttering case. American Journal of Medical Genetics Part A, 152A(12), 3164–3172. doi:10.1002/ajmg.a.33749

Poliak, S., Salomon, D., Elhanany, H., Sabanay, H., Kiernan, B., Pevny, L., . . . Peles, E. (2003). Juxtaparanodal clustering of Shaker-like K+ channels in myelinated axons depends on Caspr2 and TAG-1. The Journal of Cell Biology, 162(6), 1149–1160. doi:10.1083/jcb.200305018

Powell, E. M., Campbell, D. B., Stanwood, G. D., Davis, C., Noebels, J. L., & Levitt, P. (2003). Genetic disruption of cortical interneuron development causes region- and GABA cell type-specific deficits, epilepsy, and behavioral dysfunction. The Journal of Neuroscience, 23(2), 622–631.

Raca, G., Baas, B. S., Kirmani, S., Laffin, J. J., Jackson, C. A., Strand, E. A., . . . Shriberg, L. D. (2013). Childhood Apraxia of Speech (CAS) in two patients with 16p11.2 microdeletion syndrome. European Journal of Human Genetics, 21(4), 455–459. doi:10.1038/ejhg.2012.165

Rajasethupathy, P., Antonov, I., Sheridan, R., Frey, S., Sander, C., Tuschl, T., & Kandel, E. R. (2012). A role for neuronal piRNAs in the epigenetic control of memory-related synaptic plasticity. Cell, 149(3), 693–707. doi:10.1016/j.cell.2012.02.057

Rajasethupathy, P., Fiumara, F., Sheridan, R., Betel, D., Puthanveettil, S. V., Russo, J. J., . . . Kandel, E. (2009). Characterization of small RNAs in Aplysia reveals a role for miR-124 in constraining synaptic plasticity through CREB. Neuron, 63(6), 803–817. doi:10.1016/j.neuron.2009.05.029

Raza, M. H., Amjad, R., Riazuddin, S., & Drayna, D. (2012). Studies in a consanguineous family reveal a novel locus for stuttering on chromosome 16q. Human Genetics, 131(2), 311–313. doi:10.1007/s00439-011-1134-2

Reich, D., Green, R. E., Kircher, M., Krause, J., Patterson, N., Durand, E. Y., . . . Hublin, J.-J. (2010). Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature, 468(7327), 1053–1060. doi:10.1038/nature09710

Reimers-Kipping, S., Hevers, W., Pääbo, S., & Enard, W. (2011). Humanized Foxp2 specifically affects corticobasal ganglia circuits. Neuroscience, 175, 75–84. doi:10.1016/j.neuroscience.2010.11.042

Reiner, A., Perkel, D. J., Bruce, L. L., Butler, A. B., Csillag, A., Kuenzel, W., . . . Jarvis, E. D. (2004). Revised nomenclature for avian telencephalon and some related brainstem nuclei. The Journal of Comparative Neurology, 473(3), 377–414. doi:10.1002/cne.20118

Riaz, N., Steinberg, S., Ahmad, J., Pluzhnikov, A., Riazuddin, S., Cox, N. J., & Drayna, D. (2005). Genomewide significant linkage to stuttering on chromosome 12. American Journal of Human Genetics, 76(4), 647–651. doi:10.1086/429226

Rice, G. M., Raca, G., Jakielski, K. J., Laffin, J. J., Iyama-Kurtycz, C. M., Hartley, S. L., . . . Shriberg, L. D. (2012). Phenotype of FOXP2 haploinsufficiency in a mother and son. American Journal of Medical Genetics Part A, 158A(1), 174–181. doi:10.1002/ajmg.a.34354

Rice, M. L., Smith, S. D., & Gayán, J. (2009). Convergent genetic linkage and associations to language, speech and reading measures in families of probands with Specific Language Impairment. Journal of Neurodevelopmental Disorders, 1(4), 264–282. doi:10.1007/s11689-009-9031-x

Rochefort, C., He, X., Scotto-Lomassese, S., & Scharff, C. (2007). Recruitment of FoxP2-expressing neurons to area X varies during song development. Developmental Neurobiology, 67(6), 809–817. doi:10.1002/dneu.20393

Roll, P., Rudolf, G., Pereira, S., Royer, B., Scheffer, I. E., Massacrier, A., . . . Szepetowski, P. (2006). SRPX2 mutations in disorders of language cortex and cognition. Human Molecular Genetics, 15(7), 1195–1207. doi:10.1093/hmg/ddl035

Roll, P., Vernes, S. C., Bruneau, N., Cillario, J., Ponsole- Lenfant, M., Massacrier, A., . . . Szepetowski, P. (2010). Molecular networks implicated in speech-related disorders: FOXP2 regulates the SRPX2/uPAR complex. Human Molecular Genetics, 19(24), 4848–4860. doi:10.1093/hmg/ddq415

Rousso, D. L., Pearson, C. A., Gaber, Z. B., Miquelajauregui, A., Li, S., Portera-Cailliau, C., . . . Novitch, B. G. (2012). Foxp-mediated suppression of N-cadherin regulates neuroepithelial character and progenitor maintenance in the CNS. Neuron, 74(2), 314–330. doi:10.1016/j.neuron.2012.02.024

Royer, B., Soares, D. C., Barlow, P. N., Bontrop, R. E., Roll, P., Robaglia-Schlupp, A., . . . Szepetowski, P. (2007). Molecular evolution of the human SRPX2 gene that causes brain disorders of the Rolandic and Sylvian speech areas. BMC Genetics, 8, 72. doi:10.1186/1471-2156-8-72

Royer-Zemmour, B., Ponsole-Lenfant, M., Gara, H., Roll, P., Lévêque, C., Massacrier, A., . . . Szepetowski, P. (2008). Epileptic and developmental disorders of the speech cortex: Ligand/receptor interaction of wild-type and mutant SRPX2 with the plasminogen activator receptor uPAR. Human Molecular Genetics, 17(23), 3617–3630. doi:10.1093/hmg/ddn256

Santucci, A. (1995). Effects of scopolamine on spatial working memory in rats pretreated with the serotonergic depletor p-Chloroamphetamine. Neurobiology of Learning and Memory, 63(3), 286–290. doi:10.1006/nlme.1995.1033

Sanuki, R., Onishi, A., Koike, C., Muramatsu, R., Watanabe, S., Muranishi, Y., . . . Furukawa, T. (2011). miR-124a is required for hippocampal axogenesis and retinal cone survival through Lhx2 suppression. Nature Neuroscience, 14(9), 1125–1134. doi:10.1038/nn.2897

Saunders, L. R., Sharma, A. D., Tawney, J., Nakagawa, M., Okita, K., Yamanaka, S., . . . Verdin, E. (2010). miRNAs regulate SIRT1 expression during mouse embryonic stem cell differentiation and in adult mouse tissues. Aging, 2(7), 415–431.

Scerri, T. S., Morris, A. P., Buckingham, L.-L., Newbury, D. F., Miller, L. L., Monaco, A. P., . . . Paracchini, S. (2011). DCDC2, KIAA0319 and CMIP are associated with reading-related traits. Biological Psychiatry, 70(3), 237–245. doi:10.1016/j.biopsych.2011.02.005

Scharff, C., & Petri, J. (2011). Evo-devo, deep homology and FoxP2: Implications for the evolution of speech and language. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 366(1574), 2124–2140. doi:10.1098/rstb.2011.0001

Schulz, S. B., Haesler, S., Scharff, C., & Rochefort, C. (2010). Knockdown of FoxP2 alters spine density in Area X of the zebra finch. Genes, Brain, and Behavior, 9(7), 732–740. doi:10.1111/j.1601-183X.2010.00607.x

Scott-Van Zeeland, A. A., Abrahams, B. S., Alvarez-Retuerto, A. I., Sonnenblick, L. I., Rudie, J. D., Ghahremani, D., . . . Bookheimer, S. Y. (2010). Altered functional connectivity in frontal lobe circuits is associated with variation in the autism risk gene CNTNAP2. Science Translational Medicine, 2(56), 56ra80. doi:10.1126/scitranslmed.3001344

Scott-Van Zeeland, A. A., Dapretto, M., Ghahremani, D. G., Poldrack, R. A., & Bookheimer, S. Y. (2010). Reward processing in autism. Autism Research: Official Journal of the International Society for Autism Research, 3(2), 53–67. doi:10.1002/aur.122

Shi, Z., Luo, G., Fu, L., Fang, Z., Wang, X., & Li, X. (2013). miR-9 and miR-140-5p target FoxP2 and are regulated as a function of the social context of singing behavior in zebra finches. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 33(42), 16510–16521. doi:10.1523/JNEUROSCI.0838-13.2013

Shibata, M., Nakao, H., Kiyonari, H., Abe, T., & Aizawa, S. (2011). MicroRNA-9 regulates neurogenesis in mouse telencephalon by targeting multiple transcription factors. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 31(9), 3407–3422. doi:10.1523/JNEUROSCI.5085-10.2011

Shriberg, L. D., Ballard, K. J., Tomblin, J. B., Duffy, J. R., Odell, K. H., & Williams, C. A. (2006). Speech, prosody, and voice characteristics of a mother and daughter with a 7;13 translocation affecting FOXP2. Journal of Speech, Language, and Hearing Research, 49(3), 500–525. doi:10.1044/1092-4388(2006/038)

Shu, W., Cho, J. Y., Jiang, Y., Zhang, M., Weisz, D., Elder, G. A., . . . Buxbaum, J. D. (2005). Altered ultrasonic vocalization in mice with a disruption in the Foxp2 gene. Proceedings of the National Academy of Sciences of the United States of America, 102(27), 9643–9648. doi:10.1073/pnas.0503739102

Shu, W., Yang, H., Zhang, L., Lu, M. M., & Morrisey, E. E. (2001). Characterization of a new subfamily of winged-helix/forkhead (Fox) genes that are expressed in the lung and act as transcriptional repressors. The Journal of Biological Chemistry, 276(29), 27488–27497. doi:10.1074/jbc.M100636200

Sia, G. M., Clem, R. L., & Huganir, R. L. (2013). The human language-associated gene SRPX2 regulates synapse formation and vocalization in mice. Science (New York, NY), 342(6161), 987–991. doi:10.1126/science.1245079

Smrt, R. D., Szulwach, K. E., Pfeiffer, R. L., Li, X., Guo, W., Pathania, M., . . . Zhao, X. (2010). MicroRNA miR-137 regulates neuronal maturation by targeting ubiquitin ligase mind bomb-1. Stem Cells (Dayton, Ohio), 28(6), 1060–1070. doi:10.1002/stem.431

Stoeger, A. S., Mietchen, D., Oh, S., de Silva, S., Herbst, C. T., Kwon, S., & Fitch, W. T. (2012). An Asian elephant imitates human speech. Current Biology, 22(22), 2144–2148. doi:10.1016/j.cub.2012.09.022

Strauss, K. A., Puffenberger, E. G., Huentelman, M. J., Gottlieb, S., Dobrin, S. E., Parod, J. M., . . . Morton, D. H. (2006). Recessive symptomatic focal epilepsy and mutant contactin-associated protein-like 2. The New England Journal of Medicine, 354(13), 1370–1377. doi:10.1056/NEJMoa052773

Sun, A. X., Crabtree, G. R., & Yoo, A. S. (2013). MicroRNAs: Regulators of neuronal fate. Current Opinion in Cell Biology, 25(2), 215–221. doi:10.1016/j.ceb.2012.12.007

Talkowski, M. E., Rosenfeld, J. A., Blumenthal, I., Pillalamarri, V., Chiang, C., Heilbut, A., . . . Gusella, J. F. (2012). Sequencing chromosomal abnormalities reveals neurodevelopmental loci that confer risk across diagnostic boundaries. Cell, 149(3), 525–537. doi:10.1016/j.cell.2012.03.028

Tan, S.-L., Ohtsuka, T., González, A., & Kageyama, R. (2012). MicroRNA9 regulates neural stem cell differentiation by controlling Hes1 expression dynamics in the developing brain. Genes to Cells: Devoted to Molecular & Cellular Mechanisms, 17(12), 952–961. doi:10.1111/gtc.12009

Tanabe, Y., Fujita, E., & Momoi, T. (2011). FOXP2 promotes the nuclear translocation of POT1, but FOXP2(R553H), mutation related to speech-language disorder, partially prevents it. Biochemical and Biophysical Research Communications, 410(3), 593–596. doi:10.1016/j.bbrc.2011.06.032

Tanaka, K., Arao, T., Tamura, D., Aomatsu, K., Furuta, K., Matsumoto, K., . . . Nishio, K. (2012). SRPX2 is a novel chondroitin sulfate proteoglycan that is overexpressed in gastrointestinal cancer. PLoS ONE, 7(1), e27922. doi:10.1371/journal.pone.0027922

Teramitsu, I., Kudo, L. C., London, S. E., Geschwind, D. H., & White, S. A. (2004). Parallel FoxP1 and FoxP2 expression in songbird and human brain predicts functional interaction. The Journal of Neuroscience, 24(13), 3152–3163. doi:10.1523/JNEUROSCI.5589-03.2004

Teramitsu, I., Poopatanapong, A., Torrisi, S., & White, S. A. (2010). Striatal FoxP2 is actively regulated during songbird sensorimotor learning. PLoS ONE, 5(1), e8548. doi:10.1371/journal.pone.0008548

Teramitsu, I., & White, S. A. (2006). FoxP2 regulation during undirected singing in adult songbirds. The Journal of Neuroscience, 26(28), 7390–7394. doi:10.1523/JNEUROSCI.1662-06.2006

Thompson, C. K., Schwabe, F., Schoof, A., Mendoza, E., Gampe, J., Rochefort, C., & Scharff, C. (2013). Young and intense: FoxP2 immunoreactivity in Area X varies with age, song stereotypy, and singing in male zebra finches. Frontiers in Neural Circuits, 7, 24. doi:10.3389/fncir.2013.00024

Toma, C., Hervás, A., Torrico, B., Balmaña, N., Salgado, M., Maristany, M., . . . Cormand, B. (2013). Analysis of two language-related genes in autism: A casecontrol association study of FOXP2 and CNTNAP2. Psychiatric Genetics, 23(2), 82–85. doi:10.1097/YPG.0b013e32835d6fc6

Tsui, D., Vessey, J. P., Tomita, H., Kaplan, D. R., & Miller, F. D. (2013). FoxP2 regulates neurogenesis during embryonic cortical development. The Journal of Neuroscience, 33(1), 244–258. doi:10.1523/JNEUROSCI.1665-12.2013

Vargha-Khadem, F., Watkins, K., Alcock, K., Fletcher, P., & Passingham, R. (1995). Praxic and nonverbal cognitive deficits in a large family with a genetically transmitted speech and language disorder. Proceedings of the National Academy of Sciences of the United States of America, 92(3), 930–933. doi:10.1073/pnas.92.3.930

Vernes, S. C., MacDermot, K. D., Monaco, A. P., & Fisher, S. E. (2009). Assessing the impact of FOXP1 mutations on developmental verbal dyspraxia. European Journal of Human Genetics, 17(10), 1354–1358. doi:10.1038/ejhg.2009.43

Vernes, S. C., Newbury, D. F., Abrahams, B. S., Winchester, L., Nicod, J., Groszer, M., . . . Phil, D. (2008). A functional genetic link between distinct developmental language disorders. The New England Journal of Medicine, 359(22), 2337–2345. doi:10.1056/NEJMoa0802828

Vernes, S. C., Nicod, J., Elahi, F. M., Coventry, J. A., Kenny, N., Coupe, A.-M., . . . Fisher, S. E. (2006). Functional genetic analysis of mutations implicated in a human speech and language disorder. Human Molecular Genetics, 15(21), 3154–3167. doi:10.1093/hmg/ddl392

Vernes, S. C., Oliver, P. L., Spiteri, E., Lockstone, H. E., Puliyadi, R., Taylor, J. M., . . . Fisher, S. E. (2011). Foxp2 regulates gene networks implicated in neurite outgrowth in the developing brain. PLoS Genetics, 7(7), e1002145. doi:10.1371/journal.pgen.1002145

Visvanathan, J., Lee, S., Lee, B., Lee, J. W., & Lee, S. K. (2007). The microRNA miR-124 antagonizes the antineural REST/SCP1 pathway during embryonic CNS development. Genes & Development, 21(7), 744–749. doi:10.1101/gad.1519107

Wang, B., Lin, D., Li, C., & Tucker, P. (2003). Multiple domains define the expression and regulatory properties of Foxp1 forkhead transcriptional repressors. The Journal of Biological Chemistry, 278(27), 24259–24268. doi:10.1074/jbc.M207174200
Wang, J. T., & Barres, B. A. (2012). Axon degeneration:
Where the Wlds things are. Current Biology, 22(7), R221–3. doi:10.1016/j.cub.2012.02.056

Whalley, H. C., O’Connell, G., Sussmann, J. E., Peel, A., Stanfield, A. C., Hayiou-Thomas, M. E., . . . Hall, J. (2011). Genetic variation in CNTNAP2 alters brain function during linguistic processing in healthy individuals. American Journal of Medical Genetics. Part B, Neuropsychiatric Genetics: The Official Publication of the International Society of Psychiatric Genetics, 156(8), 941–948. doi:10.1002/ajmg.b.31241

Winograd, C., Clayton, D., & Ceman, S. (2008). Expression of fragile X mental retardation protein within the vocal control system of developing and adult male zebra finches. Neuroscience, 157(1), 132–142. doi:10.1016/j.neuroscience.2008.09.005

Yu, J.-Y., Chung, K.-H., Deo, M., Thompson, R. C., & Turner, D. L. (2008). MicroRNA miR-124 regulates neurite outgrowth during neuronal differentiation. Experimental Cell Research, 314(14), 2618–2633. doi:10.1016/j.yexcr.2008.06.002

Zeesman, S., Nowaczyk, M. J. M., Teshima, I., Roberts, W., Cardy, J. O., Brian, J., . . . Scherer, S. W. (2006). Speech and language impairment and oromotor dyspraxia due to deletion of 7q31 that involves FOXP2. American Journal of Medical Genetics Part A, 140(5), 509–514. doi:10.1002/ajmg.a.31110

Zhao, C., Sun, G., Li, S., & Shi, Y. (2009). A feedback regulatory loop involving microRNA-9 and nuclear receptor TLX in neural stem cell fate determination. Nature Structural & Molecular Biology, 16(4), 365–371. doi:10.1038/nsmb.1576

Volume 9: pp. 17-74

Imitating Sounds: A Cognitive Approach to Understanding Vocal Imitation

Eduardo Mercado III
University at Buffalo, The State University of New York

James T. Mantell
St. Mary’s College of Maryland

Peter Q. Pfordresher
University at Buffalo, The State University of New York


Abstract

Vocal imitation is often described as a specialized form of learning that facilitates social communication and that involves less cognitively sophisticated mechanisms than more “perceptually opaque” types of imitation. Here, we present an alternative perspective. Considering current evidence from adult mammals, we note that vocal imitation often does not lead to learning and can involve a wide range of cognitive processes. We further suggest that sound imitation capacities may have evolved in certain mammals, such as cetaceans and humans, to enhance both the perception of ongoing actions and the prediction of future events, rather than to facilitate mate attraction or the formation of social bonds. The ability of adults to voluntarily imitate sounds is better described as a cognitive skill than as a communicative learning mechanism. Sound imitation abilities are gradually acquired through practice and require the coordination of multiple perceptual-motor and cognitive mechanisms for representing and generating sounds. Understanding these mechanisms is critical to explaining why relatively few mammals are capable of flexibly imitating sounds, and why individuals vary in their ability to imitate sounds.

Keywords: mimicry; copying; social learning; singing; emulation; imitatible; convergence; imitativeness

Author Note: Preparation of this paper was made possible by NSF grant #SMA-1041755 to the Temporal Dynamics of Learning Center, an NSF Science of Learning Center and NSF grant #BCS-1256864. We thank Sean Green, Emma Greenspon, and Benjamin Chin for comments on an earlier version of this paper. Correspondence concerning this article should be addressed to Eduardo Mercado III, Department of Psychology, University at Buffalo, SUNY, Buffalo, NY, 14260. Email: emiii@buffalo.edu.


In his seminal text, Habit and Instinct, Lloyd Morgan (1896, p. 166) describes two general kinds of imitation: instinctive imitation and intelligent or voluntary imitation. The examples he provides of intelligent imitation mostly involve reproducing sounds: a child copies words used by his companions, a mockingbird imitates the songs of 32 other bird species, a jay imitates the neighing of a horse, and so on. In fact, most of the examples of “imitation proper” that Morgan provides consist of birds reproducing the sounds of other species. Similarly, Romanes (1884) focuses almost exclusively on reports of birds imitating songs, music, and speech in his discussion of imitation. These classic portrayals of vocal reproductions as providing the best and clearest examples of imitation stand in stark contrast to current psychological discussions of imitation, which often classify examples such as those given by Romanes and Morgan as non-imitative performances that merely resemble actual imitation (Byrne, 2002; Heyes, 1996). When did phenomena that were once considered archetypal examples of voluntary imitation transform into a footnote of modern cognitive theories? Was some discovery made that fundamentally changed our scientific understanding of the processes underlying vocal imitation? Have psychologists or biologists succeeded in explaining what vocal imitation is and how it works to the point where little can be gained from further study? Or, have theoretical assumptions led scientists to underestimate the cognitive mechanisms required for an individual to be able to flexibly imitate sounds?

In the present article, we attempt to identify what exactly vocal imitation entails, and to assess whether current explanatory frameworks adequately account for this ability, including its apparent rarity among mammals. Past theoretical and empirical considerations of vocal imitation have often focused on the ability of birds to learn songs or reproduce speech (Kelley & Healy, 2010; Margoliash, 2002; Nottebohm & Liu, 2010; Pepperberg, 2010; Tchernichovski, Mitra, Lints, & Nottebohm, 2001), especially during development, or on the sophisticated ways in which birds interactively copy songs (Akcay, Tom, Campbell, & Beecher, 2013; Molles & Vehrencamp, 1999; J. J. Price & Yuan, 2011; Searcy, DuBois, Rivera-Caceres, & Nowicki, 2013). In contrast, vocal imitation by mammals other than humans has received little attention. When mammalian vocal imitation has been discussed, it typically has been described as a vocal learning mechanism, because of its presumed involvement in vocal repertoire development (Janik & Slater, 1997, 2000; Tyack, 2008). Although imitation clearly has an important role in learning, imitation by definition involves performance (via reproduction) and thus can exhibit varying degrees of success. Moreover, the effectiveness of imitation is itself an index of learning (e.g., we say a tennis player has reached expertise in serving when he or she can demonstrate the coordination exhibited by a professional). Thus, we argue here that vocal imitation by adult mammals is better viewed as performance of a learned skill, and that a closer examination of those species and individuals that have acquired this skill to a high degree can clarify the mechanisms that underlie vocal imitation abilities. Currently, the only mammals that have clearly demonstrated the ability to voluntarily imitate sounds are primates (particularly humans) and cetaceans (whales and dolphins). The main goals of this article are to reassess the available evidence on vocal imitation in these two groups and to provide new perspectives on how to better integrate future investigations of vocal imitation phenomena.

The paper is divided into six sections. In the first two sections, we consider alternate conceptualizations of vocal imitation (both historical and modern) that have different theoretical implications for the origin and role of vocal imitation. These alternate frameworks function as hypotheses against which we compare the literature summarized in subsequent sections. In section three, we discuss possible constraints on vocal imitation with respect to sounds that are imitatible by human and non-human primates, and also consider the degree to which the vocal motor system, as opposed to other motor systems, is attuned to the imitation of sound. Section four evaluates past reports that cetaceans, a group of mammals famous for their vocal flexibility, are capable of imitating sounds. Consideration of evidence from both primates and cetaceans leads to the proposal that sound imitation may serve a critical role for spatial perception and the coordination of actions (section five), in contrast to other accounts, which focus on its role in the development of social communication. Finally, in the sixth section we discuss possible mechanisms for vocal imitation, highlighting an existing computational model of speech learning and imitation that may provide an integrative theoretical framework for conceptualizing the representational mechanisms underlying the sound imitation abilities of mammals.

I. What Is Vocal Imitation?

Over the past century, researchers have used varying terminology to describe animals’ reproduction of sounds. In some cases, the same term has been used to describe different classes of phenomena. In others, different terms have been applied to the same phenomenon. For instance, the terms vocal mimicry and vocal copying have often been used as either synonyms for vocal imitation, or as a way to distinguish particular kinds of imitative or non-imitative vocal processes (Baylis, 1982; Morgan, 1896; Witchell, 1896). Marler (1976a) distinguished cases in which vocal production is modified as a result of auditory experience (vocal learning) from cases in which an individual produces sounds of a novel morphology by imitating previously experienced sounds (vocal imitation). Similarly, in their description of vocal developmental processes in bottlenose dolphins (Tursiops truncatus), McCowan and Reiss (1995) distinguished vocal learning, which they suggest occurs mainly during development, from vocal mimicry, which they describe as an imitative process that contributes to vocal learning (see Wickler, 2013, for a discussion of how the term mimicry might best be applied to sound production). To avoid potential confusion, we provide a glossary detailing our use of terminology (Table 1).

There is general consensus that vocal imitation must involve some attempt (intentional or incidental) to match an auditory event with the vocal motor system. The nature of this ability, however, has been a point of debate. Early on, Thorndike (1911) rejected the proposals of Morgan (1896) and Romanes (1884) that vocal reproductions by birds were examples of imitation. Thorndike seemed to believe that vocal imitation required less sophisticated mental capacities than other kinds of imitation. He claimed that the ability to copy sounds was a specialized capacity possessed by a few select bird species, and that, “we cannot . . . connect these phenomena with anything found in mammals or use them to advantage in a discussion of animal imitation” (1911, p. 77). Many researchers have subsequently endorsed Thorndike’s characterization of vocal imitation, either explicitly or implicitly (Byrne & Russon, 1998; Galef, 1988; Heyes, 1994; Shettleworth, 1998). For example, Tyack and Clark (2000) described the vocal imitation abilities of cetaceans as “the most unusual specialization in cetaceans.” In contrast, several experimental psychologists have argued that vocal imitation abilities are not specialized at all, but simply reflect basic mechanisms of conditioning (reviewed by Baer & Deguchi, 1985; Kymissis & Poulson, 1990). Some researchers describe vocal imitation as a specialized social learning mechanism that enables individuals to rapidly acquire new communicative signals (Bolhuis, Okanoya, & Scharff, 2010; Janik & Slater, 2000; Kelley & Healy, 2011; Sewall, 2012; Tyack, 2008), whereas others classify all instances of vocal reproduction as non-imitative phenomena (Byrne, 2002; Galef, 1988; Heyes, 1996; Zentall, 2006). In the following, we critically consider each of these approaches to explaining what vocal imitation is, noting their strengths and limitations.

Is Vocal Imitation an Outcome of Instrumental Conditioning?

Instrumental (operant) conditioning is a learning process in which the consequences of an action determine its future likelihood of occurring (Domjan, 2000; Immelmann & Beer, 1989). Miller and Dollard (1941) suggested that apparently copied actions (including vocal acts) might in some cases only match by coincidence, having been reinforced independently of any similarities in performance. In such situations, some apparent cases of vocal “imitation” can be viewed as an instance of instrumental conditioning, referred to as matched-dependent behavior. For example, one could train a dog to produce whining sounds whenever it hears the cries of a baby. The dog’s whines might be acoustically similar in certain respects to the baby’s cries, but these similarities are coincidental; the dog might just as easily have been trained to bark or to open a door whenever it heard the cries. Though the trained behavior may match the discriminative stimulus, it is not the degree of match per se that leads to reinforcement. Miller and Dollard distinguished matched-dependent behavior from copying, in which the presence of reinforcement is contingent on the successfulness of matching. Learning to sing a melody by matching the sounds produced by an instructor would be an example of this kind of vocal copying. The teacher uses feedback to reinforce correct matches and to punish mismatches. The main difference between matched-dependent behavior and copying in Miller and Dollard’s framework is that a copier directly compares his acts (or their outcomes) with those of a target to evaluate their similarity, such that the level of detected similarity becomes a cue controlling behavior. A commonality across matched-dependent behavior and copying is that changes in vocal behavior are described as reflecting reinforcement histories alone, and thus do not require any specialized learning mechanisms.
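The contrast between the two reinforcement contingencies can be made concrete with a minimal sketch. The code below is only an illustration under stated assumptions, not a model proposed by Miller and Dollard: the function names, the representation of a sound as a list of pitches in Hz, and the tolerance value are all hypothetical choices made for this example.

```python
from typing import Sequence


def matched_dependent_reward(cue_heard: bool, response: str, trained_response: str) -> float:
    """Reward depends only on giving the trained response to the cue.

    How closely the response resembles the cue plays no role (assumption:
    responses are represented by simple labels such as "whine" or "bark").
    """
    return 1.0 if cue_heard and response == trained_response else 0.0


def copying_reward(model_hz: Sequence[float], produced_hz: Sequence[float],
                   tolerance_hz: float = 10.0) -> float:
    """Reward is contingent on the match itself.

    Returns the fraction of produced notes that fall within tolerance_hz
    of the corresponding model note (a hypothetical similarity measure).
    """
    if not model_hz or len(model_hz) != len(produced_hz):
        raise ValueError("model and produced sequences must be non-empty and equal in length")
    hits = sum(abs(m - p) <= tolerance_hz for m, p in zip(model_hz, produced_hz))
    return hits / len(model_hz)
```

In this sketch a dog trained to whine and a dog trained to bark earn the same reward when the baby cries, because the first rule never inspects similarity. Under the second rule, a singer's reward rises and falls with the number of notes that match the instructor's melody.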

Miller and Dollard’s explanation for acts of vocal imitation (construed as copying or matched-dependent behavior) continues to be endorsed by some psychologists (Heyes, 1994, 1996). For instance, Heyes (1994, p. 224) suggested that, “copying is virtually synonymous with vocal imitation.” This interpretation of vocal imitation rests on four major assumptions: (1) a vocalizing individual initially produces sounds at random, after which a subset are rewarded; (2) all a vocalizing individual needs to be able to do to reproduce a sound is recognize similarities between produced sounds and previously perceived sounds; (3) mismatches between an internally stored model of a previously experienced sound and percepts of produced sounds drive instrumental conditioning (see also the discussion of auditory template matching in section six); and (4) such mismatches correspond to errors in production. Like those of many previous researchers, the examples of vocal copying provided by Heyes focus mainly on song learning and speech reproduction by birds.

Mowrer (1952, 1960) similarly proposed that vocal imitation, even in humans, was a consequence of instrumental conditioning rather than a specialized ability. He suggested that for such conditioning to occur, a sound produced by a model initially had to be established as a secondary reinforcer by being associated with pleasant outcomes. Later, a “babbling” individual (e.g., a parrot or human infant) might occasionally make a similar sound. Assuming that the vocalizing individual generalized from its past experiences with the secondary reinforcer, hearing the self-produced sound would reinforce the immediately preceding vocal act. The more similar to the original sound the babbled sound was, the more reinforcing it should be, leading to a kind of autoshaping or successive approximation in which the vocalizing individual is differentially self-rewarded based on how closely it produces copies of the original sound. By this account, vocal imitation is simply an automatic, trial-and-error process that depends on initial rewards from another organism to establish certain sounds as secondary reinforcers. Mowrer thus describes vocal imitation phenomena as the result of latent learning about associations between sounds and rewards. He notes that sounds can only maintain their efficacy as secondary reinforcers if they are occasionally supplemented by external social reinforcements. Thus, as in Miller and Dollard’s (1941) explanation of copying, Mowrer claimed that feedback from a teacher is critical for vocal imitation to occur.
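The successive-approximation process in this account can be illustrated with a toy simulation. This is a minimal sketch under strong simplifying assumptions rather than Mowrer's own formulation: a sound is reduced to a single pitch in Hz, similarity to the model sound stands in for secondary reinforcement, and a babbled variant is retained only when it is more reinforcing than the current vocal habit. All names and parameter values are hypothetical.

```python
import math
import random
from typing import List


def similarity(produced_hz: float, model_hz: float, scale_hz: float = 50.0) -> float:
    """Toy secondary-reinforcer value: 1.0 for a perfect match, falling toward 0."""
    return math.exp(-abs(produced_hz - model_hz) / scale_hz)


def babble_toward_model(model_hz: float, start_hz: float, trials: int = 200,
                        spread_hz: float = 20.0, seed: int = 0) -> List[float]:
    """Return the vocal habit (in Hz) after each babbling trial.

    On each trial the babbler produces a random variant of its current habit.
    The variant replaces the habit only if it sounds more like the model,
    a crude stand-in for differential self-reinforcement.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    habit_hz = start_hz
    history = [habit_hz]
    for _ in range(trials):
        variant_hz = habit_hz + rng.gauss(0.0, spread_hz)
        if similarity(variant_hz, model_hz) > similarity(habit_hz, model_hz):
            habit_hz = variant_hz
        history.append(habit_hz)
    return history
```

Calling `babble_toward_model(model_hz=440.0, start_hz=200.0)` yields a sequence of habits whose distance from the model never increases. The sketch omits the occasional external social reinforcement that Mowrer considered necessary for a sound to keep its value as a secondary reinforcer.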

Baer and colleagues (1967) later showed that explicitly trained vocal imitation in children immediately generalized to novel sounds. They suggested that topographical similarity between a performed act and a perceived act could become a conditioned reinforcer, which could lead to generalized imitation across different stimuli (see also Garcia, Baer, & Firestone, 1971; Gewirtz & Stingle, 1968; Zentall & Akins, 2001). Their proposal parallels Miller and Dollard’s (1941) claim that recognition of similarity is a critical component of vocal copying, and again requires reinforcement of imitative vocal acts by a teacher. In Baer and colleagues’ generalized imitation framework, a vocalization is imitative if it occurs after a vocal act demonstrated by another individual, and if the form of the model’s vocalization determines the form of the copier’s vocalization. The proposal that generalized vocal imitation can be viewed as a consequence of operant conditioning has received some support from recent studies of the role of vocal imitation in speech development by children (Poulson, Kymissis, Reeve, Andreatos, & Reeve, 1991; Poulson, Kyparissos, Andreatos, Kymissis, & Parnes, 2002).

Collectively, past theoretical analyses of vocal imitation by experimental psychologists have often focused on establishing that this phenomenon can be viewed as an outcome of instrumental conditioning with few if any unique characteristics. These accounts generally do not explain why vocal imitation abilities are absent in most mammals. Given that mammals are quite capable of being instrumentally conditioned, some researchers have suggested that the rarity of vocal imitation abilities in mammals reflects limitations in vocal control (Arriaga & Jarvis, 2013; Deacon, 1997; Fitch, 2010; Mowrer, 1960). However, this explanation remains speculative (Lieberman, 2012), and others have suggested that what is missing are mechanisms that make it possible for an organism to adaptively adjust existing vocal control mechanisms. For example, Moore (2004) hypothesized that an organism must possess specialized imitative learning mechanisms beyond those necessary for instrumental conditioning before vocal imitation becomes possible (see also Subiaul, 2010). Thus, Thorndike’s (1911, p. 77) view that vocal imitation abilities are “a specialization removed from the general course of mental development,” has resurged in recent years and is currently the dominant view among biologists studying vocal imitation.

Is Vocal Imitation a Specialized Type of Vocal Learning?

Figure 1. Taxonomy proposed by Janik and Slater (2000) in which vocal learning is distinguished from contextual learning and subtypes of vocal learning are associated with different effectors. In this framework, vocal imitation of amplitude and duration features involves respiratory learning, imitation of frequency contours or pitch involves phonatory learning, and imitation of timbre involves articulatory learning.

As noted above, Marler (1976a) defined vocal learning as a process whereby vocal production is modified as a result of auditory experience. More recently, this term has been used to refer to any type of learning that involves vocal systems (Arriaga & Jarvis, 2013). Several reviews of vocal learning by mammals suggest that it represents a specialized form (or actually several different forms) of motor learning (Egnor & Hauser, 2004; Janik & Slater, 1997, 2000; Jarvis, 2013; Sewall, 2012; Tyack, 2008). Within modern vocal learning taxonomies, vocal imitation is often described as a particular type of vocal learning called vocal production learning (Fitch, 2010; Tyack, 2008) or production learning (Byrne, 2002; Janik & Slater, 2000). Vocal production learning, defined as the ability to modify features of sounds based on auditory inputs, has been distinguished from contextual learning, which is said to consist of learning how to use or comprehend sounds (Figure 1). Janik and Slater (2000) divided vocal production learning into three “forms” depending on which articulators were thought to be involved. Tyack (2008) identified over a dozen forms based on how animals used the sounds (e.g., vocal production learning involving sounds used for echolocation). The three kinds of vocal production learning proposed by Janik and Slater correspond to acoustic features controlled by the vocalizing individual, including: (1) duration and amplitude; (2) pitch or frequency modulation; and (3) relative energy distribution or timbre. They hypothesize that modifying the duration and amplitude of a sound represents the simplest form of vocal learning (because these features depend mainly on respiratory control), and that modifying frequency components through control of vocal systems requires more advanced mechanisms (Janik & Slater, 1997, 2000). They also suggest that, because of its rarity, the ability to copy (i.e., imitate) novel sounds is the most advanced form of vocal learning.

A simple way of thinking about the distinction between contextual learning and production learning proposed by Janik and Slater (2000) is that contextual learning determines when animals produce the sounds they know how to make, whereas production learning determines what sounds they know how to make. For instance, situations in which animals respond to hearing certain sounds by producing similar sounds (e.g., dogs that bark when they hear barking or infants that cry when they hear crying) would not qualify as vocal imitation or vocal learning by these criteria (Andrew, 1962). These cases would meet the criteria for contextual learning, however, because sound usage is context dependent. Such phenomena are typically referred to as instances of vocal contagion (Piaget, 1962).

Byrne (2002), following the terminology proposed by Janik and Slater (2000), describes instances of vocal contagion as a kind of contextual learning¹ in which heard sounds prime particular vocal acts, a process that he refers to as response facilitation. He describes vocal production learning as a potentially more interesting case because it includes the generation of new vocal acts and therefore requires more than just response facilitation. Nevertheless, he suggests that such vocal acts are not imitative, because in some cases only the outcomes of the actions are reproduced (see also Morgan, 1896). For instance, a mynah bird reproducing speech sounds cannot replicate the speech acts of a human, because the bird does not use the same vocal organs to produce sounds (however, see Beckers, Nelson, & Suthers, 2004; Patterson & Pepperberg, 1994).

Vocal imitation of novel sounds often has been touted as the clearest evidence of vocal production learning (Fitch, 2010; Janik, 2000; Tyack, 2008). The basic logic underlying past emphasis on the imitation of novel sounds is that if a vocalization is not novel, then one cannot be sure that imitation actually occurred. The origins of this criterion can be traced to Thorpe (1956, p. 135), who proposed that, “By true imitation is meant the copying of a novel or otherwise improbable act or utterance, or some act for which there is clearly no instinctive tendency.” Herman (1980) was one of the first to suggest that copying novel sounds requires more sophisticated cognitive mechanisms than modifying features of existing vocalizations (see also Baylis, 1982). He noted that many mammalian species can be trained to adjust their existing vocalizations into new forms or usage patterns (Adret, 1993; Johnson, 1912; Koda, Oyakawa, Kato, & Masataka, 2007; Molliver, 1963; Myers, Horel, & Pennypacker, 1965; Salzinger, 1993; Salzinger & Waller, 1962; Schusterman, 2008; Schusterman & Feinstein, 1965; Shapiro & Slater, 2004), whereas few species show any ability or inclination to copy novel sounds. Hearing individuals vocalize in ways that resemble the vocalizations of other species forcefully suggests that one has witnessed an imitative act. However, as originally noted by both James (1890) and Thorndike (1911), the observation of an organism producing a novel action that resembles human actions, however precisely, does not provide strong evidence that the organism is imitating. Conversely, the fact that production of familiar vocalizations can potentially be attributed to mechanisms other than imitation does not provide strong evidence that those vocalizations are not truly imitative. Such ambiguities severely limit the usefulness of current taxonomical approaches for describing and understanding vocal imitation processes.

Limitations of the Vocal Learning Framework

Problems with defining vocal imitation. Past emphasis on specifying criteria for reliably identifying instances of vocal imitation has led researchers to focus almost exclusively on situations in which similarities between a produced sound and other environmental sounds seem unlikely to have occurred by chance (e.g., when the sound is novel and acoustically complex). However, the fact that human observers perceive an animal’s vocalizations as strikingly similar to a salient environmental sound (e.g., electronic sounds, speech, or melodies), either through subjective impressions or quantitative acoustic analyses, is no more evidence that vocal imitation has occurred than the fact that certain photos of the surface of Mars look like a face is evidence that aliens reconfigured the landscape into that shape. Videos showing examples of cats and dogs producing vocalizations that are aurally comparable to the phrase, “I love you,” are now commonplace, and yet few if any scientists would view these as evidence that these pets are imitating human speech. This is because many different mechanisms can lead to the production of atypical sounds. A novel vocalization might be a seldom-used part of an individual’s repertoire, the result of some combination of previously learned vocal acts, or an aberration resulting from atypical genetics, diseases, or neural deficits. If an elephant produces a sound that resembles that of trucks (Poole, Tyack, Stoeger-Horwath, & Watwood, 2005) or speech (Stoeger et al., 2012), it remains possible that these sounds are ones that a small number of elephants infrequently make, independently of whether they have ever heard trucks or speech. Alternatively, vocalizations may have been modified through differential reinforcement to more closely resemble those of environmental sounds (which would represent a case of contextual learning using the taxonomy shown in Figure 1). Consequently, the novelty criterion does not reliably differentiate vocal imitation, vocal learning, or contextual learning.

In contrast, human research benefits from experimenters’ ability to explicitly instruct human participants to intentionally imitate sound sequences (e.g., Mantell & Pfordresher, 2013). In these experiments, researchers assume that participants are following instructions and earnestly attempting to imitate sounds. This assumption implies that both accurate and inaccurate reproductions of any sound sequence (novel or familiar) are viewed as valid attempts at vocal imitation. A second branch of human vocal imitation research exploits the tendency for human speech patterns to align with (become more similar to) previously experienced speech stimuli (e.g., Goldinger, 1998). In these studies, experimenters instruct their subjects to perform a vocal task such as word naming without actually telling them to imitate sounds. Researchers assume that when an individual produces speech with features similar to those produced by a speaker he or she has been recently exposed to, then this performance is indicative of spontaneous vocal imitation. Human vocal imitation research thus uses contextual factors as criteria for identifying imitative acts rather than idiosyncratic features of vocalizations.

A second criterion that has occasionally been used to exclude vocal performances from involving imitation is that an action (vocal or otherwise) can only be considered imitative if the specific movements of a model are replicated (e.g., Byrne, 2002). An oddity of this criterion is that if a dolphin were to copy sounds produced by a sea lion, then this would not count as imitation, because dolphins and sea lions have different sound producing organs. However, if a second dolphin copied the first dolphin’s “barking,” then this would count as imitation, because the imitator shares the same vocal organs as the model, and thus would likely replicate the sound producing movements of the model (see Wickler, 2013, for a more extensive critique of such distinctions). The logical consequence of this exclusionary criterion is that cross-species vocal imitation is impossible, because “imitation” requires identical physiological production constraints. However, defining imitation in this way does little to clarify what humans and other animals are doing when they seem to be copying sounds they have heard. Furthermore, it creates a false dichotomy between vocal imitation and other more visible forms of sound imitation (e.g., imitating a percussive rhythm).

Problems with equating learning and performance. A basic assumption underlying the claim that imitation of novel sounds provides the clearest evidence of vocal production learning is that, because the organism is producing an “otherwise improbable” sound that it has not been observed to produce before, it must have gained the ability to do so through its auditory experience (i.e., it must have learned how to produce the sound by hearing it). It is clear, however, that the individual imitating the sound had the vocal control mechanisms necessary for producing the novel sound prior to ever hearing that sound. Hearing the novel sound merely set the occasion for the individual to express an already present capacity. By analogy, a person who has never seen a motorcycle before and sits on one for the first time does not spontaneously acquire the motor control needed to sit on a motorcycle simply by seeing someone else sit on one. Instead, the person generalizes existing sitting skills to a novel object. Similarly, an organism that copies features of a novel sound it has heard is applying existing vocal production skills to a novel auditory object. For example, upon hearing a tugboat horn, a child may successfully reproduce the long, low, shifting, spectrotemporal pattern on her first try, at least in relative pitch terms. There is no reason to think that experience with an unfamiliar percept somehow endows an observer with previously unavailable motor control abilities (Galef, 2013). Consequently, reproduction of novel sounds does not provide clear evidence of vocal production learning, and such performances actually might be better viewed as evidence of contextual learning, because it is the context that determines when the individual reproduces a novel sound.

It is important to note that psychologists’ use of the term learning differs from how this term is used colloquially and by biologists in that psychologists view learning as a long-lasting change in the mechanisms of behavior that results from past experiences with particular stimuli and responses (Domjan, 2000). In contrast, biologists’ definition of learning as “behavioral changes effected by experience” (Immelmann & Beer, 1989), makes no distinction between short- and long-term changes, emphasizes changes in actions rather than changes in mechanisms, and makes no attempt to specify why an action changed. Psychologists do not consider changes in behavior as strong evidence of learning, because many experiences such as fatigue, hunger, pain, injury, motivation, and drunkenness can also produce changes in behavior. Furthermore, numerous experiments have shown that learning can occur without any overt changes in an organism’s behavior. For these reasons, experimental psychologists have drawn a distinction between learning and performance—performance refers to what organisms do. How an organism performs often reflects past learning, but there is not a one-to-one mapping between performance and learning. Because biologists do not make a similar distinction, some phenomena they might classify as vocal learning do not meet psychologists’ criteria for learning. In the following, we use the term learning in the psychological sense, but use the phrase “vocal learning” in the sense preferred by biologists (see Table 1).

From a psychological perspective, production of novel sounds provides no evidence that learning has occurred. In fact, the learning that enables an individual to reproduce a particular sound may occur long before the novel sound is actually produced. This is not to say that vocal imitation of familiar or novel sounds never plays a role in vocal learning. Certainly, copying of sounds can afford many opportunities for learning that would not otherwise be available, especially in young children. Nevertheless, it is important to recognize that not only does vocal learning occur in the absence of vocal imitation (reviewed by Schusterman, 2008), but vocal imitation can also occur without involving any new learning. These facts are clearly problematic for any taxonomy that defines vocal imitation as a kind of learning.

Synthesis

The two approaches to explaining vocal imitation described above—defining vocal imitation either as an outcome of instrumental conditioning or as a kind of learning (the vocal learning framework)—parallel more general frameworks for describing imitation. For instance, in evaluating strategies for defining imitation, Heyes (1996) identified three basic solutions: the essentialist solution, the positivist solution, and the realist solution. The essentialist solution is a definition-by-exclusion strategy in which researchers classify different imitation-like phenomena using specific criteria in an attempt to identify what is truly an imitative act. The vocal learning taxonomical framework is an example of this strategy. Limitations of this approach are that classifications are only as good as the demarcation criteria that are developed, and that defining vocal imitation by exclusionary criteria does not clarify what vocal imitation actually entails. The positivist solution involves selecting an operational definition for what will be called vocal imitation. The instrumental conditioning framework qualifies as this type of strategy because it focuses less on differentiating vocally imitative acts from other acts, and more on identifying the conditions that lead to vocal reproductions. Finally, there is the realist solution, which focuses on explaining behavior in terms of theories about mental processes that yield testable hypotheses. Cognitive accounts of vocal imitation by adult humans exemplify this approach.

II. Vocal Imitation Is a Cognitive Process

One reason that vocal imitation often has been described as a learning mechanism is that comparative studies have focused on its role in vocal development (Baer, Peterson, & Sherman, 1967; Marler, 1970; McCowan & Reiss, 1997; Mowrer, 1952; Nottebohm & Liu, 2010; Subiaul, Anderson, Brandt, & Elkins, 2012; Tyack & Sayigh, 1997). In particular, there have been extensive comparisons between song learning by birds and speech learning by humans (Bolhuis et al., 2010; Doupe & Kuhl, 1999; Jarvis, 2004, 2013; Lipkind et al., 2013; Marler, 1970).² From this perspective, vocal imitation provides a way for naïve youngsters to acquire the communicative abilities of mature adults. In fact, some researchers have argued that copying of sounds outside the natural repertoire may be a functionless evolutionary artifact (Garamszegi, Eens, Pavlova, Aviles, & Moller, 2007; Lachlan & Slater, 1999). Although vocal imitation abilities can be an important component of vocal development, the most versatile vocal imitators are adult humans (Amin, Marziliano, & German, 2012; Majewski & Staroniewicz, 2011; Revis, De Looze, & Giovanni, 2013). Furthermore, humans invariably achieve expertise in vocal imitation abilities well after learning to produce speech sounds. The most capable human vocal imitators perform copying feats that few adults can replicate. One could even argue that highly developed communication skills are a prerequisite for the highest levels of proficiency in vocal imitation, because professional imitators (e.g., impersonators, actors, singers) often receive detailed verbal feedback from instructors and peers over several years.

Viewing Vocal Imitation as a Component of Auditory Cognition

Cognitive psychologists’ conceptualization of vocal imitation by adult humans differs dramatically from that proposed by biologists and comparative psychologists for vocal imitation by non-humans. In particular, the emphasis in cognitive studies of vocal imitation is on how sounds and vocal acts are perceived, how links between percepts and actions contribute to performance, and how mental representations of events contribute to these processes. From this perspective, studies of vocal imitation in adults can be viewed as part of the field of auditory cognition, which focuses on understanding how mental representations and cognitive processes enable the understanding and use of sound.

In some respects, the cognitive approach to describing vocal imitation represents a return to Morgan’s (1896) portrayal of imitation. Recall that Morgan divided imitation into two types: instinctive and voluntary. As an example of instinctive vocal imitation, he described a scene in which a chick comes across a dead bee and gives an alarm call, which leads a second nearby chick to give a similar alarm call. Today, the latter part of this scenario would be described as a case of vocal contagion. Morgan contrasted this kind of reflexive vocal matching with voluntary imitation, which he also refers to as conscious, intentional, or intelligent imitation. He noted that voluntary imitation is not independent of instinctive imitation, but rather builds on it (see also Romanes, 1884). Notably, frameworks that describe vocal imitation as either instrumental conditioning or vocal learning make no distinction between reflexive and voluntary imitation. This distinction is common in cognitive studies of human vocal imitation, however, and has recently also been revisited in discussions of motor imitation. For example, Heyes (2011) distinguishes between two “radically different” types of imitation: a complex, intentional type that individuals can use to acquire novel behaviors (voluntary imitation), and a simple, involuntary variety that involves duplicating familiar actions (referred to as automatic imitation). Cognitive psychologists have also drawn a distinction between overt imitative acts, which involve the observable, physical reproduction of sound, and covert imitation, which involves the unobservable, mental, or subvocal reproduction of sounds or actions (Pickering & Garrod, 2006; Wilson & Knoblich, 2005). These distinctions have important implications for understanding what vocal imitation is, and for identifying the cognitive processes that make vocal imitation possible.

Automatic Imitation Suggests Vocal Imitation Frequently Goes Unnoticed

Automatic vocal imitation has been studied extensively by speech researchers and has been observed at multiple levels of processing, including syntactic, prosodic, and lexical alignment in conversation (Garrod & Pickering, 2009; Gregory & Webster, 1996; Levelt & Kelter, 1982; Neumann & Strack, 2000; Pickering & Branigan, 1999; Shockley, Richardson, & Dale, 2009). Automatic imitation is modulated by social factors such as gender (Namy, Nygaard, & Sauerteig, 2002), personal closeness (Pardo, Gibbons, Suppes, & Krauss, 2012), attitude toward the interlocutor (Abrego-Collier, Grove, & Sonderegger, 2011), conversational role (Pardo, Jay, & Krauss, 2010), model attractiveness (Babel, 2012), and even sexual orientation (Yu et al., 2011). Talkers apparently imitate both visual and auditory components of observed speech (Legerstee, 1990; R. Miller, Sanchez, & Rosenblum, 2010). Automatic vocal imitation processes may occur relatively continuously without any awareness by the vocalizing individual (or others) that they are occurring.

One common way of generating automatic vocal imitation in the laboratory is to have talkers listen to and then intentionally repeat just-heard speech (a task called shadowing). Listeners can voluntarily replicate speech with a delay as short as 150 ms (Porter & Lubker, 1980). Shadowing could be viewed as a case of rapid vocal imitation, but is more often described as word repetition. Rapid production of just-heard words supports the notion that perceived sounds may be automatically converted into articulatory commands (Skoyles, 1998). When a talker produces shadowed words in ways that are more similar to the just-heard words than to his or her spontaneous speech, this is viewed as evidence that the talker has automatically imitated features of the just-heard words (Fowler, Brown, Sabadini, & Weihing, 2003; Honorof, Weihing, & Fowler, 2011; Kappes, Baumgaertner, Peschke, & Ziegler, 2009; Mitterer & Ernestus, 2008; Nielsen, 2011; Shockley, Sabadini, & Fowler, 2004). Goldinger (1998) found that immediately shadowed words were more likely to be judged by external evaluators as matching the just-heard sound than versions produced after a four-second delay. He also found that when talkers shadowed uncommon words, their reproductions were more likely to be judged as matching the just-heard sound than when they shadowed common words. Similar effects have been observed in tasks in which talkers replicated unique word features that were encountered up to a week previously (Goldinger & Azuma, 2004; Nielsen, 2011). These findings suggest that the effects of automatic vocal imitation mechanisms on speech production may persist for long periods. Further evidence that experienced sounds may involuntarily affect vocal production comes from the earworm phenomenon, wherein a person involuntarily mentally or overtly rehearses a catchy tune that was previously encountered (Beaman & Williams, 2010; Halpern & Bartlett, 2011; Williamson et al., 2012).

People voluntarily shadow words when they are instructed to repeat them in laboratory studies, but there are cases in which individuals involuntarily shadow recently heard sounds in their environment, referred to as echolalia. Echolalia is commonly seen in people with autism and is also associated with several other disorders (Fay, 1969; Schuler, 1979; van Santen, Sproat, & Hill, 2013). It can involve either immediate or delayed reproduction of relatively complex sequences of speech sounds (Prizant & Rydell, 1984) or non-vocal sounds (Fay & Coleman, 1977; Filatova, Burdin, & Hoyt, 2010), and is often viewed as a contributing factor to dysfunctional language learning (Eigsti, de Marchena, Schuh, & Kelley, 2011). To date, detailed acoustic comparisons between heard speech and echolalic speech have not been performed, so the fidelity with which repeated sounds are copied is unclear.

Collectively, past studies of automatic vocal imitation demonstrate that humans sometimes reproduce features of previously experienced sounds without intending to do so and without being aware that they are copying heard features. Because automatic vocal imitation is often not apparent to the vocalizing individual and can occur after a significant delay, it may be more prevalent than is currently recognized. How automatic imitation relates to voluntary vocal imitation is a key question that researchers have grappled with for over a century.

Covert Imitation Suggests That Vocal Imitation May Enhance Perceptual Processing

Virtually all past discussions of vocal imitation assume that it is a process that primarily serves to enable an individual to produce certain sounds by reference to sounds previously heard. A recent alternative perspective is that imitative abilities may instead (or additionally) facilitate the prediction of future events (Grush, 2004; Hurley, 2008; Wilson & Knoblich, 2005). This perspective assumes that individuals are better able to perceive the actions of conspecifics if they can construct mental simulations of ongoing acts (including vocal acts) that occur in parallel with the perception of those acts. These mental simulations would be available to the individual perceiving the acts, but would not be evident in the observer’s behavior.

Covert vocal imitation is described as an automatic process in which a sound is represented, at least in part, in terms of the motor acts necessary to re-create the sound. The suggestion is that vocal imitative processing is not a rare event (as suggested by frameworks that only consider production of novel sounds to be evidence of imitation), but is instead a routine component of auditory processing. Echolalia is often interpreted as evidence that auditory processing normally engages an imitative process that would naturally lead to overt imitative acts if not for being actively inhibited (Fay & Coleman, 1977; Grossi, Marcone, Cinquegrana, & Gallucci, 2012). From this perspective, what is rare is for an organism to produce overt actions that reveal these representational processes—overt vocal imitation then becomes analogous to “thinking out loud.” Wilson and Knoblich (2005) suggest that vocal imitation serves not to enable the acquisition of new sounds, but rather as a perceptual process that uses “implicit knowledge of one’s own body mechanics as a mental model to track another person’s actions in real time” (p. 463). The advantage of such processing is that a listener can potentially fill in missing or ambiguous information and infer the trajectory of likely actions in the near future. In section five, we consider in more detail how such mental simulations may specifically contribute to audiospatial perception by cetaceans.

Voluntary Imitation Suggests That Vocal Imitation Can Be Consciously Controlled

Piaget (1962) was one of the first psychologists to collect empirical evidence that automatic vocal imitation abilities in human infants may provide a foundation for the later development of voluntary vocal imitation abilities. He strongly argued that vocal imitation was not an evolutionarily specialized ability. In fact, Piaget starts his book on imitation by stating that, “Imitation does not depend on an instinctive or hereditary technique . . . . the child learns to imitate” (1962, p. 5). Piaget proposed six successive stages in the development of voluntary vocal imitation in children: (1) vocal contagion, (2) interactive copying of sounds, (3) systematic rehearsal of sounds in the repertoire, (4) exploratory copying of novel sounds, (5) increased flexibility at imitating novel events, and (6) deferred imitation. Studies of vocal development in parrots led Pepperberg (2005) to suggest that parrots progress through similar stages of imitative development. She described three levels of vocal imitation proficiency, starting with the involuntary copying of sounds, followed by intentional production of copied sounds, which in some cases develops into more sophisticated, creative sound production including the recombination of familiar segments into new sounds.

Relatively few researchers have theorized about the mechanisms or functions of vocal imitation in adult humans. Donald (1991) described vocal mimesis by adults as differing from vocal imitation in that it involves the invention of intentional representations as well as “the ability to produce conscious, self-initiated, representational acts that are intentional but not linguistic” (p. 168). He noted that vocal reproduction can serve communicative purposes, but may also function simply to represent an event to oneself. In his framework, vocal mimesis allows for the self-cued recall of previously perceived sounds, as well as the control of how those sounds might be transformed during reproductions; vocal acts that were initially involuntary (e.g., laughing) can be explicitly recalled and used intentionally, for instance in reenactments of past episodes or when acting out a scene. Donald proposed that the cognitive basis of vocal mimesis involves a combination of episodic memory abilities and “an extended conscious map of the body and its patterns of action, in an objective event space; and that event space must be superordinate to the representation of both the self and the external world” (p. 189). He describes the main outputs of this system as consisting of self-representations and episodic memories. Thus, his proposed mimetic system (Figure 2) builds on and encompasses an episodic memory system, which some describe as one of the most advanced cognitive systems in adult humans (Tulving, 2002).

Figure 2. Donald’s (1991) qualitative model of vocal mimesis in adult humans. The mimetic controller integrates episodic representations with outputs from self-representational systems to control how sounds are produced and to compare external events with self-produced actions.

The idea that episodic memory representations play a key role in vocal reproduction has also been discussed in relation to speech shadowing tasks (Goldinger, 1998). Goldinger proposed that each word exposure generates a memory trace that resonates with previously encoded traces of the same word. When there are fewer past traces in memory (i.e., the word is uncommon), resonance with the current word presentation is weak. As a result, the unique vocal characteristics of the just-heard word are more likely to be retained in the mental representation that drives the shadowing production plan. Unlike traditional descriptions of episodic memory, which assume that such memories are consciously accessed, Goldinger’s proposal implies that such memories may also automatically shape vocal production. Essentially, the idea is that memories of recent auditory episodes may continuously modulate how a listener vocalizes.

The capacity of adult humans to voluntarily copy sounds is best viewed as a cognitive skill that requires refined perceptual-motor control and planning abilities. Cognitive skills are abilities that an organism can improve through practice or observational learning that involve judgments or processing beyond what is involved in performing perceptual-motor responses (Anderson, 1982; Mercado, 2008; Rosenbaum, Carlson, & Gilmore, 2001). Relevant cognitive processes that may contribute to an adult’s vocal imitation skills include conscious maintenance and recall of past auditory or vocal episodes, selective attention to subcomponents of experienced and produced sounds, identification of specific goals of reproducing certain acoustic features, and awareness of possible benefits that can be attained through successful sound reproduction. From a cognitive perspective, an imitative vocal act is a memory-guided performance rather than a learning mechanism, and an individual’s ability to flexibly perform such acts will depend strongly on how that individual mentally represents both sounds and sound producing actions (Roitblat, 1982; Roitblat & von Fersen, 1992).

The most impressive vocal imitation abilities of adult humans involve voluntary, highly experience-dependent skills that are more reminiscent of soccer skills than of learning mechanisms. Soccer players can all walk, run, and judge the consequences of their motor acts, but these abilities are insufficient to make someone a professional soccer player. Similarly, the abilities to make sounds, recognize similarities between sounds, and remember sounds are all necessary for vocal imitation, but these abilities do not make a person a professional impersonator. It would not make sense to say that a toddler uses soccer abilities to learn how to walk, and it may similarly be questionable to say that a toddler uses vocal imitation abilities to learn how to talk. What the toddler does in both cases is learn how to flexibly control his or her actions based on past experiences. He or she gradually learns to voluntarily run and kick in strategically advantageous ways and also gradually learns to voluntarily produce sound features based on memories of past percepts and actions.

Synthesis

Past attempts to understand the nature of vocal imitation reflect the ways in which this phenomenon has been used as an explanatory construct. Psychologists have often noted the important role that vocal imitation may play in language learning, and consequently have emphasized how the availability and guidance of adult speakers may contribute to learning when infants copy their examples. Biologists have also stressed how vocal imitation can facilitate the vocal learning of communicative signals. Consequently, it is perhaps only natural that researchers have traditionally described vocal imitation as a learning mechanism. In contrast, it seems less likely that an adult human shadowing speech in a laboratory, or humming a tune while exiting a concert, is doing so to learn how to speak or to hum. Identifying when vocal imitation abilities are used provides hints about what those abilities may be for, but those hints may be misleading when only a subset of the relevant contexts is considered or when those abilities are difficult to observe. Without understanding the mechanisms that underlie sound imitation, and without any ability to monitor those mechanisms, it is simply not possible to definitively identify instances in which vocal imitation is occurring.

What then is vocal imitation? Clearly, different fields offer different ways of answering this question. Historically, animal learning researchers have described vocal imitation as the generalization of a conditioned response that is acquired through a supervised learning process. In this framework, acquisition of vocal imitation abilities (and consequent vocal communicative capacities) is subserved by general mechanisms of associative learning, rather than adaptively specialized vocal learning mechanisms. Animal behavior researchers, in contrast, have treated vocal imitation as a highly specialized adaptation that serves primarily to increase the flexibility with which animals can expand or customize their vocal repertoire. In this context, vocal imitation is the learning mechanism. Finally, cognitive psychologists construe vocal imitation as a consequence of multiple voluntary and involuntary representational processes. From the cognitive perspective, vocal imitation may help an organism learn, but this capacity can also be enlisted when no learning or vocalizing is occurring.

In the following, we use the term vocal imitation to refer to the vocal reenactment of previously experienced auditory events, essentially endorsing the framework developed by cognitive researchers studying vocal imitation in adult humans. Moreover, we claim that vocal imitation is a complex cognitive ability that involves coordinating action and perception. As such, vocal imitation can both be learned and in turn facilitate learning. The strength of this definition, and the cognitive approach more generally, is that it encompasses voluntary and automatic imitation, including covert imitation, and gives a clearer sense of the scope of cognitive processes that may contribute to vocal imitation abilities. A potential weakness of this definition is that it does not provide specific criteria for distinguishing imitative vocal acts from those that are non-imitative. As history has repeatedly shown, however, identifying such demarcation criteria is a formidable task, made all the more difficult by an incomplete understanding of the mechanisms underlying vocal imitation. Taxonomical distinctions may be useful for classifying different vocal phenomena, but it is less clear that they provide a viable framework for understanding what vocal imitation is or how it works. Instead, we focus on understanding how past experiences with various sounds enable some organisms to reproduce them. Because cognitive psychologists have studied vocal imitation most extensively in primates (primarily humans), we first consider the factors that determine when primates imitate sounds, as well as the features of sounds that primates are most likely to reproduce.

III. Sound Imitation by Human and Non-Human Primates

When considering the factors that constrain an individual’s ability to imitate sounds (or likelihood of doing so), a key question is: what makes a sound more or less imitatible? The answer to this question may vary across species and even within and across individuals of the same species. Wilson (2001b) defined imitatible stimuli as those for which an individual’s body can engage in an activity in which its configuration and movement can be mapped onto the configuration and movement of the stimulus, even if the mapping is not perfect and only applies to a limited set of properties of the stimulus. Most humans can easily imitate at least some speech sounds, but no other primates can. This has generally been interpreted as evidence that humans have unique capacities for imitating sounds. It remains possible, however, that sounds exist that at least some non-human primates might easily imitate, but that humans would find difficult or impossible to imitate. In the following, we suggest that a primate’s ability to imitate a particular sound depends, at least in part, on how the individual represents the sound and sound producing actions.

Imitating Sounds Non-Vocally

If vocal imitation is defined as the vocal reenactment of previously heard events, then sound imitation can be viewed as a generalization of this ability that includes both vocal and non-vocal reenactments. Past emphasis on understanding how vocal imitation enables individuals to learn to produce novel vocalizations has distracted attention away from instances in which organisms use non-vocal motor acts to reproduce sounds. According to Wilson’s (2001b) definition of imitatible stimuli, any sound-producing movements of an individual’s body that can be mapped onto features of heard sounds can potentially make that sound imitatible. Thus, it is important to consider all available sound producing body movements when evaluating the imitatibility of a sound for a particular species.

There have been several anecdotal reports of animals non-vocally reproducing environmental sounds such as percussive knocking (e.g., Witchell, 1896). This phenomenon has only recently been studied scientifically, however. Moore (1992) reported that a parrot (Psittacus erithacus) reproduced knocking sounds by drumming its head on objects after repeatedly observing a person knocking on a door. He later described this behavior as an instance of percussive mimicry, which he argued was a more sophisticated ability than vocal reproduction. Most reports of sound imitation by non-human primates involve non-vocal sound production (for rare exceptions, see Kojima, 2003; Masataka, 2003). Marshall, Wrangham, and Arcadi (1999) observed that chimpanzees (Pan troglodytes) exposed to a male that produced a Bronx cheer³ as part of his pant-hoot call subsequently began using this sound in their own calls. A captive orangutan (Pongo pygmaeus × Pongo abelii) independently learned to whistle and was able to match the duration and number of whistles produced by a human model (Lameira et al., 2013; Wich et al., 2009; see Figure 3a). Though performed with the mouth, whistling is a non-vocal motor act requiring fine control of lip positions and airflow. Most recently, infant chimpanzees have been shown to adopt particular non-vocal sound production techniques (kisses, lip smacks, Bronx cheers, teeth clacking) as attention-getting signals based on the techniques modeled by their mothers (Taglialatela, Reamer, Schapiro, & Hopkins, 2012). Chimpanzees also can be trained to produce such non-vocal sounds, suggesting that their ability to voluntarily generate novel sounds is more flexible than previously thought (Hopkins, Taglialatela, & Leavens, 2007; Russell, Hopkins, & Taglialatela, 2012).

The non-vocal sound imitation abilities of humans are often taken for granted in music education (Drake, 1993; Drake & Palmer, 2000; Palmer & Drake, 1997). For instance, a teacher may ask students to clap the rhythm of a song that they are learning to sing, or ask them to copy a demonstrated percussive pattern on various instruments. Conversely, most music students take ear-training classes that involve having to produce visually presented musical intervals vocally (called “sight-singing”), with the assumption that this ability will facilitate non-vocal reproduction of music. A musician who reproduces the melodic sequence produced by a singing bird or fellow musician when she plucks strings, presses piano keys, or uses air to make a reed vibrate, is also imitating the sounds non-vocally (Clarke, 1993; Clarke & Baker-Short, 1987). Many musicians learn to play songs “by ear,” which involves transforming heard sounds into the motor acts required to reproduce them (McPherson & Gabrielsson, 2002; Woody & Lehmann, 2010). Musicians and non-musicians can readily imitate the intonation patterns of sentences by moving a stylus on a tablet (d’Alessandro, Rilliard, & Le Beux, 2011). Anecdotally, it is not clear, for either human or non-human primates, that there is anything special about non-vocal reproduction of sounds relative to vocal imitation. The individuals appear to be reproducing sounds based on past experiences, regardless of whether the reenactment is produced through the voice or through some other means. In fact, the perceptual and cognitive demands appear to be comparable: the individual perceives a sound and then uses that sound as a guide for controlling motor acts that generate a similar event.

Figure 3. (a) Non-vocal sound imitation by an orangutan (adapted from Wich et al., 2009; Figure 2). Gray lines show spectrographic contours of whistles produced by a human, and black lines show the contours of subsequent whistles produced by the orangutan in which the number, timing, and duration of sounds are similar to features present in the target sequence. (b) Spontaneous vocal production by an infant chimpanzee (black lines show spectrographic contour and harmonics) with acoustic features similar to those of a preceding environmental sound (gray lines), indicative of vocal imitation (adapted from Kojima, 2003; Figure 9-2).

It is possible, however, that vocal and non-vocal sound imitation involve qualitatively different mechanisms. For instance, Moore (2004) argues that the parrot’s capacity for copying sounds percussively requires adaptations beyond those necessary for vocal imitation. In human studies, some have suggested that processing of different auditory events (e.g., melodies versus speech) may involve separate underlying mechanisms (Peretz & Coltheart, 2003; Zatorre & Baum, 2012; Zatorre, Belin, & Penhune, 2002), whereas others argue that there may be significant overlap (Mantell & Pfordresher, 2013; Patel, 2003; C. Price & Griffiths, 2005). Evidence supporting the view that vocal and non-vocal sound imitation can involve separate mechanisms was recently reported by Hutchins and Peretz (2012). In their study, participants who were classified as either accurate or poor-pitch singers matched pitch either vocally or manually by using a slider. The slider was used so that participants could continuously control pitch, as is the case for vocal pitch control, thus somewhat equating demands of pitch control across distinct effector systems. They found that pitch-matching errors in poor-pitch singers were voice specific. In other words, poor-pitch singers successfully matched pitch using the slider, but not using their voice. These results suggest that an individual’s ability to reproduce a pitch depends on the specific movements and associated feedback involved in matching the pitch.

For primates, sounds are imitatible when they are encoded in such a way that the stored representation of that sound enables the listener to voluntarily generate motor acts that produce phenomenological features present within the originally experienced sound. Note that by this criterion, any sound that a human hears is potentially imitatible, because the listener should be able to at least approximate the duration of the heard sound through some sound producing action. It is less clear which sounds would qualify as imitatible for other primates. Based on the currently available evidence, non-vocal sounds produced with the mouth seem to be relatively easy for chimpanzees and orangutans to reproduce, whereas vocal sounds are relatively easy for humans to imitate. Given that some sounds, such as those produced by conspecifics, will be easier to reproduce than other sounds, findings regarding which sounds (or features of sounds) are most imitatible can provide important clues about the factors that constrain imitation capacities within and across species.

Variations in the Imitatibility of Sounds

If sound imitation depends on adaptively specialized auditory-motor processing, then the sound features that should be easiest for an organism to imitate should be those present within functional vocalizations produced by conspecifics. Recent studies of humans provide some support for the hypothesis that vocal imitation is facilitated for natural vocalizations. For instance, matching of pitch is more accurate with a human voice timbre than a synthetic vocal timbre (Lévêque, Giovanni, & Schön, 2012; R. Moore, Estis, Gordon-Hickey, & Watts, 2008) or with a complex tone (Hutchins & Peretz, 2012; Watts & Hall, 2008). Adults also match pitch better when the vocal range of the target is closer to their range, as when female imitators match a female voice (H. E. Price, 2000). Similarly, children match pitch better when matching a child’s voice, and are better at matching pitch for a female than a male adult voice, given the greater similarity of female voice formants and pitch to a child’s voice (Green, 1990).

Figure 4. Pitch contours (shown as black dots) extracted from an adult human’s vocalizations when the individual was instructed to imitate a target vocal sequence compared with spectral and temporal features of the target sequence (gray lines).

Mantell and Pfordresher (2013) recently explored differences in the vocal imitation of pitch within two cognitive domains: music (song) and language (speech). We summarize the results of this study here as a paradigmatic example of how vocal imitation can be influenced by stimulus structure, and of how the fidelity of imitations can be quantitatively assessed. According to the modular model of audition proposed by Peretz and Coltheart (2003), pitch processing occurs in domain-specific, information-encapsulated modules (Fodor, 1983) separate from speech processing. In a direct test of this framework, Mantell and Pfordresher compared the accuracy with which people intentionally imitated the pitch-time contents of spoken sentences and sung melodies. They created speech and song stimuli that matched in word content, pitch contour (the pattern of rising and falling pitch), pitch range, and syllable/note timing. The difference between the speech and song targets was that each note of the sung targets conformed to diatonic, musical tonal rules. Mantell and Pfordresher reasoned that if the pitch processing system underlying vocal imitation was truly modular, phonetic information should not influence imitative performance. Thus, the critical experimental factor was the presence or absence of phonetic information in the target sequences. They created wordless versions of all of the speech and song stimuli by synthesizing the pitch-time contents of each of the worded sequences. The wordless versions sounded like hummed versions of the sentences and songs, but they lacked all acoustic-phonetic identification cues. Imitation accuracy was gauged by directly comparing the target sequence with a temporally aligned imitative production (Figure 4) and by calculating two different quantitative measures of similarity (Figure 5). The first measure, mean absolute error, assessed the accuracy with which each imitative production matched the pitch content of the target. The second measure, pitch correlation, scrutinized the accuracy with which each imitative production tracked the relative (rising and falling) pitch-time contour of the target.
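To make the distinction between these two measures concrete, the following is a minimal sketch of how they might be computed. It is not the analysis code used by Mantell and Pfordresher (2013). It assumes that the target and the imitation have already been time-aligned and sampled at the same time points, that pitch is given in Hz and converted to cents relative to an arbitrary 440 Hz reference, and that "pitch correlation" is a Pearson correlation between the two pitch tracks. The function names and the example values are hypothetical.

```python
import math


def hz_to_cents(freq_hz, ref_hz=440.0):
    """Convert a frequency in Hz to cents relative to ref_hz (assumed reference)."""
    return 1200.0 * math.log2(freq_hz / ref_hz)


def mean_absolute_error(target_hz, imitation_hz):
    """Average absolute pitch deviation, in cents, between aligned pitch tracks."""
    if len(target_hz) != len(imitation_hz) or not target_hz:
        raise ValueError("pitch tracks must be non-empty and of equal length")
    diffs = [abs(hz_to_cents(i) - hz_to_cents(t))
             for t, i in zip(target_hz, imitation_hz)]
    return sum(diffs) / len(diffs)


def pitch_correlation(target_hz, imitation_hz):
    """Pearson correlation between aligned pitch tracks (contour similarity)."""
    if len(target_hz) != len(imitation_hz) or not target_hz:
        raise ValueError("pitch tracks must be non-empty and of equal length")
    x = [hz_to_cents(f) for f in target_hz]
    y = [hz_to_cents(f) for f in imitation_hz]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a flat pitch track")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical target contour (Hz) and an imitation of it sung one semitone sharp.
target = [220.0, 247.0, 262.0, 247.0, 220.0]
transposed = [f * 2 ** (1 / 12) for f in target]

mae_cents = mean_absolute_error(target, transposed)
contour_r = pitch_correlation(target, transposed)
```

In this example the transposed imitation misses every target pitch by about 100 cents, so its mean absolute error is large, yet it follows the rising and falling contour exactly, so its pitch correlation is essentially 1. This is the sense in which the two measures separate absolute from relative pitch accuracy.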

Figure 5. Poor-pitch imitators (left) produce vocalizations that do not match the target sounds in absolute or relative pitch, whereas typical adult humans (right) match both spectral features.

The critical finding of this study was that the presence of phonetic information in both the target and the imitative production reliably improved pitch accuracy. Thus, subjects imitated worded speech and song sequences more accurately than they imitated wordless speech and song sequences, despite the fact that the wordless versions were acoustically simpler (e.g., they lacked complex acoustic-phonetic spectral information). This finding is contrary to predictions afforded by a modular framework of music and speech processing, because if musical pitch processors are encapsulated from speech, then pitch processing should occur independently, neither hindered nor facilitated by any parallel phonological processes. It also contradicts the proposal that imitation of spectrotemporal contours is inherently more difficult than imitation of other acoustic features (Janik & Slater, 2000). Mantell and Pfordresher further found that participants varied in their accuracy at imitating absolute and relative features of target sequences (see also Dalla Bella, Giguere, & Peretz, 2007; Pfordresher & Brown, 2007). Specifically, participants imitated the absolute pitch within songs better than the absolute pitch in sentences, but imitated the relative pitch-time contours of speech and song equally well.

In Mantell and Pfordresher’s (2013) study, participants imitated recordings of vocalizations and also synthesized versions of these recordings, making it possible to examine whether they adjusted the resonant properties of their vocal tract in order to imitate the timbre of targets. The synthesized recordings featured a timbre that resembled a human voice, but that differed considerably from the timbre of vocal recordings. Analyses of the long-term average spectra during imitations (Figure 6) suggested that participants adjusted their own vocal resonances in order to imitate the timbre of each target, even though this was not necessary according to instructions, which simply focused on the imitation of pitch content. As illustrated in Figure 4, participants also naturally matched the temporal structure of heard sequences, which was also not specifically requested in the instructions. Thus, when humans voluntarily imitate speech or song sequences, they spontaneously imitate multiple acoustic features of the sequences. Interestingly, when an orangutan imitated whistle sequences produced by a human (Wich et al., 2009), it also spontaneously matched the duration and temporal spacing of target sequences (Figure 3a), suggesting that this propensity is not limited to human imitators.

Figure 6. Long-term average spectra showing that adult humans spontaneously match the timbre of target sound sequences when targets are either natural or synthetic.

What Makes a Sound Imitatible?

As noted above, a basic question surrounding the imitatibility of sounds concerns whether, or to what degree, organisms have evolved dedicated systems that are specialized for imitating certain sound features. The imitatibility of sounds is not simply based on whether the acoustic properties of individual sounds resemble those of natural vocalizations. People are able to vocally reproduce melodies presented on a piano as well as those that are sung, and infant chimpanzees sometimes imitate environmental sounds (Figure 3b, Kojima, 2003). The complexity of a target sequence can strongly limit its imitatibility. At a cognitive level, different kinds of target sequences represent different auditory domains and may, according to some theories, be processed by different cognitive modules. Take for instance the difference between a sung melody on the syllable “la” versus a spoken sentence. Both are auditory sequences, but each is complex in its own way. Because the former sequence is heard as “musical,” it may be processed differently from the latter sequence. Such putative separation across domains may therefore influence imitatibility and, consequently, many human studies focus on the structural complexity of rhythmic, melodic, and phonic combinations rather than on the relative difficulty of producing individual sounds.

An important ancillary consideration when evaluating the imitatibility of sounds is the flexibility of vocal production by the imitator. Obviously, an individual who can imitate a wide range of inputs must be able to engage in flexible vocal motor control. Flexibility in pitch range increases dramatically during childhood, and thus may play a large role in the development of pitch matching abilities in singing (Welch, 1979). Similarly, poor-pitch singers, who exhibit a general deficit of vocal imitation, also exhibit an apparent lack of flexibility in vocal imitation (Pfordresher & Brown, 2007). Poor-pitch singers also show a larger advantage for matching pitch from recordings of their own voice, in contrast to matching the vocal pitch of other singers, than do more accurate singers (R. Moore et al., 2008; Pfordresher & Mantell, 2014). Finally, when transferring from the imitation of one sequence to another, poor-pitch singers show a greater tendency to perseverate the previously imitated pitch pattern than do more accurate singers (Wisniewski, Mantell, & Pfordresher, 2013). Interestingly, this apparent lack of flexibility in poor-pitch singers does not appear to be based on vocal motor control in that poor-pitch singers exhibit similar pitch range and ability to control a sustained pitch as accurate singers (Pfordresher & Brown, 2007; Pfordresher & Mantell, 2009). Instead, their inflexibility seems to result from dysfunctional vocal imitation abilities.

Even when considering only the performance of adult humans, there is no fixed scale of most-to-least imitatible sounds or sound features. Nevertheless, it may be possible to generate a gross scale of different properties associated with sounds being more or less imitatible. For instance, sound features that are imperceptible or sounds (and sequences) with complex, aperiodic, novel acoustic structures are typically more difficult to imitate, whereas sounds that are routinely self-generated tend to be the easiest to reproduce. Interestingly, this scale is the inverse of the criteria that biologists have developed for identifying instances of vocal imitation. Specifically, production of highly imitatible sounds is generally considered to be the least compelling behavioral evidence of vocal imitation, whereas production of novel complex sounds (which are often less imitatible) is currently considered to be the most compelling evidence. Consequently, the sounds that an individual is most likely to be proficient at imitating are also the sounds that scientists are least likely to consider relevant to studies of vocal imitation. In fact, in the taxonomy of vocal learning abilities proposed by Janik and Slater (2000), some sounds are inherently impossible to imitate; by their definitional criteria, an individual cannot imitate any sound that is already within the individual’s vocal repertoire. This constraint arises from the fact that they view vocal imitation as a learning mechanism. If vocal imitation is viewed as vocal reenactment, however, then individuals can potentially imitate any sound. This includes their own vocalizations, a process referred to as self-imitation (Pfordresher & Mantell, 2014; Repp & Williams, 1987; Vallabha & Tuller, 2004).

Studies of intentional vocal imitation in humans are beginning to shed new light on how sound imitatibility varies within and across individuals. They have yet to reveal, however, why sound imitatibility varies. If a person is particularly good at imitating a family member’s voice that is similar to his or her own, is this because the person possesses an adaptively specialized module that is tuned to the specific features of sounds produced by relatives? Is it because shared genetics have led to similar vocal organs? Or, is it because the person aspires to be like that family member and has practiced copying particular mannerisms of their role model’s vocal style over many years? To a large extent, the imitatibility of a sound depends on what resources the listener brings to bear for perceiving, encoding, and producing sounds. A clearer understanding of the physical and mental mechanisms relevant to increasing the imitatibility of sounds can be gained by examining those individuals who have reached the highest levels of performance—professional imitators.

Expertise in Sound Imitation

If, as we claim, voluntary imitation of sounds is a cognitive skill, then it should be possible to improve imitation abilities with training. However, if sound imitation is more of an innate capacity, then individual variations in ability should be less dependent on experience. Earlier claims that vocal imitation involves feedback-based error correction (Heyes, 1996; N. E. Miller & Dollard, 1941) predict that the fidelity with which particular sounds are imitated should increase incrementally as the number of comparisons between produced vocalizations and remembered targets increases. However, studies of the vocal imitation of pitch in singing have not shown any improvements across repeated trials in which participants attempted either to match the same pitch vocally (Hutchins & Peretz, 2012), or to repeatedly imitate the same spoken or sung sequence (Wisniewski et al., 2013). Likewise, efforts to enhance pitch imitation accuracy by having participants sing along with the correct sequence (auditory augmented feedback) have yielded mixed results and may even degrade the performance of poor-pitch singers (Hutchins, Zarate, Zatorre, & Peretz, 2010; Pfordresher & Brown, 2007; Wang, Yan, & Ng, 2012; Wise & Sloboda, 2008). Anecdotally, it is clear that individuals can improve their vocal imitation abilities through instruction and practice. However, simply relying on error correction based on auditory feedback may not suffice. More successful methods of augmented feedback involve showing the singer a graphical display of the imitated and target pitches as on-screen icons, with changes to sung pitch influencing the spatial proximity of these displays (Hoppe, Sadakata, & Desain, 2006).

Anecdotally, evidence that learning experiences can strongly determine sound imitation abilities comes from the performances of professional musicians, who often train and practice for decades to achieve the control necessary to produce particular sound qualities (e.g., features such as vibrato or breathiness). Often, musical training focuses on teaching students how to produce higher quality sound sequences. This generally means the student must learn to reproduce the features of sounds commonly produced by more proficient musicians. The fact that many professional musicians spend several hours a day performing exercises to maintain and enhance their musical skills attests to the important contributions of practice to their ability to flexibly and accurately reproduce sounds in a prescribed way.

A second domain in which imitative skills appear to be refined through practice is the learning or copying of non-native languages. Much of the difficulty in learning a new language relates to learning to produce speech sounds to match some pre-established standard. The ability to imitate foreign languages varies considerably across individuals (Golestani & Zatorre, 2009; Reiterer et al., 2011), and is predicted by levels of articulatory flexibility and working memory capacity (Reiterer, Singh, & Winkler, 2012). Professional actors may learn to reproduce a wide range of dialects or even foreign languages that they do not speak when performing dialogue. What exactly are second language learners or professional actors learning in these situations? In part, they seem to be learning which features of speech sounds and vocal gestures they need to copy. Importantly, speakers do not need to learn the necessary adjustments for each word within a language, but can immediately apply what they have learned to many novel words and sentences. In some cases, subtle distinctions in speech sounds may be extremely difficult for a non-native speaker to imitate (Ingvalson, Holt, & McClelland, 2012; Lim & Holt, 2011; Reiterer et al., 2011), again attesting to the important role that experience plays in an adult’s ability to vocally imitate, even when the sounds being imitated are the naturally occurring speech of other humans.

Learning language or musical skills might depend more on developing expertise in particular perceptual-motor acts or on gaining knowledge about symbols and rules than on improving sound imitation abilities. Some entertainers have more explicitly developed expertise in sound imitation, however, including professional impersonators, tribute artists, and vocalists described as beatboxers. These performers all specialize in reproducing speech or musical sound sequences. For example, beatboxers interleave imitations of both percussive and vocal elements of electronically or acoustically generated sound sequences, often using novel modes of sound production to capture key features of the musical sequences being imitated. Like other musicians, these expert sound imitators gain their unique skills through extensive directed practice and performance.

Recently, researchers have found that professional speech impersonators match the general pitch of the fundamental, temporal variations in the fundamental, speaking rate, prosody, formant structure, and the timbre of model speakers (Amin et al., 2012; Eriksson, 2010; Eriksson & Wretling, 1997; Majewski & Staroniewicz, 2011; Revis et al., 2013; Zetterholm, 2006). Impersonators match the timing of speech sounds at the sentence or prosodic level rather than at the word level (Eriksson & Wretling, 1997; Liberman & Mattingly, 1985; Revis et al., 2013), and vary considerably in their ability to match particular features. Compared to amateurs, professional impersonators are more aware of differences between their vocalizations and those of a target, and are better able to emphasize features that are likely to be salient to listeners (Revis et al., 2013). Interestingly, expert impersonators, like caricaturists, often exaggerate features of copied sounds such that imitations judged to be most accurate by listeners generally do not exactly match the acoustic features of the model (Majewski & Staroniewicz, 2011; Zetterholm, 2006). In fact, when amateur impersonators imitated models, the acoustic properties of their imitations more closely matched the speech of models, but listeners nevertheless judged these attempts as worse copies of the models than those produced by professionals (Majewski & Staroniewicz, 2011). These acoustic experiments show that expert vocal imitators copy and adjust sounds along multiple acoustic dimensions in parallel, and can do so even when producing novel speech sequences that incorporate speech sounds/words that differ from those of the model.

Collectively, evidence from expert imitators suggests that the enhanced sound imitation abilities of adult humans reflect a protracted learning process that can extend over decades. This raises questions about whether differences in imitation abilities across species might reflect differences in training histories rather than (or in addition to) differences in adaptive specializations. A related possibility is that constraints on sound imitation in non-humans may reflect differences in cognitive plasticity across species (Mercado, 2008), such that even with comparable training histories and the same cognitive mechanisms available, some species may be better able to acquire the cognitive skills necessary for flexible sound imitation. Neither of these accounts requires that humans possess any adaptively specialized “extra parts” to account for cross-species differences in vocal imitation abilities.

Synthesis

Past assessments of the sound imitation abilities of nonhuman primates have been unequivocally dismissive. For instance, Hauser (2009, p. 304) states that, “monkeys and apes . . . show no evidence for vocal imitation. There is no capacity (and it has been fifty years of intensive looking by primatologists), absolutely no evidence for vocal imitation.” Although there is evidence that adult non-human primates may modify their vocalizations so that they are more similar to those of other individuals within a group (reviewed by Egnor & Hauser, 2004; Owren, Amoss, & Rendall, 2011), referred to as vocal convergence or vocal matching, it is unclear whether such convergence is the result of learning or genetics. Although non-human primates do not vocally imitate sounds to the same extent as humans, they do have some capacity to represent a subset of sounds in ways that enable them to non-vocally imitate those sounds. If flexible sound imitation abilities are cognitive skills learned through practice, as the evidence from adult humans suggests, then a non-human primate might need significant pedagogical guidance over many years before flexible sound imitating abilities are evident (see also Pepperberg, 1986). It seems clear, nevertheless, that non-human primates only rarely overtly imitate sounds in naturalistic contexts. Consequently, studies of imitation in monkeys and apes may be less informative than studies of other mammals that more regularly imitate sounds. Few mammals other than humans have shown the ability to voluntarily imitate sounds, which has led some researchers to suggest that vocal imitation requires unique, human-specific neural and cognitive processing mechanisms. 
By examining the situations faced by those rare mammalian species that are known to naturally voluntarily imitate sounds, one can potentially gain insights into the representational demands that might lead to the kinds of internal processes that would provide an organism with flexible sound imitation abilities. Identifying similarities and differences between the imitation abilities of humans and non-humans can thus provide important clues about the nature of the mechanisms that determine imitative proficiency and proclivity.

IV. Sound Imitation by Whales and Dolphins

The only mammalian order that includes multiple species with the apparent ability to flexibly imitate sounds is cetaceans. In the following section, we review the evidence for sound imitation abilities in cetaceans in detail, considering not only the strengths and weaknesses of this evidence, but also how it compares to findings from human research. Cetaceans provide a particularly important test bed for examining the origins of imitative abilities as well as the mechanisms that underlie such abilities, because although they have diverged in many ways from terrestrial mammals, they seem to possess cognitive capacities that are similar in certain respects to those of humans (Herman, 1980; Marino et al., 2007; Mercado & DeLong, 2010). For instance, bottlenose dolphins are the only mammals other than humans that have demonstrated the ability to voluntarily imitate both seen and heard actions (Herman, 1980, 2002; Kuczaj & Yeater, 2006; Yeater & Kuczaj, 2010). Humpback whales are the only non-human mammals that continuously and collectively restructure their vocal repertoire throughout their adult lives (Guinee, Chu, & Dorsey, 1983). Given that researchers are severely limited in their ability to observe and conduct experiments with cetaceans, the prevalence of observations indicative of cetacean sound imitation abilities is noteworthy. The following subsections focus on the sound producing abilities of belugas (Delphinapterus leucas), orcas (Orcinus orca), humpback whales (Megaptera novaeangliae), and bottlenose dolphins, four cetacean species often described as vocal imitators. Evidence suggestive of vocal learning and imitative abilities has been reported for other cetacean species (DeRuiter et al., 2013; May-Collado, 2010; Rendell & Whitehead, 2003), and thus the four species emphasized here are best viewed as a sample of convenience.

Flexible Sound Production Mechanisms May Enhance Imitative Capacities

Vocal flexibility is a key aspect of vocal imitation and may be a prerequisite for vocal imitation abilities (Arriaga & Jarvis, 2013; Deacon, 1997; Fitch, 2010; Mowrer, 1960). Like most mammals, cetaceans can produce sounds using both internal organs and other body parts, referred to as vocalizations/phonations and percussive sounds respectively. Some researchers have questioned using the term vocalization to describe cetacean sounds because, unlike terrestrial mammals, most cetaceans do not appear to use vocal folds to produce sounds (Cranford et al., 2011; however, see Reidenberg & Laitman, 1988, 2007). For those cetaceans that produce sounds nasally rather than vocally (which includes belugas, orcas, and dolphins), it would be more accurate to say that they possess nasal imitation abilities. As noted above, the term sound imitation avoids such complications because it does not specify how sounds are reproduced.

Cetacean vocalizations have traditionally been classified first by suborder (i.e., baleen whales vs. toothed whales), and then either by the acoustic features of the sounds perceived by the investigator, or in terms of proposed functional classes. The vocalizations of toothed whales have been classified into three aurally defined categories: clicks, whistles, and burst-pulse sounds. Clicks are often associated with echolocation, whereas whistles and burst-pulse sounds are often associated with communication (Herman & Tavolga, 1980; Janik, 2009a). Baleen whale vocalizations have often been described as being very different from toothed whale vocalizations and as much more difficult to classify (Edds-Walton, 1997). Distinctions have been drawn between calls and songs (Clark, 1990), and between different kinds of calls (e.g., moans, cries, grunts, and pulse trains). Edds-Walton (1997) categorized baleen whale sounds into three functional/contextual categories: contact calls, winter (breeding) vocalizations, and social sounds.

Popper and Edds-Walton (1997) suggested that the vocalizations of both toothed and baleen whales could be collectively classified into three discrete categories based on their acoustic features: tonal or narrow-band whistles or moans, pulsed sounds, and broadband clicks. However, other analyses suggest that these three subjective categories represent points along a continuum of pulsed vocalizations, with clicks corresponding to low-rate pulse trains, pulsed sounds to medium-rate pulse trains, and “whistles” to high-rate pulse trains (Killebrew, Mercado, Herman, & Pack, 2001; Mercado, Schneider, Pack, & Herman, 2010; Murray, Mercado, & Roitblat, 1998). In this context, the term whistle is a misnomer, because the mechanism of sound production is the same as that of clicks and pulsed sounds, namely vibrating membranes. This interpretation has recently been experimentally confirmed in bottlenose dolphins (Madsen, Jensen, Carder, & Ridgway, 2012). Observational evidence shows that cetaceans can continuously modulate sounds along this pulse-rate continuum, much as professional human singers do when producing pitches across a wide range (Mercado et al., 2010; Murray et al., 1998). In other words, the vocal repertoire of cetaceans is graded rather than discrete, and vocal control in cetaceans is generally comparable to that of human singers; click trains are analogous to vocal fry, pulsed sounds share features of sung vowels, and whistles are comparable to the sounds a soprano might produce when singing in the whistle register.
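
The relationship between pulse rate and tonal quality described above can be illustrated with a small numerical sketch. This is not an analysis from any of the cited studies; the function names (`pulse_train`, `magnitude_at`), the sample rate, and the two pulse rates are hypothetical values chosen only to keep the arithmetic simple, and they are far lower than the rates found in real cetacean sounds. The sketch relies on a standard property of periodic impulse trains: their spectral energy is concentrated at integer multiples of the pulse rate, which is why a sufficiently rapid pulse train can be heard as a tone.

```python
import cmath
import math


def pulse_train(pulse_rate_hz, sample_rate_hz, duration_s):
    """Return samples that are 1.0 at each pulse onset and 0.0 elsewhere.

    Assumes sample_rate_hz is an exact integer multiple of pulse_rate_hz.
    """
    n_samples = int(round(sample_rate_hz * duration_s))
    period = sample_rate_hz // pulse_rate_hz  # samples between pulse onsets
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def magnitude_at(signal, freq_hz, sample_rate_hz):
    """Magnitude of the discrete Fourier sum of `signal` at a single frequency."""
    total = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate_hz)
        for i, x in enumerate(signal)
    )
    return abs(total)


SAMPLE_RATE = 8000  # Hz (illustrative assumption)
DURATION = 0.1      # seconds (illustrative assumption)

# Same generator, different pulse rates: two points on one continuum.
low_rate = pulse_train(20, SAMPLE_RATE, DURATION)    # sparse, click-like train
high_rate = pulse_train(400, SAMPLE_RATE, DURATION)  # dense, tone-like train

# Energy of the dense train at its pulse rate versus at an unrelated frequency.
on_harmonic = magnitude_at(high_rate, 400, SAMPLE_RATE)
off_harmonic = magnitude_at(high_rate, 300, SAMPLE_RATE)
```

In this sketch the only parameter that differs between the click-like and tone-like signals is the pulse rate, and the dense train carries its energy at its pulse rate (`on_harmonic`) rather than at an unrelated frequency (`off_harmonic`), which is consistent with the view that the three aural categories are points along a single continuum.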

A key difference between how human singers typically vocalize and the ways that cetaceans vocalize is that some cetacean species can control two independently vibrating sources simultaneously (Cranford et al., 2011). In this way, the vocal flexibility of sound production is greatly increased for cetaceans relative to other mammals, and is more comparable to the dual syringeal production mechanisms used by many singing birds. Much less attention has been given to studies of percussive sounds made by cetaceans (e.g., rhythmic tail slapping), so little is known about how flexibly cetaceans might use these sound producing modes. In terms of vocal range, cetaceans as a group are unmatched among mammals. A humpback whale can produce sounds lower than any human male singer as well as sounds higher than the highest pitches sung by professional sopranos (Mercado et al., 2010). Dolphins can also produce a wide variety of tonal sounds as well as ultrasonic clicks (Au, 1993). The various species described below differ in their specific vocal skills and repertoires, but all show greater flexibility in sound production than any mammal other than humans. Thus, in contrast to the situation with non-human primates, there is no question about whether cetaceans have the dexterity necessary to imitate many acoustic features.

Speech-Like Sound Production by Belugas

The repertoire of vocal sounds produced by belugas has been evaluated both in captive animals (Vergara & Barrett-Lennard, 2008) and wild populations (Chmelnitsky & Ferguson, 2012), and is historically considered to be one of the most varied of all cetaceans (Fish & Mowbray, 1962; Schevill & Lawrence, 1949). Like most toothed whales, they produce a wide range of pulsed sounds, many of which have been described as whistles or pulsed calls. In contrast to many other toothed whales, the graded structure of beluga sounds has been consistently noted in past studies (Chmelnitsky & Ferguson, 2012; Karlsen, Bisther, Lydersen, Haug, & Kovacs, 2002; Sjare & Smith, 1986). Belugas appear to be able to produce two independent calls simultaneously (Chmelnitsky & Ferguson, 2012), consistent with reports from other highly vocal toothed whales. Like most cetaceans, belugas are thought to vocalize primarily to echolocate or to socially communicate. However, assessments of the functionality of sounds (other than click trains) have been limited mainly to observational studies in which different sound types were correlated with different social contexts (Panova, Belikov, Agafonov, & Bel’kovich, 2012; Sjare & Smith, 1986).

The social structure within groups of belugas appears to be fluid. They sometimes form large groups and there are indications that their sound repertoire varies with context and number of individuals present (Panova et al., 2012). There are no reports of belugas imitating sounds in the wild, but such behavior would be virtually impossible to detect. It is also unclear how easy sound imitation by belugas in captivity would be to identify. Nevertheless, there are at least two published reports of captive belugas producing speech-like sounds without explicit training. The first was a report of an adult male that was heard to produce his name: Logosi (Eaton, 1979). This beluga was described as being particularly interested in human visitors, spending much of his time near viewing windows. He was also described as producing sounds that resembled the “sound of human voices heard underwater.” Some listeners described these sounds as resembling Russian, Chinese, or garbled voices. A more recent report (Ridgway, Carder, Jeffries, & Todd, 2012) describes recordings of a second beluga spontaneously producing sounds that were “as if two people were conversing in the distance, just out of range for our understanding.” The temporal patterning of sound production was also found to be comparable to speech. Trainers were able to teach the beluga to “speak” on cue, so that the sound production mechanisms used could be examined more closely. When the beluga produced speech-like sounds, atypical modes of sound production were observed in which the beluga sequentially inflated two vocal sacs. More naturalistic evidence of sound imitation during development was reported for a captive beluga calf that appeared to adopt new modes of call production after being exposed to the novel calls of an adult male that was introduced into his environment (Vergara & Barrett-Lennard, 2008).
Researchers have speculated that belugas may copy the sounds of conspecifics to facilitate individual and group recognition or possibly to maintain social bonds (Janik & Slater, 1997).

One interesting feature of speech-like sound production by belugas is that humans speak in air, but the beluga’s auditory system is adapted for receiving sounds underwater. Consequently, it is difficult to know whether differences between speech-like sounds produced by belugas and those produced by humans reflect limitations in their ability to reproduce sounds, or correspond to the effects of distortion caused either by impedance mismatches at the air–water interface or by the beluga hearing the speech with its head out of the water. In other words, a beluga might be accurately replicating the sounds that it experienced and still sound like it was producing distorted or garbled speech. This ambiguity highlights the fact that the similarity of two sounds is observer dependent; two sounds that are “different” to one observer (or species) might be “the same” to another, or vice versa.

Orca Sound Matching: Imitation of Familiar Vocalizations?

Orcas, commonly referred to as killer whales, are the largest species of dolphin. Their vocal repertoire is similar in many respects to that of belugas, except that they produce relatively more intermediate pulse-rate calls than the higher pulse-rate “whistles” typical of belugas. Orcas also have been recorded producing two types of sound simultaneously (referred to as biphonic calls) more consistently than have belugas (Filatova et al., 2012; P. J. O. Miller, Shapiro, Tyack, & Solow, 2004). Much of the interest in orca vocalizations comes not from any particularly unusual features of their calls or their usage, but from the fact that stable social groups of orcas use a shared repertoire of sounds that is so consistent that recordings of particular sounds can be used to identify particular families of orcas (Deecke, 1998; Filatova et al., 2010; Ford, 1991; Weiß, Symonds, Spong, & Ladich, 2011). The predictability with which groups of orcas use certain sets of sounds with recognizable acoustic features has led many researchers to conclude that orcas within each group use a discrete library of 7–17 calls that is adopted by convention (Ford, 1991; Kremers, Lemasson, Almunia, & Wanker, 2012; Rendell & Whitehead, 2001; Strager, 1995). Field observations indicate that orcas use their sounds differentially depending on the social context (Ford, 1989; Hoelzel & Osborne, 1986; Thomsen, Franck, & Ford, 2002). Some of the call types appear to be shared across social groups, and overlap in repertoires has been used as an index of the relationships between distinct groups (Riesch, Ford, & Thomsen, 2006; Yurk, Barrett-Lennard, Ford, & Matkin, 2002).
Longitudinal analyses of call variations within particular groups suggest that the features of sounds within each group’s repertoire are being gradually modified over time, and that modifications are constrained in such a way that the differences in sounds used across groups are not increasing over time (Deecke, 1998; Grebner et al., 2011). It has been suggested that just as researchers can identify families of orcas from their call repertoire, the orcas themselves may use calls as signifiers of family membership. However, there is currently no evidence that orcas use sounds in this way. Recent studies suggest that orcas will match the calls of other orcas that they hear in vocal exchanges (P. J. O. Miller et al., 2004; Weiß et al., 2011).

As with belugas, observations suggesting that orcas have the capacity to imitate sounds have mostly been opportunistic. Orca calves in captivity may develop calls with features similar to those produced by their companions (Kremers et al., 2012); this has also been reported for adults (Ford, 1991). Orcas may also copy features of man-made sounds present in their environment (van Heel, Kamminga, & van der Toorn, 1982). There are some indications that orcas in the wild may imitate the calls of other orcas (Ford, 1991), or sounds produced by other marine animals such as sea lions (Foote et al., 2006). Interestingly, apparent reproductions of sea lion barks by a wild orca matched not only the features of individual sounds, but also the rhythmic production of repeated sounds typical of sea lions (Foote et al., 2006). The orca that was observed producing sea lion–like sounds was separated from its family group at a young age, which may have affected this individual’s auditory experiences during vocal development.

Call matching and call sharing are generally not viewed as clear instances of vocal imitation (Egnor & Hauser, 2004; P. J. O. Miller et al., 2004; Tyack, 2008). Instead, such repertoire sharing is usually described either in terms of dialect usage (Deecke, Ford, & Spong, 2000) or vocal contagion (Andrew, 1962). This interpretation leads to the somewhat odd situation that if an orca replicates a call that it just heard (call matching) this would not qualify as an imitative act, but if it were to bark like a sea lion in response to that same call, then this would qualify as sound imitation (albeit deferred), because orcas do not normally bark. Although call matching could potentially be explained as vocal contagion, or as a case of an orca selecting a known call from its repertoire, it is important to keep in mind that these possibilities do not compel the inference that an orca matching another’s call is not imitating that sound. The orca matching a call could be doing so by copying features of the call it recently heard. The presumption that a call should only be classified as imitative when all other alternative possibilities have been excluded lacks parsimony. If there is evidence that orcas can imitate sounds, and no evidence that they reactively produce calls of a particular type whenever they hear them (as is typical of vocal contagion), then the “simplest” explanation is the one for which there is evidence.

Convergence in Humpback Whale Singing

Humpback whales produce sounds in ways that differ substantially from how dolphins, belugas, and orcas produce sounds, and that are more similar to vocal production by terrestrial mammals (Cazau, Adam, Laitman, & Reidenberg, 2013; Mercado et al., 2010; Reidenberg & Laitman, 2007). The sounds produced by humpback whales are also subjectively quite different from those used by belugas or orcas. Humpbacks do not produce short duration ultrasonic clicks, and their sounds are not commonly classified as being either whistles or pulsed calls. Recent acoustic (Mercado et al., 2010) and anatomical (Reidenberg & Laitman, 2007) analyses suggest, however, that many of the qualitative aural differences in the sounds used by humpback whales reflect quantitative differences in the size and configuration of their vocal organs rather than mechanistic differences in how they produce sounds. Humpback whales do produce click trains (Mercado et al., 2010; Stimpert, Wiley, Au, Johnson, & Arsenault, 2007), but their clicks are much longer in duration and lower in frequency than those used by delphinids⁴. Many of the sounds produced by humpbacks are acoustically comparable to the pulsed calls used by orcas, but shifted to lower pulse rates. Humpback whales also produce higher-frequency tonal sounds, referred to as “chirps” or “cries,” that are comparable to the “whistles” produced by toothed whales, but with fundamental frequencies an octave or two lower. As in toothed whales and human singers, the sounds of humpback whales fall along a graded continuum that corresponds to modulations of the pulse rate produced by vibrating membranes (Mercado et al., 2010).

Despite these similarities in the acoustic properties of the sounds produced by humpback whales and toothed whales, there are some key differences in the ways that humpbacks use sounds. Most notably, the repertoire of sounds that a particular humpback whale uses varies from one year to the next (Mercado, Herman, & Pack, 2005; K. Payne & Payne, 1985). More famously, humpback whales rhythmically produce sounds in stereotypical sequences for hours with no break, a behavior that has traditionally been described as singing (R. S. Payne & McVay, 1971). During singing bouts, an individual whale may gradually or rapidly expand or compress the spectrotemporal features of sounds, shift them into different frequency bands, or vary the rate and elemental structure of sequences of sounds (K. Payne, Tyack, & Payne, 1983). The repertoire of sounds produced within songs changes annually such that in each year some distinctive sounds are often no longer evident and others that have not previously been recorded may be prevalent (K. Payne & Payne, 1985). Singing by humpback whales is one of the most dramatic displays of vocal flexibility in any species.

There are no scientific reports of humpback whales reproducing the sounds of other species or man-made sounds. Nevertheless, humpback whale singing is often described as providing the clearest and most impressive evidence of vocal imitation among all cetaceans (Herman, 1980; Janik, 2009b). This is because singing humpback whales in a particular region produce similarly structured songs, despite annual changes in songs. It has been argued that humpback whales must be copying the songs they hear being produced by neighboring whales to maintain regional song similarity (Janik & Slater, 1997; Noad, Cato, Bryden, Jenner, & Jenner, 2000; Rendell & Whitehead, 2001; Tyack, 2000). Consistent with this idea, singers may change the features of their songs after being exposed to novel songs. For example, over a period of a year, whales along the east coast of Australia gradually adopted the songs of a separate population of whales from the west coast, essentially abandoning their original song features in favor of those present within the novel song (Noad et al., 2000). An obvious explanation for such rapid turnover is that whales on the east coast of Australia copied the songs of whales from the west coast.

A musician recently collected further evidence that humpback whale singers alter their songs based on the sounds they experience when he attempted an improvisational duet with a singing humpback whale (Rothenberg, 2008). Rothenberg used an underwater speaker and hydrophone to create a two-way sound channel with a nearby singing whale. By broadcasting clarinet sounds underwater in coordination with the singing whale’s sound production, Rothenberg was apparently able to induce the singer to modulate features of its song in ways that matched aspects of the clarinet sounds. A more conventional, non-interactive playback study also found evidence that singers modify their songs based on the features of other songs they hear in their environment (Cholewiak, 2008). Although neither of these acoustic interventions provides clear evidence of sound imitation by humpback whales, they both suggest that singing humpback whales can flexibly adjust their sound production in real time based on sounds they have recently experienced.

If singing humpback whales are copying song features produced by other whales, then this is a rather sophisticated case of deferred sound imitation. First, the songs produced by humpbacks usually last 15 minutes or more, and typically contain 100+ individual sounds produced in five to seven different sequential patterns. A singer would need to encode, retain, and recall multiple properties of an experienced song in order to be able to incorporate these features into an existing song5. Second, songs produced by an individual whale on any given day can vary considerably in duration and content, and do not always include all of the regionally prevalent patterns. In other words, individual whales hear and produce multiple renditions of songs that vary in numerous ways (e.g., the number and variety of sounds, which patterns are included, the number of times patterns are consecutively repeated, etc.). Third, singers in many locales will often be exposed to songs from multiple singers simultaneously. To encode songs received in such complex auditory scenes, singers would need to selectively attend to the songs of individual singers while simultaneously hearing other similar songs at different stages within the sequence, possibly including their own song. Finally, a singing humpback whale would need mechanisms for comparing its own current song with other songs to determine how the songs differ, the kinds of changes required to make the singer’s song more similar to those it hears, and whether such changes are warranted. Baleen whales have generally been viewed as cognitively unsophisticated compared to their toothed relatives. However, the perceptual, memory, and attentional processes required to continuously update song features across decades suggest that humpback whales, at least, possess auditory and sound generating capacities that may match or exceed those of delphinids.

Multidimensional Sound Imitation by Bottlenose Dolphins

The sound producing capacities of bottlenose dolphins have been studied more extensively than those of all other cetaceans combined. Much of this work has focused on understanding how dolphins use ultrasonic signals to echolocate (Au, 1993), or on how they use whistles to communicate (Janik, 2000, 2009a; Tyack, 2000; Tyack & Clark, 2000). Like belugas and orcas, bottlenose dolphins produce a variety of sounds and can produce multiple sound types simultaneously. Unlike the fortuitous observations of belugas spontaneously producing speech-like sounds in captivity, and of orcas producing sea lion–like sounds in the wild, the first indications that dolphins could imitate sounds from outside their typical repertoire came from laboratory studies6. Lilly (1963) described hearing “queer noises” while conducting brain stimulation experiments designed to investigate basic mechanisms of motivation and reward (Lilly, 1958). Recordings used to dictate notes during the neuroscience experiment revealed that some of the sounds being produced were similar to other sounds on the recordings, including laughter and vocal dictations. These early reports that dolphins appeared to be imitating man-made sounds were initially viewed as implausible (Lilly, 1963). Lilly subsequently performed several behavioral experiments designed to explore whether dolphins could learn to reproduce arbitrary sounds (Lilly, 1961, 1965, 1967, 1968; Lilly, Miller, & Truby, 1968). 
He discovered that: (1) dolphins could repeat properties of acoustic sequences on command (e.g., matching the number, rate, and rhythm of sound bursts); (2) dolphins typically did not replicate the sounds exactly, but instead reproduced only a subset of features, for instance by speeding up frequency-modulation rates and transposing frequencies into a more natural range; (3) novel vocalizations learned by one dolphin were sometimes copied by companion dolphins; (4) an adult dolphin was able to learn to copy features of arbitrary sound sequences produced by humans in as little as 2 hours and immediately transferred this ability to copying sounds from tape recordings; (5) dolphins were willing to reproduce sound sequences without any food reinforcement; (6) given repeated presentations of a word or sequence, dolphins naturally modulated their production across repetitions, gradually improving the match of a subset of features; (7) dolphins persevered in reproducing sounds longer if there were natural variations in the targets than if the sound was reproduced exactly (e.g., by repeatedly playing back a recording of a stimulus); and (8) four of four dolphins were able to learn such tasks with varying fidelity. Although many have questioned the rigor and objectivity of Lilly’s sound imitation experiments, particularly his reports that dolphins were imitating human speech, several of his observations regarding sound imitation by dolphins have since been independently confirmed.

Figure 7. Spontaneous vocalizations produced by a bottlenose dolphin after broadcasts of computer-generated tonal sounds show that dolphins initially imitate subcomponents of the experienced sound (top three images) before producing a more complete copy (adapted from Reiss & McCowan, 1993; Figure 3). Gray lines show spectrographic contours and harmonics of the broadcast sound, and black lines show the contours and harmonics of the dolphin’s sounds. Arrows point to components of the target sound that are similar to the sound produced by the dolphin.

Anecdotal reports of dolphins spontaneously producing “unnatural” sounds similar to ones they were exposed to in their surroundings provided additional evidence that dolphins could modify their vocalizations to match environmental features (Caldwell & Caldwell, 1972; Tayler & Saayman, 1973). More formal studies of spontaneous imitation in dolphins later confirmed that they reproduced components of computer-generated whistles after as few as 2–20 exposures (Hooper, Reiss, Carter, & McCowan, 2006; Reiss & McCowan, 1993), and that dolphins replicated not only individual sounds, but also rhythmic patterns of sounds (Crowell, Harley, Fellner, & Larsen-Plott, 2005). In the spontaneous imitation studies conducted by Reiss and colleagues, some electronic whistles were associated with the introduction of toys into the tank and others were presented alone. Dolphins reproduced the sounds in both cases, but were more likely to do so (and with higher fidelity) when the sound had been paired with a toy (Hooper et al., 2006). As noted by Lilly (1963), the dolphins often transposed novel sounds and compressed them in time when reproducing them (Hooper et al., 2006; Reiss & McCowan, 1993). Additionally, the dolphins’ initial copies of electronic sounds contained only subcomponents of those sounds, which were later combined (Figure 7). In some cases, components of separate sounds were recombined to create novel sounds that the dolphins had never used or experienced previously (Reiss & McCowan, 1993). Reiss and colleagues found that dolphins reproduced sounds immediately after a sound was broadcast and also at later times. Kremers, Jaramillo, Boye, Lemasson, and Hausberger (2011) recently reported that captive dolphins could be heard producing sounds at night that were reminiscent of humpback whale sounds that were broadcast as part of public shows during the day. 
In this case, the sounds were transposed from the low frequency range produced by humpback whales into a range more typical of dolphin sound production. Dolphins appeared to match both the harmonic structure of the humpback whale calls, as well as their duration and direction of frequency modulation.
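
What transposition preserves can be made concrete with a small numerical sketch. The example below is illustrative only and is not taken from any of the studies cited: the contour values, the four-octave shift, and the helper names `transpose` and `sweep_directions` are all assumptions. It shows that multiplying every frequency in a contour by a constant factor moves the sound into a higher range while leaving the direction of frequency modulation and the ratios between frequencies unchanged.

```python
def transpose(contour_hz, octaves):
    """Shift a frequency contour by a number of octaves (one octave doubles frequency)."""
    factor = 2.0 ** octaves
    return [f * factor for f in contour_hz]


def sweep_directions(contour_hz):
    """Return +1, -1, or 0 for each step of the contour: rising, falling, or flat."""
    return [(b > a) - (b < a) for a, b in zip(contour_hz, contour_hz[1:])]


# Hypothetical low-frequency contour sampled at equal time steps (values in Hz).
model = [200.0, 250.0, 400.0, 300.0]

# Hypothetical copy shifted four octaves upward (a factor of 16).
copy = transpose(model, octaves=4)

# The copy occupies a different frequency range but rises and falls at the same points,
# and the ratio between any two points in the contour is unchanged.
same_direction = sweep_directions(model) == sweep_directions(copy)
same_ratio = (copy[2] / copy[0]) == (model[2] / model[0])
```

Because the number of samples is unchanged, the duration of the contour is also preserved in this toy case, which is the pattern of matching reported for the dolphins' copies of humpback whale sounds.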

In one of the most controlled experimental studies of sound imitation to date, Richards, Wolz, and Herman (1984) found that dolphins were able to learn to imitate computer-generated sounds on command. As reported by Lilly (1967) and Reiss and McCowan (1993), Richards and colleagues found that dolphins spontaneously imitated sounds before being trained to do so, rapidly learned to generalize the sound imitation task to novel sounds, transposed reproductions into a preferred vocal range, and gradually improved their copies of sounds across trials (Richards, 1986; Richards et al., 1984). Sigurdson (1993) also succeeded in training dolphins to reproduce specific frequency-modulated sounds, but only after extensive training. He concluded that the dolphins initially copied more general features of sounds, and then afterward learned to control details of sound structure through a process of vocal shaping. Richards and colleagues found that once a dolphin settled on a particular mode of imitating a sound, imitations on subsequent trials were quite stable (Figure 8). Such stable renditions of specific targets can provide important information about the acoustic features that the dolphin attended to, as well as the precision with which dolphins can replicate these features. For instance, Figure 8 shows that a dolphin closely matched the duration of targets as well as the range of frequencies produced. The dolphin also more closely matched the final spectral properties of target sounds than earlier components. Although dolphins sometimes transpose sounds when imitating them, Figure 8 shows that they can also precisely match absolute pitch. Similarly, although they may expand or compress spectrotemporal properties of a heard sound, they can also closely approximate rates of frequency modulation within sounds. In fact, Richards and colleagues noted that the dolphin even imitated transient distortions produced by the underwater speaker at the onsets of certain sounds.

Figure 8. (a–d) Sound reproductions produced in experimental tests of a dolphin’s imitation abilities across trials show that sound production is reliable across multiple repetitions and that the dolphin is more likely to replicate some features than others (adapted from Richards et al., 1984; Figure 4). Gray lines show spectrographic contours of four different broadcast sounds, and black lines show the contours of the dolphin’s sounds on multiple trials for each of the sounds.

Research on dolphins is unique in that, although relatively few experiments have been conducted, dolphins have consistently shown sound generating capacities that have yet to be observed in any other non-human species. Various songbirds are able to reproduce environmental sounds with astonishing fidelity (Dalziell & Magrath, 2012), but none have shown the ability to replicate electronic sounds on command, transpose copied sounds into a more appropriate range, or flexibly match the number, rhythm, and rate of sounds across trials. Dolphins are also the only non-human mammal that is known to spontaneously reenact observed episodes integrating both actions and sounds. For instance, in an early anecdotal report, dolphins were seen to imitate a scuba diver cleaning algae from the window of their tank. Not only did the dolphins use an object to scrub algae off the window, but they also released bubbles in bouts while doing so and made sounds described as being almost identical to those of the diver’s air-demand valve (Tayler & Saayman, 1973). Such performances strongly suggest that dolphins can flexibly reproduce sounds other than those in their natural repertoire and will occasionally do so in contexts where sound imitation serves no obvious functional purpose.

Richards (1986) argued that the flexibility with which dolphins could imitate novel sounds in controlled experiments indicated that they possessed a generalized concept of imitation that extended to absolute frequency, relative frequency, amplitude modulation, and inadvertent click transients. Lilly (1963) had earlier noted similar generalization of a copying task across rhythm, rate, number, Bronx cheers, and possibly speech. The range and specificity with which dolphins can imitate sounds has yet to be determined. Given the wide range of sounds that dolphins are known to be able to imitate, and the fact that they can match the timing, number, and durations of sound sequences, they are likely able to reproduce at least some sequences of sounds (Crowell et al., 2005). Dolphins might also automatically imitate idiosyncratic features of sounds produced by conspecifics (as has been observed in studies of human speech), although this has yet to be reported. Dolphins have been trained to produce a wide range of sounds on command and to produce matching sounds when they hear another dolphin produce them (Jaakkola, Guarino, & Rodriguez, 2010). Such performances traditionally have been viewed as instances of contextual learning, because the sounds the dolphins reproduce are not novel. However, no quantitative measures have been made to assess whether dolphins in these situations naturally adjust their vocalizations to match those of recently heard sounds.

Herman (1980) described primates and cetaceans as “cognitive cousins” because despite millions of years of evolutionary divergence within radically different environments, both groups appear to have converged on similar cognitive mechanisms for classifying, remembering, and discovering relationships between events (reviewed by Herman, 1980; Mercado & DeLong, 2010). In the case of sound imitation, this convergence is particularly noteworthy because humans and dolphins are the only mammals that have shown the ability to voluntarily imitate novel sounds. Above, we proposed that the ability of adult humans to imitate sounds is an acquired cognitive skill. Non-human primates do not appear to naturally acquire such skills (at least not vocally), raising the question of why dolphins would acquire a skill that other mammals typically do not. In section five, we suggest that the answer to this question may relate to cetaceans’ advanced perceptual use of sound underwater.

Synthesis

A century ago, researchers were optimistic that with the right training, enculturated chimpanzees would eventually be able to learn to reproduce speech sounds. At that time, the idea that a dolphin might be better at imitating sounds than a chimpanzee would have been considered absurd. Experimental studies have since shown that dolphins’ capacities for imitating sounds exceed those of all nonhuman primates, and opportunistic observations suggest that other cetaceans may share this capacity. The evidence for sound imitation abilities in other cetaceans is anecdotal, but remains stronger than for most mammals, including non-human primates. The extent to which adult cetaceans use their imitative abilities in their daily lives remains unclear. There have been no studies of automatic imitation in any cetacean. It is also not known how the fidelity with which cetaceans can copy different acoustic features varies either within or across species and individuals. The few laboratory studies of sound imitation by cetaceans to date have focused on showing that they can imitate sounds rather than on revealing how they are able to do this. The extent to which the sound imitation abilities of adult cetaceans depend on practice is uncertain, but clearly dolphins can learn to refine their ability to reproduce man-made sounds, and field observations suggest that they may also regularly reproduce conspecific sounds in natural interactions.

The willingness of dolphins to interact with humans in experimental contexts provides numerous opportunities for sound imitation studies that would be impossible to conduct with humans. Individuals can be trained across multiple years to perform a wide range of tasks. In principle, one might control a dolphin’s exposure to many complex acoustic events, including musical patterns and sequences of speech sounds. Most importantly, such studies potentially allow for cross-species comparisons that are not feasible with non-human primates. If the ability to imitate sounds depends on evolutionarily specialized processing, then one would expect that cetaceans’ abilities to imitate sounds should differ systematically from those of humans in ways that directly reflect the many differences in their ecological circumstances. If, however, these abilities are highly dependent on training and practice, then it might be possible to endow individuals from different species with similar imitative capacities by training them on tasks with similar demands. Given that the functions of sound imitation in adult mammals are poorly understood, it remains possible that cetaceans and humans evolved similar capacities to learn imitative skills because they faced similar perceptual or cognitive challenges. In the following section, we consider this possibility more closely.

V. Proposed Origins and Functions of Sound Imitation Abilities

When biologists and psychologists discuss vocal imitation as a learning mechanism, it invariably is in the context of explaining how and why individuals acquire communicative skills during development. Consequently, when it comes to explaining why some species have the ability to imitate sounds, many researchers focus on describing the benefits associated with effective communicative systems, such as enhanced mating opportunities, greater possibilities for complex social interactions, ability to identify familiar individuals, and so on. Although such explanations provide plausible reasons for why adaptations for sound imitation abilities might persist once they appear in a species, they are less able to account for why these abilities are so rare among terrestrial mammals.

The sophisticated sound imitation abilities of adult humans suggest that these abilities may be advantageous for reasons other than (or in addition to) learning to talk, such as predicting and perceiving the actions of others (Wilson & Knoblich, 2005). Below, we consider whether this might also be true for cetaceans. We conclude that current evolutionary and functional explanations for the prevalence of sound imitation abilities in cetaceans, which focus on the role of vocal learning in social communication, are inadequate, and propose an alternative explanation in which increasing perceptual-motor and cognitive demands related to non-visually guided movement coordination led to advanced sound localization abilities that are enhanced by sound imitation capacities. First, however, we review past attempts to explain why cetaceans evolved the ability to imitate sounds.

Do Mammals Imitate Sounds to Enhance Social Communication?

The main hypotheses typically proposed for why different cetaceans imitate sounds are that this ability: (1) enables group recognition and maintenance of group cohesion (e.g., in orcas); (2) aids in the learning of a vocal badge that can be used as a password for access to local resources (e.g., in bottlenose dolphins); (3) provides a way for males to increase the complexity of sound production, thereby increasing their attractiveness to females (e.g., humpback whales); (4) enables individuals to display their prowess and better fend off competing males; and (5) helps individuals to recognize each other in noisy environments (Janik & Slater, 1997; Tyack, 2000). Janik (1999) collapsed these possibilities into two global hypotheses: the sexual selection hypothesis and the individual recognition hypothesis. In both cases, the proposed driving force for the evolution of vocal learning and imitation abilities in cetaceans is a need to facilitate communication of either fitness or identity. Janik (2009a) further suggested that sound imitation abilities in cetaceans subserve complex communication mechanisms that are necessitated by complex social systems.

The idea that bottlenose dolphins evolved the ability to imitate sounds to enable them to develop individual-specific whistles (referred to as signature whistles) that serve as a vocal badge or naming signal (Fripp et al., 2005; Janik, 2000; Janik & Sayigh, 2013; Janik, Sayigh, & Wells, 2006; King, Sayigh, Wells, Fellner, & Janik, 2013; Quick & Janik, 2012) arose from early observations that captive dolphins in isolation often repeatedly produced a stereotyped whistle with distinctive features that were specific to the vocalizing individual (Caldwell & Caldwell, 1965; see Harley, 2008, for a review). It was later noted that in some situations, dolphins would produce a whistle that was highly similar to the signature whistle of a tank-mate; these whistles have been described as signature whistle imitations (Agafonov & Panova, 2012; Tyack, 1986). Researchers have hypothesized that a dolphin might imitate a signature whistle to communicate with or about specific individuals (Janik, 1999; King et al., 2013; Richards et al., 1984; Tyack, 1991).

It has also been suggested that sound imitation serves an important role during dolphin vocal development, enabling young bottlenose dolphins to acquire signature whistles that reflect their lineage (Sayigh, Tyack, Wells, & Scott, 1990; Sayigh, Tyack, Wells, Scott, & Irvine, 1995; Sayigh et al., 1999), or social affiliations (Fripp et al., 2005; Watwood, Tyack, & Wells, 2004). The role of vocalizing “tutors” in the vocal development of bottlenose dolphins is generally thought to be similar to what is seen in human children and songbirds (Reiss & McCowan, 1993). Observations of dolphins born in captivity support the idea that vocal development is shaped by the sounds dolphins experience in their surroundings (Caldwell & Caldwell, 1979; McCowan & Reiss, 1995; Miksis, Tyack, & Buck, 2002; Reiss & McCowan, 1993; Tyack & Sayigh, 1997). However, such observations provide little evidence that experience-dependent repertoire acquisition serves primarily to distinctively signify a vocalizing dolphin’s identity, or that sound imitation plays any role in such a process.

Whereas sound imitation abilities in toothed whales have been postulated to be important for the learning and development of acoustic identifiers, the apparent sound imitation abilities of humpback whales have been described as serving a role in sexual advertisement (Janik, 2009b; R. S. Payne & McVay, 1971; Smith, Goldizen, Dunlop, & Noad, 2008). For instance, Tyack and Sayigh (1997, p. 229) suggest that, in humpback whales, “vocal learning appears to function to produce more complex displays through sexual selection.” Janik (1999) suggested that the ancestors of humpback whales may have initially evolved sound imitation abilities for individual recognition functions, but that over time this ability came to serve a reproductive function.

Limitations of Current Evolutionary and Functional Hypotheses

A prevalent assumption regarding vocal learning and imitation in cetaceans is that because different species have divergent social systems, the origins and functions of sound imitation must be similarly diverse (Janik & Slater, 1997; Tyack & Sayigh, 1997). For instance, Tyack (2000, p. 307) speculated that, “it is possible that vocal learning7 evolved de novo in these different taxa as independent solutions to different problems posed by their different social organizations.” While it is certainly possible that different cetacean species developed sound imitation abilities independently in response to their particular social and reproductive pressures, it is also possible that the origins and functions of sound imitation in cetaceans are not as disparate as they might at first appear. For instance, Deacon (1997) hypothesized that cetacean sound imitation abilities are an exaptation of adaptations for skeletal motor control of airflow. In this scenario, new demands on motor control related to voluntary breathing gave rise to increased vocal flexibility, as well as a dissociation between mechanisms involved in producing reactive/emotive vocalizations and other more voluntarily produced sounds (see also Mithen, 2009). Deacon’s hypothesis makes no assumptions about the functions of either sound imitation or the sounds being imitated, and can potentially account for the emergence of vocal imitation abilities in all cetacean species as well as in humans. A limitation of his hypothesis is that it does not explain any benefits cetaceans might gain from imitating sounds. In fact, Deacon suggests that some mammals famous for imitating speech may have been showing signs of neural dysfunction.

Past proposals that vocal imitation is an evolutionary outcome of either sexual selection or adaptations for enhanced individual recognition suffer from several limitations. First, these hypotheses attempt to account for the emergence of vocal learning and imitation abilities in cetaceans in terms of hypothetical functions of the sounds cetaceans produce. However, the specific functions of most cetacean sounds have yet to be established experimentally. The hypothesis that humpback whale songs function as reproductive displays to attract females and repel males is based on circumstantial evidence (Frazer & Mercado, 2000; Mercado & Frazer, 2001), and does not account for many known behaviors of singing whales (Darling, Jones, & Nicklin, 2012; Darling, Meagan, & Nicklin, 2006; Stimpert, Peavey, Friedlaender, & Nowacek, 2012). Although there is substantial evidence that bottlenose dolphins produce whistles that humans can use to identify them (Harley, 2008; Janik & Sayigh, 2013), there is no evidence that this is a primary function of whistles or that dolphins have difficulty identifying other dolphins that are not producing signature whistles (McCowan & Reiss, 2001). Second, neither the sexual selection nor the individual recognition hypothesis leads to predictions other than those related to the speculated functions of a small subset of cetacean sounds. Consequently, these evolutionary hypotheses are little more than a restatement of pre-existing functional hypotheses. Third, the sexual selection and individual recognition hypotheses require one to assume either that all cetaceans have a common ancestor that developed sound imitation abilities for sexual or identification purposes, after which the functions of these abilities diverged dramatically across different species depending on their social systems (Janik, 1999), or that each species of cetacean independently evolved sound imitation abilities to meet its particular social needs (Tyack, 2000).
Why either of these scenarios might have occurred in cetaceans, but not other mammals, is unclear given that many mammals (e.g., primates and canids) often engage in complex social interactions in situations where visual information is limited.

A New Hypothesis: Imitatible Sounds Are More Localizable

Most current hypotheses regarding the origins and functions of sound imitation in cetaceans were originally developed as explanations for the evolution of song learning by birds (Thorpe, 1969; Thorpe & North, 1965). Here, we consider whether sound imitation abilities may provide adult cetaceans with other previously unsuspected benefits. Specifically, we assess the possibility that the capacity to imitate sounds might enable cetaceans to localize sound sources more accurately. In this scenario, sound imitation abilities may have appeared early in the evolution of cetaceans and then been preserved throughout the differentiation of species because the advantages of such capacities persisted despite differences in social organization and behavior.

As with the communication-focused hypotheses described above, the idea that vocal learning or imitation might enhance spatial perception was originally proposed to account for the evolution of song learning in birds (Morton, 1982, 1986, 1996, 2012). This hypothesis, referred to as the “ranging hypothesis,” states that a listening bird will be better able to estimate its distance from a singing bird if the listener can compare received songs with an internal representation of the song as it would appear at the source. Ranging is a perceptual process in which an individual uses a received sound (or sounds) to estimate the distance to the source of that sound. Sound transmission can degrade the acoustic features of a song. By comparing an undistorted representation of the song with the received song, the listener may identify how transmission has changed song features. Changes in songs caused by propagation are thought to be the primary cues that enable birds and mammals to estimate auditory distance (Naguib & Wiley, 2001). The ranging hypothesis thus suggests that the accuracy with which a listener can judge auditory distance is constrained by its ability to compare received songs with internal representations of “pristine” songs. The ability to imitate a received song, either overtly or covertly, gives the listener direct access to features of the song as they would appear at the source. Thus, the ability to imitate sounds could improve a bird’s ability to estimate auditory distance, which could give the bird a selective advantage in spatial interactions with competitors. Morton (1996) also suggested that male songbirds might selectively sing songs with acoustic features that make ranging difficult so that other males have problems locating them during territorial disputes.
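
The comparison at the heart of the ranging hypothesis can be illustrated with a toy calculation. The sketch below is not drawn from Morton's work or from any study cited here. It assumes a simple propagation model (spherical spreading plus absorption that is stronger at high frequencies), two hypothetical frequency bands, and invented absorption and level values. Under those assumptions, a listener holding an accurate template of the song as it would appear at the source can recover distance from the change in spectral balance even when the singer's loudness is unknown, whereas a listener with an inaccurate template misjudges the range.

```python
import math

# Hypothetical absorption coefficients in dB per metre (assumed values, for illustration only).
ALPHA_DB_PER_M = {"low": 0.01, "high": 0.05}


def received_levels(source_levels, distance_m, gain_db=0.0):
    """Toy propagation model: spherical spreading plus frequency-dependent absorption.

    gain_db stands for how much louder or quieter than the template the singer sang.
    """
    spreading = 20.0 * math.log10(distance_m)
    return {
        band: level + gain_db - spreading - ALPHA_DB_PER_M[band] * distance_m
        for band, level in source_levels.items()
    }


def estimate_distance(template, received):
    """Estimate range from the change in spectral tilt relative to the template.

    Spreading loss and overall loudness affect both bands equally, so they cancel
    when the high-minus-low difference is compared with that of the template.
    """
    tilt_template = template["high"] - template["low"]
    tilt_received = received["high"] - received["low"]
    excess_high_loss = tilt_template - tilt_received
    return excess_high_loss / (ALPHA_DB_PER_M["high"] - ALPHA_DB_PER_M["low"])


# Song as produced at the source (band levels in dB, hypothetical).
pristine = {"low": 100.0, "high": 95.0}

# The listener hears the song from 200 m away, sung 6 dB quieter than usual.
heard = received_levels(pristine, distance_m=200.0, gain_db=-6.0)

# A listener with an accurate internal representation recovers the distance.
good_estimate = estimate_distance(pristine, heard)

# A listener whose internal representation has the wrong spectral balance does not.
poor_template = {"low": 100.0, "high": 90.0}
poor_estimate = estimate_distance(poor_template, heard)
```

In this toy case the accurate template yields an estimate of about 200 m and the inaccurate one about 75 m, which mirrors the claim that ranging accuracy is constrained by the quality of the listener's representation of the "pristine" song.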

Playback studies in songbirds have tested the ranging hypothesis by comparing territorial birds’ responses to familiar and unfamiliar songs broadcast within and outside of a listener’s territory (Falls & Brooks, 1975; Morton, Howlett, Kopysh, & Chiver, 2006; Shy & Morton, 1986). Listening birds responded more aggressively to familiar songs produced inside their territory than to those outside of their territory (Shy & Morton, 1986). Listening birds also expended more energy searching when unfamiliar songs were produced outside of their territory, suggesting that they may have been less certain of the singer’s location. Finally, birds approached a playback speaker more closely when the song was familiar (Morton et al., 2006), indicating that they were better able to localize the speaker when it was broadcasting familiar songs. Although one cannot assume that all familiar songs are more imitatible than unfamiliar songs, if a song is familiar because it is within the listening bird’s repertoire, then it is likely to be highly imitatible.

Cetaceans are not generally territorial, but they often encounter situations in which precise spatial hearing is important, as evidenced by their use of echolocation. Echolocation differs from ranging in that an echolocating animal controls the sounds it uses to localize environmental features, whereas a ranging animal uses sounds produced by other animals to localize them. Possible links between the evolution of echolocation and the emergence of sound imitation abilities have been previously noted (Tyack & Clark, 2000), but have received little scientific attention. Applied to cetaceans, the ranging hypothesis suggests that sound imitation capacities may have developed in cetaceans for the same reason as echolocation—to enhance auditory spatial perception in a visually limited environment.

Determining the distance to a sound source might seem like a rather trivial ability, one that an organism could easily achieve through mechanisms less complex than sound imitation. Intuitively, one might suspect that simply looking at the source would usually solve the problem. When a source is not visible, as may often be the case for cetaceans, then variations in amplitude might appear to suffice (e.g., the quieter the sound, the farther the source). Amplitude cues are only grossly correlated with source distance, however, and for sounds propagating in the ocean such cues would provide little if any information about the trajectory of a vocalizing conspecific. The ambiguity of amplitude cues arises, in part, because individuals may vary how loudly they produce sounds and because sounds repeatedly reflect from the ocean surface and bottom, creating complex patterns of constructive and destructive interference. As a result, amplitude can fluctuate dramatically for reasons unrelated to variations in distance (e.g., Mercado & Frazer, 1999). It might also seem that if a species can echolocate, then additional mechanisms for locating other individuals would be redundant. Undoubtedly, cetaceans do sometimes use echolocation to range other animals. This is a much less efficient means of coordinating the movements of multiple individuals than passive localization, however, because it requires that every individual continuously echolocate in multiple directions to keep track of all the other individuals in a group. Furthermore, such active sound production would reveal the locations of all members of the group to prey or competitors, which is likely to be disadvantageous in many situations. 
Humpback whales, belugas, orcas, and bottlenose dolphins are all known to engage in sophisticated foraging strategies in which multiple animals must coordinate their underwater movements in three dimensions to corral prey (Connor, 2000; Wiley et al., 2011), and they often synchronize their movements within groups (Fellner, Bauer, & Harley, 2006; Perelberg & Schuster, 2008). Coordinating invisible movements in the ocean can be a highly challenging task. A listening whale or dolphin may need to track multiple sources simultaneously and to move or produce sounds contingently based on the sounds it hears. If sound imitation abilities enhance a cetacean’s capacity to monitor and predict the movements of conspecifics, then sound imitation may be more prevalent in cetaceans than in terrestrial mammals because reduced availability of visual cues for coordinating actions underwater increased reliance on alternative perceptual strategies.

Do Mammals Imitate Sounds to Enhance Their Perception of Actions?

A specific prediction of the ranging hypothesis is that a listener will be better able to localize the source of a sound if the listener can reproduce that sound. Unfortunately, it is not known whether cetaceans’ auditory distance estimates vary with sound type. In fact, there are no measures of the accuracy with which cetaceans can judge the auditory distance of any sound source other than targets that they have echolocated (Au, 1993). To test whether imitatible sounds are easier for cetaceans to localize, one would need to broadcast various sounds at known distances, and then assess how accurately individuals can estimate the distance of the source8. Given the logistical difficulties associated with conducting such experiments with cetaceans, an alternative approach is to first investigate whether other species (e.g., humans) show improved spatial processing of imitatible sounds.

Predictions of the ranging hypothesis have never been explicitly tested in mammals, but there have been numerous studies of auditory distance estimation in humans. Human sound localization abilities are quite good relative to other mammals (Blauert, 1997). Nevertheless, the accuracy with which humans can estimate the distance to a sound source varies considerably (Zahorik, Brungart, & Bronkhorst, 2005). Familiarity with sound features can dramatically improve an individual’s ability to range the source of that sound (Coleman, 1962; Little, Mershon, & Cox, 1992). Humans are also known to be better at ranging speech than artificial sounds (Gardner, 1969), and to be better at ranging forward speech than speech played backward (McGregor, Horn, & Todd, 1985; Wisniewski, Mercado, Gramann, & Makeig, 2012). Because backward speech contains all of the acoustic information present in forward speech, any environmental degradation of sound features associated with propagation will be the same for both forward and backward speech. Consequently, any difference in an individual’s ability to estimate the distance of these sounds lies within the listener, not within the received signals.

A recent study of auditory distance estimation by humans hearing familiar and foreign speech sounds found that the advantage for forward speech still holds for an unfamiliar foreign language (Wisniewski et al., 2012). Thus, familiarity per se does not seem to be the key factor that makes speech more localizable. According to the ranging hypothesis, the greater accuracy at ranging forward speech comes from the fact that speech is a highly imitatible acoustic event, and thus is encoded in ways that make replication of the heard sounds possible (Skoyles, 1998). Backward speech, in contrast, contains acoustic trajectories that would be difficult or impossible to reproduce with vocal acts (Cowan, Braine, & Leavitt, 1985), and so cannot be reconstructed with the same fidelity. The ranging hypothesis thus provides a possible explanation for differences in the accuracy with which humans can judge auditory distance for particular sound types.

The ranging hypothesis is similar in many respects to Wilson’s (2001b) and Wilson and Knoblich’s (2005) hypotheses that imitation may enhance an individual’s ability to perceive and predict the actions of conspecifics. Wilson (2001b) suggested that mental representations formed during covert imitation facilitate the flow of information processing between perception and action, especially when the stimuli and actions are familiar. More specifically, Wilson and Knoblich proposed that visual perception of other persons’ behaviors activates covert imitative motor representations that feed back into the perceptual processing of observed actions, leading to expectations and predictions of ongoing action trajectories. Consistent with this proposal, people are better able to recognize actions via point-light displays if the actions are ones that they themselves can perform (Blake & Shiffrar, 2007; Casile & Giese, 2006). Auditory processing of conspecifics’ vocal acts might similarly activate covert imitative motor representations that facilitate the mental representation of non-visible movements of a sound’s source through space. Although the ranging hypothesis, as proposed by Morton (2012), does not specifically address the possibility that such acoustically triggered representations might facilitate a listener’s ability to track or predict a singer’s future actions, it is likely that more accurate ranging of vocalizing conspecifics would facilitate monitoring of their movements.

Past laboratory studies of sound imitation by dolphins suggest that they gradually improve the fidelity of their copies through repeated practice (Lilly et al., 1968; Reiss & McCowan, 1993; Richards et al., 1984; Sigurdson, 1993), and that this gradual improvement reflects incremental refinement of vocal control. Perceptual-motor skill learning related to vocal production thus likely plays a role in the development of capacities for imitating specific sounds. The ranging hypothesis predicts that as an individual’s facility at producing a particular sound improves, his or her ability to represent and imitate that sound should also gradually improve, which could indirectly lead to improvements in spatial localization abilities. Thus, vocal learning may play an important role in the functionality of sound imitation for cetaceans, but in the opposite direction from what is typically assumed. Most researchers assume that the purpose of vocal imitation is to enable individuals to rapidly learn new ways of producing sounds from others (e.g., Whiten & Ham, 1992). The ranging hypothesis suggests instead that individuals may learn new sound production skills to enhance existing perceptual capacities (for a review of how motor skills can enhance perception, see Wilson & Knoblich, 2005). Specifically, rather than imitating novel sounds to increase or specialize their vocal repertoire, cetaceans may practice producing different sounds to increase their vocal flexibility, thereby increasing the variety of sounds that they can imitate, which in turn might increase their ability to localize sources of similar sounds.

Synthesis

Current explanations for why cetaceans evolved the ability to imitate sounds focus heavily on the role of imitation in vocal repertoire formation and modification. Such explanations meld well with proposed functions of vocal imitation in speech and language learning by young children. When cetaceans’ abilities are viewed through the lens of vocal imitation research in adult humans, however, an alternative possibility emerges: namely, that the benefits of sound imitation abilities for adult cetaceans may relate more to enhancing the perception and dynamic coordination of movements than to cementing social bonds, selecting a moniker, or attracting a mate. Of course, enhanced perceptual and coordination abilities may facilitate a wide array of functions, including mating, communicating, and other social functions. Nevertheless, sexual selection for fitness-revealing traits and adaptive specializations for species-specific social needs are likely to involve different adaptations and mechanisms from those associated with natural selection for basic perceptual abilities. The hypothesis that vocal imitation in cetaceans is a perceptual adaptation predicts that the most proficient imitators will be adults rather than immature individuals, and that through extensive practice, cetaceans may be able to increase not only their sound imitation skills, but also their capacity to localize sound sources, and their ability to represent and predict dynamic events. In the following section, we consider more closely the role that learning plays in the refinement of sound imitation abilities and explore whether a unified framework can potentially describe and explain these abilities in both cetaceans and primates.

VI. Proposed Mechanisms for Imitating Sounds

A successful model of vocal imitation, and of sound imitation more generally, must be able to account for known flexibilities in imitative abilities and for documented sensitivities to stimulus complexity. Ideally, the model should also be able to account for the role of sound imitation in perception and production across mammalian species. Having discussed empirical findings from both primates and cetaceans, we now review some of the leading theoretical models of vocal imitation and consider how well they account for the available data. In so doing, we revisit general themes discussed at the beginning of the paper, but with respect to specific mechanisms proposed by different theories. By our reading, the literature to date supports the notion that vocal imitation abilities emerge in mammals as a learned skill that is suited to the particular constraints faced by the species in question, and that involves the construction of multimodal representations of acoustic events.

Vocal Imitation as Template Matching

Researchers studying animals other than humans have often described the processes underlying vocal imitation as simple, unimodal, and transparent to the vocalizing individual. For instance, Whiten and Ham (1992) suggested that to reproduce a sound, a bird only needed to adjust its output until the produced sound matched what the bird had originally heard. They contrasted this process with visually based motor imitation, which they described as requiring additional levels of representation and greater computational capacities. This auditory-feedback-based explanation of the processes involved in vocal imitation is derived from a model that was originally developed to account for song learning by birds: the auditory template model (Konishi, 1965; Marler, 1976b). In the template model, birds start out with an internal auditory representation of what a song should sound like (acquired either genetically or through memorization; Marler, 1997), and then gradually learn to produce sounds that match this auditory template through a process of sensorimotor learning (Margoliash, 2002; Marler, 1976b, 1997). Computational instantiations of the template model show that such error-correction mechanisms are sufficient to generate sound patterns that match a prescribed target (Troyer & Doupe, 2000a, 2000b). When vocal imitation is construed as an instance of vocal learning, this model provides a relatively simple account of the necessary underlying mechanisms (Figure 9).

Figure 9. Template model of vocal learning and imitation originally developed to explain how birds learn songs and subsequently used as a model of vocal imitation.
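
The error-correction loop that the template model assumes can be sketched in a few lines. This is a toy illustration, not the Troyer and Doupe implementation: the linear `sing` function standing in for the vocal organ, the two-dimensional motor and acoustic spaces, and all parameter values are hypothetical.

```python
import random


def sing(motor):
    """Hypothetical vocal organ: maps two motor settings to two acoustic features.

    A fixed linear map is assumed purely for illustration.
    """
    return [2.0 * motor[0] + 0.5 * motor[1],
            -1.0 * motor[0] + 1.5 * motor[1]]


def mismatch(produced, template):
    """Squared difference between the heard output and the stored template."""
    return sum((p - t) ** 2 for p, t in zip(produced, template))


def learn_by_template(template, trials=2000, step=0.1, seed=0):
    """Trial-and-error learning driven only by auditory feedback."""
    rng = random.Random(seed)
    motor = [0.0, 0.0]
    error = mismatch(sing(motor), template)
    for _ in range(trials):
        # Try a small random variation of the current motor settings.
        candidate = [m + rng.gauss(0.0, step) for m in motor]
        cand_error = mismatch(sing(candidate), template)
        # Keep the variation only if the produced sound is closer to the template.
        if cand_error < error:
            motor, error = candidate, cand_error
    return motor, error


template = [1.0, 2.0]                      # stored auditory template
initial_error = mismatch(sing([0.0, 0.0]), template)
motor, error = learn_by_template(template)
```

Every adjustment in this loop depends on hearing the produced sound and comparing it with the stored template, so the sketch cannot reproduce a sound correctly on the first attempt.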

The auditory template model rests on several assumptions that make it problematic as a simple account of vocal imitation abilities, however, including: (1) heard sounds are selectively filtered such that particular sequences produced by conspecifics trigger unique auditory memory mechanisms; (2) experiences of these favored sequences are internally stored via something like the auditory equivalent of eidetic memory after a single or very few exposures; (3) once formed, these memories last indefinitely and are immediately reactivated whenever an individual vocalizes; (4) any mismatch between the permanent auditory template and a produced sound will lead to changes in sound production to minimize those differences; and (5) the fundamental process enabling vocal imitation is auditory feedback (see Petrinovich, 1988, for a more detailed critique of these assumptions in relation to theories of bird song learning).

Vocal imitation studies in adult humans suggest that neither detailed long-term auditory memories nor auditory feedback is necessary to reproduce sounds. A human can readily imitate a novel melody even if masking noise is presented over headphones such that it is very difficult for the person to hear their own vocalizations (Pfordresher & Brown, 2007), although intonation may deteriorate slightly (Mürbe, Friedmann, Hofmann, & Sundberg, 2002; Ward & Burns, 1978). According to the template model, the mismatch between what is produced (voiced pitches) and what is heard (noise) should lead to large changes in sound production; however, no such changes have been reported. In fact, much larger changes in vocal production are observed when auditory feedback exactly matches the produced sound but is shifted slightly in time (Pfordresher & Mantell, 2012; Smotherman, 2007). More generally, comparisons between produced sounds and previously heard sounds are not necessary for an adult human to reproduce novel sounds. In particular, when a person imitates a novel sound for the first time, feedback cannot guide the vocal act because the motor acts that constitute the imitative act are selected and executed prior to any feedback being available. Thus, organisms that can accurately imitate novel sounds upon first presentation are controlling their sound producing actions such that they will generate perceived similarities, rather than using those similarities to discover what actions to perform. Theories of vocal imitation that assume auditory feedback renders vocal imitation fundamentally different from other forms of imitation have conflated the act of vocal reproduction with an individual’s post-hoc assessment of similarities between produced sounds and remembered sounds.
The availability of auditory feedback can be an important component of vocal learning, but it is neither necessary nor sufficient for flexible vocal imitation, and in some cases may even degrade an individual’s imitation abilities.

Despite the limitations of the auditory template model as a model of vocal imitation, it can potentially provide insights into how and why mammals change the way they imitate sounds over time. For instance, past studies of spontaneous and instructed vocal imitation by bottlenose dolphins consistently show that dolphins gradually refine their reproductions of experienced sounds (Lilly, 1967; Reiss & McCowan, 1993; Richards et al., 1984; Sigurdson, 1993), with later renditions showing more similarities to targets than earlier versions. The template model provides a reasonable account of such gradual adjustments in performance.

Vocal Imitation as the Operation of Adaptively Specialized Modules

Marler (1997), recognizing that the auditory template model was insufficient to account for vocal imitation by birds (especially when the sounds being imitated were from other species), proposed two distinct modes of vocal learning: one involving the template-based system that is specialized for learning songs produced by conspecifics, and a second system, described as “general auditory mechanisms,” that enabled birds to imitate other sounds by bypassing or overriding the template-based system. Marler’s proposal that separate auditory mechanisms might be used for imitating different kinds of sounds converges with a second way of conceptualizing vocal imitation—as operations performed by one or more specialized cognitive modules.

A modular architecture of cognition, as originally proposed by Fodor (1983), assumes that certain cognitive functions are driven by specialized processors that operate independently from each other. Fodor and others who have grappled with the notion of modularity have proposed many features of cognitive modules, but the two features that dominate the literature are informational encapsulation (a module’s functioning is not influenced by processing of information in other modules) and domain specificity (a module is selective with respect to the type of input it will process). Modular approaches that are relevant to vocal imitation have proposed distinct modules for imitation, thus leading to the possibility that stimulus-specific modules may mediate certain kinds of imitation.

One modular approach to imitation in general, and not just vocal imitation, was proposed by Subiaul and colleagues (Subiaul, 2010; Subiaul et al., 2012). In his “multiple imitation mechanisms” approach, imitative modules are divided into vocal imitation, motor imitation (imitation of visually presented information through manual gestures), and cognitive imitation (copying an inferred pattern of thought). Superordinate to this division, he further divides imitation into separate processes for the imitation of novel versus familiar stimuli. This approach shares with the perspective we have advocated the idea that vocal imitation is genuinely a form of imitation, albeit one that may be guided by distinct mechanisms from other forms of imitation. However, in proposing that vocal imitation abilities depend on two specialized cognitive modules (one for novel and one for familiar stimuli), Subiaul’s model diverges from the present account in two important respects. First, the conceptualization of vocal imitation as being the dedicated function of two adaptively specialized systems runs against the present argument that imitative skills are learned. Second, because the domain specificity in Subiaul’s model is limited to auditory inputs, the model does not explain differences in imitation across domains such as music and language, which we turn to next.

A highly influential modular architecture of auditory processing was proposed by Peretz and Coltheart (2003). Although this model was not intended to be a model of imitation per se, its scope is broad enough to make systematic predictions about imitation within each domain. According to the Peretz and Coltheart model, individuals are endowed with processing modules specialized for analyzing particular features of sounds that are then used as a basis for guiding vocal actions. These features are processed differently for inputs that represent linguistic versus musical domains. One might use a module specialized for extracting pitch when imitating melodies, another focused on phonology when imitating speech, and possibly a third when vocally reproducing percussive rhythms. This framework suggests that different processing mechanisms are required to form particular kinds of auditory templates and that which template formation process is used depends on categorical features of auditory inputs. This approach is consistent with some of the vocal imitation data from adult humans, in particular the general advantage for imitating absolute pitch content within the domain of music as opposed to speech9 (Mantell & Pfordresher, 2013). The assumption that these auditory modules are informationally encapsulated is, however, inconsistent with the observed effects of phonetic information on the imitatibility of both musical and spoken sentences.

A variant of this multiple module approach was recently proposed by Patel (2003), in which musical and linguistic representations are separately constructed by independent, specialized processing systems, but then manipulated or used by a third shared system that constrains how both types of representations are used. For example, an individual’s ability to parse syntactical structures or to recognize chord progressions might both depend on integrating multiple elements within a sound sequence. A shared system for sequence integration might thus lead to correlations in an individual’s fidelity at imitating different sound sequences, even if the auditory templates formed by different categories of sounds are independent of one another. In this view, there are specialized mechanisms for representing different categories of sound sequences (and forming associated templates), as well as general cognitive mechanisms that may constrain an individual’s ability to reproduce all kinds of sequences.

Vocal Imitation as Auditory-Motor Recoding

All of the above models focus on comparisons of auditory representations of sounds as being the key mechanism of vocal imitation, while minimizing the role of other contributing mechanisms, such as characteristics of the vocal motor system. These models raise the question of why so few mammals show vocal imitation abilities, given that many mammals (including all primates) have sophisticated auditory systems. As noted earlier, some researchers have suggested that a more crucial mechanism underlying vocal imitation relates to neural control of skeletal muscles involved in vocalizing (Arriaga & Jarvis, 2013; Deacon, 1997; Fitch, 2010). Humans have greater control of tongue and laryngeal movements than most other primates and may possess specialized neural regions for directly controlling these movements. Other species known to imitate sounds, such as some songbirds, also have more fine control over vocal membranes than is typical for mammals. The basic idea proposed by Deacon, Fitch, and Jarvis is that these specialized motor control circuits provide humans and a few other mammals with uniquely flexible vocal control processes, and that it is this heightened vocal dexterity that makes vocal imitation possible.

The role of the motor system in vocal imitation, and more broadly in perception, has been assessed in studies of human speech production and imitation. Speech researchers have posited additional mechanisms that may shed some light on processes that facilitate the imitation of sounds. Foremost among these is the proposal that received speech sounds are encoded not only via auditory representations, but also in terms of the motor gestures required to generate particular speech sounds (Corballis, 2010; Galantucci, Fowler, & Turvey, 2006; Liberman & Mattingly, 1985; Lindblom, 1996; Vallabha & Tuller, 2004; Yuen, Davis, Brysbaert, & Rastle, 2010), and possibly in terms of the somatosensory signals that occur during the production of speech (Guenther, 1995; Studdert-Kennedy, 2000). Wilson (2001b) similarly suggested that imitatible stimuli are not represented solely in terms of their unimodal perceptual properties, but also in terms of articulatory gestures. Such mechanisms provide a ready explanation for how an individual might reproduce novel sounds without auditory feedback on a first attempt. Specifically, if the representation that guides one’s vocal acts during vocal imitation is the motor representation required to produce a heard sound, then mismatching auditory feedback (or the lack of repeated instances of mismatching feedback) would have relatively little impact on vocal performance. Numerous theories have been proposed for how one might transform acoustic inputs into “matching” vocal gestures (reviewed by Galantucci et al., 2006), as this is often suggested as a fundamental mechanism of theories of speech imitation. When applied to vocal imitation, this perspective can be viewed as a multimodal representational model in which the key mechanisms correspond to cross-modal transformations rather than error correction based on unimodal auditory comparisons.

A related model of rapid speech imitation (shadowing) developed by Fowler and colleagues similarly suggests that speech sounds may be encoded in terms of motor representations (Galantucci et al., 2006; Honorof et al., 2011; Shockley et al., 2004). Specifically, they suggest that speech may be encoded in terms of abstract parameters used to control vocal acts rather than (or in addition to) representations of gestures and associated kinesthetic stimuli. Such abstract control parameters might relate to constraints on trajectories of movement patterns (e.g., the order of speech primitives) rather than the specific motor gestures required to implement those patterns. The mechanisms emphasized by this model relate to controlling a nonlinear dynamical system rather than to creating analog representations of perceived events (Shockley et al., 2004). This approach provides a plausible account of why humans automatically imitate certain features of speech and may also be able to explain vocal convergence within social groups of non-humans.

A problem this sort of model confronts in accounting for the present data has to do with the flexibility of imitation as well as the etiology of imitative deficits. On the one hand, in proposing a specific auditory-vocal equivalence, such motorically constrained models seem ill equipped to account for the fact that imitation of sounds can be performed non-vocally, and that non-vocal sounds can be imitated vocally, often with high accuracy. On the other hand, in proposing a simple perceptual/motor equivalence, which is associated with fluency in speech, such models have difficulty accounting for the fact that imitative deficits can occur in individuals who are apparently able to fluently control phonation and articulation. Moreover, suggestions of perceptual/motor equivalence assume that the transformation from sensory to motor representations is effectively a non-issue. This stands in contrast to the apparent basis of poor-pitch singing, which appears to reflect a deficit of sensorimotor translation (Pfordresher & Brown, 2007).

Vocal Imitation as Multimodal Mapping

Another approach to modeling vocal imitation is also based on sensorimotor interactions, but adopts a broader, more flexible framework than the theories discussed above. This approach suggests that sensorimotor translation effects can span multiple perceptual and motor modalities. Such ideas stem from music cognition researchers who have suggested that the capacity of musicians to imitate sounds depends on coordinated auditory, kinesthetic, visual, and spatiomotor processes (described by Baily, 1985, as “auromotor coordination”), which are developed through experience and which enable some individuals to immediately reproduce musical patterns either vocally or instrumentally. At the core of such musical reproduction abilities lie hypothetical mechanisms of auditory imagery, which enable one to plan and control the production of complex, extended sound sequences (Baily, 1985; Pfordresher & Halpern, 2013). Auditory imaging can be viewed as analogous to visualization processes, enabling a musician not only to reproduce songs, but also to creatively modify those songs (e.g., transforming them into the styles of various musical genres, transposing them into different keys, etc.). Expert musical reproduction is also thought to require sophisticated conceptual processes acquired through extensive training, allowing heard (or imagined) sound sequences to be reproduced in the form of symbolic visual notations (Gordon, 2007). The supplemental mechanisms required for such flexible reproduction of sounds are not well specified, but clearly involve more than simple unimodal comparisons. In particular, they seem to require some means of voluntarily controlling vocal imitation. Consideration of the possible mechanisms that give rise to voluntary acts is beyond the scope of the current review. Recent work points to perception-action links and cognitive control as critical components (Jeannerod, 2006; Nattkemper, Ziessler, & Frensch, 2010; Zhang, Hughes, & Rowe, 2012).

Current computational models of speech acquisition provide quantitative hypotheses regarding the roles multimodal learning and representations play in vocal control and production (Kroger, Kannampuzha, & Neuschaefer-Rube, 2009; Tourville & Guenther, 2011; Westermann & Reck Miranda, 2004). These models can also provide a useful framework for thinking about how learning contributes to vocal imitation and about the form of the representations that make sound imitation possible. For instance, the DIVA model is an adaptive neural network model that can be used to simulate the acquisition of speech by humans (Guenther, 1994, 1995, 2006; Tourville & Guenther, 2011). The key components of this model closely match several mechanisms hypothesized to underlie vocal learning and imitation (Figure 10). In this model, vocal acts generate auditory and tactile feedback that is compared with auditory and somatosensory templates. The outcomes of these comparisons in turn modulate how sounds are produced. The model is adaptive in terms of how heard phonemes become mapped to somatosensory patterns, how somatosensory patterns are mapped to articulatory control, and how sounds are mapped to phonemes.

Figure 10. Guenther’s computational model of speech acquisition (adapted from Tourville and Guenther, 2011; Figure 1). In this model, multimodal maps acquired through experience make it possible for an individual to rapidly learn to reproduce novel sounds.

The DIVA model initially learns to generate pre-specified phonemes based on the results of essentially random babbling followed by specific practice (i.e., no vocal imitation is involved). This learning process can be viewed as a multimodal instantiation of the template model of vocal learning. In the model, babbling corresponds to induced random motions of speech articulators. Acquisition of phoneme production involves finding appropriate parameters to establish desired mappings. Initially, the model learns to map sensed mouth movements to particular articulator movements. Babbled movements produce tactile feedback. This stage basically leads to specific coordinated groupings of articulator movements that generate target tactile patterns. The mapping that the model learns transforms current states into desired states. Mappings from auditory representations to tactile representations are similarly learned so that certain sounds become associated with certain tactile configurations. Essentially, the model learns the different effector positions that are associated with different sounds. Importantly, multiple effector positions can lead to similar sound outputs and the model learns to approximate these many-to-one mappings. The targets for production are thus not a single auditory template for each sound, but a multidimensional space of possible effector configurations (learned from prior production experiences) that lead to that sound. The DIVA model assumes that the speaker (typically construed as a developing child) has a good representation of the sounds that need to be produced prior to vocal learning. However, it is well known that perception of speech sounds is experience dependent and that perceptual learning and speech production learning often occur in parallel. It is likely that perceptual learning gradually refines the target(s) with accumulated experience. The assumption that perception of speech stabilizes before productive learning begins is thus an oversimplification.
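
The many-to-one character of babbling-based learning can be illustrated with a deliberately simplified sketch. The code below is not the DIVA model and does not use any of its equations; the two-articulator "vocal tract" (`produce_sound`), the function names, and the tolerance value are all hypothetical choices made for illustration. It shows only how random babbling can yield, for a given target sound, a region of effector configurations rather than a single template.

```python
import random


def produce_sound(articulators):
    """Toy vocal tract (an assumption, not DIVA): sound is the sum of two positions."""
    jaw, tongue = articulators
    return jaw + tongue


def babble(n_trials, seed=0):
    """Generate random articulator positions and record the sound each one produces."""
    rng = random.Random(seed)
    experiences = []
    for _ in range(n_trials):
        articulators = (rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0))
        experiences.append((articulators, produce_sound(articulators)))
    return experiences


def target_region(experiences, target_sound, tolerance=0.05):
    """Return every babbled configuration whose sound fell close to the target."""
    return [a for a, s in experiences if abs(s - target_sound) <= tolerance]


experiences = babble(2000)
region = target_region(experiences, target_sound=1.0)
jaw_values = [jaw for jaw, _ in region]
print(len(region), "configurations reach the target sound")
print("jaw positions range from", min(jaw_values), "to", max(jaw_values))
```

In this toy setting, widely different jaw positions reach the same sound as long as the tongue compensates, which is the sense in which a production target is a space of configurations rather than a point.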

Within the DIVA framework, vocal imitation can be described as a process whereby new auditory-tactile targets can be incorporated into a pre-existing vocal control system. The factors that constrain how well the model can imitate a novel target sound include: (1) how the novel sound is represented; (2) the current set of learned tactile configurations; and (3) how closely components of the target sound map onto existing production templates. Ultimately, vocal imitation requires the model to generalize from past learning. However, whenever a new target is added to the repertoire of the model, this will initiate a new wave of adaptive changes to connections in the model that over time can change the model’s ability to accurately reproduce both familiar and novel sounds.

A trained DIVA model can rapidly learn to produce new speech sounds based on audio samples provided to it (Tourville & Guenther, 2011). This is possible because the learned maps represent both subcomponents of sounds and combinations of sounds (corresponding to phonemes, syllables, and words). Consequently, novel sounds are essentially indexing combinations of speech motor programs as well as expected somatosensory targets. Feedback is critical in this model for adjusting movements to reduce errors. Feedback is not what makes vocal imitation possible, however. It is instead the incrementally learned mappings based on past auditory, somatosensory, and vocal experiences. Note that in this model there are no specialized “imitation” modules or processors that transform sounds into vocal acts. Rather, it is the adaptive connections between different modalities, as well as the resolution of representations within each of these modality-specific processors that enable the model to imitate sounds. The DIVA model incorporates several features of earlier unimodal models, including error-correction learning mechanisms similar to those of the template model, auditory-to-motor recoding, and specializations for processing speech sounds. Because it is a computational model, it can be used to explore the effects of different experiences on imitative abilities and to generate specific predictions about how different auditory-motor coding schemes might impact an organism’s ability to imitate speech sounds.

Figure 11. (a) In the standard portrayal of vocal imitation as a learning mechanism, memories of sounds enable an individual to produce somewhat similar sounds that can be compared with the remembered sounds. Differences between the produced and remembered sounds serve as an error signal that is used to adjust future sound production. (b) In a more cognitive characterization of vocal imitation, multimodal representations of ongoing acoustic events (Current Experience) and memories of past events are used to predict future events and to generate and modulate plans for vocal actions, including intentional sound reproduction. In this framework, differences between expected events and perceived events adjust how events are represented.

A limitation of the DIVA model is that it assumes a single target for a given speech sound. Consequently, it would not be able to reproduce the melodic structure of sung speech, nor the individual-specific qualities of a person’s voice. It would also not be able to account for the transposition or temporal compression of heard sounds during imitation. Nevertheless, the DIVA model illustrates how vocal imitation abilities can potentially be achieved without any specialized learning mechanisms, and how incremental multimodal learning may be critical to the development of vocal imitation capabilities. The model also highlights the idea that the ability to vocally imitate depends on the flexibility with which sounds are encoded, as well as the capacity to cross-modally associate and monitor dynamic sensorimotor patterns related to vocal control. Because the DIVA model does not include any mechanisms that are unique to humans (other than predefined speech targets), it may be applicable to other species, including non-human primates and cetaceans. The model does not directly account for why vocal imitation abilities are rare among mammals.

Synthesis

The key mechanisms postulated in most current models of vocal imitation are auditory representations, motor control systems, a means of comparing past and present representations of sensorimotor events, and error-correction learning (Figure 11a). None of the models explicitly portrays vocal imitation as potentially involving maintenance and recall of past episodes, selective attention, or goal planning, and none distinguishes voluntary imitation from involuntary imitation (Figure 11b illustrates how such processes might be incorporated into a more cognitive model of sound imitation). Theories that describe vocal imitation in the context of acquired multimodal coordination of actions come closest to capturing the complexity of processing typically associated with the voluntary performance of a cognitive skill. These theories suggest that an organism’s capacity to imitate sounds is gained through extensive practice producing, feeling, hearing, and recalling different sounds. Although originally developed as models of speech learning and imitation, such multimodal mapping models might be applicable to sound imitation more generally. This could entail introducing multiple, specialized sensorimotor modules that vary across species and/or sound types to account for differences in the imitatibility of different sounds—such specialized modules might reflect either adaptive specializations or domain-specific customization from prior experiences. It remains to be seen whether any of these models can be modified such that their representations facilitate the perception, prediction, tracking, or coordination of actions.
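
The error-correction loop portrayed in Figure 11a can be reduced to a few lines of code. The sketch below is a minimal illustration under strong assumptions, not an implementation of any published model: the sound is a single number, the `vocal_tract` function is a hypothetical linear mapping from motor command to produced sound, and the learning rate is arbitrary. A remembered template is compared with each produced sound, and the difference is used to adjust the next attempt.

```python
def vocal_tract(motor_command):
    """Hypothetical production system: produced sound is twice the motor command."""
    return 2.0 * motor_command


def imitate(template, initial_motor_command, learning_rate=0.25, n_attempts=20):
    """Repeatedly produce a sound, compare it with the template, and correct."""
    motor_command = initial_motor_command
    errors = []
    for _ in range(n_attempts):
        produced = vocal_tract(motor_command)
        error = template - produced          # mismatch with the remembered sound
        errors.append(abs(error))
        motor_command += learning_rate * error  # adjust future production
    return motor_command, errors


final_command, errors = imitate(template=440.0, initial_motor_command=100.0)
print("first error:", errors[0])
print("last error:", errors[-1])
```

Nothing in this loop involves recall of past episodes, selective attention, goal planning, or a distinction between voluntary and involuntary imitation, which is the gap that the processes sketched in Figure 11b are meant to address.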

VII. Conclusions

In the end, the success of any framework for explaining vocal imitation rests less on the terminology it prescribes than on the novel findings that it provides. Our general assessment is that current frameworks that describe vocal imitation as either a specialized communicative learning mechanism or alternatively as an instrumentally conditioned copying response are inadequate for explaining either what vocal imitation entails or its apparent rarity among mammals. For those who may have missed the gist of our argument within the jungle of details provided above, we briefly summarize the main points that led us to this conclusion.

Adult humans vary considerably in their abilities to imitate sounds, both vocally and non-vocally. They imitate speech automatically and unconsciously in contexts that are unlikely to lead to significant learning. They voluntarily imitate singing styles, accents they find amusing, and commercial slogans. Some imitate sounds for a living. Others imitate sounds covertly, including their own vocalizations (e.g., when mentally practicing lines for a play). Human toddlers readily imitate many sounds they hear, both vocally and non-vocally, not all of which are speech sounds. But, despite the frequency with which toddlers copy sounds, their fidelity is poor compared to that of a professional impersonator. This is because the professional has honed his or her imitative skills through extensive practice. Proficient imitation of sounds is a multifaceted skill that arises through learning and that takes much longer to master than the ability to speak. In humans, at least, it is learning that gives rise to vocal imitation abilities. Certainly, the ability to imitate sounds can catalyze communicative learning, but this is just one of many benefits that imitative abilities afford and perhaps not the most important.

Imitation of sounds by cetaceans is not nearly as evident as human vocal imitation. When put to the test, however, dolphins show astounding fidelity in reproducing artificial sounds. While it is true that dolphins do not reproduce speech with the accuracy shown by some birds, it is arbitrary to treat speech as the gold standard of sound imitation. In a competition where dolphins and humans are challenged to reproduce artificial sounds, dolphins would likely outperform many humans. The fact that adult dolphins (and humans) can imitate arbitrary novel sounds when instructed to do so strongly implies that they have flexible voluntary control of this ability. Is such control necessary for adding new sounds to a vocal repertoire? Would automatic imitation abilities not suffice? These are questions that most current explanatory frameworks cannot readily address, because they make no distinction between voluntary and involuntary sound imitation.

Current ideas about the nature of vocal imitation are largely derivative of hypotheses proposed by Thorndike (1911) over a century ago. Assumptions about the mechanisms underlying sound imitation in mammals have led researchers to underestimate the range of cognitive processes involved. The proposal that vocal imitation only requires comparing percepts of self-produced sounds with memories of previously experienced sounds is inadequate. All mammals that produce sounds can perceive their sound-producing actions through multiple modalities, but most show no capacity to imitate sounds. Auditory feedback can be an important guide to learning, but it is neither necessary nor sufficient for sound imitation. Mismatches between produced sounds and remembered sounds do not automatically lead to changes in sound production. Studies from adult humans suggest that an individual’s ability to map perceived sounds onto performable actions, to retain representations of sufficient detail for later reenactments, and to acquire the motor control necessary to flexibly reenact perceived events are key elements of successful sound imitation.

A New View on Vocal Imitation: Imitating Sounds Is a Complex Cognitive Skill

Thorndike had many useful insights about how animals learn and about how to identify limitations in their mental abilities. His ideas about vocal imitation, however, reflect the limited data available at the time. Now, we know much more about the imitative abilities possessed by humans and other animals. The time has come to move beyond Thorndike’s (1911, p. 77) idea that vocal imitation abilities are “a specialization removed from the general course of mental development.”

We suggest that sound imitation abilities should instead be viewed as a sophisticated skill (or set of skills) that relatively few organisms have the representational or vocal flexibility to master. The limited evidence currently available is consistent with the idea that sound imitation by primates and cetaceans may be mediated by learning, memory, attention, and vocal control mechanisms that involve experience-dependent multimodal representations of events. Multimodal representations appear to play a particularly important role in the sound imitation abilities of adult humans, and may also contribute to sound imitation by non-humans. The availability of such multimodal representations of events can enhance an individual’s ability to represent, predict, and reconstruct perceived events (Wilson & Knoblich, 2005), predict the future actions of conspecifics (Knoblich & Jordan, 2003; Loehr, Kourtis, Vesper, Sebanz, & Knoblich, 2013; Vesper, van der Wel, Knoblich, & Sebanz, 2013), socially communicate with others (Chartrand & Lakin, 2013; Lakin & Chartrand, 2003; Lakin, Chartrand, & Arkin, 2008), and monitor self-produced actions (Wilson, 2001a). An integrative framework in which sound imitation by any mammal is viewed as a skilled performance may provide new insights into the mechanisms underlying this ability.

Viewing voluntary sound imitation as a cognitive skill shifts emphasis away from describing its role in repertoire acquisition and more toward understanding how it fits within the broader domain of cognitive skill learning. Techniques that have been developed to study cognitive skill acquisition in other domains (e.g., comparisons between the performances of experts and amateurs) can potentially be brought to bear in studies of sound imitation. Such approaches, which often focus on individual differences, have rarely been applied in comparative cognition research. Understanding an individual’s ability to imitate sounds may require one to relate the individual’s perceptual, motor control, memory, selective attention, and conceptual capacities to his or her ability to produce sounds. Vocal imitation abilities may vary within and across species in ways that closely match variations in more basic cognitive capacities, and may reflect global constraints on how learning experiences impact perceptual-motor and cognitive abilities (Mercado, 2008).

Models of the representational processes involved in the control, imagery, and perception of movements (Grossberg & Paine, 2000; Grush, 2004; Hurley, 2008; Jeannerod, 2006) may provide a useful starting point for developing new ways of understanding sound imitation abilities. Most of these models (including the DIVA model described above) were developed to account for human learning and behavior, but they typically do not invoke mechanisms that are specific to humans. Exploring how well existing models can account for sound imitation abilities in different species will be an important step toward identifying the minimal sensorimotor and cognitive requirements for both the automatic and voluntary reproduction of sounds, as well as toward identifying contexts in which either covert imitation or self-imitation are likely to occur.

Vocal imitation is best understood as a subtype of sound imitation, which in turn can be viewed as a representational process that enables an individual to better perceive and predict ongoing actions (of which vocalizations are only a small subset). Imitation of sounds is as true a form of imitation as any based on visual inputs, and no less mysterious. Treating any imitative process as primarily a learning mechanism requires one to ignore any distinction between learning and performance. It would be more accurate to say that imitation depends on an individual’s capacity to flexibly represent observed events and to voluntarily control actions, independently of whether the observer attempts to reproduce components of what was observed (as suggested by Bandura, 1986). Unless one considers perception to be synonymous with learning, the framework we propose suggests that sound imitation is a learned process of interpreting the world rather than a specialized mechanism for learning how to produce vocalizations. Studies of adult humans and cetaceans are particularly useful for investigating such processes because these are the only mammals that have consistently shown the ability or motivation to learn to imitate arbitrary sounds in experimental settings.

This account of vocal imitation will remind many readers of the purported role of the “mirror neuron system” in facilitating action understanding (Aziz-Zadeh & Ivry, 2009; Corballis, 2010; Molenberghs, Cunnington, & Mattingley, 2009; Ocampo & Kritikos, 2011). Although we clearly share with this view the sense that the intersection of perception and action plays a critical role in cognition, our perspective differs in several ways from these accounts. First, and most important, we propose that vocal imitation is an ability that is acquired through learning and reflects generalization of learned associations within a complex cognitive architecture. By contrast, the mirror neuron hypothesis, in our view, implies a more hard-wired, modular system. Second, we propose that vocal imitation is a manifestation of the organism’s tendency to generate multimodal representations based on associations. Thus we are not suggesting that there is something special about perception/action intersections but rather that intersections may cross multiple perceptual and motor modalities. Finally, a shortcoming we see in the mirror neuron hypothesis is that it is computationally underspecified. This comment is meant more as an observation than a criticism. Mirror neurons emerged from the neuroscience literature and are thus a biological reality. As cognitivists, however, we think it is important to focus on the underlying functional architecture. The mirror neuron hypothesis specifies no such architecture, because the system itself exists at a different level of analysis. Ultimately we find ourselves aligned with the mirror neuron hypothesis in broad philosophical terms, while differing significantly in several important details.

Some Open Questions

When one considers sound imitation freed from the restraints of past assumptions regarding its nature and functions, this can change not only how one describes vocal imitation, but also how it is studied scientifically (Galef, 2013). We end by considering just a few of the many challenging questions raised by this new view of sound imitation.

When does imitative processing contribute to sound perception and production?

Because past criteria for designating phenomena as vocal imitation have been conservative, it is possible that the prevalence of sound imitation abilities in mammals has been underestimated. For instance, a vocalizing cetacean might produce several similar calls in a row. Such repetitive vocal behavior has not been previously viewed as being imitative, but it could involve either deferred imitation of others or self-imitation (Mercado, Murray, Uyeyama, Pack, & Herman, 1998). Piaget (1962) suggested that when infants produce series of similar vocalizations, these vocal acts should be viewed as self-imitative. Currently, there are no validated, objective metrics for distinguishing an imitative vocal act from a non-imitative one, for distinguishing self-imitation from repetition, or for determining when sounds might be covertly imitated. Consequently, it is difficult to observationally distinguish a series of self-imitated sounds from a series of independently generated sounds with similar features. The extent to which self-imitation contributes to vocal production is not known for any species, including humans, so this possibility cannot yet be excluded. Studies of self-imitation in adults may reveal new ways of distinguishing self-imitative sound production from other non-imitative sound-producing acts, thereby clarifying how often imitative mechanisms are engaged during sound production and perception. When one considers that an imitative vocal act might be deferred for extended periods, the possibility arises that most of the sounds an individual produces might be imitative.

Modern models of sound production posit that many of the mechanisms underlying sound imitation may be automatically engaged every time an organism with the capacity to imitate hears a sound. In that case, observations of overt sound copying would not accurately reflect the frequency of imitative processing. Cases in which individuals use their voice to voluntarily or automatically imitate novel sounds may reflect only a small proportion of instances in which sound imitation skills are engaged during the processing of acoustic events. By analogy, an adult human may think many thoughts each day and yet only occasionally state, “I was just thinking that . . . .” The rarity of such statements does not accurately reflect the frequency of thinking. Representing certain sounds in terms of the vocal or motor acts required to reproduce those sounds may be the default perceptual process rather than a selectively applied approach (Möttönen, Dutton, & Watkins, 2013; Wilson & Knoblich, 2005; Yuen et al., 2010). Techniques are needed for identifying when auditory perception by both humans and non-humans engages imitative mechanisms.

What determines which acoustic events are imitated?

Mowrer (1960) suggested that the motivation for parrots to imitate human speech was driven by an emotional attachment between a bird and its caretaker and by the frequency with which the bird encountered situations where its needs were not met. This interpretation predicts that differences in “personality” within or across species might correlate with the likelihood that an individual imitates particular sounds, and further suggests that an individual’s imitativeness might provide an indirect indicator of his or her emotional state or relationship with a particular individual (Gewirtz & Stingle, 1968). Whether certain emotional states are a prerequisite for overt sound imitation remains unknown. It is interesting to note, however, that in many of the cases in which cetaceans have been observed to imitate environmental sounds (including speech), the imitator has been socially deprived relative to natural situations (Caldwell & Caldwell, 1972; Eaton, 1979; Foote et al., 2006; Ridgway et al., 2012). Miklosi (1999) suggested that many instances of imitation could be related to play behavior (see also Richards, 1986, and Pepperberg, 2005, 2010, for discussions of vocal play in dolphins and parrots). Consistent with this idea, many professional imitators are entertainers. The act of sound imitation (or observations of such acts) might thus sometimes serve to modulate an organism’s emotional state, acting as a homeostatic mechanism rather than as a means of learning, localizing, or communicating. It remains unclear how any emotional functions of sound imitation might relate to other potential functions.

We know little about the sound qualities that mammals are most likely to reproduce or about the fidelity with which the most proficient imitators can reproduce these qualities. To reveal the full imitative capacities of non-human mammals may require extensive cognitive and vocal training over several years focusing on sounds that apes or cetaceans find naturally imitatible. If sound imitation in cetaceans or apes depends on comparable mechanisms to those used by humans, then it should be possible to train individuals to flexibly imitate a wide range of sounds or to specialize in imitating certain classes of sounds. Identifying the upper limits of imitative capacity in cetaceans may yield new insights into the constraints that prevent other mammals from learning to imitate sounds. If higher fidelity imitative representations enhance remote action monitoring, as suggested above, then this predicts that audiospatial sequences that correspond to natural events should be more imitatible than randomly ordered sequences. To date, researchers have only explored the ability of non-human mammals to imitate individual sounds or repetitions of the same sound; essentially nothing is known about how the imitatibility of sound sequences varies across species.

What determines an individual’s imitative capacity?

There is considerable debate about which if any imitative abilities of humans are genetically determined (Jones, 2007, 2009; Parton, 1976). Recent work tends to side with Piaget’s (1962) conclusion that there is not a single specialized mechanism that gives rise to vocal imitation (or any other kind of imitation), but that a variety of perceptual, motor, and cognitive abilities that emerge sequentially during development contribute to the acquisition of vocal imitation abilities (Jones, 2007). Interestingly, the earliest imitative acts noted by Piaget (1962) and other developmental psychologists often have involved sound production (Jones, 2006, 2007), suggesting that sound plays a particularly important role in the development of imitative abilities. Furthermore, the first sounds imitated by infants are in some cases not sounds produced vocally by other humans, but are instead novel environmental sounds or percussive sounds (Piaget, 1962; E. Mercado, personal observation), contrary to what one might expect if sound imitation serves primarily as an adaptation for speech acquisition. Detailed investigations of the kinds of sounds naturally reproduced by humans and other animals at an early age, as well as systematic analyses of how imitative capacities vary across individuals, may help researchers to identify the genetic, neural, or experiential variables that impact an organism’s capacity and tendency to imitate particular sound features. For instance, a recent study of expert phoneticians (who specialize in transcribing speech) identified both anatomical predispositions and acquired brain morphology that were correlated with an individual’s ability to transcribe novel phonetic contrasts (Golestani, Price, & Scott, 2011).

As noted earlier, several researchers have suggested that the rarity of vocal imitation abilities in mammals reflects limitations in vocal control (Deacon, 1997; Fitch, 2010; Mowrer, 1960). Jarvis has made this argument most strongly and convincingly (Arriaga & Jarvis, 2013; Jarvis, 2004, 2013). Undoubtedly, variations in imitative capacity across species and individuals reflect differences in neural architecture, including differences in circuits involved in the control of vocal production. However, the brains of different individuals vary in many ways, and scientific history is replete with premature identifications of “brain differences that make the difference.” To date, neuroscientific studies of vocal imitation have been designed primarily to identify neural substrates that instantiate the classic template matching model of vocal learning. To the extent that this model is inadequate for explaining sound imitation by adult mammals, research aimed at revealing how neural circuits implement template-based vocal learning will be similarly inadequate for understanding how and why mammals imitate sounds.

The flexibility with which dolphins and humans can reproduce sounds may reflect their general cognitive abilities rather than any specialized mechanisms of vocal control. Music cognition researchers often suggest that musicians can flexibly manipulate sophisticated representations of sound streams and associated visual or motor events, and that acquired musical concepts constrain a musician’s performances. Dolphins have shown the ability to explicitly access representations of past events (Mercado, Uyeyama, Pack, & Herman, 1999), to form and use abstract concepts about such events, and to actively use sounds (and memories of sounds) to guide their actions (reviewed by Mercado & DeLong, 2010). The extent to which available concepts and memory mechanisms constrain sound imitation abilities has seldom been considered in past comparative studies. Future efforts to characterize and understand the nature and functions of sound imitation in mammals can benefit from integrated approaches that more fully consider the representational processes cetaceans and primates may bring to bear when imitating both familiar and novel sounds.

In this review, a cognitive approach to understanding sound imitation was presented as an alternative to the possibility that vocal imitation serves primarily as a mechanism for learning to produce novel sounds. The shift from a social communicative learning model of vocal imitation to a cognitive skill oriented model leads to novel hypotheses about cross-species commonalities in representational and perceptual processes and to new avenues for theoretical integration of comparative bioacoustic studies with studies of human auditory cognition. Experimental investigations of automatic imitation, individual differences in imitative capacity, and correlations between imitative fidelity and spatial acuity or auditory working memory capacity in nonhuman animals may reveal unsuspected similarities (or differences) in the imitative skills of primates and cetaceans.

Table 1. Glossary


Footnotes:

1 By biologists’ definition of learning, vocal contagion is a kind of learning because auditory inputs lead to a change in behavior. Psychologists would instead classify vocal contagion as an elicited behavior or a reflexive action.

2 Pepperberg (2005) noted, however, that in many cases it is unclear whether birdsong learning actually involves vocal imitation (see also Marler, 1997).

3 An unvoiced sound produced by placing the tongue between the lips and blowing.

4 Similar to vocal fry produced by human singers.

5 Singers typically modify songs by gradually inserting, deleting, or modifying existing patterns within their current song, rather than replacing their songs entirely.

6 Lilly (1967) noted that Aristotle reported that dolphins made sounds with “a voice like that of the human,” so this discovery might be more accurately described as a rediscovery.

7 Here, the term vocal learning is meant to include vocal imitation.

8 One complication of this approach is that it is difficult to establish how well a listener can imitate a sound unless the listener is known to make that sound or actually imitates the broadcast sound.

9 However, an advantage was not found for the imitation of relative pitch content.


References

Abrego-Collier, C., Grove, J., & Sonderegger, M. (2011). Effects of speaker evaluation on phonetic convergence. Paper presented at the 17th International Congress of Phonetic Sciences.

Adret, P. (1993). Vocal learning induced with operant techniques: An overview. Netherlands Journal of Zoology, 43, 125–142.

Agafonov, A. V., & Panova, E. M. (2012). Individual patterns of tonal (whistling) signals of bottlenose dolphins (Tursiops truncatus) kept in relative isolation. Biology Bulletin, 39, 430–440. doi: 10.1134/S1062359012050020

Akcay, C., Tom, M. E., Campbell, S. E., & Beecher, M. D. (2013). Song type matching is an honest early threat signal in a hierarchical animal communication system. Proceedings of the Royal Society B – Biological Sciences, 280, 20122517. doi: 10.1098/rspb.2012.2517

Amin, T. B., Marziliano, P., & German, J. S. (2012). Nine voices, one artist: Linguistic and acoustic analysis. Paper presented at the 2012 IEEE International Conference on Multimedia and Expo, Melbourne, Australia.

Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406. doi: 10.1037//0033-295X.89.4.369

Andrew, R. J. (1962). Evolution of intelligence and vocal mimicking: Studies of large-brained mammals promise to elucidate some problems of human evolution. Science, 137, 585–589. doi: 10.1126/science.137.3530.585

Arriaga, G., & Jarvis, E. D. (2013). Mouse vocal communication system: Are ultrasounds learned or innate? Brain and Language, 124, 96–116. doi: 10.1016/j.bandl.2012.10.002

Au, W. W. L. (1993). The sonar of dolphins. New York: Springer-Verlag.

Aziz-Zadeh, L., & Ivry, R. B. (2009). The human mirror neuron system and embodied representations. Advances in Experimental Medicine and Biology, 629, 355–376. doi: 10.1007/978-0-387-77064-2_18

Babel, M. (2012). Evidence for phonetic and social selectivity in spontaneous phonetic imitation. Journal of Phonetics, 40, 177–189. doi: 10.1016/j.wocn.2011.09.001

Baer, D. M., & Deguchi, H. (1985). Generalized imitation from a radical-behavioral viewpoint. In S. Reiss & R. R. Bootzin (Eds.), Theoretical issues in behavior therapy (pp. 179–217). Orlando: Academic Press.

Baer, D. M., Peterson, R. F., & Sherman, J. A. (1967). The development of imitation by reinforcing behavioral similarity to a model. Journal of the Experimental Analysis of Behavior, 10, 405–416. doi: 10.1901/jeab.1967.10-405

Baily, J. (1985). Musical structure and human movement. In P. Howell, I. Cross, & R. West (Eds.), Musical structure and cognition (pp. 237–258). London: Academic Press.

Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Englewood Cliffs, NJ: Prentice-Hall.

Baylis, J. R. (1982). Avian vocal mimicry: Its function and evolution. In D. E. Kroodsma (Ed.), Acoustic communication in birds, vol. 2: Song learning and its consequences (pp. 51–83). New York: Academic Press.

Beaman, C. P., & Williams, T. I. (2010). Earworms (stuck song syndrome): Towards a natural history of intrusive thoughts. British Journal of Psychology, 101, 637–653. doi: 10.1348/000712609X479636

Beckers, G. J., Nelson, B. S., & Suthers, R. A. (2004). Vocal-tract filtering by lingual articulation in a parrot. Current Biology, 14, 1592–1597. doi: 10.1016/j.cub.2004.08.057

Blake, R., & Shiffrar, M. (2007). Perception of human motion. Annual Review of Psychology, 58, 47–73. doi: 10.1146/annurev.psych.57.102904.190152

Blauert, J. (1997). Spatial hearing: The psychophysics of human sound localization. Cambridge, MA: MIT Press.

Bolhuis, J. J., Okanoya, K., & Scharff, C. (2010). Twitter evolution: Converging mechanisms in birdsong and human speech. Nature Reviews Neuroscience, 11, 747–759. doi: 10.1038/nrn2931

Byrne, R. W. (2002). Imitation of novel complex actions: What does the evidence from animals mean? Advances in the Study of Behavior, 31, 77–105. doi: 10.1016/S0065-3454(02)80006-7

Byrne, R. W., & Russon, A. E. (1998). Learning by imitation: A hierarchical approach. Behavioral and Brain Sciences, 21, 667–684. doi: 10.1017/S0140525X98001745

Caldwell, M. C., & Caldwell, D. K. (1965). Individualized whistle contours in bottlenosed dolphins (Tursiops truncatus). Nature, 207, 434–435. doi: 10.1038/207434a0

Caldwell, M. C., & Caldwell, D. K. (1972). Vocal mimicry in the whistle mode by an Atlantic bottlenosed dolphin. Cetology, 9, 1–8.

Caldwell, M. C., & Caldwell, D. K. (1979). The whistle of the Atlantic bottlenosed dolphin (Tursiops truncatus) – Ontogeny. In H. E. Winn & B. L. Olla (Eds.), Behavior of marine animals: Current perspective in research. Vol 3. Cetaceans (pp. 369–401). New York: Plenum Press.

Casile, A., & Giese, M. A. (2006). Nonvisual motor training influences biological motion perception. Current Biology, 16, 69–74. doi: 10.1016/j.cub.2005.10.071

Cazau, D., Adam, O., Laitman, J. T., & Reidenberg, J. S. (2013). Understanding the intentional acoustic behavior of humpback whales: A production-based approach. Journal of the Acoustical Society of America, 134, 2268–2273. doi: 10.1121/1.4816403

Chartrand, T. L., & Lakin, J. L. (2013). The antecedents and consequences of human behavioral mimicry. Annual Review of Psychology, 64, 285–308. doi: 10.1146/annurev-psych-113011-143754

Chmelnitsky, E. G., & Ferguson, S. H. (2012). Beluga whale, Delphinapterus leucas, vocalizations from the Churchill River, Manitoba, Canada. Journal of the Acoustical Society of America, 131, 4821–4835. doi: 10.1121/1.4707501

Cholewiak, D. M. (2008). Evaluating the role of song in the humpback whale (Megaptera novaeangliae) breeding system with respect to intra-sexual interactions. Doctoral dissertation, Cornell University.

Clark, C. W. (1990). Acoustic behavior of mysticete whales. In J. A. Thomas & R. Kastelein (Eds.), Sensory abilities of cetaceans: Laboratory and field evidence (pp. 571–583). New York: Plenum.

Clarke, E. F. (1993). Imitating and evaluating real and transformed musical performances. Music Perception, 10, 317–341.

Clarke, E. F., & Baker-Short, C. (1987). The imitation of perceived rubato: A preliminary study. Psychology of Music, 15, 58–75. doi: 10.1177/0305735687151005

Coleman, P. D. (1962). Failure to localize the source distance of an unfamiliar sound. Journal of the Acoustical Society of America, 34, 345–346. doi: 10.1121/1.1928121

Connor, R. C. (2000). Group living in whales and dolphins. In J. Mann, R. C. Connor, P. L. Tyack, & H. Whitehead (Eds.), Cetacean societies: Field studies of dolphins and whales (pp. 199–218). Chicago: University of Chicago Press.

Corballis, M. C. (2010). Mirror neurons and the evolution of language. Brain and Language, 112, 23–35. doi: 10.1016/j.bandl.2009.02.002

Cowan, N., Braine, M. D. S., & Leavitt, L. A. (1985). The phonological and metaphonological representation of speech: Evidence from fluent backward talkers. Journal of Memory and Language, 24, 679–698. doi: 10.1016/0749-596X(85)90053-1

Cranford, T. W., Elsberry, W. R., Van Bonn, W. G., Jeffress, J. A., Chaplin, M. S., Blackwood, D. J., . . . Ridgway, S. H. (2011). Observation and analysis of sonar signal generation in the bottlenose dolphin (Tursiops truncatus): Evidence for two sonar sources. Journal of Experimental Marine Biology and Ecology, 407, 81–96. doi: 10.1016/j.jembe.2011.07.010

Crowell, S., Harley, H. E., Fellner, W., & Larsen-Plott, L. (2005). Vocal productions of rhythms by the bottlenose dolphin. Paper presented at the 16th Biennial Conference on the Biology of Marine Mammals, San Diego, CA.

d’Alessandro, C., Rilliard, A., & Le Beux, S. (2011). Chironomic stylization of intonation. Journal of the Acoustical Society of America, 129, 1594–1604. doi: 10.1121/1.3531802

Dalla Bella, S., Giguere, J.-F., & Peretz, I. (2007). Singing proficiency in the general population. Journal of the Acoustical Society of America, 121, 1182–1189. doi: 10.1121/1.2427111

Dalziell, A. H., & Magrath, R. D. (2012). Fooling the experts: Accurate vocal mimicry in the song of the superb lyrebird, Menura novaehollandiae. Animal Behaviour, 83, 1401–1410.

Darling, J. D., Jones, M. E., & Nicklin, C. P. (2012). Humpback whale (Megaptera novaeangliae) singers in Hawaii are attracted to playback of similar song (L). Journal of the Acoustical Society of America, 132, 2955–2958. doi: 10.1121/1.4757739

Darling, J. D., Jones, M. E., & Nicklin, C. P. (2006). Humpback whale songs: Do they organize males during the breeding season? Behaviour, 143, 1051–1101. doi: 10.1163/156853906778607381

Deacon, T. W. (1997). The symbolic species: The co-evolution of language and the brain. New York: W. W. Norton.

Deecke, V. B. (1998). Stability and change of killer whale (Orcinus orca) dialects. Master's thesis, University of British Columbia.

Deecke, V. B., Ford, J. K., & Spong, P. (2000). Dialect change in resident killer whales: Implications for vocal learning and cultural transmission. Animal Behaviour, 60, 629–638. doi: 10.1006/anbe.2000.1454

DeRuiter, S. L., Boyd, I. L., Claridge, D. E., Clark, C. W., Gagnon, C., Southall, B. L., & Tyack, P. L. (2013). Delphinid whistle production and call matching during playback of simulated military sonar. Marine Mammal Science, 29, E46–E59. doi: 10.1111/j.1748-7692.2012.00587.x

Domjan, M. (2000). Learning: Overview. In A. E. Kazdin (Ed.), Encyclopedia of psychology (Vol. 5, pp. 1–3). New York: Oxford University Press.

Donald, M. (1991). Origins of the modern mind: Three stages in the evolution of culture and cognition. Cambridge, MA: Harvard University Press.

Doupe, A. J., & Kuhl, P. K. (1999). Birdsong and human speech: Common themes and mechanisms. Annual Review of Neuroscience, 22, 567–631. doi: 10.1146/annurev.neuro.22.1.567

Drake, C. (1993). Reproduction of musical rhythms by children, adult musicians, and adult nonmusicians. Perception & Psychophysics, 53, 25–33. doi: 10.3758/BF03211712

Drake, C., & Palmer, C. (2000). Skill acquisition in music performance: Relations between planning and temporal control. Cognition, 74, 1–32. doi: 10.1016/S0010-0277(99)00061-X

Eaton, R. L. (1979). A beluga imitates human speech. Carnivore, 2, 22–23.

Edds-Walton, P. L. (1997). Acoustic communication signals of mysticete whales. Bioacoustics, 8, 47–60. doi: 10.1080/09524622.1997.9753353

Egnor, S. E., & Hauser, M. D. (2004). A paradox in the evolution of primate vocal learning. Trends in Neurosciences, 27, 649–654. doi: 10.1016/j.tins.2004.08.009

Eigsti, I.-M., de Marchena, A. B., Schuh, J. M., & Kelley, E. (2011). Language acquisition in autism spectrum disorders: A developmental review. Research in Autism Spectrum Disorders, 5, 681–691. doi: 10.1016/j.rasd.2010.09.001

Eriksson, A. (2010). The disguised voice: Imitating accents of speech styles and impersonating individuals. In C. Llamas (Ed.), Language and identities (pp. 86–98). Edinburgh: Edinburgh University Press.

Eriksson, A., & Wretling, P. (1997). How flexible is the human voice? A case study of mimicry. Paper presented at the Fifth European Conference on Speech Communication and Technology.

Falls, J. B., & Brooks, R. J. (1975). Individual recognition by song in white-throated sparrows. II. Effects of location. Canadian Journal of Zoology, 53, 1412–1420. doi: 10.1139/z75-170

Fay, W. H. (1969). On the basis of autistic echolalia. Journal of Communication Disorders, 2, 38–47. doi: 10.1016/0021-9924(69)90053-7

Fay, W. H., & Coleman, R. O. (1977). A human sound transducer/reproducer: Temporal capabilities of a profoundly echolalic child. Brain and Language, 4, 396–402. doi: 10.1016/0093-934X(77)90034-7

Fellner, W., Bauer, G. B., & Harley, H. E. (2006). Cognitive implications of synchrony in dolphins: A review. Aquatic Mammals, 32, 511–516.

Filatova, O. A., Burdin, A. M., & Hoyt, E. (2010). Horizontal transmission of vocal traditions in killer whale (Orcinus orca) dialects. Biology Bulletin, 37, 965–971. doi: 10.1134/S1062359010090104

Filatova, O. A., Deecke, V. B., Ford, J. K. B., Matkin, C. O., Barrett-Lennard, L. G., Guzeev, M. A., . . . Hoyt, E. (2012). Call diversity in the North Pacific killer whale populations: Implications for dialect evolution and population history. Animal Behaviour, 83, 595–603. doi: 10.1016/j.anbehav.2011.12.013

Fish, M. P., & Mowbray, W. H. (1962). Production of underwater sound by the white whale or beluga Delphinapterus leucas (Pallas). Journal of Marine Research, 20, 149–161.

Fitch, W. T. (2010). The evolution of language. Cambridge: Cambridge University Press.

Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press.

Foote, A. D., Griffin, R. M., Howitt, D., Larsson, L., Miller, P. J. O., & Hoelzel, A. R. (2006). Killer whales are capable of vocal learning. Biology Letters, 2, 509–512. doi: 10.1098/rsbl.2006.0525

Ford, J. K. B. (1989). Acoustic behavior of resident killer whales (Orcinus orca) off Vancouver Island, British Columbia, Canada. Canadian Journal of Zoology, 67, 727–745.

Ford, J. K. B. (1991). Vocal traditions among resident killer whales (Orcinus orca) in coastal waters of British Columbia. Canadian Journal of Zoology, 69, 1454–1483. doi: 10.1139/z91-206

Fowler, C. A., Brown, J. M., Sabadini, L., & Weihing, J. (2003). Rapid access to speech gestures in perception: Evidence from choice and simple response time tasks. Journal of Memory and Language, 49, 396–413. doi: 10.1016/S0749-596X(03)00072-X

Frazer, L. N., & Mercado, E., III. (2000). A sonar model for humpback whale song. IEEE Journal of Oceanic Engineering, 25, 160–182. doi: 10.1109/48.820748

Fripp, D., Owen, C., Quintana-Rizzo, E., Shapiro, A., Buckstaff, K., Jankowski, K., . . . Tyack, P. (2005). Bottlenose dolphin (Tursiops truncatus) calves appear to model their signature whistles on the signature whistles of community members. Animal Cognition, 8, 17–26. doi: 10.1007/s10071-004-0225-z

Galantucci, B., Fowler, C. A., & Turvey, M. T. (2006). The motor theory of speech perception reviewed. Psychonomic Bulletin & Review, 13, 361–377. doi: 10.3758/BF03193990

Galef, B. G. (1988). Imitation in animals: History, definition, and interpretation of data from the psychological laboratory. In T. R. Zentall & B. G. Galef (Eds.), Social learning: Psychological and biological perspectives (pp. 3–28). Hillsdale, NJ: Lawrence Erlbaum Associates.

Galef, B. G. (2013). Imitation and local enhancement: Detrimental effects of consensus definitions on analyses of social learning in animals. Behavioural Processes, 100, 123–130. doi: 10.1207/s15327604jaws0204_2

Garamszegi, L. Z., Eens, M., Pavlova, D. Z., Aviles, J., & Moller, A. P. (2007). A comparative study of the function of heterospecific mimicry in European passerines. Behavioral Ecology, 18, 1001–1009. doi: 10.1093/beheco/arm069

Garcia, E., Baer, D. M., & Firestone, I. (1971). The development of generalized imitation within topographically determined boundaries. Journal of Applied Behavioral Analysis, 4, 101–112.

Gardner, M. B. (1969). Distance estimation of 0 degrees or apparent 0 degrees-oriented speech signals in anechoic space. Journal of the Acoustical Society of America, 45, 47–53. doi: 10.1121/1.1911372

Garrod, S., & Pickering, M. J. (2009). Joint action, interactive alignment, and dialog. Topics in Cognitive Science, 1, 292–304. doi: 10.1111/j.1756-8765.2009.01020.x

Gewirtz, J. L., & Stingle, K. G. (1968). Learning of generalized imitation as the basis for identification. Psychological Review, 75, 374–397. doi: 10.1037/h0026378

Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105, 251–279. doi: 10.1037//0033-295X.105.2.251

Goldinger, S. D., & Azuma, T. (2004). Episodic memory reflected in printed word naming. Psychonomic Bulletin & Review, 11, 716–722. doi: 10.3758/BF03196625

Golestani, N., Price, C. J., & Scott, S. K. (2011). Born with an ear for dialects? Structural plasticity in the expert phonetician brain. Journal of Neuroscience, 31, 4213–4220. doi: 10.1523/JNEUROSCI.3891-10.2011

Golestani, N., & Zatorre, R. J. (2009). Individual differences in the acquisition of second language phonology. Brain and Language, 109, 55–67. doi: 10.1016/j.bandl.2008.01.005

Gordon, E. E. (2007). Learning sequences of music. Chicago: GIA Publications.

Grebner, D. M., Parks, S. E., Bradley, D. L., Miksis-Olds, J. L., Capone, D. E., & Ford, J. K. (2011). Divergence of a stereotyped call in northern resident killer whales. Journal of the Acoustical Society of America, 129, 1067–1072. doi: 10.1121/1.3531842

Green, G. A. (1990). The effect of vocal modeling on pitch-matching accuracy of elementary schoolchildren. Journal of Research in Music Education, 38, 225–231. doi: 10.2307/3345186

Gregory, S. W., Jr., & Webster, S. (1996). A nonverbal signal in voices of interview partners effectively predicts communication accommodation and social status perceptions. Journal of Personality and Social Psychology, 70, 1231–1240. doi: 10.1037/0022-3514.70.6.1231

Grossberg, S., & Paine, R. W. (2000). A neural model of corticocerebellar interactions during attentive imitation and predictive learning of sequential handwriting movements. Neural Networks, 13, 999–1046. doi: 10.1016/S0893-6080(00)00065-4

Grossi, D., Marcone, R., Cinquegrana, T., & Gallucci, M. (2012). On the differential nature of induced and incidental echolalia in autism. Journal of Intellectual Disability Research. doi: 10.1111/j.1365-2788.2012.01579.x

Grush, R. (2004). The emulation theory of representation: Motor control, imagery, and perception. Behavioral and Brain Sciences, 27, 377–396.

Guenther, F. H. (1994). A neural network model of speech acquisition and motor equivalent speech production. Biological Cybernetics, 72, 43–53. doi: 10.1007/BF00206237

Guenther, F. H. (1995). Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production. Psychological Review, 102, 594–621. doi: 10.1037/0033-295X.102.3.594

Guenther, F. H. (2006). Cortical interactions underlying the production of speech sounds. Journal of Communication Disorders, 39, 350–365. doi: 10.1016/j.jcomdis.2006.06.013

Guinee, L. N., Chu, K., & Dorsey, E. M. (1983). Changes over time in the songs of known individual humpback whales (Megaptera novaeangliae). In R. Payne (Ed.), Communication and behavior of whales (pp. 59–80). Boulder, CO: Westview Press.

Halpern, A. R., & Bartlett, J. C. (2011). The persistence of musical memories: A descriptive study of earworms. Music Perception, 28, 425–432. doi: 10.1525/MP.2011.28.1.425

Harley, H. E. (2008). Whistle discrimination and categorization by the Atlantic bottlenose dolphin (Tursiops truncatus): A review of the signature whistle framework and a perceptual test. Behavioural Processes, 77, 243–268. doi: 10.1016/j.beproc.2007.11.002

Hauser, M. D. (2009). The illusion of biological variation: A minimalist approach to the mind. In M. Piattelli-Palmarini, J. Uriagereka, & P. Salaburu (Eds.), Of minds and language: A dialogue with Noam Chomsky in the Basque Country (pp. 299–328). Oxford, UK: Oxford University Press.

Herman, L. M. (1980). Cognitive characteristics of dolphins. In L. M. Herman (Ed.), Cetacean behavior: Mechanisms and functions (pp. 363–429). New York: Wiley Interscience.

Herman, L. M. (2002). Vocal, social, and self-imitation by bottlenosed dolphins. In K. Dautenhahn & C. Nehaniv (Eds.), Imitation in animals and artifacts (pp. 63–108). Cambridge, MA: MIT Press.

Herman, L. M., & Tavolga, W. N. (1980). The communication systems of cetaceans. In L. M. Herman (Ed.), Cetacean behavior: Mechanisms and functions (pp. 149–209). New York: Wiley Interscience.

Heyes, C. M. (1994). Social learning in animals: Categories and mechanisms. Biological Reviews, 69, 207–231. doi: 10.1111/j.1469-185X.1994.tb01506.x

Heyes, C. M. (1996). Genuine imitation. In C. Heyes & B. G. Galef (Eds.), Social learning in animals: The roots of culture (pp. 371–389). New York: Academic Press.

Heyes, C. M. (2011). Automatic imitation. Psychological Bulletin, 137, 463–483. doi: 10.1037/a0022288

Hoelzel, A. R., & Osborne, R. (1986). Killer whale call characteristics: Implications for cooperative foraging strategies. In B. C. Kirkevold & J. S. Lockard (Eds.), Behavioral biology of killer whales (pp. 373–403). New York: A. R. Liss.

Honorof, D. N., Weihing, J., & Fowler, C. A. (2011). Articulatory events are imitated under rapid shadowing. Journal of Phonetics, 39, 18–38. doi: 10.1016/j.wocn.2010.10.007

Hooper, S., Reiss, D., Carter, M., & McCowan, B. (2006). Importance of contextual saliency on vocal imitation by bottlenose dolphins. International Journal of Comparative Psychology, 19, 116–128.

Hopkins, W. D., Taglialatela, J., & Leavens, D. A. (2007). Chimpanzees differentially produce novel vocalizations to capture the attention of a human. Animal Behaviour, 73, 281–286. doi: 10.1016/j.anbehav.2006.08.004

Hoppe, D., Sadakata, M., & Desain, P. (2006). Development of real-time visual feedback assistance in singing training: A review. Journal of Computer Assisted Learning, 22, 308–316. doi: 10.1111/j.1365-2729.2006.00178.x

Hurley, S. (2008). The shared circuits model (SCM): How control, mirroring, and simulation can enable imitation, deliberation, and mindreading. Behavioral and Brain Sciences, 31, 1–22. doi: 10.1017/S0140525X07003123

Hutchins, S., & Peretz, I. (2012). Amusics can imitate what they cannot discriminate. Brain and Language, 123, 234–239. doi: 10.1016/j.bandl.2012.09.011

Hutchins, S., Zarate, J. M., Zatorre, R. J., & Peretz, I. (2010). An acoustical study of vocal pitch matching in congenital amusia. Journal of the Acoustical Society of America, 127, 504–512. doi: 10.1121/1.3270391

Immelmann, K., & Beer, C. (1989). A dictionary of ethology. Cambridge, MA: Harvard University Press.

Ingvalson, E. M., Holt, L. L., & McClelland, J. L. (2012). Can native Japanese listeners learn to differentiate /r-l/ on the basis of F3 onset frequency? Bilingualism: Language and Cognition, 15, 434–435. doi: 10.1017/S1366728912000041

Jaakkola, K., Guarino, E., & Rodriguez, M. (2010). Blindfolded imitation in a bottlenose dolphin (Tursiops truncatus). International Journal of Comparative Psychology, 23, 671–688.

James, W. (1890). The principles of psychology. New York, NY: Dover.

Janik, V. M. (1999). Origins and implications of vocal learning in bottlenose dolphins. In H. O. Box & K. R. Gibson (Eds.), Mammalian social learning: Comparative and ecological perspectives (pp. 308–326). Cambridge: Cambridge University Press.

Janik, V. M. (2000). Whistle matching in wild bottlenose dolphins (Tursiops truncatus). Science, 289, 1355–1357. doi: 10.1126/science.289.5483.1355

Janik, V. M. (2009a). Acoustic communication in delphinids. Advances in the Study of Behavior, 40, 123–158. doi: 10.1016/S0065-3454(09)40004-4

Janik, V. M. (2009b). Whale song. Current Biology, 19, R109–111. doi: 10.1016/j.cub.2008.11.026

Janik, V. M., & Sayigh, L. S. (2013). Communication in bottlenose dolphins: 50 years of signature whistle research. Journal of Comparative Physiology A, 199, 479–489. doi: 10.1007/s00359-013-0817-7

Janik, V. M., Sayigh, L. S., & Wells, R. S. (2006). Signature whistle shape conveys identity information to bottlenose dolphins. Proceedings of the National Academy of Sciences, USA, 103, 8293–8297. doi: 10.1073/pnas.0509918103

Janik, V. M., & Slater, P. J. B. (1997). Vocal learning in mammals. Advances in the Study of Behavior, 26, 59–99. doi: 10.1016/S0065-3454(08)60377-0

Janik, V. M., & Slater, P. J. B. (2000). The different roles of social learning in vocal communication. Animal Behaviour, 60, 1–11. doi: 10.1006/anbe.2000.1410

Jarvis, E. D. (2004). Learned birdsong and the neurobiology of human language. Annals of the New York Academy of Sciences, 1016, 749–777. doi: 10.1196/annals.1298.038

Jarvis, E. D. (2013). Evolution of brain pathways for vocal learning in birds and humans. In J. J. Bolhuis & M. Everaert (Eds.), Birdsong, speech, and language: Exploring the evolution of mind and brain (pp. 63–107). Cambridge, MA: MIT Press.

Jeannerod, M. (2006). Motor cognition: What actions tell the self. Oxford: Oxford University Press.

Johnson, H. M. (1912). The talking dog. Science, 35, 749–751. doi: 10.1126/science.35.906.749

Jones, S. S. (2006). Infants learn to imitate by being imitated. In C. Yu, L. B. Smith & O. Sporns (Eds.), Proceedings of the International Conference on Development and Learning. Bloomington, IN: Indiana University.

Jones, S. S. (2007). Imitation in infancy: The development of mimicry. Psychological Science, 18, 593–599. doi: 10.1111/j.1467-9280.2007.01945.x

Kappes, J., Baumgaertner, A., Peschke, C., & Ziegler, W. (2009). Unintended imitation in nonword repetition. Brain and Language, 111, 140–151. doi: 10.1016/j.bandl.2009.08.008

Karlsen, J. D., Bisther, A., Lydersen, C., Haug, T., & Kovacs, K. M. (2002). Summer vocalisations of adult male white whales (Delphinapterus leucas) in Svalbard, Norway. Polar Biology, 25, 808–817. doi: 10.1007/s00300-002-0415-6

Kelley, L. A., & Healy, S. D. (2010). Vocal mimicry in male bowerbirds: Who learns from whom? Biology Letters, 6, 626–629. doi: 10.1098/rsbl.2010.0093

Kelley, L. A., & Healy, S. D. (2011). Vocal mimicry. Current Biology, 21(1), R9–10. doi: 10.1016/j.cub.2010.11.026

Killebrew, D. A., Mercado, E., III, Herman, L. M., & Pack, A. A. (2001). Sound production of a neonate bottlenose dolphin. Aquatic Mammals, 27, 34–44.

King, S. L., Sayigh, L. S., Wells, R. S., Fellner, W., & Janik, V. M. (2013). Vocal copying of individually distinctive signature whistles in bottlenose dolphins. Proceedings of the Royal Society B – Biological Sciences, 280, 20130053. doi: 10.1098/rspb.2013.0053

Knoblich, G., & Jordan, J. S. (2003). Action coordination in groups and individuals: Learning anticipatory control. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1006–1016. doi: 10.1037/0278-7393.29.5.1006

Koda, H., Oyakawa, C., Kato, A., & Masataka, N. (2007). Experimental evidence for the volitional control of vocal production in an immature gibbon. Behaviour, 144, 681–692. doi: 10.1163/156853907781347817

Kojima, S. (2003). A search for the origins of human speech: Auditory and vocal functions of the chimpanzee. Victoria, Australia: Trans Pacific Press.

Konishi, M. (1965). The role of auditory feedback in the control of vocalization in the white-crowned sparrow. Zeitschrift fur Tierpsychologie, 22, 770–783.

Kremers, D., Jaramillo, M. B., Boye, M., Lemasson, A., & Hausberger, M. (2011). Do dolphins rehearse show-stimuli when at rest? Delayed matching of auditory memory. Frontiers in Psychology, 2, 386. doi: 10.3389/fpsyg.2011.00386

Kremers, D., Lemasson, A., Almunia, J., & Wanker, R. (2012). Vocal sharing and individual acoustic distinctiveness within a group of captive orcas (Orcinus orca). Journal of Comparative Psychology, 126, 433–445. doi: 10.1037/a0028858

Kroger, B. J., Kannampuzha, J., & Neuschaefer-Rube, C. (2009). Towards a neurocomputational model of speech production and perception. Speech Communication, 51, 793–809. doi: 10.1016/j.specom.2008.08.002

Kuczaj, S. A., II, & Yeater, D. B. (2006). Dolphin imitation: Who, what, when, and why. Aquatic Mammals, 32, 413–422. doi: 10.1578/AM.32.4.2006.413

Kymissis, E., & Poulson, C. L. (1990). The history of imitation in learning theory: The language acquisition process. Journal of the Experimental Analysis of Behavior, 54, 113–127. doi: 10.1901/jeab.1990.54-113

Lachlan, R. F., & Slater, P. J. B. (1999). The maintenance of vocal learning by gene-culture interaction: The cultural trap hypothesis. Proceedings of the Royal Society B – Biological Sciences, 266, 701–706.

Lakin, J. L., & Chartrand, T. L. (2003). Using nonconscious behavioral mimicry to create affiliation and rapport. Psychological Science, 14, 334–339. doi: 10.1111/1467-9280.14481

Lakin, J. L., Chartrand, T. L., & Arkin, R. M. (2008). I am too just like you: Nonconscious mimicry as an automatic behavioral response to social exclusion. Psychological Science, 19, 816–822. doi: 10.1111/j.1467-9280.2008.02162.x

Lameira, A. R., Hardus, M. E., Kowalsky, B., de Vries, H., Spruijt, B. M., Sterck, E. H., . . . Wich, S. A. (2013). Orangutan (Pongo spp.) whistling and implications for the emergence of an open-ended call repertoire: A replication and extension. Journal of the Acoustical Society of America, 134, 2326–2335. doi: 10.1121/1.4817929

Legerstee, M. (1990). Infants use multimodal information to imitate speech sounds. Infant Behavior and Development, 13, 343–354. doi: 10.1016/0163-6383(90)90039-B

Levelt, W. J. M., & Kelter, S. (1982). Surface form and memory in question answering. Cognitive Psychology, 14, 78–106. doi: 10.1016/0010-0285(82)90005-6

Lévêque, Y., Giovanni, A., & Schön, D. (2012). Pitch-matching in poor singers: Human model advantage. Journal of Voice, 26, 293–298. doi: 10.1016/j.jvoice.2011.04.001

Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21, 1–36. doi: 10.1016/0010-0277(85)90021-6

Lieberman, P. (2012). Vocal tract anatomy and the neural bases of talking. Journal of Phonetics, 40, 608–622. doi: 10.1016/j.wocn.2012.04.001

Lilly, J. C. (1958). Some considerations regarding basic mechanisms of positive and negative types of motivation. American Journal of Psychiatry, 115, 498–504.

Lilly, J. C. (1961). Man and dolphin. New York: Doubleday.

Lilly, J. C. (1963). Productive and creative research with man and dolphin. Archives of General Psychiatry, 8, 111–116.

Lilly, J. C. (1965). Vocal mimicry in Tursiops: Ability to match numbers and durations of human vocal bursts. Science, 147, 300–301. doi: 10.1126/science.147.3655.300

Lilly, J. C. (1967). Dolphin’s mimicry as a unique ability and a step towards understanding. In K. Salzinger & S. Salzinger (Eds.), Research in verbal behavior and some neurophysiological implications (pp. 21–27). New York: Academic Press.

Lilly, J. C. (1968). Sound production in Tursiops truncatus (bottlenose dolphin). Annals of the New York Academy of Sciences, 155, 321–341. doi: 10.1111/j.1749-6632.1968.tb56778.x

Lilly, J. C., Miller, A. M., & Truby, H. M. (1968). Reprogramming of the sonic output of the dolphin: Sonic burst count matching. Journal of the Acoustical Society of America, 43, 1412–1424. doi: 10.1121/1.1911001

Lim, S. J., & Holt, L. L. (2011). Learning foreign sounds in an alien world: Videogame training improves non-native speech categorization. Cognitive Science, 35, 1390–1405. doi: 10.1111/j.1551-6709.2011.01192.x

Lindblom, B. (1996). Role of articulation in speech perception: Clues from production. Journal of the Acoustical Society of America, 99, 1683–1692. doi: 10.1121/1.414691

Lipkind, D., Marcus, G. F., Bemis, D. K., Sasahara, K., Jacoby, N., Takahasi, M., . . . Tchernichovski, O. (2013). Stepwise acquisition of vocal combinatorial capacity in songbirds and human infants. Nature, 498, 104–108. doi: 10.1038/nature12173

Little, A. D., Mershon, D. H., & Cox, P. H. (1992). Spectral content as a cue to perceived auditory distance. Perception, 21, 405–416. doi: 10.1068/p210405

Loehr, J. D., Kourtis, D., Vesper, C., Sebanz, N., & Knoblich, G. (2013). Monitoring individual and joint action outcomes in duet music performance. Journal of Cognitive Neuroscience. doi: 10.1162/jocn_a_00388

Madsen, P. T., Jensen, F. H., Carder, D., & Ridgway, S. (2012). Dolphin whistles: A functional misnomer revealed by heliox breathing. Biology Letters, 8, 211–213. doi: 10.1098/rsbl.2011.0701

Majewski, W., & Staroniewicz, P. (2011). Imitation of target speakers by different types of impersonators. In A. Esposito, A. Vinciarelli, K. Vicsi, C. Pelachaud, & A. Nijholt (Eds.), Analysis of verbal and nonverbal communication and enactment: The processing issues (Vol. 6800, pp. 104–112). Berlin: Springer.

Mantell, J. T., & Pfordresher, P. Q. (2013). Vocal imitation of song and speech. Cognition, 127, 177–202. doi: 10.1016/j.cognition.2012.12.008

Margoliash, D. (2002). Evaluating theories of bird song learning: Implications for future directions. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 188, 851–866. doi: 10.1007/s00359-002-0351-5

Marino, L., Connor, R. C., Fordyce, R. E., Herman, L. M., Hof, P. R., Lefebvre, L., . . . Whitehead, H. (2007). Cetaceans have complex brains for complex cognition. PLoS Biology, 5, 966–972. doi: 10.1371/journal.pbio.0050139

Marler, P. (1970). Birdsong and speech development: Could there be parallels? American Scientist, 58, 669–673.

Marler, P. (1976a). An ethological theory of the origin of vocal learning. Annals of the New York Academy of Sciences, 280, 386–395. doi: 10.1111/j.1749-6632.1976.tb25503.x

Marler, P. (1976b). Sensory templates in species-specific behavior. In J. C. Fentress (Ed.), Simpler networks and behavior (pp. 314–329). Sunderland, MA: Sinauer.

Marler, P. (1997). Three models of song learning: Evidence from behavior. Journal of Neurobiology, 33, 501–516. doi: 10.1002/(SICI)1097-4695(19971105)33:53.0.CO;2-8

Marshall, A. J., Wrangham, R. W., & Arcadi, A. C. (1999). Does learning affect the structure of vocalizations in chimpanzees? Animal Behaviour, 58, 825–830. doi: 10.1006/anbe.1999.1219

Masataka, N. (2003). The onset of language. Cambridge: Cambridge University Press.

May-Collado, L. J. (2010). Changes in whistle structure of two dolphin species during interspecific associations. Ethology, 116, 1065–1074. doi: 10.1111/j.1439-0310.2010.01828.x

McCowan, B., & Reiss, D. (1995). Whistle contour development in captive-born infant bottlenose dolphins (Tursiops truncatus): Role of learning. Journal of Comparative Psychology, 109, 242–260. doi: 10.1037//0735-7036.109.3.242

McCowan, B., & Reiss, D. (1997). Vocal learning in captive bottlenose dolphins: A comparison with humans and nonhuman animals. In C. T. Snowdon & M. Hausberger (Eds.), Social influences on vocal development (pp. 178–207). Cambridge: Cambridge University Press.

McCowan, B., & Reiss, D. (2001). The fallacy of ‘signature whistles’ in bottlenose dolphins: A comparative perspective of ‘signature information’ in animal vocalizations. Animal Behaviour, 62, 1151–1162. doi: 10.1006/anbe.2001.1846

McGregor, P., Horn, A. G., & Todd, M. A. (1985). Are familiar sounds ranged more accurately? Perceptual and Motor Skills, 61, 1082.

McPherson, G. E., & Gabrielsson, A. (2002). From sound to sign. In R. Parncutt & G. E. McPherson (Eds.), The science and psychology of music performance: Creative strategies for teaching and learning (pp. 99–116). New York: Oxford University Press.

Mercado, E., III. (2008). Neural and cognitive plasticity: From maps to minds. Psychological Bulletin, 134, 109–137. doi: 10.1037/0033-2909.134.1.109

Mercado, E., III, & DeLong, C. M. (2010). Dolphin cognition: Representations and processes in memory and perception. International Journal of Comparative Psychology, 23, 344–378.

Mercado, E., III, & Frazer, L. N. (1999). Environmental constraints on sound transmission by humpback whales. Journal of the Acoustical Society of America, 106, 3004–3016. doi: 10.1121/1.428120

Mercado, E., III, & Frazer, L. N. (2001). Humpback whale song or humpback whale sonar? A reply to Au et al. IEEE Journal of Oceanic Engineering, 26, 406–415. doi: 10.1109/48.946514

Mercado, E., III, Herman, L. M., & Pack, A. A. (2005). Song copying by humpback whales: Themes and variations. Animal Cognition, 8, 93–102. doi: 10.1007/s10071-004-0238-7

Mercado, E., III, Murray, S. O., Uyeyama, R. K., Pack, A. A., & Herman, L. M. (1998). Memory for recent actions in the bottlenosed dolphin (Tursiops truncatus): Repetition of arbitrary behaviors using an abstract rule. Animal Learning & Behavior, 26, 210–218. doi: 10.3758/BF03199213

Mercado, E., III, Schneider, J. N., Pack, A. A., & Herman, L. M. (2010). Sound production by singing humpback whales. Journal of the Acoustical Society of America, 127, 2678–2691. doi: 10.1121/1.3309453

Mercado, E., III, Uyeyama, R. K., Pack, A. A., & Herman, L. M. (1999). Memory for action events in the bottlenosed dolphin. Animal Cognition, 2, 17–25. doi: 10.1007/s100710050021

Miklosi, A. (1999). The ethological analysis of imitation. Biological Reviews, 74, 347–374. doi: 10.1017/S000632319900537X

Miksis, J. L., Tyack, P. L., & Buck, J. R. (2002). Captive dolphins, Tursiops truncatus, develop signature whistles that match acoustic features of human-made model sounds. Journal of the Acoustical Society of America, 112, 728–739. doi: 10.1121/1.1496079

Miller, N. E., & Dollard, J. (1941). Social learning and imitation. New Haven: Yale University Press.

Miller, P. J. O., Shapiro, A. D., Tyack, P. L., & Solow, A. R. (2004). Call-type matching in vocal exchanges of free-ranging resident killer whales, Orcinus orca. Animal Behaviour, 67, 1099–1107. doi: 10.1016/j.anbehav.2003.06.017

Miller, R., Sanchez, K., & Rosenblum, L. (2010). Alignment to visual speech information. Attention, Perception, & Psychophysics, 72, 1614–1625. doi: 10.3758/APP.72.6.1614

Mithen, S. (2009). The music instinct: The evolutionary basis of musicality. The Neurosciences and Music III – Disorders and Plasticity: Annals of the New York Academy of Sciences, 1169, 3–12. doi: 10.1111/j.1749-6632.2009.04590.x

Mitterer, H., & Ernestus, M. (2008). The link between speech perception and production is phonological and abstract: Evidence from the shadowing task. Cognition, 109, 168–173. doi: 10.1016/j.cognition.2008.08.002

Molenberghs, P., Cunnington, R., & Mattingley, J. B. (2009). Is the mirror neuron system involved in imitation? A short review and meta-analysis. Neuroscience and Biobehavioral Reviews, 33, 975–980. doi: 10.1016/j.neubiorev.2009.03.010

Molles, L. E., & Vehrencamp, S. L. (1999). Repertoire size, repertoire overlap, and singing modes in the banded wren (Thryothorus pleurostictus). Auk, 116, 677–689.

Molliver, M. E. (1963). Operant control of vocal behavior in the cat. Journal of the Experimental Analysis of Behavior, 6, 197–202. doi: 10.1901/jeab.1963.6-197

Moore, B. R. (1992). Avian movement imitation and a new form of mimicry: Tracing the evolution of a complex form of learning. Behaviour, 122, 231–263. doi: 10.1163/156853992X00525

Moore, B. R. (2004). The evolution of learning. Biological Reviews, 79, 301–335. doi: 10.1017/S1464793103006225

Moore, R., Estis, J., Gordon-Hickey, S., & Watts, C. (2008). Pitch discrimination and pitch matching abilities with vocal and nonvocal stimuli. Journal of Voice, 22, 399–407. doi: 10.1016/j.jvoice.2009.10.010

Morgan, C. L. (1896). Habit and instinct. London: Arnold.

Morton, E. S. (1982). Grading, discreteness, redundancy, and motivation-structural rules. In D. E. Kroodsma & E. H. Miller (Eds.), Acoustic communication in birds (pp. 183–212). New York: Academic Press.

Morton, E. S. (1986). Predictions from the ranging hypothesis for the evolution of long distance signals in birds. Behaviour, 99, 65–86. doi: 10.1163/156853986X00414

Morton, E. S. (1996). Why songbirds learn songs: An arms race over ranging? Poultry and Avian Biology Reviews, 7, 65–71.

Morton, E. S. (2012). Putting distance back into bird song with mirror neurons. Auk, 129, 560–564. doi: 10.1525/auk.2012.12072

Morton, E. S., Howlett, J., Kopysh, N. C., & Chiver, I. (2006). Song ranging by incubating male Blue-headed Vireos: The importance of song representation in repertoires and implications for song delivery patterns and local/foreign dialect discrimination. Journal of Field Ornithology, 77, 291–301. doi: 10.1111/j.1557-9263.2006.00055.x

Möttönen, R., Dutton, R., & Watkins, K. E. (2013). Auditory-motor processing of speech sounds. Cerebral Cortex, 23, 1190–1197. doi: 10.1093/cercor/bhs110

Mowrer, O. H. (1952). The autism theory of speech development and some clinical applications. Journal of Speech and Hearing Disorders, 17, 263–268.

Mowrer, O. H. (1960). Learning theory and the symbolic processes. New York: John Wiley.

Mürbe, D., Friedmann, P., Hofmann, G., & Sundberg, J. (2002). Significance of auditory and kinesthetic feedback to singers’ pitch control. Journal of Voice, 16, 44–51. doi: 10.1016/S0892-1997(02)00071-1

Murray, S. O., Mercado, E., & Roitblat, H. L. (1998). Characterizing the graded structure of false killer whale (Pseudorca crassidens) vocalizations. Journal of the Acoustical Society of America, 104, 1679–1688. doi: 10.1121/1.424380

Myers, S. A., Horel, J. A., & Pennypacker, H. S. (1965). Operant control of vocal behavior in the monkey. Psychonomic Science, 3, 389–390.

Naguib, M., & Wiley, H. (2001). Estimating the distance to a source of sound: Mechanisms and adaptations for long-range communication. Animal Behaviour, 62, 825–837. doi: 10.1006/anbe.2001.1860

Namy, L. L., Nygaard, L. C., & Sauerteig, D. (2002). Gender differences in vocal accommodation: The role of perception. Journal of Language and Social Psychology, 21, 422–432. doi: 10.1177/026192702237958

Nattkemper, D., Ziessler, M., & Frensch, P. A. (2010). Binding in voluntary action control. Neuroscience and Biobehavioral Reviews, 34, 1092–1101. doi: 10.1016/j.neubiorev.2009.12.013

Neumann, R., & Strack, F. (2000). “Mood contagion”: The automatic transfer of mood between persons. Journal of Personality and Social Psychology, 79, 211–223. doi: 10.1037//0022-3514.79.2.211

Nielsen, K. (2011). Specificity and abstractness of VOT imitation. Journal of Phonetics, 39, 132–142. doi: 10.1016/j.wocn.2010.12.007

Noad, M. J., Cato, D. H., Bryden, M. M., Jenner, M. N., & Jenner, K. C. (2000). Cultural revolution in whale songs. Nature, 408, 537. doi: 10.1038/35046199

Nottebohm, F., & Liu, W. C. (2010). The origins of vocal learning: New sounds, new circuits, new cells. Brain and Language, 115, 3–17. doi: 10.1016/j.bandl.2010.05.002

Ocampo, B., & Kritikos, A. (2011). Interpreting actions: The goal behind mirror neuron function. Brain Research Reviews, 67, 260–267. doi: 10.1016/j.brainresrev.2011.03.001

Owren, M. J., Amoss, R. T., & Rendall, D. (2011). Two organizing principles of vocal production: Implications for nonhuman and human primates. American Journal of Primatology, 73, 530–544. doi: 10.1002/ajp.20913

Palmer, C., & Drake, C. (1997). Monitoring and planning capacities in the acquisition of music performance skills. Canadian Journal of Experimental Psychology, 51, 369–384. doi: 10.1037/1196-1961.51.4.369

Panova, E. M., Belikov, R. A., Agafonov, A. V., & Bel’kovich, V. M. (2012). The relationship between the behavioral activity and the underwater vocalization of the beluga whale (Delphinapterus leucas). Oceanology, 52, 79–87. doi: 10.1134/S000143701201016X

Pardo, J. S., Gibbons, R., Suppes, A., & Krauss, R. M. (2012). Phonetic convergence in college roommates. Journal of Phonetics, 40, 190–197. doi: 10.1016/j.wocn.2011.10.001

Pardo, J. S., Jay, I. C., & Krauss, R. M. (2010). Conversational role influences speech imitation. Attention, Perception, & Psychophysics, 72, 2254–2264. doi: 10.3758/APP.72.8.2254

Parton, D. A. (1976). Learning to imitate in infancy. Child Development, 47, 14–31. doi: 10.1111/j.1467-8624.1976.tb03389.x

Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience, 6, 674–681. doi: 10.1038/nn1082

Patterson, D. K., & Pepperberg, I. M. (1994). A comparative study of human and parrot phonation: Acoustic and articulatory correlates of vowels. Journal of the Acoustical Society of America, 96, 634–648. doi: 10.1121/1.410303

Payne, K., & Payne, R. S. (1985). Large scale changes over 19 years in songs of humpback whales in Bermuda. Zeitschrift für Tierpsychologie, 68, 89–114.

Payne, K., Tyack, P., & Payne, R. S. (1983). Progressive changes in the songs of humpback whales (Megaptera novaeangliae): A detailed analysis of two seasons in Hawaii. In R. Payne (Ed.), Communication and behavior of whales (pp. 9–57). Boulder, CO: Westview Press.

Payne, R. S., & McVay, S. (1971). Songs of humpback whales. Science, 173, 585–597. doi: 10.1126/science.173.3997.585

Pepperberg, I. M. (1986). Social modeling theory: A possible framework for understanding avian learning. Auk, 102, 854–864.

Pepperberg, I. M. (2005). Insights into vocal imitation in African grey parrots (Psittacus erithacus). In S. Hurley & N. Chater (Eds.), Perspectives on imitation, vol 1: Mechanisms of imitation and imitation in animals (pp. 243–262). Cambridge, MA: MIT Press.

Pepperberg, I. M. (2010). Vocal learning in grey parrots: A brief review of perception, production, and cross-species comparisons. Brain and Language, 115, 81–91. doi: 10.1016/j.bandl.2009.11.002

Perelberg, A., & Schuster, R. (2008). Coordinated breathing in bottlenose dolphins (Tursiops truncatus) as cooperation: Integrating proximate and ultimate explanations. Journal of Comparative Psychology, 122, 109–120.

Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience, 6, 688–691. doi: 10.1038/nn1083

Petrinovich, L. (1988). The role of social factors in white-crowned sparrow song development. In T. R. Zentall & B. G. Galef (Eds.), Social learning: Psychological and biological perspectives (pp. 255–278). Hillsdale, NJ: Lawrence Erlbaum Associates.

Pfordresher, P. Q., & Brown, S. (2007). Poor-pitch singing in the absence of “tone deafness”. Music Perception, 25, 95–115. doi: 10.1525/MP.2007.25.2.95

Pfordresher, P. Q., & Halpern, A. R. (2013). Auditory imagery and the poor-pitch singer. Psychonomic Bulletin & Review. doi: 10.3758/s13423-013-0401-8

Pfordresher, P. Q., & Mantell, J. T. (2009). Singing as a form of vocal imitation: Mechanisms and deficits. Paper presented at the Proceedings of the 7th Triennial Conference of the European Society for the Cognitive Sciences of Music.

Pfordresher, P. Q., & Mantell, J. T. (2012). Effects of altered auditory feedback across effector systems: Production of melodies by keyboard and singing. Acta Psychologica, 139, 166–177. doi: 10.1016/j.actpsy.2011.10.009

Pfordresher, P. Q., & Mantell, J. T. (2014). Singing with yourself: Evidence for an inverse modeling account of poor-pitch singing. Cognitive Psychology, 70, 31–57. doi: 10.1016/j.cogpsych.2013.12.005

Piaget, J. (1962). Play, dreams, and imitation in childhood. New York: W. W. Norton.

Pickering, M. J., & Branigan, H. P. (1999). Syntactic priming in language production. Trends in Cognitive Sciences, 3, 136–141. doi: 10.3389/fnhum.2012.00185

Pickering, M. J., & Garrod, S. (2006). Do people use language production to make predictions during comprehension? Trends in Cognitive Sciences, 11, 105–110. doi: 10.1016/j.tics.2006.12.002

Poole, J. H., Tyack, P. L., Stoeger-Horwath, A. S., & Watwood, S. (2005). Animal behaviour: Elephants are capable of vocal learning. Nature, 434, 455–456. doi: 10.1038/435042b

Popper, A. N., & Edds-Walton, P. L. (1997). Bioacoustics of marine vertebrates. In M. J. Crocker (Ed.), Encyclopedia of acoustics (pp. 1831–1836). New York: John Wiley & Sons.

Porter, R. J. J., & Lubker, J. F. (1980). Rapid reproduction of vowel-vowel sequences: Evidence for a fast and direct acoustic-motoric linkage in speech. Journal of Speech and Hearing Research, 23, 593–602.

Poulson, C. L., Kymissis, E., Reeve, K. F., Andreatos, M., & Reeve, L. (1991). Generalized vocal imitation in infants. Journal of Experimental Child Psychology, 51, 267–279. doi: 10.1016/0022-0965(91)90036-R

Poulson, C. L., Kyparissos, N., Andreatos, M., Kymissis, E., & Parnes, M. (2002). Generalized imitation within three response classes in typically developing infants. Journal of Experimental Child Psychology, 81, 341–357. doi: 10.1006/jecp.2002.2661

Price, C., & Griffiths, T. D. (2005). Speech-specific auditory processing: Where is it? Trends in Cognitive Sciences, 9, 271–276. doi: 10.1016/j.tics.2005.03.009

Price, H. E. (2000). Interval matching by undergraduate nonmusic majors. Journal of Research in Music Education, 48, 360–372. doi: 10.2307/3345369

Price, J. J., & Yuan, D. H. (2011). Song-type sharing and matching in a bird with very large song repertoires, the tropical mockingbird. Behaviour, 148, 673–689. doi: 10.1163/000579511X573908

Prizant, B. M., & Rydell, P. J. (1984). Analysis of functions of delayed echolalia. Journal of Speech and Hearing Research, 27, 183–192.

Quick, N. J., & Janik, V. M. (2012). Bottlenose dolphins exchange signature whistles when meeting at sea. Proceedings of the Royal Society B – Biological Sciences, 279, 2539–2545. doi: 10.1098/rspb.2011.2537

Reidenberg, J. S., & Laitman, J. T. (1988). Existence of vocal folds in the larynx of Odontoceti (toothed whales). Anatomical Record, 221, 884–891. doi: 10.1002/ar.1092210413

Reidenberg, J. S., & Laitman, J. T. (2007). Discovery of a low frequency sound source in Mysticeti (baleen whales): Anatomical establishment of a vocal fold homolog. Anatomical Record, 290, 745–759. doi: 10.1002/ar.20544

Reiss, D., & McCowan, B. (1993). Spontaneous vocal mimicry and production by bottlenose dolphins (Tursiops truncatus): Evidence for vocal learning. Journal of Comparative Psychology, 107, 301–312. doi: 10.1037/0735-7036.107.3.301

Reiterer, S. M., Hu, X., Erb, M., Rota, G., Nardo, D., Grodd, W., . . . Ackermann, H. (2011). Individual differences in audio-vocal speech imitation aptitude in late bilinguals: Functional neuro-imaging and brain morphology. Frontiers in Psychology, 2, 271. doi: 10.3389/fpsyg.2011.00271

Reiterer, S. M., Singh, N. C., & Winkler, S. (2012). Predicting speech imitation ability biometrically. In B. Stolterfoht & S. Featherston (Eds.), Empirical approaches to linguistic theory: Studies in meaning and structure (pp. 317–339). Berlin: De Gruyter.

Rendell, L., & Whitehead, H. (2001). Culture in whales and dolphins. Behavioral and Brain Sciences, 24, 309–324.

Rendell, L., & Whitehead, H. (2003). Vocal clans in sperm whales (Physeter macrocephalus). Proceedings of the Royal Society B – Biological Sciences, 270, 225–231. doi: 10.1098/rspb.2002.2239

Repp, B. H., & Williams, D. R. (1987). Categorical tendencies in imitating self-produced isolated vowels. Speech Communication, 6, 1–14. doi: 10.1016/0167-6393(87)90065-3

Revis, J., De Looze, C., & Giovanni, A. (2013). Vocal flexibility and prosodic strategies in a professional impersonator. Journal of Voice. doi: 10.1016/j.jvoice.2013.01.008

Richards, D. G. (1986). Dolphin vocal mimicry and vocal object labeling. In R. J. Schusterman, J. A. Thomas, & F. G. Wood (Eds.), Dolphin cognition and behavior: A comparative approach (pp. 273–288). Hillsdale, NJ: Lawrence Erlbaum Associates.

Richards, D. G., Wolz, J. P., & Herman, L. M. (1984). Vocal mimicry of computer-generated sounds and vocal labeling of objects by a bottlenosed dolphin, Tursiops truncatus. Journal of Comparative Psychology, 98, 10–28. doi: 10.1037/0735-7036.98.1.10

Ridgway, S., Carder, D., Jeffries, M., & Todd, M. (2012). Spontaneous human speech mimicry by a cetacean. Current Biology, 22, R860–861. doi: 10.1016/j.cub.2012.08.044

Riesch, R., Ford, J. K. B., & Thomsen, F. (2006). Stability and group specificity of stereotyped whistles in resident killer whales, Orcinus orca, off British Columbia. Animal Behaviour, 71, 79–91. doi: 10.1016/j.anbehav.2005.03.026

Roitblat, H. L. (1982). The meaning of representation in animal memory. Behavioral and Brain Sciences, 5, 353–372. doi: 10.1017/S0140525X00012486

Roitblat, H. L., & von Fersen, L. (1992). Comparative cognition: Representations and processes in learning and memory. Annual Review of Psychology, 43, 671–710.

Romanes, G. J. (1884). Mental evolution in animals. New York: D. Appleton & Co.

Rosenbaum, D. A., Carlson, R. A., & Gilmore, R. O. (2001). Acquisition of intellectual and perceptual-motor skills. Annual Review of Psychology, 52, 453–470. doi: 10.1146/annurev.psych.52.1.453

Rothenberg, D. (2008). Whale music: Anatomy of an interspecies duet. Leonardo Music Journal, 18, 47–53. doi: 10.1162/lmj.2008.18.47

Russell, J. L., Hopkins, W. D., & Taglialatela, J. P. (2012). Vocal learning in captive chimpanzees (Pan troglodytes): Evidence of flexibility and voluntary control. American Journal of Primatology, 74, 66.

Salzinger, K. (1993). Animal communication. In D. A. Dewsbury & D. A. Rethlingshafer (Eds.), Comparative psychology: A modern survey (pp. 161–193). New York: McGraw-Hill.

Salzinger, K., & Waller, B. W. (1962). The operant control of vocalization in the dog. Journal of the Experimental Analysis of Behavior, 5, 383–389.

Sayigh, L. S., Tyack, P. L., Wells, R. S., & Scott, M. D. (1990). Signature whistles of free-ranging bottlenose dolphins, Tursiops truncatus: Mother-offspring comparisons. Behavioral Ecology and Sociobiology, 26, 247–260.

Sayigh, L. S., Tyack, P. L., Wells, R. S., Scott, M. D., & Irvine, A. B. (1995). Sex differences in signature whistle production of free-ranging bottlenose dolphins, Tursiops truncatus. Behavioral Ecology and Sociobiology, 36, 171–177. doi: 10.1007/BF00177793

Sayigh, L. S., Tyack, P. L., Wells, R. S., Solow, A. R., Scott, M. D., & Irvine, A. B. (1999). Individual recognition in wild bottlenose dolphins: A field test using playback experiments. Animal Behaviour, 57, 41–50. doi: 10.1006/anbe.1998.0961

Schevill, W. E., & Lawrence, B. (1949). Listening to the white porpoise (Delphinapterus leucas). Science, 109, 143–144. doi: 10.1126/science.109.2824.143

Schuler, A. L. (1979). Echolalia: Issues and clinical applications. Journal of Speech and Hearing Disorders, 44, 411–434.

Schusterman, R. J. (2008). Vocal learning in mammals with special emphasis on pinnipeds. In D. K. Oller & U. Griebel (Eds.), The evolution of communicative flexibility: Complexity, creativity, and adaptability in human and animal communication (pp. 41–70). Cambridge, MA: MIT Press.

Schusterman, R. J., & Feinstein, S. H. (1965). Shaping and discriminative control of underwater click vocalizations in a California sea lion. Science, 150, 1743–1744. doi: 10.1126/science.150.3704.1743

Searcy, W. A., DuBois, A. L., Rivera-Caceres, K., & Nowicki, S. (2013). A test of a hierarchical signalling model in song sparrows. Animal Behaviour, 86, 309–315. doi: 10.1016/j.anbehav.2013.05.019

Sewall, K. (2012). Vocal matching in animals. American Scientist, 100, 306–315.

Shapiro, A. D., & Slater, P. J. B. (2004). Call usage learning in gray seals (Halichoerus grypus). Journal of Comparative Psychology, 118, 447–454. doi: 10.1037/0735-7036.118.4.447

Shettleworth, S. J. (1998). Cognition, evolution, and behavior. New York: Oxford University Press.

Shockley, K., Richardson, D. C., & Dale, R. (2009). Conversation and coordinative structures. Topics in Cognitive Science, 1, 305–319. doi: 10.1111/j.1756-8765.2009.01021.x

Shockley, K., Sabadini, L., & Fowler, C. A. (2004). Imitation in shadowing words. Perception & Psychophysics, 66, 422–429. doi: 10.3758/BF03194890

Shy, E., & Morton, E. S. (1986). The role of distance, familiarity, and time of day in Carolina Wren responses to conspecific songs. Behavioral Ecology and Sociobiology, 19, 393–400. doi: 10.1007/BF00300541

Sigurdson, J. (1993). Frequency-modulated whistles as a medium for communication with the bottlenose dolphin (Tursiops truncatus). In H. L. Roitblat, L. M. Herman, & P. E. Nachtigall (Eds.), Language and communication: Comparative perspectives (pp. 153–174). Hillsdale, NJ: Lawrence Erlbaum Associates.

Sjare, B. L., & Smith, T. G. (1986). The vocal repertoire of white whales, Delphinapterus leucas, summering in Cunningham Inlet, Northwest Territories. Canadian Journal of Zoology, 64, 407–415. doi: 10.1139/z86-063

Skoyles, J. R. (1998). Speech phones are a replication code. Medical Hypotheses, 50, 167–173. doi: 10.1016/S0306-9877(98)90203-1

Smith, J. N., Goldizen, A. W., Dunlop, R. A., & Noad, M. J. (2008). Songs of male humpback whales, Megaptera novaeangliae, are involved in intersexual interactions. Animal Behaviour, 76, 467–477. doi: 10.1016/j.anbehav.2008.02.013

Smotherman, M. S. (2007). Sensory feedback control of mammalian vocalizations. Behavioural Brain Research, 182, 315–326. doi: 10.1016/j.bbr.2007.03.008

Stimpert, A. K., Peavey, L. E., Friedlaender, A. S., & Nowacek, D. P. (2012). Humpback whale song and foraging behavior on an Antarctic feeding ground. PLoS One, 7(12), e51214. doi: 10.1371/journal.pone.0051214

Stimpert, A. K., Wiley, D. N., Au, W. W., Johnson, M. P., & Arsenault, R. (2007). ‘Megapclicks’: Acoustic click trains and buzzes produced during night-time foraging of humpback whales (Megaptera novaeangliae). Biology Letters, 3, 467–470. doi: 10.1098/rsbl.2007.0281

Stoeger, A. S., Mietchen, D., Oh, S., de Silva, S., Herbst, C. T., Kwon, S., & Fitch, W. T. (2012). An Asian elephant imitates human speech. Current Biology, 22, 2144–2148. doi: 10.1016/j.cub.2012.09.022

Strager, H. (1995). Pod specific call repertoires and compound calls of killer whales, Orcinus orca, Linnaeus, 1758, in waters of northern Norway. Canadian Journal of Zoology, 73, 1037–1047. doi: 10.1139/z95-124

Studdert-Kennedy, M. (2000). Imitation and the emergence of segments. Phonetica, 57, 275–283. doi: 10.1159/000028480

Subiaul, F. (2010). Dissecting the imitation faculty: The multiple imitation mechanisms (MIM) hypothesis. Behavioural Processes, 83, 222–234. doi: 10.1016/j.beproc.2009.12.002

Subiaul, F., Anderson, S., Brandt, J., & Elkins, J. (2012). Multiple imitation mechanisms in children. Developmental Psychology, 48, 1165–1179. doi: 10.1037/a0026646

Taglialatela, J. P., Reamer, L., Schapiro, S. J., & Hopkins, W. D. (2012). Social learning of a communicative signal in captive chimpanzees. Biology Letters, 8, 498–501. doi: 10.1098/rsbl.2012.0113

Tayler, C. K., & Saayman, G. S. (1973). Imitative behavior by Indian Ocean bottlenose dolphins (Tursiops aduncus) in captivity. Behaviour, 44, 286–298.

Tchernichovski, O., Mitra, P. P., Lints, T., & Nottebohm, F. (2001). Dynamics of the vocal imitation process: How a zebra finch learns its song. Science, 291, 2564–2569. doi: 10.1126/science.1058522

Thomsen, F., Franck, D., & Ford, J. K. (2002). On the communicative significance of whistles in wild killer whales (Orcinus orca). Naturwissenschaften, 89, 404–407. doi: 10.1007/s00114-002-0351-x

Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New York: Hafner Publishing.

Thorpe, W. H. (1956). Learning and instinct in animals. London: Methuen and Co.

Thorpe, W. H. (1969). The significance of vocal imitation in animals with special reference to birds. Acta Biologiae Experimentalis, 29, 251–269.

Thorpe, W. H., & North, M. E. W. (1965). Origin and significance of the power of vocal imitation: With special reference to the antiphonal singing of birds. Nature, 208, 219–222. doi: 10.1038/208219a0

Tourville, J. A., & Guenther, F. H. (2011). The DIVA model: A neural theory of speech acquisition and production. Language and Cognitive Processes, 26, 952–981. doi: 10.1080/01690960903498424

Troyer, T. W., & Doupe, A. J. (2000a). An associational model of birdsong sensorimotor learning I. Efference copy and the learning of song syllables. Journal of Neurophysiology, 84, 1204–1223.

Troyer, T. W., & Doupe, A. J. (2000b). An associational model of birdsong sensorimotor learning II. Temporal hierarchies and the learning of song sequence. Journal of Neurophysiology, 84, 1224–1239.

Tulving, E. (2002). Episodic memory: From mind to brain. Annual Review of Psychology, 53, 1–25. doi: 10.1146/annurev.psych.53.100901.135114

Tyack, P. L. (1986). Whistle repertoires of two bottlenosed dolphins, Tursiops truncatus: Mimicry of signature whistles? Behavioral Ecology and Sociobiology, 18, 251–257. doi: 10.1007/BF00300001

Tyack, P. L. (1991). Use of a telemetry device to identify which dolphin produces a sound. In K. Pryor & K. S. Norris (Eds.), Dolphin societies: Discoveries and puzzles (pp. 319–344). Berkeley: University of California Press.

Tyack, P. L. (2000). Functional aspects of cetacean communication. In J. Mann, R. C. Connor, P. L. Tyack, & H. Whitehead (Eds.), Cetacean societies: Field studies of dolphins and whales (pp. 270–307). Chicago: University of Chicago Press.

Tyack, P. L. (2008). Convergence of calls as animals form social bonds, active compensation for noisy communication channels, and the evolution of vocal learning in mammals. Journal of Comparative Psychology, 122, 319–331. doi: 10.1037/a0013087

Tyack, P. L., & Clark, C. W. (2000). Communication and acoustic behavior of whales and dolphins. In W. W. L. Au, A. N. Popper, & R. R. Fay (Eds.), Hearing by whales and dolphins (pp. 156–224). New York: Springer.

Tyack, P. L., & Sayigh, L. S. (1997). Vocal learning in cetaceans. In C. T. Snowdon & M. Hausberger (Eds.), Social influences on vocal development (pp. 208–233). Cambridge: Cambridge University Press.

Vallabha, G. K., & Tuller, B. (2004). Perceptuomotor bias in the imitation of steady-state vowels. Journal of the Acoustical Society of America, 116, 1184–1197. doi: 10.1121/1.1764832

van Heel, W. H. D., Kamminga, C., & van der Toorn, J. D. (1982). An experiment in two-way communication in Orcinus orca L. Aquatic Mammals, 9, 69–82.

van Santen, J. P. H., Sproat, R. W., & Hill, A. P. (2013). Quantifying repetitive speech in autism spectrum disorders and language impairment. Autism Research, 6, 372–383. doi: 10.1002/aur.1301

Vergara, V., & Barrett-Lennard, L. G. (2008). Vocal development in a beluga calf (Delphinapterus leucas). Aquatic Mammals, 34, 123–143. doi: 10.1578/AM.34.1.2008.123

Vesper, C., van der Wel, R. P., Knoblich, G., & Sebanz, N. (2013). Are you ready to jump? Predictive mechanisms in interpersonal coordination. Journal of Experimental Psychology: Human Perception and Performance, 39, 48–61. doi: 10.1037/a0028066

Wang, D., Yan, N., & Ng, L. (2012). Effects of augmented auditory feedback on pitch production accuracy in singing. In E. Cambouropoulos, C. Tsougras, P. Mavromatis, & K. Pastiadis (Eds.), Proceedings of the 12th International Conference on Music Perception and Cognition (pp. 1116–1119). Thessaloniki, Greece: Aristotle University of Thessaloniki.

Ward, W. D., & Burns, E. M. (1978). Singing without auditory feedback. Journal of Research in Singing, 1, 24–44.

Watts, C. R., & Hall, M. D. (2008). Timbral influences on vocal pitch-matching accuracy. Logopedics Phoniatrics Vocology, 33, 74–82. doi: 10.1080/14015430802028434

Watwood, S. L., Tyack, P. L., & Wells, R. S. (2004). Whistle sharing in paired male bottlenose dolphins, Tursiops truncatus. Behavioral Ecology and Sociobiology, 55, 531–543. doi: 10.1007/s00265-003-0724-y

Weiß, B. M., Symonds, H., Spong, P., & Ladich, F. (2011). Call sharing across vocal clans of killer whales: Evidence for vocal imitation. Marine Mammal Science, 27, E1–E13. doi: 10.1111/j.1748-7692.2010.00397.x

Welch, G. F. (1979). Vocal range and poor pitch singing. Psychology of Music, 7, 13–31. doi: 10.1177/030573567972002

Westermann, G., & Reck Miranda, E. (2004). A new model of sensorimotor coupling in the development of speech. Brain and Language, 89, 393–400. doi: 10.1016/S0093-934X(03)00345-6

Whiten, A., & Ham, R. (1992). On the nature and evolution of imitation in the animal kingdom: Reappraisal of a century of research. Advances in the Study of Behavior, 21, 239–283.

Wich, S. A., Swartz, K. B., Hardus, M. E., Lameira, A. R., Stromberg, E., & Shumaker, R. W. (2009). A case of spontaneous acquisition of a human sound by an orangutan. Primates, 50, 56–64. doi: 10.1007/s10329-008-0117-y

Wickler, W. (2013). Understanding mimicry—with special reference to vocal mimicry. Ethology, 119, 259–269. doi: 10.1111/eth.12061

Wiley, D., Ware, C., Bocconcelli, A., Cholewiak, D., Friedlaender, A., Thompson, M., & Weinrich, M. (2011). Underwater components of humpback whale bubble-net feeding behavior. Behaviour, 148, 575–602. doi: 10.1163/000579511X570893

Williamson, V. J., Jilka, S. R., Fry, J., Finkel, S., Müllensiefen, D., & Stewart, L. (2012). How do “earworms” start? Classifying the everyday circumstances of involuntary musical imagery. Psychology of Music, 40, 259–284. doi: 10.1177/0305735611418553

Wilson, M. (2001a). The case for sensorimotor coding in working memory. Psychonomic Bulletin & Review, 8, 44–57. doi: 10.3758/BF03196138

Wilson, M. (2001b). Perceiving imitatible stimuli: Consequences of isomorphism between input and output. Psychological Bulletin, 127, 543–553. doi: 10.1037//0033-2909.127.4.543

Wilson, M., & Knoblich, G. (2005). The case for motor involvement in perceiving conspecifics. Psychological Bulletin, 131, 460–473. doi: 10.1037/0033-2909.131.3.460

Wise, K., & Sloboda, J. A. (2008). Establishing an empirical profile of self-defined ‘tone deafness’: Perception, singing performance, and self-assessment. Musicae Scientiae, 12, 3–23. doi: 10.1177/102986490801200102

Wisniewski, M. G., Mantell, J. T., & Pfordresher, P. Q. (2013). Transfer effects in the vocal imitation of speech and song. Psychomusicology: Music, Mind, and Brain, 23, 82–99.

Wisniewski, M. G., Mercado, E., III, Gramann, K., & Makeig, S. (2012). Familiarity with speech affects cortical processing of auditory distance cues and increases acuity. PLoS One, 7, e41025. doi: 10.1371/journal.pone.0041025

Witchell, C. A. (1896). The evolution of bird-song with observations of heredity and imitation. London: Adam and Charles Black.

Woody, R. H., & Lehmann, A. C. (2010). Student musician’s ear-playing ability as a function of vernacular music experience. Journal of Research in Music Education, 58, 101–115. doi: 10.1177/0022429410370785

Yeater, D. B., & Kuczaj, S. A., II. (2010). Observational learning in wild and captive dolphins. International Journal of Comparative Psychology, 23, 379–385.

Yu, A., Abrego-Collier, C., Baglini, R., Grano, T., Martinovik, M., Otte, C., & Urban, J. (2011). Speaker attitude and sexual orientation affect phonetic imitation. Paper presented at the Proceedings of the 34th Annual Penn Linguistics Colloquium (Vol. 17).

Yuen, I., Davis, M. H., Brysbaert, M., & Rastle, K. (2010). Activation of articulatory information in speech perception. Proceedings of the National Academy of Sciences, USA, 107, 592–597. doi: 10.1073/pnas.0904774107

Yurk, H., Barrett-Lennard, L. G., Ford, J. K. B., & Matkin, C. O. (2002). Cultural transmission within maternal lineages: Vocal clans in resident killer whales in southern Alaska. Animal Behaviour, 63, 1103–1119. doi: 10.1006/anbe.2002.3012

Zahorik, P., Brungart, D. S., & Bronkhorst, A. W. (2005). Auditory distance perception in humans: A summary of past and present research. Acta Acustica United with Acustica, 91, 409–420.

Zatorre, R. J., & Baum, S. R. (2012). Musical melody and speech intonation: Singing a different tune. PLoS Biology, 10, e1001372. doi: 10.1371/journal.pbio.1001372

Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: Music and speech. Trends in Cognitive Sciences, 6, 37–46. doi: 10.1016/S1364-6613(00)01816-7

Zentall, T. R. (2006). Imitation: Definitions, evidence, and mechanisms. Animal Cognition, 9, 335–353. doi: 10.1007/s10071-006-0039-2

Zentall, T. R., & Akins, C. (2001). Imitation in animals: Evidence, function, and mechanisms. In R. G. Cook (Ed.), Avian visual cognition [Online]: Available: www.pigeon.psy.tufts.edu/avc/zentall.

Zetterholm, E. (2006). Same speaker—different voices: A study of one impersonator and some of his different imitations. Paper presented at the Proceedings of the 11th Australian International Conference on Speech Science & Technology, University of Auckland, New Zealand.

Zhang, J., Hughes, L. E., & Rowe, J. B. (2012). Selection and inhibition mechanisms for human voluntary action decisions. Neuroimage, 63, 392–402. doi: 10.1016/j.neuroimage.2012.06.058

Volume 9: pp. 1–16

Forgetting from Short-Term Memory in Delayed Matching to Sample: A Reinforcement Context Model

K. Geoffrey White and Glenn S. Brown
University of Otago, New Zealand


Abstract

Short-term memory in nonhuman animals is typically studied in delayed matching to sample, with variation in the retention interval or delay between the to-be-remembered sample and subsequently presented choice or comparison stimuli. The forgetting function, which relates the systematic decrease in discriminability to increasing delay, is well described by an exponential in the square root of time, with an intercept and slope that vary systematically with different conditions, such as sample-stimulus disparity, retention-interval conditions, and reward parameters. We argue that the rewards for accurate matching are relative to the reinforcement context, which includes rewards Ro for extraneous or other behaviors. Forgetting results from competition between Ro and rewards for the delayed matching task. We suggest that Ro acts to shift attention from the memory task to extraneous behavior, and that Ro grows as a linear function of time in the retention interval. By incorporating these assumptions in the model proposed by White and Wixted (1999), we accurately predict the time course of forgetting under a variety of different conditions for delayed matching.

Keywords: forgetting, reinforcement, interference, extraneous behavior, short-term memory, delayed matching, pigeon

Author Note: An earlier version was presented to the Society for Quantitative Analysis of Behavior, Phoenix, Arizona, 2009. We thank our lab group and Sara-Lee Illingworth for their contribution to our own experiments reviewed here. Address correspondence to geoff.white@otago.ac.nz.


More than fifty years ago, Peterson and Peterson (1959) and J. Brown (1958) demonstrated that people quickly forget unfamiliar combinations of letters (three-letter trigrams) if they are prevented from rehearsing them. In their experiments, recall accuracy systematically fell as the retention interval lengthened. This result provided the first empirical evidence for a short-term memory process in which a memory trace decays in a matter of seconds. This classic result has been confirmed many times in research with humans, and with a variety of to-be-remembered stimuli (Baddeley, 1997). The result supports a major theoretical account of forgetting: that forgetting occurs via a passive decay of memory traces over time.

In another landmark study published at the same time as the Petersons’, Blough (1959) demonstrated short-term forgetting in pigeons. Blough’s pigeons worked for food in a delayed matching-to-sample task. In this task, a to-be-remembered sample stimulus was presented at the beginning of each trial. After a retention interval lasting up to 5 or 10 s, the pigeon chose one of two comparison stimuli. Correct choices that matched the prior sample were rewarded with food. Blough observed the pigeons’ behavior during the retention interval. Pigeons that developed different behavior patterns during the retention interval for each sample (e.g., bobbing up and down for one sample and a different behavior for the other sample, as though rehearsing) were able to recall the sample with very high accuracy, even after 10 s. Memory accuracy for pigeons without such rehearsal-like behaviors, however, declined rapidly with increasing delay, just as in the studies with humans. Like theories of human forgetting, the main theory of forgetting in nonhuman animals assumed that, unless memory traces are maintained by rehearsal (Grant, 1981), traces decay with time (Roberts, 1972).

Decay theories of human short-term memory, compared to alternative theories, continue to be hotly debated (Lewandowsky, Oberauer, & Brown, 2009; Nairne, 2002; Portrat, Barrouillet, & Camos, 2008; Surprenant & Neath, 2009; White, 2012). Earlier, McGeoch (1932) argued that the main mechanism to account for forgetting is interference (Roediger, Weinstein, & Agarwal, 2010).

In the present paper, we propose a theory for forgetting from short-term memory in nonhuman animals that generally follows the interference principle. Unlike McGeoch’s original idea of response competition as the source of interference, our theory is based on reinforcement competition. This notion stems from Herrnstein’s (1961, 1970) matching law. According to this law, the strength of a response is predicted by the rewards it produces, relative to rewards for alternative behaviors. That is, the effectiveness of rewards for a behavior of interest is relative to the reinforcement context provided by all sources of reinforcement. Thus the rewards for alternative, or other, behaviors, Ro, compete with the rewards for completing or attending to the main task. We outline our theory in more detail below, but we first explain the two main characteristics of a forgetting function that the theory must account for. Descriptively, these are the intercept and slope of the forgetting function relating memory performance to the passage of time.

Forgetting Functions

Figure 1. The exponential in the square root of time, y = a·exp(−b·√t), fitted to data for the pigeon, Bird B1, in one condition reported by Sargisson & White (2003).

Over the fifty or so years since the seminal work, studies with a wide range of species have explored short-term forgetting functions (Rubin & Wenzel, 1996; White, 2001, 2013). Forgetting typically follows a systematically decreasing function in which performance gradually decreases as the retention interval lengthens. The form of the function could be logarithmic (Woodworth & Schlosberg, 1954), power (Wixted & Carpenter, 2007; Wixted & Ebbesen, 1991), exponential (White, 1985), hyperbolic (Staddon, 1983), or exponential in the square root of time (Harper & White, 1997; White, 2001). These were among the best-fitting functions of the large number that Rubin and Wenzel (1996) fitted to data from over 200 studies with both humans and nonhumans. The common characteristic of all their best-fitting functions is that accuracy decreases monotonically as time since the to-be-remembered event elapses. With the best-fitting functions such as the power and exponential in the square root of time (White, 2001), forgetting is slower at longer retention intervals, consistent with what might be expected if memories consolidate with time (Wixted, 2004, 2010). In the present paper, we use the exponential in the square root of time, that is, y = a·exp(−b·√t), because it does an excellent job of fitting data from a wide range of studies using the delayed matching-to-sample task (see White, 2013, for review). To illustrate, in Figure 1, this function was fitted to data for a pigeon trained in a delayed matching-to-sample task with 18 different delays arranged in an arithmetic progression (Sargisson & White, 2003). Five sets of different delays were run in rotations of five daily sessions over a large number of total sessions, with a few overlapping delays across sets. The fitted function (solid line) has an intercept of a = 1.79 and a slope of b = .04, and accounted for 92 percent of the variance in the data.
In the present paper, this equation is used to describe an entire forgetting function—the function relating discriminability to retention-interval duration. An important requirement for our model is the ability to predict differences in both intercept and slope of the forgetting functions over a range of conditions. The examples selected below usefully illustrate such changes, but we do not attempt an exhaustive review of delayed matching studies.
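
To make the descriptive function concrete, the following minimal sketch evaluates it at a few delays using the intercept and slope reported above for Bird B1. The function name `forgetting` and the chosen delays are illustrative assumptions, and the decay term is written with an explicit negative sign so that a positive b produces forgetting.

```python
import math

def forgetting(t, a, b):
    """Exponential in the square root of time: y = a * exp(-b * sqrt(t))."""
    return a * math.exp(-b * math.sqrt(t))

# Intercept and slope reported in the text for Bird B1 (Figure 1).
A_B1, B_B1 = 1.79, 0.04

# Illustrative delays in seconds (an assumption, not the delays used in the study).
for t in (0, 1, 4, 16, 64):
    y = forgetting(t, A_B1, B_B1)
    print(f"t = {t:2d} s  predicted discriminability = {y:.3f}")
```

Because the exponent depends on √t, a one-second step in delay costs less discriminability at long delays than at short ones, which is the sense in which forgetting is slower at longer retention intervals.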

In the examples that follow, we use a measure of discriminability to describe the pigeon’s accuracy in delayed matching. As it happens, conclusions drawn based on the discriminability measure do not differ from those based on the more usual measure, percent correct (White, 1985). The discriminability measure, however, like d′ in signal detection theory, has the advantage that it is bias-free, and varies on a dimension that has equal-interval properties and has no upper bound to create ceiling effects (White, 2001). The discriminability measure used here, log d, was derived by Davison and Tustin (1978) and is the log (base 10) of the ratio of correct to error responses. For sample stimuli S1 and S2, log d = 0.5 log10 [(correct responses following S1 × correct responses following S2)/(errors following S1 × errors following S2)].
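
The log d formula can be computed directly from the four cells of the response matrix. The sketch below is a minimal illustration; the function name and the response counts are hypothetical, and cells with zero errors (for which the ratio is undefined) are not handled.

```python
import math

def log_d(correct_s1, errors_s1, correct_s2, errors_s2):
    """Davison-Tustin discriminability: 0.5 * log10 of the product of the
    correct/error ratios following samples S1 and S2."""
    ratio = (correct_s1 * correct_s2) / (errors_s1 * errors_s2)
    return 0.5 * math.log10(ratio)

# Hypothetical response counts, for illustration only.
print(round(log_d(90, 10, 80, 20), 3))   # 0.5 * log10(36)
print(round(log_d(50, 50, 50, 50), 3))   # chance performance gives log d = 0
```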

Reinforcement Context

During the retention interval of a short-term memory task, including delayed matching to sample, various activities or events may intervene to interfere with remembering the sample stimuli or to-be-remembered items. Experimentally introduced interference in a human list-learning task might include the learning of another list, or the introduction of competing (‘concurrent’) tasks at encoding or retrieval. Wixted (2004, 2010) argued that such interference in everyday remembering is nonspecific in that the intervening event does not have to be specifically related to the to-be-remembered items. In delayed matching to sample, the pigeon engages in extraneous or other behaviors during the retention interval. When the experimental chamber is dark, these may be restricted to wing flapping or pacing for a pigeon (a visual animal), and when the chamber is illuminated, the pigeon will peck at grains of wheat spilt from the hopper, or at screws or small marks on the chamber walls. In Blough’s (1959) seminal study, the behaviors during the retention interval were carefully recorded and for some pigeons seemed to be correlated with performance on the memory task. More generally, illuminating the chamber during the retention interval creates conditions for retroactive interference, and matching accuracy is adversely affected (Roberts & Grant, 1978; Zentall, 1973). For the present model, we assume that other behaviors extraneous to the task of remembering occur throughout the retention interval, whatever they are, and that they are rewarded by extraneous or other reinforcers, Ro, following Herrnstein’s (1970) supposition of extraneous reinforcement. In general, Ro is a hypothetical entity, although it could be supplemented by experimenter-defined extraneous reinforcement, as Brown and White (2005b) did when they reinforced key pecking on a variable-interval schedule during the retention interval (see below). Ro is part of the reinforcement context. 
If the reinforcement context includes just the rewards R for a target behavior B, and Ro, which rewards other behavior Bo, Herrnstein’s (1970) application of the matching law predicts the relative strength of the target behavior from B/(B + Bo) = R/(R + Ro). The effect of R is relative to the total reinforcement context R + Ro.
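
As a small numerical illustration of this relation, the sketch below computes the relative strength of the target behavior for a fixed R under increasingly rich contexts. The function name and the reward values are hypothetical.

```python
def relative_strength(R, Ro):
    """Herrnstein's matching relation: B / (B + Bo) = R / (R + Ro)."""
    return R / (R + Ro)

# Hypothetical reward rates: the same R is less effective in a richer context.
for Ro in (0.0, 20.0, 60.0):
    share = relative_strength(60.0, Ro)
    print(f"R = 60, Ro = {Ro:4.1f}  B/(B + Bo) = {share:.2f}")
```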

A Modified White-Wixted Model

Figure 2. Hypothetical distributions of stimulus effect for green and red samples (top two panels), and reward probability distributions (third panel) that result from multiplying them by arranged reward probabilities (0.7 and 0.3 for correct green and red choices respectively, in the example), and the distribution of relative reward probability for correct red choices on the stimulus effect dimension (bottom panel).

White and Wixted (1999) described a model for delayed matching performance that was reminiscent of signal detection theory, but that based the decision rule on the matching law. Like signal detection theory, the model assumes that the sample stimuli (for example, green and red hues) are associated with Thurstone’s (1927) discriminal distributions along a dimension of stimulus value or stimulus effect (Figure 2, top two panels). Unlike signal detection theory, however, in the present model the individual has no knowledge of these distributions. Instead, the individual’s knowledge is about distributions of relative reinforcement along the dimension of stimulus effect. The reinforcement distributions are derived (in the model) by multiplying the stimulus effect distribution in the top panel by the probability of reinforcement for correct red or green choices. The third panel in Figure 2 shows the result for an example where reinforcement probabilities were 0.7 and 0.3 for correct choices of green and red comparison stimuli respectively. The bottom panel of Figure 2 shows the distribution of the proportion of rewards for correct choices of red as a function of stimulus value.

For the present modeling, the stimulus effect distributions were set up using the NORMDIST function in Excel, and reinforcement distributions were generated by multiplying the normal distributions by reinforcement probabilities (1.0 in most cases). The discriminal distributions are set apart by D z-units, and for standard deviations of the distributions set at 1.0, the only free parameter in the model is D. The model works in the following way. On each trial, the red or green sample is randomly selected, and a value (i) is sampled on the stimulus effect dimension (in relation to the relevant normal distribution). That value is associated with a specific ratio (R1i / R2i) or proportion of rewards that have been gained in the past. Given the stimulus value i, the individual makes a choice response B1i or B2i to comparison stimuli 1 and 2, according to the matching law. That is, at stimulus value i, B1i/B2i = (R1i)/(R2i). By summing choice responses B1i and B2i across all values of stimulus effect, and also the rewards they produce (which depend on the reward probabilities), a matrix is generated, which gives B1 and B2 choices following S1 and S2 samples, and also the rewards they obtain. This signal detection matrix then allows the calculation of the discriminability measure, log d.
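
The procedure just described can be sketched in a few lines of Python. This is one possible reading of the description, not the authors' Excel implementation: the grid step, the integration span, and all function and variable names are assumptions, and responses are accumulated as expected proportions over a grid of stimulus values rather than by simulating individual trials.

```python
import math
from statistics import NormalDist

def white_wixted_log_d(D, p1=1.0, p2=1.0, step=0.01, span=6.0):
    """Predicted log d for two unit-variance discriminal distributions D apart.

    p1 and p2 are reward probabilities for correct B1 and B2 choices.
    The grid step and the +/- span (in z-units) are assumptions of this sketch.
    """
    s1, s2 = NormalDist(0.0, 1.0), NormalDist(D, 1.0)
    b1_s1 = b2_s1 = b1_s2 = b2_s2 = 0.0
    n = int(round((D + 2 * span) / step))
    for k in range(n + 1):
        x = -span + k * step               # stimulus value i
        f1, f2 = s1.pdf(x), s2.pdf(x)      # stimulus effect distributions
        r1, r2 = p1 * f1, p2 * f2          # reward distributions at x
        total = r1 + r2
        # Matching law at x: B1/B2 = R1/R2, expressed as a choice proportion.
        p_b1 = r1 / total if total > 0 else 0.5
        b1_s1 += f1 * p_b1 * step          # correct choices after S1
        b2_s1 += f1 * (1 - p_b1) * step    # errors after S1
        b1_s2 += f2 * p_b1 * step          # errors after S2
        b2_s2 += f2 * (1 - p_b1) * step    # correct choices after S2
    return 0.5 * math.log10((b1_s1 * b2_s2) / (b2_s1 * b1_s2))

for D in (0.0, 1.0, 2.0, 3.0):
    print(f"D = {D:.0f}  predicted log d = {white_wixted_log_d(D):.3f}")
```

With identical distributions (D = 0) the predicted log d is zero, and it grows as the distributions are moved apart.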

Compared to the original version of the White and Wixted (1999) model, however, we add an important assumption, first proposed by Brown and White (2009). This assumption recognizes that the rewards for correct matching act in a context of total reinforcement. Specifically, we assumed that the effect of the R1i/R2i reward ratio in determining the choice at stimulus value i is diluted by rewards for other behavior, Ro. If Ro acts as a general background, then it is added to R1 or R2. Accordingly, we assume that at a given stimulus value i, B1i/B2i = (R1i + Ro)/(R2i + Ro). One specific advantage of this new assumption is that it allows the prediction of the effects of absolute rate of reward on matching accuracy—with overall lower reward probabilities, discriminability is reduced (Brown & White, 2009). The original White and Wixted model did not predict this effect of absolute rate of reward, but the modified version does. Application of the model below to the results of a variety of experimental conditions assumes that the main causal factor in forgetting is the value of Ro, rewards for other behavior. The relativity of R to Ro, however, means that in any instance, forgetting could result from an increase in Ro, or a weakening of R. This possibility is illustrated by rewriting our equation: B1i/B2i = (R1i + Ro)/(R2i + Ro), after dividing top and bottom expressions by Ro, to give: B1i/B2i = (R1i / Ro + 1)/(R2i / Ro + 1). Our interpretation of the term R/Ro is that the effect of R is weakened or diluted by the effect of rewards for other behavior. An alternative interpretation, not considered here, might be that rewards for remembering are weakened by some other factor such as changing expectancies across time. In other words, the relativity of R to Ro means that variation in task parameters could result in a decrease in R that is modeled by an increase in Ro. 
This conclusion is plausible because a change in parameters of the memory task could be associated with a change in Ro. For example, if the sample duration is extremely short, it is plausible that Ro is higher than when the sample is of long duration and more attention or effort is being paid to the memory task.
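
The dilution of the reward ratio by Ro, and the equivalence of the two forms of the equation, can be checked numerically at a single stimulus value. The reward values below are hypothetical and the function names are ours.

```python
def choice_ratio(R1, R2, Ro):
    """Modified matching rule at one stimulus value: B1/B2 = (R1 + Ro)/(R2 + Ro)."""
    return (R1 + Ro) / (R2 + Ro)

def choice_ratio_rescaled(R1, R2, Ro):
    """Same rule after dividing numerator and denominator by Ro (needs Ro > 0)."""
    return (R1 / Ro + 1) / (R2 / Ro + 1)

R1, R2 = 0.9, 0.1   # hypothetical reward values at one stimulus value
for Ro in (0.1, 0.5, 2.0):
    print(Ro, round(choice_ratio(R1, R2, Ro), 3),
          round(choice_ratio_rescaled(R1, R2, Ro), 3))

# Halving both R terms has the same effect on choice as doubling Ro.
print(round(choice_ratio(R1 / 2, R2 / 2, 0.5), 3),
      round(choice_ratio(R1, R2, 1.0), 3))
```

The last line illustrates the relativity of R to Ro: a weakening of R and a proportional increase in Ro are indistinguishable in their effect on the choice ratio.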

In behavioral terms, the notion of reinforcement competition, following Herrnstein (1970), is used to account for the allocation of behavior between two or more alternatives. In our model, these alternatives are the task of remembering and alternative or other behaviors. The task of remembering may or may not include rehearsal as just one aspect. In this behavioral view, remembering is a conditional discrimination like any other but with sample stimuli (in delayed matching) temporally separated from the comparison stimuli (White, 2002a). Thus, the pigeon’s allocation of behavior to the memory task versus alternative activities is determined by the rewards for remembering relative to rewards for other behaviors. Our shorthand way of describing this differential allocation of behavior is that the pigeon may switch attention between remembering and alternative activities.

The Effect of Retention Interval

The feature that defines delayed matching as a memory task is the retention interval between presentation of the to-be-remembered sample stimuli and the comparison stimuli to which a choice response is made. For any model of remembering, the critical objective is to predict the effect of the retention interval by describing the effects of variables correlated with time. White and Wixted (1999) assumed a diffusion process in which the standard deviation of the discriminal distributions (which could be different for the two distributions—White & Wixted, 2010) increased with increasing time in the retention interval, thus increasing the overlap between distributions and decreasing discriminability. White and Wixted did not specify the form of the diffusion process.

Figure 3. Hypothetical examples of exponential forgetting functions in the square root of time that differ in intercept a, but not slope b (left panel), and discrete values of Ro from the modified White-Wixted model at different times in the retention interval, needed to generate the values of discriminability for the hypothetical forgetting functions in the left panel (right panel).

However, White (2002b) showed that the specific form of diffusion could predict the mathematical form of the forgetting function. If the function relating standard deviation to time was linear, the predicted forgetting function was hyperbolic. If the function was exponential, the predicted forgetting function was exponential. If diffusion was a function of the square root of time, assuming that stimulus value drifts over time according to a random walk, the predicted forgetting function was a power function (White, 2002b). To date, however, it is not clear which form the hypothetical diffusion process might follow.

The present model does not assume a diffusion process, but predicts the effect of retention-interval duration by assuming that Ro grows with time over the course of the retention interval. This important assumption means that relative to Ro, the effectiveness of rewards for remembering decreases over time. Ro grows over time because opportunities to engage in competing activities increase with time in the retention interval. For example, in the first second into the retention interval, orienting toward the food hopper might be the only alternative. However, by 10 s, a variety of behaviors is possible. Additionally, at the beginning of the retention interval, Ro might be low because alternative behaviors had been exhausted in a previously illuminated experimental chamber during the intertrial interval (Santi, 1984), or Ro might grow at a rapid rate during the retention interval because the chamber was illuminated and allowed more possible alternative activities than in a dark retention interval.

We arrived at the Ro growth function in the following way. First, we drew two theoretical forgetting functions for y = a·exp(−b·√t), with the same slope b, but with different intercepts a (Figure 3, left panel). Second, using our Excel-based implementation of the modified White-Wixted model, we asked what (punctate) values of Ro were needed in order to generate the log d values for the exponential in √t forgetting function. These are shown in the right panel of Figure 3. We did the same thing for forgetting functions that had the same intercepts but differed in slope (Figure 4, left panel). The set of Ro values at different times in the retention interval, shown in the right panel of Figure 4, are the values needed in order to generate the exponential in √t functions with different slopes in the left panel of Figure 4.

Figure 4. Hypothetical examples of exponential forgetting functions in the square root of time that differ in slope b, but not intercept a (left panel), and discrete values of Ro from the modified White-Wixted model at different times in the retention interval, needed to generate the values of discriminability for the hypothetical forgetting functions in the left panel (right panel).

The result of the back-to-front hypothetical analysis shown in Figures 3 and 4 suggested to us that an approximately linear Ro growth function was needed to achieve a reduction in discriminability with increasing retention-interval duration according to y = a·exp(−b·√t). Intuitively, the growth of Ro over the course of the retention interval might be limited, and follow a Gompertz function, in which slower growth at the start is followed by a period of rapid growth, and then a falloff in growth as the function reaches a limit. Such a process might occur over a much longer time, but for the short durations used in the delayed matching task, the growth of Ro over most of the range is best approximated by a linear function. The linear function is of course the most parsimonious, and in a very different model with ‘null’ memory traces that block recall (Lansdale & Baguley, 2008), the null traces are assumed to increase as a linear function of time. We therefore assume a linear growth function that has an intercept at t = 0 of Ro (0), and a slope of g. That is, the growth over time t is Ro = Ro (0) + g·t.
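
Combining the linear growth function with the modified matching rule yields a predicted forgetting function. The sketch below is a hedged illustration rather than the authors' Excel implementation: it assumes reward probabilities of 1.0, a numerical grid over stimulus values, and hypothetical values for D and g (the Ro (0) value of 0.0001 is the one used for the fits reported in the text).

```python
import math
from statistics import NormalDist

def ro_model_log_d(D, Ro, step=0.01, span=6.0):
    """Predicted log d when Ro is added to both reward terms at each stimulus value.

    Reward probabilities are 1.0; the grid step and span are assumptions.
    """
    s1, s2 = NormalDist(0.0, 1.0), NormalDist(D, 1.0)
    b1_s1 = b2_s1 = b1_s2 = b2_s2 = 0.0
    n = int(round((D + 2 * span) / step))
    for k in range(n + 1):
        x = -span + k * step
        f1, f2 = s1.pdf(x), s2.pdf(x)
        # B1/B2 = (R1 + Ro)/(R2 + Ro), expressed as a choice proportion.
        p_b1 = (f1 + Ro) / (f1 + f2 + 2 * Ro)
        b1_s1 += f1 * p_b1 * step          # correct choices after S1
        b2_s1 += f1 * (1 - p_b1) * step    # errors after S1
        b1_s2 += f2 * p_b1 * step          # errors after S2
        b2_s2 += f2 * (1 - p_b1) * step    # correct choices after S2
    return 0.5 * math.log10((b1_s1 * b2_s2) / (b2_s1 * b1_s2))

def ro_growth(t, ro0, g):
    """Linear growth of Ro over the retention interval: Ro = Ro(0) + g * t."""
    return ro0 + g * t

D, RO0, G = 3.0, 0.0001, 0.01   # D and g are hypothetical values
for t in (0, 1, 2, 4, 8, 16):
    ro = ro_growth(t, RO0, G)
    print(f"t = {t:2d} s  Ro = {ro:.4f}  predicted log d = {ro_model_log_d(D, ro):.3f}")
```

In this sketch predicted discriminability falls as the delay lengthens solely because Ro grows. At t = 0 the prediction depends only on D and Ro (0), not on g.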

The Ro Model

The resulting modified White-Wixted model, which we call the “Ro model” for short, has three parameters, the distance D between means of the discriminal distributions, and the intercept Ro (0) and slope g of the Ro growth function, with standard deviations of the discriminal distributions set at 1.0. When generating predicted forgetting functions from the Ro model, the intercept of the forgetting function depends on both D and the intercept of the Ro growth function, but it does not depend on the slope g of the Ro growth function. These relationships are illustrated in Figure 5, for multiple runs of the model. Figure 5 shows values of the forgetting function intercepts a, for instances in which the intercepts of the growth function vary with D = 5 (top panel), and for instances in which D varies, for a constant growth function intercept (bottom panel).

A similar hypothetical analysis shows that the slope of the predicted forgetting function depends on both the slope of the Ro growth function and its intercept. Figure 6 shows that the slope, or rate of forgetting b, of the predicted forgetting function is greater, the greater the rate of increase in Ro over the course of the retention interval. However, if Ro starts at a higher level early in the interval (that is, at a higher intercept), the rate of growth in Ro is constrained and accordingly the rate of forgetting is not so great.

Forgetting Functions Differing in Intercept

Figure 5. The intercept of predicted forgetting functions depends on the intercept of the hypothetical Ro growth function and the distance D between means of discriminal dispersions in the Ro model, but does not depend on the slope g of the growth function.

As a generalization, forgetting functions are characterized by differences in intercept, that is, discriminability at time t = 0, and in slope, or rate of forgetting (White, 1985, 2001, 2013). The following sections give examples of both. In the figures that follow, both panels show data from empirical studies of delayed matching to sample, typically in the pigeon, and in which retention interval was varied over several values. The left panel shows dashed curves for the exponential in √t fitted to the data by the method of least squares. The right panel shows the smooth curves predicted by our Ro model. These, too, were best-fitting functions according to the method of least squares. The right panels give values for the three parameters in the model to facilitate comparison across experimental conditions.
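
One simple way to obtain least-squares estimates of a and b is to linearize the function, since ln y = ln a − b·√t is a straight line in √t. The sketch below uses that approach on synthetic, noise-free values. It is an assumption of this illustration, not necessarily the fitting procedure used for the figures: a fit on the log scale weights errors differently from a fit on the raw scale, and it requires all discriminability values to be positive.

```python
import math

def fit_exp_sqrt(delays, values):
    """Fit y = a * exp(-b * sqrt(t)) by ordinary least squares on ln(y) vs sqrt(t)."""
    xs = [math.sqrt(t) for t in delays]
    ys = [math.log(y) for y in values]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    a = math.exp(my - slope * mx)   # intercept of the forgetting function
    b = -slope                      # rate of forgetting
    return a, b

# Synthetic values generated from known parameters (not real data).
delays = [0, 1, 2, 4, 8, 16]
values = [1.5 * math.exp(-0.3 * math.sqrt(t)) for t in delays]
a_hat, b_hat = fit_exp_sqrt(delays, values)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```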

Functions that differ in intercept can be interpreted in terms of factors that affect overall difficulty of the task, or attentional factors, such as the disparity between sample stimuli, the number of responses made to a sample, and the duration of sample stimulus presentation. In a first example, Fetterman (1995) trained pigeons to discriminate three short sample durations from three long durations in a delayed matching task, and categorized the discriminations as easy, medium, or hard. His data, plotted in terms of the nonparametric discriminability measure A′, are shown in Figure 7. The left panel shows fits of the exponential in √t, and the right panel shows fits of the Ro model. In the Ro model, the intercept of the Ro growth function was set at 0.0001, and D and g were free to vary. As the discrimination became more difficult, D decreased and the rate of Ro growth in the retention interval increased. This effect illustrates our main interpretation of Ro, which functions to attract attention away from the task of remembering by rewarding competing behaviors, analogous to concurrent tasks in human memory research.

Figure 7. Data from Fetterman (1995) with fitted exponential in √t functions differing primarily in intercept but not slope (left panel), and fitted functions predicted by the Ro model (right panel).

In a second example, Grant (1976) found that increasing the exposure duration of sample stimuli resulted in an increase in accuracy of pigeons’ delayed matching performance. We transformed the proportion correct (p) data from Grant’s study to Logit p, which equals log d when there is no response bias (a safe assumption for averaged data). Figure 8 shows the exponential in √t function fitted to the data in the left panel and the functions predicted by our Ro model in the right panel. The decrease in the D parameter in the Ro model with decreasing sample duration reflects the overall weakening of the discrimination, and the increase in the rate of growth of Ro for the more difficult discrimination is similar to the effect shown in Figure 7.
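
The equivalence between Logit p and log d under unbiased responding can be verified with a short calculation. The sketch assumes a base-10 logit, which is what equality with log d implies; the function names and the counts are hypothetical.

```python
import math

def logit_p(p):
    """Base-10 logit of a proportion correct p, with 0 < p < 1."""
    return math.log10(p / (1 - p))

def log_d(correct_s1, errors_s1, correct_s2, errors_s2):
    """Davison-Tustin discriminability from the four response counts."""
    return 0.5 * math.log10((correct_s1 * correct_s2) / (errors_s1 * errors_s2))

# Hypothetical unbiased performance: 85 correct and 15 errors after each sample.
print(round(logit_p(0.85), 3), round(log_d(85, 15, 85, 15), 3))
```

When accuracy is the same after both samples, the two measures coincide; with unequal accuracy across samples they diverge, which is why the transformation relies on the no-bias assumption.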

Figure 8. Data from Grant (1976) with fitted exponential in √t functions differing primarily in intercept but not slope (left panel), and fitted functions predicted by the Ro model (right panel).

In a third example, five pecks to the sample (FR5) led to greater delayed matching accuracy than did a single peck (White & Wixted, 1999), with fitted exponential in √t functions that differed in intercept but not slope (Figure 9, left panel). The Ro model predicts a decrease in D for FR1 compared to FR5, with an increase in the rate of Ro growth, given a fixed intercept for the Ro growth function (Figure 9, right panel).

A further manipulation to enhance the discriminability of the samples is to require differential responding to the two samples, as did Zentall and Sherburne (1994). They trained their pigeons to respond (FR10) or not to respond (DRO) to color samples in a delayed matching task. With differential responding, discriminability was overall higher than without, and fitted exponential in √t functions showed clear differences in intercept (Figure 10, left panel). Predictions from the Ro model (Figure 10, right panel) also fitted the data well. The difference in discrimination between the two conditions was reflected in a higher value of the D parameter for the FR10 vs. DRO task, and a lower rate of growth of Ro during the retention interval. In other words, differential responding to the sample helped to protect attention to the memory task from the interfering effects of reinforcers for alternative activities.

Figure 9. Data from White & Wixted (1999) with fitted exponential in √t functions differing primarily in intercept but not slope (left panel), and fitted functions predicted by the Ro model (right panel).

Figure 10. Data from Zentall & Sherburne (1994), with fitted exponential in √t functions differing primarily in intercept (left panel), and fitted functions predicted by the Ro model (right panel).

The four examples above are all instances in which variation in sample-stimulus discriminability, through physical stimulus disparity, exposure duration, repetition, or differential sample responding, can be predicted by changes in the distance D between discriminal distributions in the Ro model, accompanied by an increase in the rate of Ro growth when the discrimination becomes more difficult and the distracting force of Ro becomes greater. For fits of the Ro model to data in Figures 7–10, the intercept of the Ro growth function was 0.0001 for all of the different conditions. Figure 5 suggests, however, that stimulus disparity D could be held constant for the comparison between different conditions, and variation in the intercept a of the forgetting function could be accounted for by variation in the intercept of the growth function. The Ro model would then have two free parameters, namely the intercept and slope of the growth function, and we would interpret discriminability differences at t = 0 as resulting from differences in attention to the sample at the time of encoding, versus attention to competing behaviors. The latter interpretation seems consistent with instances in which sample-stimulus conditions are held constant, but accuracy is lowered through drug administration and consequential distraction from competing alternatives. For example, administration of the drug scopolamine increases the overall difficulty of discrimination, as reflected in a reduction in the intercept of the forgetting function, consistent with much prior research on the effects of drugs on delayed matching performance in pigeons and rats (Parkes & White, 2000; White & Ruske, 2002; Wright & White, 2003). Ruske, Fisher, and White (1997) compared the effects of scopolamine with a vehicle control on delayed matching performance in pigeons. Their data are shown in Figure 11, with fitted exponential in √t functions that differ in intercept.
For fits of the Ro model (Figure 11, right panel), we assumed that sample discriminability was the same for vehicle and drug conditions, and set D = 5. In terms of the model, both the starting level of Ro (the intercept), and the rate of growth in Ro across the retention interval, were greater under scopolamine administration. The presence of higher levels of Ro under drug administration, which distracts the animal from attending to the memory task, seems plausible.

Figure 11. Data from Ruske, Fisher, & White (1997) with fitted exponential in √t functions differing in intercept but not slope (left panel), and fitted functions predicted by the Ro model (right panel).

Forgetting Functions Differing in Slope

Rate of forgetting, or slope of the forgetting function, tends to be influenced by events occurring during the retention interval, and by reinforcement factors. The most striking example is retroactive interference, thoroughly studied by Roberts and Grant (1978), Cook (1980), and others. Pigeons, strongly visual animals, perform delayed matching tasks with visual stimuli with high accuracy when the experimental chamber is dark during the retention interval. When the chamber is illuminated during the retention interval, accuracy plummets from a high level at t = 0 s, to very low levels. During the retention interval in the illuminated chamber, they tend to peck at marks on the chamber wall, pace, wing flap, and find grain spilled from the food hopper. In other words, they engage in a variety of behaviors that we assume are extraneous to the task of remembering, and that are rewarded by (hypothetical) Ro, reinforcers for other behavior. Roberts and Grant (1978) varied the retention interval over a wide range and reported a very clear detrimental effect of illuminating the chamber by turning on the houselight. A similar result, also for pigeons in a delayed matching task, was reported by Harper and White (1997). Their data (Figure 12) were well fitted by exponential in √t functions that differed in slope but not intercept (Figure 12, left panel). Their data were also satisfactorily fitted by our Ro model, with the same values for the D parameter for dark and houselight conditions, with similar values for the intercepts of the Ro growth functions, and a greater growth of Ro under conditions with the houselight turned on (Figure 12, right panel). In this and subsequent examples in which slope of the forgetting function varies, D was held constant across conditions, and only the two growth function parameters were free to vary.
This result provides strong validation for our assumption that Ro grows during the retention interval and rewards extraneous behaviors that compete with the task of remembering.
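The distinction between forgetting functions that differ in slope and those that differ in intercept can be made concrete with a short sketch. It assumes the exponential in √t function has the form log d = a·exp(−b·√t), with intercept a and slope b; the parameter values and condition labels below are hypothetical and are not the fitted values for the Harper and White (1997) data.

```python
import math

def log_d(t: float, a: float, b: float) -> float:
    """Exponential in sqrt(t) forgetting function: a * exp(-b * sqrt(t)).

    a is the intercept (discriminability at t = 0) and b is the
    slope (rate of forgetting). The functional form is an assumption.
    """
    return a * math.exp(-b * math.sqrt(t))

# Hypothetical parameters: same intercept a, different slope b.
conditions = {"dark": (2.0, 0.2), "houselight": (2.0, 0.8)}
delays = [0, 1, 4, 9, 16]  # retention intervals in seconds

for name, (a, b) in conditions.items():
    values = [round(log_d(t, a, b), 2) for t in delays]
    print(name, values)
```

Both hypothetical functions start at the same value at t = 0 and diverge as the delay lengthens, which is the pattern described for dark versus illuminated retention intervals.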

Figure 12. Data from Harper & White (1997), with fitted exponential in √t functions differing primarily in slope but not intercept (left panel), and fitted functions predicted by the Ro model (right panel).

The assumption that the level of Ro during the retention interval may depend on whether the chamber is dark or light gains support from a novel result reported recently by White and Brown (2011). Retention interval duration was varied within sessions in a delayed matching task with pigeons. Three conditions are of interest, two of which replicated the effect shown in Figure 12. In the third, the chamber was illuminated for the first few seconds of the retention interval and accuracy at these retention intervals was low. When the chamber was darkened after the first few seconds in longer retention intervals, accuracy recovered to the higher level consistent with performance in the baseline condition in which the retention intervals were dark throughout. In terms of our Ro model, we assume that Ro was high during the initially light part of the retention interval and lower during the later dark part of the interval, thus causing an apparent reversal of the forgetting function.

Figure 13. Data from Jones & White (1994), with fitted exponential in √t functions differing primarily in slope but not intercept (left panel), and fitted functions predicted by the Ro model (right panel).

The differential outcomes effect (DOE) is a curious phenomenon in which discriminability is enhanced when the outcomes or rewards for correct matching responses are different, compared to when they are the same (Urcuioli, 2005). Our previous analyses indicate that the DOE manifests primarily as a difference in rate of forgetting, that is, in the slope of the forgetting function, often with relatively small differences in intercepts (Jones & White, 1994). In other words, the enhanced discriminability appears at longer delay intervals to a greater extent than at shorter delays. The DOE is illustrated in Figure 13 (left panel), in which the data from the within-sessions procedure reported by Jones and White are fitted by exponential in √t functions that differ mainly in slope. The data are also well fitted by our Ro model (Figure 13, right panel), with an assumption that stimulus disparity D is equal for same and differential outcomes trials. The DOE in Figure 13 is predicted by assuming that Ro starts at a higher background level on Same trials than on Different trials, and grows at a faster rate (g) on Same trials. This assumption makes sense if it is assumed that rewards on Different trials have a stronger effect than on Same trials and are less diluted by Ro (as in the signaled probability effect described below), consistent with the finding that rewards in mixed or variable schedules of reinforcement have stronger effects in maintaining behavior than do rewards in fixed schedules of reinforcement (Davison, 1969; Fantino, 1967).

Figure 14. Data from Miller, Friedrich, Narkavic, & Zentall (2009), with fitted exponential in √t functions differing primarily in slope (left panel), and fitted functions predicted by the Ro model (right panel).

A possible challenge to our notion that the DOE derives from a greater reinforcing effect of the differential outcomes, relative to Ro during the retention interval, comes from the unusual finding that the DOE occurs with non-hedonic differential outcomes. Figure 14 shows the delayed matching-to-sample performance of pigeons for which outcomes for correct choices in a differential outcomes condition were brief presentations of houselight or tone, followed by the same amount of food, compared to either houselight or tone plus food in a non-differential outcomes condition (Miller, Friedrich, Narkavic, & Zentall, 2009). The data follow the same form as those in Figure 13 in which differential hedonic (food) outcomes were arranged, and were well fit by exponential in √t functions differing primarily in slope (left panel) and by our Ro model (right panel). In terms of our Ro model, we suggest that the same account applies to the DOE with differentially cued food outcomes (Figure 14) as for differential food outcomes (Figure 13). Specifically, by preceding rewards for correct choices following the different sample stimuli by different brief signals, the reinforcing strength of the rewards is enhanced relative to the effect of Ro. As a result, the interfering effect of Ro on different-outcome trials is less than that on same-outcome trials. The effect of adding the cue is perhaps consistent with the higher response rates in the choice phase of a concurrent-chains procedure when the choice leads to multiple schedules that are differentially cued, compared to when the choice leads to mixed schedules that are not (Hursh & Fantino, 1974).

Figure 15. Data from Brown & White (2005a), with fitted exponential in √t functions differing primarily in slope but not intercept (left panel), and fitted functions predicted by the Ro model (right panel).

The signaled probability effect occurs in delayed matching to sample when a cue is presented during the retention interval (but not with the sample), which signals whether correct matching responses will be rewarded with low or high probability. The reinforcer probabilities and associated cues alternate randomly across trials within session, and in the study reported by Brown and White (2005a), were 0.2 and 1.0. Figure 15 shows their data, with best-fitting exponential in √t functions that differed in slope but not intercept (left panel). The right panel of Figure 15 shows the fits of our Ro model, in which parameters for stimulus disparity D and the intercept of the Ro growth function (at t = 0) were the same for the two probability conditions, as is intuitively plausible. The difference in the model fits was in the rate of Ro growth parameter, g. This result validates our interpretation. If the reduction in discriminability with increasing retention interval duration results from competition between reinforcers for completing the memory task and reinforcers for alternative or other behaviors, Ro, then a reduction in the probability of reward for the memory task will result in a relatively greater influence of Ro and, accordingly, a greater increase in the rate of forgetting.

Figure 16. Data from Brown & White (2005b), with fitted exponential in √t functions differing primarily in slope but not intercept (left panel), and fitted functions predicted by the Ro model (right panel), for conditions in which center-key pecking during the retention interval (an extraneous task) was reinforced according to VI 15 s, VI 30 s, or EXT schedules.

The rationale above was applied more specifically by Brown and White (2005b) to a delayed matching-to-sample task with pigeons, in which an extraneous task was interpolated in the retention interval. The extraneous task involved pecking the center key, with pecks rewarded according to variable interval (VI) schedules of VI 15 or VI 30 s, or not at all (Extinction or EXT). The rationale was that the experimenter-arranged extraneous VI reinforcement should add to the hypothetical Ro to increase the total extraneous reinforcement. The data were satisfactorily fit by exponential in √t functions (Figure 16, left panel), and by our Ro model (right panel). In terms of the model, for a fixed value of the stimulus disparity parameter D, both the intercept and slope of the Ro growth function increased with increasing rate of extraneous reinforcement. The reduction in accuracy in delayed matching performance with increasing rate of extraneous reinforcement for center-key responding can therefore be attributed to interference or competition between reinforcers for other behavior and reinforcers for completing the delayed matching task. That is, the result reported by Brown and White (2005b) constitutes strong direct support for our Ro theory.
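The additive rationale can be illustrated with a rough sketch. It is not the fitted Ro model: it borrows the relative-reinforcement ratio from Herrnstein's (1970) treatment of the law of effect as an assumed stand-in for reinforcer competition, and the task reinforcement rate and background Ro are hypothetical values chosen only for illustration. The only quantities taken from the study are the schedule labels (EXT, VI 30 s, VI 15 s).

```python
def vi_max_rate_per_hour(mean_interval_s: float) -> float:
    """Maximum reinforcers per hour that a VI schedule can arrange."""
    return 3600.0 / mean_interval_s

def task_share(r_task: float, r_extraneous: float) -> float:
    """Share of total reinforcement earned by the memory task.

    Assumed Herrnstein-style ratio R / (R + Ro); an illustration of
    reinforcer competition, not the authors' fitted equation.
    """
    return r_task / (r_task + r_extraneous)

R_TASK = 200.0        # hypothetical reinforcers/hour for correct matching
RO_BACKGROUND = 20.0  # hypothetical unmeasured background Ro

# Arranged extraneous reinforcement adds to the background Ro.
arranged = {
    "EXT": 0.0,
    "VI 30 s": vi_max_rate_per_hour(30),
    "VI 15 s": vi_max_rate_per_hour(15),
}
shares = {
    name: task_share(R_TASK, RO_BACKGROUND + rate)
    for name, rate in arranged.items()
}
for name, share in shares.items():
    print(f"{name}: {share:.3f}")
```

Under these assumptions the memory task's share of total reinforcement falls as the arranged extraneous rate rises, which is the qualitative ordering (EXT, then VI 30 s, then VI 15 s) that the competition account expects for accuracy.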

In the delayed matching task, the reinforcement context might extend to the intertrial interval (ITI), as well as the retention interval, perhaps depending on the extent to which the ITI is discriminated from the trial. During the ITI, extraneous behaviors may occur. Following the argument of McLean and White (1983) and McLean (1991), Ro not obtained in a short ITI might carry over into a subsequent trial and compete with rewards for the delayed matching task. As a result, accuracy with short ITIs is poorer than with long ITIs, a common result (Edhouse & White, 1988; Roberts, 1980; White, 1985). Additionally, adding noncontingent reinforcers to the ITI (Santi & Roberts, 1985), especially when they are added at the end of the ITI (Spetch, 1985), results in a substantial reduction in matching accuracy. When the ITI is illuminated and the retention interval is dark, however, the trial spacing effect is lost (Edhouse & White, 1988; Santi, 1984), presumably because a clearer discrimination between the ITI and retention interval reduces the likelihood of carryover of Ro.

Conclusion

In the present paper, we suggest that forgetting in delayed matching-to-sample tasks results from competition between reinforcers for extraneous behaviors and the reinforcers for matching to sample. As a result, extraneous behaviors interfere or compete with matching to sample. The notion of reinforcer competition is well developed in the study of concurrent choice and applications of the matching law (Davison & McCarthy, 1988). Even at the time of sample presentation, attention to the memory task may be diminished by distraction caused by reinforcers for other behaviors.

In our modification of the White-Wixted (1999) model, the parameter D represents the distance between means of the discriminal dispersions, as in the original version of the model. In the first several examples we present, D was free to vary in fitting the model, and tended to change in plausible ways with apparently decreasing difficulty of the discrimination. For example, Grant (1976) found decreasing accuracy with shorter presentation durations of the sample stimuli (Figure 8). In our fits of the model to Grant’s data, D decreased systematically with decreasing sample presentation duration. The intercepts of both obtained and predicted forgetting functions also decreased. As Figure 5 shows, however, the intercept can be determined by an additive combination of D and the intercept Ro (0) of the Ro growth function. That is, given a particular level of stimulus disparity, the background Ro at the beginning of the retention interval, or during sample presentation, can result in lack of attention to the sample and a decrease in discriminability. It is therefore possible to substitute changes in Ro (0) for changes in D, thus requiring only two free parameters in the Ro model, both relating to reinforcers for extraneous behavior. Consistent with this possibility, in some of the examples above, such as the effect of scopolamine in reducing overall accuracy (Figure 11), an increase in the intercept of the Ro growth function contributed to the reduction in the intercept of the forgetting function.

The main feature of the present Ro model is the assumption that Ro grows over the course of the retention interval, and that this growth is the cause of forgetting, that is, the progressive reduction in discriminability with the passage of time. This assumption and its implementation in our modification of the White-Wixted model allowed quantitative predictions of the time course of forgetting functions. Our assumption of the linear growth function is justified by the success of our Ro model in fitting the data. Although we have not reported measures of goodness of fit, the figures above show that the fits of the Ro model mirrored the fits of the exponential in √t function, which is the most successful function in fitting data from delayed matching studies (White, 2001, 2002b). When we considered alternative Ro growth functions, such as a limited growth exponential and the Gompertz function, they were essentially linear over the range of delays used in most of the studies reviewed, and so had no advantage over the linear function adopted here. The linear function might seem counterintuitive, but we see no reason why reinforcers from distracting sources should not continue to build up linearly as time progresses. The Ro model is somewhat parsimonious. It can be regarded as an interference model with a single primary mechanism: reinforcement competition. It has only three parameters, stimulus disparity D, the starting level of Ro, and the rate of growth of Ro over the course of the retention interval, or only two parameters when D is fixed. Quantitatively, it does well to fit delayed matching data from studies with a range of independent variables. We have not yet compared it with other possible models, in particular the reinforcement-based model of Nevin, Davison, Odum, and Shahan (2007), or conducted a comprehensive survey of its ability to fit data from all extant delayed matching studies with at least four delays.
Its ultimate success, however, may depend on more intuitive considerations. For example, when fits of the model with two or three free parameters indicate that hypothetical Ro is responsible for an effect, such as in the differential outcomes effect (see Figures 13 and 14), it will be necessary to provide validating evidence to reveal the action of extraneous rewards in diluting the effects of rewards for remembering.
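For reference, the linear growth assumption discussed above can be written out. The first line uses only quantities named in the text: the starting level Ro(0) and the growth rate g. The second line is a sketch of why a limited growth exponential would look linear at short delays; its form and the symbols R_max and k are assumptions introduced here for illustration, not the authors' parameterization.

```latex
% Linear growth of extraneous reinforcement over the retention interval t
R_o(t) = R_o(0) + g\,t, \qquad t \ge 0

% Assumed limited-growth alternative; for kt << 1 it is approximately
% linear with effective slope g = R_{\max} k
R_o(t) = R_o(0) + R_{\max}\left(1 - e^{-kt}\right)
       \approx R_o(0) + R_{\max}\,k\,t \quad \text{for } kt \ll 1
```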


References

Baddeley, A. (1997). Human memory: Theory and practice. Revised edition. Hove, UK: Psychology Press.

Blough, D. S. (1959). Delayed matching in the pigeon. Journal of the Experimental Analysis of Behavior, 2, 151-160. doi.org/10.1901/jeab.1959.2-151

Brown, G. S., & White, K. G. (2005a). On the effects of signalling reinforcer probability and magnitude. Journal of the Experimental Analysis of Behavior, 83, 119-128. doi.org/10.1901/jeab.2005.94-03

Brown, G. S., & White, K. G. (2005b). Remembering: The role of extraneous reinforcement. Learning & Behavior, 33, 309-323. doi.org/10.3758/BF03192860

Brown, G. S., & White, K. G. (2009). Reinforcer probability, reinforcer magnitude, and the reinforcement context for remembering. Journal of Experimental Psychology: Animal Behavior Processes, 35, 238-249. doi.org/10.1037/a0013864

Brown, J. (1958). Some tests of the decay theory of immediate memory. Quarterly Journal of Experimental Psychology, 10, 12-21. doi.org/10.1080/17470215808416249

Cook, R. G. (1980). Retroactive interference in pigeon short-term memory by a reduction in ambient illumination. Journal of Experimental Psychology: Animal Behavior Processes, 6, 326-338. doi.org/10.1037/0097-7403.6.4.326

Davison, M. C. (1969). Preference for mixed-interval versus fixed-interval schedules. Journal of the Experimental Analysis of Behavior, 12, 247-252. doi.org/10.1901/jeab.1972.17-169

Davison, M., & McCarthy, D. (1988). The Matching Law: A research review. Hillsdale, NJ: Erlbaum.

Davison, M. C., & Tustin, R. D. (1978). The relation between the generalized matching law and signal detection theory. Journal of the Experimental Analysis of Behavior, 29, 331-336. doi.org/10.1901/jeab.1978.29-331

Edhouse, W., & White, K. G. (1988). Sources of proactive interference in animal memory. Journal of Experimental Psychology: Animal Behavior Processes, 14, 56-71. doi.org/10.1037/0097-7403.14.1.56

Fantino, E. (1967). Preference for mixed- versus fixed-ratio schedules. Journal of the Experimental Analysis of Behavior, 10, 35-43. doi.org/10.1901/jeab.1967.10-35

Fetterman, J. G. (1995). The psychophysics of remembered duration. Animal Learning & Behavior, 23, 49-62. doi.org/10.3758/BF03198015

Grant, D. S. (1976). Effect of sample presentation time on long-delay matching in the pigeon. Learning and Motivation, 7, 580-590. doi.org/10.1016/0023-9690(76)90008-4

Grant, D. S. (1981). Short-term memory in the pigeon. In N. E. Spear & R. R. Miller (Eds.), Information processing in animals: Memory mechanisms (pp. 227-256). Hillsdale, NJ: Erlbaum.

Harper, D. N., & White, K. G. (1997). Retroactive interference and rate of forgetting in delayed matching-to-sample performance. Animal Learning & Behavior, 25, 158-164. doi.org/10.3758/BF03199053

Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267-272. doi.org/10.1901/jeab.1961.4-267

Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243-266. doi.org/10.1901/jeab.1970.13-243

Hursh, S. R., & Fantino, E. (1974). An appraisal of preference for multiple versus mixed schedules. Journal of the Experimental Analysis of Behavior, 22, 31-38. doi.org/10.1901/jeab.1974.22-31

Jones, B. M., & White, K. G. (1994). An investigation of the differential-outcomes effect within sessions. Journal of the Experimental Analysis of Behavior, 61, 389-406. doi.org/10.1901/jeab.1994.61-389

Lansdale, M., & Baguley, T. (2008). Dilution as a model of long-term forgetting. Psychological Review, 115, 864-892. doi.org/10.1037/a0013325

Lewandowsky, S., Oberauer, K., & Brown, G. D. A. (2009). No temporal decay in verbal short-term memory. Trends in Cognitive Sciences, 13, 120-126. doi.org/10.1016/j.tics.2008.12.003

McGeoch, J. A. (1932). Forgetting and the law of disuse. Psychological Review, 39, 352-370. doi.org/10.1037/h0069819

McLean, A. P. (1991). Local contrast in behavior allocation during multiple-schedule components. Journal of the Experimental Analysis of Behavior, 56, 81-96. doi.org/10.1901/jeab.1991.56-81

McLean, A. P., & White, K. G. (1983). Temporal constraint on choice: Sensitivity and bias in multiple schedules. Journal of the Experimental Analysis of Behavior, 39, 405-426. doi.org/10.1901/jeab.1983.39-405

Miller, H. C., Friedrich, A. M., Narkavic, R. J., & Zentall, T. R. (2009). A differential-outcomes effect using hedonically nondifferential outcomes with delayed matching to sample by pigeons. Learning & Behavior, 37, 161-166. doi.org/10.3758/LB.37.2.161

Nairne, J. S. (2002). Remembering over the short-term: The case against the standard model. Annual Review of Psychology, 53, 53-81. doi.org/10.1146/annurev.psych.53.100901.135131

Nevin, J. A., Davison, M., Odum, A. L., & Shahan, T. A. (2007). A theory of attending, remembering, and reinforcement in delayed matching to sample. Journal of the Experimental Analysis of Behavior, 88, 285-317. doi.org/10.1901/jeab.2007.88-285

Parkes, M., & White, K. G. (2000). Glucose attenuation of memory impairments. Behavioral Neuroscience, 114, 1-13. doi.org/10.1037//0735-7044.114.2.307

Peterson, L. R., & Peterson, M. (1959). Short-term retention of individual verbal items. Journal of Experimental Psychology, 58, 193-198. doi.org/10.1037/h0049234

Portrat, S., Barrouillet, P., & Camos, V. (2008). Time-related decay or interference-based forgetting in working memory? Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 1561-1564. doi.org/10.1037/a0013356

Roberts, W. A. (1972). Short-term memory in the pigeon: Effects of repetition and spacing. Journal of Experimental Psychology, 94, 74-83. doi.org/10.1037/h0032796

Roberts, W. A. (1980). Distribution of trials and intertrial retention in delayed matching to sample with pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 6, 217-237. doi.org/10.1037/0097-7403.6.3.217

Roberts, W. A., & Grant, D. S. (1978). An analysis of light-induced retroactive inhibition in pigeon short-term memory. Journal of Experimental Psychology: Animal Behavior Processes, 4, 219-236. doi.org/10.1037/0097-7403.4.3.219

Roediger, H. L. III., Weinstein, Y., & Agarwal, P. K. (2010). Forgetting: Preliminary considerations. In S. Della Sala (Ed.), Forgetting (pp. 1-22). Hove, Sussex: Psychology Press.

Rubin, D. C., & Wenzel, A. E. (1996). One hundred years of forgetting: A quantitative description of retention. Psychological Review, 103, 734-760. doi.org/10.1037/0033-295x.103.4.734

Ruske, A. C., Fisher, A., & White, K. G. (1997). Attenuation of scopolamine-induced deficits in delayed-matching performance by a new muscarinic agonist. Psychobiology, 25, 313-320.

Santi, A. (1984). The trial spacing effect in delayed matching-to-sample by pigeons is dependent upon the illumination condition during the intertrial interval. Canadian Journal of Psychology, 38, 154-165. doi.org/10.1037/h0080830

Santi, A., & Roberts, W. A. (1985). Reinforcement expectancy and trial spacing effects in delayed matching-to-sample by pigeons. Animal Learning & Behavior, 13, 274-284. doi.org/10.3758/BF03200021

Sargisson, R. J., & White, K. G. (2003). On the form of the forgetting function: The effects of arithmetic and logarithmic distributions of delays. Journal of the Experimental Analysis of Behavior, 80, 295-309. doi.org/10.1901/jeab.2003.80-295

Spetch, M. L. (1985). The effect of intertrial interval food presentations on pigeons’ delayed matching to sample accuracy. Behavioural Processes, 11, 309-315. doi.org/10.1016/0376-6357(85)90025-7

Staddon, J. E. R. (1983). Adaptive behavior and learning. Cambridge: Cambridge University Press.

Surprenant, A. M., & Neath, I. (2009). Principles of memory. New York: Psychology Press.

Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273-286. doi.org/10.1037/h0070288

Urcuioli, P. J. (2005). Behavioral and associative effects of differential outcomes on discrimination learning. Learning & Behavior, 33, 1-21. doi.org/10.3758/BF03196047

White, K. G. (1985). Characteristics of forgetting functions in delayed matching to sample. Journal of the Experimental Analysis of Behavior, 44, 15-34. doi.org/10.1901/jeab.1985.44-15

White, K. G. (2001). Forgetting functions. Animal Learning & Behavior, 29, 193-207. doi.org/10.3758/BF03192887

White, K. G. (2002a). Psychophysics of remembering: The discrimination hypothesis. Current Directions in Psychological Science, 11, 141-145. doi.org/10.1111/1467-8721.00187

White, K. G. (2002b). Temporal generalization and diffusion in forgetting. Behavioural Processes, 57, 121-129. doi.org/10.1016/S0376-6357(02)00009-8

White, K. G. (2012). Dissociation of short term forgetting from the passage of time. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38, 255-259. doi.org/10.1037/a0025197

White, K. G. (2013). Remembering and forgetting. In Madden, G. J. (Ed.-in-Chief), W. V. Dube, T. Hackenberg, G. P. Hanley, & K. A. Lattal (Assoc. Eds.) APA handbooks in psychology: APA Handbook of behavior analysis, Volume 1: Methods and principles. Washington, DC: American Psychological Association.

White, K. G., & Brown, G. S. (2011). Reversing the course of forgetting. Journal of the Experimental Analysis of Behavior, 96, 177-189. doi.org/10.1901/jeab.2011.96-177

White, K. G., & Ruske, A. C. (2002). Memory deficits in Alzheimer’s Disease: The encoding hypothesis and cholinergic function. Psychonomic Bulletin & Review, 9, 426-437. doi.org/10.3758/BF03196301

White, K. G., & Wixted, J. T. (1999). Psychophysics of remembering. Journal of the Experimental Analysis of Behavior, 71, 91-113. doi.org/10.1901/jeab.1999.71-91

White, K. G., & Wixted, J. T. (2010). Psychophysics of remembering: To bias or not to bias? Journal of the Experimental Analysis of Behavior, 94, 83-94. doi.org/10.1901/jeab.2010.94-83

Wixted, J. T. (2004). The psychology and neuroscience of forgetting. Annual Review of Psychology, 55, 235-269. doi.org/10.1901/jeab.2010.94-83

Wixted, J. T. (2010). The role of retroactive interference and consolidation in everyday forgetting. In S. Della Sala (Ed.), Forgetting (pp. 285-312). Hove, Sussex: Psychology Press.

Wixted, J. T., & Carpenter, S. K. (2007). The Wickelgren Power Law and the Ebbinghaus Savings Function. Psychological Science, 18, 133-134. doi.org/10.1111/j.1467-9280.2007.01862.x

Wixted, J. T., & Ebbesen, E. B. (1991). On the form of forgetting. Psychological Science, 6, 409-415. doi.org/10.1111/j.1467-9280.1991.tb00175.x

Wright, F. K., & White, K. G. (2003). Effects of methylphenidate on working memory in pigeons. Cognitive, Affective & Behavioral Neuroscience, 3, 300-308. doi.org/10.3758/CABN.3.4.300

Woodworth, R. S., & Schlosberg, H. (1954). Experimental psychology: Revised edition. New York: Holt, Rinehart and Winston.

Zentall, T. R. (1973). Memory in the pigeon: Retroactive inhibition in a delayed matching task. Bulletin of the Psychonomic Society, 1, 126-128.

Zentall, T. R., & Sherburne, L. M. (1994). The role of differential sample responding in the differential outcomes effect involving delayed matching by pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 20, 390-401. doi.org/10.1037/0097-7403.20.4.390

Volume 8: pp 78 – 97

Two Fields Are Better Than One: Developmental and Comparative Perspectives On Understanding Spatial Reorientation

Alexandra D. Twyman
University of Western Ontario

Daniele Nardi
Sapienza University

Nora S. Newcombe
Temple University


Abstract: Occasionally, we lose track of our position in the world, and must re-establish where we are located in order to function. This process has been termed the ability to reorient and was first studied by Ken Cheng in 1986. Reorientation research has revealed some powerful cross-species commonalities. It has also engaged the question of human uniqueness because it has been claimed that human adults reorient differently from other species, or from young human children, in a fashion grounded in the distinctive combinatorial power of human language. In this chapter, we consider the phenomenon of reorientation in comparative perspective, both to evaluate specific claims regarding commonalities and differences in spatial navigation, and also to illustrate, more generally, how comparative cognition research and research in human cognitive development have deep mutual relevance.

Keywords: spatial reorientation, geometric module, adaptive combination, individual differences, sex differences, slope


One of the many unique characteristics of the human species is, arguably, the urge to reflect on what characteristics make us unique. There are many distinctive characteristics to consider, such as large brains, bipedal gait, lengthy childhoods, tool invention and use, symbolic representation and grammatically-structured language. But at least as interesting a question as what makes our species distinctive is the question of what we share with other species. In fact, systematic understanding of similarities as well as differences is arguably helpful to answering questions about species-uniqueness.

When we pursue a serious comparative cognition research strategy of this kind, the ability to navigate successfully is a central domain in which to work. Navigation is a crucial skill for all mobile organisms. Do all species use the same techniques to navigate successfully? Common mechanisms could arise either because the essential problem was solved long ago by a common ancestor, or because the structure of the problem itself places constraints on the possible ways it can be solved. Or do various species invent different solutions to the navigation problem, depending on their sensory and motor abilities, the kind of food they seek, the characteristics of their predators, and so forth?

At first glance, it seems likely that various species differ considerably in how they navigate (for a general overview of navigation in a comparative perspective, see Wiener et al., 2011). For example, some species have magnetic compasses or sonar capabilities, while others do not; some species migrate long distances, while others live out their lives in ancestrally-defined territories. However, despite these obvious differences between species, there may also be deeper commonalities. One such cross-species commonality in spatial navigation has been proposed to be the use of geometric information in the surrounding environment to reorient. Occasionally, we lose track of our position in the world, and must re-establish where we are located in order to function. Several kinds of information could guide this process, called reorientation.

One parsing of the information sources for reorientation proposes two classes of cues (Gallistel, 1990). Geometric cues involve the relation between at least two points or two surfaces; in the lab, this has been operationalized mainly by investigating the use of relative lengths or corner angles of enclosed surfaces. Any other cue to orientation has been termed, by default, non-geometric, or sometimes featural, and operationalizations have included the study of colored walls, beacons, and odors. More recently, a third type of cue – the slope of the floor of an enclosed search space – has been examined, and slope appears to be a powerful reorientation cue as well.

Reorientation research has revealed some powerful cross-species commonalities. It has also engaged the question of human uniqueness because it has been claimed that human adults reorient differently from other species, or from young human children, in a fashion grounded in the distinctive combinatorial power of human language. In this chapter, we consider the phenomenon of reorientation in comparative perspective, both to evaluate specific claims regarding commonalities and differences in spatial navigation, and also to illustrate, more generally, how comparative cognition research and research in human cognitive development have deep mutual relevance. We begin with the debate over the geometric module, as this issue has initiated and fueled research in the field. Following an exposition of the modular approach, we first discuss claims that human language confers a unique mode of operation on human adults and older children, and then proceed to other aspects of the modularity debate, and evidence for a non-modular position, i.e., adaptive combination theory. We then transition to two sections that are aimed at broadening the focus of the debate. The first of these sections focuses on a discussion of slope as a potential reorientation cue, how it might be differentially used across species, and if slope could be considered a particular type of either geometric or feature information, or is instead an entirely new cue class. The second case for a wider perspective comes from the fact that the reorientation literature has so far focused on the behavior of groups of individuals, for example, pigeons or mice or children of various ages, considered collectively. There is a growing trend to look for individual differences within species or age groups that might be predictors of behavior. Many spatial abilities have been studied in relation to individual and sex differences in performance, and we close with a discussion of recently reported sex-related differences in reorientation.

The Original Proposal: A Geometric Module

Ken Cheng (1986) was the first researcher to observe a difference between the search behavior of oriented and disoriented rats. His rats were allowed to search for food as they wandered around in a rectangular enclosure. Each of the corners was marked with distinctive feature cues of various kinds, e.g., the number of lights, the odor (see Figure 1). Once a rat found the correct corner, it was allowed to start eating, but partway through its meal, it was removed, disoriented, and then placed in an identical enclosure. It would seem quite logical for the rat to return to the corner at which there had been food, but this only happened 50% of the time. In this situation, rats favored the corners that were geometrically correct, but did not use other cues to disambiguate the two corners. For example, even when the correct corner smelled of peppermint, rats would sometimes return to the peppermint-scented corner, but equally often go to the rotationally equivalent corner that smelled of licorice. This behavioral pattern is found only for working memory versions of the task where the correct corner changes from trial to trial. In reference memory versions of the task, where the correct location remains stable over the course of the experiment, then over time rats are able to learn to use the non-geometric properties of the space.

To explain this suboptimal behavior on the working memory task, Cheng proposed the idea of a geometric module for reorientation. He argued that when rats return to the enclosed space, the geometry of the enclosure is the overriding cue that is used to re-set their spatial position so that the two corners with identical geometric properties are indistinguishable.  Importantly, the geometric information was proposed to be modular, in the sense of being encapsulated and impenetrable. This description captured the fact that rats discarded the useful feature information, even though it could have been used for better performance.

Gallistel (1990) proposed that the apparently suboptimal behavior observed in the lab might be quite advantageous in the natural world. He argued that the features of the environment change, sometimes over the course of the day as the sunlight shifts or weather patterns change, and also over the seasons, as when the leaves change color and when snow falls. Because the geometric properties of the environment are less changeable than other cues, such as odors, Gallistel proposed that there might have been selective pressure for a geometric module to evolve that excluded the variable feature properties and depended only on the stable geometric properties of the environment.

FIGURE 1 HERE

The Geometric Module-Plus-Language Hypothesis

Cheng’s findings and Gallistel’s analysis suggested that the geometric module might characterize the behavior of many species, including humans. Indeed, children between 18 months and six years of age seemed to perform the same as Cheng’s rats (Hermer & Spelke, 1994, 1996). That is, they ignored a saliently-colored feature wall in a rectangular room, and instead searched for a hidden object in the two geometrically equivalent corners (see Figure 2). Since children and rats performed similarly, it appeared that the reorientation module was evolutionarily ancient and conserved across species. However, human adults, in contrast to rats and toddlers, were able to flexibly combine feature and geometric information and searched almost exclusively for the hidden object at the correct corner. The fundamental difference between the reorientation behavior of rats and children on the one hand and adults on the other hand was proposed to be due to the production of spatial language that enabled flexible adult performance.

FIGURE 2 HERE

Support for the geometric module-plus-language account came from two primary lines of research. First, it was found that, for children between the ages of 5 and 6 years, there was a correlation between production of the words “left” and “right” and successful performance on the reorientation task (Hermer-Vazquez, Moffett, & Munkholm, 2001). The second empirical approach was to try to eliminate adults’ use of language during the reorientation task (Hermer-Vazquez, Spelke, & Katsnelson, 1999). When adults were asked to perform a verbal shadowing task at the same time as the reorientation task, their reorientation behavior fell back to exclusive geometric choices similar to those of the rats and young children. These two lines of evidence were taken as support that children were limited to using geometric information for reorientation until they acquired spatial language production capabilities that enabled them to flexibly integrate feature and geometric cues.

Initial Comparative Work

Troubling evidence for the geometric module-plus-language position seemed to come from comparative data gathered since Cheng’s original work. Features turned out to be used often for reorientation across a wide range of non-human animals, including chickens (Vallortigara, Zanforlin, & Pasti, 1990), pigeons (Kelly, Spetch, & Heth, 1998), monkeys (Gouteux, Thinus-Blanc, & Vauclair, 2001; see Figure 3 below), fish (Sovrano, Bisazza, & Vallortigara, 2003), mice (Twyman, Newcombe, & Gould, 2009), and ants (Wystrach & Beugnon, 2009). It is obviously unlikely that feature use in these non-human species could be explained through language.

FIGURE 3 HERE

There are problems, however, with regarding these data as invalidating either the modularity hypothesis or the unique role of human language. First, many of the studies used a reference memory paradigm, in which correct search remains constant across trials. Cheng (1986) had only found modularity effects in working memory, where the correct location changes from trial to trial. Second, Hermer-Vazquez et al. (2001) objected that studies with non-human animals involve extensive training. They suggested that the distinctive power of human language comes from its ability to allow for flexible use of features without training.

Is Language Necessary for Feature Use in Reorientation?

Because work with non-human animals involves training regimens by necessity, the hypothesis that human language has a unique role can really only be examined in the human species. Focusing only on the human evidence, there is reason to doubt that language is necessary for the puncturing of a geometric module by feature cues. First, each of the two lines of supportive research presented earlier can be questioned. There are puzzling aspects to the Hermer-Vazquez et al. (2001) data, such as why it is the production of spatial terms that is associated with better performance, rather than comprehension. Additionally, as suggestive as the data are, it is possible that a third variable could account for the relationship between language production and flexible reorientation. There are also problems with the verbal shadowing experiments. While they seem to give stronger evidence than the correlational data, subsequent research has failed to replicate the dramatic fall to chance for adults concurrently performing the reorientation and verbal shadowing task. Furthermore, and crucially, while reorientation performance does diminish to some extent with verbal shadowing, the effect is not particular to a linguistic task but also occurs with spatial shadowing tasks (Hupbach, Hardt, Nadel, & Bohbot, 2007; Ratliff & Newcombe, 2008a). These data seem to suggest that, while language is a useful tool for adults, it is not a necessity.

Second, if language were crucial, it would seem that individuals with language problems should perform like young children on the reorientation task. There are two tests of this idea. In one experiment, individuals with global aphasia performed no differently from control participants (Bek, Blades, Siegal, & Varley, 2010), suggesting that the flexible behavior observed with human adults does not depend exclusively on the availability of language (although perhaps having been able to speak for many years could be argued to have crucially affected spatial reorientation). In the second experiment, deaf individuals in Nicaragua who had grown up in an environment without input from a structured sign language performed less well than deaf individuals in a second, later-born cohort who did have such input (Pyers, Shusterman, Senghas, Spelke, & Emmorey, 2010). However, the first cohort still searched at the correct corner far more than would be expected by chance (67.5% as opposed to 25% chance). Further, other aspects of the data set indicated that the first cohort had been deprived in ways that led to spatial deficits more global than deficits in feature use for reorientation. They also performed less well than the second cohort in a rotated box condition that did not involve reorientation, and they showed an odd pattern of errors in the reorientation study, in which rotational errors did not predominate, as is almost universal in reorientation studies.

Third, and most decisively, it has turned out that toddlers can in fact use features to reorient. Although far too young to be able to use or comprehend the terms left and right, and often with little spatial language at all, children as young as 18 months can succeed in using a colored wall to find the correct corner in a rectangular room, as long as the room is somewhat larger than the very small room used in the initial Hermer and Spelke studies (Learmonth, Newcombe, & Huttenlocher, 2001). We will review the room size effect in more detail below.

In sum, there is reason to doubt the position that language is the mechanism that facilitates a more flexible reorientation strategy in adults compared to children and non-human animals. However, this is not to say that language is not helpful. There is evidence that even just hearing relevant spatial language (at the red wall) or task relevant non-spatial language (red can help you) can be a powerful tool to help children succeed at reorientation tasks before they are normally able to reorient with a feature cue (Shusterman, Lee, & Spelke, 2011).

Are Features Really Used by Children to Reorient?

Lee, Shusterman, and Spelke (2006) and Lee and Spelke (2010) have proposed an alternative account for the apparent use of features by children and non-human animals. They argue that true reorientation can only be accomplished with geometric cues; in a separate process, features can be used to guide search to the target location, but features are not used to update position in the environment. To test this hypothesis, Lee et al. (2006) asked children to reorient in an enclosed circular space, which does not provide any useful geometric information. Three objects forming an equilateral triangle were placed in the middle of the enclosure. One of these objects was unique (a red cylinder) and two of the objects were identical (blue boxes). Lee et al. argued that the unique red cylinder could act both as a beacon (a feature that directly marks a hiding location) and also as a landmark (a feature that indirectly marks a hiding location) that could in theory differentiate search between the two identical blue box locations. For example, children might orient themselves to the red cylinder and then remember that the hiding location was the blue box on the left. This kind of performance was not found. Children searched almost perfectly at the unique container (a beacon) but divided search evenly (i.e., randomly) between the two blue containers. The authors reasoned that if features were truly capable of being used for reorientation, then children should succeed at the task when the target is hidden in any of the three containers. Therefore, it was argued that children remained disoriented in the absence of a geometric cue, but were nonetheless able to use a beacon to retrieve a hidden object.

As reorientation experiments are often conducted in rectangular enclosures, the two-step account could potentially explain the use of features by non-human animals and young children in the majority of studies to date. In the first step, the only true reorientation step, the participant or subject is able to reorient by the geometry of the space which narrows the possible search locations to two geometrically correct places. In the second step, the participant or subject chooses either the white-white geometrically correct or white-colored geometrically correct corner by beaconing to the correct target location. Thus, the Lee et al. (2006) experiment suggested that a two-step account for reorientation, with true reorientation based on geometry and beacon piloting accounting for feature use, might explain use of features by young children and non-human animals.

This study is not, however, decisive. Some of the parameters of the Lee et al. (2006) study may have made features less likely to be used for reorientation. First, although the area of the circular enclosure was quite large, the actual area of the array of objects was small. It has been demonstrated that features are less likely to be used in small spaces (Learmonth, Newcombe, & Huttenlocher, 2001; Learmonth, Nadel, & Newcombe, 2002). Features are more likely to be used for orientation when they are further away (called distal cues) because they are more accurate for indicating direction than when they are close to the hiding location (proximal cues) where left-right relations can change as one moves around the target location (Nadel & Hupbach, 2006). Second, the feature was itself a hiding container, and thus it is not surprising that it was used as a beacon. Third, the feature appeared small and portable, and in fact the children watched the experimenter move the hiding locations. Mobile parts of the environment are not reliable cues for determining a heading. Fourth, different brain regions appear to be activated when the feature is located inside a space, as opposed to against or on the periphery of an enclosure. From the animal literature, features along the periphery of the enclosure control hippocampal place cell firing, while the same landmark inside the enclosure does not (Cressant, Muller, & Poucet, 1997, 1999; Zugaro, Berthoz, & Wiener, 2001). All of these factors make it more likely that the unique container would be coded by children as a beacon, rather than as a landmark for reorientation.

In fact, there is some evidence that features can be used as a heading cue for reorientation. In square rooms, there are no useful geometric cues to aid reorientation. Success in this task would therefore depend on the use of feature cues. In square environments, toddlers are able to reorient using relative feature cues such as large versus small polka-dot patterns (Huttenlocher & Lourenco, 2007; see Figure 4) and distinct colors (Nardini, Atkinson, & Burgess, 2008). This effect was also found for mice (Twyman, Newcombe, & Gould, 2009). However, a possible rebuttal from modularity theorists would be that performance in this paradigm is based on the use of complex beacons. The corners of the enclosure can be distinguished from adjacent corners (although not from the diagonally opposite corners) based on the left-right positions of each feature (i.e. the corners might be blue/red, or red/blue). It is therefore possible that the combination of features, including relative position information, could be used as a beacon, leaving open the possibility that feature use in these experiments might be accounted for by an associative model.

FIGURE 4 HERE

More directly, Newcombe, Ratliff, Shallcross, and Twyman (2010) designed an experiment to test the Lee et al. (2006) claims. In the first experiment, children were asked to reorient in an octagon with alternating short and long walls. In this type of enclosure, the eight possible hiding locations can be reduced to the four geometrically equivalent corners that share the same wall length and sense relations to the target location (see Figure 5). For example, a participant could use the geometry of the octagon to remember that the correct location is in one of the corners with a long wall to the left and a short wall to the right. Different groups of children were asked to reorient in the octagonal space either with or without one of the walls of the octagon serving as a red feature wall. This cue could be used, for example, to remember that the target is on the left side of the red wall.
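The reduction from eight corners to four can be illustrated with a minimal sketch. It is not part of the original study and rests on stated assumptions: an enclosure is treated as a cyclic list of wall labels with equal corner angles, corner i is taken to be the corner where wall i begins as one walks around the room, and two corners count as geometrically equivalent when rotating the room maps one onto the other. The function name `equivalent_corners` and the wall labels are hypothetical.

```python
def equivalent_corners(walls, target):
    """Return the corners that a rotation of the room maps onto `target`.

    walls  -- cyclic list of wall labels, e.g. ["long", "short", ...]
    target -- index of the corner where the goal is hidden
    Assumes all corner angles are equal, so wall labels alone
    describe the shape (an illustrative simplification).
    """
    n = len(walls)
    matches = []
    for corner in range(n):
        shift = (corner - target) % n
        # Rotate the wall sequence by `shift` positions; if the room looks
        # identical, `corner` cannot be told apart from `target` by shape.
        rotated = walls[shift:] + walls[:shift]
        if rotated == walls:
            matches.append(corner)
    return matches


rectangle = ["long", "short", "long", "short"]
octagon = ["long", "short"] * 4

print(equivalent_corners(rectangle, 0))  # [0, 2]
print(equivalent_corners(octagon, 0))    # [0, 2, 4, 6]

# Giving one wall a distinct label (a hypothetical red feature wall)
# breaks the rotational symmetry, so only the target itself remains.
octagon_red = ["long-red", "short"] + ["long", "short"] * 3
print(equivalent_corners(octagon_red, 3))  # [3]
```

Under these assumptions, shape alone leaves two candidate corners in a rectangle and four in the alternating octagon, while a single distinctive wall is in principle enough to single out any corner, including corners that do not touch that wall.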

The first finding was that, in an all-white (geometry-only) condition, 2- and 3-year-old children were able to use the complex geometry of the octagon for orientation. The fact that toddlers were able to use the geometry of the octagon was quite remarkable given the complexity of the shape, the subtle obtuse corner angles, and the lack of a single principal axis of space that might have helped reorientation. The second finding was that, when a feature wall was added, 3- and 5-year-old children were able to choose among the three all-white corners that share the same geometric and feature properties; these corners can only be distinguished on the basis of indirect feature use of the red wall. (Two-year-old children were not tested.) The octagon experiments demonstrate that children are able to use the feature for true reorientation, at least in the presence of geometric information.

To determine what happens in the absence of geometric information, a second experiment was conducted in a circle with a design similar to that of the Lee et al. (2006) study. Four-year-old children were asked to reorient in a circular enclosure and were asked to find a hidden object in small hiding boxes (see Figure 6). The most important difference between the Lee et al. and Newcombe et al. experiments is that in the former, the feature is actually one of the hiding locations and is centrally placed within the enclosure, while in the latter the feature is a stable part of the enclosure boundary. When the feature is stable and integrated into the space, children are able to reorient with the feature cue. They are able to correctly search at a hiding location within an array of either two or three boxes placed in the middle of the enclosure. Together, studies with children that use a more stable feature cue suggest that features are truly used for reorientation, and not just as beacons marking the target. There are at least two lines of research that could extend these findings. First, although children searched above chance in the octagon and circle experiments, adults were quite a bit more accurate. Therefore it appears that the use of both geometric and feature cues develops beyond the first five years of life. These paradigms could be used to chart the developmental trajectories of both cue classes. Second, in a complementary fashion, it would be interesting to extend these paradigms to nonhuman animals to determine if they too are able to truly use feature cues for reorientation.

FIGURE 5 HERE

An Alternative Proposal: Adaptive Combination Theory

Spatial memory and judgments are typically based on a variety of cues, and there is evidence that these cues are combined in a Bayesian fashion (Cheng, Shettleworth, Huttenlocher & Rieser, 2007; Huttenlocher, Hedges & Duncan, 1991; Waismeyer & Jacobs, 2012). This idea can be applied to the data on use of geometric and featural cues. In contrast to modularity theory, adaptive combination theory proposes that geometric and featural cues can both be used for reorientation in a fashion that depends on a combination of cue weights, with the weights determined by factors such as the perceptual salience of the cues (which affects their initial encoding), the reliability of the memory traces (i.e., subjective uncertainty, which is related to the variability of estimates), and the success with that kind of cue in prior experience (Newcombe & Huttenlocher, 2006; Newcombe & Ratliff, 2007). Information that is more salient, more reliable as a predictor of the goal, more familiar, or low in variability, should be taken into account more than other competing sources of information. The flexibility of adaptive combination theory suggests that, when features and geometry have similar weights on these dimensions, they should be integrated, but when the combination of weights on these dimensions strongly favors one kind of cue over the other, that cue should dominate.
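The weighting idea can be made concrete with a minimal sketch. It assumes the textbook reliability-weighted (inverse-variance) rule for combining two independent Gaussian estimates, which is one common Bayesian formalization and not a model specified by adaptive combination theory itself. The function name `combine_cues`, the one-dimensional location estimate, and all numbers are illustrative assumptions.

```python
def combine_cues(mu_geo, var_geo, mu_feat, var_feat):
    """Combine a geometry-based and a feature-based location estimate.

    Each cue is summarized by a mean (estimated goal location, arbitrary
    units) and a variance (lower variance = more reliable cue).
    Assumes independent Gaussian estimates; weights are proportional to
    reliability, i.e. inverse variance.
    """
    if var_geo <= 0 or var_feat <= 0:
        raise ValueError("variances must be positive")
    rel_geo = 1.0 / var_geo
    rel_feat = 1.0 / var_feat
    w_geo = rel_geo / (rel_geo + rel_feat)
    w_feat = 1.0 - w_geo
    mu = w_geo * mu_geo + w_feat * mu_feat
    var = 1.0 / (rel_geo + rel_feat)  # combined estimate is less variable
    return mu, var, w_geo, w_feat


# Equally reliable cues are averaged (integration).
print(combine_cues(0.0, 1.0, 10.0, 1.0))

# A much noisier feature cue is nearly ignored (geometry dominates).
print(combine_cues(0.0, 1.0, 10.0, 100.0))
```

In this sketch, similar variances yield an estimate midway between the two cues, whereas a large difference in variance lets the more reliable cue dominate, which mirrors the qualitative prediction described above. Salience and prior success would have to enter such a model through additional terms that are not specified here.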

We should pause for a moment to discuss cue salience. Geometry and features have been the main cue classes that have been examined in the reorientation literature. It might be argued that it is difficult to compare how much each contributes to behavior because the saliencies of the cue types are impossible to equate, and may well differ across periods of development or between species. While it is true that the absolute salience of each cue cannot be known for each participant or subject, what is important for adaptive combination theory is that the salience can be varied. For any given situation, when the salience of the cue is increased, then the adaptive combination theory predicts that it will be more heavily used. This kind of finding has been demonstrated. For example, we see a reduced reliance on geometric information in increasingly large rooms (the room size effect discussed below) where the feature cue becomes more salient because it is more distal. As another example, when subjects have spent the early part of their lives in either geometrically or featurally rich environments, we see rearing effects, also to be discussed further below.

FIGURE 6 HERE

Despite the strengths of the adaptive combination approach, its potential weakness is being overly general, and future work clearly needs to more rigorously specify the parameters in a well-defined model, and test novel predictions. Nonetheless, in this section we review the data that suggest that some model more flexible than geometric modularity is necessary.

The Room Size Effect

In an important illustration of adaptive combination theory, and a challenge to modularity theorists, the dominance of geometric information over feature use has turned out to depend critically on the size of the enclosure. Geometry is more likely to be used in small spaces and features are more likely to be used in large spaces, for children (Learmonth et al., 2001, 2002, 2008), adults (Ratliff & Newcombe, 2008b), fish (Sovrano et al., 2007), chicks (Chiandetti et al., 2007; Sovrano & Vallortigara, 2006; Vallortigara et al., 2005), and pigeons (Kelly et al., 1998). These data cannot be explained by any interesting version of modularity theory because an adaptive module should operate across variations in scale and should especially operate in large spaces. It is true that there might be a module that applies only to very small enclosures, but it is hard to see how such a module would be central to survival and reproduction in any plausible environment of adaptation.

Why does the size of the space make a difference? One possibility is that the geometric cue is more salient in small spaces because the relative difference between wall lengths is more noticeable when the aspect ratio is greater and when the wall lengths can be compared within a single view. Therefore, as the room size increases, the weight assigned to the geometry cue is reduced. However, the attractiveness of this idea is decreased by a recent demonstration that the distance of the walls from the center of the room is the potent cue in this paradigm, rather than the lengths of the walls (Lee, Sovrano & Spelke, 2012). If distance is more important than length, then one could postulate that differences in two short distances are easier to compare than differences in two longer distances.

There are other explanations for the room size effect. As the room size increases, the weight assigned to the feature cue is increased because a landmark is more useful for determining heading when it is a distal rather than a proximal cue (Lew, 2011). In addition, the increased possibility for movement in the larger room may engage more spatial processing. In several experiments on these issues, Learmonth, Newcombe, Sheridan, and Jones (2008) found that both the distance of features from the participant and the possibilities for action in the larger space have an impact on the age at which children succeed in using features. The changing relative use of geometric and feature cues based on the scale of space is difficult for the modular position to explain, as it would predict invariant use of geometry. In contrast, the changing weights of cues as a function of their salience and reliability are at the heart of adaptive combination theory.
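The claim that a distal landmark is a more useful heading cue has a simple geometric basis, sketched below. The sketch computes how much the direction to a landmark changes when an observer takes a small sideways step: the smaller the change, the more stable the landmark is as a direction indicator. The coordinates, distances, and function names are illustrative assumptions and are not values from the cited studies.

```python
import math


def bearing_deg(observer, landmark):
    """Direction from observer to landmark in degrees (0 = along +x axis)."""
    dx = landmark[0] - observer[0]
    dy = landmark[1] - observer[1]
    return math.degrees(math.atan2(dy, dx))


def bearing_shift_deg(landmark, start=(0.0, 0.0), step=(0.0, 1.0)):
    """Absolute change in bearing to `landmark` after moving by `step`."""
    moved = (start[0] + step[0], start[1] + step[1])
    diff = bearing_deg(moved, landmark) - bearing_deg(start, landmark)
    # Wrap the difference into [-180, 180) before taking the magnitude.
    return abs((diff + 180.0) % 360.0 - 180.0)


proximal = (2.0, 0.0)   # hypothetical landmark 2 units away
distal = (20.0, 0.0)    # hypothetical landmark 20 units away

print(round(bearing_shift_deg(proximal), 1))  # 26.6
print(round(bearing_shift_deg(distal), 1))    # 2.9
```

With these illustrative numbers, the same one-unit step changes the bearing to the nearby landmark by roughly ten times as much as the bearing to the distant one, which is consistent with the idea that a feature on a far wall specifies heading more reliably than one close to the search location.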

Short-Term Experience Effects

Experience effects are not predicted by modularity theory; modules are supposed to be inflexible and relatively impermeable. However, adaptive combination theory suggests that familiarity with a cue should be an important determinant of use of features versus geometry. There are several training experiments that provide support for the effects of recent experience. In one study, children were given practice using a feature for reorientation in an equilateral triangle (no useful geometry) with three different colored walls (Twyman, Friedman, & Spetch, 2007). In as few as four practice trials with the feature, 4- and 5-year-old children came to use the feature wall to reorient even in the small space used by Hermer and Spelke (1994, 1996), in which same-aged children had been shown to rely exclusively on geometric cues. The short training period was effective in either the presence (a rectangle) or absence (equilateral triangle) of relevant geometric information. This experiment highlights that the relative use of geometric and feature cues can change. Along similar lines, four trials of experience in a larger enclosure lead to young children’s use of features in the small enclosure (Learmonth et al., 2008).

Newcombe and Ratliff (2008b) demonstrated a similar pattern for adults. Participants were asked to perform a reorientation task in either a small or a large room and to switch room sizes halfway through the experiment. People who had started in the large room (where features are salient) relied more heavily on the feature cue than people who had spent all trials in a small room. In contrast, individuals who had started in the small room (where geometry is salient) began to use feature information when moved to the larger room; in fact, they performed no differently from individuals who had remained in the large room for all trials. Therefore, it seems likely that the successful search using the feature in the large space increased the relative dependence on the feature cue, and this change in relative cue weights was reflected when participants were asked to perform the same task in the smaller space.

Short-term experience also matters for pigeons. Kelly and Spetch (2004) trained pigeons on the reorientation task. Some of the pigeons were initially trained with geometry and others were trained with features. Then the pigeons experienced training with both cues and were tested for their relative use. The pigeons with the geometry pre-training relied both on geometric and feature cues, while the pigeons with the feature pre-training relied mainly on just the feature cues.

These experiments with children, adults, and pigeons indicate a common theme: reorientation is a flexible system that is updated, based on prior experiences. Next we turn to experiences over a longer period of time and earlier in development.

Rearing Effects

The previous sections demonstrated that changes in the salience of the cues or in the participants’ short-term experiences influence reorientation behavior. A series of rearing experiments have demonstrated that there are differences that emerge over a longer period, at least for some species. Initially, the reorientation ability of wild-caught mountain chickadees (Poecile gambeli) was examined (Gray, Bloomfield, Ferrey, Spetch, & Sturdy, 2005). This group of researchers used wild-caught birds as they were likely to have experienced rich feature information in their natural habitat. This species typically lives in forested areas near streams and mountains. The environment just described contrasts greatly with the standard housing conditions in labs, which consist largely of uniform rectangular enclosures. The wild-caught chickadees relied more heavily on feature cues than did other standard-reared species. However, when the reorientation abilities of wild-caught and lab-reared black-capped chickadees (Poecile atricapillus) were examined, their behavior was much closer to the standard-reared subjects (Batty, Bloomfield, Spetch, & Sturdy, 2009). Therefore, it is unclear if there is something different about the experiences of black-capped and mountain chickadees that causes these differences, or if there is a difference across species.

An alternative approach is to tightly control the rearing environment. This approach has been used with chicks, fish, and mice. For chicks, there does not seem to be any difference between chicks reared in a circular (lacking relevant geometry) and rectangular (containing relative wall lengths) environment in their relative use of feature or geometric cues (Chiandetti & Vallortigara, 2008, 2010). However, the chicks were only housed for two days before starting training, and they are a precocial species that may not have as much of a sensitive period for rearing effects. In experiments with longer rearing periods, a different pattern has emerged. Convict fish were reared in either circular or rectangular environments. Subsequent tests showed that the fish in the circular environments relied more heavily on feature cues than did the rectangular reared fish (Brown, Spetch, & Hurd, 2007).

Similar to fish, there are differences between mice that have been raised in a feature-rich environment (a circle with one half white, one half blue) and a geometrically rich environment (a rectangular enclosure with a triangular nest box; see Figure 7). Although there were no differences in the acquisition of geometric information alone, the circular-reared mice were faster to learn a feature panel task. Additionally, and crucially, on a test of incidental geometry encoding (a rectangle with a feature panel marking the correct location), the rectangular-reared mice had encoded the geometry while the circular-reared mice had not (Twyman, Newcombe, & Gould, 2012).

FIGURE 7 HERE

In summary, for chicks and black-capped chickadees, early environment does not have a large impact on reorientation behavior. However, for mountain chickadees, mice, and fish, the rearing environment alters the relative use of geometric and feature cues.

Facilitation and Interference Effects

One reason given initially to favor a modularity hypothesis was the claim that geometric and featural information are both learned in situations where one might expect overshadowing or blocking effects (Cheng & Newcombe, 2005). This pattern of independence suggested separable systems. However, subsequent research has shown a far more complex pattern of results, with the two kinds of information sometimes learned independently, sometimes showing overshadowing or blocking of one by the other, and sometimes showing facilitation of one by the other (Cheng, 2008; Miller & Shettleworth, 2008). Furthermore, it has been shown that rats can integrate these kinds of information across successive phases of an experiment to make correct spatial choices (Rhodes, Creighton, Killcross, Good & Honey, 2009).

Just considering facilitation effects, there are two recent examples, one from research with birds and the other from research with humans. Kelly (2010) trained two groups of Clark’s nutcrackers (Nucifraga columbiana) with an array of objects at the four corners of a rectangle. When the objects were identical, the birds did not learn the task after an extensive training program. When the objects were unique, the birds learned the task and, perhaps surprisingly, had also encoded the rectangular shape of the array. In another example of a facilitation effect, human individuals with Williams Syndrome, a genetic defect that has important effects on spatial functioning, failed to encode the geometry of an all-white rectangular enclosure, but showed geometric encoding when a colored feature wall was added (Lakusta, Dessalegn & Landau, 2010).

This literature is now much too large to review thoroughly here, but it clearly challenges modularity theory (Twyman & Newcombe, 2010). More important, it represents a challenge to any viable comprehensive theory, which must be able to account in precise quantitative terms for the pattern of effects, and make novel predictions. An interesting direction for future research has been indicated by recent studies on rodents which suggest that cue interaction (blocking, overshadowing, and facilitation) between geometry and features might be modulated by sex because male and female rats tend to assign different weights to these cues (Rodriguez, Chamizo, & Mackintosh, 2011; Rodriguez, Torres, Mackintosh, & Chamizo, 2010); this should be explored in additional species.

Section Summary

The available evidence indicates that geometric and featural information can both be used for reorientation by a wide variety of species and (within the human species) across a broad range of ages. However, the relative use of these cues depends on their salience, the reliability of their encoding, and their familiarity across both recent and longer-term experience. Human language is one of several factors that can facilitate the use of features in situations in which it might otherwise be weak, but it is not the only way this end can be accomplished. From the general point of view of a field of comparative cognition, a striking fact is how vigorous the dialogue between the developmental and comparative communities has been, and how many species have been investigated using how many techniques. Wider development of this dialogue is likely to be very fruitful.

Slope as a Reorientation Cue

Most spatial experiments, including reorientation studies, have been conducted on flat surfaces. But, as we all know after climbing a hill or admiring an amazing view from a mountain top, the world is not flat. The slope of the terrain might clearly be an important cue for polarizing space, and hence for reorienting. One could imagine using “uphill” in a similar manner to “north” to anchor a direction in the environment. But is it in fact used this way?

Nardi and Bingman (2009a) compared the reorientation performance of pigeons that were trained to a correct corner of a trapezoid on a flat surface (geometry-only) with pigeons that were trained in the same trapezoid enclosure, but now with the floor sloped at a 20 degree angle (geometry + slope, see Figure 8). Both groups of pigeons learned the task, but the geometry + slope group learned about three times faster than the geometry-only group. The follow-up tests for the geometry + slope group revealed that the pigeons had readily encoded slope (92% correct), had encoded geometry at above chance levels although accuracy was not very high (63%), and that the pigeons overwhelmingly preferred the slope-correct (75%) over the geometry-correct corner (0%) on conflict trials. Overall, these data suggest that slope is a powerful cue for reorientation compared to the geometry of the sides of the enclosure.

FIGURE 8 HERE

As acquisition was so much faster in the combined group, Nardi and Bingman wondered if slope might facilitate geometry acquisition. In a second experiment, they trained groups of pigeons with only geometry or with combined geometry and slope cues. Over the course of training, no differences were found in geometry acquisition between groups. Thus, it appears that geometry and slope cues neither facilitate nor inhibit learning of each other, a pattern traditionally interpreted as supporting the idea that they are fundamentally different classes of cues.

Thus far, geometry has been considered a single cue. As Sutton (2009) points out, there are several possible cue types of a geometric nature. These levels of geometric cues may be nested within each other, where local cues are located near the correct location and the global cues encompass relations in the larger space. For example, the trapezoid enclosures that have been reviewed thus far include two types of geometric cues: local corner angles (acute or obtuse) and global relations between relative wall lengths (for example, a long wall to the right and a shorter wall to the left). Nardi, Nitsch and Bingman (2010) conducted a series of geometry and slope learning experiments with pigeons that examined the contributions of local and global geometry as well as slope to reorientation performance. Over the course of training, pigeons first learned to go to the two acute corners within the first three days. It took about nine days for pigeons to learn the global geometry of the space. Therefore, local geometry learning is much faster than global geometry learning. As one of the follow-up tests, Nardi et al. rotated the training apparatus so that pigeons could not match all of the local geometry, global geometry, and the slope. In this manipulation, pigeons matched the correct slope and local geometric cue, at the expense of the global geometric cue. In training conditions where the global geometry is made two or three times as predictive as slope as an indicator of the correct target location, pigeons still rely more heavily on the slope than on the global geometric cue. Therefore, for pigeons, the multimodal slope cue, which includes visual, kinesthetic, and vestibular information, appears to be particularly salient, and more important than geometry for a reorientation task.

Humans

Pigeons encode slope, but what about other species? The fact that pigeons can fly might be taken to argue that they are less likely to encode slope than species that cannot transcend the terrestrial environment, but is that in fact true? Nardi, Shipley and Newcombe (2011) put adult humans in a uniform white square enclosure with no useful geometric or feature cues for orientation. The 5° sloped floor of the enclosure provided visual, kinesthetic, and vestibular cues that could guide search (see Figure 9). A bowl was located in each corner of the room and participants saw a $1 bill hidden under one of the bowls. The correct hiding bowl remained the same for each of the four training trials for each participant, but was counterbalanced across subjects. After seeing the correct location, participants were disoriented and then asked to find the $1 bill. Once training was complete, two post-training tests compared search with the 5° sloped floor to the same space with a flat floor. People performed at chance (25%) when the floor was flat, showing that they had been thoroughly disoriented and that there were no stray cues that could be used to reorient. When the floor was sloped at a 5° angle, people were able to retrieve the hidden object on the majority of the trials, although there was a significant difference between men (79%) and women (43%) during the training trials. (This sex difference will be discussed further later in the paper.) This study showed that people can use slope as a reorientation cue, although less clearly than the pigeons had; however, the fact that the slope was at a much reduced angle for humans may have contributed to this apparent species difference. The Institutional Review Board did not permit Nardi et al. (2011) to tilt the floor of the room at a steeper angle. Therefore, studying pigeons (or other animals) at gentler angles would allow for a better comparison across species.
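
The 25% chance baseline follows from having four possible corners. As a minimal sketch of how performance above chance could be checked, the following computes an exact one-sided binomial tail probability. The function name and the counts in the example are hypothetical illustrations, not data or analyses from Nardi et al. (2011), and the sketch assumes that trials are independent.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p_chance: float = 0.25) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p_chance).

    With four corners and no usable cue, p_chance is 1/4.
    """
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical example: one participant, 4 trials, 3 correct searches.
p_value = binomial_tail(3, 4)
print(f"P(at least 3 of 4 correct by chance) = {p_value:.4f}")
```

With so few trials per participant, even three correct searches out of four is only weak evidence against chance, which is one reason such studies typically evaluate accuracy across a group rather than for single individuals.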

FIGURE 9 HERE

What Kind of Cue is Slope?

Thus far, we have seen that both an aerial species (pigeons) and a terrestrial species (people) use slope for reorientation; additionally, for pigeons, slope is a very powerful cue, which does not appear to interact with geometric cues in spatial learning. Now we turn to the question, important in the context of the reorientation literature, of whether to categorize slope as geometric information, feature information, or something else. There are arguments for slope being considered a geometric cue. The slope of the floor, say a 10 degree incline, is measured as the difference between a perfectly horizontal surface, perpendicular to gravitational force, and the angle of the floor. Therefore, the slope could be defined by comparing a surface to a surface in terms of angle, which would fall under Gallistel’s (1990) definition of a geometric cue. Additionally, determining that the floor is sloped could be accomplished by comparing relative lengths of walls (assuming a horizontal ceiling, the participant could judge the distance between the floor and the ceiling and note that the “up” end of the slope has a shorter wall height than the “down” end of the slope) or by noting the angle at which the floor meets the walls (acute at the uphill end and obtuse at the downhill end).

However, slope could be considered a type of feature information if viewed as a property of non-horizontal surfaces. One could use the slope direction to determine the facing orientation and to encode a location. For example, a navigator moving on a slope might know that the top of the hill should be on the left in order to get to a desired destination. This is analogous to the role that distant landmarks, another type of feature cue, play in horizontal environments; if there were a conspicuous landmark on the horizon (e.g., a mountain), then one could use it to determine heading. Therefore, slope polarizes the environment and provides a directional frame of reference that can be used for (re)orientation, in the same way as a distant landmark. In this sense, varying the inclination of the tilt affects the salience of slope information (steeper slopes are obviously more salient than gentle ones), just as varying the size of a landmark makes it more or less salient.

Research from a neuroscience perspective with pigeons is relevant to this issue. Previously, it had been shown that bilateral lesions to the pigeon hippocampal formation, an analogous structure to the human hippocampus, disrupt the processing of geometric cues, but not feature cues (Vargas, Petruso, & Bingman, 2004). Similarly, Nardi and Bingman (2007) found that lesions to the left hippocampal formation of pigeons decreased reliance on geometry for reorientation. Pigeons that had undergone a control surgery performed identically to pigeons with a lesioned right hippocampal formation. Since the hippocampus appears to be more heavily involved in the use of geometric cues than feature cues in pigeons, Nardi and Bingman reasoned that lesions to the hippocampal formation should disrupt slope-based reorientation if slope is a type of geometric cue.

Nardi and Bingman (2009b) examined the reorientation of control and bilaterally lesioned pigeons when geometric and slope cues were available for reorientation. The training apparatus was a trapezoid-shaped room with the correct corner in one of the acute corners. Additionally, the floor was sloped at a 20 degree angle. Both groups of pigeons learned the task. Supporting previous research, the pigeons with the bilaterally lesioned hippocampal formation had more difficulty using the geometric cue than the control pigeons. Interestingly, there were no differences between groups in the use of slope. All pigeons rapidly learned the task, had encoded the slope cue when it was tested in isolation, and selected the slope-correct corner on conflict trials. Therefore, it not only appears (again) that slope is a powerful reorientation cue for pigeons, since all pigeons preferred to reorient with slope rather than geometry, but also that slope does not seem to recruit the same neural circuits used by geometric cues. The identical performance of control and hippocampal-lesioned pigeons with a slope reorientation cue implies that slope is hippocampal-independent, and therefore is more like a feature cue than a geometric cue. The authors characterize slope as a gravity-dependent feature cue. However, given the distinctive characteristics of this cue (it provides multimodal sensory stimuli, it is associated with effortful movement, and it involves the vertical dimension), it may be that slope is a unique type of information.

Slope Cues Versus Feature Cues In Pigeons and People

If slope cues are similar in some ways to feature cues, how do they interact and which kind of cue is more powerful? Nardi and colleagues have asked these questions in behavioral studies with both pigeons and people. In both experiments, the experimental space was a square so that the geometric information was identical throughout the space. (Recall, however, that the floor was sloped at a 20 degree angle for pigeons and at a 5 degree angle for people.) Unique feature cards were placed in each corner of the room; therefore the correct target location could be identified based on the beacon alone.

Pigeons readily learned the reorientation task (Nardi, Mauch, Klimas, & Bingman, 2012). Post-training tests indicated that the pigeons had encoded both cues. When slope (all feature cards identical) or beacon (flat floor) cues were presented in isolation, pigeons were highly accurate (96%). On the conflict test, where the trained beacon was moved to an incorrect slope location, pigeons divided their search evenly between the beacon-correct and slope-correct corners. Interestingly, choices on the conflict tests depended on the location of the correct corner during training. When pigeons were required to go uphill during training, they selected the slope-correct corner 76% of the time. In contrast, when pigeons went downhill to the correct training location, they selected the beacon-correct corner 75% of the time. When pigeons go uphill, they exert more effort than when they follow the slope downhill. Nardi et al. propose that effort might modulate the weighting of the slope and beacon cues for reorientation.

Using a similar paradigm, Nardi, Newcombe, and Shipley (2012) examined the interaction between slope and feature cues with people. Like pigeons, people readily learned to reorient. Unlike pigeons, who encoded both the feature and the slope, about two-thirds of the participants encoded only one or the other cue. Individuals performed similarly during the training trials with either a slope strategy (78% accurate) or a feature strategy (90% accurate). When people did not clearly follow a single strategy, they were not nearly as accurate on the training trials (50% accurate), although still above the 25% chance level.

In sum, pigeons encode both slope and beacon cues during a reorientation task, with both information sources being given equal importance. Interestingly, this balance seems to shift based on the amount of effort required during training. When pigeons require extra effort to go to an uphill location, then slope is given more importance than a beacon cue and vice versa. People are also able to encode and use slope and feature cues for reorientation. In contrast to pigeons, people tend to use a single strategy for reorientation, either a slope-based or a feature-based approach. They show consistent individual differences in which class of cue they prefer.

Section Summary

Overall, both pigeons and people are able to use slope as a reorientation cue. It appears that slope should be considered a different cue class from geometry. When the hippocampus of pigeons is lesioned, geometry performance is impaired, particularly when the left hippocampal formation is lesioned. Slope behavior is unaffected by bilateral hippocampal formation lesions. Thus, slope and geometry appear to be processed by different areas of the pigeon brain. When pigeons are required to choose between feature, slope and geometry cue types, subjects weigh slope and feature cues about equally, and prefer to use slope over geometry. The over-reliance on slope when geometric information is also available is compelling, as it occurs even if geometry is a better predictor of the goal. The balance between slope and feature cue use depends in part on the amount of effort during training. When the trained corner is located uphill, then pigeons rely more heavily on the slope cue. When the trained corner is located downhill, then pigeons rely more heavily on the feature cue. Thus, effort modulates the relative weighting of feature and slope cues in spatial memory for pigeons. When both slope and features are present during training, pigeons encode both cue types. In contrast, the majority of people tend to use one or the other cue type, in about equal proportions, to solve the reorientation task.

Sex Differences in Reorientation?

There are striking sex-related differences in some (but not all) kinds of human spatial functioning, particularly in mental rotation and in orientation to gravitationally-defined horizontal and vertical (Voyer, Voyer, & Bryden, 1995). There are also probably sex differences in navigation tasks. For example, men perform better than women in constructing a survey representation (Ishikawa & Montello, 2006), in using the geometry of a surrounding trapezoid to locate a hidden platform (Sandstrom, Kaufman, & Huettel, 1998), and in selecting the initial heading in a virtual Morris Water Maze task (Woolley, Vermaercke, Op de Beeck, Wagemans, Gantois, D’Hooge, Swinnen, & Wenderoth, 2010). There are also probably sex differences in non-human species, although the differences vary across species; for example, mice show fewer such differences than rats (Jonasson, 2005).

Until recently, however, possible sex differences in reorientation have received little attention. In the animal literature, subjects are often all male, of unspecified sex, or comprise too small a sample to look for sex differences (Cheng & Newcombe, 2005). Of course, human studies of reorientation are more often able to look for sex differences, but they have mostly not found them. And when sex has been examined in studies of non-human animals, it seems to have weak and inconsistent effects (Sovrano, Bisazza, & Vallortigara, 2003 for fish; Twyman, Newcombe, & Gould, 2009, 2012 for mice). In sum, because of all-male samples, unknown sex, or too small sample sizes, it is unclear if there are differences between the sexes in reorientation, but they have not seemed impressive. However, more recently, some sex differences have emerged, concerning three areas. Arranged in ascending order by the power of the findings, they are: the use of local geometric cues, geometry in the presence of a beacon feature cue, and the use of slope for reorientation.

Local versus Global Geometry

It has been proposed that men rely more on directional cues such as cardinal position, gradients or distal landmarks, while women seem to depend on positional cues like local landmarks (Jacobs & Schenk, 2003). In a reorientation study linked to this issue, adults were asked to reorient in a space with both local and global reorientation cues (Reichert & Kelly, 2011). An array of four posts formed a mental rectangular search space that could be used as a global cue (see Figure 10). The diagonal pairs of corner posts were set at angles of either 50 or 75 degrees and served as local geometric cues.

FIGURE 10 HERE

Neither sex encoded the global geometric shape of the array; men, but not women, encoded the local geometric cues (i.e., angle size). Therefore, men appeared to be better able to use local geometric cues for reorientation than were women, in contradiction of the Jacobs and Schenk hypothesis. These findings are puzzling, however, not only because of this contradiction, but also because Sutton, Twyman, Joanisse and Newcombe (2012) found that, at least in virtual reality, adults could infer the global geometric shape from an array of four columns. Additionally, Lubyk, Dupuis, Gutiérrez, and Spetch (2012) found that adults were able to reorient with local acute and obtuse angles in a virtual reality search task, and importantly, no sex differences were found.

Beacon Cues and Geometric Cues

The bulk of the previous research with humans has used a rectangular enclosed space as the geometric cue, with one of the walls of the rectangle in a unique color to provide the feature cue. In this type of task, gender differences have not been found with adults or with children (Hermer & Spelke, 1994, 1996; Learmonth, Nadel, & Newcombe, 2002; Twyman, Friedman, & Spetch, 2007). However, two studies have used a distinctive object directly at or near the correct hiding location within a rectangular search space, i.e., a beacon. Kelly and Bischof (2005) created a 3D virtual environment of a rectangular search space. In each corner of the room was a distinctive object. Both men and women readily learned the task, which could have been accomplished by either encoding both the geometric cue and the beacon, or just paying attention to the beacon. When the beacons were removed, it was found that the men, but not the women, had encoded the geometry of the space. Importantly, in a similar experiment, when a feature wall was used, the sex difference went away and both men and women encoded the geometry of the space (Kelly & Bischof, 2008). Lourenco, Addy, Huttenlocher and Fabian (2011) found similar results with toddlers. In a real-world version of the task with an enclosed rectangular search space and either a unique hiding container or a distinctive flag placed on top of the target container, toddlers learned to reorient. On the geometry-only test in which all of the containers were identical, only the boys turned out to have encoded the geometry of the enclosure.

On the basis of these two studies, it is possible that gender differences in reorientation are specific to the case in which there are salient beacons, which somehow have an especially strong pull on females. It would be nice to know the pattern with non-human animals, but researchers will need to use female as well as male animals to answer this question. However, some geometric information may exist even when geometry-only tests are failed. Lourenco et al. (2011) included conflict trials designed to assess the relative use of geometric and feature cues. All toddlers preferred the beacon cue to geometry, and all toddlers, both boys and girls, were slower to respond on the conflict trials than they had been during training. If the girls truly had not encoded the geometry during training, then their search times should have remained fast. Thus, the girls probably had noticed something about the shape of the environment even though not at a level sufficient to support active search with the geometric cue.

Sex Differences in Slope Cues

As we reviewed earlier, people are able to reorient with slope as the sole orientation cue (Nardi et al., 2011). Participants were disoriented in a uniform square room and then were asked to find a target location using the floor that was slanted at a 5° angle. Overall, people were able to use the sloped floor to guide search. However, men and women performed quite differently on this task. When participants were not given any extra instructions, men were about 35% more accurate (1.4 standard deviation difference). Additionally, the sexes adopted different strategies. The vast majority of the men reported using slope, while only about half of the women attempted to use the slope. The other half attempted to use ineffective strategies: about a third of the women attempted to use a path integration strategy (trying to keep track of the number of rotations), and the remainder tried to use small features in the environment like a wrinkle in the fabric or a filament thread in the light bulb. Therefore, it is possible that the lower accuracy of women on this task reflects strategy choice rather than a difference in ability.

In an effort to make the sloped floor more salient, Nardi et al. showed a ball rolling down the floor and told participants that the slanted floor could help them succeed at the task. All participants reported using a slope-based strategy. And people did improve, but men were still more accurate than women. To further investigate this sex difference, the authors wondered if women might have a difficult time perceiving the slanted floor. To test this hypothesis, participants were required to stand in the middle of the room and they were asked to point in the up direction of the slope as quickly and accurately as possible. Both sexes were able to correctly identify the direction of the slope in just over 3 seconds, but men were over 1 second faster than women.

Might women have more difficulty using slope since they are often wearing heeled footwear that might make slope difficult to perceive and use? Probably not. In Nardi et al. (2011), when the footwear was uncontrolled (i.e., women performed the task in the shoes they showed up with on the given day) and when women were required to wear flat slippers provided by the experimenters, they performed identically in the slope task. Further, when Nardi, Newcombe, and Shipley (2012) asked women to complete a survey about the height of footwear they wear for everyday use, there was no correlation between slope use and typical heel height. On a different note, an interesting aspect of this study was the finding that men were generally more confident in solving the reorientation task on a slope, suggesting that sex differences in spatial confidence might play a role in the performance advantage with slope.

In summary, there appear to be large sex differences in the use of slope-based strategies for reorientation. Men are more accurate than women by about 1.4 standard deviations, a difference that is larger than the sex difference for the mental rotation test. Men are twice as likely to adopt a slope-based strategy when this is the only effective cue available. If the slope is made more salient, then almost everyone attempts to use slope, but men are still more accurate. Women are also slower to correctly identify the direction of a slope.
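
To give a rough sense of what a difference of 1.4 standard deviations means, a standardized difference d can be converted into the probability that a randomly chosen member of the higher-scoring group outscores a randomly chosen member of the other group, which is Φ(d/√2). The sketch below assumes normally distributed scores with equal variance in both groups, an assumption made here for illustration and not stated in the studies reviewed; the function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def probability_of_superiority(d: float) -> float:
    """P(score from group A > score from group B) for two equal-variance
    normal distributions whose means differ by d standard deviations."""
    return normal_cdf(d / sqrt(2.0))


# Standardized difference reported for slope-based reorientation accuracy.
d_slope = 1.4
print(f"d = {d_slope}: P(superiority) = {probability_of_superiority(d_slope):.2f}")
```

Under these assumptions, d = 1.4 corresponds to a probability of roughly 0.84, so the two distributions still overlap considerably even though the mean difference is large.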

Section Summary

In the vast majority of studies with animals and humans, sex differences have not been found for reorientation behavior, particularly when experiments are conducted in enclosed rectangles with a feature wall. More recently, there have been a few findings that suggest sex differences when reorientation is tested with embedded local and global geometric cues or with a beacon as a feature cue. From this small set of findings, it appears that men might be better at using geometric cues compared to women. This would parallel the sex differences that have been found for rats in water-maze search tasks, where both sexes can use geometric or proximal feature cues to locate a hidden platform, but male rats rely more heavily on geometric cues and female rats prefer to use a proximal feature cue (Rodriguez, Chamizo, & Mackintosh, 2011; Rodriguez, Torres, Mackintosh & Chamizo, 2010). However, these studies were not about orientation, and therefore any claims about sex differences in reorientation ability are currently far from definitive. Further experiments with nonhuman animals would be more likely to shed light on sex differences than work with humans because various social and cultural differences could be excluded. Nevertheless, the most striking sex difference we have reviewed concerns the use of slope cues. Comparative work and work investigating the neural bases of these effects might shed more light on these differences.

Conclusion

Research on spatial cognition has been generally more open to a comparative approach than research in many other domains, and research on the geometric module theory has been an especially vigorous example of the kind of interchange that would be desirable for a comprehensive account of cognitive biology. In this article, we have seen that each field and sub-field often contributes distinctive methods and concepts to the collective enterprise. As a result, we know a great deal more about reorientation than we did in 1986. It has become clear that the twin hypotheses of a geometric module and a unique and necessary role for human language in reorientation cannot stand. It has also become clear that there is a need for expansion of the taxonomy of cues that can be used for reorientation, with slope a good example. There is also a need for definitional clarification and possibly for a change in nomenclature, because it is difficult to postulate a geometry that includes distance and direction, but not angle and length as suggested by Lee et al. (2012). It may be that a renewed focus on contact with the overall literature on spatial navigation will lead to a more comprehensive view (Lew, 2011). The challenge for the future will be in formulating a precise, quantitatively-specified model that can account for the hundreds of effects found to date, with more data being reported each month.


References

Batty, E. R., Bloomfield, L. L., Spetch, M. L. & Sturdy, C. B. (2009). Comparing black-capped (Poecile atricapillus) and mountain chickadees (Poecile gambeli): use of geometric and featural information in a spatial orientation task. Animal Cognition, 12, 633-641. doi.org/10.1007/s10071-009-0222-3 PMid:19381699

Bek, J., Blades, M., Siegal, M., & Varley, R. (2010). Language and spatial reorientation: Evidence from severe aphasia. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 646-658. doi.org/10.1037/a0018281 PMid:20438263

Brown, A. A., Spetch, M. L., & Hurd, P. L. (2007). Growing in circles: Rearing environment alters spatial navigation in fish. Psychological Science, 18, 569-573. doi.org/10.1111/j.1467-9280.2007.01941.x PMid:17614863

Cheng, K. (1986). A purely geometric module in the rat’s spatial representation. Cognition, 23, 149-178. doi.org/10.1016/0010-0277(86)90041-7

Cheng, K. (2008). Whither geometry? Troubles of the geometric module. Trends in Cognitive Sciences, 12, 355-361. doi.org/10.1016/j.tics.2008.06.004 PMid:18684662

Cheng, K. & Newcombe, N.S. (2005). Is there a geometric module for spatial orientation? Squaring theory and evidence. Psychonomic Bulletin and Review, 12, 1-23. doi.org/10.3758/BF03196346 PMid:15945200

Cheng, K., Shettleworth, S. J., Huttenlocher, J., & Rieser, J. J. (2007). Bayesian integration of spatial information. Psychological Bulletin, 133, 625-637. doi.org/10.1037/0033-2909.133.4.625 PMid:17592958

Chiandetti, C., & Vallortigara, G. (2008). Is there an innate geometric module? Effects of experience with angular geometric cues on spatial reorientation based on the shape of the environment. Animal Cognition, 11, 139-146. doi.org/10.1007/s10071-007-0099-y PMid:17629754

Chiandetti, C., & Vallortigara, G. (2010). Experience and geometry: Controlled-rearing studies with chicks. Animal Cognition, 13, 463-470. doi.org/10.1007/s10071-009-0297-x PMid:19960217

Chiandetti, C., Regolin, L., Sovrano, V. A., & Vallortigara, G. (2007). Spatial reorientation: The effects of space size on the encoding of landmark and geometry information. Animal Cognition, 10, 159-168. doi.org/10.1007/s10071-006-0054-3 PMid:17136416

Cressant, A., Muller, R. U., & Poucet, B. (1997). Failure of centrally placed objects to control the firing fields of hippocampal place cells. Journal of Neuroscience, 17, 2531-2542. PMid:9065513

Cressant, A., Muller, R. U., & Poucet, B. (1999). Further study of the control of place cell firing by intra-apparatus objects. Hippocampus, 9, 423–431. doi.org/10.1002/(SICI)10981063(1999)9:4 3.0.CO;2-U

Gallistel, C. R. (1990). The organization of learning. Cambridge: MIT Press.

Gouteux, S., Thinus-Blanc, C., & Vauclair, J. (2001). Rhesus monkeys use geometric and nongeometric information during a reorientation task. Journal of Experimental Psychology: General, 130, 505-519. doi.org/10.1037/0096-3445.130.3.505 PMid:11561924

Gray, E. R., Bloomfield, L. L., Ferrey A., Spetch, M. L., & Sturdy, C. B. (2005). Spatial encoding in mountain chickadees: Features overshadow geometry. Biology Letters, 1, 314-317. doi.org/10.1098/rsbl.2005.0347 PMid:17148196 PMCid:1617142

Hermer, L., & Spelke, E. (1994). A geometric process for spatial representation in young children. Nature, 370, 57-59. doi.org/10.1038/370057a0 PMid:8015605

Hermer, L., & Spelke, E. (1996). Modularity and development: The case of spatial reorientation. Cognition, 61, 195-232. doi.org/10.1016/S0010-0277(96)00714-7

Hermer-Vazquez, L., Moffet, A., & Munkholm, P. (2001). Language, space, and the development of cognitive flexibility in humans: The case of two spatial memory tasks. Cognition, 79, 263-299. doi.org/10.1016/S0010-0277(00)00120-7

Hermer-Vazquez, L., Spelke, E., & Katsnelson, A. (1999). Sources of flexibility in human cognition: Dual task studies of space and language. Cognitive Psychology, 39, 3-36. doi.org/10.1006/cogp.1998.0713 PMid:10433786

Hupbach, A., Hardt, O., Nadel, L., & Bohbot, V. D. (2007). Spatial reorientation: Effects of verbal and spatial shadowing. Spatial Cognition and Computation, 7, 213-226. doi.org/10.1080/13875860701418206

Hupbach, A., & Nadel, L. (2005). Reorientation in a rhombic environment: No evidence for an encapsulated geometric module. Cognitive Development, 20, 279-302. doi.org/10.1016/j.cogdev.2005.04.003

Huttenlocher, J., & Lourenco, S. F. (2007). Coding location in enclosed spaces: Is geometry the principle? Developmental Science, 10, 741-746. doi.org/10.1111/j.1467-7687.2007.00609.x PMid:17973790

Huttenlocher, J., Hedges, L. V., & Duncan, S. (1991). Categories and particulars: Prototype effects in estimating spatial location. Psychological Review, 98, 352-376. doi.org/10.1037/0033-295X.98.3.352 PMid:1891523

Ishikawa, T., & Montello, D. R. (2006). Spatial knowledge acquisition from direct experience in the environment: Individual differences in the development of metric knowledge and the integration of separately learned places. Cognitive Psychology, 52, 93-129. doi.org/10.1016/j.cogpsych.2005.08.003 PMid:16375882

Jacobs, L. F., & Schenk, F. (2003). Unpacking the cognitive map: The parallel map theory of hippocampal function. Psychological Review, 110, 285-315. doi.org/10.1037/0033-295X.110.2.285 PMid:12747525

Jonasson, Z. (2005). Meta-analysis of sex differences in rodent models of learning and memory: A review of behavioral and biological data. Neuroscience and Biobehavioral Reviews, 28, 811-825. doi.org/10.1016/j.neubiorev.2004.10.006 PMid:15642623

Kelly, D. M. (2010). Features enhance the encoding of geometry. Animal Cognition, 13, 453-462. doi.org/10.1007/s10071-009-0296-y PMid:20012120

Kelly, D. M., & Bischof, W. F. (2005). Reorienting in images of a three-dimensional environment. Journal of Experimental Psychology: Human Perception and Performance, 31, 1391-1403. doi.org/10.1037/0096-1523.31.6.1391 PMid:16366797

Kelly, D. M., & Bischof, W. F. (2008). Orienting in virtual environments: How are surface features and environmental geometry weighted in an orientation task? Cognition, 109, 89-104. doi.org/10.1016/j.cognition.2008.07.012 PMid:18834974

Kelly, D. M., & Spetch, M. L. (2004). Reorientation in a two-dimensional environment: II. Do pigeons (Columba livia) encode the featural and geometric properties of a two-dimensional schematic of a room? Journal of Comparative Psychology, 118, 384-395. doi.org/10.1037/0735-7036.118.4.384 PMid:15584775

Kelly, D. M., Spetch, M.L., & Heth, C. D. (1998). Pigeons’ (Columba livia) encoding of geometric and featural properties of a spatial environment. Journal of Comparative Psychology, 112, 259-269. doi.org/10.1037/0735-7036.112.3.259

Lakusta, L., Dessalegn, B., & Landau, B. (2010). Impaired geometric reorientation caused by genetic defect. PNAS Proceedings of the National Academy of Sciences of the United States of America, 107, 2813-2817. doi.org/10.1073/pnas.0909155107 PMid:20133673 PMCid:2840366

Learmonth, A. E., Newcombe, N. S., & Huttenlocher, J. (2001). Toddler’s use of metric information and landmarks to reorient. Journal of Experimental Child Psychology, 80, 225-244. doi.org/10.1006/jecp.2001.2635 PMid:11583524

Learmonth, A.E., Nadel, L. & Newcombe, N.S. (2002). Children’s use of landmarks: Implications for modularity theory. Psychological Science, 13, 337-341. doi.org/10.1111/j.0956-7976.2002.00461.x PMid:12137136

Learmonth, A.E., Newcombe, N.S., Sheridan, N. & Jones, M. (2008). Why size counts: Children’s spatial reorientation in large and small enclosures. Developmental Science, 11, 414-426. doi.org/10.1111/j.1467-7687.2008.00686.x PMid:18466375

Lee, S. A., Shusterman, A., & Spelke, E. S. (2006). Reorientation and landmark-guided search by young children: Evidence for two systems. Psychological Science, 17, 577-582. doi.org/10.1111/j.1467-9280.2006.01747.x PMid:16866742

Lee, S. A., Sovrano, V. A., & Spelke, E. S. (2012). Navigation as a source of geometric knowledge: Young children’s use of length, angle, distance, and direction in a reorientation task. Cognition, 123, 144-161. doi.org/10.1016/j.cognition.2011.12.015 PMid:22257573

Lee, S. A., & Spelke, E. S. (2010). A modular geometric mechanism for reorientation in children. Cognitive Psychology, 61, 152-176. doi.org/10.1016/j.cogpsych.2010.04.002 PMid:20570252 PMCid:2930047

Lew, A. R. (2011). Looking beyond the boundaries: Time to put landmarks back on the cognitive map? Psychological Bulletin, 137, 484-507. doi.org/10.1037/a0022315 PMid:21299273

Lourenco, S. F., Addy, D., Huttenlocher, J., & Fabian, L. (2011). Early sex differences in weighting geometric cues. Developmental Science, 14, 1365-1378. doi.org/10.1111/j.1467-7687.2011.01086.x PMid:22010896

Lubyk, D. M., Dupuis, B., Gutiérrez, L., & Spetch, M. L. (2012). Geometric orientation by humans: angles weigh in. Psychonomic Bulletin & Review, 19, 436-442. doi.org/10.3758/s13423-012-0232-z PMid:22382695

Miller, N. Y., & Shettleworth, S. J. (2008). An associative model of geometry learning: A modified choice rule. Journal of Experimental Psychology: Animal Behavior Processes, 34, 419-422. doi.org/10.1037/0097-7403.34.3.419 PMid:18665724

Nadel, L., & Hupbach, A. (2006). Species comparisons in development: the case of the spatial “module”. In M. Johnson and Y. Munakata (Eds.), Processes of change in brain and cognitive development. Attention and Performance, vol. XXI. Oxford, UK: Oxford University Press.

Nardi, D., & Bingman, V. P. (2007). Asymmetrical participation of the left and right hippocampus for representing environmental geometry in homing pigeons. Behavioural Brain Research, 178, 160-171. doi.org/10.1016/j.bbr.2006.12.010 PMid:17215051

Nardi, D., & Bingman, V. P. (2009b). Slope-based encoding of goal location is unaffected by hippocampal lesions in homing pigeons (Columba livia). Behavioural Brain Research, 205, 322-326. doi.org/10.1016/j.bbr.2009.08.018 PMid:19703498

Nardi, D., Mauch, R. J., Klimas, D. B., & Bingman, V. P. (2012). Use of slope and feature cues in pigeon (Columba livia) goal-searching behavior. Journal of Comparative Psychology. Advance online publication. doi:10.1037/a0026900

Nardi, D., Newcombe, N.S. & Shipley, T.F. (2011). The world is not flat: Can people reorient using slope? Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 354-367. doi.org/10.1037/a0021614 PMid:21171808

Nardi, D., Newcombe, N.S. & Shipley, T. F. (2012). Reorienting with terrain slope and landmarks. Memory & Cognition. Advance online publication. doi.org/10.3758/s13421-012-0254-9

Nardi, D., Nitsch, K. P., & Bingman, V. P. (2010). Slope-driven goal location behavior in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 36, 430-442. doi.org/10.1037/a0019234 PMid:20718551

Nardini, M., Atkinson, J., & Burgess, N. (2008). Children reorient using the left/right sense of coloured landmarks at 18-24 months. Cognition, 106, 519-527. doi.org/10.1016/j.cognition.2007.02.007 PMid:17379204

Newcombe, N. S., & Huttenlocher, J. (2006). Development of spatial cognition. In W. Damon & R. Lerner (Series Eds.) and D. Kuhn & R. Siegler (Vol. Eds.), Handbook of child psychology: Vol. 2. Cognition, perception and language (6th ed., pp. 734-776). Hoboken, NJ: John Wiley & Sons.

Newcombe, N. S., & Ratliff, K. R. (2007). Explaining the development of spatial reorientation: Modularity-plus-language versus the emergence of adaptive combination. In J. Plumert & J. Spencer (Eds.), The emerging spatial mind (pp. 53-76). New York, NY: Oxford University Press. doi.org/10.1093/acprof:oso/9780195189223.003.0003

Newcombe, N. S., Ratliff, K. R., Shallcross, W. L., & Twyman, A. D. (2010). Young children’s use of features to reorient is more than just associative: Further evidence against a modular view of spatial processing. Developmental Science, 13, 213-220. doi.org/10.1111/j.1467-7687.2009.00877.x PMid:20121877

Pyers, J. E., Shusterman, A., Senghas, A., Spelke, E. S., & Emmorey, K. (2010). Evidence from an emerging sign language reveals that language supports spatial cognition. PNAS Proceedings of the National Academy of Sciences of the United States of America, 107, 12116-12120. doi.org/10.1073/pnas.0914044107 PMid:20616088 PMCid:2901441

Ratliff, K. R., & Newcombe, N. S. (2008a). Is language necessary for human spatial reorientation? Reconsidering evidence from dual task paradigms. Cognitive Psychology, 56, 142-163. doi.org/10.1016/j.cogpsych.2007.06.002 PMid:17663986

Ratliff, K. R., & Newcombe, N. S. (2008b). Reorienting when cues conflict: Using geometry and features following landmark displacement. Psychological Science, 19, 1301-1307. doi.org/10.1111/j.1467-9280.2008.02239.x PMid:19121141

Reichert, J. F., & Kelly, D. M. (2011). Use of local and global geometry from object arrays by adult humans. Behavioural Processes, 86, 196-205. doi.org/10.1016/j.beproc.2010.11.008 PMid:21144887

Rhodes, S. E. V., Creighton, G., Killcross, A. S., Good, M., & Honey, R. C. (2009). Integration of geometric with luminance information in the rat: Evidence from within compound associations. Journal of Experimental Psychology: Animal Behavior Processes, 35, 92-98. doi.org/10.1037/0097-7403.35.1.92 PMid:19159164

Rodriguez, C. A., Chamizo, V. D., & Mackintosh, N. J. (2011). Overshadowing and blocking between landmark learning and shape learning: the importance of sex differences. Learning & Behavior, 39, 324-35. doi.org/10.3758/s13420-011-0027-5 PMid:21472414

Rodriguez, C. A., Torres, A. A., Mackintosh, N. J., & Chamizo, V. D. (2010). Sex differences in preferential strategies to solve a navigation task. Journal of Experimental Psychology: Animal Behavior Processes, 36, 395-401. doi.org/10.1037/a0017297 PMid:20658870

Sandstrom, N. J., Kaufman, J., & Huettel, S. A. (1998). Males and females use different distal cues in a virtual environment navigation task. Cognitive Brain Research, 6, 351–360. doi.org/10.1016/S0926-6410(98)00002-0

Shusterman, A., Ah Lee, S., & Spelke, E. S. (2011). Cognitive effects of language on human navigation. Cognition, 120, 186-201. doi.org/10.1016/j.cognition.2011.04.004 PMid:21665199

Sovrano, V. A., & Vallortigara, G. (2006). Dissecting the geometric module: A sense linkage for metric and landmark information in animals’ spatial reorientation. Psychological Science, 17, 616-621. doi.org/10.1111/j.1467-9280.2006.01753.x PMid:16866748

Sovrano, V. A., Bisazza, A., & Vallortigara, G. (2003). Modularity as a fish (Xenotoca eiseni) views it: Conjoining geometric and nongeometric information for spatial reorientation. Journal of Experimental Psychology: Animal Behavior Processes, 29, 199-210. doi.org/10.1037/0097-7403.29.3.199 PMid:12884679

Sovrano, V. A., Bisazza, A., & Vallortigara, G. (2007). How fish do geometry in large and in small spaces. Animal Cognition, 10, 47-54. doi.org/10.1007/s10071-006-0029-4 PMid:16794851

Sutton, J. E. (2009). What is geometric information and how do animals use it? Behavioural Processes, 80, 339-343. doi.org/10.1016/j.beproc.2008.11.007 PMid:19084055

Sutton, J. E., Twyman, A. D., Joanisse, M. F. & Newcombe, N. S. (2012). Geometry three ways: An fMRI investigation of geometric processing during reorientation. Journal of Experimental Psychology: Learning, Memory and Cognition, 38, 1530-1541. doi.org/10.1037/a0028456 PMid:22582967

Twyman, A. D., Friedman, A., & Spetch, M. L. (2007). Penetrating the geometric module: Catalyzing children’s use of landmarks. Developmental Psychology, 43, 1523-1530. doi.org/10.1037/0012-1649.43.6.1523 PMid:18020829

Twyman, A. D., & Newcombe, N. S. (2010). Five reasons to doubt the existence of a geometric module. Cognitive Science, 34, 1315-1356. doi.org/10.1111/j.1551-6709.2009.01081.x PMid:21564249

Twyman, A. D., Newcombe, N. S., & Gould, T. J. (2009). Of mice (Mus musculus) and toddlers (Homo sapiens): Evidence for species-general spatial reorientation. Journal of Comparative Psychology, 123, 342-345. doi.org/10.1037/a0015400 PMid:19685977

Twyman, A. D., Newcombe, N.S. & Gould, T.G. (2012). Malleability in the development of spatial reorientation. Developmental Psychobiology. Advance online publication. doi.org/10.1002/dev.21017

Vallortigara, G., Feruglio, M., & Sovrano, V. A. (2005). Reorientation by geometric and landmark information in environments of different size. Developmental Science, 8, 393-401. doi.org/10.1111/j.1467-7687.2005.00427.x PMid:16048511

Vallortigara, G., Zanforlin, M., & Pasti, G. (1990). Geometric modules in animals’ spatial representations: A test with chicks (Gallus gallus domesticus). Journal of Comparative Psychology, 104, 248-254. doi.org/10.1037/0735-7036.104.3.248 PMid:2225762

Vargas, J. P., Petruso, E. J., & Bingman, V. P. (2004). Hippocampal formation is required for geometric navigation in pigeons. European Journal of Neuroscience, 20, 1937–44. doi.org/10.1111/j.1460-9568.2004.03654.x PMid:15380016

Voyer, D., Voyer, S., & Bryden, M. P. (1995). Magnitude of sex differences in spatial abilities: A meta-analysis and consideration of critical variables. Psychological Bulletin, 117, 250-270. doi.org/10.1037/0033-2909.117.2.250 PMid:7724690

Waismeyer, A. S., & Jacobs, L. F. (2012). The emergence of flexible spatial strategies in young children. Developmental Psychology. Advance online publication. doi.org/10.1037/a0028334

Wiener, J., Shettleworth, S., Bingman, V.P., Cheng, K., Healy, S., Jacobs, L.F., Jeffery, K.J., Mallot, H.A., Menzel, R. & Newcombe, N.S. (2011). Animal navigation: A synthesis. In R. Menzel & J. Fischer (Eds.), Animal thinking: Contemporary issues in comparative cognition (pp. 51-76). Strüngmann Forum Report, Vol. 8, J. Lupp, series ed. Cambridge, MA: MIT Press.

Woolley, D. G., Vermaercke, B., de Beeck, H. O., Wagemans, J., Gantois, I., D’Hooge, R., . . . Wenderoth, N. (2010). Sex differences in human virtual water maze performance: Novel measures reveal the relative contribution of directional responding and spatial knowledge. Behavioural Brain Research, 208, 408-414. doi.org/10.1016/j.bbr.2009.12.019 PMid:20035800

Wystrach, A., & Beugnon, G. (2009). Ants learn geometry and features. Current Biology, 19, 61-66. doi.org/10.1016/j.cub.2008.11.054 PMid:19119010

Zugaro, M. B., Berthoz, A., & Wiener, S. I. (2001). Background, but not foreground, spatial cues are taken as references for head direction responses by rat anterodorsal thalamus neurons. Journal of Neuroscience, 21, 1-5.


Contact Information

Alexandra Twyman,
University of Western Ontario,
London, Ontario, Canada, N6A 3K7
Email: atwyman3@uwo.ca

Volume 8: pp. 60-77

Animals Prefer Reinforcement that Follows Greater Effort: Justification of Effort or Within-Trial Contrast?

Thomas R. Zentall
University of Kentucky


Abstract:

Justification of effort by humans is a form of reducing cognitive dissonance by enhancing the value of rewards when they are more difficult to obtain. Presumably, assigning greater value to rewards provides justification for the greater effort needed to obtain them. We have found such effects in adult humans and children with a highly controlled laboratory task. More importantly, under various conditions we have found similar effects in pigeons, animals not typically thought to need to justify their behavior to themselves or others. To account for these results, we have proposed a mechanism based on within-trial contrast between the end of the effort and the reinforcement (or the signal for reinforcement) that follows. This model predicts that any relatively aversive event can serve to enhance the value of the reward that follows it, simply through the contrast between those two events. In support of this general model, we have found this effect in pigeons when the prior event consists of: (a) more rather than less effort (pecking), (b) a long rather than a short delay, and (c) the absence of food rather than food. We also show that within-trial contrast can occur in the absence of relative delay reduction. Contrast of this kind may also play a role in other social psychological phenomena that have been interpreted in terms of cognitive dissonance.

Keywords: cognitive dissonance, justification of effort, contrast, delay reduction


When humans behave in a way that is inconsistent with the way they think they should behave, they will often try to justify their behavior by altering their beliefs. The theory on which this behavior is based is known as cognitive dissonance theory (Festinger, 1957). Evidence for the attempt to reduce cognitive dissonance comes from the classic study by Festinger and Carlsmith (1959), who found that subjects given a small reward for agreeing to tell a prospective subject that a boring task was interesting then rated the task as more interesting than did subjects given a large reward. Presumably, those given a small reward could not justify their behavior by the reward alone so, to justify their behavior, they remembered the task as being more interesting. On the other hand, those given the large reward did not have to justify their behavior because the large reward was sufficient.

But the theory that such decisions are cognitively influenced has been challenged by evidence that humans with retrograde amnesia show cognitive-dissonance-like effects without having any memory for the presumed dissonant event (Lieberman, Ochsner, Gilbert, & Schacter, 2001). Lieberman et al. asked amnesics to choose between pictures that they had originally judged to be similarly preferred. When they then asked the subjects to rate the pictures again, the amnesics, much like control subjects, now rated the chosen pictures higher than the unchosen pictures. What is surprising is that the amnesics had no memory for ever having seen the chosen pictures before. This result implies that cognitive dissonance is an implicit automatic process that requires little cognitive processing.

The same conclusion was reached by Egan, Santos, and Bloom (2007), who examined a similar effect in 4-year-old children and monkeys. When subjects were required to choose between two equally preferred alternatives, they later avoided the unchosen alternative in favor of a novel alternative, but they did so only if they had been forced to make the original choice.

Festinger himself believed that his theory also applied to the behavior of nonhuman animals (Lawrence & Festinger, 1962), but the examples that they provided were only remotely related to the cognitive dissonance research that had been conducted with humans and the results that were obtained were easily accounted for by simpler behavioral mechanisms (e.g., the partial reinforcement extinction effect, which was attributed by others to a generalization decrement [Capaldi, 1967] or to an acquired response in the presence of frustration [Amsel, 1958]). Thus, the purpose of the research described in this article is to examine an analog design that could be used with nonhuman animals to determine if they too would show a similar cognitive dissonance effect.

One form of cognitive dissonance reduction is the justification of effort effect (Aronson & Mills, 1959). Aronson and Mills found that when a goal is difficult to obtain, it is often judged to be of more value than the same goal when it is easy to obtain. Specifically, they reported that a group that required a difficult initiation to join was perceived as more attractive than a group that was easy to join. This effect appears to be inconsistent with the Law of Effect or the Law of Least Effort (Thorndike, 1932) because goals that require less effort to obtain should have more value than goals that require more effort. To account for these results, Aronson and Mills proposed that the difficulty of the initiation could only be justified by increasing the perceived value of joining the group.

Alternatively, it could be argued that there may be a correlation between the difficulty in joining a group and the value of group membership. That is, although there is not always sufficient information on which to determine the value of a group, a reasonable heuristic may be that the difficulty of being admitted to the group is a functional (but perhaps imperfect) source of information about the value of group membership. Put more simply, more valuable groups are often harder to join.

The problem with studying justification of effort in humans is that humans often have had experience with functional heuristics or rules of thumb, and what may appear to be justification of effort may actually reflect no more than the generalized use of this heuristic. On the other hand, if cognitive dissonance actually involves implicit automatic processes, cognitive processes may not be involved and one should be able to demonstrate justification of effort effects in nonhuman animals under conditions that control for prior experience with the ability of effort to predict reward value.

The beauty of the Aronson and Mills (1959) design is that it can easily be adapted for use with animals because one can train an animal that a large effort is required to obtain one reinforcer whereas a small effort is required to obtain a different reinforcer. If the two reinforcers are objectively of equal value, one can then ask if the reinforcer that requires greater effort is preferred over the reinforcer that requires lesser effort. Finding two reinforcers that have the same initial value and, more important, reinforcers that will not change in value with experience (unrelated to the effort involved in obtaining them) is quite a challenge (but see Johnson & Gallagher, 2011). Alternatively, one could use a salient discriminative stimulus that signals the presentation of the reinforcer following effort of one magnitude and a different discriminative stimulus that signals the same reinforcer following effort of another magnitude. One can then ask if the animal has a preference for either conditioned reinforcer, each having served equally often as a signal for the common reinforcer.

In this review, I will first present the results of an experiment in which we have found evidence for justification of effort in pigeons and then will describe a noncognitive model based on contrast to account for this effect. I will then demonstrate the generality of the effect to show that a variety of relatively aversive events can be used to produce a preference for the outcome that follows. We have interpreted the results of these experiments in terms of within-trial contrast and have proposed that it is unlike the various contrast effects that have been described in the literature (incentive contrast, anticipatory contrast, and behavioral contrast). Although an alternative theory, delay reduction, can make predictions similar to within-trial contrast, in several experiments we have found that within-trial contrast can be found in the absence of differential delay reduction. Although several studies have reported a failure to find evidence for within-trial contrast, the procedures and results of these studies have proven useful in identifying some of the boundary conditions that appear to constrain the appearance of this effect. Finally, I will suggest that contrast effects of this kind may be involved in several psychological phenomena that have been studied in humans (e.g., general cognitive dissonance effects, the distinction between intrinsic and extrinsic reinforcement, and learned industriousness).

Justification of Effort in Animals

To determine the effect of prior effort on the preference for the conditioned reinforcer that followed, Clement, Feltus, Kaiser, and Zentall (2000) trained pigeons with a procedure analogous to that used by Aronson and Mills (1959). All training trials began with the presentation of a white stimulus on the center response key. On half of the training trials, a single peck to the white key turned it off and turned on two different colored side keys, for example red and yellow, and choice of the red stimulus (S+) was reinforced but not the yellow stimulus (S-) (sides were counterbalanced over trials and colors were counterbalanced over subjects). On the remaining training trials, 20 pecks to the white key turned it off and turned on two different colored side keys, for example green and blue, and choice of the green stimulus was reinforced (see design of this experiment in Figure 1). Following extensive training, a small number of probe trials was introduced (among the training trials) involving the two conditioned reinforcers (i.e., red and green) as well as the two conditioned inhibitors (i.e., yellow and blue) to determine if the training had resulted in a preference for one over the other.

Figure 1. Design of the experiment by Clement et al. (2000), in which one pair of discriminative stimuli followed 20 pecks and the other pair of discriminative stimuli followed a single peck. Following extensive training, when pigeons were given a choice between the two positive stimuli, they preferred the one that followed the greater number of pecks.

Interestingly, traditional learning theory (Hull, 1943; Thorndike, 1932) would predict that this sort of training should not result in a differential preference because each of the conditioned stimuli would have been associated with the same reinforcer, obtained following the same delay from the onset of the conditioned reinforcer, and following the same effort in the presence of the conditioned reinforcer. That is, the antecedent events on training trials (the number of previous pecks experienced prior to the conditioned reinforcer during training) should not affect stimulus preference on probe trials.

Alternatively, one could imagine that stimuli that had been presented in the context of the single peck requirement would be associated with the easier trials and stimuli that had been presented in the context of the 20-peck requirement would be associated with the harder trials. If that was the case, it might be that the conditioned stimulus that was presented on the single-peck trials would be preferred over the conditioned stimulus that was presented on the 20-peck trials.

If, however, cognitive dissonance theory is correct, it could be that in order to “justify” the 20-peck requirement (because on other trials only a single peck was required) the pigeons would give added value to the reinforcer that followed the 20-peck requirement. If this was the case, the added value might transfer to the conditioned reinforcer that signaled its occurrence and one might find a preference for the stimulus that followed the greater effort.

Finally, it is possible that the peck requirement could serve as an occasion setter (or conditional stimulus) that the pigeons could use to anticipate which color would be presented. For example, if a pigeon was in the process of pecking 20 times it might anticipate the appearance of the green conditioned stimulus that it would choose. That is, the pecking requirement could bias the pigeon to choose the color that was most associated with that requirement. On probe trials there would be no peck requirement but Clement et al. (2000) reasoned that on probe trials, without an initial peck requirement, the pigeons might be biased to choose the conditioned reinforcer that in training required a single peck to produce because no required pecking would be more similar to a single peck than to 20 pecks. To allow for this possibility, Clement et al. presented three kinds of conditioned reinforcer probe trials: Trials initiated by a single peck to a white key, trials initiated by 20 pecks to a white key, and trials that started with a choice between the two conditioned reinforcers, with no white key.
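The trial structure just described can be made concrete with a short sketch. The Python below is illustrative only and is not the software used by Clement et al. (2000): the names `Trial`, `training_trial`, `probe_trial`, and `build_session`, the session sizes (96 training trials and 6 probes), and the strict alternation used for side counterbalancing are all assumptions introduced here. It uses the example color assignment from the text and covers only the conditioned reinforcer (red vs. green) probes.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Peck requirement on the white center key -> (S+, S-) colors, using the
# example color assignment from the text (colors were counterbalanced over subjects).
TRAINING_PAIRS = {1: ("red", "yellow"), 20: ("green", "blue")}


@dataclass(frozen=True)
class Trial:
    kind: str                     # "training" or "probe"
    initial_pecks: Optional[int]  # pecks to the white key; None means no white key
    left: str                     # color shown on the left side key
    right: str                    # color shown on the right side key
    s_plus: Optional[str]         # reinforced color on training trials; None on probes
                                  # (probe reinforcement is not specified in the text)


def training_trial(pecks: int, s_plus_on_left: bool) -> Trial:
    """One training trial: the peck requirement determines which color pair appears."""
    s_plus, s_minus = TRAINING_PAIRS[pecks]
    left, right = (s_plus, s_minus) if s_plus_on_left else (s_minus, s_plus)
    return Trial("training", pecks, left, right, s_plus)


def probe_trial(initial_pecks: Optional[int], red_on_left: bool) -> Trial:
    """One probe trial: a choice between the two conditioned reinforcers."""
    left, right = ("red", "green") if red_on_left else ("green", "red")
    return Trial("probe", initial_pecks, left, right, None)


def build_session(n_training: int = 96, n_probes: int = 6, seed: int = 0) -> List[Trial]:
    """Build a shuffled session; the default sizes are arbitrary placeholders."""
    rng = random.Random(seed)
    trials: List[Trial] = []
    for pecks in (1, 20):  # half of the training trials at each peck requirement
        for i in range(n_training // 2):
            # Assumed counterbalancing scheme: alternate the side of the S+.
            trials.append(training_trial(pecks, s_plus_on_left=(i % 2 == 0)))
    probe_kinds = [1, 20, None]  # single peck, 20 pecks, or no white key
    for i in range(n_probes):
        trials.append(probe_trial(probe_kinds[i % 3], red_on_left=((i // 3) % 2 == 0)))
    rng.shuffle(trials)  # intersperse probes among the training trials
    return trials
```

Calling `build_session()` returns a shuffled list of `Trial` records in which the three probe types occur equally often among the training trials. Scoring the proportion of probe choices for the color that followed 20 pecks would then give the preference measure reported for this design.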

The results of this experiment were clear. Regardless of the pecking requirement on test trials (20, 1, or no pecks), the pigeons showed a significant preference (69.3%) for the conditioned stimulus that in training had required 20 pecks to produce. Thus, they showed a justification of effort effect. Furthermore, the two simultaneous discriminations were not acquired at different rates. That is, neither the number of trials required to acquire the two simultaneous discriminations nor the number of reinforcements associated with the two S+ stimuli differed significantly.

A similar result was obtained by Kacelnik and Marsh (2002) with starlings. With their procedure, on some trials, they required the starlings to fly back and forth four times from one end of their cage to the other in order to light a colored key and peck the key to obtain a reinforcer. On other trials, the starlings had to fly back and forth 16 times to obtain a different colored key and peck the key to obtain the same reinforcer. On test trials, when the starlings were given a choice between the two colored lights without a flight requirement, 83% of them preferred the color that had required the greatest number of flights to produce.

Clement et al. (2000) and Kacelnik and Marsh (2002) used colors as the conditioned reinforcers to be able to use a common reinforcer as the outcome for both the easy and hard training trials. But in the natural ecology of animals, it is more likely that less arbitrary cues would be associated with the different alternatives. For example, one could ask if an animal might value reinforcement more from a particular location if it had to work harder to get the reinforcer from that location. In nature, one could require that the animal travel farther to obtain food from one location than from another, but it would be difficult to allow the animal to choose between the two locations without incurring the added cost of the additional travel time. However, such an experiment could be conducted in an operant chamber by manipulating the response requirement during training. Thus, we conducted an experiment in which we used two feeders: one provided food on trials in which 30 pecks were required to the center response key, and the other provided the same food, but at a different location, on trials in which a single peck was required to the center response key (Friedrich & Zentall, 2004). Prior to the start of training, we obtained a baseline feeder preference score for each pigeon. On each forced trial, the left or right key was illuminated (white) and pecks to the left key raised the left feeder, whereas pecks to the right key raised the right feeder. On interspersed choice trials, both the right and left keys were lit and the pigeons had a choice of which feeder would be raised (see Figure 2 top).

Figure 2. Design of the experiment by Friedrich & Zentall (2004), in which pigeons had to make 30 pecks to receive reinforcement from their less preferred feeder and only one peck to receive reinforcement from their more preferred feeder.

On training trials, the center key was illuminated (yellow) and either 1 peck or 30 pecks were required to turn off the center key and raise one of the two feeders. For each pigeon, the high-effort response raised the less preferred feeder and the low-effort response raised the more preferred feeder. Forced and free choice feeder trials continued through training to monitor changes in feeder preference (see Figure 2 bottom). Over the course of training, we found that there was a significant (20.5%) increase in preference for the originally non-preferred feeder (the feeder associated with the high-effort response; see Figure 3). To ensure that the increased preference for the originally non-preferred feeder was not due to the extended period of training, a control group was included. For the control group, over trials, each of the two response requirements was followed by each feeder equally often. Relative to the initial baseline preference, this group showed only a 0.5% increase in preference for their non-preferred feeder as a function of training. Thus, it appears that the value of the location of food can be enhanced by being preceded by a high-effort response, as compared to a low-effort response.

The ecological validity of the effect of prior effort on preference for the outcome that follows was further tested in a recent experiment by Johnson and Gallagher (2011) in which mice were trained to press one lever for glucose and a different lever for polycose. When initially tested, the mice showed a preference for the glucose; however, when the response requirement for the polycose was increased from one to 15 lever presses, and the mice were offered both reinforcers, they showed a preference for the polycose over the originally preferred reinforcer. Thus, increasing the effort required to obtain the less preferred food resulted in a reversal in preference: the once less-preferred food was now more preferred. Furthermore, neutral cues that had been paired with the reinforcers during training (a tone for one, white noise for the other) then became conditioned reinforcers that the mice worked to obtain in extinction, and they responded preferentially to produce the high-effort cue.

Figure 3. When pigeons were trained to make 30 pecks to receive reinforcement from their less preferred feeder and only one peck to receive reinforcement from their more preferred feeder and were then given a choice of feeders, they showed a shift in preference to the one they had had to work harder for in training (green circles, after Friedrich & Zentall, 2004). For the control group (red circles), both feeders were equally often associated with the 30-peck response. The dotted line represents the baseline preference for the originally non-preferred feeder.

Had the experiments described above been conducted with human subjects, the results likely would have been attributed to cognitive dissonance. It is unlikely, however, that cognitive dissonance is responsible for the added value given to outcomes that follow greater effort in pigeons and mice. Instead, this phenomenon can be described more parsimoniously as a form of positive contrast.

A Model of Justification of Effort for Animals

To model the contrast account, one should set the relative value of the trial to zero. Next, it is assumed that key pecking (or the time needed to make those pecks) is a relatively aversive event and results in a negative change in the value of the trial. It is also assumed that obtaining the reinforcer causes a shift to a more positive value (relative to the value at the start of the trial). The final assumption is that the value of the reinforcer depends on the relative change in value; that is, the change in value from the end of the response requirement to the appearance of the reinforcer (or the appearance of the conditioned reinforcer that signals reinforcement; see Figure 4). In the case of the second experiment, it would be the change in value from the end of the response requirement to the location of the raised feeder. Thus, because the positive change in value following the high-effort response would be larger than the change in value following the low-effort response, the relative value of the reinforcer following a high-effort response should be greater than that of the low-effort response.
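The arithmetic behind these assumptions can be sketched in a few lines of Python. This is only an illustration of the verbal model, not a fitted or published implementation: the function name, the cost per peck, and the reinforcer value are hypothetical numbers in arbitrary units, chosen only to show the direction of the predicted difference.

```python
# Minimal sketch of the contrast account (all numbers are hypothetical,
# arbitrary units; none of them come from the experiments).

COST_PER_PECK = 0.05     # assumed aversiveness of a single peck
REINFORCER_VALUE = 1.0   # assumed value of the reinforcer relative to trial start


def contrast_after_effort(pecks: int) -> float:
    """Change in value from the end of the response requirement
    to the appearance of the reinforcer (or its signal)."""
    start_value = 0.0                                        # trial start is set to zero
    value_after_pecking = start_value - pecks * COST_PER_PECK  # pecking lowers the value
    value_at_reinforcer = start_value + REINFORCER_VALUE       # reinforcer raises it
    return value_at_reinforcer - value_after_pecking


low_effort = contrast_after_effort(1)
high_effort = contrast_after_effort(20)
print(f"change in value after 1 peck:   {low_effort:.2f}")
print(f"change in value after 20 pecks: {high_effort:.2f}")
```

With any positive cost per peck, the change in value is larger after the 20-peck requirement than after the single peck, which is the ordering the contrast account needs to predict a preference for the stimulus that follows greater effort.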

Figure 4. A model of the justification of effort effect based on contrast (i.e., the change in relative value following the less aversive initial event and following the more aversive initial event).

A similar model of suboptimal choice has been proposed by Aw, Vasconcelos, and Kacelnik (2011). They indicate “that animals may attribute value to their options as a function of the experienced fitness or hedonic change at the time of acting” (p. 1118). That is, the value of a reinforcer may depend on the state of the animal at the time of reinforcement. The poorer the state of the animal, the more valued the reinforcer will be. They have referred to this implied contrast as state-dependent valuation learning.

Relative Aversiveness of the Prior Event

Delay to Reinforcement as an Aversive Event.

If the interpretation of these experiments that is presented in Figure 4 is correct, then other relatively-aversive prior events (as compared with the comparable event on alternative trials) should result in a similar enhanced preference for the stimuli that follow. For example, given that pigeons should prefer a shorter delay to reinforcement over a longer delay to reinforcement, they should also prefer discriminative stimuli that follow a delay over those that follow no delay.

To test this hypothesis, we trained pigeons to peck the center response key (20 times on all trials) to produce a pair of discriminative stimuli (as in Clement et al., 2000). On some trials, pecking the response key was followed immediately by one pair of discriminative stimuli (no delay), whereas on the remaining trials, pecking the response key was followed by a different pair of discriminative stimuli but only after a delay of 6 sec. On test trials, the pigeons were given a choice between the two conditioned reinforcers, but in this experiment they showed no preference (DiGian, Friedrich, & Zentall, 2004, Group Unsignaled Delay).

One difference between the effort manipulation used in the first two experiments and the delay manipulation used in this one is that, in the effort experiments, the aversive event could be anticipated while responding was still required. Once the pigeon had pecked once and the discriminative stimuli failed to appear, the pigeon could anticipate that 19 additional pecks would be required. Thus, the additional effort could be anticipated following the first response and the pigeon would be required to make 19 more responses in the presence of that anticipation. In the case of the delay manipulation, however, the pigeon could not anticipate whether a delay would occur or not, and at the time the delay occurred, no further responding was required. Thus, with the delay manipulation, the pigeon would not have to experience having to peck in the context of the anticipated delay. Would the results be different if the pigeon could anticipate the delay at a time when responding was required? To test this hypothesis, the delay to reinforcement manipulation was repeated but this time the initial stimulus was predictive of the delay (DiGian et al., 2004, Group Signaled Delay). On half of the trials, a vertical line appeared on the response key and 20 pecks resulted in the immediate appearance of a pair of discriminative stimuli (e.g., red and yellow). On the remaining trials, a horizontal line appeared on the response key and 20 pecks resulted in the appearance of the other pair of discriminative stimuli (e.g., green and blue) but only after a 6-sec delay (see Figure 5). For this group, the pigeons could anticipate whether 20 pecks would result in a delay or not, so they had to peck in the context of the anticipated delay. When pigeons in this group were tested, as in the effort-manipulation experiments, they showed a significant (65.4%) preference for the conditioned reinforcer that in training had followed the delay.
Once again, the experience of a relatively-aversive event produced an increase in the value of the conditioned reinforcer that followed. Furthermore, the results of this experiment demonstrated that it may be necessary for the subject to anticipate the aversive event for positive contrast to be found.

The Absence of Reinforcement as an Aversive Event

A related form of relatively-aversive event is the absence of reinforcement in the context of reinforcement on other trials. Could the absence of reinforcement on some trials result in a preference for the conditioned reinforcer that follows it? To test this hypothesis, pigeons were trained to peck a response key five times on all trials to produce a pair of discriminative stimuli. On some trials, pecking the response key was followed immediately by 2-s access to food from the central feeder and then immediately by the presentation of one pair of discriminative stimuli, whereas on the remaining trials, pecking the response key was followed by the absence of food (for 2 s) and then by the presentation of a different pair of discriminative stimuli. On test trials, the pigeons were given a choice between the two S+ stimuli, but once again they showed no preference (Friedrich, Clement, & Zentall, 2005, Group Unsignaled Reinforcement).

Figure 5. Design of experiment by DiGian et al. (2004, Group Signaled Delay) in which one stimulus signaled the appearance of discriminative stimuli without a delay and the other stimulus signaled the appearance of a different pair of discriminative stimuli with a 6-s delay. Following extensive training, when pigeons were given a choice between the two positive stimuli, they preferred the one that followed the 6-s delay.

As with the unsignaled delay condition, for this group, the aversive event, the absence of reinforcement, could not be anticipated prior to its occurrence. To test the hypothesis that this contrast effect depends on the anticipation of the aversive event, the absence of reinforcement manipulation was repeated but this time the initial stimulus was predictive of whether reinforcement would occur (Friedrich et al., 2005, Group Signaled Reinforcement). Once again, on half of the trials, a vertical line appeared on the response key and 5 pecks resulted in the presentation of food followed by the appearance of one pair of discriminative stimuli. On the remaining trials, a horizontal line appeared on the response key and 5 pecks resulted in the absence of food followed by the appearance of the other pair of discriminative stimuli (see Figure 6). For this group, the pigeons could anticipate whether 5 pecks would result in reinforcement or not. When pigeons in this group were tested, they showed a significant (66.7%) preference for the conditioned reinforcer that in training had followed the absence of reinforcement. Once again, the experience of a relatively aversive event produced an increase in the value of the conditioned reinforcer that followed.

Figure 6. Design of experiment by Friedrich et al. (2005, Group Signaled Reinforcement) in which one stimulus signaled that food would be presented prior to the appearance of discriminative stimuli and the other stimulus signaled that food would not be presented prior to the appearance of a different pair of discriminative stimuli. Following extensive training, when pigeons were given a choice between the two positive stimuli, they preferred the one that followed the absence of food.

The Anticipation of Effort as the Aversive Event.

Can anticipated effort, rather than actual effort, serve as the aversive event that increases the value of stimuli signaling reinforcement that follows? This question addresses the issue of whether the positive contrast between the initial aversive event and the conditioned reinforcer depends on actually experiencing the aversive event. One account of the added value that accrues to stimuli that follow greater effort is that during training, the greater effort experienced produces a heightened state of arousal, and in that heightened state of arousal, the pigeons learn more about the discriminative stimuli that follow than about the discriminative stimuli that follow the lower state of arousal produced by lesser effort. Examination of the acquisition functions for the two simultaneous discriminations offers no support for this hypothesis. Over the various experiments that we have conducted, there has been no tendency for the simultaneous discrimination that followed greater effort, longer delays, or the absence of reinforcement to have been acquired faster than the discrimination that followed less effort, shorter delays, or reinforcement. However, those discriminations were acquired very rapidly and there might have been a ceiling effect. That is, it might be easy to miss a small difference in the rate of discrimination acquisition that would nonetheless be sufficient to produce a preference for the conditioned reinforcer that follows the more aversive event.

Thus, the purpose of the anticipation experiments was to ask if we could obtain a preference for the discriminative stimuli that followed a signal that more effort might be required but actually was not required on that trial. More specifically, at the start of half of the training trials, pigeons were presented with, for example, a vertical line on the center response key. On half of these trials, pecking the vertical line replaced it with a white key and a single peck (low effort) to the white key resulted in reinforcement. On the remaining vertical-line trials, pecking the vertical line replaced it with a simultaneous discrimination S+L S-L on the left and right response keys and choice of the S+ was reinforced. A schematic presentation of the design of this experiment appears in Figure 7.

Figure 7. Design of experiment by Clement & Zentall (2002, Exp. 1) to determine the effect of the anticipation of effort (1 vs. 30 pecks). On some trials pigeons were presented with a vertical-line stimulus and 10 pecks would produce either a white stimulus (one peck to the white stimulus would produce reinforcement) or a choice between two colors (choice of the correct stimulus would be reinforced). On other trials pigeons were presented with a horizontal-line stimulus and 10 pecks would produce either a white stimulus (30 pecks to the white stimulus would produce reinforcement) or a choice between two other colors (choice of the correct stimulus would be reinforced). On probe trials, when given a choice between the two correct colors, the pigeons preferred the color associated with the horizontal-line stimulus (the correct stimulus that on other horizontal-line trials would have required 30 pecks to receive reinforcement).

On the remaining training trials, the pigeons were presented with a horizontal line on the center response key. On half of these trials, pecking the horizontal line replaced it with a white key and 30 pecks (high effort) to the white key resulted in reinforcement. On the remaining horizontal-line trials, pecking the horizontal line replaced it with a different simultaneous discrimination S+H S-H on the left and right response keys and again choice of the S+ was reinforced. On test trials when the pigeons were given a choice between S+H and S+L, once again, they showed a significant (66.5%) preference for S+H.

It is important to note that in this experiment the events that occurred in training on trials involving the two pairs of discriminative stimuli were essentially the same. It was only on the other half of the trials, those trials on which the discriminative stimuli did not appear, that differential responding was required. Thus, the expectation of differential effort, rather than actual differential effort, appears to be sufficient to produce a differential preference for the conditioned reinforcers that follow. These results extend the findings of the earlier research to include anticipated effort.

The Anticipation of the Absence of Reinforcement as the Aversive Event.

If anticipated effort can function as a relative conditioned aversive event, can the anticipated absence of reinforcement serve the same function? Using a design similar to that used to examine differential anticipated effort, we evaluated the effect of differential anticipated reinforcement (Clement & Zentall, 2002, Exp. 2). On half of the training trials, pigeons were presented with a vertical line on the center response key. On half of these trials, pecking the vertical line was followed immediately by reinforcement (high probability reinforcement). On the remaining vertical-line trials, pecking the vertical line replaced it with a simultaneous discrimination S+HP S-HP and choice of the S+ was reinforced, but only on a random 50% of the trials. A schematic presentation of the design of this experiment appears in Figure 8. On the remaining training trials, the pigeons were presented with a horizontal line on the center response key. On half of these trials, pecking the horizontal line was followed immediately by the absence of reinforcement (low probability reinforcement). On the remaining horizontal-line trials, pecking the horizontal line replaced it with a different simultaneous discrimination S+LP S-LP and again choice of the S+ was reinforced, but again, only on a random 50% of the trials. On test trials, when the pigeons were given a choice between S+HP and S+LP, they showed a significant (66.9%) preference for S+LP. Thus, the anticipation of an aversive, absence-of-food event appears to produce a preference for the S+ that follows the initial stimulus, and that preference is similar to the one produced by the anticipation of a high-effort response.

Figure 8. Design of experiment by Clement & Zentall (2002, Exp. 2) to determine the effect of the anticipation of the absence of reinforcement. On some trials pigeons were presented with a vertical-line stimulus and 10 pecks would produce either reinforcement or a choice between two colors (choice of the correct stimulus would be reinforced 50% of the time). On other trials pigeons were presented with a horizontal-line stimulus and 10 pecks would produce either the absence of reinforcement or a choice between two other colors (choice of the correct stimulus would be reinforced 50% of the time). On probe trials, when given a choice between the two correct colors, the pigeons preferred the color associated with the horizontal-line stimulus (the correct stimulus that on other horizontal-line trials would have produced the absence of reinforcement).

In a follow-up experiment (Clement & Zentall, 2002, Exp. 3), we tried to determine whether preference for the discriminative stimuli associated with the anticipation of the absence of food was produced by the anticipation of positive contrast between the certain absence of food and a 50% chance of food (on discriminative stimulus trials) or negative contrast between the certain anticipation of food and a 50% chance of food (on the other set of discriminative stimulus trials). A schematic presentation of the design of this experiment appears at the top of Figure 9.

For Group Positive, the conditions of reinforcement were essentially nondifferential (i.e., reinforcement always followed vertical-line trials whether the discriminative stimuli S+HP S-HP were presented or not). Thus, on half of the vertical line trials, reinforcement was presented immediately for responding to the vertical line. On the remaining vertical-line trials, pecking the vertical line replaced it with a different simultaneous discrimination S+HP S-HP and reinforcement was presented for responding to the S+. Thus, there should have been little contrast established between these two kinds of trial.

On half of the horizontal-line trials, however, responses to the horizontal line were never followed by reinforcement. On the remaining horizontal-line trials involving S+LP S-LP, reinforcement was presented for responding to the S+. Thus, for this group, on horizontal-line trials, there was the opportunity for positive contrast to develop on discriminative stimulus trials (i.e., the pigeons should expect that reinforcement might not occur on those trials and they might experience positive contrast when it does occur).

For Group Negative, on all horizontal-line trials the conditions of reinforcement were essentially nondifferential (i.e., the probability of reinforcement on horizontal-line trials was always 50% whether the trials involved discriminative stimuli or not). Thus, there should have been little contrast established between these two kinds of trial (see the bottom of Figure 9). That is, on half of the horizontal-line trials, reinforcement was provided immediately with a probability of .50 for responding to the horizontal line. On the remaining horizontal-line trials, the discriminative stimuli S+LP S-LP were presented and reinforcement was obtained for choices of the S+ but only on 50% of the trials.

On half of the vertical-line trials, however, reinforcement was presented immediately for responding to the vertical line (with a probability of 1.00). On the remaining vertical-line trials, the discriminative stimuli S+HP S-HP were presented and reinforcement was provided for choice of the S+ with a probability of .50. Thus, for this group, on vertical-line trials, there was the opportunity for negative contrast to develop on discriminative stimulus trials (i.e., the pigeons should expect that reinforcement is quite likely and they might experience negative contrast when it does not occur).

Figure 9. Design of experiment by Clement & Zentall (2002, Exp. 3) to determine if the effect of the anticipation of the absence of reinforcement was due to positive or negative contrast. For Group Positive (top panel), on some trials pigeons were presented with a vertical-line stimulus and 10 pecks would produce either reinforcement or a choice between two colors (choice of the correct stimulus S+HP would be reinforced 100% of the time, thus, no contrast). On other trials pigeons were presented with a horizontal-line stimulus and 10 pecks would produce either the absence of reinforcement or a choice between two other colors (choice of the correct stimulus S+LP would be reinforced 100% of the time, thus, positive contrast). On probe trials, when given a choice between the two correct colors, S+HP and S+LP, the pigeons preferred the color associated with the horizontal-line stimulus (the correct stimulus that on other horizontal-line trials would have produced the absence of reinforcement), thus providing evidence for positive contrast (on the horizontal-line trials).

On test trials, when pigeons in Group Positive were given a choice between the two S+ stimuli, they showed a significant (60.1%) preference for the positive discriminative stimulus that in training was preceded by a horizontal line (the initial stimulus that on other trials was followed by the absence of reinforcement). Thus, Group Positive showed evidence of positive contrast.

When pigeons in Group Negative were given a choice between the two S+ stimuli, they showed a 58.1% preference for the positive discriminative stimulus that in training was preceded by a horizontal line (the initial stimulus that on other trials was followed by a lower probability of reinforcement than on comparable trials involving the vertical line). Thus, Group Negative showed evidence of negative contrast. In this case, the result is better described as a reduced preference for the positive discriminative stimulus preceded by the vertical line, which on other trials was associated with a higher probability of reinforcement (100%). Considering the results from both Group Positive and Group Negative, it appears that both positive and negative contrast contributed to the preferences found by Clement and Zentall (2002, Exp. 2).
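The logic of the two groups can be made concrete with a small Python sketch. It treats contrast, as a simplifying assumption made here for illustration, as the difference between the probability of reinforcement on discriminative-stimulus trials and the probability of reinforcement on the other trials that begin with the same initial stimulus. The probabilities are those stated in the design; the function name and the dictionary layout are hypothetical.

```python
# Sketch of the Exp. 3 design logic. The difference score is an
# illustrative simplification, not a measure used in the original study.

def expected_contrast(p_discrimination: float, p_other: float) -> float:
    """Reinforcement probability on discriminative-stimulus trials minus the
    probability on the other trials signaled by the same initial stimulus."""
    return p_discrimination - p_other


# (group, initial stimulus) -> (p on discrimination trials, p on other trials)
design = {
    ("Positive", "vertical"): (1.0, 1.0),    # nondifferential
    ("Positive", "horizontal"): (1.0, 0.0),  # food follows a signal for no food
    ("Negative", "vertical"): (0.5, 1.0),    # 50% follows a signal for certain food
    ("Negative", "horizontal"): (0.5, 0.5),  # nondifferential
}

contrast = {cell: expected_contrast(*probs) for cell, probs in design.items()}
for (group, line), value in contrast.items():
    print(f"Group {group:8s} {line:10s} contrast = {value:+.1f}")
```

Under this simplification, only the horizontal-line trials of Group Positive yield a positive difference and only the vertical-line trials of Group Negative yield a negative one, which is why a preference for the S+ that followed the horizontal line is read as positive contrast in the first group and negative contrast in the second.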

Hunger as the Aversive Event

According to the contrast model, if pigeons are trained to respond to one conditioned reinforcer when hungry and to respond to a different conditioned reinforcer when less hungry, when they are given a choice between the two conditioned reinforcers, they should prefer the conditioned reinforcer to which they learned to respond when hungrier. That is, they should prefer the stimulus that they experienced when they were in a relatively more aversive state. Vasconcelos and Urcuioli (2008b) tested this prediction by training pigeons to peck one colored stimulus on days when they were hungry and to peck a different colored stimulus on days when they were less hungry. On test days, when the pigeons were given a choice between the two colored stimuli, they showed a preference for the stimulus that they pecked when they were hungrier. Furthermore, this effect was not state-dependent because the pigeons preferred the color that they had learned to peck when hungrier, whether they were tested more or less hungry. Similar results were reported by Marsh, Schuck-Paim, and Kacelnik (2004) with starlings (see also Pompilio & Kacelnik, 2005). Furthermore, the effect appears to have considerable generality because Pompilio, Kacelnik, and Behmer (2006) were able to show similar effects in grasshoppers.

Within-Trial Contrast in Humans

It can be argued that if within-trial contrast is analogous to justification of effort, one should be able to show similar effects with humans. In fact, when humans were given a modified version of the task used by Clement et al. (2000), a similar effect was found (Klein, Bhatt, & Zentall, 2005). The humans were told that they would have to “click on a mouse” to receive a pair of abstract shapes and that by clicking on the shapes they could learn which shape was correct. On some trials, a single click was required to present one of two pairs of shapes and one shape from each pair was designated as correct. On the remaining trials, 20 clicks were required to present one of two different pairs of shapes and again one shape from each pair was designated as correct. Thus, there was a total of four pairs of shapes. On test trials, the subjects were asked to choose between pairs of correct shapes, one shape that had followed a single mouse click and the other that had followed 20 mouse clicks. Consistent with the contrast hypothesis, subjects showed a significant (65.2%) preference for the shapes that followed 20 clicks. Furthermore, after their choice, when the subjects were asked why they had chosen those shapes, typically they did not know, and most of them were not even aware of which shapes had followed the large and small response requirements. When a similar procedure was used with 8-year-old children, they showed a similar 66.7% preference for the shapes that they had to work harder to obtain (Alessandri, Darcheville, & Zentall, 2008).

Contrast or Relative Delay Reduction?

We have described the preferences we have found for conditioned reinforcers (and feeder location) as a contrast effect. However, one could also interpret these effects in terms of relative delay reduction (Fantino & Abarca, 1985). According to the delay reduction hypothesis, any stimulus that predicts reinforcement sooner in its presence than in its absence will become a conditioned reinforcer. In the present experiments, the temporal relation between the conditioned reinforcers and the reinforcers was held constant, so one could argue that neither conditioned reinforcer should have served to reduce the delay to reinforcement more than the other. But the delay reduction hypothesis is meant to be applied to stimuli in a relative sense. That is, one can consider the predictive value of the discriminative stimuli relative to the time in their absence or, in the present case, to the total duration of the trial. If one considers delay reduction in terms of its duration relative to the duration of the entire trial, then the delay reduction hypothesis can account for the results of the present experiments. For example, in the case of the differential effort manipulation, as it takes longer to produce 20 responses (pecks or clicks) than to produce 1 response, 20-response trials would be longer in duration than 1-response trials. Thus, the appearance of the discriminative stimuli would occur relatively later in a 20-response trial than in a 1-response trial. The later in a trial that the discriminative stimuli appear, the closer would be their onset to reinforcement, relative to the start of the trial and thus, the greater relative reduction in delay that they would represent.
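The relative delay reduction argument for the differential effort manipulation can be made concrete with a small calculation. The sketch below uses one common way of expressing delay reduction, (T - t) / T, where T is the time from trial onset to reinforcement and t is the time from onset of the discriminative stimuli to reinforcement. The function name and all of the durations are hypothetical values chosen for illustration; they are not taken from any of the experiments described here.

```python
def relative_delay_reduction(pre_stimulus_s: float, stimulus_to_food_s: float) -> float:
    """Fraction of the total trial delay already elapsed when the
    discriminative stimuli appear: (T - t) / T."""
    total = pre_stimulus_s + stimulus_to_food_s   # T: trial onset to reinforcement
    remaining = stimulus_to_food_s                # t: stimulus onset to reinforcement
    return (total - remaining) / total


# Hypothetical durations (assumptions for illustration only).
STIMULUS_TO_FOOD_S = 6.0   # held constant across both trial types
LOW_EFFORT_S = 1.0         # assumed time to complete 1 response
HIGH_EFFORT_S = 10.0       # assumed time to complete 20 responses

low = relative_delay_reduction(LOW_EFFORT_S, STIMULUS_TO_FOOD_S)
high = relative_delay_reduction(HIGH_EFFORT_S, STIMULUS_TO_FOOD_S)

print(f"1-response trials:  {low:.3f}")   # 0.143
print(f"20-response trials: {high:.3f}")  # 0.625
```

Under these assumed values, the stimuli on the longer (20-response) trials signal a larger relative reduction in delay even though the stimulus-to-reinforcer interval is identical. If the two pre-stimulus durations are set equal, the function returns the same value for both trial types.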

The delay reduction hypothesis can also account for the effect seen with a delay versus the absence of a delay. But what about trials with reinforcement versus trials without reinforcement? In this case, the duration of the trial is the same with and without reinforcement, prior to the appearance of the discriminative stimuli; however, delay reduction theory considers the critical time to be the interval between reinforcements. Thus, on trials in which the discriminative stimuli are preceded by reinforcement, the time between reinforcements is short, so the discriminative stimuli are associated with little delay reduction. On trials in which the discriminative stimuli are preceded by the absence of reinforcement, however, the time between reinforcements is relatively long (i.e., the time between reinforcement on the preceding trial and reinforcement on the current trial), so the discriminative stimuli on the current trial would be associated with a relatively large reduction in delay.

Delay reduction theory has a more difficult time accounting for the effects of differential anticipated effort because trials with both sets of discriminative stimuli were not differentiated by number of responses, delay, or reinforcement. Thus, all trials with discriminative stimuli should be of comparable duration. The same is true for the effects of differential anticipated reinforcement because that manipulation occurred on trials independent of the trials with the discriminative stimuli. Thus, taken as a whole, based on what has been presented to this point, the contrast account appears to offer a more parsimonious account of the data.

On the other hand, it should be possible to distinguish between the delay reduction and contrast accounts with the use of a design similar to that used in the first experiment, with one important change. Instead of requiring that the pigeons peck many times on half of the trials and a few times on the remaining trials, one could use two schedules that accomplish the same thing while holding the duration of the trial event constant. This could be accomplished by using a fixed interval schedule (FI, the first response after a fixed duration would present one pair of discriminative stimuli) on half of the trials and a differential reinforcement of other behavior schedule (DRO, the absence of key pecking for the same fixed duration would present the other pair of discriminative stimuli) on the remaining trials. Assuming that the pigeons prefer the DRO schedule (but it is not certain that they would), then according to the contrast account the pigeons should prefer the discriminative stimuli that follow the FI schedule over the discriminative stimuli that follow the DRO schedule. According to the delay reduction hypothesis, if trial duration is held constant and the two pairs of discriminative stimuli occupy the same relative proportion of the two kinds of trial, the pigeons should not differentially prefer either pair of discriminative stimuli, regardless of which schedule is preferred.

We tested the prediction of delay reduction theory by equating the trial duration on high effort and low effort trials by first training the pigeons to respond on a FI schedule to one stimulus on half of the trials and a DRO schedule to a different stimulus on the remaining trials (Singer, Berry, & Zentall, 2007). But before introducing the discriminative stimuli, we tested the pigeons for their schedule preference. We then followed the two schedules with discriminative stimuli as in the earlier research and finally tested the pigeons for their conditioned reinforcer preference (see Figure 10). Consistent with contrast theory, we found that the pigeons reliably preferred (by 63.2%) the discriminative stimuli that followed their least preferred schedule (Figure 11; see also Singer & Zentall, 2011, Exp. 1). Furthermore, consistent with a contrast account, as the schedule preference varied in direction and degree among the pigeons, we examined the correlation between schedule preference and preference for the conditioned reinforcer that followed and found a significant negative correlation (r = -.78). The greater the schedule preference the less they preferred the conditioned reinforcer that followed that schedule.

Figure 10. Design of experiment that controlled for the duration of a trial. Choice of the left key resulted in presentation of a horizontal line, for example, on the center key and if the pigeon refrained from pecking (DRO20s) the horizontal line, it could choose between a red (S+) and yellow (S-) stimulus on the side keys. Choice of the right key resulted in presentation of a vertical line on the center key and if the pigeon pecked (FI20s) the vertical line, it could choose between a green (S+) and blue (S-) stimulus on the side keys. Pigeons’ schedule preference was used to predict their preference for the S+ stimulus that followed the schedule on probe trials (after Singer, Berry, & Zentall, 2007).

Further support for the contrast account came from an experiment in which there were 30-peck trials and single-peck trials but trial duration was extended on single-peck trials to equal the duration of 30-peck trials by inserting a delay following the single peck, equal to the time each pigeon took to complete the immediately preceding 30-peck requirement (Singer & Zentall, 2011, Exp. 2). Once again, following a test to determine which schedule was preferred, discriminative stimuli were inserted following completion of the schedule and the pigeons’ preference for the conditioned reinforcers was assessed. Again, the pigeons preferred the conditioned reinforcer that followed the least-preferred schedule, 60.4% of the time (but see Vasconcelos, Urcuioli, & Lionello-DeNolf, 2007).

Figure 11. For each pigeon, probe trial preference for the S+ stimulus that followed the least preferred schedule in training (after Singer, Berry, & Zentall, 2007).

A different approach to equating trial duration was demonstrated with human subjects by Alessandri, Darcheville, Delevoye-Turrell, and Zentall (2008). Instead of using number of mouse clicks as the differential initial event, we used pressure on a transducer. On some trials, signaled by a discriminative stimulus, the subjects had to press the transducer lightly to produce a pair of shapes. On other trials, signaled by a different discriminative stimulus, the subjects had to press the transducer with greater force (50% of their maximum force assessed during pretraining). Following training, when subjects were given a choice between pairs of the conditioned reinforcers, they showed a significant 66.7% preference for those stimuli that had required the greater force to produce in training (and the effect was independent of the force required on test trials). Thus, further support for the contrast account was obtained under conditions in which it would be difficult to account for the effect by delay reduction theory.

Failures to Replicate the Within-Trial Contrast Effect

Several studies have reported a failure to obtain a contrast effect of the kind reported by Clement et al. (2000). Such reports are instructive because they can help to identify the boundary conditions for observing the effect. The first of these studies was reported by Vasconcelos, Urcuioli, and Lionello-DeNolf (2007) who attempted to replicate the original Clement et al. finding with 20 sessions of training beyond acquisition of the simple simultaneous discriminations that were acquired very quickly. It should be noted, however, that in more-recent research we have found that the amount of training required to establish the within-trial contrast effect is often greater than that used by Clement et al. Although Clement et al. found a contrast effect with 20 sessions of additional training, later research suggested that up to 60 sessions of training is often required to obtain the effect (see, e.g., Friedrich & Zentall, 2004).

Arantes and Grace (2007) also failed to replicate the contrast effect. In their first experiment they tested their pigeons without overtraining and in their second experiment they tested their pigeons at various points up to 27 sessions of overtraining. Thus, once again it may be that insufficient training was provided. However, in their second experiment, a subgroup of four pigeons was given more than twice the number of training sessions and although they did find a preference for the conditioned reinforcer that followed the greater effort in training, it was not statistically reliable. However, the smaller contrast effect reported by Arantes and Grace may be attributable to the extensive prior experience (in a previous experiment) that these pigeons had had with lean variable interval schedules. It is possible that the prior experience with lean schedules sufficiently reduced the aversiveness of the 20-peck requirement to reduce the magnitude of the contrast effect that they found. Another factor that may have contributed to the reduced magnitude of their effect was the use of a 6-s delay between choice of the conditioned reinforcer and reinforcement. Although Clement et al. (2000) also included a 6-s delay, later research suggested that contrast effects at least as large can be obtained if reinforcement immediately follows choice of the conditioned reinforcer.

Finally, Vasconcelos and Urcuioli (2008a) noted that they too failed to find a significant contrast effect following extensive overtraining. However, the effect that they did find (about 62% choice of the conditioned reinforcer that followed the greater pecking requirement) was quite comparable in magnitude to the effect reported by Clement et al. (2000). Their failure to find a significant effect may be attributed to the fact that there were only four pigeons in their experiment. That is, their study may have lacked sufficient power to observe significant within-trial contrast. Thus, the several failures to find a contrast effect with procedures similar to those used by Clement et al. suggest that observation of the contrast effect may require considerable overtraining, the absence of prior training with lean schedules of reinforcement, and a sufficient sample size to deal with individual differences in the magnitude of the effect.
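The point about sample size can be illustrated with a simple calculation. The sketch below uses an exact two-sided sign test across subjects, which is an assumption made for illustration and is not the analysis reported in the studies cited above; the function name is hypothetical. It asks how small the p value can be when every subject shows a preference in the same direction.

```python
from math import comb


def sign_test_p(n_subjects: int, n_in_direction: int) -> float:
    """Exact two-sided sign-test p value against chance (0.5)."""
    # Use the larger of the two counts so the tail is computed symmetrically.
    k = max(n_in_direction, n_subjects - n_in_direction)
    tail = sum(comb(n_subjects, i) for i in range(k, n_subjects + 1)) / 2 ** n_subjects
    return min(1.0, 2 * tail)


# Best case: all subjects prefer the same conditioned reinforcer.
for n in (4, 6, 8):
    print(n, sign_test_p(n, n))
# 4 0.125
# 6 0.03125
# 8 0.0078125
```

With only four subjects, even a unanimous preference gives p = .125 on this test, so a test of this kind cannot reach the conventional .05 criterion regardless of the size of the effect. Parametric tests on the preference scores behave differently, but they are similarly limited by a sample this small when individual differences are large.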

The Nature of the Contrast

The contrast effects found in the present research appear to be somewhat different from the various forms of contrast that have been reported in the literature (see Flaherty, 1996). Flaherty distinguishes among three kinds of contrast.

Incentive Contrast

In incentive contrast, the magnitude of reward that has been experienced for many trials suddenly changes, and the change in behavior that follows is compared with the behavior of a comparison group that has experienced the final magnitude of reinforcement from the start. Early examples of incentive contrast were reported by Tinklepaugh (1928), who found that if monkeys were trained for a number of trials with a preferred reward (e.g., fruit), when they then encountered a less preferred reward (e.g., lettuce, a reward that they would normally readily work for) they often would refuse to eat it.

Incentive contrast was more systematically studied by Crespi (1942; see also Mellgren, 1972). Rats trained to run down an alley for a large amount of food and shifted to a small amount of food typically run slower than rats trained to run for the smaller amount of food from the start (negative incentive contrast). Conversely, rats trained to run for a small amount of food and shifted to a large amount of food may run faster than rats trained to run for the larger amount of food from the start (positive incentive contrast). By its nature, incentive contrast must be assessed following the shift in reward magnitude rather than in anticipation of the change because, generally, only a single shift is experienced.

Capaldi (1972) has argued that negative successive incentive contrast of the kind studied by Crespi (1942) can be accounted for as a form of generalization decrement (the downward shift in incentive value represents not only a shift in reinforcement value but also a change in context); however, generalization decrement is not able to account for positive successive incentive contrast effects (also found by Crespi and in the present research) when the magnitude of reinforcement increases.

Incentive contrast would seem to be an adaptive mechanism by which animals can increase their sensitivity to changes in reinforcement density. Just as animals use lateral inhibition in vision to help them discriminate spatial changes in light intensity resulting in enhanced detection of edges (or to provide better figure-ground detection), so too may incentive contrast help the animal detect changes in reinforcement magnitude important to its survival. Thus, incentive contrast may be a perceptually-mediated detection process.

Anticipatory Contrast

In a second form of contrast, anticipatory contrast, there are repeated (typically one a day) experiences with the shift in reward magnitude, and the measure of contrast involves behavior that occurs prior to the anticipated change in reward value. Furthermore, the behavior assessed is typically consummatory behavior rather than running speed. For example, rats often drink less of a weak saccharin solution if they have learned that it will be followed by a strong sucrose solution, relative to a control group for which saccharin is followed by more saccharin (Flaherty, 1982). This form of contrast differs from others in the sense that the measure of contrast involves differential rates of the consumption of a reward (rather than an independent behavior such as running speed).

Behavioral Contrast

A third form of contrast involves the random alternation of two signaled outcomes. When used in a discrete-trials procedure with rats, the procedure has been referred to as simultaneous incentive contrast. Bower (1961), for example, reported that rats trained to run down an alley to both large and small signaled magnitudes of reward ran slower to the small magnitude of reward than rats that ran only to the small magnitude of reward.

The more-often-studied, free-operant analog of this task is called behavioral contrast. To observe behavioral contrast, pigeons are trained on an operant task involving a multiple schedule of reinforcement. In a multiple schedule, two (or more) schedules, each signaled by a distinctive stimulus, are randomly alternated. Positive behavioral contrast can be demonstrated by training pigeons initially with equal probability of reinforcement schedules (e.g., two variable-interval 60-s schedules) and then reducing the probability of reinforcement in one schedule (e.g., from variable-interval 60-s to extinction) and noting an increase in the response rate in the other, unaltered schedule (Halliday & Boakes, 1971; Reynolds, 1961). Similar results can be demonstrated in a between groups design (Mackintosh, Little, & Lord, 1972) in which pigeons are trained on the multiple variable-interval 60-s and extinction schedules from the start, and their rate of pecking during the variable-interval 60-s schedule is compared with other pigeons that have been trained on two variable-interval 60-s schedules.

The problem with classifying behavioral contrast according to whether it involves a response to entering the richer schedule (as with incentive contrast) or the anticipation of entering the poorer schedule (as with anticipatory contrast) is that, during each session, there are multiple transitions from the richer to the poorer schedule and vice versa. Thus, when one observes an increase in responding in the richer schedule resulting from the presence of the poorer schedule at other times, it is not clear whether the pigeons are reacting to the preceding poorer schedule or anticipating the next poorer schedule.

Williams (1981) attempted to distinguish between these two mechanisms by presenting pigeons with triplets of trials in an ABA design (with the richer schedule designated as A) and comparing their behavior to that of pigeons trained with an AAA design. Williams found very different kinds of contrast in the two A components of the ABA schedule. In the first A component, Williams found a generally higher level of responding that was maintained over training sessions (see also Williams, 1983). In the second A component, however, he found a higher level of responding primarily at the start of the component, an effect known as local contrast, but the level of responding was not maintained over training sessions (see also Cleary, 1992). Thus, there is evidence that behavioral contrast may be attributable primarily to the higher rate of responding by pigeons in anticipation of the poorer schedule rather than in response to the appearance of the richer schedule (Williams, 1981; see also Williams & Wixted, 1986).

It is generally accepted that the higher rate of responding to the stimulus associated with the richer schedule of reinforcement occurs because, in the context of the poorer schedule, that stimulus is relatively better at predicting reinforcement (Keller, 1974). Or in more cognitive terms, the richer schedule seems even better in the context of a poorer schedule.

There is evidence, however, that it is not that the richer schedule appears better, but that the richer schedule will soon get worse. In support of this distinction, although pigeons peck at a higher rate at stimuli that predict a worsening in the probability of reinforcement, it has been found that when given a choice, pigeons prefer stimuli that they respond to less but that predict no worsening in the probability of reinforcement (Williams, 1992). Thus, curiously, under these conditions, response rate has been found to be negatively correlated with choice.

The implication of this finding is that the increased responding associated with the richer schedule does not reflect its greater value to the pigeon, but rather its function as a signal that conditions will soon get worse because the opportunity to obtain reinforcement will soon diminish. This analysis suggests that the mechanism responsible for anticipatory contrast (Flaherty, 1982) and, in the case of behavioral contrast, responding in anticipation of a worsening schedule (Williams, 1981), is likely to be a compensatory or learned response. In this sense, these two forms of contrast are probably quite different from the perceptual-like detection process involved in incentive contrast.

The Present Within-Trial Contrast Effect

What all contrast effects have in common is the presence, at other times, of a second condition that is either better or worse than the target condition. The effect of the second condition often is to exaggerate the difference between the two conditions. Although there have been attempts to account for these various contrast effects, Mackintosh (1974) concluded that no single principle will suffice (see also Flaherty, 1996). Thus, even before the contrast effect reported by Clement et al. (2000) and presented here was added to the list, contrast effects resisted a comprehensive explanation.

Procedurally, the positive contrast effect reported by Clement et al. (2000) appears to be most similar to that involved in anticipatory contrast (Flaherty, 1982) because in each case there is a series of paired events, the second of which is better than the first. High effort is followed by discriminative stimuli in the case of the Clement et al. procedure, and a low concentration of saccharin is followed by a higher concentration of sucrose in the case of anticipatory contrast. However, the effect reported by Clement et al. is seen in a choice response made in the presence of the second event (i.e., preference for one conditioned reinforcer over the other) rather than the first (i.e., differential consumption of the saccharin solution).

Alternatively, although successive incentive contrast and the contrast effect reported by Clement et al. (2000) both involve a change in behavior during the second component of the task, the mechanisms responsible for these effects must be quite different. In the case of the Clement et al. procedure, the pigeons experienced the two-event sequences many hundreds of times prior to test and thus, they could certainly learn to anticipate the appearance of the discriminative stimuli and the reinforcers that followed, whereas in the case of successive incentive contrast, the second component of the task could not be anticipated.

The temporal relations involved in the within-trial contrast effect reported by Clement et al. (2000) would seem more closely related to those that have been referred to as local contrast (Terrace, 1966). As already noted, local contrast refers to the temporary change in response rate that occurs following a stimulus change that signals a change in schedule. But local contrast effects tend to occur early in training and they generally disappear with extended training. Furthermore, if local contrast was responsible for the contrast effect reported by Clement et al., they should have found a higher response rate to the positive stimulus that followed the higher effort response than to the positive stimulus that followed the lower effort response. But differences in response rate have not been found, only differences in choice. Thus, the form of contrast characteristic of the research described in this review appears to be different from the various contrast effects described in the literature. First, the present contrast effect is a within-subject effect that is measured by preference score. Second, in a conceptual sense, it is the reverse of what one might expect based on more-typical contrast effects. Typically, a relatively-aversive event (e.g., delay to reinforcement) is judged to be more aversive (as measured by increased latency of response or decreased choice) when it occurs in the context of a less-aversive event that occurs on alternative trials (i.e., it is a between-trials effect). The contrast effect described here is assumed to occur within trials and the effect is to make the events that follow the relatively aversive event more preferred than similar events that follow less-aversive events. Thus, referring to this effect as a contrast effect is descriptive but it is really quite different from the other contrast effects described by Flaherty (1996). For all of the above reasons we consider the contrast effect presented here to be different from other contrast effects that have been studied in the literature and we propose to refer to it as within-trial contrast.

Possibly Related Psychological Phenomena

The within-trial contrast effect described here may be related to other psychological phenomena that have been described in the literature.

Contrafreeloading.

A form of contrast similar to that found in the present experiment may be operating in the case of the classic contrafreeloading effect (e.g., Carder & Berkowitz, 1970; Jensen, 1963; Neuringer, 1969). For example, pigeons trained to peck a lit response key for food will often obtain food by pecking the key even when they are presented with a dish of free food. Although it is possible that other factors contribute to the contrafreeloading effect (e.g., reduced familiarity with the free food in the context of the operant chamber, Taylor, 1975, or perhaps preference for small portions of food spaced over time), it is also possible that the pigeons value the food obtained following the effort of key pecking more than the free food, and if the effort required is relatively small, the added value of food for which they have to work may at times actually be greater than the cost of the effort required to obtain it.

Justification of Effort.

As mentioned earlier, justification of effort in humans has been attributed to the discrepancy between one’s beliefs and one’s behavior (Aronson & Mills, 1959). The present research suggests that contrast may be a more parsimonious interpretation of this effect not only in pigeons but also in humans. In fact, the present results may have implications for a number of related phenomena that have been studied in humans.

The term work ethic has often been used in the human literature to describe a value or a trait that varies among members of a population as an individual difference (e.g., Greenberg, 1977). But it also can be thought of as a typically human characteristic that appears to be in conflict with traditional learning theory (Hull, 1943). Work (effort) is generally viewed as at least somewhat aversive and as behavior to be reduced, especially if less-effortful alternatives are provided to obtain a reward. Other things being equal, less work should be preferred over more work (and in general it is). Yet, it is also the case that work, per se, is often valued in our culture. Students are often praised for their effort independent of their success. Furthermore, the judged value of a reward may depend on the effort that preceded it. For example, students generally value a high grade that they have received in a course not only for its absolute value, but also in proportion to the effort required to obtain it. Consider the greater pride that a student might feel about an A grade in a difficult course (say, Organic Chemistry) as compared to a similar A grade in an easier course (say, Introduction to Golf).

Although, in the case of such human examples, cultural factors, including social rewards, may be implicated, a more fundamental, nonsocial mechanism may also be present. In the absence of social factors, it may generally be the case (as in the present experiments) that the contrast between the value of the task prior to reward and at the time of reward may be greater following greater effort than following less effort.

Cognitive Dissonance.

As described earlier, when humans experience a tedious task, their evaluation of the aversiveness of the task is sometimes negatively correlated with the size of the reward provided for agreeing to describe the task to others as pleasurable, a cognitive dissonance effect (Festinger & Carlsmith, 1959). The explanation that has been given for the cognitive dissonance effect is that the conflict between attitude (the task was tedious) and behavior (participants had agreed to describe the task to another person as enjoyable) was more easily resolved when a large reward was given (“I did it for the large reward”) and thus, a more honest evaluation of the task could be provided. However, one could also view the contrast between effort and reward to be greater in the large reward condition than in the small reward condition. Thus, in the context of the large reward, looking back at the subjective aversiveness of the prior task, it might be judged as greater than in the context of small reward.

Intrinsic versus Extrinsic Reinforcement.

Contrast effects of the kind reported here may also be responsible for the classic finding that extrinsic reinforcement may reduce intrinsic motivation (Deci, 1975; but see also Eisenberger & Cameron, 1996). If rewards are given for activities that may be intrinsically rewarding (e.g., puzzle solving), providing extrinsic rewards for such an activity may lead to a subsequent reduction in that behavior when extrinsic rewards are no longer provided. This effect has been interpreted as a shift in self-determination or locus of control (Deci & Ryan, 1985; Lepper, 1981). But such effects can also be viewed as examples of contrast. In this case, it may be the contrast between extrinsic reinforcement and its sudden removal that is at least partly responsible for the decline in performance (Flora, 1990). Such contrast effects are likely to be quite different from those responsible for the results of the present experiment, however, because the removal of extrinsic reinforcement results in a change in actual reward value, relative to the reward value expected (i.e., the shift from a combination of both extrinsic and intrinsic reward to intrinsic reward alone). Thus, the effect of extrinsic reinforcement on intrinsic motivation is probably more similar to traditional reward shift effects of the kind reported by Crespi (1942, i.e., rats run slower after they have been shifted from a large to a small magnitude of reward than rats that have always experienced the small magnitude of reward).

Learned Industriousness.

Finally, contrast effects may also be involved in a somewhat different phenomenon that Eisenberger (1992) has called learned industriousness. Eisenberger has found that if one is rewarded for putting a large amount of effort into a task (compared to a small amount of effort into a task), it may increase one's general readiness to expend effort in other goal-directed tasks. Eisenberger has attributed this effect to the conditioned reward value of effort, a reasonable explanation for the phenomenon, but contrast may also be involved.

Depending on the relative effort required in the first and second tasks, two kinds of relative contrast are possible. First, if the target (second) task is relatively effortful, negative contrast between the previous low-effort task and the target task may make persistence on the second task more aversive for the low-effort group (and the absence of negative contrast less aversive for the high-effort group). Second, for the high-effort group, if the target task requires relatively little effort, positive contrast between the previous high-effort task and the target task may make persistence less aversive. In either case, contrast provides a reasonable alternative account of these data.

Conclusions

From the previous discussion it should be clear that positive contrast effects of the kind reported in the present research may contribute to a number of experimental findings that have been reported using humans (and sometimes animals) but that traditionally have been explained using more complex cognitive and social accounts. Further examination of these phenomena from the perspective of simpler contrast effects may lead to more parsimonious explanations of what have previously been interpreted to be uniquely human phenomena.

But even if contrast is involved in these more complex phenomena, more cognitive factors of the type originally proposed may also play a role in these more complex social contexts. It would be informative, however, to determine the extent to which contrast effects contribute to these phenomena.

Finally, the description of the various effects as examples of contrast may give the mistaken impression that such effects are simple and are well understood. As prevalent as contrast effects appear to be, the mechanisms that account for them remain quite speculative. Consider the prevalence of the opposite effect, generalization, in which experience with one value on a continuum spreads to other values in direct proportion to their similarity to the experienced value (Hull, 1943). According to a generalization account, generalization between two values of reinforcement should tend to make the two values more similar to each other, rather than more different. An important goal of future research should be to identify the conditions that produce contrast and those that produce generalization.

At the very least, the presence of contrast implies some form of relational learning that cannot be accounted for by means of traditional behavioral theories. Thus, although contrast may provide an alternative, more parsimonious account of several complex social psychological phenomena, contrast should not be considered a simple mechanism. Instead it can be viewed as a set of relational phenomena that must be explained in their own right.


References

Alessandri, J., Darcheville, J.-C., & Zentall, T. R. (2008). Cognitive dissonance in children: Justification or contrast? Psychonomic Bulletin & Review, 15, 673-677. doi.org/10.3758/PBR.15.3.673 PMid:18567273

Alessandri, J., Darcheville, J.-C., Delevoye-Turrell, & Zentall, T. R. (2008). Preference for rewards that follow greater effort and greater delay. Learning & Behavior, 36, 352-358. doi.org/10.3758/LB.36.4.352 PMid:18927058

Amsel, A. (1958). The role of frustrative nonreward in noncontinuous reward situations. Psychological Bulletin, 55, 102–119. doi.org/10.1037/h0043125 PMid:13527595

Arantes, J., & Grace, R. C. (2007). Failure to obtain value enhancement by within-trial contrast in simultaneous and successive discriminations. Learning & Behavior, 36, 1-11. doi.org/10.3758/LB.36.1.1

Aronson, E., & Mills, J. (1959). The effect of severity of initiation on liking for a group. Journal of Abnormal and Social Psychology, 59, 177–181. doi.org/10.1037/h0047195

Aw, J., Vasconcelos, M., & Kacelnik, A. (2011). How costs affect preferences: Experiments on state-dependence, hedonic state and within-trial contrast in starlings. Animal Behaviour, 81, 1117-1128. doi.org/10.1016/j.anbehav.2011.02.015

Bower, G. H. (1961). A contrast effect in differential conditioning. Journal of Experimental Psychology, 62, 196–199. doi.org/10.1037/h0048109

Capaldi, E. J. (1967). A sequential hypothesis of instrumental learning. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation (Vol. 1, pp. 67–156). New York: Academic Press.

Capaldi, E. J. (1972). Successive negative contrast effect: Intertrial interval, type of shift, and four sources of generalization decrement. Journal of Experimental Psychology, 96, 433-438. doi.org/10.1037/h0033695

Carder, B., & Berkowitz, K. (1970). Rats' preference for earned in comparison with free food. Science, 167, 1273–1274. doi.org/10.1126/science.167.3922.1273 PMid:5411917

Cleary, T. L. (1992). The relationship of local to overall behavioral contrast. Bulletin of the Psychonomic Society, 30, 58–60.

Clement, T. S., Feltus, J., Kaiser, D. H., & Zentall, T. R. (2000). “Work ethic” in pigeons: Reward value is directly related to the effort or time required to obtain the reward. Psychonomic Bulletin & Review, 7, 100–106. doi.org/10.3758/BF03210727 PMid:10780022

Clement, T. S., & Zentall, T. R. (2002). Second-order contrast based on the expectation of effort and reinforcement. Journal of Experimental Psychology: Animal Behavior Processes, 28, 64–74. doi.org/10.1037/0097-7403.28.1.64

Crespi, L. P. (1942). Quantitative variation in incentive and performance in the white rat. American Journal of Psychology, 55, 467–517. doi.org/10.2307/1417120

Deci, E. (1975). Intrinsic motivation. New York: Plenum. doi.org/10.1007/978-1-4613-4446-9

Deci, E., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior. New York: Plenum Press. PMid:3841237

DiGian, K. A., Friedrich, A. M., & Zentall, T. R. (2004). Reinforcers that follow a delay have added value for pigeons. Psychonomic Bulletin & Review, 11, 889–895. doi.org/10.3758/BF03196717

Egan, L. C., Santos, L. R., & Bloom, P. (2007). The origins of cognitive dissonance: Evidence from children and monkeys. Psychological Science, 18, 978–983. doi.org/10.1111/j.1467-9280.2007.02012.x PMid:17958712

Eisenberger, R. (1992). Learned industriousness. Psychological Review, 99, 248–267. doi.org/10.1037/0033-295X.99.2.248 PMid:1594725

Eisenberger, R., & Cameron, J. (1996). Detrimental effects of reward. American Psychologist, 51, 1153–1166. doi.org/10.1037/0003-066X.51.11.1153 PMid:8937264

Fantino, E., & Abarca, N. (1985). Choice, optimal foraging, and the delay-reduction hypothesis. Behavioral and Brain Sciences, 8, 315–330. doi.org/10.1017/S0140525X00020847

Festinger, L. (1957). A theory of cognitive dissonance. Stanford, CA: Stanford University Press.

Festinger, L., & Carlsmith, J. M. (1959). Cognitive consequences of forced compliance. Journal of Abnormal and Social Psychology, 58, 203–210. doi.org/10.1037/h0041593

Flaherty, C. F. (1982). Incentive contrast. A review of behavioral changes following shifts in reward. Animal Learning & Behavior, 10, 409–440. doi.org/10.3758/BF03212282

Flaherty, C. F. (1996). Incentive relativity. New York: Cambridge University Press. PMCid:163278

Flora, S. R. (1990). Undermining intrinsic interest from the standpoint of a behaviorist. Psychological Record, 40, 323–346.

Friedrich, A. M., & Zentall, T. R. (2004). Pigeons shift their preference toward locations of food that take more effort to obtain. Behavioural Processes, 67, 405–415. doi.org/10.1016/j.beproc.2004.07.001 PMid:15518990

Friedrich, A. M., Clement, T. S., & Zentall, T. R. (2005). Discriminative stimuli that follow the absence of reinforcement are preferred by pigeons over those that follow reinforcement. Learning & Behavior, 33, 337-342. doi.org/10.3758/BF03192862

Greenberg, J. (1977). The Protestant work ethic and reactions to negative performance evaluations on a laboratory task. Journal of Applied Psychology, 62, 682–690. doi.org/10.1037/0021-9010.62.6.682

Halliday, M. S., & Boakes, R. A. (1971). Behavioral contrast and response independent reinforcement. Journal of the Experimental Analysis of Behavior, 16, 429–434. http://dx.doi.org/10.1901/jeab.1971.16-429 PMid:16811560 PMCid:1333947

Hull, C. L. (1943). Principles of behavior. New York: Appleton-Century-Crofts. PMid:16578092 PMCid:1078614

Jensen, G. D. (1963). Preference of bar pressing over “freeloading” as a function of number of rewarded presses. Journal of Experimental Psychology, 65, 451–454. doi.org/10.1037/h0049174 PMid:13957621

Johnson, A. W., & Gallagher, M. (2011). Greater effort boosts the affective taste properties of food. Proceedings of the Royal Society B: Biological Sciences, 278, 1450-1456. doi.org/10.1098/rspb.2010.1581 PMid:21047860 PMCid:3081738

Kacelnik, A., & Marsh, B. (2002). Cost can increase preference in starlings. Animal Behaviour, 63, 245-250. doi.org/10.1006/anbe.2001.1900

Keller, K. (1974). The role of elicited responding in behavioral contrast. Journal of the Experimental Analysis of Behavior, 21, 249–257. doi.org/10.1901/jeab.1974.21-249 PMid:16811742 PMCid:1333192

Klein, E. D., Bhatt, R. S., & Zentall, T. R. (2005). Contrast and the justification of effort. Psychonomic Bulletin & Review, 12, 335-339. doi.org/10.3758/BF03196381

Lawrence, D. H., & Festinger, L. (1962). Deterrents and reinforcement: The psychology of insufficient reward. Stanford, CA: Stanford University Press.

Lieberman, M. D., Ochsner, K. N., Gilbert, D. T., & Schacter, D. L. (2001). Do amnesics exhibit cognitive dissonance reduction? The role of explicit memory and attention in attitude change. Psychological Science, 12, 135-140. doi.org/10.1111/1467-9280.00323 PMid:11340922

Lepper, M. R. (1981). Intrinsic and extrinsic motivation in children: Detrimental effects of superfluous social controls. In W. A. Collins (Ed.), Aspects of the development of competence: The Minnesota Symposium on Child Psychology (Vol. 14, pp. 155–214). Hillsdale, NJ: Lawrence Erlbaum.

Mackintosh, N. J. (1974). The psychology of animal learning. London: Academic Press.

Mackintosh, N. J., Little, L., & Lord, J. (1972). Some determinants of behavioral contrast in pigeons and rats. Learning and Motivation, 3, 148–161. doi.org/10.1016/0023-9690(72)90035-5

Marsh, B., Schuck-Palm, C., & Kacelnik, A. (2004). State-dependent learning affects foraging choices in starlings. Behavioral Ecology, 15, 396-399. doi.org/10.1093/beheco/arh034

Mellgren, R. L. (1972). Positive and negative contrast effects using delayed reinforcement. Learning and Motivation, 3, 185–193. doi.org/10.1016/0023-9690(72)90038-0

Neuringer, A. J. (1969). Animals respond for food in the presence of free food. Science, 166, 399–401. doi.org/10.1126/science.166.3903.399 PMid:5812041

Pompilio, L., & Kacelnik, A. (2005). State-dependent learning and suboptimal choice: When starlings prefer long over short delays to food. Animal Behaviour, 70, 571-578. doi.org/10.1016/j.anbehav.2004.12.009

Pompilio, L., Kacelnik, A., & Behmer, S. (2006). State dependent learned valuation drives choice in an invertebrate. Science, 311, 1613-1615. doi.org/10.1126/science.1123924 PMid:16543461

Reynolds, G. S. (1961). Behavioral contrast. Journal of the Experimental Analysis of Behavior, 4, 57–71. doi.org/10.1901/jeab.1961.4-57 PMid:13741096 PMCid:1403981

Singer, R. A., Berry, L. M., & Zentall, T. R. (2007). Preference for a stimulus that follows an aversive event: Contrast or delay reduction? Journal of the Experimental Analysis of Behavior, 87, 275-285. doi.org/10.1901/jeab.2007.39-06 PMid:17465316 PMCid:1832171

Singer, R. A., & Zentall, T. R. (2011). Preference for the outcome that follows a relative aversive event: Contrast or delay reduction? Learning and Motivation, 42, 255-271. doi.org/10.1016/j.lmot.2011.06.001 PMid:22993453 PMCid:3444245

Taylor, G. T. (1975). Discriminability and the contrafreeloading phenomenon. Journal of Comparative and Physiological Psychology, 88, 104–109. doi.org/10.1037/h0076222 PMid:1120788

Terrace, H. S. (1966). Stimulus control. In W. K. Honig (Ed.), Operant behavior: Areas of research and application. New York: Appleton-Century-Crofts.

Thorndike, E. L. (1932). The fundamentals of learning. New York: Teachers College. doi.org/10.1037/10976-000

Tinklepaugh, O. L. (1928). An experimental study of representative factors in monkeys. Journal of Comparative Psychology, 8, 197–236. doi.org/10.1037/h0075798

Vasconcelos, M., Urcuioli, P. J., & Lionello-DeNolf, K. M. (2007). Failure to replicate the ‘‘work ethic’’ effect in pigeons. Journal of the Experimental Analysis of Behavior, 87, 383–399. doi.org/10.1901/jeab.2007.68-06 PMid:17575903 PMCid:1868581

Vasconcelos, M., & Urcuioli, P. J. (2008a). Certainties and mysteries in the within-trial contrast literature: A reply to Zentall (2008). Learning & Behavior, 36, 23-25. doi.org/10.3758/LB.36.1.23

Vasconcelos, M., & Urcuioli, P. J. (2008b). Deprivation level and choice in pigeons: A test of within-trial contrast. Learning & Behavior, 36, 12-18. doi.org/10.3758/LB.36.1.12 PMid:18318422

Williams, B. A. (1981). The following schedule of reinforcement as a fundamental determinant of steady state contrast in multiple schedules. Journal of the Experimental Analysis of Behavior, 35, 293–310. doi.org/10.1901/jeab.1981.35-293 PMid:16812218 PMCid:1333085

Williams, B. A. (1983). Another look at contrast in multiple schedules. Journal of the Experimental Analysis of Behavior, 39, 345–384. doi.org/10.1901/jeab.1983.39-345 PMid:16812325 PMCid:1347926

Williams, B. A. (1992). Inverse relations between preference and contrast. Journal of the Experimental Analysis of Behavior, 58, 303–312. doi.org/10.1901/jeab.1992.58-303 PMid:16812667 PMCid:1322062

Williams, B. A., & Wixted, J. T. (1986). An equation for behavioral contrast. Journal of the Experimental Analysis of Behavior, 45, 47–62. doi.org/10.1901/jeab.1986.45-47 PMid:3950534 PMCid:1348210

Volume 8: pp. 29-59

Neurobiological Foundations of an Attribute Model of Memory

Raymond Kesner
University of Utah


Abstract

Memory is a complex phenomenon due to a large number of potential interactions that are associated with the organization of memory at the psychological and neural system level. In this review article a tripartite, multiple attribute, multiple process memory model with different forms of memory and its neurobiological underpinnings is represented in terms of the nature, structure or content of information representation as a set of different attributes including language, time, place, response, reward value (affect) and visual object as an example of sensory-perception. For each attribute, information is processed in the event-based, knowledge-based, and rule-based memory systems through multiple operations that involve multiple neural underpinnings. Of the many processes associated with the event-based memory system, the emphasis will be placed on short-term or working memory and pattern separation. Of the many processes associated with the knowledge-based memory system, the emphasis will be placed on perceptual processes. Of the many processes associated with the rule-based memory system, the emphasis will be on short-term or working memory and paired associate learning. For all three systems data will be presented to demonstrate differential neuroanatomical mediation and where available parallel results will be presented in rodents, monkeys and humans.

Keywords

event-based memory, knowledge-based memory, rule-based memory, attribute memory model, hippocampus, amygdala, caudate nucleus, perirhinal cortex, prefrontal cortex, parietal cortex and TE2 cortex


Introduction

The structure and utilization of memory is central to one’s knowledge of the past, interpretation of the present, and prediction of the future. Therefore, the understanding of the structural and process components of memory systems at the psychological and neurobiological level is of paramount importance. There have been a number of attempts to divide learning and memory into multiple memory systems. Schacter & Tulving (1994) have suggested that one needs to define memory systems in terms of the kind of information to be represented, the processes associated with the operation of each system, and the neurobiological substrates including neural structures and mechanisms that subserve each system. Furthermore, it is likely that within each system, there are multiple forms or subsystems associated with each memory system and there are likely to be multiple processes that define the operation of each system. Finally, there are probably multiple neural structures that form the overall substrate of a memory system.

Currently, the most established models of memory can be characterized as dual memory system models with an emphasis on the hippocampus or medial temporal lobe for one component of the model and a composite of other brain structures as the other component. For example, Squire (1994) and Squire et al. (2004) have proposed that memory can be divided into a medial temporal-lobe-dependent declarative memory which provides for conscious recollection of facts and events and a non-hippocampal dependent nondeclarative memory which provides for memory without conscious access for skills, habits, priming, simple classical conditioning and non-associative learning. Others have used different terms to reflect the same type of distinction, including a hippocampal dependent explicit memory versus a non-hippocampal dependent implicit memory (Schacter, 1987), and a hippocampal dependent declarative memory based on the representation of relationships among stimuli versus a non-hippocampal dependent procedural memory based on the representation of a single stimulus or configuration of stimuli (Cohen & Eichenbaum, 1993; Eichenbaum, 2004). Olton (1983) has suggested a different dual memory system in which memory can be divided into a hippocampal dependent working memory defined as memory for the specific, personal, and temporal context of a situation and a non-hippocampal dependent reference memory, defined as memory for rules and procedures (general knowledge) of specific situations. Different terms have been used to reflect the same distinction including episodic versus semantic memory (Tulving, 1983).

However, memory is more complex and involves many neural systems in addition to the hippocampus. To remedy this situation, Kesner (2002) has proposed a tripartite attribute based theoretical model of memory which is organized into event-based, knowledge-based, and rule-based memory systems. Each system is composed of the same set of multiple attributes or forms of memory, characterized by a set of process oriented operating characteristics and mapped onto multiple neural regions and interconnected neural circuits (for more detail see Kesner 1998, 2002).

On a psychological level (see Tables 1, 2, 3), the event-based memory system provides for temporary representations of incoming data concerning the present, with an emphasis upon data and events that are usually personal or egocentric and that occur within specific external and internal contexts. The emphasis is upon the processing of new and current information. During initial learning, great emphasis is placed on the event-based memory system, which will continue to be of importance even after initial learning in situations where unique or novel trial information needs to be remembered. This system is akin to episodic memory (Tulving, 1983) and some aspects of declarative memory (Squire, 1994).

Table 1. Event-Based Memory

The knowledge-based memory system provides for more permanent representations of previously stored information in long-term memory and can be thought of as one’s general knowledge of the world. The knowledge-based memory system would tend to be of greater importance after a task has been learned, given that the situation is invariant and familiar. The organization of these attributes within the knowledge-based memory system can take many forms; the attributes are organized as a set of attribute-dependent cognitive maps and their interactions, which are unique for each memory. This system is akin to semantic memory (Tulving, 1983).

Table 2. Knowledge-Based Memory

The rule-based memory system receives information from the event-based and knowledge-based systems and integrates the information by applying rules and strategies for subsequent action. In most situations, however, one would expect a contribution of all three systems with a varying proportion of involvement of one relative to the other.

The three memory systems are composed of the same forms, domains, or attributes of memory. Even though there could be many attributes, the most important attributes include space, time, response, sensory-perception, and reward value (affect). In humans a language attribute is also added. A spatial (space) attribute within this framework involves memory representations of places or relationships between places. It is exemplified by the ability to encode and remember spatial maps and to localize stimuli in external space. Memory representations of the spatial attribute can be further subdivided into specific spatial features including allocentric spatial distance, egocentric spatial distance, allocentric direction, egocentric direction, and spatial location. A temporal (time) attribute within this framework involves memory representations of the duration of a stimulus and the succession or temporal order of temporally separated events or stimuli, and from a time perspective, the memory representation of the past. A response attribute within this framework involves memory representations based on feedback from motor responses (often based on proprioceptive and vestibular cues) that occur in specific situations as well as memory representations of stimulus-response associations. A reward value (affect) attribute within this framework involves memory representations of reward value, positive or negative emotional experiences, and the associations between stimuli and rewards. A sensory-perceptual attribute within this framework involves memory representations of a set of sensory stimuli that are organized in the form of cues as part of a specific experience. Each sensory modality (olfaction, auditory, vision, somatosensory, and taste) can be considered part of the sensory-perceptual attribute component of memory. A language attribute within this framework involves memory representations of phonological, lexical, morphological, syntactical, and semantic information.

The attributes within each memory system can be organized in many different ways and are likely to interact extensively with each other, even though it can be demonstrated that these attributes do in many cases operate independent of each other. The organization of these attributes within the event-based memory system can take many forms and are probably organized hierarchically and in parallel. The organization of these attributes within the knowledge-based memory system can take many forms, are assumed to be organized as a set of cognitive maps or neural nets, and their interactions are unique for each memory. It is assumed that long-term representations within cognitive maps are more abstract and less dependent upon specific features. The organization of these attributes within the rule-based memory system can also take many forms; these are assumed to be organized to provide flexibility in executive function in developing rules, development of goals, and affecting decision processes.

Table 3. Rule-Based Memory

Within each system attribute, information is processed in different ways based on different operational characteristics. For the event-based memory system, specific processes involve (a) selective filtering or attenuation of interference of temporary memory representations of new information, which is labeled pattern separation, (b) encoding of new information, (c) short-term and intermediate-term memory for new information, (d) the establishment of arbitrary associations, (e) consolidation or elaborative rehearsal of new information, and (f) retrieval of new information based on flexibility, action, and pattern completion.

For the knowledge-based memory system, specific processes include (a) encoding of new information, (b) selective attention and selective filtering associated with permanent memory representations of familiar information, (c) perceptual memory, (d) consolidation and long-term memory storage partly based on arbitrary and/or pattern associations, and (e) retrieval of familiar information based on flexibility and action.

For the rule-based memory system, it is assumed that information is processed through the integration of information from the event-based and knowledge-based memory systems for the use of major processes that include the selection of strategies and rules for maintaining or manipulating information for subsequent decision making and action, as well as short-term or working memory for new and familiar information, development of goals, and affecting decision processes.

On a neurobiological level (see Tables 1, 2, 3) each attribute maps onto a set of neural regions and their interconnected neural circuits. For example, within the event-based memory system, it has been demonstrated that in animals and humans (a) the hippocampus supports memory for spatial, temporal and language attribute information, (b) the caudate mediates memory for response attribute information, (c) the amygdala subserves memory for reward value (affect) attribute information, and (d) the perirhinal and extrastriate visual cortex support memory for visual object attribute information as an example of a sensory-perceptual attribute and the ventral hippocampus supports memory for odor information as another example of a sensory–perceptual attribute (for more detail see Kesner, 1998, 2002).

Within the knowledge-based memory system, it has been demonstrated that in animals and humans (a) the posterior parietal cortex supports memory for spatial attributes, (b) the dorsal and dorsolateral prefrontal cortex and/or anterior cingulate support memory for temporal attributes, (c) the premotor cortex, supplementary motor cortex, and cerebellum in monkeys and humans and precentral cortex and cerebellum in rats support memory for response attributes, (d) the orbital prefrontal cortex supports memory for reward value (affect) attributes, (e) the inferotemporal cortex in monkeys and humans and TE2 cortex in rats subserve memory for sensory-perceptual attributes (e.g., visual objects), and (f) the parietal cortex and Broca’s and Wernicke’s areas subserve memory for the language attribute (for more detail see Kesner, 1998, 2002).

Within the rule-based memory system it can be shown that different subdivisions of the prefrontal cortex support different attributes. For example, (a) the dorso-lateral and ventrolateral prefrontal cortex in humans support spatial, object, and language attributes and the infralimbic and prelimbic cortex in rats support spatial and visual object attributes, (b) the pre-motor and supplementary motor cortex in monkeys and humans and precentral cortex in rats support response attributes, (c) the dorsal, dorso-lateral, and mid-dorsolateral prefrontal cortex in monkeys and humans and anterior cingulate in rats mediate primarily temporal attributes, and (d) the orbital prefrontal cortex in monkeys and humans and agranular insular cortex in rats support affect attributes (for more detail see Kesner, 2000, 2002).

Given the complexity of the nature of memory representations and the multitude of processes associated with learning and memory associated with any specific task, it is clear that prior to analyzing the neural circuits that support mnemonic processing, one must determine which attributes and which systems and associated underlying processes are essential for memory analysis of the proposed task. One example will suffice. If one assumes that the hippocampus supports the processing of the spatial attribute within the event-based memory system, then any task that minimizes the importance of the spatial attribute and emphasizes the importance of reward value, response, and sensory-perceptual attributes is not likely to involve the hippocampus. I will concentrate in this article primarily on specific processes for which there are sufficient data to determine the role of the neurobiological attribute-based model of memory for the event-based, knowledge-based, and rule-based components of the attribute model.

Event-Based Memory

For the event-based memory system I will concentrate on specific processes that mediate short-term memory for new information and a selective filtering or attenuation of interference of temporary memory representations of new information which is labeled pattern separation. For the other processes including the establishment of arbitrary associations, consolidation or elaborative rehearsal of new information, and retrieval of new information based on flexibility, action, and pattern completion there is not a sufficient data set to differentiate the contribution of the different attributes associated with mnemonic processing of information.

Short-term or Working Memory — Spatial Attribute

The most extensive data set is based on the use of paradigms that measure the short-term or working memory process such as matching or non-matching-to-sample, delayed conditional discrimination, continuous recognition memory of single or lists of items, and recognition memory based on exploratory information and detection of novelty. Figure 1 depicts the location of the hippocampus in the rat. Figure 2 depicts the location of the different subregions of the hippocampus [dentate gyrus (DG), CA3, and CA1] as well as the medial and lateral perforant path inputs from the entorhinal cortex into the different subregions of the hippocampus in the rat.

Figure 1. Pictorial representation of the hippocampus (HF), entorhinal cortex (EC), perirhinal cortex (PER), postrhinal cortex (POR), and amygdala (AMY) in the rat.

With respect to spatial attribute information, extensive data obtained with the above-mentioned paradigms for measuring short-term or working memory for spatial information show severe impairments in rats, monkeys, and humans with right hippocampal damage or bilateral hippocampal damage (Chiba, Kesner, Matsuo, & Heilbrun, 1990; Hopkins, Kesner, & Goldstein, 1995a; Kesner, 1990; Olton, 1983, 1986; Parkinson, Murray, & Mishkin, 1988; Pigott & Milner, 1993; Smith & Milner, 1981). To examine the temporal dynamic of hippocampal involvement in short-term and intermediate-term memory in the context of processing spatial information in humans, Holdstock, Shaw, and Aggleton (1995) tested patients with hippocampal damage with a delayed matching-to-sample paradigm analogous to tasks used for rats. In this task a single stimulus was presented in a specific location and following delays of 3-40 s, the patients had to remember that location compared with a location not previously seen. The results indicated that there were no memory deficits for delays up to 20 s followed by a deficit at the 40 s delay. However, Cave and Squire (1992) found no deficits for short-term memory for a dot on a line or memory for an angle. In a different experiment, hypoxic subjects with bilateral hippocampal damage were tested on a short-term memory test to determine whether the hippocampus supports short-term or intermediate-term memory for a spatial relationship based on distance information. Control subjects and hypoxic subjects with bilateral hippocampal damage were tested for memory for spatial distance information for delays of 1, 4, 8, 12, or 16 s. The hypoxic subjects had impaired memory for distance information at the long, but not short, delays compared to normal controls (Kesner & Hopkins, 2001).

Figure 2. The Hippocampal Network: The hippocampus forms a principally uni-directional network, with input from the Entorhinal Cortex (EC) that forms connections with the Dentate Gyrus (DG) and CA3 pyramidal neurons via the Perforant Path (PP – split into lateral and medial). CA3 neurons also receive input from the DG via the mossy fibres (MF). They send axons to CA1 pyramidal cells via the Schaffer Collateral Pathway (SC), as well as to CA1 cells in the contralateral hippocampus via the Associational Commissural pathway (AC). CA1 neurons also receive input directly from the Perforant Path and send axons to the Subiculum (Sb). These neurons in turn send the main hippocampal output back to the EC, forming a loop.

With respect to specific spatial features, such as allocentric spatial distance, egocentric spatial distance, and spatial location, it has been shown in both rats and humans with bilateral hippocampal damage that there are severe deficits in short-term memory for these spatial features (Long & Kesner, 1996). These data are consistent with the recording of place cells (cells that increase their firing rate when an animal is located in a specific place) within the hippocampus of rats (Kubie & Ranck, 1983; McNaughton, Barnes, & O’Keefe, 1983; O’Keefe, 1983; O’Keefe & Speakman, 1987). One specific example is provided by the use of a continuous spatial recognition memory task where it has been shown that hippocampal lesions produced a profound deficit (Jackson-Smith, Kesner, & Chiba, 1993). However, it should be noted that lesions of the dorsal lateral thalamus, pre- and para-subiculum, medial entorhinal cortex, and pre- and infra-limbic cortex produce profound deficits similar to what has been described for hippocampal lesions, suggesting that other neural regions contribute to the spatial attribute within the event-based memory system (Kesner et al., 2001). The exact contribution of each of these areas needs to be investigated, especially because grid cells have been recorded from medial entorhinal cortex (Moser et al., 2008) and place cells have been recorded in the parasubiculum (Muller et al., 1996).

Short-term memory for the spatial direction feature has also been investigated. Based on a delayed matching-to-sample task for assessing memory for direction in rats, it was shown that hippocampal lesions disrupt memory for direction (DeCoteau, Hoang, Huff, Stone, & Kesner, 2004). It should be noted that medial caudate nucleus lesions also produced an impairment in memory for direction (DeCoteau et al., 2004). Furthermore, it is likely that spatial short-term memory representations within the hippocampus are important for amplifying a subsequent consolidation process when necessary, whereas spatial short-term memory representations within the pre- and infra-limbic prefrontal cortex are important for engaging a retrieval, action, or strategy selection process. Thus, in general, the hippocampus represents some of the spatial features associated with the spatial attribute within short-term memory.

Based on a subregional analysis of hippocampal function, it appears that different subregions subserve differential roles in spatial processing of short-term memory. For example, using a paradigm developed by Poucet (1989), rats with CA3 or CA1 lesions were tested for the detection of a novel spatial configuration of familiar objects. The results indicated that CA3, but not CA1, lesions disrupted novelty detection of a spatial location (Lee, Jerman, & Kesner, 2005b). Based on the idea that the medial perforant path input into the CA3 or CA1 mediates spatial information via activation of NMDA receptors, rats received direct infusions of AP5 into the CA3 or CA1 and were tested for the detection of a novel spatial configuration of familiar objects and the detection of a novel visual object change using the same paradigm mentioned above. The results indicated that AP5 infusions into the CA3 disrupted both novelty detection of a spatial location and a visual object, whereas AP5 infusions into the CA1 disrupted novelty detection of a spatial location, but not the detection of a novel object (Hunsaker, Mooy, Swift, & Kesner, 2007). In this case, it appears the medial perforant path and the recurrent collateral system in CA3 were either actively maintaining the spatial and non-spatial information as a single behavioral episode in the network over the 3 min intersession interval or else the rich spatial context available to the rats on the test session was sufficient to guide retrieval of the previous experience to guide test performance, reflective of event-based memory processing. CA1, in the absence of recurrent circuitry, appeared to be acting directly upon the spatially rich medial perforant path inputs to retrieve the spatial information needed to perform the test. It is of interest that CA1, as opposed to CA3, did not appear to retrieve the overall behavioral episode in this case to guide retrieval, only the spatial aspects of the experience.

In other research, Lee, Rao, and Knierim (2004) showed physiologically that plasticity mechanisms in CA3 were activated only when animals encountered novel spatial configurations of familiar cues for the first time. Specifically, rats were trained to circle clockwise on a ring track whose surface was composed of four different textural cues (local cues). The ring track was positioned in the center of a curtained area in which various visual landmarks were also available along the curtained walls. To produce a novel cue configuration in the environment, distal landmarks and local cues on the track were rotated in opposite directions (distal landmarks were rotated clockwise and local cues were rotated counterclockwise by equal amounts). It is well known that principal cells in the hippocampus fire when the animal occupies a certain location of space, known as the place field of the cell. Mehta and colleagues (Mehta, Barnes, & McNaughton, 1997; Mehta, Quirk, & Wilson, 2000) originally showed that the location of the CA1 place field (measured by the center of mass of the place field) changed over time (shifting backward opposite to the direction of rat’s motion) in a familiar environment as the animal experienced the environment repeatedly. When the rats encountered the changed cue configurations for the first time in the Lee et al. (2004) experiment, the CA3 place fields shifted their locations backwards prominently compared to the place fields in CA1. However, such a prominent shift was not observed in CA3 from Day 2 onwards (CA1 place fields started to exhibit a similar property from Day 2). 
This double dissociation in the time course of plasticity between CA1 and CA3 place fields suggests that CA3 reacts rapidly to any changed components in the environment, presumably to incorporate the novel components into an existing event-based short-term memory system or contribute to a new representation of the environment mediated by an event-based short-term memory system if changes are significant. CA1 appears to be performing a similar function, but within an intermediate-term event-based memory system as demonstrated by the different time course than CA3, suggesting that the representation of the behavioral episode in CA1 is processed on a more lengthy timescale than in CA3. These data suggest that in some cases CA3 processes information and communicates that information to CA1 via the Schaffer collateral projections. This is similar to a finding by Hampson, Heyser, and Deadwyler (1993) who recorded ensembles of CA3 and CA1 neurons during a spatial DNMS task. They found cells responsive to spatial location, nonspatial attributes of the task, as well as cells responsive to conjunctions of spatial and nonspatial information (called conjunctive cells in their report). What is of interest is that quite often they found activity in CA1 to be highly correlated to CA3 activity, but later in time, suggesting information transfer from CA3 to CA1.

Short-term or Working Memory — Temporal Attribute

Memory for duration. In this section I will concentrate on memory for duration rather than the processing of time and time perception as measured by time estimation and time-scale invariance. Previous research has indicated that fimbria-fornix lesioned rats are impaired in remembering the duration of a stimulus across a short delay interval, even though there is only a small change in estimating the passage of time (Meck, Church, & Olton, 1984; Olton, 1986; Olton, Wenk, Church, & Meck, 1988). In an attempt to replicate these results using a different paradigm, rats were trained on a short-term memory for duration task using a delayed symbolic conditional discrimination procedure (Jackson-Smith et al., 1998). It had previously been shown that rats acquire high proficiency in short-term memory for duration information (Santi & Weise, 1995). In the Jackson-Smith et al. (1998) experiment, the rats had to learn that a black rectangle stimulus that was visible for 2 s would result in a positive (go) reinforcement for one object (a ball) and no reinforcement (no go) for a different object (a bottle). However, when the black rectangle stimulus was visible for 8 s then there would be no reinforcement for the ball (no go), but a reinforcement for the bottle (go). After rats learned to respond differentially in terms of latency to approach the object, they received large (dorsal and ventral) lesions of the hippocampus, medial prefrontal cortex (anterior cingulate) lesions, or lesions of the cortex dorsal to the dorsal hippocampus. Following recovery from surgery they were retested. The results indicate that in contrast to cortical control lesions, there were major impairments following hippocampal lesions, as indicated by smaller and statistically non-significant latency differences between positive and negative trials on post-surgery tests. 
In order to ensure that the deficits observed with hippocampal lesions were not due to a discrimination problem, new rats were trained in an object (black rectangle) duration discrimination task. In this situation the rats were reinforced for either a 2 or 10 s exposure (duration) of the black rectangle. The stimulus was presented and remained visible for either 2 or 10 s, following which the door was raised and latency to move the stimulus was measured. Half of the animals in each group received a piece of Froot Loop on trials with a short stimulus duration and the other half were reinforced on those trials with a long stimulus duration. After rats learned to respond differentially in terms of latency to approach the object, they received large (dorsal and ventral) lesions of the hippocampus, as well as medial prefrontal cortex lesions for comparison purposes, or lesions of cortex dorsal to the dorsal hippocampus. Following recovery from surgery the rats were retested. The results indicate that after hippocampal lesions there was an initial deficit followed by complete recovery.

Thus, the hippocampus mediates memory for duration, but does not mediate duration discrimination. The data are consistent with previous research indicating that fimbria-fornix lesioned rats are impaired in remembering the duration of a stimulus across a short delay interval, even though there is only a small change in estimating the passage of time (Meck, Church, & Olton, 1984; Olton, 1986; Olton et al., 1988). Furthermore, it has been suggested that trace conditioning requires memory for the duration of the conditioned stimulus. Thus, it is of importance to note that rabbits with hippocampal lesions are impaired in acquisition (consolidation) of trace but not delayed eye-blink conditioning (Moyer, Deyo, & Disterhoft, 1990). Based on a subregional analysis of hippocampal function, it can be shown that the dorsal CA1 supports object-trace-odor paired associate learning (Kesner et al., 2005), that both the dorsal CA1 and CA3 support object-trace-place paired associate learning (Hunsaker et al., 2006), and that the ventral CA1 supports trace fear conditioning (Rogers et al., 2006). It should be noted that the hippocampus is not directly involved in representing memory information concerning specific objects (Bussey, Saksida, & Murray, 2002; Kesner, Bolland, & Dakis, 1993; Mumby & Pinel, 1994; Norman & Eacott, 2004).

To what extent can one generalize from hippocampal function in rats to humans with respect to memory representation of duration as one feature of temporal attribute information? To answer this question, a number of experiments were conducted using humans that were exposed to hypoxia due to a variety of causes, but primarily carbon monoxide poisoning (Hopkins & Kesner, 1994; Hopkins et al., 1995a). These subjects have anterograde amnesia and, based on MRI data, have bilateral damage to the hippocampus, but no detectable damage to the entorhinal cortex, parahippocampal gyrus, or temporal cortex. They also show no signs of prefrontal cortex dysfunction based on normal performance on tests of fluency and Wisconsin Card Sorting. The hypoxic subjects with hippocampal damage and age-matched controls were tested for short-term memory for duration of a visual object. Subjects were presented with a single object (square, circle, etc.) on a computer screen for a duration of 1 or 3 s. They were instructed to remember the duration of presentation of the object. After a delay of 1, 4, 8, 12, 16, or 20 s, the same object appeared for the same or a different duration. The subjects were asked to indicate whether the duration was the same as or different from the duration shown in the study phase. The results indicate that the hypoxic subjects were impaired relative to control subjects in short-term memory for duration for all but the shortest delay (Kesner & Hopkins, 2001). In order to determine whether the deficits may have been due to impaired memory for the objects per se, a control task was administered to the same subjects. They were presented with a single object for 1 or 3 s and were asked to remember the object. After a delay of 1, 4, 8, 12, 16, or 20 s either the identical or a different object appeared on the screen. The subjects were asked if it was the same or a different object. 
The results indicate that there were minimal differences between the hypoxic and control subjects (Kesner & Hopkins, 2001). The impairment could not be due to an inability to estimate time accurately, because in an additional experiment with objects the subjects were asked to estimate the time elapsed before each of the 1, 4, 8, 12, 16, and 20 s delay intervals. The results indicate that for hypoxic subjects, time estimates were accurate up to 8 s followed by some underestimation with longer delays, so that the impairment in short-term memory for the duration of a 1 or 3 s stimulus exposure could not be due to difficulty in estimating time (Kesner & Hopkins, 2001). The process of estimating time may not require active participation of short-term memory and may, therefore, appear to be independent of short-term memory for the duration of exposure of a stimulus.

Rat data are also consistent with previous research which indicated that humans with hypoxia resulting in bilateral hippocampal damage are impaired in acquisition (consolidation) of trace but not delayed eye-blink conditioning (Disterhoft, Carrillo, Hopkins, Gabrieli, & Kesner, 1996). Thus, the results suggest that like rodents, humans with hippocampal damage have difficulty in representing short-term memory for duration of an object, but not short-term memory for a single object.

Memory for sequential spatial information. In order to examine sequential learning of spatial information, a task was developed in which rats were required to remember multiple places. During the study phase, rats were presented with four different places within sections that were sequentially visited by opening of one door to a section at a time on a newly devised maze (i.e., Tulum maze). Each place was cued by a unique object that was specifically associated with each location within the section during the study phase. Following a 15 s delay and during the test phase, one door to one section would be opened and in the absence of the cued object in that section, rats were required to recall and revisit the place within that section of the maze that had been previously visited. Once animals were able to reliably perform this short-term episodic memory task, they received lesions to either CA3 or CA1 subregions of the hippocampus. Both CA1 and CA3 lesions disrupted accurate relocation of a previously visited place (Lee et al., 2005a).

In a different task, rats learned trial-unique sequences of spatial locations along a runway box. Each trial consisted of a study phase made up of the presentation of a linear sequence of four spatial locations marked by neutral blocks. After a 30 s interval, the animal was given the test phase. The test phase consisted of the same sequence presented during the study phase, but although one of the spatial locations was not marked by a block, it still contained a reward. The unmarked spatial location was pseudo-randomly distributed equally between the first, second, third, and fourth item in the sequence. To receive a reward, the rat had to visit the correct, unmarked spatial location. Once animals were able to reliably perform this short-term event-based memory task, they received lesions to either CA3 or CA1. Animals with lesions to either CA3 or CA1 had difficulty with short-term event-based memory processing, although CA1 lesioned animals had a much greater deficit. However, when animals were trained on a fixed version of the same task, hippocampal lesions had no effect. These results suggest that CA3 and CA1 both contribute to short-term event-based memory processing, since lesions to CA3 or CA1 result in an inability to process spatial information within the event-based memory system, whereas they have no effect on non-event-based memory information processing (Hunsaker et al., 2008).

In order to determine temporal order memory for visual objects, we used a paradigm described by Hannesson, Howland, and Phillips (2004). This paradigm involves long duration study phases (on the order of minutes) and long duration tests (also on the order of minutes) for temporal preference and thus it is likely to involve more directly the intermediate-term event-based memory processes we have proposed for CA1. In this experiment, rats with CA3 or CA1 lesions were placed inside a box to explore each of three sets of paired objects (referred to as A-A, B-B, and C-C) for 5 min, with a 3 min inter-session interval. After the third set of objects, the rats were given a 3 min time-out after which one of the two A objects and one of the two C objects were placed in opposite ends of the box. The rats were then returned to the box to measure preference for A versus C for 5 min. On a subsequent day with new objects, the same animals were tested for detection of a novel object as a control using the same procedure previously described with the exception that one of the two A objects and one new object D were placed in opposite ends of the box to measure preference for A versus D for 5 min. All rats were tested once in the A-C preference test (temporal order) and once in the A-D preference test (detection of object novelty) for a total of 2 days of testing. The results indicated that CA1 lesioned rats were impaired (they preferred C over A), but CA3 lesioned rats showed the same preference as controls (they preferred A over C). All groups preferred D in the novelty test (Hoge & Kesner, 2007; Hunsaker et al., 2008). The data indicate that controls prefer A rather than C. To explain this preference, it is assumed that the rat has had more time for consolidation within an intermediate-term event-based memory operation for object A than for object C and thus has greater memory strength for object A. 
Furthermore, CA1, but not CA3, lesioned rats prefer C suggesting an impairment for CA1, but not CA3, in temporal order memory for visual objects. A possible explanation for the observation that CA1 lesioned rats prefer C rather than A is based on the assumption that the trace of A has not been consolidated properly and thus may be difficult to retrieve, but C may still be processed by the short-term event-based memory system mediated by CA3. Thus, the rat prefers C because of a short-term recency effect. This would also explain the lack of deficit observed following a lesion of CA3.
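
As an illustration of how a preference such as A over C might be quantified, the following minimal sketch computes a normalized preference score from exploration times. The score formula, the function name, and the example times are assumptions introduced here for illustration; they are not taken from the cited studies.

```python
def preference_score(time_a: float, time_other: float) -> float:
    """Return (A - other) / (A + other).

    +1 means only object A was explored, -1 means only the other object,
    and 0 means no preference. This metric is an assumed convention.
    """
    total = time_a + time_other
    if total <= 0:
        raise ValueError("total exploration time must be positive")
    return (time_a - time_other) / total


# Hypothetical exploration times in seconds (not real data).
control_like = preference_score(time_a=30.0, time_other=10.0)  # positive: prefers A
ca1_like = preference_score(time_a=10.0, time_other=30.0)      # negative: prefers C
```

Under this assumed convention, a positive score would correspond to the control-like preference for the earlier object A, and a negative score to the recency-like preference for C described for CA1 lesioned rats.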

The same paradigm as described above, but with shorter delays, was used to examine the effects of dorsal and ventral CA1 lesions on temporal and novelty processing of visual object, odor, and spatial location information. Memory for temporal order information was impaired for visual objects following dorsal and ventral CA1 lesions, for odors following ventral, but not dorsal, CA1 lesions, and for spatial locations following dorsal, but not ventral, CA1 lesions (Hunsaker et al., 2008). Thus, CA1 appears to be involved in separating events in time for spatial and nonspatial information, so that one event can be remembered distinctly from another event, but ventral CA1 might play a more important role than dorsal CA1 for odor information. There were no disruptive effects of dorsal or ventral CA1 lesions on novelty detection for odors, spatial locations, and objects (Hunsaker et al., 2008). It has been shown, however, that lesions to CA3 eliminate any preference for one spatial location over another, suggesting CA3 is also involved in temporal ordering for spatial locations, but only insofar as the information to be temporally processed is spatial in nature.

Short-term or Working Memory — Response Attribute

With respect to response attribute information, it can be shown with the use of the above-mentioned paradigms to measure short-term memory that rats with caudate-putamen lesions and humans with caudate-putamen damage due to Huntington’s disease (HD) have profound deficits for a right or left turn response or a list of hand motor movement responses (Cook & Kesner, 1988; Davis, Filoteo, Kesner, & Roberts, 2003; Kesner et al., 1993; Figure 3 depicts the location of the caudate nucleus in the rat). For example, it has been shown that electrolytically induced caudate lesions in rats impair short-term or working memory for a specific motor response (right-left turn) without any impairments in memory for a visual object or for a spatial location (Kesner et al., 1993). Similarly, a lack of effects has been reported following medial caudate lesions in working memory performance for spatial locations on an 8 arm maze (Colombo, Davis, & Volpe, 1989; Cook & Kesner, 1988). A similar pattern of results has been reported following dysfunction of the caudate nucleus in patients with HD. For example, Davis et al. (2003) administered tests of spatial and motor working memory to a small group of HD patients. During the study phase of the spatial memory task, subjects were shown a subset of six stimulus locations (X’s) randomly selected from a set of 16 and presented in a sequential manner. Immediately following the study phase, the test phase was presented. During the test phase, two stimulus locations (X’s) were presented simultaneously. The subject was asked to indicate which one they had seen during the study phase. During the study phase of the hand position memory task, subjects were shown sequential presentations of six hand positions randomly selected from a set of 16 and were asked to imitate the hand position in the display. 
In the test phase, subjects were shown two pictures of different hand positions and were asked to determine which one they had seen in the study phase. The results of this study indicate that, relative to normal controls, the HD patients are differentially impaired in the motor memory task as compared to the spatial memory task. Interestingly, in two studies, Pasquier et al. (1994) and Davis, Filoteo, and Kesner (2007) demonstrated that HD patients were impaired on a task requiring them to recall the spatial distance of the displacement of a handle on the apparatus. The results of the above-mentioned studies suggest that patients with HD and rats with caudate lesions are impaired on working-memory tasks, particularly when the task places a heavy demand on motor information. Additional data based on a patient with a caudate nucleus lesion showed a decrease in accuracy of memory-guided saccades, implying that the caudate nucleus mediates spatial short-term memory for eye movements (Vermersch et al., 1999).

Figure 3. Pictorial representation of the caudate nucleus in the rat.

Furthermore, rats, monkeys, and humans with caudate lesions have deficits in tasks like delayed response, delayed alternation, and delayed matching to position (Divac, Rosvold, & Szwarcbart, 1967; Dunnett, 1990; Oberg & Divac, 1979; Partiot et al., 1996; Sanberg, Lehmann, & Fibiger, 1978). One salient feature of delayed response, delayed alternation, and delayed matching to position tasks is the maintenance of spatial orientation to the baited food, relative to the position of the subject’s body, often based on proprioceptive and vestibular feedback. These data suggest that the caudate-putamen plays an important role in short-term memory representation for the feedback from a motor response feature of response attribute information. The memory impairments following caudate-putamen lesions are specific to the response attribute, because these same lesions in rats do not impair short-term memory performance for spatial location, visual object, or affect attribute information (Kesner et al., 1993; Kesner & Williams, 1995).

Short-term or Working Memory — Affect Attribute

With respect to affect attribute information, it can be shown with the use of the above-mentioned paradigms to measure short-term memory that rats with amygdala lesions and humans with amygdala damage have major deficits for reward value associated with magnitude of reinforcement or for a liking response based on the mere exposure of a novel stimulus (Kesner & Williams, 1995; Chiba, Kesner, Matsuo, & Heilbrun, 1993), suggesting that the amygdala plays an important role in short-term memory representation for reward value as a critical feature of the affect attribute. Figure 1 depicts the location of the amygdala in rats. Since very few studies have measured the role of the amygdala in mediating short-term memory for affect, it was necessary to develop a new task (Kesner & Williams, 1995). In the study phase of the task, rats were given one of two cereals: one cereal contained 25% sugar, the other 50% sugar. One of the two cereals was always designated as the positive stimulus and the other as the negative stimulus. This study phase was followed by the test phase, in which the rat was shown an object which covered a food well. If the rat was given the negative food stimulus during the study phase, no food was placed beneath the object. If the rat was given the positive food stimulus during the study phase, another food reward was placed beneath the object. Latency to approach the object was used as the dependent measure. Rats learn to approach the object quickly when they expect a reward and are slow to approach the object when they expect no reward. After they reached a criterion of at least a 5 s difference between the positive and negative trials, the rats were given amygdala or control lesions. The results indicate that in contrast to controls, the amygdala lesioned rats displayed a deficit in performance as indicated by smaller latency differences between positive and negative trials on post-surgery tests. 
This deficit persisted at both short and long delays. In additional experiments, it was shown that the amygdala lesioned rats, like controls, had similar taste preferences and transferred readily to different cereals containing 25% or 50% sugar. A similar result was reported by Kesner, Walser, and Winzenried (1989), who showed that amygdala lesioned rats were impaired in short-term memory performance for 1 versus 7 pieces of food associated with different spatial locations on an 8 arm maze. Thus, the amygdala appears to mediate short-term affect-laden information based on the reward value (magnitude) of reinforcement.
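
The latency-difference measure used in this task can be sketched in a few lines. The 5 s criterion comes from the description above; the function names and all latency values are hypothetical and serve only to illustrate the computation, not to reproduce the reported data.

```python
from statistics import mean

CRITERION_S = 5.0  # from the text: at least a 5 s difference between trial types


def latency_difference(negative_latencies, positive_latencies):
    """Mean approach latency on negative (no reward expected) trials
    minus mean latency on positive (reward expected) trials, in seconds."""
    return mean(negative_latencies) - mean(positive_latencies)


def meets_criterion(negative_latencies, positive_latencies, criterion=CRITERION_S):
    """True if the latency difference reaches the training criterion."""
    return latency_difference(negative_latencies, positive_latencies) >= criterion


# Hypothetical latencies in seconds (illustrative only).
pre_surgery = latency_difference([9.0, 10.0, 11.0], [2.0, 3.0, 4.0])
post_lesion = latency_difference([5.0, 6.0, 7.0], [4.0, 5.0, 6.0])
```

In this sketch a large positive difference reflects intact discrimination between expected reward and no reward, whereas a difference near zero would correspond to the kind of deficit described for amygdala lesioned rats.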

To what extent can one generalize from amygdala function in rats to humans with respect to affect attribute information? Previous research has shown that bilateral damage to the amygdala in humans impairs recognition of affect embedded within facial expressions (Adolphs, Tranel, Damasio, & Damasio, 1994). In order to elaborate further on the role of the amygdala in humans, Chiba et al. (1993) developed a liking test based on the mere exposure effect described by Zajonc (1968). Based on this principle, a computerized liking task was designed to test the presence of the mere exposure effect. The liking task consisted of eight abstract pictures and eight unknown words that were sequentially presented on the computer screen. Following the individual presentation of each of these 16 study stimuli, 16 liking trials were presented. In each liking trial, two stimuli – one study stimulus and a matched lure – were simultaneously presented on the computer screen. Subjects were then asked which of the two stimuli they liked better. Four groups of subjects were tested on this task — college students as control subjects, subjects with partial complex epilepsy of temporal lobe origin, subjects who had undergone unilateral temporal lobe resections, including the temporal cortex and the hippocampus, and subjects who had undergone unilateral temporal lobe resections including the temporal cortex, hippocampus, and amygdala. Results indicated that for mean percent preference for abstract pictures and words, all subject groups showed a stable liking or mere exposure effect for both sets of stimuli, with the exception of those who sustained amygdala damage. It appears that the integrity of the amygdala is critical to the existence of the liking effect.
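
The mean percent preference measure in the liking task can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name and the example choices are hypothetical, and treating 50% as the no-preference baseline is an assumption that follows from each trial offering two alternatives.

```python
def percent_preference(chose_studied):
    """Percentage of liking trials on which the previously studied
    stimulus was chosen over its matched lure.

    chose_studied: sequence of booleans, one per liking trial.
    """
    if not chose_studied:
        raise ValueError("at least one trial is required")
    return 100.0 * sum(chose_studied) / len(chose_studied)


# Hypothetical subject: 16 liking trials, studied stimulus chosen on 12.
example_choices = [True] * 12 + [False] * 4
example_score = percent_preference(example_choices)
```

With two alternatives per trial, a score near 50% would indicate no mere exposure effect, and a score reliably above 50% would indicate a preference for the previously exposed stimuli.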

Thus, it is likely that the amygdala of animals and humans is involved in a short-term memory representation of the affective quality and quantity (reward value) of stimuli. This idea is an extension of earlier theoretical notions that the amygdala is involved in the interpretation and integration of reinforcement (Weiskrantz, 1956), serves as a reinforcement register (Douglas & Pribram, 1966), mediates stimulus-reinforcement associations (Jones & Mishkin, 1972) and serves to associate stimuli with reward value (Gaffan, 1992).

Short-term or Working Memory — Sensory-Perceptual Attribute

With respect to sensory-perceptual attribute information, I will concentrate on visual object information as an exemplar of memory representation of the sensory-perceptual attribute. Figure 4 depicts the location of the perirhinal cortex in the rat. It can be shown with the use of the above-mentioned paradigms to measure short-term memory that there are severe impairments in visual object information for rats and monkeys with extra-striate or perirhinal cortex lesions (Bussey et al., 2002; Gaffan & Murray, 1992; Horel, Pytko-Joiner, Boytko, & Salsbury, 1987; Kesner et al., 1993; Mumby & Pinel, 1994; Norman & Eacott, 2004; Suzuki, Zola-Morgan, Squire, & Amaral, 1993), suggesting that the extra-striate and perirhinal cortex play an important role in short-term memory representation for visual object information as an exemplar of the sensory-perceptual attribute. Further support derives from single unit studies in rats and monkeys which indicate that activity of neurons in the rhinal cortex reflects stimulus repetition, which is an integral part of the delayed non-matching to sample tasks used to measure short-term recognition memory for objects (Zhu, Brown, & Aggleton, 1995).

Using a paradigm developed by Poucet (1989), rats with CA3 lesions that were tested for the detection of a novel visual object change showed no disruption (Lee et al., 2005b). Based on the idea that the lateral perforant path inputs into CA3 mediate visual object information (i.e., “what” information) via activation of opioid receptors, rats received direct infusions of naloxone (a μ opiate antagonist) into CA3 and CA1 and were tested for the detection of a novel spatial configuration of familiar objects and the detection of a novel visual object. The results indicate that naloxone infusions into CA3 disrupted novelty detection for both a spatial location and a visual object, whereas naloxone infusions into CA1 disrupted novelty detection for a visual object, but not for a spatial location (Hunsaker et al., 2007). The primary implication of these data is that CA3 is capable of simultaneous processing of both spatial (“where”) and nonspatial (“what”) elements of event-based memory. Based on the idea that the medial perforant path inputs into CA3 mediate spatial location information (i.e., “where” information) via activation of NMDA receptors, rats received direct infusions of AP5 (an NMDA antagonist) into CA3 and CA1 and were tested for the detection of a novel spatial configuration of familiar objects and the detection of a novel visual object. The results indicate that AP5 infusions into CA3 disrupted novelty detection for both a spatial location and a visual object, whereas AP5 infusions into CA1 disrupted novelty detection for a spatial location, but not for a visual object (Hunsaker et al., 2007). In CA3, disruption of either medial perforant path (NMDA-ergic) or lateral perforant path (μ opioid-ergic) plasticity resulted in both spatial and novel object detection deficits. In CA1, it appears that the spatial and nonspatial elements are processed separately. Disrupting the lateral perforant path by infusing naloxone was sufficient to disrupt novel object detection, but not sufficient to disrupt detection of a spatial change. These data suggest that CA3, but not CA1, is critically important for the spatial/nonspatial associative binding required for event-based memory. Similar to the argument provided earlier, it appears that CA3 is involved in rapid binding of spatial and nonspatial information into coherent behavioral episodes within the time-scale of this task (each episode is of approximately 6 min duration). When CA3 is disrupted, the rat fails to retrieve any elements of the event. This is in contrast to CA1, which appears to be involved in temporally tagging information into events, an operation that is carried out on each type of information separately (e.g., spatial and nonspatial information). Thus a disruption to nonspatial information disrupts only nonspatial processing in CA1.

Short-term or Working Memory — Language Attribute

With respect to language attribute information, it can be shown that with the use of the above mentioned paradigms to measure short-term memory that there are severe impairments for lists of words for humans with left hippocampal or bilateral hippocampal damage (Hopkins, Kesner, & Goldstein, 1995b), suggesting that the hippocampus plays an important role in short-term memory representation of word information as an important feature of language attribute information. There is a good deal of evidence supporting the idea of important lateralization for hippocampal function in humans with the right hippocampus representing spatial information and the left hippocampus representing linguistic information (Milner, 1971; Smith & Milner, 1981). For example, Milner tested patients who had left or right temporal lobectomies on a task of recall for a visual location. In this task subjects made a mark on an 8 in line in order to reproduce as close as possible the exact position of the previously shown circle. Subjects with right temporal lobe lesions were impaired on this task, whereas subjects with left temporal lobe lesions were not significantly different from control subjects. Smith and Milner (1981) tested patients with right and left temporal lobectomies and control subjects on a memory task involving incidental recall of the locations of the objects. Subjects were asked to estimate the prices of several objects which were placed in a spatial array on a test board. After a short or 24 h delay, subjects were asked to place the objects in their appropriate locations. Left temporal lobe and control subjects performed well on this task at both the immediate and delayed recall of the object locations. Right temporal lobe subjects were impaired for both the immediate and delayed recall of the object locations. 
Even though hypoxic subjects or left temporal resected patients are impaired for new linguistic information, they are not impaired when they can use semantic or syntactic information to remember the order of presentation of syntactically and semantically meaningful sentences (Hopkins et al., 1995b).

Event-Based Memory — Pattern Separation

Pattern separation is defined as a process that removes redundancy from similar inputs, so that events can be separated from each other and interference can be reduced. In addition, it can produce a more orthogonal, sparse, and categorized set of outputs.

Pattern Separation — Spatial Attribute

The determination of a spatial pattern separation process has been developed extensively by computational models of the subregions of the hippocampus with a special emphasis on the dentate gyrus (DG). Based on the empirical findings that all sensory inputs are processed by the DG subregion of the hippocampus (Aggleton, Hunt, & Rawlins, 1986; Jackson-Smith et al., 1993; Kesner et al., 1993; Mumby, Wood, & Pinel, 1992; Otto & Eichenbaum, 1992), it has been suggested that a possible role for the hippocampus might be to provide sensory markers to demarcate a spatial location, so that the hippocampus can more efficiently mediate spatial information. It is thus possible that one of the main process functions of the hippocampus is to encode and separate spatial events from each other. This would ensure that new highly processed sensory information is organized within the hippocampus and enhances the possibility of remembering and temporarily storing one place as separate from another place. It is assumed that this is accomplished via pattern separation of event information, so that spatial events can be separated from each other and spatial interference reduced. This process is akin to the idea that the hippocampus is involved in orthogonalization of sensory input information (Rolls, 1989), in representational differentiation (Myers, Gluck, & Granger, 1995), and indirectly in the utilization of relationships (Cohen & Eichenbaum, 1993).

Rolls’ (1996) model proposes that pattern separation is facilitated by sparse connections in the mossy-fiber system, which connects DG granular cells to CA3 pyramidal neurons. Separation of patterns is accomplished based on the low probability that any two CA3 neurons will receive mossy fiber input synapses from a similar subset of DG cells. Mossy fiber inputs to CA3 from DG are suggested to be essential during learning and may influence which CA3 neurons fire based on the distributed activity within the DG. Cells of the DG are suggested to act as a competitive learning network with Hebb-like modifiability to reduce redundancy and produce sparse, orthogonal outputs. O’Reilly & McClelland (1996) and Shapiro & Olton (1994) also suggested that the mossy fiber connections between the DG and CA3 may support pattern separation.
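The sparseness argument can be illustrated with a small calculation. The sketch below is not Rolls' model; it only assumes, for illustration, that each CA3 cell samples a fixed number of DG cells uniformly at random. The function names and the example sizes are hypothetical and are not taken from any of the cited papers.

```python
from math import comb


def expected_shared_inputs(n_dg: int, k: int) -> float:
    """Expected number of DG cells that two CA3 cells have in common.

    Assumption (illustrative only): each CA3 cell receives mossy-fiber
    input from k DG cells drawn uniformly at random out of n_dg.
    The overlap of two such random subsets has mean k * k / n_dg.
    """
    return k * k / n_dg


def prob_no_shared_input(n_dg: int, k: int) -> float:
    """Probability that two such CA3 cells share no DG input at all.

    The second cell must pick all k of its inputs from the n_dg - k
    DG cells that the first cell did not pick (hypergeometric case).
    """
    return comb(n_dg - k, k) / comb(n_dg, k)


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show the trend with sparseness.
    for k in (5, 20, 50):
        print(k, expected_shared_inputs(1000, k), prob_no_shared_input(1000, k))
```

Under these assumptions, the sparser the connectivity (smaller k relative to the number of DG cells), the less likely two CA3 cells are to be driven by a similar subset of DG cells, which is the intuition behind the separation of patterns described above.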

To examine the contribution of the DG to spatial pattern separation, Gilbert, Kesner, and Lee (2001) tested rats with DG lesions using a paradigm which measured short-term memory for spatial location information as a function of spatial similarity between spatial locations. Specifically, the study was designed to examine the role of the DG subregion in discriminating spatial locations when rats were required to remember a spatial location based on distal environmental cues and to differentiate between the to-be-remembered location and a distractor location with different degrees of similarity or overlap among the distal cues.

Animals were tested using a cheeseboard maze apparatus (the cheese board is similar to a dry land water maze with 177 circular, recessed holes on a 119 cm diameter board) on a delayed-match-to-sample for a spatial location task. Animals were trained to displace an object which was randomly positioned to cover a baited food well in 1 of 15 locations along a row of food wells. Following a short delay, the animals were required to choose between objects which were identical to the sample phase object: one object was in the same location as the sample phase object and the second object was in a different location along the row of food wells. Rats were rewarded for displacing the object in the same spatial location as the sample phase object (correct choice), but they received no reward for displacing the foil object (incorrect choice). Five spatial separations, from 15 cm to 105 cm, were used to separate the correct object and the foil object during the choice phase. Rats with DG lesions were significantly impaired at short spatial separations; however, during the choice phase, performance of DG-lesioned animals increased as a function of greater spatial separation between the correct and foil objects. The performance of rats with DG lesions matched control rats at the largest spatial separation. The graded nature of the impairment and the significant linear improvement in performance as a function of increased separation illustrate a deficit in pattern separation. Based on these results, it was concluded that lesions of the DG decrease the efficiency of spatial pattern separation, which results in impairments on trials with increased spatial proximity and increased spatial similarity among working memory representations. Holden, Hoebel, Loftis, and Gilbert (2012) used an analogous task to that used for rats (Gilbert et al., 2001) to test young participants compared to aged participants who are likely to have DG dysfunction (see Small, et al., 2011). 
They report that aged participants that do not perform well on standard memory tests are impaired in displaying a pattern separation function. One limitation of the dot task is that it does not assess the ability to separate spatial patterns in the real world. In order to assess real world spatial pattern separation, hypoxic subjects with hippocampal damage and matched normal controls were administered a geographical spatial distance task (cities on a map; Hopkins & Kesner, 1993). The subjects were shown 8 cities on a map of New Brunswick, one at a time, for 5 s each. Subjects were instructed to remember the city and its spatial location on the map. In the test phase, the subjects were presented with the names of two cities that occurred in the study phase and were asked which of the cities was located further to the east (on separate trials, subjects were asked which city occurred further north, south, or west). There were two trials for each compass direction. Spatial distances of 0, 2, 4, and 6 as measured by the number of cities in the study phase that were geographically situated between the two test cities were measured. There were 8 trials for each distance. The hypoxic subjects were impaired for all spatial distances for spatial geographical information compared to control subjects (Hopkins & Kesner, 1993).

Thus, the DG may function to encode and to separate locations in space to produce spatial pattern separation. Such spatial pattern separation ensures that new highly processed sensory information is organized within the hippocampus, which in turn enhances the possibility of encoding and temporarily remembering one spatial location as separate from another.

Based on the observations that cells in CA3 and CA1 regions respond to changes in metric and topological aspects of the environment (Jeffery & Anderson, 2003; O’Keefe & Burgess, 1996), one can ask whether these different features of the spatial environment are processed via the DG and then are subsequently transferred to the CA3 subregion or if these features are communicated via the direct perforant path projection to the CA3 subregion. In both cases, information may then be transferred to the CA1 subregion.

To answer this question, Goodrich-Hunsaker, Hunsaker, and Kesner (2005) examined the contributions of the DG to memory for metric spatial relationships. Using a modified version of an exploratory paradigm developed by Poucet (1989), rats with DG, CA3, and CA1 lesions, as well as controls, were tested on tasks involving a metric spatial manipulation. In this task, a rat was allowed to explore two different visual objects separated by a specific distance on a cheeseboard maze. On the initial presentation of the objects, the rat explored each object. However, across subsequent presentations of the objects in the same spatial locations, the rat habituated and eventually spent less time exploring the objects. Once the rat had habituated to the objects in their locations, the metric spatial distance between the objects was manipulated so that the two objects were either closer together or farther apart. The time the rat spent exploring each moved object was recorded. The results showed that rats with DG lesions spent significantly less time than controls exploring the two displaced objects, indicating that DG lesions impair the detection of metric distance changes. Rats with CA3 or CA1 lesions displayed mild impairments relative to controls. These results provide empirical validation for the role of the DG in spatial pattern separation and support the predictions of computational models (Rolls, 1996; Rolls & Kesner, 2006). Stark, Yassa, and Stark (2010) used a task analogous to that used for rats (Goodrich-Hunsaker et al., 2005) to measure spatial pattern separation based on distance, and in this case angle as well, in young and healthy aging humans. Even though there were some individual differences, they reported an impairment in spatial pattern separation. Also, Baumann, Chan, and Mattingley (2012) reported activation of the posterior hippocampus in spatial pattern separation using the task used by Goodrich-Hunsaker et al. (2005).

Based on the observation that neurogenesis occurs in the DG and that new DG granule cells can be formed over time, it has been proposed that the DG mediates a spatial pattern separation mechanism and also generates patterns of episodic memories within remote memory (Aimone, Wiles, & Gage, 2006). Thus far, it has been shown in mice that disruption of neurogenesis using low-dose x-irradiation was sufficient to produce a loss of newly born DG cells. Further testing indicated impairments in spatial learning in a delayed non-matching-to-place task in the radial arm maze. Specifically, impairment occurred for arms that were presented with little separation, but no deficit was observed when the arms were presented farther apart, suggesting a spatial pattern separation deficit. Disruption of neurogenesis using lentivirus expression of a dominant Wnt protein likewise produced a loss of newly born DG cells, and an impairment that varied with the degree of spatial separation was observed in an associative object-in-place task, again suggesting a spatial pattern separation deficit (Clelland et al., 2009). These data suggest that neurogenesis in the DG may contribute to the operation of spatial pattern separation. Thus, spatial pattern separation may play an important role in the acquisition of new spatial information, and there is a good possibility that the DG may be the subregion responsible for the impairments in the various tasks described above.

Pattern Separation — Temporal Attribute

There are data to support the existence of memory for order information, but it is not always clearly demonstrated whether memory for a particular sequence has been learned and can be accurately recalled. Estes (1986) summarized data demonstrating that, in human memory, there are fewer errors for distinguishing items (by specifying the order in which they occurred) that are far apart in a sequence than those that are temporally adjacent. Other studies have also shown that order judgments improve as the number of items in a sequence between the test items increases (Banks, 1978; Chiba, Kesner, & Reynolds, 1994; Madsen & Kesner, 1995). This phenomenon is referred to as a temporal distance effect [sometimes referred to as a temporal pattern separation effect (Kesner, Lee, & Gilbert, 2004)]. The temporal distance effect is assumed to occur because there is more interference for temporally proximal events than for temporally distant events.

Based on these findings, Gilbert et al. (2001) tested rodents’ memory for the temporal order of items in a one-trial sequence learning paradigm. In the task, each rat was given one daily trial consisting of a sample phase followed by a choice phase. During the sample phase, the animal visited each arm of an 8-arm radial maze once in a randomly predetermined order and was given a reward at the end of each arm. The choice phase began immediately following the presentation of the final arm in the sequence. In the choice phase, two arms were opened simultaneously and the animal was allowed to choose between the arms. To obtain a food reward, the animal had to enter the arm that occurred earlier in the sequence that it had just followed. Temporal separations of 0, 2, 4, and 6 were randomly selected for each choice phase. These values represented the number of arms in the sample phase that intervened between the arms that were to be used in the test phase. After reaching criterion, rats received CA1 lesions. Following surgery, control rats matched their preoperative performance over all temporal separations. In contrast, rats with CA1 lesions performed at chance across 0, 2, or 4 temporal separations and a little better than chance in the case of a separation of 6 items. The results suggest that the CA1 subregion is involved in memory for spatial location as a function of temporal separation of spatial locations; lesions of the CA1 decrease efficiency in temporal pattern separation. CA1 lesioned rats cannot separate events over time, perhaps due to an inability to inhibit interference that may be associated with sequentially occurring events. The increase in temporal interference impairs the rat’s ability to remember the order of specific events. Tolentino et al. (2012) used a task analogous to that used for rats (Gilbert et al., 2001) to compare young and non-demented older participants in a spatial temporal pattern separation task and reported temporal pattern separation problems for the older participants. In another spatial location task, patients with a hypoxic condition and hippocampal damage were impaired in displaying a temporal pattern separation function (Hopkins et al., 1995a).
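The definition of temporal separation used in this task (the number of sample-phase arms intervening between the two choice arms) can be made concrete with a short sketch. It is only an illustration of how candidate choice pairs follow from a sample sequence; the function name and the example arm order are hypothetical and do not reproduce the trial-selection procedure of the original study.

```python
def pairs_by_separation(sequence, separations=(0, 2, 4, 6)):
    """Group candidate choice-phase pairs by temporal separation.

    `sequence` is the order in which arms were visited in the sample phase.
    A separation of s means that s arms intervened between the two arms.
    In every returned pair the first arm occurred earlier in the sequence,
    so it is the rewarded (correct) choice.
    """
    pairs = {}
    for s in separations:
        pairs[s] = [
            (sequence[i], sequence[i + s + 1])
            for i in range(len(sequence) - s - 1)
        ]
    return pairs


if __name__ == "__main__":
    # Hypothetical visiting order for an 8-arm radial maze.
    sample_order = [3, 7, 1, 5, 8, 2, 6, 4]
    for separation, candidates in pairs_by_separation(sample_order).items():
        print(separation, candidates)
```

With an 8-arm sequence, larger separations leave fewer candidate pairs (only the first and last arms are separated by 6 intervening arms), which is one reason separations have to be sampled across many daily trials.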

In a more recent experiment using a paradigm described by Hannesson et al. (2004), it was shown that temporal order information for spatial location was impaired only for CA1 (Hunsaker et al., 2008). Thus, it can be suggested that the CA1 hippocampal subregion serves as a critical substrate for sequence learning and temporal pattern separation for the spatial attribute.

It has been suggested that the perirhinal cortex and the CA1 subregion of the hippocampus play an important role in supporting temporal processing of visual object information (Hoge & Kesner, 2007; Hunsaker et al., 2008). In humans it can be shown that a temporal pattern separation process can be observed in hypoxic patients in a temporal order memory test for abstract figures (Hopkins et al., 1995a), suggesting that the hippocampus may also play a role in temporal pattern separation for visual stimuli, at least in humans.

Does the hippocampus support temporal pattern separation processes for sensory-perceptual information other than space and visual objects? To answer this question, memory for the temporal order for a sequence of odors was assessed in rats based on a varied sequence of five odors, using a similar paradigm described for sequences of spatial locations. Kesner, Gilbert, and Barua (2002) found that rats with hippocampal lesions were impaired relative to control animals for memory for all temporal distances between the odors, despite an intact ability to discriminate between the odors. Fortin, Agster, and Eichenbaum (2002) reported similar results with fimbria fornix lesions. In a further subregional analysis, rats with dorsal CA1 lesions showed a mild impairment in memory for the temporal distance for odors, but rats with ventral CA1 lesions showed a severe impairment (Kesner, Hunsaker, & Ziegler, 2010). Thus, the CA1 appears to be involved in separating events in time for spatial and nonspatial information, so one event can be remembered distinctly from another event; however, the dorsal CA1 might play a more important role than the ventral CA1 for spatial information (Chiba, Johnson, & Kesner, 1992), and conversely the ventral CA1 might play a more important role than the dorsal CA1 for odor information. The mechanism that could subserve the above mentioned findings is based on the memory question that asks which of two items occurred earlier in the list. To implement this type of memory, some temporally decaying memory trace or temporally increasing memory trace via a consolidation process might provide a model (Marshuetz, 2005); in such a model, temporally adjacent items would have memory traces of more similar strength and would be harder to discriminate than the strengths of the memory traces of more temporally distant items.
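The decaying-trace account can be sketched numerically. The example below assumes a simple exponential decay with an arbitrary time constant; both the functional form and the parameter value are illustrative assumptions, not details taken from Marshuetz (2005) or from the experiments described above.

```python
import math


def trace_strength(position: int, list_length: int, tau: float = 3.0) -> float:
    """Strength at test of the item presented at `position` (0 = first item).

    Assumption (illustrative only): the trace decays exponentially with the
    number of item steps elapsed since presentation, with time constant tau.
    """
    elapsed = list_length - position
    return math.exp(-elapsed / tau)


def strength_difference(pos_a: int, pos_b: int, list_length: int,
                        tau: float = 3.0) -> float:
    """Absolute difference in trace strength between two list positions.

    A larger difference is taken here to mean an easier order judgment.
    """
    return abs(
        trace_strength(pos_a, list_length, tau)
        - trace_strength(pos_b, list_length, tau)
    )


if __name__ == "__main__":
    n_items = 8
    for lag in (1, 3, 5, 7):
        print(lag, strength_difference(0, lag, n_items))
```

Under these assumptions, the difference in trace strength between the first item and a later item grows with the number of intervening items, so temporally adjacent items are the hardest to order, in line with the temporal distance effect.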

Pattern Separation — Response Attribute

A delayed-match-to-sample task was used to assess memory for motor responses in rats with control, hippocampus, or medial caudate nucleus (MCN) lesions. All testing was conducted on a cheeseboard maze in complete darkness using an infrared camera. A start box was positioned in the center of the maze facing a randomly determined direction on each trial. In the sample phase, a phosphorescent object was randomly positioned to cover a baited food well in 1 of 5 equally spaced positions around the circumference of the maze, forming a 180-degree arc 60 cm from the box. On each trial, the door to the start box was opened, and the rat exited, displaced the object to receive food, and returned to the box. The box was then rotated to face a different direction. The food well in the same position relative to the box was baited and an identical phosphorescent object was positioned to cover the well. A second identical object was positioned to cover a different unbaited well. In the choice phase, the rat was allowed to choose between the 2 objects. The object in the same position relative to the start box as the object in the sample phase was the correct choice and the foil object was the incorrect choice. The rat had to remember the motor response made in the sample phase and make the same motor response in the choice phase to receive a reward. Four separations of 45, 90, 135, and 180 degrees were randomly used to separate the correct object from the foil in the choice phase. Hippocampus-lesioned rats improved as a function of increased angle separation and matched the performance of controls. However, rats with MCN lesions were impaired across all separations (Kesner & Gilbert, 2006). Results suggest that the MCN, but not the hippocampus, may support working memory and/or a process aimed at reducing interference for motor response selection based on vector angle information.

Pattern Separation — Affect Attribute

Male Long-Evans rats were tested on a modified version of Flaherty, Turovsky, & Krauss’ (1994) anticipatory contrast paradigm, to assess pattern separation for reward value. Prior to testing, each rat received either a control, hippocampal, or amygdala lesion. In the home cage, each rat was allowed to drink a water solution containing 2% sucrose for 3 min followed by a water solution containing 32% sucrose for 3 min. Over 10 days of testing, the rats in each lesion group showed significantly increased anticipatory discriminability as a function of days. In order to assess the operation of a pattern separation mechanism, each rat was then tested using the same procedure, except the 2% solution was followed by a 16% solution for 10 days and then by an 8% solution for 10 days. Control and hippocampal-lesioned rats continued to show high discriminability when the 2% solution was followed by a 16% solution, however, the amygdala-lesioned rats showed low anticipatory discriminability. On trials where the 2% sucrose solution was followed by an 8% sucrose solution, all groups showed low discriminability scores, suggesting that when two reward values are very similar, even control animals are not able to separate the reward values in memory. However, the results of a preference task revealed that all groups can perceptually discriminate between a 2% and an 8% sucrose solution (Gilbert & Kesner, 2002). The data suggest that the amygdala, but not the hippocampus, is involved in the separation of patterns based on reward value.

Pattern Separation — Sensory-Perceptual Attribute (Objects)

In order to determine whether the perirhinal cortex plays a role in object-based pattern separation, rats with perirhinal cortex, hippocampal, or sham lesions were trained on a successive discrimination go/no-go task to examine recognition memory based on pattern separation for an array of visual objects with varying interference among the objects in the array. Rats were trained to recognize a target array consisting of four particular objects that could be presented in any one of four possible configurations to cover baited food wells. If the four target objects were presented, the rat should displace each object to receive food. However, if a novel object replaced any one or more of the target objects, then the rat should withhold its response. The number of novel objects presented on non-rewarded trials varied from one to four. The fewer the number of novel objects in the array, the more interference the array shared with the target array, therefore increasing task difficulty, requiring an object pattern separation mechanism to solve the task. The results indicated that an increased number of novel objects resulted in a pattern separation effect with less interference for the target array as indicated by decreased task difficulty. Although accuracy was slightly lower in rats with hippocampal lesions, compared to controls, the learning of the groups was not statistically different. In contrast, rats with perirhinal cortex lesions were significantly impaired in utilizing a pattern separation function compared to both control and hippocampal-lesioned rats (Gilbert & Kesner, 2003). The results suggest that temporal pattern separation for objects is affected by stimulus interference in rodents and is mediated by the perirhinal cortex. Other research supports these results for the perirhinal mediation of object-based pattern separation (Bussey et al., 2002; Norman & Eacott, 2004).

In studies with humans, a modified continuous recognition task was used. In one study with young participants using high resolution fMRI with this task, it was found that the hippocampus distinguished among correctly identified true stimulus repetitions, correctly rejected presentations of similar lure stimuli, and false alarm lures (Kirwan & Stark, 2007). A subsequent study comparing aged and young participants showed that the DG/CA3 subregions of the hippocampus played an important role in the deficits found in aged participants (Yassa et al., 2010). For a review of the human pattern separation data, see Yassa and Stark (2011).

Pattern Separation — Sensory-Perceptual Attribute (Odors)

Working memory and pattern separation for odor information were assessed in rats using a matching-to-sample for odors paradigm. The odor set consisted of five aliphatic acids with unbranched carbon chains that varied from two to six carbons in length. Each trial consisted of a sample phase followed by a choice phase. During the sample phase, rats would receive one of the five odors. During the choice phase 15 s later, the sample-phase odor was presented side by side with a different odor from the same set of aliphatic acids, and rats were allowed to choose between the two odors. The rule to be learned in order to receive a food reward was to always choose the odor that occurred during the sample phase. Odor separations of 1, 2, 3, or 4 were selected for each choice phase, which represented the carbon chain difference between the sample phase odor and the foil odor. Once an animal reached a criterion of 80-90% correct across all odor separations based on the last 16 trials, it received a control or ventral dentate gyrus lesion and was retested on the task. On postoperative trials, there were no deficits at the 15 s delay for either the controls or the ventral dentate gyrus lesioned rats. However, when the delay was increased to 60 s, rats with ventral DG lesions were significantly impaired at small odor separations, and the performance of DG-lesioned animals improved as a function of greater odor separation between the correct and foil odors. The performance of rats with ventral DG lesions matched control rats at the largest odor-based separation. The graded nature of the impairment and the significant linear improvement in performance as a function of increased separation illustrate a deficit in odor pattern separation. Based on these results, it was concluded that lesions of the ventral DG decrease the efficiency of odor-based pattern separation, which results in impairments on trials with increased odor similarity among working memory representations (Weeden, Hu, Ho, & Kesner, 2012). The data suggest that the ventral hippocampus, but not dorsal hippocampus, supports pattern separation for odor information.

In summary, within the event-based memory system, different brain regions process different attributes in support of short-term or working memory and pattern separation processes. Data are presented to support this assertion by demonstrating that the dorsal hippocampus mediates spatial and temporal attribute information, the caudate mediates response attribute information, the amygdala mediates affect attribute information, the perirhinal cortex mediates sensory-perceptual attribute information for visual objects, the ventral hippocampus mediates sensory-perceptual attribute information for odors, and the hippocampus mediates language attribute information. Where data are available, there are parallel results found in rodents, monkeys and humans.

Knowledge-Based Memory

The organization of the attributes within the knowledge-based memory system can take many forms and they are assumed to be organized as a set of cognitive maps or neural nets, the interactions of which are unique for each memory. It is assumed that long-term representations within cognitive maps are more abstract and less dependent upon specific features. Some interactions between attributes are important and can aid in identifying specific neural regions that might subserve a critical interaction. For example, the interaction between sensory-perceptual attributes and the spatial attribute can provide for the long-term memory representation of a spatial cognitive map or spatial schemas, the interaction between temporal and spatial attributes can provide for the long-term memory representation of scripts, the interaction between temporal and affect attributes can provide for the long-term memory representation of moods, and the interaction between sensory-perceptual and response attributes can provide for the long-term memory of skills. Based on a series of experiments, it can be shown that within the knowledge-based memory system, different neural structures and circuits mediate different forms or attributes of memory. The most extensive data set is based on the use of paradigms that measure repetition priming, the acquisition of new information, discrimination performance, executive functions, strategies and rules to perform in a variety of tasks including skills and the operation of a variety of long-term memory programs.

For the knowledge-based memory system, I will concentrate on specific processes that mediate perceptual memory within long-term memory. For the other processes, including selective attention and selective filtering associated with permanent memory representations of familiar information, selection of strategies and rules (“executive functions”), and retrieval of familiar information based on flexibility and action, the establishment of arbitrary associations, consolidation or elaborative rehearsal of new information, and retrieval of new information based on flexibility, action, and pattern completion, there is not a sufficient data set to differentiate the contribution of the different attributes associated with mnemonic processing of information.

Spatial Attribute

The emphasis will be on the role of the posterior parietal cortex (PPC) in perceptual and long-term memory processing of complex spatial information within the knowledge-based memory system (see Figure 4 for the location of the PPC in rats). Rats with PPC lesions display deficits in both the acquisition and retention of spatial navigation tasks that are presumed to measure the operation of a spatial cognitive map within a complex environment (DiMattia & Kesner, 1988b; Kesner, Farnsworth, & Kametani, 1992). They also display deficits in the acquisition and retention of spatial recognition memory for a list of five spatial locations (DiMattia & Kesner, 1988a). In a complex discrimination task in which a rat has to detect the change in location of an object in a scene, rats with PPC lesions are profoundly impaired (DeCoteau & Kesner, 1996), yet on less complex tasks involving the discrimination or short-term memory for single spatial features, including spatial location and allocentric and egocentric spatial distance (Long & Kesner, 1996), there are no impairments. When the task is more complex, involving the association of objects and places (components of a spatial cognitive map), then PPC plays an important role. Support for this conclusion comes from the finding that rats with PPC lesions are impaired in the acquisition and retention of a spatial location plus object discrimination (paired associate task), but show no deficits for only spatial or object discriminations (Long et al., 1998). Comparable deficits are found within an egocentric-allocentric distance paired associate task (Long & Kesner, 1996), but there is no deficit for an object-object paired associate task, suggesting that spatial features are essential in activating and involving the PPC (unpublished observations).
Finally, it should be noted that in rats, neurons have been found within the PPC that encode spatial location and head direction information and that many of these cells are sensitive to multiple cues including visual, proprioceptive, sensorimotor and vestibular cue information (Chen, Lin, Barnes, & McNaughton, 1994; McNaughton, Chen, & Marcus, 1991). Additional support comes from studies with parietal lesioned monkeys. These animals demonstrate deficits in place reversal, landmark reversal, distance discrimination, bent wire route-finding, pattern string-finding, and maze-learning tasks (Milner, Ockleford, & DeWar, 1977; Petrides & Iversen, 1979; Pohl, 1973).

Figure 4. Pictorial representation of the posterior parietal cortex (PPC) and TE2 cortex in the rat.

In a somewhat different study, rats with PPC lesions are impaired in an implicit spatial repetition priming experiment but perform without difficulty in processing positive priming for features of visual objects and in a short-term or working memory for a spatial location experiment (Chiba, Kesner, & Jackson, 2002), suggesting that the parietal cortex plays a role in spatial perceptual memory within the knowledge-based memory system, but does not play a role in spatial memory within the event-based memory system.

In humans there is a general loss of topographic sense, which may involve loss of long-term geographical knowledge as well as an inability to form cognitive maps of new environments. Using PET scan and functional MRI data, it can be shown that complex spatial information results in activation of the parietal cortex (Ungerleider, 1995). Thus, memory for complex spatial information appears to be impaired (Benton, 1969; De Renzi, 1982). Furthermore, in patients with parietal lesions and spatial neglect, there is a deficit in spatial repetition priming without a loss in short-term or working memory for spatial information (Ellis, Sala, & Logie, 1996). Keane et al. (1995) reported that a patient with occipital-lobe damage (extending into PC) showed a deficit in perceptual priming but no deficit in recognition memory, whereas a patient with bilateral medial temporal lobe damage (including hippocampus) had a loss of recognition memory, but no loss of perceptual memory.

Sensory-Perceptual Attribute

The emphasis will be on visual perceptual processing within the knowledge-based system (see Figure 4 for the location of the TE2 cortex in the rat). I will concentrate on temporal cortex (TE2) and make comparisons with the PPC. In rats using a visual object-place recognition task, TE2 lesioned rats fail to detect a visual object change, whereas PPC lesioned rats fail to detect a spatial location change (Tees, 1999), suggesting that the two cortical areas play distinctive roles in perceptual processing of visual versus spatial location information. Similar results were reported by Ho et al. (2011), who showed that rats with TE2 lesions had object recognition problems at 20 min, but not at 5 min delays. In rats with TE2 lesions there is a deficit in processing positive priming for features of visual objects (a component of the perceptual memory system), but the rats performed well in positive priming for spatial location (Kesner, unpublished observations). In monkeys, deficits for visual objects and in a working memory task and a visual paired comparison task were observed following TE2 lesions, suggesting that TE2 cortex may play an important role in visual perceptual processing (Buffalo et al., 1999). It can also be shown that lesions of the inferotemporal cortex in monkeys and humans and temporal cortex (TE2) in rats result in visual object discrimination problems (Dean, 1990; Fuster, 1995; Gross, 1973; McCarthy & Warrington, 1990; Weiskrantz & Saunders, 1984), suggesting that the inferotemporal or TE2 cortex may play an important role in mediating long-term representations of visual object information. Additional support comes from PET scan and functional MRI data in humans, where it can be shown that visual object information results in activation of inferotemporal cortex (Ungerleider, 1995).
In a somewhat different study, Sakai and Miyashita (1991) have shown that neurons within the inferotemporal cortex responded more readily after training to a complex visual stimulus that had been paired with another complex visual stimulus across a delay, suggesting the formation of long-term representations of object-object pairs.

In summary, within the knowledge-based memory system, different brain regions process different attributes in support of perceptual processes. Data are presented to support this assertion by demonstrating that the PPC mediates the spatial attribute for spatial perceptual information and the TE2 cortex mediates the sensory-perceptual attribute for visual object information. Where data are available, there are parallel results found in rodents, monkeys and humans.

Rule-Based Memory

For the rule-based memory system, it is assumed that there is integration of information from the event-based and knowledge-based memory systems for the use of major processes that include the selection of strategies and rules for maintaining or manipulating information for subsequent decision making and action as well as short-term or working memory for new and familiar information.

I will concentrate on two processes, namely working memory (short-term memory) and paired associate learning, and I will emphasize the importance of prefrontal cortex (PFC) subregions in the mediation of different attributes. Figure 5 depicts the organization of the PFC in the rat.

Working Memory (Short-term Memory)

Working or short-term memory is a process for short-term active maintenance of information as well as for processing maintained information. The most extensive data set aimed at addressing the role of the different PFC subregions in supporting different forms of working memory is based on experiments using paradigms that measure short-term or working memory in tasks such as matching or non-matching-to-sample for single or lists of items, continuous or n-back recognition memory, and novelty detection based on recognition memory.

Figure 5. Frontal areas of the rat: A. Medial view. B. Ventral view. Abbreviations: PrCm (PC)-precentral cortex; AC-dorsal and ventral anterior cingulate; PL-IL-prelimbic and infralimbic cortex; MO-medial orbital cortex; AI-dorsal and ventral agranular insular cortex; LO-lateral orbital cortex; VO-ventral orbital cortex; VLO-ventrolateral orbital cortex.

Evidence supporting a role for the rat PFC in working memory is based on the findings that lesions of the anterior cingulate and precentral (AC/PC) cortex that spare the prelimbic-infralimbic (PL/IL) cortex produce a deficit in working memory for motor response information, such as working memory for a motor (right-left turn) response (Kesner, Hunt, Williams, & Long, 1996; Ragozzino & Kesner, 2001), but not in working memory for visual object information (Ennaceur, Neave, & Aggleton, 1997; Kesner et al., 1996; Shaw & Aggleton, 1993) or affect (food reward value) information (DeCoteau, Kesner, & Williams, 1997; Ragozzino & Kesner, 1999). There are also no deficits, with a few exceptions, in working memory for spatial information using delayed non-matching to position, delayed spatial alternation, or non-matching-to-sample in a T-maze, 8-arm maze, and continuous spatial recognition memory procedures (Ennaceur et al., 1997; Harrison & Mair, 1996; Kesner et al., 1996; Kolb, Sutherland, & Whishaw, 1983; Passingham, Myers, Rawlins, Lightfoot, & Fearn, 1988; Ragozzino, Adams, & Kesner, 1998; Sanchez-Santed, deBruin, Heinsbroek, & Verwer, 1997; Shaw & Aggleton, 1993). Thus, the data suggest that the AC cortex and PC cortex process working memory for motor response information, but do not process working memory for visual object, spatial, or affect (food value) information. In monkeys, enhanced single unit activity was recorded in the premotor cortex in relation to the go and no-go component of a delayed conditional go/no-go task, suggesting an involvement of the premotor cortex in a working memory task component associated with motor movement (Watanabe, 1986). Recent work with humans using fMRI techniques showed that activation was observed in the premotor cortex in a delayed response task (Turner & Levine, 2006).

The PL-IL cortex appears to play an important role in working memory for visual object and spatial location information. Supporting evidence is based on the findings that lesions of the PL-IL cortex produce deficits in working memory for spatial information (Brito & Brito, 1990; Delatour & Gisquet-Verrier, 1996; Granon, Vidal, Thinus-Blanc, Changeux, & Poucet, 1994; Horst & Laubach, 2009; Ragozzino et al., 1998; Seamans, Floresco, & Phillips, 1995), and working memory for visual object information (Di Pietro, Black, Green-Jordan, Eichenbaum, & Kantak, 2004; Kesner et al., 1996; Ragozzino, Detrick, & Kesner, 2001). However, PL-IL lesions do not produce a deficit in working memory for a food reward (DeCoteau et al., 1997; Ragozzino & Kesner, 1999). Further support for this conclusion was reported by Chang, Chen, Luo, Shi, and Woodward (2002), who found sustained neural firing in the PL-IL cortex during the delay within a delayed matching-to-position task, and by Baeg et al. (2003), who recorded from the PL-IL cortex in a spatial delayed alternation task and reported an increase in neural firing during the delay period.

In early research, Goldman-Rakic (1987, 1996) proposed that one could fractionate functions of the PFC on the basis of differential subregional contributions. She suggested that the main function of the PFC is to support working memory, defined as a specialized process by which a remembered stimulus is held on-line to guide behavior in the absence of external cues. Furthermore, she postulated a modular organization of working memory based on the use of different domains or attributes of information processing. This is called the domain-specificity model, which would be consistent with the attribute model. In monkeys, the cortex surrounding the principal sulcus (dorsolateral prefrontal cortex) is specialized for on-line processing of spatial information, whereas the inferior convexity (ventrolateral prefrontal cortex) is specialized for on-line processing of visual object information. In addition, the ventrolateral prefrontal cortex could also support other sensory domains. Support for this model is based on the observation of delay-specific cells in the dorsolateral prefrontal cortex for only spatial tasks, such as delayed response, delayed alternation, and delayed oculomotor tasks, and the observation that lesions of the dorsolateral prefrontal cortex disrupt performance on delayed response, delayed alternation, delayed oculomotor, and spatial search tasks (Butters & Pandya, 1969; Funahashi, Bruce, & Goldman-Rakic, 1993; Goldman & Rosvold, 1970; Mishkin, 1957; Passingham, 1985). In contrast, delay-specific cells for a visual object delay task are found in the ventrolateral prefrontal cortex and lesions in this area disrupt visual object recognition (Mishkin & Manning, 1978; Wilson, Scalaidhe, & Goldman-Rakic, 1993).

Challenges to Goldman-Rakic’s domain specificity model for working memory as the main organizing principle for the prefrontal cortex have emerged based on research with monkeys and humans. First, Rao, Rainer, and Miller (1997) have shown that one can record both spatial location and visual object information from the same cell within the dorsolateral prefrontal cortex and these cells can change very readily based on the demands of the task. Second, Fuster, Bauer, and Jervey (1982) showed that delay cells can be found in both the dorsolateral and ventrolateral prefrontal cortex in a visual-visual stimulus or spatial location-spatial location matching-to-sample tasks. In humans, D’Esposito et al. (1998) reported a meta-analysis of neuroimaging results based on visual object and spatial location working memory tasks which provided a strong case for processing of both visual object and spatial location information in working memory in both the dorsolateral and ventrolateral prefrontal cortex. Similar findings based on a meta-analysis of sixty PET and fMRI studies using working memory paradigms were reported by Wager and Smith (2003), showing that working memory for spatial and object location information resulted in activation of the ventral and dorsolateral prefrontal cortex. Using another meta-analytic review, Owen (2000) reported results that support the findings mentioned in the above studies. Thus, it is clear that there is a large body of evidence based on recording and lesion studies supporting a working memory or short-term memory role for the PL-IL cortex in rats and the ventral and dorsolateral frontal cortex in monkeys and humans in working memory for spatial locations and objects.

Based on anatomical and behavioral data, the agranular insular and lateral orbital (AI/LO) cortex appears to play an important role in working memory for affect attribute information based on odor and taste. Supporting evidence is based on the findings that lesions of the AI/LO cortex produce deficits in working memory for affect based on taste or odor information (DeCoteau et al., 1997; Di Pietro et al., 2004; Otto & Eichenbaum, 1992; Ragozzino & Kesner, 1999). Further support can be found in a study where sustained neuronal firing was observed in the AI/LO cortex during the delay period in a non-matching-to-sample for odors task (Ramus & Eichenbaum, 2000). It should be noted that AI/LO cortical lesions do not produce an effect on visual object or spatial working memory (DeCoteau et al., 1997; Di Pietro et al., 2004; Horst & Laubach, 2009; Ragozzino & Kesner, 1999), although in a recent study there were deficits using a delayed alternation task for odors with lesions of the PL-IL cortex, but there was extra damage to the AI/LO cortex (Kinoshita et al., 2008). Deficits in short-term memory for odors have been reported following damage to the orbital frontal cortex in humans (Jones-Gotman & Zatorre, 1993). Furthermore, Dade et al. (2001) reported increased activity based on PET scans in the orbital frontal cortex as well as dorsal and ventral lateral prefrontal cortex in an n-back task based on odor information.

In summary, within the rule-based memory system, different brain regions process different attributes in support of short-term or working memory. Data are presented to support this assertion by demonstrating that the AC/PC cortex mediates response attribute information, the PL/IL cortex mediates spatial location and visual object attribute information, and the AI/LO cortex mediates odor and taste information within the sensory-perceptual attribute. It appears that the different regions can be dissociated from each other based on specific attributes. Parallel results are found in monkeys and humans in that response information is mediated by the premotor cortex, spatial location and visual object information are mediated by the dorsal and ventral lateral prefrontal cortex, and odor and taste information are mediated by the orbital frontal cortex.

Paired Associate Learning

It is assumed that in addition to processing of temporal information, the prefrontal cortex is also involved in mediating higher-order processes, such as rule learning based on the use of biconditional discrimination or paired associate paradigms. Passingham et al. (1988) showed that lesions of the AC and PC cortex in rats resulted in deficits in a visual conditional motor associative task. Similar results with the same type of lesion and the same visual conditional motor associative task were reported by St-Laurent, Petrides, and Sziklas (2009). Based on behavioral and anatomical data, the PC cortex in the rat is assumed to be homologous to the premotor and supplementary motor area in monkeys and humans (periarcuate or posterior dorsal lateral area; Brodmann areas 6 and 8) in that the deficits observed in the visual-response conditional task and working memory for a motor response task in rats are rather similar to what has been described for monkeys and humans. For example, in monkeys, Halsband and Passingham (1985) showed that the premotor cortex is directly involved in mediating a visual-conditional motor task, but not a visual conditional non-motor task, suggesting that the response component is critical. Similar deficits in a visual conditional response task following premotor cortex lesions have been found in monkeys and humans (Halsband & Freund, 1990; Petrides, 1982, 1985a, 1997). Based on fMRI analysis of learning arbitrary visual-response associations, increased activity in the dorsal premotor cortex (Toni, Ramnani, Josephs, Ashburner, & Passingham, 2001) has been observed in addition to supplementary motor cortex interacting with dorsolateral prefrontal cortex (Boettiger & D’Esposito, 2005).

In a different set of studies, it has been shown that rats with lesions of the PL/IL, but not of the AC and PC cortex, fail to acquire an object-place association (Kesner & Ragozzino, 2003). In a subsequent study, Lee and Solivan (2008) showed that temporary inactivation of the PL/IL cortex with muscimol led to profound impairments in an object-place paired association task. Furthermore, impairment in a novelty detection paradigm using an object-in-place learning task has been observed in rats with PL/IL lesions (Barker, Bird, Alexander, & Warburton, 2007).

Based on a different set of arbitrary associations, it has been shown in rats that lesions of the AI/LO cortex impair the learning of a cross-modal association involving odor and tactile stimuli (Whishaw, Tomie, & Kolb, 1992). Furthermore, based on single unit recording within the LO cortex of rats, it was found that many neurons were active during odor-location learning. In monkeys, in the orbital frontal cortex, many neurons were activated by both taste and odor stimuli (Rolls & Baylis, 1994), suggesting that flavor may be processed by the orbital frontal cortex, leading to pleasant experiences often associated with reward. In humans, Small et al. (1999) showed that based on fMRI data there was activation of the orbital frontal cortex during the processing of taste and odor information.

In the context of other types of paired associate learning, Petrides (1985b) has shown that humans with PFC lesions have difficulty in learning a paired associate task, and Pigott and Milner (1993) reported that frontal lobe damaged patients are impaired for objects and places in a complex visual scene task. Also, Klingberg and Roland (1998) showed that based on PET scans, the dorsal lateral prefrontal and anterior cingulate cortex are activated during new learning of visual-auditory paired associates.

There are many interactions among the three systems. I will present just one example, namely an interaction between the prefrontal cortex (rule-based system) and hippocampus (event-based memory system) in the context of temporal processing of short-term memory information. In this study Lee and Kesner (2003) examined the dynamic interactions between the prefrontal cortex and hippocampus by training and testing rats on a delayed non-matching-to-place task on a radial 8-arm maze requiring memory for a single spatial location following short-term (i.e., 10 s or 5 min) delays. The results showed that inactivating both regions at the same time resulted in a severe impairment of short-term and intermediate-term memory for spatial information, suggesting that at least one of the structures needs to function properly for intact processing of short-term or intermediate-term spatial memory. Thus, the two regions interact with each other to ensure the processing of spatial information over a dynamic temporal range including both short-term and intermediate-term memory. These results provide compelling evidence indicating that a mnemonic time window is a critical factor in dissociating the function of the hippocampal system from that of the medial prefrontal cortex in a delayed choice task. That is, the dorsal hippocampus and medial prefrontal cortex appear to process spatial memory in parallel within a short-term range, whereas the dorsal hippocampal function becomes more essential once the critical time window requires spatial memory for a time period exceeding that range.

In summary, within the rule-based memory system, different brain regions process different attributes in support of paired associate learning. Data are presented to support this assertion by demonstrating that the PC cortex mediates associative processes based on visual conditional response learning, the PL/IL cortex mediates associative processes based on visual-spatial learning, and the AI/LO cortex mediates associative processes based primarily on odor-taste learning. PPC mediates the spatial attribute for spatial perceptual information and the TE2 cortex mediates the sensory-perceptual attribute for visual object information. Parallel results are found in monkeys and humans in that the premotor cortex mediates associative processes based on visual conditional response learning, the dorsal and ventral lateral prefrontal cortex mediate object-place learning, and the orbital prefrontal cortex mediates odor-taste associations.

Other Theories of Memory

I will compare the tripartite attribute model with the models of O’Keefe and Nadel (1978) and Moscovitch et al. (2005); Squire (1994) and Squire et al. (2004); and Cohen and Eichenbaum (1993), Eichenbaum (2004), and Eichenbaum et al. (2007). I will apply Schacter and Tulving’s (1994) suggestion that one needs to define memory systems in terms of the kind of information to be represented, the processes associated with the operation of each system, and the neurobiological substrates including neural structures and mechanisms that subserve each system.

O’Keefe and Nadel (1978) have concentrated on the role of the hippocampus as the neurobiological substrate in processing spatial and contextual information and more recently Nadel and Moscovitch (1998) have suggested that the hippocampus stores episodic memory. They do suggest that transfer of episodic information to the neocortex can occur to store semantic memories. The tripartite attribute model does not accept that all episodic (event-based) memories are stored in the hippocampus and emphasizes that the hippocampus supports temporal, odor, and language information in addition to space and context. The tripartite attribute model also states that new information is stored in a semantic or knowledge-based memory system via a consolidation process, but the model emphasizes that different neocortical areas store different attributes of memory and that the knowledge-based system utilizes processes different from those of the event-based memory system, such as perception, selective attention, and implicit memory. The Nadel model does not incorporate the prefrontal cortex.

Squire proposes a declarative versus non-declarative long-term memory model (Squire, 1994; Squire et al., 2004). It is assumed that the declarative memory system is based on explicit information that is easily accessible and is concerned with specific facts or data. It includes episodic and semantic representations of propositions and images. On the other hand, the non-declarative memory system is based on implicit information that is not easily accessible and includes unaware representations of motor, perceptual, and cognitive skills, as well as priming, simple classical conditioning, and non-associative learning. In this model the hippocampus and interconnected neural regions, such as the perirhinal cortex, postrhinal/parahippocampal gyrus, and entorhinal cortex, make up the medial temporal lobe, which is assumed to be the critical neural substrate in mediating all forms of memory within the declarative memory system. The non-declarative memory system includes the mediation of skills and habits by the striatum, priming by the neocortex, simple classical conditioning of emotional responses by the amygdala, simple classical conditioning of skeletal musculature by the cerebellum, and non-associative learning by reflex pathways. It is assumed that the two memory systems are independent of each other. The Squire model assumes that the declarative memory system is based on conscious awareness and is involved in consolidation, and that all the areas of the medial temporal lobe are critical for recollection of all types of sensory information. In the tripartite attribute model, I do not differentiate the attributes on the basis of conscious awareness. Furthermore, I assume that for each attribute, the same processes operate in the event-based memory system and thus, non-declarative memory does not operate in the tripartite attribute model.
In my model, the perirhinal cortex and hippocampus subserve different attribute information, such as object information and spatial information, respectively. Also, the prefrontal cortex is not usually incorporated in a specific memory system and the emphasis has not been on the different attributes of memory.

Cohen and Eichenbaum (1993) and Eichenbaum (2004) propose that the declarative memory system is dependent upon the hippocampus, which provides a substrate for relational representation of all forms of memory including conjunctive, configural and arbitrary associations, as well as representational flexibility allowing for the retrieval of memories in novel situations. In contrast, the procedural system is independent of the hippocampus and is characterized by individual representations and inflexibility in retrieving memories in novel situations. In contrast to this declarative/non-declarative perspective, relational memory theory proposes that the medial temporal lobe function is independent of conscious awareness.

Recently, a model for episodic recognition memory was proposed and extended by Eichenbaum et al. (2007). This model, called the Binding Items and Context (BIC) model, proposed that information pertaining to item identity (i.e., “what”) resides primarily in the perirhinal cortex and information pertaining to the context wherein an item was experienced (i.e., “where”) resides primarily in the postrhinal/parahippocampal cortex. The item and context information are transmitted through the lateral and medial entorhinal cortices, respectively, and they enter the hippocampus, at which point the item and the context are bound together into coherent episodes. I assume that for each attribute the same processes operate in the event-based memory system and thus, the different components of the medial temporal lobe utilize similar functions. Even though Eichenbaum is clearly aware of the importance of the prefrontal cortex in its interactions with the hippocampus, the prefrontal cortex is not included in the overall memory model.

Summary

Memory is a complex phenomenon due to a large number of potential interactions that are associated with the organization of memory at the psychological and neural system level. Thus, it is not surprising that there are many different models of memory (see the Other Theories of Memory section). In the Kesner tripartite, multiple attribute, multiple process memory model, different forms of memory and their neurobiological underpinnings are represented in terms of the nature, structure, or content of information representation as a set of different attributes including language, time, place, response, reward value (affect), and visual object as an example of sensory-perception. For each attribute, information is processed in the event-based memory system through a variety of operations but especially for short-term and intermediate-term memory and pattern separation based on orthogonalization of specific attribute information. In addition, for each attribute, information is processed in the knowledge-based system through a variety of operations, but especially for long-term storage and perceptual memory. Finally, for each attribute, it is assumed that information is processed in the rule-based memory system through the integration of information from the event-based and knowledge-based memory systems for use in major processes that include especially short-term or working memory and paired associate learning. The neural systems that subserve specific attributes within a system can operate independently of each other, even though there are also many possibilities for interactions among the attributes. Although the event-based and knowledge-based memory systems are supported by different neural substrates and different operating characteristics, suggesting that the two systems can operate independently of each other, there are also important interactions between the two systems, especially during the consolidation of new information and retrieval of previously stored information.
Finally, because it is assumed that the rule-based system is influenced by the integration of event-based and knowledge-based memory information, there should be important interactions between the event-based and knowledge-based memory systems and the rule-based memory system. Thus, for each attribute, there is a neural circuit that encompasses all three memory systems in representing specific attribute information. In general, the tripartite attribute memory model represents the most comprehensive memory model capable of integrating the extant knowledge concerning the neural system representation of memory.

It is important to note the new information that has been obtained during the last decade. For the event-based memory system, there has been (a) an extensive elaboration of the temporal attribute in terms of memory for duration, memory for sequential spatial processing, and memory for the sensory-perceptual attribute based on the operation of short-term or working memory, and (b) an emphasis on the characterization of an attribute-based pattern separation process. For the knowledge-based memory system, only a few studies have been added. For the rule-based memory system, there has been a more detailed delineation of the roles of different subregions of the prefrontal cortex in mediating working memory and paired associate learning, as well as a discussion of potential interactions between the event-based and rule-based memory systems.


References

Adolphs, R., Tranel, D., Damasio, H., & Damasio, A. (1994). Impaired recognition of emotion in facial expressions following bilateral damage to the human amygdala. Nature, 372, 669-672. doi.org/10.1038/372669a0 PMid:7990957

Aggleton, J. P., Hunt, P. R., & Rawlins, J. N. P. (1986). The effects of hippocampal lesions upon spatial and non-spatial tests of working memory. Behavioural Brain Research, 19, 133-146. doi.org/10.1016/0166-4328(86)90011-2

Aimone, J. B., Wiles, J., & Gage, F. H. (2006). Potential role for adult neurogenesis in the encoding of time in new memories. Nature Neuroscience, 9, 723–727. doi.org/10.1038/nn1707 PMid:16732202

Baeg, E. H., Kim, Y. B., Huh, K., Mook-Jung, I., Kim, H. T., & Jung, M. W. (2003). Dynamics of population code for working memory in the prefrontal cortex. Neuron, 40, 177-188. doi.org/10.1016/S0896-6273(03)00597-X

Banks, W. P. (1978). Encoding and processing of symbolic information in comparative judgements. In G. H. Bower (Ed.), The psychology of learning and motivation: Advances in theory and research (pp. 101–159). New York: Academic.

Barker, G. R. I., Bird, F., Alexander, V., & Warburton, E. C. (2007). Recognition memory for objects, place, and temporal order: A disconnection analysis of the role of the medial prefrontal cortex and perirhinal cortex. Journal of Neuroscience, 27, 2948-2957. doi.org/10.1523/JNEUROSCI.5289-06.2007 PMid:17360918

Baumann, O., Chan, E., & Mattingley, J. B. (2012). Distinct neural networks underlie encoding of categorical versus coordinate spatial relations during active navigation. NeuroImage, 60, 1630-1637. doi.org/10.1016/j.neuroimage.2012.01.089 PMid:22300811

Benton, A. L. (1969). Disorder of spatial orientation. In P. J. Vinken & G. W. Bruyn (Eds.), Handbook of clinical neurology (Vol. 3). Amsterdam: North Holland.

Boettiger, C. A., & D’Esposito, M. (2005). Frontal networks for learning and executing arbitrary stimulus-response associations. Journal of Neuroscience, 25, 2723-2732. doi.org/10.1523/JNEUROSCI.3697-04.2005 PMid:15758182

Brito, G. N. O., & Brito, L. S. O. (1990). Septohippocampal system and the prelimbic sector of frontal cortex: A neuropsychological battery analysis in the rats. Behavioural Brain Research, 36, 52-58. doi.org/10.1016/0166-4328(90)90167-D

Buffalo, E. A., Ramus, S. J., Clark, R. E., Teng, E., Squire, L. R., & Zola, S. M. (1999). Dissociation between the effects of damage to perirhinal cortex and area TE. Learning and Memory, 6, 572-599. doi.org/10.1101/lm.6.6.572 PMid:10641763 PMCid:311316

Bussey, T. J., Saksida, L. M., & Murray, E. A. (2002). Perirhinal cortex resolves feature ambiguity in complex discriminations. European Journal of Neuroscience, 15, 365–374. doi.org/10.1046/j.0953-816x.2001.01851.x PMid:11849302

Butters, N., & Pandya, D. (1969). Retention of delayed-alternation: Effect of selective lesions of sulcus principalis. Science, 165, 1271-1273. doi.org/10.1126/science.165.3899.1271 PMid:4979528

Cave, C. B., & Squire, L. R. (1992). Intact verbal and nonverbal short-term memory following damage to the human hippocampus. Hippocampus, 2, 151-163. doi.org/10.1002/hipo.450020207 PMid:1308180

Chang, J. Y., Chen, L., Luo, F., Shi, L. H., & Woodward, D. J. (2002). Neuronal responses in the frontal cortico-basal ganglia system during delayed matching-to-sample task: Ensemble recording in freely moving rats. Experimental Brain Research, 142, 67-80. doi.org/10.1007/s00221-001-0918-3 PMid:11797085

Chen, L. L., Lin, L., Barnes, C. A., & McNaughton, B. L. (1994). Head-direction cells in the rat posterior cortex II: Contributions of visual and ideothetic information to the directional firing. Experimental Brain Research, 101, 24-34. doi.org/10.1007/BF00243213 PMid:7843299

Chiba, A. A., Johnson, D. L., & Kesner, R. P. (1992). The effects of lesions of the dorsal hippocampus or the ventral hippocampus on performance of a spatial location order recognition task. Society for Neuroscience Abstracts, 18, 1422.

Chiba, A. A., Kesner, R. P., & Jackson, P. (2002). Two forms of spatial memory: A double dissociation between the parietal cortex and the hippocampus in the rat. Behavioral Neuroscience, 116, 874-883. doi.org/10.1037/0735-7044.116.5.874 PMid:12369807

Chiba, A. A., Kesner, R. P., Matsuo, F., & Heilbrun, M. P. (1990). A dissociation between verbal and spatial memory following unilateral temporal lobectomy. Society for Neuroscience Abstracts, 16, 286.

Chiba, A. A., Kesner, R. P., Matsuo, F., & Heilbrun, M. P. (1993). A dissociation between affect and recognition following unilateral temporal lobectomy including the amygdala. Society for Neuroscience Abstracts, 19, 792.

Chiba, A. A., Kesner, R. P., & Reynolds, A. M. (1994). Memory for spatial location as a function of temporal lag in rats: Role of hippocampus and medial prefrontal cortex. Behavioral and Neural Biology, 61, 123–131. doi.org/10.1016/S0163-1047(05)80065-2

Clelland, C. D., et al. (2009). A functional role for adult hippocampal neurogenesis in spatial pattern separation. Science, 325, 210–213. doi.org/10.1126/science.1173215 PMid:19590004 PMCid:2997634

Cohen, N. J., & Eichenbaum, H. B. (1993). Memory, amnesia, and hippocampal function. Cambridge: MIT Press.

Colombo, P. J., Davis, H. P., & Volpe, B. T. (1989). Allocentric spatial and tactile memory impairments in rats with dorsal caudate lesions are affected by preoperative behavioral training. Behavioral Neuroscience, 103, 1242-1250. doi.org/10.1037/0735-7044.103.6.1242 PMid:2610917

Cook, D., & Kesner, R. P. (1988). Caudate nucleus and memory for egocentric localization. Behavioral and Neural Biology, 49, 332-343. doi.org/10.1016/S0163-1047(88)90338-X

Dade, L. A., Zatorre, R. J., Evans, A. C., & Jones-Gotman, M. (2001). Working memory in another dimension: Functional imaging of human olfactory working memory. NeuroImage, 14, 650-660. doi.org/10.1006/nimg.2001.0868 PMid:11506538

Davis, J. D., Filoteo, J. V., Kesner, R. P., & Roberts, J. W. (2003). Recognition memory for hand positions and spatial locations in patients with Huntington’s disease: Differential visuospatial memory impairment? Cortex, 39, 239-253. doi.org/10.1016/S0010-9452(08)70107-2

Davis, J. D., Filoteo, J. V., & Kesner, R. P. (2007). Is short-term memory for discrete arm movements impaired in Huntington’s disease? Cortex, 43, 255-263. doi.org/10.1016/S0010-9452(08)70480-5

Dean, P. (1990). Sensory cortex: Visual perceptual functions. In B. Kolb & R. C. Tees (Eds.), Cerebral cortex of the rat (pp. 275-308). Cambridge: MIT Press. PMid:1975555

DeCoteau, W. E., Hoang, L., Huff, L., Stone, A., & Kesner, R. P. (2004). Effects of hippocampus and medial caudate nucleus lesions on memory for direction information in rats. Behavioral Neuroscience, 118, 540-545. doi.org/10.1037/0735-7044.118.3.540 PMid:15174931

DeCoteau, W. E., & Kesner, R. P. (1998). Effects of hippocampal and parietal cortex lesions on the processing of multiple object scenes. Behavioral Neuroscience, 112, 68-82. doi.org/10.1037/0735-7044.112.1.68 PMid:9517816

DeCoteau, W. E., Kesner, R. P., & Williams, J. M. (1997). Short-term memory for food reward magnitude: The role of the prefrontal cortex. Behavioural Brain Research, 88, 239-249. doi.org/10.1016/S0166-4328(97)00044-2

Delatour, B., & Gisquet-Verrier, P. (2001). Involvement of the dorsal anterior cingulate cortex in temporal behavioral sequencing: Subregional analysis of the medial prefrontal cortex in rat. Behavioural Brain Research, 126, 105-114. doi.org/10.1016/S0166-4328(01)00251-0

D’Esposito, M., Aguirre, G. K., Zarahn, E., Ballard, D., Shin, R. K., & Lease, J. (1998). Functional MRI studies of spatial and nonspatial working memory. Cognitive Brain Research, 7, 1-13. doi.org/10.1016/S0926-6410(98)00004-4

De Renzi, E. (1982). Disorders of space exploration and cognition. New York: Wiley.

DiMattia, B. V., & Kesner, R. P. (1988a). The role of the posterior parietal association cortex in the processing of spatial event information. Behavioral Neuroscience, 102, 397-403. doi.org/10.1037/0735-7044.102.3.397 PMid:3395449

DiMattia, B. V., & Kesner, R. P. (1988b). Spatial cognitive maps: Differential role of parietal cortex and hippocampal formation. Behavioral Neuroscience, 102, 471-480. doi.org/10.1037/0735-7044.102.4.471 PMid:3166721

Di Pietro, N. C., Black, Y. D., Green-Jordan, K., Eichenbaum, H. B., & Kantak, K. M. (2004). Complementary tasks to measure working memory in distinct prefrontal cortex subregions in rats. Behavioral Neuroscience, 118, 1042-1051. doi.org/10.1037/0735-7044.118.5.1042 PMid:15506886

Disterhoft, J. F., Carrillo, M. C., Hopkins, R. O., Gabrieli, J. D. E., & Kesner, R. P. (1996). Impaired trace eyeblink conditioning in severe medial temporal lobe amnesics. Society for Neuroscience Abstracts, 22, 1866.

Divac, I., Rosvold, H. E., & Szwarcbart, M. K. (1967). Behavioral effects of selective ablation of the caudate nucleus. Journal of Comparative and Physiological Psychology, 63, 184-190. doi.org/10.1037/h0024348 PMid:4963561

Douglas, R. J., & Pribram, K. H. (1966). Learning and limbic lesions. Neuropsychology, 4, 197-220. doi.org/10.1016/0028-3932(66)90028-5

Dunnett, S. B. (1990). Role of prefrontal cortex and striatal output systems in short-term memory deficits associated with ageing, basal forebrain lesions, and cholinergic-rich grafts. Canadian Journal of Psychology, 44, 210-232. doi.org/10.1037/h0084240 PMid:2383812

Eichenbaum, H. (2004). Hippocampus: Cognitive processes and neural representations that underlie declarative memory. Neuron, 44, 109-120. doi.org/10.1016/j.neuron.2004.08.028 PMid:15450164

Eichenbaum, H., Yonelinas, A. P., & Ranganath, C. (2007). The medial temporal lobe and recognition memory. Annual Review of Neuroscience, 30, 123-152. doi.org/10.1146/annurev.neuro.30.051606.094328 PMid:17417939 PMCid:2064941

Ellis, A. X., Sala, S. D., & Logie, R. H. (1996). The Bailiwick of visuo-spatial working memory: evidence from unilateral spatial neglect. Cognitive Brain Research, 3, 71-78. doi.org/10.1016/0926-6410(95)00031-3

Ennaceur, A., Neave, N., & Aggleton, J. P. (1997). Spontaneous object recognition and object location memory in rats: The effects of lesions in the cingulate cortices, the medial prefrontal cortex, the cingulum bundle and the fornix. Experimental Brain Research, 113, 509-519. doi.org/10.1007/PL00005603

Estes, W. K. (1986). Memory for temporal information. In J. A. Michon & J. L. Jackson (Eds.), Time, mind and behavior (pp. 151–168). New York: Springer-Verlag.

Flaherty, C. F., Turovsky, J., & Krauss, K. L. (1994). Relative hedonic value modulates anticipatory contrast. Physiology and Behavior, 55, 1047–1054. doi.org/10.1016/0031-9384(94)90386-7

Fortin, N. J., Agster, K. L., & Eichenbaum, H. B. (2002). Critical role of the hippocampus in memory for sequences of events. Nature Neuroscience, 5, 458-462. PMid:11976705

Funahashi, S., Bruce, C. J., & Goldman-Rakic, P. S. (1993). Dorsolateral prefrontal lesions and oculomotor delayed-response performance: Evidence for mnemonic “scotomas.” Journal of Neuroscience, 13, 1479-1497. PMid:8463830

Fuster, J. M. (1995). Memory in the cerebral cortex: An empirical approach to neural networks in the human and nonhuman primate. Cambridge: MIT Press.

Fuster, J. M., Bauer, R. H., & Jervey, J. P. (1982). Cellular discharge in the dorsolateral prefrontal cortex of the monkey in cognitive tasks. Cerebral Cortex, 4, 443-450.

Gaffan, D. (1992). Amygdala and the memory of reward. In J. P. Aggleton (Ed.), The amygdala: Neurobiological aspects of emotion, memory, and mental dysfunction. New York: Wiley-Liss.

Gaffan, D., & Murray, E. A. (1992). Monkeys (Macaca fascicularis) with rhinal cortex ablations succeed in object discrimination learning despite 24-hr intertrial intervals and fail at matching to sample despite double sample presentations. Behavioral Neuroscience, 106, 30-38. doi.org/10.1037/0735-7044.106.1.30 PMid:1554436

Gilbert, P. E., & Kesner, R. P. (2002). The amygdala but not the hippocampus is involved in pattern separation based on reward value. Neurobiology of Learning and Memory, 77, 338–353. doi.org/10.1006/nlme.2001.4033 PMid:11991762

Gilbert, P. E., & Kesner, R. P. (2003). Recognition memory for complex visual discrimination is influenced by stimulus interference in rodents with perirhinal cortex damage. Learning and Memory, 10, 525–530. doi.org/10.1101/lm.64503 PMid:14657264 PMCid:305468

Gilbert, P. E., Kesner, R. P., & Lee, I. (2001). Dissociating hippocampal subregions: A double dissociation between the dentate gyrus and CA1. Hippocampus, 11, 626–636. doi.org/10.1002/hipo.1077 PMid:11811656

Goldman, P. S., & Rosvold, H. E. (1970). Localization of function within the dorsolateral prefrontal cortex of the rhesus monkey. Experimental Neurology, 27, 291-304. doi.org/10.1016/0014-4886(70)90222-0

Goldman-Rakic, P. S. (1987). Circuitry of primate prefrontal cortex and regulation of behavior by representational memory. In F. Plum & V. Mountcastle (Eds.), Handbook of physiology: The nervous system (pp. 373-417). Bethesda: American Physiological Society.

Goldman-Rakic, P. S. (1996). The prefrontal landscape: Implications of functional architecture for understanding human mentation and the central executive. Philosophical Transactions of the Royal Society of London, Biology, 351, 1445-1453. doi.org/10.1098/rstb.1996.0129 PMid:8941956

Goodrich-Hunsaker, N. J., Hunsaker, M. R., & Kesner, R. P. (2005). Dissociating the role of the parietal cortex and dorsal hippocampus for spatial information processing. Behavioral Neuroscience, 119, 1307-1315. doi.org/10.1037/0735-7044.119.5.1307 PMid:16300437

Granon, S., Vidal, C., Thinus-Blanc, C., Changeux, J.-P., & Poucet, B. (1994). Working memory, response selection, and effortful processing in rats with medial prefrontal lesions. Behavioral Neuroscience, 108, 883-891. doi.org/10.1037/0735-7044.108.5.883 PMid:7826511

Gross, C. G. (1973). Inferotemporal cortex and vision. In E. Stellar & J. M. Sprague (Eds.), Progress in physiological psychology (Vol. 5, pp. 77-123). New York: Academic.

Halsband, U., & Freund, H. J. (1990). Premotor cortex and conditional motor learning in man. Brain, 113, 207-222. doi.org/10.1093/brain/113.1.207 PMid:2302533

Halsband, U., & Passingham, R. E. (1985). Premotor cortex and the conditions for movement in monkeys (Macaca mulatta). Behavioural Brain Research, 18, 269-276. doi.org/10.1016/0166-4328(85)90035-X

Hampson, R. E., Heyser, C. J., & Deadwyler, S. A. (1993). Hippocampal cell firing correlates of delayed-match-to-sample performance in the rat. Behavioral Neuroscience, 107, 715-739. doi.org/10.1037/0735-7044.107.5.715 PMid:8280383

Hannesson, D. K., Howland, J. G., & Phillips, A. G. (2004). Interaction between perirhinal and medial prefrontal cortex is required for temporal order but not recognition memory. Journal of Neuroscience, 24, 4596–4603. doi.org/10.1523/JNEUROSCI.5517-03.2004 PMid:15140931

Harrison, L. D., & Mair, R. G. (1996). A comparison of the effects of frontal cortical and thalamic lesions on measures of spatial learning and memory in the rat. Behavioural Brain Research, 75, 195-206. doi.org/10.1016/0166-4328(96)00173-8

Ho, J. W.-T., Narduzzo, K. E., Outram, A., Tinsley, C. J., Henley, J. M., Warburton, E. C., & Brown, M. W. (2011). Contributions of area Te2 to rat recognition memory. Learning and Memory, 18, 493-501. doi.org/10.1101/lm.2167511 PMid:21700715 PMCid:3125610

Hoge, J., & Kesner, R. P. (2007). Role of CA3 and CA1 subregions of the dorsal hippocampus on the temporal processing of objects. Neurobiology of Learning and Memory, 88, 225–231. doi.org/10.1016/j.nlm.2007.04.013 PMid:17560815 PMCid:2095779

Holden, H. M., Hoebel, C., Loftis, K., & Gilbert, P. E. (2012). Spatial pattern separation in cognitively normal young and older adults. Hippocampus. Advanced online publication. doi.org/10.1002/hipo.22017 PMid:22467270

Hopkins, R. O., & Kesner, R. P. (1993). Memory for temporal and spatial distances for new and previously learned geographical information in hypoxic subjects. Paper presented at the Society for Neuroscience Meeting, Washington, DC.

Hopkins, R. O., & Kesner, R. P. (1994). Short-term memory for duration in hypoxic subjects. Society for Neuroscience Abstracts, 20, 1075.

Hopkins, R. O., Kesner, R. P., & Goldstein, M. (1995a). Item and order recognition memory for words, pictures, abstract pictures, spatial locations, and motor responses in subjects with hypoxic brain injury. Brain and Cognition, 27, 180-201. doi.org/10.1006/brcg.1995.1016 PMid:7772332

Hopkins, R. O., Kesner, R. P., & Goldstein, M. (1995b). Memory for novel and familiar spatial and linguistic temporal distance information in hypoxic subjects. Journal of the International Neuropsychological Society, 1, 454-468. doi.org/10.1017/S1355617700000552 PMid:9375231

Horel, J. A., Pytko-Joiner, D. E., Boytko, M. L., & Salsbury, K. (1987). The performance of visual tasks while segments of the inferotemporal cortex are suppressed by cold. Behavioural Brain Research, 23, 29-42. doi.org/10.1016/0166-4328(87)90240-3

Horst, N. K., & Laubach, M. (2009). The role of rat dorsomedial prefrontal cortex in spatial working memory. Neuroscience, 164, 444-456. doi.org/10.1016/j.neuroscience.2009.08.004 PMid:19665526 PMCid:2761984

Hunsaker, M. R., Thorup, J. A., Welch, T., & Kesner, R. P. (2006). The role of CA3 and CA1 in the acquisition of an object-trace-place paired associate task. Behavioral Neuroscience, 120, 1252-1256. doi.org/10.1037/0735-7044.120.6.1252 PMid:17201469

Hunsaker, M. R., Fieldsted, P. M., Rosenberg, J. S., & Kesner, R. P. (2008). Dissociating the roles of dorsal and ventral CA1 for the temporal processing of spatial locations, visual objects, and odors. Behavioral Neuroscience, 122, 643–650. doi.org/10.1037/0735-7044.122.3.643 PMid:18513134

Hunsaker, M. R., Mooy, G. G., Swift, J. S., & Kesner, R. P. (2007). Dissociations of the medial and lateral perforant path projections into dorsal DG, CA3, and CA1 for spatial and nonspatial (visual object) information processing. Behavioral Neuroscience, 121, 742-750. doi.org/10.1037/0735-7044.121.4.742 PMid:17663599

Jackson-Smith, P., Kesner, R. P., & Chiba, A. A. (1993). Continuous recognition of spatial and nonspatial stimuli in hippocampal lesioned rats. Behavioral and Neural Biology, 59, 107-119. doi.org/10.1016/0163-1047(93)90821-X

Jackson, P. A., Kesner, R. P., & Amann, K. (1998). Memory for duration: Role of hippocampus and medial prefrontal cortex. Neurobiology of Learning and Memory, 70, 328-348. doi.org/10.1006/nlme.1998.3859 PMid:9774525

Jeffery, K. J., & Anderson, M. I. (2003). Dissociation of the geometric and contextual influences on place cells. Hippocampus, 13, 868–872. doi.org/10.1002/hipo.10162 PMid:14620882

Jones, B., & Mishkin, M. (1972). Limbic lesions and the problem of stimulus-reinforcement associations. Experimental Neurology, 36, 362-377. doi.org/10.1016/0014-4886(72)90030-1

Jones-Gotman, M., & Zatorre, R. J. (1993). Odor recognition memory in humans: Role of right temporal and orbitofrontal regions. Brain and Cognition, 22, 182-198. doi.org/10.1006/brcg.1993.1033 PMid:8373572

Keane, M. M., Gabrieli, J. D. E., Mapstone, H. C., Johnson, K. A., & Corkin, S. (1995). Double dissociation of memory capacities after bilateral occipital-lobe or medial temporal-lobe lesions. Brain, 118, 1129-1148. doi.org/10.1093/brain/118.5.1129 PMid:7496775

Kesner, R. P. (1998). Neurobiological views of memory. In J. L. Martinez & R. P. Kesner (Eds.), Neurobiology of learning and memory (pp. 361-416). San Diego: Academic. doi.org/10.1016/B978-012475655-7/50011-3

Kesner, R. P. (2000). Subregional analysis of mnemonic functions of the prefrontal cortex in the rat. Psychobiology, 28, 219-228.

Kesner, R. P. (2002). Memory neurobiology. In V. S. Ramachandran (Ed.), Encyclopedia of the human brain (Vol. 2, pp. 783-796). San Diego: Academic. doi.org/10.1016/B0-12-227210-2/00200-4

Kesner, R. P., Bolland, B. L., & Dakis, M. (1993). Memory for spatial locations, motor responses, and objects: Triple dissociation among the hippocampus, caudate nucleus, and extrastriate visual cortex. Experimental Brain Research, 93, 462-470. doi.org/10.1007/BF00229361 PMid:8519335

Kesner, R. P., Farnsworth, G., & Kametani, H. (1992). Role of parietal cortex and hippocampus in representing spatial information. Cerebral Cortex, 1, 367-373. doi.org/10.1093/cercor/1.5.367

Kesner, R. P., Hunsaker, M. R., & Gilbert, P. E. (2005). The role of CA1 in the acquisition of an object-trace-odor paired associate task. Behavioral Neuroscience, 119, 781-786. doi.org/10.1037/0735-7044.119.3.781 PMid:15998199

Kesner, R. P., & Gilbert, P. E. (2006). The role of the medial caudate nucleus, but not the hippocampus, in a matching-to-sample task for a motor response. European Journal of Neuroscience, 23, 1888–1894. doi.org/10.1111/j.1460-9568.2006.04709.x PMid:16623845

Kesner, R. P., Gilbert, P. E., & Barua, L. A. (2002). The role of the hippocampus in memory for the temporal order of a sequence of odors. Behavioral Neuroscience, 116, 286–290. doi.org/10.1037/0735-7044.116.2.286 PMid:11996313

Kesner, R., & Hopkins, R. (2001). Short-term memory for duration and distance in humans: Role of the hippocampus. Neuropsychology, 15, 58-68. doi.org/10.1037/0894-4105.15.1.58 PMid:11216890

Kesner, R. P., Hunsaker, M. R., & Ziegler, W. (2010). The role of the dorsal CA1 and ventral CA1 in memory for the temporal order of a sequence of odors. Neurobiology of Learning and Memory, 93, 111-116. doi.org/10.1016/j.nlm.2009.08.010 PMid:19733676

Kesner, R. P., Hunt, M. E., Williams, J. M., & Long, J. M. (1996). Prefrontal cortex and working memory for spatial response, spatial location, and visual object information in the rat. Cerebral Cortex, 6, 311-318. doi.org/10.1093/cercor/6.2.311 PMid:8670659

Kesner, R. P., Lee, I., & Gilbert, P. (2004). A behavioral assessment of hippocampal function based on a subregional analysis. Reviews in the Neurosciences, 15, 333–351. doi.org/10.1515/REVNEURO.2004.15.5.333 PMid:15575490

Kesner, R. P., & Ragozzino, M. E. (2003). The role of the prefrontal cortex in object-place learning: A test of the attribute specificity model. Behavioural Brain Research, 146, 159-165. doi.org/10.1016/j.bbr.2003.09.024 PMid:14643468

Kesner, R. P., Ravindranathan, A., Jackson, P., Giles, R., & Chiba, A. A. (2001). A neural circuit analysis of visual object recognition memory: Role of perirhinal, medial and lateral entorhinal cortex. Learning & Memory, 8, 87-95. doi.org/10.1101/lm.29401 PMid:11274254 PMCid:311369

Kesner, R. P., Walser, R. D., & Winzenried, G. (1989). Central but not basolateral amygdala mediates memory for positive affective experiences. Behavioural Brain Research, 33, 189-195. doi.org/10.1016/S0166-4328(89)80050-6

Kesner, R. P., & Williams, J. M. (1995). Memory for magnitude of reinforcement: Dissociation between the amygdala and hippocampus. Neurobiology of Learning and Memory, 64, 237-244. doi.org/10.1006/nlme.1995.0006 PMid:8564377

Kinoshita, S., Yokohama, C., Masaki, D., Yamashita, T., Tsuchida, H., Nakatomi, Y., & Fukui, K. (2008). Effects of rat medial prefrontal cortex lesions on olfactory serial reversal and delayed alternation tasks. Neuroscience Research, 60, 213-218. doi.org/10.1016/j.neures.2007.10.012 PMid:18077035

Kirwan, C. B., & Stark, C. E. L. (2007). Overcoming interference: An fMRI investigation of pattern separation in the medial temporal lobe. Learning and Memory, 14, 625–633. doi.org/10.1101/lm.663507 PMid:17848502 PMCid:1994079

Klingberg, T., & Roland, P. E. (1998). Right prefrontal activation during encoding, but not during retrieval, in a nonverbal paired-associates task. Cerebral Cortex, 8, 73-79. doi.org/10.1093/cercor/8.1.73 PMid:9510387

Kolb, B., Sutherland, R. J., & Whishaw, I. Q. (1983). A comparison of the contributions of the frontal and parietal association cortex to spatial localization in rats. Behavioral Neuroscience, 97, 13-27. doi.org/10.1037/0735-7044.97.1.13 PMid:6838719

Kubie, J. L., & Ranck, J. B. (1983). Sensory-behavioral correlates in individual hippocampal neurons in three situations: Space and context. In W. Seifert (Ed.), Neurobiology of the hippocampus (pp. 433-447). New York: Academic.

Lee, I., Hunsaker, M. R., & Kesner, R. P. (2005b). The role of hippocampal subregions in detecting spatial novelty. Behavioral Neuroscience, 119, 145-153. doi.org/10.1037/0735-7044.119.1.145 PMid:15727520

Lee, I., Jerman, T. S., & Kesner, R. P. (2005a). Disruption of delayed memory for a sequence of spatial locations following CA1- or CA3-lesions of the dorsal hippocampus. Neurobiology of Learning and Memory, 84, 138-147. doi.org/10.1016/j.nlm.2005.06.002 PMid:16054848

Lee, I., & Kesner, R. P. (2003). Time-dependent relationship between the dorsal hippocampus and the prefrontal cortex in spatial memory. Journal of Neuroscience, 23, 1517-1523. PMid:12598640

Lee, I., Rao, G., & Knierim, J. J. (2004). A double dissociation between hippocampal subfields: Differential time course of CA3 and CA1 place cells for processing changed environments. Neuron, 42, 803-815. doi.org/10.1016/j.neuron.2004.05.010 PMid:15182719

Lee, I., & Solivan, F. (2008). The roles of the medial prefrontal cortex and hippocampus in a spatial paired-association task. Learning and Memory, 15, 357-367. doi.org/10.1101/lm.902708 PMid:18463175 PMCid:2364607

Long, J. M., & Kesner, R. P. (1996). The effects of dorsal vs. ventral hippocampal, total hippocampal, and parietal cortex lesions on memory for allocentric distance in rats. Behavioral Neuroscience, 110, 922-932. doi.org/10.1037/0735-7044.110.5.922 PMid:8918996

Long, J. M., Mellem, J. E., & Kesner, R. P. (1998). The effects of parietal cortex lesions on an object/spatial location paired-associate task in rats. Psychobiology, 26, 128-133.

Madsen, J., & Kesner, R. P. (1995). The temporal-distance effect in subjects with dementia of the Alzheimer type. Alzheimer Disease and Associated Disorders, 9, 94–100. doi.org/10.1097/00002093-199509020-00006 PMid:7662329

Marshuetz, C. (2005). Order information in working memory: An integrative review of evidence from brain and behavior. Psychological Bulletin, 131, 323–339. doi.org/10.1037/0033-2909.131.3.323 PMid:15869331

McCarthy, R. A., & Warrington, E. K. (1990). Cognitive psychology: A clinical introduction. London: Academic.

McNaughton, B. L., Barnes, C. A., & O’Keefe, J. (1983). The contributions of position, direction and velocity to single unit activity in the hippocampus of freely-moving rats. Experimental Brain Research, 52, 41-49. doi.org/10.1007/BF00237147 PMid:6628596

McNaughton, B. L., Chen, L. L., & Marcus, E. J. (1991). “Dead reckoning”, landmark learning, and the sense of direction: A neurophysiological and computational hypothesis. Journal of Cognitive Neuroscience, 3, 190-202. doi.org/10.1162/jocn.1991.3.2.190

Mehta, M. R., Barnes, C. A., & McNaughton, B. L. (1997). Experience-dependent, asymmetric expansion of hippocampal place fields. Proceedings of the National Academy of Sciences USA, 94, 8918-8921. doi.org/10.1073/pnas.94.16.8918

Mehta, M. R., Quirk, M. C., & Wilson, M. A. (2000). Experience-dependent asymmetric shape of hippocampal receptive fields. Neuron, 25, 707-715. doi.org/10.1016/S0896-6273(00)81072-7

Meck, W. H., Church, R. M., & Olton, D. S. (1984). Hippocampus, time and memory. Behavioral Neuroscience, 98, 3-22. doi.org/10.1037/0735-7044.98.1.3 PMid:6696797

Milner, B. (1971). Interhemispheric differences in the localization of psychological processes in man. British Medical Bulletin, 27, 272-277. PMid:4937273

Milner, A. D., Ockleford, E. M., & DeWar, W. (1977). Visuo-spatial performance following posterior parietal and lateral frontal lesions in stumptail macaques. Cortex, 13, 170-183.

Mishkin, M. (1957). Effects of small prefrontal lesions on delayed alternation in monkeys. Journal of Neurophysiology, 20, 615-622.

Mishkin, M., & Manning, F. J. (1978). Non-spatial memory after selective prefrontal lesions in monkeys. Brain Research, 143, 313-323. doi.org/10.1016/0006-8993(78)90571-1

Moscovitch, M., Rosenbaum, R. S., Gilboa, A., Addis, D. R., Westmacott, R., Grady, C., McAndrews, M. P., Levine, B., Black, S., Winocur, G., & Nadel, L. (2005). Functional neuroanatomy of remote episodic, semantic and spatial memory: A unified account based on multiple trace theory. Journal of Anatomy, 207, 35-66. doi.org/10.1111/j.1469-7580.2005.00421.x PMid:16011544 PMCid:1571502

Moser, E. I., Kropff, E., & Moser, M. B. (2008). Place cells, grid cells, and the brain’s spatial representation system. Annual Review of Neuroscience, 31, 69-89. doi.org/10.1146/annurev.neuro.31.061307.090723 PMid:18284371

Moyer, J. R. Jr., Deyo, R. A., & Disterhoft, J. F. (1990). Hippocampectomy disrupts trace eye-blink conditioning in rabbits. Behavioral Neuroscience, 104, 243-252. doi.org/10.1037/0735-7044.104.2.243 PMid:2346619

Muller, R. U., Ranck, J. B. Jr., & Taube, J. S. (1996). Head direction cells: Properties and functional significance. Current Opinion in Neurobiology, 6, 196-206. doi.org/10.1016/S0959-4388(96)80073-0

Mumby, D. G., & Pinel, J. P. J. (1994). Rhinal cortex lesions and object recognition in rats. Behavioral Neuroscience, 108, 11-18. doi.org/10.1037/0735-7044.108.1.11 PMid:8192836

Mumby, D. G., Wood, E. R., & Pinel, J. P. J. (1992). Object recognition memory is only mildly impaired in rats with lesions of the hippocampus and amygdala. Psychobiology, 20, 18-27.

Myers, C. E., Gluck, M. A., & Granger, R. (1995). Dissociation of hippocampal and entorhinal function in associative learning: A computational approach. Psychobiology, 23, 116-138.

Nadel, L., & Moscovitch, M. (1998). Hippocampal contributions to cortical plasticity. Neuropharmacology, 37, 431–439.

Norman, G., & Eacott, M. J. (2004). Impaired recognition with increasing levels of feature ambiguity in rats with perirhinal cortex lesions. Behavioural Brain Research, 148, 79-91. doi.org/10.1016/S0166-4328(03)00176-1

Oberg, R. G. E., & Divac, I. (1979). “Cognitive” functions of the neostriatum. In I. Divac & R. G. E. Oberg (Eds.), The neostriatum (pp. 291-313). Oxford: Pergamon.

O’Keefe, J., & Nadel, L. (1978). The hippocampus as a cognitive map. Oxford: Clarendon Press.

O’Keefe, J. (1983). Spatial memory within and without the hippocampal system. In W. Seifert (Ed.), Neurobiology of the hippocampus (pp. 375-403). London: Academic.

O’Keefe, J., & Burgess, N. (1996). Geometric determinants of the place field of hippocampal neurons. Nature, 381, 425–428. doi.org/10.1038/381425a0 PMid:8632799

O’Keefe, J., & Speakman, A. (1987). Single unit activity in the rat hippocampus during a spatial memory task. Experimental Brain Research, 68, 1-27. doi.org/10.1007/BF00255230

Olton, D. S. (1983). Memory functions and the hippocampus. In W. Seifert (Ed.), Neurobiology of the hippocampus. New York: Academic.

Olton, D. S. (1986). Hippocampal function and memory for temporal context. In R. L. Isaacson & K. H. Pribram (Eds.), The hippocampus (Vol. 3). New York: Plenum. doi.org/10.1007/978-1-4615-8024-9_9

Olton, D. S., Wenk, G. L., Church, R. M., & Weck, W. H. (1988). Attention and the frontal cortex as examined by simultaneous temporal processing. Neuropsychologia, 26, 307-318. doi.org/10.1016/0028-3932(88)90083-8

O’Reilly, R. C., & McClelland, J. L. (1994). Hippocampal conjunctive encoding, storage, and recall: Avoiding a trade-off. Hippocampus, 4, 661–682. doi.org/10.1002/hipo.450040605 PMid:7704110

Otto, T., & Eichenbaum, H. (1992). Complementary roles of the orbital prefrontal cortex and the perirhinal-entorhinal cortices in an odor-guided delayed-nonmatching-to-sample task. Behavioral Neuroscience, 106, 762-775. doi.org/10.1037/0735-7044.106.5.762 PMid:1445656

Owen, A. M. (2000). The role of the lateral frontal cortex in mnemonic processing: The contribution of functional neuroimaging. Experimental Brain Research, 133, 33-43. doi.org/10.1007/s002210000398

Parkinson, J. K., Murray, E. A., & Mishkin, M. (1988). A selective mnemonic role for the hippocampus in monkeys: Memory for the location of objects. Journal of Neuroscience, 8, 4159-4167. PMid:3183716

Partiot, A., Verin, M., Pillon, B., Teixeira-Ferreira, C., Agid, Y., & Dubois, B. (1996). Delayed response tasks in basal ganglia lesions in man: Further evidence for a striato-frontal cooperation in behavioural adaptation. Neuropsychologia, 34, 709-721. doi.org/10.1016/0028-3932(95)00143-3

Pasquier, F., Van Der Linden, M., Lefebvre, C., Bruyer, R., & Petit, H. (1994). Motor memory and the preselection effect in Huntington’s and Parkinson’s disease. Neuropsychologia, 32, 951-968.

Passingham, R. E. (1985). Memory of monkeys (Macaca mulatta) with lesions in prefrontal cortex. Behavioral Neuroscience, 99, 3-21. doi.org/10.1037/0735-7044.99.1.3 PMid:4041231

Passingham, R. E., Myers, C., Rawlins, N., Lightfoot, V., & Fearn, S. (1988). Premotor cortex in the rat. Behavioral Neuroscience, 102, 101-109. doi.org/10.1037/0735-7044.102.1.101 PMid:3355650

Petrides, M. (1982). Motor conditional associative learning after selective prefrontal lesions in the monkey. Behavioural Brain Research, 5, 407-413. doi.org/10.1016/0166-4328(82)90044-4

Petrides, M. (1985a). Deficits in nonspatial conditional associative learning after periarcuate lesions in the monkey. Behavioural Brain Research, 16, 95-101. doi.org/10.1016/0166-4328(85)90085-3

Petrides, M. (1985b). Deficits on conditional associate learning tasks after frontal- and temporal-lobe lesions in man. Neuropsychologia, 23, 601-614. doi.org/10.1016/0028-3932(85)90062-4

Petrides, M. (1997). Visuo-motor conditional associative learning after frontal and temporal lesions in the human brain. Neuropsychologia, 35, 989-997. doi.org/10.1016/S0028-3932(97)00026-2

Petrides, M., & Iversen, S. D. (1979). Restricted posterior parietal lesions in the rhesus monkey and performance on visuo-spatial tasks. Brain Research, 161, 63-77. doi.org/10.1016/0006-8993(79)90196-3

Pigott, S. & Milner, B. (1993). Memory for different aspects of complex visual scenes after unilateral temporal- or frontal-lobe resection. Neuropsychologia, 31, 1-15. doi.org/10.1016/0028-3932(93)90076-C

Poucet, B. (1989). Object exploration, habituation, and response to a spatial change in rats following septal or medial frontal cortical damage. Behavioral Neuroscience, 103, 1009–1016. doi.org/10.1037/0735-7044.103.5.1009 PMid:2803548

Pohl, W. (1973). Dissociation of spatial discrimination deficits following frontal and parietal lesions in monkeys. Journal of Comparative and Physiological Psychology, 82, 227-239. doi.org/10.1037/h0033922 PMid:4632974

Quirk, G. J., Muller, R. U., Kubie, J. L.,& Ranck, J. B. Jr. (1992). The positional firing properties of medial entorhinal neurons: Description and comparison with hippocampal place cells. The Journal of Neuroscience, 12, 1945-1963.

Ragozzino, M. E., Adams, S., & Kesner, R. P. (1998). Differential involvement of the dorsal anterior cingulate and prelimbic-infralimbic areas of the rodent prefrontal cortex in spatial working memory. Behavioral Neuroscience, 112, 293-303. doi.org/10.1037/h0090326 doi.org/10.1037/0735-7044.112.2.293 PMid:9588479

Ragozzino, M. E., & Kesner, R.P. (1999). The role of the agranular insular cortex in working memory for food reward value and allocentric space in rats. Behavioural Brain Research, 98, 103-112. doi.org/10.1016/S0166-4328(98)00058-8

Ragozzino, M. E., Detrick, S., & Kesner, R. P. (2001). The effects of prelimbic and infralimbic lesions on working memory for visual objects in rats. Neurobiology of Learning and Memory, 77, 29-43. doi.org/10.1006/nlme.2001.4003 PMid:11749084

Ragozzino, M. E., & Kesner, R. P. (2001). The role of rat dorsomedial prefrontal cortex in working memory for egocentric responses. Neuroscience Letters, 308, 145-148. doi.org/10.1016/S0304-3940(01)02020-1

Ramus, S. J., & Eichenbaum, H. (2000). Neural correlates of olfactory recognition memory in the rat orbitofrontal cortex. Journal of Neuroscience, 20, 8199-8208. PMid:11050143

Rao, S. R., Rainer, G., & Miller, E. K. (1997). Integration of what and where in the primate prefrontal cortex. Science, 276, 821-823. oi.org/10.1126/science.276.5313.821 Mid:9115211

Rogers, J. L., Hunsaker, M. R., & Kesner, R. P. (2006). Effects of ventral and dorsal CA1 subregional lesions on race fear conditioning. Neurobiology of Learning and Memory, 72-81. doi.org/10.1016/j.nlm.2006.01.002 PMid:16504548

Rolls, E. T. (1989). Functions of neuronal networks in the hippocampus and neocortex in memory. In J. H. Bryne & W.O. Berry (Eds.), Neural models of plasticity: Experimental and theoretical approaches (pp. 240–265). San Diego: Academic.

Rolls, E. T. (1996). A theory of hippocampal function in memory. Hippocampus, 6, 601–620. doi.org/10.1002/(SICI)1098-1063(1996)6:6 3.0.CO;2-J

Rolls, E. T., & Baylis, L.L. (1994). Gustatory, olfactory, and visual convergence within the primate orbitofrontal cortex. Journal of Neuroscience, 14, 5437-5452. PMid:8083747

Rolls, E. T., & Kesner, R. P. (2006). A computational theory of hippocampal function, and empirical tests of the theory. Progress in Neurobiology, 79, 1–48. doi.org/10.1016/j.pneurobio.2006.04.005 PMid:16781044

Sakai K., & Miyashita,Y. (1991). Neural organization for the long-term memory of paired associates. Nature, 354, 152-155. doi.org/10.1038/354152a0 PMid:1944594

Sanberg, P. R., Lehmann, J., & Fibiger, H. C. (1978). Impaired learning and memory after kainic acid lesions of the striatum: A behavioral model of Huntington’s disease. Brain Research, 149, 546-551. doi.org/10.1016/0006-8993(78)90502-4

Sanchez-Santed, F., de Bruin, J. P. C., Heinsbroek, R. P. W., & Verwer, R. W. H. (1997). Spatial delayed alternation of rat in a T-maze: Effects of neurotoxic lesions of the medial prefrontal cortex and of T-maze rotations. Behavioural Brain Research, 84, 73-79. doi.org/10.1016/S0166-4328(97)83327-X

Santi, A., & Weise, L. (1995). The effects of scopolamine on memory for time in rats and pigeons. Pharmacology, Bochemistry, and Behavior, 51, 271-277. doi.org/10.1016/0091-3057(94)00376-T

Schacter, D. L. (1987). Implicit memory: History and current status. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 501-518. doi.org/10.1037/0278-7393.13.3.501

Schacter, D. L., & Tulving, E. (Eds.). (1994). Memory systems 1994. Cambridge: MIT Press.

Seamans, J. K., Floresco, S. B., & Phillips, A. G. (1995). Functional differences between the prelimbic and anterior cingulate regions of the rat prefrontal cortex. Behavioral Neuroscience, 109, 1063-1073. doi.org/10.1037/0735-7044.109.6.1063 PMid:8748957

Shapiro, M. L., & Olton, D. S. (1994). Hippocampal function and interference. In: D.L. Schacter & E. Tulving (Eds.), Memory systems 1994 (pp. 141–146). Cambridge: MIT Press. PMid:8185958

Shaw, C., & Aggleton, J. P. (1993). The effects of fornix and medial prefrontal lesions on delayed non-matching-tosample by rats. Behavioural Brain Research, 54, 91-102. doi.org/10.1016/0166-4328(93)90051-Q

Small, S. A., Schobel, S. A., Buxton, R . B., Witter, M. P., & Barnes, C. A. (2011). A pathophysiological framework of hippocampal dysfunction in aging and disease. Nature Reviews Neuroscience, 12, 585-601. doi.org/10.1038/nrn3085 PMid:21897434 PMCid:3312472

Small, D. M., Zald, D. H., Jones-Gotman, M., Zatorre, R. J., Pardo, J. V., Frey, S., & Petrides, M. (1999). Human cortical gustatory areas: A review of functional neuroimaging data. Neuroreport, 10, 7-14. doi.org/10.1097/00001756-199901180-00002 PMid:10094124

Smith, M. L., & Milner, B. (1981). The role of the right hippocampus in the recall of spatial location. Neuropsychologia, 19, 781-793. doi.org/10.1016/0028-3932(81)90090-7

Squire, L. R. (1994). Declarative and nondeclarative memory: Multiple brain systems supporting learning and memory. In D. L. Schacter & E. Tulving (Eds.), Memory systems 1994 (pp. 203-231). Cambridge: MIT Press.

Squire, L.R., Stark, C.E. & Clark R.E. (2004). The medial temporal lobe. Annual review Neuroscience, 27, 279-306. doi.org/10.1146/annurev.neuro.27.070203.144130 PMid:15217334

Suzuki, W. A., Zola-Morgan, S., Squire, L. R., & Amaral, D. G. (1993). Lesions of the perirhinal and parahippocampal cortices in the monkey produce long-lasting memory impairment in the visual and tactual modalities. The Journal of Neuroscience, 13, 2430-2451. PMid:8501516

Stark, S. M., Yassa, M. A., & Stark, C. E. (2010). Individual differences in spatial pattrn separation performance associasted with healthy aging humans. Learning and Memory, 17, 284-288. doi.org/10.1101/lm.1768110 PMid:20495062 PMCid:2884287

St-Laurent, M., Petrides, M., & Sziklas, V. (2009). Does the cingulate cortex contribute to spatial conditional associative learning in the rat? Hippocampus, 19, 612-622.

St-Laurent, M., Petrides, M., & Sziklas, V. (2009). Does the cingulate cortex contribute to spatial conditional associative learning in the rat? Hippocampus, 19, 612-622. doi.org/10.1002/hipo.20539 PMid:19123251

Tees, R. C. (1999). The effects of posterior parietal and posterior temporal cortical lesions on multimodal spatial and nonspatial competences in rats. Behavioural Brain Research, 106, 55-73. doi.org/10.1016/S0166-4328(99)00092-3

Tolentino, J. C., Pirogovsky, E., Luu, T., Toner, C. K., & Gilbert, P. E. (2012). The effect of interference on temporal order memory for random and fixed sequences in nondemented older adults. Learning and Memory, 19, 1-6. doi.org/10.1101/lm.026062.112 PMid:22615480

Toni, I., Ramnani, N., Josephs, O., Ashburner, J., & Passingham, R. E. (2001). Learning arbitrary visuomotor associations: Temporal dynamic of brain activity. NeuroImage, 14, 1048-1057. doi.org/10.1006/nimg.2001.0894 PMid:11697936

Tulving, E. (1983). Elements of episodic memory. Oxford: Clarendon.

Turner, G. R., & Levine, B. (2006). The functional neuroanatomy of classic delayed response tasks in humans and the limitations of cross-method convergence in prefrontal function. Neuroscience, 139, 327-337. doi.org/10.1016/j.neuroscience.2005.08.067 PMid:16324791

Ungerleider, L. G. (1995). Functional brain imaging studies of cortical mechanisms of memory. Science, 270, 769-775. doi.org/10.1126/science.270.5237.769 PMid:7481764

Vermersch, A.-I., Gaymard, B. M., Rivaud-Pechoux, S., Ploner, C. J., Agid, Y., Pierrot-Deseilligny, C. (1999). Memory guided saccade deficit after caudate nucleus lesions. Journal of Neurology, Neurosurgery and Psychiatry, 66, 524-527. doi.org/10.1136/jnnp.66.4.524

Wager, T. D., & Smith, E. E. (2003). Neuroimaging studies of working memory: A meta-analysis. Cognitive, Affective, and Behavioral Neuroscience, 3, 255-274. doi.org/10.3758/CABN.3.4.255

Watanabe, M. (1986). Prefrontal unit activity during delayed conditional Go/No-Go discrimination in the monkey. II. Relation to Go and No-Go responses. Brain Research, 10, 15-27. doi.org/10.1016/0006-8993(86)90105-8

Weeden, C. S. S. , Hu, N. J., Ho, L. Y. N. & Kesner, R. P. (2012). The role of the ventral dentate gyrus in olfactory learning and memory. Society for Neuroscience Abstracts.

Weiskrantz, L. (1956). Behavioral changes with ablation of the amygdaloid complex in monkeys. Journal of Comparative Physiological Psychology, 49, 381-391. doi.org/10.1037/h0088009

Weiskrantz, L., & Saunders, C. (1984). Impairments of visual object transforms in monkeys. Brain, 107, 1033-1072. doi.org/10.1093/brain/107.4.1033 PMid:6509307

Whishaw, I. A., Tomie, J., & Kolb, B. (1992). Ventrolateral prefrontal cortex lesions in rats impair the acquisition and retention of a tactile-olfactory configural task. Behavioral Neuroscience, 106, 597-603. doi.org/10.1037/0735-7044.106.4.597 PMid:1503655

Wilson, F. A. W., Scalaidhe, S. P. O., & Goldman-Rakic, P. S. (1993). Dissociation of object and spatial processing domains in primate prefrontal cortex. Science, 260, 1955-1957. doi.org/10.1126/science.8316836 PMid:8316836

Yassa, M. A., Lacy, J. W., Stark, S. M., Albert, M. S., Gallagher, M., & Stark, C. E. L. (2010). Pattern deficits associated with increased hippocampal CA3 and dentate gyrus in nondemented older adults. Hippocampus, 21, 968-979. PMid:20865732 PMCid:3010452

Yassa, M. A., & Stark, C. E. L. (2011). Pattern separation in the hippocampus. Trends in Neurosciences, 34, 515-525. doi.org/10.1016/j.tins.2011.06.006 PMid:21788086 PMCid:3183227

Zajonc, R. B. (1968). Attitudinal effects of mere exposure. Journal of Personality and Social Psychology, 9, 1-27. doi.org/10.1037/h0025848

Zhu, X. O., Brown, M. W., & Aggleton, J. P. (1995). Neuronal signaling of information important to visual recognition memory in rat rhinal and neighbouring cortices. European Journal of Neurosciences, 7, 753-765. doi.org/10.1111/j.1460-9568.1995.tb00679.x


How to Reference This Article:

Kesner, R. (2013). Neurobiological Foundations of an Attribute Model of Memory. Comparative Cognition & Behavior Reviews, 8, 29-59. Retrieved from http://comparative-cognition-and-behavior-reviews.org/index.html doi:10.3819/ccbr.2013.80003


Contact Information:

Raymond Kesner
Department of Psychology
University of Utah
Salt Lake City, Utah
Phone: (801) 581 7430

Volume 8: pp. 13-28

What hummingbirds can tell us about cognition in the wild

Susan D. Healy,
University of St. Andrews, St. Andrews, Fife, UK

T. Andrew Hurly,
University of Lethbridge, Lethbridge, Alberta, Canada


Abstract

Here we review around 20 years of experimental data that we have collected during tests of cognitive abilities of free-living, wild rufous hummingbirds Selasphorus rufus at their breeding grounds in southwestern Alberta. Because these birds are readily trained to feed from artificial flowers they have proved a useful system for testing cognitive abilities of an animal outside the box wherein animal cognitive abilities are so often tested in the laboratory. And, although these data all come from a single species in a single location, the long-term aim of this work is to make a contribution to our understanding of the evolution of cognitive abilities, by examining the relationship between the ecological demands these birds face and their cognitive abilities. Testing predictions based on our knowledge of their ecology we have found that, while these birds aggressively defend a territory and display to females during the time we train and test them, they can learn and remember the locations of rewarded flowers, what those flowers look like, and when they are likely to contain food. Small-brained though they may be, these 3g hummingbirds appear to have cognitive capabilities that are not only well matched to their ecological demands, they are in at least some instances better (more capacious) than those of animals tested in the laboratory.

Keywords: hummingbird; field research; spatial orientation; timing; choice

Acknowledgements: We thank Rachael Marshall for helpful comments on an earlier version of the manuscript, NSERC for funding T.A.H. and all of the cabin team for their contributions to this work. Corresponding author: susan.healy@st-andrews.ac.uk. All previously published figures and photographs are used with permission of the authors.


Around 15 years have passed since the publication of Animal Cognition in Nature (Balda, Pepperberg, & Kamil, 1998). Ironically, this volume did not actually contain any chapters examining the cognitive abilities of animals in the wild. It did, however, contain descriptions of work on wild animals trained and tested under laboratory conditions and seemed to herald a major expansion of work on comparative cognition to encompass a much wider range of species than previously tested. A decade and a half later, however, it is not clear that that promise is being realised. For example, food storing, once a model for examining questions of the evolution of cognition and possibly the wildest of all the examples discussed in Balda et al. (1998), is now much less of a focus (e.g., Biegler, McGregor, Krebs, & Healy, 2001; Hampton & Shettleworth, 1996; Sherry & Vaccarino, 1989; but see Feeney, Roberts, & Sherry, 2009; Freas, LaDage, Roth, & Pravosudov, 2012). Food storing did, however, lead to perhaps the greatest recent flurry of excitement and effort in comparative cognition (Clayton & Dickinson, 1999): the examination of cognitive abilities in corvids. Subsequent work now ranges from examination of episodic-like memory in a number of species, including rats (Babb & Crystal, 2005), magpies Pica pica (Zinkivskay, Nazir, & Smulders, 2009), chickadees Poecile atricapillus (Feeney et al., 2009), hummingbirds (Henderson, Hurly, Bateson, & Healy, 2006a), and meadow voles Microtus pennsylvanicus (Ferkin, Combs, Delbarco-Trillo, Pierce, & Franklin, 2008), to examination of problem-solving in a variety of contexts, typically by corvids but not always (Auersperg, Huber, & Gajdon, 2011; Dally, Emery, & Clayton, 2010; Schmidt, Scheid, Kotrschal, Bugnyar, & Schloegl, 2011; Taylor, Elliffe, Hunt, & Gray, 2010; Teschke & Tebbich, 2011; Weir, Chappell, & Kacelnik, 2009).

In fact, much of comparative cognition can be comfortably addressed in the laboratory, even when wild animals are tested. This may help to explain why there continues to be very little examination of cognitive abilities of animals in the wild, in what might be considered to be the real world. That world is one in which animals are faced daily with getting food, finding mates, and avoiding predation, and this is where selection acts on cognitive abilities, perhaps favouring animals that are generally smart or, alternatively, favouring animals that are good at solving particular problems. The questions, then, differ slightly from those asked of animals in a laboratory, i.e., not just what animals can do but what they do with those abilities and how they put them to work when the test itself does not occupy much of their day. It is possible that we will find that animals’ cognitive abilities in the field differ little or not at all from those we see in the laboratory. For example, the use of food deprivation in the laboratory to motivate animals to perform a test may resemble the state in which many wild animals find themselves, i.e., often hungry and very willing to work for reliable food rewards. On the other hand, having to watch out for predators or competitors may mean that animals tested in the field attend to experimental features differently than they would in the laboratory, as may the spatial scale over which testing occurs (Figure 1). Natural conditions might also lead to different cue use or different cue weighting than we see when animals are tested in boxes, arenas or (relatively) small rooms in the laboratory.

Figure 1. A photo showing the landscape in which we train and test our hummingbirds. Birds typically defend territories that contain both open fields and some wooded areas. In this photo, one bird defended a territory at the far end of the field and a second male defended a territory around the location at which the photograph was taken. Photo by T. A. Hurly.

Going out into the field to test cognitive abilities certainly shares problems with laboratory tests, not least of which is being sure that the animal ‘answers’ the question experimenters think they are asking. If an animal fails to respond in an experiment, for example, it is frequently unclear whether this is because the animal is not motivated to respond or does not ‘know’ how to respond. Only when the animal does make a response that seems vaguely appropriate can we begin to measure its performance. Even then, variation in its performance may be due to motivation rather than to cognitive ability per se. In the field the animal may be distracted mid-test or simply fail to return to the test after failing to find a reward. A second major issue that has arisen with recent tests of cognitive ability in the wild is with the ‘unit of measurement’ for cognitive ability, especially when problem solving is that measure. Thus far, we are not aware of a general consensus as to what constitutes a problem or what makes one problem more difficult than another. For example, currently, manipulation of physical material to retrieve food from manmade devices is considered by some to require ‘complex’ cognition although an apparently similar manipulation of materials to build a nest, however complex, is not (Seed & Byrne, 2010; but see Muth & Healy, 2011; Walsh, Hansell, & Healy, 2010; Walsh, Hansell, Borello, & Healy, 2011; van Casteren, Sellers, Thorpe, Coward, Crompton, Myatt, & Ennos, 2012). Is a problem considered more difficult if it has more steps to the solution, even if each step is ‘easy’, or is a problem more difficult if it is more novel, either in appearance or in its solution? It would seem that there is a problem in using problem solving as a measure for cognitive abilities in animals. And if we have no measure that is readily quantifiable, then it will not be possible to determine the causes or consequences of variation in that measure, within or across species.

It will come as no surprise that we have not attempted to examine problem solving in our work examining the cognitive abilities of rufous hummingbirds, trained and tested in the wild at our field site in the eastern Rocky Mountains, Alberta, Canada. At least, we have not looked at their ability to manipulate tools or to solve problems of the kinds that crows and others are now being set. Rather, we have required our birds to learn to feed from all manner of devices (Figure 2), which they have invariably been very quick to do, typically learning within a couple of hours where to insert their tongue to receive sugar solution (sucrose). Although some might say that speed of learning in itself indicates cognitive ability (e.g., Boogert, Fawcett, & Lefebvre, 2011; Keagy, Savard, & Borgia, 2012), the fact that these birds learn so readily has for us largely meant that they are a useful species for examining cognition in the field: animals that took hundreds or thousands of trials to learn how to solve a task would have led us to look for other species. Here we review our work with two aims in mind: (1) to show that basing an experimental framework on knowing the ecology of a species can lead to a useful understanding of that species’ cognitive abilities, and (2) in light of the paucity of work done in the wild, to use our work on rufous hummingbirds as a case study to show what it is possible to do in the messiness of the field, where our control over the animal’s behaviour and experience is compromised. We hope to show that such a pursuit can be fruitful and that it adds usefully to the understanding of cognition acquired from laboratory experiments. We also hope to encourage others similarly to go out into the field to examine cognition in other species. If we want to understand the evolution of cognitive abilities, especially in the vertebrates, the answers will not come from work on a single species, irrespective of the depth of enquiry.

Figure 2. Four examples of the kind of feeding device to which the birds can be readily trained. With these ‘flowers’ we can vary the quantity of sucrose, the number of flowers, their spatial proximity and their visual features. The photograph at the top left is of a board of the kind we use in the context-dependent experiments. The next two photographs show birds about to and feeding from our most commonly-used flower type, a cardboard disc with a central well formed from a syringe tip or cap. The bottom right photograph shows a hummingbird choosing florets on artificial inflorescences. Photos by T. A. Hurly.

Interested as we are in comparative cognition, we have two significant reasons for attempting to determine the cognitive abilities in a single species, specifically rufous hummingbirds, in the wild. Firstly, these birds are logistically amenable to testing. As described elsewhere (e.g., Healy & Hurly, 2003, 2004), the males (the focus of our efforts) are strongly territorial, excluding conspecifics from feeding and thus from being trained to use our experimental equipment; they can be readily marked for individual identification; and they feed every 10-15 minutes throughout the day for the duration of the breeding season (Figure 3). Although one might say that cognition is studied in the laboratory for logistic reasons such as experimental and experiential control, the choice of a species to test in the wild is, at this relatively early stage of such work, crucial to success. For example, the fact that our hummingbirds feed every 10-15 minutes means that we can collect a useful amount of data within a day and across our six-week field season. Choosing to work with animals that feed once a day or less often would lead to major issues with training animals and collecting enough data, without each study taking a lifetime!

Equally key to the success of the endeavour is that the behaviour and ecology of these birds are such that we can readily formulate predictions as to the nature of the birds’ cognitive abilities from our observations of the birds’ foraging behaviour. Rather than using a rather arbitrary task to examine their cognitive abilities, we can attempt to test those abilities we would expect might have been favoured in these particular animals. Foraging behaviour in the male rufous hummingbirds, at least, typically consists of a male flying approximately every 10 minutes from a conspicuous perch in his territory to feed (from flowers or our feeders; Figure 4) for a handful of seconds before returning to his perch. The intervening time before his next foraging bout can be filled with considerable activity as he is constantly on the lookout for conspecific males and females. Territorial males display to conspecific rivals by flashing their bright orange gorget (throat) feathers. If this does not deter an intruder, it will be chased off at high speed. Females are also chased, especially off the feeders, but they tend to move to a position near the ground while the male performs several display flights. These consist of the male flying up (some 15m) and then flying steeply downwards before pulling out of his dive just above the head of the female, flying a short upward sweep and ending with a waggle (a short series of oscillations in the vertical plane). He then either repeats this manoeuvre several times or flies to the female and performs a shuttle-flight, a series of short zig-zag buzzing flights in front of her. The aim of this game is to persuade the female to mate (Hurly, Scott, & Healy, 2001). Although we have not measured the energy expenditure of the males’ various flight acrobatics, it appears that they would be energetically expensive (Clark, 2009). Indeed, males tend to visit the feeder (flowers) within a few minutes of such displays, although their visits are still no longer than a few seconds. Our very first speculation with regard to their cognitive abilities, then, was that this small (about 3g) nectarivore, defending several hundred (or more) flowers and feeding about every ten minutes for a few seconds only, might benefit from remembering which flowers he had recently visited (Healy & Hurly, 2001). Not only would he save time and energy by remembering where they were, it should also be useful to remember whether or not he had emptied the flower(s). A bird that could do this would return to territory defence and mate attraction more quickly, having expended less energy.

Figure 3. Photographs showing the elevated feeder (to deter bears) being lowered during pre-training (left), a newly marked bird in the hand (top right) and a marked bird feeding during an experiment (bottom right). Photos by T. A. Hurly.

The success of our very first, speculative experiment set the scene for most of the experiments that have followed. In that first experiment, we presented rufous hummingbirds with an open-field analogue of a radial-arm maze: an array of artificial flowers, some of which contained a small amount of sucrose solution (Healy & Hurly, 1995). The flowers were coloured cardboard discs approximately 6cm in diameter, each glued to the end of a wooden stake 60cm tall (Figures 2 and 5). They were arranged in a rough circle with about 70cm between neighbouring flowers. For this experiment, the flowers held 40μl, an amount that meant the birds should visit and drink all of the contents of about four of the eight flowers. We presented the birds with two versions of this delayed-non-matching-to-sample task: in one version, all eight flowers contained reward but we allowed birds to visit only up to four flowers on their first visit to the array; in the second version, all eight stakes were presented but only four bore flowers. For both versions, then, birds visited and emptied up to four flowers of their contents. On their return to the array, after intervals ranging from five minutes up to an hour, all eight flowers were present but only the flowers that had not been visited in the first phase of the trial contained food. As predicted, the birds were much more likely to visit flowers they had not recently emptied. A follow-up experiment showed that birds were also more likely to visit the flowers that had been present when they first visited the array but from which they did not drink (Hurly, 1996; see also Henderson, Hurly, & Healy, 2001). Although memory for perhaps as many as eight flowers is nowhere near the number of flowers thought to make up the territory of these birds (perhaps a couple of thousand), the birds could remember not just where the flowers were but that they had emptied them.
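As an illustrative aside that is not part of the original analyses, the chance baseline for this design can be sketched. The sketch assumes a bird with no memory that samples flowers at random and never revisits a flower within a bout. The function name `chance_all_unvisited` and the no-revisit assumption are hypothetical and are introduced here only to make the arithmetic explicit.

```python
from fractions import Fraction
from math import comb


def chance_all_unvisited(n_flowers: int, n_emptied: int, n_choices: int) -> Fraction:
    """Probability that a memoryless forager's first `n_choices` distinct
    choices all land on flowers that were NOT emptied in the first phase.

    Assumes random sampling without replacement (a hypothetical baseline).
    """
    n_full = n_flowers - n_emptied
    # Number of ways to pick only still-full flowers, over all possible picks.
    return Fraction(comb(n_full, n_choices), comb(n_flowers, n_choices))


if __name__ == "__main__":
    # Eight flowers, four emptied on the first visit, as in the design above.
    for k in range(1, 5):
        print(k, chance_all_unvisited(8, 4, k))
```

Under these assumptions a memoryless bird would pick a still-rewarded flower on its first choice half of the time, and would choose all four rewarded flowers in a row only once in 70 bouts. This is the kind of baseline against which a preference for unvisited flowers can be judged.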

Figure 4. Once they arrive at our field site, the males establish feeding territories centred around feeders we have hung along the valley a week or two before they arrive. Typically the feeders contain 14% sucrose, which is much weaker than the nectar provided by the flowers from which the birds would normally feed. Photo by T. A. Hurly.


That the birds could remember something about a flower’s contents was confirmed by an experiment in which we were actually aiming to address the role that flower colour played in the birds’ ability to learn which flowers to visit. We expected that the birds would pay attention to the colour of the flowers both because, like us, birds have well-developed colour vision and because there is much anecdotal evidence that hummingbirds are attracted to red objects. There was also speculation that this predilection for red had led to the propensity for the Californian flora, which lies on the migratory path of these birds, to produce red flowers. In that experiment we presented the birds with four, individually coloured flowers, only one of which contained sucrose solution, and in a quantity too large for a bird to consume in one visit. Once a bird had found the rewarded flower and then left after drinking as much as he wished, we emptied the flower and switched it with one of the other flowers in the array. When the bird returned, he was more likely to go to the flower that was in the location of the flower he had most recently fed from, rather than to the flower with the colour of the earlier, rewarded flower (Hurly & Healy, 1996; Miller & Miller, 1971; Miller, Tamm, Sutherland, & Gass, 1985). Consideration of the nature of the birds’ ecology helps to explain why these birds seemed to ignore the colour cue provided by the flower: in a field of flowers of the same species, colour does not help the bird determine which flowers will be rewarding. Colour might, however, be used to find flowers in unfamiliar places, such as along a migratory route, and red may well be more conspicuous against the background of browns and greens that make up a western North American mountain range.
It has also been argued elsewhere that the ubiquity of red flowers along the migration route of the rufous has less to do with attracting hummingbirds than making flowers inconspicuous to insects, whose vision is poorer at longer wavelengths (Altshuler, 2003; Briscoe & Chittka, 2001; Raven, 1972).


Figure 5. Flowers in an array used by Rachael Marshall (in photo) in one of her timing experiments, showing the proximity of the experimenter to the array. Photo by T. A. Hurly.

The longer we have experimented with these birds, the more evidence we have found for their ability to learn information as necessary: they will learn and remember the colours of flowers; we just had to ask them in the appropriate fashion. Two examples will illustrate this. Firstly, in an experiment in which we were primarily interested in the accuracy with which they could remember a location, some birds were trained that yellow flowers were rewarded while others were taught that red flowers were rewarded. This colour-reward association was learned within 2-3 experiences (Hurly & Healy, 1996). Secondly, we trained birds that three of the flowers in an array of ten contained sucrose solution. All of the flowers differed in their colour pattern. Once the birds had learned which were the rewarded flowers we moved the array 2m from the site of the original array so the birds could not use the location of the flowers to determine which were the rewarded flowers. However, it was not until we had also changed the shape of the moved array that we found that the birds had remembered the colours of the flowers rewarded in the first array (Hurly & Healy, 2002; Figure 6). The birds can and will learn and remember colour but our ability to demonstrate that they can and will do so required us to be much more particular about our experimental designs. We would have been both remiss and incorrect if we had concluded that rufous hummingbirds were unable to learn colour cues, a ubiquitous cognitive ability.

Not only do we have to be particular about our experimental designs, we also have to be careful about our expectations of what these birds may or may not be capable. Expectations of animals’ cognitive abilities tend to come from two sources, some based on knowledge of the animals’ ecology and others from what might loosely be described as being based on their brain size. Rufous hummingbirds weigh around 3g and, although they have a brain that is larger than expected for their body size (Ward, Day, Wilkening, Wylie, Saucier, & Iwaniuk, 2012), that brain is still not very large. Knowledge of the birds’ ecology leads to expectations that these birds might, for example, pay more attention to spatial information than to colour information (but that they would still pay attention to colour in the relevant contexts), while their brain size might lead to expectations of noticeable limits to the capacity, speed and accuracy with which the birds learn spatial locations. We are familiar with one-trial learning from the retrieval successes of food-storers and from long-delay taste aversion learning but even in tasks where animals are highly motivated to learn locations, such as in rats searching for hidden platforms in the Morris water maze, animals often either take several trials to learn a location or require some time exploring the location in a first visit. Like food-storing birds, rufous hummingbirds, however, learn the three-dimensional location of a reward from a single visit that lasts only a few seconds. They can return to that location even in the absence of the flower (Flores Abreu, Hurly, & Healy, 2012) and they can visit several such ‘empty’ locations. Usefully, hummingbirds can demonstrate their memory for a rewarding location in the absence of the local cues of that reward because they will fly to, and then hover at, that location, much as a rat in a water maze will swim back and forth over the place it has learned to find a hidden platform.

Figure 6. Schematic of the arrays used to determine that the birds did learn and remember colour (redrawn from Healy and Hurly 2002, above – treatment; below – control).


Although we have not measured the accuracy with which the birds can return to a flower that has been removed after a single visit, we have attempted to measure the 3-D accuracy of memories for a familiar location. We trained birds to fly to a rewarded flower (a red 8cm³ cardboard cube) in a large featureless field, then removed the flower and filmed the bird’s flight path into the location of the now-missing flower. The birds flew to within 60cm in the horizontal plane and within 20cm in the vertical plane (Hurly, Franz, & Healy, 2010). They did not appear to beacon to the flower in spite of the ‘flower’ being highly conspicuous, as when we simply moved the flower about 1.5m, the birds flew nearer to the location of the missing flower, and hovered, before flying directly to the moved flower. The birds’ accuracy for a flower’s location seems to depend on the size of the flower. In a second experiment, we trained birds to feed from either a small (8cm³) or a large flower (1000cm³). This time, in the absence of the flower, the birds flew even closer to its previous location than they had in the earlier experiment (the locations were not the same in the two presentations): 20cm in the horizontal and around 5cm in the vertical when the flower was small and around 50cm in the horizontal and about 25cm in the vertical when the flower was large. These data finally allowed us to confirm the precision with which a hummingbird can return to a learned but absent reward described in the many anecdotes of hummingbirds returning to sites of feeders they had fed from during their last migration or breeding season. The appearance of birds at particular windows of houses is a common incentive for people to get feeders out of the cupboard after the winter.

One obvious difference for the birds in our experiments from the birds in these reports, however, was that we deliberately chose to place the experimental flower at least 10m from any obvious landmarks (e.g., bushes, trees; Figures 1 and 5). The data from experiments on various species in the laboratory would suggest that the birds might have learned the flower’s location in one of three ways: they may have learned the visual characteristics of the flower and used it as a beacon, they could have used the landmarks proximal to the flower, or they could have used a number of distal landmarks. The behaviour of our real-world animals, however, does not readily conform to any of these three possibilities: while they can use the flower as a beacon, as shown by the birds flying to the moved flower once they discover the one in the familiar location is missing, they do not need to beacon to the flower and they do not do so preferentially. Graham, Fauria, and Collett (2003) suggested that their ants might use large landmarks along a route as beacons while learning that route and that those landmarks might act as a scaffold for learning other landmarks nearby so that the animals could move along the route if the beacons were then removed. While this seems plausible for our hummingbirds it is not at all clear which landmarks along the way would have formed this scaffold. The proximal landmarks were (to our eyes, at least) remarkably uniform: the ground was quite flat, and covered by vegetation that reached perhaps 20cm, punctured by multiple ground squirrel burrows. The distal landmarks, on the other hand, were very conspicuous and ranged from trees, typically ringing the open fields used for training and testing, to the mountains rising some 1000m along both sides of the valley and visible from all points in the open fields. However, it was still not clear how such large landmarks would enable the birds to be quite so accurate in their return to the flower location.
It is possible that the birds did not use visual cues at all as they may well have used magnetic or sun compass cues instead but there is no reason to believe that such cues would be better at supporting accuracy than are distal visual landmarks. We have no useful information on the use of either of these systems in hummingbirds although there is an abundance of data for their use by other birds so it seems at least plausible that hummingbirds might also use them. We also cannot reject the possibility that the birds did use either proximal or distal visual landmarks simply because we cannot yet determine which they might have used or how they might have used them.

Surprisingly, these data on the memory that hummingbirds have for rewards located in three-dimensional space are among only a handful of data on spatial cognition in 3-D (Grobéty & Schenk, 1992; Holbrook & Burt de Perera, 2009; Holbrook & Burt de Perera, 2011). Although all flying and swimming animals live their lives moving through 3-D space, even terrestrial animals can and will move through the vertical dimension of their world and yet, we know almost nothing about whether the z-dimension is encoded differently from the way in which x and y dimensions are encoded, whether the animals pay attention to the dimensions differently or whether either or both of these are dependent on whether the animal moves through the z-dimension. Rufous hummingbirds might remember flowers better if they differ in their height (Henderson, Hurly, & Healy, 2006b) and they may also remember a 3-D location more accurately than do rats, when both species have learned a location in a 3-D maze (Flores Abreu, 2012). Whether these outcomes are due to greater familiarity with moving through 3-D space or to an ability that has been favoured by natural selection is not yet clear and requires investigation of 3-D spatial performance in more species.

That performance depends on the way in which the question is asked is demonstrated yet again because hummingbirds apparently encode vertical information less accurately than horizontal information when the locations to be discriminated differ only in their vertical component: when flowers were presented on a vertical pole (Figure 7), birds found it difficult to learn which one of five flowers was rewarded but when the flowers were presented along a diagonal pole, the birds were relatively quick to learn which was the rewarded flower (Flores Abreu, Hurly, & Healy, 2013). Here it appears that the addition of a horizontal component to the flower’s location may have facilitated the learning of its vertical location. Additionally, when presented with only two flowers the location of which differed only in the vertical component, hummingbirds appeared to learn which was the rewarded flower relative to the other i.e., whether the flower was the upper or the lower of the two flowers (Henderson et al., 2006b).

Unlike 3-D spatial cognition, a considerable amount of work has been conducted on a variety of species into the ways they learn and use time. Interval timing over short time periods has been well studied in the laboratory while circadian timing has been much investigated in both the laboratory and the field (e.g., garden warblers Sylvia borin, Biebach, Falk, & Krebs, 1989; pigeons Columba livia, Saksida & Wilkie, 1994; hamsters Mesocricetus auratus, Cain, Chou, & Ralph, 2004; and bees Apis mellifera, Pahl, Zhu, Pix, Tautz, & Zhang, 2007). More recently, investigations into episodic-like memory (often also called what-where-when memory) have also raised questions as to what kind of time constitutes the ‘when’ component of this kind of memory. There are a priori reasons to suppose that rufous hummingbirds might also be capable of using more than one kind of time. For example, it has been suggested that territorial hummingbirds, like the rufous, use defence by exploitation as a means to exclude nectarivorous intruders and that this is effected by the territorial holder feeding on, and thereby emptying, the flowers at the edge of his territory early in the day (Paton & Carpenter, 1984). Traplining, whereby an animal moves around a circuit of resources (such as flowers) in a predictable pattern and time period, may also be used by foraging hummingbirds that are using their floral resources effectively (Feinsinger & Colwell, 1978). Finally, for hummingbirds foraging on flowers that refill their nectar supplies (perhaps within a few hours), being able to remember which flowers they visited and when would enable more effective foraging. The biology of these birds, then, suggests that they might be capable of circadian timing, sequence learning and/or interval timing.

Given the possible daily requirement for remembering when flowers had been emptied, we began investigating the hummingbirds’ ability to learn time intervals by presenting a bird with an array of eight, individually distinctive flowers, four of which were refilled 10 minutes after the bird had emptied them and four were refilled 20 minutes after being emptied. Each flower contained the same amount and concentration of sucrose so the time to refill was not an indicator of a flower’s contents, and the amount in a flower was such that the bird would typically visit 3-5 flowers per visit to the array. However, he could visit all the flowers or only one; the number of flowers and which flowers to visit were chosen by the bird. The array was presented in the morning and the territory owner was allowed to visit at will throughout the day. The flowers were removed overnight and replaced in the same place the following day. As we did not know how long it would take for a bird to learn the refill rates (if at all), we presented each of three territorial birds with an eight-flower array for 11 days. It turned out that the birds had, in fact, learned the refill rates quite well by the end of the first day (our expectations of their performance vastly underestimated their abilities) but the key finding was that all three birds returned to the 10-minute refilling flowers at around 10 minutes and to the 20-minute refilling flowers at around 20 minutes (Henderson et al., 2006a). Although these time periods are much shorter than typical refilling times for real flowers and birds may defend up to a couple of thousand flowers in their territories, we considered that the Henderson et al. (2006a) data at least showed ‘proof of principle’ and that these birds could learn refill rates, for multiple flowers and relatively quickly.
Furthermore, not only were the refill intervals longer and the number of flowers that the birds could track greater than has been tested in the laboratory, all of the birds were living a very active life throughout their foraging on the array, defending their territory from intruding males and displaying to females. One suspects that if these birds were tested in the more controlled environment of a laboratory, their abilities would seem even more impressive. Rather than testing them in the laboratory, we then looked to see whether, if the colour of the flower signalled the refill rate, the birds would use that colour and learn the refill rate of the flower more quickly than if there was no colour-refill association. This was one of the instances in which we based our expectation of what the birds would learn on the information we would be likely to use ourselves: if the 10-minute refilling flowers were blue and the 20-minute refilling flowers were pink, for example, it would seem that the colour might reduce the time taken to learn the refill schedule. However, it did not affect the speed at which the birds learned which flowers refilled after 10 minutes and which refilled after 20 minutes (Marshall, Hurly, & Healy, 2012; Figure 5). Indeed, there is some preliminary evidence that colour-refill associations may also not affect the speed with which people learn the duration before which flowers refill (Marshall, 2012). In a second experiment reported by Marshall et al. (2012), birds also did not learn which flowers held 20% sucrose solution and which held 30% sucrose sooner when the flower colour signalled the sucrose concentration. Of course, our earlier data whereby birds did not appear to learn the colour of a flower when its location did not change should have alerted us to the probability that colour would also not be salient to the birds in the Marshall et al. (2012) experiment, or at least, not as salient as is space.
It appears that space overshadows colour information in a range of contexts. One way to test this would be to move the flowers after each visit. In such a manipulation, space would no longer be a reliable cue and colour would be.

Figure 7. Photographs showing the flower arrays used by Ileana Nuri Flores Abreu to examine the use by the hummingbirds of horizontal and vertical information. Photos by I. N. Flores Abreu.


Having found that the hummingbirds could learn refill rates, and in experiments on risk sensitivity and context-dependent choice, that they readily learn what (volume and concentration of sucrose) is held in flowers or wells of different colours, it was clear that these birds could learn and remember each of the three (what-where-when) components of episodic-like memory (Hurly & Oseen, 1999; Bateson, Healy, & Hurly, 2002; 2003; Morgan, Hurly, Bateson, Asher, & Healy, 2012). They can also remember pairs of these components: they can remember when and where (Henderson et al., 2006a) and they can remember what and where (e.g., Healy & Hurly 1995; 1998) although we have not explicitly looked for whether they can remember what and when together. The question then was whether they could remember all three components together. Although there is now considerable evidence that a range of animals have episodic-like memory, including rats (Babb & Crystal, 2005), magpies (Zinkivskay et al., 2009), chickadees (Feeney et al., 2009), and meadow voles (Ferkin et al., 2008), issues over designing an appropriate experiment continue to plague this field. The most systematic difficulty concerns the ‘when’ component, with differing groups defining this in different ways (e.g., including a place in a sequence: Ergorul & Eichenbaum, 2004; a time of day: Zhou & Crystal, 2009; and using “which” instead of “when”: Eacott, Easton, & Zinkivskay, 2005; Eacott & Norman, 2004). Common to all of these approaches, however, is that the test would allow animals to demonstrate their ability to remember what, where and when in combination. We took a slightly different approach again, which was to design an experiment in which we could explicitly examine all of the components of what-where-when memory. We did this by presenting birds with a pair of four-flower arrays in which the single flower that contained reward differed for the different times of day at which the arrays were presented.
The distinctive colours of the four flowers in each array (e.g., blue, yellow, purple, and red; Figure 8) occurred in the same relative positions for both arrays. The arrays were presented in the morning and a territorial male was allowed to search around the flowers to find the rewarded flower. He was then allowed to return to the arrays to feed from the rewarded flower for a further five visits before the arrays were removed. The arrays were presented again in the afternoon and the bird had again to find the rewarded flower. In this latter array presentation, the rewarded flower was in the other array and of one of the other colours. After five subsequent visits to the rewarded flower, the arrays were removed. They were presented to the bird over the next few days at approximately the same times each day. Over the course of these presentations we found that birds visited the eight different flowers differently: there was a single ‘correct’ flower, which was in the correct array at the correct time and of the correct colour, one flower that was in the correct array and of the correct colour but at the wrong time, one flower that was at the correct time and of the correct colour but in the wrong array, three flowers that were in the correct array at the correct time, and two flowers that were in the wrong array, at the wrong time and of the wrong colour. The most common flower the birds chose was the rewarded flower and they went least frequently to the flowers that were completely wrong (Marshall, 2012). Of the other kinds of flowers, relative to chance, the birds went most often to the flower that was rewarded at the other time of day, which is consistent with the data from episodic-like memory tests where it is the ‘when’ component that is the most difficult for the animal to get right. Furthermore, the birds most readily corrected ‘what’ errors, as they typically flew from a ‘what’ error flower directly to the correct flower.
These outcomes are consistent with what we have seen in other experiments with the birds: they have good spatial memory and they seem to remember colour only when space is not relevant. Although this experiment was not an episodic-like memory experiment as the birds visited the arrays six times at each time period and the trials were not trial-unique, this kind of experimental design might enable a comparison across species in their episodic-like memories of how well they can remember each of the three components. For example, do other animals remember the where better than the what and the when as do hummingbirds? Might a species’ ecology enable us to predict the pattern of variation in the structure of episodic-like memories (if there is one)? It might also help to determine which species serves as the most useful model of human episodic memories. The data from this experiment also suggest that, in addition to being capable of learning intervals, the birds can learn circadian time.
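The way the eight flowers divide into categories can be made concrete with a short sketch. It is only an illustration of one reading of the design described above: the array numbers, the particular rewarded colours, and the category labels (‘when error’, ‘where error’, ‘what error’, ‘all wrong’) are assumptions chosen for the example, not values or terms taken from Marshall (2012).

```python
from collections import Counter

COLOURS = ("blue", "yellow", "purple", "red")

# Hypothetical reward schedule: the (array, colour) rewarded in each session.
REWARDED = {"morning": (1, "blue"), "afternoon": (2, "yellow")}


def classify(flower, session):
    """Label a flower relative to the flower rewarded in `session`."""
    array, colour = flower
    other = "afternoon" if session == "morning" else "morning"
    target_array, target_colour = REWARDED[session]
    if flower == REWARDED[session]:
        return "correct"
    if flower == REWARDED[other]:
        # Rewarded flower of the other time of day.
        return "when error"
    if array != target_array and colour == target_colour:
        # Correct colour at the correct time, but in the wrong array.
        return "where error"
    if array == target_array:
        # Correct array at the correct time, but the wrong colour.
        return "what error"
    return "all wrong"


# All eight flowers: two arrays, each holding the same four colours.
flowers = [(a, c) for a in (1, 2) for c in COLOURS]
counts = Counter(classify(f, "morning") for f in flowers)
print(counts)
```

Under these assumptions the tally reproduces the split given in the text: one correct flower, one rewarded at the other time, one of the correct colour in the wrong array, three in the correct array but of the wrong colour, and two that are wrong on every component.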

One-trial learning of 3-D locations, accurate spatial memory and timing capacity: these animals have cognitive capabilities that appear to match their ecological requirements, although we have not actually tested the extent of these abilities (e.g., whether the birds can remember most of the flowers in their territories, rather than the handful on which we have tested them). For us, this raises multiple questions, such as how specific are these abilities? Is it the case, for example, that other animals are also capable of this kind of timing capacity or has natural selection favoured this particularly in the hummingbirds and only in other species that have faced similar cognitive challenges in their foraging strategy (or in some other part of their lifestyle)? This question requires comparative data, of the kind gathered to address similar questions asked of the capacities of food storers relative to nonstorers, of sticklebacks living in different environments, and a handful of other species (Girvan & Braithwaite, 1998; McGregor & Healy, 1999; Odling-Smee, Boughman, & Braithwaite, 2008).

Figure 8. Schematic of the what-where-when experiment in which time of day was the ‘when’.


Accurate spatial memory and timing capabilities allow rufous hummingbirds to make appropriate decisions about which flowers to visit and when, and we have used their ecology to make predictions about their cognitive capabilities. We have also used data from the literature on human decision-making to ask questions about whether or not these birds make so-called irrational decisions. A rational decision is one in which the animal/human faced with an array of options is expected to choose the option that obtains the highest utility. For animals, the foraging option with the highest utility is considered to be the one that provides the highest energetic return. For humans, utility might be measured in terms of energy, finance, or some other useful currency and humans were also always assumed to make rational choices. However, there is now a wealth of data to show that humans do not always make rational decisions and increasing evidence to show that animals, including rufous hummingbirds, also do not make such decisions (Bateson et al., 2002; 2003; Hurly & Oseen, 1999; Latty & Beekman, 2011; Morgan et al., 2012; Sasaki & Pratt, 2011; Scarpi, 2011; Waite, 2001). One of the experimental paradigms used to show that rufous hummingbirds are irrational consists of offering birds a choice between two favourable options (a binary choice set; Figure 2) and a choice among three options, the two favourable options from the binary choice set plus a third poorer option. If the birds were choosing rationally, the inclusion of the poorer option should not affect the relative preference the birds have for the two favourable options, but it does (Bateson et al., 2002; 2003; Morgan et al., 2012). One suggested mechanism for this change in preference is that sampling of the poorer option forces birds to take relatively more of the better of the two favourable options to regain the lost energy.
However, the hummingbirds do not always respond to the poorer option by increasing their preference for the option with the highest energy return (Morgan et al., 2012). Another possibility is that the birds assess the options available relative to each other rather than with respect to their absolute values. This might mean, then, that the inclusion of a poor option in the choice set might make the better of two favourable options seem much better than when just the two favourable options are presented and so a bird should increase its relative preference for that better option. It might also mean that the inclusion of a very poor option in the choice set would lead the bird to perceive the two favourable options as more similar to each other, which would result in a bird choosing the two favourable options more similarly than it had when presented with the binary choice. Just this experiment is currently underway.
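The comparison at the heart of this paradigm can be sketched in a few lines. The sketch below assumes that relative preference is measured as the share of visits to the better favourable option among visits to the two favourable options only; the option names and visit counts are invented purely for illustration and are not data from any of the studies cited.

```python
def relative_preference(visits, better, other):
    """Share of visits to `better` among visits to the two favourable options."""
    total = visits[better] + visits[other]
    if total == 0:
        raise ValueError("no visits to either favourable option")
    return visits[better] / total


# Hypothetical visit counts (invented for illustration only).
binary = {"A": 60, "B": 40}                # two favourable options
trinary = {"A": 63, "B": 27, "decoy": 10}  # same two plus a poorer option

pref_binary = relative_preference(binary, "A", "B")
pref_trinary = relative_preference(trinary, "A", "B")

# A rational chooser should show a shift close to zero: adding the poorer
# option should leave the A-versus-B preference unchanged.
shift = pref_trinary - pref_binary
print(pref_binary, pref_trinary, shift)
```

Visits to the poorer option are excluded from the denominator so that a shift reflects a change in how the two favourable options are weighed against each other, rather than visits simply being diverted to the third option.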

Why hummingbirds might make irrational decisions over foraging options is not yet clear. There are at least two possibilities that have been raised to explain irrational human decision making: 1) the birds are perceptually or cognitively constrained and are simply unable to measure each of the options sufficiently accurately to make the appropriate (rational) choice; 2) the birds are capable of measuring the available options but trade off making the perfect decision with the time taken to make the measurements as the costs of making the perfect decision at every foraging choice (for hummingbirds around every 10 minutes through the day) well outweigh the benefits of that decision. This latter seems intuitively more likely as no one foraging decision for these birds will be worth a bird spending a lot of time assessing the options. However, whether the frequent foraging for small meals means that these birds are more likely to make irrational decisions than are animals making more substantive decisions (in fitness terms), such as mate choice or offspring sex ratio is not yet clear. This is also a question that would be best addressed by a comparative approach, both within and among species. For example, rufous hummingbirds choosing mates may not trade off time with assessment accuracy because this is a decision that has medium to long-term fitness effects. Similarly, animals that feed on few but large meals as do many top predators may also be prepared to take longer to choose among foraging options in order to ensure they choose the option with the highest energetic return.

More comparative data would help to determine the role that cognition plays in the lives of animals and that is played by natural selection in producing variation in cognitive abilities. We are beginning to determine the cognitive capabilities of rufous hummingbirds when those animals are living a fast and furious life in the midst of our experiments. We consider that this work brings us considerably closer to understanding the use to which these animals could put their cognitive abilities than if we had conducted this work in the laboratory. In the real world, the benefits to cognitive abilities might range from the short term, such as finding food for a single meal, to the medium term, such as managing to attract a good mate (Keagy, Savard, & Borgia, 2009), with the ultimate goal of the production of more and/or better offspring. Demonstration of these benefits requires not only the measurement of that ability, it also requires measurement of how variability in that ability maps onto a tangible benefit. However, although most assume that being ‘smarter’ must be beneficial, the data are remarkably sparse. Indeed, to our knowledge, perhaps the first data that might begin to confirm the ‘smarter is better’ assumption have only just been published (Cole, Cram, & Quinn, 2012). Cole et al. (2012) showed that great tits (Parus major) brought into the laboratory for testing, that learned how to access food from a manmade dispenser, when returned to the wild, went on to lay more eggs over the following four years. These birds also managed to spend less time feeding their offspring. To show these effects Cole et al. (2012) tested, and then tracked in the field, over 400 individuals, a feat that many will find difficult to emulate. However, while laying more eggs would suggest actual fitness benefits to greater problem-solving abilities, the solver great tits were also more likely to desert their nests than were those tits that did not solve the food access problem.
In the medium term, then, problem-solving great tits did not produce more grand-offspring than did the nonsolvers.

We (collectively) have a way to go before we have good evidence for fitness benefits of cognitive abilities. However, we hope that our focus on our work on rufous hummingbirds shows that one can usefully address questions about cognitive abilities in wild, free-living animals. Furthermore, these data provide tentative evidence for natural selection acting on cognitive abilities: like food-storing songbirds and unlike non-food-storing songbirds (e.g., Brodbeck, 1994), rufous hummingbirds place greater emphasis on spatial cues than on colour cues, a hierarchy of cue preference that is correlated with their particular ecological demands. However, it remains to be seen whether other animals, living very different lives, will pay attention to time as do the hummingbirds. Finally, if natural selection has shaped cognitive abilities, as it appears to have done, then the next challenge will be for us to determine what cognitive capabilities hummingbirds lack.


References

Altshuler, D.L. (2003). Flower color, hummingbird pollination, and habitat irradiance in four Neotropical forests. Biotropica, 35, 344-355. doi.org/10.1111/j.1744-7429.2003.tb00588.x doi.org/10.1646/02113

Auersperg, A. M. I., Huber, L., & Gajdon, G. K. (2011). Navigating a tool end in a specific direction: stick-tool use in kea (Nestor notabilis). Biology Letters, 7, 825-828. doi.org/10.1098/rsbl.2011.0388 PMid:21636657 PMCid:3210666

Babb, S. J. & Crystal, J. D. 2005. Discrimination of what, when, and where: Implications for episodic-like memory in rats. Learning and Motivation, 36, 177-189. doi.org/10.1016/j.lmot.2005.02.009

Balda, R.P., Pepperberg, I.M., & Kamil, A.C. 1998. Animal cognition in nature: The convergence of psychology and biology in laboratory and field. Academic Press. PMCid:1170538

Bateson, M., Healy, S.D. & Hurly, T.A. 2002. Irrational choices in hummingbird foraging behaviour. Animal Behaviour, 63, 587-596. doi.org/10.1006/anbe.2001.1925

Bateson, M., Healy, S.D. & Hurly, T.A. 2003. Context-dependent foraging decisions in rufous hummingbirds. Proceedings of the Royal Society London B, 270, 1271-1276. doi.org/10.1098/rspb.2003.2365 PMid:12816640 PMCid:1691372

Biebach, H., Falk, H. & Krebs, J. R. (1991). The effect of constant light and phase shifts on a learned time-place association in Garden Warblers (Sylvia borin): Hourglass or circadian clock? Journal of Biological Rhythms, 6, 353-365. doi.org/10.1177/074873049100600406 PMid:1773101

Biegler, R, McGregor, A, Krebs, J.R. & Healy, S.D. 2001. A larger hippocampus is associated with longer lasting spatial memory. Proceedings of the National Academy of Sciences USA, 98, 6941-6944. doi.org/10.1073/pnas.121034798 PMid:11391008 PMCid:34457

Boogert, N.J., Fawcett, T.W. & Lefebvre, L. 2011. Mate choice for cognitive traits: a review of the evidence in nonhuman vertebrates. Behavioral Ecology, 22, 447-459. doi.org/10.1093/beheco/arq173

Briscoe, A.D. & Chittka, L. 2001. The evolution of color vision in insects. Annual Review of Entomology, 46, 471-510. doi.org/10.1146/annurev.ento.46.1.471 PMid:11112177

Brodbeck, D. R. (1994). Memory for spatial and local cues – a comparison of a storing and a nonstoring species. Animal Learning & Behavior, 22, 119-133. doi.org/10.3758/BF03199912

Cain, S. W., Chou, T. & Ralph, M. R. 2004. Circadian modulation of performance on an aversion-based place learning task in hamsters. Behavioural Brain Research, 150, 201-205. doi.org/10.1016/j.bbr.2003.07.001 PMid:15033293

Clark, C. J. 2009. Courtship dives of Anna’s hummingbird offer insights into flight performance limits. Proceedings of the Royal Society London B, 276, 3047-3052. doi.org/10.1098/rspb.2009.0508 PMid:19515669 PMCid:2817121

Clayton, N. S. & Dickinson, A. (1999). Memory for the content of caches by scrub jays (Aphelocoma coerulescens). Journal of Experimental Psychology: Animal Behavior Processes, 25, 82-91. doi.org/10.1037/0097-7403.25.1.82 PMid:9987859

Cole, E.F., Cram, D.L. & Quinn, J.L. 2011. Individual variation in spontaneous problem-solving performance among great tits. Animal Behaviour, 81, 491-498. doi.org/10.1016/j.anbehav.2010.11.025

Dally, J. M., Emery, N. J. & Clayton, N. S. 2010. Avian Theory of Mind and counter espionage by food-caching western scrub-jays (Aphelocoma californica). European Journal of Developmental Psychology, 7, 17-37. doi.org/10.1080/17405620802571711

Eacott, M. J., Easton, A. & Zinkivskay, A. 2005. Recollection in an episodic-like memory task in the rat. Learning & Memory, 12, 221-223. doi.org/10.1101/lm.92505 PMid:15897259

Eacott, M. J. & Norman, G. 2004. Integrated memory for object, place, and context in rats: A possible model of episodic-like memory? Journal of Neuroscience, 24, 1948-1953. doi.org/10.1523/JNEUROSCI.2975-03.200 PMid:14985436

Ergorul, C. & Eichenbaum, H. 2004. The hippocampus and memory for “What,” “Where,” and “When”. Learning & Memory, 11, 397-405. doi.org/10.1101/lm.73304 PMid:15254219 PMCid:498318

Feeney, M., Roberts, W. & Sherry, D. 2009. Memory for what, where, and when in the black-capped chickadee (Poecile atricapillus). Animal Cognition, 12, 767-777. doi.org/10.1007/s10071-009-0236-x PMid:19466468

Feinsinger, P. & Colwell, R. K. (1978). Community organization among neotropical nectar-feeding birds. American Zoologist, 18, 779-795.

Ferkin, M. H., Combs, A., Delbarco-Trillo, J., Pierce, A. A. & Franklin, S. 2008. Meadow voles, Microtus pennsylvanicus, have the capacity to recall the “what”, “where”, and “when” of a single past event. Animal Cognition, 11, 147-159. doi.org/10.1007/s10071-007-0101-8 PMid:17653778

Flores-Abreu, I. N. 2012. Spatial cognition in three dimensions. Unpublished PhD thesis, University of St Andrews, UK.

Flores-Abreu, I. N., Hurly, T.A. & Healy, S.D. 2012. One-trial spatial learning: wild hummingbirds relocate a rewarding location after a single visit. Animal Cognition, 15, 631-637. doi.org/10.1007/s10071-012-0491-0 PMid:22526688

Flores-Abreu, I. N., Hurly, T.A. & Healy, S.D. 2013. Three-dimensional spatial learning in hummingbirds. Animal Behaviour. doi.org/10.1016/j.anbehav.2012.12.019

Freas, C.A., LaDage, L.D., Roth, T.C. & Pravosudov, V.V. 2012. Elevation-related differences in memory and the hippocampus in mountain chickadees, Poecile gambeli. Animal Behaviour, 84, 121-127. doi.org/10.1016/j.anbehav.2012.04.018

Girvan, J. R. & Braithwaite, V. A. 1998. Population differences in spatial learning in three-spined sticklebacks. Proceedings of the Royal Society London B, 265, 913-918. doi.org/10.1098/rspb.1998.0378 PMCid:1689060

Graham, P., Fauria, K. & Collett, T.S. 2003. The influence of beacon-aiming on the routes of wood ants. Journal of Experimental Biology, 206, 535-541. doi.org/10.1242/jeb.00115 PMid:12502774

Grobéty, M.-C. & Schenk, F. 1992. Spatial learning in a three-dimensional maze. Animal Behaviour, 43, 1011-1020. doi.org/10.1016/S0003-3472(06)80014-X

Hampton, R. R. & Shettleworth, S. J. 1996. Hippocampus and memory in a food-storing and in a nonstoring bird species. Behavioral Neuroscience, 110, 946-964. doi.org/10.1037/0735-7044.110.5.946 PMid:8918998

Healy, S.D. & Hurly, T.A. 2004. Spatial learning and memory in birds. Brain Behavior and Evolution, 63, 211-220. doi.org/10.1159/000076782 PMid:15084814

Healy, S.D. & Hurly, T.A. 2003. Cognitive ecology: foraging in hummingbirds as a model system. Advances in the Study of Behavior, 32, 325-359. doi.org/10.1016/S0065-3454(03)01007-6

Healy, S.D. & Hurly, T.A. 2001. Foraging and spatial learning in hummingbirds. In: Pollination Biology. (Ed. by L. Chittka & J. Thomson). Pp. 127-147. Cambridge University Press. PMid:11446794

Healy, S.D. & Hurly, T.A. 1998. Rufous hummingbirds’ (Selasphorus rufus) memory for flowers: patterns or actual spatial locations? Journal of Experimental Psychology: Animal Behavior Processes, 24, 1-9. doi.org/10.1037/0097-7403.24.4.396

Healy, S.D. & Hurly, T.A. 1995. Spatial memory in rufous hummingbirds (Selasphorus rufus): a field test. Animal Learning and Behavior, 23, 63-68. doi.org/10.3758/BF03198016

Henderson, J., Hurly, T.A., Bateson, M. & Healy, S.D. 2006a. Timing in free-living rufous hummingbirds Selasphorus rufus. Current Biology, 16, 512-515. doi.org/10.1016/j.cub.2006.01.054 PMid:16527747

Henderson, J., Hurly, T.A. & Healy, S.D. 2006b. Spatial relational learning in rufous hummingbirds (Selasphorus rufus). Animal Cognition, 9, 201-205. doi.org/10.1007/s10071-006-0021-z PMid:16767469

Henderson, J., Hurly, T.A. & Healy, S.D. 2001. Rufous hummingbirds’ memory for flower features. Animal Behaviour, 61, 98-106. doi.org/10.1006/anbe.2000.1670

Holbrook, R. & Burt de Perera, T. 2011. Three-dimensional spatial cognition: information in the vertical dimension overrides information from the horizontal. Animal Cognition, 14, 613-619. doi.org/10.1007/s10071-011-0393-6 PMid:21452048

Holbrook, R. I. & Burt de Perera, T. 2009. Separate encoding of vertical and horizontal components of space during orientation in fish. Animal Behaviour, 78, 241-245. doi.org/10.1016/j.anbehav.2009.03.021

Hurly, A. T. 1996. Spatial memory in rufous hummingbirds: memory for rewarded and non-rewarded sites. Animal Behaviour, 51, 177-183. doi.org/10.1006/anbe.1996.0015

Hurly, T.A. & Healy, S.D. 2002. Cue learning by rufous hummingbirds Selasphorus rufus. Journal of Experimental Psychology: Animal Behavior Processes, 28, 209-223. doi.org/10.1037/0097-7403.28.2.209 PMid:11987877

Hurly, T.A. & Healy, S.D. 1996. Memory for flowers in rufous hummingbirds: Location or local visual cues? Animal Behaviour, 51, 1149-1157. doi.org/10.1006/anbe.1996.0116

Hurly, T. A. & Oseen, M. D. 1999. Context-dependent, risk-sensitive foraging preferences in wild rufous hummingbirds. Animal Behaviour, 58, 59-66. doi.org/10.1006/anbe.1999.1130 PMid:10413541

Hurly, T.A., Franz, S. & Healy, S.D. 2010. Do rufous hummingbirds (Selasphorus rufus) use visual beacons? Animal Cognition, 13, 377-383. doi.org/10.1007/s10071-009-0280-6 PMid:19768647

Hurly, T.A., Scott, R.D. & Healy, S.D. 2001. The function of displays in male rufous hummingbirds. The Condor, 103, 647-651. doi.org/10.1650/0010-5422(2001)103[0647:TFODOM]2.0.CO;2

Keagy, J., Savard, J.-F. & Borgia, G. 2012. Cognitive ability and the evolution of multiple behavioral display traits. Behavioral Ecology, 23, 448-456. doi.org/10.1093/beheco/arr211

Keagy, J., Savard, J. -F., & Borgia, G. (2009). Male satin bowerbird problem-solving ability predicts mating success. Animal Behaviour, 78, 809-817. doi.org/10.1016/j.anbehav.2009.07.011

Latty, T. & Beekman, M. 2011. Irrational decision-making in an amoeboid organism: transitivity and context-dependent preferences. Proceedings of the Royal Society London B, 278, 307-312. doi.org/10.1098/rspb.2010.1045 PMid:20702460 PMCid:3013386

Marshall, R.E.S. 2012. Timing and episodic-like memory in the rufous hummingbird. Unpublished PhD thesis, University of St Andrews, UK.

Marshall, R.E.S., Hurly, T.A. & Healy, S.D. 2012. Do a flower’s features help hummingbirds to learn its contents and refill rate? Animal Behaviour, 83, 1163-1169. doi.org/10.1016/j.anbehav.2012.02.003

McGregor, A. & Healy, S.D. 1999. Spatial accuracy in food-storing and nonstoring birds. Animal Behaviour, 58, 727-734. doi.org/10.1006/anbe.1999.1190 PMid:10512645

Miller, R.S. & Miller, R.E. 1971. Feeding activity and color preference of ruby-throated hummingbirds. Condor, 73, 309-313. doi.org/10.2307/1365757

Miller, R.S., Tamm, S., Sutherland, G.D., & Gass, C.L. (1985). Cues for orientation in hummingbird foraging: color and position. Canadian Journal of Zoology, 63, 18-21. doi.org/10.1139/z85-004

Morgan, K.V., Hurly, T.A., Bateson, M., Asher, L., & Healy, S.D. 2012. Context-dependent decisions among options varying in a single dimension. Behavioural Processes, 89, 115-120. doi.org/10.1016/j.beproc.2011.08.017 PMid:21945144

Muth, F. & Healy, S.D. 2011. The role of adult experience in nest building in the zebra finch, Taeniopygia guttata. Animal Behaviour, 82, 185-189. doi.org/10.1016/j.anbehav.2011.04.021

Odling-Smee, L.C., Boughman, J.W. & Braithwaite, V.A. 2008. Sympatric species of threespine stickleback differ in their performance in a spatial learning task. Behavioral Ecology and Sociobiology, 62, 1935-1945. doi.org/10.1007/s00265-008-0625-1

Pahl, M., Zhu, H., Pix, W., Tautz, J. & Zhang, S. W. 2007. Circadian timed episodic-like memory – a bee knows what to do when, and also where. Journal of Experimental Biology, 210, 3559-3567. doi.org/10.1242/jeb.005488 PMid:17921157

Paton, D. C. & Carpenter, F. L. (1984). Peripheral foraging by territorial rufous hummingbirds – defense by exploitation. Ecology, 65, 1808-1819. doi.org/10.2307/1937777

Raven, P.H. 1972. Why are bird-visited flowers predominantly red. Evolution, 26, 674-674. doi.org/10.2307/2407064

Saksida, L. & Wilkie, D. 1994. Time-of-day discrimination by pigeons, Columba livia. Learning & Behavior, 22, 143-154. doi.org/10.3758/BF03199914

Sasaki, T. & Pratt, S. C. 2011. Emergence of group rationality from irrational individuals. Behavioral Ecology, 22, 276-281. doi.org/10.1093/beheco/arq198

Scarpi, D. 2011. The impact of phantom decoys on choices in cats. Animal Cognition, 14, 127-136. doi.org/10.1007/s10071-010-0350-9 PMid:20838836

Schmidt, J., Scheid, C., Kotrschal, K., Bugnyar, T. & Schloegl, C. 2011. Gaze direction – A cue for hidden food in rooks (Corvus frugilegus)? Behavioural Processes, 88, 88-93. doi.org/10.1016/j.beproc.2011.08.002 PMid:21855614 PMCid:3185283

Seed, A. & Byrne, R. 2010. Animal Tool-Use. Current Biology, 20, R1032-R1039. doi.org/10.1016/j.cub.2010.09.042 PMid:21145022

Sherry, D. F. & Vaccarino, A. L. 1989. Hippocampus and memory for food caches in black-capped chickadees. Behavioral Neuroscience, 103, 308-318. doi.org/10.1037/0735-7044.103.2.308

Taylor, A.H., Elliffe, D., Hunt, G.R. & Gray, R.D. 2010. Complex cognition and behavioural innovation in New Caledonian crows. Proceedings of the Royal Society London B, 277, 2637-2643. doi.org/10.1098/rspb.2010.0285 PMid:20410040 PMCid:2982037

Teschke, I. & Tebbich, S. 2011. Physical cognition and tool-use: performance of Darwin’s finches in the two trap tube task. Animal Cognition, 14, 555-563. doi.org/10.1007/s10071-011-0390-9 PMid:21360118

van Casteren, A., Sellers, W. I., Thorpe, S. K. S., Coward, S., Crompton, R.H., Myatt, J.P. & Ennos, A.R. 2012. Nest-building orangutans demonstrate engineering know-how to produce safe, comfortable beds. Proceedings of the National Academy of Sciences USA, 109, 6873-6877. doi.org/10.1073/pnas.1200902109 PMid:22509022 PMCid:3344992

Waite, T. A. 2001. Background context and decision making in hoarding gray jays. Behavioral Ecology, 12, 318-324. doi.org/10.1093/beheco/12.3.318

Walsh, P., Hansell, M. & Healy, S.D. 2010. Repeatability of nest morphology in African weaverbirds. Biology Letters, 6, 149-151. doi.org/10.1098/rsbl.2009.0664 PMid:19846449 PMCid:2865054

Walsh, P.T., Hansell, M., Borello, W. & Healy, S.D. 2011. Individuality in nest building: do Southern Masked weaver (Ploceus velatus) males vary in their nest-building behavior? Behavioural Processes, 88, 1-6. doi.org/10.1016/j.beproc.2011.06.011 PMid:21736928

Ward, B.J., Day, L.B., Wilkening, S.R., Wylie, D.R., Saucier, D.M., & Iwaniuk, A.N. 2012. Hummingbirds have a greatly enlarged hippocampal formation. Biology Letters, 8, 657-659. doi.org/10.1098/rsbl.2011.1180 PMid:22357941

Weir, A.A.S., Chappell, J. & Kacelnik, A. 2002. Shaping of hooks in New Caledonian crows. Science, 297, 981-981. doi.org/10.1126/science.1073433 PMid:12169726

Zhou, W. & Crystal, J. D. 2009. Evidence for remembering when events occurred in a rodent model of episodic memory. Proceedings of the National Academy of Sciences, 106, 9525-9529. doi.org/10.1073/pnas.0904360106 PMid:19458264 PMCid:2695044

Zinkivskay, A., Nazir, F. & Smulders, T. V. 2009. What-Where-When memory in magpies (Pica pica). Animal Cognition, 12, 119-125. doi.org/10.1007/s10071-008-0176-x PMid:18670793