The Evolution of Language
Michael C. Corballis
Department of Psychology, University of Auckland, Private Bag 92019
Auckland, New Zealand
Introduction
In 1866, seven years after the publication of Darwin's Origin of Species, the Linguistic Society of Paris famously banned all discussion of the evolution of language. The main difficulty, it seems, was the widespread belief that language was uniquely human, so that there was no evidence to be gained from the study of nonhuman animals. This meant that language must have evolved some time since the split of humans from the great apes. Since there was also little to be gained from the fossil evidence, any theory as to how language evolved was largely a matter of speculation, and no doubt argument. Of course evolution was itself a contentious issue, and was vigorously attacked by the church. In the case of language, the conflict between science and religion would have been exacerbated by the long-standing view that language was gifted by God.
In more recent if not more enlightened times, the ban seems to have been lifted, but the contention remains. The problem lies partly in the sheer complexity of language itself, and partly in the lack of fossil evidence that can be related directly to language. As Lock (1999) pointed out, the earliest incontrovertible evidence for human speech is not much more than a century old, with Edison's recordings! These factors contrive to make language evolution arguably “the hardest problem in science” (Christiansen & Kirby, 2003, p. 1). Although there are probably few who would argue that language is a gift from God, there are still those who maintain that language evolved in a single step, in all-or-none fashion. This is sometimes called the “big bang” theory of language evolution, and has been most clearly articulated by the linguist Derek Bickerton (1995), but is also implicit in much of the writing of Noam Chomsky. It smacks a little of the miraculous, although the appeal is more likely to be to a fortuitous genetic mutation (e.g., Crow, 2002), or to some emergent physical property (Chomsky, 1975), rather than to God. Against this is the view that language evolved in incremental fashion, through natural selection, as maintained by Pinker and Bloom (1990) and elaborated by Jackendoff (2002). This view is clearly aligned with the Darwinian view of human evolution.
A related issue is whether the origins of language are to be found in the behavior or communication systems of animals, or more specifically, of our primate forebears. In 1966, Chomsky wrote as follows:
The unboundedness of human speech, as an expression of limitless thought, is an entirely different matter [from animal communication], because of the freedom from stimulus control; and the appropriateness to new situations . . . Modern studies of animal communication so far offer no counterevidence to the Cartesian assumption that human language is based on an entirely different principle. Each known animal communication system either consists of a fixed number of signals, each associated with a specific range of eliciting systems or internal states, or a fixed number of `linguistic dimensions,' each associated with a non-linguistic dimension (pp. 77-78).
Although Chomsky has also argued that language did not evolve through natural selection, the idea that language is uniquely human has not precluded the notion that it is a product of natural selection. For example, although Pinker and Bloom (1990; see also Pinker, 1994) argue that language evolved in the hominid lineage through natural selection, they agree with Chomsky that nothing resembling language has been demonstrated in any nonhuman species.
Chomsky appears to have moderated his conclusion in a recent co-authored article (Hauser, Fitch, & Chomsky, 2002), in which it is argued that there is a distinction between what the authors call the faculty of language in the broad sense (FLB) and the faculty of language in the narrow sense (FLN). FLB includes those aspects of human language that we share with other species. In order to speak, for example, we need memory, respiration, a larynx capable of producing sound, and so forth. In the case of signed language, we need the ability to move the forelimbs in diverse ways, the ability to move parts of the face, and so on. These properties are evident in other species, and have no doubt been exapted for use in language systems. FLN includes only those aspects of language that are unique to humans. Much of the discussion of FLN has centered on two aspects of human language, namely the use of symbols, and syntax, which is a mechanism for combining symbols into a potentially infinite number of expressions.
It is clear that most animals, including primates, are able to make communicative sounds, and perform intentional acts. It is also clear that primates can use sounds or actions in symbolic fashion. For example, vervet monkeys give different warning cries to distinguish between a number of different threats, such as snakes, hawks, eagles, or leopards. When a monkey makes one of these cries, the troop acts appropriately, clambering up trees in response to a leopard call or running into the bushes in response to an eagle call (Cheney & Seyfarth, 1990). These cries bear no obvious relation to the sounds emitted by the predators they stand for, and are in that sense symbolic. More compellingly, perhaps, the bonobo Kanzi is able to use visual symbols on a keyboard to refer to objects and actions; these symbols are again abstract in the sense that they were deliberately chosen by the human keepers so as not to resemble what they stand for (Savage-Rumbaugh, Shanker, & Taylor, 1998).
Recursion
Based on such arguments, Hauser et al. (2002) argue that FLB is shared by other species, including birds and other mammals, although they also point out that the use of symbols in these examples does not mean that the symbols have all of the properties of words (see Deacon, 1997, 2003, for extended discussion). In their view, though, the critical ingredient that is missing from FLB, and that characterizes FLN, is recursion. Recursion lies at the heart of grammar, and enables us to create a potential infinity of sentences that convey an infinity of meanings. Recursive language is well understood even by quite young children, as illustrated by the well-known children's story:
This is the house that Jack built
This is the malt that lay in the house that Jack built
This is the rat that ate the malt that lay in the house that Jack built
This is the cat that worried the rat that ate the malt that lay in the house that Jack built
And so on … and on. Young children quickly understand that the sentence can be extended ad infinitum. The recursive rules of grammar also allow phrases to be moved around instead of simply being tacked on to the beginning. For example, if one wanted to highlight the malt in the story, one could embed phrases as follows:
The malt that the rat that the cat killed ate lay in the house that Jack built.
It seems clear that this highly flexible, recursive property is absent from communication among nonhuman species. Although many birds emit long sequences of sounds, these are essentially repetitive, born of insistence, perhaps, rather than the attempt to convey new information. The same is true of primates, and it is evident even at the acoustic level that there is a variation and novelty in human vocal output that simply does not exist in the vocalizations of primates, even great apes (Arcadi, 2000). A visitor from Mars would soon discern that there is something special about human vocal output even if she had no understanding of what was being said.
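As a rough illustration of the right-branching recursion in the nursery rhyme quoted earlier in this section, the following Python sketch builds each sentence by letting a clause call itself. The function name and the (noun, verb) representation are assumptions made purely for illustration; the sketch is not a model of human grammar, and it does not attempt the centre-embedded variant.

```python
def jack_sentence(links):
    """Build a 'House that Jack built' sentence.

    links is a list of (noun, verb) pairs, outermost first; this
    representation is a hypothetical one chosen for the example.
    """
    def clause(i):
        # Base case: the recursion bottoms out at the house.
        if i == len(links):
            return "the house that Jack built"
        noun, verb = links[i]
        # Recursive case: a clause contains another clause.
        return f"the {noun} that {verb} " + clause(i + 1)

    return "This is " + clause(0)


chain = [("cat", "worried"), ("rat", "ate"), ("malt", "lay in")]
# Print the rhyme from the shortest sentence to the longest.
for k in range(len(chain), -1, -1):
    print(jack_sentence(chain[k:]))
```

Because the rule refers to itself, adding one more (noun, verb) pair to the list always yields a longer well-formed sentence, which is the sense in which the rhyme can be extended without limit.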
Protolanguage
Compared to the recursive sophistication of human language, animal communication systems are at best only weakly combinatorial. For the past half-century or so, there have been strenuous attempts to teach language to the great apes, and especially to our closest relatives the chimpanzee and bonobo. It soon emerged that chimpanzees were essentially unable to speak; in one famous example, a baby chimpanzee reared in a human family proved able to articulate only three or four words, and was soon outstripped by the human children in the family (Hayes, 1952). It was then realized that the failure to speak may have resulted from deficiencies of the vocal apparatus, and perhaps of cortical control of vocal output, and subsequent attempts have been based on manual action and visual representations. For example, the chimpanzee Washoe was taught over 100 manual signs, based loosely on American Sign Language, and was able to combine signs into two- or three-“word” sequences to make simple requests (Gardner & Gardner, 1969). The bonobo Kanzi has mastered the use of a keyboard containing 256 symbols representing objects and actions, and can construct meaningful sequences by pointing to the symbols. He supplements this vocabulary with gestures of his own invention. Although he makes full use of this vocabulary, his manual utterances appear to be limited to only two or three “words.” Surprisingly, though, he has shown an impressive ability to follow instructions conveyed in spoken sentences, with as many as seven or eight words (Savage-Rumbaugh, Shanker, & Taylor, 1998).
There seems to be a general consensus, though, that these exploits are not language; as Pinker (1994, p. 340) put it, the great apes “just don't `get it.'” Kanzi's ability to understand spoken sentences, although seemingly impressive, was shown to be roughly equivalent to that of a two-and-a-half-year-old girl (Savage-Rumbaugh et al., 1998), and is probably based on the extraction of two or three key words rather than a full decoding of the syntax of the sentences. His ability to produce symbol sequences is also at about the level of the average two-year-old human. In human children, grammar typically emerges between the ages of two and four, so that the linguistic capabilities of Kanzi and other great apes are generally taken as equivalent to those of children in whom grammar has not yet emerged. Bickerton (1995, p. 339) wrote that “The chimps' abilities at anything one would want to call grammar were next to nil,” and has labeled this pre-grammatical level of linguistic performance “protolanguage.”
Bickerton has further suggested, though, that protolanguage may be the precursor of true language, not only in development, but also in evolution, an idea adopted by Jackendoff (2002) in a recent influential book. Yet protolanguage has been taught to such diverse creatures as the great apes, dolphins, a sea lion, and an African gray parrot, implying parallel evolution. Further, it has never been observed in the wild. An alternative view, then, is that it is not a precursor to language, but is indicative rather of a general problem-solving ability. For example, chimpanzees have been observed to solve mechanical problems by combining implements, such as joining two sticks together to rake in food that would not be reachable using either stick alone (Kohler, 1925; Tomasello, 1996). The simple combining of symbols to achieve some end, such as food, may be in principle no different.
Nevertheless, protolanguage may be said also to characterize the communication skills of the two-year-old child, as well as those with expressive aphasia following damage to Broca's area or perhaps, as more recent research suggests, damage to the left precentral area of the insula, a cortical structure underlying the frontal and temporal lobes (Dronkers, 1996). Given that protolanguage seems to underlie language both in development and in terms of neural representation, it is perhaps reasonable to suppose that language did in fact grow out of protolanguage, whether conceived as a linguistic ability or as simply a capacity for problem solving, in human evolution. The question then is, when did this happen?
From Protolanguage to Language
Since great apes have not acquired any communicative skill beyond protolanguage, despite strenuous efforts to teach them, we can be reasonably sure that the language capacity evolved some time after the split between the hominid line and the line leading to modern chimpanzees and bonobos. The earliest fossil skull tentatively identified as a bipedal hominid is Sahelanthropus tchadensis, discovered in Chad, and dated between 6 and 7 million years ago (Brunet et al., 2002). This date is probably very close to the time of the chimpanzee-hominid split, estimated at between 6.3 and 7.7 million years ago by a DNA-DNA hybridization technique (Sibley & Ahlquist, 1984). Another early fossil, Orrorin tugenensis, is perhaps more securely identified as bipedal, and is dated from between 5.2 and 5.8 million years ago (Galik et al., 2004). The early hominids were distinguished from the great apes by bipedalism, but with respect to brain size and what little is known of their cognitive capacities they were probably little different from present-day chimpanzees. There is little reason to suppose that grammatical language emerged within the 4 or 5 million years of early hominid evolution.
Dramatic changes began to occur from some 2 million years ago, with the emergence of the genus Homo. Stone tool industries have been dated from about 2.5 million years ago in Ethiopia (Semaw et al., 1997), and tentatively identified with Homo rudolfensis. However these tools, which belong to the Oldowan industry, are primitive, and some have suggested that H. rudolfensis and H. habilis, the hominid traditionally associated with the Oldowan, should really be considered australopithecines (e.g., Wood, 2002). The true climb to humanity, and to language, probably began with the emergence of the larger-brained Homo erectus around 1.8 million years ago, and the somewhat more sophisticated Acheulian tool industry dating from around 1.5 million years ago (Ambrose, 2001). But even tool manufacture may not be an especially good guide to the advance of cognition, since the Acheulian industry remained fairly static for over a million years, and even persisted into the culture of early Homo sapiens some 125,000 years ago (Walter et al., 2000).
Other changes associated with H. erectus may give a better guide to the emergence of more sophisticated cognition. Erectus marked the progression from a relatively primitive form of bipedalism, retaining a degree of adaptation to an arboreal habitat, to the full striding gait characteristic of modern humans. It has been suggested that the newer form of bipedalism evident from erectus on was also an adaptation to efficient endurance running, and resulted in marked changes to skeletal structure (Bramble & Lieberman, 2004). This may have been driven by a change in habitat and lifestyle toward hunting and scavenging, which in turn may have resulted in cognitive and cerebral changes leading to language. From about 1.6 million years ago, some members of this species strode out of Africa and into Asia, and erectus fossils in Java have been dated to as recently as 30,000 years ago (Swisher et al., 1994). But perhaps the surest signs of intellectual advance have to do with the size and development of the brain.
Bigger brains
According to estimates based on fossil skulls, brain size increased from 457 cc in Australopithecus africanus, to 552 cc in H. habilis, to 854 cc in early H. erectus (also known as H. ergaster), to 1016 cc in later H. erectus, to 1552 cc in H. neanderthalensis, and back to 1355 cc in H. sapiens (Wood & Collard, 1999). These values depend partly on body size, which probably explains why H. neanderthalensis, being slightly larger than modern humans, also had slightly larger brains, but the picture is clearly one of a progressive increase, first clearly evident in early Homo.
Indeed Chomsky (1975) has suggested that language may have arisen simply as a consequence of possessing an enlarged brain, without the assistance of natural selection:
We know very little about what happens when 10^10 neurons are crammed into something the size of a basketball, with further conditions imposed by the specific manner in which this system developed over time. It would be a serious error to suppose that all properties, or the interesting structures that evolved, can be `explained' in terms of natural selection.
Nevertheless the increase in brain size itself may have depended on natural selection, and recent research has brought to light two genetic mutations that may have a bearing on this. It is of some interest that both mutations resulted in the inactivation of genes, suggesting that we may owe our humanity at least in part to the loss of genes, rather than the incorporation of new ones.
One of these mutations has to do with a gene on chromosome 6 that encodes the enzyme CMP-N-acetylneuraminic acid (CMP-Neu5Ac) hydroxylase (CMAH). An inactivating mutation of this gene has resulted in a deficiency in humans of the mammalian sialic acid N-glycolylneuraminic acid (Neu5Gc). This acid appears to be absent in Neanderthal fossils as well as in humans, but is present in other present-day primates. It also seems to have been down-regulated in the chimpanzee brain, and through mammalian evolution, leading to speculation that inactivation of the CMAH gene may have removed a constraint on brain growth in human ancestry (Chou et al., 2002). Chou et al. applied molecular-clock analysis to the CMAH genes in chimpanzees and other great apes, as well as to the pseudogene in humans, which indicated that the mutation occurred some 2.1 million years ago, leading up to the expansion in brain size.
The other inactivating mutation that may also have contributed to the increase in brain size has to do with a gene on chromosome 7 that encodes the myosin heavy chain MYH16, responsible for the heavy masticatory muscles in most primates, including chimpanzees and gorillas, as well as the early hominids. Molecular-clock analysis suggests that this gene was inactivated around 2.4 million years ago, leading to speculation that the diminution of jaw muscles and their supporting bone structure removed a further constraint on brain growth (Stedman et al., 2004). It is a matter of further speculation as to why this seemingly deleterious mutation became fixed in the ancestral human population. It may have had to do with the change from a predominantly vegetable diet to a meat-eating one, or it may have had to do with the increasing use of the hands rather than the jaws to prepare food (Currie, 2004).
More generally, though, these mutations, and the resulting increase in brain size, may have been selected because of a change in environment. With the global shift to cooler climate after 2.5 million years ago, much of southern and eastern Africa probably became more open and sparsely wooded (Foley, 1987). This left the hominids not only more exposed to attack from dangerous predators, such as saber-tooth cats, lions, and hyenas, but also obliged to compete with them as carnivores. The solution was not to compete on the same terms, but to establish what Tooby and DeVore (1987) called the “cognitive niche,” relying on social cooperation and intelligent planning for survival. As Pinker (2003, p. 27) put it, it became increasingly important to encode, and no doubt express, information as to “who did what to whom, when, where, and why.” The problem is that the number of combinations of possible events becomes very large, and a system of holistic calls to describe those events rapidly taxes the perceptual and memory systems.
Suppose, for example, that the environment of an early hominid comprised ten “objects” (such as lion, tree, stone chopper, child, etc.) and six actions (such as run, climb, throw, carry, etc.). This gives rise to 60 possible combinations (some admittedly unlikely), and to attach a single utterance to each would therefore require 60 distinct utterances. There is clear benefit in attaching distinct symbols, which we can now call “words,” to each object and to each action, requiring only 16 words. The first step toward grammar may therefore have been the assignment of distinct labels to different objects and actions, along with a simple rule for combining them (Nowak, Plotkin, & Jansen, 2000). Nowak (2001) has extended this approach to indicate how universal grammar might have evolved.
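The arithmetic behind this example can be sketched in a few lines of Python. The function name is an assumption introduced here for illustration, and the sketch counts only object-action pairs; it is not the evolutionary model of Nowak and colleagues.

```python
def signals_needed(n_objects, n_actions):
    """Return (holistic, combinatorial) signal counts.

    holistic: one distinct utterance per object-action event.
    combinatorial: one word per object plus one word per action,
    combined by a single pairing rule.
    """
    holistic = n_objects * n_actions
    combinatorial = n_objects + n_actions
    return holistic, combinatorial


# The ten objects and six actions of the example in the text.
holistic, combinatorial = signals_needed(10, 6)
print(holistic, combinatorial)  # 60 16
```

The holistic count grows as a product and the combinatorial count as a sum, so the saving widens as the world to be described grows: with 100 objects and 50 actions the same calculation gives 5,000 holistic utterances against 150 words.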
Of course, such a system still requires immense long-term storage capacity. The average adult knows tens of thousands of words, although words themselves are built from a much smaller vocabulary of phonemes, and are themselves characterized by “discrete infinity.” The double processes of combining phonemes into words and words into sentences make up what is known as duality of patterning. One clear benefit of the combinatorial principle is that the language user has ready access to new combinations to describe new concepts and new events, or even imaginary events, such as a cow jumping over the moon. A disadvantage of this system, though, is that messages comprise long sequences of elements, which places an extra burden on short-term memory. The increase in brain size may therefore have been driven as much by the demands on short-term memory as by the requirements of vocabulary and syntax. But whatever the reason, language is expensive in terms of neural circuitry, as it takes up a good deal of the left hemisphere (and some of the right) in modern humans (Dick et al., 2001). Rather than suppose that language simply “emerged” as a consequence of increased brain size, as proposed by Chomsky (1975), then, it is much more likely that selection for an increasingly sophisticated language was itself one of the drivers of increasing brain size.
Although Bickerton (1995) and others have argued that language evolved in a single step from protolanguage, it is more likely that it developed gradually, through progressive refinements based on natural selection. Bickerton himself has modified his earlier view, now proposing several steps, although he still argues that syntax as we know it was probably not present in the Neanderthals, and must therefore have evolved within the past 300,000 years (Bickerton, 2003; Calvin & Bickerton, 2000). The steps by which language evolved must be largely a matter of conjecture. Hurford (2003) has proposed that the earliest “language” consisted only of nouns and verbs, and was followed by the gradual accrual of new categories, which had their origins in nouns and verbs. To show how this might have happened, he cites the emergence of Tok Pisin, a creole that derived from pidgin in Papua New Guinea. The pidgin consisted only of nouns and verbs, but in the creole adjectives were signaled by the addition of the suffix -fela (or -pela), itself derived from the English noun fellow. It also seems reasonable to suppose that phrase structure preceded the ability to combine phrases into single utterances. Christiansen and Kirby (2004) give the example of the phrases My dad/He plays tennis/He plays tennis with his colleagues, which can be combined into the more compact form My dad plays tennis with his colleagues.
A more extensive account of the sequence of steps leading up to modern language has been proposed by Jackendoff (2002), and is summarized in Figure 1.
Figure 1 about here
Postnatal growth
There is some reason to suppose that recursive grammar depends, not on brain size per se, but rather on postnatal growth of the brain. Human development is characterized by “secondary altriciality”; that is, the human brain undergoes most of its growth after birth. In macaque newborns, the brain at birth has a volume of about 70% of adult size; in chimpanzees the figure is about 40%; in humans, it is only 25% (Coqueugniot et al., 2004). The brain is at its most plastic during growth, which may explain the so-called “critical period” for the development of language. The optimal period is probably between two and four, although there is evidence that grammar can be acquired later in childhood (Vargha-Khadem et al., 1997), but probably not after puberty.
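To make the neonatal proportions quoted above easier to compare, the short Python sketch below converts each into the factor by which the brain must still grow after birth to reach adult volume. The dictionary and function names are assumptions made for illustration, and the input figures are simply the approximate percentages given in the text.

```python
# Approximate brain volume at birth as a fraction of adult volume,
# taken from the percentages quoted in the text.
NEONATAL_FRACTION = {
    "macaque": 0.70,
    "chimpanzee": 0.40,
    "human": 0.25,
}


def postnatal_growth_factor(fraction_at_birth):
    """Factor by which brain volume grows between birth and adulthood."""
    if not 0 < fraction_at_birth <= 1:
        raise ValueError("fraction_at_birth must lie in (0, 1]")
    return 1.0 / fraction_at_birth


for species, fraction in NEONATAL_FRACTION.items():
    factor = postnatal_growth_factor(fraction)
    print(f"{species}: grows about {factor:.1f}x after birth")
```

On these figures the macaque brain grows by a factor of roughly 1.4 after birth, the chimpanzee brain by 2.5, and the human brain by 4, which is one way of expressing how much more of human brain development takes place while the infant is exposed to its environment.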
The role of childhood in the emergence of grammatical structure has been documented in the conversion of pidgins, which are effectively protolanguages created by European traders working in foreign territories, to creoles, in which grammatical structure has been imposed. This conversion occurs within a generation, and appears to be accomplished by a generation of children rather than by the adults first exposed to the pidgin (Bickerton, 1981). A more recent example comes from Nicaraguan Sign Language (NSL), which has been developed among the deaf only in the past 25 years. In an experimental study, individuals were asked to describe the action of rolling down a slope. Those from the first cohort mimicked both the rolling and the down motions in a single gesture. The majority of those in the second and third cohorts indicated the motion in two gestures, one to indicate a rolling motion and the other to indicate downward motion (Senghas, Kita, & Özyürek, 2004). The creation of separate gestures effectively “grammaticalizes” the language, opening up the possibility of new combinations. It was evident only in those cohorts who had been introduced to NSL as children, and not among those who had been exposed as adults, which reinforces the view that language is structured by mechanisms expressed during childhood. It remains an open question whether it reflects an underlying universal grammar (UG), or whether it implies some more general capacity for structured learning in the developing brain.
The idea that growth may be critical to the emergence of recursive grammar also gains some support from work with artificial networks. Elman (1993) tried to determine whether a network with recurrent loops could acquire the rules underlying sequences of symbols, by testing whether the network could learn to predict the next symbol in the sequence. The network was able to learn simple grammars, but was at first unable to deal with recursive grammars in which phrases were embedded in other phrases. This problem was at least partially surmounted when Elman introduced a "growth" factor, which he simulated by degrading the system early on so that only global aspects of the input were processed, and then gradually decreasing the "noise" in the system so that it was able to process more and more detail. When this was done, the system was able to pick up some of the recursive quality of grammar, and so begin to approximate the processing of true language.
One might expect secondary altriciality to have evolved along with the increasing brain size, since restrictions on the size of the birth canal would presumably have forced earlier birth. Indeed, it has been estimated that if human birth were to conform to the general primate pattern it should occur after 18 months, not nine months (Krogman, 1972). Yet there is recent evidence that early Homo erectus showed an ape-like pattern rather than a human-like pattern of post-natal brain growth, suggesting that secondary altriciality may not have emerged until fairly late in the genus Homo (Coqueugniot et al., 2004; Dean et al., 2001). It is also possible to gauge the period of development from striations in the dental enamel on teeth. Evidence from fossil teeth suggests that although Homo neanderthalensis possessed brains at least as large as those of modern humans, their rate of development was much faster (Ramirez Rozzi & Bermudez de Castro, 2004). The difference between H. sapiens and earlier Homo may lie, not in the relative size of the brain at birth, but rather in the speed of brain growth. Komarova and Nowak (2001) suggest that language is more accurate the longer the period of acquisition, but this must be balanced against the high cost of learning. The pressure may have been toward longer periods of growth as language skills became more critical to biological fitness.
In summary, the expansion of brain size from some 2 million years ago may well have signaled the beginnings of more sophisticated language, and the emergence of phrase structure and grammatical categories. But the emergence of fully recursive language may not have come about until later in the evolution of Homo, when postnatal growth was characterized not only by secondary altriciality, but also by a slowing of postnatal growth. This scenario may well provide an evolutionary framework for the sequential processes of language evolution outlined by Jackendoff (2002).
Biology or culture?
There can be little question that language is a biological endowment, at least in the sense that humans are the only species to possess it. It is nevertheless profoundly cultural, in the senses that it serves primarily social functions and often demarcates one culture from another. Even within the biology of language, there is some question as to whether it is simply a consequence of a general cognitive capacity, depending perhaps on a large brain or a long period of postnatal growth, or whether it depends on some biological adaptation or adaptations specific to language. Following Chomsky (1975), it has generally been assumed, by linguists at least, that all languages, regardless of culture, depend on universal grammar (UG), which is assumed to underlie all grammars, and to be innate and common to all humans. After some 30 years of discussion, though, no one seems to have a clear idea of what UG actually is—as Newmeyer (2003) points out, there are more than a dozen competing versions. Although Chomsky has likened UG to a physical organ, such as the heart, he was skeptical as to whether it was the direct result of natural selection, as we have seen. Others have attributed UG to genetic influences. Pinker (1994) wrote, for example, of a “grammar gene,” but has since referred to evidence that several genetic mutations have contributed to the evolution of grammar in our own species (Pinker, 2003).
An alternative view, though, is that language evolved not as a specific endowment, but rather as part of a more general adaptation to a more complex social environment. Recursion, arguably the most important aspect of grammar, is not unique to language. Tomasello (2003) has proposed, for example, that language is part of a broader capacity to understand others as intentional agents. Our social lives are governed by the attribution of mental states, and these attributions are used recursively, as in I know that she thinks I'm crazy, or I know that she thinks he thinks she's crazy. Recursion is not even restricted to social settings. We have developed highly complex, recursive ways of manufacturing things, using the same elements in different environments. The classic example is the invention of the wheel, which has come to feature in a vast array of mechanical contrivances. This does not prove the existence of a manufacturing gene, but is generally seen as evidence of human inventiveness. I have myself argued that humans are blessed with a generative assembling device (GAD) that underlies not only language, but also other recursive activities such as manufacture and music (Corballis, 1991).
Whether the biological changes necessary for language were specific to language or whether they were more general endowments, the primary function of language is the sharing of information between people. Language evolved biologically as a function of social pressures. Further, languages vary between cultures, not only in the words themselves, but also in grammatical structure. For example, languages differ with respect to the ordering of subject (S), object (O), and verb (V). Among Indo-European languages, most modern ones follow the SVO order, while most ancient ones follow the SOV order. More generally the switch from an OV to a VO order seems to have occurred more often than the reverse switch, suggesting to Newmeyer (2000) that the original human language, if such existed, was an SOV language.
Languages also mutate through time, just as living organisms do, so that it is possible to construct language trees in much the same way that one constructs cladistic trees of biological species. That mutation is of course primarily social, not genetic. Words drift through time, possibly in rather random fashion, so that different cultures acquire different words. This means that linguistic information can be used to infer migration patterns in much the same way that genetic information is used. For example, using methods derived from biology, Gray and Jordan (2000) were able to derive a model of how the Pacific was colonized by Austronesian-speaking peoples from the relations between their contemporary languages. Yet, although languages differ, so far as we know children of any racial group can acquire any of the world's languages, which is further indication that language variation is basically social, not genetic. Even so, languages differ to the point of mutual incomprehensibility, and serve as much to keep human cultures apart as they do to facilitate bonding within cultures.
Although languages change over time, it has been commonly assumed that all languages are essentially the same in basic structure. This view is known as uniformitarianism, and relates to the view that all languages depend on UG. There is some reason to believe, though, that not all languages are born entirely equal, even though language speakers may be born equal, at least across cultural groups. For example, it has been suggested that the languages of small isolated communities have highly complex grammatical or inflectional rules that may serve to preserve tight social networks, and perhaps exclude non-native speakers (Trudgill, 1992). Literacy may also have an impact on language structure. Givón (1979) notes, for example, that the use of subordinate clauses increases dramatically with literacy. On the basis of these and other examples, Newmeyer (2003) has argued that we should look more carefully at the way in which language structure might vary depending on social and environmental pressures. Understanding the balance between the biological and social is perhaps one of the foremost challenges facing the study of language evolution.
Did Language Evolve from Manual Gesture?
Language is often equated with speech. Yet it has become increasingly evident that the signed languages invented by deaf communities down the ages have all of the grammatical and semantic sophistication of speech (Armstrong et al., 1995; Emmorey, 2002; Neidle et al., 2000). Chomsky (2000, pp. 121-122) puts it succinctly:
Though highly specialized, the language faculty is not tied to specific sense modalities, contrary to what was assumed not long ago. Thus, the sign language of the deaf is structurally very similar to spoken language, and the course of acquisition is very similar. Large-scale sensory deficit seems to have limited effect on language acquisition. … The analytic mechanisms of the language faculty seem to be triggered in much the same ways whether the input is auditory, visual, even tactual, and seems to be localized in the same brain areas, somewhat surprisingly.
Further, language is seldom wholly one or the other. Both manual and facial gestures normally accompany speech, and are closely synchronized with it, implying a common source, and the gestures carry part of the meaning (Goldin-Meadow & McNeill, 1999). The visible movements of the face can also influence the perception of speech, as in the McGurk effect, in which dubbing sounds onto a mouth that is saying something different alters what the hearer actually hears (McGurk & MacDonald, 1976). Although we can communicate without having to see the person we are talking to, as on radio or cell-phone, speech in the natural world is rendered more eloquent and meaningful with the addition of bodily movements.
These considerations raise the possibility that language itself might have evolved from manual gestures rather than from animal calls, although it may always have been a mixture of both—perhaps with gestures punctuated by grunts gradually giving way to vocalizations embellished by gestures. The idea that language may have its roots in manual gesture goes back at least to the 18th century philosopher Condillac (1971/1746), but has been advocated many times since, often independently (e.g., Armstrong, 1999; Armstrong et al., 1995; Corballis, 2002; Givón, 1995; Hewes, 1973; Rizzolatti & Arbib, 1998). From an evolutionary point of view, the idea makes some sense, since nonhuman primates have little if any cortical control over vocalization, but excellent cortical control over the hands and arms. Human speech required extensive anatomical modifications, including changes to the vocal tract and to innervation of the tongue, and the development of cortical control over voicing via the pyramidal tract. The vocalizations of other primates are probably largely emotional, controlled by the limbic system rather than the cortex (Ploog, 2002). The human equivalents are laughing, crying, shrieking, and the like. As we shall see, the modifications necessary for articulate speech arrived late in hominid evolution, and may not have been complete until the emergence of Homo sapiens—or even later. We have also seen that efforts to teach great apes anything resembling speech have proven futile, but there has been reasonable success in teaching them to communicate through gestures or by pointing to visual symbols.
The late emergence of speech
Fossil evidence suggests that articulate speech emerged late in hominid evolution, which gives further grounds for supposing that it may have been preceded by a gestural system. One piece of evidence has to do with the hypoglossal canal at the base of the skull. The hypoglossal nerve, which passes through this canal and innervates the tongue, is much larger in humans than in great apes, probably because of the important role of the tongue in speech. Fossil evidence suggests that the size of the hypoglossal canal in early australopithecines, and perhaps in Homo habilis, was within the range of that in modern great apes, while that in Neanderthal and early H. sapiens skulls was well within the modern human range (Kay, Cartmill, & Barlow, 1998), although this has been disputed (DeGusta, Gilbert, & Turner, 1999). A further clue comes from the finding that the thoracic region of the spinal cord is relatively larger in humans than in nonhuman primates, probably because breathing during speech involves extra muscles of the thorax and abdomen. Fossil evidence indicates that this enlargement was not present in the early hominids or even in Homo ergaster, dating from about 1.6 million years ago, but was present in several Neanderthal fossils (MacLarnon & Hewitt, 1999).
The production of articulate speech in humans depends on the lowering of the larynx. According to P. Lieberman (1998; Lieberman, Crelin, & Klatt, 1972) this adaptation was incomplete even in the Neanderthals of 30,000 years ago, and their resultant poor articulation would have been sufficient to keep them separate from H. sapiens, leading to their eventual extinction. This work remains controversial (e.g., Gibson & Jessee, 1999), but there is other evidence that the cranial structure underwent changes subsequent to the split between anatomically modern and earlier “archaic” Homo, such as the Neanderthals, Homo heidelbergensis, and Homo rhodesiensis. One such change is the shortening of the sphenoid, the central bone of the cranial base from which the face grows forward, resulting in a flattened face (D. E. Lieberman, 1998). D. E. Lieberman speculates that this is an adaptation for speech, contributing to the unique proportions of the human vocal tract, in which the horizontal and vertical components are roughly equal in length. This configuration, he argues, improves the ability to produce acoustically distinct speech sounds, such as the vowel [i] (P. Lieberman, 2002). It is not seen in Neanderthal skeletal structure (see also Vleck, 1970), suggesting that it emerged in our own species within the past 500,000 years. Another adaptation unique to H. sapiens is neurocranial globularity, defined as the roundness of the cranial vault in the sagittal, coronal, and transverse planes, which is likely to have increased the relative size of the temporal and/or frontal lobes relative to other parts of the brain (D. E. Lieberman, McBratney, & Krovitz, 2002). These changes may reflect more refined control of articulation and also, perhaps, more accurate perceptual discrimination of articulated sounds.
These various findings suggest that the tinkering of the brain and cranial configuration for the refinement of speech continued into H. sapiens after the split from the Neanderthals and other “archaic” species of Homo, and perhaps even into the past 100,000 years of human evolution, as I shall suggest below. Nevertheless, grammatical language may well have arisen considerably earlier, and may have been conveyed initially by means of manual and facial gestures, increasingly augmented and eventually (largely) replaced by vocalization.
Speech as gesture
The idea that language may have evolved from manual gestures receives further support from evidence that speech itself is better considered a gestural system than an acoustic one. Traditionally, speech has been regarded as made up of discrete elements of sound, called phonemes. It has been known for some time, though, that phonemes do not exist as discrete units in the acoustic signal (Joos, 1948), and are not discretely discernible in mechanical recordings of sound, such as a sound spectrograph (Liberman et al., 1967). One reason for this is that the acoustic signal corresponding to individual phonemes varies widely, depending on the contexts in which they are embedded. This has led to the view that they exist only in the minds of speakers and hearers, and the acoustic signal must undergo complex transformation for individual phonemes to be perceived as such. Yet we can perceive speech at remarkably high rates, up to at least 10-15 phonemes per second, which seems at odds with the idea that some complex, context-dependent transformation is necessary.
These problems have led to the alternative view, known as articulatory phonology (Browman & Goldstein, 1995), that speech is better understood as composed of articulatory gestures. Six articulatory organs (the lips, the velum, the larynx, and the blade, body, and root of the tongue) produce these gestures. Each is controlled separately, so that individual speech units are composed of different combinations of movements. The distribution of action over these articulators means that the elements overlap in time, which makes possible the high rates of production and perception. Unlike phonemes, speech gestures can be discerned by mechanical means, through X-rays, magnetic resonance imaging, and palatography (Studdert-Kennedy, 1998).
This still raises the question, though, of how these gestures are perceived—our ability to understand speech on radio or telephone is incontrovertible evidence that speech can be understood from the acoustic stream alone. The short (but still incomplete) answer is that speech is understood in terms of the articulatory gestures that produce it, rather than in terms of elementary sound units. This is the so-called “motor theory of speech perception” (Liberman et al., 1967). Although we can understand the radio announcer, there is abundant evidence that watching people speak can aid understanding of what they are saying. Nevertheless the process by which articulatory information is extracted from the acoustic signal is not fully understood, although some insight has come from the recent discovery of what has been termed the “mirror system” in the brain.
The mirror system
Neurons in the region of F5 in the ventral premotor cortex of the monkey typically fire when the animal makes grasping movements with the hand or mouth. A subset of those cells, dubbed “mirror neurons,” also fire when the animal observes another individual making the same movements. In the monkey, these neurons require the presence of a target: they do not respond to movements that merely mimic an action in the absence of a target, nor do they respond to a target alone (Gallese et al., 1996; Rizzolatti et al., 1996). This direct mapping of perceived action onto the production of action seems to provide a platform for the evolution of language, and to support, albeit indirectly, the motor theory of speech perception. Furthermore, the area of the human brain that corresponds most closely to area F5 in the monkey includes Broca's area, which is one of the main cortical areas underlying the production of speech. This suggests that speech may have arisen from cortical structures that initially had to do with manual action rather than with vocalization (Rizzolatti & Arbib, 1998).
It has also become apparent that mirror neurons are part of a more general mirror system that involves other regions of the brain as well. The superior temporal sulcus (STS) also contains cells that respond to observed biological actions, including grasping actions (Perrett et al., 1989), although few if any respond when the animal itself performs an action. F5 and STS are connected to area PF in the inferior parietal lobule, where there are also neurons that respond both to the execution and perception of actions. These neurons are now known as “PF mirror neurons” (Rizzolatti et al., 2001). Other areas, such as amygdala and orbito-frontal cortex, may also be part of the mirror system.
A similar system has been inferred in humans, based on evidence from electroencephalography (Muthukumaraswamy, Johnson, & McNair, 2004), magnetoencephalography (Hari et al., 1998), transcranial magnetic stimulation (Fadiga et al., 1995), and functional magnetic resonance imaging (fMRI) (Iacoboni et al., 1999). Unlike the mirror system in monkeys, the human mirror system appears to be activated by movements that need not be directed toward an object (Rizzolatti et al., 2001), although there is evidence that it is activated more by actions that are object-directed than by those that are not object-directed (Muthukumaraswamy et al., 2004). Activation by non-object-directed action may reflect adaptation of the system for more abstract signaling—as in signed languages. The mirror system in humans appears to involve areas in the frontal, temporal and parietal lobes that are homologous to those in the monkey, although there is some evidence that they tend to be lateralized to the left hemisphere in humans, especially in the frontal lobes (Iacoboni et al., 1999; Nishitani & Hari, 1998). It is well established that manual apraxia, especially for actions involving fine motor control, is associated with left-hemisphere damage (Heilman et al., 2000). It is possible that the incorporation of vocalization into the mirror system, perhaps unique to Homo sapiens, resulted in lateralization of the manual as well as of the vocal system (Corballis, 2003).
The mirror system leads to what has been termed the “direct-matching hypothesis,” according to which we understand actions by mapping the visual representations of observed actions onto the motor representations of the same actions (Rizzolatti et al., 2001). This system is tuned to the perception of actions that have a “personal” reference. Evidence from an fMRI study shows, for example, that it is activated when people watch mouth actions, such as biting, lip-smacking, and oral movements involved in vocalization (e.g., speech reading, barking), performed by people, but not when they watch such actions performed by a monkey or a dog. Actions belonging to the observer's own motor repertoire are mapped onto the observer's motor system, while those that do not belong are not—instead, they are perceived in terms of their visual properties (Buccino et al., 2004). Watching speech movements, and even stills of a mouth making a speech sound, activates the mirror system, including Broca's area (Calvert & Campbell, 2003). This is consistent with the idea that language may have evolved from visual display that included movements of the face.
Although most of the evidence on the mirror system has to do with visual input, area F5 of the monkey also contains what might be termed “acoustic mirror neurons.” These respond to the sounds of actions, such as tearing paper or breaking a peanut, as well as to the performance of those actions. That is, even in the monkey, the direct-matching hypothesis is not restricted to visual input (Kohler et al., 2002). There is no evidence for mirror neurons in the monkey that fire to both the production and perception of vocalization. It is likely, though, that vocal production was incorporated into the mirror system in humans, and probably only in humans or our hominid forebears (Ploog, 2002), providing the mechanism for the motor theory of speech perception.
The next question is when vocalization was added to the system. A clue to this comes from the FOXP2 gene.
The FOXP2 gene
About half of the members of three generations of an extended family in England, known as the KE family, are affected by a disorder of speech and language. The disorder is evident from the affected child's first attempts to speak and persists into adulthood (Vargha-Khadem et al., 1995). The disorder is now known to be due to a point mutation in the FOXP2 gene (forkhead box P2) on chromosome 7 (Fisher et al., 1998; Lai et al., 2001). For normal speech to be acquired, two functional copies of this gene seem to be necessary.
The nature of the deficit in the affected members of the KE family, and therefore the role of the FOXP2 gene, have been debated. Some have argued that the FOXP2 gene is involved in the development of morphosyntax (Gopnik, 1990), and it has even been identified more broadly as the “grammar gene” (Pinker, 1994)—although Pinker (2003) has since recognized that other genes probably played a role in the evolution of grammar. Subsequent investigation suggests, however, that the core deficit in affected members of the KE family is one of articulation, with grammatical impairment a secondary outcome (Watkins, Dronkers, & Vargha-Khadem, 2002). The FOXP2 gene may therefore play a role in the incorporation of vocal articulation into the mirror system.
This is supported by a study in which fMRI was used to record brain activity in both affected and unaffected members of the KE family while they covertly generated verbs in response to nouns (Liégeois et al., 2003). Whereas unaffected members showed the expected activity concentrated in Broca's area in the left hemisphere, affected members showed relative underactivation in both Broca's area and its right-hemisphere homologue, as well as in other cortical language areas. They also showed overactivation bilaterally in regions not associated with language. However, there was bilateral activation in the posterior superior temporal gyrus; the left side of this area overlaps Wernicke's area, important in the comprehension of language. This suggests that affected members may have generated words in terms of their sounds, rather than in terms of articulatory patterns. Their deficits were not attributable to any difficulty with verb generation itself, since affected and unaffected members did not differ in their ability to generate verbs overtly, and the patterns of brain activity were similar to those recorded during covert verb generation. Another study based on structural MRI showed morphological abnormalities in the same areas (Watkins, Vargha-Khadem, et al., 2002).
The FOXP2 gene is highly conserved in mammals, and in humans differs in only three places from that in the mouse. Nevertheless, two of the three changes occurred on the human lineage after the split from the common ancestor with the chimpanzee and bonobo. A recent estimate of the date of the more recent of these mutations suggests that it occurred “since the onset of human population growth, some 10,000 to 100,000 years ago” (Enard et al., 2002, p. 871). If this is so, then it might be argued that the final incorporation of vocalization into the mirror system was critical to the emergence of modern human behavior, often dated to the Upper Paleolithic (Corballis, 2004).
It is unlikely, though, that the FOXP2 mutation was the only event in the transition to speech, which undoubtedly went through several steps and involved other genes (Marcus & Fisher, 2003). Moreover, the FOXP2 gene is expressed in the embryonic development of structures other than the brain, including the gut, heart, and lung (Shu et al., 2001). It may have even played a role in the modification of breath control for speech (MacLarnon & Hewitt, 1999). A mutation of the FOXP2 gene may nevertheless have been the most recent event in the incorporation of vocalization into the mirror system, and thus the refinement of vocal control to the point that it could carry the primary burden of language.
The idea that the critical mutation of the FOXP2 gene occurred less than 100,000 years ago is indirectly supported by recent evidence from African click languages. Two of the many groups that make extensive use of click sounds are the Hadzabe and San, who are separated geographically by some 2000 kilometers, and genetic evidence suggests that the most recent common ancestor of these groups goes back to the root of present-day mitochondrial DNA lineages, perhaps as early as 100,000 years ago (Knight et al., 2003). This could mean that clicks were a prevocal way of adding sound to facial gestures, prior to the FOXP2 mutation. Evidence from mitochondrial DNA suggests that modern humans outside of Africa date from groups who migrated from Africa and these groups may have already developed autonomous speech, leaving behind African speakers who retained click sounds. Although an initial estimate dated this migration at around 52,000 years ago (Ingman et al., 2000), a more recent estimate places it at about 83,000 years ago (Oppenheimer, 2003), which is consistent with evidence that Homo sapiens had reached Australia by around 60,000 years ago (Thorne et al., 1999). The only known non-African click language is Damin, an extinct Australian aboriginal language. This is not to say that the early Australians and Africans did not have full vocal control of speech; rather, click languages may be simply a vestige of earlier languages in which vocalization was not yet part of the mirror system giving rise to autonomous speech.
Why speech?
According to the account presented here, the transition from manual to vocal language was not abrupt. This raises the question, though, of why the transition took place at all. The signed languages of the deaf clearly show that manual languages can be as sophisticated as vocal ones. Further, the transition to speech involved the lowering of the larynx, which greatly increased the risk of choking to death. Clearly, the evolutionary pressure toward speech must have been strong. But why?
There are a number of possible answers. First, a switch to autonomous vocalization would have freed the hands from necessary involvement in communication, allowing increased use of the hands for manufacture and tool use. Vocal language also allows people to speak and use tools at the same time, leading perhaps to pedagogy (Corballis, 2002). Indeed, it may explain the so-called “human revolution” (Mellars & Stringer, 1989), manifest in the dramatic appearance of more sophisticated tools, bodily ornamentation, art, and perhaps music, dating from some 40,000 years ago in Europe, and probably earlier in Africa (McBrearty & Brooks, 2000; Oppenheimer, 2003). This may well have come about because of the switch to autonomously vocal language, made possible by the FOXP2 mutation (Corballis, 2004).
Although manual and vocal language can be considered linguistically equivalent, there are other advantages to vocalization. Speech is less attentionally demanding than signed language; one can attend to speech with one's eyes shut, or when watching something else. Speech also allows communication over longer distances, as well as communication at night or when the speaker is not visible to the listener. The San, a modern hunter-gatherer society, are known to talk late at night, sometimes all through the night, to resolve conflict and share knowledge (Konner, 1982). Boutla et al. (2004) have shown that the span of short-term memory is shorter for American Sign Language than for speech, suggesting that voicing may have permitted longer and more complex sentences to be transmitted—although the authors claim that the shorter memory span has no impact on the linguistic skill of signers.
A possible scenario for the switch is that there was selective pressure for the face to become more extensively involved in gestural communication as the hands were increasingly engaged in other activities. Our hominid forebears had been habitually bipedal from some 6 or 7 million years ago, and from some 2 million years ago were developing tools, which would have increasingly involved the hands. The face had long played a role in visual communication, and plays an important role in present-day signed languages (e.g., Neidle et al., 2000). Consequently, there may have been pressure for intentional communication to move to the face, including the mouth and tongue. Gesturing may then have retreated into the mouth, so there may have been pressure to add voicing in order to render movements of the tongue more accessible—through sound rather than sight. In this scenario, speech is simply gesture half swallowed, with voicing added. Even so, lip-reading can be a moderately effective way to recover the speech gestures, and as mentioned earlier the McGurk effect illustrates that speech is in part a visual medium. Adding voicing to the signal could have had the extra benefit of allowing a distinction between voiced and unvoiced phonemes, increasing the range of speech elements.
Changes in the mode of communication can have a dramatic influence on human culture, as illustrated by the invention of writing, and more recently by email and the Internet. These changes were relatively sudden, and cultural rather than biological. The change from manual to vocal communication, in contrast, would have been slow, driven by natural selection and involving biological adaptations, but it may have had no less an impact on human culture—and therefore, perhaps, on human fitness.
Conclusions
Fully grammatical language appears to be a uniquely human accomplishment. Other animals are capable of understanding symbolic representations, and perhaps even of segmenting speech, at least to the point of isolating words. Besides the bonobo (Savage-Rumbaugh, Shanker, & Taylor, 1998), this may include the African gray parrot (Pepperberg, 2002) and the domestic dog (Kaminsky, Call, & Fischer, 2004). But there is no evidence that nonhuman animals can decode or generate grammar, and so create and understand a potentially infinite variety of sentences. At best, they are at the level of the two-year-old human child, with a level of communication lacking the generative, recursive property of fully developed language. They have protolanguage.
The emergence of language from protolanguage may have occurred late in hominid evolution, though not so late as to represent an evolutionary “big bang.” The steps toward grammar may have begun some 2 million years ago, with the emergence of larger brained hominids, and continued over the next 1.5 million years, or thereabouts. The final step may have been full recursion, depending perhaps on secondary altriciality and the slowing of post-natal brain growth. This process may not have been complete even in the Neanderthals, who survived until some 30,000 years ago, and may not have been fully developed in the line leading to Homo sapiens until the emergence of that species around 170,000 years ago.
There is also evidence that fully articulate speech evolved late, and may not have been complete until less than 100,000 years ago, with the mutation of the FOXP2 gene allowing vocalization to be incorporated into the mirror system. Evidence from mtDNA suggests that modern humans migrated out of Africa some 83,000 years ago (Oppenheimer, 2003), eventually replacing all other hominids, including the Neanderthals in Europe, H. erectus in Asia, and even groups of H. sapiens who had migrated earlier. What was it that led to the dominance of these late migrants? I have suggested that it may have been the consequences of the emergence of fully articulate speech, resulting in improved technology, perhaps including more lethal weaponry, and a more coherent culture (Corballis, 2004). An alternative, perhaps, is that fully recursive language itself did not evolve until within the past 100,000 years. Rather than talking our forebears out of existence, we may have lost them in a recursive loop. Or maybe the invading hordes out of Africa simply brought diseases that the indigenous populations were not resistant to.
One might have thought that an understanding of how language evolved would have been beyond the reach of science. That, presumably, was the view in 1866, when the Linguistic Society of Paris banned all discussion of the topic. Nevertheless the past decade, in particular, has produced an extraordinarily rich accumulation of evidence from multiple sources, all of which appear to be converging on common themes, if not yet on an agreed scenario. In 1866, very little was known about the transitions from ape to human, but modern archaeology has given us a remarkably detailed account of what our hominid forebears must have been like. From sceptical talk of a “missing link” we now have evidence of over 20 hominid species separating us from our common ancestry with the chimpanzee and bonobo (Wood, 2002). Detailed inspection of hominid fossils has provided evidence of brain size, and growth characteristics, and modern biochemistry has elucidated the timing of critical events, such as the ape-hominid split, and the late migration out of Africa. There are techniques for dating genetic mutations, and this chapter has identified three mutations that may be of significance to the understanding of the evolution of language and speech—two dating from just over 2 million years ago and one from something under 100,000 years ago.
We also now understand much better what language is actually like, how it differs from other forms of communication, and how it develops. It has only recently become clear that the signed languages of the deaf are true grammatical languages, and not impoverished signaling systems. With the advance of brain imaging, the neurophysiology of language is increasingly understood, and work on the so-called mirror system has led to important insights as to how language might be better understood as part of a more general system for understanding biological motion, instead of a rather abstract coding system beyond any affinity with our animal heritage.
The scenario sketched in this chapter may be wrong, but we can be sure that evidence will continue to accumulate. The trick will be to integrate the diverse sources, and so gain a better appreciation of how we became such compulsive chatterboxes. This chapter, I hope, has been a start.
References
Ambrose, S. H. (2001). Paleolithic technology and human evolution. Science, 291, 1748-1752.
Arcadi, A. C. (2000). Vocal responsiveness in male wild chimpanzees: Implications for the evolution of language. Journal of Human Evolution, 39, 205-223.
Armstrong, D. F. (1999). Original signs: Gesture, sign, and the source of language. Washington, DC: Gallaudet University Press.
Armstrong, D. F., Stokoe, W. C., & Wilcox, S. E. (1995). Gesture and the nature of language. Cambridge: Cambridge University Press.
Bickerton, D. (1981). Roots of language. Ann Arbor, MI: Karoma.
Bickerton, D. (1995). Language and human behavior. Seattle, WA: University of Washington Press.
Bickerton, D. (2003). Symbol and structure: A comprehensive framework for language evolution. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 77-93). Oxford: Oxford University Press.
Boutla, M., Supalla, T., Newport, E. L., & Bavelier, D. (2004). Short-term memory span: Insights from sign language. Nature Neuroscience, 7, 997-1002.
Bramble, D. M., & Lieberman, D. E. (2004). Endurance running and the evolution of Homo. Nature, 432, 345-352.
Browman, C. P., & Goldstein, L. F. (1995). Dynamics and articulatory phonology. In T. van Gelder & R. F. Port (Eds.), Mind as motion (pp. 175-193). Cambridge, MA: MIT Press.
Brunet, M., Guy, F., Pilbeam, D., Mackaye, H. T., Likius, A., Ahounta, D., Beauvilain, A., et al. (2002). A new hominid from the Upper Miocene of Chad, Central Africa. Nature, 418, 145-151.
Buccino, G., Lui, F., Canessa, N., Patteri, I., Lagravinese, G., Benuzzi, F., et al. (2004). Neural circuits involved in the recognition of actions performed by nonconspecifics: An fMRI study. Journal of Cognitive Neuroscience, 16, 114-126.
Calvert, G. A., & Campbell, R. (2003). Reading speech from still and moving faces: The neural substrates of visible speech. Journal of Cognitive Neuroscience, 15, 57-70.
Calvin, W. H., & Bickerton, D. (2000). Lingua ex machina: Reconciling Darwin with the human brain. Cambridge, MA: MIT Press.
Cheney, D. L., & Seyfarth, R. M. (1990). How monkeys see the world. Chicago: University of Chicago Press.
Chomsky, N. (1966). Cartesian linguistics: A chapter in the history of rationalist thought. New York: Harper & Row.
Chomsky, N. (1975). Reflections on language. New York: Pantheon.
Chomsky, N. (2000). Language as a natural object. In N. Chomsky, New horizons in the study of language and mind (pp. 106-133). Cambridge, UK: Cambridge University Press.
Chou, H.-H., Hayakawa, T., Diaz, S., Krings, M., Indriati, E., Leakey, M., et al. (2002). Inactivation of CMP-N-acetylneuraminic acid hydroxylase occurred prior to brain expansion during human evolution. Proceedings of the National Academy of Sciences, USA, 99, 11736-11741.
Christiansen, M. H., & Kirby, S. (2003). Language evolution: The hardest problem in science? In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 1-15). Oxford: Oxford University Press.
Condillac, E.B. de. (1971). An essay on the origin of human knowledge. T. Nugent (Tr.), Gainesville, FL: Scholars Facsimiles and Reprints. (Originally published 1746).
Coqueugniot, H., Hublin, J.-J., Veillon, F., Houet, F., & Jacob, T. (2004). Early brain growth in Homo erectus and implications for cognitive ability. Nature, 431, 299-302.
Corballis, M. C. (1991). The lopsided ape. New York: Oxford University Press.
Corballis, M. C. (2002). From hand to mouth: the origins of language. Princeton, NJ: Princeton University Press.
Corballis, M. C. (2003). From mouth to hand: Gesture, speech, and the evolution of right-handedness. Behavioral and Brain Sciences, 26, 199-260.
Corballis, M. C. (2004). The origins of modernity: Was autonomous speech the critical factor? Psychological Review, 111, 543-552.
Crow, T. J. (2002). Sexual selection, timing, and an X-Y homologous gene: Did Homo sapiens speciate on the Y chromosome? In T. J. Crow (Ed.), The speciation of modern Homo sapiens (pp. 197-216). Oxford, UK: Oxford University Press.
Currie, P. (2004). Muscling in on hominid evolution. Nature, 428, 373-374.
Deacon, T. (1997). The symbolic species: The coevolution of language and the brain. New York: Norton.
Deacon, T. W. (2003). Universal grammar and semiotic constraints. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 111-139). Oxford: Oxford University Press.
Dean, C., Leakey, M. G., Reid, D., Schrenk, F., Schwartz, G. T., Stringer, C., et al. (2001). Growth processes in teeth distinguish modern humans from Homo erectus and earlier hominins. Nature, 414, 628-631.
DeGusta, D., Gilbert, W. H., & Turner, S. P. (1999). Hypoglossal canal size and hominid speech. Proceedings of the National Academy of Sciences, 96, 1800-1804.
Dick, F., Bates, E., Wulfeck, B., Utman, J. A., Dronkers, N. F., & Gernsbacher, M. A. (2001). Language deficits, localization, and grammar: evidence for a distributed model of language breakdown in aphasic patients and neurologically intact individuals. Psychological Review, 108, 759-788.
Dronkers, N. F. (1996). A new brain region for coordinating speech articulation. Nature, 384, 159-161.
Elman, J. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48, 71-99.
Emmorey, K. (2002). Language, cognition, and brain: Insights from sign language research. Hillsdale, NJ: Erlbaum.
Enard, W., Przeworski, M., Fisher, S. E., Lai, C. S. L., Wiebe, V., Kitano, T., et al. (2002). Molecular evolution of FOXP2, a gene involved in speech and language. Nature, 418, 869-871.
Fadiga, L., Fogassi, L., Pavesi, G., & Rizzolatti, G. (1995). Motor facilitation during action observation—a magnetic stimulation study. Journal of Neurophysiology, 73, 2608-2611.
Fisher, S. E., Vargha-Khadem, F., Watkins, K. E., Monaco, A. P., & Pembrey, M. E. (1998). Localisation of a gene implicated in a severe speech and language disorder. Nature Genetics, 18, 168-170.
Foley, R. (1987). Another unique species: patterns in human evolutionary ecology. Harlow: Longman Scientific and Technical.
Galik, K., Senut, B., Pickford, M., Gommery, D., Treil, J., Kuperavage, A. J., & Eckhardt, R. B. (2004). External and internal morphology of the BAR 1002'00 Orrorin tugenensis femur. Science, 305, 1450-1453.
Gallese, V., Fadiga, L., Fogassi, L., & Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain, 119, 593-609.
Gardner, R. A., & Gardner, B. T. (1969). Teaching sign language to a chimpanzee. Science, 165, 664-672.
Gibson, K. R., & Jessee, S. (1999). Language evolution and expansions of multiple neurological processing areas. In B. J. King (Ed.), The origins of language: What nonhuman primates can tell us. Santa Fe, NM: School of American Research Press.
Givón, T. (1979). On understanding grammar. New York: Academic Press.
Givón, T. (1995). Functionalism and grammar. Philadelphia, PA: John Benjamins.
Goldin-Meadow, S., & McNeill, D. (1999). The role of gesture and mimetic representation in making language the province of speech. In M. C. Corballis & S. E. G. Lea (Eds.), The descent of mind (pp. 155-172). Oxford, UK: Oxford University Press.
Gopnik, M. (1990). Feature-blind grammar and dysphasia. Nature, 344, 715.
Gray, R. D., & Jordan, F. M. (2000). Language trees support the express-train sequence of Austronesian expansion. Nature, 405, 1052-1055.
Hari, R., Forss, N., Avikainen, S., Kirveskari, E., Salenius, S., & Rizzolatti, G. (1998). Activation of human primary motor cortex during action observation: A neuromagnetic study. Proceedings of the National Academy of Sciences, USA, 95, 15061-15065.
Hauser, M. D., Fitch, W. T., & Chomsky, N. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298, 1569-1579.
Hayes, C. (1952). The ape in our house. London: Gollancz.
Heilman, K. M., Meador, K. J., & Loring, D. W. (2000). Hemispheric asymmetries of limb-kinetic apraxia - A loss of deftness. Neurology, 55, 523-526.
Hewes, G.W. (1973). Primate communication and the gestural origins of language. Current Anthropology, 14, 5-24.
Hurford, J. R. (2003). The language mosaic and its evolution. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 38-57). Oxford: Oxford University Press.
Iacoboni, M., Woods, R. P., Brass, M., Bekkering, H., Mazziotta, J. C., & Rizzolatti, G. (1999). Cortical mechanisms of human imitation. Science, 286, 2526-2528.
Ingman, M., Kaessmann, H., Pääbo, S., & Gyllensten, U. (2000). Mitochondrial genome variation and the origin of modern humans. Nature, 408, 708-713.
Jackendoff, R. (2002). Foundations of language: brain, meaning, grammar, evolution. Oxford, UK: Oxford University Press.
Joos, M. (1948). Acoustic phonetics. Language Monograph No. 23. Baltimore, MD: Linguistic Society of America.
Kaminski, J., Call, J., & Fischer, J. (2004). Word learning in a domestic dog: Evidence for “fast mapping.” Science, 304, 1682-1683.
Kay, R. F., Cartmill, M., & Barlow, M. (1998). The hypoglossal canal and the origin of human vocal behavior. Proceedings of the National Academy of Sciences (USA), 95, 5417-5419.
Knight, C., Studdert-Kennedy, M., & Hurford, J. R. (Eds.) (2000). The evolutionary emergence of language: Social function and the origins of linguistic form. Cambridge: Cambridge University Press.
Kohler, E., Keysers, C., Umilta, M.A., Fogassi, L., Gallese, V., & Rizzolatti, G. (2002). Hearing sounds, understanding actions: Action representation in mirror neurons. Science, 297, 846-848.
Kohler, W. (1925). The mentality of apes. New York: Routledge & Kegan Paul.
Komarova, N. L. & Nowak, M. A. (2001). Natural selection of the critical period for language acquisition. Proceedings of the Royal Society of London Series B: Biological Sciences, 268, 1189-1196.
Konner, M. (1982). The tangled wing: biological constraints on the human spirit. New York: Harper.
Knight, A., Underhill, P.A., Mortensen, H.M., Zhivotovsky, L.A., Lin, A.A., Henn, B.M., et al. (2003). African Y chromosome and mtDNA divergence provides insight into the history of click languages. Current Biology, 13, 464-473.
Krogman, W. M. (1972). Child growth. Ann Arbor, MI: The University of Michigan Press.
Lai, C. S., Fisher, S. E., Hurst, J. A., Vargha-Khadem, F., & Monaco, A. P. (2001). A novel forkhead-domain gene is mutated in a severe speech and language disorder. Nature, 413, 519-523.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431-461.
Lieberman, D. E. (1998). Sphenoid shortening and the evolution of modern cranial shape. Nature, 393, 158-162.
Lieberman, D. E., McBratney, B. M., & Krovitz, G. (2002). The evolution and development of cranial form in Homo sapiens. Proceedings of the National Academy of Sciences, 99, 1134-1139.
Lieberman, P. (1998). Eve spoke: Human language and human evolution. New York: W.W. Norton.
Lieberman, P. (2002). On the nature and evolution of the neural bases of human language. Yearbook of Physical Anthropology, 45, 36-62.
Lieberman, P., Crelin, E. S., & Klatt, D. H. (1972). Phonetic ability and related anatomy of the new-born, adult human, Neanderthal man, and the chimpanzee. American Anthropologist, 74, 287-307.
Liégeois, F., Baldeweg, T., Connelly, A., Gadian, D. G., Mishkin, M., & Vargha-Khadem, F. (2003). Language fMRI abnormalities associated with FOXP2 gene mutation. Nature Neuroscience, 6, 1230-1237.
Lock, A. (1999). On the recent origin of symbolically-mediated language and its implications for psychological science. In M. C. Corballis & S. E. G. Lea (Eds.), The descent of mind (pp. 324-355). Oxford: Oxford University Press.
MacLarnon, A. & Hewitt, G. (1999). The evolution of human speech: The role of enhanced breathing control. American Journal of Physical Anthropology, 109, 341-363.
Marcus, G. F., & Fisher, S. E. (2003). FOXP2 in focus: What can genes tell us about speech and language? Trends in Cognitive Sciences, 7, 257-262.
McBrearty, S. & Brooks, A. S. (2000). The revolution that wasn't: A new interpretation of the origin of modern human behavior. Journal of Human Evolution, 39, 453-563.
Mellars, P. A. & Stringer, C. B. (Eds.) (1989). The human revolution: Behavioural and biological perspectives on the origins of modern humans. Edinburgh: Edinburgh University Press.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.
Muthukumaraswamy, S. D., Johnson, B. W., & McNair, N. A. (2004). Mu rhythm modulation during observation of an object-directed grasp. Cognitive Brain Research, 19, 195-201.
Neidle, C., Kegl, J., MacLaughlin, D., Bahan, B., & Lee, R. G. (2000). The syntax of American Sign Language. Cambridge, MA: The MIT Press.
Newmeyer, F. J. (2000). On the reconstruction of 'proto-world' word order. In C. Knight, M. Studdert-Kennedy, & J. R. Hurford (Eds.), The evolutionary emergence of language: Social function and the origins of linguistic form (pp. 372-390). Cambridge: Cambridge University Press.
Newmeyer, F. J. (2003). What can the field of linguistics tell us about the origins of language? In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 58-76). Oxford: Oxford University Press.
Nishitani, N., & Hari, R. (2000). Temporal dynamics of cortical representation for action. Proceedings of the National Academy of Sciences, USA, 97, 913-918.
Nowak, M. A. (2001). Evolution of universal grammar. Science, 291, 114-118.
Nowak, M. A., Plotkin, J. B., & Jansen, V. A. A. (2000). The evolution of syntactic communication. Nature, 404, 495-498.
Oppenheimer, S. (2003). Out of Eden: The peopling of the world. London: Constable.
Pepperberg, I. M. (2002). In search of King Solomon's ring: Cognitive and communicative studies of Gray parrots (Psittacus erithacus). Brain, Behavior, and Evolution, 59, 54-67.
Perrett, D. I., Harries, M. H., Bevan, R., Thomas, S., Benson, P.J., Mistlin, A.J., et al. (1989). Frameworks of analysis for the neural representation of animate objects and actions. Journal of Experimental Biology, 146, 87-113.
Pinker, S. (1994). The language instinct. New York: Morrow.
Pinker, S. (2003). Language as an adaptation to the cognitive niche. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 16-37). Oxford: Oxford University Press.
Pinker, S. & Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences, 13, 707-784.
Ploog, D. (2002). Is the neural basis of vocalisation different in non-human primates and Homo sapiens? In T. J. Crow (Ed.), The speciation of modern Homo sapiens (pp. 121-135). Oxford, UK: Oxford University Press.
Ramirez Rozzi, F. V. & Bermudez de Castro, J. M. (2004). Surprisingly rapid growth in Neanderthals. Nature, 428, 936-939.
Rizzolatti, G. & Arbib, M. A. (1998). Language within our grasp. Trends in Neurosciences, 21, 188-194.
Rizzolatti, G., Fadiga, L., Fogassi, L., & Gallese, V. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131-141.
Rizzolatti, G., Fogassi, L., & Gallese, V. (2001). Neurophysiological mechanisms underlying the understanding and imitation of action. Nature Reviews Neuroscience, 2, 661-670.
Savage-Rumbaugh, S., Shanker, S.G., & Taylor, T. J. (1998). Apes, language, and the human mind. New York: Oxford University Press.
Semaw, S., Renne, P., Harris, J. W. K., Feibel, C. S., Bernor, R. L., Fesseha, N., et al. (1997). 2.5-million-year-old stone tools from Gona, Ethiopia. Nature, 385, 333-336.
Senghas, A., Kita, S., & Özyürek, A. (2004). Children creating core properties of language: Evidence from an emerging sign language in Nicaragua. Science, 305, 1779-1782.
Shu, W.G., Yang, H.H., Zhang, L.L., Lu, M.M., & Morrisey, E.E. (2001). Characterization of a new subfamily of winged-helix/forkhead (Fox) genes that are expressed in the lung and act as transcriptional repressors. Journal of Biological Chemistry, 276, 27488-27497.
Sibley, C. G., & Ahlquist, J. E. (1984). The phylogeny of hominoid primates, as indicated by DNA-DNA hybridisation. Journal of Molecular Evolution, 20, 2-15.
Stedman, H. H., Kozyak, B. W., Nelson, A., Thesier, D. M., Su, L. T., Low, D. W., et al. (2004). Myosin gene mutation correlates with anatomical changes in the human lineage. Nature, 428, 415-418.
Studdert-Kennedy, M. (1998). The particulate origins of language generativity: From syllable to gesture. In J. R. Hurford, M. Studdert-Kennedy, & C. Knight (Eds.), Approaches to the evolution of language (pp. 169-176). Cambridge, UK: Cambridge University Press.
Swisher, C. C., III, Curtis, G. H., Jacob, A. C., Getty, A. G., Suprojo, A., & Widiasmoro. (1994). Age of the earliest known hominids in Java, Indonesia. Science, 263, 1118-1121.
Thorne, A., Grün, R., Mortimer, G., Spooner, N. A., Simpson, J. J., McCulloch, M., et al. (1999). Australia's oldest human remains: Age of the Lake Mungo human skeleton. Journal of Human Evolution, 36, 591-612.
Tomasello, M. (1996). Do apes ape? In J. Galef & C. Heyes (Eds.), Social learning in animals: The roots of culture (pp. 319-346). New York: Academic Press.
Tomasello, M. (2003). On the different origins of symbols and grammar. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 94-110). Oxford: Oxford University Press.
Tooby, J., & DeVore, I. (1987). The reconstruction of hominid evolution through strategic modeling. In W. G. Kinzey (Ed.), The evolution of human behavior: Primate models. Albany, NY: SUNY Press.
Trudgill, P. (1992). Dialect typology and social structure. In E. J. Jahr (Ed.), Language contact (pp. 195-211). New York: Mouton de Gruyter.
Vargha-Khadem, F., Carr, L. J., Isaacs, E., Brett, E., Adams, C., & Mishkin, M. (1997). Onset of speech after left hemispherectomy in a nine-year-old boy. Brain, 120, 159-182.
Vargha-Khadem, F., Watkins, K.E., Alcock, K.J., Fletcher, P., & Passingham, R. (1995). Praxic and nonverbal cognitive deficits in a large family with a genetically transmitted speech and language disorder. Proceedings of the National Academy of Sciences, USA, 92, 930-933.
Vleck, E. (1970). Etude comparative onto-phylogénétique de l'enfant du Pech-de-L'Azé par rapport à d'autres enfants néanderthaliens. In D. Feremback (Ed.), L'enfant Pech-de-L'Azé (pp. 149-186). Paris: Masson.
Walter, R. C., Buffler, R. T., Bruggemann, J. H., Guillaume, M. M. M., Berhe, S. M., Negassi, B., et al. (2001). Early human occupation of the Red Sea coast of Eritrea during the last interglacial. Nature, 405, 65-69.
Watkins, K.E., Dronkers, N.F., & Vargha-Khadem, F. (2002). Behavioural analysis of an inherited speech and language disorder: Comparison with acquired aphasia. Brain, 125, 452-464.
Watkins, K.E., Vargha-Khadem, F., Ashburner, J., Passingham, R.E., Connelly, A., Friston, K.J., et al. (2002). MRI analysis of an inherited speech and language disorder: structural brain abnormalities. Brain, 125, 465-478.
Wood, B. (2002). Hominid revelations from Chad. Nature, 418, 134-135.
Wood, B., & Collard, M. (1999). The human genus. Science, 284, 65-71.
Pre-existing primate conceptual structure
Use of symbols in a non-situation-specific fashion
Use of an open, unlimited class of symbols | Concatenation of symbols
Development of a phonological combinatorial system to enlarge open, unlimited class of symbols (possibly syllables first, then phonemes) | Use of symbol position to convey basic semantic relations
(Protolanguage about here)
Hierarchical phrase structure
Symbols that explicitly encode abstract semantic relations | Grammatical categories
System of inflections to convey semantic relations | System of grammatical functions to convey semantic relations
(Modern language)
Figure 1. Hypothesized evolutionary steps in the evolution of language. Sequential steps are ordered top to bottom; parallel, independent steps are shown side by side. Steps unique to humans are shown in bold type. (After Jackendoff, 2002)