12/11/2007 03:38 PM
14. Grammatical Approaches to Syntactic Change : The Handbook of Historical Linguistics : Blackwell Reference Online
Page 1 of 10
http://www.blackwellreference.com/subscriber/uid=532/tocnode?id=g9781405127479_chunk_g978140512747916
14. Grammatical Approaches to Syntactic Change
DAVID LIGHTFOOT
Subject: Linguistics » Historical Linguistics
Key-Topics: grammar, syntax
DOI: 10.1111/b.9781405127479.2004.00016.x
People use “grammar” to refer to a wide range of objects, and I adopt a biological view: grammars are
mental entities which arise in the mind/brain of individual children. These mental grammars show properties
which are not determined by the experience that children have. Children are exposed to utterances made in
some context and this experience does not suffice to shape all aspects of their mature grammars.
Consequently language acquisition is data-driven only in part. Researchers have postulated genotypical
principles which are available independently of experience and which therefore do not have to be learned.
These principles determine similarities among grammars, recurrent properties which hold of all grammars.
Alongside the invariant principles, we also postulate grammatical parameters, which children set on the
basis of their linguistic experience and which account for grammar variation. So language acquisition
proceeds as children set the parameters defined by Universal Grammar (UG), that is, those genotypical
principles and parameters which are relevant for the emergence of language in an individual (Chomsky
1986). The parameters of UG are structural and abstract, as we shall see, and that accounts for the
“bumpiness” of language variation; even closely related languages generally differ from each other in several
ways and not just in terms of one or two superficial phenomena.
We adopt the schema of (1) where (1a) gives general biological terminology and (1b) gives the specific
linguistic terminology: children are genetically endowed with UG and they are exposed to some triggering
experience (PLD); as a result, a mature grammar emerges and becomes part of their phenotype:
(1)
a. Triggering experience (linguistic genotype → phenotype)
b. Primary Linguistic Data (Universal Grammar → grammar)
This perspective on language acquisition was revived in the 1950s. Researchers have focused on poverty-of-
stimulus problems, ways in which mature grammars have properties which cannot result entirely from
childhood experience. Work has also dealt with language variation, parsing, and acquisition, and now we
have fairly rich theories of individual grammars and the UG from which they arise.¹
Turning now to language change, we note that the speech of no two people is identical, so it follows
naturally that if one takes manuscripts from two eras, one will be able to identify differences and so point to
language “change.” In this sense languages are constantly changing in piecemeal, gradual, chaotic, and
relatively minor fashion. However, historians also know that languages sometimes change in a bumpy
fashion, several things changing at the same time, and then settle into relative stasis, in a kind of
“punctuated equilibrium,” to borrow a term from evolutionary biology. From the perspective adopted here, it
is natural to try to interpret cascades of changes in terms of changes in grammars, a new setting for some
parameter, sometimes having a wide variety of surface effects and perhaps setting off a chain reaction. Such
“catastrophic” changes have distinctive features discussed in section 1.² So grammatical approaches to
language change have focused on these large-scale changes, assuming that the clusters of properties tell
us about the harmonies which follow from particular parameters. By examining the clusters of simultaneous
changes and by taking them to be related by properties of UG, we discover something about the scope and
nature of large-scale parameters and about how they are set. Work on language change from this
perspective is fused with work on language variation and acquisition.
1 Parameter Resetting
If we aim to gain insight into how parameters are set by considering the conditions under which parameters
came to be set differently in the history of some language, then we need to know what to look for in
identifying a new parameter setting, as opposed to diachronic shifts which involve no structural change. New
parameter settings have some distinctive characteristics, which are quite independent of any particular
grammatical model.
First, each new parameter setting is manifested by a cluster of simultaneous surface changes, and this is one
element of the catastrophic nature of parameter resetting. For example, the loss of the operation moving
verbs to a distinct inflection position in English (see section 2) entailed the predominance of forms like Kim
always reads the Bible instead of the earlier Kim reads always the Bible, and the obsolescence of inversion
and negative sentences like reads Kim the Bible? and Kim reads not the Bible. These apparently unrelated
changes took place in parallel, as demonstrated by the statistical studies of Kroch (1989a), which showed the
singularity of the change at the grammatical level (and led Kroch to postulate his Constant Rate Effect; see
Pintzuk, this volume).
Second, not only are new parameter settings typically manifested by clusters of changes, but they also often
set off chain reactions. A clear example from English is the establishment of verb-complement order at D-
structure. Lightfoot (1991) showed that this entailed indirectly the analysis of the infinitival to as a
transmitter of properties of its governing verb and the introduction of an operation analyzing speak to,
spoken to, etc. as complex verbs. Such chain reactions can be understood through the acquisition process: a
child with the new verb-complement setting is forced by the constraints of UG to analyze some expressions
differently from the way they were analyzed in earlier generations.
Third, changes involving new parameter settings tend to take place more rapidly than other changes, and
they manifest the S-curve of Kroch (1989a). For example, grammaticalization and morphological change,
involving the loss of gender markers (Jones 1988), the reduction in verbal desinences, or the loss of the
subjunctive mood generally take place over long periods, often several hundred years. In the interim,
individual writers and speech communities show variation in the forms they employ. This kind of gradual
cumulativeness is usually not a hallmark of new structural parameter settings. The old negative patterns
associated with the verb raising operation (Kim reads not the Bible) were robust and widely attested in the
texts until their demise, which was rapid (see section 2). The fast spread of new parameter settings is not
surprising if one thinks of it in the context of language acquisition. Once the linguistic environment has
shifted in such a way as to trigger a new parameter setting in some children, the very fact that some people
have a new parameter setting changes the linguistic environment yet further in the direction of setting the
parameter in the new fashion. That is, the first people with the new parameter setting produce different
linguistic forms, which in turn are part of the linguistic environment for younger people and so contribute to
the spread of the new setting.
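
The feedback loop just described can be illustrated with a toy calculation. The sketch below is not a model proposed in this chapter. It assumes, purely for illustration, that the share of speakers with the new setting grows in proportion both to the existing innovators and to the remaining holdouts, with a hypothetical `feedback` strength. Under that assumption the trajectory is S-shaped: slow at first, rapid in the middle, and flat at the end.

```python
def spread(initial_share, feedback, generations):
    """Toy logistic spread of a new parameter setting.

    initial_share: fraction of speakers with the new setting at the start
    feedback: hypothetical strength with which existing innovators
              shift the linguistic environment for the next cohort
    """
    shares = [initial_share]
    for _ in range(generations):
        p = shares[-1]
        # Growth depends on innovators (p), who change the input,
        # and on holdouts (1 - p), who can still be affected by it.
        shares.append(p + feedback * p * (1 - p))
    return shares


trajectory = spread(initial_share=0.02, feedback=0.8, generations=15)
for generation, share in enumerate(trajectory):
    print(f"generation {generation:2d}: {share:.3f}")
```

All numerical values here are invented; only the shape of the curve is of interest.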
Fourth, obsolescence manifests new parameter settings. When structures become obsolete, it is hard to see
how to attribute their obsolescence to the ebb and flow of non-grammatical changes in the linguistic
environment. A novel form may be introduced for expressive reasons, to focus attention on some part of the
utterance by virtue of the novelty of the form, but a form can hardly drop out of the language directly for
expressive reasons or because of the influence of another language. On the contrary, obsolescence must be
due to a structural domino effect, a by-product of something else which was itself triggered by the kind of
positive data generally available to children (for a recent application of this methodology, see Warner 1995:
542).
Fifth, any significant change in meaning is generally a by-product of a new parameter setting, for much the
same reason that the obsolescence of a structure must be the indirect consequence of a more abstract
change. Lightfoot (1991: ch. 6) discusses changes affecting the thematic roles associated with particular NP
positions with verbs like like, repent, ail (the direct object of these verbs could once be an experiencer,
while in modern English only the subject may be an experiencer; so people said things along the lines of
“apples like me” for the modern I like apples). These changes could not arise as idiosyncratic innovations
that somehow became fashionable within the speech-community. It is hard to see how the variation in
meaning could be attained by children on a non-systematic basis, and even harder to see how the variation
could have been introduced as a set of independent developments, imitating properties of another language
or serving some expressive function through their novelty. Rather, such changes must be attributed to some
aspect of a person's grammar which was triggered by the usual kind of environmental factors - for the
English psych-verbs, the existence of only structural Cases.
Sixth, new parameter settings occur in response to shifts in simple data, cues occurring in unembedded
domains only; they are not sensitive to changes or continuities in embedded domains. Embedded domains
are as likely as unembedded domains to reflect the usual toing and froing of the chaotic linguistic
environment, but they have no effect on parameter setting. This follows from degree-0 learnability, the claim
that grammars are learnable (that is, that parameters are set) on the basis of data from unembedded binding
domains (Lightfoot 1991).
2 V-to-I Raising and its Cue
Let us consider one case of a grammatical change, which is partially understood, using it as a case study to
show what further work is needed. It will show how the study of a change is intimately connected, under this
approach, with work on grammatical theory and on language acquisition. Operations which associate
inflectional features with the appropriate verb appear to be parameterized, and this has been the subject of
a vast amount of work covering many languages (see, for example, the collection of papers in Lightfoot and
Hornstein 1994). We can learn about the shape of the parameter(s) by considering how the relevant
grammars could be attained, and that in turn is illuminated by how some grammars have changed.
Assuming work by Emonds (1978) and Pollock (1989), I adopt the basic clause structure of (2):
(2)
Subjects occur in Spec-IP and wh-elements typically occur in Spec-CP. Heads raise from one head position
to another, so verbs may raise to I and then further to C. In fact, many grammars raise their verbs to the
position containing the inflectional elements ((3c) and (3d)), but English grammars, unusually, have an
operation which lowers I on to an adjacent verb ((3a) but not (3b)). We know this because English finite verbs
do not occur in some initial C-like position (4a) and cannot be separated from their complements by
intervening material (4b):
(3)
a. Jill VP[leave+past]
b. Jill I[leave_i+past] VP[e_i]
c. Jeanne I[lit_i] VP[toujours e_i les journaux]
d. lit_i IP[elle e_i VP[toujours e_i les journaux]]
(4)
a. *visited you Utrecht last week?
b. *the women visited not/all/frequently Utrecht last week
What is it that forces French children to have the V-to-I operation and what forces English children to lack
the operation and to lower their Is?
It is reasonable to construe the English lowering operation as a morphological phenomenon: in general,
lowering operations are unusual in the syntax, and a syntactic lowering operation here would leave behind a
trace which would not be bound or properly governed. Furthermore, one would expect a morphological
operation but not a syntactic operation to be subject to a condition of adjacency. Therefore the
representation in (3a), reflecting a morphological operation, contains no trace of the lowered I. In any case,
the English lowering needs to be taken as the default setting, as argued in Lightfoot (1993), Lasnik (1999),
and Roberts (1999); there is no non-negative evidence available to the child which would force her or him to
select an I-lowering analysis over a V-raising analysis (3b) for English, if both operations could be syntactic
and subject to an adjacency requirement: children would need to know that (4a) and (4b) do not occur
(negative data, therefore unavailable as input to children). In that case, let us take the morphological I-
lowering analysis as the default setting.
Now one can ask what triggers the availability of a syntactic V-to-I raising operation in grammars where it
may apply. Some generalizations have emerged over the last several years. One is that languages with rich
inflection may have V-to-I operations in their grammars, and rich inflection could be part of the trigger
(Rohrbacher 1994). However, the presence of V-to-I raising cannot be linked with rich inflection in a simple
one-to-one fashion. It may be the case that if a language has rich inflection, then V-to-I raising is available
(Lightfoot 1991; Roberts 1997). If there is no rich inflection, a grammar may have the raising operation
(Swedish - see Lightfoot 1997) or may lack it (English). Indeed, English verb morphology was simplified
radically and that simplification was complete by 1400; however, V-to-I movement disappeared only in the
seventeenth century, so there was a long period when English grammars had very little verbal inflection but
did have V-to-I movement. In that case, there needs to be a syntactic trigger for V-to-I movement. So, for
example, a finite verb occurring in C, that is to the left of the subject NP (as in a V2 language or in
interrogatives), could only get there by raising first to I, and therefore inversion forms like (3d) in French
could be syntactic triggers for V-to-I.³
Here we need to spell out an assumption about language acquisition: associated with each parameter
defined in UG is a cue, some kind of structure. Children scan their linguistic environment for these cues and
set the parameters accordingly. This view is, I believe, implicitly assumed in some work on acquisition
(notably work by Nina Hyams, e.g., Hyams 1986) but it needs to be spelled out more precisely. It differs
from other models (Chomsky 1965; Clark 1992; Clark and Roberts 1993; Gibson and Wexler 1994), which
take a child to converge on a grammar if it succeeds in generating the input data to which the child is
exposed. The idea that language acquisition is cue-based and does not proceed in this “input-matching”
fashion results to some extent from work on abrupt language change, where children arrive at grammars
which generate data quite different from grammars of an earlier generation (Lightfoot 1999b).
So triggers consist not of sets of sentences but rather of partially analyzed syntactic structures (Lightfoot
1991: ch. 1): Parameters are set by these partial structures, elements of I-language which act as what
Dresher and Kaye (1990) call cues. So a cue-based learner sets a Spec-head parameter (Spec
precedes/follows its head) on the basis of exposure to data which must be analyzed with a Spec preceding
its head, for example, [[Spec John's] [N hat]]. This parameter can only be set, of course, when the child has a
partial analysis which treats John's and hat as separate words, the latter a head noun, etc. Less trivially, a
cue-based learner acquires a V2 grammar not by evaluating grammars against sets of sentences but on
exposure to structures commencing with an XP followed immediately by a finite V, where there is no fixed
grammatical or thematic relation between the initial phrasal category and the finite verb, effectively where
the initial XP is a non-subject (Lightfoot 1999b). This requires analyzing the XP as in Spec-CP and so CP[XP]
is the cue for a V2 system; the cue must be represented robustly in the PLD. As noted, the cue-based
approach to parameter setting is implicitly assumed in some earlier work; also it corresponds to work on the
visual system (which develops as organisms are exposed to very specific visual structures; Hubel 1978;
Hubel and Wiesel 1962; Sperry 1968), it has been productive for phonologists concerned with the parameters
for stress systems (Dresher and Kaye 1990; Dresher 1999; Fikkert 1994, 1995), it has been invoked for
some syntactic problems by Fodor (1998), and it represents something quite different from the input-
matching approach of Gibson and Wexler, Clark, and others.
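
A cue-based learner of the kind described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not an implementation taken from the works cited: the `Utterance` record, the function names, and above all the numerical `threshold` are hypothetical. The learner does not test whole grammars against sentences; it only asks how often the input must be analyzed with a non-subject XP immediately before a finite verb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    initial_is_subject: bool   # is the clause-initial XP the subject?
    finite_verb_second: bool   # does a finite verb immediately follow that XP?


def expresses_v2_cue(utterance):
    # The cue CP[XP]: a non-subject XP followed immediately by a finite verb.
    return (not utterance.initial_is_subject) and utterance.finite_verb_second


def sets_v2(pld, threshold=0.2):
    """Return True if the cue is expressed in at least `threshold` of the input.

    The threshold is a hypothetical placeholder, not an empirical estimate.
    """
    share = sum(expresses_v2_cue(u) for u in pld) / len(pld)
    return share >= threshold


# Invented toy samples of primary linguistic data (PLD).
v2_like = [Utterance(True, True)] * 7 + [Utterance(False, True)] * 3
svo_like = [Utterance(True, True)] * 10

print(sets_v2(v2_like), sets_v2(svo_like))
```

Subject-initial clauses contribute nothing in either sample, since they are compatible with both settings; only the non-subject-initial clauses count as expressions of the cue.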
Returning to our case study, under a cue-based learning approach, one would say that the cue for the V-to-I
parameter is a finite verb in I, that is, I[V], an element of I-language. One unambiguous instance of I[V] is
an I containing the trace of a verb which has moved on to C, as in the structure of (3d).
Indeed, I would guess that this would be a very important expression of the cue, and I doubt that structures
like (4b) would be robust enough to trigger V-to-I in isolation; this can be tested (see below). Adopting
terminology from Clark (1992), one can ask how robustly the cue is “expressed”; it is expressed robustly if
there are many simple utterances which can be analyzed by the child only as I[V]. So, for example, the
sentences of (3c) and (3d) can only be analyzed by the French child if the V lit raises to I; a simple sentence
like Jeanne lit les journaux ‘Jeanne reads the newspapers,’ on the other hand, could be analyzed with lit
raised to I or with the I lowered into the VP in the English style, and therefore it does not express the cue for
the V-to-I parameter.
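
The distinction between utterances that express the cue and those that do not can be made concrete with the three French examples just discussed. The sketch below is only an illustration: the `Clause` record and its two boolean fields are hypothetical simplifications, and the three-clause sample says nothing about real frequencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    text: str
    verb_precedes_subject: bool           # finite verb in C, as in (3d)
    adverb_between_verb_and_object: bool  # as in (3c)


def expresses_cue(clause):
    """True if the clause can be analyzed only with the finite verb raised to I."""
    return clause.verb_precedes_subject or clause.adverb_between_verb_and_object


clauses = [
    Clause("Jeanne lit toujours les journaux", False, True),  # (3c)
    Clause("lit elle toujours les journaux", True, True),     # (3d)
    Clause("Jeanne lit les journaux", False, False),          # ambiguous
]

expressed = [c.text for c in clauses if expresses_cue(c)]
robustness = len(expressed) / len(clauses)
print(expressed, round(robustness, 2))
```

The simple subject-verb-object clause is excluded because it is compatible with either raising or lowering, which is exactly why it does not express the cue.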
In English the cue for the V-to-I operation, I[V], came to be expressed less in the PLD in the light of three
developments in early Modern English. First, the modal auxiliaries (can, could, may, might, shall, should,
will, would, must), while once instances of verbs that could raise to I, were recategorized such that they
came to be base-generated as instances of I; they were no longer verbs, and so sentences with a modal
auxiliary ceased to include I[V] and ceased to express the cue for V-to-I movement. The evidence for the
recategorization is the obsolescence of (5), which follows if the modal auxiliaries are generated in I and
therefore can occur only one per clause (5a), without an aspectual affix (5b), (5c), and mutually exclusively
with the infinitival marker to, which also occurs in I (5d):
(5)
a. John shall can do it
b. John has could do it
c. canning do it
d. I want to can do it
This change has been discussed extensively in Lightfoot (1979, 1991), Kroch (1989a), Roberts (1985,
1993a), and Warner (1983, 1993), and there is consensus that it was complete by the early sixteenth
century.
Second, as periphrastic do came to be used in negatives like John did not leave and interrogatives like did
John leave?, so there were still fewer instances of I[V]. Periphrastic do began to occur in significant numbers
at the beginning of the fifteenth century and steadily increased in frequency until it stabilized into its
modern usage by the mid-seventeenth century. Ellegård (1953) shows that the sharpest increase came in
the period 1475–1550.
Third, in early grammars with the much-discussed verb-second system all matrix clauses had a finite verb in
C. Therefore all matrix clauses expressed the cue for V-to-I, I[V] (on the assumption that V could move to C
only by moving first to I). As these grammars were lost and as finite verbs ceased to occur regularly in C, so
the expression of the cue for V-to-I raising was reduced.
By quantifying the degree to which a cue for a parameter is expressed, we can understand why English
grammars lost the V-to-I operation and why they lost it after the modal auxiliaries were reanalyzed as non-
verbs, as the periphrastic do became increasingly common, and as the V2 system was lost. We can
reconstruct a plausible history for the loss of V-to-I in English. What we are doing here is identifying when a
parameter came to be reset and how the available triggering experiences, specifically those expressing the
cue, seem to have shifted in critical ways prior to that parameter resetting. We know from acquisition studies
that children are sensitive to statistical shifts in input data. For example, Newport et al. (1977) showed that
the ability of English-speaking children to use auxiliaries appropriately results from exposure to non-
contracted, stressed forms in initial positions in yes-no questions: the greater the exposure to these
subject-auxiliary inversion forms, the earlier the use of auxiliaries in medial position. Also Richards (1990)
demonstrated a good deal of individual variation in the acquisition of English auxiliaries as a result of
exposure to slightly different trigger experiences. The issue is when trigger experiences differ critically, that
is, in such a way as to set some parameter differently.
Our conclusion in earlier work was that V-to-I movement was lost in the seventeenth century, much later
than suggested by Kroch (1989a), Roberts (1993a), and others. Warner (1997) now argues that the operation
may have been lost as late as the eighteenth century. He offers some statistics from Ellegård (1953) and
Tieken-Boon van Ostade (1987). Ellegård shows that interrogative inversion with a non-auxiliary in positive
clauses (i.e., came he to London? as opposed to did he come to London?) occurred 27 percent of the time
for 1625–50, 26 percent for 1650–1700. Tieken-Boon van Ostade shows a drop to 13 percent in the
eighteenth century. Negative declaratives with a non-auxiliary (he came not to London as opposed to he did
not come to London) occurred 68 percent of the time in 1625–50 and 54 percent in 1650–1700, dropping sharply to 20
percent in the eighteenth century. The drop is actually sharper than these figures suggest; Tieken-Boon van
Ostade's figures for the later period include a high proportion of recurrent items (know, doubt, etc.) which
Ellegård omitted. A particularly interesting feature of these figures is the discrepancy between the
interrogatives and the negatives, which lends some support to the hunch (above) that structures like those
underlying (3d) are a more effective expression of the cue I[V] than structures like those of (4b). In any case,
we see that structures like (4b) were robust and widely attested in the texts of the late seventeenth century
and then they disappeared rapidly - the kind of bumpiness that the notion of grammatical parameters leads
us to expect.
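
The size of the drop can be checked with simple arithmetic on the percentages quoted above. The snippet below only restates those figures; the variable names are illustrative, and the comparison is rough because the eighteenth-century figures come from a different study with a different selection of items.

```python
periods = ["1625-50", "1650-1700", "eighteenth century"]

# Percentages quoted in the text for forms without periphrastic do.
rates = {
    "interrogative inversion": [27, 26, 13],
    "negative declarative": [68, 54, 20],
}

drops = {}
for construction, values in rates.items():
    # Percentage-point fall between the last two periods.
    drops[construction] = values[1] - values[2]
    relative = drops[construction] / values[1]
    print(
        f"{construction}: {periods[1]} -> {periods[2]}, "
        f"down {drops[construction]} points ({relative:.0%} of the earlier rate)"
    )
```

On these figures the negatives fall by 34 percentage points and the interrogatives by 13, which is the discrepancy noted in the text.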
The historical facts, then, suggest that lack of rich subject-verb agreement cannot be a sufficient condition
for absence of V-to-I, but it may be a necessary condition. Under this view the possibility of V-to-I not
being triggered first arose in the history of English with the loss of rich verbal inflection; similarly in Danish
and Swedish. That possibility never arose in Dutch, French, or German, where verbal inflections remained
relatively rich. Despite this possibility, V-to-I continued to be triggered and it occurred in grammars well
after verbal inflection had been reduced to its present-day level. However, with the reanalysis of the modal
auxiliaries, the increasing frequency of periphrastic do and the loss of the V2 system, the expression of I[V]
in English became less and less robust in the PLD. That is, there was no longer anything very robust in the
PLD which had to be analyzed as I[V], that is, which required V-to-I, given that the morphological I-lowering
operation was always available. In particular, sentences like (4b) with post-verbal adverbs and quantifiers had
to be analyzed with the V in I, but these cues were not robust enough to set the parameter and they
disappeared quickly, a by-product of the loss of V-to-I.
This suggests that the expression of the cue dropped below some threshold, leading to the elimination of V-
to-I movement. The next task is to quantify this generally, but we should recognize that the gradual
reduction in the expression of I[V] is not crucial, but rather the point at which the phase-transition took
place, when the last straw was piled on to the camel's back. This can be demonstrated by building a
population model, tracking the distribution of the I[V] cues in the PLD, and identifying the point at which the
parameter was reset and V-to-I ceased to be triggered (differing, of course, from one individual or one
dialect area to another). This work remains to be done (see below), but one hopes to find correlations
between the changing distribution of the cue and the parametric shift.
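
The threshold idea can be sketched as follows. Since the quantitative work is said to remain to be done, everything numerical here is invented: the per-cohort shares of utterances expressing the cue and the `threshold` value are hypothetical, and the function name is illustrative. The point is only that a gradual decline in the cue yields a single, abrupt point of resetting.

```python
def first_reset(cue_shares, threshold):
    """Return the index of the first cohort whose input expresses the cue
    below the threshold, or None if the cue stays robust throughout."""
    for cohort, share in enumerate(cue_shares):
        if share < threshold:
            return cohort
    return None


# Hypothetical shares of utterances expressing the cue, one per cohort.
cue_shares = [0.40, 0.36, 0.31, 0.27, 0.22, 0.18]

print(first_reset(cue_shares, threshold=0.25))
```

With these invented values the shares fall steadily, yet the parameter is reset only for the cohort at index 4, the first whose input falls below the assumed threshold.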
3 Other Case Studies and Some Comparisons
This grammatical approach to diachrony explains changes at two levels. First, the set of parameters
postulated as part of UG explains the unity of the changes, why superficially unrelated properties cluster in
the way that they do. Second, the cues associated with the parameters permit an account of why the change
took place, why children at a certain point set a parameter differently: the distribution of those cues
changed in such a way that a threshold was crossed and the relevant parameter was set differently. That is
as far as this model goes, and it has nothing to say about why the distribution of the cues should change.
That may be explained by claims about language contact or socially defined speech fashions but it is not a
function of theories of grammar, acquisition, or change - except under one set of circumstances, where the
new distribution of cues results from an earlier parametric shift; in that circumstance one has a “chain” of
grammatical changes. One can, of course, embed these grammatical accounts in an appropriate model of
population change; see section 4.
Notice that this approach to change is independent of any particular grammatical model. Warner (1995)
offers a persuasive analysis of parametric shift using a lexicalist HPSG model, quite different from the one
assumed here. Interesting diachronic analyses have been offered for a wide range of phenomena, invoking
different grammatical claims: Fontana (1993), van Kemenade (1987), Pearce (1990), Roberts (1993a, 1993b,
1994, etc.), Sprouse and Vance (1999), Vance (1995), and many others.
Our general approach to abrupt change, where children acquire very different systems from those of their
parents, is echoed in work on creolization under the view of Bickerton (1984, 1999), and the acquisition of
signing systems by children exposed largely to unnatural input (Goldin-Meadow and Mylander 1990;
Newport 1999; Supalla 1990). For several years Bickerton has worked on plantation creoles, where new
languages appear to be formed in the space of a single generation. He argues, surely correctly, that
situations in which “the normal transmission of well-formed language data from one generation to the next
is most drastically disrupted” will tell us something about the innate component and how it determines
acquisition (Bickerton 1999); it certainly shows that children do not always proceed by converging on
grammars which match the input.
The work of Bickerton and his associates is limited by the sketchiness of the available data for the earliest
stages of creole languages, but the view that new languages emerge rapidly and fully formed despite very
impoverished input receives striking support from work on signed languages. The critical fact here is that
only about 10 percent of deaf children in the US are born to deaf parents who can provide early exposure to
a conventional sign language. This means that the vast majority of deaf children are exposed initially to
fragmentary signed systems which have not been internalized well by their primary models. This is often
some form of Manually Coded English (MCE), which maps English into a visual/gestural modality. Goldin-
Meadow and Mylander (1990) take these to be artificial systems, and they show how deaf children go
beyond their models in such circumstances and “naturalize” the system, altering the code and inventing new
forms which are more consistent with what one finds in natural languages. Supalla (1990) casts more light
on this, showing that MCE morphology fails to be attained well by children, who fail to use many of the
markers that they are exposed to and use other markers quite differently from their models. He focuses on
deaf children who are exposed only to MCE with no access to American Sign Language (ASL), and he found
that they restructure MCE morphology into a new system. Clearly this cannot be modeled by input-matching
learning devices, because the input is not matched. Furthermore, it is not enough to say that MCE
morphology simply violates UG constraints, because that would not account for the way in which children
devise new forms. More is needed from UG. The unlearnability of the MCE morphology suggests that
children are cue-based learners, programmed to scan for clitic-like, unstressed, highly assimilable
inflectional markers. That is what they find standardly in spoken languages and in natural signed languages
like ASL. If the input fails to provide such markers, then appropriate markers are invented; children seize
appropriate kinds of elements which can be interpreted as inflectional markers. The acquisition of signed
languages under these circumstances offers an opportunity to understand more about abrupt language
change, creolization, and cue-based learning (Lightfoot 1999b).
The characterization of abrupt grammatical change sketched in this chapter makes sense only if one views
grammars as individual mental entities, and not as some kind of social entity codifying the data attested in
the texts of some period. Failure to make this simple distinction has led to confusion in the literature,
discussed in Lightfoot (1995). There has been interesting work on the replacement of one grammar by
another, that is, the spread of change through a speech community. So, Kroch and his associates (Kroch
1989a; Kroch and Taylor 1997; Pintzuk 1990; Santorini 1992, 1993; Taylor 1990) have argued for coexisting
grammars. That work postulates that speakers may operate with more than one grammar in a kind of
“internalized diglossia,” and it enriches grammatical analyses by seeking to describe the variability of
individual texts and the spread of a grammatical change through a population (see Pintzuk, this volume).
However, the approach sketched here is not consistent with three other pervasive lines of thought. One is
the idea that all change is gradual and that abrupt, catastrophic change does not happen (Harris, this
volume; Harris and Campbell 1995; Hopper and Traugott 1993; Carden and Stewart 1988). This is
sometimes modeled in “lexicalist” theories of grammar, in which particular grammars differ from each other
not in terms of settings of abstract parameters but in terms of features on individual lexical items (see
Lightfoot 1991: ch. 6 for discussion). This approach to change implies that language acquisition is data-
driven, that children match their input, which may vary without limit. Where children appear not to match
their input, it is claimed that access to more complete data would reveal that abrupt transitions do not
happen. Of course, in dealing with historical texts, one is dealing with performance data which do not match
grammars perfectly, least of all single grammars. This means that grammarians must interpret the data and
each interpretation must find the most appropriate level of abstraction. For example, Fries (1940) offered
statistical data showing that Old English alternated between object-verb and verb-object order freely and
that “the order of … words … has no bearing whatever upon the grammatical relationships involved” (p.
199). He found that object-verb order occurred 53 percent of the time around the year 1000 and that it was
“gradually” replaced by verb-object order, reducing to 2 percent by the year 1500. However, his counts
ignored the distinction between matrix and embedded clauses and he had no analysis of verb-second
effects. If one makes such distinctions, one can show that Old English grammars most typically had object-
verb order underlyingly and an operation of verb movement raising finite verbs to C in matrix clauses to
yield verb-second order (van Kemenade 1987). Kroch and Taylor (1997) show that there was a dialect
difference involving movement of finite verbs to C, and consequently the grammatical change consisted in a
change in the head order parameter and the loss of “verb-second” grammars, each of which was
catastrophic (Lightfoot 1999b).
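The point about aggregation can be illustrated with a minimal sketch. The records below are invented toy data, not counts from Fries or from any Old English text, and the function names are hypothetical; the sketch shows only that a pooled object-verb rate can conceal a sharp split between matrix and embedded clauses.

```python
from collections import defaultdict

# Invented toy records for illustration only; NOT counts from any real text.
# Each record is (clause_type, surface_order).
TOY_CLAUSES = [
    ("matrix", "VO"), ("matrix", "VO"), ("matrix", "VO"), ("matrix", "OV"),
    ("embedded", "OV"), ("embedded", "OV"), ("embedded", "OV"), ("embedded", "VO"),
]


def ov_rate(clauses):
    """Share of clauses with object-verb order, ignoring clause type."""
    return sum(order == "OV" for _, order in clauses) / len(clauses)


def ov_rate_by_clause_type(clauses):
    """Share of object-verb order computed separately for each clause type."""
    groups = defaultdict(list)
    for clause_type, order in clauses:
        groups[clause_type].append((clause_type, order))
    return {clause_type: ov_rate(group) for clause_type, group in groups.items()}


pooled = ov_rate(TOY_CLAUSES)
by_type = ov_rate_by_clause_type(TOY_CLAUSES)
print(pooled, by_type)
```

With these invented records the pooled rate is 0.5, while the matrix and embedded rates are 0.25 and 0.75: a count that ignores clause type looks like free alternation even though each clause type has a clear preference.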
A second incompatible line of thought is that there exists a theory of change with some content (Harris, this
volume). If one has a theory of grammar and a theory of acquisition, it is quite unclear what a theory of
change is supposed to be a theory of. Presumably a “theory of grammaticalization” (Heine, Traugott, and
others, this volume) is a subpart of such a theory of change, insofar as it involves a claim that there is more
grammaticalization over time.
A third approach with which I would take issue is the tendency to incorporate historicist elements into UG.
Keyser and O'Neil (1985: 3) propose a condition that “whenever possible the language acquisition device
reduces the level of optionality, either by change of status or rule loss”; their evidence comes from changes
which they analyze as the loss of optional rules. Similarly, Bauer (1995) construes Latin as a thoroughgoing
left-branching (LB) language which changes into a thoroughgoing right-branching language (French). She
explains this on the grounds that LB languages (with non-agglutinating morphology) were hard to acquire:
“Latin must have been a difficult language to master, and one understands why this type of language
represents a temporary stage in linguistic development” (p. 188). So she explains her change not in terms of a
mysterious theory of history, but rather in terms of human biology: our brains work in such a way that
complex structures in LB languages without agglutinative morphology are hard to acquire. This, of course,
immediately raises the question of why early Latin would have been LB: “If left-branching structures are …
acquired with greater difficulty, it is indeed legitimate to wonder why languages, in an early period, exhibit
this kind of structure” (p. 216). She concludes that this “still remains to be explained” (p. 217); see Lightfoot
(1996a) for further discussion. In the same vein, Kiparsky (1997) appeals to “endogenous optimization” and
Roberts (1993b) builds a weighting into UG so that UG effectively encourages learners to “grammaticalize”
independently of what they experience through their PLD; this is said to promote Diachronic Reanalyses (see
Lightfoot 1997). Historical linguists often see general directions to change and they explain this either by
invoking laws of history (i.e., a “theory of change”; see Lightfoot 1979) or by attributing historical effects to
genetic predispositions. So Keyser and O'Neil (1985) build a clause into UG predisposing us against optional
rules. But for optional rules to be lost, they must first be introduced; if we are predisposed not to attain
optional rules, one wonders how they would be triggered in the first place. The identical point holds of the
inbuilt tendencies to branch to the right, to “optimize,” and to grammaticalize. Rather, one needs a more
contingent approach: two people attain different grammars only if exposed to PLD which differ in some
relevant way, and therefore parameter resetting is to be explained only by a prior change in the PLD.
Language acquisition takes place by an interaction of UG, the PLD, and nothing else.
4 Conclusion
For several years syntacticians and some phonologists have claimed that language acquisition proceeds as
children set the parameters prescribed by UG. However, there has been little discussion of the general
nature of parameters, their number, and how they are set by children. Indeed, some linguists have come to
equate parameters with superficial “differences” between languages, trivializing the notion. Parameters have
become more fine-grained, each one capturing smaller ranges of phenomena. So the “pro-drop parameter”
fragmented as linguists analyzed languages/dialects showing some but not all of the early diagnostic pro-
drop properties; and, most recently, it has disappeared as a distinct parameter altogether (Chomsky 1995:
ch. 4). Baker (1996) argues that this fragmentation results from research strategies focusing too narrowly on
closely related languages/dialects in so-called “micro-comparative” syntax. This runs the risk of allowing
parameters to proliferate and run out of control. We can counter the trend to fragment parameters and
equate them with mere surface differences between languages. We can do this by focusing on large-scale
shifts in language histories and seeking to determine what smaller shifts in the PLD, specifically in the cues,
took place just prior to those large-scale shifts. In this way we gain a better sense of the nature of some
central parameters and of what sets them. Our central concern is with the theory of grammars.
Work from this perspective yields a series of case studies as outlined in section 2. We aspire to offer all the
ingredients of an explanation of the grammatical change. Our work fuses research on language acquisition,
change, and variation. We aim to refine ideas about parameters by considering how they are triggered,
combining acquisitional and historical data with learnability concerns. This enables us to characterize the
“bumpiness” of language variation and change, and, in doing so, we employ no distinct “theory of change.”
In addition, Niyogi and Berwick (1995) have recently offered a population genetics computer model for
describing the spread of new grammars. It is generally agreed that certain changes progress in an S-curve,
but now Niyogi and Berwick provide a model of the emergent, global population behavior which derives the
S-curve. They postulate a learning theory and a population of child learners, a small number of whom fail to
converge on pre-existing grammars, and they produce a plausible model of population changes for the loss
of null subjects in French. The fact that changes can be shown to progress through populations in an S-curve
is not surprising to those who have written about chaotic systems and catastrophic changes (Lightfoot
1991: ch. 7), but the achievement of Niyogi and Berwick is to show that it is not impossibly difficult to compute
(or simulate) grammatical dynamical systems; they show explicitly how to transform parameterized theories
and memoryless learning algorithms into dynamical systems, producing results along the way.
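A toy version of such a dynamical system can be sketched as follows. This is not Niyogi and Berwick's model: it assumes, for illustration only, two grammars, a cue that only speakers of the older grammar produce, and a child who attains the older grammar only after hearing a threshold number of cue sentences. The parameter names and default values (n_sentences, threshold, cue_rate) are hypothetical.

```python
from math import comb


def prob_fewer_than_k(n, k, q):
    """P(X < k) for X ~ Binomial(n, q): the child hears too few cue sentences."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k))


def simulate(generations, n_sentences=20, threshold=3, cue_rate=0.2, x0=0.0):
    """Iterate the share x of new-grammar speakers across generations.

    Assumptions (illustrative, not taken from the source):
      - old-grammar speakers express the cue in a fraction cue_rate of sentences;
      - new-grammar speakers never express it;
      - each child samples n_sentences from the whole population and attains
        the old grammar only if at least `threshold` of them express the cue.
    """
    x = x0
    trajectory = [x]
    for _ in range(generations):
        # Chance that a randomly heard sentence expresses the cue.
        q = cue_rate * (1.0 - x)
        # Children who hear too few cues converge on the new grammar.
        x = prob_fewer_than_k(n_sentences, threshold, q)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    for generation, share in enumerate(simulate(15)):
        print(generation, round(share, 3))
```

Each generation's share of new-grammar speakers is computed from the previous one, so the learning rule for individuals becomes an iterated map over the population. How quickly the share rises, and whether it levels off short of 1, depends on the assumed values.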
As we develop productive models of historical change along these lines, relating changes in simple cues to
large-scale parametric shifts, our results have consequences for the way in which we think about parameters
and how they are set and, therefore, for the way in which we study language acquisition. Experimental work
on language learners cannot presently approach the distinction between cue-based learners and those
following Clark's genetic algorithms, or associations between cues and parameter settings. However, with the
development of various computerized corpora, Niyogi and Berwick's results, and an explicit cue-based
theory of acquisition, we have all the ingredients for success in the historical domain, as I have sketched it,
and we shall learn something about how acquisition takes place, whether the child is a degree-0, cue-based
learner or some other kind of learner.
This paper was revised into its present form in 1998.
1 For an introductory account emphasizing poverty-of-stimulus problems, see Lightfoot (1982). Chomsky (1986)
offers a more detailed account, including some technical material. Chomsky (1995) represents the most
advanced version of this research tradition at present.
2 Catastrophe theory, developed originally by the French mathematician René Thom, is an attempt to provide a
mathematical framework for modeling various kinds of discontinuous processes. For example, one can lower the
temperature of a body of water and a catastrophic change takes place at 32°F, when it turns to ice; the water does
not gradually become more ice-like, but the phase transition is sudden. For a good and balanced discussion of
work on catastrophes, see Casti (1994: ch. 2), who points out that the French catastrophe is not quite as
catastrophic as the English catastrophe (p. 53). For us the “catastrophes” are the bumpy discrepancies that one
finds from time to time between the input that a given child is exposed to and the output that that child's mature
grammar yields.
3 See also Faarlund (1990) and Vance (1995) for illuminating discussion bearing on these matters.
Cite this article
LIGHTFOOT, DAVID. "Grammatical Approaches to Syntactic Change." The Handbook of Historical Linguistics.
Joseph, Brian D. and Richard D. Janda (eds). Blackwell Publishing, 2004. Blackwell Reference Online. 11 December
2007 <http://www.blackwellreference.com/subscriber/tocnode?id=g9781405127479_chunk_g978140512747916>
Bibliographic Details
The Handbook of Historical Linguistics
Edited by: Brian D. Joseph And Richard D. Janda
eISBN: 9781405127479
Print publication date: 2004