Saturday, December 19, 2020

The TLA does not "store sets of sentences"

In a series of publications, David Lightfoot has made claims about the Triggering Learning Algorithm (TLA) for the acquisition of syntax. The core claim is that the TLA is an input-matching model which evaluates a grammar globally against a set of sentences heard by the child; this is then taken to show that the TLA is not a plausible or attractive model of acquisition. The purpose of this blog post is to show that this core claim is false.

What's the Triggering Learning Algorithm?

The TLA is an algorithm developed by Gibson & Wexler (1994) to characterize the acquisition of syntax by first-language learners. It assumes that the learning task involves setting a finite number of binary parameters: at any given time, the learner's grammar can be represented as just a vector of ones and zeroes. The flowchart below illustrates how it works, informally.

[Figure: Flowchart for a child hearing a sentence under the TLA]

The child receives an input sentence. This isn't just a raw string – rather, it's assumed that the child is able to extract certain syntactic properties unproblematically, so the input sentence, rather than "Mary likes motorbikes", would be something like "S(ubject) V(erb) O(bject)".

The child then checks whether the sentence is compatible with the grammar they have stored in memory. For instance (all else being equal), a sentence "S V O" would be compatible with a grammar where the Head Parameter is set to head-initial and the VP is therefore head-initial, but not with a grammar where it's set to head-final.

If the sentence is compatible with the stored grammar, then nothing more needs to be done: the child doesn't change their stored grammar, and will wait until another sentence comes along, at which point the process starts again (the dotted line).

If, on the other hand, the sentence is incompatible with the stored grammar, then the child will entertain an alternative grammar. They do this by picking one of the parameters at random and flipping its value. If (say) there are six parameters, then this basically involves rolling a six-sided die. For instance, they might flip the Head Parameter from head-final to head-initial. Or they might flip a different parameter – say the Null Subject Parameter, from "boo null subjects are bad" to "yay null subjects".

The child then checks whether the sentence is compatible with the new, alternative grammar. If the sentence is "S V O" and the child has flipped the Head Parameter to head-initial, then the sentence is compatible with the new grammar. This causes the child to replace the grammar they had stored in memory with the new one they're entertaining. If, on the other hand, the child has flipped the Null Subject Parameter to "yay null subjects", that doesn't help them parse the sentence "S V O", and at this point the child gives up (perhaps getting bored) and reverts to the stored grammar. Either way, the child then waits for the next sentence (the dotted line).

After a certain point (after a particular number of sentences, or the end of the Critical Period), the learning process stops, and the child's grammar has reached its final form.

The TLA in this form is pretty straightforward.

Obviously I'm presenting it informally, and glossing over a lot of technical details. In particular, like the exposition in Gibson & Wexler, I'm assuming the Single Value Constraint (the child only tries to flip one parameter at a time) and the Greediness Constraint (the child will only adopt the new, alternative grammar if it actually helps them analyse the sentence they've heard). There's more to be said, but go and read Gibson & Wexler (1994) if you find this sort of thing exciting.
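
To make the bookkeeping explicit, here is a minimal Python sketch of the update step just described. The function names and the can_parse oracle are my own illustrative choices, not Gibson & Wexler's notation; the point is simply that the learner's only persistent state is the current parameter vector.

```python
import random

def tla_step(grammar, sentence, can_parse):
    """Process one input sentence and return the (possibly updated) grammar.

    grammar:   tuple of 0/1 parameter values -- the learner's only long-term state
    sentence:  the current input (e.g. "S V O"), held only for this step
    can_parse: function (grammar, sentence) -> bool
    """
    if can_parse(grammar, sentence):
        return grammar  # success: keep the stored grammar unchanged

    # Error-driven: pick ONE parameter at random and flip it
    # (the Single Value Constraint).
    i = random.randrange(len(grammar))
    candidate = grammar[:i] + (1 - grammar[i],) + grammar[i + 1:]

    # Greediness Constraint: only adopt the alternative grammar if it
    # actually parses the sentence that caused the error.
    return candidate if can_parse(candidate, sentence) else grammar

def learn(initial_grammar, sentence_stream, can_parse):
    # Learning is just repeated application of tla_step; each sentence is
    # processed once and then forgotten -- no corpus is ever stored.
    grammar = initial_grammar
    for sentence in sentence_stream:
        grammar = tla_step(grammar, sentence, can_parse)
    return grammar
```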

Lightfoot's criticisms of the TLA

Now let's see what Lightfoot has to say about it.
"In being E-language based, these models face huge feasibility problems. One can see those problems emerging with the system of Gibson & Wexler (1994). ... [T]he child needs to determine which ... [grammar] his/her language falls into. That cannot be determined from a single sentence, given the way their system works, but rather the child needs to store the set of sentences experienced and to compare that set with each of the ... possible sets, not a trivial task and one that requires memory banks incorporating the data sets ... So if acquisition proceeds by Gibson & Wexler's TLA and there are forty parameters, then there will be over a trillion different data sets to be stored and checked, and each of those data sets will be enormous." (Lightfoot 2006: 76)
From the characterization of the TLA provided in this blog post, we can see immediately that the statement that "the child needs to store the set of sentences experienced" is incorrect. All the child ever needs to store long-term is the current grammar; short-term, only the single sentence currently being processed needs to be held in working memory. Once that sentence has been dealt with, it can be forgotten immediately. Nothing about the TLA involves storage of sets of sentences, or comparison of sets of sentences.
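
To put the numbers in the quotation in perspective (a back-of-the-envelope illustration of my own, not anything from Gibson & Wexler): forty binary parameters yield 2^40 possible grammars, which is presumably where the "over a trillion" figure comes from, but under the TLA no data set is ever stored or checked for any of them.

```python
# With forty binary parameters there are 2**40 possible grammars --
# presumably the source of the "over a trillion" figure.
print(2 ** 40)               # 1099511627776, roughly 1.1 trillion

# But the TLA learner never stores or compares data sets for any of them:
# its entire long-term state is one forty-value parameter vector, plus
# whichever single sentence is currently in working memory.
stored_grammar = (0,) * 40   # forty parameter values -- that is all
```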

The same criticism is repeated in subsequent work by Lightfoot:
"Work in synchronic syntax has rarely linked grammatical properties to particular triggering effects, in part because practitioners often resort to a model of language acquisition that is flawed ... I refer to a model that sees children as evaluating grammars against sets of sentences and structures, matching input and evaluating grammars in terms of their overall success in generating the input data most economically, e.g. ... Gibson & Wexler (1994), and many others." (Lightfoot 2017a: 383)
As the walkthrough above shows, there is no overall evaluation under the TLA of how successfully grammars generate the input data. There is also no role for economy.
"Gibson & Wexler['s] ... children are "error-driven" and determine whether a grammar will generate everything that has been heard, whether the grammar matches the input. ... There are huge feasibility problems for ... these global, evaluation-based, input-matching approaches to language acquisition, evaluating whole grammars against comprehensive sets of sentences experienced (Lightfoot 1999, 2006: 76f). ... [I]n order to check whether the generative capacity of a grammar matches what the child has heard, s/he will need to retain a memory in some fashion of everything that has been heard." (Lightfoot 2017b: 6–7)
As clarified above, under the TLA the child does not need to remember everything that has been heard. In fact, the child does not need to remember anything that has been heard in the long term. There is no global evaluation.

Lightfoot pushes these points particularly strongly in his latest book, Born to Parse (2020). Here he states that systems such as the TLA
"... involve the global evaluation of grammars as wholes, where the grammar as a whole is graded for its efficiency (Yang 2002). Children evaluate postulated grammars as wholes against the whole corpus of PLD that they encounter, checking which grammars generate which data. This is "input matching" (Lightfoot 1999) and raises questions about how memory can store and make available to children at one time everything that has been heard over a period of a few years." (Lightfoot 2020: 24)
These questions do not arise, because the TLA does not need to store input. There is no evaluation of grammars as wholes.

(It's also not clear why Yang 2002 is cited here, because his model of acquisition also doesn't involve global grading of the efficiency of grammars.)
"Gibson and Wexler take a different approach, but in their view as well, children effectively evaluate whole grammars against whole sets of sentences ... [W]hen they encounter a sentence that their current grammar cannot generate ... children ... pick another parameter setting, and they continue until they converge on a grammar for which there are no unparsable PLD and no errors. ... Gibson and Wexler's child calculate[s] the degree to which the generative capacity of the grammar under study conforms to what they have heard." (Lightfoot 2020: 25)
The TLA does not require that the child converge on a grammar for which there are no unparsable PLD and no errors. In fact, when the child updates their stored grammar, the result may be to make sentences that were previously heard (and parsed correctly) unparsable in the future, as the toy example below illustrates. Nor does the child calculate the degree of anything: the TLA is a fully discrete model of parameter setting, which doesn't even involve stored probabilities, unlike subsequent models such as that of Yang (2002).
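
Here is a hypothetical one-parameter illustration of that point (my own construction, reusing the shape of the sketch above; the sentence types and the can_parse definition are invented for the purpose).

```python
# One binary parameter: sentence type "A" is parsable only when the
# parameter is 0, and sentence type "B" only when it is 1.
def can_parse(grammar, sentence):
    if sentence == "A":
        return grammar == (0,)
    if sentence == "B":
        return grammar == (1,)
    return False

grammar = (0,)
assert can_parse(grammar, "A")       # "A" is heard, parsed, and then forgotten

# Later "B" arrives: the stored grammar fails, the single parameter is
# flipped, the new grammar parses "B", so the child adopts it -- exactly
# what tla_step above would do.
grammar = (1,)

assert not can_parse(grammar, "A")   # "A" is now unparsable, and the TLA
                                     # neither knows nor cares, since "A"
                                     # was never stored
```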

In Lightfoot (2020), these (invalid) criticisms of the TLA and models like it are referred to constantly in subsequent chapters (e.g. pp. 35, 60, 64, 91) in order to motivate a different approach, one which it would take us too far afield to discuss here.

So has the TLA been saved, then?

My goal in this blog post is not to defend the TLA as a viable theory of acquisition per se. There are lots of actually or conceivably valid criticisms of it. For instance:
  • For certain systems of parameters, local maxima or 'sinks' occur – grammars from which it is impossible for the learner to escape (Gibson & Wexler 1994; Berwick & Niyogi 1996); see the toy sketch after this list.
  • Learning is more efficient if the Single Value Constraint is dropped – that is, if the child is allowed to flip any number of parameters at once and jump to a completely different grammar (Berwick & Niyogi 1996; Niyogi & Berwick 1996).
  • The system of parameters explored by Gibson & Wexler (1994) can be learned by a much more conservative learner who just waits for the right unambiguous 'silver bullet' sentences to come along (Fodor 1998).
  • The TLA is very vulnerable to noisy input data: if the very last sentence the learner hears before fixing their grammar once and for all happens to be junk, the learner may flip to a different grammar, and "the learning experience during the entire period of language acquisition is wasted" (Yang 2002: 20).
  • The TLA falsely predicts abrupt changes in the child's linguistic behaviour, and cannot capture gradualness or probabilistic variation (Yang 2002: 20–22).
  • The TLA is dependent on a finite set of innate parameters in order to function; if this turns out not to be how syntactic variation works, then the TLA won't work either.
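
As flagged in the first bullet above, here is an abstract sketch of how a learner can get stuck (my own construction, not the specific cases analysed by Gibson & Wexler or Berwick & Niyogi): the Single Value and Greediness Constraints together can make a target grammar unreachable.

```python
# Two parameters; suppose the target grammar is (1, 1) and the input
# consists of a sentence type "X" that only (1, 1) can parse.
def can_parse(grammar, sentence):
    return sentence == "X" and grammar == (1, 1)

# A learner at (0, 0) hears "X" and fails.  The Single Value Constraint only
# lets it try (1, 0) or (0, 1); neither parses "X", so the Greediness
# Constraint sends it back to (0, 0) every time.  The two simultaneous flips
# needed to reach (1, 1) are never available, so (0, 0) is a grammar the
# learner cannot escape from.
for candidate in [(1, 0), (0, 1)]:
    assert not can_parse(candidate, "X")
```
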
Perhaps because of some of these problems, the TLA has effectively been abandoned as a model of syntactic acquisition over the last two decades: at least as far as I am aware, no one is seriously working with it as a real contender (rather than as a toy model for expository purposes) at the moment. It's therefore not obvious why Lightfoot spends so much time attacking it. But whatever his reasons, as I've shown in this blog post, his criticisms of the TLA are invalid, and depend on imputing to the model properties that it does not actually have.

References

  • Berwick, Robert C., & Partha Niyogi. 1996. Learning from triggers. Linguistic Inquiry 27, 605–622.
  • Fodor, Janet D. 1998. Unambiguous triggers. Linguistic Inquiry 29, 1–36.
  • Gibson, Edward, & Kenneth Wexler. 1994. Triggers. Linguistic Inquiry 25, 407–454.
  • Lightfoot, David W. 1999. The development of language: acquisition, change, and evolution. Oxford: Blackwell.
  • Lightfoot, David W. 2006. How new languages emerge. Cambridge: Cambridge University Press.
  • Lightfoot, David W. 2017a. Acquisition and learnability. In Adam Ledgeway & Ian Roberts (eds.), The Cambridge handbook of historical syntax, 381–400. Cambridge: Cambridge University Press.
  • Lightfoot, David W. 2017b. Discovering new variable properties without parameters. Linguistic Analysis 41, 1–36.
  • Lightfoot, David W. 2020. Born to parse: how children select their languages. Cambridge, MA: MIT Press.
  • Niyogi, Partha, & Robert C. Berwick. 1996. A language learning model for finite parameter spaces. Cognition 61, 161–193.
