Monday, January 19, 2015

How to make an EVOLANG argument

Bob Berwick recently sent me something that aims to survey, albeit sketchily, the state of play in the evolution of language (evolang) and a nice little paper surveying the current state of Gould and Lewontin’s spandrels paper (here) (hint: their warning is still relevant). There have also been more than a few comments in FOL threads remarking on the important progress that has been made on evolang. I believe that I have invited at least one evolang enthusiast to blog about this (I offered as much space as desired, in fact) so as to enlighten the rest of us about the progress that has been made. I admit that I did this in part because I thought that the offer would not be taken up (a put-up-or-shut-up gambit) and also (should the challenge be accepted) because I would really be interested in knowing what has been found, given my profound skepticism that at this moment in time there is anything much to find.  In other words, for better or for worse, right now I doubt that there is much substantive detail to be had about how language actually evolved in the species.[1] In this regard, we are not unlike the Paris Academy over a century ago when it called for a moratorium on such speculation.

That said, who can resist speculating? I can’t. And therefore, this post was intended to be an attempt to examine the logic of an evolution of language account that would satisfy someone like me. I wanted to do this, because, though close to vacuous most of the discussion I’ve seen is (like the fancy inversion here?), I think that Minimalism has moved the discussion one small conceptual step forward. So my intention had been to outline what I think this small step is as well as point to the considerable distance left to travel.  

As you can tell from the modal tenses above, I was going to do this, but am not going to do it. Why not? Because someone has done this for me and instead of my laying out the argument I will simply review what I have received. The text for the following sermon is here, a recent paper by Chomsky on these matters.[2] It is short, readable and (surprise, surprise) lays out the relevant logic very well. Let’s go through the main bits.

Any discussion of evolang should start with a characterization of what features of language are being discussed. We all know that “language” is a very complex “thing.” Any linguist can tell you that there are many different kinds of language properties. Syntax is not phonology is not semantics. Thus in providing an evolutionary account of language it behooves a proposal to identify the properties under consideration.

Note that this is not an idiosyncratic request. Evolution is the study of how biological entities and capacities change over time. Thus, to study this logically requires a specification of the entity/capacity of interest. This is no less true for the faculty of language (FL) than it is for hearts, kidneys or dead reckoning. So, to even rationally begin a discussion in evolang requires specifying the properties of the linguistic capacity of interest.

So, how do we specify this in the domain of language? Well, here we are in luck. We actually have been studying these linguistic capacities for quite a while and we have a rich, developed, and articulate body of doctrine (BOD) that we can pull from in identifying a target of evolutionary interest. Chomsky identifies one feature that he is interested in. He terms this the “Basic Property” (BP) and describes it as follows:

[E]ach language yields a digitally infinite array of hierarchically structured expressions with systematic interpretations at interfaces with two other internal systems, the sensorymotor system for externalization and the conceptual system, for interpretation, planning, organization of action, and other elements of what are informally called “thought.” (1)

So one evolang project is to ask how the capacity that delivers languages with these properties (viz. I-languages) arose in the species. We call the theory of I-languages “Universal Grammar” or UG as it “determines the class of generative procedures that satisfy the Basic Property” (1). We can take UG as “the theory of the genetic component of the faculty of language.” If we do, there is a corresponding evolang question: how did UG arise in the species?[3]

Note that the above distinguishes FL and UG. FL is the mental system/”organ” that undergirds human linguistic competence (i.e., the capacity to develop (viz. “grow”) and deploy (viz. “use”) I-languages). UG is the linguistically specific component of FL. FL is likely complex, incorporating many capacities only some of which are linguistically proprietary. Thus, UG is a subpart of FL. One critical evolang question, then, is how much of FL is UG. How much of FL consists of linguistically proprietary properties, capacities/primitives that are exclusively linguistic?

Why is the distinction important? Well, because it sure looks like humans are the only animals with BP (i.e. nothing does language like humans do language!) and it sure looks like this capacity is relatively independent of (viz. dissociates with) other cognitive capacities we have (see here). Thus, it sure looks like the capacity to generate BP-I-languages (BPIs) is a property of humans exclusively. And now we come to the interesting evolang problem: as a point of evolutionary logic (we might dub this the Logical Problem of Language Evolution (LPLE)) the bigger the UG part of FL, the more demanding the problem of explaining the emergence of FL in the species. Or as Chomsky puts it (3): “UG must meet the condition of evolvability, and the more complex its assumed character, the greater the burden on some future account of how it might have evolved.”

We can further sharpen the evolvability problem by noting one more set of boundary conditions on any acceptable account. There are two relevant facts of interest, the first “quite firm” and the second “plausible,” held with “less confidence.”  These are:

1.     There has been no evolution of FL in the species in the last 50k years or more.
2.     FL emerged in the way it exists today about 75k years ago.

As Chomsky puts it (3): “It is, for now, a reasonable surmise that language –more accurately UG- emerged at some point in the very narrow window of evolutionary time, perhaps in the general neighborhood of 75 thousand years ago, and has not evolved since.”[4]

Why is (1) firm? Because there are no known group differences in the capacity humans have for acquiring and using a natural language. The common wisdom is that our ancestors left Africa and their paths diverged about 50kya; had FL or UG evolved after this point, such group differences would be expected.

Why is (2) less firm? Because we infer it to be true based on material cultural artifacts that are only indirect indicators of linguistic capacity. This evidence has been reviewed by Ian Tattersall (here), and the conclusion he draws on these issues looks like a plausible one. Chomsky is here relying on this archeological “consensus” view for his “plausible” second assumption.

If these assumptions are correct then, as Chomsky notes (3), “UG must be quite simple at its core” and it must have emerged more or less at once. These are really flip sides of the same claim. The evolutionary window is very narrow, so whatever happened must have happened quickly in evo-time, and for something to happen quickly it is very likely that what happened was a small, simple change. Complexity takes a long time. Simplicity, not so much.[5] So, what we are looking for in an evolang account of our kinds of natural languages is some small change that has BPI-effects. Enter Minimalism.

Chomsky has a useful discussion of the role of evolvability in early Generative Grammar (GG). He notes that the evolvability of FL/UG was always recognized to be an important question and that people repeatedly speculated about it. He mentions Lenneberg and Luria in this regard, and I think I recall that there was also some scattered discussion of this in the Royaumont conference. I also know that Chomsky discussed these issues with Francois Jacob as well. However, despite the interest of the problem and the fact that it was on everyone’s radar the speculation never got very far. Why not? Because of the state of the theory of UG.  Until recently, there was little reason for thinking that UG was anything but a very complicated object with complex internal structure, many different kinds of primitives, processes and conditions (e.g. just take a look at GB theory). Given the LPLE, this made any fruitful speculation idle, or, in Dwight Whitney’s words quoted by Chomsky: “The greater part of what is said and written about it is mere windy talk” (4) (I love this Ecclesiastical description: Wind, wind, all is wind!).

As Chomsky notes, minimalism changed this. How? By suggesting that the apparent complexity of UG as seen from the GB angle (and all of GB’s close relatives) is eliminable. How so? By showing that the core features of BPIs as described by GB can be derived from a very simple rule (Merge) applied in very simple ways (computationally “efficient”). Let me say this more circumspectly: to the degree that MP succeeds, to that degree the apparent complexity of FL/UG can be reduced. In the best case, the apparent complexity of BPIs reduces to one novel language-specific addition to the human genome, and out falls our FL.  This one UG addition, together with our earlier cognitive apparatus and whatever non-cognitive laws of nature are relevant, suffices to allow the emergence of the FL we all know and love. If MP can cash this promissory note, then we have taken a significant step towards solving the evolang problem.

Chomsky, of course, rehearses his favorite MP account (7-9): the simplest Merge operation yielding unordered merges, the simplest application of the rule to two inputs yielding PS rules and Movement, natural computational principles (not specific to language but natural for computation as such) resulting in conditions like Inclusiveness and Extension and something like phases, the simple merge rule yielding a version of the copy theory of movement with obvious interpretive virtues etc.  This story is well known, and Chomsky rightly sees that if something like this is empirically tenable then it can shed light on how language might have evolved, or, at the very least, might move us from windy discussions to substantive ones.
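The set-based Merge story rehearsed above can be made concrete with a toy sketch. To be clear, this is my illustration, not anything from the paper: the function name, the example words, and the Python rendering are all hypothetical, chosen only to display the two properties the text mentions (unordered sets, hence no linear order in the syntax; and internal Merge leaving a lower copy in place):

```python
# Toy sketch of set-based Merge (illustrative only; names are mine).
# Merge(X, Y) = {X, Y}: an unordered set, so the output is hierarchical
# but carries no linear order -- ordering is left to externalization.

def merge(x, y):
    """External Merge: combine two syntactic objects into an unordered set."""
    return frozenset({x, y})

# Build a tiny hierarchy: first "read" + "books", then "will" above it.
vp = merge("read", "books")
tp = merge("will", vp)

# Unordered: merge(a, b) == merge(b, a), so word order is not syntactic.
assert merge("read", "books") == merge("books", "read")

# "Internal Merge" (movement): merging a subpart of an object with the
# object itself. The original occurrence stays put -- the copy theory
# of movement falls out for free.
moved = merge("books", tp)
assert "books" in moved             # the displaced (higher) occurrence
assert vp in tp and "books" in vp   # the original (lower) copy survives
```

Nothing here is a theory, of course; the point is only that a single set-forming operation, applied freely, already yields hierarchy and copies without any added machinery.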

Let me say this one more way: what minimalism brings to the table is a vision of how a simple addition might suffice to precipitate an FL like the one we think we have empirical evidence for. And, if correct, this is, IMO, a pretty big deal. If correct, it moves evolang discussion of these linguistic properties from BS to (almost) science, albeit, still of a speculative variety.

Chomsky notes that this does not exhaust the kinds of evolang questions of interest. It only addresses the questions about generative procedure. There are others. One important one regards the emergence of our basic lexical atoms (“words”). These have no real counterpart in other animal communication systems and their properties are still very hard to describe.[6] A second might address how the generative procedure hooked up to the articulatory system. It is not unreasonable to suppose that fitting FL snugly to this interface took some evolutionary tinkering. But though questions of great interest remain, Chomsky argues, very convincingly in my view, that with the rise of MP linguistics has something non-trivial to contribute to the discussion: a specification of an evolvable FL.

There is a lot more in this little paper. For example, Chomsky suggests that much of the windiness of much evolang speculation relates to the misconceived notion that natural language serves largely communicative ends (rather than being an expression of thought). This places natural languages on a continuum with (other) animal communication systems, despite the well-known huge apparent differences. 

In addition, Chomsky suggests what he intends with the locution ‘optimal design’ and ‘computationally efficient.’ Let me quote (13):

Of course, the term “designed” is a metaphor. What it means is that the simplest evolutionary process consistent with the Basic Property yields a system of thought and understanding [that is sic (NH)] computationally efficient since there is no external pressure preventing this optimal outcome.

“Optimal design” and “computational efficiency” are here used to mean more or less the same thing. FL is optimal because there is no required tinkering (natural selection?) to get it into place.  FL/UG is thus evolutionarily optimal. Whether this makes it computationally optimal in any other sense is left open.[7]

Let me end with one more observation. The project outlined above rests on an important premise: that simple phenotypic descriptions will correspond to simple genotypic ones. Here’s what I mean. Good MP stories provide descriptions of mental mechanisms, not neural or genetic mechanisms. Evolution, however, selects traits by reconfiguring genes or other biological hardware. And, presumably, genes grow brains, which in turn secrete minds. It is an open question whether a simple mental description (what MP aims to provide) corresponds to a simple brain description, which, in turn, corresponds to a simple “genetic” description. Jerry Fodor describes this train of assumptions well here.[8]

…what matters with regard to the question whether the mind is an adaptation is not how complex our behaviour is, but how much change you would have to make in an ape’s brain to produce the cognitive structure of a human mind. And about this, exactly nothing is known. That’s because nothing is known about how the structure of our minds depends on the structure of our brains. Nobody even knows which brain structures it is that our cognitive capacities depend on.
Unlike our minds, our brains are, by any gross measure, very like those of apes. So it looks as though relatively small alterations of brain structure must have produced very large behavioural discontinuities in the transition from the ancestral apes to us…
…In fact, we don’t know what the scientifically reasonable view of the phylogeny of behaviour is; nor will we until we begin to understand how behaviour is subserved by the brain. And never mind tough-mindedness; what matters is what’s true.

In other words, the whole evolang discussion rests on a rather tendentious assumption, one for which we have virtually no evidence; namely that a “small” phenotypic change (e.g. reduction of all basic grammatical operations to Merge) corresponds to a small brain change (e.g. some brain fold heretofore absent all of a sudden makes an appearance), which in turn corresponds to a small genetic change (e.g. some gene gets turned on during development for a little longer than previously).  Whether any of this is correct is anyone’s guess. After all there is nothing incoherent in thinking that a simple genetic change can have a big effect on brain organization, which in turn corresponds to a very complex phenotypic difference. The argument above assumes that this is not so, but the operative word is “assume.” We really don’t know.

There is another good discussion of these complex issues in Lenneberg’s chapter 6, which is worth looking at and keeping in mind. This is not unusual in the evolution literature, which typically assumes that traits (not genes) are the targets of selection. But the fact that this is commonly the way that the issues are addressed does not mean that the connections assumed from phenotypic mental accounts to brains to genes are straightforward. As Fodor notes, correctly I believe, they are not.

Ok, that’s it. There is a lot more in the paper that I leave for your discovery. Read it. It’s terrific and provides a good model for evolang discussions. And please remember the most important lesson: you cannot describe the evolution of something until you specify that thing (and even then the argument is very abstract). So far as I know, only linguists have anything approaching decent specifications of what our linguistic capacities consist in. So any story in evolang not starting from these kinds of specifications of FL (sadly, the standard case from what I can tell) is very likely the windy product of waving hands. 

[1] Happily, I have put myself in the good position of finding out that I am wrong about this. Marc Hauser is coming to UMD soon to give a lecture on the topic that I am really looking forward to. If there are any interesting results, Marc will know what they are. Cannot wait.
[2] I’d like to thank Noam for allowing me to put this paper up for public consumption.
[3] Please observe that this does not imply that BP is the only property we might wish to investigate, though I agree with Chomsky that this is a pretty salient one. But say one were interested in how the phonological system arose, or the semantic system. The first step has to be to characterize the properties of the system one is interested in. Only once this is done can evolutionary speculation fruitfully proceed. See here for further discussion, with an emphasis on phonology.
[4] It is worth noting that this is very fast in evolutionary terms and that if the time scale is roughly right then this seems to preclude a gradualist evolutionary story in terms of the slow accretion of selected features. Some seem to identify evolution with natural selection. As Chomsky notes (p. 11), Darwin himself did not assume this.
[5] Furthermore, we want whatever was added to be simple because it has not changed for the last 50k years. Let me say this another way: if what emerged 100kya was the product of slow-moving evolutionary change, with the system accreting complexity over time, then why did this slow change stop so completely 50kya? Why didn’t change continue after the trek out of Africa? Why tons of change beforehand and nothing since? If the change is simple, with no moving parts, as it were, then there is nothing in the core system to further evolve.
[6] I’ll write another post on these soon. I hope.
[7] If this reading of Chomsky’s intention here is correct, then I have interpreted him incorrectly in the past. Oh well, won’t be the last time. In fact, under this view, the linguistic system, once evolved, need not be particularly efficient computationally or otherwise.  On this view, “computationally efficient” seems to mean “arose as a matter of natural law without the required intervention of natural selection.”
[8] The relevant passage is


  1. Noam's lecture - as well as the interesting Q&A - can be viewed online here:

    This was Noam's contribution to an Academy Colloquium, 'The Biology of Language', at the Royal Dutch Academy of Sciences, Trippenhuis, Amsterdam, 12 Dec 2014. The colloquium was organized by yours truly, Johan Bolhuis (who chaired this session), Martin Everaert and Riny Huybregts, all at Utrecht University. 

    Best wishes, Johan

  2. I find this blog extremely helpful. Thank you!

  3. I'm very glad to see this post and I agree with the greater substance of it. I would comment some more but - I don't know whether this is my fault or the platform's - I can't access the paper. When I click on the link (reference 2 in the main body), I am taken to a Blogger page which tells me I don't have permission to view it.

  4. That's a blogger problem. The link to the actual paper is in the main text (the word "here"), but here it is again:

  5. A tangential point as well: why is there still enthusiasm for Marc Hauser when he resigned for scientific misconduct? Maybe I'm unusually severe (or the culture isn't severe enough) but that kind of behaviour shows such disregard for the integrity of the field that I don't see why anyone is lenient about it.

    1. I have been quite vocal on l'affaire Hauser. You can go back and read some of what I posted. The bottom line is that I am less convinced than others that there was any misconduct. I also find the hysteria around the case somewhat disturbing and misplaced. At any rate, Hauser is an acknowledged expert in these areas and I plan to learn what I can from him.

  6. Thanks for summarizing Noam Chomsky's paper. I haven't read it yet, but it indeed sounds a lot like the lecture he presented at the Biology of Language colloquium. There are many things to agree with, but I think the main message you distill from it -- 'you cannot describe the evolution of something until you specify that thing' -- is not quite true. What does it even mean? At which level do we need to specify something before we are allowed to study its evolution? Darwin had no knowledge about genes, chromosomes, mutations etc. but could still develop an impressive amount of theory, make predictions and make sense of lots of empirical observations. Virologists can study the evolution of viruses before they have genotyped them etc.

    This is more than a petty point about this one sentence, because I think it relates to the core disagreement between generative linguists and what at this blog is seen as the Evolang crowd. No-one claims that the big puzzles about the nature or origin of language are solved, but the Evolang-crowd (by and large) feels that it is useful and interesting to think about origins before we have completely answered questions about the nature of language.

    In his talk and the Q&A, Chomsky mentioned several times that everybody knew that theories of UG from the 1970s and 1980s 'could not be correct', but that the data from trying to describe various languages forced generative linguists to postulate lots of language-specific innate knowledge. I cannot help thinking that if only evolutionary considerations had been taken seriously at that time, lots of unhelpful controversy could have been avoided. And that looking a bit more closely at evolutionary considerations now might help to relax the unhelpful stance that communication has nothing to do with language.

    1. I also wonder what Chomsky means by saying that language is not about communication but about expressing thought. Communication is surely nothing more than expression of thought with a view to sharing those thoughts with others, and as it seems to me most linguistic behavior takes the form of conversations, rather than soliloquys, so language should be seen primarily as a form of communication. Is there something I'm missing?

    2. @William
      I guess I disagree. It's hard to study something without SOME specification of its properties. I cannot study the evolution of the four chambered heart until I say what this is. Ditto with winged flight. The specification need not be perfect, but we need to identify the properties of interest.

      As for what the EVOLANG crowd wants, I'd love to see an example or two worked out. I even offer you space for extended disquisition. What exactly has the communication angle taught us? Give us an example or two. That would be helpful to fix ideas. There is no doubt that language is used for communication, but what this means exactly and whether this function played a role in the evolution of some linguistic capacity is what we want stories for. So, find a capacity, and explain how the communicative use of language explains why this capacity has the properties it has. I'd love to see a worked-out example.

      Chomsky argues that as far as BP is concerned, the communicative use of language plays no role in explaining its evolution. Now maybe there are other aspects of the evolution of language which critically rely on this use for their explanation. Fine, identify one and explain.

      Gould and Lewontin long ago noted that currently prominent functions need not explain the hows and whys of the evolution of the capacity that supports that function. Wings started growing for thermo-regulation (at least in bugs). At any rate, one needs an argument tying the communicative function of language together with some trait. And one needs to identify the trait and specify how its evolution depends on the fact that language was used for communication, rather than, e.g., to organize one's thoughts. It helps to be pretty specific, for otherwise 'communication' comes to mean nothing at all and then we gain nothing by pointing to this function. Give us an example. They always help to focus discussion.

    3. Norbert, it is fairly easy to provide the examples you’d “love to see”. What Chomsky calls “externalization” and “ancillary” is by many linguists seen as an essential, defining characteristic of language. Well, externalization is typically done in a communal code rather than in the private code that would suffice for the expression of thought. Obviously, the universal preference for a communal, shared vocabulary is explained by its communicative advantage.

    4. Do you feel that what Chomsky describes (i.e. this set based merge idea) is a sufficiently precise specification of the LAD/UG/FL .. to allow the study of its evolution? From his description, it seems that on his narrative Merge is not specific to language, and indeed predates language. So it isn't clear to me that the evolang people are even looking at the same problem as Chomsky is.

      And I am not sure I understand Chomsky's theory of the evolution of merge. Is merge an adaptation?

    5. Some of these issues are discussed in more detail in our (Bolhuis, Tattersall, Chomsky & Berwick) recent essay in PLoS Biology, which can be freely downloaded here:
      Norbert gave it a good review on this blog ;-)

      We outline the issues in the first para:

      "It is uncontroversial that language has evolved, just like any other trait of living organisms. That is, once—not so long ago in evolutionary terms—there was no language at all, and now there is, at least in Homo sapiens. There is considerably less agreement as to how language evolved. There are a number of reasons for this lack of agreement. First, “language” is not always clearly defined, and this lack of clarity regarding the language phenotype leads to a corresponding lack of clarity regarding its evolutionary origins. Second, there is often confusion as to the nature of the evolutionary process and what it can tell us about the mechanisms of language. Here we argue that the basic principle that underlies language's hierarchical syntactic structure is consistent with a relatively recent evolutionary emergence."

      So, yes, if there is no agreement on the nature of the trait you want to study, you will have a hard time tracing its evolutionary history. E.g., in Willem's example of the evolution of viruses, you don't need to genotype them, but you have to define what a virus is. The same goes for language; if you think (as some do) that language doesn't exist, or that it is the same as speech, or 'communication', then you end up with a completely different evolutionary reconstruction compared to defining language in the way we did. As we say in PLoS Biol:

      "The language faculty is often equated with “communication”—a trait that is shared by all animal species and possibly also by plants. In our view, for the purposes of scientific understanding, language should be understood as a particular computational cognitive system, implemented neurally, that cannot be equated with an excessively expansive notion of “language as communication” [1]. Externalized language may be used for communication, but that particular function is largely irrelevant in this context. Thus, the origin of the language faculty does not generally seem to be informed by considerations of the evolution of communication. This viewpoint does not preclude the possibility that communicative considerations can play a role in accounting for the maintenance of language once it has appeared or for the historical language change that has clearly occurred within the human species, with all individuals sharing a common language faculty, as some mathematical models indicate [1]–[3]. A similar misconception is that language is coextensive with speech and that the evolution of vocalization or auditory-vocal learning can therefore inform us about the evolution of language (Box 1) [1],[4]. However, speech and speech perception, while functioning as possible external interfaces for the language system, are not identical to it. An alternative externalization of language is in the visual domain, as sign language [1]; even haptic externalization by touch seems possible in deaf and blind individuals [5]. Thus, while the evolution of auditory-vocal learning may be relevant for the evolution of speech, it is not for the language faculty per se. We maintain that language is a computational cognitive mechanism that has hierarchical syntactic structure at its core [1], as outlined in the next section."

      In the paper we also say, as Noam does in this talk, that the evolution of language remains largely an enigma. We also state that at most, evolutionary considerations can give us CLUES (or, if you prefer, hypotheses) as to the mechanisms of language; they can never EXPLAIN them. In this case, the apparently rapid and recent emergence of language suggests that it is a relatively simple system at its core, which is consistent with the Strong Minimalist Thesis (SMT).

      Best, Johan Bolhuis

  7. @Norbert: I think, with Jan K, that there are many examples, in the functionalist literature, where the communication angle has taught us something. One example that springs to mind is our understanding of why vowel systems across languages have the shape that they have. I find the argument from Lindblom, de Boer and many others that they represent a compromise between articulatory ease and acoustic discriminability very convincing, and that clearly seems to be a feature of speech optimised for communication.

    But maybe you are after examples that are about the core syntactic machinery, about well-formedness constraints of some sort. There I can't immediately give you an example of a completely convincing explanation of the sort 'this constraint is there because it aids communication'. But there isn't the beginning of an understanding of the mechanisms by which the syntactic machinery would aid thought -- Chomsky's favourite initial 'function' -- either, so I don't think it's wise to exclude communication from the list of potential adaptive advantages of early language.

  8. This comment has been removed by the author.

  9. @Johan I think the key is in this quote you give from your joint paper: "This viewpoint does not preclude the possibility that communicative considerations can play a role in accounting for the maintenance of language once it has appeared".

    In evolution, new traits always appear through chance mutations, and not for some function. When we talk about a function it is always about why selection favoured it once it appeared, i.e. made sure that this new mutation didn't disappear again but was maintained in the population. So the caveat that communicative pressures might have played a role in maintaining the language ability is really no different from saying language might serve communicative ends.

    1. That is correct. Just before that quote we say: "Externalized language may be used for communication, but that particular function is largely irrelevant in this context." So, yes, obviously "language might serve communicative ends", but so what? 'Communication' is one possible function of language. But the function of a trait is logically distinct from its cause (or 'mechanism', if you like) - we discuss this also in the PLoS Biol paper. The issue here is the possible evolution of language, NOT the possible evolution of one of language's possible functions.

    2. Johan, the problem with your position is that there is not the slightest reason to call "the computational system" (or the mechanisms underlying it) language in abstraction of its application through an invented lexical system. That kind of essential agentive functionality (aka culture) does not seem to play a role in purely biological structures, such as the mammalian visual system.

    3. Yes, I think that is a good way of putting it. If there were at some point in our evolutionary history some group of hominins who had Merge but no externalization, no vocabulary, and did not use merge for communication, we would not say that they had language.

      Indeed, on what grounds do we say that the great apes do not have merge? It may be in their toolbox, but not used, to use the toolbox metaphor that came up in the discussion of Piraha. Is there an inconsistency here between the way the notion of Merge has been weakened to account for any potential problems caused by Piraha, and the claim that it is uniquely human, if that is still on the table?

    4. @Jan K, well, if you want to use a functional definition of language then you will be asking different q's, as we outline in the paper. Similarly, if you think that language = speech, as Phil Lieberman seems to do, then you are looking at something else yet again. We discuss these issues in an upcoming comment in PLoS Biol.

      @ Alex, Merge is the basic operation that leads to hierarchical syntactic structure. As we say in PLoS Biol: "According to the “Strong Minimalist Thesis,” the key distinguishing feature of language (and what evolutionary theory must explain) is hierarchical syntactic structure." Good point about non-human animals possibly having 'merge'. In Box 1 in the PLoS Biol paper we say: "There is no a priori reason why a version of such a combinatorial computational system could not have evolved in nonhuman animals, either through common descent (e.g., apes) or convergent evolution (e.g., songbirds) [1],[18]." The thing is that so far there is no evidence for hierarchical syntactic structure in any non-human species. There were some recent claims for 'context free grammar' capabilities in songbirds, but we have shown that these claims were based on flawed experiments (Beckers et al. 2012; see the PLoS Biol paper for more details) - Noam discusses this issue in a short video ('Chomsky on birdsong and language') here:

    5. @Johan You write 'obviously "language might serve communicative ends", but so what?'. I think Norbert would say it is a big deal - as he wrote above 'Chomsky suggests that much of the windiness of much evolang speculation relates to the misconceived notion that the natural language serves largely communicative ends'

      'But the function of a trait is logically distinct from its cause (or 'mechanism', if you like)' - are you referring here to the Tinbergian distinction between mechanism-ontogeny-function-phylogeny? I thought it was pretty clear that we are talking about communication at the functional level here. If all that you are saying is 'be careful before assuming that the brain mechanisms for ape communication are homologous to those used for human language' then we don't disagree at this point either.

    6. @ Willem I think Norbert should speak for himself (if he wants to), but it seems to me that he summarized Noam's (and our) view rather well. So, what we are all saying (I think) is, yes 'communication' is a possible function of language, but we are talking about the possible evolution of language, NOT about the possible evolution of one of language's possible functions. - I notice that I start repeating myself here... Might I suggest you take a look at our PLoS Biol paper?

      Yes, I am referring to the Tinbergian distinction, which is a logical one. Again, we discuss this issue in some detail in PLoS Biol. I (we) am not saying what you think I am saying, because I am not concerned with 'communication', but with language.

    7. @Johan, you say "the key distinguishing feature of language (and what evolutionary theory must explain) is hierarchical syntactic structure." But I don't think that is clear enough really. Berwick et al. 2011 (where Chomsky is a co-author) give a very weak definition of this notion of structure, on which even regular grammars, if they are infinite, generate hierarchical structure. So according to this definition, if an ape can ring a bell twice, or three times, or make the signs for "cookie" three times in a row, that is enough for it to count as merge. There are many different definitions floating around, some of which are really incompatible, and I don't want to get back into a big debate about the meaning of the term "recursion". I just want to say that this concept is a rather wobbly basis on which to build your entire theory of the evolution of language. So any additional clarification you can bring would be very gratefully received, especially in connection with context-free grammars, which seems a different issue. And what would count as behavioral evidence for this in the case where a population does not have "externalized language"?

    8. @ Alex Presumably you are a linguist (I am not), familiar with the SMT, Merge and so on. Note that I was talking about HIERARCHICAL syntactic structure. So, Merge is not just associative learning (obviously - I need not tell a linguist!), but a recursive procedure that leads to hierarchical structure. As we put it in PLoS Biol (did you read that paper? If not, might I suggest that you do?): "Crucially, merge can apply to the results of its own output so that a further application of merge to ate and {the, apples} yields the set {ate, {the, apples}}, in this way deriving the full range of characteristic hierarchical structure that distinguishes human language from all other known nonhuman cognitive systems."
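      The recursion in that quote is easy to render as a toy sketch (my own illustration, not anything from the paper), using Python frozensets since ordinary mutable sets cannot nest:

```python
# Toy sketch of Merge as unordered binary set formation.
# frozensets are used because Python's mutable sets cannot contain sets.
def merge(x, y):
    """Combine two syntactic objects into an unordered set {x, y}."""
    return frozenset([x, y])

# merge applies to the result of its own output:
inner = merge("the", "apples")   # {the, apples}
outer = merge("ate", inner)      # {ate, {the, apples}}
```

      The point the quote makes is visible in the last line: the output of one application of merge is itself a legitimate input to the next, which is what yields unbounded hierarchical structure.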

      I need to get on with my work now, but I see that Norbert did an excellent job explaining the core issues, below.

    9. Yes, I read that paper when it was linked from Norbert's blog.
      My concern is with the several different concepts that people refer to using terms like "recursion" or "hierarchical structure" - sometimes the same people in different papers. For example, in the paper I mentioned, R. C. Berwick et al., Cognitive Science 35 (2011), p. 1227, they say "But if a grammar is unbounded, in any sense of interest for the study of human languages, then one or more expression-combining operations must be applicable to their own outputs (via recursion or some logical equivalent, like a Fregean ancestral). ...In this sense, finitely statable grammars that generate boundlessly many expressions will at least associate expressions with structures, even if the expressions are strings." This means, on my reading of that section, that if we have some behaviour consisting of an unbounded number of discrete actions in a sequence, like an ape signing "cookie" over and over again, that would count as a case of a hierarchically structured expression. Other people would think that it does not, because it is not the "right sort" of structure, which needs to be center-embedded or something.

      So Chomsky is an author of that paper, but I don't know if this is the same concept as the one you sketch in your PLoS Biol paper.
      So the situation is a bit confused (to me at least if not to others).

    10. That last sentence is a little dishonest, actually, so let me correct it: I know that quite a few other people are very confused about what recursion or hierarchical structure means in these papers (see for example this recent paper by Lobina and various papers cited therein). So I apologize if this comes over as pointless nitpicking, but given that this is the central claim, it's worth getting it straight.

    11. @Alex
      'Recursive' here means you get unboundedly big structures with phrasal embeddings. Phrases within phrases within phrases, ad nauseam. "cookie cookie cookie…" doesn't quite do it. But under some bracketings 'police police police police police' does. In short, NLs have specific kinds of recursive Gs and we want to figure out how THEY evolved.
      BTW, for what it's worth, not everyone agrees with Chomsky's exact specification of the problem (e.g. I don't). But I think that his problem and the one that I like are closely enough related that if you get one you'll say lots about the other. The general point stands: specify the object that is evolving and then discuss the history. That's the way to go.

    12. @Alex I think you're thinking about this from too much of a weak-generative-capacity perspective. People always miss the point that the hypothesis is Merge **plus the mappings to the interfaces**. The evolutionary speculation is that the relevant interface is with thought rather than with the motor systems responsible for saying/signing. So the idea is that the generative procedure, as it creates structure, associates that structure with meanings, so that hierarchical embeddings of structure correspond to hierarchical embeddings of meaning, which are detectable, although somewhat indirectly, via scope. So while your ape may have a computable procedure underlying its bell ringing, that computable procedure is not recursively mapped to meaning.

      You get something like your bell ringing in human languages too. So consider the difference between two meanings of

      (1) he's an old old friend

      One is like the bell-ringing: it's iterating and you just get an intensification of the meaning of old. The other is scopal (he's an ancient friend I've known for a while). You can even feel the scope: the first old (the outer one hierarchically) has the ancient meaning, not the long-time meaning; try putting very next to the first, then the second. So only the scopal reading involves recursive mapping to the interface. Maybe Merge does both, with mapping in the second but not the first meaning, or maybe the first is done via a finite-state process.
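      The contrast can be put as a toy sketch (my own gloss, not a formal proposal): the intensifying reading is flat, while the scopal reading nests one constituent inside another.

```python
# Toy sketch: two structures for "old old friend".
# Iterative/intensifying reading: a flat sequence, no embedding.
flat = ["old", "old", "friend"]

# Scopal reading: the outer "old" (ancient) embeds the constituent
# ["old", "friend"] (a long-time friend).
scopal = ["old", ["old", "friend"]]
```

      Only the second structure has a subpart that is itself a phrase, which is what the recursive mapping to the interface is tracking.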

    13. This comment has been removed by the author.

    14. @ davidadger Well put! Noam says this quite clearly, early on in his talk. In the PLoS Biol paper, we state this as follows: "The “Strong Minimalist Thesis” (SMT) [6] holds that merge along with a general cognitive requirement for computationally minimal or efficient search suffices to account for much of human language syntax. The SMT also requires two mappings: one to an internal conceptual interface for thought and a second to a sensory-motor interface that externalizes language as speech, sign, or other modality [1]." (ref. 1 is the Berwick et al. 2013 TICS paper).

      In addition, in our (Bolhuis, Tattersall, Chomsky & Berwick) forthcoming commentary in PLoS Biol, we reiterate this important point:

      "In our essay [2] [i.e. the original PLoS Biol essay, JJB] we argue that “Crucially, merge can apply to the results of its own output (…)”, and we show that this recursive feature leads to the potentially unbounded hierarchical expressions characteristic of human language, each of which is systematically interpreted at the conceptual-intentional interface, that is, internal to the mind-brain, not just externalized as speech."

    15. Yes, I agree with both of those, and I think the Berwick et al. 2011 paper goes a little farther than is justified (i.e. the quote about strings). More generally, I agree with David that the focus on Merge is undesirable, since for me at least the interesting stuff (i.e. if Merge is just this set-theoretic thing) is then happening in the mapping to the interfaces. Merge on its own doesn't explain any of the things that I am interested in - like, for example, language acquisition or the nature of UG.

  10. @Norbert: Maybe I'm still missing something, but the communicative advantage of discrete combinatoriality I thought was obvious. You can communicate much more information that way than by the limited number of signals available in other animal communication systems. Once you can communicate more information, you enable greater cooperation, which benefits the species. So once the mutation appeared, natural selection would take care of the rest. And as has been noted, natural selection doesn't need to account for how the mutation occurred in the first place, just as it never has to account for other random genetic mutations.

    1. Yes, once the mutation appeared. What did the mutation deliver? If the mutation was Merge a la Chomsky, then it delivered hierarchical recursion plus movement plus copies. Thus the answer to Chomsky's question (whence BP?) is that it was a mutation. He would love this answer. Now, is that it for language? No: even given BP, there is reason to think that lots of further adaptation was necessary to allow for externalization, even if it was based on some prior usable hardware. So there is still lots of room for other evo details. But to repeat, Chomsky's point (and mine) is that one specifies the trait one is interested in before going ahead. Language is a complex thing. We need to break the question down to get anywhere. And, from where I sit, the answer to Chomsky's question (whence Gs with BP?) is "some sort of mutation." Fine with me.

  11. Again, Norbert should speak for himself, but I think you've got the message here. This is exactly what we say in the PLoS Biol paper, e.g. in the very last sentence: "Clearly, such a novel computational system could have led to a large competitive advantage among the early H. sapiens who possessed it, particularly when linked to possibly preexisting perceptual and motor mechanisms." There you have it!

    1. Ah great. Has Chomsky changed his views on the selective advantage of language over the decades? What I just expressed comes from Pinker's "Language Instinct", in which he was arguing against Chomsky's alleged denial that language had any selective advantage at all and couldn't be understood as a Darwinian adaptation.

    2. It is infinitely better to read Chomsky's recent papers than to consult a book by Pinker from 1994 - if you want to know Noam's views. Apart from that, one should distinguish between the emergence of language as a computational cognitive system, which we think is not the outcome of a slow process of natural selection - so in that sense it is not an 'adaptation', on the one hand, and, on the other hand, the selective advantage of the computational system once it is there - as we say in the last sentence of the PLoS Biol paper.

    3. Norbert was on the way to the airport and so missed this delicious give and take. So where am I on these issues? First, the only evidence we have for the kind of discrete recursive hierarchical structure that we find in language comes from humans. So far as I know, though it is POSSIBLE that other animals have such structures (which are what Merge generates, as well as non-local dependencies and reconstruction effects), the only ACTUAL evidence we have for these comes from humans. Should other animals have this as well, then that would be interesting to know. If some of our ape cousins have this, then this would indeed identify an interesting evolang avenue for investigation (as the whole ape language industry understood quite a while ago). But, as Johan notes, possible is not actual, and the proposed cases for establishing this in ANY OTHER animal have so far proven nugatory. My conclusion is that we are the only ones that do this, given current evidence. Would the problem change were we to discover that apes, bonobos and chimps have Merge? You bet it would! The question would then become how THEY got it, unless you think that it's merge all the way down (which is something Chomsky would really love, for what better argument can there be for Merge being an expression of physical law than its being everywhere in the biological world? Think talkative amoebae!).

      Some above have suggested that communicative function might have an effect on some aspects of language, e.g. why we have a common vocabulary and certain features of phonology. I have nothing against this view. In fact, Bob Brandon and I suggested something like this for one feature of words (displacement) in an article in Biology and Philosophy a long time ago. Is it true? Don't know. But I have nothing against the idea given what I know. However, this also seems to concede Chomsky's main point: that when it comes to syntax and its basic features (which he identifies: recursive hierarchy, movement and reconstruction, among others), to date nothing interesting has been said. I am pretty sure that Chomsky would accept this (see Johan's excellent points on this matter). Indeed, given Chomsky's general views in the linked-to paper, this is precisely where he thinks the kind of local tinkering natural selection is good at would most fruitfully ply its trade. So, nothing to say about syntax, maybe some things to say about externalization (how this novelty got hooked up to the externalization systems).

      I should add, that we know quite a bit about these systems as well. Phonology and morphology are not exactly linguistic wastelands. So, IF you think that communication is what's driving results here, then there's lots of room for detailed work (I recall that spreading the vowels in the vowel space is often attributed to the communicative function of speech, but I don't know the details) and it would be nice to hear a non-trivial story or two or three, something akin to the evolution of the insect wing, say. So, I invite anyone of you to take one of these and give us an exposition. Take your favorite feature and give us a detailed story. This way we can have models of how a good evolang explanation would proceed. I'M NOT KIDDING HERE: OPEN INVITATION. TAKE ADVANTAGE AND LEND A HAND. NO ONE LINERS AND TWO PARAGRAPH ASSERTIONS. WALK US THROUGH A STORY THAT GOES FROM IDENTIFIED PROPERTY TO EVOLUTIONARY STORY THAT LEADS TO THAT PROPERTY.
      I look forward to being enlightened. BTW, Brandon and I did try and do this for some properties of words in the old paper. I am not sure how successful we were, but we did try, and it turned out to be quite challenging.

    4. 'Take your favorite feature and give us a detailed story.' - I completely agree, that's what we should all be doing, including not only the Evolang crowd, but also you, Chomsky, Bolhuis, Berwick and all those other people that take part in debates about how language evolved and debates about whether we will ever know how language evolved! (Regarding Chomsky's proposal that this thread started with: I was happy to see that there was much more detail in his scenario now than in the past, although the evolutionary process assumed still remains - deliberately perhaps - very underspecified).

      But I think there are already many more such detailed proposals out there, from the infamous Evolang crowd and others, than you (and the Language-evolution-will-always-remain-a-mystery crowd) seem to think. Bart de Boer has a paper on 'the Evolution of Phonology', Kenny Smith on 'the Evolution of Vocabulary', Simon Kirby on the relation between irregularity and frequency; I have a joint paper with de Boer on 'the Evolution of Combinatorial Phonology', Gerhard Jaeger one on 'the Evolution of Convex Categories'; and there are papers on the evolution of duality of patterning, color words, etc. Then there are a lot of papers by Martin Nowak and coauthors on the evolution of UG and compositionality that I'm less of a fan of, but at least they propose formal models (wasn't that the defining feature of generative grammar?) that are actually precise enough so that you can criticize them.

    5. @ Willem: I do not think you have any chance of convincing Norbert. If he were seriously interested in someone 'walking him through the steps' he would by now have read the monumental book by Hauser & Chomsky co-author Tecumseh Fitch:

      I do not claim that Fitch [who - for those in the audience who may be unaware - also massively contributes to EvoLang] got everything right, but he certainly offers the kind of story Norbert claims could not be told, from a very GG-friendly perspective. [I recommended it years ago.] But Norbert prefers moving in tight circles. He accepts Chomsky's claim that [1] if [UG is dead] there is no topic of the evolution of UG and [2] no topic of the evolution of language in the only coherent sense. [1] is true of course, but [2] only follows if one is forbidden to question the Chomskyan dogma. Norbert accepts the prohibition and only considers answers that can be given within the UG framework. If you do not want to offer those you waste your [and Norbert's] time.

      BTW, anyone taken in by Chomsky's very funny claim that no evolutionary change could occur in a time frame as short as 50,000 years may want to have a look at: - seems real biologists disagree.

    6. Oh, I have no illusions about convincing anyone (I think there are too many convincers already in linguistics...). I'd really rather be convinced by someone here, as I really don't understand why the words 'communication', 'cultural evolution', 'connectionism' and 'Bayes' continue to be such dirty words in generative linguistics. At some point, we want to answer questions about how brains might implement a complex language, and how the human brain became capable of doing that. Personally, I think there are exciting new ideas about how those riddles might be solved, and I don't get why we seem to still be debating as if it's hard-core nativists against tabula-rasa empiricists when positions on both sides seem to have changed a lot.

    7. Willem, I was not kidding about doing some of this for us. I am a busy and lazy guy. So I would love to see some examples broken down for the unwashed like me. I would love to have you post an example or two of a good evolang story. Like I said, I have no problems believing that for many language features that natural selection plays a role (just not for Chomsky's BP). I would just like to see an example or two so that I know what I am looking for. And as some have noted, I have not read extensively in the literature. I was hoping that someone could hold my hand through an example.

      As for Bayes, connectionism, etc: they are not dirty words, at least to me. I think that, as a matter of fact, for reasons that Randy has given, connectionism won't work. I also think that as a matter of fact it has bought into associationism (though it need not have done so) and that this too indicates that it will fail. So, I don't really agree with your suggestions here. That said, why not do a 4-5 page illustration of something you think successful (sort of like some of us have done for PoS arguments) and educate us all? I would love this. Consider yourself invited.

    8. @Willem:
      I was not entirely truthful. Connectionism is theoria non grata to me, mainly for reasons outlined by Fodor, Pylyshyn, Marcus, Gallistel and King. Moreover, criticism of my way of life has come from this quarter. Many have claimed that GG cannot be on the right track because brains are connection machines and so rules cannot be coded therein. Add to this the rather pungent associationism that comes with many (most) connectionist proposals and, yes, I am not very open to what they have to say. I have written on this a lot here on FoL and you are free to take a look.

      As for Bayes, this is a bit different. I have objections to this too, but they are not principled. I can imagine useful combinations of Bayes and GG and have even pointed to a few. I have also pointed to stuff where the Bayes stuff has been oversold. WRT Bayes my view is "convince me." I can be convinced but do not assume that the framework must be right. I want proof.

      This is similar to my view about evolang. In the small area I know anything about, syntax, I have seen nothing of particular interest. It simply fails to engage with what I take to be the central features of syntactic competence. I have followed with interest comparisons between bird song and phonology, as this may tell us something about properties of vocalization systems more generally.

      Last point: we are not interested in the evolution of words or phones, but in the evolution of the word and sound CAPACITIES. That's what evo addresses, right? Sounds may change and so too words, but the capacity may not (think old to middle to modern English; same capacity, different Gs). I actually am a fan of language evolution work. But this is not work about an evolving capacity as I understand it.

      Hope this clarifies things. Oh, and I have no idea what you mean by hard-core nativists vs tabula-rasa empiricists. I think that everyone assumes native structure and everyone assumes change due to environmental input. The question is how much in each particular case. Nobody that I know is fighting the straw people you mention.

    9. @Willem/Jelle: I would say that the EVOLANG work you brought up - de Boer phonology, Kirby/Smith regularization, Nowak et al. models of UG evolution - are all examples of language change. I have done work myself porting population genetic models to account for attested cases of language change.

      Once the capacity for language becomes available, these are suggestions of how language(s) took on the shape they have. Communication may be a factor in language change, as you noted (and the Bolhuis et al. paper explicitly endorses; see above), although the deficiencies of this approach are also well known from both traditional historical linguistics and the Labovian studies of language change in progress. In any case, these models do not at all address the question of where language comes from. E.g., the Nowak model assumes that a compositional UG and a non-compositional UG are available and proceeds to derive conditions under which the compositional UG wins; the model does not say where the compositional UG comes from. The regularization work, which is motivated by instances of morphological change, presupposes that the learners have the capacity to form compositional rules/patterns (and indeed the experimental subjects are human). But that's building in the solution of the EVOLANG problem, i.e., language.

      Like Norbert suggests, it'd be good if you could walk the reader through some of the worked-out examples you have in mind.

    10. This is important: I have been assuming that by evolang we are interested in how the CAPACITY for language arose, not how current English arose from middle English. The latter is a fun topic, but not the one that Chomsky was addressing. If Charles is right (and he almost always is) then the cited examples seem irrelevant to the current discussion. Nobody is arguing against using selection models within linguistics to explain learning or language change or whatever. However, this is not the evolang question of interest. So, let me again repeat my request (and second Charles' second) for a little elaboration of someone's favorite little successful example of an evolang explanation of a linguistic capacity. Surely there must be a poster child example out there that someone would have the charity to lay before us.

    11. The papers I mentioned aren't about language change, if you define language change as the change from one mature natural language to another. Some of them do model cultural evolution rather than biological evolution, but with the goal of accounting for how properties of natural languages emerged in the first place. Some 'optimal communication' models (like my 2009 paper with De Boer in J. of Phonetics) remain agnostic about whether the optimization is due to biological or cultural evolution, or optimization in the individual, and just note that signal systems optimized for communication have properties that correspond to 'design features' of natural language.

      Cultural evolution models and experiments (that thus assume some existing, perhaps unused CAPACITY) have been a prominent line of research at the Evolang conferences, so I think it is a mistake to equate Evolang with adaptationist scenarios. That said, the origin of the capacity is of course a key topic; in my view, some version of a gene-culture coevolution theory has the best prospects for reconciling observations about the uniqueness and complexity of language with those about the biological continuity of the underlying mechanisms.

      I'll do my best to find the time to write some more about some Evolang success stories - but I too have a busy schedule (see, e.g., this: - I hear you are in Amsterdam, Charles, so you're very welcome!).

    12. @Jelle I'll stop by. Teaching will be over and this sounds like a lot of fun. See you tomorrow.

  12. OK so the main problem is that it's not clear how syntactic structure as we know it could have arisen out of primate communication systems by any stepwise evolutionary process. To borrow a term from intelligent design, language is irreducibly complex. One day there was no recursion or discrete infinity, then suddenly there was. This raises the question of whether our all-or-nothing theory of syntax is correct, though. Didn't Jackendoff try to describe an alternative theory about 10 years ago that could more easily be derived from chimp-like communication? Did anything come of that?

  13. Perhaps a bit off-topic, but I'd be interested to hear more about how you understand these most recent comments of Chomsky's about optimality ("the linguistic system once evolved need not be particularly efficient computationally or otherwise"), and how it differs from what you took it to mean before. Is the idea that the question everyone has been asking for 20 years --- "Optimal by what metric?" --- is supposed to have as its answer "minimizing the 'size' of the genetic mutation"?

    1. It differs a lot. This suggests that the line of interpretation I was pursuing was way off base. I took 'optimal'/'efficient' to be predicates of Gs or FL/UG. Chomsky takes them to be predicates of the evolution of UG and FL. In other words, we can conclude nothing about the design of Gs from the optimal path of evolution.

      If this is correct (and it may be) then someone saying that such and such feature of UG/G is an instance of optimal design or is there to promote computational efficiency seems misplaced (maybe, I'm still unclear here). But what Chomsky is looking for are predicates that lead to a simple eve scenario. The result will be optimal/well designed but could be clunky on other grounds.

      I'm not really sure how to square this with other things that have been said: like minimality is related to optimal search or phases to bounding computation. It may explain Chomsky's view that Merge constructs sets if these are in some sense very simple and so would be easily evolvable (say easier to evolve than ordered pairs (I don't really believe this, I should say)).

      So, it looks like I was wrong on the Chomsky exegesis front. But I still like the idea and have not thrown it out. It makes for more points of contact between syntax and the rest of linguistics, which I take to be a good thing.

    2. If this is correct (and it may be) then someone saying that such and such feature of UG/G is an instance of optimal design or is there to promote computational efficiency seems misplaced (maybe, I'm still unclear here).

      Yes, this is sort of what I was wondering. It seems to shift all the talk about optimality and/or efficiency -- which was not that clearly defined in the first place even when the relevant metric was thought (at least by you and me) to be at the level of the grammar (phenotype, roughly) -- onto even shakier ground, because it now depends on assumptions about the phenotype/genotype relationship, which is poorly understood as you and Fodor point out.

      It may explain Chomsky's view that Merge constructs sets if these are in some sense very simple and so would be easily evolvable (say easier to evolve than ordered pairs (I don't really believe this, I should say)).

      Really? I would have thought that this is exactly the kind of thing which becomes even harder to be convinced of (I don't see any reason to believe it either), on this view according to which "we can conclude nothing about the design of Gs from the optimal path of evolution" (which seems right to me).

    3. I think that Chomsky has in mind something like the following (but beware, last time I put words into Chomsky's mouth I was completely wrong): the most evolvable novelty will be as simple as possible. The simplest combination operation is one where A and B combine unordered. This suggests {A,B}, which is just a way of coding that two things are combined and unordered. Now by order, Chomsky seems to mean *linearizing*, as in concatenation. But the strongest version is that {A,B} is more evolvable than <A,B>. So Merge A and B is simpler than Merge A with B, i.e. symmetrical operations are "simpler." This is really what he wants (and needs). Now maybe one can run such an argument. But I cannot find it in me to believe that symmetrical Merge is all that simpler than asymmetric Merge, especially when one considers that the merging is plausibly thought of as a function of the properties of the mergees. At any rate, that's what I was thinking. We want a really simple combo operation for evolvability reasons. And the simplest one will have properties of the kind that Chomsky has noted. This is an optimal process from an evo perspective. That's how I am reading some of his optimality remarks now.
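The symmetric/asymmetric contrast can be sketched in a few lines (a toy illustration of my own, not anything in Chomsky's paper; all names are invented for the sketch):

```python
# Toy illustration: symmetric Merge modeled as unordered set
# formation, asymmetric Merge as ordered-pair formation.

def merge_symmetric(a, b):
    """Merge(A, B) = {A, B}: combination with no order imposed."""
    return frozenset([a, b])

def merge_asymmetric(a, b):
    """Merge(A, B) = <A, B>: combination that also fixes an order."""
    return (a, b)

# Symmetry: the order of the arguments makes no difference...
assert merge_symmetric("the", "cat") == merge_symmetric("cat", "the")
# ...while the ordered variant distinguishes the two cases.
assert merge_asymmetric("the", "cat") != merge_asymmetric("cat", "the")

# Either operation can reapply to its own output, giving hierarchy:
dp = merge_symmetric("the", "cat")
vp = merge_symmetric("saw", dp)
assert dp in vp  # {the, cat} is a member of {saw, {the, cat}}
```

Note that on this toy rendering both versions are recursive; the only difference the "simplicity" argument trades on is whether an order is imposed along with the combination.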

    4. Taking Norbert's last comment with his own proviso [that "last time [he] put words into Chomsky's mouth [he] was completely wrong"] here is a puzzle the typical Evolang participant would raise:

      Norbert says: "We want a really simple combo operation for evolvability reasons. And the simplest one will have properties of the kind that Chomsky has noted. This is an optimal process from an evo perspective."

      Assuming this to be true, one has to ask whether this simplest combo operation accounts for everything that needed to evolve, i.e., whether the speculation Chomsky offers could yield a complete account. Interestingly enough, just in the sentence before, Norbert himself indicates that it does not:

      "the merging is plausibly thought of as a function of the properties of the mergees".

      So in order for Merge to do all its magic it depends on mergees with very specific properties. Chomsky is of course entirely silent about how these wonderful mergees [that are also innate and are now doing much of the explanatory work in acquisition debates] might have evolved. But possibly someone explaining Chomsky's position here can address that question?

    5. The main problem I have with this line of argument is that Merge is such a basic operation that it is hard to see what a computational system that did not use it would look like. Does anyone have a computational model of any type in mind, where we can say that without Merge we can only do computations in set S, but with Merge we can do S', which is bigger?

      I feel that hovering in the background is some distinction between context-free grammars/PDAs and finite automata, but I don't see how this fits, and I know that some people reject that distinction.
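For what it's worth, the finite-automaton/PDA contrast gestured at here can be made concrete with a toy recognizer (my own illustration, not from the discussion; the function names are invented). Nested dependencies like a^n b^n require a counter or stack, while a finite-state device can only enforce the flat shape a+b+:

```python
# Toy illustration of the finite-state vs. context-free contrast.
import re

def accepts_nested(s):
    """Recognize a^n b^n (n >= 1) using a counter -- beyond finite-state power."""
    m = re.fullmatch(r"(a+)(b+)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2))

def accepts_flat(s):
    """A finite-state approximation: any run of a's followed by any run of b's."""
    return bool(re.fullmatch(r"a+b+", s))

# The flat recognizer overgenerates: it cannot enforce the matched
# counts that the hierarchical one checks.
assert accepts_nested("aaabbb") and not accepts_nested("aaabb")
assert accepts_flat("aaabb")  # accepted even though the counts mismatch
```

The analogy to "with Merge vs. without Merge" is loose at best, which is perhaps Alex's point: the string-language hierarchy doesn't obviously carve the pre-Merge/post-Merge distinction.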

    6. Alex, if I understand Chomsky's evolution speculation right [and I do not claim I do], then you ask the wrong question: it is not a matter of comparison between different computational systems but between no system at all and the Merge system. You can get a flavour of it when you read: "In fact it is conceivable ... that higher primates, say gorillas or whatever, actually have something like a human language faculty but just have no access to it" (Chomsky, 2000).

      On that view our ancestors were like gorillas until the mutation occurred. So adding Merge gives you one [by Chomsky's definition the simplest possible] computational system. While unlikely, that postulation is not entirely implausible: we go from nothing to something and by a stroke of luck this something was virtually perfect...

      But, of course this does not answer why our ancestors had the mergees that are required for merge to do anything. I doubt anyone would want to claim whatever gorillas have resembles our mergees ...

    7. Bonobos and chimps can do some complex behaviours but not others, so they clearly have some computational capacity -- even slime molds do. The question, more sharply, is: what behavioural differences do we predict between animals with Merge and without Merge (if we don't have the externalisation systems, so that neither group has language in the standard sense)?

    8. This comment has been removed by the author.

    9. Alex said: "Bonobos and chimps ... have some computational capacity -- even slime molds do"

      This would only be a problem if Merge changed/replaced an existing computational system. But it seems that if one assumes Chomsky et al.'s 'one-mutation-did-it-all' evo-scenario, then a new computational system was added that did not interfere with or depend on already existing systems. [That's why he keeps saying animal communication research is irrelevant to language evolution].

      Now, as I said earlier, this is not implausible, but it is massively incomplete because it does not explain at all how this new system could access and manipulate existing cognitive 'atoms' [for lack of a better term]. Why would these things have been immediately usable mergees before an organism had Merge?

    10. @Alex
      I have some sympathy for your reaction to Merge as the novelty. I also find it to be a bit too good, and this makes it hard to see why it appears to be restricted to humans. It's partly for this reason that I suggested that the real innovation is "headedness" (via labeling) rather than Merge.

      That said, what I like about the Chomsky proposal is not the particular suggestion, but the form of the argument. He is right that what you want is some small change that, when added to previous cognitive powers, yields an FL-like object. What's an FL-like object? One that gives you Gs with the kinds of properties more or less described by theories like GB (viz. they involve movement, allow for reconstruction effects, have unbounded embedding of phrases, etc.). In short, the form of the solution seems right, even if one objects to the particular version. And this is helpful. We need models for what kinds of stories to pursue. And in the domain of the evolution of syntax, Chomsky's is the only one that I know of with the right flavor.

  14. Re the evolutionary advantage of Merge, I thought the idea was that Merge allows a flexible and creative kind of combinatory thought, as argued for example by Spelke; a rat can have a notion of ‘northeast’ and a notion of ‘near the pillar,’ but cannot easily and flexibly combine any two notions to get things like ‘northeast of the pillar’ the way a human can.

    Humans can instantly assemble ad hoc complex notions on the fly, and use them in working memory for further computation, without necessarily even storing them long-term. It is plausible, as some of Spelke's experiments suggested, that other animals don't; their online computation does not have the range, and their concept formation doesn't happen on the fly.

    Primates able to recursively perform this mental trick (according to the speculation, humans after about 75,000 years ago) would have an advantage over those that couldn't, even in the absence of any externalization.

    Substantial mental advantage could be gained from the ability to spontaneously Merge what we as linguists describe as roots, root-compounding.

    But if Merge can also act on more abstract notions like familiarity and time, you could get DPs and TPs as well (though I know Adger and Hinzen & Sheehan have different stories about how functional categories come in). And maybe also lexical categories like vP and nP, on Marantz' suggestion, if the essential character of v (V) and n (N) are similar to the other functional categories in whatever is the relevant way. All Merge has to be able to do is see them.

    Adger, David. 2013. A Syntax of Substance. MIT Press.

    Hinzen, Wolfram, and Michelle Sheehan. 2013. The Philosophy of Universal Grammar. OUP.

    Marantz, Alec. 2007. Phases and words. In S.H. Choe (ed.), Phases in the Theory of Grammar, pp. 191-222. Dong-In, Seoul.

    Spelke, Elizabeth S. 2003. What makes us smart? Core knowledge and natural language. In D. Gentner and S. Goldin-Meadow (eds.), Language in mind: Advances in the study of language and thought, pp. 277-311. MIT Press.

    1. @Peter
      I am not sure how the recursion trick fits together with the ad hoc part. Spelke focused on the ability to combine concepts from different modules. Recursion could in principle hold within just one module. So, how do you see these two apparently different points hanging together?

      Second, is the "on the fly" part important? Say we found an animal that could do this, but not "on the fly"? What difference would this make? Is "on the fly" and "ad hoc" what confers advantage or recursive combinatorics? Or are these intimately related?

      I ask, because I've never been able to outline the connections to my own satisfaction. Thx.

    2. Well, you can teach an animal an arbitrary association, like blue triangle = electric shock, presumably from different modules. And animals can perform some fast computations on the fly, like in reacting to a threat or opportunity. But it's not so clear that they can spontaneously make those arbitrary associations without any training or reinforcement. Most experiments showing that chimpanzees can do something sophisticated involve hundreds of training trials. But somebody can walk into your office and say "There's a banana behind your filing cabinet!" out of context and you'll instantly know exactly where to look. Later, you will remember the gist of what was said but will probably not remember the exact wording, suggesting you didn't actually store the sentence verbatim in long-term memory. But you had it in short-term memory long enough to process it.

      So I would guess that rats and chimpanzees can put very different things together, but only by a less computationally efficient process, and they also have computationally efficient processes, but with narrower domains.

    3. Sorry, "electric shock" was a poorly chosen example of an arbitrary association, because pain avoidance is special and might not require multiple exposures, but as Pavlov showed, you can teach an animal a whole range of arbitrary associations with training.

    4. Sorry, and where's the recursion come in? Is domain generality necessarily tied to recursion or contingently so? Could one have domain specificity and recursion and domain generality without? Do you take associations to be Merge related in any way? And are there hierarchically organized associations? Sorry I'm being thick here. I just can't follow.

    5. If Merge applies to things from different domains, then it has a certain generality to it, so it doesn't seem unnatural that it should be able to apply to its own output. I don't think I'm saying anything new there. A domain specific operation targets things with a more narrowly specified character, so it would be less surprising to find that such an operation was not recursive.

    6. I guess I think that there is still a missing step here. It's one thing for an operation to apply to different kinds of ATOMS, and another for it to apply to outputs that are not atomic. The latter is the big leap, the recursive one. Thx however, I now get it.

    7. I realize that it's just another adaptationist just-so story, but it seems to me at least as plausible as any involving communication as the primary advantage. Obviously compounding is just the tip of the iceberg, a way to suggest that Merge would give you advantages even before you tried to do anything fancy with it; but then once you have it, you can do argument structure and construct propositions and embed one complex thing inside another and so on.

      I was trying in the particulars to distinguish Merge from other ways of putting things together. For example, a syntactic atom at some level involves an association with a phonological representation, something like /kæt/ <--> CAT and /z/ <--> PLURAL, to use DM-style representations. I don't think that anybody thinks that "<-->" is due to Merge --- i.e. that what combines the phonological information with the information that accesses meaning is Merge.

      So "<-->" is due to some other way of forming associations. These associations tend to be stored, in a way that one-off syntactic structures are typically not. I have no reason to think that "<-->" is recursive; e.g. you don't link the sound-meaning pairing to a distinct sound with a recursive layer of "<-->". So it can't be the case that the only way for a brain (human or otherwise) to combine two things is Merge, nor that any mechanism for combining two things will be recursive. But something like "<-->" can't get you complex thought, whereas Merge might.
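The "<-->" versus Merge contrast can be rendered as a toy sketch (my own illustration; the lexicon entries and function name are invented, DM-style, for the example): "<-->" is a stored, flat association, while Merge is an operation that can embed its own output.

```python
# Toy illustration: "<-->" as a stored lexicon of atom-to-atom pairings
# (no recursion), versus Merge as a structure-building operation.

lexicon = {"/kæt/": "CAT", "/z/": "PLURAL"}  # "<-->": stored pairings

def merge(a, b):
    """Merge forms an unordered set and can take its own output as input."""
    return frozenset([a, b])

# The association just maps atoms to atoms; Merge builds structure on
# top of the things the association delivers:
cats = merge(lexicon["/kæt/"], lexicon["/z/"])   # {CAT, PLURAL}
thought = merge("CHASE", merge("THE", cats))     # one complex thing inside another
assert merge("THE", cats) in thought
```

The dictionary lookup here never applies to its own output, which is the intended analogue of the claim that "<-->" is not recursive, whereas the Merge calls nest freely.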

    8. This could be of interest to some who participated in this discussion [especially Norbert, who may wish to resist the urge to ignore it because of who posted it].