Thursday, July 20, 2017

Is linguistics a science?

I have a confession to make: I read (and even monetarily support) Aeon. I know that they publish junk (e.g. Evans has dumped junk on its pages twice), but I think the idea of trying to popularize the recondite for the neophyte is a worthwhile endeavor, even if it occasionally goes awry. I mention this because Aeon has done it again. The editors clearly understand the value (measured in eyeballs) of a discussion of Chomsky. And I was expecting the worst, another Evans-like or Everett-like or Wolfe-like effort. In other words, I was looking forward to extreme irritation. To my delight, I was disappointed. The piece (by Arika Okrent here) got many things right. That said, it is not a good discussion and will leave many more confused and misinformed than they should be. In what follows I will try to outline my personal listing of pros and cons. I hope to be brief, but I might fail.

The title of Okrent’s piece is the title of this post. The question at issue is whether Chomskyan linguistics is scientific. Other brands get mentioned in passing, but the piece, Is linguistics a science? (ILAS), is clearly about the Chomsky view of GG (CGG). The subtitle sets (part of) the tone:

Much of linguistic theory is so abstract and dependent on theoretical apparatus that it might be impossible to explain

ILAS goes into how CGG is “so abstract” and raises the possibility that this level of abstraction “might” (hmm, weasel word warning!) make it incomprehensible to the non-initiated, but it sadly fails to explain how this distinguishes CGG from virtually any other inquiry of substance. And by this I mean not merely other “sciences” but even biblical criticism, anthropology, cliometrics, economics etc.  Any domain that is intensively studied will create technical, theoretical and verbal barriers to entry by the unprepared. One of the jobs of popularization is to allow non-experts to see through this surface dazzle to the core ideas and results. Much as I admire the progress that CGG has made over the last 60 years, I really doubt that its abstractions are that hard to understand if patiently explained. I speak from experience here. I do this regularly, and it’s really not that hard. So, contrary to ILAS, I am quite sure that CGG can be explained to the interested layperson and the vapor of obscurity that this whiff of ineffability spritzes into the discussion is a major disservice. (Preview of things to come: in my next post I will try (again) to lay out the basic logic of the CGG program in a way accessible (I hope) to a Sci Am reader).

Actually, many parts of ILAS are much worse than this and will not help in the important task of educating the non-professional. Here are some not-so-random examples of what I mean: ILAS claims that CGG is a “challenge to the scientific method itself” (2), suggests that it is “unfalsifiable” Popper-wise (2), that it eschews “predictions” (3), that it exploits a kind of data that is “unusual for a science” (5), suggests that it is fundamentally unempirical in that “Universal grammar is not a hypothesis to be tested, but a foundational assumption” (6), bemoans that many CGG claims are “maddeningly circular or at the very least extremely confusing” (6), complains that CGG “grew ever more technically complex,” with ever more “levels and stipulations,” and ever more “theoretical machinery” (7), asserts that MP, CGG’s latest theoretical turn, confuses “even linguists” (including Okrent!) (7), may be more philosophy than science (7), moots the possibility that “a major part of it is unfalsifiable” and “elusive” and “so abstract and dependent on theoretical apparatus that it might be impossible to explain” (7), moots the possibility that CGG is post-truth in that there is nothing (not much?) “at stake in determining which way of looking at things is the right one” (8), and ends with a parallel between Christian faith and CGG, which are described as “not designed for falsification” (9). These claims, spread as they are throughout ILAS, leave the impression that CGG is some kind of weird semi-mystical view (part philosophy, part religion, part science), which is justifiably confusing to the amateur and professional alike. Don’t get me wrong: ILAS can appreciate why some might find this obscure hunt for the unempirical abstract worth pursuing, but the “impulse” is clearly more Aquarian (as in age of) than scientific. Here’s ILAS (8):

I must admit, there have been times when, upon going through some highly technical, abstract analysis of why some surface phenomena in two very different languages can be captured by a single structural principle, I get a fuzzy, shimmering glimpse in my peripheral vision of a deeper truth about language. Really, it’s not even a glimpse, but a ghost of a leading edge of something that might come into view but could just as easily not be there at all. I feel it, but I feel no impulse to pursue it. I can understand, though, why there are people who do feel that impulse.

Did I say “semi-mystical”? Change that to pure Saint Teresa of Avila. So there is a lot to dislike here.[1]

That said, ILAS also makes some decent points and in this it rises way above the shoddiness of Evans, Everett and Wolfe. It correctly notes that science is “a messy business” and relies on abstraction to civilize its inquiries (1), it notes that “the human capacity for language,” not “the nature of language,” is the focus of CGG inquiry (5), it notes the CGG focus on linguistic creativity and the G knowledge it implicates (4), it observes the importance of negative data (“intentional violations and bad examples”) to plumbing the structure of the human capacity (5), it endorses a ling vs lang distinction within linguistics (“There are many linguists who look at language use in the real world … without making any commitment to whether or not the descriptions are part of an innate universal grammar”) (6), it distinguishes Chomsky’s conception of UG from a Greenberg version (sans naming the distinction in this way)  and notes that the term ‘universal grammar’ can be confusing to many (6):

The phrase ‘universal grammar’ gives the impression that it’s going to be a list of features common to all languages, statements such as ‘all languages have nouns’ or ‘all languages mark verbs for tense’. But there are very few features shared by all known languages, possibly none. The word ‘universal’ is misleading here too. It seems like it should mean ‘found in all languages’ but in this case it means something like ‘found in all humans’ (because otherwise they would not be able to learn language as they do.)

And it also notes the virtues of abstraction (7).

Despite these virtues (and I really like that above explanation of ‘universal grammar’), ILAS largely obfuscates the issues at hand and gravely misrepresents CGG. There are several problems.

First, as noted, a central trope of ILAS is that CGG represents a “challenge to the scientific method itself” (2). In fact one problem ILAS sees with discussions of the Everett/Chomsky “debate” (yes, scare quotes) is that it obscures this more fundamental fact. How is it a challenge? Well, it is un-Popperian in that it insulates its core tenets (universal grammar) from falsifiability (3).

There are two big problems with this description. First, so far as I can see, there is nothing that ILAS says about CGG that could not be said about the uncontroversial sciences (e.g. physics). They too are not Popper falsifiable, as has been noted in the philo of science literature for well over 50 years now. Nobody who has looked at the Scientific Method thinks that falsifiability accurately describes scientific practice.[2] In fact, few think that either Falsificationism or the idea that science has a method are coherent positions. Lakatos has made this point endlessly, Feyerabend more amusingly. And so has virtually every other philosopher of science (Laudan, Cartwright, Hacking, to name three more). Adopting the Chomsky maxim that if a methodological dictum fails to apply to physics then it is not reasonable to hold linguistics to its standard, we can conclude that ILAS’s observation that certain CGG tenets are unfalsifiable (even if this is so) does not identify a problem peculiar to CGG. ILAS’s suggestion that it does is thus unfortunate.

Second, as Lakatos in particular has noted (but Quine also made his reputation on this, stealing the Duhem thesis), central cores of scientific programs are never easily directly empirically testable. Many linking hypotheses are required which can usually be adjusted to fend off recalcitrant data.  This is no less true in physics than in linguistics.  So, having cores that are very hard to test directly is not unique to CGG. 

Lastly, being hard to test and being unempirical are not quite the same thing. Here’s what I mean. Take the claim that humans have a species-specific dedicated capacity to acquire natural languages. This claim rests on trivial observations (e.g. we humans learn French; dogs (smart as they are) don’t!). That this involves Gs in some way is trivially attested by the fact of linguistic creativity (the capacity to use and understand novel sentences). That it is a species capacity is obvious to any parent of any child. These are empirical truisms and so well grounded in fact that disputing their accuracy is silly. The question is not (and never has been) whether humans have these capacities, but what the fine structure of these capacities is. In this sense, CGG is not a theory, any more than MP is. It is a project resting on trivially true facts. Of course, any specification of the capacity gives empirical and theoretical hostages, and linguists have developed methods and arguments and data to test them. But we don’t “test” whether FL/UG exists because it is trivially obvious that it does. Of course, humans are built for language like ants are built to dead reckon or birds are built to fly or fish to swim. So the problem is not that this assumption is insulated from test and that holding it is therefore unempirical and unscientific. Rather, this assumption is not tested for the same reason that we don’t test the proposition that the Atlantic Ocean exists. You’d be foolish to waste your time. So, CGG is a project, as Chomsky is noted as saying, and the project has been successful in that it has delivered various theories concerning how the truism could be true, and these are tested every day, in exactly the kinds of ways that other sciences test their claims. So, contrary to ILAS, there is nothing novel in linguistic methodology. Period.
The questions being asked are (somewhat) novel, but the methods of investigation are pure white bread.[3] That ILAS suggests otherwise is both incorrect and a deep disservice.

Another central feature of ILAS is the idea that CGG has been getting progressively more abstract, removed from facts, technical, and stipulative. This is a version of the common theme that CGG is always changing and getting more abstruse. Is ILAS pining for the simple days of LSLT and Syntactic Structures? Has Okrent read these? (I actually doubt it, given that nobody under a certain age looks at them anymore.) At any rate, again, in this regard CGG is no different from any other program of inquiry. Yes, complexity flourishes, for the simple reason that more complex issues are addressed. That’s what happens when there is progress. However, ILAS suggests that contemporary complexity contrasts with the simplicity of an earlier golden age, and this is incorrect. Again, let me explain.

One of the hallmarks of successful inquiry is that it builds on insights that came before. This is especially true in the sciences, where later work (e.g. Einstein) builds on earlier work (e.g. Newton). A mark of this is that newer theories are expected to cover (more or less) the same territory as previous ones. One way of doing this is for the newbies to have the oldsters as limit cases (e.g. you get Newton from Einstein when speeds are low relative to the speed of light). This is what makes scientific inquiry progressive (shoulders and giants and all that). Well, linguistics has this too (see here for the first of several posts illustrating this with a Whig History). Once one removes the technicalia (important stuff, btw), common themes emerge that have been conserved through virtually every version of CGG accounts (constituency, hierarchy, locality, non-local dependency, displacement) in virtually the same way. So, contrary to the impression ILAS provides, CGG is not an ever more complex blooming buzzing mass of obscurities. Or at least not more so than any other progressive inquiry. There are technical changes galore as the bounds of empirical inquiry expand, but earlier results are preserved largely intact in subsequent theory. The suggestion that there is something particularly odd about the way that this happens in CGG is just incorrect. And again, suggesting as much is a real disservice and an obfuscation.

Let me end with one more point, one where I kinda like what ILAS says, but not quite. It is hard to tell whether ILAS likes abstraction or doesn’t. Does it obscure or clarify? Does it make empirical contact harder or easier? I am not sure what ILAS concludes, but the problem of abstraction seems contentious in the piece. It should not be. Let me take up that theme.

First, abstraction is required to get any inquiry off the ground. Data is never unvarnished. But more importantly, only by abstracting away from irrelevancies can phenomena be identified at all. ILAS notes this in discussing friction and gravitational attraction. It’s true in linguistics too. Everyone recognizes performance errors; most recognize that it is legit to abstract away from memory limitations in studying the G aspects of linguistic creativity. At any rate, we all do it, and not just in linguistics. What is less appreciated, I believe, is that abstraction allows one to hone one’s questions and makes contact with empirics possible. It was when we moved away from sentences uttered to judgments about well-formedness investigated via differential acceptability that we were able to start finding interesting Gish properties of native speakers. Looking at utterances in all their gory detail obscures what is going on. Just as with friction and gravity. Abstraction does not make it harder to find out what is going on, but easier.

A more contemporary example of this in linguistics is the focus on Merge. This abstracts away from a whole lot of stuff. But by ignoring many other features of G rules (besides the capacity to endlessly embed), it also allows inquiry to focus on key features of G operations: they spawn endlessly many hierarchically organized structures that allow for displacement, reconstruction, etc. It also allows one to raise in simplified form new possibilities (do Gs allow for sideward (SW) movement? Is inverse control/binding possible?). Abstraction need not make things more obscure. Abstracting away from irrelevancies is required to gain insight. It should be prized. ILAS fails to appreciate how CGG has progressed, in part, by honing sharper questions by abstracting away from side issues. One would hope a popularization might do this. ILAS did not. It made abstraction’s virtues harder to discern.
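For the curious, the bare bones of Merge are easy to display. This is only a toy sketch of the set-formation idea (Merge(a, b) = {a, b}), with helper names of my own invention, not a formal definition from the literature:

```python
# A toy rendering of Merge as set formation: Merge(a, b) = {a, b}.
# Hypothetical helper names; a sketch of the idea, not a formal definition.

def merge(a, b):
    """Combine two syntactic objects into an unordered set."""
    return frozenset([a, b])

def depth(obj):
    """Hierarchical depth of a syntactic object (atoms have depth 0)."""
    if isinstance(obj, frozenset):
        return 1 + max(depth(x) for x in obj)
    return 0

# Build "eat apples", then embed it under "will" by re-merging:
vp = merge("eat", "apples")   # {eat, apples}
tp = merge("will", vp)        # {will, {eat, apples}}

print(depth(vp))  # 1
print(depth(tp))  # 2
```

The point of the toy is that one operation, applied to its own outputs, already yields unbounded hierarchy; everything else (labels, linear order, agreement) has been abstracted away.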

One more point: it has been suggested to me that many of the flaws I noted in ILAS were part of what made the piece publishable. In other words, it’s the price of getting accepted.  This might be so. I really don’t know. But, it is also irrelevant. If this is the price, then there are worse things than not getting published.  This is especially so for popular science pieces. The goal should be to faithfully reflect the main insights of what one is writing about. The art is figuring out how to simplify without undue distortion. ILAS does not meet this standard, I believe.


[1] The CGG as mysticism meme goes back a long way. I believe that Hockett’s review of  Chomsky’s earliest work made similar suggestions.
[2] In fact, few nowadays are able to identify a scientific method. Yes, there are rules of thumb like think clearly, try hard, use data, etc. But the days of thinking that there is a method, even in the developed sciences, are gone.
[3] John Collins has an exhaustive and definitive discussion of this point in his excellent book (here). Read it and then forget about methodological dualism evermore.

Monday, July 17, 2017

The Gallistel-King conjecture; another brick in the wall

Several people sent me this piece discussing some recent work showing how to store and retrieve information in "live" (vs synthetic) DNA. It's pretty cool. Recall that the Gallistel-King conjecture (GKC) is that a locus of cognitive computing will be intra-cellular and that large molecules like DNA will be the repository of memories. The advantage is that we know how to "write to" and "read from" such chemical computers, and that this is what we need if we are to biologically model the kinds of computations that behavioral studies have shown to be going on in animal cognition. The proof of concept that this is realistic invites being able to do this in "live" systems. This report shows that it has been done.

The images and videos the researchers pasted inside E. Coli are composed of black-and-white pixels. First, the scientists encoded the pixels into DNA. Then, they put their DNA into the E. coli cells using electricity. Running an electrical current across cells opens small channels in the cell wall, and then the DNA can flow inside. From here, the E. Coli’s CRISPR system grabbed the DNA and incorporated it into its own genome. “We found that if we made the sequences we supplied look like what the system usually grabs from viruses, it would take what we give,” Shipman says.
Once the information was inside, the next step was to retrieve it. So, the team sequenced the E. coli DNA and ran the sequence through a computer program, which successfully reproduced the original images. So the running horse you see at the top of the page is really just the computer's representation of the sequenced DNA, since we can’t see DNA with the naked eye.

Now we need to find more plausible mechanisms by which this kind of process might take place. But, this is a cool first step and makes the GKC a little less conjectural.
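To see why DNA is apt as a digital medium at all, here is a toy write/read loop of my own devising (the actual encoding in the reported work is far more involved): four bases yield two bits per base, so any black-and-white pixel string can be stored in a base string and recovered exactly.

```python
# Toy illustration of the "write to" / "read from" loop: map pixels to
# DNA bases and recover them by reading the string back. The real
# CRISPR-based scheme is much more elaborate; this only shows why DNA
# suffices as a digital medium (4 bases = 2 bits per base).

BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(pixels):
    """Pack a list of 0/1 pixels (even length) into a DNA string."""
    bits = "".join(str(p) for p in pixels)
    return "".join(BITS_TO_BASE[bits[i:i + 2]]
                   for i in range(0, len(bits), 2))

def decode(dna):
    """Recover the pixel list from the DNA string."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return [int(c) for c in bits]

image = [1, 0, 1, 1, 0, 0, 1, 0]  # a tiny 8-pixel "image"
dna = encode(image)               # 'GTAG'
assert decode(dna) == image       # round trip is lossless
```

The hard part, as the quoted passage makes clear, is not this mapping but getting the base string into a living genome and back out again intact.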



Thursday, July 13, 2017

Some recent thoughts on AI

Kleanthes sent me this link to a recent lecture by Gary Marcus (GM) on the status of current AI research. It is a somewhat jaundiced review concluding that, once again, the results have been strongly oversold. This should not be surprising. The rewards to those that deliver strong AI (“the kind of AI that would be as smart as, say a Star Trek computer” (3)) will be without limit, both tangibly (lots and lots of money) and spiritually (lots and lots of fame, immortal kinda fame). And given that hyperbole never cripples its purveyors (“AI boys will be AI boys” (and yes, they are all boys)), it is no surprise that, as GM notes, we have been 20 years out from solving strong AI for the last 65 years or so. This is a bit like the many economists who predicted 15 of the last 6 recessions, but worse. Why worse? Because there have been 6 recessions, but there has been pitifully small progress on strong AI, at least if GM is to be believed (and I think he is).

Why despite the hype (necessary to drain dollars from “smart” VC money) has this problem been so tough to crack? GM mentions a few reasons.

First, we really have no idea how open-ended competence works. Let me put this backwards. As GM notes, AI has been successful precisely in “predefined domains” (6). In other words, where we can limit the set of objects being considered for identification or the topics up for discussion or the hypotheses to be tested, we can get things to run relatively smoothly. This has been true since Winograd and his blocks world. Constrain the domain and all goes okishly. Open the domain up so that intelligence can wander across topics freely and all hell breaks loose. The problem of AI has always been scaling up, and it is still a problem. Why? Because we have no idea how intelligence manages to (i) identify relevant information for any given domain and (ii) use that information in relevant ways for that domain. In other words, how we in general figure out what counts, and how we figure out how much it counts once we have figured it out, is a complete and utter mystery. And I mean ‘mystery’ in the sense that Chomsky has identified (i.e. as opposed to ‘problem’).

Nor is this a problem limited to AI.  As FoL has discussed before, linguistic creativity has two sides. The part that has to do with specifying the kind of unbounded hierarchical recursion we find in human Gs has been shown to be tractable. Linguists have been able to say interesting things about the kinds of Gs we find in human natural languages and the kinds of UG principles that FL plausibly contains. One of the glories (IMO, the glory) of modern GG lies in its having turned once mysterious questions into scientific problems. We may not have solved all the problems of linguistic structure but we have managed to render them scientifically tractable.

This is in stark contrast to the other side of linguistic creativity: the fact that humans are able to use their linguistic competence in so many different ways for thought and self-expression. This is what the Cartesians found so remarkable (see here for some discussion) and what we have not made an iota of progress understanding. As Chomsky put it in Language & Mind (and this is still a fair summary of where we stand today):

Honesty forces us to admit that we are as far today as Descartes was three centuries ago from understanding just what enables a human to speak in a way that is innovative, free from stimulus control, and also appropriate and coherent. (12-13)[1]

All-things-considered judgments, those that we deploy effortlessly in everyday conversation, elude insight. That we do this is apparent. But how we do this remains mysterious. This is the nut that strong AI needs to crack, given its ambitions. To date, the record of failure speaks for itself, and there is no reason to think that more modern methods will help out much.

It is precisely this roadblock that limiting the domain of interest removes. Bound the domain and the problem of open-endedness disappears.

This should sound familiar. It is the message in Fodor’s Modularity of Mind. Fodor observes that modularity makes for tractability. When we move away from modular systems, we fall flat on our faces, precisely because we have no idea how minds identify what is relevant in any given situation, how they weight what is relevant in a given situation, and how they then deploy this information appropriately. We do it all right. We just don’t know how.

The modern hype supposes that we can get around this problem with big data. GM has a few choice remarks about this. Here’s how he sees things (my emphasis):

I opened this talk with a prediction from Andrew Ng: “If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future.” So, here’s my version of it, which I think is more honest and definitely less pithy: If a typical person can do a mental task with less than one second of thought and we can gather an enormous amount of directly relevant data, we have a fighting chance, so long as the test data aren’t too terribly different from the training data and the domain doesn’t change too much over time. Unfortunately, for real-world problems, that’s rarely the case. (8)

So, if we massage the data so that we get that which is “directly relevant” and we test our inductive learner on data that is not “too terribly different” and we make sure that the “domain doesn’t change much” then big data will deliver “statistical approximations” (5). However, “statistics is not the same thing as knowledge” (9). Big data can give us better and better “correlations” if fed with “large amounts of [relevant!, NH] statistical data”. However, even when these correlational models work, “we don’t necessarily understand what’s underlying them” (9).[2]
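GM’s caveats can be made concrete with a toy example of my own (the data, model, and thresholds below are made up purely for illustration): fit a simple model on a narrow training domain and it looks fine in-domain, but it fails badly once the domain shifts.

```python
# Toy illustration of GM's caveat: a model fit on one region of the data
# can look excellent there and fail badly when the "domain" shifts.
# Hypothetical numbers; stdlib only.

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def truth(x):
    return x * x                       # the real (nonlinear) regularity

train_x = [0.0, 0.25, 0.5, 0.75, 1.0]  # data from a narrow domain
train_y = [truth(x) for x in train_x]

m, b = fit_line(train_x, train_y)      # linear approximation: y = x - 0.125

in_domain_err = abs((m * 0.6 + b) - truth(0.6))  # small (~0.12)
shifted_err = abs((m * 5.0 + b) - truth(5.0))    # huge (~20)
print(in_domain_err < 0.2, shifted_err > 15)     # True True
```

The statistical approximation is genuinely good on data resembling the training set; it simply encodes no knowledge of the regularity that generated the data, which is GM’s point.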

And one more thing: when things work it’s because the domain is well behaved. Here’s GM on AlphaGo (my emphasis):

Lately, AlphaGo is probably the most impressive demonstration of AI. It’s the AI program that plays the board game Go, and extremely well, but it works because the rules never change, you can gather an infinite amount of data, and you just play it over and over again. It’s not open-ended. You don’t have to worry about the world changing. But when you move things into the real world, say driving a vehicle where there’s always a new situation, these techniques just don’t work as well. (7)
 
So, if the rules don’t change, you have unbounded data and time to massage it and the relevant world doesn’t change, then we can get something that approximately fits what we observe. But fitting is not explaining and the world required for even this much “success” is not the world we live in, the world in which our cognitive powers are exercised. So what does AI’s being able to do this in artificial worlds tell us about what we do in ours? Absolutely nothing.

Moreover, as GM notes, the problems of interest to human cognition have exactly the opposite profile. In Big Data scenarios we have boundless data, endless trials with huge numbers of failures (corrections). The problems we are interested in are characterized by having a small amount of data and a very small amount of error. What will Big Data techniques tell us about problems with the latter profile? The obvious answer is “not very much” and the obvious answer, to date, has proven to be quite adequate.

Again, this should sound familiar. We do not know how to model the everyday creativity that goes into common judgments that humans routinely make and that directly affects how we navigate our open-ended world. Where we cannot successfully idealize to a modular system (one that is relatively informationally encapsulated) we are at sea. And no amount of big data or stats will help.

What GM says has been said repeatedly over the last 65 years.[3] AI hype will always be with us. The problem is that it must crack a long lived mystery to get anywhere. It must crack the problem of judgment and try to “mechanize” it. Descartes doubted that we would be able to do this (indeed this was his main argument for a second substance). The problem with so much work in AI is not that it has failed to crack this problem, but that it fails to see that it is a problem at all. What GM observes is that, in this regard, nothing has really changed and I predict that we will be in more or less the same place in 20 years.

Postscript:

Since penning(?) the above I ran across a review of a book on machine intelligence by Garry Kasparov (here). The review is interesting (I have not read the book) and is a nice companion to the Marcus remarks. I particularly liked the history on Shannon’s early thoughts on chess-playing computers and his distinction between the two ways the problem could be solved:

At the dawn of the computer age, in 1950, the influential Bell Labs engineer Claude Shannon published a paper in Philosophical Magazine called “Programming a Computer for Playing Chess.” The creation of a “tolerably good” computerized chess player, he argued, was not only possible but would also have metaphysical consequences. It would force the human race “either to admit the possibility of a mechanized thinking or to further restrict [its] concept of ‘thinking.’” He went on to offer an insight that would prove essential both to the development of chess software and to the pursuit of artificial intelligence in general. A chess program, he wrote, would need to incorporate a search function able to identify possible moves and rank them according to how they influenced the course of the game. He laid out two very different approaches to programming the function. “Type A” would rely on brute force, calculating the relative value of all possible moves as far ahead in the game as the speed of the computer allowed. “Type B” would use intelligence rather than raw power, imbuing the computer with an understanding of the game that would allow it to focus on a small number of attractive moves while ignoring the rest. In essence, a Type B computer would demonstrate the intuition of an experienced human player.
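Shannon’s Type A strategy is, at bottom, exhaustive minimax: score every line to the horizon with no intelligence about which moves are worth examining. A minimal sketch of my own, over a made-up game tree (real chess programs add depth limits, evaluation functions, and pruning; nothing below is from Shannon’s paper):

```python
# A toy "Type A" searcher: brute-force minimax that scores every line,
# with no judgment about which moves deserve attention.
# Toy game tree with made-up leaf values.

def minimax(node, maximizing):
    if isinstance(node, (int, float)):  # leaf: a position evaluation
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# Each inner list is a set of legal replies; numbers are terminal scores.
tree = [
    [3, [5, 1]],   # move 1: opponent then picks the minimum
    [[6, 2], 4],   # move 2
    [0, 9],        # move 3
]
print(minimax(tree, maximizing=True))  # 4 (move 2 is best)
```

A Type B program would differ precisely in refusing to recurse into every child, pruning most of `tree` by "understanding" which branches matter; that is the part that was supposed to require intelligence.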

As the review goes on to note, Shannon’s mistake was to think that Type A computers were not going to materialize. They did, with the result that the promise of AI (that it would tell us something about intelligence) fizzled as the “artificial” way that machines became “intelligent” simply abstracted away from intelligence. Or, to put it as Kasparov is quoted as putting it:  “Deep Blue [the machine that beat Kasparov, NH] was intelligent the way your programmable alarm clock is intelligent.”

So, the hope that AI would illuminate human cognition rested on the belief that technology and brute calculation would not be able to substitute for “intelligence.” This proved wrong, with machine learning being the latest twist in the same saga, per the review and Kasparov. 

All this fits with GM’s remarks above. What both do not emphasize enough, IMO, is something that many did not anticipate; namely, that we would revamp our views of intelligence rather than question whether our programs had it. Part of the resurgence of Empiricism is tied to the rise of the technologically successful machine. The hope was that trying to get limited machines to act like we do might tell us something about how we do things. The limitations of the machine would require intelligent design to get it to work, thereby possibly illuminating our kind of intelligence. What happened is that getting computationally miraculous machines to do things in ways that we had earlier recognized as dumb and brute force (and so telling us nothing at all) has been transformed into the hypothesis that there is no such thing as real intelligence at all and everything is “really” just brute force. Thus, the brain is just a data cruncher, just like Deep Blue is. And this shift in attitude is supported by an Empiricist conception of mind and explanation. There is no structure to the mind beyond the capacity to mine the inputs for surfacy generalizations. There is no structure to the world beyond statistical regularities. On this Eish view, AI has not failed; rather, the right conclusion is that there is less to thinking than we thought. This invigorated Empiricism is quite wrong. But it will have staying power. Nobody should underestimate the power that a successful (money making) tech device can have on the intellectual spirit of the age.


[1] Chomsky makes the same point recently, and he is still right. See here for discussion and links to article.
[2] This should again sound familiar. It is the moral that Chang drew on her work on faces as discussed here.
[3] I myself once made similar points in a paper with Elan Dresher. Few papers have been more fun to write. See here for the appraisal (if interested).

Thursday, July 6, 2017

The logic of adaptation

I recently ran across a nice paper on the logic of adaptive stories (here), along with a nice short discussion of its main points (here) (by Massimo Pigliucci (P)). The Olson and Arroyo-Santos paper (OAS) argues that circularity (or “loopiness”) is characteristic of all adaptive explanations (indeed, of all non-deductive accounts) but that some forms of loopiness are virtuous while others are vicious. The goal, then, is to distinguish the good circular arguments from the bad ones, and this amounts to distinguishing small uninteresting circles from big fat wide ones. Good adaptive explanations distinguish themselves from just-so stories in having independent data afforded from the three principal kinds of arguments evolutionary biologists deploy. OAS adumbrates the forms of these arguments and uses this inventory to contrast lousy adaptive accounts with compelling ones. Of particular interest to me (and I hope FoLers) is the OAS claim that looking at things in terms of how fat a circular/loopy account is will make it easy to see why some kinds of adaptive stories are particularly susceptible to just-soism. What kinds? Well, ones like those applied to the evolution of language, as it turns out. Put another way, OAS leads to Lewontin-like conclusions (see here) from a slightly different starting point.

An example of a just-so story helps to illustrate the logic of adaptation that OAS highlights. Why do giraffes have long necks? So as to be able to eat leaves from tall trees. Note that the fact that giraffes eat from tall trees confirms that having long necks is handy for this activity, and the utility of being able to eat from tall trees would make having a long neck advantageous. This is the loopiness/circularity that OAS insists is part of any adaptational account. OAS further insists that this circularity is not in itself a problem. The problem is that in the just-so case the circle is very small, so small as to almost shrink to a point. Why? Because the evidence for the adaptation and the fact that the adaptation explains are the same: long necks are what we want to explain and also constitute the evidence for the explanation. As OAS puts it:

…the presence of a given trait in current organisms is used as the sole evidence to infer heritable variation in the trait in an ancestral population and a selective regime that favored some variants over others. This unobserved selective scenario explains the presence of the observed trait, and the only evidence for the selective scenario is trait presence (168).

In other words, though ‘p implies p’ is unimpeachably true, it is not interestingly so. To get some explanation out of an account that uses these observations we need a broader circle. We need a big fat circle/loop, not an anorexic one.

OAS’s main take home message is that fattening circles/loops is both eminently doable (in some cases at least) and is regularly done. OAS lists three main kinds of arguments that biologists use to fatten up an adaptation account: comparative arguments, population arguments, and optimality arguments. Each brings something useful to the table. Each has some shortcomings. Here’s how OAS describes the comparative method (169):

The comparative method detects adaptation through convergence (Losos 2011). A basic version of comparative studies, perhaps the one underpinning most statements about adaptation, is the qualitative observation of similar organismal features in similar selective contexts.

The example OAS discusses is the streamlined body shapes and fins in animals that live in water. The observation that aquatic animals tend to be sleek and well built for moving around in water strongly suggests that there is something about the watery environment that is driving the observed sleekness.  As this example illustrates, a hallmark of the comparative method is “the use of cross-species variation” (170). The downside of this method is that it “does not examine fitness or heritability directly” and it “often relies on ancestral character state reconstructions or assumptions of tempo and mode that are impossible to test” (171, table 1).

A second kind of argument focuses on variations in a single population and sees how this affects “heritability and fitness between potentially competing individuals” (171). These kinds of studies involve looking at extant populations and seeing how their variations tie up with heritability. Again OAS provides an extensive example involving “the curvature of floral nectar spurs” in some flowers (171) and shows how variation and fitness can be precisely measured in such circumstances (i.e. where it is possible to do studies of  “very geographically and restricted sets of organisms under often unusual circumstances” (172)).

This method, too, has a problem. The biggest drawback is that the population method “examines relatively minor characters that have not gone to fixation” and “extrapolation of results to multiple species and large time scales” is debatable (171, table 1). In other words, it is not that clear whether the situations in which population arguments can be fully deployed reveal the mechanisms that are at play “in generating the patterns of trait distribution observed over geological time and clades” because it is unclear whether the “very local population phenomena are…isomorphic with the factors shaping life on earth at large” (172).

The third type of argument involves optimality thinking. This aims to provide an outline of the causal mechanisms “behind a given variant being favored” and rests on a specification of the relevant laws driving the observed effect (e.g. principles of hydrodynamics for body contour/sleekness in aquatic animals). The downside to this mode of reasoning is that it is not always clear which variables are relevant for optimization.

OAS notes that adaptive explanations are best when one can provide all three kinds of reasons (as one can in the case, for example, of aquatic contour and sleekness (see figure 4 and the discussion in P)). Accounts achieve just-so status when none of the three methods can apply and none have been used to generate relevant data. The OAS discussion of these points is very accessible and valuable and I urge you to take a look.

The OAS framing also carries an important moral, one that both OAS and P note: if going from just-so to serious requires fattening with comparative, population and optimization arguments, then some fashionable domains of evolutionary speculation relying on adaptive considerations are likely to be very just-soish. Under what circumstances will getting beyond hand waving prove challenging? Here’s OAS (184, my emphasis):

Maximally supported adaptationist explanations require evidence from comparative, populational, and optimality approaches. This requirement highlights from the outset which adaptationist studies are likely to have fewer layers of direct evidence available. Studies of single species or unique structures are important examples. Such traits cannot be studied using comparative approaches, because the putatively adaptive states are unique (cf. Maddison and FitzJohn 2015). When the traits are fixed within populations, the typical tools of populational studies are unavailable. In humans, experimental methods such as surgical intervention or selective breeding are unethical (Ruse 1979). As a result, many aspects of humans continue to be debated, such as the female orgasm, human language, or rape (Travis 2003; Lloyd 2005; Nielsen 2009; MacColl 2011). To the extent that less information is available, in many cases it will continue to be hard to distinguish between different alternative explanations to decide which is the likeliest (Forber 2009).

Let’s apply these OAS observations to a favorite of FoLers, the capacity for human language. First, human language capacity is, so far as we can tell, unique to humans. And it involves at least one feature (e.g. hierarchical recursion) that, so far as we can tell, emerges nowhere else in biological cognition. Hence, this capacity cannot be studied using comparative methods. Second, it cannot be studied using population methods, as, modulo pathology, the trait appears (at least at the gross level) fixed and uniform in the human species (any kid can learn any language in more or less the same way). Experimental methods, which could in principle be used (for there probably is some variation across individuals in phenomena that might bear on the structure of the fixed capacity (e.g. differences in language proficiency and acquisition across individuals)), will, if pursued, rightly land you in jail or at the World Court in The Hague. Last, optimization methods also appear useless, for it is not clear what function language is optimized for, and so the dimensions along which it might be optimized are very obscure. The obvious ones relating to efficient information transmission are too fluffy to be serious.[1]

P makes effectively the same point, but for evo-psych in general, not just evo-lang. In this he reiterates Lewontin’s earlier conclusions. Here is P:

If you ponder the above for a minute you will realize why this shift from vicious circularity to virtuous loopiness is particularly hard to come by in the case of our species, and therefore why evolutionary psychology is, in my book, a quasi-science. Most human behaviors of interest to evolutionary psychologists do not leave fossil records (i); we can estimate their heritability (ii) in only what is called the “broad” sense, but the “narrow” one would be better (see here); while it is possible to link human behaviors with fitness in a modern environment (iii), the point is often made that our ancestral environment, both physical and especially social, was radically different from the current one (which is not the case for giraffes and lots of other organisms); therefore to make inferences about adaptation (iv) is to, say the least, problematic. Evopsych has a tendency to get stuck near the vicious circularity end of Olson and Arroyo-Santos’ continuum.

There is more, much more, in the OAS paper and P's remarks are also very helpful. So those interested in evolang should take a look. The conclusion both pieces draw regarding the likely triviality/just-soness of such speculations is a timely re-re-re-reminder of Lewontin and the French academy’s earlier prescient warnings. Some questions, no matter how interesting, are likely to be beyond our power to interestingly investigate given the tools at hand.

One last point, added to annoy many of you. Chomsky’s speculations, IMO, have been suitably modest in this regard. He is not giving an evolang account so much as noting that if there is to be one then some features will not be adaptively explicable. The one that Chomsky points to is hierarchical recursion. Given the OAS discussion it should be clear that Chomsky is right in thinking that this will not be a feature amenable to adaptive explanation. What would “variation” wrt Merge be? Somewhat recursive/hierarchical? What would this be, and how would the existence of 1-merge and 2-merge systems get you to unbounded Merge? It won’t, which is Chomsky’s (and Dawkins’) point (see here for discussion and references). So, there will be no variation, and no other animals have it, and it doesn’t optimize anything. So there will be no available adaptive account. And that is Chomsky’s point! The emergence of FL, whenever it occurred, was not selected for. Its emergence must be traced to other non-adaptive factors. This conclusion, so far as I can tell, fits perfectly with OAS’s excellent discussion. What Chomsky delivers is all the non-trivial evolang we are likely to get our hands on given current methods, and this is just what OAS, P and Lewontin should lead us to expect.



[1] Note that Chomsky’s conception of optimal and the one discussed by OAS are unrelated. For Chomsky, FL is not optimized for any phenotypic function. There is nothing that FL is for such that we can say that it does whatever better than something else might. For example, structure dependence has no function such that Gs lacking it would be worse in some way than ones (like ours) that have it.

Friday, June 30, 2017

Statistical obscurantism; math destruction take 2

I've mentioned before that statistical knowledge can be a dangerous thing (see here). It's a little like Kabbala, something that is dangerous in the hands of the inexperienced, the ambitious and the lazy. This does not mean that in its place stats are not valuable tools. Of course they are. But there is a reason for the slogan "lies, damn lies and statistics." A few numbers can cover up the most awful thinking, sort of like pretty pix of brains in the NYT can sell almost any new cockamamie idea in cog-neuro. So, in my view, stats is a little like nitroglycerine: useful, but dangerous on unsteady ground.

Now, even I don't really respect my views on these matters. What the hell do I know, really? Well, very little. So I will back this view up by pointing you to an acknowledged expert on the subject who has come to a very similar conclusion. Here is Andrew Gelman despairing of the view that, done right, stats is the magic empirical elixir, able to get something out of any data set, able to spin scientific gold from any experimental foray:

In some sense, the biggest problem with statistics in science is not that scientists don’t know statistics, but that they’re relying on statistics in the first place.
How is stats the problem? Because it covers up dreadful thinking:
Just imagine if papers such as himmicanes, air rage, ages-ending-in-9, and other clickbait cargo-cult science had to stand on their own two feet, without relying on p-values—that is, statistics—to back up their claims. Then we wouldn’t be in this mess in the first place.
So, one problem with stats is that they can make drek look serious. Is this a problem with the good use of stats? No, but given the current culture, it is a problem. And as this pair of quotes suggests, if something absent the stats sounds dumb, then one should be very very very wary of the stats. In fact, one might go further: if the idea sans stats looks dumb, then the best reaction on hearing that idea with stats is to reach for your wallet (or your credulity).

So what does Gelman suggest we do? Well, he is a reasonable man so he says reasonable things:
I’m not saying statistics are a bad idea. I do applied statistics for a living. But I think that if researchers want to solve the reproducibility crisis, they should be doing experiments that can successfully be reproduced—and that involves getting better measurements and better theories, not rearranging the data on the deck of the Titanic.
Yup, it looks like he is recommending thinking. Not a bad idea. The problem is that stats has the unfortunate tendency of replacing thought. It gives the illusion of being able to substitute technique for insight. Stats are often treated as the Empiricist's perfect tool: the method that allows the data to speak for itself. And this is the illusion that Gelman is trying to puncture.
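Gelman's worry about noise can be made concrete with a quick simulation (my own toy sketch, not anything from Gelman's work): run a standard significance test on enough batches of pure noise and a steady trickle of "findings" clears the p < .05 bar anyway.

```python
import random
import statistics

random.seed(42)

def two_sample_t(a, b):
    """Welch-style t statistic for two independent samples."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / ((va / len(a) + vb / len(b)) ** 0.5)

n_experiments = 2000
n_per_group = 50
false_positives = 0
for _ in range(n_experiments):
    # Both groups drawn from the SAME distribution: any "effect" is noise.
    a = [random.gauss(0, 1) for _ in range(n_per_group)]
    b = [random.gauss(0, 1) for _ in range(n_per_group)]
    if abs(two_sample_t(a, b)) > 1.96:  # roughly the p < .05 cutoff
        false_positives += 1

rate = false_positives / n_experiments
print(f"'Significant' findings from pure noise: {rate:.1%}")
```

About one in twenty of these null comparisons comes out "significant", which is exactly what the threshold guarantees; the danger is that each one, dressed up with its p-value, looks like a discovery.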

Gelman has (given his posts of late) come to believe that this illusion is deeply desired. Here he is again, replying to the suggestion that misuse of stats is largely an educational problem:
Not understanding statistics is part of it, but another part is that people—applied researchers and also many professional statisticians—want statistics to do things it just can’t do. “Statistical significance” satisfies a real demand for certainty in the face of noise. It’s hard to teach people to accept uncertainty. I agree that we should try, but it’s tough, as so many of the incentives of publication and publicity go in the other direction.
I would add, you will not be surprised to hear, that there is also the Eish dream I mentioned above wherein the aim is to minimize the human factor mediating data and theory. Rationalists believe that the world must be vigorously interrogated (sometimes put under extreme duress) to reveal its deep secrets. Es don't think that it has deep secrets, as they don't really believe that the world has that much hidden structure. Rather, the problem is with us: we will see what is before our eyes if we gather the data carefully and inspect it with an open heart. The data will speak for itself (which is what stats correctly applied will allow it to do). This Eish vision has its charms. I never underestimate it. I think that it partially lies behind the failure to appreciate Gelman's points.

Wednesday, June 28, 2017

Facing the nativist facts

One common argument for innateness rests on finding some capacity very early on. So imagine that children left the womb speaking Yiddish (the language FL/UG makes available with all unmarked values for parameters). The claim that Yiddish was innate would (most likely) not be a hard sell. Actually, I take this back: there will always be unreconstructed Empiricists that will insist that the capacity is environmentally driven, no doubt by some angel that co-habits the womb with the kid all the while sedulously imparting Yiddish competence.

Nonetheless, early manifestation of competence is a pretty good reason for thinking that the manifest capacity rests on biologically given foundations, rather than being the reflex of environmental shaping. This logic applies quite generally, and it is interesting to collect examples of it beyond the language case. The more Eism stumbles, the easier it is to ignore it in my own little domain of language.

Here is a report of a paper in Current Biology that makes the argument that face recognition is pre-wired. The evidence? Kids in utero distinguish face-like images from others. Given the previous post (here) this conclusion should not be very surprising. There is good evidence that face competence relies on abstract features used to generate a face space. Moreover, these features are not extracted from exemplars and so would appear to be a pre-condition for (rather than a consequence of) face experience. At any rate, the present article reports on a paper that provides more evidence for this conclusion. Here’s the abstract:

It's well known that young babies are more interested in faces than other objects. Now, researchers have the first evidence that this preference for faces develops in the womb. By projecting light through the uterine wall of pregnant mothers, they found that fetuses at 34 weeks gestation will turn their heads to look at face-like images over other shapes.
Pulling this experiment off required some technical and conceptual breakthroughs: a fancy 4D ultrasound and the appreciation that light could penetrate into the uterus. This realized, the kid in utero responded to faces as infants outside the uterus respond to them. “The findings suggest that babies' preference for faces begins in the womb. There is no learning or experience after birth required.” This does not mean that face recognition is based on innate features. After all, the kid might have acquired the knowledge underlying its discriminative abilities by looking at degraded faces projected through the womb, sort of a fetus’s version of Plato’s Cave. This is conceivable, but I doubt that it is believable. Here’s one reason why. It apparently takes some wattage to get the relevant facial images to the in utero kid. Decent reception requires bright lights, hence the author’s following warning:

Reid says that he discourages pregnant mothers from shining bright lights into their bellies.

So, it’s possible that the low-pass filtered images that the kid sees bouncing around the belly screen are what drive the face recognition capacity. But then the Easter Bunny and Santa Claus are also logically possible.

This work looks ready to push back the date at which kids’ capacities are cognitively set. First faces, then numbers and quantities. Reid and colleagues are rightly ambitious to push back the time line on the latter two now that faces have been studied. In my view, this kind of evidence is unnecessary, as the case for substantial innate machinery was already in place absent this cool stuff (very good for parties and small talk). However, far be it from me to stop others from finding this compelling. What matters is that we dump the blank slate view so that we can examine what the biological givens are. What would be weird is the absence of substantial innate capacity, not its presence. The question is not whether this is true, but which possible version is.

Last point, for all you skeptics out there: note this is standard infant cognition run in a biology journal. I fail to see any difference in the logic behind this kind of work and analogous work on language. The question is what’s innate. It seems that finding out what is so is a question of biological interest, at least if the publishing venue is a clue. So, to the degree that linguists’ claims bear on the innate mental structures underlying human linguistic facility, to that degree they are doing biology. Unless of course you think that research in biology gets its bona fides via its tools; no 4D ultrasounds and bright lights, no biology. But who would ever confuse a discipline with its tools?
  

Wednesday, June 21, 2017

Two things to read

Here are a couple of readables that have entertained me recently.

The first is a NYT report (here) on what is taken to be an iconoclastic view of the role of animal aesthetics in evolution. According to the article, a female’s aesthetic preferences can drive evolutionary change. This, apparently, was Darwin’s view but it appears to be largely out of favor today. More utilitarian/mundane conceptions are favored. Here’s the mainstream view as per the NYT:

All biologists recognize that birds choose mates, but the mainstream view now is that the mate chosen is the fittest in terms of health and good genes. Any ornaments or patterns simply reflect signs of fitness.

The old/new view wants to allow for forces based on fluffier considerations:

The idea is that when they are choosing mates — and in birds it’s mostly the females who choose — animals make choices that can only be called aesthetic. They perceive a kind of beauty. Dr. Prum defines it as “co-evolved attraction.” They desire that beauty, often in the form of fancy feathers, and their desires change the course of evolution.

The bio world contrasts these two approaches, favoring the more “objective” utility-based one over the more “subjective” aesthetic one. Why? I suspect because the former seems so much more hard-headed and, thus, “scientific.” After all, why would any animal prefer something on aesthetic grounds! If there is no cash value to be had, clearly there is no value to be had at all! (Though this reminds one of the saying about knowing the price of everything and the value of nothing).

An aside: I suspect that this preference hangs on importing the common sense understanding of ‘fitness’ into the technical term. The technical term dubs fit any animal that sends more of its genes into the next generation, whatever the reason for this. So, if being a weak effete pretty boy allows greater reproductive success than being a successful but ugly-looking tough guy, then pretty boyhood is fitter than ugly tough guyhood, even if the latter appears to eat more, control more territory and fight harder. Pretty boys may be less fit in the colloquial sense, but they are not less fit technically if they can get more of their genes into the next generation. So, strictly speaking, appealing to a female’s aesthetics (if there is such a thing) in such a way as to make you more alluring to her and making it more likely that your genes will mix with hers makes you more fit even if you are slower, weaker and more pusillanimous (i.e. less fit in common parlance).
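The aside can be put in toy numbers (all invented for illustration): in the technical sense, fitness is read off offspring counts alone, so the colloquial measures simply drop out of the calculation.

```python
# Invented toy numbers: technical fitness only counts genes passed on.
pretty_boy = {"territory": 1, "fights_won": 0, "offspring": 6}
tough_guy = {"territory": 5, "fights_won": 9, "offspring": 3}

def technically_fitter(a, b):
    """More offspring = more genes in the next generation = fitter,
    whatever the reason, and however unfit in the colloquial sense."""
    return a if a["offspring"] > b["offspring"] else b

winner = technically_fitter(pretty_boy, tough_guy)
print("pretty boy wins:", winner is pretty_boy)
```

The territory and fight numbers never enter the comparison, which is the whole point of the technical notion.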

Putting the aside aside, focusing on the less fluffy virtues may seem compelling when it comes to animals, though even here the story gets a bit involved and slightly incredible. So, for example, here’s one story: peahens prefer peacocks with big tails because if a peacock can make it in peacock world despite schlepping around a whopping big tail that makes doing anything at all a real (ahem) challenge, then that peacock must be really really really fit (i.e. stronger, tougher, etc.) and so any rational peahen would want its genes for its own offspring. The evaluation is purely utilitarian, and the preference for the badly engineered result (big clumsy tail) is actually the hidden manifestation of a truer utilitarian calculus (really really fit because even with the handicap it succeeds).

And what would the alternative be? Well, here’s a simple possibility: peahens find big tails hot and are attracted to showy males because they prefer hotties with big tails. There is nothing underneath the aesthetic judgment. It is not beautiful because of some implied utility. It’s simple lust for beauty driving the train. Beauty despite the engineering-wise grotesque baggage. Of course, believing that there is something like beauty that is not reducible to having a certain (biological) price is a belief that can land you in the poorly paid Arts faculty, exiled from the hard headed Science side of campus. Thus, such views are unlikely to be happily entertained. However, it is worth noting how little there often is behind the hard headed view besides the (supposedly) “self-evident” fact that it is hard headed. Nonetheless, that appears to be the debate being played out in the bio world as reported by the NYT, and, as regards animals, maybe ascribing aesthetics to them is indulgent anthropomorphism.

Why do I mention this? Because what is going on here is similar to what goes on in evo accounts concerning humans as well. The point is discussed in a terrific Jerry Fodor review of Pinker’s How the Mind Works in the LRB about 20 years ago (here). If you’ve never read it, go there now and delight yourself. It is Fodor at his acerbic (and analytical) best.

At any rate, he makes the following point in discussing Pinker’s attempt to “explain” human preferences for fiction, friends, games, etc. in more prudential (adaptationist/ selectionist) terms.

I suppose it could turn out that one’s interest in having friends, or in reading fictions, or in Wagner’s operas, is really at heart prudential. But the claim affronts a robust, and I should think salubrious, intuition that there are lots and lots of things that we care about simply for themselves. Reductionism about this plurality of goals, when not Philistine or cheaply cynical, often sounds simply funny. Thus the joke about the lawyer who is offered sex by a beautiful girl. ‘Well, I guess so,’ he replies, ‘but what’s in it for me?’ Does wanting to have a beautiful woman – or, for that matter, a good read – really require a further motive to explain it? Pinker duly supplies the explanation that you wouldn’t have thought that you needed. ‘Both sexes want a spouse who has developed normally and is free of infection … We haven’t evolved stethoscopes or tongue-depressors, but an eye for beauty does some of the same things … Luxuriant hair is always pleasing, possibly because … long hair implies a long history of good health.’

Read the piece: the discussion of why we love literature and want friends is quite funny. But the serious point is that aside from being delightfully obtuse, the more hard headed “Darwinian” account ends up sounding unbelievably silly. Just so stories indeed! But that’s what you get when in the end you demand that all values reduce to their cash equivalent.

So, the debate rages.

The second piece is on birdsongs in species that don’t bring up their own kids (here). Cowbirds are brood parasites. They are also songbirds. And they are songbirds that learn the cowbird song and not that of their “adoptive” hosts. The question is how do they manage to learn their own song and not that of their hosts (i.e. ignore the song of their hosts and zero in on that of their conspecifics)? The answer seems to be the following:

…a young parasite recognizes conspecifics when it encounters a particular species-specific signal or "password" -- a vocalization, behavior, or some other characteristic -- that triggers detailed learning of the password-giver’s unique traits.

So, there is a certain vocal signal (a “password” (PW)) that the young cowbird waits for, and this allows it to identify its conspecifics, and this triggers the song learning that allows the non-cowbird-raised bird to learn the cowbird song. In other words, it looks like a very specific call (the “chatter call”) triggers the song learning part of the brain when it is heard. As the article puts it:

Professor Hauber's "password hypothesis" proposes that young brood parasites first recognize a particular signal, which acts as a password that identifies conspecifics, and the parasites learn other species­-specific characters only after encountering that password. One of the important features of the password hypothesis is that the password must be innate and familiar to the animal from a very early age. This suggests that encountering the password triggers specific neural responses early in development -- neural responses can actually be seen and measured.

It seems that some of the biochemistry behind this PW triggering process has been identified.

…cowbirds' brains change after the birds hear the chatter call by rapidly increasing production of a protein known as ZENK. This protein is ephemeral; it is produced in neurons after exposure to a new stimuli, but disappears only a few hours later, and it is not produced again if the same stimuli is encountered. The production of ZENK occurs in the neurons in the auditory forebrain, which are regions in the songbird brain that respond to learned vocalizations, such as songs, and also to specific unlearned calls.

So, hear PW, get ZENKed, get song. It gets you thinking: why don’t humans have the same thing wrt language? Why aren’t there PWs for English, Chinese etc.? Or more exactly, why isn’t it the case that humans come biologically differentiated so that they are triggered to learn different languages? Or, why is it that any child can acquire any language in the same way as any other child? You might think that if language evolved with a P&P architecture and different Gs were simply different settings of the same parameters (i.e. each G was a different vector of parameter values), then evolution might have found it useful to give those most likely to grow up speaking Piraha or Hungarian a leg up by prepopulating their parameter space with Piraha or Hungarian values. Or at least endowing these offspring with PWs that when encountered triggered the relevant G values. Why don’t we see this?

Here’s one non-starter of an answer: there’s not enough time for this to have happened. Wrong! If we can have gone from lactose intolerant to lactose tolerant in 5,000 years then why couldn’t evo give some kids PWs in that time? Maybe too much intermixing of populations? But we know that there have been long stretches of time during which populations were quite isolated (right?). So this could have happened, and indeed it did with cowbirds. So why not with us? [1]

At any rate, it is not hard to imagine what the cowbird linguistic equivalent would be. Hear a sentence like “Mr Phelps, should you agree to take this mission then, as you know, should you or any of your team be captured, the government will disavow any knowledge of your activities” and poof, out pops English. Just think of how much easier second language acquisition would be. Just a matter of finding the right PWs.  But this is not, it seems, how it works with us. We are not cowbirds. Why not?
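The imagined mechanism is easy to sketch as a toy program (everything here, parameter names included, is invented for illustration): a G is a vector of parameter values, and a password would snap the whole vector into place in one step, the way the chatter call triggers cowbird song learning.

```python
# Toy P&P sketch: a grammar (G) is a vector of binary parameter values.
# All parameter names and the "password" below are invented.
ENGLISH = {"head_initial": True, "pro_drop": False, "wh_in_situ": False}
HUNGARIAN = {"head_initial": False, "pro_drop": True, "wh_in_situ": False}

# The cowbird-style hypothesis: one innate trigger per G, which presets
# the entire parameter vector on first encounter.
PASSWORDS = {"mr-phelps-sentence": ENGLISH}

def hear(signal, current=None):
    """A known password snaps the whole vector into place at once;
    anything else leaves the learner where it was, fixing parameters
    one by one from experience (which is what kids actually do)."""
    if signal in PASSWORDS:
        return dict(PASSWORDS[signal])
    return dict(current or {})

print(hear("mr-phelps-sentence"))  # the full English vector, in one step
print(hear("arbitrary input"))     # no shortcut: nothing preset
```

Since no such passwords exist for human languages, the sketch mainly makes vivid what the absence of the mechanism means: every child has to fix the same parameter vector from ordinary experience.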

So, enjoy the pieces, they amused me. Hopefully they will amuse you too.


[1] This would be particularly apposite given Mark Baker’s speculations (here; p. 23):

..it could be that linguistic diversity has the desirable function of making it hard for a greedy or dangerous outsider to join your group and get access to your resources and skills. You are less vulnerable to manipulation or deception by a would-be exploiter who cannot communicate with you easily.

In this context, a PW for offspring might be just what Dr Darwin would have ordered. But it appears not to exist.